Poetiq Shatters ARC-AGI-2 State of the Art at Half the Cost

ARC-AGI-2 Leaderboard showing Poetiq's SOTA performance

Figure 1: Poetiq’s system establishes a new state of the art, delivering 54% accuracy at less than half the cost of the previous best on the ARC-AGI-2 Semi-Private Eval Set.

Source: Semi-Private Eval results are from the official ARC-AGI website.

Official Verification

Poetiq’s Solution Validated on Semi-Private Test Set

On November 20, 2025, we announced strong preliminary performance on the ARC-AGI-2 public evaluation. Today, ARC Prize has officially verified our results. We are proud to confirm that our system outperforms existing methods, establishing a new state of the art by a significant margin. Our novel approach of learned test-time reasoning is the first to break through the 50% barrier and solve a majority of the problems.

Figure 2: Poetiq’s system tops the table of official ARC-AGI-2 Semi-Private Test Set results. Box added for emphasis.

Source: Table from the official ARC-AGI leaderboard.

Poetiq's systems establish an entirely new Pareto frontier on the public ARC-AGI-2 set, surpassing previous results and pushing the boundary for what is possible in cost-effective reasoning. We publicly released one of our pure Gemini-based configurations for official evaluation. The ARC Prize Team evaluated our open-source ARC-AGI solver on the Semi-Private Test Set and reported 54% at $30.57 per problem. The previous best score of 45% was set by Gemini 3 Deep Think and cost $77.16 per problem.
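
As a quick check on the "less than half the cost" claim, the two reported results can be compared directly. The short Python sketch below uses only the figures quoted above; the variable names and the dominates helper are illustrative assumptions and are not part of Poetiq's or ARC Prize's code.

```python
# Reported Semi-Private results quoted above:
# (accuracy in percent, cost in USD per problem).
poetiq = (54, 30.57)
deep_think = (45, 77.16)


def dominates(a, b):
    """Return True if result a is no worse than b on both axes and strictly better on one.

    Hypothetical helper for illustration: higher accuracy is better, lower cost is better.
    """
    acc_a, cost_a = a
    acc_b, cost_b = b
    no_worse = acc_a >= acc_b and cost_a <= cost_b
    strictly_better = acc_a > acc_b or cost_a < cost_b
    return no_worse and strictly_better


cost_ratio = poetiq[1] / deep_think[1]
accuracy_gain_points = poetiq[0] - deep_think[0]

print(f"cost ratio: {cost_ratio:.3f}")
print(f"accuracy gain: {accuracy_gain_points} percentage points")
print(f"dominates previous best: {dominates(poetiq, deep_think)}")
```

On these numbers the cost ratio is about 0.40, so the verified run cost roughly 40% of the previous best per problem while scoring 9 percentage points higher. That is what it means for the new result to sit beyond the earlier point on the cost-accuracy Pareto frontier.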

We achieved this result using Poetiq’s meta-system to optimize all parts of our solution (see our previous blog post for more details). The flexibility of our meta-system allowed us to achieve this within hours of Gemini 3’s release. At Poetiq, we do not need to build, or even fine-tune, our own large frontier models. Our meta-system is designed to automatically create full systems that solve specific tasks by utilizing any existing frontier model. In our first public demonstration, we focused on ARC-AGI using Gemini 3.

What’s Next?

Our meta-system improves with every task that it solves by learning how the task was solved. For this, diversity of tasks is crucial. To this end, we’re using our system to address a number of benchmarks, spanning a variety of different reasoning and retrieval tasks. Stay tuned.
We play well with others. Our system can be used to optimize AI components inside existing, larger systems. Look for this in the near future.
Can we solve long-horizon tasks by leveraging the wealth of world knowledge already present in frontier models, without resorting to updating the models themselves? If we can transform the underlying knowledge extraction mechanisms to be just a bit more LLM-friendly, maybe we can get away with no model tuning at all. Wouldn’t that be something?

Use Poetiq Yourself

The Poetiq meta-system is built to handle complex real-world problems that frontier models struggle to solve. We’re working with early partners now. If you want to discuss how Poetiq can help with your company’s AI challenges, contact us at

poetiq@poetiq.ai

Join Our Team!

Poetiq is a lean, deeply technical team of 6 researchers and engineers with a combined 53 years of experience from Google DeepMind. We're focused on solving the fundamental problems of AI reasoning and knowledge extraction in the presence of noise and uncertainty. Want to join us? Check out our open positions.

get in touch

hello@poetiq.ai