Poetiq Shatters ARC-AGI-2 State of the Art at Half the Cost
We are proud to confirm that our system has officially outperformed existing methods,
establishing a new state of the art by a significant margin.
December 5, 2025
Figure 1: Poetiq’s system establishes a new state of the art, delivering 54% accuracy at less than half the
cost of the previous best on the ARC-AGI-2 Semi-Private Eval Set.
Poetiq’s Solution Validated on Semi-Private Test Set
On November 20, 2025, we announced strong preliminary performance on the ARC-AGI-2 public
evaluation. Today, ARC Prize has officially verified our results: our system has outperformed
existing methods, establishing a new state of the art by a significant margin. Our novel approach
of learned test-time reasoning is the first to break through the 50% barrier and solve a majority of
the problems.
Figure 2: Poetiq’s system tops the table of official ARC-AGI-2 Semi-Private Test Set
results. Box added for emphasis.
Poetiq’s systems establish an entirely new Pareto frontier on the public ARC-AGI-2 set,
surpassing previous results and pushing the boundary for what is possible in cost-effective
reasoning. We publicly released one of our pure Gemini-based configurations for official
evaluation. The ARC Prize Team evaluated our open-source ARC-AGI solver on the Semi-Private
Test Set and reported 54% accuracy at $30.57 per problem. The previous best score of 45% was set
by Gemini 3 Deep Think and cost $77.16 per problem.
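To make the "less than half the cost" comparison concrete, here is a minimal Python sketch of the arithmetic, using only the two Semi-Private results quoted above. The `Result` class and the `dominates` and `pareto_frontier` helpers are hypothetical names introduced for illustration; they are not part of the released ARC-AGI solver, and the full frontier in our figures includes many more systems than these two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One leaderboard entry (illustrative structure, not an official format)."""
    name: str
    accuracy: float          # fraction of problems solved, e.g. 0.54 for 54%
    cost_per_problem: float  # US dollars per problem


def dominates(a: Result, b: Result) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.cost_per_problem <= b.cost_per_problem
    strictly_better = a.accuracy > b.accuracy or a.cost_per_problem < b.cost_per_problem
    return no_worse and strictly_better


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep only the entries that no other entry dominates."""
    return [
        r for r in results
        if not any(dominates(other, r) for other in results if other is not r)
    ]


# The two Semi-Private Test Set results quoted in this post.
poetiq = Result("Poetiq (Gemini-based configuration)", 0.54, 30.57)
deep_think = Result("Gemini 3 Deep Think", 0.45, 77.16)

cost_ratio = poetiq.cost_per_problem / deep_think.cost_per_problem
accuracy_gain_points = round((poetiq.accuracy - deep_think.accuracy) * 100)

print(f"Cost ratio: {cost_ratio:.2f} of the previous best")
print(f"Accuracy gain: {accuracy_gain_points} percentage points")
print("Frontier:", [r.name for r in pareto_frontier([poetiq, deep_think])])
```

With just these two entries, the cost ratio comes out at roughly 0.40, and the higher-accuracy, lower-cost result dominates the other, so it is the only point left on this two-entry frontier.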
We achieved this result using Poetiq’s meta-system to optimize all parts of our solution
(see our previous blog post for more details). The flexibility of our meta-system allowed us to
achieve this within hours of Gemini 3’s release. At Poetiq, we do not need to build, or even
fine-tune, our own large frontier models. Our meta-system is designed to automatically
create full systems that solve specific tasks by utilizing any existing frontier model.
In our first public demonstration, we focused on ARC-AGI using Gemini 3.
What’s Next?
Our meta-system improves with every task that it solves by learning how the task was
solved. For this, diversity of tasks is crucial. To this end, we’re using our system to address
a number of benchmarks, spanning a variety of different reasoning and retrieval tasks. Stay
tuned.
We play well with others. Our system can be used to optimize AI components inside
existing, larger systems. Look for this in the near future.
Can we solve long-horizon tasks by leveraging the wealth of world knowledge already
present in frontier models without resorting to updating the models themselves? If we
can transform the underlying knowledge extraction mechanisms to be just a bit more
LLM-friendly, maybe we can get away with no model tuning at all. Wouldn’t that be something?
Use Poetiq Yourself
The Poetiq meta-system is built to handle complex real-world problems that frontier models
struggle to solve. We’re working with early partners now. If you want to discuss how Poetiq
can help with your company’s AI challenges, contact us at poetiq@poetiq.ai.
Join Our Team!
Poetiq is a lean, deeply technical team of 6 researchers and engineers with a combined 53
years of experience from Google DeepMind. We're focused on solving the fundamental problems
of AI reasoning and knowledge extraction in the presence of noise and uncertainty. Want to
join us?
Check out our open positions.