Poetiq’s Solution Validated on Semi-Private Test Set
On November 20, 2025, we announced strong preliminary performance on the ARC-AGI-2 public evaluation. Today, ARC Prize has officially verified our results. We are proud to confirm that our system outperforms existing methods, establishing a new state of the art by a significant margin. Our novel approach of learned test-time reasoning is the first to break through the 50% barrier and solve a majority of the problems.
Figure 2: Poetiq’s system tops the table of official ARC-AGI-2 Semi-Private Test Set results. Box added for emphasis.
Poetiq's systems establish an entirely new Pareto frontier on the public ARC-AGI-2 set, surpassing previous results and pushing the boundary of what is possible in cost-effective reasoning. We publicly released one of our pure Gemini-based configurations for official evaluation. The ARC Prize Team evaluated our open-source ARC-AGI solver on the Semi-Private Test Set and reported 54% at $30.57 per problem. The previous best score of 45%, set by Gemini 3 Deep Think, cost $77.16 per problem, so our configuration scores higher at less than half the per-problem cost.
We achieved this result using Poetiq's meta-system to optimize all parts of our solution (see our previous blog post for more details). The flexibility of the meta-system allowed us to reach this performance within hours of Gemini 3's release. At Poetiq, we do not need to build, or even fine-tune, our own large frontier models. Our meta-system is designed to automatically create full systems that solve specific tasks by utilizing any existing frontier model. In our first public demonstration, we focused on ARC-AGI using Gemini 3.
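To make the general idea of test-time reasoning on ARC-style tasks concrete, here is a minimal, hypothetical sketch of a propose-verify-refine loop around a frontier-model API. This is an illustration only, not Poetiq's meta-system; the `call_llm` helper, the prompt format, and the retry policy are assumptions made for the example.

```python
# Illustrative sketch only: a generic propose-verify-refine loop for ARC-style
# tasks driven by a frontier LLM. NOT Poetiq's actual system; `call_llm` and
# the prompt format are hypothetical placeholders.

import json


def call_llm(prompt: str) -> str:
    """Placeholder for a call to any frontier model API (e.g., Gemini 3)."""
    raise NotImplementedError("wire up your model provider here")


def solve_arc_task(task: dict, max_attempts: int = 5):
    """task = {"train": [{"input": ..., "output": ...}, ...], "test": [{"input": ...}, ...]}"""
    feedback = ""
    for _ in range(max_attempts):
        prompt = (
            "Write a Python function `transform(grid)` that maps each training "
            "input grid to its output grid.\n"
            f"Training pairs: {json.dumps(task['train'])}\n{feedback}"
        )
        code = call_llm(prompt)
        namespace: dict = {}
        try:
            exec(code, namespace)  # trust boundary: sandbox this in practice
            transform = namespace["transform"]
            # Verify the candidate program against all training pairs before trusting it.
            if all(transform(p["input"]) == p["output"] for p in task["train"]):
                return [transform(t["input"]) for t in task["test"]]
            feedback = "Previous attempt failed on at least one training pair; revise it."
        except Exception as exc:
            feedback = f"Previous attempt raised {exc!r}; fix the error."
    return None  # no verified solution found within the attempt budget
```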
- Our meta-system improves with every task it solves by learning how the task was solved. Diversity of tasks is crucial for this, so we are using our system to address a number of benchmarks spanning a variety of reasoning and retrieval tasks. Stay tuned.
- We play well with others. Our system can be used to optimize AI components inside existing,
larger systems. Look for this in the near future.
- Can we solve long-horizon tasks by leveraging the wealth of world knowledge already present in frontier models, without resorting to updating the models themselves? If we can transform the underlying knowledge-extraction mechanisms to be just a bit more LLM-friendly, maybe we can get away with no model tuning at all. Wouldn't that be something?
The Poetiq meta-system is built to handle complex real-world problems that frontier models struggle to solve. We're working with early partners now. If you want to discuss how Poetiq can help with your company's AI challenges, contact us at poetiq@poetiq.ai.
Poetiq is a lean, deeply technical team of 6 researchers and engineers with a combined 53 years of
experience from Google DeepMind. We're focused on solving the fundamental problems of AI reasoning
and knowledge extraction in the presence of noise and uncertainty. Want to join us?
Check out our open positions.