Despite years of industrial and academic research, most approaches to Recursive
Self-Improvement are still too slow and expensive to be practical: each step of
self-improvement requires extensive training of LLMs (or even retraining them from
scratch!). This does not scale. Poetiq is building intelligent systems that are able to
improve themselves directly. This works now. And it is fast. It's fast enough that in
a few months, we will be offering it for free. It will enable us, and you, to create
reliable reasoning systems that solve the real-world, practical problems and workflows
that businesses face every day.
We've used RL for years. But RL post-training has severe limitations: it's slow,
expensive, and requires millions of data points. These limitations make effective RL
training impractical for all but a handful of lucky companies. Already, we see
reasoning models struggling with problems outside of coding and math. Why do those two
domains work well? Because in both, it's possible to generate large amounts of
synthetic data cheaply, which is crucial for RL. That's not true for most domains.
Our approach allows us to find effective task-specific reasoning strategies using much
less data (hundreds of data points, rather than millions), while being compatible with
the LLMs you’re already using.
LLMs are amazing. In particular, LLMs are amazing databases: they contain much of
humanity's digitized knowledge. But if you use them naively, you will not reliably get
access to all of that knowledge. And the answer is not just prompt optimization. You
have to know what questions to ask, not just how to ask them. You then have to
synthesize the pieces of information they reveal, piece by piece. The information is in
there, but it's in fragments that must be put together to reveal the complete answer,
whether that's a fact, an algorithm, or an insight. We are building the intelligence on
top of LLMs to extract their hidden information.
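To make the fragment-assembly idea concrete, here is a minimal sketch of asking a model several sub-questions and then synthesizing the pieces into one answer. The `ask` callable, the sub-question list, and the synthesis prompt are all hypothetical stand-ins for illustration, not Poetiq's implementation.

```python
from typing import Callable, List

def synthesize(question: str, ask: Callable[[str], str],
               sub_questions: List[str]) -> str:
    """Query a model piece by piece, then combine the fragments.

    `ask` stands in for any LLM call (e.g. a chat-completion wrapper).
    This decomposition is an illustrative sketch only.
    """
    # Pull out one fragment of knowledge per sub-question.
    fragments = [ask(q) for q in sub_questions]
    combined = "\n".join(f"- {q}: {a}" for q, a in zip(sub_questions, fragments))
    # A final call asks the model to assemble the fragments into a complete answer.
    return ask(
        f"Question: {question}\n"
        f"Known fragments:\n{combined}\n"
        "Combine these fragments into a complete answer."
    )
```

In practice, knowing *which* sub-questions to ask is the hard part, which is exactly the point made above.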
Self-improvement is how we do it quickly. The more problems we tackle, the better our
system gets at tackling the next one. Our systems are continually improving. They adapt
to the underlying models' information-storage mechanisms (e.g., their quirks) and
figure out how best to extract and synthesize the information each contains. We do not
build our intelligence into the LLMs; rather, we build a complete intelligent
ecosystem around them that harnesses what the LLMs contain for better reasoning and
problem solving. And, soon, we will be doing it automatically.
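As an illustration only, one way such a loop can be sketched is simple hill-climbing over candidate reasoning strategies, each scored on a small evaluation set (the "hundreds of data points" mentioned earlier). The `mutate` and `score` callables below are hypothetical placeholders, not our actual mechanism.

```python
from typing import Callable, Tuple

def improve(strategy: str,
            mutate: Callable[[str], str],
            score: Callable[[str], float],
            steps: int = 10) -> Tuple[str, float]:
    """Hill-climb over strategies: propose a variant, keep it only if it
    scores better on a small evaluation set.

    `mutate` stands in for strategy generation and `score` for evaluation
    on a few hundred examples; this is an illustrative sketch, not
    Poetiq's algorithm.
    """
    best, best_score = strategy, score(strategy)
    for _ in range(steps):
        candidate = mutate(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

No model weights are touched in a loop like this; only the strategy changes, which is why each iteration can be fast and cheap compared to RL post-training.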