Recently, a prominent Silicon Valley drama took place: OpenAI's CEO, Sam Altman, was fired by the board and then rehired after pressure from Microsoft and OpenAI employees. Employees allegedly threatened to leave the company if Altman was not reinstated, and Microsoft helped manage the crisis and return Altman to the CEO role. I won't go into the details of the drama, but the summary card below covers my analysis of the saga.
As this unfolded on Twitter, gossip emerged that a specific OpenAI development had concerned the board. They allegedly believed Altman needed to be more truthful about the state of progress toward AGI (artificial general intelligence) within the company. This led to speculation and conspiracy theories on Twitter, as often happens with high-profile industry drama.
One theory pointed to OpenAI's advancements with an algorithm called Q*. Some suggested Q* allowed internal LLMs (large language models) to perform basic math, seemingly bringing OpenAI closer to more advanced AI. In this post, I'll explain what Q* is and why its advancements could theoretically bring AI systems closer to goals like AGI.
What is Q*?
In simple terms, Q* is like a GPS that learns over time. Usually, when there's traffic or an accident, your GPS doesn't know about it and sends you down the usual route, where you get stuck. Then you wait while it recalculates an entirely new path. What if your GPS remembered problems and closures, so that next time it already knew the alternate routes? That's what Q* does.
Whenever Q* searches for solutions, like alternate directions, it remembers what it tried before. This guides future searches. So if something changes along a route, Q* doesn't restart like a GPS recalculating. It knows most of the road and can focus only on adjusting the tricky, different parts.
This reuse lets Q* find answers faster than restarting from scratch every time. It "learns" from experience, the way you learn the back roads around town. The more Q* is used, the better it adapts to the changes that are typical for an area.
Here is a more technical explanation:
Q*, as described here, is a search and pathfinding algorithm that extends the A* search algorithm. It improves on A* by reusing previous search effort even as the environment changes, which makes it efficient for searches in dynamic environments. Like A*, Q* uses a heuristic function to guide its search toward the goal: each candidate state is ranked by the cost already spent to reach it plus the heuristic's estimate of the cost remaining. This balances exploiting promising areas (the heuristic) with exploring new areas (without a heuristic, the search would spread out evenly, much like breadth-first search). Q* draws on previous searches to build a reusable graph or tree of the states it has already surveyed.
Reusing this structure significantly speeds up future searches compared with starting fresh each time. As the environment changes, Q* updates the structure to reflect the changes rather than discarding it.
That way, it can reuse the parts that are still valid and re-search only the affected areas. Incremental search of this kind is used in areas such as robot path planning, manufacturing, and video games, where environments change frequently. It allows agents to replan paths efficiently as needed.
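To make the A* baseline concrete, here is a minimal Python sketch of heuristic-guided search on a small grid. It is plain A*, not Q* itself, and the grid layout, the function names `astar` and `manhattan`, and the unit step cost are all assumptions chosen for illustration. Each candidate cell is ranked by the cost so far plus a Manhattan-distance estimate of the cost remaining.

```python
import heapq


def manhattan(a, b):
    """Heuristic: grid distance between two (row, col) cells, ignoring walls."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def astar(grid, start, goal):
    """Return a shortest path as a list of (row, col) cells, or None.

    grid is a list of lists where 0 means free and 1 means blocked.
    Movement is 4-connected and every step costs 1 (illustrative assumptions).
    """
    rows, cols = len(grid), len(grid[0])
    # Heap entries are (f, g, cell) with f = g + heuristic.
    open_heap = [(manhattan(start, goal), 0, start)]
    came_from = {start: None}
    best_g = {start: 0}

    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > best_g[cell]:
            continue  # stale heap entry; a cheaper route to this cell was found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < best_g.get(nxt, float("inf")):
                    best_g[nxt] = new_g
                    came_from[nxt] = cell
                    f = new_g + manhattan(nxt, goal)
                    heapq.heappush(open_heap, (f, new_g, nxt))
    return None


# Example: the only way from the top row to the bottom row is through (1, 2).
grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
print(astar(grid, (0, 0), (2, 0)))
```

Note that this version throws away everything it learned once it returns. If the grid changes, calling `astar` again starts from zero, which is exactly the cost that the reuse described above is meant to avoid.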
In summary, by reusing experience, Q* efficiently finds solutions in systems where the state space and operators change over time. It can discover solutions much faster than restarting the search from scratch.
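The reuse idea can be sketched in a few lines of Python as well. This is a deliberately simplified illustration, not Q* itself and not anything OpenAI is known to use: when a cell on a previously found path becomes blocked, it keeps the part of the old path that is still valid and searches only from the point where the path breaks. The helper names `bfs_path` and `repair_path` are hypothetical, breadth-first search stands in for whatever search routine is actually used, and the repaired path is valid but not guaranteed to be the shortest one.

```python
from collections import deque


def bfs_path(grid, start, goal):
    """Breadth-first shortest path on a 4-connected grid (0 = free, 1 = blocked)."""
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and nxt not in came_from):
                came_from[nxt] = cell
                queue.append(nxt)
    return None


def repair_path(grid, old_path, goal):
    """Reuse the still-valid prefix of old_path; search only from where it breaks."""
    for i, (r, c) in enumerate(old_path):
        if grid[r][c] == 1:
            break
    else:
        return old_path  # nothing on the old path is blocked, reuse it as-is
    if i == 0:
        return None  # the start cell itself is blocked

    prefix = old_path[:i]  # cells before the first blocked one
    tail = bfs_path(grid, prefix[-1], goal)
    if tail is None:
        return None  # the goal is no longer reachable

    # The detour may walk back along the prefix. Splice at the last cell the
    # detour shares with the prefix so the result has no repeated cells.
    index_in_prefix = {cell: j for j, cell in enumerate(prefix)}
    for k in range(len(tail) - 1, -1, -1):
        if tail[k] in index_in_prefix:
            return prefix[:index_in_prefix[tail[k]] + 1] + tail[k + 1:]


# Example: a path along the top row, then one of its cells gets blocked.
grid = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
old_path = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
grid[1][2] = 1  # a "road closure" on the old route
print(repair_path(grid, old_path, (2, 2)))
```

The design choice here is the trade-off the text describes: the repair touches only the region around the change instead of searching the whole grid again, at the price of possibly returning a longer route than a full search would find.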
So, in the context of the rumors about OpenAI, some hypothesize that advances leveraging Q* search techniques could allow AI and machine learning models to more rapidly explore complex spaces like mathematics. Rather than re-exploring basic rules from scratch, models might leverage prior search "experience" and heuristics to guide discovery. This could unlock new abilities and general skills.
However, whether OpenAI has made such advances leveraging Q* or algorithms like it is speculative. The details are vague, and rumors should be critically examined before conclusions are drawn. But Q* illustrates interesting AI capabilities applicable in various domains. And it hints at future systems that may learn and adapt more and more like humans.