Google DeepMind Plans to Track AGI Progress With These 10 Traits of General Intelligence

Few terms are as closely associated with AI hype as artificial general intelligence, or AGI. But Google DeepMind researchers have now proposed a framework that could more concretely measure how close models are to this tech industry holy grail.

Artificial general intelligence refers to a mythical AI system that can match the general and highly adaptable form of intelligence found in humans. As the number of tasks that large language models can tackle has rocketed in recent years, there’s been a growing chorus of voices suggesting the technology is creeping ever closer to this threshold.

But so far, there’s been no clear way to assess progress toward AGI, leaving plenty of room for speculation and exaggeration. To address this gap, a team from Google DeepMind has introduced a new cognitively inspired framework that deconstructs general intelligence into 10 key faculties. More importantly, they propose a way to evaluate AI systems across these key capabilities and compare their performance to humans.

“Despite widespread discussion of AGI, there is no clear framework for measuring progress toward it. This ambiguity fuels subjective claims, makes it difficult to track progress, and risks hindering responsible governance,” the researchers write in a paper outlining their new approach. “We hope this framework will provide a practical roadmap and an initial step toward more rigorous, empirical evaluation of AGI.”

This isn’t DeepMind’s first attempt to clarify the term. In 2023, the company proposed separating AI systems into different levels of capability, in much the same way self-driving systems are categorized.

But the approach didn’t really propose a way to measure what level AI systems have reached. The new framework goes further by building a firmer conceptual footing for the key aspects underpinning model performance and a practical way to evaluate and compare systems.

Digging through decades of research in psychology, neuroscience, and cognitive science, the researchers identify eight basic cognitive building blocks that they say make up general intelligence.

These include the perception of sensory inputs and generation of outputs like text, speech, or actions. Add to those learning, memory, reasoning, and the ability to focus attention on specific information or tasks. Rounding out the list are metacognition—or the ability to reason about and control your own mental processes—and so-called executive functions, like planning and the inhibition of impulses.

The researchers also outline two “composite faculties” that require several building blocks to be applied together. These are problem solving and social cognition, which refers to the ability to understand and react appropriately to the social context.

To judge how well AI systems perform on each measure, the researchers suggest subjecting them to a broad suite of cognitive evaluations that target each specific ability. They also propose collecting human baselines for each task. This would involve asking a demographically representative sample of adults with at least a high school education to complete them under identical conditions.

The results of these tests can then be combined to create “cognitive profiles” that give a sense of a model’s strengths and weaknesses. And by comparing the results against the human baselines, it should be possible to determine when a system matches or surpasses the general intelligence of an average person.

Crucially, the framework focuses on what a system can do rather than how it does it, which means the evaluation is agnostic about the underlying technology. However, the researchers concede that there is currently no good way to measure many of the core cognitive capabilities identified.

While there are already well-established benchmarks for faculties like problem solving and perception, there are no reliable tests for things like metacognition, attention, learning, and social cognition. In addition, many of the best benchmarks are public, which means the testing criteria are easily accessible and may have already been included in model training data. So the authors say they’re working with academics to build more robust, non-public evaluations to fill the gaps.

How useful the new framework will be depends on several factors. First, it remains to be seen whether the criteria identified by the DeepMind team truly capture the essence of human general intelligence. Second, they need to prove that acing this test actually leads to better performance on practical problems compared to narrower, specialist AI systems.

But considering the hand-waving nature of the debate around AGI so far, any framework grounded in well-established cognitive theory and rigorous evaluation represents a significant step forward.

Source link