Artificial intelligence (AI) has seen tremendous progress recently, with systems like ChatGPT demonstrating impressive language abilities. However, current AI still falls short of human-level intelligence in many ways. So how close are we to developing accurate artificial general intelligence (AGI) - AI that can perform any intellectual task a human can?
A new paper from researchers at Google DeepMind proposes a framework for classifying and benchmarking progress towards AGI. The core idea is to evaluate AI systems based on their performance across diverse tasks, not just narrow capabilities like conversing or writing code. This allows us to understand how general vs specialized current AI systems are and track advancements in generality over time.
Why do we need a framework for thinking about AGI? Firstly, "AI" has become an overloaded term, often synonymously with "AGI," when systems are still far from human-level abilities. A clear framework helps set realistic expectations. Secondly, shared definitions enable the AI community to align on goals, measure progress, and identify risks at each stage. Lastly, policymakers need actionable advice on regulating AI; a nuanced, staged understanding of AGI is more valuable than considering it a single endpoint.
Levels of AGI
The paper introduces "Levels of AGI" - a scale for classifying AI based on performance across various tasks. The levels range from 0 (Narrow non-AI) to 5 (Artificial Superintelligence exceeding all human abilities).
Within each level, systems can be categorized as either Narrow AI (specialized for a specific task) or General AI (able to perform well across many tasks). For instance, ChatGPT would be considered a Level 1 General AI ("Emerging AGI") - it can converse about many topics but makes frequent mistakes. Google's AlphaFold protein folding system is Level 5 Narrow AI ("Superhuman Narrow AI") - it far exceeds human abilities on its specialized task.
Higher levels correspond to increasing depth (performance quality) and breadth (generality) of capabilities. The authors emphasize that progress may be uneven - systems may "leapfrog" to higher generality before reaching peak performance. But both dimensions are needed to achieve more advanced AGI.
Principles for Defining AGI
In developing their framework for levels of AGI, the researchers identified six fundamental principles for defining artificial general intelligence in a robust, measurable way:
- AGI should be evaluated based on system capabilities rather than internal mechanisms.
- Both performance and generality must be separately measured, with performance indicating how well an AI accomplishes tasks and generality indicating the breadth of tasks it can handle.
- The focus should be on assessing cognitive abilities like reasoning rather than physical skills.
- An AI's capabilities should be evaluated based on its potential rather than deployment status.
- Benchmarking should utilize ecologically valid real-world tasks that reflect skills people authentically value rather than convenient proxy tasks.
- AGI should be thought of in terms of progressive levels rather than as a single endpoint to better track advancement and associated risks.
By following these principles, the levels of AGI aim to provide a definition and measurement framework to enable calibrated progress in developing beneficial AI systems.
Testing AGI Capabilities
The paper argues that shared benchmarks are needed to objectively evaluate where AI systems fall on the levels of AGI. These benchmarks should meet the above principles - assessing performance on a wide range of real-world cognitive tasks humans care about.
Rather than a static set of tests, the authors propose a "living benchmark" that grows over time as humans identify new ways to demonstrate intelligence. Even complicated open-ended tasks like understanding a movie or novel should be included alongside more constrained tests. Such an AGI benchmark does not yet exist. However, developing it is an essential challenge for the community. With testing methodology aligned around the levels of AGI, we can build systems with transparent, measurable progress toward human abilities.
Responsible AGI Development
The paper also relates AGI capabilities to considerations of risk and autonomy. More advanced AI systems may unlock new abilities like fully independent operation. However, increased autonomy does not have to follow automatically from greater intelligence. Thoughtfully chosen human-AI interaction modes can allow society to benefit from powerful AI while maintaining meaningful oversight. As capabilities grow, designers of AGI systems should carefully consider which tasks and decisions we choose to delegate vs monitor. Striking the right balance will ensure AI aligns with human values as progress continues.
Overall, the levels of AGI give researchers, companies, policymakers, and the broader public a framework for understanding and shaping the responsible development of intelligent machines. Benchmarking methodologies still need substantial work - but the path forward is more precise thanks to these guidelines for thinking about artificial general intelligence.
Top Facts from the Paper
- Current AI systems have some narrow abilities resembling AGI but limited performance and generality compared to humans. ChatGPT is estimated to be a Level 1 "Emerging AGI."
- Performance and generality (variety of tasks handled) are critical for evaluating progress.
- Shared benchmarks are needed to objectively measure AI against the levels based on a diverse range of real-world cognitive tasks.
- Increased autonomy should not be an automatic byproduct of intelligence - responsible development involves carefully considering human oversight.
The levels of AGI give us a framework to orient AI progress towards beneficial ends, not just technological milestones. Understanding current systems' capabilities and limitations provides the clarity needed to assess risks, set policies, and guide research positively. A standardized methodology for testing general intelligence remains an open grand challenge.
But initiatives like Anthropic's AI Safety technique, and this AGI roadmap from DeepMind researchers represent an encouraging step toward beneficial artificial intelligence.
Q: What are the levels of AGI?
The levels of AGI are a proposed framework for classifying AI systems based on their performance across a wide range of tasks. The levels range from 0 (Narrow Non-AI) to 5 (Artificial Superintelligence), with increasing capability in both depth (performance quality) and breadth (generality across tasks).
Q: Why do we need a framework like levels of AGI?
A framework helps set expectations on AI progress, enables benchmarking and progress tracking, identifies risks at each level, and advises policymakers on regulating AI. Shared definitions allow coordination.
Q: How are performance and generality evaluated at the levels?
Performance refers to how well an AI system can execute specific tasks compared to humans. Generality refers to the variety of different tasks the system can handle. Both are central dimensions for AGI.
Q: What's the difference between narrow AI and general AI?
Narrow AI specializes in particular tasks, while general AI can perform well across various tasks. Each level includes both limited and available categories.
Q: What are some examples of different AGI levels?
ChatGPT is currently estimated as a Level 1 "Emerging AGI." Google's AlphaFold is Level 5 "Superhuman Narrow AI" for protein folding. There are yet to be examples of Level 3 or 4 General AI.
Q: How will testing determine an AI's level?
Shared benchmarks that measure performance on diverse real-world cognitive tasks are needed. This "living benchmark" will grow as new tests are added.
Q: What principles guided the levels of AGI design?
Fundamental principles include:
- Focusing on capabilities over mechanisms.
- Separating the evaluation of performance and generality.
- Prioritizing cognitive over physical tasks.
- Analyzing potential rather than deployment.
- Using ecologically valid real-world tests.
Q: How do the levels relate to autonomous systems?
Higher levels unlock greater autonomy, but it does not have to follow automatically. Responsible development involves carefully considering human oversight for AGI.
Q: How can the levels help with safe AGI development?
The levels allow for identifying risks and needed policies at each stage. Progress can be oriented towards beneficial ends by tracking capabilities, limitations, and risks.
Q: Are there any AGI benchmarks available yet?
There has yet to be an established benchmark, but developing standardized tests aligned with the levels of AGI capabilities is a significant challenge and opportunity for the AI community.
- AGI - Artificial General Intelligence
- Benchmark - Standardized tests to measure and compare the performance of AI systems
- Cognitive - Relating to perception, reasoning, knowledge, and intelligence
- Ecological validity - How well a test matches real-world conditions and requirements
- Generality - The ability of an AI system to handle a wide variety of tasks
- Human-AI interaction - How humans and AI systems communicate and collaborate
- Performance - Quality with which an AI system can execute a particular task