Formal classification

AI Agent Taxonomy

Last updated: June 1, 2026

Every definition article defines "an AI agent" as one thing. This page maps the actual diversity of agent types — ten distinct categories, each with a formal definition, the paper that named it, historical examples from the field's 70-year history, and the current products that fall within it.

The word "agent" covers a thermostat (simple reflex), a chess engine (model-based reflex), a BDI-programmed industrial system (deliberative), AlphaGo (learning), AutoGPT (tool-use LLM), Anthropic's computer use (computer-use), Boston Dynamics Spot (embodied), and the Sakana AI Scientist (autonomous research). These are not the same thing. Using one word for all of them — as the current industry does — produces definitional confusion at scale, which is exactly what agent washing exploits.

This taxonomy uses two organizing frameworks: the classical framework from Russell and Norvig (1995), which classifies agents by their internal architecture (reflex → model → goal → utility → learning), and the modern framework that distinguishes agents by their operational context (tool-use LLM, computer-use, embodied, research). The two frameworks are complementary: the classical framework describes how an agent reasons; the modern framework describes where and how it acts.

Editorial source note: Taxonomy claims are checked against named papers, dated systems, and cross-page evidence in the Primary Sources Library, AI agent history timeline, and editorial methodology. When a modern product spans multiple categories, this page names the primary type first and explains secondary types instead of treating marketing language as the classification.

Ten agent types

Evidence Standard for This Taxonomy — how categories are sourced and reviewed
Reactive Agents — perceive and act with no world model
Model-Based Reflex Agents — perceive, track state, act
Deliberative Agents — symbolic world model + planning
BDI Agents — beliefs, desires, intentions
Learning Agents — improve through experience
Multi-Agent Systems — coordination among multiple agents
Tool-Use LLM Agents — the modern agentic AI paradigm
Computer-Use Agents — operating software interfaces directly
Embodied Agents — acting in the physical world
Autonomous Research Agents — full scientific lifecycle automation


Evidence Standard for This Taxonomy

How This Taxonomy Is Sourced

Classical Literature and Modern Agent Evidence

Each category is anchored to either a formal research lineage or a documented product capability. Classical categories rely on Russell and Norvig, Brooks, Wooldridge and Jennings, Rao and Georgeff, Watkins, Sutton and Barto, and the multi-agent systems literature. Modern categories rely on primary papers and releases for ReAct, Toolformer, SWE-bench, computer use, embodied AI, and autonomous research agents.

How Categories Are Assigned

Primary Type, Secondary Types, and Boundary Cases

The taxonomy assigns a primary type based on the feature that most defines the system's behavior. Secondary types are listed when a product spans categories: Claude with tools is primarily a tool-use LLM agent, Anthropic Computer Use is primarily a computer-use agent, Waymo is primarily an embodied agent, and Sakana AI Scientist is primarily an autonomous research agent. Borderline cases are treated explicitly because many current systems combine old architectures with modern interfaces.

How Product Placement Is Reviewed

Agent Washing and Category Discipline

Product placement is not copied from vendor marketing. A chatbot is not treated as an agent unless it can pursue a goal across steps, use tools or actions, maintain task state, or operate in an environment. This is the same category discipline used on the agent washing terminology entry: the label must describe observable behavior, not only the way a product is sold.

Update and Correction Policy

Material Taxonomy Changes

Material changes to category definitions, product placement, or source interpretation should be visible on the page through updated text, corrected dates, or revised source notes. Corrections and primary-source additions can be sent to curator@agentichistory.org.

Classical AI Agent Types

Russell & Norvig 1995 framework: internal agent architectures from reactive rules through learning systems.

Classical — Type I

1. Reactive Agents

Definition

A reactive agent maps directly from perceived state to action, with no internal memory, no world model, and no deliberation. Its behavior is entirely determined by condition-action rules operating on current percepts. The type corresponds to Russell & Norvig's (1995) "simple reflex agent" (Chapter 2); its best-known robotic realization is Brooks's (1986) subsumption architecture.

Reactive agents are the simplest agent type: sense the environment, fire a rule, take action. They cannot reason about what they haven't observed, cannot plan ahead, and cannot improve over time. What they can do — and do extremely well — is respond to environmental stimuli in real time without the computational overhead of deliberation.

Rodney Brooks at MIT developed the defining implementation: the subsumption architecture, introduced in his 1986 paper "A Robust Layered Control System for a Mobile Robot" (IEEE Journal of Robotics and Automation, 2(1), DOI: 10.1109/JRA.1986.1087032). Brooks argued that intelligence emerges from direct coupling of perception and action in the physical world — without explicit internal representations, world models, or planning. The agent is defined by layered behaviors (avoid obstacles → wander → follow a person), where higher layers can suppress lower ones. Brooks's robots — Herbert, Allen, Squirt — were physically embodied reactive agents.

Russell and Norvig formalize the type as the "simple reflex agent": "if condition → then action." The thermostat is the canonical example: temperature below threshold → activate heat. There is no concept of "last temperature" or "target by 6pm" — only the current percept and its associated action.
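The condition-action structure fits in a few lines of Python. This is a minimal illustration, assuming a single temperature percept in degrees Celsius and a hypothetical 19 °C threshold; the rule list and function names are invented for the example. The agent keeps no state between calls, so the same reading always produces the same action.

```python
from typing import Callable

# A condition-action rule pairs a test on the current percept with an action.
Rule = tuple[Callable[[float], bool], str]

# Hypothetical thermostat rules; 19.0 °C is an illustrative threshold.
RULES: list[Rule] = [
    (lambda temp_c: temp_c < 19.0, "heat_on"),
    (lambda temp_c: temp_c >= 19.0, "heat_off"),
]

def simple_reflex_agent(temp_c: float) -> str:
    """Return the action of the first rule whose condition matches the percept.

    No memory is kept between calls, so the output depends only on temp_c.
    """
    for condition, action in RULES:
        if condition(temp_c):
            return action
    return "no_op"  # unreachable with the rules above, kept as a safe default

for reading in (17.5, 19.0, 22.3, 18.9):
    print(f"{reading:.1f} -> {simple_reflex_agent(reading)}")
```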

Historical Examples

Thermostat

Pre-AI

The textbook canonical reactive agent. Temperature below threshold → heat. No memory, no model.

Brooks's Herbert robot

1988, MIT

Collected soda cans reactively; no world model; perception directly coupled to action via subsumption layers.

Squirt robot

1990, MIT

A light-shy creature that hides in the dark — six behaviors, no world model, 100% reactive.

Early video game enemies (Pac-Man ghosts)

1980, Namco

Each ghost follows a fixed condition-action rule in pursuit mode; no world model, no planning.

Rule-based spam filters

1990s

If contains("Nigerian prince") → spam. Pure reactive classification without learning or planning.

Modern Examples and Boundary Cases

Pure reactive agents are rare in modern commercial AI — they have been superseded by model-based and learning approaches. However, the reactive pattern persists inside larger systems: the collision-avoidance layer in an autonomous vehicle is reactive even when the overall vehicle system is deliberative. Many "AI agents" marketed in 2025 that simply trigger workflows based on keyword detection or webhook events are functionally reactive, even when built on LLMs.

Primary Sources

Primary sources: Brooks, R. A. (1986). "A Robust Layered Control System for a Mobile Robot." IEEE Journal of Robotics and Automation 2(1), 14–23. DOI: 10.1109/JRA.1986.1087032. Russell, S. & Norvig, P. (1995). AI: A Modern Approach, Ch. 2 ("Simple Reflex Agents"). Grokipedia, "Subsumption architecture," 2026 (citing Brooks 1986 as the canonical reactive architecture).

Classical — Type II

2. Model-Based Reflex Agents

Definition

A model-based reflex agent maintains an internal state that tracks aspects of the world not currently observable, using both the current percept and this internal state to select actions. The internal state is updated based on a model of how the world works. Russell & Norvig (1995), Chapter 2 ("Model-Based Reflex Agents").

The model-based reflex agent extends the reactive agent with memory. Where a reactive agent sees only the current state, a model-based agent remembers relevant history and maintains a representation of the world sufficient to make better decisions than current perception alone would allow. The agent updates its internal state from (a) its previous internal state and the current percept, (b) the action it just took, and (c) a model of how the world evolves and how its actions affect it.

A standard illustration builds on Russell and Norvig's vacuum-cleaner world: a vacuum that keeps a map of the rooms it has already cleaned chooses its next action from where it currently is and which rooms it has visited, not just from its current percept. This is structurally distinct from a reactive agent, which would re-clean the same room if randomly placed there again.
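A minimal sketch of that vacuum, assuming a hypothetical three-room corridor and a percept of (current room, is it dirty?). The `cleaned` set is the internal state: it lets the agent head for rooms it has not yet confirmed clean and stop once every room is done, which a purely reactive vacuum has no way of knowing. Room names, actions, and the nearest-room tie-break are illustrative choices.

```python
from dataclasses import dataclass, field

# Hypothetical layout: three rooms in a corridor, A - B - C.
ROOMS = ["A", "B", "C"]

@dataclass
class ModelBasedVacuum:
    # Internal state: rooms the agent believes are clean. The current percept
    # alone cannot tell it this about rooms it is not standing in.
    cleaned: set[str] = field(default_factory=set)

    def act(self, location: str, dirty: bool) -> str:
        if dirty:
            # Model of the action's effect: sucking leaves this room clean.
            self.cleaned.add(location)
            return "suck"
        self.cleaned.add(location)  # perceived clean, so record it
        remaining = [r for r in ROOMS if r not in self.cleaned]
        if not remaining:
            return "stop"  # only possible because of the remembered state
        here = ROOMS.index(location)
        # Move toward the nearest room not yet known to be clean.
        target = min(remaining, key=lambda r: abs(ROOMS.index(r) - here))
        return "right" if ROOMS.index(target) > here else "left"

def run(start: str, dirt: dict[str, bool]) -> list[str]:
    """Simulate the environment until the agent decides to stop."""
    agent, location, actions = ModelBasedVacuum(), start, []
    while True:
        action = agent.act(location, dirt[location])
        actions.append(action)
        if action == "suck":
            dirt[location] = False
        elif action in ("left", "right"):
            step = 1 if action == "right" else -1
            location = ROOMS[ROOMS.index(location) + step]
        else:
            return actions

print(run("B", {"A": True, "B": True, "C": False}))
```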

Historical Examples

Roomba (980 and later)

2015+, iRobot

Maintains a map of the cleaned area to avoid re-cleaning and to ensure full coverage: model-based reflex at its simplest. Earlier Roombas navigated with bump-and-pattern behaviors and no persistent map.

Chess engines (classical)

1950s–1990s

Maintain a complete board state model; evaluate positions using heuristic functions rather than learned policies.

STRIPS-based planners

1971, Fikes & Nilsson

Maintain a world state model; take actions that change that model toward a goal state. One of the first formal models of states and actions; also listed under deliberative agents below.

GPS (General Problem Solver)

1957, Newell & Simon

Maintained a means-ends representation of the gap between current and goal states — the earliest model-based problem-solving agent.

Primary Sources

Primary sources: Russell, S. & Norvig, P. (1995). AI: A Modern Approach, Ch. 2 ("Model-Based Reflex Agents"). Fikes, R. E. & Nilsson, N. J. (1971). "STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving." Artificial Intelligence 2(3–4). Newell, A. & Simon, H. A. (1963). "GPS, a Program That Simulates Human Thought." In E. Feigenbaum & J. Feldman (Eds.), Computers and Thought.

Classical — Type III

3. Deliberative Agents

Definition

A deliberative agent is "one that contains an explicitly represented, symbolic model of the world, and in which decisions (for example about what actions to perform) are made via symbolic reasoning": that is, logical or pseudo-logical reasoning based on pattern matching and symbolic manipulation. Wooldridge, M. J., & Jennings, N. R. (1995). "Intelligent Agents: Theory and Practice." The Knowledge Engineering Review, 10(2), 115–152. The term "deliberate agent" was introduced by Genesereth & Nilsson (1987).

The deliberative agent adds to the model-based reflex agent the capacity for explicit reasoning about that model — using symbolic logic, planning algorithms, or inference to decide what to do. Where a model-based reflex agent selects actions by matching internal state to condition-action rules, a deliberative agent can reason across chains of hypothetical actions, evaluate their consequences, and select a plan.
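Reasoning across chains of hypothetical actions can be sketched with STRIPS-style operators (precondition, add, and delete lists) and a search over the states they produce. This is an illustration under stated assumptions: the two-room robot-and-box domain is hypothetical, and breadth-first search is swapped in for brevity. The original STRIPS used a GPS-style means-ends strategy.

```python
from collections import deque
from typing import NamedTuple, Optional

class Operator(NamedTuple):
    name: str
    pre: frozenset      # facts that must hold before the action
    add: frozenset      # facts the action makes true
    delete: frozenset   # facts the action makes false

def apply(state: frozenset, op: Operator) -> frozenset:
    # STRIPS-style state update: remove the delete list, then add the add list.
    return (state - op.delete) | op.add

def plan(start: frozenset, goal: frozenset, ops: list) -> Optional[list]:
    """Breadth-first search over hypothetical action sequences."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:              # every goal fact holds
            return steps
        for op in ops:
            if op.pre <= state:        # preconditions satisfied
                nxt = apply(state, op)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [op.name]))
    return None                        # goal unreachable

def make_ops(rooms: list) -> list:
    """Hypothetical Shakey-like domain: a robot that can walk or push a box."""
    ops = []
    for a in rooms:
        for b in rooms:
            if a == b:
                continue
            ops.append(Operator(f"go({a},{b})",
                                frozenset({f"robot_at_{a}"}),
                                frozenset({f"robot_at_{b}"}),
                                frozenset({f"robot_at_{a}"})))
            ops.append(Operator(f"push_box({a},{b})",
                                frozenset({f"robot_at_{a}", f"box_at_{a}"}),
                                frozenset({f"robot_at_{b}", f"box_at_{b}"}),
                                frozenset({f"robot_at_{a}", f"box_at_{a}"})))
    return ops

start = frozenset({"robot_at_room2", "box_at_room1"})
print(plan(start, frozenset({"box_at_room2"}), make_ops(["room1", "room2"])))
```

The planner never moves the robot while searching: it evaluates consequences in its symbolic model and only returns a committed sequence, which is the distinction from rule-matching reflex agents.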

The deliberative approach faces two classic problems identified by Wooldridge and Jennings: the transduction problem (how do you translate the messy real world into a precise symbolic description, fast enough for the description to be useful?) and the representation/reasoning problem (how do you represent complex real-world knowledge symbolically, and reason with it in real time?). These are the same problems that limited classical AI — what critics called GOFAI (Good Old-Fashioned Artificial Intelligence) — through the 1980s and 1990s.

Most BDI agents (Type 4 below) are deliberative agents, but not all deliberative agents are BDI agents. STRIPS-based planners, for instance, are deliberative without being BDI. The distinction is whether the deliberation is organized around a belief-desire-intention mental model.

Historical Examples

STRIPS planner

1971, Fikes & Nilsson

The original deliberative agent: represents world states symbolically, reasons from current state to goal state via action sequences.

Shakey the Robot

1966–1972, SRI

The first mobile robot that used STRIPS planning to reason about actions. Deliberative agent with physical embodiment.

Early autonomous vehicle planners

1985–2000s

ALV (DARPA Autonomous Land Vehicle) used symbolic map representations and deliberative path planning.

Soar cognitive architecture

1987, Laird, Newell, Rosenbloom

A unified theory of cognition implemented as a deliberative agent; still used in military simulation and game AI.

Primary Sources

Primary sources: Wooldridge, M. J., & Jennings, N. R. (1995). "Intelligent Agents: Theory and Practice." KER 10(2) (formal definition of deliberative agent). Genesereth, M. R. & Nilsson, N. J. (1987). Logical Foundations of Artificial Intelligence. Morgan Kaufmann (origin of the term "deliberate agent"). Fikes & Nilsson (1971). STRIPS. ScienceDirect Topics, "Reactive Agent" (citing Wooldridge 1995 definition).

Classical — Type IV

4. BDI Agents (Belief-Desire-Intention)

Definition

A BDI agent organizes its internal state around three mental attitudes: beliefs (the agent's model of the world), desires (goals the agent wants to achieve), and intentions (plans the agent has committed to pursuing). The agent selects actions by reasoning over its beliefs, filtering desires into achievable options, and committing to intentions — which it then maintains unless there is a compelling reason to reconsider. Rao, A. S. & Georgeff, M. P. (1991). "Modeling Rational Agents within a BDI Architecture." KR91. Derived from Bratman, M. E. (1987). Intention, Plans, and Practical Reason. Harvard University Press.

BDI agents are a specific subtype of deliberative agent, organized around the philosophical framework introduced by Michael Bratman in 1987. Bratman's key insight is that rational agents don't continuously reconsider their goals — they commit to plans (intentions) and follow through, reconsidering only when there is a clear reason to do so. This commitment is what enables action over extended time horizons without constant re-planning.

The computational operationalization came from Anand Rao and Michael Georgeff, who formalized BDI in modal logic (1991) and later described an abstract interpreter whose control loop runs: update beliefs from percepts → generate options (desires) → filter options using beliefs → form intentions → execute. The same architecture underlies the Procedural Reasoning System (PRS, 1987), which predates the formalization, and its successor dMARS; it was later formalized as the programming language AgentSpeak(L) by Rao (1996) and implemented in the Jason interpreter by Hübner and Bordini.
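The control loop can be sketched as a small Python class. This is a toy under explicit assumptions: the delivery-robot beliefs, the two-entry plan library, and the "first option wins" filter are all hypothetical simplifications, not PRS, dMARS, or Jason semantics. It does show Bratman-style commitment: once the agent intends to deliver, a low-battery belief does not preempt the plan, but the intention is dropped if its reason disappears.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical plan library: each goal maps to a fixed sequence of primitive steps.
PLAN_LIBRARY = {
    "recharge": ["go_to_dock", "charge"],
    "deliver": ["pick_up", "drive", "drop_off"],
}

@dataclass
class BDIAgent:
    beliefs: dict = field(default_factory=dict)
    intention: Optional[str] = None                 # goal currently committed to
    plan: list[str] = field(default_factory=list)   # remaining steps of that goal

    def options(self) -> list[str]:
        # Desires generated from current beliefs, in priority order.
        opts = []
        if self.beliefs.get("battery", 100) < 20:
            opts.append("recharge")
        if self.beliefs.get("packages", 0) > 0:
            opts.append("deliver")
        return opts

    def still_valid(self) -> bool:
        # Reconsider only when the committed goal no longer makes sense.
        if self.intention == "deliver":
            return self.beliefs.get("packages", 0) > 0
        return True

    def step(self, percept: dict) -> Optional[str]:
        self.beliefs.update(percept)                    # 1. update beliefs
        if self.intention and not self.still_valid():
            self.intention, self.plan = None, []        # drop a pointless intention
        if self.intention is None:
            opts = self.options()                       # 2. generate options
            if not opts:
                return None                             # nothing worth doing
            self.intention = opts[0]                    # 3. filter (simplified)
            self.plan = list(PLAN_LIBRARY[self.intention])  # 4. form intention
        action = self.plan.pop(0)                       # 5. execute one step
        if not self.plan:
            self.intention = None                       # intention achieved
        return action

agent = BDIAgent()
percepts = [{"battery": 80, "packages": 1}, {"battery": 15}, {},
            {"packages": 0}, {}, {"battery": 100}]
print([agent.step(p) for p in percepts])
```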

Historical Examples

PRS (Procedural Reasoning System)

1987, Georgeff & Lansky, SRI

The first implemented BDI system; used in NASA space shuttle fault diagnosis and air traffic control management.

dMARS

1996, Australian AI Institute

The successor to PRS; the most widely deployed BDI system in industrial settings through the 1990s and 2000s.

Jason / AgentSpeak(L)

2005–ongoing

Open-source BDI interpreter by Hübner & Bordini; still the primary BDI platform for academic multi-agent systems research.

Jack Agent Platform

1999, Agent Oriented Software

Commercial BDI agent platform; deployed in Australian defence force command systems, air traffic management, and financial applications.

BDI and LLM Agents: The Conceptual Parallel

Modern LLM-based agents implement a structurally similar architecture, often without explicit reference to the BDI literature: the system prompt functions as beliefs (world model and context), the user's objective functions as desire (goal), and the ReAct reasoning trace functions as intention (committed plan). The LLM's tendency to "commit" to a line of reasoning across a multi-step task mirrors Bratman's insight about intention-as-commitment. This parallel is noted in the Agentic History Terminology Archaeology and in the arXiv paper "Agentic AI and Multiagentic: Are We Reinventing the Wheel?" (arXiv:2506.01463, 2025).

Primary Sources

Primary sources: Bratman, M. E. (1987). Intention, Plans, and Practical Reason. Harvard University Press. Rao, A. S. & Georgeff, M. P. (1991). "Modeling Rational Agents within a BDI Architecture." KR91. Rao, A. S. (1996). "AgentSpeak(L): BDI Agents Speak Out in a Logical Computable Language." MAAMAW-96, LNAI 1038, Springer. Agentic History, Primary Sources Library — AgentSpeak(L).

Learning Agents — Type V

5. Learning Agents

Definition

A learning agent has four components: a learning element (makes improvements based on feedback), a performance element (selects actions based on current knowledge), a critic (tells the learning element how well the agent is doing relative to a performance standard), and a problem generator (suggests actions that will lead to new, informative experiences). Russell, S. & Norvig, P. (1995). AI: A Modern Approach, Chapter 2 ("Learning Agents"). Extended by the reinforcement learning literature: Sutton, R. S. & Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.

Learning agents improve their performance over time through experience. Unlike the types above — which operate with fixed rules, fixed models, or fixed architectures — a learning agent adapts based on what happens when it acts. The dominant implementation paradigm is reinforcement learning (RL): the agent receives reward signals from the environment and learns a policy that maximizes cumulative reward.
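The reward-driven loop can be sketched with tabular Q-learning (Watkins's algorithm, one of many RL methods), on a hypothetical five-state corridor where only reaching the right end pays a reward of 1. The hyperparameters are arbitrary illustrative values, and the mapping to Russell and Norvig's four components in the comments is a loose analogy rather than a formal correspondence.

```python
import random

N_STATES = 5            # states 0..4 on a line; state 4 is the rewarding goal
ACTIONS = (-1, +1)      # move left or right

def env_step(state: int, action: int) -> tuple[int, float, bool]:
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            if rng.random() < epsilon:
                # Problem-generator role: try a random, possibly informative action.
                action = rng.choice(ACTIONS)
            else:
                # Performance element: act greedily on current estimates.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = env_step(state, action)
            # Critic: compare the estimate with reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            # Learning element: move the estimate toward the target.
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if done:
                break
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # learned greedy action per non-goal state
```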

Deep reinforcement learning — combining deep neural networks with RL — produced the most dramatic AI capability demonstrations of the 2013–2022 era: DQN playing Atari (Mnih et al., 2013), AlphaGo defeating world champion Lee Sedol (Silver et al., 2016), AlphaZero mastering chess, Go, and shogi from scratch (Silver et al., 2017), and AlphaStar reaching Grandmaster level at StarCraft II (Vinyals et al., 2019). These are all learning agents.

Modern LLMs are also learning agents in a specific sense: many are fine-tuned with reinforcement learning from human feedback (RLHF, Christiano et al. 2017) or Constitutional AI (Bai et al. 2022). The "learning" in this case happens during training, not at runtime, which distinguishes them from agents that keep updating their policy online during deployment.

Historical Examples

TD-Gammon

1992, Tesauro, IBM

First RL agent to reach expert human level in a complex game (backgammon) through self-play. Trained via temporal difference learning.

DQN (Atari)

2013, Mnih et al., DeepMind

First deep RL agent to learn from raw pixels. Outperformed all previous approaches on 6 of the 7 Atari games tested and surpassed a human expert on 3 of them.

AlphaGo

2016, Silver et al., DeepMind

Defeated European champion 5-0 and world champion Lee Sedol 4-1 using deep RL and Monte Carlo tree search.

AlphaZero

2017, Silver et al., DeepMind

Mastered chess, Go, and shogi from random play in hours using only self-play RL, with no human game knowledge.

OpenAI Five

2018–2019, OpenAI

5-agent team learned Dota 2 through 45,000 years of self-play; defeated world champion team OG 2-0 in April 2019.

AlphaFold 2

2020, Jumper et al., DeepMind

Predicted protein structures at near-experimental accuracy at CASP14; recognized in the 2024 Nobel Prize in Chemistry. Combines deep learning with evolutionary data rather than RL but fits the learning-agent paradigm of discovering solutions through training.

Primary Sources

Primary sources: Sutton, R. S. & Barto, A. G. (1998/2018). Reinforcement Learning: An Introduction (free online 2nd ed. at incompleteideas.net). Mnih, V. et al. (2013). "Playing Atari with Deep Reinforcement Learning." arXiv:1312.5602. Silver, D. et al. (2016). "Mastering the game of Go with deep neural networks and tree search." Nature 529, 484–489. DOI:10.1038/nature16961. Christiano, P. et al. (2017). "Deep Reinforcement Learning from Human Preferences." NeurIPS 2017. arXiv:1706.03741.

Multi-Agent Coordination

Systems where the organizing question is how multiple agents allocate tasks, communicate, and coordinate.

Multi-Agent Systems — Type VI

6. Multi-Agent Systems (MAS)

Definition

A multi-agent system is a system composed of multiple interacting agents. Each agent has its own goals, beliefs, and capabilities; the system's behavior emerges from the interactions — cooperative, competitive, or both — among agents. Wooldridge, M. J., & Jennings, N. R. (1995). "Intelligent Agents: Theory and Practice." The Knowledge Engineering Review 10(2). Earlier: Smith, R. G. (1980). "The Contract Net Protocol." IEEE Transactions on Computers C-29(12).

Multi-agent systems are not a type of individual agent; they are an organizational structure in which multiple agents (of any of the types above) interact. The key research questions in MAS: how agents communicate (KQML, FIPA-ACL, and more recently agent-to-agent protocols such as A2A); how they coordinate tasks (Contract Net Protocol, auctions, organizational structures); how they reach agreement when interests conflict (game theory, mechanism design); and how emergent collective behavior arises.
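Task coordination under the Contract Net Protocol can be sketched as announce, bid, award. This is a single-round, in-process simplification: the node names, the distance-based cost bid, and the "busy nodes decline" rule are hypothetical, whereas Smith's protocol runs over messages between distributed nodes and includes eligibility specifications and result reporting.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    name: str
    location: float   # hypothetical 1-D position of the sensing target

@dataclass
class Contractor:
    name: str
    position: float
    busy: bool = False

    def bid(self, task: Task) -> Optional[float]:
        # Bid = estimated cost (distance to the target); busy nodes decline.
        return None if self.busy else abs(self.position - task.location)

def contract_net(task: Task, contractors: list[Contractor]) -> Optional[str]:
    # 1. Task announcement: the manager broadcasts the task to every node.
    bids = {c.name: c.bid(task) for c in contractors}
    # 2. Bidding: keep only nodes that actually submitted a bid.
    valid = {name: cost for name, cost in bids.items() if cost is not None}
    if not valid:
        return None
    # 3. Award: the lowest-cost bidder wins the contract.
    winner = min(valid, key=valid.get)
    for c in contractors:
        if c.name == winner:
            c.busy = True  # 4. The awarded contractor commits to the task.
    return winner

nodes = [Contractor("n1", 0.0), Contractor("n2", 5.0), Contractor("n3", 9.0)]
print(contract_net(Task("track", 6.0), nodes))   # n2 is closest
print(contract_net(Task("track2", 6.5), nodes))  # n2 is now busy, so n3 wins
```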

The modern LLM-based MAS (AutoGen, CrewAI, LangGraph, OpenAI's multi-agent patterns) descend conceptually from this 40-year research tradition, often without citing it. AutoGen's "GroupChat" is structurally analogous to the blackboard architectures of 1980s MAS research, in which every agent reads and writes a shared workspace. CrewAI's role-based collaboration is structurally analogous to organizational agent frameworks from the 1990s. LangGraph's graph-based orchestration is a formal state-machine approach that mirrors FIPA-compliant agent coordination protocols.

Historical Examples

Contract Net (distributed sensor network)

1980, Smith & Davis

An early implemented MAS: nodes bidding on sensing tasks. The founding deployment of the Contract Net Protocol.

DVMT (Distributed Vehicle Monitoring Testbed)

1983, Lesser & Corkill, UMass

Acoustic sensors detecting vehicles using blackboard MAS. The canonical 1980s MAS research system.

FIPA-compliant agent platforms

1997–2000s

JADE (Java Agent Development Framework) implemented FIPA-ACL standards; deployed in logistics, manufacturing, e-commerce.

RoboCup

1997–ongoing

International MAS competition: teams of software/hardware agents playing soccer. A premier MAS benchmark for nearly three decades.

Modern LLM-Based Multi-Agent Systems

AutoGen

2023, Microsoft Research

Multi-agent conversation framework where LLM agents debate, critique, and collaborate. GroupChat pattern enables N-agent dialogue.

CrewAI

2024

Role-based MAS where agents have explicit roles (researcher, writer, critic) with structured handoffs and shared memory.

LangGraph

2024, LangChain team

Graph-based agent orchestration with explicit state machines governing agent interactions. Parallel execution, branching, and cycles.

OpenAI Agents SDK (multi-agent)

March 2025, OpenAI

Handoff-based MAS where agents transfer control and conversation context explicitly. Production-grade with tracing and guardrails.

Google ADK

April 2025, Google

Hierarchical agent tree: root agent delegates to sub-agents. Supports A2A (Agent-to-Agent) protocol for cross-framework communication.

Primary Sources

Primary sources: Smith, R. G. (1980). "The Contract Net Protocol." IEEE Transactions on Computers C-29(12). DOI:10.1109/TC.1980.1675516. Wooldridge, M. J. & Jennings, N. R. (1995). "Intelligent Agents: Theory and Practice." KER 10(2). Latenode, "LangGraph vs AutoGen vs CrewAI," 2026 (documenting modern MAS frameworks). gurusup.com, "Best Multi-Agent Frameworks in 2026" (A2A protocol, Google ADK).

Modern AI Agent Types Explained

LLM-era agents defined by tool use, software operation, and autonomous task execution.

Modern LLM Agent — Type VII

7. Tool-Use LLM Agents

Definition

A tool-use LLM agent is a large language model augmented with the ability to call external tools (search engines, code interpreters, databases, APIs) within a plan-act-observe loop. The model generates reasoning traces interleaved with tool calls; tool outputs are fed back as observations; the model iterates until the task is complete or a stopping condition, such as a step limit, is reached. Yao, S. et al. (2022). "ReAct: Synergizing Reasoning and Acting in Language Models." arXiv:2210.03629. ICLR 2023. The defining architecture for this agent type.

The tool-use LLM agent is what most people mean when they say "AI agent" in 2025–2026. It is the category into which AutoGPT, BabyAGI, Claude with tools, ChatGPT with plugins, Devin, and most enterprise agentic AI products fall. The defining architecture is the ReAct loop (Yao et al. 2022): reason → act (call a tool) → observe (receive tool output) → reason → repeat.

The key architectural components: an LLM as the reasoning and planning core; a tool registry (search, code execution, file I/O, API calls); a memory system (conversation history, vector database, or scratchpad); an observation mechanism (tool outputs returned to the context); and a termination condition (task complete, maximum steps reached, or handoff to human).
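Those five components can be wired together in a short Python loop. This is a sketch under loud assumptions: `scripted_llm` is a hypothetical stand-in that replays a fixed script instead of calling a real model, the two tools and the `Action: tool[arg]` text format are invented for the example (loosely following the ReAct paper's trace style), and no real framework API is used.

```python
import re
from typing import Callable

# Tool registry (hypothetical tools; real agents call search APIs, sandboxes, etc.).
FACTS = {"boiling point of water": "100 C at sea level"}
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup": lambda q: FACTS.get(q.lower(), "no result"),
    "to_fahrenheit": lambda c: str(float(c) * 9 / 5 + 32),
}

def scripted_llm(transcript: list[str]) -> str:
    """Stand-in for the model: picks its next step from how many observations exist."""
    n_obs = sum(line.startswith("Observation:") for line in transcript)
    if n_obs == 0:
        return "Thought: I need the fact first.\nAction: lookup[boiling point of water]"
    if n_obs == 1:
        return "Thought: Convert 100 C.\nAction: to_fahrenheit[100]"
    return "Thought: I have what I need.\nFinal Answer: 212.0 F"

ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")

def react_agent(task: str, llm: Callable[[list[str]], str], max_steps: int = 5) -> str:
    transcript = [f"Task: {task}"]           # memory: the running scratchpad
    for _ in range(max_steps):
        output = llm(transcript)             # reason and propose the next action
        transcript.append(output)
        if "Final Answer:" in output:        # termination: task complete
            return output.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(output)
        if match is None:
            observation = "error: no parsable action"
        else:
            name, arg = match.groups()
            tool = TOOLS.get(name)           # act: dispatch through the registry
            observation = tool(arg) if tool else f"error: unknown tool {name}"
        transcript.append(f"Observation: {observation}")  # observe
    return "stopped: max steps reached"      # termination: step budget exhausted

print(react_agent("What is water's boiling point in Fahrenheit?", scripted_llm))
```

Swapping `scripted_llm` for a real model call is what turns this into an agent; the loop, registry, and stopping rules stay the same.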

Tool-use LLM agents are, in the classical framework, a hybrid of model-based and deliberative agents: they maintain context across steps (model-based) and reason about what to do next using that context (deliberative). But the "deliberation" happens in natural language via LLM inference rather than formal symbolic logic, which makes them qualitatively different from classical deliberative agents.

Defining Moments

BabyAGI

March 28, 2023, Nakajima

First viral autonomous LLM agent. Objective → task creation → execution → reprioritization → repeat. ~140 lines of Python.

AutoGPT

March 30, 2023, Richards

The project that made "AI agent" mainstream. GPT-4 + self-prompting + web browsing + file I/O + code execution. 100,000 GitHub stars in weeks.

LangChain agents

2022–2023, Chase

Productized the ReAct pattern; made LLM agent development accessible to the broad developer community.

Devin

March 12, 2024, Cognition

Tool-use LLM agent specialized for software engineering. Shell + editor + browser + sandbox. ARR $1M → $73M in 9 months.

Claude (with tools)

2023–2026, Anthropic

Frontier model with native tool use, computer use, and MCP integration. Benchmarked specifically on long-horizon agentic tasks.

Claude Code

2025, Anthropic

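Terminal-native agentic coding assistant. SWE-bench trajectory for Anthropic models: 1.96% for Claude 2 on the original SWE-bench (Oct 2023) → 78.4% on SWE-bench Verified (Apr 2026); the two figures come from different benchmark variants.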

Subtypes within tool-use LLM agents

The category is large enough to have meaningful internal distinctions:

Single-turn tool-use (chatbot with plugins) — one round of tool calls per user message; not truly agentic
Autonomous loop agent (BabyAGI, AutoGPT pattern) — self-directed multi-step loops; the core "agent" pattern
Long-horizon agent (Devin, Claude Code) — sustains coherent goal pursuit over extended sessions (30+ minutes, 100+ steps)
Specialized domain agent (legal research agent, financial analysis agent) — tool-use LLM agent fine-tuned or constrained to a domain

Primary Sources

Primary sources: Yao, S. et al. (2022). "ReAct." arXiv:2210.03629. Schick, T. et al. (2023). "Toolformer." arXiv:2302.04761. Jimenez, C. E. et al. (2023). "SWE-bench." arXiv:2310.06770. Agentic History, BabyAGI entry, AutoGPT entry, Devin entry.

Modern LLM Agent — Type VIII

8. Computer-Use Agents

Definition

A computer-use agent interacts with graphical user interfaces directly — viewing screenshots of a screen, moving a cursor, clicking UI elements, and typing — rather than through structured APIs. It can operate any software a human can use, including legacy applications with no API. Anthropic (2024, October 22). "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku." Anthropic Blog. The term "computer use" as a product capability name was coined here.

Computer-use agents are a subtype of tool-use LLM agent distinguished by how they interact with software. Where tool-use agents call APIs (structured interfaces designed for machine access), computer-use agents interact with GUIs (interfaces designed for humans). The distinction matters greatly in practice: much enterprise software has no API, or its API is incomplete, or API access requires developer integration. A computer-use agent can reach all of it through the same interface a human employee would use.

The conceptual predecessor is Adept AI's ACT-1 (Action Transformer, September 2022), the first widely publicized demonstration of a transformer model operating web interfaces. Adept's co-founders included two authors of the original Transformer paper (Vaswani et al. 2017), Ashish Vaswani and Niki Parmar. Anthropic's October 22, 2024 release was the first production deployment from a frontier lab; OpenAI's Operator (January 23, 2025) was the first consumer product in the category from a frontier lab.

Defining Products

Adept ACT-1

September 2022

First widely publicized GUI-operating transformer model. Browsed web, used Salesforce, operated software in response to natural language.

Anthropic Computer Use

October 22, 2024

First production frontier model with a public (beta) computer-use API. Exposes actions such as screenshot, mouse_move, left_click, and type.

OpenAI Operator

January 23, 2025

First consumer browser-agent product from a frontier lab. Books travel, fills forms, places orders autonomously.

Manus

March 2025, Monica

General-purpose autonomous agent; combines computer use with multi-step planning for complex end-to-end tasks.

Computer-Use vs Tool-Use: The Key Distinction

Tool-use agents call APIs: search(query="climate change"). Computer-use agents take GUI actions: click the search bar, type "climate change," press Enter, read the results from the screen. The same underlying task; completely different interface. Tool use requires the tool to have been pre-integrated; computer use requires only that the target software have a graphical interface. This is why computer use is considered a qualitative capability expansion: it removes the API-integration bottleneck.
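The same contrast in code: both paths below reach one hypothetical search backend, but the tool-use path makes a single structured call while the computer-use path has to click, type, press Enter, and read the results back off the "screen". `FakeBrowserGUI`, its pixel coordinates, and its action names are invented for illustration and only loosely resemble the kinds of actions computer-use APIs expose; the text "screenshot" stands in for real pixels.

```python
from dataclasses import dataclass, field

def search_backend(query: str) -> list[str]:
    # Hypothetical search index shared by both interfaces.
    return [f"Result about {query} #{i}" for i in (1, 2)]

# Tool use: one structured call against a pre-integrated API.
def search(query: str) -> list[str]:
    return search_backend(query)

# Computer use: the same backend, reachable only through a GUI.
@dataclass
class FakeBrowserGUI:
    search_box: tuple[int, int] = (400, 80)   # hypothetical pixel position
    focused: bool = False
    text: str = ""
    results: list[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # Stand-in for pixels: a text rendering of what is on screen.
        return "\n".join([f"[search box: {self.text!r}]"] + self.results)

    def click(self, x: int, y: int) -> None:
        self.focused = (x, y) == self.search_box

    def type_text(self, s: str) -> None:
        if self.focused:
            self.text += s

    def key(self, name: str) -> None:
        if name == "Return" and self.focused:
            self.results = search_backend(self.text)

def computer_use_search(gui: FakeBrowserGUI, query: str) -> list[str]:
    gui.click(*gui.search_box)   # a real agent would locate the box in a screenshot
    gui.type_text(query)
    gui.key("Return")
    # Read results back off the "screen" rather than from a return value.
    return gui.screenshot().splitlines()[1:]

print(search("climate change"))
print(computer_use_search(FakeBrowserGUI(), "climate change"))
```

Note that a click at the wrong coordinates silently does nothing, which is one reason computer-use agents re-read the screen after each action.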

Primary Sources

Primary sources: Anthropic (2024, October 22). "Introducing computer use." anthropic.com/news. Agentic History, ACT-1 timeline entry; Anthropic computer use entry; OpenAI Operator entry. Terminology entry for "computer use".

Physical and Frontier Agent Types Compared

Agents that operate in physical environments or attempt full research lifecycle automation.

Physical/Embodied — Type IX

9. Embodied Agents

Definition

An embodied agent is an artificial intelligence system instantiated in a physical or simulated physical form (a robot body, autonomous vehicle, or avatar) that perceives and acts in the physical or physical-simulation world through sensors and actuators. Embodied agents are grounded in the physical environment in ways that purely digital agents are not. Brooks, R. A. (1990). "Elephants Don't Play Chess." In P. Maes (Ed.), Designing Autonomous Agents. MIT Press. (The foundational argument for embodied intelligence.) More recently: "Embodied AI Agents: Modeling the World," arXiv:2506.22355 (2025).

Embodied agents face a set of challenges that purely digital agents do not: the physical world is continuous, noisy, and unpredictable; actions have irreversible consequences; perception is limited by sensor placement and quality; and the agent's own body must be modeled as part of the environment. Rodney Brooks argued in 1990 that intelligence cannot be separated from embodiment ("elephants don't play chess") and that the kind of intelligence needed in the physical world is fundamentally different from the kind a chess program displays.
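Noisy sensing and irreversible actions are why many embodied systems pair a deliberative planner with a reactive safety layer. The toy below is a hypothetical one-dimensional illustration, not how Stanley or any real vehicle works: the planner drives toward a goal, a range sensor reports the gap to an obstacle with bounded uniform noise, and a reflex stops the robot whenever the measured gap falls below a margin. All numbers are arbitrary.

```python
import random

def deliberative_speed(position: float, goal: float) -> float:
    # Planner: head toward the goal at up to 1.0 m per step.
    return max(-1.0, min(1.0, goal - position))

def reactive_override(measured_gap: float, speed: float, margin: float) -> float:
    # Safety reflex: if the sensor says the obstacle is too close, stop.
    return 0.0 if measured_gap < margin and speed > 0 else speed

def drive(goal=10.0, obstacle=6.0, noise=0.2, margin=1.5, steps=30, seed=1):
    rng = random.Random(seed)
    position = 0.0
    for _ in range(steps):
        true_gap = obstacle - position
        measured_gap = true_gap + rng.uniform(-noise, noise)   # imperfect sensing
        speed = reactive_override(measured_gap, deliberative_speed(position, goal), margin)
        position += speed                                      # the action changes the world
        if position >= obstacle:
            return position, "collision"                       # cannot be undone
    return position, "stopped safely"

print(drive())              # with the safety margin
print(drive(margin=0.0))    # effectively no reactive layer
```

The margin only guarantees safety because it exceeds the maximum step (1.0) plus the maximum sensor error (0.2); with `margin=0.0` the same planner drives into the obstacle.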

The integration of large language models and vision models with physical robotic platforms is the defining trend in embodied AI as of 2024–2026. Google DeepMind's RT-2 (Robotic Transformer 2, 2023) demonstrated that a vision-language model trained on internet data could transfer semantic knowledge to robot control — a robot told "move to the Coke can" could identify it from visual context even for objects not seen during robot training. Physical Intelligence's π₀ (2024) and Google's Gemini Robotics (2025) extend this to more complex manipulation tasks.

Historical Lineage

Shakey the Robot

1966–1972, SRI

First robot to use STRIPS planning for physical navigation. Deliberative embodied agent — slow but principled.

Brooks's Genghis

1988, MIT

Six-legged walking robot using subsumption architecture. Demonstrated robust reactive locomotion without world model.

Honda ASIMO

2000, Honda

First widely known humanoid robot capable of walking bipedally and navigating dynamic environments.

Stanley (DARPA Grand Challenge)

2005, Stanford

Won the 2005 DARPA Grand Challenge — 132-mile autonomous desert drive. Hybrid reactive/deliberative embodied agent.
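Genghis's subsumption architecture stacks simple behaviors so that a higher-priority layer can override lower ones without consulting a world model. The Python sketch below collapses that idea into fixed-priority arbitration, which is a deliberate simplification. Brooks's robots wired suppression and inhibition links between layers of augmented finite-state machines rather than looping over a priority list. The behavior names, percept keys, and thresholds here are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A behavior maps the current percept to an action, or None if it does not apply.
Behavior = Callable[[Dict[str, float]], Optional[str]]


def recover_balance(percept: Dict[str, float]) -> Optional[str]:
    # Fires only when the body is badly tilted.
    return "recover_balance" if percept["tilt"] > 30.0 else None


def avoid(percept: Dict[str, float]) -> Optional[str]:
    # Fires only when an obstacle is close.
    return "turn_away" if percept["obstacle_distance"] < 0.3 else None


def walk(percept: Dict[str, float]) -> Optional[str]:
    # Default behavior: always applicable.
    return "step_forward"


def arbitrate(layers: List[Behavior], percept: Dict[str, float]) -> str:
    """Return the action of the highest-priority behavior that fires.

    Layers are ordered from highest to lowest priority; a firing layer
    suppresses everything below it. No state is kept between calls.
    """
    for behavior in layers:
        action = behavior(percept)
        if action is not None:
            return action
    return "idle"


LAYERS: List[Behavior] = [recover_balance, avoid, walk]
```

Each call looks only at the current percept, which is what makes the design reactive. The Stanley and Shakey entries above add the planning and world modeling that this loop lacks.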

Modern Embodied AI

Boston Dynamics Spot + GPT

2023

Spot robot integrated with GPT-4V for natural-language-guided inspection tasks. Tool-use LLM + embodied agent hybrid.

RT-2 (Robotic Transformer 2)

2023, Google DeepMind

VLA (vision-language-action) model: internet-trained vision-language model fine-tuned for robot control. Transfers semantic knowledge to physical actions.

π₀ (Physical Intelligence)

2024, Physical Intelligence

Generalist robot policy trained across diverse household manipulation tasks. Executes multi-step tasks in new homes from natural language.

Figure 02 + OpenAI

2024, Figure AI

Humanoid robot using OpenAI's multimodal models for real-time object recognition, reasoning, and task execution.

Waymo (production)

2021–2026, Alphabet

The most-deployed autonomous vehicle system. 10M+ miles of autonomous driving experience; commercial robotaxi in San Francisco and Phoenix.

Primary Sources

Primary sources: Brooks, R. A. (1990). "Elephants Don't Play Chess." In Maes (Ed.), Designing Autonomous Agents. MIT Press. Brohan, A. et al. (2023). "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control." arXiv:2307.15818. Google DeepMind. NVIDIA Glossary, "Embodied AI" (2026). Air Street Press, "Embodied AI: The breakthroughs shaping the next 12 months" (December 2025). arXiv:2506.22355 (2025), "Embodied AI Agents: Modeling the World."

Frontier — Type X

10. Autonomous Research Agents

Definition

An autonomous research agent conducts the full scientific lifecycle — generating hypotheses, designing experiments, executing experiments (in simulation or by directing wet-lab equipment), analyzing results, and writing manuscripts — without step-by-step human direction of each stage. Lu, C. et al. (2024). "The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery." arXiv:2408.06292. Sakana AI. Nature, "How to build an AI scientist: first peer-reviewed paper spills the secrets" (March 2026).

Autonomous research agents represent the most ambitious current application of agentic AI. Rather than assisting human researchers, they are designed to conduct research autonomously — forming the hypothesis, running the experiment, analyzing the data, and writing the paper. This is the class of agent that comes closest to the predictions made by Dario Amodei ("AI smarter than Nobel Prize winners") and Sam Altman ("AI compressing decades of scientific progress").

The landmark system is Sakana AI's AI Scientist, released in August 2024 and developed with collaborators at the University of Oxford and the University of British Columbia. It generates machine learning research papers for approximately $15 each. AI Scientist v2 (2025) produced the first workshop paper written entirely by AI and accepted through peer review. A Nature paper summarizing the AI Scientist's capabilities was published in 2026 (Lu et al., Nature 651, 914–919, 2026). An independent evaluation of the original system (Beel et al., arXiv:2502.14297, 2025) found that its reliance on human-authored templates "significantly limits autonomy" in practice. The same evaluation concluded that Sakana's claims were more optimistic than independent replication supported.
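The lifecycle above can be pictured as a chain of stages with no human approval step between them. It still depends on a human-authored starting template, which is the v1 limitation noted in the independent evaluation. The Python sketch below is hypothetical throughout. The stage functions are placeholders where a real system would call an LLM or run code, the metric values are stand-ins rather than results, and nothing here mirrors Sakana's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ResearchState:
    template: str                        # human-authored scaffold (the v1 dependency)
    hypothesis: str = ""
    results: Dict[str, float] = field(default_factory=dict)
    analysis: str = ""
    manuscript: str = ""
    completed: List[str] = field(default_factory=list)


# Placeholder stages: a real system would call an LLM or execute code here.
def hypothesize(s: ResearchState) -> None:
    s.hypothesis = f"Modifying '{s.template}' improves the baseline metric"


def experiment(s: ResearchState) -> None:
    s.results = {"baseline": 0.70, "variant": 0.72}   # stand-in numbers, not real results


def analyze(s: ResearchState) -> None:
    delta = s.results["variant"] - s.results["baseline"]
    s.analysis = f"variant - baseline = {delta:+.2f}"


def write(s: ResearchState) -> None:
    s.manuscript = f"Hypothesis: {s.hypothesis}\nFinding: {s.analysis}"


PIPELINE: List[Callable[[ResearchState], None]] = [hypothesize, experiment, analyze, write]


def run_lifecycle(template: str) -> ResearchState:
    if not template:
        raise ValueError("this sketch, like AI Scientist v1, needs a human-authored template")
    state = ResearchState(template=template)
    for stage in PIPELINE:
        stage(state)                     # no human review between stages
        state.completed.append(stage.__name__)
    return state
```

Running `run_lifecycle("toy baseline")` walks all four stages unattended. Calling it with an empty template fails immediately, which is the dependency that removing templates in v2 targets.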

AlphaFold 2 (DeepMind, 2020; Jumper et al., 2021) represents the learning-agent route to scientific discovery. Rather than reasoning through the problem step by step, it learned to predict protein structure from sequence data. Half of the 2024 Nobel Prize in Chemistry went jointly to Demis Hassabis and John Jumper for protein structure prediction, and the other half to David Baker for computational protein design. The award is one of the most prominent recognitions of an AI system's central role in a scientific result.

Defining Systems

AlphaFold 2

2020, DeepMind (Jumper et al., 2021)

Largely solved protein structure prediction from sequence, debuting at CASP14. Recognized in the 2024 Nobel Prize in Chemistry (Hassabis and Jumper, sharing the prize with David Baker). Learning agent approach to scientific discovery.

Sakana AI Scientist v1

August 2024

Full research lifecycle automation: idea → experiment → analysis → manuscript. ~$15/paper. Developed with Oxford and UBC collaborators.

Sakana AI Scientist v2

2025, Sakana AI

Removes reliance on human-authored templates. First workshop paper written entirely by AI and accepted through peer review.

FunSearch (DeepMind)

2023

LLM + evolutionary search discovers new mathematical constructions. It found a larger cap set in dimension 8 than any previously known, a genuine mathematical discovery.
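FunSearch pairs an LLM that proposes programs with an automatic evaluator that scores them, and it keeps the best candidates. The Python sketch below mimics that loop on a toy cap set problem: vectors of length 4 over {0, 1, 2}. A greedy constructor builds a cap set from a priority function, and the evaluator is the size of the resulting valid cap set. As a loud simplification, the LLM is replaced by random perturbation of a linear priority's weights. The real system evolves whole programs and keeps a database of them. All names here are hypothetical; this is not DeepMind's code.

```python
import itertools
import random
from typing import Callable, List, Set, Tuple

Vec = Tuple[int, ...]


def is_cap_set(vectors: List[Vec]) -> bool:
    """True if no three distinct vectors lie on a line (sum to 0 mod 3 in every coordinate)."""
    for a, b, c in itertools.combinations(vectors, 3):
        if all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c)):
            return False
    return True


def greedy_cap_set(n: int, priority: Callable[[Vec], float]) -> List[Vec]:
    """Add vectors in descending priority, skipping any that would complete a line."""
    candidates = sorted(itertools.product(range(3), repeat=n), key=priority, reverse=True)
    chosen: List[Vec] = []
    blocked: Set[Vec] = set()            # third points of lines through chosen pairs
    for v in candidates:
        if v in blocked:
            continue
        for a in chosen:
            blocked.add(tuple((-(x + y)) % 3 for x, y in zip(a, v)))
        chosen.append(v)
    return chosen


def linear_priority(weights: List[float]) -> Callable[[Vec], float]:
    return lambda v: sum(w * x for w, x in zip(weights, v))


def propose(parent: List[float], rng: random.Random) -> List[float]:
    # Stand-in for the LLM. FunSearch prompts an LLM to write new priority
    # programs; this sketch only perturbs the weights of a linear priority.
    return [w + rng.gauss(0.0, 1.0) for w in parent]


def evolve(n: int = 4, generations: int = 50, seed: int = 0) -> List[Vec]:
    rng = random.Random(seed)
    best_w = [0.0] * n
    best = greedy_cap_set(n, linear_priority(best_w))
    for _ in range(generations):
        child_w = propose(best_w, rng)
        candidate = greedy_cap_set(n, linear_priority(child_w))
        if len(candidate) > len(best):   # evaluator: larger valid cap set scores higher
            best_w, best = child_w, candidate
    return best
```

The largest cap set in four dimensions has 20 elements, so any output of `evolve()` can be checked against that bound with `is_cap_set`. Keeping the evaluator exact and automatic is what lets the proposer be unreliable.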

Frontier Status and Honest Caveats

Autonomous research agents are the most hyped and least mature category. According to the independent Beel et al. (2025) evaluation, AI Scientist v1 requires human-authored templates and cannot yet operate without significant human scaffolding. The systems produce papers, but the quality, novelty, and reproducibility of those papers are actively debated in the research community. This category is where the most ambitious agent predictions are concentrated, and it is also where the evidence base is thinnest relative to the claims.

Primary Sources

Primary sources: Lu, C. et al. (2024). "The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery." arXiv:2408.06292. Sakana AI / Oxford / UBC. Beel, J. et al. (2025). "Evaluating Sakana's AI Scientist for Autonomous Research." arXiv:2502.14297 (independent critical evaluation). Nature, "How to build an AI scientist: first peer-reviewed paper spills the secrets" (March 2026); citing Lu et al., Nature 651, 914–919 (2026). SakanaAI/AI-Scientist-v2 GitHub repository (v2 capabilities and first accepted AI-authored paper).

AI Agent Comparison and Product Placement Matrix

Full Comparison Table

All ten types across the key dimensions that define agent behavior.

Type	World model?	Plans ahead?	Learns over time?	Uses external tools?	Needs physical body?	Natural language?	Defined by
Reactive	No	No	No	No	No	No	Condition-action rules; current percept only
Model-Based Reflex	Yes	No	No	No	No	No	Internal state tracks past; rules still govern action
Deliberative	Yes	Yes	No	Sometimes	No	No	Symbolic reasoning over explicit world model
BDI	Yes (beliefs)	Yes (intentions)	Sometimes	Sometimes	No	No	Beliefs + desires + committed intentions
Learning	Often	Often	Yes	Sometimes	Sometimes	Sometimes	Updates behavior from feedback/reward signals
Multi-Agent System	Per agent	Per agent	Per agent	Per agent	Per agent	Increasingly	Organizational: coordination among multiple agents
Tool-Use LLM Agent	Yes (context window)	Yes (ReAct)	No (at runtime)	Yes (core feature)	No	Yes (required)	LLM + tool calls in a reason-act-observe loop
Computer-Use Agent	Yes (screenshot)	Yes	No	Yes (GUI is the tool)	No	Yes	LLM + GUI interaction (click, type, scroll)
Embodied Agent	Yes	Yes	Often	Yes (physical actuators)	Yes (defining)	Increasingly	Physical body; sensors + actuators in real world
Autonomous Research	Yes	Yes	Yes	Yes	No	Yes	Full research lifecycle from hypothesis to paper
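The comparison table can also be treated as data, which makes it easy to query which types share or differ on a dimension. The Python sketch below transcribes the six capability columns from the comparison table above, dropping parenthetical qualifiers such as "(context window)". It leaves out the "Defined by" column. The column keys and helper names are illustrative.

```python
from typing import Dict, List

COLUMNS = ["world_model", "plans", "learns", "tools", "body", "language"]

# Values transcribed from the comparison table (parenthetical qualifiers dropped).
TABLE: Dict[str, List[str]] = {
    "Reactive":            ["No", "No", "No", "No", "No", "No"],
    "Model-Based Reflex":  ["Yes", "No", "No", "No", "No", "No"],
    "Deliberative":        ["Yes", "Yes", "No", "Sometimes", "No", "No"],
    "BDI":                 ["Yes", "Yes", "Sometimes", "Sometimes", "No", "No"],
    "Learning":            ["Often", "Often", "Yes", "Sometimes", "Sometimes", "Sometimes"],
    "Multi-Agent System":  ["Per agent"] * 5 + ["Increasingly"],
    "Tool-Use LLM Agent":  ["Yes", "Yes", "No", "Yes", "No", "Yes"],
    "Computer-Use Agent":  ["Yes", "Yes", "No", "Yes", "No", "Yes"],
    "Embodied Agent":      ["Yes", "Yes", "Often", "Yes", "Yes", "Increasingly"],
    "Autonomous Research": ["Yes", "Yes", "Yes", "Yes", "No", "Yes"],
}


def types_where(column: str, value: str) -> List[str]:
    """Agent types whose entry in `column` equals `value`, in table order."""
    idx = COLUMNS.index(column)
    return [t for t, row in TABLE.items() if row[idx] == value]


def differences(a: str, b: str) -> List[str]:
    """Columns on which two agent types disagree."""
    return [c for c, x, y in zip(COLUMNS, TABLE[a], TABLE[b]) if x != y]
```

`types_where("body", "Yes")` returns only `["Embodied Agent"]`. `differences("Tool-Use LLM Agent", "Computer-Use Agent")` returns an empty list: those two rows are identical on all six capability columns, so only the "Defined by" column (GUI interaction versus tool calls) separates them.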

Where Does Each Product Fall?

The ten types above are not mutually exclusive — most commercial products span multiple categories. Here is how the most frequently discussed AI agent products map to the taxonomy.

Placement review note: The table below is an editorial classification, not a vendor category list. It separates "not an agent," "tool-use agent," "computer-use agent," "embodied agent," and "autonomous research agent" by observable behavior so readers can compare products without relying on inconsistent commercial labels.

Product	Primary type	Secondary type(s)	Notes
ChatGPT (no tools)	Not an agent	—	Single-turn chatbot. Responds to messages; does not pursue goals across steps.
ChatGPT (with tools, Code Interpreter)	Tool-use LLM agent	—	Calls tools within a response; limited multi-step autonomy.
Claude (with tools)	Tool-use LLM agent	Computer-use (via computer use API)	Native tool use + MCP + computer use. Benchmarked on long-horizon agentic tasks.
AutoGPT	Tool-use LLM agent	Multi-agent (task delegation loop)	Autonomous loop; web browsing + file I/O + code execution. Popularized "AI agent" as a mainstream term in 2023.
Devin	Tool-use LLM agent	—	Long-horizon software engineering specialist; shell + editor + browser. ARR $1M → $73M in 9 months.
OpenAI Operator	Computer-use agent	Tool-use LLM agent	Browser-native; fills forms, books travel, places orders via GUI interaction.
Anthropic Computer Use	Computer-use agent	Tool-use LLM agent	Claude 3.5 Sonnet with screenshot, mouse, and keyboard capabilities. Released in public beta in October 2024; the first computer-use release from a frontier lab.
Manus	Computer-use agent	Tool-use LLM agent, multi-agent	General-purpose autonomous agent; combines GUI interaction with multi-step planning.
CrewAI / AutoGen / LangGraph	Multi-agent system	Tool-use LLM agent (each sub-agent)	Frameworks, not products. Each coordinates multiple LLM tool-use agents with different roles.
AlphaGo	Learning agent	—	Deep networks trained on human games, then refined by self-play RL, combined with MCTS. Not an LLM agent; not a tool-use agent. Canonical RL learning agent.
AlphaFold 2	Learning agent	Autonomous research agent (capability)	Addressed protein structure prediction via deep learning. Not agentic in the loop-based sense; achieves scientific discovery through training.
Waymo	Embodied agent	Learning agent	Hybrid deliberative + reactive + learning; physically embodied in vehicles. Commercial robotaxi in San Francisco and Phoenix.
Boston Dynamics Spot	Embodied agent	Reactive (low-level), deliberative (high-level)	Multi-layer: reactive for balance/collision avoidance; deliberative for mission planning. LLM integration in 2023+ versions.
Sakana AI Scientist	Autonomous research agent	Tool-use LLM agent, multi-agent	Full research lifecycle; $15/paper; first workshop paper accepted through peer review (v2). Requires human-authored templates (v1).
Jason / BDI agents	BDI agent	Multi-agent system	AgentSpeak(L) interpreter; academic and industrial deployments; the canonical BDI agent platform.
Thermostat	Reactive agent	—	The textbook example. Temperature below threshold → heat. No world model, no planning, no learning.
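The placement note says products are separated by observable behavior. One way to make that concrete is a rule-of-thumb checklist. The Python sketch below is a hypothetical heuristic: the field names and rule order are assumptions, not this page's formal method. It reproduces the primary-type column for several rows, including the thermostat, ChatGPT without tools, Operator, Waymo, AlphaGo, and the AI Scientist. Deliberative, BDI, and multi-agent placements require inspecting the architecture rather than the behavior, so the sketch leaves them out.

```python
from dataclasses import dataclass


@dataclass
class Observed:
    """Hypothetical checklist of observable behaviors for one product."""
    acts_on_environment: bool = False       # changes something beyond replying to a message
    keeps_internal_state: bool = False      # tracks past percepts between actions
    learns_from_feedback: bool = False      # behavior updated by reward or feedback
    calls_tools: bool = False               # invokes external tools or APIs
    acts_through_gui: bool = False          # clicks, types, and scrolls in a GUI
    has_physical_body: bool = False         # robot or vehicle body that senses and moves
    runs_research_lifecycle: bool = False   # hypothesis -> experiment -> analysis -> paper


def primary_type(o: Observed) -> str:
    # Operational context is checked first (most specific to least), then architecture.
    if o.has_physical_body:
        return "Embodied agent"
    if o.runs_research_lifecycle:
        return "Autonomous research agent"
    if o.acts_through_gui:
        return "Computer-use agent"
    if o.calls_tools:
        return "Tool-use LLM agent"
    if o.learns_from_feedback:
        return "Learning agent"
    if o.acts_on_environment:
        return "Model-based reflex agent" if o.keeps_internal_state else "Reactive agent"
    return "Not an agent"
```

For example, the thermostat is `Observed(acts_on_environment=True)`, which yields "Reactive agent". ChatGPT without tools is `Observed()`, which yields "Not an agent". Checking operational context first mirrors the table's convention of naming the most specific type as primary and listing the rest as secondary.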