
Advances & challenges in foundation agents: Section 1.1 – The rise and development of AI agents

This article is Chapter 1, Section 1.1 of a series of articles featuring Liu and colleagues’ book Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems.

The concept of “agent” is a cornerstone of modern AI, representing a system that perceives its environment, makes decisions, and takes actions to achieve specific goals. This idea, while formalized in AI in the mid-20th century, has roots in early explorations of autonomy and interaction in intelligent systems. One of the most widely cited definitions, proposed by Russell and Norvig [1], describes an agent as “anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators”. This definition emphasizes the dual nature of agents as both observers and actors, capable of dynamically adapting to their surroundings rather than following static rules. It encapsulates the shift in AI from systems that merely compute to systems that engage with their environment.
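The perceive–act definition above can be made concrete with a minimal sketch. The following Python example is illustrative only (a hypothetical thermostat, not an example from the book): the agent maps a percept from its sensor directly to an action on its actuator, with no static rule table beyond that mapping.

```python
from dataclasses import dataclass

@dataclass
class Percept:
    """What the agent senses via its sensor (here, a temperature reading)."""
    temperature: float

class ThermostatAgent:
    """A minimal reflex agent: perceives through a sensor, acts through an actuator."""
    def __init__(self, setpoint: float):
        self.setpoint = setpoint

    def act(self, percept: Percept) -> str:
        # Map the current percept to an action on the environment.
        if percept.temperature < self.setpoint:
            return "heat_on"
        return "heat_off"

agent = ThermostatAgent(setpoint=20.0)
print(agent.act(Percept(temperature=18.5)))  # heat_on
```

Even this trivial agent illustrates the dual observer/actor role: its behavior is a function of what it currently perceives, not of a fixed script.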

The historical development of agents parallels the evolution of AI itself. Early symbolic systems, such as Newell and Simon’s General Problem Solver [2], sought to replicate human problem-solving processes by breaking tasks into logical steps. However, these systems were limited by their reliance on structured environments and predefined logic. The agent paradigm emerged as a response to these limitations, focusing on autonomy, adaptability, and real-world interaction. Rodney Brooks’s subsumption architecture in the 1980s exemplified a pivotal shift toward behavior-based robotics (BBR), introducing agents capable of real-time, reactive behavior in physical environments [3]. Unlike traditional approaches that relied on constructing detailed internal models of the world, BBR emphasizes systems with minimal internal state, where behavior emerges from direct sensory-motor interactions. These robots exhibit seemingly complex actions by continuously adjusting to their environment, not through deep planning, but through layered and reflexive responses. Brooks’s architecture demonstrated that robust, scalable intelligence could arise from simple, modular behaviors operating in parallel, marking a foundational departure from deliberative AI design.
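The layered, reflexive control that the subsumption architecture introduced can be sketched as follows. This is a simplified illustration of the idea, not Brooks’s implementation: behavior names and sensor values are hypothetical, and each layer either fires on its trigger (suppressing, or “subsuming,” the layers below) or defers downward.

```python
# Sketch of a subsumption-style controller. Layers are ordered from highest
# to lowest priority; the first layer whose trigger fires determines the
# action, suppressing all layers below it. No world model is maintained.

def avoid_obstacle(sensors):
    """Higher-priority layer: reflexively turn away from a nearby obstacle."""
    if sensors["obstacle_distance"] < 0.3:  # metres; illustrative threshold
        return "turn_left"
    return None  # trigger did not fire; defer to lower layers

def wander(sensors):
    """Lowest-priority layer: default exploratory behavior."""
    return "move_forward"

LAYERS = [avoid_obstacle, wander]  # highest priority first

def control_step(sensors):
    for behavior in LAYERS:
        action = behavior(sensors)
        if action is not None:
            return action

print(control_step({"obstacle_distance": 0.1}))  # turn_left
print(control_step({"obstacle_distance": 2.0}))  # move_forward
```

The design choice mirrors the paragraph above: each layer is a simple, self-contained behavior, and apparently complex navigation emerges from their prioritized interaction rather than from deliberative planning.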

Agents have since become a versatile framework across AI subfields. In robotics, they enable autonomous navigation and manipulation; in software, they form the foundation of multi-agent systems used for simulation and coordination [4]. By integrating perception, reasoning, and action into a cohesive structure, the agent paradigm has consistently served as a bridge between theoretical AI constructs and practical applications, advancing our understanding of how intelligent systems can operate in dynamic and complex environments.

The advent of large language models (LLMs) has redefined the capabilities of agents, transforming their role in artificial intelligence and opening up new horizons for their applications. Agents, once confined to executing narrowly defined tasks or following rigid rule-based frameworks, now leverage the broad generalization, reasoning, and adaptability of models like OpenAI’s ChatGPT [5], DeepSeek AI’s DeepSeek [6], Anthropic’s Claude [7], Alibaba’s Qwen [8], and Meta’s LLaMA [9]. These LLM-powered agents have evolved from static systems into dynamic entities capable of processing natural language, reasoning across complex domains, and adapting to novel situations with remarkable fluency. No longer merely passive processors of input, these agents have become active collaborators, capable of addressing long-horizon challenges and interacting with their environments in a way that mirrors human problem-solving.

A key advancement in the LLM era is the seamless integration of language understanding with actionable capabilities. Modern LLMs, equipped with function-calling APIs, enable agents to identify when external tools or systems are required, reason about their usage, and execute precise actions to achieve specific goals. For instance, an agent powered by ChatGPT can autonomously query a database, retrieve relevant information, and use it to deliver actionable insights, all while maintaining contextual awareness of the broader task. This dynamic combination of abstract reasoning and concrete execution allows agents to bridge the gap between cognitive understanding and real-world action. Furthermore, the generalization abilities of LLMs in few-shot and zero-shot learning have revolutionized the adaptability of agents, enabling them to tackle a diverse array of tasks (from data analysis and creative content generation to real-time collaborative problem-solving) without extensive task-specific training. This adaptability, coupled with their conversational fluency, positions LLM-powered agents as intelligent mediators between humans and machines, seamlessly integrating human intent with machine precision in increasingly complex workflows.
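The tool-dispatch pattern behind function-calling APIs can be sketched generically. The example below is an assumption-laden simplification: the JSON wire format varies by provider, and `query_database` is a hypothetical stand-in for a real external tool. What it shows is the core loop the paragraph describes: the model emits a structured function call, the agent executes it, and the result is returned as text for the model to reason over.

```python
import json

def query_database(table: str, limit: int) -> list:
    """Hypothetical tool: stands in for a real database query."""
    rows = {"sales": [{"region": "EU", "total": 1200}, {"region": "US", "total": 3400}]}
    return rows.get(table, [])[:limit]

# The agent registers tools the model is allowed to invoke.
TOOLS = {"query_database": query_database}

def run_tool_call(model_output: str) -> str:
    """Dispatch a model-emitted function call and return its result as text.

    `model_output` mimics the JSON a function-calling API might emit;
    real providers wrap this in richer response objects."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]          # look up the requested tool
    result = fn(**call["arguments"])  # execute with model-chosen arguments
    return json.dumps(result)         # serialize for the model's next turn

# Simulated model output requesting a tool invocation:
simulated = '{"name": "query_database", "arguments": {"table": "sales", "limit": 1}}'
print(run_tool_call(simulated))  # [{"region": "EU", "total": 1200}]
```

In a full agent, this dispatch step sits inside a loop: the tool result is appended to the conversation, and the model decides whether to call another tool or produce a final answer, which is how abstract reasoning is bridged to concrete action.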

Next part: Section 1.2 – A parallel comparison between human brain and AI agents.

Article source: Liu, B., Li, X., Zhang, J., Wang, J., He, T., Hong, S., … & Wu, C. (2025). Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems. arXiv preprint arXiv:2504.01990. CC BY-NC-SA 4.0.

Header image: AI is Everywhere by Ariyana Ahmad & The Bigger Picture / Better Images of AI, CC BY 4.0.

References:

  1. Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs, NJ, 1 edition, 1995. ISBN 0-13-103805-2.
  2. Allen Newell and Herbert A. Simon. GPS, a program that simulates human thought. In Computation & Intelligence: Collected Readings, pages 415–428, 1995.
  3. Rodney Brooks. A robust layered control system for a mobile robot. IEEE Journal on Robotics and Automation, 2(1):14–23, 1986.
  4. Michael Wooldridge. An Introduction to MultiAgent Systems. John Wiley & Sons, 2009.
  5. OpenAI. Introducing ChatGPT. 2022.
  6. Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. Deepseek-V3 Technical Report. arXiv preprint arXiv:2412.19437, 2024.
  7. Anthropic. Claude: The next step in helpful AI. 2023. Accessed: 2024-12-01.
  8. An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, et al. Qwen2.5 Technical Report. arXiv preprint arXiv:2412.15115, 2024.
  9. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971, 2023.

RealKM Magazine

RealKM Magazine brings managers and knowledge management (KM) practitioners the findings of high-value knowledge management research through concise, practically-oriented articles.
