# Cognitive Architectures for Language Agents

## 1 Introduction

- CoALA is a proposed framework (cognitive architecture) for language agents
- draws parallels to production systems and cognitive architectures

![image](https://hackmd.io/_uploads/B1Lh15TYT.png)

- three components: information storage, action space, decision-making procedure

## 2 Background: From Strings to Symbolic AGI

### 2.1 Production systems for string manipulation

Intuitively, a production system consists of a set of rules, each specifying a precondition and an action: when the precondition is met, the action is executed. A production system characterizes the set of strings that can be generated from a starting point. Production systems can specify algorithms if we impose control flow to determine which productions are executed.

### 2.2 Control flow: From strings to algorithms

- think of these as analogous to context-free grammars

The following algorithm implements division-with-remainder by converting a number written as strokes | into the form Q ∗ R, where Q is the quotient of division by 5 and R is the remainder:

![image](https://hackmd.io/_uploads/rypEVy0ta.png)

### 2.3 Cognitive architectures: From algorithms to agents

![image](https://hackmd.io/_uploads/SJLeSJAKT.png)

- productions were generalized beyond string rewriting to logical operations
- preconditions are checked against agent goals and world state; actions are taken when preconditions are met
- inspired lots of psychological modeling; **Soar** is one such architecture

![image](https://hackmd.io/_uploads/Syk5bq6Ya.png)

Soar stores productions in long-term memory and executes them based on how well their preconditions match working memory.

- working memory stores visual input, goals, and internal reasoning
- long-term memory: procedural, semantic, episodic
- supports decision making, grounding, and multiple modes of learning

### 2.4 Language models and agents

- LMs are probabilistic by nature; researchers leverage their implicit world knowledge to use them as the brain behind these cognitive architectures/agents

## 3 Connections Between Language Models and Production Systems

### 3.1 Language models as probabilistic production systems

- text completion can be formulated as a production X -> X Y
- LLMs can be viewed as probabilistic productions that sample a possible completion each time they are called
- this probabilistic nature makes LLMs powerful problem solvers, but it also makes them black boxes that are hard to interpret

### 3.2 Prompt engineering as control flow

![image](https://hackmd.io/_uploads/HygzS5pta.png)

### 3.3 Towards cognitive language agents

- language agents move beyond pre-defined prompt chains and instead place LLMs in a feedback loop with an external environment

## 4 Cognitive Architectures for Language Agents (CoALA): A Conceptual Framework

![image](https://hackmd.io/_uploads/HJ6UFkCKa.png)

- Cognitive Architectures for Language Agents (CoALA) is a framework to organize existing language agents
- external actions interact with external environments through grounding
- internal actions interact with internal memories (retrieval, reasoning, learning)

![image](https://hackmd.io/_uploads/rkC0F10t6.png)

### 4.1 Memory

- working memory maintains active, readily available information as symbolic variables
- episodic memory stores experiences from earlier decision cycles
- semantic memory stores the agent's knowledge about the world and about itself
- procedural memory contains implicit knowledge stored in the LLM weights and explicit knowledge written in the agent's code

### 4.2 Grounding actions

- physical environments
- dialogue with humans or other agents
- digital environments

### 4.3 Retrieval actions

- retrieval reads long-term memory into working memory

### 4.4 Reasoning actions

- reasoning reads from and writes to working memory (retrieval only reads)

### 4.5 Learning actions

- update episodic memory with experience
- update semantic memory with knowledge
- update LLM parameters (procedural memory)
- update agent code
(procedural memory)
- update reasoning (e.g. prompt templates)
- update grounding
- update retrieval
- update learning or decision making

### 4.6 Decision making

- planning stage -> execution stage
- planning stage: proposal (reasoning), evaluation, selection

## 5 Case Studies

![image](https://hackmd.io/_uploads/B1VujyRKp.png)

## 6 Actionable Insights

- thinking beyond monolithic designs for individual applications
  - CoALA, like OpenAI Gym or MDPs in RL, provides a standard framework for conceptual comparisons
  - industry applications can benefit from an organized agent library
- CoALA suggests a more structured reasoning procedure for updating working memory variables
  - prompting frameworks like LangChain and LlamaIndex
  - Guidance for structured output parsing
  - LLMs can benefit from insights on agent reasoning
- thinking beyond retrieval augmentation
  - combining existing human knowledge with new experience and skills can help agents bootstrap efficiently
  - integrating retrieval and reasoning to better ground planning
- thinking beyond in-context learning or finetuning
  - meta-learning by modifying agent code
  - new forms of learning and unlearning
- beyond external tools or actions
  - more capable agents will have larger action spaces
  - learning and grounding actions need to be assessed for safety
- beyond action generation
  - mixing language-based reasoning and code-based planning
  - extending deliberative reasoning to real-world settings
  - metareasoning to improve efficiency (LLM calls are costly)
  - calibration and alignment

## 7 Discussion

- where is the boundary between an agent and its environment?
  - think in terms of controllability and coupling
- what are the differences between physical and digital environments?
  - an agent has one life in the real world; in a digital environment it can have many
- how should agents continuously/autonomously learn?
  - follow a design similar to biological agents, learning when necessary
- how would agent design change with more powerful LLMs?
  - hard to say, but CoALA and agent frameworks will still be useful
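The division-with-remainder algorithm from Section 2.2 can be sketched as a tiny production system in Python. The rewrite rule here (consume five strokes, emit one quotient marker) is my reconstruction of the idea, not the paper's exact productions:

```python
def divide_by_five(strokes: str) -> str:
    """Rewrite a string of strokes '|||...' into 'Q * R' form,
    where Q is the quotient of division by 5 and R the remainder."""
    s = strokes
    quotient = ""
    # Production rule: precondition is that five strokes remain in the
    # string; the action consumes them and emits one quotient marker.
    # The while-loop is the imposed control flow.
    while "|||||" in s:
        s = s.replace("|||||", "", 1)
        quotient += "|"
    return f"{quotient} * {s}"

print(divide_by_five("|" * 13))  # -> || * |||
```

Each loop iteration fires the same production; once the precondition fails (fewer than five strokes left), the system halts with the answer encoded in the final string.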
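The decision cycle from Section 4.6 (planning stage of proposal, evaluation, and selection, followed by execution) can be sketched as a toy loop. Everything here is illustrative: `fake_llm` is a canned stand-in for a real LLM call, and the value function is deliberately trivial:

```python
def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM call; returns canned text for the demo."""
    return "look_around; pick_up_key; open_door"

class Agent:
    def __init__(self):
        # Working memory: active, readily available symbolic variables.
        self.working_memory = {"goal": "open_door", "observation": None}
        # Episodic memory: experiences from earlier decision cycles.
        self.episodic_memory = []

    def propose(self):
        # Reasoning action: ask the LLM for candidate external actions.
        text = fake_llm(f"propose actions for goal {self.working_memory['goal']}")
        return [a.strip() for a in text.split(";")]

    def evaluate(self, action):
        # Toy value function: prefer the action matching the goal.
        return 1.0 if action == self.working_memory["goal"] else 0.0

    def select(self, candidates):
        return max(candidates, key=self.evaluate)

    def step(self, observation):
        self.working_memory["observation"] = observation
        action = self.select(self.propose())            # planning stage
        self.episodic_memory.append((observation, action))  # learning action
        return action                                   # execution (grounding)

agent = Agent()
print(agent.step("a locked door"))  # -> open_door
```

A real agent would replace `fake_llm` with an actual model call, add retrieval from semantic/episodic memory into the prompt, and ground the selected action in an external environment.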