<style> .r { color: rgb(255, 0, 0); font-weight: bold; } .lb { color: rgb(81, 152, 204); font-weight: bold; } .lg { color: rgb(0, 181, 0); font-weight: bold; } </style>

# Intelligent Agents

## What is an agent?

- Agents include humans, robots, softbots, thermostats, etc.
- An agent is anything that can be viewed as **perceiving** its **environment** through **sensors** and acting upon that environment through **actuators**.

![image.png](https://hackmd.io/_uploads/SJRpClqQp.png)

### Agents and Environments

- The agent function maps from percept histories to actions
    - f: P* $\rightarrow$ A
- That is, mathematically speaking, an agent's behavior is described by the agent function.

## Vacuum-cleaner

### Vacuum-cleaner World

![image.png](https://hackmd.io/_uploads/BkLilWq7p.png =400x)

- A small world
- Percepts: location and contents
    - e.g., [A, Dirty]
- Actions: Left, Right, Suck, NoOp

![image.png](https://hackmd.io/_uploads/S1NtlWcX6.png)

## The Concept of Rationality

- A rational agent is:
    - The one that does the <font class="r">right</font> thing.
    - By considering the consequences of the agent's behavior, which generates a desirable sequence of environment states.
- Captured by a performance measure that evaluates any given sequence of **environment states**.
    - Notice, it <font class="lb">said environment states, not agent states</font>
    - Why?
        - Human agents in particular are notorious for "sour grapes" – believing they did not really want something after not getting it

### Rationality

- A fixed performance measure evaluates the environment sequence
    - One point per square cleaned up in time T?
    - One point per clean square per time step, minus one per move?
    - Penalize for > k dirty squares?
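The vacuum world above is small enough to simulate directly. The following is a minimal sketch (function and variable names are illustrative, not from the lecture) of a simple reflex vacuum agent scored by one of the candidate performance measures listed: one point per clean square per time step.

```python
def reflex_vacuum_agent(percept):
    """Map the current percept [location, status] directly to an action."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

def run(steps=10):
    """Simulate the two-square world and return the performance score."""
    world = {"A": "Dirty", "B": "Dirty"}   # both squares start dirty
    location, score = "A", 0
    for _ in range(steps):
        action = reflex_vacuum_agent([location, world[location]])
        if action == "Suck":
            world[location] = "Clean"
        elif action == "Right":
            location = "B"
        elif action == "Left":
            location = "A"
        # performance measure: one point per clean square per time step
        score += sum(1 for s in world.values() if s == "Clean")
    return score

print(run())  # both squares are clean from step 3 onward
```

Switching the scoring line to the "minus one per move" variant changes which behavior is rational: the bouncing Left/Right loop above would then be penalized, and NoOp would become the better action once everything is clean.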
> *Better to design performance measures according to what one actually wants in the environment, rather than according to how one thinks the agent should behave.*

- A rational agent chooses whichever action **maximizes** the expected value of the performance measure given the percept sequence to date.
- Rational $\ne$ Omniscient (all-knowing)
- Rational $\ne$ Clairvoyant
- $\rightarrow$ Rational $\ne$ Successful
- Rational $\rightarrow$ exploration, learning, autonomy

## <font class="lg">PEAS</font>

- To design a rational agent, we must specify the task environment.
- Consider, for example, the task of designing an automated taxi:
    - Performance measure?
    - Environment?
    - Actuators?
    - Sensors?
- <font class="lg">P</font>erformance measure
    - Safety, destination, profits, legality, comfort...
- <font class="lg">E</font>nvironment
    - US streets/freeways, traffic, pedestrians, weather...
- <font class="lg">A</font>ctuators
    - Steering, accelerator, brake, horn, speaker/display...
- <font class="lg">S</font>ensors
    - Video, accelerometers, gauges, engine sensors, GPS...

## Internet Shopping Agent

### PEAS

- Performance measure?
    - price, quality, appropriateness, efficiency
- Environment?
    - current and future WWW sites, vendors, shippers
- Actuators?
    - display to user, follow URL, fill in form
- Sensors?
    - HTML pages (text, graphics, scripts)

## Environment Types

- <font class="lb">Fully observable</font> vs. <font class="lg">Partially observable</font>
- **Single agent** vs. **Multiagent**
    - **Competitive** multiagent
        - Ex. chess game
    - **Cooperative** multiagent
        - Ex. taxi-driving environment
- <font class="lb">Deterministic</font> vs.
<font class="lg">Stochastic</font>
    - <font class="lb">Deterministic</font>: the next state of the environment is <font class="lb">completely determined by the current state</font> and the action executed by the agent
    - <font class="lg">Stochastic</font>: such as taxi driving (<font class="lg">you cannot know in advance what will happen on the road</font>)
- <font class="lb">Episodic</font> vs. <font class="lg">Sequential</font>
    - Episodic: <font class="lb">earlier episodes do not affect later ones</font>
    - Sequential: <font class="lg">the current decision can affect future decisions</font>
        - Ex. chess or taxi driving
- Static vs. dynamic
    - If the environment can **change while an agent is deliberating**, then the environment is dynamic for that agent.
    - **Semidynamic**: the environment itself does not change with the passage of time, but the agent's performance score does.
        - Ex. chess played with a clock
    - Taxi driving is clearly dynamic. Crossword puzzles are static.
- Discrete vs. continuous
    - The distinction applies to the state of the environment, to the way time is handled, and to the percepts and actions of the agent.
    - A chess game has a discrete set of percepts and actions.
    - Taxi driving is a continuous-state and continuous-time problem
- Known vs. unknown
    - Refers not to the environment but to the agent's state of knowledge about the "laws of physics" of the environment.
    - Known environment and partially observable: solitaire card game
    - **Unknown environment and fully observable**: a new video game, where you don't know what the buttons do until you try them.
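The dimensions above can be tabulated per task environment. This is a small sketch (names are illustrative) encoding the standard classification of the example environments discussed in this section:

```python
# Each environment is tagged along the six dimensions from this section.
DIMENSIONS = ["observable", "agents", "deterministic", "episodic", "static", "discrete"]

ENVIRONMENTS = {
    "Crossword puzzle":   ["fully",     "single", "deterministic", "sequential", "static",      "discrete"],
    "Chess with a clock": ["fully",     "multi",  "deterministic", "sequential", "semidynamic", "discrete"],
    "Taxi driving":       ["partially", "multi",  "stochastic",    "sequential", "dynamic",     "continuous"],
}

def describe(name):
    """Return the environment's properties as a dimension -> value mapping."""
    return dict(zip(DIMENSIONS, ENVIRONMENTS[name]))

print(describe("Taxi driving")["static"])  # -> dynamic
```

Taxi driving sits at the hard end of every dimension, which is why the lecture uses it as the running example of a difficult task environment.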
---

![](https://hackmd.io/_uploads/BJawCu5kp.png =600x)

- The environment type largely determines the agent design
- The real world is (of course) partially observable, stochastic, sequential, dynamic, continuous, and multi-agent
- The hardest case:
    - Partially observable, multi-agent, stochastic, sequential, dynamic, continuous, and unknown
    - Taxi driving has all of these, except that for the most part the driver's environment is known

## Agent Programs

- <font class="r">Agent = architecture + program</font>
- The program will run on some sort of computing device with physical sensors and actuators – we call this the architecture

## Agent Types

Four basic types, in order of **increasing generality**:

- Simple reflex agents
- Model-based reflex agents
- Goal-based agents
- Utility-based agents

All of these can be turned into learning agents.

### Simple Reflex Agents

![](https://hackmd.io/_uploads/ryqHgF516.png =400x)

> **It acts according to a rule whose condition matches the current state, as defined by the percept**

### Model-based Reflex Agents (Reflex Agents with State)

![](https://hackmd.io/_uploads/rkAoeK5kT.png =400x)

> **It keeps track of the current state of the world, using an internal model**

### Goal-based Agents

![](https://hackmd.io/_uploads/HJRU-FcJa.png =400x)

### Utility-based Agents

![](https://hackmd.io/_uploads/HywYWt5kT.png =400x)

### Learning Agents

![](https://hackmd.io/_uploads/Sks17Y516.png =400x)

## Summary

- Agents interact with environments through actuators and sensors
- The agent function describes what the agent does in all circumstances
- The performance measure evaluates the environment sequence
- A perfectly rational agent maximizes expected performance
- Agent programs implement (some) agent functions
- PEAS descriptions define task environments
- Environments are categorized along several dimensions:
    - Observable? Deterministic? Episodic? Static? Discrete? Single-agent?
- Several basic agent architectures exist:
    - Reflex, reflex with state, goal-based, utility-based
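The first two architectures from the summary can be sketched in a few lines. This is a minimal illustration (class and parameter names are my own, not from the lecture): a simple reflex agent maps the current percept directly to an action, while a model-based reflex agent first updates an internal state using a model of the world, then matches rules against that state.

```python
class SimpleReflexAgent:
    """Condition-action rules applied to the current percept alone."""

    def __init__(self, rules):
        self.rules = rules                      # condition -> action

    def act(self, percept):
        return self.rules.get(percept, "NoOp")  # rule matching the percept

class ModelBasedReflexAgent(SimpleReflexAgent):
    """Keeps track of the world via an internal state updated by a model."""

    def __init__(self, rules, update_state):
        super().__init__(rules)
        self.state = None
        self.update_state = update_state        # internal model of the world

    def act(self, percept):
        self.state = self.update_state(self.state, percept)
        return self.rules.get(self.state, "NoOp")

# Usage: the vacuum world expressed as condition-action rules.
agent = SimpleReflexAgent({("A", "Dirty"): "Suck", ("A", "Clean"): "Right"})
print(agent.act(("A", "Dirty")))  # -> Suck
```

Goal-based and utility-based agents generalize this further: instead of matching rules, they search for actions that reach a goal state or maximize an expected utility over outcomes.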