
Actions are mapped to indices from 0 to 4 and are defined as follows:

- 0 -> Stay
- 1 -> Left
- 2 -> Right
- 3 -> Up
- 4 -> Down

States are mapped to indices from 0 to 127. A state's index determines the player position, the target position and the call status of that state, using the following formulae:

- player position = `index // 16`
- target position = `(index % 16) // 2`
- call status = `index % 2`

Positions (for both the player and the target) map to cells of the 2 × 4 grid as follows:

| | Column 0 | Column 1 | Column 2 | Column 3 |
|---|---|---|---|---|
| Row 0 | 0 | 1 | 2 | 3 |
| Row 1 | 4 | 5 | 6 | 7 |

A 1 in the call status means that the call is active, while a 0 means that the call is inactive.

# Question 1

The target is in cell (1, 0), i.e. position 4, and the observation is o6.

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0.1, 0, 0, 0, 0, 0, 0]

Every state with player position 1, 2, 3, 6 or 7, target position 4, and call status 0 or 1 has a probability of 0.1; all other states have a probability of 0. These are 10 states in total, so the probabilities sum to 1.

# Question 2

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.25, 0, 0, 0, 0, 0, 0.25, 0, 0.25, 0, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Every state with player position 5, target position 1, 4, 5 or 6, and call status 0 has a probability of 0.25; all other states have a probability of 0.
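The index arithmetic and the two belief vectors above can be cross-checked with a short script. This is a minimal sketch: the helper names (`decode_state`, `encode_state`, `cell`, `uniform_belief`) are hypothetical, and the predicates simply restate the rules given in the answers. It does not model the observation function that produces those rules; it only assumes, as both vectors do, that the mass is spread uniformly over the matching states.

```python
# Sketch: decode the 128 state indices and rebuild the Question 1 and
# Question 2 belief vectors from the rules stated in the answers.
# All helper names here are hypothetical, chosen for illustration.

N_STATES = 128  # 8 player positions x 8 target positions x 2 call statuses


def decode_state(index):
    """Return (player position, target position, call status) for a state index."""
    player = index // 16
    target = (index % 16) // 2
    call = index % 2
    return player, target, call


def encode_state(player, target, call):
    """Inverse of decode_state."""
    return player * 16 + target * 2 + call


def cell(position):
    """Map a grid position 0..7 to its (row, column) cell on the 2 x 4 grid."""
    return divmod(position, 4)


def uniform_belief(predicate):
    """Spread probability uniformly over every state satisfying predicate."""
    matches = {s for s in range(N_STATES) if predicate(*decode_state(s))}
    p = 1 / len(matches)
    return [p if s in matches else 0 for s in range(N_STATES)]


# Question 1: target at cell (1, 0) = position 4, player at 1, 2, 3, 6 or 7.
q1 = uniform_belief(lambda player, target, call:
                    target == 4 and player in {1, 2, 3, 6, 7})

# Question 2: player at 5, target at 1, 4, 5 or 6, call inactive.
q2 = uniform_belief(lambda player, target, call:
                    player == 5 and target in {1, 4, 5, 6} and call == 0)

print([s for s, p in enumerate(q1) if p > 0])
print([s for s, p in enumerate(q2) if p > 0])
```

The first print lists indices 24, 25, 40, 41, 56, 57, 104, 105, 120 and 121 (the 0.1 entries of the Question 1 vector), and the second lists 82, 88, 90 and 92 (the 0.25 entries of the Question 2 vector).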
# Question 3

- The expected reward for Q1 is 9.06.
- The expected reward for Q2 is 20.13.

# Question 4

We use the formula

$$n = |A|^{\frac{O^{h} - 1}{O - 1}}$$

where __ = 265,288,703,664,880,029,479,731 and |A| = 5.

# Question 5

- P(o2) = 0.1, as there is only one case in which o2 is observed: the agent is at (0, 0) and the target is at (0, 1).
- P(o4) = 0.15, as there is only one case in which o4 is observed: the agent is at (1, 3) and the target is at (1, 2).
- P(o6) = 0.75, as o6 is observed in all other cases. This is the most likely observation.
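The formula from Question 4 can be evaluated exactly with Python's arbitrary-precision integers. This sketch assumes that O is the number of observations and h the horizon, as in the usual count of horizon-h policy trees, where the exponent 1 + O + O² + … + O^(h−1) is the number of nodes in one tree. The function name `num_policy_trees` is hypothetical, and the example arguments are illustrative only, not the values behind the Question 4 answer.

```python
# Sketch: evaluate n = |A| ** ((O**h - 1) / (O - 1)) with exact integers.
# O (number of observations) and h (horizon) are assumed inputs; the
# example call below uses illustrative values only.


def num_policy_trees(num_actions, num_observations, horizon):
    """|A| raised to the number of nodes in a horizon-h policy tree."""
    if num_observations == 1:
        # Degenerate case: the formula divides by zero, but the tree is a chain.
        nodes = horizon
    else:
        # 1 + O + O**2 + ... + O**(h-1) == (O**h - 1) // (O - 1); the division is exact.
        nodes = (num_observations ** horizon - 1) // (num_observations - 1)
    return num_actions ** nodes


print(num_policy_trees(5, 2, 2))  # 5 ** 3 = 125
```

Integer floor division is used instead of `/` so that large exponents never pass through floating point, which would lose precision for results of the size quoted in Question 4.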
