# hw3
https://hackmd.io/@y56/HJZXlQgFL
https://colab.research.google.com/drive/1tHOu52V1ES_wHiDmA31W5juWvIoeWp3G
## 1
### (a)
```python=
import numpy as np
# LIPM, solve_mpc_collocation, plot_results and animate_walker are provided by the course notebook
walking_model = LIPM()
step_time = 0.7 #in seconds
iterations_per_step = int(round(step_time / walking_model.dt))
step_length = 0.8
num_steps = 10
horizon_length = num_steps * iterations_per_step + 2 * iterations_per_step # 84
meta_horizon_length = num_steps * iterations_per_step + 30 * iterations_per_step # 280
meta_foot_position = np.zeros([meta_horizon_length])
for i in range(num_steps):
meta_foot_position[i*iterations_per_step:(i+1)*iterations_per_step] = i*step_length
# for the last cycles we stay in place
meta_foot_position[num_steps*iterations_per_step:] = meta_foot_position[num_steps*iterations_per_step-1]
foot_position = meta_foot_position[:horizon_length]
local_horizon_length = 84  # 160
G_bounds = []
h_bounds = []
Q = []
q = []
R = []
r = []
for i in range(meta_horizon_length):
Q_nominal = np.eye(2)
Q.append(Q_nominal)
q.append(Q_nominal.dot(np.array([[-meta_foot_position[i]],[0.]])))
R_nominal = 100*np.eye(1)
R.append(R_nominal)
r.append(R_nominal.dot(np.array([-meta_foot_position[i]])))
G_bounds.append(np.array([[0,0,1],[0,0,-1]]))
h_bounds.append(np.array([[walking_model.foot_size+meta_foot_position[i]],[walking_model.foot_size-meta_foot_position[i]]]))
def local_controller(x,i):
local_x, local_u = solve_mpc_collocation(walking_model.A,walking_model.B,
Q[i:i+local_horizon_length],q[i:i+local_horizon_length],
R[i:i+local_horizon_length],r[i:i+local_horizon_length],
G_bounds[i:i+local_horizon_length],
h_bounds[i:i+local_horizon_length],
local_horizon_length, x)
return local_u[0,0]
x0 = np.array([-0.0, 0.0])
x_plan, u_plan = solve_mpc_collocation(walking_model.A,walking_model.B,Q,q,R,r,G_bounds, h_bounds, horizon_length, x0)
x_real,u_real = walking_model.simulate(x0, local_controller, horizon_length, foot_position, noise=True)
plot_results(x_real, u_real, x_plan, u_plan)
animate_walker(x_real, u_real, foot_position)
```
### (b)
The minimum local horizon length we need is 11 iterations.
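A minimal sketch of how this number can be found by sweeping candidate horizons, reusing `walking_model`, `x0`, `foot_position`, `horizon_length` and `local_controller` from (a). It assumes, as a hypothetical stopping criterion, that the simulation (via the QP inside `solve_mpc_collocation`) raises an exception when the horizon is too short to stay feasible:
```python=
# Hypothetical sweep: treating a solver exception as "this horizon is too short"
# is an assumption, not something specified by the assignment code.
for candidate in range(1, 31):
    local_horizon_length = candidate  # rebinds the global read by local_controller
    try:
        x_real, u_real = walking_model.simulate(x0, local_controller,
                                                horizon_length, foot_position,
                                                noise=True)
        print("local horizon %d: walking succeeded" % candidate)
        break  # the first candidate that succeeds is the minimum
    except Exception:
        print("local horizon %d: failed" % candidate)
```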
### \(c\)
`iterations_per_step` is 7, so the controller needs to look ahead $11/7 \approx 1.6$ steps (11 iterations $\times$ 0.1 s = 1.1 s).
### (d)
From the plots, we can see that the control fluctuates because of the noise, and that the robot takes a shorter step midway through the walk.
### (e)
```
Q,q == 1,   R,r == 1000:  local_horizon_length == 11 fails, 30 succeeds.
  -> Larger R,r needs a longer local_horizon_length to see further into the future.
Q,q == 10,  R,r == 1000:  local_horizon_length == 11 fails, 30 also fails.
  -> Larger Q,q needs an even longer local_horizon_length.
Q,q == 1e9, R,r == 100e9: local_horizon_length == 11 fails, 30 also fails.
  -> Larger costs overall need a longer local_horizon_length to see further into the future.
```
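A sketch of how these cases can be reproduced: rescale the nominal weights from (a) before rebuilding the cost lists, then rerun the simulation. The scale factors `q_scale` and `r_scale` are illustrative names, not part of the original code:
```python=
q_scale, r_scale = 1.0, 1000.0  # first case above; change per experiment
Q, q, R, r = [], [], [], []
for i in range(meta_horizon_length):
    Q_nominal = q_scale * np.eye(2)
    Q.append(Q_nominal)
    q.append(Q_nominal.dot(np.array([[-meta_foot_position[i]], [0.]])))
    R_nominal = r_scale * np.eye(1)
    R.append(R_nominal)
    r.append(R_nominal.dot(np.array([-meta_foot_position[i]])))
# then rerun walking_model.simulate(...) with the desired local_horizon_length
```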
## 2
### (a)
#### implementation
In Python, implement the value iteration algorithm to solve the problem (initialize the value function to 0).
```python=
import copy
f = float('inf')
g = [
[0, 0, 1, 0, -1],
[0, f, 1, 0, 10],
[0, 10, f, 1, 0],
[1, 0, 0, 10, -1]]
# Bellman backup helper: min of the previous value function over the states
# reachable from (i, j) -- stay, left, right, up, down
def nice_cost(pre_tot_value, i, j):
ans = pre_tot_value[i][j]
if j - 1 >= 0:
ans = min(ans, pre_tot_value[i][j - 1])
if j + 1 < len(g[0]):
ans = min(ans, pre_tot_value[i][j + 1])
if i - 1 >= 0:
ans = min(ans, pre_tot_value[i - 1][j])
if i + 1 < len(g):
ans = min(ans, pre_tot_value[i + 1][j])
return ans
class tolerance_checker:
def __init__(self):
self.memory = None
self.tolerance = 1e-6
def check(self, tot_value):
if self.memory is None:
self.memory = copy.deepcopy(tot_value)
return True
for r1, r2 in zip(self.memory, tot_value):
for e1, e2 in zip(r1, r2):
if abs(e1 - e2) > self.tolerance:
self.memory = copy.deepcopy(tot_value)
return True
print("gotcha")
return False
al = 0.99
# tot_value at N-th step is
# min \sum_{n=0}^N al^n g_n(x_n)
# take N to +inf
# init
tot_value = [[0] * len(g[0]) for _ in range(len(g))]
pre_tot_value = copy.deepcopy(tot_value)
do_more = True
iter_count = 0
my_t_checker = tolerance_checker()
# Jacobi-style value iteration: each sweep reads only the previous iterate
while do_more and iter_count < 10000:
for i, r in enumerate(tot_value):
for j, e in enumerate(r):
tot_value[i][j] = g[i][j] + al * nice_cost(pre_tot_value, i, j)
pre_tot_value = copy.deepcopy(tot_value)
do_more = my_t_checker.check(tot_value)
print(iter_count)
# Print the value function
for r in tot_value:
for e in r:
print(' % .3f ' % e, end='')
print()
print()
iter_count += 1
```
#### number of iterations
How many iterations does it take to attain convergence? (We assume here that convergence happens when all the elements of the value function do not change more than $10^{-6}$ in a new iteration.)
:::info
It takes 1375 iterations to converge.
:::
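This count is consistent with value iteration's geometric convergence: each sweep contracts the error by a factor $\alpha = 0.99$, so starting from an initial difference of order $1$, reaching the $10^{-6}$ tolerance takes about
$$
k \approx \frac{\ln 10^{-6}}{\ln 0.99} \approx \frac{-13.8}{-0.01} \approx 1375
$$
iterations.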
The resulting value function is
```
-95.079 -96.040 -97.010 -99.000 -100.000
-94.129 inf -96.030 -98.010 -89.000
-93.187 -82.255 inf -97.010 -99.000
-91.255 -90.343 -89.439 -89.000 -100.000
```
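As a spot check, the bottom-right cell (cost $-1$, optimal control Stay) must satisfy the fixed point $J = g + \alpha J$, i.e.
$$
J = \frac{g}{1 - \alpha} = \frac{-1}{0.01} = -100,
$$
which matches the table (the same holds for the top-right cell).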
The resulting policy is
```
→ → → → Stay
↑ ↑ → ↑ ↑
↑ ← → → ↓
↑ ← ← → Stay
```
### (b)
#### implementation
In Python, implement the policy iteration algorithm to solve the problem (use the version that solves the linear equation $(I - \alpha A)J_\mu = \bar{g}$). Start with an initial policy that does not move.
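For a fixed policy $\mu$, the transition matrix $A$ picks out the successor state of each cell, and the policy's value satisfies $J_\mu = \bar{g} + \alpha A J_\mu$. Rearranging gives the linear system solved by the code below:
$$
(I - \alpha A) J_\mu = \bar{g}.
$$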
```python=
import copy
import numpy as np
from numpy.linalg import inv
f = 1e8  # large finite stand-in: float("inf") produces NaN in np.matmul()
g = [
[0, 0, 1, 0, -1],
[0, f, 1, 0, 10],
[0, 10, f, 1, 0],
[1, 0, 0, 10, -1]]
al = 0.99
N = len(g) * len(g[0])  # total number of possible states
def fn_index(i, j):
return i * len(g[0]) + j
g_bar = [] # just a re-shaping
for r in g:
for e in r:
g_bar.append([e])
g_bar = np.array(g_bar)
class tolerance_checker:
def __init__(self):
self.memory = None
def check(self, mu):
if self.memory is None:
self.memory = copy.deepcopy(mu)
return True
for r1, r2 in zip(self.memory, mu):
for e1, e2 in zip(r1, r2):
if e1 != e2:
self.memory = copy.deepcopy(mu)
return True
print("gotcha, all the same as before")
return False
```
```python=
def fn_J(mu):
def fn_A(mu):
A = [[0] * N for _ in range(N)]
for i, r in enumerate(mu):
for j, e in enumerate(r):
index = fn_index(i, j)
U_index = fn_index(i - 1, j)
D_index = fn_index(i + 1, j)
L_index = fn_index(i, j - 1)
R_index = fn_index(i, j + 1)
if e == "U":
if not (0 <= (i - 1) < len(mu) and 0 <= j < len(mu[0])):
print("is going beyond the gird")
raise ValueError
A[U_index][index] = 1
if e == "D":
if not (0 <= (i + 1) < len(mu) and 0 <= j < len(mu[0])):
print("is going beyond the gird")
raise ValueError
A[D_index][index] = 1
if e == "L":
if not (0 <= i < len(mu) and 0 <= (j - 1) < len(mu[0])):
print("is going beyond the gird")
raise ValueError
A[L_index][index] = 1
if e == "R":
if not (0 <= i < len(mu) and 0 <= (j + 1) < len(mu[0])):
print("is going beyond the gird")
raise ValueError
A[R_index][index] = 1
if e == "S":
A[index][index] = 1
return A
A = fn_A(mu)
A = np.array(A)
A = A.transpose()  # necessary: fn_A fills A[next][current], but the equation needs A[current][next]
one = np.identity(N)
tmp1 = one - al * A
tmp2 = inv(tmp1)
J = np.matmul(tmp2, g_bar)
return J
```
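A side note on the linear algebra: explicitly inverting $(I - \alpha A)$ works at this size, but solving the system directly is the cheaper and more numerically stable idiom. A minimal drop-in for the last lines of `fn_J`, reusing `N`, `al`, `g_bar` and the transposed `A` from above:
```python=
# Equivalent to J = inv(I - al * A) @ g_bar, without forming the inverse
J = np.linalg.solve(np.identity(N) - al * A, g_bar)
```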
```python=
def argmin_u(i, j, J):
cur_min = g[i][j] + al * J[fn_index(i, j)]
cur_control = "S"
if 0 <= (i - 1) < len(g) and 0 <= j < len(g[0]):
if g[i][j] + al * J[fn_index(i - 1, j)] < cur_min:
cur_min = g[i][j] + al * J[fn_index(i - 1, j)]
cur_control = "U"
if 0 <= (i + 1) < len(g) and 0 <= j < len(g[0]):
if g[i][j] + al * J[fn_index(i + 1, j)] < cur_min:
cur_min = g[i][j] + al * J[fn_index(i + 1, j)]
cur_control = "D"
if 0 <= i < len(g) and 0 <= (j - 1) < len(g[0]):
if g[i][j] + al * J[fn_index(i, j - 1)] < cur_min:
cur_min = g[i][j] + al * J[fn_index(i, j - 1)]
cur_control = "L"
if 0 <= i < len(g) and 0 <= (j + 1) < len(g[0]):
if g[i][j] + al * J[fn_index(i, j + 1)] < cur_min:
cur_min = g[i][j] + al * J[fn_index(i, j + 1)]
cur_control = "R"
return cur_control
# tot_value at N-th step is
# min \sum_{n=0}^N al^n g_n(x_n)
# take N to +inf
# init a policy for each cell in grid
mu = [["S"] * len(g[0]) for _ in range(len(g))]
# Up, Down, Right, Left, Stay
do_more = True
iter_count = 0
my_t_checker = tolerance_checker()
while do_more and iter_count < 1000:
# from mu_now(==policy of now), generate J_mu(==value function of now)
J = fn_J(mu)
# update mu_now(==policy of now) to be mu_next(==policy of next)
for i, r in enumerate(mu):
for j, e in enumerate(r):
# output is in {Up, Down, Right, Left, Stay}
mu[i][j] = argmin_u(i, j, J)
# choose the control that minimizes g(x_now) + alpha * J_mu_now(x_next)
# i,j is enough to stand for state x_now
do_more = my_t_checker.check(mu)
print(iter_count)
# Print policy
for r in mu:
for e in r:
print(e, end='')
print()
print()
# Print value function
for r in J.reshape(len(g), len(g[0])):
for e in r:
print(' % .2f ' % e, end='')
print()
print()
iter_count += 1
```
#### number of iterations
How many iterations does it take to converge?
:::info
It takes 7 iterations.
:::
The policy is
```
RRRRS
UURUU
ULRRD
ULLRS
```
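This is exactly the policy found by value iteration in (a), with R/L/U/D/S corresponding to the arrows →/←/↑/↓/Stay.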
The value function is
```
-95.08 -96.04 -97.01 -99.00 -100.00
-94.13 99999904 -96.03 -98.01 -89.00
-93.19 -82.26 99999903 -97.01 -99.00
-91.26 -90.34 -89.44 -89.00 -100.00
```
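Apart from the two obstacle cells, which show values near the finite stand-in $10^8$ instead of `inf`, this matches the value function from value iteration.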
### \(c\)
Compare the solutions and convergence/complexity of each algorithm to solve this problem.
#### value iteration
For value iteration, the time complexity is $O(i \cdot 5n)$, where $i = 1375$ is the number of iterations, $n = 20$ the number of grid cells, and $5$ the number of possible controls; here that amounts to roughly $137{,}500$ Bellman backups. The space complexity is $O(n)$ to store the previous iterate for the tolerance check.
#### policy iteration
For policy iteration, the time complexity is $O(i\,((n + x) + 5n))$, where $i = 7$ is the number of iterations, $n = 20$ the number of grid cells, $5$ the number of possible controls, and $x$ the cost of solving the $n \times n$ linear system, which is $O(n^3)$ for the Gauss-Jordan method. The space complexity is $O(n)$ to store the previous policy for the convergence check, but the Gauss-Jordan method takes $O(n^2)$ space for the matrix.
> http://mathforum.org/library/drmath/view/51908.html
> For the time of matrix inversion, there exists Strassen's method, which requires only $O(n^{\log_2 7}) = O(n^{2.807})$. The constant is not small (between 4 and 5), and the programming of Strassen's algorithm is so awkward that often Gaussian elimination is still the preferred method. Even Strassen's method is not optimal. I believe that the current record stands at $O(n^{2.376})$, thanks to Don Coppersmith and Shmuel Winograd.
For this problem, policy iteration seems better; for large problems, however, we may not be able to afford the matrix inversion.