# hw3
https://hackmd.io/@y56/HJZXlQgFL
https://colab.research.google.com/drive/1tHOu52V1ES_wHiDmA31W5juWvIoeWp3G
## 1
### (a)
```python=
import numpy as np
# LIPM, solve_mpc_collocation, plot_results and animate_walker are provided by the course notebook
walking_model = LIPM()
step_time = 0.7 #in seconds
iterations_per_step = int(round(step_time / walking_model.dt))
step_length = 0.8
num_steps = 10
horizon_length = num_steps * iterations_per_step + 2 * iterations_per_step # 84
meta_horizon_length = num_steps * iterations_per_step + 30 * iterations_per_step # 280
meta_foot_position = np.zeros([meta_horizon_length])
for i in range(num_steps):
meta_foot_position[i*iterations_per_step:(i+1)*iterations_per_step] = i*step_length
# for the last cycles we stay in place
meta_foot_position[num_steps*iterations_per_step:] = meta_foot_position[num_steps*iterations_per_step-1]
foot_position = meta_foot_position[:horizon_length]
local_horizon_length = 84  # 160
G_bounds = []
h_bounds = []
Q = []
q = []
R = []
r = []
for i in range(meta_horizon_length):
Q_nominal = np.eye(2)
Q.append(Q_nominal)
q.append(Q_nominal.dot(np.array([[-meta_foot_position[i]],[0.]])))
R_nominal = 100*np.eye(1)
R.append(R_nominal)
r.append(R_nominal.dot(np.array([-meta_foot_position[i]])))
G_bounds.append(np.array([[0,0,1],[0,0,-1]]))
h_bounds.append(np.array([[walking_model.foot_size+meta_foot_position[i]],[walking_model.foot_size-meta_foot_position[i]]]))
def local_controller(x,i):
local_x, local_u = solve_mpc_collocation(walking_model.A,walking_model.B,
Q[i:i+local_horizon_length],q[i:i+local_horizon_length],
R[i:i+local_horizon_length],r[i:i+local_horizon_length],
G_bounds[i:i+local_horizon_length],
h_bounds[i:i+local_horizon_length],
local_horizon_length, x)
return local_u[0,0]
x0 = np.array([-0.0, 0.0])
x_plan, u_plan = solve_mpc_collocation(walking_model.A,walking_model.B,Q,q,R,r,G_bounds, h_bounds, horizon_length, x0)
x_real,u_real = walking_model.simulate(x0, local_controller, horizon_length, foot_position, noise=True)
plot_results(x_real, u_real, x_plan, u_plan)
animate_walker(x_real, u_real, foot_position)
```
### (b)
The minimum local horizon length we need is 11 iterations.
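A minimal sketch of how this number can be found by sweeping candidate horizons, reusing `walking_model`, `x0`, `foot_position`, `horizon_length` and `local_controller` from (a). It assumes, as a hypothetical stopping criterion, that the simulation (via the QP inside `solve_mpc_collocation`) raises an exception when the horizon is too short to stay feasible:
```python=
# Hypothetical sweep: treating a solver exception as "this horizon is too short"
# is an assumption, not something specified by the assignment code.
for candidate in range(1, 31):
    local_horizon_length = candidate  # rebinds the global read by local_controller
    try:
        x_real, u_real = walking_model.simulate(x0, local_controller,
                                                horizon_length, foot_position,
                                                noise=True)
        print("local horizon %d: walking succeeded" % candidate)
        break  # the first candidate that succeeds is the minimum
    except Exception:
        print("local horizon %d: failed" % candidate)
```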
### \(c\)
`iterations_per_step` is 7, so the controller needs to look ahead $11/7 \approx 1.6$ steps (11 iterations $\times$ 0.1 s = 1.1 s).
### (d)
From the plots, we can see that the control fluctuates because of the noise, and that the robot takes a shorter step midway through the walk.
### (e)
```
Q,q == 1,   R,r == 1000:  local_horizon_length == 11 fails, 30 succeeds.
  -> Larger R,r needs a longer local_horizon_length to see further into the future.
Q,q == 10,  R,r == 1000:  local_horizon_length == 11 fails, 30 also fails.
  -> Larger Q,q needs an even longer local_horizon_length.
Q,q == 1e9, R,r == 100e9: local_horizon_length == 11 fails, 30 also fails.
  -> Larger costs overall need a longer local_horizon_length to see further into the future.
```
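A sketch of how these cases can be reproduced: rescale the nominal weights from (a) before rebuilding the cost lists, then rerun the simulation. The scale factors `q_scale` and `r_scale` are illustrative names, not part of the original code:
```python=
q_scale, r_scale = 1.0, 1000.0  # first case above; change per experiment
Q, q, R, r = [], [], [], []
for i in range(meta_horizon_length):
    Q_nominal = q_scale * np.eye(2)
    Q.append(Q_nominal)
    q.append(Q_nominal.dot(np.array([[-meta_foot_position[i]], [0.]])))
    R_nominal = r_scale * np.eye(1)
    R.append(R_nominal)
    r.append(R_nominal.dot(np.array([-meta_foot_position[i]])))
# then rerun walking_model.simulate(...) with the desired local_horizon_length
```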
## 2
### (a)
#### implementation
In Python, implement the value iteration algorithm to solve the problem (initialize the value function to 0).
```python=
import copy
f = float('inf')
g = [
[0, 0, 1, 0, -1],
[0, f, 1, 0, 10],
[0, 10, f, 1, 0],
[1, 0, 0, 10, -1]]
# Bellman backup helper: min of the previous value function over the states
# reachable from (i, j) -- stay, left, right, up, down
def nice_cost(pre_tot_value, i, j):
ans = pre_tot_value[i][j]
if j - 1 >= 0:
ans = min(ans, pre_tot_value[i][j - 1])
if j + 1 < len(g[0]):
ans = min(ans, pre_tot_value[i][j + 1])
if i - 1 >= 0:
ans = min(ans, pre_tot_value[i - 1][j])
if i + 1 < len(g):
ans = min(ans, pre_tot_value[i + 1][j])
return ans
class tolerance_checker:
def __init__(self):
self.memory = None
self.tolerance = 1e-6
def check(self, tot_value):
if self.memory is None:
self.memory = copy.deepcopy(tot_value)
return True
for r1, r2 in zip(self.memory, tot_value):
for e1, e2 in zip(r1, r2):
if abs(e1 - e2) > self.tolerance:
self.memory = copy.deepcopy(tot_value)
return True
print("gotcha")
return False
al = 0.99
# tot_value at N-th step is
# min \sum_{n=0}^N al^n g_n(x_n)
# take N to +inf
# init
tot_value = [[0] * len(g[0]) for _ in range(len(g))]
pre_tot_value = copy.deepcopy(tot_value)
do_more = True
iter_count = 0
my_t_checker = tolerance_checker()
# Jacobi-style value iteration: each sweep reads only the previous iterate
while do_more and iter_count < 10000:
for i, r in enumerate(tot_value):
for j, e in enumerate(r):
tot_value[i][j] = g[i][j] + al * nice_cost(pre_tot_value, i, j)
pre_tot_value = copy.deepcopy(tot_value)
do_more = my_t_checker.check(tot_value)
print(iter_count)
# Print the value function
for r in tot_value:
for e in r:
print(' % .3f ' % e, end='')
print()
print()
iter_count += 1
```
#### number of iterations
How many iterations does it take to attain convergence? (We assume here that convergence happens when all the elements of the value function do not change more than $10^{-6}$ in a new iteration.)
:::info
It takes 1375 iterations to converge.
:::
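This count is consistent with value iteration's geometric convergence: each sweep contracts the error by a factor $\alpha = 0.99$, so starting from an initial difference of order $1$, reaching the $10^{-6}$ tolerance takes about
$$
k \approx \frac{\ln 10^{-6}}{\ln 0.99} \approx \frac{-13.8}{-0.01} \approx 1375
$$
iterations.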
The resulting value function is
```
-95.079 -96.040 -97.010 -99.000 -100.000
-94.129 inf -96.030 -98.010 -89.000
-93.187 -82.255 inf -97.010 -99.000
-91.255 -90.343 -89.439 -89.000 -100.000
```
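As a spot check, the bottom-right cell (cost $-1$, optimal control Stay) must satisfy the fixed point $J = g + \alpha J$, i.e.
$$
J = \frac{g}{1 - \alpha} = \frac{-1}{0.01} = -100,
$$
which matches the table (the same holds for the top-right cell).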
The resulting policy is
```
→ → → → Stay
↑ ↑ → ↑ ↑
↑ ← → → ↓
↑ ← ← → Stay
```
### (b)
#### implementation
In Python, implement the policy iteration algorithm to solve the problem (use the version that solves the linear equation $(I - \alpha A)J_\mu = \bar{g}$). Start with an initial policy that does not move.
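For a fixed policy $\mu$, the transition matrix $A$ picks out the successor state of each cell, and the policy's value satisfies $J_\mu = \bar{g} + \alpha A J_\mu$. Rearranging gives the linear system solved by the code below:
$$
(I - \alpha A) J_\mu = \bar{g}.
$$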
```python=
import copy
import numpy as np
from numpy.linalg import inv
f = 1e8  # large finite stand-in: float("inf") produces NaN in np.matmul()
g = [
[0, 0, 1, 0, -1],
[0, f, 1, 0, 10],
[0, 10, f, 1, 0],
[1, 0, 0, 10, -1]]
al = 0.99
N = len(g) * len(g[0])  # total number of possible states
def fn_index(i, j):
return i * len(g[0]) + j
g_bar = [] # just a re-shaping
for r in g:
for e in r:
g_bar.append([e])
g_bar = np.array(g_bar)
class tolerance_checker:
def __init__(self):
self.memory = None
def check(self, mu):
if self.memory is None:
self.memory = copy.deepcopy(mu)
return True
for r1, r2 in zip(self.memory, mu):
for e1, e2 in zip(r1, r2):
if e1 != e2:
self.memory = copy.deepcopy(mu)
return True
print("gotcha, all the same as before")
return False
```
```python=
def fn_J(mu):
def fn_A(mu):
A = [[0] * N for _ in range(N)]
for i, r in enumerate(mu):
for j, e in enumerate(r):
index = fn_index(i, j)
U_index = fn_index(i - 1, j)
D_index = fn_index(i + 1, j)
L_index = fn_index(i, j - 1)
R_index = fn_index(i, j + 1)
if e == "U":
if not (0 <= (i - 1) < len(mu) and 0 <= j < len(mu[0])):
print("is going beyond the gird")
raise ValueError
A[U_index][index] = 1
if e == "D":
if not (0 <= (i + 1) < len(mu) and 0 <= j < len(mu[0])):
print("is going beyond the gird")
raise ValueError
A[D_index][index] = 1
if e == "L":
if not (0 <= i < len(mu) and 0 <= (j - 1) < len(mu[0])):
print("is going beyond the gird")
raise ValueError
A[L_index][index] = 1
if e == "R":
if not (0 <= i < len(mu) and 0 <= (j + 1) < len(mu[0])):
print("is going beyond the gird")
raise ValueError
A[R_index][index] = 1
if e == "S":
A[index][index] = 1
return A
A = fn_A(mu)
A = np.array(A)
A = A.transpose()  # necessary: fn_A fills A[next][current], but the equation needs A[current][next]
one = np.identity(N)
tmp1 = one - al * A
tmp2 = inv(tmp1)
J = np.matmul(tmp2, g_bar)
return J
```
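A side note on the linear algebra: explicitly inverting $(I - \alpha A)$ works at this size, but solving the system directly is the cheaper and more numerically stable idiom. A minimal drop-in for the last lines of `fn_J`, reusing `N`, `al`, `g_bar` and the transposed `A` from above:
```python=
# Equivalent to J = inv(I - al * A) @ g_bar, without forming the inverse
J = np.linalg.solve(np.identity(N) - al * A, g_bar)
```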
```python=
def argmin_u(i, j, J):
cur_min = g[i][j] + al * J[fn_index(i, j)]
cur_control = "S"
if 0 <= (i - 1) < len(g) and 0 <= j < len(g[0]):
if g[i][j] + al * J[fn_index(i - 1, j)] < cur_min:
cur_min = g[i][j] + al * J[fn_index(i - 1, j)]
cur_control = "U"
if 0 <= (i + 1) < len(g) and 0 <= j < len(g[0]):
if g[i][j] + al * J[fn_index(i + 1, j)] < cur_min:
cur_min = g[i][j] + al * J[fn_index(i + 1, j)]
cur_control = "D"
if 0 <= i < len(g) and 0 <= (j - 1) < len(g[0]):
if g[i][j] + al * J[fn_index(i, j - 1)] < cur_min:
cur_min = g[i][j] + al * J[fn_index(i, j - 1)]
cur_control = "L"
if 0 <= i < len(g) and 0 <= (j + 1) < len(g[0]):
if g[i][j] + al * J[fn_index(i, j + 1)] < cur_min:
cur_min = g[i][j] + al * J[fn_index(i, j + 1)]
cur_control = "R"
return cur_control
# tot_value at N-th step is
# min \sum_{n=0}^N al^n g_n(x_n)
# take N to +inf
# init a policy for each cell in grid
mu = [["S"] * len(g[0]) for _ in range(len(g))]
# Up, Down, Right, Left, Stay
do_more = True
iter_count = 0
my_t_checker = tolerance_checker()
while do_more and iter_count < 1000:
# from mu_now(==policy of now), generate J_mu(==value function of now)
J = fn_J(mu)
# update mu_now(==policy of now) to be mu_next(==policy of next)
for i, r in enumerate(mu):
for j, e in enumerate(r):
# output is in {Up, Down, Right, Left, Stay}
mu[i][j] = argmin_u(i, j, J)
# choose the control that minimizes g(x_now) + alpha * J_mu_now(x_next)
# i,j is enough to stand for state x_now
do_more = my_t_checker.check(mu)
print(iter_count)
# Print policy
for r in mu:
for e in r:
print(e, end='')
print()
print()
# Print value function
for r in J.reshape(len(g), len(g[0])):
for e in r:
print(' % .2f ' % e, end='')
print()
print()
iter_count += 1
```
#### number of iterations
How many iterations does it take to converge?
:::info
It takes 7 iterations.
:::
The policy is
```
RRRRS
UURUU
ULRRD
ULLRS
```
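This is exactly the policy found by value iteration in (a), with R/L/U/D/S corresponding to the arrows →/←/↑/↓/Stay.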
The value function is
```
-95.08 -96.04 -97.01 -99.00 -100.00
-94.13 99999904 -96.03 -98.01 -89.00
-93.19 -82.26 99999903 -97.01 -99.00
-91.26 -90.34 -89.44 -89.00 -100.00
```
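Apart from the two obstacle cells, which show values near the finite stand-in $10^8$ instead of `inf`, this matches the value function from value iteration.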
### \(c\)
Compare the solutions and convergence/complexity of each algorithm to solve this problem.
#### value iteration
For value iteration, the time complexity is $O(i \cdot 5n)$, where $i = 1375$ is the number of iterations, $n = 20$ the number of grid cells, and $5$ the number of possible controls; here that amounts to roughly $137{,}500$ Bellman backups. The space complexity is $O(n)$ to store the previous iterate for the tolerance check.
#### policy iteration
For policy iteration, the time complexity is $O(i\,((n + x) + 5n))$, where $i = 7$ is the number of iterations, $n = 20$ the number of grid cells, $5$ the number of possible controls, and $x$ the cost of solving the $n \times n$ linear system, which is $O(n^3)$ for the Gauss-Jordan method. The space complexity is $O(n)$ to store the previous policy for the convergence check, but the Gauss-Jordan method takes $O(n^2)$ space for the matrix.
> http://mathforum.org/library/drmath/view/51908.html
> For the time of matrix inversion, there exists Strassen's method, which requires only $O(n^{\log_2 7}) = O(n^{2.807})$. The constant is not small (between 4 and 5), and the programming of Strassen's algorithm is so awkward that often Gaussian elimination is still the preferred method. Even Strassen's method is not optimal. I believe that the current record stands at $O(n^{2.376})$, thanks to Don Coppersmith and Shmuel Winograd.
For this problem, policy iteration seems better; for large problems, however, we may not be able to afford the matrix inversion.