# ML HW 1
Ivan Christian
## 1
### a)
| A \ B | B = 1 | B = 2 | B = 3 |
| -------- | -------- | -------- | -------- |
| A = 2 | $\frac{1}{36}$ | $\frac{5}{18}$ | $\frac{1}{9}$ |
| A = 3 | $\frac{1}{4}$ | $\frac{5}{36}$ | $\frac{7}{36}$ |
$P(A=3, B = 3)$ = $\frac{7}{36}$
### b)
$P(B = 1)$ = $\frac{1}{36} + \frac{1}{4}$ = $\frac{5}{18}$
$P(B = 2)$ = $\frac{5}{18} + \frac{5}{36}$= $\frac{5}{12}$
$P(B = 3)$ = $\frac{1}{9} + \frac{7}{36}$ = $\frac{11}{36}$
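As a quick check, the three marginal probabilities sum to 1:
$P(B=1) + P(B=2) + P(B=3) = \frac{10}{36} + \frac{15}{36} + \frac{11}{36} = 1$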
### c)
$E(X)$ = $\sum_x{x \cdot P(X=x)}$ = $1 \cdot \frac{5}{18} + 2 \cdot \frac{5}{12} + 3 \cdot \frac{11}{36}$ = $\frac{73}{36}$ = $2\frac{1}{36}$
---------------
$E(X^2)$ = $\sum_x{x^2 \cdot P(X=x)}$ = $1^2 \cdot \frac{5}{18} + 2^2 \cdot \frac{5}{12} + 3^2 \cdot \frac{11}{36}$
$E(X^2)$ = $\frac{5}{18} + \frac{20}{12} + \frac{99}{36} = \frac{169}{36}$
$Var(X) = E(X^2) - E(X)^2$
$Var(X) = \frac{169}{36} - \left(\frac{73}{36}\right)^2 = \frac{755}{1296} ≈ 0.5826$
### d)
$P(A = 2 | B = 1) = \frac{P(A=2, B = 1)}{P(B=1)} = \frac{\frac{1}{36}}{\frac{5}{18}} = 0.1$
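A short MATLAB sketch (variable names are my own) that reproduces parts (b)–(d) numerically from the joint table in part (a):
```
% Rows correspond to A = 2 and A = 3; columns to B = 1, 2, 3 (the table in part a).
P = [1/36 5/18 1/9; 1/4 5/36 7/36];
B = 1:3;
pB = sum(P, 1);                  % marginals of B: [5/18 5/12 11/36]
EB = sum(B .* pB);               % E(B) = 73/36
EB2 = sum(B.^2 .* pB);           % E(B^2) = 169/36
VarB = EB2 - EB^2;               % Var(B) = 755/1296 ~ 0.5826
pA2givenB1 = P(1,1) / pB(1);     % P(A = 2 | B = 1) = 0.1
fprintf("E = %.4f, Var = %.4f, P(A=2|B=1) = %.2f\n", EB, VarB, pA2givenB1)
```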
## 2
### a)
#### Range of the parameters
Each probability must lie in $[0, 1]$:
$0 ≤ P(X=1) = 2θ ≤ 1 \Rightarrow 0 ≤ θ ≤ \frac{1}{2}$
$0 ≤ P(X=2) = 3θ ≤ 1 \Rightarrow 0 ≤ θ ≤ \frac{1}{3}$
$0 ≤ P(X=3) = 1-5θ ≤ 1 \Rightarrow 0 ≤ θ ≤ \frac{1}{5}$
Taking the intersection of the three conditions gives the valid range:
$0 ≤ θ ≤ \frac{1}{5}$
#### Maximum Likelihood Estimator
$L(θ) = P(X=1)^{x_1} \cdot P(X=2)^{x_2} \cdot P(X=3)^{x_3}$, where $x_1, x_2, x_3$ are the observed counts of the outcomes $1, 2, 3$
$L(θ) = (2θ)^{x_1}(3θ)^{x_2}(1-5θ)^{x_3}$
$\log(L(θ)) = \log\left((2θ)^{x_1}(3θ)^{x_2}(1-5θ)^{x_3}\right)$
$\log(L(θ)) = x_1\log(2θ) + x_2\log(3θ) + x_3\log(1-5θ)$
The maximum is found where the derivative of the log-likelihood is 0:
$\frac{d}{dθ}\log(L(θ)) = \frac{x_1}{θ} + \frac{x_2}{θ} - \frac{5x_3}{1-5θ} = 0$
$\frac{x_1 + x_2}{θ} = \frac{5x_3}{1-5θ}$
$(x_1+x_2)(1-5θ) = 5θx_3 \Rightarrow x_1 + x_2 = 5θ(x_1+x_2+x_3)$
$\hat{θ} = \frac{x_1+x_2}{5(x_1+x_2+x_3)}$
To confirm that this is a maximum rather than a minimum, check the second derivative:
$\frac{d^2}{dθ^2}\log(L(θ)) = -\frac{x_1}{θ^2} - \frac{x_2}{θ^2} - \frac{25x_3}{(1-5θ)^2}$
Since $x_1, x_2, x_3 ≥ 0$ (with at least one observation) and $0 < θ < \frac{1}{5}$, this is strictly negative, so $\hat{θ}$ maximises the likelihood.
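A small MATLAB sketch comparing the closed-form estimator with a grid search over the log-likelihood; the counts `x1`, `x2`, `x3` are made up purely for illustration:
```
% Sketch: check the MLE formula numerically for some hypothetical counts.
x1 = 4; x2 = 6; x3 = 10;                       % made-up counts of outcomes 1, 2, 3
theta_hat = (x1 + x2) / (5*(x1 + x2 + x3));    % closed-form MLE derived above

loglik = @(t) x1*log(2*t) + x2*log(3*t) + x3*log(1 - 5*t);
t_grid = linspace(1e-4, 1/5 - 1e-4, 1e5);      % interior of the valid range
[~, idx] = max(loglik(t_grid));
fprintf("closed form: %.4f, grid search: %.4f\n", theta_hat, t_grid(idx))
```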
### b)
Let $L(a,b|X_1,…,X_n )=∏_{i=1}^{n}\frac{1}{b-a} 1_{(X_i∈[a,b])}$
where $1_{(X_i∈[a,b])}$ is the indicator function, equal to 1 when $a ≤ X_i ≤ b$ and 0 otherwise.
Provided $a ≤ \min_i X_i$ and $b ≥ \max_i X_i$ (otherwise the likelihood is 0):
$L(a,b|X_1,…,X_n)= \left(\frac{1}{b-a}\right)^n$
$\log(L(a,b|X_1,…,X_n)) = -n\log(b-a)$
$\frac{∂}{∂a}\log(L(a,b|X_1,…,X_n)) = \frac{n}{b-a} > 0$
$\frac{∂}{∂b}\log(L(a,b|X_1,…,X_n)) = -\frac{n}{b-a} < 0$
Since $n > 0$, neither derivative is ever zero, so there is no interior stationary point. The log-likelihood increases in $a$ and decreases in $b$, so it is maximised by making the interval $[a,b]$ as narrow as the data allow:
$\hat{a} = \min_i X_i$ and $\hat{b} = \max_i X_i$
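A quick MATLAB illustration on simulated data (the true $a = 2$, $b = 5$ are chosen arbitrarily):
```
% Sketch: the uniform MLE on simulated data.
X = 2 + 3*rand(1000, 1);       % 1000 draws from U[2, 5]
a_hat = min(X);                % MLE of a
b_hat = max(X);                % MLE of b
fprintf("a_hat = %.4f, b_hat = %.4f\n", a_hat, b_hat)
```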
## 3
### a)
$μ_1=\frac{(2+8)}{2}=5$
$μ_2=\frac{(1+3+9+10)}{4}=5.75$
### b)
$D_1=\{1,2,3\}$
$D_2=\{8,9,10\}$
### c)
$μ_1=\frac{(1+2+3)}{3}=2$
$μ_2=\frac{(8+9+10)}{3}=9$
The clustering is now stable: reassigning the points to the new centroids leaves both clusters unchanged, so the centroids no longer change.
$D_1=\{1,2,3\}$
$D_2=\{8,9,10\}$
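A MATLAB sketch of these two iterations, assuming the initial clusters $\{2, 8\}$ and $\{1, 3, 9, 10\}$ implied by part (a):
```
% Sketch: run the two k-means iterations from parts (a)-(c) on the 1-D data.
data = [1; 2; 3; 8; 9; 10];
mu = [mean([2 8]), mean([1 3 9 10])];            % initial centroids: [5, 5.75], as in part (a)
for iter = 1:2
    [~, assign] = min(abs(data - mu), [], 2);    % nearest-centroid assignment
    mu = [mean(data(assign == 1)), mean(data(assign == 2))];
end
disp(mu)                                         % expected to converge to [2, 9]
```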
## 4
### a)
1. One criterion for a valid probability density function is that it integrates to 1 over its support, i.e. the volume under $f(x,y)$ within the given limits must equal 1.
$=\int_{0}^{1}\int_{0}^{2} \frac{6}{7} \left(x^2+ \frac{xy}{2}\right) \,dy\,dx$
$=\frac{6}{7}\int_{0}^{1}\left[x^2y +\frac{xy^2}{4} \right]_0^2\,dx$
$=\frac{6}{7}\int_{0}^{1}\left(2x^2 + x\right)\,dx$
$=\frac{6}{7}\left[\frac{2}{3}x^3 + \frac{1}{2}x^2\right]_{0}^{1} = \frac{6}{7}\cdot\frac{7}{6}$
$= 1$
2. $f(x,y) ≥ 0$ for all $x∈[0,1]$ and $y∈[0,2]$, so the non-negativity criterion is also satisfied and $f$ is a valid pdf.
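A quick numerical check of this double integral in MATLAB, using `integral2` (a sketch; the handle `f` restates the given density):
```
% Sketch: numerical check that f integrates to 1 over [0,1] x [0,2].
f = @(x, y) (6/7) * (x.^2 + x.*y/2);
total = integral2(f, 0, 1, 0, 2);   % expected to be 1 (up to numerical error)
disp(total)
```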
### b)
PDF of X (Marginal probability of X)
$g(x)=∫_0^2f(x,y)dy$
$g(x)=∫_0^2\frac{6}{7} (x^2+ \frac{xy}{2}) dy$
$g(x)=[\frac{6}{7} (x^2y+ \frac{xy^2}{4})]_0^2$
$g(x)=\left[\frac{6x^2y}{7} + \frac{3xy^2}{14}\right]_0^2$
$g(x)=\frac{12x^2}{7} + \frac{6x}{7}$ for $0 ≤ x ≤ 1$, and $g(x) = 0$ otherwise
### c)
Hence, using the marginal pdf of $X$ at $x=1$, $g(1) = \frac{12}{7} + \frac{6}{7} = \frac{18}{7}$:
$f_{Y|X} (y \mid x=1) = \frac{f(x=1,y)}{g(x=1)}$
$f_{Y|X} (y \mid x=1) = \frac{\frac{6}{7}\left(1 + \frac{y}{2}\right)}{\frac{18}{7}}$
$f_{Y|X} (y \mid x=1) = \frac{2+y}{6}$ for $0 ≤ y ≤ 2$, and $0$ otherwise
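As a sanity check, this conditional density integrates to 1 over $[0, 2]$:
$\int_0^2 \frac{2+y}{6}\,dy = \left[\frac{2y}{6} + \frac{y^2}{12}\right]_0^2 = \frac{4}{6} + \frac{4}{12} = 1$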
### d)
To check for independence, we need $f(x,y) = f_X(x) \cdot f_Y(y) = g(x)\,g(y)$ for all $(x,y)$.
Marginal pdf of $Y$:
$g(y)=∫_0^1 f(x,y)\,dx$
$g(y)=∫_0^1\frac{6}{7} \left(x^2+ \frac{xy}{2}\right) dx$
$g(y)=\left[\frac{6}{7}\left(\frac{x^3}{3}+\frac{x^2y}{4}\right)\right]_0^1$
$g(y)=\frac{6}{7}\left(\frac{1}{3}+\frac{y}{4}\right) = \frac{2}{7} + \frac{3y}{14}$ for $0 ≤ y ≤ 2$, and $0$ otherwise
Checking a single point is enough: at $(x,y)=(1,2)$, $f(1,2) = \frac{12}{7}$ while $g(1)\,g(2) = \frac{18}{7} \cdot \frac{5}{7} = \frac{90}{49} ≠ \frac{12}{7}$, so $f(x,y) ≠ f_X(x) \cdot f_Y(y)$.
Therefore, $X$ and $Y$ are not independent.
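A short MATLAB sketch that recomputes the marginals numerically and checks the inequality at $(1, 2)$; the handle `f` restates the given density:
```
% Sketch: numerical check of the marginals and the independence test at (x, y) = (1, 2).
f  = @(x, y) (6/7) * (x.^2 + x.*y/2);
gX = @(x) integral(@(y) f(x, y), 0, 2);          % marginal of X at a point x
gY = @(y) integral(@(x) f(x, y), 0, 1);          % marginal of Y at a point y
fprintf("f(1,2) = %.4f, gX(1)*gY(2) = %.4f\n", f(1, 2), gX(1) * gY(2))
```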
## 5
### a)
```
fprintf("Question 5a\n")
RIASEC = readtable('RIASEC.csv');
only_R = table2array(RIASEC(:,2:9));
m = size(only_R,1);
row_to_remove = [];
for row = 1:m
if any(only_R(row,:)<0,2) % returns true if row contains any value less than 0
row_to_remove = [row_to_remove,row];
end
end
only_R(row_to_remove,:) = [];
```
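The same filtering can also be done without the loop, using logical row indexing on `only_R` as loaded from the table (a sketch, equivalent to the loop above):
```
% Vectorized alternative: keep only rows with no negative entries.
only_R = only_R(~any(only_R < 0, 2), :);
```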
### b)
```
fprintf("Question 5b\n")
total_R = sum(only_R,2)/8;
R1 = only_R(:,1);
x = [ones(6500,1) R1(1:6500)];
y = total_R(1:6500);
theta = inv(x.'*x)*x.'*y; % theta0 = 8.1448, theta1 = 3.3888
disp("estimated regression function: " + "y = "+ theta(1) + "+" + theta(2) + "x")
predicted = theta(1) + theta(2)*x(:,2);
RSS = sum((y - predicted).^2);
disp("Residual sum of squares: " + RSS)
RSS_avg = RSS/size(y,1);
disp("Residual sum of squares averaged: " + RSS_avg)
```
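For reference, a sketch of the same fit using MATLAB's backslash operator instead of `inv`, which solves the least-squares problem directly and should reproduce `theta` up to numerical precision:
```
theta_bs = x \ y;      % least-squares solution via QR, numerically preferable to inv()
disp(theta_bs.')       % expected to match theta from part (b)
```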

### c)
```
fprintf("Question 5c\n")
test_x = R1(6501:end);
test_y = total_R(6501:end);
test_predicted = theta(1) + theta(2)*test_x;
test_RSS = sum((test_y - test_predicted).^2);
disp("Residual sum of squares of remaining people: " + test_RSS)
test_RSS_avg = test_RSS/(size(test_y,1));
disp("Residual sum of squares of remaining people averaged: " + test_RSS_avg)
disp("Residual sum of squares for part b: " + RSS_avg)
if RSS_avg < test_RSS_avg
disp("RSS of test (part c) is more than training (part b)")
end
```
