# ML HW 1
Ivan Christian
## 1
### a)
| A \ B | B = 1 | B = 2 | B = 3 |
| -------- | -------- | -------- | -------- |
| A = 2 | $\frac{1}{36}$ | $\frac{5}{18}$ | $\frac{1}{9}$ |
| A = 3 | $\frac{1}{4}$ | $\frac{5}{36}$ | $\frac{7}{36}$ |
$P(A=3, B = 3)$ = $\frac{7}{36}$
### b)
$P(B = 1)$ = $\frac{1}{36} + \frac{1}{4}$ = $\frac{5}{18}$
$P(B = 2)$ = $\frac{5}{18} + \frac{5}{36}$= $\frac{5}{12}$
$P(B = 3)$ = $\frac{1}{9} + \frac{7}{36}$ = $\frac{11}{36}$
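As a quick check, the three marginal probabilities sum to 1:
$P(B=1) + P(B=2) + P(B=3) = \frac{10}{36} + \frac{15}{36} + \frac{11}{36} = 1$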
### c)
$E(X)$ = $\sum_x{x \cdot P(X=x)}$ = $1 \cdot \frac{5}{18} + 2 \cdot \frac{5}{12} + 3 \cdot \frac{11}{36}$ = $\frac{73}{36}$ = $2\frac{1}{36}$
---------------
$E(X^2)$ = $\sum_x{x^2 \cdot P(X=x)}$ = $1^2 \cdot \frac{5}{18} + 2^2 \cdot \frac{5}{12} + 3^2 \cdot \frac{11}{36}$
$E(X^2)$ = $\frac{5}{18} + \frac{20}{12} + \frac{99}{36} = \frac{169}{36}$
$Var(X) = E(X^2) - E(X)^2$
$Var(X) = \frac{169}{36} - \left(\frac{73}{36}\right)^2 = \frac{755}{1296} ≈ 0.5826$
### d)
$P(A = 2 | B = 1) = \frac{P(A=2, B = 1)}{P(B=1)} = \frac{\frac{1}{36}}{\frac{5}{18}} = 0.1$
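A short MATLAB sketch (variable names are my own) that reproduces parts (b)–(d) numerically from the joint table in part (a):
```
% Rows correspond to A = 2 and A = 3; columns to B = 1, 2, 3 (the table in part a).
P = [1/36 5/18 1/9; 1/4 5/36 7/36];
B = 1:3;
pB = sum(P, 1);                  % marginals of B: [5/18 5/12 11/36]
EB = sum(B .* pB);               % E(B) = 73/36
EB2 = sum(B.^2 .* pB);           % E(B^2) = 169/36
VarB = EB2 - EB^2;               % Var(B) = 755/1296 ~ 0.5826
pA2givenB1 = P(1,1) / pB(1);     % P(A = 2 | B = 1) = 0.1
fprintf("E = %.4f, Var = %.4f, P(A=2|B=1) = %.2f\n", EB, VarB, pA2givenB1)
```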
## 2
### a)
#### Range of the parameters
Each probability must lie in $[0, 1]$:
$0 ≤ P(X=1) = 2θ ≤ 1 \Rightarrow 0 ≤ θ ≤ \frac{1}{2}$
$0 ≤ P(X=2) = 3θ ≤ 1 \Rightarrow 0 ≤ θ ≤ \frac{1}{3}$
$0 ≤ P(X=3) = 1-5θ ≤ 1 \Rightarrow 0 ≤ θ ≤ \frac{1}{5}$
Taking the intersection of the three conditions gives the valid range:
$0 ≤ θ ≤ \frac{1}{5}$
#### Maximum Likelihood Estimator
$L(θ) = P(X=1)^{x_1} \cdot P(X=2)^{x_2} \cdot P(X=3)^{x_3}$, where $x_1, x_2, x_3$ are the observed counts of the outcomes $1, 2, 3$
$L(θ) = (2θ)^{x_1}(3θ)^{x_2}(1-5θ)^{x_3}$
$\log(L(θ)) = \log\left((2θ)^{x_1}(3θ)^{x_2}(1-5θ)^{x_3}\right)$
$\log(L(θ)) = x_1\log(2θ) + x_2\log(3θ) + x_3\log(1-5θ)$
The maximum is found where the derivative of the log-likelihood is 0:
$\frac{d}{dθ}\log(L(θ)) = \frac{x_1}{θ} + \frac{x_2}{θ} - \frac{5x_3}{1-5θ} = 0$
$\frac{x_1 + x_2}{θ} = \frac{5x_3}{1-5θ}$
$(x_1+x_2)(1-5θ) = 5θx_3 \Rightarrow x_1 + x_2 = 5θ(x_1+x_2+x_3)$
$\hat{θ} = \frac{x_1+x_2}{5(x_1+x_2+x_3)}$
To confirm that this is a maximum rather than a minimum, check the second derivative:
$\frac{d^2}{dθ^2}\log(L(θ)) = -\frac{x_1}{θ^2} - \frac{x_2}{θ^2} - \frac{25x_3}{(1-5θ)^2}$
Since $x_1, x_2, x_3 ≥ 0$ (with at least one observation) and $0 < θ < \frac{1}{5}$, this is strictly negative, so $\hat{θ}$ maximises the likelihood.
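A small MATLAB sketch comparing the closed-form estimator with a grid search over the log-likelihood; the counts `x1`, `x2`, `x3` are made up purely for illustration:
```
% Sketch: check the MLE formula numerically for some hypothetical counts.
x1 = 4; x2 = 6; x3 = 10;                       % made-up counts of outcomes 1, 2, 3
theta_hat = (x1 + x2) / (5*(x1 + x2 + x3));    % closed-form MLE derived above

loglik = @(t) x1*log(2*t) + x2*log(3*t) + x3*log(1 - 5*t);
t_grid = linspace(1e-4, 1/5 - 1e-4, 1e5);      % interior of the valid range
[~, idx] = max(loglik(t_grid));
fprintf("closed form: %.4f, grid search: %.4f\n", theta_hat, t_grid(idx))
```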
### b)
Let $L(a,b|X_1,…,X_n )=∏_{i=1}^{n}\frac{1}{b-a} 1_{(X_i∈[a,b])}$
where $1_{(X_i∈[a,b])}$ is the indicator function, equal to 1 when $a ≤ X_i ≤ b$ and 0 otherwise.
Provided $a ≤ \min_i X_i$ and $b ≥ \max_i X_i$ (otherwise the likelihood is 0):
$L(a,b|X_1,…,X_n)= \left(\frac{1}{b-a}\right)^n$
$\log(L(a,b|X_1,…,X_n)) = -n\log(b-a)$
$\frac{∂}{∂a}\log(L(a,b|X_1,…,X_n)) = \frac{n}{b-a} > 0$
$\frac{∂}{∂b}\log(L(a,b|X_1,…,X_n)) = -\frac{n}{b-a} < 0$
Since $n > 0$, neither derivative is ever zero, so there is no interior stationary point. The log-likelihood increases in $a$ and decreases in $b$, so it is maximised by making the interval $[a,b]$ as narrow as the data allow:
$\hat{a} = \min_i X_i$ and $\hat{b} = \max_i X_i$
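A quick MATLAB illustration on simulated data (the true $a = 2$, $b = 5$ are chosen arbitrarily):
```
% Sketch: the uniform MLE on simulated data.
X = 2 + 3*rand(1000, 1);       % 1000 draws from U[2, 5]
a_hat = min(X);                % MLE of a
b_hat = max(X);                % MLE of b
fprintf("a_hat = %.4f, b_hat = %.4f\n", a_hat, b_hat)
```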
## 3
### a)
$μ_1=\frac{(2+8)}{2}=5$
$μ_2=\frac{(1+3+9+10)}{4}=5.75$
### b)
$D_1=\{1,2,3\}$
$D_2=\{8,9,10\}$
### c)
$μ_1=\frac{(1+2+3)}{3}=2$
$μ_2=\frac{(8+9+10)}{3}=9$
The clustering is now stable: reassigning the points to the new centroids leaves both clusters unchanged, so the centroids no longer change.
$D_1=\{1,2,3\}$
$D_2=\{8,9,10\}$
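A MATLAB sketch of these two iterations, assuming the initial clusters $\{2, 8\}$ and $\{1, 3, 9, 10\}$ implied by part (a):
```
% Sketch: run the two k-means iterations from parts (a)-(c) on the 1-D data.
data = [1; 2; 3; 8; 9; 10];
mu = [mean([2 8]), mean([1 3 9 10])];            % initial centroids: [5, 5.75], as in part (a)
for iter = 1:2
    [~, assign] = min(abs(data - mu), [], 2);    % nearest-centroid assignment
    mu = [mean(data(assign == 1)), mean(data(assign == 2))];
end
disp(mu)                                         % expected to converge to [2, 9]
```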
## 4
### a)
1. One criterion for a valid probability density function is that it integrates to 1 over its support, i.e. the volume under $f(x,y)$ within the given limits must equal 1.
$=\int_{0}^{1}\int_{0}^{2} \frac{6}{7} \left(x^2+ \frac{xy}{2}\right) \,dy\,dx$
$=\frac{6}{7}\int_{0}^{1}\left[x^2y +\frac{xy^2}{4} \right]_0^2\,dx$
$=\frac{6}{7}\int_{0}^{1}\left(2x^2 + x\right)\,dx$
$=\frac{6}{7}\left[\frac{2}{3}x^3 + \frac{1}{2}x^2\right]_{0}^{1} = \frac{6}{7}\cdot\frac{7}{6}$
$= 1$
2. $f(x,y) ≥ 0$ for all $x∈[0,1]$ and $y∈[0,2]$, so the non-negativity criterion is also satisfied and $f$ is a valid pdf.
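A quick numerical check of this double integral in MATLAB, using `integral2` (a sketch; the handle `f` restates the given density):
```
% Sketch: numerical check that f integrates to 1 over [0,1] x [0,2].
f = @(x, y) (6/7) * (x.^2 + x.*y/2);
total = integral2(f, 0, 1, 0, 2);   % expected to be 1 (up to numerical error)
disp(total)
```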
### b)
PDF of X (Marginal probability of X)
$g(x)=∫_0^2f(x,y)dy$
$g(x)=∫_0^2\frac{6}{7} (x^2+ \frac{xy}{2}) dy$
$g(x)=[\frac{6}{7} (x^2y+ \frac{xy^2}{4})]_0^2$
$g(x)=\left[\frac{6x^2y}{7} + \frac{3xy^2}{14}\right]_0^2$
$g(x)=\frac{12x^2}{7} + \frac{6x}{7}$ for $0 ≤ x ≤ 1$, and $g(x) = 0$ otherwise
### c)
Hence, using the marginal pdf of $X$ at $x=1$, $g(1) = \frac{12}{7} + \frac{6}{7} = \frac{18}{7}$:
$f_{Y|X} (y \mid x=1) = \frac{f(x=1,y)}{g(x=1)}$
$f_{Y|X} (y \mid x=1) = \frac{\frac{6}{7}\left(1 + \frac{y}{2}\right)}{\frac{18}{7}}$
$f_{Y|X} (y \mid x=1) = \frac{2+y}{6}$ for $0 ≤ y ≤ 2$, and $0$ otherwise
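As a sanity check, this conditional density integrates to 1 over $[0, 2]$:
$\int_0^2 \frac{2+y}{6}\,dy = \left[\frac{2y}{6} + \frac{y^2}{12}\right]_0^2 = \frac{4}{6} + \frac{4}{12} = 1$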
### d)
To check for independence, we need $f(x,y) = f_X(x) \cdot f_Y(y) = g(x)\,g(y)$ for all $(x,y)$.
Marginal pdf of $Y$:
$g(y)=∫_0^1 f(x,y)\,dx$
$g(y)=∫_0^1\frac{6}{7} \left(x^2+ \frac{xy}{2}\right) dx$
$g(y)=\left[\frac{6}{7}\left(\frac{x^3}{3}+\frac{x^2y}{4}\right)\right]_0^1$
$g(y)=\frac{6}{7}\left(\frac{1}{3}+\frac{y}{4}\right) = \frac{2}{7} + \frac{3y}{14}$ for $0 ≤ y ≤ 2$, and $0$ otherwise
Checking a single point is enough: at $(x,y)=(1,2)$, $f(1,2) = \frac{12}{7}$ while $g(1)\,g(2) = \frac{18}{7} \cdot \frac{5}{7} = \frac{90}{49} ≠ \frac{12}{7}$, so $f(x,y) ≠ f_X(x) \cdot f_Y(y)$.
Therefore, $X$ and $Y$ are not independent.
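A short MATLAB sketch that recomputes the marginals numerically and checks the inequality at $(1, 2)$; the handle `f` restates the given density:
```
% Sketch: numerical check of the marginals and the independence test at (x, y) = (1, 2).
f  = @(x, y) (6/7) * (x.^2 + x.*y/2);
gX = @(x) integral(@(y) f(x, y), 0, 2);          % marginal of X at a point x
gY = @(y) integral(@(x) f(x, y), 0, 1);          % marginal of Y at a point y
fprintf("f(1,2) = %.4f, gX(1)*gY(2) = %.4f\n", f(1, 2), gX(1) * gY(2))
```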
## 5
### a)
```
fprintf("Question 5a\n")
RIASEC = readtable('RIASEC.csv');
only_R = table2array(RIASEC(:,2:9));
m = size(only_R,1);
row_to_remove = [];
for row = 1:m
if any(only_R(row,:)<0,2) % returns true if row contains any value less than 0
row_to_remove = [row_to_remove,row];
end
end
only_R(row_to_remove,:) = [];
```
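The same filtering can also be done without the loop, using logical row indexing on `only_R` as loaded from the table (a sketch, equivalent to the loop above):
```
% Vectorized alternative: keep only rows with no negative entries.
only_R = only_R(~any(only_R < 0, 2), :);
```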
### b)
```
fprintf("Question 5b\n")
total_R = sum(only_R,2)/8;
R1 = only_R(:,1);
x = [ones(6500,1) R1(1:6500)];
y = total_R(1:6500);
theta = inv(x.'*x)*x.'*y; % theta0 = 8.1448, theta1 = 3.3888
disp("estimated regression function: " + "y = "+ theta(1) + "+" + theta(2) + "x")
predicted = theta(1) + theta(2)*x(:,2);
RSS = sum((y - predicted).^2);
disp("Residual sum of squares: " + RSS)
RSS_avg = RSS/size(y,1);
disp("Residual sum of squares averaged: " + RSS_avg)
```
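For reference, a sketch of the same fit using MATLAB's backslash operator instead of `inv`, which solves the least-squares problem directly and should reproduce `theta` up to numerical precision:
```
theta_bs = x \ y;      % least-squares solution via QR, numerically preferable to inv()
disp(theta_bs.')       % expected to match theta from part (b)
```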

### c)
```
fprintf("Question 5c\n")
test_x = R1(6501:end);
test_y = total_R(6501:end);
test_predicted = theta(1) + theta(2)*test_x;
test_RSS = sum((test_y - test_predicted).^2);
disp("Residual sum of squares of remaining people: " + test_RSS)
test_RSS_avg = test_RSS/(size(test_y,1));
disp("Residual sum of squares of remaining people averaged: " + test_RSS_avg)
disp("Residual sum of squares for part b: " + RSS_avg)
if RSS_avg < test_RSS_avg
disp("RSS of test (part c) is more than training (part b)")
end
```
