## Exercise 1
The sum of two independent normally distributed random variables is another normally distributed random variable with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$, i.e., standard deviation $\sqrt{\sigma_1^2 + \sigma_2^2}$ (not $\sigma_1 + \sigma_2$).
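As a quick sanity check, a Monte Carlo sketch (assuming independence; the means, standard deviations, sample size, and seed are illustrative choices of mine):

```matlab
% Sum of two independent normals:
% N(mu1, sigma1^2) + N(mu2, sigma2^2) ~ N(mu1 + mu2, sigma1^2 + sigma2^2)
rng(0);                              % fix the seed for reproducibility
mu1 = 2;  sigma1 = 3;
mu2 = -1; sigma2 = 4;
n = 1e6;
s = (mu1 + sigma1 * randn(n, 1)) + (mu2 + sigma2 * randn(n, 1));
fprintf('mean: %.3f (expected %.3f)\n', mean(s), mu1 + mu2);
fprintf('std:  %.3f (expected %.3f)\n', std(s), sqrt(sigma1^2 + sigma2^2));
```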
## Exercise 2
Answer: `y(106) = 12036`
### Approach
#### Polynomial regression
I used polynomial regression to find a polynomial that best fits the given data, then used that polynomial to extrapolate the data to `y(106)`.
#### Second order
I began with the quadratic polynomial $bX + cX^2$ and calculated the coefficients `b` and `c` analytically. The idea is to formulate a linear system of two equations by requiring that the gradient of the squared residual vanishes at the minimum.
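Setting the partial derivatives of the squared residual $\sum_i \left(y_i - b x_i - c x_i^2\right)^2$ with respect to $b$ and $c$ to zero yields the normal equations:

$$
b \sum_i x_i^2 + c \sum_i x_i^3 = \sum_i x_i y_i, \qquad
b \sum_i x_i^3 + c \sum_i x_i^4 = \sum_i x_i^2 y_i.
$$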
```matlab
function result = second_order(x, y, subset)
    syms b c
    % Normal equations: gradient of the squared residual set to zero
    eq1 = 2 * sum(x.^2) * b + 2 * sum(x.^3) * c - 2 * sum(x .* y) == 0;
    eq2 = 2 * sum(x.^3) * b + 2 * sum(x.^4) * c - 2 * sum(x.^2 .* y) == 0;
    % Solve the linear system of two equations
    sol = solve([eq1, eq2], [b, c]);
    % Evaluate the fitted quadratic on `subset` to extrapolate the data
    result = sol.b .* subset + sol.c .* subset.^2;
end
```
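For reference, the same system can be solved numerically, without the Symbolic Math Toolbox, by building a design matrix and using MATLAB's backslash operator (a sketch; `second_order_numeric` is my own name, and `x`, `y`, `subset` are row vectors as in the original function):

```matlab
% Numeric least-squares fit of y ~ b*x + c*x.^2
function result = second_order_numeric(x, y, subset)
    A = [x(:), x(:).^2];    % design matrix with columns x and x^2
    coeffs = A \ y(:);      % least-squares solution of A * coeffs ~ y
    result = coeffs(1) .* subset + coeffs(2) .* subset.^2;
end
```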
#### Higher orders
The second-order polynomial is simple to solve by hand. However, higher-order terms may be needed to fit the data well and avoid underfitting during extrapolation. Hence I added higher-order terms up to fifth order: $aX + bX^2 + cX^3 + dX^4 + eX^5$.
```matlab
% Return an intercept-free polynomial of the given degree as an
% anonymous function (the second argument is unused)
function polynomial = find_polynomial(degree, x)
    switch degree
        case 3
            polynomial = @(a, b, c, x)((a * x) + (b * x.^2) + (c * x.^3));
        case 4
            polynomial = @(a, b, c, d, x)((a * x) + (b * x.^2) + (c * x.^3) + (d * x.^4));
        case 5
            polynomial = @(a, b, c, d, e, x)((a * x) + (b * x.^2) + (c * x.^3) + (d * x.^4) + (e * x.^5));
    end
end
```
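For instance, the degree-3 handle can be evaluated directly; the coefficient values here are made up for illustration (the `x` argument of `find_polynomial` itself is not used):

```matlab
p3 = find_polynomial(3, []);        % returns @(a, b, c, x) a*x + b*x.^2 + c*x.^3
y_hat = p3(1.0, 0.5, -0.01, 1:5);   % evaluate at x = 1..5
```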
#### Least squares fitting
Since solving the normal equations by hand becomes impractical for orders above two, I used least-squares fitting (MATLAB's `fit`) to determine the polynomial coefficients.
```matlab
% Least-squares fitting to find the polynomial coefficients
function [polynomial, coefficients] = least_squares_fit(x, y, order)
    polynomial = find_polynomial(order, x);
    % fit expects column vectors, hence the transposes
    coeff = fit(transpose(x), transpose(y), fittype(polynomial));
    coefficients = cell(order, 1);
    elements = char(coeffnames(coeff));
    for index = 1:order
        coefficients{index, 1} = coeff.(elements(index));
    end
end
```
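A typical call, mirroring how the main script uses these helpers (assuming `x` and `y` are row vectors of equal length):

```matlab
[polynomial, coefficients] = least_squares_fit(x, y, 3);
fitted = polynomial(coefficients{:}, x);   % evaluate the fitted cubic at x
```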
#### Validation by mean squared error
Next I need to determine which polynomial order best fits the data while keeping the prediction error low. The prediction error is calculated as the mean squared difference between the true data and the fitted data. Comparing the mean squared error across all the polynomials (order = {2, 3, 4, 5}) tells us the right order of polynomial.
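Concretely, for $n$ data points $y_i$ with fitted values $\hat{y}_i$:

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2.
$$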
```matlab
% Validation using mean squared error
% (output named `mse` to avoid shadowing MATLAB's built-in `error`)
function mse = mean_squared_error(x, y, degree)
    if degree == 2
        result = second_order(x, y, x);
    else
        [polynomial, coefficients] = least_squares_fit(x, y, degree);
        result = polynomial(coefficients{:}, x);
    end
    mse = sum((result - y).^2) / length(x);
end
```
### Results
Almost all the polynomials fit the given data well, though adding higher-order terms visibly improves the fit.

Let's try extrapolating the curves up to x = 106.

Here the results look very different across the polynomials. Looking more closely, the fourth-order curve turns negative, which makes no sense, as the input data has an increasing trend as a whole. On this ground alone we can rule out the fourth-order polynomial before even computing the prediction error.
Mean squared error (MSE), in units of $10^4$:
```
order 2 = 1.1203
order 3 = 0.7053
order 4 = 0.5471
order 5 = 0.4559
```
The fifth-order polynomial has the lowest MSE but seems to exaggerate the extrapolated values, while the second order has a higher MSE than the third. Considering all this, I chose the third-order polynomial, which gives `y(106) = 12036`.
I'm attaching the full MATLAB script here for your reference:
```matlab
input = load("y.mat").y;
X = linspace(1, length(input), length(input));
% Data extrapolation
eX = linspace(1, 106, 106);
figure
plot(X, input, 'DisplayName','Given data','LineWidth',3)
second_order_result = second_order(X, input, eX);
[polynomial, coefficients] = least_squares_fit(X, input, 3);
third_order_result = polynomial(coefficients{:}, eX);
[polynomial, coefficients] = least_squares_fit(X, input, 4);
fourth_order_result = polynomial(coefficients{:}, eX);
[polynomial, coefficients] = least_squares_fit(X, input, 5);
fifth_order_result = polynomial(coefficients{:}, eX);
disp(double(third_order_result(length(third_order_result))));
hold on
plot(eX, second_order_result, 'DisplayName', 'Second', 'LineWidth', 3);
plot(eX, third_order_result, 'DisplayName', 'Third', 'LineWidth', 3);
plot(eX, fourth_order_result, 'DisplayName', 'Fourth', 'LineWidth', 3);
plot(eX, fifth_order_result, 'DisplayName', 'Fifth', 'LineWidth', 3);
legend('Location', 'northwest', 'FontSize', 15);
xlabel('X', 'FontSize', 15);
ylabel('Y', 'FontSize', 15);
prediction_error = zeros(4, 1);
for index = 1:4
    degree = index + 1;
    prediction_error(index) = mean_squared_error(X, input, degree);
end
% Return an intercept-free polynomial of the given degree as an
% anonymous function (the second argument is unused)
function polynomial = find_polynomial(degree, x)
    switch degree
        case 3
            polynomial = @(a, b, c, x)((a * x) + (b * x.^2) + (c * x.^3));
        case 4
            polynomial = @(a, b, c, d, x)((a * x) + (b * x.^2) + (c * x.^3) + ...
                (d * x.^4));
        case 5
            polynomial = @(a, b, c, d, e, x)((a * x) + (b * x.^2) + (c * x.^3) + ...
                (d * x.^4) + (e * x.^5));
    end
end
% Least-squares fitting to find the polynomial coefficients
function [polynomial, coefficients] = least_squares_fit(x, y, order)
    polynomial = find_polynomial(order, x);
    % fit expects column vectors, hence the transposes
    coeff = fit(transpose(x), transpose(y), fittype(polynomial));
    coefficients = cell(order, 1);
    elements = char(coeffnames(coeff));
    for index = 1:order
        coefficients{index, 1} = coeff.(elements(index));
    end
end
function result = second_order(x, y, subset)
    syms b c
    % Normal equations: gradient of the squared residual set to zero
    eq1 = 2 * sum(x.^2) * b + 2 * sum(x.^3) * c - 2 * sum(x .* y) == 0;
    eq2 = 2 * sum(x.^3) * b + 2 * sum(x.^4) * c - 2 * sum(x.^2 .* y) == 0;
    % Solve the linear system of two equations
    sol = solve([eq1, eq2], [b, c]);
    % Evaluate the fitted quadratic on `subset` to extrapolate the data
    result = sol.b .* subset + sol.c .* subset.^2;
end
% Validation using mean squared error
% (output named `mse` to avoid shadowing MATLAB's built-in `error`)
function mse = mean_squared_error(x, y, degree)
    if degree == 2
        result = second_order(x, y, x);
    else
        [polynomial, coefficients] = least_squares_fit(x, y, degree);
        result = polynomial(coefficients{:}, x);
    end
    mse = sum((result - y).^2) / length(x);
end
```