---
tags: Academic
---

# CV HW 3: Deep Learning Review

Obviously, you should have read the [PyTorch Tutorial](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-custom-nn-modules).

## Problem A: Line Fitting

![](https://i.imgur.com/5Dff6V7.png)

The left plot is `pA1.csv` and the right is `pA2.csv`. The model for A1 is $y = ax + b$; the model for A2 is $y = w_0 x^2 + w_1 x + w_2$.

### A1

The answer is $y = 5x + 4$.

To draw the loss function (over the whole dataset) against the parameter space, i.e., the function whose minimum we want to find, one simply samples many combinations of parameters and computes the corresponding (average) loss. Take $(a, b) = (2, 3)$ for example: the model (prediction function) is now $y = 2x + 3$. Use this model to predict $y$ for every $x$ in our dataset and measure the *difference* between the predicted $y$ and the ground-truth $y$. The average of this *difference* is the loss. Different $(a, b)$ give different losses. We sample many $(a, b)$ and calculate the corresponding losses; these form the surface you see in the figure (a code sketch is given at the end of Problem A). At the same time, the black path in the figure is the trajectory of our SGD. You can see that it gets closer and closer to the correct $(a, b)$ as it moves toward the minimum of the surface.

PS. *difference* depends on your code. If you use `nn.L1Loss` as the `criterion`, *difference* means the absolute difference. If you use `nn.MSELoss`, it means the mean squared error.

![](https://i.imgur.com/rf7qszq.png =300x)

### A2

The answer is $y = -2x^2 + x + 4$.

$y = w_0x^2 + w_1x + w_2$ can be formulated as a perceptron:

```graphviz
digraph {
    rankdir=LR;
    A[label="1"];
    B[label="x"];
    C[label="x^2"];
    D[label="y"];
    A->D [label="w2"];
    B->D [label="w1"];
    C->D [label="w0"];
}
```

A perceptron is just an `nn.Linear`, so the code is:

```python=
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # One linear layer with 2 inputs (x^2 and x) and a bias: 3 parameters in total.
        self.fc = nn.Linear(2, 1, bias=True)

    def forward(self, xs):
        inp_b = torch.stack([xs ** 2, xs], dim=1)  # [N, 2]
        out_b = self.fc(inp_b).flatten()           # [N, 1] -> [N,]
        return out_b

model = Net()
data = Data('./pA2.csv')  # the Dataset class from A1 / the reference code
loader = DataLoader(data, batch_size=5, shuffle=True)

# similar training code to A1

print(model.fc.weight)  # tensor([[-1.9931, 1.0007]], requires_grad=True)
print(model.fc.bias)    # tensor([4.0012], requires_grad=True)
```

If your code gets the correct answer but the model is not equivalent to the one above, say you used 3 layers of `nn.Linear`, you get no points because the model (function) is not the same.

For the loss figure part, it should be easy to implement.
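To make the surface sampling described in A1 concrete, here is a minimal sketch for the two-parameter A1 case (which can be plotted directly). It assumes `pA1.csv` has a header row with $x$ in the first column and $y$ in the second — an assumption about the file layout — and uses MSE as the *difference*:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Assumption: pA1.csv has a header row, x in the first column, y in the second.
df = pd.read_csv('./pA1.csv')
x = df.iloc[:, 0].to_numpy()
y = df.iloc[:, 1].to_numpy()

# Sample a grid of candidate parameters around the expected answer (a=5, b=4).
a_grid, b_grid = np.meshgrid(np.linspace(0, 10, 100), np.linspace(-1, 9, 100))

# Broadcast to [100, 100, N]: one prediction per (a, b) per data point,
# then average the squared error over the dataset to get the MSE surface.
pred = a_grid[..., None] * x + b_grid[..., None]
loss = ((pred - y) ** 2).mean(axis=-1)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot_surface(a_grid, b_grid, loss, cmap='viridis', alpha=0.8)
ax.set_xlabel('a')
ax.set_ylabel('b')
ax.set_zlabel('loss')
plt.show()
```

For A2 the parameter space is three-dimensional, so instead of a single surface you can fix one weight (or plot 2-D slices) and apply the same sampling idea.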
## Problem B: License Plate Localization

### Requirement 1: Training & Validation Code

Since the training code is mostly the same as in Problem A, most problems occur in the validation code. A lot of people forgot to switch the model to evaluation mode during validation. You will not get the validation score if the TA did not see `model.eval()` or an equivalent statement in your code.

### Requirement 2: Visualization Code

Visualization is used for checking the model's results, and the code is given in the reference code...

### Requirement 3: Loss Plotting

Simple functions from `matplotlib` do the job. This figure is usually used for checking whether overfitting occurs: the training loss is far below the validation loss while the validation loss has converged.

### Requirement 4: Testing

There are 3 common problems:

1. Forgetting to denormalize the keypoints back to the scale of the image. The model predicts normalized keypoints in the range `[0, 1]`, but the coordinates required for submission are non-normalized, like the ones used in `train.csv` and `sample.csv`.
2. Forgetting to switch the model to evaluation mode. When a PyTorch model is instantiated, it is in training mode by default.
3. Using `glob` to find all paths of the testing images without being aware that the result of `glob` is not necessarily in alphabetical order.

If everything is implemented correctly, the reference code should achieve an $RMSE$ between $20$ and $30$.

### Improvement

Simple hyperparameter modification is enough. Training longer and using a smaller learning rate are able to get you the improvement score.

### Bonus

LR decay or using a modern model like resnet18 should be enough to get $RMSE \le 15$. I saw a lot of people achieve $14$ or smaller.

## Conclusion

The average is 71.5, while the raw score average (without late penalties and the TA Fix) is 79.5. This homework gives you 18 points (8 for A1, 10 for the report) even if you don't write any code. However, due to incorrect formats, many of you (29 people) were not able to get a high score. We have specified the format in every homework in this course. We did not penalize anyone in previous homeworks, although we should have according to the spec. I decided to strictly follow the spec for this homework, since I had informed you when announcing the homework and had put warning messages on iLMS.

If you have any questions or want to see my implementation of this homework, feel free to find me during office hours.

From left to right, the TAs in charge are 張克齊, 林哲聰, 黃鈺程, and 程薰瑩.

![](https://i.imgur.com/8fngqpR.png)