# MLTech HW3 Common Grading Policy Questions

> p7 and p9: **Requires $\beta$-dependent solution**

Due to several miscommunications and errors, some submissions were mistakenly penalized. Please wait for notification of further updates. Thank you for your patience, and apologies for the trouble.

<s>(Confirmed with the professor) Quoting directly: "If the student gave a beta-dependent T but then minimized over beta to get T_min = 2, I'll give a full score. If the student did not give a beta-dependent T at all, you (TA) can consider the answer incomplete ... I personally may give a 5 point penalty ..."</s>

> p10: **Missing multiplication of $w$ with $p$ when testing** \[legacy\]

Since dropout is not applied at test time, the overall number of activated neurons is $\frac{1}{p}$ times that during training. Thus, $w$ has to be scaled by $p$, i.e., $w' = wp$, to maintain a similar magnitude.

\[Edit\] However, as pointed out by 林庭風 (many thanks), the problem asks for the optimal solution of the objective function, so the rescaling is not necessary. Therefore, no penalty will be imposed.

> p10: **Typo**

The officially accepted solution is $w^* = (X^TX + \text{diag}(X^TX))^\dagger X^Ty$ or $w^* = 2(X^TX + \text{diag}(X^TX))^\dagger X^Ty$. Any constant-scaled solution, i.e., $w' = \alpha w^*$ with $\alpha \in \mathbb{R}$ (except when caused by the missing rescaling with $p$; see the point above), will be treated as a typo.
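The test-time rescaling argument from the legacy p10 policy can be sketched numerically. Everything below (the keep probability `p`, the weight matrix `w`, the input dimensions) is an illustrative placeholder, not data from the assignment:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                       # keep probability (illustrative value)
w = rng.normal(size=(4, 3))   # hypothetical trained weight matrix
x = rng.normal(size=3)        # hypothetical input

# Training: each input unit is kept with probability p,
# so on average only a fraction p of units contribute.
mask = rng.random(3) < p
train_out = w @ (mask * x)

# Testing: no dropout, so all units are active. Rescaling the
# weights by p matches the expected training-time magnitude,
# since E[mask * x] = p * x.
w_test = w * p
test_out = w_test @ x
```

Note that this rescaling changes the forward pass only by the constant factor $p$, which is why (per the \[Edit\] above) omitting it does not affect the minimizer of the objective and carries no penalty.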
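The accepted p10 closed form can be checked with a few lines of NumPy; `X` and `y` below are random placeholders, and `np.linalg.pinv` plays the role of the pseudo-inverse $\dagger$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))  # hypothetical design matrix
y = rng.normal(size=20)       # hypothetical targets

G = X.T @ X
# Accepted solution: w* = (X^T X + diag(X^T X))^† X^T y
A = G + np.diag(np.diag(G))
w_star = np.linalg.pinv(A) @ X.T @ y

# The alternative 2 * (...)^† X^T y differs only by a constant
# factor, which the policy above treats as a typo.
w_alt = 2 * w_star
```

For generic random data the matrix `A` is invertible, so `w_star` satisfies the corresponding linear system `A @ w_star = X.T @ y` exactly.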