# MLTech HW3 Common Grading Policy Questions
> p7 and p9 **Require a $\beta$-dependent solution**
Due to various communication misunderstandings and errors, some submissions were mistakenly penalized. Please wait for a notification on further updates. Thank you for your patience, and apologies for the trouble.
<s>(Confirmed with Professor) Directly quoting "If the student gave a beta-dependent T but then minimized over beta to get T_min = 2, I'll give a full score. If the student did not give a beta-dependent T at all, you (TA) can consider the answer incomplete ... I personally may give a 5 point penalty ..."</s>
> p10 **Missing multiplication of w with p when testing** \[legacy\]
Since dropout is not applied at test time, the expected number of activated neurons is $\frac{1}{p}$ times that during training. Thus, $w$ has to be scaled by $p$, i.e., $w' = pw$, to maintain a similar output magnitude.
\[Edit\] However, as 林庭風 pointed out (many thanks), the problem asks for the optimal solution of the objective function, so the rescaling is not necessary. Therefore, no penalty will be imposed.
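To illustrate the (optional) test-time rescaling, here is a minimal sketch. It assumes the convention that each activation is kept with probability $p$ during training; the data and variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                            # keep probability during training (assumed convention)
x = rng.normal(size=(1000, 10))    # hypothetical hidden-layer activations
w = rng.normal(size=10)            # hypothetical weights into one output unit

# Training: each activation is kept with probability p, others zeroed.
mask = rng.random(x.shape) < p
train_out = (x * mask) @ w

# Testing: no dropout is applied, so scale the weights by p instead.
test_out = x @ (p * w)

# In expectation a masked activation equals p * x, which is exactly what
# scaling the weights by p reproduces:
print(np.allclose((x * p) @ w, test_out))  # True
```

The design choice here is the "weight scaling" convention: scaling $w$ by $p$ at test time is equivalent to scaling the activations by $p$, which matches their expected value under dropout.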
> p10 **Typo**
The official accepted solution is $w^* = (X^TX + \text{diag}(X^TX))^\dagger X^Ty$ or $w^* = 2(X^TX + \text{diag}(X^TX))^\dagger X^Ty$.
Any constant-scaled solution, i.e., $w' = \alpha w^*$ for some $\alpha \in \mathbb{R}$ (except when caused by the missing rescaling with $p$; see the point above), will be considered a typo.
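As a numerical sanity check, the accepted solution can be computed directly. This is a sketch under the assumption that $\dagger$ denotes the Moore–Penrose pseudoinverse and that $\text{diag}(X^TX)$ is the diagonal matrix built from the diagonal of $X^TX$; the data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))    # hypothetical design matrix
y = rng.normal(size=50)         # hypothetical targets

G = X.T @ X
A = G + np.diag(np.diag(G))     # X^T X + diag(X^T X)
w_star = np.linalg.pinv(A) @ X.T @ y

# When A is invertible (generically true here), w_star solves A w = X^T y:
print(np.allclose(A @ w_star, X.T @ y))  # True
```

Note that $2 w^*$ (the second accepted form) satisfies the same check up to the constant factor, which is why constant-scaled answers are treated as typos rather than errors.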