# 2023-05-04 tune loss weight

In this article, I want to show how loss weights affect the loss landscape and help us improve the training result of a physics-informed neural network (PINN) for heat conduction.

The heat conduction problem is described as
$$
\frac{\partial^2 T}{\partial x^2}+\frac{\partial^2 T}{\partial y^2} = 0
$$
B.C.
$$
T =
\begin{cases}
0 & x=0, 1;\ y=0 \\
1 & y=1
\end{cases}
$$
and the loss function of the PINN is defined as
$$
\mathcal{L} = \lambda_{DE} \mathcal{L}_{DE} + \lambda_{DBC}\mathcal{L}_{DBC} \\
\mathcal{L}_{DE} = ||\nabla^2 T_{i,j}||^2_{i \in (0, 30),\ j \in (0, 30)} \\
\mathcal{L}_{DBC} = ||T_{i,j} - b_{i,j}||^2_{i = 0, 30,\ j = 0, 30}
$$
where
$$
b_{i,j} =
\begin{cases}
0 & \text{if } i = 0, 30 \text{ or } j = 0 \\
1 & \text{if } j = 30
\end{cases}
$$

Every neural network architecture has its own loss landscape; changing a hyper-parameter, the number of layers, the choice of activation functions, or the values of $\lambda_{DE}$ and $\lambda_{DBC}$ changes the loss landscape. Some research studies the visualization of loss landscapes: [1] reveals flaws in a number of visualization methods, which implies that the loss landscape is generally unknown to us, especially to end users. Also, the loss is determined by a large number of parameters, which makes it high-dimensional; it is impossible to know the actual landscape without deep exploration.

![](https://www.cs.umd.edu/~tomg/img/landscapes/noshort.png =300x)

Although the actual loss landscape is unknown, we can steer a neural network to follow a certain procedure in its training process and produce a more physically sensible result. This can be achieved by tuning the loss weights $\lambda$. In the heat conduction problem, the prerequisite is to satisfy the given boundary condition, and then solve the temperature distribution under that boundary condition. This is also the procedure for solving equations with a numerical method: we define the boundary first, then compute the inner field.

Suppose there is only one learnable parameter $w$, and the loss landscape is as illustrated in the figure below, where
$$
\lambda_{DE} = 1, \quad \lambda_{DBC} = 1 \\
\mathcal{L} = \lambda_{DE} \mathcal{L}_{DE} + \lambda_{DBC}\mathcal{L}_{DBC} + b
$$
Here $b$ only translates $\mathcal{L}$ in the y-direction for a better view; the absolute value of the loss does not affect the training process, the gradient does.

![](https://i.imgur.com/pJ1KMO7.png)

Suppose the learnable parameter is initialized at $w=0$; based on gradient descent, $w$ tends to move in the $-x$ direction. Although we do not know exactly where the global minimum is, we know it has a higher chance of being one of the candidates that correctly satisfies the boundary condition. We want to inform the neural network that the global minimum is more likely to be somewhere $\mathcal{L}_{DBC}$ is smaller; therefore, we should give $\lambda_{DBC}$ a larger value. With a larger $\lambda_{DBC}$, the loss landscape becomes the figure below, where
$$
\lambda_{DE} = 1, \quad \lambda_{DBC} = 5
$$

![](https://i.imgur.com/GXq8RHy.png)

With the same initialization $w=0$, gradient descent now moves $w$ in the $+x$ direction.

>This method can be applied when we do not know the answer, yet we know the region where the answer lies, so we guide the trainable parameters in that direction.
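To make the weighting concrete, here is a minimal PyTorch-style sketch of the weighted loss on the $31\times31$ grid. It assumes $\nabla^2 T$ in $\mathcal{L}_{DE}$ is approximated with a five-point finite-difference stencil and uses mean-squared terms; the function name `pinn_loss`, the grid construction, and these choices are illustrative assumptions, not the exact code behind the experiments below.

```python
import torch

def pinn_loss(model, lambda_de=1.0, lambda_dbc=1.0, n=31, h=1.0/30):
    # Evaluate the network T(x, y) on the 31x31 grid of the unit square.
    xs = torch.linspace(0.0, 1.0, n)
    X, Y = torch.meshgrid(xs, xs, indexing="ij")
    xy = torch.stack([X.reshape(-1), Y.reshape(-1)], dim=1)
    T = model(xy).reshape(n, n)           # T[i, j] ~ T(x_i, y_j)

    # L_DE: five-point finite-difference Laplacian on interior points.
    lap = (T[2:, 1:-1] + T[:-2, 1:-1] + T[1:-1, 2:] + T[1:-1, :-2]
           - 4.0 * T[1:-1, 1:-1]) / h**2
    loss_de = (lap ** 2).mean()

    # L_DBC: boundary target b = 0 on x=0, x=1, y=0 and b = 1 on y=1.
    b = torch.zeros(n, n)
    b[:, -1] = 1.0                         # y = 1 (j = 30)
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[0, :] = mask[-1, :] = mask[:, 0] = mask[:, -1] = True
    loss_dbc = ((T - b)[mask] ** 2).mean()

    return lambda_de * loss_de + lambda_dbc * loss_dbc, loss_de, loss_dbc
```

A larger `lambda_dbc` simply scales the gradient of the boundary term, which is what pushes the optimizer toward parameters that satisfy the boundary condition first.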
Yet setting $\lambda_{DBC}$ too large may cause the neural network to focus on the wrong target; therefore, the weights should be tuned.

## Experimental result

### $\lambda_{DBC}$ vs training process

#### $\lambda_{DE} = 1, \lambda_{DBC} = 1$ ($\eta = 5 \cdot 10^{-5}$)

|Epoch|$\mathcal{L}$|$\mathcal{L}_{DE}$|$\mathcal{L}_{DBC}(\times 1)$|time|$T$|
|---|---|---|---|---|---|
|0|651.2289|650.3887|0.8401|0.01 sec|![](https://i.imgur.com/qf5YtfM.png =30x)|
|500|0.8781|0.1191|0.7590|2.16 sec|![](https://i.imgur.com/KyM80CZ.png =30x)|
|1000|0.8183|0.0617|0.7566|4.36 sec|![](https://i.imgur.com/V1YRXQr.png =30x)|
|5000|0.7722|0.0210|0.7512|22.78 sec|![](https://i.imgur.com/6W6cu2d.png =30x)|
|10000|0.7671|0.0166|0.7505|45.43 sec|![](https://i.imgur.com/quqDDW5.png =30x)|
|20000|0.7656|0.0153|0.7504|91.35 sec|![](https://i.imgur.com/1mpHCa2.png =30x)|
|30000|0.7655|0.0152|0.7503|136.66 sec|![](https://i.imgur.com/PY85jR4.png =30x)|

![](https://i.imgur.com/Md4M8sA.png)

#### $\lambda_{DE} = 1, \lambda_{DBC} = 25$ ($\eta = 5 \cdot 10^{-5}$)

|Epoch|$\mathcal{L}$|$\mathcal{L}_{DE}$|$\mathcal{L}_{DBC}(\times 25)$|time|$T$|
|---|---|---|---|---|---|
|0|702.9374|681.2249|21.7125|0.01 sec|![](https://i.imgur.com/XO2aIP2.png =30x)|
|500|18.8261|0.2527|18.5734|2.51 sec|![](https://i.imgur.com/QhU6BCL.png =30x)|
|1000|18.6550|0.1884|18.4666|4.91 sec|![](https://i.imgur.com/TjYXJkT.png =30x)|
|5000|16.5722|0.6568|15.9154|23.23 sec|![](https://i.imgur.com/XbJa1Ev.png =30x)|
|10000|4.0163|0.6608|3.3554|45.93 sec|![](https://i.imgur.com/sD5CwS7.png =30x)|
|20000|2.4328|0.3381|2.0947|91.43 sec|![](https://i.imgur.com/CeSMXmX.png =30x)|
|30000|2.3631|0.3198|2.0433|138.07 sec|![](https://i.imgur.com/1j881Fk.png =30x)|

![](https://i.imgur.com/s0jW63i.png)

#### $\lambda_{DE} = 1, \lambda_{DBC} = 50$ ($\eta = 5 \cdot 10^{-5}$)

|Epoch|$\mathcal{L}$|$\mathcal{L}_{DE}$|$\mathcal{L}_{DBC}(\times 50)$|time|$T$|
|---|---|---|---|---|---|
|0|402.9631|336.2708|66.6923|0.01 sec|![](https://i.imgur.com/bCI55gx.png =30x)|
|500|36.7407|0.5828|36.1579|2.50 sec|![](https://i.imgur.com/dPgGrt2.png =30x)|
|1000|34.4805|1.5358|32.9448|4.85 sec|![](https://i.imgur.com/j5j0mM7.png =30x)|
|5000|3.7172|0.4476|3.2696|22.60 sec|![](https://i.imgur.com/4pheoQ6.png =30x)|
|10000|2.9383|0.1731|2.7652|45.02 sec|![](https://i.imgur.com/RvAKSPt.png =30x)|
|20000|2.8065|0.1541|2.6525|89.97 sec|![](https://i.imgur.com/GgEmTKW.png =30x)|
|30000|2.7947|0.1524|2.6424|135.08 sec|![](https://i.imgur.com/XDmXwGH.png =30x)|

![](https://i.imgur.com/mAC04a7.png)

#### $\lambda_{DE} = 1, \lambda_{DBC} = 100$ ($\eta = 5 \cdot 10^{-5}$)

|Epoch|$\mathcal{L}$|$\mathcal{L}_{DE}$|$\mathcal{L}_{DBC}(\times 100)$|time|$T$|
|---|---|---|---|---|---|
|0|1118.9207|1014.3817|104.5390|0.01 sec|![](https://i.imgur.com/67E0ri4.png =30x)|
|500|50.9027|5.5001|45.4026|42.26 sec|![](https://i.imgur.com/cJOif6k.png =30x)|
|1000|16.1834|2.6292|13.5542|83.27 sec|![](https://i.imgur.com/rw6I1V7.png =30x)|
|5000|5.4276|0.4332|4.9945|420.53 sec|![](https://i.imgur.com/KWkBQfe.png =30x)|
|10000|4.5645|0.1496|4.4149|848.87 sec|![](https://i.imgur.com/HeI3wDv.png =30x)|
|20000|4.4304|0.1283|4.3021|1705.14 sec|![](https://i.imgur.com/FBJcvQl.png =30x)|
|30000|4.4188|0.1271|4.2916|2566.77 sec|![](https://i.imgur.com/Kyy1Hli.png =30x)|

![](https://i.imgur.com/OayIXR4.png)

#### $\lambda_{DE} = 1, \lambda_{DBC} = 10000$ ($\eta = 10^{-5}$)

|Epoch|$\mathcal{L}$|$\mathcal{L}_{DE}$|$\mathcal{L}_{DBC}(\times 10000)$|time|$T$|
|---|---|---|---|---|---|
|0|11005.7129|727.8101|10277.9033|0.01 sec|![](https://i.imgur.com/gMAtQvP.png =30x)|
|500|2017.2341|691.2729|1325.9613|2.45 sec|![](https://i.imgur.com/IxnubTa.png =30x)|
|1000|3173.2915|787.1122|2386.1794|4.70 sec|![](https://i.imgur.com/enC9KjR.png =30x)|
|5000|335.1620|3.5772|331.5848|22.65 sec|![](https://i.imgur.com/u8pgAZr.png =30x)|
|10000|326.7932|1.7891|325.0041|47.36 sec|![](https://i.imgur.com/KhZOYDi.png =30x)|
|20000|325.9821|1.4433|324.5389|93.99 sec|![](https://i.imgur.com/Wu8RKts.png =30x)|
|30000|325.9157|1.4149|324.5008|140.49 sec|![](https://i.imgur.com/yM1A5DP.png =30x)|

![](https://i.imgur.com/ntKoTMZ.png)
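For reference, each of the runs above follows the same plain training loop, logging $\mathcal{L}$, $\mathcal{L}_{DE}$, and $\lambda_{DBC}\mathcal{L}_{DBC}$ at the listed epochs. A minimal sketch using the `pinn_loss` helper sketched earlier; the article does not state the optimizer, so Adam and the function name `train` are assumptions for illustration:

```python
import time
import torch

def train(model, lambda_dbc, lr=5e-5, epochs=30000,
          log_epochs=(0, 500, 1000, 5000, 10000, 20000, 30000)):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    start = time.time()
    for epoch in range(epochs + 1):
        optimizer.zero_grad()
        loss, loss_de, loss_dbc = pinn_loss(model, lambda_de=1.0,
                                            lambda_dbc=lambda_dbc)
        loss.backward()
        optimizer.step()
        if epoch in log_epochs:
            # Total loss, DE residual, weighted BC loss, elapsed wall time.
            print(f"{epoch}\t{loss.item():.4f}\t{loss_de.item():.4f}\t"
                  f"{lambda_dbc * loss_dbc.item():.4f}\t"
                  f"{time.time() - start:.2f} sec")
    return model
```

Apart from $\lambda_{DBC}$ (and the smaller learning rate $\eta = 10^{-5}$ in the last run), everything else is held fixed across the five runs.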
### $\lambda_{DBC}$ vs result

|$\lambda_{DBC}$|Epoch|$\mathcal{L}$|$\mathcal{L}_{DE}$|$\lambda_{DBC}\mathcal{L}_{DBC}$|time|$T$|
|---|---|---|---|---|---|---|
|1|30000|0.7655|0.0152|0.7503|136.66 sec|![](https://i.imgur.com/PY85jR4.png =30x)|
|25|30000|2.3631|0.3198|2.0433|138.07 sec|![](https://i.imgur.com/1j881Fk.png =30x)|
|50|30000|2.7947|0.1524|2.6424|135.08 sec|![](https://i.imgur.com/XDmXwGH.png =30x)|
|100|30000|4.4188|0.1271|4.2916|2566.77 sec|![](https://i.imgur.com/Kyy1Hli.png =30x)|
|10000|30000|325.9157|1.4149|324.5008|140.49 sec|![](https://i.imgur.com/yM1A5DP.png =30x)|

## Bibliography

[1] Li, H., Xu, Z., Taylor, G., Studer, C., & Goldstein, T. (2018). Visualizing the loss landscape of neural nets. Advances in Neural Information Processing Systems, 31.