<!--
{%hackmd dI-J6ApjSiWhbAt2wrqCDA %}
-->
# 3.6.4~3.6.8
p.3-92~3-106
## 3.6.4 Multi-Class Cross-Entropy Loss
- Probability of a sample belonging to each class: ($f_1^{(i)},f_2^{(i)},...,f_c^{(i)}$)
- Probability of a sample belonging to its target class $y^{(i)}$: $f_{y^{(i)}}^{(i)}$
- Probability that all m samples $(x^{(i)},y^{(i)})$ appear with their respective target classes: $\prod^m_{i=1}{f^{(i)}_{y^{(i)}}}$
- Optimal parameters: the $W$ that maximizes the probability of all m samples appearing with their correct classes
:::warning
But the product ${\prod}$ of many probabilities quickly drives the value toward 0 (numerical underflow)
--> switch to minimizing a cost function instead (the mean of the negative logs of the probabilities above)
:::
- Cost function: $L(W)=-{\frac {1}{m}}{\sum^m_{i=1}{\log{(f^{(i)}_{y^{(i)}})}}}$
- The term "$-\log{(f^{(i)}_{y^{(i)}})}$" is called the cross-entropy loss
- The problem turns from maximizing $\prod^m_{i=1}{f^{(i)}_{y^{(i)}}}$ into minimizing the average of ${-\log{(f^{(i)}_{y^{(i)}})}}$ (see the identity below)
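Since $\log$ is monotonically increasing, the two problems share the same optimum:
$$\arg\max_W \prod^m_{i=1}{f^{(i)}_{y^{(i)}}} \;=\; \arg\max_W \sum^m_{i=1}{\log f^{(i)}_{y^{(i)}}} \;=\; \arg\min_W \; -\frac{1}{m}\sum^m_{i=1}{\log f^{(i)}_{y^{(i)}}}$$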
---
<!-- How to typeset matrices
https://hackmd.io/@sysprog/B1RwlM85Z
-->
Example:
Two samples (m=2), with the probability matrix F and target vector y as follows
$F=\left[
\begin{array}{ccc}
0.2&0.5&0.3\\
0.2&0.6&0.2\\
\end{array}
\right]$ , $y=\left[
\begin{array}{ccc}
2\\
1\\
\end{array}
\right]$
則$F_y=\left[
\begin{array}{c}
0.3\\
0.6\\
\end{array}
\right]$
Therefore the average cross-entropy loss is: $L(W)=-\frac{1}{2}(\log(0.3)+\log(0.6))$
The computation in code:
```python=
import numpy as np
def cross_entropy(F,y):
    m = len(F)                        # number of samples, i.e. y.shape[0]
    log_Fy = -np.log(F[range(m),y])   # fancy indexing: pick each row's probability of its target class
    return np.sum(log_Fy)/m
F = np.array([[0.2,0.5,0.3],[0.2,0.6,0.2]])
y = np.array([2,1])
print(cross_entropy(F,y))
```
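Running this should print roughly 0.857, matching $-\frac{1}{2}(\log(0.3)+\log(0.6))$ above.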
If $y^{(i)}$ is represented as a one-hot vector, the code is as follows
```python=
import numpy as np
def cross_entropy_one_hot(F,y):
    m = len(F)
    return -np.sum(y*np.log(F))/m   # equivalent to -(1.0/m)*np.sum(np.multiply(y,np.log(F)))
F = np.array([[0.2,0.5,0.3],[0.2,0.6,0.2]])
y = np.array([[0,0,1],[0,1,0]])
print(cross_entropy_one_hot(F,y))
```
### one-hot encoding

[A possibly useful supplement](https://axk51013.medium.com/%E4%B8%8D%E8%A6%81%E5%86%8D%E5%81%9Aone-hot-encoding-b5126d3f8a63)
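As a quick illustration of one-hot encoding itself, a minimal NumPy sketch (the helper name `to_one_hot` is chosen here for illustration, it is not from the book):
```python=
import numpy as np
def to_one_hot(y, num_classes):
    # row i of the identity matrix is exactly the one-hot vector for class i
    return np.eye(num_classes)[y]
y = np.array([2,1])
print(to_one_hot(y, 3))   # [[0. 0. 1.] [0. 1. 0.]]
```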
## 3.6.5 Computing the Cross-Entropy Loss from the Weighted Sum
- The output of the softmax function applied to a sample's weighted-sum vector **z** is the probability vector $f$
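Concretely, for a sample with weighted sums $z_1,\dots,z_c$, softmax is
$$f_k=\frac{e^{z_k}}{\sum^c_{j=1}e^{z_j}}$$
The code below first subtracts each row's maximum of $Z$ before exponentiating; this leaves the result unchanged but prevents overflow.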
```python=
#https://www.parasdahal.com/softmax-crossentropy
def softmax(Z):
    A = np.exp(Z-np.max(Z,axis=1,keepdims=True))   # subtract each row's max for numerical stability
    return A/np.sum(A,axis=1,keepdims=True)
def softmax_cross_entropy(Z,y):
    m = len(Z)
    F = softmax(Z)
    log_Fy = -np.log(F[range(m),y])
    return np.sum(log_Fy)/m
```
Example:
```python=+
Z = np.array([[2,25,13],[54,3,11]])
y = np.array([2,1])
print(softmax_cross_entropy(Z,y))
```
If the target vector is in one-hot form:
```python=
def softmax(Z):
    A = np.exp(Z-np.max(Z,axis=1,keepdims=True))
    return A/np.sum(A,axis=1,keepdims=True)
def softmax_cross_entropy_one_hot(Z,y):
    F = softmax(Z)
    loss = -np.sum(y*np.log(F),axis=1)
    return np.mean(loss)
```
Example:
```python=+
Z = np.array([[2,25,13],[54,3,11]])
y = np.array([[0,0,1],[0,1,0]])
print(softmax_cross_entropy_one_hot(Z,y))
```
## 3.6.6 Gradient Computation for Softmax Regression
- Goal: solve for the $W$ that minimizes the cross-entropy loss $\mathcal L(W)$
- Method: gradient descent, as before
- This requires the gradient of ${\mathcal L(W)}$ with respect to W (the partial derivatives with respect to each $W_{jk}$)
### 1. Gradient of the cross-entropy loss with respect to the weighted sum
Derivation: [p.3-97]()
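The result of that derivation, with $\mathbb{1}(k=y^{(i)})$ the indicator of the target class, is
$$\frac{\partial \mathcal L}{\partial z^{(i)}_k}=\frac{1}{m}\left(f^{(i)}_k-\mathbb{1}(k=y^{(i)})\right)$$
which is exactly what both functions below compute as $(F-I)/m$.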
Code:
```python=
def grad_softmax_crossentropy(Z,y):
    F = softmax(Z)
    I_i = np.zeros_like(Z)
    I_i[np.arange(len(Z)),y] = 1       # one-hot indicator of each sample's target class
    return (F - I_i)/Z.shape[0]
def grad_softmax_cross_entropy(Z,y):   # equivalent in-place version
    m = len(Z)
    F = softmax(Z)
    F[range(m),y] -= 1
    return F/m
```
Example:
```python=+
Z = np.array([[2,25,13],[54,3,11]])
y = np.array([2,1])
print(grad_softmax_cross_entropy(Z,y))
```
Code for a numerical-gradient check (used to confirm that the analytical gradient is correct):
```python=
def loss_f():
    return softmax_cross_entropy(Z,y)
import util
Z = Z.astype(float)   # note: the integer array must be converted to float first
print("num_grad",util.numerical_gradient(loss_f,[Z]))
```
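`util.numerical_gradient` comes from the book's accompanying util module and is not listed here; a minimal sketch of such a central-difference checker, with the exact signature being an assumption:
```python=
import numpy as np
def numerical_gradient(f, params, h=1e-5):
    # central-difference estimate of df/dp for every element of every array in params;
    # f takes no arguments and reads the (perturbed) arrays via closure
    grads = []
    for p in params:
        grad = np.zeros_like(p)
        it = np.nditer(p, flags=['multi_index'])
        while not it.finished:
            idx = it.multi_index
            old = p[idx]
            p[idx] = old + h; f_plus = f()
            p[idx] = old - h; f_minus = f()
            p[idx] = old                      # restore the original value
            grad[idx] = (f_plus - f_minus)/(2*h)
            it.iternext()
        grads.append(grad)
    return grads
```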
### 2. Gradient of the cross-entropy loss with respect to the weight parameters
Derivation: [p.3-99]()
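The result of that derivation (with $F$ the matrix of softmax outputs, $I$ the matrix whose rows are the one-hot target vectors, and the last term coming from the L2 regularizer $\text{reg}\cdot\sum W^2$) is
$$\nabla_W \mathcal L=\frac{1}{m}X^{\mathrm T}(F-I)+2\,\text{reg}\,W$$
which matches the `grad` computed below.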
Code, where:
X is the data feature matrix
y is the vector of target labels
reg is the regularization coefficient
```python=
def gradient_softmax(W,X,y,reg):
    m = len(X)
    Z = np.dot(X,W)
    I_i = np.zeros_like(Z)
    I_i[np.arange(len(Z)),y] = 1
    F = softmax(Z)
    #F = np.exp(Z)/np.exp(Z).sum(axis=1,keepdims=True)
    grad = (1/m)*np.dot(X.T,F - I_i)   # 1/m is 1/Z.shape[0]
    grad = grad + 2*reg*W              # gradient of the L2 regularization term
    return grad
def loss_softmax(W,X,y,reg):
    m = len(X)
    Z = np.dot(X,W)
    Z_i_y_i = Z[np.arange(len(Z)),y]
    negative_log_prob = -Z_i_y_i + np.log(np.sum(np.exp(Z),axis=1))
    loss = np.mean(negative_log_prob) + reg*np.sum(W*W)
    return loss
```
A quick test:
```python=+
X = np.array([[2,3],[4,5]])
y = np.array([2,1])
W = np.array([[0.1,0.2,0.3],[0.4,0.2,0.8]])
reg = 0.2
print(gradient_softmax(W,X,y,reg))
print(loss_softmax(W,X,y,reg))
```
For the one-hot representation of the targets, the code is at [p.3-102]() (a sketch of that variant is given below).
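The book's one-hot version is not reproduced here; a minimal sketch, assuming the one-hot target matrix is passed in directly (the name `Y_onehot` is chosen here, not from the book):
```python=
def gradient_softmax_one_hot(W,X,Y_onehot,reg):
    m = len(X)
    F = softmax(np.dot(X,W))
    # with one-hot targets, the indicator matrix is simply Y_onehot itself
    return (1/m)*np.dot(X.T,F - Y_onehot) + 2*reg*W
```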
## 3.6.7 Gradient Descent for Softmax Regression p.3-103
```python=
def gradient_descent_softmax(w,X,y,reg=0.0,alpha=0.01,iterations=100,gamma=0.8,epsilon=1e-8):
    X = np.hstack((np.ones((X.shape[0],1),dtype=X.dtype),X))   # add a column of 1s as the bias feature
    v = np.zeros_like(w)
    #losses = []
    w_history = []
    for i in range(0,iterations):
        gradient = gradient_softmax(w,X,y,reg)
        if np.max(np.abs(gradient))<epsilon:
            print("gradient is small enough!")
            print("iterated num is: ",i)
            break
        w = w - (alpha*gradient)
        #v = gamma*v + alpha*gradient   # momentum variant
        #w = w - v
        #losses.append(loss)
        w_history.append(w)
    return w_history
```
## 3.6.8 A Softmax Regression Model on the spiral Dataset
Train a softmax regression model on the three-class spiral dataset:
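The training code below calls `gen_spiral_dataset()`, which the book defines in an earlier section; a minimal sketch of such a generator (the point counts and noise level are assumptions, not the book's exact values):
```python=
import numpy as np
def gen_spiral_dataset(N=100, D=2, K=3):
    # N points per class, D features, K classes, laid out as interleaved spiral arms
    X = np.zeros((N*K, D))
    y = np.zeros(N*K, dtype=int)
    for k in range(K):
        ix = range(N*k, N*(k+1))
        r = np.linspace(0.0, 1, N)                                  # radius
        t = np.linspace(k*4, (k+1)*4, N) + np.random.randn(N)*0.2   # angle, with noise
        X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
        y[ix] = k
    return X, y
```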
```python=
import matplotlib.pyplot as plt   # needed for the loss plot below
X_spiral,y_spiral = gen_spiral_dataset()
X = X_spiral
y = y_spiral
alpha = 1e-0
iterations = 200
reg = 1e-3
w = np.zeros([X.shape[1]+1,len(np.unique(y))])
w_history = gradient_descent_softmax(w,X,y,reg,alpha,iterations)
w = w_history[-1]
print("w: ",w)
loss_history = compute_loss_history(w_history,X,y,reg)
print(loss_history[:-1:len(loss_history)//10])
plt.plot(loss_history,color='r')
```
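`compute_loss_history` is likewise defined earlier in the book; a minimal sketch consistent with how it is used above (re-adding the bias column just as `gradient_descent_softmax` does is an assumption):
```python=
def compute_loss_history(w_history, X, y, reg):
    X = np.hstack((np.ones((X.shape[0],1),dtype=X.dtype),X))   # same bias column as used in training
    return [loss_softmax(w, X, y, reg) for w in w_history]
```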
Compute the trained model's prediction accuracy on a batch of data (X, y)
```python=
def getAccuracy(w,X,y):
    X = np.hstack((np.ones((X.shape[0],1),dtype=X.dtype),X))   # add the bias column of 1s
    probs = softmax(np.dot(X,w))
    predicts = np.argmax(probs,axis=1)
    accuracy = sum(predicts == y)/(float(len(y)))
    return accuracy
```
Usage:
```python=+
getAccuracy(w,X_spiral,y_spiral)
```
Plot the softmax model's decision boundary:
```python=
#plot the resulting classifier
h = 0.02
x_min, x_max = X[:,0].min()-1,X[:,0].max()+1
y_min, y_max = X[:,1].min()-1,X[:,1].max()+1
xx,yy = np.meshgrid(np.arange(x_min,x_max,h), np.arange(y_min,y_max,h))
Z = np.dot(np.c_[np.ones(xx.size),xx.ravel(),yy.ravel()],w)   # bias column first, matching training
Z = np.argmax(Z,axis=1)
Z = Z.reshape(xx.shape)
fig = plt.figure()
plt.contourf(xx,yy,Z,cmap=plt.cm.Spectral,alpha=0.3)
plt.scatter(X[:,0],X[:,1],c=y,s=40,cmap=plt.cm.Spectral)
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())
#fig.savefig('spiral_linear.png')
```