###### tags: `111專題展` `LeNet5` `PYNQ-Z2` `OV5640` `Verilog`
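# Hardware Image Recognition System
## Outline
This note describes how we implemented the project, in six parts.
1. Project overview
2. Weapons and wounds
3. System implementation
4. Wound side
5. Software side
6. Hardware side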
---
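## Project Overview
With the Citizen Judges Act (《國民法官法》) in force, professionals from many fields will take part in trials. Forensic evidence today still suffers from subjectivity and uneven expertise, so using technology to find objective, decisive evidence is essential. Inferring the likely weapon from a wound is one of the key steps in solving a case.
In this project we worked with forensic experts to develop an automated hardware tool that identifies the type of weapon corresponding to a wound, to assist forensic examiners. The tool carries a recognition system built with deep learning; trained only on our self-made dataset of simulated wounds, it can accurately recognize real wounds. Further hardware optimization addresses the storage demands of running a large neural network offline. In our tests, the system identified the weapon type from a wound image in real time while remaining compact and able to run offline. It gives forensic examiners a new assistive tool that can improve the efficiency and accuracy of crime-scene processing, and in turn help safeguard judicial fairness and public safety.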
---
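## Weapons and Wounds
To build an effective recognition model, we first needed to understand the common types of weapons and the wound characteristics each produces. We analyzed how often each type of weapon was used in intentional homicide cases in recent years and studied the wound patterns each can cause.
1. Weapons
    * Using public data from the National Police Agency website, we counted how often each bladed weapon was used in intentional homicide cases from ROC year 105 to 110 (2016 to 2021).
2. Wound classification
    There are many kinds of wounds. Our analysis shows that the wounds ==bladed weapons== can cause fall under ==mechanical injury==.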
```mermaid
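graph TD;
mech[Mechanical injury]---->blunt[Blunt force injury];
mech-->sharp[Sharp force injury];
blunt-->contusion[Contusion];
blunt-->abrasion[Abrasion];
blunt-->laceration[Laceration];
blunt-->crush[Crush injury];
blunt-->fracture[Fracture];
sharp-->stab[Stab wound];
sharp-->incised[Incised wound];
sharp-->defensive[Defensive wound];
sharp-->hesitation[Hesitation wound];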
```
---
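## System Implementation
1. Neural network model: ==LeNet5==
    To identify the weapon that corresponds to a wound image, we need an image recognition model. In this project we use a deep neural network, and among the many available architectures we chose LeNet5 as the main model.
    * Few parameters and a simple structure:
    FPGA resources are limited. A simpler structure means a smaller model, which lowers the compute load and memory use and so improves speed. The small parameter count is a clear advantage for an FPGA implementation.
    * Mature and well studied:
    As one of the earliest neural networks widely applied to problems such as handwritten character recognition, its performance has been verified on many datasets, and years of refinement have extended it to a broader range of applications.

    Overall, LeNet5 is a classic network that is feasible to implement on an FPGA, which is the main reason we chose it as the primary model for this project.

2. System implementation flowchart: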

---
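## Wound Side
Available wound photographs are fragmented and unclassified. Following the advice of forensic experts, we made clay samples that simulate the features of real wounds, photographed them extensively, and classified them precisely to build a dataset for training the neural network.
From the representative weapons we selected three types to study: scissors, single-edged blades (folding knife), and chopping blades (kitchen cleaver). Scissors and single-edged blades mainly cause stab wounds by thrusting, while chopping blades cause incised wounds by slashing.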
```mermaid
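graph LR;
weapon[Weapon]-->folding[Folding knife];
weapon-->cleaver[Kitchen cleaver];
weapon-->scissors[Scissors];
folding-->stab[Stab wound];
cleaver-->incised[Incised wound];
scissors-->stab;
stab-->wound[Wound];
incised-->wound;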
```
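To simulate real wounds accurately, we took samples in resin clay and post-processed the photos in software to produce realistic wound images.
* **Choice of clay**:
There are many kinds of clay with different properties. To simulate a real wound, the clay must behave as much like human skin as possible. Based on our experiments we chose ==resin clay==, which has the following properties:
    * High toughness, so it can be sampled repeatedly
    * Oil-based, so it is easy to store
    * Its fibers tear in a way similar to skin
    * Moderate stickiness
* **Sampling stab wounds**:
During sampling the clay must be kept at a fixed moisture level so that it stays as close as possible to human skin. We therefore defined controlled variables and varied variables in order to collect simulated wounds with different features.
    * Controlled variables:
        * Moisture of the clay
        * Flatness of the clay surface
        * Camera angle and focus
        * Wound position (centered)
    * Varied variables:
        * Force of the thrust and withdrawal
        * Angle and depth of the thrust and withdrawal
        * Lighting angle and intensity (to simulate different skin tones)
* Representative weapons and simulated wound photos
    * Scissors (yellow skin tone)
    * Single-edged blade (white skin tone)
    * Chopping blade (black skin tone)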
---
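## Software Side
1. Before an image enters neural network training, it must be enhanced and converted to the input format.
    * The unprocessed photos are first shuffled and split 8:2 into a training set and a validation set, and only then processed. Splitting first keeps processed copies of the same photo from appearing in both sets. The dataset augmentation steps are:
        * Contrast adjustment
        * Rotation in 90° steps (×4)
        * Image normalization
        * Reduction to a 28×28×1 grayscale image
        * Conversion to a Tensor, which carries dimensions, shape, and data type
    * Processed images:
        * Original
        * Contrast adjusted
        * Rotated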
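The split-before-augment order described above can be sketched in plain Python. This is only an illustration under stated assumptions: `split_then_augment`, its parameters, and the use of file names in place of real images are hypothetical and not part of the project code.

```python
import random

def split_then_augment(paths, train_ratio=0.8, seed=0):
    """Shuffle and split the raw photos first, then expand each subset.

    Hypothetical helper: each augmented sample is represented as a
    (source path, rotation angle) pair instead of real pixel data.
    """
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)      # random order, reproducible
    cut = int(len(shuffled) * train_ratio)     # 8:2 split point
    train_raw, val_raw = shuffled[:cut], shuffled[cut:]

    def augment(subset):
        # Four 90-degree rotations per photo, as in the augmentation list.
        return [(p, angle) for p in subset for angle in (0, 90, 180, 270)]

    return augment(train_raw), augment(val_raw)

train_set, val_set = split_then_augment(["img_%02d.jpg" % i for i in range(10)])
```

Because the split happens on the raw photos, no source photo can contribute rotated copies to both sets.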
2. Training the LeNet5 neural network
* Wrap the image set as a training dataset
```python=
from PIL import Image          # Python Imaging Library (PIL); used here to read image files
from torch.utils import data   # provides the Dataset base class

class MyDataset(data.Dataset):  # MyDataset inherits from torch.utils.data.Dataset
    def __init__(self, txt_path, transform):
        # Each line of the txt file holds "<file path> <label>"; store them as (path, label) pairs.
        imgs = []
        with open(txt_path, 'r') as txt_file:   # open read-only; the file is closed automatically
            for line in txt_file:               # one line per image
                line = line.rstrip()            # strip the trailing \n
                words = line.split()            # split on whitespace
                imgs.append((words[0], int(words[1])))  # words[0]: file path, words[1]: class label
        self.imgs = imgs
        self.transform = transform

    def __getitem__(self, index):   # load one file, transform it, return pixels and label
        filepath, label = self.imgs[index]
        img = Image.open(filepath)  # read the image with PIL's Image.open
        img = self.transform(img)   # apply the transform
        return img, label           # image tensor and its label

    def __len__(self):              # number of samples in the dataset
        return len(self.imgs)
```
```python=
from torchvision import transforms  # torchvision transforms for image processing
trans = transforms.Compose([
    transforms.RandomCrop(28),
    transforms.ToTensor()
])
train_data = MyDataset(train_txt_path, transform=trans)  # wrap the training set; images are converted to Tensors
test_data = MyDataset(test_txt_path, transform=trans)    # wrap the test set; images are converted to Tensors
trainloader = data.DataLoader(dataset=train_data, batch_size=8, shuffle=True)  # group images into batches and shuffle
testloader = data.DataLoader(dataset=test_data, batch_size=4, shuffle=False)
```
* Build the LeNet5 neural network
```python=
import torch.nn as nn

class LeNet(nn.Module):  # LeNet-5 class
    def __init__(self):  # constructor
        super(LeNet, self).__init__()
        # input: (1*28*28); output: (6*12*12)
        self.conv1 = nn.Sequential(
            # convolution layer: kernel size 5, stride 1, padding 0
            nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1),
            nn.ReLU(),
            # MaxPool2d and AvgPool2d take the same arguments, so either can be used as needed
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)  # input (6*24*24), output (6*12*12)
        )
        # input: (6*12*12); output: (16*4*4)
        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1, padding=0),  # input (6*12*12), output (16*8*8)
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)  # input (16*8*8), output (16*4*4)
        )
        # input: (16*4*4) flattened to 256; output: 120
        self.fc1 = nn.Sequential(
            nn.Linear(16*4*4, 120),
            nn.ReLU()
        )
        # input: 120; output: 84
        self.fc2 = nn.Sequential(
            nn.Linear(120, 84),
            nn.ReLU()
        )
        # input: 84; output: 10
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):  # forward pass
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)  # nn.Linear expects 1-D features per sample, so flatten here
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x
```
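The layer sizes in the comments above follow from the usual output-size rule for convolution and pooling, `(size + 2*padding - kernel) // stride + 1`. The sketch below recomputes them and counts the parameters per layer. The helper names `conv_out` and `lenet_summary` are hypothetical and only serve this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def lenet_summary(input_size=28, num_classes=10):
    """Spatial sizes and parameter counts for the LeNet class above."""
    s_conv1 = conv_out(input_size, 5)         # 5x5 convolution
    s_pool1 = conv_out(s_conv1, 2, stride=2)  # 2x2 max pooling
    s_conv2 = conv_out(s_pool1, 5)
    s_pool2 = conv_out(s_conv2, 2, stride=2)
    flat = 16 * s_pool2 * s_pool2             # inputs to fc1
    params = {
        "conv1": 6 * 1 * 5 * 5 + 6,           # weights + biases
        "conv2": 16 * 6 * 5 * 5 + 16,
        "fc1": flat * 120 + 120,
        "fc2": 120 * 84 + 84,
        "fc3": 84 * num_classes + num_classes,
    }
    sizes = [input_size, s_conv1, s_pool1, s_conv2, s_pool2]
    return sizes, flat, params

sizes, flat, params = lenet_summary()
print(sizes, flat, sum(params.values()))
```

With a 28×28 input, the three fully connected layers account for roughly 94% of all parameters, while the two convolution layers hold only a few thousand.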
* Instantiate the network on the GPU or CPU
```python=
import torch
import torch.optim as optim
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # use CUDA if detected, otherwise the CPU
model = LeNet().to(device)  # instantiate the network on the device
optimizer = optim.Adam(model.parameters(), lr=0.001)  # declare the optimizer with a learning rate of 0.001
```
* Define the training function
```python=
def train_runner(model, device, trainloader, optimizer, epoch):
    model.train()  # training mode: BatchNormalization and Dropout layers are active
    total = 0
    correct = 0.0
    for i, data in enumerate(trainloader):
        #====================== fetch a batch ======================#
        inputs, labels = data  # inputs: pixel tensors of every image in the batch; labels: their labels
        inputs, labels = inputs.to(device), labels.to(device)  # move the batch to the device
        #====================== forward pass ======================#
        optimizer.zero_grad()  # clear the gradients of the previous step
        outputs = model(inputs)  # one row per sample in the batch, one column per label class
        loss = nn.functional.cross_entropy(outputs, labels)  # cross entropy is the usual multi-class loss; binary tasks use a sigmoid-based loss
        #====================== collect statistics ======================#
        predict = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predict == labels).sum().item()
        #====================== backward pass ======================#
        loss.backward()   # backpropagation
        optimizer.step()  # update the parameters
        #====================== report ======================#
        if i % 100 == 0:
            print("Train Epoch{} \t Loss: {:.6f}, accuracy: {:.6f}%".format(epoch, loss.item(), 100*(correct/total)))
    return loss.item(), correct/total
```
* Define the test function
```python=
def test_runner(model, device, testloader):
    model.eval()  # evaluation mode: BatchNormalization and Dropout layers are inactive
    total = 0
    correct = 0.0
    test_loss = 0.0
    with torch.no_grad():  # no gradient tracking, so no backpropagation or parameter update
        for inputs, labels in testloader:
            #====================== fetch a batch ======================#
            inputs, labels = inputs.to(device), labels.to(device)
            #====================== forward pass ======================#
            outputs = model(inputs)
            # sum the per-sample losses so that dividing by `total` gives a true average
            test_loss += nn.functional.cross_entropy(outputs, labels, reduction='sum').item()
            #====================== collect statistics ======================#
            predict = outputs.argmax(dim=1)
            total += labels.size(0)
            correct += (predict == labels).sum().item()
    #====================== report ======================#
    print("test_average_loss: {:.6f}, accuracy: {:.6f}%".format(test_loss/total, 100*(correct/total)))
```
* Train and output the results
```python=
import time  # for printing start and end timestamps
from matplotlib import pyplot as plt  # for plotting
num_epochs = 5  # number of training epochs
Loss = []
Accuracy = []
for epoch in range(1, num_epochs + 1):
    print("start_time", time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())))
    loss, acc = train_runner(model, device, trainloader, optimizer, epoch)
    Loss.append(loss)
    Accuracy.append(acc)
    test_runner(model, device, testloader)
    print("end_time: ", time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())), '\n')
#====================== plot the trends ======================#
print('Finished Training')
plt.subplot(2, 1, 1)
plt.plot(Loss)
plt.title('Loss')
plt.subplot(2, 1, 2)
plt.plot(Accuracy)
plt.title('Accuracy')
plt.tight_layout()  # keep the two subplot titles from overlapping
plt.show()          # show both subplots in one figure
```
3. Exporting the LeNet5 neural network
To deploy the network on hardware, the parameters are exported as fixed-point numbers, which reduces the hardware computation at the cost of some precision. The Vivado side takes .coe files as input and the Vitis side takes .h files, and each export must match the way the corresponding layer computes.
* Convolution layer output
Convolution-layer parameters are written in fixed point and in reversed order to match the network computation on the Vivado side. Because each layer iterates over its dimensions with a different priority, the order of the for loops is adjusted per layer.
```python=
for name, param in model.named_parameters():
    #---------------
    print(name)
    #---------------
    if name == "conv1.0.weight":
        para = param.data.detach().cpu().numpy()  # .cpu() so this also works when the model is on the GPU
        for s_0 in range(para.shape[0]):  # one .coe file per output channel
            f = open(C1_path + "\\c1_w_{}_rom".format(s_0+1) + ".coe","w") # https://www.runoob.com/python/python-func-open.html
            data ='memory_initialization_radix = 2;\nmemory_initialization_vector=\n'
            f.writelines(data)
            for s_1 in range(para.shape[1]-1,-1,-1): # reversed to match the computation on the Vivado side
                for s_2 in range(para.shape[2]-1,-1,-1):
                    for s_3 in range(para.shape[3]-1,-1,-1):
                        q = int(round(para[s_0][s_1][s_2][s_3]*4096)) # 4096 = 2**12 keeps 12 fractional bits
                        if q < 0:
                            data_b = "{0:016b}".format(65536 + q) # negative: 16-bit two's complement
                        else:
                            data_b = "{0:016b}".format(q) # non-negative: written directly
                        #--------------- echo the result in Python for checking
                        print("para[{}][{}][{}][{}]".format(s_0,s_1,s_2,s_3), end=":")
                        print(para[s_0][s_1][s_2][s_3], end=", ")
                        print(data_b, end = ",")
                        print()
                        #---------------
                        f.writelines(data_b)
                        if not (s_1==0 and s_2==0 and s_3==0):
                            f.writelines(',')
            print("--------------------------------------------------")
            f.writelines(';')
            f.close()
    if name == "conv1.0.bias":
        para = param.data.detach().cpu().numpy()
        f = open(C1_path + "\\c1_b_rom" + ".coe","w")
        data ='memory_initialization_radix = 2;\nmemory_initialization_vector=\n'
        f.writelines(data)
        for s_0 in range(para.shape[0]):
            q = int(round(para[s_0]*4096)) # 4096 = 2**12 keeps 12 fractional bits
            if q < 0:
                data_b = "{0:016b}".format(65536 + q) # negative: 16-bit two's complement
            else:
                data_b = "{0:016b}".format(q) # non-negative: written directly
            #--------------- echo the result in Python for checking
            print("para[{}]".format(s_0), end=":")
            print(para[s_0], end=", ")
            print(data_b)
            #---------------
            f.writelines(data_b)
            if not (s_0==para.shape[0]-1):
                f.writelines(',')
        f.writelines(';')
        f.close()
    if name == "conv2.0.weight":
        para = param.data.detach().cpu().numpy()
        for s_2 in range(para.shape[2]):  # one .coe file per kernel row
            f = open(C2_path + "\\c2_w_{}_rom".format(s_2) + ".coe","w")
            data ='memory_initialization_radix = 2;\nmemory_initialization_vector=\n'
            f.writelines(data)
            for s_0 in range(para.shape[0]):
                for s_1 in range(para.shape[1]-1,-1,-1): # reversed to match the computation on the Vivado side
                    for s_3 in range(para.shape[3]-1,-1,-1):
                        q = int(round(para[s_0][s_1][s_2][s_3]*256)) # 256 = 2**8 keeps 8 fractional bits
                        if q < 0:
                            data_b = "{0:08b}".format(256 + q) # negative: 8-bit two's complement
                        else:
                            data_b = "{0:08b}".format(q) # non-negative: written directly
                        #--------------- echo the result in Python for checking
                        print("para[{}][{}][{}][{}]".format(s_0,s_1,s_2,s_3), end=":")
                        print(para[s_0][s_1][s_2][s_3], end=", ")
                        print(data_b)
                        #---------------
                        f.writelines(data_b)
                        if(s_1==0 and s_3 ==0 and s_0 !=15): # end of one output channel's word
                            f.writelines(',')
                            f.writelines('\n')
            print("--------------------------------------------------")
            f.writelines(';')
            f.close()
    if name == "conv2.0.bias":
        para = param.data.detach().cpu().numpy()
        f = open(C2_path + "\\c2_b_rom" + ".coe","w")
        data ='memory_initialization_radix = 2;\nmemory_initialization_vector=\n'
        f.writelines(data)
        for s_0 in range(para.shape[0]-1,-1,-1):
            q = int(round(para[s_0]*256)) # 256 = 2**8 keeps 8 fractional bits
            if q < 0:
                data_b = "{0:08b}".format(256 + q) # negative: 8-bit two's complement
            else:
                data_b = "{0:08b}".format(q) # non-negative: written directly
            #--------------- echo the result in Python for checking
            print("para[{}]".format(s_0), end=":")
            print(para[s_0], end=", ")
            print(data_b)
            #---------------
            f.writelines(data_b)
            if not (s_0==0):
                f.writelines(',')
        f.writelines(';')
        f.close()
```
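The fixed-point conversion used in the export above (scale by a power of two, round, then write negative values in two's complement) can be isolated into a small helper for checking individual values. This is a minimal sketch, not the project's code: `to_fixed_bits`, `from_fixed_bits`, and `make_coe` are hypothetical names, and the saturation step is an added assumption, since the export loop above does not clamp out-of-range values.

```python
def to_fixed_bits(value, frac_bits=12, width=16):
    """Encode a float as a two's complement binary string with `frac_bits` fractional bits."""
    scaled = round(value * (1 << frac_bits))
    lowest, highest = -(1 << (width - 1)), (1 << (width - 1)) - 1
    scaled = max(lowest, min(highest, scaled))   # assumption: saturate instead of overflowing
    return format(scaled & ((1 << width) - 1), "0{}b".format(width))

def from_fixed_bits(bits, frac_bits=12):
    """Decode a two's complement binary string back to a float."""
    raw = int(bits, 2)
    if bits[0] == "1":                           # sign bit set: negative value
        raw -= 1 << len(bits)
    return raw / (1 << frac_bits)

def make_coe(values, frac_bits=12, width=16):
    """Build the text of a radix-2 .coe file from a list of floats."""
    words = ",\n".join(to_fixed_bits(v, frac_bits, width) for v in values)
    return ("memory_initialization_radix=2;\n"
            "memory_initialization_vector=\n" + words + ";\n")

print(to_fixed_bits(-0.5))                       # 16-bit word with 12 fractional bits
print(from_fixed_bits(to_fixed_bits(-0.5)))      # decodes back to -0.5
```

Decoding an exported word with `from_fixed_bits` is a quick way to see how much precision a weight lost. With 8 fractional bits in an 8-bit word, only values in [-0.5, 0.5) are representable, so larger weights would need clamping or a different format.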
* Fully connected layer output
The fully connected layers are not converted to fixed point and need no reordering; their parameters are written out as .h files.
```python=
# Output directory and file name for each fully connected parameter tensor
fc_files = {
    "fc1.0.weight": (F1_path, "f1_rom.h"),
    "fc1.0.bias":   (F1_path, "f1_b_rom.h"),
    "fc2.0.weight": (F2_path, "f2_rom.h"),
    "fc2.0.bias":   (F2_path, "f2_b_rom.h"),
    "fc3.weight":   (F3_path, "f3_rom.h"),
    "fc3.bias":     (F3_path, "f3_b_rom.h"),
}
for name, param in model.named_parameters():
    #---------------
    print(name)
    #---------------
    if name not in fc_files:
        continue
    out_dir, file_name = fc_files[name]
    para = param.data.detach().cpu().numpy()  # .cpu() so this also works when the model is on the GPU
    flat = para.reshape(-1)  # row-major order, the same as looping over s_0 and then s_1
    with open(out_dir + "\\" + file_name, "w") as f:
        for idx in range(flat.size):
            #--------------- echo the value in Python for checking
            print("para[{}]".format(idx), end=":")
            print(flat[idx], end=", ")
            #---------------
            f.write(str(flat[idx]))
            if idx != flat.size - 1:  # no comma after the last value
                f.write(',')
            f.write('\n')
    #---------------
    print("=========================================================================")
    #---------------
```
---
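## Hardware Side
### System architecture
To speed up recognition, LeNet5 is partitioned on the hardware side. The convolutions, which take the most time, are implemented in the PL (programmable logic), where parallel processing raises the computation speed. The fully connected layers, which hold most of the parameters, are implemented in the PS (processing system) to reduce hardware resource usage.

### Equipment
1. FPGA: [PYNQ-Z2](http://www.pynq.io/board.html)

2. Camera: OV5640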
<br><br><br>
---
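### LeNet5 IP design
This section introduces the design concept of each layer in our Verilog implementation.
* Convolution layer 1:
    * Convolves the 28×28×1 input image with six 5×5 kernels, one per output channel.
    * For the input data, we instantiate five read/write RAMs of 28 entries each, which hold five rows of 28 pixels of the input image.

    * Six ROMs of 25 entries each store the six channels of parameters trained on the software side.

    * Single-channel convolution:
        * The parameters are aligned with the data window and multiplied.

    * The six channels are processed in parallel.

    * Reducing data-transfer latency:
        * The outputs of the six channels are concatenated bitwise into one word.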

```verilog=
`timescale 1ns / 1ps
module conv1(
input clk,
input rst_n,
input [7:0] data_in,
input data_in_valid,
output [191:0] data_out, // 191 = 32 * 6 - 1
output reg data_out_valid
);
genvar k;
wire [31:0] data_out_array[0:5];
genvar p;
generate
for (p=0;p<6;p=p+1)begin
assign data_out_array[p] = data_out[p*32+:32];
end
endgenerate
//======================= addr ===============================
reg [4:0] wr_addr;
reg [4:0] rd_addr;
wire [4:0] rd_addr_pre2 = wr_addr + 2'd2;
always@(posedge clk, negedge rst_n)begin
if(~rst_n)begin
wr_addr<=0;
rd_addr<=0;
end
else if( data_in_valid == 1'b1 )begin
//========== A ============
if(wr_addr =='d27)
wr_addr <= 5'd0;
else
wr_addr <= wr_addr + 1'd1;
//========== B ============
if(rd_addr_pre2 > 'd27)
rd_addr <= rd_addr_pre2 - 5'd28;
else
rd_addr <= rd_addr_pre2;
end
end
//======================= data ===============================
wire [7:0] window_in[0:4];
wire [7:0] window_out[0:4];
assign window_in[0] = data_in;
generate
for(k=1;k<5;k=k+1)begin
assign window_in[k]=window_out[k-1];
end
endgenerate
//======================= Instance ===============================
generate
for (k=0;k<5;k=k+1)begin
gray_linebuff gray_linebuffer_U (
.clka(clk), // input wire clka
.wea(data_in_valid), // input wire [0 : 0] wea
.addra(wr_addr), // input wire [4 : 0] addra
.dina(window_in[k]), // input wire [7 : 0] dina
.clkb(clk), // input wire clkb
.enb(1'b1), // input wire enb
.addrb(rd_addr), // input wire [4 : 0] addrb
.doutb(window_out[k]) // output wire [7 : 0] doutb
);
end
endgenerate
//======================= window ===============================
reg signed [8:0] window[4:0][4:0];
integer i,j;
always@(posedge clk ,negedge rst_n)begin
if(~rst_n)begin
for(i=0;i<5;i=i+1)begin
for(j=0;j<5;j=j+1)begin
window[i][j]<=0;
end
end
end
else if(data_in_valid==1'b1 )begin
for(i=0;i<5;i=i+1)begin
window[i][0] <= window_in[i];
for(j=1;j<5;j=j+1)begin
window[i][j]<=window[i][j-1];
end
end
end
end
//======================= x_cnt y_cnt ====================
reg [4:0] x_cnt;
reg [4:0] y_cnt;
always@(posedge clk,negedge rst_n)begin
if(~rst_n)
x_cnt<=0;
else if(x_cnt == 5'd27 && data_in_valid==1'b1)
x_cnt<=0;
else if(data_in_valid==1'b1)
x_cnt<= x_cnt +1'b1;
end
always@(posedge clk,negedge rst_n)begin
if(~rst_n)
y_cnt<=0;
else if(y_cnt == 5'd27 && x_cnt == 5'd27 && data_in_valid==1'b1)
y_cnt<=0;
else if(data_in_valid==1'b1 && x_cnt == 5'd27)
y_cnt<= y_cnt +1'b1;
end
//======================= weights =======================
wire c1_w_rd_en;
assign c1_w_rd_en = (data_in_valid && x_cnt>=0 && y_cnt==0)? 1'b1 : 1'b0;
wire [15:0] rd_c1_w_1_data;
wire [15:0] rd_c1_w_2_data;
wire [15:0] rd_c1_w_3_data;
wire [15:0] rd_c1_w_4_data;
wire [15:0] rd_c1_w_5_data;
wire [15:0] rd_c1_w_6_data;
wire [15:0] rd_c1_b_data;
reg signed [15:0] c1_w_1[4:0][4:0];
reg signed [15:0] c1_w_2[4:0][4:0];
reg signed [15:0] c1_w_3[4:0][4:0];
reg signed [15:0] c1_w_4[4:0][4:0];
reg signed [15:0] c1_w_5[4:0][4:0];
reg signed [15:0] c1_w_6[4:0][4:0];
reg signed [15:0] c1_b[5:0];
c1_b_rom c1_b_U (
.clka(clk), // input wire clka
.ena(c1_w_rd_en), // input wire ena
.addra(x_cnt), // input wire [4 : 0] addra
.douta(rd_c1_b_data) // output wire [15 : 0] douta
);
c1_w_1_rom c1_w_1_U (
.clka(clk), // input wire clka
.ena(c1_w_rd_en), // input wire ena
.addra(x_cnt), // input wire [4 : 0] addra
.douta(rd_c1_w_1_data) // output wire [15 : 0] douta
);
c1_w_2_rom c1_w_2_U (
.clka(clk), // input wire clka
.ena(c1_w_rd_en), // input wire ena
.addra(x_cnt), // input wire [4 : 0] addra
.douta(rd_c1_w_2_data) // output wire [15 : 0] douta
);
c1_w_3_rom c1_w_3_U (
.clka(clk), // input wire clka
.ena(c1_w_rd_en), // input wire ena
.addra(x_cnt), // input wire [4 : 0] addra
.douta(rd_c1_w_3_data) // output wire [15 : 0] douta
);
c1_w_4_rom c1_w_4_U (
.clka(clk), // input wire clka
.ena(c1_w_rd_en), // input wire ena
.addra(x_cnt), // input wire [4 : 0] addra
.douta(rd_c1_w_4_data) // output wire [15 : 0] douta
);
c1_w_5_rom c1_w_5_U (
.clka(clk), // input wire clka
.ena(c1_w_rd_en), // input wire ena
.addra(x_cnt), // input wire [4 : 0] addra
.douta(rd_c1_w_5_data) // output wire [15 : 0] douta
);
c1_w_6_rom c1_w_6_U (
.clka(clk), // input wire clka
.ena(c1_w_rd_en), // input wire ena
.addra(x_cnt), // input wire [4 : 0] addra
.douta(rd_c1_w_6_data) // output wire [15 : 0] douta
);
always@(*)begin
if(y_cnt==0)begin
c1_w_1[(x_cnt-1)/5][(x_cnt-1)%5] = rd_c1_w_1_data;
c1_w_2[(x_cnt-1)/5][(x_cnt-1)%5] = rd_c1_w_2_data;
c1_w_3[(x_cnt-1)/5][(x_cnt-1)%5] = rd_c1_w_3_data;
c1_w_4[(x_cnt-1)/5][(x_cnt-1)%5] = rd_c1_w_4_data;
c1_w_5[(x_cnt-1)/5][(x_cnt-1)%5] = rd_c1_w_5_data;
c1_w_6[(x_cnt-1)/5][(x_cnt-1)%5] = rd_c1_w_6_data;
c1_b[(x_cnt-1)] = rd_c1_b_data;
end
end
//======================= mul =======================
reg signed[31:0] window_mul_result_1[4:0][4:0];
reg signed[31:0] window_mul_result_2[4:0][4:0];
reg signed[31:0] window_mul_result_3[4:0][4:0];
reg signed[31:0] window_mul_result_4[4:0][4:0];
reg signed[31:0] window_mul_result_5[4:0][4:0];
reg signed[31:0] window_mul_result_6[4:0][4:0];
always@(posedge clk,negedge rst_n)begin
if(~rst_n)begin
for(i=0;i<5;i=i+1)begin
for(j=0;j<5;j=j+1)begin
window_mul_result_1[i][j] <= 0;
window_mul_result_2[i][j] <= 0;
window_mul_result_3[i][j] <= 0;
window_mul_result_4[i][j] <= 0;
window_mul_result_5[i][j] <= 0;
window_mul_result_6[i][j] <= 0;
end
end
end
else begin
for(i=0;i<5;i=i+1)begin
for(j=0;j<5;j=j+1)begin
window_mul_result_1[i][j] <={ { 24{1'b0} }, window[i][j] } * { {16{c1_w_1[i][j][15]}}, c1_w_1[i][j] };
window_mul_result_2[i][j] <={ { 24{1'b0} }, window[i][j] } * { {16{c1_w_2[i][j][15]}}, c1_w_2[i][j] };
window_mul_result_3[i][j] <={ { 24{1'b0} }, window[i][j] } * { {16{c1_w_3[i][j][15]}}, c1_w_3[i][j] };
window_mul_result_4[i][j] <={ { 24{1'b0} }, window[i][j] } * { {16{c1_w_4[i][j][15]}}, c1_w_4[i][j] };
window_mul_result_5[i][j] <={ { 24{1'b0} }, window[i][j] } * { {16{c1_w_5[i][j][15]}}, c1_w_5[i][j] };
window_mul_result_6[i][j] <={ { 24{1'b0} }, window[i][j] } * { {16{c1_w_6[i][j][15]}}, c1_w_6[i][j] };
end
end
end
end
wire [31:0] window_sum_1;
wire [31:0] window_sum_2;
wire [31:0] window_sum_3;
wire [31:0] window_sum_4;
wire [31:0] window_sum_5;
wire [31:0] window_sum_6;
//========================== lut ========================================
assign window_sum_1 =c1_b[0] + window_mul_result_1[0][0]+window_mul_result_1[0][1]+window_mul_result_1[0][2]+window_mul_result_1[0][3]+window_mul_result_1[0][4]+
window_mul_result_1[1][0]+window_mul_result_1[1][1]+window_mul_result_1[1][2]+window_mul_result_1[1][3]+window_mul_result_1[1][4]+
window_mul_result_1[2][0]+window_mul_result_1[2][1]+window_mul_result_1[2][2]+window_mul_result_1[2][3]+window_mul_result_1[2][4]+
window_mul_result_1[3][0]+window_mul_result_1[3][1]+window_mul_result_1[3][2]+window_mul_result_1[3][3]+window_mul_result_1[3][4]+
window_mul_result_1[4][0]+window_mul_result_1[4][1]+window_mul_result_1[4][2]+window_mul_result_1[4][3]+window_mul_result_1[4][4];
assign window_sum_2 =c1_b[1] + window_mul_result_2[0][0]+window_mul_result_2[0][1]+window_mul_result_2[0][2]+window_mul_result_2[0][3]+window_mul_result_2[0][4]+
window_mul_result_2[1][0]+window_mul_result_2[1][1]+window_mul_result_2[1][2]+window_mul_result_2[1][3]+window_mul_result_2[1][4]+
window_mul_result_2[2][0]+window_mul_result_2[2][1]+window_mul_result_2[2][2]+window_mul_result_2[2][3]+window_mul_result_2[2][4]+
window_mul_result_2[3][0]+window_mul_result_2[3][1]+window_mul_result_2[3][2]+window_mul_result_2[3][3]+window_mul_result_2[3][4]+
window_mul_result_2[4][0]+window_mul_result_2[4][1]+window_mul_result_2[4][2]+window_mul_result_2[4][3]+window_mul_result_2[4][4];
assign window_sum_3 =c1_b[2] + window_mul_result_3[0][0]+window_mul_result_3[0][1]+window_mul_result_3[0][2]+window_mul_result_3[0][3]+window_mul_result_3[0][4]+
window_mul_result_3[1][0]+window_mul_result_3[1][1]+window_mul_result_3[1][2]+window_mul_result_3[1][3]+window_mul_result_3[1][4]+
window_mul_result_3[2][0]+window_mul_result_3[2][1]+window_mul_result_3[2][2]+window_mul_result_3[2][3]+window_mul_result_3[2][4]+
window_mul_result_3[3][0]+window_mul_result_3[3][1]+window_mul_result_3[3][2]+window_mul_result_3[3][3]+window_mul_result_3[3][4]+
window_mul_result_3[4][0]+window_mul_result_3[4][1]+window_mul_result_3[4][2]+window_mul_result_3[4][3]+window_mul_result_3[4][4];
assign window_sum_4 =c1_b[3] + window_mul_result_4[0][0]+window_mul_result_4[0][1]+window_mul_result_4[0][2]+window_mul_result_4[0][3]+window_mul_result_4[0][4]+
window_mul_result_4[1][0]+window_mul_result_4[1][1]+window_mul_result_4[1][2]+window_mul_result_4[1][3]+window_mul_result_4[1][4]+
window_mul_result_4[2][0]+window_mul_result_4[2][1]+window_mul_result_4[2][2]+window_mul_result_4[2][3]+window_mul_result_4[2][4]+
window_mul_result_4[3][0]+window_mul_result_4[3][1]+window_mul_result_4[3][2]+window_mul_result_4[3][3]+window_mul_result_4[3][4]+
window_mul_result_4[4][0]+window_mul_result_4[4][1]+window_mul_result_4[4][2]+window_mul_result_4[4][3]+window_mul_result_4[4][4];
assign window_sum_5 =c1_b[4] + window_mul_result_5[0][0]+window_mul_result_5[0][1]+window_mul_result_5[0][2]+window_mul_result_5[0][3]+window_mul_result_5[0][4]+
window_mul_result_5[1][0]+window_mul_result_5[1][1]+window_mul_result_5[1][2]+window_mul_result_5[1][3]+window_mul_result_5[1][4]+
window_mul_result_5[2][0]+window_mul_result_5[2][1]+window_mul_result_5[2][2]+window_mul_result_5[2][3]+window_mul_result_5[2][4]+
window_mul_result_5[3][0]+window_mul_result_5[3][1]+window_mul_result_5[3][2]+window_mul_result_5[3][3]+window_mul_result_5[3][4]+
window_mul_result_5[4][0]+window_mul_result_5[4][1]+window_mul_result_5[4][2]+window_mul_result_5[4][3]+window_mul_result_5[4][4];
assign window_sum_6 =c1_b[5] + window_mul_result_6[0][0]+window_mul_result_6[0][1]+window_mul_result_6[0][2]+window_mul_result_6[0][3]+window_mul_result_6[0][4]+
window_mul_result_6[1][0]+window_mul_result_6[1][1]+window_mul_result_6[1][2]+window_mul_result_6[1][3]+window_mul_result_6[1][4]+
window_mul_result_6[2][0]+window_mul_result_6[2][1]+window_mul_result_6[2][2]+window_mul_result_6[2][3]+window_mul_result_6[2][4]+
window_mul_result_6[3][0]+window_mul_result_6[3][1]+window_mul_result_6[3][2]+window_mul_result_6[3][3]+window_mul_result_6[3][4]+
window_mul_result_6[4][0]+window_mul_result_6[4][1]+window_mul_result_6[4][2]+window_mul_result_6[4][3]+window_mul_result_6[4][4];
assign data_out = {
(window_sum_6[31]==0)?window_sum_6:0,
(window_sum_5[31]==0)?window_sum_5:0,
(window_sum_4[31]==0)?window_sum_4:0,
(window_sum_3[31]==0)?window_sum_3:0,
(window_sum_2[31]==0)?window_sum_2:0,
(window_sum_1[31]==0)?window_sum_1:0
};
wire data_out_valid_o = (x_cnt>=4 && y_cnt>=4)?1'b1:1'b0;
reg delay_data_out_valid_o;
always@(posedge clk)begin
delay_data_out_valid_o <=data_out_valid_o;
data_out_valid <= delay_data_out_valid_o;
end
endmodule
```
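The line-buffer idea behind `conv1` can be modeled in software: pixels arrive one at a time in raster order, the previous rows are kept in buffers, and a 5×5 window shifts by one column per pixel. The sketch below is a behavioral model under stated assumptions, not a cycle-accurate description of the Verilog. The function names are hypothetical, only one channel is modeled, and weight ordering and RAM/ROM latency are ignored.

```python
def stream_conv5x5(image, kernel, bias=0):
    """Streaming 5x5 convolution with ReLU, using 4 row buffers and a 5x5 window."""
    k = 5
    width = len(image[0])
    line_buffers = [[0] * width for _ in range(k - 1)]  # the 4 previous rows
    window = [[0] * k for _ in range(k)]
    outputs = []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            # Column of 5 vertically adjacent pixels ending at the new pixel.
            column = [line_buffers[r][x] for r in range(k - 1)] + [pixel]
            # Move the stored rows up by one at this column and store the new pixel.
            for r in range(k - 2):
                line_buffers[r][x] = line_buffers[r + 1][x]
            line_buffers[k - 2][x] = pixel
            # Shift the window left and append the new column.
            for r in range(k):
                window[r] = window[r][1:] + [column[r]]
            if x >= k - 1 and y >= k - 1:  # same condition as x_cnt>=4 && y_cnt>=4
                acc = bias + sum(window[i][j] * kernel[i][j]
                                 for i in range(k) for j in range(k))
                outputs.append(max(acc, 0))  # ReLU: negative sums become 0
    return outputs

def direct_conv5x5(image, kernel, bias=0):
    """Reference result computed directly from the full image."""
    h, w = len(image), len(image[0])
    return [max(bias + sum(image[y + i][x + j] * kernel[i][j]
                           for i in range(5) for j in range(5)), 0)
            for y in range(h - 4) for x in range(w - 4)]
```

For a 28×28 input the streaming model emits 24×24 values, matching the 24×24 feature map that the pooling layer consumes.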
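* Pooling layer 1:
    * Implements 2×2 max pooling.
    * The input is delayed by one cycle so that two adjacent values can be compared at the same time.

    * Design concept:
        1. Compare the two adjacent values in the current row.
        2. Read from RAM the maximum stored for the same position in the previous row.
        3. The larger of the two is the output.
        4. Store the larger of the two adjacent values of the current row in RAM, which avoids using extra storage.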

```verilog=
`timescale 1ns / 1ps
//////////////////////////////////////////////////////////////////////////////////
module pool1(
input clk,
input rst_n,
input [191:0] data_in, // 191 = 6 * 32 - 1
input data_in_valid,
output [191:0] data_out, // 191 = 6 * 32 - 1
output data_out_valid
);
reg [4:0] x_cnt;
reg [4:0] y_cnt;
always@(posedge clk ,negedge rst_n)begin
if(~rst_n)
x_cnt <= 0;
else if(data_in_valid && x_cnt == 'd23 )
x_cnt <= 0;
else if(data_in_valid)
x_cnt <=x_cnt +1'b1;
end
always@(posedge clk ,negedge rst_n)begin
if(~rst_n)
y_cnt <= 0;
else if(data_in_valid && x_cnt == 'd23 && y_cnt == 'd23 )
y_cnt <= 0;
else if(data_in_valid && x_cnt == 'd23 )
y_cnt <=y_cnt +1'b1;
end
//==================== delay data_in =============
reg [191:0] delay_data_in; // 191 = 6 * 32 - 1
always@(posedge clk)begin
delay_data_in<=data_in;
end
//==================== prepare for ram =============
wire [31:0] wr_data[0:5];
wire [31:0] rd_data[0:5];
wire wr_en;
reg [4:0] wr_addr;
reg [4:0] rd_addr;
assign wr_en = x_cnt >0;
genvar k;
generate
for (k=0;k<6;k=k+1)begin
assign wr_data[k] = ( data_in[(k+1)*32-1:k*32] > delay_data_in[(k+1)*32-1:k*32]) ? data_in[(k+1)*32-1:k*32] : delay_data_in[(k+1)*32-1:k*32];
end
endgenerate
wire [4:0] rd_addr_pre2 = wr_addr +2;
always@(posedge clk,negedge rst_n)begin
if(~rst_n)begin
wr_addr <=0;
rd_addr <= 0;
end
else if(data_in_valid )begin
if(wr_addr == 'd23)
wr_addr<=0;
else
wr_addr <= wr_addr +1'b1;
if(rd_addr_pre2 > 'd23)
rd_addr <= rd_addr_pre2-'d24;
else
rd_addr <= rd_addr_pre2;
end
end
reg [4:0] wr_addr_delay;
always@(posedge clk,negedge rst_n)begin
if(~rst_n)begin
wr_addr_delay <=0;
end
else begin
wr_addr_delay <= wr_addr;
end
end
generate
for (k=0;k<6;k=k+1)begin
pool1_data_linebuffer pool1_data_linebuffer_U (
.clka(clk), // input wire clka
.ena(1'b1), // input wire ena
.wea(wr_en), // input wire [0 : 0] wea
.addra(wr_addr_delay), // input wire [4 : 0] addra
.dina(wr_data[k]), // input wire [30 : 0] dina
.enb(1'b1),
.clkb(clk), // input wire clkb
.addrb(rd_addr), // input wire [4 : 0] addrb
.doutb(rd_data[k]) // output wire [30 : 0] doutb
);
end
endgenerate
generate
for (k=0;k<6;k=k+1)begin
assign data_out[(k+1)*32-1:k*32] = ( rd_data[k] > wr_data[k] )?rd_data[k] :wr_data[k];
end
endgenerate
assign data_out_valid = ( x_cnt[0:0]==1 && y_cnt[0:0]==1)?1'b1:1'b0;
endmodule
```
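The four design steps of the pooling layer can be checked with a short behavioral model. This is a sketch under stated assumptions: `stream_maxpool2x2` is a hypothetical name, a single channel is modeled, and the one-cycle delay and RAM addressing of the Verilog are not reproduced.

```python
def stream_maxpool2x2(rows):
    """Streaming 2x2 max pooling with a single row buffer.

    Even rows: store the maximum of each horizontal pair in the buffer.
    Odd rows: compare the stored maximum with the current pair and output the larger.
    """
    width = len(rows[0])
    row_buffer = [0] * (width // 2)   # plays the role of the RAM in the design
    outputs = []
    for y, row in enumerate(rows):
        for x in range(1, width, 2):
            pair_max = max(row[x - 1], row[x])   # step 1: adjacent values in this row
            if y % 2 == 0:
                row_buffer[x // 2] = pair_max    # step 4: keep it for the next row
            else:
                stored = row_buffer[x // 2]      # step 2: maximum from the previous row
                outputs.append(max(stored, pair_max))  # step 3: the output
    return outputs

print(stream_maxpool2x2([[1, 2, 3, 4],
                         [5, 6, 7, 8],
                         [9, 1, 2, 3],
                         [4, 5, 6, 0]]))  # [6, 8, 9, 6]
```

Only half a row of pair maxima is stored at any time, which is why the design needs no full-frame buffer.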
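* Convolution layer 2:
    * Implements a multi-channel convolution between the 12×12, 6-channel feature map (the output of the conv1 stage after pooling) and 16 kernels of size 5×5×6.
    * Multi-channel convolution:

        * A 1×1×6 convolution unit.

        * A 5×1×6 convolution unit.

        * A 5×5×6 convolution unit.

    * To save more hardware area, p1_fifo controls the incoming feature-map data so that each data item is fed in over 16 compute cycles.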
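The three units listed above nest naturally: a 5×5×6 window sum is five 5×1×6 row sums, and each row sum is five 1×1×6 channel sums. The sketch below shows that hierarchy as a plain Python reference. All function names and the `feature_map[y][x][c]` / `kernels[o][i][j][c]` layouts are assumptions made for this illustration; activation and the serial scheduling of output channels in the Verilog are not modeled.

```python
def dot_1x1x6(pixel, weights):
    """1x1x6 unit: multiply-accumulate over the 6 input channels of one pixel."""
    return sum(p * w for p, w in zip(pixel, weights))

def row_5x1x6(row_pixels, row_weights):
    """5x1x6 unit: five 1x1x6 units, one per position in a kernel row."""
    return sum(dot_1x1x6(p, w) for p, w in zip(row_pixels, row_weights))

def window_5x5x6(window, kernel, bias=0):
    """5x5x6 unit: five 5x1x6 units, one per kernel row, plus the bias."""
    return bias + sum(row_5x1x6(r, k) for r, k in zip(window, kernel))

def conv2_reference(feature_map, kernels, biases):
    """Apply every kernel to every 5x5 window; returns out[o][y][x]."""
    size = len(feature_map) - 4              # 12 -> 8 for a 5x5 kernel
    out = []
    for kernel, bias in zip(kernels, biases):
        plane = []
        for y in range(size):
            row = []
            for x in range(size):
                window = [feature_map[y + i][x:x + 5] for i in range(5)]
                row.append(window_5x5x6(window, kernel, bias))
            plane.append(row)
        out.append(plane)
    return out
```

A 12×12×6 input with 16 kernels gives a 16×8×8 result, which is the shape the software model produces before its second pooling step.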

```verilog=
`timescale 1ns / 1ps
//////////////////////////////////////////////////////////////////////////////////
module conv2(
input clk,
input rst_n,
input [191:0] data_in, // 191 = 32 * 6 - 1
input data_in_valid,
output [15:0] data_out,
output reg c2_ready,
output reg data_out_valid
);
genvar m, n, k;
//================== ADDR =======================
reg [3:0] wr_addr;
reg [3:0] rd_addr;
wire [3:0] rd_addr_pre2 = wr_addr + 2;
always@(posedge clk or negedge rst_n)begin
if(~rst_n)begin
wr_addr <= 0;
rd_addr <= 0;
end
else if( data_in_valid == 1'b1 )begin
//============ write ==============
if(wr_addr == 'd11)
wr_addr <= 0;
else
wr_addr <= wr_addr + 1'd1;
//============ read ==============
if(rd_addr_pre2 > 'd11)
rd_addr <= rd_addr_pre2 - 4'd12;
else
rd_addr <=rd_addr_pre2;
end
end
//================== DATA ========================
wire [191:0] window_in[0:2]; // 191 = 32 * 6 - 1
wire [191:0] window_out[0:2]; // 191 = 32 * 6 - 1
assign window_in[0] = data_in;
generate
for(k=1;k<3;k=k+1)begin
assign window_in[k] = window_out[k-1];
end
endgenerate
reg delay_c2_ready;
always@(posedge clk) begin
delay_c2_ready<=c2_ready;
end
generate
for(k=0;k<3;k=k+1)begin
conv2_linebuffer conv2_linebuffer_U (
.clka(clk), // input wire clka
.ena(1'b1), // input wire ena
.wea(data_in_valid), // input wire [0 : 0] wea
.addra(wr_addr), // input wire [3 : 0] addra
.dina(window_in[k]), // input wire [191 : 0] dina
.clkb(clk), // input wire clkb
.enb(delay_c2_ready), // input wire enb
.addrb(rd_addr), // input wire [3 : 0] addrb
.doutb(window_out[k]) // output wire [191 : 0] doutb
);
end
endgenerate
//===================== data window(6 channel) ============
integer i,j;
reg [191:0] window[2:0][2:0];
always@(posedge clk,negedge rst_n)begin
if(~rst_n)begin
for(i=0;i<3;i=i+1)begin
for(j=0;j<3;j=j+1)begin
window[i][j] <= 0;
end
end
end
else if(data_in_valid) begin
for(i=0;i<3;i=i+1)begin
window[i][0] <= window_in[i];
for(j=1;j<3;j=j+1)begin
window[i][j] <= window[i][j-1];
end
end
end
end
// Expose each channel's value for inspection
wire [31:0] window_vis_0[2:0][2:0];
wire [31:0] window_vis_1[2:0][2:0];
wire [31:0] window_vis_2[2:0][2:0];
wire [31:0] window_vis_3[2:0][2:0];
wire [31:0] window_vis_4[2:0][2:0];
wire [31:0] window_vis_5[2:0][2:0];
generate
for(m=0 ; m<3 ; m=m+1)begin
for(n=0 ; n<3 ; n=n+1)begin
assign window_vis_0[m][n] = window[m][n][(0+1)*32-1:0*32];
assign window_vis_1[m][n] = window[m][n][(1+1)*32-1:1*32];
assign window_vis_2[m][n] = window[m][n][(2+1)*32-1:2*32];
assign window_vis_3[m][n] = window[m][n][(3+1)*32-1:3*32];
assign window_vis_4[m][n] = window[m][n][(4+1)*32-1:4*32];
assign window_vis_5[m][n] = window[m][n][(5+1)*32-1:5*32];
end
end
endgenerate
//============================X_CNT =================================
reg [1:0] out_channel_cnt;
reg [4:0] x_cnt;
reg [4:0] y_cnt;
always@(posedge clk ,negedge rst_n)begin
if(~rst_n)
x_cnt <= 0;
else if(y_cnt == 12 && x_cnt == 1)
x_cnt <= 0;
else if(y_cnt == 12 && x_cnt == 0 && out_channel_cnt==2)
x_cnt <=x_cnt +1;
else if(data_in_valid && x_cnt == 11)
x_cnt <= 0;
else if(data_in_valid)
x_cnt <=x_cnt +1;
end
always@(posedge clk ,negedge rst_n)begin
if(~rst_n)
y_cnt <= 0;
else if( x_cnt == 1 && y_cnt == 12)
y_cnt <= 0;
else if(data_in_valid && x_cnt == 11)
y_cnt <=y_cnt +1;
else
y_cnt <=y_cnt;
end
//======================== OUT_CHANNEL_CNT============================
always@(posedge clk or negedge rst_n)begin
if(~rst_n)
out_channel_cnt<=0;
// else if(x_cnt > 0 && x_cnt < 1 && y_cnt > 5 && y_cnt != 12)
// out_channel_cnt<=0;
else if(out_channel_cnt==2)
out_channel_cnt<=0;
else if(x_cnt>=3 && y_cnt>=2)
out_channel_cnt<=out_channel_cnt+1;
else if(x_cnt == 0 && y_cnt>2)
out_channel_cnt<=out_channel_cnt+1;
else if(out_channel_cnt==1 && x_cnt == 1 && y_cnt > 2)
out_channel_cnt <= out_channel_cnt+1;
else
out_channel_cnt<=out_channel_cnt;
end
//========================= C2_READY =================================
// Ready signal used by P1_FIFO
always@(posedge clk or negedge rst_n)begin
if(~rst_n) // reset
c2_ready <= 1'b1;
else if(out_channel_cnt == 2 && x_cnt != 0)
c2_ready <= 1'b0;
else if(out_channel_cnt == 1 && x_cnt != 1)
c2_ready <= 1'b1;
else if(x_cnt == 1 && y_cnt >= 2) // first cycle in which conv2 is not ready
c2_ready <= 1'b0;
else if(x_cnt == 0 && y_cnt > 2)
c2_ready <= 1'b1;
// else if(x_cnt > 0 && x_cnt < 1 && y_cnt > 2) // first cycle in which conv2 is not ready
// c2_ready <= 1'b1;
end
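//=========== Weight fetch: 6 input channels in parallel, 3 output channels in series =========
// Weights: 3 x 6 x 3 x 3 ===> one read per output channel, the 3 output channels are computed serially
// c2_w is unrolled by kernel row => 3 output channels x (6 input channels) x (3 rows)
// |--- 3 weights along the column direction (each 8 bits x 6 parallel input channels) ---|
// |--0--|--1--|--2--| one 144-bit word per output channel (address = out_channel_cnt)
//==================== PARAMETER ====================
// One word holds 3 weights for each of the 6 parallel channels -> 18 weights x 8 bits = 144 bits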
wire [143:0] c2_w_row0_rd_data;
wire [143:0] c2_w_row1_rd_data;
wire [143:0] c2_w_row2_rd_data;
// The same word split into its 18 individual 8-bit weights (6 channels x 3 weights)
wire [7:0] c2_w_row0_data[0:17];
wire [7:0] c2_w_row1_data[0:17];
wire [7:0] c2_w_row2_data[0:17];
c2_w_row0 Uc2_w_row0 (
.clka(clk), // input wire clka
.addra(out_channel_cnt), // input wire [3 : 0] addra
.douta(c2_w_row0_rd_data) // output wire [239 : 0] douta
);
c2_w_row1 Uc2_w_row1 (
.clka(clk), // input wire clka
.addra(out_channel_cnt), // input wire [3 : 0] addra
.douta(c2_w_row1_rd_data) // output wire [239 : 0] douta
);
c2_w_row2 Uc2_w_row2 (
.clka(clk), // input wire clka
.addra(out_channel_cnt), // input wire [3 : 0] addra
.douta(c2_w_row2_rd_data) // output wire [239 : 0] douta
);
generate
for(k=0;k<18;k=k+1)begin
assign c2_w_row0_data[k] = c2_w_row0_rd_data[k*8+:8];
assign c2_w_row1_data[k] = c2_w_row1_rd_data[k*8+:8];
assign c2_w_row2_data[k] = c2_w_row2_rd_data[k*8+:8];
end
endgenerate
//=============================== MUL================================
reg signed [31:0] in_channel_0_mul_result[0:2][0:2];
reg signed [31:0] in_channel_1_mul_result[0:2][0:2];
reg signed [31:0] in_channel_2_mul_result[0:2][0:2];
reg signed [31:0] in_channel_3_mul_result[0:2][0:2];
reg signed [31:0] in_channel_4_mul_result[0:2][0:2];
reg signed [31:0] in_channel_5_mul_result[0:2][0:2];
wire [34:0] in_channel_0_sum_result;
wire [34:0] in_channel_1_sum_result;
wire [34:0] in_channel_2_sum_result;
wire [34:0] in_channel_3_sum_result;
wire [34:0] in_channel_4_sum_result;
wire [34:0] in_channel_5_sum_result;
wire signed [34:0] in_channel_sum_result;
//============c1_w *4096 | c2_w * 256 ===
wire [15:0] in_channel_sum_result_s;
always@(posedge clk)begin
for(j=0;j<3;j=j+1)begin
in_channel_0_mul_result[0][j]<= window[2][2-j][32*(0+1)-1:32*0] *{{24{c2_w_row0_data[j+0*3][7]}}, c2_w_row0_data[j+0*3]};
in_channel_0_mul_result[1][j]<= window[1][2-j][32*(0+1)-1:32*0] *{{24{c2_w_row1_data[j+0*3][7]}}, c2_w_row1_data[j+0*3]};
in_channel_0_mul_result[2][j]<= window[0][2-j][32*(0+1)-1:32*0] *{{24{c2_w_row2_data[j+0*3][7]}}, c2_w_row2_data[j+0*3]};
in_channel_1_mul_result[0][j]<= window[2][2-j][32*(1+1)-1:32*1] *{{24{c2_w_row0_data[j+1*3][7]}}, c2_w_row0_data[j+1*3]};
in_channel_1_mul_result[1][j]<= window[1][2-j][32*(1+1)-1:32*1] *{{24{c2_w_row1_data[j+1*3][7]}}, c2_w_row1_data[j+1*3]};
in_channel_1_mul_result[2][j]<= window[0][2-j][32*(1+1)-1:32*1] *{{24{c2_w_row2_data[j+1*3][7]}}, c2_w_row2_data[j+1*3]};
in_channel_2_mul_result[0][j]<= window[2][2-j][32*(2+1)-1:32*2] *{{24{c2_w_row0_data[j+2*3][7]}}, c2_w_row0_data[j+2*3]};
in_channel_2_mul_result[1][j]<= window[1][2-j][32*(2+1)-1:32*2] *{{24{c2_w_row1_data[j+2*3][7]}}, c2_w_row1_data[j+2*3]};
in_channel_2_mul_result[2][j]<= window[0][2-j][32*(2+1)-1:32*2] *{{24{c2_w_row2_data[j+2*3][7]}}, c2_w_row2_data[j+2*3]};
in_channel_3_mul_result[0][j]<= window[2][2-j][32*(3+1)-1:32*3] *{{24{c2_w_row0_data[j+3*3][7]}}, c2_w_row0_data[j+3*3]};
in_channel_3_mul_result[1][j]<= window[1][2-j][32*(3+1)-1:32*3] *{{24{c2_w_row1_data[j+3*3][7]}}, c2_w_row1_data[j+3*3]};
in_channel_3_mul_result[2][j]<= window[0][2-j][32*(3+1)-1:32*3] *{{24{c2_w_row2_data[j+3*3][7]}}, c2_w_row2_data[j+3*3]};
in_channel_4_mul_result[0][j]<= window[2][2-j][32*(4+1)-1:32*4] *{{24{c2_w_row0_data[j+4*3][7]}}, c2_w_row0_data[j+4*3]};
in_channel_4_mul_result[1][j]<= window[1][2-j][32*(4+1)-1:32*4] *{{24{c2_w_row1_data[j+4*3][7]}}, c2_w_row1_data[j+4*3]};
in_channel_4_mul_result[2][j]<= window[0][2-j][32*(4+1)-1:32*4] *{{24{c2_w_row2_data[j+4*3][7]}}, c2_w_row2_data[j+4*3]};
in_channel_5_mul_result[0][j]<= window[2][2-j][32*(5+1)-1:32*5] *{{24{c2_w_row0_data[j+5*3][7]}}, c2_w_row0_data[j+5*3]};
in_channel_5_mul_result[1][j]<= window[1][2-j][32*(5+1)-1:32*5] *{{24{c2_w_row1_data[j+5*3][7]}}, c2_w_row1_data[j+5*3]};
in_channel_5_mul_result[2][j]<= window[0][2-j][32*(5+1)-1:32*5] *{{24{c2_w_row2_data[j+5*3][7]}}, c2_w_row2_data[j+5*3]};
end
end
assign in_channel_0_sum_result = in_channel_0_mul_result[0][0]+in_channel_0_mul_result[0][1]+in_channel_0_mul_result[0][2]+
in_channel_0_mul_result[1][0]+in_channel_0_mul_result[1][1]+in_channel_0_mul_result[1][2]+
in_channel_0_mul_result[2][0]+in_channel_0_mul_result[2][1]+in_channel_0_mul_result[2][2];
assign in_channel_1_sum_result = in_channel_1_mul_result[0][0]+in_channel_1_mul_result[0][1]+in_channel_1_mul_result[0][2]+
in_channel_1_mul_result[1][0]+in_channel_1_mul_result[1][1]+in_channel_1_mul_result[1][2]+
in_channel_1_mul_result[2][0]+in_channel_1_mul_result[2][1]+in_channel_1_mul_result[2][2];
assign in_channel_2_sum_result = in_channel_2_mul_result[0][0]+in_channel_2_mul_result[0][1]+in_channel_2_mul_result[0][2]+
in_channel_2_mul_result[1][0]+in_channel_2_mul_result[1][1]+in_channel_2_mul_result[1][2]+
in_channel_2_mul_result[2][0]+in_channel_2_mul_result[2][1]+in_channel_2_mul_result[2][2];
assign in_channel_3_sum_result = in_channel_3_mul_result[0][0]+in_channel_3_mul_result[0][1]+in_channel_3_mul_result[0][2]+
in_channel_3_mul_result[1][0]+in_channel_3_mul_result[1][1]+in_channel_3_mul_result[1][2]+
in_channel_3_mul_result[2][0]+in_channel_3_mul_result[2][1]+in_channel_3_mul_result[2][2];
assign in_channel_4_sum_result = in_channel_4_mul_result[0][0]+in_channel_4_mul_result[0][1]+in_channel_4_mul_result[0][2]+
in_channel_4_mul_result[1][0]+in_channel_4_mul_result[1][1]+in_channel_4_mul_result[1][2]+
in_channel_4_mul_result[2][0]+in_channel_4_mul_result[2][1]+in_channel_4_mul_result[2][2];
assign in_channel_5_sum_result = in_channel_5_mul_result[0][0]+in_channel_5_mul_result[0][1]+in_channel_5_mul_result[0][2]+
in_channel_5_mul_result[1][0]+in_channel_5_mul_result[1][1]+in_channel_5_mul_result[1][2]+
in_channel_5_mul_result[2][0]+in_channel_5_mul_result[2][1]+in_channel_5_mul_result[2][2];
assign in_channel_sum_result = in_channel_0_sum_result+
in_channel_1_sum_result+
in_channel_2_sum_result+
in_channel_3_sum_result+
in_channel_4_sum_result+
in_channel_5_sum_result;
assign in_channel_sum_result_s = in_channel_sum_result >>> 20; //2^12 * 2^8
reg [4:0] delay_x_cnt;
always@(posedge clk)
delay_x_cnt <= x_cnt;
always@(posedge clk or negedge rst_n)begin
if(~rst_n) // reset
data_out_valid <= 0;
else if(x_cnt==3 && y_cnt >=2 && out_channel_cnt == 1)
data_out_valid <= 1;
else if(x_cnt==0 && y_cnt ==0)
data_out_valid <= 0;
else if(x_cnt==1 && y_cnt >=2 && y_cnt!=12)
data_out_valid <= 0;
end
assign data_out = (in_channel_sum_result_s[15]==1)? 0 : in_channel_sum_result_s;
endmodule
```
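The arithmetic in the conv2 module above can be cross-checked against a small software model. The sketch below is one possible reference for a single output pixel: it accumulates 6 channels × 3×3 products, shifts right by 20 bits to remove the two weight scales (2^12 and 2^8), and applies ReLU. The names `conv2_pixel_ref`, `C2_IN_CH`, `C2_K` and `C2_SHIFT` are hypothetical, the window is indexed in natural order rather than the mirrored order used by the RTL, and the model assumes that no intermediate value overflows the RTL bit widths.

```c
#include <assert.h>
#include <stdint.h>

#define C2_IN_CH 6   /* input channels, multiplied in parallel in the RTL */
#define C2_K     3   /* 3x3 kernel */
#define C2_SHIFT 20  /* removes the 2^12 (c1_w) and 2^8 (c2_w) weight scales */

/* Hypothetical reference model: one output pixel of one conv2 output channel.
 * act[ch][row][col] are the 32-bit activations inside the 3x3 window,
 * w[ch][row][col] are the signed 8-bit weights of the same output channel. */
static uint16_t conv2_pixel_ref(int32_t act[C2_IN_CH][C2_K][C2_K],
                                int8_t w[C2_IN_CH][C2_K][C2_K])
{
    int64_t acc = 0;

    for (int ch = 0; ch < C2_IN_CH; ch++)
        for (int r = 0; r < C2_K; r++)
            for (int c = 0; c < C2_K; c++)
                acc += (int64_t)act[ch][r][c] * w[ch][r][c];

    if (acc < 0)
        return 0;                       /* ReLU: negative sums become 0 */

    int64_t scaled = acc >> C2_SHIFT;   /* acc is non-negative here */
    assert(scaled <= INT16_MAX);        /* assumption: result fits in 15 bits */
    return (uint16_t)scaled;
}
```

Feeding the same window and weights to the simulation and to this function gives a quick way to spot scaling or sign-extension mistakes.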
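* Pooling layer 2:
    * Same concept as pooling layer 1: 2×2 max pooling, here applied to the 10×10 feature map of each of the 3 channels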
```verilog=
module pool2(
input clk,
input rst_n,
input [47:0] data_in,
input data_in_valid,
output reg done,
output [15:0] data_out_1,
output [15:0] data_out_2,
output [15:0] data_out_3,
output data_out_valid
);
reg [4:0] x_cnt;
reg [4:0] y_cnt;
always@(posedge clk ,negedge rst_n)begin
if(~rst_n)
x_cnt <= 0;
else if(data_in_valid && x_cnt == 'd9)
x_cnt <= 0;
else if(data_in_valid)
x_cnt <=x_cnt +1'b1;
end
always@(posedge clk ,negedge rst_n)begin
if(~rst_n)
y_cnt <= 0;
else if(data_in_valid && x_cnt == 'd9 && y_cnt == 'd9)
y_cnt <= 0;
else if(data_in_valid && x_cnt == 'd9)
y_cnt <=y_cnt +1'b1;
end
always@(posedge clk,negedge rst_n)begin
if(~rst_n)
done<=0;
else if(x_cnt=='d9 && y_cnt=='d9 && data_in_valid)
done<=1'b1;
else
done<=0;
end
//==================== delay data_in =============
reg [47:0] delay_data_in;
always@(posedge clk)begin
delay_data_in<=data_in;
end
//==================== prepare for ram =============
wire [15:0] wr_data[0:2];
wire [15:0] rd_data[0:2];
wire wr_en;
reg [4:0] wr_addr;
reg [4:0] rd_addr;
assign wr_en = data_in_valid;
genvar k;
generate
for (k=0 ; k<3 ; k=k+1)begin
assign wr_data[k] = ( data_in[(k+1)*16-1:k*16] > delay_data_in[(k+1)*16-1:k*16]) ? data_in[(k+1)*16-1:k*16] : delay_data_in[(k+1)*16-1:k*16];
end
endgenerate
wire [4:0] rd_addr_pre2 = wr_addr +2;
always@(posedge clk,negedge rst_n)begin
if(~rst_n)begin
wr_addr <=0;
rd_addr <= 0;
end
else if(data_in_valid )begin
if(wr_addr == 'd9)
wr_addr<=0;
else
wr_addr <= wr_addr +1'b1;
if(rd_addr_pre2 > 'd9)
rd_addr <= rd_addr_pre2-'d10;
else
rd_addr <= rd_addr_pre2;
end
end
generate
for(k=0;k<3;k=k+1)begin
p2_linebuffer p2_linebuffer (
.clka(clk), // input wire clka
.ena(1'b1), // input wire ena
.wea(wr_en), // input wire [0 : 0] wea
.addra(wr_addr), // input wire [3 : 0] addra
.dina(wr_data[k]), // input wire [15 : 0] dina
.enb(1'b1), // input wire enb
.clkb(clk), // input wire clkb
.addrb(rd_addr), // input wire [3 : 0] addrb
.doutb(rd_data[k]) // output wire [15 : 0] doutb
);
end
endgenerate
//wire [15:0] data_out_vis[0:2];
assign data_out_1 = ( rd_data[0] > wr_data[0] )?rd_data[0] :wr_data[0];
assign data_out_2 = ( rd_data[1] > wr_data[1] )?rd_data[1] :wr_data[1];
assign data_out_3 = ( rd_data[2] > wr_data[2] )?rd_data[2] :wr_data[2];
// assign data_out_vis[k] = data_out[(k+1)*16-1:k*16];
assign data_out_valid = ( x_cnt[0:0]==1 && y_cnt[0:0]==1)?1'b1:1'b0;
endmodule
```
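The pool2 module computes the maximum in two steps: first between horizontally adjacent pixels, then against the previous row stored in the line buffer. A plain software equivalent for one channel might look like the sketch below. The names `pool2_ref`, `max_u16`, `P2_IN` and `P2_OUT` are hypothetical, and the 10×10 input size is taken from the counters in the module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define P2_IN  10             /* input feature map is 10x10 per channel */
#define P2_OUT (P2_IN / 2)    /* 2x2 window with stride 2 gives 5x5 */

static uint16_t max_u16(uint16_t a, uint16_t b)
{
    return a > b ? a : b;
}

/* Hypothetical reference model of 2x2 max pooling for one channel. */
static void pool2_ref(uint16_t in[P2_IN][P2_IN], uint16_t out[P2_OUT][P2_OUT])
{
    assert(in != NULL && out != NULL);

    for (int y = 0; y < P2_OUT; y++) {
        for (int x = 0; x < P2_OUT; x++) {
            /* horizontal maximum of each row of the window ... */
            uint16_t top    = max_u16(in[2 * y][2 * x],     in[2 * y][2 * x + 1]);
            uint16_t bottom = max_u16(in[2 * y + 1][2 * x], in[2 * y + 1][2 * x + 1]);
            /* ... then the vertical maximum, as the line buffer does in the RTL */
            out[y][x] = max_u16(top, bottom);
        }
    }
}
```

Three such 5×5 outputs, one per channel, give the 75 values consumed by the first fully connected layer.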
* Fully connected layers: we implement these layers on the PS side
```C=
//============================== FC ==============================
// Each layer computes out = ReLU(W * in + b).
// fc1: 75 inputs (3 channels x 5 x 5 from pool2) -> 60 outputs
void fc1(double in[75], double fc1_w[60][75], double fc1_b[60], double out[60]){
    double sum;
    for(int i=0;i<60;i++){
        sum = 0;
        for(int j=0;j<75;j++){
            sum += in[j] * fc1_w[i][j];   // dot product of weight row i with the input vector
        }
        out[i] = sum + fc1_b[i];          // add the bias of output neuron i
    }
    //======RELU1======
    for(int i=0;i<60;i++){
        if(out[i]<0)
            out[i]=0;
    }
}
// fc2: 60 inputs -> 42 outputs
void fc2(double in[60],double fc2_w[42][60],double fc2_b[42],double out[42]){
    double sum;
    for(int i=0;i<42;i++){
        sum = 0;
        for(int j=0;j<60;j++){
            sum += in[j] * fc2_w[i][j];
        }
        out[i] = sum + fc2_b[i];
    }
    //======RELU2======
    for(int i=0;i<42;i++){
        if(out[i]<0)
            out[i]=0;
    }
}
// fc3: 42 inputs -> 10 outputs
void fc3(double in[42],double fc3_w[10][42],double fc3_b[10],double out[10]){
    double sum;
    for(int i=0;i<10;i++){
        sum = 0;
        for(int j=0;j<42;j++){
            sum += in[j] * fc3_w[i][j];
        }
        out[i] = sum + fc3_b[i];
    }
    //======RELU3======
    for(int i=0;i<10;i++){
        if(out[i]<0)
            out[i]=0;
    }
}
```
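The three layers above share one structure, so they could also be expressed with a single size-parameterised helper, followed by an argmax to pick the predicted class. The sketch below is only an illustration of that idea: `fc_relu` and `argmax` are hypothetical names that do not appear in the project code, and the weights are assumed to be stored row-major in a flat array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic layer: out[i] = max(0, b[i] + sum_j w[i * n_in + j] * in[j]).
 * w is a flat row-major array of n_out rows with n_in weights each. */
static void fc_relu(const double *in, size_t n_in,
                    const double *w, const double *b,
                    double *out, size_t n_out)
{
    assert(in != NULL && w != NULL && b != NULL && out != NULL);

    for (size_t i = 0; i < n_out; i++) {
        double sum = b[i];
        for (size_t j = 0; j < n_in; j++)
            sum += w[i * n_in + j] * in[j];
        out[i] = sum < 0.0 ? 0.0 : sum;   /* ReLU */
    }
}

/* Index of the largest element; the first one wins on ties. */
static size_t argmax(const double *v, size_t n)
{
    assert(v != NULL && n > 0);

    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (v[i] > v[best])
            best = i;
    return best;
}
```

With such a helper, the network tail would be three calls with sizes 75→60, 60→42 and 42→10, followed by `argmax` on the last output vector.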
---
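### OV5640 camera + HDMI display system
**Design architecture**
![Design architecture](https://hackmd.io/_uploads/rk0SRa5G2.png)<br>
IP overview
1. ZYNQ7 Processing System
    * Function
        * Connects the PS and the PL
        * Through this IP, both our own custom IP and third-party IP can be connected to the PS
    * Usage
        * Run the automatic configuration
        * Enable the HP port (used for VDMA transfers)
        * Enable the clocks supplied to the PL (100 MHz as the stream clock, 130 MHz as the input to the clock divider)
2. DVI_TX
    * Function
        * IP that converts VGA to HDMI
3. OV5640
    * Function
        * Driver IP for the OV5640 camera
4. AXI Video Direct Memory Access
    * Function
        * Provides video read/write transfers between the AXI4 interface and the AXI4-Stream interface
5. AXI4-Stream to Video Out
    * Function
        * Converts AXI4-Stream signals into a standard parallel video output interface with timing signals
        * Works together with the Xilinx Video Timing Controller (VTC) IP to generate the timing signals of the video format
    * Usage
        * Driven by an independent clock
6. Video in to AXI4-Stream
    * Function
        * Converts a video source (clocked parallel video data with sync signals, blanking signals, or both) into AXI4-Stream form
    * Usage
        * Used standalone
7. Video Timing Controller
    * Function
        * General-purpose video timing detector and generator
        * Can be paired with the Xilinx Video in to AXI4-Stream or AXI4-Stream to Video Out IP
    * Usage
        * Set the resolution (we use 640×480)
        * Do not enable AXI4-Lite control
        * Do not enable detection either
8. Clock Wizard
    * Function
        * Simplifies configuring the clock resources of a Xilinx FPGA
        * Used here as a clock divider
    * Usage
        * No reset is needed
        * Clocks we use (25 MHz for the I2C protocol, 125 MHz for DVI_TX)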
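The clock figures above can be sanity-checked with a little arithmetic. The sketch below assumes the standard 640×480 at 60 Hz timing (800 × 525 total pixels per frame including blanking) and assumes that the DVI_TX encoder needs a serial clock of 5 × the pixel clock (10 TMDS bits per pixel sent on both clock edges). Neither assumption is stated in the note itself, and the names `VideoTiming`, `pixel_clock_hz`, `refresh_millihz` and `tmds_serial_clock_hz` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Total frame size including horizontal and vertical blanking. */
typedef struct {
    uint32_t h_total;
    uint32_t v_total;
} VideoTiming;

/* Pixel clock needed to reach a given refresh rate. */
static uint32_t pixel_clock_hz(const VideoTiming *t, uint32_t refresh_hz)
{
    assert(t != 0);
    return t->h_total * t->v_total * refresh_hz;
}

/* Refresh rate in millihertz obtained from a given pixel clock. */
static uint32_t refresh_millihz(const VideoTiming *t, uint32_t pixel_hz)
{
    assert(t != 0);
    uint64_t pixels_per_frame = (uint64_t)t->h_total * t->v_total;
    assert(pixels_per_frame != 0);
    return (uint32_t)(((uint64_t)pixel_hz * 1000u) / pixels_per_frame);
}

/* Assumption: the encoder serialises 10 bits per pixel using both clock edges. */
static uint32_t tmds_serial_clock_hz(uint32_t pixel_hz)
{
    return pixel_hz * 5u;
}
```

Under these assumptions a 25 MHz pixel clock yields a refresh rate of about 59.5 Hz, and the matching serial clock is 125 MHz, which is consistent with the 125 MHz output listed for DVI_TX.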
HDMI display system
* XDC Constraint
```
set_property IOSTANDARD TMDS_33 [get_ports TMDS_Clk_p]
set_property IOSTANDARD TMDS_33 [get_ports TMDS_Clk_n]
set_property IOSTANDARD TMDS_33 [get_ports {TMDS_Data_p[0]}]
set_property IOSTANDARD TMDS_33 [get_ports {TMDS_Data_p[1]}]
set_property IOSTANDARD TMDS_33 [get_ports {TMDS_Data_p[2]}]
set_property IOSTANDARD TMDS_33 [get_ports {TMDS_Data_n[0]}]
set_property IOSTANDARD TMDS_33 [get_ports {TMDS_Data_n[1]}]
set_property IOSTANDARD TMDS_33 [get_ports {TMDS_Data_n[2]}]
set_property PACKAGE_PIN L16 [get_ports TMDS_Clk_p]
set_property PACKAGE_PIN L17 [get_ports TMDS_Clk_n]
set_property PACKAGE_PIN K17 [get_ports {TMDS_Data_p[0]}]
set_property PACKAGE_PIN K19 [get_ports {TMDS_Data_p[1]}]
set_property PACKAGE_PIN J18 [get_ports {TMDS_Data_p[2]}]
set_property PACKAGE_PIN K18 [get_ports {TMDS_Data_n[0]}]
set_property PACKAGE_PIN J19 [get_ports {TMDS_Data_n[1]}]
set_property PACKAGE_PIN H18 [get_ports {TMDS_Data_n[2]}]
```
* ==Controlling the display system from the PS side==
    * Implements HDMI display control on the FPGA
```c
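#include "xparameters.h" // peripheral base addresses
#include "xil_cache.h"
#include "xil_types.h"
#define VDMA_ADDR XPAR_AXI_VDMA_0_BASEADDR
#define FRAME_ADDR 0x01200000
#define H 480
#define W 640
// One 24-bit pixel; the field order is the byte order in the frame buffer
typedef struct{
    u8 g;
    u8 b;
    u8 r;
}Pixel;
// Forward declaration: HDMI_STRING calls HDMI_CHAR before its definition
void HDMI_CHAR(u16 x, u16 y, u8 letter);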
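// 8x16 bitmap font for the 95 printable ASCII characters (' ' to '~'):
// 16 bytes per glyph, one byte per pixel row, bit 7 is the leftmost pixel
u8 WORD_MAP[95][16] ={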
/* " !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~"*/
{0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00},
{0x00,0x00,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x00,0x00,0x10,0x10,0x00,0x00,0x00},
{0x12,0x24,0x24,0x48,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00},
{0x00,0x00,0x12,0x12,0x12,0x7E,0x24,0x24,0x24,0x7E,0x24,0x24,0x24,0x00,0x00,0x00},
{0x00,0x08,0x3C,0x4A,0x4A,0x48,0x38,0x0C,0x0A,0x0A,0x4A,0x4A,0x3C,0x08,0x08,0x00},
{0x00,0x00,0x44,0xA4,0xA8,0xA8,0xB0,0x54,0x1A,0x2A,0x2A,0x4A,0x44,0x00,0x00,0x00},
{0x00,0x00,0x30,0x48,0x48,0x48,0x50,0x6E,0xA4,0x94,0x98,0x89,0x76,0x00,0x00,0x00},
{0x60,0x20,0x20,0x40,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00},
{0x02,0x04,0x08,0x08,0x10,0x10,0x10,0x10,0x10,0x10,0x08,0x08,0x04,0x02,0x00,0x00},
{0x40,0x20,0x10,0x10,0x08,0x08,0x08,0x08,0x08,0x08,0x10,0x10,0x20,0x40,0x00,0x00},
{0x00,0x00,0x00,0x10,0x10,0xD6,0x38,0x38,0xD6,0x10,0x10,0x00,0x00,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x08,0x08,0x08,0x7F,0x08,0x08,0x08,0x00,0x00,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x60,0x20,0x20,0x40,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x7E,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x60,0x60,0x00,0x00,0x00},
{0x00,0x02,0x04,0x04,0x04,0x08,0x08,0x10,0x10,0x10,0x20,0x20,0x40,0x40,0x00,0x00},
{0x00,0x00,0x18,0x24,0x42,0x42,0x42,0x42,0x42,0x42,0x42,0x24,0x18,0x00,0x00,0x00},
{0x00,0x00,0x08,0x38,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x3E,0x00,0x00,0x00},
{0x00,0x00,0x3C,0x42,0x42,0x42,0x02,0x04,0x08,0x10,0x20,0x42,0x7E,0x00,0x00,0x00},
{0x00,0x00,0x3C,0x42,0x42,0x02,0x04,0x18,0x04,0x02,0x42,0x42,0x3C,0x00,0x00,0x00},
{0x00,0x00,0x04,0x0C,0x0C,0x14,0x24,0x24,0x44,0x7F,0x04,0x04,0x1F,0x00,0x00,0x00},
{0x00,0x00,0x7E,0x40,0x40,0x40,0x78,0x44,0x02,0x02,0x42,0x44,0x38,0x00,0x00,0x00},
{0x00,0x00,0x18,0x24,0x40,0x40,0x5C,0x62,0x42,0x42,0x42,0x22,0x1C,0x00,0x00,0x00},
{0x00,0x00,0x7E,0x42,0x04,0x04,0x08,0x08,0x10,0x10,0x10,0x10,0x10,0x00,0x00,0x00},
{0x00,0x00,0x3C,0x42,0x42,0x42,0x24,0x18,0x24,0x42,0x42,0x42,0x3C,0x00,0x00,0x00},
{0x00,0x00,0x38,0x44,0x42,0x42,0x42,0x46,0x3A,0x02,0x02,0x24,0x18,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x18,0x18,0x00,0x00,0x00,0x00,0x18,0x18,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x10,0x00,0x00,0x00,0x00,0x00,0x10,0x10,0x10,0x00},
{0x00,0x00,0x02,0x04,0x08,0x10,0x20,0x40,0x20,0x10,0x08,0x04,0x02,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x7E,0x00,0x00,0x7E,0x00,0x00,0x00,0x00,0x00,0x00,0x00},
{0x00,0x00,0x40,0x20,0x10,0x08,0x04,0x02,0x04,0x08,0x10,0x20,0x40,0x00,0x00,0x00},
{0x00,0x00,0x3C,0x42,0x42,0x62,0x04,0x08,0x08,0x08,0x00,0x18,0x18,0x00,0x00,0x00},
{0x00,0x00,0x38,0x44,0x5A,0xAA,0xAA,0xAA,0xAA,0xAA,0x5C,0x42,0x3C,0x00,0x00,0x00},
{0x00,0x00,0x10,0x10,0x18,0x28,0x28,0x24,0x3C,0x44,0x42,0x42,0xE7,0x00,0x00,0x00},
{0x00,0x00,0xF8,0x44,0x44,0x44,0x78,0x44,0x42,0x42,0x42,0x44,0xF8,0x00,0x00,0x00},
{0x00,0x00,0x3E,0x42,0x42,0x80,0x80,0x80,0x80,0x80,0x42,0x44,0x38,0x00,0x00,0x00},
{0x00,0x00,0xF8,0x44,0x42,0x42,0x42,0x42,0x42,0x42,0x42,0x44,0xF8,0x00,0x00,0x00},
{0x00,0x00,0xFC,0x42,0x48,0x48,0x78,0x48,0x48,0x40,0x42,0x42,0xFC,0x00,0x00,0x00},
{0x00,0x00,0xFC,0x42,0x48,0x48,0x78,0x48,0x48,0x40,0x40,0x40,0xE0,0x00,0x00,0x00},
{0x00,0x00,0x3C,0x44,0x44,0x80,0x80,0x80,0x8E,0x84,0x44,0x44,0x38,0x00,0x00,0x00},
{0x00,0x00,0xE7,0x42,0x42,0x42,0x42,0x7E,0x42,0x42,0x42,0x42,0xE7,0x00,0x00,0x00},
{0x00,0x00,0x7C,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x7C,0x00,0x00,0x00},
{0x00,0x00,0x3E,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x88,0xF0,0x00},
{0x00,0x00,0xEE,0x44,0x48,0x50,0x70,0x50,0x48,0x48,0x44,0x44,0xEE,0x00,0x00,0x00},
{0x00,0x00,0xE0,0x40,0x40,0x40,0x40,0x40,0x40,0x40,0x40,0x42,0xFE,0x00,0x00,0x00},
{0x00,0x00,0xEE,0x6C,0x6C,0x6C,0x6C,0x6C,0x54,0x54,0x54,0x54,0xD6,0x00,0x00,0x00},
{0x00,0x00,0xC7,0x62,0x62,0x52,0x52,0x4A,0x4A,0x4A,0x46,0x46,0xE2,0x00,0x00,0x00},
{0x00,0x00,0x38,0x44,0x82,0x82,0x82,0x82,0x82,0x82,0x82,0x44,0x38,0x00,0x00,0x00},
{0x00,0x00,0xFC,0x42,0x42,0x42,0x42,0x7C,0x40,0x40,0x40,0x40,0xE0,0x00,0x00,0x00},
{0x00,0x00,0x38,0x44,0x82,0x82,0x82,0x82,0x82,0x82,0xB2,0x4C,0x38,0x06,0x00,0x00},
{0x00,0x00,0xFC,0x42,0x42,0x42,0x7C,0x48,0x48,0x44,0x44,0x42,0xE3,0x00,0x00,0x00},
{0x00,0x00,0x3E,0x42,0x42,0x40,0x20,0x18,0x04,0x02,0x42,0x42,0x7C,0x00,0x00,0x00},
{0x00,0x00,0xFE,0x92,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x38,0x00,0x00,0x00},
{0x00,0x00,0xE7,0x42,0x42,0x42,0x42,0x42,0x42,0x42,0x42,0x42,0x3C,0x00,0x00,0x00},
{0x00,0x00,0xE7,0x42,0x42,0x44,0x24,0x24,0x28,0x28,0x18,0x10,0x10,0x00,0x00,0x00},
{0x00,0x00,0xD6,0x54,0x54,0x54,0x54,0x54,0x6C,0x28,0x28,0x28,0x28,0x00,0x00,0x00},
{0x00,0x00,0xE7,0x42,0x24,0x24,0x18,0x18,0x18,0x24,0x24,0x42,0xE7,0x00,0x00,0x00},
{0x00,0x00,0xEE,0x44,0x44,0x28,0x28,0x10,0x10,0x10,0x10,0x10,0x38,0x00,0x00,0x00},
{0x00,0x00,0x7E,0x84,0x04,0x08,0x08,0x10,0x20,0x20,0x42,0x42,0xFC,0x00,0x00,0x00},
{0x1E,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x1E,0x00,0x00},
{0x00,0x40,0x20,0x20,0x20,0x10,0x10,0x10,0x08,0x08,0x04,0x04,0x04,0x02,0x02,0x00},
{0x78,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x78,0x00,0x00},
{0x18,0x24,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xFF,0x00},
{0x60,0x10,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x38,0x44,0x0C,0x34,0x44,0x4C,0x36,0x00,0x00,0x00},
{0x00,0x00,0x00,0xC0,0x40,0x40,0x58,0x64,0x42,0x42,0x42,0x64,0x58,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x1C,0x22,0x40,0x40,0x40,0x22,0x1C,0x00,0x00,0x00},
{0x00,0x00,0x00,0x06,0x02,0x02,0x3E,0x42,0x42,0x42,0x42,0x46,0x3B,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x3C,0x42,0x42,0x7E,0x40,0x42,0x3C,0x00,0x00,0x00},
{0x00,0x00,0x00,0x0C,0x12,0x10,0x7C,0x10,0x10,0x10,0x10,0x10,0x7C,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x3E,0x44,0x44,0x38,0x40,0x3C,0x42,0x42,0x3C,0x00},
{0x00,0x00,0x00,0xC0,0x40,0x40,0x5C,0x62,0x42,0x42,0x42,0x42,0xE7,0x00,0x00,0x00},
{0x00,0x00,0x30,0x30,0x00,0x00,0x70,0x10,0x10,0x10,0x10,0x10,0x7C,0x00,0x00,0x00},
{0x00,0x00,0x0C,0x0C,0x00,0x00,0x1C,0x04,0x04,0x04,0x04,0x04,0x04,0x44,0x78,0x00},
{0x00,0x00,0x00,0xC0,0x40,0x40,0x4E,0x48,0x50,0x70,0x48,0x44,0xEE,0x00,0x00,0x00},
{0x00,0x00,0x10,0x70,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x10,0x7C,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0xFE,0x49,0x49,0x49,0x49,0x49,0xED,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0xDC,0x62,0x42,0x42,0x42,0x42,0xE7,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x3C,0x42,0x42,0x42,0x42,0x42,0x3C,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0xD8,0x64,0x42,0x42,0x42,0x64,0x58,0x40,0xE0,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x1A,0x26,0x42,0x42,0x42,0x26,0x1A,0x02,0x07,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0xEE,0x32,0x20,0x20,0x20,0x20,0xF8,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x3E,0x42,0x40,0x3C,0x02,0x42,0x7C,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x10,0x10,0x7C,0x10,0x10,0x10,0x10,0x12,0x0C,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0xC6,0x42,0x42,0x42,0x42,0x46,0x3B,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0xEE,0x44,0x44,0x28,0x28,0x10,0x10,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0xDB,0x89,0x4A,0x5A,0x54,0x24,0x24,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x76,0x24,0x18,0x18,0x18,0x24,0x6E,0x00,0x00,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0xE7,0x42,0x24,0x24,0x18,0x18,0x10,0x10,0x60,0x00},
{0x00,0x00,0x00,0x00,0x00,0x00,0x7E,0x44,0x08,0x10,0x10,0x22,0x7E,0x00,0x00,0x00},//z
{0x03,0x04,0x04,0x04,0x04,0x04,0x04,0x08,0x04,0x04,0x04,0x04,0x04,0x03,0x00,0x00},
{0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x00,0x80},
{0xC0,0x20,0x20,0x20,0x20,0x20,0x20,0x10,0x20,0x20,0x20,0x20,0x20,0xC0,0x00,0x00},
{0x5A,0x04,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00}
};
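// Draw a NUL-terminated string with the 8x16 font, wrapping to the next text row at the right edge
void HDMI_STRING(u16 x, u16 y, const char *str){
    while(*str != '\0'){
        if(x + 8 > W){   // the next 8-pixel-wide glyph would cross the right edge
            x = 0;
            y += 16;
        }
        HDMI_CHAR(x, y, (u8)*str);
        str++;
        x += 8;
    }
}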
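// Draw one 8x16 character with its top-left corner at (x, y): white glyph on a black cell
void HDMI_CHAR(u16 x, u16 y, u8 letter){
    Pixel *p = (Pixel *)FRAME_ADDR;
    p = p + y*W + x; // move to the requested position in the frame buffer
    u8 letter_index;
    u8 temp;
    letter_index = letter - ' '; // WORD_MAP starts at the space character
    for(int i = 0; i<16; i++){
        temp = WORD_MAP[letter_index][i]; // one byte per pixel row
        for(int j=0; j<8; j++){
            Pixel *pixel_tmp = p + i*W + j;
            if(temp & 0x80){ // bit 7 is the leftmost pixel still to be drawn
                pixel_tmp->r = 0xff;
                pixel_tmp->g = 0xff;
                pixel_tmp->b = 0xff;
            }
            else{
                pixel_tmp->r = 0x00;
                pixel_tmp->g = 0x00;
                pixel_tmp->b = 0x00;
            }
            temp = temp << 1;
        }
    }
}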
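// Draw a 56x56 grayscale image with its top-left corner at (x, y)
void HDMI_IMG(u16 x, u16 y, u8 *img){
    Pixel *p = (Pixel *)FRAME_ADDR;
    p = p + y*W + x; // move to the requested position in the frame buffer
    for(int i=0;i<56;i++){
        for(int j=0;j<56;j++){
            Pixel *pix_tmp = p + i*W + j;
            // the same gray value goes to all three color components
            pix_tmp->b = img[i*56+j];
            pix_tmp->g = img[i*56+j];
            pix_tmp->r = img[i*56+j];
        }
    }
}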
u8 img_0[3136]={
#include "img.h"
};
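int main(){
    //---------- VDMA config ----------
    // Frame buffer setup for the MM2S (read) channel
    *(volatile unsigned int*)(VDMA_ADDR + 0x00) = 0x01;       // MM2S_VDMACR: run the MM2S channel
    *(volatile unsigned int*)(VDMA_ADDR + 0x5c) = FRAME_ADDR; // MM2S_START_ADDRESS1: frame buffer address
    *(volatile unsigned int*)(VDMA_ADDR + 0x58) = W*3;        // MM2S_FRMDLY_STRIDE: bytes between the starts of two lines
    *(volatile unsigned int*)(VDMA_ADDR + 0x54) = W*3;        // MM2S_HSIZE: line width in bytes
    *(volatile unsigned int*)(VDMA_ADDR + 0x50) = H;          // MM2S_VSIZE: number of lines; written last, this starts the transfer
    //---------- data ----------
    // Fill the whole frame with a green background
    Pixel *p = (Pixel *)FRAME_ADDR;
    for(int i=0;i<H;i++){
        for(int j=0;j<W;j++){
            p->g = 0xff;
            p->b = 0x00;
            p->r = 0x00;
            p++;
        }
    }
    HDMI_STRING(200, 100, "LeNet predict result:");
    HDMI_IMG(200,200,img_0);
    //---------- Flush the data cache so the frame buffer in DDR matches what the CPU wrote ----------
    Xil_DCacheFlush();
    return 0;
}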
```
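`HDMI_CHAR` and `HDMI_IMG` write straight into the frame buffer without checking that the drawn rectangle stays inside the 640×480 frame. The address arithmetic they rely on, plus a possible bounds check, can be sketched on its own as below. The names `fb_size_bytes`, `fb_offset` and `fb_rect_fits` are hypothetical helpers and are not part of the project code. The 3 bytes per pixel follow from the `Pixel` struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FB_W   640   /* frame width in pixels */
#define FB_H   480   /* frame height in pixels */
#define FB_BPP 3     /* bytes per pixel (g, b, r) */

/* Total size of one frame in bytes. */
static size_t fb_size_bytes(void)
{
    return (size_t)FB_W * FB_H * FB_BPP;
}

/* Byte offset of pixel (x, y) from the start of the frame buffer. */
static size_t fb_offset(uint16_t x, uint16_t y)
{
    assert(x < FB_W && y < FB_H);
    return ((size_t)y * FB_W + x) * FB_BPP;
}

/* Returns 1 if a w x h rectangle at (x, y) lies completely inside the frame. */
static int fb_rect_fits(uint16_t x, uint16_t y, uint16_t w, uint16_t h)
{
    return (uint32_t)x + w <= FB_W && (uint32_t)y + h <= FB_H;
}
```

A drawing routine could call `fb_rect_fits(x, y, 8, 16)` for a character or `fb_rect_fits(x, y, 56, 56)` for the image and skip the draw when it returns 0.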