# resnet50_LoRA on Intel® NUC12SNKi72 (Arc A770M) and Intel® Data Center GPU Flex 140/170

https://ark.intel.com/content/www/tw/zh/ark/products/196170/intel-nuc-12-enthusiast-kit-nuc12snki72.html

TTT = total training time of the 100-epoch LoRA run, in seconds (the `TTT =` line in each log); lower is better.

| Arc A770M | no XPU    | XPU       | XPU + bf16 |
| --------- | --------- | --------- | ---------- |
| TTT (s)   | 757.72341 | 247.55658 | 228.99354  |

Note: the Flex 140 card contains two GPU chips of 8 Xe cores each, so the spec sheet lists 16 Xe cores, but this training run uses only one chip. Its throughput is therefore roughly 1/4 of the A770M's (8 vs. 32 Xe cores), so its TTT is about 4× longer.
| Flex 140 | no XPU | XPU       | XPU + bf16 |
| -------- | ------ | --------- | ---------- |
| TTT (s)  | N/A    | 1055.0727 | 952.8210   |

| Flex 170 | no XPU | XPU      | XPU + bf16 | XPU + bf16 + channels_last (CL) |
| -------- | ------ | -------- | ---------- | ------------------------------- |
| TTT (s)  | N/A    | 234.4946 | 210.3633   | 34.5559                         |
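
The Flex 140 note above says only one of the card's two 8-Xe-core chips is used. Before comparing numbers it can help to confirm how many XPU devices the runtime actually exposes; below is a minimal check, assuming the `torch.xpu` device API that intel_extension_for_pytorch registers (this helper is not part of the original scripts):

```python
# check_xpu_devices.py -- hypothetical helper for sanity-checking the XPU setup
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device type

print("XPU available:", torch.xpu.is_available())
print("device count :", torch.xpu.device_count())

# A Flex 140 is expected to show up as two devices (one per 8-Xe-core chip);
# the training scripts send everything to the default device "xpu" (= "xpu:0").
for i in range(torch.xpu.device_count()):
    print(f"xpu:{i} ->", torch.xpu.get_device_name(i))
```

To use the second chip instead, the models and tensors would be moved to `"xpu:1"`; using both chips at once would need an explicit data-parallel setup, which these scripts do not attempt.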
# Flex 140 Log: python lora_RESNET_basic_003_train_xpu.py
```
(ipex) eapet@spr-flex140:~/resnet_LoRA$ python lora_RESNET_basic_003_train_xpu.py
/home/eapet/ipex/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
240 張圖片
tensor([340, 947, 24, 947, 24, 340, 24, 947, 340, 340, 340, 24, 947, 24,
947, 24, 340, 340, 340, 24, 24, 947, 340, 24, 340, 24, 947, 24,
947, 24, 340, 947, 24, 947, 947, 24, 947, 24, 947, 340, 947, 947,
24, 340, 24, 24, 340, 340, 340, 24, 947, 947, 24, 340, 340, 947,
340, 947, 340, 24])
------ 原模型測試: ------
Accuracy: 0.358
wrong counts for the digit 0: 24
wrong counts for the digit 1: 64
wrong counts for the digit 2: 66
------ 外掛LoRA模型,協同訓練: ------
ep= 0 / 100 , loss= 7.510186433792114
ep= 5 / 100 , loss= 0.5501982718706131
ep= 10 / 100 , loss= 0.31392351537942886
ep= 15 / 100 , loss= 0.2685333229601383
ep= 20 / 100 , loss= 0.24566764384508133
ep= 25 / 100 , loss= 0.22704798355698586
ep= 30 / 100 , loss= 0.21157032251358032
ep= 35 / 100 , loss= 0.1939369961619377
ep= 40 / 100 , loss= 0.1808459535241127
ep= 45 / 100 , loss= 0.16677183099091053
ep= 50 / 100 , loss= 0.15783711895346642
ep= 55 / 100 , loss= 0.14647604897618294
ep= 60 / 100 , loss= 0.13769924268126488
ep= 65 / 100 , loss= 0.12535075191408396
ep= 70 / 100 , loss= 0.1170132402330637
ep= 75 / 100 , loss= 0.11048464849591255
ep= 80 / 100 , loss= 0.10236630216240883
ep= 85 / 100 , loss= 0.09762760624289513
ep= 90 / 100 , loss= 0.09118808852508664
ep= 95 / 100 , loss= 0.08459446299821138
ep= 100 / 100 , loss= 0.07853546738624573
TTT = 1055.0727772712708
saved to /home/eapet/resnet_LoRA/m_data/LORA_for_RESNET_ep50.ckpt
------ 原模型測試: ------
Accuracy: 1.0
wrong counts for the digit 0: 0
wrong counts for the digit 1: 0
wrong counts for the digit 2: 0
```
# Flex 140 Log: python lora_RESNET_basic_003_train_xpu_bf16.py
```
(ipex) eapet@spr-flex140:~/resnet_LoRA$ python lora_RESNET_basic_003_train_xpu_bf16.py
/home/eapet/ipex/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
240 張圖片
tensor([340, 947, 24, 947, 24, 340, 24, 947, 340, 340, 340, 24, 947, 24,
947, 24, 340, 340, 340, 24, 24, 947, 340, 24, 340, 24, 947, 24,
947, 24, 340, 947, 24, 947, 947, 24, 947, 24, 947, 340, 947, 947,
24, 340, 24, 24, 340, 340, 340, 24, 947, 947, 24, 340, 340, 947,
340, 947, 340, 24])
------ 原模型測試: ------
Accuracy: 0.358
wrong counts for the digit 0: 24
wrong counts for the digit 1: 64
wrong counts for the digit 2: 66
------ 外掛LoRA模型,協同訓練: ------
ep= 0 / 100 , loss= 7.478078365325928
ep= 5 / 100 , loss= 0.5512928068637848
ep= 10 / 100 , loss= 0.31658096611499786
ep= 15 / 100 , loss= 0.27006159722805023
ep= 20 / 100 , loss= 0.24738863483071327
ep= 25 / 100 , loss= 0.22922709956765175
ep= 30 / 100 , loss= 0.21194004639983177
ep= 35 / 100 , loss= 0.1970753725618124
ep= 40 / 100 , loss= 0.18202561140060425
ep= 45 / 100 , loss= 0.1694798320531845
ep= 50 / 100 , loss= 0.15941141359508038
ep= 55 / 100 , loss= 0.14813835360109806
ep= 60 / 100 , loss= 0.13937207870185375
ep= 65 / 100 , loss= 0.12657135911285877
ep= 70 / 100 , loss= 0.11715048179030418
ep= 75 / 100 , loss= 0.11164099723100662
ep= 80 / 100 , loss= 0.10284002870321274
ep= 85 / 100 , loss= 0.09817586932331324
ep= 90 / 100 , loss= 0.09128697728738189
ep= 95 / 100 , loss= 0.084747064858675
ep= 100 / 100 , loss= 0.07912518829107285
TTT = 952.8210139274597
saved to /home/eapet/resnet_LoRA/m_data/LORA_for_RESNET_ep50.ckpt
------ 原模型測試: ------
Accuracy: 1.0
wrong counts for the digit 0: 0
wrong counts for the digit 1: 0
wrong counts for the digit 2: 0
```
### Data Flex 140

### A770M



100 epochs take about 4 minutes 9 seconds on the A770M (XPU run), consistent with the TTT of ~247.6 s in the table above.

### XPU
```python
# lora_RESNET_basic_003_train_xpu.py
import numpy as np
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import Dataset, DataLoader
from torchvision.models import resnet50, ResNet50_Weights
import time

data_path = '/home/eapet/resnet_LoRA/m_data/train/'
base_path = '/home/eapet/resnet_LoRA/m_data/'

# Make torch deterministic: every run starts from the same weight initialization
_ = torch.manual_seed(0)
#-------------------------------------
# Load the pretrained ResNet-50 model
# Step 1: Initialize model with the best available weights
weights = ResNet50_Weights.IMAGENET1K_V1
resnet_model = resnet50(weights=weights)
# Transfer learning: freeze the backbone (no gradients, weights are not updated)
for param in resnet_model.parameters():
    param.requires_grad = False
#-------------------------------
resnet_model.eval()
#------------------------------------------
# Prepare the training data: convert the images to tensors
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])
data_set = ImageFolder(data_path, transform=transform)
length = len(data_set)
print("\n")
print(length, "張圖片\n")          # "<N> images"
bz = 60
test_loader = DataLoader(data_set, batch_size=bz, shuffle=True)
#--------------------------------------
# Map the ImageFolder class indices 0/1/2 to the ImageNet class ids
# (340, 24, 947) predicted by the pretrained ResNet-50 head
def process_lx(labels):
    lx = labels.clone()
    for i in range(bz):
        if(labels[i]==0):
            lx[i]=340
        elif(labels[i]==1):
            lx[i]=24
        elif(labels[i]==2):
            lx[i]=947
    return lx

for idx, (images, la) in enumerate(test_loader):
    break
labels = process_lx(la)
print(labels)
#-----------------------------------
def test():
    correct = 0
    total = 0
    wrong_counts = [0 for i in range(3)]
    with torch.no_grad():
        for idx, (images, la) in enumerate(test_loader):
            labels = process_lx(la)
            prediction = resnet_model(images)
            for idx, zv in enumerate(prediction):
                if torch.argmax(zv) == labels[idx]:
                    correct += 1
                else:
                    #print(idx)
                    wrong_counts[la[idx]] += 1
                total += 1
    #print(correct)
    print('\n', f'Accuracy: {round(correct/total, 3)}')
    for i in range(len(wrong_counts)):
        print(f'wrong counts for the digit {i}: {wrong_counts[i]}')
#------------------------------------------
print('\n------ 原模型測試: ------')        # "Test the original model"
test()
#==========================================
# LoRA side branch: a rank-10 low-rank adapter (A @ B) applied to the
# flattened input image; its output is added to the frozen ResNet-50 logits
class Lora(nn.Module):
    def __init__(self, m, n, rank=10):
        super().__init__()
        self.m = m
        self.A = nn.Parameter(torch.randn(m, rank))
        self.B = nn.Parameter(torch.zeros(rank, n))
    def forward(self, inputs):
        inputs = inputs.view(-1, self.m)
        return torch.mm(torch.mm(inputs, self.A), self.B)

lora = Lora(224 * 224 * 3, 1000)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(lora.parameters(), lr=1e-4)
#========================================================
print('\n------ 外掛LoRA模型,協同訓練: ------')   # "Attach the LoRA adapter and train it"
import intel_extension_for_pytorch as ipex
loss_fn = loss_fn.to("xpu")
lora = lora.to("xpu")
lora, optimizer = ipex.optimize(lora, optimizer=optimizer)
resnet_model = resnet_model.to("xpu")
resnet_model = ipex.optimize(resnet_model)

base = 0
epochs = 100
begin = time.time()
for ep in range(epochs+1):
    total_loss = 0
    for idx, (images, la) in enumerate(test_loader):
        labels = process_lx(la)
        images = images.to("xpu")
        labels = labels.to("xpu")
        pred = resnet_model(images) + lora(images)
        loss = loss_fn(pred, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        total_loss += loss.item() * bz
    if((base+ep)%5 == 0):
        loss_np = total_loss / 120
        print('ep=', base+ep, '/', base+epochs, ', loss=', loss_np)
end = time.time()
print("TTT =", end-begin)
#------ Save the LoRA weights to *.ckpt ---------------------------
FILE = base_path + 'LORA_for_RESNET_ep50.ckpt'
torch.save(lora.state_dict(), FILE)
print('\nsaved to ' + FILE)
#-------------------------------------
def test22():
    correct = 0
    total = 0
    wrong_counts = [0 for i in range(3)]
    with torch.no_grad():
        for idx, (images, la) in enumerate(test_loader):
            labels = process_lx(la)
            images = images.to("xpu")
            labels = labels.to("xpu")
            prediction = resnet_model(images) + lora(images)
            for idx, zv in enumerate(prediction):
                if torch.argmax(zv) == labels[idx]:
                    correct += 1
                else:
                    #print(idx)
                    wrong_counts[la[idx]] += 1
                total += 1
    #print(correct)
    print('\n', f'Accuracy: {round(correct/total, 3)}')
    for i in range(len(wrong_counts)):
        print(f'wrong counts for the digit {i}: {wrong_counts[i]}')
#------------------------------------------
print('\n------ 原模型測試: ------')        # same banner, but now with the LoRA adapter attached
test22()
#END
```
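
TTT in the script above is plain wall-clock time around the training loop. Because `loss.item()` already forces the host to wait for the XPU on every step, the reported numbers should be close to the true device time, but when timing XPU code in general it is safer to synchronize explicitly before reading the clock. A minimal timing sketch under that assumption, using the `torch.xpu.synchronize()` call provided by intel_extension_for_pytorch and a dummy workload in place of the real training step:

```python
# xpu_timing_sketch.py -- illustrative only, not one of the benchmark scripts
import time
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device type

x = torch.randn(4096, 4096, device="xpu")
w = torch.randn(4096, 4096, device="xpu")

torch.xpu.synchronize()            # finish warm-up / setup work before starting the clock
begin = time.time()
for _ in range(100):
    y = torch.mm(x, w)             # stand-in for the real training step
torch.xpu.synchronize()            # wait for all queued kernels before stopping the clock
end = time.time()
print("TTT =", end - begin)
```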
### XPU with bf16
```python
# lora_RESNET_basic_003_train_xpu_bf16.py
import numpy as np
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import Dataset, DataLoader
from torchvision.models import resnet50, ResNet50_Weights
import time

data_path = '/home/eapet/resnet_LoRA/m_data/train/'
base_path = '/home/eapet/resnet_LoRA/m_data/'

# Make torch deterministic: every run starts from the same weight initialization
_ = torch.manual_seed(0)
#-------------------------------------
# Load the pretrained ResNet-50 model
# Step 1: Initialize model with the best available weights
weights = ResNet50_Weights.IMAGENET1K_V1
resnet_model = resnet50(weights=weights)
# Transfer learning: freeze the backbone (no gradients, weights are not updated)
for param in resnet_model.parameters():
    param.requires_grad = False
#-------------------------------
resnet_model.eval()
#------------------------------------------
# Prepare the training data: convert the images to tensors
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])
data_set = ImageFolder(data_path, transform=transform)
length = len(data_set)
print("\n")
print(length, "張圖片\n")          # "<N> images"
bz = 60
test_loader = DataLoader(data_set, batch_size=bz, shuffle=True)
#--------------------------------------
# Map the ImageFolder class indices 0/1/2 to the ImageNet class ids
# (340, 24, 947) predicted by the pretrained ResNet-50 head
def process_lx(labels):
    lx = labels.clone()
    for i in range(bz):
        if(labels[i]==0):
            lx[i]=340
        elif(labels[i]==1):
            lx[i]=24
        elif(labels[i]==2):
            lx[i]=947
    return lx

for idx, (images, la) in enumerate(test_loader):
    break
labels = process_lx(la)
print(labels)
#-----------------------------------
def test():
    correct = 0
    total = 0
    wrong_counts = [0 for i in range(3)]
    with torch.no_grad():
        for idx, (images, la) in enumerate(test_loader):
            labels = process_lx(la)
            prediction = resnet_model(images)
            for idx, zv in enumerate(prediction):
                if torch.argmax(zv) == labels[idx]:
                    correct += 1
                else:
                    #print(idx)
                    wrong_counts[la[idx]] += 1
                total += 1
    #print(correct)
    print('\n', f'Accuracy: {round(correct/total, 3)}')
    for i in range(len(wrong_counts)):
        print(f'wrong counts for the digit {i}: {wrong_counts[i]}')
#------------------------------------------
print('\n------ 原模型測試: ------')        # "Test the original model"
test()
#==========================================
# LoRA side branch: a rank-10 low-rank adapter (A @ B) applied to the
# flattened input image; its output is added to the frozen ResNet-50 logits
class Lora(nn.Module):
    def __init__(self, m, n, rank=10):
        super().__init__()
        self.m = m
        self.A = nn.Parameter(torch.randn(m, rank))
        self.B = nn.Parameter(torch.zeros(rank, n))
    def forward(self, inputs):
        inputs = inputs.view(-1, self.m)
        return torch.mm(torch.mm(inputs, self.A), self.B)

lora = Lora(224 * 224 * 3, 1000)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(lora.parameters(), lr=1e-4)
#========================================================
print('\n------ 外掛LoRA模型,協同訓練: ------')   # "Attach the LoRA adapter and train it"
import intel_extension_for_pytorch as ipex
loss_fn = loss_fn.to("xpu")
lora = lora.to("xpu")
lora, optimizer = ipex.optimize(lora, optimizer=optimizer, dtype=torch.bfloat16)
resnet_model = resnet_model.to("xpu")
resnet_model = ipex.optimize(resnet_model, dtype=torch.bfloat16)

base = 0
epochs = 100
begin = time.time()
for ep in range(epochs+1):
    total_loss = 0
    for idx, (images, la) in enumerate(test_loader):
        labels = process_lx(la)
        images = images.to("xpu")
        labels = labels.to("xpu")
        # bf16 autocast around the forward pass of both models
        with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
            pred = resnet_model(images) + lora(images)
        loss = loss_fn(pred, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        total_loss += loss.item() * bz
    if((base+ep)%5 == 0):
        loss_np = total_loss / 120
        print('ep=', base+ep, '/', base+epochs, ', loss=', loss_np)
end = time.time()
print("TTT =", end-begin)
#------ Save the LoRA weights to *.ckpt ---------------------------
FILE = base_path + 'LORA_for_RESNET_ep50.ckpt'
torch.save(lora.state_dict(), FILE)
print('\nsaved to ' + FILE)
#-------------------------------------
def test22():
    correct = 0
    total = 0
    wrong_counts = [0 for i in range(3)]
    with torch.no_grad():
        for idx, (images, la) in enumerate(test_loader):
            labels = process_lx(la)
            images = images.to("xpu")
            labels = labels.to("xpu")
            with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
                prediction = resnet_model(images) + lora(images)
            for idx, zv in enumerate(prediction):
                if torch.argmax(zv) == labels[idx]:
                    correct += 1
                else:
                    #print(idx)
                    wrong_counts[la[idx]] += 1
                total += 1
    #print(correct)
    print('\n', f'Accuracy: {round(correct/total, 3)}')
    for i in range(len(wrong_counts)):
        print(f'wrong counts for the digit {i}: {wrong_counts[i]}')
#------------------------------------------
print('\n------ 原模型測試: ------')        # same banner, but now with the LoRA adapter attached
test22()
#END
```
### channels_last (CL) diff against the XPU + bf16 script
```diff
(ipex) eapet@eapet-6414U-DF170:~/resnet_LoRA$ !diff
diff -u lora_RESNET_basic_003_train_xpu_bf16.py lora_RESNET_basic_003_train_xpu_bf16_cl.py
--- lora_RESNET_basic_003_train_xpu_bf16.py     2023-11-02 15:18:16.700000000 +0800
+++ lora_RESNET_basic_003_train_xpu_bf16_cl.py  2023-11-17 15:14:21.692000000 +0800
@@ -104,7 +104,8 @@
         self.B = nn.Parameter(torch.zeros(rank, n))
     def forward(self, inputs):
-        inputs = inputs.view(-1, self.m)
+        #inputs = inputs.view(-1, self.m)
+        inputs = torch.reshape(inputs, [60,3*224*224])
         return torch.mm(torch.mm(inputs, self.A), self.B)
 lora = Lora(224 * 224 * 3, 1000)
@@ -115,6 +116,8 @@
 #========================================================
 print('\n------ 外掛LoRA模型,協同訓練: ------')
 import intel_extension_for_pytorch as ipex
+resnet_model = resnet_model.to(memory_format=torch.channels_last)
+lora = lora.to(memory_format=torch.channels_last)
 loss_fn = loss_fn.to("xpu")
 lora = lora.to("xpu")
 lora, optimizer = ipex.optimize(lora, optimizer=optimizer, dtype=torch.bfloat16)
@@ -131,6 +134,7 @@
         labels = process_lx(la)
         images = images.to("xpu")
         labels = labels.to("xpu")
+        images = images.to(memory_format=torch.channels_last)
         with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
             pred = resnet_model(images) + lora(images)
```
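
Two things change in this diff. First, the models and the input batches are converted to the channels_last (NHWC) memory format, the layout the XPU convolution kernels prefer, which is presumably where most of the "XPU + bf16 + CL" speedup in the Flex 170 table comes from. Second, `inputs.view(-1, self.m)` is replaced by `torch.reshape`, because `.view()` requires stride-compatible memory and fails on a channels_last image tensor, while `reshape` copies when it has to. A condensed sketch of the same pattern, with a dummy batch standing in for the real data loader:

```python
# channels_last + bf16 on XPU -- condensed sketch, not a full training script
import torch
import intel_extension_for_pytorch as ipex
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1).eval()
model = model.to(memory_format=torch.channels_last)    # NHWC layout for the conv weights
model = model.to("xpu")
model = ipex.optimize(model, dtype=torch.bfloat16)

images = torch.randn(60, 3, 224, 224)                  # one dummy batch of the same shape
images = images.to("xpu").to(memory_format=torch.channels_last)

with torch.no_grad(), torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    logits = model(images)

# .view() would fail here because the channels_last tensor is not contiguous
# in the default layout, so reshape (which may copy) is used instead:
flat = torch.reshape(images, [60, 3 * 224 * 224])
print(logits.shape, flat.shape)
```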
# Flex 170 Log: python lora_RESNET_basic_003_train_xpu_bf16_cl.py
```bash
(ipex) eapet@eapet-6414U-DF170:~/resnet_LoRA$ python lora_RESNET_basic_003_train_xpu_bf16_cl.py
/home/eapet/ipex/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
240 張圖片
tensor([340, 947, 24, 947, 24, 340, 24, 947, 340, 340, 340, 24, 947, 24,
947, 24, 340, 340, 340, 24, 24, 947, 340, 24, 340, 24, 947, 24,
947, 24, 340, 947, 24, 947, 947, 24, 947, 24, 947, 340, 947, 947,
24, 340, 24, 24, 340, 340, 340, 24, 947, 947, 24, 340, 340, 947,
340, 947, 340, 24])
------ 原模型測試: ------
Accuracy: 0.358
wrong counts for the digit 0: 24
wrong counts for the digit 1: 64
wrong counts for the digit 2: 66
------ 外掛LoRA模型,協同訓練: ------
ep= 0 / 100 , loss= 7.477551579475403
ep= 5 / 100 , loss= 0.5515237748622894
ep= 10 / 100 , loss= 0.3148090988397598
ep= 15 / 100 , loss= 0.27021610736846924
ep= 20 / 100 , loss= 0.2469922173768282
ep= 25 / 100 , loss= 0.22928182035684586
ep= 30 / 100 , loss= 0.21190199255943298
ep= 35 / 100 , loss= 0.19545279070734978
ep= 40 / 100 , loss= 0.1801932454109192
ep= 45 / 100 , loss= 0.16701150499284267
ep= 50 / 100 , loss= 0.15749669820070267
ep= 55 / 100 , loss= 0.1474070642143488
ep= 60 / 100 , loss= 0.1395247932523489
ep= 65 / 100 , loss= 0.1270718313753605
ep= 70 / 100 , loss= 0.11698541603982449
ep= 75 / 100 , loss= 0.111823420971632
ep= 80 / 100 , loss= 0.10338117741048336
ep= 85 / 100 , loss= 0.09828238002955914
ep= 90 / 100 , loss= 0.0912911631166935
ep= 95 / 100 , loss= 0.08557180035859346
ep= 100 / 100 , loss= 0.07931549288332462
TTT = 34.555999517440796
saved to /home/eapet/resnet_LoRA/m_data/LORA_for_RESNET_ep50.ckpt
------ 原模型測試: ------
Accuracy: 1.0
wrong counts for the digit 0: 0
wrong counts for the digit 1: 0
wrong counts for the digit 2: 0
```