# resnet50_LORA on Intel® NUC12SNKi72 and Data Flex 140

![](https://hackmd.io/_uploads/SJW3z50fp.png)

https://ark.intel.com/content/www/tw/zh/ark/products/196170/intel-nuc-12-enthusiast-kit-nuc12snki72.html

![image.png](https://hackmd.io/_uploads/HJfM1txX6.png)

TTT is the wall-clock time, in seconds, of the 100-epoch LoRA training loop described below. CL means the channels-last memory format (see the diff further below).

| Arc A770M | no XPU    | XPU       | XPU + bf16 |
| --------- | --------- | --------- | ---------- |
| TTT (s)   | 757.72341 | 247.55658 | 228.99354  |

![image.png](https://hackmd.io/_uploads/Skz11KxQT.png)

# Note

The Flex 140 package contains two GPU chips with 8 Xe cores each, which is why the screenshot reports 16 Xe cores. Training only uses one of the two chips, so it runs on 8 Xe cores versus the A770M's 32 (a 1/4 ratio), and the TTT is therefore roughly 4× that of the A770M.

| Flex 140 | no XPU | XPU       | XPU + bf16 |
| -------- | ------ | --------- | ---------- |
| TTT (s)  | N/A    | 1055.0727 | 952.8210   |

![Data_Flex_170-1200x900.jpg](https://hackmd.io/_uploads/Hy9N7-zma.jpg)

| Flex 170 | no XPU | XPU      | XPU + bf16 | XPU + bf16 + CL |
| -------- | ------ | -------- | ---------- | --------------- |
| TTT (s)  | N/A    | 234.4946 | 210.3633   | 34.5559         |
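Since only one of the Flex 140's two chips is used, it can help to confirm how many XPU devices the runtime actually exposes before training. The following is a minimal sketch, assuming the `torch.xpu` device API that Intel® Extension for PyTorch registers (it mirrors `torch.cuda`); the reported names and counts depend on the system.

```python
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device type

# A Flex 140 card is expected to show up as two XPU devices (one per chip).
print("XPU devices:", torch.xpu.device_count())
for i in range(torch.xpu.device_count()):
    print(f"  xpu:{i} ->", torch.xpu.get_device_name(i))

# The scripts below use the default device ("xpu", i.e. "xpu:0").
# The second chip could in principle be selected with .to("xpu:1").
```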
# Flex 140 Log: python lora_RESNET_basic_003_train_xpu.py

```
(ipex) eapet@spr-flex140:~/resnet_LoRA$ python lora_RESNET_basic_003_train_xpu.py
/home/eapet/ipex/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(


240 張圖片

tensor([340, 947, 24, 947, 24, 340, 24, 947, 340, 340, 340, 24, 947, 24, 947, 24, 340, 340, 340, 24, 24, 947, 340, 24, 340, 24, 947, 24, 947, 24, 340, 947, 24, 947, 947, 24, 947, 24, 947, 340, 947, 947, 24, 340, 24, 24, 340, 340, 340, 24, 947, 947, 24, 340, 340, 947, 340, 947, 340, 24])

------ 原模型測試: ------

 Accuracy: 0.358
wrong counts for the digit 0: 24
wrong counts for the digit 1: 64
wrong counts for the digit 2: 66

------ 外掛LoRA模型,協同訓練: ------
ep= 0 / 100 , loss= 7.510186433792114
ep= 5 / 100 , loss= 0.5501982718706131
ep= 10 / 100 , loss= 0.31392351537942886
ep= 15 / 100 , loss= 0.2685333229601383
ep= 20 / 100 , loss= 0.24566764384508133
ep= 25 / 100 , loss= 0.22704798355698586
ep= 30 / 100 , loss= 0.21157032251358032
ep= 35 / 100 , loss= 0.1939369961619377
ep= 40 / 100 , loss= 0.1808459535241127
ep= 45 / 100 , loss= 0.16677183099091053
ep= 50 / 100 , loss= 0.15783711895346642
ep= 55 / 100 , loss= 0.14647604897618294
ep= 60 / 100 , loss= 0.13769924268126488
ep= 65 / 100 , loss= 0.12535075191408396
ep= 70 / 100 , loss= 0.1170132402330637
ep= 75 / 100 , loss= 0.11048464849591255
ep= 80 / 100 , loss= 0.10236630216240883
ep= 85 / 100 , loss= 0.09762760624289513
ep= 90 / 100 , loss= 0.09118808852508664
ep= 95 / 100 , loss= 0.08459446299821138
ep= 100 / 100 , loss= 0.07853546738624573
TTT = 1055.0727772712708

saved to /home/eapet/resnet_LoRA/m_data/LORA_for_RESNET_ep50.ckpt

------ 原模型測試: ------

 Accuracy: 1.0
wrong counts for the digit 0: 0
wrong counts for the digit 1: 0
wrong counts for the digit 2: 0
```

# Flex 140 Log: python lora_RESNET_basic_003_train_xpu_bf16.py

```
(ipex) eapet@spr-flex140:~/resnet_LoRA$ python lora_RESNET_basic_003_train_xpu_bf16.py
/home/eapet/ipex/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(


240 張圖片

tensor([340, 947, 24, 947, 24, 340, 24, 947, 340, 340, 340, 24, 947, 24, 947, 24, 340, 340, 340, 24, 24, 947, 340, 24, 340, 24, 947, 24, 947, 24, 340, 947, 24, 947, 947, 24, 947, 24, 947, 340, 947, 947, 24, 340, 24, 24, 340, 340, 340, 24, 947, 947, 24, 340, 340, 947, 340, 947, 340, 24])

------ 原模型測試: ------

 Accuracy: 0.358
wrong counts for the digit 0: 24
wrong counts for the digit 1: 64
wrong counts for the digit 2: 66

------ 外掛LoRA模型,協同訓練: ------
ep= 0 / 100 , loss= 7.478078365325928
ep= 5 / 100 , loss= 0.5512928068637848
ep= 10 / 100 , loss= 0.31658096611499786
ep= 15 / 100 , loss= 0.27006159722805023
ep= 20 / 100 , loss= 0.24738863483071327
ep= 25 / 100 , loss= 0.22922709956765175
ep= 30 / 100 , loss= 0.21194004639983177
ep= 35 / 100 , loss= 0.1970753725618124
ep= 40 / 100 , loss= 0.18202561140060425
ep= 45 / 100 , loss= 0.1694798320531845
ep= 50 / 100 , loss= 0.15941141359508038
ep= 55 / 100 , loss= 0.14813835360109806
ep= 60 / 100 , loss= 0.13937207870185375
ep= 65 / 100 , loss= 0.12657135911285877
ep= 70 / 100 , loss= 0.11715048179030418
ep= 75 / 100 , loss= 0.11164099723100662
ep= 80 / 100 , loss= 0.10284002870321274
ep= 85 / 100 , loss= 0.09817586932331324
ep= 90 / 100 , loss= 0.09128697728738189
ep= 95 / 100 , loss= 0.084747064858675
ep= 100 / 100 , loss= 0.07912518829107285
TTT = 952.8210139274597

saved to /home/eapet/resnet_LoRA/m_data/LORA_for_RESNET_ep50.ckpt

------ 原模型測試: ------

 Accuracy: 1.0
wrong counts for the digit 0: 0
wrong counts for the digit 1: 0
wrong counts for the digit 2: 0
```

### Data Flex 140

![image.png](https://hackmd.io/_uploads/rJ2ygYg7a.png)

### A770M

![](https://hackmd.io/_uploads/r156gqAza.png)

![](https://hackmd.io/_uploads/Skjy-cRMa.png)

![](https://hackmd.io/_uploads/r1Kz-cRMT.png)

100 epochs take 4 min 09 s.

![](https://hackmd.io/_uploads/ryV7fcRGT.png)

### XPU

```=python
# lora_RESNET_basic_003_train_xpu.py
import numpy as np
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import Dataset, DataLoader
from torchvision.models import resnet50, ResNet50_Weights
import time

data_path = '/home/eapet/resnet_LoRA/m_data/train/'
base_path = '/home/eapet/resnet_LoRA/m_data/'

# Make torch deterministic: the weight & bias initialization is the same on every run
_ = torch.manual_seed(0)

#-------------------------------------
# Load the pretrained ResNet50 model
# Step 1: Initialize model with the best available weights
weights = ResNet50_Weights.IMAGENET1K_V1
resnet_model = resnet50(weights=weights)

# Transfer learning: freeze the backbone (no gradients, weights are never updated)
for param in resnet_model.parameters():
    param.requires_grad = False
#-------------------------------
resnet_model.eval()

#------------------------------------------
# Prepare the training data: convert the images to tensors
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])

data_set = ImageFolder(data_path, transform=transform)
length = len(data_set)
print("\n")
print(length, "張圖片\n")   # "<N> images"

bz = 60
test_loader = DataLoader(data_set, batch_size=bz, shuffle=True)

#--------------------------------------
def process_lx(labels):
    # Map the three ImageFolder labels (0/1/2) to their ImageNet-1k class indices
    lx = labels.clone()
    for i in range(bz):
        if(labels[i]==0):
            lx[i]=340
        elif(labels[i]==1):
            lx[i]=24
        elif(labels[i]==2):
            lx[i]=947
    return lx

for idx, (images, la) in enumerate(test_loader):
    break
labels = process_lx(la)
print(labels)

#-----------------------------------
def test():
    correct = 0
    total = 0
    wrong_counts = [0 for i in range(3)]
    with torch.no_grad():
        for idx, (images, la) in enumerate(test_loader):
            labels = process_lx(la)
            prediction = resnet_model(images)
            for idx, zv in enumerate(prediction):
                if torch.argmax(zv) == labels[idx]:
                    correct +=1
                else:
                    #print(idx)
                    wrong_counts[la[idx]] +=1
                total +=1
    #print(correct)
    print('\n', f'Accuracy: {round(correct/total, 3)}')
    for i in range(len(wrong_counts)):
        # "digit" is a leftover wording; these are the three image classes
        print(f'wrong counts for the digit {i}: {wrong_counts[i]}')

#------------------------------------------
print('\n------ 原模型測試: ------')   # "Test the original (frozen) model"
test()

#==========================================
class Lora(nn.Module):
    def __init__(self, m, n, rank=10):
        super().__init__()
        self.m = m
        self.A = nn.Parameter(torch.randn(m, rank))
        self.B = nn.Parameter(torch.zeros(rank, n))

    def forward(self, inputs):
        inputs = inputs.view(-1, self.m)
        return torch.mm(torch.mm(inputs, self.A), self.B)

lora = Lora(224 * 224 * 3, 1000)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(lora.parameters(), lr=1e-4)

#========================================================
print('\n------ 外掛LoRA模型,協同訓練: ------')   # "Attach the LoRA module and train jointly"
import intel_extension_for_pytorch as ipex
loss_fn = loss_fn.to("xpu")
lora = lora.to("xpu")
lora, optimizer = ipex.optimize(lora, optimizer=optimizer)
resnet_model = resnet_model.to("xpu")
resnet_model = ipex.optimize(resnet_model)

base = 0
epochs = 100

begin = time.time()
for ep in range(epochs+1):
    total_loss = 0
    for idx, (images, la) in enumerate(test_loader):
        labels = process_lx(la)
        images = images.to("xpu")
        labels = labels.to("xpu")
        pred = resnet_model(images) + lora(images)
        loss = loss_fn(pred, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        total_loss += loss.item() * bz
    if((base+ep)%5 == 0):
        loss_np = total_loss / 120
        print('ep=', base+ep,'/',base+epochs, ', loss=', loss_np)
end = time.time()
print("TTT =", end-begin)

#------ Saved to *.CKPT ---------------------------
FILE = base_path + 'LORA_for_RESNET_ep50.ckpt'
torch.save(lora.state_dict(), FILE)
print('\nsaved to ' + FILE)

#-------------------------------------
def test22():
    correct = 0
    total = 0
    wrong_counts = [0 for i in range(3)]
    with torch.no_grad():
        for idx, (images, la) in enumerate(test_loader):
            labels = process_lx(la)
            images = images.to("xpu")
            labels = labels.to("xpu")
            prediction = resnet_model(images) + lora(images)
            for idx, zv in enumerate(prediction):
                if torch.argmax(zv) == labels[idx]:
                    correct +=1
                else:
                    #print(idx)
                    wrong_counts[la[idx]] +=1
                total +=1
    #print(correct)
    print('\n', f'Accuracy: {round(correct/total, 3)}')
    for i in range(len(wrong_counts)):
        print(f'wrong counts for the digit {i}: {wrong_counts[i]}')

#------------------------------------------
print('\n------ 原模型測試: ------')   # same banner as above, but test22() now includes the LoRA adapter
test22()
#END
```
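The point of the LoRA attachment above is that only the two low-rank matrices `A` and `B` receive gradients while the ResNet50 backbone stays frozen. A quick way to see the size gap is to count parameters; the following is a minimal standalone sketch (my own illustration, reusing only the shapes defined in the script above).

```python
import torch
from torchvision.models import resnet50

# Same shapes as Lora(224*224*3, 1000, rank=10) in the script above:
# A is (150528, 10) and B is (10, 1000).
lora_params = (224 * 224 * 3) * 10 + 10 * 1000        # 1,515,280 trainable values

backbone = resnet50(weights=None)                      # architecture only, no download
frozen_params = sum(p.numel() for p in backbone.parameters())   # roughly 25.6 M

print(f"LoRA trainable params : {lora_params:,}")
print(f"ResNet50 frozen params: {frozen_params:,}")
```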
### XPU with bf16

```=python
# lora_RESNET_basic_003_train_xpu_bf16.py
import numpy as np
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import Dataset, DataLoader
from torchvision.models import resnet50, ResNet50_Weights
import time

data_path = '/home/eapet/resnet_LoRA/m_data/train/'
base_path = '/home/eapet/resnet_LoRA/m_data/'

# Make torch deterministic: the weight & bias initialization is the same on every run
_ = torch.manual_seed(0)

#-------------------------------------
# Load the pretrained ResNet50 model
# Step 1: Initialize model with the best available weights
weights = ResNet50_Weights.IMAGENET1K_V1
resnet_model = resnet50(weights=weights)

# Transfer learning: freeze the backbone (no gradients, weights are never updated)
for param in resnet_model.parameters():
    param.requires_grad = False
#-------------------------------
resnet_model.eval()

#------------------------------------------
# Prepare the training data: convert the images to tensors
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])

data_set = ImageFolder(data_path, transform=transform)
length = len(data_set)
print("\n")
print(length, "張圖片\n")   # "<N> images"

bz = 60
test_loader = DataLoader(data_set, batch_size=bz, shuffle=True)

#--------------------------------------
def process_lx(labels):
    # Map the three ImageFolder labels (0/1/2) to their ImageNet-1k class indices
    lx = labels.clone()
    for i in range(bz):
        if(labels[i]==0):
            lx[i]=340
        elif(labels[i]==1):
            lx[i]=24
        elif(labels[i]==2):
            lx[i]=947
    return lx

for idx, (images, la) in enumerate(test_loader):
    break
labels = process_lx(la)
print(labels)

#-----------------------------------
def test():
    correct = 0
    total = 0
    wrong_counts = [0 for i in range(3)]
    with torch.no_grad():
        for idx, (images, la) in enumerate(test_loader):
            labels = process_lx(la)
            prediction = resnet_model(images)
            for idx, zv in enumerate(prediction):
                if torch.argmax(zv) == labels[idx]:
                    correct +=1
                else:
                    #print(idx)
                    wrong_counts[la[idx]] +=1
                total +=1
    #print(correct)
    print('\n', f'Accuracy: {round(correct/total, 3)}')
    for i in range(len(wrong_counts)):
        print(f'wrong counts for the digit {i}: {wrong_counts[i]}')

#------------------------------------------
print('\n------ 原模型測試: ------')   # "Test the original (frozen) model"
test()

#==========================================
class Lora(nn.Module):
    def __init__(self, m, n, rank=10):
        super().__init__()
        self.m = m
        self.A = nn.Parameter(torch.randn(m, rank))
        self.B = nn.Parameter(torch.zeros(rank, n))

    def forward(self, inputs):
        inputs = inputs.view(-1, self.m)
        return torch.mm(torch.mm(inputs, self.A), self.B)

lora = Lora(224 * 224 * 3, 1000)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(lora.parameters(), lr=1e-4)

#========================================================
print('\n------ 外掛LoRA模型,協同訓練: ------')   # "Attach the LoRA module and train jointly"
import intel_extension_for_pytorch as ipex
loss_fn = loss_fn.to("xpu")
lora = lora.to("xpu")
lora, optimizer = ipex.optimize(lora, optimizer=optimizer, dtype=torch.bfloat16)
resnet_model = resnet_model.to("xpu")
resnet_model = ipex.optimize(resnet_model, dtype=torch.bfloat16)

base = 0
epochs = 100

begin = time.time()
for ep in range(epochs+1):
    total_loss = 0
    for idx, (images, la) in enumerate(test_loader):
        labels = process_lx(la)
        images = images.to("xpu")
        labels = labels.to("xpu")
        with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
            pred = resnet_model(images) + lora(images)
            loss = loss_fn(pred, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        total_loss += loss.item() * bz
    if((base+ep)%5 == 0):
        loss_np = total_loss / 120
        print('ep=', base+ep,'/',base+epochs, ', loss=', loss_np)
end = time.time()
print("TTT =", end-begin)

#------ Saved to *.CKPT ---------------------------
FILE = base_path + 'LORA_for_RESNET_ep50.ckpt'
torch.save(lora.state_dict(), FILE)
print('\nsaved to ' + FILE)

#-------------------------------------
def test22():
    correct = 0
    total = 0
    wrong_counts = [0 for i in range(3)]
    with torch.no_grad():
        for idx, (images, la) in enumerate(test_loader):
            labels = process_lx(la)
            images = images.to("xpu")
            labels = labels.to("xpu")
            with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
                prediction = resnet_model(images) + lora(images)
            for idx, zv in enumerate(prediction):
                if torch.argmax(zv) == labels[idx]:
                    correct +=1
                else:
                    #print(idx)
                    wrong_counts[la[idx]] +=1
                total +=1
    #print(correct)
    print('\n', f'Accuracy: {round(correct/total, 3)}')
    for i in range(len(wrong_counts)):
        print(f'wrong counts for the digit {i}: {wrong_counts[i]}')

#------------------------------------------
print('\n------ 原模型測試: ------')   # same banner as above, but test22() now includes the LoRA adapter
test22()
#END
```
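For readers who do not want to diff the two listings by eye, the bf16 variant changes only a handful of lines relative to the plain XPU script. The condensed sketch below shows that delta; the surrounding loop structure is unchanged, and the exact extent of the autocast block reflects my reading of the listing above.

```python
# fp32 XPU version:
#   lora, optimizer = ipex.optimize(lora, optimizer=optimizer)
#   resnet_model    = ipex.optimize(resnet_model)

# bf16 version: let IPEX prepare the weights and optimizer for bfloat16 ...
lora, optimizer = ipex.optimize(lora, optimizer=optimizer, dtype=torch.bfloat16)
resnet_model = ipex.optimize(resnet_model, dtype=torch.bfloat16)

# ... and run the forward pass and loss under the XPU autocast context;
# backward and the optimizer step stay outside, as in the fp32 script.
with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
    pred = resnet_model(images) + lora(images)
    loss = loss_fn(pred, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```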
### Channels-last diff to XPU with bf16

The channels-last (CL) variant differs from the bf16 script only in the lines below:

```
(ipex) eapet@eapet-6414U-DF170:~/resnet_LoRA$ !diff
diff -u lora_RESNET_basic_003_train_xpu_bf16.py lora_RESNET_basic_003_train_xpu_bf16_cl.py
--- lora_RESNET_basic_003_train_xpu_bf16.py     2023-11-02 15:18:16.700000000 +0800
+++ lora_RESNET_basic_003_train_xpu_bf16_cl.py  2023-11-17 15:14:21.692000000 +0800
@@ -104,7 +104,8 @@
         self.B = nn.Parameter(torch.zeros(rank, n))
 
     def forward(self, inputs):
-        inputs = inputs.view(-1, self.m)
+        #inputs = inputs.view(-1, self.m)
+        inputs = torch.reshape(inputs, [60,3*224*224])
         return torch.mm(torch.mm(inputs, self.A), self.B)
 
 lora = Lora(224 * 224 * 3, 1000)
@@ -115,6 +116,8 @@
 #========================================================
 print('\n------ 外掛LoRA模型,協同訓練: ------')
 import intel_extension_for_pytorch as ipex
+resnet_model = resnet_model.to(memory_format=torch.channels_last)
+lora = lora.to(memory_format=torch.channels_last)
 loss_fn = loss_fn.to("xpu")
 lora = lora.to("xpu")
 lora, optimizer = ipex.optimize(lora, optimizer=optimizer, dtype=torch.bfloat16)
@@ -131,6 +134,7 @@
         labels = process_lx(la)
         images = images.to("xpu")
         labels = labels.to("xpu")
+        images =images.to(memory_format=torch.channels_last)
         with torch.xpu.amp.autocast(enabled=True, dtype=torch.bfloat16):
             pred = resnet_model(images) + lora(images)
```
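The diff also swaps `inputs.view(-1, self.m)` for `torch.reshape(...)`, presumably because `view` requires the flatten to be stride-compatible, which no longer holds once the images are converted to channels_last, whereas `reshape` copies when necessary. Below is a minimal standalone sketch (my own illustration, not part of the author's scripts) of the channels_last conversion and that view/reshape behavior.

```python
import torch

x = torch.randn(60, 3, 224, 224).to(memory_format=torch.channels_last)

# The logical shape is unchanged; only the memory layout (NHWC strides) differs.
print(x.shape)                                              # torch.Size([60, 3, 224, 224])
print(x.is_contiguous(memory_format=torch.channels_last))   # True
print(x.is_contiguous())                                    # False (not contiguous in default NCHW layout)

# view() needs a stride-compatible flatten and raises on this layout ...
try:
    x.view(-1, 3 * 224 * 224)
except RuntimeError as e:
    print("view failed:", e)

# ... while reshape() copies if needed and flattens in logical (NCHW) order,
# so the LoRA input is the same vector as in the non-CL script.
y = torch.reshape(x, [60, 3 * 224 * 224])
print(y.shape)                                              # torch.Size([60, 150528])
```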
# Data Flex 170 Log: python lora_RESNET_basic_003_train_xpu_bf16_cl.py

```=bash
(ipex) eapet@eapet-6414U-DF170:~/resnet_LoRA$ python lora_RESNET_basic_003_train_xpu_bf16_cl.py
/home/eapet/ipex/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(


240 張圖片

tensor([340, 947, 24, 947, 24, 340, 24, 947, 340, 340, 340, 24, 947, 24, 947, 24, 340, 340, 340, 24, 24, 947, 340, 24, 340, 24, 947, 24, 947, 24, 340, 947, 24, 947, 947, 24, 947, 24, 947, 340, 947, 947, 24, 340, 24, 24, 340, 340, 340, 24, 947, 947, 24, 340, 340, 947, 340, 947, 340, 24])

------ 原模型測試: ------

 Accuracy: 0.358
wrong counts for the digit 0: 24
wrong counts for the digit 1: 64
wrong counts for the digit 2: 66

------ 外掛LoRA模型,協同訓練: ------
ep= 0 / 100 , loss= 7.477551579475403
ep= 5 / 100 , loss= 0.5515237748622894
ep= 10 / 100 , loss= 0.3148090988397598
ep= 15 / 100 , loss= 0.27021610736846924
ep= 20 / 100 , loss= 0.2469922173768282
ep= 25 / 100 , loss= 0.22928182035684586
ep= 30 / 100 , loss= 0.21190199255943298
ep= 35 / 100 , loss= 0.19545279070734978
ep= 40 / 100 , loss= 0.1801932454109192
ep= 45 / 100 , loss= 0.16701150499284267
ep= 50 / 100 , loss= 0.15749669820070267
ep= 55 / 100 , loss= 0.1474070642143488
ep= 60 / 100 , loss= 0.1395247932523489
ep= 65 / 100 , loss= 0.1270718313753605
ep= 70 / 100 , loss= 0.11698541603982449
ep= 75 / 100 , loss= 0.111823420971632
ep= 80 / 100 , loss= 0.10338117741048336
ep= 85 / 100 , loss= 0.09828238002955914
ep= 90 / 100 , loss= 0.0912911631166935
ep= 95 / 100 , loss= 0.08557180035859346
ep= 100 / 100 , loss= 0.07931549288332462
TTT = 34.555999517440796

saved to /home/eapet/resnet_LoRA/m_data/LORA_for_RESNET_ep50.ckpt

------ 原模型測試: ------

 Accuracy: 1.0
wrong counts for the digit 0: 0
wrong counts for the digit 1: 0
wrong counts for the digit 2: 0
```
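The scripts save only the LoRA adapter weights to `LORA_for_RESNET_ep50.ckpt`; the frozen ResNet50 backbone is not stored. Below is a minimal sketch of how that checkpoint could later be reloaded for inference, assuming the same `Lora` class definition and `base_path` as in the scripts above, and with `images` standing in for any preprocessed (N, 3, 224, 224) batch.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Re-create the adapter with the training-time shape and load its saved weights.
lora = Lora(224 * 224 * 3, 1000)                    # Lora class as defined in the scripts above
FILE = base_path + 'LORA_for_RESNET_ep50.ckpt'
lora.load_state_dict(torch.load(FILE, map_location="cpu"))
lora.eval()

# Inference combines the frozen backbone and the adapter exactly as during training.
resnet_model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1).eval()
with torch.no_grad():
    logits = resnet_model(images) + lora(images)
    pred = logits.argmax(dim=1)                     # predicted ImageNet-1k class indices
```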