owned this note
owned this note
Published
Linked with GitHub
# 2018q1 Homework1 (phonebook)
contributed by < `t5i0m7` >
### Reviewed by `Wiilemchen`
* 實做 hash function 後的效能比較可以加入改 entry size 的版本
* 若有新檔案要提交,可以用`git add <file>`
### Reviewed by `BroLeaf`
* 中文跟英文間要用空格隔開
* Commit Message 只有標題,應該在內文中敘述做了什麼,詳細可以參考老師給的 [如何寫一個 Git Commit Message]
* 關於 Cache 的大小,`dmidecode` 上面顯示的是 CPU 總共有多少 cache 所以還要除以你的 CPU 上有多少 core 才會是對的,可以參考 [連結](http://debugo.com/ls-cpu-dmidecode/) 上面的算法
```
L1d 快取: 16K
L1i 快取: 64K
L2 快取: 2048K
```
* 從 lscpu 得到的才是正確的值
## 前言
這個課程有很多工具都是第一次接觸,一開始難免有很多不懂的,但現在對我來講
邊看邊學編寫就對了,這是我認為學習東西最快的方法。在此先感謝教授的影片與`ryanpatiency`同學
看著影片與同學的共筆讓我學到了許多。前言之就是我的開發紀錄。
## linux系統安裝
在一開始使用usb安裝linux使自己的電腦變成雙系統就出了點問題,這跟筆電的BIOS有關
當我使用UEFI而非傳統BIOS時,這時會發生讀不到USB的情況。在花了許多白工之後,我決定
將本來要廢棄的電腦拿來安裝linux,並用putty靠SSH做遠端操控來繼續我的作業。
## CPU 配置
* CPU
```
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
每核心執行緒數:2
每通訊端核心數:2
Socket(s): 1
NUMA 節點: 1
供應商識別號: AuthenticAMD
CPU 家族: 21
型號: 16
Model name: AMD A10-5800K APU with Radeon(tm) HD Graphics
製程: 1
CPU MHz: 1400.000
CPU max MHz: 3800.0000
CPU min MHz: 1400.0000
BogoMIPS: 7599.19
虛擬: AMD-V
L1d 快取: 16K
L1i 快取: 64K
L2 快取: 2048K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate retpoline retpoline_amd rsb_ctxsw vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
```
* chache
```
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.
Handle 0x002F, DMI type 7, 19 bytes
Cache Information
Socket Designation: L1 CACHE
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 192 kB
Maximum Size: 192 kB
Supported SRAM Types:
Pipeline Burst
Installed SRAM Type: Pipeline Burst
Speed: 1 ns
Error Correction Type: Multi-bit ECC
System Type: Unified
Associativity: 2-way Set-associative
Handle 0x0030, DMI type 7, 19 bytes
Cache Information
Socket Designation: L2 CACHE
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Back
Location: Internal
Installed Size: 4096 kB
Maximum Size: 4096 kB
Supported SRAM Types:
Pipeline Burst
Installed SRAM Type: Pipeline Burst
Speed: 1 ns
Error Correction Type: Multi-bit ECC
System Type: Unified
Associativity: 16-way Set-associative
```
## 開始進入作業一 (phonebook)
這次的作業要修改phonebook裡的code,將phonebook_orig做一個最佳化的版本。
借由實作phonebook_opt降低cache miss,而以下我參考很多人的想法。
#### 方法一 降低 entry 大小
* 降到 32bytes entry 留 lastName , zip[5] , pNext

```
Performance counter stats for './phonebook_orig' (100 runs):
724,422 cache-misses # 14.698 % of all cache refs ( +- 0.52% ) (74.46%)
4,928,566 cache-references ( +- 0.57% ) (50.81%)
265,940,300 instructions # 0.58 insn per cycle ( +- 0.21% ) (75.87%)
458,223,017 cycles ( +- 0.86% ) (75.66%)
0.132371343 seconds time elapsed ( +- 2.43% )
```
```
Performance counter stats for './phonebook_opt' (100 runs):
270,950 cache-misses # 10.407 % of all cache refs ( +- 0.84% ) (74.41%)
2,603,570 cache-references ( +- 0.78% ) (51.14%)
245,928,903 instructions # 0.66 insn per cycle ( +- 0.26% ) (76.17%)
372,881,696 cycles ( +- 1.02% ) (75.67%)
0.107570050 seconds time elapsed ( +- 2.86% )
```
* 降到 28bytes entry 留 lastName , pNext

```
Performance counter stats for './phonebook_orig' (100 runs):
679,209 cache-misses # 13.959 % of all cache refs ( +- 0.51% ) (74.43%)
4,865,670 cache-references ( +- 0.67% ) (51.31%)
264,981,539 instructions # 0.59 insn per cycle ( +- 0.18% ) (76.05%)
452,444,619 cycles ( +- 0.92% ) (75.18%)
0.132400425 seconds time elapsed ( +- 3.31% )``
```
```
Performance counter stats for './phonebook_opt' (100 runs):
184,417 cache-misses # 9.483 % of all cache refs ( +- 1.22% ) (74.08%)
1,944,692 cache-references ( +- 0.82% ) (51.87%)
242,371,815 instructions # 0.77 insn per cycle ( +- 0.29% ) (76.76%)
316,335,496 cycles ( +- 1.15% ) (75.05%)
0.102210667 seconds time elapsed ( +- 9.41% )
```
::: info
問題 1 :降低entry的大小不一樣會造成 findName 的時間下降,像是我做的第一個嘗試,將entry大小降為32bytes entry 留 lastName、zip[5]、pNext,findName時間反而phonebook_orig的實作還來的多。但將entry改為降到 28bytes entry 留 lastName、pNext就能將低findName的時間。
:::
::: info
問題 2 :Cache miss的機率太詭異了,從phonebook_orig的版本就蠻奇怪的,不使用任何演算改變實作其cahce miss的機率只有14%,這數字跟其他人的比起來相當詭異。
:::
:::success
後來我看了自己CPU的資訊,我看到了自己CPU的L1 cache是 192 kB 、L2 cache 4096 kB,我認為是cache太大導致cache miss較低,而search的時間還是很久。
:::
#### 方法二 使用DJB hash function

```
Performance counter stats for './phonebook_orig' (100 runs):
634,523 cache-misses # 13.076 % of all cache refs ( +- 0.33% ) (74.59%)
4,852,419 cache-references ( +- 0.48% ) (50.69%)
263,565,254 instructions # 0.71 insn per cycle ( +- 0.21% ) (75.95%)
371,318,558 cycles ( +- 0.79% ) (75.79%)
0.101866438 seconds time elapsed ( +- 1.88% )
```
```
Performance counter stats for './phonebook_opt' (100 runs):
154,175 cache-misses # 8.831 % of all cache refs ( +- 0.44% ) (73.86%)
1,745,748 cache-references ( +- 0.79% ) (50.76%)
233,546,335 instructions # 1.03 insn per cycle ( +- 0.21% ) (77.00%)
227,029,862 cycles ( +- 0.47% ) (76.98%)
0.062459330 seconds time elapsed ( +- 0.80% )
```
我更改了phonebook_opt.c,並在裡面時作一個DJB hash function,並在phonebook_opt.h裡加入了hashtable以便實作。而在產生碰撞時要作的處理,append和findname都要做處理。
由上可以看的出來,在serach時間有顯著的降低,cache miss也比只改entry大小的方式還來的小。
```
// my DJB Hash Function
unsigned int DJBHash(char lastName[])
{
unsigned int hash = 5381;
while (*lastName) {
hash += (hash << 5) + (*lastName++);
}
return (hash & 0x3FF);
}
```
:::info
小插曲:沒注意到commit首位大寫的警告,導致之前的code都沒傳上去,最後再回github看才注意到,這點以後再寫commit要注意。
:::
## 參考資料
* [Ubuntu 14.04 + xRDP + Xfce 实现Windows远程桌面连接](http://www.cnblogs.com/platero/p/4123720.html)
* [各种字符串Hash函数比较](https://www.byvoid.com/zhs/blog/string-hash-compare)