Sign in to import from GitHub:
contributed by <YuanHaoHO
>
catpig1630
hoyuanhao@hoyuanhao-X550VB:~$ lscpu
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
每核心執行緒數:2
每通訊端核心數:2
Socket(s): 1
NUMA 節點: 1
供應商識別號: GenuineIntel
CPU 家族: 6
型號: 58
Model name: Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz
製程: 9
CPU MHz: 1203.109
CPU max MHz: 3200.0000
CPU min MHz: 1200.0000
BogoMIPS: 5188.13
虛擬: VT-x
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 3072K
NUMA node0 CPU(s): 0-3
1.降低cache-miss的比率
2.降低 findName()時間
3.降低append()時間
這次功課主要處理的目標在於有關優化後時間與cache-miss上的差異,所以主要是爭對phonebook_opt與main.c上著手更改,先從main.c開始看:
CLOCK_REALTIME
這種計時方式,並且將其運用在diff_in_second這個static double的function中,molloc()動態配置
main.c上主要是在進行動態讀取txt檔與計算時間,並沒有進一步去進行更改讀取效果或著套用資料結構,而運用OPT這個變數來分化是origin還是optimal,如此再從phonebook_opt.[hc]上看其程式碼內容:
/\* allocate memory for the new entry and put lastName */
e = e->pNext = (entry *) malloc(sizeof(entry));
e = e->pNext;
strcpy(e->lastName, lastName);
e->pNext = NULL;
phonebook_opt.h
可以觀察到,在__PHONE_BOOK_ENTRY被分為許多組字元陣列來儲存資料,並將其指標儲存於pNext中,以待動態配置運用。typedef struct \_\_PHONE\_BOOK_ENTRY {
char lastName\[MAX\_LAST\_NAME_SIZE\];//只有這裡再搜索時會運用到
char firstName\[16\];
char email\[16\];
char phone\[10\];
char cell\[10\];
char addr1\[16\];
char addr2\[16\];
char city\[16\];
char state\[2\];
char zip\[5\];
struct __PHONE_BOOK_ENTRY* pNext;
} entry;
當CPU在快速的搬運資料時會使用到cache,然而因為cache空間及為有限,所以如果沒有在搬運資料時,妥善運用搬運空間,cache-miss rate 上升,而整體效能下降。目前主要解決方向式讓搬運內容比例上升還有減少搜索時間,如此可以向鎖定精確區域搜索,和增加一次可搬運的量。
ryanwang
的做法,先用註解的方式拿掉其他資料,看看執行結果如何。origin
execution time of findName() : 0.005664 sec
Performance counter stats for './phonebook_orig' (100 runs):
2,269,002 cache-misses # 83.264 % of all cache refs ( +- 0.19% )
2,725,054 cache-references ( +- 0.24% )
322,678,743 instructions # 1.06 insns per cycle ( +- 0.02% )
303,609,265 cycles ( +- 1.27% )
0.102145149 seconds time elapsed ( +- 1.33% )
perf stat --repeat 100 \
-e cache-misses,cache-references,instructions,cycles \
./phonebook_opt
opt
Performance counter stats for './phonebook_opt' (100 runs):
283,392 cache-misses # 58.172 % of all cache refs ( +- 0.33% )
487,162 cache-references ( +- 1.08% )
279,804,090 instructions # 1.52 insns per cycle ( +- 0.02% )
184,513,078 cycles ( +- 0.72% )
0.062222535 seconds time elapsed ( +- 0.77% )
./calculate
gnuplot scripts/runtime.gp
在這次實驗上,我同我同學一起研究,發現再硬體設備上的差異,對於cache-miss上的比率有很大的關係,我使用的筆電在cache上的大小不大,為:
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 3072K
如此同樣使用拿掉其他無用的資料的方法我的cache-miss從83%降至58%,而我的同學硬體設備更好,一開始的cache-miss就比我低很多,在attend()上所花的時間也比我更少,如此因硬體設備而影響數據,造就大家在coding上的結果也會受到快取的大小而改變。
or
or
By clicking below, you agree to our terms of service.
New to HackMD? Sign up
Syntax | Example | Reference | |
---|---|---|---|
# Header | Header | 基本排版 | |
- Unordered List |
|
||
1. Ordered List |
|
||
- [ ] Todo List |
|
||
> Blockquote | Blockquote |
||
**Bold font** | Bold font | ||
*Italics font* | Italics font | ||
~~Strikethrough~~ | |||
19^th^ | 19th | ||
H~2~O | H2O | ||
++Inserted text++ | Inserted text | ||
==Marked text== | Marked text | ||
[link text](https:// "title") | Link | ||
 | Image | ||
`Code` | Code |
在筆記中貼入程式碼 | |
```javascript var i = 0; ``` |
|
||
:smile: | ![]() |
Emoji list | |
{%youtube youtube_id %} | Externals | ||
$L^aT_eX$ | LaTeX | ||
:::info This is a alert area. ::: |
This is a alert area. |
On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?
Please give us some advice and help us improve HackMD.
Syncing