# 2016q3 Homework3 (software-pipelining)

#### contributed by <`andy19950`>

## Assignment requirements
- Explain the performance differences among `naive_transpose`, `sse_transpose`, and `sse_prefetch_transpose`, and how the prefetcher affects the cache
- Use perf to analyze cache misses and hits
- Learn how to use raw hardware counters with `perf stat`
- Rewrite the code with SSE/AVX intrinsics

---

## Results
- Right after getting the code, I ran it to see how it performs:

```
sse prefetch: 63016 us
sse: 164243 us
naive: 326131 us
```

| |cache-misses|cache-references|miss rate (misses / references)|
|:------|:--------:|:---------------:|:-------:|
|`Naive`|43,632,670|47,533,255|91.79%|
|`SSE`|15,943,703|19,252,257|82.81%|
|`SSE_Prefetch`|11,713,156|14,782,574|79.23%|

---

## Adding AVX & AVX_Prefetch
- Reference: [周曠宇's shared notes](https://embedded2015.hackpad.com/Week8--VGN4PI1cUxh)
- Added the code to `impl.c`
- Added `time_test.c` and `benchmark_clock_gettime.c` so that each function can be run on its own, allowing the cache misses and run time of each function to be measured separately

### Results

| |cache-misses|cache-references|miss rate (misses / references)|
|:------|:--------:|:---------------:|:-------:|
|`Naive`|42,116,848|47,274,088|89.09%|
|`SSE`|15,376,848|19,201,215|80.08%|
|`AVX`|11,278,056|15,047,495|74.95%|
|`SSE_Prefetch`|8,891,630|12,605,390|70.53%|
|`AVX_Prefetch`|11,741,365|15,659,025|74.98%|

![](https://i.imgur.com/nryaP0A.png)
