# 2016q3 Homework3
contributed by <`SarahCheng`>
###### tags: `sarah`,`2016q3_hw3`
## Reading notes
### [Modern Microprocessors](http://www.lighterra.com/papers/modernmicroprocessors/)
* VLIW designs are not interlocked.
* They do not check for dependencies between instructions.
* They have no way of stalling individual instructions other than stalling the whole processor on a cache miss.
* The programmable shaders in graphics processors (GPUs) are sometimes VLIW designs, as are many digital signal processors.
> Not quite clear to me: This complicates the compiler somewhat, because it is doing something that a superscalar processor normally does at runtime, however the extra code in the compiler is minimal and it saves precious resources on the processor chip.
* Branches & Branch Prediction
* Make the guess (two alternatives):
* static --> decided by the compiler
* dynamic --> an on-chip branch prediction table containing the addresses of recent branches and a bit indicating whether each branch was taken or not last time.
>In reality, most processors actually use two bits, so that a single not-taken occurrence doesn't reverse a generally taken prediction (important for loop back edges).
* The most advanced modern processors often implement several branch predictors.
* Use empty cycles
* Reordering in hardware at runtime: OOO (software need not be recompiled)
> adds complex logic to the processor: harder to design, larger chip area, power-hungry
* Rearranging the instructions in the compiler (multiple paths)
> the simpler hardware means more cores, or extra cache, could be placed onto the same amount of chip area
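
The two-bit scheme quoted above can be sketched as a saturating counter. This is only a minimal illustration of the idea, not how any particular processor implements it; the names `counter2_t`, `predict_taken`, `update_counter` and `count_mispredictions` are made up for this note.

```c
#include <stdbool.h>

/* Hypothetical 2-bit saturating counter:
 * states 0 and 1 predict "not taken", states 2 and 3 predict "taken". */
typedef unsigned char counter2_t;

bool predict_taken(counter2_t c)
{
    return c >= 2;
}

/* Move one step toward the actual outcome, saturating at 0 and 3. */
counter2_t update_counter(counter2_t c, bool taken)
{
    if (taken)
        return (counter2_t)(c < 3 ? c + 1 : 3);
    return (counter2_t)(c > 0 ? c - 1 : 0);
}

/* Replay a history of branch outcomes and count wrong guesses. */
int count_mispredictions(const bool *outcomes, int n, counter2_t start)
{
    counter2_t c = start;
    int misses = 0;
    for (int i = 0; i < n; i++) {
        if (predict_taken(c) != outcomes[i])
            misses++;
        c = update_counter(c, outcomes[i]);
    }
    return misses;
}
```

For a loop back edge that is taken three times and then falls through, repeated twice, a counter starting in state 3 only mispredicts the two loop exits: a single not-taken outcome moves it to state 2, which still predicts taken on the next iteration.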
### [SIMD](https://docs.google.com/presentation/d/1LeVe7EAmZvqD3KN7p4Wynbd36xPOk9biBCFrdmsqzKs/edit#slide=id.p18)
* The difficult parts
* Finding Parallelism in Algorithm
* Portability between different intrinsics
* Boundary handling
* Padding, Predication, Fallback
* Divergence
* Predication, Fallback to scalar
* Register Spilling
* multi-stages, # of variables + braces, assembly
* Non-Regular Access/Processing Pattern/Dependency
* Multi-stages/Reduction ISA/Enhanced DMA
* Unsupported Operations
* Division, High-level function (eg: math functions)
* Floating-Point
* Unsupported/cross-device compatibility
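
As a rough illustration of the "Boundary handling" item (fallback to scalar), here is a sketch that adds two float arrays four elements at a time with SSE intrinsics and handles the leftover tail with a plain loop. The function name `add_arrays` is an assumption made for this note, and the SSE path is only compiled when the compiler defines `__SSE__`; otherwise the whole array goes through the scalar loop.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical example: out[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Vector body: 4 floats per iteration, unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar fallback for the remaining n % 4 elements
     * (or for everything when SSE is unavailable). */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Padding the arrays to a multiple of four would remove the tail loop, at the cost of extra memory and the need to control the allocation.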
## mergesort-concurrent
Based on the implementation by [LanKuDot](https://hackmd.io/s/S1ezGhIA), a senior classmate.
### Test
* Problem:
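* Make sure the `-g` flag is passed at compile time so that an executable containing debug info is generated correctly
> Adding `-g` to the OBJS and sort rules in the Makefile had no effect...
* Do not modify git's master directly; create a local branch and push it as a remote branch
> `git checkout -b testbysarah` `git push -u origin testbysarah`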
### Assignment
* `$ uniq words.txt | sort -R > input.txt`
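> `argc` is short for argument count: the number of arguments, including the command itself. The system counts the arguments automatically.
`argv` is short for argument vector and holds the argument values,
that is, the strings the user types on the command line, separated by whitespace.
The system also assigns the program's own name to `argv[0]`, and then assigns the arguments that follow the program name to `argv[1]`, `argv[2]`, and so on in order.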
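
A small sketch of how `argc` and `argv` line up. The helper name `print_args` is invented for this note; it is meant to be called from `main(int argc, char *argv[])` with the same two parameters.

```c
#include <stdio.h>

/* Hypothetical helper: print the program name and each argument,
 * and return how many arguments follow the program name. */
int print_args(int argc, char *argv[])
{
    printf("program name: %s\n", argv[0]);
    for (int i = 1; i < argc; i++)
        printf("argv[%d] = %s\n", i, argv[i]);
    return argc - 1;
}
```

If a program were started with two arguments after its name, `argc` would be 3 and the helper would return 2.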
#### [Doxygen](http://www.stack.nl/~dimitri/doxygen/download.html)
* install
* `git clone https://github.com/doxygen/doxygen.git`
* `cd doxygen`
* `mkdir build`
* `cd build`
* `cmake -G "Unix Makefiles" ..`
> `Configuring incomplete, errors occurred!` (unresolved)
* `make`
> `Configuring incomplete, errors occurred!`
#### [SuperMalloc](https://github.com/sysprog21/SuperMalloc)
* error: `format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘uint64_t {aka long long unsigned int}’ [-Werror=format=]`
```=Clike
# if __WORDSIZE == 64
typedef long int int64_t;
# else
__extension__
typedef long long int int64_t;
# endif
```
```=Clike
In a 32-bit compile with GCC (and with 32- and 64-bit MSVC), the output of the program will be:
int: 0
int64_t: 1
long int: 0
long long int: 1
```
* Possibly the wrong OS was installed (`i686` indicates a 32-bit system); output of `uname -a`:
`Linux sarahcheng-MacBookPro 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:41:41 UTC 2016 i686 i686 i686 GNU/Linux`
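
The `%lu` warning appears because `uint64_t` is `unsigned long long` on a 32-bit target. One portable way to print it is the `PRIu64` macro from `<inttypes.h>`. The sketch below only shows that technique; `format_u64` is a made-up helper and this is not necessarily how SuperMalloc itself should be patched.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write value into buf as decimal text.
 * PRIu64 expands to the correct conversion specifier ("lu" or "llu")
 * for the platform, so the same source compiles cleanly with
 * -Werror=format on both 32-bit and 64-bit targets. */
int format_u64(char *buf, size_t size, uint64_t value)
{
    return snprintf(buf, size, "%" PRIu64, value);
}
```

An explicit cast such as `(unsigned long long)value` combined with `%llu` would also silence the warning, but `PRIu64` keeps the argument type and the format in sync without a cast.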
* error: `Cannot open the file`
> the file name is `input`, not `input.txt`
* error: `input unsorted data line-by-line`
* error: `x range not ....`
* error: `redeclaration of ‘fp’ with no linkage`
* error: `Segmentation fault (core dumped)` (a memory segment error in the program)
```