owned this note
owned this note
Published
Linked with GitHub
# 1117-CS:APP 學習指引 (1) - Homework
### 2.58: 撰寫函式 is_little_endian,在 Little endian 機器上返回 1,反之返回 0,應能再任意機器上,無論幾位元的架構都適用 (Page 88)
* Hint: **Byte Order**
* 利用 uintptr_t
![](http://i.imgur.com/Y4GHzVH.png)
* 想法1: 將 uintptr_t 只像一個
```clike=
#include <stdint.h>
#include <inttypes.h>
int is_little_endian(){
int test_num = 0xff;
uintptr_t byte_start = (uintptr_t) &test_num;
if(*(int*)byte_start == 0xff){
return 1;
}
return 0;
}
int main(){
if (is_little_endian())
printf("Little Endian");
else
printf("Big Ednain")
return 0;
}
```
* 想法2: 利用 char pointer 指向 將 int 變數 cast 成 char pointe, 並用一個 char pointer 指向他, 在存取其值(第一個byte)來判斷
```clike=
#include <stdio.h>
int is_little_endian(){
int check=0x12345678;
char *cptr;
cptr = (char *) ✓
if (*cptr == 0x78)
return 1;
else
return 0;
}
int main(){
if (is_little_endian())
printf("Little Endian");
else
printf("Big Ednain")
return 0;
}
```
---
### 2.66: 撰寫函式 leftmost_one,產生得以找出數值 x 最高位 1 的位元遮罩 (Page 90)
* hint: clz
* 0xFF --> 0x80 , 0x6600 --> 0x4000
若系統為 32 bits
先將 leftmost_one 右邊的位元都變成 1,
用 x & (~x >> 1) 產生最高位 1 的位元遮罩
```clike=
int leftmost(unsigned x){
x |= x>>1;
x |= x>>2;
x |= x>>4;
x |= x>>8;
x |= x>>16;
return x & (~x >> 1);
}
```
---
### 2.92: 撰寫函式 float_negate,給定浮點數 f,回傳 -f,倘若 f 是 NaN,直接回傳 f。輸入數值範圍為有效的 2^32^ 個數值 (Page 96)
* infinite & not a number INF,NaN
* difficulty +0,-0, how to know "range"
* INF * 0 = -0?
* Distribution of Value
先取得 sign, exp, frac 的值
exp 為 0xff
frac 不為 0, 則為 NaN
若不是 NaN, 把 sign, exp, frac 的組合加上 inverse 再回傳
```clike=
float_bits float_negate(float_bits f){
unsigned sig = f>>31;
unsigned exp = f>>23 & 0xff;
unsigned frac = f & 0x7FFFFF;
int is_NAN = (exp == 0xFF) && (frac != 0);
if(is_NAN){
return f;
}
return ~sig<<31 | exp<<23 | frac;
}
```
---
### 4.47: 以 C 語言指標改寫原本使用陣列表示法的氣泡排序程式碼 (Page 327)
* 參考 [Optimizing bubble sort](https://en.wikipedia.org/wiki/Bubble_sort#Optimizing_bubble_sort)
```clike=
void bubble_b(long* data, long count) {
long *i, *last;
for (last = data+count-1; last > data; last--) {
for (i = data; i < last; i++) {
if (*(i+1) < *i) {
*(i+1) = *(i+1) ^ *i;
*i = *(i+1) ^ *i;
*(i+1) = *(i+1) ^ *i;
}
}
}
}
```
Optimizing version:
```clike=
void bubble_b(long* data, long count) {
long *i, *last;
bool swapped = true;
for (last = data+count-1; swapped && last > data; last--) {
swapped = false;
for (i = data; i < last; i++) {
if (*(i+1) < *i) {
*(i+1) = *(i+1) ^ *i;
*i = *(i+1) ^ *i;
*(i+1) = *(i+1) ^ *i;
swapped = true;
}
}
}
}
```
---
### 6.37: 計算 N=64 和 N=60 的 cache miss rate (Page 455)
4K cache, 16 block size
zeof(int) =4
When N = 64,
zeof(array_t) = 64 * 64 = 4096 = 4 * C
* SumA
```clike=
for (int i=0; i<N; i++)
for (int j=0; j<N; j++){
sum += a[i][j];
}
```
第一次存取會 miss, 接連三次存取會 hit
miss rate = 25%
* SumB
```clike=
for (int j=0; j<N; j++)
for (int i=0; i<N; i++){
sum += a[i][j];
}
```
first loop all miss, next 3 loop all hit
so miss rate is 25%.
* SumC
```clike=
for (j= 0; j < N; j+=2)
for (i = 0; i < N; i+=2)
sum += (a[i][j] + a[i][j+1] + a[i+1][j] + a[i+1][j+1]);
```
total read count = 64*64
because of i+=2,
read cache order only loop 2 times
0, 16, 32, 48, ... 240,(2 times)
so miss rate is 50%
totol read miss count = 64/2 * 64 * 50% = 64*64/4
so miss rate is still 25%.
When N = 60,
* SumA
```
miss rate 25%
```
* SumB
```
first inner loop a[0][0] -> a[59][0]
read memory address order:
0, 604, 6042, .... 604*59
read cache order:
0, 15, 30, ...., 225, (17 numbers) 255, 14, 29, ....., 224, (17 numbers) 254, 13, 28, ....., 223, (17 numbers) 253, 12, 27, 42, 57, 72, 87, 102, 117 (9 numbers)
all read miss and store into different blocks
next 3 loops: a[0][1] -> a[59][1], a[0][2] -> a[59][2], a[0][3] -> a[59][3]
all hit
```
* SumC
```
same as miss rate when N = 64
25%
```
---
### 6.45: 撰寫更快的轉置矩陣實作 (Page 45)
原始程式碼:
```clike=
void transpose(int* dst, int* src, int dim) {
int i, j;
for (i = 0; i < dim; i++)
for (j = 0; j < dim; j++)
dst[j*dim+i] = src[i*dim+j];
}
```
if dim = 4
* 分析 dst 的存取順序
0 4 8 1 5 9 13 2 6 10 14 3 7 11 15
第一次存取 dst[0] 會 miss
load dst[0] ~ dst[3]
再存取 dst[4] 一樣會 miss
可能會造成過多 cahce miss
讓 dst 的存取符合 locality:
```clike=
void transpose(int *dst, int *src, int dim) {
for(int n = 0; n<dim*dim; n++) {
int i = n/dim;
int j = n%dim;
dst[n] = src[j*dim + i];
}
}
```
---
### 6.46: 有向圖轉換為無向圖 (Page 45) / 參考: Graph, 相鄰矩陣 (Adjacency Matrix))
同 4.45
```clike=
void col_convert(int *G, int dim){
for(int n = 0; n < dim*dim; n++){
int i = n/dim;
int j = n%dim;
G[n] = G[n] || G[j*dim + i]
}
}
```