2018q3 Homework3 (review)

contributed by < ofAlpaca >

tags: `CSIE5006` `HW`

第 3 週測驗題測驗 `1`

考慮以下程式碼:













#include <stdio.h>
#include <stdint.h>
struct test {
    unsigned int x : 5;
    unsigned int y : 5;
    unsigned int z;
};
int main() {
    struct test t;
    printf("Offset of z in struct test is %ld\n",
           (uintptr_t) &t.z - (uintptr_t) &t);
    return 0;
}

在 GNU/Linux x86_64 環境中執行，得到以下輸出:

Offset of z in struct test is 4

倘若將第 10 和第 11 換為以下:


    printf("Address of t.x is %p", &t.x);

會發生什麼事？

編譯失敗，不能將指標指向沒有對齊 1 byte 的結構體成員;

延伸問題: 解釋原因，並且找出 C99/C11 規格書對應的描述

根據 C99 說明可以得知， address 最小的單位是 byte ，而且無法對 bit-field 的 member 使用 & operator 。

C99 [3.2] alignment
requirement that objects of a particular type be located on storage boundaries with addresses that are particular multiples of a byte address.
C99 [6.7.2.1] The unary & (address-of) operator cannot be applied to a bit-field object; thus, there are no pointers to or arrays of bit-field objects.

為什麼 z 和 x 的 offset 是 4 ? (source)

最主要原因是 structure alignment 有關
試想以下的資料結構 :

strcut S {
    char y; // 1 byte
    int  z; // 4 bytes
};

出乎意料的是 sizeof(S) 不會是 1 + 4 = 5 而是 8 ，這是為了 access memory 時能更加有效率， struct member 是會自動對齊的。

        (hex)
0x0000 : yy 00 00 00
0x0004 : zz zz zz zz
--------------------
&S = 0x0000
&y = 0x0000
&z = 0x0004

由此可知 z 的 offset 是 &z - &S = 4 。
回到 struct test 來看，其 structure alignment 應該會是 :

         (binary)
0x0000 : xxxx xyyy yy00 0000 0000 0000 0000 0000 
0x0004 : zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz
--------------------
&S = 0x0000
&y = 0x0000
&z = 0x0004

這就是為什麼 z 的 offset 為 0x0004 - 0x0000 = 4 。

第 3 週測驗題測驗 `2`

考慮以下程式碼，在 little-endian 硬體上，會返回 1，反之，在 big-endian 硬體上會返回 0:

int main() {
    union { int a; char b;
    } c = { .a = K1 };
    return c.b == K2;
}

補完上述程式碼。

K1 = ?
1

K2 = ?
1

延伸問題: 解釋運作原理，並找出類似的程式碼

union 的 members 會使用相同位置的記憶體，但可能因為 endian 的不同而讀到不同的 byte :
- 在 Little endian 時，因為低位元組先存在低位址，所以 char 與 int 都可讀到相同 0x01 的位元組。
- 在 Big endian 時，因為高位元組先存在低位址，所以 char 的範圍讀不到 0x01 的那個位元組。

類似的檢測 endianness 的程式碼 : (source)

#include <stdio.h> 
int main()  
{ 
   unsigned int i = 1; 
   char *c = (char*)&i; 
   if (*c)     
       printf("Little endian"); 
   else
       printf("Big endian"); 
   return 0; 
}

原理是一樣的，在 little endian 的狀況下可以知道 0x01 的 byte 會先被放在低位址，所以轉型後 char * c 還讀得到 ; 相反的，如果是 big endian ， 0x01 這個 byte 會被放在高位址， char * c 的範圍只有 1 byte ，因此就讀不到了。

第 3 週測驗題測驗 `3`

以下程式碼計算 parity bit:

#include <stdint.h>
int parity(uint32_t x)
    x ^= x >> 1;
    x ^= x >> 2;
    x = (x & 0x11111111U) * 0x11111111U;
    return (x >> P) & 1;
}

補完程式碼

P=28

延伸問題: 解釋原理，並且提出更快的實作機制 (提示: 透過 SIMD)

第一次看的時候會嚇到，但是深入了解後會發現其觀念蠻簡單的，先有個概念，以 nibble(4-bits) 為單位，用 shift 與 XOR 來判斷此 nibble 是奇同位還是偶同位，最後再將所有 nibble 相加並取其 LSB ，若是 1 則為奇同位 0 則為偶同位。

x ^= x >> 1 與 x ^= x >> 2 是先用 shift 來將 nibble 內的每個 bit 重疊在一起，並透過 XOR 的特性來達到兩兩消除 b'1 的效果，如 b'1 XOR b'1 = b'0 ，如果最後 nibble 的 LSB 是 1 則是奇同位 0 則是偶同位， 32 bits 總共會有 8 組 nibbles 。

         [a][b][c][d]                                (x)
          0  1  1  0     (x)                      [a] 0      
     XOR  0  0  1  1     (x >> 1)                 [b] 1 
     -----------------                            [c] 1
          0  1  0  1                         +    [d] 0
                                            ---------------
         [a][b][c][d]                              1  0
          0  1  0  1     (x)                          ^ only see LSB
     XOR  0  0  0  1     (x >> 2)
     -----------------
          0  1  0  0 
                   ^ parity bit

x = (x & 0x11111111U) * 0x11111111U 此行的目的便是先取出所有 nibble 的 LSB 並相加在一起，並取得後面的 32 bits。

                         0001 0010 0001 ... 0100  (32-bits)
                     &   0001 0001 0001 ... 0001  (0x11111111U)
                     ----------------------------
                         0001 0000 0001 ... 0000  <== extract all nibble's LSB

                         0001 0000 0001 ... 0000  (x & 0x11111111U)
                     *   0001 0001 0001 ... 0001  (0x11111111U)
                     ----------------------------
                         0001 0000 0001 ... 0000
                                    ...                     
                                ...
           0001 0000 0001 ... 0000                                       
  +   0001 0000 0001 ... 0000
 ------------------------------------------------
      0001 0001 0001 ... 0000 ... ... ...   0000 
      [   out of bound  ][ x only take 32 bits ]

(x >> 28) & 1 最後再 shift right 28 bits 得到所有 nibble 相加後結果，再取其 LSB 便得到了 x 的 parity bit 了。

SIMD
- 參考了 Bit Twiddling Hacks 中提到的 Compute parity in parallel 。 ~~也發現了幾題每週測驗的題目~~

    unsigned int v;  // word value to compute the parity of
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v &= 0xf;
    return (0x6996 >> v) & 1;

後來找到 bit-interleave(SIMD ver) 與 SIMD Example 於是想嘗試看看能不能使用 AVX 指令集來作到 parity check 。

第 2 週測驗題測驗 `2`

指標篇提到 signal 系統呼叫的原型宣告:

void (*signal(int sig, void (*handler)(int))) (int);

該如何解析呢？
提示: 參閱 manpage: signal(2)

延伸問題: 解釋 signal(2) 的作用，並在 GitHub 找出應用案例

解析 :
- 先看 signal(int sig, void (*handler)(int)) 有兩個參數， int 與有著 int 參數返回 void 的 function pointer ， signal 為函式名稱。
- void (*signal(int sig, void (*handler)(int))) (int) 則是需要 int 參數並回傳 void 的 function pointer 。
[x]signal(2)

signal 是個軟體中斷，例如 Ctrl-C ，而 signal(2) 函式提供了一個可以處理非同步事件的方法。
- sig 為 signal number ，為中斷的編號，定義在 C 函式庫的 signal.h 裡，變數是 SIG 開頭，像是 SIGINT 、 SIGSEGV 。
- handler 可以有三種值 :
  1. SIG_IGN ，訊號忽略。
  2. SIG_DFL ，訊號會照預設情況處理。
  3. 當訊號發生時的 handler 就會被呼叫，此稱為 signal handler 或 signal-catching function。
  4. handler 所傳入的參數即是 signal number 。
- 只有 SIGKILL 與 SIGSTOP 不能被忽略或是 catch 。

TuxPLC 是個能從 Linux 端控制 Programable Logic Controller (PLC) 的 OpenSource 。

此開源程式也有使用到 signal 來處理中斷訊息如 :

signal(SIGTERM,SigHand);
signal(SIGINT,SigHand);
signal(SIGSEGV,SigHand);

SigHand() 則為 :

void SigHand(int num)
{ 
  switch (num)
  {
      case SIGTERM:	Log(LOG_NOTICE,"receive SIGTERM\n");
                                  Terminated=1;
                                  break;
      case SIGINT:	Log(LOG_NOTICE,"receive SIGINT\n");
                                  Terminated=1;
                                  break;
      case SIGIO:	Log(LOG_INFO,"receive SIGIO\n");
                                  //Terminated=1;
                                  break;
      case SIGSEGV:	Log(LOG_CRIT,"receive SIGSEGV, program ERROR\n");
                                  siglongjmp(jb,-1);
                                  //Terminated=1;
                                  //exit(1);
                                  break;

      default:	Log(LOG_CRIT,"receive signal: %d\n",num);
                                  Terminated=1;
                                  break;
  }
}

第 2 週測驗題測驗 `5`

以下程式是合法 C99 程式嗎？

#include <stdio.h>
int main() { return (********puts)("Hello"); }

請搭配 C 語言規格書解釋

繼續思考以下是否合法:

#include <stdio.h>
int main() { return (**&puts)("Hello"); }

繼續變形:

#include <stdio.h>
int main() { return (&**&puts)("Hello"); }

也會合法嗎？為什麼？請翻閱 C 語言規格書解說。

是合法的
- 根據 C99 可知，一個有 returning type 的 function designator 會被轉為 pointer to function returning type ，也就是說 int t() 的 function designator 會變成 int (*t) () 的 pointer to function 。
  
  C99 [6.3.2.1] A function designator is an expression that has function type. Except when it is the operand of the sizeof operator or the unary & operator, a function designator with type ‘‘function returning type ’’ is converted to an expression that has type ‘‘pointer to function returning type ’’.
- put() 是個 function designator ，所以會被轉為 pointer to function :
  1. *put() ，在前面加上 dereference of (*) 會使 pointer to function 變成 function designator ，但會再被轉回 pointer to function 。
  2. &put() ，在前面加上 address of (&) 會讓原本的 function designator 變為 pointer to function 。
- 所以可以看到，不論在 put() 前怎麼加 * 或 & 最終的結果都是 pointer to function 。

2018q3 Homework3 (review)

tags: CSIE5006 HW

第 3 週測驗題 測驗 1

第 3 週測驗題 測驗 2

第 3 週測驗題 測驗 3

第 2 週測驗題 測驗 2

第 2 週測驗題 測驗 5