自我檢查事項(sandbox)

contributed by <zhanyangch, tina0405, LinRiver>

隔離執行環境的建構

Trusted Execution Environment (TEE) 的應用為何？(舉出共筆以外的真實世界案例)

Trusty TEE

在 android 上支援 TEE 的軟體，包含
- 一個運行在 TEE 處理器上的 Trusty OS
- Drivers for the Android kernel 處理在 Trusty OS 下的應用程式間溝通
- 一個函式庫處理在 Trusty OS 下的應用程式間溝通使用 kernel drivers
TEE 處理器是獨立的微處理器或主處理器虛擬實例
TEE processor is isolated from the rest of the system using memory and I/O protection mechanisms supported by the hardware.
應用 DRM,mobile payments, secure banking, full-disk encryption, multi-factor authentication, device reset protection, replay-protected persistent storage, wireless display ("cast") of protected content, secure PIN and fingerprint processing, and even malware detection.
API
- Trusted applications or services that run on the TEE processor
- Normal/untrusted applications that run on the main processor and use services provided by Trusted applications
運行在主處理器的應用程式可以透過　Trusty APIs 連接可信任的應用並交換訊息，這個過程是completely asynchronous
Trusted applications run as isolated processes under the Trusty OS kernel. Each process runs in its own virtual memory sandbox utilizing the MMU capabilities of the TEE processor
Trusty 應用為單執行序
Trusty applications 只在載入時初始化一次且會一直在記憶體中直到 TEE 處理器被 reset，Trusty 不支援動態載入
Trusted applications are written as event-driven servers waiting for commands from other applications or from applications running on the main processor
Currently all Trusty applications are developed by a single party and packaged with the Trusty kernel image. The entire image is signed and verified by the bootloader during boot. Third-party application development is not supported in this version of Trusty
Intel moves on Blockchain, Self-Driving Cars
Intel 與 Microsoft 合作投入新興市場開發自動載具與區塊鏈, 此區塊鏈平台稱為 Coco Framwork. 此平台整合了 Intel 的 Software Guard Extensions (SGX) 來強化在區塊鏈隱私和資料安全性. Coco Framework 應用了 SGX 所提供的 TEE 來保護執行中的資料與程式安全性.
IoT and TEE
經由醫療感測裝置所蒐集到的用戶資料, 一般若有需要會傳送到個人照護醫生手上進行了解. 然而有時候可能會因應其他機構或是裝置需求, 傳送至他處進行資料分析. 透過 TEE 所提供的隔離環境進行資料傳送可確保其正確性.

moxiebox 打造出隔離執行 (sandbox) 的運作環境該如何與外界溝通？列出對應的程式碼並解讀

輸出:函式 setreturn 會將 pointer to return buffer 放在 sregs[6]、length of return buffer 放在 sregs[7] 的位置，利用 sandbox.cc　中 gatherOutput 函式將 return buffer 的資料輸出到指定的檔案

sandboxrt.h

extern void setreturn(void *addr, size_t length);

setreturn.ｓ

/*
 * Input:
 * $r0	-- pointer to return buffer
 * $r1	-- length of return buffer
 *
 * Output:
 * none
 */

	.globl	setreturn
	.type	setreturn,@function
	.text
setreturn:
	ssr	$r0, 6
	ssr	$r1, 7
	ret

.Lend:
	.size	setreturn,.Lend-setreturn

輸入 : loadRawData 函式配置記憶體

ELF 執行檔格式包含哪些 section 呢？又在哪裡可見到詳細描述？

先看了 wikipedia 介紹的 (ELF)Executable and Linkable Format
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
但因為中間有部份 section 被略過，而在課本深入理解計算機系統中有提到
典型的 ELF 可重定位目標文件格式，參考 7.4 節
- 重定位（relocation）：當 compiler 和 assembler 產生了object code，linker 會把 symbol 相對應的 memory link 起來，進而 relocation 這些 section，而 linker 還會使用 assembler 所產生的 relocation entry 去使用 relocation

	| ELF header         | ->
	| .text              | -> 放置程式碼
 	| .rodata            | -> 只讀數據 ，ro 是 rom 的意思
	| .data              | -> 放置初始化的 globle 和 static 變數
	| .bss               | -> 放置未初始化或令為 0 的 globle 和 static 變數
	| .symtab            | -> 放置定義和引用的 function 還有 global 變數的信息
	| .rel.text          | -> 程式段的重定位資訊
	| .rel.data          | -> 資料段的重定位資訊
	| .debug             | -> 除錯資訊
	| .line              | -> 原本行號和 .text 的對應，加入 -g 後才會有這 table
	| .strtab            | -> 字串表
	|section header table| ->

data 劃分為初始化與未初始化的原因：
- 為了空間效率，未初始化的數值不需佔據實際 disk 空間，只需要在 run 時都給0
利用 objdump -x 執行檔 察看 section 的類型
- 這邊我選 moxiebox/tests/rtlib 當輸入 objdump -x rtlib
顯示格式： file format elf32-little
參考資料

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000512  00001000  00001000  00000074  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .init         0000000e  00001512  00001512  00000586  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .fini         00000008  00001520  00001520  00000594  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  3 .rodata       00000018  00001528  00001528  0000059c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .data         00000430  00001640  00001640  000005b4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  5 .eh_frame     00000004  00001a70  00001a70  000009e4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  6 .ctors        00000008  00001a74  00001a74  000009e8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  7 .dtors        00000008  00001a7c  00001a7c  000009f0  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  8 .bss          00000024  00001a84  00001a84  000009f8  2**2
                  ALLOC
  9 .comment      0000003b  00000000  00000000  000009f8  2**0
                  CONTENTS, READONLY
 10 .debug_aranges 00000150  00000000  00000000  00000a33  2**0
                  CONTENTS, READONLY, DEBUGGING
 11 .debug_info   00003538  00000000  00000000  00000b83  2**0
                  CONTENTS, READONLY, DEBUGGING
 12 .debug_abbrev 00000ead  00000000  00000000  000040bb  2**0
                  CONTENTS, READONLY, DEBUGGING
 13 .debug_line   00000fb1  00000000  00000000  00004f68  2**0
                  CONTENTS, READONLY, DEBUGGING
 14 .debug_frame  000001f8  00000000  00000000  00005f1c  2**2
                  CONTENTS, READONLY, DEBUGGING
 15 .debug_str    00000a61  00000000  00000000  00006114  2**0
                  CONTENTS, READONLY, DEBUGGING
 16 .debug_loc    000005eb  00000000  00000000  00006b75  2**0
                  CONTENTS, READONLY, DEBUGGING
 17 .debug_ranges 00000098  00000000  00000000  00007160  2**0
                  CONTENTS, READONLY, DEBUGGING

參考資料
先解釋 VMA 和 LMA
- LMA （Load Memory Address）： the address at which the section will be loaded.
  - 例如把指令從 Disk 搬到 Memory，就是利用這個 Load Memory Address ，我們不用 CPU 讀取 Disk 上的指令，而是將他 copy 到 Memory 再讀取，是因為讀取 Disk 的時間過久
- VMA（Virtual Memory Address）： the address the section will have when the output file is run
  - Virtual Address 和 Physical Address 透過 MMU 轉換
    Image Not Showing Possible Reasons
    The image file may be corrupted
    The server hosting the image is unavailable
    The image path is incorrect
    The image format is not supported
    Learn More →
在 stackoverflow 中所提到的 Algn 2**n 意思：
- Algn :
  $2^{n}$
  - n : Values 0 and 1 mean the section has no alignment constraints
- File off : 參考 object.c , 會印出格式為 unsigned long 的 section->filepos
- 所以如果拿 .text 和 .init 來舉例的話， 1000+512 =1512 因為 .text size 512 ，.text 的 file 結束再 74 ，所以 .init 的 file 結束在 74+512=586

Idx Name          Size      VMA       LMA       File off  Algn
 0 .text         00000512  00001000  00001000  00000074  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 1 .init         0000000e  00001512  00001512  00000586  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE

是否已掌握 GNU gprof 的使用？運作原理為何？

參考使用 Gnu gprof 进行 Linux 平台下的程序分析

GNU gprof 的使用

先利用 gcc -pg test.c 做出一份 a.out 執行檔
test.c 內容與聯結中相同，主要是嘗試各種方法的使用


















#include "stdio.h"
#include "stdlib.h"
void a(){
  printf("\t\t+---call a() function\n");
}
void c(){
  printf("\t\t+---call c() function\n");
}
int b(){
printf("\t+--- call b() function\n");
a();
c();
return 0;
}
int main(){
  printf(" main() function()\n");
  b();
}

運行方式是 main() call b() ，然後 b（）再 call a（）和 c（），而每一個 function 在一開始會印出一行字（表示呼叫到自己）
執行 ./a.out 會排列出函式呼叫的順序，產生 gmon.out 的檔案

 main() function()
	+--- call b() function
		+---call a() function
		+---call c() function

使用 gprof -b a.out gmon.out 會產生

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00        1     0.00     0.00  a
  0.00      0.00     0.00        1     0.00     0.00  b
  0.00      0.00     0.00        1     0.00     0.00  c


			Call graph


granularity: each sample hit covers 2 byte(s) no time propagated

index % time    self  children    called     name
                0.00    0.00       1/1           b [2]
[1]      0.0    0.00    0.00       1         a [1]
-----------------------------------------------
                0.00    0.00       1/1           main [9]
[2]      0.0    0.00    0.00       1         b [2]
                0.00    0.00       1/1           a [1]
                0.00    0.00       1/1           c [3]
-----------------------------------------------
                0.00    0.00       1/1           b [2]
[3]      0.0    0.00    0.00       1         c [3]
-----------------------------------------------


Index by function name

   [1] a                       [2] b                       [3] c

會發現調用圖寫出了 main 呼叫 b ，而 b 呼叫 a 和 c
- 文中說明所花時間很少是因為函式都只有做 printf（）
  - 如果使用 -p 是只輸出函式調用圖，而上頭的 -b 是不會輸出標題的描述，-q 是只輸出時間消耗
搭配 Cflow ，使用 cflow test.c 生成出來的 gmon.out
- 執行 gprof -b cflow gmon.out 會將函式使用的情況印出

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00      122     0.00     0.00  auto_processor
  0.00      0.00     0.00      122     0.00     0.00  delete_parm_processor
  0.00      0.00     0.00      121     0.00     0.00  move_parm_processor
  0.00      0.00     0.00      114     0.00     0.00  _option_is_end
  0.00      0.00     0.00      110     0.00     0.00  hash_string
  0.00      0.00     0.00      110     0.00     0.00  hash_symbol_hasher
...
...
                        Call graph


granularity: each sample hit covers 2 byte(s) no time propagated

index % time    self  children    called     name
                0.00    0.00     122/122         hash_do_for_each [31]
[1]      0.0    0.00    0.00     122         auto_processor [1]
-----------------------------------------------
                0.00    0.00     122/122         hash_do_for_each [31]
[2]      0.0    0.00    0.00     122         delete_parm_processor [2]
-----------------------------------------------
                0.00    0.00     121/121         hash_do_for_each [31]
[3]      0.0    0.00    0.00     121         move_parm_processor [3]
...
...

Graphviz 工具
- 文中提到 Graphviz 有 Dot 語言直連圖的能力，所以我們使用他將程式視覺化
- 搭配 gprof2dot 套件
將 test.c 畫成圖
GNU gprof 的原理，參考 GNU gprof
分析步驟：
1. You must compile and link your program with profiling enabled. See Compiling a Program for Profiling.
  - The -pg option also works with a command that both compiles and links
  - -pg 是 compliation 和 link 的 option，如果沒使用就沒有 call-graph data
2. You must execute your program to generate a profile data file.
  - 一但程序被編譯成分析用，他還是會像原本那樣輸出，只是會多花時間蒐集和寫入資料
  - 程序結束時會寫入 gmon.out file
3. You must run gprof to analyze the profile data.
通常在 compile 時加入 -pg ，就代表著每一個 function 會去 call mcount (or _mcount, or __mcount, depending on the OS and compiler) 做為第一個操作之一
包含 profiling library 的 mcount routine，通常利用檢查 stack frame 去找 child 的地址和返回 original parent 的地址
- mcount 是一個典型的簡短的組語 stub routine ，利用兩個參數 frompc 和 selfpc 呼叫 __mcount_internal ， __mcount_internal 是負責維護 in-memory call graph
  記錄 frompc （從哪個 program counter 來） , selfpc （自己的 program counter）, 和調用次數。

SGX 介紹與特性 (Intel SGX)

SGX 之特性: 參考
- 使應用開發人員能夠保護其機密資料與程式碼在 SGX 所創建的 enclave (可理解為 TEE) 運行, 不受其他具有更高特權軟體或是惡意程式進行修改
- 能使敏感性高的程式和數據之完整性與機密性受到保護, 因為正常運作的系統軟體會對系統進行資源管理和控制
- 即使攻擊者以掌握系統控制權並且直接攻擊記憶體內部資料情況下, 透過應用程序來定義程式和數據安全區域進而受到保護
- SGX enclave 啟動與銷毀
- SGX enclave 可信任通訊通道
SGX 之缺陷與改善:
- 研究人員打造可攻陷英特爾SGX硬體安全保護的惡意程式
- SGXKernel: A Library Operating System Optimized for Intel SGX
SGX 之應用: 參考
- 透明性驗證
- 防止洩漏隱私資訊的頁錯誤

Cross Compiler 參考

當使用 native compiler 來編譯某程式時, 此程式只能執行於同編譯器的運行平台.
然而 cross complier 編譯的是預執行於目標平台上的程式. 舉例來說, 有一 cross compiler 雖運行於 windows 7 pc 上卻可編譯程式而運行於 Android smartphoe. Cross compiler 必須能夠在多重平台上編譯.

倘若以 GCC 為例來解釋, 則有三個概念須先了解:
Build
編譯 GCC 的平台
Host
運行 GCC 的平台
Target
GCC 編譯產生的程式所運行的平台

build = host = target 為 native compiler，例如 Ubuntu 的 GCC 就是native compiler
build = host，但 target 與前兩者不同，就是cross compiler

Libffi 專案功用參考

Calling Covention (呼叫慣例)
因為函式呼叫牽涉到參數的傳遞, 所以並不只是單純跳到那個 Address 執行程式碼再跳回來這麼簡單, 呼叫副程式（函式）的主程式, 需要知道怎麼填參數, 副程式（函式）才能接到參數後進行處理, 再將結果傳給主程式. 這段協定, 稱之為Calling Convention(呼叫慣例)
Libffi Package
有些程式在編譯期間可能不知道有什麼參數傳遞到函式. 舉例來說, 直譯器 (Interpreter) 可能在執行期間被告知關於呼叫函數上的參數的型態與數目, 而 Libffi 可以扮演這些函式從直譯器到編譯完成之程式碼的橋樑
Libffi 函式
在使用 Libffi 之前需要先創造出 ffi_cif 物件, 方便之後的多重呼叫. 在 ffi_cif 中的 cif 扮演著 call interface, ffi_prep_cif 是進一步為了 call interface object 所使用, 如下範例程式碼所示

































#include <stdio.h>
#include <ffi.h>

int main()
{
  ffi_cif cif;
  ffi_type *args[1];
  void *values[1];
  char *s;
  ffi_arg rc;
  
  /* Initialize the argument info vectors */    
  args[0] = &ffi_type_pointer;
  values[0] = &s;
  
  /* Initialize the cif */
  if (ffi_prep_cif(&cif, FFI_DEFAULT_ABI, 1, 
		       &ffi_type_sint, args) == FFI_OK)
    {
      s = "Hello World!";
      ffi_call(&cif, puts, &rc, values);
      /* rc now holds the result of the call to puts */
      
      /* values holds a pointer to the function's arg, so to 
         call puts() again all we need to do is change the 
         value of s */
      s = "This is cool!";
      ffi_call(&cif, puts, &rc, values);
    }
  
  return 0;
}

BOLOS 運作原理參考

BOLOS 全名為 Blockchain Open Ledger Operating System, 是為一個嵌入式安全作業系統, 可執行於 Secure Elements, Hardware Security Modules 或 CPU enclave (TEE, Intel SGX)
在其系統架構中, 其應用程式運行在虛擬裝置上, 可在所有硬體上重新配置, 為其一大優點
下圖為透過應用程式來俯視整體系統運作, 在 Appliation 周圍的方塊可被視為 coprocessor, 皆受於 Application 的直接指令之下. 有些方塊除了接收指令外, 也能夠觸發事件產生, 舉例來說, 透過使用者按下按鈕後, application 下指令給 I/O peripheral 進行運作
BOLOS 被定義成兩層的 Delegation Model. 首先當應用程式執行時, BOLOS 便只提供基礎的服務給他們. 其後執行終了, BOLOS 不會有任何指令運作也就不會更動到 I/O 操作. 下圖為 USB delegation 流程
下圖 BOLOS 架構細節:
- BOLOS 可分成兩個晶片; 一個為 secure, 另一具有 JTAG 作為 proxy
- 在 secure 上劃分成兩部分; 一部分可與軟硬體互動且在 NDA 下, 另一部分可讓應用程式載入所用
- 透過 proxy 晶片作為數據的 Inputs/outputs 來連接 secure element, 一來消弭其安全性隱憂外, 更善用 simple asynchronous protocol 讓 secure element 與 proxy 合作執行
moxiebox execution environment 所對應 BOLOS 運作原理為:
- Phase 1 發生在 Secure Element 當中, 並將 ELF 資料從 BOLOS Loader 匯入
- Phase 2 一樣發生在 Secure Element 中, 並將匯入的資料與程式在 BOLOS kernel 中執行. 本身是 single thread 且 single pipeline
- Phase 3 發生在 proxy 部份, 將運算結果存放在此

Model Checking

自動機理論 (Automata theory)

自動機理論的主要目標是, 建立能夠對具有離散狀態的系統進行其動態行為的描述與分析, 在此系統中其訊號能夠被週期性地取樣.

這一類的機器具有以下三個特徵:

輸入: 假設從輸入訊號的有限集合選取一組符號序列
輸出: 從一輸出的有限集合中選取一組符號序列
狀態: 基於自動機類型下所定義的一有限集合

主動機有以下四種類型:

Finite-state machine
Pushdown automata
Linear-bounded automata
Turing machine

有限狀態主動機介紹:

Model Checking in Practice

Case Study

A Model Checking-based Method for Verifying Web Application Design
IOT: Formal Security Analysis of Smart Embedded
Systems

Practical Application

Stochastic Model Checking

a probabilistic security protocol
dynamic power management
a biological pathway

cbmc

Bounded Model Checker for C and C++

參數，詳細可以使用 cbmc –help 查看
–bounds-check　
–pointer-check
–show-vcc :verification conditions
–unwind :將迴圈展開的數量上限，詳細參考
–unwinding-assertions
–trace

cbmc –unwind 20 binary_search.c

** Results:
[test_bsearch.assertion.1] assertion linear_search_result == binary_search_result: SUCCESS
[test_sort.assertion.1] assertion is_sorted(x, size): SUCCESS


** 0 of 2 failed (1 iteration)
VERIFICATION SUCCESSFUL

cbmc fft.c

[main.assertion.1] assertion !ok: FAILURE


** 1 of 1 failed (1 iteration)
VERIFICATION FAILED

利用 –trace 顯示詳細資料
cbmc –trace fft.c

State 1620 file fft.c line 175 function main thread 0
----------------------------------------------------
  k=0u (00000000000000000000000000000000)

State 1621 file fft.c line 175 function main thread 0
----------------------------------------------------
  k=0u (00000000000000000000000000000000)

State 1624 file fft.c line 175 function main thread 0
----------------------------------------------------
  k=1u (00000000000000000000000000000001)

State 1628 file fft.c line 175 function main thread 0
----------------------------------------------------
  k=2u (00000000000000000000000000000010)

State 1632 file fft.c line 175 function main thread 0
----------------------------------------------------
  k=3u (00000000000000000000000000000011)

State 1636 file fft.c line 179 function main thread 0
----------------------------------------------------
  expected_value=8u (00000000000000000000000000001000)

State 1637 file fft.c line 175 function main thread 0
----------------------------------------------------
  k=4u (00000000000000000000000000000100)

State 1641 file fft.c line 171 function main thread 0
----------------------------------------------------
  j=4u (00000000000000000000000000000100)

State 1644 file fft.c line 169 function main thread 0
----------------------------------------------------
  i=4u (00000000000000000000000000000100)

Violated property:
  file fft.c line 190 function main
  assertion !ok
  !(ok != FALSE)

可以發現錯誤發生在 i,j,k=4 時

第 6 週作業回顧

是否已知曉搭配 computed goto 實作更高效的 opcode dispatch 呢？舉例說明並且附上效能分析

full-stack-hello 利用 label as values 的方式實做 computed goto
修改 full-stack-hello 的 vm_run 部份，改為使用 switch dispatch，比較兩個實做的效能差異
修改後的 code

switch dispatch

 Performance counter stats for 'sudo chrt -f 99 taskset -c 0 ./as_exec tests/fib-iterative.s --input 46' (100 runs):

                74      cache-misses              #   14.327 % of all cache refs      ( +-  5.00% )
               516      cache-references                                              ( +-  0.71% )
             1,675      instructions              #    0.45  insn per cycle
             3,682      cycles                                                        ( +-  1.40% )

       0.005200966 seconds time elapsed                                          ( +-  7.80% )

[ ]computed goto

 Performance counter stats for 'sudo chrt -f 99 taskset -c 0 ./as_exec tests/fib-iterative.s --input 46' (100 runs):

                78      cache-misses              #   15.177 % of all cache refs      ( +-  3.95% )
               513      cache-references                                              ( +-  0.77% )
             1,675      instructions              #    0.47  insn per cycle
             3,591      cycles                                                        ( +-  2.02% )

       0.005170761 seconds time elapsed                                          ( +-  3.95% )

執行時間差距不明顯，computed goto 比　switch dispatch 快 0.5%

Fibonacci sequence 在真實世界中究竟有何應用？

fibonacci search

參考資料
與二元搜尋法類似，但改用費氏數列當作搜尋間隔
目標 : 對一有 n 個元素且排序好的數列 a[n]，找到 a[i]==key
方法 : 找到最接近的費波那契數
$f i b_{k}$ 且
$f i b_{k} + m = n, m \geq 0$
若
$k e y < a [f i b [k - 1]]$ ，則第一個搜尋位置為 fib[k-1] (即搜尋範圍為[0, fib[k]])
若
$k e y > a [x]$ 則第一個搜尋位置為 fib[k-1]+m (即搜尋範圍為[m, fib[k]+m])

int fibonacciSearch(int number[], int length, int key) { 

    int k  = findK(Fib, length + 1); 
    int m = length - Fib[k]; 
    int x = k - 1; 
                                                         
    int i = x; 
    if(number[i] < key) 
        i += m; 

    int result = -1;
    while(Fib[x] > 0) { 
        if(number[i] < key) 
            i += Fib[--x]; 
        else if(number[i] > key) 
            i -= Fib[--x]; 
        else {
            result = i;
            break;
        }
    } 
    return result; 
}

時間複雜度:O(lg(n))

何謂 tail recursion 呢？舉出具體而微的案例並用 perf 解釋細部差異 (特別是 cycle count 的落差)

tail recursion :
- A recursive function is tail recursive when the recursive call is the last thing executed by the function.
我們拿 Fibonacci 的數列來做比較
- Fibonacci tail recursion
```
// A tail recursive function to
// calculate n th fibnacci number
int fib(int n, int a = 0, int b = 1)
{
	if (n == 0)
		return a;
	if (n == 1)
		return b;
	return fib(n-1, b, a+b);
}
```
- Fibonacci recursion
```
int fib(int x) {
	if (x == 0)
		return 0;
	if (x == 1)
		return 1;
	return fib(x-1)+fib(x-2);
}
```
- 如果用 tail recursive 以 3 來說就像是：
  - 因為 recursive call 是最後被執行的事，所以可以這樣 return 回去
```
	fib (3,0,1) -> fib(2,1,1) -> fib(1,1,2)
	 return 2 <---  return 2 <--- return 2	
```
- 如果用 recursive 以 3 來說就像是：
```
    fib(3)------> fib(2)------>fib(0)
    |             | return 2   return 1     
    ----->fib(1)  ------>fib(1)
	    return 1       return 1
```
- 這樣比較下來發現，因為 recursive call 因為已經先把一些能計算的值放在參數中便於使用（如迴圈的複雜度），自然會比 recursive （如樹狀複雜度）用到時才呼叫快
- 另一個好處是 STACK 的使用，參考
  - 在編譯器判斷出此程式是 tail recursive （recursive call is the last thing executed by the function）此時的 STACK 就可被重複覆蓋，像上方從尾端 return 回去的 2 都是可以重複使用同一塊空間的
舉出具體而微的案例並用 perf 解釋細部差異 (特別是 cycle count 的落差)
- 程式碼的部份我放在我的上一份作業 simulator
- 這邊先用 –input 10 去做跑 100 次的比較，發現 cycle 數的減少:

Performance counter stats for 'sudo chrt -f 99 taskset -c 0 ./as_exec -x Fib' (100 runs):

              1622      instructions              #    0.33  insn per cycle                                            
              4914      cycles                    ( +-  1.46% )
       0.008097402 seconds time elapsed           ( +-  1.50% )

Performance counter stats for 'sudo chrt -f 99 taskset -c 0 ./as_exec -x Fib' (100 runs):

              1633      instructions              #    0.33  insn per cycle                              ( +-  0.69% )
              4802      cycles                    ( +-  1.61% )
       0.007691896 seconds time elapsed           ( +-  1.39% )

以 input 為 10 ， cycle 數量的確猶如上方分析的 tail_recursion 是花比較少時間而且 cycle 數也較少
- 速度變快（0.008097402-0.007691896） / 0.008097402 = 0.05007853136 相當於變快 5%
- cycle 數（4914-4802） / 4914 = 0.02279202279 ，cycle 數下降 2.3%
做出不同 input 數值比較 (這裡列出 1~30 間隔為2，例如：1,3,5,7…)
- 所花費時間上：單位 second
- 有符合時間複雜度 recursion:
  $O (2^{N})$ 和線性的tail_recursion
  $O (N)$
但在 cycle 數方面
- 呈現不規則鋸齒狀:但還是可以看的出來在 number=19 後 tail 所花的 cycle 數都相對比 recursive 較少
尋找鋸齒狀原因：目前猜測與 cache 有關，因為用 perf 觀察 cache-misses 發現 cycle 數的趨勢和 cache-misses 趨勢雷同
tail recursive 的 cycle
tail recursive 的 cache-misses
如何預測 input 為多少時會有較高的 cache-misses？