<style>
H2{color:#BF0060 !important;}
H3{color:#009393 !important;}
p{color:Black !important;}
li strong {color:#4682b4 !important;}
.alert-info{
background-color:#e4edf6 ;
color: black;
}
.alert-warning{color: black;}
td{font-weight:bold;}
</style>
# Distributed File System
## **Basics**
* **Two major goals of a distributed file system**
* Network transparency: users are not aware of where a file is located
* High availability
### Naming
* **Three approaches**
1. Concatenate the host name to the file name
* Conflicts with network transparency and is not location-independent
2. Mounting
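* Mounting information can be stored at:
* Clients: each client can mount a server's directories at different places in its own namespace
* Server: when files are moved, the server is responsible for updating the mount information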
3. Have a single global directory
* Limited to a single server
### **File models**
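* **Unstructured File**
* The OS does not know the file format or fields; the application interprets the contents itself
* UNIX
* **Structured File**
* The OS knows the file structure
* Used for special purposes
* **Immutable File**
* Maintains history
* Easier to support file caching and replication
* Increases disk usage, so usually only recent history is kept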
* **Mutable File**
* An update overwrites the old content
### **File Access Model**
* Two factors: access method and data unit
* **Access method**
* Three access methods
1. ==Remote Service Model==
* User process ^read(file,0,100)^⇨ Client ^msg^⇨ Server
* Drawback: `network overhead` on every access
2. ==Data-Caching Model==
* Client maintains cache
* Even if only 100 bytes are requested, the server returns the entire file, which is stored in the cache
⇨ User process gets data immediately if cached
* `Concurrency control`: needed when two machines write to their local caches
3. ==Hybrid Method==
* Normally uses data caching; switches to the remote service model for writes
* **Data Units (unit of transfer)**
* Four possible levels
* | | pros | cons | example |
| ------ | ---- | ---- | --------|
| File | `efficiency`<br>`scalability`<br>`reliability`<br>`low disk access overhead`|Requires more storage on the client side|CFS, AFS-2
| Block |`less storage on client side`|Poor performance when the entire file is needed (many block requests)|LOCUS, Sun's NFS
| Byte |`Max flexibility`|Cache management is difficult|Cambridge File Server
| Record |`Best for structured files`||RSS
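A minimal sketch of the Data-Caching and Hybrid access models described above, assuming file-level transfer. `Server` and `HybridClient` are hypothetical in-memory stand-ins, not a real DFS API; `requests` counts messages so you can see that only the first read of a file reaches the server, while writes always do.

```python
class Server:
    """Hypothetical in-memory file server, used only for illustration."""

    def __init__(self, files):
        self.files = dict(files)  # file name -> contents (bytes)
        self.requests = 0         # number of messages the server has received

    def fetch_whole(self, name):
        self.requests += 1
        return self.files[name]

    def write(self, name, offset, data):
        self.requests += 1
        old = self.files[name]
        self.files[name] = old[:offset] + data + old[offset + len(data):]


class HybridClient:
    """Reads use the data-caching model; writes use the remote service model."""

    def __init__(self, server):
        self.server = server
        self.cache = {}  # file name -> whole-file contents

    def read(self, name, offset, length):
        if name not in self.cache:
            # Even a small request fetches (and caches) the entire file.
            self.cache[name] = self.server.fetch_whole(name)
        return self.cache[name][offset:offset + length]

    def write(self, name, offset, data):
        # Send the write to the server, then drop the now-stale cached copy.
        self.server.write(name, offset, data)
        self.cache.pop(name, None)


server = Server({"a.txt": b"hello world"})
client = HybridClient(server)
print(client.read("a.txt", 0, 5), server.requests)  # b'hello' 1
print(client.read("a.txt", 6, 5), server.requests)  # b'world' 1 (served from cache)
client.write("a.txt", 0, b"HELLO")
print(client.read("a.txt", 0, 5), server.requests)  # b'HELLO' 3
```

Invalidating on write keeps the sketch simple; a real client might instead update its cached copy in place.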
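### **Semantics of File Sharing**
* **Problem**
* In a distributed system with caching clients:
* A client writes to its cache; before the server is updated, another client reads the file and gets the old version
* **Solutions**
* Four ways of dealing with shared files in a distributed system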
* |Method|Remark|
|------|------|
|1. Unix Semantics|Every operation on a file is instantly visible to all processes, which makes it difficult to implement|
|2. Session Semantics|No changes are visible to others until the file is closed|
|3. Immutable Files|No updates are possible; simplifies sharing and replication|
|4. Transactions|All changes occur atomically<br>(similar to holding a lock: no other operation can interleave)|
## **Caching**
### **Cache Location**
* Three places, each eliminating a different kind of access latency
* Caching in main memory always has a reliability problem: data not yet written to disk is lost on a crash
1. ==Server's Main Mem==
* Eliminates <span style="background:#ADD8E6">disk access</span> (of the server)
* Supports Unix file-sharing semantics
* Problems: scalability (as more clients are added) and reliability (if the server crashes)
2. ==Client's Disk==
* Eliminates the server's <span style="background:#ADD8E6">disk access and network access</span>
* Useful for the file-level transfer model
3. ==Client's Mem==
* Eliminates <span style="background:#ADD8E6">disk access and network access</span>
* Best performance
* Not suitable for the file-level transfer model (memory may not be large enough to hold an entire file)
### **File cache vs. Memory cache(in CPU)**
* ||File cache|Mem cache|
|-|-|-|
|Size|Up to a full file|L1 to L4 caches: 4 KB to 128 KB|
|Delay|Communication + storage access|Local bus + memory access|
### **Writing Policy**
* **Write Through Policy**
* Immediately sends each write to the server
* Unix-like semantics
* **Delayed Writing Policy**
* Modified entries are marked; all updated entries are later gathered and sent to the server together, which improves performance but hurts reliability.
* Write on ejection from the cache
* Write on close (session semantics)
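A sketch contrasting the write-through policy with delayed writing (write on close) from the list above. `Server` and `CachedFile` are hypothetical classes; the only point is how many update messages reach the server when the same block is overwritten several times.

```python
class Server:
    """Hypothetical server that counts the update messages it receives."""

    def __init__(self):
        self.blocks = {}
        self.messages = 0

    def store(self, block_id, value):
        self.messages += 1
        self.blocks[block_id] = value


class CachedFile:
    """Client-side cache for one open file with a configurable writing policy."""

    def __init__(self, server, policy):
        if policy not in ("write-through", "write-on-close"):
            raise ValueError(f"unknown policy: {policy}")
        self.server = server
        self.policy = policy
        self.cache = {}     # block id -> cached value
        self.dirty = set()  # blocks modified but not yet sent (delayed writing)

    def write(self, block_id, value):
        self.cache[block_id] = value
        if self.policy == "write-through":
            self.server.store(block_id, value)  # send to the server immediately
        else:
            self.dirty.add(block_id)            # just mark the entry as modified

    def close(self):
        # Delayed writing: gather all modified entries and send them now.
        for block_id in self.dirty:
            self.server.store(block_id, self.cache[block_id])
        self.dirty.clear()


for policy in ("write-through", "write-on-close"):
    server = Server()
    f = CachedFile(server, policy)
    for version in range(3):
        f.write(0, version)  # overwrite block 0 three times
    before_close = server.messages
    f.close()
    print(policy, before_close, server.messages)
# write-through 3 3
# write-on-close 0 1
```

With write-on-close, a client crash before `close()` loses everything in `dirty`: that is the reliability cost traded for fewer messages.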
### **Cache Consistency (CC)**
* **Terms**
* Server-initiated CC
* The server detects inconsistencies and informs the client's cache manager
* Client-initiated CC
* The client actively validates its cached data with the server
* **Scenario 1: Concurrent-Write Sharing**
* Multiple readers and at least one writer
Solutions:
* Locking
* Disallow file caching while someone is writing
* Before a write, the server notifies all clients to purge their cached copies
* **Scenario 2: Sequential-Write Sharing**
* A opens a file that B modified recently, so A has outdated blocks.
Solution:
* Associate files with timestamps so the server can detect the inconsistency
* B's modified data is still waiting to be flushed to the server (delayed-writing policy)
Solution:
* Whenever a new client opens the file for reading, all other clients must flush their modified cached data
## **Replication**
### Multicopy update protocol
* **Quorum-based protocol**
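The quorum-based protocol is only named here. As a sketch, assuming the usual voting rule with one vote per replica: with N replicas, a read quorum r and a write quorum w must satisfy r + w > N, so every read set overlaps the most recent write set, and 2w > N, so two writes cannot both succeed on disjoint sets of replicas. `valid_quorum` is a hypothetical helper, not part of any library.

```python
def valid_quorum(n, r, w):
    """Check whether read quorum r and write quorum w are safe for n replicas."""
    if not (1 <= r <= n and 1 <= w <= n):
        return False
    reads_see_latest_write = r + w > n  # every read quorum meets every write quorum
    writes_are_ordered = 2 * w > n      # any two write quorums share a replica
    return reads_see_latest_write and writes_are_ordered


for n, r, w in [(5, 3, 3), (5, 1, 5), (5, 2, 3), (4, 3, 2)]:
    print(n, r, w, valid_quorum(n, r, w))
# 5 3 3 True
# 5 1 5 True
# 5 2 3 False
# 4 3 2 False
```

The second case is read-one, write-all (r = 1, w = N); the third fails because a 2-replica read can miss a 3-replica write when r + w = N; the fourth fails because two writes could land on disjoint halves.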
### **Fault Tolerance**
* **Failure**
* Server/client crash ⇨ loss of state information
* Transient faults, e.g., a voltage drop in the power supply
* **Stateful File Server**
* Requires crash recovery
* **Stateless File Server**
## **CODA File System**
* **Overview**
* Centrally administered by the Vice File System
* Client: Virtue, which has a VFS layer
* Cache manager: Venus, which uses an RPC stub to communicate with the server
* **Communication**
* RPC2: an enhanced RPC system
* Periodically informs the caller that the server is still alive
* Supports side effects, e.g., printf, video playback
* Supports multicast: sending invalidation messages in parallel
* **Sharing Files**
* Uses transactions as the unit of sharing
* A sequence of reads and writes is treated as a session
* Updates are sent back only when the file is closed ⇨ Unix semantics cannot be achieved
### **Transactional Semantics**
* **Serializable**
* Operations are serializable because sessions can be ordered
* **Supports disconnection** (allows network partitions)
* Venus (cache manager) knows the necessary locks at the start of a session
* Conflicts across partitions
:::info
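* Solution 1: use 2PL
* A conflict can be resolved if the transactions are serializable; ==2PL always guarantees serializability==
* Solution 2: use version numbers
* On reconnection, updates are sent back to the server for processing
* An update is accepted if: the client's last version number + the number of updates the server successfully applied during this session = current version number + 1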
:::
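* Disconnected operation
* While connected, Venus is in the Hoarding state (filling the cache); after disconnection it enters the Emulation state
* On reconnection it enters the Reintegration state
* Cache management
* To support disconnection, Coda caches entire files
* Priority: user-specified, recency of access, hierarchical (caches the entire file path)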
### **2PL**
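* If A does not unlock until B is done, there is no consistency problem. More generally, to maintain consistency, a transaction must acquire all the locks it needs before it unlocks any data object
* The solution is therefore to implement two-phase locking (2PL):
* Growing phase: the transaction may acquire locks but may not release any
* Shrinking phase: the transaction may release locks but may not acquire any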
* **Test by serialization graph**
* Graph
* An arc from T~i~ to T~j~ means that T~j~ locks a data object after T~i~ unlocks it
* ==Topological sort== on the graph
* Steps: repeatedly remove a node whose indegree = 0
* A node with one incoming arc has indegree = 1
* The result is not unique
* If the graph has a cycle ⇨ not serializable
* No cycle ⇨ the ==topological order is a serial order== for the transactions
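The serialization-graph test above can be sketched with Kahn's topological sort: repeatedly remove a node with indegree 0; if some nodes can never be removed, the graph has a cycle and the schedule is not serializable. `serial_order` is a hypothetical helper, and the arcs are given directly rather than derived from a real lock log.

```python
from collections import deque


def serial_order(transactions, arcs):
    """Return a serial order for the transactions, or None if the graph has a cycle.

    arcs: (ti, tj) pairs, meaning tj locked a data object after ti unlocked it.
    """
    indegree = {t: 0 for t in transactions}
    successors = {t: [] for t in transactions}
    for ti, tj in arcs:
        successors[ti].append(tj)
        indegree[tj] += 1

    ready = deque(t for t in transactions if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()  # remove a node whose indegree is 0
        order.append(t)
        for nxt in successors[t]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Nodes that were never removed lie on or behind a cycle.
    return order if len(order) == len(transactions) else None


print(serial_order(["T1", "T2", "T3"], [("T1", "T3"), ("T2", "T3")]))  # ['T1', 'T2', 'T3']
print(serial_order(["T1", "T2"], [("T1", "T2"), ("T2", "T1")]))        # None
```

In the first example both T1 and T2 start with indegree 0, so ['T2', 'T1', 'T3'] would be an equally valid serial order; the result is not unique.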
### **SS2PL**
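* A failure case of the 2PL scheme
* If T1 releases A early, T2 can read A
* T1 later rolls back, so T2 has read an invalid value and must roll back too
* This leads to cascading rollbacks
* Hence strong strict 2PL (SS2PL) was proposed
* A transaction releases all of its locks at once, only after it commits or aborts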
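A sketch of the 2PL rule (no new locks after the first release) and the SS2PL rule (no release before commit or abort), using one hypothetical `Transaction` class with a `strict` flag. Locks are exclusive only, and a conflict raises `LockError` instead of making the caller wait, which a real lock manager would do.

```python
class LockError(Exception):
    """Raised when a lock request violates the protocol or conflicts."""


class Transaction:
    def __init__(self, name, locks, strict=False):
        self.name = name
        self.locks = locks      # shared lock table: item -> owning transaction name
        self.strict = strict    # True: strong strict 2PL (SS2PL)
        self.shrinking = False  # set at the first release (2PL shrinking phase)
        self.finished = False   # set by end() (commit or abort)

    def lock(self, item):
        if self.shrinking:
            raise LockError(f"{self.name}: no new locks in the shrinking phase")
        owner = self.locks.setdefault(item, self.name)
        if owner != self.name:
            # A real lock manager would block here instead of failing.
            raise LockError(f"{self.name}: {item!r} is held by {owner}")

    def unlock(self, item):
        if self.strict and not self.finished:
            raise LockError(f"{self.name}: SS2PL releases locks only at commit/abort")
        if self.locks.get(item) != self.name:
            raise LockError(f"{self.name} does not hold {item!r}")
        self.shrinking = True   # the growing phase ends at the first release
        del self.locks[item]

    def end(self):
        """Commit or abort: release every remaining lock at once."""
        self.finished = True
        for item in [i for i, owner in self.locks.items() if owner == self.name]:
            self.unlock(item)


locks = {}
t1 = Transaction("T1", locks)               # plain 2PL
t1.lock("A")
t1.lock("B")                                # growing phase
t1.unlock("A")                              # shrinking phase begins
try:
    t1.lock("C")
except LockError as err:
    print(err)                              # T1: no new locks in the shrinking phase
t2 = Transaction("T2", locks, strict=True)  # SS2PL
t2.lock("A")                                # T2 sees A before T1 has committed
t1.end()
t2.end()
print(locks)                                # {}
```

The early `t1.unlock("A")` is legal under plain 2PL, yet it is exactly what allows the cascading rollback described above; with `strict=True` the same early unlock would raise.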
### **Replica Control**
* **情境**
* When using a read-one, write-all policy (read from any server, but write to all of them)
* Conflicts across partitions
:::info
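* Solution: resolve manually, but conflicts can be detected with version vectors
e.g., with three replicas, one side has [2,2,1] and the other has [1,1,2]; neither vector dominates the other, so a conflict is detected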
:::
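A sketch of the version-vector check for the [2,2,1] vs [1,1,2] example: one vector dominates another if every entry is greater than or equal to the corresponding entry; if neither dominates, the replicas were updated concurrently in different partitions. `compare` is a hypothetical helper.

```python
def compare(v1, v2):
    """Compare two version vectors of equal length."""
    if len(v1) != len(v2):
        raise ValueError("version vectors must have the same length")
    ge = all(a >= b for a, b in zip(v1, v2))
    le = all(a <= b for a, b in zip(v1, v2))
    if ge and le:
        return "equal"
    if ge:
        return "v1 newer"  # v1 dominates: the other replica can simply be updated
    if le:
        return "v2 newer"
    return "conflict"      # concurrent updates in different partitions


print(compare([2, 2, 1], [1, 1, 2]))  # conflict
print(compare([2, 2, 1], [1, 1, 1]))  # v1 newer
```

Detection is automatic, but the sketch deliberately stops at reporting the conflict, matching the note that resolution is manual.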
### **Cache Consistency**
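* **Server-initiated**
* The server keeps track of every client that caches a file; when a client updates the file, a callback is triggered
* Upon modification, the server is responsible for notifying these clients by sending ==invalidate== messages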
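A sketch of server-initiated consistency with callbacks: the server remembers which clients cached each file and sends an invalidation to all of them, except the writer, when an update arrives. `CallbackServer` and `Client` are hypothetical classes, not Venus or Vice code.

```python
class CallbackServer:
    def __init__(self):
        self.files = {}
        self.callbacks = {}  # file name -> set of clients holding a cached copy

    def fetch(self, client, name):
        self.callbacks.setdefault(name, set()).add(client)  # remember the cacher
        return self.files.get(name)

    def store(self, writer, name, data):
        self.files[name] = data
        for client in self.callbacks.get(name, set()) - {writer}:
            client.invalidate(name)  # "break" the callback
        self.callbacks[name] = {writer}


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def open(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def close_after_write(self, name, data):
        self.cache[name] = data
        self.server.store(self, name, data)  # update sent back on close

    def invalidate(self, name):
        self.cache.pop(name, None)


server = CallbackServer()
server.files["f"] = "v1"
a, b = Client(server), Client(server)
print(a.open("f"), b.open("f"))  # v1 v1
a.close_after_write("f", "v2")
print("f" in b.cache)            # False: b's copy was invalidated
print(b.open("f"))               # v2
```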
## **NFS**
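### **Architecture (Unix as an example)**
* VFS (virtual file system) in the kernel
* Transparent access to remote files
* Applications are unaware of it and simply use the usual system calls
* The VFS then decides whether the request goes to the ==NFS Client== or to the local file system
* The NFS client and server communicate via RPC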
* RPC can run over TCP or UDP, and its interface is open (publicly specified)
* A machine running NFS can act as a client and as a server, and NFS is
* OS-independent
* The client and server can run different operating systems
* **NFS's file identifier is the File Handle**
* The file handle is part of the communication standard, but different OSes implement it differently
* VFS 負責轉換 file handle to local file identifier
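* The NFS client first mounts ⇨ the server returns a file handle
(the entry point of the file system)
* When the NFS client issues lookup, create, and mkdir operations, the server replies with a file handle
* When the NFS client operates on a file, it passes the file handle as an argument so the server knows which file to act on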
### **Access Control in NFS**
* **Problem**
* The server is stateless, so the client must send authentication information (the user ID used for access-permission checks) with every RPC
* **Solution**
* Kerberos
* The client first verifies its identity with the Kerberos authentication server, then requests a ticket from the Kerberos ticket-granting server
### **Mount**
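* **Hard and soft mounts**
* When a user-level process accesses a file in a remote file system and the server does not respond, a hard mount keeps retrying until the server is available, while a soft mount retries only a few times and then returns an error.
* **Automounter**
* The mount point is determined dynamically; initially it points to an empty directory
* The automounter maintains a mount-point table listing NFS servers
* When the NFS client resolves a file path, it probes the servers in the table; the first one to respond is mounted at the mount point
* Advantages: [to be filled in]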
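A sketch of the hard/soft mount retry behavior and the automounter's "first server to respond" choice. `access`, `automount`, and the `request` and `response_time` callables are hypothetical; a real automounter probes over the network, whereas here each server's reply time is simply given (or `None` for no reply).

```python
import itertools


def access(request, mount_type="soft", retries=3):
    """Call request() until it succeeds.

    Hard mount: retry forever. Soft mount: give up after `retries` attempts.
    request: hypothetical callable that returns data or raises ConnectionError.
    """
    attempts = itertools.count(1) if mount_type == "hard" else range(1, retries + 1)
    for _ in attempts:
        try:
            return request()
        except ConnectionError:
            continue  # a real client would wait (with back-off) before retrying
    raise TimeoutError(f"soft mount: no reply after {retries} attempts")


def automount(mount_table, mount_point, response_time):
    """Return the server that answers first, or None if none answers."""
    replies = [(response_time(s), s) for s in mount_table.get(mount_point, [])]
    replies = [(t, s) for t, s in replies if t is not None]
    return min(replies)[1] if replies else None


def make_flaky(failures):
    """Hypothetical server that fails `failures` times before answering."""
    state = {"calls": 0}

    def request():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("server not responding")
        return "data"
    return request


print(access(make_flaky(2), "soft", retries=3))  # data
print(access(make_flaky(5), "hard"))             # data
table = {"/home": ["s1", "s2", "s3"]}
times = {"s1": None, "s2": 0.3, "s3": 0.1}
print(automount(table, "/home", times.get))      # s3
```

`access(make_flaky(5), "soft", retries=3)` would raise `TimeoutError`, which is the error a process sees on a soft mount.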
### **Pathname Translation**
* The pathname must be resolved on the client one component at a time, because it may cross several mount points
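A sketch of component-by-component pathname translation. `mounts` is a hypothetical table mapping a local directory to the NFS server whose exported tree is mounted there; after each lookup the client checks whether it has just crossed into another mount, which is why the whole path cannot be handed to a single server.

```python
def translate(path, mounts):
    """Return the (server, component) lookups a client would issue for `path`."""
    lookups = []
    current_server = "local"
    prefix = ""
    for component in path.strip("/").split("/"):
        lookups.append((current_server, component))  # one lookup per component
        prefix += "/" + component
        # Crossing a mount point switches all later lookups to that server.
        current_server = mounts.get(prefix, current_server)
    return lookups


mounts = {"/usr/share": "serverA", "/usr/share/doc": "serverB"}
print(translate("/usr/share/doc/readme", mounts))
# [('local', 'usr'), ('local', 'share'), ('serverA', 'doc'), ('serverB', 'readme')]
```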
### **Caching**
* **Policy**
* Read-Ahead
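* Prefetches pages
* Delayed Write
* A modified page is written to disk only when it is evicted
* sync operation: flushes to disk every 30 s
* A write operation needs a reply message to confirm that it has really completed
* **Safe version**
* The server writes the data to disk before replying
* **Efficient version**
* In the delayed-write scheme, data is written only to the server's memory cache. To confirm that the write really completed, the client sends a commit when it closes the file; the server writes the data to disk and replies, and only then can the client be sure.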
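A sketch of the two versions above. NFS version 3 expresses this choice with a stability argument on WRITE and a separate COMMIT procedure; the code below does not implement that protocol, it only models the idea with a hypothetical `NFSServerSketch` whose `safe` flag and `commit` method are illustrative names.

```python
class NFSServerSketch:
    """Hypothetical server with a memory cache in front of its disk."""

    def __init__(self):
        self.memory = {}  # file handle -> data buffered in server memory
        self.disk = {}    # file handle -> data on stable storage

    def write(self, handle, data, safe):
        if safe:
            self.disk[handle] = data    # safe version: on disk before replying
        else:
            self.memory[handle] = data  # efficient version: memory only
        return "ok"                     # the reply the client waits for

    def commit(self, handle):
        # Flush buffered data for this file to disk, then reply.
        if handle in self.memory:
            self.disk[handle] = self.memory.pop(handle)
        return "ok"


server = NFSServerSketch()
server.write("fh1", b"new data", safe=False)
print("fh1" in server.disk)  # False: a server crash now would lose the write
server.commit("fh1")         # sent by the client when it closes the file
print(server.disk["fh1"])    # b'new data'
```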
### **Client cache**
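* The client is responsible for polling the server
* **Validating cached blocks with timestamps**
* Each cached block has two timestamps
* T~C~: time the block was last validated
* T~M~: time the block was last modified at the server
* **Valid** if either condition holds (condition 2 is checked with the server only when condition 1 fails)
* Condition 1: T - T~C~ < t, where T is the current time, e.g., the block was validated less than 3 s ago
* Smaller t: better consistency
* Larger t: better efficiency
* Condition 2: T~M~ at the client = T~M~ at the server, i.e., the last-modification times match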
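A sketch of the two-condition freshness check, assuming (as in the usual description of NFS client caching) that condition 1 is tried first and the server is contacted for its T~M~ only when it fails; in NFS that contact would be a GETATTR call. `cache_entry_valid` and `get_tm_server` are hypothetical names.

```python
def cache_entry_valid(now, tc, t, tm_client, get_tm_server):
    """Freshness check for one cached block.

    now: current time T; tc: time of last validation (T_C); t: freshness interval;
    tm_client: T_M stored with the cached block;
    get_tm_server: callable returning the server's current T_M (costs an RPC).
    """
    if now - tc < t:
        return True                      # condition 1: validated recently
    return tm_client == get_tm_server()  # condition 2: unchanged at the server


calls = []


def server_tm():
    calls.append(1)  # each call stands for one round trip to the server
    return 5


print(cache_entry_valid(10, 8, 3, 5, server_tm), len(calls))  # True 0
print(cache_entry_valid(10, 5, 3, 5, server_tm), len(calls))  # True 1
print(cache_entry_valid(10, 5, 3, 4, server_tm), len(calls))  # False 2
```

The call counter shows the t trade-off: a larger t answers more checks from condition 1 without any server traffic, at the cost of possibly using stale data.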
### **Write Policy**
###### tags: `OS`