# Papers Reading Notes
###### tags: `Distributed System`

# Cluster-Based Scalable Network Services
> ### ACM SIGOPS Operating Systems Review Volume 31 Issue 5 Dec. 1997 pp 78–91 https://doi.org/10.1145/269005.266662

### Group:
### Members: `Distributed System`

## Outline
### Abstract
1. Identifies 3 fundamental requirements for scalable network services:
    - incremental scalability and overflow growth provisioning
    - 24x7 availability through fault masking
    - cost effectiveness
2. Proposes a general, layered architecture for building cluster-based scalable network services that encapsulates the above requirements for reuse, and a service-programming model based on TACC (transformation, aggregation, caching and customization) of Internet content.
3. Discusses two real implementations of services based on this architecture: TranSend, a Web distillation proxy deployed to the UC Berkeley dialup IP population, and HotBot, the commercial implementation of the Inktomi search engine.

### Key Concept
#### TACC
A service-programming model based on composable workers that perform transformation, aggregation, caching and customization.

#### BASE
A weaker-than-ACID data semantics that results from trading consistency for availability and relying on soft state for robustness in failure management.

#### Scalability, Availability and Cost effectiveness
3 fundamental challenges to the deployment of network services:
- By scalability, we mean that when the service load rises, hardware can be added linearly while the same level of service is maintained.
- By availability, we mean that the service stays available 24x7 even through transient hardware or software failures.
- By cost effectiveness, we mean that administration and expansion remain economical even as the number of worker nodes keeps growing.

#### 2.3 TACC: A programming model for Internet services
TACC stands for transformation, aggregation, caching and customization.
- Transformation: an operation on a single data object that changes its content, such as filtering, transcoding, re-rendering, encryption and compression.
- Aggregation: collect and collate data.
- Caching: moving data across the network costs more than recomputing it or storing it, so caching effectively reduces network load.
- Customization: personalization and per-user tailoring.

#### TranSend
A scalable transformation and caching proxy for the 25,000 Berkeley dialup IP users (connecting through a bank of 600 modems).

#### HotBot
The commercialized search engine from Inktomi, which performs millions of queries per day against a database of over 50 million web pages.

### Architecture Design
- Why choose ?
- How prolonged burst..
- fault tolerance and availability
- narrow interface to service-specific workers
-

## Motivation
The authors observe that clusters of workstations have properties that, when exploited, deliver scalability, availability, and cost effectiveness. For example, using commodity PCs flattens the otherwise steep cost curve of growing a service; the cluster's redundancy can mask transient failures; and [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel) network service workloads map well onto networks of workstations. However, developing cluster software and administering a running cluster are still complex tasks.

## Solved Problem & Contribution
The paper designs, analyzes, and implements a layered framework that addresses the complexity described above. New services that adopt the framework get solutions to scalability, availability, and related problems, so developers can focus on the content of the service.

## Solutions
### Layered framework
#### lower layer:
handles scalability, availability, load balancing, support for bursty offered load, and system monitoring and visualization

#### middle layer:
provides extensible support for caching, transformation among MIME types, aggregation of information from multiple sources, and personalization of the service for each of a large number of users

#### top layer:
allows composition of transformation and aggregation into a specific service, such as accelerated Web browsing or a search engine

### Conclusion
1. Proposed a layered architecture for cluster-based scalable network services.
2. Identified that this class of cluster-based scalable network services can substantially increase the value of Internet access to end users while remaining cost-efficient to deploy and administer.

## Research Value & Application
### Challenge of cluster computing
The paper's authors raise four challenges of clusters relative to the SMP architecture.

#### Administration
In an earlier paper, the authors proposed that visualization tools can effectively support administrative work.

#### Component vs. system replication
A single commodity PC may not have the computing power to carry the whole service's load. With suitably narrowed functionality and interchangeable kinds of work (for example, caching data or compressing data), however, replicating the system and migrating work is relatively easy.

#### partial failure
The system must be able to handle partial failures.

#### shared state
How to avoid, or at least minimize, the need to share state across the cluster.

For partial failure and shared state in particular, the authors propose the following solution:

### BASE Semantics
In contrast to transactional ACID, the authors propose BASE, which stands for Basically Available, Soft state, Eventual consistency.
- Basically Available
- Soft state for fault tolerance and availability
    - The SNS components borrow one of the approaches that has been hugely successful in wide-area TCP/IP networks: relying on soft state that is cached and periodically refreshed by reports from peer nodes (?)
    - The other approach uses timeouts as the fault-tolerance mechanism.

## Intuition of the approach

## Main Results, Evaluation and Meaning

## Improvement, limitation and weakness (Author's future work)
- adaptation via distillation (?): the authors proposed this method in earlier work, arguing that it can adapt dynamically to the behavior of the user's network connection. In future work they want to combine the WWW proxy prototype with the event notification mechanism to provide an adaptive solution for Web access over wireless networks.
- Whether the proposed architecture remains viable for Internet server architectures has not yet been studied.
- The TACC model is still immature; the hope is that it can be developed into an SDK.

## Questions

# My Note
- How can you introduce the paper in 3min? (Outlined)
- What is the motivation of the paper?
- What problem does the paper aim to solve? What is the problem definition?
- What are the research values of the paper? How can we apply the solution in practice?
- What is the intuition of the approach proposed by the paper?
- What are the main results of the paper? How are the results evaluated? What do the results mean?
- What could be improved from the paper? What are the limitations/weaknesses of the paper?
- What other questions do you have?

---

# Weighted Voting for Replicated Data