What is HPC?
- Non-functional requirement
Batch vs Interactive Processing
- Batch processing
- Interactive processing
Parallel and Distributed Computing
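Parallel Computing
- Executes multiple tasks simultaneously within a single computer system.
- The goal is higher efficiency (high performance).
- Because everything runs inside one computer system, the tasks share the same memory.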
- For high performance
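Distributed Computing
- Assigns tasks to multiple independent computers or nodes.
- The goal is to break a large task into many smaller tasks to improve overall performance and scalability.
- Resource sharing.
- For example, several computers using the same printer.
- Nodes must transfer data to each other over a network to share resources.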
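The contrast between the two models can be sketched in a few lines. This is only an illustration under stated assumptions: Python threads stand in for parallel tasks (in CPython they share memory but do not give true CPU parallelism), and a `queue.Queue` stands in for the network between nodes. The function names `parallel_sum` and `distributed_sum` are hypothetical.

```python
import queue
import threading


def parallel_sum(chunks):
    """Shared-memory style: every worker adds into the same variable."""
    total = {"value": 0}       # one object visible to all threads
    lock = threading.Lock()    # protects the shared total

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total["value"] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def distributed_sum(chunks):
    """Message-passing style: nodes share nothing and only send results."""
    network = queue.Queue()    # stand-in for the network between nodes

    def node(chunk):
        network.put(sum(chunk))    # a message is the only way to communicate

    threads = [threading.Thread(target=node, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The coordinator collects one message per node.
    return sum(network.get() for _ in chunks)
```

Both functions compute the same result; what differs is how the partial sums reach the place where they are combined.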
Other Computing Demands
- High throughput
- High output
- Can serve more users and handle more requests at the same time
- High availability
- Load sharing
- Spread the workload (computational load) across different nodes
- Workflow computing
Throughput vs. Performance
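- Throughput is closer to a business-logic-level measure of performance
- How many requests can be processed per minute
- But requests differ in size
- Performance is a more absolute, more standardized metric
- How many instructions can be executed per second
- How many operations can be executed per second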
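A small worked example shows why the two metrics can diverge. It assumes a simplified model in which each request costs a fixed number of operations; the function name and all figures are made up for illustration.

```python
def requests_per_minute(ops_per_second, ops_per_request):
    """Throughput implied by a machine's raw speed and the request size."""
    return ops_per_second * 60 / ops_per_request


# Same machine (same performance), two different workloads.
machine_ops_per_second = 1e9
small_requests = requests_per_minute(machine_ops_per_second, 1e6)
large_requests = requests_per_minute(machine_ops_per_second, 1e8)
print(small_requests, large_requests)
```

The performance figure is identical in both cases, but the throughput differs by a factor of 100 because the requests differ in size.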
- Work harder
- Faster CPU
- But per Moore's law, you have to wait about 1.5 years for a faster CPU
- Work smart
- Efficient Algorithm
- Time complexity has mathematical limits
- Get Help
- Parallel Processing
- The program must be rewritten as a parallel version, and more CPUs and machines must be purchased
Cluster Computing (PC Cluster)
NoW (Network of Workstations)
Why PC Cluster?
- The Beowulf Project
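- PCs and networking keep getting faster
- Job waiting times at large computer centers are too long
- Limited budgets rule out purchasing a large mainframe
Computer performance metric: FLOP/s (floating-point operations per second)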
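A common back-of-the-envelope estimate of a cluster's theoretical peak FLOP/s multiplies the node count, cores per node, clock rate, and floating-point operations per cycle per core. The sketch below uses that formula; the function name and the hardware figures are made up for illustration, and real applications typically reach only a fraction of the peak.

```python
def peak_flops(nodes, cores_per_node, clock_hz, flops_per_cycle):
    """Theoretical peak floating-point operations per second."""
    return nodes * cores_per_node * clock_hz * flops_per_cycle


# Hypothetical cluster: 16 nodes, 8 cores each, 2.5 GHz, 16 FLOPs per cycle.
peak = peak_flops(16, 8, 2.5e9, 16)
print(f"{peak / 1e12:.2f} TFLOP/s")
```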
What is Cluster Computing?
- Cluster is a collection of interconnected computers working together as a single system.
- The initial idea leading to cluster computing was developed in the 1960s by IBM as a way of linking large mainframes to provide a cost-effective form of commercial parallelism.
- The nodes of a cluster can exist in a single cabinet or be physically separated and connected via a LAN.
Cluster Usage Modes
- NOW (network of workstations)
- Tapping the idle cycles of existing resources.
- PMMPP (poor man's MPP)
- MPP (Massively Parallel Processor)
- Dedicated cluster acquired for running high performance parallel applications.
Three schemes are used to share cluster nodes
- Dedicated mode
- Space sharing
- Time sharing
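Q. Why do most systems still choose time sharing rather than space sharing?
A. Because users need an immediate response.
For example, a mouse click must produce a response to the user right away.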
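The responsiveness argument can be illustrated with a toy simulation of a single node. This is a simplified sketch, not a real scheduler: the first function lets each job hold the node until it finishes, the second time-slices the node round-robin. The function names, job lengths, and quantum are made up.

```python
from collections import deque


def first_response_run_to_completion(lengths):
    """Each job holds the node until it finishes (no time slicing)."""
    starts, clock = [], 0
    for length in lengths:
        starts.append(clock)   # a job first runs once all earlier jobs are done
        clock += length
    return starts


def first_response_round_robin(lengths, quantum):
    """Jobs take turns on the node, `quantum` time units per turn."""
    remaining = list(lengths)
    first = [None] * len(lengths)
    ready = deque(range(len(lengths)))
    clock = 0
    while ready:
        job = ready.popleft()
        if first[job] is None:
            first[job] = clock     # the job gets the CPU for the first time
        run = min(quantum, remaining[job])
        clock += run
        remaining[job] -= run
        if remaining[job] > 0:
            ready.append(job)      # not finished: back of the queue
    return first


# A long batch job (100 units) arrives just before a mouse click (1 unit).
jobs = [100, 1]
print(first_response_run_to_completion(jobs))
print(first_response_round_robin(jobs, quantum=2))
```

Without time slicing the click waits for the whole batch job; with time slicing it is served after one quantum.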
Grid Computing
What is the Grid?
"A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities."
"We will probably see the spread of 'computer utilities', which, like present electric and telephone utilities, will service individual homes and offices across the country."
"coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations."
A three point checklist by Ian Foster:
- coordinates resources that are not subject to centralized control
- using standard, open, general-purpose protocols and interfaces
- to deliver nontrivial qualities of service
The key goals of Grid computing by IBM
- Improved efficiency and utilization of all computing resources within an enterprise.
- The ability to form virtual organizations that collaborate on common problems by enabling them to share applications and data.
- The ability to tackle very large problems demanding huge computing resources by enabling the aggregation of computing power, storage and other resources.
- The ability to help lower the total cost of computing by enabling the sharing, efficient optimization and overall management of those computing resources.
Other Models and Paradigms
- Application Service Provider
- Service-Oriented Computing
- Utility Computing
- Network-Centric Computing
- On-Demand Computing
Cloud Computing
What is Cloud Computing?
- To some, the cloud looks like Web-based applications, a revival of the thin-client.
- To others, the cloud looks like utility computing, a grid that charges metered rates for processing time.
- Service models
- SaaS
- Software as a Service
- PaaS
- Platform as a Service
- Google App Engine
- Microsoft Azure Services Platform
- IaaS
- Infrastructure as a Service
- Deployment models
- Private cloud
- Public cloud
- Hybrid cloud
- Use the private cloud first; when it is not enough, use the public cloud
- Community cloud
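The hybrid-cloud rule noted above (private capacity first, overflow to the public cloud) can be sketched as a simple placement function. The function name and the capacity figures are hypothetical; a real deployment would also weigh cost, data locality, and security.

```python
def place_workload(demand, private_capacity):
    """Fill the private cloud first; send only the overflow to the public cloud."""
    if demand < 0 or private_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    private = min(demand, private_capacity)
    public = demand - private
    return {"private": private, "public": public}


print(place_workload(30, private_capacity=50))   # fits in the private cloud
print(place_workload(80, private_capacity=50))   # overflow goes public
```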
- In one sense, HPC is the very earliest adopter of cloud computing.
- Yet in another sense, HPC is still a very immature service compared to other emerging cloud services.
Types of Parallel Jobs
- Rigid
- The number of CPUs to run on is hard-coded
- Easy to program, but inefficient to use

- Moldable
- Evolving
- The number of CPUs keeps changing during the computation
- For example, adding CPUs as the computational load rises
- Malleable
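The four job types can be summarized in a small table-like structure. The notes define only rigid and evolving jobs; the entries for moldable (CPU count chosen by the scheduler at start, then fixed) and malleable (CPU count changed at run time by the system) follow the commonly used taxonomy and are an assumption here. The `cpus_at_start` helper is a hypothetical simplification of how a scheduler might treat each type.

```python
from enum import Enum


class JobType(Enum):
    # value = (who decides the CPU count, can it change while running?)
    RIGID = ("user", False)
    MOLDABLE = ("scheduler", False)
    EVOLVING = ("application", True)
    MALLEABLE = ("scheduler", True)

    @property
    def decided_by(self):
        return self.value[0]

    @property
    def resizable(self):
        return self.value[1]


def cpus_at_start(job_type, requested, free):
    """CPUs granted when the job starts, or None if it must wait in the queue."""
    if job_type is JobType.MOLDABLE:
        # A moldable job can start on whatever is free, up to its request.
        return min(requested, free) if free > 0 else None
    # Simplification: every other type starts only when the full request fits.
    return requested if free >= requested else None


print(cpus_at_start(JobType.RIGID, requested=8, free=6))
print(cpus_at_start(JobType.MOLDABLE, requested=8, free=6))
```

With 6 free CPUs the rigid job waits while the moldable job starts on 6, which is one way to see why hard-coded CPU counts are inefficient in use.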
- Previous Efforts
- HPC in the cloud
- Deploying HPC applications on existing cloud platforms
- Effective only for loosely synchronized applications
- HPC plus clouds
- A hybrid infrastructure that combines HPC resources with public clouds
- Providing advantages such as handling unpredictable bursting workloads, supporting hybrid workflows
- Problems related to the lack of high-end HPC resources remain
- HPC as a cloud
- Connecting small HPC clusters, which could be virtualized or nonvirtualized, together to form a large cloud
HPC as a Service
(p.65 p.66)
- HPC as a Service aims to leverage the advantages of the cloud computing paradigm, such as ease of use and dynamic allocation, to provide an interface for current HPC resources without incurring performance degradation.
- A prototype transforms the IBM Blue Gene/P into an elastic cloud of multiple federated clouds supporting dynamic provisioning, efficient utilization, and maximum accessibility of HPC resources based on Deep Cloud (IaaS) and CometCloud (PaaS)
- From users’ perspective, cloud computing enables convenient, on-demand access to a pool of configurable computing resources, which can be rapidly provisioned and released with minimal management effort or service provider interaction.
- Ease of use
- Dynamic allocation
- Elastic and fast resource provisioning
- Resource-oriented job submission in current HPC systems
- Requires some familiarity with the system
- Best efforts for QoS
- QoS-oriented job submission in future HPCaaS systems
Research Issue
- From HPCaaS providers’ perspective,
- Ease of use implies high user satisfaction
- Dynamic allocation is important to efficiently utilize resources and serve more users
- Elastic and fast resource provisioning is necessary to support advanced features of HPC applications
- Resource utilization rate is the key to users’ job turnaround time
- Number of processors allocated while there are jobs waiting in queue
- Job scheduling approaches
- Each processor’s utilization rate
- Applications’ speedup behavior
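The quantities in this list can be made concrete with the usual textbook definitions: utilization as busy processor time over available processor time, speedup as serial time over parallel time, and efficiency as speedup per processor. The notes do not give formulas, so the definitions, function names, and sample numbers below are assumptions for illustration.

```python
def utilization(busy_times, makespan):
    """Fraction of available processor time that was actually used."""
    return sum(busy_times) / (len(busy_times) * makespan)


def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Speedup per processor; 1.0 would be perfect scaling."""
    return speedup(t_serial, t_parallel) / processors


# Four processors over a 10-unit window; one of them sat idle the whole time.
print(utilization([10, 5, 5, 0], makespan=10))
# A job that takes 100 units serially and 25 units on 8 processors.
print(speedup(100, 25), efficiency(100, 25, processors=8))
```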

Future Challenges
- An iPad interface to run HPCaaS simulations. Scientists can use the easy-to-program interface to set up an experiment, adjust budget parameters or time to completion, monitor and steer the workflow, and obtain interactive results, allowing users to concentrate on their simulations and spend less time as system administrators.
- Elastic resource provisioning
- Evolving and malleable jobs
- Allowing users to programmatically expand or shrink their resource requirement
- Pricing
- Workflow execution
- Fair sharing
- SaaS
- Integrating new applications easily and exposing these applications to end users through thin clients as SaaS (Service-Oriented Architecture).
The Data Center View

(Cloud computing is more scalable.)
- By creating a network that is spread thin and wide rather than narrow and deep, Google created a new kind of concentrated power—derived more from scale of the whole than any one constituent part. This, some say, describes the cloud.
- The cloud is very robust and can recover gracefully from the most common ailments, such as connection and hardware failures, because there are so many more drones available to take on the work.
- Data centers are making heavy use of virtualization to squeeze the most out of the watts they are consuming.
- Batch processing vs. service provisioning
The Distributed Computer View
- Distributed computing is not inherently new to the era of the cloud.
- SETI@home, Folding@home, …
- UC Berkeley’s BOINC, desktop grid, volunteer computing
- The open source project Hadoop provides a general-purpose framework for developers to rapidly employ distributed computing in a wide variety of projects.
- MapReduce: dividing jobs into component tasks
- HDFS: Hadoop Distributed File System
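The MapReduce idea of dividing a job into component tasks can be sketched with a word count in plain Python. This is not the Hadoop API: the function names are hypothetical and everything runs in one process, whereas Hadoop would run the map and reduce tasks on different nodes and read the input from HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map task: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key so each key goes to one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce task: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # Each document could be mapped independently, which is what makes
    # the job easy to spread over many machines.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


print(word_count(["the cloud", "the grid the cloud"]))
```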
The Utility Grid View
- It seems like every shape one sees in the cloud is inspired by a computing model from the past.
- Back in the days of mainframes and fancy supercomputers housed at research universities, valuable processing time was essentially for sale. Processing time was delivered like electricity: you paid for what you used.
- Today, most medium to large-sized organizations invest in their own data centers and use them at will.
- What’s worse for the balance sheet is that organizations need to plan for worst-case scenarios
- Overpowered servers capable of handling loads which can peak high but occur infrequently
- Real world estimates of server utilization in datacenters range from 5% to 20%
- Amazon, Google, and IBM have invested in, innovated, and become expert at housing their own large-scale data centers. Why not scale up their data centers—grow the cloud—and create business models to support third-party use?
- Internet retail giant Amazon was the first out of the gate, commercializing its cloud in October 2007
- Today, many Web applications fail under the load of big traffic spikes
- But in the cloud, additional machine instances can be launched on demand. The application dynamically, and gracefully, scales up. When traffic slows down, the application can scale down, terminating the extra instances
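The scale-up and scale-down behavior described above can be sketched as a function that maps the current load to an instance count. It assumes a deliberately simple model in which every instance handles a fixed request rate; the function name, the capacity figure, and the traffic trace are made up, and real autoscalers add cooldowns and smoothing.

```python
import math


def instances_needed(requests_per_second, capacity_per_instance, minimum=1):
    """Smallest instance count that covers the load, never below `minimum`."""
    return max(minimum, math.ceil(requests_per_second / capacity_per_instance))


# A traffic spike followed by a quiet period (requests per second).
traffic = [50, 400, 950, 120]
fleet = [instances_needed(load, capacity_per_instance=100) for load in traffic]
print(fleet)
```

The fleet grows with the spike and shrinks again afterwards, so extra instances exist only while they are needed.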
The Software as a Service View
- For cloud computing to move front and center, the networks that tie everything together need to be extremely robust
- Are our networks ready to handle the load?
- The cloud raises concerns among privacy advocates. Most significantly, the cloud demands a high degree of trust
- Blockchain?
So, Cloud Computing is…
- Cloud computing is the long-held dream of computing as a utility.
- An old idea whose time has finally come

- The datacenter hardware and software is the cloud.
- Pay-as-you-go

Three Aspects are New
- The illusion of infinite computing resources available on demand
- The elimination of an up-front commitment by cloud users
- The ability to pay for use of computing resources on a short-term basis as needed
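A rough cost comparison shows why paying per use can beat owning servers sized for the worst case, even at a higher hourly price. The model is intentionally crude (a single price per server-hour, no purchase or staffing costs), and the function names, demand trace, and prices are made up for illustration.

```python
def owned_cost(hourly_demand, price_per_server_hour):
    """Own enough servers for the peak hour and pay for all of them every hour."""
    return max(hourly_demand) * len(hourly_demand) * price_per_server_hour


def pay_as_you_go_cost(hourly_demand, price_per_server_hour):
    """Pay only for the server-hours actually used."""
    return sum(hourly_demand) * price_per_server_hour


def utilization_if_owned(hourly_demand):
    """Share of the owned capacity that does useful work."""
    return sum(hourly_demand) / (max(hourly_demand) * len(hourly_demand))


# Servers needed in each of four hours; one hour has a peak.
demand = [2, 2, 10, 2]
print(owned_cost(demand, price_per_server_hour=1.0))
print(pay_as_you_go_cost(demand, price_per_server_hour=2.0))
print(utilization_if_owned(demand))
```

In this made-up trace the cloud price per server-hour is twice as high, yet the total is lower because the owned servers sit mostly idle outside the peak hour.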

The Impact
- Amazon’s EC2 may pave the way to where businesses no longer invest anything into data centers of their own
- Cloud Computing is likely to have the same impact on software that foundries have had on the hardware industry
Why Now?
- Does IT Matter?
- Differentiating competitive advantage
- Application software
- Computing resources
- Common infrastructure
- Cloud Computing
Obstacles to Cloud Computing
- Availability of Service
- Data Lock-In
- Data Confidentiality and Auditability
- Data Transfer Bottlenecks
- Performance Unpredictability
- Bugs in Large-Scale Distributed Systems
- Scaling Quickly
- Software Licensing