Dynamic Provisioning of Large-scale Data Processing in Cloud Environments
B07902134
Agenda
Introduction
Heterogeneous Computing
David Patterson:
A New Golden Age for Computer Architecture
Cloud Computing
Affords us extreme amounts of capacity and flexibility.
Increasingly complex, spawning entire industries dedicated to optimizing cost.
Not including accelerators such as FPGAs, GPUs, and TPUs!
Source: Trends and Challenges in Big Data
More and More Choices
Not viable for complicated multi-stage tasks?
Source: Trends and Challenges in Big Data
Spot Instances
Big cloud providers now sell unused capacity at a lower price (up to 90% savings).
(https://azureprice.net/: a VM with \(120\) cores and \(456\) GB of RAM can cost as little as \(0.36\) USD/hr.)
Issues?
Other Billing Complexities
With great flexibility comes great responsibility!
To optimize performance/cost, one needs to consider:
Goals
High performance at low cost
This is unlike traditional scheduling problems, where the amount of resources is fixed.
And unlike general cloud cost-optimization solutions, we focus only on data analytics workloads.
Software Stack
Apache Spark
A Spark workflow can be represented by a DAG,
in which vertices are RDDs and edges are operations
Source: Understanding your Apache Spark Application Through Visualization
Spark Example
Source: https://wikipedia.org/wiki/Apache_Spark
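A minimal word-count sketch in PySpark, illustrating how transformations build up the DAG (the input and output paths are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
sc = spark.sparkContext

# Each transformation below produces a new RDD (a vertex in the DAG);
# the transformations themselves are the edges.
lines  = sc.textFile("hdfs:///data/input.txt")        # placeholder input path
words  = lines.flatMap(lambda line: line.split())     # lines  -> words
pairs  = words.map(lambda word: (word, 1))            # words  -> (word, 1) pairs
counts = pairs.reduceByKey(lambda a, b: a + b)        # shuffle: sum counts per word

# Actions trigger execution of the whole DAG.
counts.saveAsTextFile("hdfs:///data/output")          # placeholder output path
spark.stop()
```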
Nodes in Spark
Source: Understanding the working of Spark Driver and Executor
Kubernetes
Can be used as cluster manager for Spark.
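A minimal sketch of pointing a PySpark session at a Kubernetes cluster in client mode; the API-server address, namespace, image, and executor count are placeholders, not values from the slides:

```python
from pyspark.sql import SparkSession

# Placeholders: replace with your API server, namespace, and container image.
spark = (
    SparkSession.builder
    .appName("spark-on-k8s-sketch")
    .master("k8s://https://kubernetes.example.com:6443")
    .config("spark.kubernetes.namespace", "spark-jobs")
    .config("spark.kubernetes.container.image", "apache/spark-py:v3.4.0")
    .config("spark.executor.instances", "4")
    .getOrCreate()
)
```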
Accelerators with Spark-on-Kubernetes
Spark 3+ supports accelerators such as GPUs as custom resources (i.e., accelerator-aware scheduling).
ResourceProfile: allows the user to specify executor and task requirements for an RDD that will get applied during a stage (see the sketch below).
Dynamic allocation: dynamically scales cluster resources based on workload.
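A minimal PySpark sketch of stage-level scheduling with a ResourceProfile; the core/memory/GPU amounts and the discovery-script path are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.resource import (
    ExecutorResourceRequests,
    ResourceProfileBuilder,
    TaskResourceRequests,
)

spark = SparkSession.builder.appName("resource-profile-sketch").getOrCreate()
sc = spark.sparkContext

# Stage-level requirements: each executor gets 4 cores, 8 GiB of memory, and 1 GPU.
# The GPU discovery-script path is a placeholder for your cluster setup.
exec_reqs = (
    ExecutorResourceRequests()
    .cores(4)
    .memory("8g")
    .resource("gpu", 1, discoveryScript="/opt/spark/scripts/getGpus.sh")
)
# Each task claims 1 CPU core and 1 GPU slot.
task_reqs = TaskResourceRequests().cpus(1).resource("gpu", 1)

profile = ResourceProfileBuilder().require(exec_reqs).require(task_reqs).build

# Attach the profile to an RDD; Spark applies it to the stages computing that RDD.
# Note: using additional ResourceProfiles generally requires dynamic allocation.
rdd = sc.parallelize(range(10_000)).withResources(profile)
```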
Previous work re. dynamic scaling:
Spark: Graceful Executor Decommissioning
Move data to a different node before preemption (see the configuration sketch below).
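Graceful decommissioning is driven by configuration; a sketch of the relevant keys (available since Spark 3.1), set here through a SparkConf:

```python
from pyspark import SparkConf

# Ask Spark to decommission executors gracefully instead of killing them outright,
# and to migrate cached RDD blocks and shuffle files to surviving executors first.
conf = (
    SparkConf()
    .set("spark.decommission.enabled", "true")
    .set("spark.storage.decommission.enabled", "true")
    .set("spark.storage.decommission.rddBlocks.enabled", "true")
    .set("spark.storage.decommission.shuffleBlocks.enabled", "true")
)
```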
Spark: Locality
Source: https://stackoverflow.com/a/59152772
Infrastructure-as-code (IaC)
Workflow and Challenges
Pricing & Preemption Monitor
Time Estimation
Intelligent Requirements
Proof of Concept
Since estimation and integration are complex,
we mostly deal with the optimization part here.
Recall that a Spark workflow can be represented as a DAG \((V, E)\).
Let \(I\) be the set of possible instance types.
Assume that we have the following estimators: a running-time estimator \(f(e, i)\) and a monetary-cost estimator \(g(e, i)\) for running the operation of edge \(e \in E\) on instance type \(i \in I\).
We then want to find a mapping \(\theta: E \to I\) that minimizes both the total running time and the total monetary cost.
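For illustration, under the simplifying assumption that both objectives decompose additively over the edges (ignoring DAG parallelism and the network costs discussed next), the bi-objective problem can be written as

\[
\min_{\theta:\,E \to I} \left( \sum_{e \in E} f\bigl(e, \theta(e)\bigr),\; \sum_{e \in E} g\bigl(e, \theta(e)\bigr) \right).
\]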
Network Costs?
At first glance, the objectives above can be optimized via a greedy algorithm: simply pick the best instance type for each edge independently.
However, this only works because we have yet to consider the network costs \(f_n\) and \(g_n\), which couple the choices made for adjacent edges.
A Simplified Model
We can then proceed to optimize this via
the multi-objective genetic algorithm NSGA-II.
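A minimal sketch of that optimization, assuming the pymoo library; the edge list, instance types, and the per-edge estimators `est_time` / `est_cost` (toy stand-ins for \(f\) and \(g\)) are hypothetical, and network costs are omitted for brevity:

```python
import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import ElementwiseProblem
from pymoo.optimize import minimize


class ProvisioningProblem(ElementwiseProblem):
    """One decision variable per DAG edge; its rounded value selects an instance type."""

    def __init__(self, edges, instance_types, est_time, est_cost):
        super().__init__(n_var=len(edges), n_obj=2,
                         xl=0, xu=len(instance_types) - 1)
        self.edges = edges
        self.instance_types = instance_types
        self.est_time = est_time
        self.est_cost = est_cost

    def _evaluate(self, x, out, *args, **kwargs):
        # Decode the chromosome into one instance type per edge.
        choice = [self.instance_types[int(round(float(v)))] for v in x]
        total_time = sum(self.est_time(e, i) for e, i in zip(self.edges, choice))
        total_cost = sum(self.est_cost(e, i) for e, i in zip(self.edges, choice))
        out["F"] = [total_time, total_cost]


# Toy stand-ins for the real estimators and inputs.
edges = ["read->map", "map->reduce", "reduce->write"]
instance_types = ["m5.xlarge", "c5.2xlarge", "p3.2xlarge"]
est_time = lambda e, i: len(e) / (1 + instance_types.index(i))
est_cost = lambda e, i: (1 + instance_types.index(i)) * 0.5

problem = ProvisioningProblem(edges, instance_types, est_time, est_cost)
res = minimize(problem, NSGA2(pop_size=40), ("n_gen", 60), seed=1, verbose=False)
print(res.F)  # Pareto front of (total time, total cost) trade-offs
```

The output is a Pareto front of provisioning plans trading running time against cost, from which a plan can be picked according to a budget or a deadline.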
TL;DR
Automatically provision workers on the cloud with accelerators and spot instances to ensure fast large-scale data processing at a low cost.
Q/A?
Further Reading