CS 591 Sys-Net

Week 2: Some interesting papers from ATC/OSDI 2022 - Xudong Sun

Automatic Reliability Testing For Cluster Management Controllers

Fuzz test

DuoAI

Ivy vs Coq

What are the advantages of Ivy compared to other verification languages like Coq? I have heard of Coq, but Ivy is new to me.

RESIN: Memory Leak

Previous Tech

static
dynamic

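"Artificial stupidity" (a pun on artificial intelligence)

Fair idea: first, if memory usage keeps creeping up, something is most likely wrong. Second, if the same code runs fine on other machines, then the problem must be on your side.

The false-positive rate is a bit high.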
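The two intuitions above can be sketched as a toy check: estimate how fast memory grows on each host, then flag hosts that grow steadily and much faster than their peers. This is only an illustration of the idea in these notes, not RESIN's actual algorithm; the host names, sample values, and the `min_slope` and `ratio` thresholds are made up.

```python
from statistics import mean, median


def growth_rate(samples):
    """Least-squares slope of memory usage (MB per sample interval)."""
    n = len(samples)
    x_bar = (n - 1) / 2
    y_bar = mean(samples)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(samples))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def suspect_hosts(usage_by_host, min_slope=1.0, ratio=5.0):
    """Flag hosts whose memory grows steadily and much faster than their peers.

    `min_slope` and `ratio` are hypothetical thresholds chosen for illustration.
    """
    slopes = {host: growth_rate(s) for host, s in usage_by_host.items()}
    typical = max(median(slopes.values()), 0.0)
    return sorted(
        host
        for host, slope in slopes.items()
        if slope >= min_slope and slope > ratio * typical
    )


# Hypothetical memory samples (MB) from three machines running the same code.
usage = {
    "host-a": [500, 502, 499, 501, 500, 503],  # flat
    "host-b": [480, 481, 479, 482, 480, 481],  # flat
    "host-c": [500, 520, 545, 560, 585, 600],  # creeping up
}
print(suspect_hosts(usage))
```

Thresholds like these are exactly where false positives come from: a workload that legitimately caches more over time looks the same as a slow leak.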

Debugging the OmniTable Way

Idea: replay
Insight: Lazy materialization

SteamDrill introduces lazy materialization as a solution. Rather than materializing an OmniTable during execution, SteamDrill uses deterministic record and replay [10] to capture a log of non-deterministic inputs to the execution. The system uses the log to generate OmniTable state on-demand by instrumenting and re-executing the original execution as necessary to resolve debugging queries. Delaying OmniTable materialization allows SteamDrill to filter OmniTable data before extracting state instead of afterwards.
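A toy sketch of the record, replay, and filter-before-extract idea from the quote above. It is not SteamDrill's implementation: `program`, `record`, `query`, and the probe hook are hypothetical names, and the "application" is just a running total driven by random inputs.

```python
import random


def program(next_input, steps, probe=None):
    """Toy application: a running total driven by non-deterministic inputs.

    `probe`, if given, is the instrumentation hook called with the state at each step.
    """
    total = 0
    for step in range(steps):
        total += next_input()
        if probe is not None:
            probe(step, total)
    return total


def record(steps):
    """Original run: log only the non-deterministic inputs, not the state."""
    rng = random.Random()
    log = []

    def next_input():
        value = rng.randint(-5, 5)
        log.append(value)
        return value

    result = program(next_input, steps)
    return log, result


def query(log, predicate):
    """Answer a debugging query by re-executing from the log with a probe attached.

    Only rows that pass `predicate` are ever materialized.
    """
    rows = []
    inputs = iter(log)

    def probe(step, total):
        if predicate(step, total):
            rows.append((step, total))

    replayed = program(lambda: next(inputs), len(log), probe)
    return rows, replayed


log, original_result = record(steps=1000)
rows, replayed_result = query(log, lambda step, total: total < 0 and step % 100 == 0)
print(len(log), "logged inputs;", len(rows), "rows materialized")
print("replay matches original:", replayed_result == original_result)
```

The log holds one small value per input, while the full table of per-step state is never built; each query pays for a re-execution instead.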

Week 4: OSDI / ATC

Week 5: VLDB

Very Large DataBases

Engines
Graphs
ML, AI

4 Papers Today

Netherite: Efficient Execution of Serverless Workflows

Write Buffer

DBOS: a DBMS-oriented operating system

Distributed OS by DBMS

everything is a file -> everything is a table

Database Operating System

straw
- crude test
wood
- run one app
brick
- finally

scheduler: context switch

file system: page changes, pointers, metadata

e.g. a FIFO scheduler expressed in SQL:

`order by`
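A minimal sketch of what "FIFO with SQL" could look like, using Python's built-in `sqlite3`. The `tasks` table, its columns, and `schedule_next` are hypothetical names for illustration; this is not the DBOS schema or its scheduler.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not DBOS's.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    " task_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " enqueued_at INTEGER NOT NULL,"  # logical arrival time
    " state TEXT NOT NULL DEFAULT 'ready')"
)
conn.executemany(
    "INSERT INTO tasks (name, enqueued_at) VALUES (?, ?)",
    [("compile", 30), ("backup", 10), ("index", 20)],
)


def schedule_next(conn):
    """Pick the oldest ready task (FIFO) and mark it as running."""
    row = conn.execute(
        "SELECT task_id, name FROM tasks"
        " WHERE state = 'ready'"
        " ORDER BY enqueued_at, task_id"
        " LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE tasks SET state = 'running' WHERE task_id = ?", (row[0],))
    return row[1]


order = [schedule_next(conn) for _ in range(4)]
print(order)
```

Changing the scheduling policy then means changing the `ORDER BY` clause (for example, ordering by a priority column) rather than rewriting scheduler code.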

This is on par with that Excel-based operating system from a few years ago.

SANCUS: Staleness-Aware Communication-Avoiding Full-Graph Decentralized Training in Large-Scale Graph Neural Networks

It's common to use a graph database for training, such as using Nebula for GNNs. The innovation of this paper should be in how it avoids communication. The techniques this paper uses are

“Despite the promising performance, the major challenge that limits the adoption of GNNs to large-scale graphs lies in the inability to utilize all data in finite time and the scalability of the algorithm itself.” (Peng et al., 2022, p. 1937)

Week 6: WISC Storage, from the ADSL Lab of Prof. Andrea Arpaci-Dusseau and Prof. Remzi Arpaci-Dusseau

Scale and Performance in a Filesystem Semi-Microkernel

uFS

Can Applications Recover from fsync Failures?

Week 7:

Starvation in End-to-End Congestion Control

Linux TCP: loss-based congestion control
Delay-convergent congestion control
- delay bias leads to starvation
- unfair resource allocation causes starvation
Starvation phenomenon: the unfair bandwidth share comes from flows responding differently
Delay
- AP, Wi-Fi: non-congestive
Delay-convergent CCAs have similar delays for different link rates
- Since similar delays can stand for different link rates, you cannot rely on RTT alone for congestion control
Fixes
- use ECN
- specify the link rate, like QoS (set the network speed)
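A small numeric illustration of why similar delays make RTT a poor signal. It assumes a Vegas-like flow that keeps a fixed number of packets queued at the bottleneck, so the equilibrium queuing delay is the queued bits divided by the link rate. The packet count, packet size, link rates, and the 1 ms jitter figure are all assumptions for illustration, not numbers from the paper.

```python
def vegas_like_queuing_delay_ms(link_mbps, queued_packets=4, packet_bytes=1500):
    """Equilibrium queuing delay if a flow keeps `queued_packets` in the bottleneck queue."""
    queued_bits = queued_packets * packet_bytes * 8
    return queued_bits / (link_mbps * 1e6) * 1e3


rates = [10, 50, 100, 500]  # hypothetical link rates in Mbit/s
delays = {r: vegas_like_queuing_delay_ms(r) for r in rates}
for r in rates:
    print(f"{r:>4} Mbit/s -> {delays[r]:.3f} ms of queuing delay")

# Hypothetical non-congestive jitter (e.g. from an AP or Wi-Fi); the value is an assumption.
JITTER_MS = 1.0
ambiguous = [
    (a, b)
    for a in rates
    for b in rates
    if a < b and abs(delays[a] - delays[b]) < JITTER_MS
]
print("rate pairs indistinguishable under jitter:", ambiguous)
```

Under these assumptions, a tenfold difference in rate can hide inside less than a millisecond of delay, which is smaller than the assumed jitter.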

NeuroScaler: Neural Video Enhancement at Scale

Current Solution:

Scale down to low resolution
Super-resolution on the low-resolution video

How to accelerate neural enhancement
How to schedule it on a cluster of GPU instances

Observation

Choose good frames (key frames) by resolving dependencies
Maximize the impact of the residual
- Good for super-resolution

RF-Protect: Privacy against Device-Free Human Tracking

Indoor Radar-tracking

FMCW for distance
Antenna Array for angle (phase)
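The two measurements above follow textbook radar relations, sketched here under simple assumptions: a linear FMCW chirp, where range is proportional to the beat frequency, and two adjacent antennas, where the angle comes from their phase difference. The function names and the radar parameters (1 GHz sweep in 1 ms, 6 GHz carrier) are hypothetical and not taken from the RF-Protect paper.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def fmcw_distance(beat_hz, bandwidth_hz, chirp_s):
    """Range from the beat frequency of a linear FMCW chirp."""
    slope = bandwidth_hz / chirp_s  # chirp slope, Hz per second
    return C * beat_hz / (2.0 * slope)


def arrival_angle(phase_diff_rad, spacing_m, wavelength_m):
    """Angle of arrival (radians from broadside) from the phase difference
    between two adjacent antennas."""
    return math.asin(phase_diff_rad * wavelength_m / (2.0 * math.pi * spacing_m))


# Assumed radar parameters: 1 GHz swept in 1 ms, 6 GHz carrier.
slope = 1e9 / 1e-3
beat = 2 * 3.0 * slope / C  # beat frequency produced by a reflector 3 m away
distance = fmcw_distance(beat, 1e9, 1e-3)
wavelength = C / 6e9
angle = arrival_angle(math.pi / 2, wavelength / 2, wavelength)
print(round(distance, 3), "m;", round(math.degrees(angle), 1), "degrees")
```

Because range is linear in the beat frequency, shifting the beat frequency of a reflection by some amount moves its apparent distance by `C * shift / (2 * slope)`, which is the handle a distance-spoofing scheme can use.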

{Distance,Angle} Spoofing

Let a ghost do the spoofing

ML Part

How to create reflection?

Move in a realistic way

Week 8: MobiSys

AutoCast: scalable infrastructure-less cooperative perception for distributed collaborative driving

Week 9

Determining non-deterministic events for better idle state prediction

Background

Idle states

Target residency
Exit Latency

Choosing the right idle state saves energy and preserves performance
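A sketch of how target residency and exit latency constrain the choice: pick the deepest state whose target residency fits the predicted idle time and whose exit latency stays within a latency limit. This mirrors the general shape of idle-state selection, not the exact logic of any particular governor; the state table and its numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleState:
    name: str
    target_residency_us: int  # minimum idle time for which entering the state pays off
    exit_latency_us: int      # time needed to wake up from the state


# Made-up numbers for illustration; real values are hardware specific.
STATES = [
    IdleState("shallow", target_residency_us=1, exit_latency_us=1),
    IdleState("medium", target_residency_us=100, exit_latency_us=20),
    IdleState("deep", target_residency_us=1000, exit_latency_us=200),
]


def pick_idle_state(predicted_idle_us, latency_limit_us, states=STATES):
    """Return the deepest state that fits the predicted idle time and latency limit."""
    chosen = states[0]  # always fall back to the shallowest state
    for state in states:  # ordered from shallowest to deepest
        if (state.target_residency_us <= predicted_idle_us
                and state.exit_latency_us <= latency_limit_us):
            chosen = state
    return chosen


print(pick_idle_state(predicted_idle_us=500, latency_limit_us=1000).name)
print(pick_idle_state(predicted_idle_us=5000, latency_limit_us=50).name)
```

The quality of the result depends entirely on `predicted_idle_us`: predict too long and the CPU pays the exit latency of a deep state for a short nap, predict too short and it misses the energy savings.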

Ticked systems & tickless systems

If a process is waiting for I/O, put it to sleep.

TEO Governor
- recompute idle duration
- measure the accuracy (Hit / Miss / Early Hit)

Is more history better?

No.

What is the prediction state?

For different systems or user behaviour, do we need to adjust the learning rate as well?

Results

Being less wrong is better than being more wrong

Challenges

Learning Rate
Variance across architectures
- This is based on IBM PowerPC. No significant improvement has been reported on x86.
