CS 591 Sys-Net

Week 2: Some interesting papers from ATC/OSDI 2022 - Xudong Sun

Automatic Reliability Testing For Cluster Management Controllers

Fuzz test

DuoAI

Ivy vs Coq

  • What are the advantages of Ivy compared to other verification languages like Coq? I have heard of Coq, but Ivy is new to me.

RESIN: Memory Leak

Previous Tech

  • static
  • dynamic

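Artificial "unintelligence"

Fair idea - first, if memory usage keeps creeping up, there is most likely a problem. Second, if the same thing runs fine on other machines, then the problem must be on your side.

  • The false-positive rate is a bit high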
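
The two intuitions above can be sketched as a toy check: fit a slope to each host's memory samples and flag hosts that grow much faster than the typical host. This is only an illustration of the idea in these notes, not RESIN's actual algorithm; the function names, the `min_slope` threshold, and the data layout are all hypothetical.

```python
from statistics import median


def growth_slope(samples):
    """Least-squares slope of memory usage (MB per sample interval).

    Needs at least two samples.
    """
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def suspect_leaks(fleet, min_slope=1.0):
    """Flag hosts whose memory grows clearly faster than the typical host.

    fleet: dict mapping host name -> list of memory samples in MB.
    min_slope: hypothetical threshold (MB per interval above the fleet median).
    """
    slopes = {host: growth_slope(samples) for host, samples in fleet.items()}
    typical = median(slopes.values())
    return sorted(
        host for host, slope in slopes.items() if slope - typical >= min_slope
    )
```

A single fixed threshold like this is one plausible source of the false positives noted above: a host that is legitimately warming up a cache looks the same as a leaking one.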

Debugging the OmniTable Way

Idea: replay

Insight: lazy materialization

SteamDrill introduces lazy materialization as a solution. Rather than materializing an OmniTable during execution, SteamDrill uses deterministic record and replay [10] to capture a log of non-deterministic inputs to the execution. The system uses the log to generate OmniTable state on-demand by instrumenting and re-executing the original execution as necessary to resolve debugging queries. Delaying OmniTable materialization allows SteamDrill to filter OmniTable data before extracting state instead of afterwards.
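
A minimal sketch of the record/replay idea, assuming a toy "program" that just keeps a running sum over random inputs. It is not SteamDrill's implementation; `run`, `record`, `query`, and the `probe` callback are hypothetical names. The point is that the original run logs only the non-deterministic inputs, and a later query re-executes with instrumentation and keeps only the rows that pass the filter.

```python
import random


def run(inputs, probe=None):
    """The 'program': a running sum over its non-deterministic inputs.

    probe, if given, is called with (step, value, total) at every step; it
    stands in for the instrumentation that is added only during replay.
    """
    total = 0
    for step, value in enumerate(inputs):
        total += value
        if probe is not None:
            probe(step, value, total)
    return total


def record(steps, seed=None):
    """Original execution: keep only the log of non-deterministic inputs."""
    rng = random.Random(seed)
    log = [rng.randint(0, 9) for _ in range(steps)]
    return log, run(log)


def query(log, predicate):
    """Replay from the log and materialize only the rows that pass the filter."""
    rows = []

    def probe(step, value, total):
        if predicate(step, value, total):
            rows.append((step, value, total))

    run(log, probe)
    return rows
```

For example, `query(log, lambda step, value, total: total > 10)` extracts only the steps where the running sum exceeds 10, without ever storing the full table of states.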

Week 4: OSDI / ATC

Week 5: VLDB

Very Large DataBases

  • Engines
  • Graphs
  • ML, AI

4 Papers Today

Netherite: Efficient Execution of Serverless Workflows

Write Buffer

DBOS: a DBMS-oriented operating system

Distributed OS by DBMS

everything is a file -> everything is a table

Database Operating System

  • straw

    • crude test
  • wood

    • run one app
  • brick

    • finally

scheduler: context switch

file system: page change, pointers, metadata

e.g. a FIFO scheduler in SQL:

`order by`
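
A minimal sketch of what "FIFO with `order by`" could look like, using Python's built-in `sqlite3` module. The `tasks` table, its columns, and the helper names are hypothetical and are not DBOS's actual schema; the sketch only shows that picking the next task is an ordinary query over a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # insertion order = arrival order
    " name TEXT NOT NULL,"
    " state TEXT NOT NULL DEFAULT 'ready')"
)


def submit(name):
    """Enqueue a task; it starts in the 'ready' state."""
    conn.execute("INSERT INTO tasks (name) VALUES (?)", (name,))


def schedule_next():
    """Pick the oldest ready task (FIFO) and mark it running."""
    row = conn.execute(
        "SELECT id, name FROM tasks WHERE state = 'ready' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE tasks SET state = 'running' WHERE id = ?", (row[0],))
    return row[1]
```

Under this framing, switching to another policy would mean changing the `ORDER BY` clause (for instance to a priority column) rather than rewriting scheduler code.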

This is on par with that Excel operating system from a few years ago.


Sancus: Staleness-Aware Communication-Avoiding Full-Graph Decentralized Training in Large-Scale Graph Neural Networks

It's common to use a graph database for training, such as using Nebula for GNNs. The innovation of this paper is in how it avoids communication. The techniques this paper uses are

“Despite the promising performance, the major challenge that limits the adoption of GNNs to large-scale graphs lies in the inability to utilize all data in finite time and the scalability of the algorithm itself.” (Peng et al., 2022, p. 1937)

Week 6: WISC Storage, from the ADSL Lab of Prof. Andrea Arpaci-Dusseau and Prof. Remzi Arpaci-Dusseau

Scale and Performance in a Filesystem Semi-Microkernel

uFS


Can Applications Recover from fsync Failures?

No

Week 7:

Starvation in End-to-End Congestion Control

  • Linux TCP: Loss-based congestion control

  • Delay-convergent congestion control

    • flows with a delay bias suffer from starvation

    • unfair resource allocation causes starvation

  • Starvation phenomenon: unfair bandwidth shares come from flows responding differently

  • Delay

    • AP, Wi-Fi: non-congestive delay
  • Delay-convergent CCAs have similar delays for different link rates

    • One delay value stands for many different link rates, so you cannot rely on RTT alone for congestion control
  • Fixes

    • use ECN

    • specify the link rate, like QoS (set net speed)

NeuroScaler: Neural Video Enhancement at Scale

Current Solution:

  • Scale down to low resolution

  • Super-resolution on the low-resolution video

  1. How to accelerate neural enhancement

  2. How to schedule it on a cluster of GPU instances

Observation

  • Choose good frames (key frames) by resolving dependencies

  • Maximize the impact of residuals

    • Good for super-resolution

RF-Protect: Privacy against Device-Free Human Tracking

Indoor Radar-tracking

  • FMCW for distance

  • Antenna Array for angle (phase)
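
The two measurements above follow textbook radar relations, sketched here for reference. For a linear chirp with slope `bandwidth / chirp duration`, the beat frequency `f_b` gives distance `d = c * f_b / (2 * slope)`; for two adjacent antennas, the phase difference gives the angle via `sin(theta) = wavelength * delta_phi / (2 * pi * spacing)`. The function and parameter names are hypothetical, and this is not code from the RF-Protect paper.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def fmcw_distance(beat_hz, bandwidth_hz, chirp_s):
    """Distance in meters from the beat frequency of a linear FMCW chirp."""
    slope = bandwidth_hz / chirp_s  # chirp slope in Hz per second
    return C * beat_hz / (2 * slope)


def arrival_angle(phase_diff_rad, spacing_m, wavelength_m):
    """Angle of arrival in radians from the phase difference between two antennas."""
    return math.asin(wavelength_m * phase_diff_rad / (2 * math.pi * spacing_m))
```

Both estimates are derived purely from properties of the received signal (a frequency and a phase), which is what makes distance and angle the two quantities to spoof.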

{Distance,Angle} Spoofing

Let a ghost do the spoofing

ML Part

How to create reflection?

Move in a realistic way


Week 8: MobiSys

AutoCast: scalable infrastructure-less cooperative perception for distributed collaborative driving

Week 9

Determining non-deterministic events for better idle state prediction

Background

Idle states
  • Target residency
  • Exit Latency

Choosing the right idle state saves energy and preserves performance
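
A minimal sketch of how target residency and exit latency constrain the choice, assuming the usual governor rule of thumb: take the deepest state whose target residency fits within the predicted idle duration and whose exit latency stays within the tolerated wake-up latency. The state table and its numbers are hypothetical, and this is not the TEO governor's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleState:
    name: str
    target_residency_us: int  # minimum idle time for the state to pay off
    exit_latency_us: int      # time needed to wake up from the state


# Hypothetical table, ordered from shallowest to deepest.
STATES = [
    IdleState("shallow", 1, 1),
    IdleState("medium", 100, 20),
    IdleState("deep", 1000, 200),
]


def pick_state(predicted_idle_us, latency_limit_us, states=STATES):
    """Return the deepest state that fits both the prediction and the latency limit."""
    choice = states[0]  # fall back to the shallowest state
    for state in states:
        if (state.target_residency_us <= predicted_idle_us
                and state.exit_latency_us <= latency_limit_us):
            choice = state
    return choice.name
```

If the predicted idle duration is too long, the CPU enters a deep state and pays the exit latency for nothing; if it is too short, the CPU stays shallow and wastes energy. That is why prediction accuracy matters.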

Ticked systems & tickless systems

If a process is waiting for I/O, put it to sleep.

  • TEO Governor
    • recompute idle duration
    • measure the accuracy (Hit / Miss / Early Hit)

Is more history better?

No.

What is the prediction state?

For different systems or user behaviour, do we need to adjust the learning rate as well?

Results

Being less wrong is better than being more wrong

Challenges

  • Learning Rate
  • Variance across architecture
    • This work is based on IBM PowerPC. On x86, no significant improvement was reported.