## Information about the submission
Submissions are made by email to cyril@cri.epita.fr, tagged `[ANSY]`.
You can submit each part individually.
There is no restriction on the language used or on the submission architecture, but please make it clear and readable. The easier it is for me to grade your work, the more likely I am to fully appreciate it.
Recommended languages, however, include:
- python
- bash
- c
I will also accept other "normal" languages for the kind of tasks I'm asking you to do, but I reserve the right to refuse a submission if it's an obvious troll like brainfuck or ook!.
If in doubt, ask me :)
### Note
Please track the time you spend on each part and include it in the submission. This will allow me to better estimate the workload of these subjects and adapt it if needed for next year.
# ANSY 1 subject
1. Write a program that creates a zombie process lasting at least 2 minutes, a zombie that I can observe in state Z, for example with htop.
2. Using strace, find the syscall that the provided binary executes more than 10k times, and report how many times it is called.
3. Write a program that runs the provided binary and makes it work, i.e. it returns exit code 0 and prints the final sentence. To do this, you will need to use strace to understand how it works, and make sure it can run as intended. There are 2-3 "traps", or at least some noise, to put you in a more "realistic" situation.
I also ask you to keep your bash history (or zsh history, or equivalent) to show me the commands you ran and the reasoning you went through for this assignment.
The binary to analyze is available at https://zarak.fr/resources/straceme
Please use only strace to analyze and understand it.
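As an illustration of item 1, here is a minimal Python sketch of how a zombie appears: a child exits and its parent simply does not reap it. The helper names (`spawn_zombie`, `proc_state`, `wait_for_state`, `hold_zombie`) are made up for this example, and it assumes a Linux system with `/proc` mounted. It is one possible approach, not the expected solution.

```python
import os
import time


def spawn_zombie():
    """Fork a child that exits at once; the parent does not wait() for it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: terminate immediately
    return pid       # parent: the child stays a zombie until it is reaped


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (for example 'Z')."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' before reading the state field.
    return data.rsplit(")", 1)[1].split()[0]


def wait_for_state(pid, wanted="Z", timeout=2.0):
    """Poll until the process reaches the wanted state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc_state(pid) == wanted:
            return True
        time.sleep(0.01)
    return False


def hold_zombie(seconds=130):
    """Keep a zombie around for `seconds`, then reap it."""
    child = spawn_zombie()
    wait_for_state(child)
    print(f"zombie pid {child}, state {proc_state(child)}")
    time.sleep(seconds)   # the parent is alive but never calls wait()
    os.waitpid(child, 0)  # reap the child so it does not linger afterwards
```

Calling `hold_zombie()` keeps the child in state Z for a little over two minutes, which leaves time to find it in htop.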
# ANSY 2 subject
deadline: 9 dec 9h
## creating troubles
In the first part of the subject, you will be asked to create various situations on your linux-based OS, observe them via netdata, export the netdata dashboard with the given situation, and submit it, along with the source code that created this situation.
Here are the situations you're being asked to create:
easy:
1. High system-time CPU usage (above 60%)*
2. High load average (slightly above the number of CPU cores)*
3. A process that uses all the RAM, forcing the OOM-killer to run.
- Make sure that the OOM-killer _will_ kill this process and not another one
4. High iowait*
5. High nice usage.
- What is the lowest nice value that will make the process count as `nice`?
difficult:
6. High load average (way above the number of cores) but **no CPU usage** (below 30%)*
7. High dirty memory (few hundreds of MiB)
\* The situation shall be held for at least ~10s and no more than a minute for it to be considered valid
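To check a situation locally before exporting it, the CPU percentages mentioned above (system, nice, iowait) can be derived from two readings of the first line of `/proc/stat`. The sketch below is only an aid for self-checking and assumes a recent Linux kernel exposing at least eight counters on that line; the function names are illustrative.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def cpu_shares(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which aggregates all cores."""
    with open(path) as f:
        return f.readline()


def sample(interval=1.0):
    """Take two readings `interval` seconds apart and return the shares."""
    before = parse_cpu_line(read_cpu_line())
    time.sleep(interval)
    after = parse_cpu_line(read_cpu_line())
    return cpu_shares(before, after)
```

For example, `sample()["system"]` gives the system-time share over the last second, which should roughly match what netdata displays.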
### Netdata
To run netdata, you can install it by following instructions from https://github.com/netdata/netdata#get-netdata
To export a situation, first select a time range that presents the event well. It shall show the normal situation for a few dozen seconds before, the event you're creating, and a few dozen seconds after, when things are back to normal.
Please keep it quite short, to avoid sending files too big.
Then, to export the selection, use the `export snapshot` button at the top middle/right.
Please choose a granularity of `1 second`, and keep default compression.
You can add a comment stating the situation you're trying to show, or its number.
# ANSY 3 subject
deadline: 9 dec 9h
## solving troubles
### Without rr
Context:
> I have created an app which performs _very important_ computation. It's super important, needs to run a lot, but unfortunately takes a significant amount of RAM.
>
> Using your newly acquired knowledge, can you find a system-level way to reduce this RAM usage?
> Unfortunately I can't provide you the source code, as it contains very important domain-specific secrets.
> Because of this, I can't let you use debugging tools on the software either.
> It is strictly forbidden to run gdb, strace or rr on the binary.
>
> Please find a solution to make it use less RAM **without** modifying the app itself
Provide a report on how you could improve the situation with your knowledge. Include the RAM usage of the app by default, and how much it uses after optimization.
The app can be found on zarak.fr/resources/analyze
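One way to obtain the before/after RAM figures for the report without attaching any debugger is to read the process's `/proc/<pid>/status` file. The sketch below is an assumption about how you might measure it, not a required method; `VmRSS` is the current resident set size and `VmHWM` its peak, both reported in kB.

```python
import os


def parse_status_kib(status_text, key):
    """Return the integer value (in kB) of a field such as 'VmRSS', or None."""
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name == key:
            return int(rest.split()[0])
    return None


def rss_kib(pid=None, key="VmRSS"):
    """Read a memory field for `pid` (default: the current process)."""
    if pid is None:
        pid = os.getpid()
    with open(f"/proc/{pid}/status") as f:
        return parse_status_kib(f.read(), key)
```

Calling `rss_kib(pid)` for the app's PID before and after your change gives two comparable numbers; `rss_kib(pid, "VmHWM")` gives the peak instead.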
### With rr
There is an awesome team of people that created a tool to visualize a Mandelbrot fractal and interact with it. However, the tool was lacking some essential features, so I added them. As I'm not a good C developer, I may have introduced a few bugs and regressions.
Your task will be to:
1. Compile the mandelbrot project, and run it
2. Observe the slow performance, and provide an explanation for such slow behavior based _only_ on what you can see on system metrics (no strace, no gdb, no source code reading for now).
- You can use netdata to help you out
- Try to observe all the metrics we've discussed, correlate them, and propose a likely explanation of the app's behavior that accounts for the metrics you're seeing
3. Fix the performance issue you observed
4. Trigger the 3 bugs that exist in this version while recording with rr, and use rr + gdb to track them down and correct them
- The bugs are quite easy to find, read the `help` output of the program and try its features.
- It will be obvious to you when you've triggered a bug, don't worry
As for the submission, I'm expecting a text report for point 2, and the `.patch` files you've generated for points 3 and 4. You can use `git diff` for this.
_(if you wanna feel like a true kernel developer, send me the patch with `git-send-email`)_
The app can be found on zarak.fr/resources/mandelbrot.tar.gz
## Important notes
- Cheating will be heavily sanctionnized (I'm even creating a new word to emphasize on sanctioned). Don't try me.
- I will try my best to be available to answer your questions. The earlier you start the project, the more likely I am to answer you quickly enough for the deadline
- You are allowed to re-use code from a previous class, or code found on the internet, as a base layer for the `High load average` task, as it's complex. However, if you do so, please clearly cite the source
- Be creative, the project is quite open for you to express yourself
- If you write some cryptic-ish bash script, I'll be happy
## Ansy TP 4
**Deadline: 15th january 23:59**
This TP will make you write an eBPF program to monitor process scheduling latency.
You will have to use BCC and write both a python script using BCC python API and an eBPF script (embedded in the python script).
The eBPF script will collect metrics about scheduling latency for each non-kernel process. The latency will be collected in ns, but needs to be displayed to the user in µs. The communication between the eBPF program and the python script will be done via an eBPF map, to allow async collection of said metrics in python.
The python script will:
- Compile the eBPF script and load it
- Do an infinite loop:
  - sleep 1s
  - read the eBPF map
  - format the data:
    - Aggregate the values of each PID by TGID
    - Get the TGID's argv from /proc (think about implementing some cache maybe for this)
    - Convert the units to µs if necessary
  - display it on stdout
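The aggregation and unit-conversion steps of the loop can be sketched in plain Python, independently of BCC. The input layout here, one `(tgid, total_ns, max_ns, count)` tuple per PID, is an assumption made for this example; your eBPF map may expose different fields. Keeping a sum and a count (rather than a ready-made mean) is what makes it possible to merge several PIDs of one TGID into a correctly weighted average.

```python
from collections import defaultdict


def aggregate_by_tgid(per_pid):
    """Merge per-PID latency stats into per-TGID stats, converted to µs.

    per_pid: iterable of (tgid, total_ns, max_ns, count) tuples, one per PID.
    Returns {tgid: (average_us, max_us, number_of_events)}.
    """
    acc = defaultdict(lambda: [0, 0, 0])  # [total_ns, max_ns, count]
    for tgid, total_ns, max_ns, count in per_pid:
        slot = acc[tgid]
        slot[0] += total_ns
        slot[1] = max(slot[1], max_ns)
        slot[2] += count

    result = {}
    for tgid, (total_ns, max_ns, count) in acc.items():
        # Weighted mean over all threads of the TGID, then ns -> µs.
        average_us = total_ns / count / 1000 if count else 0.0
        result[tgid] = (round(average_us, 1), round(max_ns / 1000, 1), count)
    return result
```

Averaging the per-PID means directly would give every thread the same weight regardless of how often it was scheduled, which is why the sketch sums totals and counts first.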
The eBPF script will:
- Compute each individual latency of scheduling
- Make some per-PID aggregation of the latency (mean scheduling value)
- Also collect the maximum scheduling latency per-PID
- Sum the number of scheduling events per-PID
- For each PID, get the name of the task:
  - COMM name
  - exe file name
- Make those data available to userland via an eBPF map
The display format shall be the following:
`print(f'{tgid:<7} {average_us:>8} {max_us:>8} {number_of_events:>6} {comm:>16} {exe:>16} {argv}')`
Also add a delimiter between each set of events every second.
### Useful tips
- Think about process time to live, and how to handle the death of a process
- To get you started, you can use the runqslower example provided by BCC as a layout. Don't bother with the TRACEPOINT, remove this part altogether and keep only the eBPF part.
- https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md
- https://github.com/iovisor/bcc/blob/master/docs/tutorial.md
https://app.wooclap.com/DJSZCU?from=instruction-slide