# ANSY 2022
1. Write a program that generates a zombie process for at least 2 minutes, a zombie that I can observe with `htop` for example, with its state shown as Z
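For reference, a minimal sketch of one way to do it in Python (the 130-second nap is an arbitrary choice to comfortably exceed the 2-minute requirement): a zombie is simply a child that has exited but has not yet been reaped by its parent with `wait()`.

```python
#!/usr/bin/env python3
# One way to produce a zombie: the child exits immediately, the parent sleeps
# without reaping it, so the child shows up in `htop` with state Z.
import os
import time

pid = os.fork()
if pid == 0:
    os._exit(0)          # child terminates right away (os._exit avoids double cleanup after fork)
else:
    print(f"zombie child pid: {pid}")
    time.sleep(130)      # parent never calls wait() during the nap -> child stays a zombie
    os.waitpid(pid, 0)   # finally reap the zombie before exiting
```

While the parent sleeps, `htop` (or `ps -o pid,stat,comm`) shows the child in state `Z`.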
# ANSY 2 subject
deadline: 9 dec 9h
## creating troubles
In the first part of the subject, you will be asked to create various situations on your linux-based OS, observe them via netdata, export the netdata dashboard with the given situation, and submit it, along with the source code that created this situation.
Here are the situations you're being asked to create:
easy:
1. High system-time CPU usage (above 60%)*
2. High load average (slightly above the number of CPU cores)*
3. A process that uses all the RAM, forcing the OOM-killer to run.
- Make sure that the OOM-killer _will_ kill this process and not another one (one possible approach is sketched after this list)
4. High iowait*
5. High nice usage.
- What is the lowest nice value that will make the process count as `nice`?
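For situation 3, one possible approach (a sketch only, assuming a Linux `/proc` layout, default overcommit behaviour, and little enough swap that the allocations are not simply absorbed) is to mark the process itself as the preferred OOM victim before eating memory:

```python
#!/usr/bin/env python3
# Sketch for situation 3: make this process the preferred OOM victim, then keep
# allocating and touching memory until the kernel's OOM-killer kills it.
import os

# Raising oom_score_adj to the maximum (1000) needs no privileges (only lowering
# it does) and makes this process the first candidate when the OOM-killer runs.
with open("/proc/self/oom_score_adj", "w") as f:
    f.write("1000")

print(f"pid {os.getpid()} waiting to be OOM-killed...")
chunks = []
while True:
    # bytes objects are backed by pages that are touched on creation, so the
    # resident memory really grows (not just virtual address space).
    chunks.append(b"\xff" * (64 * 1024 * 1024))   # 64 MiB at a time
```

Writing to `oom_score_adj` is the simplest way to guarantee the OOM-killer picks this process and not another one; depending on your overcommit settings, Python may raise `MemoryError` before the killer steps in.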
difficult:
6. High load average (way above the number of cores) but **no CPU usage** (below 30%)*
7. High dirty memory (a few hundred MiB; a sketch of one approach follows the note below)
\* The situation must be held for at least ~10s and no more than a minute to be considered valid
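For situation 7, a minimal sketch (file path and sizes are arbitrary; depending on `vm.dirty_background_ratio` and your RAM size the kernel may start writing back earlier) is to write a few hundred MiB without ever syncing:

```python
#!/usr/bin/env python3
# Sketch for situation 7: write ~300 MiB to a file without fsync(), so the pages
# stay dirty in the page cache until the kernel's writeback flushes them.
import os
import time

CHUNK = b"\0" * (1024 * 1024)              # 1 MiB of data
with open("/tmp/dirty_data.bin", "wb") as f:
    for _ in range(300):                   # ~300 MiB written, never fsync()ed
        f.write(CHUNK)
    time.sleep(60)                         # keep the dirty pages around a while

os.unlink("/tmp/dirty_data.bin")           # clean up afterwards
```

The dirty pages are visible in netdata's memory charts until writeback flushes them.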
### Netdata
To run netdata, you can install it by following the instructions at https://github.com/netdata/netdata#get-netdata
To export a situation, first select a time range that presents the event well. It should show the normal situation for a few dozen seconds before, the event you're creating, and a few dozen seconds after, once things are back to normal.
Please keep it quite short, to avoid sending files that are too big.
Then, to export the selection, use the `export snapshot` button at the top middle/right.
Please choose a granularity of `1 second`, and keep the default compression.
You can add, as a comment, the situation you're trying to show, or its number.
# ANSY 3 subject
deadline: 9 dec 9h
## solving troubles
### Without rr
Context:
> I have created an app which performs _very important_ computation. It's super important, needs to run a lot, but unfortunately takes a significant amount of RAM.
>
> Using your newly acquired knowledge, can you find a system-level way to reduce this RAM usage?
> Unfortunately I can't provide you the source code, as it contains very important domain-specific secrets.
> Because of this, I can't let you use debugging tools on the software either.
> It is strictly forbidden to run gdb, strace or rr on the binary
>
> Please find a solution to make it use less RAM **without** modifying the app itself
Provide a report on how you could improve the situation with your knowledge. Include the RAM usage the app has by default, and how much it uses after optimization.
The app can be found on zarak.fr/resources/analyze
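For the RAM-usage figures in the report, a small helper sketch (the script name, and how you spot the app's PID, are up to you) that reads `VmRSS` from `/proc` may be handy:

```python
#!/usr/bin/env python3
# Read a process's resident memory (VmRSS) from /proc. Usage: python3 rss.py <pid>
import sys

def rss_kib(pid: int) -> int:
    """Return the VmRSS of `pid` in KiB, as reported by /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])   # value is reported in kB
    raise RuntimeError("VmRSS not found")

if __name__ == "__main__":
    pid = int(sys.argv[1])
    print(f"pid {pid}: {rss_kib(pid)} KiB resident")
```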
### With rr
There is an awesome team of people that created a tool to visualize a mandelbrot fractal, and interact with it. However, the tool was lacking some essential features, so I added them. As I'm not a good C developer, this may have introduced a few bugs and regressions.
Your task will be to:
1. Compile the mandelbrot project, and run it
2. Observe the slow performance, and provide an explanation for such slow behavior based _only_ on what you can see on system metrics (no strace, no gdb, no source code reading for now).
- You can use netdata to help you out
- Try to observe all the metrics we've discussed, put them in correlation, and propose a likely explanation of the app's behavior and of how it explains the metrics you're seeing
3. Fix the performance issue you observed
4. Trigger the 3 bugs that exist in this version while recording with rr, and use rr + gdb to track them down and correct them
- The bugs are quite easy to find, read the `help` output of the program and try its features.
- It will be obvious to you when you've triggered a bug, don't worry
As for the submission, I'm expecting a text report for point 2, and the `.patch` files you've generated for points 3 and 4. You can use `git diff` for this.
_(if you wanna feel like a true kernel developer, send me the patch with `git-send-email`)_
The app can be found on zarak.fr/resources/mandelbrot.tar.gz
## Important notes
- Cheating will be heavily sanctionnized (I'm even creating a new word to emphasize _sanctioned_). Don't try me.
- I will try my best to be available to answer your questions. The earlier you start the project, the more likely I'll be to answer you quickly enough before the deadline
- You are allowed to re-use code from a previous class, or code found on the internet, as a base layer for the `High load average` task, since it's complex. However, if you do so, please clearly credit the source
- Be creative, the project is quite open for you to express yourself
- If you write some cryptic-ish bash script, I'll be happy
# ANSY TP 4
**Deadline: 15th January 23:59**
This TP will make you write an eBPF program to monitor process scheduling latency.
You will have to use BCC and write both a python script using the BCC python API and an eBPF script (embedded in the python script).
The eBPF script will collect metrics about scheduling latency for each non-kernel process. The latency will be collected in ns, but needs to be displayed to the user in µs. The communication between the eBPF program and the python script will be done via an eBPF map, to allow asynchronous collection of those metrics in python.
The python script will:
- Compile the eBPF script and load it
- Do an infinite loop:
- sleep 1s
- read the eBPF map
- format the data:
- Aggregate the values of each PID by TGID
- Get the TGID's argv from /proc (think about implementing some cache for this; a small helper is sketched right after this list)
- Convert the units to µs if necessary
- display it on stdout
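As an illustration of the `/proc` argv lookup with a cache, here is a small hypothetical helper (the names `argv_cache` and `get_argv` are illustrative only; a real version should also evict entries once a process dies, see the tips below):

```python
# Hypothetical helper: read and cache a TGID's argv from /proc.
argv_cache = {}

def get_argv(tgid: int) -> str:
    """Return the command line of `tgid`, caching results since the same TGID
    shows up every second and /proc reads are comparatively slow."""
    if tgid not in argv_cache:
        try:
            with open(f"/proc/{tgid}/cmdline", "rb") as f:
                # argv entries are NUL-separated in /proc/<pid>/cmdline
                raw = f.read().replace(b"\0", b" ")
            argv_cache[tgid] = raw.decode(errors="replace").strip()
        except OSError:
            argv_cache[tgid] = "?"   # process already exited
    return argv_cache[tgid]
```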
The eBPF script will:
- Compute each individual latency of scheduling
- Make some per-PID aggregation of the latency (mean scheduling value)
- Also collect the maximum scheduling latency per-PID
- Sum the number of scheduling events per-PID
- For each PID, get the name of the task:
- COMM name
- exe file name
- Make those data available to userland via an eBPF map
The display format shall be the following:
`print(f'{tgid:<7} {average_us:>8} {max_us:>8} {number_of_events:>6} {comm:>16} {exe:>16} {argv}')`
Also add a delimiter between each set of events every second.
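To show the plumbing only, here is a minimal skeleton of the python-side loop with an embedded eBPF script. It does **not** compute scheduling latency: as a placeholder it merely counts `sched_switch` events per PID through a `BPF_HASH`, to illustrate compiling/loading the program, polling the map every second, and printing with a delimiter. The real assignment hooks the scheduler as described above.

```python
#!/usr/bin/env python3
# Minimal BCC skeleton: counts sched_switch events per PID and prints them every
# second. The counting probe is a placeholder for the latency computation above.
import time
from bcc import BPF

bpf_text = r"""
#include <uapi/linux/ptrace.h>

// Per-PID event counter, shared with userland through an eBPF hash map.
BPF_HASH(counts, u32, u64);

TRACEPOINT_PROBE(sched, sched_switch) {
    u32 pid = bpf_get_current_pid_tgid();   // lower 32 bits: the pid (thread id)
    if (pid == 0)
        return 0;                            // ignore the idle task
    counts.increment(pid);
    return 0;
}
"""

b = BPF(text=bpf_text)          # compile and load the eBPF program
counts = b["counts"]            # handle to the shared map

while True:
    time.sleep(1)
    # top 10 PIDs by event count in the last second
    for k, v in sorted(counts.items(), key=lambda kv: kv[1].value, reverse=True)[:10]:
        print(f"{k.value:<7} {v.value:>6}")
    counts.clear()              # start a fresh 1-second window
    print("-" * 20)             # delimiter between each set of events
```

From there, the placeholder counter would be replaced by the per-PID mean/max latency aggregation, the COMM/exe collection, and the TGID aggregation and formatting described above.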
### Useful tips
- Think about a process's time to live, and how to handle the death of a process
- To get you started, you can use the runqslower example provided by BCC as a layout. Don't bother with the TRACEPOINT, remove this part altogether and keep only the eBPF part.
- https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md
- https://github.com/iovisor/bcc/blob/master/docs/tutorial.md