# Introduction
Welcome to your project here at the DISCO Group. This document serves as a guide throughout your journey in completing your Bachelor's or Master's thesis, or your semester project. Here, you will find resources, guidelines, and expectations that will help you navigate and successfully complete your project.
## Starting the project
To start the project, you need to register in the TIK database: [Register here](https://tik-old.ee.ethz.ch/db/public/tik/?db=students&form=form_enter_legi).
I will create a GitLab repository for you on our internal system. At the end of your project, your code, presentation, and report have to be pushed to this repository.
You need to log in to [ETH's GitLab](https://gitlab.ethz.ch/) and send me your username.
## Useful Resources
Here are some resources that you might find helpful:
- **Guides**:
  - [Guide to the compute cluster](https://hackmd.io/hYACdY2aR1-F3nRdU8q5dA). Note that you need to be added to have access. I will do that if necessary for your project. Otherwise, use Google Colab (see below).
  - [Presentation and report guidelines](https://disco.ethz.ch/courses/seminar/GreatScientificPresentations.pdf)
- **Useful Links**:
  - [Registering at the TIK group's database](https://tik-old.ee.ethz.ch/db/public/tik/?db=students&form=form_enter_legi)
  - [DISCO's ETH GitLab](https://gitlab.ethz.ch/disco-students)
  - [DISCO thesis template](https://disco.ethz.ch/misc/templates/DISCO_Thesis_Template.zip)
  - [Presentation template PowerPoint](https://disco.ethz.ch/misc/templates/disco-template.potx)
  - [Presentation template LaTeX](https://disco.ethz.ch/misc/templates/LaTeX-Beamer-Template.zip)
  - [Python Style Guide](https://google.github.io/styleguide/pyguide.html) - I *strongly* urge you to follow this guide -- especially Sections 2.2 Imports, 2.7 Comprehensions & Generator Expressions, 2.21 Type Annotated Code, 3.4-3.6, 3.8 Comments and Docstrings, 3.16 Naming, 3.18 Function length, and 4 Parting Words.
  - [ISG's Cluster Guide](https://computing.ee.ethz.ch/Services/) - You should try to follow the advice given here.
- **Tools and Technologies**:
  - [ChatGPT](https://chat.openai.com/)
  - [Claude.ai](https://claude.ai/) - I have found Claude to be better at scientific writing, but ChatGPT gives more usage for free.
  - [GitHub Student Developer Pack](https://education.github.com/pack)
  - [GitHub Copilot](https://docs.github.com/en/copilot) - Free for students.
  - [Cursor code editor](https://www.cursor.com/) - Anecdotally has better integration than Copilot, _but_ you will eventually want to pay for the "pro" version (we cannot help with this).
  - [Google Colab](https://colab.research.google.com/)
  - [Grammarly](https://app.grammarly.com/)
  - [Apptainer/Singularity](https://apptainer.org/docs/user/main/) - A container platform, which I have found to work better than conda for managing software environments.
Remember, these resources are here to support you, but your initiative and research are crucial. If you have to do any coding, then I strongly recommend that you use GitHub Copilot (free for students), ChatGPT, or another generative tool.
I also recommend that you use ChatGPT to improve your writing. Remember, though, that I do not want to read something written by ChatGPT; it has to be your work. You can use these tools to assist you, but they should not replace you.
We do have access to a cluster with GPU resources, but if Google Colab suffices for your project then that may be the easiest option for experiments. If you need to do heavy computations, then we need to set you up with the cluster (I just need to write an e-mail, so it is easy).
## Useful Tips
Over time, I will add to this section any tips I know or hear of that may be useful for you during your project or thesis.
- **Writing**:
  - Start writing as early as possible. This might be hard to justify, but it can help a lot towards the end. There will be many things you cannot write when you are only halfway through, but, for instance, the data section is likely to be fixed when you are a third of the way through the project.
  - Do not exhaust yourself during a writing session. If you know you will come back to it within days, then leave something unfinished (do make notes so you can easily resume). This will make the next writing session much easier to start, and it may keep you from staring at your screen with writer's block.
  - Use tools like Claude.ai to suggest improvements to your text. Give it a few pages and a good prompt (ask me if you need suggestions). This can quickly give you a lot of feedback.
- **Tools**:
  - This is also mentioned above, but many seem to miss it: you can get GitHub Copilot for free through the student pack. Give it a try! It can make coding much easier.
  - Use tools like ChatGPT, Gemini, Claude, etc. They can quickly get you a lot of boilerplate code and other standard work, but you need to ask intelligently.
  - Use logging tools like [Weights & Biases](https://wandb.ai/site) and log everything that makes sense. It is often better to log something and not use it than to not have logged it, as rerunning experiments can be expensive.
- **Best practices** (read: required practices, for your sake and mine):
  - BACK UP YOUR (RAW) RESULTS! It has often happened that a file was lost or something else went wrong. Save your results as JSON, CSV, or some other format and keep a copy on GitLab (though try to avoid files much larger than 100 MB).
  - Create a notebook which yields your tables and figures (required to have at the end).
  - Clearly define an experiment and format the results afterwards. This is connected to starting the writing process early: once you have completed an experiment, put the results (tables or figures) in your report.
  - Always repeat experiments to establish the uncertainty of measured variables.
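
The last few practices (store raw results, repeat experiments, report uncertainty) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_experiment` is a hypothetical stand-in that returns a simulated accuracy, and the seeds and the file name `results.json` are placeholders, not a prescribed layout. It uses only the standard library; a logging tool such as Weights & Biases would complement it, not replace the raw file.

```python
import json
import random
import statistics
from pathlib import Path


def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for a real experiment; returns a simulated accuracy."""
    rng = random.Random(seed)
    return 0.8 + rng.uniform(-0.05, 0.05)


def run_and_save(seeds: list[int], path: Path) -> dict:
    """Runs one experiment per seed and stores the raw per-seed results as JSON."""
    raw = [{"seed": seed, "accuracy": run_experiment(seed)} for seed in seeds]
    # Save the raw values first, so tables and figures can be rebuilt later.
    path.write_text(json.dumps(raw, indent=2))
    accuracies = [entry["accuracy"] for entry in raw]
    return {
        "mean": statistics.mean(accuracies),
        "std": statistics.stdev(accuracies),  # Sample standard deviation.
        "n": len(accuracies),
    }


if __name__ == "__main__":
    summary = run_and_save(seeds=[0, 1, 2, 3, 4], path=Path("results.json"))
    print(f"accuracy = {summary['mean']:.3f} +/- {summary['std']:.3f} (n={summary['n']})")
```

Keeping the per-seed values, rather than only the mean, means a notebook can later reload the file and recompute any statistic or figure without rerunning the experiments.
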
## Overall Project Expectations
Your project will be evaluated on various aspects, which include but are not limited to:
- **Report**: A comprehensive document detailing your methodology, experiments, results, and discussion. This can be in the form of a paper or blog post depending on the results. A blog post will include a paper/report suitable for arXiv.
- **Midterm presentation**: (_Only for a Master's thesis_) A presentation showcasing your results after 2-3 months.
- **Final Presentation**: A conclusive presentation summarizing your entire project.
## Weekly Expectations
To ensure consistent progress and timely feedback, the following weekly tasks are expected:
- **Weekly Meeting**: A regularly scheduled meeting to discuss progress, challenges, and next steps.
  - We reserve 30 minutes for BT and ST and 1 hour for MT. The meetings are in person by default but can, if necessary, be conducted online over Zoom (or Microsoft Teams).
  - Please make sure to use the time efficiently and effectively. We often have other meetings and, therefore, have to go once the time is up.
  - We encourage you to bring some slides or a document (it need not be polished) where you have gathered relevant information (plots, tables, other results, code, etc.).
- **Meeting Summary**: A brief document summarizing the discussion points, decisions, and action items from the meeting, and when we are meeting next (if we are meeting irregularly or have cancelled a meeting).
  - If we have not settled on a date for the next meeting, but instead agree to, e.g., determine it the week after, then you are in charge of reaching out with a couple of specific proposals.
- **Work log**: Keep a work log of what you have been working on that is automatically shared with me (e.g., Overleaf, Google Docs, etc.). Spend five minutes at the end of each day writing down what you did. This will also help you when you have to write the report at the end.
  - At a minimum, the work log needs a weekly update after each meeting.
## Code and Material Presentation
When presenting code or materials during meetings, please ensure the following:
- **Code Readability**: Your code should be clean, well-commented, and adhere to standard coding practices.
- **Problem Solving First**: Before asking for help, make sure to attempt solving issues on your own. This includes debugging and searching for solutions. If Python is throwing dimension errors when you multiply two arrays, then I expect you to be able to solve them on your own.
- **Focused Questions**: When seeking assistance, ask specific questions and provide context to ensure effective guidance.
Remember, I am here to help you, and these items are meant to ensure that our time is as efficient as possible.
## Communications
I prefer to communicate by e-mail. I aim to answer within a day, and usually within hours. If something is urgent and I notice the e-mail, then I will answer as soon as possible.
If you prefer other means then we can also use Microsoft Teams or another solution.
# Presentations and report guidelines
The most important thing is to adhere to the points made in [Presentation and report guidelines](https://disco.ethz.ch/courses/seminar/GreatScientificPresentations.pdf) and to my comments under Figures.
## Presentations
The final presentation (and the midterm if relevant) should be ~15 minutes. 16 minutes is okay, but 14 minutes is better. It is important that you ensure the presentation stays within this time frame.
After you have finished there will be a Q&A for around 15 to 30 minutes. This gives people a chance to better understand what you have worked on.
Other people from the group and the professor will attend the presentation. You should therefore expect at least four people (incl. me and Prof. Wattenhofer) in the "audience".
I will always do at least one mock presentation with you before you have to present for the group.
## Report
As mentioned above, the format of the report is fairly open, and depending on your results I might suggest that you instead write a blog post or a paper. If I ask you to write a blog post, then it should include a report suitable for arXiv, and the blog post should be a less detailed version of the report that draws people in.
## Figures
Good figures help you make your point clear and communicate results effectively. However, there are several pitfalls I often see students fall into.
Making good figures is an art, but if you try to steer clear of these pitfalls then your figures will surely improve.
If you are in doubt then ask me!
- Labels
  - Your axes should always include a meaningful label. Make sure to use labels that are natural and follow standard practice in related work.
  - It should not require a lengthy explanation to understand your labels.
- Font size
  - The text in your figure (axis labels, tick labels, etc.) should be *readable*. A good rule is that the font size in figures matches the font size in the surrounding text.
- Color and legends
  - Use meaningful colors and label your curves. Make sure the colors are distinct and nice to look at. The default palettes in matplotlib and MATLAB are defaults for a reason :-)
- Amount of content
  - Limit the number of curves/colors you have in a plot.
- Grid lines
  - Add grid lines when relevant, which is almost always. In bar plots it makes sense to have only horizontal lines.
- Axis scale and limits
  - Always use sensible axis limits. If the data can only span the interval [0, 10], then you should not set the limit to [0, 100]. If you plot the accuracy of your model, then use the interval [0, 1] (or [0 %, 100 %]). In some cases it makes sense to plot accuracy on a narrower interval; do not do this to make your model look a lot better, but to help the reader.
  - If you make two figures that should be compared, then use the same axis limits. This makes comparisons much easier.
  - It can often help to make one or both of your axes log-scaled if your data spans several orders of magnitude.
- Format
  - Make sure your figures are saved as vector graphics.
- Self-contained captions
  - A figure should be (mostly) self-contained in that the caption should provide the necessary information to understand the figure (also for tables!).
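
As a rough illustration of several points in this list, here is a minimal matplotlib sketch that sets axis labels, a legend, grid lines, the accuracy interval [0, 1], and vector output. The function name `plot_accuracy`, the curve labels, the font size of 10 pt, the figure size, and the output file name are assumptions for this example, and the plotted values are synthetic, not real results.

```python
import math

import matplotlib

matplotlib.use("Agg")  # Non-interactive backend; no display needed.
import matplotlib.pyplot as plt


def plot_accuracy(epochs, curves, out_path, font_size=10):
    """Plots labeled accuracy curves and saves the figure to out_path.

    Args:
        epochs: x-values shared by all curves.
        curves: dict mapping a legend label to a list of accuracies in [0, 1].
        out_path: output file; a .pdf or .svg suffix gives vector graphics.
        font_size: assumed to match the font size of the surrounding text.
    """
    with plt.rc_context({"font.size": font_size}):
        fig, ax = plt.subplots(figsize=(4.5, 3.0))
        for label, values in curves.items():
            ax.plot(epochs, values, label=label)  # Default color cycle.
        ax.set_xlabel("Epoch")
        ax.set_ylabel("Validation accuracy")
        ax.set_ylim(0.0, 1.0)  # Accuracy can only lie in [0, 1].
        ax.grid(True)
        ax.legend()
        fig.tight_layout()
        fig.savefig(out_path)  # File format is inferred from the suffix.
        plt.close(fig)


if __name__ == "__main__":
    # Synthetic curves, for illustration only.
    epochs = list(range(1, 21))
    curves = {
        "Baseline": [0.70 * (1 - math.exp(-e / 4)) for e in epochs],
        "Proposed": [0.85 * (1 - math.exp(-e / 6)) for e in epochs],
    }
    plot_accuracy(epochs, curves, "accuracy.pdf")
```

Calling the same function for every comparable figure is one simple way to keep axis limits and fonts identical across plots.
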
Here is a decent guide with some information and examples of good and bad figures: [Good and Bad Data Visualization](https://www.oldstreetsolutions.com/good-and-bad-data-visualization)


# Finishing up
When you have finished the project, I need you to push the newest code to the GitLab repository for your project, together with a copy of the presentation and report (as PDFs).
You also need to upload the source of the report as a zip file and a Python notebook (.ipynb) that recreates all figures in the report (so the experiment results must be included as well). All of this must be in place on the day of your presentation.
## Submission
Once it is time to submit your report you should do the following:
- Get a PDF version of your report.
  - Make sure the first page has the correct project type (e.g., Bachelor’s Thesis).
  - For the DSL lab, we put “Distributed Systems Lab Report” instead of “Semester Thesis”.
- Fill in and sign the [declaration of originality](https://ethz.ch/content/dam/ethz/main/education/rechtliches-abschluesse/leistungskontrollen/declaration-originality.pdf). The title must match exactly!
- Send an e-mail with BOTH files attached to:
  - Me, the supervisor
  - The professor: wattenhofer@ethz.ch
  - The assistant: beatfu@ethz.ch
  - In case of D-INFK, the studies office: denise.spicher@inf.ethz.ch
# A successful project
Positive results are not necessary for a good project. You do not need to be discouraged if your models do not beat the benchmark. When evaluating you, we look more at your methodology, ideas, initiative, and ability to present.
However, a successful project should ideally still end with us writing a paper.
---
I am excited to see what you will accomplish!