# Overview of Ongoing Work in IDT

The goals of this document are to:
- Give Committee members some insight into how we plan our work for the current and upcoming semesters.
- Provide an overview of the different projects that are being worked on in IDT.

# Current approach to IDT-dev work planning

Overview of the team org structure:
- Team Manager: **Frederic Osterrath** (@fosterrath-mila)
- Project manager: **Mr Nuage** (a role currently shared between Frederic and the researchers; it will eventually be held by a single person)
- Researchers: Xavier Bouthillier (@bouthilx) and Guillaume Alain (@gyom)
- Software Architecture: Arnaud Bergeron (@abergeron)
- Devs:
    - Olexa Bilaniuk (@obilaniu) (also part of the Infrastructure team in IDT-IT)
    - Olivier Breuleux (@breuleux)
    - Steven Bocco (@notoraptor)
    - Pierre Delaunay (@Delaunay)
    - Satya Ortiz-Gagné (@satyaog)
    - Bruno Carrez (@nurbal)
    - Soline Blanc (@soline-b)
    - Fabrice Normandin (@lebrice)

We have identified that many of our activities and responsibilities fall under 4 broad categories (with some overlap):

|Category|Some examples of projects|
|--------|--------|
|Better hardware|Benchmarking hardware for purchases, applying to RAC for CC|
|Better software|Orion, milatools, wandb integration|
|Analytics about compute resources|Clockwork, SARC|
|Direct help to researchers|Help on Slack, docs.mila.quebec, office hours, Paperoni|

After cataloguing the main ongoing projects and the minor side-projects, and identifying certain missed opportunities, we have shifted our priorities to systematically go after low-hanging fruit with higher impact (e.g., less long-term software development and more focus on "how to get started quickly with your research at Mila"). We have also put more emphasis on giving devs the flexibility to work in "sprints" that are organized by project owners and scheduled ahead of time.

We organize the year around 3 semesters of 4 months each, planning one semester at a time while anticipating the work that will need to be done later. Here is a step-by-step description of the planning process.

1. The researchers (@bouthilx and @gyom) determine which projects or new opportunities should be prioritized during the next semester.
2. All "candidate projects" are documented on an internal Confluence page, something like this: https://mila-iqia.atlassian.net/wiki/spaces/IDT/pages/2103148572/Mila+Datamodules

   ![](https://i.imgur.com/MjuMejE.png)
3. Candidate projects are reviewed by @bouthilx and @gyom, who also consult the project owners to get a sense of the amount of work (in person-days) that each project is likely to involve.
4. Once the researchers (@bouthilx and @gyom) have aggregated all this information, identified dependencies between projects, etc., they pass it all over to *Mr Nuage* (currently mostly Frederic).
5. *Mr Nuage* (mostly Frederic at the moment) sends out a Google Form, in which each dev enters their preferences for the projects they wish to work on.
6. Given all this information, the Project Manager (Frederic for now) creates a schedule of each dev's allocations for the semester. This also takes into account the availability of all the devs (vacations, etc.):

   ![](https://i.imgur.com/Om3wym5.png)

During the semester, while the work is being done, sprints are planned by project owners and coordinated in part by *Mr Nuage*, @bouthilx and @gyom. We use Jira to break down sprints into issues, and we track the time spent in order to stay on target for the time allocated to each project. Some projects are spread throughout the semester (e.g., help on Slack), but it is still useful to keep count of how many person-days were spent on them.

# Ongoing projects

The breakdown of the 2023 Winter semester looks like the following, with the person-day counts in parentheses (subject to minor adjustments).
This graph highlights how much effort is spent in each broad category, and it also shows that many projects fall into two categories at the same time.

![](https://i.imgur.com/LwHyMqp.png)

## milatools

- Repo link: https://www.github.com/mila-iqia/milatools
- Status: Public, moderately well tested, widely used
- Objective: Make it easier for researchers to connect to and use the Mila cluster.
- Related work this semester:
    - Adding documentation and unit tests
    - Making it easier to connect to compute jobs
    - Fixing bugs with the `mila code` command, which makes it easy to connect a VSCode window to a compute node on the cluster

## Orion

- Repo link: https://www.github.com/epistimio/Orion
- Status: Public, well tested, widely used
- Objective: Hyper-Parameter Optimization framework.
- Related work this semester:
    - Orion Dashboard: (todo: add description here)
- Most active devs:
    - Xavier (@bouthilx): Project owner
    - Steven (@notoraptor): Dashboard
    - Pierre (@Delaunay): lots of varied contributions
    - Fabrice (@lebrice): not much this semester

## Milabench

- Repo link: https://www.github.com/mila-iqia/milabench
- Status: Public, tested
- Objective: Suite of benchmarks used to evaluate new hardware.
- Most active devs: Olivier (@breuleux), Arnaud (@abergeron), Pierre (@Delaunay)

## SARC: Supervision et Analyse des Ressources de Calcul

- Repo link: N/A (private and quite sensitive)
- Status: Private, fresh
- Objectives:
    - Develop internal tools to analyze and monitor resource utilization on the Mila / DRAC / (PAICE) clusters.
- Most active devs: Bruno Carrez (@nurbal), Olivier (@breuleux), Guillaume (@gyom), Xavier (@bouthilx)

## Clockwork

- Repo link: https://github.com/mila-iqia/clockwork
- Status: Private (beta soon to be available)
- Objective: Provide a web front-end and a REST API that allow all researchers to retrieve information about all compute clusters and to visualize utilization in a convenient way.
- Most active devs: Soline (@soline-b), Guillaume (@gyom), Arnaud (@abergeron)

## Paperoni

- Repo link: https://github.com/mila-iqia/paperoni
- Status: Public, used quite a lot by Mila staff
- Objective: Easily collect publications from our researchers and generate HTML or reports from them.
- Most active devs: Olivier (@breuleux)

## mila_datamodules

- GitHub Repo: https://www.github.com/mila-iqia/mila_datamodules
- Status: Public, but not yet "properly released"; no user base (as far as I know)
- Objective: Create datamodules (dataset + dataloader) optimized for the Mila cluster and other SLURM clusters.
- Work this semester:
    - Create optimized preprocessing routines for all Torchvision datasets (ImageNet most importantly).
    - Make it easier to load datasets with HuggingFace.
- Most active devs: Fabrice (@lebrice)

## Minimal Code Examples

- GitHub Repo: https://www.github.com/mila-iqia/mila-docs
- Status: N/A
- Objective: Give our users some ready-to-use examples demonstrating good practices / useful SLURM features / etc.
- Work this semester:
    - Some code examples are currently being added by Olexa. More will be added later during a Sprint week. ...
- Most active devs: Olexa (@obilaniu), Fabrice (@lebrice), (... TBD)

### Research Templates (Lightning-hydra-template fork)

- Repo link: https://www.github.com/lebrice/lightning-hydra-template
- Status: Private-ish (not yet hosted on the mila-iqia GitHub org)
- Objectives: Create a ready-to-use, high-quality research template that follows best practices for ML software as well as for working with the Mila cluster.
