---
author: Marc Evrard
date: 2023
title: "L2-ISD2 (2024-25)"
tags: Edu
---
Info S4: Introduction à la science des données II
========
Year 2024-2025
Teachers
--------
* Yue Ma (CM-TP)
* Marc Evrard (CM-TP)
Prerequisites
-------------
Basic probability and statistics, basic algebra, programming experience in Python
Prepares for
------------
Introduction to Machine Learning (Introduction à l’apprentissage statistique)
Assessments
-----------
100% Continuous evaluation
* Session 1: 30% CC (continuous assessment) + 70% Project
* Session 2: 100% Improved Project
<!--
Plan
----
Part | Week | Course | Practical
:---:| ------: | -------- | -----------
1 | March 11 | Classification | TP 1: Iris
2 | March 18 | Unsupervised Learning | TP 2: Digits
3 | March 25 | Regression & Evaluat. | TP 3: Ocean
4 | April 1 | Presentations TP 3 | Presentations TP 3
5 | April 8 | ML Projects Checklists | Project
6 | April 15 | Project | Project
6 | April 22 | *Holidays* |
7 | April 29 | Project Presentation (May 3) | Submission Deadline: May 1
-->
Data Scientists Check-List
--------------------------
* Frame the problem and look at the big picture
* Collect the data
* Explore the data to gain insights
* Prepare the data for Machine Learning algorithms
* Select a model and train it
* Fine-tune your model
* Present your solution
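
The checklist above can be sketched end to end on a toy problem. This is only an illustrative outline, assuming scikit-learn is installed; the Iris dataset, the logistic regression model, and the small grid of `C` values are choices made for this example, not requirements of the course.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Frame the problem and collect the data (assumed task: classify iris species).
X, y = load_iris(return_X_y=True)

# Explore the data (only the shape here; a real project would plot and summarize).
n_instances, n_features = X.shape

# Hold out a test set before any fitting, so the final score is not optimistic.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Prepare the data and select a model, chained in one pipeline.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Fine-tune with a cross-validated grid search over the regularization strength.
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Present the solution: report the chosen setting and the held-out accuracy.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(f"{n_instances} instances, {n_features} features")
print(f"best params: {search.best_params_}")
print(f"test accuracy: {test_accuracy:.3f}")
```

Wrapping the scaler and the classifier in one `Pipeline` means the scaler is refit inside each cross-validation fold, so no statistics leak from the validation folds into training.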
<!-- ---
Questions TP-1
--------------
<iframe class="airtable-embed" src="https://airtable.com/embed/appnj3rDDj4i0djDb/shrNZHi8U93ytzijt" frameborder="0" onmousewheel="" width="100%" height="533" style="background: transparent; border: 1px solid #ccc;"></iframe>
-->
---
Project
-------
### Information
<!-- *** For year 2024-25 ***
#### Presentation
* In English
* All group members should participate equally during the presentation
-->
#### Deadline
Submit your notebook and slides (**only 1 submission per team**) on eCampus the day before the presentation at the latest (**May 1, 23:59**).
#### Recommendation
* Tabular problem (try to avoid NLP or advanced image processing in this class)
* More than 10 features (ideally more than 100)
* More than 1000 instances (ideally 10k to 1M)
* Most problems you have chosen are already solved (e.g., [Codabench](https://www.codabench.org/), [Kaggle](https://www.kaggle.com/))
* Make sure you summarize the results of these solutions in your presentation as state-of-the-art (SOTA)
* And give arguments for choosing your solution (that must be, of course, original)
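
A quick way to check a candidate dataset against the size recommendations above is sketched below. It assumes the data is a CSV file with a header row and a single target column; the helper name `check_dataset_size` and the synthetic demo data are made up for illustration.

```python
import csv
import io

MIN_FEATURES = 10      # "More than 10 features"
MIN_INSTANCES = 1000   # "More than 1000 instances"


def check_dataset_size(csv_file, target_column):
    """Return (n_instances, n_features, ok) for a CSV with a header row."""
    reader = csv.reader(csv_file)
    header = next(reader)
    if target_column not in header:
        raise ValueError(f"target column {target_column!r} not found")
    n_features = len(header) - 1  # every column except the target
    n_instances = sum(1 for row in reader if row)  # skip blank lines
    ok = n_features > MIN_FEATURES and n_instances > MIN_INSTANCES
    return n_instances, n_features, ok


# Usage on a small synthetic in-memory CSV (values are placeholders).
header = [f"f{i}" for i in range(12)] + ["label"]
rows = [",".join(["0"] * 13) for _ in range(1500)]
demo = io.StringIO("\n".join([",".join(header)] + rows))
print(check_dataset_size(demo, target_column="label"))  # (1500, 12, True)
```

For a real file, pass an open file object instead of the `io.StringIO` buffer, for example `open(path, newline="")`.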
#### Format to submit
* Notebook (**only 1** per team): Include code and report (in markdown)
* The slides in PDF format (if different from the notebook, **only 1** per team)
Don't forget to include **all team member names** in the Notebook/Slides.
* Include any external Python modules you use
* Do not include data
* You should include a **link** to the data in the Notebook
* Keep the size of the NB under 20 MB (e.g., avoid using Plotly)
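
The notebook size can be checked before submitting with a few lines of standard-library Python. This is a minimal sketch: the helper name `notebook_size_ok` and the file name `project.ipynb` are hypothetical, and it assumes 1 MB means 1024 × 1024 bytes.

```python
import os
import tempfile

MAX_NOTEBOOK_BYTES = 20 * 1024 * 1024  # 20 MB limit (assumes 1 MB = 1024 * 1024 bytes)


def notebook_size_ok(path, limit=MAX_NOTEBOOK_BYTES):
    """Return (size_in_mb, within_limit) for the file at `path`."""
    size = os.path.getsize(path)
    return size / (1024 * 1024), size < limit


# Usage on a placeholder file; replace demo_path with your own notebook path.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "project.ipynb")  # hypothetical file name
    with open(demo_path, "wb") as f:
        f.write(b"\0" * (1024 * 1024))  # 1 MB of filler, not a real notebook
    size_mb, within_limit = notebook_size_ok(demo_path)
    print(f"{size_mb:.1f} MB, within limit: {within_limit}")  # 1.0 MB, within limit: True
```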
#### Report structure
1. Intro (explanation of the **task** in your own words and **SOTA**)
2. Preprocessing (exploration of the data, cleaning, etc.)
3. Modeling (training)
4. Evaluation
5. Conclusion (what you did, what worked, what didn't, if you had more time, etc.)
#### Evaluation of the project presentation (15 min + 5 min questions)
* Intro and preprocessing (exploration, cleaning) (/5)
* Models (/5)
* Performance evaluation and analysis/explanation (/5)
* Code readability (/5)
* Oral presentation + answering questions (/20, individual grade)
### Schedule
<iframe class="airtable-embed" src="https://airtable.com/embed/appnj3rDDj4i0djDb/shr4QNc36aqX4ssfG" frameborder="0" onmousewheel="" width="100%" height="533" style="background: transparent; border: 1px solid #ccc;"></iframe>
<!--
#### Group 1
<iframe class="airtable-embed" src="https://airtable.com/embed/appnj3rDDj4i0djDb/shrkGadMsNSw4163Q" frameborder="0" onmousewheel="" width="100%" height="533" style="background: transparent; border: 1px solid #ccc;"></iframe>
#### Group 2
<iframe class="airtable-embed" src="https://airtable.com/embed/appnj3rDDj4i0djDb/shrjQnYMbwj2Kqa7C" frameborder="0" onmousewheel="" width="100%" height="533" style="background: transparent; border: 1px solid #ccc;"></iframe>
#### Group 3
<iframe class="airtable-embed" src="https://airtable.com/embed/appnj3rDDj4i0djDb/shrOhyGRJbF5elXRG?viewControls=on" frameborder="0" onmousewheel="" width="100%" height="533" style="background: transparent; border: 1px solid #ccc;"></iframe>
-->
---
References
----------
### Main References
* VanderPlas, J. (2017). *Python Data Science Handbook: Essential Tools for Working with Data.* https://jakevdp.github.io/PythonDataScienceHandbook
* McKinney, W. (2017). *Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (2^nd^ ed.).* https://github.com/wesm/pydata-book
* Géron, A. (2019). *Hands-on Machine Learning with Scikit-Learn, Keras, and Tensorflow: Concepts, Tools, and Techniques to Build Intelligent Systems (2^nd^ ed.).* https://github.com/ageron/handson-ml2
* Grus, J. (2019). *Data Science from Scratch: First Principles with Python (2^nd^ ed.).*
* VanderPlas, J. (2016). *A Whirlwind Tour of Python.* https://jakevdp.github.io/WhirlwindTourOfPython
<!--
Great references for **machine learning algorithm** theory:
* @russell2003artificial _Artificial Intelligence: A Modern Approach_ (4th ed.)
* @bishop2006pattern _Pattern Recognition and Machine Learning_
-->
### Online References
* *The **Python** Language Reference*:
https://docs.python.org/3/reference/index.html
* *The **Python** Tutorial*:
https://docs.python.org/3/tutorial/
* ***JupyterLab** Documentation*:
https://jupyterlab.readthedocs.io/en/stable/
* ***NumPy** Documentation*:
https://numpy.org/doc/stable/
* ***Matplotlib** Documentation*:
https://matplotlib.org/stable/contents.html
* ***pandas** Documentation*:
https://pandas.pydata.org/docs/
* ***Scikit-learn** User Guide*:
https://scikit-learn.org/stable/user_guide.html
### More Advanced References (ML)
* Bishop, C. M. (2006). *Pattern recognition and machine learning.* Springer.
* (**French**) Course archive: *IFT 603 - Techniques d'apprentissage*, Université de Sherbrooke (Hugo Larochelle)
* http://www.dmi.usherb.ca/~larocheh/cours/ift603_H2015/contenu.html