# Welcome to the Book Club "Hands-On Machine Learning with R" - Ch. 9. Decision Trees and Ch. 10. Bagging

## Some house rules to make the meeting nice for everyone

- Please familiarize yourself with our [code of conduct](https://rladies.org/code-of-conduct/). In summary, please be nice to each other and help us make an **inclusive** meeting! :purple_heart:
- The meeting will NOT BE RECORDED, but the slides will be shared!
- Please list your name in the registry.
- Make sure you're in edit mode (ctrl + alt + e) when trying to edit the file! You'll know you're in edit mode if the background is black :8ball:
- Please keep your mic off during the presentation. It is nice if you have your camera on and participate, to make the meeting more interactive. Of course, you can keep your camera off if you prefer.
- If you have questions, raise your hand or write your question in this document.

### Links :link:

Book/organisation:

- [Book: "Hands-On Machine Learning with R"](https://bradleyboehmke.github.io/HOML/)
- [GitHub Repository](https://github.com/rladiesnl/book_club_handsonML)
- [Meeting Link](https://us02web.zoom.us/j/89588323742#success)

Meet-up pages:

- [R-Ladies Utrecht](https://www.meetup.com/rladies-utrecht/)
- [R-Ladies Den Bosch](https://www.meetup.com/rladies-den-bosch/)

Twitter:

- [@RLadiesUtrecht](https://twitter.com/RLadiesUtrecht)
- [@RLadiesDenBosch](https://twitter.com/RLadiesDenBosch)

## Chapter 9-10

### Registry :clipboard:

Name / pronouns / R level / place where you join from

- Ale / she, her / Intermediate / Utrecht, NL
- Ece / she, her / Intermediate / Rotterdam, NL
- Gerbrich / she, her / Intermediate / Utrecht, NL
- Lill Eva / she, her / Intermediate / Utrecht, NL
- Kirsty / she, her / Intermediate / Den Haag, NL
- Shweta / she, her / Intermediate / India, IN
- Veerle / she, her / Advanced / Den Bosch, NL

### Do you have any questions? :question:

You can write them down here, and if you have answers to posted questions, please go ahead: we are all learning together.

- **If you reach one of the parameters (max depth or min number of observations), do you stop splitting in all nodes? (from Ale)**
  - No. You only stop splitting in the node where you reached the parameter's limit; you keep splitting in the others. That is why the final tree can have different depths in different branches.
- **Where can we get the HOML presentation template? Do we have any specific instructions on how to prepare the presentation?**
  - No instructions! You are free to make it as you think best in terms of technology (PowerPoint/Quarto/R Markdown) and content. If you want some examples, you can look at our [GitHub page](https://github.com/rladiesnl/book_club_handsonML).

## About today's topic

### Any take-home messages you want to share?

Help others remember the main points you took from these chapters:

#### Decision Trees

- Terminology:
  - *Root node*: contains all the observations
  - *Terminal nodes*: nodes at the bottom of the tree
- Different types exist; the most common is **CART** (Classification And Regression Tree)
- Data is partitioned into similar subgroups: observations that end up in the same terminal node are more similar to each other
- The model keeps splitting until one of the stopping parameters is reached in each subgroup:
  - *Maximum depth*: a parameter you choose for the decision tree, i.e., how many levels you want in your tree. You have to decide how deep it should go.
  - *Minimum number of observations* in terminal nodes
- *Classification vs. regression trees*: regression trees predict the average response value in a subgroup; classification trees predict whether an observation belongs to a group
- Partitioning:
  - In each node you try to find the best feature/split combination
  - Features can be used multiple times in the same tree
- Preventing overfitting (see the `rpart` sketch after the Bagging notes):
  - Restrict tree depth (maximum depth parameter)
  - Restrict the minimum number of observations in terminal nodes
  - Pruning: grow a complex tree first and simplify it afterwards
- It performs automated feature selection! Uninformative features are not used in the model
- Decision trees:
  - Pros:
    - Easy to explain, visually appealing
    - Require little preprocessing
    - Not sensitive to outliers or missing data
    - Can handle a mix of categorical and numeric features
  - Cons:
    - Not the best predictors
    - Rigid, non-smooth decision boundaries
    - Deep trees risk overfitting, while shallow trees risk low predictive power

#### Bagging

- Bagging is **b**ootstrap **agg**regating: using random sampling with replacement to create multiple new datasets (bootstrap samples) out of your original dataset.
- Fit a prediction model on each bootstrap sample and average the predictions.
- This helps reduce variance and minimize overfitting (because you combine many different models).
- You can apply bagging to many kinds of models, not only decision trees. However, since it reduces variance, it is particularly useful for high-variance models such as deep decision trees.
- A single pruned decision tree performs worse than MARS or KNN, but many (say, 100) unpruned, bagged decision trees perform better.
- Bagging is computationally intensive! However, it is easy to parallelize, because each bootstrap model is fit independently.
- Use the function `ipred::bagging(formula, data, nbagg, coob, control)` (sketch below).
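To make the stopping parameters and pruning concrete, here is a minimal sketch (not from the book, which fits `rpart` trees to the Ames housing data) using the `rpart` package, R's standard CART implementation; the built-in `iris` data and the parameter values are illustrative assumptions, not tuned choices.

```r
library(rpart)

set.seed(123)  # rpart's internal cross-validation is random

# Grow a classification tree; maxdepth and minsplit are the stopping
# parameters from the notes above (illustrative values)
fit <- rpart(
  Species ~ .,
  data    = iris,
  method  = "class",
  control = rpart.control(maxdepth = 4, minsplit = 20, cp = 0)
)

# Pruning: grow a complex tree first (cp = 0), then cut it back to the
# complexity parameter (cp) with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

# Quick base-graphics view of the pruned tree
plot(pruned)
text(pruned, use.n = TRUE)
```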
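And a minimal sketch of the `ipred::bagging()` call listed in the Bagging notes. Assumptions: the built-in `mtcars` data stands in for a real training set, and the parameter values are illustrative.

```r
library(ipred)   # bagging()
library(rpart)   # rpart.control()

set.seed(123)    # bootstrap sampling is random

bag_fit <- bagging(
  formula = mpg ~ .,                              # regression: predict mpg
  data    = mtcars,
  nbagg   = 100,                                  # number of bootstrap trees
  coob    = TRUE,                                 # estimate out-of-bag error
  control = rpart.control(minsplit = 2, cp = 0)   # deep, unpruned trees
)

bag_fit  # printing shows the out-of-bag RMSE
```

Note how `control` deliberately grows deep, unpruned trees: bagging then averages away their high variance, which is exactly the point made above. Because each of the `nbagg` fits is independent, this also parallelizes easily (the book shows a `foreach`-based version).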
### Do you have any interesting links regarding the topic? :link:

If you have suggestions of books/blog posts/articles, etc. that could help people get further into the topic, write them here:

- WRITE YOUR LINK HERE! :point_left:
- *Decision Making in Health and Medicine* by Myriam Hunink

### Feedback :left_speech_bubble:

Please help us get better at this by giving us some feedback :sparkles: Things you liked or things that could improve! :smile:

- WRITE YOUR FIRST COMMENT HERE! :point_left:
- I loved the example of your own dataset/analysis about the books you read :heart_eyes:

## Sign-up for presenting a chapter!

- Chp 9-10 - Veerle (21 Nov)
- Chp 11 - Oussama (5 Dec)
- Chp 12 - Ece (19 Dec)
- Chp 13 - Galin (TBD)
- Chp 14 - ?
- Chp 15 - Ece (TBD)
- Chp 16 - Brandon, co-author of the book (TBD)
- Chp 17 - Shweta (would try to) (TBD)
- Chp 18-19 - ?
- Chp 20-21-(22) - Martine (TBD)

At the end of December, we would like to have a holiday break. We are considering starting again on the 9th of January 2023, or maybe a week later?