# Book Club "Hands-On Machine Learning with R" Meet 7 - Chapter 11 Random Forest

## Some house rules to make the meeting nice for everyone

- Please familiarize yourself with our [code of conduct](https://rladies.org/code-of-conduct/). In summary, please be nice to each other and help us make an **inclusive** meeting! :purple_heart:
- The meeting will NOT BE RECORDED but the slides will be shared!
- Please list your name in the registry.
- Make sure you're in the edit mode (ctrl + alt + e) when trying to edit the file! You'll know you're in the edit mode if the background is black :8ball:
- Please keep your mic off during the presentation. It is nice if you have the camera on and participate to make the meeting more interactive. Of course you can have your camera off if you prefer.
- If you have questions, raise your hand or write your question in this document.

### Links :link:

- [Book: "Hands-On Machine Learning with R"](https://bradleyboehmke.github.io/HOML/)
- [GitHub Repository](https://github.com/rladiesnl/book_club_handsonML)
- HackMD notes
    - [Chp 1 Intro to ML, Chp 2 Modelling process](https://hackmd.io/EhYe_gkWScuoaVCIH6QLAg?both)
    - [Chp 3 Feature and Target Engineering](https://hackmd.io/fgoe7HBzSRmWrqadtXO1Lw)
- Meet-up pages:
    - [R-Ladies Utrecht](https://www.meetup.com/rladies-utrecht/)
    - [R-Ladies Den Bosch](https://www.meetup.com/rladies-den-bosch/)
- Twitter
    - [@RLadiesUtrecht](https://twitter.com/RLadiesUtrecht)
    - [@RLadiesDenBosch](https://twitter.com/RLadiesDenBosch)

## Chp 11: Random Forests

Link to today's [slides](https://github.com/rladiesnl/book_club_handsonML/blob/main/Chapter%2011%20Random%20Forests.pptm)

### Registry :clipboard:

Name / pronouns / R level / place where you join from

- Gerbrich / she, her / intermediate / Utrecht, NL
- Veerle / she, her / advanced / Den Bosch, NL
- Martine / she, her / advanced / Den Bosch, NL
- Ale / She, her / intermediate / Utrecht, NL
- Oussama / He, Him / advanced / Dijon, France
- Ece / she, her / intermediate / Rotterdam, NL
- Lill Eva / she, her / intermediate / Utrecht, NL

### Do you have any questions? :question:

You can write them down here, and if you have answers to posted questions please go ahead, we are all learning together.

- Why are the rules of thumb for $m_{try}$ what they are? What are they based on?

### Any take-home messages you want to share?

Help others remember the main points you took from this chapter:

- Downsides of decision trees and bagging: predictions are not great, bias, correlated trees. A possible solution: random forests :)
- Random -> at every split, some variables are randomly left out: only a random subset of $m_{try}$ variables is offered to choose the splitting variable from
- Compared with other algorithms, the prediction accuracy of random forests varies little across hyperparameter settings -> good predictive power even without much tuning

**Random forest parameters:**

* $m_{try}$: number of predictors randomly offered as split candidates at each split. For regression, a common starting value is the number of features divided by 3.
* Unordered factors in the data should be ordered, otherwise there are too many options for partitioning -> `respect.unordered.factors = "order"`
* Number of trees in the forest -> a good starting point is 10 times the number of features
* Minimal node size: minimum number of observations in each end node of the tree. Higher node sizes lead to lower tree complexity. Increasing the node size lowers computation time, usually without affecting accuracy that much
* Sampling scheme: sampling with or without replacement. Sampling without replacement can perform better, especially with many categorical features that have unbalanced levels.
* Split rule

**Pros and cons**

- Pro: Good out-of-the-box performance and usually high accuracy
- Con: high computational costs (but possible to parallelize)

### Do you have any interesting links regarding the topic? :link:

If you have suggestions of books/blog posts/articles, etc. that could help people get further into the topic, write them here:

- StatQuest: https://www.youtube.com/c/joshstarmer
- Article on completely randomized trees:
- H2O AI platform: https://h2o.ai/

### Feedback :left_speech_bubble:

Please help us get better at this by giving us some feedback :sparkles: Things you liked or things that could improve! :smile:

- WRITE YOUR FIRST COMMENT HERE! :point_left:

## Sign-up for presenting a chapter!

- Chp 12 - Ece (19 Dec)
- Chp 13 - Galin (9 Jan)
- Chp 14 - TBD (TBD)
- Chp 15 - Ece
- Chp 16 - Brandon, co-author of the book (TBD)
- Chp 17 - Shweta (would try to) (TBD)
- Chp 18-19 - TBD (TBD)
- Chp 20-21 (22) - Martine (TBD)
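### Code sketch: the random forest parameters in `ranger`

Going back to the **Random forest parameters** in the take-home messages: the sketch below shows how those rules of thumb might translate into a `ranger()` call. It is only an illustration, not the code from the slides. The built-in `mtcars` data, the response `mpg`, and the names `n_features`, `mtry_start` and `trees_start` are assumptions made for this example (the book itself works with the Ames housing data). The model is only fitted if the `ranger` package is installed.

```r
# Rule-of-thumb starting values from the notes, using the built-in mtcars data
# (an assumption for illustration; not the data set used in the book).
n_features  <- ncol(mtcars) - 1        # all columns except the response mpg
mtry_start  <- floor(n_features / 3)   # regression rule of thumb: features / 3
trees_start <- n_features * 10         # start with 10 trees per feature

if (requireNamespace("ranger", quietly = TRUE)) {
  rf <- ranger::ranger(
    mpg ~ .,
    data = mtcars,
    num.trees = trees_start,
    mtry = mtry_start,
    min.node.size = 5,                   # minimal node size (ranger's regression default)
    replace = FALSE,                     # sampling scheme: without replacement
    sample.fraction = 0.632,             # fraction of rows drawn for each tree
    respect.unordered.factors = "order", # order unordered factors before splitting
    seed = 123
  )
  # For regression, prediction.error is the out-of-bag MSE; take the root for RMSE
  oob_rmse <- sqrt(rf$prediction.error)
  print(oob_rmse)
}
```

These values are starting points only; in practice you would tune `mtry`, `min.node.size`, `replace` and `sample.fraction` with a grid search and compare the out-of-bag errors.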