
# Welcome to the Book Club "Hands-On Machine Learning with R" - Ch. 7. Splines and Ch. 8. KNN

## Some house rules to make the meeting nice for everyone

- Please familiarize yourself with our [code of conduct](https://rladies.org/code-of-conduct/). In summary, please be nice to each other and help us make an **inclusive** meeting! :purple_heart:
- The meeting will NOT BE RECORDED but the slides will be shared!
- Please list your name in the registry.
- Make sure you're in the edit mode (ctrl + alt + e) when trying to edit the file! You'll know you're in the edit mode if the background is black :8ball:
- Please keep your mic off during the presentation. It is nice if you have the camera on and participate to make the meeting more interactive. Of course you can have your camera off if you prefer.
- If you have questions, raise your hand or write your question in this document.

### Links :link:

Book/organisation:

- [Book: "Hands-On Machine Learning with R"](https://bradleyboehmke.github.io/HOML/)
- [Slides Chp 7-8](https://ml-book-club-2022-cph7-8.netlify.app/#/title-slide)
- [GitHub Repository](https://github.com/rladiesnl/book_club_handsonML)
- [Meeting Link](https://us02web.zoom.us/j/89588323742#success)
- Meet-up pages:
    - [R-Ladies Utrecht](https://www.meetup.com/rladies-utrecht/)
    - [R-Ladies Den Bosch](https://www.meetup.com/rladies-den-bosch/)
- Twitter
    - [@RLadiesUtrecht](https://twitter.com/RLadiesUtrecht)
    - [@RLadiesDenBosch](https://twitter.com/RLadiesDenBosch)

## Chapter 7-8: Splines + KNN

### Registry :clipboard:

Name / pronouns / R level / place where you join from

- Ale / she, her / intermediate / Utrecht, NL
- Lill Eva / she, her / beginner / Utrecht, NL
- Ece / she, her / intermediate / Rotterdam, NL
- Ona / she, her / beginner / Hannover, DE
- Veerle / she, her / advanced / Den Bosch, NL

### Do you have any questions?
:question: You can write them down here, and if you have answers to posted questions please go ahead, we are all learning together.

- What is a closed-form model? (from Elena)
    - In mathematics, a closed-form expression is a mathematical expression that uses a finite number of standard operations (Wikipedia)
    - An equation is said to be a closed-form solution if it solves a given problem in terms of functions and mathematical operations from a given generally-accepted set. For example, an infinite sum would generally not be considered closed-form. However, the choice of what to call closed-form and what not is rather arbitrary since a new "closed-form" function could simply be defined in terms of the infinite sum. (https://mathworld.wolfram.com/Closed-FormSolution.html)
- Can the KNN be used to predict "future" observations? So if it is not good for real time modeling, does it mean that if I get new observations I always need to re-run the algorithm? (from Ale)

## About today's topic

### Any take-home messages you want to share?
Help others remember the main points you took from these chapters:

- Splines
    - They are important when you want to discover nonlinear relationships
    - The algorithm assesses candidate cutpoints (knots)
        - Many knots may fit the training data well but may not generalize
        - Pruning removes knots that are not useful for prediction (it does automated feature selection)
    - The `earth::earth()` function is used to fit the splines
    - It handles different types of predictors (quantitative and qualitative)
    - Robust to collinearity (highly correlated predictors)
- K-Nearest Neighbors
    - Simple algorithm based on similarity to other observations
    - Useful for preprocessing
    - "Algorithm identifies k observations that are “similar”/nearest to the new record being predicted and then uses the average response value (regression) or the most common class (classification) of those k observations as the predicted output"
    - Different ways to calculate the distance
        - Euclidean distance: the most common ("as the crow flies")
        - Manhattan distance
        - Minkowski distance
        - Mahalanobis distance
        - Distances can be calculated with the `dist()` function
    - Most distance measures are sensitive to the scale of the features! (you need to scale first!)
    - Categorical features must be represented numerically (one-hot encoded, ordinal encoding)
    - You need to choose the value of `k`
        - It affects the result
        - The more irrelevant features you have, the larger the value of `k` has to be to smooth out the noise
        - Better to use odd numbers for `k` to avoid ties in the classification
    - KNN drawbacks:
        - It can be severely affected by irrelevant features
        - It can have a high computation time
        - It is not suitable for real-time modeling

### Do you have any interesting links regarding the topic? :link:

If you have suggestions of books/blog posts/articles, etc. that could help people get further into the topic, write them here:

- WRITE YOUR LINK HERE!
:point_left:

### Feedback :left_speech_bubble:

Please help us get better at this by giving us some feedback :sparkles: Things you liked or things that could improve! :smile:

- WRITE YOUR FIRST COMMENT HERE! :point_left:

## Sign-up for presenting a chapter!

- Chp 7-8 - Elena Dudukina (7 Nov)
- Chp 9-10 - Veerle (21 Nov)
- Chp 11 - Oussama (5 Dec)
- Chp 12 - Ece (19 Dec)
- Chp 13 - ?
- Chp 14 - ?
- Chp 15 - Ece (TBD)
- Chp 16 - Brandon, co-author of the book (TBD)
- Chp 17 - Shweta (would try to) (TBD)
- Chp 18 - ?
- Chp 19 - ?
- Chp 20-21 (22) - Martine (TBD)
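
## Code sketch: what a knot does (Ch. 7)

As a companion to the Splines take-home messages above (knots as cutpoints, comparing candidate cutpoints), here is a minimal base-R sketch of a single knot built from a pair of hinge terms. It is a hand-rolled illustration, not how `earth::earth()` is implemented: the simulated data, the helper name `fit_hinge()`, and the grid of candidate knots are all assumptions made for this example.

```r
# Hand-rolled illustration of one knot; simulated data and hypothetical helper names.
set.seed(123)
x <- seq(0, 10, length.out = 200)
# Toy assumption: the true relationship changes slope at x = 4
y <- ifelse(x < 4, 2 * x, 8 - 1.5 * (x - 4)) + rnorm(length(x), sd = 0.5)

# Linear model on a pair of hinge terms: max(0, x - knot) and max(0, knot - x)
fit_hinge <- function(knot) {
  lm(y ~ pmax(0, x - knot) + pmax(0, knot - x))
}

# Try several candidate cutpoints and keep the one with the smallest
# residual sum of squares (RSS) on the training data
candidates <- seq(1, 9, by = 0.5)
rss <- sapply(candidates, function(k) sum(resid(fit_hinge(k))^2))
best_knot <- candidates[which.min(rss)]
best_knot

# Compare against a plain straight line fitted to the same data
rss_linear <- sum(resid(lm(y ~ x))^2)
c(linear = rss_linear, hinge = min(rss))
```

Training RSS only ever goes down as more hinge terms are added, which is why many knots can look good on the training data without generalizing; pruning against a validation criterion is what keeps only the useful ones.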

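
## Code sketch: scaling, distances, and predicting new rows with KNN (Ch. 8)

This sketch relates to the KNN take-home messages and to Ale's question above about "future" observations. KNN stores the training data instead of estimating parameters, so new observations can be predicted without re-fitting anything; the cost is that distances to the training rows are computed at prediction time. The example assumes the built-in `iris` data, an arbitrary 30-row hold-out playing the role of new observations, `k = 5`, and `knn()` from the `class` package (a recommended package that ships with standard R installations). None of these choices come from the meeting notes.

```r
library(class)  # provides knn()

set.seed(42)
features <- iris[, 1:4]
labels   <- iris$Species

# Hold out 30 rows to play the role of "new" observations (arbitrary split)
new_idx <- sample(nrow(iris), 30)
train_x <- features[-new_idx, ]
new_x   <- features[new_idx, ]

# Scale with the training means/sds, then apply the same scaling to the new rows
train_scaled <- scale(train_x)
new_scaled   <- scale(new_x,
                      center = attr(train_scaled, "scaled:center"),
                      scale  = attr(train_scaled, "scaled:scale"))

# Pairwise distances between the first three training rows, two different metrics
dist(train_scaled[1:3, ], method = "euclidean")
dist(train_scaled[1:3, ], method = "manhattan")

# Predict the class of each new row from its k nearest training rows.
# There is no separate fitting step: distances are computed inside this call.
pred <- knn(train = train_scaled, test = new_scaled,
            cl = labels[-new_idx], k = 5)

# Share of held-out rows classified correctly
accuracy <- mean(pred == labels[new_idx])
accuracy
```

Reusing the training centers and scales for the new rows keeps both sets on the same footing, so the distances between new and stored observations remain comparable.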