---
tags: Kermadec Class, Machine Learning, Conda Environment, List Slicing, Git Bash, GitHub, CLI, GUI, HTML
---
Machine Learning Week 1
=
# Day 1: What is Machine Learning
- Fact vs Concept:
    - Facts do not need to be memorized; they can be googled at any time.
    - Concepts need to be understood clearly and memorized.
- Learning Strategy: Employer vs Intern
    - Employer = Instructor
    - Intern = Student
## 1. General Data Types:
**Qualitative Data:**
* Color
* Gender
* Laptop brand
* Emotion
* Type of work, table, glasses, clothes
* Brightness
* Time of day
**Quantitative Data:**
* Number of pencils
* Number of hairs
* Number of teeth
* Number of fingers
* Time of day
## 2. Data Collection:
**Collect Data:**
* Survey
* Tracking (electronic + physical) - Membership - Marketing tools - Cookies
* Pictures/Video
**Census vs Sample**
Census: the whole population.
Sample: a part of the population. Beware of bias and resource constraints.
## 3. Data Storage:
Storage of Data: convert anything important into text/numbers for storage, then get rid of the original data (images, videos, documents…)
Most data in the current era is unstructured.
Deep learning tackles unstructured data (data that does not fit in Excel sheets: images, voices, essays…)
**Internet of Things:** the reason data volume exploded, with a huge surge in unstructured data
## 4. Data Science:
Data Science = Insight of Data
Assumption
**Data Scientist**: build ML/data model -> **AI/Data Engineer**: put the model in production -> **Data Analyst**: get data from the model to do analysis and prediction (based on the data model)
![](https://i.imgur.com/GkACNu1.png)
![](https://i.imgur.com/e6pGFBY.png)
AI: General AI -> not yet achieved. A machine that can do anything a human can do.
We do have Specific AI: a machine that can only do certain human tasks.
Unseen data = data that the model was not trained on, i.e. has never seen before
## 5. Conda Environment:
Create an environment with Miniconda
[Preparation Kit - Miniconda](https://www.notion.so/Preparation-Kit-Miniconda-b1f371a62ecd419a8724056286cda2b8)
PowerShell cannot open Jupyter Notebook until `pip install jupyterlab` has been run
[Powershell open Jupyter Notebook Instruction](https://blog.darrenjrobinson.com/getting-started-with-local-powershell-jupyter-notebook/)
```
To activate this environment, use
$ conda activate cs_ftmle
To deactivate an active environment, use
$ conda deactivate
```
Install a package into a conda environment:
```
conda install -n <env_name> <package>
# or, to target the environment by its path:
conda install -p <path/to/env> <package>
```
Update all packages in a conda environment ([conda update](https://docs.conda.io/projects/conda/en/latest/commands/update.html)):
`conda update -n <env_name> --all`
# Day 2
## 1. Programming languages:
Programming language components:
- Data ~ Input + Output
- Behaviors ~ Actions
All languages have the same/similar 5 elements:
Repeating yourself -> put it in a function
Clean Code: write code for others to read
Refactor ~ upgrade/clean all the time
## 2. Coding Tips:
### Help:
`help()`: show the documentation of a function
![](https://i.imgur.com/Pf4H1v5.png)
Why triple quotes for documenting a function: **`help()` prints the docstring, i.e. the triple-quoted string at the top of the function body. `#` comments inside the body are not printed.**
![Why triple quotes for documenting a function](https://i.imgur.com/SsHYlNh.png)
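A minimal sketch of the docstring behavior, using a hypothetical function `add_one` that is not from the class material. The triple-quoted string becomes `add_one.__doc__`, which is the text `help()` displays; the `#` comment in the body is not part of it.

```python
def add_one(x):
    """Return x plus one."""
    # This comment lives in the body and is not part of the docstring.
    return x + 1


# help(add_one) would print the signature followed by the docstring text.
print(add_one.__doc__)  # Return x plus one.
```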
### True/False:
An empty list `[]`, dictionary `{}`, tuple `()`, or string `''` always evaluates to False.
A non-empty list, dictionary, tuple, or string always evaluates to True.
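A short sketch of this rule with `bool()`. The sample values are made up for illustration; note that a container counts as non-empty even when its single element is itself falsy.

```python
empty_values = [[], {}, (), ""]
non_empty_values = [[0], {"a": 1}, (None,), " "]

print([bool(v) for v in empty_values])      # [False, False, False, False]
print([bool(v) for v in non_empty_values])  # [True, True, True, True]

# Typical use: test for emptiness without calling len().
items = []
if not items:
    print("items is empty")
```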
### Function:
def:
`print` is only for humans to "see".
`return` is for the computer to "see".
Default arguments in a function must stay at the end of the argument list.
nan: not a number
Returning multiple values from a function -> the default return type is a tuple.
`def abc(d): return d, d + 1` ~ returns `(d, d + 1)`
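The two rules about functions can be sketched together. The `step` parameter below is a hypothetical addition to the `abc` function from the notes, there only to show a default argument placed last.

```python
def abc(d, step=1):
    # 'step' has a default value, so it must come after 'd'.
    return d, d + step


result = abc(3)
print(result)        # (3, 4)
print(type(result))  # <class 'tuple'>

# The returned tuple can be unpacked into separate names.
low, high = abc(3, step=10)
print(low, high)     # 3 13
```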
### List
#### List Slicing:
List slicing goes from left to right by default; a negative step (e.g. `[::-1]`) goes from right to left.
Remove element:
- .pop(): remove the element at an index (the last one by default) and return it
- .remove(): remove the first element equal to the given value
**Lists can never be used as dictionary keys**, because lists are mutable (and therefore not hashable).
list + list = list
`[1, 2, 3] + [4, 5]` -> `[1, 2, 3, 4, 5]`
`[1, 2, 3] + []` -> `[1, 2, 3]`
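A small sketch of slicing, removal, and concatenation on a made-up list of letters:

```python
letters = ["a", "b", "c", "d", "e"]

print(letters[1:4])    # ['b', 'c', 'd']
print(letters[::-1])   # ['e', 'd', 'c', 'b', 'a']

last = letters.pop()   # removes and returns the last element
print(last, letters)   # e ['a', 'b', 'c', 'd']

letters.remove("b")    # removes the first element equal to "b"
print(letters)         # ['a', 'c', 'd']

print([1, 2, 3] + [4, 5])  # [1, 2, 3, 4, 5]
```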
#### Sort:
```
>>> numbers = [6, 9, 3, 1]
>>> sorted(numbers)
[1, 3, 6, 9]
>>> numbers
[6, 9, 3, 1]
```
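The session above shows that `sorted()` returns a new list and leaves `numbers` unchanged. As a contrast, here is a brief sketch of `list.sort()`, which sorts in place and returns `None`:

```python
numbers = [6, 9, 3, 1]

descending = sorted(numbers, reverse=True)
print(descending)  # [9, 6, 3, 1]
print(numbers)     # [6, 9, 3, 1]

result = numbers.sort()  # sorts the list itself
print(result)      # None
print(numbers)     # [1, 3, 6, 9]
```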
### Tuple:
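Tuples are generally a bit faster and lighter than lists
Tuples are for “write-protected” data
Tuples can be used as dictionary keys (as long as their elements are hashable)
`1, 2, 3` is the same as `(1, 2, 3)`
Returning multiple values from a function -> the default return type is a tuple.
`def abc(d): return d, d + 1` ~ returns `(d, d + 1)`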
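A minimal sketch of these tuple properties. The `point` and `distances` names are hypothetical, chosen only for illustration.

```python
point = 1, 2, 3                # parentheses are optional
print(point == (1, 2, 3))      # True

# A tuple works as a dictionary key.
distances = {(0, 0): "origin", (1, 2): "somewhere"}
print(distances[(1, 2)])       # somewhere

# Tuples are "write-protected": item assignment raises TypeError.
try:
    point[0] = 99
except TypeError as error:
    print("cannot modify a tuple:", error)

# A list in the same position is rejected as a key.
try:
    bad = {[0, 0]: "origin"}
except TypeError as error:
    print("cannot use a list as a key:", error)
```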
### Set:
`set(list)` keeps only the unique elements, like a hypothetical `list.unique()`. Note: lists do not have a `.unique()` method; this is only for illustration.
Intersection of multiple sets (same for union):
```
sets = [s1, s2, s3]
set.intersection(*sets)
```
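A runnable version of the snippet above, with made-up contents for `s1`, `s2`, and `s3`. The results are wrapped in `sorted()` only to make the printed order predictable, since sets are unordered.

```python
values = [3, 1, 3, 2, 1]
unique_values = set(values)          # duplicates are dropped
print(sorted(unique_values))         # [1, 2, 3]

s1, s2, s3 = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
sets = [s1, s2, s3]
common = set.intersection(*sets)     # elements present in every set
combined = set.union(*sets)          # elements present in any set
print(sorted(common))                # [3]
print(sorted(combined))              # [1, 2, 3, 4, 5]
```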
### Dictionary:
Dictionary keys must be immutable (hashable). That is why a list can't be a key in a dictionary.
Make a quick dictionary from a list:
```
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
planet_to_initial = {planet: planet[0] for planet in planets}
planet_to_initial
```
Sort a dictionary's keys by value (largest first); `d` is the dictionary and the result is a list of keys:
`sorted(d, key=d.get, reverse=True)`
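A short sketch with a hypothetical `scores` dictionary. Sorting the dictionary itself yields only the keys; sorting `.items()` keeps the values alongside them.

```python
scores = {"ann": 82, "bob": 95, "cy": 70}

# Keys ordered by their value, largest first.
keys_by_score = sorted(scores, key=scores.get, reverse=True)
print(keys_by_score)  # ['bob', 'ann', 'cy']

# (key, value) pairs ordered the same way, rebuilt into a dict.
pairs_by_score = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print(dict(pairs_by_score))  # {'bob': 95, 'ann': 82, 'cy': 70}
```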
### String:
`\n` still counts as 1 character.
`d = "hey\n i "`
`len(d)` = 7
`strip()`, `rstrip()`: `strip(',.!?/%$#!@_-;:')` strips, from the ends of the string, every character that appears in the "list":
```
,.!?/%$#!@_-;:
```
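A minimal sketch of both string points, using a made-up `raw` string. The argument to `strip()` is treated as a set of characters rather than a substring, and only the ends of the string are affected.

```python
d = "hey\n i "
print(len(d))  # 7

punctuation = ",.!?/%$#!@_-;:"
raw = "--hello, world!!;"
print(raw.strip(punctuation))   # hello, world
print(raw.rstrip(punctuation))  # --hello, world
```

The comma inside `hello, world` survives because it is not at either end of the string.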
# Day 3
Doing exercises and reviewing Dictionary
# Day 4
## 1. GUI vs CLI:
Graphical User Interfaces (GUIs): Buttons to click on
Command-Line Interfaces (CLIs): writing commands (instructions) to tell the computer what to do
## CLI:
### Path
* Relative path: always starts from where you are, then goes forward or back to get where you want.
* Absolute path: always starts from the root folder (the top-level drive: C:/, D:/, E:/...)
Go back 1 folder = `cd ..`
Go back 2 folders = `cd ../..`
Go to home directory = `cd ~`
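The way `..` walks up from the current folder can be sketched in Python with the standard `posixpath` module. The folder names are hypothetical, and POSIX-style paths are assumed here (as in Git Bash) so the output is the same on every system.

```python
import posixpath

current = "/home/student/projects/week1"  # hypothetical absolute path

# Equivalent of `cd ..` and `cd ../..` starting from `current`.
one_up = posixpath.normpath(posixpath.join(current, ".."))
two_up = posixpath.normpath(posixpath.join(current, "../.."))
print(one_up)  # /home/student/projects
print(two_up)  # /home/student

# A relative path does not start from the root; an absolute path does.
print(posixpath.isabs("notes/day4.md"))  # False
print(posixpath.isabs(current))          # True
```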
### Code:
`touch`: create files (not folders)
`mkdir`: create a directory (folder)
`ls`: list the contents of the current directory (folder)
`ls -l`: `l` stands for "long", shows more info.
`ls -la`: `a` also shows hidden folders/files
`cd ..`: go back 1 folder.
`nano`: edit a file (.txt, .py,...).
`cat`: print everything in the file. Binary files such as pictures are printed as unreadable raw bytes.
`mv`: move (or rename) files and folders.
`rm`: remove files only. They are not moved to the trash, but gone forever.
`rm -r ./c`: remove the folder "./c" and everything in it
`^C` = `CTRL + C`: interrupt (stop) whatever bash is running.
`grep`: find lines that match a pattern
`sed`: find and replace text from a file and print the result, which can optionally be redirected to a file
Use multiple commands in one line:
* `cd a && grep -i hamlet shakespeare.txt && wc shakespeare.txt` -> multiple commands run in sequence (each one only if the previous one succeeded) and give multiple outputs.
* `sort ./b/cat.txt | grep -i b | sed 's/h/BLABLABLA/' > result.txt` -> each command uses the output of the previous command as its input; the final result is written to result.txt.
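To make the data flow of the pipe example concrete, here is a rough Python sketch of the same three stages applied to an in-memory list. The sample lines are made up, and Python's `sorted()` may order mixed-case text differently from the shell's locale-aware `sort`.

```python
lines = ["bath", "Cab", "dog", "habit hush"]  # hypothetical file content

# sort: order the lines
sorted_lines = sorted(lines)

# grep -i b: keep lines containing "b", ignoring case
matching = [line for line in sorted_lines if "b" in line.lower()]

# sed 's/h/BLABLABLA/': replace only the first "h" on each line
replaced = [line.replace("h", "BLABLABLA", 1) for line in matching]

print(replaced)  # ['Cab', 'batBLABLABLA', 'BLABLABLAabit hush']
```

Each stage consumes the previous stage's output, which is what `|` does between commands.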
### GitHub
Initialize a Git repository: `git init`
Workflow every time you want to save a version of a file: `git add` -> `git commit -m "message"` (always include a message) -> `git push` (to GitHub)
Branch:
`git branch -a`: list all branches
To merge branches, stay on the "core" branch, then run `git merge <branch>` to merge that branch into the "core" branch.
If there is a conflict between 2 branches, resolve it as follows:
1. Install this extension in VS Code: https://code.visualstudio.com/docs/editor/github
2. The conflict-resolution UI will appear every time there is a conflict.
3. Go to Source Control (on the left sidebar) to commit the changes.
`git pull` = `git fetch` + `git merge` into the local branch
#### GitHub/Git Errors:
**How to Push correctly**: Before pushing, pull the newest version from GitHub to the computer first, then push the computer's version to GitHub.
-> **Rule:** pushing is only possible when the computer has the latest version.
**File Collision:** Changing file names may lead to collisions between files because 1 index stores multiple files with different names.
`git pull` might return `error: Your local changes to the following files would be overwritten by merge:`
-> **Solution:** `git stash` sets the uncommitted local changes aside so the pull can go through; `git stash pop` then reapplies them.
# Day 5
A website needs HTML, CSS, and JavaScript.
![](https://i.imgur.com/WBixhPK.png)
JavaScript: actions (behavior) on the website.
## 1. HTML:
### HTML Tags:
The html opening and closing tags are always needed:
```
<html>
...
</html>
```
Users don't see what is in head. Head is for the browser to read/see:
```
<head>
...
</head>
```
Define styles (font, color) and apply them to the whole website (all backgrounds, the body, all paragraphs...)
Users only see what is in body:
```
<body>
...
</body>
```
div: division, a block that takes the whole available width of the webpage.
span: only takes the width of the text/image... within `<span>...</span>`.
**HTML 4 vs HTML 5:**
HTML 5 has some new tags compared to HTML 4, which makes
### HTML Attribute:
`class`, `id`... are HTML attributes inside HTML tags.
The value of the HTML attribute `id` must be unique within a page.
1 tag can contain many attributes as a **list** of attributes.
1 attribute can contain many values as a **list** of values (e.g. several class names separated by spaces).
## 2. CSS:
CSS should be written in an external file and linked from the HTML; its rules then target elements by HTML tag, class, or id.
## Cool finding:
HackMD also allows designing a website:
## 3. BeautifulSoup:
`find_all`: find all matching elements (tags, attributes...) at every level inside the tag.
**BeautifulSoup** is good for static websites.
**Selenium** is for scraping dynamic websites; it is not a good fit for scraping static websites.
## 4. Pandas:
Pandas allows exporting a DataFrame to a pickle file:
`dataframe.to_pickle(path)`