---
tags: Kermadec Class, Machine Learning, Conda Environment, List Slicing, Git Bash, GitHub, CLI, GUI, HTML
---

Machine Learning Week 1
===

# Day 1: What is Machine Learning

- Fact vs Concept:
    - Fact: no need to memorize; you can google it at any time.
    - Concept: needs to be understood clearly and memorized.
- Learning Strategy: Employer vs Intern
    - Employer = Instructor
    - Intern = Student

## 1. General Data Types:

**Qualitative Data:**
* Color
* Gender
* Laptop brand
* Emotion
* Type of work, table, glasses, clothes
* Brightness
* Time of day

**Quantitative Data:**
* Number of pencils
* Number of hairs
* Number of teeth
* Number of fingers
* Time of day

## 2. Data Collection:

**Collect Data:**
* Survey
* Tracking (electronic + physical)
    - Membership
    - Marketing tools
    - Cookies
* Pictures/Video

**Census vs Sample**

Census: the whole population.
Sample: a part of the whole population.
Beware of bias and resource limits.

## 3. Data Storage:

Storage of Data: convert anything into the important text/numbers for storage, then get rid of the original data (images, videos, documents…).

Most data in the current era is unstructured. Deep learning tackles unstructured data (data that does not fit in Excel sheets: images, voices, essays…).

**Internet of Things:** the reason data has exploded; a huge surge in unstructured data.

## 4. Data Science:

Data Science = Insight of Data Assumption

**Data Scientist**: builds the ML/data model -> **AI/Data Engineer**: puts the model in production -> **Data Analyst**: gets data from the model to do analysis and prediction (based on the data model)

![](https://i.imgur.com/GkACNu1.png)

![](https://i.imgur.com/e6pGFBY.png)

AI: General AI -> not yet achieved. A machine that can do anything a human can do. We do have Specific AI: machines that can only do certain human tasks.

Unseen data = data that the model has not been trained on and has never seen before.

## 5. Conda Environment:

Initiate an environment with miniconda: [Preparation Kit - Miniconda](https://www.notion.so/Preparation-Kit-Miniconda-b1f371a62ecd419a8724056286cda2b8)

PowerShell cannot open Jupyter Notebook unless you first run `pip install jupyterlab`: [Powershell open Jupyter Notebook Instruction](https://blog.darrenjrobinson.com/getting-started-with-local-powershell-jupyter-notebook/)

```
To activate this environment, use
    $ conda activate cs_ftmle
To deactivate an active environment, use
    $ conda deactivate
```

Install a package into a conda environment:

```
conda install -n <env_name> <package>
# or
conda install -p <path/to/env> <package>
```

Update all packages in a conda environment: [conda update](https://docs.conda.io/projects/conda/en/latest/commands/update.html)

`conda update -n <env_name> --all`

# Day 2

## 1. Programming languages:

Programming language components:
- Data ~ Input + Output
- Behaviors ~ Actions

All languages have the same/similar 5 elements:

Repeating yourself -> put it in a function

Clean Code: write code for others to read

Refactor ~ upgrade/clean all the time

## 2. Coding Tips:

### Help:

help(): get some help on a function

![](https://i.imgur.com/Pf4H1v5.png)

Why use triple quotes to document a function: help() prints the function's docstring, which is conventionally written in triple quotes. **Comments inside the function body are not printed by help() when there is no triple-quoted docstring.**

![Why triple quote for commenting a function](https://i.imgur.com/SsHYlNh.png)

### True/False:

An empty list `[]`, dictionary `{}`, tuple `()`, or string `''` always evaluates to False.

A non-empty list, dictionary, tuple, or string always evaluates to True.

### Function: def:

print is only for humans to "see". return is for the computer to "see".

Default arguments in a function must stay at the end of the argument list.

nan: not a number

Returning multiple values from a function -> the default return type is a tuple.
`def abc(d): return d, d+1` is the same as returning `(d, d+1)`.

### List

#### List Slicing:

List slicing always goes from left to right by default (the default step is 1).
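A minimal sketch of the slicing rule above, using a made-up list named `letters` (not from the class material). It shows that a slice reads left to right unless a negative step is given.

```python
# Hypothetical sample list, for illustration only.
letters = ['a', 'b', 'c', 'd', 'e']

first_three = letters[:3]      # start defaults to 0
last_two = letters[-2:]        # a negative index counts from the end
every_other = letters[::2]     # step of 2, still left to right
reversed_copy = letters[::-1]  # a negative step walks right to left
empty = letters[3:1]           # start after stop with the default step gives []

print(first_three)    # ['a', 'b', 'c']
print(last_two)       # ['d', 'e']
print(every_other)    # ['a', 'c', 'e']
print(reversed_copy)  # ['e', 'd', 'c', 'b', 'a']
print(empty)          # []
```

Slicing always returns a new list, so `letters` itself is unchanged after all of these.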
Remove element:
- .pop(): removes an element by index (the last one by default) and returns the removed element
- .remove(): removes the first occurrence of the selected value

**Lists can never be used as dictionary keys**, because lists are mutable.

list + list = list
[1, 2, 3] + [4, 5] = [1, 2, 3, 4, 5]
[1, 2, 3] + [] = [1, 2, 3]

#### Sort:

`sorted()` returns a new sorted list and leaves the original list unchanged:

```
>>> numbers = [6, 9, 3, 1]
>>> sorted(numbers)
[1, 3, 6, 9]
>>> numbers
[6, 9, 3, 1]
```

### Tuple:

Tuples are faster than lists

Tuples are for "write-protected" data

Tuples can be used as dictionary keys (as long as their elements are hashable)

`1, 2, 3` is the same as `(1, 2, 3)`

Returning multiple values from a function -> the default return type is a tuple.
`def abc(d): return d, d+1` is the same as returning `(d, d+1)`.

### Set:

`set(list) = list.unique()`

Note: a list does not have a method .unique(); this is only for illustration.

Intersection of multiple sets (same for union):

```
# s1, s2, s3 are existing sets
sets = [s1, s2, s3]
set.intersection(*sets)
```

### Dictionary:

A dictionary key must be immutable. That is why a list can't be a key in a dictionary.

Make a quick dictionary from a list:

```
planets = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']
planet_to_initial = {planet: planet[0] for planet in planets}
planet_to_initial
```

Sort a dictionary by value (returns the keys, largest value first): `sorted(dict, key=dict.get, reverse=True)`

### String:

\n still counts as 1 character.
d = "hey\n i "
length = 7

strip(), rstrip(): strip(',.!?/%$#!@_-;:') strips every character in the "list" below from both ends of the string (rstrip() only from the right end):

```
,.!?/%$#!@_-;:
```

# Day 3

Doing exercises and reviewing Dictionary

# Day 4

## 1. GUI vs CLI:

Graphical User Interfaces (GUIs): buttons to click on

Command-Line Interfaces (CLIs): writing commands (instructions) to command the computer

## CLI:

### Path

* Relative path: always starts from where you are, then goes forward or back to get you where you want.
* Absolute path: always starts from the beginning folder (the root folders: C:/, D:/, E:/...)
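The difference between the two kinds of path can be sketched in Python with the standard `pathlib` and `posixpath` modules. All folder and file names here are hypothetical, and POSIX-style paths (as used in Git Bash) are assumed so the result is the same on any machine.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical "where you are" folder, for illustration only.
current = PurePosixPath('/home/student/project/notes')

relative = PurePosixPath('../data/cat.txt')                     # starts from where you are
absolute = PurePosixPath('/home/student/project/data/cat.txt')  # starts from the root

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True

# Joining the current folder with the relative path, then collapsing "..",
# gives the absolute location that the relative path points to.
resolved = posixpath.normpath(str(current / relative))
print(resolved)  # /home/student/project/data/cat.txt

# Each ".." steps back one folder, so "../.." steps back two.
two_up = posixpath.normpath(str(current / '../..'))
print(two_up)  # /home/student
```

`PurePosixPath` only manipulates the text of a path, so nothing in this sketch needs to exist on disk.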
Go back 1 folder = `cd ..`
Go back 2 folders = `cd ../..`
Go to home directory

### Code:

* `touch`: create files (not folders)
* `mkdir`: create a directory (folder)
* `ls`: list what is in the current directory (folder)
* `ls -l`: `l` stands for "long", shows more info.
* `ls -la`: `a` shows hidden folders/files
* `cd ..`: go backward 1 folder.
* `nano`: edit a file (.txt, .py,...).
* `cat`: show everything in the file. Pictures will be shown as binary.
* `mv`: move (or rename) files and folders.
* `rm`: remove files only. They are not moved to the trash, but gone forever.
* `rm -r ./c`: remove the folder "./c"
* `^C` = `CTRL + C`: stops (interrupts) anything that bash is running; think of it as "Close".
* `grep`: find lines that match a pattern
* `sed`: replace; it can replace anything in the file, then export the output (optional)

Use multiple commands in one line:
* `cd a && grep -i hamlet shakespeare.txt && wc shakespeare.txt` -> multiple commands run one after another (each runs only if the previous one succeeded) and give multiple outputs.
* `sort ./b/cat.txt | grep -i b | sed 's/h/BLABLABLA/' > result.txt` -> each command uses the output of the previous command.

### GitHub

Initialize Git in a folder: `git init`

Workflow every time you want to save a version of a file:
git add -> git commit -m "message" (should include a comment/message) -> git push (to GitHub)

Branch:
`git branch -a`: list all branches

To merge branches, you must be on the "core" branch, then run `git merge <branch>` to merge that branch into the "core" branch.

If there is a conflict between 2 branches, resolve the conflict. How to resolve:
1. Install this package in VSC https://code.visualstudio.com/docs/editor/github
2. The resolve-conflict UI will appear every time there is a conflict.
3. Go to Source Control (on the left side bar) to commit the changes.

Git Pull = Git Fetch + Git Merge to local branch

#### GitHub/Git Errors:

**How to Push correctly**: Before pushing, you have to pull the newest version on GitHub to the computer first, then push the computer version to GitHub.
-> **Rule:** it is only possible to push when the computer has the latest version.
**File Collision:** Changing file names may lead to collisions between files because one index can end up storing multiple files under different names. `git pull` might return `error: Your local changes to the following files would be overwritten by merge:`
-> **Solution:** `git stash` solves this by temporarily shelving your local uncommitted changes so the pull can go through; `git stash pop` brings them back afterwards.

# Day 5

A website needs JavaScript, CSS, and HTML.

![](https://i.imgur.com/WBixhPK.png)

JavaScript: actions on the website.

## 1. HTML:

### HTML Tags:

Always need an opening and closing html tag:

```
<html>
...
</html>
```

Users don't see what is in the head. The head is for the browser to read/see:

```
<head>
...
</head>
```

The head defines styles (font, color) and applies them to the whole website (all backgrounds, all bodies, all paragraphs...)

Users only see what is in the body:

```
<body>
...
</body>
```

div: division, takes the whole width of the webpage.
span: only takes the width of the text/image... within the `<span>...</span>`.

**HTML 4 vs HTML 5:** HTML 5 has some new tags compared to HTML 4, which makes

### HTML Attribute:

`class`, `id`... are HTML attributes inside HTML tags.

The HTML attribute `id` must be unique.

One tag can contain many attributes as a **list** of attributes.

One attribute can contain many values as a **list** of values.

## 2. CSS:

CSS should be written in an external file and then called from the HTML; it targets elements by HTML tag, class, or id.

## Cool finding:

HackMD also allows designing a website:

## 3. BeautifulSoup:

`find_all`: find all elements (tags, attributes...) at all levels inside the body of the tag.

**BeautifulSoup** is good for static websites.
**Selenium** is for scraping dynamic websites; it is not good for scraping static websites.

## 4. Pandas:

Pandas allows exporting a DataFrame to a pickle file: `dataframe.to_pickle(path)`
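A minimal round-trip sketch of `to_pickle`, assuming pandas is installed. The sample data and the file name `planets.pkl` are made up for illustration; `pd.read_pickle` is the matching pandas function for loading the file back.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, for illustration only.
dataframe = pd.DataFrame({'planet': ['Mercury', 'Venus'], 'initial': ['M', 'V']})

# Write into a temporary folder so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'planets.pkl')
    dataframe.to_pickle(path)        # export the DataFrame to a pickle file
    restored = pd.read_pickle(path)  # load it back from the same path

print(restored.equals(dataframe))  # True
```

Unlike a CSV export, the pickle keeps the index and column types, so the restored DataFrame compares equal to the original.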