
# WEEK 1

## DAY 1 - Mon, 16 Nov 2020

[TOC]

## Morning Lecture

**Agenda:**

### Introduction
* Get to know each other

### Program details
![Program details](https://i.imgur.com/94vEFjW.png)
* Class rules, tips and scoring system

![Scoring system](https://i.imgur.com/hB3PZ4u.png)

---

## Afternoon Lecture

### Definition of Data:
* Qualitative (descriptive info) vs Quantitative (numerical info)
* Quantitative (Discrete & Continuous)

### Collect data by:
* Survey and IoT
* IoT (Internet of Things): any device that is connected to the internet (booming since 2013)

### Census vs Sample

### Structured vs Unstructured Data
* Structured data: anything that can be put in a table, e.g. an Excel sheet
* Unstructured data: data that does not fit a table, e.g. voice, music --> big data
* Data growth is driven by unstructured data
* 1 exabyte = 10^9 gigabytes

### Data Science

### DA vs DE vs DS
* Differentiate DA (Data Analyst), DE (Data Engineer) and DS (Data Scientist)
* This course is more relevant to DA & DS
* Skill set for each key role

![DA](https://i.imgur.com/a2bNqrg.png)
![DE](https://i.imgur.com/zVjtuUZ.png)
![DS](https://i.imgur.com/4JTLNal.png)

### Industry Settings
* 80% of the work is getting the data
* 20% of the work is building models

### Machine Learning
* AI vs ML vs DL

### Supervised Learning
* We know the expected output: the data comes with labels
* Regression problem (continuous output type)
* Classification problem (discrete output type)

### Steps to predictive modelling
![](https://i.imgur.com/tI2Kv1F.png)

### Equipment
![](https://i.imgur.com/DXjJDLM.png)

# DAY 4: Bash Commands + Git/GitHub - Thu, 19 Nov 2020

* `pwd`: shows where we are (prints the current working directory)
* `ls`: lists directory contents; `ls -l` gives the long format, `ls -la` also includes hidden files
* `rm -fr /*`: removes everything (destructive, never run this)
* `grep`: searches text for a pattern

# DAY 5: HTML/CSS & Web Scraping

[TOC]

## HTML
- Hypertext Markup Language
- Web Dev consists of 3 elements:
    - HTML
    - CSS
    - JS
- Understanding HTML/CSS helps when scraping a website
- Recommended tools: Google Chrome as the browser and VS Code (or Sublime Text) as the text editor

#### - **Note**: HTML is **NOT** a programming language. It is a markup language.
" A programming language need to have variable and loop" - We focus in the code inside the <body> since this visible on the website - HTML Tags syntax structure * <tagname>content</tagname> * <tagname> attribute name ="attribute value">content </tagname> ![](https://i.imgur.com/3xgZcBn.png) - Common Tags ![](https://i.imgur.com/L4HPY4P.png) - Sematic Tags ![](https://i.imgur.com/wpW68rv.png) - Know the different between HTML4 & HTML5 ## CSS - Cascading Stylesheet - Used for website layouts and design #### - **Note**: CSS is **NOT** a programming language. - Best practise: separate/code CSS in an external file (.css) - CSS Rule & Selector ![](https://i.imgur.com/SLOXge3.png) ![](https://i.imgur.com/KzMQDhA.png) ## JavaScript ## Web Scrapping ### Scrapping Sequences 1. **Send GET Request to the website** * import requests * r=requests.get() 2. **Parse the Raw Text with BeautifulSoup** * from bs4 import BeautifulSoup * **soup = BeautifulSoup(r.text, 'html.parser')** * print(soup.prettify()[:500]) 3. **Extract information** **3a. Find the tag object:** `<soup>.find(<tag>, {<attribute1>:<value1>, <attribute2>:<value2>}): `Return the FIRST occurence of <tag> with <attributes> equal to <values> in <soup> object. Output: Tag Object`` `<soup>.find_all(<tag>, {<attribute1>:<value1>, <attribute2>:<value2>}): `Return ALL occurences of <tag> with <attributes> equal to <values> in <soup> object. Output: ResultSet (List) containing one or many Tag Objects.`` `<soup>.<tag>: `Return FIRST occurence of <tag> in <soup> object. Output: Tag Object.`` **3b. Find the children tag 3c. Data is the CONTENT of the tag (can see in the website) 3d. Data is the VALUE of an ATTRIBUTE of the tag (hidden from the website)** **3e. Putting it all together with Main Component/ package into Functions**