# Week 1: 16 - 20.11.2020
# Monday: Data, ML
## What is data
* Data is a collection of facts
* Kinds of data:
* Quantitative (discrete & continuous): numerical information
* Qualitative: descriptive information
* Why do we need data:
* Data gives insight into a problem so that we can make better decisions
* Fitting the purpose, saving costs, standing out from the crowd...
* How to collect data
* The Internet of Things (IoT): physical objects embedded with sensors, software, and other technologies for the purpose of connecting and exchanging data
* Census vs Sample
* Census: collect data from the entire population; costly and time-consuming
* Sample: collect data from a subset of the population; the sample must be representative of the population to avoid bias
* Structured data vs Unstructured data
* Structured data: fits nicely into a relational database, highly organized and easily analyzed
* Unstructured data: doesn't fit nicely into a spreadsheet or database, e.g. audio and video files, images
* The amount of unstructured data has been growing rapidly since around 2013; an estimated 80-90% of data is now unstructured
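To see why a well-drawn sample can stand in for a census, here is a small simulation. It is only a sketch: the population of 10,000 ages is invented (uniform between 18 and 65), and `random.sample` draws a simple random sample without replacement, which gives every member the same chance of being picked. The "biased" sample deliberately takes only the youngest people to show how a non-representative sample misleads.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 ages drawn uniformly between 18 and 65
population = [random.randint(18, 65) for _ in range(10_000)]

# Census: use every member of the population
census_mean = sum(population) / len(population)

# Sample: a simple random sample of 500 people, drawn without replacement
sample = random.sample(population, 500)
sample_mean = sum(sample) / len(sample)

# Biased sample: only the 500 youngest people, so it is not representative
biased = sorted(population)[:500]
biased_mean = sum(biased) / len(biased)

print(f"census mean: {census_mean:.2f}")
print(f"sample mean: {sample_mean:.2f}")  # close to the census mean at 5% of the cost
print(f"biased mean: {biased_mean:.2f}")  # far below the census mean
```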
## What is Data Science
* The goal of data science is to extract insights from messy data
* DA (Data Analyst) vs DE (Data Engineer) vs DS (Data Scientist)
* DA: analyse data => clean, visualize, explore insights, ...
* DE: develop, construct, maintain databases => develop data pipelines, ETL (Extract, Transform, Load), deploy ML models
* DS: analyse and interpret complex data => DA tasks + ML
## What is machine learning
* Gives computers the ability to learn without being explicitly programmed
## AI vs Machine Learning vs Deep Learning
* AI: enables machines to mimic human behavior
* Machine learning: subset of AI that uses statistical methods to learn from data
* Deep learning: subset of machine learning that uses multi-layer neural networks, inspired by the human brain
## How does machine learning work?
* Traditional programming: the programmer defines the rules, evaluates the software's performance, then adjusts or updates the rules by hand
* ML: the computer learns to solve the problem automatically when it is given enough data
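A toy sketch of the contrast above. In the traditional version the programmer writes the Celsius-to-Fahrenheit rule by hand; in the ML-style version the slope and intercept are estimated from example pairs. The learning method here is ordinary least squares on one feature, chosen only because it is short; it is one simple technique among many, and the function names are hypothetical.

```python
# Traditional programming: the programmer writes the rule by hand
def celsius_to_fahrenheit_rule(c):
    return c * 1.8 + 32

# Machine learning (sketch): estimate slope and intercept from examples
# using ordinary least squares for a single input feature
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

celsius = [0, 10, 20, 30, 40]
fahrenheit = [celsius_to_fahrenheit_rule(c) for c in celsius]  # example data

slope, intercept = fit_line(celsius, fahrenheit)
print(round(slope, 2), round(intercept, 2))  # 1.8 32.0: the rule was recovered from data
```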
# Tuesday - Basic Python
## Function
Functions that take other functions as arguments or return functions are called "higher-order functions".
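A minimal illustration of a higher-order function. `apply_twice` and `add_five` are hypothetical names made up for this example; `map` is a real built-in that is itself higher-order.

```python
def apply_twice(func, x):
    """Higher-order function: receives another function and calls it twice."""
    return func(func(x))

def add_five(n):
    return n + 5

print(apply_twice(add_five, 10))       # 20
print(list(map(add_five, [1, 2, 3])))  # [6, 7, 8]: map() also takes a function
```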
## Booleans and Conditionals
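* bool: True or False
* Comparison operators: ==, !=, <, <=, >, >=
* XOR: True when exactly one of two bools is True (for bools, a != b or a ^ b)
* "and" has a higher precedence than "or", but a safer bet is to use parentheses: they prevent bugs and make the intent clearer to others
* Boolean conversion
* bool(): all numbers and strings are treated as True, except 0 (and 0.0) and the empty string ""
* Empty containers (list, tuple, dict, ...) are False, so bool() can check whether a container is empty
* Any object can be used directly in an "if" condition; its truth value is evaluated the same way as with bool()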
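The precedence, XOR, and truthiness rules above can be checked directly in the interpreter:

```python
# "and" binds tighter than "or": this is parsed as True or (False and False)
print(True or False and False)    # True
print((True or False) and False)  # False: parentheses change the result

# XOR on two bools: True when exactly one of them is True
a, b = True, False
print(a != b)  # True
print(a ^ b)   # True (^ also works on bools)

# Truthiness: 0, 0.0, "" and empty containers are False
print(bool(0), bool(0.0), bool(""), bool([]), bool({}))  # False False False False False
print(bool(-1), bool("0"), bool([0]))                     # True True True

items = []
if not items:  # an empty list is falsy, so this branch runs
    print("list is empty")
```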
## List
* Syntax to create : []
* List index: zero-based indexing
* Slice :
```
list[starting_index:ending_index:jumping_interval]   # ending_index itself is excluded
```
* Lists are mutable, meaning they can be modified "in place"
* List functions :
```
len()
sorted(list, reverse=True/False)
sum()
min()
max()
```
* List methods: functions attached to an object
```
list.append(x)      # adds x to the end of the list
list.pop()          # removes and returns the last element of the list
# Searching lists
list.index("abc")   # index of the first "abc"; raises ValueError if it is absent
# The in operator determines whether a list contains a particular value
"abc" in list       # True or False
```
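A runnable walk-through of the slicing syntax, functions, and methods listed above, on a small example list:

```python
numbers = [5, 3, 8, 1, 9, 2]

# Slicing: [start:end:step], the end index is excluded
print(numbers[1:4])   # [3, 8, 1]
print(numbers[::2])   # [5, 8, 9]
print(numbers[::-1])  # [2, 9, 1, 8, 3, 5] (a negative step reverses)

# Built-in functions return new values and leave the list unchanged
print(len(numbers), sum(numbers), min(numbers), max(numbers))  # 6 28 1 9
print(sorted(numbers, reverse=True))  # [9, 8, 5, 3, 2, 1]

# Methods modify the list in place
numbers.append(7)
last = numbers.pop()  # removes and returns 7
print(last, numbers)  # 7 [5, 3, 8, 1, 9, 2]

print(numbers.index(8))  # 2
print(4 in numbers)      # False
```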
## Tuple
* Syntax to create : ()
* They cannot be modified (they are immutable)
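* Tuples are generally a little faster and use less memory than lists
* It makes your code safer if you “write-protect” data that does not need to be changed
* Tuples can be used as dictionary keys as long as all of their elements are hashable (a tuple containing a list cannot)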
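A short demonstration of tuple immutability and of tuples as dictionary keys; the coordinates are arbitrary example values.

```python
point = (3, 4)
try:
    point[0] = 10  # tuples are immutable, so item assignment fails
except TypeError as err:
    print("TypeError:", err)

# A tuple of hashable values can be a dictionary key
distances = {(0, 0): 0.0, (3, 4): 5.0}
print(distances[point])  # 5.0

# A tuple that contains a list is not hashable, so it cannot be a key
try:
    bad = {([1, 2], 3): "x"}
except TypeError as err:
    print("TypeError:", err)
```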
## For loop vs While loop
* For loop : executes some code once for each item of an iterable (each character of a string, each element of a list/tuple, each key of a dict, ...)
```
for i in range(5):
    print(i)
```
* While loop : repeats as long as its condition is True (or until a break exits the loop)
```
i = 0
while True:
    print(i, end=' ')
    i += 1
    if i > 10:
        break
```
## List comprehension
```
[expression for item in list if conditional]

# This is equivalent to:
result = []
for item in list:
    if conditional:
        result.append(expression)
```
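A concrete instance of the template above, checking that the comprehension and the explicit loop give the same list:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Comprehension: squares of the even numbers
squares_of_evens = [n * n for n in numbers if n % 2 == 0]

# Equivalent explicit loop
result = []
for n in numbers:
    if n % 2 == 0:
        result.append(n * n)

print(squares_of_evens)            # [4, 16, 36]
print(squares_of_evens == result)  # True
```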
## String
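* Backslash `\` is the escape character, e.g. `\n` (newline), `\'` (quote), `\\` (backslash)
* Methods:
* string.split() => output: list
* separator.join(list_of_strings) => output: string
* string.format() => fills the {} placeholders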
```
name = 'tom'
age = 28
print("My name is " + name)                                   # concatenation with +
print(f"My name is {name}")                                   # f-string
print("My name is {} and my age is {}".format(name, age))    # str.format()
```
```
My name is tom
My name is tom
My name is tom and my age is 28
```
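A small example of split, join, and backslash escapes. Note that join is called on the separator, not on the list.

```python
sentence = "data science is fun"
words = sentence.split()    # splits on whitespace and returns a list
print(words)                # ['data', 'science', 'is', 'fun']
print("-".join(words))      # data-science-is-fun
print("a,b,,c".split(","))  # ['a', 'b', '', 'c']

# Backslash escapes inside string literals
print("line 1\nline 2")     # \n starts a new line
print('It\'s escaped')      # \' puts a quote inside a single-quoted string
```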
## Dictionaries
* Syntax to create : {} with key: value pairs
* Methods:
* dict.keys() : the keys
* dict.values() : the values
* dict.items() : the (key, value) pairs
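The three methods above in action on a small example dictionary (the names and ages are made up). Dictionaries keep insertion order in Python 3.7+, which is why the printed order is predictable. The example also uses `get()`, which returns a default instead of raising `KeyError`.

```python
ages = {"tom": 28, "quan": 30}
ages["nhan"] = 25  # adds a new key (or updates an existing one)

print(list(ages.keys()))    # ['tom', 'quan', 'nhan']
print(list(ages.values()))  # [28, 30, 25]
for name, age in ages.items():  # items() yields (key, value) pairs
    print(name, age)

print(ages.get("ai", 0))  # 0: the default, because "ai" is not a key
```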
# Wednesday - Intermediate Python
## List comprehension
* Comprehensions can create dictionaries or sets too
```
square_dict = {x: x * x for x in range(5)} # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
square_set = {x * x for x in [-1, 1]} # {1}
```
* If you don't need the values from the list, use _ as the loop variable:
```
even_numbers = [x for x in range(10) if x % 2 == 0]  # [0, 2, 4, 6, 8]
zeros = [0 for _ in even_numbers]  # has the same length as even_numbers
zeros  # [0, 0, 0, 0, 0]
```
* A list comprehension can include multiple fors, which behave like nested loops:
```
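pairs = [(x, y) for x in range(2) for y in range(3)]
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
# equivalent nested loops:
for x in range(2):      # [0, 1]
    for y in range(3):  # [0, 1, 2]
        print(x, y)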
```
## Iterables and Generators
* Two ways to create a generator:
* a function that uses yield
* a comprehension wrapped in parentheses (a generator expression)
```
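evens_below_20 = (i for i in range(20) if i % 2 == 0)  # use () instead of []
print(evens_below_20)        # <generator object <genexpr> at 0x...>: no values computed yet
print(list(evens_below_20))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]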
```
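* A generator can't be indexed (evens_below_20[0] raises a TypeError) and can only be iterated over once
* range is also lazy, but it is not a generator: it is a sequence type that supports indexing (range(20)[3] is 3) and can be iterated over many times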
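A sketch of the yield-based way to build a generator, plus a contrast with range. `countdown` is a hypothetical example function.

```python
def countdown(n):
    """Generator function: yields n, n-1, ..., 1, one value at a time."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))  # 3
print(list(gen))  # [2, 1]: the remaining values
print(list(gen))  # []: a generator is exhausted after one pass

r = range(5)
print(r[2], list(r), list(r))  # 2 [0, 1, 2, 3, 4] [0, 1, 2, 3, 4]
```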
## Automated Testing via assert
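`assert condition, message` raises an AssertionError when the condition is False, which makes it a lightweight way to test code. Good practice: "write test cases before you write code".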
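A minimal example of assert-based tests. `smallest_item` is a hypothetical function under test; a passing assert does nothing, a failing one raises AssertionError with the optional message.

```python
def smallest_item(xs):
    return min(xs)

# Each assert passes silently or raises AssertionError with the given message
assert smallest_item([10, 20, 5, 40]) == 5, "should return the minimum"
assert smallest_item([1, 0, -1, 2]) == -1

try:
    assert 1 + 1 == 3, "1 + 1 should equal 3"
except AssertionError as err:
    print("AssertionError:", err)  # AssertionError: 1 + 1 should equal 3
```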
## Zip and Argument Unpacking
* The zip function transforms multiple iterables into a single iterable of tuples of corresponding elements
```
# these two lists have the same length
list1 = ['a', 'b', 'c', 'd', 'e']
list2 = [1, 2, 3, 4, 5]
result = []
for i in range(len(list1)):
    result.append((list1[i], list2[i]))
result
# [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)]

# zip does the same pairing, but it is lazy, so iterate over it (or wrap it in list())
for pair in zip(list1, list2):
    print(pair)  # ('a', 1), then ('b', 2), ... up to ('e', 5)
```
* Unzip with * (the opposite of zip): zip(*pairs) splits a list of pairs back into separate tuples
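The unzip trick works because `*` unpacks the list into separate positional arguments, so `zip(*pairs)` is the same call as `zip(('a', 1), ('b', 2), ('c', 3))`. The pairs and the `add` helper are example values only.

```python
pairs = [('a', 1), ('b', 2), ('c', 3)]

# zip(*pairs) regroups all first elements together and all second elements together
letters, numbers = zip(*pairs)
print(letters)  # ('a', 'b', 'c')
print(numbers)  # (1, 2, 3)

# * also unpacks a list into the positional arguments of any function
def add(a, b):
    return a + b

print(add(*[1, 2]))  # 3
```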
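* Argument unpacking: *args collects extra positional arguments into a tuple, **kwargs collects extra keyword arguments into a dictionary

**Input**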
```
def f(*args, **kwargs):
    print('args =', args)     # args is a tuple
    print('kargs =', kwargs)  # kwargs is a dictionary
    print("----------------")
    for name in args:
        print('Hi ' + name)
    for key in kwargs:
        print(key, '=', kwargs[key])
```
```
f('Tom','Quan','Nhan','Ai', year='1992', school='cool', some_number=1234)
```
**Output**
```
args = ('Tom', 'Quan', 'Nhan', 'Ai')
kargs = {'year': '1992', 'school': 'cool', 'some_number': 1234}
----------------
Hi Tom
Hi Quan
Hi Nhan
Hi Ai
year = 1992
school = cool
some_number = 1234
```
## Python Decorator
```
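import functools

def my_decorator(func):
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        print("Something is happening before the function is called.")
        result = func(*args, **kwargs)
        print("Something is happening after the function is called.")
        return result  # pass the wrapped function's return value through
    return wrapper

@my_decorator  # same as: greeting = my_decorator(greeting)
def greeting(name):
    print("Hi", name)

greeting('coderschool')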
```
## Regular Expressions

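```
import re

re.sub(email_regex, 'EMAIL_HERE', text)  # replace every match (email_regex and text are placeholders)
re.split('[aeiou]', 'consequential')     # ['c', 'ns', 'q', '', 'nt', '', 'l']
re.match("a", "cat")                     # None: match() only looks at the start of the string
re.search("c", "dog")                    # None: search() scans the whole string, but there is no "c"
re.split("[ab]", "carbs")                # ['c', 'r', 's']
```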
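The re.sub call above uses `email_regex` and `text` without defining them. This runnable sketch fills them in with a hypothetical, deliberately simple e-mail pattern and an example sentence; the pattern is only for illustration and is not a robust e-mail validator. It also uses `re.findall` to list every match.

```python
import re

# Hypothetical, simplified e-mail pattern: name@domain.tld
email_regex = r"[\w.+-]+@[\w-]+\.[\w.]+"
text = "Contact tom@example.com or ai@example.org for details."

print(re.sub(email_regex, "EMAIL_HERE", text))
# Contact EMAIL_HERE or EMAIL_HERE for details.
print(re.findall(email_regex, text))  # ['tom@example.com', 'ai@example.org']

# match() is anchored at the start of the string, search() looks anywhere
print(bool(re.match("c", "cat")), bool(re.match("a", "cat")))  # True False
print(re.search("a", "cat").group())                           # a
```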
# Thursday - Bash & GIT
## Bash command
* First Commands
* echo "Hello World"
* date
* pwd : Print Working Directory
* ls : list the contents of a directory
* cd :change directory
* history
* Working with filesystem
* Root directory: /
* Home directory: ~
* Working with files and directories
* mkdir dir1 dir2 dir3 creates three directories
* rmdir dir1 dir2 dir3 removes the empty directories
* rm -R dir deletes non-empty directory
* cp file1 file2 copies a file; -i (interactive mode) asks the user before overwriting an existing file
* mv file1 file2 moves (or renames) a file
* touch file_name creates an empty file
* File inspection
* cat file_name views a file’s contents or concatenates several files
* less large_file displays one page at a time; space bar to page down, q to quit (with several files: :n moves to the next file, :p goes back, :q quits)
* head -n 3 file_name prints the first 3 lines of the file
* tail -n 3 file_name prints the last 3 lines of the file
* shuf -n 3 file_name prints randomly 3 lines of the file
* wc file_name counts number of lines, words, and characters in the file
* column -s"," -t example_data.csv displays a CSV file as an aligned table (be careful using column on very large files)
* sort file_name sorts the content (-r to reverse, -u to get rid of duplicates)
* grep
* grep Hello greetings.txt: find lines containing the word “Hello”
* grep --ignore-case hello greetings.txt = grep -i hello greetings.txt: case-insensitive search (long options with more than one letter start with two dashes)
* grep -E '[Hh]ello' greetings.txt: apply a regex
* grep --invert-match hello greetings.txt = grep -v hello greetings.txt: find every line that does not contain “hello”
* grep -i -v hello greetings.txt: find every line that does not contain “hello” in any letter case
* grep -r Hello folder/: search every subfolder and file for the string “Hello”
* sed
* sed 's/Hello/Goodbye/' greetings.txt: s (substitute) replaces the first “Hello” on each line with “Goodbye” (add the g flag, 's/Hello/Goodbye/g', to replace every occurrence). sed prints the result but does not change the file; run cat greetings.txt to check
* sed -E 's/Hello|hello/GOODBYE/' greetings.txt: -E enables extended regex, so | means "or"
* sed -i'.old' 's/Hello/Goodbye/' greetings.txt: change the file in place and keep the original as greetings.txt.old. To revert, run mv greetings.txt.old greetings.txt
* sed -i 's/Hello/Goodbye/' greetings.txt (GNU sed) or sed -i '' 's/Hello/Goodbye/' greetings.txt (macOS/BSD sed): change the file without a backup
## Git


# Friday - GitHub, HTML & CSS & Web Scraping
## HTML
* HTML is not a programming language
* HTML (HyperText Markup Language) is a markup language for documents designed to be displayed in a web browser
* Tag Syntax
* `<tagname>content</tagname>`
* Tags can have attributes, e.g. `<p class="intro">`
* The end tag starts with a forward slash: `</tagname>`
* Some tags close themselves (void elements), e.g. `<br>`, `<img>`
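Since Friday also covers web scraping, here is a minimal sketch of how Python sees the tag syntax above, using the standard library's `html.parser.HTMLParser`. The `TagPrinter` class and the HTML snippet are hypothetical examples. The output shows start tags with their attributes, text, end tags, and that the void `<br>` tag produces no end tag.

```python
from html.parser import HTMLParser

class TagPrinter(HTMLParser):
    """Records start tags (with attributes), end tags and non-blank text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))

snippet = '<p class="intro">Hello <a href="https://example.com">link</a><br></p>'
parser = TagPrinter()
parser.feed(snippet)
parser.close()
for event in parser.events:
    print(event)
```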

## CSS (Cascading Stylesheets)
* NOT a programming language
* Used to describe website layout and design (how HTML elements are displayed)
* Can be extended with preprocessors such as Sass or Less