# Week 1: 16 - 20.11.2020

# Monday: Data, ML

## What is data

* Data is a collection of facts
* Kinds of data:
    * Quantitative (discrete & continuous): numerical information
    * Qualitative: descriptive information
* Why do we need data:
    * Data helps us gain insight into things so we can make better decisions
    * Benefits: fitting the purpose, saving costs, standing out from the crowd...
* How to collect data
    * The Internet of Things (IoT): devices with sensors, software, and other technologies that connect to each other and exchange data
    * Census vs Sample
        * Census: collects data from the entire population; costly and time-consuming
        * Sample: collects data from a subset of the population; the sample must be representative of the population to avoid bias
* Structured data vs Unstructured data
    * Structured data: fits nicely into a relational database; highly organized and easily analyzed
    * Unstructured data: doesn't fit nicely into a spreadsheet or database, e.g. audio and video files, images
    * Unstructured data has been growing rapidly since 2013; 80-90% of data is unstructured right now

## What is Data Science

* The goal of data science is to extract insights from messy data
* DA vs DE vs DS
    * DA (Data Analyst): analyzes data => cleans, visualizes, explores insights,...
    * DE (Data Engineer): develops, constructs, and maintains databases => builds data pipelines, ETL, deploys ML models
    * DS (Data Scientist): analyzes and interprets complex data => DA tasks + ML

## What is machine learning

* Machine learning gives computers the ability to learn without being explicitly programmed

## AI vs Machine Learning vs Deep Learning

* AI: enables machines to mimic human behavior
* Machine learning: a subset of AI that uses statistical methods to learn from data
* Deep learning: a subset of machine learning inspired by the human brain (neural networks)

## How does machine learning work?
* Traditional programming: the programmer defines the rules, evaluates the software's performance, then adjusts or updates the rules by hand
* ML: the computer learns to solve the problem automatically when given enough data

# Tuesday - Basic Python

## Functions

Functions that operate on other functions (taking them as arguments or returning them) are called "higher-order functions".

## Booleans and Conditionals

* Bool
* Comparison operations
* XOR: True when exactly one of the two booleans is True (for bools, `a != b` or `a ^ b`)
* `and` has a higher precedence than `or`, but a safer bet is to just use parentheses => prevents bugs and is clearer to others
* Boolean conversion
    * `bool()`: all numbers are treated as True except zero (`0`, `0.0`), and all strings are treated as True except the empty string `""`
    * Empty lists/tuples/dicts... are treated as False, so `bool()` can check whether a container is empty
    * Any object can be used directly in an `if` condition; Python converts it to a bool

## List

* Syntax to create: `[]`
* List index: zero-based indexing
* Slice (the ending index itself is excluded):

```
list[starting_index:ending_index:jumping_interval]
```

* Lists are mutable, meaning they can be modified "in place"
* List functions:

```
numbers = [3, 1, 2]
len(numbers)                   # 3
sorted(numbers, reverse=True)  # [3, 2, 1], returns a new list
sum(numbers)                   # 6
min(numbers)                   # 1
max(numbers)                   # 3
```

* List methods: functions attached to an object

```
words = ["abc", "xyz"]
words.append("def")   # adds "def" to the end of the list
words.pop()           # removes and returns the last element ("def")
# Searching lists
words.index("abc")    # 0, the index of the first match (ValueError if absent)
# the `in` operator determines whether a list contains a particular value
"abc" in words        # True
```

## Tuple

* Syntax to create: `()`
* They cannot be modified (they are immutable)
* Tuples are generally slightly faster than lists
* It makes your code safer if you "write-protect" data that does not need to be changed
* Tuples can be used as dictionary keys as long as all their elements are hashable (e.g. `(1, 2)` works, `([1], 2)` does not)

## For loop vs While loop

* For loop: executes a block of code once for each item of a string/list/tuple/dict...
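```
for i in range(5):
    print(i)
```

* While loop: repeats as long as a condition holds (or until `break`)

```
i = 0
while True:
    print(i, end=' ')
    i += 1
    if i > 10:
        break
```

## List comprehension

```
# [expression for item in iterable if condition]
evens_squared = [x * x for x in range(10) if x % 2 == 0]  # [0, 4, 16, 36, 64]

# This is equivalent to:
evens_squared = []
for x in range(10):
    if x % 2 == 0:
        evens_squared.append(x * x)
```

## String

* Backslash `\` is the escape character (e.g. `\n` for a new line, `\'` for a quote inside a quoted string)
* Methods:
    * `string.split()` => output: list
    * `string.join(list)` => output: string (the string is used as the separator)
    * `string.format()`

```
name = 'tom'
age = 28
print("My name is " + name)
print(f"My name is {name}")
print("My name is {} and my age is {}".format(name, age))
```

```
My name is tom
My name is tom
My name is tom and my age is 28
```

## Dictionaries

* Syntax to create: `{}`
* Methods:
    * `dict.keys()`
    * `dict.values()`
    * `dict.items()`

# Wednesday - Intermediate Python

## List comprehension

* Comprehensions can create dictionaries or sets too

```
square_dict = {x: x * x for x in range(5)}  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
square_set = {x * x for x in [-1, 1]}       # {1}
```

* If you don't need the value from the list, use `_` as the variable name:

```
even_numbers = [x for x in range(5) if x % 2 == 0]  # [0, 2, 4]
zeros = [0 for _ in even_numbers]  # [0, 0, 0], same length as even_numbers
```

* A list comprehension can include multiple `for`s (like nested for loops):

```
pairs = [(x, y) for x in range(2) for y in range(3)]
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

## Iterables and Generators

* Create a generator:
    * with `yield`
    * with a comprehension wrapped in parentheses

```
evens_below_20 = (i for i in range(20) if i % 2 == 0)  # use () instead of []
print(evens_below_20)        # <generator object <genexpr> at 0x...>, not the values
print(list(evens_below_20))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

* Generators can't be indexed; you can only iterate over them, and only once
* `range` is lazy like a generator, but it is actually a sequence type: it supports `len()` and indexing and can be iterated many times

## Automated Testing via assert

* "Write test cases before you write code"
* `assert condition, "message"` raises an `AssertionError` with that message when `condition` is False

## Zip and Argument Unpacking

* The `zip` function transforms multiple iterables into a single iterable of tuples of corresponding elements

```
# these two lists have the same length
list1 = ['a', 'b', 'c', 'd', 'e']
list2 = [1, 2, 3, 4, 5]

result = []
for i in range(len(list1)):
    result.append((list1[i], list2[i]))
result  # [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)]

# zip is lazy: it produces the pairs one at a time, so loop over it (or call list() on it)
for pair in zip(list1, list2):
    print(pair)  # ('a', 1), then ('b', 2), ... up to ('e', 5)
```

* Unzip: the opposite of zip; `letters, numbers = zip(*pairs)` turns a list of pairs back into two tuples (the `*` unpacks the list into separate arguments)
* `*args` collects extra positional arguments into a tuple; `**kwargs` collects extra keyword arguments into a dictionary

**Input**

```
def f(*args, **kwargs):
    print('args =', args)      # a tuple of the positional arguments
    print('kwargs =', kwargs)  # a dictionary of the keyword arguments
    print("----------------")
    for name in args:
        print('Hi ' + name)
    for key in kwargs:
        print(key, '=', kwargs[key])
```

```
f('Tom', 'Quan', 'Nhan', 'Ai', year='1992', school='cool', some_number=1234)
```

**Output**

```
args = ('Tom', 'Quan', 'Nhan', 'Ai')
kwargs = {'year': '1992', 'school': 'cool', 'some_number': 1234}
----------------
Hi Tom
Hi Quan
Hi Nhan
Hi Ai
year = 1992
school = cool
some_number = 1234
```

## Python Decorator

* A decorator takes a function and returns a new function that wraps it; writing `@my_decorator` above `def greeting` is shorthand for `greeting = my_decorator(greeting)`

```
import functools

def my_decorator(func):
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        print("Something is happening before the function is called.")
        result = func(*args, **kwargs)
        print("Something is happening after the function is called.")
        return result  # pass the wrapped function's return value through
    return wrapper

@my_decorator
def greeting(name):
    print("Hi", name)

greeting('coderschool')
```

## Regular Expressions

![](https://i.imgur.com/xKmGJnY.png)

```
import re

text = "Contact me at tom@example.com"    # made-up sample text
email_regex = r"[\w.+-]+@[\w-]+\.[\w.]+"  # simplified email pattern, not a full validator
re.sub(email_regex, 'EMAIL_HERE', text)   # 'Contact me at EMAIL_HERE'
re.split('[aeiou]', 'consequential')      # ['c', 'ns', 'q', '', 'nt', '', 'l']
re.match("a", "cat")       # None: match() only looks at the start of the string
re.search("c", "dog")      # None: there is no "c" anywhere in "dog"
re.split("[ab]", "carbs")  # ['c', 'r', 's']
```

# Thursday - Bash & Git

## Bash commands

* First commands
    * `echo "Hello World"`
    * `date`
    * `pwd`: print working directory
    * `ls`: list directory contents
    * `cd`: change directory
    * `history`: show previously entered commands
* Working with the filesystem
    * Root directory: `/`
    * Home directory: `~`
* Working with files and directories
    * `mkdir dir1 dir2 dir3` creates three directories
    * `rmdir dir1 dir2 dir3` removes the directories (only if they are empty)
    * `rm -R dir` deletes a non-empty directory and everything inside it
    * `cp file1 file2` copies a file; `-i` (interactive mode) asks before overwriting an existing file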
    * `mv file1 file2` moves (or renames) a file
    * `touch file_name` creates an empty file
* File inspection
    * `cat file_name` views a file's contents, or concatenates several files
    * `less large_file` displays one page at a time; space bar to page down, `q` to quit (with several files: `:n` moves to the next file, `:p` goes back, `:q` quits)
    * `head -n 3 file_name` prints the first 3 lines of the file
    * `tail -n 3 file_name` prints the last 3 lines of the file
    * `shuf -n 3 file_name` prints 3 random lines of the file
    * `wc file_name` counts the number of lines, words, and characters in the file
    * `column -s"," -t example_data.csv` displays a CSV file as aligned columns (be careful using `column` on very large files)
    * `sort file_name` sorts the lines (`-r` to reverse, `-u` to remove duplicates)
* grep
    * `grep Hello greetings.txt`: finds lines containing the word "Hello"
    * `grep --ignore-case hello greetings.txt` = `grep -i hello greetings.txt` (options longer than one letter start with two dashes)
    * `grep -E '[Hh]ello' greetings.txt`: applies an (extended) regex
    * `grep --invert-match hello greetings.txt` = `grep -v hello greetings.txt`: finds every line that does not contain "hello"
    * `grep -i -v hello greetings.txt`: finds every line that does not contain "hello" in any letter case
    * `grep -r Hello folder/`: searches every subfolder and file for the string "Hello"
* sed
    * `sed 's/Hello/Goodbye/' greetings.txt`: `s` (substitute) replaces the first "Hello" on every line with "Goodbye". sed prints the changed text but does not modify the file (check with `cat greetings.txt`)
    * `sed -E 's/Hello|hello/GOODBYE/' greetings.txt`: uses an (extended) regex
    * `sed -i'.old' 's/Hello/Goodbye/' greetings.txt`: changes the file in place and keeps the original as the backup `greetings.txt.old` (GNU sed appends the suffix to the file name). To revert, restore the backup with `mv greetings.txt.old greetings.txt`
    * `sed -i'' 's/Hello/Goodbye/' greetings.txt`: changes the file without a backup (GNU sed; on macOS/BSD sed write `sed -i '' ...` with a space)

## Git

![](https://i.imgur.com/BSOzWlQ.png)

![](https://i.imgur.com/90FRwk4.png)

# Friday - GitHub, HTML & CSS & Web Scraping

## HTML

* HTML is not a programming language
* HTML (HyperText Markup Language) is a markup language for documents designed to be displayed in a web browser
* Tag syntax
    * `<tagname>content</tagname>`
    * Tags can have attributes: `<tagname attribute="value">`
    * The end tag starts with a forward slash: `</tagname>`
    * Some tags close themselves, e.g. `<br>` and `<img>`

![](https://i.imgur.com/XxxLvRu.png)

## CSS (Cascading Style Sheets)

* NOT a programming language
* Used for website layout and design
* Can be extended with preprocessors such as Sass or Less
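To see the HTML tag syntax from Friday (start tags, attributes, end tags with a forward slash, self-closing tags) the way a scraper sees it, here is a minimal sketch using Python's standard-library `html.parser.HTMLParser`. The HTML snippet and the `TagLister` class are made up for illustration; real scraping projects often use a third-party parsing library instead.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Hypothetical helper: records every tag and text event the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs, e.g. [('href', '/about')]
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        # end tags are the ones written with a forward slash: </tag>
        self.events.append(("end", tag, {}))

    def handle_startendtag(self, tag, attrs):
        # called for tags written as self-closing, like <br/>;
        # a plain <br> would arrive through handle_starttag with no end tag
        self.events.append(("self-closing", tag, dict(attrs)))

    def handle_data(self, data):
        # the text content between tags (whitespace-only text is skipped)
        if data.strip():
            self.events.append(("text", data.strip(), {}))


# A tiny made-up page fragment
html_doc = '<p class="intro">Hello <a href="/about">about</a><br/></p>'

parser = TagLister()
parser.feed(html_doc)
parser.close()  # flush any buffered input

for event in parser.events:
    print(event)
```

Running it prints one tuple per event, starting with `('start', 'p', {'class': 'intro'})` and ending with `('self-closing', 'br', {})` and `('end', 'p', {})`. Converting `attrs` to a dict makes it easy to look up an attribute such as `href`, which is usually what a scraper is after.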