---
tags: coderschool, note, datastructure
---

# Week 2: Intermediate Python and Database Fundamentals

Quote of the Week
> Divide and Conquer

___

## Day 1

* Check Bao's web scraper: https://github.com/brookyct95/tiki_scraping
* Check Tien's website layout: https://github.com/txtien/tiki-scraping

### Data Structure

* Sorting algorithms: https://en.wikipedia.org/wiki/Sorting
* Bubble sort: https://www.geeksforgeeks.org/bubble-sort/
* Pseudocode: language-agnostic, theoretical https://www.geeksforgeeks.org/how-to-write-a-pseudo-code/
* Merge sort notes: empty and one-element lists are already sorted. Divide and conquer: split the list into lower and upper halves, sort each half, then merge them
* Most pseudocode is written with C in mind (there's no `len` function in C)
* Learn more about sorting algorithms: https://www.geeksforgeeks.org/sorting-algorithms/
* Use a string in a dictionary-like format for documenting your code
* Merge sort algorithm visualization: https://www.youtube.com/watch?v=ZRPoEKHXTJg
* Sorting algorithm comparison: https://www.youtube.com/watch?v=ZZuD6iUe3Pc

### Time complexity

* A `for` loop can be very resource-consuming: its cost grows with the number of iterations
* An algorithm's running time depends on the size of the input
* Big O notation: https://www.geeksforgeeks.org/analysis-algorithms-big-o-analysis/ Must read: https://www.geeksforgeeks.org/analysis-of-algorithms-set-3asymptotic-notations/
* Logarithms in computer science: https://www.techwalla.com/articles/uses-of-logarithms-in-computers

### Object-Oriented Programming (OOP)

* Classes in Python (needs researching): a Python class is different from a CSS class
* A constructor (`__init__`) sets the initial attributes of a new instance of the class
* `class` is a built-in keyword in Python
* An instance of a class?
* Instance methods take `self` as their first parameter (a naming convention, not a keyword)
* What is inheritance in Python?
* Python magic methods: https://www.tutorialsteacher.com/python/magic-methods-in-python
* Some methods are exclusive to a class
* Other than OOP?
https://www.codenewbie.org/blogs/object-oriented-programming-vs-functional-programming

### Regular Expressions and String Manipulation

Reference: https://www.petefreitag.com/cheatsheets/regex/

* There is no `substring` method in Python (use slicing such as `s[1:4]` instead)
* What is an RFC: https://en.wikipedia.org/wiki/Request_for_Comments
* Internet Message Format: https://www.loc.gov/preservation/digital/formats/fdd/fdd000393.shtml
* Regex syntax differs in some parts between languages
* What is PCRE: https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions
* Inside a character class, `^` means "not any of these": `pattern = re.compile('[^cf]ar')` matches "ar" preceded by any character other than "c" or "f"
* `\b` is a word boundary
* `\s` is a whitespace character
* Python regex exercises: https://regexone.com/
* What does an `r` before a string represent in Python: https://stackoverflow.com/questions/33729045/what-does-an-r-represent-before-a-string-in-python
* 99% of regexes have already been written: https://emailregex.com/
* Learn about Python's `compile` function: https://www.programiz.com/python-programming/methods/built-in/compile

### Binary Search Tree

* What is casting? https://www.peterbe.com/plog/interesting-casting-in-python
* Sometimes you have to be more explicit for the sake of readability
* O(log n) is important: it is the cost of a search in a balanced binary search tree
* Iterative is not recursive: https://medium.com/backticks-tildes/iteration-vs-recursion-c2017a483890
* A leading underscore marks an internal function: https://hackernoon.com/understanding-the-underscore-of-python-309d1a029edc https://stackoverflow.com/questions/53687998/function-name-with-a-leading-underscore
* A node can have at most 2 children; values in the left subtree are smaller than the node, and values in the right subtree are greater
* What is tree traversal: https://en.wikipedia.org/wiki/Tree_traversal
* What is a utility function: https://stackoverflow.com/questions/25060976/what-do-you-mean-by-utility-functions-in-javahow-it-is-related-to-static
* What is a helper function: https://web.cs.wpi.edu/~cs1101/a05/Docs/creating-helpers.html
* What are the advantages of a binary search tree?
https://practice.geeksforgeeks.org/problems/advantages-and-disadvantages-of-bst
* What is a tree structure: https://en.wikipedia.org/wiki/Tree_(data_structure)
* Falsiness: `False`, `None`, `0`, empty list? What are these?
* What is the base case in recursion: https://en.wikipedia.org/wiki/Recursion_(computer_science)
* What is a node? https://en.wikipedia.org/wiki/Node_(computer_science)

___

## Day 3

### Introduction to SQL

Lesson: https://sqlbolt.com/

* What is a database? https://en.wikipedia.org/wiki/Database
* What is a schema? https://en.wikipedia.org/wiki/Database_schema Some examples of schemas? ![](https://i.imgur.com/kCaiMK5.png)
* What are the rules for a primary key? (Or a reference key, a foreign key?)
* In practice we don't delete data; we give it a flag (or status) to indicate its deletion (a soft delete?). This is temporary.
* Schema: the child should inherit the parent's name; 1 parent with many children
* What are `varchar` (variable length) and `char` (fixed length)? https://en.wikipedia.org/wiki/Varchar
* The data types of SQL: https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-ver15
* `varchar` and `char`: how do they compare in performance?
* Why is `char` the fastest? https://www.youth4work.com/Talent/MySql/Forum/118724-what-is-the-difference-between-char-and-varchar
* It's more professional to put the KEYWORD before a query
* What is Google File Stream? https://support.google.com/a/answer/7491144?hl=en
* What is linnaeus in SQL?
* JOIN is INNER JOIN by default
* What is a FULL JOIN?
* HAVING filters the groups produced by GROUP BY, which is itself the result of another clause
* You can GROUP BY a column that is not in the SELECT list
* You can join multiple tables using JOIN
* Remember to install PostgreSQL

___

## Day 4

* Postgres cheat sheet: http://www.postgresqltutorial.com/postgresql-cheat-sheet/
* Postgres uses multiple users and databases as a way to improve security and separation of data
* Try to crawl the category trees for: https://tiki.vn/
* Check one simple dataset before trying bigger datasets
* What is FIFO (first in, first out)? https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)
* What is a deque? https://www.geeksforgeeks.org/deque-in-python/
* What is the `popleft` method? https://pythontic.com/containers/deque/popleft
* What is the path from Main Cat to Sub Cat?
* What is Python multiprocessing? https://docs.python.org/3.4/library/multiprocessing.html?highlight=process
* `concurrent.futures` module in Python: `ProcessPoolExecutor` & `ThreadPoolExecutor`
* Hints for weekly assignments:

> Crawl n products -> store in DB
> Read from DB -> HTML
> Category trees then all products
> Do this using OOP
> 150k products
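
The FIFO, deque, and `popleft` notes above fit together when walking a category tree level by level. Below is a minimal sketch, not the assignment solution: `CATEGORY_TREE`, `get_children`, and `crawl_category_paths` are hypothetical names, and the small in-memory dictionary stands in for whatever a real scraper would read from each category page. Each queue item holds the full path from the main category to the current sub-category.

```python
from collections import deque

# Hypothetical category tree: each key maps to its list of sub-categories.
# In a real crawl these children would come from parsing each category page.
CATEGORY_TREE = {
    "Home": ["Electronics", "Books"],
    "Electronics": ["Phones", "Laptops"],
    "Books": ["Fiction"],
    "Phones": [],
    "Laptops": [],
    "Fiction": [],
}


def get_children(category):
    """Return the sub-categories of a category (stand-in for a scraping call)."""
    return CATEGORY_TREE.get(category, [])


def crawl_category_paths(root):
    """Walk the tree breadth-first and return the path from root to every category."""
    queue = deque([[root]])  # each queue item is the full path to one category
    paths = []
    while queue:
        path = queue.popleft()  # FIFO: the oldest path in the queue comes out first
        paths.append(path)
        for child in get_children(path[-1]):
            queue.append(path + [child])  # children wait behind everything already queued
    return paths


all_paths = crawl_category_paths("Home")
for path in all_paths:
    print(" > ".join(path))
```

Because `popleft` removes items in the order they were added, every category on one level is visited before any category on the next level. Swapping `popleft()` for `pop()` would turn the same loop into a depth-first walk.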