Week 1: Introduction

---
tags: coderschool, machinelearning, note, week 1
---

# Week 1: Introduction

## Objectives

Understand the following concepts:
* What is Machine Learning / AI?
* Role of a Data Scientist / Data Engineer / Machine Learning Engineer
* Basic Programming
* Thinking like a developer
* Learning techniques: active recall, spaced repetition, taking notes
* How to navigate and read popular library documentation
* Manipulate data using the Pandas and NumPy libraries

**Links to note:**
* Emmet Abbreviation: https://docs.emmet.io/cheat-sheet/
* Big O Notation: https://en.wikipedia.org/wiki/Big_O_notation
* Google GPU: https://console.cloud.google.com/
* Python Documentation: https://docs.python.org/3/library/

**Books to read**
* Python Data Science Handbook: https://drive.google.com/open?id=1E0u9xyLmEpdmk2Hbv6Ypw58CdeobTo5Q

___

**Table of Contents**

[TOC]

---

# Day 1 - Monday

## Questions:

**What are the 3 tips and 3 rules of CoderSchool?**
* Believe in yourself, be fit, be patient
* Discipline and consistency, Support, No ego

**What is Machine Learning/AI?**
* Reference: https://en.wikipedia.org/wiki/Machine_learning

**What is the difference between a Software Engineer and a Machine Learning Engineer?**
* Reference: https://towardsdatascience.com/machine-learning-vs-traditional-programming-c066e39b5b17

**What is my expected role in a company after finishing this course?**
* Data cleaning, data visualization, and other junior-level tasks

**Why is Python important to Data Science?**
* Reference:
  * https://steelkiwi.com/blog/python-for-ai-and-machine-learning/
  * https://becominghuman.ai/why-learning-python-is-important-for-machine-learning-aspirants-a97c5ec0629a

**What caused the boom of Machine Learning?**
* Reference: https://en.wikipedia.org/wiki/Timeline_of_machine_learning

**What is Google Colab?**
* Reference: https://www.kdnuggets.com/2018/02/google-colab-free-gpu-tutorial-tensorflow-keras-pytorch.html

**More solutions for team splitting with Python?
(randomly)**
* Reference:
  * https://medium.com/@ercanvural.bm/how-to-build-random-team-generator-by-python-67a1724b3c09
  * https://codeclubprojects.org/en-GB/python/team-chooser/

___

# Day 2 - Tuesday

**Quote of the Day**
>*"Break things into small steps."*

## Questions for reviewing

### 1. How is information stored in binary? What is a bit?

What is “information”? It is anything that carries a meaning, and the meaning is something we, as humans, assign to it. We use many different things to convey and store information: words, numbers, books, CDs, flashing lights, sounds, images, hand signals, and so on.

Take a simple light that tells you whether a given device is on or off. If the light is on, the machine is on. The light conveys information to us. In this case, it is one **[bit](https://en.wikipedia.org/wiki/Bit)** (binary digit), the smallest unit of information. A bit can only have two values: **0 and 1**. What meaning we assign to these two states is up to us. We usually assign **1** to mean **ON** and **0** to mean **OFF**, but that’s our doing, nothing inherent in binary itself.

If we want to store more than a simple on/off state using binary, we need more bits. Bits are not just what the binary number system is made of; the bit is the fundamental unit of information. Bits measure information.

A straightforward way to store information in binary is to assign simple positive integers to different things, and use the direct binary representation of those numbers. The numbers might literally be numbers, or they might be a code that stands for information in a different form, for example letters in a word or block of text. This idea can take us a very long way.

By counting up in binary, we can quickly see how the system works:

0, 1

Now we need a way to represent ‘2’, but we’ve exhausted our symbols.
So we put a ‘1’ in the next place column, representing twos, and keep counting:

2 → 10, 3 → 11

Now we’ve exhausted the capacity of the 2’s column, so we place a ‘1’ in the next position, representing 4s, and continue:

4 → 100, 5 → 101, 6 → 110, 7 → 111

and again,

8 → 1000, 9 → 1001, 10 → 1010, 11 → 1011, 12 → 1100, 13 → 1101, 14 → 1110, 15 → 1111

and so on, as large as we want to go. Each new bit to the left doubles the range of numbers we can represent, just as each new digit to the left in the decimal system multiplies it by ten.

*For example: the number 109 can be represented as 1101101 in binary:*

$1101101_2 = 1 \times 2^0 + 0 \times 2^1 + 1 \times 2^2 + 1 \times 2^3 + 0 \times 2^4 + 1 \times 2^5 + 1 \times 2^6$

$= 1 + 0 + 4 + 8 + 0 + 32 + 64 = 109$

Once we can represent numbers in binary, we can represent many other kinds of information as well, by converting to numbers using an encoding. There is nothing natural about an encoding; it’s just something we humans think up and use. A simple alphabetic code to represent strings of letters could be

A = 1, B = 2, C = 3, D = 4 … Z = 26

That would be a possible way to represent a lot of simple English text as numbers, and hence as binary. But as a code it’s rather poor, as it does not allow punctuation marks, lower case characters, foreign characters, other symbols or other writing systems. Plus, everyone has to agree on what code to use so that we can exchange information; information that cannot be read is not really information at all. The code we all currently agree to use for text is called **[Unicode](https://en.wikipedia.org/wiki/Unicode)**, and is an evolution of an earlier system called **[ASCII](https://en.wikipedia.org/wiki/ASCII)**.

What about other kinds of information, such as sound? How can we represent sound in binary? First we can digitise sound into a stream of discrete samples. A sample is simply a number representing the instantaneous amplitude of the sound signal at a fixed point in time.
By sampling sound sufficiently quickly, we can produce enough samples to reproduce the sound completely at the other end of some communication channel. Because samples are just numbers, we can store them as binary directly, just as with text. We have to agree on what our numbers mean, how that relates to samples, and so on. This agreement defines a **[format](https://en.wikipedia.org/wiki/Audio_file_format)** for a sound file, for example.

What about images? How can we store the information of an image in binary? If we can turn the image into numbers, we can store the numbers in binary form. A process of sampling is used here, just as for sound. But instead of a sample representing the instantaneous amplitude of a sound wave in time, an image sample represents the colour and luminous intensity of a spatial point in the image. The image samples are called **[pixels](https://en.wikipedia.org/wiki/Pixel)**, and there are hundreds of different ways to sample an image and turn it into numbers. To convey that information to another person, again we have to agree what the numbers mean. This agreement is our image file format, for example.

If you can convert anything to numbers, then those numbers can be stored in binary. We only have to agree what the numbers mean, and we can convey any information that way.

Note that with current technology, not everything can be turned into numbers. We don’t know a good way to digitise smells, for example. If we did, we could transmit a smell to another person, then use the numbers to reproduce the smell (using some device) at the other end. The numbers could be turned into binary, just like all numbers.

([Source](https://www.quora.com/How-is-information-stored-in-binary))

*Refer to [CS50's Introduction to Computer Science](https://www.edx.org/course/cs50s-introduction-to-computer-science) for further information.*

### 2. What is a byte?

The smallest unit of data (structured or unstructured) is one bit. It holds the value of a 1, or a 0.
(This is binary coding.) Eight of these 1's and 0's are called a byte. Why eight? Early computers used a variety of byte sizes, but 8-bit units became the norm (notably with IBM's System/360 in the 1960s), so it was natural to write code in sets of 8 bits. This came to be called a byte.

A bit is represented with a lowercase "b," whereas a byte is represented with an uppercase "B". So Kb is kilobits, and KB is kilobytes. A kilobyte is eight times larger than a kilobit.

Eight of these 1's and 0's put together make a byte. The string 10010101 is exactly one byte. So a small GIF image of about 4 KB holds about 4000 groups of eight 1's and 0's. With 8 bits per group, that's over 32,000 (4000 x 8) 1's and 0's for a single GIF image.

How many bytes are in a kilobyte (KB)? One may think it's 1000 bytes, but in computing it has traditionally meant 1024. Why is this so? Early computer engineers, who dealt with the tiniest amounts of storage, noticed that $2^{10}$ (1024) was very close to $10^3$ (1000), so they borrowed the prefix kilo, meaning 1000, and created the KB. (You may have heard of kilometers (km), which are 1000 meters.) So in this convention one KB is 1024 bytes, not 1000. It's a small difference, but it grows with larger units. (The IEC standard names the 1024-byte unit a kibibyte, KiB, and reserves kilobyte for exactly 1000 bytes.)

([Source](https://www.quora.com/What-is-meant-by-%E2%80%9Cbit%E2%80%9D-and-%E2%80%9Cbyte%E2%80%9D))

![One bit and One Byte](https://i.imgur.com/l7tZ7ES.gif)

*Reference:*
* [History of Bits and Bytes in Computer Science](https://o7planning.org/en/11573/history-of-bits-and-bytes-in-computer-science)
* [Why is one byte formed by 8 bits?](https://www.quora.com/Why-is-one-byte-formed-by-8-bits)
* [How big is a petabyte, exabyte or yottabyte? What’s the biggest byte for that matter?](https://www.zmescience.com/science/how-big-data-can-get/)

### 3. What is a computational graph?

A computational graph is a directed graph in which the nodes correspond to mathematical operations. Computational graphs are a way of expressing and evaluating a mathematical expression.
For example, here is a simple mathematical equation:

$$p=x+y$$

We can draw a computational graph of the above equation as follows.

![](https://i.imgur.com/1bzzPSU.jpg)

The above computational graph has an addition node (the node with the "+" sign) with two input variables x and y and one output p.

Let us take another, slightly more complex example. We have the following equation.

$$g=(x+y) \times z$$

The above equation is represented by the following computational graph.

![](https://i.imgur.com/HgKYEVh.jpg)

([Source](https://www.tutorialspoint.com/python_deep_learning/python_deep_learning_computational_graphs.htm))

### 4. What is a flowchart?

A flowchart is a graphical representation of an algorithm. Programmers often use it as a program-planning tool to solve a problem. It uses symbols connected to one another to indicate the flow of information and processing.

The process of drawing a flowchart for an algorithm is known as “flowcharting”.

Example of a flowchart:

![](https://i.imgur.com/PejIoOH.jpg)

### 5. What is pseudo code?

Pseudo code is a term often used in programming and algorithm-based fields. It is a methodology that allows the programmer to represent the implementation of an algorithm. Simply put, it is an informal, made-up representation of an algorithm. Algorithms are often represented with pseudo code because it can be interpreted by programmers no matter what their programming background or knowledge is. Pseudo code, as the name suggests, is a false code, or a representation of code that can be understood even by a layman with some school-level programming knowledge.
Examples:

- If statement with one condition

```
IF you are happy
    THEN smile
ENDIF
```

- If statement with an else section

```
IF you are happy THEN
    smile
ELSE
    frown
ENDIF
```

*Reference:*
* https://blog.usejournal.com/how-to-write-pseudocode-a-beginners-guide-29956242698
* https://www.geeksforgeeks.org/how-to-write-a-pseudo-code/

## Lecture & Lab: Python Programming

### Lecture:

- Refer to [MLE Learning Materials](http://learning.coderschool.vn/courses/full_time_mle_2/unit/1#!tuesday) for Table of Contents
- Refer to [Basic Python Colab file](https://drive.google.com/open?id=1zQ3zuY1oN03rOU64vprK0nF1YcgY02Nf) (copy version) for lecture

### Lab:

- **Problem 1: Find the greatest common divisor of two numbers**

**Pseudo code:**

```
IF a
```

## Sidenotes

* Coding Conventions: https://en.wikipedia.org/wiki/Coding_conventions
* Fibonacci Sequence:
  * https://www.mathsisfun.com/numbers/fibonacci-sequence.html
  * https://www.geeksforgeeks.org/python-program-for-n-th-fibonacci-number/
* Recursive Function:
  * https://www.geeksforgeeks.org/recursive-functions/
  * https://www.quora.com/Is-recursion-good-or-bad-in-programming
* Define Python function with default arguments: https://www.geeksforgeeks.org/default-arguments-in-python/
* 2D array in Python: https://www.tutorialspoint.com/python_data_structure/python_2darray.htm
* 3D array in Python: https://www.educba.com/3d-arrays-in-python/
* Slicing: https://www.journaldev.com/23139/python-slice
* Reverse in Python slicing: https://www.educative.io/edpresso/how-do-you-reverse-a-string-in-python
* What is the use of tuples: http://openbookproject.net/thinkcs/python/english3e/tuples.html
* For loop: iterated variable vs iteration variable
* Strings and tuples are immutable
* Use break to stop the loop
* Data structures: array, list, dictionary
* List comprehension: https://www.digitalocean.com/community/tutorials/understanding-list-comprehensions-in-python-3
* Data Flow Diagram:
https://medium.com/@warren2lynch/data-flow-diagram-comprehensive-guide-with-examples-d9858387f25e
* Swap variables in Python: https://www.programiz.com/python-programming/examples/swap-variables

___

# Day 4 - Thursday

## Bash and Git Command Line

* Absolute path and relative path, in Python X first than y
* cd ./ = current folder
* cd .. = parent folder
* rmdir can't delete a folder with files inside
* rm -R removes everything inside a folder recursively
* Difference between files and folders: https://www.geeksforgeeks.org/difference-between-file-and-folder/
* mv moves a file
* cat concatenates files and prints the result without affecting the original files
* nano opens an editor for a text file: https://linuxize.com/post/how-to-use-nano-text-editor/

**These commands are important**

> mkdir dir1 dir2 dir3 creates three directories

> rmdir dir1 dir2 dir3 removes the empty directories

> rm -R dir deletes a non-empty directory

> cp file1 file2 copies a file; -i (interactive mode) prompts the user before overwriting a file.

> mv file1 file2 moves (or renames) a file

> touch file_name creates an empty file

* We can store commands in a bash script: https://www.taniarascia.com/how-to-create-and-use-bash-scripts/
* What is Vim? A text editor, for example: vim create_flask_app.sh https://medium.com/@fay_jai/what-is-vim-and-why-use-vim-54c67ce3c18e
* How to use grep: need researching
  * grep -E "^1" matches every line that starts with 1
  * grep -E ",$" matches every line that ends with ,
* What is the sed command? Need researching
* What is Git? What is Bash?
* history | tail -n 10: shows the last 10 lines of the command history
* What is a pipeline?

**Command Output**

* standard output and standard error: https://en.wikipedia.org/wiki/Standard_streams
* Redirection operators:
  * `>` redirects output to a file (overwriting it)
  * `>>` appends output to a file
  * `2>` redirects standard error to a file
  * `&>` redirects both standard output and standard error to a file, so the result is saved whatever happens
* sudo is for permission: https://kb.iu.edu/d/amyi
* You can run bash commands on Google Colab
* Review Git command: git --
* git reset --hard followed by a commit hash
* git reset --soft followed by a commit hash
* What is HEAD? A pointer to the currently checked-out commit (identified by its hash)
* git checkout -b creates a new branch and switches to it
* Remember to git add . after every file change
* Exercises to practice git branching: https://learngitbranching.js.org/
* What is a Staging Server? https://www.techopedia.com/definition/4205/staging-server

### Reference:

* http://linuxcommand.org/lc3_wss0010.php
* https://play.typeracer.com/

___

# Day 5 - Friday

* JavaScript controls the behavior of HTML
* CSS is for styling HTML
* What is a constant?
* Learn more about Python requests library: https://realpython.com/python-requests/
* Weekly Assignments:
  * Crawl the website: https://tiki.vn/
  * Check the crawler rules before executing: https://tiki.vn/robots.txt
  * Crawl this category: https://tiki.vn/dien-thoai-may-tinh-bang
  * Information to collect for each product: name, price, description
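
The "check the crawler rules before executing" step of the weekly assignment can be sketched with Python's standard `urllib.robotparser` module. This is a minimal sketch, not the course's solution: the rules in `SAMPLE_ROBOTS_TXT` are hypothetical and are **not** the real contents of https://tiki.vn/robots.txt, and the helper names `build_parser` and `is_allowed` are made up for illustration. Parsing a local string keeps the example runnable without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only. The real rules must be read
# from https://tiki.vn/robots.txt before crawling.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /customer/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = "*") -> bool:
    """Return True if the rules allow `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Under the sample rules, the category page is not disallowed...
category_ok = is_allowed(parser, "https://tiki.vn/dien-thoai-may-tinh-bang")
# ...while anything under /checkout/ is.
checkout_ok = is_allowed(parser, "https://tiki.vn/checkout/cart")

print(category_ok, checkout_ok)
```

In a real crawler, one option is to call `set_url("https://tiki.vn/robots.txt")` followed by `read()` on the parser instead of parsing a local string, and to skip any URL for which `can_fetch` returns `False`.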