# Week 2 (23-28/11/20)
## Data structures and Algorithms (MON, 23/11/20)
==**Big O Notationion**==
Describes the limiting behaviour of a function when the ==argument== tends **towards infinity**
* **Time complexity** of an algorithm quantifies the amount of time taken by an algorithm to run as a function of the length of the input.
==**Big O Orders**==
* O(1)-constant
* O(logn)-logarithmic
* O(n)-linear
* O(nlogn)-linearithmic
* O(n**2)-quadratic
* O(c**n)-exponential (c is constant)
**Determine O()**
```
a, b=0, 0
n-1000
for i in range(n):
a=a+random.randit(0, 10)
for j in rnage(n):
b=b+random.randit(0,10)
```
---> O(n)
```
i=n
while (i>0):
print(i)
i=i//2
```
---> O(logn)
---
```
a= [random.random() for _ in range(n)]
for i in range (n-1):
for j in range (i+1, n):
if a[i]<a[j]:
a[i], a[j] = a[j], a[i]
```
---> O(n**2)
---
```
def search(arr, n, x):
"""If x is present then return its location, otherwise return -1
"""
for i in range (n):
if (arr[i] == x):
return i;
return -1;
```
---> O(n)
---
**==Searching & Sorting Algorithms==**
**==Binary Search==**

**==Recursion==**
**Recusion** occurs where a *function* being defined is applied **within its own definition**
---> O(logn) is generally the time complexity of **binary search**
Advantages of recursion:
* neat to solve complex problem, easy to understand
* Used in other algorithms (Binary Search)
**Factorial**
n!= n x (n-1)!
e.g.
factorial(n)=nxfactorial(n-1)
factorial(n-1)=(n-1)xfactorial(n-2)
factorial(1)=1
**The Base Case**
---> have to include base case to stop the recursion or else it goes on forever
---
**==Advanced Sorting==**
1. **==Merge Sort==**
* Average case= Worst case= O(nlogn)
* Split everything first then start sorting only when merging process start
2. **==Quick Sort==**
* Average case: O(nlogn)
* Worst case: O(n^2) (when the list has already been sorted: grow tree on 1 side only)
* Given an array, set an element x of array as ==**pivot**==
bubble sort: O(n^2)
merge sort: O(nlogn) for both worst and average case
quick sort: O(nlogn)-average case, O(n^2)-worst case
---
**==Object Oriented Programming(OOP)==**
**Functional programming vs OOP**
**FP**: create a function to solve problems, ie emphasizes on the evaluation of functions
**OOP**: define a **class** then you have an object to fit into the appropriate class
properties=**attributes** ie description of object
**methods**=what kind of action the object can do
```
type(object) ---> bring out class of the object
```
---
## Numpy (TUE, 24/11/20)
==**Numpy**==: a library that allows us to work with large, multi-dimensional arrays and matrices and perform high-level mathematical functions on these arrays.
Python variable:`x=4`
x is a pointer, pointing to a position in memory containing all the Python object information, including the bytes that contain the integer value 4
Numpy array **stores the exact value** (does not need a medium like Python)
--->Numpy is **faster** than Python list
**==Syntax==**
```
tmp = np.array([3.14, 4, 2, 3])
print(tmp)
print(tmp.dtype)
```
Output:
```
[3.14 4. 2. 3. ]
float64 #type is float stored in 64bits
```
---
* When put a mixture of type of input, numpy automatically changes all into 1 type
`np.array([3.14, 4, 2, 3, 'a'])`
Output:
`array(['3.14', '4', '2', '3', 'a'], dtype='<U32')`
```
# if you want to explicitly set the data type
np.array([1, 2, 3, 4], dtype='float32')
```
**==Basics==**
Tensors: a collection of values

Visualizing tensors:
```
x = np.array(6) #Scalar
print ("x: ", x)
# Number of dimensions
print ("x ndim: ", x.ndim)
# Shape
print ("x shape:", x.shape)
# Size of elements (how many numbers does x store)
print ("x size: ", x.size)
# Data type
print ("x dtype: ", x.dtype)
```
---
```
x = np.array([1.3 , 2.2]) #Vector (1 dimensional)
```
---
```
x = np.array([ [1,2] , [3,4] ]) #Matrix (list of list, 2 dimensional
```
---
```
x = np.array([
[[1,2],
[3,4]],
[[5,6],
[7,8]]
]) #3D tensor,
```
**IMAGE**
`imgage.shape` ---> `(427, 640, 3)`
427:height
640:width
3:channels (R,G,B)
pixel: number between 0-255
**Functions**: to create tensors **quickly**
```
print ("a 2x2 matrix full of zeros:\n", np.zeros(shape=(5,2)))
print ("a 2x2 matrix full of ones:\n", np.ones(shape=(2,2)))
print ("a 2x2 identity matrix:\n", np.eye((2))) # identity matrix
print ("a 2x2 matrix full of random numbers between 0 and 1:\n", np.random.random((2,2)))
```
---
`x= np.random.random((2,2,3)) #2 rows, 2 columns, 3 channels`
```
x[0,:,:] #first matrix
x[1,:,:] #second matrix
x[1,1,:] #second row of second matrix
```
**==Indexing==**

**Slicing**
`x[[0,2],:] #first and third row`
**Integer array indexing**
```
x = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(x)
```
```
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
```
```
print ("indexed values: ", x[[0,1,2], [0,2,1]]) # (0, 0), (1, 2), (2, 1)
```
Output:
`indexed values: [ 1 7 10]`
---
**Boolean array indexing**
```
x = np.array([[1, 2], [3, 4], [5, 6]])
print(x.shape)
print ("x:\n", x)
print ("x > 2:\n", x > 2) #compare every single value in matrix with the number 2
print ("x[x > 2]:\n", x[x > 2]) #filter out number > 2
```
Output:
```
(3, 2)
x:
[[1 2]
[3 4]
[5 6]]
x > 2:
[[False False]
[ True True]
[ True True]]
x[x > 2]:
[3 4 5 6]
```
**==Arithmetic==**
**Basic Maths**
`np.add(x, y)) # or x + y`
`np.subtract(x, y)) # or x - y`
`or x * y (different from x.dot(y))`
**Dot product**
`c = a.dot(b)` OR
`c = np.dot(a, b)`
**Axis Operation**
```
x = np.array([[1,2],[3,4]])
print (x)
print ("sum all: ", np.sum(x)) # adds all elements
print ("sum axis=0: ", np.sum(x, axis=0)) # sum across rows meaning collapsing row together--> become 1 row
print ("sum axis=1: ", np.sum(x, axis=1)) # sum across columns ie collapsing columns together become 1 column
```
Output:
```
[[1 2]
[3 4]]
sum all: 10
sum axis=0: [4 6]
sum axis=1: [3 7]
```
**MIN/MAX**
```
x = np.array(
[[1,2,3],
[44,45,0]]
)
print(x)
print(x.shape)
print ("min: ", x.min())
print ("max: ", x.max())
print ("min axis=0: ", x.min(axis=0))
print ("min axis=1: ", x.min(axis=1))
```
Output:
```
[[ 1 2 3]
[44 45 0]]
(2, 3)
min: 0
max: 45
min axis=0: [1 2 0]
min axis=1: [1 0]
```
**==Axis operation==**
axis 0=rows
axis 1=columns
[1] still a vector ---> 1 dimensional
[[1]] turn into matrix of 1x1 --->2 dimensional
**==Broadcasting==**
---
## Basic SQL (WED, 25/11/20)
**==What is Database==**
A **collectio** of data, can be stored+accessed electronically from a computer system
**Database management system (DBMS)**: a software to manage the database
1. Non-relational database
2. Relational database:collection of tables(relations), each consisting a set of **rows and columns**
- Table(Entity): a structured list of data
- Column: table is made up of one or more columns, each of column have a specific **data type**
- Row(record):table can have multiple row
- **Primary key**: ==column== that **uniquely identifies each row** in a table
--->Values in primary key should never be modified/updated
- **Foreign key**: a referencing key used to link two tables together
==**Entity relationship diagram**==


**Mandatory**:every single genre links to at least one track in the tracks table
**Optional**:genre classical but have no tracks,
**One-Many** (1 genre can have different tracks)

mandatory many --------------------------------optional one
**==Structured Query Language==**
Language designed to query, manipulate, and transform data from a relational database
**==Order of Execution==**

---
Print out how many artist there are in the database
```
pd.read_sql_query('SELECT COUNT(Name) FROM artists', conn)
```
==**Queries with Constraint**==
```
SELECT column, another_column, ...
FROM mytable
WHERE conditions
AND/OR another_condition
AND/OR ...;
```
---
#Find all invoices where Total is less than or equal to 1 (dollar)
`pd.read_sql_query('SELECT * FROM invoices WHERE Total<=1',conn)`
---
#Find all invoice in 2011
```
pd.read_sql_query('SELECT InvoiceDate FROM invoices WHERE strftime("%Y",InvoiceDate) = "2011"',conn)
```
OR
```
pd.read_sql_query('SELECT * FROM invoices WHERE InvoiceDate LIKE "2011%"',conn)
```
---
==**Operators for **numerical** data**==

==**Operators form **text** data**==

---
==**Filtering and Sorting query results**==
* Discard rows that have a duplicate column value by using the DISTINCT keyword:
```
SELECT DISTINCT column, another_column, …
FROM mytable
WHERE condition(s);
```
* Sort result + LIMIT+ OFFSET
```
SELECT column, another_column, …
FROM mytable
WHERE condition(s)
ORDER BY column ASC/DESC;
LIMIT num_limit OFFSET num_offset;
```
==**Multi-table queries with JOINS**==
```
SELECT column, another_table_column, …
FROM mytable
INNER JOIN another_table
ON mytable.id = another_table.id
WHERE condition(s)
ORDER BY column, … ASC/DESC
LIMIT num_limit OFFSET num_offset;
```
==**LEFT/ RIGHT/ OUTER JOINS**==
1. Left join
2. Right join
3. Outer join
4. Inner join
5. Cross join
==**Queries with aggregates and GROUP BY**==

```
SELECT group_by_column, AGG_FUNC(column_expression) AS aggregate_result_alias, …
FROM mytable
WHERE condition
GROUP BY column
```
---
#Find the total number of tracks belong to each genre
```
pd.read_sql_query('''
SELECT genres.Name, COUNT(tracks.TrackId) as NumberOfTracks
FROM tracks
JOIN genres
ON tracks.GenreId = genres.GenreId
GROUP BY genres.GenreId
''', conn)
```
---
#Find the number of customers each employee has supported
```
pd.read_sql_query('''
SELECT e.FirstName, e.Title, COUNT(c.CustomerId) AS NumberOfCustomers
FROM employees AS e
LEFT JOIN customers AS c
ON e.EmployeeId = c.SupportRepId
GROUP BY e.EmployeeId;
''', conn)
```
---
#Find the customer who paid the most
```
pd.read_sql_query('''
SELECT c.CustomerId, c.FirstName, SUM(UnitPrice) AS TotalSpending
FROM customers AS c
JOIN invoices AS invoice
ON c.CustomerId= invoice.CustomerId
JOIN invoice_items AS items
ON items.InvoiceId = invoice.InvoiceId
GROUP BY c.CustomerId
ORDER BY TotalSpending DESC
LIMIT 1
''', conn)
```
---
## Advanced SQL (THU, 26/11/20)
**Components of Relational Database:**

* Tables
* Columns (fields)
* Rows (records)
* Scheme
**Relations:**
* one to one
* one to many
* many to many (avoid this relation)
==**NoSQL**==
* Non-relational database, each offers a unique query language
* Data organization: stored in a dictionary, has few/no relations

* Advatage: fast look up
* Disadvantage: difficult to modify information inside, unsuitable for complex query
**Scaling**
* SQL is suitable for vertical scaling
* NoSQL is suitable for horizontal scaling

---
**==SQL vs NoSQL==**

==**Sub-queries**==

A Sub-query is a SQL query nested inside a larger query
* Used to calculate summary value to use for a query
* 2 types of sub query
* Correlated: there's a relationship between value in first SELECT query and value in secon SELECT value
* Uncorrelated
```
SELECT column_names
FROM table_names
WHERE value IN
(SELECT column_names
FROM other_table
WHERE condition)
```
Example:
```
pd.read_sql_query('''
SELECT artists_mini.ArtistId, artists_mini.Name, albums_mini.AlbumID, albums_mini.Title
FROM
(SELECT * FROM artists LIMIT 5) as artists_mini
INNER JOIN
(SELECT * FROM albums LIMIT 5) as albums_mini
ON albums_mini.ArtistId = artists_mini.ArtistId
''', conn)
```
==**Common Table Expression**==
CTE is a temporary result to use with other queries
CTEs help organize complex queries

**==Conditionals==**
* If-else statements in SQL
* Used to map on value to another

==**Set Operators**==
Allow the results multiple queries to be combined together
1. UNION
2. UNION ALL
3. INTERSECT
4. EXCEPT
---
## TEST