# NLP notebook
## Part one

```python=
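import re

a = "This is a string"               # Python variables take no type declaration
b = re.match('^This.*string$', a)    # match: tries the pattern at the start of the string only
c = re.search('^This.*string$', a)   # search: scans the string for the first match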
```
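Because the pattern above is anchored with `^`, `re.match` and `re.search` give the same result there. A minimal sketch of where they differ, assuming a hypothetical unanchored pattern `'string'`:

```python
import re

a = "This is a string"

# re.match only tries the pattern at position 0.
m = re.match('string', a)      # None, the text does not start with "string"

# re.search scans forward until the pattern matches somewhere.
s = re.search('string', a)     # match object for the trailing "string"

print(m)           # None
print(s.span())    # (10, 16)
```

Use `re.match` when the text has to start with the pattern, and `re.search` when the pattern may appear anywhere.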
## Part two
### random
Samples drawn from a normal distribution
```python=
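import numpy as np
np.random.randn(3, 2)                # standard normal samples; dimensions are passed as separate arguments (3 rows, 2 columns)
np.random.standard_normal((3, 2))    # same distribution; the shape is passed as one tuple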
```
### data
Create NumPy arrays
```python=
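import numpy as np
arr = np.array([1, 2, 3], dtype=float)   # build from a list; dtype sets the element type
arr = np.ones((2, 3))                    # all ones, with the given shape
arr = np.full((2, 3), 7)                 # given shape, every element set to the fill value
arr = np.eye(3)                          # 3x3 identity matrix
arr = np.linspace(0, 2*np.pi, 5)         # 5 evenly spaced points from 0 to 2*pi, endpoints included
arr = np.arange(5)                       # integers 0 through 4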
```
function
```python=
import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6]])
arr.ndim    # number of dimensions (an attribute, not a method): 2
arr.shape   # sizes read from the outermost axis inward: (3, 2)
```
math
```python=
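import numpy as np
arr1 = np.array([[1.0, 2.0], [3.0, 4.0]])
arr2 = np.array([[5.0, 6.0], [7.0, 8.0]])
np.add(arr1, arr2)        # element-wise, same as arr1 + arr2
np.subtract(arr1, arr2)
np.multiply(arr1, arr2)   # element-wise product, not a matrix product
np.divide(arr1, arr2)
np.sqrt(arr1)
np.dot(arr1, arr2)        # 1-D inputs: inner product; 2-D inputs: matrix product
np.matmul(arr1, arr2)     # matrix product, same as arr1 @ arr2
arr1.T                    # transpose
np.linalg.inv(arr1)       # inverse matrix
np.linalg.det(arr1)       # determinant
np.linalg.svd(arr1)       # singular value decomposition (very hard)
# begin
arr1.mean()               # mean; returns a value
arr1.max()
arr1.min()
arr1.std()
arr1.var()
arr1.sum()
# end: all of the above take an optional axis (axis=0 per column, axis=1 per row)
arr1.argmax()             # returns an index, not a value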
```
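The optional axis on the aggregations above is easy to get backwards. A small sketch on a made-up 3x2 array (the values are illustrative only):

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)

total = arr.sum()               # 21, over every element
col_sums = arr.sum(axis=0)      # [9, 12]: axis 0 is collapsed, one result per column
row_sums = arr.sum(axis=1)      # [3, 7, 11]: axis 1 is collapsed, one result per row

flat_idx = arr.argmax()         # 5: position of the maximum in the flattened array
col_idx = arr.argmax(axis=0)    # [2, 2]: row index of the maximum in each column
```

The axis you name is the one that disappears from the result's shape.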
compare, Boolean index
```python=
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 0, 3])
arr1 == arr2   # element-wise; the result is also an array, of booleans
```
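The boolean array from a comparison can be used directly as an index. A minimal sketch with made-up values:

```python
import numpy as np

arr1 = np.array([1, 5, 3, 8])
arr2 = np.array([1, 4, 3, 9])

mask = arr1 == arr2       # [True, False, True, False]
same = arr1[mask]         # [1, 3]: elements where the mask is True
big = arr1[arr1 > 2]      # [5, 3, 8]: the condition can be written inline
```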
sort
```python=
import numpy as np
arr = np.array([[3, 1], [2, 4]])
np.sort(arr, axis=0)   # sorts each column and returns a sorted copy
```
slicing
```python=
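import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6]])
arr[:, :]   # start:stop per axis, stop excluded as with range(); the result is a view (reference), not a copy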
```
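A short sketch of both points, the excluded stop index and the fact that a basic slice is a view of the original data. The array values are made up for illustration:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

sub = arr[0:2, 1:3]     # rows 0-1, columns 1-2 (stop index excluded)
# [[1 2]
#  [5 6]]

sub[0, 0] = 99          # writes through the view, so arr[0, 1] becomes 99 too
independent = arr[0:2, 1:3].copy()   # .copy() gives an array with its own data
```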
conversion
```python=
import pandas as pd
import numpy as np
df = pd.read_csv('path')
arr = np.array(df)        # to NumPy (column names are dropped)
df = pd.DataFrame(arr)    # to pandas
```
broadcast
```python=
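import numpy as np
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([10, 20, 30])
arr1 + 1       # array +/- number: the number is applied to every element
arr1 - arr2    # array +/- array: the (3,) row is stretched across both rows of arr1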
```
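A sketch of how shapes line up when broadcasting, using made-up arrays. Shapes are compared from the last axis backward, and an axis of size 1 (or a missing axis) is stretched to match:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)
col = np.array([[100], [200]])              # shape (2, 1)

plus_scalar = matrix + 1    # scalar applied to every element
plus_row = matrix + row     # row repeated for each of the 2 rows
plus_col = matrix + col     # column repeated across the 3 columns
```

Shapes such as `(2, 3)` and `(2,)` do not broadcast, because the trailing axes (3 and 2) differ and neither is 1.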
reshape
```python=
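import numpy as np
arr = np.arange(6)
arr.reshape((2, 3))             # the new shape must hold the same number of elements
np.stack((arr, arr), axis=0)    # joins arrays along a new axis; result shape (2, 6)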
```
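A small sketch of how `reshape` and the `axis` of `np.stack` affect the result shape. The arrays are made up for illustration:

```python
import numpy as np

arr = np.arange(6)               # [0 1 2 3 4 5]
grid = arr.reshape(2, 3)         # [[0 1 2], [3 4 5]]
auto = arr.reshape(-1, 2)        # -1 lets NumPy infer that dimension: shape (3, 2)

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
rows = np.stack((a, b), axis=0)  # shape (2, 3): each input becomes a row
cols = np.stack((a, b), axis=1)  # shape (3, 2): each input becomes a column
```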
### Pandas
Declaration
```python=
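import pandas as pd
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'], name='col')                   # a Series takes name=, not column=
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'], index=['r1', 'r2'])   # columns= belongs to DataFrame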
```
Reading and writing files
```python=
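import pandas as pd
# url, sql, table and conn are placeholders for your own values
fh = pd.read_csv('path')
fh_html = pd.read_html(url)                # returns a list of DataFrames, one per table found
fh_query = pd.read_sql_query(sql, conn)    # needs a database connection as well as the query
fh.to_csv('path', index=False)             # index=True also writes the row index
fh.to_sql(table, conn, if_exists='append', index=False)   # if_exists: 'fail', 'replace' or 'append'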
```
Head and tail
```python=
import pandas as pd
fh = pd.read_csv('path')
print(fh.head(5))   # first 5 rows
print(fh.tail(5))   # last 5 rows
```
function
```python=
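import pandas as pd
fh = pd.read_csv('path')
print(fh.shape)                      # (rows, columns)
print(fh.describe())                 # summary statistics
print(fh.max())
print(fh.mean(numeric_only=True))    # skips non-numeric columns
print(fh.iloc[0])                    # index location: select by integer position, with square brackets
print(fh[fh['column'] > 500])        # data selector: keeps rows where the condition holds
# the condition is a boolean Series, e.g. fh['column'] > 500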
```
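A runnable sketch of the selector and `iloc` on an in-memory frame, so no CSV is needed. The column names and values are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c'], 'price': [300, 700, 900]})

mask = df['price'] > 500      # boolean Series: False, True, True
expensive = df[mask]          # keeps the rows where the mask is True
first_row = df.iloc[0]        # one row, selected by integer position
cell = df.iloc[1, 1]          # row 1, column 1: 700
```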
Adding and removing data
```python=
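import pandas as pd
fh = pd.read_csv('path')
fh_add = pd.read_csv('path')
fh = pd.concat([fh, fh_add], ignore_index=True)    # append rows; DataFrame.append was removed in pandas 2.0
fh['new_column'] = pd.Series(range(len(fh)))       # one way to add a column from a Series; 'new_column' is a placeholder
fh = fh.drop(['column'], axis=1)                   # drop a column by name
fh = fh.drop([0], axis=0)                          # drop a row by index label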
```
Missing value
```python=
import pandas as pd
fh = pd.read_csv('path')
fh = fh.dropna()                                           # drop every row that has a missing value
fh['column'] = fh['column'].fillna(fh['column'].mean())    # or fill the gaps with the column mean
```
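Dropping and filling are alternatives, so it helps to see both on the same data. A minimal sketch with a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [1.0, np.nan, 3.0], 'label': ['x', 'y', None]})

n_missing = df.isna().sum()     # missing values per column
dropped = df.dropna()           # keeps only rows with no missing value at all

filled = df.copy()
filled['score'] = filled['score'].fillna(filled['score'].mean())   # mean of 1.0 and 3.0 is 2.0
```

`dropna()` keeps a single row here, while filling keeps all three rows and only replaces the gap in `score`.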
encoding
```python=
import pandas as pd
fh = pd.read_csv('path')
fh['new_column'] = pd.Categorical(fh['original_column']).codes   # one integer code per category
```
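A sketch of what the codes look like, using made-up color labels. When the categories are inferred from the data, they are sorted, and each code is the position of the label in that sorted list:

```python
import pandas as pd

colors = pd.Series(['red', 'green', 'red', 'blue'])
cat = pd.Categorical(colors)

categories = list(cat.categories)      # ['blue', 'green', 'red']
codes = cat.codes.tolist()             # [2, 1, 2, 0]
decoded = cat.categories[cat.codes]    # maps the codes back to the labels
```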
apply
```python=
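import pandas as pd
fh = pd.read_csv('path')
# inline
fh['column'] = fh['column'].apply(lambda x: 'high' if x > 500 else 'low')   # the 500 threshold and 'low' label are placeholders
# function
def fun(x):
    return x
fh['column'] = fh['column'].apply(fun)   # assign back to the column so fh stays a DataFrame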
```
merge value
```python=
import pandas as pd
fh1 = pd.read_csv('path')
fh2 = pd.read_csv('path')
fh = pd.merge(fh1, fh2, on='column', how='left')
# how: 'left', 'right', 'outer' or 'inner' (the default)
```
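A sketch of how the join method changes which keys survive, on two made-up frames that share only ids 2 and 3:

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})
right = pd.DataFrame({'id': [2, 3, 4], 'score': [20, 30, 40]})

inner = pd.merge(left, right, on='id', how='inner')        # ids 2, 3
left_join = pd.merge(left, right, on='id', how='left')     # ids 1, 2, 3; score is NaN for id 1
right_join = pd.merge(left, right, on='id', how='right')   # ids 2, 3, 4; name is missing for id 4
outer = pd.merge(left, right, on='id', how='outer')        # ids 1, 2, 3, 4
```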
group by
```python=
import pandas as pd
fh = pd.read_csv('path')
fh.groupby('column').mean(numeric_only=True)   # any aggregation works here: sum(), count(), max(), ...
```
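A runnable sketch of group by on a made-up frame. The `team` and `points` columns are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'team': ['a', 'b', 'a', 'b'], 'points': [1, 2, 3, 4]})

mean_points = df.groupby('team')['points'].mean()             # a: 2.0, b: 3.0
summary = df.groupby('team')['points'].agg(['sum', 'max'])    # several aggregations at once
```

The group keys become the index of the result, so `mean_points['a']` looks up the mean for team `a`.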
Time Series
```python=
import pandas as pd
pd.to_datetime('2024-01-15', format='%Y-%m-%d')   # format is a strftime pattern; the date is a sample value
```
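Once a column is parsed, the `.dt` accessor exposes the date parts, and subtraction gives time differences. A minimal sketch with made-up dates:

```python
import pandas as pd

raw = pd.Series(['2024-01-15', '2024-02-01'])
dates = pd.to_datetime(raw, format='%Y-%m-%d')

months = dates.dt.month            # 1, 2
years = dates.dt.year              # 2024, 2024
gap_days = (dates[1] - dates[0]).days   # 17
```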