
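# NLP notebook

## Part one

![](https://i.imgur.com/SIq65bv.png)

```python
import re

a = "This is a string"
b = re.match('^This.*string$', a)   # match: only matches at the start of the string
c = re.search('^This.*string$', a)  # search: scans the whole string for a match
```

## Part two

### random

Normally distributed samples

```python
import numpy as np

np.random.randn(2, 3)                # standard normal; dimensions passed as separate ints
np.random.standard_normal((2, 3))    # standard normal; dimensions passed as one shape tuple
```

### data

Create a NumPy array

```python
import numpy as np

data = [[1, 2], [3, 4], [5, 6]]
shape = (2, 3)

arr = np.array(data, dtype=float)    # build from a nested list; dtype is optional
arr = np.ones(shape)                 # all ones
arr = np.full(shape, 7)              # every element set to the fill value
arr = np.eye(3)                      # 3x3 identity matrix
arr = np.linspace(0, 2 * np.pi, 5)   # 5 evenly spaced points, endpoint included
arr = np.arange(5)                   # 0, 1, 2, 3, 4
```

Array attributes

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])
arr.ndim    # number of dimensions: 2 (an attribute, not a method)
arr.shape   # sizes read from the outermost axis inward: (3, 2)
```

Math

```python
import numpy as np

arr1 = np.array([[1.0, 2.0], [3.0, 4.0]])
arr2 = np.array([[5.0, 6.0], [7.0, 8.0]])

# Element-wise arithmetic
np.add(arr1, arr2)
np.subtract(arr1, arr2)
np.multiply(arr1, arr2)
np.divide(arr1, arr2)
np.sqrt(arr1)

# Products: inner product for 1-D vectors, matrix product for 2-D arrays
np.dot(arr1, arr2)
np.matmul(arr1, arr2)

# Linear algebra
arr1.T                # transpose
np.linalg.inv(arr1)   # inverse matrix
np.linalg.det(arr1)   # determinant
np.linalg.svd(arr1)   # singular value decomposition (the hardest of these)

# Reductions that return a value; each accepts an optional axis (0 = per column, 1 = per row)
arr1.mean()
arr1.max()
arr1.min()
arr1.std()
arr1.var()
arr1.sum()

# Returns an index instead of a value
arr1.argmax()
```

Comparison and Boolean indexing

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 5, 3])
arr1 == arr2          # the result is also an array, of booleans
arr1[arr1 == arr2]    # a boolean array can be used as an index
```

Sort

```python
import numpy as np

arr = np.array([[3, 1], [2, 4]])
np.sort(arr, axis=0)   # sorts each column and returns a sorted copy
```

Slicing

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
arr[:, :]       # every row, every column
arr[0:1, 1:3]   # as with range(), the end index is excluded
```

Conversion

```python
import pandas as pd
import numpy as np

df = pd.read_csv('path')
arr = np.array(df)       # DataFrame to NumPy array
df = pd.DataFrame(arr)   # NumPy array to DataFrame
```

Broadcast

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([10, 20, 30])
arr1 + 5      # a scalar is applied to every element (same for -)
arr1 - arr2   # arr2 is stretched across each row of arr1 (same for +)
```

Reshape

```python
import numpy as np

arr = np.arange(6)
arr.reshape((2, 3))            # equivalent to np.reshape(arr, (2, 3))
np.stack((arr, arr), axis=0)   # join arrays along a new axis
```

### Pandas

Declaration

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])                          # a Series takes data and an index
df = pd.DataFrame([[1, 2], [3, 4]], columns=['x', 'y'], index=['a', 'b'])  # a DataFrame also takes columns
```

Reading and writing files

```python
import pandas as pd

# 'path', url, sql, and conn are placeholders for your own file, page, query, and database connection
fh = pd.read_csv('path')
fh_html = pd.read_html(url)              # returns a list of DataFrames, one per HTML table
fh_query = pd.read_sql_query(sql, conn)
fh.to_csv('path', index=False)           # index=False leaves the row index out of the file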
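fh.to_sql('table', conn, if_exists='replace', index=False)   # if_exists: 'fail', 'replace', or 'append'
```

Head and tail

```python
import pandas as pd

fh = pd.read_csv('path')
print(fh.head(5))   # first 5 rows
print(fh.tail(5))   # last 5 rows
```

Functions

```python
import pandas as pd

fh = pd.read_csv('path')
print(fh.shape)                      # (rows, columns)
print(fh.describe())                 # summary statistics
print(fh.max())
print(fh.mean(numeric_only=True))
print(fh.iloc[0])                    # select by integer position (here the first row)
print(fh[fh['column'] > 500])        # boolean selection, e.g. rows where 'column' exceeds 500
```

Adding and removing data

```python
import pandas as pd

fh = pd.read_csv('path')
fh_add = pd.read_csv('path')
fh = pd.concat([fh, fh_add], ignore_index=True)   # append rows; DataFrame.append was removed in pandas 2.0
fh['new_column'] = pd.Series(range(len(fh)))      # add a column from a Series
fh = fh.drop(['column'], axis=1)                  # drop a column by label
fh = fh.drop([0], axis=0)                         # drop a row by index label
```

Missing values

```python
import pandas as pd

fh = pd.read_csv('path')
fh = fh.dropna()                                           # either drop rows with missing values
fh['column'] = fh['column'].fillna(fh['column'].mean())    # or fill them with the column mean
```

Encoding

```python
import pandas as pd

fh = pd.read_csv('path')
fh['new_column'] = pd.Categorical(fh['original_column']).codes   # one integer code per category
```

Apply

```python
import pandas as pd

fh = pd.read_csv('path')

# Inline lambda
fh['column'] = fh['column'].apply(lambda x: 'high' if x > 500 else 'low')

# Named function
def fun(x):
    return x

fh['column'] = fh['column'].apply(fun)
```

Merge

```python
import pandas as pd

fh1 = pd.read_csv('path')
fh2 = pd.read_csv('path')
fh = pd.merge(fh1, fh2, on='column', how='left')   # how: 'left', 'right', 'outer', or 'inner'
```

Group by

```python
import pandas as pd

fh = pd.read_csv('path')
fh.groupby('column').mean(numeric_only=True)   # or another aggregation such as .sum() or .count()
```

Time series

```python
import pandas as pd
pd.to_datetime('2021-01-31', format='%Y-%m-%d')   # parse a string using an explicit format
```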
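
The pandas snippets above all read from a placeholder `'path'`, so none of them runs without a file. As a rough sketch of how several of them fit together, the example below builds a small in-memory DataFrame and applies date parsing, missing-value filling, `apply`, and `groupby` in turn. The column names (`date`, `city`, `sales`), the sample values, and the `level` label are assumptions made up for illustration; the threshold of 500 is borrowed from the selection example above.

```python
import pandas as pd

# Hypothetical sample data; column names and values are invented for this sketch.
df = pd.DataFrame({
    "date": ["2021-01-05", "2021-01-20", "2021-02-03", "2021-02-17"],
    "city": ["Taipei", "Tainan", "Taipei", "Tainan"],
    "sales": [600.0, None, 450.0, 800.0],
})

# Time series: parse the strings into datetime values with an explicit format.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Missing values: fill the gap with the mean of the known values.
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Apply: label each row against a simple threshold.
df["level"] = df["sales"].apply(lambda x: "high" if x > 500 else "low")

# Group by: total sales per calendar month (1 = January, 2 = February).
monthly = df.groupby(df["date"].dt.month)["sales"].sum()

print(df)
print(monthly)
```

Assigning the result of `fillna` back to the column, rather than passing `inplace=True` on a column selection, avoids relying on chained assignment, which newer pandas versions may not propagate to the original DataFrame.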
