# Pandas
###### tags: `python`
> using python3 in this document :)
### Read Excel
###### Method 1
```python=
import xlrd
wb = xlrd.open_workbook("hw1_table.xlsx")
sheet = wb.sheet_by_index(1)
nrow = sheet.nrows
ncol = sheet.ncols
sheet.cell_value(3, 2) # (row, column)
```
###### Method 2 (recommended!!)
```python=
import pandas as pd
xl = pd.ExcelFile("hw1_table.xlsx")
# header default = 0, 表示標頭(內容從下一行開始)
df = xl.parse("sheet_Name", header=2)
```
### Read csv
> look up [pandas](https://pythonspot.com/read-excel-with-pandas/) package
> chinese version: [pandas tutorial](https://medium.com/@weilihmen/python-pandas-%E5%9F%BA%E6%9C%AC%E6%93%8D%E4%BD%9C%E6%95%99%E5%AD%B8-%E6%88%90%E7%B8%BE%E8%A1%A8-f6d0ec4f89)
```python=
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
df = pd.read_csv('all_3gram.csv', encoding = "ISO-8859-1")
print(df.columns) # titles
print(df.head(3)) # first 3 rows
```
### Methods
```python=
# get row count
rows = df.shape[0]
# get column count
columns = df.shape[1]
# get row by row index
df.loc[index]
```
### Create Dataframe
```python=
scores = [{"姓名":"小華","TF":90, "DF":80},
{"姓名":"賀明","TF":70, "DF":55},
{"姓名":"小李", "DF":75}]
score_df = pd.DataFrame(scores)
# output:
# DF TF 姓名
# 0 80 190.0 小華
# 1 55 70.0 賀明
# 2 75 NaN 小李
# replace Nan with 0
score_df.fillna(0, inplace = True)
# replace empty data with 0
d = {"姓名":"小華", "DF":75}
t = pd.DataFrame(columns = ["姓名", "TF", "DF"])
t = t.append(d, ignore_index = True).fillna(0)
# 姓名 TF DF
# 0 小華 0 75
```
### Append a row
`d` can be `dictionary`, List of `dictionary` , or `pd.DataFrame`
```python=
d = {"姓名":"小", "TF":75, "DF":70}
score_df = score_df.append(d, ignore_index = True)
```
### Sort by column
```python=
# by column[TF]
result = df.sort_values('TF',ascending=False)
```
### Apply
```python=
def SearchWords(row):
if("鴻海" in row["內容"]):
return 1
return 0
df1["Honhai"] = df1.apply(SearchWords, axis=1)
```
### Counter
```python=
string = "abcdaaa"
string.count("a") # output: 4
string.count("ab") # output: 1
```
```python=
from collections import Counter
w = ["a","b","c","b","c","c"]
Counter(w)
# output: Counter({'a': 1, 'b': 2, 'c': 3})
Counter(w)["c"]
# output: 3
Counter(w).most_common()
# output: [('c', 3), ('b', 2), ('a', 1)]
```
### Print to file
`.txt`
```python=
print("Hello stackoverflow!", file=open("output.txt", "a"))
```
`.xlsx`
```python=
with pd.ExcelWriter('keywordPool.xlsx', mode='a') as writer:
for sheet in sheets:
GetWordBySheet(sheet).to_excel(writer, sheet_name= sheet, index = False)
```
### Dictionary
```python=
a = {}
a["key"] = "value"
a["key1"] = "value1"
# output : {'key': 'value', 'key1': 'value1'}
```
turn all `keys` into a list
```python=
keys = list(a.keys())
# output: ['key', 'key1']
```
turn all `values` into a list
```python=
values = list(a.values())
# output : ['value', 'value1']
```
### List
get `inverse` of list
```python=
a = [1,2,3,4,5]
a[::-1] # output: [5, 4, 3, 2, 1]
```
每x個取一個值
Take a value in every x ones
```python=
list = [start:end:x]
t = a[::2] # output: [1, 3, 5]
```
同時取 `index`和 `內容`
Get both the index and its value
```python=
t = ["a","b","c"]
for i, element in enumerate(t):
print(i, element)
# output:
# 0 a
# 1 b
# 2 c
```
### map
map(func, list)
```python=
def addition(n):
return n + n
# We double all numbers using map()
numbers = [1, 2, 3, 4]
result = map(addition, numbers)
print(list(result))
# output : [2,4,6,8]
```
map with lambda
```python=
numbers = [1, 2, 3, 4]
result = list(map(lambda x: x**2, numbers))
print(result)
# output: [1, 4, 9, 16]
```
### pickle
save
```python=
dbfile = open('examplePickle', 'wb')
pickle.dump(db, dbfile)
```
load
```python=
dbfile = open('examplePickle', 'rb')
db = pickle.load(dbfile)
```
### Chinese 中文 in python2
need to add the first four lines and `u` before the desired words.
```python=
# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print(u'遠志') # output: 遠志
```
> can just `print('遠志')` in python 3
### Threading
https://chriskiehl.com/article/parallelism-in-one-line