Pandas - HackMD

# Pandas ###### tags: `python` > using python3 in this document :) ### Read Excel ###### Method 1 ```python= import xlrd wb = xlrd.open_workbook("hw1_table.xlsx") sheet = wb.sheet_by_index(1) nrow = sheet.nrows ncol = sheet.ncols sheet.cell_value(3, 2) # (row, column) ``` ###### Method 2 (recommended!!) ```python= import pandas as pd xl = pd.ExcelFile("hw1_table.xlsx") # header default = 0, 表示標頭(內容從下一行開始) df = xl.parse("sheet_Name", header=2) ``` ### Read csv > look up [pandas](https://pythonspot.com/read-excel-with-pandas/) package > chinese version: [pandas tutorial](https://medium.com/@weilihmen/python-pandas-%E5%9F%BA%E6%9C%AC%E6%93%8D%E4%BD%9C%E6%95%99%E5%AD%B8-%E6%88%90%E7%B8%BE%E8%A1%A8-f6d0ec4f89) ```python= import sys import numpy as np import pandas as pd import matplotlib.pyplot as plt import statsmodels.api as sm df = pd.read_csv('all_3gram.csv', encoding = "ISO-8859-1") print(df.columns) # titles print(df.head(3)) # first 3 rows ``` ### Methods ```python= # get row count rows = df.shape[0] # get column count columns = df.shape[1] # get row by row index df.loc[index] ``` ### Create Dataframe ```python= scores = [{"姓名":"小華","TF":90, "DF":80}, {"姓名":"賀明","TF":70, "DF":55}, {"姓名":"小李", "DF":75}] score_df = pd.DataFrame(scores) # output: # DF TF 姓名 # 0 80 190.0 小華 # 1 55 70.0 賀明 # 2 75 NaN 小李 # replace Nan with 0 score_df.fillna(0, inplace = True) # replace empty data with 0 d = {"姓名":"小華", "DF":75} t = pd.DataFrame(columns = ["姓名", "TF", "DF"]) t = t.append(d, ignore_index = True).fillna(0) # 姓名 TF DF # 0 小華 0 75 ``` ### Append a row `d` can be `dictionary`, List of `dictionary` , or `pd.DataFrame` ```python= d = {"姓名":"小", "TF":75, "DF":70} score_df = score_df.append(d, ignore_index = True) ``` ### Sort by column ```python= # by column[TF] result = df.sort_values('TF',ascending=False) ``` ### Apply ```python= def SearchWords(row): if("鴻海" in row["內容"]): return 1 return 0 df1["Honhai"] = df1.apply(SearchWords, axis=1) ``` ### Counter ```python= string = "abcdaaa" string.count("a") # output: 4 string.count("ab") # output: 1 ``` ```python= from collections import Counter w = ["a","b","c","b","c","c"] Counter(w) # output: Counter({'a': 1, 'b': 2, 'c': 3}) Counter(w)["c"] # output: 3 Counter(w).most_common() # output: [('c', 3), ('b', 2), ('a', 1)] ``` ### Print to file `.txt` ```python= print("Hello stackoverflow!", file=open("output.txt", "a")) ``` `.xlsx` ```python= with pd.ExcelWriter('keywordPool.xlsx', mode='a') as writer: for sheet in sheets: GetWordBySheet(sheet).to_excel(writer, sheet_name= sheet, index = False) ``` ### Dictionary ```python= a = {} a["key"] = "value" a["key1"] = "value1" # output : {'key': 'value', 'key1': 'value1'} ``` turn all `keys` into a list ```python= keys = list(a.keys()) # output: ['key', 'key1'] ``` turn all `values` into a list ```python= values = list(a.values()) # output : ['value', 'value1'] ``` ### List get `inverse` of list ```python= a = [1,2,3,4,5] a[::-1] # output: [5, 4, 3, 2, 1] ``` 每x個取一個值 Take a value in every x ones ```python= list = [start:end:x] t = a[::2] # output: [1, 3, 5] ``` 同時取 `index`和 `內容` Get both the index and its value ```python= t = ["a","b","c"] for i, element in enumerate(t): print(i, element) # output: # 0 a # 1 b # 2 c ``` ### map map(func, list) ```python= def addition(n): return n + n # We double all numbers using map() numbers = [1, 2, 3, 4] result = map(addition, numbers) print(list(result)) # output : [2,4,6,8] ``` map with lambda ```python= numbers = [1, 2, 3, 4] result = list(map(lambda x: x**2, numbers)) print(result) # output: [1, 4, 9, 16] ``` ### pickle save ```python= dbfile = open('examplePickle', 'wb') pickle.dump(db, dbfile) ``` load ```python= dbfile = open('examplePickle', 'rb') db = pickle.load(dbfile) ``` ### Chinese 中文 in python2 need to add the first four lines and `u` before the desired words. ```python= # -*- coding: utf-8 -*- import sys reload(sys) sys.setdefaultencoding('utf-8') print(u'遠志') # output: 遠志 ``` > can just `print('遠志')` in python 3 ### Threading https://chriskiehl.com/article/parallelism-in-one-line