Python pandas Quick Reference === {%hackmd BJrTq20hE %} ## Read/Write CSV ```python import pandas as pd a = pd.read_csv('input.csv') a.to_csv('output.csv', index=False) ``` ## Merge DataFrames ### Merge by a common column "Merge" two data CSV files that * the first line is the column header * at least 1 common column exists in both CSV files ```python import pandas as pd a = pd.read_csv('file1.csv') b = pd.read_csv('file2.csv') a = a.merge(b, on="CommonColumn", how="outer") a.to_csv('output.csv', index=False) ``` Notes: * The option `how` determines how the merge is performed. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html ### Append new rows of identical columns "Merge" two data CSV files that * the first line is the column header * the column headers are the same * the rows are from different data sets ```python import pandas as pd a = pd.read_csv('file1.csv') b = pd.read_csv('file2.csv') # Introduce a new column 'DataSource', if necessary, to indicate the source of the entry a['DataSource'] = 'file1' b['DataSource'] = 'file2' merged_data = pd.DataFrame() merged_data = merged_data.append(a, ignore_index=True, sort=False) merged_data = merged_data.append(b, ignore_index=True, sort=False) merged_data.to_csv('output.csv', index=False) ``` ## Add new columns ```python d = { 'Name': ['Alice', 'Bob'], 'Math': [100, 90], 'English': [80, 85], } df = pd.DataFrame(data=d) """ Name Math English 0 Alice 100 80 1 Bob 90 85 """ # Add a new column to all rows df['Class'] = 'ClassA' # Adda new column based on other columns df['Avg'] = df.apply(lambda row: (row['Math'] + row['English']) / 2, axis=1) ``` [Back to @lancec](/@lancec) ###### tags: `quick-ref`