Try   HackMD

Python pandas Quick Reference

Read/Write CSV

import pandas as pd
a = pd.read_csv('input.csv')
a.to_csv('output.csv', index=False)

Merge DataFrames

Merge by a common column

"Merge" two data CSV files that

  • the first line is the column header
  • at least 1 common column exists in both CSV files
import pandas as pd

a = pd.read_csv('file1.csv')
b = pd.read_csv('file2.csv')
a = a.merge(b, on="CommonColumn", how="outer")
a.to_csv('output.csv', index=False)

Notes:

Append new rows of identical columns

"Merge" two data CSV files that

  • the first line is the column header
  • the column headers are the same
  • the rows are from different data sets
import pandas as pd

a = pd.read_csv('file1.csv')
b = pd.read_csv('file2.csv')

# Introduce a new column 'DataSource', if necessary, to indicate the source of the entry
a['DataSource'] = 'file1'
b['DataSource'] = 'file2'

merged_data = pd.DataFrame()
merged_data = merged_data.append(a, ignore_index=True, sort=False)
merged_data = merged_data.append(b, ignore_index=True, sort=False)
merged_data.to_csv('output.csv', index=False)

Add new columns

d = {
    'Name': ['Alice', 'Bob'],
    'Math': [100, 90],
    'English': [80, 85],
}
df = pd.DataFrame(data=d)
"""
    Name  Math  English
0  Alice   100       80
1    Bob    90       85
"""

# Add a new column to all rows
df['Class'] = 'ClassA'

# Adda new column based on other columns
df['Avg'] = df.apply(lambda row: (row['Math'] + row['English']) / 2, axis=1)

Back to @lancec

tags: quick-ref