Python pandas Quick Reference
===
{%hackmd BJrTq20hE %}
## Read/Write CSV
```python
import pandas as pd
a = pd.read_csv('input.csv')
a.to_csv('output.csv', index=False)
```
## Merge DataFrames
### Merge by a common column
"Merge" two data CSV files that
* the first line is the column header
* at least 1 common column exists in both CSV files
```python
import pandas as pd
a = pd.read_csv('file1.csv')
b = pd.read_csv('file2.csv')
a = a.merge(b, on="CommonColumn", how="outer")
a.to_csv('output.csv', index=False)
```
Notes:
* The option `how` determines how the merge is performed. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html
### Append new rows of identical columns
"Merge" two data CSV files that
* the first line is the column header
* the column headers are the same
* the rows are from different data sets
```python
import pandas as pd
a = pd.read_csv('file1.csv')
b = pd.read_csv('file2.csv')
# Introduce a new column 'DataSource', if necessary, to indicate the source of the entry
a['DataSource'] = 'file1'
b['DataSource'] = 'file2'
merged_data = pd.DataFrame()
merged_data = merged_data.append(a, ignore_index=True, sort=False)
merged_data = merged_data.append(b, ignore_index=True, sort=False)
merged_data.to_csv('output.csv', index=False)
```
## Add new columns
```python
d = {
'Name': ['Alice', 'Bob'],
'Math': [100, 90],
'English': [80, 85],
}
df = pd.DataFrame(data=d)
"""
Name Math English
0 Alice 100 80
1 Bob 90 85
"""
# Add a new column to all rows
df['Class'] = 'ClassA'
# Adda new column based on other columns
df['Avg'] = df.apply(lambda row: (row['Math'] + row['English']) / 2, axis=1)
```
[Back to @lancec](/@lancec)
###### tags: `quick-ref`