---
tags: BMMB554, Block1
---
# 1.5. Graphics with Pandas and Python I
[](https://hackmd.io/u_G9WTI7Slu3T53tk6WAYw)
The best way to understand what is in your data is to look at it. Pandas and Python provide ample opportunities for visually representing you data. WE will introduce graphing concepts by first talking about high and low level graphic libraries. Then we will focus of low level graphing and apply it to SARS-CoV-2 variant frequency data.
## High versus low level
In this example we will use [Anscombe's quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet) dataset by borrowing it from [Seaborn](https://seaborn.pydata.org/index.html) package:
```python=
df = sns.load_dataset("anscombe")
```
### High level plotting with [Seaborn](https://seaborn.pydata.org/index.html)
Seaborn is *high level plotting library*. To plot this dataset in seaborn all we need to do is this:
```python=
import seaborn as sns
df = sns.load_dataset("anscombe")
sns.lmplot(x="x",
y="y",
col="dataset",
hue="dataset",
data=df,
col_wrap=2,
ci = None
)
```
which will give us this:

This single command ([`seaborn.lmplot`](https://seaborn.pydata.org/generated/seaborn.lmplot.html)) plots the datapoints and fits a linear regression line to the data. It presents that information in four panels based on `hue` parameter assembled in two columns (`col_wrap = 2`).
This is called **high level** plotting library, because it provides a number of convenience functions. For example, it this case we did not have to explicitly compute regression parameters. The library did this for us. This is great for quick data exploration, but it becomes a bit more problematic when you need full grained control of figure layout.
### Low level plotting with [Bokeh](https://docs.bokeh.org/en/latest/index.html)
Bokeh is a low level library that allows generation of interactive plots for modern web browsers. Initially it seems that plotting a similar figure with Bokeh is much more work:
```python=
import bokeh.io
import bokeh.plotting
from bokeh.models import ColumnDataSource,Quad
from bokeh.plotting import figure, show,output_file
from bokeh.palettes import Set1_7
from bokeh.layouts import gridplot
bokeh.io.output_notebook()
figs = []
for i, dataset in enumerate(df['dataset'].unique()):
if i == 0:
fig = figure(plot_height = 400, plot_width = 600)
else:
fig = figure(plot_height = 400, plot_width = 600, x_range = figs[0].x_range, y_range = figs[0].y_range,)
x = df[df['dataset']==dataset]['x']
y = df[df['dataset']==dataset]['y']
slope,intercept = np.polyfit(x,y, 1)
y_fit = [slope * i + intercept for i in x]
source = ColumnDataSource({'x':x, 'y':y,'y_fit':y_fit})
fig.circle(x='x',y='y',source=source, color = Set1_7[i],size=10)
fig.line(x='x',y='y_fit',source=source,color = Set1_7[i],line_width=4,legend_label='y='+str(round(slope,2))+'x+'+str(round(intercept,2)))
figs.append(fig)
grid = gridplot( [[figs[0],figs[1]],[figs[2],figs[3]]] )
show(grid)
```
which would produce the following image:

While it appears more complicated it is actually much more powerful. One important difference is that this image is **interactive** as you can see [here](https://colab.research.google.com/github/nekrut/BMMB554/blob/master/2021/ipynb/high_versus_low.ipynb).
## Understanding Bokeh
The following [notebook](https://colab.research.google.com/github/nekrut/BMMB554/blob/master/2021/ipynb/Plotting1.ipynb) introduces basics of Bokeh.
## In-class practice
Open [this notebook](https://colab.research.google.com/github/nekrut/BMMB554/blob/master/2021/ipynb/Plotting1_try.ipynb) and complete all exercises. Type your code in cells containing exercise comment:
```
# EXERCISE
```