# Data visualization using Python
Collaborative document. Everybody can edit (edit button either top left or top right of this page).
- this page: https://hackmd.io/@bast/data-visualization-python
- course material: https://coderefinery.github.io/data-visualization-python/
- chat for follow-up questions: https://coderefinery.zulipchat.com/ (there is a stream `#help` and in that stream a topic `"python and data visualization"`)
## Questions from last week
Moved to here: https://hackmd.io/wk8Ie5NbShSYHMKCt_D_Pg?view to have more space here.
## Day 3
- Seaborn Relplot with many subplots:
- my graphs: I have 18 subplots showing bacterial growth curves (logistic curves) at different temperatures with different amounts of NaCl. I fitted a curve to each of them and plotted the data by looping through my 18 datasets and adding the fitted curve to each graph. I think the solution might be in the last part of this video (https://www.youtube.com/watch?v=4DA_dgc521o&fbclid=IwAR378hcCb1XhLQvD8qabbOEnu8FMTnEAFtOB-D6PScTw89-PNmslbPT0Njw) from 11:25, but I'm not completely sure how to implement it.
- Q1. How to modify labels/text/annotations in individual subplots of a seaborn FacetGrid? For example, I have 18 subplots, and I would like each of them to show its growth rate.
- Here is one solution, which does not mean it is the best solution:
```python
import seaborn as sns
data = sns.load_dataset("tips")
g = sns.relplot(x="total_bill", y="tip", hue="day",
                col="time", data=data, facet_kws={'sharex': False})
g.axes[0, 0].set_xlim(0, 100)
g.axes[0, 0].set_xlabel("one label")
g.axes[0, 1].set_xlim(20, 30)
g.axes[0, 1].set_xlabel("different label")
```
- Q2: Can I have different axes on the various subplots in one seaborn grid plot? For example, I have hours on the x-axis, and some of my data spans 400 hours while other data spans 30 hours. It's difficult to present the data using the same x-axis limits for all plots.
- Solved: I found a way to turn off the shared axis by passing `facet_kws={'sharex': False}` to `sns.relplot`. It now autoscales the x-axis for each subplot.
- Can you show me (Lisa) how to do that? Will run into this problem soon. Thanks
- Q3. General question, are facetgrids and relplots the same thing in seaborn?
- As far as I understand, `relplot` is an interface to `FacetGrid`: it creates a `FacetGrid`, draws on it, and returns it. So the two are not synonymous, but relplots use FacetGrids.
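A quick way to check the relationship between the two is to look at what `relplot` returns. This is a small sketch with a made-up dataset; the column names are placeholders.

```python
import pandas as pd
import seaborn as sns

# Tiny made-up dataset, only to have something to plot
data = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 1, 3],
    "group": ["a", "a", "b", "b"],
})

g = sns.relplot(data=data, x="x", y="y", col="group")

print(type(g).__name__)                  # class of the object returned by relplot
print(isinstance(g, sns.FacetGrid))      # whether it is a FacetGrid
g.set_axis_labels("x value", "y value")  # a FacetGrid method, usable on g
```

Because the returned object is a `FacetGrid`, its methods and attributes (`g.axes`, `g.set_axis_labels`, `g.set_titles`) are available after calling `relplot`.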
- I just recently started my PhD and thus do not have much data yet. I have made some distribution plots, bar plots and pie plots to present some data I have. The only problem I experienced was the "numbers/data" not appearing on the plots where I want them.
- Also practiced cleaning the data in Python, which went fine but probably not efficiently.
- how to plot data with 3 or more dimensions?
- https://www.apnorton.com/blog/2016/12/19/Visualizing-Multidimensional-Data-in-Python/
- https://medium.com/@prasadostwal/multi-dimension-plots-in-python-from-2d-to-6d-9a2bf7b8cc74
- project down to fewer dimensions and more plots
- 2D heatmaps/heatplots where one dimension is the color
- can get a bit difficult to grasp when there are more than 4-5 colors and too many symbols
- but 3-4 dimensions can be ok for live presentations and when it's about inherently 3D properties like molecular structures or geographic profile
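As one illustration of the ideas above, a single 2D scatter plot can carry four dimensions by mapping two extra variables to color and marker size. The data below is random and the column names (`hours`, `od`, `temperature`, `nacl`) are assumptions chosen only for the sketch.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in data with four variables per observation
rng = np.random.default_rng(seed=0)
n = 60
data = pd.DataFrame({
    "hours": rng.uniform(0, 48, n),
    "od": rng.uniform(0, 1, n),
    "temperature": rng.choice([20, 30, 37], n),
    "nacl": rng.uniform(0, 5, n),
})

fig, ax = plt.subplots()
# x and y carry two dimensions, hue (color) the third, size the fourth
sns.scatterplot(data=data, x="hours", y="od",
                hue="temperature", size="nacl", ax=ax)
```

Beyond four or five encoded variables the plot tends to get hard to read, in which case splitting into several subplots (for example with `col=` in `sns.relplot`) is usually the easier route.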
- Multiple hover names in Plotly 3D
- Relevant? https://stackoverflow.com/questions/64220238/multiple-hover-name-for-3d-plot-in-python-plotly
- interactive plots
- plotly animations https://plotly.com/python/animations/
- or if you want interactive plots but cannot or don't want to use plotly, try ipywidgets which can add interactive controls to any Python function, see for instance https://ipywidgets.readthedocs.io/en/latest/
- reworking of the plate well data processing/visualization code
- https://github.com/Andersmb/Bacteria-Growth-Curve-Analysis
- Creating a data analysis pipeline
- I am performing stopped-flow analysis, which outputs 30-40 datasets for each experiment. I have created a pipeline for cleaning and curve-fitting the data. However, my code is slow, and I would like to speed up the curve-fitting part by "remembering" the last fitted parameters and attempting the new fit starting from those values. If the old values don't work, the code should retry with newly generated values.
- `try`/`except` can be used to decide what happens when an exception (error) is raised, so that the code can continue. In the `except` clause it is good to name the specific error instead of catching any error; otherwise you may not notice when unexpected errors get introduced. There can be several statements in the `try` block; once an error happens, Python leaves the `try` block, looks for a matching `except` block, and continues there.
- once the runs get too lengthy, I would look at workflow managers like snakemake or doit (there are many others)
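```python
from glob import glob  # glob finds files matching a pattern
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):  # placeholder model, replace with your own
    return a * np.exp(-b * x)

fitted_data = []
for f in glob('*.bka'):
    x, y = np.loadtxt(f, unpack=True)  # assumes two numeric columns; adapt to your format
    try:
        params, _ = curve_fit(model, x, y)
        fitted_data.append(params)
        # and the rest of your code
    except RuntimeError:  # raised by curve_fit when the fit does not converge
        print(f'Could not fit: {f}')
```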
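The "remember the last parameters" idea can be sketched with the `p0` argument of `scipy.optimize.curve_fit`, which sets the starting values of the fit. The logistic model, the synthetic datasets and the default guess below are all assumptions made for illustration, not the actual stopped-flow pipeline.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(t, amplitude, rate, midpoint):
    """Illustrative model; replace with the function used in your analysis."""
    return amplitude / (1.0 + np.exp(-rate * (t - midpoint)))


# Three synthetic datasets whose true parameters drift slowly
rng = np.random.default_rng(seed=1)
t = np.linspace(0, 50, 100)
true_params = [(1.0, 0.30, 20.0), (1.1, 0.28, 22.0), (1.2, 0.25, 24.0)]
datasets = [logistic(t, *p) + rng.normal(0, 0.01, t.size) for p in true_params]

default_guess = (1.0, 0.1, 25.0)  # used for the first fit and as fallback
last_params = None
fitted = []
for y in datasets:
    # warm start: begin from the previous result when there is one
    guess = default_guess if last_params is None else last_params
    try:
        params, _ = curve_fit(logistic, t, y, p0=guess)
    except RuntimeError:
        # the warm start did not converge: retry once from the default guess
        params, _ = curve_fit(logistic, t, y, p0=default_guess)
    last_params = params
    fitted.append(params)

for params in fitted:
    print(np.round(params, 3))
```

Whether a warm start actually saves time depends on how similar consecutive datasets are; if they differ a lot, the previous parameters may be a worse starting point than a fixed guess.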
- Bar plots / growth curves. potentially trying out seaborn for that
- also needing feedback on tidy data
- got answered by watching other presentation
- Data clean up from csv files, sorting and extracting specific data I need for the visualization
- was answered by other presentations
- Manipulating a pandas dataframe that I created from a PDB file. I would like to add specific information (not just a new column) by somehow tagging specific IDs to other IDs. For example: one column contains amino acid residue numbers 1-420, and I would like to create a new column with the residue ID (alanine, serine etc.) that matches up to the residue numbers.
- solution was to `strip()` away whitespace
```python
pdb_Ca = pdb_final[pdb_final['Atom'] == 'CA']
```
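A minimal sketch of why stripping whitespace matters here, and of how residue numbers could then be mapped to residue names. The table is made up, and the column names `ResNum` and `ResName` are hypothetical; only `Atom`, `pdb_final` and `pdb_Ca` come from the snippet above.

```python
import pandas as pd

# Made-up stand-in for a table parsed from a PDB file; fixed-width parsing
# often leaves padding around the fields
pdb_final = pd.DataFrame({
    "Atom": [" N  ", " CA ", " C  ", " N  ", " CA "],
    "ResNum": [1, 1, 1, 2, 2],
    "ResName": ["ALA", "ALA", "ALA", "SER", "SER"],
})

# Before stripping, comparing against 'CA' matches no rows
print((pdb_final["Atom"] == "CA").sum())

pdb_final["Atom"] = pdb_final["Atom"].str.strip()
pdb_Ca = pdb_final[pdb_final["Atom"] == "CA"]

# Lookup from residue number to residue name, one entry per residue
residue_names = pdb_Ca.set_index("ResNum")["ResName"]

# Tag another table that only contains residue numbers
other = pd.DataFrame({"ResNum": [2, 1, 2]})
other["ResName"] = other["ResNum"].map(residue_names)
print(other)
```

`Series.map` with a lookup Series works well when each residue number appears once in the lookup; for more complex matches, `DataFrame.merge` is the more general tool.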
- Is there a quick way to create/update an environment file/check the dependencies my notebook has?
- `conda env export --from-history`
- `pip freeze` lists everything that is installed, including dependencies; `pip list --not-required` lists only packages that no other installed package depends on
- also have a look at https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/
- also `pip list` works in the notebook directly
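From inside a notebook, the installed versions of specific packages can also be looked up with the standard library's `importlib.metadata` (Python 3.8 or newer). The helper name `installed_versions` and the package list are assumptions for this sketch; note that the lookup uses distribution names, which can differ from import names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Hypothetical list: replace with the packages your notebook imports
for name, ver in installed_versions(["numpy", "pandas", "seaborn"]).items():
    print(f"{name}=={ver}" if ver else f"# {name} is not installed")
```

The printed lines use the `name==version` form that a `requirements.txt` file expects, so they can serve as a starting point for one.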
- Regarding image export and the use of figures as graphs in publications: I have a lot of trouble with the resolution when I export using plotly. I came upon a description that relates to this (paragraph "Background" in https://github.com/plotly/Kaleido). My specific problem is a line plot which contains 1798 x,y points. Whenever I make figures that are relatively small (e.g. 80x60 mm), the .svg files get extremely pixelated even when setting a dpi of 300. I can try to show the specific problem tomorrow, but maybe someone knows a good solution already (I would prefer not to use a different library, since only 1 out of 10 figures causes a headache). Bw Mikkel. I think I have something that works, but I will leave this up for discussion.
- I suggest you create a small example which demonstrates this problem and then it will be easier to test it out on my side. Can also be the real example but something that I can also run on my computer.
- I don't quite understand the export options for Jupyter notebooks (PDF, LaTeX). I tried following manuals online, but it did not quite work the way I want it to (I want to be able to create e.g. PDFs with the code hidden, showing only markdown and plot output).
- This seems to work: https://stackoverflow.com/a/50732747 but it requires installing pandoc. Please write to me if this didn't work. I am thinking about creating a Docker/Singularity image for this to make this easier for me and others in future.
- see also https://jupyterbook.org/intro.html
- how to move the legend box in matplotlib
- tight layout `plt.tight_layout()`
- the legend can also be positioned like this: `ax.legend(loc = 'upper right')`
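To move the legend outside the plotting area, `loc` can be combined with `bbox_to_anchor`. This is a small sketch with made-up data; the anchor coordinates are only one possible choice.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label="quadratic")
ax.plot([0, 1, 2], [0, 1, 2], label="linear")

# loc names the corner of the legend box that gets pinned;
# bbox_to_anchor says where (in axes fractions) that corner goes.
# Here: the legend's upper left corner sits just right of the axes.
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0))

# recompute the margins after placing the legend
fig.tight_layout()
```

With `bbox_to_anchor` omitted, `loc` alone places the legend inside the axes, as in the `'upper right'` example above.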