---
tags: ResBaz2021
---
# Data structures (pandas) and data visualization (matplotlib) in Python
## Wednesday, May 19th, 2021 3\:00-5\:00
[Back to Resbaz HackMD Directory](https://hackmd.io/@ResBaz21/directory)
First, we'll cover how to use Python's pandas module to load and work with data from different sources (Excel, CSV, etc.). Then, in the second half of the session, we'll go through how to visualize data in Python using pandas and matplotlib.
# Link to Jupyter notebook
- https://github.com/artinmajdi/resbaz_2021.git
# Getting Started
### Install these packages
- jupyter notebook
- numpy
- matplotlib
- pandas
- scikit-image
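Optional packages
- opencv
- seaborn (needed for the seaborn import in the notebook)
Install from the command line with pip (package names are separated by spaces, not commas):
pip install numpy matplotlib pandas scikit-image
pip install jupyterlab notebook
pip install opencv-python seaborn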
## Running your first notebook
1. Type in the command line: jupyter notebook
2. Create a new notebook.
3. Import the packages below:
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import skimage
---
## What is pandas
The name pandas derives from "panel data" and is also a play on "Python data analysis". The library is used for data analysis and wrangling in Python.
We can get data into pandas in one of three ways:
1. Create a pandas data frame from a Python list, dictionary, or NumPy array.
2. Use pandas to open a local file, normally a CSV file, but it may also be another delimited text file (like TSV), an Excel file, or something else.
3. Use a URL to open a remote file, such as a CSV or JSON file, or read from a SQL table/database.
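The three routes above can be sketched as follows. This is a minimal illustration, not the workshop notebook: the names and values are made up, and `io.StringIO` stands in for a file on disk so the snippet runs without any external data.

```python
import io

import numpy as np
import pandas as pd

# 1. Build a data frame from in-memory Python objects.
from_dict = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 45]})
from_list = pd.DataFrame([["Ada", 36], ["Grace", 45]], columns=["name", "age"])
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

# 2. Read a delimited file. io.StringIO is a stand-in for a real file here;
#    with a file on disk you would pass its path, e.g. pd.read_csv("data.csv")
#    ("data.csv" is a hypothetical file name).
csv_text = "name,age\nAda,36\nGrace,45\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# 3. pd.read_csv also accepts a URL string in place of a path. Related readers
#    include pd.read_excel, pd.read_json and pd.read_sql.

print(from_csv)
```

For tab-separated files, `pd.read_csv` takes a `sep="\t"` argument.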
### Common uses include:
a) Viewing and analyzing your data
- df.mean() returns the mean of each column
- df.corr() returns the pairwise correlation between columns in a data frame
- df.count() returns the number of non-null values in each data frame column
- df.max() returns the highest value in each column
- df.min() returns the lowest value in each column
- df.median() returns the median of each column
- df.std() returns the standard deviation of each column
If the data frame also contains non-numeric columns, pass numeric_only=True to mean, corr, median, and std; recent pandas versions raise an error otherwise.
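A minimal sketch of these summary methods on a small made-up data frame (the column names and values are illustrative assumptions, not workshop data):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "height_cm": [150.0, 160.0, 170.0, None],  # one missing value
        "weight_kg": [50.0, 60.0, 70.0, 80.0],
    }
)

print(df.mean())      # column means; missing values are skipped by default
print(df.count())     # number of non-null values per column
print(df.corr())      # pairwise correlation matrix
print(df.describe())  # count, mean, std, min, quartiles and max in one table
```

`df.describe()` is often the quickest first look, since it bundles several of these statistics together.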
b) Selection of Data
- df.iloc[positions] selects rows (and optionally columns) by integer position
- df.loc[labels, columns] selects rows and columns by label
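The difference between position-based and label-based selection is easiest to see with a non-numeric index. The sketch below uses a made-up data frame; the index labels and column names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["Tucson", "Phoenix", "Flagstaff"], "temp_c": [38, 41, 27]},
    index=["a", "b", "c"],
)

first_two = df.iloc[0:2]                 # rows at positions 0 and 1 (end excluded)
one_cell = df.iloc[2, 1]                 # row position 2, column position 1
by_label = df.loc[["a", "c"], ["city"]]  # rows labelled "a" and "c", city column
label_slice = df.loc["a":"b", "temp_c"]  # label slices include the end label

print(first_two)
print(by_label)
```

Note that an `iloc` slice excludes its end position, while a `loc` slice includes its end label.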
c) Filter, Sort and Groupby
- df[some_condition]
- df.sort_values(col1)
- df.groupby(col1)
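A short sketch of filtering, sorting, and grouping, assuming a made-up data frame with a categorical column and a numeric column:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "species": ["cat", "dog", "cat", "dog"],
        "weight_kg": [4.0, 20.0, 5.0, 30.0],
    }
)

# Filter: a boolean mask keeps only the rows where the condition is True.
heavy = df[df["weight_kg"] > 4.5]

# Sort: returns a new data frame ordered by the given column.
sorted_df = df.sort_values("weight_kg", ascending=False)

# Group: split rows by species, then aggregate each group.
mean_by_species = df.groupby("species")["weight_kg"].mean()

print(mean_by_species)
```

Several conditions can be combined with `&` and `|`, with each condition wrapped in parentheses.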
d) Data Cleaning
- df.isnull().sum()
- df.notnull()
- df.replace(old_value, new_value)
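The cleaning calls above might be used like this. The data frame and the "?" placeholder are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [1.0, np.nan, 3.0], "grade": ["A", "?", "C"]})

missing_per_column = df.isnull().sum()  # count of missing values per column
present_mask = df.notnull()             # True wherever a value is present
cleaned = df.replace("?", "unknown")    # returns a new frame; df is unchanged

print(missing_per_column)
print(cleaned)
```

Most pandas methods, `replace` included, return a new object rather than modifying the data frame in place, so the result needs to be assigned to a name.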
e) Join/Combine
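- pd.concat([df1, df2]) stacks rows (this replaces df1.append(df2), which was removed in pandas 2.0)
- pd.concat([df1, df2], axis=1) places the data frames side by side
- df1.join(df2, on=col1, how='inner')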
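A minimal sketch of combining two data frames, assuming two small made-up frames that share a "key" column:

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
df2 = pd.DataFrame({"key": ["b", "c"], "y": [20, 30]})

# Stack rows: columns missing from one frame are filled with NaN.
stacked = pd.concat([df1, df2], ignore_index=True)

# Side by side: rows are aligned on the index, not on "key".
side_by_side = pd.concat([df1, df2], axis=1)

# Join: match df1's "key" column against df2's index, keeping only matches.
joined = df1.join(df2.set_index("key"), on="key", how="inner")

print(joined)
```

`join` matches against the index of the other data frame, which is why `df2` is re-indexed by "key" first. `pd.merge(df1, df2, on="key")` is an alternative that matches on columns directly.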
## What is matplotlib
Matplotlib is a plotting library for visualizing data, primarily in 2-D. Its pyplot interface creates diagrams and visualizations similar to those produced by MATLAB's plotting commands. Unlike MATLAB, which requires a paid license, Matplotlib is free and open source, and it is a stable, widely used data visualization library for Python.
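As a first taste, here is a minimal sketch of a line plot. The data (sine and cosine curves) and the labels are made up for illustration and are not taken from the workshop notebook.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# One figure containing a single set of axes.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

# In a Jupyter notebook the figure renders when the cell finishes;
# in a plain script, call plt.show() or fig.savefig("plot.png").
```

pandas data frames also expose a `df.plot()` method that draws onto Matplotlib axes, so the same labelling and styling calls apply to plots made from a data frame.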
## Live code
- https://github.com/artinmajdi/resbaz_2021.git
## Introductions
- Artin Majdi (instructor)
- UA ECE PhD candidate
- mohammadsmajdi@email.arizona.edu
- msm2024@gmail.com
## Questions and Answers
In this section, you can post your questions, and feel free to answer other people's questions if you know the answer. Questions will be answered during or after the workshop.
1. Ask your question.
- Here is an answer
---
:::info
**Session Feedback :mega:**
Use the link below to provide your feedback on the session:
[**Session Feedback Form**](https://forms.gle/TrnJpr9qRBEKdnVVA)
:::