# Exploratory Data Analysis
---
<!-- Put the link to this slide here so people can follow -->
slide: https://hackmd.io/QLjM3jFRS6OMmQUrDmNHwg?both
----
### An illustration

----
## Definition
> An approach to analyzing datasets to summarize their main characteristics, often with visual methods.
----
## What's in EDA
* Dataset overview: number of rows and dimensions (columns), missing values, size, memory usage.
* What the data is composed of: variable types, simple statistics
* Characteristics of each variable: missing ratio
* Statistics within the variables: correlation, histogram
* Other visualizations
----
### Functions built into pandas
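A minimal sketch of how the checks listed earlier map onto built-in pandas calls. The toy DataFrame and its column names are made up for illustration; swap in your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real dataset.
df = pd.DataFrame({
    "age": [23, 35, np.nan, 41, 29],
    "income": [48000.0, 61000.0, 52000.0, np.nan, 45000.0],
    "city": ["Taipei", "Tainan", "Taipei", None, "Hsinchu"],
})

# Dataset overview: number of rows and columns, memory footprint in bytes.
n_rows, n_cols = df.shape
memory_bytes = df.memory_usage(deep=True).sum()

# What the data is composed of: variable types and simple statistics.
dtypes = df.dtypes
summary = df.describe(include="all")

# Characteristics of each variable: fraction of missing values per column.
missing_ratio = df.isnull().mean()

# Statistics within the variables: pairwise correlation of numeric columns.
correlation = df.select_dtypes(include="number").corr()

print(f"{n_rows} rows x {n_cols} columns, {memory_bytes} bytes")
print(dtypes)
print(summary)
print(missing_ratio)
print(correlation)
```

`df.hist()` draws one histogram per numeric column when matplotlib is installed. Each call is short, but repeating them for every new dataset adds up.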


---
## Why automation
* Most of the work is routine and repetitive.
* Reports and figures keep a consistent format.
* Data scientists can get back to focusing on creating business value.
----
## Introducing pandas-profiling
----
### Why pandas-profiling
- Most of our data is loaded into pandas DataFrames
- The default pandas methods (`describe`, `head`, etc.) are not enough for the analysis
- We want to create reports easily and in a straightforward way
- We can extend the features ourselves
---
### Installation
In the console
```bash
conda install -c conda-forge pandas-profiling
```
----
#### In notebook
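```python
import pandas as pd
import pandas_profiling  # importing it adds profile_report() to DataFrame

# Load the CSV and build the profiling report
df = pd.read_csv('YOUR_CSV_PATH')
profile = df.profile_report(title='YOUR REPORT TITLE')
# Save the report as a standalone HTML file
profile.to_file(output_file='YOUR_OUTPUT_FILE_PATH.html')
```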
----
## Demo
---
### What is Missing
- Time-series EDA, such as:
    - Time-series decomposition
- Written explanations or summaries of the findings:
    - Requires NLP techniques
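To make "time-series decomposition" concrete, here is a naive additive decomposition written with plain pandas. It is only a sketch: the synthetic daily series, the weekly period, and all variable names are assumptions for illustration, not something pandas-profiling provides.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: a linear trend plus a weekly (7-day) cycle.
period = 7
index = pd.date_range("2023-01-01", periods=56, freq="D")
t = np.arange(len(index))
series = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / period), index=index)

# Trend: centred moving average over one full period.
# The first and last 3 points are NaN because the window is incomplete there.
trend = series.rolling(window=period, center=True).mean()

# Seasonal: average detrended value at each position in the weekly cycle.
detrended = series - trend
position = t % period
seasonal = detrended.groupby(position).transform("mean")

# Residual: whatever the trend and seasonal components do not explain.
residual = series - trend - seasonal

print(pd.DataFrame({"trend": trend, "seasonal": seasonal, "residual": residual}).head(10))
```

On this noise-free series the residual is essentially zero. On real data, the residual is where anomalies would show up.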
----
### Thank you! :sheep: