# Exploratory Data Analysis

---

<!-- Put the link to this slide here so people can follow -->
slide: https://hackmd.io/QLjM3jFRS6OMmQUrDmNHwg?both

----

### An illustration

![width=200](https://qph.fs.quoracdn.net/main-qimg-461b1633f56dc0aa73df48dda8938d2d)

----

## Definition

> An approach to analyzing datasets to summarize their main characteristics, often with visual methods.

----

## What's in EDA

* Dataset information: number of records and dimensions, missing information, size, memory usage
* What the data is composed of: variable types, simple statistics
* Characteristics of the variables: missing ratio
* Statistics within the variables: correlation, histogram
* Other visualizations

----

### Functions built in Pandas

![](https://i.stack.imgur.com/DUecE.png =500x)
![](https://media.geeksforgeeks.org/wp-content/uploads/1-539.png =350x)

---

## Why automation

* Most of the work is routine and repetitive.
* Reports and figures keep a consistent format.
* Data scientists get back to focusing on creating business value.

----

## Introducing pandas-profiling

----

### Why Pandas-profiling

- Most of our data is loaded into Pandas DataFrames
- Pandas' default methods (describe, head, etc.) are not enough for the analysis
- We want to create reports easily and in a straightforward way
- We can extend the features ourselves

---

### Installation

In the console

```=bash
conda install -c conda-forge pandas-profiling
```

----

#### In notebook

```=Python
import pandas as pd
import pandas_profiling  # importing this adds profile_report() to DataFrames

# Load the data and build the report
profile = pd.read_csv('YOUR_CSV_PATH').profile_report(title = 'YOUR REPORT TITLE')
# Write the report to a standalone HTML file
profile.to_file(output_file = 'YOUR_OUTPUT_FILE_PATH.html')
```

----

## Demo

---

### What is Missing

- Time-series data EDA such as:
    - Time-series decomposition
- Textual explanation or summarization:
    - Requires NLP techniques

----

### Thank you! :sheep:
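
----

### Appendix: the manual routine in Pandas

For comparison, the items under "What's in EDA" can be collected by hand with Pandas built-ins. The sketch below is only an illustration of that routine, not part of pandas-profiling: the helper name `quick_eda` and the toy DataFrame are made up for this slide, and histograms are left out to avoid a plotting dependency.

```python
import pandas as pd


def quick_eda(df: pd.DataFrame) -> dict:
    """Gather basic EDA facts for one DataFrame (hypothetical helper)."""
    # Only numeric columns have summary statistics and correlations.
    numeric = df.select_dtypes(include="number")
    return {
        # Dataset information: size and memory usage
        "n_rows": df.shape[0],
        "n_columns": df.shape[1],
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        # What the data is composed of: variable types
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Characteristics of the variables: share of missing values per column
        "missing_ratio": df.isna().mean().to_dict(),
        # Statistics within the variables
        "describe": numeric.describe(),
        "correlation": numeric.corr(),
    }


# Toy data, invented for illustration only
toy = pd.DataFrame(
    {
        "age": [23, 35, None, 41],
        "income": [40.0, 52.5, 61.0, None],
        "city": ["Taipei", None, "Tainan", "Taipei"],
    }
)

summary = quick_eda(toy)
print(summary["missing_ratio"])
```

Writing and maintaining a helper like this for every dataset is the repetitive part that a generated report is meant to replace.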