Pandas Getting started

# Pandas Getting started

###### tags: `learn` `AI` `course`

```
import pandas as pd
```

## DataFrame

A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.

```
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})
```

![](https://i.imgur.com/qZXHZsw.png)

The dictionary-list constructor assigns values to the column labels, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the row labels. Sometimes this is OK, but often we will want to assign these labels ourselves.

The list of row labels used in a DataFrame is known as an Index. We can assign values to it by using an `index` parameter in our constructor:

```
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])
```

![](https://i.imgur.com/BKDNyuH.png)

## Series

A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. In fact, you can create one with nothing more than a list:

```
pd.Series([1, 2, 3, 4, 5])
```

## Reading data files

Data stored in a CSV file can be read into a DataFrame with `pd.read_csv()`:

```
wine_reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv")
```

We can use the `shape` attribute to check how large the resulting DataFrame is:

```
wine_reviews.shape
```

![](https://i.imgur.com/G81xOXe.png)

The `pd.read_csv()` function is feature-rich, with over 30 optional parameters you can specify. For example, you can see in this dataset that the CSV file has a built-in index, which pandas did not pick up on automatically. To make pandas use that column for the index (instead of creating a new one from scratch), we can specify an `index_col`:

```
wine_reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
wine_reviews.head()
```

![](https://i.imgur.com/rcLCrSY.png)
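To see what `index_col` changes without needing the wine reviews file, here is a minimal sketch that reads a tiny in-memory CSV instead. The rows in `csv_text` and the names `default_df` and `indexed_df` are made up for illustration. They only mimic a file whose first column is a saved index.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file whose first, unnamed column is a saved index.
csv_text = """,country,points
0,Italy,87
1,Portugal,87
2,US,87
"""

# Default behaviour: pandas creates a fresh 0..n-1 index and keeps the
# saved index as an ordinary column named "Unnamed: 0".
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column becomes the row index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_df.shape)          # (3, 3)
print(indexed_df.shape)          # (3, 2)
print(list(default_df.columns))  # ['Unnamed: 0', 'country', 'points']
print(list(indexed_df.columns))  # ['country', 'points']
```

Comparing the two `shape` values is a quick way to check whether a saved index was read in as an extra column.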