Virgil - Intro To Pandas Seaborn - S41 Selection

---
title: Virgil - Intro To Pandas Seaborn - S41 Selection
tags: Virgil, LearnWorld, IntroPandasSeaborn
---

<a target="_blank" href="https://colab.research.google.com/drive/1W0rZw7DOOxRy_BcGF-bE2Vp1Es-bwWps"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>

## Selection

### 1. loc - Direct Location in DataFrame

```python
df.head()
```

<div>
<style scoped>
  .dataframe tbody tr th:only-of-type { vertical-align: middle; }
  .dataframe tbody tr th { vertical-align: top; }
  .dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;"> <th></th> <th>Country Code</th> <th>Birth rate</th> <th>Internet users</th> <th>Income Group</th> </tr>
    <tr> <th>Country Name</th> <th></th> <th></th> <th></th> <th></th> </tr>
  </thead>
  <tbody>
    <tr> <th>Aruba</th> <td>ABW</td> <td>10.244</td> <td>78.9</td> <td>High income</td> </tr>
    <tr> <th>Afghanistan</th> <td>AFG</td> <td>35.253</td> <td>5.9</td> <td>Low income</td> </tr>
    <tr> <th>Angola</th> <td>AGO</td> <td>45.985</td> <td>19.1</td> <td>Upper middle income</td> </tr>
    <tr> <th>Albania</th> <td>ALB</td> <td>12.877</td> <td>57.2</td> <td>Upper middle income</td> </tr>
    <tr> <th>United Arab Emirates</th> <td>ARE</td> <td>11.044</td> <td>88.0</td> <td>High income</td> </tr>
  </tbody>
</table>
</div>

```python
df
```

<div>
<style scoped>
  .dataframe tbody tr th:only-of-type { vertical-align: middle; }
  .dataframe tbody tr th { vertical-align: top; }
  .dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;"> <th></th> <th>Country Code</th> <th>Birth rate</th> <th>Internet users</th> <th>Income Group</th> </tr>
    <tr> <th>Country Name</th> <th></th> <th></th> <th></th> <th></th> </tr>
  </thead>
  <tbody>
    <tr> <th>Aruba</th> <td>ABW</td> <td>10.244</td> <td>78.9</td> <td>High income</td> </tr>
    <tr> <th>Afghanistan</th> <td>AFG</td> <td>35.253</td> <td>5.9</td> <td>Low income</td> </tr>
    <tr> <th>Angola</th> <td>AGO</td> <td>45.985</td> <td>19.1</td> <td>Upper middle income</td> </tr>
    <tr> <th>Albania</th> <td>ALB</td> <td>12.877</td> <td>57.2</td> <td>Upper middle income</td> </tr>
    <tr> <th>United Arab Emirates</th> <td>ARE</td> <td>11.044</td> <td>88.0</td> <td>High income</td> </tr>
    <tr> <th>...</th> <td>...</td> <td>...</td> <td>...</td> <td>...</td> </tr>
    <tr> <th>Yemen, Rep.</th> <td>YEM</td> <td>32.947</td> <td>20.0</td> <td>Lower middle income</td> </tr>
    <tr> <th>South Africa</th> <td>ZAF</td> <td>20.850</td> <td>46.5</td> <td>Upper middle income</td> </tr>
    <tr> <th>Congo, Dem. Rep.</th> <td>COD</td> <td>42.394</td> <td>2.2</td> <td>Low income</td> </tr>
    <tr> <th>Zambia</th> <td>ZMB</td> <td>40.471</td> <td>15.4</td> <td>Lower middle income</td> </tr>
    <tr> <th>Zimbabwe</th> <td>ZWE</td> <td>35.715</td> <td>18.5</td> <td>Low income</td> </tr>
  </tbody>
</table>
<p>195 rows × 4 columns</p>
</div>

```python
# Choose one row by its index label
dong_vn = df.loc['Vietnam']
dong_vn

# What is the result's datatype?
# A/ DataFrame
# B/ Series
# C/ Index
# D/ I don't know man.
```

```python
type(df.loc['Vietnam'])
```

    pandas.core.series.Series

We can check the type of any output by calling ```type()```. Pandas calls can be chained one after another, so the type of each result determines what you can call next:

- If the output is a ```DataFrame```, we can continue with ```DataFrame``` syntax.
- If the output is a ```Series```, we can continue with ```Series``` syntax.

```python
# Choose one data point by row label and column label
df.loc['Vietnam', 'Birth rate']
```

    15.537

### 2. iloc - Integer Location in DataFrame

***A brief aside on selecting items from tuples and lists in Python***

```python
df.shape
```

```python
df['Income Group'].unique()
```

If the collection is wrapped

- in parentheses ```()``` (tuple)
- in square brackets ```[]``` (list, NumPy array)

you can select an item by its integer index, counting from 0.
For example:

```python
a = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
```

You can select the first item of ```a```, which is Monday, by calling

```python
a[0]
```

```python
df.head()
```

<div>
<style scoped>
  .dataframe tbody tr th:only-of-type { vertical-align: middle; }
  .dataframe tbody tr th { vertical-align: top; }
  .dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;"> <th></th> <th>Country Code</th> <th>Birth rate</th> <th>Internet users</th> <th>Income Group</th> </tr>
    <tr> <th>Country Name</th> <th></th> <th></th> <th></th> <th></th> </tr>
  </thead>
  <tbody>
    <tr> <th>Aruba</th> <td>ABW</td> <td>10.244</td> <td>78.9</td> <td>High income</td> </tr>
    <tr> <th>Afghanistan</th> <td>AFG</td> <td>35.253</td> <td>5.9</td> <td>Low income</td> </tr>
    <tr> <th>Angola</th> <td>AGO</td> <td>45.985</td> <td>19.1</td> <td>Upper middle income</td> </tr>
    <tr> <th>Albania</th> <td>ALB</td> <td>12.877</td> <td>57.2</td> <td>Upper middle income</td> </tr>
    <tr> <th>United Arab Emirates</th> <td>ARE</td> <td>11.044</td> <td>88.0</td> <td>High income</td> </tr>
  </tbody>
</table>
</div>

```python
# Select the whole row at integer position 3 (the fourth row)
df.iloc[3]
```

    Country Code                      ALB
    Birth rate                     12.877
    Internet users                   57.2
    Income Group      Upper middle income
    Name: Albania, dtype: object

```python
# Select one data point: row position 3, column position 1
df.iloc[3, 1]
```

    12.877

### 3. Select column(s) in DataFrame

You can choose a column simply by its name; there is no need for ```loc``` or ```iloc```.

```python
df.head()
```

<div>
<style scoped>
  .dataframe tbody tr th:only-of-type { vertical-align: middle; }
  .dataframe tbody tr th { vertical-align: top; }
  .dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;"> <th></th> <th>Country Code</th> <th>Birth rate</th> <th>Internet users</th> <th>Income Group</th> </tr>
    <tr> <th>Country Name</th> <th></th> <th></th> <th></th> <th></th> </tr>
  </thead>
  <tbody>
    <tr> <th>Aruba</th> <td>ABW</td> <td>10.244</td> <td>78.9</td> <td>High income</td> </tr>
    <tr> <th>Afghanistan</th> <td>AFG</td> <td>35.253</td> <td>5.9</td> <td>Low income</td> </tr>
    <tr> <th>Angola</th> <td>AGO</td> <td>45.985</td> <td>19.1</td> <td>Upper middle income</td> </tr>
    <tr> <th>Albania</th> <td>ALB</td> <td>12.877</td> <td>57.2</td> <td>Upper middle income</td> </tr>
    <tr> <th>United Arab Emirates</th> <td>ARE</td> <td>11.044</td> <td>88.0</td> <td>High income</td> </tr>
  </tbody>
</table>
</div>

```python
# Choose a column
df['Internet users']
```

    Country Name
    Aruba                   78.9
    Afghanistan              5.9
    Angola                  19.1
    Albania                 57.2
    United Arab Emirates    88.0
                            ...
    Yemen, Rep.             20.0
    South Africa            46.5
    Congo, Dem. Rep.         2.2
    Zambia                  15.4
    Zimbabwe                18.5
    Name: Internet users, Length: 195, dtype: float64

```python
# Choose 2 or more columns
# Pass a list of column names
df[['Country Code', 'Birth rate', 'Internet users']]
```

### 4. Selection in Series

In a Series, you can select by index label with plain square brackets, without having to specify ```loc```. Selecting by integer position with plain square brackets has also worked on a label-indexed Series, but recent pandas versions deprecate that fallback, so ```iloc``` is the safer choice for positions.

```python
df['Birth rate']
```

What is the datatype of the output from the code ```df['Birth rate']```?
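- A/ DataFrame
- B/ Index
- C/ Series
- D/ Integer

```python
# Select by index label
df['Birth rate']['Angola']
```

```python
# Select by integer position. Plain [0] relied on a positional fallback that
# recent pandas versions deprecate, so .iloc[0] is used here instead.
df['Birth rate'].iloc[0]
```

## Aggregation Functions

**13 Aggregation Functions in Pandas**

- `mean()`: Compute mean of groups
- `sum()`: Compute sum of group values
- `size()`: Compute group sizes
- `count()`: Compute count of group
- `std()`: Standard deviation of groups
- `var()`: Compute variance of groups
- `sem()`: Standard error of the mean of groups
- `describe()`: Generates descriptive statistics
- `first()`: Compute first of group values
- `last()`: Compute last of group values
- `nth()`: Take nth value, or a subset if n is a list
- `min()`: Compute min of group values
- `max()`: Compute max of group values

The descriptions refer to groups because these functions are commonly applied after `groupby`. Many of them, such as `mean()`, can also be called directly on a single column:

```python
df['Internet users'].mean()
```

    42.0764708919487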
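
The aggregation functions listed above describe their results per group, which is how they behave after a `groupby`. Here is a minimal sketch of that usage, assuming a small stand-in frame named `sample` (a hypothetical name) built from the five rows shown in the `df.head()` output, since the full 195-row dataset is not reproduced here. The variable names `grouped`, `group_sizes`, `mean_internet`, and `summary` are also illustrative only.

```python
import pandas as pd

# Stand-in data: the five rows visible in df.head(); the real df has 195 rows.
sample = pd.DataFrame(
    {
        "Country Code": ["ABW", "AFG", "AGO", "ALB", "ARE"],
        "Birth rate": [10.244, 35.253, 45.985, 12.877, 11.044],
        "Internet users": [78.9, 5.9, 19.1, 57.2, 88.0],
        "Income Group": [
            "High income",
            "Low income",
            "Upper middle income",
            "Upper middle income",
            "High income",
        ],
    },
    index=pd.Index(
        ["Aruba", "Afghanistan", "Angola", "Albania", "United Arab Emirates"],
        name="Country Name",
    ),
)

# Split the rows into groups that share the same income group
grouped = sample.groupby("Income Group")

# size(): number of rows in each group
group_sizes = grouped.size()

# mean(): average of one column within each group
mean_internet = grouped["Internet users"].mean()

# agg(): several aggregation functions at once, one result column per function
summary = grouped["Birth rate"].agg(["min", "max", "mean"])

print(group_sizes)
print(mean_internet)
print(summary)
```

Each result is indexed by the group label, so the selection tools from the previous section still apply: for example, `mean_internet['High income']` or `summary.loc['High income', 'max']`.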