---
title: Virgil - Intro To Pandas Seaborn - S41 Selection
tags: Virgil, LearnWorld, IntroPandasSeaborn
---
<a target="_blank" href="https://colab.research.google.com/drive/1W0rZw7DOOxRy_BcGF-bE2Vp1Es-bwWps"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
## Selection
### 1. loc - Direct Location in DataFrame
```python
df.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Country Code</th>
<th>Birth rate</th>
<th>Internet users</th>
<th>Income Group</th>
</tr>
<tr>
<th>Country Name</th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>Aruba</th>
<td>ABW</td>
<td>10.244</td>
<td>78.9</td>
<td>High income</td>
</tr>
<tr>
<th>Afghanistan</th>
<td>AFG</td>
<td>35.253</td>
<td>5.9</td>
<td>Low income</td>
</tr>
<tr>
<th>Angola</th>
<td>AGO</td>
<td>45.985</td>
<td>19.1</td>
<td>Upper middle income</td>
</tr>
<tr>
<th>Albania</th>
<td>ALB</td>
<td>12.877</td>
<td>57.2</td>
<td>Upper middle income</td>
</tr>
<tr>
<th>United Arab Emirates</th>
<td>ARE</td>
<td>11.044</td>
<td>88.0</td>
<td>High income</td>
</tr>
</tbody>
</table>
</div>
```python
df
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Country Code</th>
<th>Birth rate</th>
<th>Internet users</th>
<th>Income Group</th>
</tr>
<tr>
<th>Country Name</th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>Aruba</th>
<td>ABW</td>
<td>10.244</td>
<td>78.9</td>
<td>High income</td>
</tr>
<tr>
<th>Afghanistan</th>
<td>AFG</td>
<td>35.253</td>
<td>5.9</td>
<td>Low income</td>
</tr>
<tr>
<th>Angola</th>
<td>AGO</td>
<td>45.985</td>
<td>19.1</td>
<td>Upper middle income</td>
</tr>
<tr>
<th>Albania</th>
<td>ALB</td>
<td>12.877</td>
<td>57.2</td>
<td>Upper middle income</td>
</tr>
<tr>
<th>United Arab Emirates</th>
<td>ARE</td>
<td>11.044</td>
<td>88.0</td>
<td>High income</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>Yemen, Rep.</th>
<td>YEM</td>
<td>32.947</td>
<td>20.0</td>
<td>Lower middle income</td>
</tr>
<tr>
<th>South Africa</th>
<td>ZAF</td>
<td>20.850</td>
<td>46.5</td>
<td>Upper middle income</td>
</tr>
<tr>
<th>Congo, Dem. Rep.</th>
<td>COD</td>
<td>42.394</td>
<td>2.2</td>
<td>Low income</td>
</tr>
<tr>
<th>Zambia</th>
<td>ZMB</td>
<td>40.471</td>
<td>15.4</td>
<td>Lower middle income</td>
</tr>
<tr>
<th>Zimbabwe</th>
<td>ZWE</td>
<td>35.715</td>
<td>18.5</td>
<td>Low income</td>
</tr>
</tbody>
</table>
<p>195 rows × 4 columns</p>
</div>
```python
# Choose one row
dong_vn = df.loc['Vietnam']
dong_vn
# What is the result's datatype?
# A/ DataFrame
# B/ Series
# C/ Index
# D/ I don't know man.
```
```python
type(df.loc['Vietnam'])
```
pandas.core.series.Series
We can check the type of the output by calling ```type()```. Pandas operations can be chained one after another, and the type of each intermediate result decides which syntax you can use next:
- If the output is a ```DataFrame```, we can continue with ```DataFrame``` syntax.
- If the output is a ```Series```, we can continue with ```Series``` syntax.
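Here is a minimal sketch of chaining. It does not use the course's full 195-row dataset. Instead it assumes a small stand-in ```df``` built from the five rows shown by ```df.head()``` above, with ```Country Name``` as the index.

```python
import pandas as pd

# Hypothetical stand-in for the course's df: only the five rows shown by df.head()
df = pd.DataFrame(
    {
        "Country Code": ["ABW", "AFG", "AGO", "ALB", "ARE"],
        "Birth rate": [10.244, 35.253, 45.985, 12.877, 11.044],
        "Internet users": [78.9, 5.9, 19.1, 57.2, 88.0],
        "Income Group": ["High income", "Low income", "Upper middle income",
                         "Upper middle income", "High income"],
    },
    index=pd.Index(["Aruba", "Afghanistan", "Angola", "Albania",
                    "United Arab Emirates"], name="Country Name"),
)

row = df.loc["Albania"]             # a single row label returns a Series
print(type(row).__name__)           # Series
print(row["Birth rate"])            # Series syntax chained on the result: 12.877

rows = df.loc[["Aruba", "Angola"]]  # a list of row labels returns a DataFrame
print(type(rows).__name__)          # DataFrame
print(rows.shape)                   # DataFrame syntax chained on the result: (2, 4)
```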
```python
# Choose a single data point: [row label, column label]
df.loc['Vietnam', 'Birth rate']
```
15.537
### 2. iloc - Integer Location in DataFrame
***A brief note on selecting items from tuples and lists in Python***
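```python
# Returns a tuple: (number of rows, number of columns)
df.shape
```
```python
# Returns a NumPy array of the distinct income groups
df['Income Group'].unique()
```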
If a collection is wrapped in
- parentheses ```()``` (tuple), or
- square brackets ```[]``` (list, NumPy array),

you can select an item by its integer index, starting from 0. For example:
```python
a = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
```
You can select the first item of ```a```, which is Monday, by calling
```python
a[0]
```
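A few more indexing rules apply to lists, tuples and NumPy arrays alike. In this sketch the tuple mirrors the 195 × 4 shape reported for this dataset, and the array holds income groups visible in ```df.head()```; both are illustrative values, not output from the notebook.

```python
import numpy as np

a = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
print(a[0])     # Monday: indexing starts at 0
print(a[-1])    # Friday: negative indices count from the end
print(a[1:3])   # ['Tuesday', 'Wednesday']: the end index of a slice is excluded

# A tuple shaped like df.shape for this dataset (195 rows, 4 columns)
shape = (195, 4)
print(shape[0])  # 195 rows
print(shape[1])  # 4 columns

# A NumPy array, like the one .unique() returns (values taken from df.head())
groups = np.array(['High income', 'Low income', 'Upper middle income'])
print(groups[2])  # Upper middle income
```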
```python
df.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Country Code</th>
<th>Birth rate</th>
<th>Internet users</th>
<th>Income Group</th>
</tr>
<tr>
<th>Country Name</th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>Aruba</th>
<td>ABW</td>
<td>10.244</td>
<td>78.9</td>
<td>High income</td>
</tr>
<tr>
<th>Afghanistan</th>
<td>AFG</td>
<td>35.253</td>
<td>5.9</td>
<td>Low income</td>
</tr>
<tr>
<th>Angola</th>
<td>AGO</td>
<td>45.985</td>
<td>19.1</td>
<td>Upper middle income</td>
</tr>
<tr>
<th>Albania</th>
<td>ALB</td>
<td>12.877</td>
<td>57.2</td>
<td>Upper middle income</td>
</tr>
<tr>
<th>United Arab Emirates</th>
<td>ARE</td>
<td>11.044</td>
<td>88.0</td>
<td>High income</td>
</tr>
</tbody>
</table>
</div>
```python
# Select the whole row
df.iloc[3]
```
Country Code ALB
Birth rate 12.877
Internet users 57.2
Income Group Upper middle income
Name: Albania, dtype: object
```python
# Select a single data point: [row position, column position]
df.iloc[3, 1]
```
12.877
### 3. Select column(s) in DataFrame
You can select a column simply by its name; there is no need to use ```loc``` or ```iloc```.
```python
df.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>Country Code</th>
<th>Birth rate</th>
<th>Internet users</th>
<th>Income Group</th>
</tr>
<tr>
<th>Country Name</th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>Aruba</th>
<td>ABW</td>
<td>10.244</td>
<td>78.9</td>
<td>High income</td>
</tr>
<tr>
<th>Afghanistan</th>
<td>AFG</td>
<td>35.253</td>
<td>5.9</td>
<td>Low income</td>
</tr>
<tr>
<th>Angola</th>
<td>AGO</td>
<td>45.985</td>
<td>19.1</td>
<td>Upper middle income</td>
</tr>
<tr>
<th>Albania</th>
<td>ALB</td>
<td>12.877</td>
<td>57.2</td>
<td>Upper middle income</td>
</tr>
<tr>
<th>United Arab Emirates</th>
<td>ARE</td>
<td>11.044</td>
<td>88.0</td>
<td>High income</td>
</tr>
</tbody>
</table>
</div>
```python
# Choose a column
df['Internet users']
```
Country Name
Aruba 78.9
Afghanistan 5.9
Angola 19.1
Albania 57.2
United Arab Emirates 88.0
...
Yemen, Rep. 20.0
South Africa 46.5
Congo, Dem. Rep. 2.2
Zambia 15.4
Zimbabwe 18.5
Name: Internet users, Length: 195, dtype: float64
```python
# Choose 2 or more columns
# Pass a list of column names
df[['Country Code', 'Birth rate', 'Internet users']]
```
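The number of brackets decides the output type: a single column name returns a Series, while a list of names (even a list with one name) returns a DataFrame. A minimal sketch on the five-row stand-in:

```python
import pandas as pd

# Hypothetical stand-in for the course's df: only the five rows shown by df.head()
df = pd.DataFrame(
    {
        "Country Code": ["ABW", "AFG", "AGO", "ALB", "ARE"],
        "Birth rate": [10.244, 35.253, 45.985, 12.877, 11.044],
        "Internet users": [78.9, 5.9, 19.1, 57.2, 88.0],
        "Income Group": ["High income", "Low income", "Upper middle income",
                         "Upper middle income", "High income"],
    },
    index=pd.Index(["Aruba", "Afghanistan", "Angola", "Albania",
                    "United Arab Emirates"], name="Country Name"),
)

col = df["Internet users"]      # single brackets -> Series
cols = df[["Internet users"]]   # a list with one name -> DataFrame
three = df[["Country Code", "Birth rate", "Internet users"]]
print(type(col).__name__, type(cols).__name__)  # Series DataFrame
print(cols.shape, three.shape)                  # (5, 1) (5, 3)
```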
### 4. Selection in Series
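In a Series, you can select a value by its label or by its integer position without specifying ```loc``` or ```iloc```. Note that pandas 2.1 and later deprecate using a plain integer as a position on a Series whose index is not integer-based, so ```.iloc``` is the safer choice for positional access.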
```python
df['Birth rate']
```
What is the datatype of the output from the code ```df['Birth rate']```?
- A/ DataFrame
- B/ Index
- C/ Series
- D/ Integer
```python
df['Birth rate']['Angola']
```
```python
# Integer position 0 -> first value (Aruba's birth rate)
df['Birth rate'][0]
```
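Plain ```[]``` on a Series guesses whether you mean a label or a position, and the guess depends on the index type. The explicit accessors remove that ambiguity. This sketch uses the five-row stand-in plus a small hypothetical Series ```s``` with an integer index to show where the guess matters.

```python
import pandas as pd

# Hypothetical stand-in for the course's df: only the five rows shown by df.head()
df = pd.DataFrame(
    {
        "Country Code": ["ABW", "AFG", "AGO", "ALB", "ARE"],
        "Birth rate": [10.244, 35.253, 45.985, 12.877, 11.044],
        "Internet users": [78.9, 5.9, 19.1, 57.2, 88.0],
        "Income Group": ["High income", "Low income", "Upper middle income",
                         "Upper middle income", "High income"],
    },
    index=pd.Index(["Aruba", "Afghanistan", "Angola", "Albania",
                    "United Arab Emirates"], name="Country Name"),
)

birth = df["Birth rate"]
print(birth["Angola"])      # 45.985: label lookup with plain []
print(birth.loc["Angola"])  # 45.985: the same lookup, explicitly by label
print(birth.iloc[0])        # 10.244: explicitly by position (Aruba is first)

# Hypothetical Series with an integer index: here [] means LABEL, not position
s = pd.Series([10, 20, 30], index=[2, 1, 0])
print(s[0])       # 30: the value labelled 0
print(s.iloc[0])  # 10: the value at position 0
```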
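## Aggregation Functions
**13 Aggregation Functions in Pandas**

These are ```GroupBy``` aggregation methods, as in ```df.groupby('Income Group')['Birth rate'].mean()```. Most of them also exist directly on a Series or DataFrame, as in ```df['Internet users'].mean()``` below. The exceptions are ```nth()```, which is GroupBy-only; ```size```, which is an attribute rather than a method on a Series or DataFrame; and ```first()```/```last()```, which on a plain Series or DataFrame select an initial or final time period rather than the first or last value.

- `mean()`: Compute the mean of each group
- `sum()`: Compute the sum of group values
- `size()`: Compute group sizes (number of rows, including missing values)
- `count()`: Count the non-missing values in each group
- `std()`: Standard deviation of groups
- `var()`: Variance of groups
- `sem()`: Standard error of the mean of groups
- `describe()`: Generate descriptive statistics
- `first()`: First non-missing value in each group
- `last()`: Last non-missing value in each group
- `nth()`: Take the nth row, or a subset if n is a list
- `min()`: Minimum of group values
- `max()`: Maximum of group values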
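The sketch below applies several of these functions per income group. It assumes the five-row stand-in rather than the full dataset, so the group statistics cover only those five countries.

```python
import pandas as pd

# Hypothetical stand-in for the course's df: only the five rows shown by df.head()
df = pd.DataFrame(
    {
        "Country Code": ["ABW", "AFG", "AGO", "ALB", "ARE"],
        "Birth rate": [10.244, 35.253, 45.985, 12.877, 11.044],
        "Internet users": [78.9, 5.9, 19.1, 57.2, 88.0],
        "Income Group": ["High income", "Low income", "Upper middle income",
                         "Upper middle income", "High income"],
    },
    index=pd.Index(["Aruba", "Afghanistan", "Angola", "Albania",
                    "United Arab Emirates"], name="Country Name"),
)

by_group = df.groupby("Income Group")["Birth rate"]
means = by_group.mean()        # mean birth rate per income group
sizes = by_group.size()        # rows per group (missing values included)
counts = by_group.count()      # non-missing values per group
summary = by_group.describe()  # count, mean, std, min, quartiles, max per group
print(means)
print(sizes)
print(summary)

# Most of these also work on a plain Series; there, size is an attribute
print(df["Birth rate"].max(), df["Birth rate"].size)
```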
```python
df['Internet users'].mean()
```
42.0764708919487
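Several aggregations can be computed in one call with ```agg```, passing the function names as strings. This sketch assumes the five-row stand-in, so its mean differs from the 195-row result above.

```python
import pandas as pd

# Hypothetical stand-in for the course's df: only the five rows shown by df.head()
df = pd.DataFrame(
    {
        "Country Code": ["ABW", "AFG", "AGO", "ALB", "ARE"],
        "Birth rate": [10.244, 35.253, 45.985, 12.877, 11.044],
        "Internet users": [78.9, 5.9, 19.1, 57.2, 88.0],
        "Income Group": ["High income", "Low income", "Upper middle income",
                         "Upper middle income", "High income"],
    },
    index=pd.Index(["Aruba", "Afghanistan", "Angola", "Albania",
                    "United Arab Emirates"], name="Country Name"),
)

# A list of names returns a Series indexed by those names
stats = df["Internet users"].agg(["mean", "min", "max"])
print(stats)
```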