---
title: Virgil - Intro To Pandas Seaborn - S61 Groupby
tags: Virgil, LearnWorld, IntroPandasSeaborn
---
<a target="_blank" href="https://colab.research.google.com/drive/1dmqNCOmQ8thWeKz_fW-dX30Hbo3D8waz"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
## 1. Groupby
By `groupby` we are referring to a process involving one or more of the following steps:
* **Splitting** the data into groups based on some criteria
* **Applying** a function to each group independently
* **Combining** the results into a data structure
```python
df.groupby('day')['total_bill'].sum()
```
day
Fri 325.88
Sat 1778.40
Sun 1627.16
Thur 1096.33
Name: total_bill, dtype: float64
```python
# Average tip theo ngày
df.groupby('day')['tip'].mean()
```
day
Fri 2.734737
Sat 2.993103
Sun 3.255132
Thur 2.771452
Name: tip, dtype: float64
```python
df.head()
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>total_bill</th>
<th>tip</th>
<th>sex</th>
<th>smoker</th>
<th>day</th>
<th>time</th>
<th>size</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>16.99</td>
<td>1.01</td>
<td>Female</td>
<td>No</td>
<td>Sun</td>
<td>Dinner</td>
<td>2</td>
</tr>
<tr>
<th>1</th>
<td>10.34</td>
<td>1.66</td>
<td>Male</td>
<td>No</td>
<td>Sun</td>
<td>Dinner</td>
<td>3</td>
</tr>
<tr>
<th>2</th>
<td>21.01</td>
<td>3.50</td>
<td>Male</td>
<td>No</td>
<td>Sun</td>
<td>Dinner</td>
<td>3</td>
</tr>
<tr>
<th>3</th>
<td>23.68</td>
<td>3.31</td>
<td>Male</td>
<td>No</td>
<td>Sun</td>
<td>Dinner</td>
<td>2</td>
</tr>
<tr>
<th>4</th>
<td>24.59</td>
<td>3.61</td>
<td>Female</td>
<td>No</td>
<td>Sun</td>
<td>Dinner</td>
<td>4</td>
</tr>
</tbody>
</table>
</div>
```python
# trung bình table size theo sex
df.groupby('sex')['size'].mean()
```
2.6305732484076434
**13 Aggregation Functions in Pandas**
`mean()`: Compute mean of groups
`sum()`: Compute sum of group values
`size()`: Compute group sizes
`count()`: Compute count of group
`std()`: Standard deviation of groups
`var()`: Compute variance of groups
`sem()`: Standard error of the mean of groups
`describe()`: Generates descriptive statistics
`first()`: Compute first of group values
`last()`: Compute last of group values
`nth()`: Take nth value, or a subset if n is a list
`min()`: Compute min of group values
`max()`: Compute max of group values```
```python
# Apply aggregation function on the groupby
df.groupby('day').mean()
```
🙋🏻♂️ **DISCUSSION ▸ What is the datatype of the above result?**
A --- a DataFrame
B --- a GroupBy Datatype
C --- a Series
D --- a SQL Database
```python
# Continue using dataframe syntax
df.groupby('day').mean()['total_bill']
```
🙋🏻♂️ **DISCUSSION ▸ What is the output of the above code cell?**
A --- a DataFrame
B --- a GroupBy Datatype
C --- a Series
D --- a SQL Database
```python
# Continue using syntax
df.groupby('day').mean()['total_bill']
```
day
Fri 17.151579
Sat 20.441379
Sun 21.410000
Thur 17.682742
Name: total_bill, dtype: float64
***Group by multiple values, one method***
```python
# GROUP BY TWO COLUMNS: Groupby the dataset by day and sex
# Có nhiều hơn 1 cột, sử dụng list []
df.groupby(['day', 'sex'])['tip'].mean()
```
2.781111111111111
```python
# Ngoặc vuông: select, có nhiều hơn 1 cột
# Ngoặt tròn: hàm -- groupby(), mean(), sum(), .info(), sort_values()
# Ngoặt nhọn: (dictionary)
```
```python
# Continue using dataframe syntax to get a certain value
df.groupby(['day', 'sex'])['tip'].mean().loc['Fri'].loc['Female']
```
day sex
Fri Female 2.781111
Male 2.693000
Sat Female 2.801786
Male 3.083898
Sun Female 3.367222
Male 3.220345
Thur Female 2.575625
Male 2.980333
Name: tip, dtype: float64
***Groupby one value, multiple methods***
```python
# Groupby on one value, multiple methods. --> Hàm .agg
df.groupby('day')['tip'].agg(['mean', 'sum'])
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>mean</th>
<th>sum</th>
</tr>
<tr>
<th>day</th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>Fri</th>
<td>2.734737</td>
<td>51.96</td>
</tr>
<tr>
<th>Sat</th>
<td>2.993103</td>
<td>260.40</td>
</tr>
<tr>
<th>Sun</th>
<td>3.255132</td>
<td>247.39</td>
</tr>
<tr>
<th>Thur</th>
<td>2.771452</td>
<td>171.83</td>
</tr>
</tbody>
</table>
</div>
***Groupby multiple values, multiple methods***
```python
# Groupby on multiple values, multiple methods.
df.groupby('day')[['tip', 'total_bill']].agg({'tip':'sum', 'total_bill':'mean'})
```