---
title: Virgil - Intro To Pandas Seaborn - S51 Filter and Sort
tags: Virgil, LearnWorld, IntroPandasSeaborn
---

<a target="_blank" href="https://colab.research.google.com/drive/1TgLNnnTRUAKtI7TZdtNnvkmQ432krELK"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>

## Filter

```python
# Filter data using one condition
# Select all countries with a birth rate above 20
df[df['Birth rate'] > 20]
```

<div>
<style scoped>
.dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th { vertical-align: top; }
.dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>Country Code</th> <th>Birth rate</th> <th>Internet users</th> <th>Income Group</th> </tr>
<tr> <th>Country Name</th> <th></th> <th></th> <th></th> <th></th> </tr>
</thead>
<tbody>
<tr> <th>Afghanistan</th> <td>AFG</td> <td>35.253</td> <td>5.9</td> <td>Low income</td> </tr>
<tr> <th>Angola</th> <td>AGO</td> <td>45.985</td> <td>19.1</td> <td>Upper middle income</td> </tr>
<tr> <th>Burundi</th> <td>BDI</td> <td>44.151</td> <td>1.3</td> <td>Low income</td> </tr>
<tr> <th>Benin</th> <td>BEN</td> <td>36.440</td> <td>4.9</td> <td>Low income</td> </tr>
<tr> <th>Burkina Faso</th> <td>BFA</td> <td>40.551</td> <td>9.1</td> <td>Low income</td> </tr>
<tr> <th>...</th> <td>...</td> <td>...</td> <td>...</td> <td>...</td> </tr>
<tr> <th>Yemen, Rep.</th> <td>YEM</td> <td>32.947</td> <td>20.0</td> <td>Lower middle income</td> </tr>
<tr> <th>South Africa</th> <td>ZAF</td> <td>20.850</td> <td>46.5</td> <td>Upper middle income</td> </tr>
<tr> <th>Congo, Dem.
Rep.</th> <td>COD</td> <td>42.394</td> <td>2.2</td> <td>Low income</td> </tr>
<tr> <th>Zambia</th> <td>ZMB</td> <td>40.471</td> <td>15.4</td> <td>Lower middle income</td> </tr>
<tr> <th>Zimbabwe</th> <td>ZWE</td> <td>35.715</td> <td>18.5</td> <td>Low income</td> </tr>
</tbody>
</table>
<p>95 rows × 4 columns</p>
</div>

```python
# Select all rows where 'Internet users' is less than 40
# YOUR CODE HERE
df[df['Internet users'] < 40]
```

<div>
<style scoped>
.dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th { vertical-align: top; }
.dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>Country Code</th> <th>Birth rate</th> <th>Internet users</th> <th>Income Group</th> </tr>
<tr> <th>Country Name</th> <th></th> <th></th> <th></th> <th></th> </tr>
</thead>
<tbody>
<tr> <th>Afghanistan</th> <td>AFG</td> <td>35.253</td> <td>5.9</td> <td>Low income</td> </tr>
<tr> <th>Angola</th> <td>AGO</td> <td>45.985</td> <td>19.1</td> <td>Upper middle income</td> </tr>
<tr> <th>Burundi</th> <td>BDI</td> <td>44.151</td> <td>1.3</td> <td>Low income</td> </tr>
<tr> <th>Benin</th> <td>BEN</td> <td>36.440</td> <td>4.9</td> <td>Low income</td> </tr>
<tr> <th>Burkina Faso</th> <td>BFA</td> <td>40.551</td> <td>9.1</td> <td>Low income</td> </tr>
<tr> <th>...</th> <td>...</td> <td>...</td> <td>...</td> <td>...</td> </tr>
<tr> <th>Samoa</th> <td>WSM</td> <td>26.172</td> <td>15.3</td> <td>Lower middle income</td> </tr>
<tr> <th>Yemen, Rep.</th> <td>YEM</td> <td>32.947</td> <td>20.0</td> <td>Lower middle income</td> </tr>
<tr> <th>Congo, Dem.
Rep.</th> <td>COD</td> <td>42.394</td> <td>2.2</td> <td>Low income</td> </tr>
<tr> <th>Zambia</th> <td>ZMB</td> <td>40.471</td> <td>15.4</td> <td>Lower middle income</td> </tr>
<tr> <th>Zimbabwe</th> <td>ZWE</td> <td>35.715</td> <td>18.5</td> <td>Low income</td> </tr>
</tbody>
</table>
<p>95 rows × 4 columns</p>
</div>

***Comparison in Python:***

```
equal: ==
not equal: !=
greater than: >
less than: <
greater than or equal: >=
less than or equal: <=
```

```python
# Example: average 'Internet users' value across all countries in the High income group.
df[df['Income Group'] == 'High income']['Internet users'].mean()
```

    74.23168417462685

```python
# Select rows using multiple conditions
# Note 1: use & and | instead of and / or
# Note 2: each condition must be wrapped in ()
df[(df['Internet users'] > 20) & (df['Birth rate'] < 50)]
```

<div>
<style scoped>
.dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th { vertical-align: top; }
.dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>Country Code</th> <th>Birth rate</th> <th>Internet users</th> <th>Income Group</th> </tr>
<tr> <th>Country Name</th> <th></th> <th></th> <th></th> <th></th> </tr>
</thead>
<tbody>
<tr> <th>Aruba</th> <td>ABW</td> <td>10.244</td> <td>78.9</td> <td>High income</td> </tr>
<tr> <th>Albania</th> <td>ALB</td> <td>12.877</td> <td>57.2</td> <td>Upper middle income</td> </tr>
<tr> <th>United Arab Emirates</th> <td>ARE</td> <td>11.044</td> <td>88.0</td> <td>High income</td> </tr>
<tr> <th>Argentina</th> <td>ARG</td> <td>17.716</td> <td>59.9</td> <td>High income</td> </tr>
<tr> <th>Armenia</th> <td>ARM</td> <td>13.308</td> <td>41.9</td> <td>Lower middle income</td> </tr>
<tr> <th>...</th> <td>...</td> <td>...</td> <td>...</td> <td>...</td> </tr>
<tr> <th>Venezuela, RB</th> <td>VEN</td> <td>19.842</td> <td>54.9</td> <td>High income</td> </tr>
<tr> <th>Virgin Islands (U.S.)</th> <td>VIR</td> <td>10.700</td> <td>45.3</td> <td>High
income</td> </tr>
<tr> <th>Vietnam</th> <td>VNM</td> <td>15.537</td> <td>43.9</td> <td>Lower middle income</td> </tr>
<tr> <th>West Bank and Gaza</th> <td>PSE</td> <td>30.394</td> <td>46.6</td> <td>Lower middle income</td> </tr>
<tr> <th>South Africa</th> <td>ZAF</td> <td>20.850</td> <td>46.5</td> <td>Upper middle income</td> </tr>
</tbody>
</table>
<p>129 rows × 4 columns</p>
</div>

```python
# Filter data using multiple conditions
# Remember to wrap each condition in parentheses
df[(df['Birth rate'] > 20) & (df['Internet users'] < 50)]
df[(df['Birth rate'] > 20) | (df['Internet users'] < 50)]
```

## Sort

```python
# Sort values (ascending)
df.sort_values('Birth rate')
# Descending
df.sort_values('Birth rate', ascending=False)
```

<div>
<style scoped>
.dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th { vertical-align: top; }
.dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>Country Code</th> <th>Birth rate</th> <th>Internet users</th> <th>Income Group</th> </tr>
<tr> <th>Country Name</th> <th></th> <th></th> <th></th> <th></th> </tr>
</thead>
<tbody>
<tr> <th>Niger</th> <td>NER</td> <td>49.661</td> <td>1.7000</td> <td>Low income</td> </tr>
<tr> <th>Angola</th> <td>AGO</td> <td>45.985</td> <td>19.1000</td> <td>Upper middle income</td> </tr>
<tr> <th>Chad</th> <td>TCD</td> <td>45.745</td> <td>2.3000</td> <td>Low income</td> </tr>
<tr> <th>Burundi</th> <td>BDI</td> <td>44.151</td> <td>1.3000</td> <td>Low income</td> </tr>
<tr> <th>Mali</th> <td>MLI</td> <td>44.138</td> <td>3.5000</td> <td>Low income</td> </tr>
<tr> <th>...</th> <td>...</td> <td>...</td> <td>...</td> <td>...</td> </tr>
<tr> <th>Germany</th> <td>DEU</td> <td>8.500</td> <td>84.1700</td> <td>High income</td> </tr>
<tr> <th>Italy</th> <td>ITA</td> <td>8.500</td> <td>58.4593</td> <td>High income</td> </tr>
<tr> <th>Japan</th> <td>JPN</td> <td>8.200</td> <td>89.7100</td> <td>High
income</td> </tr>
<tr> <th>Portugal</th> <td>PRT</td> <td>7.900</td> <td>62.0956</td> <td>High income</td> </tr>
<tr> <th>Hong Kong SAR, China</th> <td>HKG</td> <td>7.900</td> <td>74.2000</td> <td>High income</td> </tr>
</tbody>
</table>
<p>195 rows × 4 columns</p>
</div>

```python
# Find the top 5 countries by birth rate
df.sort_values('Birth rate', ascending=False).head(5)
```

<div>
<style scoped>
.dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th { vertical-align: top; }
.dataframe thead th { text-align: right; }
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;"> <th></th> <th>Country Code</th> <th>Birth rate</th> <th>Internet users</th> <th>Income Group</th> </tr>
<tr> <th>Country Name</th> <th></th> <th></th> <th></th> <th></th> </tr>
</thead>
<tbody>
<tr> <th>Niger</th> <td>NER</td> <td>49.661</td> <td>1.7</td> <td>Low income</td> </tr>
<tr> <th>Angola</th> <td>AGO</td> <td>45.985</td> <td>19.1</td> <td>Upper middle income</td> </tr>
<tr> <th>Chad</th> <td>TCD</td> <td>45.745</td> <td>2.3</td> <td>Low income</td> </tr>
<tr> <th>Burundi</th> <td>BDI</td> <td>44.151</td> <td>1.3</td> <td>Low income</td> </tr>
<tr> <th>Mali</th> <td>MLI</td> <td>44.138</td> <td>3.5</td> <td>Low income</td> </tr>
</tbody>
</table>
</div>
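
Filtering and sorting are often chained. The cells in this lesson assume `df` is already loaded, so the sketch below builds a small stand-in DataFrame to stay runnable on its own. The four rows are copied from the outputs shown in this lesson; the variable names `high_birth_low_net`, `top2`, and `top2_alt` are illustrative only, and `nlargest` is shown as an optional shortcut rather than something the lesson itself uses.

```python
import pandas as pd

# Stand-in sample (assumption): four rows copied from the outputs above,
# not the full 195-row dataset used in the lesson.
df = pd.DataFrame(
    {
        "Country Code": ["NER", "AGO", "ZAF", "VNM"],
        "Birth rate": [49.661, 45.985, 20.850, 15.537],
        "Internet users": [1.7, 19.1, 46.5, 43.9],
        "Income Group": [
            "Low income",
            "Upper middle income",
            "Upper middle income",
            "Lower middle income",
        ],
    },
    index=pd.Index(["Niger", "Angola", "South Africa", "Vietnam"], name="Country Name"),
)

# Filter: both conditions must hold, and each one is wrapped in parentheses.
high_birth_low_net = df[(df["Birth rate"] > 20) & (df["Internet users"] < 50)]

# Sort the filtered rows by birth rate, highest first, and keep the top 2.
top2 = high_birth_low_net.sort_values("Birth rate", ascending=False).head(2)

# Optional shortcut: nlargest returns the n rows with the largest values,
# already ordered from largest to smallest.
top2_alt = high_birth_low_net.nlargest(2, "Birth rate")

print(top2)
```

Vietnam is dropped by the filter because its birth rate is below 20, and the remaining rows are ordered Niger, Angola, South Africa before `head(2)` trims the result.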