# Optimizing Python Code with pandas
###### tags: `Datacamp` `python` `panda` `data science`
>**作者:何彥南**
>Datacamp 課程: [Optimizing Python Code with pandas](https://www.datacamp.com/courses/optimizing-python-code-with-pandas)
**注意:**
1. df 為 pandas 的 DataFrame 的縮寫。
2. pd 為 panda 套件的縮寫。
3. 請以官方文件 [panda doc](https://pandas.pydata.org/pandas-docs/stable/) 為主。
4. 注意panda 的版本,有些功能可能在新版無法使用。
5. 程式碼內`#`標記的地方為 output
[toc]
# [ch1] Select columns and rows efficiently
>This chapter will give you an overview of why efficient code matters and selecting specific and random rows and columns efficiently.
## 1. Why we need efficient code and how to measure it
>使用的資料 [poker.csv](https://assets.datacamp.com/production/repositories/3832/datasets/c715cfae17d00d26693da8e612cb5fbd64e69589/poker_hand.csv)
![](https://i.imgur.com/rj5U9Sz.png)
### How do we measure time?
> 我們這邊使用time 套件,time.time()可以抓取當下的電腦時間
```python=
import time
start_time = time.time()
time.sleep(5)
print("Results from the first method calculated in %s seconds"
% (time.time() - start_time))
# Results from the first method calculated in 5.001059532165527 seconds
```
* 小技巧: 在要print()的字串裡面加入 `%s` 然後在字串的後面加入 `%(這邊可以做基本運算)` 結果最後會放在 `%s` 的位置輸出,這樣在 print 多個不同 type 資料的時候 可以避免 TypeError 的問題。
* 像這樣:![](https://i.imgur.com/s1hmSrT.png)
* 這時用上面的小技巧就沒問題啦~
```python=
print('%s a' %(1))
```
* 這樣也是可以啦,只是多個變數的時候比較不適用。
```python=
print(1,'a')
```
* 還有另一種方式大家也可以研究看看 `'{} {}'.format( )`>>> [更多](https://pyformat.info/)
### Measuring time using time I | var()
> 比較`np.var()`和 panda底下的`Serise.var()`
* [numpy.var()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.var.html)
```python=
start_time = time.time()
print(poker_hands['R2'].var())
# 14.190090008220789
print("Results from the pandas method calculated in %s seconds"
% (time.time() - start_time))
# Results from the pandas method calculated in 0.001730203628540039 seconds
```
* [pandas.Serise.var()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.var.html)
```python=
start_time = time.time()
print(np.var(poker_hands['R2']))
# 14.189522631571121
print("Results from the NumPy method calculated in %s seconds"
% (time.time - start_time))
# Results from the NumPy method calculated in 0.0010731220245361328 seconds
```
### Measuring time using time II
>這邊我們比較兩個function 耗的時間
>不使用 for 迴圈
![](https://i.imgur.com/rRYBod7.png)
```python=
start_time = time.time()
first_method = formula()
print("Results from the first method calculated in %s seconds"
% (time.time() - start_time))
# Results from the first method calculated in 0.000108957290649 seconds
```
>使用 for 迴圈
![](https://i.imgur.com/4nEx7gf.png)
```python=
start_time = time.time()
second_method = brute_force()
print("Results from the second method calculated in %s seconds"
%(time.time() - start_time))
# Results from the second method calculated in 0.174870967865 second
```
* Difference in speed: 160,394.967179%
* 所以在寫程式時盡量避開for 迴圈。
## 2. Locate rows using the .iloc() and .loc() functions
### Locate targeted rows
>loc
```python=
rows = range(0, 500)
start_time = time.time()
data.loc[rows]
print("Results from the first method calculated in %s seconds"
% (time.time() - start_time))
# Results from the first method calculated in 0.001951932 seconds
```
>iloc
```python=
start_time = time.time()
data.iloc[rows]
print("Results from the first method calculated in %s seconds"
% (time.time() - start_time))
# Results from the second method calculated in 0.0007140636 seconds
```
* Difference in speed: 173.355592654%
* 所以 iloc 比較有效率。
### Locate targeted columns
> iloc
```python=
start_time = time.time()
data.iloc[:,:3]
print("Results from the first method calculated in %s seconds"
% (time.time() - start_time))
# Results from the first method calculated in 0.00125193595886 seconds
```
>[['column_name1','column_name2']]
```python=
start_time = time.time()
data[['S1', 'R1', 'S2']]
print("Results from the first method calculated in %s seconds"
% (time.time() - start_time))
# Results from the first method calculated in 0.000964879989624 seconds
```
* Difference in speed: 29.7504324188%
* 所以在選取column 時,盡量使用 `simple` 的方式。
## 3. Select random rows
### Sampling random rows & columns | sample()、randint()
> 在隨機取樣的部分,使用sample( ) 比較快
* [pandas.DataFrame.sample()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html)
* `axis=0`: 對 row
* `axis=1`: 對 column
* [numpy.random.randint()](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.randint.html#numpy.random.randint)
>sample()
```python=
start_time = time.time()
poker.sample(100, axis=0)
print("Results from the second method calculated in %s seconds"
% (time.time() - start_time))
# Results from the first method calculated in 0.000750064849854 seconds
```
>iloc、np.random
```python=
start_time = time.time()
poker.iloc[np.random.randint(low=0, high=poker.shape[0], size=100)]
print("Results from the second method calculated in %s seconds"
% (time.time() - start_time))
# Results from the second method calculated in 0.00103211402893 seconds
```
* Difference in speed: 37.6033057849%
> 在選取column 時 sample()也比較快
>sample()
```python=
start_time = time.time()
poker.sample(3, axis=1)
print("Results from the second method calculated in %s seconds"
%(time.time() - start_time))
# Results from the second method calculated in 0.000683069229126 seconds
```
>iloc、np.random
```python=
N = poker.shape[1]
start_time = time.time()
poker.iloc[:,np.random.randint(low=0, high=N, size=3)]
print("Results from the first method calculated in %s seconds"
%(time.time() - start_time))
# Results from the first method calculated in 0.0010929107666 seconds
```
* Difference in speed: 59.9999999998
---
# [ch2] Replace values of a DataFrame using the .replace() function
>This chapter shows the usage of the replace() function for replacing one or multiple values using lists and dictionaries.
![](https://i.imgur.com/sRWf2Pk.png)
## 1. Replace scalar values using .replace()
### Replace values in pandas | replace()
> 替換指定的字串,使用replace()比較快
* [pandas.Series.str.replace()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html#pandas.Series.str.replace)
* [pandas.DataFrame.replace()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html)
>loc[ ]
```python=
start_time = time.time()
names['Gender'].loc[names.Gender=='MALE'] = 'BOY'
print("Results from the first method calculated in %s seconds"
% (time.time() - start_t
# Results from the first method calculated in 0.0311849 seconds
```
>replace()
```python=
start_time = time.time()
names['Gender'].replace('MALE', 'BOY', inplace=True)
print("Results from the first method calculated in %s seconds"
% (time.time() - start_time))
# Results from the first method calculated in 0.0016758441925 seconds
```
* Differerence in speed: 1,704.52411439%
## 2. Replace values using lists
![](https://i.imgur.com/OgqpTaU.png)
> 在替換多個值得時候使用replac() 比較方便
> loc
```python=
start_time = time.time()
names['Ethnicity'].loc[(names["Ethnicity"] == 'WHITE NON HISPANIC')
| (names["Ethnicity"] == 'WHITE NON HISP')] = 'WNH'
print("Results from the above operation calculated in %s seconds"
%(time.time() - start_time))
# Results from the second method calculated in 0.0276169776917 seconds
```
> replace()
```python=
start_time = time.time()
names['Ethnicity'].replace(['WHITE NON HISPANIC','WHITE NON HISP'], 'WNH', inplace=True)
print("Results from the above operation calculated in %s seconds"
% (time.time() - start_time))
# Results from the above operation calculated in 0.00144791603088 seconds
```
* Difference in speed: 2160.68681809%
* inplace=True: 改變原數據(在改變特定範圍時才有差)
>replace() II
```python=
names['Ethnicity'].replace(['WHITE NON HISP'], 'WHITE NON HISPANIC',
inplace=True)
names['Ethnicity'].replace(['BLACK NON HISP'], 'BLACK NON HISPANIC',
inplace=True)
names['Ethnicity'].replace(['BLACK NON HISP','WHITE NON HISP'],
['BLACK NON HISPANIC', 'WHITE NON HISPANIC'], inplace=True)
```
## 3. Replace values using dictionaries
### Replace single values with dictionaries
>with dictionaries
```python=
start_time = time.time()
names['Gender'].replace({'MALE':'BOY', 'FEMALE':'GIRL'}, inplace=True)
print("Results from the first method calculated in %s seconds"
% (time.time() - start_time))
# Results from the first method calculated in 0.00197792053223 seconds
```
> 普通 replace()
```python=
start_time = time.time()
names['Gender'].replace('MALE', 'BOY', inplace=True)
names['Gender'].replace('FEMALE', 'GIRL', inplace=True)
print("Results from the first method calculated in %s seconds"
% (time.time() - start_time))
# Results from the first method calculated in 0.00307083129883 seconds
```
* Difference in speed: 55.2555448407%
### Replace multiple values using dictionaries
> 我們可以利用兩層 dict ,外層的key 用來指定column,而內層就像是上面對單行的 replace with dictionaries
```python=
start_time = time.time()
names.replace({'Ethnicity': {'ASIAN AND PACI': 'ASIAN', 'ASIAN AND PACIFIC ISLANDER': 'ASIAN',
'BLACK NON HISPANIC': 'BLACK', 'BLACK NON HISP': 'BLACK',
'WHITE NON HISPANIC': 'WHITE', 'WHITE NON HISP': 'WHITE'}})
print("Results from the above operation calculated in %s seconds"
% (time.time() - start_time))
# Results from the above operation calculated in 0.0028018 seconds
```
* 這邊 `Ethnicity `為指定的欄位
---
# [ch3] Speed efficient methods for iterating through a DataFrame
>This chapter presents different ways of iterating through a Pandas DataFrame and why vectorization is the most efficient way to achieve it.
## 1. Looping using the .iterrows() function
![](https://i.imgur.com/FZfp3gz.png)
### Generators in Python
>generator是一個可以迭代的object
```python=
def city_name_generator():
yield('New York')
yield('London')
yield('Tokyo')
yield('Sao Paolo')
city_names = city_name_generator()
next(city_names)
# 'New York'
next(city_names)
# 'London'
next(city_names)
# 'Tokyo'
next(city_names)
# 'Sao Paolo'
next(city_names)
# Traceback (most recent call last): File "<stdin>", line 1, in <module>
# StopIteratio
```
* 我們也可以用type看出他是個generator
![](https://i.imgur.com/mVr6C6o.png)
* 而generator是不能直接顯示出來的
![](https://i.imgur.com/Sf3N0Va.png)
### Looping using the .iterrows() function | iterrows()
> 對 dataframe 使用 iterrow() 他會返回 generator,它是可以迭代的物件
* [pandas.DataFrame.iterrow()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html)
```python=
gen = poker.iterrows()
first_element = next(gen)
first_element[0]
# 0
first_element[1]
# S1 2
# R1 11
# S2 2
# ...
```
### The iterrows() function for looping a DataFrame
> 我們也可以使用iterrows()做逐行迭代
```python=
data_generator = poker_hands.iterrows()
for index, values in data_generator:
if index % 2 == 1:
hand_sum = sum([values[1], values[3], values[5], values[7], values[9]])
```
* 大家要注意,絕對不要使用 `for i in range(len(df)): iloc[i,:]`去對每個row迭代 ,使用iterrows()會更有效率。
## 2. Looping using the .apply() functionc
![](https://i.imgur.com/FZfp3gz.png)
### The .apply() function for rows | apply()
* [pandas.DataFrame.apply()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html)
* [pandas.Series.apply()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.apply.html)
> 計算這五個欄位,`每個row` 的sum(),我們可以發現apply() 的表現優異,速率差到快4倍。
>apply
```python=
start_time = time.time()
poker[['R1', 'R2', 'R3', 'R4', 'R5']].apply(lambda x: sum(x), axis=1)
print("Results from the above operation calculated in %s seconds" % (time.time() - start_time))
# Results from the above operation calculated in 0.636334896088 seconds
```
>iterrows()
```python=
start_time = time.time()
for ind, value in poker.iterrows():
sum([value[1], value[3], value[5], value[7], value[9]])
print("Results from the above operation calculated in %s seconds" % (time.time() - start_time))
# Results from the above operation calculated in 3.15526986122 seconds
```
* Difference in speed: 395.85051529%
### The .apply() function for columns | sum()
> 這邊我們別用`pandas 和 python 的sum()`對 column 做 sum(),可以發現 pandas內建的sum() 既為優秀。
* [pandas.DataFrame.sum()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sum.html)
>apply + python 內建的sum()
```python=
start_time = time.time()
# 計算欄位總和
poker[['R1', 'R2', 'R3', 'R4', 'R5']].apply(lambda x: sum(x), axis=0)
print("Results from the above operation calculated in %s seconds" % (time.time() - start_time))
# Results from the above operation calculated in 0.00490880012 seconds
```
```python=
start_time = time.time()
poker[['R1', 'R1', 'R3', 'R4', 'R5']].sum(axis=0)
print("Results from the above operation calculated in %s seconds" % (time.time() - start_time))
# Results from the above operation calculated in 0.00279092788 seconds
```
* Difference in speed: 160.310951649%
* 這邊apply()就多於了
### Use of .apply() function in every cell
> 你也可以對dataframe 裡的每個儲存格做一樣的動作,但是此動作是要可行且獨立的,也就是每個儲存格都能透過 function(或lambda) 得到對應的值。
> 這邊我們對dataframe 裡面每個儲存個做平方的動作。
```python=
get_square = lambda x: x**2
data_sum = poker_hands.apply(get_square)
print(data_sum.head())
#...
```
## 3. Vectorization over Pandas series
![](https://i.imgur.com/XRQWE8B.png)
### How to perform pandas vectorization(向量化)
> 向量就是一串數字,而這邊我們對每個row使用sum(),而sum()輸出的值只有一個,最後的到的結果就是一個`Series(column)` 也就是一個向量
```python=
start_time = time.time()
poker[['R1', 'R2', 'R3', 'R4', 'R5']].sum(axis=1)
print("Results from the above operation calculated in %s seconds" % (time.time() - start_time))
# Results from the above operation calculated in 0.0026819705 seconds
poker[['R1', 'R2', 'R3', 'R4', 'R5']].sum(axis=1).head()
```
>output
| ||
|-------------|----|
|0 | 47 |
|1 | 47 |
|2 | 47 |
|3 | 47 |
|4 | 47 |
|- |- |
| dtype: int64| |
### Comparison to the previous methods
> 效率: vectorization >.apply() >.iterows()
>iterrows()
```python=
data_generator = data.iterrows()
start_time = time.time()
for index, value in data_generator:
sum([value[1], value[3], value[5], value[7]])
print("Results from the above operation calculated in %s seconds" % (time.time() - start_time))
# Results from the above operation calculated in 3.37918996 seconds
```
>apply()
```python=
start_time = time.time()
data[['R1', 'R2', 'R3', 'R4', 'R5']].apply(lambda x: sum(x),axis=1)
print("Results from the above operation calculated in %s seconds" % (time.time() - start_time))
# Results from the above operation calculated in 0.637711048 seconds
```
* Difference between vectorization and the `.iterows()` function: 111,800.75%
* Difference between vectorization and the `.apply()` function: 20,853%
## 4. Vectorization with NumPy arrays using .values()
### NumPy in pandas | numpy
* [Numpy tutorial](https://www.numpy.org/devdocs/user/quickstart.html)
> 用 np.int8 對column 指定特別的格式
```python=
df = pd.DataFrame({'Col1':[0, 1, 2]}, dtype=np.int8)
print(df)
```
>output
| |Col1|
|---------|----|
|0 | 0 |
|1 | 1 |
|2 | 2 |
>使用np.array製造一個array,跟`python的list` 、`pandas的series`很像
```python=
nd = np.array(range(7))
print(nd)
# [0 1 2 3 4 5 6]
```
### How to perform NumPy vectorization | values()
* [pandas.Series.values()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.values.html)
>在 pandas 裡可以使用 `values()`,將資料轉換成 array的形式,下面我們比較一下有無用 array 的差別。發現會比直接用sum() 快一點
> with values
```python=
start_time = time.time()
poker[['R1', 'R2', 'R3', 'R4', 'R5']].values.sum(axis=1)
print("Results from the above operation calculated in %s seconds" % (time.time() - start_time))
# Results from the above operation calculated in 0.00157618522644 seconds
```
>without values
```python=
start_time = time.time()
poker[['R1', 'R2', 'R3', 'R4', 'R5']].sum(axis=1)
print("Results from the above operation calculated in %s seconds" % (time.time() - start_time))
# Results from the above operation calculated in 0.00268197059631 seconds
```
* Difference in time: 39.0482%
---
# [ch4] Data manipulation for groups using the .groupby() function
>This chapter describes the groupby() function and how we can use it to transform values in place, replace missing values and apply complex functions group-wise.
## 1. Data transformation using .groupby().transform | groupby()
![](https://i.imgur.com/CzB4fkK.png)
>這邊我們示範最基本的groupby,他對 `smoker`這個行做group(分群),你可以把`smoker`當作一個變數,在這個變數裡存在著 `yes`和`no` 這兩種值,也就是我們可以依據`smoker`這個變數把整個dataframe做分成`yes`和`no`這兩群 。下面我們對分完的`yes`、`no`這兩群資料使用count(),可以看到兩組個別的資料數量。
* [pandas.DataFrame.groupby](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html)
```python=
restaurant_grouped = restaurant.groupby('smoker')
print(restaurant_grouped.count())
```
>output
![](https://i.imgur.com/OgAO6VN.png)
* 他是去算有幾筆資料(row),最後顯示的是一群裡面每個column裡包含的資料數量。因為 count()顯示的是組間的差異,在同一個群裡面的不同變數應當一致。`所以只要沒有空值,同一組(group)裡面每個變數(column)的資料數量是一樣的。`
### Data transformation | transform()
>首先我們先創一個lambda
```python=
zscore = lambda x: (x - x.mean() ) / x.std()
```
* [pandas.DataFrame.transform()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transform.html)
* 相較於apply() ,transform() 限制較多
>再來,我們對`time` groupby() 後得到 `restaurant_grouped` ,接著使用`transform()`對每個group 執行 lambda 的動作
```python=
restaurant_grouped = restaurant.groupby('time')
restaurant_transformed = restaurant_grouped.transform(zscore)
restaurant_transformed.head()
# Results from the first operation calculated in 0.016291141 seconds
```
>output
![](https://i.imgur.com/gfM97r9.png)
### Comparison with native methods
```python=
restaurant.groupby('sex').transform(zscore)
# Results from the first operation calculated in 0.016291141 seconds
mean_female = restaurant.groupby('sex').mean()['total_bill']['Female']
mean_male = restaurant.groupby('sex').mean()['total_bill']['Male']
std_female = restaurant.groupby('sex').std()['total_bill']['Female']
std_male = restaurant.groupby('sex').std()['total_bill']['Male']
for i in range(len(restaurant)):
if restaurant.iloc[i][2] == 'Female':
restaurant.iloc[i][0] = (restaurant.iloc[i][0] - mean_female)/std_female
else:
restaurant.iloc[i][0] = (restaurant.iloc[i][0] - mean_male)/std_male
# Results from the second operation calculated in 3.937326908 seconds
```
* Difference in time: 24,068.5145%
### The min-max normalization using .transform()
```python=
min_max_tr = lambda x: (x - x.min()) / (x.max() - x.min())
restaurant_grouped = restaurant_data.groupby('time')
restaurant_min_max_group = restaurant_grouped.transform(min_max_tr)
print(restaurant_min_max_group.head())
#...
```
### Transforming values to probabilities
```python=
pois_tr = lambda x: np.exp(-x.mean()*x) * x.mean()
restaurant_grouped = restaurant_data.groupby('time')
restaurant_pois_group = restaurant_grouped['tip'].transform(pois_tr)
print(restaurant_pois_group.head())
```
### Validation of normalization transformation
```python=
zscore = lambda x: (x - x.mean()) / x.std()
poker_trans = poker_grouped.transform(zscore)
poker_regrouped = poker_trans.groupby(poker_hands['Class'])
print(np.round(poker_regrouped.mean(), 3))
#...
print(poker_regrouped.std())
#...
```
## 2. Missing value imputation using transform()
> 我們利用 count(),去找空值
```python=
prior_counts = restaurant.groupby('time')['total_bill'].count()
missing_counts = restaurant_nan.groupby('time')['total_bill'].count()
print(prior_counts - missing_counts)
```
>output
![](https://i.imgur.com/RA2kw9d.png)
### Missing value imputation | fillna()
>我們使用和fillna(),用平均值填補空值
* [pandas.DataFrame.fillna](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html)
```python=
missing_trans = lambda x: x.fillna(x.mean())
restaurant_nan_grouped = restaurant_nan.groupby('time')['total_bill']
restaurant_nan_grouped.transform(missing_trans)
# Results from the above operation calculated in 0.00368881225586 seconds
```
>output
![](https://i.imgur.com/tZX5asP.png)
### Comparison with native methods
>使用原本的方式,繞很多路,速度也差很多
```python=
start_time = time.time()
mean_din = restaurant_nan.loc[restaurant_nan.time == 'Dinner']['total_bill'].mean()
mean_lun = restaurant_nan.loc[restaurant_nan.time == 'Lunch']['total_bill'].mean()
for row in range(len(restaurant_nan)):
if restaurant_nan.iloc[row]['time'] == 'Dinner':
restaurant_nan.loc[row, 'total_time'] = mean_din
else:
restaurant_nan.loc[row, 'total_time'] = mean_lun
print("Results from the above operation calculated in %s seconds" % (time.time() - start_time))
# Results from the above operation calculated in 0.172566890717 seconds
```
* Difference in time: 4,578.115%
### Identifying missing values
```python=
restaurant_nan_grouped = restaurant_nan.groupby('smoker')
restaurant_nan_nval = restaurant_nan_grouped['tip'].count()
print(restaurant_nan_grouped['total_bill'].count() - restaurant_nan_nval)
# smoker
# No 41
# Yes 24
# dtype: int64
```
### Missing value imputation
```python=
missing_trans = lambda x: x.fillna(x.median())
restaurant_grouped = restaurant_data.groupby('time')
restaurant_impute = restaurant_grouped.transform(missing_trans)
print(restaurant_impute.head())
#...
```
## 3. Data filtration using the lter() function
### Purpose of filter()
* Limit results based on an aggregate feature
1. Number of missing values
1. Mean of a specic feature
1. Number of occurrences ofthe group
### Filter using groupby().filter() | filter()
>我們可以發現filter()比原來的方式更快。
* [pandas.DataFrame.filter()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html)
* 它是完全針對index_label 去篩選清算成本。
```python=
restaurant_grouped = restaurant.groupby('day')
filter_trans = lambda x : x['total_bill'].mean() > 20
restaurant_filtered = restaurant_grouped.filter(filter_trans)
# Results from the above operation calculated in 0.00414085388184 seconds
print(restaurant_filtered['tip'].mean())
# 3.11527607362
print(restaurant['tip'].mean())
# 2.9982786885245902
```
### Comparison with native methods
```python=
t=[restaurant.loc[df['day'] == i]['tip'] for i in restaurant['day'].unique()
if restaurant.loc[df['day'] == i]['total_bill'].mean()>20]
restaurant_filtered = t[0]
for j in t[1:]:
restaurant_filtered = restaurant_filtered.append(j,ignore_index=True)
# Results from the above operation calculated in 0.00663900375366 seconds
print(restaurant_filtered.mean())
# 3.11527607362
```
* Difference in time: 60.329341317157024%