# PyCon TW 2016 Collaborative Talk Notes <br> Day 3 - R0
> ### Quick Links
> - [Portal for Collaboration Notes](https://hackfoldr.org/pycontw2016) (hosted by [hackfoldr](https://hackfoldr.org/about) and [HackMD](https://hackmd.io/))
> - [Program Schedule](https://tw.pycon.org/2016/events/talks/)
> - [PyCon TW 2016 Official Site](https://tw.pycon.org/2016/)
>
> ### How to update this note?
> - Everyone can *freely* update this note.
> - Please respect all the participants and follow our [code of conduct](https://tw.pycon.org/2016/about/code-of-conduct/) during discussion and note-taking.
## Talk: Python 的 50 道陰影 (Fifty Shades of Python)
- Info: https://tw.pycon.org/2016/en-us/events/talk/69816515947397183/
- Speaker: Tim Hsu
- slide: http://www.slideshare.net/tim518/python50
#### Problems that are easy to run into but hard to understand at first:
- Indentation
- Argument passing
- closure
- Global Variable
- Dead or Alive
- Interface
- List Related
- Package
- Quality
- Inheritance
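### Indentation
- Tabs and spaces are treated as different characters
- The interpreter uses a stack to record how many spaces the current indentation has; when the indentation decreases, it pops the stack until the indentation matches the top of the stack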
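
The stack behaviour described above can be sketched in a few lines. This is a simplified illustration, not CPython's actual tokenizer: the function name `indent_tokens` and the event names are made up for this note, and only leading spaces are counted (tabs are ignored).

```python
def indent_tokens(lines):
    """Yield INDENT / DEDENT / LINE events using a stack of indentation widths."""
    stack = [0]  # the bottom of the stack is column 0
    for line in lines:
        if not line.strip():
            continue  # blank lines do not change the indentation level
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the current level: push the new width.
            stack.append(width)
            yield "INDENT"
        while width < stack[-1]:
            # Shallower: pop until the top of the stack is no longer deeper.
            stack.pop()
            yield "DEDENT"
        if width != stack[-1]:
            # The width landed between two recorded levels.
            raise IndentationError("unindent does not match any outer level")
        yield "LINE"
    while len(stack) > 1:
        # Close every block that is still open at the end of the input.
        stack.pop()
        yield "DEDENT"


source = ["if a:", "    if b:", "        x = 1", "y = 2"]
events = list(indent_tokens(source))
```

Dropping from 8 spaces straight back to 0 produces two `DEDENT` events, one for each popped level, while dropping to a width that was never pushed raises an error.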
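### Argument Passing
- Default values of arguments are evaluated when the `def` statement is executed
- They are evaluated only that one time and never again on later calls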
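
A minimal sketch of why this matters, assuming a mutable default such as a list. The function names `append_bad` and `append_good` are made up for illustration and are not from the talk.

```python
def append_bad(item, bucket=[]):
    # The list is created once, when `def` runs, and shared by every call.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Use None as a sentinel and build a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_bad(1)
second = append_bad(2)
print(first is second)   # True: both names point at the same default list
print(second)            # [1, 2]
print(append_good(1), append_good(2))  # [1] [2]
```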
### Closure
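- Store functions in a list first, then call them one by one -> the results are not what you expect
- When a closure is created, Python only remembers the name of the captured variable; it does not run the code inside the function (the variable is just a name in the symbol table)
- Variables remembered by a closure are not garbage-collected right away
- Solutions:
1. Add a `parameter=` default argument to the closed-over function
2. Use a class and implement `__call__`
3. Use `functools.partial`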
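
The problem and the three fixes listed above might look like the following. This is a sketch, not the speaker's slides: the names `Constant` and `identity` are made up here, and the functions return values instead of printing them.

```python
from functools import partial

# Problem: every lambda looks up the name `i` when it is called,
# and by then the loop has finished with i == 2.
late = [lambda: i for i in range(3)]
print([f() for f in late])        # [2, 2, 2]

# Fix 1: bind the current value through a default argument.
by_default = [lambda i=i: i for i in range(3)]
print([f() for f in by_default])  # [0, 1, 2]


# Fix 2: store the value on an instance and implement __call__.
class Constant:
    def __init__(self, value):
        self.value = value

    def __call__(self):
        return self.value


by_class = [Constant(i) for i in range(3)]
print([f() for f in by_class])    # [0, 1, 2]


# Fix 3: freeze the argument with functools.partial.
def identity(value):
    return value


by_partial = [partial(identity, i) for i in range(3)]
print([f() for f in by_partial])  # [0, 1, 2]
```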
### Global Variable
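- Before running a function, Python first checks that the syntax is valid, and only then executes the body
- If a global variable and a local variable share a name, Python treats every use of that name inside the function as local (assigning to a name anywhere in a function makes it local), which leads to confusing errors
- Solution: remember to use `global`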
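
A small sketch of the name clash and the `global` fix. The names `counter`, `broken_increment`, and `increment` are made up for illustration.

```python
counter = 0


def broken_increment():
    # `counter` is assigned in this function, so it is local everywhere
    # in the body; reading it before the assignment fails.
    counter = counter + 1
    return counter


def increment():
    global counter  # tell Python to use the module-level name
    counter = counter + 1
    return counter


try:
    broken_increment()
except UnboundLocalError as exc:
    error = exc
    print("broken_increment failed:", type(exc).__name__)

result = increment()
print(result)  # 1
```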
### Dead or Alive
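- Objects that form a circular reference do not get their `__del__` called, because their reference counts all stay above 0
- If both objects implement their own `__del__`, Python does not dare to collect them (before Python 3.4 such cycles were left in `gc.garbage`; PEP 442 changed this in 3.4)
- Solution
1. If you really need a circular reference, use a weak reference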
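
A sketch of the weak-reference approach, assuming a parent/child structure in which only the parent holds a strong reference. The `Node` class and the `finalized` list are made up for illustration, and the immediate finalization shown relies on CPython's reference counting.

```python
import weakref

finalized = []  # records which nodes had __del__ called


class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None   # will hold a weak reference, not the object
        self.children = []

    def add_child(self, child):
        self.children.append(child)       # strong reference: parent -> child
        child.parent = weakref.ref(self)  # weak reference: child -> parent

    def __del__(self):
        finalized.append(self.name)


root = Node("root")
leaf = Node("leaf")
root.add_child(leaf)

# The weak back-reference does not keep `root` alive, so dropping the
# last strong reference finalizes it immediately under CPython.
del root
print(finalized)      # ['root']
print(leaf.parent())  # None: the referent is gone
```

Calling the weak reference returns the object while it is alive and `None` afterwards, so code that walks up to the parent has to handle the `None` case.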
### Interface
### Package
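- Use virtualenv to isolate packages
- `pip freeze > requirements.txt`
- If you are not familiar with the compile process, or you do scientific computing, conda is recommended
- Editing requirements.txt by hand is recommended; otherwise packages you installed just to experiment with get exported by pip freeze and may pollute the environments of the colleagues you develop with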
### Quality
- [flake8](https://flake8.readthedocs.io/en/latest/) can check whether code follows PEP 8
## Talk: Analyzing Chinese Lyrics with Python
- Info: https://tw.pycon.org/2016/en-us/events/talk/27349121996161025/
- Speaker: Andy Dai
- slide: https://speakerdeck.com/daikeren/analyzing-chinese-lyrics-with-python
#### Getting Chinese lyrics
Scrape the lyrics
Tools: [Scrapy](http://scrapy.org/), MongoDB
#### Cleaning the lyrics
There are duplicate songs, strange characters, and so on
Extract the fields you need (e.g. composer, lyricist)
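#### Starting the analysis
Use [pandas](http://pandas.pydata.org/) + pymongo
Think of pandas as a programmable Excel. Almost anything Excel can do, pandas can do too
[matplotlib](http://matplotlib.org/): a tool for plotting data as charts.
If the data set is not large, you can load the whole database straight into pandas
`df[df.lyricist=='林夕']` directly finds all songs written by 林夕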
#### Word segmentation
Use [jieba](https://github.com/fxsjy/jieba)
- Supports Traditional Chinese, custom dictionaries, ...
#### Counting word frequency
Use [Counter](https://docs.python.org/2/library/collections.html#collections.Counter) to count (`Counter.most_common`)
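
A minimal sketch of the counting step. The `tokens` list is made-up sample data standing in for the output of the word-segmentation step; it is not taken from the talk.

```python
from collections import Counter

# Hypothetical tokens, as a segmenter might produce them from a lyric.
tokens = ["愛", "你", "愛", "夜", "愛", "你"]

frequencies = Counter(tokens)
top_two = frequencies.most_common(2)
print(top_two)  # [('愛', 3), ('你', 2)]
```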
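#### Visualization (word cloud)
Use the [wordcloud](https://github.com/amueller/word_cloud) package to make word clouds
- `pip install wordcloud`
- You need to supply a Chinese font file yourself
#### Other things there was no time to cover
- [jupyter](http://jupyter.org/)
- elasticsearch. Works even better together with Elasticsearch DSL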
## Talk: Time series prediction implement on Python
- Info: https://tw.pycon.org/2016/en-us/events/talk/70022152522301510/
- Speaker: 古宣佑 Hsuanyo
- Slides: https://drive.google.com/file/d/0B7XRi9zdePwCUnI1VUlEQzh5cGM/view?usp=sharing
What is Time series data?
Methods:
- OLS, GLS, ...
- ARIMA + SVR, SdA
- ARIMA, Autoregressive Integrated Moving Average Model
    - http://wiki.mbalib.com/zh-tw/ARIMA%E6%A8%A1%E5%9E%8B
- SVR, Support Vector Regression
    - Related Paper
        - A Distributed PSO-ARIMA-SVR Hybrid System for Time Series Forecasting
        - http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6974534
- SdA, Stacked Denoising Autoencoders
    - Related Paper
        - Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
        - http://www.jmlr.org/papers/volume11/vincent10a/vincent10a.pdf
Scenario:
IoT devices report data that calls for a model more complex than a linear one.
ARIMA: linear part (big picture)
SVR: residual (details?)
ACF/PACF
Use AIC/BIC to choose the best model
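
For reference, the two criteria are AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of parameters, n the number of observations, and L the maximized likelihood; lower is better. The sketch below applies them to made-up numbers: the candidate names, log-likelihoods, and `n_obs` are hypothetical, not results from the talk.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fitted models: name -> (log-likelihood, number of parameters).
candidates = {
    "ARIMA(1,0,0)": (-120.0, 2),
    "ARIMA(2,0,1)": (-118.5, 4),
}
n_obs = 100  # hypothetical series length

best_by_aic = min(candidates, key=lambda name: aic(*candidates[name]))
best_by_bic = min(candidates, key=lambda name: bic(*candidates[name], n_obs))
print(best_by_aic, best_by_bic)
```

With these numbers the smaller model wins under both criteria, because the gain in log-likelihood of the larger model does not pay for its two extra parameters.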
Package: rpy2
- used to decide the parameters automatically
- import `pandas2ri`, `importr`
Use sklearn's SVR
Use grid_search to tune the parameters
problem
- pattern modeling (normal part)
- exception modeling (abnormal part)
- cross validation: beware of the order, because it's important for time series
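
One way to respect the ordering during cross validation is an expanding-window split, where every test fold lies strictly after its training data. The sketch below is plain Python with a made-up helper name, `expanding_window_splits`; it is not necessarily the scheme used in the talk.

```python
def expanding_window_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs that never look into the future."""
    fold_size = n_samples // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover samples.
        test_end = train_end + fold_size if k < n_folds else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(expanding_window_splits(n_samples=10, n_folds=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Shuffled k-fold would let the model train on observations that come after the ones it is tested on, which makes the score look better than real forecasting performance.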
Part 2
Predict whether people will stay in the office (e.g. how it correlates with temperature)
Advantages of SdA
- non-linear
Packages used
- Keras - deep learning
- TensorFlow - backend of Keras
- IPython - visualization
pretraining: autoencoders
Steps:
- add noise
- autoencode
- train
- stack encoders
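
As an illustration of the first step only, the sketch below applies masking noise, the corruption in which a random fraction of the inputs is set to zero. The helper name `masking_noise`, the sample vector, and the corruption level are assumptions made for this note; the talk's own code used Keras and is not reproduced here.

```python
import random


def masking_noise(vector, corruption_level, rng):
    """Return a copy of `vector` with each entry zeroed with the given probability."""
    return [0.0 if rng.random() < corruption_level else x for x in vector]


rng = random.Random(0)                    # seeded for repeatability
clean = [0.5, 1.0, 0.25, 0.75, 0.9, 0.1]  # hypothetical input features
noisy = masking_noise(clean, corruption_level=0.3, rng=rng)
print(noisy)
```

The autoencoder is then trained to reconstruct `clean` from `noisy`, which pushes it to learn features that do not depend on any single input.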
supervised learning
- Dataset: UCI SML2010
- Target: indoor temperature