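# PyCon TW 2016 Collaborative Talk Notes <br> Day 3 - R0

> ### Quick Links
> - [Portal for Collaboration Notes](https://hackfoldr.org/pycontw2016) (hosted by [hackfoldr](https://hackfoldr.org/about) and [HackMD](https://hackmd.io/))
> - [Program Schedule](https://tw.pycon.org/2016/events/talks/)
> - [PyCon TW 2016 Official Site](https://tw.pycon.org/2016/)
>
> ### How to update this note?
> - Everyone can *freely* update this note.
> - Please respect all the participants and follow our [code of conduct](https://tw.pycon.org/2016/about/code-of-conduct/) during discussion.

## Talk: Python 的 50 道陰影 (Fifty Shades of Python)
- Info: https://tw.pycon.org/2016/en-us/events/talk/69816515947397183/
- Speaker: Tim Hsu
- Slides: http://www.slideshare.net/tim518/python50

#### Problems that are easy to run into but hard to understand at first
- Indentation
- Argument Passing
- Closure
- Global Variable
- Dead or Alive
- Interface
- List Related
- Package
- Quality
- Inheritance

### Indentation
- Tabs and spaces are treated as different characters.
- The interpreter keeps a stack of indentation levels (how many spaces the current block is indented). When the indentation decreases, it pops the stack until the new indentation equals the value on top; if no level matches, it raises an `IndentationError`.

### Argument Passing
- Default values of parameters are evaluated when the `def` statement runs.
- This happens only once, not on every call, so a mutable default (e.g. a list) is shared by all calls.

### Closure
- Storing functions in a list and then calling them one by one gives unexpected results.
- When a closure is created, Python only remembers the *name* of the enclosed variable (an entry in the symbol table) and does not run any of the function's code; the value is looked up only when the inner function is called.
- Variables captured by a closure are not garbage-collected right away (they live as long as the closure does).
- Solutions:
    1. Add a default parameter (`parameter=`) to the inner function, which binds the current value.
    2. Use a class with `__call__` instead.
    3. Use `functools.partial`.

### Global Variable
- Before running a function, Python first compiles its whole body (checking the syntax and deciding which names are local) and only then executes it.
- If a function assigns to a name that also exists as a global, Python treats that name as local throughout the function, so reading it before the assignment raises `UnboundLocalError`.
- Solution: declare the name with `global`.

### Dead or Alive
- Objects that form a circular reference never have their `__del__` called by reference counting alone, because each of their reference counts stays above 0; only the cyclic garbage collector can reclaim them.
- If objects in the cycle define their own `__del__`, Python 2 and Python 3 before 3.4 refuse to collect them (they end up in `gc.garbage`); PEP 442 lifted this restriction in Python 3.4.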
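To make the indentation-stack description concrete, here is a toy version of that rule in plain Python. It is a simplified sketch, not CPython's actual tokenizer: it assumes space-only indentation, ignores tabs and continuation lines, and the function name `indent_events` is hypothetical.

```python
def indent_events(source):
    """Return the INDENT/DEDENT events produced by an indentation stack.

    Simplified sketch: only leading spaces count (no tabs), blank lines
    are skipped, and a first line that is already indented is accepted.
    """
    stack = [0]  # open indentation widths; 0 is module level
    events = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines never change the indentation level
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)  # deeper block: push the new width
            events.append("INDENT")
        else:
            while width < stack[-1]:  # shallower: pop until the top matches
                stack.pop()
                events.append("DEDENT")
            if width != stack[-1]:
                raise IndentationError(
                    f"line {lineno}: unindent does not match any outer indentation level"
                )
    events.extend("DEDENT" for _ in stack[1:])  # close blocks still open at EOF
    return events


print(indent_events("if x:\n    y = 1\n    if y:\n        z = 2\nw = 3\n"))
# ['INDENT', 'INDENT', 'DEDENT', 'DEDENT']
```

Dedenting to a width that was never pushed (for example 8 spaces followed by 4) raises `IndentationError`, much like the real interpreter.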
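A minimal illustration of the argument-passing pitfall: the default list below is created once, when `def` executes, so calls that rely on it share one object. The `None` sentinel in `append_fresh` is a common idiom for getting a new list per call; both function names are made up for this example.

```python
def append_shared(item, bucket=[]):
    # The [] above is evaluated once, when `def` executes,
    # so every call without `bucket` reuses the same list object.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Idiomatic fix: use None as a sentinel and build a new list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


print(append_shared(1), append_shared(2))  # [1, 2] [1, 2]
print(append_fresh(1), append_fresh(2))    # [1] [2]
```

The shared list is visible as `append_shared.__defaults__`, which holds `([1, 2],)` after the two calls.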
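The closure pitfall and the three fixes listed in the notes can be sketched as follows. The helper names (`Returner`, `identity`) are illustrative only.

```python
import functools

# Problem: each lambda looks up `i` when it is *called*, not when it is created.
late = [lambda: i for i in range(3)]
print([f() for f in late])        # [2, 2, 2]

# Fix 1: bind the current value through a default parameter.
bound = [lambda i=i: i for i in range(3)]
print([f() for f in bound])       # [0, 1, 2]


# Fix 2: a class whose instances are callable via __call__.
class Returner:
    def __init__(self, value):
        self.value = value        # stored at construction time

    def __call__(self):
        return self.value


objs = [Returner(i) for i in range(3)]
print([f() for f in objs])        # [0, 1, 2]


# Fix 3: functools.partial freezes the argument when partial() is called.
def identity(x):
    return x


partials = [functools.partial(identity, i) for i in range(3)]
print([f() for f in partials])    # [0, 1, 2]
```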
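A small sketch of the global-variable pitfall: because `broken_increment` assigns to `counter`, the compiler treats `counter` as local for the whole function, so the read inside `counter += 1` fails. The names are hypothetical; the exact error message differs between Python versions, so it is printed rather than quoted here.

```python
counter = 0


def broken_increment():
    # `counter` is assigned in this body, so the compiler marks it local for
    # the whole function; `counter += 1` then reads a local that has no value.
    counter += 1
    return counter


def increment():
    global counter  # refer to the module-level name instead
    counter += 1
    return counter


try:
    broken_increment()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)

print(increment(), increment())  # 1 2
```

The compile-time decision can be checked with `broken_increment.__code__.co_varnames`, which lists `counter` as a local variable.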
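For the circular-reference discussion under Dead or Alive, this sketch contrasts a strong back-reference (a cycle that only the garbage collector can free) with a `weakref.ref` back-reference (no cycle, freed as soon as the last strong reference goes away). The `Parent`, `StrongChild` and `WeakChild` classes are hypothetical, and the "freed immediately" behavior assumes CPython's reference counting; `gc.disable()` is used only to keep an automatic collection from running in the middle of the demo.

```python
import gc
import weakref


class Parent:
    def __init__(self):
        self.children = []


class StrongChild:
    def __init__(self, parent):
        self.parent = parent                # strong back-reference: creates a cycle


class WeakChild:
    def __init__(self, parent):
        self._parent = weakref.ref(parent)  # weak back-reference: no cycle

    @property
    def parent(self):
        return self._parent()               # None once the parent is gone


gc.disable()  # keep automatic collections out of the demo

# 1) Strong cycle: parent -> children list -> child -> parent.
strong_parent = Parent()
strong_parent.children.append(StrongChild(strong_parent))
strong_probe = weakref.ref(strong_parent)   # observe without keeping it alive
del strong_parent
print(strong_probe() is None)               # False: refcounts in the cycle stay > 0
gc.collect()                                # the cycle collector reclaims it
print(strong_probe() is None)               # True

# 2) Weak back-reference: dropping the last strong reference frees the parent.
weak_parent = Parent()
c = WeakChild(weak_parent)
weak_parent.children.append(c)
weak_probe = weakref.ref(weak_parent)
del weak_parent
print(weak_probe() is None)                 # True on CPython (freed by refcounting)
print(c.parent is None)                     # True

gc.enable()
```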
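Solution: if a circular reference is unavoidable, use weak references (the `weakref` module).

### Interface

### Package
- Use virtualenv to isolate packages.
- `pip freeze > requirements.txt`
- If you are not familiar with the compile process, or you do scientific computing, conda is recommended.
- Consider maintaining requirements.txt by hand; otherwise any packages you installed just to try them out get exported by `pip freeze` and may pollute your collaborators' environments.

### Quality
- [flake8](https://flake8.readthedocs.io/en/latest/) checks whether code follows PEP 8.

## Talk: Analyzing Chinese Lyrics with Python
- Info: https://tw.pycon.org/2016/en-us/events/talk/27349121996161025/
- Speaker: Andy Dai
- Slides: https://speakerdeck.com/daikeren/analyzing-chinese-lyrics-with-python

#### Getting Chinese lyrics
Scraping the lyrics. Tools: [Scrapy](http://scrapy.org/), MongoDB

#### Cleaning the lyrics
There are duplicate songs, odd characters, and so on.
Extract the data you need (e.g. composer, lyricist).

#### Starting the analysis
Use [pandas](http://pandas.pydata.org/) + pymongo.
Think of pandas as a programmable Excel: almost anything Excel can do, pandas can do too.
[matplotlib](http://matplotlib.org/): a tool for plotting data as charts.
If the data set is small, you can load the whole database into pandas.
`df[df.lyricist=='林夕']` finds all the songs with lyrics by 林夕 (Lin Xi).

#### Word segmentation
Use [jieba](https://github.com/fxsjy/jieba)
- Supports Traditional Chinese, custom dictionaries, ...

#### Word frequency
Count with [Counter](https://docs.python.org/2/library/collections.html#collections.Counter) (`Counter.most_common`)

#### Visualization (word cloud)
Use the [wordcloud](https://github.com/amueller/word_cloud) package to make a word cloud
- `pip install wordcloud`
- You have to supply a Chinese font file yourself (passed as `font_path`).

#### Things there was no time to cover
- [jupyter](http://jupyter.org/)
- Elasticsearch; works even better together with Elasticsearch DSL (`elasticsearch-dsl`)

## Talk: Time series prediction implement on Python
- Info: https://tw.pycon.org/2016/en-us/events/talk/70022152522301510/
- Speaker: 古宣佑 Hsuanyo
- Slides: https://drive.google.com/file/d/0B7XRi9zdePwCUnI1VUlEQzh5cGM/view?usp=sharing

What is time series data?

Methods:
- OLS, GLS, ...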
- ARIMA + SVR, SdA
    - ARIMA: Autoregressive Integrated Moving Average model
        - http://wiki.mbalib.com/zh-tw/ARIMA%E6%A8%A1%E5%9E%8B
    - SVR: Support Vector Regression
        - Related paper: A Distributed PSO-ARIMA-SVR Hybrid System for Time Series Forecasting
        - http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6974534
    - SdA: Stacked Denoising Autoencoders
        - Related paper: Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
        - http://www.jmlr.org/papers/volume11/vincent10a/vincent10a.pdf

Scenario: reports from IoT devices; a model more complex than a linear one is needed.

- ARIMA: models the linear part (the big picture)
- SVR: models the residuals (the details)
- Use ACF/PACF and AIC/BIC to choose the best model

Packages:
- rpy2
    - decides the parameters automatically
    - imports `pandas2ri` and `r`
- scikit-learn: `SVR`, with grid search (`grid_search`) to tune the parameters

Problems:
- pattern modeling (the normal part)
- exception modeling (the abnormal part)
- cross-validation: keep the data in time order, because order matters for time series

Part 2: predicting whether people will stay in the office (e.g. from the correlation with temperature)

Advantages of SdA
- non-linear

Packages used
- Keras: deep learning
- TensorFlow: backend for Keras
- IPython: visualization

Pretraining: autoencoders

Steps:
- add noise
- autoencode
- train
- stack the encoders

Then supervised learning
- Dataset: UCI SML2010
- Target: indoor temperature
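The notes mention ACF/PACF for choosing the ARIMA orders. As a rough illustration of what the ACF measures, here is the standard sample autocorrelation in plain Python (PACF is not shown). Real projects would use a library routine such as `statsmodels.tsa.stattools.acf`; the name `acf` below is just this sketch's helper.

```python
def acf(series, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag.

    r_k = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2,
    where m is the series mean and the numerator sums over the n - k
    pairs that are k steps apart.
    """
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be in [0, len(series))")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("ACF is undefined for a constant series")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# An alternating series is strongly negatively correlated at lag 1.
print(acf([1, -1, 1, -1], 2))  # [1.0, -0.75, 0.5]
```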
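The ARIMA + SVR combination is usually written as an additive decomposition: $L_t$ is the linear component fitted by ARIMA, $N_t$ the non-linear remainder, and the SVR is trained on the ARIMA residuals $e_t$ to produce $\hat{N}_t$. This is the common formulation of such hybrids and is assumed, not confirmed, to match the talk. For model selection, $k$ is the number of estimated parameters, $n$ the number of observations and $\hat{\mathcal{L}}$ the maximized likelihood; lower AIC or BIC is better, and BIC penalizes extra parameters more heavily once $n > e^2 \approx 7.4$.

$$
\begin{aligned}
y_t &= L_t + N_t, \qquad \hat{y}_t = \hat{L}_t + \hat{N}_t \\
e_t &= y_t - \hat{L}_t \\
\mathrm{AIC} &= 2k - 2\ln\hat{\mathcal{L}}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{\mathcal{L}}
\end{aligned}
$$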
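On the cross-validation warning: shuffled k-fold would let a model train on points that come after the ones it is tested on. A common alternative is expanding-window (walk-forward) validation, sketched below in plain Python under the assumption that samples are already sorted by time; scikit-learn's `TimeSeriesSplit` implements the same idea. The function name and parameters are illustrative, not from the talk.

```python
def walk_forward_splits(n_samples, n_splits, min_train):
    """Yield (train_idx, test_idx) pairs for expanding-window validation.

    Assumes samples are sorted by time. Every test window starts right
    after its training window, so the model never sees the future.
    """
    if n_splits < 1 or min_train < 1:
        raise ValueError("n_splits and min_train must be at least 1")
    test_size = (n_samples - min_train) // n_splits
    if test_size < 1:
        raise ValueError("not enough samples for this many splits")
    for i in range(n_splits):
        train_end = min_train + i * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))


for train, test in walk_forward_splits(n_samples=10, n_splits=3, min_train=4):
    print(f"train 0..{train[-1]}  ->  test {test}")
```

With 10 samples, 3 splits and at least 4 training points, the test windows are `[4, 5]`, `[6, 7]` and `[8, 9]`, each trained only on the indices before it.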
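The "add noise" step of a denoising autoencoder corrupts each input before the network learns to reconstruct the clean version. The talk does not say which corruption was used; this sketch assumes masking noise (randomly zeroing components), one of the corruptions studied in the Vincent et al. paper linked above, and the function name `mask_noise` is hypothetical.

```python
import random


def mask_noise(x, drop_prob, rng):
    """Masking corruption (an assumption; the talk does not name the noise type).

    Each component of x is independently replaced by 0.0 with probability
    drop_prob; the denoising autoencoder is then trained to reconstruct the
    clean x from this corrupted copy.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be between 0 and 1")
    return [0.0 if rng.random() < drop_prob else v for v in x]


rng = random.Random(0)            # seeded so the corruption is reproducible
clean = [0.5, 1.2, 3.0, 0.7, 2.2]
noisy = mask_noise(clean, drop_prob=0.4, rng=rng)
print(noisy)                      # each value is either kept or replaced by 0.0
```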