# PyCon TW 2016 Collaborative Talk Notes <br> Day 2 - R0

> ### Quick Links
> - [Portal for Collaboration Notes](https://hackfoldr.org/pycontw2016) (hosted by [hackfoldr](https://hackfoldr.org/about) and [HackMD](https://hackmd.io/))
> - [Program Schedule](https://tw.pycon.org/2016/events/talks/)
> - [PyCon TW 2016 Official Site](https://tw.pycon.org/2016/)
>
> ### How to update this note?
> - Everyone can *freely* update this note.
> - Please respect all the participants and follow our [code of conduct](https://tw.pycon.org/2016/about/code-of-conduct/) during discussion and note-taking.

## Talk: 用Numpy做一個自己的股票分析系統 (Build Your Own Stock Analysis System with NumPy)

- Info: https://tw.pycon.org/2016/events/talk/35734163478806534/
- Speaker: PF
- Slides: http://slides.com/iampf/pycon-2016

- Historical data is available from the Taiwan Futures Exchange and the Taiwan Stock Exchange
- Data analyzed: time, opening price, closing price, highest price, lowest price, trading volume
- Three trading actions: buy, short sell, hold
- itertools!!
- np.convolve np.piecewise
- multiprocess
- GPU: PyCuda (you still have to write C!! orz)
- Amcharts.js jQuery Flask Sqlite
- Questions can go to pf@hst.tw

## Talk: 轉轉轉好運旺來一起來之雲端轉檔大作戰！ (roughly: The Great Cloud Transcoding Battle)

- Info: https://tw.pycon.org/2016/events/talk/70089712156541000/
- Speaker: 林進錕
- Slides: http://www.slideshare.net/ssuser25242a/ss-62714359

##### Existing packages

* Gearman: a workflow driven from the workers, but the workflow is hard-coded in the workers
* Spotify Luigi
* Others (Tractor, Celery...)
##### What we need

* No Job Server to manage
* Decentralized
* Self-managing
* A good night's sleep XD
* Top-down: workers can adjust their type or requirements as needed

#### [KKBOX's MASS package](https://github.com/kkbox/mass)

## Talk: Continuous Deployment in AWS Lambda and Python

- Info: https://tw.pycon.org/2016/events/talk/69186595164520503/
- Speaker: Suiting
- Slides: http://bit.ly/1sssLYO

#### Deploy pipeline

developer -> github -> jenkins -> S3 -> lambda deployer

## Talk: Deep Learning with Python & TensorFlow

- Info: https://tw.pycon.org/2016/events/talk/56874546946375700/
- Speaker: Ian Lewis
- Twitter @IanMLewis
- [Video of the talk](https://www.youtube.com/watch?v=2hYljESm0eQ&feature=youtu.be#t=2h35m15s)

- specific field of ML -> ANN -> deep ANN
- ANN good at: classification problems
- Tensor: n-dim array

DNN = a large number of matrix operations

You need distributed training

[Google unveils the TPU, its first in-house chip dedicated to machine learning (iThome, in Chinese)](http://www.ithome.com.tw/news/106042)

- [Tensorflow](https://www.tensorflow.org/)
    - core is cpp-based (thus fast computation) and provides a Python interface
    - Graph: dataflow graph
- core concepts
    - constants
    - Placeholders: fed with data on execution
    - Variables
    - Sessions
    - Operations
- [MNIST Tutorials](https://www.tensorflow.org/versions/master/tutorials/index.html)

[Jeff Dean's Talk](YoutubeLink)

[Tensorflow workshop](https://github.com/amygdala/tensorflow-workshop)

> This is basically sorcery

> A NN is just a black box; you have no idea what it does along the way

> [一天搞懂深度學習 (Understanding Deep Learning in One Day)]( http://www.slideshare.net/tw_dsconf/ss-62245351)

> Can this notebook be found anywhere?

> What is it that can be found in a notebook?

> The jupyter notebook he is presenting right now

> [jupyter](https://github.com/jupyter)

> You mean this file, right?

> Yes

> Maybe it's only on his computer? Not sure whether it has been published QQ

> There is a [notebook](https://github.com/jupyter/notebook) project on GitHub

> ask him to share the notebook after the talk?

> Judging from his Twitter, this is his first time speaking on this topic

> jupyter is mostly used for web scraping

> People doing data science use jupyter heavily too~

> By the way, jupyter notebook is just ipython notebook

## Talk: 機器學習在搜尋排序上的應用 (Applying Machine Learning to Search Ranking)

- Info: https://tw.pycon.org/2016/events/talk/62015679733170217/
- Speaker: Jiawei Chen
- (Are there no slides?) https://goo.gl/dt9Rii

### A search engine for the collaborative notes

1. indexing
2. 
   Ways to rank search results:
    - tf-idf: term frequency–inverse document frequency
    - click model
    - PageRank
3. If the search results are unsatisfactory
    - The first idea is to tune the feature weights
    - Add more features
    - Train a topic model, extract its features, and add them as well

* [RankBrain](https://en.wikipedia.org/wiki/RankBrain)
* [Learning to rank with scikit-learn: the pairwise transform](http://fa.bianp.net/blog/2012/learning-to-rank-with-scikit-learn-the-pairwise-transform/)
* [Letor Dataset](http://research.microsoft.com/en-us/um/beijing/projects/letor/)
* [xgboost model](https://github.com/dmlc/xgboost)

## Talk: Write your own micro data processing framework in python

- Info: https://tw.pycon.org/2016/events/talk/69113918303240246/
- Speaker: David Chen
- [Slides]( https://github.com/lucemia/slides/blob/master/slides/micropipeline.md)

* gliacloud.com
* What is a data processing framework; what is a taskflow...
* [TaskFlow (OpenStack)](https://wiki.openstack.org/wiki/TaskFlow)
* [Luigi (Spotify)](https://github.com/spotify/luigi)
* [DataFlow (Google)](https://cloud.google.com/dataflow/)
* [Django-p](https://django-pipeline.readthedocs.io/en/latest/)
* [Google Pipeline API](https://github.com/GoogleCloudPlatform/appengine-pipelines)

#### Design of Django-P

* pipe: abstraction of a pipeline
* future: the return value of a pipeline will be available in the future
* pipeline: stores config in the db
* slot: stores pipeline execution results
* barrier: prevents a task from running before the tasks it depends on have completed

Implemented with Python generators, so *asynchronous* programming can be achieved.

## Talk: Neural Art -- Become a Great Artist by Deep Learning Algorithm

- Info: https://tw.pycon.org/2016/events/talk/27429730160476163/
- Speaker: Mark Chang
- Slides: http://www.slideshare.net/ckmarkohchang/neural-art-english-version
- Source Code of Neural Art: https://github.com/ckmarkoh/neuralart_tensorflow

#### How human artists paint

Paper: [A Neural Algorithm of Artistic Style](http://arxiv.org/pdf/1508.06576v2.pdf)

1. See the scene and turn it into signals
2. Blend in their own style
3. 
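   Paint it

#### Visual Perception

- The smallest unit: the neuron
- Neurons connect to form the visual pathway

#### Visual Pathway

- Signals received by the retina travel along the visual pathway, which is formed by interconnected neurons, to the visual areas

##### Visual Area V1

This area only perceives lines

##### Visual Area V4

This area recognizes geometric shapes such as squares, triangles, and circles

##### Inferior Temporal Gyrus

This area recognizes more complex images

#### Misconception

Illusions are an important part of why this algorithm works.

- In the example there are two gray dots of the same color; if we give them backgrounds of different colors, their colors look different.
- If there are two parallel lines and we add concentric circles behind them, the lines look distorted.
- If there are two lines of equal length and we add ends at different angles, the lines look unequal in length.

> "misconception"?? <-- does he mean "false perception" or "illusion"??

> not sure about the differences between them

> "misconception" = "誤解" (misunderstanding)

#### Computer Vision - Neural Networks

- X: input signals
- w: weight of each input signal
- n: linear combination of input signals

Sigmoid, Rectified Linear Function (non-linear functions)

- sigmoid: for strongly negative input the output approaches 0, and for strongly positive input it approaches 1 (an input of exactly 0 gives 0.5)
- Rectified Linear: if the input is less than 0 the output is 0; if it is greater than 0 the input is passed through unchanged

> input layer
> hidden layer
> output layer

#### Convolutional Neural Networks

- responsible for visual signals
- duplicates neurons with the same weights at different positions to sense color, shape, ...

- stride
- padding
- pooling: maximum/average pooling

> Input layer
> Convolutional layer
> Pooling
> ???
> Pooling
> Output layer

In the Convolutional layer, lines are recognized; in the Pooling ... 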
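(those three layers), shapes such as squares and circles can be recognized

#### VGG 19 (Convolutional Layer)

[Very deep convolutional networks for large-scale image recognition](https://arxiv.org/pdf/1409.1556.pdf)

- This algorithm won first place at ImageNet 2014 (in the localization task)
- It has 19 + 5 layers
- Pre-trained parameters can be downloaded online, so you do not have to train the model yourself; training is very time-consuming

#### Neural Art

Imitates how a human artist paints:

the artist sees Taipei 101 -> an image forms in the mind (not the real 101, only signals; because of illusions the artist does not know its true appearance, only roughly what it looks like) -> the artist can only try to minimize the difference between the painting and the real image.

VGG 19 is an off-the-shelf pre-trained model that recognizes images with high accuracy. The first layer records location.

##### Content Generation

- Input Photo P and Input Canvas x are each fed into VGG19, and the difference between the two is minimized
- Backpropagation is used to update the canvas's RGB values
- The higher the layer, the more of the image's detail is lost, because the neural network imitates the human visual pathway
- Iteration after iteration, the painted image gets closer and closer to the real photo

##### Style Generation

- "Style" is position-independent --> Gram Matrix
- We feed the styled painting to VGG19 and convert it into a Gram Matrix to produce the Style Image
- At the first layer we only see small fragments of the style; at deeper layers the details of the style show more clearly

##### Artwork Generation

Content vs Style

- When creating the result we do not want the details of the original photo, so we take only the later layers; when extracting the painting style we want to keep the details, so layers can be stacked together
- The Initial State also strongly affects the final result. With no hint at all about the image to generate, the result is very close to the style input image. With some hints (the shadow of the building, etc.), it is closer to the result we want. If a photo of Taipei 101 is fed in as the Initial State, the result is the most accurate image in the style of the painting.

#### Recurrent Neural Network (RNN)

language model for generating poetry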
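
Going back to the Content Generation and Style Generation notes of the Neural Art talk above, here is a minimal NumPy sketch of the two quantities being minimized. It is only an illustration under assumptions, not the speaker's code: the function names are hypothetical, random arrays stand in for VGG19 feature maps, and the scaling constants follow one reading of the linked paper. It shows why the Gram Matrix is "position-independent": shuffling the spatial positions of a feature map changes the content difference but leaves the Gram Matrix unchanged.

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of a feature map shaped (height, width, channels).

    Every spatial position contributes one row; summing the products of
    channel activations over all positions discards where they occurred.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    return flat.T @ flat  # shape: (channels, channels)


def content_loss(canvas_features, photo_features):
    """Half the squared difference between two feature maps (position-sensitive)."""
    return 0.5 * np.sum((canvas_features - photo_features) ** 2)


def style_loss(canvas_features, style_features):
    """Squared difference between Gram matrices (position-independent).

    The normalization by channels and positions is an assumption taken
    from the paper linked in the notes.
    """
    h, w, c = canvas_features.shape
    diff = gram_matrix(canvas_features) - gram_matrix(style_features)
    return np.sum(diff ** 2) / (4.0 * (c ** 2) * ((h * w) ** 2))


# Stand-in for one layer's feature map (NOT real VGG19 output).
rng = np.random.default_rng(0)
style = rng.normal(size=(4, 4, 3))

# Same activations, but moved to different spatial positions.
shuffled = style.reshape(16, 3)[rng.permutation(16)].reshape(4, 4, 3)

print("content loss:", content_loss(shuffled, style))
print("style loss:  ", style_loss(shuffled, style))
```

In the real algorithm these losses would be computed on VGG19 activations and minimized with respect to the canvas pixels by backpropagation; the sketch only evaluates them once.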