PyCon TW 2017 Collaborative Talk Notes Day 1 - R0

> ### Quick Links
> - [Portal for Collaboration Notes](http://beta.hackfoldr.org/pycontw2017/) (hosted by [beta.hackfoldr](http://beta.hackfoldr.org/) and [HackMD](https://hackmd.io/))
> - [Program Schedule](https://tw.pycon.org/2017/events/talks/)
> - [PyCon TW 2017 Official Site](https://tw.pycon.org/2017/)
>
> ### How to update this note?
> - Everyone can *freely* update this note.
> - Please respect all the participants and follow our [code of conduct](https://tw.pycon.org/2017/about/code-of-conduct/) during discussion and note-taking.

### 10:50-11:20 Talk: [A Taste of LHCb Data Analysis: Play around the LHCb Data on Kaggle with SK-Learn and MatPlotLib](https://tw.pycon.org/2017/events/talk/345377900414894195/)
- Slides: https://www.slideshare.net/yuanchao/lhcb-play-around-the-lhcb-data-on-kaggle-with-sklearn-and-matplotlib
- [Demo Repo](https://github.com/yuanchao/flavours-of-physics-start)
- Speaker: 趙元 (Yuan CHAO)

+ Four big questions of the Big Bang
    - Disappearance of antimatter
    - Parity violation
    - CP (charge-parity) violation
+ Machine learning is nothing new in HEP
+ The Kaggle Challenge
    + τ → 3μ breaks lepton flavour conservation
    + Sample source: mixed MC
+ The Goal
    + ROC curve
    + The K-S test
        + Control channel: Ds → μμπ
        + The Kolmogorov-Smirnov test, requiring KS < 0.09
    + The CvM test
        + The classifier should not depend too much on the τ mass
+ Read training data
+ Correlation matrix
    + Dependence between different variables
+ Signal & background distributions
+ Define training features
+ Baseline training
+ Check the training curve
    + More variables mean slower convergence
    + Different algorithms converge differently
+ KS test & CvM test
+ AUC ROC
+ Check the prediction distribution
    + We want the output to be smooth
+ Prediction test: create the submission file for Kaggle

Q: Signal vs. background?
A: Known for MC; real data needs additional analysis.

- [Kaggle - Flavours of Physics: Finding τ → μμμ](https://www.kaggle.com/c/flavours-of-physics/data)
- [Kaggle - Higgs Boson ML Challenge](https://www.kaggle.com/c/higgs-boson)
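
The two evaluation checks listed above can be sketched without any libraries. This is a minimal illustration, not the speaker's demo code: `roc_auc` computes the area under the ROC curve as the probability that a randomly chosen signal event scores higher than a randomly chosen background event, and `ks_statistic` is the unweighted two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs). Both function names and all numbers are hypothetical toy inputs; the challenge's official metrics differ in details such as weighting, and in practice one would reach for `sklearn.metrics.roc_auc_score` and `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    signal event (label 1) outscores a random background event (label 0).
    Ties count as half a win. O(n_signal * n_background): fine for a demo."""
    signal = [s for y, s in zip(labels, scores) if y == 1]
    background = [s for y, s in zip(labels, scores) if y == 0]
    if not signal or not background:
        raise ValueError("need at least one signal and one background event")
    wins = 0.0
    for s in signal:
        for b in background:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal) * len(background))


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Unweighted two-sample Kolmogorov-Smirnov statistic: the largest
    vertical gap between the empirical CDFs of samples a and b."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:  # the ECDF difference only changes at sample points
        fa = bisect_right(sa, x) / len(sa)  # fraction of a that is <= x
        fb = bisect_right(sb, x) / len(sb)  # fraction of b that is <= x
        gap = max(gap, abs(fa - fb))
    return gap


if __name__ == "__main__":
    # Toy numbers only, not LHCb data.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
    ks = ks_statistic([0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.9])
    print(ks, "passes" if ks < 0.09 else "fails")  # 0.25 fails
```

With these toy classifier outputs the KS statistic is 0.25, which would fail the `KS < 0.09` requirement mentioned in the talk.
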

### 11:45-12:30 Talk: [TensorFlow Wide & Deep: Data Classification the easy way](https://tw.pycon.org/2017/events/talk/343120065593344111/)
- Slides: [SlideShare Link](https://www.slideshare.net/YufengGuo4/pycon-tw-tensorflow-wide-deep-data-classification-the-easy-way)
- Code: [GitHub Link](https://bit.ly/widendeep-census)
- Speaker: [Yufeng Guo](http://yufengg.com), Google Cloud

+ Tensor (a multidimensional array) + Flow (a graph of operations) = TensorFlow
+ TF supports many platforms
+ Architecture, from the top of the stack down:
    - Canned Estimators
    - Estimator, Keras Model
    - Layers
    - Python or C++ frontend (and more coming)
    - TF distributed execution engine
+ Sample motivation: a magical food app
    - v2.0: memorize all the things
    - v3.0: go deep for more generalized recommendations for all
        - No good deed goes unpunished
    - v4.0: why not both? Wide (memorize the exceptions) and Deep (more generalized)

### 14:55-15:40 Talk: [The Imitation Game: How AI Acquires New Skills by Interacting with Humans](https://tw.pycon.org/2017/events/talk/347535452179267705/)
- Slides: https://goo.gl/OjUVcx
- Speaker: Jiawei Chen

+ Why make machines imitate humans? Because the real world is full of odd situations.
    + https://www.youtube.com/watch?v=hXxaepw0zAw
    + Learns from vision, not from trajectories
+ Reinforcement learning (RL)
    + Goal: learn a policy
    + action -> env -> state, reward -> agent -> action
+ https://gym.openai.com
    + Gym makes it easy to install all kinds of games
    + You can design your own AI for classic games and browse the leaderboard and the code of other people's algorithms
+ RL needs rewards
    + Example: for a remote-controlled aircraft, what is the reward for making it fly? Speaker: have a human pilot it, record the control process, and let the computer learn from the recording.
+ Imitation: {(s0, a0, s1, a1, ...)}, trajectories recorded from expert demonstrations
+ Why not just use supervised learning (e.g., a convolutional network)? It does not seem to work well.
    + Reason: every predicted action carries some error
    + Directly copying humans is not feasible
+ Assume humans act on value judgments
    + The machine has to learn something at a higher level (?)
    + Estimation problem
+ cost function c(s,a) -> RL -> learned policy (next action) -> RL policy's trajectories s0, a0, s1, a1 -> inverse reinforcement learning -> cost function
+ Inverse RL
    + Similar to ordinary RL but approached from the opposite direction; gets good results at relatively low cost
+ Occupancy measure
    + Gives the distribution over (state, action) pairs
+ Main result
    + Seeks a policy whose occupancy measure is close to the expert's, as measured by ψ\*
+ Recommended: Prof. Hung-yi Lee's (李宏毅) [MLDS Lecture 10](https://www.youtube.com/watch?v=KSN4QYgAtao&lc=z13kz1nqvuqsipqfn23phthasre4evrdo); screenshot 1 is at 33:11, screenshot 2 is probably at 39:47
+ Generative Adversarial Imitation Learning (GAIL, built on GANs, Generative Adversarial Networks)
    + The policy is updated with TRPO, which keeps each learning step from changing too much and derailing a good result, i.e., it avoids regions that look risky
    + Adam is used to train the classifier
    + [Algorithm steps](https://www.yumpu.com/en/document/view/56442111/significant/6) (right-hand page)
    + [Paper](https://arxiv.org/pdf/1606.03476.pdf); watching Prof. Lee's lecture first is recommended
+ [MuJoCo](http://www.mujoco.org/)
    + Free for students
+ [OpenAI Imitation GitHub repo](https://github.com/openai/imitation)
    + [Code presented by the speaker](https://github.com/openai/imitation/blob/master/policyopt/imitation.py)
    + The learned policy may even end up outperforming the expert

### 16:10-16:55 Talk: [Connect "K" of SMACK：pykafka, kafka-python or ?](https://tw.pycon.org/2017/events/talk/323492357779488854/)
- Slides: https://www.slideshare.net/sucitw/connect-k-of-smackpykafka-kafkapython-or
- Speaker: Shuhsi Lin

+ SMACK is an acronym for:
    - Spark
    - Mesos
    - Akka
    - Cassandra
    - Kafka
+ 3 programming paradigms
    + request/response
    + batch
    + streaming
+ Data comes from the rise of events (orders, sales)
+ Data pipeline
+ Kafka is
    + Fast
    + Scalable
    + Durable
    + Distributed
+ Kafka terminology
    + Publish/Subscribe
    + Consumers fetch data by offset (version 0.8)
    + Why Kafka is fast: it stores data on disk and accesses it sequentially rather than randomly, so speed is linear
    + Topics and Partitions
        + A partition is ordered and immutable
    + Consumer Group
    + Delivery guarantees:
        + at most once, exactly once
    + ZooKeeper
+ Kafka timeline: 2010 ~ 2016
    + Versions before and after 0.8 differ a lot
    + TLS connections (security)
        + before 0.8: no security
+ Kafka can be viewed as:
    + a commit log service
    + a messaging system
    + a circular buffer
+ Cons of Kafka:
    + Consumer complexity (the consumer must be smart)
    + Lack of tooling/monitoring (relies on 3rd-party tools)
    + Still a pre-1.0 release
    + Operationally, it's more manual than desired
    + Requires ZooKeeper
+ Kafka use cases:
    + LinkedIn
    + Spotify
+ 2+2 core APIs
    + Producer APIs
    + Consumer APIs
    + Connect APIs
    + Streams APIs
+ Legacy APIs
+ Kafka clients
    + Java, C/C++, Python
    + Compared on reliability, performance, and API ability
+ Create your own Kafka broker
    + Landoop, CloudKarafka, Cloudera, Hortonworks, Confluent, Vagrant, Docker
+ Kafka shell scripts
+ Python Kafka clients are slower than the Java client
+ Q&A: Kafka provides high-level and low-level APIs; which one should we use?
    + kafka-python is written in pure Python.

### 17:20-17:50 Talk: [Tensorflow & Python: Fault Detection System](https://tw.pycon.org/2017/events/talk/344208185701171314/)
- This session is not recorded
- Slides: https://www.slideshare.net/EricAhn/tensorflow-and-python-fault-detection-system-pycon-taiwan-2017
- Speaker: Eric Byungwook Ahn

Lots of services to monitor: many views, charts, and alarm systems in the IDC center. I would like to detect faults with ML.

Many log formats: apache, squid, custom formats...

Types of logs:
+ kernel log, system, cron...

Examples of system logs

#### LOG2ML
> Log data is also natural language.
> The sequence of words and expressions is important: it is sequential data.

ML options: binary classification, supervised learning, RNN, topic models...

word2vec, doc2vec, paragraph2vec, RNN, CNN

The speaker chose CNN.
> As you know, CNN is an architecture for image classification.

What a convolutional layer computes: the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and the small region they are connected to in the input volume.

Original image + 3x3 filter -> convolution layer -> convolved image (example 3x3 filter: $\begin{bmatrix}1&0&1\\0&1&0\\1&0&1\end{bmatrix}$)

Can we use CNN on documents? With a text CNN, the filters preserve local information of the text, i.e., context information in the sequential data.

Reference: http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/

:::warning
Could the speaker's code please not use this color scheme...
:::
> XD
> There are probably no slides to refer to, right?
> Probably... they haven't been posted
> QQ
> The top-left corner really tests your eyesight
> Can anyone see the reference?
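
The convolution described in this talk, and the text-CNN variation on it, can be sketched in plain Python. This is a minimal illustration assuming stride 1 and no padding; it is not the speaker's TensorFlow code. `conv2d_valid` slides a kernel over a matrix and takes a dot product at each position (like CNN layers, it does not flip the kernel). `text_cnn_feature` is a hypothetical helper that treats a sentence as a matrix of word vectors, uses a filter as wide as the embedding dimension so it slides only along the word axis, and then applies ReLU and max-over-time pooling, roughly in the spirit of the WildML tutorial referenced above. The 5x5 image and the 2-dimensional word vectors are made-up toy inputs.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> List[List[float]]:
    """'Valid' (no padding, stride 1) 2-D convolution as used in CNN layers.
    Like most deep-learning libraries it does not flip the kernel, so strictly
    it is a cross-correlation. Each output cell is the dot product of the
    kernel with the local region of the image it currently covers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def text_cnn_feature(embeddings: Matrix, kernel: Matrix) -> float:
    """Hypothetical helper: one text-CNN feature. Rows of `embeddings` are word
    vectors; the filter is as wide as a word vector, so it slides only along
    the word axis. The feature map then goes through ReLU and max pooling."""
    if len(kernel[0]) != len(embeddings[0]):
        raise ValueError("filter width must equal the embedding dimension")
    feature_map = [row[0] for row in conv2d_valid(embeddings, kernel)]
    return max(max(v, 0.0) for v in feature_map)


if __name__ == "__main__":
    image = [
        [1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 1, 1, 1],
        [0, 0, 1, 1, 0],
        [0, 1, 1, 0, 0],
    ]
    kernel = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]  # the 3x3 filter from the notes
    print(conv2d_valid(image, kernel))  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
    # Four made-up 2-d "word vectors" and a filter spanning 2 words.
    words = [[1, 0], [0, 1], [1, 1], [0, 0]]
    print(text_cnn_feature(words, [[1, 0], [0, 1]]))  # 2
```

With the 3x3 filter from the notes, the toy image yields the 3x3 feature map `[[4, 3, 4], [2, 4, 3], [2, 3, 4]]`; for text, each filter produces one number per sentence after pooling, and a real model would stack many such filters.
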