PyCon TW 2017 Collaborative Talk Notes
Day 1 - R0

Quick Links

Portal for Collaboration Notes (hosted by beta.hackfoldr and HackMD)

Program Schedule

PyCon TW 2017 Official Site

How to update this note?

Everyone can freely update this note.

Please respect all participants and follow our code of conduct when discussing and taking notes.

10:50-11:20
Talk: 淺嚐LHCb數據分析的滋味 Play around the LHCb Data on Kaggle with SK-Learn and MatPlotLib

Slides: https://www.slideshare.net/yuanchao/lhcb-play-around-the-lhcb-data-on-kaggle-with-sklearn-and-matplotlib
Demo Repo
Speaker: 趙元 (Yuan CHAO)

The four big questions of the Big Bang
- Disappearance of antimatter
  - Parity violation
  - CP violation (charge-parity violation)
Machine Learning is nothing new in HEP
The Kaggle Challenge
- τ → 3μ violates lepton flavour conservation
- sample source: mixed MC
- The Goal
  - ROC curve
- The K-S test
  - Control channel Ds → 2μ + π
  - The Kolmogorov-Smirnov test, requiring KS < 0.09
- The CvM (Cramér-von Mises) test
  - The classifier should not depend too strongly on the τ mass
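
The K-S agreement check above can be sketched in a few lines. This is a minimal, unweighted two-sample Kolmogorov-Smirnov statistic in plain Python; the score lists are made up for illustration, and the challenge's own evaluation code may differ (for example by weighting events).

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs (unweighted two-sample KS)."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical classifier scores on the control channel (made-up numbers).
scores_mc = [0.10, 0.20, 0.35, 0.50, 0.80]
scores_data = [0.12, 0.22, 0.33, 0.55, 0.78]

ks = ks_statistic(scores_mc, scores_data)
passes_agreement = ks < 0.09  # threshold quoted in the notes
```

With so few made-up events the statistic is large and the check fails; on real samples the same function would be applied to the classifier output on simulated and real control-channel events.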
Read training data
Correlation Matrix
- Shows the dependence between the different variables
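
As a rough illustration of what a correlation matrix contains, here is a plain-Python sketch that computes Pearson correlations between feature columns. The column names and values are hypothetical, not taken from the challenge data; in practice a library such as pandas would likely be used instead.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation between columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical feature columns (made-up values).
features = {
    "pt": [1.0, 2.0, 3.0, 4.0],
    "ip": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with pt
    "iso": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with pt
}
corr = correlation_matrix(features)
```

Strongly correlated pairs carry largely redundant information, which is one reason to inspect the matrix before choosing training features.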
Distributions of signal and background
Define training features
Baseline training
Check the training curve
- More variables means slower convergence
- Different algorithms converge differently
KS Test & CvM Test
AUC ROC
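
A minimal sketch of what the area under the ROC curve measures: the probability that a randomly chosen signal event gets a higher score than a randomly chosen background event. The labels and scores below are made up, and the challenge's official metric may differ (for example a weighted AUC).

```python
def roc_auc(labels, scores):
    """AUC computed by comparing every signal score with every background score."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))

# Made-up labels (1 = signal, 0 = background) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
```

A perfect classifier scores 1.0 and a random one about 0.5; the pairwise loop is quadratic, so this form is only suitable for small illustrations.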
Check the predict distribution
- The output should ideally be smooth
Predict on the test set and create the submission file for Kaggle
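
Writing the submission file might look like the sketch below. The column names `id` and `prediction`, the helper name `write_submission`, and the values are assumptions for illustration; check the competition page for the required format.

```python
import csv
import io

def write_submission(ids, predictions, out):
    """Write one row per event: an identifier and the predicted signal probability."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "prediction"])  # assumed column names
    for event_id, p in zip(ids, predictions):
        writer.writerow([event_id, f"{p:.6f}"])

# Made-up event ids and predictions; a real run would write to a file on disk.
buffer = io.StringIO()
write_submission([101, 102, 103], [0.91, 0.05, 0.5], buffer)
submission_text = buffer.getvalue()
```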

Q: Signal vs Background
A: For MC the labels are known; for real data, additional analysis is needed.


11:45-12:30
Talk: TensorFlow Wide & Deep: Data Classification the easy way

Slides: SlideShare Link
Code: GitHub Link
Speaker: Yufeng Guo, Google Cloud

Tensor (a multidimensional array) + Flow (a graph of operations)
TensorFlow supports many platforms

Architecture:
canned estimators
estimator, keras model
layers
Frontends: Python, C++, and more to come
TensorFlow distributed execution

Example:
Motivation: a magical food app

v2.0 memorize all the things
v3.0 deep: more generalized recommendations for all
- No good deed goes unpunished
v4.0 Why not both?

Wide (memorize the exceptions) and Deep (more generalized)
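
To make the wide-plus-deep idea concrete, here is a small plain-Python sketch (not TensorFlow code and not the speaker's code). The wide part is a linear model over sparse cross features that memorizes specific combinations; the deep part is a tiny network over a dense embedding that generalizes. All feature names, weights, and embeddings are hypothetical and hand-picked; a real model would learn them jointly.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def wide_logit(active_crosses, cross_weights):
    """Linear model over sparse cross features: memorizes specific combinations."""
    return sum(cross_weights.get(c, 0.0) for c in active_crosses)

def deep_logit(embedding, hidden_weights, output_weights):
    """One ReLU hidden layer over a dense embedding: generalizes across inputs."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, embedding)))
              for row in hidden_weights]
    return sum(w * h for w, h in zip(output_weights, hidden))

def wide_and_deep(active_crosses, embedding, params):
    """Sum both logits (plus a bias) and squash to a probability."""
    z = (wide_logit(active_crosses, params["cross"])
         + deep_logit(embedding, params["hidden"], params["out"])
         + params["bias"])
    return sigmoid(z)

# Hypothetical parameters: one memorized exception with a strong negative weight.
exception = ("query=fried chicken", "item=chicken fried rice")
params = {
    "cross": {exception: -2.0},
    "hidden": [[1.0, 0.0], [0.0, 1.0]],
    "out": [1.0, 1.0],
    "bias": 0.0,
}
embedding = [0.5, 0.5]  # made-up dense representation of the (query, item) pair

p_generic = wide_and_deep([], embedding, params)
p_exception = wide_and_deep([exception], embedding, params)
```

The deep part alone recommends the item, while the memorized cross feature pulls the score down for the one known exception.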

14:55-15:40
Talk: 模仿遊戲: AI如何與人類互動獲得新技能 (The Imitation Game: how AI acquires new skills by interacting with humans)

Slides: https://goo.gl/OjUVcx
Speaker: Jiawei Chen

Why have machines imitate humans: because of all the odd situations that come up in the real world
https://www.youtube.com/watch?v=hXxaepw0zAw
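- Learns from vision, not from trajectories
Reinforcement learning (RL)
- Goal: learn a policy
- action -> env -> state, reward -> agent -> action
- https://gym.openai.com
  - gym makes it easy to install a variety of games
  - You can design your own AI to play classic games, and also browse the rankings and code of other people's algorithms
- RL needs rewards
  - Example: for a remote-controlled aircraft, what is the reward for getting it to fly? Speaker: have a human fly the aircraft, record the control inputs, and let the computer learn from them.
- Imitation: record the trajectories demonstrated by an expert, {(s0,a0,s1,a1)}
- Why not simply use supervised learning (e.g. a convolutional network)? It does not seem to work well.
  - Reason: every predicted action carries some error, and the errors accumulate along the trajectory
  - Directly imitating humans is not feasible
  - Assume that humans act according to some underlying value judgement
  - The machine has to learn something at a higher level (?)
- Estimation problem
  - cost function c(s,a) -> RL -> learned policy (next action) -> RL policy's trajectories s0, a0, s1, a1 -> inverse reinforcement learning -> cost function
- Inverse RL
  - Similar to ordinary reinforcement learning, but approached from the opposite direction; gets good results at relatively low cost
- Occupancy Measure
  - Gives the distribution over (state, action) pairs
- Main result
  - seeks a policy whose occupancy measure is close to the expert's, as measured by ψ*
  - Recommended: lecture 10 of Prof. Hung-yi Lee's MLDS course. Screenshot 1 is at 33:11; screenshot 2 seems to be at 39:47.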
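
The agent-environment loop, the recorded expert trajectory, and the occupancy measure from the list above can be sketched together in plain Python. The `Corridor` environment and the always-go-right "expert" are hypothetical stand-ins (no gym dependency), and the occupancy here is a simple undiscounted visit frequency rather than the discounted measure used in the paper.

```python
from collections import Counter

class Corridor:
    """Hypothetical toy environment: walk right from cell 0 to reach the goal cell."""
    def __init__(self, length=4):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def rollout(env, policy, max_steps=20):
    """Run action -> env -> (state, reward) -> agent and record (state, action) pairs."""
    state = env.reset()
    trajectory, total_reward = [], 0.0
    for _ in range(max_steps):
        action = policy(state)
        trajectory.append((state, action))
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return trajectory, total_reward

def occupancy(trajectories):
    """Empirical (undiscounted) distribution over visited (state, action) pairs."""
    counts = Counter(pair for traj in trajectories for pair in traj)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def expert_policy(state):
    return +1  # stand-in for a recorded human expert: always move right

expert_traj, expert_return = rollout(Corridor(), expert_policy)
expert_occupancy = occupancy([expert_traj])
```

An imitation learner would then look for a policy whose own occupancy dictionary is close to `expert_occupancy`, without ever being told the reward.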
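Generative Adversarial Imitation Learning (builds on GAN, Generative Adversarial Networks)
- Uses TRPO for the update, so that overly large changes during learning do not derail a good result. Avoid going where it looks risky.
- Uses Adam for the classifier
- The algorithm steps are on the right-hand page
- Before reading the paper, it is recommended to watch Prof. Hung-yi Lee's course first
MuJoCo
- Free for students
OpenAI Imitation github repo
- The code the speaker presented
- In the end the program may even surpass the expert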

16:10-16:55
Talk: Connect "K" of SMACK：pykafka, kafka-python or ?

Slides: https://www.slideshare.net/sucitw/connect-k-of-smackpykafka-kafkapython-or
Speaker: Shuhsi Lin

SMACK is an acronym for:
- Spark
- Mesos
- Akka
- Cassandra
- Kafka
3 paradigms for programming
- request/response
- batch
- streaming
  - Data comes from a stream of events (orders, sales)
Data pipeline
Kafka
- Fast
- Scalable
- Durable
- Distributed
Kafka terminology
Subscribe/Publish
Consumers fetch data by offset (version 0.8)
Why Kafka is fast: data is stored on disk and accessed sequentially (linearly) rather than by random access.
Topics and Partition
- a partition is ordered and immutable
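
A toy model of a partition and offset-based consumption, in plain Python. This is only an in-memory illustration of the idea (an append-only log plus a per-consumer offset), not how Kafka or any Kafka client is implemented; the class and record names are made up.

```python
class Partition:
    """Toy append-only log: records are only ever appended, never modified."""
    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1   # offset of the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]

class Consumer:
    """Each consumer tracks its own offset, so reading does not remove data."""
    def __init__(self, partition):
        self.partition = partition
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.partition.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

log = Partition()
for order in ["order-1", "order-2", "order-3"]:
    log.append(order)

fast, slow = Consumer(log), Consumer(log)
first_batch = fast.poll()
slow_batch = slow.poll(max_records=1)
```

Because the log itself never changes, two consumers can read the same partition at different speeds and each one can resume from its own offset.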
Consumer Group
Delivery Guarantees:
- at most once, at least once, exactly once
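
The difference between the guarantees comes down to when the consumer commits its offset relative to processing a record. The simulation below is a plain-Python sketch with a single injected crash; `consume` and its parameters are hypothetical and do not correspond to any Kafka client API.

```python
def consume(records, commit_first, crash_at=None):
    """Simulate a consumer that crashes once, between its two steps, on `crash_at`."""
    committed = 0      # committed offset; survives the crash
    processed = []     # side effects of processing, in order
    crashed = False
    while committed < len(records):
        offset = committed
        record = records[offset]
        crash_now = record == crash_at and not crashed
        if commit_first:
            committed = offset + 1          # at most once: commit, then process
            if crash_now:
                crashed = True
                continue                    # crash before processing: record is lost
            processed.append(record)
        else:
            processed.append(record)        # at least once: process, then commit
            if crash_now:
                crashed = True
                continue                    # crash before commit: record is redone
            committed = offset + 1
    return processed

records = ["a", "b", "c"]
at_most_once = consume(records, commit_first=True, crash_at="b")
at_least_once = consume(records, commit_first=False, crash_at="b")
```

Committing first can drop a record, committing last can duplicate one; exactly-once needs extra machinery such as idempotent processing, which this sketch does not attempt.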
ZooKeeper
Kafka timeline: 2010 ~ 2016
Versions before and after 0.8 differ a lot
TLS connection (Security)
Before 0.8: no security
Kafka can be regarded as:
- commit log service
- messaging system
- circular buffer
Cons of Kafka:
- Consumer complexity (smart, not poor)
- Lack of tooling/monitoring (3rd party)
- Still a pre-1.0 release
- Operationally, it's more manual than desired
- Requires ZooKeeper
Kafka Use Cases:
LinkedIn uses Kafka
Spotify
2+2 Core APIs
- Producer APIs
- Consumer APIs
- Connect APIs
- Streams APIs
- Legacy APIs
Kafka Clients
Java, C/C++, Python
Reliability, Performance, API Ability
Create your own Kafka Broker
Landoop, Cloudkarafka, Cloudera, HortonWorks, Confluent, Vagrant, Docker
Kafka Shell Scripts
Python Kafka clients are slower than the Java client
Q&A: Kafka provides high-level and low-level APIs; which one should be used?

kafka-python is written in native (pure) Python.

17:20-17:50
Talk: Tensorflow & Python: Fault Detection System

This session is not recorded
Slides: https://www.slideshare.net/EricAhn/tensorflow-and-python-fault-detection-system-pycon-taiwan-2017
Speaker: Eric Byungwook Ahn

Lots of services to monitor

Many views, charts, and alarm systems in the IDC center.

I would like to detect faults with ML

Fault

Many log formats: apache, squid, custom format…

Type of log:

kernel log, system, cron…

Example of system logs

LOG2ML

Log data is also natural language.
The order of words and expressions matters, so logs are sequential data.
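
Treating log lines as token sequences could start with a step like the one below. This is a minimal sketch, not the speaker's preprocessing: the tokenization rule, the `<num>`/`<pad>`/`<unk>` placeholders, and the example log lines are all assumptions for illustration.

```python
import re

def tokenize(line):
    """Lowercase the line and split it into word tokens; numbers become <num>."""
    tokens = re.findall(r"[a-z]+|\d+", line.lower())
    return ["<num>" if t.isdigit() else t for t in tokens]

def build_vocab(lines):
    """Assign an integer id to each token in order of first appearance."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for line in lines:
        for token in tokenize(line):
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(line, vocab, length):
    """Map a line to a fixed-length list of ids, truncating or padding as needed."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokenize(line)][:length]
    return ids + [vocab["<pad>"]] * (length - len(ids))

# Made-up log lines, not taken from the talk.
logs = [
    "kernel: Out of memory: Kill process 1234",
    "cron: job 42 finished",
]
vocab = build_vocab(logs)
encoded = encode("kernel: Kill process 99 now", vocab, length=6)
```

The order of ids is preserved, which is what lets sequence models such as RNNs or text CNNs use the context around each token.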

ML: binary classification, supervised, RNN, topic model…

word2vec, doc2vec, paragraph2vec, RNN, CNN

The speaker chose CNN

As you know, CNN is an architecture commonly used for image classification.

What is a convolution layer?
It computes the output of neurons that are connected to local regions in the input, each computing a dot product between its weights and the small region it is connected to in the input volume.

Original image + 3x3 filter -> convolution layer -> convolved image
(example 3x3 filter: 1 0 1 \n 0 1 0 \n 1 0 1)
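
The sliding dot product can be written out directly. The sketch below uses the 3x3 filter from the notes on a made-up 5x5 binary image, with no padding and stride 1; it is plain Python for illustration, not TensorFlow.

```python
def convolve2d(image, kernel):
    """Valid (no padding, stride 1) sliding dot product, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Made-up 5x5 input image.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
# The 3x3 filter from the notes.
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
convolved = convolve2d(image, kernel)
# convolved == [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

Each output cell looks only at a 3x3 neighbourhood of the input, which is the "local region" mentioned above.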

Can we use CNN on documents?

A text CNN filter captures local information in the text, i.e. the context within the sequential data.

See also http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
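
For text, the same idea becomes a filter that spans a few consecutive tokens and slides along the sequence, followed by max-over-time pooling. The sketch below is plain Python with made-up 2-dimensional embeddings and a hypothetical two-token filter; it is not the speaker's model.

```python
def conv1d_text(embeddings, filt):
    """Slide a filter spanning len(filt) consecutive tokens over the sequence."""
    width = len(filt)
    feature_map = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = sum(
            w * x
            for f_row, e_row in zip(filt, window)
            for w, x in zip(f_row, e_row)
        )
        feature_map.append(max(0.0, activation))   # ReLU
    return feature_map

def max_over_time(feature_map):
    """Keep only the strongest response, wherever in the line it occurred."""
    return max(feature_map)

# Made-up 2-dimensional embeddings for a 4-token log line.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical filter spanning 2 tokens

features = conv1d_text(sentence, bigram_filter)
pooled = max_over_time(features)
```

The filter responds most strongly to the token pair it matches, and pooling makes the resulting feature independent of where in the line that pair appears.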

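Could the speaker please not use this color scheme for the code…

XD
There are probably no slides to refer to, right?
Probably… they haven't been released
QQ
The top-left corner is a real eyesight test
Can anyone make out the references?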
