Quick Links
- Portal for Collaboration Notes (hosted by betw.hackfoldr and HackMD)
- Program Schedule
- PyCon TW 2017 Official Site
How to update this note?
- Everyone can freely update this note.
- Please respect all participants and follow our code of conduct during discussion and note-taking.
Four big questions about the Big Bang
Machine Learning is nothing new in HEP
The Kaggle Challenge
Read training data
Correlation Matrix
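The note does not say how the correlation matrix was computed. A minimal sketch, assuming plain Pearson correlation (what pandas' `DataFrame.corr()` computes by default). The feature names and values are hypothetical toy data, not the challenge data.

```python
from math import sqrt
from typing import Dict, List

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations for {feature_name: values}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy features (the real training data has its own columns).
features = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 4.0, 6.0, 8.0],  # rises with f1 -> correlation close to +1
    "f3": [4.0, 3.0, 2.0, 1.0],  # falls as f1 rises -> correlation close to -1
}
corr = correlation_matrix(features)
print(round(corr["f1"]["f2"], 6), round(corr["f1"]["f3"], 6))
```

Strongly correlated features carry largely redundant information, which is one reason to look at this matrix before defining the training features.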
Distributions of signal & background
Define training features
Baseline training
Check the training curve
KS Test & CvM Test
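The KS (Kolmogorov-Smirnov) and CvM (Cramér-von Mises) tests both compare two distributions through their empirical CDFs. The note does not say exactly how the challenge applied them, so this is only a sketch of the generic two-sample statistics, without p-values (`scipy.stats.ks_2samp` and `scipy.stats.cramervonmises_2samp` provide those). The sample values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence

def ecdf(sorted_sample: List[float], x: float) -> float:
    """Empirical CDF: fraction of the (sorted) sample that is <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic D = max_x |F_a(x) - F_b(x)|.
    The maximum is reached at an observed point, so checking those suffices."""
    sa, sb = sorted(a), sorted(b)
    return max(abs(ecdf(sa, x) - ecdf(sb, x)) for x in sa + sb)

def cvm_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Cramér-von Mises statistic
    T = n*m / (n+m)^2 * sum over the pooled sample of (F_a(z) - F_b(z))^2."""
    sa, sb = sorted(a), sorted(b)
    n, m = len(sa), len(sb)
    squared = sum((ecdf(sa, z) - ecdf(sb, z)) ** 2 for z in sa + sb)
    return n * m / (n + m) ** 2 * squared

signal_like = [1.0, 2.0, 3.0]
background_like = [4.0, 5.0, 6.0]
print(ks_statistic(signal_like, background_like))  # 1.0: the samples do not overlap
print(cvm_statistic(signal_like, background_like))
```

KS only looks at the single largest gap between the two CDFs, while CvM accumulates the squared gaps over the whole pooled sample.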
AUC ROC
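A sketch of how AUC can be computed, assuming binary labels (1 = signal, 0 = background): the area under the ROC curve equals the probability that a randomly chosen signal event gets a higher score than a randomly chosen background event, with ties counting as one half. In practice `sklearn.metrics.roc_auc_score` does this efficiently; the pairwise loop below is O(n_signal × n_background) and only meant to show the idea.

```python
from typing import Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the normalized Mann-Whitney U statistic:
    P(score of a random positive > score of a random negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 means the classifier ranks signal and background no better than chance; 1.0 means every signal event outranks every background event.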
Check the predicted distribution
Predict on the test set and create the submission file for Kaggle
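Q: Signal vs. background: how do we know which is which?
A: In MC (Monte Carlo simulation) the labels are known; real data needs additional analysis.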
TensorFlow: Tensor = a multidimensional array; Flow = a graph of operations
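To make "tensors flowing through a graph of operations" concrete, here is a toy pure-Python illustration. It is not the TensorFlow API: `Node`, `Placeholder` and `run` are hypothetical names, and "tensors" are just nested lists. What it shows is the two phases: first build a graph, then run it with concrete inputs (in TensorFlow 1.x, roughly the role of `tf.placeholder` and `Session.run`).

```python
from typing import Callable, Dict, List

Tensor = List[List[float]]  # a 2-D "tensor" as nested lists (illustration only)

def matmul(a: Tensor, b: Tensor) -> Tensor:
    """Matrix product: each output cell is a row of `a` dotted with a column of `b`."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a: Tensor, b: Tensor) -> Tensor:
    """Element-wise sum of two tensors with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

class Node:
    """A graph node: an operation plus the nodes whose outputs flow into it."""
    def __init__(self, op: Callable[..., Tensor], *inputs: "Node"):
        self.op, self.inputs = op, inputs

class Placeholder(Node):
    """A graph input whose value is supplied only when the graph is run."""
    def __init__(self, name: str):
        super().__init__(lambda: [])
        self.name = name

def run(node: Node, feed: Dict[str, Tensor]) -> Tensor:
    """Evaluate a node by first evaluating its inputs: tensors flow along the edges."""
    if isinstance(node, Placeholder):
        return feed[node.name]
    return node.op(*(run(i, feed) for i in node.inputs))

# Phase 1: build the graph y = x @ W + b (nothing is computed yet).
x, W, b = Placeholder("x"), Placeholder("W"), Placeholder("b")
y = Node(add, Node(matmul, x, W), b)

# Phase 2: run it with concrete values.
print(run(y, {"x": [[1.0, 2.0]], "W": [[3.0], [4.0]], "b": [[0.5]]}))  # [[11.5]]
```

Separating graph construction from execution is what lets a framework place the same graph on different devices and platforms.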
TensorFlow supports many platforms
Architecture:
canned estimators
estimator, keras model
layers
Python and C++ frontends (more coming)
TensorFlow Distributed Execution Engine
Example:
Motivation: a magical food app
Wide (memorizes the exceptions) and Deep (generalizes better)
Why make machines imitate humans? Because of all the strange situations in the real world.
https://www.youtube.com/watch?v=hXxaepw0zAw
Reinforcement learning (RL)
Goal: learn a policy
action -> env -> state, reward -> agent -> action
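The loop above can be sketched in Python. The corridor environment, its rewards and the two hand-written policies are hypothetical; an RL algorithm would learn the policy from the rewards instead of hard-coding it.

```python
import random
from typing import Callable, Tuple

class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1, the goal is the last one."""

    def __init__(self, length: int = 5):
        self.length = length
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        """Apply an action (-1 = left, +1 = right); return (next_state, reward, done)."""
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # small cost per step, bonus at the goal
        return self.state, reward, done

def run_episode(env: CorridorEnv, policy: Callable[[int], int], max_steps: int = 100) -> float:
    """The agent-environment loop; returns the total reward of one episode."""
    state, total = env.reset(), 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent -> action
        state, reward, done = env.step(action)  # env -> state, reward
        total += reward
        if done:
            break
    return total

def always_right(state: int) -> int:
    return +1

def random_policy(state: int) -> int:
    return random.choice([-1, +1])

print(run_episode(CorridorEnv(), always_right))   # about 0.7
print(run_episode(CorridorEnv(), random_policy))  # never higher: wandering costs extra steps
```

The reward signal is the only feedback the agent gets, which is why RL needs a well-designed reward.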
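RL needs rewards
Imitation learning: {(s0,a0,s1,a1)}, recorded trajectories of expert demonstrations
Why not just use supervised learning (e.g. a convolutional network)? It doesn't seem to work well.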
Estimation problem
Inverse reinforcement learning (inverse RL)
Occupancy Measure
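A sketch of the occupancy measure as it is usually defined in the imitation-learning literature, ρ_π(s, a) = Σ_t γ^t P(s_t = s, a_t = a | π), estimated from recorded trajectories by averaging discounted visit counts. The demonstrations and the discount γ below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

StateAction = Tuple[Hashable, Hashable]

def occupancy_measure(trajectories: List[List[StateAction]],
                      gamma: float = 0.99) -> Dict[StateAction, float]:
    """Monte Carlo estimate of rho(s, a) = sum_t gamma^t * P(s_t = s, a_t = a):
    add gamma^t for every visit at time t, then average over trajectories."""
    rho: Dict[StateAction, float] = defaultdict(float)
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            rho[(s, a)] += gamma ** t
    return {sa: total / len(trajectories) for sa, total in rho.items()}

# Two hypothetical expert demonstrations, each a list of (state, action) pairs.
demos = [
    [("s0", "right"), ("s1", "right")],
    [("s0", "right"), ("s1", "left")],
]
rho = occupancy_measure(demos, gamma=0.5)
print(rho)  # {('s0', 'right'): 1.0, ('s1', 'right'): 0.25, ('s1', 'left'): 0.25}
```

The estimate summarizes how often, and how early, the expert takes each action in each state; imitation can then be phrased as making the learner's ρ look like the expert's.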
Main result
Generative Adversarial Imitation Learning (GAIL), built on GANs (Generative Adversarial Networks)
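For reference, the GAIL objective as formulated by Ho and Ermon (2016), assuming that is the formulation the talk followed. A discriminator D(s, a) ∈ (0, 1) tries to tell the policy's state-action pairs from the expert policy π_E's, the policy π tries to fool it, and H(π) is an entropy regularizer weighted by λ:

$$
\min_{\pi} \max_{D} \; \mathbb{E}_{\pi}\big[\log D(s,a)\big] + \mathbb{E}_{\pi_E}\big[\log\big(1 - D(s,a)\big)\big] - \lambda H(\pi)
$$

Up to a shift and scale, the inner maximum over D is a Jensen-Shannon divergence between the occupancy measures ρ_π and ρ_{π_E}, so the policy is pushed to visit state-action pairs the way the expert does.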
SMACK is an acronym for several components (Spark, Mesos, Akka, Cassandra, Kafka)
3 programming paradigms
Data pipeline
Kafka
Kafka terminology
Subscribe/Publish
Consumers fetch data by offset (version 0.8)
Why Kafka is fast: it stores data on disk but reads and writes it sequentially (linear access) instead of randomly.
Topics and Partitions
Consumer Group
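A toy, in-memory model of the ideas above: topics split into partitions, append-only logs read by offset, and a consumer group sharing the partitions. This is only an illustration: `ToyTopic` and `ToyConsumerGroup` are hypothetical classes, not part of Kafka or any client library, and key hashing and partition assignment are simplified (real Kafka producers use their own hash, murmur2 in the Java client).

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

class ToyTopic:
    """Toy in-memory topic: each partition is an append-only list, and a
    message's offset is simply its index in that list."""

    def __init__(self, num_partitions: int = 3):
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def publish(self, key: str, value: str) -> Tuple[int, int]:
        # Hash the key so the same key always goes to the same partition,
        # which keeps per-key ordering. crc32 is an arbitrary stand-in here.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def fetch(self, partition: int, offset: int, max_records: int = 10) -> List[Tuple[str, str]]:
        # Sequential read starting at an offset; no lookup of individual messages.
        return self.partitions[partition][offset:offset + max_records]

class ToyConsumerGroup:
    """Members of one group split the partitions between them; the group keeps
    one committed offset per partition."""

    def __init__(self, topic: ToyTopic, members: List[str]):
        self.topic = topic
        self.committed: Dict[int, int] = defaultdict(int)
        # Simple round-robin assignment: every partition has exactly one owner.
        self.assignment = {
            m: [p for p in range(len(topic.partitions)) if p % len(members) == i]
            for i, m in enumerate(members)
        }

    def poll(self, member: str) -> List[Tuple[str, str]]:
        records: List[Tuple[str, str]] = []
        for p in self.assignment[member]:
            batch = self.topic.fetch(p, self.committed[p])
            self.committed[p] += len(batch)  # advance the group's offset past what was read
            records.extend(batch)
        return records

topic = ToyTopic(num_partitions=3)
for i in range(6):
    topic.publish(f"user{i % 2}", f"event{i}")

group = ToyConsumerGroup(topic, ["c1", "c2"])
seen = group.poll("c1") + group.poll("c2")
print(len(seen))  # 6: each message goes to exactly one member of the group
```

Because the group remembers a committed offset per partition, a second `poll` returns nothing until new messages arrive, and messages with the same key stay in order because they land in the same partition.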
Delivery Guarantees:
ZooKeeper
Kafka timeline: 2010 ~ 2016
Versions before and after 0.8 differ a lot
TLS connection (Security)
Before 0.8: no security
Kafka is considered as:
Cons of Kafka:
Kafka Use Cases:
LinkedIn uses Kafka
Spotify
2+2 Core APIs
Kafka Clients
Java, C/C++, Python
Reliability, Performance, API Ability
Create your own Kafka Broker
Landoop, Cloudkarafka, Cloudera, HortonWorks, Confluent, Vagrant, Docker
Kafka Shell Scripts
Python Kafka clients are slower than the Java client
Q&A: Kafka provides a high-level and a low-level API; which one should we use?
kafka-python is implemented in pure Python.
Lots of services to monitor
Many views, charts, and an alarm system in the IDC (Internet Data Center)
I would like to detect faults with ML
Fault
Many log formats: Apache, Squid, custom formats…
Types of logs:
Example of system logs
Log data is also natural language.
The order of words and expressions matters: logs are sequential data.
ML options: binary classification, supervised learning, RNN, topic models…
word2vec, doc2vec, paragraph2vec, RNN, CNN
The speaker chose CNN
As you know, CNNs are an architecture commonly used for image classification.
What is a convolutional layer?
It computes the output of neurons that are connected to local regions in the input, each neuron computing a dot product between its weights and the small region of the input volume it is connected to.
Original image + 3x3 filter -> convolutional layer -> convolved image
Example 3x3 filter:
1 0 1
0 1 0
1 0 1
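The convolution step above can be sketched in plain Python using the example 3x3 filter. The 5x5 binary input image is a hypothetical example. As in most CNN libraries, the kernel is not flipped (strictly speaking this is cross-correlation) and no padding is used, so a 5x5 input gives a 3x3 output.

```python
from typing import List

Matrix = List[List[float]]

def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D convolution as used in CNN layers: each output value is the
    dot product between the kernel and the image patch currently under it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]
image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
print(conv2d(image, kernel))  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

In a real convolutional layer the filter weights are learned, and many filters run in parallel to produce several feature maps.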
Can we use CNN on documents?
With a text CNN, each filter captures local information in the text, i.e. the context of neighboring words in the sequential data.
See also: http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
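A minimal sketch of the text-CNN idea, assuming the common design from the linked post: word vectors are stacked into a sequence, each filter slides over windows of consecutive words, and max-over-time pooling keeps the strongest activation per filter. The 2-dimensional embeddings and the single filter are hypothetical hand-picked values; in a real model both are learned, and the pooled features feed a classifier (e.g. fault vs. normal).

```python
from typing import Dict, List

Vector = List[float]

def text_cnn_features(tokens: List[str], embeddings: Dict[str, Vector],
                      filters: List[List[Vector]]) -> List[float]:
    """One feature per filter: 1-D convolution over windows of consecutive
    word vectors, ReLU, then max-over-time pooling."""
    seq = [embeddings[t] for t in tokens]
    features = []
    for f in filters:  # f: window_size rows, each an embedding-sized weight vector
        width = len(f)
        activations = []
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            dot = sum(w * x for fw, xv in zip(f, window) for w, x in zip(fw, xv))
            activations.append(max(0.0, dot))  # ReLU
        # Max-over-time pooling; a sequence shorter than the filter yields 0.0.
        features.append(max(activations) if activations else 0.0)
    return features

# Hypothetical embeddings and a width-2 filter that responds to "connection refused".
embeddings = {"connection": [1.0, 0.0], "refused": [0.0, 1.0],
              "ok": [0.0, 0.0], "user": [0.5, 0.5]}
filters = [[[1.0, 0.0], [0.0, 1.0]]]

print(text_cnn_features(["user", "connection", "refused"], embeddings, filters))  # [2.0]
print(text_cnn_features(["connection", "ok"], embeddings, filters))               # [1.0]
```

Because pooling keeps only the maximum, the feature says whether the local pattern appears somewhere in the log line, regardless of its position.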
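Could the speaker please not use this color scheme for the code…
XD
There are probably no slides to refer to, right?
Probably… they haven't been posted.
qq
The top-left corner really tests your eyesight.
Can anyone see the reference?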