AI Day Research : Prototype : Learning
===
###### tags: `learning`
## "Swift for TensorFlow" & " Introduction to tf.keras and TensorFlow 2.0" - Paige Bailey - Google Brain
Paige Bailey is the product manager for TensorFlow core as well as Swift for TensorFlow. Prior to her role as a PM in Google's Research and Machine Intelligence org, Paige was developer advocate for TensorFlow core; a senior software engineer and machine learning engineer in the office of the Microsoft Azure CTO; and a data scientist at Chevron. Her academic research was focused on lunar ultraviolet, at the Laboratory for Atmospheric and Space Physics (LASP) in Boulder, CO, as well as Southwest Research Institute (SwRI) in San Antonio, TX.
## "TensorFlow Lite: On-Device ML and the Model Optimization Toolkit" - Jason Zaman - Light
Machine Learning at the edge is important for everything from user privacy to battery consumption. This talk will give an overview of the different strategies to optimize models for on-device inference, such as pruning and integer quantization with the Model Optimization Toolkit. Then there will be a demo combining all of these techniques to run a model on an EdgeTPU.
Jason is the community lead for TensorFlow SIG-Build and an ML-GDE. He works as a machine learning engineer at Light doing computational photography in mobile cameras. Along with speaking regularly, he is also active in Open Source as a Gentoo Linux developer and maintainer of the SELinux Project.
## "Which image should we show? Neural Linear Bandit for Image Selection" - Sirinart Tangruamsub - Agoda
Sirinart is a data scientist at Agoda. Before joining Agoda, she was a postdoctoral researcher at the University of Goettingen. She has extensive experience in the fields of computer vision and natural language processing at various startups and corporates. Her current areas of interests include personalization and recommendation systems.
## "XLNet - The Latest in language models" - Martin Andrews - Red Dragon AI
Martin is a Google Developer Expert in Machine Learning based in Singapore - and was doing Neural Networks before the last AI winter... He's an active contributor in the Singapore data science community and is the co-host of the Singapore TensorFlow and Deep Learning MeetUp (now with 3700+ members in Singapore).
## "Deep Learning on Graphs for Conversational AI" Sam Witteveen - Red Dragon AI
Sam is a Google Developer Expert for Machine Learning and is a co-founder of Red Dragon AI, a deep tech company based in Singapore. He has extensive experience in startups and mobile applications and is helping developers and companies create smarter applications with machine learning. Sam is especially passionate about Deep Learning and AI in the fields of Natural Language and Conversational Agents and regularly shares his knowledge at events and trainings across the world, as well as being the co-organiser of the Singapore TensorFlow and Deep Learning group.
## "TensorFlow Extended (TFX): Real World Machine Learning in Production" - Robert Crowe - Google Brain
A data scientist and TensorFlow addict, Robert has a passion for helping developers quickly learn what they need to be productive. He's used TensorFlow since the very early days and is excited about how it's evolving quickly to become even better than it already is. Before moving to data science Robert led software engineering teams for both large and small companies, always focusing on clean, elegant solutions to well-defined needs. You can find him on Twitter at @robert_crowe
-------
## Paige Bailey: tf.keras
webpaige@google.com
Twitter: @DynamicWebPaige
**internal google training material for google engineers**
AlphaFold paper
## The Internals of tf.keras
### Architecture
* Engine
* base layer
* base network - a DAG of layers
* model (network + training/eval loop)
* Sequential
### Layers and Models
[Layers docs](https://keras.io/layers/about-keras-layers/)
Everything is a layer: models are just layers
Computation from inputs -> outputs
batchwise computation
can't mix eager exec and static
manages state - training and inference modes
supports "type checking": automatic compatibility checks
frozen or unfrozen
serialised and deserialised - mixed-precision layers coming soon
DOES NOT DO:
* device placement
* datasets
* non-batch computation
* input-less or output-less processing
Gave an example of a canonical lazy layer - most layers you build look like that: you don't hardcode the layer's shapes up front, weight creation is deferred to `build()` (see the sketch below).
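A minimal sketch of such a lazy layer, in the style of the tf.keras guides (the `Linear` name and shapes are illustrative):
```python
import tensorflow as tf

class Linear(tf.keras.layers.Layer):
    """A canonical lazy layer: weights are created in build(), not __init__."""
    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # Called lazily on first use, once the input shape is known.
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="random_normal", trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
```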
`GradientTape()` records operations for something like a custom training loop, paired with an optimizer from `tf.keras.optimizers` (sketch below).
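A hedged sketch of one custom training step with `GradientTape`; the toy model and batch here are invented stand-ins:
```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])    # toy model
x_batch = tf.random.normal((8, 4))                          # toy batch
y_batch = tf.random.uniform((8,), maxval=10, dtype=tf.int32)

optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

with tf.GradientTape() as tape:             # records ops for autodiff
    logits = model(x_batch, training=True)
    loss = loss_fn(y_batch, logits)
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```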
### Functional Models
[Models docs](https://keras.io/models/model/)
similar to how layers work too!
Can nest also
Model is a layer: but also provides access to training, saving and summary/model visualization
Layer: "layer" or "blocks" in the literature
Model: "model" or "network"
model compile: substantial perf hit for eager ops because some parts of it are really not meant for eager exec
compile: build spec: fit: go through data
functional API -> create DAGs of layers
build directed acyclic graph -> show linkage of model
functional API is connectivity of layers : declarative models: no logic : all logic is contained inside the layers
all debug is done at compile time - you merely define the thing
static input compat check
model saving
model plotting
auto masking
good for model debug
Can check the entire model history
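A small functional-API example (shapes illustrative): the model is just a declared DAG of layers with no logic of its own, then `compile` builds the spec and `summary`/`plot_model` give the inspection and plotting mentioned above.
```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(784,))
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
x = tf.keras.layers.Dense(64, activation="relu")(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)     # a DAG of layers, no logic
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()                             # model inspection
# tf.keras.utils.plot_model(model)          # model plotting (needs pydot)
```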
### training and inference
When you call `fit()`, it runs through an entire list of functions.
### losses and custom metrics
custom metric: implement `__init__` and `update_state`
add_metric
endpoint pattern
(can also write your own loops)
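A hedged sketch of the `__init__` / `update_state` / `result` pattern; `PositiveRate` is a made-up example metric:
```python
import tensorflow as tf

class PositiveRate(tf.keras.metrics.Metric):
    def __init__(self, name="positive_rate", **kwargs):
        super().__init__(name=name, **kwargs)
        self.positives = self.add_weight(name="pos", initializer="zeros")
        self.total = self.add_weight(name="total", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Accumulate state across batches.
        self.positives.assign_add(
            tf.reduce_sum(tf.cast(y_pred > 0.5, tf.float32)))
        self.total.assign_add(tf.cast(tf.size(y_pred), tf.float32))

    def result(self):
        return self.positives / self.total
```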
## "Deep Learning on Graphs for Conversational AI" Sam Witteveen - Red Dragon AI
DL: great for perception tasks, getting better at generative tasks:
but how do we get _reasoning_
GPT2: seems to have dumped all the knowledge into weights: but knowledge as weights is inefficient
Maybe use Graphs? like the nodal representation: undirected, digraph, weighted
Knowledge graphs (the 'knowledge panel'): Freebase, Wikidata, Cyc, DBpedia, WordNet, Prospera, NELL, GeoNames, GDELT
Concept seems to be to train a model that does our google searches for us
Symbolic AI - hand-adding knowledge leads to conflicts
Node - edge - node triples: (object, property, value) or (subject, predicate, object) - RDF data
Information Retrieval
- Right knowledge at right time?
- Custom Graphs?
- What happens if I have missing information?
DL for knowledge graphs: Getting knowledge out is hard - extraction ++
prediction on graphs: node classification vs edge classification
what's the right graph classification?
Node regression mode?
Facebook Ego Network: node value and regression
```graphviz
digraph hierarchy {
nodesep=1.0 // increases the separation between nodes
node [color=Red,fontname=Courier,shape=box] // All nodes will have this shape and colour
edge [color=Blue, style=dashed] //All the lines look like this
BarackObama->{Michelle}
Michelle -> {Malia Sasha}
}
```
So you can look up the Barack Obama node and check out the other properties
### Why are graphs hard?
meaning and features are in the relationships, not the nodes
no nice fixed position for each node
edges can be directed
non-euclidean space
Inductive bias of DL: Euclidean space and fixed-size sequences - hence the lack of breakthroughs for representing this space
How do we represent this input? node+edge embeddings? adjacency matrices - order n^2
deep walk - [node2vec](https://cs.stanford.edu/~jure/pubs/node2vec-kdd16.pdf) - random walks along the graph for N steps
treat each walk as a sentence - use skip-grams with the random walks as sentences - we don't know if this is the best representation (toy sketch below)
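A toy sketch of the random-walk idea (the graph and parameters here are invented); each walk becomes a "sentence" for a skip-gram model:
```python
import random

graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}

def random_walk(graph, start, length):
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))  # hop to a neighbour
    return walk

# 5 walks of 10 steps from every node; feed these to a skip-gram model
# exactly as you would feed sentences of words.
walks = [random_walk(graph, node, 10) for node in graph for _ in range(5)]
```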
[Graph-CNN](https://github.com/tkipf/keras-gcn): graph convolution: subgraphing of n nodes reachable from a given node
- assumes nodes connected implies likelihood of similarity
- loss on known nodes only
- treat all nodes as undirected
kegra: Kipf's keras-gcn (one graph-convolution step is sketched below)
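A bare-bones numpy sketch of one Kipf & Welling graph-convolution step, H' = ReLU(D̂^(-1/2) Â D̂^(-1/2) H W) with Â = A + I (sizes illustrative):
```python
import numpy as np

def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt     # symmetric normalisation
    return np.maximum(A_norm @ H @ W, 0)         # aggregate + ReLU

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # adjacency
H = np.eye(3)                                    # one-hot node features
W = np.random.randn(3, 2)                        # learned weights
print(gcn_layer(A, H, W))
```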
"Relational inductive biases, deep learning, and graph networks" (Battaglia et al.)
edge-node-global updating - looks at edges - gets embedding values - then nodes - and then a global update
Transfer learning with graphs? graph-in-graph-out? predict a new graph
## TF eXtended: Robert Crowe
@robert_crowe
Real world ML in production
Configuration, Data Collection, Serving Infrastructure, Process Management Tools, Analysis Tools, ...
TF-extended exists to do this
[Ranking Tweets with TF](https://medium.com/tensorflow/ranking-tweets-with-tensorflow-932d449b7c4)
### Production Pipelining: TFX
Production ML:
- labelled data
- feature space coverage
- minimal dimensionality
- maximum predictive data
- fairness
- rare conditions
- data lifecycle management
Classic problems don't go away: software engineering problems still sit around!
["Hidden Technical Debt in ML Systems" ](https://bit.ly/ml-techdebt)
Data Ingestion - Validation - Feature Engineering - Train Model - Validate Model - Push if good - Serve Model
[Apache Beam](https://beam.apache.org/documentation/)
Component anatomy: the Driver handles job execution, the Executor does the work, and the Publisher updates ML Metadata (e.g. in a model validator)
each component is driven by a configuration file
pulls and writes back to the metadata store - based on dependency
task-aware pipeline: transform-trainer (classic)
TFX is also data-aware: training data flows input data -> Transform -> transformed data -> Trainer -> trained models
metadata store: contains artifacts and their properties + execution history of runs
Metadata-powered functionality - remember what was previously run (and what data they were run on) + carry-over states from previous model runs - caching of previously computed outputs
Beam: unified batch and stream distributed processing API - SDKs in multiple languages + sets of runners
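A minimal Beam pipeline to make that concrete (the data is invented); the same code runs on the local DirectRunner or a distributed runner:
```python
import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | "Create" >> beam.Create(["tfx", "beam", "tfx"])
     | "Count" >> beam.combiners.Count.PerElement()  # count occurrences
     | "Print" >> beam.Map(print))
```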
Your application lives for years - when you want to compare runs, you need the metadata to visualise what's happened
The Evaluator lets you check individual slices of the dataset: if one group of users isn't being served well, they're having a bad experience
Model objective is nearly always a proxy for your business objectives
World doesn't stand still
Data is never what you wished you had
The ML triangle: business realities - bad data - model needs improvement
(Demographics? Insights? Processes?)
What-If Tool - run inference on your model
TFX and Kubeflow pipelines: Kubeflow team takes TFX code and applies it to a Kubernetes environment
## Sirinart: How do we pick photos for Agoda?
The multi-armed bandit: A/B testing, but... extreme
We know that each path has some expected reward - so we try everything!
Exploration-Exploitation
Thompson Sampling - update the posterior distribution, using a neural-linear approach to approximate the posterior
Bayesian linear regression on the representation from the last layer of a neural network
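A toy Thompson-sampling loop for a Bernoulli bandit ("which image gets the click?"); the neural-linear variant replaces these Beta posteriors with Bayesian linear regression on the last-layer features. The click-through rates here are invented:
```python
import numpy as np

true_ctr = [0.04, 0.06, 0.05]              # hidden click-through rates
wins, losses = np.ones(3), np.ones(3)      # Beta(1, 1) priors per image

for _ in range(10000):
    samples = np.random.beta(wins, losses)      # sample each posterior
    arm = int(np.argmax(samples))               # show the best sample
    reward = np.random.rand() < true_ctr[arm]   # simulate a click
    wins[arm] += reward
    losses[arm] += 1 - reward

print(wins / (wins + losses))              # posterior mean per image
```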
## XLNET: Martin Andrews, Red Dragon AI
### Transformer Architectures
Feed in tokens at the bottom - pass through layers - get result at the top
Masked multi-self attention
Sequential Attention: turn the input into memory - "attend" to each portion - at each step the query is dot-producted against the keys to produce a score - a measure of how well the query matches - fed into a softmax to create an attention distribution
q,k,v: queries (what) keys (why choose) values (what you get when you choose)
Transforming everything: the model modifies all of the input so that it's "more useful" for the layers at the end -> take the input, generate q/k/v, score queries against keys by dot product, softmax, then sum the weighted values (this happens for each column).
Learning the meaning of a word through its context in the sentence.
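A numpy sketch of (scaled) dot-product attention as described above (shapes illustrative: 4 tokens, dimension 8):
```python
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # query-key match scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax per query
    return weights @ V                           # weighted sum of values

Q, K, V = (np.random.randn(4, 8) for _ in range(3))
print(attention(Q, K, V).shape)                  # (4, 8)
```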
Token and position embeddings: take the words of the English language and compress them (zip-style) into fragments - group the fragments into a vocabulary - a great way to handle an effectively infinite vocabulary
Positional embeddings - each position in the input stream gets a kind of "sine-wave position" phase -> positional differences can be compared via this phase difference (sketch below)
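A sketch of the sinusoidal encoding: each position gets sines and cosines at different frequencies, so relative offsets appear as phase differences:
```python
import numpy as np

def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]            # positions 0..max_len-1
    i = np.arange(d_model)[None, :]              # embedding dimensions
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

print(positional_encoding(50, 16).shape)         # (50, 16)
```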
Unsupervised training wherever possible:
BERT: introspection - don't do one word at a time - mask out some of the words and have the model play fill in the blanks - non-predictive but more like analytical - feedback in all directions over predictive
Reconfigurable output: don't need to retrain from scratch, but rather we can use it to understand text
None of the text is labelled, it's just live data. Stop the sentence and then get it to predict
### What's new in XLNET
from the same team that did BERT
two streams of attention
long memory like TransformerXL
Loads of compute => Results+
Fixing the masking problem: multiple mask entries CLASH in BERT: words in sentences aren't independent - but MASK predictions are treated as independent - and MASK never appears in real data
actual real-world data is different from training data - better hiding: a permutation process to omit certain tokens - rely on positional encoding to preserve order
Solution: split the streams
XL memory: need to make sure that the positional encoding "joins up"; train on whole words, not just tokens -> whole-word masking gives BERT better results
Abandon "next-sentence-or-not" tasks
XLNet-Large - similar sizes to BERT-large
Heavy-compute word generators
### 1-minute glosses
Distil the model into a CNN -> use the big model's outputs to train the smaller model to get the answer
Adapter modules - don't update the original transformer - add in extra trainable layers that "fix up" the output (which is effective)
"Parameter Efficient Transfer Learning for NLP"
Last Layer "graph layer"
Multimodal learning -> MASK technique to "fill in" text and photos
VideoBERT
## Swift for TensorFlow: Paige Bailey
A next gen framework for ML:
The number of ML arXiv papers grows faster than Moore's Law: rapid accuracy improvements
[The Swift Programming Language](https://swift.org): Python-like but with typing: intuitive
Swift for TensorFlow allows you to differentiate any function just by adding the `@differentiable` attribute
functional approaches for swift: declare and then use optimizer to update
syntactically similar to Kotlin: useful anywhere C++ can go
Typed APIs; static detections for errors
Interoperability: no wrappers: import and then call: import the C or C++ directly -> using a Python wrapper limits you to Python's single-threading
works for python too: `import Python` and then using it: differentiable programming: language-integrated autodiff - any function in S4TF is differentiable
### Performance:
speedy low-level perf: parity with C++
thread-level scalability: no GIL -> no bottleneck in data ingestion process
automatic graph extraction
## TFLite: On-device ML and Model Op Toolkit - Jason Zaman
@perfinion
jason@perfinion.com
Run on the phone because everyone has one; there's lots of data you can use but don't want to send to a server, and you get an immediate reply
How it works: the converter takes a trained TensorFlow model and produces a TFLite model
Model optimization toolkit:
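A sketch of the TF 2.0 conversion path with post-training quantization (one Model Optimization Toolkit technique) enabled; the tiny Keras model is just a stand-in:
```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # quantize weights
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)                # flatbuffer for on-device use
```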