Deep Learning Summit London 2019 - Day 2
===
###### tags: `RE.WORK` `Lectures` `Deep Learning`
# Day 2
:::info
- **Date:** Sep 20, 2019
- [Link to Schedule](https://www.re-work.co/events/deep-learning-summit-london-2019/schedule)
:::
---
## Machine Learning Systems Design - UNIVERSITY OF SHEFFIELD
> [name=Neil Lawrence, DeepMind Professor of Machine Learning, University of Cambridge & University of Sheffield]
An interesting presentation on the nuances of deploying ML systems in uncontrolled environments.
Neil Lawrence is a well-known machine learning researcher, with long experience in Gaussian Processes.
Historically, automation required **humans** to adapt:
- remove people from production lines
- build completely new streets (tarmac) and remove pedestrians, carriages & horses from them, to make way for cars

AI promises to be the first automation wave which instead **adapts to us**.



Unlike natural systems, which had to incorporate redundancies in order to be robust to a constantly changing environment, AI systems are fragile.



How to block Siri with a peppercorn (no longer possible with the current production version): {%youtube R3-2z4GFd9I %}
In order to continuously test these systems for fragilities, bugs, peppercorns, etc., we can augment them with *emulators*, which are much faster to test at scale and in real time. At the same time, we need an environment where, once a weakness is captured by testing these emulators, a new, improved model can be quickly redeployed.
We are not ready.
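As a hedged illustration of the emulator idea (and given Lawrence's Gaussian Process background), here is a toy sketch where a slow system component is replaced by a fast GP surrogate that can be probed densely; the `slow_system_response` function and the uncertainty-based flagging rule are my own assumptions, not from the talk.

```python
# Hypothetical sketch of the "emulator" idea: fit a fast Gaussian Process
# surrogate to a slow system component, then probe the surrogate at scale.
# The toy 1-D system and the flagging rule are illustrative, not from the talk.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def slow_system_response(x):
    """Stand-in for an expensive component (e.g. a full pipeline call)."""
    return np.sin(3 * x) + 0.1 * np.random.randn(*x.shape)

# A small number of expensive evaluations of the real system...
X_train = np.random.uniform(0, 2, size=(30, 1))
y_train = slow_system_response(X_train).ravel()

# ...is used to train a cheap emulator.
emulator = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
emulator.fit(X_train, y_train)

# The emulator can now be queried densely and in real time; regions where the
# predictive uncertainty is high are candidates for further testing.
X_probe = np.linspace(0, 2, 1000).reshape(-1, 1)
mean, std = emulator.predict(X_probe, return_std=True)
suspect = X_probe[std > std.mean() + 2 * std.std()]
```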

Q&A: Neil is not convinced an end-to-end approach to autonomous driving is safe, even though he finds it interesting. Also, he suggests having a look at his video on [data oriented programming](http://inverseprobability.com/talks/notes/modern-data-oriented-programming.html), which as far as I understand is basically probabilistic programming.
---
## Industrial Time Series Anomaly Detection - AIRBUS
> [name=Sergei Bobrovskyi, Data Scientist]
One of the most interesting talks of the whole event.
Topic: anomaly detection in $d$-dimensional time series ($d \approx 500$) in a real-world environment (a flying airplane) using Deep Learning.

Rule-based approaches only find severe anomalies, but not signs that something could grow to become severe (no early warning). Defining whether a certain pattern is anomalous requires **expert judgement** and **looking at different 1D time series at the same time**. For example, in the plot shown in the talk, oscillations in the <font color="#f00">red</font> time series are not anomalous when the <font color="#1AC814">green</font> and <font color="#1450C8">blue</font> time series are switching at the same time (bottom left), but they are anomalous when <font color="#1AC814">green</font> and <font color="#1450C8">blue</font> are more or less steady (first red interval, bottom right), or when they do switch but the oscillation of <font color="#f00">red</font> starts small and then increases, rather than vice versa (second red interval, bottom right).
The validation of their model includes a domain-expert step (i.e., it is not fully automated).
Deep Learning approaches to anomaly detection in multivariate time series are mostly divided into _predictive_ ones (basically, forecasting with RNNs) and _reconstructive_ ones (basically, encoder-decoder architectures with LSTM modules, such as seq2seq or similar).
Airbus basically uses a large LSTM. Their approach is detailed in the NASA paper cited in the talk, but they changed how the anomaly threshold is chosen.
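To make the _predictive_ flavour concrete, here is a toy PyTorch sketch (not the Airbus or NASA implementation): an LSTM forecasts the next time step and the absolute residual serves as the anomaly score. Channel count, window length, and hyperparameters are made up.

```python
# Toy sketch of the *predictive* flavour of DL anomaly detection:
# an LSTM forecasts the next time step and the absolute residual is used
# as an anomaly score. Dimensions and hyperparameters are illustrative only.
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, n_channels=500, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_channels)

    def forward(self, x):             # x: (batch, time, channels)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # predict the next time step

model = LSTMForecaster()
window = torch.randn(32, 64, 500)     # 32 windows of 64 steps, 500 channels
next_step = torch.randn(32, 500)

pred = model(window)
loss = nn.functional.mse_loss(pred, next_step)  # training objective

# At inference time, the per-channel absolute residual is the anomaly score;
# a threshold on this score flags anomalous behaviour.
residual = (pred - next_step).abs()
```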
Very interestingly, Airbus built an AI Gym and held an open competition to find anomalies in their dataset, which contains quite a large number of time series. Most of the anomalies identified by engineers went undetected by the competitors. Competitors didn't have access to the validation set (the real anomaly-detection task), nor did they get any info about the physical system from which the data were generated (an aircraft subsystem other than the engine; the speaker didn't say which one).
Results: academic teams didn't manage to get an F2-score above 0.02(!). Even the best industrial competitor, which won the competition, only reached an F2-score of 0.51.

Heavy Q&A:
- Q: how do you set the anomaly threshold? A: the NASA approach doesn't work well for a large number of parameters, so Airbus fits a probability distribution (not a Gaussian, but some heavy-tailed one) to the absolute residuals (see the sketch after this Q&A list).
- Q: how long did it take to label the data? A: about 12 hours, with 2 expert engineers.
- Q: you mentioned you start from a subsystem, and then you extend the model. What do you change? A: we mostly reduce the number of parameters using dimensionality reduction, in order to keep the model manageable/trainable
- Q: what's the secret behind DataPred's much better results? A: they have a way to perform model ensembling in real time, according to the historical accuracy of each model
- Q: predictive methods for AD often have the issue that if a large anomaly is suddenly registered on a sensor (e.g., sensor failure), then the anomaly is "propagated" to the prediction of all other sensors. How do you fix that? A: we found this issue and noted that predicting many steps ahead in time, rather than a single step ahead in time, helps.
- Q: how do you treat categorical variables? A: we don't really have categoricals; we have either continuous or binary numeric variables, and an LSTM can handle both without any special intervention. When we build a bigger model that handles multiple subsystems at the same time, we'll probably have to deal with categoricals.
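A minimal sketch of the thresholding idea from the first answer above: fit a heavy-tailed distribution to the absolute residuals and flag points above a high quantile. The talk did not name the distribution; the log-normal and the 99.9th-percentile cutoff below are my own illustrative choices.

```python
# Sketch of the thresholding idea from the Q&A: fit a heavy-tailed
# distribution to the absolute residuals and set the anomaly threshold at a
# high quantile. The log-normal here is just one example of a heavy-tailed
# choice; the talk did not specify which distribution Airbus uses.
import numpy as np
from scipy import stats

abs_residuals = np.abs(np.random.standard_t(df=3, size=10_000))  # dummy residuals

shape, loc, scale = stats.lognorm.fit(abs_residuals, floc=0)
threshold = stats.lognorm.ppf(0.999, shape, loc=loc, scale=scale)

anomalous = abs_residuals > threshold
print(f"threshold={threshold:.3f}, flagged={anomalous.sum()} points")
```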
---
## Deep Learning for Space Exploration - NASA JPL
> [name=Shreyansh Daftry, Research Scientist]
JPL is using Deep Learning to help design the next mission to Mars. Autonomous agents are very important because of the large communication delay from Earth to Mars and back.
Of course, the main interest is in AI for robotics.
CNNs are used to perform semantic segmentation on the rover's video stream and to guide navigation.
Main issue: annotating the type of terrain requires experts in Martian geology, so little labelled data is available. JPL is investigating sample-efficient architectures, but for now they use standard ones ([DeepLabv3](https://arxiv.org/abs/1706.05587)), which are definitely not sample-efficient.
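For reference, here is a minimal sketch of running an off-the-shelf DeepLabv3 from torchvision (recent versions) on a single frame. The pretrained weights cover everyday object classes, not Martian terrain, so in practice the network would be fine-tuned on the expert-labelled terrain images mentioned above; the input file name is hypothetical.

```python
# Minimal sketch of off-the-shelf semantic segmentation with DeepLabv3 from
# torchvision. The pretrained weights are for everyday classes, not Martian
# terrain, so in practice the head would be re-trained on expert-labelled
# terrain images (the expensive part mentioned above).
import torch
from torchvision import models, transforms
from PIL import Image

model = models.segmentation.deeplabv3_resnet101(weights="DEFAULT").eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("rover_frame.jpg").convert("RGB")  # hypothetical input frame
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    output = model(batch)["out"]      # (1, n_classes, H, W)
segmentation = output.argmax(dim=1)   # per-pixel class indices
```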
The model was deployed during the Curiosity mission, and it reduced navigation time (better tracks were chosen).
Mars 2020 - the new NASA mission to Mars



AI challenges for space exploration are similar to the usual ones, with the added difficulty of deploying on very resource-constrained systems (on Mars, every single watt of power is hard-earned).
---
## [COTA: Improving the Customer Support Experience Using Deep Learning](https://arxiv.org/pdf/1807.01337.pdf) - UBER
> [name=Aditya Guglani, Data Scientist]
An interesting presentation about building a DL model which helps customer support by 1) identifying the ticket class for the operator (**content type identification**), 2) routing tickets to the right teams, and 3) proposing three possible replies to the ticket (**reply template selection**). The presentation is a very detailed description of an industrial use case.





COTA v1 (Customer Obsession Ticket Assistance) is based on feature engineering, rather than on Deep Learning.

Two approaches were compared: pure classification (assign a class to each ticket) and pointwise ranking (rank classes in terms of their distance from the ticket).


With 4000+ classes and 200+ dimensions of the LSA vector, straight classification struggles. Cosine similarity works better because it greatly reduces the dimensionality of the binary classifier's input.
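A minimal sketch of this pointwise-ranking idea (not Uber's actual pipeline): obtain LSA vectors via TF-IDF plus truncated SVD, then score each (ticket, candidate class) pair with cosine similarity, collapsing two dense vectors into a single feature. The example texts and tiny dimensionality are illustrative.

```python
# Sketch of the pointwise-ranking idea: project tickets and candidate classes
# into a shared LSA space, then score each (ticket, candidate) pair by cosine
# similarity, which collapses two dense vectors into a single feature for a
# downstream binary classifier. Data and dimensions are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline

tickets = ["my driver never showed up", "I was charged twice for one trip"]
candidate_classes = ["driver no-show", "duplicate charge", "lost item"]

# TF-IDF followed by truncated SVD is a standard way to obtain LSA vectors.
lsa = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2))
lsa.fit(tickets + candidate_classes)

ticket_vecs = lsa.transform(tickets)
class_vecs = lsa.transform(candidate_classes)

# One similarity score per (ticket, class) pair; highest score = proposed class.
scores = cosine_similarity(ticket_vecs, class_vecs)
best = scores.argmax(axis=1)
```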


A/B testing was used to quantify the impact of COTA v1.


The business impact of COTA v1 was over $20M. The reduction in handling time for each ticket was small (6%), but the sheer volume of tickets is so large that a large total benefit was realized. Note that the plot shown was misleading, as the y-axis did not include zero.

To increase COTA's business impact, Uber leveraged Deep Learning. An Encoder-Combiner-Decoder architecture was selected.

Two different families of text encoders were tested: char/word CNNs and char/word RNNs.


All encoders performed more or less the same in terms of accuracy, so the word-CNN was selected: it is 9 times faster and only about 1% less accurate than the most accurate encoder.
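As a rough illustration of the selected encoder family (not the actual COTA model), a word-level CNN encoder embeds tokens, applies parallel 1-D convolutions of different widths, and max-pools over time; all sizes below are made up.

```python
# Minimal sketch of a word-level CNN text encoder of the kind compared in the
# talk: embed tokens, apply parallel 1-D convolutions of different widths, and
# max-pool over time. Vocabulary size and hyperparameters are illustrative.
import torch
import torch.nn as nn

class WordCNNEncoder(nn.Module):
    def __init__(self, vocab_size=30_000, embed_dim=128, n_filters=100, widths=(2, 3, 4)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, n_filters, kernel_size=w) for w in widths]
        )

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)            # fixed-size ticket encoding

encoder = WordCNNEncoder()
tokens = torch.randint(0, 30_000, (8, 50))         # 8 tickets, 50 tokens each
encoding = encoder(tokens)                         # (8, 300)
```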


COTA performance drops with time...

...thus a regular retraining schedule is needed. This is another reason why simpler models (which are faster to retrain) are to be preferred.





As a side note, the [Ludwig toolbox](https://uber.github.io/ludwig/) was developed out of this project.
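For completeness, here is a hypothetical sketch of how a text-in / class-out model of this kind can be declared with Ludwig's Python API; the column names, file names, and exact config keys are assumptions (they vary across Ludwig versions), so treat this as a flavour of the declarative style rather than a verified snippet.

```python
# Hypothetical sketch of declaring a text-in / class-out model with Ludwig's
# Python API (the declarative style this project led to). Column names, file
# names, and the exact config keys are illustrative and may differ between
# Ludwig versions.
from ludwig.api import LudwigModel

config = {
    "input_features": [
        {"name": "ticket_text", "type": "text", "encoder": {"type": "parallel_cnn"}},
    ],
    "output_features": [
        {"name": "contact_type", "type": "category"},
    ],
}

model = LudwigModel(config)
train_stats, _, _ = model.train(dataset="tickets.csv")       # hypothetical CSV
predictions, _ = model.predict(dataset="new_tickets.csv")    # hypothetical CSV
```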
---
## Conclusions
The event was really interesting, and its concurrency with the AI Assistant Summit allowed people to move between the two sessions. Many interesting methods and papers were discussed. At the end of each day, attendees gathered to share their experiences and learnings from the day, so it was also a good networking event. It would have been nice if the organizers had also set up an external get-together, such as a dinner, but overall it was definitely a great experience.