# Dialogue System with Deep Learning
Yoctol Data Team Lead
朱柏憲
cph@yoctol.com
---
![](https://i.imgur.com/GoR82wu.png)
---
## Outline
- Intro <!-- .element: class="fragment" -->
- Frame-Based Dialogue System <!-- .element: class="fragment" -->
- How Does Deep Learning Help? <!-- .element: class="fragment" -->
- Beyond Frame-Based Dialogue System <!-- .element: class="fragment" -->
<aside class="notes">
</aside>
---
## Intro
---
Why Chatbot (Dialogue System)?
----
![](https://i.imgur.com/BHD4X69.png =720x480)
<aside class="notes">
In recent years, every major IM platform has launched its own chatbot platform.
</aside>
----
![](https://upload.wikimedia.org/wikipedia/commons/9/9c/%E3%82%B9%E3%83%9E%E3%83%BC%E3%83%88%E3%82%B9%E3%83%94%E3%83%BC%E3%82%AB%E3%83%BC.jpg =531x480)
<aside class="notes">
Voice-controlled home assistants are also gradually becoming part of people's daily lives.
</aside>
----
Applications of Chatbots
![](https://i.imgur.com/MWFuwSQ.png =540x680)
<aside class="notes">
Chatbots have a wide range of applications.
</aside>
----
![](https://blogs.gartner.com/smarterwithgartner/files/2017/08/Emerging-Technology-Hype-Cycle-for-2017_Infographic_R6A-1024x866.jpg)
<aside class="notes">
Chatbot technology is on the rise.
(If time allows, elaborate verbally.)
In fact, the earliest chatbot dates back to ELIZA in 1966, so why has the field become popular only in recent years? Large enough computing resources, good enough algorithms, changes in lifestyle (everyone carries a phone, and multitasking has become the norm), large enough platforms, and so on.
Recently, [Woebot](https://woebot.io/) demonstrated how a chatbot can achieve good results in psychological counseling using fairly simple techniques.
</aside>
---
What to consider when designing dialogue systems?
<aside class="notes">
This talk will not cover how to build a chatbot itself; for that, I recommend our company's [Bottender](https://github.com/Yoctol/bottender).
</aside>
----
Task Oriented vs Chat Oriented
<aside class="notes">
Either focus on completing a task, or interact with the user as much as possible. The former strives to reach the goal efficiently; the latter aims to increase user engagement.
</aside>
----
Rule Based vs Model Based
<aside class="notes">
Use hand-crafted rules and program logic, or a semantic model.
</aside>
----
End-to-End vs Modularized
<aside class="notes">
Does an incoming sentence pass through one huge function, or through a pipeline of modules? The former is hard to maintain, but optimization is in theory easier; the latter is easy to maintain and swap out, but every module needs its own suitable objective function or test cases.
</aside>
---
## Frame-Based Dialogue System
---
[Bobrow et al., 1977](https://pdfs.semanticscholar.org/c250/5e3f0e19d50bdd418ca9becf8cbb08f61dc1.pdf)
Design the dialogue system around a domain ontology. The ontology has n frames, each with several slots.
<aside class="notes">
Siri and Alexa use this architecture, and most NLU tools on the market today also start from this model.
</aside>
---
![](https://i.imgur.com/kVgqyAN.jpg)
<aside class="notes">
This architecture has several advantages. First, it is easy to design and to define features for.
Second, it is highly modular: as long as the system's ontology stays fixed, any component can be swapped out. This matters a great deal, because early in development we often experiment with simple methods first and replace them with better models later.
</aside>
---
**Domain Classification**
<aside class="notes">
Narrow the scope the chatbot must handle, reducing the complexity of the later steps.
</aside>
----
for a personal assistant:
`Find flights to New York tomorrow` => travel
`Book a meeting room for tomorrow` => scheduling
----
##### methods
- RegExp <!-- .element: class="fragment" -->
- Sequence Classification Models <!-- .element: class="fragment" -->
- guided by UX design <!-- .element: class="fragment" -->
<aside class="notes">
UX design is an important method that engineers often overlook, yet it is frequently very effective.
</aside>
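----
As a hypothetical sketch of the RegExp approach (the keyword lists and domain names here are made up for illustration, not from a real system):

```python
import re

# Hypothetical keyword patterns per domain; a real system would need far
# broader coverage, a trained classifier, or both.
DOMAIN_PATTERNS = {
    "travel": re.compile(r"flight|airport|fly to", re.IGNORECASE),
    "scheduling": re.compile(r"meeting|calendar|schedule", re.IGNORECASE),
}

def classify_domain(utterance: str) -> str:
    """Return the first domain whose pattern matches, else 'unknown'."""
    for domain, pattern in DOMAIN_PATTERNS.items():
        if pattern.search(utterance):
            return domain
    return "unknown"

print(classify_domain("Find flights to New York tomorrow"))  # travel
```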
---
**Intent Detection (Intent Classification)**
<aside class="notes">
Once the user's domain is determined, further identify the user's specific intent. At this stage we can usually assume the user will not leave the current state. Deciding which intents exist is the designer's job.
</aside>
----
`Which flights go to Osaka tomorrow?` => USER_SEARCH
`Book the 3 pm flight tomorrow` => USER_BOOKING
`Actually, make it the day after tomorrow` => USER_ADD_CRITERIA
----
##### methods
- RegExp <!-- .element: class="fragment" -->
- Sequence Classification Models <!-- .element: class="fragment" -->
<aside class="notes">
</aside>
----
### Differences from Domain Classification
- n_intents >> n_domains <!-- .element: class="fragment" -->
- A rather closed label set <!-- .element: class="fragment" -->
- Often a multi-label problem <!-- .element: class="fragment" -->
<aside class="notes">
</aside>
---
**Entity Extraction (Slot Filling)**
<aside class="notes">
</aside>
----
`Please book tomorrow's 3 pm flight from Xiaogang to Osaka`
- TIME=`tomorrow 3 pm`
- DEPARTURE=`Xiaogang`
- DESTINATION=`Osaka`
----
##### methods
- RegExp <!-- .element: class="fragment" -->
- Sequence Labeling Models (HMM, CRF, etc) <!-- .element: class="fragment" -->
<aside class="notes">
</aside>
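----
A RegExp baseline for slot filling over the flight example can be sketched like this (the patterns and slot names are illustrative; a real system would use a sequence labeling model):

```python
import re

# Hypothetical slot patterns mirroring the slots above; real entity
# extraction would use a sequence labeling model (HMM, CRF, RNN-CRF).
SLOT_PATTERNS = {
    "TIME": re.compile(r"\b(tomorrow(?: at \d+ ?[ap]m)?)\b"),
    "DEPARTURE": re.compile(r"\bfrom (\w+)"),
    "DESTINATION": re.compile(r"\bto (\w+)"),
}

def fill_slots(utterance: str) -> dict:
    """Collect the first match for each slot pattern."""
    slots = {}
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            slots[name] = match.group(1)
    return slots

print(fill_slots("Please book the flight from Kaohsiung to Osaka tomorrow at 3 pm"))
```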
---
**Dialogue Policy**
<aside class="notes">
</aside>
----
Airplane Bot:
- Actions
- BOOKING
- ASK_FOR_MORE_INFO
----
User: Please book a flight to Osaka tomorrow <!-- .element: class="fragment" -->
state: USER_ASK(DATE='4/1', DESTINATION='KIX') <!-- .element: class="fragment" -->
Bot: (based on the state, take action: ASK_FOR_MORE_INFO()) <!-- .element: class="fragment" -->
User: I'd like the XX Airlines flight <!-- .element: class="fragment" -->
state: USER_ADD_CRITERIA('XX') <!-- .element: class="fragment" -->
Bot: (based on the state, take action: BOOKING) <!-- .element: class="fragment" -->
----
##### methods
- Finite state machine (easy to implement with the [State Pattern](https://en.wikipedia.org/wiki/State_pattern))
- Reinforcement learning
<aside class="notes">
</aside>
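----
A minimal sketch of a rule-based policy in the spirit of the finite state machine above: stay in an information-gathering state until the required slots are filled, then book. The slot and action names are illustrative, not from a real system.

```python
# Illustrative required slots and actions, mirroring the airplane bot above.
REQUIRED_SLOTS = {"DATE", "DESTINATION", "AIRLINE"}

class DialoguePolicy:
    """Rule-based policy: gather slots until complete, then book."""

    def __init__(self):
        self.slots = {}

    def handle(self, new_slots):
        self.slots.update(new_slots)          # merge newly extracted slots
        missing = REQUIRED_SLOTS - self.slots.keys()
        if missing:                           # stay in the gathering state
            return "ASK_FOR_MORE_INFO({})".format(sorted(missing))
        return "BOOKING"                      # all slots filled: act

policy = DialoguePolicy()
print(policy.handle({"DATE": "4/1", "DESTINATION": "KIX"}))  # asks for AIRLINE
print(policy.handle({"AIRLINE": "XX"}))                      # BOOKING
```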
---
**Response Generation**
<aside class="notes">
</aside>
----
state: USER_ASK(DESTINATION='KIX')
action: ASK_FOR_MORE_INFO('DATE')
=> Destination: Osaka. Which day would you like to arrive?
----
##### methods
- Template (please, use this)
- Generative Model
<aside class="notes">
</aside>
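----
Template-based response generation for a state/action pair like the one above can be sketched as follows (the templates and slot names are hypothetical):

```python
# Hypothetical templates keyed by bot action; slot values fill the holes.
TEMPLATES = {
    "ASK_FOR_MORE_INFO": "Your destination is {DESTINATION}. Which {missing} would you like?",
    "CONFIRM_BOOKING": "Booked your flight to {DESTINATION} on {DATE}.",
}

def generate_response(action, slots, **extra):
    """Render the template for the chosen action with the known slots."""
    return TEMPLATES[action].format(**slots, **extra)

print(generate_response("ASK_FOR_MORE_INFO", {"DESTINATION": "Osaka"}, missing="date"))
# Your destination is Osaka. Which date would you like?
```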
---
## How Does Deep Learning Help?
---
**Word Embeddings**
word => vector
----
**Low-Dimensional Word Embeddings**
----
Why use them?
- Faster computation
- Rich latent information
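----
The "rich latent information" point can be illustrated with cosine similarity on toy vectors (these 3-d vectors are hand-picked for illustration only; real embeddings are learned and typically have 100-300 dimensions):

```python
import math

# Hand-picked toy 3-d vectors purely for illustration; real embeddings are
# learned (word2vec, GloVe) and typically have 100-300 dimensions.
vectors = {
    "plane":  [0.9, 0.1, 0.0],
    "flight": [0.8, 0.2, 0.1],
    "lunch":  [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1 for same direction, 0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

print(cosine(vectors["plane"], vectors["flight"]))  # near 1: related words
print(cosine(vectors["plane"], vectors["lunch"]))   # near 0: unrelated words
```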
----
#### word2vec
- Train a linear model to "predict the neighbors from the word" or "predict the word from its neighbors". The embeddings are the learned model parameters.
- [FastText](https://fasttext.cc/)
----
#### GloVe
[Project page](https://nlp.stanford.edu/projects/glove/)
- Factorize the word co-occurrence matrix with a special reweighting function.
----
Other interesting topics:
- Unknown tokens <!-- .element: class="fragment" -->
- cross-lingual embeddings <!-- .element: class="fragment" -->
- [MUSE](https://research.fb.com/downloads/muse-multilingual-unsupervised-and-supervised-embeddings/) <!-- .element: class="fragment" -->
- Word Vectors in non-Euclidean space <!-- .element: class="fragment" -->
- [Poincare Embedding](https://arxiv.org/abs/1705.08039) <!-- .element: class="fragment" -->
<aside class="notes">
Everyone is already familiar with these techniques, but I want to raise a few open problems:
1. How should unknown words be handled?
2. Cross-lingual word embeddings.
3. The embedding table is large; is there a more efficient approach?
</aside>
---
**Sequence Classification Models**
<aside class="notes">
Sequence classification models take in a sequence and then classify or label it. They can be applied to the domain and intent classification problems.
</aside>
----
#### Recurrent Neural Network
- RNN, LSTM, GRU, SRU, etc. <!-- .element: class="fragment" -->
- Input the word-embedding sequence and classify using the output of the last time step; learn through BPTT. <!-- .element: class="fragment" -->
- Enhancements: bidirectional layers, attention mechanism, etc. <!-- .element: class="fragment" -->
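----
The last-time-step readout can be sketched as a forward pass in plain NumPy (weights are random here, not trained; all sizes are illustrative):

```python
import numpy as np

# Forward pass of a vanilla RNN classifier: read the embedding sequence,
# classify from the last hidden state. Weights are random here; in practice
# they would be learned with BPTT. All sizes are illustrative.
rng = np.random.default_rng(0)
emb_dim, hidden_dim, n_classes = 8, 16, 3

W_x = 0.1 * rng.normal(size=(emb_dim, hidden_dim))
W_h = 0.1 * rng.normal(size=(hidden_dim, hidden_dim))
W_out = 0.1 * rng.normal(size=(hidden_dim, n_classes))

def classify(sequence):
    h = np.zeros(hidden_dim)
    for x in sequence:                  # one word embedding per time step
        h = np.tanh(x @ W_x + h @ W_h)  # recurrent state update
    logits = h @ W_out                  # read out only the last time step
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()              # softmax over intent classes

probs = classify(rng.normal(size=(5, emb_dim)))  # a 5-token "sentence"
print(probs)
```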
----
#### [Convolution Neural Network](https://arxiv.org/abs/1408.5882)
- SOTA sentiment classification model <!-- .element: class="fragment" -->
- Use global pooling to handle variable-length input <!-- .element: class="fragment" -->
![](https://i.imgur.com/ztKyVNm.png =600x300) <!-- .element: class="fragment" -->
- Enhancements: BN, SELU <!-- .element: class="fragment" -->
----
Difficulties we've encountered:
- Unknown Intentions <!-- .element: class="fragment" -->
    - Threshold on the largest predicted probability
    - Add `unknown` as an extra class
    - Use a model that outputs confidence
- Lack of labeled data <!-- .element: class="fragment" -->
    - Few-shot learning <!-- .element: class="fragment" -->
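----
The thresholding mitigation above can be sketched as follows (the intent names and the 0.6 threshold are illustrative; the threshold should be tuned on held-out data):

```python
# Hypothetical intent probabilities; the 0.6 threshold is illustrative and
# should be tuned on held-out data.
def predict_with_rejection(probs, threshold=0.6):
    """Return the top intent, or 'unknown' if the model is not confident."""
    intent, top = max(probs.items(), key=lambda kv: kv[1])
    return intent if top >= threshold else "unknown"

print(predict_with_rejection({"USER_SEARCH": 0.85, "USER_BOOKING": 0.15}))
# USER_SEARCH
print(predict_with_rejection({"USER_SEARCH": 0.4, "USER_BOOKING": 0.35,
                              "USER_ADD_CRITERIA": 0.25}))
# unknown
```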
---
**Sequence Labeling Models**
<aside class="notes">
Sequence labeling models take in a sequence step by step and assign a label to each input step. They can be used for slot filling.
</aside>
----
#### Recurrent Neural Network
- RNN and its successors
- RNN-CRF
----
#### Convolution Neural Network
- [Gated Convolutional Neural Network](https://blog.yoctol.com/yoctol-paper-note-ep-19-language-modeling-with-gated-cnn-5245343f4c79)
----
Difficulties we've encountered:
- Synonym handling <!-- .element: class="fragment" -->
- Typos <!-- .element: class="fragment" -->
- Special words <!-- .element: class="fragment" -->
    - Use RegExp! <!-- .element: class="fragment" -->
---
**Reinforcement Learning**
<aside class="notes">
Reinforcement learning lets the model learn how to act from reward feedback.
</aside>
(With the Markovian assumption)
Maximize reward; learn an agent:
state => action
----
BE CAUTIOUS, RL methods are:
- Hard to train
- Sensitive to reward design and hyper-parameters
----
Given the current state, the policy model learns which action to take to maximize reward.
----
#### Contextual Bandit
- LinUCB
- LinThompSamp
<aside class="notes">
Contextual bandits have been studied for a long time, and the algorithms usually come with good theoretical regret bounds.
</aside>
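----
The LinUCB idea (disjoint variant, as described by Li et al., 2010) can be sketched in a few lines: one ridge-regression model per action, plus an upper-confidence exploration bonus. Dimensions and alpha here are illustrative.

```python
import numpy as np

# Disjoint LinUCB sketch (Li et al., 2010): one ridge-regression model per
# action plus an upper-confidence bonus. Dimensions and alpha are illustrative.
class LinUCB:
    def __init__(self, n_actions, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_actions)]    # I + sum x x^T
        self.b = [np.zeros(dim) for _ in range(n_actions)]  # sum reward * x

    def choose(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                        # ridge estimate per arm
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(theta @ x + bonus)         # exploit + explore
        return int(np.argmax(scores))

    def update(self, action, x, reward):
        self.A[action] += np.outer(x, x)
        self.b[action] += reward * x

bandit = LinUCB(n_actions=2, dim=3)
x = np.array([1.0, 0.0, 0.5])
action = bandit.choose(x)
bandit.update(action, x, reward=1.0)
```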
----
[Model-free RL vs Model-based RL](https://www.quora.com/What-is-the-difference-between-model-based-and-model-free-reinforcement-learning)
----
#### Model-Free RL algorithms
The agent simply optimizes reward, without knowing how the environment changes in response to its actions.
- Policy-based methods
- Value-function-based methods
- Actor-Critic
----
#### Model Based Method
The agent learns how the world works by training a world model.
- [Curiosity model](https://arxiv.org/pdf/1705.05363.pdf)
- [Formal Theory of Creativity, Fun, and Intrinsic Motivation](http://people.idsia.ch/~juergen/ieeecreative.pdf)
----
#### Evolution Strategies
- [Evolution Strategies as a Scalable Alternative to Reinforcement Learning](https://blog.openai.com/evolution-strategies/)
---
**Generative Model**
<aside class="notes">
In my view, this area is still some distance from practical commercial use; for now, rule-based approaches or templates give better returns. But the field is well worth watching.
Templates are quick and crude, and the output can be ungrammatical. Whether a grammatically awkward but correct answer actually hurts the user experience is something that needs experiments.
</aside>
----
#### Autoencoder
- Seq2Seq
- SOTA: [VAE](https://arxiv.org/abs/1511.06349)
<aside class="notes">
An AE has two parts, an encoder and a decoder. The encoder compresses the input into a lower-dimensional representation.
A VAE adds a variational constraint on top of that. What does that mean?
(If time allows, mention the [manifold assumption](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/).)
</aside>
----
#### Generative Adversarial Network
- WGAN-GP
- SeqGAN and its successors, like [MaliGAN](https://arxiv.org/abs/1702.07983), [IRGAN](https://arxiv.org/abs/1705.10513)
- [Dialogue Generation](https://arxiv.org/abs/1701.06547)
<aside class="notes">
</aside>
---
## Beyond Frame-Based Dialogue System
---
**QA Bot**
<aside class="notes">
</aside>
----
Question Answering Models
from the SQuAD task
- [RNet](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf)
<aside class="notes">
</aside>
---
**End-to-End Models through Hierarchical RNN**
<aside class="notes">
</aside>
----
- [HRED](https://arxiv.org/abs/1507.02221)
- [VHRED](https://arxiv.org/abs/1605.06069)
- [MrRNN](https://arxiv.org/abs/1606.00776)
<aside class="notes">
Stack RNNs to train a dialogue system end to end.
</aside>
---
**Text to Command**
<aside class="notes">
Convert natural language into machine-readable commands.
</aside>
----
- [Seq2SQL](https://arxiv.org/abs/1709.00103)
- [SQLNet](https://arxiv.org/abs/1711.04436)
<aside class="notes">
Chatbots used for querying databases are actually quite common. If there were a mature solution... At present these two models reach only around 60-70% accuracy; perhaps once enough data has accumulated...
</aside>
---
# Thank you
---
# Appendix
---
## Few Shot Learning
---
(See [my slides 少量資料訓練聊天機器人的語意模型 (Training Semantic Models for Chatbots with Little Data)](https://docs.google.com/presentation/d/1IEX9ucriC48k6fiZ6owqQK0rW689KbXOC_FzSEW8WDw/edit?usp=sharing))
[Yoctol Paper Note.15](https://blog.yoctol.com/%E5%84%AA%E6%8B%93-paper-note-ep-15-few-shot-learning-part-i-493e0e61b116)
[Yoctol Paper Note.17](https://blog.yoctol.com/%E5%84%AA%E6%8B%93-paper-note-ep-17-few-shot-learning-70211e288533)
---
## GANs
- [Why GANs are hard to apply to text](https://www.reddit.com/r/MachineLearning/comments/40ldq6/generative_adversarial_networks_for_text/)
---
## Conversational Intelligence Challenge
http://convai.io/