kazuyahooo
# Machine Learning
> These notes cover Prof. Hung-yi Lee's 2021 Machine Learning course.

## 0705
[Fully Connected Layer & Loss function](https://ithelp.ithome.com.tw/articles/10220782)
### Training steps
>**Step 1:** Function with unknown
>**Step 2:** Define loss from training data
>**Step 3:** ==Optimization==

### Possible loss outcomes
![](https://i.imgur.com/M6CSsRN.png)
:::info
- Glossary
    - Hyperparameter = chosen by a person rather than found by the machine (e.g. batch size, learning rate, number of sigmoids)
    - Two stacked ReLUs = one hard sigmoid (both are activation functions)
        - In general, training uses ReLU; sigmoid is harder to train
    - (N-fold) cross validation
    - Model bias (searching for a needle in a haystack that contains no needle; ==model set is too simple==)
    - Optimization issue (the needle is in the haystack but cannot be found)
:::

## 0707
[[Machine Learning 2021] What to do when a neural network fails to train (2): Batch and Momentum](https://www.youtube.com/watch?v=zzbr1h9sF54&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=7&ab_channel=Hung-yiLee)
[![small-gradient-v7.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/small-gradient-v7.pdf)
### Optimization with Batch
- 1 **epoch** = see all the batches once -> **Shuffle** after each epoch
- Small Batch vs. Large Batch
    - Small: ~~**Short** time for cooldown~~, but **noisy**
    - Large: ~~**Long** time for cooldown~~, but **powerful**
    - A larger batch size does not necessarily need more time to compute the gradient (as long as it is not too large)
        - e.g. dataset size = 60,000, comparing batch size = 1 and 1000
        ![](https://i.imgur.com/goNRZIr.png =600x250)
- A "noisy" update is better for training (each step may see a different loss function)
- Minima (flat is better than sharp)
    - e.g.
![](https://i.imgur.com/T7mFUyX.png =550x300)
:::danger
<center>Batch size is a hyperparameter you have to decide.</center>
:::

### Momentum
- Next step = the direction opposite to the gradient + the direction of the previous movement
![](https://i.imgur.com/aOXy9Sn.png =550x350)
    - $m^i$ is the weighted sum of all the previous gradients (starting from $m^0 = 0$)

:::warning
++Conclusions:++
- Critical points have zero gradient
- A critical point may be a <font color="red" >saddle point</font> or a <font color="red" >local minimum</font>
    - The [Hessian matrix](https://www.easyatm.com.tw/wiki/%E9%BB%91%E5%A1%9E%E7%9F%A9%E9%99%A3) determines which one it is
    - The eigenvectors of the Hessian matrix show a way out of a saddle point
    - Local minima may be rare
- ==Smaller batch size== and ==momentum== help escape critical points.
:::

---

### Learning Rate
[[Machine Learning 2021] What to do when a neural network fails to train (3): Automatically adjusting the learning rate](https://www.youtube.com/watch?v=HYUXEeh3kwY&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=6&ab_channel=Hung-yiLee)
[![optimizer_v4.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/optimizer_v4.pdf)
- Training stuck != small gradient
- Where the error surface is steep, use a small learning rate; where it is flat, use a large one
- Adjusting the learning rate
    - Used in **Adagrad**
    ![](https://i.imgur.com/9riKyFU.png =550x350)
    - Used in **RMSProp**
    ![](https://i.imgur.com/5tR72BV.png =550x350)
    - **Adam**, used for optimization, is ==RMSProp+Momentum==
- Learning rate scheduling
    - Learning rate decay -> lower the rate because training is getting close to the destination
    - Warm up -> the rate first increases, then decreases (the early steps explore and collect statistics for σ and the learning rate)
:::info
- Glossary
    - Convex function
    - Error surface (how the total loss changes with the parameters)
    - Adagrad, RMSProp
:::
:::warning
++Conclusion:++
- Momentum sums past gradients including their directions, while σ only accumulates magnitudes, so **the two do not cancel out**
![](https://i.imgur.com/dSoLTD8.png =550x350)
:::

## 0708
[[Machine Learning 2021] What to do when a neural network fails to train (4): The loss function may also matter](https://www.youtube.com/watch?v=O2VkP8dJ5FE&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=7&ab_channel=Hung-yiLee)
[![classification_v2.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/classification_v2.pdf)
### Classification
- Regression
& Classification
    - Classification: class as **one-hot vector**
    ![](https://i.imgur.com/t5ABlDZ.png =550x350)
- Softmax (keeps y_head within 0~1; used for more than two classes)
![](https://i.imgur.com/A6DWZxS.png =450x80)
    - exp(x) = $e^x$, e ≈ 2.718
- Sigmoid (used for binary classification)
- Loss of Classification
![](https://i.imgur.com/2ljDlJg.png =450x150)
    - In PyTorch, cross-entropy already includes **softmax**
    - With MSE the error surface is very flat where the **loss is large**, so training gets stuck; cross-entropy does not have this problem
:::warning
++Conclusions:++
- **Minimizing cross-entropy** is equivalent to **maximizing likelihood**
> The difference between MLE and cross-entropy is that **MLE** represents ==a structured and principled approach to modeling and training==, and **binary/softmax cross-entropy** simply ==represent special cases of that applied to problems== that people typically care about.
- Changing the loss function can change the difficulty of optimization
:::

## 0709
[[Machine Learning 2021] What to do when a neural network fails to train (5): A brief introduction to Batch Normalization](https://www.youtube.com/watch?v=BABPWOkSbLE&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=9&ab_channel=Hung-yiLee)
[![normalization_v4.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/normalization_v4.pdf)
### Batch Normalization
- ==Changing Landscape== (bring the value ranges of the different inputs x close together to get an error surface that is easy to traverse)
    - How to make the values of internal nodes similar (collectively called Feature Normalization)
        - It helps gradient descent
        - Not only the initial input needs normalizing: **the values obtained after multiplying by the weights inside the network** are inputs to the next layer and need normalizing too
    - Internal nodes also need feature normalization
    ![](https://i.imgur.com/Uqru495.png =600x400)
        - e.g. changing $z^1$ affects the other values, so the statistics are computed per batch rather than over all the data at once, which reduces the GPU load
- Batch Normalization formula
    - γ controls the scale and β controls the shift
    ![](https://i.imgur.com/wkE4YqT.png =400x300)
- Testing = Inference
    - At test time μ and σ are not computed; the values obtained during training are used
    - During training, PyTorch records the μ and σ of every batch and keeps their moving average
- Internal ==Covariate Shift== <font color="red">(experiments suggest it may have little to do with network training or BN)</font>
    - Batch Normalization makes $a$ and $a'$ have similar statistics
    ![](https://i.imgur.com/jZCVaGE.png =600x400)
:::info
- Glossary
    - Covariate Shift:
      when X is used to predict Y and the distribution of X changes over time, **the model gradually stops working**
    - Well-known normalization methods
        - Batch Renormalization
        - Layer Normalization
        - Instance Normalization
        - Group Normalization
        - Weight Normalization
        - Spectral Normalization
:::
:::warning
++Conclusions:++
- According to experiments and theoretical analysis, Batch Normalization **helps optimization (it improves the error surface)**
- Advantages of Batch Normalization
    - Trains faster (faster convergence)
    - Use higher learning rates
    - Parameter initialization is easier
    - Activation functions that tend to vanish or stop learning early during training become usable again (makes activation functions viable by regulating the inputs to them)
    - Better results overall
    - Has **an effect similar to Dropout** and helps prevent overfitting; when using Batch Normalization, use a little less Dropout, otherwise the model may underfit
        - **Explanation**: without BN, when input $x_1$ is small and $x_2$ is large, $W_1$ must be large for $x_1 W_1$ to be as large as $x_2 W_2$. This relies on $x_1$ always being small; if the test data are different, the model overfits
:::

## 0716
[[Machine Learning 2021] Convolutional Neural Networks (CNN)](https://www.youtube.com/watch?v=OP5HcXJg2Aw&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=10&ab_channel=Hung-yiLee)
[![cnn_v4.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/cnn_v4.pdf)
### Image Classification (Convolutional Layers & Pooling)
- The output is a one-hot vector, e.g. output = 2000 x 1 -> 2000 possible classes
- 3-D tensor = length x width x channels
- Typical setting of the receptive field:
    - Simplification 1
        - All channels (**kernel** size: 3x3)
        - Usually 64 or 128 neurons cover the same receptive field
        - Stride: the shift size; receptive fields should overlap so that no information is missed
        - Borders are handled with padding
    - Simplification 2
        - Neurons can share parameters (only across different receptive fields; with the same receptive field the outputs would be identical)
- The whole CNN
    1. Convolution (expands the image into a multi-channel feature map)
        - The deeper the CNN, the larger the pattern a 3x3 filter can see
        ![](https://i.imgur.com/gEAeHUx.png =600x400)
    2. Pooling (shrinks the image)
        - Max pooling (keep only the maximum of each N x N block)
        - It reduces computation but may ==lose details==; with enough computing power, full convolution can be used instead
    3.
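       Flatten (straighten the matrix into a vector)
       ![](https://i.imgur.com/gYM1YqV.png =600x400)
:::info
- Glossary:
    - tensor: an array with more than 2 dimensions
    - channel: RGB color intensities; 3 channels for color, 1 for black and white
    - Receptive field: the local patch a neuron looks at; the neuron is connected to a small region rather than fully connected
        - Receptive fields can overlap (or even be the same field, to observe a pattern more completely)
        - They can be rectangular
        - They can look at only some specific channels
    - Filter: the name for parameters shared among neurons; it searches the image for patterns that match the filter
:::
:::warning
++Conclusions:++
- Convolutional Layer
![](https://i.imgur.com/yGR85mt.png =550x350)
- A CNN cannot recognize **scaled or rotated** images
    - ==Data augmentation== addresses this: during training, crop patches of the images and scale and rotate them so the CNN learns from them
:::

---

[[Machine Learning 2021] Self-attention (part 1)](https://www.youtube.com/watch?v=hYdO9CscNes&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=10&ab_channel=Hung-yiLee)
[![self_v7.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/self_v7.pdf)
[[Machine Learning 2021] Self-attention (part 2)](https://www.youtube.com/watch?v=gmsMY5kc-zw&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=11&ab_channel=Hung-yiLee)
[![self_v7.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/self_v7.pdf)
### Self Attention
- Self-attention: passing the whole sequence through self-attention yields vectors that carry context
    - Text, speech, images, and graphs (social networks, molecules) can all be represented as a vector set
    - Input length = output length (e.g. POS tagging, speech recognition, properties of a social graph)
    - One label for the whole sequence (e.g. sentiment analysis, speaker recognition, which kind of molecule)
    ![](https://i.imgur.com/njmbn9s.png =600x400)
- Implementation
![](https://i.imgur.com/OI2nNZu.png =600x400)
    - When computing $a'$, $a^1$ acts as the query and the other $a^x$ as keys; each is multiplied by its weight matrix and the results are combined by **dot product** to give the attention score (**a larger value means a stronger relation**)
    - A vector can also act as query and key at the same time in the dot product
    - Softmax (formula in the upper right of the figure) can be replaced by another activation function
    ![](https://i.imgur.com/ZcflmFD.png =600x400)
    - The result $b^i$ shows which $a'$ has the greatest influence, **and the key information is then extracted from $v^i$** ?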
- 透過右上角的公式可以算出b1,$b^i$的算法分別是把i當成query,其他當key,套入此公式 ![](https://i.imgur.com/xWMT8hb.png =600x400) :::info - 名詞解釋: - Multi-head Self-attention - head數目即為一個input的q、k、v分別要生成幾個 - 假設head設為2,則output會是(bi,1 & bi,2)兩個結果,在與W矩陣transform變成$b^i$ - Truncated Self-attention(可能不需要考慮整個input->cuz語音辨識資訊量大) - Positional Encoding - ==self-attention沒有位置的資訊== - 每個位置都有一個專屬的e,input+$e^i$完成標註 - hand-crafted、可以透過資料學習 ::: :::warning ++Concludes:++ - CNN v.s. Self-attention - CNN每個pixel只考慮receptive field,但self-attention會去計算所有關聯 - Self-attention只要設定特定參數==即可和CNN有一樣的效果==(CNN是self-attention的子集合) - **資料量小,適合限制較多**的模型(CNN),反之有彈性的model可能會overfitting - RNN v.s. Self-attention - 雖然RNN可以是雙向的,但最右邊的input難考慮到一開始的input - RNN無法平行化(Self-attention逐漸取代RNN架構) ::: ## 0720 [【機器學習2021】Transformer (上)](https://www.youtube.com/watch?v=n9TlOhRjYoc&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=14&ab_channel=Hung-yiLee) [![seq2seq_v9.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/seq2seq_v9.pdf) [【機器學習2021】Transformer (下)](https://www.youtube.com/watch?v=N6aRv06iv2g&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=14&ab_channel=Hung-yiLee) [![seq2seq_v9.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/seq2seq_v9.pdf) ### Transformer (Seq2Seq) - Input a sequence, output a sequence - 語音辨識、語音翻譯 (當input的語言沒有context、可用訓練集: 鄉土劇) - 機器翻譯 - 聊天機器人 - Syntactic Parsing - QA can be done by seq2seq - question、context -> Seq2Seq model -> answer #### Encoder ![](https://i.imgur.com/cNFd4nt.png =600x400) - 透過Positional Encoding解決沒有未知資訊的問題 - Add & Norm -> Residual + layer norm - Feed Forward(Fully connected network) ![](https://i.imgur.com/IkSjypr.png =200x250) ## 0724 #### Decoder (Autoregressive) - Masked Self-attention ![](https://i.imgur.com/SmpKyGQ.png =600x400) - 不考慮未出現的詞 (eg.a'2,2只考慮$a^1$ & $a^2$) - Encoder是一個一個output來當Decoder的input,所以只考慮目前左邊的內容 - AT v.s. 
NAT
![](https://i.imgur.com/CbRsrUU.png =600x400)
- The decoder has one more step than the encoder: Cross Attention
![](https://i.imgur.com/AHfk0kJ.png =600x400)
    - In the original paper, the output of the encoder's last layer is used as input to the decoder
### Training
- Copy Mechanism (chatbot: copy words from the input into the output; document summarization)
- Guided Attention
    - Suitable for speech recognition and speech synthesis
    - In speech synthesis the attention should move from left to right, but training may not produce this, so it has to be ==enforced==
- Beam Search
    - Always picking the highest score -> greedy decoding, which may not give the best solution
    - It depends on the task: when the answer is well defined (speech recognition has only one correct output), beam search is very useful
    - For sentence completion and TTS (Text to Speech), where many outputs are acceptable, beam search does not help
- Cross entropy vs. BLEU score -> to optimize for BLEU, try reinforcement learning
:::info
- Glossary:
    - Residual connection: widely used in deep learning; the input is added to the output
    - Ground truth: the accuracy of the training set's classification for supervised learning, i.e. the correct labels
    - Cross entropy: measures the error between the predicted probability distribution and the actual one
    - Teacher forcing: feed the ground truth as the input of the decoder
    - Text to Speech (TTS): speech synthesis
    - Exposure bias: during training the decoder always receives correct inputs, so at test time a single mistake leads to many more
:::
++Conclusions:++
![](https://i.imgur.com/mtusi3b.png =350x550)

## 0727
[[Machine Learning 2021] Self-supervised Learning (1): Sesame Street and Attack on Titan](https://www.youtube.com/watch?v=e422eloJ0W4&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=18&ab_channel=Hung-yiLee)
[[Machine Learning 2021] Self-supervised Learning (2): Introduction to BERT](https://www.youtube.com/watch?v=gh0hewYkjgo&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=19&ab_channel=Hung-yiLee)
[[Machine Learning 2021] Self-supervised Learning (3): Curious facts about BERT](https://www.youtube.com/watch?v=ExXA05i8DEQ&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=21&ab_channel=Hung-yiLee)
[[Machine Learning 2021] Self-supervised Learning (4): The ambition of GPT](https://www.youtube.com/watch?v=WY_E0Sd4K80&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=21&ab_channel=Hung-yiLee)
[![bert_v8.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/bert_v8.pdf)
### Self-supervised Learning
#### BERT (Transformer Encoder)
- Input = (Word + Segment + Position) embedding
    - [Implementation details](https://medium.com/@_init_/why-bert-has-3-embedding-layers-and-their-implementation-details-9c261108e28a)
- Pre-train
    1.
       Masked token prediction (SpanBERT studies masking several tokens -> how many to mask is chosen probabilistically)
    2. Next Sentence Prediction (RoBERTa reports that it is not very useful)
        - SOP (Sentence Order Prediction) is more useful -> used in ALBERT
- Fine-tune (how to use BERT)
    - Case 1: Sentiment analysis
        - Input: a sequence; output: yes or no
        - BERT (initialized by pre-training, better than random)
        - Linear transform (random initialization, updated by gradient descent)
        - Train from scratch vs. fine-tune (fine-tune wins)
    - Case 2: POS (part of speech) tagging
        - Input sequence length = output length
    - Case 3: Natural Language Inference
        - Input: two sequences; output: a class (contradiction?)
    - Case 4: Extraction-based Question Answering
        - The output (2 integers: start, end) always lies inside the input (document, query)
        - Two randomly initialized vectors take inner products with the document, followed by softmax, to find the start and end positions
- GLUE scores: human performance is taken as the baseline (1), and the machine is scored relative to it
- Pre-train + fine-tune is a semi-supervised approach
- While learning to fill in blanks (pre-training), the model also learns through ==context== that the same word can have different meanings -> similar to (CBOW) word embedding = **contextualized** word embedding
- Multi-lingual BERT (104 languages in pre-training)
    - The amount of data matters; more is better -> better alignment
    - Embeddings of the same meaning in different languages are close, yet the model still knows the languages differ
- GPT (Predict Next Token)
![](https://i.imgur.com/Lud0n3k.png =600x400)
    - Resembles a Transformer decoder: attention does not look at later inputs
    - GPT models are so large that fine-tuning may not be feasible -> ==In-context Learning==
        - Input: task description, examples, prompt (the question whose answer should be output)
        - Several examples -> few-shot learning
        - Only one example -> one-shot learning
        - No example -> zero-shot learning
- Self-supervised learning can also be applied to images, speech, ...
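
The in-context learning input described above (task description, examples, prompt) can be sketched in plain Python. This is only an illustration: `shot_mode`, `build_prompt`, the `=>` separator, and the translation pairs are hypothetical choices made for this sketch, not part of GPT or of any library.

```python
from typing import List, Tuple


def shot_mode(num_examples: int) -> str:
    """Name the in-context learning setting by the number of examples given."""
    if num_examples == 0:
        return "zero-shot"
    if num_examples == 1:
        return "one-shot"
    return "few-shot"


def build_prompt(task: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Concatenate the task description, the worked examples, and the final prompt.

    The "=>" separator is an arbitrary formatting assumption.
    """
    lines = [task]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is expected to continue from here
    return "\n".join(lines)


# Illustrative data only.
examples = [("sea otter", "loutre de mer"), ("cheese", "fromage")]
prompt = build_prompt("Translate English to French:", examples, "plush giraffe")
mode = shot_mode(len(examples))
```

No parameter is updated in this setting; the examples only change the text the model conditions on.
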
:::info
- Glossary:
    - Downstream Tasks
        - The tasks we care about
        - We have a little labeled data
    - Scratch: initialization without a pre-trained model
:::

## 0810
[[Machine Learning 2021] Generative Adversarial Network (GAN) (1): Basic concepts](https://www.youtube.com/watch?v=4OWp0wDu6Xw&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=15&ab_channel=Hung-yiLee)
[[Machine Learning 2021] Generative Adversarial Network (GAN) (2): Theory and WGAN](https://www.youtube.com/watch?v=jNY1WBb8l4U&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=15&ab_channel=Hung-yiLee)
[[Machine Learning 2021] Generative Adversarial Network (GAN) (3): Generator evaluation and conditional generation](https://www.youtube.com/watch?v=MP0BnVH2yOo&ab_channel=Hung-yiLee)
[[Machine Learning 2021] Generative Adversarial Network (GAN) (4): Cycle GAN](https://www.youtube.com/watch?v=wulqhgnDr7E&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=17&ab_channel=Hung-yiLee)[![gan_v10.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/gan_v10.pdf)
![](https://i.imgur.com/ti4auSJ.png =600x400)
- A network that takes an input and outputs a distribution -> Generator
    - Normal distribution -> Generator -> Complex distribution
- Why do we need a distribution?
    - The same input can have several different outputs (the task needs creativity)
    - e.g. drawing, video prediction
- One more mechanism is needed: the Discriminator
    - Input: image; output: scalar (large: real, small: fake)
    - It tells the generator's results apart from real images, which pushes the generator to update its model
    - This interaction is called adversarial (the generator tries to fool the discriminator)
- Algorithm
    - Step 1. Fix the generator, train the discriminator
    - Step 2. Fix the discriminator, train the generator
    - Repeat steps 1 & 2
    ![](https://i.imgur.com/bK7DDEx.png =600x400)
- How to compute the divergence?
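    - The distributions $P_G$ & $P_{data}$ are unknown, but we can draw samples from them
    - $P_G$: samples produced by the generator; $P_{data}$: samples drawn from the database
    - ![](https://i.imgur.com/Oqww1Qo.png =600x400)
    - With the generator fixed, the discriminator finds the maximum of the objective function
    - The generator is then chosen to make that maximum as small as possible
- Tips for GAN
    - JS divergence: it cannot show how far apart two distributions are; the value is always $\log 2$ when they do not overlap and only reaches 0 when they coincide (so it works poorly)
    - Another solution (Wasserstein distance)
        - Think of a bulldozer moving the earth of P onto Q; there are infinitely many ways to move it
        - Enumerate all moving plans and take the one with the smallest cost
    - WGAN (Spectral Normalization) -> keeps the gradient norm below 1 everywhere
- Evaluation
    - Quality (a single image): the more concentrated the class distribution, the better the quality
    - Diversity (many images): the flatter the averaged class distribution, the higher the diversity
    - Inception Score (IS): good quality, large diversity
    - Also avoid a generator that simply reproduces the real data
- Conditional GAN
    - Input: image + text description; output: image that matches the description -> image translation
    - GAN + supervised learning gives better generation results
- Use GAN in unsupervised settings (everything above is mainly supervised)
    - Cycle GAN
        - Image style transfer 3D -> 2D
        ![](https://i.imgur.com/33YHFqn.png =600x400)
            - Requires 2 generators & 2 discriminators
            - Convert in both directions: 3D -> 2D & 2D -> 3D
            - Goal: the generated 2D image keeps the features of the original 3D image
        - Text style transfer (Seq2Seq)
        ![](https://i.imgur.com/70JBxwE.png =600x400)
            - Turn a negative sentence into a positive sentence
            - Document -> Summary, Language 1 -> Language 2, Audio -> Text
:::info
- Glossary:
    - Mode Collapse: the generator produces only a few distinct images (late in training they look almost identical); it exploits a blind spot of the discriminator and keeps producing the same images that the discriminator rates highly
    - Mode Dropping: diversity looks sufficient and there is no mode collapse, yet the same features can be found in essentially every image
:::

## 0817
[[Machine Learning 2021] Auto-encoder (part 1): Basic concepts](https://www.youtube.com/watch?v=3oHlf8-J3Nc&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=22&ab_channel=Hung-yiLee)
[[Machine Learning 2021] Auto-encoder (part 2): The bow-tie voice changer and more applications](https://www.youtube.com/watch?v=JZvEzb5PV3U&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=23&ab_channel=Hung-yiLee)[![auto_v8.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/auto_v8.pdf)
### Auto-encoder
![](https://i.imgur.com/moi4vmG.png =600x400)
- Also called: dimension reduction
- De-noising Auto-encoder (noise is added to the input; the output must recover the noise-free input)
    - Masking in BERT is a similar idea
- Feature Disentangle (also applicable to speech)
    - Can the code (the vector the encoder produces from the input) be split into the pieces of information it represents?
    - Vector =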
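      Content + Speaker info
    - Application: voice conversion
- Discrete Latent Representation
    - The code can be binary, or even a one-hot vector
    ![](https://i.imgur.com/DAEhnes.png =600x400)
    ![](https://i.imgur.com/78vj7I8.png =600x400)
- More applications
    - Compression (image -> low dimension -> image; lossy)
    - Anomaly detection
        - Decide whether an input resembles the training data
        - Compare the auto-encoder's input & output; the loss between them indicates whether the input is an anomaly

## 0819
[[Machine Learning 2021] Malicious attacks from humans (Adversarial Attack) (part 1): Basic concepts](https://www.youtube.com/watch?v=xGQKhbjrFRk&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=24&ab_channel=Hung-yiLee)
[[Machine Learning 2021] Malicious attacks from humans (Adversarial Attack) (part 2): Can neural networks escape the bottomless malice of humans?](https://www.youtube.com/watch?v=z-Q9ia5H2Ig&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=25&ab_channel=Hung-yiLee)[![attack_v3.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/attack_v3.pdf)
### Adversarial Attack
- How to attack
    - Benign image (the original), attacked image (noise is added that the naked eye cannot see)
    - Non-targeted: any wrong output will do
        - Maximize the cross-entropy between the output and the correct answer -> i.e. minimize the negative cross-entropy
    - Targeted: the output must be wrong and also equal to the chosen target
    ![](https://i.imgur.com/6EypG7G.png =600x400)
    ![](https://i.imgur.com/df6gB3h.png =600x400)
    - Design a noise that humans cannot see but that makes a big difference to the machine
        - The smaller the L-infinity distance, the less humans notice
    - Training resembles gradient descent that minimizes the loss
        - Update the input rather than the parameters
        - When the difference becomes large enough to be visible, pull it back into the square where it is invisible
        ![](https://i.imgur.com/zO5Zvzt.png =200x200)
    - FGSM (Fast Gradient Sign Method)
        - Only one update
        - Apply the sign function: every gradient component becomes +1 or -1
        - The result always lands on one of the four corners of the L-infinity box
- White Box Attack (the network parameters must be known to attack)
- Black Box Attack
    - Train a proxy network on the same training set
        - The black-box network & the proxy network are probably similar
        - Attacking the proxy network can achieve the same effect
    - Ensemble Attack (attack with an example that fools several networks)
    - Main reason -> the features in the data are probably much the same across datasets
- Other kinds of data are affected too
    - Speech processing, NLP
- Backdoor in model (the attack starts at the training stage)
### Defense
- Passive
    - Add a filter to block the noisy signal
        - Smoothing -> has side effects
        - Image compression, generator
        - Randomization -> alter the image randomly so the attack does not know what to target
- Proactive
    - Adversarial Training (like data augmentation)
        -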
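          During training, attack your own model to produce a set of examples that get wrong labels
        - Then give these mislabeled examples their correct labels again and train on them
        - Demanding for a large dataset -> the amount of data doubles
- Conclusion
![](https://i.imgur.com/SfF6NtL.png =550x180)

## 0825
[[Machine Learning 2021] Explainable ML (part 1): Why can a neural network correctly tell Pokémon from Digimon?](https://www.youtube.com/watch?v=WQY85vaQfTI&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=27&ab_channel=Hung-yiLee)
[[Machine Learning 2021] Explainable ML (part 2): What does a cat look like in the machine's mind?](https://www.youtube.com/watch?v=0ayIPqbdHYQ&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=27&ab_channel=Hung-yiLee)[![xai_v4.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/xai_v4.pdf)
### Explainable ML
- Local Explanation -> why does an image that looks like this count as a cat?
![](https://i.imgur.com/Gh5Ks4f.jpg =600x400)
    - A saliency map reveals what the machine bases its decision on
        - Make sure the machine decides correctly because it saw the key information (e.g. it should not recognize a water buffalo because of the water plants)
    - Smooth Gradient -> add random noise to the image and average the results; the saliency map becomes clearer
    - Probing
        - Train a classifier on one layer of the network and use its accuracy to judge whether that layer contains the information we want to classify (beware: low accuracy may come from poor training or badly chosen hyperparameters)
        - e.g. put a POS classifier on a layer to check whether the layer encodes part-of-speech information
        - e.g. check whether a layer has removed the speaker characteristics and kept only the speech content
- Global Explanation -> what does a cat look like?
![](https://i.imgur.com/KLy6Tqn.png =600x400)

## 0901
[[Machine Learning 2021] An overview of Domain Adaptation](https://www.youtube.com/watch?v=Mnk_oUrgppM&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=30&ab_channel=Hung-yiLee)[![da_v6.pdf](https://i.imgur.com/TfnlIqI.png)](https://speech.ee.ntu.edu.tw/~hylee/ml/ml2021-course-data/da_v6.pdf)
### Domain Adaptation
- Domain Shift: if the training set and the test set have different data distributions, the results may be poor
- When to use: a large amount of (unlabeled) target data is available
    - The feature extractor is expected to find a distribution shared by the two datasets
    ![](https://i.imgur.com/KdpWgH6.png =600x400)
    - The domain classifier is a binary classifier
    - Source = labeled data, Target = unlabeled data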
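
The roles of the feature extractor and the binary domain classifier can be sketched numerically. This is a minimal sketch under assumptions that the notes do not state: the domain classifier is scored with binary cross-entropy, and the feature extractor minimizes `label_loss - lam * domain_loss` (a common domain-adversarial formulation). The function names and the probabilities below are hypothetical.

```python
import math
from typing import List


def binary_cross_entropy(probs: List[float], labels: List[int]) -> float:
    """Mean binary cross-entropy; probs are the predicted P(domain = source)."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def feature_extractor_objective(label_loss: float, domain_loss: float,
                                lam: float = 1.0) -> float:
    """Assumed objective: classify labels well while confusing the domain classifier."""
    return label_loss - lam * domain_loss


# Domain labels: 1 = source (labeled data), 0 = target (unlabeled data).
domain_labels = [1, 1, 0, 0]
separable = [0.9, 0.8, 0.1, 0.2]  # the classifier tells the two domains apart
aligned = [0.5, 0.5, 0.5, 0.5]    # the classifier can only guess

loss_separable = binary_cross_entropy(separable, domain_labels)
loss_aligned = binary_cross_entropy(aligned, domain_labels)
```

When the two feature distributions match, the domain classifier can only guess and its loss rises to $\log 2$, which lowers the feature extractor's objective in this formulation.
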
