Mathematics in Machine Learning
==
###### tags: `Mathematics`

In the course of learning machine learning, the mathematics you encounter most is concentrated in two areas: linear algebra and probability & statistics.

Parts of these two fields are material you would already meet as an undergraduate, but other parts go deeper. For example, some of the techniques used when discussing the VC dimension lean more toward graduate-level mathematical statistics. Even so, while you may not be able to prove every property rigorously as you learn, understanding them intuitively is not really that difficult.

Below I try to organize the pieces I have run into, both as an introduction for beginners and as a reference I can come back to and review.

## Linear Algebra

In machine learning computations we constantly vectorize the data at hand and carry out all vector operations with matrices; this is where vectors and matrices become connected.

The connection goes beyond computation. We also ask the reverse question: "Can properties of a matrix tell us something about the vectors (the data)?" As it turns out, we do exploit many properties from linear algebra to solve machine learning problems.

* **Matrix operations**

$\vec{A}=(a_1,a_2,\ldots,a_n)$
$\vec{B}=(b_1,b_2,\ldots,b_n)$
$\vec{A}\cdot\vec{B}=a_1b_1+a_2b_2+\cdots+a_nb_n=\begin{bmatrix}a_1&a_2&\cdots&a_n\end{bmatrix}\begin{bmatrix}b_1\\b_2\\\vdots\\b_n\end{bmatrix}=A^TB$

* **Pseudo-inverse**

When we look for an analytic solution in an optimization problem and the matrix involved is not invertible (no inverse exists), we can use the pseudo-inverse (generalized inverse) instead. It is the unique matrix satisfying the four Moore-Penrose conditions below, written with the conjugate transpose $(\cdot)^*$, which for real matrices is just the ordinary transpose. (A numerical check appears in the sketch after this list.)

$\forall A\in\mathbb{C}^{m\times n}\ ,\ \exists\,!\,A^\dagger$ s.t. :
1. $AA^{\dagger}A=A$
2. $A^{\dagger}AA^{\dagger}=A^{\dagger}$
3. $(A^\dagger A)^*=A^\dagger A$
4. $(AA^\dagger)^*=AA^\dagger$

* **Matrix differentiation**

As noted above, we also use matrix differentiation in machine learning to find optimal solutions; it works much like ordinary differentiation of functions.
(In what follows, $\boldsymbol{A},\boldsymbol{B},\boldsymbol{C}$ are constant matrices and $\boldsymbol{a},\boldsymbol{b},\boldsymbol{c}$ are constant vectors, all independent of $\boldsymbol{X}$ and $\mathbf{x}$.)

#### Derivatives of linear forms

$\frac{d(\mathbf{x}^T\boldsymbol{A})}{d\mathbf{x}}=\boldsymbol{A}$

$\frac{d(\boldsymbol{a}^T\boldsymbol{X}\boldsymbol{b})}{d\boldsymbol{X}}=\boldsymbol{a}\boldsymbol{b}^T$, and the same form applies when $\boldsymbol{a},\boldsymbol{b}$ are replaced by matrices $\boldsymbol{A},\boldsymbol{B}$

#### Derivatives of quadratic forms

$\frac{d(\mathbf{x}^T\boldsymbol{A}\mathbf{x})}{d\mathbf{x}}=(\boldsymbol{A}+\boldsymbol{A}^T)\mathbf{x}$, which reduces to $2\boldsymbol{A}\mathbf{x}$ when $\boldsymbol{A}$ is symmetric

$\frac{d\left((\boldsymbol{A}\mathbf{x}+\boldsymbol{b})^T\boldsymbol{C}(\boldsymbol{D}\mathbf{x}+\boldsymbol{e})\right)}{d\mathbf{x}}=\boldsymbol{A}^T\boldsymbol{C}(\boldsymbol{D}\mathbf{x}+\boldsymbol{e})+\boldsymbol{D}^T\boldsymbol{C}^T(\boldsymbol{A}\mathbf{x}+\boldsymbol{b})$

For a much more thorough treatment of matrix derivatives, see: [矩陣導數 -- 線代啟示錄](https://ccjou.wordpress.com/2013/05/31/%E7%9F%A9%E9%99%A3%E5%B0%8E%E6%95%B8/)

* **Projection matrix**

In regression analysis the projection matrix plays a very important role, especially in theoretical derivations: the fitted values can be viewed as the result of projecting the observed values through the projection matrix (see: [林軒田機器學習基石筆記 - 第九講、第十講](https://hackmd.io/s/rJ7zrsB5V)).

The projection matrix $\boldsymbol{H}$ has the following properties (also checked numerically in the sketch after this list):
1. $\boldsymbol{H}$ is symmetric
2. $\boldsymbol{H}^k=\boldsymbol{H}$
3. $(\boldsymbol{I}-\boldsymbol{H})^k=\boldsymbol{I}-\boldsymbol{H}$
4. $tr(\boldsymbol{H})=d+1$, where $d+1$ is the number of linearly independent columns of the design matrix

* **Space transformation**

$\phi : X\longrightarrow Z$ via $\phi(\mathbf{x})=\mathbf{z}$, where $\mathbf{x}=(x_1,x_2,\ldots,x_d)\in X$ and $\mathbf{z}=(z_1,z_2,\ldots,z_{\tilde{d}})\in Z$

The idea of transforming between spaces is widely used in many settings, not just machine learning: when we run into trouble in the current space, we try a linear or nonlinear transformation into another space and then attack the problem there with methods we are familiar with.

In machine learning this is how we deal with data that are not linearly separable: transform them (for example with a polynomial transform) into a space where they become linearly separable, then apply a linear model. The well-known support vector machine (SVM) handles nonlinear problems with a similar idea.

* **Eigenvalues & eigenvectors**

Let $\mathbf{A}$ be an $n\times n$ matrix. If a nonzero vector $\mathbf{x}$ and a scalar $\lambda$ satisfy the eigenvalue equation $\mathbf{Ax}=\lambda\mathbf{x}$, we call $\mathbf{x}$ an eigenvector of $\mathbf{A}$ and $\lambda$ the corresponding eigenvalue.

When searching for optimal weights in deep learning, the stacking of many weighted layers means the final equations can be of degree four or higher, so no closed-form solution can be found; eigen-decomposition can still be used to analyze the problem and search for a good solution.
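The pseudo-inverse and projection-matrix properties above are easy to verify numerically. Below is a minimal numpy sketch (the matrix sizes and random data are arbitrary choices for illustration): it checks the four Moore-Penrose conditions for `np.linalg.pinv`, then builds the hat matrix $H = XX^\dagger$ of a small least-squares problem and confirms it is symmetric, idempotent, and has trace $d+1$.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Pseudo-inverse: verify the four Moore-Penrose conditions ---
A = rng.standard_normal((5, 3))                  # an arbitrary 5x3 real matrix
A_pinv = np.linalg.pinv(A)                       # Moore-Penrose pseudo-inverse

print(np.allclose(A @ A_pinv @ A, A))            # 1. A A+ A = A
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))  # 2. A+ A A+ = A+
print(np.allclose((A_pinv @ A).T, A_pinv @ A))   # 3. (A+ A)^T = A+ A
print(np.allclose((A @ A_pinv).T, A @ A_pinv))   # 4. (A A+)^T = A A+

# --- Projection (hat) matrix of a small least-squares problem ---
N, d = 50, 3
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])  # intercept + d features
y = rng.standard_normal(N)

w = np.linalg.pinv(X) @ y        # closed-form least-squares weights
H = X @ np.linalg.pinv(X)        # hat matrix: y_hat = H y

print(np.allclose(H, H.T))               # H is symmetric
print(np.allclose(H @ H, H))             # H is idempotent, hence H^k = H
print(np.isclose(np.trace(H), d + 1))    # tr(H) = d + 1
print(np.allclose(X @ w, H @ y))         # fitted values are a projection of y
```

Every printed check should come out `True`; the idempotence of $I-H$ follows directly from that of $H$.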
## Probability

* **Joint probability**

$P(X=x_i,Y=y_j)=\frac{n_{ij}}{N}\Longrightarrow P(X=x_i)=\sum\limits_{\forall j}P(X=x_i,Y=y_j)$

Here $n_{ij}$ is the number of observations with $X=x_i$ and $Y=y_j$ out of $N$ in total; summing the joint probability over all $j$ recovers the marginal probability $P(X=x_i)$.

* **Conditional probability**

$P(Y=y_j\mid X=x_i)$ : the probability that $Y=y_j$ occurs given that $X=x_i$ is known to have occurred

$\overset{defn}{\Longrightarrow}P(Y=y_j\mid X=x_i)=\frac{P(X=x_i\cap Y=y_j)}{P(X=x_i)}$

$\Longrightarrow P(X=x_i\cap Y=y_j)=P(X=x_i,Y=y_j)=P(Y=y_j\mid X=x_i)\cdot P(X=x_i)$

* **Bayes' theorem**

$P(Y\mid X)=\frac{P(X\mid Y)\cdot P(Y)}{P(X)}$

* **Hoeffding's inequality**

Suppose an event occurs in the population with probability $\mu$, and we draw a sample of size $N$ in which the event occurs with frequency $\nu$. Then the probability that $\mu$ and $\nu$ differ by more than $\epsilon$ is at most $2e^{-2\epsilon^2N}$ (a short simulation checking this bound appears after the Calculus section below):

$$\mathbb{P}(|\mu-\nu|>\epsilon) \leq 2e^{-2\epsilon^2N}$$

## Calculus

* **Gradient**

Suppose $f(x_1,x_2,\ldots,x_n)$ describes a surface over $\mathbb{R}^n$. We can then compute the partial derivative in each direction: $\frac{\partial f}{\partial x_i}$, evaluated at a point $x$, is the slope of the tangent to $f$ in the $x_i$ direction at that point, i.e. how steeply $f$ tilts along the $x_i$ axis there. The sign of the partial derivative also tells us, along each axis, in which direction $f$ increases at that point.

Collecting the partial derivatives in every direction gives the gradient:

$\nabla f(x_1,x_2,\ldots,x_n) =\left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\ldots,\frac{\partial f}{\partial x_n}\right)$

If we want to find the extrema of $f(x_1,x_2,\ldots,x_n)$, the simplest approach is to set $\nabla f(x_1,x_2,\ldots,x_n)=0$.

In machine learning we very often optimize with the gradient. We may not be able to find the point where the gradient is exactly zero, but by repeatedly taking gradient descent steps we iterate toward better and better solutions; after enough iterations, or once the gradient really does reach zero, we arrive at a (locally) best solution. (See: [梯度下降法 Gradient Descent](https://hackmd.io/s/ryQypiDK4); a minimal numerical sketch also follows after this section.)

* **Lagrange multipliers**

Without constraints, minimizing a function basically means setting its derivative to zero and solving. But if $f(x_1,x_2,\ldots,x_n)$ must satisfy the constraints

$\begin{cases}g_1(x_1,x_2,\ldots,x_n)=0\\g_2(x_1,x_2,\ldots,x_n)=0\\\vdots\\g_m(x_1,x_2,\ldots,x_n)=0\end{cases}$

then we use the method of Lagrange multipliers: define $h(x_1,x_2,\ldots,x_n)=f+\sum\limits_{\forall m}\lambda_mg_m$, take the gradient of $h$ and set it to zero to obtain relations between $x_1,x_2,\ldots,x_n$ and the $\lambda_m$, then substitute back into the constraints to solve for the optimum. (It is not hard to show that the optimum is not displaced by the Lagrange-multiplier construction.)

Inequality constraints can be handled as well: if $f(x_1,x_2,\ldots,x_n)$ must satisfy

$\begin{cases}g_1(x_1,x_2,\ldots,x_n)\leq0\\g_2(x_1,x_2,\ldots,x_n)\leq0\\\vdots\\g_m(x_1,x_2,\ldots,x_n)\leq0\end{cases}$

we can again use Lagrange multipliers and define $h(x_1,x_2,\ldots,x_n)=f+\sum\limits_{\forall m}\lambda_mg_m$. The point to note is that with inequality constraints the solution is generally not in closed form; many optimization algorithms can be used to find the optimum, and that optimum must satisfy the Karush-Kuhn-Tucker (KKT) conditions. (See: [Constrained Optimization Using Lagrange Multipliers](http://people.duke.edu/~hpgavin/cee201/LagrangeMultipliers.pdf))
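Here is the minimal gradient descent sketch promised above. It minimizes a least-squares objective $f(\mathbf{w})=\lVert X\mathbf{w}-\mathbf{y}\rVert^2$, whose gradient $2X^T(X\mathbf{w}-\mathbf{y})$ follows from the matrix-differentiation rules in the linear algebra section, and compares the result with the closed-form pseudo-inverse solution. The synthetic data, learning rate, and iteration count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# A small synthetic least-squares problem: f(w) = ||Xw - y||^2
N, d = 100, 3
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, d))])   # intercept + d features
w_true = rng.standard_normal(d + 1)
y = X @ w_true + 0.1 * rng.standard_normal(N)

def grad(w):
    # By the quadratic-form rule above: grad ||Xw - y||^2 = 2 X^T (Xw - y)
    return 2 * X.T @ (X @ w - y)

w = np.zeros(d + 1)
lr = 1e-3                        # learning rate (arbitrary, small enough to converge here)
for _ in range(5000):            # plain gradient descent iterations
    w -= lr * grad(w)

w_closed = np.linalg.pinv(X) @ y     # closed-form solution via the pseudo-inverse

print(np.round(w, 4))
print(np.round(w_closed, 4))         # the two solutions should agree closely
```

After enough iterations the iterate essentially matches the analytic solution, which is exactly the "iterate toward the point where the gradient vanishes" picture described above.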
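Separately, the Hoeffding bound from the probability section can be checked empirically. This is just a quick simulation sketch; the values of $\mu$, $N$, and $\epsilon$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

mu, N, eps = 0.4, 200, 0.05   # population probability, sample size, tolerance (arbitrary)
trials = 100_000              # number of repeated samples to draw

# Each trial: draw a sample of size N and record the sample frequency nu
nu = rng.binomial(N, mu, size=trials) / N

empirical = np.mean(np.abs(mu - nu) > eps)   # observed rate of |mu - nu| > eps
bound = 2 * np.exp(-2 * eps**2 * N)          # Hoeffding bound 2 e^{-2 eps^2 N}

print(f"empirical rate  = {empirical:.4f}")
print(f"Hoeffding bound = {bound:.4f}")      # the empirical rate should stay below the bound
```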
## Quadratic Programming

When optimizing an SVM we run into a difficulty: solving a constrained error function by hand is no easy task, and gradient descent under constraints is just as hard. However, the error function we end up with is quadratic and all of its constraints are linear inequalities, so we can first apply Lagrange multipliers to obtain $h$, optimize in the dual space with a Quadratic Programming solver, and then map the result back to the original space to obtain the optimal solution.

---

#### Supplement: why we take logs in statistics

The logarithm is a monotonic function, so taking logs of the data does not change the relative relationships between the values. The main reasons for the log transform are:

**1. It shrinks the range of the data, which makes computation easier.**
Data that originally range from 1 to 1000 span roughly 0 to 7 after taking logs, which is much more convenient for all kinds of calculations.

**2. It turns multiplication into addition.**
In other words, taking logs can convert a complicated, nonlinear (multiplicative) model into a linear one.

**3. It makes the relationships between variables easier to see.**

![](https://i.imgur.com/lXBu97t.png)

The upper-left panel shows the relationship between GDP and population across countries; after the log transform the two variables display a clear linear relationship. Put another way, taking logs can turn heavily skewed data into something close to a normal distribution.

**4. In economics, taking logs often brings the model closer to reality; without the log transform, the model can deviate from what is actually observed.**

---

References:

1. 林軒田 Hsuan-Tien Lin, [機器學習基石 Machine Learning Foundations](https://www.youtube.com/watch?v=nQvpFSMPhr0&list=PLXVfgk9fNX2I7tB6oIINGBmW50rrmFTqf)
2. 林軒田 Hsuan-Tien Lin, [機器學習技法 Machine Learning Techniques](https://www.youtube.com/watch?v=A-GxGCCAIrg&list=PLXVfgk9fNX2IQOYPmqjqWsNUFl2kpk1U2)
3. [from Probability Theory to Softmax Function, and then Cross Entropy](https://medium.com/@CecileLiu/from-probability-theory-to-softmax-function-775e3c631ede)
4. [矩阵微分](http://www.cnblogs.com/xuxm2007/p/3332035.html)
5. 王富祥、游雪玲 (2017). 七把刀弄懂微積分, 3rd ed.
6. 王富祥、游雪玲 (2019). 線性代數的天龍八步, 3rd ed.
7. 線代啟示錄, [Karush-Kuhn-Tucker (KKT) 條件](https://ccjou.wordpress.com/2017/02/07/karush-kuhn-tucker-kkt-%E6%A2%9D%E4%BB%B6/)
