EastGeno
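Deep Learning with Differential Privacy (1)
===

### Introduction

Neural networks need large amounts of training data, and that data often contains sensitive information. Suppose you want to train a classifier and part of the data is proprietary to your company: you do not want it to leak, but the model only performs well if it is included in training.

You might think that training only turns data into gradients, so even if you release the model weights, nobody could learn your training data from them. In security, however, any deterministic function, one whose output is fixed once the input is fixed, can (in theory) be attacked. In the paper Model inversion attacks that exploit confidence information and basic countermeasures, the authors recover training data from a face-recognition network. They do not even need the network itself: a model that is only exposed as a cloud service can also be attacked successfully.

The paper discussed here addresses the recovery of sensitive training data and proposes a new way to account for privacy leakage. To do so it uses differential privacy, a technique common in cryptography. Differential privacy adds randomness to statistics computed from the data, so that no single record of the original dataset can be easily inferred.

### Differential Privacy (DP)

As mentioned above, randomness is added to a statistic. In this paper the statistic is the gradient: put simply, the average gradient over one batch is our statistic, so that is what we need to process.

First, the definition of DP. We have a randomized function *M*, a dataset *D*, and a neighboring dataset *D'* that differs from *D* in at most one record. *S* is a set of possible outputs of *M*, $\epsilon$ measures how much privacy is leaked, and $\delta$ is the probability that the guarantee of *M* fails. When $\epsilon$ lies between 0 and 1 and $\delta$ is close to 0, the output of *M* does not change much no matter which single record differs.

![](https://i.imgur.com/KIyRk9O.png)

![](https://i.imgur.com/CN6ji2w.png)

If DP is still confusing, here is an example of its core idea. Xiao Ming smokes and has cancer, and he takes part in a medical survey on how genes, smoking and other factors relate to cancer. If the survey concludes that people with a certain gene who smoke will get cancer, then the whole world can learn that he has cancer, which leaks his private information. If the conclusion is instead stated as "smoking causes cancer", there is no leak, because the result is now a more general statement. (The assumption here is that whether a person has cancer is private.)

Now let us look at what *M* actually is. Because gradients are numeric, we use the Gaussian mechanism, which simply adds Gaussian noise to the original statistic $f(d)$ (here, the gradient). $S_f$ is the sensitivity of the function, the largest possible distance (L2 norm) between $f(d)$ and $f(d')$. *N* is the Gaussian noise, and $\sigma$ is the noise multiplier, which determines the privacy cost $\epsilon$ of each step.

![](https://i.imgur.com/34p5CbO.png)

In plainer terms: Gaussian noise has mean 0, so its density is highest at 0 compared with any other value, and as the standard deviation grows, the probability of drawing a value near 0 decreases. This is the property DP needs. How Gaussian noise formally satisfies the DP definition is not covered here; if you are interested, see the privacy book mentioned under Reference.

### Differentially Private SGD (DP-SGD)

Before looking at the algorithm itself, let us go through a few mechanisms.

- Norm clipping: this limits the gradient to a fixed range. Readers familiar with deep learning may wonder why it deserves special mention. Besides the usual purpose of avoiding exploding gradients, we saw above that the standard deviation of the noise is proportional to the sensitivity. Clipping bounds the sensitivity and so keeps the noise from growing large enough to hurt training.
- Per-layer and time-dependent parameters: for a multi-layer network, each layer is treated separately (even though all parameters are written as $\theta$), so each layer can have its own clipping bound and $\sigma$.
- Lots: a lot is similar to a batch. It fixes how many examples' gradients are gathered for one gradient-descent step, and it is usually a multiple of the batch size, so that gradients can be computed batch by batch and the update applied once after all of them are collected. If you set the batch size to 1 in an ordinary framework, the resulting gradient can hardly be called a statistic of the dataset and deviates a lot.

![](https://i.imgur.com/xF3638X.png)

### Moments Accountant

Another question in DP is how to compute your ![](https://i.imgur.com/biIWy6G.png =20x20) (privacy loss). It represents how much privacy you have leaked, and it must be recomputed every time the original data is accessed. Here ![](https://i.imgur.com/S1YAzK1.png =250x30), meaning that each gradient computation depends on the result of the previous gradient-descent step, so your ![](https://i.imgur.com/biIWy6G.png =20x20) accumulates.

The most direct method is to sum the ![](https://i.imgur.com/biIWy6G.png =20x20) of every ![](https://i.imgur.com/StL6vod.png =25x20). A better method takes into account that we keep processing the same dataset, so the gradients of some records are bound to be computed more than once. In that case we can reuse past information directly, without accessing the data again.

The moments accountant proposed in the paper makes good use of this past information, and the paper proves through a series of mathematical steps how to compute ![](https://i.imgur.com/biIWy6G.png =20x20) with it.

First we need to know what a moment is. A moment is a property that describes the shape of data. For a set of height measurements, the first moment is the expected value, the second moment can be the variance or simply the expected value of the squared height, and the third moment can be the skewness, and so on.

![](https://i.imgur.com/BKJRfug.png)

Moments describe properties of the data, so can we describe the whole distribution with a single object? That is why the *MGF* (moment-generating function) exists. It is used here for two reasons: it works with Markov's inequality, and its Taylor expansion is a sum over the moments.

![](https://i.imgur.com/WyiNMiE.png)

Why do we need moments here? Because we treat ![](https://i.imgur.com/biIWy6G.png =20x20) as a random variable and try to find the probability ![](https://i.imgur.com/G5gJGTq.png =20x20) that ![](https://i.imgur.com/biIWy6G.png =20x20) falls outside a given range. We use Markov's inequality to find an upper bound on the probability that it exceeds a given value, and we try different moments to find the smallest such bound.

With the overview done, let us go through the steps again.

- step 1: treat the difference between the two distributions as a random variable
![](https://i.imgur.com/ERZtmrF.png)
- step 2: define the ![](https://i.imgur.com/q5aiHOO.png =20x20)-th moment, which is computed with the *MGF*
![](https://i.imgur.com/Nb45B4W.png)
- step 3: find the largest moment at each order (for the second expression, see the detailed proof in the paper)
![](https://i.imgur.com/VlzmTZQ.png)
![](https://i.imgur.com/AcbxVjc.png)
- step 4: add up the moments of every gradient-descent step
![](https://i.imgur.com/vKqvD3I.png)
- step 5: finally, use Markov's inequality: fix one of ![](https://i.imgur.com/biIWy6G.png =20x20) and ![](https://i.imgur.com/G5gJGTq.png =20x20), and compute the other
![](https://i.imgur.com/oU1ouWp.png)

### Conclusion

With the protection of DP, models can be released safely. Comparing strong composition, the previously common way to compute ![](https://i.imgur.com/biIWy6G.png =20x20), with the moments accountant, we can see that the privacy loss is greatly reduced.

![](https://i.imgur.com/8biMiIZ.png)

Accuracy does drop noticeably, but this is one of the founding papers combining DP and deep learning, so let us look forward to the papers that follow.

![](https://i.imgur.com/IG8jkSx.png)

### Reference

- https://arxiv.org/pdf/1607.00133.pdf
- https://www.researchgate.net/figure/A-graphical-depiction-of-the-truncated-Gauss-mechanism-on-two-inputs-A-N-0-s-2-B_fig1_332893160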
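
### Appendix: a DP-SGD step in code

The norm clipping and Gaussian noise steps from the DP-SGD section can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name `dp_sgd_step` and its parameters (`clip_norm` for the clipping bound, `sigma` for the noise multiplier) are hypothetical, a single clipping bound is used for all parameters rather than per-layer bounds, and privacy accounting is left out entirely.

```python
import numpy as np

def dp_sgd_step(theta, per_example_grads, lr, clip_norm, sigma, rng):
    """One noisy gradient step on a lot of per-example gradients (shape: L x d).

    All names here are illustrative assumptions, not taken from the paper's code.
    """
    lot_size = per_example_grads.shape[0]

    # Norm clipping: rescale each example's gradient so its L2 norm
    # is at most clip_norm. Gradients already inside the bound are unchanged.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)

    # Gaussian mechanism: clip_norm bounds the contribution of any single
    # example to the sum, so the noise standard deviation is sigma * clip_norm.
    noise = rng.normal(0.0, sigma * clip_norm, size=theta.shape)

    # Average the noisy sum over the lot and take a gradient-descent step.
    noisy_mean = (clipped.sum(axis=0) + noise) / lot_size
    return theta - lr * noisy_mean


# Usage: two per-example gradients, the first one far outside the clipping bound.
rng = np.random.default_rng(0)
theta = np.zeros(2)
grads = np.array([[3.0, 4.0], [0.0, 0.5]])
theta_next = dp_sgd_step(theta, grads, lr=1.0, clip_norm=1.0, sigma=1.0, rng=rng)
```

Clipping each example before summing is what makes the sensitivity known in advance, which in turn fixes how much noise is needed. Setting `sigma=0.0` turns the step into ordinary SGD with per-example clipping, which is a convenient way to check the clipping logic on its own.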
