Kien Nam Liew
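# Team14 Final Project Report

## Q1. Model design & Concept

### **Rule-based Method**

* **Design rationale for the pun location selection mechanism:**
  "a humorous use of a word or phrase that has several meanings or that sounds like another word" is the [dictionary](https://dictionary.cambridge.org/zht/%E8%A9%9E%E5%85%B8/%E8%8B%B1%E8%AA%9E/pun) definition of a pun. It tells us that some puns carry several meanings within a single sentence. Our mechanism therefore first removes words that do not have multiple meanings from the candidate list.

  Next, we follow Huang et al., who in "[Identification of Homographic Pun Location for Pun Understanding](https://dl.acm.org/doi/10.1145/3041021.3054257)" propose using position information as a framework for identifying puns. On the test data we computed the relative pun position (pun position / sentence length), which averages about 0.8. We adopt this idea and exclude words in the front portion of the sentence from the candidate list.

  Finally, we look at how the pun affects the meaning of the sentence. We replace each candidate word with its synonyms and compare how the meaning of the whole sentence changes. When the sentence after substitution differs in meaning from the original, we consider the candidate likely to be the actual pun. Conversely, when the two sentences show no semantic change, the candidate is less likely to be chosen as the final answer.

* **Pun location selection mechanism:**
  1. From the sentence, build the list of candidate words most likely to be the correct answer:
     * (a) Exclude punctuation and stop words.
     * (b) Exclude words with fewer than two synonyms (looked up with WordNet).
     * (c) Exclude words before a given position.
  2. Once the candidate list is built, we find the synonym set of each word in the list. We substitute each word's synonyms into the sentence and compute the meaning of the substituted sentence (the average of the vectors of all words in the sentence), then compute the cosine similarity between the substituted sentence and the original sentence. The candidate with the lowest similarity is selected as the answer.
  3. Worked example:
     ```
     Original sentence (text_id=hom_625): OLD PROGRAMMERS never die they just go to bits .
     Candidate list: ['go', 'bits']
     Synonym set of go: ['blend_in', 'go_bad', 'hold_up', 'live_on', 'fling', … , 'move', … , 'exit', 'go', 'whirl', 'function', 'sound', 'pass', 'endure', 'decease', 'Adam', 'run', 'offer']
     Sentence after synonym substitution: OLD PROGRAMMERS never die they just move to bits .
     Similarity between the original and the substituted sentence: 0.973536915106364
     Synonym set of bits: ['morsel', 'moment', 'second', 'snatch', 'flake', 'routine', 'prick', 'mo', 'turn', 'sting', … , 'number']
     Sentence after synonym substitution: OLD PROGRAMMERS never die they just go to number .
     Similarity between the original and the substituted sentence: 0.929340444103404
     -> bits has the lower similarity (its substitution changes the meaning of the sentence), so it is selected as the pun.
     ```

### **Deep Learning Approach**

* Architecture:
  ![](https://i.imgur.com/LzEBUJB.png)
  1. This architecture was designed by [Zou and Lu, 2019](https://arxiv.org/abs/1909.00175).
  2. The main idea is to embed characters and words separately, label the words with BPA tags (Before, Pun, After), feed everything into a bidirectional LSTM, and let a CRF layer predict the tag sequence.

## Q2. What kind of word sense representation did you use and experiment with in your model

### **Rule-based Method**

* Word Sense Representation: the GoogleNews-vectors-negative300.bin word vectors provided by Google.
* Experiment to find the best breakpoint position:

  | **Breakpoint** | **Proportion of correct answers in the candidate list** | **Total candidates / total words** |
  | -------- | -------- | -------- |
  | 0.1 (keep the last 90% of the words) | 0.9556 | 0.3885 |
  | 0.3 (keep the last 70% of the words) | 0.9393 | 0.2896 |
  | 0.35 (keep the last 65% of the words) | 0.93 | 0.2696 |
  | 0.5 (keep the last 50% of the words) | 0.8818 | 0.1962 |
  | 0.7 (keep the last 30% of the words) | 0.8017 | 0.1323 |

  We want the proportion of correct answers in the candidate list to be as high as possible, and the ratio of candidates to total words to be as low as possible (meaning more options are excluded). The data show that the earlier the breakpoint, the more words are kept and the higher the proportion of correct answers in the candidate list. In exchange, fewer words are excluded, which raises the chance of picking a wrong one.
  -> Weighing correctness against the number of excluded words, we chose 0.3 as an acceptable breakpoint.

* **Supplement (another rule-based experiment)**
  This experiment also uses the GoogleNews-vectors-negative300.bin word vectors provided by Google. After removing stop words and words missing from the embedding vocabulary, we compute the pairwise similarity between the remaining words to make the final choice. No synonym substitution is used here.

### **Deep Learning Approach**

According to Zou's description, this architecture does not take word sense into account. It uses character embeddings, pre-trained word embeddings, and position indicators as model input, and uses an LSTM for sequence labeling.

## Q3. What problems did you face during the homework and how did you solve them

### **Deep Learning Approach**

1. The first problem was library versions. The paper is already three years old and the source code still used PyTorch 0.4, so updating all the syntax to the current version took considerable time.
2. Later, at inference time, we found that the author provides no API and that the model itself is hard to use directly for prediction. We had to modify the model and build an API that fits our needs.

## Q4. Error Analysis and Discussion

### **Deep Learning Approach**

The K in K-fold validation has a very large effect on the results: choosing the right K can change performance by more than 20%. The choice of optimizer also matters. The original authors recommend SGD, but in our tests SGD with K = 5 and Epoch = 50 reached only about 65% F1 score, whereas the Adam optimizer with the same settings reached about 75%.

## Q5. Compare and implement unsupervised method and supervised method

### **Rule-based Method**

The final Mean F1-Score of this method on Kaggle was 0.56875, a sizeable gap from the score of the deep learning method (0.75625). This suggests that, when relying on this selection mechanism alone, many other factors still affect the final choice. Related work has proposed other pun location frameworks worth consulting, and future work could combine different ideas and test whether they improve the result.
(The experiment without synonym substitution performed worse, with a final score of about 0.42, which shows that synonym substitution makes a considerable difference.)

* **Possible improvement**
  One option is to find the different senses each word can express and then compare them.

### **Deep Learning Approach**

This is a supervised learning method, so the model needs ground truth to produce results. Compared with the unsupervised method it is less convenient on the input side, but it gives better performance. When ground truth is available, we recommend supervised learning.

## References

1. Yu-Hsiang Huang, Hen-Hsen Huang, Hsin-Hsi Chen. "Identification of Homographic Pun Location for Pun Understanding"
2. 刁宇峰, 杨亮, 林鸿飞, 吴迪, 樊小超, 徐博, 许侃. "基于潜在语义特性的语义双关语检测及双关词定位" (Semantic pun detection and pun word location based on latent semantic features)
3. Yanyan Zou, Wei Lu. "Joint Detection and Location of English Puns"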
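
### Sketch: rule-based pun location selection

The rule-based selection steps from Q1 (candidate filtering, synonym substitution, cosine similarity) can be sketched roughly as follows. This is a minimal illustration, not our submitted code. The names `mean_vector`, `cosine`, `candidate_indices`, and `locate_pun` are hypothetical, and the tiny 2-dimensional `toy_emb` table, `toy_synonyms`, and `toy_stop_words` stand in for the GoogleNews vectors, the WordNet lookup, and a real stop word list. Taking the minimum similarity over all synonyms of a candidate, and comparing `index / length` against the breakpoint, are assumptions of this sketch.

```python
import math

def mean_vector(tokens, emb):
    """Average the vectors of all tokens that have an embedding."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def candidate_indices(tokens, synonyms, stop_words, breakpoint=0.3):
    """Apply filters (a)-(c): punctuation/stop words, synonym count, position."""
    keep = []
    for i, tok in enumerate(tokens):
        if not any(ch.isalnum() for ch in tok):   # (a) punctuation
            continue
        if tok in stop_words:                     # (a) stop words
            continue
        if len(synonyms.get(tok, [])) < 2:        # (b) fewer than two synonyms
            continue
        if i / len(tokens) < breakpoint:          # (c) before the breakpoint
            continue
        keep.append(i)
    return keep

def locate_pun(tokens, synonyms, emb, stop_words, breakpoint=0.3):
    """Return (index of the chosen word or None, {index: lowest similarity})."""
    original = mean_vector(tokens, emb)
    scores = {}
    if original is None:
        return None, scores
    for i in candidate_indices(tokens, synonyms, stop_words, breakpoint):
        sims = []
        for syn in synonyms[tokens[i]]:
            if syn not in emb:
                continue
            replaced = tokens[:i] + [syn] + tokens[i + 1:]
            sims.append(cosine(original, mean_vector(replaced, emb)))
        if sims:
            scores[i] = min(sims)  # assumption: keep the most meaning-changing synonym
    if not scores:
        return None, scores
    return min(scores, key=scores.get), scores

# Toy data (hypothetical): every sentence word shares one direction, and the
# synonyms of "bits" point further away than the synonyms of "go".
toy_tokens = "old programmers never die they just go to bits .".split()
toy_stop_words = {"never", "they", "just", "to"}
toy_synonyms = {"go": ["move", "exit"], "bits": ["number", "morsel"], "die": ["decease"]}
toy_emb = {w: (1.0, 0.0) for w in toy_tokens if w != "."}
toy_emb.update({"move": (0.9, 0.1), "exit": (0.8, 0.2),
                "number": (0.0, 1.0), "morsel": (0.5, 0.5)})

best, toy_scores = locate_pun(toy_tokens, toy_synonyms, toy_emb, toy_stop_words)
print(toy_tokens[best], toy_scores)
```

With real data, `emb` would be a lookup into the pre-trained vectors and `synonyms` would be built from WordNet. The dictionary interface here only keeps the sketch self-contained.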
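
### Sketch: measuring the breakpoint trade-off

The two columns of the breakpoint table in Q2 can be computed along these lines. This is a simplified sketch under stated assumptions: `breakpoint_stats` is a hypothetical helper, it applies only the positional cut (not the stop word or synonym filters), and `toy_sentences` and `toy_gold` are made-up data, so the numbers it prints are unrelated to the figures in the table.

```python
def breakpoint_stats(sentences, gold_indices, breakpoint):
    """Return (coverage, candidate_ratio) for one breakpoint.

    coverage: share of sentences whose gold pun index survives the cut.
    candidate_ratio: kept tokens divided by all tokens.
    """
    covered = kept = total = 0
    for tokens, gold in zip(sentences, gold_indices):
        n = len(tokens)
        kept_idx = [i for i in range(n) if i / n >= breakpoint]
        covered += gold in kept_idx
        kept += len(kept_idx)
        total += n
    return covered / len(sentences), kept / total

# Toy data (hypothetical): two 8-token sentences with gold pun positions 6 and 2.
toy_sentences = ["a b c d e f g h".split(), "s t u v w x y z".split()]
toy_gold = [6, 2]

for bp in (0.1, 0.3, 0.7):
    coverage, ratio = breakpoint_stats(toy_sentences, toy_gold, bp)
    print(f"breakpoint={bp}: coverage={coverage:.3f}, candidates/words={ratio:.3f}")
```

A later breakpoint lowers the candidate ratio but can drop gold puns that sit early in the sentence, which is the trade-off weighed when choosing 0.3.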
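
### Sketch: BPA tagging

The BPA (Before, Pun, After) labels mentioned for the deep learning approach can be illustrated with a small conversion between a pun position and a tag sequence. This is only a sketch of the labeling idea for sentences that contain exactly one pun. `bpa_tags` and `pun_index_from_tags` are hypothetical helper names, not functions from the original source code.

```python
def bpa_tags(tokens, pun_index):
    """Tag words before the pun as B, the pun as P, and words after it as A."""
    if not 0 <= pun_index < len(tokens):
        raise ValueError("pun_index must point at a token in the sentence")
    return ["B"] * pun_index + ["P"] + ["A"] * (len(tokens) - pun_index - 1)

def pun_index_from_tags(tags):
    """Recover the pun position from a predicted tag sequence (None if no P)."""
    return tags.index("P") if "P" in tags else None

example_tokens = "OLD PROGRAMMERS never die they just go to bits .".split()
example_tags = bpa_tags(example_tokens, 8)
print(list(zip(example_tokens, example_tags)))
print(pun_index_from_tags(example_tags))
```

Framing the task this way turns pun location into sequence labeling, which is what the LSTM and CRF layers are trained to do.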
