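AI Breathes New Life into an Old Problem: Contrastive Learning for Enzyme Function Prediction Published in Science

![Figure 1](https://hackmd.io/_uploads/HyrRu59u0.jpg)

Photo of the main authors of the enzyme function prediction tool CLEAN. From left to right: Tianhao Yu, Haiyang (Ocean) Cui, Huimin Zhao, Guangde Jiang.

Enzymes are proteins that accelerate chemical reactions. They regulate metabolism, DNA replication, and many other key biological processes, and are essential to the function of cells and organisms. Enzyme products are now widely used in everyday and industrial applications such as food processing, sanitation and cleaning, and health diagnostics. Some chemical reactions occur only under very harsh conditions and proceed very slowly. Enzymes let such reactions take place under mild conditions, and each accelerates a single specific reaction. For example, the enzyme used to determine blood glucose levels reacts only with glucose and with no other molecule.

Enzymes are usually classified by the chemical reactions they catalyze. The best-known numerical classification scheme is the Enzyme Commission (EC) number: each EC number is associated with a chemical reaction, and all enzymes that catalyze the same reaction share the same EC number. The EC number is therefore the most widely used enzyme classification scheme to date. It uses four numbers to specify an enzyme's catalytic function, so predicting an enzyme's EC number specifies its catalytic function.

Existing models cannot accurately annotate proteins that are poorly studied, have uncharacterized functions, or have multiple activities. To address this, a research team from the University of Illinois Urbana-Champaign, Cornell University, and the Georgia Institute of Technology developed an AI deep learning framework for enzyme function prediction based on contrastive learning.

The study, titled "Enzyme function prediction using contrastive learning", was published in Science on March 30, 2023.

![Figure 2](https://hackmd.io/_uploads/HJOyY9qu0.png)

Paper link: <https://www.science.org/doi/10.1126/science.adf2465>

# 1. CLEAN: Background and Core Framework

Determining protein function experimentally is tedious and expensive, so many machine learning models for protein function prediction have been developed. They treat the task as a multi-label classification problem whose goal is to assign EC numbers to enzymes. However, the number of enzymes under each EC number is highly imbalanced: some classes contain a great many enzymes, while others contain almost none and offer very little information. The predictive power of these models therefore suffers from the imbalance of the datasets they are trained on.

To solve this problem, Yu et al. proposed the machine learning algorithm CLEAN (contrastive learning–enabled enzyme annotation)[1]. Compared with state-of-the-art tools, CLEAN assigns EC numbers to enzymes with higher accuracy, reliability, and sensitivity. The model is a feedforward neural network that takes protein embeddings as input; these embeddings carry important features and information about enzyme function. The protein embeddings are obtained from the ESM-1b language model[2], and the output layer of the neural network then produces a refined, function-aware embedding of the input protein. The contrastive learning-based CLEAN framework consists of three main modules:

(A) **During training, positive and negative samples are drawn according to EC number**:

![Model diagram A](https://hackmd.io/_uploads/SyBKhEXFA.jpg)

(B) **When predicting EC numbers, the query sequence embedding is compared with the representation of each EC number to obtain the pairwise Euclidean distance between the query sequence and each EC number**:

![Model diagram B](https://hackmd.io/_uploads/HJjk6NXF0.jpg)

(C) **For classification, two methods, maximum separation and P value, are used to prioritize confident EC number predictions from the ranked list**:

![Model diagram C](https://hackmd.io/_uploads/ByDBT4QYA.jpg)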
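The distance-based idea behind modules (A) and (B) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a triplet margin loss as one common form of contrastive objective, uses made-up 2-D vectors in place of real protein embeddings, and takes the mean embedding of each EC number's training enzymes as that EC number's representation. The function names `triplet_margin_loss`, `ec_centroids`, and `rank_ec_numbers` are hypothetical.

```python
import math
from collections import defaultdict


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on Euclidean distances (assumed form).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def ec_centroids(train_embeddings):
    """Average the embeddings of all training enzymes sharing an EC number."""
    grouped = defaultdict(list)
    for ec, vec in train_embeddings:
        grouped[ec].append(vec)
    return {
        ec: [sum(col) / len(vecs) for col in zip(*vecs)]
        for ec, vecs in grouped.items()
    }


def rank_ec_numbers(query, centroids):
    """Return (EC number, distance) pairs sorted from nearest to farthest."""
    return sorted(
        ((ec, math.dist(query, c)) for ec, c in centroids.items()),
        key=lambda pair: pair[1],
    )


# Toy 2-D embeddings; the labels and coordinates are illustrative only.
train = [
    ("1.1.1.1", [0.0, 0.0]),
    ("1.1.1.1", [0.0, 2.0]),
    ("2.7.1.1", [4.0, 4.0]),
]
centroids = ec_centroids(train)
ranking = rank_ec_numbers([0.0, 1.5], centroids)
loss = triplet_margin_loss([0.0, 0.0], [0.0, 2.0], [4.0, 4.0])
```

In this toy setup the query lies closest to the centroid of the first EC label, so that label ranks first. Deciding how many of the top-ranked EC numbers to keep is the role of module (C), which this sketch omits.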
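CLEAN builds its framework on contrastive learning[3]. Contrastive learning is generally described as a self-supervised, task-independent deep learning technique that lets a model learn general features of data even without labels, by teaching it which data points are similar (positives) and which are different (negatives). In CLEAN, EC numbers define the positives and negatives. During training, each reference amino acid sequence (the anchor) in the training set is sampled together with sequences of enzymes that have the same EC number (positives) and sequences of enzymes that have different EC numbers (negatives). The training objective is for CLEAN to learn an embedding space in which Euclidean distance reflects functional similarity: sequences of enzymes with the same EC number should be close together, and sequences of enzymes with different EC numbers should be far apart. The objective is formulated as a contrastive loss function that minimizes the distance between the anchor and the positives while maximizing the distance between the anchor and the negatives. **One important reason this model was published in Science is that it introduced a new training mechanism for enzyme function prediction, which handles the severe imbalance among EC classes that earlier methods could not**. This in turn opens the way to predicting the functions of many uncharacterized binding proteins, such as receptors and transcription factors.

For EC number prediction, a numerical representation of each EC number is first obtained by averaging the learned embeddings of all training-set enzymes that belong to it. The embedding of the query amino acid sequence is then compared with each EC representation to obtain the pairwise Euclidean distances. Finally, the input protein is assigned the EC numbers whose representations are significantly close to the query sequence.

# 2. Computational Performance Evaluation

First, in silico validation was carried out to compare CLEAN quantitatively with state-of-the-art EC number prediction tools. CLEAN was trained and evaluated on UniProt[4], the world's leading knowledgebase of protein sequences and annotation data. After training, its predictive performance was compared with six state-of-the-art EC number annotation tools on datasets that were not used in the development of any of the models. The comparison covered several datasets and tasks:

(A) **On the New-392 dataset, CLEAN was compared with four top-ranked models (ProteInfer, DeepEC, CatFam, and ECPred) on three multi-label accuracy metrics: precision, recall, and F1 score**:

![Quantitative comparison A](https://hackmd.io/_uploads/r1NDgSQt0.jpg)

(B) **On the Price-149 dataset, CLEAN was compared with BLASTp, ProteInfer, DeepEC, DEEPre, CatFam, and ECPred**:

![Quantitative comparison B](https://hackmd.io/_uploads/ryblbB7tR.jpg)

(C) **CLEAN, ProteInfer, and DeepEC compared on a dataset of underrepresented EC numbers**:

![Quantitative comparison C](https://hackmd.io/_uploads/HyoUZSQt0.jpg)

(D) **Binned precision plot for CLEAN on a test set with less than 50% sequence identity to the training set, evaluated with the SupconH loss**:

![Quantitative comparison D](https://hackmd.io/_uploads/SymqzH7F0.jpg)

(E) **Evaluation on the combined Price-149 and New-392 dataset, grouped by the number of times each EC number appears in CLEAN's training set**:

![Quantitative comparison E](https://hackmd.io/_uploads/S1ygXr7KA.jpg)

(F) **Prediction accuracy of CLEAN on an in-house curated halogenase dataset, compared with six commonly used tools (BLASTp, ProteInfer, DeepEC, DEEPre, ECPred, and COFACTOR); the dataset is diverse and covers 11 different EC numbers**:

![Quantitative comparison F](https://hackmd.io/_uploads/ByQ_7SXYA.jpg)

The results show that CLEAN performs best on several multi-label accuracy metrics. It is more accurate than earlier machine learning models and can be used to predict the functions of newly discovered proteins, especially enzymes of unknown function.

# 3. Experimental Validation

In addition, the authors carried out comprehensive experiments to validate CLEAN's functional predictions for uncharacterized halogenases (**reason 2 this model was published in Science: unlike papers in computer science or bioinformatics journals and conferences, this study ran biological experiments to verify the practical value of CLEAN's predictions, showing that it can indeed identify some unusual enzymes**).

(A) **Heat map of EC number prediction accuracy for 36 identified halogenases**:

![Experimental results 1](https://hackmd.io/_uploads/HJcmaJm9C.jpg)

(B) **Heat map of sequence identity between the uncharacterized proteins and the positive control (PC) enzymes; the green color bar indicates the percentage**: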
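![Experimental results 2](https://hackmd.io/_uploads/BkJnaJm90.jpg)

(C) **Reaction of the SAM hydroxide adenosyltransferases MJ1651-TTHA0338**:

![Experimental results 3](https://hackmd.io/_uploads/rJMW01X90.jpg)

(D) **Structural superposition of the three-dimensional (3D) structures of the uncharacterized proteins MJ1651 and TTHA0338 and the positive control enzyme PH0463; the same superposition was performed for SsFlA, SalL, and ScFlA. The results show that these SAM-binding enzymes have very similar 3D structures, yet CLEAN can accurately distinguish their functions**:

![Experimental results 4](https://hackmd.io/_uploads/SkFg1eX5R.jpg)

(E) **Nucleophilic substitution of SAM by a halide ion or H2O, catalyzed by SsFlA**:

![Experimental results 5](https://hackmd.io/_uploads/Sk5PJlXqR.jpg)

The experiments focused on CLEAN's functional predictions for uncharacterized halogenases. Halogenases are widely used in pharmaceuticals, agriculture, chemistry, and other fields, and the 36 incompletely annotated halogenases identified so far from UniProt cover all four types of halogenase. Because halogenases are still understudied and only a limited number of halogenase amino acid sequences are available in protein databases, predicting halogenase function remains a difficult task. In this part of the study, the researchers selected three halogenases that were either labeled as uncharacterized and/or hypothetical proteins or had conflicting annotations in the literature. As the figures above show, CLEAN predicted new EC numbers for these three halogenases, indicating that their functions may differ from what was previously believed. The researchers confirmed the functions of all three through a series of experiments, which validated the accuracy of CLEAN's predictions.

# 4. Extended Applications of CLEAN

Because CLEAN's framework is novel and its predictions have been validated experimentally, several follow-up studies have already used CLEAN as a key computational module in their own frameworks or to make their experiments more efficient, and have been published in leading journals, including Science and PNAS. Two of these studies are introduced below:

(A): The article "**Prophage proteins alter long noncoding RNA and DNA of developing sperm to induce a paternal-effect lethality**"[5], published in **Science** on March 8, 2024, used CLEAN to predict the functional annotation of cytoplasmic incompatibility factor A (CifA), which is encoded by the prophage of Wolbachia, and related that prediction to the protein's in vitro RNase activity. This made the study of how prophage proteins alter the long noncoding RNA and DNA of developing sperm and induce paternal-effect lethality more efficient:

![Follow-up study 1](https://hackmd.io/_uploads/r1nXy1VYC.jpg)

![Follow-up study 1-2](https://hackmd.io/_uploads/HkaJIyEK0.jpg)

(B): The article "**Methylation of ciliary dynein motors involves the essential cytosolic assembly factor DNAAF3/PF22**"[6], published in **PNAS** on January 23, 2024, used CLEAN to compare the primary sequences of human DNAAF3 and Chlamydomonas PF22 and to predict their enzyme function. On that basis it identified the essential assembly factor DNAAF3 as a structural ortholog of S-adenosylmethionine-dependent methyltransferases. The article shows that dynein heavy chains, especially those that form the ciliary outer arm, are methylated on key residues within various nucleotide-binding sites and on the microtubule-binding domain helices, and that these residues are directly involved in the transition to low binding affinity:

![Follow-up study 2](https://hackmd.io/_uploads/HytoEkVF0.jpg)

![Follow-up study 2-2](https://hackmd.io/_uploads/rJlyPyVFA.jpg)

# 5. Conclusion

In summary, CLEAN is a new machine learning algorithm for enzyme function prediction based on contrastive learning, and it achieves better predictive performance than current state-of-the-art tools. It can also reliably annotate EC numbers for understudied enzyme classes, where other algorithms make wrong predictions because they cannot overcome the imbalance in enzyme data. CLEAN can become a powerful tool for predicting enzyme function and can advance fields such as metabolic engineering, functional genomics, and drug development. Its **advanced learning approach** and the fact that **biological experiments validated its effectiveness** explain why the model was successfully published in Science.

[1] Yu T, Cui H, Li J C, et al. Enzyme function prediction using contrastive learning[J]. Science, 2023, 379(6639): 1358-1363.

[2] Rives A, Goyal S, Meier J, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences[J]. bioRxiv, 2019.

[3] Khosla P, Teterwak P, Wang C, et al. Supervised contrastive learning[J]. Advances in Neural Information Processing Systems, 2020, 33: 18661-18673.

[4] The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021[J]. Nucleic Acids Research, 2021, 49(D1): D480-D489.

[5] Kaur R, McGarry A, Shropshire J D, et al. Prophage proteins alter long noncoding RNA and DNA of developing sperm to induce a paternal-effect lethality[J]. Science, 2024, 383(6687): 1111-1117.

[6] Sakato-Antoku M, Patel-King R S, Balsbaugh J L, et al. Methylation of ciliary dynein motors involves the essential cytosolic assembly factor DNAAF3/PF22[J]. Proceedings of the National Academy of Sciences, 2024, 121(5): e2318522121.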
