Hua Ge
We sincerely appreciate the reviewers' insightful feedback! We believe most of the comments can be addressed via clarification and some additional analysis. Please find our responses below.

**R1Q1: Why Is Ranking Necessary? Can't the Model Rank Candidates?**

1. First, ranking prioritizes higher-quality comment candidates, enabling developers to quickly identify and select the most appropriate one. Second, the "OriSimRerank" ranking method helps mitigate the issue of excessive annotations: while LLMs often generate annotations containing additional details such as implementation specifics, our target is a concise description of method functionality, not internal logic or implementation. Third, as shown in Table 1 and Table 2, the "OriSimRerank" method further improves performance.
2. LLMs can rank candidate comments; however, according to our findings ([@Link](https://docs.google.com/spreadsheets/d/e/2PACX-1vRr3Gc3jVmqfUHlV9RcpEsV9rl557Fpabkeaa5FLU7TCmMjPeENUmb_XL6DVVAV2uEBeNjl1dcYhSNM/pubhtml)), employing a pairwise, prompt-based ranking method with CodeLlama can slightly degrade average accuracy relative to LLMCup without reranking, by 0.1% (from 25.4% to 25.3%).

**R1Q2: How Was the Prompt Designed? Were Simpler Prompts Explored?**

1. The prompt in Fig. 2 is adapted from the approach in [16]. **However, our focus is comment updating, which inherently involves additional inputs.** These inputs include ②, ④, ⑥, and ⑧, while the instructions include ①, ③, ⑤, ⑦, and ⑨, which are essential for structuring the inputs and helping LLMs comprehend the task.
2. We initially experimented with simpler prompt designs but abandoned them due to poor performance. LLMs using these prompts often overlooked critical aspects: the model's role was defined unclearly, the focus on code modifications was insufficient, and adherence to the desired output format was inadequate. Consequently, the generated comments often failed to reflect changes to function signatures, variable names, and class names.

**R1C1: Suggestions in Comments**

We will revise the paper accordingly, following the detailed plan outlined in [@DocLink](https://docs.google.com/document/d/e/2PACX-1vT1mrlZ4hOZKqWWoY2rgWcWs-N714XT3nKyfV8RrgxffJm7n1j0_vMqrzG4siz7NY6JSAKIeJ0zJpWy/pub).

**R2Q1: What Are Alternative Reranking Approaches?**

There are three types of alternative reranking approaches:

1. Pointwise reranking: methods such as McRank [71] (multiple-classification-based) treat ranking as a regression or classification task, scoring each candidate's relevance independently.
2. Pairwise reranking: approaches such as [72] (LLM-based), RankNet [73] (neural-network-based), RankBoost [74] (boosting-based), and RankingSVM [75] (SVM-based) frame ranking as a binary classification task that decides the relative relevance of candidate pairs.
3. Listwise reranking: techniques such as [76] (LLM-based), ListNet [77] (neural-network-based), and [78] (random-forest-based) model ranking as an optimization task over the entire candidate list, aiming to improve overall ranking quality globally.

**R2C1: Does Minimizing Changes to Original Comments Improve Results? Can Strict Application Lead to Bias and Lower Quality?**

From a software development perspective, developers are more likely to accept updated comments that build on the original human-written ones. We conducted a small-scale experiment in which participants received updated comments that were semantically similar or identical; some were minimally altered from the original, while others underwent substantial modifications. The results revealed that users preferred comments with smaller changes. Additionally, our analysis of the dataset, comprising 1,496 projects and 98,622 data points, shows that on average 84.05% of the original comment's sub-tokens are reused in the updated comment.
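As a concrete illustration of how such a sub-token reuse statistic can be computed (a minimal sketch with a hypothetical tokenizer, not the paper's exact splitting procedure, which is not specified here):

```python
import re

def sub_tokens(comment: str) -> set[str]:
    """Split a comment into lower-cased sub-tokens,
    breaking camelCase identifiers apart."""
    words = re.findall(r"[A-Za-z]+", comment)
    subs = []
    for w in words:
        # split at camelCase boundaries, keep all-caps runs intact
        subs += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", w)
    return {s.lower() for s in subs}

def reuse_ratio(original: str, updated: str) -> float:
    """Fraction of the original comment's sub-tokens that
    reappear in the updated comment."""
    orig, upd = sub_tokens(original), sub_tokens(updated)
    return len(orig & upd) / len(orig) if orig else 0.0

# "userName" -> "accountName": 5 of 6 original sub-tokens reused
ratio = reuse_ratio("Returns the userName of the owner.",
                    "Returns the accountName of the owner.")
```

Averaging `reuse_ratio` over all code-comment pairs in a corpus yields a reuse figure of the kind reported above.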
However, we acknowledge the potential issues raised and will discuss them in the revised version.

**R2C2: Why Choose These LLMs?**

a) Choice of model types. To ensure comprehensive research findings, we aimed to include a diverse set of models, spanning:

- General-purpose models (e.g., GPT-4o, Llama3, Mistral)
- Code-specific models (e.g., CodeLlama, DeepSeek-Coder-v2)
- Proprietary commercial models (e.g., GPT-4o)
- Open-source models (e.g., Llama3, Mistral, CodeLlama, DeepSeek-Coder-v2)

b) Model advancement and representativeness:

- GPT-4o is among the latest and most powerful models in the GPT family.
- Llama3 and CodeLlama are representative of the newest and most advanced iterations of the Llama series.
- Gemma and DeepSeek-Coder-v2 are prominent representatives of other popular LLM families.

c) Choice of parameter scale. To leverage the improved performance associated with larger parameter scales in LLMs, we prioritized models with greater parameter counts (e.g., 7B, 8B, and 16B) within the constraints of our available technical resources.

**R2C3: Was the Prompt Designed Through Trial and Error? How Were the Two Rules Derived?**

The prompt design underwent a trial-and-error process; see R1Q2 for details. Without the first rule, generated comments often miss updates to function signatures, variable names, and class names in the code. Without the second rule, LLMs tend to retain typos from the original comment instead of correcting them in the updated version.
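To make the role of the two rules concrete, here is a hypothetical sketch of how such rules can be embedded in a prompt template; the wording and the `build_prompt` helper are illustrative assumptions, not the paper's actual prompt (which is given in Fig. 2):

```python
# Hypothetical phrasing of the two rules discussed in R2C3.
PROMPT_RULES = """\
When updating the comment:
1. Reflect every change to function signatures, variable names,
   and class names made in the new code.
2. Correct any typos found in the original comment instead of
   copying them into the updated comment.
"""

def build_prompt(old_code: str, new_code: str, old_comment: str) -> str:
    """Assemble a comment-updating prompt from the task inputs."""
    return (
        "You are an assistant that updates code comments.\n"
        f"{PROMPT_RULES}"
        f"Old code:\n{old_code}\n"
        f"New code:\n{new_code}\n"
        f"Original comment:\n{old_comment}\n"
        "Updated comment:"
    )
```

Dropping rule 1 or rule 2 from such a template reproduces the failure modes described above (stale identifiers or retained typos, respectively).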
**R2C4: Why Choose the Three Human Evaluation Perspectives?**

We chose them for two reasons: (1) they were inspired by the human evaluation approach used in a similar study [79]; (2) based on our observations, the generated results may exhibit issues such as inconsistency with the updated code (e.g., copying the original or missing key changes), unnatural expression (e.g., incomplete sentences, repetition, and grammar errors), and failure to convey the code's intent, making it harder for developers to understand the functionality quickly.

**R3Q1: Why Recall@3 and Recall@5? How Were They Determined?**

1. Accuracy is equivalent to Recall@1, as their calculation methods align.
2. Due to budget constraints for GPT-4o, we generated only 5 candidate annotations per data point; therefore, **the maximum value of k is 5**.
3. To balance brevity with comprehensiveness of the results, we reported Recall@k for k = 1, 3, and 5.
4. Recall@2 and Recall@4 results are provided in [@TableLink](https://docs.google.com/spreadsheets/d/e/2PACX-1vRavyz5F5zVoh5RIDSUiIdhN-VFRNxpiFooOXytAnAdrI2ckmT4WHRF4DRhwf5S6JLMTiYC6Wt0Ienn/pubhtml).

**R3Q2: Why Were Specific Temperature Settings Chosen (0.2 for Most Models, 0.1 for CodeLlama:7b), and How Do They Impact Result Consistency?**

Based on experimental results, the temperature setting shown in Fig. 5(a) is optimal. Relevant factors are discussed in lines 683–704, and we will provide additional details as needed. As shown in Fig. 4, increasing the temperature generally reduces LLM accuracy while improving Recall@5, indicating that higher temperatures decrease the consistency of generated results. Specifically, lower temperatures improve the likelihood of generating the correct comment on the first attempt, while higher temperatures increase the chance of producing a correct comment within k attempts (k = 5).
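The Recall@k metric discussed in R3Q1 can be sketched as follows (a minimal illustration assuming exact-match hits; the paper's matching criterion may differ):

```python
def recall_at_k(candidate_lists, references, k):
    """Fraction of data points whose top-k ranked candidates
    contain the reference comment; Recall@1 equals accuracy."""
    hits = sum(ref in cands[:k]
               for cands, ref in zip(candidate_lists, references))
    return hits / len(references)

# Two data points, each with a ranked list of generated comments.
cands = [["a", "b", "c"], ["x", "y", "z"]]
refs = ["b", "q"]
recall_at_k(cands, refs, 1)  # 0.0 (neither top-1 candidate matches)
recall_at_k(cands, refs, 3)  # 0.5 (first point hits within top 3)
```

With 5 candidates per data point, k = 5 is indeed the largest meaningful cutoff, as noted above.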
**R3Q3: Could You Elaborate on the Types of Code Changes in the Dataset and How LLMCup Addresses Different Modification Complexities?**

This paper focuses on comment updates rather than categorizing code changes. Code updates at the method level can involve (1) syntactic changes that do not alter semantics and (2) changes in execution logic. We adopt a unified approach that leverages the LLM's capabilities: for syntactic changes, the LLM, trained on diverse datasets, handles varying complexities effectively; for execution-logic changes, we guide the LLM with a customized prompt, as illustrated in Fig. 2. We will discuss this in the revised version.

**R3Q4: Challenges in Designing and Validating Reranking Strategies (Especially RefSimRerank) for Maintaining Semantic Accuracy?**

1. Challenges. First, the design challenge lies in ensuring that top-ranked comments accurately reflect the original context and the updated code. Second, the validation challenge involves developing a reference similarity score that effectively measures semantic accuracy.
2. Responses. First, LLMCup leverages LLMs to generate candidate comments that already reflect the original context and the changes in the code; RefSimRerank, as a subsequent step, reranks these candidates using a reference similarity metric calculated from the updated code. Overall, this ensures that the top-ranked candidate comments are better aligned with both the original context and the updated code. Second, validation integrates manual reviews with the semantic similarity metric to ensure consistency with ground-truth annotations.

**R3Q5: How Does LLMCup Handle Poor-Quality Original Comments? Does Minimizing Changes Retain Them?**

Prior studies [80-84] have shown that comments in the referenced projects are of high quality. These comments have been proven beneficial for various software engineering tasks, including bug detection [85-88], specification inference [89-91], testing [92, 93], and code synthesis [94-98]. However, poor-quality comments could impact updates and lead to suboptimal results. We will discuss this issue in the revised paper and explore alternative solutions to mitigate it.

**References**

[71] Li et al. McRank: Learning to Rank Using Multiple Classification and Gradient Boosting. NIPS, 2007.
[72] Qin et al. Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting. 2023.
[73] Burges et al. Learning to Rank Using Gradient Descent. ICML, 2005.
[74] Freund et al. An Efficient Boosting Algorithm for Combining Preferences. JMLR, 2003.
[75] Herbrich et al. Support Vector Learning for Ordinal Regression. 1999.
[76] Ma et al. Zero-Shot Listwise Document Reranking with a Large Language Model. arXiv, 2023.
[77] Cao et al. Learning to Rank: From Pairwise Approach to Listwise Approach. ICML, 2007.
[78] Ibrahim et al. Comparing Pointwise and Listwise Objective Functions for Random-Forest-Based Learning-to-Rank. TOIS, 2016.
[79] Zhai et al. CPC: Automatically Classifying and Propagating Natural Language Comments via Program Analysis. ICSE, 2020.
[80] Tenny et al. Procedures and Comments vs. the Banker's Algorithm. SIGCSE, 1985.
[81] Tenny et al. Program Readability: Procedures Versus Comments. TSE, 1988.
[82] Woodfield et al. The Effect of Modularization and Comments on Program Comprehension. ICSE, 1981.
[83] Hartzman et al. Maintenance Productivity: Observations Based on an Experience in a Large System Environment. CASCON, 1993.
[84] Jiang et al. Examining the Evolution of Code Comments in PostgreSQL. MSR, 2006.
[85] Rubio-González et al. Expect the Unexpected: Error Code Mismatches Between Documentation and the Real World. PASTE, 2010.
[86] Tan et al. iComment: Bugs or Bad Comments? SOSP, 2007.
[87] Tan et al. aComment: Mining Annotations from Comments and Code to Detect Interrupt Related Concurrency Bugs. ICSE, 2011.
[88] Tan et al. @tComment: Testing Javadoc Comments to Detect Comment-Code Inconsistencies. ICST, 2012.
[89] Blasi et al. Translating Code Comments to Procedure Specifications. ISSTA, 2018.
[90] Pandita et al. Inferring Method Specifications from Natural Language API Descriptions. ICSE, 2012.
[91] Zhong et al. Inferring Resource Specifications from Natural Language API Documentation. ASE, 2009.
[92] Goffi et al. Automatic Generation of Oracles for Exceptional Behaviors. ISSTA, 2016.
[93] Wong et al. DASE: Document-Assisted Symbolic Execution for Improving Automated Software Testing. ICSE, 2015.
[94] Allamanis et al. Bimodal Modelling of Source Code and Natural Language. ICML, 2015.
[95] Gvero et al. Synthesizing Java Expressions from Free-Form Queries. OOPSLA, 2015.
[96] Nguyen et al. Statistical Translation of English Texts to API Code Templates. ICSE-C, 2017.
[97] Phan et al. Statistical Learning for Inference Between Implementations and Documentation. ICSE, 2017.
[98] Zhai et al. Automatic Model Generation from Documentation for Java API Functions. ICSE, 2016.
[99] Nashid et al. Retrieval-Based Prompt Selection for Code-Related Few-Shot Learning. ICSE, 2023.
