# ACL 2024 Rebuttal - 1921 (Multi-News+)

## Reviewer bSjA (Soundness 3, Overall 2)

Thank you for dedicating your time to a comprehensive review of our manuscript. We deeply appreciate your acknowledgment of the significance of dataset cleansing for enhancing the quality of the given dataset. Our intention to make the Multi-News+ dataset publicly available stems from our commitment to advancing the field and serving the research community. Your feedback is invaluable to us, and we are eager to address and clarify any concerns you have raised to ensure the quality and clarity of our work.

### Response to Weakness 1: Limited Novelty

> There is a lack of novelty and significance. In fact, dataset cleansing/filtering with language models is not a new idea. There is already a line of work on improving the factuality of summarization datasets through filtering [1, 2, 3]. Simply using GPT-3.5 to filter one dataset does not seem like enough contribution.

We appreciate your feedback regarding the lack of comparison between our work and previous works. Our paper aims to introduce large language models to enhance the quality of real-world datasets, which distinguishes it from the existing studies [1, 2, 3]. The previous works the reviewer mentioned applied data filtering to training datasets to improve the factual consistency of abstractive summarization models. In contrast, we focus on the significant noise and extensive cleaning required in datasets primarily obtained through web crawling [4]. To address this data cleaning challenge, we utilized large language models to reduce human effort, eliminating the need for external modules and expensive fine-tuning processes, unlike in previous studies [1, 2, 3]. Furthermore, instead of using multiple fine-tuned models and filtering data based on fixed thresholds through intersection [3], we implemented methods such as majority voting and chain-of-thought reasoning to effectively imitate the processes of human experts.

This approach not only provides a rationale for decision-making and improves transparency, but also removes the need to fine-tune each model individually. Therefore, our paper emphasizes our attempt, particularly using large language models, to enhance real-world datasets such as those obtained through web crawling. Furthermore, we believe that a core contribution of our paper lies in the release of the Multi-News+ dataset, an enhanced version of the original Multi-News dataset. As stated in the manuscript, we plan to publicly release the dataset for future research. We are committed to including this discussion of previous works in our updated manuscript.

```
[1] Kazuki Matsumaru, Sho Takase, and Naoaki Okazaki. 2020. Improving truthfulness of headline generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1335–1346, Online. Association for Computational Linguistics.
[2] Feng Nan, Ramesh Nallapati, Zhiguo Wang, Cicero Nogueira dos Santos, Henghui Zhu, Dejiao Zhang, Kathleen McKeown, and Bing Xiang. 2021. Entity-level factual consistency of abstractive text summarization. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2727–2733, Online. Association for Computational Linguistics.
[3] Yanzhu Guo, Chloé Clavel, Moussa Kamal Eddine, and Michalis Vazirgiannis. 2022. Questioning the validity of summarization datasets and improving their factual consistency. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5716–5727, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
[4] Julia Kreutzer et al. 2022. Quality at a glance: An audit of web-crawled multilingual datasets. Transactions of the Association for Computational Linguistics, 10:50–72.
```

### Response to Weakness 2: Necessity of Extensive Experiments

> Downstream task experiments are limited to fine-tuning T5 and BART. State-of-the-art instruction tuning approaches for decoder-only models were not tested.

We acknowledge the concern raised by the reviewer regarding the limited extent of the experiments. We are planning to conduct additional experiments using LLMs such as Mistral and LLaMA. Although we could not include the experimental results in this initial rebuttal due to the limited time, we will make sure to include this experiment in the updated version.

### Response to Weakness 3: Evaluation on Annotation

> Annotations by GPT-3.5 were not evaluated.

We deeply appreciate the reviewer's feedback concerning the evaluation of the annotations performed by GPT-3.5. Our initial intention was to showcase the feasibility of dataset cleansing with the proposed method, as the manuscript is a short paper. However, we acknowledge the importance of evaluating the annotations. In response to your invaluable feedback, we performed an initial evaluation on the 379 sets classified as containing no relevant source articles, as mentioned in Lines 213-214 and Appendix C. We examined these examples and found that 93.93% of the 379 sets were correctly classified as consisting entirely of noisy documents. Moreover, we provide an error analysis of several wrongly classified cases. Upon examining IDX 1890, we observe that sentences picked up during crawling, such as "You always have the option to delete your tweet location history. Learn more," are included in the original text. The frequency of such sentences, especially when they do not clearly specify proper nouns, can result in a lack of correlation with the summary. Additionally, in IDX 18846, the summary describes content related to the victims of a suicide bombing, whereas the original text is written from the perspective of the suicide bomber.
This indicates that GPT may not distinguish differences in expression that arise when a narrative is converted into a summary. We are committed to incorporating this evaluation and analysis into the updated version of the manuscript. Once again, we sincerely appreciate your invaluable feedback.

### Response to Comment 1: Contribution of the Paper

> While the topic of data quality enhancement is of interest to the community, the contribution of this paper is too trivial to warrant publication at a major venue. In its current form, it might be of interest for a workshop. To broaden the impact of the paper, the authors may consider conducting a more comprehensive survey instead of a single use case study. It would be interesting to see a categorization of dataset quality defaults across different NLP tasks and a generalizable LLM based solution.

We acknowledge the reviewer's viewpoint regarding the contribution of our paper. However, we ask the reviewer to consider that the manuscript is a short paper, which is intended to present *[small, focused contributions](https://aclrollingreview.org/cfp)* such as a *[proof of concept](https://aclrollingreview.org/reviewertutorial)*. In this paper, we presented an initial idea for cleansing and enhancing the quality of a real-world dataset, which aligns with the purpose of a short paper. Additionally, our proposed method exhibits various strengths, such as exposing the rationale behind each annotation through the adoption of the chain-of-thought technique, which eases further investigation by human annotators. Lastly, we presented a cleansed version of an existing dataset, Multi-News+. We will incorporate this discussion into the updated version of the paper, strengthening the presentation of our contributions.

### Response to Comment 2: Construction Process of Multi-News

> There should have been more detailed explanations on how the Multi-News was constructed (in addition to "automated crawling from the Internet Archive"), for the readers to understand why the noisy documents were included in the dataset.

We appreciate this kind comment on the necessity of a more detailed explanation of the construction process of the original Multi-News dataset. We will make sure to update the manuscript accordingly. We appreciate your thorough review and your attention to detail, which helps us enhance the quality and clarity of our work.

## Reviewer sBB6 (Soundness 1.5, Overall 1)

We appreciate your insightful feedback and suggestions regarding our manuscript. We are grateful for the recognition of the value of our cleansed Multi-News+ through the positive rating on the 'Datasets' score. Below, we address the reviewer's comments and suggestions.

### Response to Weakness A: Real-world Case of Multi-News

> The authors tried to filter out noise documents. Although noise documents are not relevant to the general topic, they might demonstrate a real-world case where sometimes not all automatically-extracted documents are relevant. In fact, the authors created a cleaner version that fits cases where the documents were extracted manually or with high precision. This is a worthy cause, but it should be discussed in the paper.

We acknowledge the reviewer's viewpoint regarding this real-world scenario. While we fully understand the concern, we believe it can be addressed by the complementary use of our Multi-News+ and the original Multi-News dataset. For instance, one could use the original Multi-News dataset when the trained model is expected to consistently handle noisy documents at inference time and no pre-defined strategy exists for filtering them out. Otherwise, for cases where the model is expected to handle only clean documents, it will be more beneficial to use our proposed Multi-News+ dataset for training. We are grateful for your feedback on this point and plan to incorporate the discussion into the manuscript.

### Response to Weakness B: Easier Task of Multi-News+

> In practice, the authors filtered out not only documents that are not related to the topic of the set of documents, but also documents that are not related to the summary. This may cause filtering relevant documents that do not contain salient information. In fact, it makes the task easier while focusing on the salient information only. I would at least expect manual verification showing the rate of actual noise filtering.

We acknowledge the reviewer's perspective on this issue. While we understand the concern, we would like to carefully point out that every experiment was conducted on the test set of Multi-News+ to ensure a fair comparison between models. For instance, the clean test set of Multi-News+ would also be easier to handle for a model trained on the original Multi-News dataset, which includes noisy documents. Additionally, we include an initial verification showing the rate of correct noise filtering in the next response.

### Response to Weakness C: Evaluation on the Quality of the Filtering

> No manual annotation at all. Not to assess the quality of filtering, and not to assess the quality of the result summaries.

We fully understand the concern raised by the reviewer. Our initial intention was to showcase the feasibility of LLM-based dataset cleansing and to present a cleansed version of a real-world dataset, in contrast to previous studies on LLM-based data annotation. This intention was based on the purpose of a short paper according to the [CFP](https://aclrollingreview.org/cfp) and the [reviewer guide](https://aclrollingreview.org/reviewertutorial), which suggest that a *small, focused contribution* such as a *proof of concept* is suitable for a short paper. Furthermore, we were planning to publicly release the dataset, which facilitates future investigation of the quality of the annotation performed in our study. However, thanks to the reviewer's invaluable feedback, we now acknowledge the importance of human verification of the annotation results. In response, we conducted an examination of the 379 document sets classified as containing no documents relevant to the summary. As a result, we found that 93.93% of the annotations are valid. We are planning to incorporate this evaluation result and a thorough error analysis of failure cases. We once again extend our sincere gratitude for the reviewer's feedback on the importance of human verification of the annotation results.

### Response to Weakness D: Limited Novelty

> The filtering method is not so novel. It is very similar to things that were done in the past.

We acknowledge your concern regarding the novelty of our proposed method. In response to your valuable feedback, we conducted a comparison between our method and previous studies. Our paper aims to introduce large language models to enhance the quality of real-world datasets, which distinguishes it from the existing studies [1, 2, 3]. Previous works applied data filtering to training datasets to improve the factual consistency of abstractive summarization models. In contrast, we focus on the significant noise and extensive cleaning required in datasets primarily obtained through web crawling [4].
To address this data cleaning challenge, we utilized large language models to reduce human effort, eliminating the need for external modules and expensive fine-tuning processes, unlike in previous studies [1, 2, 3]. Furthermore, instead of using multiple fine-tuned models and filtering data based on fixed thresholds through intersection [3], we implemented methods such as majority voting and chain-of-thought reasoning to effectively imitate the processes of human experts. This approach not only provides a rationale for decision-making and improves transparency, but also removes the need to fine-tune each model individually. Therefore, our paper emphasizes our attempt, particularly using large language models, to enhance real-world datasets such as those obtained through web crawling. We appreciate your kind feedback on this issue. We plan to improve the presentation of our method's novelty in the updated version of the paper.

```
[1] Kazuki Matsumaru, Sho Takase, and Naoaki Okazaki. 2020. Improving truthfulness of headline generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1335–1346, Online. Association for Computational Linguistics.
[2] Feng Nan, Ramesh Nallapati, Zhiguo Wang, Cicero Nogueira dos Santos, Henghui Zhu, Dejiao Zhang, Kathleen McKeown, and Bing Xiang. 2021. Entity-level factual consistency of abstractive text summarization. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2727–2733, Online. Association for Computational Linguistics.
[3] Yanzhu Guo, Chloé Clavel, Moussa Kamal Eddine, and Michalis Vazirgiannis. 2022. Questioning the validity of summarization datasets and improving their factual consistency. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5716–5727, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
[4] Julia Kreutzer et al. 2022. Quality at a glance: An audit of web-crawled multilingual datasets. Transactions of the Association for Computational Linguistics, 10:50–72.
```

### Response to Comment: Modification of Figure 2

> It is hard to distinguish the different colors in Figure 2 in the high "number of articles".

We appreciate this feedback from the reviewer. We will make sure to revise the presentation of Figure 2. Once again, we extend our genuine appreciation for the reviewer's feedback. We are committed to incorporating these suggestions and discussions into the updated version of the manuscript.

## Reviewer n44F (Soundness 2.5, Overall 2)

Thank you for your review and thoughtful feedback on our manuscript. We have carefully considered your comments and concerns and would like to address them as follows:

### Response to Weakness 1: Lack of Analysis of the Multi-News Dataset

> The study lacks a thorough analysis of the Multi-NEWS dataset. Metrics specific to automatic summarization reveal the characteristics of the Multi-NEWS dataset, shedding light on the types of noisy samples it contains. Previous works have proposed several filtration strategies to uphold the dataset's high quality.

We acknowledge your concern regarding the necessity of a thorough analysis of the Multi-News dataset. Our initial intention was to showcase the feasibility of dataset cleansing with the proposed method, as the manuscript is a short paper. However, we acknowledge the importance of a thorough analysis of the noise patterns in the Multi-News dataset. We are planning to analyze the characteristics and patterns of this noise, leveraging the methods from the studies the reviewer mentioned. We appreciate the reviewer's suggestion and believe such an analysis will help readers understand the problems of the Multi-News dataset.
### Response to Weakness 2: The Usage of Summarization-specific Filtration Strategies

> The initial step in enhancing the quality of any summarization dataset involves applying summarization-specific filtration strategies. From the remaining samples, identifying noisy ones becomes more feasible. This approach could potentially reduce data annotation costs to under $500.

We acknowledge the possibility of combining previous filtration strategies with our proposed method, further decreasing the required cost. We are committed to including a discussion of this future extension. We sincerely appreciate this valuable feedback.

### Response to Weakness 3: Qualitative Analysis

> Despite the performance enhancements on the Multi-NEWS dataset, the qualitative analysis of the models is lacking.

We deeply appreciate the reviewer's feedback concerning the evaluation of the annotations performed by GPT-3.5. In response to your invaluable feedback, we performed an initial evaluation on the 379 sets classified as containing no relevant source articles, as mentioned in Lines 213-214 and Appendix C. We examined these examples and found that 93.93% of the 379 sets were correctly classified as consisting entirely of noisy documents. We are committed to incorporating this evaluation and an analysis of failure cases into the updated version of the manuscript. Once again, we sincerely appreciate your invaluable feedback.

### Response to Weakness 4: Partial Contribution of the Documents to the Summary

> How does the proposed approach handle cases where one document contributes only partially to the summary?

We appreciate this insightful question. While we could not attach the results of the analysis regarding this question, we are planning to incorporate this analysis into the updated version of the manuscript.

### Response to Weakness 5: Necessity of LLM Baselines

> It is suggested to include one or two baselines with Large Language Models (LLMs).

We acknowledge the concern raised by the reviewer regarding the limited extent of the experiments. We are planning to conduct additional experiments using LLMs such as Mistral and LLaMA. Although we could not include the experimental results in this initial rebuttal due to the limited time, we will make sure to include this experiment in the updated version. We are grateful for your thorough feedback, which gives us the opportunity to refine our work.
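For concreteness, the majority-voting and chain-of-thought annotation scheme discussed in the responses above can be sketched as follows. This is a minimal illustrative sketch, not our exact implementation: the `query_llm` stub, the prompt wording, and the `Label:` output convention are hypothetical placeholders standing in for the actual GPT-3.5 API calls and prompts.

```python
from collections import Counter

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-3.5 API call. In a real pipeline this
    would sample the model with nonzero temperature, so the chain-of-thought
    answers can differ across votes; here it returns a fixed response."""
    return ("Reasoning: the document only repeats site boilerplate and never "
            "mentions the entities in the summary. Label: NOISE")

def parse_label(response: str) -> str:
    # The prompt asks the model to end its reasoning with
    # "Label: RELEVANT" or "Label: NOISE"; extract the final label.
    return response.rsplit("Label:", 1)[-1].strip()

def classify_document(summary: str, document: str, n_votes: int = 3) -> str:
    """Annotate one (summary, document) pair by majority vote over several
    independent chain-of-thought samples."""
    prompt = (
        "Decide whether the document is a relevant source for the summary.\n"
        f"Summary: {summary}\n"
        f"Document: {document}\n"
        "Think step by step, then end with 'Label: RELEVANT' or 'Label: NOISE'."
    )
    votes = [parse_label(query_llm(prompt)) for _ in range(n_votes)]
    label, _count = Counter(votes).most_common(1)[0]
    return label
```

Keeping the model's step-by-step reasoning alongside each vote is what allows a human annotator to audit individual decisions, as discussed in the responses on transparency above.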
