cjuhwan99
# COLING Rebuttal 3385 (KICK)

## Response to Reviewer 1

We highly value your insightful comments and constructive suggestions on our manuscript. We have addressed each point raised by the reviewer as follows:

### [1. The language of the proposed dataset]

We acknowledge the need for clarity regarding our dataset's language. KICK is exclusively in Korean, as stated in the manuscript. To highlight this more effectively, we will add parallel Korean examples in the appendix, which were not permissible in the initial submission. Furthermore, the figures depict the interaction between commentators and casters based on English-translated examples. This was a strategic choice to communicate the features of KICK to a broader audience within our space limitations. In the camera-ready version, we will explicitly state the dataset's language and include revised figures to enhance comprehension of its linguistic aspects.

### [2. The definition of "belief state"]

Thank you for pointing out the need for a clearer definition of "belief state". In our paper, it conforms to the standard dialogue state tracking (DST) usage. A belief state in DST is an essential component that encapsulates a subject and its specific content, generally represented as (domain-slot-value) triples. In our context, it is confined to (slot-value) pairs within a single domain. The model's role is to predict these belief states at each dialogue turn, with "joint goal accuracy (JGA)" measuring the cumulative accuracy and "turn goal accuracy (TGA)" indicating individual turn accuracy. Even though various papers that proposed a dataset for DST use the term "belief state" without a distinct definition [1-3], we recognize that a more comprehensive definition could better assist readers less familiar with DST. Therefore, we will elaborate on this in the camera-ready version with relevant references [3-5] and support understanding by releasing the source code.
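
The difference between the two metrics can be illustrated with a minimal Python sketch. This is not the released evaluation code: the slot names, the dictionary representation of a belief state, and the simplified exact-match reading of TGA are all assumptions made for illustration.

```python
from typing import Dict, List

# A belief state within a single domain: slot name -> value.
# The slot names used in the example below are hypothetical.
BeliefState = Dict[str, str]


def accumulate(turn_updates: List[BeliefState]) -> List[BeliefState]:
    """Merge per-turn slot-value updates into cumulative belief states."""
    merged: BeliefState = {}
    cumulative: List[BeliefState] = []
    for update in turn_updates:
        merged = {**merged, **update}  # later turns overwrite earlier values
        cumulative.append(merged)
    return cumulative


def joint_goal_accuracy(gold: List[BeliefState], pred: List[BeliefState]) -> float:
    """Fraction of turns whose cumulative predicted state matches gold exactly."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    pairs = zip(accumulate(gold), accumulate(pred))
    return sum(g == p for g, p in pairs) / len(gold)


def turn_goal_accuracy(gold: List[BeliefState], pred: List[BeliefState]) -> float:
    """Fraction of turns whose turn-level update matches gold exactly
    (a simplified reading of TGA, assumed here for illustration)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Three turns; the prediction gets the second turn wrong.
gold = [{"home_score": "1"}, {"away_score": "1"}, {"home_score": "2"}]
pred = [{"home_score": "1"}, {"away_score": "0"}, {"home_score": "2"}]

jga = joint_goal_accuracy(gold, pred)  # only turn 1 matches cumulatively
tga = turn_goal_accuracy(gold, pred)   # turns 1 and 3 match individually
```

In this toy dialogue a single wrong value at the second turn stays in the cumulative state, so JGA drops to 1/3 while TGA remains at 2/3. This is the sense in which JGA is cumulative and TGA is turn-level.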

[1] Budzianowski et al., "MultiWOZ -- A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling", EMNLP 2018.
[2] Henderson et al., "The Second Dialog State Tracking Challenge", SIGDIAL 2014.
[3] Park et al., "KLUE: Korean Language Understanding Evaluation", NeurIPS 2021.
[4] Kim et al., "Mismatch between Multi-turn Dialogue and its Evaluation Metric in Dialogue State Tracking", ACL 2022.
[5] Dey et al., "Towards Fair Evaluation of Dialogue State Tracking by Flexible Incorporation of Turn-level Performances", ACL 2022.

### [3. The statement on Section 3.1.1]

We appreciate your attention to detail in Section 3.1.1. The statement regarding 'adding 45 minutes to the second half' was intended to emphasize that the 45 minutes represent the full duration of the first half, rather than a literal extension of the second half. This practice aligns with standard football statistical methods and makes the two halves easy to tell apart. For instance, if the raw data indicates '11 minutes into the second half,' it is adjusted to '56 minutes (11 + 45)' to prevent confusion between the first and second halves. In short, we aimed to use expressions commonly employed in prior research and standard football statistical systems [1-3].

[1] https://www.fifa.com/fifaplus/en/match-centre
[2] https://www.fotmob.com/
[3] Taniguchi et al., "Generating Live Soccer-Match Commentary from Play Data", AAAI 2019.

### [4. Details on Section 3.1.2]

Your feedback on Section 3.1.2 is invaluable. We will explicitly elaborate on the annotators' roles and tasks mentioned in item 1 of Section 3.1.2, detailing their responsibility to correct ASR-generated transcripts and to assign timestamps corresponding to events in the conversation, based on the guidelines. Specifically, the annotators are required to:

- Correct sentences transcribed from ASR to ensure accuracy.
- Ensure precise alignment with provided metadata for player names and home-away information.
- Assign timestamps corresponding to highlights within the match, ensuring they align with the game time indicated in the highlights video provided.
- Differentiate speakers between "caster" and "commentator".

We are committed to updating the manuscript to clarify this process. Additionally, the English and Korean versions of these guidelines will be available in the code repository for better transparency. Furthermore, we will include examples of the annotation process in the appendix to offer a concrete illustration of the annotation procedure.

### [5. Organization in Section 4 and 5]

We acknowledge the necessity for clarity in Sections 4 and 5. Section 4 outlines the tasks with the dataset, and Subsection 4.2 discusses the employed metrics. We introduce the "accumulated belief state" metric, also called JGA, and the "turn goal accuracy" (TGA) metric to compare conversational characteristics of commentators and casters. The "relative goal index" (RGI) metric is also presented to measure the balance between turn-level and dialogue-level accuracy. A higher RGI value (closer to 1) indicates an emphasis on overall dialogue flow understanding, while a lower RGI value (closer to 0) suggests a focus on local information.

In light of your feedback, we will enhance the contextualization of Tables 1, 2, and 3 in the revised version for better comprehension. In Table 1, we will outline slot-value pairs representing match events, including the unique challenge of predicting non-categorical scores without predefined values. The numerical values of Table 2 will be presented more clearly, specifically highlighting the "avg. Tokens / Turn" feature. For Table 3, we will include explanations in the caption to clarify terms, particularly RGI.

### [6. Overall writing]

We recognize the importance of manuscript organization and writing quality. We will undertake a thorough revision to enhance clarity and integrate the reviewer's feedback effectively.

### [7. The comparison with other datasets]

We appreciate your emphasis on comparing our dataset with others. In the revised version, we will include a detailed comparison with MultiWOZ zero-shot DST, where GPT-3.5-turbo achieved a state-of-the-art JGA score of 56.44 [1]. While we acknowledge the value of direct comparisons, our dataset's unique focus on 'user-user' interactions in sports commentary sets it apart. This characteristic distinguishes it from previous datasets, which were based on 'user-system' dialogues. We will add a section on this interaction in Table 2 to highlight these unique properties.

[1] Heck et al., "ChatGPT for Zero-shot Dialogue State Tracking: A Solution or an Opportunity?", ACL 2023.

We hope these planned revisions align with your expectations and address the points you raised. We appreciate your detailed review and your attention to detail, which help us enhance the quality and clarity of our work. Thank you once again for your insightful comments.

## Response to Reviewer 2

Thank you for your review and thoughtful feedback on our manuscript. We are glad that you recognized the main contribution of our paper: a new dataset that incorporates person-to-person dialogues between speakers with different roles. Below, we address the reviewer's comment on the extent of experiments.

We acknowledge the need for experiments with finetuning-based classifiers. Inspired by the BERT-based TripPy [1], we performed experiments using RoBERTa. However, the performance was notably worse than prompt-based approaches even in the first-half setting described in the manuscript (JGA: -15.29%p, TGA: -20.63%p, SA: -51.02%p, RSA: -57.05%p compared to GPT-3.5-turbo). We examined the results and identified several possible reasons for this outcome:

1. "KICK" stands out with an average of 40.4 tokens per turn, handling longer sentences per turn compared to other datasets, as stated in the manuscript.
As a result, traditional methodologies that rely on inputting the entire conversation history may not perform optimally for this dataset.
2. We observed difficulty in distinguishing between "home" and "away" values, suggesting the necessity for additional preprocessing to differentiate them within the same slot.
3. Unlike previous datasets with a straightforward 'user-system' order, "KICK" presented a dynamic order of caster and commentator, making it challenging for the model to explicitly distinguish between their roles.

Additionally, we have conducted further experiments using the GPT4-1106-preview model. In the scenario using both caster and commentator utterances, we observed overall strong performance metrics (JGA: +1.59%p, TGA: +12.64%p, SA: +7.40%p, RSA: +4.80%p compared to GPT-3.5-turbo, RGI: 0.6074), indicating better comprehension of the overall dialogue flow compared to GPT-3.5. The RGI also indicates a tendency to prioritize turn-level understanding relative to GPT-3.5 (RGI: 0.3517), a trend similarly observed when evaluating performance solely based on TGA. Moreover, we observed that JGA was higher when using only commentator utterances and TGA was higher when using only caster utterances, consistent with previous experiments in the manuscript. We will attach the detailed experimental results in the camera-ready version.

Furthermore, human evaluation was conducted to complement the quantitative analysis. To ensure the fairness of the experiment, it was conducted with five non-experts with no background in football. The results indicated a significant improvement over the baseline in the first-half setting (JGA: +19.42%p, TGA: +35.49%p, SA: +3.46%p, RSA: +12.64%p compared to GPT-3.5-turbo, RGI: 0.8155), suggesting that while large language models perform well across various tasks, they still fall short of human performance on the KICK dataset. We assume that there are two primary reasons for this phenomenon.
First, large language models currently lack explicit coreference capabilities [2-3], such as memorization, which limits their ability to remember the history of long conversations like those in KICK. Second, given that KICK is in Korean, the model performance could be affected by the multilingual ability of the model [4].

[1] Heck et al., "TripPy: A Triple Copy Strategy for Value Independent Neural Dialog State Tracking", SIGDIAL 2020.
[2] Heck et al., "ChatGPT for Zero-shot Dialogue State Tracking: A Solution or an Opportunity?", ACL 2023.
[3] Mullick et al., "Better Handling Coreference Resolution in Aspect Level Sentiment Classification by Fine-Tuning Language Models", EMNLP 2023 Workshop.
[4] OpenAI, "GPT-4 Technical Report", arXiv Preprint 2023.

Your constructive feedback on the need for additional experiments has significantly contributed to the refinement of our work. Thank you once again for your insightful comments.

## General Response

We extend our sincere gratitude to the reviewers for their thoughtful feedback and insightful suggestions on our manuscript. We appreciate the positive recognition of the strengths of our dataset. We have diligently addressed the major concerns raised by the reviewers:

- **Clarification of the dataset's language (R1)**: To ensure clarity, we will explicitly state KICK's language in the camera-ready version and incorporate parallel Korean examples in the appendix.
- **Elaboration on "belief state" (R1)**: To enhance understanding, we will provide a detailed definition in the revised version, supported by relevant references and the accompanying source code release.
- **Data preprocessing procedure (R1)**: We will clarify the data preprocessing steps in Section 3.1.1 to address the concerns raised and ensure clear delineation of each procedure, such as adjusting the time notation between the first and second halves.
- **Annotation guidelines (R1)**: We will enhance Section 3.1.2 by providing clear explanations of the annotators' tasks, including transcribing ASR-generated text and timestamps, along with guidelines and examples in the appendix.
- **Clarification on metrics and tables (R1)**: We aim to clarify the significance of each metric in Sections 4 and 5, ensuring their contributions to evaluating dataset quality are clearly outlined. Furthermore, detailed explanations will be included in the captions of Tables 1, 2, and 3 to facilitate the interpretation of numerical values.
- **Organization of the writing issues (R1)**: We are committed to revising the writing of the paper to align with the reviewers' feedback.
- **Comparison with other datasets (R1)**: We will provide a detailed comparison with zero-shot DST on MultiWOZ, showcasing GPT-3.5-turbo's state-of-the-art JGA score. Additionally, we will clarify the 'user-user' interactions in KICK, explaining the challenges of direct comparisons due to the distinctive nature of our dataset.
- **Necessity of additional experiments (R2)**: We have conducted additional experiments with RoBERTa, GPT-4, and human evaluation, and we report their performance.

We sincerely thank the reviewers for their constructive feedback, which has played a crucial role in enhancing the quality and clarity of our work.

## Response to Chairs

Dear Chairs,

We are grateful for the detailed reviews from the reviewers. However, while we appreciate every reviewer, we wish to address some concerns regarding the evaluation from Reviewer 1. Regarding the "belief state" concept, it is a standard term in dialogue state tracking (DST), often not elaborated upon in similar papers. The notation of adding 45 minutes to mark the second half in Section 3.1.1 is a standard practice in football commentary for clarity. The metric joint goal accuracy (JGA), mentioned in Section 4.2, is a well-established metric in DST.
We believe that the review from Reviewer 1 may not fully reflect the nuances of the sports domain or provide a comprehensive evaluation of the DST aspects. As our paper has received only two reviews, we hope these clarifications aid in a more accurate understanding of the concepts discussed.

Thank you for your thoughtful consideration and invaluable service to the community.

Sincerely,
Paper3385 Authors
