## Reviewer U69q -- Reject

**Q1: Comparison with other models**

A1: Thank you for your suggestions; the works you mention are valuable, and our response is as follows:

* The dataset we use is "widely used" in earlier fixed-format advertising comprehension tasks, but we propose a **novel** generic ad comprehension task that allows a comprehensive and complete understanding of ad content, no longer limited to a fixed format. Since advertisement understanding is a novel task, we could not find a published baseline for direct comparison. However, we have compared AdGPT with HuggingGPT, the latest and most influential comparable work as of our submission, which should demonstrate the validity of our approach.
* As for the models you mention, we have already compared our model with state-of-the-art large vision-language models (MiniGPT-4, mPLUG-owl, and LLaVA, which you encourage us to compare with) in our **supplementary materials**, and we encourage you to review those carefully. In addition, our work differs from BLIP and BLIP-2: they focus on image captioning, whereas our model is designed to understand images, which requires a detailed descriptive paragraph. Regarding InstructBLIP, it was submitted to *arXiv* after our submission deadline, so we were unable to compare against it, and GPT-4 was still inaccessible as of our submission.
* Thank you for the suggestion about transferring our method to other LLMs. We transferred our method to Flan-T5 and found it to be equally effective: Flan-T5 with our method performs better in 58.9% of cases and achieves the same performance in 39.2% of cases.
* Finally, the new ad comprehension task has no ground truth, so the models cannot be directly fine-tuned.

**Q2: Why did the authors need to introduce a user study for the understanding task, when the Pitt Ads dataset already contains many evaluation tasks and corresponding metrics?**

A2: Our reasons for not using the previous evaluation tasks and metrics are as follows:

1. The previous evaluation metrics were based on understanding advertisements through fixed-format sentences. Because LLMs have made current models far more linguistically expressive than earlier work, new evaluation tasks and metrics are needed. We therefore propose a more general and meaningful advertisement understanding task that provides a more comprehensive summary of an advertisement.
2. For this novel ad understanding task, which has no ground truth, we first considered manual evaluation of the model-generated understanding, i.e., a user study. We then proposed a more convenient evaluation metric, the Generative Similarity Score (GSS), conducted extensive experiments to verify the consistency of GSS with the user study, and demonstrated its reliability. This provides an alternative way to evaluate a model's ability to understand advertising (a rough illustrative sketch of an embedding-based similarity score follows this list).
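To give a concrete feel for what a generation-based similarity metric can look like, here is a minimal sketch built on off-the-shelf sentence embeddings. The library, the `all-MiniLM-L6-v2` checkpoint, and the pairwise cosine formulation are all illustrative assumptions; the GSS defined in the paper may be computed differently.

```python
# Rough sketch of an embedding-based similarity between two ad descriptions.
# Assumes the sentence-transformers package; the checkpoint is a placeholder,
# not necessarily what the paper's GSS uses.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def generative_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the embeddings of two ad-understanding paragraphs."""
    emb = model.encode([text_a, text_b], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

# Example: compare two generated understandings of the same advertisement.
score = generative_similarity(
    "This ad promotes an electric car by highlighting its long driving range.",
    "The advertisement sells an electric vehicle and stresses battery range.",
)
print(f"similarity: {score:.4f}")
```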
## Reviewer 6hCq -- Borderline Reject

**Q1: The motivation behind understanding advertisements**

A1: It is true that unclear messaging in an advertisement could be attributed to a design failure rather than to a failure of the reader's comprehension. However, the motivation for ad understanding is, on the one hand, to understand well-designed ads and perform downstream tasks based on that understanding, and on the other hand, to analyze badly designed ads and feed the analysis back to the ad agency in a timely manner. Our work aims to make the process of understanding advertising more efficient and concise by automating it.

**Q2: How to ensure the reliability and accuracy of the visual model**

A2: The accuracy of the visual model is indeed a bottleneck: we cannot guarantee that the output of the visual expert models is always correct. Our contribution, however, lies in improving the effectiveness of understanding image content given the output of the visual expert models. Our approach can be seen as a plug-and-play module that can be integrated with more powerful visual expert models, and we believe it can be genuinely helpful for companies or individuals who cannot fine-tune their own large vision-language models.

**Q3: What is the effect of shared memory?**

A3: We appreciate your valuable advice. We explored the effect of shared memory by randomly sampling 300 examples and running the user study, as shown in the table below. In summary, shared memory yields a 3% improvement in the user study over the unshared variant. The results of this experiment will be included in the final version.

| Model | User Study | Generative Similarity Score |
| -------- | -------- | -------- |
| AdGPT w/ shared memory | 27.8% | 0.5697 |
| AdGPT w/o shared memory | 24.8% | 0.5844 |

## Reviewer TvRf -- Borderline Accept

**Q1: Overclaimed multi-modality ability of our work**

A1: Our work extends the multimodal capabilities of ChatGPT through plugins. Inspired by HuggingGPT [1], which combines images and text and focuses mainly on planning the use of visual expert models, we propose AdGPT, which focuses more on how to reason better about the results once visual information has been acquired. In addition, the spirit of ACM MM encourages researchers to build meaningful applications with cutting-edge technology, and our work follows this spirit.

**Q2: The user study should be conducted with a broader population**

A2: Thank you for the valuable advice. We conducted an additional user study with five undergraduate students who have no Computer Vision background. This additional experiment shows that our model beats mPLUG-owl in 56.01% of cases and achieves the same performance in 24.48% of cases. We will include this new result in the final version of our paper.

| Model | User Study | Generative Similarity Score |
| -------- | -------- | -------- |
| mPLUG-owl | 24.48% | 0.5699 |
| Our model | 56.01% | 0.5844 |

[1] Shen, Yongliang, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. "HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face." arXiv preprint arXiv:2303.17580 (2023).
## Reviewer dKD2 -- Borderline Accept

**Q1: Additional explanation of the unclear details mentioned in the Further Comments**

A1: Additional explanations of the unclear details are as follows (a schematic sketch of this flow appears after the list):

1. The fourth step of AdGPT requires both the observation information and an adaptive chain of thought; the classification step is used to generate the adaptive inference chain.
2. AdGPT can be viewed as a system in which obtaining observations (step 1) is an integral component.
3. The classification shapes the adaptive chain of thought, so ChatGPT produces different chains of thought for different categories, each with distinct priorities. For example, product advertisements may emphasize a product's selling points, while social advertisements may aim to encourage people to engage in a specific behavior.
4. HuggingGPT uses images rather than observation information as input, and incorrect visual information can be regarded as a limitation of HuggingGPT's capabilities. In Figure 1, we feed the observation information to non-prompted ChatGPT, rather than to HuggingGPT, to better illustrate our method.
5. The final result carries the features of the corresponding category. For example, for a product ad, AdGPT tends to go over the advantages and selling points of the product, whereas for a public service ad, AdGPT tends to highlight the actions the ad calls on people to take.

We really appreciate your valuable advice; in the new version of the paper we will make the article more specific and clear according to your comments.
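To make the step ordering in the list above easier to follow, below is a minimal, self-contained sketch of the observation, classification, adaptive chain-of-thought, and LLM steps. Every function here is a hypothetical stub written for illustration only; none of the names, the prompt wording, or the toy classifier reflect AdGPT's actual implementation.

```python
# Schematic sketch of the four-step flow described above (illustrative stubs only).
# In the real system, step 1 would call visual expert models and step 4 would call
# ChatGPT; here they are plain functions so the sketch runs on its own.

def get_visual_observations(image_path: str) -> str:
    # Step 1 stub: visual experts (captioning, OCR, object detection, ...).
    return f"text and objects observed in {image_path}"

def classify_ad(observations: str) -> str:
    # Step 2 stub: decide the ad category from the observations.
    return "product" if "price" in observations else "public service"

def build_adaptive_chain_of_thought(category: str, observations: str) -> str:
    # Step 3: the category shapes the reasoning chain -- product ads focus on
    # selling points, public-service ads on the action the ad calls for.
    focus = ("the product's selling points" if category == "product"
             else "the action the ad asks people to take")
    return (f"Observations: {observations}\n"
            f"This is a {category} advertisement. Reason step by step about "
            f"{focus}, then summarise the ad in a detailed paragraph.")

def query_llm(prompt: str) -> str:
    # Step 4 stub: in practice this would be a ChatGPT API call.
    return f"[LLM summary based on: {prompt[:60]}...]"

def understand_ad(image_path: str) -> str:
    observations = get_visual_observations(image_path)                 # step 1
    category = classify_ad(observations)                               # step 2
    prompt = build_adaptive_chain_of_thought(category, observations)   # step 3
    return query_llm(prompt)                                           # step 4

print(understand_ad("example_ad.jpg"))
```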
**Q2: More qualitative and quantitative results to verify the effectiveness of AdGPT**

A2: Thank you for the valuable advice. We ran additional experiments, both qualitative and quantitative. In keeping with the rebuttal rules, we submitted an anonymous GitHub page with the qualitative results to the Area Chair, who will decide whether to present them in the open discussion. We also ran an additional quantitative experiment on 300 images: the Generative Similarity Score of mPLUG-owl is 0.5699, compared with 0.5844 for AdGPT, and our model does better than mPLUG-owl in 56.01% of cases, achieving the same performance in 24.48% of cases.

| Model | User Study | Generative Similarity Score |
| -------- | -------- | -------- |
| mPLUG-owl | 24.48% | 0.5699 |
| Our model | 56.01% | 0.5844 |

**Q3: Relationship between the 3000 images and meaningful advertisements**

A3: The Pitt Ads dataset contains many meaningful advertisements. To our knowledge, MetaCLUE [1], a set of vision tasks on visual metaphor, collects its images from the Pitt Ads dataset, so most advertisements in the Pitt Ads dataset can be considered "meaningful advertisements".

[1] Akula, Arjun R., Brendan Driscoll, Pradyumna Narayana, Soravit Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi et al. "MetaCLUE: Towards comprehensive visual metaphors research." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 23201-23211. 2023.

## To AC

Dear Area Chairs,

We are writing this letter to draw your attention to the inappropriate rejection reasons given by *Reviewer U69q*.

First, he/she listed a number of models we were asked to compare against. We have carefully examined our paper and clarify as follows:

1. For the comparison with "competitive methods which essentially do the same thing", we have already compared against MiniGPT-4, mPLUG-owl, and LLaVA (mentioned by *Reviewer U69q*) in our supplementary submission.
2. BLIP and BLIP-2 do not do the same thing as our work; they cannot accomplish the ad understanding task we propose. Additionally, InstructBLIP and LLaMA are not officially published works but only arXiv papers, and according to ACM MM's policy they are not required comparisons.

Second, his/her comment that "Lack of evaluation is the biggest weakness in the current work" is a misunderstanding. We have valid reasons for not using the previous evaluation tasks and metrics:

1. Thanks to the significant advancement of LLMs, models are now far more linguistically expressive than before. New evaluation tasks and metrics are therefore needed, and we propose a more comprehensive and meaningful advertising comprehension task that captures the essence of advertising.
2. We not only adopt a user study but also propose a more convenient evaluation metric, the **Generative Similarity Score (GSS)**. We conducted extensive experiments to verify the consistency of GSS with the user study and its reliability; other reviewers agreed with this metric (*Reviewers TvRf and dKD2*) and called it **creative** (*Reviewer TvRf*). We therefore believe our work has an appropriate evaluation, and *Reviewer 6hCq* also noted our **thorough metric evaluations**.

It appears that *Reviewer U69q* did not spend much time reviewing this work and did not understand it; rejecting it on the basis of inaccurate comments is not acceptable. We sincerely request that you look at our rebuttal and the main paper, consider our appeal, and render a more convincing decision.

In addition, for the further examples requested by *Reviewer dKD2*, we have put additional qualitative results on an anonymous GitHub page, AdGPT1.github.io; you can decide whether to release these results to the reviewers during the open discussion. We double-checked the link to make sure it is anonymous, and we sincerely hope the reviewers will see these results.

Thanks for your time!

Authors

## To all

We thank the reviewers for their constructive comments. It is inspiring to see that they acknowledge the **motivation and significance** of the paper's approach of leveraging text generation for advertisement understanding (TvRf) and find it **interesting** (dKD2, 6hCq) and **tactful** (TvRf). They also appreciate the **effectiveness of AdGPT** and its improvement over HuggingGPT (TvRf, dKD2). In addition, they endorse our proposed **new evaluation metric**, the Generative Similarity Score (TvRf, dKD2), and find it **creative** (TvRf). The paper's **well-written content** (U69q), **thorough metric evaluations** (6hCq), and **inclusion of multiple case studies** (6hCq) are recognized as strengths. We believe the remaining issues can be fully addressed, and we respond to each reviewer in detail under their comments.
