Xin Zhang
# ELSA Rebuttal

## Reviewer YJEr

### Weakness

> **Your Comment:** *The authors propose that $P(Y|X)$ is hard to estimate, then existing methods that estimate $\omega$ using the maximum likelihood estimation are not consistent anymore. To accurately estimate $\omega$, the authors propose to avoid estimating $P(Y|X)$ and estimating $P(X|Y)$ instead. However, it seems that estimating $P(Y|X)$ and $P(X|Y)$ have similar difficulty without any assumption.*

**Our Response:** Thanks for your comment. Although we rewrite the likelihood function using $p(\boldsymbol{x}|y)$ in (5), we do not intend to estimate $p(\boldsymbol{x}|y)$. Here $p(\boldsymbol{x}|y)$ serves as an intermediate component for deriving the perpendicular space. We choose $p(\boldsymbol{x}|y)$ because it is the same under the source and target distributions, so we do not need to distinguish between $p_s(y|\boldsymbol{x})$ and $p_t(y|\boldsymbol{x})$. Our implementation never estimates $p(\boldsymbol{x}|y)$, so no additional assumptions are needed. In the end, our proposed estimator (see (11) and (12)) still uses $p_s(y|\boldsymbol{x})$ rather than $p(\boldsymbol{x}|y)$. However, unlike the maximum likelihood estimator, which **requires** a correctly specified model, our ELSA estimator adopts the proposed semiparametric framework, which is more robust to model misspecification.

### Questions

> **Your Comment:** *Why estimating $P(X|Y)$ instead of estimating $P(Y|X)$ make the estimation $\omega$ more accurate.*

**Our Response:** Thanks for your comment. First, we would like to highlight that we do not estimate $p(\boldsymbol{x}|y)$. Instead, we treat it as an infinite-dimensional (i.e., nonparametric) nuisance function, and we use it to derive the perpendicular space without estimating it.
The semiparametric model is more robust to model misspecification and offers more flexibility, so that we can "integrate" any classification model without calibration (see Section 4.2 in our paper).

> **Your Comment:** *It would be great to discuss the benefit of asymptotic normality to estimate $\omega$.*

**Our Response:** Asymptotic normality enables hypothesis testing and statistical inference. More specifically, for the null hypothesis $H_0:\boldsymbol{\omega}_0=\mathbf{1}$ (i.e., no distribution shift), one can construct a Wald statistic from the asymptotic normal distribution of the estimator. We can also construct confidence intervals for the estimated importance weights. We will add a remark under Theorem 3.2 to highlight this.

### Limitation

> **Your Comment:** *To make the method based on estimating $P(X|Y)$ conceptually more accurate than the method based on estimating $P(Y|X)$. Additional assumptions should be required to the best of my knowledge.*

**Our Response:** We do not need additional assumptions because we do not need to estimate $p(\boldsymbol{x}|y)$. The improved accuracy hinges on the proposed semiparametric moment-matching framework and the carefully designed $h_{\mathrm{ELSA}}(\boldsymbol{x};\boldsymbol{\omega})$ in (11).

## Reviewer BekR

### Weakness

> **Your Comment:** *The paper is hard to follow in certain places. In particular, the description and motivation of the method are not completely clear. Theoretically or empirically, it is unclear why ELSA gets better estimation than MLLS and BBSE. Importantly, what are the properties that the ELSA estimator depends on?*

**Our Response:** Thanks for the suggestion. We will improve our presentation according to your comments, and we elaborate on the description and motivation of our method in the rest of our responses. As for the properties, our ELSA estimator is derived under the label shift assumption and belongs to the class of Z-estimators (see van der Vaart 1998).
We do not need additional assumptions beyond the standard regularity assumptions for Z-estimators. Detailed discussions can be found in the following responses.

[van der Vaart 1998] van der Vaart, A. W. (1998). *Asymptotic Statistics*. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802256

### Questions

> **Your Comment:** *It is hard to follow how the authors derived the equation (11)*

**Our Response:** Thanks for your comments. We elaborate on it point by point below. <!-- We have elaborated the derivation of $h_{\mathrm{ELSA}}(\boldsymbol{x})$ later in the our responses. -->

> * *Authors have not defined the phrase "nuisance tangent spaces". It might be good the elaborate on these things in Appendix A. It might also be good to include some background and formal statement of how the influence functions of estimators lay in the perpendicular space.*

**Our Response:** Thanks for your suggestion. We will include their definitions and background details in the Appendix. <!-- We have provided more explanations later in the reponses, and will add more details on the background. -->

> * *Given authors have some space at the end of page 8, it would be good to structure equations in Theorem 2.1 in a better way. The significance of Theorem 2.1 is unclear.*

**Our Response:** Thanks for your suggestion. We will reformulate Theorem 2.1 in the revision. The significance of Theorem 2.1 is that it characterizes the perpendicular space $\Lambda^\perp$, which guides the choice of the influence function. The perpendicular space also helps us design the function $h_{\mathrm{ELSA}}(\boldsymbol{x})$; detailed discussions are given below.

> * *It might good to somewhere define RAL as regular and asymptotically linear.*

**Our Response:** Thanks for the suggestion. We will add this definition to the appendix.

> * *The significance of the assumption on function $g_p$ in Theorem 3.2 is hard to follow.*

**Our Response:** The proposed estimator belongs to the class of Z-estimators, and the conditions in Theorem 3.2 are the standard regularity assumptions for Z-estimators. More details on Z-estimators and their regularity assumptions can be found in Chapter 5 of van der Vaart (1998).

> * *Overall, the connection between finding an importance weight estimator and finding a perpendicular space is not clear.*

**Our Response:** Our main use of semiparametric models is to derive the orthogonal complement of the nuisance tangent space (i.e., the perpendicular space) $\Lambda^{\perp}$. By semiparametric theory (Bickel et al. 1998, Tsiatis 2006), this space corresponds to the influence functions for estimating the parameter of interest $\boldsymbol{\omega}$. In other words, every element of $\Lambda^{\perp}$ corresponds to a RAL estimator of $\boldsymbol{\omega}$. This space also indicates that, in the interest of efficiency, any function that is *not in this space* should ***not*** be used for estimating $\boldsymbol{\omega}$. For example, if $\boldsymbol{\phi}$ is a function with $\boldsymbol{\phi}\not\in\Lambda^{\perp}$, one should use its projection $\Pi(\boldsymbol{\phi}|\Lambda^{\perp})$ instead of $\boldsymbol{\phi}$ itself, where

$$
\boldsymbol{\phi}=\underbrace{\boldsymbol{\phi}-\Pi(\boldsymbol{\phi}|\Lambda^{\perp})}_{\in\Lambda}\oplus\underbrace{\Pi(\boldsymbol{\phi}|\Lambda^{\perp})}_{\in\Lambda^{\perp}}
$$

and $\Pi(\boldsymbol{\phi}|\Lambda^{\perp})$ denotes the projection of $\boldsymbol{\phi}$ onto $\Lambda^{\perp}$. Using the projection $\Pi(\boldsymbol{\phi}|\Lambda^{\perp})$ improves efficiency (empirically, it decreases the MSE of the estimator). Moreover, if $\boldsymbol{\phi}\not\in\Lambda^{\perp}$, we cannot obtain a RAL estimator from $\boldsymbol{\phi}$, which makes the resulting estimator difficult to characterize.
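
To make the decomposition above concrete, here is a minimal finite-dimensional sketch. It assumes, for illustration only, that the nuisance tangent space $\Lambda$ is approximated by the linear span of a few mean-zero nuisance score functions evaluated on a sample, and that the $L_2(P)$ inner product is replaced by a sample average. The helper `project_onto_perp` and the simulated arrays are hypothetical and are not taken from the ELSA implementation. Under these assumptions, $\Pi(\boldsymbol{\phi}|\Lambda)$ is the least-squares fit of $\boldsymbol{\phi}$ on the nuisance scores and $\Pi(\boldsymbol{\phi}|\Lambda^{\perp})$ is the residual.

```python
import numpy as np


def project_onto_perp(phi, nuisance_scores):
    """Split phi into its components in span(nuisance_scores) and its orthogonal complement.

    phi: (n,) values of a candidate estimating function on n sample points.
    nuisance_scores: (n, m) values of m (hypothetical) nuisance score functions.
    The inner product <f, g> is approximated by the sample mean of f * g.
    """
    # Least-squares coefficients give the projection onto span(nuisance_scores).
    coef, *_ = np.linalg.lstsq(nuisance_scores, phi, rcond=None)
    proj_lambda = nuisance_scores @ coef   # approximates Pi(phi | Lambda)
    proj_perp = phi - proj_lambda          # approximates Pi(phi | Lambda^perp)
    return proj_lambda, proj_perp


rng = np.random.default_rng(0)
n, m = 10_000, 3
# Simulated mean-zero nuisance scores and a candidate function phi that is
# partly explained by them, so phi is NOT in Lambda^perp.
nuisance_scores = rng.standard_normal((n, m))
phi = nuisance_scores @ np.array([0.8, -0.5, 0.3]) + rng.standard_normal(n)

proj_lambda, proj_perp = project_onto_perp(phi, nuisance_scores)

# The residual is orthogonal to every nuisance score (up to floating-point error).
inner_products = nuisance_scores.T @ proj_perp / n
# Pythagoras: E[phi^2] = E[proj_lambda^2] + E[proj_perp^2], so the projected
# function never has a larger second moment than phi itself.
second_moment_phi = np.mean(phi**2)
second_moment_perp = np.mean(proj_perp**2)
print(np.max(np.abs(inner_products)), second_moment_phi, second_moment_perp)
```

In this toy setting, the smaller second moment of the residual is the finite-dimensional analogue of the efficiency gain from replacing $\boldsymbol{\phi}$ with $\Pi(\boldsymbol{\phi}|\Lambda^{\perp})$.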

> * *It would be good to elaborate on authors' obtain equation (11).*

**Our Response:** The construction of $h_{\mathrm{ELSA}}(\boldsymbol{x})$ starts from the score function with respect to $\boldsymbol{\omega}^{(-1)}=(\omega_1,\dots,\omega_{k-1})$, denoted by $\mathbf{S}_{\boldsymbol{\omega}}(\boldsymbol{x})$. Its $i$-th element, which we write as $S_i(\boldsymbol{x})$, is given by

$$
S_i(\boldsymbol{x})=[\mathbf{S}_{\boldsymbol{\omega}}(\boldsymbol{x})]_i\propto\frac{p_s(\boldsymbol{x})}{p_t(\boldsymbol{x})}\left\{p_s(y=i|\boldsymbol{x})-p_s(y=k|\boldsymbol{x})\right\},\quad i=1,\dots,k-1.
$$

We could use $\mathbf{S}_{\boldsymbol{\omega}}(\boldsymbol{x})$ directly to construct the influence function of a RAL estimator, but we can improve efficiency (i.e., reduce the estimation error) by projecting it onto the perpendicular space $\Lambda^{\perp}$. Prioritizing computational efficiency and feasibility, we approximate the projection $\Pi(\mathbf{S}_{\boldsymbol{\omega}}(\boldsymbol{x})|\Lambda^\perp)$ by

$$
\Pi(S_i(\boldsymbol{x})\mid \Lambda^\perp)\propto \kappa(\boldsymbol{x})S_i(\boldsymbol{x}),
$$

where $\kappa(\boldsymbol{x})$ is a "bridging" function that must satisfy

$$
\frac{1-\kappa(\boldsymbol{x})}{\kappa(\boldsymbol{x})}=E_t\left\{\frac{1-\Pr(R=1|Y,\boldsymbol{X})}{\Pr(R=1|Y,\boldsymbol{X})}\mid\boldsymbol{x}\right\}.
$$

Under the label shift assumption, this further simplifies to

$$
\frac{1-\kappa(\boldsymbol{x})}{\kappa(\boldsymbol{x})}=\frac{1-\pi}{\pi}E_t\left\{\frac{p_t(Y)}{p_s(Y)}\mid\boldsymbol{x}\right\}.
$$

Next, we show that the proposed function $h_{\mathrm{ELSA}}(\boldsymbol{x})$ is proportional to $\kappa(\boldsymbol{x})S_i(\boldsymbol{x})$.
Since $\kappa(\boldsymbol{x})S_i(\boldsymbol{x})\propto\kappa(\boldsymbol{x})\dfrac{p_s(\boldsymbol{x})}{p_t(\boldsymbol{x})}\left\{p_s(y=i|\boldsymbol{x})-p_s(y=k|\boldsymbol{x})\right\}$, we only need to verify that the denominator of $h_{\mathrm{ELSA}}(\boldsymbol{x})$ is proportional to the reciprocal of $\kappa(\boldsymbol{x})\dfrac{p_s(\boldsymbol{x})}{p_t(\boldsymbol{x})}$. With $\rho=p_t(Y)/p_s(Y)$, the denominator of $h_{\mathrm{ELSA}}(\boldsymbol{x})$ is

$$
\begin{aligned}
&\frac{E_s(\rho^2\mid \boldsymbol{x})}{\pi} + \frac{E_s(\rho\mid \boldsymbol{x})}{1-\pi}\\
\propto& \frac{p_t(\boldsymbol{x})}{p_s(\boldsymbol{x})}\frac{1-\kappa(\boldsymbol{x})}{\kappa(\boldsymbol{x})}\frac1{1-\pi} + \frac{p_t(\boldsymbol{x})}{p_s(\boldsymbol{x})}\frac1{1-\pi}\\
\propto& \frac{p_t(\boldsymbol{x})}{p_s(\boldsymbol{x})}\frac{1}{\kappa(\boldsymbol{x})}.
\end{aligned}
$$

Here we use $E_s(\rho\mid\boldsymbol{x})=p_t(\boldsymbol{x})/p_s(\boldsymbol{x})$ and $E_s(\rho^2\mid\boldsymbol{x})=\{p_t(\boldsymbol{x})/p_s(\boldsymbol{x})\}E_t(\rho\mid\boldsymbol{x})$, which hold because $p(\boldsymbol{x}|y)$ is shared by the source and target distributions, together with the label-shift expression for $\{1-\kappa(\boldsymbol{x})\}/\kappa(\boldsymbol{x})$ above.

> **Your Comment:** *How did the authors implement calibration and implement MLLS? Results about computational efficiency would depend a lot on these? Did authors use LBFGS to obtain convergence of the calibration parameters as it can be faster? Did the authors use the same number of CPU cores with all the methods?*

**Our Response:** We ran MLLS and the calibrations with the Python package `abstention`, developed by the authors of Alexandari et al. (2020); we believe it is the state-of-the-art implementation of MLLS. We checked the calibration code in `abstention`: for the optimization, it indeed uses the L-BFGS-B method for fast computation. In our experiments, all methods were run in the same environment, including the same number of CPU cores.

> **Your Comment:** *Will the authors be releasing an implementation of the ELSA estimator? It might be good to release an implementation of the approach either in the Appendix of the paper as a code or as a github repository.*

**Our Response:** Yes, we have built a Python package for our proposed ELSA method.
We will release it on GitHub after the paper is published.

## Reviewer jn6Z

### Weakness

> **Your Comment:** *Line 97 states that BBSE method replaces $x$ with $\hat{y}$. There maybe some error in this statement.*

**Our Response:** Thanks for your comment. The replacement of $x$ with $\hat{y}$ was proposed in Lemma 1 and Proposition 5 of Lipton et al. (2018).

> **Your Comment:** *Line 98 and 99 state that $\hat{p}_s(y|x)$ is a trained model in the where clause. However, $\hat{p}_s(y|x)$ does not occur in the statement before the where clause statement.*

**Our Response:** Thanks for your comment. $\hat{p}_s(y|x)$ can be any classification model trained on data from the source distribution. We will add more details in the revision for clarification.

> **Your Comment:** *Line 403: Furthermore, We -> we*

**Our Response:** Thanks for pointing it out. We have corrected the typo in the revised manuscript.

### Limitation

> **Your Comment:** *The classification performance of the method is not known, even if the estimation error is low.*

**Our Response:** Thanks for your comment. We agree that classification performance is an important evaluation metric. The table below compares the adaptation methods across datasets. The reported metric is the improvement in *classification accuracy* of the domain-adapted model relative to the original model (Alexandari et al. 2020). For example, the value $+5.86\%$ for ELSA on MNIST means that ELSA adaptation improves the classification accuracy by $5.86\%$ relative to the model without adaptation. We fixed the sample size at $4500$ and used a Dirichlet shift with $\alpha=0.1$. ELSA achieves the largest improvement on all three datasets. We will include this classification performance comparison in our final manuscript.

| Adaptation | MNIST | CIFAR-10 | CIFAR-100 |
|------------|--------|----------|-----------|
| BBSE-hard | +5.74% | +6.47% | +16.10% |
| RLLS-hard | +5.74% | +6.47% | +17.12% |
| BBSE-soft | +5.76% | +6.51% | +16.45% |
| RLLS-soft | +5.77% | +6.52% | +17.66% |
| MLLS | +5.75% | +6.55% | +14.01% |
| ELSA | **+5.86%** | **+6.76%** | **+21.27%** |
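
As an illustration of what "domain-adapted model" can mean here, the sketch below shows one common way to adapt a source classifier once importance weights are available. Under label shift, $p_t(y|\boldsymbol{x})\propto \omega_y\,p_s(y|\boldsymbol{x})$ with $\omega_y=p_t(y)/p_s(y)$, so the source posteriors are reweighted, renormalized, and then used for the argmax prediction. This is an assumption about the adaptation step, not the exact pipeline behind the table; the function names, toy posteriors, labels, and weights are hypothetical, and the improvement is computed here as a simple difference in accuracy.

```python
import numpy as np


def adapt_posteriors(source_probs, weights):
    """Reweight source posteriors p_s(y|x) by w_y = p_t(y)/p_s(y) and renormalize.

    source_probs: (n, k) array of p_s(y|x) for n target samples.
    weights: (k,) array of estimated importance weights.
    """
    adapted = source_probs * weights  # multiply column y by w_y
    return adapted / adapted.sum(axis=1, keepdims=True)


def accuracy(probs, labels):
    """Fraction of samples whose argmax class equals the true label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))


def accuracy_improvement(source_probs, weights, labels):
    """Accuracy of the adapted model minus accuracy of the unadapted model."""
    adapted = adapt_posteriors(source_probs, weights)
    return accuracy(adapted, labels) - accuracy(source_probs, labels)


# Toy example (hypothetical numbers): class 1 is more frequent in the target.
source_probs = np.array([
    [0.6, 0.4],
    [0.55, 0.45],
    [0.2, 0.8],
    [0.7, 0.3],
])
labels = np.array([1, 1, 1, 0])
weights = np.array([0.5, 2.0])
print(accuracy_improvement(source_probs, weights, labels))  # prints 0.25
```

In this toy case the unadapted classifier is correct on 2 of 4 samples and the reweighted one on 3 of 4, so more accurate importance weights translate directly into better target-domain predictions.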
