Jiuhai Chen
### Reviewer Mnik

Thanks for the detailed comments. We address each one below.

**Questions**: The notation used is easy to understand, as is the mathematical explanation in Section 3, which is presented in a comprehensive but concise manner. "for EBBS we must fit the weak-learners to gradients from both the training and test nodes". This is the sentence that I am most concerned about, as the use of test data in the training phase may render the results obtained invalid. Although the authors give their own explanation of why the test nodes should also be used during training, i.e. for the propagation of information in the graph, if the test labels are used during training there is no longer any separation between train and test. I have this doubt because the labels are used to calculate function-space gradients. Is this correct?

**Response**: Actually test node *labels* are never used during training; critically, only test node *input features* are utilized. This is because the test node input features contribute to the prediction function for nearby training node labels. Consequently, there is no data leakage issue, and the function-space gradient signal used for training is completely independent of test node labels.
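To make this concrete, below is a minimal NumPy sketch (hypothetical names, not our released implementation) of how function-space gradients can be formed when propagation is represented by a matrix $P$: test features enter through $P$, but only training labels ever appear.

```python
import numpy as np

def function_space_gradients(base_preds, y_train, train_idx, P):
    """Minimal sketch: P is a (num_nodes x num_nodes) propagation matrix,
    e.g., the k-step smoothing operator approximated in equation (6)."""
    Z = P @ base_preds                 # propagation mixes ALL node features
    resid = np.zeros_like(Z)
    resid[train_idx] = Z[train_idx] - y_train  # loss uses TRAIN labels only
    # Chain rule back through propagation: the gradient is generally nonzero
    # at test nodes (their features influence predictions at nearby training
    # nodes), yet test labels never enter the computation.
    return P.T @ resid
```

The weak learners are then fit to these gradients at every node, which is why test nodes participate in training without any label leakage.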
**Questions**: Algorithm 1 could be described in a little more detail. The analysis of the convergence of the method by theorem is very good. The remarks connected to the theorem are interesting, but could have been treated in more detail (they are in part in the supplements).

**Response**: This year it appears that the page length for ICLR will not be extended to allow for extra content addressing reviewer concerns. So while we agree with the reviewer that more detail in these areas would be quite helpful, unfortunately we do not presently have sufficient space outside of the supplementary.

**Questions**: If experiments are done with different random seeds (as stated), then results in Table 1 should be reported with their corresponding standard deviation or confidence interval. Why are some results reported with different decimal places in the table? I am talking specifically about the CS column, but also for the Slap, DBLP and Phy columns one decimal place could be added.

**Response**: Comprehensive discussion of standard errors and statistical significance was originally deferred to the supplementary to save space, as noted in the Table 1 caption. However, we agree with the reviewer that it would be nice to include at least some of these results in the main paper, since they only strengthen the support for EBBS. Therefore, while it is hard to expand Table 1 to include all details within the page limit, we have added a new small Figure 1 and attendant discussion to demonstrate how our approach actually achieves the best performance across all random trials. Regarding the decimal place issue, this was our formatting mistake, and we have made them all consistent in the revision. Thanks for catching this.

**Questions**: If the datasets were taken from Ivanov & Prokhorenkova (2021), why was Wiki not taken? The reason should be that, being a homogeneous dataset, then, as explained by Ivanov & Prokhorenkova (2021), "neural network approaches are sufficient to achieve the best results". It would still be interesting as a comparison.

**Response**: We did not run comparisons on the Wiki data in the original submission because it is a regression task with homogeneous node features, and therefore we did not feel it was an important benchmark for tabular+graph models (we did, however, include OGB-ArXiv results in Table 2, largely because this is a very famous node classification benchmark). Indeed, from Table 2 of Ivanov & Prokhorenkova (2021), the BGNN model was not competitive on the Wiki data, being outperformed by most GNN models. Regardless, per the reviewer's comment, we have now tried running our EBBS model on the Wiki data with splits from Ivanov & Prokhorenkova (2021), and the trial-averaged performance is 41432 RMSE, which is better than BGNN's 48119 RMSE. Note that the average EBBS improvement is 6687, which is significant given that the stdev of the gap across trials is only 3473 (and EBBS has lower error across all trials).

**Questions**: Also as a comparison with them, the House and VK datasets could also be used for classification. They also report the standard deviation of all results.

**Response**: House_class and VK_class are actually redundant, synthetic classification datasets. More specifically, in the BGNN paper of Ivanov & Prokhorenkova (2021), the original House and VK regression datasets were merely modified by converting numerical target labels into several discrete classes via thresholding. Thus we omitted these two artificial datasets and instead employed two real classification datasets popular in the GNN literature: Coauthor-CS and Coauthor-Phy.

**Questions**: Also, the results in the table match those of Ivanov & Prokhorenkova (2021), but I do not understand why their LightGBM results row has become the CatBoost row in this article for the Slap, DBLP and OGB-ArXiv datasets. Is this perhaps an error?

**Response**: Yes, this discrepancy seems to originate from the original arXiv version; please refer to the camera-ready version of Ivanov & Prokhorenkova (2021).

**Questions**: "Although tabular graph data for node classification is widely-available in industry, unfortunately there is currently little publicly-available, real-world data that can be used for benchmarking." This sentence is very vague and I am not fully convinced of its veracity.

**Response**: As future work, we are looking to create more suitable classification benchmarks that combine tabular node features with graphs. This is critical given that, while industry is awash in such datasets (or the means of extracting them from widely-prevalent relational databases), representative publicly-available benchmarks are slim to none.

**Questions**: The mention of the method called CatBoost+ is interesting, but it is given too little space. Why is it not considered in "ours"? If the idea is picked up by some other work, let it be mentioned properly. "revealing that it may be more robust to non-ideal use cases". That's why it might be interesting to add homogeneous datasets and see if it applies there too. While in the main part it says "This suggests that in new application domains it may conceivably be easier to adapt EBBS models", in the supplements it says "It shows EBBS can be run with mostly shared hyperparameters across all datasets". I don't think there are enough experiments/results to say that, but I'd stick with "suggest" in the supplementary materials as well. Maybe add a sentence about the possibility of exploring this area more in future work.
**Response**: Yes, CatBoost+ can be considered our method, at least in the sense that prior work has not proposed this specific baseline. Actually, EBBS can be viewed as a way of training CatBoost+ in an end-to-end fashion. Given that our paper's main contribution is to propose EBBS and provide related convergence analysis, we did not discuss CatBoost+ at length, to accommodate the limited space. Additionally, in terms of further testing with homogeneous node features, please see our response above regarding new results on the Wiki data. We have also edited the wording in the supplementary per the reviewer's suggestion, and reiterate that both Figure 1 (main paper) and Figure S2 (supplementary) support the notion that it may be easier to adapt EBBS models for new use cases.

### Reviewer ahft

Thanks for the detailed comments. We address each one below.

**Questions**: It [the paper] could perhaps be made stronger by including some of the additional analysis that is in the supplemental material that investigates the trade-offs and ablations of the approaches, in the main body of the text.

**Response**: We agree with this suggestion; however, unfortunately the main paper already consumes the full 9 pages, and this year ICLR does not grant additional space for addressing reviewer recommendations.

**Questions**: I think that the paper could be made much stronger with a simple motivating (perhaps synthetic) example that illustrates where and when EBBS can be useful compared to competing methods. While convergence guarantees and motivations are described, a clear simple example (which might further be useful in using ablations to identify contributions of different parts of the solution) could strengthen the paper.

**Response**: Perhaps one motivational example is the additional stability loosely afforded by our convergent algorithm when applied under non-ideal testing conditions. For example, in Figure 1 we observe how the training curve of EBBS remains stable even though the hyperparameters were simply borrowed from a different dataset. Beyond this though, we unfortunately don't have space to introduce and analyze a new synthetic model.

### Reviewer omBE

Thanks for the detailed comments. We address each one below.

**Questions**: We can be more clear about how Eq 2 is rooted in Zhou et al. 2004. In fact, I didn't get it when I checked the referenced paper.

**Response**: The key identity is $\mbox{tr}\left[Z^\top L Z \right] = \sum_{\{i,j\} \in \mathcal{E} }\left\| z_i - z_j \right\|_2^2$, which is also discussed in the sentence beginning with "Intuitively, solutions of ..." in our submission. The only relevant difference then is just that Zhou et al. (2004) (see their equation 4) use a weighted graph and a normalized Laplacian (the latter leads to the inverse square-root node degree factors), common substitutions that can also be seamlessly integrated within our EBBS framework if needed.
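For reference, the weighted, degree-normalized form from their equation (4) can be paraphrased (with $W$ the weighted adjacency matrix and $D$ its degree matrix) as

$$
\frac{1}{2}\sum_{i,j} w_{ij}\left\| \frac{z_i}{\sqrt{d_i}} - \frac{z_j}{\sqrt{d_j}} \right\|_2^2 \;=\; \mbox{tr}\left[ Z^\top \left( I - D^{-1/2} W D^{-1/2} \right) Z \right],
$$

which collapses to the unweighted identity above when $W$ is binary and the degree normalization is removed.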
**Questions**: In (2), are both Z and \theta learnable?

**Response**: In equation (2), the only optimization variable is $Z$. However, the optimal solution of (2) will be a function of $\theta$, as shown in equation (3). And once we plug this optimal, $\theta$-dependent value into the meta-loss function from equation (7), we can then optimize over $\theta$. This is the crux of our bilevel optimization scheme. Another way to view this process is that the solution of (2) produces a set of features that are useful for making predictions by solving (3), and bilevel optimization allows for full end-to-end optimization of both the features and the final predictor.
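As an illustration of this bilevel structure, here is a minimal sketch (hypothetical names and dense matrices, purely for exposition) of the inner gradient steps on the energy from equation (2) and the outer meta-loss from equation (7):

```python
import numpy as np

def inner_steps(f_pred, L, lam, alpha, k):
    """Inner level: approximately minimize the energy from equation (2),
    ||Z - f(X; theta)||_F^2 + lam * tr(Z^T L Z), using k gradient steps
    (cf. P^{(k)} approximating P^* around equation (6))."""
    Z = f_pred.copy()
    for _ in range(k):
        grad = 2.0 * (Z - f_pred) + 2.0 * lam * (L @ Z)
        Z -= alpha * grad          # each step propagates over the graph
    return Z                       # plays the role of f_tilde^{(k)}(X; theta)

def meta_loss(f_pred, y_train, train_idx, L, lam=1.0, alpha=0.1, k=10):
    """Outer level: evaluate the graph-regularized estimator against
    training labels only, as in the meta-loss of equation (7)."""
    Z = inner_steps(f_pred, L, lam, alpha, k)
    return 0.5 * np.sum((Z[train_idx] - y_train) ** 2)
```

In EBBS the outer minimization over $\theta$ then proceeds in function space: each boosting round fits a weak learner to the gradient of this meta-loss with respect to the base predictions.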
**Questions**: P is bound twice, once in P* (Eq 3) and once in P^{k} (Eq 6).

**Response**: We may have misunderstood the question, but equation (3) provides a specific definition for $P^*$, while in equation (6) the notation $P^{(k)}$ is used to denote an approximation to $P^*$ obtained by conducting $k$ gradient steps (which can be computed via equation (5); please also see the text below equation (6)).

**Questions**: In Eq 7, what is for inner level optimization and what is for outer level optimization?

**Response**: In equation (7), $\widetilde{f}^{(k)}\left(X; \theta \right)$ denotes a graph-regularized version of the base estimator $f^{(k)}\left(X; \theta \right)$, i.e., the approximate minimizer of the energy in equation (2). This can be referred to as the inner-level optimization. In contrast, the outer-level optimization is an application-specific meta-loss minimization problem, e.g., a regression or classification task. We specify the meta-loss function in equation (7).

**Question**: Is "Graph-Aware Propagation Layers" terminology used in the literature?

**Response**: This is a descriptive phrase as opposed to formal terminology, although similar phrasings can be found in the literature. Perhaps more completely though, a graph-aware propagation layer simply refers to a graph-dependent function that smooths (i.e., propagates) an input vector across the edges of a graph.

**Question**: It seems that the proposed method EBBS will be incorporating test nodes during training. Will this cause test information to leak into the training process? Is there any specific preprocessing to avoid leaking?

**Response**: Actually test node *labels* are never used during training; critically, only test node *input features* are utilized. This is because the test node input features contribute to the prediction function for nearby training node labels. Consequently, there is no data leakage issue, and the function-space gradient signal used for training is completely independent of test node labels.

**Question**: Is EBBS easy to implement?

**Response**: EBBS is easy to implement, which is one of its attractive qualities. The code will be released after final decisions.

### Reviewer zcK6

Thanks for the detailed comments. We address each one below.

**Questions**: The studied problem does not seem particularly novel to me, especially given BGNN. Given BGNN, the scope seems a bit narrow to me (although I acknowledge that the authors solve the problem in a potentially better way than the BGNN paper).

**Response**: We certainly agree that the BGNN paper deserves the credit for first addressing the problem of combining boosting with GNNs for handling tabular data. That being said, we still believe that there remain important directions for innovation in this space (especially given the vast relevance of tabular data with attendant relations across numerous industry applications). For example, our specific design of an integrated bilevel loss that facilitates convergence guarantees and simplified yet performant practical deployment can be viewed as a novel contribution.

**Questions**: I am curious to see the result of XGBoost + C&S (e.g., use XGBoost as the base predictor in C&S).

**Response**: This is a good suggestion. Actually, we have previously tried C&S with various forms of boosting as the base predictor. The results are very similar to the CatBoost+ baseline reported in our paper.

**Questions**: Does the framework support any propagation rules beyond (6)? I would be curious to see how general the method is.

**Response**: Yes, our framework supports a broad family of general propagation rules. Please see Section 3.4 of our submission, which presents several possibilities. Even so, the simple rule from equation (6) performs well without adding extra hyperparameters.

**Question**: Could you please actually include the results of XGBoost + C&S? I believe this would help the paper, as people think C&S + strong base predictor is the state-of-the-art. For C&S, there are two hyper-smoothing parameters you need to carefully tune in order to achieve the best performance.

**Response**: Thanks for the response and question. Per the reviewer's suggestion, we have quickly run XGBoost + C&S, with all hyperparameters for XGBoost and C&S carefully tuned to achieve the best performance on the node regression benchmarks. The trial-averaged RMSE results, compared with CatBoost+ and EBBS as reported in our paper, are as follows:

|             | House | County | VK   | Avazu  |
|-------------|-------|--------|------|--------|
| XGBoost+C&S | 0.55  | 1.25   | 6.97 | 0.1097 |
| CatBoost+   | 0.54  | 1.25   | 6.96 | 0.1083 |
| EBBS        | 0.45  | 1.11   | 6.90 | 0.1062 |

From these results we observe that XGBoost+C&S performance is nearly the same as CatBoost+, as expected given their analogous basic structure, and our end-to-end EBBS approach remains superior. Note that prior to our submission we tried CatBoost and LightGBM combined with several different forms of graph propagation post-processing, including C&S, and generally speaking the results were similar. That being said, C&S is definitely a powerful method, and as new tabular benchmarks become available in the future it may offer a more pronounced advantage (one that EBBS can actually exploit as well via convergent end-to-end training; this involves combining results from Sections 3.1 and 3.4 of our submission). In any event, we had not previously experimented with XGBoost, so XGBoost+C&S was nonetheless a useful benchmark to test; good suggestion.

**Response 2:** Additionally, one quick follow-up point that may be worth addressing here. The reviewer mentioned that a strong base model plus C&S can often achieve SOTA results. In fact, our results are basically consistent with this assertion. For example, the XGBoost base model plus C&S (see results in the rebuttal table above) mostly outperforms all of the GNN baselines reported in Table 1. And it is really only EBBS that consistently outperforms XGBoost+C&S via convergent, end-to-end training of a strong tabular+graph model.
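For illustration, a minimal sketch of how such a baseline can be assembled (hypothetical names; our actual runs used carefully tuned hyperparameters and the full C&S pipeline, including the final "smooth" step):

```python
import numpy as np
import xgboost as xgb

def xgboost_plus_cs(X, y_train, train_idx, A_hat, alpha=0.8, k=50):
    """Hypothetical sketch of XGBoost + a C&S-style 'correct' step.
    A_hat: normalized adjacency matrix; alpha, k: the propagation
    hyperparameters the reviewer notes must be tuned carefully."""
    model = xgb.XGBRegressor()
    model.fit(X[train_idx], y_train)
    preds = model.predict(X)          # base predictions at all nodes
    # Propagate training residuals over the graph (errors are correlated
    # along edges), letting diffusion fill in test-node corrections.
    E0 = np.zeros_like(preds)
    E0[train_idx] = y_train - preds[train_idx]
    E = E0.copy()
    for _ in range(k):
        E = (1 - alpha) * E0 + alpha * (A_hat @ E)
    return preds + E                  # corrected predictions
```

The key design difference in EBBS is that this propagation happens inside the training loop rather than as post-processing, so the booster's gradients already account for it.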
**Question:** Indeed, I am positively convinced by your work and your comments only strengthen my perception. I suggest you try to squeeze in as many remarks as possible from the ones you have made above.

**Response:** Thanks for taking the time to read through our detailed response. We will try to squeeze in as much as possible, contingent on space considerations and any new requirements presented by other reviewers during the discussion period.
