# [Rebuttal] Meta-SAGE

## Opening

Thanks for the valuable comments. The reviewers' comments will be extremely helpful in improving our paper. We provide detailed responses to each reviewer. We would also like to give a general response regarding our research target below.

---

The main goal of this study is not to create a state-of-the-art solver for specific CO tasks. The primary claim is that our new meta-learning algorithm can mitigate the scale shift of pre-trained DRL models applicable to solving various CO tasks. We propose a meta-learning-based test-time adaptation method, not an NCO solver itself. To clarify our motivation and research direction, we state our research objective below (it will also be added to the main paper).

**Research Objective.** Our main research objective is to mitigate the scale distributional shift of existing DRL-NCO models by leveraging our novel meta-learning and scheduled test-time adaptation techniques. Our method is a supportive method, meaning it is not standalone; however, it can be applied to improve various *existing* DRL-NCO methods.

Similar to our work, efficient active search (EAS) and simulation-guided beam search (SGBS) were proposed as NCO test-time adaptation methods. However, they focus on problems of relatively small size ($N \leq 200$), so they show limited performance on large-scale problems ($N > 200$). Note that the distribution shift grows as the test size increases, since models pre-trained with $N = 100$ are employed in all cases. Therefore, we suggest an improved version of EAS that introduces meta-learning and temperature scheduling to achieve high performance even under larger distribution shifts (e.g., pre-trained on $N = 100$, tested on $N = 1000$).

## Reply to Reviewer w7eR

**Important Note.** Although the rebuttal period for this emergency review is less than 24 hours, please don't hesitate to bring up any outstanding concerns during the upcoming reviewer-author discussion period (March 20th to March 26th). We appreciate your input and look forward to addressing any remaining issues then.

### Research scope and baseline choices

Thanks for the insightful comments. We agree that there are remarkable algorithms based on mathematical programming, and that these algorithms have theoretical guarantees. We categorize approaches in neural combinatorial optimization (which might not match the traditional categories of the operations research area) based on their algorithmic structure as follows:

- Heuristic-based methods (mainly described in the paper)
  - **Constructive heuristics** (auto-regressive)
  - Improvement heuristics
- Mathematical programming-based methods
  - Learning to configure algorithms: deep learning models support decisions related to the configuration (i.e., parameters) of algorithms (e.g., weighting rules for branching scores [1], scheduling of primal heuristics in branch-and-bound [2]).
  - ML-augmented: algorithmic components that are heuristic or computationally expensive are replaced by deep learning models (e.g., selection of branching nodes [3,4] or cutting planes [5]).

In this paper, we **concentrate on** improving deep learning **constructive heuristics** for the following reasons:

1. In mathematical programming-based methods, the main algorithms are not trained from data; thus, they do not suffer from distribution shifts.
2. Mathematical programming-based methods require a relatively long time to obtain a solution.
   Therefore, test-time adaptation schemes, which solve a given problem several times to train the model, are unsuitable.
3. Improvement heuristics already contain instance-specific processing, such as iterative evaluation of the objective function value (which is specified only when the instance is specified).

*We will revise the paper to state our research scope more clearly.*

**Motivation of (Euclidean) DRL-NCO, including our method.** The primary goal of deep reinforcement learning-based neural combinatorial optimization (DRL-NCO) is not to outperform or replace existing operations research (OR) methods, but rather to improve the learning efficiency and generalization capability of DRL methods on combinatorial spaces. DRL-NCO has a unique advantage in its applicability to black-box combinatorial optimization problems that cannot be solved through traditional mathematical programming. For example, DRL-NCO can be used for protein optimization [10] and hardware routing [11]. The Traveling Salesman Problem (TSP) and its variants are excellent benchmarks for measuring the exploration capability and learning efficiency of DRL methods, which is the main reason for their use as benchmarks.

**Motivation of our adaptation method and baseline choice.** In this research, we aim to address the issue of **distributional shift in scale**, which is a crucial topic in machine learning, particularly for deep reinforcement learning (DRL). We focus on exploring the effectiveness of adaptation methods in mitigating distribution shifts in combinatorial action spaces and episodic Markov decision processes (MDPs), which have received less attention in the ML community. We compare our proposed approach with existing adaptation methods, such as active search, efficient active search (EAS), and simulation-guided beam search (SGBS), that aim to mitigate distribution shifts in combinatorial action spaces. The primary evaluation metric for our study is the efficiency of these adaptation methods in mitigating distribution shifts (a pseudocode sketch of both protocols is given at the end of this subsection):

1. Our first objective is to minimize the cost while keeping the number of adaptation steps fixed at $K$ (which is proportional to the time spent on test-time adaptation). This can be seen in Table 1 and Table 3, where we compare the cost achieved by different adaptation methods.
2. Our second objective is to minimize the number of adaptation steps required to achieve a specific cost goal. This is demonstrated in Figure 5 and Figure 8, where we analyze how many adaptation steps different methods require to reach a specific cost. Our goal is to identify the adaptation method that requires the fewest adaptation steps ($K$) to achieve the desired cost goal.

The OR baselines are not our primary competitors but are important reference points for objectives 1 and 2. We conduct comparative experiments with EAS and SGBS to verify the proposed meta-learning and bias scheduling approach. The results demonstrate that our method is effective in mitigating (scale) distribution shifts both under a fixed adaptation budget (i.e., objective 1) and in achieving specific cost goals (i.e., objective 2). Furthermore, our findings show that our method outperforms EAS and SGBS on both objectives, providing further evidence of its effectiveness in handling distribution shifts.
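In pseudocode terms, the two objectives above correspond to the following evaluation protocols. This is only an illustrative sketch: the `adapt_step` callback (assumed to perform one test-time adaptation step and return the best cost found so far) and all names are our assumptions, not code from the paper.

```python
from typing import Callable

def objective1_cost(adapt_step: Callable[[], float], K: int) -> float:
    """Objective 1: best cost reachable within a fixed budget of K adaptation steps."""
    best = float("inf")
    for _ in range(K):
        best = min(best, adapt_step())
    return best

def objective2_steps(adapt_step: Callable[[], float], target_cost: float,
                     max_steps: int = 1000) -> int:
    """Objective 2: number of adaptation steps needed to reach a target cost."""
    best = float("inf")
    for k in range(1, max_steps + 1):
        best = min(best, adapt_step())
        if best <= target_cost:
            return k
    return max_steps  # budget exhausted before reaching the target
```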
### Applicability of the proposed locality bias

Thanks for bringing this issue to our attention. We appreciate your feedback and will address it in the main paper's discussion section.

The locality bias can be applied beyond Euclidean optimization problems. For instance, in non-Euclidean routing problems such as the asymmetric TSP, edge attributes are defined as distances between nodes, and the graph is fully connected. In this scenario, our Meta-SAGE model with locality bias can be directly applied. A recent study [12] has demonstrated the effectiveness of a non-Euclidean NCO solver based on an AM-like architecture, which can be further enhanced by incorporating Meta-SAGE for test-time adaptation. Additionally, the locality bias can also be leveraged in the field of 3D drug discovery, where the relative distance (i.e., 3D Euclidean distance) between the atoms of a molecule plays a crucial role as an inductive bias [13].

**Important remark 1.** Our locality bias is not a hard heuristic rule but rather a soft bias designed to aid training. While it is true that the locality bias may not work for all graphs, we would like to clarify that the AM architecture we build on is a learnable neural architecture that includes a global attention mechanism between non-local nodes. Additionally, the level of locality bias gradually decreases over time. While we understand your concern, we believe that our approach effectively balances the benefits of locality bias with the need for a global attention mechanism.

**Important remark 2.** Our approach goes beyond simply suggesting a specific inductive bias: we also provide a way to control the bias level based on the adaptation level. Therefore, our method is not limited to addressing locality bias; it can be used to control other meaningful inductive biases as well. In our paper, we argue that while inductive bias can be helpful in the early stages of adaptation, it should be gradually reduced as adaptation progresses. Our key insight is that inductive bias is only useful when dealing with significant distributional shifts. This can be compared to how parents hold onto their child's bicycle during the early stages of learning but let go once the child can ride on their own. We believe that this novel perspective can lead to a better understanding and application of inductive bias in machine learning.

### Instance generation and evaluation metric

**Instance generation.** We generate training and test instances following Kool et al. [6], like many works in NCO [7-9]. *We will add these details in the appendix.* A generation sketch is given after the list.

- TSP: the $(x, y)$ coordinates of each node are sampled uniformly within $[0,1]$.
- CVRP: the $(x, y)$ coordinates of each node are uniformly distributed within $[0,1]$, while the demand of each node is sampled uniformly from $[1, 9]$.
- PCTSP: the $(x, y)$ coordinates and the penalty of each node are sampled uniformly within $[0,1]$.
- OP: the $(x, y)$ coordinates of each node are sampled uniformly within $[0,1]$. Furthermore, each node's prize $\rho_i$ is determined by its distance to the depot, $\rho_i = 1 + \lfloor 99 \cdot \frac{d_{0i}}{\max_{j=1}^{n} d_{0j}} \rfloor$, such that it increases with distance.
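For concreteness, the following NumPy sketch reproduces the sampling rules above (the function name, the dictionary layout, and the convention that node 0 is the depot are our illustrative assumptions):

```python
import numpy as np

def generate_instance(task: str, n: int, rng: np.random.Generator) -> dict:
    """Sample one instance following Kool et al. [6]; node 0 serves as the depot."""
    coords = rng.uniform(0.0, 1.0, size=(n, 2))  # (x, y) ~ U[0, 1]^2 for every task
    if task == "tsp":
        return {"coords": coords}
    if task == "cvrp":
        demand = rng.integers(1, 10, size=n)     # demand ~ U{1, ..., 9}
        return {"coords": coords, "demand": demand}
    if task == "pctsp":
        penalty = rng.uniform(0.0, 1.0, size=n)  # penalty ~ U[0, 1]
        return {"coords": coords, "penalty": penalty}
    if task == "op":
        d0 = np.linalg.norm(coords - coords[0], axis=1)       # distance d_0i to the depot
        prize = 1 + np.floor(99 * d0 / d0.max()).astype(int)  # rho_i = 1 + floor(99 * d_0i / max_j d_0j)
        return {"coords": coords, "prize": prize}
    raise ValueError(f"unknown task: {task}")
```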
**Evaluation metric.** We measure the performance using baseline solvers for each task. As commented, this is not an optimality gap, since these baselines are not exact methods (except Concorde, which is based on the cutting-plane method); thus, *we changed our terminology in the paper*. Precisely, the performance gap is computed as

$$\text{Performance Gap} = \frac{obj - obj_B}{obj_B} \times 100\ (\%),$$

where $obj$ is the objective value of the DRL method and $obj_B$ is the objective value of the baseline solver (Concorde, LKH3, OR-Tools, or Compass).

**Solver settings.** We set the solver parameters following AM [6] for CVRP, PCTSP, and OP.

- Concorde for TSP: we implemented the code in Julia and used the default settings of Concorde.jl with a scaler of 10,000 (since Concorde supports integer coefficients only); see https://github.com/chkwon/Concorde.jl for details.
- LKH3 for CVRP: we conduct 10 runs, and each run is performed with a maximum of 10,000 trials. We use the 'SPECIAL' parameters as specified in the CVRP run script (run CVRP in http://akira.ruc.dk/~keld/research/LKH-3/BENCHMARKS/CVRP.tgz).
- OR-Tools for PCTSP: we set the parameters the same as AM, except for the local search time. We set the local search time to 60 sec, while AM used 10 sec.
- Compass for OP: we used the default parameters; see https://github.com/bcamath-ds/compass for details.

**Results in Table 1 (synthetic benchmarks of random instances).** Consistent with other literature on NCO [6-9], we present our experimental findings as the average score and performance gap on random instances in Table 1. This approach allows us to make meaningful comparisons between neural methods, as the instance distribution is uniform within $[0,1] \times [0,1]$ and the optimal score is similar across instances. Empirically, neural methods that give consistently high average scores outperform neural methods that give low average scores. However, we acknowledge that Table 1 alone is insufficient to claim that ML methods outperform OR methods, as a uniform distribution may not provide a comprehensive assessment of solver performance.

**Additional result on the real-world CVRPLIB benchmark.** In contrast to synthetic benchmarks, real-world benchmarks contain instances from various distributions, and they are not normalized to $[0,1] \times [0,1]$. To address this, we compute the gap against the best-known value of each instance and average it within each scale range. As suggested, we present our findings on CVRPLIB, which show that our method outperforms the other neural adaptation method:

| Range | LKH3 | Sym-NCO EAS | Sym-NCO Ours |
|----------------|-------------|-------------|--------------|
| 100 ≤ N < 200 | 0.003±0.002 | 0.017±0.010 | 0.016±0.008 |
| 200 ≤ N < 300 | 0.009±0.005 | 0.025±0.009 | 0.021±0.009 |
| 300 ≤ N < 400 | 0.010±0.004 | 0.034±0.016 | 0.024±0.010 |
| 400 ≤ N < 500 | 0.013±0.006 | 0.045±0.017 | 0.027±0.011 |
| 500 ≤ N < 600 | 0.010±0.006 | 0.050±0.019 | 0.032±0.008 |
| 600 ≤ N < 700 | 0.013±0.006 | 0.058±0.018 | 0.036±0.009 |
| 700 ≤ N < 800 | 0.018±0.006 | 0.056±0.013 | 0.031±0.008 |
| 800 ≤ N < 900 | 0.017±0.017 | 0.084±0.038 | 0.049±0.018 |
| 900 ≤ N ≤ 1000 | 0.015±0.008 | 0.073±0.029 | 0.042±0.015 |

**Note:** Here, we compute the 'optimality gap' based on the best-known objective value (reported as the upper bound on the CVRPLIB website) as follows:

$$\text{Optimality Gap} = \frac{obj - best}{best} \times 100.$$
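Both gap definitions share the same relative form; as a quick illustration with made-up objective values (not numbers from the paper):

```python
def gap(obj: float, ref: float) -> float:
    """Relative gap of an objective value `obj` to a reference value `ref`, in percent."""
    return (obj - ref) / ref * 100.0

# Hypothetical example: a DRL tour of length 17.22 against a reference tour of length 16.80
print(f"{gap(17.22, 16.80):.2f}%")  # -> 2.50%
```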
### References

[1] Balcan, M.-F., Dick, T., Sandholm, T., & Vitercik, E. (2018). Learning to branch. In International Conference on Machine Learning (pp. 344-353). PMLR.
[2] Chmiela, A., Khalil, E., Gleixner, A., Lodi, A., & Pokutta, S. (2021). Learning to schedule heuristics in branch and bound. Advances in Neural Information Processing Systems, 34.
[3] Gasse, M., Chételat, D., Ferroni, N., Charlin, L., & Lodi, A. (2019). Exact combinatorial optimization with graph convolutional neural networks. Advances in Neural Information Processing Systems, 32.
[4] Gupta, P., Gasse, M., Khalil, E., Mudigonda, P., Lodi, A., & Bengio, Y. (2020). Hybrid models for learning to branch. Advances in Neural Information Processing Systems, 33.
[5] Tang, Y., Agrawal, S., & Faenza, Y. (2020). Reinforcement learning for integer programming: Learning to cut. In International Conference on Machine Learning (pp. 9367-9376). PMLR.
[6] Kool, W., van Hoof, H., & Welling, M. (2019). Attention, learn to solve routing problems! In International Conference on Learning Representations.
[7] Kwon, Y.-D., Choo, J., Kim, B., Yoon, I., Gwon, Y., & Min, S. (2020). POMO: Policy optimization with multiple optima for reinforcement learning. Advances in Neural Information Processing Systems, 33, 21188-21198.
[8] Kim, M., Park, J., & Park, J. (2022). Sym-NCO: Leveraging symmetricity for neural combinatorial optimization. In Advances in Neural Information Processing Systems.
[9] Hottung, A., Kwon, Y.-D., & Tierney, K. (2022). Efficient active search for combinatorial optimization problems. In International Conference on Learning Representations.
[10] Angermueller, C., Dohan, D., Belanger, D., Deshpande, R., Murphy, K., & Colwell, L. (2020). Model-based reinforcement learning for biological sequence design. In International Conference on Learning Representations.
[11] Kim, H., Kim, M., Kim, J., & Park, J. (2022). Collaborative symmetricity exploitation for offline learning of hardware design solver. In 3rd Offline RL Workshop: Offline RL as a "Launchpad".
[12] Kwon, Y.-D., Choo, J., Yoon, I., Park, M., Park, D., & Gwon, Y. (2021). Matrix encoding networks for neural combinatorial optimization. Advances in Neural Information Processing Systems, 34, 5138-5149.
[13] Liu, Y., Wang, L., Liu, M., Lin, Y., Zhang, X., Oztekin, B., & Ji, S. (2022). Spherical message passing for 3D molecular graphs. In International Conference on Learning Representations.

### Specific comments

Thanks for the detailed comments. We revised our manuscript based on them.

**Q1.** I don't understand the main analogy at the end of the introduction.
**A1.** We agree that this analogy leads to some misunderstanding. We will remove it.

**Q2.** The definition of the state is not clear in the statement on line 88. $\boldsymbol{x}$ is a vector, but the space/components of this vector are not well defined. The reader is left wondering whether they are binaries, real numbers, or points in space. I understand that there are $N$ components to this vector, but beyond this it is not clear. This ambiguity is resolved in the example given in Figure 1.
**A2.** $\boldsymbol{x}$ is a vector of 2D coordinates, so $\boldsymbol{x} \in \mathbb{R}^{2N}$. Usually, $\boldsymbol{x}$ denotes the decision variables in MIP, but in this paper $\boldsymbol{x}$ denotes the node information of a problem instance.

**Q3.** Line 114: what is "in-differentiable"? I believe you mean "non-differentiable".
**A3.** Yes, "non-differentiable" is correct; we revised it. Thanks.

**Q4.** Line 120: "In previous literature..." is used but no citation is given.
**A4.** We added the citation.

**Q5.** Avoid using the notation $h^T$ for the target embedding, as this looks like a transpose.
**A5.** We revised it to $h^A$ (A means 'Adapted').

**Q6.** The parameter $K$ is introduced only in the introduction and not at all in the main math, making it very easy to lose track of.
**A6.** Thanks for pointing it out.
We revised the paper to explain the meaning of $K$ in both the main text and the experiment section to improve readability.

**Q7.** The introduction of the four main CO problems is very informal and requires knowledge of these problems in order to understand their graphical nature. While TSP is well known to a broad audience, CVRP, PCTSP, and OP are somewhat more specific, so a more formal introduction may be warranted.
**A7.** We agree. We will provide additional explanations of CVRP, PCTSP, and OP in the main text.

**Q8.** The statement of the goal of PCTSP is not very clear. The objective is to minimize the penalty subject to achieving a minimum reward.
**A8.** Thanks for the comment. The cost function is the sum of the total distance and the penalties, subject to collecting the given minimum prize. We will revise the main paper to state this more clearly.

**Q9.** Mention the Euclidean nature of the CO problems in order to justify the locality bias.
**A9.** We already discuss the Euclidean nature of the CO problems we focus on.

---

## Reply to Reviewer zgFH

### Hyperparameters

The following are mentioned as hyperparameters in the review:

1. the number of test-time adaptation steps ($K$)
2. the balance between the RL and IL objectives ($\lambda$)
3. the locality bias coefficient ($\alpha$)
4. the scheduling of the locality bias ($\gamma_1$)
5. the temperature scheduling ($\gamma_2$)

However, $K$ and $\lambda$ come from EAS, not our method; thus, we follow the choices of EAS for these hyperparameters. Note that $K$ is a controllable parameter depending on the computation budget. The performance depends on $K$, but we set $K = 200$ for both ours and EAS. The locality bias coefficient $\alpha$ is not a hyperparameter, since it is initialized to one and then changed by the scheduling operation (Eq. 5), like the temperature (a minimal decay sketch is given at the end of this subsection). We agree that $\alpha$ looks like a hyperparameter, which is misleading; we will revise the related expressions to improve readability. Therefore, only two hyperparameters, $\gamma_1$ and $\gamma_2$, are newly introduced, and we set $\gamma_1 = \gamma_2$ in practice. *We provide a sensitivity analysis for these hyperparameters below.*

| | Meta-SAGE | Meta-SAGE | Meta-SAGE | EAS |
| -------- | --------- | --------- | --------- | --- |
| $(\gamma_1,\gamma_2)$ | (0.99, 0.99) | (0.995, 0.995) | (0.999, 0.999) | (1.00, 1.00) |
| TSP500 | 62.74 | 62.69 | 62.89 | 63.97 |
| CVRP500 | 17.20 | 17.22 | 17.45 | 18.63 |

We conduct experiments on TSP and CVRP with 500 customers. The parameters are set in the range $0.99 \leq \gamma_1 = \gamma_2 < 1.0$, since they are exponential decay rates. The results show that our method is robust to changes in $\gamma_1$ and $\gamma_2$ and consistently outperforms EAS (the baseline method).

**Important note:** We set identical hyperparameters for all 12 tasks: three scale benchmarks ($N = 200, 500, 1000$) for each of the four combinatorial optimization problems (TSP, CVRP, PCTSP, and OP). The hyperparameter choices are not critical to the performance.
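For intuition, here is a minimal sketch of the scheduling described above, assuming a plain exponential decay of the form $\alpha_{k+1} = \gamma_1 \alpha_k$ (the exact update is Eq. 5 in the paper; the function and variable names here are our own):

```python
def scheduled_coefficients(K: int, gamma1: float = 0.995, gamma2: float = 0.995):
    """Yield the locality bias coefficient and temperature for each of K adaptation
    steps; both start at 1, and gamma = 1.0 (the EAS column above) means no decay."""
    alpha, temperature = 1.0, 1.0
    for k in range(K):
        yield k, alpha, temperature
        alpha *= gamma1        # locality bias fades as adaptation progresses
        temperature *= gamma2  # sampling becomes greedier over time

# With K = 200 and gamma = 0.995, alpha decays from 1.0 to about 0.37 by the last step.
```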
### Effectiveness of SML

Although our temperature scheduling yields an effective improvement over EAS, our SML yields a larger one. The table below shows that EAS + SML (row 2) performs better than EAS + scheduled temperature (row 3). The best performance is obtained when every algorithmic component is applied.

| SML | Sched. Temperature | Sched. Locality Bias | CVRP500 | TSP500 |
| --- | ----------------- | ------------------- | ----- | ----- |
| | | | 18.63 | 63.97 |
| :heavy_check_mark: | | | 17.70 | 63.17 |
| | :heavy_check_mark: | | 17.73 | 63.26 |
| :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | **17.22** | **62.69** |

### Standard deviation

Thanks for the constructive suggestion. *We updated Table 3 in the paper with the standard deviation over multiple instances*, as follows:

**Table 3**

| Range | LKH3 | Sym-NCO EAS | Sym-NCO Ours |
|----------------|-------------|-------------|--------------|
| 100 ≤ N < 200 | 0.003±0.002 | 0.017±0.010 | 0.016±0.008 |
| 200 ≤ N < 300 | 0.009±0.005 | 0.025±0.009 | 0.021±0.009 |
| 300 ≤ N < 400 | 0.010±0.004 | 0.034±0.016 | 0.024±0.010 |
| 400 ≤ N < 500 | 0.013±0.006 | 0.045±0.017 | 0.027±0.011 |
| 500 ≤ N < 600 | 0.010±0.006 | 0.050±0.019 | 0.032±0.008 |
| 600 ≤ N < 700 | 0.013±0.006 | 0.058±0.018 | 0.036±0.009 |
| 700 ≤ N < 800 | 0.018±0.006 | 0.056±0.013 | 0.031±0.008 |
| 800 ≤ N < 900 | 0.017±0.017 | 0.084±0.038 | 0.049±0.018 |
| 900 ≤ N ≤ 1000 | 0.015±0.008 | 0.073±0.029 | 0.042±0.015 |

### Grammar comments

Thanks for the helpful grammar comments. *We corrected every grammar mistake pointed out in the review*. Note that we changed the notation $h^{T}$ to $h^{A}$ (A means 'Adapted').

---

## Reply to Reviewer 7eg7

### Readability of the introduction

Thanks for the valuable comments. We agree that the motivation and the objective of our research were not clearly stated; *we added a motivation section and a problem-target subsection to the introduction to make them clear*. Our primary **motivation is mitigating the distributional shift**, precisely the scale shift in this work, of pre-trained NCO models. To this end, we suggest a new meta-learning-style method built upon a recently proposed transfer learning algorithm (i.e., EAS). We improve EAS to adapt faster and to handle larger distributional shifts (i.e., solving problems with $N > 200$ using a model pre-trained with $N = 100$). Our research focuses on how to mitigate the distributional shifts of existing pre-trained DRL models. To explain the differences clearly, we survey the extensive previous research on NCO in the introduction to give a historical perspective on recent research directions. *We also revised the introduction for better readability overall.*

### Experiment setting

The main goal of this study is not to create a state-of-the-art solver for specific CO tasks. Instead, the primary claim is that our new meta-learning algorithm can mitigate the scale shift of pre-trained DRL models applicable to solving various CO tasks. Since the proposed method is a test-time adaptation method, not an NCO solver itself, we employ other adaptation methods, such as SGBS and EAS, as baselines. The experimental results demonstrate the effectiveness of Meta-SAGE in mitigating scale shifts, even at much larger scales beyond the range of existing NCO transfer learning methods. Moreover, the findings show that our method, as well as SGBS and EAS, can be applied to existing pre-trained models to improve their performance without limiting the generalizability of DRL solvers. It is worth noting that these test-time adaptation methods can also be integrated into existing large-scale CVRP models.

**Note:** Our method outperforms EAS and SGBS at $N = 200$, one of their main benchmarks.
### Comparisons with other large-scale algorithms

We acknowledge that many promising neural methods for large-scale CVRP can provide notably high performance at fast speed [1, 3]. However, these algorithms usually contain a hierarchical structure and employ problem-specific heuristic solvers like LKH (Lin-Kernighan-Helsgaun). For example, learning to delegate (L2D) has shown promising performance on large-scale CVRP. L2D first assigns customers to several clusters using a supervised learning model, then completes each tour via LKH. This approach requires expert labels and SOTA heuristic solvers to complete solutions; thus, its application to other CO tasks is limited. To be specific, there are two conditions for applying this approach. One is that the problem must be decomposable (e.g., CVRP contains multiple tours, so it can be decomposed by tour). The other is that a solver must exist that can give high-quality solutions (e.g., there are high-quality solvers for TSP, but this is not guaranteed in general).

DRL solvers like L2D are directly trained to find high-quality solutions to the target problems, whereas Meta-SAGE is trained to relieve the distributional shifts of the base DRL solvers. Since **our method is a general test-time adaptation method**, it **can be embedded into existing large-scale CVRP models** as well. Thus, as mentioned in the experiment setting, we conducted comparative experiments with other transfer learning algorithms for NCO, not with the neural methods for large-scale problems.

### The use of the terminologies ('large-scale' vs. 'larger-scale')

The comment about the 'larger-scale' and 'large-scale' terminologies is sensible; thanks for the helpful comment. As mentioned, our main research goal is mitigating the scale distributional shift of existing NCO methods, not proposing a standalone high-performance model for a specific 'large-scale' task. Thus, we focus on distribution shifts to a 'larger' scale, since DRL models suffer from direct training when the problem size increases, which means that in practice the trained models are obtained on 'smaller' scales. *We revised all expressions related to this and will change our title to* **"Meta-SAGE: Scale Meta-Learning Scheduled Adaptation with Guided Exploration for Mitigating Scale Shift on Combinatorial Optimization"**.

### References

[1] Li, S., Yan, Z., & Wu, C. (2021). Learning to delegate for large-scale vehicle routing. Advances in Neural Information Processing Systems, 34.
[2] Arnold, F., Gendreau, M., & Sörensen, K. (2019). Efficiently solving very large-scale routing problems. Computers & Operations Research, 107, 32-42.
[3] Zong, Z., Wang, H., Wang, J., Zheng, M., & Li, Y. (2022). RBG: Hierarchically solving large-scale routing problems in logistic systems via reinforcement learning. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
