# Not from scratch: predicting thermophysical properties through model-based transfer learning using graph convolutional networks
## Reviewer: 1
#### Comments:
In this paper, the author introduced a graph convolutional network for the prediction of thermophysical properties with transfer learning. While a certain level of novelty was shown in the paper, there are several major and minor issues to be revised before the paper is published.
<span style="color:blue">[Comments]</span> **This reviewer does not seem to have many complaints besides the need for a stronger emphasis on the pretraining side. Adding a section with more details might suffice.**
<span style="color:blue">[Kiho Comments]</span> **I agree with your opinion. Reviewer #1 is quite positive about our paper.**
Major issues:
> 1. To ensure the reproducibility of the paper, the framework and parameter of the convolutional network should be clarified in the method part. While the author uses fig 3 and 4 to symbolically express the network framework, a more precise and rigorous algorithm or flowchart should be given to state the input dimension, convolutional parameter (kernel size, filter size and type of convolution), other network parameters (number of layers, dense layers, etc.)
<span style="color:green">[Add]</span> **Need detailed information about the network parameters (we can now include more technical detail, since these reviewers seem to have a stronger ML background).**
<span style="color:blue">[Kiho Comments]</span> **The inclusion of network parameters and further explanation will make the reviewer happy.**
> 2. Transfer learning is supposed to be the highlight of the paper, however, the author didn’t explain enough details for their transfer learning framework. Apart from references to the history of transfer learning, very limited information was given in section 2.4. The author should explain the pre-trained model used for transfer learning, including input, output, training set, and feature used. Also, the author should clearly state the process to incorporate the pre-trained model with the new model. A clear framework flowchart or algorithm including network parameters should be given in this section.
<span style="color:green">[Add]</span> **We need to add a section only with details of the TL part.**
<span style="color:blue">[Kiho Comments]</span> **Also, the importance of TL can be emphasized in the Abstract and Introduction (same response as Reviewer #2, comment #1).**
##### Minor issue:
> 1. In the result section, the author only reported mean absolute error and mean absolute percentage error. To better evaluate the similarity between predicted results and experiment results, the Pearson correlation $R^2$ should be calculated and reported in the result section.
<span style="color:red">[Discuss]</span> **Do we really need to add these results?**
<span style="color:blue">[Kiho Comments]</span> **It might be better to add $R^2$ to the results.**
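If we do add it, computing $R^2$ alongside the metrics we already report is straightforward. A minimal sketch with NumPy; the function name and input arrays are placeholders, not code from the paper:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MAPE (%), and squared Pearson correlation R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    # MAPE assumes no zero targets (true for critical properties)
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
    # Squared Pearson correlation between experiment and prediction
    r2 = np.corrcoef(y_true, y_pred)[0, 1] ** 2
    return mae, mape, r2
```

This keeps all three metrics coming from one helper, so the results tables stay consistent.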
> 2. Figure 2 and figure 6 are almost identical except for the input. The author should reorganize those two figures to avoid repetition and highlight the transfer learning framework.
>
<span style="color:purple">[Modify]</span> **We might need to combine both images, but that would also mean restructuring some of the sections.**
<span style="color:blue">[Kiho Comments]</span> **It might be better to combine these figures to avoid duplication of the image. Restructuring the paper will then be unavoidable.**
## Reviewer: 2
#### Comments:
This work by Hormazabal et al. is a thorough study on building a new methodology using GCNNs and transfer learning for a wide variety of organic and inorganic molecules. This manuscript is a good fit for JCIM after some points are addressed:
<span style="color:blue">[Comments]</span> **This reviewer asks for some extra analysis on the clusters arising in the t-SNE projections. Probably just adding results for inorganics/organics separately would make him happy (table or figure?).**
<span style="color:blue">[Kiho Comments]</span> **You are right. The addition of results might be necessary. I am not sure which format will be better, but any type of result (table or figure) will be acceptable. Reviewer #2 is also positive about our paper.**
> 1. Although the goal of the manuscript is the methodology, specifying which thermophysical properties are optimized (critical temp, pressure and volume) in the abstract and the introduction will be useful for readers.
<span style="color:purple">[Modify]</span> **Simple modification. Clarify which properties are being predicted whenever thermophysical properties are mentioned.**
<span style="color:blue">[Kiho Comments]</span> **Great!**
> 2. Figures 10 and 11 have a representation of the chemical space using t-SNE but are missing chemical insights. What are the specific larger chemical groups (amides, alcohols, acids, etc.) contained in those spaces? Which particular spaces are underrepresented? Are these spaces difficult to navigate computationally or experimentally?
Additionally, there are formatting errors with Figures 10 and 11 with the size of the box and the outline around the figures.
> 3. Following comment 2, some more detail about the organic vs. inorganic chemical space in the dataset is desirable. The authors' claim that it generalizes across both groups would need some more substantiation, such as metrics across both groups as well as the data split for organics vs. inorganics in the dataset.
>
<span style="color:green">[Add]</span>
* **Add more analysis to the t-SNE plots.**
* **Label the arising clusters and add comments.**
* **Underrepresented parts of the data are hard to explore experimentally.**
* **Not sure about the formatting errors the reviewer refers to.**
<span style="color:blue">[Kiho Comments]</span> **I also cannot find the formatting error in the figures (size? outline?). In Figure 10, the color representing both the upper limit (900) and the lower limit (200) is red. It might be confusing, and I recommend changing the color map to avoid the confusion.**
> 4. Would this methodology work for areas where the amount of experimental data is a very small fraction compared to the computational data? What is the minimum amount of experimental data required to make transfer learning meaningful vs only building a model on simulated data? Which properties would be ideal for this workflow vs not? Including this in the discussion will give new users a good sense for the applicability of this method for new work and new properties.
<span style="color:green">[Add]</span>
* **Ideal for properties where there is a strong correlation between graph structure (atom vicinity) and the property, since it would be easier to transfer learned representations from the model task to the experimental task (add a paragraph discussing this).**
* **Talk about the importance of data diversity vs. size. Rather than the _"minimum amount of data"_, the more important question is _"how similar/dissimilar (statistically speaking, in the support of the training data distribution) should the model vs. experimental data be for this to be effective"_.**
<span style="color:blue">[Kiho Comments]</span> **The reviewer wants us to add insight into the TL method in our paper. As the reviewer suggested, we can add a paragraph to the discussion explaining the applicability of the method to new properties.**
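One way to make the diversity-vs-size point concrete in the discussion: a rough measure of how much of the experimental data falls inside the support of the model (pre-training) data. This is only an illustrative sketch; the function name and the box-shaped notion of "support" are assumptions, not anything from the paper:

```python
import numpy as np

def support_coverage(X_model, X_exp):
    """Fraction of experimental samples whose feature values all lie
    within the per-feature range observed in the model dataset.
    A crude proxy for distributional overlap (box support only)."""
    lo = X_model.min(axis=0)
    hi = X_model.max(axis=0)
    inside = np.all((X_exp >= lo) & (X_exp <= hi), axis=1)
    return float(inside.mean())
```

A low coverage would suggest transfer learning has little to transfer, regardless of how many experimental points exist, which is exactly the framing we want for the reviewer's "minimum amount of data" question.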
## Reviewer: 3
#### Comments:
This work developed a transfer learning framework (Fig. 6), where pre-training of a graph-convolutional neural network using the predictions from the best available GCM model helps to create an accurate prediction model for tasks with scarce experimental data.
<span style="color:blue">[Comments]</span> **This reviewer seems to be the one who understands the paper's contributions and focus the most. All of his comments seem adequate and would actually improve the paper. Thoroughly fixing these weaknesses is key.**
<span style="color:blue">[Kiho Comments]</span> **I agree with your opinion. The response letter to Reviewer #3 looks to be the most important.**
#### Major concerns:
> 1. Section 2.2, “To carry out fair comparisons…it is not known which samples were originally available in the original regression training set”: Is it possible to use time-split method for selecting test set? i.e. Use the experimental data after the publication date of J. Marrero and R. Gani (ref. 40) as test set.
<span style="color:red">[Discuss]</span> **This is a really good idea. However, the predictions used for comparison in this work are not the ones directly reported in Marrero and Gani's original work, and we still do not have access to the data used by ProPred. This point needs more detailed analysis (there might be a way to split the data by time of publication, although it might take effort).**
<span style="color:blue">[Kiho Comments]</span> **If collecting data published after that date is possible, the time-split method is the best option. However, I think it is too time-consuming, so we can select another option: explain and admit the current limitation of the dataset, and include some discussion of the time-split method by referring to other papers. Let me know if you have any other options.**
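For the record, the split itself would be trivial; the time-consuming part is annotating each experimental record with its source's publication year. A hypothetical sketch, where the `year` column and the cutoff year for ref. 40 are assumptions that would need to be checked against the actual paper:

```python
import pandas as pd

CUTOFF_YEAR = 2001  # assumed publication year of ref. 40 (Marrero & Gani); verify

def time_split(df, cutoff=CUTOFF_YEAR):
    """Split records by source publication year: data available before
    the GCM paper goes to train, strictly newer data goes to test."""
    train = df[df["year"] <= cutoff]
    test = df[df["year"] > cutoff]
    return train, test
```

The test set built this way is guaranteed to be unseen by the original GCM, which is the whole point of the reviewer's suggestion.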
> 2. Section 2.2, “In this study, the training dataset is chosen depending on the prediction discrepancy of the GCM and the experimental data available”: This approach doesn’t ensure that the test set in this study was not used for developed the GCM from J. Marrero and R. Gani. Although the GCM predictions on test set are less accurate than training set, they may be much better than random guess (if the test set were used in the GCM model).
<span style="color:red">[Discuss]</span> **No approach can completely ensure which data was used for training without direct access to it. To make fair comparisons we must compromise in some way, and with this approach we can at least ensure we are improving on parts that were originally hard for previous methods.**
<span style="color:blue">[Kiho Comments]</span> **Can you explain the structure of the dataset and why we had to select this method for constructing the training dataset?**
> 3. Section 2.2 Page 15 “By only using experimental points that are predicted accurately by the GCM, the chance of using problematic experimental data in the training process can be minimized.” - Such a data filtering approach will amplify the biases in previous model. What if the data used for constructing the previous GCM is problematic?
<span style="color:purple">[Modify]</span> **This is 100% correct, and pointing this out in the manuscript is a great addition. However, we might not be able to get around these biases and must compromise in some way. Clarifying this is necessary.**
<span style="color:blue">[Kiho Comments]</span> **Yes. A clear clarification of this compromise might be necessary in the revised manuscript.**
> 4. Instead of training a deep learning model to utilize the knowledge from existing models, does it help if taking the predictions from the existing models as additional input features for the traditional ML models in section 3.1?
<span style="color:red">[Discuss]</span> **This is a good question that cannot be answered without proper testing. However, my intuition is that the model would tend to ignore the structural information and overfit on the previous-prediction feature. Maybe reframe the task as predicting the difference $|y_m - y_p|$? There is probably some literature on this, so we need to check. The pre-training task also works as a regularizer, and it can help filter out poor predictions. This is similar to the reviewer's previous comment about amplifying biases.**
<span style="color:blue">[Kiho Comments]</span> **As you pointed out, we may find some relevant literature to support the current approach. Then we can answer the question and include further discussion in the revised manuscript.**
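If we end up running the test the reviewer proposes, the two baselines could be sketched as below. Everything here is hypothetical: a plain least-squares linear model stands in for the traditional ML models of Section 3.1, and `y_gcm` denotes the existing model's predictions. Variant (a) appends the GCM prediction as an extra input feature; variant (b) learns the residual with respect to the GCM:

```python
import numpy as np

def _fit_linear(X, y):
    # Least-squares linear model with intercept (stand-in for XGBoost etc.)
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.hstack([Xn, np.ones((len(Xn), 1))]) @ w

def fit_with_gcm_feature(X, y, y_gcm):
    # (a) GCM prediction appended as one more descriptor
    f = _fit_linear(np.hstack([X, y_gcm.reshape(-1, 1)]), y)
    return lambda Xn, gn: f(np.hstack([Xn, gn.reshape(-1, 1)]))

def fit_residual(X, y, y_gcm):
    # (b) learn the correction y - y_gcm, add the GCM back at inference
    f = _fit_linear(X, y - y_gcm)
    return lambda Xn, gn: f(Xn) + gn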
> 5. Does the transfer learning procedures in this work appeared in previous machine learning literature? (Try to demonstrate the innovation in highlight)
<span style="color:red">[Discuss]</span> **Our case is rather unusual, since normally predictions are made for related tasks but not exactly the one of interest. There are some parallels with other tasks, and discussing the differences might be useful (need to recheck the literature of the past two years).**
<span style="color:blue">[Kiho Comments]</span> **One more literature check might be needed; then we can answer the question simply.**
#### Minor concerns:
> 6. <span style="color:green">[Add]</span> Page 7 first bullet point: Briefly describe/define GCMs here, since it’s the first time the “GCMs” appeared.
> 7. <span style="color:purple">[Modify]</span> Page 7 second bullet point: The claim of “completely replaces the static group creation” method is an exaggeration since the transfer-learning model in this work depends on the predictions from traditional models. And there is no comparison between “GCMs model from previous literature (Ref. 40) + GCN with transfer learning” and “updated GCMs model with new experimental data (training set)”.
> 8. <span style="color:purple">[Modify]</span> Page 15: Define “model data” in Table 2, Table 3 here; or referring to Figure 6.
> 9. <span style="color:green">[Add]</span> Could you include the results from the latest GCM in Figure 7 for comparison?
> 10. <span style="color:purple">[Modify]</span> Table 5: There is no need to show error on training set. And it’s helpful to include the best performing model (XGBoost) results mentioned in 3.1.
> 11. <span style="color:red">[Discuss]</span> Section 3.2.1 feature analysis: Analyzing the results from Graph Attention Network seems less relevant to this work.
**The reviewer is right that this seems a bit unrelated (we added it for a previous journal) and might draw attention away from transfer learning. However, I don't think simply eliminating it is a good idea, since it might have appealed to previous reviewers.**
<span style="color:blue">[Kiho Comments]</span> **I agree with your opinion.**
> 12. <span style="color:green">[Add]</span> Page 28 “This implies that either the original molecular structure fed to the model has errors, or the actual experimental data reported might have problems” – Activity cliffs could be another reason.
> 13. <span style="color:purple">[Modify]</span> Figure 12: The caption and the figure don't match.