# CloudCom review response

## ----------------------- REVIEW 1 ---------------------

----------- Overall evaluation -----------
SCORE: 1 (weak accept)
----- TEXT:
The authors investigated the performance, monetary cost, and transparency of four representative supercomputing applications in a cloud bursting environment. For I/O-intensive applications, a significant performance gain can be obtained by utilizing local storage on cloud nodes; however, this comes at the expense of transparency. To encourage users to use cloud nodes, a payment policy is applied so that users do not pay a different monetary cost than those using on-premise nodes. Overall, although the paper has no major contribution, the findings and suggestions deserve some attention.

## Response

Nothing in particular.

## ----------------------- REVIEW 2 ---------------------

----------- Overall evaluation -----------
SCORE: 0 (borderline paper)
----- TEXT:
This paper is a case study on the cloud bursting environment between SQUID and Azure from a practical operational perspective. It presents four system settings and discusses their advantages and disadvantages. The insight is that the cloud bursting queue with local storage disabled and the cloud-dedicated queue with local storage enabled are used in combination. This paper is well written and is helpful as a case study for those who consider using a cloud bursting environment. However, it only focuses on the cost and on how to encourage users to use the cloud bursting environment. For many users, security and timeliness are more important when considering cloud bursting. Also, this paper does not consider any technical methods to solve the problems that emerge in cloud bursting, such as performance or the automation of configurations. One suggestion on the paper's presentation: in Figure 4 and Figure 5, the data span different orders of magnitude, so the small values cannot be seen clearly. Please change the y-axis to a log scale or just use tables to present the results.

## Response

### First half

- The performance problems caused by cloud bursting are left as future work, which we are currently working on.
- The automation of configurations is described in our prior work, so we will add a statement referring readers to it.

### Second half

- Change the y-axis of Fig. 4 and Fig. 5 to a log scale (a minimal plotting sketch follows below).
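A minimal sketch of the intended Fig. 4 / Fig. 5 revision, assuming the figures are drawn with matplotlib; the application names and values below are placeholders, not the paper's measurements. Switching the y-axis to a logarithmic scale keeps the small SQUID values readable next to the much larger Azure values.

```python
# Sketch only: grouped bars with a logarithmic y-axis, as requested by Reviewer 2.
# Application names and numbers are placeholders, not measurements from the paper.
import matplotlib.pyplot as plt
import numpy as np

apps = ["app A", "app B", "app C", "app D"]
squid = np.array([1.2e2, 3.4e2, 9.5e1, 8.0e2])   # placeholder times on SQUID [s]
azure = np.array([1.5e2, 6.1e3, 2.4e3, 4.5e4])   # placeholder times on Azure [s]

x = np.arange(len(apps))
fig, ax = plt.subplots()
ax.bar(x - 0.2, squid, width=0.4, label="SQUID")
ax.bar(x + 0.2, azure, width=0.4, label="Azure")
ax.set_yscale("log")             # the change requested for Fig. 4 and Fig. 5
ax.set_xticks(x)
ax.set_xticklabels(apps)
ax.set_ylabel("Time [s]")
ax.legend()
fig.savefig("fig4_log_scale.pdf")
```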
## ----------------------- REVIEW 3 ---------------------

----------- Overall evaluation -----------
SCORE: -1 (weak reject)
----- TEXT:
The paper discusses a cloud bursting environment between a dedicated supercomputing cluster and two cloud systems. I understand the authors' motivation, i.e., how to formulate server policies that encourage users of the dedicated supercomputer to take advantage of the cloud computing resources. However, according to the experimental results, both the execution and I/O times of all applications on the supercomputer significantly outperform those on the cloud servers. Moreover, the costs of computations on the supercomputer are only <1% of those on the cloud servers. Given this investigation, I do not think that users of the supercomputer would want to run any simulations on the cloud servers under any policy. How did the authors calculate/obtain the cost per node-hour shown in Table II? Is the cost of the dedicated supercomputer (SQUID, only 1/40 of the two others) only for academic users? Also, Lustre is a parallel file system that is optimized for data I/O from many clients at the same time. Usually it is not necessary to transfer data between Lustre and local file systems before/after jobs.

## Response

### Paragraph 1

- EP (CPU-intensive) and CG (communication-intensive).

### Paragraph 2

- For D64ds_v4 and HC44rs we use the prices charged by Microsoft Azure; for SQUID we use the fee charged by the Osaka University CMC.
- Regarding academic users → confirm with Prof. Date.

### Paragraph 3

- Data is transferred before and after a job in order to use the local storage.

## ----------------------- REVIEW 4 ---------------------

----------- Overall evaluation -----------
SCORE: 2 (accept)
----- TEXT:
The proposed approach of using a cloud system to extend a supercomputer is interesting. The article provides valuable insight into the impact (cost, transparency) of using such external capacity for the supercomputer operator. The abstract and introduction should include some elements on the scale of the study (in terms of the number of servers for the SQUID part, the number of users, and the duration). The performance evaluation shows (with a limited set of applications) the relative impact of running HPC applications on both systems. The policy section is unclear: either the on-premise facility is more cost-efficient or the cloud one is, but in the second paragraph both seem possible. Also, in light of the rest of the document, the text explaining that reduced energy costs can compensate for the cloud cost seems strange. The discussion is more balanced than the results. With the obtained data, it seems difficult to back the overhead and costs of having the cloud extensions.

## Response

### Paragraph 2

- Scale of the CPU nodes, intended for scientists, in operation since May 2021.

### Paragraph 3

#### First point

- Section 2-C states that the CMC's node-hour fee is cheaper than IaaS.
- The CMC's node-hour fee is cheap → change the explanation to say this is because only the electricity cost is charged?

#### Second point

- In the paper, the money saved through the supercomputer's power savings is allocated to the cloud bursting budget.
- When lightly loaded, the budget is saved by shutting down unnecessary compute nodes → this budget is then used when heavily loaded (a rough worked example is sketched at the end of these notes).

### Paragraph 4

- Does "the overhead and costs of having the cloud extensions" refer to SQUID itself, or to the cloud bursting environment as a whole? This is unclear.
- We built the cloud bursting environment prioritizing transparency in terms of usage. To this end, we adopted a design in which SQUID and Azure share the file storage located inside the CMC. On the other hand, this design revealed that the performance and cost of I/O-intensive jobs on Azure are inferior to those on SQUID.
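A rough worked example of the budget flow referred to above (the symbols are illustrative and not taken from the paper): if $N_{\mathrm{off}}$ unnecessary compute nodes are powered off for $T$ hours during a lightly loaded period and $c_{\mathrm{energy}}$ is the electricity cost per node-hour, the saving

$$
S = N_{\mathrm{off}} \cdot c_{\mathrm{energy}} \cdot T
$$

becomes the cloud bursting budget, which buys roughly $S / c_{\mathrm{cloud}}$ cloud node-hours during heavily loaded periods, where $c_{\mathrm{cloud}}$ is the cloud price per node-hour.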