Dear Shepherd, dear Reviewers,

We have completed the revision of our paper according to the requirements and plan outlined in our previous correspondence. Attached to this message, please find our revised paper and the `latexdiff` PDF highlighting the modifications. We greatly appreciate the detailed feedback, which has significantly improved our paper. Below, we provide a comprehensive summary of changes, outlining all modifications made to the paper to address the concerns raised during the reviewing process.

# Summary of Changes

### Experimentation

* **1-A:** Expand analysis and metrics on attack results (Sec 5.3 / Table 7)
  * We have revised Section 5.3, incorporating a comprehensive analysis of leaks categorized by information type.
* **1-B:** Role and effectiveness of prompts (Sec 5.3 / Table 8)
  * We have added a table reporting the number of leaks categorized by prompt-construction method.
  * We have analyzed the effects of the different construction methods in the accompanying text.
* **1-C:** Include an additional model (Sec 5.1.2 / Table 6)
  * We have validated the approach on two additional code generation models:
    * The first is PolyCoder, for which only the links to the GitHub repositories and the hashes of each file used in training are shared. This represents an intermediate case between CodeParrot and Codex, as even after crawling GitHub we cannot recover the full training ground truth.
    * The second is StarCoder, which is not a GPT-based model and combines architectural features unavailable in other open code LLMs, making it a good candidate for evaluating the generalizability of our approach.
  * We summarize these experiments in Table 6, contrasting the results against those of CodeParrot.
  * In the accompanying text of Section 5.1.2, we discuss how the attack generalizes to these models.
* **1-D:** Baseline (Sec 5.3)
  * We provide insights into how the attack compares against a simple baseline of searching the prompts on GitHub and report the number of leaks in that case.

### Writing

* **2-A:** Expand on Threat Model (Sec 3.2)
  * To provide clarity, we have added two paragraphs that further delineate the attackers' capabilities and the realism of the threat model.
* **2-B:** Expand on novelty of submission (Sec 7 & Sec 1)
  * We rewrote parts of the Introduction and Related Work sections to better clarify the differences from prior work.
* **2-C:** Contributions of the work - "easy fix" (Sec 6.1 & Sec 1)
  * We addressed the "easy fix" concern by discussing it in the Impact section (Section 6.1 of the paper).
  * We emphasized our contribution by providing appropriate context, ensuring a more comprehensive and coherent presentation of our work.
* **2-D:** Choice of GitHub Threshold (Sec 5.2)
  * We have clarified that our choice of GitHub search hit-rate threshold was conservative, aimed at demonstrating the feasibility of the overall pipeline, and can be customized based on the privacy requirements of the audit (see the illustrative sketch after this summary).

### Others

* Disclosure (Sec 1)
  * We have added a note on our experience disclosing our findings to both GitHub and OpenAI.
* Hallucinations/Takedowns (Sec 6.1 & Sec 3.1.2)
  * We have clarified that verification of hallucinations and takedowns is out of the scope of our work and that our results represent a lower bound on the attack precision.
* Compress and streamline the background to make space for our further results
  * We condensed the lengthy content in the background section and some other sections.
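To make item 2-D concrete, the sketch below illustrates, under simplified assumptions, how a generated candidate string can be checked against a configurable GitHub search hit-rate threshold. This is not our pipeline code: the helper names, the exact query construction, and the keep/discard rule around the threshold are illustrative placeholders; only the use of GitHub search and the default threshold of 100 reflect the paper (Sec 4.4.2 / Sec 5.2), and the paper's actual decision rule may differ.

```python
import time
import requests

SEARCH_URL = "https://api.github.com/search/code"  # GitHub code-search REST endpoint
HIT_RATE_THRESHOLD = 100  # the conservative default discussed in Sec 5.2


def github_hit_count(candidate: str, token: str) -> int:
    """Return the number of code-search hits for a candidate string.

    Authenticated code search is heavily rate-limited, so a real pipeline
    needs throttling, retries, and caching; none of that is shown here.
    """
    resp = requests.get(
        SEARCH_URL,
        params={"q": f'"{candidate}"'},  # exact-phrase query (simplified)
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("total_count", 0)


def passes_hit_rate_filter(candidate: str, token: str,
                           threshold: int = HIT_RATE_THRESHOLD) -> bool:
    """Illustrative decision rule: keep candidates that do appear on GitHub
    but are rare enough not to be boilerplate; the rule used in the paper
    may differ."""
    hits = github_hit_count(candidate, token)
    time.sleep(6)  # stay well under the code-search rate limit
    return 0 < hits < threshold
```

In the full pipeline, this check is only one filtering stage; it follows the membership inference step and precedes human review, and the threshold can be tuned to the privacy requirements of the audit.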
Should you require any further information or have any additional suggestions, please do not hesitate to let us know. We sincerely appreciate your continued support and look forward to your evaluation of the revised manuscript.

Best regards,\
Authors

___

# Disclosure email

to: disclosure@openai.com

Link: https://openai.com/policies/coordinated-vulnerability-disclosure-policy

Email Subject: **Responsible Disclosure of Potential Copilot/Codex Model Information Leaks**

Dear GitHub Support Team,

We are writing to report the findings of a recent research paper on generating information leaks from the Codex model (the underlying model of GitHub Copilot) and to responsibly disclose these findings to your team. Our group of researchers is based at NYU/NYUAD, and the paper is currently under review at an academic conference. Our research develops a pipeline for extracting sensitive personal information from the Codex model, which is trained on GitHub repositories. We have attached the research paper.

**(Original Introduction, detailed version)**: This paper develops a pipeline for extracting sensitive personal information from the Codex model, which is trained on GitHub repositories. First, we design templates to automatically construct prompts that are more likely to induce privacy leaks (Section 4.2). We then propose and refine a semi-automated method to filter the generated code in favor of actual leaks of personal information from the training data; for this purpose, we use a blind membership inference attack (Section 4.4.1) followed by the hit rate from the GitHub Search API (Section 4.4.2) as a distinguishing heuristic. We then perform a human-in-the-loop evaluation (Section 4.4.3), which reveals that roughly 8% of our constructed prompts yield privacy leaks (43 leaks from 512 prompts).

**(A shorter introduction, condensed with ChatGPT)**: In brief, we designed templates to construct prompts that are likely to induce privacy leaks (Section 4.2) and then used a semi-automated method to filter the generated code in favor of actual leaks of personal information from the training data. For this purpose, we used a blind membership inference attack (Section 4.4.1) followed by the hit rate from the GitHub Search API (Section 4.4.2) as a distinguishing heuristic. Our human-in-the-loop evaluation revealed that approximately 8% of our constructed prompts yield privacy leaks (43 leaks from 512 prompts) (Section 4.4.3). We also observed that the model is more likely to produce indirect leaks that reveal information about individuals in the close vicinity of the queried subject in the training corpus. Our findings contribute to ongoing work on identifying the relationship between memorization and privacy by revealing that the Codex model is much more likely to leak the personal information of other people contained in the same code file than that of the queried person, potentially violating privacy as contextual integrity for these other subjects.

We would be happy to provide any additional information or answer any questions you may have regarding our research. Thank you for your attention to this matter. We hope that responsibly disclosing our findings will help to improve the privacy and security of the Codex model.

Best regards,
NYU/NYUAD CSP-Lab Team

# Revision Plan and Tentative Timeline

Dear Shepherd,

Thank you for helping us improve our paper in the revision process.
Our plan to address the issues listed in the reviews closely follows what we proposed in the rebuttal:

Experimentation:

- 1-A: Expand analysis and metrics on attack results
  We will provide an exploratory analysis of which categories of information are more likely to be extracted accurately.
- 1-B: Role and effectiveness of prompts
  We will provide an analysis of the effects of different prompts - by construction method and by leak category - on the extracted results.
- 1-C: Include an additional model
  We are in the process of validating the approach on an additional code generation model - PolyCoder - for which only the links to the GitHub repositories and the hashes of each file used in training are shared. This represents an intermediate case between CodeParrot and Codex, as even after crawling GitHub we cannot recover the full training ground truth.
- 1-D: Baseline
  We will provide insights into how the attack compares against the simple baseline of searching the prompts on GitHub, along with a discussion of the two attacks.

Writing:

- 2-A: Expand on Threat Model
  We will clarify the threat model to show why it is realistic and further delineate the attackers' capabilities.
- 2-B: Expand on novelty of submission
  We will revise our paper to clarify the differences from text LLMs and make comparisons with prior work explicit.
- 2-C: Contributions of the work - "easy fix"
  We will better emphasize how the wide adoption of code-generation LLMs similar to Copilot calls for a timely and critical investigation into their privacy implications. GitHub Copilot was trained not only on public code but also on private user code, as specified in their telemetry policy. Because we do not have access to the private code data, we verify using only public code that the Codex model can leak sensitive information. However, if the model can leak information from public code, it will also leak information from private code.
- 2-D: Choice of GitHub Threshold
  Our choice of a GitHub search hit-rate threshold of 100 was conservative and aimed to demonstrate the overall pipeline's feasibility. We will make it clear that the choice of threshold is not crucial to the attack and can be customized (e.g., based on the privacy requirements of the audit).
- 2-E: Compress and streamline the background to make space for our further results.

Others:

- Disclosure
  We have disclosed our findings to both GitHub and OpenAI. We are currently waiting to hear back and will report their reaction.

Please let us know if there is anything else you would recommend changing in the paper.

We have one question regarding the page limit: for our revised paper, can we assume that the camera-ready instructions apply, i.e., that our paper can have 18 pages in total, or are we bound to the submission instructions of 13 pages of main text plus references and appendix?

On the organizational side, we would like to target the 2023 deadline. We are already in the process of making the changes requested in the reviews. We plan to conduct the required experiments in the next 14 days and then prepare the revision, along with a summary of changes, by May 31, hopefully leaving enough room for you to provide feedback and suggest further changes if necessary. Please let us know if you are okay with this timeline proposal. Thank you!

Best Regards,
Authors

---

# Meta Review

The reviewers found this paper valuable, but they found issues that need to be resolved prior to publication.
Therefore, they have landed on an Accept Conditional to Major Revision with the following requirements:

Experiments:

- Add more analysis and metrics on attack results: e.g., what kinds of information are more likely to be extracted accurately
- Add more analysis on the effects of different prompts on extracted results
- Add experiments on multiple models with different settings
- Compare the proposed attack against the simple baseline of searching the prompts (or prefix) on GitHub

Writing:

- Add a more detailed explanation of the threat model
- Clarify the novelty of the submission
- Clarify the threat model
- Clearly state the contribution of the work, since there is an easy fix to the privacy leak problem
- Restructure the prose to shorten some parts of the background and use the space to explain the experiment results
- Discuss the choice of the GitHub search hit threshold

Others:

- Add a paragraph on the disclosure process, including the timeline and the reaction from and changes (if any) made by the informed vendors

The Research Ethics Committee (REC) reviewed this paper and the rebuttal provided by the authors. We appreciate the additional details and thank the authors for coordinating on the vulnerability disclosure with both GitHub and OpenAI. As mentioned in the revision criteria, we ask the authors to add a paragraph on the disclosure process, including the timeline and the reaction from and changes (if any) made by the informed vendors.

---

# Codex Leak USENIX Sec Rebuttal - 700 words

We thank the reviewers for their constructive reviews. We appreciate that they found our work "interesting" (R-A), addressing a "timely problem" (R-B,D,E), and presenting a "new attack" (R-C). We address the major concerns here.

**Novelty and comparison to previous work (R-B,C,D)** As noted by GitHub, the Copilot project has already reached one million users. With the increasing adoption of code-generation LLMs, there is a timely and critical need to investigate their privacy implications. While prior research has found private information in GitHub repositories [1], our study is the first to investigate attacks against AI-based code-generation tools. We draw on the insights from [1], particularly in the human-in-the-loop step, to confirm the identified leaks. Our proposed method differs from extracting training data from text LLMs [7,8] in several ways. (1) We proposed a novel attack based on BlindMI rather than naive perplexity scores, along with a pragmatic pipeline for verification. (2) We identified a new "mix&match" leak pattern from Codex, which differs from eidetic memorization [8]. (3) We designed prompts specific to code-generation models to elicit sensitive information using a variety of methods. We will revise our paper to clarify these differences and our novelty.

**Generality/Transferability of CodeParrot results to Codex (R-A,B,D)** By separately validating our approach on different code generation models, we gain confidence in its generalizability to different LLMs. We have investigated it with CodeParrot and Codex. CodeParrot has the advantage that its training dataset is available, giving us ground truth; it is trained on a fraction of GitHub code files. Since the Codex model is a black box for us, we used CodeParrot to evaluate BlindMI and demonstrated its effectiveness by reporting the actual leaks filtered by BlindMI on Codex.
We are in the process of validating the approach on further code-generation models, in particular PolyCoder, for which only the links to the GitHub repositories and the hashes of each file used in training are shared. This represents an intermediate case between CodeParrot and Codex, as even after crawling GitHub we cannot recover the full training ground truth.

**Threat Model (R-A,C)** The training data of Copilot includes both open-source public and private code. It also includes deleted code that may no longer be accessible but was part of the training set. It is a realistic assumption that attackers have access to some training sequences, given that it is impractical to train a code-generation LLM without open-source code. We will clarify the threat model further.

**Prompt analysis and generalization (R-A,C)** Even though our attack is transferable to other code-generation LLMs, we leave extending it to text LLMs as future work. We expect our membership inference to work for text LLMs, but acknowledge that the prompt generation will not be directly applicable and will need to be extended. We will provide more analysis of the GitHub prompts to show what kinds of prompts lead to better results.

**Model Hallucinations (R-E)** Model hallucination will indeed affect the attack precision. By design, our choice of BlindMI accounts for hallucinations, as the method helps to filter out non-members of the training set. In the black-box setting, it is almost impossible to report the ratio of real sensitive data to hallucinations because of the lack of ground truth. In fact, some hallucinations might even be members of the training set that have been deleted from GitHub since training.

**Threshold heuristics (R-E)** Our choice of a GitHub search hit-rate threshold of 100 was conservative and aimed to demonstrate the overall pipeline's feasibility. However, this threshold is not crucial to the attack and can be customized (e.g., based on the privacy requirements of the audit).

**Easy fix: Remove public information (R-C)** GitHub Copilot was trained not only on public code but also on private user code, as specified in their telemetry policy. Because we do not have access to the private code data, we verify using only public code that the Codex model can leak sensitive information. However, if the model can leak information from public code, it will also leak information from private code.

**Disclosure (R-D)** We are reporting our findings to both GitHub and OpenAI and will report on the process and their reaction.

We will address all other issues, such as clarifications, incorrect claims, and typos. Many thanks for the thoughtful and detailed reviews!
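As background for the contrast drawn above between our BlindMI-based filter and naive perplexity scoring, the following is a minimal sketch of that naive baseline on an open model. It is illustrative only and is not our attack or evaluation code: the checkpoint name, the candidate snippets, and the ranking step are placeholders chosen for this sketch.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint: any open causal code LM works for this illustration.
MODEL_NAME = "codeparrot/codeparrot-small"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


@torch.no_grad()
def perplexity(snippet: str) -> float:
    """Naive membership signal: perplexity of a generated snippet under the model.

    Low perplexity is commonly read as "likely seen during training"; this is
    the single-score heuristic that our BlindMI-based filter is contrasted with.
    """
    enc = tokenizer(snippet, return_tensors="pt")
    out = model(**enc, labels=enc["input_ids"])  # mean token-level cross-entropy
    return math.exp(out.loss.item())


# Example usage: rank candidate generations, lowest perplexity first.
candidates = [
    "def connect(host, user, password): ...",  # hypothetical generations
    "print('hello world')",
]
ranked = sorted(candidates, key=perplexity)
```

Our pipeline replaces this single-score heuristic with the BlindMI-based filter followed by the GitHub search hit-rate check and human review, rather than relying on a raw perplexity threshold.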