# Bibliography for Lukas' MSc by Research

###### tags: `bibliography`

### [Memorization in NLP Fine-tuning Methods](https://arxiv.org/pdf/2205.12506.pdf)
- Main research paper of interest: how do different fine-tuning methods change the memorization of a particular ML model? The tested model (GPT-2) is outdated and the fine-tuning methods are basic.

### [Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models](https://arxiv.org/pdf/2205.10770.pdf)
- Memorization changes with model size: larger LLMs memorize faster than smaller ones. Should be considered during the thesis.

### [Deduplicating Training Data Makes Language Models Better](https://arxiv.org/pdf/2107.06499.pdf)
- Duplicate data in the training set increases memorization. Should be considered during the thesis.

### [Emergent and Predictable Memorization in Large Language Models](https://arxiv.org/pdf/2304.11158.pdf)
- Predicts which parts of the training data set are most likely to be memorized by the LLM. Results are inconclusive and need further research. Not in scope of this thesis.

### [Membership Inference Attacks against Language Models via Neighbourhood Comparison](https://arxiv.org/pdf/2305.18462.pdf)
- Attacks that exploit memorization; might be useful for later stages of this thesis.

### [Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning](https://arxiv.org/pdf/2305.11759.pdf)
- Prompt engineering for extracting training data; might be useful for later stages of this thesis.

### [Privacy-Preserving Prompt Tuning for Large Language Model Services](https://arxiv.org/pdf/2305.06212.pdf)
- This thesis will focus on fine-tuning, not prompt engineering, but this might still be useful.

### [Responsible Use Guide for LLMs - Meta Research](https://github.com/facebookresearch/llama/blob/main/Responsible-Use-Guide.pdf)
- Good summary of LLM risks.

### [Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey](https://arxiv.org/pdf/2210.07700.pdf)
- Good summary of LLM risks.

### [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/pdf/2307.09288.pdf)
- Good paper on everything Llama.

### [Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection](https://arxiv.org/pdf/2302.12173.pdf)
- Good paper on everything prompt injection.
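
---

The deduplication paper above reports that duplicate training data increases memorization, which suggests dedup as a baseline mitigation in the thesis experiments. A minimal sketch of the exact-match case (the paper itself also removes *near*-duplicates using suffix arrays and MinHash; those are not shown here):

```python
import hashlib

def deduplicate(examples):
    """Exact-match deduplication: keep the first occurrence of each example.

    This is only a sketch of the exact-match case; the referenced paper
    additionally removes near-duplicate substrings, which requires suffix
    arrays / MinHash and is out of scope for this snippet.
    """
    seen = set()
    unique = []
    for text in examples:
        # Hash a whitespace-normalized form so trivial spacing variants collapse.
        digest = hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique
```

Running the fine-tuning experiments once on the raw corpus and once on `deduplicate(corpus)` would give a simple first data point on how much duplication drives memorization in the chosen setup.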