# Bibliography for Lukas' MSc by Research
###### tags: `bibliography`
### [Memorization in NLP Fine-tuning Methods](https://arxiv.org/pdf/2205.12506.pdf)
- Main research paper of interest; examines how different fine-tuning methods change the memorization behavior of a given model. The tested model (GPT-2) is outdated and the fine-tuning methods are basic.
### [Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models](https://arxiv.org/pdf/2205.10770.pdf)
- Memorization changes with model size: larger LLMs memorize faster than smaller ones. Should be considered during the thesis.
### [Deduplicating Training Data Makes Language Models Better](https://arxiv.org/pdf/2107.06499.pdf)
- Duplicate data in the training set increases memorization. Should be considered during the thesis.
### [Emergent and Predictable Memorization in Large Language Models](https://arxiv.org/pdf/2304.11158.pdf)
- Predicts which parts of the training data set are most likely to be memorized by the LLM. Results inconclusive - needs further research. Not in scope of this thesis.
### [Membership Inference Attacks against Language Models via Neighbourhood Comparison](https://arxiv.org/pdf/2305.18462.pdf)
- Attacks that exploit memorization; might be useful for later stages of this thesis.
### [Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning](https://arxiv.org/pdf/2305.11759.pdf)
- Prompt engineering for extracting training data; might be useful for later stages of this thesis.
### [Privacy-Preserving Prompt Tuning for Large Language Model Services](https://arxiv.org/pdf/2305.06212.pdf)
- This thesis will focus on fine-tuning, not prompt engineering, but this might still be useful.
### [Responsible Use Guide for LLMs - Meta Research](https://github.com/facebookresearch/llama/blob/main/Responsible-Use-Guide.pdf)
- Good summary of LLM risks.
### [Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey](https://arxiv.org/pdf/2210.07700.pdf)
- Good summary of LLM risks.
### [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/pdf/2307.09288.pdf)
- Good paper on everything Llama.
### [Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection](https://arxiv.org/pdf/2302.12173.pdf)
- Good paper on everything prompt injection.