# Open Instruction-Tuning Datasets

| Dataset name | Paper | Dataset link | Year |
| -------- | -------- | -------- | -------- |
| Alpaca | [AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback](https://neurips.cc/virtual/2023/poster/72842) | [GitHub](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json), [Hugging Face](https://huggingface.co/datasets/tatsu-lab/alpaca) | 2023/03 |
| databricks-dolly-15k | [Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) | [Hugging Face](https://huggingface.co/datasets/databricks/databricks-dolly-15k) | 2023/04 |
| UltraFeedback | [UltraFeedback: Boosting Language Models with Scaled AI Feedback](https://icml.cc/virtual/2024/poster/34726) | [Hugging Face](https://huggingface.co/datasets/openbmb/UltraFeedback) | 2023/10 |

- More instruction-tuning datasets can be found at:
  - [eugeneyan/open-llms](https://github.com/eugeneyan/open-llms?tab=readme-ov-file#open-llm-datasets-for-instruction-tuning)
  - [Investigating Public Fine-Tuning Datasets: A Complex Review of Current Practices from a Construction Perspective (arXiv 2024)](https://arxiv.org/abs/2407.08475)
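Datasets in this family typically store each example as a JSON record with `instruction`, `input`, and `output` fields (Alpaca's `alpaca_data.json` follows this schema). As a minimal sketch, the snippet below uses two hypothetical records and a prompt template modeled on the one in the Stanford Alpaca repository to show how such records are commonly rendered into training prompts; the exact template wording may differ from what a given project uses.

```python
import json

# Two hypothetical records illustrating the Alpaca-style schema: every
# entry carries "instruction", "input" (may be empty), and "output".
records = json.loads("""[
  {"instruction": "Name the capital of France.",
   "input": "",
   "output": "Paris."},
  {"instruction": "Summarize the text.",
   "input": "LLMs are large neural networks trained on text.",
   "output": "LLMs are big text-trained neural nets."}
]""")

def to_prompt(rec: dict) -> str:
    """Render one record into an Alpaca-style training prompt,
    choosing the with-input or no-input template variant."""
    if rec["input"]:
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{rec['instruction']}\n\n"
            f"### Input:\n{rec['input']}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{rec['instruction']}\n\n"
        "### Response:\n"
    )

# The target for supervised fine-tuning is then rec["output"],
# appended after the "### Response:" header.
prompts = [to_prompt(r) for r in records]
```

For real training runs, the same function can be mapped over the full dataset (e.g. loaded with the Hugging Face `datasets` library) instead of the inline sample above.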