# The Open Source AI Definition

### version 0.0.9

:::info
:information_source: This document follows the definition of AI system adopted by the [Organisation for Economic Co-operation and Development (OECD)](https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449):

> An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.

More information about definitions of AI systems is available on [OSI's blog](https://blog.opensource.org/open-source-ai-establishing-a-common-ground/).
:::

# Preamble

## Why we need Open Source Artificial Intelligence (AI)

Open Source has demonstrated that massive benefits accrue to everyone after removing the barriers to learning, using, sharing and improving software systems. These benefits are the result of using licenses that adhere to the Open Source Definition. For AI, society needs the same essential freedoms of Open Source to enable AI developers, deployers and end users to enjoy those same benefits: autonomy, transparency, frictionless reuse and collaborative improvement.

# What is Open Source AI

When we refer to a "system," we are speaking both broadly about a fully functional structure and its discrete structural elements. To be considered Open Source, the requirements are the same, whether applied to a **system**, a **model**, **weights and parameters**, or other structural elements.

An Open Source AI is an AI system made available under terms and in a way that grant the freedoms[^1] to:

* **Use** the system for any purpose and without having to ask for permission.
* **Study** how the system works and inspect its components.
* **Modify** the system for any purpose, including to change its output.
* **Share** the system for others to use with or without modifications, for any purpose.

These freedoms apply both to a fully functional system and to discrete elements of a system. A precondition to exercising these freedoms is to have access to the preferred form to make modifications to the system.

[^1]: These freedoms are derived from the [Free Software Definition](https://www.gnu.org/philosophy/free-sw.en.html).

## Preferred form to make modifications to machine-learning systems

The preferred form of making modifications to a machine-learning system is:

* **Data information**: Sufficiently detailed information about the data used to train the system, so that a skilled person can recreate a substantially equivalent system using the same or similar data. Data information shall be made available with licenses that comply with the Open Source Definition.
    * For example, if used, this would include the training methodologies and techniques, the training data sets used, information about the provenance of those data sets, their scope and characteristics, how the data was obtained and selected, the labeling procedures and data cleaning methodologies.
* **Code**: The source code used to train and run the system, made available with OSI-approved licenses.
    * For example, if used, this would include code used for pre-processing data, code used for training, validation and testing, supporting libraries like tokenizers and hyperparameter search code, inference code, and model architecture.
* **Weights**: The model weights and parameters, made available under OSI-approved terms[^2].
    * For example, this might include checkpoints from key intermediate stages of training as well as the final optimizer state.

## Open Source models and Open Source weights

For machine learning systems,

* An **AI model** consists of the model architecture, model parameters (including weights) and inference code for running the model.
* **AI weights** are the set of learned parameters that overlay the model architecture to produce an output from a given input.

The preferred form to make modifications to machine learning systems also applies to these individual components. “Open Source models” and “Open Source weights” must include the data information and code used to derive those parameters.

[^2]: The Open Source AI Definition does not take any stance as to whether model parameters require a license, or any other legal instruments, and whether they can be legally controlled by any such instruments once disclosed and shared.
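
As an illustration only, the three elements of the preferred form (data information, code, weights) can be tracked as a simple checklist when reviewing a release. The sketch below is not part of the definition: the names `Component`, `Release` and `missing_components` are hypothetical, and the `terms_ok` flag merely records a reviewer's own judgment about whether the terms meet the requirement stated for that element. It does not automate that judgment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """One element of the preferred form (hypothetical bookkeeping record)."""
    provided: bool = False   # has the element been made available at all?
    terms_ok: bool = False   # reviewer's judgment that its terms meet the requirement


@dataclass
class Release:
    """The three elements named in the preferred-form section."""
    data_information: Component = field(default_factory=Component)
    code: Component = field(default_factory=Component)
    weights: Component = field(default_factory=Component)


def missing_components(release: Release) -> List[str]:
    """Return the names of elements that are absent or whose terms were not accepted."""
    missing = []
    for name in ("data_information", "code", "weights"):
        component = getattr(release, name)
        if not (component.provided and component.terms_ok):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # A release that ships weights and code but no data information.
    release = Release(
        code=Component(provided=True, terms_ok=True),
        weights=Component(provided=True, terms_ok=True),
    )
    print(missing_components(release))
```

An empty result only means that every box on this checklist was ticked by the reviewer; the assessment of each element still has to be made against the text of the definition.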