The Open Source AI Definition

version 0.0.5

Note: This document consists of three parts: a preamble, stating the intentions of this document; the Definition of Open Source AI itself; and a checklist to evaluate licenses.

This document follows the definition of AI system adopted by the Organisation for Economic Co-operation and Development (OECD):

An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.

More information about definitions of AI systems is available on OSI's blog.

Preamble

Why we need Open Source Artificial Intelligence (AI)

Open Source has demonstrated that massive benefits accrue to everyone when you remove the barriers to learning, using, sharing and improving software systems. These benefits are the result of using licenses that adhere to the Open Source Definition. The benefits can be summarized as autonomy, transparency, and collaborative improvement.

Everyone needs these benefits in AI. We need essential freedoms to enable users to build and deploy AI systems that are reliable and transparent.

Out of scope issues

The Open Source AI Definition doesn’t say how to develop and deploy an AI system that is ethical, trustworthy or responsible, although it doesn’t prevent it. We support the efforts to discuss the responsible development, deployment and use of AI systems, including through appropriate government regulation, as a separate conversation.

What is Open Source AI

To be Open Source, an AI system needs to be available under legal terms that grant the freedoms to:

Use the system for any purpose and without having to ask for permission.
Study how the system works and inspect its components.
Modify the system for any purpose, including to change its output.
Share the system for others to use with or without modifications, for any purpose.

Checklist to evaluate legal documents

This table is a work in progress. See slide 7 for more details.

Component	Necessary to Use	Necessary to Study	Necessary to Modify	Necessary to Share
Code All code used to parse and process data, including:
- Data preprocessing code
- Training code
- Code used to perform inference for benchmark tests
- Validation code
- Inference code
- Evaluation code
- Other libraries or code artifacts that are part of the system, such as tokenizers and hyperparameter search code, if used.
Data All data sets, including:
- Training data sets
- Testing data sets
- Validation data sets
- Benchmarking data sets
- Data cards
- Evaluation metrics and results
- All other data documentation
Model All model elements, including:
- Model architecture
- Model parameters (including weights)
- Model card
- Sample model outputs
Other Any other documentation or tools produced or used, including:
- Thorough research papers
- Usage documentation
- Technical report
- Supporting tools
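
For readers who want to track the checklist programmatically, the table above can be sketched as a small data structure. This is only an illustrative sketch, not part of the Definition: the names `FREEDOMS`, `Component`, `CHECKLIST`, `undecided` and `required_for` are hypothetical, only a few of the listed components are included, and every "Necessary to ..." cell is left as `None` because the draft table does not yet fill them in.

```python
from dataclasses import dataclass, field

# The four freedoms that label the table columns.
FREEDOMS = ("use", "study", "modify", "share")


@dataclass
class Component:
    category: str  # "Code", "Data", "Model", or "Other"
    name: str
    # True/False once decided; None means the draft leaves the cell empty.
    necessary_for: dict = field(
        default_factory=lambda: {freedom: None for freedom in FREEDOMS}
    )


# A hypothetical subset of the components listed in the table.
CHECKLIST = [
    Component("Code", "Training code"),
    Component("Code", "Inference code"),
    Component("Data", "Training data sets"),
    Component("Model", "Model parameters (including weights)"),
    Component("Model", "Model card"),
    Component("Other", "Usage documentation"),
]


def undecided(checklist):
    """Return (component name, freedom) pairs whose cell is still empty."""
    return [
        (component.name, freedom)
        for component in checklist
        for freedom in FREEDOMS
        if component.necessary_for[freedom] is None
    ]


def required_for(checklist, freedom):
    """Return names of components explicitly marked necessary for a freedom."""
    return [c.name for c in checklist if c.necessary_for[freedom] is True]
```

Keeping undecided cells as `None`, rather than defaulting them to `False`, avoids implying a decision the draft has not made.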

Richard Fontana

2024/01/31 01:55:21

Component

I note that this collection of components is specific to machine learning, in contrast with the OECD definition of "AI system" which is apparently much broader.

smaffulli

2024/01/31 13:44:28

Yes, this is an initial release and it's based on the Model Openness Framework being developed by LF AI & Data/Generative AI Commons. We'll expand beyond ML as necessary (suggestions are welcome).

jplorre

2024/02/02 11:09:26

Perhaps we may distinguish between AI Systems (that may or may not include ML components) and AI models (cf. AI Act). This would make it easier to spell out the rules for each category.

2024/02/02 11:24:38

- Model architecture

We should clarify what the term architecture refers to: type and organisation of the network, but perhaps also for example the type of embedding and other information needed to understand the model properly.

2024/02/02 11:26:33

data

Such as data used to train the tokenizer

Aspie96

2024/02/06 02:27:23

We support the efforts to discuss

This is better than the earlier version, but I still think it's problematic. I really don't think that the document which provides the *definition* of open source AI should "support" anything. A definition is something everyone should be able to reference, regardless of where they stand on any issue, as a shared standard for clarity in communications. OSI can support things, but it doesn't have to do it in documents giving a definition which hopefully will be widely adopted.

2024/02/06 02:31:24

The benefits can be summarized as autonomy, transparency, and collaborative improvement.

I think the benefits of open source go well beyond these three. I'm not arguing all benefits should be listed, of course, just trying to prompt a discussion on which ones should be, if any.

2024/02/06 02:37:11

legal terms

But legal terms are not enough, are they? That is a necessary condition, but surely not a sufficient condition. If I encode my trained model in a proprietary format which can only be read using my proprietary program, I may very well legally allow you to use, study, modify and share the model how you wish, and maybe even the program itself, but you won't be able to in practice. And, unlike the inherent difficulty in modifying certain AI systems, this would be entirely deliberate and avoidable. Also consider a rule-based (not ML-based) symbolic AI system, written in code and distributed as binary, but under an open source license. It would clearly not be open source software, but it would be an open source AI system under this definition. Wording this aspect correctly, so that all kinds of AI systems can be made to qualify as "open source", but only if they aren't arbitrarily restricted through technical means, is hard. It's hard, but absolutely necessary.

2024/02/06 02:49:15

Data All data sets, including:

I think data being available is really important, but considering it as a requirement for a system to qualify as "open source" would be unwise. First, the availability of *anything* under proprietary licenses shouldn't make something closer to an "open source" status. The logical consequence might seem to require all data to be open source, but that would mean plenty of systems which are already commonly referred to as "open source" wouldn't be. Furthermore, the training data and the trained model are two different, separate assets and the model does not necessarily contain much information which is specific to the training data, as the learned features can be much more general. Whether the model is open source, therefore, should be orthogonal to whether the training data is open source, which I think is also consistent with the OSD.

2024/02/06 02:50:31

Instead of requiring training data to be available for a model to qualify as "open source", a different, higher standard may be needed, to describe a model which is open source AND trained on fully open source data AND well documented.

Shuji Sado (佐渡秀治)

2024/02/06 03:46:49

licenses

In accordance with the current wording of this document, it is preferable to use the term "legal documents" instead of "licenses".

pchestek

2024/02/09 02:24:59

Other

I think written content goes beyond what open source licenses require. So you're saying that even a research paper has to be freely copied, etc.?

2024/02/17 06:18:44

Open Source has demonstrated

"Open Source, as defined by the Open Source Initiative, has demonstrated that ..." By modifying the beginning of the preamble as above, there is no room for an expansive interpretation of the meaning of the term Open Source.

2024/02/17 07:40:58

I withdraw this suggestion. The phrase "that adhere to the Open Source Definition" in the next sentence is sufficient.

2024/02/17 14:52:15

Open Source AI

For the current article, the title "What is Open Source AI **system**" appears more appropriate. It might be better to state in the article that Open Source AI refers to AI systems.

