# Answers to frequently asked questions

## What's the difference between the Open Source Definition and the Open Source AI Definition?

FIXME

## Why is the original training dataset not required?

The short answer is that in too many cases sharing a dataset is illegal or technically impossible.

To better understand whether the training dataset is required to use, study, share and modify an AI system, we set up working groups to analyze four specific systems and report which components were required. The groups came back saying that detailed information about the datasets was just as important, so the Definition requires `data information` instead.

## Why is there no mention of safety and risk limitations in the Open Source AI Definition?

The Open Source AI Definition doesn’t say how to develop and deploy an AI system that is ethical, trustworthy or responsible, although it doesn’t prevent it. The efforts to discuss the responsible development, deployment and use of AI systems, including through appropriate government regulation, are a separate conversation. A good starting point is the OECD's Recommendation of the Council on Artificial Intelligence, [Section 1: Principles for responsible stewardship of trustworthy AI](https://legalinstruments.oecd.org/en/instruments/oecd-legal-0449).

## Why is the grant of freedoms to its users?

We believe that the users of AI systems are the ones who need to be able to use, study, share and modify them, whether on-prem or remotely via API. The OECD identifies two kinds of users:

* human engineer (who develops the AI system); and
* [human] user (interacting with AI through a prompt).

## What are the model parameters?

FIXME

In neural networks, parameters are the weights and biases of a model.

## Are model parameters copyrightable?

Granting only a copyright license for an AI model's parameters may not be enough to assure all the necessary freedoms.
There are a lot of opinions about whether model parameters are protected by any rights regime at all and, if they are, by which one. This is why the Open Source AI Definition says, for most elements, that they must be *“available under OSD-compliant license”* but for the parameters it says *“available under OSD-conformant terms.”* It's still not clear whether the parameters are protectable by some other regime (contract, database rights, or perhaps newly created rights). If they are, a grant of a copyright license alone isn’t going to ensure that the model is fully available in the way an Open Source software license requires.

## What does `Available under OSD-compliant license` mean?

`OSD-compliant` (or `OSD-compatible`) means "compliant with the principles listed in the Open Source Definition (OSD)". This is the best term we could come up with to describe the licenses used for most documentation and datasets, like the Creative Commons Attribution and Attribution-Share-Alike, with or without the Non-Derivative option and dedications to public domain. The Creative Commons licenses that use the Non-Commercial clause aren't OSD-compliant because field-of-use restrictions are incompatible with the OSD. If you encounter other licenses for documentation in this exercise, let's review them on a case-by-case basis.

## What does `Available under OSD-conformant terms` mean?

Most models we analyzed during the validation phase use MIT/BSD-like licenses, the Apache License 2.0 or a variation of RAIL. MIT/BSD and Apache would be 'conformant' to the OSD because they don't impose restrictions on use. RAIL wouldn't be conformant because that family of licenses introduces field-of-use restrictions. The Llama 2 and Llama 3 licenses are also not conformant. The Apple sample code license used by OpenELM (we're not evaluating this at the moment) is an unusual one that would need a full license review.

## Why does the Open Source AI Definition include a list of components while the Open Source Definition for software doesn't say anything about documentation, roadmap and other useful things?

The OSI uses the Open Source Definition to review licenses, not software packages. We’ve been working under the axiom that if a program is shipped with an OSI Approved License® then the software is considered Open Source. In the software space that’s generally understood and mostly works fine (although it’s challenged at times).

For machine learning systems the OSI can’t simply review licenses, because the concept of the “program” in this case is not just the source/binary code. Through the co-design process of the Open Source AI Definition we learned that to use, study, share and modify an ML system one needs a complex combination of multiple components, each following diverse legal regimes (not just copyright and patents). Therefore we must describe in more detail what is required to grant users the agency and control expected.

## Why is the "Preferred form to make modifications" limited to machine learning?

Because algorithmic AI is software written by humans, it can be evaluated with the existing framework for evaluating Open Source software. Machine learning systems, by contrast, are more complex artifacts: they require training and produce parameters. These don't fall squarely within the Open Source Definition and therefore require this Open Source AI Definition to exist.