```
Note - link to this document: https://hackmd.io/@onefact/digital-public-goods-alliance-feedback
```

# Feedback on Sections

* [AT] Add more detailed suggestions for questions per indicator that will be added to the application form for DPG recognition for the AI category, specifically for indicator 7 (privacy and other applicable laws) and indicator 9 (do no harm).
* [IM, LG] **The following feedback takes the form of suggested additions** to other standard questions, such as compliance with data protection legislation, which are already part of the DPG application and agnostic to the DPG category:

## Data collection

- Was informed consent obtained if the AI system was trained on user data? If not, can you justify collecting user data for training and testing the AI system without prior consent as "legitimate interest", where such a provision exists under applicable privacy law?
- Is the data collection minimally necessary to ensure sufficient performance of the AI system?
- How are sensitive data, e.g., personally identifiable information (PII) or data on vulnerable populations, anonymized if used for training and testing the AI system?

## Data generation

- Does the AI system profile individuals or infer sensitive information, such as PII or data on vulnerable populations, e.g., to customize specific information, products, or services?
- Are there mechanisms for users to exercise their rights to complain about, seek redress for, or correct such data?

## Security

- What steps are taken to respect, protect, and promote privacy, including preventing unauthorized disclosure of the AI system's training and testing data, user data, or outputs?

## Safety

- Is there reasonable protection against harmful uses, e.g., the creation of malicious material?

## Transparency

> _Nota bene_: this will likely be covered under indicator 9.

- If users interact directly with the system, have they been informed that they are interacting with an AI system, not a human?
- Is AI-generated content clearly labeled as such?
### Transparency with regard to data collection

- Aspects such as data collection methods, provenance, and labeling practices will already be covered by the data information applicants must provide in the form of a data sheet or data card.
- If AI models are trained on scraped data (e.g., language models for under-resourced languages), there is a general challenge: as currently practiced, scraping is devoid of any privacy considerations (see, e.g., Common Crawl).
- We also see potential copyright challenges for training and testing data, which would need to be addressed under the second part of indicator 7 (compliance with other applicable laws); however, there is still considerable legal uncertainty in this area.