# FAIR principles and ModernStats standards

The idea would be to fill a double-entry table between [FAIR principles](https://www.go-fair.org/fair-principles/) and high-level components of [ModernStats standards](https://unece.org/statistics/modernization-official-statistics), for example GSBPM/GAMSO, GSIM and CSDA. We could start at a very coarse-grained level of detail and then drill down in the areas that seem interesting.

If we take the GSBPM as an example, we can start by crossing phases and overarching activities on one side with FAIR principles on the other. That would mean filling in the following table:

|     | F | A | I | R |
|-----|---|---|---|---|
| 1   |   |   |   |   |
| 2   |   |   |   |   |
| 3   |   |   |   |   |
| 4   |   |   |   |   |
| 5   |   |   | X | X |
| 6   |   |   |   |   |
| 7   |   |   |   |   |
| 8   |   |   |   |   |
| 0.1 |   |   |   |   |
| 0.2 |   |   | X | X |
| 0.3 |   |   | X | X |

Each cell would contain an indicator of how relevant the phase in the row is for the principle in the column, for example according to a three-grade color scale. For the cells with the highest grade, we would zoom in at a greater level of detail, e.g.:

|     | F1 | F2 | F3 | F4 |
|-----|----|----|----|----|
| 2.1 |    |    |    |    |
| 2.2 |    |    |    |    |
| 2.3 |    |    |    |    |
| 2.4 |    |    |    |    |
| 2.5 |    |    |    |    |
| 2.6 |    |    |    |    |

I would suggest starting with the GSBPM, GAMSO, CSDA (top-level, then second-level capabilities) and GSIM (the five top-level groups to start with).

Flavio's thoughts
---

Looking at the GSBPM grids, I think most if not all principles might apply to all phases. No matter what we do, (meta)data needs to be easy to find and access. Likely not the same (meta)data entities in all phases, but there is always some (meta)data that needs to be found and accessed. In contrast, some level of interoperability and reusability is desirable rather than required.
For these two, it's probably just a matter of degree: you can do statistical production with minimal interoperability and reusability, but the process is going to be less efficient and streamlined. This makes me think that satisfying some FAIR principles (Findable and Accessible) makes the execution of some GSBPM phases possible, whereas others (Interoperable and Reusable) just affect the effectiveness/efficiency of the execution. Functional vs. non-functional requirements perhaps? Not sure...

GSIM might be a bit different. The Business group seems to benefit from the FAIR principles like the GSBPM, and to some extent Exchange too, but the other groups are different. How do we talk about the impact of FAIR on Concepts or Structures?

I've also been trying to think about how GSBPM can help realize the FAIR principles, and there might be more to say there (see the X's in the table above). Overarching processes, e.g. Metadata and Data Management, help realize Reusable and Interoperable, but I'm not so sure about Findable and Accessible. (Interestingly, there is no mention of quality in FAIR...) I can also think of some sub-processes of Process, e.g. Classify and Code, Edit and Impute, and Derive new Variables and Units, that can benefit Reusable and Interoperable to some extent.

CSDA can also provide some insights, since Data Transformation and Data Integration seem to provide some support for Interoperable and Reusable, whereas Information Sharing could do that for Findable and Accessible...

# Arofan's Thoughts

There is another perspective here which we want to consider, and this may address some of Flavio's questions about definitional/foundational metadata. If official microdata (as well as aggregates) are going to be useful to researchers, then there needs to be a lot of clarity and general FAIRness in play.
The definitions of concepts are often not very rich from a conceptual perspective *as disseminated*, even though data producers may have good definitions in-house during production. Cross-domain FAIR requires complete contextualization of definitions: consider the case where a meteorologist wants to use some data about population density. They may have a limited understanding of the demographic concepts, and might benefit from links to formal definitions, examples, etc.

Another key ingredient here is mappings between official statistical classifications. Researchers are often performing data harmonization/integration, and could benefit from having "official" cross-walks. This is a need which CDIF does not (at this point) address, but which the stats agencies have the expertise to provide. Researchers often do not have that expertise, since they are not experts in classification management.

Other aspects of FAIR also become more important when we consider external users in the research world. If standards such as Schema.org and DCAT are being used in a common way as part of CDIF, then these should also be followed by the official statistics world in their dissemination.

# InKyung's thoughts

This idea of mapping between FAIR principles and GSBPM reminds me of the approach that GeoGSBPM (https://statswiki.unece.org/display/GSBPM/GeoGSBPM) took. To provide a bit of background: a few years ago, the geo-statistical community developed the Global Statistical Geospatial Framework (GSGF), which lists 5 principles for the production of harmonised and standardised geospatially enabled data, such as the use of fundamental geospatial infrastructure and geocoding (Principle 1) and common geographies for dissemination (Principle 3).
GeoGSBPM took GSBPM as a tool to ensure that these 5 principles are followed throughout the entire production process. It extends GSBPM by adding actions and considerations that should be taken to make sure that the final statistics are in line with the GSGF principles. For example, the choice of "common" geographies (Principle 3) should not be made at dissemination time; it should already be taken into account much earlier, in the Design phase, or better yet, discussed with users in the Specify Needs phase.

Going back to FAIR, I think looking at GSBPM as a tool to ensure FAIR principles could be useful. If we take the same approach as GeoGSBPM of going through each sub-process (or phase) principle by principle, we might end up with repetitive and generic statements. Perhaps we can instead focus on a few sub-processes that could be most relevant to FAIR principles, like the examples from Arofan (e.g., how the concept definition should be carried through the production process: when discussing with users, when defining at the conceptual level, when deriving variables; and how the relevant referential metadata should be documented accordingly).

# More thoughts (Flavio)

I like the idea of the GSBPM extension, since both Arofan's and InKyung's examples seem to point that way. Along those lines, CSDA might also provide more detailed guidance, for instance in the Data Integration capability, which covers crosswalks and mappings across concepts, variables and physical formats (these appear to some extent in GSBPM Integrate Data) and seems to be closely related to FAIR Interoperable:

- Conceptual Alignment: Map (or harmonize) concepts between multiple conceptual frameworks, taxonomies, ontologies and data/metadata exchange standards to support semantic integration.
- Variable Reconciliation: Map (or harmonize) variables across datasets to improve comparability and linkage.
- Format Standardization: Modify physical representations and data types to conform to standards and to best support readily shareable information.

It should be easy to apply these to the classification concordance use case, and also to a variable harmonization use case.

And also the Data Sharing capability:

- Search and Exploration: Provide support for authorized internal and external users and processes in finding, ranking and browsing information sets based on their affinity to a set of target concepts or keywords.
- Disclosure Control: Obfuscate, anonymize and/or redact information deemed sensitive by security and privacy policies by applying data suppression, perturbation, summarization or other techniques to ensure the appropriate level of confidentiality while preserving the usefulness of the data outputs to the greatest extent possible.
- Publication: Make information sets available, either immediately or at a specific future date and time, to authorized consumers, both internal and external, via a range of (output) Exchange Channels.

Data Sharing, together with Metadata Management (which appears in both GSBPM and CSDA), applies across all GSBPM phases every time data needs to be exchanged and concepts need to be used, which is across all of the Process phase and part of the Analyse phase, at least. These two capabilities seem to me closely related to FAIR Findable and Accessible.
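
To make the classification concordance use case mentioned under Data Integration a bit more concrete, here is a minimal Python sketch of what applying a crosswalk could look like. Everything in it is hypothetical and only for illustration: the two classifications, their codes, and the names `CONCORDANCE`, `build_crosswalk` and `recode` are assumptions, not taken from any ModernStats standard. A real concordance would come from an agency's classification management system.

```python
from collections import defaultdict

# Hypothetical concordance between two made-up classifications.
# Each (source_code, target_code) pair is one correspondence; codes are illustrative only.
CONCORDANCE = [
    ("A1", "X10"),
    ("A2", "X10"),  # many-to-one: two source codes merge into one target code
    ("B1", "Y20"),
    ("B1", "Y21"),  # one-to-many: the source code splits, so recoding is ambiguous
]


def build_crosswalk(pairs):
    """Index the correspondence pairs by source code."""
    crosswalk = defaultdict(list)
    for source, target in pairs:
        crosswalk[source].append(target)
    return dict(crosswalk)


def recode(codes, crosswalk):
    """Recode source codes and return (recoded, unresolved).

    A code is recoded only when it maps to exactly one target code.
    Unmapped or ambiguous codes are reported instead of being guessed.
    """
    recoded, unresolved = [], []
    for code in codes:
        targets = crosswalk.get(code, [])
        if len(targets) == 1:
            recoded.append(targets[0])
        else:
            recoded.append(None)
            unresolved.append(code)
    return recoded, unresolved


crosswalk = build_crosswalk(CONCORDANCE)
recoded, unresolved = recode(["A1", "A2", "B1", "C9"], crosswalk)
print(recoded)     # ['X10', 'X10', None, None]
print(unresolved)  # ['B1', 'C9']
```

The sketch deliberately reports one-to-many and missing correspondences rather than resolving them, since that is probably where an "official" crosswalk with documented rules would add the most value for researchers.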