# Background
Last updated: 2023-09-25
URL for this page: https://hackmd.io/@investinopen/COIs-background
URL for site: https://hackmd.io/@investinopen/COIs
## A Note about Updates
This documentation is for the pilot project Catalog of Open Infrastructure Services, released January 2022. This documentation is no longer being updated, but is preserved to accompany that pilot project. We are looking forward to releasing a new version of this tool in early 2024. [Sign up for our newsletter](https://share.investinopen.org/newsletter) to learn about the latest work.
## Motivation for Creating COIs
IOI was founded to help increase adoption of, and investment in, the open infrastructure needed to drive equitable access to and participation in research. Central to that mission is our aim to provide targeted, evidence-based guidance that helps institutions and funders of open infrastructure make better-informed decisions about where to invest. This includes data on the usability, durability, and costs associated with our technology choices, as well as information to advance best practice and community alignment around governance, transparency, and sustainability.
We firmly believe that making open infrastructure a competitive and reliable investment choice for institutions requires a better understanding of its underlying costs, its economics, and the key dimensions that guide decision-making. Data on current investment in the sector, whether from external funders, institutions, or the projects themselves, is at best disaggregated and at worst incomplete.
Through our [Future of Open Scholarship](https://investinopen.org/research/future-of-open-scholarship/) research and [focus groups with decision makers](https://investinopen.org/blog/the-costs-of-open-infrastructure-conversations-with-providers-funders-and-institutions/) conducted specifically for this work, we heard from more than 120 members of the research community that these decisions need more information, presented in an accessible, standardized, and coordinated form. We also heard that the time spent individually conducting due diligence on new tools varied greatly depending on external budget pressures, decision-making structures, and urgency. This often led institutions to adopt commercial offerings for the sake of expediency and efficiency in meeting the needs of the research community.
COIs is designed as a resource for funders, users, and other interested stakeholders looking to make informed decisions about the open infrastructure services available for research and scholarship. We want to support the ongoing conversations about the financing, governance, and administration of open infrastructure by advancing our mutual understanding of the key issues involved in providing these services, and by helping to build an ecosystem of reliable, resilient, and community-driven infrastructure that supports a global community of researchers, scholars, and other users. To achieve these goals, we are also proud to include new assessment criteria for open infrastructure that go beyond typical market-driven indicators.
## Aspiration for COIs
The Catalog of Open Infrastructure Services (COIs) is a step towards addressing the information asymmetries that exist in understanding and assessing open infrastructure projects. This effort is designed to model a means of standardizing information about core open infrastructure services for decision makers and members of the community.
Moving forward, we aspire to expand the number of services represented in the catalog and continue to iterate on the information included to increase confidence in decision making.
## Project History
In **August 2021**, IOI staff reviewed various resources (listed below) that catalog open infrastructure services. Based on the available information, we selected 10 services intended to represent the diversity of services in terms of function, structure, location, and other key aspects. We described the selection of these initial 10 services in [this blog post](https://investinopen.org/blog/costs-characteristics-oi-providers/). We had limited time, and only a small group of people made this decision (we were a staff of three at the time). We therefore understood that the selection was arbitrary and subjective, but we were optimizing within our constraints of time, effort, and other resources. Our intention was to build a prototype that we could expand to be more comprehensive and inclusive once we had validated the value of this effort.
As part of our exploration into the hidden costs of open infrastructure, we arranged interviews with representatives of these 10 services. We also conducted a pre-interview survey that collected information on finances, governance, operations, community engagement, and other topics we intended to explore in the interviews. The interviews were conducted between **October 11 and October 28, 2021**, and the materials used are available in [our GitHub repository](https://github.com/investinopen/hidden_costs_materials). Of the 10 services, 9 responded to our [pre-interview survey](https://github.com/investinopen/hidden_costs_materials/blob/main/materials/pre-interview-questionnaire.md).
We had limited time and resources and wanted to build a prototype to validate the use case for this information, so we intentionally limited the scope of our investigation to readily available information. Even so, we strove to describe not just the service provided but also the organization providing it, including its governance, operations, financing, and other key aspects. We also sought to create a basic evaluation of each service's transformative influence and approach to community engagement. This evaluative framework evolved as we worked with the listed services to understand their operations and how they interpreted the key criteria we presented.
In **early November 2021**, we began working with [a designer](http://lady.graphics/) on an initial mockup of the application based on our initial research into [funding of infrastructure services](https://investinopen.org/blog/funding-open-infrastructure-overview-of/) and [the operational needs of service providers](https://investinopen.org/blog/the-costs-of-open-infrastructure-conversations-with-providers-funders-and-institutions/).
At the **end of November 2021**, we started circulating a spreadsheet of the information we had collected about the 10 services to the representatives we had been in touch with for our interviews. Their feedback helped us improve our work considerably. We identified some GDPR concerns and took steps to minimize personal information while we sought legal advice on our existing privacy policy. We also clarified our criteria for transformative influence and community engagement, and we corrected inadvertent data errors.
We finalized the data in **mid-December 2021** and began loading it into the AWS data store that supports the web application. We revised the application design and finalized the data model in **early January 2022**, and [announced the release of COIs on January 7](https://twitter.com/InvestInOpen/status/1479538930898907143?s=20&t=Cepb_HI3BNf7V-nChhSQJA).
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">It’s launch day at IOI! ✨ <br><br>We’re excited to share the first look at the Catalog of Open Infrastructure Services (COIs). <br><br>This resource is designed to provide comprehensive, consistent, + actionable info to guide <a href="https://twitter.com/hashtag/openinfra?src=hash&ref_src=twsrc%5Etfw">#openinfra</a> investment + adoption. <a href="https://t.co/znStiukSEI">https://t.co/znStiukSEI</a> (1/n)</p>— Invest in Open Infrastructure (@InvestInOpen) <a href="https://twitter.com/InvestInOpen/status/1479538930898907143?ref_src=twsrc%5Etfw">January 7, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
On **January 12, 2022**, we held two [COIs Informational Sessions](https://investinopen.org/blog/cois-info-sessions-recap/). These sessions gave an overview of COIs and gathered feedback from participants on its value, and participants particularly highlighted the value of the governance information. We also heard from services that wished to be included in COIs. In February we held [a strategy retreat](https://investinopen.org/blog/strategy-retreat-recap-key-learnings-and-resources/) and worked on the strategic alignment of our work. Then, on **May 9, 2022**, we released [a call to help guide the building of a strategy](https://investinopen.org/blog/next-steps-for-the-catalog-of-open-infrastructure-services-cois/) for the future of COIs.
The COIs project was deemed complete in **September 2023** to make way for a new tool shaped by the feedback and lessons learned from COIs.
## Resources Consulted
* [Mapping the Scholarly Communication Landscape 2019 Census](https://educopia.org/2019-Census/) and [bibliographic scan](https://educopia.org/mapping-the-scholarly-communication-landscape-bibliographic-scan/)
* [Scholarly Communication Technology Catalogue (SComCAT)](https://www.scomcat.net/)
* [Open Access Publishing Tools](https://radicaloa.disruptivemedia.org.uk/resources/publishing-tools/) from the [Radical Open Access Collective](https://radicaloa.disruptivemedia.org.uk/)
* [400+ Tools and Innovations in Scholarly Communication](https://docs.google.com/spreadsheets/d/1KUMSeq_Pzp4KveZ7pb5rddcssk1XBTiLHniD0d3nDqo/edit#gid=0) compiled by [Jeroen Bosman and Bianca Kramer](https://101innovations.wordpress.com/) of Utrecht University Library
## Initial Services Selected
The catalog initially included the following 10 open infrastructure services:
* Crossref's Metadata Retrieval (Crossref)
* The DOI® System (International DOI Foundation)
* DSpace (LYRASIS)
* Jupyter Notebook (Project Jupyter)
* Mukurtu (Washington State University)
* ORCID
* Open Journal Systems (Public Knowledge Project)
* OSF Preprints (Center for Open Science)
* SciELO
* Zenodo (CERN)
These 10 services were selected based on a range of service-specific criteria, such as the type of service provided, the organizational status of the service provider, and the availability and accessibility of funding information. We also considered the diversity of scholarly practices represented and each provider's demonstrated intention and ability to create change towards our vision of an equitable, just, and accessible infrastructure for all. We have documented this selection process, including key criteria, in more detail in [this blog post](https://investinopen.org/blog/costs-characteristics-oi-providers/).
## Data Sources
In describing and evaluating the open infrastructure projects included in this catalog, we’ve applied the basic principles of non-profit evaluation and assessment.
The information used in this catalog comes from public sources, including service provider websites and the grant databases of their funders. For funding data, we included only funding that could be verified from both the funder and the provider. We’ve not included information from funders who don’t publicly disclose the programs they fund or the amounts they provide. This group includes many private companies and some private foundations. In addition, we collected information from public taxing authorities for providers located in jurisdictions that make this information publicly available and that are subject to the reporting requirements.
The projects selected represent a small subset of a more comprehensive list we’ve been compiling to refine and analyze. To start, we pulled project and infrastructure provider lists from the [Mapping the Scholarly Communication Landscape 2019 Census](https://educopia.org/2019-Census/) and [bibliographic scan](https://educopia.org/mapping-the-scholarly-communication-landscape-bibliographic-scan/), the [Scholarly Communication Technology Catalogue (SComCAT)](https://www.scomcat.net/), the [list of Open Access Publishing Tools](https://radicaloa.disruptivemedia.org.uk/resources/publishing-tools/) from the [Radical Open Access Collective](https://radicaloa.disruptivemedia.org.uk/), and the [400+ Tools and Innovations in Scholarly Communication](https://docs.google.com/spreadsheets/d/1KUMSeq_Pzp4KveZ7pb5rddcssk1XBTiLHniD0d3nDqo/edit#gid=0) compiled by [Jeroen Bosman and Bianca Kramer](https://101innovations.wordpress.com/) of Utrecht University Library. We are tremendously grateful to these colleagues (and others not listed here) for their foundational work.
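The verification rule above (keep a grant only if both the funder and the provider report it) can be illustrated with a small sketch. The `Grant` fields and the matching rule are assumptions for illustration only. The rule matches on normalized funder name, recipient name, year, and amount. The catalog data itself was verified manually, not with this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One grant record; these fields are hypothetical, not the COIs schema."""
    funder: str
    recipient: str
    year: int
    amount_usd: int


def _key(grant: Grant) -> tuple:
    # Lowercase and collapse whitespace so trivial formatting differences
    # do not stop two reports of the same grant from matching.
    return (
        " ".join(grant.funder.lower().split()),
        " ".join(grant.recipient.lower().split()),
        grant.year,
        grant.amount_usd,
    )


def verified_grants(funder_reported: list, provider_reported: list) -> list:
    """Keep only funder-reported grants that the provider also reports."""
    provider_keys = {_key(g) for g in provider_reported}
    return [g for g in funder_reported if _key(g) in provider_keys]
```

A grant that appears only in a funder's database (or only on a provider's website) drops out, which mirrors the conservative choice described above. A real matching process would likely also need to handle currency conversion and grants split across fiscal years.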
We also conducted a research process involving a survey and 1-on-1 interviews with service providers to collect additional information and gain further insight into this work. Some of that information is included in this catalog. All information on providers was shared with the providers for their review and input on the data provided. For more information on the publicly available data collected for this work, see [this blog post](https://investinopen.org/blog/funding-open-infrastructure-a-survey-of-available-data-sources/). For more information on our research work, see [this blog post](https://investinopen.org/blog/the-costs-of-open-infrastructure-conversations-with-providers-funders-and-institutions/).
While we’ve made every effort to verify the data and review for errors, we can’t guarantee the accuracy of the data presented. These project pages will be reviewed annually for necessary revisions. Errors, omissions, or changes can be addressed by emailing us at [catalog@investinopen.org](mailto:catalog@investinopen.org). COIs is subject to IOI’s [Privacy Policy](https://investinopen.org/ioi-privacy-policy/).
## Technology Used
As advocates of open source and believers in its transformative value, IOI endeavors to use open solutions whenever possible. In line with this commitment, COIs is built largely with open source software, and its code is hosted in a GitHub repository, as is common practice in open source software development. While the repository is currently private, we intend to make the source code and data openly available pending a security review to identify any potential vulnerabilities or privacy concerns.
The following is a brief overview of the technology used to develop COIs. As development of COIs is ongoing, we will update this documentation at key milestones to reflect the application's current technology and operating structure.
### Data Collection and Storage
As outlined above, the data collection for COIs was done manually. We searched publicly available data sources and added the information to a set of internal Google Sheets. We tracked organization details, grant funding, and other key details.
In creating the initial prototype of COIs, we collated the available data for the initial set of service providers into a master Google Sheet. This information was reviewed and then loaded into a database running [PostgreSQL](https://en.wikipedia.org/wiki/PostgreSQL), an open source relational database management system (RDBMS). The database was hosted on the Amazon Web Services (AWS) [Relational Database Service (RDS)](https://en.wikipedia.org/wiki/Amazon_Relational_Database_Service). The initial data model was simple, built solely to power the web application we were developing to visualize the data.
We are currently migrating the data store to a self-managed PostgreSQL RDBMS instance hosted on an AWS [Elastic Compute Cloud (EC2)](https://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud) virtual computing instance running the open source [Linux](https://en.wikipedia.org/wiki/Linux_kernel)-based [Ubuntu operating system](https://en.wikipedia.org/wiki/Ubuntu). This increases our maintenance requirements. In exchange, managing the PostgreSQL instance ourselves lets us connect to external data sources using [foreign data wrappers (FDW)](https://wiki.postgresql.org/wiki/Foreign_data_wrappers) that are not available on the AWS-managed RDS instance we originally used.
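The step from a reviewed Google Sheet to a relational table might look roughly like the sketch below. The CSV columns and the `grants` table are hypothetical, not the actual COIs data model. So that the sketch runs without a database server, it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL. Against PostgreSQL you would use a driver such as psycopg, whose query placeholders are `%s` rather than `?`.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export ("File > Download > CSV") of one sheet tab;
# the column names are illustrative, not the actual COIs sheet layout.
SHEET_CSV = """funder,recipient,year,amount_usd
Example Foundation,Example Service,2020,50000
Another Funder,Example Service,2021,"12,500"
"""


def load_grants(conn: sqlite3.Connection, csv_text: str) -> int:
    """Create the grants table if needed, load rows from a CSV export,
    and return the number of rows inserted."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS grants (
               funder     TEXT    NOT NULL,
               recipient  TEXT    NOT NULL,
               year       INTEGER NOT NULL,
               amount_usd INTEGER NOT NULL
           )"""
    )
    rows = [
        (
            r["funder"].strip(),
            r["recipient"].strip(),
            int(r["year"]),
            # Spreadsheet exports often keep thousands separators.
            int(r["amount_usd"].replace(",", "")),
        )
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    # Parameterized insert: the driver quotes each value safely.
    conn.executemany("INSERT INTO grants VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

For example, `load_grants(sqlite3.connect(":memory:"), SHEET_CSV)` loads two rows, after which `SELECT SUM(amount_usd) FROM grants` totals the funding recorded for the sample service. Converting types at load time keeps the spreadsheet's text formatting out of the database.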
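As one illustration of what a foreign data wrapper connection involves, the sketch below generates the standard DDL for `postgres_fdw`. `postgres_fdw` is the wrapper that ships with PostgreSQL for querying another PostgreSQL database. The source does not say which wrappers or data sources COIs uses. The server name, host, credentials, and schema names here are all hypothetical.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def postgres_fdw_setup(server: str, host: str, port: int, dbname: str,
                       remote_user: str, password: str,
                       remote_schema: str, local_schema: str) -> list:
    """Return DDL statements that expose a remote schema via postgres_fdw."""
    srv = quote_ident(server)
    return [
        "CREATE EXTENSION IF NOT EXISTS postgres_fdw;",
        # Describes where the remote database lives.
        f"CREATE SERVER {srv} FOREIGN DATA WRAPPER postgres_fdw "
        f"OPTIONS (host {quote_literal(host)}, port {quote_literal(str(port))}, "
        f"dbname {quote_literal(dbname)});",
        # Maps the role running these statements to a remote login.
        # In practice, supply the password from a secret store, not source code.
        f"CREATE USER MAPPING FOR CURRENT_USER SERVER {srv} "
        f"OPTIONS (user {quote_literal(remote_user)}, password {quote_literal(password)});",
        f"CREATE SCHEMA IF NOT EXISTS {quote_ident(local_schema)};",
        # Creates a local foreign table for each table in the remote schema.
        f"IMPORT FOREIGN SCHEMA {quote_ident(remote_schema)} "
        f"FROM SERVER {srv} INTO {quote_ident(local_schema)};",
    ]
```

Once these statements have been run (for example with `psql`), the imported foreign tables can be joined against local catalog tables in ordinary SQL queries. That capability is the kind of external-data connection described above.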
### Data Visualization and Hosting
We had preliminary visualizations of the funding amounts and financial data in our initial Google Sheets. We shared these with an application developer and data visualization expert to help them design and build the web application powering COIs. They produced mockups of the data visualizations until we settled on a final design and layout for COIs.
After we finalized the design, the application developer created a web application using the open source [Django framework](https://en.wikipedia.org/wiki/Django_(web_framework)), which is built on the open source [Python programming language](https://en.wikipedia.org/wiki/Python_(programming_language)). Although the application is based on Python, it also incorporates the open source [D3.js JavaScript data visualization library](https://en.wikipedia.org/wiki/D3.js) to generate some of its visual elements.
The COIs web application is hosted on a self-managed AWS [Elastic Compute Cloud (EC2)](https://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud) virtual computing instance running on the open source [Linux](https://en.wikipedia.org/wiki/Linux_kernel)-based [Ubuntu operating system](https://en.wikipedia.org/wiki/Ubuntu). This virtual instance acts as a [web server](https://en.wikipedia.org/wiki/Web_server), handling incoming requests to the `catalog.investinopen.org` subdomain and presenting the requested data to users.
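To make "handling incoming requests to the subdomain" concrete, here is a minimal WSGI application that answers only requests whose `Host` header names `catalog.investinopen.org`. WSGI is the Python interface between web servers and applications that Django also implements. This is an illustrative sketch, not the COIs deployment. Django performs a similar check through its `ALLOWED_HOSTS` setting, and a production setup would normally place an application server and a web server in front of the app.

```python
# Hypothetical allow-list mirroring the subdomain named above.
ALLOWED_HOSTS = {"catalog.investinopen.org"}


def app(environ, start_response):
    """Minimal WSGI app that only answers requests for allowed hosts."""
    # HTTP_HOST may include a port, e.g. "catalog.investinopen.org:8000".
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    if host not in ALLOWED_HOSTS:
        body = b"Unknown host\n"
        start_response("400 Bad Request", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    # A real application would route the path to a view and render data here.
    path = environ.get("PATH_INFO", "/")
    body = f"Requested path: {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For local experimentation, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` from the standard library will serve it. You would then need to send a matching `Host` header, because requests to `localhost` are rejected by design.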
---
## See also
* Archived copy of this page (via the Internet Archive's Wayback Machine): https://web.archive.org/web/*/https://hackmd.io/@investinopen/COIs-background
---
This page first published: 2022-05-08
###### tags: `cois-documentation`
---
<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />This work is made available under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>. Users are free to share, remix, and adapt this work. (Please attribute [Invest in Open Infrastructure](https://investinopen.org/) in any derivative work).