# SONAR Final Report

05.02.2021

------

**Project goals**

------

The SONAR project had three major goals:

**Provide a showcase of Swiss scholarly publications**

- Collect, promote and preserve scholarly publications of authors affiliated with Swiss public research institutions

**Open Access growth and monitoring**

- Raise the Open Access (OA) coverage rate of existing Swiss institutional repositories and gather information about research output in Switzerland, and OA in particular

**IR hosting**

- Propose an institutional repository hosting platform, as an outsourced service for interested Swiss higher education institutions (HEI). Two distinct forms were considered for this service:
  - *Shared repository*: a multi-institutional portal, with a single common search and navigation interface, as well as common deposit policies.
  - *Dedicated repositories*: independent portals, each with its own distinct identity and policies, and other extended features.

The extent to which the proposed goals were achieved at the end of the project can be assessed from an analysis of the actual service implementation and of the deliverables. Both are presented below.

------

**Service implementation**

------

In accordance with the stated goals, the project considered setting up three kinds of service:

- Institutional Repository as a Service (IRaaS) / IR hosting
- Content tracking
- Open Access monitoring

The first (IRaaS) was the subject of a concrete implementation proposal when the project was submitted in 2018. The latter two, on the other hand, were considered prospective services, the realisation of which depended on the results of the most experimental components of the project.

**Institutional Repository as a Service (IRaaS) / IR hosting**

A public test instance of the service (*Shared repository* mode) was first deployed in September 2020 at [sonar.test.rero.ch](https://sonar.test.rero.ch), and several institutions were invited for a test drive.
At their request, institutional accounts were created for 18 of them, which actually tested the service (document deposit and validation, search, content presentation, configuration, SWITCH edu-ID login). Feedback was very positive in general and included several useful comments, which the team then used for improvements and corrections.

A production release of the publication platform was launched de facto in January 2021 at [sonar.ch](https://sonar.ch), which is now ready to receive the first clients. At the time of writing, a general public announcement is in preparation. Active contacts with client institutions are ongoing; a few of them are approaching the contract-signing phase, in one case including paid custom developments.

**Content tracking**

In May 2020, a first version of the content tracking pipeline was deployed. It automatically retrieves publications of researchers affiliated with Swiss publicly funded institutions from third-party international databases. A dataset of nearly 500,000 candidate bibliographic records and PDFs was collected from CrossRef/unpaywall and MEDLINE/PubMed Central.

On that occasion, the team requested the cooperation of Swiss IR managers, through the AKOA, in order to perform a comparative analysis between the SONAR dataset and their own records, and to provide qualitative feedback. We collected 9 answers from different Swiss IR managers, including from 3 universities. Based on their valuable feedback, several improvements were made to the pipeline. The team estimates that the service is currently able to increase the publication coverage of existing Swiss IRs by approximately 30%.

A draft service description exists for the tracking pipeline, although the technological readiness level of the service is still at an early stage. An API is available, but the modalities and costs of regular updates still need to be settled in discussion with potential users.
**OA Monitoring**

Retrieving publications from a) Swiss institutional repositories and b) international sources (see "Content tracking" above) provides the basic elements for SONAR's potential contribution to the implementation of OA monitoring in Switzerland. An actual OA monitoring service requires other elements beyond those mentioned, which do not depend on SONAR alone. A concrete follow-up project, including a larger number of participants, is under consideration in the framework of the Open Science program.

------

**Deliverable analysis**

------

The project plan included 19 deliverables, with highly varying degrees of importance and impact. The major deliverables include the **stabilized SONAR system in production** with the corresponding **user interface implemented** and the **data exchange processes implemented**; the latter with regard to both **external sources** and **national institutional repositories**, and including a deduplication procedure based on DOI. These major deliverables have all been completed to a large extent, with one exception, which is the harvesting of national IRs: currently, only some of the repositories have been configured and harvested, whereas the others are still being progressively integrated.

Overall, among the 19 deliverables, the completion status is:

- 13 completed
- 4 partially completed but ready for production, whose improvement will be pursued during the service phase
- 2 to be done: **metadata export process to ORCID** and **content preservation solution**

A detailed analysis of the state of all deliverables is presented below, split by Work Package (WP), according to the original project plan.

**WP1.
Business model and governance**

- `D1.1.` **Business model proposition for prospective services** [document] : **COMPLETED**
  [TODO: finalize the document and add a mention here]
- `D1.2.` **Governance structure scenarios** [document] : **COMPLETED**
  The governance information is included in D1.1.

**WP2. Main IT solution: tools and procedures**

- `D2.1.` **SONAR hardware and software basic installation ready for development** [system] : **COMPLETED**
- `D2.2.` **Procedures for creating and configuring additional “IR as a Service” instances** [tools and documentation] : **COMPLETED**
- `D2.3.` **Stabilized SONAR system in production** [system] : **COMPLETED**
  A production release of the publication platform was launched in January 2021 at [sonar.ch](https://sonar.ch), including XXXXX imported publications and ready to receive the first IRaaS client institutions.

**WP3. User interface**

- `D3.1.` **Mockups** [document] : **COMPLETED**
- `D3.2.` **User interface implemented** [system] : **PARTIALLY COMPLETED** / **ONGOING**
  A few features are still being finalized, namely persistent identifier assignment (ARK) and usage statistics. The publicly accessible Kanban board on GitHub (https://github.com/orgs/rero/projects/4) always shows the current state of development. This tool will remain in use during production operation by the SONAR service team at RERO.

**WP4. Interaction with Swiss IRs**

- `D4.1.` **Agreements with Swiss IRs representatives** [document] : **PARTIALLY COMPLETED** / **ONGOING**
  No formal agreements were required, as the data can be freely harvested from the corresponding repositories. The deliverable now consists of a description of the harvesting details for each IR, namely: OAI-PMH URL and sets, metadata field mapping, document type mapping.
- `D4.2.` **Data exchange process implemented** [system] : **PARTIALLY COMPLETED** / **ONGOING**
  Harvested IRs currently include BORIS, Archive ouverte UNIGE and RERO DOC.
The other IRs are being progressively added.

**WP5. Recovering of full-text from 3rd-party OA (Feasibility study)**

- `D5.1.` **Evaluation of acquisition accuracy** [document] : **COMPLETED**
  The publication acquisition pipeline (content tracking) has been set up according to plan. The service is currently able to increase the coverage of existing Swiss IRs by an estimated 30%.
- `D5.2.` **JSON/XML services** [system prototype] : **COMPLETED**
  The data is available at the URL http://candy.hesge.ch/SONAR/. It currently contains about half a million publications/DOIs, covering 78 different Swiss HEI and research institutions.

**WP6. Content dissemination**

- `D6.1.` **SONAR’s content ready to be indexed by academic search engines** [system] : **ONGOING**
  For this deliverable, the team focused on Google Scholar. Discussions between RERO and Google regarding the indexing of SONAR were initiated in June 2020; Google awaits a request to start the harvesting process. A prerequisite was to have the system in production, which is now the case, so RERO can resume the discussions. The actual indexing by Google Scholar is a very long process that can take several months after launch.
- `D6.2.` **Metadata export process to ORCID implemented** [system] : **TO BE DONE**
  The team has revised the usefulness of this deliverable downwards; as a result, it was assigned very low priority compared to other, more essential features. It has yet to be initiated.

**WP7. Transformation and standardisation of full-text contents**

- `D7.1.` **Reviews of formatting standards and resources for document representation** [document] : **COMPLETED**
  A brief state of the art has been prepared. The main conclusion is that JATS is currently the main standard for semantically rich document structuring, while BioC is a relevant complementary standard to support text mining applications.
- `D7.2.` **Extraction of full-text from PDF, XML and HTML** [system] : **COMPLETED**
  An extraction procedure has been set up, which leverages open source software, such as GROBID, and an authority list of author affiliations that was prepared in WP5.
- `D7.3.` **Metadata extraction from the full-text** [system] : **COMPLETED**
  The process includes the extraction of the following named entities: grant ID, grant agency and affiliations.

**WP8. Analytics**

- `D8.1.` **Report on analytics** [document] : **COMPLETED**
  [FHGR] The report was written based on a literature review, as well as the outcome of a project-course type seminar with 40 students during the winter semester 2020/21. A fairly clear recommendation for the project could be derived from the literature; though, again, the implementation of our recommendation will depend on a sustainable funding strategy. At the core of our recommendation is a collaboration with a parallel OA Monitor project in Germany, which is already very advanced. Politically, the ground for a collaboration has been prepared.

**WP9. Content preservation**

- `D9.1.` **Study of the requirements and selection of the solution** [document] : **COMPLETED**
  [FHGR] Based on a review of relevant literature, the options for long-term preservation of repository content were assessed and a clear recommendation given.
- `D9.2.` **Content preservation solution implemented** [system] : **ADAPTED** / **TO BE DONE**
  Based on `D9.1`, SONAR's service provider (RERO) considers that the only feasible approach for a long-term archiving solution is to rely on existing service providers (outsourcing). Its intention is to be able to deploy such an external solution and propose it as an option for interested clients.
------

**Project results overview**

------

Considering the original SONAR proposal, as presented to swissuniversities in 2018, the team evaluates the overall completion rate of the project as quite positive, even though not all deliverables have been fully completed. SONAR's most relevant achievements may represent a valuable contribution to the Swiss scientific information domain, namely:

- The *Swiss science showcase* stated in the original goals is now a reality: the [sonar.ch](https://sonar.ch) portal currently contains XXXXX open access publications. There is still room for improvement in the publication tracking pipeline, including coverage rate, regular harvesting updates and extension of the list of harvested Swiss IRs. In its current state, the collected dataset has the potential to increase the coverage rate of existing IRs by 30%. A close collaboration with IR managers will allow them to exploit this potential.
- A new, sustainable IR hosting service has been set up that is able to respond to the needs of HEI, operated by a service provider that is part of the community.

These achievements represent a major milestone and a building block upon which future developments can take place, together with the higher education community, including open access monitoring and other initiatives in the realm of the national OA strategy.
------

**Project Organisation**

------

**Team**

The following people participated in the project:

- HES-SO (HEG Genève): Julien Gobeill, Jeevanthi Liyanapathirana, Patrick Ruch, Anouk Santos
- FHGR (ex-HTW Chur): Gerhard Bissels, Karsten Schuldt
- RERO: Sébastien Délèze, Johnny Mariéthoz, Igor Milhit, Miguel Moreira, Nicolas Prongué
- USI: Silvio Bindella, Davide Dosi, Alessio Tutino

**Tools used for collaboration**

- GitHub: for the [source code](https://github.com/rero/sonar) of the publication platform and the [Kanban board](https://github.com/orgs/rero/projects/4) for the corresponding development process
- SWITCHdrive for file exchange
- Ryver for chat and high-level task management
- Whereby and Zoom for videoconferencing
- a mailing list, containing the email address of every team member

This set of tools was quite satisfactory, with the exception of Ryver, which was not agile enough and ended up being seldom used.

**Meetings**

- Plenary meetings: as planned at project start, quarterly meetings involving all team members took place (8 meetings in total). Those in 2019 took place in Bern, whereas in 2020 they were all held by videoconference due to the Covid-19 pandemic.
- Biweekly flash meetings: the team gathered once every two weeks on Tuesday morning at 10:15 for a 15-minute discussion (Scrum-like approach). These made it possible to discuss ongoing issues and current activity. This approach proved very effective.

------

**Communication**

------

The following official communication channels were used during the project. They will be kept in use in the future, during production operation, by the SONAR service team at RERO.
- Web site with blog [https://sonar.ch/post/](https://sonar.ch/post/) : 15 blog posts were published in the period 2019-2020
- Twitter account [@sonardotch](https://twitter.com/sonardotch) : 14 tweets in the period 2019-2020
- Newsletter (142 contacts)
- An announcement "Projet SONAR : Invitation à tester" was published in the P-5 program newsletter of October 2020

------

**Advisory bodies**

------

[TODO: short introduction]

**Technical Advisory Committee (TAC)**

[TODO: 2020 meeting]

**Scientific Advisory Board (SAB)**

[TODO: 2020 meeting]

------

**Attachments**

------

- Updated financial plan (already provided with previous annual reports)
- Project plan (already provided with previous annual reports)