---
tags: NDHL
---

# NDHL Pre-Study Final Report #

The report is drawn up in agreement between NeIC as the project owner, represented by Abdulrahman Azab, and the project manager Kristoffer L. Nielbo (KLN).

## Edition History ##

| Issue | Date | Comment | Author/Partner |
| - | - | - | - |
| 0.1 | 01-02-2022 | First draft by PM | KLN |
| 0.2 | 21-10-2022 | Second draft by PM | KLN |

## Abstract ##

## Table of Contents ##

---

[1 Basic Information](#basic-information)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[1.1 The Project](#the-project)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[1.2 Background](#background)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[1.3 Summary of NDHL activities](#summary-of-ndhl-activities)
[2 Achievement of Objectives](#achievement-of-objectives)
[3 Project Execution](#project-execution)
[4 Transferral of results](#transferral-of-results)
[5 Collected Experience](#collected-experience)
[6 Recommendations](#recommendations)
[References](#references)

## Basic Information ##

_Nordic Digital Humanities Laboratory_ (NDHL) explores the possibilities of jointly designing and implementing a compute and data infrastructure that can facilitate code and data sharing for highly heterogeneous and unstructured cultural heritage data across the Nordics. NDHL is a digital humanities initiative in the Nordic countries with the goal of providing said infrastructure for expert users initially and, in time, for the rest of the humanities and arts. The primary goals of the NDHL pre-study were community building and testing of possible Nordic infrastructure solutions.

### The Project ###

NDHL will create convergence in Nordic humanities and arts e-infrastructures through a participant-driven virtual laboratory for compute- and data-intensive research. NDHL is first and foremost a research-driven infrastructure initiative.
NDHL’s goals are to:

* Develop a common data, software and service stack at royal libraries and high performance computing centers across the Nordics
* Ensure joint access to restricted and copyrighted cultural heritage data
* Develop a sandbox environment that enables exploration of cultural heritage collections. While expert users develop and manage the infrastructure, it will be open to all researchers in the Nordics.

NDHL was initiated as a NeIC pre-study with the following objectives: develop prototype solutions for collaboration, identify stakeholders, map Nordic synergies, and build an open and inclusive community.

The partners of NDHL span universities, libraries and infrastructure providers across the Nordics. They join forces to connect humanities and arts research that relies on compute- and data-intensive applications into a stronger research community, in which Nordic collaboration makes access to and sharing of data and compute resources faster and more efficient.

Due to the spread of _COVID-19_ in the Nordics approximately three months into the pre-study, we reorganized NDHL around predominantly online activities, mixed with bilateral meetings between the PM and the national partners.

### Background ###

Development of e-infrastructure in the humanities has predominantly focused on data enrichment for domain experts in core areas of the humanities such as literature, history, and media studies. Only recently have we seen infrastructural initiatives intended to support high performance computing and large data in digital humanities (e.g., modeling and analysis of large collections of cultural heritage data). These initiatives are, however, national and developed in parallel across the Nordics with limited knowledge sharing and cross-fertilization.
NDHL was therefore initiated to anticipate developments in national infrastructure and to develop a pan-Nordic platform for expert users (i.e., researchers with rich code repositories and technical research profiles).

### Summary of NDHL activities ###

#### Planning a shared stack and data repository ####

The first NDHL workshop ([online resources](https://centre-for-humanities-computing.github.io/Nordic-Digital-Humanities-Laboratory/workshop/Sample-workshop_-1/)) took place in Gothenburg, Sweden, on December 18th-19th, 2019. The overall goal of this workshop was to plan the content of a shared stack. This was done through the following program:

* Identify current national infrastructures that NDHL can leverage
* Define minimal requirements for NDHL
* Discuss pilot infrastructure
* Discuss and update planned activities for 2020

#### NeIC AHM Workshop ####

NDHL held a small workshop at NeIC's AHM 2020 with FI and DK participation. Both partners worked day and night to develop initial prototypes for 1) describing metadata of arbitrary textual cultural heritage collections and 2) a novelty and trend detector for newspapers in Danish, Swedish, Norwegian and Finnish ([online repository](https://github.com/centre-for-humanities-computing/newsFluxus)). The novelty and trend detector became a valuable product as COVID-19 spread during the spring semester of 2020 in several partner countries.

#### Hack the Pandemic: News Media Responses During Covid-19 ####

In response to COVID-19, NDHL decided to run monthly hackathons during 2021 in order to explore forms of pan-Nordic research and infrastructure collaboration for the humanities and social sciences. During these events, NDHL focused particularly on topics related to sociocultural responses to COVID-19. Due to the urgency of the situation, NDHL decided to keep the events open to researchers who were not NDHL members and not to limit participation to the Nordics.
In particular, the Danish [HOPE](https://hope-project.au.dk/#/) project participated regularly, as did Dutch media researchers. This collaboration has continued beyond the pre-study.

#### EOSC-Nordic UCloud Demonstrator ####

The first hackathon was run as an EOSC-Nordic demonstrator of [UCloud](https://cloud.sdu.dk/), the Danish DeiC’s HPC system for interactive computing, in order to test its capacity for pan-Nordic collaboration around siloed data. The FI and DK teams identified change points and salient stories in the Danish and Norwegian news streams during the first and second phases of COVID-19. The NO team worked on a feature extraction pipeline for Norwegian newspaper data.

#### Bilateral Meetings ####

To facilitate collaboration and ensure personal engagement between national stakeholders under the varying travel restrictions of COVID-19, NDHL's PM had in-person bilateral meetings with national representatives and stakeholders in Finland, Norway, and Sweden. These meetings were particularly focused on ensuring continued collaboration with data providers (i.e., royal libraries) because 1) access to data is pivotal to NDHL, and 2) the conditions under which data access can be granted vary due to both national and European legislation. In addition, the data providers were in some cases more financially challenged than universities and infrastructure providers during COVID-19.

Furthermore, the PM met with representatives of, or organized workshops at, leading (Nordic) Digital Humanities conferences (e.g., Computational Humanities Research, Digital Humanities in the Nordic and Baltic Countries) in order to generate interest in NDHL and identify potential stakeholders in the Nordics and beyond.

#### Publications ####

As researchers, NDHL's participants prioritized publications and participation in conferences as communication platforms.
To date, six publications and conference proceedings have been published or are _in print_ as outcomes of _Hack the Pandemic_:

List of publications

* Baglini, R.B., Hansen, L., Enevoldsen, K., Nielbo, K.L. (2021) Multilingual Sentiment Normalization for Scandinavian Languages, Scandinavian Studies in Language, Vol. 12, No. 1, p. 50-64.
* Enevoldsen, K.C., Hansen, L., Nielbo, K.L. (2021) DaCy: A Unified Framework for Danish NLP. arXiv:2107.05295 [cs.CL].
* Nielbo, K.L., Hæstrup, F., Enevoldsen, K., Vahlstrup, P.B., Baglini, R.B., Roepstorff, A. (2022) "When no news is bad news - Detection of negative events from news media content." DH Benelux 2021 Conference Proceedings.
* Nielbo, K.L., Baglini, R.B., Vahlstrup, P.B., Enevoldsen, K., Bechmann, A., Roepstorff, A. (2021) News Information Decoupling: An Information Signature of Catastrophes in Legacy News Media. EADH 2021.
* Nielbo, K.L., Baglini, R.B., Roepstorff, A. (2022) Information Decoupling as a Pandemic Signature in News Media, Proceedings of the Digital Humanities in the Nordic Countries 7th Conference.
* Nielbo, K.L., Enevoldsen, K., Baglini, R., Fano, E., Roepstorff, A., Gao, J. (in review) Pandemic news information uncertainty - News dynamics mirror differential response strategies to COVID-19.

## Achievement of Objectives ##

To work towards NDHL's goals, the pilot study sought to achieve four objectives: develop prototype solutions for collaboration, identify stakeholders, map Nordic synergies, and build an open and inclusive community.

#### 1. Prototype solution for collaboration ####

Through engagement with NeIC, EOSC-Nordic and national infrastructure providers, participants in NDHL were exposed to several national and international research cloud and machine learning platforms (e.g., STACKn, SNIC Science Cloud, DeiC Interactive HPC, Microsoft Azure) as well as local compute and data infrastructures (e.g., the National Library of Sweden's KBLab and the National Library of Norway).
COVID-19 provided a natural setting for testing Nordic 'collaboration in the cloud'[^1], and NDHL found the usability, project management features, large application store, and data-centric design of the DeiC Interactive HPC platform [UCloud](https://cloud.sdu.dk/) to be ideal. UCloud is a digital research environment developed by the University of Southern Denmark's eScience center (SDU eScience Center). It provides an intuitive user interface that improves the usability of HPC environments. UCloud provides a way to access and run applications regardless of users’ location and devices. It also serves as cloud data storage, which allows users to analyse and share their data. UCloud acts as an orchestrator of resources, allowing users to consume resources, such as compute and storage, from multiple providers through the same interface. This allowed for a seamless experience when consuming resources from different providers, letting researchers focus on their work rather than on the specifics of any given provider.

During NDHL, compute and storage were provided by DeiC and Aarhus University via the University of Southern Denmark's YouGene cluster. For the future development of NDHL, it is important to mention that UCloud has very low technical barriers to entry compared to otherwise competitive solutions like Azure and GCP, and that the default interactive GUI access is nevertheless not a hindrance to batch processing.

[^1]: With this statement, we do not intend to take COVID-19 lightly or in any way express disregard for the numerous tragedies resulting from the pandemic.

#### 2. Identify stakeholders ####

NDHL was from the beginning driven _by_ researchers and developed _for_ researchers, but the pilot project dedicated considerable resources to meeting with data and infrastructure providers.
For the continued development, the following nine need-to-have stakeholders were identified:

- Aarhus University, Center for Humanities Computing Aarhus (Denmark)
- The Royal Danish Library (Denmark)
- DeiC Interactive HPC (Denmark)
- University of Helsinki, Helsinki Centre for Digital Humanities (Finland)
- The National Library of Finland (Finland)
- CSC -- IT Center for Science (Finland)
- Språkbanken (Sweden)
- The National Library of Sweden, KBLab (Sweden)
- National Library of Norway (Norway)

In addition, several nice-to-have stakeholders were identified in the areas of preservation and archiving of cultural heritage data (e.g., Grundtvigs Værker, Henrik Ibsens skrifter) and European digital infrastructures (DARIAH, CLARIN, SSHOPENCLOUD). Finally, NDHL would have liked to explore additional Nordic and Baltic university partners, in particular The Centre for Digital Humanities and Arts (CDHA) from the University of Iceland and DH Estonia -- The Estonian Society for Digital Humanities.

#### 3. Map Nordic synergies ####

A combined pan-Nordic effort in access to data will result in research productivity and quality improvements that exceed the individual Nordic countries' capacities. The synergies of pan-Nordic research and infrastructure collaboration have long been apparent in HPC and have crystallized in the pan-European pre-exascale supercomputer _LUMI_, located at CSC's data center in Kajaani, Finland, and the related NeIC project _Puhuri_. Similar efforts have also been seen on the data side with the NeIC-funded collaboration on sensitive data, _Heilsa Tryggvedottir_. _CLARIN_ and, to a lesser extent, _NLPL_ are the only examples of a similar ambition in the humanities and arts, but with a more limited scope (i.e., natural language data), target group (i.e., linguistics and computer science), and infrastructure (i.e., shared data, model and tool repositories).
In particular, NDHL identified four possible Nordic data-related synergies that would enable research excellence in Nordic humanities and arts faculties (see Recommendations): 1) a Nordic cultural heritage data finder, 2) a Nordic inter-federation between national data providers and universities, 3) a Nordic confederation for federated learning on restricted and siloed data, and 4) a Nordic collaborative framework for quality and bias assessment of cultural heritage data.

#### 4. Build an open and inclusive community ####

COVID-19 severely limited the planned outreach and community-building activities. Beyond online meetings and conference participation, activities were advertised through NDHL's [website](https://centre-for-humanities-computing.github.io/Nordic-Digital-Humanities-Laboratory/), and researchers were encouraged to participate in _Hack the Pandemic_ if they had an interest in the topic and sufficient technical know-how to use the cloud environment. NDHL would then provide data and compute resources and, to a lesser extent, technical support. The hackathons were typically joined by 20-25 participants from humanities and social science faculties and from national libraries in Denmark, Norway, and Finland. The PM's bilateral in-person meetings contributed to community building with key stakeholders, primarily in Finland and Norway.

<!-- ## Project Execution ## -->

## Transferral of results ##

All results stemming from the pre-study are made publicly available on NDHL's [website](https://centre-for-humanities-computing.github.io/Nordic-Digital-Humanities-Laboratory/) or as preprints and open access publications. The website will be maintained until early 2023, after which all data can be accessed in the associated GitHub [repository](https://github.com/centre-for-humanities-computing/Nordic-Digital-Humanities-Laboratory).[^2]

[^2]: The repository is currently private, but will be open to everyone by 2023.
<!-- ## Collected Experience ## -->

## Recommendations ##

NDHL has provided a proof of concept that coordination between Nordic data and cloud providers is feasible for a specific use case (news monitoring during COVID-19). A viable, inclusive, and productive solution, however, requires a digital infrastructure collaboration. We therefore propose a collaboration between data providers (e.g., national archives), data consumers (arts and humanities researchers), and compute infrastructures (e.g., DeiC) in order to establish four unique data services: 1) a data finder that makes national archives accessible to and searchable from research cloud infrastructures and facilitates FAIR and open data; 2) a data passport based on a pan-Nordic inter-federation between national archives, universities and infrastructure providers; 3) a federated learning environment for research on restricted national archival data; and 4) a quality assurance and bias analysis library that facilitates fair and transparent data analysis and modeling.

Initially, NDHL had hoped to implement the full infrastructure in a multilateral consortium. Experience from the latter part of the pre-study, especially collaboration with national archives, complicated this, and we therefore suggest implementing the project through bilateral collaborations. Assuming DK as lead, we suggest a DK-NO, DK-FI, and DK-SE order based on the ease of data access and existing collaborations between archives.