---
tags: liber-dslib
---

# Additional Meetings

## Understanding AI in Libraries - Workshop (27.09.MO)

**Attendees** (name / role / institution):
- Péter Király / sw developer, researcher / Göttingen eResearch Alliance
- Mia Ridge / Digital Curator / British Library
- Camilla Lindelöw / Officer / National Library of Sweden
- Linda Sīle / Metadata manager / Anet | University of Antwerp
- Angela Vorndran / Data management / German National Library (DNB)
- Birgit Schmidt
- Sefiane
- Heli Kautonen
- Laurents
- Andrea A. Gasparini

**Notes:**

Intros:
- Andrea A. Gasparini: University of Oslo Library.
- Angela Vorndran (DNB): works with large metadata collections from the German and Austrian regional library networks. At the DNB, AI is mainly used to assign classifications and index terms to online publications that cannot be indexed intellectually (i.e. manually).
- Birgit: open science, third-party funding. AI: data analytics, mining large information sources; a moving target.
- Camilla Lindelöw: National Library of Sweden. AI: knowing more about my field.
- Laurents: Leiden, head of digital scholarship. AI is applied in different fields, e.g. metadata quality.
- Linda Sīle: University of Antwerp, library information system; maintains the scholarly publications database. Suggestions for keywords in subject classification.
- Mia Ridge: British Library, Alan Turing Institute. Text-based methods, computer vision, maps. AI: machine learning.
- Péter Király: ...: ...

Presenters: the research is sponsored by the Finnish-Norwegian Cultural Foundation. Recipients: Oslo University Library and the Finnish Literature Society. This workshop is one of several, offered mainly in Scandinavia. The data scientist (Daniel ...) left the library for better opportunities in the commercial world.

The purpose: to create a text predictor.

They have worked with many commercial off-the-shelf and open-source software (OSS) tools, such as:
- Rayyan QCRI
- YEWNO: good, but very expensive. Used by the British NHS, for life sciences. Now part of Ex Libris, so it can be used in a library context.
- Keenious: developed by Norwegian students; a plugin for Word and Google Docs that finds possible references while you write.
- IRIS: may help researchers with extensive literature reviews. Expensive.
- Research Rabbit: navigation between authors and topics.
- Connected Papers: navigation between papers.
- Open Knowledge Maps: clusters relevant papers by topic.

Individual exercise with sticky notes: What are the most pressing questions with regard to AI/ML? Link to the [MIRO board](https://miro.com/app/board/o9J_lvbjp1o=/).

- Matthews (2021), Drowning the ...
- Schoed et al. (2020), Use of AI for Medical Literature Search, ...

*Literature review*: retrieved 152 relevant papers on the topic of AI in libraries; evaluation of 126 papers...
- libraries/librarians have different roles (neutral investigator, ...)
- AI may have a role
- users have different roles (exploited by AI, information seeker, co-designer, etc.)
- role of design (several aspects)

Second exercise with sticky notes: What kinds of AI strategies would you need in your library?

The slides (PPT) and reading list will be shared with the group.

---

## DSLib compared to other LIBER WGs - September 2021 - by Camilla, Angela

Two LIBER Working Groups seem closely related to the topics suggested for the DS WG: the Digital Scholarship & Digital Cultural Heritage WG and the Research Data Management WG. Contacting these WGs, or taking their outputs, goals and infrastructure into consideration, would be advisable when planning the activities of our WG.

**LIBER Working groups**

**1) Digital Scholarship & Digital Cultural Heritage Working Group (since 2017)**

This working group's topics are close to ours. They focus on data management, infrastructures, skill development and supporting the research process, but with a focus on digital humanities. Subgroups: pro(vi)ding expertise, relationships with the research community, impact of DH activities in libraries, digital collections.
Could be a good example of possible goals and of ways to create valuable outputs.

Outputs:
- They conducted a survey and published a report: [Europe's Digital Humanities Landscape](https://zenodo.org/record/3247286). Some of its questions are relevant for us, e.g. on data capture activities, data creation, data enrichment, data analysis, data storage, etc.
- Webinars on [Setting up a GLAM workbench](https://www.youtube.com/watch?v=LXk60YDdaMA&list=PLHA3lUmrYM3sR0sdjTEED4ahsCO3GTx9w&index=28)
- A reading list and a Zotero library

How big is the overlap between data science and digital humanities as research areas in general?

**2) Architecture Working Group** - not relevant for our topics

**3) Citizen Science Working Group (since 2019)**

Working on [A Librarian's Guide to Citizen Science](https://libereurope.eu/working-group/liber-citizen-science-working-group/citizen-science-guide-2021/). Main topics: skills, infrastructures, good (open) scientific practice, guidelines for activities involving libraries. The group is not very active.

**4) Copyright & Legal Matters Working Group (since 2014)**

Goals: monitor current European law; react to and influence developments in European law concerning libraries, universities and researchers, e.g. the right to text and data mining (TDM).

Tasks: advocate towards policymakers, inform libraries about the implications of laws, support libraries in voicing concerns during the implementation phase of directives.

Published several brief documents on legal matters, e.g. a [TDM factsheet](https://libereurope.eu/document/tdm-factsheet/tdm-copyright-exception/). Could be relevant for us when addressing copyright questions.

**7) Educational Resources Working Group** - not relevant for our topics

**8) FIM4L Working Group**

Goal: develop a library policy for federated authentication that is broadly supported by libraries and publishers. Published [recommendations for SSO implementation](https://libereurope.eu/document/liber-fim4l-recommendations-2020-v-01/liber-fim4l-recommendations-2020-v-01/).
Could be of interest if our group were to focus on access points to data.

**9) Leadership Programmes Working Group** - not relevant for our topics

**10) Open Access Working Group (unclear start date)**

Goal: provide libraries with practical guidance for implementing Open Access. Part of LIBER's Strategic Direction on Innovative Scholarly Communications. Not much overlap with the DSLib group, but some of their work may have practical consequences for TDM purposes, especially their work on IPR (e.g. their principles for CC-BY and CC0). Since there is a WG dedicated exclusively to copyright and legal matters, we assume this group exchanges ideas with that group to reach consensus on LIBER's stance on copyright issues.

**11) Research Data Management Working Group (since 2019?)**

Goals:
1. Help libraries secure a critical role in the 21st century scholarly information infrastructure.
2. Recognize research data as valuable academic resources that need to be managed, shared and preserved to foster research and science.

Part of LIBER's Strategic Direction on Research Infrastructure. Overlaps with the DSLib group insofar as it concerns the curation of research data, which includes data resulting from data science activities that should be subject to DMPs etc. The group has created a [DMP catalogue](https://libereurope.eu/working-group/research-data-management/plans/) and [many resources concerning RDM and libraries](https://libereurope.eu/working-group/research-data-management/documents-resourcess/). It could be of interest to meet with this group and share experiences and ideas.

---

## DSLib group – theme exploration – August 2021 – by Neha, Linda, Pam

In the brainstorming notes, a number of elements emerge that were mentioned by several participants:
- **Landscape analysis:** generate an overview of developments in the field of AI and machine learning.
- **Skill development within libraries:** generate an overview of the tools and skills needed for this, for example Python/R.
- **Service to external users:** identify existing guidelines and/or create them.
- **Service to external users:** exchange views on how we deal with metadata, for example in relation to CHRIS/FAIR/grants.
- **Service to 'internal' users:**

In further specifying the follow-up steps associated with the points above, the following categorization can be considered:
- **Target groups** (e.g. management, researchers, students, colleagues),
- **Various fields of research** (e.g. potential differences and/or similarities between the alpha (humanities), gamma (social sciences), beta (natural sciences) and medical domains),
- **Policy and strategy related action points** (e.g. how to report on scientific & societal impact in research evaluations),
- **Pragmatic aspects** (e.g. infrastructure, tools, related costs),
- **Sharing technical knowledge** (e.g. potential differences and/or similarities within countries and libraries, exploring text and data mining services, data visualization, how to implement new techniques in research data repositories, legal aspects, training colleagues and/or researchers, guidelines),
- **Connection between theory and practice** (e.g. which DS developments are short-term goals and which are realistic over a longer time span, how DS developments tie in with overarching policy and evaluation developments, whether every data set is suitable for DS or whether we should choose when to apply it),
- **Open Science and Recognition & Rewards** (e.g. how DS developments tie in with overarching policy and evaluation developments).

Subsequently, as a working group, we can discuss the questions below, group them by the above categories as desired, and prioritize them.

**Overarching DSLib questions:**
- Is the focus on libraries only or on GLAM/LAM?
- How much attention to research data (overlap with the RDM WG in LIBER)?
- How much attention to metrics for external users?
- For upskilling: focus on resources / promotion / training on DS? (or all?)
- Resources for DS practitioners or for those who want to learn DS?
- What's the outcome: a report? Use-case descriptions (~RDM WG)? DS training / guidance for organising training / resources for training? A specific DS service created collaboratively?

**Data science in libraries: a) landscape analysis**
- What are libraries doing in data science/AI/machine learning?
- Text analysis of LIBER institutions' websites for mentions of ML/DS as a starting point for a landscape survey.
- What are the current applications of DS in LAM? Specifically, I would like to get information for each use case about data sources, methods/tools, output, and whether the use case is in the research or production phase.
- What data does the library engage with and deal with? In-house/open data? What infrastructure and tools are used? How are the outcomes published?
- Split goals into two sections? Pragmatic (infrastructure, tools) vs. overarching goals (scientific impact & societal impact).
- It would be useful to look into the ideas that are/become embedded in data practices within libraries and map out what happens in this regard in different libraries. From what I have seen, there is sometimes a clash between ideas as they are discussed and as they are executed in practice. Given the increasing use of data science / data analytics methods, a discussion on this topic could bring out ideas on directions that libraries can take.
- Collect use cases.

**Data science in libraries: b) a service to external users**
- Data science-related services that libraries can provide to their customers (researchers): the tools/skills that are handy in research.
- Guidance for researchers on how to use AI techniques in their research.
- Matchmaking: identification of funding streams for researchers.
- Modelling/curating (FAIR) data as fuel for AI techniques.
- Monitoring and predicting the impact of Open Science for researchers, society, citizens.
- Be able to assist researchers / provide guidance on established or emerging topics regarding data acquisition and transformation, such as TDM; point them to existing tools.
- Position the library as a competence center for such services (guidance on established or emerging topics regarding data acquisition and transformation).

**Data science in libraries: c) skill development within libraries**

_Resources_
- Build a list of grant-funded ML/DS initiatives to see what is successfully being funded.
- Open datasets for applying ML/DS in GLAMs.
- Point people to resources already developed or in progress (e.g. AI4LAM workshops, Turing humanities and data science WG; Library Carpentry AI / machine learning lessons; project- or format-specific material, e.g. newspapers and Living with Machines, Impresso etc.; computer vision, OCR/HTR; other landscape analyses, e.g. the Europeana AI survey, Terras forthcoming etc.) as reference material for their own learning and for answering queries.
- Cooperation on a common literature list (Zotero group).
- Existing resources for processing bibliographic data in R / Python (e.g. DOI verification, ISBN hyphenation, record linkage), and exchanging ideas on what is done, and how, in different settings (use of machine learning, dashboards, data viz streamlining and so on).

_Tasks within libraries / (G)LAM_
- Extraction of metadata from AV and images.
- Explore auto-generated metadata, text summarization, etc. as information discovery tools in GLAMs.
- Scale up pilots to full coverage of collections.

_Skills_
- How to upskill librarians in data science-related skills (this can be for use within the library too, not necessarily for researchers or students).
- Upskill colleagues and teams in this regard through professional training.
- Help people understand what DS / AI / ML is particularly good for and when/where it's harder to do well; where it fits in library data lifecycles, and the issues that emerge when applying it to GLAM data.
- Help people understand how to choose a platform for DS work depending on their needs, the number of images / source files, where the work fits in a general data workflow, etc.
- Help people anticipate costs (compute, storage) and overhead for accessing DS services, associated needs for digital preservation and storage, and replicability.
- Help people understand how to minimise the environmental impact of DS / AI methods, e.g. minimising re-training, avoiding blockchain.
- How to build a data analytics team within the organisation: who will join, what the prerequisites are, and how to explain to the rest of the organisation what this is about. Since the library has many different data sources, this would probably be cross-organisational work. What competence is already available, and what is needed? ([https://doi.org/10.7287/peerj.preprints.3160v2](https://doi.org/10.7287/peerj.preprints.3160v2))
- Joint learning/upskilling/team building; Library Carpentry is an example.

---