Meeting minutes 21.10.2022 Dariah WP3.3

### Attempting to extract knowledge from Survey Open Answers

[Work package poster](https://www.kielipankki.fi/organization/fin-clariah/fin-clariah-2022-06-03/#W33_Qualitative_survey_data)

### WP 3.3 Overview

#### Project aims

- Motivation:
  - Producing infrastructure for collaborative resource collection and enrichment
  - Dealing with open-ended survey questions efficiently is challenging
- Produce a toolset for humanities and social sciences researchers to help with open survey questions and get the most out of them
  - The expected user has little to no programming experience, but is able to do small things with instructions (for example in Jupyter notebooks)
- Deliverables:
  - Showcase example analyses with pilot data sets
  - Software pipeline & scripts for the process
  - An interface where the user can insert their own data to be stored and accessed at CSC

#### Current situation

- R tools are in production
- Exploration of the pilot data sets: attempting keyword extraction, visualization, sentiment analysis, etc.

#### First deliverable (goal for the end of 2022)

- A web resource (located at CSC?)
- Pilot dataset published/stored
- Ability to access specific datasets with special rights
- Software pipeline

### Collaboration with CSC

#### Discussion about restricted data

- Managing restricted data sets and granting rights has proven complex
- Limited rights to share open survey question data specifically (closed survey questions generally have looser limitations for sharing)
- How to control rights to specific datasets
- Goal: streamline the management of granting rights?
- CLARIN licensing is not familiar, but could be applicable
  - Supports public data, academically restricted data, and data that requires a personal application to use
- Kielipankki has processes for sharing limited-access data; does it make sense to store the data in Kielipankki from the end users' perspective?
  - Probably not, since the FSD portal and processes seem quite similar; integration with Kielipankki may not be the best choice
- Sensitivity of data sets:
  - Current pilot data sets do not seem to contain sensitive data, but sensitive data could be added to the service in the future

#### Toolset hosting in CSC

- What is wanted:
  - R (or Python) packages with the tools
  - Said tools could be used in Jupyter notebooks
  - Notebooks team could be included in the discussion
- Ideally the software pipeline should be built in a manner that does not require the end user to download and unzip zip files themselves
  - A more automated system is wanted
  - Limited access rights cause issues
- Use Puhti for R packages?
- Could the web interface be hosted in the Puhti web interface?
  - Technically yes (/maybe?), process unclear

#### Data hosting in CSC

- What is the procedure for bringing pilot data sets to CSC?
  - The need for the data sets to exist in CSC over FSD is questionable
- Hopes to explore the possibility for external users to submit data in an automated manner
  - What is the main benefit over researchers submitting their data to FSD? Automation only? Again, issues may arise with restricted data.
- Whether to store and share only open or also closed answers from the surveys is not clear yet
- Two linked issues:
  - Researchers want to share their data for others to analyze
  - Researchers want to use the toolset to analyze their own data without sharing it (perhaps due to access restrictions)
- Talk of using Sensitive Data Desktop for running the toolset with sensitive data

#### Useful links

- [Sensitive Data Desktop](https://docs.csc.fi/data/sensitive-data/sd_desktop/)
- [Using RStudio or Jupyter Notebook in Puhti](https://docs.csc.fi/support/tutorials/rstudio-or-jupyter-notebooks/)
- [Step-by-step instructions on how to get started with CSC services](https://research.csc.fi/en/accounts-and-projects)
- [Kielipankki Development](https://www.kielipankki.fi/development/)
- [Using CSC environment efficiently - self learning](https://csc-training.github.io/csc-env-eff/)
  - Highly suggested for anyone starting to use CSC services
- [Weekly user support sessions](https://ssl.eventilla.com/event/PP4WB): everyone is welcome to ask our experts questions!
  - Every Wednesday at 14:00 in Zoom (currently piloting)
- Support
  * ‘Z is not working as expected’
  * ‘my code gives error Y’
  * ‘can A be installed to Puhti?’
  * ‘any advice how to do X?’
  * ‘which service suits my needs?’
  * training/example wishes

  -> servicedesk@csc.fi [Speed up your request](https://docs.csc.fi/support/support-howto/)