Meeting minutes 21.10.2022 Dariah WP3.3
### Attempting to extract knowledge from Survey Open Answers
[Work package poster](https://www.kielipankki.fi/organization/fin-clariah/fin-clariah-2022-06-03/#W33_Qualitative_survey_data)
### WP 3.3 Overview
#### Project aims
- Motivation:
  - Producing infrastructure for collaborative resource collection and enrichment
  - Dealing with open-ended survey questions efficiently is challenging
- Produce a toolset that helps humanities and social sciences researchers work with open survey questions and get the most out of them
  - The expected user has little to no programming experience but can carry out small tasks with instructions (for example in Jupyter notebooks)
- Deliverables:
  - Showcase example analyses with pilot data sets
  - Software pipeline and scripts for the process
  - An interface where users can insert their own data to be stored and accessed at CSC
#### Current situation
- R tools are being developed
- Exploring the pilot data sets: keyword extraction, visualization, sentiment analysis, etc.
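
As an illustration of the kind of keyword extraction mentioned above, here is a minimal sketch in Python (the minutes name R or Python as candidate languages; the work package's actual tools are in R and are not shown here). The function names, the stop-word list, and the sample answers are all hypothetical and invented for this example. It simply counts word frequencies across open answers after dropping stop words; real analyses would likely need lemmatization and a stop-word list suited to the survey language.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; a real analysis would
# use a proper list for the language of the survey.
STOP_WORDS = {
    "the", "a", "an", "is", "are", "was", "and", "or",
    "to", "of", "in", "it", "i", "but", "too",
}


def tokenize(answer: str) -> list[str]:
    """Lowercase an answer and return its alphabetic word tokens."""
    # [^\W\d_]+ matches runs of Unicode letters, so characters such as
    # ä and ö are kept inside words.
    return re.findall(r"[^\W\d_]+", answer.lower())


def top_keywords(answers: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop-word tokens with their counts."""
    counts = Counter()
    for answer in answers:
        counts.update(t for t in tokenize(answer) if t not in STOP_WORDS)
    return counts.most_common(n)


# Invented example answers, not taken from the pilot data sets.
SAMPLE_ANSWERS = [
    "The commute is too long and the office is noisy.",
    "Long commute, but the team is great.",
    "Noisy office; great colleagues.",
]

if __name__ == "__main__":
    for word, count in top_keywords(SAMPLE_ANSWERS, n=4):
        print(f"{word}: {count}")
```

A function like this could be called from a Jupyter notebook cell, which matches the expected user profile of someone who can follow instructions but has little programming experience.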
#### First deliverable (goal for the end of 2022)
- A web resource (located at CSC?)
- Pilot dataset published/stored
- Ability to access specific datasets with special rights
- Software pipeline
### Collaboration with CSC
#### Discussion about restricted data
- Managing restricted data sets and granting rights have proven complex
- Limited rights to share open survey question data specifically (closed survey questions generally have looser limitations for sharing)
- How can rights to specific datasets be controlled?
  - Goal: streamline the management of granting rights?
- CLARIN licensing is unfamiliar but could be applicable
  - It supports public data, academically restricted data, and data that requires a personal application to use
- Kielipankki has processes for sharing limited-access data; does it make sense to store the data in Kielipankki from the end users' perspective?
  - Probably not, since the FSD portal and processes seem quite similar; integration with Kielipankki may not be the best choice
- Sensitivity of data sets:
  - The current pilot data sets do not seem to contain sensitive data, but sensitive data could be added to the service in the future
#### Toolset hosting in CSC
- What is wanted:
  - R (or Python) packages containing the tools
  - The tools could be used in Jupyter notebooks
  - The Notebooks team could be included in the discussion
  - Ideally, the software pipeline should not require end users to download and unzip zip files themselves
  - A more automated system is wanted
  - Limited access rights cause issues
- Use Puhti for R packages?
- Could the web interface be hosted in the Puhti web interface?
  - Technically yes (/maybe?), process unclear
#### Data hosting in CSC
- What is the procedure for bringing pilot data sets to CSC?
- The need for the data sets to exist at CSC rather than at FSD is questionable
- There is interest in exploring whether external users could submit data in an automated manner
  - What is the main benefit over researchers submitting their data to FSD? Automation only? Again, issues may arise with restricted data.
- It is not yet clear whether to store and share only open answers or also closed answers from the surveys
- Two linked issues:
  - Researchers want to share their data so that others can analyze it
  - Researchers want to use the toolset to analyze their own data without sharing it (perhaps due to access restrictions)
- Sensitive Data Desktop was discussed as an option for running the toolset with sensitive data
#### Useful links
- [Sensitive Data Desktop](https://docs.csc.fi/data/sensitive-data/sd_desktop/)
- [Using RStudio or Jupyter Notebook in Puhti](https://docs.csc.fi/support/tutorials/rstudio-or-jupyter-notebooks/)
- [Step-by-step instructions on how to get started with CSC services](https://research.csc.fi/en/accounts-and-projects)
- [Kielipankki Development](https://www.kielipankki.fi/development/)
- [Using CSC environment efficiently - self learning](https://csc-training.github.io/csc-env-eff/)
  - Highly suggested for anyone starting to use CSC services
- [Weekly user support sessions](https://ssl.eventilla.com/event/PP4WB) Everyone welcome to ask questions from our experts!
  - Every Wednesday at 14:00 on Zoom (currently piloting)
- Support
  * ‘Z is not working as expected’
  * ‘my code gives error Y’
  * ‘can A be installed to Puhti?’
  * ‘any advice how to do X?’
  * ‘which service suits my needs?’
  * training/example wishes

  -> servicedesk@csc.fi
- [Speed up your request](https://docs.csc.fi/support/support-howto/)