# Meeting minutes 4. 10. 2022, Dariah WP4.4 UEF - CSC
WP 4.4 aims:
* To develop a representative benchmark corpus
* To develop an easy-to-use GUI for researchers to access and query these datasets
Background: see the poster [here](https://www.kielipankki.fi/wp-content/uploads/Events/FIN-CLARIAH-2022-06-03/WP4-4.pdf)
Current situation:
* Two social media datasets; collection started in 2016 and is ongoing:
  1. Dataset from the US and UK with complete interaction data for 233 000 people. Size 1 TB; contains textual data, metadata and interaction data.
  2. A similar dataset from the Nordic countries covering 600 000 people; contains metadata and textual data.
Twitter is used because it has very rich metadata and data on interactions.
Data size is not a problem for CSC.
The GUI currently runs on a UEF server. In the GUI, researchers can restrict searches, fetch results and visualise the findings on a map. Results can also be downloaded for processing elsewhere. The data is updated monthly.
* Discussion about HAKA authentication: HAKA supports national logins and, via eduGAIN, also international logins.
* Storage capacity is currently not a problem, but a centralised data solution will be needed in the future.
* HPC will be needed in the coming months for analysing the interaction network data.
* More RAM is needed for data pre-processing.
The long-term aim is to build an online web application with cumulative data. Points for discussion:
* How can the services be kept running after the project funding ends in 2023? They are currently tied to the FIN-CLARIAH project but will continue independently.
* UEF could become part of Clarin, in which case the services would become part of the Kielipankki umbrella offering, in a similar way as Signbank.
* This needs to be discussed first internally within UEF and later with Krister Linden, if integration is wanted.
* If this happens, the service owner should still have application specialists to take care of major maintenance efforts.
# Dariah Workshop 25.10. “Big noisy data”
AP: Prepare a more practical plan for how to begin CSC integration
# Fin-Clariah meeting 18.11 in Jyväskylä
Let's discuss ideas at the CSC integration thematic group meeting.
# Links:
[Step-by-step instructions on how to get started with CSC services](https://research.csc.fi/en/accounts-and-projects)
[Using CSC environment efficiently - self learning](https://csc-training.github.io/csc-env-eff/): highly recommended for anyone starting to use CSC services
[Weekly user support sessions](https://ssl.eventilla.com/event/PP4WB): everyone is welcome to ask our experts questions! Every Wednesday at 14:00 in Zoom (currently piloting)
Support
* ‘Z is not working as expected’
* ‘My code gives error Y’
* ‘can A be installed to Puhti?’
* ‘any advice how to do X?’
* ‘which service suits my needs?’
* training/example wishes
-> servicedesk@csc.fi
[Speed up your request](https://docs.csc.fi/support/support-howto/)