# Meeting minutes 4.10.2022, Dariah WP4.4 UEF - CSC

WP 4.4 aim:

* To develop a representative benchmark corpus
* To develop an easy-to-use GUI for researchers to access and query these datasets

Background: see the poster [here](https://www.kielipankki.fi/wp-content/uploads/Events/FIN-CLARIAH-2022-06-03/WP4-4.pdf)

Current situation:

* 2 social media datasets, collection started in 2016 and ongoing:
  1. Dataset from the US and UK: complete interaction data for 233 000 people, size 1 TB; contains textual data, metadata and interaction data
  2. Similar dataset from the Nordic countries: 600 000 people; metadata and textual data

Twitter is used because it has very rich metadata and data on interactions.

Data size is not a problem for CSC.

The GUI is currently running on a UEF server. In the GUI, the researcher can restrict the search, fetch results, and view a map for visualising the findings. Results can also be downloaded and processed elsewhere. Data is updated monthly.

* Discussion about HAKA authentication: HAKA supports national and (via eduGAIN) also international login.
* Currently no problem with storage capacity, but there is a need for a centralised data solution in the future
* Need for HPC in the coming months for analysing the interaction network data
* More RAM needed on the data pre-processing side

The aim is to build a web application as a long-term project. This will be online, with cumulative data.

Points for discussion:

* How to keep services running beyond the 2023 project funding? Currently tied to the FIN-CLARIAH project, but will continue independently
* UEF could become part of Clarin -> services become part of the Kielipankki umbrella offering in a similar way as Signbank.
  * Needs to be discussed internally within UEF first and later with Krister Linden if integration is wanted
  * If this happens, the service owner should still have an application specialist to take care of major maintenance efforts.

# Dariah Workshop 25.10.
“Big noisy data”

AP: Prepare a more practical idea of how you will begin CSC integration

# Fin-Clariah meeting 18.11 in Jyväskylä

Let's discuss ideas at the CSC integration thematic group meeting

# Links:

[Step-by-step instructions on how to get started with CSC services](https://research.csc.fi/en/accounts-and-projects)

[Using CSC environment efficiently - self learning](https://csc-training.github.io/csc-env-eff/) Highly recommended for anyone starting to use CSC services

[Weekly user support sessions](https://ssl.eventilla.com/event/PP4WB) Everyone is welcome to ask our experts questions!
Every Wednesday at 14:00 in Zoom (currently piloting)

Support:

* ‘Z is not working as expected’
* ‘my code gives error Y’
* ‘can A be installed to Puhti?’
* ‘any advice how to do X?’
* ‘which service suits my needs?’
* training/example wishes

-> servicedesk@csc.fi

[Speed up your request](https://docs.csc.fi/support/support-howto/)