---
tags: Geoinformatics Research days
title: Geoinformatics research days workshop
---
# Geoinformatics research days - CSC workshop 2022
:::info
**Link collection**
This document: [hackmd.io/@GeospatialCSC/GISRD](https://hackmd.io/@GeospatialCSC/GISRD)
[Today's slides](https://a3s.fi/gis-courses/2022_gi_research_days/gis-research-days-2022.pdf)
[Geoportti webpage](https://www.geoportti.fi/)
[CSC Docs](https://docs.csc.fi/)
[CSC Training calendar](https://www.csc.fi/en/training#training-calendar)
[CSC Geocomputing page](https://research.csc.fi/geocomputing)
[CSC Geocomputing examples](https://github.com/csc-training/geocomputing)
:::
**Table of contents**
[TOC]
## Introduction round
* Who are you?
* What is your research topic?
* Voluntary networking: you may also leave your email address here so that others or CSC can contact you, or add you to the GIS/Geoportti mailing list (the link to this document is only shared with you)
---
* Johanness Jamaludin, PhD student, VITRI, using geospatial data for tropical forest conservation (johanness.jamaludin@helsinki.fi)
* Merve Keskin, Post-doc researcher, Finnish Geospatial Research Institute (FGI/NLS), working on use/user and cognitive issues of maps, eye tracking, vector geodata processing (geometry learning, deep learning) (merve.keskin@nls.fi)
* Emil Ehnström, Project Planner, YLLI project, University of Helsinki, using Twitter to identify sports activity locations, geoparsing (emil.ehnstrom@helsinki.fi)
* Mikko Kesälä, GIS specialist, Finnish Forest Centre, diversity modeling and water protection (GIS analysis methods and automation) (mikko.kesala@metsakeskus.fi)
* Elisa Hanhirova, Project planner, Research Center for Ecological Change (REC), University of Helsinki, working on how land use and land cover changes correlate with different species in Finland (elisa.hanhirova@helsinki.fi)
* Arttu Kivimäki, Researcher, Finnish Geospatial Research Institute (FGI/NLS), working on Sentinel-data time-series (arttu.kivimaki@maanmittauslaitos.fi)
* Tua Nylén, postdoc Uni Turku & Uni Helsinki, working with Landsat and Sentinel imagery from outside Finland etc. (tua.nylen@utu.fi)
* Charlotte van der Lijn, Postdoc in Geography, Uni Helsinki, working on GIS analyses in the YLLI project (charlotte.vanderlijn@helsinki.fi)
* Mika Jalava, postdoc at Aalto University, working on food system modelling
* Vesa Arki, university teacher in geoinformatics at University of Turku, thinking of ways to get students to use CSC services (vepear@utu.fi)
* Henrikki Tenkanen, Assistant professor of Geoinformation Technology at Aalto University (henrikki.tenkanen@aalto.fi)
* Tatu Leppämäki, PhD researcher, Uni Helsinki (Digital Geography Lab), working on big mobility data to understand presence and interactions of people (tatu.leppamaki@helsinki.fi)
* Anton Kuukka, master's student at University of Turku, currently writing my master's thesis. I'm interested in e.g. topographic mapping, the added value of LiDAR in updating existing topographic maps and in creating autogenerated maps, and identifying stones, cliffs etc. from LiDAR data (antkuu@utu.fi)
## Polls and interest collection
Vote by adding an o after your answer, or write some text:
**Are you using/Have you used any CSC resources?**
* YES: oooooooooooo
* NO: ooo
**If yes, which? if no, why not?**
* Puhti ooo
* Allas o
* cPouta oooo
* Taito o
**Are you interested in in-person hands-on geocomputing course?**
* YES: oooooooooo
* YES, but online: o
* YES, but webinar: o
* YES, but more specific topic: o
* YES, but starting from very basics: oooo
* NO: o
**Please specify:**
* ...
* ...
* ...
**What software, Python/R packages would you need to use on Puhti?**
* geopandas
* gee (Google Earth Engine)
* LAStools
* ...
**Any other wishes (for material, examples, courses, seminars, workshops, instructions, software, support)?**
* Dark theme for Puhti web-interface
* A tutorial video on how to connect to cPouta with SSH on Windows would be nice (like the one for Mac)
* ...
* Instructions for university teachers on how they can integrate CSC services into their teaching. What is possible?
## Questions
You can post any questions here or ask them by voice. We will try to answer everything by voice and later also record the answers here for you.
* Are CSC services free to use for everyone?
    * Free for everyone at universities, universities of applied sciences and research institutes doing open science (see more on [research.csc.fi](https://research.csc.fi/free-of-charge-use-cases))
* What is the best way to get information about new trainings? An email list?
    * [name=Samantha] Every user known to work with geospatial data, and every participant in a geospatial course, is automatically added to the GIS/Geoportti mailing list, where we share information about upcoming courses and news.
    * [name=Samantha] For information about all CSC training, you can sign up for the CSC training newsletter here: https://www.csc.fi/en/newsletter
* [name=Mika] What datasets (and with what coverage) are directly available on Allas, and how are they best accessed? My (or actually a colleague's) specific interest is global Sentinel-2 data.
    * [name=Samantha] Here is a list of all spatial datasets hosted on Puhti/Allas: https://docs.csc.fi/data/datasets/spatial-data-in-csc-computing-env/
    * [name=Samantha] The data on Allas is provided by a researcher and is continuously growing, but it currently mainly covers Sentinel-2 data for the agricultural areas of Finland for the growing seasons 2016-2021.
    * [name=Arttu] If the existing data does not match your needs, downloading Sentinel-2 data programmatically to Puhti is fairly straightforward. The data can then be made public afterwards.
* [name=Mika] If some types of global data are needed and downloaded, how do we ensure they are usable by everyone, so that others don't download their own copies?
    * [name=Samantha] Very good question. Currently there is no system in place to help with that, but we would like to facilitate it better. You can store the data on Allas yourself and make it public; then everyone with the link has access to it. If you have or get large datasets, please also let us know and we will make sure to add them to our page. (Note, however, that in general Allas is meant for storage during the project lifetime.)
* What about licensing? If we download Sentinel data, how do we ensure we follow the licensing rules? Some of it is already on Allas, so it is obviously possible, and the information and knowledge on how to do it probably already exists.
    * [name=Samantha] Also a good question. In the Sentinel case, re-sharing is allowed, and the license information is provided in the Allas buckets (somewhere, as far as I know).
* Another practical issue: how should we deal with automatically updating the data as new data becomes available?
    * [name=Samantha] There have been a few tests using e.g. crontabs or similar tools, but here too no good system is in place currently. We are surveying the needs of our users, but especially within EO there are so many different products, and everyone seems to need differently processed files, that it is impossible for us to provide all these datasets. If we could identify a subset that is interesting for many, we could consider providing it on Allas/Puhti.
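
The automatic-update problem above, combined with sharing data through a public Allas bucket, can be sketched as an incremental sync that only fetches objects not yet present locally, so it is safe to re-run from a scheduler such as cron. This is a minimal sketch under assumptions: the list of available object keys is assumed to come from elsewhere (e.g. a product catalogue or a bucket listing), bucket and file names are hypothetical, and public objects are assumed to be reachable as `https://a3s.fi/<bucket>/<object>`, the same pattern as the slides link at the top of this note.

```python
import os
import urllib.request
from urllib.parse import quote

# Assumption: objects in a public Allas bucket are served as https://a3s.fi/<bucket>/<object>
ALLAS_PUBLIC_ROOT = "https://a3s.fi"


def public_url(bucket, key):
    """Public HTTPS URL of an object in a public Allas bucket ('/' in keys is kept)."""
    return f"{ALLAS_PUBLIC_ROOT}/{bucket}/{quote(key)}"


def missing_items(available, local_dir):
    """Sorted object keys whose file name is not yet present in local_dir."""
    have = set(os.listdir(local_dir)) if os.path.isdir(local_dir) else set()
    return sorted(key for key in available if os.path.basename(key) not in have)


def sync(bucket, available, local_dir):
    """Download only objects that are new since the last run; safe to re-run from cron."""
    os.makedirs(local_dir, exist_ok=True)
    new = missing_items(available, local_dir)
    for key in new:
        target = os.path.join(local_dir, os.path.basename(key))
        tmp = target + ".part"
        urllib.request.urlretrieve(public_url(bucket, key), tmp)
        os.replace(tmp, target)  # only complete downloads get the final file name
    return new
```

Wrapped in a script, a crontab entry such as `0 3 * * * python3 sync_sentinel.py` (hypothetical script name) would run it nightly. Comparing file names keeps re-runs cheap, but it will not notice files that were re-processed under the same name; that would need checksums or version tags.
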
## Workshop part
### By yourself
Think about your own use case / possible future use case / idea:
* What are your requirements? Memory, storage, time, ...
* What software do you need? Any restrictions?
* What is your workflow? Could it be made more efficient by using other tools or coding?
* Which steps could be parallelized? In what way?
* Does the software you use support parallelization?
* What are the expected bottlenecks?
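
A common answer to "in what way?" for geospatial work is spatial decomposition: split the study area into tiles and process each tile independently, for example one tile per Slurm array task on Puhti. A minimal sketch, assuming a projected bounding box and a tile grid of your choosing; the function names are illustrative, and `SLURM_ARRAY_TASK_ID` is the environment variable Slurm sets for each task of an array job.

```python
import os


def edges(lo, hi, n):
    """n + 1 evenly spaced edges from lo to hi; the last edge is exactly hi."""
    return [lo + (hi - lo) * k / n for k in range(n)] + [hi]


def split_bbox(minx, miny, maxx, maxy, nx, ny):
    """Split a bounding box into nx * ny tiles, listed row by row starting at (minx, miny)."""
    if nx < 1 or ny < 1:
        raise ValueError("nx and ny must be at least 1")
    xs, ys = edges(minx, maxx, nx), edges(miny, maxy, ny)
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1]) for j in range(ny) for i in range(nx)]


def tile_for_this_task(tiles):
    """Pick the tile for the current Slurm array task (index 0 when run outside Slurm)."""
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    return tiles[task_id]
```

Operations that look at neighbouring pixels or features (focal filters, buffers) need tiles with an overlap margin that is trimmed after processing; purely per-pixel or per-feature steps can use the tiles as they are.
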
-> Merve Keskin: We would like to modify the map design based on the attention hotspots derived from eye-tracking data (point data: X, Y (screen coordinates), time). We need machine (or geometry) learning for vector geodata processing: to select the vector map features within the attention hotspots, and also the features with similar vector characteristics throughout the whole map. The data is some megabytes or gigabytes, but we need to do this selection (and the corresponding visualization) in (semi-)real time based on the collected eye-movement data. Rendering power and time, as well as iteration, are required.
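
A very simple stand-in for the hotspot-based selection described above: bin the gaze samples into a square grid over the screen, treat cells with enough samples as hotspots, and select the map features whose bounding boxes touch a hotspot cell. This is a grid-density sketch with assumed parameters (`cell_size`, `min_count`), not the project's method; the learning step that finds similar features elsewhere on the map is not covered, and features are assumed to be given as screen-coordinate bounding boxes.

```python
from collections import Counter


def hotspot_cells(gaze_points, cell_size, min_count):
    """Bin (x, y, t) gaze samples into square screen cells and return the cells
    containing at least min_count samples (a simple density-based hotspot)."""
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y, _t in gaze_points)
    return {cell for cell, n in counts.items() if n >= min_count}


def features_in_hotspots(features, cells, cell_size):
    """Return IDs of features whose bounding box (minx, miny, maxx, maxy),
    in screen coordinates, overlaps at least one hotspot cell."""
    selected = []
    for fid, (minx, miny, maxx, maxy) in features.items():
        cols = range(int(minx // cell_size), int(maxx // cell_size) + 1)
        rows = range(int(miny // cell_size), int(maxy // cell_size) + 1)
        if any((i, j) in cells for i in cols for j in rows):
            selected.append(fid)
    return selected
```

Bounding boxes make this fast enough for interactive use but coarse: a feature can be selected when only its box, not its geometry, touches a hotspot. An exact test on real geometries could use a spatial join, e.g. `geopandas.sjoin`.
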
-> Mika Jalava: My data is often quite small, just megabytes or a few gigabytes, but the calculations are heavy (Monte Carlo simulations, spatial optimisation etc.), and I find Puhti just the right environment for this.
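
Many independent Monte Carlo runs on small input data map naturally onto Slurm array jobs on Puhti: each array task runs its own replicates with its own random seed, and the results are combined afterwards. A minimal sketch, assuming a toy estimation of pi in place of the real model and a simple seed-derivation scheme chosen for illustration.

```python
import os
import random


def estimate_pi(n_samples, seed):
    """Toy Monte Carlo model (stand-in for a real simulation): estimate pi from random points."""
    rng = random.Random(seed)  # reproducible stream for this task
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def seed_for_task(base_seed, task_id):
    """Derive a distinct integer seed for each array task from one base seed."""
    return base_seed * 100_000 + task_id


if __name__ == "__main__":
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    result = estimate_pi(100_000, seed_for_task(2022, task_id))
    print(f"task {task_id}: {result:.5f}")  # one line per task, combined afterwards
```

Submitting a batch script that runs this file with `sbatch --array=0-99` starts 100 tasks, each with a different `SLURM_ARRAY_TASK_ID`. For stronger guarantees of independent random streams, NumPy's `SeedSequence.spawn` is a more principled alternative to ad hoc seed arithmetic.
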
-> Emil Ehnström: I start off with several gigabytes of Twitter data. I use geoparsing (natural language processing) on strings. I have tried to optimize the code in several ways, and have also considered and tested parallelization. Some parts can be parallelized, others cannot. As for software, I need Python with libraries like geopandas, stanza (NLP), spacy and a few more. One bottleneck in this work is lemmatizing the text: it takes quite a long time, but it could be parallelized. I have also preprocessed the data to make lemmatization easier.
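
Lemmatization is independent per text, so it parallelizes well across processes: split the texts into chunks, lemmatize the chunks in a process pool, and flatten the results back into the original order. A minimal sketch using the standard library's `ProcessPoolExecutor`; `lemmatize` is a hypothetical stand-in for a real stanza or spaCy pipeline, and reading the worker count from `SLURM_CPUS_PER_TASK` assumes the script runs inside a Slurm job on Puhti.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def lemmatize(text):
    """Hypothetical stand-in for a real lemmatizer such as a stanza or spaCy pipeline."""
    return [word.lower().strip(".,!?") for word in text.split()]


def lemmatize_chunk(texts):
    """Worker function: lemmatize one chunk (module-level so it can be pickled)."""
    return [lemmatize(text) for text in texts]


def chunks(items, size):
    """Split a list into consecutive chunks of at most size items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def lemmatize_all(texts, workers=None, chunk_size=1000):
    """Lemmatize texts in parallel, keeping the original order of the results."""
    # On Puhti, default to as many processes as CPU cores reserved for the job
    workers = workers or int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    parts = chunks(texts, chunk_size)
    if workers == 1:
        results = [lemmatize_chunk(part) for part in parts]  # serial path, easy to debug
    else:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(lemmatize_chunk, parts))  # map keeps chunk order
    return [lemmas for part in results for lemmas in part]
```

With a real NLP pipeline, loading the model once per worker process (for example via the `initializer` argument of `ProcessPoolExecutor`) avoids reloading it for every chunk. On platforms that start workers with `spawn`, call `lemmatize_all` from under an `if __name__ == "__main__":` guard.
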
-> Mikko Kesälä: I do point cloud modelling. There is a project building a processing platform for LiDAR data in the CSC environment. My requirements are more about data integration and its automation: open data from scientists and other organisations should be easy to use.
### In groups
* Can you see potential for using CSC resources for your own usecases?
* What challenges could you identify?
* Do you have ideas on how to overcome these challenges?
#### Group 1
Potentials: food system research, cropland optimisation, spatial optimisation, raster-to-vector conversion
Need for global Sentinel and Landsat datasets for long-term change detection and land cover classification. Bottleneck: file read and write operations take too much time. We need a method that is stable from the start: GEE keeps changing (e.g. algorithms, data, minimal documentation) and GEE dataset names change, which makes reproducibility harder. It is important, e.g. on Allas, that dataset names and algorithms do not change.
Satellite data remains relevant indefinitely, so there is also a need to find old datasets.
Different people need parts of the same data: how should it be downloaded, named and structured? Unify it using a STAC catalog?
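
The STAC idea above can be illustrated with a single STAC Item: a GeoJSON Feature that records a scene's footprint, acquisition time and file location, so that everyone refers to the same object by the same ID. A minimal sketch with plain dictionaries following the STAC 1.0.0 Item layout; the ID, bucket and file names are hypothetical, and a library such as pystac can build and validate items more thoroughly.

```python
import json


def stac_item(item_id, bbox, datetime_utc, asset_href):
    """Build a minimal STAC 1.0.0 Item (a GeoJSON Feature) for one scene.
    bbox is (min lon, min lat, max lon, max lat) in WGS84, as GeoJSON requires."""
    minx, miny, maxx, maxy = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": {  # footprint as a closed ring: first point == last point
            "type": "Polygon",
            "coordinates": [[[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]],
        },
        "bbox": [minx, miny, maxx, maxy],
        "properties": {"datetime": datetime_utc},  # acquisition time as an RFC 3339 string
        "links": [],
        "assets": {
            "data": {
                "href": asset_href,  # e.g. a public object URL on Allas
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data"],
            }
        },
    }


item = stac_item(
    "s2-example-20210615",  # hypothetical naming convention
    (24.0, 60.0, 25.0, 61.0),
    "2021-06-15T10:00:00Z",
    "https://a3s.fi/example-bucket/s2-example-20210615.tif",  # hypothetical bucket
)
print(json.dumps(item, indent=2))
```

Agreeing on the ID scheme and publishing such items next to the data is what lets several users find and reuse the same files instead of downloading their own copies.
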
Instead of providing global data, provide instructions on how best to download it, e.g. submit your work to Puhti with the image download embedded in the code
A big problem with Allas is downloading data from it and getting the data out
Recommended ways to automate work in Python and R
Requesting and filtering Sentinel images works, but using the long-term archive is difficult; worst case, images have to be downloaded manually. An R package can list datasets and download them automatically (sen2r)
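
The request-and-filter step can be sketched independently of where the scene metadata comes from: given a list of STAC-like metadata records, keep the scenes within a date range and below a cloud-cover limit. A minimal Python sketch, assuming the records have already been fetched (from an API or a local catalogue); `eo:cloud_cover` is the cloud-cover field of the STAC `eo` extension, and the function is illustrative, not part of sen2r or any other package.

```python
from datetime import date


def select_scenes(items, start, end, max_cloud):
    """Return IDs of items acquired between start and end (inclusive) with
    cloud cover of at most max_cloud percent, sorted by acquisition date."""
    picked = []
    for item in items:
        props = item["properties"]
        acquired = date.fromisoformat(props["datetime"][:10])  # "YYYY-MM-DD" prefix
        cloud = props.get("eo:cloud_cover")  # percent, STAC eo extension
        if cloud is None:
            continue  # skip scenes without cloud-cover information
        if start <= acquired <= end and cloud <= max_cloud:
            picked.append((acquired, item["id"]))
    return [item_id for _, item_id in sorted(picked)]
```

Keeping the filter separate from the download step means the same selection logic can be reused for old archive scenes and for newly published ones.
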
Challenges: large datasets, shared use (it would be better if larger datasets were readily available on Allas), versioning, the need for global data, the load caused by downloading and transferring data, the lack of a clear process for taking data into use (e.g. on Puhti), and how to handle continuous updating (e.g. Sentinel)
#### Group 2
Potentials: Huge potential. We talked about geoparsing and natural language processing, but the potential applications seem limitless: all kinds of cool things combining NLP and geoprocessing. In general, big data sources are here to stay, and storing and analysing them at CSC is useful. The same goes for machine learning and deep learning, where access to more GPU power is very useful.
Challenges: Some challenges are related to learning how to use supercomputers and cloud computing. It's a bit tricky in the beginning. We also talked about optimizing code, which feels like something that could go on indefinitely; at some point you just have to be satisfied.
#### Group 3
Potentials:
High for researchers with large datasets
Challenges:
Learning how to use