# Data Dojo Würzburg 4
## September 2021
- **When:** September 9<sup>th</sup>, 2021 at 6:00pm
- **Where:** Zoom
- Meeting ID: 912 6133 8678
- Password: 887732
- **Info:** [DataDojo Website](https://ddojo.github.io/), [Repo](https://github.com/ddojo/ddojo.github.io)
> Please add your name to the list (click the pen icon at the top left to edit) if you plan to come, and please remove it if you cannot make it. Feel free to add your preferred tool or programming language.
- Markus (R/tidyverse)
- Florens (Python/R)
Hourly temperature in Würzburg from 1948 to 2021
Data access: https://cdc.dwd.de/portal/202107291811/mapview
A pre-downloaded file will be shared with all participants.
- What kind of information is stored in the table?
- How much data is missing?
- Is the dataset clean or are there any clear outliers?
- What (and when) was the hottest/coldest temperature ever measured in Würzburg?
- What was the hottest/coldest day/week/month/year (sliding window)?
- What was the most extreme temperature difference within 24 hours?
- Is there a long term trend in mean temperature?
- Is there a difference in temperature per month (shifting seasons)?
- **Add your own questions**
- Further Ideas
- How well can we predict the temperature of the next day/week/month?
- Include precipitation or more weather stations from Germany/Europe/World → maybe another time
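
A few of the questions above (missing data, hottest/coldest reading, largest swing within 24 hours) can be approached with a short script. The following is a minimal sketch in plain Python, not a reference solution. The readings are made up, all function names are hypothetical, and treating `-999` as the missing-value marker is an assumption about the exported file that should be checked against the real data.

```python
from datetime import datetime, timedelta

# Assumption: missing values are encoded with this sentinel; verify against the real file.
MISSING = -999.0


def clean(readings):
    """Drop readings whose value is None or the MISSING sentinel."""
    return [(t, v) for t, v in readings if v is not None and v != MISSING]


def missing_share(readings):
    """Fraction of readings without a usable temperature."""
    return 1 - len(clean(readings)) / len(readings)


def extremes(readings):
    """Return the coldest and hottest (timestamp, temperature) pairs."""
    valid = clean(readings)
    coldest = min(valid, key=lambda r: r[1])
    hottest = max(valid, key=lambda r: r[1])
    return coldest, hottest


def max_swing(readings, window=timedelta(hours=24)):
    """Largest max-minus-min temperature within any time span of at most `window`.

    Expects readings sorted by timestamp.
    """
    valid = clean(readings)
    best = 0.0
    start = 0
    for end in range(len(valid)):
        # Move the left edge until the span fits into the window.
        while valid[end][0] - valid[start][0] > window:
            start += 1
        temps = [v for _, v in valid[start:end + 1]]
        best = max(best, max(temps) - min(temps))
    return best


# Made-up hourly readings in degrees Celsius, for illustration only.
first_hour = datetime(2021, 9, 9, 0)
values = [1.0, 0.5, MISSING, -2.0, 3.0, 9.5, 12.0, 7.0]
readings = [(first_hour + timedelta(hours=i), v) for i, v in enumerate(values)]

coldest, hottest = extremes(readings)
print("share missing:", missing_share(readings))
print("coldest:", coldest)
print("hottest:", hottest)
print("largest swing within 24 h:", max_swing(readings))
```

The sliding window is deliberately simple: it recomputes the minimum and maximum for every window, which is fine for hourly data with about 25 readings per window. For the day/week/month questions the same idea applies with a longer `window` and a mean instead of a range.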
## Collaborative Tools and Workflow
For notebooks (R, Python, Julia, JS, ...) with real-time collaboration, [CoCalc](https://cocalc.com) seems to be the best option right now. It worked well last time, so we'll stick with it for now. You need to register an account there (it is free).
### Other real time collaboration tools
Feel free to add suggestions to this list
- [VS Code](https://code.visualstudio.com/) with [Live Share Extension](https://marketplace.visualstudio.com/items?itemName=MS-vsliveshare.vsliveshare) (very promising, but notebook support is not yet stable), languages: Python, R, Julia, ...
- JupyterLab [real-time collaboration](https://github.com/jupyterlab/jupyterlab/pull/10118) (alpha feature), languages: Python, R, Julia, ...
## Future Suggestions
> Add your suggestions to the list, and add :+1: to the end of any line you are interested in.
### Data Sets
- Weather data throughout Germany over time (incl. temperature, precipitation, ...): https://www.dwd.de/DE/leistungen/cdc_portal/cdc_portal.html
- German [Mikrozensus](https://www.destatis.de/DE/Themen/Gesellschaft-Umwelt/Bevoelkerung/Haushalte-Familien/Methoden/mikrozensus.html)
- Kaggle [Titanic](https://www.kaggle.com/c/titanic) or [Tabular Playground](https://www.kaggle.com/competitions?hostSegmentIdFilter=8) or [Meta Kaggle](https://www.kaggle.com/kaggle/meta-kaggle)
- World Trade Data ([Open Trade Statistics](https://tradestatistics.io))
- [Open Citation Data](http://opencitations.net/download#coci)
### Kinds of Questions
- [Power BI](https://www.microsoft.com/en-US/download/details.aspx?id=58494)
- interactive maps
### Data Sources
> all data types are welcome, including tables, images, videos, sounds, DNA, ...
- [Our World in Data](https://ourworldindata.org/) (R package: [owidR](https://github.com/piersyork/owidR)), [Sustainable Development Goals](https://sdg-tracker.org/)
- Open Data Initiatives ([Würzburg](https://opendata.wuerzburg.de/), [Germany](https://www.govdata.de/), [Statistisches Bundesamt](https://www.destatis.de/), [Europe](https://data.europa.eu/en), [APIs](https://bund.dev/))
- [Awesome Public Datasets](https://github.com/awesomedata/awesome-public-datasets)
- [Kaggle Datasets](https://www.kaggle.com/datasets) or [Competitions](https://kaggle.com/competitions), e.g. [SLICED](https://www.kaggle.com/search?q=Sliced+in%3Acompetitions)
- [tsibbledata](https://tsibbledata.tidyverts.org/reference/index.html): Time Series Datasets
- [R-text-data](https://github.com/EmilHvitfeldt/R-text-data): Text Datasets, ready to use in R
## Cross Links
- [previous pad](https://hackmd.io/AfCChD1jTCiPY4dFztgvNw)
- [next pad](https://hackmd.io/mHP-kaILTUCdLyooCOus6Q)