# Data Dojo Würzburg 6
## November 2021
- **When:** :warning: **Tuesday**, November 9<sup>th</sup>, 2021 at 6:00pm
- **Where:** Zoom
- **Zoom:** *This event has ended*
- **Info:** [DataDojo Website](https://ddojo.github.io/), [Repo](https://github.com/ddojo/ddojo.github.io)
:exclamation: This month's Data Dojo takes place on the second Tuesday, not the second Thursday. :exclamation:
## Participants
> Please add your name to the list (click the pen icon at the top left to edit) if you plan to come, and remove it if you cannot make it. Feel free to add your preferred tool or programming language.
- Markus (julia)
-
## Dataset
[Lemurs](https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-08-24/readme.md)
Question Pool:
- Generic
  - What kind of information is stored in the table(s)?
  - How much data is missing?
  - Is the dataset clean, or are there any clear outliers?
- Specific
  - Which lemur had the most offspring?
  - What is the mean (adult) weight per species?
  - What is the age distribution by species (separately for male/female)?
  - How do the weight curves (weight by age) differ per species?
  - Which combinations of species have hybrids (and which don't)?
  - What is the name of the oldest (living) lemur?
  - **Add your own questions**
- Further Ideas
  - Create a genealogical tree for the largest family of lemurs
  - **Add your own ideas**
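As a possible starting point for the generic question about missing data and the specific question about mean (adult) weight, here is a minimal python sketch that uses only the standard library. It assumes the lemur table has been downloaded as a CSV file, that it uses the column names `taxon`, `weight_g` and `age_at_wt_y`, and that missing values are written as `NA`. These column names, the `NA` marker and the adult age threshold passed as `min_age` are assumptions to check against the dataset's readme, not confirmed details.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Assumed column names and missing-value markers; check them against the
# dataset's readme before running this on the real table.
SPECIES_COL = "taxon"
WEIGHT_COL = "weight_g"
AGE_COL = "age_at_wt_y"
MISSING = ("", "NA")


def is_missing(value):
    """True if a CSV cell is absent, empty or one of the missing markers."""
    return value is None or value.strip() in MISSING


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_fraction(rows, columns):
    """Fraction of rows with a missing value, for each requested column."""
    total = len(rows)
    return {
        col: (sum(is_missing(r.get(col)) for r in rows) / total if total else 0.0)
        for col in columns
    }


def mean_weight_per_species(rows, min_age=None, species_col=SPECIES_COL,
                            weight_col=WEIGHT_COL, age_col=AGE_COL):
    """Mean weight per species, skipping rows without a weight.

    If min_age is given (a hypothetical 'adult' threshold in years), rows
    whose age is missing or below that threshold are skipped as well.
    """
    weights = defaultdict(list)
    for r in rows:
        w = r.get(weight_col)
        if is_missing(w):
            continue
        if min_age is not None:
            age = r.get(age_col)
            if is_missing(age) or float(age) < min_age:
                continue
        weights[r[species_col]].append(float(w))
    return {species: mean(ws) for species, ws in weights.items()}
```

Usage might look like `rows = load_rows(open("lemur_data.csv", encoding="utf-8").read())` (the file name is an assumption), followed by `missing_fraction(rows, rows[0].keys())` and `mean_weight_per_species(rows, min_age=...)`. Reading the CSV with `csv.DictReader` keeps the sketch dependency-free; with pandas, the same questions reduce to `df.isna().mean()` and `df.groupby("taxon")["weight_g"].mean()`.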
## Collaborative Tools and Workflow
For notebooks (R, python, julia, js, ...) with real-time collaboration, [CoCalc](https://cocalc.com) seems to be the best option right now. It worked well the last few times, so we'll stick with it for now. You need to register a (free) account there.
### Other real-time collaboration tools
Feel free to add suggestions to this list.
- [VS Code](https://code.visualstudio.com/) with the [Live Share Extension](https://marketplace.visualstudio.com/items?itemName=MS-vsliveshare.vsliveshare) (very promising, but notebook support is not yet stable), languages: python, R, julia, ...
- JupyterLab [real-time collaboration](https://github.com/jupyterlab/jupyterlab/pull/10118) (alpha feature), languages: python, R, julia, ...
- Observable [multiplayer](https://observablehq.com/@observablehq/introducing-observable-collaboration) (experimental feature), languages: javascript
- [JupyterLite](https://jupyterlite.readthedocs.io/en/latest/): in-browser version of JupyterLab, languages: javascript, (a subset of) python
## Future Suggestions
> Add your suggestions to the list, and add :+1: to the end of any line you are interested in.
### Data Sets
- Results of the [Bundestagswahl 2021](https://www.bundeswahlleiter.de/bundestagswahlen/2021/ergebnisse/opendata.html)
- Weather data throughout Germany over time (incl. temperature, precipitation, ...): https://www.dwd.de/DE/leistungen/cdc_portal/cdc_portal.html
- German [Mikrozensus](https://www.destatis.de/DE/Themen/Gesellschaft-Umwelt/Bevoelkerung/Haushalte-Familien/Methoden/mikrozensus.html)
- Kaggle [Titanic](https://www.kaggle.com/c/titanic) or [Tabular Playground](https://www.kaggle.com/competitions?hostSegmentIdFilter=8) or [Meta Kaggle](https://www.kaggle.com/kaggle/meta-kaggle)
- World Trade Data ([Open Trade Statistics](https://tradestatistics.io))
- [Open Citation Data](http://opencitations.net/download#coci)
- [Top 100 charts + Audio Features](https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-09-14/readme.md)
### Kinds of Questions
-
### Tools/Languages
- R/tidyverse
- python
- [Power BI](https://www.microsoft.com/en-US/download/details.aspx?id=58494)
- [Tableau](https://www.tableau.com)
- [KNIME](https://www.knime.com/)
- javascript
- julia
### Skills
- interactive maps
- dashboards
- animations
### Data Sources
> all data types are welcome, including tables, images, videos, sounds, DNA, ...
- [TidyTuesday](https://github.com/rfordatascience/tidytuesday)
- [Our World in Data](https://ourworldindata.org/) (R package: [owidR](https://github.com/piersyork/owidR)), [Sustainable Development Goals](https://sdg-tracker.org/)
- Open Data Initiatives ([Würzburg](https://opendata.wuerzburg.de/), [Germany](https://www.govdata.de/), [Statistisches Bundesamt](https://www.destatis.de/), [Europe](https://data.europa.eu/en), [APIs](https://bund.dev/))
- [Awesome Public Datasets](https://github.com/awesomedata/awesome-public-datasets)
- [Kaggle Datasets](https://www.kaggle.com/datasets) or [Competitions](https://kaggle.com/competitions), e.g. [SLICED](https://www.kaggle.com/search?q=Sliced+in%3Acompetitions)
- [tsibbledata](https://tsibbledata.tidyverts.org/reference/index.html): Time Series Datasets
- [R-text-data](https://github.com/EmilHvitfeldt/R-text-data): Text Datasets, ready to use in R
- [data.world](https://data.world/)
- [Statista](https://de.statista.com/) - the University of Würzburg has a campus license
- [Open Legal Data](https://de.openlegaldata.io/)
- [Bundestag Data](https://github.com/bundestag) (e.g. poll results, deputies, Wahl-O-Mat, [inspirational blog post](https://jollydata.blog/posts/2021-03-14-bundestag-part-iii/))
- [Deutsche Digitale Bibliothek](https://www.deutsche-digitale-bibliothek.de/newspaper) ([API](https://labs.deutsche-digitale-bibliothek.de/app/ddbapi/), old newspapers from Germany)
- [Earth Observation: Satellite Image Time Series](https://e-sensing.github.io/sitsbook)
- [Machine Learning Datasets](https://paperswithcode.com/datasets)
## Cross Links
- [previous pad](https://hackmd.io/mHP-kaILTUCdLyooCOus6Q)
- [next pad](https://hackmd.io/x7zyXNyaSpOCRz8TGadleg)