---
title: hslu dda play3 2022.10.18
tags: presentations
description: View the slide with "Slide Mode".
slideOptions:
  theme: white
---

<!-- .slide: data-background="#000000" -->

<img src="https://i.imgur.com/OSW972P.png" width="100">

<center><pre>
██████╗ ██████╗  █████╗     ██████╗ ██╗      █████╗ ██╗   ██╗
██╔══██╗██╔══██╗██╔══██╗    ██╔══██╗██║     ██╔══██╗╚██╗ ██╔╝
██║  ██║██║  ██║███████║    ██████╔╝██║     ███████║ ╚████╔╝
██║  ██║██║  ██║██╔══██║    ██╔═══╝ ██║     ██╔══██║  ╚██╔╝
██████╔╝██████╔╝██║  ██║    ██║     ███████╗██║  ██║   ██║
╚═════╝ ╚═════╝ ╚═╝  ╚═╝    ╚═╝     ╚══════╝╚═╝  ╚═╝   ╚═╝
</pre></center>

```python
datetime(2022, 10, 18) and what3('abbauen.beraten.offiziell')
```

PART 2 :o: hackmd.io/@oleg/dda-21-play3-5

PART 1 :rabbit: hackmd.io/@oleg/dda-21-play3

----

# PART II

Learning objectives for **self-organised data collection**

- Developing experimental design and visualisation strategies :heavy_check_mark:
- Skills in developing an independent visual repertoire :heavy_check_mark:
- Skills in working with cartographic material :heavy_check_mark:
- Methodical and practical approach :eyes:
- Contextualisation of topics and design approaches :eyes:
- Acquiring skills in visually communicating complex topics and data :dark_sunglasses:

----

### PART II. MAXIMUM
# DATA FREEDOM

- **Tue.** Data collection is expanded, refined, and also put at risk.
- **Wed.** Input LUSTAT + {continue data collection}
- **Thu.** Publication tips: source attribution, sharing and analytics.
- **Fri.** Wrap-up of the data-gathering projects.

----

# Start With A [Question](https://etherpad.wikimedia.org/p/0yU3-XA4XRtKyAtpnygD)

----

```
DD+A 21 PLAY 3 - WTF CSV Question Handout 20.10.2022 11:00

Locations in Luzern, benches, restaurants, activities - what do people most like to do? Favorite restaurant (Vögeligärtli). What do they do in general, how are they active? Street survey at these locations. Swisscom dataset - where do people spend time and where not? Why are they there?
School, infrastructure, pharmacies. GIS Luzern. Migros/Coop shopping centre. Local.ch phone book. How pleasant is it to live here? More data: shopping, playgrounds. Open data and so on.

Where are the neophytes in the city of Luzern? Geodata exists, and I can use it too. Where do the plants come from? Ask Google. Neophyte waste bags pilot project.

Where are the fun birds (Spassvögel)? Popular birdwatching locations. Sources: hiking maps on Swisstopo, Schweizer Familie or other journals.

Where are neophytes found, and which species? Or a different topic. Canton Thurgau location data.

Bats: an important role in the ecosystem; light pollution impairs their flight. Contact bat conservation (Fledermausschutz).

What does Luzern smell like? Smells in Luzern. Restaurants and shops. Qualitative and quantitative. Coordinates.

Why are the drinking water sources where they are? Squares, street data for comparison. Building categories. Water network.

How green is Luzern? Coordinates of city districts. Where is there more green space? Geoportal + maths.

What influence does a smart meter have on a household's energy use? Electricity Map.

At what age do we live in the city, and how socially mixed is it? Neighbourhood data, age groups, coordinates per neighbourhood rather than per person. I have to solve this, not sure how yet. Income, rent levels, building type.

(My sounds) What does one hear in Luzern? Listen at 3 locations with a sensor. Contrast the qualitative (what do we hear) with the loudness. Very manual data collection. Quiet zones: are they really that quiet? There are datasets based on traffic (noise maps).

Are there gaps where it would make sense to place more billboards? Swisscom population density in the Luzern area. Where are the billboards located, and can I link the two? Covered, but the data is very expensive to get.
```

----

## Start with FUN data sources

1. **F**reedom-loving **Open** data
2. **U**nderutilized **Personal** data
3. **N**aturally **Self**-measured data :joy:

----

### Freedom-loving Open data

![](https://i.imgur.com/ZQp5qkN.jpg)

https://opendata.swiss/de/dataset/standorte-und-verfugbarkeit-von-shared-mobility-angeboten

----

### Underutilized Personal data

![](https://opendata.utou.ch/presentations/unibern%202021.3/mygeodaten.jpg)

https://google.com/maps/timeline

----

### Naturally Self-measured data

![](https://bucketeer-036aa605-c047-4623-8610-f1764b90cf98.s3.amazonaws.com/hslu-dda/J5L9T98QVIZ6GNDCLZWJZD0T.jpg)

![](https://i.imgur.com/XpMuKJe.png)

https://dda.schoolofdata.ch/project/81

----

![](https://i.imgur.com/Pg6lph5.jpg)

A schema we discussed together from our experiences in the module:

1. Do we have the data already?
2. Is there open data available?
3. May we scrape a website to get it?
4. Can we make the data ourselves?
5. Should we fake it (for now)?

---

# Data Pyramid

![](https://i.imgur.com/n4TN3zs.png)

When we talk about data, we differentiate between **structured data** (secondary data, in science) and **raw measurements** (primary data). Remixing data typically happens higher up - but you can always start at the bottom and create new data, questioning the status quo. This update of the [DIKW pyramid](https://en.wikipedia.org/wiki/DIKW_Pyramid) may help you as a mental model in understanding where your data starts and ends.

----

## What is, and what causes, data friction?

- Misunderstood data models
- Lack of quality control
- Performance bottlenecks
- Lack of data literacy
- Metadata quality lacking

----

> "Friction stops people doing stuff: stops them creating, sharing, collaborating, and using data - especially amongst more distributed communities. It kills the cycles of find, improve, share that would make for a dynamic, productive and attractive (open) data ecosystem." [citation needed; probably blog.okfn.org]

![](https://frictionlessdata.io/img/software/framework.png)

----

## Frictionless Data principles

1. Narrow focus
2. Build for the web
3. Distributed not centralised
4. Work with existing tools
5. Simplicity (but sufficiency)

_(Written by Rufus Pollock in a [2013 blog post](http://blog.okfn.org/2013/04/24/frictionless-data-making-it-radically-easier-to-get-stuff-done-with-data/) and still relevant today.)_

----

### The problem of making data portable

- Avro (also Parquet, ORC) in Spark and Hadoop - https://avro.apache.org
- Apache Arrow - https://arrow.apache.org/
- HDFS etc. - https://doc.dataiku.com/dss/latest/connecting/index.html
- SQLite - https://www.sqlite.org
- Datasette - https://datasette.io
- Postgres - https://wiki.postgresql.org/wiki/Sample_Databases
- Should you [Put your data in an R package](https://grasshoppermouse.github.io/posts/2017-10-18-put-your-data-in-an-r-package/)? Example: [AustralianPoliticians](https://cran.r-project.org/web/packages/AustralianPoliticians/index.html)

We are not going to fix this in an afternoon ;-)

----

![](https://i.imgur.com/i0zIWT7.png)

:point_up_2: Give SQL - e.g. via [Datasette](https://datasette.io) or [SQLite](https://www.sqlite.org/quickstart.html) - a go some time! It quietly runs a large part of the Internet ...

----

### The problem of consensus around a standard

- JSON-LD - https://json-ld.org
- CSV Schema - http://digital-preservation.github.io/csv-schema
- GeoJSON - https://geojson.org
- GeoPackage - https://www.geopackage.org

APIs and ontologies are typical examples of how consensus can be achieved.

![](https://i.imgur.com/8rFdRtL.jpg)

[Sebastian Mate, 2011](https://www.researchgate.net/publication/268163032_Ein_Semantic-Web-Ansatz_zum_Mapping_klinischer_Metadaten_am_Beispiel_eines_Bioproben-Projektvermittlungs-Portals_fur_das_DPKK_auf_der_Basis_von_i2b2/figures)

----

![](https://i.imgur.com/FCxwKHY.png)

The [JSON-LD Playground](https://json-ld.org/playground/) is a good way to experiment with Linked Data and to understand, interactively and visually, how relationships work in typical scenarios.
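
If you would rather poke at the idea in code before opening the Playground, here is a minimal sketch using only Python's standard library. It is a toy illustration, not a real JSON-LD processor: the record, the `example.org` addresses and the helper `expand_keys` are all made up for this slide, and only the two schema.org terms are real vocabulary. It shows the core trick of Linked Data, which is that the `@context` maps short, friendly keys to globally unique IRIs.

```python
import json

# Hypothetical example record: the person and the example.org IRIs are invented.
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "homepage": {"@id": "http://schema.org/url", "@type": "@id"},
    },
    "@id": "https://example.org/people/ada",
    "name": "Ada Example",
    "homepage": "https://example.org/",
}


def expand_keys(document):
    """Replace each short key by the IRI that the @context assigns to it.

    Toy version: handles only flat documents with string or {"@id": ...} terms.
    """
    context = document["@context"]
    expanded = {}
    for key, value in document.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        term = context[key]
        iri = term["@id"] if isinstance(term, dict) else term
        expanded[iri] = value
    return expanded


print(json.dumps(doc, indent=2))
print(expand_keys(doc))
```

Paste the printed JSON into the Playground to compare its expanded form with what this toy helper produces.
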
For more background, read the [Wikipedia page](https://de.wikipedia.org/wiki/JSON-LD).

----

### The madness of "big data"

[Data lakes: where big businesses dump their excess data, and hackers have a field day](https://theconversation.com/data-lakes-where-big-businesses-dump-their-excess-data-and-hackers-have-a-field-day-123865)

_"Data lakes is where your data sinks, then stinks."_ -- Anonymous

See also: [Hadoop hatred](https://www.chrisstucchio.com/blog/2013/hadoop_hatred.html)

----

# Open Data for Artists

![](https://i.imgur.com/KiKzdOh.png)

_Sketch by Jonas Oesch_

---

![](https://i.imgur.com/N3AO3Re.png)

Our projects are coming together on the [DDA.schoolofdata site](https://dda.schoolofdata.ch/), where we are tracking our progress in collecting, cleaning and applying data using the **School of Data Pipeline**. Every project follows roughly the same seven steps, to help you in your work.

----

![](https://bucketeer-036aa605-c047-4623-8610-f1764b90cf98.s3.amazonaws.com/hslu-dda/HSUX7X5U3MS6NGIKA4GYWNME.png)

See [How to Package your Data](https://dda.schoolofdata.ch/project/8) and try create.frictionlessdata.io (pictured above) to make your first Data Package based on a simple, highly portable CSV UTF-8 file. Then upload it to [dribdat](https://dribdat.cc).

----

![](https://i.imgur.com/uw35S0I.png)

Dribdat has an [API](https://dda.schoolofdata.ch/about/?) with [support](https://dda.schoolofdata/api/event/current/datapackage.json) for importing and exporting open data using Data Packages, which are now elegantly displayed inside of a project. So here you can practice how to effectively combine your visualisation and research outputs.

----

## Data Packages in Action

---

![](https://i.imgur.com/67oMn0N.png)

Upload a [package](https://frictionlessdata.io/introduction/#who-uses-frictionless), set up an automation, pick up a [badge](https://repository.frictionlessdata.io/docs/badges.html), get a [pull request](https://howchoo.com/webdev/how-and-why-to-use-pull-requests-in-github).
This is how many interesting feedback loops and data remixes begin. The charts in the [COVID-19 tracker](https://covid-tracker.frictionlessdata.io/) shown above are Data Packages visualized with Vega and published with [Livemark](https://livemark.frictionlessdata.io/).

See also: :books: [Handbook for Researchers](https://raniere-phd.gitlab.io/frictionless-data-handbook/)

----

![](https://i.imgur.com/pbf5644.jpg)

A Swiss research group (Giuseppe Peronato) is working with Data Packages in energy applications, and contributing to Dataflows. Check out https://www.hotmaps-project.eu/ and play with https://www.hotmaps.eu/map

----

![](https://www.gtn-g.ch/wp-content/uploads/bild6.jpg)

Recently a University of Zürich researcher (Ethan Welty) published a Data Package of [glacial ice](https://gitlab.com/wgms/glathida) that the community is experimenting with: https://observablehq.com/@randomfractals/datapackage-yaml

----

## A moment of Fame for your data

![](https://i.imgur.com/RDAOej8.png)

:trophy: Datasets published on Opendata.swiss get attention from the community and are worked on at hackathons and other events. You are welcome to join the [OpenGLAM Working Group](https://opendata.swiss/de/organization/openglam) at the upcoming [#GLAMhack event](https://hack.glam.opendata.ch/) in Mendrisio and see this in action.

----

![](https://i.imgur.com/NoZw3gE.png)

:trophy: We have prepared lots of Data Packages for hackathons through School of Data workshops, which you can find at [@schoolofdata-ch](https://github.com/schoolofdata-ch/), and there are a few more at the [GitHub Topic](https://github.com/topics/datapackage).
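
To prepare a Data Package of your own along the same lines, you need a `datapackage.json` descriptor next to your CSV. Below is a minimal sketch using only Python's standard library rather than the Frictionless tooling. The bench data is made up, the helpers `guess_type` and `describe` are hypothetical names for this slide, and the type guessing is deliberately naive (integer, number or string only), so treat the result as a starting point to review by hand.

```python
import csv
import io
import json

# Made-up sample data standing in for your own CSV UTF-8 file.
SAMPLE_CSV = "name,lat,lon,seats\nBench A,47.05,8.31,3\nBench B,47.06,8.30,4\n"


def guess_type(values):
    """Guess a Table Schema type from string values: integer, number or string."""
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in values:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"


def describe(csv_text, name, path):
    """Build a minimal descriptor for one CSV resource (assumes at least one data row)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = [
        {"name": column, "type": guess_type([row[column] for row in rows])}
        for column in reader.fieldnames
    ]
    return {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": path,
                "format": "csv",
                "encoding": "utf-8",
                "schema": {"fields": fields},
            }
        ],
    }


descriptor = describe(SAMPLE_CSV, "benches", "benches.csv")
print(json.dumps(descriptor, indent=2))
```

Save the printed JSON as `datapackage.json` beside the CSV, then compare it with what create.frictionlessdata.io generates for the same file.
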
:tada: Find even more examples at https://frictionlessdata.io/adoption/#pilot-collaborations

----

![](https://i.imgur.com/XsAZ0pG.jpg)

There are a lot of places to showcase your own data work, from [DataHub](https://datahub.io/collections) and [GitHub](https://github.com) to [Quilt](https://open.quiltdata.com/) and [data.world](https://data.world/), [Dribbble](https://dribbble.com) to [Behance](https://behance.com) and [various](https://data.stadt-zuerich.ch/showcase) [open data](https://data.bs.ch/pages/reuses/) [portals](https://opendata.swiss/en/showcase). Find an audience that you click with, and participate in the commons. Share the beauty and wonder of data, demand high quality and reproducibility - not just glitzy and glamorous widgets.

----

![](https://staticaltmetric.s3.amazonaws.com/uploads/2015/06/details-1024x365.png)

Understanding "data frictions" allows us to gain more control of our data and measure impact. Tweet, notify, like, retweet. [GitHub](https://www.toolsqa.com/git/git-fork/) forks and stars are similar to the impact ratings in academia, such as [Altmetrics](https://www.altmetric.com/audience/researchers/) pictured above. Projects such as the [Metadata Quality](https://data.europa.eu/mqa/) analysis of [data.europa.eu](https://data.europa.eu) aim to regularly evaluate and score datasets.

----

![](https://i.imgur.com/yyZDdSN.png)

:trophy: Just as the most outstanding designs and most compelling artworks deserve recognition, so do efforts to structure, clean and share data deserve a moment of fame in the community of appreciative users. Soon more data portals will have popularity scores and feedback mechanisms that you can leverage.

----

# Final thoughts

![](https://api.gbif.org/v1/image/unsafe/http:%2F%2Fimages.ctfassets.net%2Fuo17ejk9rkwj%2F7kcarX51BOYKPbEQVTbr5J%2Fbbe62b49ee66e02cbfd54870b6279eb8%2Fwhatisgbif.png)

[GBIF](https://gbif.org) is a massive repository for biodiversity observations and related data.
It is one of the #GLAMhack challenges. It is an inspiring project and a good lead-up to some final discussions in this module.

----

**Hirnschlag 2.0 - death by data**

![](https://i.imgur.com/rttGe5U.jpg)

Information overload affects people working with data quite a lot. It's an occupational hazard. Let's work together to try to overcome this through good habits and "being excellent to each other".

----

**Indischer Ozean, im Überlick**

![](https://i.imgur.com/65IyJ6G.jpg)

Be truthful to yourself - remember that everyone makes mistakes some of the time - and that's okay. Really. Better to confess your lack of knowledge than to assume that people won't notice.

----

**Wir beraten dich 27/4** :point_right: **opendata.ch/help**

![](https://i.imgur.com/OcpHn13.png)

This is one of several public forums where people hang out and answer data wrangling questions. There are much bigger ones too. Let's build a supportive, functioning data community together!

![](https://i.imgur.com/T1AqRNm.jpg)

----

**Don't worry, raw data now** `^_^`

![](https://i.imgur.com/pV0SmSR.jpg)

Just behind the curtain of the websites of the world are worlds of data to explore. There is no harm in just looking around. And, in fact, quite a lot of benefit to society in asking some critical questions about the security or ethics of popular online destinations.

----

**Kunst als Spiegel der Wissenschaft**

![](https://i.imgur.com/lSqJL3Q.jpg)

Think of your role as that of a mirror. Observing, emulating, reproducing, questioning, expanding, provoking, evolving. Data art can be a vital instrument in addressing the [Replication crisis](https://en.wikipedia.org/wiki/Replication_crisis) that is evident when you just poke around the [public data](https://www.acsh.org/news/2021/03/31/reproducibility-crisis-15446).

----

**Trust your instincts. Work out your role. Level up!**

Social (engineering) skills will be just as important as technical ones to survive and thrive in the data jungle.
![](https://www.digitec.ch/im/Files/6/6/5/7/4/6/0/1/Digimon-Survive-Test-04-pc-games.jpg?impolicy=PictureComponent&resizeWidth=800&resizeHeight=450)

[Digimon Survive](https://www.digitec.ch/en/page/digimon-survive-in-review-death-survival-monsters-and-sooooo-much-to-read-24493) (Digitec)

----

**Give a wo/man a fish, and they can be fed for a day. Teach a wo/man to fish, and they can be fed for a lifetime.**

I hope that this module has given you insight both into the field of data collection and into how to talk to and work with the people that "live the data life".

![](https://cdn.donmai.us/sample/b5/82/__otamamon_and_divemon_digimon_drawn_by_p_k_ru__sample-b582b9d7cf362adf5334eb3745574681.jpg)

[Divémon drawn by p!k@ru](https://danbooru.donmai.us/posts/5140171?q=divemon)

---

*"I am interested in mathematics only as a creative art."*

<small>-- [G.H.Hardy, 1940](https://www.math.ualberta.ca/mss/misc/A%20Mathematician%27s%20Apology.pdf)</small>

---

# Cheers! :sailboat:

### dda.schoolofdata.ch/user/loleg

## oleg@opendata.ch

<small>This presentation is shared under [CC BY 4.0](http://creativecommons.org/licenses/by/4.0/)</small>