---
tags: sensors, eit
---

# Someone collects the data and I use it

![](https://i.imgur.com/85J1V6O.png)

## Different types of data?

**Open data**
Data that anyone can access, use, and share, with full permission to use it any way they like.

**Shared data**
Data that is shared with a specific group of people for a specific purpose.

**Closed data**
Data that can only be accessed by those who collected it or are accountable for it.

:::info
Another way of putting it

> According to the Open Data Institute, “**Open data is data that anyone can access, use or share. Simple as that. When big companies or governments release non-personal data, it enables small businesses, citizens and medical researchers to develop resources which make crucial improvements to their communities.**”
:::

### Different formats

**Human readable format**

![](https://i.imgur.com/mLCXN2Y.png)

**Machine readable format**

![](https://i.imgur.com/SFW1Ue8.png)

:::info
https://www.data.govt.nz/toolkit/open-data/formats-for-open-data-machine-readable-and-human-readable/
:::

## Accessing internet data

As we have seen, we can find a lot of information on the internet. However, that information can be very scattered and disorganised. There are techniques that allow us to collect data from websites:

- Web scraping (writing a script that collects data from websites; not always legal)
- API (the legal way)

:::warning
Sometimes websites are not very happy when they are scraped. For instance, [IMDB](https://www.imdb.com/conditions) says in their terms and conditions:

- **Robots and Screen Scraping**: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.
:::

For this reason, other means of interacting with online content are provided in the form of an API:

- A **Web API** is an [application programming interface](https://en.wikipedia.org/wiki/Application_programming_interface) for either a web server or a web browser. It is a web development concept, usually limited to a web application's client-side (including any web frameworks being used), and thus usually does not include web server or browser implementation details such as SAPIs or APIs unless publicly accessible by a remote web application.

We can connect to an API directly through its endpoints:

- **Endpoints** are important aspects of interacting with server-side web APIs, as they specify where the resources that third-party software can access are located. Access is usually via a URI to which HTTP requests are sent, and from which the response is expected.

An example of an open API is the [SmartCitizen API](https://api.smartcitizen.me/v0/devices/5452/):

![](https://i.imgur.com/aZaFAiZ.png)

:::info
**Machine readable format**

The data is generally available in [JSON format](https://json.org/). JSON packs data as key-value pairs between `{}`, with lists between `[]`:

```
{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}
```
:::

:::info
A request to an API follows this format:

- Base URL: `http://www.omdbapi.com/`
- Query: `?` + `parameter` + `=` + `value`. The available parameters are listed in the [API documentation](http://www.omdbapi.com/). Several parameters can be separated by `&`.

An example: `http://www.omdbapi.com/?s=jose&plot=full&apikey=2a31115`

![](https://i.imgur.com/3Fk2sk1.png)
:::

:::warning
Music Brainz: https://musicbrainz.org/doc/MusicBrainz_API
:::

### Examples

:::info
Some use cases here: https://opendatahandbook.org/value-stories/en/
:::

- [Open Movie DB](http://www.omdbapi.com)
- [Open Data Barcelona](https://opendata-ajuntament.barcelona.cat/)
- Environmental Data APIs
    - [Smart Citizen API](https://api.smartcitizen.me)
    - [MINKA](https://minka-sdg.org/)
    - [Ictio](https://ictio.cat/)
    - [Natusfera](https://spain.inaturalist.org/users/sign_in)
    - [OdourCollect](https://odourcollect.eu/)
    - AireCiudadano
- [Text analysis](https://orange3-text.readthedocs.io/en/latest/index.html)
    - [Twitter](https://developer.twitter.com/en/docs/twitter-api)
    - [Wikipedia](https://www.mediawiki.org/wiki/API:Tutorial)
    - [The Guardian](https://open-platform.theguardian.com/explore/)
    - [NYT](https://developer.nytimes.com/)
- Health
    - [Covid](https://github.com/CSSEGISandData/COVID-19)
    - [PubMed](https://pubmed.ncbi.nlm.nih.gov/)
- [Socioeconomic Data](https://github.com/biolab/orange3-world-happiness)
    - https://worldhappiness.report/
    - https://data.worldbank.org/
    - https://stats.oecd.org/

### Making use of it

![](https://i.imgur.com/EtsNirn.png)

https://orangedatamining.com/

:::info
**Setup**
https://hackmd.io/LIpX3s4aT4WsloLqqC_7ZQ

**Basic example**
https://hackmd.io/4_4zeo3QQ6C9VEbhqSYddQ
:::
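
The request format described under "Accessing internet data" (base URL, then `?`, then `parameter=value` pairs joined by `&`) can also be put together in a few lines of Python. This is only a minimal sketch using the standard library: the helper names `build_query_url` and `fetch_json` are made up for illustration and are not part of any API mentioned in these notes, and the parameters are the ones from the OMDb example URL.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, params):
    """Join a base URL and a dict of parameters as base?name=value&name=value."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """Send an HTTP GET request and decode the JSON body into Python objects."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Rebuild the OMDb example URL from its parts.
# Replace the apikey value with your own key before sending real requests.
omdb_url = build_query_url(
    "http://www.omdbapi.com/",
    {"s": "jose", "plot": "full", "apikey": "2a31115"},
)
print(omdb_url)

# JSON text becomes nested dicts and lists in Python.
# This sample is a shortened version of the glossary example.
sample = '{"glossary": {"title": "example glossary", "GlossDiv": {"title": "S"}}}'
parsed = json.loads(sample)
print(parsed["glossary"]["GlossDiv"]["title"])
```

Nothing is sent over the network until `fetch_json(omdb_url)` is called, which needs an internet connection and a valid API key. Assuming the SmartCitizen endpoint linked earlier stays publicly accessible, its URL could be passed to `fetch_json` in the same way.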