
Use git(hub) as overpass cache to serve umap data

motivation

  • not every map project requires real-time data (a daily or weekly sync is sufficient)
  • minimize overpass load by serving the data from elsewhere
  • a public git repo brings the benefits of collaboration and a granular history of the dataset

github

Git scraping is a technique I learned about some months ago; you can read more about it at https://simonwillison.net/2020/Oct/9/git-scraping/.

In general this can be done with any CI system, such as (self-hosted) GitLab or GitHub, but keep in mind that there might be some restrictions, e.g. GitHub limits the maximum file size to 100MB.
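To catch the size restriction before a push fails, a small guard can be run on the downloaded file. This is only a sketch: the function name check_size is made up for illustration, and the limit of 100 MiB is my assumption about how GitHub counts the 100MB, so adjust it for your host.

```shell
#!/bin/bash
# Assumption: the host rejects files above 100 MiB; change this for other hosts.
MAX_BYTES=$((100 * 1024 * 1024))

# check_size FILE [LIMIT_BYTES]
# Succeeds only if FILE exists and is not larger than the limit.
check_size() {
  local file="$1" limit="${2:-$MAX_BYTES}" size
  [ -f "$file" ] || { echo "missing file: $file" >&2; return 1; }
  # Arithmetic expansion strips the padding some wc implementations print.
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -gt "$limit" ]; then
    echo "$file is $size bytes, above the limit of $limit bytes" >&2
    return 1
  fi
}
```

Calling check_size on the data file before the commit step makes the job fail early with a readable message instead of at push time.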

First we need to prepare an overpass query that works with umap; I have described it in this article.

Once we have the query, a simple shell script using wget or curl can fetch the result. Let's call this script umap.sh:

wget -O result.json 'https://overpass-api.de/api/interpreter?data=<our_query>'
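A slightly more defensive variant of this script could look like the sketch below. It assumes the query is stored in a separate file (query.overpassql is a made-up name, as is the function fetch_overpass) and uses curl to POST it, so the query does not have to be URL-encoded by hand. The old result is only replaced when the request succeeded and returned something.

```shell
#!/bin/bash
# Endpoint taken from the wget call above.
OVERPASS_URL="https://overpass-api.de/api/interpreter"

# fetch_overpass QUERY_FILE OUT_FILE
# POSTs the query as the form field "data" and replaces OUT_FILE only
# when the request succeeded, so a failed run leaves the old data alone.
fetch_overpass() {
  local query_file="$1" out_file="$2" tmp_file
  tmp_file="$(mktemp)" || return 1
  if curl --fail --silent --show-error --max-time 300 \
       --data-urlencode "data@${query_file}" \
       --output "$tmp_file" "$OVERPASS_URL" && [ -s "$tmp_file" ]; then
    mv "$tmp_file" "$out_file"
  else
    rm -f "$tmp_file"
    return 1
  fi
}

# Example call (hypothetical file names):
# fetch_overpass query.overpassql result.json
```

Because the function returns a non-zero status on HTTP errors such as rate limiting, the CI job fails visibly instead of committing an error page as data.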

But how do we execute such a script? GitHub provides so-called "Actions", which are triggered when certain conditions are met, e.g. on a new commit or regularly by cron.

Let's create a "New workflow".

A workflow defines these conditions. The example below runs a script called umap.sh on every commit, on a manual trigger (workflow_dispatch), and daily at 4:56 in the morning (GitHub evaluates cron schedules in UTC).

After the script has run, the workflow commits the files to the git repo, but only if they have changed compared to what is already in the repo.

name: Scrape latest data

on:
  push:
  workflow_dispatch:
  schedule:
    - cron:  '56 4 * * *'

jobs:
  scheduled:
    runs-on: ubuntu-latest
    steps:
    - name: Check out this repo
      uses: actions/checkout@v2
    - name: Get the data and analyze
      run: |
        chmod +x ./umap.sh
        ./umap.sh
      shell: bash
    - name: Commit and push if it changed
      run: |-
        git config user.name "Automated"
        git config user.email "actions@users.noreply.github.com"
        git add -A
        timestamp=$(date -u)
        git commit -m "Latest data: ${timestamp}" || exit 0
        git push
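Note that `git commit ... || exit 0` ends the step successfully on any commit failure, not only when there is nothing to commit. An alternative, sketched here with a made-up function name commit_if_changed, is to ask git explicitly whether anything is staged and let real errors surface.

```shell
#!/bin/bash
# commit_if_changed MESSAGE
# Stages everything and commits only when the index differs from HEAD.
# Returns 0 both after a commit and when there was nothing to commit;
# a failing "git commit" is passed through as a non-zero status.
commit_if_changed() {
  local message="$1"
  git add -A || return 1
  if git diff --cached --quiet; then
    echo "No changes, nothing to commit"
    return 0
  fi
  git commit --quiet -m "$message"
}
```

In the workflow this would replace the `git add` and `git commit` lines, for example as `commit_if_changed "Latest data: $(date -u)"`, followed by `git push`.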

examples

Here are a couple of examples. The first one shows the status of recycling in the Czech Republic; it focuses on recycling items with incomplete data.

repo - https://github.com/mahdi1234/OSM_CZ_recycling
umap result - https://umap.openstreetmap.fr/en/map/odpad_bez_urceni_cr_553696

A different project, by a friend of mine, focuses on vegan/vegetarian/bulk-purchase places in the South Moravian Region (Jihomoravský kraj).

repo - https://github.com/befeleme/vegan_JMK
umap - https://umap.openstreetmap.fr/en/map/vege-jmk_557579

generating GPX

For my current local project I wanted GitHub to generate gpx files, but I had a tough time until I realized that overpass doesn't produce geojson, but "just json".

I decided to switch to xml for this project instead, but according to the comment section it should be possible to transform the json to geojson as well; see https://github.com/ThomasG77/demo-parks-metropole-nantes/blob/main/umap.sh, in particular osmtogeojson result.json >| result.geojson
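I have not used the geojson route myself, so the following is only a sketch of how such a conversion step might be wrapped. It assumes the osmtogeojson command line tool is installed on the runner (for example via "npm install -g osmtogeojson"); the function name to_geojson is made up.

```shell
#!/bin/bash
# to_geojson IN_FILE OUT_FILE
# Converts Overpass output to GeoJSON with the osmtogeojson CLI
# (assumed to be installed, e.g. via "npm install -g osmtogeojson").
to_geojson() {
  local in_file="$1" out_file="$2"
  command -v osmtogeojson >/dev/null || { echo "osmtogeojson not found" >&2; return 1; }
  # Write to a temporary file first so a failed conversion
  # does not truncate an existing OUT_FILE.
  if osmtogeojson "$in_file" > "${out_file}.tmp"; then
    mv "${out_file}.tmp" "$out_file"
  else
    rm -f "${out_file}.tmp"
    return 1
  fi
}

# Example call, using the file names from the linked script:
# to_geojson result.json result.geojson
```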

Once we have the xml from overpass, a conversion tool is needed. I chose gpsbabel - https://www.gpsbabel.org/

First install it as part of the workflow - https://github.com/mahdi1234/OSM_CZ_phonebooths/blob/main/.github/workflows/scrape.yml

- name: Install gpsbabel
  run: sudo apt-get update && sudo apt-get install -y gpsbabel

and then convert into gpx

- name: Convert to gpx
  run: |
    chmod +x ./gpx_convert.sh
    ./gpx_convert.sh
  shell: bash 

where https://github.com/mahdi1234/OSM_CZ_phonebooths/blob/main/gpx_convert.sh is a simple gpsbabel call:

#!/bin/bash

gpsbabel -i osm -f active_phone_booths.xml -o gpx -F active_phone_booths.gpx
gpsbabel -i osm -f disused_phone_booths.xml -o gpx -F disused_phone_booths.gpx
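If more layers are added later, the two lines can be generalized into a loop. This is a sketch only; the function name convert_all_to_gpx is made up, and it assumes every .xml file in the directory is overpass output that should be converted.

```shell
#!/bin/bash
# convert_all_to_gpx [DIR]
# Runs the same gpsbabel call as above for every .xml file in DIR
# (default: current directory), writing NAME.gpx next to NAME.xml.
convert_all_to_gpx() {
  local dir="${1:-.}" xml
  for xml in "$dir"/*.xml; do
    [ -e "$xml" ] || continue   # the glob matched nothing
    gpsbabel -i osm -f "$xml" -o gpx -F "${xml%.xml}.gpx" || return 1
  done
}

# Example call from the repo root:
# convert_all_to_gpx .
```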

The gpx files can be linked directly from umap for download, as in https://umap.openstreetmap.fr/en/map/telefonni-budky_621957

This is done via the layer properties.
