<style>
:root {
--r-heading1-size: 2.2em;
}
.container {
scrollbar-width: none;
}
.container::-webkit-scrollbar {
display: none;
}
.reveal h1, .reveal h2 {
color: #ffaf00;
margin-bottom: 0.5em;
text-shadow: 5px 5px #000 !important;
}
.reveal h2 {
margin-top: 0;
}
/*
.pdf-page h1 {
text-shadow: 5px 5px #000 !important;
}
*/
.reveal .progress, .reveal .controls {
color: #ffaf00;
}
.reveal a {
/*color: #87ff00;*/
color: #00afd7;
text-decoration: underline;
}
.reveal a:hover {
color: #00afd7;
text-decoration: none;
}
.reveal img {
display: block;
margin: auto;
max-height: 560px;
position: relative;
top: -1em;
}
.reveal strong {
color: #ffaf00;
}
.reveal .slide-number {
background-color: inherit;
}
.reveal .slide-number a {
text-decoration: none;
}
.reveal .slides h2 ~ p {
text-align: left;
margin-left: 2em;
}
</style>
# Content-addressing: chances for data distribution and verifiable data pipelines
Workshop on Open Geospatial Science and the Decentralized Geospatial Web, Maryland, USA 2024-04-03
---
## Goals for this talk
- Learn what content-addressing is
- Learn why content-addressing is useful and see some of its applications
---
## About me
- Volker Mische (vmx)
- Open source geo things for over 15 years
- Frontend (OpenLayers), later databases (GeoCouch)
- Offline-first (Apache CouchDB)
- Decentralized Web (IPFS, IPLD, libp2p)
---
## Focus today
- Concepts, not specific technology stacks
---
## Content-addressing
---
## Location-addressing
- Where is the data?
- Example: URLs
- Problems:
  - 404 Not found, maybe moved?
  - Modified without noticing
---
## Location vs. content-addressing
- Example: Library
---
## Content-addressing: how
- Hashing (building a checksum)
- `3b6c5275…ffaa2b5a ubuntu-23.10.1-desktop-amd64.iso`
- Data ⟶ pure function ⟶ long number
⇒ Same input ⟶ deterministic output
- Almost certain, unless there's a hash collision
⇒ Different input ⟶ different output
⇒ We can use the hash as an identifier
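
The idea above can be sketched in a few lines of Python. This is only an illustration: it assumes SHA-256 as the hash function (the slide does not name one), and `content_address` is a hypothetical helper name.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


a = content_address(b"hello world")
b = content_address(b"hello world")
c = content_address(b"hello world!")

print(a == b)  # same input, same identifier
print(a == c)  # different input, different identifier (barring a collision)
```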
---
## Content-addressing
- Which data?
- Where the data is doesn't matter
- If it was moved, you can still find it
- You can verify that it's the data you requested
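
A minimal sketch of these properties, assuming SHA-256 as the address and a toy in-memory store (`ContentStore` is a hypothetical class, not part of any real library): the same data gets the same address in every store, and every read is checked against the address that was requested.

```python
import hashlib


class ContentStore:
    """Toy in-memory store keyed by the SHA-256 hash of the content."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store the data and return its content address."""
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Return the data, but only if it still matches its address."""
        data = self._blobs[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match the requested address")
        return data


# Two independent "locations" holding the same data
store_a = ContentStore()
store_b = ContentStore()
address_a = store_a.put(b"some dataset")
address_b = store_b.put(b"some dataset")

print(address_a == address_b)                 # the address does not depend on the location
print(store_b.get(address_a) == b"some dataset")
```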
---
## Peer-to-peer
- Common misconceptions:
  - Only for publicly accessible data
  - Piracy vibes (Napster, Kazaa, BitTorrent)
- Multiple equal servers (nodes):
  - No single primary (or only a few)
  ⇒ No single point of failure
- Content-addressing:
  - Not where, but which data
  - Data locality (e.g. within your local network)
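
A rough sketch of "not where, but which data": ask any number of nodes, nearest first, and accept the first answer whose hash matches. Everything here is an assumption for illustration; peers are modelled as plain Python functions, SHA-256 is the address, and no real networking or peer-to-peer library is involved.

```python
import hashlib
from typing import Callable, Iterable, Optional

# A "peer" is modelled as a function that returns bytes for an address, or None.
Peer = Callable[[str], Optional[bytes]]


def fetch(address: str, peers: Iterable[Peer]) -> bytes:
    """Ask peers in order and return the first response whose hash matches."""
    for peer in peers:
        data = peer(address)
        if data is not None and hashlib.sha256(data).hexdigest() == address:
            return data
    raise LookupError("no peer returned matching data")


wanted = b"sentinel tile"
wanted_address = hashlib.sha256(wanted).hexdigest()


def local_peer(address: str) -> Optional[bytes]:
    return None  # nearby node that does not have the data


def bad_peer(address: str) -> Optional[bytes]:
    return b"something else"  # node that returns wrong data


def remote_peer(address: str) -> Optional[bytes]:
    return wanted if address == wanted_address else None


print(fetch(wanted_address, [local_peer, bad_peer, remote_peer]) == wanted)
```

Because every response is verified, it does not matter which node answers or whether that node is trusted.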
---
## Peer-to-peer: data distribution
- Possibly private network
- Knowing the number of copies
- Possibly expanding to open network
Note:
- Knowing the number of copies:
  - Tell the story about the DLR (the German delegation of the ESA that stores a subset of the Sentinel data)
- The open network could be
  - Universities (like hosting a Debian mirror)
  - Research institutions, which may host a subset that matches their research area, as they want a local copy anyway
---
## Content-addressing: verifiable data
- Example: satellite imagery (e.g. Copernicus Sentinel 2 mission)
- Cloud providers derive Level-2 from Level-1 data independently
- Different outputs from the same data
- Sometimes unclear which parameters (e.g. for atmospheric correction) were used
- Worst: how do you know the result is correct and the input data wasn't tampered with?
Note:
- Tampering could be things like:
  - "Reduce" deforestation
  - Make droughts look less bad
---
## Content-addressing: verifiable data (ideal)
- Open source workflow
- Content addresses of the (intermediate) results are published
⇒ You can verify the whole pipeline yourself
- Third parties can provide data and you don't need to trust them
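
As a sketch of what such verification could look like: re-run the open workflow yourself and compare the content addresses of every intermediate result with the published ones. The processing steps `calibrate` and `correct` are hypothetical placeholders for a real, deterministic workflow, and SHA-256 is assumed as the address.

```python
import hashlib


def address(data: bytes) -> str:
    """Content address of the data (SHA-256, hex encoded)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical deterministic steps standing in for a real processing workflow
def calibrate(raw: bytes) -> bytes:
    return raw.upper()


def correct(calibrated: bytes) -> bytes:
    return calibrated[::-1]


PIPELINE = [calibrate, correct]


def run_pipeline(raw: bytes) -> list[str]:
    """Run every step; return the addresses of the input and of each result."""
    addresses = [address(raw)]
    data = raw
    for step in PIPELINE:
        data = step(data)
        addresses.append(address(data))
    return addresses


published = run_pipeline(b"level-1 scene")            # what a provider would publish
recomputed = run_pipeline(b"level-1 scene")           # what you compute yourself
tampered = run_pipeline(b"level-1 scene (edited)")    # input that was modified

print(published == recomputed)  # the whole pipeline checks out
print(published == tampered)    # a modified input is detected
```

This only works if each step is deterministic, so that the same input always produces byte-identical output.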
---
## Another application: offline/local-first
- Scale down the peer-to-peer stuff
- Make it work locally on devices
e.g. on your mobile in your browser
---
## In-browser data replication
- Cross platform without app stores
- Works without any server infrastructure, only networking is needed (e.g. Wi-Fi, GSM)
- Prototype based on libp2p/IPFS exists:
https://github.com/vmx/colleemap
- Use case: disaster response
Note:
- You could think of it also as data distribution at a small scale
---
The end.