<style> :root { --r-heading1-size: 2.2em; } .container { scrollbar-width: none; } .container::-webkit-scrollbar { display: none; } .reveal h1, .reveal h2 { color: #ffaf00; margin-bottom: 0.5em; text-shadow: 5px 5px #000 !important; } .reveal h2 { margin-top: 0; } /* .pdf-page h1 { text-shadow: 5px 5px #000 !important; } */ .reveal .progress, .reveal .controls { color: #ffaf00; } .reveal a { /*color: #87ff00;*/ color: #00afd7; text-decoration: underline; } .reveal a:hover { color: #00afd7; text-decoration: none; } .reveal img { display: block; margin: auto; max-height: 560px; position: relative; top: -1em; } .reveal strong { color: #ffaf00; } .reveal .slide-number { background-color: inherit; } .reveal .slide-number a { text-decoration: none; } .reveal .slides h2 ~ p { text-align: left; margin-left: 2em; } </style>

# Content-addressing: chances for data distribution and verifiable data pipelines

Workshop on Open Geospatial Science and the Decentralized Geospatial Web, Maryland, USA

2024-04-03

---

## Goals for this talk

- Learn what content-addressing is
- Understand why content-addressing is useful and see some of its applications

---

## About me

- Volker Mische (vmx)
- Open source geo things for over 15 years
- Frontend (OpenLayers), later databases (GeoCouch)
- Offline-first (Apache CouchDB)
- Decentralized Web (IPFS, IPLD, libp2p)

---

## Focus today

- Concepts, not specific technology stacks

---

## Content-addressing

---

## Location-addressing

- Where is the data?
- Example: URLs
- Problems:
  - 404 Not found, maybe moved?
  - Modified without noticing

---

## Location vs. content-addressing

- Example: Library

---

## Content-addressing: how

- Hashing (building a checksum)
  - `3b6c5275…ffaa2b5a ubuntu-23.10.1-desktop-amd64.iso`
- Data ⟶ pure function ⟶ long number

⇒ Same input ⟶ deterministic output

- Almost certain, unless there's a hash collision

⇒ Different input ⟶ different output

⇒ We can use the hash as an identifier

---

## Content-addressing

- Which data?
- Where the data is doesn't matter
- If it was moved, you can still find it
- You can verify that it's the data you requested

---

## Peer-to-peer

- Common misconceptions:
  - Only publicly accessible data
  - Piracy vibes (Napster, Kazaa, BitTorrent)
- Multiple equal servers (nodes):
  - No single primary (or only a few) ⇒ no single point of failure
- Content-addressing:
  - Not where, but which data
  - Data locality (e.g. within your local network)

---

## Peer-to-peer: data distribution

- Possibly private network
- Knowing the number of copies
- Possibly expanding to open network

Note:
- Knowing the number of copies:
  - Tell story about the DLR (the German delegation of the ESA that stores a subset of the Sentinel data)
- The open network could be
  - Universities (like hosting a Debian mirror)
  - Research institutions may host a subset that matches their research area as they want a local copy anyway

---

## Content-addressing: verifiable data

- Example: satellite imagery (e.g. Copernicus Sentinel-2 mission)
- Cloud providers derive Level-2 from Level-1 data independently
- Different outputs from the same data
- Sometimes unclear which parameters (e.g. for atmospheric correction) were used
- Worst: how do you know the result is correct and that the input data wasn't tampered with?

Note:
- Tampering could be things like:
  - "Reduce" deforestation
  - Make droughts look less bad

---

## Content-addressing: verifiable data (ideal)

- Open source workflow
- Content addresses of the (intermediate) results are published

⇒ You can verify the whole pipeline yourself

- Third parties can provide data and you don't need to trust them

---

## Another application: offline/local-first

- Scale down the peer-to-peer stuff
- Make it work locally on devices, e.g. on your mobile in your browser

---

## In-browser data replication

- Cross-platform without app stores
- Works without any server infrastructure, only networking is needed (e.g. Wi-Fi, GSM)
- Prototype based on libp2p/IPFS exists: https://github.com/vmx/colleemap
- Use case: disaster response

Note:
- You could think of it also as data distribution at a small scale

---

The end.
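
---

## Appendix: hashing as an identifier (sketch)

The "Content-addressing: how" slide describes hashing data and using the result as an identifier. A minimal sketch of that idea follows. It assumes SHA-256 as the hash function (the talk does not name one), and `content_address` and `ContentStore` are hypothetical names made up for illustration. This is not the address format used by IPFS or any other specific system.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string (assumed hash function)."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy in-memory store (hypothetical) that keys every blob by its own hash."""

    def __init__(self) -> None:
        self._blobs = {}  # maps address (hex string) -> bytes

    def put(self, data: bytes) -> str:
        address = content_address(data)
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Verify on read: the party serving the bytes does not need to be trusted.
        if content_address(data) != address:
            raise ValueError("data does not match the requested address")
        return data


store = ContentStore()
address = store.put(b"Level-1 tile")

# Same input gives the same address, wherever it is computed.
same_address = content_address(b"Level-1 tile")

# A modified input gives a different address (barring a hash collision).
other_address = content_address(b"Level-1 tile (edited)")
```

Because `get` recomputes the hash before returning, modified data is detected no matter which node supplied it. That is the property the verifiable pipeline slides rely on.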
{"description":"Presentation for the global FOSS4G 2023 in Prizren, Kosovo 2023-06-28","slideOptions":"{\"width\":1280,\"theme\":\"blood\"}","contributors":"[{\"id\":\"03083f6c-6dbb-4064-817b-c45d87e7c765\",\"add\":9999,\"del\":5257}]","title":"Content-addressing: chances for data distribution and verifiable data pipelines"}