
Content-Addressable Web (CAW)

This draft specification proposes a format for Web Tiles. Informally, it can be thought of as the wild love child of the Web Application Manifest and the Content Claims Protocol. Manifests provide useful metadata for a bundle of content, and content claims make it much easier to provide content that already exists on HTTP in an IPFS-friendly way (i.e. using CIDs that can be further supported by verifiable claims).

Note that:

  • Tiles' metadata needs differ from those of traditional web apps, so the manifest's content differs.
  • I am not 100% certain (yet) that content claims are an exact match for the needs of this format; this is exploratory (but promising).

What the tiles and wishes system needs from this format is:

  • A way of capturing a complete list of content-addressable resources that constitutes the entirety of what that tile is allowed to load from the network.
  • The ability to assign reliable content types to that content.
  • Metadata to render an inactive tile (as part of a feed).
  • Metadata detailing which wishes are available in a given tile.
  • Potentially, verifiable claims about the publisher of a tile.

Example

Say Robin wants to publish The Internet Transition as a tile on social. That tile is composed of the following resources:

  • /internet-transition/: a text/html resource that is the root page.
  • A text/css resource, /css/berjon.min.css.
  • Two image/png resources, /icon.png and /internet-transition/trilo.png.
  • Two font/ttf resources for Mulish and Catamaran.

The full CAW describing that tile would be:

{
  "name": "The Internet Transition",
  "lang": "en",
  "icons": [{
    "src": "/icon.png",
    "sizes": "32x32"
  }],
  "banners": [{
    "src": "/internet-transition/trilo.png",
    "sizes": "880x300"
  }],
  "description": "Blah dee blah…",
  "start_url": "/internet-transition/",
  "content": {
    "/internet-transition/": {
      "type": "text/html",
      "claim": {
        "op": "assert/location",
        "rsc": "https://tilesr.us",
        "input": {
          "content": "bafy…",
          "location": "https://berjon.com/internet-transition/"
        }
      }
    },
    "/css/berjon.min.css": {
      "type": "text/css",
      "claim": {
        "op": "assert/location",
        "rsc": "https://tilesr.us",
        "input": {
          "content": "bafy…",
          "location": "https://berjon.com/css/berjon.min.css"
        }
      }
    },
    "/icon.png": {
      "type": "image/png",
      "claim": {
        "op": "assert/location",
        "rsc": "https://tilesr.us",
        "input": {
          "content": "bafy…",
          "location": "https://berjon.com/icon.png"
        }
      }
    },
    "/internet-transition/trilo.png": {
      "type": "image/png",
      "claim": {
        "op": "assert/location",
        "rsc": "https://tilesr.us",
        "input": {
          "content": "bafy…",
          "location": "https://berjon.com/internet-transition/trilo.png"
        }
      }
    },
    "/fonts/Mulish-VariableFont_wght.ttf": {
      "type": "font/ttf",
      "claim": {
        "op": "assert/location",
        "rsc": "https://tilesr.us",
        "input": {
          "content": "bafy…",
          "location": "https://berjon.com/fonts/Mulish-VariableFont_wght.ttf"
        }
      }
    },
    "/fonts/Catamaran-VariableFont_wght.ttf": {
      "type": "font/ttf",
      "claim": {
        "op": "assert/location",
        "rsc": "https://tilesr.us",
        "input": {
          "content": "bafy…",
          "location": "https://berjon.com/fonts/Catamaran-VariableFont_wght.ttf"
        }
      }
    }
  }
}
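For readers who think in types, the shape of the example can be sketched in TypeScript. This is inferred from the single example above and is not a normative schema: which fields are optional, the requirement that keys be absolute paths, and the hypothetical `isCaw` check are all assumptions.

```typescript
// Shapes inferred from the example CAW; optionality is an assumption, not part of the draft.
interface LocationClaim {
  op: string; // "assert/location" in the example
  rsc: string; // resource the invocation targets, e.g. an indexer
  input: { content: string; location: string }; // CID plus the HTTP URL it came from
}

interface ContentEntry {
  type: string; // media type: native and required
  claim: LocationClaim;
}

interface ImageResource {
  src: string;
  sizes?: string;
}

interface Caw {
  name?: string;
  lang?: string;
  description?: string;
  icons?: ImageResource[];
  banners?: ImageResource[];
  start_url?: string;
  content: Record<string, ContentEntry>; // keys are absolute paths (assumption)
}

// Hypothetical structural check: every content entry needs an absolute path key,
// a media type, and a claim carrying both a CID and a location.
function isCaw(x: unknown): x is Caw {
  if (typeof x !== "object" || x === null) return false;
  const content = (x as { content?: unknown }).content;
  if (typeof content !== "object" || content === null || Array.isArray(content)) return false;
  return Object.entries(content).every(([path, value]) => {
    const entry = value as Partial<ContentEntry> | null;
    return (
      path.startsWith("/") &&
      entry !== null &&
      typeof entry === "object" &&
      typeof entry.type === "string" &&
      entry.type.includes("/") &&
      typeof entry.claim?.input?.content === "string" &&
      typeof entry.claim?.input?.location === "string"
    );
  });
}
```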

Some notes:

  • This is quite long and repetitive; we might be able to normalize the claims to some degree. Maybe there is no value in independent claims and the entire CAW could be a UCAN invocation. (Because UCAN CAW.)
  • We are operating at the URL level and not mapping onto dubious constructs like directories, which in turn require the ability to specify an index page and other such tricks. This gives us a much more web-native space of URLs to work from.
  • Content types are native and required. You can't build a robust web system without them.
  • Because the original page both has its own path (/internet-transition/) and relies on shared resources (all the CSS, fonts, etc.), we anchor the paths at the root of the origin and provide a start_url field that points to the right starting URL (it would otherwise default to /). This means that relative links will resolve correctly. Note that there is no requirement that the root of the content map correspond to the root of any origin. It is only done that way above so as to easily map to existing web content. Each location could be from a completely different origin; we don't care. Essentially, start_url serves as a redirect so that loading tile://bafydeadbeef/ will go to tile://bafydeadbeef/internet-transition/. All resolution from that can then Just Work™.
  • This example does not show wish or publisher claims.
  • Absolutely no content can be loaded outside of what is listed in the CAW. This means that icons and banners have to point to content entries or they won't load.
  • The resource (rsc) of all the above invocations is a fictitious https://tilesr.us that presumably provides some sort of tiles indexing.
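Since nothing outside the content map may load, a publisher or indexer could check a CAW for metadata that points nowhere before accepting it. A minimal sketch, assuming references are resolved against the tile root (`tile://<CID>/`) and allowed only when they stay on that tile and their path is an exact key in `content`. The function names and the placeholder CID are hypothetical; a placeholder is needed because the real CID derives from the CAW itself and does not exist yet at check time.

```typescript
interface Caw {
  start_url?: string;
  icons?: { src: string }[];
  banners?: { src: string }[];
  content: Record<string, { type: string }>;
}

// True if `ref`, resolved against the tile root, names an entry of this CAW.
// Any other scheme or authority is, by construction, not loadable.
function isListed(caw: Caw, cid: string, ref: string): boolean {
  const root = new URL(`tile://${cid}/`);
  const url = new URL(ref, root);
  return (
    url.protocol === root.protocol &&
    url.host === root.host &&
    Object.prototype.hasOwnProperty.call(caw.content, url.pathname)
  );
}

// Hypothetical placeholder: only paths are compared, so any host works here.
const PLACEHOLDER_CID = "placeholder";

// Hypothetical publish-time check: every metadata reference that would fail to load.
function danglingReferences(caw: Caw): string[] {
  const refs = [
    ...(caw.start_url !== undefined ? [caw.start_url] : []),
    ...(caw.icons ?? []).map((i) => i.src),
    ...(caw.banners ?? []).map((b) => b.src),
  ];
  return refs.filter((ref) => !isListed(caw, PLACEHOLDER_CID, ref));
}
```

With the example CAW this returns an empty list; a banner pointing at an external URL would be reported rather than silently failing in the feed.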

Architecture & Processing

CAW is basically 1) an easy map from HTTP to content-addressable space and 2) an easy way of creating a usable web bundle from content-addressed data without having to worry about weird issues with pathing, directory formats, the absence of media types, etc.
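To make the map from HTTP concrete, here is a hypothetical helper that turns already-fetched HTTP resources into `content` entries shaped like the example. It assumes each resource's CID has been computed elsewhere and that its media type is taken from the response (for instance its `Content-Type` header). The key defaults to the URL's path, as in the example, but can be overridden since nothing requires content paths to mirror an origin.

```typescript
interface HttpResource {
  location: string; // where the bytes were fetched from
  type: string; // media type, e.g. from the Content-Type header (assumption)
  cid: string; // computed elsewhere (assumption)
  path?: string; // key in the content map; defaults to the URL's path
}

interface ContentEntry {
  type: string;
  claim: {
    op: "assert/location";
    rsc: string;
    input: { content: string; location: string };
  };
}

// Hypothetical: builds the `content` map, one location claim per resource, all targeting `rsc`.
function buildContent(resources: HttpResource[], rsc: string): Record<string, ContentEntry> {
  const content: Record<string, ContentEntry> = {};
  for (const r of resources) {
    const key = r.path ?? new URL(r.location).pathname;
    // Resolution is an exact-match lookup, so two resources cannot share a path.
    if (Object.prototype.hasOwnProperty.call(content, key)) {
      throw new Error(`duplicate path ${key}`);
    }
    content[key] = {
      type: r.type,
      claim: { op: "assert/location", rsc, input: { content: r.cid, location: r.location } },
    };
  }
  return content;
}
```

Usage, with placeholder CIDs: `buildContent([{ location: "https://berjon.com/icon.png", type: "image/png", cid: "bafy-icon" }], "https://tilesr.us")` yields the `/icon.png` entry from the example.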

The CAW itself is not some JSON that's just loaded dynamically from an origin (that would defeat much of the point of content addressing, since the JSON could then point to a list of dynamic pieces of content). Rather, it is sent to an indexer for publication and indexing.

One interesting question is whether there would be value in using a system similar to ATProto's Personal Data Server (PDS)/aggregator architecture that could deal with indexing at scale, syncing content to multiple sources (to obfuscate tracking) and notably to servers that a user can trust. This would also make search easier.

The implication is that publishing a tile is a deliberate publish action, not just dropping something in an open URL space. (Of course, the publish could be to a private or restricted space.) I think that that may actually be a plus.

Under this view, a CAW could be described in IPLD and encoded as DAG-CBOR, which is also ATProto-compatible. A CAW's CID derives from that.
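To illustrate how a CAW's CID could derive from its DAG-CBOR encoding, here is a deliberately minimal sketch using only Node's standard library. It assumes sha2-256 and a CIDv1 rendered in base32, and it covers only the subset of DAG-CBOR a CAW like the example needs (strings, non-negative integers, arrays, string-keyed maps, booleans, null). In practice one would use a library such as `@ipld/dag-cbor` together with `multiformats` rather than hand-rolling this.

```typescript
import { createHash } from "node:crypto";
import { Buffer } from "node:buffer";

type Cbor = string | number | boolean | null | Cbor[] | { [k: string]: Cbor };

// CBOR head: major type plus argument, always in the shortest form (DAG-CBOR requires this).
function head(major: number, n: number): number[] {
  const m = major << 5;
  if (n < 24) return [m | n];
  if (n < 0x100) return [m | 24, n];
  if (n < 0x10000) return [m | 25, n >> 8, n & 0xff];
  if (n < 0x100000000) return [m | 26, (n >>> 24) & 0xff, (n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
  throw new Error("integer too large for this sketch");
}

function encode(v: Cbor, out: number[]): void {
  if (v === null) out.push(0xf6);
  else if (v === false) out.push(0xf4);
  else if (v === true) out.push(0xf5);
  else if (typeof v === "number") {
    if (!Number.isInteger(v) || v < 0) throw new Error("only non-negative integers in this sketch");
    out.push(...head(0, v));
  } else if (typeof v === "string") {
    const b = Buffer.from(v, "utf8");
    out.push(...head(3, b.length), ...b);
  } else if (Array.isArray(v)) {
    out.push(...head(4, v.length));
    for (const x of v) encode(x, out);
  } else {
    // DAG-CBOR map key order: shorter keys first, then bytewise comparison.
    const keys = Object.keys(v).map((k) => Buffer.from(k, "utf8"));
    keys.sort((a, b) => a.length - b.length || Buffer.compare(a, b));
    out.push(...head(5, keys.length));
    for (const k of keys) {
      out.push(...head(3, k.length), ...k);
      encode(v[k.toString("utf8")], out);
    }
  }
}

// RFC 4648 base32, lowercase, no padding.
function toBase32(bytes: Uint8Array): string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz234567";
  let bits = 0;
  let value = 0;
  let s = "";
  for (const byte of bytes) {
    value = ((value << 8) | byte) & 0xffff; // never more than 12 pending bits
    bits += 8;
    while (bits >= 5) {
      s += alphabet[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
  }
  if (bits > 0) s += alphabet[(value << (5 - bits)) & 31];
  return s;
}

// CIDv1 bytes: version 0x01, codec 0x71 (dag-cbor), multihash 0x12 (sha2-256), length 0x20.
function cawCid(caw: Cbor): string {
  const bytes: number[] = [];
  encode(caw, bytes);
  const digest = createHash("sha256").update(Uint8Array.from(bytes)).digest();
  return "b" + toBase32(Uint8Array.from([0x01, 0x71, 0x12, 0x20, ...digest])); // "b" = base32 multibase
}
```

Because map keys are sorted during encoding, two CAWs that differ only in JSON key order get the same CID.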

URL resolution from tile://CID/some/path?query-string#anchor is:

  1. Resolve the CID using whatever you have that can resolve CIDs (IPNI, PAPAYA, ATP aggregation) and retrieve the CAW. The CID is the authority.
  2. Resolve the path (without query string or anchor) by looking up an exact match in content. If it's not found, that's a 404. If there's no path, use start_url. If there's no start_url, default to /. If / is not in content, then the whole resolution fails.
  3. All relative resolution, including of metadata, is off-the-shelf from the URL spec.
  4. The query string and anchor are available on the location object to be used inside the tile (subject to nav tracking prevention). They never get shared over the wire.
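The resolution steps above can be sketched as a single function. This is a minimal sketch, not a reference implementation: `lookupCaw` is a hypothetical stand-in for whatever resolves CIDs, the result shapes are invented for illustration, and treating a bare `/` as "no path" is an interpretation of the start_url redirect example. It relies on the WHATWG URL parser, under which `tile:` is a non-special scheme whose host is the CID.

```typescript
interface Caw {
  start_url?: string;
  content: Record<string, { type: string }>;
}

type Resolution =
  | { status: "ok"; cid: string; path: string; type: string; local: { search: string; hash: string } }
  | { status: "not-found"; path: string }
  | { status: "fail"; reason: string };

// `lookupCaw` is hypothetical: IPNI, PAPAYA, ATP aggregation, or anything else that maps CID to CAW.
function resolveTileUrl(input: string, lookupCaw: (cid: string) => Caw | undefined): Resolution {
  const url = new URL(input);
  if (url.protocol !== "tile:") return { status: "fail", reason: "not a tile: URL" };

  // Step 1: the CID is the authority and identifies the CAW.
  const cid = url.host;
  const caw = lookupCaw(cid);
  if (caw === undefined) return { status: "fail", reason: `cannot resolve ${cid}` };
  const content = caw.content;
  const has = (p: string) => Object.prototype.hasOwnProperty.call(content, p);

  // Step 2: exact match on the path only; url.pathname excludes query and fragment.
  let path = url.pathname;
  if (path === "" || path === "/") {
    // Assumption: a bare "/" counts as "no path", so start_url acts as a redirect.
    path = caw.start_url !== undefined ? new URL(caw.start_url, `tile://${cid}/`).pathname : "/";
    if (!has(path)) return { status: "fail", reason: `entry point ${path} is not in content` };
  } else if (!has(path)) {
    return { status: "not-found", path };
  }

  // Step 4: query string and fragment stay local to the tile; they are not part of any fetch.
  return {
    status: "ok",
    cid,
    path,
    type: content[path].type,
    local: { search: url.search, hash: url.hash },
  };
}
```

Step 3 needs no code of its own: relative URLs inside the tile resolve against `tile://<CID>/...` with the stock URL parser, as the start_url line does.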

Individual resources can be retrieved either over HTTP or CID. Ideally, resolving a CAW would provide a response making it easy to obtain the content from multiple sources or from a trusted server, to avoid tracking.
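Because every entry is pinned by a CID, the client can ask any source for the bytes and accept the first response that verifies; the HTTP location in the claim becomes just one option. A minimal sketch, assuming a hypothetical source order (trusted servers first) and that the expected sha2-256 digest has already been extracted from the resource's CID. This only holds as written for content addressed as a single raw block; a file chunked into a DAG would have to be verified block by block instead.

```typescript
import { createHash } from "node:crypto";

// Hypothetical fetcher: returns the bytes served by `source`, or undefined on failure.
type Fetcher = (source: string) => Uint8Array | undefined;

// Tries sources in order and returns the first response whose sha2-256 digest matches.
function fetchFromAnySource(
  sources: string[],
  fetcher: Fetcher,
  expectedSha256Hex: string,
): { source: string; bytes: Uint8Array } | undefined {
  for (const source of sources) {
    const bytes = fetcher(source);
    if (bytes === undefined) continue; // source unavailable; move on
    const digest = createHash("sha256").update(bytes).digest("hex");
    if (digest === expectedSha256Hex) return { source, bytes };
    // Mismatch: the source served something else; never hand it to the tile.
  }
  return undefined;
}
```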

Any reason we couldn't use BLAKE3/bao for the fast streaming hash verification on large files? The ability to avoid indirections like blocks and CAR files would be a huge benefit.

It's worth recalling that the metadata here exists so as to provide ways of rendering tiles in inactive/feed mode, which is to say when they're not running. This rendering can be different in different contexts and is best left declarative and under the control of the feed.
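As one example of that declarative control, a feed could choose which declared icon or banner to render for the space it has. A minimal sketch, assuming `sizes` follows the Web App Manifest convention of space-separated `WxH` tokens and that the feed cares about width; the selection policy and the `/icon-big.png` entry in the usage note are hypothetical.

```typescript
interface ImageResource {
  src: string;
  sizes?: string;
}

// Widest declared width in `sizes`, or 0 if nothing parses ("any" is ignored here).
function widthOf(img: ImageResource): number {
  let max = 0;
  for (const token of (img.sizes ?? "").trim().split(/\s+/)) {
    const m = /^(\d+)[xX](\d+)$/.exec(token);
    if (m) max = Math.max(max, Number(m[1]));
  }
  return max;
}

// Hypothetical feed policy: the narrowest image at least `targetWidth` wide,
// otherwise the widest one available; undefined if no size is declared.
function pickImage(images: ImageResource[], targetWidth: number): ImageResource | undefined {
  const sized = images.map((img) => ({ img, w: widthOf(img) })).filter((x) => x.w > 0);
  const bigEnough = sized.filter((x) => x.w >= targetWidth).sort((a, b) => a.w - b.w);
  if (bigEnough.length > 0) return bigEnough[0].img;
  return sized.sort((a, b) => b.w - a.w)[0]?.img;
}
```

For example, with icons `/icon.png` (32x32) and a hypothetical `/icon-big.png` (192x192), a feed slot 48px wide would get `/icon-big.png` while a 16px slot would get `/icon.png`.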

It's an open question how Service Workers would work here. Some support for them definitely ought to be possible, but we should be cautious.