# Learn you a web3.storage stack
Welcome to the _self directed_ learning programme for Saturn devs!
Developers are encouraged to explore and learn using this framework as a guide. Collect questions and areas where you need clarification and bring to the AMA sessions.
:::warning
⚠️ In **week 1** you should aim to cover **chapter 1 and 2**.
⚠️ In **week 2** you should aim to cover **chapter 3 and 4**.
:::
Each chapter has a _challenge_: consume the materials and then attempt to complete the challenge.
**Key**
🏅 Specification 📚 Documentation 🎬 Video presentation 🏛️ Library repo
---
## Prelude
🎬 [A brief history of web3.storage/NFT.Storage video](https://youtu.be/2ym2i7yULdk?si=_k3CTcC_WMPf0Z6U)
## Chapter 1 - Basics
🌟 **Challenge**: complete [learnyouw3up](https://github.com/web3-storage/learnyouw3up) workshopper.
### DID
🏅 [DID spec](https://www.w3.org/TR/did-core/)
🏅 [`did:mailto` spec](https://github.com/web3-storage/specs/blob/main/did-mailto.md)
DID stands for **D**ecentralized **Id**entifier. A DID is a [URI](https://datatracker.ietf.org/doc/html/rfc3986) that starts with [`did:`](https://www.iana.org/assignments/uri-schemes/prov/did). A DID is just an identifier that identifies something, anything really.
A DID **method** is the part after `did:`, e.g. in the DID `did:key:z6MkiK3h93N8kE6jmAgTWd33QmuyDcsMnaAeQrGjp9iixT2B`, the method is `key`.
In web3.storage we use the following DID methods:
| Method | Description |
| ------ | ----------- |
| `did:key` | A cryptographic keypair. In web3.storage we address [spaces](#Spaces) and [agents](#Agents) using `did:key`. We use [ed25519](https://cryptography.io/en/latest/hazmat/primitives/asymmetric/ed25519/) keys and [RSA](https://cryptography.io/en/latest/hazmat/primitives/asymmetric/rsa/) keys in the browser when necessary. Note: the text following `did:key:` is the _public key_ of the cryptographic keypair. |
| [`did:mailto`](https://github.com/web3-storage/specs/blob/main/did-mailto.md) | An email address. It refers to whoever controls an email address. We use `did:mailto` as a method of easily [sharing permissions across devices](#Access) and for billing purposes. |
| `did:web` | A web address. We refer to the service using this method i.e. `did:web:web3.storage`. |
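For illustration, the method can be parsed straight out of a DID string. This is a minimal sketch, not a full DID parser (it ignores paths, queries and fragments):

```javascript
// Minimal sketch: pull the method out of a DID string.
// Not a full DID parser - it ignores paths, queries and fragments.
function didMethod(did) {
  const match = /^did:([a-z0-9]+):(.+)$/.exec(did);
  if (!match) throw new Error(`not a DID: ${did}`);
  return { method: match[1], id: match[2] };
}

didMethod('did:web:web3.storage');
// → { method: 'web', id: 'web3.storage' }
```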
⭐ **Mini Challenge:** Why does web3.storage use RSA keys in the browser? Hint: search "webcrypto non-extractable".
### UCAN
🏅 [UCAN spec](https://github.com/ucan-wg/spec)
🎬 [DID + UCAN primer video](https://youtu.be/grec5KQeU2U?si=Hd3r2ldsY83JJdvM)
📚 [UCANs and web3.storage docs](https://web3.storage/docs/concepts/ucans-and-web3storage/)
UCAN stands for **U**ser **C**ontrolled **A**uthorization **N**etworks. UCANs are an IPLD native authorization token. Each UCAN contains a chain of signed delegations (UCANs) that prove the issuer of the UCAN has been granted the ability to perform some action.
There is **no central authorization server** because each UCAN contains all the information required to prove the issuer of the UCAN has been granted the ability to perform an action.
You _sign_ a UCAN with your private key whenever you want to interact with web3.storage. For example, when you want to store a file you create a UCAN with the following information:
* **Issuer** - YOU! Well, kinda, it's your [agent](#Agent) DID.
* **Audience** - who the UCAN is intended for (`did:web:web3.storage`).
* **Capability** - name of the ability (`store/add` in this example). You can think of this as a method name in a Remote Procedure Call (RPC).
* **Resource** - the resource being operated on. In this case you will be asking to store a file in your [_space_](#Spaces).
* **Caveats** - data that restricts the use of the UCAN. In our RPC analogy, we can think of caveats as parameters to the method. For a `store/add` invocation UCAN, the parameters are the CID of the data being stored and its size in bytes.
* **Proofs** - UCANs that prove you (the issuer) have been granted the ability to perform some action. i.e. you can `store/add` to your space _resource_ with the _caveats_ you have specified.
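Putting those fields together, the logical shape of a `store/add` invocation looks something like this. This is an illustrative plain object, not the actual ucanto API, and every value is a hypothetical placeholder:

```javascript
// Illustrative only: the logical shape of a store/add invocation UCAN.
// Field names mirror the bullet list above, not a specific library API,
// and every value is a hypothetical placeholder.
const invocation = {
  issuer: 'did:key:zAgent',          // your agent DID
  audience: 'did:web:web3.storage',  // who the UCAN is intended for
  capabilities: [{
    can: 'store/add',                // the ability (like an RPC method name)
    with: 'did:key:zSpace',          // the resource: your space DID
    nb: {                            // caveats (like RPC parameters)
      link: 'bagbCarCid',            // CID of the CAR being stored
      size: 1024                     // size in bytes
    }
  }],
  proofs: []                         // delegations proving you may store/add to this space
};
```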
#### Delegation
Delegation is the superpower of UCANs. If you are in possession of a delegated capability, you can re-delegate the same capability (or a subset of it) to another DID.
Your delegated UCAN must include proof that you have been granted the capability you are delegating. This is the **delegation chain**.
#### Invocation
🏅 [Invocations spec](https://github.com/ucan-wg/invocation)
Possessing a delegated capability is worth nothing unless you use it.
:::info
ℹ️ Creating a UCAN for sending to web3.storage (invocation) is not significantly different to creating a UCAN for the purpose of delegating a capability.
:::
Invoking a delegated UCAN capability is the act of sending it to a service in order to achieve a task. It is like a Remote Procedure Call (RPC).
**Receipts** are issued in response to a UCAN invocation. They are signed by the executor (the task runner) and are an attestation of the result of the task plus any requests for further actions to be performed (async tasks). These async future tasks are called "effects".
**Ucanto** is the library web3.storage uses to perform UCAN RPC:
🏛️ [JS Ucanto library](https://github.com/web3-storage/ucanto)
🏛️ [Go Ucanto library](https://github.com/web3-storage/go-ucanto)
:::warning
⚠️ Ucanto was written before the invocation spec existed so does not conform 100% yet.
:::
⭐ **Mini Challenge:** What is the result of a `store/add` invocation? Hint: `export type StoreAddSuccess`.
### Agents
You! Well, not exactly. An agent is just the keypair you use to sign UCANs. You will typically have control over many agents, because you have many devices (📱+💻+🖥️) and use many applications.
You'll end up with an agent for each application you're using on your phone, your laptop and your desktop. In fact you'll even have multiple agents for the _same_ web application on different browsers.
Each agent has a DID, which is the public key of the keypair.
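To make that concrete, here is a sketch of how an ed25519 public key becomes a `did:key` DID. The base58btc encoder is hand-rolled for illustration; in practice a library does this:

```javascript
// Sketch: encode an ed25519 public key as a did:key DID.
// did:key:z... = 'z' (base58btc multibase prefix) + base58btc of
// the multicodec ed25519-pub prefix (0xed 0x01) + the 32 raw key bytes.
const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

function base58btc(bytes) {
  let n = 0n;
  for (const b of bytes) n = n * 256n + BigInt(b);
  let out = '';
  while (n > 0n) { out = ALPHABET[Number(n % 58n)] + out; n /= 58n; }
  for (const b of bytes) { if (b !== 0) break; out = '1' + out; } // leading zeros
  return out;
}

function ed25519DidKey(publicKey /* Uint8Array(32) */) {
  return 'did:key:z' + base58btc(new Uint8Array([0xed, 0x01, ...publicKey]));
}
```

Every ed25519 `did:key` starts with `z6Mk`, because every one begins with the same multibase prefix and multicodec bytes.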
### Spaces
🏅 [w3 space spec](https://github.com/web3-storage/specs/blob/main/w3-space.md)
A space is just another cryptographic keypair! They are used by web3.storage to group stored items together. You can create as many spaces as you like, but they must be [provisioned](#Provisioning) before they can be used for storing data.
When you create a space, your client will use the private key to sign a UCAN that delegates all capabilities to your current agent. The private key is then discarded, as it is no longer needed. This delegation is stored by your client and used in proofs when storing files to the space.
### Provisioning
🏅 [w3 provider spec](https://github.com/web3-storage/specs/blob/main/w3-provider.md)
A space must be registered with a provider before it can be used. A provider in this context can only be web3.storage, but in the past it could also have been NFT.Storage.
You use your [account DID](#Accounts) to provision a space.
Provisioning a space simultaneously sets up a "subscription" relationship between the space and the account. By provisioning a space you are telling the service that your account is responsible for paying for usage of the space.
:::info
ℹ️ We will soon allow subscriptions to be added/removed so that the account responsible for paying for a space can be changed.
:::
### Accounts
🏅 [w3 accounts spec](https://github.com/web3-storage/specs/blob/main/w3-account.md)
🏅 [w3 session spec](https://github.com/web3-storage/specs/blob/main/w3-session.md)
Account DIDs are for the convenience of sharing capabilities across various user [agents](#Agents). Agents re-delegate their capabilities to an account DID and stash them with the web3.storage service to be claimed by other agents (See the [Access](#Access) section for more information on how delegations are stashed and claimed).
Agents "prove" themselves to be owned by the same entity that owns the account DID and are then able to use account DID delegations as if they were their own.
The "proof" is a UCAN issued by the web3.storage service. It is a special attestation UCAN that **attests** that the account DID is owned by the agent DID. We call this an **Authorization Session**.
The only way to obtain this attestation currently is by clicking on a verification email sent to a [`did:mailto`](https://github.com/web3-storage/specs/blob/main/did-mailto.md) account DID.
:::info
ℹ️ A user does not have to take advantage of this functionality. They can explicitly create delegations to another agent DID, exporting and importing them manually.
:::
Additionally account DIDs are used in billing. Account DIDs identify the billing entity for a given space. It is convenient for web3.storage to use an email address to identify customers with the payment provider (Stripe) so we use `did:mailto` at the moment.
:::info
ℹ️ Currently all account DIDs must be `did:mailto` DIDs.
:::
### Access
🏅 [w3 access spec](https://github.com/web3-storage/specs/blob/main/w3-access.md)
In the [Accounts](#Accounts) section we talked about stashing and claiming delegations to your account DID. The access protocol formalises this and is very simple:
1. You store delegations for an agent or account DID (`access/delegate`).
2. You claim delegations that apply to your agent or account DID (`access/claim`).
It is ok for web3.storage to store delegations on behalf of other users. They are not private data, they can only be used as proofs in a UCAN signed by the holder of the private key.
## Chapter 2 - Writes
🌟 **Challenge**: send a PR to improve [web3.storage/docs](https://web3.storage/docs) ([repo](https://github.com/web3-storage/www)), now that you have a better understanding of things.
### w3up-clients
To write to web3.storage you need to use a client library to send CAR encoded UCAN invocations to the service.
:::info
ℹ️ We're working on a bridge that will allow `curl` to be used to invoke the service.
:::
There are two web3.storage clients available today, in JS and Go. The Go client is a work in progress and as such is not feature complete.
🏛️ [JS client library](https://github.com/web3-storage/w3up/tree/main/packages/w3up-client)
🏛️ [Go client library](https://github.com/web3-storage/go-w3up)
### Store vs Upload
📚 [Store vs Upload docs](https://web3.storage/docs/concepts/upload-vs-store/)
To store data with web3.storage, clients send a UCAN invoking a `store/add` capability. They specify the [space](#Spaces) DID to store the data in, the CID of the data and the size of the data in bytes.
There is a maximum size for a single PUT request. So sometimes it is necessary to split an upload into multiple `store/add` requests. We call each CAR that is a part of an upload a **shard**.
When all the shards are stored, a client will register an "upload" as the union of its shards by invoking `upload/add`.
This means that when users retrieve a list of their uploads they get a list of items they uploaded, not a list of shards they stored.
### Streaming UnixFS
🏅 [UnixFS spec](https://github.com/ipfs/specs/blob/main/UNIXFS.md#-unixfs--)
🏛️ [UnixFS writer library](https://github.com/ipld/js-unixfs)
Streaming UnixFS refers to encoding a file/directory into a UnixFS encoded DAG using web streams. Blocks are yielded from the stream as the tree is built.
This allows files/directories of arbitrary size to be encoded without using more memory than a fraction of the total file size. This is useful in the browser (where each tab receives only 2GB of RAM to operate), or other memory constrained devices.
⭐ **Mini Challenge:** What settings do web3.storage clients use by default when encoding UnixFS data?
### Sharding
The web3.storage clients expose a [`ShardingStream`](https://github.com/web3-storage/w3up/blob/e34eed1fa3d6ef24ce2c01982764f2012dbf30d8/packages/upload-client/src/sharding.js#L17) to transform a stream of blocks into a stream of CAR files.
A client collects blocks until the shard size is reached, then a CAR is encoded and yielded from the stream.
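The grouping logic can be sketched roughly like this. It is a simplification of what `ShardingStream` does; the block shape, names and size check are illustrative:

```javascript
// Rough sketch of the sharding idea: group blocks into shards so each
// shard's total byte size stays under a target. The real ShardingStream
// is a TransformStream and also accounts for CAR encoding overhead.
function shard(blocks, maxShardSize) {
  const shards = [];
  let current = [];
  let currentSize = 0;
  for (const block of blocks) {
    if (current.length > 0 && currentSize + block.bytes.length > maxShardSize) {
      shards.push(current); // shard full - yield it and start a new one
      current = [];
      currentSize = 0;
    }
    current.push(block);
    currentSize += block.bytes.length;
  }
  if (current.length > 0) shards.push(current);
  return shards;
}
```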
The bytes of the CAR file are hashed, and a CAR CID created. You can identify a CAR CID by [multicodec `0x0202`](https://github.com/multiformats/multicodec/blob/696e701b6cb61f54b67a33b002201450d021f312/table.csv#L140). For example:
[`bagbaiera6xcx7hiicm7sc523axbjf2otuu5nptt6brdzt4a5ulgn6qcfdwea`](https://cid.ipfs.tech/#bagbaiera6xcx7hiicm7sc523axbjf2otuu5nptt6brdzt4a5ulgn6qcfdwea)
The CAR CID is the CID used in a `store/add` invocation.
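As a sketch, that multicodec can be checked by decoding just the first few bytes of the CID string. The decoder below handles only enough of the base32 multibase representation to read the version byte and the codec varint; it is not a general-purpose CID parser:

```javascript
// Sketch: check whether a CIDv1 string is a CAR CID (multicodec 0x0202)
// by decoding just enough of its base32 multibase form to read the
// version byte and the codec varint. Not a general-purpose CID parser.
const B32 = 'abcdefghijklmnopqrstuvwxyz234567';

function decodePrefix(cid, nBytes) {
  if (cid[0] !== 'b') throw new Error('expected base32 multibase prefix');
  const out = [];
  let value = 0, bits = 0;
  for (const char of cid.slice(1)) {
    value = (value << 5) | B32.indexOf(char);
    bits += 5;
    if (bits >= 8) {
      out.push((value >> (bits - 8)) & 0xff);
      bits -= 8;
      value &= (1 << bits) - 1;
    }
    if (out.length >= nBytes) break;
  }
  return out;
}

function isCarCid(cid) {
  const [version, c1, c2] = decodePrefix(cid, 3);
  // 0x82 0x04 is the varint encoding of multicodec 0x0202
  return version === 0x01 && c1 === 0x82 && c2 === 0x04;
}
```

Run it on the example CAR CID above and it returns `true`; run it on a `bafy…` (dag-pb) CID and it returns `false`.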
⭐ **Mini Challenge:** What is the default shard size in the web3.storage clients? Specify two reasons for this.
### Upload targets (signed URLs)
The [receipt](https://github.com/web3-storage/specs/blob/main/w3-store.md#responses) for a `store/add` invocation contains a _signed_ URL. It is signed by the service that will receive the upload, and restricted such that only content that corresponds to the specified hash and size may be uploaded.
This decouples the service that receives the `store/add` invocation from the service at the URL that receives the data.
The service that receives the `store/add` invocation can make authorization and routing decisions, instructing the client where to send the bytes and how to authorize that send (e.g. via a upload-target-specific `authorization` header in `headers`).
:::info
ℹ️ Originally designed for compatibility with [Amazon S3 presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html) or [Cloudflare R2 presigned URLs](https://developers.cloudflare.com/r2/api/s3/presigned-urls/).
:::
### Conditional upload
The [receipt](https://github.com/web3-storage/specs/blob/main/w3-store.md#responses) for a `store/add` invocation also contains a `status` field that indicates whether the upload needs to be performed, or if web3.storage has already received a file with the same CID.
If a receipt for a `store/add` invocation contains `status: 'done'` then the file does not need to be uploaded.
Hooray for content addressing!
### Indexing
🏅 [CARv2 spec](https://ipld.io/specs/transport/car/carv2/)
🏛️ [Cardex library](https://github.com/alanshaw/cardex)
In order to serve the data via a trusted or trustless [IPFS gateway](#IPFS-gateways) it is necessary to index the contents of uploaded CARs, so we know which blocks are contained within a CAR (their CIDs) as well as their byte offsets. We use CARv2 indexes for this.
[CARv2](https://ipld.io/specs/transport/car/carv2/) extends the CAR format with the ability to include an optional [index](https://en.wikipedia.org/wiki/Database_index). The spec defines two possible index formats. The [`MultihashIndexSorted`](https://ipld.io/specs/transport/car/carv2/#format-0x0401-multihashindexsorted) format is the index predominantly used in Filecoin, and is also the format used by web3.storage.
:::warning
⚠️ `MultihashIndexSorted` maps multihashes to byte offsets. **It does not include length**. The offset is the byte offset of a [varint](https://www.npmjs.com/package/varint) that prefixes every CID+bytes pair in a CAR file. The varint is the length. So in order to know how much data you need to read for any given block, you must first read the varint.
:::
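Reading that length varint can be sketched as follows (unsigned LEB128, the format implemented by the varint package linked above):

```javascript
// Sketch: read the unsigned LEB128 varint that prefixes each CID+bytes
// pair in a CAR file. Returns [value, bytesRead] so the caller knows
// where the CID+block bytes begin.
function readVarint(bytes, offset = 0) {
  let value = 0, shift = 0, i = offset;
  while (true) {
    const b = bytes[i++];
    value += (b & 0x7f) * 2 ** shift; // multiply, not bit-shift: avoids 32-bit overflow
    if ((b & 0x80) === 0) break;
    shift += 7;
  }
  return [value, i - offset];
}
```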
Indexes are generated by the client and stored in a user's space. The CARv2 index is generated and stored separately to the CAR file i.e. the CAR file is not transformed into a CARv2 file. The library used to generate indexes is called [Cardex](https://github.com/alanshaw/cardex).
Clients also generate multiple "[Content Claims](#Content-Claims)" to assert that the index they uploaded relates to the CAR they uploaded.
e.g. [`bagbaierajqk22ta4w4jhes5er7kqibn5d7tqls5ffbbzyrpukbsvwylp4e7q`](https://cid.ipfs.tech/#bagbaierajqk22ta4w4jhes5er7kqibn5d7tqls5ffbbzyrpukbsvwylp4e7q) _includes_ [`bagaqqera4sswi5amck6hsfyv5hojdiaq2hbscgzayen6p75vj3tx4nlzn6ca`](https://cid.ipfs.tech/#bagaqqera4sswi5amck6hsfyv5hojdiaq2hbscgzayen6p75vj3tx4nlzn6ca)
There will be more information about Content Claims in the next Chapter.
:::warning
🚧 **Work in Progress** - CARv2 indexes and block indexes are _currently_ generated and stored on behalf of users, triggered by S3/R2 bucket events. [Content claims](#Content-claims) are materialized on demand. i.e. The client libraries do not yet generate indexes(!), store them or publish content claims for them.
:::
⭐ **Mini Challenge:** Why is `MultihashIndexSorted` not a great index format for reading blocks directly from a remote CAR file?
### w3up-api
🏛️ [w3up protocol implementation libraries](https://github.com/web3-storage/w3up)
🏛️ [w3up infrastructure repo](https://github.com/web3-storage/w3infra)
The implementation of [web3.storage specs](https://github.com/web3-storage/specs/) is separated from infrastructure or environment specifics. The [w3up repo](https://github.com/web3-storage/w3up) is a monorepo that contains many JavaScript [packages](https://github.com/web3-storage/w3up/tree/main/packages) - libraries that are meant to be depended on by other packages, which instantiate the implementations with appropriate adapters to underlying storage and networking layers. As such, the w3up packages should run almost anywhere - Node.js, [Workerd](https://github.com/cloudflare/workerd), [AWS Lambda](https://en.wikipedia.org/wiki/AWS_Lambda) etc.
The [w3infra repo](https://github.com/web3-storage/w3infra) consumes (depends on) the implementation in the w3up repo. It is deployed to AWS, and uses primitives like Lambda, SQS, DynamoDB and S3.
Infrastructure is managed using [sst.dev](https://sst.dev/), which is an [Infrastructure as Code](https://en.wikipedia.org/wiki/Infrastructure_as_code)-in-JS toolkit. [Seed.run](https://seed.run/) automatically manages deployments. Each PR generates a deployment and merging to `main` deploys to a staging environment. Deployments to staging are promoted to production manually when desired.
The w3up service runs at [up.web3.storage](https://up.web3.storage). All web3.storage services have a staging deployment.
### IPNI
🏅 [IPNI spec](https://github.com/ipni/specs/blob/main/IPNI.md)
📚 [IPNI docs](https://docs.ipfs.tech/concepts/ipni/)
🏛️ [IPNI library](https://github.com/web3-storage/ipni)
IPNI stands for **I**nter**P**lanetary **N**etwork **I**ndexers. Advertising to IPNI ensures that content is discoverable in the IPFS network without having to publish provider records to the IPFS DHT. For large storage providers (like web3.storage), it is more scalable and sustainable.
After content has been stored, clients opt in to IPNI indexing by sending an `ipni/offer` invocation. Then web3.storage constructs the appropriate IPNI advert and informs IPNI that it exists.
:::warning
🚧 **Work in Progress** - Advertising to IPNI is _currently_ done automatically on behalf of users, triggered by S3/R2 bucket events. i.e. clients do not send `ipni/offer` invocations(!). The [spec](https://github.com/web3-storage/specs/pull/85) for offering data to IPNI via a UCAN invocation is in review.
:::
## Chapter 3 - Writes cont.
🌟 **Challenge**: upload a file at the start of the day and find your PoDSI inclusion proof in [console.web3.storage](https://console.web3.storage) at the end of the day.
### Content claims
🏅 [Content Claims spec (PR)](https://github.com/web3-storage/specs/pull/86)
🏛️ [Content Claims implementation repo](https://github.com/web3-storage/content-claims)
Content Claims is a UCAN system allowing actors to assert information about content (identified by a CID).
For example, you can sign a UCAN that claims a CID can be found at a given URL.
There are a few specified types of claims available:
* [Location](https://github.com/web3-storage/content-claims#location-claim) - Asserts that content can be found at the provided URL.
* [Equivalency](https://github.com/web3-storage/content-claims#equivalency-claim) - Asserts that a content CID is equivalent to another CID.
* [Inclusion](https://github.com/web3-storage/content-claims#inclusion-claim) - Asserts that content includes content identified by another CID.
* [Partition](https://github.com/web3-storage/content-claims#partition-claim) - Asserts that content can be read from parts it has been divided into.
* [Relation](https://github.com/web3-storage/content-claims#relation-claim) - Asserts that content links directly to a set of other CIDs.
#### Generated claims
Client libraries generate content claims and submit them to web3.storage when uploading data. The claims created automatically by clients are as follows:
1. Partition claims
* DAG root CID ➡️ Shard CID(s)
* Index CID ➡️ Shard CID(s)
2. Inclusion claim for each Shard CID ➡️ Index CID
3. Relation claims
* Non-leaf Parent ➡️ Child CID(s)
* Leaf CIDs when an only child of a UnixFS directory
4. Equivalency claim for Shard CID ➡️ Piece CID
:::warning
🚧 **Work in Progress** - We're actively working on adding this functionality to the client. The old web3.storage API (pre-UCANs) generated these claims for uploaded content on behalf of the client.
:::
#### Content resolution
Given a root CID for some data, clients can learn all the information they need to extract blocks in the correct order from a CAR file:
1. Read partition claim for root CID to get a list of CAR shard(s) the DAG can be read from.
2. Read inclusion claim to get index CID of the shard.
3. Read partition claim for index CID to get shard CID where index can be read from.
4. Read indexes.
5. Read blocks.
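The walk above can be sketched with an injected claim store so the control flow is visible. The claim shapes and `assert/*` type names here are illustrative simplifications (real claims are UCANs fetched from the service):

```javascript
// Sketch of content resolution: given a root CID, discover which CAR
// shards hold the DAG and where the index for each shard lives.
// `claims` is an injected Map from CID string to an array of simplified
// claim objects - a stand-in for querying the content claims service.
function resolveShards(rootCid, claims) {
  // 1. partition claim for the root lists the CAR shard(s) holding the DAG
  const partition = claims.get(rootCid).find(c => c.type === 'assert/partition');
  return partition.parts.map(shardCid => {
    // 2. inclusion claim for the shard points at its index CID
    const inclusion = claims.get(shardCid).find(c => c.type === 'assert/inclusion');
    // 3. partition claim for the index says which shard holds the index bytes
    const indexPartition = claims.get(inclusion.includes).find(c => c.type === 'assert/partition');
    // steps 4 & 5 (read indexes, read blocks) would then use byte ranges into the shards
    return { shardCid, indexCid: inclusion.includes, indexShards: indexPartition.parts };
  });
}
```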
#### HTTP API
📚 [HTTP API docs](https://github.com/web3-storage/content-claims#http-api)
Content Claims has a very simple HTTP API that returns a CAR file of claims for a given CID.
The `?walk=` query parameter allows properties of content claims to be "expanded", effectively allowing steps 1-3 of the [content resolution](#Content-resolution) to be performed in a single request.
#### GraphQL API
🏛️ [GraphQL implementation repo](https://github.com/web3-storage/content-claims-gql)
There's an _experimental_ GraphQL API available at [graphql.claims.dag.haus](https://graphql.claims.dag.haus).
### Filecoin pipeline
🏅 [w3filecoin spec](https://github.com/web3-storage/specs/blob/main/w3-filecoin.md)
🏛️ [w3filecoin protocol implementation libraries](https://github.com/web3-storage/w3filecoin)
🏛️ [w3filecoin infrastructure repo](https://github.com/web3-storage/w3filecoin-infra)
🎬 [DUDE! Where's my CAR? (w3filecoin pipeline) video](https://www.youtube.com/watch?v=vAzjXtRqChg)
Clients opt into storing their data on Filecoin by submitting a `filecoin/offer` invocation. The invocation includes the shard CID _and_ a piece CID, which are both calculated client side.
A UCAN invocation receipt is generated asynchronously and contains the inclusion proof and data aggregation proof for the offered piece. i.e. a proof that the piece has been included in an aggregate and the proof that the aggregate has been included in a Filecoin deal. The inclusion proof is called [PoDSI](#PoDSI) for short.
:::warning
🚧 **Work in Progress** - Currently all uploads are _automatically_ submitted to the Filecoin pipeline when received. The client does not yet send this invocation. However, piece CIDs _are_ calculated.
:::
#### Pieces
📚 [Piece docs](https://spec.filecoin.io/systems/filecoin_files/piece/)
🏅 [Piece Multihash and v2 Piece CID spec](https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0069.md)
🏛️ [fr32-sha2-256-trunc254-padded-binary-tree-multihash library](https://github.com/web3-storage/fr32-sha2-256-trunc254-padded-binary-tree-multihash)
🏛️ [data-segment library](https://github.com/web3-storage/data-segment)
Piece CIDs (CommP) are used to address data in Filecoin deals. In short, they are an alternative merkle DAG encoding of some data. They address a binary tree, with "padding".
Piece CID calculation is done in the _client_ by the Rust WASM based [fr32-sha2-256-trunc254-padded-binary-tree-multihash](https://github.com/web3-storage/fr32-sha2-256-trunc254-padded-binary-tree-multihash) library.
In web3.storage we use [v2 piece CIDs](https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0069.md) because they encode the tree size, which otherwise must always be passed around with the piece CID.
The [data-segment](https://github.com/web3-storage/data-segment) library provides functionality for generating an aggregate from a set of pieces. It is based on the [go-data-segment](https://github.com/filecoin-project/go-data-segment) library.
#### Aggregation
🎬 [How data aggregation for Filecoin works video](https://drive.google.com/file/d/1lJ-66tbtQXbmgkm75G0ZmCmio9jr7svA/view)
📚 [Verifiable aggregation docs](https://web3.storage/docs/concepts/podsi/#verifiable-aggregation-pipeline)
![](https://bafybeicwuyt5oan2ivucjmskemdbbwz447a5v3ucljuoyha7z3nf4h2hje.ipfs.w3s.link/w3filecoin.png)
The above diagram shows the actors in the aggregation pipeline and the UCAN invocations that allow pieces to be tracked. All actors except the Broker and Storage Providers are run by web3.storage.
* **Storefront** - a data storage provider such as web3.storage.
* **Aggregator** - combines individual pieces together into a bigger piece.
* **Dealer** - ensures aggregates are included in Filecoin deals.
* **Broker** - arranges deals with Filecoin Storage Provider(s) on behalf of another party.
* **Deal tracker** - knows active deals for an aggregate piece.
The aggregation pipeline makes use of asynchronous `fx` (defined by the [UCAN invocation spec](https://github.com/ucan-wg/invocation#7-effect)) to allow an offered piece to be tracked through the pipeline. As each stage completes, the service issues a signed receipt with the result for the current stage and `fx` for future tasks.
The receipt for `filecoin/offer` contains two `fx` tasks:
1. `filecoin/submit` - allows detailed tracing through the pipeline at every stage. i.e. `piece/offer` to `piece/accept` to `aggregate/offer` and finally `aggregate/accept`.
2. `filecoin/accept` - resolves to a receipt that contains the data inclusion proof (inclusion proof + deal ID) when the pipeline has completed processing the piece.
The [documentation for the pipeline](https://web3.storage/docs/concepts/podsi/#verifiable-aggregation-pipeline) explains the stages in more depth.
:::info
💡 web3.storage implements and runs a storefront, aggregator, dealer and deal tracker. However, these services could be run by other parties.
:::
#### PoDSI
🏅 [PoDSI spec](https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0058.md)
📚 [PoDSI docs](https://web3.storage/docs/concepts/podsi/)
PoDSI stands for **P**roof **o**f **D**ata **S**egment **I**nclusion. It is a proof that a smaller piece (a segment) has been included in a larger piece (an aggregate).
Example merkle proof showing path from aggregate (piece) CID to segment (piece) CID:
```
bafkzcibcaapfjdsoqkjscfzhczy4rnjimqx3tgsno65o6kqeoswkfydrj7rhmpi (Aggregate CID)
├─┬ bafkzcibcaapda5pjhb5sclhwtao2wt6ukuptynafmmnf3drdyaw6yist5en3uci
│ ├── bafkzcibcaaoioxpjsdlccu22wxdatimwsacolfrxg52tqriycijqrixaeb4uanq
│ └─┬ bafkzcibcaaovcpmmjibcigzzwgvtvbcneckv7sc4iueexjy3rbq4z6k3vh7riaq
│ ├── bafkzcibcaanvqzhc4jafjsaqpxicoh3x33zhalfcmmxocqccgdwtmryd54axyeq
│ └─┬ bafkzcibcaaof7lu2l5z4n23smidtebghj2bp7aa7wgwhwksmj6wcoxuimmnreki
│ ├─┬ bafkzcibcaanqs3rdqqhyjrf3kmlq2syimvygcyppthhvex7qjmfendqaqirxijq
│ │ ├─┬ bafkzcibcaang5lhkfpze6bpa5jsfdynk363y2sgvvaz2en524eqt2n2q7v6syha
│ │ │ ├── bafkzcibcaamk2cnccbfg3cryx3s5bvi3u7yyc633vvxzw5lyv4obtebg377o4ai
│ │ │ └─┬ bafkzcibcaam2z7wou43cgvxckx3zhzygx3x2wrocmjmye4xtlus7ghtaznosecy
│ │ │ ├── bafkzcibcaalylgnufj5szpcnmsmt4rftzwyvrena2b644v7o7yknvknymqd52oi
│ │ │ └─┬ bafkzcibcaamjhl43i42ixgpuwuah2zr22jkt6pf4kbpvosyembvsql7jkbsx2py
│ │ │ ├── 🧩 bafkzcibextmt6froqprrlsphlpumy7g6kfiu7xtn2gpu7alxkf2qsc4afwduc2xmaq (Piece CID)
│ │ │ └── bafkzcibcaalmucskjhejhff44ddxyrbck54j3ei5tmybujnkchrixwsouranqhi
│ │ └── bafkzcibcaamxpurbxkxhikuinbeek43fhjsm2qfm47chupscmkizwlnr6vtrefy
│ └── bafkzcibcaani5xvksb3id3rpmv2o2eqkiohwfr4ivluj4qekfvn2rntwqtqnslq
└── bafkzcibcaao3vgx4thj5krfvnwj7zecxrf3qwep7omyyhrhnsh3v25at3wopwfq
```
:::info
ℹ️ PoDSI is used by [Lighthouse](https://docs.lighthouse.storage/lighthouse-1/filecoin-virtual-machine/podsi) for example, as well as other aggregators and is part of the [FVM-Enabled Deal Aggregation FRC](https://github.com/filecoin-project/FIPs/pull/879) (currently WIP).
:::
## Chapter 4 - Reads
🌟 **Challenge**: use the Dagula CLI to download a CAR file of an upload from Hoverboard.
### IPFS gateways
🏅 [Trustless Gateway spec](https://specs.ipfs.tech/http-gateways/trustless-gateway/)
🏛️ [Freeway implementation repo](https://github.com/web3-storage/freeway)
📚 [IPFS Gateway architecture diagram](https://miro.com/app/board/uXjVMvZWPkY=/?moveToWidget=3458764576196584238&cot=14)
The IPFS gateway operated by web3.storage is called Freeway. It uses content claims to resolve and serve data as [described in the Content Claims section](#Content-resolution).
Freeway is a path based, trusted+trustless gateway. It is exposed publicly as [w3s.link](https://w3s.link) (aka "[edge gateway](https://github.com/web3-storage/reads/tree/main/packages/edge-gateway)") and [dag.w3s.link](https://dag.w3s.link), which is _trustless only_.
Freeway serves data from CARs at rest. Index data helps Freeway extract blocks as they are needed. This could be as part of a trusted UnixFS export or as a trustless DFS traversal CAR request. Freeway uses HTTP range requests to extract just the bytes that are needed for a given block. Its blockstore is able to send a single request with a bigger range when multiple blocks that are close together are required.
⭐ **Mini Challenge:** The web3.storage gateway currently runs on Cloudflare Workers. What might some of the challenges of running a gateway in this environment be? Hint: workers are _significantly_ resource constrained.
#### Libraries
🏛️ [gateway-lib library](https://github.com/web3-storage/gateway-lib)
🏛️ [dagula library](https://github.com/web3-storage/dagula)
Over time we've had to set up a few different gateways: `w3s.link` and `dag.w3s.link`, as well as `nftstorage.link` and an experimental AWS Lambda based gateway, [Autobahn](https://github.com/web3-storage/autobahn). We extracted a [gateway-lib](https://github.com/web3-storage/gateway-lib) library that provides a toolkit for routing, CORS and caching, but primarily handlers for each content type that needs to be served e.g. UnixFS, raw block, CAR.
The gateway library uses [Dagula](https://github.com/web3-storage/dagula) as a general purpose path traversal and content exporter tool. Dagula is similar to [Lassie](https://github.com/filecoin-project/lassie) except it has no built in functionality for retrieving from different sources. It simply takes a blockstore interface and exports data by requesting blocks.
Freeway instantiates Dagula with a blockstore that is backed by Content Claims and HTTP range requests to a Cloudflare R2 bucket. There is a Dagula CLI that creates an instance with a blockstore that is backed by libp2p and bitswap.
#### Ex-racing gateway
The edge gateway (`w3s.link`) was once a racing gateway. It used to dispatch a request to multiple gateways (including Freeway) and serve the first to respond. Now the edge gateway uses Freeway only and redirects to `dweb.link` if Freeway is not able to serve the request. Note: this includes requests for IPNS paths!
#### CDN cache
Both `w3s.link` and `dag.w3s.link` make use of [Cloudflare's CDN cache](https://developers.cloudflare.com/workers/runtime-apis/cache/). Every successful response under 500MB (the CF limit for cached items) is cached.
#### Denylist
🏛️ [denylist implementation repo](https://github.com/web3-storage/reads/tree/main/packages/denylist)
Both `w3s.link` and `dag.w3s.link` use our denylist API. The denylist is a superset of the [badbits](https://badbits.dwebops.pub/) list. It has a simple API (`GET /:cid`) that returns a 404 status for items _NOT_ on the denylist and a 200 status otherwise.
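That status-code convention can be wrapped in a small client-side check. `fetchStatus` here is an injected, hypothetical stand-in for a real HTTP GET, so the sketch works without access to the internal service:

```javascript
// Sketch: check a CID against the denylist API described above.
// fetchStatus is an injected stand-in returning an HTTP status code.
async function isDenied(cid, fetchStatus) {
  const status = await fetchStatus(`/${cid}`); // GET /:cid
  if (status === 200) return true;  // on the denylist
  if (status === 404) return false; // NOT on the denylist
  throw new Error(`unexpected status: ${status}`);
}
```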
The denylist is currently an internal tool and is therefore not documented publicly. You can access it, for example, like this:
[denylist.dag.haus/bafybeia6ddkaspqckx324sv3m2cc4dh23ki6qgznx3k2una2ejkvxnk4iy](https://denylist.dag.haus/bafybeia6ddkaspqckx324sv3m2cc4dh23ki6qgznx3k2una2ejkvxnk4iy)
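In code, a denylist check is a single GET where the status code carries the answer. A hedged sketch (the `denylist.dag.haus` host is taken from the example above; error handling is minimal):

```javascript
const DENYLIST_URL = 'https://denylist.dag.haus'

// Build the lookup URL for a CID.
const denylistUrl = cid => `${DENYLIST_URL}/${cid}`

// Interpret the response status: 200 means the CID IS on the denylist,
// 404 means it is not. Anything else is unexpected.
function isDenied (status) {
  if (status === 200) return true
  if (status === 404) return false
  throw new Error(`unexpected denylist status: ${status}`)
}

// Usage (network call, not executed here):
// const res = await fetch(denylistUrl(cid))
// if (isDenied(res.status)) { /* refuse to serve this CID */ }
```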
#### CAR native gateway
Since web3.storage content-addresses CAR shards, it is trivial to serve them directly from the gateway. You can request any CAR from `w3s.link` using a CAR CID. HTTP range requests are also permitted.
e.g. [w3s.link/ipfs/bagbaierabxhdw7wglmlehzgobjuoq3v3bdv64iagjdhu74ysjvdecrezxldq](https://w3s.link/ipfs/bagbaierabxhdw7wglmlehzgobjuoq3v3bdv64iagjdhu74ysjvdecrezxldq)
This functionality enables _any_ client to operate as a "freeway", including Saturn nodes and browser applications.
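A hedged sketch of what such a client-side "freeway" request looks like: fetch a byte range of a CAR shard directly from the gateway. The helper below only constructs the request; the `/ipfs/<car-cid>` path shape matches the example above, and standard HTTP Range semantics are assumed.

```javascript
// Build the pieces of an HTTP range request for a CAR shard on w3s.link.
// start/end are inclusive byte offsets, per standard Range semantics.
function carRangeRequest (carCid, start, end) {
  return {
    url: `https://w3s.link/ipfs/${carCid}`,
    headers: { Range: `bytes=${start}-${end}` }
  }
}

// Usage (network call, not executed here):
// const { url, headers } = carRangeRequest(carCid, 0, 1023)
// const res = await fetch(url, { headers }) // expect 206 Partial Content
```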
#### Dudewhere and Satnav indexes
These are legacy indexes that Freeway used to rely on exclusively to serve content.
* Dudewhere - Mapping of root data CIDs to CAR CID(s).
* Satnav - Indexes of block offsets within CARs.
:::warning
🚧 **Work in Progress** - These indexes are currently being replaced with Content Claims.
:::
### Bitswap
A libp2p bitswap API is available for all content stored to web3.storage.
#### Elastic IPFS
🎬 [The Rise of Elastic IPFS video](https://youtu.be/OHmoh2aPyiY?si=eLVrgEcHDBEFMa9p)
🎬 [Elastic IPFS, Freeway and Dudewhere/Satnav indexes video](https://youtu.be/A1c6JEEOgW4?si=Roc5tTg9-54tOb4b)
Elastic IPFS is a cloud native IPFS implementation that reads from CAR files at rest. Elastic IPFS consists of three main subsystems:
1. **Indexing subsystem** - a Lambda, driven by S3 bucket events, that reads CAR files and writes block index data to DynamoDB.
2. **Peer subsystem** - a Kubernetes cluster of long-lived peers that use the indexing subsystem to read blocks from CAR files in buckets and send them out over bitswap.
3. **Publishing subsystem** - an SQS queue and consumers that write IPNI advertisements.
:::warning
🚧 **Work in Progress** - We are in the process of transitioning to the second iteration of the system.
[Hoverboard](#Hoverboard) has replaced the Kubernetes based bitswap peer.
[Content Claims](#Content-claims) are currently in use in production and will eventually fully replace the DynamoDB index.
The [`ipni/offer` invocation](#IPNI) will replace the automatic IPNI publishing subsystem.
:::
#### Hoverboard
🎬 [Hoverboard video](https://drive.google.com/file/d/1A_a6AjnShT2Digane1zwJ0-oBniYflnO/view)
📚 [Hoverboard architecture diagram](https://miro.com/app/board/uXjVMvZWPkY=/?moveToWidget=3458764564114524183&cot=14)
🏛️ [Hoverboard implementation repo](https://github.com/web3-storage/hoverboard)
Hoverboard is a bitswap peer that runs in Cloudflare workers. It uses Dagula with a blockstore that is backed by Content Claims and a Cloudflare R2 bucket.
The libp2p address for hoverboard is:
```console
/dns4/elastic.dag.house/tcp/443/wss/p2p/bafzbeibhqavlasjc7dvbiopygwncnrtvjd2xmryk5laib7zyjor6kf3avm
```
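To see what each segment of that address means, here is a hedged sketch that splits a multiaddr of this shape into protocol/value pairs (a toy parser for illustration, not the real `@multiformats/multiaddr` library):

```javascript
// Toy parser for simple multiaddrs of the /proto/value/proto/value... shape.
// Flag-like protocols (wss here) carry no value component.
const VALUELESS = new Set(['wss', 'ws', 'https', 'http', 'quic'])

function parseMultiaddr (addr) {
  const parts = addr.split('/').filter(Boolean)
  const pairs = []
  for (let i = 0; i < parts.length; ) {
    const proto = parts[i++]
    const value = VALUELESS.has(proto) ? null : parts[i++]
    pairs.push([proto, value])
  }
  return pairs
}
```

Applied to hoverboard's address, this yields `dns4 → elastic.dag.house` (the DNS name), `tcp → 443`, `wss` (websocket over TLS), and `p2p → bafzbei…` (the peer ID).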
:::warning
🚧 **Work in Progress** - We're teaching Hoverboard to use Content Claims! Currently Hoverboard reads from the Elastic IPFS DynamoDB index.
:::
### Roundabout
📚 [Roundabout docs](https://github.com/web3-storage/w3infra/blob/main/docs/roundabout.md)
The Roundabout is a tool for obtaining a signed URL for a given CAR or piece CID that allows it to be downloaded directly from a Cloudflare R2 bucket without incurring egress charges.
Roundabout URLs are used by Filecoin Storage Providers to download CAR content for inclusion in deals.
## Chapter 5 - Miscellanea
### w3ui
🏛️ [w3ui implementation repo](https://github.com/web3-storage/w3ui)
Headless, type-safe UI components for the web3.storage APIs. The w3ui project originally provided components for React, Solid and Vue. Due to significant refactors in the client library and time/resource constraints, w3ui currently provides components for React only.
### console.web3.storage
🏛️ [console implementation repo](https://github.com/web3-storage/console)
Web UI for the web3.storage service. Console is an example of how to build an app with w3ui.
[console.web3.storage](https://console.web3.storage)
### w3name
🏛️ [w3name implementation repo](https://github.com/web3-storage/w3name)
An HTTP API to IPNS. It also provides a feature where changes to IPNS keys are pushed over a websocket connection to connected clients. IPNS record updates are also published to the IPFS DHT and periodically re-published.
### w3clock
🏅 [w3clock spec](https://github.com/web3-storage/specs/blob/main/w3-clock.md)
🏅 [Merkle Clock spec](https://arxiv.org/pdf/2004.00107.pdf)
🏛️ [w3clock implementation repo](https://github.com/web3-storage/w3clock)
A UCAN-based Merkle Clock implementation.
### For the interested reader
Additional tech used in web3.storage you might like to explore:
* [ipfs-car](https://github.com/web3-storage/ipfs-car) - CLI tool (and library) for packaging and unpacking CARs and inspecting their contents
* [add-to-web3](https://github.com/web3-storage/add-to-web3) - A github action that...well...you can guess
* [UCAN invocation stream & metrics](https://github.com/web3-storage/w3infra/blob/main/docs/ucan-invocation-stream.md) - All UCAN invocations go to a stream, to which we can attach consumers to perform actions
* [billing](https://github.com/web3-storage/w3infra/blob/main/docs/billing.tldr) - How web3.storage customers are billed for their usage
* [pail](https://github.com/web3-storage/pail) - Merkle-CRDT based key value store. [Video](https://www.youtube.com/watch?v=ukfrmBVrpo8).
## Appendix
### Resources
* [learnyouw3up workshopper](https://github.com/web3-storage/learnyouw3up)
* web3.storage/docs
* [UCAN](https://web3.storage/docs/concepts/ucans-and-web3storage/)
* [upload vs store](https://web3.storage/docs/concepts/upload-vs-store/)
* [PoDSI](https://web3.storage/docs/concepts/podsi/)
* Main repos
* [ucan-wg](https://github.com/ucan-wg)
* [ucan spec](https://github.com/ucan-wg/spec)
* [invocations spec](https://github.com/ucan-wg/invocation)
* [web3.storage specs](https://github.com/web3-storage/specs)
* [w3up](https://github.com/web3-storage/w3up)
* [w3infra](https://github.com/web3-storage/w3infra)
* [w3cli](https://github.com/web3-storage/w3cli)
* [console](https://github.com/web3-storage/console)
* [w3ui](https://github.com/web3-storage/w3ui)
* [w3filecoin](https://github.com/web3-storage/w3filecoin)
* [filecoin-api](https://github.com/web3-storage/w3up/tree/main/packages/filecoin-api)
* [filecoin-client](https://github.com/web3-storage/w3up/tree/main/packages/filecoin-client)
* [w3filecoin-infra](https://github.com/web3-storage/w3filecoin-infra)
* [data-segment](https://github.com/web3-storage/data-segment)
* [fr32-sha2-256-trunc254-padded-binary-tree-multihash](https://github.com/web3-storage/fr32-sha2-256-trunc254-padded-binary-tree-multihash)
* [edge-gateway](https://github.com/web3-storage/reads/tree/main/packages/edge-gateway)
* [Freeway](https://github.com/web3-storage/freeway)
* [dag.w3s.link](https://github.com/web3-storage/dag.w3s.link)
* [gateway-lib](https://github.com/web3-storage/gateway-lib)
* [dagula](https://github.com/web3-storage/dagula)
* [Hoverboard](https://github.com/web3-storage/hoverboard)
* [Content Claims](https://github.com/web3-storage/content-claims)
* [Ucanto](https://github.com/web3-storage/ucanto)
* [js-unixfs](https://github.com/ipld/js-unixfs)
* Videos
* [Brief history of NFT.Storage/web3.storage](https://youtu.be/2ym2i7yULdk?si=_k3CTcC_WMPf0Z6U)
* [DID + UCAN primer](https://youtu.be/grec5KQeU2U?si=Hd3r2ldsY83JJdvM)
* [The Rise of Elastic IPFS](https://youtu.be/OHmoh2aPyiY?si=eLVrgEcHDBEFMa9p)
* [Elastic IPFS, Freeway and Dudewhere/Satnav indexes](https://youtu.be/A1c6JEEOgW4?si=Roc5tTg9-54tOb4b)
* [Where is my CAR? (w3filecoin pipeline)](https://www.youtube.com/watch?v=vAzjXtRqChg)
* [How data aggregation for filecoin works](https://drive.google.com/file/d/1lJ-66tbtQXbmgkm75G0ZmCmio9jr7svA/view)
* [Hoverboard](https://drive.google.com/file/d/1A_a6AjnShT2Digane1zwJ0-oBniYflnO/view)
* Diagrams
* [Writes pipeline](https://gist.github.com/olizilla/f92c41e4edb963fed5e56ef894bf74f4#file-w3s-writes-pipeline-legacy-md)
* [Supplemental diagram with content claims](https://miro.com/app/board/uXjVMvZWPkY=/)