Learn you a web3.storage stack
Welcome to the self-directed learning programme for Saturn devs!
Developers are encouraged to explore and learn using this framework as a guide. Collect questions and areas where you need clarification and bring them to the AMA sessions.
⚠️ In week 1 you should aim to cover chapters 1 and 2.
⚠️ In week 2 you should aim to cover chapters 3 and 4.
Each chapter has a challenge: consume the materials, then attempt to complete the challenge.
Key
🏅 Specification 📚 Documentation 🎬 Video presentation 🏛️ Library repo
Prelude
🎬 A brief history of web3.storage/NFT.Storage video
Chapter 1 - Basics
🌟 Challenge: complete learnyouw3up workshopper.
DID
🏅 DID spec
🏅 `did:mailto` spec
DID stands for Decentralized Identifier. A DID is a URI that starts with `did:`. A DID is just an identifier that identifies something, anything really.
A DID method is the part after `did:`, e.g. in the DID `did:key:z6MkiK3h93N8kE6jmAgTWd33QmuyDcsMnaAeQrGjp9iixT2B`, the method is `key`.
In web3.storage we use the following DID methods:
- `did:key` - used for agents and spaces. We use ed25519 keys, and RSA keys in the browser when necessary. Note: the text following `did:key:` is the public key of the cryptographic keypair.
- `did:mailto` - used as a method of easily sharing permissions across devices and for billing purposes.
- `did:web` - used to identify services, e.g. `did:web:web3.storage`.
⭐ Mini Challenge: Why does web3.storage use RSA keys in the browser? Hint: search "webcrypto non-extractable".
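To make `did:key` concrete, here is a minimal sketch assuming the `@ucanto/principal` signer library used across the w3up stack:

```js
// Minimal sketch (assuming @ucanto/principal): generate an agent
// keypair and inspect its did:key.
import * as ed25519 from '@ucanto/principal/ed25519'

const signer = await ed25519.generate()

// The method is "key"; the text after "did:key:" is the multibase
// encoded public key of the keypair.
console.log(signer.did()) // did:key:z6Mk...
```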
UCAN
🏅 UCAN spec
🎬 DID + UCAN primer video
📚 UCANs and web3.storage docs
UCAN stands for User Controlled Authorization Networks. UCANs are an IPLD native authorization token. Each UCAN contains a chain of signed delegations (UCANs) that prove the issuer of the UCAN has been granted the ability to perform some action.
There is no central authorization server because each UCAN contains all the information required to prove the issuer of the UCAN has been granted the ability to perform an action.
You sign a UCAN with your private key whenever you want to interact with web3.storage. For example, when you want to store a file you create a UCAN with the following information:
- Audience: the DID of the service that will execute the capability (`did:web:web3.storage`).
- Capability: the name of the action to perform (`store/add` in this example). You can think of this as the method name of a Remote Procedure Call (RPC).
- Caveats: the parameters of the invocation. In a `store/add` invocation UCAN, the parameters are the CID of the data being stored and its size in bytes.
- Proofs: delegations proving you have been granted `store/add` to your space resource with the caveats you have specified.
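Putting those fields together, a `store/add` invocation can be pictured as the object below. This is a conceptual sketch of the UCAN fields, not a literal client API; `agent`, `space`, `carCid`, `carSize` and `proofOfStoreAdd` are hypothetical placeholders:

```js
// Conceptual sketch: the shape of a store/add invocation UCAN.
// All bindings used here (agent, space, carCid, ...) are hypothetical.
const invocation = {
  issuer: agent.did(),                  // the agent signing the UCAN
  audience: 'did:web:web3.storage',     // the service executing the capability
  capabilities: [{
    can: 'store/add',                   // the "RPC method name"
    with: space.did(),                  // the resource: your space DID
    nb: { link: carCid, size: carSize } // caveats: CAR CID + size in bytes
  }],
  proofs: [proofOfStoreAdd]             // delegation chain granting store/add
}
```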
Delegation
Delegation is the superpower of UCANs. If you are in possession of a delegated capability, you can re-delegate the same capability (or a subset of it) to another DID.
Your delegated UCAN must include proof that you have been granted the capability you are delegating. This is the delegation chain.
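A minimal sketch of re-delegation, assuming the `@web3-storage/w3up-client` API (`createDelegation` embeds this agent's own proofs so the chain stays intact):

```js
// Minimal sketch (assuming @web3-storage/w3up-client): re-delegate
// capabilities this agent holds to another agent's DID.
import { create } from '@web3-storage/w3up-client'
import * as DID from '@ipld/dag-ucan/did'

const client = await create()
const friend = DID.parse('did:key:z6Mk...') // the DID you are delegating to

// Delegate a subset of our capabilities to the friend agent.
const delegation = await client.createDelegation(friend, ['store/add', 'upload/add'])

// Serialize to CAR bytes for transport to the other agent.
const archive = await delegation.archive()
```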
Invocation
🏅 Invocations spec
Possessing a delegated capability is worth nothing unless you use it.
ℹ️ Creating a UCAN for sending to web3.storage (invocation) is not significantly different to creating a UCAN for the purpose of delegating a capability.
Invoking a delegated UCAN capability is the act of sending it to a service in order to achieve a task. It is like a Remote Procedure Call (RPC).
Receipts are issued in response to a UCAN invocation. They are signed by the executor (the task runner) and are an attestation of the result of the task plus any requests for further actions to be performed (async tasks). These async future tasks are called "effects".
Ucanto is the library web3.storage uses to perform UCAN RPC:
🏛️ JS Ucanto library
🏛️ Go Ucanto library
⚠️ Ucanto was written before the invocation spec existed so does not conform 100% yet.
⭐ Mini Challenge: What is the result of a `store/add` invocation? Hint: `export type StoreAddSuccess`.
Agents
You! Well, not exactly. An agent is just the keypair you use to sign UCANs. You will typically have control over many agents, because you have many devices (📱+💻+🖥️) and use many applications.
You'll end up with an agent for each application you're using on your phone, your laptop and your desktop. In fact you'll even have multiple agents for the same web application on different browsers.
Each agent has a DID, which is the public key of the keypair.
Spaces
🏅 w3 space spec
A space is just another cryptographic keypair! They are used by web3.storage to group stored items together. You can create as many spaces as you like, but they must be provisioned before they can be used for storing data.
When you create a space, your client will use the private key to sign a UCAN that delegates all capabilities to your current agent. The private key is then discarded, as it is no longer needed. This delegation is stored by your client and used in proofs when storing files to the space.
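A minimal sketch of space creation, assuming the `@web3-storage/w3up-client` API:

```js
// Minimal sketch (assuming @web3-storage/w3up-client): create a space.
// The space's private key signs a delegation to this agent, then is discarded.
import { create } from '@web3-storage/w3up-client'

const client = await create()
const account = await client.login('you@example.com')

// Passing an account also provisions the space (see the next section).
const space = await client.createSpace('my-docs', { account })
await client.setCurrentSpace(space.did())
```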
Provisioning
🏅 w3 provider spec
A space must be registered with a provider before it can be used. A provider in this context can only be web3.storage, but in the past it could also have been NFT.Storage.
You use your account DID to provision a space.
Provisioning a space simultaneously sets up a "subscription" relationship between the space and the account. By provisioning a space you are telling the service that your account is responsible for paying for usage of the space.
ℹ️ We will soon allow subscriptions to be added/removed so that the account responsible for paying for a space can be changed.
Accounts
🏅 w3 accounts spec
🏅 w3 session spec
Account DIDs are for the convenience of sharing capabilities across various user agents. Agents re-delegate their capabilities to an account DID and stash them with the web3.storage service to be claimed by other agents (See the Access section for more information on how delegations are stashed and claimed).
Agents "prove" themselves to be owned by the same entity that has owns the account DID and are then able to use account DID delegations as if they were their own.
The "proof" is an UCAN issued by the web3.storage service. It is a special attestation UCAN, that attests that the account DID is owned by the agent DID. We call this an Authorization Session.
The only way to obtain this attestation currently is by clicking on a verification email sent to a `did:mailto` account DID.
ℹ️ A user does not have to take advantage of this functionality. They can explicitly create delegations to another agent DID, exporting and importing them manually.
Additionally, account DIDs are used in billing. Account DIDs identify the billing entity for a given space. It is convenient for web3.storage to use an email address to identify customers with the payment provider (Stripe), so we use `did:mailto` at the moment.
ℹ️ Currently all account DIDs must be `did:mailto` DIDs.
Access
🏅 w3 access spec
In the Accounts section we talked about stashing and claiming delegations to your account DID. The access protocol formalises this and is very simple:
- Stash delegations with the service (`access/delegate`).
- Claim stashed delegations from the service (`access/claim`).
It is ok for web3.storage to store delegations on behalf of other users. They are not private data; they can only be used as proofs in a UCAN signed by the holder of the private key.
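A sketch of the claiming side, assuming the `@web3-storage/w3up-client` capability API:

```js
// Sketch (assuming @web3-storage/w3up-client): on a new device, claim
// delegations previously stashed against your account DID.
import { create } from '@web3-storage/w3up-client'

const client = await create()
await client.login('you@example.com') // prove ownership of the did:mailto

// access/claim: fetch delegations stashed earlier via access/delegate.
const delegations = await client.capability.access.claim()
console.log(delegations.map(d => d.capabilities))
```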
Chapter 2 - Writes
🌟 Challenge: send a PR to improve web3.storage/docs (repo), now that you have a better understanding of things.
w3up-clients
To write to web3.storage you need to use a client library to send CAR-encoded UCAN invocations to the service.
ℹ️ We're working on a bridge that will allow `curl` to be used to invoke the service.
There are two web3.storage clients available today, in JS and Go. The Go client is a work in progress and as such is not feature complete.
🏛️ JS client library
🏛️ Go client library
Store vs Upload
📚 Store vs Upload docs
To store data with web3.storage, clients send a UCAN invoking a `store/add` capability. They specify the space DID to store the data in, the CID of the data and the size of the data in bytes.
There is a maximum size for a single PUT request, so sometimes it is necessary to split an upload into multiple `store/add` requests. We call each CAR that is part of an upload a shard.
When all the shards are stored, a client will register an "upload" as the union of its shards by invoking `upload/add`.
This means that when users retrieve a list of their uploads they get a list of items they uploaded, not a list of shards they stored.
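A sketch of the two-step flow, assuming the `@web3-storage/w3up-client` capability API; `carShards` (an array of CAR blobs) and `rootCid` are hypothetical placeholders:

```js
// Sketch (assuming @web3-storage/w3up-client): store each shard with
// store/add, then register the upload with upload/add.
import { create } from '@web3-storage/w3up-client'

const client = await create()
// ... assume the client is logged in and a current space is set.

const shards = []
for (const car of carShards) {
  shards.push(await client.capability.store.add(car)) // store/add per shard
}

// upload/add: register the union of shards under the DAG root.
await client.capability.upload.add(rootCid, shards)
```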
Streaming UnixFS
🏅 UnixFS spec
🏛️ UnixFS writer library
Streaming UnixFS refers to encoding a file/directory into a UnixFS encoded DAG using web streams. Blocks are yielded from the stream as the tree is built.
This allows files/directories of arbitrary size to be encoded without using more memory than a fraction of the total file size. This is useful in the browser (where each tab receives only 2GB of RAM to operate), or other memory constrained devices.
⭐ Mini Challenge: What settings do web3.storage clients use by default when encoding UnixFS data?
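A rough sketch using the `@ipld/unixfs` writer library linked above (API names follow its README; treat details as approximate):

```js
// Rough sketch (assuming @ipld/unixfs): encode a file as a UnixFS DAG,
// yielding blocks through a stream so memory use stays bounded.
import * as UnixFS from '@ipld/unixfs'

const { readable, writable } = new TransformStream()
const writer = UnixFS.createWriter({ writable })

// Consume blocks as the tree is built. In a real client these would
// flow straight into CAR shards rather than being logged.
const consumer = (async () => {
  for await (const block of readable) console.log(block.cid.toString())
})()

const file = UnixFS.createFileWriter(writer)
file.write(new TextEncoder().encode('hello world'))
const { cid } = await file.close() // root CID of the file
await writer.close()
await consumer
```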
Sharding
The web3.storage clients expose a `ShardingStream` to transform a stream of blocks into a stream of CAR files. A client collects blocks until the shard size is reached, then a CAR is encoded and yielded from the stream.
The bytes of the CAR file are hashed, and a CAR CID created. You can identify a CAR CID by multicodec `0x0202`. For example: `bagbaiera6xcx7hiicm7sc523axbjf2otuu5nptt6brdzt4a5ulgn6qcfdwea`
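A sketch of how that CID is derived, using `multiformats` (`carBytes` is a hypothetical `Uint8Array` of CAR file bytes):

```js
// Sketch: a CAR CID is a CIDv1 with multicodec 0x0202 over the
// sha2-256 digest of the raw CAR bytes. carBytes is a placeholder.
import { CID } from 'multiformats/cid'
import { sha256 } from 'multiformats/hashes/sha2'

const CAR_CODEC = 0x0202
const digest = await sha256.digest(carBytes)
const carCid = CID.createV1(CAR_CODEC, digest)
console.log(carCid.toString()) // bagbaiera...
```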
The CAR CID is the CID used in a `store/add` invocation.
⭐ Mini Challenge: What is the default shard size in the web3.storage clients? Specify two reasons for this.
Upload targets (signed URLs)
The receipt for a `store/add` invocation contains a signed URL. It is signed by the service that will receive the upload, and restricted such that only content that corresponds to the specified hash and size may be uploaded.
This decouples the service that receives the `store/add` invocation from the service at the URL that receives the data.
The service that receives the `store/add` invocation can make authorization and routing decisions, instructing the client where to send the bytes and how to authorize that send (e.g. via an upload-target-specific `authorization` header in `headers`).
ℹ️ Originally designed for compatibility with Amazon S3 presigned URLs or Cloudflare R2 presigned URLs.
Conditional upload
The receipt for a `store/add` invocation also contains a `status` field that indicates whether the upload needs to be performed, or if web3.storage has already received a file with the same CID.
If a receipt for a `store/add` invocation contains `status: 'done'` then the file does not need to be uploaded.
Hooray for content addressing!
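A conceptual sketch of the client-side branch (field names follow the `StoreAddSuccess` type mentioned earlier; `invokeStoreAdd` and `putCar` are hypothetical helpers):

```js
// Conceptual sketch: acting on a store/add receipt.
// invokeStoreAdd() and putCar() are hypothetical helpers.
const receipt = await invokeStoreAdd(car)
const result = receipt.out.ok

if (result.status === 'done') {
  // web3.storage already has bytes matching this CID - skip the upload.
} else {
  // status 'upload': PUT the CAR to the signed URL with the given headers.
  await putCar(result.url, result.headers, car)
}
```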
Indexing
🏅 CARv2 spec
🏛️ Cardex library
In order to serve the data via a trusted or trustless IPFS gateway it is necessary to index the contents of uploaded CARs, so we know which blocks are contained within a CAR (their CIDs) as well as their byte offsets. We use CARv2 indexes for this.
CARv2 adds to the CAR format the ability to include an optional index. The spec defines two possible index formats. The `MultihashIndexSorted` format is the index predominantly used in Filecoin, and is also the format used by web3.storage.
⚠️ `MultihashIndexSorted` maps multihashes to byte offsets. It does not include length. The offset is the byte offset of a varint that prefixes every CID+bytes pair in a CAR file. The varint is the length. So in order to know how much data you need to read for any given block, you must first read the varint.
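A sketch of what that means for a reader, using the `varint` npm package; `readRange` is a hypothetical helper that performs an HTTP range request and returns a `Uint8Array`:

```js
// Sketch: read one block from a remote CAR given only the byte offset
// from a MultihashIndexSorted index. readRange() is hypothetical.
import varint from 'varint'

async function readBlock (carUrl, offset) {
  // The offset points at the length-prefix varint, so read that first.
  const head = await readRange(carUrl, offset, offset + 9)
  const length = varint.decode(head)  // length of the CID + block bytes
  const prefix = varint.decode.bytes  // bytes consumed by the varint itself
  // A second request fetches exactly this block's section.
  return readRange(carUrl, offset + prefix, offset + prefix + length - 1)
}
```

Note the extra round trip required just to learn the block length.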
Clients also generate multiple "Content Claims" to assert that the index they uploaded relates to the CAR they uploaded.
e.g. `bagbaierajqk22ta4w4jhes5er7kqibn5d7tqls5ffbbzyrpukbsvwylp4e7q` includes `bagaqqera4sswi5amck6hsfyv5hojdiaq2hbscgzayen6p75vj3tx4nlzn6ca`
There will be more information about Content Claims in the next Chapter.
🚧 Work in Progress - CARv2 indexes and block indexes are currently generated and stored on behalf of users, triggered by S3/R2 bucket events. Content claims are materialized on demand. i.e. The client libraries do not yet generate indexes(!), store them or publish content claims for them.
⭐ Mini Challenge: Why is `MultihashIndexSorted` not a great index format for reading blocks directly from a remote CAR file?
w3up-api
🏛️ w3up protocol implementation libraries
🏛️ w3up infrastructure repo
The implementation of the web3.storage specs is separated from infrastructure or environment specifics. The w3up repo is a monorepo that contains many JavaScript packages - libraries that are meant to be depended on by other packages, which instantiate the implementations with appropriate adapters to the underlying storage and networking layers. As such, the w3up packages should run almost anywhere - Node.js, Workerd, AWS Lambda etc.
The w3infra repo consumes (depends on) the implementation in the w3up repo. It is deployed to AWS, and uses primitives like Lambda, SQS, DynamoDB and S3.
Infrastructure is managed using sst.dev, which is an Infrastructure-as-Code-in-JS toolkit. Seed.run automatically manages deployments. Each PR generates a deployment, and merging to `main` deploys to a staging environment. Deployments to staging are promoted to production manually when desired.
The w3up service runs at up.web3.storage. All web3.storage services have a staging deployment.
IPNI
🏅 IPNI spec
📚 IPNI docs
🏛️ IPNI library
IPNI stands for InterPlanetary Network Indexers. Advertising to IPNI ensures that content is discoverable in the IPFS network without having to publish provider records to the IPFS DHT. For large storage providers (like web3.storage), it is more scalable and sustainable.
After content has been stored, clients opt in to IPNI indexing by sending an `ipni/offer` invocation. Then web3.storage constructs the appropriate IPNI advert and informs IPNI that it exists.
🚧 Work in Progress - Advertising to IPNI is currently done automatically on behalf of users, triggered by S3/R2 bucket events. i.e. clients do not send `ipni/offer` invocations(!). The spec for offering data to IPNI via a UCAN invocation is in review.
Chapter 3 - Writes cont.
🌟 Challenge: upload a file at the start of the day and find your PoDSI inclusion proof in console.web3.storage at the end of the day.
Content claims
🏅 Content Claims spec (PR)
🏛️ Content Claims implementation repo
Content Claims is a UCAN system allowing actors to assert information about content (identified by a CID).
For example, you can sign a UCAN that claims a CID can be found at a given URL.
There are a few specified types of claims available:
- Location claims: assert that the bytes of a CID can be fetched from a given URL.
- Inclusion claims: assert that the content of one CID includes the content of another, e.g. a CAR includes the blocks listed in a CARv2 index.
- Partition claims: assert that the DAG behind a content CID can be read from a set of CARs (parts).
- Relation claims: assert that a block links directly to other blocks, and which parts those blocks can be found in.
Generated claims
Client libraries generate content claims and submit them to web3.storage when uploading data. The claims created automatically by clients are as follows:
🚧 Work in Progress - We're actively working on adding this functionality to the client. The old web3.storage API (pre-UCANs) generated these claims for uploaded content on behalf of the client.
Content resolution
Given a root CID for some data, clients can learn all the information they need to extract blocks in the correct order from a CAR file:
1. Look up claims for the root CID to discover the CARs (parts) its DAG is partitioned into.
2. Look up claims for each CAR to find its location and the index that describes it.
3. Look up claims for the index to find its location, then read it to learn block offsets within the CAR.
HTTP API
📚 HTTP API docs
Content Claims has a very simple HTTP API that returns a CAR file of claims for a given CID.
The `?walk=` query parameter allows properties of content claims to be "expanded", effectively allowing steps 1-3 of the content resolution to be performed in a single request.
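For example (the `claims.web3.storage` host and the `walk` values here are assumptions; check the HTTP API docs for the exact endpoint and parameter names):

```js
// Sketch: fetch claims for a CID, walking into related claims.
// Host and walk parameter values are assumptions.
const cid = 'bafybeia6ddkaspqckx324sv3m2cc4dh23ki6qgznx3k2una2ejkvxnk4iy'
const res = await fetch(`https://claims.web3.storage/claims/${cid}?walk=parts,includes`)

// The response body is a CAR file containing the claims themselves.
const claimsCar = new Uint8Array(await res.arrayBuffer())
```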
GraphQL API
🏛️ GraphQL implementation repo
There's an experimental GraphQL API available at graphql.claims.dag.haus.
Filecoin pipeline
🏅 w3filecoin spec
🏛️ w3filecoin protocol implementation libraries
🏛️ w3filecoin infrastructure repo
🎬 DUDE! Where's my CAR? (w3filecoin pipeline) video
Clients opt into storing their data on Filecoin by submitting a `filecoin/offer` invocation. The invocation includes the shard CID and a piece CID, which are both calculated client-side.
A UCAN invocation receipt is generated asynchronously and contains the inclusion proof and data aggregation proof for the offered piece, i.e. a proof that the piece has been included in an aggregate and a proof that the aggregate has been included in a Filecoin deal. The inclusion proof is called PoDSI for short.
🚧 Work in Progress - Currently all uploads are automatically submitted to the Filecoin pipeline when received. The client does not yet send this invocation. However, piece CIDs are calculated.
Pieces
📚 Piece docs
🏅 Piece Multihash and v2 Piece CID spec
🏛️ fr32-sha2-256-trunc254-padded-binary-tree-multihash library
🏛️ data-segment library
Piece CIDs (CommP) are used to address data in Filecoin deals. In short, they are an alternative merkle DAG encoding of some data. They address a binary tree, with "padding".
Piece CID calculation is done in the client by the Rust WASM based fr32-sha2-256-trunc254-padded-binary-tree-multihash library.
In web3.storage we use v2 piece CIDs because they encode the tree size, which otherwise must always be passed around with the piece CID.
The data-segment library provides functionality for generating an aggregate from a set of pieces. It is based on the go-data-segment library.
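A rough sketch of computing a piece CID client-side, assuming the `@web3-storage/data-segment` API (`Piece.fromPayload` per its README; treat as approximate):

```js
// Rough sketch (assuming @web3-storage/data-segment): compute the
// piece CID (CommP) for a CAR shard. carBytes is a placeholder.
import { Piece } from '@web3-storage/data-segment'

const piece = Piece.fromPayload(carBytes)
// A v2 piece CID encodes the tree size alongside the root hash.
console.log(piece.link.toString())
```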
Aggregation
🎬 How data aggregation for Filecoin works video
📚 Verifiable aggregation docs
The above diagram shows the actors in the aggregation pipeline and the UCAN invocations that allow pieces to be tracked. All actors except the Broker and Storage Providers are run by web3.storage.
The aggregation pipeline makes use of asynchronous `fx` (defined by the UCAN invocation spec) to allow an offered piece to be tracked through the pipeline. As each stage completes, the service issues a signed receipt with the result for the current stage and `fx` for future tasks.
The receipt for `filecoin/offer` contains two `fx` tasks:
- `filecoin/submit` - allows detailed tracing through the pipeline at every stage, i.e. `piece/offer` to `piece/accept` to `aggregate/offer` and finally `aggregate/accept`.
- `filecoin/accept` - resolves to a receipt that contains the data inclusion proof (inclusion proof + deal ID) when the pipeline has completed processing the piece.
The documentation for the pipeline explains the stages in more depth.
💡 web3.storage implements and runs a storefront, aggregator, dealer and deal tracker. However, these services could be run by other parties.
PoDSI
🏅 PoDSI spec
📚 PoDSI docs
PoDSI stands for Proof of Data Segment Inclusion. It is a proof that a smaller piece (a segment) has been included in a larger piece (an aggregate).
Example merkle proof showing the path from the aggregate (piece) CID to the segment (piece) CID.
ℹ️ PoDSI is used by Lighthouse, for example, as well as other aggregators, and is part of the FVM-Enabled Deal Aggregation FRC (currently a WIP).
Chapter 4 - Reads
🌟 Challenge: use the Dagula CLI to download a CAR file of an upload from Hoverboard.
IPFS gateways
🏅 Trustless Gateway spec
🏛️ Freeway implementation repo
📚 IPFS Gateway architecture diagram
The IPFS gateway operated by web3.storage is called Freeway. It uses content claims to resolve and serve data as described in the Content Claims section.
Freeway is a path-based, trusted+trustless gateway. It is exposed publicly as w3s.link (aka the "edge gateway") and dag.w3s.link, which is trustless only.
Freeway serves data from CARs at rest. Index data helps Freeway extract blocks as they are needed. This could be as part of a trusted UnixFS export or as a trustless DFS-traversal CAR request. Freeway uses HTTP range requests to extract just the bytes that are needed for a given block. Its blockstore is able to send a single request with a bigger range when multiple blocks that are close together are required.
⭐ Mini Challenge: The web3.storage gateway currently runs on Cloudflare Workers. What might some of the challenges be when running a gateway in this environment? Hint: workers are significantly resource constrained.
Libraries
🏛️ gateway-lib library
🏛️ dagula library
Over time we've had to set up a few different gateways: `w3s.link`, `dag.w3s.link`, as well as `nftstorage.link` and an experimental AWS Lambda based gateway, Autobahn. We extracted a gateway-lib library that provides a toolkit for routing, CORS and caching, but primarily handlers for each content type that needs to be served, e.g. UnixFS, raw block, CAR.
The gateway library uses Dagula as a general-purpose path traversal and content exporter tool. Dagula is similar to Lassie except it has no built-in functionality for retrieving from different sources. It simply takes a blockstore interface and exports data by requesting blocks.
Freeway instantiates Dagula with a blockstore that is backed by Content Claims and HTTP range requests to a Cloudflare R2 bucket. There is a Dagula CLI that creates an instance with a blockstore that is backed by libp2p and bitswap.
Ex-racing gateway
The edge gateway (`w3s.link`) was once a racing gateway. It used to dispatch a request to multiple gateways (including Freeway) and serve the first to respond. Now the edge gateway uses Freeway only, and redirects to `dweb.link` if Freeway is not able to serve the request. Note: this includes when an IPNS path is requested!
CDN cache
Both `w3s.link` and `dag.w3s.link` make use of Cloudflare's CDN cache. Every successful response under 500MB (the CF limit for cached items) is cached.
Denylist
🏛️ denylist implementation repo
Both `w3s.link` and `dag.w3s.link` use our denylist API. The denylist is a superset of the badbits list. It has a simple API (`GET /:cid`) that returns a 404 status for items NOT on the denylist, and a 200 status otherwise.
The denylist is currently an internal tool and is not publicly documented for that reason. You can access it, for example, like this:
denylist.dag.haus/bafybeia6ddkaspqckx324sv3m2cc4dh23ki6qgznx3k2una2ejkvxnk4iy
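A small sketch of checking the list (note the inverted semantics: 404 means the CID is allowed):

```js
// Sketch: query the denylist API. 404 = NOT on the denylist (allowed),
// 200 = on the denylist (blocked).
async function isDenied (cid) {
  const res = await fetch(`https://denylist.dag.haus/${cid}`)
  return res.status === 200
}
```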
CAR native gateway
Since we content-address CAR shards in web3.storage, it is trivial to serve them directly from the gateway. You can request any CAR from `w3s.link` using a CAR CID. HTTP range requests are also permitted.
e.g. w3s.link/ipfs/bagbaierabxhdw7wglmlehzgobjuoq3v3bdv64iagjdhu74ysjvdecrezxldq
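For example, fetching just the first kilobyte of the CAR above with a range request:

```js
// Sketch: fetch the first 1 KiB of a CAR shard from the gateway.
const carCid = 'bagbaierabxhdw7wglmlehzgobjuoq3v3bdv64iagjdhu74ysjvdecrezxldq'
const res = await fetch(`https://w3s.link/ipfs/${carCid}`, {
  headers: { Range: 'bytes=0-1023' }
})
const bytes = new Uint8Array(await res.arrayBuffer()) // begins with the CAR header
```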
This functionality enables any client to operate as a "freeway", including Saturn nodes and browser applications.
Dudewhere and Satnav indexes
These are legacy indexes that Freeway used to rely on exclusively to serve content.
🚧 Work in Progress - These indexes are currently being replaced with Content Claims.
Bitswap
A libp2p bitswap API is available for all content stored to web3.storage.
Elastic IPFS
🎬 The Rise of Elastic IPFS video
🎬 Elastic IPFS, Freeway and Dudewhere/Satnav indexes video
Elastic IPFS is a cloud-native IPFS implementation that reads from CAR files at rest. Elastic IPFS consists of three main subsystems: an indexing subsystem (backed by DynamoDB), a Kubernetes based bitswap peer, and an IPNI publishing subsystem.
🚧 Work in Progress - We are in the process of transitioning to the second iteration of the system.
Hoverboard has replaced the Kubernetes based bitswap peer.
Content Claims are currently in use in production and will eventually fully replace the DynamoDB index.
The `ipni/offer` invocation will replace the automatic IPNI publishing subsystem.
Hoverboard
🎬 Hoverboard video
📚 Hoverboard architecture diagram
🏛️ Hoverboard implementation repo
Hoverboard is a bitswap peer that runs in Cloudflare workers. It uses Dagula with a blockstore that is backed by Content Claims and a Cloudflare R2 bucket.
The libp2p address for hoverboard is:
🚧 Work in Progress - We're teaching Hoverboard to use Content Claims! Currently Hoverboard reads from the Elastic IPFS DynamoDB index.
Roundabout
📚 Roundabout docs
The Roundabout is a tool for obtaining a signed URL for a given CAR or piece CID that allows it to be downloaded directly from a Cloudflare R2 bucket without incurring egress charges.
Roundabout URLs are used by Filecoin Storage Providers to download CAR content for inclusion in deals.
Chapter 5 - Miscellanea
w3ui
🏛️ w3ui implementation repo
Headless, type-safe UI components for the web3.storage APIs. The w3ui project originally provided components for React, Solid and Vue. Due to significant refactors in the client library and time/resource constraints, w3ui currently only provides components for React.
console.web3.storage
🏛️ console implementation repo
Web UI for the web3.storage service. Console is an example of how to build an app with w3ui.
console.web3.storage
w3name
🏛️ w3name implementation repo
An HTTP API to IPNS. Also provides a feature where changes to IPNS keys are pushed over a websocket connection to connected clients. IPNS record updates are also published to the IPFS DHT, and periodically re-published.
w3clock
🏅 w3clock spec
🏅 Merkle Clock spec
🏛️ w3clock implementation repo
UCAN based merkle clock implementation.
For the interested reader
Additional tech used in web3.storage you might like to explore:
Appendix
Resources