# IPFS Principles
:::danger
**READ THIS FIRST — OR ELSE!**
Work on this document has been moved into a
[PR to the specs site](https://github.com/ipfs/specs/pull/390). Comment there if you have
feedback! What remains here is only the previous discussion.
**NO CHANGE BELOW THIS LINE WILL BE TAKEN INTO ACCOUNT.**
**NO CHANGE ABOVE IT EITHER, MIND YOU.**
:::
:::warning
**changes**
- New title reflecting the content better
- New introduction framing the document. The previous introduction is now the `Addressing`
section (which was lightly edited).
- Added transport-agnosticity to *IPFS Space*
- Whole new `Robustness` section to capture some of the ethos and agnosticity
- Added SHOULD for incremental verification
- Negative space examples in new section
- Included example about interop with other content-addressable systems
- Added mention that this approach to robustness subsumes the end-to-end principle.
**questions**
- What is *good enough* to ship this and iterate in public?
- Can we haz move this to the GitHub?
- Do we have a list of questions we expect people to ask and expect this document to answer?
:::
The IPFS stack is a suite of specifications and tools that share two key characteristics:
1. Data is addressed by its contents using a strict but extensible mechanism, and
2. Data is moved in ways that are tolerant of arbitrary transport methods.
This document provides context and details about these claims. In doing so, it defines
what implementations are part of the IPFS ecosystem.
## Addressing
The web's early designers conceived it as a universal space in which identifiers map to
information resources. As the web grew, they enshrined in
[web architecture](https://www.w3.org/TR/webarch/#identification) that all resources
should have an identifier and
defined "addressability" as meaning that
"[*a URI alone is sufficient for an agent to carry out a particular type of interaction.*](https://www.w3.org/2001/tag/doc/whenToUseGet.html#uris)" (:cite[webarch])
This design is tremendously successful. For all its flaws, the web brings together a
huge diversity of software, services, and resources under universal addressability.
Unfortunately, HTTP addressability is based on a hierarchy of authorities that places
resources under the control of a host and places hosts under the control of the DNS system
(further issues with this model are discussed in the Appendix). As indicated
in :cite[RFC3986]:
> Many URI schemes include a hierarchical element for a naming
> authority so that governance of the name space defined by the
> remainder of the URI is delegated to that authority (which may, in
> turn, delegate it further).
[CIDs](https://github.com/multiformats/cid) in IPFS offer an improvement over HTTP URLs by
maintaining universal addressability while eliminating the attack vectors inherent in
hierarchical authority. Content addressability derives identifiers from the content of
an information resource, such that any party can both mint the identifier and verify
that it maps to the right resource. This eliminates the need for any authority outside
of the resource itself to certify its content. This makes CIDs the universal
self-certifying addressability component of the web.
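To make this concrete, here is a minimal sketch (Python, standard library only) that mints a
CIDv1 for a raw block with a sha2-256 multihash. It uses base16 multibase for simplicity,
whereas real implementations typically emit base32 (`bafy…`-style) strings; treat it as an
illustration rather than a normative algorithm.

```python
import hashlib

def cid_v1_raw_sha256(content: bytes) -> str:
    """Mint a CIDv1 for a raw block; any party can recompute it from the bytes."""
    digest = hashlib.sha256(content).digest()
    multihash = bytes([0x12, 0x20]) + digest     # sha2-256 code + 32-byte length
    cid_bytes = bytes([0x01, 0x55]) + multihash  # CIDv1 + raw codec
    return "f" + cid_bytes.hex()                 # "f" is the base16 multibase prefix

def verify(cid: str, content: bytes) -> bool:
    """Self-certification: the address alone suffices to check the bytes."""
    return cid == cid_v1_raw_sha256(content)
```

Because both minting and verification derive purely from the content, no authority outside the
resource itself ever enters the loop.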
Addressing data using [CIDs](https://github.com/multiformats/cid) is the first defining
characteristic of IPFS. The second characteristic, transport-agnosticity, is made possible
by the verifiability that CIDs offer. Across a vast diversity of implementations,
architectures, and services, *IPFS is the space of resources that can be interacted with
over arbitrary transports using a CID*. As Juan Benet once put it,
"[*That's it!*](https://github.com/multiformats/cid/commit/ece08b40a6b1e9eeafc224e2757d8d1ef3317163#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R43)"
Conversely, any system that exposes interactions with resources based on CIDs is
an IPFS system. There are
[many contexts in which CIDs can be used for addressing](https://docs.ipfs.tech/how-to/address-ipfs-on-web/)
and [content routing delegation](https://github.com/ipfs/specs/blob/main/routing/DELEGATED_CONTENT_ROUTING_HTTP.md) can support a wealth of interaction options by resolving
CIDs.
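As an illustration of resolving a CID through delegated routing, the sketch below queries the
`/routing/v1/providers/{cid}` endpoint defined in the linked specification. The endpoint URL is
a placeholder: any service implementing that API would do.

```python
import json
import urllib.request

def find_providers(cid: str, endpoint: str = "https://example-routing.invalid") -> list:
    """Ask a delegated content routing service who can provide a given CID."""
    url = f"{endpoint}/routing/v1/providers/{cid}"
    with urllib.request.urlopen(url) as response:
        return json.load(response).get("Providers", [])
```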
## Robustness
Common wisdom about network protocol design is captured by *Postel's Law* or the
*Robustness Principle*. Over the years it has developed multiple formulations, but the
canonical one from :cite[RFC1958] ("*Architectural Principles of the Internet*") is:
> Be strict when sending and tolerant when receiving.
This principle is elegant, and expresses an intuitively pleasing behavior of protocol
implementations. However, over the years, the experience of internet and web protocol
designers has been that this principle can have detrimental effects on interoperability.
As discussed in the Internet Architecture Board's recent work on
[*Maintaining Robust Protocols*](https://datatracker.ietf.org/doc/html/draft-iab-protocol-maintenance),
implementations that silently accept faulty input allow interoperability defects to
accumulate over time, causing the overall protocol ecosystem to decay.
There are two equilibrium points for protocol ecosystems: when deployed implementations
are strict, new implementations will be encouraged to be strict as well, leading to a
strict ecosystem; conversely, when deployed implementations are tolerant, new
implementations will have a strong incentive to tolerate non-compliance so as to
interoperate. Tolerance is highly desirable for extensibility and adaptability to new
environments, but strictness (what the IETF refers to as "virtuous intolerance") is
highly desirable to prevent a protocol ecosystem from decaying into a complex collection
of corner cases with poor or difficult interoperability.
IPFS approaches this problem space with a new iteration on the robustness principle:
> Be strict about the outcomes, be tolerant about the methods.
<!--
Also considered:
- be strict in mapping intent (addresses) to outcome (content), be tolerant (open-ended, flexible, adaptable, extensible) in how you implement that mapping
- be strict about the goal, be tolerant about the path
-->
CIDs enforce strict outcomes because the mapping from address to content is verified;
there is no room for outcomes that deviate from the intent expressed in an address.
This strictness is complemented by a design that proactively expects change thanks to
a self-describing format (CIDs are a [multiformat](https://multiformats.io/) and support
an open-ended list of hashes, codecs, etc.). Because the endpoints are enforceably strict,
everything else, notably transport, can be tolerant. Being tolerant about methods enables
adaptability in how the protocol works, in how it adjusts to specific environments, and in
how intelligence can be applied at the endpoints in novel ways, while being strict about
outcomes guarantees that the result will be correct and interoperable.
Note that this approach to robustness subsumes the
[End-to-end Principle](https://en.wikipedia.org/wiki/End-to-end_principle).
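A minimal sketch of this principle, assuming sha2-256-based CIDs and a set of caller-supplied
transport callbacks (both hypothetical), might look like this:

```python
import hashlib

def fetch_verified(expected_digest: bytes, transports) -> bytes:
    """Tolerant about methods, strict about outcomes."""
    for fetch in transports:  # e.g. Bitswap, an HTTP gateway, a USB stick...
        try:
            candidate = fetch()
        except Exception:
            continue  # a failed transport is tolerable; try the next one
        if hashlib.sha256(candidate).digest() == expected_digest:
            return candidate  # the outcome is verified, whatever path it took
    raise LookupError("no transport produced content matching the address")
```

The transports may fail, lie, or change over time; only the verified outcome matters.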
## IPFS Implementation Requirements
An :dfn[IPFS Implementation]:
* MUST support addressability using CIDs.
* MUST expose operations (e.g. retrieval, provision, indexing) on resources using CIDs. The operations
  that an implementation may support form an open-ended set, but this covers any interaction which the
  implementation exposes to agents.
* MUST verify that the CIDs it manipulates match the resources they address, at least when it
has access to the resources' bytes. Implementations MAY relax this requirement in
controlled environments in which it is possible to ascertain that verification has happened
elsewhere in a trusted part of the system.
* SHOULD name all the important resources it exposes using CIDs. Determining which resources are
important is a matter of judgment, but anything that another agent might legitimately wish to
access is in scope, and it is best to err on the side of inclusion.
* SHOULD expose the logical units of data that structure a resource (e.g. a CBOR document, a file or
  directory, a branch of a B-tree search index) using CIDs.
* SHOULD support incremental verifiability, so that content can be verified as it arrives rather
  than only after a complete transfer (see the sketch at the end of this section).
* MAY rely on any transport layer. The transport layer cannot dictate or constrain what IPFS is.
The :dfn[IPFS space] is the set of all CIDs and transport-agnostic operations that can be carried
out on CIDs.
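As an illustration of incremental verifiability, the sketch below walks a DAG and checks every
block against its digest as it arrives, before trusting any links it contains. The `get_block`
and `links_of` helpers are hypothetical stand-ins for a transport and a codec; sha2-256 digests
are assumed.

```python
import hashlib

def walk_dag(root_digest: bytes, get_block, links_of):
    """Verify each block incrementally instead of after a full transfer."""
    pending = [root_digest]
    while pending:
        digest = pending.pop()
        block = get_block(digest)  # fetched over any transport
        if hashlib.sha256(block).digest() != digest:
            raise ValueError("block does not match its address; discarding")
        yield block
        pending.extend(links_of(block))  # children get verified in turn
```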
## Boundary Examples
These IPFS principles are broad. This is by design because IPFS supports a wide family of
use cases and is adaptable to a broad array of operating conditions. Considering a few cases
at the boundary may help develop an intuition for the limits that these principles draw.
### Other Content-Addressing Systems
CIDs are readily made compatible with other content-addressable systems, but this does not
entail that all content-addressable systems are part of IPFS. Git's SHA1 hashes aren't CIDs
but can be converted into CIDs by prefixing them with `f01781114`. Likewise, BitTorrent v2
uses multihashes in the `btmh:` scheme. BitTorrent addresses aren't CIDs, but can be
converted to CIDs by replacing `btmh:` with `f017c`.
The simplicity with which one can expose these existing systems over IPFS, just by prefixing
existing addresses to mint CIDs, enables radical interoperability with other content-addressable
systems.
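For illustration, both conversions reduce to string prefixing. This is a sketch; inputs are
assumed to be hex-encoded, per the prefixes described above.

```python
def git_sha1_to_cid(sha1_hex: str) -> str:
    """CIDv1 (0x01) + git-raw codec (0x78) + sha1 multihash (0x11, 0x14 = 20 bytes),
    all in base16 multibase ("f" prefix)."""
    return "f01781114" + sha1_hex.lower()

def btmh_to_cid(btmh_address: str) -> str:
    """BitTorrent v2 addresses already carry a multihash after the scheme."""
    return "f017c" + btmh_address.removeprefix("btmh:")
```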
### Verification Matters
The requirements above state that an implementation may forgo verification when "*it is
possible to ascertain that verification has happened elsewhere in a trusted part of the system.*"
This is intended as a strict requirement in which implementers take trust seriously. The point is
that it's okay to not constantly spend cycles verifying hashes in an internal setup which you
have reasons to believe is trustworthy. This is *not* a licence to trust an arbitrary data
source just because you like them.
For instance:
- A JS code snippet that fetches data from an IPFS HTTP gateway without verifying it is not an
IPFS implementation.
- An IPFS HTTP gateway that verifies the data that it is pulling from arbitrary IPFS nodes
before serving it over HTTP is an IPFS implementation.
- The JS snippet from the first bullet can be turned into an IPFS implementation if it
fetches from a
[trustless gateway](https://github.com/ipfs/specs/blob/main/http-gateways/TRUSTLESS_GATEWAY.md)
and verifies what it gets.
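To make the distinction concrete, here is a sketch of that third bullet: fetching a raw block
from a trustless gateway and verifying it locally. It assumes a base32 CIDv1 carrying a
sha2-256 multihash, and the gateway URL is only a placeholder.

```python
import base64
import hashlib
import urllib.request

def fetch_block(cid: str, gateway: str = "https://example-gateway.invalid") -> bytes:
    """Fetch a raw block over HTTP, then verify it against the CID ourselves."""
    request = urllib.request.Request(
        f"{gateway}/ipfs/{cid}",
        headers={"Accept": "application/vnd.ipld.raw"},  # ask for the raw block
    )
    with urllib.request.urlopen(request) as response:
        data = response.read()
    # Strip the "b" multibase prefix, base32-decode the rest, and read the
    # sha2-256 digest from the last 32 bytes of the trailing multihash.
    body = cid[1:].upper()
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    if hashlib.sha256(data).digest() != raw[-32:]:
        raise ValueError("gateway response does not match the CID")
    return data
```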
## Self-Certifying Addressability
:dfn[Authority] is control over a given domain and :dfn[naming authority] is control over
what resources are called.
:dfn[Addressability] is the property of a naming system such that its names are sufficient
for an agent to interact with the resources being named.
:dfn[Verifiability] is the property of a naming system such that an agent can certify that
the mapping between a name it uses and a resource it is interacting with is correct without
recourse to an authority other than itself and the resource.
:dfn[Self-certifying addressability] is the property of a naming system such that it is both
addressable and verifiable: any name is sufficient to interact with a resource and its mapping
to that resource can be certified without recourse to additional authority. Self-certifying
addressability is a key component of a
[self-certifying web](https://jaygraber.medium.com/web3-is-self-certifying-9dad77fd8d81)
and it supports capture-resistance, which can help mitigate centralization.
CIDs support self-certifying addressability. With CIDs, the authority to name a resource
resides only with that resource and derives directly from that resource's most intrinsic
property: its content. This frees interactions with CID-named resources from the power
relation implicit in a client-server architecture. CIDs are the trust model of IPFS.
An implementation may retrieve a CID without verifying that the resource matches it, but
doing so forfeits the resource's naming authority. Such an implementation would be comparable
to an HTTP client looking up DNS records from a random person's resolver: it cannot guarantee
that the addressing is authoritative. Implementers may make informed decisions as to where in their
systems they support verification, but they should ensure that CIDs are verified whenever
they also have access to the resource that the CID maps to.
## Appendix: Historical Notes
We tend not to think about addressability because it is so foundational that we
struggle to apprehend a system without it, but that is precisely why it is important
that we get it right. You can find extensive historical evidence that TimBL and others saw
URLs as arguably the most fundamental invention of the Web, and the early groups that
worked on Web architecture discussed and debated the properties of URLs at length. The
problems of centralization we face today trace their lineage back to those decisions.
The hierarchical nature of the HTTP addresses was intentional, as TimBL wrote clearly in
[Web Architecture from 50,000 feet](https://www.w3.org/DesignIssues/Architecture.html):
> The HTTP space consists of two parts, one hierarchically delegated, for which the
> Domain Name System is used, and the second an opaque string whose significance is
> locally defined by the authority owning the domain name.
The model that the Web's early designers had in mind was a federated model
in which authority is delegated and addresses are *owned* based on that
authority delegation. This is notably clear in the *URI Ownership* passage of the
[*Architecture of the World Wide Web, Volume One*](https://www.w3.org/TR/webarch/#def-uri-ownership):
> URI ownership is a relation between a URI and a social entity, such as a person,
> organization, or specification. URI ownership gives the relevant social entity certain
> rights, including:
> * to pass on ownership of some or all owned URIs to another owner—delegation; and
> * to associate a resource with an owned URI—URI allocation.
>
> By social convention, URI ownership is delegated from the IANA URI scheme registry,
> itself a social entity, to IANA-registered URI scheme specifications.(…)
>
> The approach taken for the "http" URI scheme, for example, follows the pattern whereby
> the Internet community delegates authority, via the IANA URI scheme registry and the
> DNS, over a set of URIs with a common prefix to one particular owner. One consequence
> of this approach is the Web's heavy reliance on the central DNS registry.(…)
>
> URI owners are responsible for avoiding the assignment of equivalent URIs to multiple
> resources. Thus, if a URI scheme specification does provide for the delegation of
> individual or organized sets of URIs, it should take pains to ensure that ownership
> ultimately resides in the hands of a single social entity. Allowing multiple owners
> increases the likelihood of URI collisions.
>
> URI owners may organize or deploy infrastruture [sic] to ensure that representations of
> associated resources are available and, where appropriate, interaction with the resource
> is possible through the exchange of representations. There are social expectations for
> responsible representation management (§3.5) by URI owners. Additional social
> implications of URI ownership are not discussed here.
This notion of address or name ownership is
[pervasive across architectural documents](https://www.w3.org/DesignIssues/). This passage
from an interview with TimBL
([Philosophical Engineering and Ownership of URIs](https://www.w3.org/DesignIssues/PhilosophicalEngineering.html)) is explicit:
> **Alexandre Monnin**: Regarding names and URIs, a URI is not precisely a philosophical
> concept, it's an artifiact [sic]. So you can own a URI while you cannot own a philosophical
> name. The difference is entirely in this respect.\
> **Tim Berners-Lee**: For your definition of a philosophical name, you cannot own it.
> Maybe in your world, in your philosophy, you don't deal with names that are owned, but
> in the world we're talking about, names are owned.
This expectation of delegated naming authority was so strong among early Web architects
that the development of naming conventions in HTTP space (e.g. `robots.txt`, `favicon.ico`,
all the `.well-known` paths) is described as "*expropriation*" in the
[Web Architecture](https://www.w3.org/TR/webarch/) and the W3C's Technical Architecture
Group (TAG) issue on the topic stated that it "breaks the web".
Federated models only have weak capture-resistance because the federated entities can always
concede power (precisely because they have ownership) but lack established means to
support collective organization. As a result, any power imbalance will likely become hard
to dislodge. A good example is search: as a publisher (the owner of delegated authority
over your domain) you can cede the right to index your content, but you have no voice
in what is done with the indexed content (individual opt-out is not an option). This was
fine when you could barter content for links, but once search power consolidated, the
terms of trade deteriorated with no immediate recourse.