---
tags: draft, web
---
# User Agency
:::info
This is a very early draft, I am just capturing notes as I go along. It is barely coherent and not ready to review. Proceed at your own risk.
While the draft is in progress, sections are labelled with their maturity level:
* 🟢 - good to ship
* 🚧 - in progress
* ⚠️ - just bare bones outline
* 📦 - material
:::
:::danger
**Issues**
* Revoice, too abstract, too distant in places.
* In **every** single section, relate it to how tiles as a primitive helps, enables,
or how they work together for better outcomes.
:::
The Web is in a bad place and many brilliant people have specific ideas about how to return oomph to
the Web by fixing this or that specific aspect. Maybe the key issue is to make it faster? Or perhaps
if we just added these API things will start looking up? Might we just need to emulate native
platforms? There is nothing inherently wrong with these ideas, and at least some of them
should probably be implemented one way or another. But the contention behind this document is
that this whack-a-mole tinkering adds up to little more than refactoring deck chairs on the
_Titanic_ and we're not even sure if the _Titanic_ might not be a submarine or an airship.
Proceeding via small, incremental changes is a healthy and laudable approach but it helps to
have a sense for what it is that we're _incrementing to_. After all, a headless chicken, too,
takes it one step at a time.
We don't have an idea of what the web should or even could be. To address this, this document
is many things; in fact it is, by design, _too many_ things. It endeavours to be
"_[vague but exciting](https://blog.mozfr.org/dotclear/public/Firefox_OS/proposal.png)_."
It's wrong. It's biting off more than anyone could hope to chew. But it's a white paper that
gives directions for the future of the web, and I hope that it can help us figure out what it is
that we're all doing here.
<u>TL;DR part 1</u>: the core *philosophical* idea that runs through this document is that **_the
web is about user agency_**. This idea will be made more precise below, but a few salient points
about this approach are worth calling out right away:
* As I will argue below, the idea of user agency ties in well with the
[capabilities approach](https://en.wikipedia.org/wiki/Capability_approach), an approach to
ethics and human welfare that is concrete, focused on real, pragmatic improvements, and that
has been designed to operate at scale.
* While the word "user" has come to mean something less than a person, I suggest that we
reclaim rather than abandon it and enshrine the web user as the person who operates the web.
More specifically, "user agency" points at the user agent (a.k.a. the browser) and a key tenet
of this position is that limitations and mistakes in how we envision user agents are central to
holding the web back.
* Perhaps counterintuitively, focusing on user agency does not make this position individualistic.
On the contrary, because it has to be about _everyone's_ agency, it imagines a web that is
"[_a global community thoroughly structured by non-domination_](https://bookshop.org/books/reconsidering-reparations/9780197508893)."
Put differently, under this view the web is the answer to the question of "_[what is a form of
'collectivity' that everywhere locally maximizes individual agency, while making collective emergent
structures possible and interesting](https://c4ss.org/wp-content/uploads/2020/06/Aurora-ScaleAnarchy_ful-version.pdf)_."
<u>TL;DR part 2</u>: The core *technical* position in this document is that there is **a single
primitive, which we call *tiles*, that can be added to the Web platform and can then serve
as the foundation for a shift of power from servers to users**. In a nutshell:
* Tiles cannot interact with the network other than to load other tiles or indirectly via
purpose-specific interfaces that the user agent can reason about. This means that they can be
granted access to sensitive data and functionality (so long as it isn't locally destructive)
because they cannot generally exfiltrate information.
* Tiles are content-addressable packages, which means that they can be loaded from arbitrary
sources and through arbitrary mechanisms, and can be stored and manipulated locally
(eg. installation is just keeping something around).
* They can declare their ability to handle specific tasks or skills (in a manner reminiscent of
intents/activities) and compose with the skills of other tiles, making it possible to weave
tiles together (to tessellate) in an app-like way, mediated by the agent.
The result opens up a very rich and novel way of building things on the Web, with a world
of potential new user-centric features (hence the length of this document), but using a
relatively constrained set of new standards that lend themselves to an MVP implementation and
iterations, can be created over time, and can live side by side with today's Web to ensure a
smooth, progressive transition. *Revolution through evolution*.
Because this document is big and wrong, it is intended to remain a living doc and it invites
your participation: come make it even bigger and wronger, vaguer and more exciting!
## 🚧 The Web Rocks; Browsers Suck
So, this section title might come across as needlessly harsh, especially in a document written
inside a browser. But let's be frank, browsers: we need to talk.
No one knows what the Web is. Seriously, no one does. No group ever managed to reach any kind of
useful agreement as to a definition of the Web. It doesn't have to be over HTTP and it doesn't have
to happen in a browser, and conversely there are things that run over HTTP or in a browser that
many would consider to not be the Web (eg. PDF). But we don't need to agree on what it is to agree
that it's awesome. Whatever different fuzzy ideas we have in our minds are close enough to one
another.
The problem is, though, that browsers have painted themselves into a corner, and we could benefit
from taking a step back and revisiting the assumptions that have brought us here. They do very
little to support people beyond running a browser engine safely, the UI system of windows and tabs
is famously broken, they make for a poor application environment as people who get to vote with
their feet keep telling us…
The goal of this project is to make incremental changes, not to reinvent everything from the
ground up, but in order to make progress we *will* need to change some pretty ingrained things
that are keeping us stuck here, and browser UI is one. The delta between the tech stack that exists
today (though it may not be deployed in this way) and what this project aims at is relatively
contained. This is primarily an evolution of the Web focused on fixing architectural gaps that
keep us trapped at a local optimum discovered a couple of decades ago. And we need to recognise that
UI metaphors enshrine that local optimum.
What's more, "browsers" are a figment of Web engineers' imaginations. From a product perspective,
in the minds of most users, they don't exist. Or at least they don't exist as things that are
separate from a search engine, and to a lesser degree from a social network (see in-app browsing).
The architectural view in which search, social, and browsing are distinct is a distraction that
does not map to the experienced reality of most people and that is in fact conceptually
arbitrary with respect to the tasks that people actually seek to accomplish on the Web. In a sense,
we're missing the Web for the tabs.
The rest of the document goes into more specific changes that need to be made, but we can give
a high-level understanding of the issues that need to be addressed in browsers in order to support
user agency.
First, **browsers need to do more to support people in their online lives**, from managing their
identities to staying on top of massive amounts of information. The tabs and bookmarks system is
very much underpowered compared to today's web. It provides a poor environment for applicative
use, which in turn is a poor match for the web's capabilities. By making extensions safer and by
extending the space for UI capabilities, we can make it easier for content to be promoted to
extension-like status, boosting the agent's power. Some of the field's brightest
minds have been trying to make "Web Apps" happen for coming up on two decades. It's not going to
happen, not like this. The majority of people just keep any number of apps open at all times but
max out at three tabs; that's all you need to know about the future of the apps-in-tabs model:
it doesn't exist.
Second, **browsing, search, and social need to be unified**. In most people's minds, search and
the browser are the same product, and on mobile (which is where it's at) social is also headed there
with in-app rendering. (And AMP leaned into this mental model even harder.) Keeping them separate
means that one gets to commoditise the others; right now it's the browser that's being commoditised
in favour of systems in which people have decreased agency. We need the user agent to commoditise
search and social right back where they belong: serving the user. This means turning them into
protocols and having the agent provide the UI for them. It also means moving controls over ranking
and recommendations to the agent as much as possible.
Moving this control away from services and into people's hands improves agency
directly, but also has less direct benefits for people. Choosing what is
relevant (which definitely includes ranking and recommendation) is an editorial
decision. Democracy requires media pluralism, but under today's system we only
have a small number of algorithmic media hegemons. "*Gatekeepers may no longer
control what gets published, but algorithms control what gets circulated. (…) It
is misleading then to argue that cultural circulation has been democratized. The
means of circulation are algorithmic, and they are not subject to democratic
accountability or control. Hyperconnectivity has in fact further concentrated
power over the means of circulation in the hands of the giant platforms that
design and control the architectures of visibility.*" (Rogers Brubaker,
[*Hyperconnected Culture And Its
Discontents*](https://www.noemamag.com/hyperconnected-culture-and-its-discontents/)). People will have greater agency in a world in which they have
more and better choices of editorial relevance functions than of cereal brands,
that's just a fact.
Third, **browsers (and even more so engines) need a business model** that doesn't involve
selling users to search engines. It is not uncommon for browser developers to believe that the
default search engine they ship with is not the one that is the best for their users, but that is
what foots the bill and so they close their eyes and think of England. In addition to misaligned
incentives, this arrangement also creates a mechanical feedback loop in the search market. Because
of increasing returns, the dominant search engine can pay browsers more for the default position.
Few people then change the default (nothing about the experience helps them make an informed
decision that aligns with their interests), which feeds the dominance further. Rinse and repeat. This dynamic guarantees
that any search engine that acquires a modest and temporary advantage over others will come to
own the market, irrespective of quality by any other criterion. This maintains both a lack of media pluralism in search
and a lack of innovation in browsers' approach to search. Long term, it also doesn't support
browser or engine diversity.
And fourth, **the architecture that browsers enforce puts most of the power in the hands of
the server.** Almost all of the intelligence in a browser engine is dedicated to abiding by
server-provided instructions. Apart from a number of security protections, the client side of
automation that puts authors before users; it needs to be reversed.** Any number of typical
features would work better if they were in people's hands: recommendations, search & social
filters, blocking, identity, comments, shopping cart management, subscription & membership
management. Most of these systems work poorly because they have to be constantly reinvented and
reimplemented by businesses that should be focusing on their core competencies instead. The result
is a system of widespread mediocrity. In the usability/hackability trade-off we have somehow
managed to land in a place where we have neither.
To conclude, browsers today enable, not necessarily willingly, an ecosystem that is actively
hostile to agency. We can't fix "just" UI or "just" tech or "just" the economics β we need to
navigate the complex trade-offs involved in fixing all three while maintaining an evolutionary
path, without a tabula rasa revolution.
To do that, we need to rethink agency and then find better ways to empower it.
<div style="text-align: center; font-size: 3rem;">
⁂
</div>
## 🚧 Ethics
> βThe ideas of economists and political philosophers, both when they are right and when
> they are wrong, are more powerful than is commonly understood. Indeed, the world is
> ruled by little else. Practical men, who believe themselves to be quite exempt from any
> intellectual influences, are usually slaves of some defunct economist.β\
– [John Maynard Keynes](https://en.wikipedia.org/wiki/John_Maynard_Keynes),
*The General Theory of Employment, Interest, and Money*
As technologists we are often reluctant to engage with philosophy, a reluctance often
expressed by running in the opposite direction all limbs akimbo with an ululating shriek
reminiscent of some of the less harmonious works of exorcism. Even those of us who are curious about
it rarely seem to let it shape what we build. In the same way, however, that the more
abstract forms of computer science *can* indeed help us produce better architectures,
philosophy *can* be applied to the creation of better technology. A quick tour of the biggest
problems that we face in tech (governance, sovereignty, speech, epistemic individualism,
gatekeeping, user agency, privacy, trust, community) reads like a syllabus for the toughest
course in ethics and political philosophy. There is no useful future for technology that
doesn't wrestle with these hard problems.
This is probably not the right place for a full-on treatise on ethics (no, do _not_ tempt me), but
if we're going to work out how best to develop user agency it seems useful to agree on some basic
notions of how to do good for people and of what having agency means. Three things that are
important to share some minimal foundations about are: 1) working towards ethical outcomes
doesn't mean relying on vapid grand principles but rather ought to be focused on concrete
recommendations, 2) when considering agency we need to be thinking about *real* agency rather
than theoretical freedoms, and 3) counterintuitively, giving people greater agency sometimes
means making decisions for them, and that's okay if it's done properly. The rest of this
section covers these in greater detail.
First, **focusing on user-centric ethics does not mean that we should get lost in reams of endless abstraction; on the contrary, we must focus on principles that can be *implemented*.**
Many documents about ethical tech seem to be lists of lofty principles ("*for all humankind!*" or
"*this must be fair and just and shiny!*"). These can sound nice, and can occasionally prove useful
(for instance to decide disagreements), but for the most part it's hard to know what to do with them.
By contrast, when working on standards, we only consider requirements that can be
[verified with a test](https://www.w3.org/TR/test-methodology/) to be meaningful
– everything else is filler. Being as strict in our ethical standards is challenging, but
we can strive for it.
In fact, on the web this is something that we have been doing for a while without
describing it as such. Instead of just saying "everyone should have access to the web" or
"people should be able to use the web in ways they can trust" we have extensively detailed
and concrete principles about what "access for all" means (all of the accessibility and the I18N
review guidelines) or what trustworthiness means (all of the security and privacy documents). This
approach has multiple advantages. First, it's concrete which means that proposals can be reviewed
(including self-reviewed) in practical terms and debated constructively. Second, instead of
having a constitution-level group come up with lofty phrases carved in stone, the care and
maintenance for these principles are delegated to groups that specialise in these areas, that
can often have representatives of people affected by the problems (or at least folks in touch with
them), and that can keep these updated as the community's knowledge improves. Finally, these help
to develop craft and knowledge about these areas in the broader community, rather than keeping
that knowledge confined to a small group of experts.
Second, **deepening user agency has to be about giving people real capabilities to act, not
theoretical rights that they can't actually exercise**. We need to focus on *opportunity* or
*substantial* freedoms, freedoms that people can *really* exercise. This avoids the trap of
vaporware freedom in which people may have a nominal or legal right to do something but the world
is architected in such a way as to prevent it. Everyone can start a business or own a house! –
except no bank exists that will lend to people like you. Users can change the default as much
as they want to! – except you know that they won't because the UI discourages it. Everyone can
speak! – except only certain voices get amplified by the algorithmic gods.
Martha Nussbaum and Amartya Sen have developed a pragmatic understanding of quality-of-life and
basic social justice known as the *capabilities approach*. The capabilities approach asks
"*What each person is able to do and to be?*" (Martha Nussbaum,
[*Creating Capabilities*](https://bookshop.org/p/books/creating-capabilities-the-human-development-approach-martha-c-nussbaum/6690885?ean=9780674072350)) Even assuming generous amounts of universal
education, people do not have the time to assemble what they need from tech components, to go
through pages of configuration, or to answer thousands of prompts. We cannot pretend that we
are giving people neutral tools for them to go choose their own adventure with, we cannot be
satisfied with [RFC6919](https://datatracker.ietf.org/doc/html/rfc6919)-style rights à la
"*you MAY change the default (but we know you won't)*." Capabilities were designed with development
in mind, they are meant to change people's actual lives, not to check theoretical items off a list.
They are by nature concrete & implementable, which connects with the previous point. **In many ways,
capabilities *are* user agency**.
Finally, **designing technology for people, even paradoxically when specifically designing technology
in support of user agency, unavoidably means making at least some decisions for them.**
"*Ethics and technology are connected because technologies invite or afford specific patterns
of thought, behaviour, and valuing: they open up new possibilities for human action and foreclose or obscure others.*" (Shannon Vallor,
*[Technology and the Virtues](https://bookshop.org/books/technology-and-the-virtues-a-philosophical-guide-to-a-future-worth-wanting/9780190905286)*) Technology choices, from low-level infrastructure
all the way to the UI, decide what is made salient or silent, hard or easy. They shape what is
possible and therefore what people can even think about acting on.
Addressing this issue can be done in part through strong and diverse governance, notably of ethical
review standards, and also by focusing the decision process on supporting user agency, which is to
say on developing actual capabilities and driving skillful habituation towards the good life.
This section provides only a cursory outline of the approach, but hopefully enough to
show that it is feasible to develop alignment around an ethical grounding of user agents and user
agency that is documented, implementable, and effective.
## 🚧 Foundation
The Web's architecture makes it near impossible to compose multiple services without creating massive
privacy (and often security) holes. This contributes to web sites being monolithic, with a lot
of overlapping functionality. When services are composed, it is typically with abysmal privacy
properties, and often brittle security. (Script injection all the things!)
This proposal doesn't radically change the Web's architecture. What it does is *add a new primitive*
that opens the door to newer, safer, more composable Web capabilities.
### 🚧 Requirements & Motivation
A vision of applications on the Web, with its hellish handmaidens of chaos, packaging and
permissions, has been the elusive hankering of many a bright mind over the past decades who,
after being consumed by its distant bewitching song as an adventurer may be by the murmuration of
distant shores, now spends their greyer years muttering cantankerously to themselves on social media.
What haven't we tried?
* In 1999, W3C [chartered work on XML Packaging](https://www.w3.org/XML/2000/07/xml-packaging-charter)
that was meant to make it possible to put some SVG, XHTML, CSS, etc. in a container to make apps with.
* In 2003 the binary XML work considered the same use cases of
[packaged documents](https://www.w3.org/TR/xbc-use-cases/#edocs) and
[mobile apps](https://www.w3.org/TR/xbc-use-cases/#xml-docs-mobile) to be in scope.
* In 2006, we had the [Web APIs group](https://www.w3.org/2006/webapi/admin/charter) kick off work, as
well as the [Web App Formats group](https://www.w3.org/2006/appformats/).
* [EPUB](https://www.w3.org/publishing/epub32/) has been addressing the packaging subset of that since 2007.
* The OMTP BONDI project tried to produce a Web-based standard for mobile apps (now so defunct that there is nothing on the Web).
* [Packaged Web Apps (Widgets)](https://www.w3.org/TR/widgets/) had a whole family of specs for a while.
* In 2009, the Device APIs and Permissions WG was supposed to solve that issue.
* The September 2014 [W3C Next steps on trust and permissions for Web applications](https://www.w3.org/2014/07/permissions/) was meant to solve this.
* The September 2018 [W3C Workshop on Permissions and User Consent](https://www.w3.org/Privacy/permissions-ws-2018/cfp.html) was meant to solve this.
* We had SXG at some point around here.
* The December 2022 [W3C Workshop on Permissions](https://www.w3.org/Privacy/permissions-ws-2022/) was meant to solve this.
This list is just for illustrative purposes; if it were exhaustive it would be *much* longer.
All told, and with the exception of EPUB and a small number of APIs from these groups that have
found some measure of success, that's a quarter century of failure. Is there any reason to believe that
we can do better?
We've been trying to build web applications, but we should instead be building application webs.
The monolithic app that dominates native environments is a poor fit for the Web. (HCI researchers
have [long been pointing out that it's not a great fit for
humans, either](https://en.wikipedia.org/wiki/The_Humane_Interface).) This narrow focus has led us
into a series of interconnected impasses:
* If you can just navigate to it, then it can't generally have access to powerful capabilities. If
instead of navigating to it you have to install it first, then it's just a native app built with
Web technology.
* Being connected to the internet means that almost any access to device functionality
or personal information is dangerous. This fundamentally limits approaches to a small number of
options: **1**) expose little to nothing, **2**) ask the user, or **3**) delegate the decision
to an app store or some equivalent. Every single one of these options is bad.
* Apps may be wrong but tabs are worse. Trying to reproduce the monolithic desktop or mobile app
paradigm on the Web is to applications what making every website work like a linear book would be
to documents.
The model that this document puts forward is different. It involves developing an application
equivalent to hypertext (*hyperapp*, maybe) that leads to a Web of commands, tasks, or skills
that work together at the user's behest. One way to think about it is as the Unix philosophy
if it had been invented after rounded corners and didn't involve growing a beard.
The load-bearing foundation for this overarching programme is the introduction of *Web Tiles*, which
are a new approach to the "permissions and packaging" problem. Tiles are:
* **Safe by default**. Like Web pages, it must be *always* safe to load a tile. In fact, it must
be safer as we intend it to leak a lot less information than a Web page load can. In order to
keep the promises we are making, loading a tile must be privacy-preserving with a certain number
of strong guarantees. (It can't be perfect, but it can be much better.)
* **Powerful by default**. *Unlike* Web pages, it must be possible to give a tile access to
sensitive information by default, without requiring the user to grant additional access. (This
isn't to say that *some* powerful capabilities might not be further gated, but they should be
significantly rarer.) This is achieved because tiles are prevented from arbitrary access to the
network: they are limited to loading other tiles (which is transitively safe) and to
purpose-specific protocols which the agent can restrict and reason about. Note that when tiles
load one another, they do not have to worry about cross-origin restrictions precisely because
they are safe anyway.
* **Content-addressable**. Loading a tile mustn't require interacting with a specific server. By
making tiles content-addressable based on a hash of their content, accessing a tile can happen
via a cache, via anonymising intermediaries, etc. and doesn't require communicating with any
origin server that the tile is "located" at.
* **Local-first**. Being content-addressable also makes tiles local-first. "Installing" a tile
just means pinning it such that it remains locally saved and accessible. Ditto "saving" content.
* **Packaged and linkable**. Tiles group together related content so that they can be usable
without making additional network requests, but any content inside a tile is as linkable as it
would be in any other Web context.
### 🚧 IPFS + CSP
Throughout this document I have tried to avoid inventing new things, preferring instead to arrange
existing technology in new ways to match the underlying requirements and architectural preference.
IPFS, containing Web formats, is a good candidate to address most of the requirements for tiles.
However, it is insufficient on its own as there is nothing preventing IPFS content from embedding
HTTP content or vice versa, which in turn breaks the safety and composability of the tile
system. Tiles are comparable in some ways to
[Isolated Web Apps](https://github.com/WICG/isolated-web-apps/) and a similar set of constraints
expressed using a Content Security Policy can be applied. The concerns are somewhat different,
though, and so for instance `blob:` would be allowed while connections wouldn't. (The below hasn't
been reviewed in detail, it is only to give a sense of the policy.)
```http
Content-Security-Policy: default-src 'self' ipfs: ipns:;
style-src 'self' 'unsafe-inline' ipfs: ipns:;
script-src 'self' 'unsafe-inline' ipfs: ipns: 'wasm-unsafe-eval';
img-src 'self' ipfs: ipns: blob:;
media-src 'self' ipfs: ipns: blob:;
```
A very quick primer on IPFS and IPFS as a Web protocol may be helpful to those with a more
traditional Web background. IPFS is a peer-to-peer content addressable protocol. Content is stored
in blocks that are given a cryptographically-derived identifier (a CID, for Content ID) that is based
on the block's content. A block is retrieved using its CID. Because retrieval is derived from the
block's content and not its location, a block does not have to be obtained from the party that
initially produced it or in fact from any specific network location. Rather, anyone can cache and
redistribute it, and its content cannot be tampered with without causing the CID to change.
A block can be raw bytes, or it can have various kinds of structure. A single block can package up
multiple files and their metadata, and act as a small file system β this makes it possible for instance
to distribute an HTML file with its dependencies in the block.
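To make content addressing concrete, here is a minimal sketch of deriving an identifier from a
block's bytes. It is emphatically not the real CID algorithm (actual CIDs are multibase-encoded
multihashes that also carry codec information), but it shows why a content-derived address can be
verified by anyone:

```javascript
// Minimal sketch, runnable as an ES module in a browser or recent Node.
// Real CIDs are multibase-encoded multihashes; this is just the core idea.
async function contentAddress(bytes) {
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
}

const block = new TextEncoder().encode('<p>hello, tiles</p>');
console.log(await contentAddress(block));
// Any change to the block changes its address, which is why a copy
// obtained from a cache or from a stranger can be verified against the
// address itself, with no appeal to a server.
```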
IPFS addresses use the `ipfs:` scheme and use the CID as the authority part of the URL, as in
[`ipfs://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi/`](https://ipfs.io/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi). If you don't have an IPFS-enabled
browser (like Brave) or an IPFS client, you can use an HTTP gateway to IPFS (clicking the previous
link makes use of one). If the IPFS block is structured data, you can then use a path component in
the URL to pick out a specific piece of content inside the block.
IPFS is strictly immutable, but it can be useful to have a fixed address that points to content that
can change over time yet is itself self-certifying (self-certifying means that it comes with all
the information you need to verify that it is authentic) and P2P. That is what IPNS is for. The
high-level principle of IPNS is that it is a name that represents a cryptographic key pair. That
key is then used to create a record that contains the public key and a CID, and that record is
published to the peer network. Anyone can then retrieve that record, make sure that it matches the
key, and retrieve the CID over IPFS. IPNS URLs use the `ipns:` scheme.
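The following is a conceptual sketch of the self-certifying part using nothing but Web Crypto. It
is not the IPNS record format, and the CID is a placeholder; the point is only the
verify-against-the-key idea described above:

```javascript
// Conceptual sketch only: not the IPNS wire format, just the idea that a
// record can be verified against the key it names, from any source.
const keys = await crypto.subtle.generateKey(
  { name: 'ECDSA', namedCurve: 'P-256' },
  true,
  ['sign', 'verify']
);
const record = new TextEncoder().encode(
  JSON.stringify({ value: 'ipfs://<CID>', sequence: 1 })
);
const signature = await crypto.subtle.sign(
  { name: 'ECDSA', hash: 'SHA-256' },
  keys.privateKey,
  record
);
// Anyone holding the record, the signature, and the public key (from any
// peer, trusted or not) can check authenticity without asking a server.
const authentic = await crypto.subtle.verify(
  { name: 'ECDSA', hash: 'SHA-256' },
  keys.publicKey,
  record,
  signature
);
console.log(authentic); // true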
(Note that, for simplicity's sake, I'm skipping a lot of detail and ignoring a number of alternative
options in the rich interplanetary ecosystem. If you want to find out more, head to the
[IPFS docs](https://docs.ipfs.tech/) site.)
In terms of how IPFS works as a Web protocol, these pieces of infrastructure mean that it can be
used wherever an HTTP(S) URL is available, be it at the top level, in `src`, or in `href`. So just
to give an example, you could build a site in the usual way and simply pull in IPFS content for
some highly cacheable parts, maybe loading shared NPM libraries with a
`<script src="ipfs://<CID>/lit.js"></script>` or some common free fonts into CSS with a
`url(ipfs://<CID>/NotMyType.woff2)`. Similarly, you could package up a blog post in an IPFS block
along with all of its dependencies and either load that by navigating to `ipfs://<CID>` in a
browser's URL bar or embed it with `<iframe src="ipfs://<CID>/"></iframe>`. And you can mix both
by having the blog post in a block on its own and loading dependencies from another block.
:::info
Note that this section assumes that loading, serving, and pinning IPFS blocks can be done in
a privacy-preserving way. This is the subject of work in progress and the full threat modelling
will be the subject of another document.
:::
With this brief overview done, we can turn to the interesting properties that this adds to the
platform.
IPFS content is immutable (IPNS is a mutable pointer to immutable content). This offers a
foundation from which to safely compose simpler services together. It is important to keep
in mind that a tile can only remain composable so long as it is not mixed with a
non-composable one (typically an HTTP context) because the moment that box is open, anything
goes. This does not mean that HTTP is not useful; rather that it is an escape hatch. Because of
this, just loading IPFS into a browser is not enough to make it a composable context: you also
need a CSP strict enough to prevent loading from HTTP as exemplified above and to have the user
agent enforce it directly.
Composability creates a trade-off. So long as a context is composable, it can safely communicate
with other composable contexts with no limits. The result is always itself a composable context.
This means that same-origin restrictions lose their usefulness (as do mixed-content limitations, which
have little meaning here). This is a powerful capability and we will see how it can be put to work.
However, this also means losing the ability to use server-side knowledge (there is no server)
to personalise content on the fly, as well as the possibility of sending data back to the content's
origin. This isn't a small trade-off and most will find it far too constraining on the face of it.
We will see further in the document that it is possible to make these capabilities less
necessary (in fact, to do without them entirely in a large number of cases) and also to
provide purpose-specific APIs for a number of core cases. Keep in mind that if a tile can be
guaranteed to be composable, *more* powerful APIs can be exposed to it because it is inherently
trustworthy.
As we start thinking about composability, it's important to keep in mind that by allowing ourselves to
think about UI changes, we can consider new ways of composing. There's embedding composition that
we know well on the Web, but we can also imagine collaborative/lateral composition as exemplified
for instance in the speculative [MercuryOS](https://www.mercuryos.com/architecture), powered by
Intents/Activities which were designed precisely for this kind of usage. For instance, identity
can be a user agent (wallet) service, the UI for which is provided by a component, and the UA
can serve as a connector to [make identity "pluggable" into sites, under user
control](https://darobin.github.io/beltalowda/).
## 🚧 Composition
Tiles have highly desirable privacy and security properties, and the ability to default to granting
them access to powerful capabilities is great, but if we were to stop there we would nevertheless
have a rather limited system. Sure enough, tiles can be composed by embedding a tile in another
via an `iframe` (or similar) and they can talk using `postMessage` but that remains weak. The result
is choreographed by the rootmost app author and doesn't give the user or their agent any
greater power.
What we need is a system that enables arbitrary tiles to communicate usefully and interoperably
with one another, in a way that puts the user in charge, without sacrificing usability. We are
essentially looking for a versatile I/O system for tiles that matches a series of UI patterns that
can be made intuitive to people.
### 🚧 Requirements & Design
We want the ability to wire together an arbitrary number of tiles (to tessellate them) without the
user having to think about it (ie. this shouldn't look like a graphical programming interface, at
least not in the typical case) but while supporting a web of small, dedicated app-like functionality
and putting the user in control of their experience without having to configure a million
complex monoliths. In order to achieve this, we need a system that is:
* **Declarative**. Tiles must be able to describe which requests they can handle without having even
been activated once (though some form of "installation" may be required). A tile should be able
to say "I can provide pictures" or "I can edit social posts" or yet againt "I can sort a list of
articles in the order this person will prefer". Essentially, the model is one in which a tile can
convey its ability to <u>verb</u> a <u>resource type</u>.
* **Selectable**. If a request is made for an activity which multiple tiles can perform, it needs
to be possible to present a list of the applicable ones and pick the right one in the direct flow
of the user's action.
* **Installable & Sharable**. Installing a tile is trivial since it is packaged and local-first. A
tile that handles activity requests is installed in exactly that way: a single click bookmark/save
will pin it locally and make it available to handle requests. Since it's just a tile, it can be
shared just as easily by being posted to a feed the user can write to.
* **Arbitrary**. The types of request that tiles can handle need to be open-ended. What degree of
coordination may be required to make sure that unrelated tile authors know to use the same
language is up for determination. Using this mechanism, tiles must be able to talk back and forth,
and to exchange arbitrary data.
* **Discoverable**. It should be simple to produce a service that indexes tiles by what they can
handle so that when there is a request for an activity that the user does not have, finding an
appropriate one can be straightforward.
* **Asynchronous/continuous**. The most basic approach for tessellation is UI-driven
request/response: the user interacts with a tile (eg. hits "Edit this picture") and a tile is
selected to offer the functionality (in this case actually provide an image editor) before returning
the content. But other integrations must be possible, for instance automatically sorting a list
of articles (technically, of tiles, perhaps arbitrary) to match the user's preferences whenever
the context calls for it, and without the user having to specifically ask for it in the moment.
Two further design considerations should be taken into account:
1. This kind of invocation system is relatively similar to that which is used by voice assistants.
In fact, the similarity is rather clear when considering
[Alexa Intents and utterances](https://developer.amazon.com/en-US/docs/alexa/custom-skills/create-the-interaction-model-for-your-skill.html), Apple's
[App Intents](https://developer.apple.com/documentation/AppIntents) which work for both Siri and
non-voice interaction modalities such as Shortcuts and Spotlight,
[Google Actions](https://medium.com/google-cloud/building-your-first-action-for-google-home-in-30-minutes-ec6c65b7bd32), or even [VoiceXML](https://www.w3.org/TR/voicexml20/). The general verb-based
model of linguistic interfaces has also been tested with Mozilla's
[Ubiquity](https://wiki.mozilla.org/Labs/Ubiquity) project that built a form of command system
for the Web. We should design our system in such a way that it supports linguistic interaction
with minimal effort on top of the composition system so as to encourage the emergence of an
open voice assistant on the back of existing functionality.
1. A system that composes pieces of functionality using verb/resource pairs is conceptually close
to [UCAN's with/can model](https://github.com/ucan-wg/spec#24-capability). Maintaining
conceptual compatibility with UCANs is valuable as it opens the door to a rich object
capability model.
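As a quick illustration of that conceptual proximity, a tile's intent declaration (using the
`can`/`what` fields sketched in the next section) maps onto a UCAN-style capability with very
little ceremony. The mapping below is hypothetical and part of neither spec:

```javascript
// Hypothetical mapping, not part of any spec: a tile's can/what intent
// declaration expressed as a UCAN-style with/can capability.
function intentToCapability(intent, tileAddress) {
  return {
    with: tileAddress,                   // the resource (here, the tile)
    can: `${intent.can}/${intent.what}`, // the ability, eg. 'pick/image/*'
  };
}

console.log(intentToCapability({ can: 'pick', what: 'image/*' }, 'ipfs://<CID>/'));
```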
### 🚧 Intents & Activities
The existing technology most similar to these requirements is
[Web Intents](https://www.w3.org/TR/web-intents/). Web Intents were developed (and abandoned) by
the W3C's Device APIs Working Group as a way to enable precisely the kind of composition
described here between Web pages. They were inspired by Android Intents. A number of alternative
designs were proposed at the time, one of which was Mozilla's
[WebActivities](https://wiki.mozilla.org/WebActivities), which is still in use in B2G.
The syntax of existing options isn't necessarily ideal, but it can be mapped to more workable
alternatives. For instance, registering an intent/activity can be reproduced in a way that supports
discovery by having the verb/types pairings that the tile supports listed in its IPLD metadata:
```javascript
intents: [
// this can pick images and return them
{
can: 'pick',
what: 'image/*',
title: 'Select an image from our cat memes collection',
},
// this can create a social post which the user can post
{
can: 'post',
what: 'org.w3.activity',
title: 'Post a cat meme',
},
]
```
Whereas hyperlinks are nouns (they *name* things), intents are verbs. In this sense, they are
comparable to HTTP's verbs (methods) but offer richer semantics that sit closer to user
applications.
When a verb is invoked, the agent typically needs to ask the user what they want to invoke it
on or with (eg. which source to pick images from). A
[very simple demo](https://darobin.github.io/beltalowda/simple-intent.html) can serve to
illustrate the flow for installation and a simple pick intent.
An intent is invoked using a simple API (this is just an indicative example, the exact shape
should almost certainly be different and may of course reuse one of the existing alternatives):
```html
<button id="img-picker">Pick Image</button>
<img alt="No image" id="profile">
<script>
  document.querySelector('#img-picker').addEventListener('click', async () => {
    // ask the agent for a tile that can pick an image, then use the result
    const img = await navigator.invokeIntent('pick', 'image/*');
    if (img) document.querySelector('#profile').src = img.src;
  });
</script>
```
More involved interactions may be required when the intent is not about a simple request/response
action. For instance, we can think of an intent the purpose of which is to produce recommendations
for the user. The way it works when installed and active is that it receives lists of items of
the kind that it knows how to produce recommendations for, it filters out anything that it knows
should not be in the list (eg. I've said I never want to read that author again), it ranks the
rest according to whatever applicable criteria (eg. using its own ML model or simply
chronologically), and returns that ranking to whatever tile is rendering the list.
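A minimal sketch of such a recommender tile's logic follows. The wiring through which the agent
delivers the list is hypothetical, but the substance is as just described:

```javascript
// Hedged sketch: the delivery mechanism is hypothetical. The tile only
// ever sees items and returns an ordering; it never touches the network.
const blockedAuthors = new Set(['author-i-never-want-to-read-again']);

function recommend(items) {
  return items
    // drop anything that should never appear in the list
    .filter((item) => !blockedAuthors.has(item.author))
    // rank the rest; a simple reverse-chronological fallback here, but a
    // fancier tile could run its own local model instead
    .sort((a, b) => b.published - a.published);
}

// Hypothetical delivery point for lists handed over by the agent.
globalThis.onRecommendationRequest = (items) => recommend(items);
```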
### 🚧 Extensions
One interesting aspect to note is that the security properties of tiles mean that they can
be safely used as extensions to the agent in many cases, and with a simpler installation
ceremony, at least for extensions that do not need to touch non-tile content.
A generalized intent system would make it easier to expose to essentially all tiles a number of
useful capabilities that extensions normally have access to. This is by design: if the agent
becomes as powerful as this proposal makes it, and if we increase how much user interface the
agent affords (as opposed to just a thin chrome in browsers) and how much functionality it handles
natively (eg. search and social), then it needs to be easily and readily extensible so as not
to die a suffocating death under the endlessly conservative bureaucracy of browser vendor
interfaces.
Tiles as extensions, with granular ways to replace specific parts of the UI and to handle a
significant number of agent functions, means that we can have a perpetually dynamic experience
that can be highly tailored to people and environments.
### ⚠️ Examples
tk one in which it is asked to sort a list
## 🚧 Ecosystem
We have vested tremendous amounts of power in search and social sites, and much of this power is
maintained by architectural decisions made in browsers. For all the good that they may otherwise
do, browsers have generally sold out to search engines, are central to keeping search stuck
in a conservative and captured position, and while they are less directly involved as enforcers
for social media, they have nevertheless copped out of putting the user first there too.
We seek to radically shift power back to the users. This requires shifting the Web in the direction
of an architecture that claws power back from search & social.
The core observation to make here is that in order for that to happen, both search and social need
to be protocols rather than systems encapsulated behind proprietary sites, and these protocols need
to be natively supported by the agent.
One may reasonably ask why search and social are bundled in the same section. One reason is that
they are two particularly important online activities, but the similarities run deeper than that:
- Search and social are fundamental modalities of interaction with an information space (along with
browsing) and we benefit from unifying them. In a nutshell, they correspond to different epistemic
approaches to discovery: browsing is self-directed, search is querying expertise, social is relying
on one's social environment. (Interestingly, most search today is implemented using social signals
rather than any actual expertise, but that's a topic for another treatise.) Bringing them together
in a way that allows new things to be built from their interaction makes the user experience more
fluid and powerful.
- The difference between search and social is greatly exaggerated. Both produce lists of resources
which they order by a relevance metric which they derive from social signals (likes or links). The
interaction modality differs (push or pull), but there is significant overlap and some providers
do both at the same time (eg. YouTube). There is no important difference between the search result
for "shoes" and someone's curated list of links about shoes, or yet again what emerges from an
online discussion between a group of shoe lovers. We are just looking at different methods of
aggregation and curation. Unifying the primitives that support these alternatives makes it easier
to move between them and to offer much better options to users that don't privilege one
alternative over the others.
- Both search and social suffer from similar problems in that they constrain what can appear in
their lists (links, cards) and have issues with performance and context-switching when opening a
link. This has led to the invention of all kinds of more or less ad hoc fixes, many of which aren't
great (see AMP, SXG, `<portal>`, in-app browsers, FBIA, or the Apple News Format).
Tiles and native search/social work well together because tiles can be embedded in aggregation
feeds (be they curated manually or algorithmically), which addresses the last point well. Because
tiles can be locally manipulated, having a uniform approach to search, social, and curated feeds
means that it is easy to see something good in a search result, read it in place because the tile
can be loaded without navigation, "RT" it from the search results list, and drag it into a feed
you are curating, all without switching context.
At a high level, we can explore the integration of ActivityStreams in general with tiles. This can
include various ways of rendering the tile metadata from the stream (as a card in a list if that
is preferable to active content in some cases, at various sizes when active). There is ongoing
thinking about integrating Activity* and IPFS that could certainly help here.
### 🚧 Native Search
Search isn't doing great today. Part of that is classic
[*enshittification*](https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys) that happens when a
market is captured ([well summarized for search](https://dkb.io/post/google-search-is-dying)), part
of it is that too much money is being taken out of the Web so that producing high-quality content is
often too expensive and can't compete when it is subsidising the low-quality stuff, and part of it
is that [people are decreasingly excited about putting their content in the wide open for indexing
by a panoptic engine](https://maggieappleton.com/cozy-web). It's plausible that without browser
(and mobile) defaults to artificially prop it up, we'd long have seen a major upheaval of the
search landscape, and I don't mean by LLM gimmicks.
Search is captured, structured to favor generalist engines irrespective of performance, facing a
degrading experience, and subject to advertising-driven issues. How can a protocol for search fix that?
If browsers search over a protocol rather than through a proprietary interface, multiple search
sources can be combined. The browser can expose richer UI to pick specialized vertical search
engines, can search the local data which it helps organize (as explained farther down), and can
expose an experience that is better unified with social and other curated aggregations of content.
Additionally, using a local recommender (also farther below), an engine can return an unsorted list
of the top 50 items which can then be locally filtered and ranked.
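A sketch of that agent-side flow follows, with the caveat that no search protocol is being
specified here: the endpoints, query format, and result shape are all assumptions for
illustration.

```javascript
// Hedged sketch: endpoints, query format, and result shape are all
// assumptions. The point is the agent-side flow: fan out to several
// engines, merge, then let the local recommender rank the unsorted pool.
async function federatedSearch(query, engineEndpoints, rankLocally) {
  const pools = await Promise.all(
    engineEndpoints.map(async (endpoint) => {
      const res = await fetch(`${endpoint}?q=${encodeURIComponent(query)}`);
      return res.ok ? res.json() : [];
    })
  );
  // dedupe by content address, then rank on the user's side
  const byCid = new Map();
  for (const item of pools.flat()) byCid.set(item.cid, item);
  return rankLocally([...byCid.values()]);
}
```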
Because search is a protocol, it needs to be paid for (since it can't be ads). This is done using the
money protocol system outlined in the money section. Where that money comes from is the agent's
decision; it could be paid by the user, by the user's employer through the financial object
capabilities, or it could be sourced from ads rendered natively (meaning that the agent could
organize competition for the best search-contextual ad for a given query, in a privacy-preserving
manner, using the approach detailed in the ads section).
For the younger reader, the idea of search as a protocol might seem strange, but
[it isn't new at all](https://en.wikipedia.org/wiki/Wide_area_information_server).

#### 🚧 A Business Model for Browsers & Browser Engines
As indicated above, the system we use today to pay for browsers is in serious need of fixing.
It has several problems:
- It creates an economic structure that guarantees that the market for search will
tip in favor of a single company.
- It pays for browsers, but doesn't pay for browser engines. There is significant overlap, but
with this model browsers can also free-ride on engine development. We're down to three major
browser engines, and dropping to two or even one within our lifetimes seems plausible. The
system isn't working.
- It limits the evolution of search by restricting the search modality to one generalist search
engine (since the money comes from that default). This hurts vertical search engines and
pigeonholes the Web in the Clippy model where a search engine has to guess which vertical was
intended.
- It gives search sites control over the search UI even when it means degrading it, which experience
has shown will happen (following the process which Doctorow has dubbed
[*enshittification*](https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys)) and which draws
power away from the user.
- It focuses search primarily on an ad model (that's what justifies paying for the default position
in the first place) that can be seen to interfere with the quality of the outcome.
- Being paid to pick a default search engine rather than guiding users to the one that is the best
for them is a direct betrayal of their trust.
All in all, the system we have is user hostile and deeply conservative, forcefully maintaining
search in a mediocre model that hasn't evolved in decades and failing to support browsers.
Here is a high-level overview of the alternative which we can pursue:
- Search revenue is *already* used to pay for browsers (just in a poorly-organised way) so we can
keep using that source of money without negatively impacting anything.
- Native search in the agent means that the agent has to *pay* for API calls to search engines. Not
much, likely a small fraction of a cent, but pay anyway. That money can come directly from the user
(if they don't want ads) or can come from native ads. (Both the money protocol and native ads are
discussed further in the document.)
- A small fraction of that payment could be (verifiably) extracted and routed to both browser and
browser engine. (I would argue that the browser engine should get the lion's share, but we can
have that discussion later.)
This approach would most easily be deployed with regulatory backing, but we could envision a
governance system in which a search engine can only be considered by an agent, and an agent can
only be a recipient of payment, if they abide by these terms, with the possibility that violators
would be excluded (much as the CA/Browser Forum governs the WebPKI).
[Brian Kardell has estimated](https://bkardell.com/blog/WhereBrowsersComeFrom.html) that maintaining
all three current engines at a level at which they are competitive and functional requires about
USD $2bn per year. The absolute number may seem high, but considering that browser engines are
critical infrastructure for a system that is used by 5bn people and that this is less than 1% of
global search revenue, the approach seems both reasonable and cheap.
An additional benefit of this system is that it could liberate major browser engines from large
corporations whose interests routinely conflict with their users'.
### 🚧 Native Social
Remember Flock? It didn't last forever, and they had to hack their way to making it work, but they
were on to something. Remember MySpace? A lot of it might have been ugly, but dammit it was *your*
ugly. It was inventive and empowering, and it sure as hell beat the industrial drivel smelling
of nothing but meetings that has become the norm across social media.
An agent that supports ActivityPub natively has similar advantages for social as described above for
search. It integrates well with tiles, it opens up the possibility for a more powerful user
experience in which things can just be moved around, it works with a local recommender that can support
not just ranking but collectively-governed block lists, for instance. Native ads can be used to pay
for the infrastructure but also to pay content creators, without privacy issues and without
engagement maximization.
The current model of social media content isn't great for content: each post can contain some
highly restricted data that is silo-specific. With tiles, *arbitrary* content can be in the feed.
You can "tweet" an app and it just works inside the feed. And your readers can just click the
bookmark button to install it. You don't post a link to a PDF, you just post the packaged document.
The ability to (safely) post arbitrary content to social creates the demand for editors that can
easily *create* content to post according to specific styles. But because the tile+intents system
is highly general, those editors can be implemented simply as tiles that handle a specific intent, and
can themselves be distributed and installed from social. (That might suffer reading over.) For
instance, I can create a tile that can receive an uploaded picture, edit it a bit, throw in a bit
of text, and can output a tile that contains all of that in the style of Instagram, signed by the
author. Or I could create a tile that lets you pick a few colors and generates a nice-looking
view of a color theme (maybe just $n$ stripes) which you can then post to a color theme community.
The content-generating tiles can themselves just be posted, etc. Whenever the user wants to publish
new content, their feed uses a `publish` intent that gives them a choice of tiles that can produce
social content. Those tiles are never allowed to post directly to any feed; they just return
postable tiles and the agent handles that.
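To make the flow concrete, here is a hedged sketch reusing the indicative `invokeIntent()` API
from the composition section; `postToFeed()` is equally hypothetical and stands in for whatever
agent-native call appends to the user's feed:

```javascript
// Hedged sketch: invokeIntent() is the indicative API from above and
// postToFeed() is hypothetical. The editor tile only returns a postable
// tile; the agent, never the editor, writes to the user's feed.
document.querySelector('#new-post').addEventListener('click', async () => {
  const postable = await navigator.invokeIntent('publish', 'org.w3.activity');
  if (postable) await navigator.postToFeed(postable);
});
```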
Note that because tiles are safe, you can "install" one that replaces agent-native functionality,
basically like an extension. So you can have one that renders your feeds your way, for instance.
People [have a lot of ideas](https://docs.google.com/document/d/1EnkjkSW14hR_uTdwEW7vqRaaIXDMlNQ4WRNoPmnn_z0/edit?pli=1#heading=h.i0urqqmbdsw0) about what they
would like their feeds to look like and do. There's no reason to hold them back.
:::info
* [ ] [Mauve might be on to something](https://blog.mauve.moe/posts/peer-to-peer-databases#p2p-social-apps)
:::
### 🚧 Native Feeds
Feeds of the (broad) RSS family are basically simple forms of social feeds. Assuming native social,
they shouldn't be excessively hard to support.
Doing so, and adding tile support to RSS formats, would not only be convenient for people, but it
would also address the issue that RSS content has to choose between a short efficient summary that
doesn't have everything or the full link which takes you elsewhere.
### 🚧 Purpose-Specific Protocols Everywhere
We can support this evolution of the ecosystem over time by adding new built-in
purpose-specific protocols. These have the advantage that they make a composable system founded
on tiles more attractive, that they can be governed to work better in support of users, but also
that they make sites easier to build because they work off the shelf.
Some examples include:
- **Advertising**. This is the topic of a section of its own further down, but the core idea
is that purpose-specific ad protocols can be privacy-preserving and user-centric.
- **Chat**. Talking to customer service or to other people, or holding a conference, need not involve
implementing your own chat and opening a direct channel but could work via native Signal support.
- **General Messaging**. It should be an explicit goal to eliminate email as a contact method or
identifier. It leads to spam, it makes it possible to join identity across contexts, it gives too
much control over messaging to the sender rather than the user. By the user's leave, an entity
can be given a unique token with which it can notify the agent to load a tile with a message.
This makes it possible to receive communications from a site, but to ensure that they can be
revoked at any time, without putting people through the hassle of creating their own
single-origin email.
- **Buying**. Having sites manage their own shopping carts is inconvenient. They do it poorly,
they share cart state with arbitrary third parties so as to target cart abandoners with ads,
they lose them if you're not logged in, you can't manage multiple ones, etc. Agent-side cart
management can be supported relatively easily (intent-to-buy), can help power privacy-preserving
ads, and works well with a money protocol. This still requires a protocol to actually carry
the purchase out as well as to verify stock availability.
- **Declarative Telemetry**. Publishers/authors benefit from knowing what's happening on their
content. This is a constant source of disagreement because those benefits are real, but people
rightly also don't want to have their behavior tracked in great detail. Making telemetry
declarative, and deploying infrastructure for privacy-preserving measurement (à la Prio),
could help negotiate this issue and bring it to a workable close.
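Here is a minimal sketch of the revocable tokens described in the General Messaging item above; every name in it is an assumption:

```typescript
// Hypothetical sketch of revocable messaging grants managed by the agent.
import { randomUUID } from "node:crypto";

declare function enqueueForUser(messageTileCid: string): void; // agent-side delivery

class MessagingGrants {
  private grants = new Map<string, { site: string; createdAt: Date }>();

  // The user grants a site a unique token; the site never learns an email
  // address or any identifier that is shared across contexts.
  issue(site: string): string {
    const token = randomUUID();
    this.grants.set(token, { site, createdAt: new Date() });
    return token;
  }

  // A site presents its token to ask the agent to load a tile with a message.
  accept(token: string, messageTileCid: string): boolean {
    if (!this.grants.has(token)) return false; // revoked or never issued
    enqueueForUser(messageTileCid);
    return true;
  }

  // Revocation is a single deletion: the site can no longer reach the user.
  revoke(token: string): void {
    this.grants.delete(token);
  }
}
```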
:::warning
Add: CRDT over Signal
:::
### ⚠️ Examples
- social feed, install intent to post, post, etc. Show ActiveGram and add the
colour themes poster. Creating a new post puts the "post" button in the sidecart.
- curation to your own feed by picking from other feeds
(see https://tomcritchlow.com/2023/01/27/small-databases/)
- search results with multiple sources, PDFs straight embedded
## 🚧 Money
Pretty much all of the Web's standard architecture, and more generally the Internet as well,
was designed as if money were someone else's problem. Some fondly see this as freedom from
commercial interests but what it actually does is make it difficult for people who, like,
need to make a living to participate, and it makes the bed for platforms that will all too gladly
fill the vacuum with their own ways to monetize online activity.
A Web that puts agency first is also a Web that ensures that publishers (in the general sense of
people who put stuff out there) can make a living, and can do so without being bossed around by
platforms. It's also a Web that aligns the incentives of publishers with those of their users.
If the economic infrastructure only works when you betray your users, people still need to eat.
They'll find ways to do it that they can live with (hire lawyers to fake caring about
privacy, imagine that there's a value exchange in surveillance, think that they're moving the
ethical needle internally with vacuous projectsβ¦) but they'll do it.
Revenue models have largely been limited to (some of these overlap):
* **Ads**. Advertising isn't *necessarily* bad, but the specific manner in which it has been
implemented today is problematic in essentially every which way. It operates on violations of
privacy, it requires publishers to backstab their users, it gives all the power and most of the
money to parasitic intermediaries, it bankrolls disinformation, ad creatives are out of control
in terms of the resources they burn, and the whole system is full of fraud that drains money
from buyers. One star, would not buy again.
* **Subscriptions**. These are great if you don't mind catering only to a rich audience. Also,
do you know how subscription-based services find users? Ads.
* **Buy Once & Own**. This has become relatively niche, and tends to involve DRM. It typically
requires being able to maintain one's own reliable digital archive, which most people aren't
equipped to do.
* **Micropayments**. They have been right around the corner for decades. Even assuming broader
deployment for a standard like Web Monetization, no one has figured out an interaction that
works for micropayments. Time-based accounting is only appropriate for some systems, payment
for access to a single item (eg. one article) is tricky when you can't predict the quality of
that single item, etc. Tipping is often worse.
Payment methods haven't fared much better:
* **Pay With Your Data**. This is intimately tied to ads, but in some cases you might be paying
with your data even though the product is not showing you ads directly (eg. WhatsApp, Chrome).
This economy primarily benefits intermediaries over people and publishers, and
[leads to market concentration](https://berjon.com/competition-privacy/).
* **Credit Cards**. Very common, but high-touch and inconvenient. These probably remain the best
option above a certain amount, but they are clunky (and risky) for frequent use.
* **App Store**. Because let's create yet another intermediary, that's going to help. People get to
pay more while publishers make less, for no good reason.
* **Cryptocurrency**. This remains anecdotal in usage and is still all over the place in
modalities.
A key point to note is also that intermediaries completely control ads, data markets, credit cards,
and app stores, which is to say the most important of these systems. Intermediaries are not
accountable to either users or publishers and systematically shape the system to favor themselves.
It is particularly difficult to build a system that puts users in the driver's seat under such
conditions.
Intermediary capture (see [*Intermediary Influence*](https://scholarship.law.columbia.edu/cgi/viewcontent.cgi?article=2857&context=faculty_scholarship)) is also a problem in that it
directly defunds the Web. Platforms will often present themselves as funding publishers; the
truth is that they have inserted themselves between publishers and revenue sources, and use
that position to extract more value than they provide. This serves to directly remove value from
the productive, interesting, inventive, or user-centric parts of the ecosystem. Many of the Web's
problems come from the fact that it is, quite simply, starving for funds with which to build
better experiences because too much of the money is being sucked out of it by the platforms.
In the same way that Europe underdeveloped Africa, the platforms are underdeveloping the Web.
In order to ensure that Web revenue distributes more fairly, in ways that work better for people,
we need to consider money a core part of the architecture and to develop standards which we can
use to irrigate the world.
### 🚧 Requirements
The money part of the system needs to support the following properties:
* **Protocolize Intermediaries**. Payments necessarily involve some degree of intermediation,
as does ad serving (at least if it is going to reach any kind of practical scale). In order to
avoid transferring power to intermediaries who by nature lack accountability, the intermediary
layers need to have their behavior largely dictated by a protocol designed to offer
guarantees of capture-resistance.
* **Bidirectional**. People need to pay but people also need to be paid. It should ideally be as
easy to receive money as it is to send it.
* **Fluid & Programmable**. The money protocol needs to support very small sums efficiently and
needs to make it easy for money to flow according to arrangements that are more complex than
just one point to the next. It needs to be easy, for instance, to support atomic revenue-sharing
arrangements (see the sketch after this list), or to have a capability system that allows one
party to delegate spending power to another. This kind of additional power is required to
support end-to-end "mashup" payments for composable systems.
* **Standard & Interoperable**. The properties that we are seeking from the system are only
possible if the components that support it are available off the shelf as open, standard
items whose behavior can be trusted.
* **No Data Market**. Data markets are intrinsically problematic not only from a privacy
standpoint but also in that they tend to mechanically lead to concentration.
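As promised in the Fluid & Programmable item, here is a minimal sketch of an atomic revenue-sharing arrangement expressed as plain data; the types and payment pointers are illustrative assumptions:

```typescript
// Hypothetical sketch: a declarative split that a money protocol could
// settle atomically. Payee pointers and types are assumed for illustration.
interface Split {
  payee: string; // eg. a payment pointer
  share: number; // fraction of each incoming payment
}

// A composed service declares how every payment divides, so paying the
// composition pays all of its parts in one step.
const arrangement: Split[] = [
  { payee: "$wallet.example/author", share: 0.7 },
  { payee: "$wallet.example/search", share: 0.2 },
  { payee: "$wallet.example/ads-infra", share: 0.1 },
];

function settle(amount: number, splits: Split[]): Map<string, number> {
  const total = splits.reduce((sum, s) => sum + s.share, 0);
  if (Math.abs(total - 1) > 1e-9) throw new Error("shares must sum to 1");
  return new Map(splits.map((s) => [s.payee, amount * s.share]));
}

// settle(100, arrangement) => author: 70, search: 20, ads-infra: 10
```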
The Web has considered composability before. There was a brief phase during which
[mashups](https://en.wikipedia.org/wiki/Mashup_(web_application_hybrid)) surfaced as a popular
alternative to glue sniffing, in the sense that they were a cool idea for weaving services
together, but they completely disregarded privacy and had no business model whatsoever.
Without a way to make revenue flow across all involved parties, it's really hard to imagine
how mashups could have developed into a sustainable way of providing a service based on
collaboration between multiple interoperable smaller components. That's a mistake we shouldn't
repeat: we should make sure that composable services, which are better for people, are easily
supported by "composable payments" that extend beyond the naive
$\{user, advertiser\} \xrightarrow{pays} site$ model.
### 🚧 Ads
Everyone hates ads. They're like little shrill blinking reminders of capitalism stabbing you in the
eyeballs. It's unpleasant. They've turned our digital lives into a hellscape panopticon in which
disinformation, fraud, and malware thrive.
But consider this: overall ad spend has been a more or less constant share of circa 1-2% of GDP
for as long as it has been measured. In 2023, worldwide digital ad spend is estimated at around
USD 700 billion. If that kind of money goes primarily to systems that don't support user agency,
then those are the only systems that will prosper. That money should, and could, be used to
irrigate useful, user-centric systems. The question then becomes: what if ads but good?

The ad stack is deep and complex; this section currently limits itself to indicating a broad
direction and some notes on approaches.
Tile-based ad serving has inherent privacy and security benefits. Many of today's issues in
online advertising stem from the fact that we are composing ads into content in a way that
leaks left, right, and center. (That is broadly the space that
[fenced frames](https://wicg.github.io/fenced-frame/) and
[FLEDGE](https://github.com/WICG/turtledove/blob/main/FLEDGE.md) inhabit.)
Ads that are understood as such by the agent work well with the idea of a fluid money protocol
described below, as well as with making search and social native. You can source ads from a
different provider and use part of the revenue to pay for the search, etc. and potentially
revshare with content.
Serving is only part of the problem: ads need to provide verification that they were shown,
attribution needs to be measured, etc. There has been significant improvement in
privacy-preserving purpose-specific ad protocols over the past few years, and some of that work
could be brought to bear. (See for example
[Interoperable Private Attribution (IPA)](https://docs.google.com/document/d/1KpdSKD8-Rn0bWPTu4UtK54ks0yv2j22pA5SrAD9av4s/edit),
[Attribution Reporting](https://wicg.github.io/attribution-reporting-api/),
[Privacy-Preserving Ads](https://github.com/WICG/privacy-preserving-ads), or
[Private Click Measurement](https://privacycg.github.io/private-click-measurement/)). These
would benefit from implementation as
[verifiable decentralized systems](https://filecoin.io/filecoin.pdf).
One core source of issues in today's advertising ecosystem is the reliance on Real-Time Bidding
(RTB). RTB is inefficient (it requires very substantial computation just to select an ad) and
while not strictly required, it operates on the assumption that data will be shared with third
parties that will develop profiles over time and recognize people around the Web so as to target
them. What's more, the ad auction system in general is extremely opaque to the point that we
don't even know which pricing strategy a major actor may be using.
One interesting alternative here is to eliminate RTB by having buyers bid on audience segments
they want in advance, and having publishers offer those segments ahead of time based on
seller-defined audiences (which can be defined client-side, restricted to safe categories, and
protected from leakage). Ads are then shown based on a much simpler runtime selection process
than RTB, using a process that can be simultaneously private, governed as an infrastructure
commons, and accountable.
We could explore implementing this using
[PASTRAMI](https://research.protocol.ai/publications/pastrami-privacy-preserving-auditable-scalable-trustworthy-auctions-for-multiple-items/), possibly ported on FEVM+FVM.
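To give that alternative some shape, here is a minimal sketch of agent-side selection over pre-declared segments; all names are assumed:

```typescript
// Hypothetical sketch of ad selection without RTB: buyers commit to audience
// segments in advance, and the agent matches locally at display time.
interface PrebidOrder {
  segment: string;      // a safe, seller-defined audience category
  pricePerView: number; // agreed ahead of time, no runtime auction
  adTileCid: string;    // the creative, packaged as a tile
}

// The user's segments are computed client-side and restricted to safe
// categories; nothing about the user leaves the device.
function selectAd(
  userSegments: Set<string>,
  orders: PrebidOrder[]
): PrebidOrder | undefined {
  return orders
    .filter((o) => userSegments.has(o.segment))
    .sort((a, b) => b.pricePerView - a.pricePerView)[0];
}
```

Selection becomes a local filter-and-sort that can be audited, instead of an opaque realtime auction.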
Whether we like them or not, ads are essential to the funding of media and no alternative
scheme has come anywhere near replacing them. This makes digital advertising a critical
infrastructure for democracy, which we shouldn't let be captured, run unaccountably, or
leave subject to overcharging and extractive practices. An added bonus of using tiles is
that they help build a system of long-term traceability and accountability for the ads themselves,
something the ecosystem has so far collectively failed to deliver.
### 🚧 Subscriptions & Memberships
Part of the reason why subscriptions are mostly for rich people nowadays (in addition to the
lack of "disposable" income) is because they tend to lack fluidity and to focus entirely on
pairing one person with one service. In turn, this means that a reasonable bundle of content
often comes at an unreasonable price because you need to build it from offers that have a lot
of filler. And because of threshold effects on prices (and the steep cost of customer
acquisition over inefficient ad channels), it can often be a safer bet to sell a higher-priced
subscription to a smaller audience than to try to reach a much broader market at lower prices.
For lack of expertise in this area, I am keeping this section short. We should loop in our
friends from [Unlock Protocol](https://unlock-protocol.com/).
### 🚧 Money Protocol
Making money a protocol is not trivial and relying on an existing stack is likely to be highly
preferable. The most promising option may be the [Interledger Protocol (ILP)](https://interledger.org/).
They have [specs](https://github.com/interledger/rfcs) and [code](https://github.com/interledger),
and the protocol's properties are well-designed. It supports small payments (and should support
smaller ones as it matures), and it can translate between arbitrary currencies using a network of
nodes that compete to provide the cheapest trusted route. It is intended to wrap API calls (rather
than to have a payments side-channel) and to be efficient. They have also built infrastructure for
wallets and nodes, and have been collaborating with some browsers around
[Web Monetization](https://webmonetization.org/) (even if that solution isn't perfect).
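As a purely illustrative sketch (this is emphatically not ILP's actual API), wrapping an API call with a payment could feel like this:

```typescript
// Hypothetical sketch: payStream and the Payment-Receipt header are
// assumptions for illustration, not part of any existing protocol.
declare function payStream(
  paymentPointer: string,
  amount: number
): Promise<{ receipt: string }>;

async function paidFetch(
  url: string,
  paymentPointer: string,
  price: number
): Promise<Response> {
  // Pay first (or stream payment as you consume), then attach proof of
  // payment to the request itself rather than using a side-channel.
  const { receipt } = await payStream(paymentPointer, price);
  return fetch(url, { headers: { "Payment-Receipt": receipt } });
}

// A reader paying a fraction of a cent for one article: no account, no card
// form, no subscription.
// await paidFetch("https://example.com/articles/42", "$wallet.example/pub", 0.003);
```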
:::info
**todo**
- [ ] Boris suggests looking at
[Cross-License Collaboratives](https://writing.kemitchell.com/series/cross-license-collaboratives)
which does seem interesting and applicable as a way to support flow and revenue sharing.
- [ ] Revshare means having a way to decide how to share. Is this something that our friends in Network
Goods can help with, eg. [Generalized Impact Evaluators](https://research.protocol.ai/publications/generalized-impact-evaluators/ngwhitepaper2.pdf)?
Go read and find out.
:::
* [ ] At a technical level, ILP is interesting in that it supports nanopayments via a standard protocol
### ⚠️ Examples
tk
## 🚧 Agent
The agent is a set of discovery mechanisms: browse, search, social. Tiles make it possible to
create content and to tessellate apps based on principles that empower the agent in the name of
the user, but beyond that the agent itself needs to offer some services (that can of course be
replaced or skinned by tiles serving as extensions).
:::info
**todo**
- [ ] Emily Bender, in *Situating Search*, details some typologies of search that I think would be
interesting to map to search/browse/social to see what's missing and to develop a stronger
theoretical backbone for what we're building.
:::
### 🚧 Local
Tiles are content-addressable, which also makes them local-first. This helps organize data in
the user's service, as explained in the next section, but it also integrates well with
an architecture in which the user's data that a tile processes is stored locally (and possibly
synced across the user's devices).
The [unhosted project](https://unhosted.org/) described the difference in architecture clearly.
This is the architecture in common use today, in which each site is the gatekeeper to your
data, and your data is scattered all over the place for no good reason:

Instead, we can keep all of the user's data locally and selectively grant access to it to tiles
and the apps they tessellate into, with an architecture that looks like:

There is more than one way to make such a system work, and we need to pick between various options,
but the fundamental principle needs to be supported.
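A minimal sketch of the principle, with an assumed `LocalStore` API:

```typescript
// Hypothetical sketch: the agent's local (and device-synced) store grants
// tiles narrow, revocable access. All names are assumptions.
interface LocalStore {
  grant(
    tileCid: string,
    scope: { collection: string; access: "read" | "readwrite" }
  ): Promise<void>;
  revoke(tileCid: string, collection: string): Promise<void>;
}

declare const store: LocalStore;

// A photo-editing tile gets read-write access to the photos collection and
// nothing else; the data never moves, only the permission does.
await store.grant("bafy...photoEditor", { collection: "photos", access: "readwrite" });
await store.revoke("bafy...photoEditor", "photos");
```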
:::info
**todo**
- [ ] Establish if [Solid](https://solidproject.org/) is a workable option. The examples and the
spec look like a simple storage layer hidden under a thick pile of RDF, but that might just
be first impressions.
:::
### 🚧 Organize
You see an interesting thing, you post it to social media with a quick comment, you move on. Two
months later you just *know* that you had that thing but where the hell is it? What smartass
comment did you share it with so that you can find it again? Of course, we all know that
"I have my brain on Twitter" isn't a bright way to go about taking notes but it's also the simpler
option.
Everyone and their dog is organizing a metric ton of information about you, and organizing the
information you see to make you behave this or that way, but precious few products are there to
help you organize your information your way, for yourself, and all of them expect you to do the work.
Your agent can remember so much for you because it's right there when you read it. It can do it
even better with tiles because they are local and because protocol interactions (eg. with social
or search) have clearer semantics. It shouldn't be twice the work to post to social and
keep a note of an interesting article. It should be trivial for the agent to expose a searchable
index of the content you've browsed, and it should be easy to organise what you've read on the web by
moving files around (more than a PDF collection, less like bookmarks).
Tools like [Readwise](https://readwise.io/read), [Zotero](https://www.zotero.org/),
[Notion](https://notion.so/), or [Roam](https://roamresearch.com/) should become organising
principles that extend the agent and guide you in organizing your local storage.
### 🚧 An End to PDF
One of the Web's unresolved shortcomings is the fact that it hasn't killed PDF. PDF has any number
of problems that make it a poor fit for the Web and more generally for the 21st century. It has a
fixed layout that isn't responsive, it has poor accessibility, it interacts poorly with copying and
pasting (or searching), and it doesn't have production or processing tooling anywhere near on par with
what the HTML stack has. Yet it persists because it can do something Web content generally cannot:
it's a file. You can just copy it around, attach it to an email, sort it into a local collection,
upload it somewhere, and it'll just work. You'll hate your life when you try to copy a paragraph from
it on your mobile screen, but it's easy to move it around as you would any other file.
Sure enough, EPUB could compete and would be a superior option, if any browser actually supported it.
Tiles have the same properties as PDF in terms of how they are packaged and can be easily manipulated
in what remains a file-centric world, but they have none of PDF's shortcomings and are in fact
more efficient to curate and sort into collections. Bringing PDF to an end is not what this project
set out to do, but the fact that it could achieve that as a side effect is a good sign that it is
a better iteration of the Web.
### 🚧 Recommend
Almost no recommendation or ranking system in existence in any product is good. I'm being
cautious and qualifying that statement because I haven't used everything, but I'm reasonably
confident that they all suck. From search to social to recommended articles on news sites
to product recommendation in ecommerce to generated music playlists to the appalling morass of
streaming services that take vicious pleasure in making your self-curated list of things you
want to watch impossible to find by burying it under reams of *BECAUSE YOU WATCHED KEN'S DREAMHOUSE*,
the preferences exhibited by the systems routinely seem off, useless, if not downright bizarre.
They are also opaque and uncontrollable. "No, I never want to read Bret Stephens or in fact be
reminded of the fact that someone would voluntarily pay him to write" or "never subject me to
reggae again" are pretty simple requests yet the best we ever get is, sometimes, "see less of
this." How much less of what exactly? Who the fuck knows.
These are all one-size-fits-all implementations backed by statistical personalisation code, none
of which is stellar. Moving recommendation systems primarily agent-side can help along
multiple lines:
- As independent products that are used to make recommendations in multiple different contexts,
they have an incentive to put the user first rather than to help grind the axe of whichever
product manager has clout this week.
- This ought to encourage them to expose controls that help render them more useful.
- Since they only access tiles, they cannot leak data. (Though we could consider forms of federated
learning for some cases. Note that it is possible to use ML locally anyway.)
- They can learn from preferences across different contexts, without privacy violations.
- They can interact with Activity* protocols to fine-tune filtering.
- This system also makes it a lot easier to establish standards for blocklists that local
recommenders can use for filtering (see the sketch after this list), and from that to evolve
communities that can govern them collectively.
- We can further imagine similarly distributed curation mechanisms (eg. the [Bechdel
test](https://en.wikipedia.org/wiki/Bechdel_test)) being used to recommend and label content.
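As referenced in the blocklist item above, here is a minimal sketch of local, inspectable filtering; all types are assumed:

```typescript
// Hypothetical sketch: community-governed blocklists applied by a local
// recommender before it ranks anything.
interface Candidate {
  tileCid: string;
  author: string;
  topics: string[];
}

interface Blocklist {
  authors: Set<string>;
  topics: Set<string>;
}

// "Never show me X" becomes an actual local rule you can read and edit,
// rather than a vague "see less of this" signal sent to someone else's model.
function filterCandidates(items: Candidate[], lists: Blocklist[]): Candidate[] {
  return items.filter(
    (item) =>
      !lists.some(
        (list) =>
          list.authors.has(item.author) ||
          item.topics.some((topic) => list.topics.has(topic))
      )
  );
}
```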
In fact, a shift to local recommendations is particularly empowering because it is a shift away from
machine learning (which has its uses but is at heart agency-reducing) to a world in which you
can use your social connections for curation, and in fact curate the curators. We don't lose
recommendations, we create the conditions for their improvement, in line with an approach that
takes the [dark forest web/cozy web](https://maggieappleton.com/ai-dark-forest) situation seriously.
<!--
- see https://chaos.social/@pkreissel/110099714893948163
-->
### 🚧 Identity
> You have one identity. The days of you having a different image for your work friends or co-workers
> and for the other people you know are probably coming to an end pretty quickly. Having two
> identities for yourself is an example of a lack of integrity.\
> β Mark Zuckerberg
It's a crowded field, but this is a solid contender for being one of the stupidest things ever said
about people, computers, and digital society all at once. Identity is a thoroughly contextual
concept, and the ability to choose which identity to present in which context is an essential
aspect of agency.
A recurring problem in today's digital ecosystem is how leaky identity is. In fact, there are
companies (most of the bigger tech companies among them) whose business model relies fundamentally
on producing an "identity graph" the purpose of which is precisely to force a unified identity upon
people.
Tiles help address this issue by preventing data leakage between contexts, but empowering
people to present the right identity in the right context is more than just preventing recognition.
People present different identities in different contexts, but sometimes they also need to prove in
one context that they are the same person as a given identity in another context. Identity
management requires keypair handling, account recovery, sync of local data (which of course the
sync service shouldn't be able to access; anything else would be deceptive), keeping track of which
context to expose which identity in, providing ways to fill forms out differently with different
identities, partitioning the local data that a tile can touch by identity, etc. Tiles
significantly simplify the problem, however, in that they make it a lot easier to expose
information knowing that it won't leak.
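As one example of the keypair-handling piece, here is a minimal sketch of deriving unlinkable per-context key material from a single master secret. The HKDF calls are standard WebCrypto; everything around them is an assumption:

```typescript
// Hypothetical sketch: each context gets independent key material, so
// identities cannot be correlated across contexts, yet all of them are
// recoverable from the one secret the user backs up.
async function contextKey(
  masterSecret: Uint8Array,
  context: string
): Promise<Uint8Array> {
  const master = await crypto.subtle.importKey(
    "raw",
    masterSecret,
    "HKDF",
    false,
    ["deriveBits"]
  );
  const bits = await crypto.subtle.deriveBits(
    {
      name: "HKDF",
      hash: "SHA-256",
      salt: new Uint8Array(32), // fixed salt, for the sketch only
      info: new TextEncoder().encode(`identity:${context}`),
    },
    master,
    256
  );
  return new Uint8Array(bits);
}

// contextKey(secret, "work") and contextKey(secret, "music-forum") are
// computationally unlinkable without the master secret.
```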
The user's agent is the only software with the legitimacy to manage the user's identity. The idea that
identity should be managed remotely, by sites, and that identity providers can use information
about people for their own purposes is fundamentally wrong.
It is worth noting that moving identity management to the client, moving as much data as possible
to the client, and preventing leakage all conspire to create a much safer cybersecurity
environment. Part of our endgame should be to make maintaining people's identities
or data on servers come to be seen as unsanitary and ridiculous, much in the same way that we
look back at [people swilling mercury](https://en.wikipedia.org/wiki/Mercury_poisoning#History) or
[using asbestos blankets for hospital patients](https://commons.wikimedia.org/wiki/File:Guy%27s_Hospital-_Life_in_a_London_Hospital,_England,_1941_D2325.jpg).
:::info
**todo**
- [ ] See if there's anything to import from [Chris Messina's identity
ideas](https://player.vimeo.com/video/10517404?h=50d7a0c784)
* [ ] Work with [CASA](https://github.com/ChainAgnostic/CASA)
:::
### 🚧 What About Browsers?
One thing that I never want to lose sight of is the transition from what we have to where we're
going. The Web has flaws, but it also has reams of amazing content that we can't just leave behind.
This is why this plan is architected around adding a primitive and then progressively enhancing the
platform with the advanced, superior capabilities that this primitive enables. But with that in mind,
what happens to browsing as we know it today?
The simple answer is that it stays on, essentially as the basic case of a more general environment.
Browsing a page is just like accessing a tile, except that it has limited access to powerful
capabilities, cannot expose an intent, cannot be embedded in social or search, and cannot be
manipulated locally. Having the two side by side not only provides for a gradual transition and backwards
compatibility, but also serves as a comparison that makes tiles more attractive because they are
more user-friendly.
The new way of weaving applications together which we are outlining doesn't only compare favorably
with browsing interactions: it also works better than most of the application monoliths we find on
our phones and laptops. Ultimately, there is no reason to have both a browser/agent and an OS.
([Capyloon](https://capyloon.org/) calls itself a "user agent" and that feels exactly right.)
### ⚠️ Exploration & Examples
* [ ] What does this look like, tiles are abstract and we need to be clear that they can't really
look like what we have today if this is going to work
* [ ] This section needs exploratory UI (picking up on the conceptual UI) to show how things work. Social demo of posting à la Gram vs à la just posting colour themes vs full PDFs in the arXiv feed, etc. How you can just create a neat WinAMP skin and an intent can have it organise and play your Audius music.
* [ ] We can't fix anything if we don't rethink UI as well
* [ ] the choreographer for hyperapp components (intents, assistant, skills)
* [ ] Commands/skills rather than apps is the model. Intents have the advantage that they are localisable and very close in model to skills used in voice agents (or VoiceXML for that matter). So a component that implements a skill can be installed (safely, trivially) and then start answering voice commands.
* [ ] See https://en.m.wikipedia.org/wiki/Ubiquity_(Firefox)
* [ ] Zooming UI? Components connect and therefore can be grouped.
* [ ] Easels: https://saturation.social/@clive/109621917044737294
* [ ] Ethical habituation
* [ ] Good place to demo some MercuryOS ideas https://www.mercuryos.com/architecture
* [ ] NYT β with identity, subscription/paywall (Unlock), recommendations, comments
* [ ] Social media β everything is a tile, create/edit with your own new one, recommendation,
comments, identities, DM (via Signal)
* [ ] office software β EVERYTHING IS A TILE, self-create/edit, link to storage/organisation
comments (as annotations)
* [ ] playlist manager atop music (like a recommendation manager)
* [ ] Shopping cart management. A bit of metadata on products, an intent to add to cart.
Purchase just plugs shipping & billing as needed from identity, plus creates a contact
channel (some Activity* thing), pay with standard money. Merchant accepts and cryptoproof
of purchase is generated. A shop site is now basically just a CMS. The cart manager can
become a product, with the ability to produce coupons, search for cheaper optionsβ¦
## 🚧 Law
This document is not primarily a legal or policy document, and delving into greater detail of these
aspects should be done elsewhere. However, it is worth pointing out a few specific areas of
interest that would work well with the vision espoused here and that have been subjects of
discussion for upcoming regulation in at least the EU and the US.
The approach we have taken gives power to the people by making the user agent more powerful.
While that is probably the only logical way to achieve user agency, it also creates a major
opportunity for abuse in that the agent could be written to trick the user into favouring the
agent's vendor. Concrete evidence from the browser world shows that this is not a theoretical
concern. A legal approach when people have to rely on an agent that is placed in a position
of significant power is to give that agent *fiduciary duties*. Such duties make it illegal
for the agent to use its position to be disloyal to the user. The details of such duties
require [a much longer treatment](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3827421)
but the core principle is simple.
A browser being paid to pick a default search engine for you is like a stock broker being
paid to recommend investing in a specific company: it's a direct betrayal of trust and would
be regulated by fiduciary duties. However, the specifics of how to provide defaults and choice
screens may require a regulatory framework in its own right because of the power that comes
from that specific area.
Native search and social work on the assumption that these services abide by standards in
their respective fields. Evidently, new offerings will be incentivised to work with API
clients, but transitioning to standards will happen a lot faster with mandatory
interoperability.
## 🚧 Evolution
There have been many projects to reinvent the Web by tossing out the old; their over-inflated
promises provide cushy padding in the dumpsters of history. This proposal outlines a path for
change that tries to set itself apart from those predecessors in several ways:
1. It isn't motivated by theoretical or aesthetic considerations. Rather, it is ruthlessly
dedicated to switching the balance of power to users and to making their lives easier and their
experience of using the Web more satisfying, while also providing the means for authors to
deliver content more easily and in ways that will more readily make them money.
1. It does not remove or break anything on the existing Web. Rather, it is additive. Today's
browsing experience is merely managed as a degenerate case. Pages work just the same, they
are simply limited in the powerful capabilities that they can tap into. The expectation is
evidently that, over time, the superior experience of composable contexts will lead most new
services to be provided that way and most users to spend most of their time there. But that
HTTP blog from 2023 with all its trackers will still be there when you need it.
1. While the full set of capabilities described in this document is not trivial to put together,
the approach is implementable incrementally and benefits can start showing with but a small
subset of the capabilities. What matters in this document is to understand what this approach
*unlocks*. The long list of capabilities isn't about what is required, just about what becomes
possible.
1. Making content movable and locally useful, making search and social native and standard,
wiring intents together all have network effects. This design has the means to become
increasingly valuable as it is increasingly used.
1. It comes with a plan to pay for its own infrastructure and it doesn't ignore the fact that
people need to make money.
I believe that this is a path forward that is both pragmatic and ambitious, and that the
community exists that wants to build it.
## 🚧 Acknowledgments
Deep thanks to the following people (in alphabetical order of their given names) for their
highly valuable input:
[Boris Mann](https://plnetwork.xyz/@boris),
[Brian Kardell](https://toot.cafe/@bkardell),
[Dietrich Ayala](https://mastodon.social/@dietrich).
---
:::success
This is the dumpster for ideas that might be useful.
* What's the link to Mini Apps?
* "Consumption may be personalized, but it would be a stretch, in most cases, to call it self-directed"
* Grab from yellow notebook.
* It is impossible to rely on educating users, but it is desirable to lean harder into their intelligence and greater knowledge of their own lives than into their laziness and the convenience of passive monitoring.
* Mozilla product strategy notes from my Notion.
* [Hackability vs Usability](https://ipfs.io/ipfs/bafykbzacebhcml34t5725ciht2yjept2ula5c26rlbzyfubk77akxtas6ean6?filename=%28Routledge%20research%20in%20cultural%20and%20media%20studies%2C%2039%29%20Larissa%20Hjorth_%20Jean%20Burgess_%20Ingrid%20Richardson%20-%20Studying%20mobile%20media%20_%20cultural%20technologies%2C%20mobile%20communication%2C%20and%20the%20iPhone-Routledge%20%28.pdf)
* If we do develop the capabilities approach more, look into Alkire (Relational Ontology Capabilities) for a list that might work better than Nussbaum's. Feels closer to virtues/Vallor. Starts from abstract notions and looks for locally-relevant concrete developments.
* Intelligent, not artificial.
* Build for the Dark Forest Web (https://mastodon.social/@robin/109639042561917605)

:::