---
tags: draft, web
---

# User Agency

:::info
This is a very early draft, I am just capturing notes as I go along. It is barely coherent and not ready to review. Proceed at your own risk.

While the draft is in progress, sections are labelled with their maturity level:

* 🟢 — good to ship
* 🚧 — in progress
* ☠️ — just bare bones outline
* 📦 — material
:::

:::danger
**Issues**

* Revoice, too abstract, too distant in places.
* In **every** single section, relate it to how tiles as a primitive help, enable, or work together for better outcomes.
:::

The Web is in a bad place and many brilliant people have specific ideas about how to return oomph to the Web by fixing this or that specific aspect. Maybe the key issue is to make it faster? Or perhaps if we just added these APIs, things will start looking up? Might we just need to emulate native platforms?

There is nothing inherently wrong with these ideas, and at least some of them should probably be implemented one way or another. But the contention behind this document is that this whack-a-mole tinkering adds up to little more than refactoring deck chairs on the _Titanic_ — and we're not even sure whether the _Titanic_ might not be a submarine or an airship. Proceeding via small, incremental changes is a healthy and laudable approach, but it helps to have a sense of what it is that we're _incrementing to_. After all, a headless chicken, too, takes it one step at a time. We don't have an idea of what the web should or even could be.

To address this, this document is many things; in fact it is — by design — _too many_ things. It endeavours to be "_[vague but exciting](https://blog.mozfr.org/dotclear/public/Firefox_OS/proposal.png)_." It's wrong. It's biting off more than anyone could hope to chew. But it's a white paper that gives directions for the future of the web, and I hope that it can help us figure out what it is that we're all doing here.

<u>TL;DR part 1</u>: the core *philosophical* idea that runs through this document is that **_the web is about user agency_**. This idea will be made more precise below, but a few salient points about this approach are worth calling out right away:

* As I will argue below, the idea of user agency ties in well with the [capabilities approach](https://en.wikipedia.org/wiki/Capability_approach), an approach to ethics and human welfare that is concrete, focused on real, pragmatic improvements, and that has been designed to operate at scale.
* While the word "user" has come to mean something less than a person, I suggest that we reclaim rather than abandon it and enshrine the web user as the person who operates the web. More specifically, "user agency" points at the user agent (a.k.a. the browser) and a key tenet of this position is that limitations and mistakes in how we envision user agents are central to holding the web back.
* Perhaps counterintuitively, focusing on user agency does not make this position individualistic.
On the contrary, because it has to be about _everyone's_ agency, it imagines a web that is "[_a global community thoroughly structured by non-domination_](https://bookshop.org/books/reconsidering-reparations/9780197508893)." Put differently, under this view the web is the answer to the question of "_[what is a form of 'collectivity' that everywhere locally maximizes individual agency, while making collective emergent structures possible and interesting](https://c4ss.org/wp-content/uploads/2020/06/Aurora-ScaleAnarchy_ful-version.pdf)_."

<u>TL;DR part 2</u>: The core *technical* position in this document is that there is **a single primitive — which we call *tiles* — which we can add to the Web platform and that can then serve as the foundation for a shift of power from servers to users**. In a nutshell:

* Tiles cannot interact with the network other than to load other tiles or indirectly via purpose-specific interfaces that the user agent can reason about. This means that they can be granted access to sensitive data and functionality (so long as it isn't locally destructive) because they cannot generally exfiltrate information.
* Tiles are content-addressable packages, which means that they can be loaded from arbitrary sources and through arbitrary mechanisms, and can be stored and manipulated locally (eg. installation is just keeping something around).
* They can declare their ability to handle specific tasks or skills (in a manner reminiscent of intents/activities) and compose with the skills of other tiles, making it possible to weave tiles together (to tessellate) in an app-like way, mediated by the agent.

The result opens up a very rich and novel way of building things on the Web, with a world of potential new user-centric features (hence the length of this document), but using a relatively constrained set of new standards that lend themselves to an MVP implementation and iterations, that can be created over time, and that can live side by side with today's Web to ensure a smooth, progressive transition. *Revolution through evolution*.

Because this document is big and wrong, it is intended to remain a living doc and it invites your participation — come make it even bigger and wronger, vaguer and more exciting!

## 🚧 The Web Rocks; Browsers Suck

So, this section title might come across as needlessly harsh, especially in a document written inside a browser. But let's be frank, browsers: we need to talk.

No one knows what the Web is. Seriously, no one does. No group ever managed to reach any kind of useful agreement as to a definition of the Web. It doesn't have to be over HTTP and it doesn't have to happen in a browser, and conversely there are things that run over HTTP or in a browser that many would consider to not be the Web (eg. PDF). But we don't need to agree on what it is to agree that it's awesome. Whatever different fuzzy ideas we have in our minds are close enough to one another.

The problem is, though, that browsers have painted themselves into a corner, and we could benefit from taking a step back and revisiting the assumptions that have brought us here.
They do very little to support people beyond running a browser engine safely, the UI system of windows and tabs is famously broken, and they make for a poor application environment, as people who get to vote with their feet keep telling us…

The goal of this project is to make incremental changes, not to reinvent everything from the ground up, but in order to make progress we *will* need to change some pretty ingrained things that are keeping us stuck here, and browser UI is one. The delta between the tech stack that exists today (though it may not be deployed in this way) and what this project aims at is relatively contained. This is primarily an evolution of the Web focused on fixing architectural gaps that keep us trapped at a local optimum discovered a couple of decades ago. And we need to recognise that UI metaphors enshrine that local optimum.

What's more, "browsers" are a figment of Web engineers' imaginations. From a product perspective, in the minds of most users, they don't exist. Or at least they don't exist as things that are separate from a search engine, and to a lesser degree from a social network (see in-app browsing). The architectural view in which search, social, and browsing are distinct is a distraction that does not map to the experienced reality of most people and that is in fact conceptually arbitrary with respect to the tasks that people actually seek to accomplish on the Web. In a sense, we're missing the Web for the tabs.

The rest of the document goes into more specific changes that need to be performed, but we can give a high-level understanding of the issues that need to be addressed with browsers in order to support user agency.

First, **browsers need to do more to support people in their online lives**, from managing their identities to staying on top of massive amounts of information. The tabs and bookmarks system is very much underpowered compared to today's web. It provides a poor environment for applicative use, which in turn is a poor match for the web's capabilities. By making extensions safer and by extending the space for UI capabilities, we can also make it easier for content to be promoted to extension-like status, which boosts the agent's power. Some of the field's brightest minds have been trying to make "Web Apps" happen for coming up on two decades. It's not going to happen, not like this. The majority of people just keep any number of apps open at all times but max out at three tabs — that's all you need to know about the future of the apps-in-tabs model: it doesn't exist.

Second, **browsing, search, and social need to be unified**. In most people's minds, the search engine and the browser are the same product, and on mobile (which is where it's at) social is also headed there with in-app rendering. (And AMP leaned into this mental model even harder.) Keeping them separate means that one gets to commoditise the others — right now it's the browser that's being commoditised in favour of systems in which people have decreased agency. We need the user agent to commoditise search and social right back where they belong: serving the user. This means turning them into protocols and having the agent provide the UI for them. It also means moving controls over ranking and recommendations to the agent as much as possible. Moving this control away from services and into people's hands improves agency directly, but also has less direct benefits for people. Choosing what is relevant (which definitely includes ranking and recommendation) is an editorial decision.
Democracy requires media pluralism, but under today's system we only have a small number of algorithmic media hegemons. "*Gatekeepers may no longer control what gets published, but algorithms control what gets circulated. (…) It is misleading then to argue that cultural circulation has been democratized. The means of circulation are algorithmic, and they are not subject to democratic accountability or control. Hyperconnectivity has in fact further concentrated power over the means of circulation in the hands of the giant platforms that design and control the architectures of visibility.*" (Rogers Brubaker, [*Hyperconnected Culture And Its Discontents*](https://www.noemamag.com/hyperconnected-culture-and-its-discontents/)). People will have greater agency in a world in which they have more and better choices of editorial relevance functions than of cereal brands; that's just a fact.

Third, **browsers (and even more so engines) need a business model** that doesn't involve selling users to search engines. It is not uncommon for browser developers to believe that the default search engine they ship with is not the one that is best for their users, but that is what foots the bill and so they close their eyes and think of England. In addition to misaligned incentives, this arrangement also creates a mechanical feedback loop in the search market. Because of increasing returns, the dominant search engine can pay browsers more for the default position. In turn, few people change the default (because doing so does not help them make an informed decision that self-evidently aligns with their interests), which in turn feeds dominance. Rinse and repeat. This dynamic guarantees that any search engine that acquires a modest and temporary advantage over others will come to own the market, irrespective of quality by any other criterion. This maintains both a lack of media pluralism in search and a lack of innovation in browsers' approach to search. Long term, it also doesn't support browser or engine diversity.

And fourth, **the architecture that browsers enforce puts most of the power in the hands of the server.** Almost all of the intelligence in a browser engine is dedicated to abiding by server-provided instructions. Apart from a number of security protections, the client side of the Web is profoundly dumb from the user's perspective. **Browsers enforce an asymmetry of automation that puts authors before users; it needs to be reversed.** Any number of typical features would work better if they were in people's hands: recommendations, search & social filters, blocking, identity, comments, shopping cart management, subscription & membership management. Most of these systems work poorly because they have to be constantly reinvented and reimplemented by businesses that should be focusing on their core competencies instead. The result is a system of widespread mediocrity. In the usability/hackability trade-off we have somehow managed to land in a place where we have neither.

To conclude, browsers today enable, not necessarily willingly, an ecosystem that is actively hostile to agency. We can't fix "just" UI or "just" tech or "just" the economics — we need to navigate the complex trade-offs involved in fixing all three while maintaining an evolutionary path, without a tabula rasa revolution. To do that, we need to rethink agency and then find better ways to empower it.
<div style="text-align: center; font-size: 3rem;">
⁂
</div>

## 🚧 Ethics

> "The ideas of economists and political philosophers, both when they are right and when
> they are wrong, are more powerful than is commonly understood. Indeed, the world is
> ruled by little else. Practical men, who believe themselves to be quite exempt from any
> intellectual influences, are usually slaves of some defunct economist."\
― [John Maynard Keynes](https://en.wikipedia.org/wiki/John_Maynard_Keynes), *The General Theory of Employment, Interest, and Money*

As technologists we are often reluctant to engage with philosophy, a reluctance often expressed by running in the opposite direction, all limbs akimbo, with an ululating shriek reminiscent of some of the less harmonious works of exorcism. Even those of us who are curious about it rarely seem to let it shape what we build. In the same way, however, that the more abstract forms of computer science *can* indeed help us produce better architectures, philosophy *can* be applied to the creation of better technology. A quick tour of the biggest problems that we face in tech — governance, sovereignty, speech, epistemic individualism, gatekeeping, user agency, privacy, trust, community — reads like a syllabus for the toughest course in ethics and political philosophy. There is no useful future for technology that doesn't wrestle with harder problems.

This is probably not the right place for a full-on treatise on ethics (no, do _not_ tempt me), but if we're going to work out how best to develop user agency it seems useful to agree on some basic notions of how to do good for people and of what having agency means. Three things that are important to share some minimal foundations about are: 1) working towards ethical outcomes doesn't mean relying on vapid grand principles but rather ought to be focused on concrete recommendations, 2) when considering agency we need to be thinking about *real* agency rather than theoretical freedoms, and 3) counterintuitively, giving people greater agency sometimes means making decisions for them, and that's okay if it's done properly. The rest of this section covers these in greater detail.

First, **focusing on user-centric ethics does not mean that we should get lost in reams of endless abstraction; on the contrary, we must focus on principles that can be *implemented*.** Many documents about ethical tech seem to be lists of lofty principles ("*for all humankind!*" or "*this must be fair and just and shiny!*"). These can sound nice, and can occasionally prove useful (for instance to decide disagreements), but for the most part it's hard to know what to do with them. By contrast, when working on standards, we only consider requirements that can be [verified with a test](https://www.w3.org/TR/test-methodology/) to be meaningful — everything else is filler. Being as strict in our ethical standards is challenging, but we can strive for it. In fact, on the web this is something that we have been doing for a while without describing it as such. Instead of just saying "everyone should have access to the web" or "people should be able to use the web in ways they can trust" we have extensively detailed and concrete principles about what "access for all" means (all of the accessibility and I18N review guidelines) or what trustworthiness means (all of the security and privacy documents). This approach has multiple advantages.
First, it's concrete, which means that proposals can be reviewed (including self-reviewed) in practical terms and debated constructively. Second, instead of having a constitution-level group come up with lofty phrases carved in stone, the care and maintenance of these principles is delegated to groups that specialise in these areas, that can often include representatives of the people affected by the problems (or at least folks in touch with them), and that can keep them updated as the community's knowledge improves. Finally, these help to develop craft and knowledge about these areas in the broader community, rather than keeping that knowledge confined to a small group of experts.

Second, **deepening user agency has to be about giving people real capabilities to act, not theoretical rights that they can't actually exercise**. We need to focus on *opportunity* or *substantial* freedoms, freedoms that people can *really* exercise. This avoids the trap of vaporware freedom, in which people may have a nominal or legal right to do something but the world is architected in such a way as to prevent it. Everyone can start a business or own a house! — except no bank exists that will lend to people like you. Users can change the default as much as they want to! — except you know that they won't because the UI discourages it. Everyone can speak! — except only certain voices get amplified by the algorithmic gods.

Martha Nussbaum and Amartya Sen have developed a pragmatic understanding of quality of life and basic social justice known as the *capabilities approach*. The capabilities approach asks "*What is each person able to do and to be?*" (Martha Nussbaum, [*Creating Capabilities*](https://bookshop.org/p/books/creating-capabilities-the-human-development-approach-martha-c-nussbaum/6690885?ean=9780674072350)) Even assuming generous amounts of universal education, people do not have the time to assemble what they need from tech components, to go through pages of configuration, or to answer thousands of prompts. We cannot pretend that we are giving people neutral tools for them to go choose their own adventure with; we cannot be satisfied with [RFC6919](https://datatracker.ietf.org/doc/html/rfc6919)-style rights à la "*you MAY change the default (but we know you won't)*." Capabilities were designed with development in mind: they are meant to change people's actual lives, not to check theoretical items off a list. They are by nature concrete and implementable, which connects with the previous point. **In many ways, capabilities *are* user agency**.

Finally, **designing technology for people — even, paradoxically, when specifically designing technology in support of user agency — unavoidably means making at least some decisions for them.** "*Ethics and technology are connected because technologies invite or afford specific patterns of thought, behaviour, and valuing: they open up new possibilities for human action and foreclose or obscure others.*" (Shannon Vallor, *[Technology and the Virtues](https://bookshop.org/books/technology-and-the-virtues-a-philosophical-guide-to-a-future-worth-wanting/9780190905286)*) Technology choices, from low-level infrastructure all the way to the UI, decide what is made salient or silent, hard or easy. They shape what is possible and therefore what people can even think about acting on.
Addressing this issue can be done in part through strong and diverse governance, notably of ethical review standards, and also by focusing the decision process on supporting user agency, which is to say on developing actual capabilities and driving skillful habituation towards the good life.

This section obviously provides no more than a cursory outline of the approach, but hopefully enough to show that it is feasible to develop alignment around an ethical grounding of user agents and user agency that is documented, implementable, and effective.

## 🚧 Foundation

The Web's architecture makes it near impossible to compose multiple services without creating massive privacy (and often security) holes. This contributes to web sites being monolithic, with a lot of overlapping functionality. When services are composed, it is typically with abysmal privacy properties, and often brittle security. (Script injection all the things!)

This proposal doesn't radically change the Web's architecture. What it does is *add a new primitive* that opens the door to newer, safer, more composable Web capabilities.

### 🚧 Requirements & Motivation

A vision of applications on the Web, and its hellish handmaidens of chaos, packaging and permissions, has been the elusive hankering of many a bright mind over the past decades who, after being consumed by its distant bewitching song as an adventurer may be by the murmuration of distant shores, now spend their greyer years muttering cantankerously to themselves on social media. What haven't we tried?

* In 1999, [W3C chartered work on XML Packaging](https://www.w3.org/XML/2000/07/xml-packaging-charter) that was meant to make it possible to put some SVG, XHTML, CSS, etc. in a container to make apps with.
* In 2003, the binary XML work considered the same use cases of [packaged documents](https://www.w3.org/TR/xbc-use-cases/#edocs) and [mobile apps](https://www.w3.org/TR/xbc-use-cases/#xml-docs-mobile) to be in scope.
* In 2006, we had the [Web APIs group](https://www.w3.org/2006/webapi/admin/charter) kick off work, as well as the [Web App Formats group](https://www.w3.org/2006/appformats/).
* [EPUB](https://www.w3.org/publishing/epub32/) has been addressing the packaging subset of that since 2007.
* The OMTP BONDI project tried to produce a Web-based standard for mobile apps (now so defunct that there is nothing about it left on the Web).
* [Packaged Web Apps (Widgets)](https://www.w3.org/TR/widgets/) had a whole family of specs for a while.
* In 2009, the Device APIs and Permissions WG was supposed to solve that issue.
* The September 2014 [W3C Next steps on trust and permissions for Web applications](https://www.w3.org/2014/07/permissions/) workshop was meant to solve this.
* The September 2018 [W3C Workshop on Permissions and User Consent](https://www.w3.org/Privacy/permissions-ws-2018/cfp.html) was meant to solve this.
* We had SXG at some point around here.
* The December 2022 [W3C Workshop on Permissions](https://www.w3.org/Privacy/permissions-ws-2022/) was meant to solve this.

This list is just for illustrative purposes; if it were exhaustive it would be *much* longer. All told, and with the exception of EPUB and a small number of APIs from these groups that have found some measure of success, that's a quarter century of failure. Is there any reason to believe that we can do better?

We've been trying to build web applications, but we should instead be building application webs.
The monolithic app that dominates native environments is a poor fit for the Web. (HCI researchers have [long been pointing out that it's not a great fit for humans, either](https://en.wikipedia.org/wiki/The_Humane_Interface).) This narrow focus has led us into a series of interconnected impasses:

* If you can just navigate to it, then it can't generally have access to powerful capabilities. If instead of navigating to it you have to install it first, then it's just a native app built with Web technology.
* Being connected to the internet means that almost any access to device functionality or personal information is dangerous. This fundamentally limits approaches to a small number of options: **1**) expose little to nothing, **2**) ask the user, or **3**) delegate the decision to an app store or some equivalent. Every single one of these options is bad.
* Apps may be wrong but tabs are worse. Trying to reproduce the monolithic desktop or mobile app paradigm on the Web is to applications what making every website work like a linear book would be to documents.

The model that this document puts forward is different. It involves developing an application equivalent to hypertext (*hyperapp*, maybe) that leads to a Web of commands, tasks, or skills that work together at the user's behest. One way to think about it is as the Unix philosophy if it had been invented after rounded corners and didn't involve growing a beard.

The load-bearing foundation for this overarching programme is the introduction of *Web Tiles*, which are a new approach to the "permissions and packaging" problem. Tiles are:

* **Safe by default**. Like Web pages, it must be *always* safe to load a tile. In fact, it must be safer, as we intend it to leak a lot less information than a Web page load can. In order to keep the promises we are making, loading a tile must be privacy-preserving with a certain number of strong guarantees. (It can't be perfect, but it can be much better.)
* **Powerful by default**. *Unlike* Web pages, it must be possible to give a tile access to sensitive information by default, without requiring the user to grant additional access. (This isn't to say that *some* powerful capabilities might not be further gated, but gating should be significantly rarer.) This is achieved because tiles are prevented from arbitrary access to the network: they are limited to loading other tiles (which is transitively safe) and to purpose-specific protocols which the agent can restrict and reason about (a minimal sketch follows this list). Note that when tiles load one another, they do not have to worry about cross-origin restrictions precisely because they are safe anyway.
* **Content-addressable**. Loading a tile mustn't require interacting with a specific server. By making tiles content-addressable based on a hash of their content, accessing a tile can happen via a cache, via anonymising intermediaries, etc., and doesn't require communicating with any origin server that the tile is "located" at.
* **Local-first**. Being content-addressable also makes tiles local-first. "Installing" a tile just means pinning it such that it remains locally saved and accessible. Ditto "saving" content.
* **Packaged and linkable**. Tiles group together related content so that they can be usable without making additional network requests, but any content inside a tile is as linkable as it would be in any other Web context.
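As a minimal sketch of that network rule (agent-internal logic assumed purely for illustration; nothing here is a proposed API), the entire network surface of a tile context could reduce to something like:

```javascript
// Sketch: the whole network surface of a tile context. A tile may load
// other tiles (content-addressed, hence transitively safe) and local blobs;
// everything else has to go through an agent-mediated, purpose-specific
// protocol that the agent can reason about and restrict.
function tileMayLoad(url, { viaPurposeSpecificProtocol = false } = {}) {
  const { protocol } = new URL(url);
  if (protocol === 'ipfs:' || protocol === 'ipns:') return true; // other tiles
  if (protocol === 'blob:') return true; // local object URLs, no network
  return viaPurposeSpecificProtocol; // eg. native search, social, payments
}
```

The point of keeping the rule this small is that it is something an agent can actually enforce and reason about, which is what makes "powerful by default" safe to grant.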
### 🚧 IPFS + CSP

Throughout this document I have tried to avoid inventing new things, preferring instead to arrange existing technology in new ways to match the underlying requirements and architectural preference. IPFS, used to carry Web formats, is a good candidate to address most of the requirements for tiles. However, it is insufficient on its own as there is nothing preventing IPFS content from embedding HTTP content or vice versa, which in turn breaks the safety and composability of the tile system.

Tiles are comparable in some ways to [Isolated Web Apps](https://github.com/WICG/isolated-web-apps/) and a similar set of constraints expressed using a Content Security Policy can be applied. The concerns are somewhat different, though, and so for instance `blob:` would be allowed while connections wouldn't. (The below hasn't been reviewed in detail, it is only to give a sense of the policy.)

```http
Content-Security-Policy: default-src 'self' ipfs: ipns:;
  style-src 'self' 'unsafe-inline' ipfs: ipns:;
  script-src 'self' 'unsafe-inline' ipfs: ipns: 'wasm-unsafe-eval';
  img-src 'self' ipfs: ipns: blob:;
  media-src 'self' ipfs: ipns: blob:;
```

A very quick primer on IPFS, and on IPFS as a Web protocol, may be helpful to those with a more traditional Web background. IPFS is a peer-to-peer content-addressable protocol. Content is stored in blocks that are given a cryptographically-derived identifier (a CID, for Content ID) that is based on the block's content. A block is retrieved using its CID. Because retrieval is keyed off the block's content and not its location, a block does not have to be obtained from the party that initially produced it, or in fact from any specific network location. Rather, anyone can cache and redistribute it, and its content cannot be tampered with without causing the CID to change. A block can be raw bytes, or it can have various kinds of structure. A single block can package up multiple files and their metadata, and act as a small file system — this makes it possible, for instance, to distribute an HTML file with its dependencies in the block.

IPFS addresses use the `ipfs:` scheme and use the CID as the authority part of the URL, as in [`ipfs://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi/`](https://ipfs.io/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi). If you don't have an IPFS-enabled browser (like Brave) or an IPFS client, you can use an HTTP gateway to IPFS (clicking the previous link makes use of one). If the IPFS block is structured data, you can then use a path component in the URL to pick out a specific piece of content inside the block.

IPFS is strictly immutable, but it can be useful to have a fixed address that points to content that can change over time yet is itself self-certifying (self-certifying means that it comes with all the information you need to verify that it is authentic) and P2P. That is what IPNS is for. The high-level principle of IPNS is that a name represents a cryptographic key pair. That key is then used to create a record that contains the public key and a CID, and that record is published to the peer network. Anyone can then retrieve that record, make sure that it matches the key, and retrieve the CID over IPFS. IPNS URLs use the `ipns:` scheme. (Note that, for simplicity's sake, I'm skipping a lot of detail and ignoring a number of alternative options in the rich interplanetary ecosystem. If you want to find out more, head to the [IPFS docs](https://docs.ipfs.tech/) site.)
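To make the addressing model concrete, here is a small sketch, using nothing but standard `fetch`, the example CID above, and the public ipfs.io gateway, of retrieving a block's content:

```javascript
// Sketch: fetching content-addressed data. An IPFS-enabled agent can fetch
// the ipfs: URL directly; anywhere else, any HTTP gateway can serve the same
// bytes, and the CID is what lets you check they haven't been tampered with.
const cid = 'bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi';
const url = `https://ipfs.io/ipfs/${cid}/`; // or `ipfs://${cid}/` natively
const res = await fetch(url);
console.log(res.status, await res.text());
// If the block packages several files, a path component selects one of them,
// eg. `ipfs://${cid}/index.html`.
```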
In terms of how IPFS works as a Web protocol, these pieces of infrastructure mean that it can be used wherever an HTTP(S) URL is accepted, be it at the top level, in `src`, or in `href`. Just to give an example, you could build a site in the usual way and simply pull in IPFS content for some highly cacheable parts, maybe loading shared NPM libraries with a `<script src="ipfs://<CID>/lit.js"></script>` or some common free fonts into CSS with a `url(ipfs://<CID>/NotMyType.woff2)`. Similarly, you could package up a blog post in an IPFS block along with all of its dependencies and either load that by navigating to `ipfs://<CID>` in a browser's URL bar or embed it with `<iframe src="ipfs://<CID>/"></iframe>`. And you can mix both by having the blog post in a block of its own and loading dependencies from another block.

:::info
Note that this section assumes that loading, serving, and pinning IPFS blocks can be done in a privacy-preserving way. This is the subject of work in progress and the full threat modelling will be the subject of another document.
:::

With this brief overview done, we can turn to the interesting properties that this adds to the platform.

IPFS content is immutable (IPNS is a mutable pointer to immutable content). This offers a foundation from which to safely compose simpler services together. It is important to keep in mind that a tile can only remain composable so long as it is not mixed with a non-composable context (typically an HTTP one), because the moment that box is open, anything goes. This does not mean that HTTP is not useful; rather, it is an escape hatch. Because of this, just loading IPFS into a browser is not enough to make it a composable context: you also need a CSP strict enough to prevent loading from HTTP, as exemplified above, and to have the user agent enforce it directly.

Composability creates a trade-off. So long as a context is composable, it can safely communicate with other composable contexts with no limits. The result is always itself a composable context. This means that same-origin restrictions lose their usefulness (as do mixed-content limitations, which have little meaning here). This is a powerful capability and we will see how it can be put to work. However, this also means losing the ability to use server-side knowledge — there is no server — to personalise content on the fly, as well as the possibility of sending data back to the content's origin. This isn't a small trade-off and most will find it far too constraining on the face of it. We will see further in the document that it is possible to make these capabilities less necessary — in fact, to make it possible to do without them in a large number of cases — and also to provide purpose-specific APIs for a number of core cases. Keep in mind that if a tile can be guaranteed to be composable, *more* powerful APIs can be exposed to it because it is inherently trustworthy.

As we start thinking about composability, it's important to keep in mind that by allowing ourselves to think about UI changes, we can consider new ways of composing. There's the embedding composition that we know well on the Web, but we can also imagine collaborative/lateral composition as exemplified for instance in the speculative [MercuryOS](https://www.mercuryos.com/architecture), powered by Intents/Activities which were designed precisely for this kind of usage.
For instance, identity can be a user agent (wallet) service, the UI for which is provided by a component, and the UA can serve as a connector to [make identity "pluggable" into sites, under user control](https://darobin.github.io/beltalowda/).

## 🚧 Composition

Tiles have highly desirable privacy and security properties, and the ability to default to granting them access to powerful capabilities is great, but if we were to stop there we would nevertheless have a rather limited system. Sure enough, tiles can be composed by embedding a tile in another via an `iframe` (or similar) and they can talk using `postMessage`, but that remains weak. The result remains choreographed by the rootmost app author and doesn't give the user or their agent any greater power. What we need is a system that enables arbitrary tiles to communicate usefully and interoperably with one another, in a way that puts the user in charge, without sacrificing usability. We are essentially looking for a versatile I/O system for tiles that matches a series of UI patterns that can be made intuitive to people.

### 🚧 Requirements & Design

We want the ability to wire together an arbitrary number of tiles (to tessellate them) without the user having to think about it (ie. this shouldn't look like a graphical programming interface, at least not in the typical case), while supporting a web of small, dedicated app-like functionality and putting the user in control of their experience without having to configure a million complex monoliths. In order to achieve this, we need a system that is:

* **Declarative**. Tiles must be able to describe which requests they can handle without having even been activated once (though some form of "installation" may be required). A tile should be able to say "I can provide pictures" or "I can edit social posts" or yet again "I can sort a list of articles in the order this person will prefer". Essentially, the model is one in which a tile can convey its ability to <u>verb</u> a <u>resource type</u>.
* **Selectable**. If a request is made for an activity which multiple tiles can perform, it needs to be possible to present a list of the applicable ones and pick the right one in the direct flow of the user's action.
* **Installable & Shareable**. Installing a tile is trivial since it is packaged and local-first. A tile that handles activity requests is installed in exactly that way: a single-click bookmark/save will pin it locally and make it available to handle requests. Since it's just a tile, it can be shared just as easily by being posted to a feed the user can write to.
* **Arbitrary**. The types of requests that tiles can handle need to be open-ended. What degree of coordination may be required to make sure that unrelated tile authors know to use the same language is up for determination. Using this mechanism, tiles must be able to talk back and forth, and to exchange arbitrary data.
* **Discoverable**. It should be simple to produce a service that indexes tiles by what they can handle so that when there is a request for an activity that the user does not have a tile for, finding an appropriate one can be straightforward.
* **Asynchronous/continuous**. The most basic approach to tessellation is UI-driven request/response: the user interacts with a tile (eg. hits "Edit this picture") and a tile is selected to offer the functionality (in this case actually provide an image editor) before returning the content.
But other integrations must be possible, for instance automatically sorting a list of articles (technically, of tiles, perhaps arbitrary ones) to match the user's preferences whenever the context calls for it, and without the user having to specifically ask for it in the moment.

Two further design considerations should be taken into account:

1. This kind of invocation system is relatively similar to the ones used by voice assistants. In fact, the similarity is rather clear when considering [Alexa Intents and utterances](https://developer.amazon.com/en-US/docs/alexa/custom-skills/create-the-interaction-model-for-your-skill.html), Apple's [App Intents](https://developer.apple.com/documentation/AppIntents) which work for both Siri and non-voice interaction modalities such as Shortcuts and Spotlight, [Google Actions](https://medium.com/google-cloud/building-your-first-action-for-google-home-in-30-minutes-ec6c65b7bd32), or even [VoiceXML](https://www.w3.org/TR/voicexml20/). The general verb-based model of linguistic interfaces has also been tested with Mozilla's [Ubiquity](https://wiki.mozilla.org/Labs/Ubiquity) project, which built a form of command system for the Web. We should design our system in such a way that it supports linguistic interaction with minimal effort on top of the composition system, so as to encourage the emergence of an open voice assistant on the back of existing functionality.
1. A system that composes pieces of functionality using verb/resource pairs is conceptually close to [UCAN's with/can model](https://github.com/ucan-wg/spec#24-capability). Maintaining conceptual compatibility with UCANs is valuable as it opens the door to a rich object capability model.

### 🚧 Intents & Activities

The existing technology most similar to these requirements is [Web Intents](https://www.w3.org/TR/web-intents/). Web Intents were developed (and abandoned) by the W3C's Device APIs Working Group as a way to enable precisely the kind of composition described here between Web pages. They were inspired by Android Intents. A number of alternative designs were proposed at the time, one of which was Mozilla's [WebActivities](https://wiki.mozilla.org/WebActivities), which is still in use in B2G.

The syntax of existing options isn't necessarily ideal, but it can be mapped to more workable alternatives. For instance, registering an intent/activity can be reproduced in a way that supports discovery by having the verb/type pairings that the tile supports listed in its IPLD metadata:

```javascript
intents: [
  // this can pick images and return them
  {
    can: 'pick',
    what: 'image/*',
    title: 'Select an image from our cat memes collection',
  },
  // this can create a social post which the user can post
  {
    can: 'post',
    what: 'org.w3.activity',
    title: 'Post a cat meme',
  },
]
```

Whereas hyperlinks are nouns — they *name* things — intents are verbs. In this sense, they are comparable to HTTP's verbs (methods) but offer richer semantics that sit closer to user applications. When a verb is invoked, the agent typically needs to ask the user what they want to invoke it on or with (eg. which source to pick images from). A [very simple demo](https://darobin.github.io/beltalowda/simple-intent.html) can serve to illustrate the flow for installation and a simple pick intent.
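Before turning to invocation, it may help to see the agent's side of this. Here is a minimal sketch (every name is illustrative; none of this is specified anywhere) of how declared intents could be matched against a request:

```javascript
// Illustrative agent-internal matching: find installed tiles whose declared
// intents can handle a given verb and resource type (eg. 'pick', 'image/png').
function matchingTiles(installedTiles, can, what) {
  return installedTiles.filter((tile) =>
    (tile.metadata.intents || []).some(
      (intent) => intent.can === can && typeMatches(intent.what, what)
    )
  );
}

// MIME-style matching so that a tile declaring 'image/*' handles 'image/png'.
function typeMatches(declared, requested) {
  if (declared === requested || declared === '*/*') return true;
  const [dMajor, dMinor] = declared.split('/');
  const [rMajor] = requested.split('/');
  return dMinor === '*' && dMajor === rMajor;
}
```

The point is that declarative metadata makes matching, and therefore the selection UI, a purely local operation for the agent: no tile needs to run, and no network needs to be touched, to know who can handle what.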
An intent is invoked using a simple API (this is just an indicative example, the exact shape should almost certainly be different and may of course reuse one of the existing alternatives):

```html
<button id="img-picker">Pick Image</button>
<img alt="No image" id="profile">
<script>
  document.querySelector('#img-picker').addEventListener('click', async () => {
    const img = await navigator.invokeIntent('pick', 'image/*');
    // place the picked image into the profile placeholder
    if (img) document.querySelector('#profile').src = img.src;
  });
</script>
```

More involved interactions may be required when the intent is not about a simple request/response action. For instance, we can think of an intent the purpose of which is to produce recommendations for the user. The way it works, when installed and active, is that it receives lists of items of the kind that it knows how to produce recommendations for, it filters out anything that it knows should not be in the list (eg. I've said I never want to read that author again), it ranks the rest according to whatever criteria apply (eg. using its own ML model, or simply chronologically), and returns that ranking to whatever tile is rendering the list.

### 🚧 Extensions

One interesting aspect to note is that the security properties of tiles mean that they can be safely used as extensions to the agent in many cases, and with a simpler installation ceremony — at least for extensions that cannot touch non-tile content. A generalized intent system would make it easier to expose to essentially all tiles a number of useful capabilities that extensions normally have access to. This is by design: if the agent becomes as powerful as this proposal makes it, and if we increase how much user interface the agent affords (as opposed to just a thin chrome in browsers) and how much functionality it handles natively (eg. search and social), then it needs to be easily and readily extensible so as not to die a suffocating death under the endlessly conservative bureaucracy of browser vendor interfaces. Tiles as extensions, with granular ways to replace specific parts of the UI and to handle a significant number of agent functions, mean that we can have a perpetually dynamic experience that can be highly tailored to people and environments.

### ☠️ Examples

tk: one in which it is asked to sort a list

## 🚧 Ecosystem

We have vested tremendous amounts of power in search and social sites, and much of this power is maintained by architectural decisions made in browsers. For all the good that they may otherwise do, browsers have generally sold out to search engines, are central to keeping search stuck in a conservative and captured position, and while they are less directly involved as enforcers for social media, they have nevertheless copped out of putting the user first there too.

We seek to radically shift power back to users. This requires shifting the Web in the direction of an architecture that claws power back from search & social. The core observation to make here is that in order for that to happen, both search and social need to be protocols rather than systems encapsulated behind proprietary sites, and these protocols need to be natively supported by the agent.

One may reasonably ask why search and social are bundled together in the same section. One reason is that they are two particularly important online activities, but the similarities run deeper than that:

- Search and social are fundamental modalities of interaction with an information space (along with browsing) and we benefit from unifying them.
In a nutshell, they correspond to different epistemic approaches to discovery: browsing is self-directed, search is querying expertise, social is relying on one's social environment. (Interestingly, most search today is implemented using social signals rather than any actual expertise, but that's a topic for another treatise.) Bringing them together in a way that allows new things to be built from their interaction makes the user experience more fluid and powerful.
- The difference between search and social is greatly exaggerated. Both produce lists of resources which they order by a relevance metric which they derive from social signals (likes or links). The interaction modality differs (push or pull), but there is significant overlap and some providers do both at the same time (eg. YouTube). There is no important difference between the search result for "shoes", someone's curated list of links about shoes, or yet again what emerges from an online discussion between a group of shoe lovers. We are just looking at different methods of aggregation and curation. Unifying the primitives that support these alternatives makes it easier to move between them and to offer much better options to users that don't privilege one alternative over the others.
- Both search and social suffer from similar problems in that they constrain what can appear in their lists (links, cards) and have issues with performance and context-switching when opening a link. This has led to the invention of all kinds of more or less ad hoc fixes, many of which aren't great (see AMP, SXG, `<portal>`, in-app browsers, FBIA, or the Apple News Format).

Tiles and native search/social work well together because tiles can be embedded in aggregation feeds (be they curated manually or algorithmically), which addresses the last point well. Because tiles can be locally manipulated, having a uniform approach to search, social, and curated feeds means that it is easy to see something good in a search result, read it in place because the tile can be loaded without navigation, "RT" it from the search results list, and drag it into a feed you are curating, all without switching context.

At a high level, we can explore the integration of ActivityStreams in general with tiles. This can include various ways of rendering the tile metadata from the stream (as a card in a list if that is preferable to active content in some cases, at various sizes when active). There is ongoing thinking about integrating Activity* and IPFS that could certainly help here.

### 🚧 Native Search

Search isn't doing great today. Part of that is the classic [*enshittification*](https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys) that happens when a market is captured ([well summarized for search](https://dkb.io/post/google-search-is-dying)), part of it is that too much money is being taken out of the Web, so that producing high-quality content is often too expensive and can't compete when what money there is subsidises the low-quality stuff, and part of it is that [people are decreasingly excited about putting their content in the wide open for indexing by a panoptic engine](https://maggieappleton.com/cozy-web). It's plausible that without browser (and mobile) defaults to artificially prop it up, we'd long have seen a major upheaval of the search landscape, and I don't mean by LLM gimmicks. Search is captured, structured to favor generalist engines irrespective of performance, facing a degrading experience, and riddled with advertising-driven issues.
How can a protocol for search fix that? If browsers search over a protocol rather than through a proprietary interface, multiple search sources can be combined. The browser can expose richer UI to pick specialized vertical search engines, can search the local data which it helps organize (as explained farther down), and can expose an experience that is better unified with social and other curated aggregations of content. Additionally, using a local recommender (also farther below), an engine can return an unsorted list of the top 50 items which can then be locally filtered and ranked.

Because search is a protocol, it needs to be paid for (since it can't be monetised with ads). This is done using the money protocol system outlined in the money section. Where that money comes from is the agent's decision; it could be paid by the user, by the user's employer through the financial object capabilities, or it could be sourced from ads rendered natively (meaning that the agent could organize a competition for the best search-contextual ad for a given query, in a privacy-preserving manner, using the approach detailed in the ads section).

For the younger reader, the idea of search as a protocol might seem strange, but [it isn't new at all](https://en.wikipedia.org/wiki/Wide_area_information_server).

![Mandalorian: this is the WAIS](https://i.imgur.com/CaIsuWV.png)

#### 🚧 A Business Model for Browsers & Browser Engines

As indicated above, the system we use today to pay for browsers is in serious need of fixing. It has several problems:

- It creates an economic structure that guarantees that the market for search will tip in favor of a single company.
- It pays for browsers, but doesn't pay for browser engines. There is significant overlap, but with this model browsers can also free-ride on engine development. We're down to three major browser engines, and dropping to two or even one within our lifetimes seems plausible. The system isn't working.
- It limits the evolution of search by restricting the search modality to one generalist search engine (since the money comes from that default). This hurts vertical search engines and pigeonholes the Web in the Clippy model where a search engine has to guess which vertical was intended.
- It gives search sites control over the search UI even when it means degrading it, which experience has shown will happen (following the process which Doctorow has dubbed [*enshittification*](https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys)) and which draws power away from the user.
- It focuses search primarily on an ad model (that's what justifies paying for the default position in the first place) that can be seen to interfere with the quality of the outcome.
- Being paid to pick a default search engine rather than guiding users to the one that is best for them is a direct betrayal of their trust.

All in all, the system we have is user hostile and deeply conservative, forcefully maintaining search in a mediocre model that hasn't evolved in decades and failing to support browsers. Here is a high-level overview of the alternative which we can pursue:

- Search revenue is *already* used to pay for browsers (just in a poorly-organised way) so we can keep using that source of money without negatively impacting anything.
- Native search in the agent means that the agent has to *pay* for API calls to search engines. Not much, likely a small fraction of a cent, but pay anyway. That money can come directly from the user (if they don't want ads) or can come from native ads.
(Both protocol-based money and native ads are discussed further in the document.)
- A small fraction of that payment could be (verifiably) extracted and routed to both the browser and the browser engine. (I would argue that the browser engine should get the lion's share, but we can have that discussion later.)

This approach would most easily be deployed with regulatory backing, but we could envision a governance system in which a search engine can only be considered by an agent, and an agent can only be a recipient of payment, if they abide by these terms, with the possibility that violators would be excluded (as with WebPKI and the CA/Browser Forum). [Brian Kardell has estimated](https://bkardell.com/blog/WhereBrowsersComeFrom.html) that maintaining all three current engines at a level at which they are competitive and functional requires about USD $2bn per year. The absolute number may seem high, but considering that browser engines are critical infrastructure for a system that is used by 5bn people and that this is less than 1% of global search revenue, the approach seems both reasonable and cheap. An additional benefit of this system is that it could liberate major browser engines from large corporations whose interests routinely conflict with their users'.

### 🚧 Native Social

Remember Flock? It didn't last forever, and they had to hack their way to making it work, but they were on to something. Remember MySpace? A lot of it might have been ugly, but dammit it was *your* ugly. It was inventive and empowering, and it sure as hell beat the industrial drivel smelling of nothing but meetings that has become the norm across social media.

An agent that supports ActivityPub natively has advantages for social similar to those described above for search. It integrates well with tiles, it opens up the possibility of a more powerful user experience in which things can just be moved around, and it works with a local recommender that can support not just ranking but also, for instance, collectively-governed block lists. Native ads can be used to pay for the infrastructure but also to pay content creators, without privacy issues and without engagement maximization.

The current model of social media content isn't great for content: each post can contain some highly restricted data that is silo-specific. With tiles, *arbitrary* content can be in the feed. You can "tweet" an app and it just works inside the feed. And your readers can just click the bookmark button to install it. You don't post a link to a PDF, you just post the packaged document.

The ability to (safely) post arbitrary content to social creates demand for editors that can easily *create* content to post according to specific styles. But because the tile+intents system is highly general, those editors can be implemented simply as tiles that handle a specific intent, and can themselves be distributed and installed from social. (That sentence might need reading twice.) For instance, I can create a tile that can receive an uploaded picture, edit it a bit, throw in a bit of text, and output a tile that contains all of that in the style of Instagram, signed by the author. Or I could create a tile that lets you pick a few colors and generates a nice-looking view of a color theme (maybe just $n$ stripes) which you can then post to a color theme community. The content-generating tiles can themselves just be posted, etc. Whenever the user wants to publish new content, their feed uses a `publish` intent that gives them a choice of tiles that can produce social content, as sketched below.
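A hedged sketch of that flow, reusing the indicative `navigator.invokeIntent` shape from earlier (the `#new-post` element and the `publishToFeed` call are invented here purely for illustration):

```javascript
// The feed tile asks the agent for something to publish. The agent shows the
// user their installed tiles that declare they can handle this verb, and the
// chosen one produces a postable tile.
document.querySelector('#new-post').addEventListener('click', async () => {
  const postTile = await navigator.invokeIntent('publish', 'org.w3.activity');
  // What comes back is just a tile: packaged, content-addressed, postable.
  // The agent appends it to the feed on the user's behalf.
  if (postTile) await navigator.publishToFeed(postTile); // hypothetical call
});
```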
Those tiles are never allowed to post directly to any feed — they just return postable tiles and the agent handles that.

Note that because tiles are safe, you can "install" one that replaces agent-native functionality, basically like an extension. So you can have one that renders your feeds your way, for instance. People [have a lot of ideas](https://docs.google.com/document/d/1EnkjkSW14hR_uTdwEW7vqRaaIXDMlNQ4WRNoPmnn_z0/edit?pli=1#heading=h.i0urqqmbdsw0) about what they would like their feeds to look like and do. There's no reason to hold them back.

:::info
* [ ] [Mauve might be on to something](https://blog.mauve.moe/posts/peer-to-peer-databases#p2p-social-apps)
:::

### 🚧 Native Feeds

Feeds of the (broad) RSS family are basically simple forms of social feeds. Assuming native social, they shouldn't be excessively hard to support. Doing so, and adding tile support to RSS formats, would not only be convenient for people, but it would also address the issue that RSS content has to choose between a short, efficient summary that doesn't have everything and a full link that takes you elsewhere.

### 🚧 Purpose-Specific Protocols Everywhere

We can support this evolution of the ecosystem over time by adding new built-in purpose-specific protocols. These have the advantage that they make a composable system founded on tiles more attractive, that they can be governed to work better in support of users, but also that they make sites easier to build because they work off the shelf. Some examples include:

- **Advertising**. This is the topic of a section of its own further down, but the core idea is that purpose-specific ad protocols can be privacy-preserving and user-centric.
- **Chat**. Talking to customer service or to other people, or conferencing, need not involve implementing your own chat and opening a direct channel; it could work via native Signal support.
- **General Messaging**. It should be an explicit goal to eliminate email as a contact method or identifier. It leads to spam, it makes it possible to join identity across contexts, and it gives too much control over messaging to the sender rather than the user. By the user's leave, an entity can be given a unique token with which it can notify the agent to load a tile with a message. This makes it possible to receive communications from a site, but also to ensure that they can be revoked at any time — without putting people through the hassle of creating their own single-origin email.
- **Buying**. Having sites manage their own shopping carts is inconvenient. They do it poorly, they share cart state with arbitrary third parties so as to target cart abandoners with ads, they lose carts if you're not logged in, you can't manage multiple ones, etc. Agent-side cart management can be supported relatively easily (intent-to-buy), can help power privacy-preserving ads, and works well with a money protocol. This still requires a protocol to actually carry out the purchase as well as to verify stock availability.
- **Declarative Telemetry**. Publishers/authors benefit from knowing what's happening with their content. This is a constant source of disagreement because those benefits are real, but people rightly also don't want to have their behavior tracked in great detail. Making telemetry declarative, and deploying infrastructure for privacy-preserving measurement (à la Prio), could help negotiate this issue and bring it to a workable close.

:::warning
Add: CRDT over Signal
:::

### ☠️ Examples

- social feed, install intent to post, post, etc.
:::warning
Add: CRDT over Signal
:::

### ☠️ Examples

- social feed, install intent to post, post, etc. Show ActiveGram and add the colour themes poster. Creating a new post puts the "post" button in the sidecart.
- curation to your own feed by picking from other feeds (see https://tomcritchlow.com/2023/01/27/small-databases/)
- search results with multiple sources, PDFs straight embedded

## 🚧 Money

Pretty much all of the Web's standard architecture, and the Internet's more generally, was designed as if money were someone else's problem. Some fondly see this as freedom from commercial interests, but what it actually does is make it difficult for people who, like, need to make a living to participate, and it makes the bed for platforms that will only too gladly fill the space with their own ways to monetize online activity.

A Web that puts agency first is also a Web that ensures that publishers (in the general sense of people who put stuff out there) can make a living, and can do so without being bossed around by platforms. It's also a Web that aligns the incentives of publishers with those of their users. If the economic infrastructure only works when you betray your users, people still need to eat: they'll find ways to do it that they can live with (hire lawyers to fake caring about privacy, imagine that there's a value exchange in surveillance, think that they're moving the ethical needle internally with vacuous projects…) but they'll do it.

Revenue models have largely been limited to (some of these overlap):

* **Ads**. Advertising isn't *necessarily* bad, but the specific manner in which it has been implemented today is problematic in essentially every which way. It operates on violations of privacy, it requires publishers to backstab their users, it gives all the power and most of the money to parasitic intermediaries, it bankrolls disinformation, ad creatives are out of control in terms of the resources they burn, and the whole system is full of fraud that drains money from buyers. One star, would not buy again.
* **Subscriptions**. These are great if you don't mind catering only to a rich audience. Also, do you know how subscription-based services find users? Ads.
* **Buy Once & Own**. This has become relatively niche, and tends to involve DRM. It typically requires being able to maintain one's own reliable digital archive, which most people aren't equipped to do.
* **Micropayments**. They have been right around the corner for decades. Even assuming broader deployment for a standard like Web Monetization, no one has figured out an interaction that works for micropayments. Time-based accounting is only appropriate for some systems, payment for access to a single item (eg. one article) is tricky when you can't predict the quality of that single item, etc. Tipping is often worse.

Payment methods haven't fared much better:

* **Pay With Your Data**. This is intimately tied to ads, but in some cases you might be paying with your data even though the product is not showing you ads directly (eg. WhatsApp, Chrome). This economy primarily benefits intermediaries over people and publishers, and [leads to market concentration](https://berjon.com/competition-privacy/).
* **Credit Cards**. Very common, but high-touch and inconvenient. These probably remain the best option above a certain amount, but they are clunky (and risky) for frequent use.
* **App Store**. Because let's create yet another intermediary, that's sure to help. People pay more and publishers make less, for no good reason.
* **Cryptocurrency**. This remains anecdotal in usage and is still all over the place in modalities.
A key point to note is also that intermediaries completely control ads, data markets, credit cards, and app stores — which is to say the most important of these systems. Intermediaries are not accountable to either users or publishers, and they systematically shape the system to favor themselves. It is particularly difficult to build a system that puts users in the driver's seat under such conditions.

Intermediary capture (see [*Intermediary Influence*](https://scholarship.law.columbia.edu/cgi/viewcontent.cgi?article=2857&context=faculty_scholarship)) is also a problem in that it directly defunds the Web. Platforms will often present themselves as funding publishers; the truth is that they have inserted themselves between publishers and revenue sources, and use that position to extract more value than they provide. This directly removes value from the productive, interesting, inventive, or user-centric parts of the ecosystem. Many of the Web's problems come from the fact that it is, quite simply, starving for funds with which to build better experiences, because too much of the money is being sucked out of it by the platforms. In the same way that Europe underdeveloped Africa, the platforms are underdeveloping the Web.

In order to ensure that Web revenue is distributed more fairly, in ways that work better for people, we need to consider money a core part of the architecture and to develop standards which we can use to irrigate the world.

### 🚧 Requirements

The money part of the system needs to support the following properties:

* **Protocolize Intermediaries**. Payments necessarily involve some degree of intermediation, as does ad serving (at least if it is going to reach any kind of practical scale). In order to avoid transferring power to intermediaries who by nature lack accountability, the intermediary layers need to have their behavior largely dictated by a protocol designed to offer guarantees of capture-resistance.
* **Bidirectional**. People need to pay but people also need to be paid. It should ideally be as easy to receive money as it is to send it.
* **Fluid & Programmable**. The money protocol needs to support very small sums efficiently and needs to make it easy for money to flow according to arrangements that are more complex than just one point to the next. It needs to be easy, for instance, to support atomic revenue sharing arrangements, or to have a capability system that allows one party to delegate spending power to another. This kind of additional power is required to support end-to-end "mashup" payments for composable systems.
* **Standard & Interoperable**. The properties that we are seeking from the system are only possible if the components that support it are available off the shelf as open standard items whose behavior can be trusted.
* **No Data Market**. Data markets are intrinsically problematic, not only from a privacy standpoint but also in that they tend to mechanically lead to concentration.

The Web has considered composability before. There was a brief phase during which [mashups](https://en.wikipedia.org/wiki/Mashup_(web_application_hybrid)) surfaced as a popular alternative to glue sniffing: a cool idea for weaving services together, but one that completely disregarded privacy and had no business model whatsoever. Without a way to make revenue flow across all involved parties, it's really hard to imagine how mashups could have developed into a sustainable way of providing a service based on collaboration between multiple interoperable smaller components. That's a mistake we shouldn't repeat: we should make sure that composable services, which are better for people, are easily supported by "composable payments" that extend beyond the naive $\{user, advertiser\} \xrightarrow{pays} site$ arrangement, as sketched below.
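A toy sketch of what "composable payments" could mean in practice: one payment fans out atomically to several recipients, and spending power can be delegated with a cap. This is illustrative only, and is not ILP's actual API (ILP is discussed further down).

```typescript
// A toy model of composable payments. All names are invented.
interface Split { recipient: string; share: number }  // shares sum to 1

function settle(amount: number, splits: Split[]): Map<string, number> {
  const total = splits.reduce((sum, s) => sum + s.share, 0);
  if (Math.abs(total - 1) > 1e-9) throw new Error("shares must sum to 1");
  const payout = new Map<string, number>();
  for (const s of splits) payout.set(s.recipient, amount * s.share);
  return payout;
}

// eg. a mashup: the site keeps 70%, a search tile gets 20%, and a
// recommender tile gets 10% — one payment, no bilateral contracts.
const payout = settle(0.05, [
  { recipient: "site", share: 0.7 },
  { recipient: "search-tile", share: 0.2 },
  { recipient: "recommender-tile", share: 0.1 },
]);

// Delegation: a tile is granted a capability to spend up to a cap, so
// a component can pay its own upstream without holding the wallet.
function delegate(cap: number) {
  let spent = 0;
  return (amount: number, splits: Split[]) => {
    if (spent + amount > cap) throw new Error("delegated cap exceeded");
    spent += amount;
    return settle(amount, splits);
  };
}
```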
### 🚧 Ads

Everyone hates ads. They're like little shrill blinking reminders of capitalism stabbing you in the eyeballs. It's unpleasant. They've turned our digital lives into a hellscape panopticon in which disinformation, fraud, and malware thrive. But consider this: overall ad spend has been a more or less constant share of GDP (circa 1-2%) for as long as it has been measured. For 2023, worldwide digital ad spend is estimated to be in the ballpark of USD $700 billion. If that kind of money goes primarily to systems that don't support user agency, then those are the only systems that will prosper. That money should — and could — be used to irrigate useful, user-centric systems. The question then becomes: what if ads but good?

![Morpheus: What if I told you advertising isn't evil?](https://i.imgur.com/vNStooK.png)

The ad stack is deep and complex; this section currently limits itself to indicating a broad direction and some notes on approaches.

Tile-based ad serving has inherent privacy and security benefits. Many of today's issues in online advertising stem from the fact that we are composing ads into content in a way that leaks left, right, and center. (That is broadly the space that [fenced frames](https://wicg.github.io/fenced-frame/) and [FLEDGE](https://github.com/WICG/turtledove/blob/main/FLEDGE.md) inhabit.) Ads that are understood as such by the agent work well with the idea of a fluid money protocol described below, as well as with making search and social native. You can source ads from a different provider and use part of the revenue to pay for the search, etc., and potentially share revenue with content.

Serving is only part of the problem: ads need to provide verification that they were shown, attribution needs to be measured, etc. There has been significant improvement in privacy-preserving purpose-specific ad protocols over the past few years, and some of that work could be brought to bear. (See for example [Interoperable Private Attribution (IPA)](https://docs.google.com/document/d/1KpdSKD8-Rn0bWPTu4UtK54ks0yv2j22pA5SrAD9av4s/edit), [Attribution Reporting](https://wicg.github.io/attribution-reporting-api/), [Privacy-Preserving Ads](https://github.com/WICG/privacy-preserving-ads), or [Private Click Measurement](https://privacycg.github.io/private-click-measurement/).) These would benefit from implementation as [verifiable decentralized systems](https://filecoin.io/filecoin.pdf).

One core source of issues in today's advertising ecosystem is the reliance on Real-Time Bidding (RTB). RTB is inefficient (it requires very substantial computation just to select an ad) and, while this is not strictly required, it operates on the assumption that data will be shared with third parties that develop profiles over time and recognize people around the Web so as to target them. What's more, the ad auction system in general is extremely opaque, to the point that we don't even know which pricing strategy a major actor may be using.
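The next paragraph develops an alternative; as a preview, here is a minimal sketch of what agent-side ad selection without RTB could look like, with bids placed on audience segments ahead of time. All shapes are invented for illustration.

```typescript
// Illustrative sketch of ad selection without RTB: bids are placed on
// audience segments in advance, and the runtime step is a local lookup
// over pre-negotiated deals — no per-impression data leaves the agent.
interface Deal { segment: string; price: number; adCid: string }

// The user's segments are computed client-side from local data and
// restricted to a safe, enumerated vocabulary.
function selectAd(userSegments: Set<string>, deals: Deal[]): Deal | null {
  const eligible = deals.filter(d => userSegments.has(d.segment));
  // Simple highest-price-wins; a real design would add pacing, caps,
  // and audited pricing, but crucially still no third-party profile.
  return eligible.sort((a, b) => b.price - a.price)[0] ?? null;
}

const ad = selectAd(
  new Set(["interest:cycling", "locale:fr"]),
  [
    { segment: "interest:cycling", price: 2.1, adCid: "cid-bike-ad" },
    { segment: "interest:golf", price: 3.0, adCid: "cid-golf-ad" },
  ],
);
// Selects the cycling ad; the golf bidder never learns this user exists.
```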
One interesting alternative here is to eliminate RTB by having buyers bid in advance on the audience segments they want, and having publishers offer those segments ahead of time based on seller-defined audiences (which can be defined client-side, restricted to safe categories, and protected from leakage). Ads are then shown through a runtime selection process that is much simpler than RTB and that can be simultaneously private, governed as an infrastructure commons, and accountable. We could explore implementing this using [PASTRAMI](https://research.protocol.ai/publications/pastrami-privacy-preserving-auditable-scalable-trustworthy-auctions-for-multiple-items/), possibly ported to FEVM+FVM.

Whether we like them or not, ads are essential to the funding of media, and no alternative scheme has come anywhere near replacing them. This makes digital advertising a critical infrastructure for democracy, which we shouldn't let be captured, run unaccountably, or be left subject to overcharging and extractive practices. An added bonus of using tiles is that they help build a system of long-term traceability and accountability for the ads themselves, which has so far proven a collective challenge.

### 🚧 Subscriptions & Memberships

Part of the reason why subscriptions are mostly for rich people nowadays (in addition to the lack of "disposable" income) is that they tend to lack fluidity and to focus entirely on pairing one person with one service. In turn, this means that a reasonable bundle of content often comes at an unreasonable price, because you need to build it from offers that have a lot of filler. And because of threshold effects on prices (and the steep cost of customer acquisition over inefficient ad channels), it can often be a safer bet to sell a higher-priced subscription to a smaller audience than to try to reach a much broader market at lower prices.

For lack of expertise in this area, I am keeping this section short. We should loop in our friends from [Unlock Protocol](https://unlock-protocol.com/).

### 🚧 Money Protocol

Making money a protocol is not trivial, and relying on an existing stack is likely to be highly preferable. The most promising option may be the [Interledger Protocol (ILP)](https://interledger.org/). They have [specs](https://github.com/interledger/rfcs) and [code](https://github.com/interledger), and the protocol's properties are well-designed. It supports small payments (and should support even smaller payments as it matures), and it can translate between arbitrary currencies with a network of nodes that compete to provide the cheapest trusted route. It is intended to wrap API calls (rather than to have a payments side-channel) and to be efficient. They have also built infrastructure for wallets and nodes, and have been collaborating with some browsers around [Web Monetization](https://webmonetization.org/) (even if that solution isn't perfect).

:::info
**todo**
- [ ] Boris suggests looking at [Cross-License Collaboratives](https://writing.kemitchell.com/series/cross-license-collaboratives), which does seem interesting and applicable as a way to support flow and revenue sharing.
- [ ] Revshare means having a way to decide how revenue is shared. Is this something that our friends in Network Goods can help with, eg. [Generalized Impact Evaluators](https://research.protocol.ai/publications/generalized-impact-evaluators/ngwhitepaper2.pdf)? Go read and find out.
- [ ] At a technical level, ILP is interesting in that it supports nanopayments via a standard protocol.
:::
### ☠️ Examples

tk

## 🚧 Agent

The agent is a set of discovery mechanisms: browse, search, social. Tiles make it possible to create content and to tesselate apps based on principles that empower the agent in the name of the user, but beyond that the agent itself needs to offer some services (which can of course be replaced or skinned by tiles serving as extensions).

:::info
**todo**
- [ ] Emily Bender, in *Situating Search*, details some typologies of search that I think would be interesting to map to search/browse/social to see what's missing and to develop a stronger theoretical backbone for what we're building.
:::

### 🚧 Local

Tiles are content-addressable, which also makes them local-first. This helps organize data in better service to the user, as explained in the next section, but it also integrates well with an architecture in which the user's data that a tile processes is stored locally (and possibly synced across the user's devices). The [unhosted project](https://unhosted.org/) described the difference in architecture clearly. This is the architecture in common use today, in which a site is the gatekeeper to your data and your data is scattered all over the place for no good reason:

![browser ↔ web application ↔ user data](https://i.imgur.com/riA5twj.png)

Instead, we can keep all of the user's data locally and selectively grant access to it to tiles and the apps they tesselate into, with an architecture that looks like:

![user data ↔ browser ↔ web application](https://i.imgur.com/trz7VcD.png)

There is more than one way to make such a system work, and we need to pick between various options, but the fundamental principle needs to be supported.

:::info
**todo**
- [ ] Establish if [Solid](https://solidproject.org/) is a workable option. The examples and the spec look like a simple storage layer hidden under a thick pile of RDF, but that might just be first impressions.
:::

### 🚧 Organize

You see an interesting thing, you post it to social media with a quick comment, you move on. Two months later you just *know* that you had that thing but where the hell is it? What smartass comment did you share it with so that you can find it again? Of course, we all know that "I have my brain on Twitter" isn't a bright way to go about taking notes, but it's also the simpler option.

Everyone and their dog is organizing a metric ton of information about you, and organizing the information you see to make you behave this or that way, but precious few products are there to help you organize your information your way, for yourself, and all of them expect you to do the work.

Your agent can remember so much for you because it's right there when you read it. It can do it even better with tiles, because they are local and because protocol interactions (eg. with social or search) have clearer semantics. It shouldn't be twice the work to post to social and to keep a note of an interesting article. It should be trivial for the agent to expose a searchable index of the content you've browsed (as sketched below), and it should be easy to organise what you've read on the web by moving files around — more than a PDF collection and less like bookmarks. Tools like [Readwise](https://readwise.io/read), [Zotero](https://www.zotero.org/), [Notion](https://notion.so/), or [Roam](https://roamresearch.com/) should become organising principles that extend the agent and guide you in organizing your local storage.
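As a sketch of that searchable index: the agent remembers every tile you open, locally, and search is just a local query. Shapes are invented for illustration; a real index would use proper full-text search and cross-device sync.

```typescript
// A minimal sketch of the agent remembering what you read: every tile
// you open is indexed locally by content address, with your own tags.
interface Memory { cid: string; title: string; text: string; tags: string[]; seenAt: Date }

class LocalIndex {
  private items: Memory[] = [];

  remember(item: Memory): void {
    this.items.push(item);
  }

  // "I know I saw this somewhere two months ago…"
  search(query: string): Memory[] {
    const q = query.toLowerCase();
    return this.items.filter(
      m => m.title.toLowerCase().includes(q) ||
           m.text.toLowerCase().includes(q) ||
           m.tags.some(t => t.toLowerCase() === q),
    );
  }

  // Organising is just moving local things around, not managing bookmarks.
  tag(cid: string, tag: string): void {
    const item = this.items.find(m => m.cid === cid);
    if (item) item.tags.push(tag);
  }
}
```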
### 🚧 An End to PDF

One of the Web's unresolved shortcomings is the fact that it hasn't killed PDF. PDF has any number of problems that make it a poor fit for the Web and, more generally, for the 21st century. It has a fixed layout that isn't responsive, it has poor accessibility, it interacts poorly with copying and pasting (or searching), and it doesn't have production or processing tooling anywhere near on par with what the HTML stack has. Yet it persists because it can do something Web content generally cannot: it's a file. You can just copy it around, attach it to an email, sort it into a local collection, upload it somewhere, and it'll just work. You'll hate your life when you try to copy a paragraph from it on your mobile screen, but it's easy to move it around as you would any other image format. Sure enough, EPUB could compete and would be a superior option — if any browser actually supported it.

Tiles have the same properties as PDF in terms of how they are packaged and how easily they can be manipulated in what remains a file-centric world, but they have none of PDF's shortcomings and are in fact more efficient to curate and sort into collections. Bringing PDF to an end is not what this project set out to do, but the fact that it could achieve that as a side effect is a good sign that it is a better iteration of the Web.

### 🚧 Recommend

Almost no recommendation or ranking system in existence in any product is good. I'm being cautious and qualifying that statement because I haven't used everything, but I'm reasonably confident that they all suck. From search to social to recommended articles on news sites to product recommendation in ecommerce to generated music playlists to the appalling morass of streaming services that take vicious pleasure in making your self-curated list of things you want to watch impossible to find by burying it under reams of *BECAUSE YOU WATCHED KEN'S DREAMHOUSE*, the preferences exhibited by these systems routinely seem off, useless, if not downright bizarre.

They are also opaque and uncontrollable. "No, I never want to read Bret Stephens or in fact be reminded of the fact that someone would voluntarily pay him to write" or "never subject me to reggae again" are pretty simple requests, yet the best we ever get is, sometimes, "see less of this." How much less of what exactly? Who the fuck knows. These are all one-size-fits-all implementations backed by statistical personalisation code, none of it stellar.

Moving recommendation systems primarily agent-side can help along multiple lines (a sketch follows the list):

- As independent products that are used to make recommendations in multiple different contexts, they have an incentive to put the user first rather than to help grind the axe of whichever product manager has clout this week.
- This ought to encourage them to expose controls that help render them more useful.
- Since they only access tiles, they cannot leak data. (Though we could consider forms of federated learning for some cases. Note that it is possible to use ML locally anyway.)
- They can learn from preferences across different contexts, without privacy violations.
- They can interact with Activity* protocols to fine-tune filtering.
- This system also makes it a lot easier to establish standards for blocklists that local recommenders can use for filtering, and from that to evolve communities that can govern them collectively.
- We can further imagine similarly-distributed curation mechanisms (eg. the [Bechdel test](https://en.wikipedia.org/wiki/Bechdel_test)) being used to recommend and label.
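Here is a minimal sketch of what an agent-side recommender could look like, combining community blocklists and hard user rules with a local scoring function. All names are invented for illustration.

```typescript
// Sketch of an agent-side recommender: ranking happens locally, and
// collectively-governed blocklists plus the user's own hard rules are
// applied before any scoring.
interface Item { id: string; author: string; topics: string[] }
type Rule = (item: Item) => boolean;  // true = drop

function recommend(
  items: Item[],
  blocklists: Set<string>[],          // eg. community-governed lists
  userRules: Rule[],                  // "never subject me to reggae again"
  score: (item: Item) => number,      // local model or simple heuristics
): Item[] {
  return items
    .filter(i => !blocklists.some(b => b.has(i.author)))
    .filter(i => !userRules.some(rule => rule(i)))
    .sort((a, b) => score(b) - score(a));
}

// Hard rules are actually hard — not "see less of this".
const noReggae: Rule = i => i.topics.includes("reggae");
```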
In fact, a shift to local recommendations is particularly empowering because it is a shift away from machine learning — which has its uses but is at heart agency-reducing — to a world in which you can use your social connections for curation, and in fact curate the curators. We don't lose recommendations, we create the conditions for their improvement, in line with an approach that takes the [dark forest web/cozy web](https://maggieappleton.com/ai-dark-forest) situation seriously.

<!-- - see https://chaos.social/@pkreissel/110099714893948163 -->

### 🚧 Identity

> You have one identity. The days of you having a different image for your work friends or co-workers
> and for the other people you know are probably coming to an end pretty quickly. Having two
> identities for yourself is an example of a lack of integrity.\
> — Mark Zuckerberg

It's a crowded field, but this is a solid contender for being one of the stupidest things ever said about people, computers, and digital society all at once. Identity is a thoroughly contextual concept, and the ability to choose which identity to present in which context is an essential aspect of agency.

A recurring problem in today's digital ecosystem is how leaky identity is. In fact, there are companies (most of the bigger tech companies among them) whose business model relies fundamentally on producing an "identity graph", the purpose of which is precisely to force a unified identity upon people. Tiles help address this issue by preventing data leakage between contexts, but empowering people to present the right identity in the right context takes more than just preventing recognition. People present different identities in different contexts, but sometimes they also need to prove in one context that they are the same person as a given identity in another context.

Identity management requires keypair handling, account recovery, sync of local data (which of course the sync service shouldn't be able to access; anything else would be deceptive), keeping track of which identity to expose in which context, providing ways to fill forms out differently with different identities, partitioning the local data that a tile can touch by identity, etc. Tiles significantly simplify the problem, however, in that they make it a lot easier to expose information knowing that it won't leak. (A sketch follows below.)

The user's agent is the only software with a legitimate claim to managing the user's identity. The idea that identity should be managed remotely, by sites, and that identity providers can use information about people for their own purposes is fundamentally wrong. It is worth noting that moving identity management to the client, moving as much data as possible to the client, and preventing leakage all conspire to create a much safer cybersecurity environment. Part of our endgame should be to make maintaining people's identities or data on servers come to be seen as unsanitary and ridiculous, much in the same way that we look back at [people swilling mercury](https://en.wikipedia.org/wiki/Mercury_poisoning#History) or [using asbestos blankets for hospital patients](https://commons.wikimedia.org/wiki/File:Guy%27s_Hospital-_Life_in_a_London_Hospital,_England,_1941_D2325.jpg).
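A sketch of contextual identity from the agent's side: one identity per context by default, with linkage only as an explicit act. The cryptography is elided and every name is invented for illustration.

```typescript
// Illustrative only — the crypto is stubbed out.
interface ContextIdentity { context: string; publicKey: string }

class IdentityManager {
  private identities = new Map<string, ContextIdentity>();

  // Each context (site, feed, shop…) gets its own identity by default,
  // so cross-context recognition is impossible by construction.
  identityFor(context: string): ContextIdentity {
    let id = this.identities.get(context);
    if (!id) {
      id = { context, publicKey: crypto.randomUUID() }; // stand-in for a real keypair
      this.identities.set(context, id);
    }
    return id;
  }

  // Selective linkage: proving "I am also that person over there" is a
  // deliberate act, here stubbed as a statement both keys would sign.
  link(a: string, b: string): { from: string; to: string; proof: string } {
    return {
      from: this.identityFor(a).publicKey,
      to: this.identityFor(b).publicKey,
      proof: "signed-by-both-keys", // placeholder for a real dual signature
    };
  }
}
```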
:::info
**todo**
- [ ] See if there's anything to import from [Chris Messina's identity ideas](https://player.vimeo.com/video/10517404?h=50d7a0c784)
- [ ] Work with [CASA](https://github.com/ChainAgnostic/CASA)
:::

### 🚧 What About Browsers?

One thing that I never want to lose sight of is the transition from what we have to where we're going. The Web has flaws, but it also has reams of amazing content that we can't just leave behind. This is why this plan is architected around adding a primitive and then progressively enhancing the platform with the advanced, superior capabilities that this primitive enables. But with that in mind, what happens to browsing as we know it today?

The simple answer is that it stays, as essentially the basic case of a more general environment. Browsing a page is just like accessing a tile, except that the page has limited access to powerful capabilities, cannot expose an intent, cannot be embedded in social or search, and cannot be manipulated locally. Having the two side by side not only provides for a gradual transition and backwards compatibility, but also serves as a comparison that makes tiles more attractive because they are more user-friendly.

The new way of weaving applications together which we are outlining doesn't only compare favorably with browsing interactions: it also works better than most of the application monoliths we find on our phones and laptops. Ultimately, there is no reason to have both a browser/agent and an OS. ([Capyloon](https://capyloon.org/) calls itself a "user agent" and that feels exactly right.)

### ☠️ Exploration & Examples

* [ ] What does this look like? Tiles are abstract and we need to be clear that they can't really look like what we have today if this is going to work.
* [ ] This section needs exploratory UI (picking up on the conceptual UI) to show how things work. Social demo of posting à la Gram vs à la just posting colour themes vs full PDFs in the arXiv feed, etc. How you can just create a neat WinAMP skin and an intent can have it organise and play your Audius music.
* [ ] We can't fix anything if we don't rethink UI as well.
* [ ] The choreographer for hyperapp components (intents, assistant, skills).
* [ ] Commands/skills rather than apps is the model. Intents have the advantage that they are localisable and very close in model to the skills used in voice agents (or VoiceXML for that matter). So a component that implements a skill can be installed (safely, trivially) and then start answering voice commands.
* [ ] See https://en.m.wikipedia.org/wiki/Ubiquity_(Firefox)
* [ ] Zooming UI? Components connect and therefore can be grouped.
* [ ] Easels: https://saturation.social/@clive/109621917044737294
* [ ] Ethical habituation
* [ ] Good place to demo some MercuryOS ideas: https://www.mercuryos.com/architecture
* [ ] NYT — with identity, subscription/paywall (Unlock), recommendations, comments
* [ ] Social media — everything is a tile, create/edit with your own new one, recommendation, comments, identities, DM (via Signal)
* [ ] Office software — EVERYTHING IS A TILE, self-create/edit, link to storage/organisation, comments (as annotations)
* [ ] Playlist manager atop music (like a recommendation manager)
* [ ] Shopping cart management (sketched below). A bit of metadata on products, an intent to add to cart. Purchase just plugs in shipping & billing as needed from identity, plus creates a contact channel (some Activity* thing), and pays with standard money. The merchant accepts and a cryptographic proof of purchase is generated. A shop site is now basically just a CMS. The cart manager can become a product, with the ability to produce coupons, search for cheaper options…
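A minimal sketch of the cart flow imagined in that last bullet, with everything invented for illustration; payment and the proof of purchase would ride on the money protocol described earlier.

```typescript
// Sketch of agent-side cart management: sites emit product metadata,
// an "add to cart" intent lands in the agent's own cart, and checkout
// pulls shipping/billing from the chosen identity.
interface Product { id: string; merchant: string; price: number }

class AgentCart {
  private items: Product[] = [];

  // Handler for an intent-to-buy raised by any shop's product page.
  addToCart(p: Product): void {
    this.items.push(p);
  }

  // Checkout groups totals by merchant; actual payment and the
  // cryptographic proof of purchase are elided here.
  checkout(): Map<string, number> {
    const totals = new Map<string, number>();
    for (const p of this.items) {
      totals.set(p.merchant, (totals.get(p.merchant) ?? 0) + p.price);
    }
    return totals;
  }
}
```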
## 🚧 Law

This document is not primarily a legal or policy document, and delving into greater detail on these aspects should be done elsewhere. However, it is worth pointing out a few specific areas of interest that would work well with the vision espoused here and that have been subjects of discussion for upcoming regulation in at least the EU and the US.

The approach we have taken gives power to the people by making the user agent more powerful. While that is probably the only logical way to achieve user agency, it also creates a major opportunity for abuse, in that the agent could be written to trick the user into favouring the agent's vendor. Concrete evidence from the browser world shows that this is not a theoretical concern. A legal approach, when people have to rely on an agent that is placed in a position of significant power, is to give that agent *fiduciary duties*. Such duties make it illegal for the agent to use its position to be disloyal to the user. The details of such duties require [a much longer treatment](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3827421) but the core principle is simple. A browser being paid to pick a default search engine for you is like a stock broker being paid to recommend investing in a specific company: it's a direct betrayal of trust, and one that fiduciary duties would regulate. However, the specifics of how to provide defaults and choice screens may require a regulatory framework in their own right because of the power that comes with that specific area.

Native search and social work on the assumption that these services abide by standards in their respective fields. Evidently, new offerings will be incentivised to work with API clients, but transitioning to standards will happen a lot faster with mandatory interoperability.

## 🚧 Evolution

There have been many projects to reinvent the Web by tossing out the old; their over-inflated promises provide cushy padding in the dumpsters of history. This proposal outlines a path for change that tries to set itself apart from those predecessors in several ways:

1. It isn't motivated by theoretical or aesthetic considerations. Rather, it is ruthlessly dedicated to shifting the balance of power to users and to making their lives easier and their experience of using the Web more satisfying, while also providing the means for authors to deliver content more easily and in ways that will more readily make them money.
1. It does not remove or break anything on the existing Web. Rather, it is additive. Today's browsing experience is merely managed as a degenerate case. Pages work just the same; they are simply limited in the powerful capabilities that they can tap into. The expectation is evidently that, over time, the superior experience of composable contexts will lead most new services to be provided that way and most users to spend most of their time there. But that HTTP blog from 2023 with all its trackers will still be there when you need it.
1. While the full set of capabilities described in this document is not trivial to put together, the approach is implementable incrementally, and benefits can start showing with but a small subset of the capabilities. What matters in this document is to understand what this approach *unlocks*. The long list of capabilities isn't about what is required, just about what becomes possible.
1. Making content movable and locally useful, making search and social native and standard, and wiring intents together all have network effects. This design has the means to become increasingly valuable as it is increasingly used.
1. It comes with a plan to pay for its own infrastructure, and it doesn't ignore the fact that people need to make money.

I believe that this is a path forward that is both pragmatic and ambitious, and that the community that wants to build it exists.

## 🚧 Acknowledgments

Deep thanks to the following people (in alphabetical order of their given names) for their highly valuable input: [Boris Mann](https://plnetwork.xyz/@boris), [Brian Kardell](https://toot.cafe/@bkardell), [Dietrich Ayala](https://mastodon.social/@dietrich).

---

:::success
This is the dumpster for ideas that might be useful.

* What's the link to Mini Apps?
* "Consumption may be personalized, but it would be a stretch, in most cases, to call it self-directed"
* Grab from yellow notebook.
* It is impossible to rely on educating users, but it is desirable to lean harder into their intelligence and greater knowledge of their own lives than into their laziness and the convenience of passive monitoring.
* Mozilla product strategy notes from my Notion.
* [Hackability vs Usability](https://ipfs.io/ipfs/bafykbzacebhcml34t5725ciht2yjept2ula5c26rlbzyfubk77akxtas6ean6?filename=%28Routledge%20research%20in%20cultural%20and%20media%20studies%2C%2039%29%20Larissa%20Hjorth_%20Jean%20Burgess_%20Ingrid%20Richardson%20-%20Studying%20mobile%20media%20_%20cultural%20technologies%2C%20mobile%20communication%2C%20and%20the%20iPhone-Routledge%20%28.pdf)
* If we do develop the capabilities approach more, look into Alkire (Relational Ontology Capabilities) for a list that might work better than Nussbaum's. Feels closer to virtues/Vallor. Starts from abstract notions and looks for locally-relevant concrete developments.
* Intelligent, not artificial.
* Build for the Dark Forest Web (https://mastodon.social/@robin/109639042561917605)

![](https://i.imgur.com/aUl8o61.jpg)
:::