Early draft / Tom Walton-Pocock / Ben Levy 2023-09-12
We seek to advance a full integration of the verified internet ("Web3") into the standard internet protocols.
Over a decade into the story of the 'verified internet', there remains an unnatural division between so-called Web2 and Web3 applications. Most Web3 applications consist largely of Web2 software that interfaces with a narrow protocol via messaging. The protocol typically handles the logical transmission of value across the consensus protocol, via small programs known as smart contracts.
In practice, this means that in-page data which purports to be verified must in fact be taken on trust: there is no mechanism by which the user can validate the correctness of the core data they are seeing. It also means that, without fiddling with SDKs and making a dizzying array of technical decisions that have real financial ramifications for users, there is no way for a Web2 engineer to introduce value-transmission or verified data into their webpages.
This brings us to the motivating feature of this document: to effect a deep integration of Web3 architectures into the fabric of Web2's HTML protocol, and to enable the browser to take on a bolder role with the user of this new internet: to read, write, and even (in the age of the LLM) interpret and curate what the user sees. Useful, durable protocols should act vaguely imperialistic, gradually subsuming richer functionality into their common interface.
This is a living document to commence the layering-in of these thin, verified, stateful protocols into the Web's communications protocols, for the first time layering the ephemeral machine and a common memory of digital record into the core substrate of the web.
Concretely, we add to HTML a small cluster of elements which, in the tradition of webpages, should be invisible to classical browsers but visible to "Web3 ready" browsers.
These elements convey abstract directions, or intentions, that browsers take on the (often fiduciary) responsibility of executing faithfully and competently, freeing developers to focus on the problems that matter to them.
We call this protocol HTML+.
HTML will of course need to evolve to realize this vision, much as it has already evolved to support each consecutive evolution in Internet content.
HTML4 introduced tables, styles and scripts as the Internet moved to a more aesthetic bent, and HTML5 introduced audio, video, and canvas as richer forms of content gained importance (importantly, enabled by better infrastructure).
Payments were originally intended to be enshrined in HTTP itself, but at this point we believe that HTML is a better, more pragmatic option for integrating stateful features into the Internet.
We extend this to web3 functionality.
The overriding design goal here is to abstract over all the abstruse web3 infrastructure details: no worrying about which DEX to use, how much gas to pay, etc. The browser enables this by assuming a far weightier role, a shift we already anticipate as the rise of LLMs heralds the personalized Internet (elaborated on at the end of this document).
Another design goal is to retain a great degree of minimalism and abstraction within the protocol, allowing new technological developments to be adopted swiftly without protocol changes, thus avoiding the protocol failure mode identified by Moxie Marlinspike of Signal.
Partially inherited from: https://www.w3.org/TR/html-design-principles/
An HTML+ block may contain transaction and/or state references with the appropriate metadata to allow them to be verified, either by the browser's embedded light client or via storage proofs delegated to a coprocessing service. Since users trust the browser to perform this verification for them, transport-layer security (TLS) is already sufficient, and signed HTML+ blocks (which present additional challenges) are not required.
All HTML+ elements (with the exception of <w></w>, introduced below) are void elements, since we take the slightly ugly design decision of storing all content within attributes in order to prevent browsers that don't support HTML+ from rendering the content as plaintext. This way, incompatible browsers simply ignore the HTML+ blocks.
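As a rough illustration of this trade-off, compare two ways of writing the same tip prompt. The first, content-bearing form is hypothetical and is not part of HTML+; the second follows the attribute-only convention described above. The recipient and label are placeholder values.

```html
<!-- Hypothetical content-bearing form (NOT HTML+): a browser without
     HTML+ support treats SEND as an unknown element and renders its
     text content, so "Send tips here!" shows up as stray plaintext. -->
<SEND to="my_wallet.eth">Send tips here!</SEND>

<!-- Attribute-only form: the element carries no text content of its
     own, so a browser without HTML+ support displays nothing for it. -->
<SEND to="my_wallet.eth" text="Send tips here!"/>
```

The cost of this choice is that every label, amount, and address has to be squeezed into an attribute value, which is the "slightly ugly" part of the decision.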
Here is a shortlist of initial HTML+ elements (with default attributes populated and required attributes capitalized):
<WRITE CALLDATA="" TO="" text=""/>
<READ ACCOUNT="" CALLDATA="" block_num="latest"/>
<SWAP from="" to=""/>
<SEND from="" to="" asset="USDC" amount=""/>
<STREAM from="" to="" duration="" asset="USDC" amount=""/>
<PRICE ASSET="" quote="USD" venue=""/>
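To make the capitalization convention concrete, here is a sketch of how a few of these elements might be written in a page. All values (asset symbols, the amount, the recipient) are illustrative placeholders, and reading an omitted attribute as "use the listed default, or let the browser decide" is an assumption about the intended semantics rather than something the shortlist states.

```html
<!-- Required ASSET is given; quote is omitted, so the listed default
     "USD" would apply (assumed semantics); venue is left to the browser. -->
<PRICE ASSET="ETH"/>

<!-- asset is omitted, so the listed default "USDC" would apply
     (assumed semantics). -->
<SEND to="my_wallet.eth" amount="5"/>

<!-- No DEX, route, or gas settings: the browser is expected to pick them. -->
<SWAP from="ETH" to="USDC"/>
```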
Some notes:
One should wrap these elements with <w></w> tags (though this is optional), forming a single HTML+ element.
<w CHAIN_ID="1" BLOCK="latest">
[body]
</w>
The <w> tag may include attributes like network/chain ID; however, we encourage developers to avoid that, leaving gnarly details like the network unspecified altogether, allowing browsers to abstract over such technical details.
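The two styles can be contrasted in a short sketch. The PRICE element and its "ETH" value are placeholders chosen for illustration.

```html
<!-- Encouraged: no network details; the browser resolves chain and block. -->
<w>
<PRICE ASSET="ETH"/>
</w>

<!-- Permitted but discouraged: the page pins the chain explicitly. -->
<w CHAIN_ID="1">
<PRICE ASSET="ETH"/>
</w>
```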
Here is a sample standard HTML body:
<HTML>
<HEAD>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>Lord Byron</TITLE>
</HEAD>
<BODY>
<H1>The Destruction of Sennacherib</H1>
The Assyrian came down like the wolf on the fold,
<A HREF="https://www.poetryfoundation.org/poems/43827/the-destruction-of-sennacherib"> read more</A>.
</BODY>
</HTML>
We might add an HTML+ element to this:
<HTML>
<HEAD>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>Lord Byron</TITLE>
</HEAD>
<BODY>
<H1>The Destruction of Sennacherib</H1>
The Assyrian came down like the wolf on the fold,
<A HREF="https://www.poetryfoundation.org/poems/43827/the-destruction-of-sennacherib"> read more</A>.
<w>
<SEND TO="my_wallet.eth" TEXT="Send tips here!">
</w>
</BODY>
</HTML>
We think that the majority of the value in HTML+ in the medium term will stem from very simple elements for actions like sending, swapping, and viewing prices. These are the core web3 primitives that likely solve the most real-world problems.
An open question that we'd love to see a conversation on is whether, in the longer term, a client-side verifiable scripting language would add useful functionality. The natural options for this include Solidity and WASM, neither of which can yet be proven performantly within a browser.
The days of internet protocols specifying rendering norms are behind us (the blue and purple links), but there may be value in browsers reserving regions of the palette for plaintext verified data / calls-to-action.
Verified browsers, taking on a fiduciary role safeguarding user state and identity, will also have to run smart software to spot spoofing attempts designed to mislead the user and supply malicious transactions.
Global computation is embarking on a gradual transfer from classical to intuitive computation. Whilst the consequences of this will be felt over decades and not months, it is likely that LLMs will widen the interpretable spectrum of potential outputs for HTML pages, and that browsers and other interfaces will increasingly use the HTML inputs for their data and otherwise treat them as 'guidance only'.
Seth Rosenberg described the natural progression from the AI-assisted TikTok feed to a feed populated by AI-generated videos as the latest stage in a well-established trend wherein developers can give increasingly abstract directions while intelligent agents handle the implementation details; similarly, developers will likely describe their websites' functions more directly while the browser constructs personalized interfaces for users.
This has two ramifications: