# Revisiting tag libraries in light of generated code insecurity
[TOC]
## Abstract
```js
let name = "Robert'); DROP TABLE Students;--";
// aka Bobby Tables
database.doTheThing(
"SELECT * FROM TABLE WHERE NAME='${name}'"
)
```
AI-generated code tends to do badly on problems like this: composing strings that are implicitly granted authority. Reported failure rates on security-sensitive coding tasks are around 40%.
Deciding what to do with *name* above requires an understanding of security best practices; we should not expect models trained on average code to magically embody specialist knowledge.
It would be nice if we could split the problem of insecure, AI-generated code in two:
1. A *secure programming language idiom* with a small syntactic difference that can be generated by AI, and/or which AI security auditing can detect with minimal human effort, that
2. delegates the subtle semantics to thoroughly scrutinized *library code written by trained security engineers*.
```patch
database.doTheThing(
- "SELECT * FROM TABLE WHERE NAME='${name}'"
+ sql"SELECT * FROM TABLE WHERE NAME='${name}'"
)
```
The patch above fixes the insecure code pattern from the opening example. The deleted line uses *string interpolation*, naïvely concatenating *name* into a larger string template. The revised line adds only the prefix *sql* to the string expression, but that small change allows specialist code to intervene in the composition of the template parts and *name*, producing a trustworthy query string.
This document explains the language design choices that let that small syntactic addition serve as an interface to the security engineers' library code, and the implementation approaches sufficient to render the tagged idiom a safe content-composition idiom, safe, in this case, against SQL injection attacks.
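To make that concrete, here is a minimal sketch of what such a `sql` tag could do, written in JavaScript's existing tagged-template style; `escapeSqlString` is a hypothetical helper that security engineers would own and vet, and a production tag would also track whether each hole sits inside quotes:
```ts
// A minimal sketch; escapeSqlString is hypothetical, vetted library code.
const escapeSqlString = (v: unknown): string =>
  String(v).replace(/'/g, "''"); // standard SQL single-quote doubling

function sql(strings: TemplateStringsArray, ...values: unknown[]): string {
  let out = strings[0];
  for (let i = 0; i < values.length; i++) {
    out += escapeSqlString(values[i]) + strings[i + 1];
  }
  return out;
}

// Bobby Tables can no longer terminate the string literal:
sql`SELECT * FROM Students WHERE NAME='${"Robert'); DROP TABLE Students;--"}'`;
// → "SELECT * FROM Students WHERE NAME='Robert''); DROP TABLE Students;--'"
```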
Having this in mainstream programming languages would benefit the generative AI ecosystem in several ways:
1. A human security specialist (blue teamer) could use AI assisted code scanning to find expressions that lack the `sql` tag but which likely construct strings of SQL:
> Please identify sub-expressions that probably construct SQL strings (they likely start with a SQL verb: SELECT, etc.). If they do not use a secure coding idiom, flag them for review.
2. Code generating root prompts could include standing orders to use secure idioms.
3. Agentic code-generating approaches that feed warnings back could use static and dynamic analysis to identify creation of strings that match certain patterns but which are constructed within generated code that does not use a secure idiom.
4. Model outputs with insecure patterns could be fixed and fed back to the next training cycle to increase the proportion of training data that conforms to best practices.
## A note on the generative AI debate
Many people are skeptical of the benefits of generative AI. This document does not take a position on questions like "is XYZ corp ethically producing its models?" or "Should developer Smith use those models?".
When it comes to security engineering, the focus should not be on the organization or the developer; it must be on the end user. When an end user is hurt by bugs, it doesn't matter whether the bug was introduced by a human or a generative-AI tool. In either case, security engineers have a duty to help deliver secure, robust systems.
Some may object that there is a moral hazard to doing this kind of security engineering; that trying to secure AI-generated code encourages or enables bad engineering practices. That debate has recurred in many contexts, but informed security engineers, regulators, and insurers consistently fall on the side of "we should provide lifeboats even if doing so could theoretically encourage sloppy sailing," especially when the cabin crew are not the only ones to protect.
## Background
Tagged string syntax enables secure content composition.
Recall how JavaScript [*template literal expressions*][template literals] work.
First, the untagged string expression below simply concatenates strings. The semantics of simple concatenation make no distinction between the first and last strings, which appear literally in the program code, and the middle one, referenced as *recipient*, which does not.
```js
`<b>Hello, ${ recipient }!</b>`
```
If *recipient* contains malicious code, like `<script>alert(1)</script>`, then it can trick downstream systems, in this case the user's browser, into executing instructions with the user's privileges.
The problem is that the above does not properly escape, filter, or otherwise defang *recipient*. In this case, it would be sufficient to encode any `<` characters as `&lt;`, but that solution can't be baked into the semantics of string interpolation because it's specific to the HTML language and would not help with other attack classes, e.g. SQL injection.
But in a tagged template literal, the tag can make such a distinction. Libraries like [lit-html](https://lit.dev/docs/v1/lit-html/introduction/#lit-html-templates) used this to great effect.
```ts
html`<b>Hello, ${ recipient }!</b>`
// ┗━━┛ ┗━━━━━━━━┛ ┗━━━━━━━┛ ┗━━━┛
// tag authored by untrusted authored
// trusted dev value by trusted dev
//
// fn HTML plain text HTML
```
As the `html` tag example shows, this short syntax is familiar to users, but unlike the untagged syntax, it delegates the work of combining the trusted and untrusted pieces to a function, written by people with deep knowledge of web languages and composition hazards, that grants [different privileges to the trusted, developer-authored strings][contextual-escaping-sec-model]. That function interprets the untrusted value, `recipient`, in the context of the trusted HTML chunks. The `html` tag is a *context-aware auto-encoder*: it knows to encode `<` as `&lt;` in *recipient* but not in the strings that were authored by a trusted developer.
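A minimal sketch of that trust distinction, in JavaScript's tagged-template style; the escaper below is a hypothetical stand-in for vetted library code, and a real context-aware auto-encoder would pick escapers based on each hole's HTML context:
```ts
// Context-unaware sketch: trusted chunks pass through, untrusted
// values are escaped as HTML text.
const escapeHtmlText = (v: unknown): string =>
  String(v).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

function html(strings: TemplateStringsArray, ...values: unknown[]): string {
  let out = strings[0]; // developer-authored chunk: kept as-is
  for (let i = 0; i < values.length; i++) {
    out += escapeHtmlText(values[i]) + strings[i + 1];
  }
  return out;
}

html`<b>Hello, ${"<script>alert(1)</script>"}!</b>`;
// → "<b>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</b>"
```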
A programming language might allow more complex content composition than JavaScript does; the example below includes flow-control constructs, for example a loop that generates an HTML list item per value in a list.
```html
html""" // tag and open quote
"<ol> // dev-authored HTML
"{: for (let x of xs) { :} // Start of embedded loop
" <li>${ x }</li> // dev-authored HTML with an untrusted value
"{: } :} // End of embedded loop
"</ol> // dev-authored HTML
```
The exact syntactic choices assumed here are unimportant except to understand the examples, but are explained below in gory detail. The main difference is the addition of a loop construct spread across two `{: ... :}` sections, called *statement fragments* because neither alone is a full statement.
This document contributes:
1. a desugaring for complex content composition expressions,
2. a variety of semantics for content composition that may fit into existing programming languages (PLs),
3. an explanation of how such *tagged* content composition allows for secure, auditable code generation both by humans and generative AI agents, and
4. suggestions for type system integration to further guide AI agents to safe APIs.
## Goals
This document describes a scheme for flexibly supporting tagged functions in a typed programming language with metaprogramming facilities.
Goals include:
- Code generators and human developers can read and write these idioms.
- The idioms are secure when used as intended, and the number of caveats and corner cases is small.
- The approach extends to quoting-confusion bugs that are not in the OWASP Top 10, not just to classes of vulnerabilities that warrant attention across the whole software industry.
- All tagged string syntaxes, single-line and multi-line, use the same underlying mechanisms.
- Simple tags, e.g. `String.raw"foo\bar"`, are really simple. They can inline easily to constant expressions.
- The mechanism extends to complex statement expressions.
- The desugaring is statically analyzable. Specifically, if no helper object escapes, analyses that do not perform escape analysis can still do at compile time what would otherwise be done at runtime. E.g., `html"""...` could in theory use meta-programming to pick escapers at compile time for interpolated expressions. Other analyses, like extracting human-language strings for translation, could also be done via meta-programming.
- Complex tag macros that need type information for embedded expressions should be able to access it
## Solution
This section describes how complex tagged string composition is handled in a way that meets the goals above.
1. First, we show a syntactic desugaring of tagged strings that combine (a tag expression, string fragments, embedded statement fragments, embedded expressions) into an expression in a language that only has (whole string literals, whole expressions/statements)
2. Second, we show how the syntactic desugaring uses an internal *tagMe* macro to allow early inlining of some simple tag cases, using `String.raw` as a motivating use case
3. Third, we explain the compiler evaluation flow for the general case of the *tagMe* macro evaluation and the desugaring.
4. Fourth, we discuss how type-dependent macros might be provided type information. As a motivating example, we discuss how a hypothetical `sql` macro might decline encoding if an embedded expression is known at compile-time to be a *SafeSql* fragment, safe without escaping. We also discuss how an `html` tag library might use type information, e.g. how a *Date* value might be formatted using HTML micro-formats without RTTI checks.
5. Fifth, we discuss how static analysis of a hypothetical `html` tag might be implemented in a staged programming language with meta-programming to optimize for runtime. And we discuss how meta-programming might support connecting a tag to the larger toolchain, in the case of `html`, how to extract strings from an HTML template for L10N (localization, translation by human language translators).
6. Finally, we evaluate the solution from its ability to fit into existing code quality tools. By making content composition decisions explicit at compile time, security code static analyzers that look at compiler outputs can come to more confident conclusions than if those decisions are made at runtime.
### Statement Inversion
An example tagged string use:
```html
html""" // tag and open quote
"<ol> // dev-authored HTML
"{: for (let x of xs) { :} // Start of embedded loop
" <li>${ x }</li> // dev-authored HTML with an untrusted value
"{: } :} // End of embedded loop
"</ol> // dev-authored HTML
```
This is the same example from above. The comments at the right are explanatory and not part of the expression. The left column shows the parts of an expression that composes HTML content by mixing statement fragments, string fragments, and embedded expressions.
The statement inversion operation turns this complex expression into one that only uses parts of a simpler language, by turning it inside out so that the nested statement fragments combine into whole statements that instead nest simple expressions.
<table>
<tr><td>
```ts
html"""
"<ol>
"{: for (let x of xs) { :}
" <li>${ x }</li>
"{: } :}
"</ol>
```
</td><td>
```ts
builtins.tagMe__0(html,
(fragment__1, interpolation__2) => {
fragment__1("<ol>\n");
for (let x of xs) {
fragment__1(" <li>");
interpolation__2(x);
fragment__1("</li>\n")
}
fragment__1("</ol>");
}
)
```
</td></tr>
</table>
(Throughout, identifiers that end with `__` and a number are chosen hygienically; they must not conflict with any identifier in the program source.)
The tagged string expression turns into a call to the *tagMe* macro that receives:
1. the tag expression
2. a lambda expression whose body is a block of simpler statements and expressions.
The lambda has two arguments:
1. *fragment__1* receives only string literals authored by the trusted developer
2. *interpolation__2* accepts arbitrary expressions
This inversion is important because it allows control-flow jumps like `{: break; :}` and `{: continue; :}` to preserve their meaning in the context of surrounding loop constructs.
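For example, here is a sketch of the inversion applied to a body containing a `{: if (x == null) break; :}` statement fragment, reusing the hypothetical desugared names from above:
```ts
builtins.tagMe__0(html, (fragment__1, interpolation__2) => {
  fragment__1("<ol>\n");
  for (let x of xs) {
    if (x == null) break; // the `break` still targets its enclosing loop
    fragment__1(" <li>");
    interpolation__2(x);
    fragment__1("</li>\n");
  }
  fragment__1("</ol>");
});
```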
#### Syntax of a multi-line string expression
Above we introduced a syntax which should be intuitive to users of languages like JavaScript and PHP. These next two sections assume specific syntactic choices, though the broad strokes should adapt to other choices. If you skip the details in these next two sections, the following notes still provide some insight into why these specific syntactic choices were made for one PL.
<details><summary>Parsing algorithm in prose</summary>
First we need a way to identify the constituents of a tagged string expression: the syntactic sugar that we're going to desugar into a combination of simpler syntactic elements.
Desugaring operates on a tag expression that is adjacent to a multi-line string start token, `"""`, and if desired, also on an expression that starts with a multi-line string start token but which has no explicit tag expression.
`{: ... :}` cannot nest without an intervening `"""` multi-line string start token so this approach does not require defining a grammar production for a statement fragment.
Once we've identified the start of a tagged string expression, the parser gathers lines, taking care to recursively process expressions nested in `${...}` and fragments in `{: ... :}`.
Each string content line starts with its margin: optional ASCII whitespace followed by the margin marker `"`.
String content lines are collected (modulo nesting) until reaching a line whose first non-ASCII-space character is not U+22 (`"`).
Nested constructs are logically part of the same line even if they span multiple lexical lines:
- The token `${`, not immediately followed by a `}` token, switches into a nested expression context.
- The two token sequence, `${` then `}`, is ignored. This means that the character sequence `$${}{` contributes content characters '$' and '{'.
- The token `{:` starts a statement fragment and its content consists of lexical tokens processed as normal, except that complex multi-token lexical constructs like multi-line strings must be consumed whole. Other than that, the token sequence stops at the first `:}` token.
It may be convenient when parsing a statement fragment to store a mix of tokens and AST nodes that correspond to complex lexical constructs like nested multi-line strings. In that case, to determine whether a string expression is a parse error, just assume that an AST is well balanced.
A multi-line string expression, whether or not it has an explicit tag expression, is *well-balanced* when the concatenation of tokens in statement fragments, after discarding `{:` and `:}` tokens, contains no unbalanced bracket sequences. (In a language with ambiguous brackets, e.g. <code><</code> can be an infix operator or a type-argument-list open bracket, it can be convenient to classify those early based on a simple lexical convention.)
If a multi-line string expression is not *well-balanced*, the parser can do any of the following:
- parse the entire multi-line string expression to an error AST node
- issue an error message about the imbalance and halt parsing
- issue error messages about extra/missing brackets and insert/remove tokens to repair imbalances
The next algorithm, the desugaring, assumes balance.
</details>
#### Desugaring algorithm
Above, we've given intuition about the desugaring, but here are the gory details.
The desugaring algorithm expects:
- a tag expression, or an indication that a default should be synthesized
- a balanced multi-line string expression consisting of a sequence of zero or more content lines where each content line consists of zero or more of:
- a run of content characters that correspond to characters inside quoted strings that were not processed into non-string programming language tokens
- an embedded expression that was parsed from a non-empty `${`…`}` construct
- an embedded empty `${}` lexical break
- a statement fragment consisting of tokens (and possibly, per the implementation convenience note above, eagerly parsed ASTs)
The algorithm consists of the following steps:
<details><summary>Desugaring algorithm in an ecmarkup-esque form</summary>
1. Let *tagAst* be an AST node corresponding to the tag expression, or if implied an AST node for the default tag expression whose diagnostic source location can be that of the multi-line string start token.
2. Let *fragmentName* be a name that is unmentioned/unmentionable by user code and which, when viewed in compiler diagnostics should look like `fragment__123`.
3. Let *interpolationName* be a name that is derived similarly to *fragmentName* but whose diagnostic hint looks like `interpolation__123`.
4. Let *lambdaBody* be an AST node corresponding to a lexical block of statements and/or expressions. *lambdaBody*'s content is derived as follows:
1. First, we simplify content lines and store enough information to decide where to insert line breaks between content lines.
2. Map each content line to a list of (*ContentLine*, *Boolean*) pairs, called *pairList*:
1. If the content line ends with a run of content characters, remove any ASCII-spaces from the end.
2. let *isOnlyStmt* be a boolean, initially *false*.
3. If the content line has exactly one element, then *isOnlyStmt* is true when that element is a statement fragment.
4. If the content line has exactly two elements, and the first is a run of ASCII space characters, and the second is a statement fragment then:
1. Set *isOnlyStmt* to *true*.
2. Remove the run of ASCII space characters from the content line.
5. Remove any empty `${}` constructs from the content line.
6. Until there are no pairs of adjacent runs of content characters, pick a pair and replace them with their concatenation.
7. Emit a pair of (the content line, *isOnlyStmt*) to *pairList*.
3. Now, we examine content lines again and turn them into statements in the *lambdaBody* AST node.
4. Let *tokensAndAsts* be a buffer of tokens and *AST* nodes that can be reparsed to a run of *statement* AST nodes.
5. For each *index* in *pairList*'s domain:
1. let *contentLine* be the first element of *pairList\[index\]*.
2. For each part of the *contentLine*:
1. If the part is a run of content characters:
1. let *contentCharacters* be the content characters from the part.
2. If the part is the last in the content line and there exists a *laterIndex* in *pairList*'s domain such that *laterIndex* > *index* and *pairList\[laterIndex\]*'s second element (isOnlyStmt) is *false*, then, add a line feed character (U+A) to the end of *contentCharacters*.
(This existence check can be optimized based on max-index for which *isOnlyStmt* is false)
3. Add a function application AST node to *tokensAndAsts* consisting of:
1. a callee, a reference to the name *fragmentName*.
2. a single actual, a string literal consisting of *contentCharacters*, escaped as necessary so that the textual content matches the *contentCharacters*.
2. Else if the part is an interpolated expression, add a function application AST node to *tokensAndAsts* consisting of:
1. a callee, a reference to the name *interpolationName*.
2. a single actual, the part's expression.
3. Else part must be a statement fragment, so add each of its tokens, in order, to *tokensAndAsts*.
6. Reparse *tokensAndAsts* into *lambdaBody* as a run of statements/expressions.
7. If there was an error on reparsing, then a statement fragment was malformed though well-balanced.
5. Let *lambda* be an AST node for a lambda (arrow function) definition consisting of:
1. Two formal arguments named: *fragmentName* and *interpolationName*
2. A body with return type *Unit*, the previously constructed *lambdaBody*.
6. Let *tagApplication* be an AST node corresponding to a function application consisting of:
1. A reference to the builtin *tagMe* macro.
2. An actual, the previously constructed *tagAst* node.
3. An actual, the previously constructed *lambda*.
7. Replace the multi-line string expression syntactic sugar with the *tagApplication* AST node.
</details>
#### Subtleties of desugaring
A few subtleties to note about this desugaring:
- When turning a string fragment (e.g. `"<ol>` at the end of a line) into a string literal, we escape any string meta-characters: `"<ol>\n"`. Tags may need to apply their own unescaping. For example, a regular expression tag may need to interpret `\b` as a word-<u>b</u>oundary assertion, not as an ASCII <u>b</u>ackspace character.
- The `"` at the beginning of each string line is a margin marker which serves to help IDE auto-indent preserve the meaning of the larger expression, and make it clear that the line continues the string started by `"""`. It does not contribute content.
- A string line (ignoring margin marker) which consists only of ASCII space characters and a single `{: ... :}` contributes no newline or space content.
- Trailing ASCII space characters are ignored so that IDE trim-on-save does not alter the meaning. The desugaring lets empty interpolations `${}` contribute no content, so those can be used to craft a line with trailing spaces (see the sketch after this list).
- The above rules mean that CRLF (aka `\r\n`) newlines are normalized to a single LF, but the `${}` trick allows preserving embedded CRs.
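For example, a sketch in the hypothetical syntax (the comments at the right are explanatory, as before):
```html
html""" // tag and open quote
"Two kept trailing spaces:  ${} // the empty `${}` protects the spaces before it
"A second line // trailing spaces here would be trimmed away harmlessly
```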
#### A note on purity
Some programming languages aggressively inline/simplify certain kinds of expressions. Note that if the tag expression, nested expressions and statement fragments fall into a pure subset of the programming language, and the tag's semantics are pure, then the expanded string expression is pure.
### The tagMe macro
The *tagMe* macro serves a couple of purposes.
It keeps the syntactic desugaring above simple. E.g., there's no need to introduce a temporary variable to prevent multiple evaluation of the tag expression because the *tagMe* macro can do that.
It negotiates between the tag and the desugared body. If the tag is known at compile time to be a simple function, it can apply it. If the tag is known at compile time to be a type-aware macro, it can wait to expand to an application of that macro until type information is available. Otherwise, it could take a default strategy: assume the tag is an instance of *interface TagHandler*, and expand to a series of uses of that.
Let's look at how a programming language might support one or more of these *string tag calling conventions*.
#### Tag function calling convention
The tag is a function: *(staticParts, expressions) ⇒ result*
This is similar to JavaScript's tagged string convention. In JavaScript, these two are roughly equivalent.
```ts
tag`Foo${ bar }Baz`
tag(["Foo", "Baz"], bar)
```
(The salient difference being that the *Array* containing the static string parts has some extra properties to assist tag function implementors in getting the "cooked" version when they want post-escape-decoding strings and the "raw" version when they don't. (At the time this behaviour was designed there was lower tolerance among the TC39 committee for new standard library functions.))
If *tagMe* knows that tag is a function, then a JavaScript compatible implementation could do:
```ts
// Given (tag, applyBody), collect the parts, then apply the tag
// using JavaScript's (strings, ...values) convention.
function tagMe(tag, applyBody) {
  let fragments = [];
  let interpolations = [];
  applyBody(
    (x) => fragments.push(x),
    (x) => interpolations.push(x),
  );
  return tag(fragments, ...interpolations);
}
```
This simply applies the body with collectors for the various parts.
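For example, a usage sketch with the collector-based *tagMe* above and a toy tag that just records what it receives (all names hypothetical):
```ts
const debugTag = (fragments: string[], ...values: unknown[]) =>
  ({ fragments, values });

const xs = ["a", "b"];
const result = tagMe(debugTag, (fragment, interpolation) => {
  fragment("<ol>\n");
  for (const x of xs) {
    fragment(" <li>");
    interpolation(x);
    fragment("</li>\n");
  }
  fragment("</ol>");
});
// result.fragments = ["<ol>\n", " <li>", "</li>\n", " <li>", "</li>\n", "</ol>"]
// result.values    = ["a", "b"]
```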
#### Object oriented calling convention
If the tag refers to an object, it could enable a flow like:
1. Create an accumulator
2. Apply the body, feeding the accumulator.
3. Ask the accumulator for the result.
Assuming an accumulator like:
```ts
interface Accumulator<IN, OUT> {
// The accumulator type provides a facility to create an
// instance. Or the tag is a factory function.
static new(): Accumulator<IN, OUT>;
// Body application invokes these
fragment(string: String): void;
interpolation(value: IN): void;
// After body application we can get the expression result
finish(): OUT;
}
```
When *tagMe* knows that its tag is an accumulator type, it could do something like this:
```ts
let accumulatorType = tag();
let accumulator = accumulatorType.new();
// Now we apply the tag body.
applyBody(
(x) => accumulator.fragment(x),
(x) => accumulator.interpolation(x),
);
return accumulator.finish();
```
If *tagMe* is a macro it could generate instructions to do that, composing the accumulator with the linear body.
The `<ol>` example above might correspond to composed code like this:
```ts
let accumulator: HtmlAccumulator = new HtmlAccumulator();
// Inlining applyBody call from above
accumulator.fragment("<ol>\n");
for (let x of xs) {
accumulator.fragment(" <li>");
accumulator.interpolation(x);
accumulator.fragment("</li>\n");
}
accumulator.fragment("</ol>");
return accumulator.finish();
```
This code has well-defined semantics according to the definition of the *HtmlAccumulator* class.
Maybe the *HtmlAccumulator* object dynamically parses the string fragments to pick appropriate escapers for calls to *.interpolation*.
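A toy runtime sketch of that idea, where *SafeHtml* and the escapers are hypothetical stand-ins for vetted library code and the parse-context tracking is reduced to a single flag:
```ts
const HTML_ENTITIES: Record<string, string> = {
  "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;",
};
const escapeHtmlPcdata = (v: unknown) =>
  String(v).replace(/[&<>]/g, (c) => HTML_ENTITIES[c]);
const escapeHtmlAttr = (v: unknown) =>
  String(v).replace(/[&<>"]/g, (c) => HTML_ENTITIES[c]);

class SafeHtml {
  constructor(readonly content: string) {}
}

class HtmlAccumulator {
  private buffer = "";
  private inAttrValue = false;

  fragment(chunk: string): void {
    // A real implementation runs an HTML tokenizer over `chunk`; this
    // toy only notices fragments ending in a double-quoted attribute.
    this.inAttrValue = /=\s*"$/.test(chunk);
    this.buffer += chunk;
  }
  interpolation(value: unknown): void {
    // Pick the escaper from the (toy) parse context.
    this.buffer += this.inAttrValue
      ? escapeHtmlAttr(value)
      : escapeHtmlPcdata(value);
  }
  finish(): SafeHtml {
    return new SafeHtml(this.buffer);
  }
}
```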
Thus, if *tagMe* marks this for analysis by any library-specific meta-programming code, it could draw conclusions based on how *accumulator* is used.
```ts
let accumulator: HtmlAccumulator = new HtmlAccumulator();
// An HtmlAccumulator specific static analyzer can conclude
// that accumulator's internal buffer is empty here.
accumulator.fragment("<ol>\n");
for (let x of xs) {
accumulator.fragment(" <li>");
// An HtmlAccumulator specific static analyzer can conclude
// that accumulator's internal buffer only contains an
// open `<ol>` tag and complete `<li>` elements.
// `x` will be converted to HTML that parses to PCDATA!!!
accumulator.interpolation(x);
accumulator.fragment("</li>\n");
}
// An HtmlAccumulator specific static analyzer can conclude
// that accumulator's internal buffer only contains an
// open `<ol>` tag and complete `<li>` elements.
accumulator.fragment("</ol>");
// Now, the open tag stack for any HTML parser of accumulator's internal buffer is empty.
return accumulator.finish();
```
Given those static conclusions, a custom macro could rewrite that to the following which is more likely to pass existing third-party security scanners:
```ts
// Simplified to accumulate a String.
@ContentType("text/html")
let accumulator: StringBuilder = new StringBuilder();
accumulator.append("<ol>\n");
for (let x of xs) {
accumulator.append(" <li>");
// Explicit escaping function here.
accumulator.append(escapeHtmlPcdata(x));
accumulator.append("</li>\n");
}
accumulator.append("</ol>");
// Inline the finish operation here which does w3/trusted-types style value tagging.
return new SafeHtml(accumulator.toString());
```
If the embedding language allows associating a function/macro definition with a type, then the *tagMe* could expand to the accumulator form above, but also check for the presence of an analysis macro definition and auto-wrap the tag expression in a call to that macro.
### Type-dependent macros
Consider a tagged string expression:
```html
html"""
"<p>Your answer: ${ given }.</p>
"<p>Correct answer: ${ expected }.</p>
```
If *given* and *expected* are *booleans*, we might want an English-language rendering of this template to produce *yes* and *no*.
But if they are numbers, we might want *given* and *expected* to be formatted according to the numeric conventions for their preferred language.
But some programming languages *erase* the difference between booleans and numbers: PHP, Perl, and C, among others.
When type information is erased, conflated, or otherwise inaccessible at runtime, it can help for content composition code to have access to all the type information available to the compiler, pre-erasure.
Assuming some way of reifying types, representing them as values, any of the *tagMe* expansions above could pass type hints as an extra argument to the *interpolation*/*interpolate* calls to which `${...}` interpolations desugar.
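A sketch of what those hints could look like, assuming a hypothetical reified-type value supplied by the compiler alongside each hole:
```ts
// Hypothetical reified type hint, supplied by the compiler pre-erasure.
interface TypeHint { readonly kind: "boolean" | "number" | "other" }

function interpolateWithHint(value: unknown, hint: TypeHint): string {
  switch (hint.kind) {
    case "boolean":
      return value ? "yes" : "no"; // English-language rendering
    case "number":
      // Locale-aware numeric formatting; the hint made the cast safe.
      return new Intl.NumberFormat("en-US").format(value as number);
    default:
      return String(value);
  }
}
```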
Interpolated expressions tend to be leaves in programs' value graphs, so the set of types is often small but there are at least some scenarios where types can be complex:
- Some tag libraries may want to allow interpolating lists. For example, a SQL template tag might want to allow a sequence in a context like the below:
```ts
sql`WHERE column IN (${ allowedSet })`
```
- To allow streaming, a type dependent macro might want to treat the lambda body as a coroutine, and when a hole expression has type *Promise\<T>*, await the result.
In the latter streaming case, the *tagMe* macro may need to recognize that the template tag expression as a whole has a promise result type.
## Enabling static analysis
It would be nice to have a framework that makes it easy to produce secure tag libraries. The meta-programming parts shouldn't have to be reinvented, and it would be nice to establish once for many libraries that the static optimizations preserve semantics.
This section outlines a framework for many secure composition tasks based on context propagation.
The intuition is that many secure content composition tasks involve picking value transformers based on the set of strings that could precede it on the output content buffer.
| After this fragment of HTML | An interpolation should be converted to |
| ---- | ---- |
| `<a href="` | an HTML-escaped URL prefix |
| `<a href="https://example.com/?foo=` | an HTML-escaped, query parameter value prefix |
| `<img onerror="redirectToErrorPage(` | a JavaScript value that is also HTML escaped |
There are a few things to note:
- Multiple languages might be involved. For example, in the third row, if you're going to escape a string value as a JavaScript string, using double quotes, those double quotes need to be HTML-escaped to `&quot;` because the JavaScript is embedded inside a double-quoted HTML attribute. Indeed, arguably the following circularly embeds HTML inside JavaScript inside a URL inside HTML: `<a href="javascript:otherElement.innerHTML="Hello, `.
- Escapers do not always correspond to grammar productions in a straightforward way. In `sql"SELECT ErrorMessageString FROM ErrorMessageStrings WHERE ErrorCode='USER-ERROR-${ n }-FATAL'"`, arguably any number *n* is going to contribute characters to a *QuotedString* grammar production, but, given that character-sequence productions recurse on only one side (left or right), there's no grammar production for a sequence of characters in the middle of two other substrings.
So we can't just parse the literal parts of a template string with placeholders and get a nice parse tree. We need some solution that can handle multiple languages with escaping and embedding conventions.
Also, epsilon transitions make parsing suck. Consider this case.
```html
html"<a href=${url} class=foo>"
```
If we naïvely interpolate an empty string for *url*, we get:
```html
<a href= class=foo>
```
According to the HTML parsing algorithm, that string is equivalent to (modulo a parsing error on '=' which browsers are required to ignore):
```html
<a href="class=foo">
```
A better output would be `<a href="" class=foo>`. It preserves the expressed structure. `class` appears to be an HTML attribute name and `foo` an attribute value, so we should be deeply suspicious of any `html` tag implementation that escapes a value for *url* in a way that leads the eventual HTML parser to treat *class* as anything other than an attribute name.
A tag implementation has the **structure preservation property** when every developer-controlled token has the same meaning in the composed output regardless of untrusted interpolated values.
This property is not sufficient for security; an html template tag that inlines well-formed script tags still allows attacker-controlled, high-power instructions. But attackers often exploit parser corner cases to slip payloads into high-privilege contexts, and even where structure-preservation failures do not have security consequences, they often correspond to quality-of-service problems.
To deal with epsilon transitions (empty inputs), our HTML template tag accumulator would need to recognize after `<a href=` that it is in a context like *(inside-open-tag:a (attribute-name:href (expecting-attribute)))*. If it sees a double quote next, it could note that the attribute value has started and is double quoted. But if it saw an interpolation, it might add a double quote to its buffer and note, in its parser context, that the quote is implied, so that it knows to add a matching close delimiter when it sees fragment text indicating the attribute name/value pair is complete.
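A toy sketch of that implied-quote bookkeeping, with hypothetical names; a real context would be a full HTML parse state:
```ts
type Ctx = { impliedQuote: boolean };

function onInterpolation(ctx: Ctx, escaped: string): { out: string; ctx: Ctx } {
  // Emit an opening quote ourselves, and remember that we owe a close quote.
  return { out: '"' + escaped, ctx: { impliedQuote: true } };
}

function onFragment(ctx: Ctx, fragment: string): { out: string; ctx: Ctx } {
  // Close the implied quote once developer text ends the attribute value.
  const close = ctx.impliedQuote && /^[\s>]/.test(fragment);
  return { out: (close ? '"' : "") + fragment, ctx: { impliedQuote: false } };
}

// For html"<a href=${url} class=foo>" with url === "":
// onInterpolation → `"`; onFragment(" class=foo>") → `" class=foo>`
// Net output: <a href="" class=foo> — `class` stays an attribute name.
```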
There is another wrinkle: control flow. Paths of execution diverge, which is rarely a problem for static analyzers, but then they have a nasty tendency to smush together again, like unloved bicycles at the bottom of an Amsterdam canal.
```html
<img
{: if (description != null) { :}
alt=${description}
{: } :}
src="${srcUrl}"
>
```
There, the *if* construct has an implied `{: else { /* output no characters */ } :}`. Before the *if*, we're in a parse context where we've just ended a tag name and are ready for attributes. After the *if*, we're in a state where we might potentially have received more attribute content.
To compute a context for parsing the string fragment after the *if* ends, `" src=\""`, we need to *join* two parse contexts. One could parse that string starting in both contexts and check whether they lead to an equivalent context, but left-factoring substrings into multiple branches can lead to a combinatorial explosion in complex control flows.
```html
{: if (isScripty()) { :}
<script>
{: } else { :}
<style>
{: } :}
a > b
{ color: "${c}" }
```
Here, the developer is doing something very silly, and compilers should judge them for it. Arguably, that content after the *if* is syntactically valid, but there is no coherent view of what *c* is. Joining contexts can lead to an error.
In the case of a loop, we need to join a context in which the loop was not entered, with the context after an iteration of the body, and ensure that the marginal iteration leads to a fixed-point context.
```html
<ul>
{: for (let item of items) { :}
<li>${ item }</li>
{: } :}
</ul>
```
In the case where *items* is empty, we process `"</ul>"` in a context where a `<ul>` tag has just been completed. In the case where the body iterates once, we see `" <li>"` in that same context. In the case where the loop re-enters, we see `" <li>"` in a context where the previous `</li>` tag was just completed. In both cases, the HTML open-tag stack is the same, and we're expecting the same kind of general tag soup to follow.
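The fixed-point computation can be sketched generically; names are hypothetical, and *join* and the body's context-transfer function come from the tag library:
```ts
function loopFixedPoint<C>(
  before: C,                      // context if the loop never runs
  bodyTransfer: (entry: C) => C,  // context after one iteration of the body
  join: (a: C, b: C) => C,
  equal: (a: C, b: C) => boolean,
): C {
  let ctx = before;
  for (;;) {
    const next = join(ctx, bodyTransfer(ctx));
    if (equal(next, ctx)) return ctx; // the marginal iteration changed nothing
    ctx = next;
  }
}
```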
We've talked about *context* but not defined it. It really is very specific to the language being composed. In the case of an *html* template tag library, it's the embedding union of (HTML, CSS, JS, and URLs). In the case of a *sql* template tag library it's some specific dialect of the SQL language family. So tag libraries need freedom to define contexts in implementation and goal dependent ways. But a parsing context can be defined based on the operations performed on it:
- *propagateOver(before: Context, trustedFragment: String) → { after: Context, adjustedFragment: String }* ; parses a trusted fragment string to compute a pair consisting of the context after that string is appended to a content buffer, and the fragment that should actually be appended to the buffer (see ε-transitions and implicit quotes above)
- *pickValueTransform(before: Context) → { transform: 'a → String, after: Context }* ; given a context, pick a transformation (escaper, filter, whatever) that produces a trustworthy string, and note the context after that transformer's output is appended to the content buffer.
- *join(context: Context list) → Context* ; joins multiple contexts
For an implementation of a tag library that executes purely at runtime, the *join* operator isn't necessary: all you're seeing is a sequence of fragments and interpolated values.
If you're doing static optimizations so that you can *pickValueTransform*s early to effectively inline the *interpolate* calls, and avoid having to apply *propagateOver* at runtime, it helps to be able to answer some additional questions (see the interface sketch after this list):
- What is the start context for an accumulator type?
- Is a given context an error context and if so, what error message if any should be propagated to the developer?
- Is a given context a valid end context?
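One way to package these operations is as an interface a tag-library framework could require; a sketch with hypothetical names, where *Context* is library-defined:
```ts
interface ContextOps<Context, Value> {
  propagateOver(before: Context, trustedFragment: string):
    { after: Context; adjustedFragment: string };
  pickValueTransform(before: Context):
    { transform: (value: Value) => string; after: Context };
  join(contexts: Context[]): Context;
  // Extra hooks that enable the static optimizations described above:
  startContext(): Context;
  errorMessage(context: Context): string | null; // non-null for error contexts
  isValidEnd(context: Context): boolean;
}
```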
TODO: Monomorphization to solve disjoint context propagation. Aliasing. Difference between interpolate call and statement call.
### Transparency in analysis
TODO: Code quality and security scanners look for recognizable escaping functions. Macros that statically pick escapers can use those functions so that the code seen by the scanner is recognized as safe, making it easier for existing codebases that have invested in getting these scanners to work to adopt these approaches incrementally.
## Related Work
[JavaScript template literals][template literals] were designed with just such a separation in mind.
Template literals have not received widespread adoption for secure composition, in part because I was not an effective developer advocate outside Google. Also, the inability to embed control flow made complex tag scenarios like safe HTML generation problematic; a template might start simple, but as more complex logic was needed, developers might need to abandon tagged strings for a more complex approach.
*Contextual Auto-Escaping Templates* (from ["Secure by Design: Google's Blueprint for a High-Assurance Web Framework"][secure-by-design]) were more successful; they serve as the primary XSS defense for Gmail and Google's other high-profile properties, with a stellar record. But they require integrating a domain-specific language (DSL) like Closure Templates into the application's build and deployment system.
This proposal is best seen as an attempt to combine the strengths of both:
1. The presence of tagged strings alongside the code they affect, allowing developers to read the template in-situ, and allowing code-generating agents to generate templates near where they are used or specified.
2. The ability to embed flow control, allowing simple templates to scale and adapt to changing requirements without reworking into a more complex tool, or, worse, developers downgrading to ad-hoc string composition.
[template literals]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals
[secure-by-design]: https://bughunters.google.com/blog/6644316274294784/secure-by-design-google-s-blueprint-for-a-high-assurance-web-framework
[contextual-escaping-sec-model]: https://pkg.go.dev/html/template#hdr-Security_Model