---
tags: [strings, proposal, research]
---

# String literals and templates

It's important to allow users to construct arbitrary, valid strings and, if we want to encourage a rich tag library, to allow passing arbitrary, valid substrings to tags while maintaining interpolation boundaries.

This is an attempt to capture problems with multi-line string literals in other languages and how we might handle them in Temper.

I think it's important that:

- Untagged single-line strings work mostly as people expect.
- We use a familiar interpolation syntax.
- Whether a string is single-line or multi-line has little effect on its processing.

## Indentation

Languages like Kotlin allow indenting multi-line strings along with the surrounding code, and provide methods that remove the indentation afterwards.

```kotlin=
print(
    """
    Hello,
    $world
    """.trimIndent()
)
print(
    """
    |Hello,
    |$world
    """.trimMargin()
)
```

but if `world` contains line breaks, the `trim*` methods cannot distinguish line breaks in the template from line breaks in the interpolation. This makes them error-prone.

### Possible solution

We use the close quotes as an indication of how much indentation to remove from the string. The run of spaces (U+20) and tabs (U+9) before the close quote must appear on all non-empty lines in the string literal and is removed. It's a lexical error otherwise.

```typescript=
// The position of the open quote
// is determined by the
// larger expression it appears in,
// so we do not use it as a cue
let s = "
    line 1
    line 2
    ";
// Close quote is indented 4 spaces,
// so we require and remove 4 spaces
// from each non-empty line.
```

Since this pre-processing happens during lexing, re-indenting code can move string literals along with their surrounding expressions without affecting the semantic content of the strings.

## Wrapping

Some organizations have strict, automatically enforced line length limits. Long strings, like URLs, can break these limits, requiring that a string literal be wrapped across multiple lines.
Languages commonly support this with concatenation in

```java=
// Java
String url = "https://verylongdomain.com/"
    + "path/to/file/that/is/deep/"
    + "what,8dot3-not-good-enough-for.you"
    + "?etc=etc&etc#etc";
```

or adjacency in

```c=
// C
char* url = "https://verylongdomain.com/"
    "path/to/file/etc/etc";
```

When a string template is tagged, wrapping via a concatenation operator may not be semantics-preserving, because the tag applies only to the first literal.

```
tag"foo" + "bar"
```

### Possible solution

If an interpolation without any non-space tokens, `${}`, were a no-op, then it would provide a way to wrap.

```typescript=
url"https://verylongdomain.com${
  }/path/to/etc/etc${
  }#etc"
```

## Tags that need raw text

String templates might need the raw text, before escape sequences are decoded. For example,

```kotlin=
let s = json"""
  [
    "foo\n",
    "${bar}"
  ]
  """;
// \n is significant to the json tag.
```

### Escaping quoting chars

As above, JSON uses `"` as a quoting character. So does Temper. Reading code in nested languages that each require frequent escaping is confusing and error-prone.

#### Possible solution

We can allow 3 or more double quotes to establish the number of close quotes required to close the string. This should be familiar to Python, Kotlin, and Markdown users.

Allowing 2 double quotes to establish a custom string boundary would surprise users who write the empty string `""`.

### Escaping interpolation markers

But we might need to pass a string with literal substrings like `${` to a tag.

### Possible solution - inlining nested strings

When an interpolation is a string literal, unparenthesized and untagged, inline its content after decoding.

```
tag"foo ${"\""} bar"
// is exactly equivalent to
tag"""foo " bar"""
// regardless of the semantics of tag
// and
tag"foo${"$"}{foo} bar"
// allows the substring ${foo} to reach
// the tag
```

Alternatively, we can require a syntactic "inline this string literal" marker to distinguish string literals whose decoded content is inlined into the raw string from string literals passed as ordinary interpolated values.
```typescript=
tag"foo ${*"\$"} bar"
```

This seems syntactically heavy. The nice thing, though, is that a marker for inlining chars into the raw string could be extended to inline named constant strings.

Alternatively or additionally, if `${}` is a no-op in a string template, then the below sends the literal text "`foo ${foo} bar`" to `tag` with no interpolations.

```
tag"foo $${}{foo} bar"
```

### Possible solution - C# style extra '$'s

[C#](https://devblogs.microsoft.com/dotnet/csharp-11-preview-updates/#raw-string-literals) lets a prefix on the opening quotes control interpolation.

> The number of `$` that prefixes the string is the number of curly brackets that are required to indicate a nested code expression.

```c#
String s = $$"""
    foo ${notInterpolated} bar ${{interpolated}}
    """;
```

How this would interact with tags is unclear. Perhaps

```typescript
tag $$"""
    foo ${notInterpolated} bar ${{interpolated}}
    """
```

An alternative to controlling the number of curlies could be to control the number of dollar signs. But controlling the number of things on close (whether curlies or dollars or ...) supports embedded control blocks better than controlling only leading escapes.

Also, C# doesn't do interpolation by default, and it doesn't use dollars inside the string literals themselves, only curlies. Should we consider full C# style by requiring `$` before quotes and using only `{}` inside strings, without dollars?

```c#
String s1 = "foo {notInterpolated}";
String s2 = $"foo {interpolated}";
String s3 = $$"""
    foo {notInterpolated} bar {{interpolated}}
    """;
```

## Trailing whitespace

Trailing whitespace in strings is significant. An interpolation with no non-space content is ignored, so it can be used to make trailing whitespace explicit.

```typescript=
let s = "
    Line 1 with trailing space ${}
    ";
```

`${}` is different from `${""}` because if there were a tag, the tag would receive the expression `""` as an interpolated value, but the `${}` is a no-op.
It interrupts two chunks of character data, but does not break them from the tag's point of view.

## Comments in templates

Long strings might do well with comments. Comments here would count as whitespace, so comment-only interpolations would be no-ops.

```ts
let s = """
    Here is some ${/* commented */}text.
    ${// And here is another comment.
    }And more text.
    """
```

It might be nice to have a simpler full-line comment syntax such as the following:

```ts
let s = """
    Here is some text.
    $// And here is a comment.
    And more text.
    """
```

However, this now requires escaping too:

```ts
let s = """
    Here is some text.
    $${}// And here is also text.
    """
```

If we support C#-style bonus dollars, maybe this could be:

```ts
let s = $$"""
    Here is some text.
    $// And here is also text.
    $/// This is a comment.
    """
```

Or something like that.
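The no-op rule for empty and comment-only interpolations can be pictured as a small normalization pass over a lexed template. What follows is a minimal TypeScript sketch under stated assumptions, not Temper's implementation: the `Part` type and the `isNoOp` and `mergeNoOps` functions are hypothetical names, and the sketch assumes that the lexer hands over each interpolation's source text and that comments use `//` and `/* */`.

```typescript
// Hypothetical model of a lexed template: literal chunks alternating with
// the source text of each interpolation. None of these names come from Temper.
type Part =
  | { kind: "chunk"; text: string }
  | { kind: "interp"; source: string };

// True when the interpolation source holds only whitespace and comments.
// A single left-to-right regex pass removes comments; anything left over
// (an identifier, a string literal, an unterminated comment) means the
// interpolation carries a real expression.
function isNoOp(source: string): boolean {
  const withoutComments = source.replace(/\/\*[\s\S]*?\*\/|\/\/[^\n]*/g, "");
  return withoutComments.trim() === "";
}

// Drops no-op interpolations and merges the chunks on either side, so a tag
// would see one uninterrupted run of character data.
function mergeNoOps(parts: Part[]): Part[] {
  const out: Part[] = [];
  for (const part of parts) {
    if (part.kind === "interp" && isNoOp(part.source)) {
      continue; // `${}` or `${/* comment */}`: contributes nothing
    }
    const last = out[out.length - 1];
    if (part.kind === "chunk" && last !== undefined && last.kind === "chunk") {
      out[out.length - 1] = { kind: "chunk", text: last.text + part.text };
    } else {
      out.push(part);
    }
  }
  return out;
}

// Usage: the comment-only and empty interpolations vanish, while `${""}`
// stays as a real interpolated value between two chunks.
const example: Part[] = [
  { kind: "chunk", text: "Here is some " },
  { kind: "interp", source: "/* commented */" },
  { kind: "chunk", text: "text. " },
  { kind: "interp", source: "" },
  { kind: "chunk", text: "\n" },
  { kind: "interp", source: '""' },
  { kind: "chunk", text: "end" },
];
console.log(JSON.stringify(mergeNoOps(example)));
```

Doing this merge before a tag ever sees the parts keeps the distinction drawn above: `${}` leaves the surrounding character data as a single chunk, while `${""}` still arrives as an interpolated value.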