# String Building

This document explores whether there is value in distinguishing between different partially built string representations.

There seem to be several aspects to string building:

1. *Decomposability*: the ability to decompose string composition into simpler operations
2. *Delegation*: the ability to delegate composition of a substring to other functions in other modules
3. *Introspection*: the ability to inspect the partially built string, for example: if the last character (if any) is not a space, append a space and then append the rest.
4. *Transparency*: is it obvious to a code reader what's being appended?
5. *Interopiness*: does our translation connect types well? (see I/O interop below)

In Java, there are several types that help to accumulate code-units. I'll use these to examine the dimensions above.

```mermaid
flowchart TD
  StringBuilder --> CharSequence
  StringBuilder --> Appendable
  StringWriter --> Appendable
  StringWriter --> Writer
  ByteArrayOutputStream --> OutputStream
  PrintStream --> Appendable
  PrintStream --> OutputStream
  CharSequence["CharSequence\n(Readable)"]
  OutputStream["OutputStream\n(Byte sink, I/O)"]
  Appendable["Appendable\n(UTF-16 sink)"]
  Writer["Writer\n(UTF-16 sink, I/O)"]
```

- `new StringBuilder()` is an *Appendable* and a *CharSequence* but does not inter-operate well with `java.io`.
- `new StringWriter()` is an *Appendable* but not a *CharSequence*, and inter-operates with `java.io` by extending *Writer*.
- `java.io.PrintStream` blurs the distinction between byte sinks and UTF-16 character sinks. It inter-operates with `java.io` as an *OutputStream*.
- Rarely, `new ByteArrayOutputStream()` can accumulate UTF-8 octets and is a `java.io` *OutputStream*. That can be handy when composing a string from a mix of sources (strings, files, URLs, process stdout).
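To make the delegation and introspection aspects concrete for the Java types above, here is a minimal sketch. The class and helper names (`StringBuildingSketch`, `appendGreeting`, `appendWord`) are made up for illustration. The point is only that a helper that needs a sink can accept any *Appendable*, while a helper that needs to look at the last character has to ask for a *CharSequence* as well.

```java
import java.io.IOException;
import java.io.StringWriter;

final class StringBuildingSketch {
    // Delegation: the helper only needs a sink, so any Appendable works
    // (StringBuilder, StringWriter, PrintStream, ...).
    static void appendGreeting(Appendable out, String name) throws IOException {
        out.append("Hello, ").append(name);
    }

    // Introspection: the helper reads what has been built so far,
    // so it needs a type that is both Appendable and CharSequence.
    static <T extends Appendable & CharSequence> void appendWord(T out, String word)
            throws IOException {
        int n = out.length();
        if (n > 0 && out.charAt(n - 1) != ' ') {
            out.append(' ');
        }
        out.append(word);
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        appendGreeting(sb, "World");
        appendWord(sb, "again");
        System.out.println(sb); // Hello, World again

        StringWriter sw = new StringWriter();
        appendGreeting(sw, "World");   // delegation works with a Writer
        // appendWord(sw, "again");    // does not compile: StringWriter is not a CharSequence
        System.out.println(sw); // Hello, World
    }
}
```

The commented-out call is the interesting part: the compiler rejects it because *StringWriter* is a sink but not readable.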

In addition to those types, there are common APIs:

- The static method *String.format* and the instance method *String.formatted* do *sprintf*-like substitution based on positional parameters.
- *MessageFormat* similarly combines positional parameters into a template string using a different syntax. It has affordances for locale-aware interpolation, which make it a special case, but it is also used for simple concatenation that does not need those semantics, so I don't treat it as distinct from *String.format* for the purposes of this document.

| / | Decomposability | Delegation | Introspection | Transparency | Interopiness |
| -- | -- | -- | -- | -- | -- |
| *StringBuilder* | &check; | &check; | &check; | &check; | |
| *StringWriter* | &check; | &check; | | &check; | &check; |
| *PrintStream* | &check; | &check; | | &check; | &check; |
| *ByteArrayOutputStream* | &check; | &check; | | | ~ |
| *String.format* | &check; | | | &check; | |

*StringBuilder* acts as a *string-like* and allows delegation. *StringWriter* and friends like *PrintStream* do not allow introspection but do allow delegation.

In Java, we could craft a *StringWriter*-like API that allows introspection via our own type that `extends Writer implements CharSequence`, but that would require Java APIs to be aware of our type, which would be super awkward.

## Non-appending mutations

*StringBuilder* allows for (not super efficient) insertion and replacement by random-access index.

TODO: use cases for not-at-end reading and mutation.

Ben: I did a GitHub search... I mostly seem to see these in [solutions to homework](https://github.com/doocs/leetcode).

* Concatenation.
  * ```cs
    // Prepends (recursion - 1) copies of `spaces` to `trail`.
    indent = new StringBuilder(trail).Insert(0, spaces, recursion - 1).ToString();
    ```
* Build a string in reverse.
  * ```cs
    // Pops path segments off a stack and prepends each one.
    var sb = new StringBuilder();
    while (stk.Count > 0) {
        sb.Insert(0, "/" + stk.Pop());
    }
    return sb.Length == 0 ? "/" : sb.ToString();
    ```
* Reverse in place.
  * ```java
    // Swaps characters from both ends, moving inward.
    while (start < end) {
        char temp = sb.charAt(start);
        sb.setCharAt(start, sb.charAt(end));
        sb.setCharAt(end, temp);
        start++;
        end--;
    }
    ```
* Something palindromey.
  * ```java
    // Appends i followed by its mirror image, dropping one digit when l is odd.
    StringBuilder sb = new StringBuilder();
    sb.append(i);
    sb.append(new StringBuilder(i + "").reverse().substring(l & 1));
    res.add(Long.parseLong(sb.toString()));
    ```

The only real-life use case seems to be `prepend()` for stringifying some data structures that are naturally reversed, like `SomeNode.parent` fields. We can have a cheap `prepend()` method with a custom string builder class if it's a growable ring buffer. Java's `.insert(0, x)` will shift each time, but C# seems to have a linked list of chunks, so `.Insert(0, x)` would be performant. "Make a list of strings, reverse them, and concatenate" is not a terrible way to handle it and is reasonably idiomatic.

Looking at common String methods, Java's StringBuilder doesn't implement some obvious conveniences like `.trim()`. C#'s StringBuilder does let you do `.Replace('!', '?')`, which might be a handy finishing operation.

It doesn't seem like established languages have found a lot of compelling use cases for internal mutation.

## Conclusion

It seems like the main distinction between partially built strings is whether or not the string is readable.

We might preserve freedom for I/O interop if we have a default *CharSink* that is not readable and a *ReadableCharSink* that must preserve enough information to allow some level of introspection, possibly at the cost of connecting to a type that does not allow for easy I/O interop with backend types.

```ts
class StringBuilder {
  public append(suffix: String): Void;
  public toString(): String;
}

class MutableString {
  public append(suffix: String): Void;
  public toString(): String;
}
```

## Aside on positional interpolation markers vs interstitial expressions

```python3
greeting = "Hello"
audience = "World"
mood = "!"

# Positional
"%s, %s%s" % (greeting, audience, mood)

# Interstitial
f"{greeting}, {audience}{mood}"
```

In the above, to understand the sprintf string you have to look at the `%s` and then look over to the expression list, and then up to the definition of the value. That's a lot of back and forth.

![sprintf-scanning](https://hackmd.io/_uploads/rkcpKt6ZA.png)

In the interstitial expression case, the scanning is simpler. And if you understand the expressions already, then you can scan it linearly.

![interstitial-scanning](https://hackmd.io/_uploads/rkgAtF6-A.png)
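
The same contrast can be sketched in Java, which is where the rest of this document draws its examples. This is only an illustration: the class and method names are made up, and chained `append` calls stand in for interstitial expressions (my choice of analogue, not something the Python example implies).

```java
final class InterpolationSketch {
    // Positional: read the template, then match each %s to an argument by position.
    static String positional(String greeting, String audience, String mood) {
        return String.format("%s, %s%s", greeting, audience, mood);
    }

    // Interstitial-like: each value appears where it lands in the output,
    // so the code reads left to right in the same order as the result.
    static String interstitial(String greeting, String audience, String mood) {
        return new StringBuilder()
            .append(greeting).append(", ").append(audience).append(mood)
            .toString();
    }

    public static void main(String[] args) {
        System.out.println(positional("Hello", "World", "!"));   // Hello, World!
        System.out.println(interstitial("Hello", "World", "!")); // Hello, World!
    }
}
```

Both produce the same string; the difference is only in how much back-and-forth scanning the reader has to do.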