# Strings

## Considerations:

- Most commonly used strings are record columns, which are usually very short. Small string optimization can make a significant change.
- We often use strings that are known at compile time (`&'static str`), but construct `String`s to pass them around.
- Using `Cow<'static, str>` or a similar type with borrowed/owned variants would be helpful.
- Not all common strings will fit inside SSO limits. Ref-counted strings should be considered.

## Contenders

### Byteyarn

- `size_of::<Yarn>() == size_of::<[usize; 2]>()`
- Cow-like, can be owned or borrowed
- Small string optimization, it can contain strings up to 15 bytes inline
- `Option<Yarn>` is the same size as `Yarn`

[docs.rs](<https://docs.rs/byteyarn/latest/byteyarn/>) | Blogpost: [I Wrote A String Type · mcyoung](<https://mcyoung.xyz/2023/08/09/yarns/>)

### ecow

- Written by typst devs for typst!
- `ecow::EcoString`:
  - reference-counted
  - clone-on-write
  - inline storage up to 15 bytes
  - `size_of::<EcoString>() == size_of::<[usize; 2]>()`
- `ecow::EcoVec`
  - reference-counted
  - clone-on-write
  - `size_of::<EcoVec>() == size_of::<[usize; 2]>()`
  - [Use `ecow::EcoVec` for Record internals _#12624_](<https://github.com/nushell/nushell/pull/12624>)
  - [Add `List` type _#15363_](<https://github.com/nushell/nushell/pull/15363>)

### LeanString

- `size_of::<LeanString>() == size_of::<[usize; 2]>()` (2 words).
  - one `usize` smaller than `String`.
- Stores up to **16 bytes** inline (on the stack).
  - Strings larger than 16 bytes are stored on the heap.
- Clone-on-Write (CoW)
  - `LeanString` uses a reference-counted heap buffer (like `Arc`).
  - When a `LeanString` is cloned, the heap buffer is shared.
  - When a `LeanString` is mutated, the heap buffer is copied if it is shared.
- `O(1)`, *zero allocation* construction from `&'static str`.
- Niche optimized for `Option<LeanString>`.
  - `size_of::<Option<LeanString>>() == size_of::<LeanString>()`
- `lean_string` itself is a recent and obscure crate.
  It's inspired by `compact_str` and `ecow` and uses similar techniques. While it sounds too good to be true, miri does not complain while running its test suite.

[docs](https://docs.rs/lean_string/latest/lean_string/)

# Lists, Records and Tables

- We need encapsulation of types to make it easier to iterate on and improve the underlying implementation.
- The vast majority of `Record`s use the same column names. And while the column names themselves will benefit from string-related optimizations, we will still need to store copies of them in `Vec`s.

Splitting records into keys and values would allow us to easily reuse the list of keys for other records.

```rust
// Before: each record stores its keys and values together.
pub struct Record {
    inner: Vec<(String, Value)>,
}

// After: keys are split out so the same list can be shared between records.
pub struct Record {
    keys: EcoVec<String>,
    values: Vec<Value>,
}
```

We could go further and define a `Table` type:

```rust
pub struct Table {
    columns: EcoVec<String>,
    values: Vec<Vec<Value>>,
}

impl Table {
    pub fn into_iter(self) -> impl Iterator<Item = Record> {
        let Table { columns, values } = self;
        values.into_iter().map(move |row| Record {
            keys: columns.clone(), // basically free to clone: only bumps a reference count
            values: row,
        })
    }
}
```

This means some commands that can't guarantee returning tables (like `each`) will return lists of records.
However, these `list<record>` values can be converted back to tables with commands like `select`, which guarantee identical columns for all rows.

# Composite Types With Complex Owned/Borrowed Relationships

TODO: discord has a lot of ranting about this

### Options

- Roll our own stuff: `NuBorrow`, `NuToOwned`, `NuCow`, etc
- This crate which I haven't actually looked into deeply: https://docs.rs/borrow-framework/latest/borrow_framework/
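
For the first option, here is a rough sketch of what hand-rolled traits might look like. The assumed motivation is that `std::borrow::Cow<'a, B>` always borrows as `&'a B`, whereas the borrowed form of a composite type is often a struct of references. Only the names `NuToOwned` and `NuCow` come from the list above; their signatures, `RecordRef`, and the simplified `Record` (with `i64` standing in for `Value`) are hypothetical and not an existing nushell API.

```rust
/// Hypothetical trait: a borrowed "view" type says how it becomes owned.
/// Unlike `std::borrow::ToOwned`, the borrowed form is a value that holds
/// references, not a `&Self::Owned`.
pub trait NuToOwned {
    type Owned;
    fn nu_to_owned(&self) -> Self::Owned;
}

/// Owned composite (simplified stand-in; `i64` replaces `Value`).
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub keys: Vec<String>,
    pub values: Vec<i64>,
}

/// Borrowed composite: a struct of slices rather than `&Record`.
#[derive(Debug, Clone, Copy)]
pub struct RecordRef<'a> {
    pub keys: &'a [String],
    pub values: &'a [i64],
}

impl<'a> NuToOwned for RecordRef<'a> {
    type Owned = Record;

    fn nu_to_owned(&self) -> Record {
        Record {
            keys: self.keys.to_vec(),
            values: self.values.to_vec(),
        }
    }
}

/// Hypothetical Cow-like enum over a borrowed view and its owned counterpart.
pub enum NuCow<B: NuToOwned> {
    Borrowed(B),
    Owned(B::Owned),
}

impl<B: NuToOwned> NuCow<B> {
    /// Returns the owned value, copying only if it was borrowed.
    pub fn into_owned(self) -> B::Owned {
        match self {
            NuCow::Borrowed(view) => view.nu_to_owned(),
            NuCow::Owned(owned) => owned,
        }
    }

    pub fn is_borrowed(&self) -> bool {
        matches!(self, NuCow::Borrowed(_))
    }
}
```

With this shape, `NuCow<RecordRef<'_>>` can carry either a borrowed view of the keys and values or a full `Record`, and callers only pay for a copy when they call `into_owned` on the borrowed variant.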