# Strings
## Considerations:
- The strings we use most often are record column names, which are usually very short. Small string optimization (SSO) can make a significant difference.
- We often use strings that are known at compile time (`&'static str`), but construct `String`s to pass them around.
- Using `Cow<'static, str>` or a similar type with borrowed/owned variants would be helpful.
- Not all common strings will fit inside SSO limits. Ref-counted strings should be considered.
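The borrowed/owned and ref-counted options above can be sketched with the standard library alone. This is only an illustration: `ColumnName`, `static_name`, `runtime_name` and `shared_name` are hypothetical names, not existing nushell APIs, and none of the contender crates are used.

```rust
use std::borrow::Cow;
use std::sync::Arc;

/// Hypothetical column-name type: borrowed when the name is known at
/// compile time, owned when it is built at runtime.
pub type ColumnName = Cow<'static, str>;

/// Names known at compile time are stored as a borrow, with no allocation.
pub fn static_name(name: &'static str) -> ColumnName {
    Cow::Borrowed(name)
}

/// Names built at runtime have to own their buffer.
pub fn runtime_name(prefix: &str, index: usize) -> ColumnName {
    Cow::Owned(format!("{prefix}{index}"))
}

/// Ref-counted alternative for strings that are too long for an inline
/// buffer: cloning bumps a counter instead of copying the bytes.
pub fn shared_name(name: &str) -> Arc<str> {
    Arc::from(name)
}
```

Note that plain `Cow<'static, str>` has no inline storage, so it only helps with the compile-time case; the crates below combine several of these ideas in one type.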
## Contenders
### Byteyarn
- `size_of::<Yarn>() == size_of::<[usize; 2]>()`
- Cow-like, can be owned or borrowed
- Small string optimization, it can contain strings up to 15 bytes inline
- `Option<Yarn>` is the same size as `Yarn`
[docs.rs](<https://docs.rs/byteyarn/latest/byteyarn/>) | Blogpost: [I Wrote A String Type · mcyoung](<https://mcyoung.xyz/2023/08/09/yarns/>)
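To make the inline-storage idea concrete, here is a deliberately naive sketch of small string optimization. `SmallStr` and `INLINE_CAP` are hypothetical and this is not how byteyarn is implemented: with a separate enum discriminant this layout does not fit in two words, which is exactly the part the real crates solve with tighter bit-packing.

```rust
/// Maximum number of bytes stored inline (matches the 15-byte limit above).
pub const INLINE_CAP: usize = 15;

/// Hypothetical small-string type: short strings live inside the value,
/// longer ones go to the heap.
#[derive(Clone, Debug)]
pub enum SmallStr {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(Box<str>),
}

impl SmallStr {
    pub fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            // Copy the bytes into the fixed-size buffer; no allocation.
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallStr::Inline { len: s.len() as u8, buf }
        } else {
            SmallStr::Heap(s.into())
        }
    }

    pub fn as_str(&self) -> &str {
        match self {
            SmallStr::Inline { len, buf } => std::str::from_utf8(&buf[..*len as usize])
                .expect("inline bytes were copied from a valid &str"),
            SmallStr::Heap(s) => &**s,
        }
    }

    pub fn is_inline(&self) -> bool {
        matches!(self, SmallStr::Inline { .. })
    }
}
```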
### ecow
- Written by typst devs for typst!
- `ecow::EcoString`:
  - reference-counted
  - clone-on-write
  - inline storage up to 15 bytes
  - `size_of::<EcoString>() == size_of::<[usize; 2]>()`
- `ecow::EcoVec`:
  - reference-counted
  - clone-on-write
  - `size_of::<EcoVec<T>>() == size_of::<[usize; 2]>()`
- [Use `ecow::EcoVec` for Record internals _#12624_](<https://github.com/nushell/nushell/pull/12624>)
- [Add `List` type _#15363_](<https://github.com/nushell/nushell/pull/15363>)
### LeanString
- `size_of::<LeanString>() == size_of::<[usize; 2]>()` (2 words).
  - one `usize` smaller than `String`.
- Stores up to **16 bytes** inline (on the stack).
  - Strings larger than 16 bytes are stored on the heap.
- Clone-on-Write (CoW)
  - `LeanString` uses a reference-counted heap buffer (like `Arc`).
  - When a `LeanString` is cloned, the heap buffer is shared.
  - When a `LeanString` is mutated, the heap buffer is copied if it is shared.
- `O(1)`, *zero allocation* construction from `&'static str`.
- Niche optimized for `Option<LeanString>`.
  - `size_of::<Option<LeanString>>() == size_of::<LeanString>()`
- `lean_string` itself is a recent and little-known crate. It is inspired by `compact_str` and `ecow` and uses similar techniques.
  While it sounds too good to be true, Miri does not complain when running its test suite.
[docs](https://docs.rs/lean_string/latest/lean_string/)
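The clone-on-write behaviour listed for `EcoString` and `LeanString` can be approximated with `std::sync::Arc`. The sketch below is an assumption-laden stand-in, not either crate's implementation: `SharedString` is a hypothetical type with no inline storage, and it relies on `Arc::make_mut` to copy the buffer only when it is shared.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a ref-counted, clone-on-write string.
/// Unlike the crates above it has no inline storage: it is a single
/// pointer to a heap-allocated `String`.
#[derive(Clone, Debug, PartialEq)]
pub struct SharedString(Arc<String>);

impl SharedString {
    pub fn new(s: &str) -> Self {
        SharedString(Arc::new(s.to_owned()))
    }

    /// Appends to the string. `Arc::make_mut` copies the buffer only if
    /// another handle still points at it; otherwise it mutates in place.
    pub fn push_str(&mut self, s: &str) {
        Arc::make_mut(&mut self.0).push_str(s);
    }

    pub fn as_str(&self) -> &str {
        self.0.as_str()
    }

    /// True when both handles point at the same heap buffer.
    pub fn shares_buffer_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}
```

Cloning a `SharedString` shares the buffer; the first `push_str` on a shared handle detaches it and leaves the other handle untouched.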
# Lists, Records and Tables
- We need encapsulation of types to make it easier to iterate on and improve the underlying implementation.
- The vast majority of `Record`s use the same column names. While the column names themselves will benefit from string-related optimizations, we would still need to store copies of them in `Vec`s.
  Splitting records into keys and values would allow us to easily reuse the list of keys for other records.
```rust
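// Current layout: every record stores its own copy of each column name.
pub struct Record {
    inner: Vec<(String, Value)>,
}

// Proposed layout: keys and values are split, so the key list can be shared.
pub struct Record {
    keys: EcoVec<String>,
    values: Vec<Value>,
}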
```
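As a rough, compilable illustration of the split layout, the sketch below uses `Arc<[String]>` from the standard library as a stand-in for `EcoVec<String>` and a two-variant `Value` enum in place of nushell's real `Value`. The `new`, `get` and `shares_keys_with` methods are hypothetical.

```rust
use std::sync::Arc;

/// Simplified stand-in for nushell's `Value`.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Int(i64),
    String(String),
}

/// Record with split storage. `Arc<[String]>` stands in for
/// `EcoVec<String>`: cloning it only bumps a reference count.
#[derive(Clone, Debug)]
pub struct Record {
    keys: Arc<[String]>,
    values: Vec<Value>,
}

impl Record {
    /// Returns `None` when the number of keys and values differ.
    pub fn new(keys: Arc<[String]>, values: Vec<Value>) -> Option<Self> {
        (keys.len() == values.len()).then(|| Record { keys, values })
    }

    /// Linear lookup: find the key's index, then read the same slot in `values`.
    pub fn get(&self, key: &str) -> Option<&Value> {
        let index = self.keys.iter().position(|k| k.as_str() == key)?;
        self.values.get(index)
    }

    /// True when both records point at the same key list allocation.
    pub fn shares_keys_with(&self, other: &Record) -> bool {
        Arc::ptr_eq(&self.keys, &other.keys)
    }
}
```

The cost of this layout is that keys and values must be kept the same length by the type's own methods, which is one more reason to encapsulate `Record` behind an API.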
We could go further and define a `Table` type:
```rust
pub struct Table {
    columns: EcoVec<String>,
    values: Vec<Vec<Value>>,
}

impl Table {
    pub fn into_iter(self) -> impl Iterator<Item = Record> {
        // Destructure so the closure can take ownership of `columns`.
        let Table { columns, values } = self;
        values.into_iter().map(move |row| Record {
            keys: columns.clone(), // basically free to clone
            values: row,
        })
    }
}
```
This means some commands that can't guarantee returning tables (like `each`) will return lists of records. However, these `list<record>` values can be converted back to tables with commands like `select` which guarantee identical columns for all rows.
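The conversion from `list<record>` back to a table could look roughly like the sketch below. It is only one possible shape: `try_into_table` is a hypothetical helper, `Arc<[String]>` again stands in for `EcoVec<String>`, and `Value` is reduced to a single variant. Records that already share one key list pass a cheap pointer check; otherwise the key lists are compared element by element.

```rust
use std::sync::Arc;

/// Simplified stand-in for nushell's `Value`.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Int(i64),
}

#[derive(Clone, Debug)]
pub struct Record {
    pub keys: Arc<[String]>,
    pub values: Vec<Value>,
}

#[derive(Clone, Debug)]
pub struct Table {
    pub columns: Arc<[String]>,
    pub rows: Vec<Vec<Value>>,
}

/// Converts a list of records into a table when every record has the
/// same columns in the same order. On failure the list is handed back.
pub fn try_into_table(records: Vec<Record>) -> Result<Table, Vec<Record>> {
    if records.is_empty() {
        // An empty list carries no column information, so in this
        // sketch it stays a list.
        return Err(records);
    }
    let columns = Arc::clone(&records[0].keys);
    let uniform = records
        .iter()
        // Pointer equality is the fast path for records that share a key list.
        .all(|r| Arc::ptr_eq(&r.keys, &columns) || r.keys == columns);
    if !uniform {
        return Err(records);
    }
    let rows = records.into_iter().map(|r| r.values).collect();
    Ok(Table { columns, rows })
}
```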
# Composite Types With Complex Owned/Borrowed Relationships
TODO: discord has a lot of ranting about this
### Options
- Roll our own stuff: `NuBorrow`, `NuToOwned`, `NuCow`, etc
- This crate which I haven't actually looked into deeply:
https://docs.rs/borrow-framework/latest/borrow_framework/