# Nushell core team meeting 2023-01-18

## Attendees

- Darren
- Reilly
- JT
- Michael
- Andres
- Stefan
- Keith
- Jakub

## Agenda

- Further discussion on a new nushell config system. Darren's attempt to spawn ideas fell short here: https://hackmd.io/NfBoTWUeQhOTXoeKra437A

## PRs

- add dedicated const in pipeline, const builtin var errors https://github.com/nushell/nushell/pull/7784
- convert `SyntaxShape::Table` into the corresponding `Type` https://github.com/nushell/nushell/pull/7781
- Remove deprecated `--numbered` flag from five commands https://github.com/nushell/nushell/pull/7777
- Fixes shell crashing because alias name is shorter than alias command and there are pipes present (fixes issue 7754) https://github.com/nushell/nushell/pull/7756
- `str length`, `str substring`, `str index-of` and `split chars` now use graphemes instead of UTF-8 bytes https://github.com/nushell/nushell/pull/7752
  - It seems like we've reached a tipping point where we mostly agree that a `--grapheme(-g)` flag should be used, but Leon suggested further discussion on making the default (bytes vs. graphemes) configurable. How do you feel about that part?
    - Reilly: would vote against configuration. We have too much of it already, and it adds another combination of config to test.
- Reilly's LazyRecord PR https://github.com/nushell/nushell/pull/7619
  - Reilly: I'm OK if we decide that we don't want another `Value` variant. IMO this was a successful experiment and a nice performance improvement, but it's OK if we decide that this isn't the right long-term approach.
- The rest of the older PRs probably need another look; ping the OPs to see whether we should close them or whether they still plan on working on them.

## External PRs

- Hopefully Windows Terminal flickering will soon be a thing of the past, in case you haven't seen it: https://github.com/microsoft/terminal/pull/14677
- The vscode nushell extension PR is going to be epic! https://github.com/nushell/vscode-nushell-lang/pull/73

# Discussed Topics

## Config system discussion

- Darren's comparison of different possible paths: https://hackmd.io/NfBoTWUeQhOTXoeKra437A
- Kubouch's suggestions: https://hackmd.io/@nucore/BJpziLNsj
- Current problem: two separate config files, `env.nu` and `config.nu` -> could we bridge the difference or make it easier to understand?
- Current configs grow really large and get messy (error messages are hard) -> pro splitting into more atomic parts
- Also handling of a fall-back default config if the user deletes their config or if the config is broken
- Have easily understandable logic for how configuration is loaded, with a good default experience, without needing a PhD to configure nushell
- Declarative config (toml, nuon etc.)
- Scripted config (much greater flexibility, but some challenges: order matters much more, plus the cost of evaluation)
- We want to be able to break the config into easily digestible chunks (e.g. `keybindings.nu`, `hooks.nu`, `themes/my-theme-dark.nu`)
- Easily orderable parts and hard-to-order parts (`NU_LIB_DIRS`, `ENV_CONVERSIONS`)
- As mentioned last meeting: Path/PATH conversions for a bunch of variables (currently special sauce)
  - Jakub: could this just be a command that gets invoked based on a closure?
- *Curveball:* How strict do we want to keep the parse/eval split?
- JT: can we start breaking up `config.nu` into smaller scripts (`keyboard.nu`, `theme.nu` etc.)?
- Important point by Michael: how do we ship the default or example config and make sure users can upgrade seamlessly?

**Requirement:** Set up a good default!
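
The fall-back requirement can be pictured as a layering step: parse the built-in default first, overlay the user's settings, and keep the default (plus an error to report) when the user file is missing or broken. Below is a minimal Rust sketch under loud assumptions: the flat `key = value` format, the `Config` type, `DEFAULT_CONFIG`, `parse_config` and `load_config` are all hypothetical and are not nushell's loader, which evaluates `env.nu` and `config.nu` as scripts.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified config: flat `key = value` settings.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    settings: HashMap<String, String>,
}

/// Hypothetical built-in default that would ship inside the binary.
const DEFAULT_CONFIG: &str = "show_banner = true\nedit_mode = emacs\n";

/// Parses `key = value` lines; blank lines and `#` comments are skipped.
fn parse_config(src: &str) -> Result<Config, String> {
    let mut settings = HashMap::new();
    for (i, raw) in src.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `key = value`", i + 1))?;
        settings.insert(key.trim().to_string(), value.trim().to_string());
    }
    Ok(Config { settings })
}

/// Layers the user's config over the built-in default.
/// `None` models a deleted/missing user file. If the user file fails to
/// parse, it is ignored as a whole: the default is kept and the error is
/// returned so it can be reported instead of aborting startup.
fn load_config(user_src: Option<&str>) -> (Config, Option<String>) {
    let mut config = parse_config(DEFAULT_CONFIG).expect("built-in default must parse");
    match user_src.map(parse_config) {
        None => (config, None),
        Some(Ok(user)) => {
            // User values override defaults key by key.
            config.settings.extend(user.settings);
            (config, None)
        }
        Some(Err(err)) => (config, Some(err)),
    }
}
```

Layering rather than replacing means a user file only has to mention the settings it changes, so a beginner's file can stay very small while every other value still comes from the shipped default.
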
- Currently, if we try to `source` a non-existent file, we error
- Suggestion by Jakub: a module-level mechanism with exported symbols that you could query
- Keep the user-defined `custom.nu` as small as possible for folks starting out

## Unicode and encodings

https://github.com/nushell/nushell/pull/7752

Breaking change that would change the semantics of indices into strings.

Currently we closely map Rust's `&str`/`String`:

- Everything is encoded as UTF-8
- Indexing through byte (slices) for `O(1)` constant-time operations
- Iterators operate over `codepoints`/`scalars` from the Unicode encoding definition
  - diacritics + base character can be two or more codepoints
  - Emojis can be a wild composition of multiple codepoints
  - e.g. the `split chars` command in nushell
  - The operation is cheap, as the UTF-8 encoding says how many bytes a codepoint uses.

Graphemes are the unit relevant to typesetting or language understanding.

- https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/ (great primer on that by Manishearth of Rust/Unicode fame)
- We should support our users in splitting by them or searching for them properly
- https://unicode-rs.github.io/unicode-segmentation/unicode_segmentation/index.html

Risk of breaking changes is high, as different semantics might be expected in different situations! (e.g. is `\r\n` one "character" or two?)

**Decision:** let's first start supporting graphemes at all and revisit breaking changes at a later date.

Note: We are making strong assumptions by using UTF-8 strings in some places (e.g. file/path names can have different restrictions and be just bytes). We currently paper over those difficulties by heavily using [`from_utf8_lossy`](https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf8_lossy), which replaces bytes that are not valid UTF-8 with the replacement character U+FFFD (so bit-twiddling string ops are safe). This lets invalid byte-level indexing in nushell slip through as those pesky question-mark characters.

## LazyRecord and friends

- Reilly implemented `LazyRecord`
  - Goal: accelerate stuff like the materialized `sys` or `$nu` record
  - Has to clone the engine state at a particular time
- **Laziness is cool!**
- Idea: lazy table for stuff like `ps` or `ls` with lazily loaded columns
- Impl details: new variant on `Value`; matching in the record-manipulating commands is necessary (methodification of `Value` should be a goal!)
- Sentiment: let's land it and look for problems and opportunities
  - `$nu` gets much faster and thus nushell starts up quicker!!!
- Q: are we missing a match arm that currently hits the default fall-through (a PITA to track down)?
- Relationship to the metadata PR discussed in the last meeting:
  - We consider the lazy record to be the experiment to prove out laziness and more semantic extensions of `Value`
  - We need to do some refactoring before we can easily continue to work on metadata or on removing spans from `Value`.
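
To illustrate the laziness idea (columns of something like `$nu` or `sys` are only computed when they are actually read), here is a minimal single-threaded Rust sketch. It is not the API from PR 7619: the two-variant `Value` enum is a hypothetical stand-in for nushell's `Value`, and `LazyRecord`, `with_column`, `get`, `collect` and `runs` are illustrative names. Each column is a closure that runs at most once; its result is cached.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical stand-in for nushell's `Value`, reduced to two variants.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    String(String),
}

/// A record whose column values are computed by closures on first access.
struct LazyRecord {
    columns: Vec<(&'static str, Box<dyn Fn() -> Value>)>,
    cache: RefCell<HashMap<&'static str, Value>>,
    runs: Cell<usize>, // how many column closures have actually run
}

impl LazyRecord {
    fn new() -> Self {
        LazyRecord {
            columns: Vec::new(),
            cache: RefCell::new(HashMap::new()),
            runs: Cell::new(0),
        }
    }

    /// Registers a column without computing it.
    fn with_column(mut self, name: &'static str, compute: impl Fn() -> Value + 'static) -> Self {
        self.columns.push((name, Box::new(compute)));
        self
    }

    /// Column names are known up front, so listing them is cheap.
    fn column_names(&self) -> Vec<&'static str> {
        self.columns.iter().map(|(name, _)| *name).collect()
    }

    /// Returns a column value, running its closure only on the first request.
    fn get(&self, name: &str) -> Option<Value> {
        if let Some(cached) = self.cache.borrow().get(name) {
            return Some(cached.clone());
        }
        let (key, compute) = self.columns.iter().find(|(n, _)| *n == name)?;
        let value = compute();
        self.runs.set(self.runs.get() + 1);
        self.cache.borrow_mut().insert(*key, value.clone());
        Some(value)
    }

    /// Materializes every column, e.g. when the whole record is printed.
    fn collect(&self) -> Vec<(&'static str, Value)> {
        self.column_names()
            .into_iter()
            .filter_map(|name| self.get(name).map(|v| (name, v)))
            .collect()
    }

    /// Number of column closures evaluated so far.
    fn runs(&self) -> usize {
        self.runs.get()
    }
}
```

Because `get` takes `&self`, the cache and counter use interior mutability (`RefCell`, `Cell`). A value shared across threads in a real shell would need a `Sync` cache instead, which this sketch deliberately ignores.
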