# Table value proposal

Author: [Devyn Cairns](https://github.com/devyn)

Tables in Nushell are currently only ever organized as lists of records. This is very flexible and fits a JSON-compatible data model, but it has disadvantages when a table is known to have a more regular structure:

- Column keys have to be duplicated for each record.
  - This is an inefficient use of memory.
- Column lookup involves a search of each row for the correct key, as there is no guarantee that the keys are in the same position, or even that they exist on each row.
  - This is unnecessarily slow.
- It is not possible to know which columns a table stream contains without collecting the entire stream.
  - This is a pain point for `to csv` and similar tabular formats, and the reason the `--columns` argument exists.

My suggestion (and what I've heard from others) is that we continue to be able to treat `list<record>` as a table in most cases, but that we also add a new `table` type and a `TableStream` that address all of these problems and more.

## `Value::Table`

Fundamentally, this would be structured as a list of column names and a list of row data. There are two options for how to structure the row data:

- **Row-first**, i.e. indexed by row, and then by column.
- **Column-first**, i.e. indexed by column, and then by row.

The row-first approach is the most similar to what we already have, and it has these advantages:

- Finding all of the values for one row only requires a single lookup.
- Pushing a row only requires one `Vec` to grow.
- Deleting a row only requires one memory move.

Most (but not all) databases work this way, and it typically gives high transactional performance.

The column-first approach is the one more commonly seen in analytical software (including Polars, at least by default) and analytical databases, and it has many advantages of its own that I think would benefit Nushell (a sketch of what this could look like follows this list):

- Values are located near each other when doing an aggregation.
  - Processors prefer sequential access to memory. Cache and TLB are both used more efficiently if more data fits in them, and processor prefetch is more effective too.
  - SIMD is possible because values are always a constant offset apart.
- The number of rows usually far outweighs the number of columns, so the number of non-sequential accesses is lower even for a full table scan.
  - Let $N$ be the number of rows, $M$ the number of columns, and $C$ the amortized speedup of sequential accesses over random ones.
  - For row-first order, a full table scan in the worst case costs $N$ potentially random accesses (one per row), while for column-first order it costs $M$ potentially random accesses (one per column); the remaining accesses are sequential:
    - Row-first cost: $N + \frac{N \times (M - 1)}{C}$
    - Column-first cost: $M + \frac{(N - 1) \times M}{C}$
  - Since we expect $N > M$, it's more beneficial to put the optimization on that side.
- There are fewer, larger `Vec`s overall to manage, so memory allocation is cheaper.
- It is possible to select a subset of columns without visiting each row, whereas selecting a subset of rows only requires visiting each column (and there are usually far fewer columns than rows).
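To make the column-first layout concrete, here is a minimal sketch of what such a value could look like. This is an assumption for illustration only: `Table`, its fields, and the `Value` stand-in are hypothetical names, not Nushell's actual types.

```rust
/// Stand-in for Nushell's `Value` type (illustrative only).
#[derive(Clone, Debug)]
enum Value {
    Int(i64),
    String(String),
}

/// Hypothetical column-first table: one `Vec` per column, and every
/// column `Vec` has the same length (the row count).
struct Table {
    columns: Vec<String>,
    data: Vec<Vec<Value>>,
}

impl Table {
    /// Selecting a subset of columns only touches the chosen column
    /// `Vec`s; no per-row work is needed.
    fn select(&self, names: &[&str]) -> Table {
        let picked: Vec<usize> = self
            .columns
            .iter()
            .enumerate()
            .filter(|(_, name)| names.contains(&name.as_str()))
            .map(|(index, _)| index)
            .collect();
        Table {
            columns: picked.iter().map(|&i| self.columns[i].clone()).collect(),
            data: picked.iter().map(|&i| self.data[i].clone()).collect(),
        }
    }
}
```

Note how `select` never inspects individual rows; the equivalent operation on a `list<record>` has to visit and rebuild every record.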
Benchmark proof (using the nightly bencher), with the most efficient implementation of a full-table sum for each column:

```rust
#![feature(test)]
extern crate test;

use std::hint::black_box;
use test::Bencher;

#[bench]
fn row_major(b: &mut Bencher) {
    // 10,000 rows of 10 columns each.
    let rows: Vec<Vec<i32>> = (0..10000).map(|_| (0..10).collect()).collect();
    b.iter(|| {
        let mut row_sums = vec![0; 10];
        for row in &rows {
            for (sum, val) in row_sums.iter_mut().zip(row) {
                *sum += *val;
            }
        }
        black_box(row_sums)
    });
}

#[bench]
fn col_major(b: &mut Bencher) {
    // 10 columns of 10,000 rows each.
    let cols: Vec<Vec<i32>> = (0..10).map(|_| (0..10000).collect()).collect();
    b.iter(|| {
        black_box(
            cols.iter()
                .map(|vals| vals.iter().copied().sum::<i32>())
                .collect::<Vec<i32>>(),
        )
    });
}
```

```
running 2 tests
test col_major ... bench:       3,618.77 ns/iter (+/- 99.25)
test row_major ... bench:      33,060.19 ns/iter (+/- 1,308.65)
```

I suggest we go with column-first order. I think most of our operations are likely to benefit from it.

## `TableStream`

In order for table operations to be able to assume the shape of a table in a stream, we need to add a stream type for tables as well. Naturally, it wouldn't make sense for this to be in column-first order; very few potential stream sources work that way. So I suggest that we keep row-first order in this case, but that we optimize it in a similar way to how `ByteStream` works: the producing function adds its data to each column directly through a list of mutable references, rather than having to allocate and then release an intermediate row-first `Vec`. If we do this, there should be no significant added cost from the mismatch between the row-first order of the stream and the column-first order of the value.

Pseudocode:

```rust
trait TableStream {
    // Constant and available from the beginning.
    fn columns(&self) -> &[String];
    fn size_hint(&self) -> (usize, Option<usize>);

    // Pushes the next row directly into each column's `Vec`,
    // and must push exactly one value per column. Returns
    // `false` once the stream is exhausted.
    fn next(&mut self, out: &mut [Vec<Value>]) -> bool;
}
```
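To show how a row-first stream can feed a column-first value with no intermediate per-row allocation, here is a minimal sketch of a collector built on the trait above, reusing the hypothetical `Value` stand-in from the earlier sketch. `collect_table` is an assumption for illustration, not a proposed API:

```rust
/// Drains a `TableStream` into column-first storage. Each call to
/// `next` appends values directly into the per-column `Vec`s, so no
/// intermediate row-first `Vec` is ever allocated.
fn collect_table(mut stream: impl TableStream) -> (Vec<String>, Vec<Vec<Value>>) {
    // Column names are known up front, before any rows arrive.
    let columns: Vec<String> = stream.columns().to_vec();

    // Pre-size each column using the stream's lower size bound.
    let (lower, _) = stream.size_hint();
    let mut data: Vec<Vec<Value>> = (0..columns.len())
        .map(|_| Vec::with_capacity(lower))
        .collect();

    // `next` returns `false` once the stream is exhausted.
    while stream.next(&mut data) {}

    (columns, data)
}
```

Because `columns()` and `size_hint()` are available before the first row, a consumer like `to csv` could emit its header immediately, which is exactly the problem the `--columns` argument currently works around.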