---
tags: meeting
---
# 2024-05-13
## Agenda
## Notes
# 2024-04-29
## Agenda
- Adding a key for tensor extensions.
- Discuss reference issues
## Notes
- Should we change how complex numbers are dealt with?
- No, we made the right decision.
- What about storing NNZ (number of stored values)?
- Consensus: add key as `number_of_stored_values` or `number_of_stored_entries` or `number_of_entries` or `number_of_explicit_entries`
- Should we add a key for tensor extensions?
- Willow: you can just test whether `format` is a supported string?
- `bytes[n]` data type
# 2024-04-18
What do we need for 1.0?
- tests and reference output
- C reference implementation
- TACO parser based on ^
- GraphBLAS parser based on ^
- Convert MatrixMarket
# 2024-04-01
## Agenda
- SparseBLAS Comments:
- What about an offset to the arrays for multi-processor chunking
- What about parallelism? Blocking, seek
- add nnz to the header if it's not there already
- Add bytemap data:
Yzelman: Re the file formats and parallel I/O — there’s also some recent work by Tim Griesbach called scda, but not yet released. They come from the adaptive mesh angle but have a similar principle of storing binary arrays. No hierarchical support but they do have support for variable-sized arrays, which may come into play sparse matrix side if allowing certain compression schemes that play with the index types (think CSB, CBICRS, etc). Theirs is fully seekable based on header info (and targeting distributed-memory I/O)
Thanks, I’ve noted it=) I/O is super important as in practice we found it almost always forms a significant part of the overall execution, so we spent (and continue to spend) quite a lot of effort in optimising it. But better we have a standard binary format (finally!) to optimise for, rather than everyone doing their own thing— very important effort this one
## Notes
Discussed [attributes PR](https://github.com/GraphBLAS/binsparse-specification/pull/49). General agreement on the wording around attributes. Need some phrasing saying attributes can be ignored, but they must be correct if present. Discussed potential attributes; new ones should be listed in GitHub comments on the PR.
- `number_of_diagonal_elements`
- `contains_cycles`
- Symmetry - do we want a symmetric attribute? `symmetric`, `skew_symmetric`, `hermitian`, `anti-hermitian`, `structure_symmetric`
- `invertible`
- `positive_definite`
Discussed chunking/tiling. There are essentially two ways of offering tiling, an offset-based approach, and a tile grid-based approach.
- In a tile grid-based approach, a matrix is partitioned into a series of tiles in a tile grid. Each tile is then stored as a small matrix. Given a specific set of tile coordinates, a user can access a tile in constant time. There is an implicit offset for each tile based on the tile shape and its tile coordinates. (Given a tile size `m,n`, the indices of tile `i,j` have the offset `i*m,j*n`.)
Erik: This works well, but we don't necessarily want a tile grid that's this rigid. Oftentimes sparse matrices have clusters of values in a small area, so we want to support tiles that are not necessarily uniform in size.
- In an offset-based approach, a matrix consists of a series of smaller matrices, each of which has an offset `i,j` that maps its indices into a larger sparse matrix. These tiles could possibly overlap, which increases complexity. Without an explicit tile grid, a user would likely need to iterate through all tiles, check for overlap with the submatrix that they want to compute, and then add corresponding nonzeros from any overlapping tiles. This could be nice, as it would allow adding updates in a TileDB-like fashion without modifying the existing matrix. However, it is more complex.
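The implicit offset in the tile grid-based approach can be sketched as follows. This is only an illustration, assuming uniform tiles of shape `m,n` and zero-based indices; `tile_shape`, `tile_offset`, `to_global`, and `owning_tile` are hypothetical names, not part of the specification.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper type; binsparse does not define a tiling API.
struct tile_shape {
  std::size_t m;  // rows per tile
  std::size_t n;  // columns per tile
};

using index_pair = std::pair<std::size_t, std::size_t>;

// Offset of tile (i, j) in a uniform tile grid: (i*m, j*n).
inline index_pair tile_offset(tile_shape shape, std::size_t i, std::size_t j) {
  return index_pair(i * shape.m, j * shape.n);
}

// Map a tile-local index (r, c) inside tile (i, j) to a global matrix index.
inline index_pair to_global(tile_shape shape, std::size_t i, std::size_t j,
                            std::size_t r, std::size_t c) {
  assert(r < shape.m && c < shape.n);
  const index_pair off = tile_offset(shape, i, j);
  return index_pair(off.first + r, off.second + c);
}

// Tile coordinates that own global index (row, col): a constant-time lookup.
inline index_pair owning_tile(tile_shape shape, std::size_t row, std::size_t col) {
  return index_pair(row / shape.m, col / shape.n);
}
```

With non-uniform or overlapping tiles (the offset-based approach), `owning_tile` has no closed form, which is why a reader would have to scan the tiles for overlap.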
Isaac: Reading from HDF5 in parallel is possible with virtual datasets, and there is a proposal to allow parallel I/O in Zarr.
Willow: Do we actually need support for distributed to be in binsparse? We're getting into a lot of details that seem implementation-specific; could we just let the implementation decide how they use binsparse on their own?
Ben: We do need an official standard of some kind if libraries are going to move data back and forth. This standard doesn't necessarily have to be in binsparse, although I think that would be convenient. It is a good idea to get more implementation experience before we set anything in stone, though.
# 2024-03-19
## Agenda
- Add nnz to the header if it's not there already
## Notes
Ben: Should we add an NNZ property to the binsparse JSON? It is not currently part of the specification, with `number_of_elements` implicitly referred to.
Willow: What exactly would the NNZ be defined as? In the ISO case does it store 1? What about symmetric?
We then had a long discussion about the possible meanings of NNZ. Albert-Jan Yzelman pointed out that for a symmetric reader, it can be much more efficient if the NNZ refers to the actual number of positions in the matrix with stored values.
Use case: when reading a Matrix Market file, "number of nonzeros" refers to the number of values stored in the Matrix Market file. In the symmetric case, this is not typically the same as the number of positions in the matrix with an entry, due to a single stored value in the triangle being mirrored across the diagonal. When unpacking a symmetric matrix into a non-symmetric storage format, as is commonly done in many sparse matrix libraries, this means you do not know the amount of storage needed until you have read through the matrix to determine the number of elements in the diagonal. The amount of storage needed is `number_of_diagonal_elements` + `2*number_of_triangle_elements`.
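A rough sketch of this use case, assuming the file stores one triangle (diagonal included) as COO triples and that `number_of_diagonal_elements` is available from metadata. The `coo_matrix` type and `expand_symmetric` function are hypothetical and only illustrate how the storage can be reserved before reading any entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory COO triple list (not a binsparse type).
struct coo_matrix {
  std::vector<std::size_t> rowind;
  std::vector<std::size_t> colind;
  std::vector<double> values;
};

// Expand a symmetric matrix stored as one triangle into a general COO matrix.
// Knowing the diagonal count up front gives the exact output size:
//   number_of_diagonal_elements + 2 * number_of_triangle_elements
inline coo_matrix expand_symmetric(const coo_matrix& tri,
                                   std::size_t number_of_diagonal_elements) {
  const std::size_t stored = tri.values.size();
  assert(number_of_diagonal_elements <= stored);
  const std::size_t number_of_triangle_elements =
      stored - number_of_diagonal_elements;
  const std::size_t total =
      number_of_diagonal_elements + 2 * number_of_triangle_elements;

  coo_matrix full;
  full.rowind.reserve(total);
  full.colind.reserve(total);
  full.values.reserve(total);

  for (std::size_t k = 0; k < stored; ++k) {
    const std::size_t i = tri.rowind[k];
    const std::size_t j = tri.colind[k];
    full.rowind.push_back(i);
    full.colind.push_back(j);
    full.values.push_back(tri.values[k]);
    if (i != j) {  // mirror off-diagonal entries across the diagonal
      full.rowind.push_back(j);
      full.colind.push_back(i);
      full.values.push_back(tri.values[k]);
    }
  }
  // Fails if the supplied diagonal count did not match the data.
  assert(full.values.size() == total);
  return full;
}
```

Without the diagonal count, the reader would need a first pass over the indices just to compute `total`.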
Ben: This seems like an important use case we should support. However, it does seem non-intuitive to me that the number of nonzeros listed in the JSON would sometimes not correspond to the number of stored values in binary.
What exactly do we mean by nonzeros? Well, we mean explicitly stored values, which could be zero. We then have a few different possible definitions:
- Actual number of values stored in memory
- Number of positions in the abstract "matrix" which have a stored value. (In symmetric case, includes mirrored triangle values.)
What libraries have support for symmetric matrix types?
Isaac: Python libraries generally don't have support; `scipy.sparse` does not.
Julia does, but segfaults if you ask for the number of nonzeros.
Tadd: In PyTorch, there's no way to get the NNZ at all from a sparse matrix.
Willow: Is it too difficult and ambiguous a question to put NNZ as a property in JSON?
What does SuiteSparse Matrix Collection do?
- Matrix Market reports number of matrix tuples stored in the Matrix Market file.
- Web interface reports something different:
- "Pattern entries," the number of positions in the matrix with an element (including explicit zeros)
- "Nonzeros," the number of numerically nonzero elements in the matrix (pattern entries - explicit zeros)
"Pattern entries" is the information that Albert would like here.
Ben: We previously discussed "attributes," which are optional information about the matrix that could be used for optimization. For example, an attribute could indicate that a matrix happens to be symmetric, even when the storage format is not symmetric. An attribute could also indicate that a matrix represents an undirected graph, a graph with no cycles, etc. We could provide `number_of_triangle_elements` and `number_of_diagonal_elements` as attributes.
Albert: This is sub-optimal, since the parser still has to write the inefficient case. It would be best if all the information to do optimal parsing were always there.
This would help, but would probably also be confusing. It does also require parsers that don't take advantage of this information to compute it if they don't already have it available.
Consensus:
- Introduce attributes as a concept. Define `number_of_triangle_elements` and `number_of_diagonal_elements` as optional attributes.
- Provide a note in the specification saying that a quality implementation would be expected to provide the `number_of_triangle_elements` attribute for a symmetric format, since this allows important optimizations.
- Number of nonzeros property? No consensus yet. It would seem like there are three possible options:
1. Number of elements explicitly stored in the matrix format (`len(colind)` for CSR, what's currently in Matrix Market.)
2. Number of elements with explicit entries in the matrix (`2*(len(colind) - number_of_diagonal_elements) + number_of_diagonal_elements`, the quantity Albert would like and "pattern entries" from the SuiteSparse web interface.)
3. Number of elements with nonzero values in the matrix (doesn't count explicit zeros, "nonzeros" from the SuiteSparse web interface.)
Ben's opinion: 1. and 2. are the reasonable options, with 1. being the most intuitive.
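To make the three options concrete, here is a small sketch that computes all of them for a symmetric matrix stored as one triangle in COO form. The `nnz_counts` struct and `count_symmetric` function are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The three candidate counts for a symmetric matrix stored as one triangle.
struct nnz_counts {
  std::size_t stored;   // option 1: values physically stored
  std::size_t pattern;  // option 2: positions with an explicit entry
  std::size_t nonzero;  // option 3: positions with a numerically nonzero value
};

inline nnz_counts count_symmetric(const std::vector<std::size_t>& rowind,
                                  const std::vector<std::size_t>& colind,
                                  const std::vector<double>& values) {
  assert(rowind.size() == values.size() && colind.size() == values.size());
  nnz_counts c{0, 0, 0};
  for (std::size_t k = 0; k < values.size(); ++k) {
    // Off-diagonal stored entries stand for two positions in the matrix.
    const std::size_t positions = (rowind[k] == colind[k]) ? 1 : 2;
    c.stored += 1;
    c.pattern += positions;
    if (values[k] != 0.0) c.nonzero += positions;
  }
  return c;
}
```

For a triangle with four stored entries, two of them on the diagonal and one off-diagonal explicit zero, this gives 4, 6, and 4 respectively, so all three definitions can differ from one another in general.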
# 2024-03-04
## Agenda
- https://huggingface.co/docs/safetensors/index
## Notes
- What do we need to do for sparsity support?
# 2024-02-20
## Notes
- Discussed status of Ben's binsparse C parser
- Binsparse reading/writing is essentially fully functional
- Still working on some aspects of Matrix Market parsing (e.g. complex)
Spent most of discussion talking about testing:
- What kind of testing do we want?
- Is Matrix Market -> Binsparse enough? Can we simply `h5diff` the results?
- `h5diff` might be too strict
- Different interpretations of comments?
- Different interpretations of types? (e.g. `integer` becomes `int16` vs `int32` vs `uint64`. All should be valid.)
- What about reading in a Matrix Market file, then writing the results?
- Might have some bugs. e.g. what if you don't handle transpose properly internally, but end up outputting the correct output by just carrying through the transpose flag?
- We could have the testing infrastructure do some simple operation, such as a conversion between formats or a matrix operation (elementwise, squaring, etc.)
- Implementing these operations is going to be complex for a parser
- BUT: testing infrastructure doesn't need to be limited to parser. Can use a matrix library like SuiteSparse.
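One way to avoid the strictness of a byte-level diff is to compare arrays by value, independent of the integer width a parser chose. The sketch below is an assumption about how a test harness might do this; `same_values` and its helpers are hypothetical and not part of any existing binsparse tooling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// True only for negative values of signed types.
template <typename T>
bool is_negative(T v) {
  return std::is_signed<T>::value && v < T(0);
}

// Compare two integers of possibly different width and signedness by value.
template <typename A, typename B>
bool same_integer_value(A a, B b) {
  static_assert(std::is_integral<A>::value && std::is_integral<B>::value,
                "integer types only");
  if (is_negative(a) != is_negative(b)) return false;
  if (is_negative(a))
    return static_cast<std::intmax_t>(a) == static_cast<std::intmax_t>(b);
  return static_cast<std::uintmax_t>(a) == static_cast<std::uintmax_t>(b);
}

// Two index arrays are equivalent if they have the same length and values,
// even if one is stored as int16 and the other as uint64.
template <typename A, typename B>
bool same_values(const std::vector<A>& x, const std::vector<B>& y) {
  if (x.size() != y.size()) return false;
  for (std::size_t i = 0; i < x.size(); ++i)
    if (!same_integer_value(x[i], y[i])) return false;
  return true;
}
```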
# 2024-02-05
## Agenda
- How do we spell out properties (e.g. at the element level, to define a struct of arrays)?
- Where are we at?
- 1.0
- spec is done (barring any minor adjustments as we implement stuff)
- Need to finish converting matrices
- Converting the matrices doesn't stress all of the different formats and datatypes
- Need to make C bindings
- Paper: arxiv or joss?
- wanna get on hacker news
- get it into the graphblas spec
- SparseBLAS
- Testing (at least we should be able to interoperate with each other)
```cpp
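#include <cassert>
#include <cstddef>

// Element types an array can hold (sketch; the list is not exhaustive).
enum type_t { INT_T, FLOAT32_T, FLOAT64_T /* ... */ };

enum format_t {
  COO = 0,
  CSR = 1,
  CSC = 2
  /* ... */
};

// Type-erased array: `type` says how to interpret `values`.
struct array_t {
  void* values = nullptr;
  size_t size = 0;
  type_t type;
};

// Option 1: one struct holding every array a format might need.
// (Named flat_matrix_t here only so that both options can coexist.)
struct flat_matrix_t {
  array_t values;
  array_t pointers_to_1;
  array_t indices_0;
  size_t nrows = 0;
  size_t ncols = 0;
  size_t nnz = 0;
  format_t format;
};

// Option 2: a tagged handle whose `data` points to a format-specific struct.
struct matrix_t {
  format_t format;
  void* data;
};

struct coo_t {
  array_t rowind;
  array_t colind;
  array_t values;
};

// Declaration only; the reader itself is outside this sketch.
matrix_t read_matrix(const char* path);

void example() {
  auto m = read_matrix("my_file.bsp");

  if (m.format == COO) {
    coo_t* coo_m = (coo_t*) m.data;
    assert(coo_m->rowind.type == INT_T);
    assert(coo_m->colind.type == INT_T);

    int* rowind = (int*) coo_m->rowind.values;
    /* ... */

    // Dispatch on the value type rather than asserting a single one.
    if (coo_m->values.type == FLOAT32_T) {
      float* values = (float*) coo_m->values.values;
    } else if (coo_m->values.type == FLOAT64_T) {
      double* values = (double*) coo_m->values.values;
    }
  }
}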
```
# 2023-12-11
## Agenda
- Support for property graphs / graphs with multiple sets of edges between vertex pairs
* Possible solutions:
* Willow: add an extra dimension for each type of edge
* User-defined datatype
- Full treatment of user-defined datatypes
## Notes
- Property graphs
- Graphs where edges can have multiple properties (e.g. height, length, width)
- Nodes may also be grouped, where each node type has the same properties
- To handle property graphs, we can add a special "element" type to the last level
- This will essentially state that there are multiple values for each stored nonzero
- These values will have names and be stored in separate arrays by type
- To handle user-defined types, we will rely on the binary container's support for user-defined types
- e.g., create an HDF5 type for your struct
- However, we can also easily allow users to dump an array of structs to disk by creating a `byte[n]` type, which would store `n` bytes
- Consensus: add a feature for storing *multiple values* of the same type for each value.
- e.g.: `uint8[12]` would mean each value is an array of 12 `uint8` values.
- complex is now kind of an alias to `float32[2]`, `float64[2]`, etc.
- Consensus: add a feature for struct of arrays.
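A reader would need to split a type string such as `uint8[12]` into a base type and a count. The sketch below assumes the grammar is exactly `base` or `base[count]` with a positive decimal count; that grammar, along with the names `parsed_type` and `parse_type`, is an assumption for illustration rather than wording from the specification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct parsed_type {
  std::string base;   // e.g. "uint8"
  std::size_t count;  // values per stored entry; 1 for a plain scalar type
  bool ok;            // false if the string did not match the assumed grammar
};

inline parsed_type parse_type(const std::string& s) {
  const std::size_t open = s.find('[');
  if (open == std::string::npos) return {s, 1, !s.empty()};
  // Require a non-empty base, at least one digit, and a trailing ']'.
  if (open == 0 || s.size() < open + 3 || s.back() != ']')
    return {"", 0, false};
  std::size_t count = 0;
  for (std::size_t i = open + 1; i + 1 < s.size(); ++i) {
    if (s[i] < '0' || s[i] > '9') return {"", 0, false};
    count = count * 10 + static_cast<std::size_t>(s[i] - '0');
  }
  if (count == 0) return {"", 0, false};
  return {s.substr(0, open), count, true};
}
```

Under this reading, a complex `float32` value is just the case `base == "float32"` with `count == 2`.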
# 2023-11-27
## Agenda
- Overview of v2
## Notes
- Binsparse: Willow's `generate_reference.jl` can pull any Matrix Market file and convert it to one of the Binsparse formats
* Limitation: doesn't handle comments (just not implemented yet)
- Converting SuiteSparse Matrix Collection: what format to use?
* Proposal: store as COO (COOR), which makes sense as an interchange format
- Topics for 2.0 next time:
- Tensor Symmetry
- Unordered indices
- Duplicate indices
- Chunking
- Is there anything else you want to see?
# 2023-10-30
## Agenda
We have two main items for the agenda:
1. Discuss implementations, best mechanism for sharing matrices. (Should we create a GitHub?)
2. Discuss V2.
## Notes
- Discussed adding a set of (small) matrices to the repository
- Matrix Market format along with canonical Binsparse format
- Candidate matrices:
- [https://sparse.tamu.edu/Mycielski/mycielskian3](https://sparse.tamu.edu/Mycielski/mycielskian3)
- [https://sparse.tamu.edu/Grund/b1_ss](https://sparse.tamu.edu/Grund/b1_ss)
- [https://sparse.tamu.edu/Meszaros/farm](https://sparse.tamu.edu/Meszaros/farm)
- Ideally, a parser would perform the following steps during testing:
- Read in Matrix Market file, convert to Binsparse file
- Check that Binsparse output is equivalent to canonical representation from repo
- Read in Binsparse file, check equivalent to Matrix Market representation
- Discussed chunking
- We could naively chunk the binsparse datasets, but this likely does not do everything we want
- e.g. for CSR format, the way matrix is chunked will not necessarily be meaningful
- We will likely end up with nonzeros from different parts of the matrix
- We would preferably chunk the matrix spatially
- This is essentially many small Binsparse matrices
- Willow: we could also use our tensor representation to represent chunking, where the higher dimensions represent splitting the matrix into chunks.
# 2023-10-16
## Agenda
We have two main items for the agenda:
1. Discuss implementations, best mechanism for sharing matrices. (Should we create a GitHub?)
2. Discuss V2.
- HDF5 binsparse format currently stores metadata inside a "binsparse" key in a string of JSON text. This text is stored inside an attribute named "binsparse" in an HDF5 group. Is this too redundant, and is this what we want?
## Notes
- Discussed converting SuiteSparse Matrix Collection to Binsparse format.
- There are two types of files in the SuiteSparse Matrix Collection: 1) Matrix Market files (including some *vectors* stored in `.mtx` files) and 2) text files, which can have arbitrary contents. Generally the text files can be viewed as lists of strings, where each line of the file contains a string, with the line number corresponding to some nonzero. The semantic meaning is dataset specific, however.
- We should be able to convert all the SuiteSparse Matrix Collection matrices to a single Binsparse HDF5 file programmatically by
1) Storing each `.mtx` file as a Binsparse matrix inside an HDF5 group with the same name as its filename (sans the `.mtx`)
2) Storing each `.txt` file as a string dataset inside the root group.
# 2023-09-25
## Agenda
- file extension ideas: .bsp.h5?
- If the complex type modifier only applies to floats, can we just handle it the same way we handle bint8, rather than as some complicated "modifier" structure?
- Can we add a run-length-encoding level type called "repeated"?
## Notes
- We want to store the JSON blurb as an attribute using an HDF5 string
- Discussed how to set up automated testing of our implementations
# 2023-09-11
## Agenda
- We are missing named formats for dense matrix and dense vector
- We decided on CVEC, DVEC, DMAT, DMATR, DMATC
- Willow will work on a PR
- Can we specify that the fill_value array also has a corresponding fill_value_type entry?
- This is already specified, we will leave it alone
- Discuss storing JSON blob as attribute in HDF5 implementation -- shall we define the HDF5 type of the array?
- This isn't in spec, we'll follow up later if it comes up again.
## Notes
- Consensus: add dense matrix and dense vector formats. (These are for *true* dense formats, which cannot represent sparse matrices without a fill value.)
- Tentative names: dvec, cvec for dense and compressed vectors. dmat, dmatr, dmatc for dense matrices and row-major, column-major matrices.
- Specify a "data_type" for "fill_value"
# 2023-08-28
## Agenda
- Testing progress (brief)
- Multi-dimensional proposals
# 2023-08-21
## Agenda
- How to deal with complex numbers?
* Currently store real and imaginary parts together
* Currently allow any scalar type (e.g. integer complex types)
* Probably should restrict to floating point
- How to store Matrix Market `pattern` matrices? (As an ISO-valued `true` bint8?)
- https://github.com/ivirshup/binsparse-python
## Notes
- How to deal with user-defined types?
* Ben: ideally could handle this using binary container's user-defined type facilities.
* e.g. could have type equal to `user[user_typename]` where `user_typename` is a user-defined label.
Parser would have to somehow keep a library of binary container objects (e.g. HDF5 type object)
and
# 2023-08-07
## Agenda
- Discuss HDF5 sparse chunks RFC
- Discuss current status of spec
## Notes
- HDF5 sparse chunks RFC seems quite different from Binsparse, but likely worth talking with them
* They may benefit from our sparse matrix support.
* We may benefit from their chunking.
- Discussed current status
* Working on bringing implementations up to spec
* Pass matrices back and forth, identify any spec issues
- Discussed converting SuiteSparse Matrix Collection
* Minimum viable product: take Matrix Market tarball, convert primary Matrix Market file to Binsparse format. Package up Binsparse file and any auxiliary files in a new tarball.
* Optional: could also convert any other Matrix Market files in the tarball.
* Optional: could also store any text files as HDF5 datasets.
* Converting both Matrix Market files and text files to HDF5 datasets might allow representing tarball with a single HDF5 file. However, if there are any other file types than text and Matrix Market, this would represent a problem.
* Original SuiteSparse Matrix Collection metadata field should remain the same.
# 2023-07-21
## Agenda
- Implementation status?
- When dealing with embedded Binsparse matrices, what should that look like? Should keys that correspond to binsparse matrices have a particular form? (e.g.: default is binsparse, would binsparse_MYMATRIXNAME be assumed to correspond to a Binsparse format matrix? Or should it be nested inside another dictionary? How will these be identified?)
- What's the format for version number?
- Should we "release" the spec? (e.g. release v ~0.9, to be increased to 1.0 after sufficient implementation experience?)
- Questions for Tim:
* Do we have full coverage of everything we need to store *the matrices* in SuiteSparse Matrix Collection?
* How shall we decide which binary format to use for each matrix in SSMC? (e.g. we could do some quick math based on nnz/row and pick COO, CSR, or DCSR, or try to somehow use metadata to pick?)
* How exactly will we need to repackage the metadata comment in an SSMC Matrix Market?
## Notes
- Version numbers?
* Add a key for version.
* Use semver with only two versions (e.g. 1.2)
* Isaac will submit a PR adding the version key and "releasing the spec"
* Start with version 0.5 for our current release candidate
* For development versions, we can use a version with "dev" suffix to indicate a dev version of the spec
* This does not necessarily need to be standardized.
- How to determine which file format to use?
* Use COO when the matrix type is "coordinate". Use Dense when the matrix type is "array".
- How to nest binsparse or use multiple binsparse in the same file?
* A binsparse parser is always pointed at an HDF5 group or HDF5 file, looks at the descriptor in that group, then parses the matrix. This allows for different formats in different subgroups. <- this needs a PR too
- What is the minimum viable product for MatrixMarket stuff?
* It's a script that parses a Matrix Market tarball, converts `.mtx` files to HDF5 Binsparse, and copies the comment field of each `.mtx` file into the comment field of the Binsparse file.
- Let's add a directory of more complicated reference binsparse files for formats other than COO.
- Let's add a list of parsers on the binsparse homepage!
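The format selection rule above (COO for "coordinate", dense for "array") could look roughly like this in a conversion script. The sketch only inspects the first three tokens of the Matrix Market banner and assumes they are written in the usual lowercase form; `target_format` and `pick_format` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

enum class target_format { coo, dense, unknown };

// Pick a target format from a Matrix Market banner line, e.g.
// "%%MatrixMarket matrix coordinate real general".
inline target_format pick_format(const std::string& banner) {
  std::istringstream in(banner);
  std::string magic, object, layout;
  if (!(in >> magic >> object >> layout)) return target_format::unknown;
  if (magic != "%%MatrixMarket" || object != "matrix")
    return target_format::unknown;
  if (layout == "coordinate") return target_format::coo;
  if (layout == "array") return target_format::dense;
  return target_format::unknown;
}
```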
# 2023-07-10
* Are we happy with https://github.com/GraphBLAS/binsparse-specification/pull/31?
* Yes! It has been merged
- Do we need a Skew Hermitian format?
* Does not seem to be supported by Matrix Market, although this does not necessarily mean we shouldn't add it.
- More generally, should we describe how parsers should deal with unimplemented features?
* What about the case where a matrix could be read, but the symmetry cannot be taken advantage of in the in-memory data structure?
* You could have the option of just reading in symmetric bit of matrix: https://github.com/alugowski/fast_matrix_market/blob/4e48fc97792f990cf81c8451e1d12a4e838dac82/README.md?plain=1#L77C20-L77C20
Conclusion: we don't need to specify
# 2023-06-26
* Are we happy with https://github.com/GraphBLAS/binsparse-specification/pull/29
* Yes!
### What else is needed for implementations?
* Do we need store specific documentation on formats?
* Maybe we will punt for now
* [Jim] But add a space for "implementation details"
* [Isaac] Reference datasets
* [Ben] small ones into the repo
* [Willow] Grab these out of the matrix market
* [Isaac] When do we start grabbing people to take a look?
* After we discuss + implement
* Discussion of which level `pointers` live at
* Conclusion: `pointers_to_level`
# 2023-06-12
## Agenda
- Do we have everything we need to store all matrices in the SuiteSparse Matrix Collection?
- Discuss plan for dealing with `symmetric`, `hermitian`, and `skew-symmetric` matrices.
## Notes
* We reviewed a lot of the discussion from the previous meeting.
* The SuiteSparse Matrix Collection has a lot of metadata included in the download tarball. The metadata is stored as additional files stored alongside the primary matrix. The metadata's structure is defined in the Matrix Market file's comments section.
* There are two methods we could use for storing these in Binsparse:
1. We could store the main matrix as a Binsparse file, inside of a tarball, and store the original structured comment from the Matrix Market file inside the comments section of the Binsparse file. This is very close to the status quo, with Matrix Market replaced with Binsparse.
2. Since Binsparse supports embedded sparse matrices, the metadata could potentially be stored alongside the Binsparse file in an HDF5 file or other container. The structured comment could also use JSON, since Binsparse comments allow arbitrary user-defined JSON.
* The specific mechanism is fundamentally outside the spec, but we need to support handling this kind of metadata.
* We discussed symmetry, and the [PR Ben submitted](https://github.com/GraphBLAS/binsparse-specification/pull/29) adding `symmetric_lower`, `symmetric_upper`, and the same for skew-symmetric and Hermitian matrices.
* Discussed the possibility of adding a "packed dense symmetric format," which would store a triangle of a dense matrix as a jagged array. Did not reach consensus on whether to add. There was some feeling that this is unnecessary, as we could already store a symmetric dense matrix with `symmetric_lower` or `symmetric_upper`, but wasting some disk space.
* Discussed the possibility of adding an informational "symmetric" flag that does not affect the on-disk storage format, but simply tells the user/reader that the matrix happens to be symmetric, even though it is not necessarily in a symmetric storage format.
* Discussed how having an informational flag at the same level as the "structure" key might be confusing.
* Discussed the possibility of adding an "attributes" or similar key, which would contain a list of informational matrix attributes. This could include an informational symmetric flag, as well as attributes like being a positive matrix or having no self-loops when interpreted as a graph.
# 2023-05-15
## Agenda
- Discuss number of `pointers` and `indices` arrays.
- Discuss plan for dealing with `symmetric`, `hermitian`, and `skew-symmetric` matrices.
- Do we have everything we need to store all matrices in the SuiteSparse Matrix Collection?
## Notes
* Overview of SuiteSparse matrix market collection
* Extra fields (example: imdb)
* Pajek collection
* Sparse matrix of movies x actors
* + extra vectors of data
* Metadata – Seems all json representable
* Willow: Metadata could live at a higher level (outside of the binsparse descriptor)
* Parent group?
* Strings
* Are strings supported?
* [HDF5](https://docs.h5py.org/en/stable/strings.html), zarr, arrow have variable length string types
* Consensus was reached that we should add string to the list of datatypes, with appropriate caveats that the implementation may need to do some hacks to get it to work.
* Symmetry
* Who supports packed upper triangular formats?
* [LAPACK](https://netlib.org/lapack/lug/node123.html)
* [Finch](https://github.com/willow-ahrens/Finch.jl/blob/main/src/levels/sparsetrianglelevels.jl)
* [Not really in Base Julia](https://github.com/JuliaLang/julia/issues/22259)
* [cooler](https://cooler.readthedocs.io/en/latest/schema.html#storage-mode)
* Packed dense lower triangular is the straightforward binary interpretation of the MatrixMarket `matrix array symmetric real`, where the entry at ``(i, j)`` is stored at location `i + j(j-1)/2`.
* A note from Willow: Dense lower triangular arrays are a subset of ragged (sometimes called jagged, awkward) arrays.
* https://awkward-array.org/doc/main/
* https://en.wikipedia.org/wiki/Jagged_array
* Jim: Are we just specifying a tag?
* We can decouple whether a matrix is symmetric vs. whether we only store a triangle of it
* perhaps storing triangles gets pushed to v2
* There is contention over whether we want a packed dense triangular matrix.
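For reference, packed triangular indexing can be sketched as below, using one-based indices and the column-by-column packing conventions described in the LAPACK documentation linked above. The expression `i + j(j-1)/2` quoted in the notes matches the upper-triangle variant; a lower triangle stored column by column, which is our understanding of how Matrix Market orders symmetric `array` data, additionally depends on the dimension `n`. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// One-based packed position of (i, j), i <= j, for an upper triangle stored
// column by column: i + j(j-1)/2.
inline std::size_t packed_upper_colmajor(std::size_t i, std::size_t j) {
  assert(1 <= i && i <= j);
  return i + j * (j - 1) / 2;
}

// One-based packed position of (i, j), j <= i <= n, for a lower triangle
// stored column by column: i + (j-1)(2n-j)/2.
inline std::size_t packed_lower_colmajor(std::size_t i, std::size_t j,
                                         std::size_t n) {
  assert(1 <= j && j <= i && i <= n);
  return i + (j - 1) * (2 * n - j) / 2;
}
```

For a symmetric matrix, the upper column-by-column packing holds the same sequence of values as a lower triangle packed row by row, which may be the layout the formula in the notes had in mind.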
# 2023-05-01
## Agenda
- Reach consensus on how to handle bitfield boolean matrices.
- Evaluate whether there are any other features missing in v1. Can we store all [SuiteSparse Matrix Collection](http://sparse.tamu.edu/) matrices using binsparse v1?
## Notes
- There's not a huge need to represent actual boolean matrices that are not ISO. We will leave this out of the spec for now, although we have a decent proposal for adding bit packing (see last week's notes). (e.g. `bit[int8]`)
- We will add `bint8` to the spec to represent an unsigned 8-bit integer that represents a boolean value.
- We will add `complex[x]` to represent a complex number, where `x` is a standard (non-complex) type. For `n` complex values, `values` shall have size `2*n`: element `2*i` contains the real part of the `i`'th value and element `2*i + 1` contains its imaginary part.
- We will re-merge ISO into the value.
- Discussed
- Symmetry is needed - likely will store `symmetric_left` and `symmetric_right` (or some equivalent with better naming.)
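A minimal sketch of reading a complex value from such an array, assuming an interleaved layout in which the real and imaginary parts of the `i`'th value sit at positions `2*i` and `2*i + 1`, with `float` as the underlying type. The name `complex_at` is hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Read the i'th complex value from an interleaved array of 2*n reals.
inline std::complex<float> complex_at(const std::vector<float>& values,
                                      std::size_t i) {
  assert(2 * i + 1 < values.size());
  return std::complex<float>(values[2 * i], values[2 * i + 1]);
}
```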
# 2023-04-17
*Attendees: Willow Ahrens, Benjamin Brock, Jim Kitchen, Isaac Virshup, Erik Welch*
## Agenda
- Discuss ISO-value PR
- Discuss types in [#25](https://github.com/GraphBLAS/binsparse-specification/pull/25/files)
## Notes
- General consensus that ISO-valueness being part of the sparse matrix format, as written in ISO-value PR, is the correct decision.
- Question: can we support property graphs in v1, or is that completely left to v2?
- Consensus: we plan to have some support through user-defined types. There are strategies for this:
1. Users define their own type labels and provide a dictionary of type labels and type layouts to the parser when reading a file. Each language would have to define layouts for each format, and users could provide their own format.
2. This should probably be made compatible with Arrow, which defines a language for user-defined types, has implementations in most languages, and has many users with their own user-defined types.
Next, we discussed the types laid out in [#25](https://github.com/GraphBLAS/binsparse-specification/pull/25/files). Most discussion centered on how to handle boolean-valued sparse matrices represented as bit arrays. (For instance, a CSR sparse matrix with boolean values, where the values array contains uint64 values, and the i'th element of the values array is defined by the expression `(values[i / 64] >> (i % 64)) & 1`).
- There was general consensus that the "boolean bit-ness" of the values array is a quality of the values array similar to ISO-ness. In v2 this should be represented as a property of the last level, similar to ISO-ness.
- Representing a bitfield boolean values array in the values array type may be acceptable if we make it a bit more obvious what it is. For example, by introducing a `bint8`, `bit[int8]`, or similar format.
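The bit-array expression discussed above can be wrapped in a pair of helpers. This is only a sketch of the `uint64`-packed case; `bit_at` and `set_bit` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read the i'th boolean from a bit-packed values array:
// (values[i / 64] >> (i % 64)) & 1
inline bool bit_at(const std::vector<std::uint64_t>& values, std::size_t i) {
  assert(i / 64 < values.size());
  return ((values[i / 64] >> (i % 64)) & 1u) != 0;
}

// Write the i'th boolean into a bit-packed values array.
inline void set_bit(std::vector<std::uint64_t>& values, std::size_t i, bool b) {
  assert(i / 64 < values.size());
  const std::uint64_t mask = std::uint64_t{1} << (i % 64);
  if (b) {
    values[i / 64] |= mask;
  } else {
    values[i / 64] &= ~mask;
  }
}
```

Storing `n` booleans this way needs `(n + 63) / 64` words, which is why the values array length no longer equals the number of stored entries.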
# 2023-04-03
*Attendees: Willow Ahrens, Benjamin Brock, Tim Davis, Jim Kitchen, Isaac Virshup, Erik Welch*
## Notes
- Discussed ISO values - should they be a property of the file format,
the type of the values array, or a level in a matrix?
- There seems to be consensus that for v2, ISO-ness should be a property of
the level, possibly as a "is_iso" flag.
- Ben feels that for v1, ISO-ness should be a property of the format OR value.
- We reached consensus around [Willow's proposal for v1](https://github.com/GraphBLAS/binsparse-specification/issues/26), that ISO-ness should
be reflected in the format of the matrix.
Willow wrote a draft of a property graph, with support for different properties at the bottom level.
```json
{
  "swizzle": [1, 0],
  "format": {
    "level": "dense",
    "rank": 1,
    "subformat": {
      "level": "sparse",
      "rank": 1,
      "subformat": {
        "level": "multiplex",
        "subformats": {
          "weight": {
            "level": "element",
            "value_type": "float32"
          },
          "capacity": {
            "level": "iso",
            "value_type": "int32"
          }
        }
      }
    }
  }
}
```
# 2023-03-06
*Attendees: Willow Ahrens, Tim Davis, Benjamin Brock, Jim Kitchen*
## Notes
### Willow pres: Finch Sparse tensor
* Thinking of arrays as an index tree
* Level is vector of fibers
* Dense levels can be implicit in memory
* Kinds of level: `Dense`, `Sparse`, `Element`
* `Sparse` has idx + ptr
* `Dense` is implicit
* `COO` is a bit more complicated
* Collapse COO indices into one level with multiple arrays, e.g. array of coordinates
* Permutations for changing orders of dimensions
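As a rough illustration of the index-tree view, a dense level over a sparse level over an element level gives a CSR-like structure. The sketch below is based only on these notes, not on Finch's actual implementation; `sparse_level` and `lookup` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sparse level: `ptr` delimits each fiber, `idx` holds the coordinates of
// its stored children. The dense level above it is implicit: row i simply
// selects fiber i, so nothing is stored for it.
struct sparse_level {
  std::vector<std::size_t> ptr;  // size = number of fibers + 1
  std::vector<std::size_t> idx;  // one coordinate per stored child
};

// Element level lookup for (i, j); returns 0.0 when the position is not stored.
inline double lookup(const sparse_level& lvl, const std::vector<double>& val,
                     std::size_t i, std::size_t j) {
  assert(i + 1 < lvl.ptr.size());
  for (std::size_t p = lvl.ptr[i]; p < lvl.ptr[i + 1]; ++p) {
    if (lvl.idx[p] == j) return val[p];
  }
  return 0.0;
}
```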
# 2023-02-27
*Attendees: Erik Welch, Jim Kitchen, Ben Brock, Isaac Virshup*
## Notes
* Going over spec
* Attributes: namespacing?
* The formats
* arrays are numbered by iteration order
* Are we missing a format?
* Ben: Dense? (masked dense)
* iso values
* Bikeshedding on the name
* "iso" for now? But Erik and Isaac don't like the `"1x[{dtype}]"`
* What's next
* Review, go over one last time, pretty close to 1.0, 0.9?
# 2023-02-06
*Attendees: Willow Ahrens, Erik Welch, Tim Davis, Jim Kitchen, Isaac Virshup*
## Agenda
* Are there implications for v1 from v2?
## Notes
* Naming of the arrays
* 2d array for COO indices?
* Multiple dtypes in indices was a point for 1d
* Streaming
* MLIR expects coordinate major
* What does streamable mean in this context
* Can create something a part at a time
* This doesn't include values, so still only indices
* Bump to v2?
* Currently:
* `indices_0`, `pointers_0`, etc.
* 1 vs 0 indexing
* Julia currently does 1-based indexing for its sparse arrays, so it is not binary compatible with sparse BLAS libraries
* Supporting 1 and 0 across all readers could be complicated
* could be 1.1, could be extension point
* Future looking
* Ideally, can we make v2 not breaking?
* Next time
* Willow presenting on n-dim compiler
* Who else would be good to talk to here? MLIR, TACO
# 2023-01-23: graphblas binsparse meeting
*Attendees: Willow Ahrens, Jim Kitchen, Benjamin Brock, Erik Welch, Isaac Virshup*
## Agenda
* Recap of hdf5 pains (Ben)
* Zarr conventions (Isaac) (https://github.com/zarr-developers/zeps/pull/28)
* 2.0 plan (Willow + Erik)
* May scientific python meeting (Isaac) (https://scientific-python.org/summits/developer/2023/)
## Notes
* hdf5 pains
* Figuring out what dtypes are present in an HDF5 dataset
* C structure (from last meeting) – [link](https://github.com/GraphBLAS/binsparse-reference-impl/blob/main/include/binsparse/c_bindings/binsparse_matrix.h)
* 2.0
* "Bundled" levels: https://mybinder.org/v2/gh/GraphBLAS/binsparse-specification/main?filepath=sparsetensorviz/notebooks/Example_Rank4-bundled.ipynb
* Discussion: https://github.com/GraphBLAS/binsparse-reference-impl/pull/1
* v1
* What does the in memory implementation need to do?
* Could we go for a fairly simple file format, npz
* If high level, should we define a structure that isn't used?
* E.g. MLIR, TACO – and are these optimizing for the same thing
* Willow: they are all using the same buffers
* Maybe not 1-indexed languages, but basically the same
* TODO
* Move from design docs to spec
* Willow can follow up on interoperability with Finch
* NASA Grant
* https://science.nasa.gov/researchers/solicitations/roses-2022/amendment-73-f15-high-priority-open-source-science-final-text
* https://nspires.nasaprs.com/external/viewrepositorydocument/cmdocumentid=860825/solicitationId=%7BB364DBB8-390B-744D-013F-8F4C304B9A63%7D/viewSolicitationDocument=1/F.15%20HPOSS_Amend73.pdf