# N5 and strings
Michael and John
## changes
* added `RAW` DataType
* writing a non-enumerated type yields `RAW` type
* n5 backend writes byte array
* converter gives desired output type from byte array
## issues
* `createDataset` requires one of the enum `DataType`s
* make DataType more flexible
* or provide explicit func like "createRawDatasetAs"
* should we register a DataType along with a converter?
* register DataType string, and have it map to Raw internally
* (1) is there a native type we can convert to?
* (2) if not, is there a registered type we can map raw to?
* (3) if not, return null
* This should be a pain
* how should n5-hdf5 store strings?
* should we use its built-in functions?
* does this design prevent us from doing so?
* multi-dim
* HDF5 standard includes *multidimensional* arrays of *unicode* strings
* writing everything chunked
* John wants to enforce that chunks contain an integer number of objects
* N5FS treats `DataType.OBJECT` in that way (see `DefaultBlockWriter.java:71`)
* what do you think, Stephans?
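A minimal sketch of the (1)/(2)/(3) lookup order from the list above, assuming a registry that maps a type string to a converter from raw bytes. All names here (`RawTypeRegistry`, `NativeType`, `register`, `resolve`) are hypothetical and not part of N5. The two-entry enum only stands in for the real `DataType` enum, and big-endian byte order is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the three-step type lookup; none of these names exist in N5. */
final class RawTypeRegistry {

    /** Stand-in for the enumerated native types (assumption: only two shown here). */
    enum NativeType { INT32, FLOAT64 }

    /** Registered converters from a raw byte array to a non-native type. */
    private final Map<String, Function<byte[], ?>> converters = new HashMap<>();

    void register(final String typeName, final Function<byte[], ?> converter) {
        converters.put(typeName, converter);
    }

    /** Returns the converted value, or null if the type is neither native nor registered. */
    Object resolve(final String typeName, final byte[] raw) {
        // (1) is there a native type we can convert to?
        for (final NativeType type : NativeType.values()) {
            if (type.name().equalsIgnoreCase(typeName)) {
                return decodeNative(type, raw);
            }
        }
        // (2) if not, is there a registered type we can map raw to?
        final Function<byte[], ?> converter = converters.get(typeName);
        if (converter != null) {
            return converter.apply(raw);
        }
        // (3) if not, return null
        return null;
    }

    private static Object decodeNative(final NativeType type, final byte[] raw) {
        final ByteBuffer buf = ByteBuffer.wrap(raw); // big-endian by default (assumption)
        switch (type) {
            case INT32:
                return buf.getInt();
            case FLOAT64:
                return buf.getDouble();
            default:
                throw new IllegalArgumentException("unhandled type " + type);
        }
    }

    public static void main(final String[] args) {
        final RawTypeRegistry registry = new RawTypeRegistry();
        registry.register("string", raw -> new String(raw, StandardCharsets.UTF_8));
        System.out.println(registry.resolve("string", "abc".getBytes(StandardCharsets.UTF_8)));
        System.out.println(registry.resolve("unknown", new byte[0]));
    }
}
```

Returning `null` in step (3) pushes the failure onto the caller; throwing an exception there would be the obvious alternative.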
## TODO
* learn how zarr stores objects
* does zarr support variable-length
* write some strings with jhdf5
* multi-dim
* understand mechanics of N5FS writer
* (to find where to insert obj writing logic)
## examples
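A flat array of five strings:

[ "", "a", "ab", "ABC", "ABCD" ]

Chunked so that every chunk contains whole strings:

[ ["", "a", "ab"],
["ABC", "ABCD"] ]

Chunked so that one string ("ABC") is split across the chunk boundary:

[ ["", "a", "ab", "A..."],
["...BC", "ABCD"] ]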
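The object-aligned split of the five example strings can be sketched as follows. This is only an illustration of "chunks contain an integer number of objects", not how N5FS writes blocks. `ObjectAlignedChunks` and `chunk` are hypothetical names, and the chunk size is counted in objects rather than bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: chunk boundaries always fall between objects, never inside one. */
final class ObjectAlignedChunks {

    /** Splits objects into consecutive chunks of at most chunkSize whole objects. */
    static <T> List<List<T>> chunk(final List<T> objects, final int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        final List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < objects.size(); start += chunkSize) {
            final int end = Math.min(start + chunkSize, objects.size());
            // copy the sub-list so each chunk is independent of the input list
            chunks.add(new ArrayList<>(objects.subList(start, end)));
        }
        return chunks;
    }

    public static void main(final String[] args) {
        final List<String> strings = Arrays.asList("", "a", "ab", "ABC", "ABCD");
        // prints [[, a, ab], [ABC, ABCD]]
        System.out.println(chunk(strings, 3));
    }
}
```

Because variable-length objects differ in size, chunks built this way differ in byte length, so a reader cannot compute object offsets from the chunk size alone.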
## n5 issue draft
"storage of arbitrary data types"
@minnerbe helped with this issue.
N5 core does not support non-native or non-numeric types well: all `DataBlock` implementations are native and numeric. For example, the API cannot currently implement or wrap HDF5's string I/O (see https://github.com/saalfeldlab/n5-hdf5/issues/22).
### related
temporarily(?), for string support @minnerbe implemented:
### how zarr does it
Zarr supports writing object arrays via codecs in [numcodecs](https://numcodecs.readthedocs.io/en/stable/index.html). Existing options include `JSON`, `MsgPack`, and `Pickle`.
### proposal
1) Include a generic Object DataBlock in
a) consider a special case String DataBlock?
2) Add a codec interface
* could be similar to the existing [`Compression`](https://github.com/saalfeldlab/n5/blob/master/src/main/java/org/janelia/saalfeldlab/n5/Compression.java) n5 interface
* idea similar to [numcodecs](https://numcodecs.readthedocs.io/en/stable/index.html)
* For Zarr interop, we should add JSON and MsgPack encoders
* See [msgpack-java](https://github.com/msgpack/msgpack-java)
* MsgPack may be a good default for Zarr interop
* JSON is easy to implement
* Java's object serialization would be easy to add
* but we probably shouldn't use it
* Pickle - let's ignore
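A minimal sketch of what such a codec interface could look like, to make the proposal concrete. Everything here is an assumption: `ObjectCodec` and `LengthPrefixedStringCodec` are hypothetical names, and the interface is not modeled on the actual signature of `Compression`. The length-prefixed UTF-8 format is only a dependency-free stand-in for a JSON or MsgPack codec.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; neither the interface nor the codec exists in N5. */
final class CodecSketch {

    /** Encodes a block of objects to bytes and back. */
    interface ObjectCodec<T> {
        byte[] encode(List<T> objects);
        List<T> decode(byte[] bytes);
    }

    /** Stand-in codec: int32 element count, then (int32 byte length, UTF-8 bytes) per string. */
    static final class LengthPrefixedStringCodec implements ObjectCodec<String> {

        @Override
        public byte[] encode(final List<String> strings) {
            final List<byte[]> encoded = new ArrayList<>();
            int size = Integer.BYTES; // leading element count
            for (final String s : strings) {
                final byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
                encoded.add(utf8);
                size += Integer.BYTES + utf8.length;
            }
            final ByteBuffer buf = ByteBuffer.allocate(size);
            buf.putInt(strings.size());
            for (final byte[] utf8 : encoded) {
                buf.putInt(utf8.length);
                buf.put(utf8);
            }
            return buf.array();
        }

        @Override
        public List<String> decode(final byte[] bytes) {
            final ByteBuffer buf = ByteBuffer.wrap(bytes);
            final int count = buf.getInt();
            final List<String> strings = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                final byte[] utf8 = new byte[buf.getInt()];
                buf.get(utf8);
                strings.add(new String(utf8, StandardCharsets.UTF_8));
            }
            return strings;
        }
    }

    public static void main(final String[] args) {
        final ObjectCodec<String> codec = new LengthPrefixedStringCodec();
        final List<String> strings = Arrays.asList("", "a", "ab", "ABC", "ABCD");
        final byte[] bytes = codec.encode(strings);
        System.out.println(bytes.length + " bytes, round trip ok: " + codec.decode(bytes).equals(strings));
    }
}
```

A JSON or MsgPack codec would implement the same two methods, which keeps the choice of encoding out of the `DataBlock` itself.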