## Starter implementation
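The implementation will look mostly like this: a recursive function that walks the type and the array layout together, raising `TypeError` wherever they disagree.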
```python=
import awkward as ak


def actually_use(array):
    # placeholder for whatever will consume the buffers (e.g. the RNTuple writer)
    print(f"using {array!r}")


def array_to_type(array, type):
    # type: unknown
    if isinstance(type, ak._v2.types.UnknownType):
        raise TypeError("cannot write data of unknown type to RNTuple")

    # type: primitive (e.g. "float32")
    elif isinstance(type, ak._v2.types.NumpyType):
        if isinstance(array, ak._v2.contents.IndexedArray):
            array_to_type(array.project(), type)  # always project IndexedArray
            return
        elif isinstance(array, ak._v2.contents.EmptyArray):
            array_to_type(
                array.toNumpyArray(
                    ak._v2.types.numpytype.primitive_to_dtype(type.primitive)
                ),
                type,
            )
            return
        elif isinstance(array, ak._v2.contents.NumpyArray):
            if array.form.type != type:
                raise TypeError(f"expected {type!s}, found {array.form.type!s}")
            else:
                actually_use(array.data)
                return
        else:
            raise TypeError(f"expected {type!s}, found {array.form.type!s}")

    # type: regular-length lists (e.g. "3 * float32")
    elif isinstance(type, ak._v2.types.RegularType):
        if isinstance(array, ak._v2.contents.IndexedArray):
            array_to_type(array.project(), type)  # always project IndexedArray
            return
        elif isinstance(array, ak._v2.contents.RegularArray):
            if array.size != type.size:
                raise TypeError(f"expected {type!s}, found {array.form.type!s}")
            else:
                if type.parameter("__array__") == "string":
                    # maybe the fact that this is a string changes how it's used
                    actually_use(f"regular strings of length {type.size}")
                else:
                    actually_use(f"regular lists of length {type.size}")
                array_to_type(array.content, type.content)
                return
        else:
            raise TypeError(f"expected {type!s}, found {array.form.type!s}")

    # type: variable-length lists (e.g. "var * float32")
    elif isinstance(type, ak._v2.types.ListType):
        if isinstance(array, ak._v2.contents.IndexedArray):
            array_to_type(array.project(), type)  # always project IndexedArray
            return
        elif isinstance(array, ak._v2.contents.ListArray):
            array_to_type(array.toListOffsetArray64(True), type)
            return
        elif isinstance(array, ak._v2.contents.ListOffsetArray):
            if type.parameter("__array__") == "string":
                # maybe the fact that this is a string changes how it's used
                actually_use("variable-length strings")
            else:
                actually_use("variable-length lists")
            actually_use(array.offsets.data)
            array_to_type(array.content, type.content)
            return
        else:
            raise TypeError(f"expected {type!s}, found {array.form.type!s}")

    # type: potentially missing data (e.g. "?float32")
    elif isinstance(type, ak._v2.types.OptionType):
        raise NotImplementedError("RNTuple does not yet have an option-type")

    # type: struct-like records (e.g. "{x: float32, y: var * int64}")
    elif isinstance(type, ak._v2.types.RecordType):
        if isinstance(array, ak._v2.contents.IndexedArray):
            array_to_type(array.project(), type)  # always project IndexedArray
            return
        elif isinstance(array, ak._v2.contents.RecordArray):
            actually_use("begin record")
            for field, subtype in zip(type.fields, type.contents):
                actually_use(f"field {field}")
                array_to_type(array[field], subtype)
            actually_use("end record")
            return
        else:
            raise TypeError(f"expected {type!s}, found {array.form.type!s}")

    # type: heterogeneous unions/variants (e.g. "union[float32, var * int64]")
    elif isinstance(type, ak._v2.types.UnionType):
        if isinstance(array, ak._v2.contents.IndexedArray):
            array_to_type(array.project(), type)  # always project IndexedArray
            return
        elif isinstance(array, ak._v2.contents.UnionArray):
            actually_use("begin union")
            actually_use(array.tags.data)
            actually_use(array.index.data)
            for index, subtype in enumerate(type.contents):
                actually_use(f"index {index}")
                array_to_type(array.project(index), subtype)
            actually_use("end union")
            return
        else:
            raise TypeError(f"expected {type!s}, found {array.form.type!s}")

    else:
        raise AssertionError(f"type must be an Awkward Type, not {type!r}")
```
## All of the layout node types
### EmptyArray
type: UnknownType
packed -> self (unchanged)
### NumpyArray
type: NumpyType
packed -> converts to a contiguous array, wrapping multidimensional data in RegularArray nodes, so `.data` will always be one-dimensional
All of the dtypes: https://github.com/scikit-hep/awkward/blob/31f3afb6bd31949bb2014d3be4b86194a63a573c/src/awkward/_v2/types/numpytype.py#L66-L82
### RegularArray
type: RegularType
packed -> RegularArray without extraneous content (so don't worry about that)
### ListOffsetArray
type: ListType
packed -> trims extraneous content and ensures that offsets start with zero
### ListArray
type: ListType
packed -> converts to ListOffsetArray, so don't worry about this type
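
To make the ListArray-to-ListOffsetArray conversion concrete, here is a minimal pure-Python sketch of the idea. It is not Awkward's implementation: the function name `list_to_listoffset` and the sample buffers are made up for illustration, and plain lists stand in for the real index and content buffers.

```python
def list_to_listoffset(starts, stops, content):
    """Turn (starts, stops, content) into (offsets, packed content).

    Hypothetical helper: each list i is content[starts[i]:stops[i]].
    The result has offsets beginning at zero and no unreachable content.
    """
    offsets = [0]
    packed = []
    for start, stop in zip(starts, stops):
        packed.extend(content[start:stop])
        offsets.append(len(packed))
    return offsets, packed


# Represents [[1.1, 2.2, 3.3], [], [4.4, 5.5]]; the 9.9 is unreachable.
starts = [0, 3, 4]
stops = [3, 3, 6]
content = [1.1, 2.2, 3.3, 9.9, 4.4, 5.5]

offsets, packed = list_to_listoffset(starts, stops, content)
print(offsets)  # [0, 3, 3, 5]
print(packed)   # [1.1, 2.2, 3.3, 4.4, 5.5]
```

After this step only one `offsets` buffer and a trimmed content remain, which is why the starter implementation only needs to handle ListOffsetArray.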
### IndexedArray
type: type of content
packed -> projects itself out, so don't worry about this type
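
As a rough illustration of what "projecting out" means, the sketch below applies an index to a content using plain Python lists. The helper name `project_indexed` is hypothetical and the lists stand in for the real buffers.

```python
def project_indexed(index, content):
    """Hypothetical helper: materialize content[index], removing the indirection."""
    return [content[i] for i in index]


content = ["a", "b", "c"]
index = [2, 2, 0, 1]

projected = project_indexed(index, content)
print(projected)  # ['c', 'c', 'a', 'b']
```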
### IndexedOptionArray
type: OptionType
packed -> ensures that all missing values are -1 in the index and the other values are strictly increasing
So `[55, 22, 44, None, 11, None, None, 99]` has an index of
```
[0, 1, 2, -1, 3, -1, -1, 4]
```
and the content is contiguous
```
[55, 22, 44, 11, 99]
```
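
The packed layout in the example above can be reproduced with a short pure-Python sketch. The function name `pack_option` is made up for illustration, and this is only a model of the invariant (missing values are `-1`, the rest count up from zero), not Awkward's own code.

```python
def pack_option(values):
    """Hypothetical helper: split a list with None entries into (index, content)."""
    index = []
    content = []
    for value in values:
        if value is None:
            index.append(-1)            # missing values are always -1
        else:
            index.append(len(content))  # valid entries are strictly increasing
            content.append(value)
    return index, content


index, content = pack_option([55, 22, 44, None, 11, None, None, 99])
print(index)    # [0, 1, 2, -1, 3, -1, -1, 4]
print(content)  # [55, 22, 44, 11, 99]
```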
### ByteMaskedArray
type: OptionType
packed -> converts to IndexedOptionArray if the content is a RecordArray and just gives you a trimmed, cleaned up ByteMaskedArray otherwise
So the above example, with `valid_when=False` (a `True` in the mask marks a missing value), would have a mask of
```
[False, False, False, True, False, True, True, False]
```
and the content is (the values at masked positions are arbitrary)
```
[55, 22, 44, 31434134, 11, -23432, 23454, 99]
```
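
The relationship between this mask representation and the IndexedOptionArray one can be sketched in pure Python. The function name `bytemask_to_indexed` is hypothetical, and the sketch assumes `valid_when=False` for the example data, so `True` marks a missing value.

```python
def bytemask_to_indexed(mask, content, valid_when=False):
    """Hypothetical helper: convert (mask, content) into (index, packed content).

    An entry is valid when its mask value equals valid_when; the placeholder
    values sitting at masked positions are dropped.
    """
    index = []
    packed = []
    for is_set, value in zip(mask, content):
        if is_set == valid_when:
            index.append(len(packed))
            packed.append(value)
        else:
            index.append(-1)
    return index, packed


mask = [False, False, False, True, False, True, True, False]
content = [55, 22, 44, 31434134, 11, -23432, 23454, 99]

index, packed = bytemask_to_indexed(mask, content)
print(index)   # [0, 1, 2, -1, 3, -1, -1, 4]
print(packed)  # [55, 22, 44, 11, 99]
```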
### BitMaskedArray
type: OptionType
packed -> converts to IndexedOptionArray if the content is a RecordArray and just gives you a trimmed, cleaned up BitMaskedArray otherwise
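
A bit mask stores the same information as a byte mask, eight entries per byte. The sketch below packs the byte mask from the ByteMaskedArray example into bytes in pure Python. The helper name `bytemask_to_bitmask` is made up, and the reading of `lsb_order=True` as "entry 0 goes in the least significant bit" is an assumption to check against the Awkward documentation.

```python
def bytemask_to_bitmask(mask, lsb_order=True):
    """Hypothetical helper: pack booleans into bytes, eight entries per byte."""
    out = []
    for start in range(0, len(mask), 8):
        byte = 0
        for position, is_set in enumerate(mask[start:start + 8]):
            if is_set:
                # assumption: lsb_order=True puts the first entry in bit 0
                shift = position if lsb_order else 7 - position
                byte |= 1 << shift
        out.append(byte)
    return out


mask = [False, False, False, True, False, True, True, False]

lsb_bits = bytemask_to_bitmask(mask, lsb_order=True)
msb_bits = bytemask_to_bitmask(mask, lsb_order=False)
print([f"{b:08b}" for b in lsb_bits])  # ['01101000']
print([f"{b:08b}" for b in msb_bits])  # ['00010110']
```

Because a bit mask is padded to a whole number of bytes, the length of the array has to be tracked separately from the mask buffer.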
### UnmaskedArray
type: OptionType
packed -> packs through content
### RecordArray
type: RecordType
packed -> packs through content
### UnionArray
type: UnionType
packed -> cleans up tags and index, packs through content
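
To show how `tags` and `index` work together, here is a small pure-Python model of a union. The helper names `union_to_list` and `project_union` and the sample data are hypothetical; plain lists stand in for the real buffers, and `project_union` only mimics what `array.project(index)` is used for in the starter implementation.

```python
def union_to_list(tags, index, contents):
    """Hypothetical helper: entry i is contents[tags[i]][index[i]]."""
    return [contents[tag][i] for tag, i in zip(tags, index)]


def project_union(tags, index, contents, which):
    """Hypothetical helper: keep only the entries drawn from contents[which]."""
    return [contents[which][i] for tag, i in zip(tags, index) if tag == which]


# Models the union array [1.1, [1, 2], 2.2, [3]]
tags = [0, 1, 0, 1]
index = [0, 0, 1, 1]
contents = [[1.1, 2.2], [[1, 2], [3]]]

everything = union_to_list(tags, index, contents)
floats = project_union(tags, index, contents, 0)
lists = project_union(tags, index, contents, 1)
print(everything)  # [1.1, [1, 2], 2.2, [3]]
print(floats)      # [1.1, 2.2]
print(lists)       # [[1, 2], [3]]
```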
### EmptyArray has
* `.toNumpyArray(dtype)`
### Option-type arrays all have
* `.toIndexedOptionArray64()`
* `.toByteMaskedArray(valid_when)`
* `.toBitMaskedArray(valid_when, lsb_order)`
### How to make an Awkward Array
```python=
array = ak._v2.Array([[1.1, 2.2, 3.3], [], [4.4, 5.5]])
```
### How to make an Awkward Type
```python=
array.type
```
or
```python=
ak._v2.types.from_datashape("var * float64", highlevel=False)
```
One last thing: if the type has `ArrayType` as its outermost structure, strip it off with `type.content`. That outermost layer only records the length of the array, which is not information you need here.