# Thoughts on how to store Quantify raw data

# Quantify data storage

## Motivation

Internally, Quantify runs an arbitrary schedule and retrieves an arbitrary result. I can think of the following examples of experiments that need to be storable:

- A T1 or measurement-calibration experiment, which is pretty much prepare-(wait)-measure.
- Surface-17, with a complicated pattern and a weird periodicity.
- Some completely random circuit with measurements in random places at random times.

I think that the way we deal with the data should not depend on the experiment we are running. At the same time, in the end we want to easily get an XArray out of the stored data to analyze it.

## Data restrictions

What we probably can guarantee:

- We have some number of acquisition channels in the setup, and we probably want to label them in a sane way (`'Q0'`, `'Z1'`, `'Data qubit 1'`, whatever).
- Each acquisition channel returns a piece of data of a fixed size: int, float or complex, or even `'<u3'` in some crazy cases, who knows :)
- We run some schedule repeatedly (or not? :)).

What we can't guarantee:

- That we can define a sane name for the retrieved variable.
- That we know the data type of the variable: do we store 0 or 1 (int), a voltage (float/complex) or even a full trace (array)?
- The exact periodic structure of the resulting array: it heavily depends on the periodicity of the schedule.

## Sparse data storage

Even before knowing the schedule, we can already say something about our setup. I will use the Surface-17 experiment as an example, because it is not too simple and not too hard. In that case we already know that we have 17 acquisition channels:

```python=
metadata = {
    'acquisition_channels': [f"Z{i}" for i in range(4)]
                          + [f"X{i}" for i in range(4)]
                          + [f"D{i}" for i in range(9)]
}
```

For example, we use hardware that gives us a thresholded 0 or 1. To make things complicated, we can also choose to store the averaged IQ value for the data qubits. We probably have to store this in the metadata as well:

```python=
metadata['acquisition_units'] = {
    f"Z{i}": (int, ["|0>", "|1>"]) for i in range(4)
} | {
    f"X{i}": (int, ["|0>", "|1>"]) for i in range(4)
} | {
    f"D{i}": (complex, "V") for i in range(9)
}
```

Right now I am going to forget which experiment we are doing and start recording the outputs into two byte streams. After writing a round, I will get something like:

```
sparse_data = {
    "acquisition_channel": [0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, ..., 6, 7, 8, 9, ..., 15, 16],
    "acquisition_value": [0, 1, 0, 0, ..., 0, 1, 0.3+0.8j, ..., 0.7 - 0.3j],
}
```

but dumped into two byte buffers. Let's make a placeholder:

```python=
sparse_data = {
    "acquisition_channel": "PLACEHOLDER FOR BYTE ARRAY",
    "acquisition_value": "PLACEHOLDER FOR BYTE ARRAY",
}
```

This is what I call the "sparse data structure", and it is **this** that we should store to disk as the raw experiment data in the first place, not an XArray. (*Note:* there should be some additional data or metadata fields here.)

Advantages:

- This data structure can store any schedule in an experiment-agnostic manner.
- Since it is internally periodic for sane experiments, a correct choice of strides lets us reinterpret these byte buffers as a set of *non-contiguous* NumPy arrays, which can then be used to construct an XArray (the strides can be computed from the metadata); see the sketch after this list.
- We can just dump the numbers from our setup straight into a stream, which is a very productive way to work with our equipment.
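To make the strides idea concrete, below is a minimal sketch of the zero-copy reconstruction (the per-round layout and all names are hypothetical, assuming the Surface-17 example above with the ancillas stored as thresholded `uint8` values and the data qubits as `complex128` IQ values):

```python=
import numpy as np
import xarray as xr

# Hypothetical per-round layout: 8 thresholded uint8 results
# (Z0..Z3, X0..X3) followed by 9 complex128 IQ values (D0..D8).
ANCILLA_BYTES = 8 * np.dtype(np.uint8).itemsize    # 8 bytes
DATA_BYTES = 9 * np.dtype(np.complex128).itemsize  # 144 bytes
ROUND_BYTES = ANCILLA_BYTES + DATA_BYTES           # 152 bytes per round


def channel_view(buf, n_rounds, offset, dtype):
    """Zero-copy view of one acquisition channel across all rounds."""
    return np.ndarray(
        shape=(n_rounds,),
        dtype=dtype,
        buffer=buf,
        offset=offset,
        strides=(ROUND_BYTES,),  # jump one full round per element
    )


# Fake two rounds of data, just to show the mechanics.
rng = np.random.default_rng(0)
chunks = []
for _ in range(2):
    chunks.append(rng.integers(0, 2, size=8, dtype=np.uint8).tobytes())
    chunks.append((rng.normal(size=9) + 1j * rng.normal(size=9)).tobytes())
acquisition_value = b"".join(chunks)

z0 = channel_view(acquisition_value, n_rounds=2, offset=0, dtype=np.uint8)
d0 = channel_view(acquisition_value, n_rounds=2, offset=ANCILLA_BYTES,
                  dtype=np.complex128)

dataset = xr.Dataset({"Z0": ("round", z0), "D0": ("round", d0)})
```

The per-channel views share memory with the byte buffer, so nothing is copied until something downstream actually needs a contiguous array.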
Disadvantages:

- We don't have XArray writing facilities, so we have to store the data ourselves.
- We need some data processing to extract an XArray with all its introspection and plotting facilities, but I think it is possible to do this with zero data copies, by constructing NumPy arrays from the same byte buffer and computing the correct strides.
- The reshape to an XArray is non-universal and heavily experiment-dependent. I think this is unavoidable if we want Quantify to be a tool for generic experiments.

Questions I didn't think about:

- Live plotting
- How to store timing information

In the end, we should get something like:

```python=
data = dict(
    acquisition_channel_index="ints",  # Index of the channel
    acquisition_value="bytes",  # Binary data
    metadata=dict(
        acquisition_channels=["Z0", "Z1", ..., "D8"],
        round_offset=42,  # Offset in bytes to get to the data of the next experiment repetition
    ),
)
```
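For completeness, here is a minimal sketch of the writer side that fills the two byte streams (`SparseRecorder` and the dtype choices are hypothetical, not an existing Quantify API):

```python=
import numpy as np


class SparseRecorder:
    """Hypothetical helper that appends (channel index, value) pairs to
    two growing byte buffers, matching the sparse data structure above."""

    def __init__(self):
        self.acquisition_channel_index = bytearray()
        self.acquisition_value = bytearray()

    def record(self, channel_index, value):
        # Channel indices as uint16; values as the raw bytes of
        # whatever fixed-size dtype this channel produces.
        self.acquisition_channel_index += np.uint16(channel_index).tobytes()
        self.acquisition_value += np.asarray(value).tobytes()


recorder = SparseRecorder()
recorder.record(0, np.uint8(1))                # Z0: thresholded bit
recorder.record(8, np.complex128(0.3 + 0.8j))  # D0: averaged IQ value
```

Together with the metadata (the per-channel dtypes and `round_offset`), this is enough for the strided reconstruction sketched earlier.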