This document describes how the data and metadata from MX experiments could be stored in a metadata catalogue in a generic and extensible manner.
# Background
## Metadata catalogue
### Database overview
This document assumes that the metadata catalogue is ICAT, whose main entities are:
1. Investigation: contains all the information concerning the experimental session. In a `BAG`, each allocated time slot on a beamline will be a different investigation.
2. User: a person who participates in an experiment and has a role (local contact, proposer, scientist, co-proposer, etc.)
3. Sample: the specimen that will be measured on an instrument
4. Dataset: any measurement done during the data acquisition or any analysis performed. A set of data and metadata will be associated with each dataset
```graphviz
digraph hierarchy {
nodesep=1.0 // increases the separation between nodes
node [color=Red,fontname=Courier,shape=box] //All nodes will have this shape and colour
edge [color=Blue, style=dashed] //All the lines look like this
Investigation->{Sample}
Investigation->{Instrument}
Investigation->{Users}
Dataset->{Datafile}
Sample->{Dataset}
}
```
A more detailed schema can be found [here](https://icatproject.org/user-documentation/icat-schema/) and the full schema [here](https://repo.icatproject.org/site/icat/server/4.10.0/schema.html)
## Data flow
This section summarizes how the sample information is inserted into the system, transferred to the data acquisition software, and then catalogued after the data acquisition.
There are three different software "sub-systems":
1. Tracking system: sample and experiment description
2. Data Acquisition software
3. Processing
```graphviz
digraph hierarchy {
nodesep=1.0 // increases the separation between nodes
TrackingDatabase [color=blue,shape=cylinder]
UserOffice [color=blue,shape=box]
User [color=blue,shape=box]
TrackingSystem [color=blue,shape=box]
MxCube [color=Red,shape=box]
MetadataCatalogue [color=Red,shape=box]
Database [color=Red,shape=cylinder]
ProcessingPipeline [color=Brown,shape=box]
User->TrackingSystem [label="2) populates"]
UserOffice->TrackingSystem [label="1) populates"]
TrackingSystem->{TrackingDatabase}
MxCube->TrackingSystem [label="3) uses" style=dashed]
MxCube->MetadataCatalogue [label="4) push"]
MxCube->ProcessingPipeline [label="5) triggers" style=dashed]
ProcessingPipeline->MetadataCatalogue [label="6) push"]
{rank=same;TrackingSystem MxCube ProcessingPipeline}
{rank=same;TrackingDatabase Database }
MetadataCatalogue->Database []
}
```
> Note: The tracking system has become optional and has its own database. Although the tracking system is most likely mandatory for automated or highly automated beamlines, it is, in principle, possible to run the experiment without it.
---
1. **[optional]** The user office management system pushes information about the nature of the sample (chemical formula, protein acronym, safety level, etc.). This process is done automatically.
2. **[optional]** The tracking system has a UI where users can enrich the sample information already retrieved in step 1 with:
    * Shipping information: parcel, container type, location, etc.
    * Experimental plan: data acquisition parameters (exposure time, number of images, etc.)
    * Processing plan: processing parameters (name of pipelines, forced space group, etc.)
3. **[optional]** Information about the sample can be retrieved by MxCube from the TrackingSystem
4. Data and metadata are pushed from MxCube to the metadata catalogue. This step consists of the automatic capture of the metadata that will then be stored in the catalogue
5. **[optional]** When available, MxCube can trigger the online data analysis
6. **[optional]** Results of the pipelines will be catalogued in the same way as the raw data
---
# Tracking system
## Sample description and tracking [Optional]
Optionally, before the experiment, it is possible to describe the samples and how they will be shipped, acquired and processed.
This includes:
- Shipment
    - Parcel description and tracking: where the parcel is at any time and what its content is
    - Sample changer location
- Information about how samples should be collected and processed
    - Experimental plan: how the data will be acquired
    - Processing plan: how the data will be processed
### Example
This diagram shows the high-level entities whose information could eventually be stored in the `SampleTracking` database.
```graphviz
digraph hierarchy {
nodesep=1.0 // increases the separation between nodes
node [color=Red,fontname=Courier,shape=box] //All nodes will have this shape and colour
edge [color=Blue, style=dashed] //All the lines look like this
Trypsine->{Native Soak Other}
Native->{Puck1}
Soak->{Puck2 }
Other->{Puck2 Puck3}
Puck1->{Dewar}
Puck2->{Dewar}
Puck3->{Dewar}
Dewar->{SampleChanger}
}
```
### Metadata requirements
Currently in ISPyB, the main entities used in the description and tracking of the sample are:
1. Protein
2. Crystal
3. Sample
4. Container
5. Dewar
6. Diffraction Plan
The next diagram shows the main fields of such entities:
```mermaid
classDiagram
direction LR
Protein <-- Crystal
Crystal <-- Sample
Sample <-- Container
Container <-- Dewar
Sample <-- DiffractionPlan
class Protein{
name
acronym
safetyLevel
}
class Crystal{
crystalUUID
name
spaceGroup
cell_a
cell_b
cell_c
cell_alpha
cell_beta
cell_gamma
pdbFileName
pdbFilePath
}
class Sample{
location
holderLength
loopLength
loopType
wireWidth
structureStage
smiles
}
class Container{
type
capacity
sampleChangerLocation
wireWidth
containerStatus
barcode
}
class Dewar{
storageLocation
capacity
customsValue
transportValue
trackingNumber
}
class DiffractionPlan{
exposureTime
experimentKind
observedResolution
radiationSensitivity
preferredBeamDiameter
aimedCompleteness
aimedIOverSigmaAtHighestRes
aimedMultiplicity
aimedResolution
anomalousData
complexity
forcedSpaceGroup
requiredMultiplicity
requiredResolution
strategyOption
kappaStrategyOption
numberOfPositions
minOscWidth
axisRange
}
```
Furthermore, a processing plan has been added to the model. Note that a processing plan can have one or more `HighResolutionCutoffCriterion` entries:
```mermaid
classDiagram
direction LR
ProcessingPlan *-- HighResolutionCutoffCriterion
class ProcessingPlan{
Name
UserParameter
ReferenceHKLPath
StatisticsProgram
StatisticsBinning
StatisticsNumberOfBins
MolecularReplacementFromCell
MolecularReplacementFromUserModel
High_Resolution_Cutoff_Criteria : [HighResolutionCutoffCriterion]
}
class HighResolutionCutoffCriterion{
HighResolutionCutoffIsotropic
HighResolutionCutoffCriterion
HighResolutionCutoffLowThreshold
HighResolutionCutoffHighThreshold
}
```
> Note: The names of the fields are still to be decided, ideally following the NeXus convention where possible, or some other ontology
#### Implementation
The sample tracking data model is the result of generalizing the above to support any kind of experiment.
```mermaid
classDiagram
direction LR
Shipment *-- Parcel : hasParcels
Parcel *-- Item : hasItems
Item *-- Item : hasItems
Item *-- Parameter
class Shipment{
name
investigationId
investigationName
courierAccount
courierCompany
description
comments
status
defaultReturnAddress
defaultShippingAddress
parcels
}
class Parcel{
name
shipmentId
description
containsDangerousGoods
currentStatus
statuses
returnAddress
shippingAddress
storageConditions
defaultShippingAddress
comments
content : [Item]
}
class Item{
name
description
sampleId
type
comments
containerType
sampleContainerPosition
content : [Item]
experimentPlan : [Parameter]
processingPlan : [Parameter]
}
class Parameter{
key
value
}
```
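
To make the generic model more concrete, the sketch below shows how a puck and one of its samples could be expressed as `Item` entries with their `Parameter` lists. The field names come from the classes above; the container type, sample name and plan keys are purely illustrative assumptions.
```python
# Illustrative only: a puck expressed as an Item that contains a sample Item.
# Field names follow the Shipment/Parcel/Item/Parameter model above; the
# values and the experiment/processing plan keys are invented for the example.
puck = {
    "name": "Puck1",
    "type": "UNIPUCK",  # assumed container type
    "content": [
        {
            "name": "trypsine_native_01",
            "type": "SAMPLE",
            "sampleContainerPosition": 1,
            "experimentPlan": [
                {"key": "exposureTime", "value": "0.02"},
                {"key": "numberOfImages", "value": "3600"},
            ],
            "processingPlan": [
                {"key": "forcedSpaceGroup", "value": "P212121"},
            ],
        }
    ],
}
```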
# Data catalogue
## Samples
## Datasets
A dataset is a set of metadata parameters and datafiles that is linked to a sample and an investigation.
The metadata parameters are a list of well-known keys that follow a specific naming convention. The list of valid keys can be found [here](https://gitlab.esrf.fr/icat/hdf5-master-config/-/blob/master/hdf5_cfg.xml)
The type of a dataset can be: `raw` or `processed`.
### Raw
#### Raw and primitives
Raw datasets contain raw data: typically, but not only, images coming from the detector or any kind of optical device.
A raw dataset can, in some cases, be considered as a data collection. The main types of data collection (or primitives) are:
* Mesh
* Automesh
* Line
* Oscillation
* Reference
* Data collection
* Snapshots
#### Workflows and groups
Sometimes a measurement is composed of a set of primitives. In that case, apart from the sample, we should indicate that these primitives form part of a group.
Example of MXPressE workflow:
```flow
st=>start: MXPressE
e=>end: End
snapshots=>operation: SNAPSHOTS
automesh=>operation: AUTOMESH
mesh=>operation: MESH
line=>operation: LINE
ref=>operation: REF
osc=>operation: DATACOLLECTION
st->snapshots->automesh->mesh->line->ref->osc->e
```
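
One possible way to express such a group, sketched below, is to tag every primitive dataset produced by the workflow with the same workflow identifier. The `workflowName` and `workflowStatus` keys reuse fields listed in the raw dataset metadata below; the identifier format is an assumption.
```python
# Sketch only: the primitives of one MXPressE run share the same workflow
# identifier so that the catalogue can group them. The identifier format is
# an assumption; workflowName/workflowStatus reuse raw dataset metadata keys.
workflow_name = "MXPressE_trypsine_20230713_001"

primitives = ["SNAPSHOTS", "AUTOMESH", "MESH", "LINE", "REF", "DATACOLLECTION"]

datasets = [
    {
        "datasetName": primitive.lower(),
        "workflowName": workflow_name,
        "workflowStatus": "SUCCESS",
    }
    for primitive in primitives
]
```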
#### Metadata
```mermaid
classDiagram
class RawDataset{
experimentType
startTime
endTime
actualSampleSlotInContainer
actualContainerBarcode
actualContainerSlotInSC
dataCollectionNumber
axisStart
axisEnd
axisRange
overlap
numberOfImages
startImageNumber
numberOfPasses
exposureTime
imageDirectory
imagePrefix
imageSuffix
imageContainerSubPath
fileTemplate
wavelength
resolution
detectorDistance
xBeam
yBeam
xBeamPix
yBeamPix
slitGapVertical
slitGapHorizontal
transmission
synchrotronMode
rotationAxis
phiStart
kappaStart
omegaStart
resolutionAtCorner
undulatorGap1
undulatorGap2
undulatorGap3
beamSizeAtSampleX
beamSizeAtSampleY
centeringMethod
actualCenteringPosition
beamShape
flux
flux_end
workflowName
workflowStatus
crystalSizeX
crystalSizeY
crystalSizeZ
}
```
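
For illustration, a raw oscillation dataset could carry parameters such as the ones below. The keys are taken from the fields above, but the values are invented, and the authoritative list of allowed keys remains the hdf5 configuration linked in the Datasets section.
```python
# Illustrative values only; keys come from the RawDataset fields above.
raw_dataset_parameters = {
    "experimentType": "OSC",
    "exposureTime": 0.02,        # seconds per image
    "numberOfImages": 3600,
    "axisStart": 0.0,            # degrees
    "axisRange": 0.1,            # degrees per image
    "wavelength": 0.976,         # Angstrom
    "resolution": 1.8,           # Angstrom
    "detectorDistance": 250.0,   # mm
    "transmission": 10.0,        # percent
    "beamSizeAtSampleX": 0.05,   # mm
    "beamSizeAtSampleY": 0.05,   # mm
}
```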
### Processed
Any dataset that has been `derived` from raw data is considered processed. Any result from an analysis tool or pipeline should be stored as a processed dataset and linked to the input datasets.
A processed dataset can be linked to multiple raw datasets and vice versa.
Example:
```graphviz
digraph hierarchy {
nodesep=1.0
node [color=Blue,fontname=Courier,shape=box]
Raw_1 [color=Red,fontname=Courier,shape=box]
Raw_2 [color=Red,fontname=Courier,shape=box]
edge [color=Blue, style=dashed]
Raw_1->{Processed_1 Processed_2 Processed_3}
Raw_2->{Processed_3}
}
```
Intermediate results can be stored as datasets too:
```graphviz
digraph hierarchy {
nodesep=1.0
node [color=Blue,fontname=Courier,shape=box]
Raw_1 [color=Red,fontname=Courier,shape=box]
edge [color=Blue, style=dashed]
Raw_1->{Processed_1_1 }
Processed_1_1->{Processed_1_2}
Processed_1_2->{Processed_1_3}
}
```
#### Metadata
The following diagram shows how processed results are currently stored in ISPyB:
```mermaid
classDiagram
direction LR
AutoProcIntegration *-- AutoProcProgram
AutoProcProgram *-- AutoProcScalingStatistics
AutoProcScalingStatistics *-- AutoProcScaling
AutoProcScaling *-- AutoProc
AutoProcScaling *-- Phasing
class AutoProcIntegration{
startImageNumber
endImageNumber
refinedDetectorDistance
refinedXBeam
refinedYBeam
rotationAxisX
rotationAxisY
rotationAxisZ
beamVectorX
beamVectorY
beamVectorZ
cell_a
cell_b
cell_c
cell_alpha
cell_beta
cell_gamma
anomalous
}
class AutoProcProgram{
processingPrograms
processingStatus
processingMessage
processingEnvironment
}
class AutoProcScalingStatistics{
scalingStatisticsType
resolutionLimitLow
resolutionLimitHigh
rMerge
rMeasWithinIPlusIMinus
rMeasAllIPlusIMinus
rPimWithinIPlusIMinus
rPimAllIPlusIMinus
fractionalPartialBias
nTotalObservations
nTotalUniqueObservations
meanIOverSigI
completeness
multiplicity
anomalousCompleteness
anomalousMultiplicity
anomalous
ccHalf
ccAno
sigAno
isa
completenessSpherical
completenessEllipsoidal
anomalousCompletenessSpherical
anomalousCompletenessEllipsoidal
}
class AutoProcScaling{
resolutionEllipsoidAxis11
resolutionEllipsoidAxis12
resolutionEllipsoidAxis13
resolutionEllipsoidAxis21
resolutionEllipsoidAxis22
resolutionEllipsoidAxis23
resolutionEllipsoidAxis31
resolutionEllipsoidAxis32
resolutionEllipsoidAxis33
resolutionEllipsoidValue1
resolutionEllipsoidValue2
resolutionEllipsoidValue3
}
class AutoProc{
spaceGroup
refinedCell_a
refinedCell_b
refinedCell_c
refinedCell_alpha
refinedCell_beta
refinedCell_gamma
}
class Phasing{
numberOfBins
binNumber
lowRes
highRes
metric
statisticsValue
method
solventContent
enantiomorph
lowRes
highRes
groupName
}
```
# Data Ingestion
Data and metadata are automatically captured from the data acquisition and processing software and pushed to the ingester via [ActiveMQ](https://activemq.apache.org/) messages.
The data flow is in one direction only, from beamline to catalogue. Ideally, except for technical circumstances that could eventually prevent this, the downstream process should rely only on the information that is stored on the beamline or has been propagated (e.g. MxCube triggers the processing pipelines with the diffraction and processing plan parameters).
```graphviz
digraph hierarchy {
nodesep=1.0 // increases the separation between nodes
MxCube [color=Brown,shape=box]
Ingester [color=Blue,shape=box]
ProcessingPipeline [color=Brown,shape=box]
MetadataCatalogue [color=Brown,shape=box]
Tape [color=Brown,shape=box]
MxCube->Ingester [label="3) pushes" style=dashed]
Ingester->MetadataCatalogue [label="4) stores"]
Ingester->Tape [label="5) Archives"]
MxCube->ProcessingPipeline [label="5) triggers" style=dashed]
ProcessingPipeline->Ingester [label="6) pushes"]
}
```
A dataset is a set of data and metadata that it makes sense to group together as the result of a single "action" or "calculation".
For instance, it might be the result of the same analysis or acquisition process. Nevertheless, the level of granularity is to be defined on a case-by-case basis.
Examples of datasets:
- Raw datasets
- Line
- Characterisation
- Mesh
- ...
- Processed datasets
- Integration
- Subtraction
- Grenades
- Autoproc
## Storing datasets
Datasets are catalogued in real time. In order to store a dataset, a message has to be sent from the client to the message broker (ActiveMQ server).
These messages have a generic, well-known format for any kind of dataset.
The message sent to the message broker contains all the information that allows the ingester to:
- create a dataset and store its data and metadata,
- link it to the proposal,
- associate the rich metadata with the data that has been produced.

To do so, the message is composed of the following fields (a minimal sketch of such a message follows the list):
1. Proposal name
2. Sample name
3. Dataset name
4. startDate
5. endDate
6. Location: folder with the data
7. Parameters: an array of key-value pairs with the allowed parameters
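
A minimal sketch of pushing such a message with the stomp.py client is shown below. The broker address, the queue name and the exact payload layout are assumptions and must match the actual ingester configuration.
```python
import json

import stomp  # generic STOMP client, commonly used with ActiveMQ

# Assumed values: broker host/port, queue name and payload layout are
# placeholders; the real ones are defined by the ingester configuration.
message = {
    "proposal": "mx1234",
    "sample": "trypsine",
    "dataset": "line",
    "startDate": "2023-07-13T09:00:00",
    "endDate": "2023-07-13T09:02:30",
    "location": "/data/visitor/mx1234/id23-1/20230713/raw_data/trypsine/MXpressA/line",
    "parameters": {"experimentType": "LINE", "exposureTime": 0.02},
}

conn = stomp.Connection([("activemq.example.org", 61613)])  # hypothetical host
conn.connect("user", "password", wait=True)                 # hypothetical credentials
conn.send(destination="/queue/icatIngest", body=json.dumps(message))  # hypothetical queue
conn.disconnect()
```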
## Dataset Folder
One of the constraints of this approach is that every dataset should have its own folder.
As a general recommendation, the following folder hierarchy is proposed:
```
/data/visitor/{proposal}/{beamline}/{session}/{sample}/{dataset}
```
However, paths can then be adapted depending on the needs and the type of experiment:
```
/data/visitor/{proposal}/{beamline}/{session}/raw_data/{sample}/{workflow}/{dataset}
```
Examples:
```
/data/visitor/mx1234/id23-1/20230713/raw_data/trypsine/MXpressA/line
/data/visitor/mx1234/id23-1/20230713/raw_data/trypsine/MXpressA/mesh
/data/visitor/mx1234/id23-1/20230713/raw_data/trypsine/MXpressA/Characterisation
/data/visitor/mx1234/id23-1/20230713/processed_data/trypsine/autoproc
/data/visitor/mx1234/id23-1/20230713/processed_data/trypsine/edna_proc
/data/visitor/mx1234/id23-1/20230713/processed_data/trypsine/grenades
```
> Note: The only constraint is to have one folder for each dataset. Therefore the filepath of a dataset can be used as an identifier
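
As a small sketch, the recommended template can be expanded as shown below; the values are taken from the examples above and, given the one-folder-per-dataset constraint, the resulting path doubles as the dataset identifier.
```python
from pathlib import Path

# Expand the recommended template; the values come from the examples above.
template = "/data/visitor/{proposal}/{beamline}/{session}/raw_data/{sample}/{workflow}/{dataset}"
location = Path(template.format(
    proposal="mx1234",
    beamline="id23-1",
    session="20230713",
    sample="trypsine",
    workflow="MXpressA",
    dataset="line",
))

# One folder per dataset: the path itself can serve as the dataset identifier.
dataset_id = str(location)
```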
### Permanent access to data
At the ESRF, data stays on disk for 90 days after the experiment and is then removed.
Once the data is removed, the links are broken and no files can be shown in the UI. Therefore, there is a need to keep a subset of files online. These are typically the files shown at the data collection level in ISPyB/EXI:
- the diffraction images,
- dozor plot,
- output of the pipelines,
- .map files,
- etc...
In order to overcome this problem, the needed files are copied into a separate disk area where data is never removed. At the ESRF this area is called `/data/pyarch`.
Moreover, each program that produces such files is responsible for copying them into `pyarch`.
The next diagram shows how data is copied from the different programs into the two locations:
```graphviz
digraph hierarchy {
nodesep=1.0 // increases the separation between nodes
MxCube
ProcessingPipeline1
ProcessingPipelineN
pyarch [color=blue,shape=box, label="/data/pyarch/proposal\n Permanent disk"]
data [color=gray shape=box, label="/data/visitor/proposal\n(Ephemeral disk 90 days)"]
MxCube->pyarch [label="copies online data to" color=blue]
MxCube->data [label="copies data to" style=dashed color="grey"]
ProcessingPipeline1->pyarch [label="copies online data to" color=blue]
ProcessingPipeline1->data [label="copies data to" style=dashed color="grey"]
ProcessingPipelineN->pyarch [label="copies online data to" color=blue]
ProcessingPipelineN->data [label="copies data to" style=dashed color="grey"]
{rank=same;data;pyarch}
{rank=max;ProcessingPipeline1; ProcessingPipelineN}
{rank=same;MxCube}
}
```
> /data/visitor/proposal is the experiment's area where the files from both data acquisition and processing are produced
> Online means that the data needs to be always accessible
This procedure is error-prone because it relies on the implementation of each pipeline, which needs to know where and how to copy the data.
### Online and gallery subfolders
In order to improve this situation, we propose that the files be copied in an automatic and centralized manner.
To do so, each dataset may contain "special" sub-folders whose content will be handled accordingly.
Currently, the proposal is to have these subfolders at the root level of the dataset (a routing sketch follows the folder example below):
- Gallery: this folder contains images and small files that will then be shown in the UI. The files are persisted in a MongoDB.
- Online: this folder will be persisted in a separate disk area and will always be available. The content of the online subfolder will be automatically copied into pyarch.
- *nobackup: folders with this suffix store temporary data, used mainly for processing. They will be neither archived nor catalogued.
> Note: the purpose of the gallery is to host small files (< 16 MB), while there is no size limitation for the online folder, where pdb and map files can be copied. However, the goal is to make these files accessible from the web browser. Files larger than 100 MB will take time to download, so other protocols like `globus` might be more appropriate than `https`
```graphviz
digraph hierarchy {
nodesep=1.0 // increases the separation between nodes
MxCube [color=gray,shape=box]
Ingester [color=gray,shape=box]
ProcessingPipeline [color=gray,shape=box]
MetadataCatalogue [color=gray,shape=box]
Tape [color=gray,shape=box]
MongoDB [color=Brown,shape=cylinder]
pyarch [color=Brown,shape=folder]
MxCube->Ingester [label="3) pushes" style=dashed]
Ingester->MetadataCatalogue [label="4) stores"]
Ingester->Tape [label="5) Archives"]
Ingester->MongoDB [label="6) Copies gallery to" color=Brown]
Ingester->pyarch [label="6) Copies online to" color=Brown]
MxCube->ProcessingPipeline [label="5) triggers" style=dashed]
ProcessingPipeline->Ingester [label="6) pushes"]
}
```
Example of the proposed folder structure:
```
└── trypsine
└── line
├── datafile1.h5
├── datafile2.h5
├── datafilen.h5
├── gallery
│ ├── line1.gif
│ ├── line2.gif
│ └── line3.gif
└── online
└── line.log
```
> The `gallery/*.gif` files will be copied into MongoDB and `online/line.log` will be automatically copied to /data/pyarch
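
The sketch below shows how an ingester could route these special subfolders once a dataset is handled: the `online` content is copied to a permanent area, the `gallery` files are collected for persistence in MongoDB, and `*nobackup` folders are skipped. The function name and the `pyarch_dir` argument are hypothetical, and the MongoDB upload itself is left out since the persistence layer is not described here.
```python
import shutil
from pathlib import Path

def route_special_folders(dataset_dir: Path, pyarch_dir: Path) -> list[Path]:
    """Sketch only: copy 'online' to the permanent area, collect 'gallery'
    files for the UI database, and ignore any '*nobackup' folder."""
    gallery_files: list[Path] = []
    for sub in dataset_dir.iterdir():
        if not sub.is_dir() or sub.name.endswith("nobackup"):
            continue  # *nobackup content is neither archived nor catalogued
        if sub.name == "online":
            # e.g. .../line/online/line.log stays always available on disk
            shutil.copytree(sub, pyarch_dir / dataset_dir.name / "online",
                            dirs_exist_ok=True)
        elif sub.name == "gallery":
            # small files (< 16 MB) to be persisted in MongoDB by the ingester
            gallery_files.extend(p for p in sub.iterdir() if p.is_file())
    return gallery_files
```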
#### Example molecular replacement
Assuming each pdb group is considered as a single dataset, the files attached to the dataset are:
```
/data/pyarch/2023/id23eh2/mx2452/20230716/RAW_DATA/DHOmut/DHOmut-B1X3/autoprocessing_DHOmut-B1X3_w1_run2_1/grenades_parallelproc/21_84.3_158.4_60.4_90_90_90
.
├── 4BY3.pdb_mrpipe_dir
│ ├── 2FOFC_REFINE.map
│ ├── FOFC_REFINE.map
│ ├── MR.log
│ ├── MR.mtz
│ ├── MR.pdb
│ ├── peaks.csv
│ ├── refined.mtz
│ └── refined.pdb
```
Taking the above dataset as an example, `gallery` and `online` folders could be organized in the following way:
```
├── 4BY3.pdb_mrpipe_dir
│ ├── 2FOFC_REFINE.map
│ ├── FOFC_REFINE.map
│ ├── gallery
│ │ └── peaks.csv
│ ├── MR.log
│ ├── MR.mtz
│ ├── MR.pdb
│ └── online
│ ├── refined.mtz
│ └── refined.pdb
```
> Note: peaks.csv, refined.mtz and refined.pdb will always be available. However, the rest of the files will need to be restored from tape before they can be accessed
In the example above, as soon as the dataset is handled by the ingester, it takes care of copying the files to the appropriate destination.
## Linking datasets
One or more datasets can be linked together. In the context of data processing, it is necessary to record which dataset is the input of a job and which is its output.
The next diagram shows one simple case:
```graphviz
digraph hierarchy {
nodesep=1.0 // increases the separation between nodes
node [color=Gray,fontname=Courier,shape=box] //All nodes will have this shape and colour
edge [color=Blue, style=dashed] //All the lines look like this
Sample[color=gray, style=dashed]
RawDataset[color=Blue, style=dashed, label="Raw Dataset"]
ProcessedDataset[color=Red, style=dashed, label="Processed Dataset"]
Sample->{RawDataset}
RawDataset->{ProcessedDataset}
{rank=same;Sample;RawDataset;ProcessedDataset}
}
```
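
One way this link could be expressed, sketched below, is to let the processed dataset message reference the location of the raw dataset(s) it was derived from. The `input_datasets` parameter name is purely hypothetical and not part of the documented message format.
```python
# Illustrative only: the processed dataset message points back to the raw
# dataset(s) it was derived from. "input_datasets" is a hypothetical key.
processed_message = {
    "proposal": "mx1234",
    "sample": "trypsine",
    "dataset": "autoproc",
    "location": "/data/visitor/mx1234/id23-1/20230713/processed_data/trypsine/autoproc",
    "parameters": {
        "input_datasets": [
            "/data/visitor/mx1234/id23-1/20230713/raw_data/trypsine/MXpressA/DataCollection",
        ],
    },
}
```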
### Use cases
#### Workflow MXPressE
```graphviz
digraph hierarchy {
nodesep=0.2 // increases the separation between nodes
node [color=Gray,fontname=Courier,shape=box] //All nodes will have this shape and colour
edge [color=Blue, style=dashed] //All the lines look like this
Sample[color=gray, style=dashed]
Sample->{Snapshots[color=Blue]}
Snapshots->{Automesh[color=red]}
Sample->{Mesh[color=Blue]}
Sample->{Line[color=Blue]}
Line->{Dozor[color=Red]}
Mesh->{Dozor2[color=Red]}
Sample->{Characterisation[color=Blue]}
Sample->{DataCollection[color=Blue]}
Characterisation->{Mosflm_dozor[color=Red]}
DataCollection->{XDSAPP[color=Red]}
DataCollection->{XIA2_DIALS[color=Red]}
DataCollection->{autoPROC[color=Red]}
DataCollection->{autoPROC_staraniso[color=Red]}
DataCollection->{grenades_fastproc[color=Red]}
DataCollection->{Dozor3[color=Red]}
grenades_fastproc->{Phasing[color=Red]}
Phasing->{Refinement[color=Red]}
Refinement->{Ligand_fit[color=Red]}
{rank=min;Sample}
{rank=same;Snapshots;Automesh;Mesh;Line;Characterisation;DataCollection}
{rank=same;XDSAPP;XIA2_DIALS;autoPROC;autoPROC_staraniso}
}
```
#### Merging
```graphviz
digraph hierarchy {
nodesep=0.2 // increases the separation between nodes
node [color=Gray,fontname=Courier,shape=box] //All nodes will have this shape and colour
edge [color=Blue, style=dashed] //All the lines look like this
Sample1[color=gray, style=dashed]
Sample1->{Snapshots[color=Blue]}
Sample1->{Automesh[color=Blue]}
Sample1->{Mesh[color=Blue]}
Sample1->{Line[color=Blue]}
Sample1->{Characterisation[color=Blue]}
Sample1->{DataCollection1[color=Blue]}
Sample1->{DataCollection2[color=Blue]}
DataCollection3->{autoPROC[color=Red]}
DataCollection2->{autoPROC[color=Red]}
Sample2->{DataCollection3[color=Blue]}
DataCollection1->{autoPROC[color=Red]}
{rank=min;Sample1;Sample2}
{rank=same;Snapshots;Automesh;Mesh;Line;Characterisation}
{rank=same;autoPROC}
}
```
#### Multiple oscillations with different kappa angles
```graphviz
digraph hierarchy {
nodesep=0.2 // increases the separation between nodes
node [color=Gray,fontname=Courier,shape=box] //All nodes will have this shape and colour
edge [color=Blue, style=dashed] //All the lines look like this
Sample1[color=gray, style=dashed]
Sample1->{Snapshots[color=Blue]}
Sample1->{Automesh[color=Blue]}
Sample1->{Mesh[color=Blue]}
Sample1->{Line[color=Blue]}
Sample1->{Characterisation[color=Blue]}
Sample1->{Oscillation_kappa_X[color=Blue]}
Sample1->{Oscillation_kappa_Y[color=Blue]}
Sample1->{Oscillation_kappa_Z[color=Blue]}
Oscillation_kappa_X->{indexing_X[color=Red]}
Oscillation_kappa_Y->{indexing_Y[color=Red]}
Oscillation_kappa_Z->{indexing_Z[color=Red]}
indexing_X->{integration_X[color=Red]}
indexing_Y->{integration_Y[color=Red]}
indexing_Z->{integration_Z[color=Red]}
integration_X->{Merged[color=Red]}
integration_Y->{Merged[color=Red]}
integration_Z->{Merged[color=Red]}
{rank=min;Sample1;}
{rank=same;Snapshots;Automesh;Mesh;Line;Characterisation}
}
```
#### Multiple oscillations with different kappa angles - Olof's version
```graphviz
digraph hierarchy {
nodesep=0.2 // increases the separation between nodes
node [color=Gray,fontname=Courier,shape=box] //All nodes will have this shape and colour
edge [color=Blue, style=dashed] //All the lines look like this
Sample1[color=gray, style=dashed]
Sample1->{Snapshots[color=Blue]}
Sample1->{Automesh[color=Blue]}
Sample1->{Mesh[color=Blue]}
Sample1->{Line[color=Blue]}
Sample1->{Characterisation[color=Blue]}
Sample1->{Oscillation_no_kappa[color=Blue]}
Sample1->{Oscillation_kappa_X[color=Blue]}
Sample1->{Oscillation_kappa_Y[color=Blue]}
Sample1->{Oscillation_kappa_Z[color=Blue]}
Oscillation_no_kappa->{autoPROC_no_kappa[color=Red]}
Oscillation_kappa_X->{autoPROC_kappa_X[color=Red]}
Oscillation_kappa_Y->{autoPROC_kappa_Y[color=Red]}
Oscillation_kappa_Z->{autoPROC_kappa_Z[color=Red]}
Oscillation_no_kappa->{autoPROC_GPhL_workflow[color=Red]}
Oscillation_kappa_X->{autoPROC_GPhL_workflow[color=Red]}
Oscillation_kappa_Y->{autoPROC_GPhL_workflow[color=Red]}
Oscillation_kappa_Z->{autoPROC_GPhL_workflow[color=Red]}
{rank=min;Sample1;}
{rank=same;Snapshots;Automesh;Mesh;Line;Characterisation}
}
```