# Electron Tomography Data(base) Model
In this document we suggest a possible data model for storing Electron Tomography data. The model ties into the existing ISPyB database model (Delagenière *et al.*, 2011) where possible, and introduces new tables where appropriate.
## The model
```mermaid
classDiagram
direction LR
DataCollectionGroup "1" -- "50" DataCollection
DataCollectionGroup "*" -- "1" BLSession
BLSession "*" -- "1" BeamLineSetup
DataCollection "1" -- "60" Movie
Movie "1" -- "1" MotionCorrection
MotionCorrection "1" -- "1" CTF
MotionCorrection "1" -- "20" MotionCorrectionDrift
DataCollection "1" -- "*" Tomogram
Tomogram "1" -- "*" TiltImageAlignment
Movie "1" -- "*" TiltImageAlignment
```
Before an experiment is conducted, the physical sample information is already set up and stored in ISPyB. This can include DNA/RNA sequence data, references to known PDB or CSD structures and substructures, sample risk assessment information, and sample container metadata such as the position of the sample within a dewar and identifying barcodes. None of this is covered further in this document, but it illustrates the benefit of tying into the existing ISPyB data model.
As the electron microscope is prepared by staff for users, a `BeamLineSetup` entry is used to record the microscope settings that are not routinely modified by users. This setup is tied to a `BLSession` record, which contains information such as the start and end date of the user session, and links into user data and the underlying scientific proposal.
When a physical sample is finally loaded into the electron microscope, a `DataCollectionGroup` entry is created. This entry represents a number of individual, consecutive tilt experiments on a single physical sample loaded into the electron microscope.
A `DataCollectionGroup` entry does not directly reference any files on disk, but does refer to sample and session information, as outlined above, and is linked to by an arbitrary number of `DataCollection` records.
A `DataCollection` represents a single tilt experiment of one specific physical sample at one specific stage position. It is at this level that we begin referring to actual file locations on disk.
Data acquired from a specific physical sample, at the same specific location, and at a specific tilt angle is recorded in the `Movie` table. Every table entry refers back to a `DataCollection` entry. The `Movie` table already exists and is used for EM Single Particle Analysis, and allows us to tie into the existing related tables `MotionCorrection`, `MotionCorrectionDrift`, and `CTF` to store data analysis information.
The table `Tomogram` contains data analysis results for a specific sample at a specific position, thus referring to a `DataCollection` entry. Finally, the table `TiltImageAlignment` links to both `Tomogram` and the relevant `Movie` entries, storing the per-movie reconstruction information for all tomograms.
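To make the hierarchy concrete, the following is a minimal Python sketch of the relationships described above. The class names mirror the table names, but the in-memory layout, the snake_case attribute names, the helper `movies_for_tomogram`, and the example values are illustrative assumptions and not part of ISPyB.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Movie:
    movie_id: int
    angle: float  # tilt angle in degrees


@dataclass
class TiltImageAlignment:
    # Links exactly one Movie to the Tomogram that holds this record.
    movie: Movie
    refined_tilt_angle: float


@dataclass
class Tomogram:
    tomogram_id: int
    alignments: List[TiltImageAlignment] = field(default_factory=list)


@dataclass
class DataCollection:
    # One tilt experiment on one sample at one stage position.
    data_collection_id: int
    movies: List[Movie] = field(default_factory=list)
    tomograms: List[Tomogram] = field(default_factory=list)


@dataclass
class DataCollectionGroup:
    # One physical sample loaded into the microscope.
    data_collection_group_id: int
    collections: List[DataCollection] = field(default_factory=list)


def movies_for_tomogram(tomogram: Tomogram) -> List[Movie]:
    """Return the movies used by a tomogram, ordered by ascending tilt angle."""
    return sorted((a.movie for a in tomogram.alignments), key=lambda m: m.angle)


# Example: a three-movie tilt series reconstructed into a single tomogram.
movies = [Movie(1, 0.0), Movie(2, -3.0), Movie(3, 3.0)]
tomo = Tomogram(10, [TiltImageAlignment(m, m.angle) for m in movies])
dc = DataCollection(100, movies=movies, tomograms=[tomo])
group = DataCollectionGroup(1000, collections=[dc])
ordered_ids = [m.movie_id for m in movies_for_tomogram(tomo)]
```

In the database itself these links are foreign keys rather than object references; the sketch only illustrates which entity points at which.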
## Table overview
Table | Populated during | Populated when
-- | -- | --
`BLSample` | pre-acquisition | before samples arrive on site
`BeamLineSetup` | pre-acquisition | beginning of user session
`DataCollectionGroup` | acquisition | when a new physical sample is involved
`DataCollection` | acquisition | when the sample is moved to a new position
`Movie` | acquisition | when the sample tilt angle changes
`MotionCorrection` | early analysis |
`MotionCorrectionDrift` | early analysis |
`CTF` | early analysis |
`Tomogram` | reconstruction |
`TiltImageAlignment` | reconstruction |
## In detail
### `DataCollectionGroup`
Each individual tilt experiment results in a separate `DataCollection` entry, referencing this `DataCollectionGroup`.
The `DataCollectionGroup` table already exists and includes the following relevant fields, among others, that we propose are populated for every EM tomography experiment:
Field | Type | Property | Comment
-- | -- | -- | --
dataCollectionGroupId | INT(11) | unique | Primary key
sessionId | INT(10) | | FK referencing the user session (visit)
blSampleId | INT(10) | optional | FK referencing a sample definition
experimentType | ENUM | optional| Set to `tomo` to signify a tomography data collection. The fact that this is a CryoEM experiment is already encoded via `BLSession.beamlineName`.
startTime | DATETIME | optional | Timestamp of the beginning of the first data collection on the physical sample
endTime | DATETIME | optional | Timestamp of the end of the last data collection on the physical sample
For future use the `DataCollectionGroup` also already provides fields that refer to a sample and container barcode, and sample and container slot numbers, which will allow pinpointing where the physical sample came from.
### `DataCollection`
As used for crystallographic diffraction experiments, a `DataCollection` entry in ISPyB refers to a single contiguous collection of images from one sample, usually incorporating some stage movement such as a rotation (for diffraction data sets), translation (for gridscans), or both (helical scans).
For tomography data sets a `DataCollection` most naturally maps onto a rotation data set obtained from a specific physical sample at a specific stage position, and incorporates the tilt angle sweep movement.
The `DataCollection` table already exists, and includes the following relevant fields that we propose are populated for every EM tomography experiment:
Field | Type | Property | Comment
-- | -- | -- | --
dataCollectionId | INT(11) | unique | Primary key
dataCollectionGroupId | INT(11) | | FK referencing DataCollectionGroup
startTime | DATETIME | optional | Timestamp of the beginning of the tilt sweep collection
endTime | DATETIME | optional | Timestamp of the end of the tilt sweep collection
experimentType | VARCHAR(24) | optional | Collection scheme description, e.g. '*standard*', '*dose-symmetric*', or others.
imageSizeX | MEDIUMINT(8) | optional | Image size in x, units: pixels
imageSizeY | MEDIUMINT(8) | optional | Image size in y, units: pixels
numberOfImages | INT(10) | optional | Number of '*images*' in the tilt series. In the context of EM data acquisition this refers to the number of movies collected at different angular positions. The actual number may turn out to be less than the anticipated number because of acquisition limitations. Therefore, if the DataCollection entry is written at the start of collection, it may become necessary to update this number at the end of collection.
imageDirectory | VARCHAR(255) | optional | Location of files, should end with `/`
imagePrefix | VARCHAR(55) | optional | File naming convention
imageSuffix | VARCHAR(55) | optional | File naming convention
pixelSizeOnImage | FLOAT | optional | factor to translate image size to real size; units micrometre/pixel
axisStart | FLOAT | optional | Lowest tilt axis angle, units: degrees
axisEnd | FLOAT | optional | Highest tilt axis angle, units: degrees
nominalDefocus | FLOAT | optional | Nominal defocus, units: Angstrom
voltage | FLOAT | optional | Unit: kV
### `Movie`
An entry in the `Movie` table corresponds to data collected from one specific physical sample at a specific stage position with a specific tilt angle. Every `Movie` entry references a `DataCollection` entry via `dataCollectionId`.
The `Movie` table already exists and is currently used for EM Single Particle Analysis experiments. Extending this table allows use of the existing `MotionCorrection`, `MotionCorrectionDrift`, and `CTF` tables. We only need to add a few fields to accommodate the tomography-specific information.
One downside of reusing the existing `Movie` table is that the database will not enforce uniqueness of `movieNumber` per `dataCollectionId`, as this was not considered in the existing SPA table definition.
Field | Type | Property| Comment
-- | -- | -- | --
movieId | INT(11) | unique | PK
dataCollectionId | INT(11) | optional | FK to `DataCollection` table
acquisitionMovieNumber | MEDIUMINT(8) | optional | Order of acquisition within the data collection
orderedMovieNumber | MEDIUMINT(8) | optional | Index of this movie within the data collection ordered by ascending angle
movieFullPath | VARCHAR(255) | optional | references a file
positionX | FLOAT | optional| actual stage position during collection
positionY | FLOAT | optional | actual stage position during collection
angle | FLOAT | optional, **new** | unit: degrees relative to perpendicular to beam
fluence | FLOAT | optional, **new** | accumulated electron fluence from start to end of acquisition of this movie (commonly, but incorrectly, referred to as 'dose')
numberOfFrames | INT | optional, **new** | number of frames per movie. This should be equivalent to the number of `MotionCorrectionDrift` entries, but the latter is a property of data analysis, whereas the number of frames is an intrinsic property of acquisition.
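
Since the database does not enforce uniqueness of the movie number within a data collection, a check could be run at the application level instead. The following is a minimal sketch using an in-memory SQLite database; the reduced three-column `Movie` table, the sample rows, and the helper `find_duplicate_movie_numbers` are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for the Movie table; only the columns needed for the check.
conn.execute(
    "CREATE TABLE Movie ("
    " movieId INTEGER PRIMARY KEY,"
    " dataCollectionId INTEGER,"
    " movieNumber INTEGER)"
)
conn.executemany(
    "INSERT INTO Movie (dataCollectionId, movieNumber) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 2), (2, 1)],  # movie number 2 is repeated in collection 1
)


def find_duplicate_movie_numbers(connection):
    """Return (dataCollectionId, movieNumber, count) for every repeated pair."""
    return connection.execute(
        "SELECT dataCollectionId, movieNumber, COUNT(*)"
        " FROM Movie"
        " GROUP BY dataCollectionId, movieNumber"
        " HAVING COUNT(*) > 1"
        " ORDER BY dataCollectionId, movieNumber"
    ).fetchall()


duplicates = find_duplicate_movie_numbers(conn)
```

An empty result means every movie number is unique within its data collection. The same `GROUP BY ... HAVING` pattern would apply to `acquisitionMovieNumber` or `orderedMovieNumber`.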
### `Tomogram`
A proposed new table, storing data analysis output. Each record references a single `DataCollection` entry.
Field | Type | Property | Comment
-- | -- | -- | --
tomogramID | INT(11) | unique | PK
dataCollectionId | INT(11) | | FK to `DataCollection` table
autoProcProgramId | INT(10) | | FK, gives processing times/status and software information
volumeFile | VARCHAR(255) | | `.mrc` file representing the reconstructed tomogram volume
stackFile | VARCHAR(255) | optional | `.mrc` file containing the motion corrected images ordered by angle used as input for the reconstruction
sizeX | INT | | pixels
sizeY | INT | | pixels
sizeZ | INT | | pixels or number of slices
pixelSpacing | FLOAT | | Angstrom/pixel conversion factor
residualErrorMean | FLOAT | | Alignment error (nm)
residualErrorSD | FLOAT | | Standard deviation of the alignment error (nm)
XaxisCorrection | FLOAT | | X axis angle (etomo) in degrees
tiltAngleOffset | FLOAT | | Tilt angle offset (etomo) in degrees
Zshift | FLOAT | | Shift to center the volume in Z (etomo)
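
As an illustration, the proposed `Tomogram` table could be sketched as follows. SQLite is used here only to keep the example self-contained, so the column types are approximations of the MySQL-style types listed above. The choice of which columns are `NOT NULL`, the one-column stand-in for `DataCollection`, and the file name `example_volume.mrc` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    -- Minimal stand-in for the existing table; the real one has many more columns.
    CREATE TABLE DataCollection (
        dataCollectionId INTEGER PRIMARY KEY
    );

    CREATE TABLE Tomogram (
        tomogramID        INTEGER PRIMARY KEY,
        dataCollectionId  INTEGER NOT NULL
                          REFERENCES DataCollection (dataCollectionId),
        autoProcProgramId INTEGER,       -- FK target omitted in this sketch
        volumeFile        VARCHAR(255),
        stackFile         VARCHAR(255),
        sizeX             INTEGER,       -- pixels
        sizeY             INTEGER,       -- pixels
        sizeZ             INTEGER,       -- pixels or number of slices
        pixelSpacing      FLOAT,         -- Angstrom/pixel
        residualErrorMean FLOAT,         -- nm
        residualErrorSD   FLOAT,         -- nm
        XaxisCorrection   FLOAT,         -- degrees
        tiltAngleOffset   FLOAT,         -- degrees
        Zshift            FLOAT
    );
    """
)

conn.execute("INSERT INTO DataCollection (dataCollectionId) VALUES (1)")
conn.execute(
    "INSERT INTO Tomogram (dataCollectionId, volumeFile, sizeX, sizeY, sizeZ)"
    " VALUES (1, 'example_volume.mrc', 1024, 1024, 300)"
)


def insert_orphan_tomogram(connection):
    """Try to insert a tomogram that points at a missing DataCollection."""
    try:
        connection.execute("INSERT INTO Tomogram (dataCollectionId) VALUES (999)")
    except sqlite3.IntegrityError:
        return False
    return True


orphan_accepted = insert_orphan_tomogram(conn)
tomogram_count = conn.execute("SELECT COUNT(*) FROM Tomogram").fetchone()[0]
```

The foreign key ensures that every tomogram refers to an existing data collection, which is the relationship drawn in the model diagram.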
### `TiltImageAlignment`
A proposed new table, storing more data analysis output. Each record references exactly one `Movie` and one `Tomogram`.
Field | Type | Property | Comment
-- | -- | -- | --
movieID | INT(11) | | FK to `Movie` table
tomogramID | INT(11) | | FK to `Tomogram` table; tuple (movieID, tomogramID) is unique
defocusU | FLOAT | | **TBD: supposedly goes into CTF table, but unclear where**
defocusV | FLOAT | | **TBD: supposedly goes into CTF table, but unclear where**
psdFile | | | **TBD: supposedly goes into CTF table, but unclear where** power spectrum data file
resolution | FLOAT | | **TBD: supposedly goes into CTF table, but unclear where**
fitQuality | FLOAT | | **TBD: supposedly goes into CTF table, but unclear where**
refinedMagnification | FLOAT | optional | unitless
refinedTiltAngle | FLOAT | | units: degrees
refinedTiltAxis | FLOAT | | units: degrees
residualError | FLOAT | | Residual error (nm)
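
The requirement that the tuple (movieID, tomogramID) is unique can be expressed directly as a table constraint. The sketch below again uses in-memory SQLite purely for illustration; it omits the foreign keys and the TBD fields, and the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TiltImageAlignment (
        movieID              INTEGER NOT NULL,
        tomogramID           INTEGER NOT NULL,
        refinedMagnification FLOAT,
        refinedTiltAngle     FLOAT,   -- degrees
        refinedTiltAxis      FLOAT,   -- degrees
        residualError        FLOAT,   -- nm
        UNIQUE (movieID, tomogramID)
    );
    """
)

insert_sql = (
    "INSERT INTO TiltImageAlignment (movieID, tomogramID, refinedTiltAngle)"
    " VALUES (?, ?, ?)"
)
# The same movie may appear in several tomograms, and one tomogram uses many movies.
conn.executemany(insert_sql, [(1, 1, 0.1), (2, 1, 3.2), (1, 2, 0.0)])

try:
    # A second alignment record for movie 1 in tomogram 1 violates the constraint.
    conn.execute(insert_sql, (1, 1, 0.2))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM TiltImageAlignment").fetchone()[0]
```

A composite primary key on the same two columns would enforce the same rule; which form to use is a design choice left open here.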
### `BeamLineSetup`
The `BeamLineSetup` table already exists and is used to store beamline (or in this case: microscope) setup information that is normally not controllable by the user. With this proposal one field is added to the existing schema:
Field | Type | Property | Comment
-- | -- | -- | --
amplitudeContrast | FLOAT | optional | **TBD units?**
## Proposal Mapping
The following tables are not part of the proposal. They are included only as an aid to track how this proposal compares to the previous version of the proposal.
### TiltSeriesMovies
Field | Goes where | Comments
-- | -- | --
rowId | DataCollection.dataCollectionId |
\_size | DataCollection.numberOfImages | also: N(Movies)
\_samplingRate | DataCollection.pixelSizeOnImage |
\_acquisition.\_magnification | DataCollection.magnification
\_acquisition.\_voltage | DataCollection.voltage
\_acquisition.\_sphericalAberration | BeamLineSetup.CS
\_acquisition.\_amplitudeContrast | BeamLineSetup.amplitudeContrast
\_acquisition.\_doseInitial | - | can be obtained by summing fluence values
\_acquisition.\_dosePerFrame | Movie.fluence
\_acquisition.\_angleMin | DataCollection.axisStart
\_acquisition.\_angleMax | DataCollection.axisEnd
\_acquisition.\_step | - | can be obtained by dividing (axisEnd - axisStart) by (numberOfImages - 1)
\_acquisition.\_angleAxis1 | - | -
\_acquisition.\_angleAxis2 | - | Dual-axis acquisition is not a priority right now, but we will consider how this can be mapped onto ISPyB.
\_dimensions | DataCollection.imageSizeX | Dimensions (x,y)
\_dimensions | DataCollection.imageSizeY | Dimensions (x,y)
\_dimensions | DataCollection.numberOfImages | Dimensions (#angles)
\_dimensions | Movie.numberOfFrames | Dimensions (#frames)
\_tsId | DataCollection.imagePrefix or DataCollection.fileTemplate |
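
The derivation of the tilt step from the stored fields can be sketched as follows. The function name `tilt_step` and the example values are hypothetical. The calculation assumes evenly spaced tilt angles and that `numberOfImages` reflects the number of movies actually collected.

```python
def tilt_step(axis_start: float, axis_end: float, number_of_images: int) -> float:
    """Angular increment between consecutive tilt images, in degrees."""
    if number_of_images < 2:
        raise ValueError("need at least two images to define a step")
    return (axis_end - axis_start) / (number_of_images - 1)


# Example: a sweep from -60 to +60 degrees with 41 movies.
step = tilt_step(-60.0, 60.0, 41)
```

If a collection ends early and `numberOfImages` is not updated, the derived step will be wrong, which is one reason to update that field at the end of collection.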
### TiltImageMovie
Field | Goes where | Comments
-- | -- | --
rowid | Movie.movieId
tiltSerieMovieId | Movie.dataCollectionId
\_filename | Movie.movieFullPath
\_accumDose | Movie.fluence
\_dimensions | DataCollection.imageSizeX | Dimensions (x,y)
\_dimensions | DataCollection.imageSizeY | Dimensions (x,y)
\_dimensions | Movie.numberOfFrames | Dimensions (#frames)
\_tiltAngle | Movie.angle
\_acqOrder | Movie.movieNumber
\_MotionX | MotionCorrectionDrift.deltaX
\_MotionY | MotionCorrectionDrift.deltaY
### TiltSeries
Field | Goes where | Comments
-- | -- | --
rowId | DataCollection.dataCollectionId
\_size | DataCollection.numberOfImages
\_dimensions | DataCollection.imageSizeX | Dimensions (x,y)
\_dimensions | DataCollection.imageSizeY | Dimensions (x,y)
\_dimensions | DataCollection.numberOfImages | Dimensions (#angles)
filename | DataCollection.fileTemplate
titlseriesmovieIramd | -
samplingRate | DataCollection.pixelSizeOnImage
\_residualErrorMean | Tomogram.residualErrorMean
\_residualErrorSD | Tomogram.residualErrorSD
### TiltImage
Field | Goes where
-- | --
rowid | -
tiltserieId | DataCollection.dataCollectionId
tiltimagemovie.rowid | Movie.movieId
\_index | Movie.acquisitionMovieNumber, .orderedMovieNumber
\_defocusU | TiltImageAlignment.defocusU
\_defocusV | TiltImageAlignment.defocusV
\_psdFile | TiltImageAlignment.psdFile
\_resolution | TiltImageAlignment.resolution
\_fitQuality | TiltImageAlignment.fitQuality
\_rotationTiltImage | TiltImageAlignment.refinedTiltAxis
\_tiltTiltImage | TiltImageAlignment.refinedTiltAngle
\_Mag | TiltImageAlignment.refinedMagnification
\_ResidulErrorTiltImage | TiltImageAlignment.residualError
### Tomogram
Field | Goes where
-- | --
rowid | Tomogram.tomogramID
tiltserieId | DataCollection.dataCollectionId
\_filename | Tomogram.volumeFile
\_samplingRate | Tomogram.pixelSpacing
\_dimensions | Tomogram.sizeX
\_dimensions | Tomogram.sizeY
\_dimensions | Tomogram.sizeZ
\_XaxisCorrrection | Tomogram.XaxisCorrection
\_TiltAngleOffset | Tomogram.tiltAngleOffset
\_Zshift | Tomogram.Zshift