owned this note
owned this note
Published
Linked with GitHub
---
name: DIALS meeting 2021-05-12
tags: meeting notes
---
# DIALS Meeting 2021-05-12
[![hackmd-github-sync-badge](https://hackmd.io/MK6KYBMJS6ykO5XyKK3T1Q/badge)](https://hackmd.io/MK6KYBMJS6ykO5XyKK3T1Q)
## Agenda
- Experiment identifiers - Graeme's proposed changes
- CCP4 APS workshop (mid June) - in CDT timezone (UTC-5), (BST-6)
- `dials.scale` PRs adding bulk absorption corrections parameters
- CBFlib and pycbf
## Discussions
### Dials Scaling Absorption Corrections
- James discussed how DIALS doesn't do much for absorption correction by default. There are weight parameters in place but somewhat complicated for users. Wanted to add some sensible "low/medium/high" high-level configuration options.
- Have been added to `dials.scale` and `xia2` in [`dials/dials#1688`](https://github.com/dials/dials/pull/1688) and [`xia2/xia2#592`](https://github.com/xia2/xia2/pull/592), but not making them turned on by default yet.
- Everyone was asked if they were generally happy having in this sort of broad percentage setting?
- A discussion on what the use case for "Medium" correction would be - people seemed generally satisfied that it was worthwhile
- Putting in an automatic helper e.g. in xia2 is more complicated, not planning to get in for this release, only a user option for now
- Richard asks if we should set it automatically if requesting anomalous scaling - James agreed that it would be worth looking at, but we should spend some time using it first, so leaving off by default.
### CCP4 APS Workshop
- CCP4 is running a workshop at APS week of 14th June(?). Graeme is giving several talks - wants to know if David has any more details from inside CCP4.
- Configuration is possibly more challenging than other workshops - Argonne security means that cannot run workstations for people to NX into, discussions about deploying and using machines on the cloud.
- Graeme asks if several people are available for assistance during the practical data sessions afternoon of Wednesday 16th June - raised as possibly useful for James/Elena
- They start 8am (CST/UTC-5) - 14:00 Local time (UTC+1/BST)
- Nick: When running BAG?
- Ask in beamlines slack channel - no response
### Experiment Identifiers
- Graeme has been delving into our experiment and ExperimentList handling as part of [`dials/dials#1694`](https://github.com/dials/dials/pull/1694) and found it intersecting again with the proposals described in [`dials/dials#1029`](https://github.com/dials/dials/issues/1029).
- Prepared slides for discussion...
```
Experiment identifier not list position
┌──────┐
┌──────┐ ┌──────┐┌─▶│ │
│ │◀──┤ ├┘ ├──────┤
├──────┤ ├──────┤┌─▶│ │
│ │◀──┤ ├┘ └──────┘
├──────┤ ├──────┤ Nope!
│ │◀──┤ ├┐ ┌──────┐
├──────┤ ├──────┤└─▶│ │
│ │◀──┤ ├┐ ├──────┤
└──────┘ └──────┘└─▶│ │
└──────┘
Refl Expt Parallel
Expt
```
- We use the `id` column in reflection table identifiers as corresponding to experiment position in list
- This is the worst possible way - cannot split into subsets as now the numbering is wrong if you decide to only refine one experiment out of list of 4 - cannot
- Moving to ID as an internal reflection table value, with a corresponding internal map to identifiers was the original motivation - but never got fully implemented - for convenience, existing experiments were labelled as their index, and this was then relied upon
- When doing work in [`dials/dials#1694`](https://github.com/dials/dials/pull/1694) it was pointed out that this could be considered Yet Another Identifier
- Richard pointed out that identifiers were supposed to help with bookkeeping problems with multicrystal/multiplex - but never really clarified how it was supposed to work
- Ben noted that the intent has been here before - not a new idea, still good intent - but the difficulty is that plumbing is torturous
- Graeme estimates the effort levels to fix this are ~150 places in the code. With a couple of days and couple of people could be done as a development push
- Zeroth order task is coming up with pattern to iterate over experiment list - efficient way of coming up with reflections that map to identifier
```
Create new experiment, not overwrite
Import Index
┌──────┐ ┌──────┐
│ │ ╔════▶│ │
└──────┘ ║ └──────┘
╔════════════════╗ ║
║ ╠═══╝
║ ┌──────┐ ║ ┌──────┐
║ │ │ ║ │ │
║ └──────┘ ║ └──────┘
║ Didn't
╚════════════index,
Gone!
```
- AKA [`dials/dials#1029`](https://github.com/dials/dials/issues/1029)
- When we import experiments and assign reflections - we currently throw away lots of information about where a reflection came from - if we import multiple experiments then index - these are now gone! If we index, then then improve geometry to point where unindexed reflections could be indexed - this information is now gone!
- Issues with data model breaking because no guaranteed mapping between reflections and experiments
- Believe that we should create new experiments with crystals in for indexing, without overwriting or discarding the existing experiment
- Just need to filter out experiments without lattices in downstream processing
- We can't just have experiments with no crystal because of multiple lattices - each lattice needs to be in a separate experiment
- When asked if there were any objections to this philosophy, there was broad ~~silence~~agreement
- The question was raised if there would ever be a reason to go beyond two layers - the danger of it becoming a tree. Graeme elaborated that the intention was that it might not even need a pointer to the parent - would be derivable through shared models
```
Create new Experiment, not overwrite
Import
┌──────┐ ┌──────┐
│ ├───┬─┬──▶│ │ Experiment still
└──────┘ │ │ └──────┘ exists
┌──────┐ │ │ ┌──────┐ -> Spots not
│ ├────────▶│ │ orphaned
└──────┘ │ │ └──────┘
│ │ ┌──────┐
│ └──▶│ │
│ └──────┘
│ ┌──────┐
└────▶│ │
└──────┘
```
- Point of principle: No orphaned reflections
```
Better data container - HDF5
/expt/...
/refl/...
Create @ spotfind, integrate
Update Otherwise
```
- If we kept all of our data in HDF5 files - reflections _and_ Experiments - we could just read data you needed to read - would not need to read all of it - shoeboxes being the main example here
- Makes threat of problems from exploding our experiment count with the previous changes go away
- Some discussion about whether using HDF5 would put it at risk of being co-opted or coerced into strict NxMX compliance. Although there are some potential areas where we could overlap, deficiencies in the standard and things that we use that aren't represented (e.g. shoeboxes, scan-varying models) would mean that it wouldn't really be suitable for our purposes
```
Reflections with Experiments
┌──────┐
import───────▶│ expt │
└───┬──┘ ┌─────▶refine
┌──────────────┘ │ │
▼ ┌─────┴┐ │
find_spots────────▶│ rflx │◀─────┘
└──────┘
│ ▲ │
index ◀──────────┘ │ │
│ │ │
└────────────────┘ │
┌────────┘
▼ ┌──────┐
integrate───────▶│ rflx │
└──────┘
```
- If we have reflections and data in HDF5 format, can put in same file
- Could give serious thought to not making copies of data files, only updating.
- Some discussion over whether this would work as an option or be the only way to interact. It was discussed as a user story, whether users would end up just saving `_1`, `_2`, `_final` copies of the data files
- Some discussions about how this would affect reproduceability, and what user tooling would be required to make this manageable
- The topic of external data (e.g. masks) was discussed
- Two points where we create these files: Spotfinding and integration
- Would reduce working disk space
```
┌───────────────────────┬───────────────────────┐
│ │ │
│ │ │
│ Reflection/ │.─. Create │
│ Expt Identity ) Experiments │
│ │`─' │
│ .───. │ .───. │
│ ( ) │ ( ) │
├─────────` '─────────┼─────────` '─────────┤
│ │ │
│ │ │
│ Update not │.─. New data │
│ create files ) container │
│ │`─' │
│ │ │
│ │ │
└───────────────────────┴───────────────────────┘
```
- Four things going on! All interconnected!
- A short discussion was had over how much work would be involved in doing this. Graeme thought that it would take four people a few days.
- Nick thought that was optimistic, but that it might be worth spending that time as a task force - even if it failed, we would have a much better handle on the problem