---
name: Crystallography ideas discussion 2020-11-04
tags: meeting notes
---
# Crystallography ideas discussion 04/11/20
## Agenda
### 1. Assessing disorders/pseudosymmetries using DIALS
- Which disorders would be useful/practical to assess through image/spot analysis? Translocation disorder, translational NCS etc.
- Which algorithms are most appropriate to assess this: suggested use of CryoEM-2D-classification algorithms?
- At which points of DIALS processing would it be most appropriate to report on such effects?
-- Initial thoughts [PHZ]
Space group and unit cell assignment is a crucial step, and in critical cases inspecting the images directly is important to identify OD-type twinning and other disorder-type effects. The disorder manifests itself as peak broadening, smearing of reflections or even a total absence of intensity for specific subsets of reflections related to the specific disorder type. The streaking direction might not align with the images, so to visualize it properly you would want to select a specific plane/slice in reciprocal space. In our earlier discussion I suggested using the Patterson function calculated on the full dataset (non-Bragg locations included), but this could be expensive; instead one could compute autocorrelations on subsections if this would save memory.
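The autocorrelation-on-subsections idea could be prototyped along the following lines. This is only a minimal sketch using NumPy: the function name `autocorrelation` and the synthetic `block` are illustrative assumptions, not part of DIALS, and the input is assumed to be a subsection of intensities already sampled on a regular grid.

```python
import numpy as np


def autocorrelation(block):
    """Normalised circular autocorrelation of an n-dimensional array.

    Uses the Wiener-Khinchin relation: the autocorrelation is the inverse
    FFT of the power spectrum. The zero-shift value ends up at the array
    centre after fftshift.
    """
    block = np.asarray(block, dtype=float)
    centred = block - block.mean()  # remove the constant background term
    power = np.abs(np.fft.fftn(centred)) ** 2
    acf = np.fft.ifftn(power).real
    zero_shift = acf.flat[0]
    if zero_shift > 0:
        acf = acf / zero_shift  # scale so the zero-shift peak equals 1
    return np.fft.fftshift(acf)


# Synthetic stand-in for a subsection: spots repeating every 8 pixels.
block = np.zeros((32, 32))
block[::8, ::8] = 1.0
acf = autocorrelation(block)
```

The FFT route gives a circular autocorrelation, so features wrap around the edges of the subsection; zero-padding the block before the transform would avoid that at the cost of more memory.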
RR: Issue is that it's difficult to assess NCS etc just from merged data.
- assess peaks in Patterson map?
- classify spots without any further assumptions
- some good examples from Andrey
- based on Patterson map, predict which slices of reciprocal space would be expected to show effects. Extract these and save as images.
- these issues more common in small molecule datasets? Could be good to explore first? Get in touch with I19 crew at DLS? These might be good examples where twinning is the only issue rather than lots of extra issues.
- GW: Based on Patterson, map image density to reciprocal space and assess any features? Could pick up sublattices for example.
- example in recent paper https://mcl1.ncifcrf.gov/dauter_pubs/272.pdf
- MG: Sum data along a lattice direction, would easily see features between Bragg spots. e.g. CrysAlisPro can do something like this.
- Recent paper in Acta D, get the data by emailing author: https://journals.iucr.org/d/issues/2020/11/00/qh5066/qh5066.pdf
AP: GW to get in touch with authors of above, suggest zenodo upload.
BW: General scope for guided indexing tools? Often requested from small molecule world. Reciprocal lattice viewer could help here if features picked up by spotfinding. RSMapper might be useful here too?
Plan - get some example data, investigate processing with DIALS, see what we can see already and how might incorporate some of these ideas.
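As a starting point for MG's suggestion of summing along a lattice direction, a toy version might look like this. It is a sketch under strong assumptions: the data are taken to be already resampled onto a regular grid aligned with the reciprocal lattice axes, and `project_along_axis` and the synthetic `volume` are hypothetical names, not existing DIALS functionality.

```python
import numpy as np


def project_along_axis(volume, axis):
    """Sum a gridded reciprocal-space volume along one grid axis."""
    return np.asarray(volume, dtype=float).sum(axis=axis)


# Synthetic volume on an (h, k, l) grid, 4 voxels per reciprocal lattice unit.
volume = np.zeros((16, 16, 16))
volume[::4, ::4, ::4] = 100.0  # strong intensity at Bragg positions
volume[2, ::4, :] += 1.0  # weak streak along l at half-integer h

# Projecting along l accumulates the weak streak into a visible feature
# lying between the Bragg positions of the (h, k) plane.
projection = project_along_axis(volume, axis=2)
```

In this toy case each streak voxel is 100 times weaker than a Bragg voxel, but after projection the streak is only 25 times weaker than a Bragg peak, because it extends along the whole summed direction.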
### 2. Likelihood targets and degrees of freedom per reflection
- What is the proper way to weight intensities during structure determination? Important to consider are the handling of errors in integration/scaling and how many degrees of freedom there are (per reflection?).
- Background material: https://journals.iucr.org/d/issues/2020/08/00/rr5195/rr5195.pdf
-- Initial thoughts [PHZ]
The effective sample size N_eff for a given Miller index can be calculated from the Welch–Satterthwaite equation, which simplifies to:
\begin{eqnarray}
w_i &=& 1/\sigma_i^2 \\
V_1 &=& \sum_{i=1}^{N} w_i \\
V_2 &=& \sum_{i=1}^{N} w_i^2 \\
N_{\mathrm{eff}} &=& V_1^2 / V_2
\end{eqnarray}
The resulting mean intensity is now approximately distributed according to a t-distribution with N_eff degrees of freedom. We should be able to compute this quantity as a simple afterthought after scaling and merging. It takes into account discrepancies in the weights of different reflections. Note that the variance of a t-distribution is N_eff/(N_eff-2) times larger than that of the associated normal distribution (for N_eff > 2), due to its heavier tails. The effects of this are clearly seen in normal probability plots.
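The N_eff calculation above is small enough to sketch directly. The following is a minimal illustration with NumPy, assuming the per-observation sigmas for one Miller index are available as a plain array; the function names `effective_sample_size` and `t_variance_inflation` are hypothetical and not part of DIALS.

```python
import numpy as np


def effective_sample_size(sigmas):
    """N_eff = V_1^2 / V_2 with inverse-variance weights w_i = 1/sigma_i^2."""
    sigmas = np.asarray(sigmas, dtype=float)
    weights = 1.0 / sigmas**2
    v1 = weights.sum()
    v2 = (weights**2).sum()
    return v1**2 / v2


def t_variance_inflation(n_eff):
    """Variance of a t-distribution relative to the normal: N_eff/(N_eff-2).

    The variance is only finite for N_eff > 2, so return infinity otherwise.
    """
    if n_eff <= 2:
        return float("inf")
    return n_eff / (n_eff - 2.0)


# Four equally precise observations count as four.
n_equal = effective_sample_size([2.0, 2.0, 2.0, 2.0])

# Two precise observations plus one very imprecise one count as
# barely more than two.
n_mixed = effective_sample_size([1.0, 1.0, 100.0])
```

With equal sigmas N_eff equals the number of observations, while a poorly measured observation contributes almost nothing, which is the "discrepancy in weights" effect described above.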
Very easy for us to add in DIALS output. How to test this? Ideally need to know exact value of the mean - small molecules have much better estimate of the mean. A longstanding issue in small molecule land.
PZ: Could also use very high redundancy dataset (with no damage) and analyse small sections of this and compare against full.
JBE: Consider implications for dials.scale error models of t-distribution approach as opposed to assumption of normality.