# Lena's Project
## 20/11/2023
1) Relative Difference Prior, Quadratic Prior, see https://ieeexplore.ieee.org/document/998681
2) MLEM/MAP for KL, see https://web.stanford.edu/~pratx/PDF/DMLEM.pdf
Let $f(u):= KL(Au, d)$, where $A$ is the Projection Operator and $d$ raw sinogram (PET) data. There is also background noise in PET (random + scattered).
$$ KL(Au, d) = \int Au - d\,log(Au + \eta )$$
The gradient descent for $f(u)$ ( with a (diagonal) preconditioner which may depend on the iterates and an adaptive step size)
$$\begin{equation}
\begin{aligned}
x_{k+1} & = x_{k} - \gamma_{k}D(x_k)f'(u)\\
& = x_{k} - \gamma_{k}D(x_k)A^{T}(1 - \frac{d}{Ax_{k} + \eta})
\end{aligned}
\end{equation}
$$
If $\gamma_{k} = \frac{1}{L}$ where $L$ is the Lispchitz constant for $f'(u)$ then it should converge. The problem here is that we do not know $L$ when we have poisson noise and the KL.
* There is one alterantive to define an L-smooth KL function as they do in https://arxiv.org/pdf/1706.04957.pdf equation (40). However, this is not true in real applications: "*The noise is modeled to be Poisson with a constant background of 200 compared to the approximate data mean of 694.3.*"
* Or we can use backtracking/linesearch but can be quite expensive computationally.
If we replace $D(x_k) = \frac{x_k}{A^{T}1}$ and $\gamma_{k}=1$ then we end up with the standard MLEM (written in operator form not using sums)
**Note:** In SIRF, we can do MLEM or OSEM implemented in C++. The convention that they use there is that they have a gradient ascent, maximizing the negative log, see [here](https://stir.sourceforge.net/documentation/doxy/html/classstir_1_1PoissonLogLikelihoodWithLinearModelForMean.html). So in order to match SIRF and CIL for MLEM, we need to use $\gamma_{k}=-1$. [Slide 17](https://epapoutsellis.github.io/assets/pdf/2023_CIL_User_Meeting_EPapoutsellis_StochasticOpt.pdf)
## 5/12/2023 (Sean)
I've spent some time with Parra and Barrett (PB)
https://ieeexplore.ieee.org/abstract/document/700734
to understand what exactly the list-mode likelihood should be so that we may try to apply other methods besides MLEM to it. It may be my misunderstanding, but I think there are a few minor mistakes in the exposition there (in PB) and this is my own interpretation. I will mostly use their notation, but there are a few changes which I thought necessary to fix my understanding.
The emissions are modeled as a Poisson proccess with an intensity $f_i$ in each of $M$ pixels and vector $f$ with components $f_i$ ($f$ is the image). The $j$th of $N$ detections is denoted by $A_j$ which represents the information we know about the exiting particle (you can think of $A_j$ as giving a cone) and vector $A$ containing all of the $A_j$ as components (maybe $A$ is technically stored as an array). We have the following events:
* $i$: a particle is emitted from pixel $i$ (this is subtlely different from PB who let $i$ denote the event that a particle emitted from pixel $i$ is detected.)
* $d$: an emitted particle is detected.
Notation:
* $s_i = p(d|i)$. The probability of detecting a particle emitted from pixel $i$. Also, $s$ is the vector with components $s_i$. This is called the sensitivity.
Given this set-up:
$$
p(i|f,d) = \frac{f_is_i}{\sum_{n=1}^M f_n s_n}
$$
and
$$
p(A_j|f,d) = \sum_{i=1}^M p(A_j|i,f,d) p(i|f,d).
$$
Note that $p(A_j|i,f,d) = p(A_j|i, \tilde{f},d)$ for any other value of $\tilde{f}$, so let us set $\tilde{f}$ to have all components $\tilde{f}_i = 1$. Bayes' rule says
$$
p(A_j|i,f,d) = \frac{p(i|A_j,\tilde{f},d)p(A_j|\tilde{f},d)}{p(i|\tilde{f},d)}
$$
and
$$
p(i|\tilde{f},d) = \frac{p(d|i)p(i|\tilde{f})}{p(d|\tilde{f})} = \frac{s_i}{\sum_{j=1}^M s_j}.
$$
Thus
$$
p(A_j|f,d) = p(A_j|\tilde{f},d)\left (\frac{ \sum_{j=1}^M s_j}{\sum_{j=1}^M f_n s_n}\right )\sum_{i=1}^M p(i|A_j,\tilde{f},d) f_i.
$$
Given this formula, it makes sense to define the system matrix $[A]_{ji} = p(i|A_j,\tilde{f},d)$. Then the last formula is
$$
p(A_j|f,d) = p(A_j|\tilde{f},d)\left (\frac{1^t s}{s^t f}\right )Af.
$$
Here superscript $t$ indicates transpose. As Lena pointed out, unlike in the binned case it is not clear that the matrix $A$ is related to the sensitivity $s$. This actually appears to be one of the big differences between the binned and list-mode cases in terms of practical calculation.
Now, consider the full likelihood under the assumption of a fixed time of measurement $T$ (PB also considers a fixed number of detections $N$ and the following can be modified for that situation). Assuming independence of each measured datum $A_j$, the negative log-likelihood is
$$
L(A,N|f,T) = -\ln\left (p(N|f,T) \Pi_{j=1}^N p(A_j|f,d) \right )
$$
where
$$
P(N|f,T) = (T s^t f)^N \frac{e^{-T s^t f }}{N!}.
$$
Now, let us define $\tilde{L}$ to be the negative log-likelihood with terms that do not depend on $f$ removed (since they do not affect the argmin). Then,
$$
\tilde{L}(A,N|f,T) = Ts^t f -1^t \ln(Af).
$$
This is fairly similar to the KL-divergence given above, but in that case
$$
s_i = \sum_{j=1}^N A_{ji}
$$
meaning that $s^t f = 1^t Af$. This gives the KL-divergence as above. Therefore, as mentioned the key difference appears to be the relation between $A$ and $s$ which cannot be used in the list-mode case. Being able to write the likelihood in terms of $Af$ opens the door to using many of the algorithms of CIL and it is not immediately clear we can do this for list-mode ... .
### Regarding the system matrix.
The system matrix is
$$
[A]_{ji} = p(i|A_j,\tilde{f},d).
$$
The rows of this matrix (fixed $j$) can be estimated using the methods Lena already has for MLEM.
### Using CIL for list-mode data
Picking up from above, if we add a regularisation term, or prior, then for list-mode we arrive at the following negative log-likelihood for the posterior
$$
F(A,N|f,T) = Ts^T f -1^T \ln(Af) + V(f)
$$
which must be minimised in $f$. The term $V(f)$ is the regulatisation which we leave unspecified for the moment. In order to make this fit into the CIL framework, let me switch $f \mapsto x$ so that the variational problem for the list-mode MAP estimate becomes
$$
\arg \min_x \{ T s^t x - 1^t\ln(Ax) + V(x)\}
$$
The question is, how can we use CIL to compute the minimiser for this problem? One method would use LADMM which solves the problem
$$
\arg \min_x \{f(x)+g(y) | \tilde{A}x + \tilde{B}y = b \}
$$
We can put our problem into this formate by setting
* $f(x) = Ts^t x + V(x)$,
* $g(y) = -1^t \ln(y)$,
* $\tilde{A} = A$, $\tilde{B} = -\mathrm{Id}$, $b = 0$.
To apply LADMM, it is necessary to compute the proximal map of both $f$ and $g$. For $f$ we have:
$$
\begin{split}
\mathrm{prox}_{\tau f}(v) & = \arg \min_x \Bigg\{\tau T s^t x + \tau V(x)\ + \frac{1}{2}\| x - v\|^2\Bigg\} \\
& = \arg \min_x \Bigg \{ \tau V(x) + \frac{1}{2} \| x - v + \tau T s \|^2\Bigg \}\\
& = \mathrm{prox}_{\tau V}(v - \tau T s).
\end{split}
$$
Thus, if we can calculate the proximal map of $V$, then we can also calculate the proximal map of $f$. For $g$:
$$
\mathrm{prox}_{\tau g}(v) = \arg \min_y \Bigg \{ -\tau 1^t \ln(y) + \frac{1}{2}\|y - v\|^2 \Bigg \}.
$$
Since $g$ is separable, we can compute the proximal map component-wise to get
$$
\mathrm{prox}_{\tau g}(v)_j = \frac{v_j+\sqrt{v_j^2 + 4 \tau}}{2}.
$$
This should give us all the required tools to use LADMM in CIL for list-mode MAP imaging. We do need to choose $V$ and find its proximal map. I suspect we could also apply PDHG with a similar strategy.
## NOVO, CIL, and MLEM (Lena)
**NOVO imaging**
The NOVO project aims to develop an imaging system capable of detecting mm shifts in therapeutic proton beam range. During proton therapy, protons of the treament beam interact with patient tissue and produce fast neutrons (FNs) and prompt-gamma rays (PGs) whose production origins can be related to, and imaging there of thus gives us a means to monitor, the proton beam range.
The proposed NOVO detector (named NOVCoDA), illustrated in Fig. 1, consists of a scintillator array capable of detecting PGs and FNs. For imaging PGs and FNs we require three and two consequative scatters, respectively, in seperate detector array elements. This allows us to assign so-called event cones to the coincident events (i.e., triple PG scatter and double FN scatter). An event cone is identified by a vertex $a$, half-opening angle $\theta$, and an axis $\vec{n}$.
The vertex has the coordinates of the first scatter and the axis is defined as the unit vector between the second and first scatter. The half-opening angle of a PG and FN event cone are calculated using Compton and non-elatic neutron-proton scattering kinematic equations (for details see [[1]](https://doi.org/10.1038/s41598-023-33777-w)).
Measured data is stored in list mode. A measured event can be considered as a cone with parameters {$a$, $\theta$, $\vec{n}$}, and a list of measured events as a list of observed cones, e.g. Tab.1.
*Table 1. Example of event cones as list-mode data. $I$ = number of observed cones*
| Event ID | $a$ | $\theta$ | $\vec{n}$ |
|----------|-----|-----|---------|
| 1 | 2.5 | 45° | (1, 0, 0) |
| 2 | 3.0 | 60° | (0.5, 0.1, 0.4) |
| 3 | 2.0 | 30° | (0.4, 0.4, 0.2) |
| ... | ... | ...| ... |
| $I$ | ... | ...| ... |
Projecting observed event cones onto a pixelized plane (see Fig. 1.D) with surface normal perpendicular to the beam axis gives us the **Simple Backprojection** image of the observed PGs and FNs.

*Figure 1. From NOVO's feasibility study, Fig. 1 A.-D. [[1](https://doi.org/10.1038/s41598-023-33777-w)].*
**Imaging with cones and MLEM**
There are other imaging systems than NOVO that also use event cones, such as Compton Cameras and Neutron Scatter Cameras. One of the most common image reconstruction approaches for such systems is Maximum Likelihood Expectation Maximization (MLEM), or Maximum a posteriori (MAP), sometimes refered to as regularized MLEM.
In proton range verification we deal with few observed events due to the proton beam intensity ($10^6$-$10^9$ protons/per beam spont), detection efficiency and data processing. In cases with low counts we wish to preserv as much information as possible and therefor store it in list-mode instead of bins.
MLEM was initially formulated for binned data, but has been proved to hold for list-mode data as well, by Parra and Barrett [[4](https://ieeexplore.ieee.org/abstract/document/700734)].
List-mode (LM) MLEM is also a common approach for PET imaging [[]()]. Note that PET does not use cones for imaging and tries to image a radiotracer distribution, such as fluorine-18 (18F).
**CIL in a nutshell**
In a nutshell, CIL offers a comprehensive optimization framework tailored for various imaging applications. Within this framework, many algorithms are designed to solve the following problem:
$$
\hat{x} = \arg\min_{x} f(x) + g(x)
$$
where $f$ is the fidelity term and $g$ is a regularization function.
Iterative algorithms of CIL include
- Gradient Descent (GD)
- Conjugate Gradient Least Squares (CGLS)
- Simultaneous Iterative Reconstruction Technique (SIRT)
- (ISTA)
- (FISTA)
- (PDHG)
- (LADMM)
- (SPDHG)
The problem solved and algorithms of these are elaborated on in the [CIL documentation, Optimisaton framework](https://tomographicimaging.github.io/CIL/nightly/optimisation.html).
<span style="color:green;">**Comment:** To my (Lena) knowledge, these algorithms do not have list-mode versions.</span>
<span style="color:red;">**(Reply Vaggelis):** There is a recent paper [SPDHG ListMode](https://arxiv.org/pdf/2201.05497.pdf) by a collaborator Georg Schramm. We can do it with CIL SPDHG and for your data.
**Who utilizes CIL?**
Primarily, CIL is adopted by users in fields such as x-ray tomography (CT), with a smaller presence in neutron tomography and PET/MR imaging [[2](https://doi.org/10.1007/s42979-023-02059-7),[3](https://doi.org/10.1098/rsta.2020.0208)].
<span style="color:purple;">**Question:** How and why is PET using CIL? PET also stores data in lost-mode. What does list-mode PET data look like? Has list-mode PET data been used with CIL?</span>
<span style="color:red;">**(Reply Vaggelis):** Very soon a list mode OSEM (or MLEM) implementation will be merged in [SIRF](https://github.com/SyneRBI/SIRF/pull/1103). The API is very similar to non list mode data. See [here](https://github.com/SyneRBI/SIRF/pull/1103/files#diff-d8efee1607154f015608f1bc524fae4477f44211587db066709ee15cbf217eddR48) where a KL fidelity is defined and OSEM reconstruction is [here](https://github.com/SyneRBI/SIRF/pull/1103/files#diff-d8efee1607154f015608f1bc524fae4477f44211587db066709ee15cbf217eddR63). Since it exists in SIRF, CIL can do it for PET data. Also, I believe that we can use other CIL algorithms that compute gradient of smooth function.
<div style="text-align: center;">
<div style="width: 90%; border: 2.5px solid #000000; border-radius: 20px; padding: 25px; background-color: #f0f0f0; display: inline-block; margin: 0 auto;">
**X-ray and neutron computed tomography**
X-ray and neutron computed tomography (CT) are imaging techniques used to create detailed pictures of objects or tissues inside the body or other materials. X-rays are a type of electromagnetic radiation, while neutrons are subatomic particles found in the nucleus of atoms. In X-ray CT, a machine sends X-ray beams through the object being examined, such as a human body or an industrial component. These X-rays are absorbed differently by different materials, with bones absorbing more X-rays than soft tissues, resulting in brightness variations in the images. Detectors on the other side of the object measure the remaining X-rays after they have passed through, and by rotating the X-ray source and detectors around the object and taking multiple measurements from different angles, a computer can create a detailed 3D image of the object's internal structures, similar to a series of slices. On the other hand, neutron CT works similarly to X-ray CT but uses neutrons instead of X-rays. Neutrons have different properties compared to X-rays, as they can penetrate some materials that X-rays cannot and are sensitive to different types of atoms. In neutron CT, a beam of neutrons is directed through the object being examined. Neutrons interact with the atomic nuclei of the materials they pass through, and the interactions produce signals that can be detected by detectors positioned after the object. By analyzing the signals detected from different angles as the neutron beam is rotated around the object, a computer can reconstruct a 3D image showing the distribution of different materials inside the object. Both X-ray and neutron CT are powerful imaging techniques used in various fields such as medicine, archaeology, engineering, and materials science to study the internal structures of objects and materials in detail, providing valuable information for diagnosis, research, and quality control.
</div>
</div>
### CIL goal
"The goal of the Core Imaging Library is to allow the user to simply create iterative reconstruction methods which go beyond the standard filter back projection technique ..." - [CIL documentation](https://tomographicimaging.github.io/CIL/nightly/introduction.html)
If CIL wishes to provide iterrative reconstruction alorthim to a broader audience, implementing MLEM/MAP would be a good start.
<span style="color:red;">**(Reply Vaggelis):** MLEM, SIRT, SART, ART or better Kaczmarz algorithm are basically gradient descent/ascent methods which CIL can do it and is very easy to implement. We have utilities to model the imaging application, e.g., LeastSquares, KullbackLeibler etc. So MLEM is gradient ascent acting on KullbackLeibler fitting term. Of course GD is slow and preconditioning to make a better step_size/leanring rate is needed. We can now do OSEM or Stochastic Gradient Descent which is equivalent to [Randomized Kaczmarz](https://people.eecs.berkeley.edu/~brecht/cs294docs/week1/09.Strohmer.pdf). SIRT is GD on Least Squares with some weight and preconditioning for the step size. You can approximate (Taylor) KL with a weighted LS term. This is standard in [PET](https://web.eecs.umich.edu/~fessler/papers/files/jour/94/pre/tmi,pwl.pdf) but in a recent discussion with Jeff Fessler "*is not very accurate but people still citing it*". You can still apply SIRT, by using another weight, e.g., $$\int \frac{(Au - d)^2}{d}$$, see [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4608542/).
**References**
1. Meric, I., Alagoz, E., Hysing, L.B. et al. A hybrid multi-particle approach to range assessment-based treatment verification in particle therapy. Sci Rep 13, 6709 (2023). https://doi.org/10.1038/s41598-023-33777-w
2. M. Cyrus Daugherty et al. "Assessment of Dose-Reduction Strategies in Wavelength-Selective Neutron Tomography." SN Computer Science, vol. 4, no. 586, 2023, https://doi.org/10.1007/s42979-023-02059-7.
3. Brown R et al. 2021 Motion estimation and correction for simultaneous PET/MR using SIRF and CIL. Phil. Trans. R. Soc. A 379: 20200208. https://doi.org/10.1098/rsta.2020.0208
4. L. Parra and H. H. Barrett, "List-mode likelihood: EM algorithm and image quality estimation demonstrated on 2-D PET," in IEEE Transactions on Medical Imaging, vol. 17, no. 2, pp. 228-235, April 1998, doi: 10.1109/42.700734.