# Gaia rotation curves - mock data analysis
## Problem statement
We have the following three basic sources of uncertainty:
- limited amount of data (statistical uncertainty)
- measurement errors ($\sigma_{v_\phi}$ for each star)
- variance of the data (in each bin of $r$, there is a distribution of stars with a nonzero width in $v_\phi$ and $v_r$)
The posterior of the fit gives us a central value of the circular velocity for each bin (the rotation curve), the uncertainties through the width of the posterior, and also the correlations between the uncertainties.
We can investigate this by generating mock data from a known, simple distribution that exactly corresponds to the model, propagating it through the analysis, and comparing the results to the values assumed when generating the mock distribution.
### Central value of the posterior (bias)
Ideally, our analysis should be unbiased: we should recover the central values of the rotation curve that we assumed in the mock data generation (we do not examine this here).
### Width of the posterior (sensitivity analysis)
We should also be able to see the effect of the above sources of uncertainty on the width of the posterior.
**How do each of the basic sources of uncertainty affect the uncertainty estimate from the width of the posterior?**
In each bin, let's sample mock data for $v_\phi$, $v_r$, $\sigma_{v_\phi}$ as follows:
$$
v_\phi \sim \mathcal{N}(M_{v_\phi}, V_{v_\phi})
$$
$$
v_r \sim \mathcal{N}(M_{v_r}, V_{v_r})
$$
$$
\ln{\left|\frac{\sigma_{v_\phi}}{v_\phi}\right|} \sim \mathcal{N}(M_{\sigma_{v_\phi}}, V_{\sigma_{v_\phi}})
$$
Therefore, under this model, the quantities $v_\phi$, $v_r$, $\sigma_{v_\phi}$ are fully described by the per-bin parameters $M_{v_\phi}, V_{v_\phi}, M_{v_r}, V_{v_r}, M_{\sigma_{v_\phi}}, V_{\sigma_{v_\phi}}$ together with the sample size $N_{\mathrm{stars}}$.
We can estimate these parameters directly from data, taking the mean/std of the empirical distributions.
Our goal here is for the mock data to be approximately similar to the real data distributions, i.e. reasonably physical.
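As a sketch, the per-bin sampling can be done directly with NumPy. The parameter values below are illustrative placeholders (not values estimated from the real data):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-bin parameters; in practice these would be the
# mean/std of the empirical distributions in this radial bin.
M_vphi, V_vphi = 220.0, 25.0   # km/s: mean and std of v_phi
M_vr, V_vr = 0.0, 35.0         # km/s: mean and std of v_r
M_lnsig, V_lnsig = -4.0, 0.5   # mean and std of ln|sigma_vphi / v_phi|
N_stars = 10_000               # number of stars in the bin

# Draw the mock sample for one radial bin.
v_phi = rng.normal(M_vphi, V_vphi, N_stars)
v_r = rng.normal(M_vr, V_vr, N_stars)
sigma_vphi = np.abs(v_phi) * np.exp(rng.normal(M_lnsig, V_lnsig, N_stars))
```

The log-normal draw is applied to the ratio $\sigma_{v_\phi}/v_\phi$, so the measurement errors are recovered by multiplying back with $|v_\phi|$, which keeps them strictly positive.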



Now the distributions seen by the likelihood follow Gaussian / log-normal distributions exactly.


We can run the usual MCMC fit for 5000 steps, discarding the first 200 steps as burn-in.
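The fit itself is not spelled out here; as a minimal stand-in, a random-walk Metropolis sampler with a simple Gaussian likelihood in $v_\phi$ (a hypothetical simplification of the full model) illustrates the 5000-step, 200-step-burn-in setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Mock v_phi sample for one bin (illustrative values).
v_phi = rng.normal(220.0, 25.0, 1000)

# Gaussian log-likelihood with free mean and width; a stand-in for
# the full rotation-curve model, up to an additive constant.
def log_prob(theta):
    mean, width = theta
    if width <= 0:
        return -np.inf
    return -0.5 * np.sum(((v_phi - mean) / width) ** 2) - v_phi.size * np.log(width)

# Minimal random-walk Metropolis: 5000 steps, first 200 discarded.
n_steps, burn_in = 5000, 200
chain = np.empty((n_steps, 2))
theta, lp = np.array([200.0, 30.0]), -np.inf
for i in range(n_steps):
    prop = theta + rng.normal(0.0, [0.5, 0.5])  # symmetric proposal
    lp_prop = log_prob(prop)
    if np.log(rng.uniform()) < lp_prop - lp:    # Metropolis acceptance
        theta, lp = prop, lp_prop
    chain[i] = theta
posterior = chain[burn_in:]
```

In practice an ensemble sampler would likely be used instead; the point here is only the chain length and burn-in convention.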


This gives us central values and uncertainties for each bin, computed from the 16th, 50th, and 84th percentiles of the posterior.
Given the width of the posterior in each bin, we can define the relative error of a given bin as
$$
E=\frac{P_{84\%} - P_{16\%}}{P_{50\%}}
$$
and use that as a figure of merit for the sensitivity.
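A minimal sketch of computing $E$ from posterior samples; the samples here are a synthetic Gaussian placeholder rather than an actual MCMC output:

```python
import numpy as np

# Placeholder 1-D posterior for the circular velocity in one bin.
posterior_samples = np.random.default_rng(0).normal(220.0, 5.0, 10_000)

# Relative error: 16-84 percentile width over the median.
p16, p50, p84 = np.percentile(posterior_samples, [16, 50, 84])
E = (p84 - p16) / p50
```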
#### Variation in the number of stars (statistical uncertainty)
Then, we can investigate the effect of parameter variations on the output, for example by creating mock data samples with different sample sizes, $N_{\mathrm{stars}} \rightarrow [0.5, 0.8, 0.9, 1.0, 1.1, 1.2, 1.5] \times N_{\mathrm{stars}}$, and studying the corresponding variation in the relative error with respect to the nominal / empirical value of $1.0 \times N_{\mathrm{stars}}$.
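A sketch of the sweep, assuming illustrative bin parameters and replacing the full MCMC with the approximately Gaussian posterior of the sample mean (whose width scales as $1/\sqrt{N}$):

```python
import numpy as np

scales = np.array([0.5, 0.8, 0.9, 1.0, 1.1, 1.2, 1.5])
N_nominal = 10_000               # illustrative per-bin star count
M_vphi, V_vphi = 220.0, 25.0     # illustrative bin parameters (km/s)

rel_errors = {}
for s in scales:
    n = int(s * N_nominal)
    rng = np.random.default_rng(1)
    samples = rng.normal(M_vphi, V_vphi, n)
    # Approximate the posterior of the mean as Gaussian with width
    # std/sqrt(n); use mean +/- that width as the 16/84 percentiles.
    mean, sem = samples.mean(), samples.std() / np.sqrt(n)
    p16, p50, p84 = mean - sem, mean, mean + sem
    rel_errors[s] = (p84 - p16) / p50
```

Under this stand-in, halving $N_{\mathrm{stars}}$ inflates the relative error by roughly $\sqrt{2}$, which is the statistical-uncertainty scaling the sweep is designed to expose.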
Posterior with $N_{\mathrm{stars}} \rightarrow 0.5 \times N_{\mathrm{stars}}$.

Posterior with $N_{\mathrm{stars}} \rightarrow 1.5 \times N_{\mathrm{stars}}$.

We can also compute the average change in relative error across bins, relative to the nominal case of $1.0 \times N_{\mathrm{stars}}$.
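For example, with hypothetical per-bin relative errors (illustrative numbers only, not real results):

```python
import numpy as np

# Hypothetical relative errors E per radial bin for two scalings.
E_nominal = np.array([0.040, 0.035, 0.050])  # 1.0 x N_stars
E_scaled = np.array([0.056, 0.049, 0.071])   # e.g. 0.5 x N_stars

# Average fractional change in relative error across bins,
# normalized to the nominal-N_stars values.
avg_change = np.mean((E_scaled - E_nominal) / E_nominal)
```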
