---
title: ASTR 597 W23 HW 5
author: Andy Tzanidakis and friends
tags: LSST Class
---
[TOC]
# Homework 5 - ASTR 597 W23
- Author: Anastasios (Andy) Tzanidakis
- Course ID: ASTR 597 W23
# Problem 1
In this homework we will attempt to "replicate" a classic SDSS paper ([Baldry et al. 2004](https://ui.adsabs.harvard.edu/abs/2004ApJ...600..681B/abstract)) with simulated Rubin data. Plotting the absolute r-band magnitudes of low-redshift galaxies vs. their (u-r) colors shows clearly the red sequence, blue clump, and green valley populations. We will investigate if the DP0.2 galaxies show the same distribution, explore the implications since we won't have (many) spectroscopic redshifts, and look at possible evolution with redshift. Portions of this work were developed based on [this notebook](https://github.com/rubin-dp0/delegate-contributions-dp02/tree/main/photoz/CMNN_Estimator) by Melissa Graham.
We will need to deal with "k-corrections" in order to convert our observer frame photometry to the rest frame. See [Hogg et al. 2002](https://ui.adsabs.harvard.edu/abs/2002astro.ph.10394H/abstract) for a pedagogical overview.
The python package `kcorrect` by Mike Blanton provides our easist way to deal with this. The package documentation is [here](https://kcorrect.readthedocs.io/), the source code [here](https://github.com/blanton144/kcorrect), and the paper describing the algorithm is [here](https://ui.adsabs.harvard.edu/abs/2007AJ....133..734B/abstract).
## Problem 1a - Recreating Baldtry et al. 2004
Now use `kcorrect` to compute the rest-frame absolute magnitude in r-band as well as the rest frame u-r color, and plot them. Note that you'll need to exclude some rows with `NaN`s.
Compare your result qualitatively to Figure 1 of Baldry et al. 2004.
```python=
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import numpy.ma as ma
%matplotlib inline
%config InlineBackend.figure_format = "retina"
from tqdm import tqdm
from matplotlib import rcParams
rcParams["savefig.dpi"] = 100
rcParams["font.size"] = 20
from lsst.rsp import get_tap_service, retrieve_query
service = get_tap_service()
from astropy.cosmology import WMAP9 as cosmo
import astropy.units as u
from tqdm import tqdm
import kcorrect.kcorrect
```
We will first begin by loading our parquet file containing the photometry and redshift of the galaxies. We will also estimate the K-corrected maggies and inverse variances:
```python=
df = pd.read_parquet('lowz_query_results.parquet')
df = df.dropna()
# Responses filters and K-correction
responses = ['decam_u', 'decam_g', 'decam_r', 'decam_i', 'decam_z', 'decam_Y']
kc = kcorrect.kcorrect.Kcorrect(responses=responses)
maggies = df[['obj_cModelMag_u', 'obj_cModelMag_g', 'obj_cModelMag_r',
'obj_cModelMag_i', 'obj_cModelMag_z', 'obj_cModelMag_y']].apply(
lambda x: 10.0 ** (-0.4 * x)).values
ivars = df[['obj_cModelMagErr_u', 'obj_cModelMagErr_g', 'obj_cModelMagErr_r',
'obj_cModelMagErr_i', 'obj_cModelMagErr_z', 'obj_cModelMagErr_y']].apply(
lambda x: 1./ (0.4 * np.log(10) * x) ** 2.).values / (maggies**2.)
```
To recreate the Baldry et al. 2004 figure, we will need to estimate the colors and h$_{70}$ (i.e Hubble constant divided by 70 km/s/Mpc):
```python=
redshift = df['ts_redshift'].to_numpy()
coeffs = kc.fit_coeffs(redshift=redshift, maggies=maggies, ivar=ivars)
Ms = kc.absmag(maggies=maggies, ivar=ivars, redshift=redshift, coeffs=coeffs)
r_mag = df['obj_cModelMag_r'].to_numpy()
u_mag = df['obj_cModelMag_u'].to_numpy()
h70 = cosmo.H(0)/70 # h70 is defined as H0/70
```
Below we plot the galaxy CMD in u-r mag:
```python!
h70 = cosmo.H(0)/70 # h70
MrH70 = Ms[:,1] - 5*np.log10(h70)
hist_data = np.histogram2d(MrH70, u_mag-r_mag, bins=(150, 450))
plt.figure(figsize=(8,5))
plt.imshow(np.log10(hist_data[0].T), origin='lower',
extent=(MrH70.min(), MrH70.max(),
(u_mag-r_mag).min(),
(u_mag-r_mag).max()), cmap='cividis',
aspect='auto', interpolation='nearest')
plt.colorbar(label='log$_{10}$(N$_{galaxy}$)')
plt.ylim(0, 4)
plt.xlabel("M$_r, Petro$ - 5log(h$_{70}$)")
plt.ylabel("u-r")
```

We notice that this figure looks dramatically different compared to the Baldry et al. 2004 figure. First, we note that the redshift range is different, in Baldry et al. 2004 they considered redshifts between 0.004 < z < 0.08. We also notice a larger fraction of fainter galaxies. That is because LSST will be deeper compared to SDSS, and hence we expect that our luminosity function will be somewhat more skewed to the fainter galaxies.
# Problem 2
## Problem 2a - Photo-z Estimates
We made two cheats above: we used the truth table to tell us which galaxies we have as well as their redshifts. In SDSS those redshifts were obtained through spectroscopy, but for Rubin most redshifts will have to be obtained photometrically.
Using the data from our query above, follow [this notebook](https://github.com/rubin-dp0/delegate-contributions-dp02/tree/main/photoz/CMNN_Estimator) by Melissa Graham to compute a very simple photo-z for these galaxies. Plot the true redshift vs estimated photo-z, and then re-plot the CMD from Exercise 1 using photo-z rather than the true spectroscopic redshift.
Using Melissa Graham's tutorial, we estimate the photo-z using the CMNN estimator approach:
```python=
data_id = np.asarray(df['mt_match_objectId'], dtype='int')
# true ("spec") redshifts
data_tz = np.asarray(df['ts_redshift'], dtype='float')
# true ("spec") magnitudes
data_tm = np.transpose(np.asarray((df['ts_mag_u'],df['ts_mag_g'],\
df['ts_mag_r'],df['ts_mag_i'],\
df['ts_mag_z'],df['ts_mag_y']),\
dtype='float' ) )
# object apparent magnitudes
data_om = np.transpose(np.asarray((df['obj_cModelMag_u'],df['obj_cModelMag_g'],\
df['obj_cModelMag_r'],df['obj_cModelMag_i'],\
df['obj_cModelMag_z'],df['obj_cModelMag_y']),\
dtype='float' ) )
# object apparent magnitude errors
data_ome = np.transpose(np.asarray((df['obj_cModelMagErr_u'],df['obj_cModelMagErr_g'],\
df['obj_cModelMagErr_r'],df['obj_cModelMagErr_i'],\
df['obj_cModelMagErr_z'],df['obj_cModelMagErr_y']),\
dtype='float' ) )
# true ("spec") and object colors and color errors
data_tc = np.zeros( (len(data_om),5), dtype='float' )
data_oc = np.zeros( (len(data_om),5), dtype='float' )
data_oce = np.zeros( (len(data_om),5), dtype='float' )
data_tc[:,0] = data_tm[:,0] - data_tm[:,1]
data_tc[:,1] = data_tm[:,1] - data_tm[:,2]
data_tc[:,2] = data_tm[:,2] - data_tm[:,3]
data_tc[:,3] = data_tm[:,3] - data_tm[:,4]
data_tc[:,4] = data_tm[:,4] - data_tm[:,5]
data_oc[:,0] = data_om[:,0] - data_om[:,1]
data_oc[:,1] = data_om[:,1] - data_om[:,2]
data_oc[:,2] = data_om[:,2] - data_om[:,3]
data_oc[:,3] = data_om[:,3] - data_om[:,4]
data_oc[:,4] = data_om[:,4] - data_om[:,5]
data_oce[:,0] = np.sqrt( data_ome[:,0]**2 + data_ome[:,1]**2 )
data_oce[:,1] = np.sqrt( data_ome[:,1]**2 + data_ome[:,2]**2 )
data_oce[:,2] = np.sqrt( data_ome[:,2]**2 + data_ome[:,3]**2 )
data_oce[:,3] = np.sqrt( data_ome[:,3]**2 + data_ome[:,4]**2 )
data_oce[:,4] = np.sqrt( data_ome[:,4]**2 + data_ome[:,5]**2 )
cmnn_ppf = 0.68
cmnn_minNclr = 5
cmnn_thresh_table = np.zeros(6, dtype='float')
for d in range(6):
cmnn_thresh_table[d] = chi2.ppf(cmnn_ppf,d)
cmnn_thresh_table[0] = float(0.0000)
```
```python=
# where we will store photo-z
data_pz = np.zeros(len(data_tz), dtype='float') - 1.0
data_pze = np.zeros(len(data_tz), dtype='float') - 1.0
Ncalc = len(data_tz)
```
```python=
%%time
t1 = datetime.datetime.now()
for i, r in tqdm(enumerate(range(len(data_pz)))):
# calculate DM and DOF
DM = np.nansum((data_oc[r,:] - data_tc[:,:])**2 / data_oce[r,:]**2, axis=1, dtype='float')
DOF = np.nansum((data_oc[r,:]**2 + data_tc[:,:]**2 + 1.0) / (data_oc[r,:]**2 + data_tc[:,:]**2 + 1.0), axis=1, dtype='int')
# calculate the thresholds
data_th = np.zeros(len(DOF), dtype='float')
for d in range(6):
tx = np.where(DOF == d)[0]
data_th[tx] = cmnn_thresh_table[d]
del tx
DM[r] = 99.9
# identify the CMNN subset of training-set galaxies:
# those for which the DM is less than the threshold
ix = np.where((DOF >= cmnn_minNclr) & (data_th > 0.00010) & \
(DM > 0.00010) & (DM <= data_th))[0]
if len(ix) > 0:
# choose a random training-set galaxy from the CMNN subset
rix = np.random.choice(ix, size=1, replace=False)[0]
data_pz[r] = data_tz[rix]
data_pze[r] = np.std(data_tz[ix])
del rix
else:
data_pz[r] = float('nan')
data_pze[r] = float('nan')
del DM, DOF, data_th, ix
```
Here, we display the true injected redshift compared to the recovered photo-z redshift derived using the CMNN estimator:
```python!
plt.figure(figsize=(6,5))
plt.hexbin(data_tz, data_pz, gridsize=50, cmap='binary')
plt.plot( [0.0,2.0], [0.0,2.0], color='firebrick', ls='--', lw=2)
plt.colorbar(label='N$_{galaxy}$')
plt.xlim(min(data_pz), max(data_tz))
plt.ylim(min(data_pz), max(data_pz))
plt.xlabel("True Redshift")
plt.ylabel("Photo-z Redshift")
```

Now, we can regenerate the galaxy CMD plot with our photo-z samples. As expected from the above plot, we know that the photo-z estimator that we recovered only the higher redshifts. When translating this to the CMD plot using the simulated photo-z redshifts, we find that the photo-z CMD is overall more skewed with more fainter galaxies:

Finally, in the figure below we plot the cumulative distribution of the spectroscopic and photometric redshifts:

# Problem 3
## Problem 3a - Hi-z Galaxy CMD
Repeat exercise 1 for a higher redshift range (z=1.5-2?) and see how the CMD changes.
Same as in the first problem, we will repeat all our calculations as done before. For the redshift distribution, we will assume a random power law distribution:
```python=
synth_z = 1 + np.random.power(10, len(df)) # synthetic redshifts
```
Below we can see what the injected redshift distribution will look like, while keeping the same magnitudes:

In the figure below, we show the galaxy CMD for the synthetic redshift distribution from z=1.5-2. If we keep the magnitudes the same, we find that the overall CMD will shift to the brighter end of the distribution:

# Acknowledgements
Many concepts and ideas for the workflow of this project were inspired by conversations and code I shared with Tom Wagg, David Wang, Tobin Wainer, and Jake Kurlander, and John Franklin Crenshaw.