--- title: ASTR 597 W23 HW 5 author: Andy Tzanidakis and friends tags: LSST Class --- [TOC] # Homework 5 - ASTR 597 W23 - Author: Anastasios (Andy) Tzanidakis - Course ID: ASTR 597 W23 # Problem 1 In this homework we will attempt to "replicate" a classic SDSS paper ([Baldry et al. 2004](https://ui.adsabs.harvard.edu/abs/2004ApJ...600..681B/abstract)) with simulated Rubin data. Plotting the absolute r-band magnitudes of low-redshift galaxies vs. their (u-r) colors shows clearly the red sequence, blue clump, and green valley populations. We will investigate if the DP0.2 galaxies show the same distribution, explore the implications since we won't have (many) spectroscopic redshifts, and look at possible evolution with redshift. Portions of this work were developed based on [this notebook](https://github.com/rubin-dp0/delegate-contributions-dp02/tree/main/photoz/CMNN_Estimator) by Melissa Graham. We will need to deal with "k-corrections" in order to convert our observer frame photometry to the rest frame. See [Hogg et al. 2002](https://ui.adsabs.harvard.edu/abs/2002astro.ph.10394H/abstract) for a pedagogical overview. The python package `kcorrect` by Mike Blanton provides our easist way to deal with this. The package documentation is [here](https://kcorrect.readthedocs.io/), the source code [here](https://github.com/blanton144/kcorrect), and the paper describing the algorithm is [here](https://ui.adsabs.harvard.edu/abs/2007AJ....133..734B/abstract). ## Problem 1a - Recreating Baldtry et al. 2004 Now use `kcorrect` to compute the rest-frame absolute magnitude in r-band as well as the rest frame u-r color, and plot them. Note that you'll need to exclude some rows with `NaN`s. Compare your result qualitatively to Figure 1 of Baldry et al. 2004. ```python= import numpy as np import matplotlib.pyplot as plt import pandas as pd import numpy.ma as ma %matplotlib inline %config InlineBackend.figure_format = "retina" from tqdm import tqdm from matplotlib import rcParams rcParams["savefig.dpi"] = 100 rcParams["font.size"] = 20 from lsst.rsp import get_tap_service, retrieve_query service = get_tap_service() from astropy.cosmology import WMAP9 as cosmo import astropy.units as u from tqdm import tqdm import kcorrect.kcorrect ``` We will first begin by loading our parquet file containing the photometry and redshift of the galaxies. We will also estimate the K-corrected maggies and inverse variances: ```python= df = pd.read_parquet('lowz_query_results.parquet') df = df.dropna() # Responses filters and K-correction responses = ['decam_u', 'decam_g', 'decam_r', 'decam_i', 'decam_z', 'decam_Y'] kc = kcorrect.kcorrect.Kcorrect(responses=responses) maggies = df[['obj_cModelMag_u', 'obj_cModelMag_g', 'obj_cModelMag_r', 'obj_cModelMag_i', 'obj_cModelMag_z', 'obj_cModelMag_y']].apply( lambda x: 10.0 ** (-0.4 * x)).values ivars = df[['obj_cModelMagErr_u', 'obj_cModelMagErr_g', 'obj_cModelMagErr_r', 'obj_cModelMagErr_i', 'obj_cModelMagErr_z', 'obj_cModelMagErr_y']].apply( lambda x: 1./ (0.4 * np.log(10) * x) ** 2.).values / (maggies**2.) ``` To recreate the Baldry et al. 2004 figure, we will need to estimate the colors and h$_{70}$ (i.e Hubble constant divided by 70 km/s/Mpc): ```python= redshift = df['ts_redshift'].to_numpy() coeffs = kc.fit_coeffs(redshift=redshift, maggies=maggies, ivar=ivars) Ms = kc.absmag(maggies=maggies, ivar=ivars, redshift=redshift, coeffs=coeffs) r_mag = df['obj_cModelMag_r'].to_numpy() u_mag = df['obj_cModelMag_u'].to_numpy() h70 = cosmo.H(0)/70 # h70 is defined as H0/70 ``` Below we plot the galaxy CMD in u-r mag: ```python! h70 = cosmo.H(0)/70 # h70 MrH70 = Ms[:,1] - 5*np.log10(h70) hist_data = np.histogram2d(MrH70, u_mag-r_mag, bins=(150, 450)) plt.figure(figsize=(8,5)) plt.imshow(np.log10(hist_data[0].T), origin='lower', extent=(MrH70.min(), MrH70.max(), (u_mag-r_mag).min(), (u_mag-r_mag).max()), cmap='cividis', aspect='auto', interpolation='nearest') plt.colorbar(label='log$_{10}$(N$_{galaxy}$)') plt.ylim(0, 4) plt.xlabel("M$_r, Petro$ - 5log(h$_{70}$)") plt.ylabel("u-r") ``` ![](https://i.imgur.com/NgQSquR.png) We notice that this figure looks dramatically different compared to the Baldry et al. 2004 figure. First, we note that the redshift range is different, in Baldry et al. 2004 they considered redshifts between 0.004 < z < 0.08. We also notice a larger fraction of fainter galaxies. That is because LSST will be deeper compared to SDSS, and hence we expect that our luminosity function will be somewhat more skewed to the fainter galaxies. # Problem 2 ## Problem 2a - Photo-z Estimates We made two cheats above: we used the truth table to tell us which galaxies we have as well as their redshifts. In SDSS those redshifts were obtained through spectroscopy, but for Rubin most redshifts will have to be obtained photometrically. Using the data from our query above, follow [this notebook](https://github.com/rubin-dp0/delegate-contributions-dp02/tree/main/photoz/CMNN_Estimator) by Melissa Graham to compute a very simple photo-z for these galaxies. Plot the true redshift vs estimated photo-z, and then re-plot the CMD from Exercise 1 using photo-z rather than the true spectroscopic redshift. Using Melissa Graham's tutorial, we estimate the photo-z using the CMNN estimator approach: ```python= data_id = np.asarray(df['mt_match_objectId'], dtype='int') # true ("spec") redshifts data_tz = np.asarray(df['ts_redshift'], dtype='float') # true ("spec") magnitudes data_tm = np.transpose(np.asarray((df['ts_mag_u'],df['ts_mag_g'],\ df['ts_mag_r'],df['ts_mag_i'],\ df['ts_mag_z'],df['ts_mag_y']),\ dtype='float' ) ) # object apparent magnitudes data_om = np.transpose(np.asarray((df['obj_cModelMag_u'],df['obj_cModelMag_g'],\ df['obj_cModelMag_r'],df['obj_cModelMag_i'],\ df['obj_cModelMag_z'],df['obj_cModelMag_y']),\ dtype='float' ) ) # object apparent magnitude errors data_ome = np.transpose(np.asarray((df['obj_cModelMagErr_u'],df['obj_cModelMagErr_g'],\ df['obj_cModelMagErr_r'],df['obj_cModelMagErr_i'],\ df['obj_cModelMagErr_z'],df['obj_cModelMagErr_y']),\ dtype='float' ) ) # true ("spec") and object colors and color errors data_tc = np.zeros( (len(data_om),5), dtype='float' ) data_oc = np.zeros( (len(data_om),5), dtype='float' ) data_oce = np.zeros( (len(data_om),5), dtype='float' ) data_tc[:,0] = data_tm[:,0] - data_tm[:,1] data_tc[:,1] = data_tm[:,1] - data_tm[:,2] data_tc[:,2] = data_tm[:,2] - data_tm[:,3] data_tc[:,3] = data_tm[:,3] - data_tm[:,4] data_tc[:,4] = data_tm[:,4] - data_tm[:,5] data_oc[:,0] = data_om[:,0] - data_om[:,1] data_oc[:,1] = data_om[:,1] - data_om[:,2] data_oc[:,2] = data_om[:,2] - data_om[:,3] data_oc[:,3] = data_om[:,3] - data_om[:,4] data_oc[:,4] = data_om[:,4] - data_om[:,5] data_oce[:,0] = np.sqrt( data_ome[:,0]**2 + data_ome[:,1]**2 ) data_oce[:,1] = np.sqrt( data_ome[:,1]**2 + data_ome[:,2]**2 ) data_oce[:,2] = np.sqrt( data_ome[:,2]**2 + data_ome[:,3]**2 ) data_oce[:,3] = np.sqrt( data_ome[:,3]**2 + data_ome[:,4]**2 ) data_oce[:,4] = np.sqrt( data_ome[:,4]**2 + data_ome[:,5]**2 ) cmnn_ppf = 0.68 cmnn_minNclr = 5 cmnn_thresh_table = np.zeros(6, dtype='float') for d in range(6): cmnn_thresh_table[d] = chi2.ppf(cmnn_ppf,d) cmnn_thresh_table[0] = float(0.0000) ``` ```python= # where we will store photo-z data_pz = np.zeros(len(data_tz), dtype='float') - 1.0 data_pze = np.zeros(len(data_tz), dtype='float') - 1.0 Ncalc = len(data_tz) ``` ```python= %%time t1 = datetime.datetime.now() for i, r in tqdm(enumerate(range(len(data_pz)))): # calculate DM and DOF DM = np.nansum((data_oc[r,:] - data_tc[:,:])**2 / data_oce[r,:]**2, axis=1, dtype='float') DOF = np.nansum((data_oc[r,:]**2 + data_tc[:,:]**2 + 1.0) / (data_oc[r,:]**2 + data_tc[:,:]**2 + 1.0), axis=1, dtype='int') # calculate the thresholds data_th = np.zeros(len(DOF), dtype='float') for d in range(6): tx = np.where(DOF == d)[0] data_th[tx] = cmnn_thresh_table[d] del tx DM[r] = 99.9 # identify the CMNN subset of training-set galaxies: # those for which the DM is less than the threshold ix = np.where((DOF >= cmnn_minNclr) & (data_th > 0.00010) & \ (DM > 0.00010) & (DM <= data_th))[0] if len(ix) > 0: # choose a random training-set galaxy from the CMNN subset rix = np.random.choice(ix, size=1, replace=False)[0] data_pz[r] = data_tz[rix] data_pze[r] = np.std(data_tz[ix]) del rix else: data_pz[r] = float('nan') data_pze[r] = float('nan') del DM, DOF, data_th, ix ``` Here, we display the true injected redshift compared to the recovered photo-z redshift derived using the CMNN estimator: ```python! plt.figure(figsize=(6,5)) plt.hexbin(data_tz, data_pz, gridsize=50, cmap='binary') plt.plot( [0.0,2.0], [0.0,2.0], color='firebrick', ls='--', lw=2) plt.colorbar(label='N$_{galaxy}$') plt.xlim(min(data_pz), max(data_tz)) plt.ylim(min(data_pz), max(data_pz)) plt.xlabel("True Redshift") plt.ylabel("Photo-z Redshift") ``` ![](https://i.imgur.com/8lteP0s.png) Now, we can regenerate the galaxy CMD plot with our photo-z samples. As expected from the above plot, we know that the photo-z estimator that we recovered only the higher redshifts. When translating this to the CMD plot using the simulated photo-z redshifts, we find that the photo-z CMD is overall more skewed with more fainter galaxies: ![](https://i.imgur.com/JxKqqnd.png) Finally, in the figure below we plot the cumulative distribution of the spectroscopic and photometric redshifts: ![](https://i.imgur.com/KKO76xP.png) # Problem 3 ## Problem 3a - Hi-z Galaxy CMD Repeat exercise 1 for a higher redshift range (z=1.5-2?) and see how the CMD changes. Same as in the first problem, we will repeat all our calculations as done before. For the redshift distribution, we will assume a random power law distribution: ```python= synth_z = 1 + np.random.power(10, len(df)) # synthetic redshifts ``` Below we can see what the injected redshift distribution will look like, while keeping the same magnitudes: ![](https://i.imgur.com/ZVkciaX.png) In the figure below, we show the galaxy CMD for the synthetic redshift distribution from z=1.5-2. If we keep the magnitudes the same, we find that the overall CMD will shift to the brighter end of the distribution: ![](https://i.imgur.com/q3FEKCc.png) # Acknowledgements Many concepts and ideas for the workflow of this project were inspired by conversations and code I shared with Tom Wagg, David Wang, Tobin Wainer, and Jake Kurlander, and John Franklin Crenshaw.