---
title: ASTR 597 W23 HW #2
author: Andy Tzanidakis and friends
tags: LSST Class
---

[TOC]

# Homework 2 - ASTR 597 W23

- Author: Anastasios (Andy) Tzanidakis
- Collaborators: None
- Course ID: ASTR 597 W23

# Problem 1

We will initialize our set-up for this problem by importing our favorite libraries:

```python=
# Import general python packages
import numpy as np
import matplotlib.pyplot as plt
import pandas
from astropy import units as u
from astropy.coordinates import SkyCoord

# Import the Rubin TAP service utilities
from lsst.rsp import get_tap_service, retrieve_query

%matplotlib inline
%config InlineBackend.figure_format = "retina"

from tqdm import tqdm
from matplotlib import rcParams
rcParams["savefig.dpi"] = 100
rcParams["font.size"] = 20

# Create the TAP service client
service = get_tap_service()

# We will all use the same input coordinates and radius
center_coords = "62, -37"
radius_deg = 0.5  # degrees
```

## Exercise 1a. Counting `isPrimary` objects

How many `isPrimary` objects are in all of DP0.2?

```python=
# Note the trailing space inside each string fragment: without it the
# concatenated ADQL would run tokens together (e.g. "...isPrimary)FROM...")
results = service.search("SELECT count(detect_isPrimary) "
                         "FROM dp02_dc2_catalogs.Object "
                         "WHERE detect_isPrimary = 1 ").to_table()

>> Total number of rows 161025835
```

Hooray, our query is complete. We find over 160,000,000 rows in the DP0.2 Object catalogue that satisfy `isPrimary`.

## Exercise 1b. Memory

How much memory do you need to load one row of the Object table?
*Hint: use the dtypes of all the columns.*

To solve this, we can enumerate the columns, sum the number of bits implied by each column's `dtype`, and convert the total to bytes:

```python=
job = service.submit_job("SELECT TOP 1 * "
                         "FROM dp02_dc2_catalogs.Object ")
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])

# Final table
res = job.fetch_result().to_table()

n_bits = 0
for k in res.keys():
    d_format = res[k].dtype
    if d_format == 'float32' or d_format == 'int32':
        n_bits += 32
    elif d_format == 'float64' or d_format == 'int64':
        n_bits += 64
    elif d_format == 'bool':
        n_bits += 1

print(f"Total row memory: {(n_bits*u.bit).to(u.kilobyte)}")

>> Total row memory: 4.1725 kbyte
```

One row of the DP0.2 Object table is roughly 4.2 kilobytes!

## Exercise 1c. Maximum rows on `data.lsst.cloud`

What's the maximum number of full rows of the Object table you can load into a Medium instance on data.lsst.cloud?

Let's assume a Medium data.lsst.cloud instance gives us roughly 6.1 gigabytes of RAM:

```python=
# Memory per row (~4.17 kB) from Exercise 1b
print(((6.1*u.gigabyte).to(u.kilobyte))/(4.17*u.kilobyte))
```

$$
\text{Max Rows} = \frac{\text{Max Memory (Medium instance)}}{\text{Memory per Row}} = \frac{6.1\ \text{GB}}{4.17\times 10^{-6}\ \text{GB}} \approx 1.46\times 10^{6}\ \text{rows}
$$

# Problem 2

## Exercise 2a. Extendedness

What values can the `{band}_extendedness` entry take? What fraction of the time do the values disagree between bands? What if you only look at rows where none of the `{band}_extendedness_flag` are set?
We will reuse the cone search from the first problem and look at the top 100,000 entries:

```python=
# Coordinates and radius (in deg)
center_coords = "62, -37"
rad = 0.5

results = service.search("SELECT TOP 100000 * "
                         "FROM dp02_dc2_catalogs.Object "
                         "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
                         f"CIRCLE('ICRS', {center_coords}, {rad})) = 1 ")
table = results.to_table()
```

To find the unique entries of `{band}_extendedness`:

```python=
print(np.unique(table['z_extendedness'].data.data))

>> array([ 0.,  1., nan])
```

Now we want to know the fraction of rows where `{band}_extendedness` disagrees between the bands:

```python=
# Coordinates and radius (in deg)
center_coords = "62, -37"
rad = 0.5

results = service.search("SELECT u_extendedness, g_extendedness, r_extendedness, "
                         "i_extendedness, z_extendedness, y_extendedness "
                         "FROM dp02_dc2_catalogs.Object "
                         "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
                         f"CIRCLE('ICRS', {center_coords}, {rad})) = 1 "
                         "AND detect_isPrimary = 1")
table = results.to_table()

# Rows where all six bands agree on extendedness = 1 (extended)...
passed_1 = np.where((table['u_extendedness']==1) & (table['g_extendedness']==1) &
                    (table['r_extendedness']==1) & (table['i_extendedness']==1) &
                    (table['z_extendedness']==1) & (table['y_extendedness']==1))

# ...or on extendedness = 0 (point-like)
passed_0 = np.where((table['u_extendedness']==0) & (table['g_extendedness']==0) &
                    (table['r_extendedness']==0) & (table['i_extendedness']==0) &
                    (table['z_extendedness']==0) & (table['y_extendedness']==0))

total = len(passed_1[0]) + len(passed_0[0])  # total number of rows where all bands agree
frac_total_extendedness = 100 - total/len(table) * 100

print(f"Percentage of rows where the bands disagree on extendedness: {frac_total_extendedness} percent")

>> 56.62 percent
```

We find that the bands disagree on extendedness for roughly 57% of rows. Note that a NaN in any band fails both equality tests, so such rows are counted as disagreements here.
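The six-way `&` chains above can be compressed with `np.all` along a band axis. A minimal sketch on synthetic data (the toy array below is an illustrative assumption, not DP0.2 output):

```python=
import numpy as np

# Toy extendedness matrix: one row per object, one column per band (ugrizy).
# NaN marks a band where extendedness could not be measured.
ext = np.array([[1, 1, 1, 1, 1, 1],        # all bands agree: extended
                [0, 0, 0, 0, 0, 0],        # all bands agree: point-like
                [1, 0, 1, 1, 1, 1],        # bands disagree
                [np.nan, 1, 1, 1, 1, 1]])  # NaN != 1, so this row counts as a disagreement

all_ones = np.all(ext == 1, axis=1)   # rows uniformly flagged extended
all_zeros = np.all(ext == 0, axis=1)  # rows uniformly flagged point-like
frac_disagree = 1 - (all_ones.sum() + all_zeros.sum()) / len(ext)

print(frac_disagree)  # 0.5 for this toy input
```

Because `NaN == 1` is `False`, this reproduces the same NaN-counts-as-disagreement behavior as the explicit `&` chains.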
We can also investigate how this fraction changes when none of the `{band}_extendedness_flag` flags are set:

```python=
# Keep only rows where the extendedness flag is unset in every band
cond_1 = np.where((table['u_extendedness_flag']==False) & (table['g_extendedness_flag']==False) &
                  (table['r_extendedness_flag']==False) & (table['i_extendedness_flag']==False) &
                  (table['z_extendedness_flag']==False) & (table['y_extendedness_flag']==False))
tab_new = table[cond_1]  # new table

passed_1_ = np.where((tab_new['u_extendedness']==1) & (tab_new['g_extendedness']==1) &
                     (tab_new['r_extendedness']==1) & (tab_new['i_extendedness']==1) &
                     (tab_new['z_extendedness']==1) & (tab_new['y_extendedness']==1))

passed_0_ = np.where((tab_new['u_extendedness']==0) & (tab_new['g_extendedness']==0) &
                     (tab_new['r_extendedness']==0) & (tab_new['i_extendedness']==0) &
                     (tab_new['z_extendedness']==0) & (tab_new['y_extendedness']==0))

print(f"Fraction of disagreeing rows when no flag is set: {1 - (len(passed_1_[0]) + len(passed_0_[0]))/len(tab_new)}")

>> Fraction of disagreeing rows when no flag is set: 0.18672121303107636
```

We find that the fraction is significantly smaller, about 19%, when none of the `_flag` columns are set.

## Exercise 2b. Signal to Noise

Make a plot of PSF magnitude vs. magnitude error for r-band point sources in our region of interest. Exclude sources with the PSF Flux general failure flag set. Set the axes extents to a reasonable range and label your axes!

Hints: Consult the [schema browser](https://dm.lsst.org/sdm_schemas/browser/dp02.html#Object) for column names. Consider using [hexbin](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hexbin.html) instead of a scatter plot with `bins='log'`...
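The `scisql` helpers used in the solution below hide the actual flux-to-magnitude conversion. The underlying relations are $m_{AB} = -2.5\log_{10}(f/\mathrm{nJy}) + 31.4$ and, by first-order error propagation, $\sigma_m \approx \frac{2.5}{\ln 10}\,\sigma_f/f \approx 1.0857\,\sigma_f/f$. A pure-numpy sketch (the sample fluxes are made up for illustration):

```python=
import numpy as np

def nanojansky_to_abmag(flux_njy):
    """AB magnitude from flux in nanojansky (3631 Jy corresponds to 0 mag)."""
    return -2.5 * np.log10(flux_njy) + 31.4

def abmag_sigma(flux_njy, flux_err_njy):
    """Magnitude uncertainty via first-order error propagation."""
    return 2.5 / np.log(10) * flux_err_njy / flux_njy

flux = np.array([3631e9, 1e5, 1e3])  # made-up fluxes in nJy
flux_err = 0.05 * flux               # 5% flux errors

print(nanojansky_to_abmag(flux))    # approximately [0, 18.9, 23.9]
print(abmag_sigma(flux, flux_err))  # ~0.054 mag for a 5% flux error
```

This also explains the shape of the plot below: since $\sigma_m$ scales with the inverse signal-to-noise ratio, the magnitude error grows exponentially toward faint magnitudes.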
We will first begin with our query, using `scisql_nanojanskyToAbMag` to convert the PSF fluxes to AB magnitudes. Following the exercise, we select point sources (`r_extendedness = 0`) and exclude rows with the PSF flux general failure flag set:

```python=
# Convert to AB magnitudes & errors for LSST-r
results = service.search("SELECT coord_ra, coord_dec, "
                         "scisql_nanojanskyToAbMag(r_psfFlux) as r_psf_mag, "
                         "scisql_nanojanskyToAbMagSigma(r_psfFlux, r_psfFluxErr) as r_psf_mag_err "
                         "FROM dp02_dc2_catalogs.Object "
                         "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
                         "CIRCLE('ICRS', "+center_coords+", 0.5)) = 1 "
                         "AND detect_isPrimary = 1 "
                         "AND r_extendedness = 0 "
                         "AND r_psfFlux_flag = 0 ", maxrec=250_000)

# Fetch table
table = results.to_table()
```

|<img src="https://i.imgur.com/6fcFe6p.png" alt="Image" width="500" height="500" style="display: block; margin: 0 auto" />|
|:--:|
| Figure 1. -- AB apparent magnitude vs. magnitude error, estimated from the PSF fluxes.|

## Exercise 3: Find a Transient or Variable

Using the `DiaObject` table, find an object with more than 1 magnitude of variability and plot its lightcurve.

One quick way to search through the light curves for >1 mag variability (say we're interested in the LSST-r band magnitudes):

```python=
# `table` holds candidate objects from a preceding DiaObject-based selection (not shown)
for i in tqdm(range(100)):
    sel_objid = table[i]['objectId']

    # From Tutorial 07b: fetch the forced-photometry light curve
    query = "SELECT src.band, src.ccdVisitId, src.coord_ra, src.coord_dec, "\
            "src.objectId, src.psfFlux, src.psfFluxErr, "\
            "scisql_nanojanskyToAbMag(psfFlux) as psfMag, "\
            "visinfo.expMidptMJD, visinfo.zeroPoint "\
            "FROM dp02_dc2_catalogs.ForcedSource as src "\
            "JOIN dp02_dc2_catalogs.CcdVisit as visinfo "\
            "ON visinfo.ccdVisitId = src.ccdVisitId "\
            "WHERE src.objectId = "+str(sel_objid)+" "

    # Query
    results = service.search(query)
    lc = results.to_table()  # light curve table

    # Select band (each column is selected once above, so no `band2` duplicate appears)
    r = lc['band'] == 'r'

    # Peak brightening relative to the median magnitude
    diff = np.median(lc['psfMag'][r]) - min(lc['psfMag'][r])
    if diff > 1:
        print(f"Candidate objectId {sel_objid} has >1 mag variability in the LSST-r bandpass")

        # Plotting light curve
        plt.figure(figsize=(10, 4))
        plt.scatter(lc['expMidptMJD'][r], lc['psfMag'][r], color='crimson')
        plt.axhline(np.median(lc['psfMag'][r]), color='k', ls='--')
        plt.xlabel('MJD')
        plt.ylabel('m$_{LSST-r}$ [AB mag]')
        plt.ylim(plt.ylim()[::-1])  # invert the y-axis so brighter is up
        break
```

|<img src="https://i.imgur.com/ukNM9XM.png" alt="Image" width="500" height="230" style="display: block; margin: 0 auto" />|
|:--:|
| Figure 2. -- Sample LSST-r AB mag light curve with >1 mag variability.|
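The selection criterion in the loop above (median minus minimum magnitude greater than 1) can be factored into a small helper. A minimal sketch on synthetic magnitudes (the toy light curves and the default threshold are illustrative assumptions):

```python=
import numpy as np

def exceeds_variability(mags, threshold=1.0):
    """True if the brightest epoch is more than `threshold` mag brighter
    than the median magnitude (the same criterion as the loop above)."""
    mags = np.asarray(mags, dtype=float)
    return (np.median(mags) - np.min(mags)) > threshold

# A quiet star: ~0.05 mag scatter around 20th magnitude
quiet = 20.0 + 0.05 * np.sin(np.linspace(0, 10, 50))

# A transient: baseline at 20th magnitude, one epoch brightening to 17.5
transient = np.full(50, 20.0)
transient[25] = 17.5

print(exceeds_variability(quiet))      # False
print(exceeds_variability(transient))  # True
```

Using the median as the baseline makes the statistic robust to a single outburst epoch, which is exactly what makes a transient stand out against its own quiescent level.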