# Merian Data Processing Using the LSST Science Pipelines

The [Merian Survey](https://merian.sites.ucsc.edu/) is an ambitious program designed to explore the nature of dark matter, star formation, and feedback in dwarf galaxies. Merian will use 62 nights on the 4m Blanco telescope in Chile using the Dark Energy Camera (DECam). A total of 800 square degrees of the sky will be imaged in two custom made medium-band filters to create a sample of 100,000 star forming dwarf galaxies (with 90% completeness) in the redshift range 0.058 < z < 0.10. Merian will cover the [HSC SSP](https://hsc.mtk.nao.ac.jp/ssp/) Wide field which provides gravitational lensing capabilities to probe the dark matter component of dwarfs ([Leauthaud et al. 2020](https://ui.adsabs.harvard.edu/abs/2020PDU....3000719L/abstract)).

This note summarizes the ingestion and data reduction process for Merian data, initially observed in Feb/March 2021, using the [Rubin Observatory](https://www.lsst.org/) [LSST Science Pipelines](https://pipelines.lsst.io/).

There are three primary DECam dataset types:

- object:
  - primary science frames
  - observation type: `object`
  - filters:
    - `N540 DECam c0014 5403.2 210.0`
    - `N708 DECam c0012 7080.0 400.0`
  - data storage locations on `tiger2-sumire`:
    - `/projects/MERIAN/raw/*/object`
- zero:
  - bias frames
  - observation type: `zero`
  - data storage locations on `tiger2-sumire`:
    - `/projects/MERIAN/raw/*/zero`
- domeflat:
  - flat frames
  - observation type: `dome flat`
  - data storage locations on `tiger2-sumire`:
    - `/projects/MERIAN/raw/*/domeflat`

![The DECam focal plane](https://www.researchgate.net/profile/Deborah-Gulledge/publication/327259994/figure/fig1/AS:664441125347331@1535426517714/DECam-focal-plane-showing-the-62-2k-4k-CCDs-8-2k-2k-CCDs-labeled-F-for-the.png "The DECam focal plane")

:::info
The DECam Focal Plane; figure from Diehl et al. 2018.
DECam focal plane showing the 62 2k x 4k CCDs, 8 2k x 2k CCDs (labeled "F") for the adaptive optics system, and 4 2k x 2k CCDs (labeled "G") for guiding. The orientation of the sky is indicated. The black label (e.g., S30) indicates a position on the focal plane. The green label (e.g., 2) indicates the number of the CCD as is in the multi-extension FITS header. When the focal plane is viewed with the real-time display at the telescope or with default SAOImage DS9 settings, the direction labeled "north" is displayed to the left and "east" at the top. The background colors of the CCDs indicate the electronics backplane that reads them out.
:::

## 1. Preparing the Science Pipelines

### 1.1. Set up the Science Pipelines

First, the LSST Science Pipelines ("*the stack*") need to be set up on the local machine:

```bash=
source "/projects/HSC/LSST/stack/loadLSST.sh"
```

This will set up the most recent Rubin environment installed on the machine. A list of other installed Rubin environments is shown using `mamba`:

```bash=
mamba env list
```

Older Rubin environments contain older builds of the science pipelines. To switch to an older build, simply preface the shell source command above with the appropriate `LSST_CONDA_ENV_NAME` variable:

```bash=
LSST_CONDA_ENV_NAME=lsst-scipipe-4.0.0; \
source "/projects/HSC/LSST/stack/loadLSST.sh"
```

The most recently installed version can again be loaded by unsetting this variable:

```bash=
unset LSST_CONDA_ENV_NAME; \
source "/projects/HSC/LSST/stack/loadLSST.sh"
```

Once the science pipelines have been set up, we can now set up the main LSST software package, `lsst_distrib`, using `setup`:

```bash=
setup lsst_distrib
```

It's possible to set up a specific tagged version of `lsst_distrib` using `-t`:

```bash=
setup lsst_distrib -t w_2022_26
```

Check what version of `lsst_distrib` is being used using `eups`:

```bash=
eups list lsst_distrib | grep setup
# g0b29ad24fb+e8b8cae3ca current w_2022_26 setup
```

### 1.2. Register new filters

This step is only required if the data to be ingested uses a filter which is not already defined.

Before raw science frames can be ingested, every filter they use needs to be defined in the relevant `obs_` package (update: and also in the `skymap` repo; see the end of this section for further details). Here, the relevant package is [obs_decam](https://github.com/lsst/obs_decam), and the filters file is located at [obs_decam/python/lsst/obs/decam/decamFilters.py](https://github.com/lsst/obs_decam/blob/master/python/lsst/obs/decam/decamFilters.py).

For this example, the required observation filter (`N708 DECam c0012 7080.0 400.0`) was not previously defined and had to be added manually. This modification has now been merged into the main branch, but the instructions on how to do this are maintained here, for reference.

As a recap, to do so, first, `git clone` the `obs_decam` package into a local directory:

```bash=
OBSDECAM=/home/lkelvin/repos/obs_decam
git clone git@github.com:lsst/obs_decam.git $OBSDECAM
cd $OBSDECAM
```

If this is the first time the package has been cloned, it will also need to be built using `scons` (as with all Science Pipelines packages), e.g.:

```bash=
scons -j8
```

Next, we check out a user branch from the main branch to work on:

```bash=
git checkout -b u/lskelvin/merian
```

Now add the relevant filter definition. In this case:

```python=
FilterDefinition(physical_filter="N708 DECam c0012 7080.0 400.0",
                 band="N708",
                 lambdaEff=708),
```

Finally, make sure both `lsst_distrib` and the relevant `obs_` package (`obs_decam` here) are set up in the working shell:

```bash=
setup -j -r $OBSDECAM
```

Double check that the local package has been loaded using:

```bash=
eups list | grep LOCAL
# obs_decam LOCAL:/home/lkelvin/repos/obs_decam
```

Once complete, subsequent processing should be able to proceed.
If a warning similar to `ingest WARN: Exposure DECam:ct4m20210318t032843 could not be registered: (sqlite3.IntegrityError) FOREIGN KEY constraint failed` is returned, check that all filters are correctly assigned in the filters file.

Finally, `refObjLoader` lookups to the new filter need to be added to a number of `obs_decam` config files to facilitate astrometric matching. This allows data processing to proceed beyond `characterizeImage`, i.e., the final step required to produce a `calexp`. Here, we map the new `N708` filter into the existing `i`-band filter (the nearest broad-band filter in wavelength) by adding lines similar to:

```python=
refObjLoader.filterMap['N708'] = 'i'
```

into:

```bash=
config/characterizeImage.py
config/calibrate.py
config/measureCoaddSources.py
```

> Note: Ticket [DM-30692](https://jira.lsstcorp.org/browse/DM-30692) added these additional config lines into the main branch.

:::warning
New filters also need to be registered in the `skymap` repository. Central wavelengths for all required filters should be added to `python/lsst/skymap/packers.py`.
:::

### 1.3. Create a new butler

A new butler will be created in the directory `/projects/MERIAN/repo`. Here we set aliases for the output repository directory:

```bash=
REPO=/projects/MERIAN/repo
mkdir -p $REPO
chmod ug+rw $REPO
```

Whilst optional, it may be desirable to also construct a log directory, for log files to be stored within:

```bash=
LOGDIR=/projects/MERIAN/logs
mkdir -p $LOGDIR
chmod ug+rw $LOGDIR
```

If this repository will be used by more than one user, modify the permissions of the output repository directory to ensure that all files constructed below are writeable by all members of that user group:

```bash=
cd $REPO
umask 2
```

> Note: if changing permissions after the butler has been used, and if using an SQLite database (see below), you will also need to run `chmod ug+rw gen3.sqlite3` to make the SQLite database read/writable to all members of your group.
> You will also need to run `chmod ug+rw u` to make the user output directory (here named `u`) read/writable to all members of your group.

Next, an empty Gen3 Butler repository is created, and then the instrument is registered in the data repository. In this example, the instrument is the Dark Energy Camera (DECam).

Two types of database can be constructed for use with the butler: a SQLite database or a PostgreSQL database. The former is the default, and is simpler to set up. The latter provides significantly improved data processing times, but requires a PostgreSQL database to have already been set up on the data processing machine in advance. Both methods are summarized in the subsections below:

#### Option 1: Create a SQLite database

On the command line, create a butler repo:

```bash=
butler create $REPO
```

This constructs a `butler.yaml` file in the `$REPO` directory.

#### Option 2: Create a PostgreSQL database

Before beginning to create a butler, a PostgreSQL database must first be set up on the primary data processing machine. A number of extra configuration files also need to be in place.

First, construct the seed config file. It is recommended that this file is constructed in, for example, `$REPO/seed-config.yaml`. Assuming the database is named 'merian', the contents of this file should look like this:

```yaml=
datastore:
  root: <butlerRoot>
registry:
  db: postgresql+psycopg2://localhost:5432/merian
```

> Note: this file needs only be constructed once, for the purposes of creating the butler.

Second, a science pipelines authentication file needs to be created in `~/.lsst/db-auth.yaml`. The contents of this file should look like this:

```yaml=
- url: postgresql://localhost:5432/merian
  username: merian
  password: MYSECRETPASSWORD
```

where `MYSECRETPASSWORD` is the database password for the `merian` database, for the `merian` user (the database and the username are the same in the example above, but do not necessarily need to be).
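
Because the authentication file holds a plain-text password and must be readable only by its owner, it can be convenient to generate it from a script. The following is a minimal sketch, not part of the Science Pipelines: the helper name `write_db_auth` is hypothetical, it assumes a POSIX file system, and the demonstration writes to a temporary directory rather than to the real `~/.lsst` location.

```python
import os
import stat
import tempfile


def write_db_auth(path, url, username, password):
    """Write a single-entry db-auth.yaml that only its owner can read or write.

    Hypothetical helper for illustration; the three keys mirror the
    example file contents shown above.
    """
    content = (
        f"- url: {url}\n"
        f"  username: {username}\n"
        f"  password: {password}\n"
    )
    # Create the file with mode 0600 so the password is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    # os.open only applies the mode to newly created files, so enforce it
    # explicitly in case the file already existed with looser permissions.
    os.chmod(path, 0o600)
    return path


# Demonstration in a temporary directory (not the real ~/.lsst).
demo_dir = tempfile.mkdtemp()
demo_path = write_db_auth(
    os.path.join(demo_dir, "db-auth.yaml"),
    "postgresql://localhost:5432/merian",
    "merian",
    "MYSECRETPASSWORD",
)
demo_mode = stat.S_IMODE(os.stat(demo_path).st_mode)
print(demo_path, oct(demo_mode))
```

To use it for real, point `path` at `~/.lsst/db-auth.yaml` (expanded with `os.path.expanduser`) after creating the `~/.lsst` directory.
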
> Note: each user who wishes to interact with the butler repository needs to place a copy of this authentication file within their own home space. The authentication file must be readable only by the user's account (`chmod 600 db-auth.yaml`).

Finally, create the butler repo:

```bash=
butler create --seed-config seed-config.yaml $REPO
```

This constructs a `butler.yaml` file in the `$REPO` directory.

### 1.4. Register the instrument

Once the butler repo has been created, register the instrument:

```bash=
butler register-instrument $REPO lsst.obs.decam.DarkEnergyCamera
```

The `register-instrument` command will need to be re-run (once only) every time a new filter is added to the filter definitions file.

> Note: the instrument name here needs to be the fully qualified name of an instrument subclass. Full names can be inferred from their respective `obs_` package at [github.com/lsst](https://github.com/lsst). For this example, the relevant `obs_` package is [obs_decam](https://github.com/lsst/obs_decam) and the fully qualified name is `lsst.obs.decam.DarkEnergyCamera`.

Finally, double check that all required filters are correctly registered with the butler:

```bash=
butler query-dimension-records $REPO physical_filter
```

In this case, double check that `N708 DECam c0012 7080.0 400.0` appears in the filter list.

### 1.5. Generate reference catalogues

The Science Pipelines require reference catalogues ("*refcats*") to accurately calibrate photometric and astrometric results. Two reference catalogues are required here: [Gaia DR2](https://community.lsst.org/t/gaia-dr2-reference-catalog-in-lsst-format) for astrometry, and [Pan-STARRS PS1](https://community.lsst.org/t/pan-starrs-reference-catalog-in-lsst-format) for photometry. Further information is also available on [pipelines.lsst.io](https://pipelines.lsst.io/modules/lsst.meas.algorithms/creating-a-reference-catalog.html).
The first step in constructing these reference catalogues is to gather the catalogue data together and ingest the files. This process is decidedly non-trivial, and may require several hours to complete even on a high-powered machine. Fortunately, these ingested outputs already exist on the machine used here, and can be used directly:

```bash=
GAIADR2=/projects/HSC/refcats/htm/gaia_dr2_20200414
PANSTARRSPS1=/projects/HSC/refcats/htm/ps1_pv3_3pi_20170110
```

Next, create astropy-readable `.ecsv` table files containing one row per input file in each reference catalogue. For example, in Python:

```python=
import os
import glob
import astropy.table

# output directory to save .ecsv files
outdir = "/home/lkelvin"

# full paths to LSST sharded reference catalogues
gaiadr2 = "/projects/HSC/refcats/htm/gaia_dr2_20200414"
panstarrsps1 = "/projects/HSC/refcats/htm/ps1_pv3_3pi_20170110"
refcat_dirs = [gaiadr2, panstarrsps1]

# loop over each FITS file in all refcats
# note: this constructs a series of .ecsv files, each containing two columns:
# 1) the FITS filename, and 2) the htm7 pixel index
for refcat_dir in refcat_dirs:
    outfile = f"{outdir}/{os.path.basename(refcat_dir)}.ecsv"
    print(f"Saving to: {outfile}")
    table = astropy.table.Table(names=("filename", "htm7"), dtype=("str", "int"))
    files = glob.glob(f"{refcat_dir}/[0-9]*.fits")
    for ii, file in enumerate(files):
        print(f"{ii}/{len(files)} ({100*ii/len(files):0.1f}%)", end="\r")
        # try/except to catch extra .fits files which may be in this dir
        try:
            file_index = int(os.path.basename(os.path.splitext(file)[0]))
        except ValueError:
            continue
        else:
            table.add_row((file, file_index))
    table.write(outfile)
```

> Note: the above script running on the `tiger` machine took ~20 minutes, 10 minutes per reference catalogue.

A `.ecsv` file should now exist for each reference catalogue.
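
Before ingesting, it can be worth sanity-checking the shard indices that end up in the `htm7` column. The sketch below uses only the standard library and makes one assumption beyond the text above: that the shards follow the usual HTM numbering, in which trixel IDs at depth `d` lie in the range `[8 * 4**d, 16 * 4**d)`. The helper names and example filenames are hypothetical.

```python
import os

HTM_DEPTH = 7  # depth implied by the `htm7` dimension used for these refcats


def shard_index(path):
    """Return the integer shard index encoded in a FITS filename, or None."""
    stem = os.path.splitext(os.path.basename(path))[0]
    try:
        return int(stem)
    except ValueError:
        # e.g. schema or config files that happen to share the directory
        return None


def is_valid_htm_index(index, depth=HTM_DEPTH):
    """Check that an index lies in the trixel ID range for this HTM depth.

    Assumes the standard HTM scheme: IDs at depth d span [8 * 4**d, 16 * 4**d).
    """
    lowest = 8 * 4**depth
    return lowest <= index < 2 * lowest


# Hypothetical filenames, for illustration only.
examples = [
    "/refcats/131072.fits",
    "/refcats/189584.fits",
    "/refcats/master_schema.fits",
    "/refcats/42.fits",
]
for name in examples:
    idx = shard_index(name)
    valid = idx is not None and is_valid_htm_index(idx)
    print(f"{name}: index={idx}, valid={valid}")
```

A row whose index falls outside the expected range most likely comes from a stray file rather than a genuine shard, and is worth inspecting before running `butler ingest-files`.
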
Next, register the dataset types for each reference catalogue with the butler:

```bash=
butler register-dataset-type $REPO gaia_dr2_20200414 SimpleCatalog htm7
butler register-dataset-type $REPO ps1_pv3_3pi_20170110 SimpleCatalog htm7
```

Check that both the Gaia DR2 and Pan-STARRS PS1 dataset types are now available using:

```bash=
butler query-dataset-types $REPO
```

Finally, ingest the LSST-formatted files into the `refcats/gen2` RUN collection in the repository:

```bash=
butler ingest-files -t link $REPO gaia_dr2_20200414 refcats/gen2 gaia_dr2_20200414.ecsv
butler ingest-files -t link $REPO ps1_pv3_3pi_20170110 refcats/gen2 ps1_pv3_3pi_20170110.ecsv
```

---

#### Legacy reference catalogue conversion

When these instructions were first written, it was not possible to convert these refcats directly into a gen3 repo (as we're setting up here). The instructions below are preserved, for reference, but should no longer be required (assuming the above gen3-approach suffices).

Here we use `butler convert` to map these data into our gen3 repo. First, set up an empty temporary gen2 butler directory, and link the two reference datasets into this directory within a folder named `ref_cats`:

```bash=
GEN2REPO=/projects/MERIAN/repo_gen2
mkdir -p $GEN2REPO/ref_cats
echo "lsst.obs.decam.DecamMapper" > $GEN2REPO/_mapper
ln -s $GAIADR2 $GEN2REPO/ref_cats/
ln -s $PANSTARRSPS1 $GEN2REPO/ref_cats/
```

Next, a simple config file is required:

```bash=
echo 'config.refCats.append("gaia_dr2_20200414")
config.runs["gaia_dr2_20200414"] = "refcats"
config.refCats.append("ps1_pv3_3pi_20170110")
config.runs["ps1_pv3_3pi_20170110"] = "refcats"
config.doRegisterInstrument = False' \
> $GEN2REPO/convert_refcats.py
```

Finally, we need to run the Science Pipelines refcat conversion script [rootRepoConverter.py](https://github.com/lsst/obs_base/blob/master/python/lsst/obs/base/gen2to3/rootRepoConverter.py).
> Note: at the time of writing, a modified version of `rootRepoConverter.py` was required to complete the `butler convert` command below without raising errors. Specifically, [L137-L140](https://github.com/lsst/obs_base/blob/df941304a811ff8835745fa3b69621823ac968bd/python/lsst/obs/base/gen2to3/rootRepoConverter.py#L137-L140) (from `if not self.task.dry_run:` to `else:`) were commented out, [L141](https://github.com/lsst/obs_base/blob/df941304a811ff8835745fa3b69621823ac968bd/python/lsst/obs/base/gen2to3/rootRepoConverter.py#L141) (`self.task.log.info...`) was unindented, and [L208-L226](https://github.com/lsst/obs_base/blob/df941304a811ff8835745fa3b69621823ac968bd/python/lsst/obs/base/gen2to3/rootRepoConverter.py#L208-L226) (from `if self._refCats:` to `self._chain.append(chained)`) were commented out. This approach is not recommended for general use. This script also adds curated calibrations, negating the relevant section below.
>
> To set up this package, run `setup -j -r ~/repos/obs_base` and either switch to a branch with the changes listed above in place, or make the changes temporarily to the code in place.
>
> Once the command below has completed successfully, any changes made to the master branch can be undone using this on the command line: `git reset --hard origin/master`.
>
> At present, ticket [DM-30624](https://jira.lsstcorp.org/browse/DM-30624) now makes ingesting refcats from a gen2 repo easier.

To run the butler conversion script:

```bash=
LOGFILE=$LOGDIR/convert_refcats.log; \
date | tee $LOGFILE; \
butler convert $REPO \
--gen2root $GEN2REPO \
-C $GEN2REPO/convert_refcats.py \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```

> Note: the runtime for this command was ~20 minutes.

---

#### Alternative Refcat Dataset Transfer

If a butler already exists on the machine, the refcat datasets can be transferred over from the existing butler directly:

```bash=
LOGFILE=$LOGDIR/transfer_refcats.log; \
date | tee $LOGFILE; \
butler transfer-datasets --register-dataset-types \
-t link --collections refcats \
/repo/main $REPO \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```

> Note: on the `lsst-devl` machine at NCSA, the runtime for this command was ~60 minutes.

Once the refcats are in place, their collection can be confirmed:

```bash=
butler query-collections $REPO "refcats"
```

Following a dataset transfer, it may be necessary to set up the parent `refcats` CHAINED collection, which contains all of the transferred child `refcats/...` RUN collections:

```bash=
PARENT=refcats; \
CHILDREN=refcats/DM-28636,refcats/DM-33444; \
butler collection-chain $REPO $PARENT $CHILDREN
```

### 1.6. Register the skyMap

Make a sky map and add it to the repository. Sky maps exist as dimensions, datasets and collections. Until the [DM-34516](https://jira.lsstcorp.org/browse/DM-34516) ticket was merged, prior DECam data reductions made use of the HSC skymap: `hsc_rings_v1`. DECam data reductions now instead make use of a DECam-specific sky map, maintaining the native pixel scale of DECam imaging. By default, DECam tracts have the same centroids as their HSC counterparts, albeit with fewer patches per tract. An example DECam tract is shown below:

![DECam tract 9704](https://jira.lsstcorp.org/secure/attachment/59780/decam9704.png "DECam tract 9704")

:::info
DECam Tract 9704 as defined using the `decam_rings_v1` sky map. DECam tracts have the same centroids as HSC tracts (as defined by the `hsc_rings_v1` sky map). Each DECam tract is further sub-divided into 6x6 patches, each of ~4k pixels on a side.
:::

This command registers the DECam `skyMap` dataset using the default `obs_decam` configuration, and setting the output name to the commonly used and de-facto standard `decam_rings_v1`:

```bash=
LOGFILE=$LOGDIR/register_decam_rings_v1.log; \
date | tee $LOGFILE; \
butler register-skymap $REPO \
-C $OBS_DECAM_DIR/config/makeSkyMap.py \
-c name='decam_rings_v1' \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```

> Note: the runtime for this command was ~5 minutes.

If the HSC skymap is also required, it may be generated as well:

```bash=
LOGFILE=$LOGDIR/register_hsc_rings_v1.log; \
date | tee $LOGFILE; \
butler register-skymap $REPO \
-C $OBS_SUBARU_DIR/config/makeSkyMap.py \
-c name='hsc_rings_v1' \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```

Check that all required `skyMap` dataset types now exist in the `skymaps` run collection in the Merian repo:

```bash=
butler query-datasets $REPO skyMap
```

> Note: this sky map step doesn't really need to be performed until much later, however, if any errors occur during registration of the sky map, it may be necessary to delete the repo and start afresh. For this reason, it's usually better to perform this step as soon as possible.

### 1.7. Write curated calibrations

Curated calibrations are collections of calibration data which describe various aspects of the camera and survey. If setting up a new butler on a new machine, an instrument's curated calibrations will need to be added to the data repository:

```bash=
butler write-curated-calibrations $REPO lsst.obs.decam.DarkEnergyCamera
```

> Note: if the modified version of `rootRepoConverter.py` was used above, the dataset types added by this command may already have been ingested into the repo. If so, running the above command will fail with an error similar to `A database constraint failure was triggered by inserting one or more datasets of type DatasetType('camera', {instrument}, Camera, isCalibration=True) into collection 'DECam/calib/unbounded'.
> This probably means a dataset with the same data ID and dataset type already exists, but it may also mean a dimension row is missing.`.

The instrument may be specified via either the fully qualified name, as above, or the short name (`DECam` here). The `-h` help file indicates the former is required, but this advice may change in the future.

This currently adds `camera`, `crosstalk`, `defects` and `linearizer` dataset types into the repository within a number of collections (`DECam/calib`, `DECam/calib/unbounded`, and `DECam/calib/curated/{timestamp}`). Check the current collections within the repo using:

```bash=
butler query-collections $REPO "DECam/calib*"
```

## 2. Ingest raw data

### 2.1. Ingest raw science frames

We're now prepared to ingest raw science frames. If raw frames are being stored in multiple directories, this command needs to be repeated for each directory. Alternatively, a sufficient glob which is able to locate all files of interest may be supplied.

Here's an example data ingest command:

```bash=
LOGFILE=$LOGDIR/merian9813_ingest_science.log; \
SCIFILES=/project/lskelvin/decam/raw-merian/science/raw_*.fz; \
date | tee $LOGFILE; \
butler ingest-raws $REPO $SCIFILES --transfer link \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```

> Note: The runtime for this command was ~5 minutes, ingesting 2480 distinct Butler datasets from 40 exposures.

Raw science exposures have now been added to the `DECam/raw/all` collection in the Merian repo. Collections define groups of data, and can be listed (and searched) using:

```bash=
butler query-collections $REPO "DECam/raw/all"
```

Ingested exposures can be listed on the command line using:

```bash=
butler query-dimension-records $REPO exposure \
--where "instrument='DECam' AND exposure.observation_type='science'"
```

This view shows all available *dimensions* associated with each ingested science image, including the observation ID, the physical filter, and the observation type.
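
Query expressions of this kind are plain strings, so longer ones can be assembled programmatically rather than typed by hand. The sketch below is illustrative only: `build_where` is a hypothetical helper, not a Science Pipelines function, and it simply joins the dimension constraints used throughout this note (`instrument`, `exposure.observation_type`, `exposure.day_obs`, `exposure`, `detector`) with `AND`.

```python
def build_where(instrument, observation_type=None, day_obs_range=None,
                exposures=None, detector=None):
    """Join optional constraints into a single butler `--where` expression.

    Hypothetical helper: it only formats strings and does not validate them
    against the butler registry.
    """
    clauses = [f"instrument='{instrument}'"]
    if observation_type is not None:
        clauses.append(f"exposure.observation_type='{observation_type}'")
    if day_obs_range is not None:
        # exclusive bounds, written as YYYYMMDD integers
        first, last = day_obs_range
        clauses.append(f"exposure.day_obs > {first} AND exposure.day_obs < {last}")
    if exposures:
        ids = ", ".join(str(e) for e in exposures)
        clauses.append(f"exposure IN ({ids})")
    if detector is not None:
        clauses.append(f"detector={detector}")
    return " AND ".join(clauses)


where = build_where(
    "DECam",
    observation_type="science",
    day_obs_range=(20220101, 20220201),
    detector=1,
)
print(where)
```

The resulting string can be passed to `--where` on the command line (quoted, since it contains spaces) or to the `where=` argument of a registry query in Python.
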
Alternatively, datasets can be queried directly using `query-datasets`, with optional SQL-like `--where` arguments to search specific dimensions, e.g.:

```bash=
# set WHERE to one of the following example queries
WHERE="instrument='DECam' AND exposure=971666"
WHERE="instrument='DECam' AND detector=1"
WHERE="instrument='DECam' AND exposure.observation_type='science' AND exposure.day_obs > 20220101 AND exposure.day_obs < 20220201 AND detector=1"
# quote the variable so that the expression is passed as a single argument
butler query-datasets $REPO --where "$WHERE" raw
```

> Note: To successfully use the `--where` argument, other dimensions may be required, such as `instrument`. The butler will complain with a `UserExpressionError` if a required dimension is not found.

A list of science exposure IDs can similarly be extracted within python:

```python=
from lsst.daf.butler import Butler

# open the Merian repo created above
butler = Butler("/projects/MERIAN/repo")

queryData = butler.registry.queryDatasets
where = "exposure.observation_type='science' AND detector=1"
exps = list(queryData("raw", collections="DECam/raw/all", instrument="DECam", where=where))
expids = tuple(x.dataId["exposure"] for x in exps)
print(f'SCIEXPS="{expids}"')
```

The test dataset used here returns this list of science exposures:

```bash=
SCIEXPS="(971666, 971667, 971668, 971669, 971670, 971671, 971672, 971673, 971674, 971675, 971676, 971677, 971678, 971679, 971680, 971681, 971682, 971683, 971684, 971685, 1068554, 1068555, 1068556, 1068557, 1068558, 1068559, 1068560, 1068561, 1068562, 1068713, 1068714, 1068715, 1068716, 1068717, 1068718, 1068719, 1068720, 1068721, 1068722, 1068723)"
```

### 2.2. Ingest raw bias frames

As with the raw science frames above, raw bias frames ('*zero*') are also ingested:

```bash=
LOGFILE=$LOGDIR/merian9813_ingest_bias.log; \
BIASFILES=/project/lskelvin/decam/raw-merian/bias/raw_*.fz; \
date | tee $LOGFILE; \
butler ingest-raws $REPO $BIASFILES --transfer link \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```

> Note: The runtime for this command was ~1 minute, ingesting 3100 distinct Butler datasets from 50 exposures.

Bias exposures have now been ingested into the Merian repo.
Check that the bias calibration frames have been successfully ingested using:

```bash=
butler query-dimension-records $REPO exposure \
--where "instrument='DECam' AND exposure.observation_type='zero'"
```

A list of bias exposure IDs can be extracted within python:

```python=
queryData = butler.registry.queryDatasets
where = "exposure.observation_type='zero' AND detector=1"
exps = list(queryData("raw", collections='DECam/raw/all', instrument="DECam", where=where))
expids = tuple(x.dataId["exposure"] for x in exps)
print(f'BIASEXPS="{expids}"')
```

The test dataset used here returns this list of bias exposures:

```bash=
BIASEXPS="(970488, 970489, 970490, 970491, 970492, 970493, 970494, 970495, 970496, 970497, 970498, 970589, 970590, 970591, 970592, 970593, 970594, 970595, 970596, 970597, 970598, 970599, 970823, 970824, 970825, 970826, 970827, 970828, 970829, 970830, 970831, 970832, 970833, 970924, 970925, 970926, 970927, 970928, 970929, 970930, 970931, 970932, 970933, 970934, 971161, 971162, 971163, 971164, 971165, 971166)"
```

### 2.3. Ingest raw flat frames

The final set of data to be ingested are the raw flat frames ('*dome flat*'):

```bash=
LOGFILE=$LOGDIR/merian9813_ingest_flat.log; \
FLATFILES=/project/lskelvin/decam/raw-merian/flat/raw_*.fz; \
date | tee $LOGFILE; \
butler ingest-raws $REPO $FLATFILES --transfer link \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```

> Note: The runtime for this command was ~5 minutes, ingesting 6200 distinct Butler datasets from 100 exposures.

Flat exposures have now been ingested into the Merian repo. Check that the flat calibration frames have been successfully ingested using:

```bash=
butler query-dimension-records $REPO exposure \
--where "instrument='DECam' AND exposure.observation_type='dome flat'"
```

> Note: if an attempt is made to ingest a file which has already been ingested, the science pipelines will fail for that particular file. This behaviour is as expected, and not a cause for concern.
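
The dataset counts quoted in the three ingest notes can be cross-checked with a little arithmetic. The sketch below assumes that each ingested exposure yields one `raw` dataset per science CCD (62 on the DECam focal plane); that assumption is not stated explicitly in this note, but it is consistent with all three quoted totals. The function name is hypothetical.

```python
SCIENCE_CCDS = 62  # 2k x 4k science CCDs on the DECam focal plane

# (number of exposures, number of raw datasets) as quoted in the ingest notes
ingest_counts = {
    "science": (40, 2480),
    "bias": (50, 3100),
    "flat": (100, 6200),
}


def expected_raw_datasets(n_exposures, n_detectors=SCIENCE_CCDS):
    """Assumed relation: one raw dataset per detector per exposure."""
    return n_exposures * n_detectors


for kind, (n_exp, n_raw) in ingest_counts.items():
    expected = expected_raw_datasets(n_exp)
    status = "ok" if expected == n_raw else "MISMATCH"
    print(f"{kind}: {n_exp} exposures x {SCIENCE_CCDS} detectors = {expected} ({status})")
```

If an ingest log reports a total that is not a multiple of the detector count, some files were probably skipped or were already present in the repo.
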

A list of flat exposure IDs can be extracted within python:

```python=
queryData = butler.registry.queryDatasets
where = "exposure.observation_type='dome flat' AND detector=1"
exps = list(queryData("raw", collections="DECam/raw/all", instrument="DECam", where=where))
expids = tuple(x.dataId["exposure"] for x in exps)
print(f'FLATEXPS="{expids}"')
```

The test dataset used here returns this list of flat exposures:

```bash=
FLATEXPS="(970228, 970229, 970230, 970231, 970232, 970233, 970234, 970235, 970236, 970237, 970238, 970501, 970502, 970503, 970504, 970505, 970506, 970507, 970508, 970509, 970510, 970511, 970836, 970837, 970838, 970839, 970840, 970841, 970842, 970843, 970844, 970845, 970846, 971174, 971175, 971176, 971177, 971178, 971179, 971180, 971181, 971182, 971183, 971184, 971554, 971555, 971556, 971557, 971558, 971559, 1052706, 1052707, 1052708, 1052709, 1052710, 1052711, 1052712, 1052713, 1052714, 1052715, 1052716, 1053093, 1053094, 1053095, 1053096, 1053097, 1053098, 1053099, 1053100, 1053101, 1053102, 1053103, 1053485, 1053486, 1053487, 1053488, 1053489, 1053490, 1053491, 1053492, 1053493, 1053494, 1053495, 1053858, 1053859, 1053860, 1053861, 1053862, 1053863, 1053864, 1053865, 1053866, 1053867, 1053868, 1054287, 1054288, 1054289, 1054290, 1054291, 1054292)"
```

### 2.4. Define visits

Once all raw data has been ingested, we can define visits from exposures in the butler registry. This sets up the exposure IDs within the butler, allowing future runs to use this information when using the `-d` or `--where` data queries. Without this step, processing steps after ISR (i.e., `characterizeImage` onwards) will fail with `RuntimeError: QuantumGraph is empty.`.

```bash=
LOGFILE=$LOGDIR/merian9813_define_visits.log; \
date | tee $LOGFILE; \
butler define-visits $REPO lsst.obs.decam.DarkEnergyCamera \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```

> Note: this run took ~1 minute with these data.
> Do *not* use the `-j N` syntax here to run this process on more than one processor. Doing so will cause multiple `database is locked` errors as each processor attempts to write to the same butler. Ticket [DM-30607](https://jira.lsstcorp.org/browse/DM-30607) references this issue.

## 3. Calibration

### 3.1. Determine calibration frame validity ranges

Now that raw bias (zero) and dome flat calibration frames have been ingested, validity date ranges need to be determined. Prior analyses of these data show that all Merian observation nights are consistent with each other to within a 1% flux deviation. For this reason, we opt to construct calibration frames which are certified across the entire timespan of our data: from 2021-01-01 to 2022-06-30.

### 3.2. Build bias frames

Next, the master bias frames are built. These frames need to be built for each valid date range (see section above). First, check the build pipeline:

```bash=
pipetask build \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpBias.yaml \
--show pipeline
```

:::spoiler Click here to toggle the DECam `cpBias` pipeline at the time of writing.<br><br>
```yaml=
description: cp_pipe BIAS calibration construction for DECam
instrument: lsst.obs.decam.DarkEnergyCamera
tasks:
  isr:
    class: lsst.ip.isr.IsrTask
    config:
    - connections.ccdExposure: raw
      connections.outputExposure: cpBiasProc
      doWrite: true
      doBias: false
      doVariance: true
      doLinearize: false
      doCrosstalk: false
      doBrighterFatter: false
      doDark: false
      doFlat: false
      doApplyGains: false
      doFringe: false
  cpBiasCombine:
    class: lsst.cp.pipe.cpCombine.CalibCombineTask
    config:
    - connections.inputExpHandles: cpBiasProc
      connections.outputData: bias
      calibrationType: bias
      exposureScaling: Unity
contracts:
- contract: cpBiasCombine.calibrationType == "bias"
- contract: cpBiasCombine.exposureScaling == "Unity"
- contract: isr.doBias == False
```
:::

The `cpBias` pipeline may also be viewed graphically:

```bash=
pipetask build \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpBias.yaml \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_cpBias.pdf
```

{%pdf https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_cpBias.pdf %}
[pipeline_cpBias.pdf](https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_cpBias.pdf)

Build master bias (zero) frames, ensuring that all required input collections are given as arguments to `-i`:

```bash=
LOGFILE=$LOGDIR/merian9813_cpBias.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i DECam/raw/all,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded \
-o DECam/calib/merian9813/bias \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpBias.yaml \
-d "instrument='DECam' AND exposure IN $BIASEXPS" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```

> Note: This job contained 3162 quanta, with a runtime of ~2 hours.

The `-d` data query takes a wide range of SQL-like arguments. Instead of specifying exposures by their exposure ID, it may be preferable to select them by date instead, e.g.: `"exposure.observation_type='zero' AND exposure.day_obs > 20210910"`.

To avoid an error regarding missing `defects` dataset types, an input collection containing `defects` must also be supplied in the `cpBias` run. These data are ingested into the repo when running `write-curated-calibrations`, for example. The instructions here make use of the `DECam/calib/curated/19700101T000000Z` collection. Other collections containing `defects` are also available; however, some of the commonly unused detectors are missing from them (i.e., fewer than 62 are present). If `defects` data are not available at all, adding `-c isr:doDefect=False` to the `pipetask run` command will disable defect masking when running the `cpBias` pipeline.

On occasion, some of the tasks (quanta) may fail, likely due to memory issues.
In such cases, an afterburner can be run on a single core to try the failed tasks again. To do so, add `--extend-run` and `--skip-existing` to the `pipetask` run command, and remove `-j N` to prevent it from running on multiple cores. This helps ensure that the most memory-intensive quanta do not compete for memory at the same time.

Check the collections, dataset types and datasets now present in the repo:

```bash=
butler query-collections $REPO "*bias*"
butler query-dataset-types $REPO
butler query-datasets $REPO --collections DECam/calib/merian9813/bias
butler query-datasets $REPO --collections DECam/calib/merian9813/bias bias
```

### 3.3. Certify bias frames

Certify the biases for a given date range. Arguments: `REPO`, `INPUT_COLLECTION`, `OUTPUT_COLLECTION`, `DATASET_TYPE_NAME`:

```bash=
butler certify-calibrations \
$REPO DECam/calib/merian9813/bias DECam/calib/merian9813 bias \
--begin-date 2021-01-01T00:00:00 --end-date 2022-06-30T23:59:59
```

You may check what certified date ranges have been applied to the bias data in Python by querying dataset associations in the output collection. For example, to check only detector #1:

```python=
# Assumes `butler` is an lsst.daf.butler.Butler instance opened on the repo
qda = butler.registry.queryDatasetAssociations
coll = "DECam/calib/merian9813"
biases = [x for x in qda("bias", collections=coll) if x.ref.dataId["detector"] == 1]
print(biases)
```

which produces a list of all biases relating to detector #1 (for the example in this document, there should be only one result at present). Inspecting the properties of this object gives the timespan, e.g.:

```python=
print(f"{biases[0].timespan.begin.value = }")
# biases[0].timespan.begin.value = '2021-01-01 00:00:00.000000'
print(f"{biases[0].timespan.end.value = }")
# biases[0].timespan.end.value = '2022-06-30 23:59:59.000000'
```

### 3.4. Generate crosstalk sources

The next step is to generate crosstalk sources using step 0 of the Data Release Production (DRP) pipeline (`DRP.yaml`).
Crosstalk sources need to be generated for any raw we want to run actual ISR on (i.e., raw flats and raw science frames). Step 0 of `DRP.yaml` runs only the `doOverscan` aspect of the `ISR` (instrument signature removal) task. It can be visualized using: ```bash= pipetask build \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \ --show pipeline ``` :::spoiler Click here to toggle the Merian step 0 pipeline at the time of writing.<br><br> ```yaml= description: | The DRP pipeline specialized for the DECam instrument, developed against the Merian dataset. instrument: lsst.obs.decam.DarkEnergyCamera parameters: band: i tasks: isrForCrosstalkSources: class: lsst.ip.isr.IsrTask config: - connections.outputExposure: overscanRaw doOverscan: true doAssembleCcd: false doBias: false doCrosstalk: false doVariance: false doLinearize: false doDefect: false doNanMasking: false doDark: false doFlat: false doFringe: false doInterpolate: false subsets: step0: subset: - isrForCrosstalkSources description: | Tasks which should be run once, prior to initial data processing. This step generates crosstalk sources for ISR/inter-chip crosstalk by applying overscan correction on raw frames. A new dataset is written, which should be used as an input for further data processing. 
``` ::: The step 0 pipeline may also be viewed graphically: ```bash= pipetask build \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \ --pipeline-dot /tmp/pipeline.dot; \ dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step0.pdf ``` {%pdf https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step0.pdf %} [pipeline_step0.pdf](https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step0.pdf) Run `step0` for raw flats: ```bash= LOGFILE=$LOGDIR/merian9813_step0_flat.log; \ date | tee $LOGFILE; \ pipetask --long-log run --register-dataset-types -j 12 \ -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \ -i DECam/raw/all,DECam/calib/merian9813,DECam/calib/unbounded \ -o DECam/calib/merian9813/crosstalk \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \ -d "instrument='DECam' AND exposure IN $FLATEXPS" \ 2>&1 | tee -a $LOGFILE; \ date | tee -a $LOGFILE ``` > Note: this run contained 6200 quanta, with a runtime of ~2 hours. The final run message for these data was: `Executed 6200 quanta successfully, 0 failed and 0 remain out of total 6200 quanta.` Extend the `crosstalk` RUN collection to also include science exposures: ```bash= LOGFILE=$LOGDIR/merian9813_step0_science.log; \ date | tee -a $LOGFILE; \ pipetask --long-log run --register-dataset-types -j 12 \ --extend-run --skip-existing \ -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \ -i DECam/raw/all,DECam/calib/merian9813,DECam/calib/unbounded \ -o DECam/calib/merian9813/crosstalk \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \ -d "instrument='DECam' AND exposure IN $SCIEXPS" \ 2>&1 | tee -a $LOGFILE; \ date | tee -a $LOGFILE ``` > Note: this run contained 2480 quanta, with a runtime of ~30 minutes. The final run message for these data was: `Executed 2480 quanta successfully, 0 failed and 0 remain out of total 2480 quanta.` The `overscanRaw` dataset types should now be available in the output repository. 
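The final run messages quoted above can also be checked programmatically, which is convenient when several log files need to be scanned for failed quanta. The following is a minimal sketch and not part of the Science Pipelines: the helper name `parse_quanta_summary` is hypothetical, and the pattern assumes the summary line has exactly the wording quoted above, which may differ between pipeline versions.

```python
import re

# Pattern for the summary line quoted above. The exact wording is an
# assumption taken from this document and may change between releases.
SUMMARY_RE = re.compile(
    r"Executed (\d+) quanta successfully, (\d+) failed and (\d+) remain "
    r"out of total (\d+) quanta"
)


def parse_quanta_summary(log_text):
    """Return the counts from the last summary line in log_text, or None."""
    matches = SUMMARY_RE.findall(log_text)
    if not matches:
        return None
    succeeded, failed, remaining, total = (int(x) for x in matches[-1])
    return {
        "succeeded": succeeded,
        "failed": failed,
        "remaining": remaining,
        "total": total,
    }


# Example using the message reported for the science exposures above.
example = (
    "Executed 2480 quanta successfully, 0 failed and 0 remain "
    "out of total 2480 quanta."
)
summary = parse_quanta_summary(example)
print(summary)
```

In practice the text would be read from the relevant `$LOGFILE`; a non-zero `failed` or `remaining` count suggests that an afterburner run with `--extend-run --skip-existing` may be needed.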
Check the collections and datasets: ```bash= butler query-collections $REPO butler query-datasets $REPO overscanRaw ``` ### 3.5. Build flat frames This step constructs the master flat frames (which requires using the biases). The `cpFlat` pipeline can be visualized using: ```bash= pipetask build \ -p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpFlat.yaml \ --show pipeline ``` :::spoiler Click here to toggle the DECam `cpFlat` pipeline at the time of writing.<br><br> ```yaml= description: cp_pipe FLAT calibration construction for DECam instrument: lsst.obs.decam.DarkEnergyCamera tasks: isr: class: lsst.ip.isr.IsrTask config: - connections.ccdExposure: raw connections.outputExposure: cpFlatProc doWrite: true doBrighterFatter: false doFlat: false doFringe: false doApplyGains: false connections.crosstalkSources: overscanRaw connections.bias: bias doDark: false cpFlatMeasure: class: lsst.cp.pipe.cpFlatNormTask.CpFlatMeasureTask config: - connections.inputExp: cpFlatProc connections.outputStats: flatStats doVignette: false cpFlatNorm: class: lsst.cp.pipe.cpFlatNormTask.CpFlatNormalizationTask config: - connections.inputMDs: flatStats connections.outputScales: cpFlatNormScales level: AMP cpFlatCombine: class: lsst.cp.pipe.cpCombine.CalibCombineByFilterTask config: - connections.inputExpHandles: cpFlatProc connections.inputScales: cpFlatNormScales connections.outputData: flat calibrationType: flat exposureScaling: InputList scalingLevel: AMP contracts: - contract: isr.doFlat == False ``` ::: The `cpFlat` pipeline may also be viewed graphically: ```bash= pipetask build \ -p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpFlat.yaml \ --pipeline-dot /tmp/pipeline.dot; \ dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_cpFlat.pdf ``` {%pdf https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_cpFlat.pdf %} [pipeline_cpFlat.pdf](https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_cpFlat.pdf) Build master flats: ```bash= LOGFILE=$LOGDIR/merian9813_cpFlat.log; \ date | tee $LOGFILE; \ 
pipetask --long-log run --register-dataset-types -j 12 \ -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \ -i DECam/raw/all,DECam/calib/merian9813,DECam/calib/merian9813/crosstalk,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded \ -o DECam/calib/merian9813/flat \ -p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpFlat.yaml \ -d "instrument='DECam' AND exposure IN $FLATEXPS" \ 2>&1 | tee -a $LOGFILE; \ date | tee -a $LOGFILE ``` > Note: this run contained 12526 quanta, with a runtime of ~3 hours. Initializing the task may also take some time, due to the complex nature of the quantum graph. On occasion, some quanta (tasks) may fail, likely due to memory issues. In such cases, the missing master flats can be reattempted in an afterburner by excluding `-j N` (to run on only a single core) and including `--extend-run --skip-existing` in the `pipetask` run command. Check what types of data now exist in the output collection: ```bash= butler query-collections $REPO butler query-dataset-types $REPO butler query-datasets $REPO --collections DECam/calib/merian9813/flat butler query-datasets $REPO --collections DECam/calib/merian9813/flat flat ``` > Note: the final two commands will not work if the output collection is a CHAINED collection containing a CALIBRATION child collection. If attempted, the above commands will fail with `NotImplementedError: Query for dataset type 'camera' in CALIBRATION-type collection 'DECam/calib/merian' is not yet supported.`. As a work-around, instead provide the full name of the child RUN collection as given by `query-collections`. ### 3.6. Certify flat frames Certify the flats for a given date range. 
Arguments: `REPO`, `INPUT_COLLECTION`, `OUTPUT_COLLECTION`, `DATASET_TYPE_NAME`: ```bash= butler certify-calibrations \ $REPO DECam/calib/merian9813/flat DECam/calib/merian9813 flat \ --begin-date 2021-01-01T00:00:00 --end-date 2022-06-30T23:59:59 ``` You may check what certified date ranges have been applied to the flat data in Python by querying dataset associations in the output collection. For example, to check only detector #1: ```python= qda = butler.registry.queryDatasetAssociations coll = "DECam/calib/merian9813" flats = [x for x in qda("flat", collections=coll) if x.ref.dataId["detector"] == 1] print(flats) ``` which produces a list of all flats relating to detector #1 (in the case of this example document, there should be only 1 result at present). Inspecting the properties of this object gives the timespan, e.g.: ```python= print(f"{flats[0].timespan.begin.value = }") # flats[0].timespan.begin.value = '2021-01-01 00:00:00.000000' print(f"{flats[0].timespan.end.value = }") # flats[0].timespan.end.value = '2022-06-30 23:59:59.000000' ``` ## 4. Set up a default collection Data in the Science Pipelines are arranged into *collections*; groupings of data. Here we establish a default collection which contains the commonly required raw RUN collections. Whilst this step is not strictly necessary, this will allow us to specify only a single `INPUT` collection for future raw data processing: ```bash= INPUT=DECam/defaults/merian9813 ``` If this step is not performed, future data processing will need to specify all required input collections explicitly: ```bash= -i long,comma,separated,list,of,child,collections ``` If this step is followed then future data processing from raw data should only need to specify the default collection: ```bash= -i $INPUT ``` > Note: it is not currently possible to query a CHAINED collection containing a CALIBRATION child collection. 
Constructing a dedicated CHAINED collection containing only the RUN collections of interest allows users to query the CHAINED collection and avoid this error.

A CHAINED collection can be set up either on the command line or in Python. To set up a CHAINED collection on the command line for all required input collections, run:

```bash=
CHILDREN="DECam/raw/all,\
DECam/calib/merian9813,\
DECam/calib/merian9813/crosstalk,\
DECam/calib/curated/19700101T000000Z,\
DECam/calib/unbounded,\
skymaps,\
refcats"
butler collection-chain $REPO $INPUT $CHILDREN
```

> Note: the CHILDREN list may be amended and the above command re-run to update this parent collection if, for example, new data have been processed and a user would like to add the updated crosstalk RUN collection to this parent CHAINED collection.

Alternatively, this may also be achieved in Python:

```python=
import lsst.daf.butler as dafButler

REPO = "/project/lskelvin/repo"
default_collection = "DECam/defaults/merian9813"

# Set up a writeable butler
butler_writeable = dafButler.Butler(REPO, writeable=True)
registry_writeable = butler_writeable.registry

# Register a new default CHAINED collection
registry_writeable.registerCollection(
    default_collection, type=dafButler.CollectionType.CHAINED
)

# Add required CHILD collections into the CHAINED collection
registry_writeable.setCollectionChain(
    default_collection,
    [
        "DECam/raw/all",
        "DECam/calib/merian9813",
        "DECam/calib/merian9813/crosstalk",
        "DECam/calib/curated/19700101T000000Z",
        "DECam/calib/unbounded",
        "skymaps",
        "refcats",
    ],
)
```

> Note: as above, if reprocessing data in future runs, you can amend the list above to add your own collections, and then re-run `setCollectionChain` to update `default_collection`. This allows the default collection to stay relevant, linking to all necessary datasets as new data become available.

## 5. Data release production

In this section we will proceed through all the relevant data processing steps to take raw DECam science data through to coadd outputs. These processed data will be written to the `OUTPUT` CHAINED collection:

```bash=
OUTPUT=DECam/runs/merian9813/w_2022_26
```

Here, `w_2022_26` refers to the weekly release of the LSST Science Pipelines used to reduce these data. Processing consists of three main steps (1, 2 and 3):

* Step 1: single frame processing
  * instrumental signature removal, initial background subtraction / calibration / PSF estimation
* Step 2: post single frame processing
  * step 2a - initial visit aggregation
  * step 2b - tract-level characterization
  * step 2c - global collection summaries
  * step 2d - final source table generation
* Step 3: coadd level processing
  * warping visit-level images onto the coadd plane, constructing a coadd, running detection & deblending algorithms

When outputting to an already existing collection, the following arguments should be appended to the `pipetask` run commands below:

```bash=
--extend-run --skip-existing --clobber-outputs
```

### 5.1. Step 1 - single frame processing

Processed visit images (PVIs) and preliminary source tables are produced in step 1.

:::spoiler {state="closed"} Click here to toggle the Merian step 1 YAML.<br><br>
To generate the pipeline YAML:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step1 \
--show pipeline
```
which gives:
```yaml=
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera parameters: band: i tasks: isr: class: lsst.ip.isr.IsrTask config: - connections.crosstalkSources: overscanRaw doCrosstalk: true characterizeImage: class: lsst.pipe.tasks.characterizeImage.CharacterizeImageTask calibrate: class: lsst.pipe.tasks.calibrate.CalibrateTask config: - photoCal.match.referenceSelection.magLimit.fluxField: i_flux photoCal.match.referenceSelection.magLimit.maximum: 22.0 writePreSourceTable: class: lsst.pipe.tasks.postprocess.WriteSourceTableTask config: - connections.outputCatalog: preSource transformPreSourceTable: class: lsst.pipe.tasks.postprocess.TransformSourceTableTask config: - connections.inputCatalog: preSource connections.outputCatalog: preSourceTable subsets: processCcd: subset: - characterizeImage - isr - calibrate description: 'Set of tasks to run when doing single frame processing, without any conversions to Parquet/DataFrames or visit-level summaries. ' step1: subset: - writePreSourceTable - calibrate - transformPreSourceTable - characterizeImage - isr description: | Per-detector tasks that can be run together to start the DRP pipeline. These should never be run with 'tract' or 'patch' as part of the data ID expression if any later steps will also be run, because downstream steps require full visits and 'tract' and 'patch' constraints will always select partial visits that overlap that region. 
``` ::: :::spoiler {state="closed"} Click here to toggle the Merian step 1 graph.<br><br> To generate the pipeline graph: ```bash= pipetask build \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step1 \ --pipeline-dot /tmp/pipeline.dot; \ dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step1.pdf ``` which gives: {%pdf https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step1.pdf %} [pipeline_step1.pdf](https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step1.pdf) ::: Run step 1: ```bash= DATAQUERY="exposure.day_obs > 20210101 AND exposure.day_obs < 20220630 AND exposure.observation_type='science' AND detector NOT IN (31,61)" LOGFILE=$LOGDIR/merian9813_step1.log; \ date | tee $LOGFILE; \ pipetask --long-log run --register-dataset-types -j 12 \ -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \ -i $INPUT \ -o $OUTPUT \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step1 \ -d "instrument='DECam' AND $DATAQUERY" \ 2>&1 | tee -a $LOGFILE; \ date | tee -a $LOGFILE ``` > Note: the runtime for this 12000 quanta run was ~4 hours. As is common in DECam data processing, detectors 31 and 61 have been excluded from data reduction owing to their corrupt nature. Instead of selecting by date as above, a wide range of data selectors may be used instead to identify specific raw data to be processed. For example, '`exposure IN $SCIEXPS`' would only process exposures defined in the `SCIEXPS` object. ### 5.2a. Step 2a - initial visit aggregation Initial visit aggregation takes place in step 2a, producing visit-wide preliminary source tables and visit summaries. :::spoiler {state="closed"} Click here to toggle the Merian step 2a YAML.<br><br> To generate the pipeline YAML: ```bash= pipetask build \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2a \ --show pipeline ``` which gives: ```yaml= description: | The DRP pipeline specialized for the DECam instrument, developed against the Merian dataset. 
instrument: lsst.obs.decam.DarkEnergyCamera parameters: band: i tasks: consolidateVisitSummary: class: lsst.pipe.tasks.postprocess.ConsolidateVisitSummaryTask consolidatePreSourceTable: class: lsst.pipe.tasks.postprocess.ConsolidateSourceTableTask config: - connections.inputCatalogs: preSourceTable connections.outputCatalog: preSourceTable_visit subsets: step2a: subset: - consolidateVisitSummary - consolidatePreSourceTable description: | Visit-level tasks Allowed data query constraints: visit Tasks aggregate all detectors for a given visit. ``` ::: :::spoiler {state="closed"} Click here to toggle the Merian step 2a graph.<br><br> To generate the pipeline graph: ```bash= pipetask build \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2a \ --pipeline-dot /tmp/pipeline.dot; \ dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2a.pdf ``` which gives: {%pdf https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step2a.pdf %} [pipeline_step2a.pdf](https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step2a.pdf) ::: Run step 2a: ```bash= LOGFILE=$LOGDIR/merian9813_step2a.log; \ date | tee $LOGFILE; \ pipetask --long-log run --register-dataset-types -j 12 \ -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \ -i $INPUT \ -o $OUTPUT \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2a \ -d "instrument='DECam' AND $DATAQUERY" \ 2>&1 | tee -a $LOGFILE; \ date | tee -a $LOGFILE ``` > Note: the runtime for this 80 quanta run was ~3 minutes. ### 5.2b. Step 2b - tract-level characterization Photometric and astrometric calibration take place at the tract level in step 2b, as does isolated star characterization. :::spoiler {state="closed"} Click here to toggle the Merian step 2b YAML.<br><br> To generate the pipeline YAML: ```bash= pipetask build \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2b \ --show pipeline ``` which gives: ```yaml= description: | The DRP pipeline specialized for the DECam instrument, developed against the Merian dataset. 
instrument: lsst.obs.decam.DarkEnergyCamera parameters: band: i tasks: isolatedStarAssociation: class: lsst.pipe.tasks.isolatedStarAssociation.IsolatedStarAssociationTask config: - python: config.band_order += ["N708", "N540"] jointcal: class: lsst.jointcal.JointcalTask config: - connections.inputSourceTableVisit: preSourceTable_visit subsets: step2b: subset: - isolatedStarAssociation - jointcal description: | Tract-level tasks Allowed data query constraints: tract Jointcal and isolatedStarAssociation both use PreSources, generated by consolidatePreSourceTable, for all visits that overlap a tract. jointcal produces solutions per-tract, per-visit isolatedStarAssociation produces solutions per-tract. ``` ::: :::spoiler {state="closed"} Click here to toggle the Merian step 2b graph.<br><br> To generate the pipeline graph: ```bash= pipetask build \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2b \ --pipeline-dot /tmp/pipeline.dot; \ dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2b.pdf ``` which gives: {%pdf https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step2b.pdf %} [pipeline_step2b.pdf](https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step2b.pdf) ::: Run step 2b: ```bash= LOGFILE=$LOGDIR/merian9813_step2b.log; \ date | tee $LOGFILE; \ pipetask --long-log run --register-dataset-types -j 12 \ -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \ -i $INPUT \ -o $OUTPUT \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2b \ -d "instrument='DECam' AND skymap='decam_rings_v1' AND tract=9813" \ 2>&1 | tee -a $LOGFILE; \ date | tee -a $LOGFILE ``` ### 5.2c. Step 2c - global collection summaries Global per-collection summaries of visits and detectors are generated in step 2c. 
:::spoiler {state="closed"} Click here to toggle the Merian step 2c YAML.<br><br> To generate the pipeline YAML: ```bash= pipetask build \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2c \ --show pipeline ``` which gives: ```yaml= description: | The DRP pipeline specialized for the DECam instrument, developed against the Merian dataset. instrument: lsst.obs.decam.DarkEnergyCamera parameters: band: i tasks: makeCcdVisitTable: class: lsst.pipe.tasks.postprocess.MakeCcdVisitTableTask makeVisitTable: class: lsst.pipe.tasks.postprocess.MakeVisitTableTask subsets: step2c: subset: - makeVisitTable - makeCcdVisitTable description: | Global-level tasks that must not be run with any data query constraints Can be run anytime after subset step2a. Allowed data query constraints: instrument Tasks generate one data product per collection. make[Ccd]VisitTable produces per-collection summary of the Visits and CcdVisits. ``` ::: :::spoiler {state="closed"} Click here to toggle the Merian step 2c graph.<br><br> To generate the pipeline graph: ```bash= pipetask build \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2c \ --pipeline-dot /tmp/pipeline.dot; \ dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2c.pdf ``` which gives: {%pdf https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step2c.pdf %} [pipeline_step2c.pdf](https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step2c.pdf) ::: Run step 2c: ```bash= LOGFILE=$LOGDIR/merian9813_step2c.log; \ date | tee $LOGFILE; \ pipetask --long-log run --register-dataset-types -j 12 \ -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \ -i $INPUT \ -o $OUTPUT \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2c \ -d "instrument='DECam'" \ 2>&1 | tee -a $LOGFILE; \ date | tee -a $LOGFILE ``` ### 5.2d. Step 2d - final source table generation Generation of final source tables with full calibrations applied takes place in step 2d. 
:::spoiler {state="closed"} Click here to toggle the Merian step 2d YAML.<br><br> To generate the pipeline YAML: ```bash= pipetask build \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2d \ --show pipeline ``` which gives: ```yaml= description: | The DRP pipeline specialized for the DECam instrument, developed against the Merian dataset. instrument: lsst.obs.decam.DarkEnergyCamera parameters: band: i tasks: transformSourceTable: class: lsst.pipe.tasks.postprocess.TransformSourceTableTask consolidateSourceTable: class: lsst.pipe.tasks.postprocess.ConsolidateSourceTableTask finalizeCharacterization: class: lsst.pipe.tasks.finalizeCharacterization.FinalizeCharacterizationTask writeRecalibratedSourceTable: class: lsst.pipe.tasks.postprocess.WriteRecalibratedSourceTableTask config: - useGlobalExternalPhotoCalib: false connections.photoCalibName: jointcal connections.outputCatalog: source subsets: step2d: subset: - writeRecalibratedSourceTable - finalizeCharacterization - consolidateSourceTable - transformSourceTable description: | Visit-level tasks. Allowed data query constraints: visit writeRecalibratedSourceTable, transformSourceTable run per-detector consolidateSourceTable produces one data product per visit. finalizeCharacterization will eventually model full focal plane PSFs. 
```
:::

:::spoiler {state="closed"} Click here to toggle the Merian step 2d graph.<br><br>
To generate the pipeline graph:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2d \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2d.pdf
```
which gives:
{%pdf https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step2d.pdf %}
[pipeline_step2d.pdf](https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step2d.pdf)
:::

Run step 2d:

```bash=
LOGFILE=$LOGDIR/merian9813_step2d.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2d \
-d "instrument='DECam' AND $DATAQUERY" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```

### 5.3. Step 3 - coadd processing

Coadd processing takes place in step 3. A large number of tasks are performed during this step, including, but not limited to, warping visit-level images onto the coadd plane, constructing coadds, and running detection and deblending algorithms.

:::spoiler {state="closed"} Click here to toggle the Merian step 3 YAML.<br><br>
To generate the pipeline YAML:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step3 \
--show pipeline
```
which gives:
```yaml=
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera parameters: band: i tasks: makeWarp: class: lsst.pipe.tasks.makeCoaddTempExp.MakeWarpTask config: - makePsfMatched: true - python: | config.warpAndPsfMatch.psfMatch.kernel['AL'].alardSigGauss = [1.0, 2.0, 4.5] from lsst.pipe.tasks.selectImages import PsfWcsSelectImagesTask config.select.retarget(PsfWcsSelectImagesTask) connections.photoCalibName: jointcal matchingKernelSize: 29 doApplyExternalPhotoCalib: true externalPhotoCalibName: jointcal doApplyExternalSkyWcs: true modelPsf.defaultFwhm: 7.7 warpAndPsfMatch.warp.warpingKernelName: lanczos5 coaddPsf.warpingKernelName: lanczos5 useGlobalExternalPhotoCalib: false doWriteEmptyWarps: true assembleCoadd: class: lsst.pipe.tasks.assembleCoadd.CompareWarpAssembleCoaddTask config: - doInputMap: true - python: | config.removeMaskPlanes.append("CROSSTALK") config.badMaskPlanes += ["SUSPECT"] from lsst.pipe.tasks.selectImages import PsfWcsSelectImagesTask config.select.retarget(PsfWcsSelectImagesTask) matchingKernelSize: 29 doApplyExternalPhotoCalib: true externalPhotoCalibName: jointcal doApplyExternalSkyWcs: true subregionSize: (10000, 200) doNImage: true interpImage.transpose: true coaddPsf.warpingKernelName: lanczos5 assembleStaticSkyModel.subregionSize: (10000, 200) assembleStaticSkyModel.doApplyExternalPhotoCalib: true assembleStaticSkyModel.externalPhotoCalibName: jointcal assembleStaticSkyModel.doApplyExternalSkyWcs: true doFilterMorphological: true useGlobalExternalPhotoCalib: false assembleStaticSkyModel.useGlobalExternalPhotoCalib: false doAttachTransmissionCurve: false detection: class: lsst.pipe.tasks.multiBand.DetectCoaddSourcesTask mergeDetections: class: lsst.pipe.tasks.mergeDetections.MergeDetectionsTask deblend: class: lsst.pipe.tasks.deblendCoaddSourcesPipeline.DeblendCoaddSourcesMultiTask measure: class: lsst.pipe.tasks.multiBand.MeasureMergedCoaddSourcesTask mergeMeasurements: class: lsst.pipe.tasks.mergeMeasurements.MergeMeasurementsTask writeObjectTable: 
class: lsst.pipe.tasks.postprocess.WriteObjectTableTask transformObjectTable: class: lsst.pipe.tasks.postprocess.TransformObjectCatalogTask consolidateObjectTable: class: lsst.pipe.tasks.postprocess.ConsolidateObjectTableTask forcedPhotCoadd: class: lsst.meas.base.forcedPhotCoadd.ForcedPhotCoaddTask selectGoodSeeingVisits: class: lsst.pipe.tasks.selectImages.BestSeeingQuantileSelectVisitsTask config: - connections.goodVisits: goodSeeingVisits templateGen: class: lsst.pipe.tasks.assembleCoadd.CompareWarpAssembleCoaddTask config: - doSelectVisits: true assembleStaticSkyModel.doSelectVisits: true connections.selectedVisits: goodSeeingVisits connections.outputCoaddName: goodSeeing connections.coaddExposure: goodSeeingCoadd - python: | config.removeMaskPlanes.append("CROSSTALK") config.badMaskPlanes += ["SUSPECT"] from lsst.pipe.tasks.selectImages import PsfWcsSelectImagesTask config.select.retarget(PsfWcsSelectImagesTask) matchingKernelSize: 29 doApplyExternalPhotoCalib: true externalPhotoCalibName: jointcal doApplyExternalSkyWcs: true subregionSize: (10000, 200) doNImage: true interpImage.transpose: true coaddPsf.warpingKernelName: lanczos5 assembleStaticSkyModel.subregionSize: (10000, 200) assembleStaticSkyModel.doApplyExternalPhotoCalib: true assembleStaticSkyModel.externalPhotoCalibName: jointcal assembleStaticSkyModel.doApplyExternalSkyWcs: true doFilterMorphological: true useGlobalExternalPhotoCalib: false assembleStaticSkyModel.useGlobalExternalPhotoCalib: false doAttachTransmissionCurve: false contracts: - contract: '''calib_psf_candidate'' not in measure.propagateFlags.source_flags if makeWarp.doApplyFinalizedPsf else True' - contract: '''calib_psf_reserved'' not in measure.propagateFlags.source_flags if makeWarp.doApplyFinalizedPsf else True' - contract: '''calib_psf_used'' not in measure.propagateFlags.source_flags if makeWarp.doApplyFinalizedPsf else True' - contract: selectGoodSeeingVisits.connections.goodVisits == templateGen.connections.selectedVisits 
subsets: multiband: subset: - detection - mergeDetections - deblend - measure - mergeMeasurements - forcedPhotCoadd description: 'A set of tasks to run when making measurements on coadds. ' objectTable: subset: - consolidateObjectTable - writeObjectTable - transformObjectTable description: 'A set of tasks to transform multiband outputs into a parquet object table. ' step3: subset: - consolidateObjectTable - mergeMeasurements - detection - mergeDetections - selectGoodSeeingVisits - writeObjectTable - deblend - templateGen - assembleCoadd - transformObjectTable - measure - makeWarp - forcedPhotCoadd description: | Tract-level tasks that can be run together, but only after the 'step1' and 'step2' subsets. These should be run with explicit 'tract' constraints essentially all the time, because otherwise quanta will be created for jobs with only partial visit coverage. ``` ::: :::spoiler {state="closed"} Click here to toggle the Merian step 3 graph.<br><br> To generate the pipeline graph: ```bash= pipetask build \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step3 \ --pipeline-dot /tmp/pipeline.dot; \ dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step3.pdf ``` which gives: {%pdf https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step3.pdf %} [pipeline_step3.pdf](https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step3.pdf) ::: Run step 3: ```bash= LOGFILE=$LOGDIR/merian9813_step3.log; \ date | tee $LOGFILE; \ pipetask --long-log run --register-dataset-types -j 12 \ -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \ -i $INPUT \ -o $OUTPUT \ -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step3 \ -d "instrument='DECam' AND skymap='decam_rings_v1' AND tract=9813" \ 2>&1 | tee -a $LOGFILE; \ date | tee -a $LOGFILE ``` ## A. Useful commands This section provides some useful command-line commands which may be used to interact with the data. 
### query-collections

Query collections in the repo:

```bash=
butler query-collections $REPO "*u/lskelvin*"
```

The final search pattern may use standard glob syntax (e.g., note the asterisks above).

### query-datasets

Query the datasets which live in a given collection:

```bash=
butler query-datasets $REPO \
--collections u/lskelvin/testrun/01 \
--where "instrument='DECam' AND skymap='hsc_rings_v1' AND tract=9813" \
calexp
```

If the final dataset type (`calexp` in the example above) is not given, all dataset types found will be printed to the command line.

### collection-chain

Redefine a CHAINED collection to only contain certain child RUN collections:

```bash=
butler collection-chain $REPO PARENT "CHILD1,CHILD2"
```

This command is useful prior to deleting a CHAINED collection, ensuring that no attempt is made to delete input raw collections.

### remove-runs

Remove one or more RUN collections:

```bash=
butler remove-runs $REPO COLLECTION
```

### remove-collections

Remove one or more non-RUN collections:

```bash=
butler remove-collections $REPO COLLECTION
```

## B. What tracts cover my data?

The `visitSummary` tables produced in step 2a contain important information on single frame processed visits. This information may be used to find out which tracts overlap with your data. To generate a list of tract overlaps for a single visit, in Python:

```python=
from collections import defaultdict
import lsst.daf.butler as dafButler

butler = dafButler.Butler('/project/lskelvin/repo')
grouped_by_tract = defaultdict(set)
for data_id in butler.registry.queryDataIds(
    ["tract", "visit", "detector"],
    datasets="visitSummary",
    collections="DECam/runs/merian9813/w_2022_26",
    instrument="DECam",
    visit=971666,
):
    grouped_by_tract[data_id["tract"]].add(data_id)
print({k: len(v) for k, v in grouped_by_tract.items()})
```

To get total tract coverage for *all* visits in a given collection, remove the `visit=` argument above.

## C. Transferring datasets from one machine to another

To transfer datasets from one machine to another (e.g., from NCSA to Princeton), first, on the source machine in Python:

```python=
# Assumes `butler` is an lsst.daf.butler.Butler instance for the source repo
outdir = "/project/lskelvin/merian"
datasetType = ["objectTable_tract", "deepCoadd", "deepCoadd_calexp"]
collection = "HSC/runs/RC2/w_2022_04/DM-33402"
dataId = dict(skymap="hsc_rings_v1", tract=9813)
with butler.export(directory=outdir, format="yaml", transfer="copy") as export:
    items = []
    found = set(butler.registry.queryDatasets(datasetType, collections=collection, dataId=dataId))
    items.extend(found)
    export.saveDatasets(items)
```

Next, in the output directory on the source machine:

```bash=
tar -czvf data_transfer.tar.gz *
```

Transfer the file (here named `data_transfer.tar.gz`) from the source machine to the destination machine. Extract the tarball on the destination machine:

```bash=
tar -xzvf data_transfer.tar.gz
```

Next, still on the destination machine, import the data into the repo:

```bash=
LOGFILE=$LOGDIR/data_import.log; \
butler import $REPO \
/path/to/data_transfer_directory \
--transfer copy \
--skip-dimensions skymap,tract,patch \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```

Finally, set up a similarly named parent collection, e.g.:

```bash=
PARENT=HSC/runs/RC2/w_2022_04/DM-33402
CHILD=HSC/runs/RC2/w_2022_04/DM-33402/20220128T212035Z
butler collection-chain $REPO $PARENT $CHILD
```
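As an optional check on the transfer step above, the tarball can be checksummed on both machines before it is extracted, to confirm that it arrived intact. The sketch below is an illustration only and is not part of the Science Pipelines or of the workflow described here; the helper name `sha256sum` is hypothetical and it uses only the Python standard library.

```python
import hashlib


def sha256sum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at path.

    The file is read in chunks so that large tarballs do not need to fit
    in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256sum("data_transfer.tar.gz")` on the source machine and again on the destination machine should give the same digest; if the two differ, the file should be transferred again before running `butler import`.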