
Merian Data Processing Using the LSST Science Pipelines

The Merian Survey is an ambitious program designed to explore the nature of dark matter, star formation, and feedback in dwarf galaxies. Merian will use 62 nights on the 4m Blanco telescope in Chile with the Dark Energy Camera (DECam). A total of 800 square degrees of the sky will be imaged in two custom-made medium-band filters to create a sample of 100,000 star-forming dwarf galaxies (with 90% completeness) in the redshift range 0.058 < z < 0.10. Merian will cover the HSC SSP Wide field, which provides gravitational lensing capabilities to probe the dark matter component of dwarfs (Leauthaud et al. 2020).

This note summarizes the ingestion and data reduction process for Merian data, initially observed in Feb/March 2021, using the Rubin Observatory LSST Science Pipelines.

There are three primary DECam dataset types:

  • object:
    • primary science frames
    • observation type: object
    • filters:
      • N540 DECam c0014 5403.2 210.0
      • N708 DECam c0012 7080.0 400.0
    • data storage locations on tiger2-sumire:
      • /projects/MERIAN/raw/*/object
  • zero:
    • bias frames
    • observation type: zero
    • data storage locations on tiger2-sumire:
      • /projects/MERIAN/raw/*/zero
  • domeflat:
    • flat frames
    • observation type: dome flat
    • data storage locations on tiger2-sumire:
      • /projects/MERIAN/raw/*/domeflat

The DECam focal plane

The DECam focal plane (figure from Diehl et al. 2018), showing the 62 2k x 4k science CCDs, 8 2k x 2k CCDs (labeled "F") for focus and alignment, and 4 2k x 2k CCDs (labeled "G") for guiding. The orientation of the sky is indicated. The black label (e.g., S30) indicates a position on the focal plane. The green label (e.g., 2) indicates the number of the CCD as given in the multi-extension FITS header. When the focal plane is viewed with the real-time display at the telescope or with default SAOImage DS9 settings, the direction labeled "north" is displayed to the left and "east" at the top. The background colors of the CCDs indicate the electronics backplane that reads them out.

1. Preparing the Science Pipelines

1.1. Set up the Science Pipelines

First, the LSST Science Pipelines ("the stack") needs to be set up on the local machine:

source "/projects/HSC/LSST/stack/loadLSST.sh"

This will set up the most recent Rubin environment installed on the machine. A list of other installed Rubin environments can be shown using mamba:

mamba env list

Older Rubin environments contain older builds of the science pipelines. To switch to an older build, simply preface the shell source command above with the appropriate LSST_CONDA_ENV_NAME variable:

LSST_CONDA_ENV_NAME=lsst-scipipe-4.0.0; \
source "/projects/HSC/LSST/stack/loadLSST.sh"

The most recently installed version can again be loaded by unsetting this variable:

unset LSST_CONDA_ENV_NAME; \
source "/projects/HSC/LSST/stack/loadLSST.sh"

Once the science pipelines have been set up, we can now set up the main LSST software package, lsst_distrib, using setup:

setup lsst_distrib

It's possible to set up a specific tagged version of lsst_distrib using -t:

setup lsst_distrib -t w_2022_26

Check what version of lsst_distrib is being used using eups:

eups list lsst_distrib | grep setup
# g0b29ad24fb+e8b8cae3ca current w_2022_26 setup

1.2. Register new filters

This step is only required if the data to be ingested use a filter which is not already defined. Before raw science frames can be ingested, all necessary filters need to be defined in the relevant obs_ package (and, per a later update, also in the skymap repository; see the end of this section for further details). Here, the relevant package is obs_decam, and the filters file is located at obs_decam/python/lsst/obs/decam/decamFilters.py.

For this example, the required observation filter (N708 DECam c0012 7080.0 400.0) was not previously defined and had to be added manually. This modification has since been merged into the main branch, but the instructions on how to do this are maintained here for reference. To do so, first git clone the obs_decam package into a local directory:

OBSDECAM=/home/lkelvin/repos/obs_decam
git clone git@github.com:lsst/obs_decam.git $OBSDECAM
cd $OBSDECAM

If this is the first time the package has been cloned, it will also need to be built using scons (as with all Science Pipelines packages), e.g.:

scons -j8

Next, we checkout a user branch from the main branch to work on:

git checkout -b u/lskelvin/merian

Now add the relevant filter definition. In this case:

FilterDefinition(physical_filter="N708 DECam c0012 7080.0 400.0", band="N708", lambdaEff=708),
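
For context, a minimal sketch of how such definitions are grouped. In obs_decam the entry is appended to the existing DECAM_FILTER_DEFINITIONS collection in decamFilters.py; the N540 entry and both lambdaEff values shown here are illustrative assumptions, not verbatim package content:

import lsst.obs.base
from lsst.obs.base import FilterDefinition, FilterDefinitionCollection

# Illustrative only: in obs_decam, these entries are appended to the existing
# DECAM_FILTER_DEFINITIONS FilterDefinitionCollection in decamFilters.py.
MERIAN_FILTER_DEFINITIONS = FilterDefinitionCollection(
    FilterDefinition(physical_filter="N540 DECam c0014 5403.2 210.0",
                     band="N540", lambdaEff=540),  # assumed analogous entry
    FilterDefinition(physical_filter="N708 DECam c0012 7080.0 400.0",
                     band="N708", lambdaEff=708),
)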

Finally, make sure both lsst_distrib and the relevant obs_ package (obs_decam here) are set up in the working shell:

setup -j -r $OBSDECAM

Double check that the local package has been loaded using:

eups list | grep LOCAL
# obs_decam   LOCAL:/home/lkelvin/repos/obs_decam

Once complete, subsequent processing should be able to proceed. If a warning similar to ingest WARN: Exposure DECam:ct4m20210318t032843 could not be registered: (sqlite3.IntegrityError) FOREIGN KEY constraint failed is returned, check that all filters are correctly assigned in the filters file.

Finally, refObjLoader lookups to the new filter need to be added to a number of obs_decam config files to facilitate astrometric matching. This allows data processing to proceed beyond characterizeImage, i.e., the final step required to produce a calexp. Here, we map the new N708 filter into the existing i-band filter (the nearest broad-band filter in wavelength) by adding lines similar to:

refObjLoader.filterMap['N708'] = 'i'

into:

config/characterizeImage.py
config/calibrate.py
config/measureCoaddSources.py

Note: Ticket DM-30692 added these additional config lines into the main branch.
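
For concreteness, a hedged sketch of what these config additions can look like. The exact config attribute paths differ between the three files (characterizeImage.py uses a single refObjLoader, for example), and the N540 -> g mapping is our assumption based on the nearest broad band:

# Sketch for a calibrate.py-style config file; attribute paths are
# assumptions and differ per task config file. N540 -> g is also an
# assumption (nearest broad-band filter in wavelength).
for refObjLoader in (config.astromRefObjLoader, config.photoRefObjLoader):
    refObjLoader.filterMap["N708"] = "i"
    refObjLoader.filterMap["N540"] = "g"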

New filters also need to be registered in the skymap repository. Central wavelengths for all required filters should be added to python/lsst/skymap/packers.py.

1.3. Create a new butler

A new butler will be created in the directory /projects/MERIAN/repo. Here we set aliases for the output repository directory:

REPO=/projects/MERIAN/repo
mkdir -p $REPO
chmod ug+rw $REPO

Whilst optional, it may be desirable to also construct a log directory in which log files will be stored:

LOGDIR=/projects/MERIAN/logs
mkdir -p $LOGDIR
chmod ug+rw $LOGDIR

If this repository will be used by more than one user, modify the permissions of the output repository directory to ensure that all files constructed below are writeable by all members of that user group:

cd $REPO
umask 2

Note: if changing permissions after the butler has been used, and if using an SQLite database (see below), you will also need to run chmod ug+rw gen3.sqlite3 to make the SQLite database read/writable to all members of your group. You will also need to run chmod ug+rw u to make the user output directory (here named u) read/writable to all members of your group.

Next, an empty Gen3 Butler repository is created, and then the instrument is registered in the data repository. In this example, the instrument is the Dark Energy Camera (DECam).

There are two types of database that can be constructed for use with the butler: an SQLite database, or a PostgreSQL database. The former is the default, and simpler to set up. The latter provides significantly improved data processing times, but requires a PostgreSQL database to have already been set up on the data processing machine in advance. Both methods are summarized in the subsections below:

Option 1: Create a SQLite database (quickest and easiest)

On the command line, create a butler repo:

butler create $REPO

This constructs a butler.yaml file in the $REPO directory.

Note: after the gen3.sqlite3 file has been constructed, you may have to manually add write permissions for group members by running the command: chmod g+w gen3.sqlite3.

Option 2: Create a PostgreSQL database (most efficient)

Before beginning to create a butler, a PostgreSQL database must first be set up on the primary data processing machine.

Once a PostgreSQL server has been set up, it may be necessary to enable the btree_gist extension. To do so, log in to the server:

psql -h localhost -U USERNAME SERVERNAME

and then create the extension:

CREATE EXTENSION "btree_gist";

If hitting a superuser permissions issue, it may be necessary to reach out to the server admins to create this extension on your behalf.

With the server set up, a number of extra Science Pipelines configuration files also need to be in place. First, construct the seed config file; it is recommended that this file be placed at, for example, $REPO/seed-config.yaml. Assuming the database is named 'merian', the contents of this file should look like this:

datastore:
  root: <butlerRoot>
registry:
  db: postgresql+psycopg2://localhost:5432/merian

Note: this file only needs to be constructed once, for the purposes of creating the butler.

Second, a science pipelines authentication file needs to be created in ~/.lsst/db-auth.yaml. The contents of this file should look like this:

- url: postgresql://localhost:5432/merian
  username: merian
  password: MYSECRETPASSWORD

where MYSECRETPASSWORD is the database password for the merian database, for the merian user (the database and the username are the same in the example above, but do not necessarily need to be).

Note: each user who wishes to interact with the butler repository needs to place a copy of this authentication file within their own home space.

The authentication file must be readable only by the user's account (chmod 600 db-auth.yaml).

Finally, create the butler repo:

butler create --seed-config seed-config.yaml $REPO

This constructs a butler.yaml file in the $REPO directory.

1.4. Register the instrument

Once the butler repo has been created, register the instrument:

butler register-instrument $REPO lsst.obs.decam.DarkEnergyCamera

The register-instrument command needs to be re-run (once) each time a new filter is added to the filter definitions file.

Note: the instrument name here needs to be the fully qualified name of an instrument subclass. Full names can be inferred from their respective obs_ package at github.com/lsst. For this example, the relevant obs_ package is obs_decam and the fully qualified name is lsst.obs.decam.DarkEnergyCamera.

Finally, double check that all required filters are correctly registered with the butler:

butler query-dimension-records $REPO physical_filter

In this case, double check that N708 DECam c0012 7080.0 400.0 appears in the filter list.
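
The same check can also be performed from Python via the butler registry; a minimal sketch, assuming the repo path used throughout this note:

import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
# print each registered Merian physical filter and its abstract band
for rec in butler.registry.queryDimensionRecords("physical_filter",
                                                 instrument="DECam"):
    if rec.band in ("N540", "N708"):
        print(rec.name, rec.band)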

1.5. Generate reference catalogues

The Science Pipelines require reference catalogues ("refcats") to accurately calibrate photometric and astrometric results. Two reference catalogues are required here: Gaia DR2 for astrometry, and Pan-STARRS PS1 for photometry. Further information is also available on the Community forum and on pipelines.lsst.io.

A number of different methods are available to ingest these catalogues. These steps are summarized below:

Option 1: Using butler ingest-files

Ingesting survey reference catalogues

The first step in constructing these reference catalogues is to gather the catalogue data together and ingest the files. This process is decidedly non-trivial, and may require several hours to complete even on a high-powered machine.

If ingesting for the first time, all raw files can be downloaded at the link described in this comment on the Community forum.

Fortunately for our purposes, these downloaded FITS files already exist on the machine used here, and can be used directly:

GAIADR2=/projects/HSC/refcats/htm/gaia_dr2_20200414
PANSTARRSPS1=/projects/HSC/refcats/htm/ps1_pv3_3pi_20170110

Once the files are in place, we create astropy-readable .ecsv table files containing one row per input file in each reference catalogue. To construct these, in Python:

import os
import glob
import astropy.table

# output directory to save .ecsv files
outdir = "/home/lkelvin"
# full paths to LSST sharded reference catalogues
gaiadr2 = "/projects/HSC/refcats/htm/gaia_dr2_20200414"
panstarrsps1 = "/projects/HSC/refcats/htm/ps1_pv3_3pi_20170110"
refcat_dirs = [gaiadr2, panstarrsps1]

# loop over each FITS file in all refcats
# note: this constructs a series of .ecsv files, each containing two columns:
# 1) the FITS filename, and 2) the htm7 pixel index
for refcat_dir in refcat_dirs:
    outfile = f"{outdir}/{os.path.basename(refcat_dir)}.ecsv"
    print(f"Saving to: {outfile}")
    table = astropy.table.Table(names=("filename", "htm7"), dtype=("str", "int"))
    files = glob.glob(f"{refcat_dir}/[0-9]*.fits")
    for ii, file in enumerate(files):
        print(f"{ii}/{len(files)} ({100*ii/len(files):0.1f}%)", end="\r")
        # try/except to catch extra .fits files which may be in this dir
        try:
            file_index = int(os.path.basename(os.path.splitext(file)[0]))
        except ValueError:
            continue
        else:
            table.add_row((file, file_index))
    table.write(outfile)

Note: the above script took ~20 minutes on the tiger machine (~10 minutes per reference catalogue).

A .ecsv file should now exist for each reference catalogue. Next, register the dataset types for each reference catalogue with the butler:

butler register-dataset-type $REPO gaia_dr2_20200414 SimpleCatalog htm7
butler register-dataset-type $REPO ps1_pv3_3pi_20170110 SimpleCatalog htm7

Check that both the Gaia DR2 and Pan-STARRS PS1 dataset types are now available using:

butler query-dataset-types $REPO

Finally, ingest the LSST-formatted files into the refcats/gen2 RUN collection in the repository:

butler ingest-files -t link $REPO gaia_dr2_20200414 refcats/gen2 gaia_dr2_20200414.ecsv
butler ingest-files -t link $REPO ps1_pv3_3pi_20170110 refcats/gen2 ps1_pv3_3pi_20170110.ecsv
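
As a quick sanity check, a sketch of reading one ingested shard back through the butler (dataset type and collection names as above; each shard is a SimpleCatalog covering one htm7 pixel):

import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
refs = list(butler.registry.queryDatasets("gaia_dr2_20200414",
                                          collections="refcats/gen2"))
print(f"{len(refs)} Gaia DR2 shards found")
# read a single shard back and count its sources
cat = butler.get(refs[0].datasetType.name, dataId=refs[0].dataId,
                 collections="refcats/gen2")
print(f"shard {refs[0].dataId['htm7']} contains {len(cat)} sources")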
Option 2: Using convert_refcats.py

Legacy reference catalogue conversion

When these instructions were first written, it was not possible to ingest these refcats directly into a gen3 repo (as set up above). The instructions below are preserved for reference, but should no longer be required (assuming the gen3 approach above suffices). Here we use butler convert to map these data into our gen3 repo.

First, set up an empty temporary gen2 butler directory, and link the two reference datasets into this directory within a folder named ref_cats:

GEN2REPO=/projects/MERIAN/repo_gen2
mkdir -p $GEN2REPO/ref_cats
echo "lsst.obs.decam.DecamMapper" > $GEN2REPO/_mapper
ln -s $GAIADR2 $GEN2REPO/ref_cats/
ln -s $PANSTARRSPS1 $GEN2REPO/ref_cats/

Next, a simple config file is required:

echo 'config.refCats.append("gaia_dr2_20200414")
config.runs["gaia_dr2_20200414"] = "refcats"
config.refCats.append("ps1_pv3_3pi_20170110")
config.runs["ps1_pv3_3pi_20170110"] = "refcats"
config.doRegisterInstrument = False' \
    > $GEN2REPO/convert_refcats.py

Finally, we need to run the Science Pipelines refcat conversion script rootRepoConverter.py.

Note: at the time of writing, a modified version of rootRepoConverter.py was required to complete the butler convert command below without raising errors. Specifically, L137-L140 (from if not self.task.dry_run: to else:) were commented out, L141 (self.task.log.info...) was unindented, and L208-L226 (from if self._refCats: to self._chain.append(chained)) were commented out. This approach is not recommended for general use. This script also adds curated calibrations, negating the relevant section below.

To set up this package, run setup -j -r ~/repos/obs_base and either switch to a branch with the changes listed above in place, or make the changes temporarily to the code in place.

Once the command below has completed successfully, any changes made to the master branch can be undone using this on the command line: git reset --hard origin/master.

Ticket DM-30624 has since made ingesting refcats from a gen2 repo easier.

To run the butler conversion script:

LOGFILE=$LOGDIR/convert_refcats.log; \
date | tee $LOGFILE; \
butler convert $REPO \
    --gen2root $GEN2REPO \
    -C $GEN2REPO/convert_refcats.py \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Note: the runtime for this command was ~20 minutes.

Option 3: Using butler transfer-datasets

Repo-to-Repo Refcat Dataset Transfer

If a butler already exists on the machine, the refcat datasets can be transferred over from the existing butler directly:

LOGFILE=$LOGDIR/transfer_refcats.log; \
date | tee $LOGFILE; \
butler transfer-datasets --register-dataset-types \
    -t copy --collections refcats/gen2 \
    /projects/MERIAN/repo $REPO \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Note: on the lsst-devl machine at NCSA, the runtime for this command was ~60 minutes. The transfer type can be a number of different options, such as copy to make a copy of the files, or link to make only a symbolic link.

Once the refcats are in place, their collection can be confirmed:

butler query-collections $REPO "refcats*"

Following a dataset transfer, it may be necessary to set up the parent refcats CHAINED collection, which contains a comma-separated list of all of the transferred child refcats/... RUN collections:

PARENT=refcats; \
CHILDREN=refcats/gen2; \
butler collection-chain $REPO $PARENT $CHILDREN

1.6. Register the skyMap

Make a sky map and add it to the repository. Sky maps exist as dimensions, datasets, and collections. Until ticket DM-34516 was merged, prior DECam data reductions made use of the HSC sky map, hsc_rings_v1. DECam data reductions now instead make use of a DECam-specific sky map, maintaining the native pixel scale of DECam imaging. By default, DECam tracts have the same centroids as their HSC counterparts, albeit with fewer patches per tract. An example DECam tract is shown below:

DECam tract 9704

DECam Tract 9704 as defined using the decam_rings_v1 sky map. DECam tracts have the same centroids as HSC tracts (as defined by the hsc_rings_v1 sky map). Each DECam tract is further sub-divided into 6x6 patches, each of ~4k pixels on a side.

This command registers the DECam skyMap dataset using the default obs_decam configuration, and setting the output name to the commonly used and de-facto standard decam_rings_v1:

LOGFILE=$LOGDIR/register_decam_rings_v1.log; \
date | tee $LOGFILE; \
butler register-skymap $REPO \
    -C $OBS_DECAM_DIR/config/makeSkyMap.py \
    -c name='decam_rings_v1' \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Note: the runtime for this command was ~5 minutes.

If the HSC skymap is also required, it may be generated as well:

LOGFILE=$LOGDIR/register_hsc_rings_v1.log; \
date | tee $LOGFILE; \
butler register-skymap $REPO \
    -C $OBS_SUBARU_DIR/config/makeSkyMap.py \
    -c name='hsc_rings_v1' \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Check that all required skyMap dataset types now exist in the skymaps run collection in the Merian repo:

butler query-datasets $REPO skyMap

Note: this sky map step doesn't strictly need to be performed until much later. However, if any errors occur during registration of the sky map, it may be necessary to delete the repo and start afresh. For this reason, it's usually best to perform this step as early as possible.
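
Beyond the CLI check, the registered sky map can also be loaded and inspected in Python; a short sketch (tract 9704 chosen to match the figure above):

import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
skymap = butler.get("skyMap", skymap="decam_rings_v1", collections="skymaps")
tract = skymap[9704]
print(tract.getNumPatches())                          # expect 6x6 patches
print(tract.getWcs().getPixelScale().asArcseconds())  # native DECam scale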

1.7. Write curated calibrations

Curated calibrations are collections of calibration data which describe various aspects of the camera and survey. If setting up a new butler on a new machine, an instrument's curated calibrations will need to be added to the data repository:

butler write-curated-calibrations $REPO lsst.obs.decam.DarkEnergyCamera

Note: if the modified version of rootRepoConverter.py was used above, the dataset types added by this command may already have been ingested into the repo. If so, running the above command will fail with an error similar to A database constraint failure was triggered by inserting one or more datasets of type DatasetType('camera', {instrument}, Camera, isCalibration=True) into collection 'DECam/calib/unbounded'. This probably means a dataset with the same data ID and dataset type already exists, but it may also mean a dimension row is missing..

The instrument may be specified via either the fully qualified name, as above, or the short name (DECam here). The -h help file indicates the former is required, but this advice may change in the future.

This currently adds camera, crosstalk, defects and linearizer dataset types into the repository within a number of collections (DECam/calib, DECam/calib/unbounded, and DECam/calib/curated/{timestamp}). Check the current collections within the repo using:

butler query-collections $REPO "DECam/calib*"
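
The curated camera description can also be retrieved directly from the unbounded collection and inspected; a minimal sketch:

import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
camera = butler.get("camera", instrument="DECam",
                    collections="DECam/calib/unbounded")
print(camera.getName(), len(camera))  # camera name and detector count
detector = camera[1]                  # look up a detector by its ID
print(detector.getName(), detector.getBBox())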

2. Ingest raw data

2.1. Ingest raw science frames

We're now prepared to ingest raw science frames. If raw frames are being stored in multiple directories, this command needs to be repeated for each directory. Alternatively, a sufficient glob which is able to locate all files of interest may be supplied. Here's an example data ingest command:

LOGFILE=$LOGDIR/merian9813_ingest_science.log; \
SCIFILES=/project/lskelvin/decam/raw-merian/science/raw_*.fz; \
date | tee $LOGFILE; \
butler ingest-raws $REPO $SCIFILES --transfer link \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Note: The runtime for this command was ~5 minutes, ingesting 2480 distinct Butler datasets from 40 exposures.

Raw science exposures have now been added to the DECam/raw/all collection in the Merian repo. Collections define groups of data, and can be listed (and searched) using:

butler query-collections $REPO "DECam/raw/all"

Ingested exposures can be listed on the command line using:

butler query-dimension-records $REPO exposure \
    --where "instrument='DECam' AND exposure.observation_type='science'"

This view shows all available dimensions associated with each ingested science image, including the observation ID, the physical filter, and the observation type. Alternatively, datasets can be queried directly using query-datasets, with optional SQL-like --where arguments to search specific dimensions, e.g.:

WHERE="instrument='DECam' AND exposure=971666" WHERE="instrument='DECam' AND detector=1" WHERE="instrument='DECam' AND exposure.observation_type='science' AND exposure.day_obs > 20220101 AND exposure.day_obs < 20220201 AND detector=1" butler query-datasets $REPO --where $WHERE raw

Note: To successfully use the --where argument, other dimensions may be required, such as instrument. The butler will complain with a UserExpressionError if a required dimension is not found.

A list of science exposure IDs can similarly be extracted within python:

# assumes an existing butler, e.g.: butler = lsst.daf.butler.Butler(REPO)
queryData = butler.registry.queryDatasets
where = "exposure.observation_type='science' AND detector=1"
exps = list(queryData("raw", collections="DECam/raw/all",
                      instrument="DECam", where=where))
expids = tuple(x.dataId["exposure"] for x in exps)
print(f'SCIEXPS="{expids}"')

The test dataset used here returns this list of science exposures:

SCIEXPS="(971666, 971667, 971668, 971669, 971670, 971671, 971672, 971673, 971674, 971675, 971676, 971677, 971678, 971679, 971680, 971681, 971682, 971683, 971684, 971685, 1068554, 1068555, 1068556, 1068557, 1068558, 1068559, 1068560, 1068561, 1068562, 1068713, 1068714, 1068715, 1068716, 1068717, 1068718, 1068719, 1068720, 1068721, 1068722, 1068723)"

2.2. Ingest raw bias frames

As with the raw science frames above, raw bias frames ('zero') are also ingested:

LOGFILE=$LOGDIR/merian9813_ingest_bias.log; \
BIASFILES=/project/lskelvin/decam/raw-merian/bias/raw_*.fz; \
date | tee $LOGFILE; \
butler ingest-raws $REPO $BIASFILES --transfer link \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Note: The runtime for this command was ~1 minute, ingesting 3100 distinct Butler datasets from 50 exposures.

Bias exposures have now been ingested into the Merian repo. Check that the bias calibration frames have been successfully ingested using:

butler query-dimension-records $REPO exposure \
    --where "instrument='DECam' AND exposure.observation_type='zero'"

A list of bias exposure IDs can be extracted within python:

queryData = butler.registry.queryDatasets
where = "exposure.observation_type='zero' AND detector=1"
exps = list(queryData("raw", collections="DECam/raw/all",
                      instrument="DECam", where=where))
expids = tuple(x.dataId["exposure"] for x in exps)
print(f'BIASEXPS="{expids}"')

The test dataset used here returns this list of bias exposures:

BIASEXPS="(970488, 970489, 970490, 970491, 970492, 970493, 970494, 970495, 970496, 970497, 970498, 970589, 970590, 970591, 970592, 970593, 970594, 970595, 970596, 970597, 970598, 970599, 970823, 970824, 970825, 970826, 970827, 970828, 970829, 970830, 970831, 970832, 970833, 970924, 970925, 970926, 970927, 970928, 970929, 970930, 970931, 970932, 970933, 970934, 971161, 971162, 971163, 971164, 971165, 971166)"

2.3. Ingest raw flat frames

The final set of data to be ingested are the raw flat frames ('dome flat'):

LOGFILE=$LOGDIR/merian9813_ingest_flat.log; \
FLATFILES=/project/lskelvin/decam/raw-merian/flat/raw_*.fz; \
date | tee $LOGFILE; \
butler ingest-raws $REPO $FLATFILES --transfer link \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Note: The runtime for this command was ~5 minutes, ingesting 6200 distinct Butler datasets from 100 exposures.

Flat exposures have now been ingested into the Merian repo. Check that the flat calibration frames have been successfully ingested using:

butler query-dimension-records $REPO exposure \
    --where "instrument='DECam' AND exposure.observation_type='dome flat'"

Note: if an attempt is made to ingest a file which has already been ingested, the science pipelines will fail for that particular file. This behaviour is expected, and not a cause for concern.

A list of flat exposure IDs can be extracted within python:

queryData = butler.registry.queryDatasets
where = "exposure.observation_type='dome flat' AND detector=1"
exps = list(queryData("raw", collections="DECam/raw/all",
                      instrument="DECam", where=where))
expids = tuple(x.dataId["exposure"] for x in exps)
print(f'FLATEXPS="{expids}"')

The test dataset used here returns this list of flat exposures:

FLATEXPS="(970228, 970229, 970230, 970231, 970232, 970233, 970234, 970235, 970236, 970237, 970238, 970501, 970502, 970503, 970504, 970505, 970506, 970507, 970508, 970509, 970510, 970511, 970836, 970837, 970838, 970839, 970840, 970841, 970842, 970843, 970844, 970845, 970846, 971174, 971175, 971176, 971177, 971178, 971179, 971180, 971181, 971182, 971183, 971184, 971554, 971555, 971556, 971557, 971558, 971559, 1052706, 1052707, 1052708, 1052709, 1052710, 1052711, 1052712, 1052713, 1052714, 1052715, 1052716, 1053093, 1053094, 1053095, 1053096, 1053097, 1053098, 1053099, 1053100, 1053101, 1053102, 1053103, 1053485, 1053486, 1053487, 1053488, 1053489, 1053490, 1053491, 1053492, 1053493, 1053494, 1053495, 1053858, 1053859, 1053860, 1053861, 1053862, 1053863, 1053864, 1053865, 1053866, 1053867, 1053868, 1054287, 1054288, 1054289, 1054290, 1054291, 1054292)"

2.4. Define visits

Once all raw data have been ingested, we can define visits from exposures in the butler registry. This sets up the exposure-to-visit mapping within the butler, allowing future runs to use this information in -d or --where data queries. Without this step, processing steps after ISR (i.e., characterizeImage onwards) will fail with RuntimeError: QuantumGraph is empty.

LOGFILE=$LOGDIR/merian9813_define_visits.log; \
date | tee $LOGFILE; \
butler define-visits $REPO lsst.obs.decam.DarkEnergyCamera \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Note: this run took ~1 minute with these data. Do not use the -j N syntax here to run this process on more than one processor. Doing so will cause multiple database is locked errors as each processor attempts to write to the same butler. Ticket DM-30607 references this issue.
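
Once visits are defined, the new visit records can be inspected through the registry; a minimal sketch:

import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
visits = list(butler.registry.queryDimensionRecords("visit",
                                                    instrument="DECam"))
print(f"{len(visits)} visits defined")
# each record carries visit-level metadata, e.g.:
print(visits[0].id, visits[0].physical_filter, visits[0].exposure_time)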

3. Calibration

3.1. Determine calibration frame validity ranges

Now that raw bias (zero) and dome flat calibration frames have been ingested, validity date ranges need to be determined. Prior analyses of these data show that all Merian observation nights are consistent with each other to within a 1% flux deviation. For this reason, we opt to construct calibration frames which are certified across the entire timespan of our data: from 2021-01-01 to 2022-06-30.

3.2. Build bias frames

Next, the master bias frames are built. These frames need to be built for each valid date range (see section above).

First, check the build pipeline:

pipetask build \
    -p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpBias.yaml \
    --show pipeline
The DECam cpBias pipeline at the time of writing:

description: cp_pipe BIAS calibration construction for DECam
instrument: lsst.obs.decam.DarkEnergyCamera
tasks:
  isr:
    class: lsst.ip.isr.IsrTask
    config:
    - connections.ccdExposure: raw
      connections.outputExposure: cpBiasProc
      doWrite: true
      doBias: false
      doVariance: true
      doLinearize: false
      doCrosstalk: false
      doBrighterFatter: false
      doDark: false
      doFlat: false
      doApplyGains: false
      doFringe: false
  cpBiasCombine:
    class: lsst.cp.pipe.cpCombine.CalibCombineTask
    config:
    - connections.inputExpHandles: cpBiasProc
      connections.outputData: bias
      calibrationType: bias
      exposureScaling: Unity
contracts:
- contract: cpBiasCombine.calibrationType == "bias"
- contract: cpBiasCombine.exposureScaling == "Unity"
- contract: isr.doBias == False

The cpBias pipeline may also be viewed graphically:

pipetask build \
    -p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpBias.yaml \
    --pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_cpBias.pdf

pipeline_cpBias.pdf

Build master bias (zero) frames, ensuring that all required input collections are given as arguments to -i:

LOGFILE=$LOGDIR/merian9813_cpBias.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
    -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
    -i DECam/raw/all,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded \
    -o DECam/calib/merian9813/bias \
    -p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpBias.yaml \
    -d "instrument='DECam' AND exposure IN $BIASEXPS" \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Note: This job contained 3162 quanta, with a runtime of ~2 hours. The -d data query takes a wide range of SQL-like arguments. Instead of specifying exposures by their exposure ID, it may be preferable to build them by date instead, e.g.: "exposure.observation_type='zero' AND exposure.day_obs > 20210910".

To avoid an error regarding missing defects dataset types, an input collection containing defects must also be supplied to the cpBias run. These data were ingested into the repo when running write-curated-calibrations above; the instructions here make use of the DECam/calib/curated/19700101T000000Z collection. Other collections containing defects are also available; however, some of the commonly unused detectors are missing from them (i.e., there are <62). If defects data are not available at all, adding -c isr:doDefect=False to the pipetask run command will disable defect masking when running the cpBias pipeline.

On occasion, some of the tasks (quanta) may fail, likely due to memory issues. In such cases, an afterburner can be run on a single core to try the failed tasks again. To do so, add --extend-run and --skip-existing to the pipetask run command, and remove -j N to prevent it from running on multiple cores. This will help ensure that the most memory-intensive quanta will not request too much simultaneous memory usage.

Check the collections, dataset types and datasets now present in the repo:

butler query-collections $REPO "*bias*" butler query-dataset-types $REPO butler query-datasets $REPO --collections DECam/calib/merian9813/bias butler query-datasets $REPO --collections DECam/calib/merian9813/bias bias
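
A master bias can also be pulled back through the butler for a quick numerical sanity check; a sketch (for a well-behaved master bias, the residual median should sit near zero counts):

import numpy as np
import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
bias = butler.get("bias", instrument="DECam", detector=1,
                  collections="DECam/calib/merian9813/bias")
arr = bias.image.array
print(f"median = {np.nanmedian(arr):.3f}, stdev = {np.nanstd(arr):.3f}")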

3.3. Certify bias frames

Certify the biases for a given date range. Arguments: REPO, INPUT_COLLECTION, OUTPUT_COLLECTION, DATASET_TYPE_NAME:

butler certify-calibrations \
    $REPO DECam/calib/merian9813/bias DECam/calib/merian9813 bias \
    --begin-date 2021-01-01T00:00:00 --end-date 2022-06-30T23:59:59

You may check what certified date ranges have been applied to the bias data in Python by querying dataset associations in the output collection. For example, to check only detector #1:

qda = butler.registry.queryDatasetAssociations
coll = "DECam/calib/merian9813"
biases = [x for x in qda("bias", collections=coll)
          if x.ref.dataId["detector"] == 1]
print(biases)

which produces a list of all biases relating to detector #1 (in the case of this example document, there should be only 1 result at present). Inspecting the properties of this object gives the timespan, e.g.:

print(f"{biases[0].timespan.begin.value = }") # biases[0].timespan.begin.value = '2021-01-01 00:00:00.000000' print(f"{biases[0].timespan.end.value = }") # biases[0].timespan.end.value = '2022-06-30 23:59:59.000000'

3.4. Generate crosstalk sources

The next step is to generate crosstalk sources using step 0 of the Data Release Production (DRP) pipeline (DRP.yaml). Crosstalk sources need to be generated for any raw frames we want to run full ISR on (i.e., raw flats and raw science frames). Step 0 of DRP.yaml runs only the overscan-correction (doOverscan) aspect of the ISR (instrument signature removal) task. It can be visualized using:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \
    --show pipeline
The Merian step 0 pipeline at the time of writing:

description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  isrForCrosstalkSources:
    class: lsst.ip.isr.IsrTask
    config:
    - connections.outputExposure: overscanRaw
      doOverscan: true
      doAssembleCcd: false
      doBias: false
      doCrosstalk: false
      doVariance: false
      doLinearize: false
      doDefect: false
      doNanMasking: false
      doDark: false
      doFlat: false
      doFringe: false
      doInterpolate: false
subsets:
  step0:
    subset:
    - isrForCrosstalkSources
    description: |
      Tasks which should be run once, prior to initial data processing.
      This step generates crosstalk sources for ISR/inter-chip crosstalk by
      applying overscan correction on raw frames. A new dataset is written,
      which should be used as an input for further data processing.

The current step 0 pipeline may also be viewed graphically:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \
    --pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step0.pdf

pipeline_step0.pdf

Run step0 for raw flats:

LOGFILE=$LOGDIR/merian9813_step0_flat.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
    -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
    -i DECam/raw/all,DECam/calib/merian9813,DECam/calib/unbounded \
    -o DECam/calib/merian9813/crosstalk \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \
    -d "instrument='DECam' AND exposure IN $FLATEXPS" \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Note: this run contained 6200 quanta, with a runtime of ~2 hours. The final run message for these data was: Executed 6200 quanta successfully, 0 failed and 0 remain out of total 6200 quanta.

Extend the crosstalk RUN collection to also include science exposures:

LOGFILE=$LOGDIR/merian9813_step0_science.log; \
date | tee -a $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
    --extend-run --skip-existing \
    -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
    -i DECam/raw/all,DECam/calib/merian9813,DECam/calib/unbounded \
    -o DECam/calib/merian9813/crosstalk \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \
    -d "instrument='DECam' AND exposure IN $SCIEXPS" \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Note: this run contained 2480 quanta, with a runtime of ~30 minutes. The final run message for these data was: Executed 2480 quanta successfully, 0 failed and 0 remain out of total 2480 quanta.

The overscanRaw dataset types should now be available in the output repository. Check the collections and datasets:

butler query-collections $REPO
butler query-datasets $REPO overscanRaw

3.5. Build flat frames

This step constructs the master flat frames (which requires using the biases). The cpFlat pipeline can be visualized using:

pipetask build \
    -p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpFlat.yaml \
    --show pipeline
The DECam cpFlat pipeline at the time of writing:

description: cp_pipe FLAT calibration construction for DECam
instrument: lsst.obs.decam.DarkEnergyCamera
tasks:
  isr:
    class: lsst.ip.isr.IsrTask
    config:
    - connections.ccdExposure: raw
      connections.outputExposure: cpFlatProc
      doWrite: true
      doBrighterFatter: false
      doFlat: false
      doFringe: false
      doApplyGains: false
      connections.crosstalkSources: overscanRaw
      connections.bias: bias
      doDark: false
  cpFlatMeasure:
    class: lsst.cp.pipe.cpFlatNormTask.CpFlatMeasureTask
    config:
    - connections.inputExp: cpFlatProc
      connections.outputStats: flatStats
      doVignette: false
  cpFlatNorm:
    class: lsst.cp.pipe.cpFlatNormTask.CpFlatNormalizationTask
    config:
    - connections.inputMDs: flatStats
      connections.outputScales: cpFlatNormScales
      level: AMP
  cpFlatCombine:
    class: lsst.cp.pipe.cpCombine.CalibCombineByFilterTask
    config:
    - connections.inputExpHandles: cpFlatProc
      connections.inputScales: cpFlatNormScales
      connections.outputData: flat
      calibrationType: flat
      exposureScaling: InputList
      scalingLevel: AMP
contracts:
- contract: isr.doFlat == False

The cpFlat pipeline may also be viewed graphically:

pipetask build \
    -p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpFlat.yaml \
    --pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_cpFlat.pdf

pipeline_cpFlat.pdf

Build master flats:

LOGFILE=$LOGDIR/merian9813_cpFlat.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
    -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
    -i DECam/raw/all,DECam/calib/merian9813,DECam/calib/merian9813/crosstalk,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded \
    -o DECam/calib/merian9813/flat \
    -p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpFlat.yaml \
    -d "instrument='DECam' AND exposure IN $FLATEXPS" \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Note: this run contained 12526 quanta, with a runtime of ~3 hours. Initializing the task may also take some time, due to the complex nature of the quantum graph.

On occasion, some quanta (tasks) may fail, likely due to memory issues. In such cases, the missing master flats can be reattempted in an afterburner by excluding -j N (to run on only a single core) and including --extend-run --skip-existing in the pipetask run command.

Check what types of data now exist in the output collection:

butler query-collections $REPO
butler query-dataset-types $REPO
butler query-datasets $REPO --collections DECam/calib/merian9813/flat
butler query-datasets $REPO --collections DECam/calib/merian9813/flat flat

Note: the final two commands will not work if the output collection is a CHAINED collection containing a CALIBRATION child collection. If attempted, the above commands will fail with NotImplementedError: Query for dataset type 'camera' in CALIBRATION-type collection 'DECam/calib/merian' is not yet supported. As a workaround, instead provide the full name of the child RUN collection, as given by query-collections.

3.6. Certify flat frames

Certify the flats for a given date range. Arguments: REPO, INPUT_COLLECTION, OUTPUT_COLLECTION, DATASET_TYPE_NAME:

butler certify-calibrations \
    $REPO DECam/calib/merian9813/flat DECam/calib/merian9813 flat \
    --begin-date 2021-01-01T00:00:00 --end-date 2022-06-30T23:59:59

You may check what certified date ranges have been applied to the flat data in Python by querying dataset associations in the output collection. For example, to check only detector #1:

qda = butler.registry.queryDatasetAssociations
coll = "DECam/calib/merian9813"
flats = [x for x in qda("flat", collections=coll)
         if x.ref.dataId["detector"] == 1]
print(flats)

which produces a list of all flats relating to detector #1 (in the case of this example document, there should be only 1 result at present). Inspecting the properties of this object gives the timespan, e.g.:

print(f"{flats[0].timespan.begin.value = }") # flats[0].timespan.begin.value = '2021-01-01 00:00:00.000000' print(f"{flats[0].timespan.end.value = }") # flats[0].timespan.end.value = '2022-06-30 23:59:59.000000'

4. Set up a default collection

Data in the Science Pipelines are arranged into collections: groupings of data. Here we establish a default collection which contains the commonly required input collections. Whilst this step is not strictly necessary, it will allow us to specify only a single input collection for future raw data processing:

INPUT=DECam/defaults/merian9813

If this step is not performed, future data processing will need to specify all required input collections explicitly:

-i long,comma,separated,list,of,child,collections

If this step is followed then future data processing from raw data should only need to specify the default collection:

-i $INPUT

Note: it is not currently possible to query a CHAINED collection containing a CALIBRATION child collection. Constructing a dedicated CHAINED collection containing only the RUN collections of interest allows users to query the CHAINED collection and avoid this error.

A CHAINED collection can be set up either on the command line or in Python. To set up a CHAINED collection on the command line for all required input collections, run:

CHILDREN="DECam/raw/all,\ DECam/calib/merian9813,\ DECam/calib/merian9813/crosstalk,\ DECam/calib/curated/19700101T000000Z,\ DECam/calib/unbounded,\ skymaps,\ refcats" butler collection-chain $REPO $INPUT $CHILDREN

Note: the CHILDREN list may be amended and the above command re-run to update this parent collection, if, for example, new data has been processed and a user would like to add the updated crosstalk RUN collection to this parent CHAINED collection.

Alternatively, this may also be achieved in Python:

import lsst.daf.butler as dafButler

REPO = "/project/lskelvin/repo"
default_collection = "DECam/defaults/merian9813"

# Set up a writeable butler
butler_writeable = dafButler.Butler(REPO, writeable=True)
registry_writeable = butler_writeable.registry

# Register a new default CHAINED collection
registry_writeable.registerCollection(default_collection,
                                      type=dafButler.CollectionType.CHAINED)

# Add required CHILD collections into the CHAINED collection
registry_writeable.setCollectionChain(default_collection,
                                      ["DECam/raw/all",
                                       "DECam/calib/merian9813",
                                       "DECam/calib/merian9813/crosstalk",
                                       "DECam/calib/curated/19700101T000000Z",
                                       "DECam/calib/unbounded",
                                       "skymaps",
                                       "refcats"])

Note: as above, if reprocessing data in future runs, you can amend the list above to add your own collections, and then re-run setCollectionChain to update default_collection. This allows for the default collection to stay relevant in linking to all necessary datasets as new data becomes available.

5. Data release production

In this section we will proceed through all the relevant data processing steps to take raw DECam science data through to coadd outputs. These processed data will be output into the OUTPUT CHAINED collection:

OUTPUT=DECam/runs/merian9813/w_2022_26

Here, w_2022_26 refers to the weekly build of the LSST Science Pipelines used to reduce these data.

Processing consists of three main steps (1, 2 and 3):

  • Step 1: single frame processing
    • instrument signature removal, initial background subtraction / calibration / PSF estimation
  • Step 2: post single frame processing
    • step 2a - initial visit aggregation
    • step 2b - tract-level characterization
    • step 2c - global collection summaries
    • step 2d - final source table generation
  • Step 3: coadd level processing
    • warping visit-level images onto the coadd plane, constructing a coadd, running detection & deblending algorithms

If outputting to an already existing collection, the following arguments should be appended to the pipetask run commands below:

--extend-run --skip-existing --clobber-outputs

5.1. Step 1 - single frame processing

Processed visit images (PVIs) and preliminary source tables are produced in step 1.

Merian step 1 YAML (at the time of writing)

To generate the pipeline YAML:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step1 \
    --show pipeline

which gives:

description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  isr:
    class: lsst.ip.isr.IsrTask
    config:
    - connections.crosstalkSources: overscanRaw
      doCrosstalk: true
  characterizeImage:
    class: lsst.pipe.tasks.characterizeImage.CharacterizeImageTask
  calibrate:
    class: lsst.pipe.tasks.calibrate.CalibrateTask
    config:
    - photoCal.match.referenceSelection.magLimit.fluxField: i_flux
      photoCal.match.referenceSelection.magLimit.maximum: 22.0
  writePreSourceTable:
    class: lsst.pipe.tasks.postprocess.WriteSourceTableTask
    config:
    - connections.outputCatalog: preSource
  transformPreSourceTable:
    class: lsst.pipe.tasks.postprocess.TransformSourceTableTask
    config:
    - connections.inputCatalog: preSource
      connections.outputCatalog: preSourceTable
subsets:
  processCcd:
    subset:
    - characterizeImage
    - isr
    - calibrate
    description: 'Set of tasks to run when doing single frame processing,
      without any conversions to Parquet/DataFrames or visit-level summaries. '
  step1:
    subset:
    - writePreSourceTable
    - calibrate
    - transformPreSourceTable
    - characterizeImage
    - isr
    description: |
      Per-detector tasks that can be run together to start the DRP pipeline.
      These should never be run with 'tract' or 'patch' as part of the data ID
      expression if any later steps will also be run, because downstream steps
      require full visits and 'tract' and 'patch' constraints will always
      select partial visits that overlap that region.
Merian step 1 graph (at the time of writing)

To generate the pipeline graph:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step1 \
    --pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step1.pdf

which gives:

pipeline_step1.pdf

Run step 1:

DATAQUERY="exposure.day_obs > 20210101 AND exposure.day_obs < 20220630 AND exposure.observation_type='science' AND detector NOT IN (31,61)"
LOGFILE=$LOGDIR/merian9813_step1.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
    -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
    -i $INPUT \
    -o $OUTPUT \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step1 \
    -d "instrument='DECam' AND $DATAQUERY" \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Note: the runtime for this 12000-quanta run was ~4 hours. As is common in DECam data processing, detectors 31 and 61 have been excluded from the reduction owing to known issues with these CCDs. Instead of selecting by date as above, a wide range of data selectors may be used to identify specific raw data to be processed; for example, "exposure IN $SCIEXPS" would only process the exposures defined in the SCIEXPS variable.
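
With step 1 complete, individual calexps can be retrieved and inspected in Python; a sketch (assuming, as is usual for DECam, that visit IDs map one-to-one from exposure IDs):

import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
calexp = butler.get("calexp", instrument="DECam", visit=971666, detector=1,
                    collections="DECam/runs/merian9813/w_2022_26")
# PSF determinant radius (pixels) at the mean PSF position
psf = calexp.getPsf()
print(psf.computeShape(psf.getAveragePosition()).getDeterminantRadius())
# magnitude corresponding to an instrumental flux of 1000 counts
print(calexp.getPhotoCalib().instFluxToMagnitude(1000.0))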

5.2a. Step 2a - initial visit aggregation

Initial visit aggregation takes place in step 2a, producing visit-wide preliminary source tables and visit summaries.

Merian step 2a YAML (at the time of writing)

To generate the pipeline YAML:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2a \
    --show pipeline

which gives:

description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  consolidateVisitSummary:
    class: lsst.pipe.tasks.postprocess.ConsolidateVisitSummaryTask
  consolidatePreSourceTable:
    class: lsst.pipe.tasks.postprocess.ConsolidateSourceTableTask
    config:
    - connections.inputCatalogs: preSourceTable
      connections.outputCatalog: preSourceTable_visit
subsets:
  step2a:
    subset:
    - consolidateVisitSummary
    - consolidatePreSourceTable
    description: |
      Visit-level tasks
      Allowed data query constraints: visit
      Tasks aggregate all detectors for a given visit.
Merian step 2a graph (at the time of writing)

To generate the pipeline graph:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2a \
    --pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2a.pdf

which gives:

pipeline_step2a.pdf

Run step 2a:

LOGFILE=$LOGDIR/merian9813_step2a.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
    -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
    -i $INPUT \
    -o $OUTPUT \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2a \
    -d "instrument='DECam' AND $DATAQUERY" \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Note: the runtime for this 80 quanta run was ~3 minutes.
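
The consolidated visitSummary can then be examined directly; a sketch (column names such as psfSigma and zeroPoint reflect the usual visit summary schema, but check the table itself for the exact set):

import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
vs = butler.get("visitSummary", instrument="DECam", visit=971666,
                collections="DECam/runs/merian9813/w_2022_26")
tab = vs.asAstropy()   # one row per detector in the visit
print(tab[["id", "psfSigma", "zeroPoint"]][:5])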

5.2b. Step 2b - Photometric and astrometric calibration

Photometric and astrometric calibration take place at the tract level in step 2b.

Merian step 2b YAML (at the time of writing)

To generate the pipeline YAML:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2b \
    --show pipeline

which gives:

description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  isolatedStarAssociation:
    class: lsst.pipe.tasks.isolatedStarAssociation.IsolatedStarAssociationTask
    config:
    - python: config.band_order += ["N708", "N540"]
  jointcal:
    class: lsst.jointcal.JointcalTask
    config:
    - connections.inputSourceTableVisit: preSourceTable_visit
subsets:
  step2b:
    subset:
    - isolatedStarAssociation
    - jointcal
    description: |
      Tract-level tasks
      Allowed data query constraints: tract
      Jointcal and isolatedStarAssociation both use PreSources, generated by
      consolidatePreSourceTable, for all visits that overlap a tract.
      jointcal produces solutions per-tract, per-visit
      isolatedStarAssociation produces solutions per-tract.
Merian step 2b graph (at the time of writing)

To generate the pipeline graph:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2b \
    --pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2b.pdf

which gives:

pipeline_step2b.pdf

Run step 2b:

LOGFILE=$LOGDIR/merian9813_step2b.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
    -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
    -i $INPUT \
    -o $OUTPUT \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2b \
    -d "instrument='DECam' AND skymap='decam_rings_v1' AND tract=9813" \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

5.2c. Step 2c - global collection summaries (DEPRECATED IN 2023)

Step 2c no longer exists in the current (2023+) Merian/DECam data reduction pipeline. The notes in this section below have been preserved as-is, but should not be referenced for future data reduction efforts.

Global per-collection summaries of visits and detectors are generated in step 2c.

Merian step 2c YAML (at the time of writing)

To generate the pipeline YAML:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2c \
    --show pipeline

which gives:

description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  makeCcdVisitTable:
    class: lsst.pipe.tasks.postprocess.MakeCcdVisitTableTask
  makeVisitTable:
    class: lsst.pipe.tasks.postprocess.MakeVisitTableTask
subsets:
  step2c:
    subset:
    - makeVisitTable
    - makeCcdVisitTable
    description: |
      Global-level tasks that must not be run with any data query constraints
      Can be run anytime after subset step2a.
      Allowed data query constraints: instrument
      Tasks generate one data product per collection.
      make[Ccd]VisitTable produces per-collection summary of the Visits and
      CcdVisits.
Merian step 2c graph (at the time of writing)

To generate the pipeline graph:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2c \
    --pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2c.pdf

which gives:

pipeline_step2c.pdf

Run step 2c:

LOGFILE=$LOGDIR/merian9813_step2c.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
    -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
    -i $INPUT \
    -o $OUTPUT \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2c \
    -d "instrument='DECam'" \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

5.2d. Step 2d - final source table generation

Generation of final source tables with full calibrations applied takes place in step 2d.

Merian step 2d YAML (at the time of writing)

To generate the pipeline YAML:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2d \
    --show pipeline

which gives:

description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  transformSourceTable:
    class: lsst.pipe.tasks.postprocess.TransformSourceTableTask
  consolidateSourceTable:
    class: lsst.pipe.tasks.postprocess.ConsolidateSourceTableTask
  finalizeCharacterization:
    class: lsst.pipe.tasks.finalizeCharacterization.FinalizeCharacterizationTask
  writeRecalibratedSourceTable:
    class: lsst.pipe.tasks.postprocess.WriteRecalibratedSourceTableTask
    config:
    - useGlobalExternalPhotoCalib: false
      connections.photoCalibName: jointcal
      connections.outputCatalog: source
subsets:
  step2d:
    subset:
    - writeRecalibratedSourceTable
    - finalizeCharacterization
    - consolidateSourceTable
    - transformSourceTable
    description: |
      Visit-level tasks.
      Allowed data query constraints: visit
      writeRecalibratedSourceTable, transformSourceTable run per-detector
      consolidateSourceTable produces one data product per visit.
      finalizeCharacterization will eventually model full focal plane PSFs.
Merian step 2d graph (at the time of writing)

To generate the pipeline graph:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2d \
    --pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2d.pdf

which gives:

pipeline_step2d.pdf

Run step 2d:

LOGFILE=$LOGDIR/merian9813_step2d.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
    -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
    -i $INPUT \
    -o $OUTPUT \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2d \
    -d "instrument='DECam' AND $DATAQUERY" \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
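
The per-visit recalibrated source tables produced by consolidateSourceTable come back from the butler as pandas DataFrames; a minimal sketch:

import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
src = butler.get("sourceTable_visit", instrument="DECam", visit=971666,
                 collections="DECam/runs/merian9813/w_2022_26")
print(f"{len(src)} sources in visit 971666")
print(list(src.columns[:10]))  # first few column names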

5.2e. Step 2e - make final visit and CCD visit tables (NEW IN 2023)

Generation of final visit tables and CCD visit tables takes place in step 2e.

Merian step 2e YAML (at the time of writing)

To generate the pipeline YAML:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2e \
    --show pipeline

which gives:

description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  makeCcdVisitTable:
    class: lsst.pipe.tasks.postprocess.MakeCcdVisitTableTask
  makeVisitTable:
    class: lsst.pipe.tasks.postprocess.MakeVisitTableTask
subsets:
  step2e:
    subset:
    - makeVisitTable
    - makeCcdVisitTable
    description: |
      Global-level tasks that must not be run with any data query constraints
      Can be run anytime after subset step2d.
      Allowed data query constraints: instrument
      Tasks generate one data product per collection.
      make[Ccd]VisitTable produces per-collection summary of the Visits and
      CcdVisits.
Merian step 2e graph (at the time of writing)

To generate the pipeline graph:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2e \
    --pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2e.pdf

which gives:

pipeline_step2e.pdf

Run step 2e:

LOGFILE=$LOGDIR/merian9813_step2e.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
    -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
    -i $INPUT \
    -o $OUTPUT \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2e \
    -d "instrument='DECam' AND $DATAQUERY" \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

5.3. Step 3 - coadd processing

Coadd processing takes place in step 3. A large number of tasks are performed during this step, including, but not limited to: warping visit-level images, coadd assembly, source detection, deblending, measurement, catalogue merging, forced photometry, and object table generation.

Merian step 3 YAML (at the time of writing)

To generate the pipeline YAML:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step3 \
    --show pipeline

which gives:

description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  makeWarp:
    class: lsst.pipe.tasks.makeCoaddTempExp.MakeWarpTask
    config:
    - makePsfMatched: true
    - python: |
        config.warpAndPsfMatch.psfMatch.kernel['AL'].alardSigGauss = [1.0, 2.0, 4.5]
        from lsst.pipe.tasks.selectImages import PsfWcsSelectImagesTask
        config.select.retarget(PsfWcsSelectImagesTask)
      connections.photoCalibName: jointcal
      matchingKernelSize: 29
      doApplyExternalPhotoCalib: true
      externalPhotoCalibName: jointcal
      doApplyExternalSkyWcs: true
      modelPsf.defaultFwhm: 7.7
      warpAndPsfMatch.warp.warpingKernelName: lanczos5
      coaddPsf.warpingKernelName: lanczos5
      useGlobalExternalPhotoCalib: false
      doWriteEmptyWarps: true
  assembleCoadd:
    class: lsst.pipe.tasks.assembleCoadd.CompareWarpAssembleCoaddTask
    config:
    - doInputMap: true
    - python: |
        config.removeMaskPlanes.append("CROSSTALK")
        config.badMaskPlanes += ["SUSPECT"]
        from lsst.pipe.tasks.selectImages import PsfWcsSelectImagesTask
        config.select.retarget(PsfWcsSelectImagesTask)
      matchingKernelSize: 29
      doApplyExternalPhotoCalib: true
      externalPhotoCalibName: jointcal
      doApplyExternalSkyWcs: true
      subregionSize: (10000, 200)
      doNImage: true
      interpImage.transpose: true
      coaddPsf.warpingKernelName: lanczos5
      assembleStaticSkyModel.subregionSize: (10000, 200)
      assembleStaticSkyModel.doApplyExternalPhotoCalib: true
      assembleStaticSkyModel.externalPhotoCalibName: jointcal
      assembleStaticSkyModel.doApplyExternalSkyWcs: true
      doFilterMorphological: true
      useGlobalExternalPhotoCalib: false
      assembleStaticSkyModel.useGlobalExternalPhotoCalib: false
      doAttachTransmissionCurve: false
  detection:
    class: lsst.pipe.tasks.multiBand.DetectCoaddSourcesTask
  mergeDetections:
    class: lsst.pipe.tasks.mergeDetections.MergeDetectionsTask
  deblend:
    class: lsst.pipe.tasks.deblendCoaddSourcesPipeline.DeblendCoaddSourcesMultiTask
  measure:
    class: lsst.pipe.tasks.multiBand.MeasureMergedCoaddSourcesTask
  mergeMeasurements:
    class: lsst.pipe.tasks.mergeMeasurements.MergeMeasurementsTask
  writeObjectTable:
    class: lsst.pipe.tasks.postprocess.WriteObjectTableTask
  transformObjectTable:
    class: lsst.pipe.tasks.postprocess.TransformObjectCatalogTask
  consolidateObjectTable:
    class: lsst.pipe.tasks.postprocess.ConsolidateObjectTableTask
  forcedPhotCoadd:
    class: lsst.meas.base.forcedPhotCoadd.ForcedPhotCoaddTask
  selectGoodSeeingVisits:
    class: lsst.pipe.tasks.selectImages.BestSeeingQuantileSelectVisitsTask
    config:
    - connections.goodVisits: goodSeeingVisits
  templateGen:
    class: lsst.pipe.tasks.assembleCoadd.CompareWarpAssembleCoaddTask
    config:
    - doSelectVisits: true
      assembleStaticSkyModel.doSelectVisits: true
      connections.selectedVisits: goodSeeingVisits
      connections.outputCoaddName: goodSeeing
      connections.coaddExposure: goodSeeingCoadd
    - python: |
        config.removeMaskPlanes.append("CROSSTALK")
        config.badMaskPlanes += ["SUSPECT"]
        from lsst.pipe.tasks.selectImages import PsfWcsSelectImagesTask
        config.select.retarget(PsfWcsSelectImagesTask)
      matchingKernelSize: 29
      doApplyExternalPhotoCalib: true
      externalPhotoCalibName: jointcal
      doApplyExternalSkyWcs: true
      subregionSize: (10000, 200)
      doNImage: true
      interpImage.transpose: true
      coaddPsf.warpingKernelName: lanczos5
      assembleStaticSkyModel.subregionSize: (10000, 200)
      assembleStaticSkyModel.doApplyExternalPhotoCalib: true
      assembleStaticSkyModel.externalPhotoCalibName: jointcal
      assembleStaticSkyModel.doApplyExternalSkyWcs: true
      doFilterMorphological: true
      useGlobalExternalPhotoCalib: false
      assembleStaticSkyModel.useGlobalExternalPhotoCalib: false
      doAttachTransmissionCurve: false
contracts:
- contract: '''calib_psf_candidate'' not in measure.propagateFlags.source_flags if makeWarp.doApplyFinalizedPsf else True'
- contract: '''calib_psf_reserved'' not in measure.propagateFlags.source_flags if makeWarp.doApplyFinalizedPsf else True'
- contract: '''calib_psf_used'' not in measure.propagateFlags.source_flags if makeWarp.doApplyFinalizedPsf else True'
- contract: selectGoodSeeingVisits.connections.goodVisits == templateGen.connections.selectedVisits
subsets:
  multiband:
    subset:
    - detection
    - mergeDetections
    - deblend
    - measure
    - mergeMeasurements
    - forcedPhotCoadd
    description: 'A set of tasks to run when making measurements on coadds. '
  objectTable:
    subset:
    - consolidateObjectTable
    - writeObjectTable
    - transformObjectTable
    description: 'A set of tasks to transform multiband outputs into a parquet
      object table. '
  step3:
    subset:
    - consolidateObjectTable
    - mergeMeasurements
    - detection
    - mergeDetections
    - selectGoodSeeingVisits
    - writeObjectTable
    - deblend
    - templateGen
    - assembleCoadd
    - transformObjectTable
    - measure
    - makeWarp
    - forcedPhotCoadd
    description: |
      Tract-level tasks that can be run together, but only after the 'step1'
      and 'step2' subsets.
      These should be run with explicit 'tract' constraints essentially all
      the time, because otherwise quanta will be created for jobs with only
      partial visit coverage.

To generate the pipeline graph:

pipetask build \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step3 \
    --pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step3.pdf

which gives:

pipeline_step3.pdf

Run step 3:

LOGFILE=$LOGDIR/merian9813_step3.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
    -b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
    -i $INPUT \
    -o $OUTPUT \
    -p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step3 \
    -d "instrument='DECam' AND skymap='decam_rings_v1' AND tract=9813" \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
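Once step 3 has completed, a quick sanity check is to confirm that tract-level outputs exist in the output collection. Below is a minimal sketch in Python; the repository path and output collection name (DECam/runs/merian9813/w_2022_26) are assumptions, so substitute your own $REPO and $OUTPUT values:

import lsst.daf.butler as dafButler

butler = dafButler.Butler('/projects/MERIAN/repo')
# One objectTable_tract dataset per tract is expected; an empty result
# suggests that step 3 did not complete successfully.
refs = set(butler.registry.queryDatasets(
    "objectTable_tract",
    collections="DECam/runs/merian9813/w_2022_26",
    where="instrument='DECam' AND skymap='decam_rings_v1' AND tract=9813",
))
print(f"Found {len(refs)} objectTable_tract dataset(s).")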

A. Useful commands

This section collects useful butler command-line invocations for interacting with the data.

query-collections

Query collections in the repo:

butler query-collections $REPO "*u/lskelvin*"

The final search pattern may use standard glob syntax (e.g., note the asterisks above).
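The same query can also be issued through the Butler Python API. A minimal sketch, assuming the repository path used elsewhere in this note:

import lsst.daf.butler as dafButler

butler = dafButler.Butler('/projects/MERIAN/repo')
# Glob-style patterns are accepted here, as on the command line.
for name in butler.registry.queryCollections("*u/lskelvin*"):
    print(name)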

query-datasets

Query the datasets which live in a given collection:

butler query-datasets $REPO \
    --collections u/lskelvin/testrun/01 \
    --where "instrument='DECam' AND skymap='hsc_rings_v1' AND tract=9813" \
    calexp

If the final dataset type (calexp in the example above) is not given, all dataset types found will be printed to the command line.
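The equivalent query from Python, as a minimal sketch (repository path and collection name follow the examples above):

import lsst.daf.butler as dafButler

butler = dafButler.Butler('/projects/MERIAN/repo')
refs = butler.registry.queryDatasets(
    "calexp",
    collections="u/lskelvin/testrun/01",
    where="instrument='DECam' AND skymap='hsc_rings_v1' AND tract=9813",
)
for ref in refs:
    print(ref.dataId)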

collection-chain

Redefine a CHAINED collection to only contain certain child RUN collections:

butler collection-chain $REPO PARENT "CHILD1,CHILD2"

This command is useful prior to deleting a CHAINED collection, as it ensures that no attempt is made to delete input raw collections.
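The chain may also be inspected and redefined from Python via the registry. A minimal sketch, with placeholder collection names:

import lsst.daf.butler as dafButler

butler = dafButler.Butler('/projects/MERIAN/repo', writeable=True)
# Inspect the current definition of the chain...
print(list(butler.registry.getCollectionChain("PARENT")))
# ...and redefine it to contain only the desired children.
butler.registry.setCollectionChain("PARENT", ["CHILD1", "CHILD2"])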

remove-runs

Remove one or more RUN collections:

butler remove-runs $REPO COLLECTION

remove-collections

Remove one or more non-RUN collections:

butler remove-collections $REPO COLLECTION
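Whether remove-runs or remove-collections is the appropriate command depends on the collection type, which can be checked from Python first. A minimal sketch, with a placeholder collection name:

import lsst.daf.butler as dafButler

butler = dafButler.Butler('/projects/MERIAN/repo')
# Returns a CollectionType enum member, e.g., RUN, CHAINED, TAGGED, or CALIBRATION.
print(butler.registry.getCollectionType("u/lskelvin/testrun/01"))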

B. What tracts cover my data?

The visitSummary tables produced in step 2a contain important information on single frame processed visits. This information may be used to find out which tracts overlap with your data.

To generate a list of tract overlaps for a single visit, in Python:

from collections import defaultdict
import lsst.daf.butler as dafButler

butler = dafButler.Butler('/project/lskelvin/repo')
grouped_by_tract = defaultdict(set)
for data_id in butler.registry.queryDataIds(
    ["tract", "visit", "detector"],
    datasets="visitSummary",
    collections="DECam/runs/merian9813/w_2022_26",
    instrument="DECam",
    visit=971666,
):
    grouped_by_tract[data_id["tract"]].add(data_id)
print({k: len(v) for k, v in grouped_by_tract.items()})

To get total tract coverage for all visits in a given collection, remove the visit= argument above.
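For example, a minimal sketch of the all-visit version, which counts how many (visit, detector) combinations touch each tract:

from collections import defaultdict
import lsst.daf.butler as dafButler

butler = dafButler.Butler('/project/lskelvin/repo')
grouped_by_tract = defaultdict(set)
for data_id in butler.registry.queryDataIds(
    ["tract", "visit", "detector"],
    datasets="visitSummary",
    collections="DECam/runs/merian9813/w_2022_26",
    instrument="DECam",
):
    grouped_by_tract[data_id["tract"]].add((data_id["visit"], data_id["detector"]))
print({tract: len(v) for tract, v in sorted(grouped_by_tract.items())})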

C. Transferring datasets from one machine to another

To transfer datasets from one machine to another (e.g., from SLAC to Princeton), first run the following in Python on the source machine:

import lsst.daf.butler as dafButler

butler = dafButler.Butler('/path/to/source/repo')  # repo on the source machine

outdir = "/sdf/data/rubin/u/lskelvin/merian"
datasetType = ["objectTable_tract", "deepCoadd", "deepCoadd_calexp"]
collection = "HSC/runs/RC2/w_2022_04/DM-33402"
dataId = dict(skymap="hsc_rings_v1", tract=9813)

with butler.export(directory=outdir, format="yaml", transfer="copy") as export:
    items = []
    found = set(butler.registry.queryDatasets(datasetType,
                                              collections=collection,
                                              dataId=dataId))
    items.extend(found)
    export.saveDatasets(items)

Next, in the output directory on the source machine:

tar -czvf data_transfer.tar.gz *

Transfer the file (here named data_transfer.tar.gz) from the source machine to the destination machine. Extract the tarball on the destination machine:

tar -xzvf data_transfer.tar.gz

Next, on the destination machine:

LOGFILE=$LOGDIR/data_import.log; \
butler import $REPO \
    /path/to/data_transfer_directory \
    --transfer copy \
    --skip-dimensions skymap,tract,patch \
    2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE

Finally, set up a similarly named parent collection, e.g.:

PARENT=HSC/runs/RC2/w_2022_04/DM-33402
CHILD=HSC/runs/RC2/w_2022_04/DM-33402/20220128T212035Z
butler collection-chain $REPO $PARENT $CHILD
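To confirm that the import succeeded, the transferred datasets can be queried in the destination repository. A minimal sketch, assuming the dataset types and collection from the export example above:

import lsst.daf.butler as dafButler

butler = dafButler.Butler('/projects/MERIAN/repo')
refs = set(butler.registry.queryDatasets(
    ["objectTable_tract", "deepCoadd", "deepCoadd_calexp"],
    collections="HSC/runs/RC2/w_2022_04/DM-33402",
    skymap="hsc_rings_v1",
    tract=9813,
))
print(f"{len(refs)} dataset(s) available in the destination repo.")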

D. Decertifying a calibration dataset

To decertify a calibration collection (e.g., because a new calibration collection has been generated and is intended to replace the existing certified data on disk):

import lsst.daf.butler as dafButler

writeable_butler = dafButler.Butler(
    '/projects/MERIAN/repo',
    writeable=True,
)
writeable_butler.registry.decertify(
    collection='DECam/calib/merian',
    datasetType='bias',
    timespan=dafButler.Timespan(None, None),
)
writeable_butler.registry.decertify(
    collection='DECam/calib/merian',
    datasetType='flat',
    timespan=dafButler.Timespan(None, None),
)
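After decertifying, the replacement calibrations can be certified into the same collection using Registry.certify. A minimal sketch; the RUN collection holding the new biases (u/lskelvin/newBias/run) is hypothetical:

import lsst.daf.butler as dafButler

writeable_butler = dafButler.Butler('/projects/MERIAN/repo', writeable=True)
# Hypothetical RUN collection containing the newly built bias frames.
new_biases = set(writeable_butler.registry.queryDatasets(
    'bias', collections='u/lskelvin/newBias/run',
))
# Certify the new biases into the calibration collection with an
# unbounded validity range.
writeable_butler.registry.certify(
    'DECam/calib/merian', new_biases, dafButler.Timespan(None, None),
)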