The Merian Survey is an ambitious program designed to explore the nature of dark matter, star formation, and feedback in dwarf galaxies. Merian will use 62 nights on the 4m Blanco telescope in Chile, observing with the Dark Energy Camera (DECam). A total of 800 square degrees of the sky will be imaged in two custom-made medium-band filters to create a sample of 100,000 star-forming dwarf galaxies (with 90% completeness) in the redshift range 0.058 < z < 0.10. Merian will cover the HSC SSP Wide field, which provides gravitational lensing capabilities to probe the dark matter component of dwarfs (Leauthaud et al. 2020).
This note summarizes the ingestion and data reduction process for Merian data, initially observed in Feb/March 2021, using the Rubin Observatory LSST Science Pipelines.
There are three primary DECam dataset types:
object (filters: N540 DECam c0014 5403.2 210.0 and N708 DECam c0012 7080.0 400.0): tiger2-sumire:/projects/MERIAN/raw/*/object
zero: tiger2-sumire:/projects/MERIAN/raw/*/zero
dome flat: tiger2-sumire:/projects/MERIAN/raw/*/domeflat
The DECam Focal Plane; figure from Diehl et al. 2018. DECam focal plane showing the 62 2k x 4k CCDs, 8 2k x 2k CCDs (labeled "F") for the adaptive optics system, and 4 2k x 2k CCDs (labeled "G") for guiding. The orientation of the sky is indicated. The black label (e.g., S30) indicates a position on the focal plane. The green label (e.g., 2) indicates the number of the CCD as is in the multi-extension FITS header. When the focal plane is viewed with the real-time display at the telescope or with default SAOImage DS9 settings, the direction labeled "north" is displayed to the left and "east" at the top. The background colors of the CCDs indicate the electronics backplane that reads them out.
First, the LSST Science Pipelines ("the stack") needs to be set up on the local machine:
source "/projects/HSC/LSST/stack/loadLSST.sh"
This will set up the most recent Rubin environment installed on the machine. A list of other installed Rubin environments can be shown using mamba:
mamba env list
Older Rubin environments contain older builds of the Science Pipelines. To switch to an older build, preface the shell source command above with the appropriate LSST_CONDA_ENV_NAME variable:
LSST_CONDA_ENV_NAME=lsst-scipipe-4.0.0; \
source "/projects/HSC/LSST/stack/loadLSST.sh"
The most recently installed version can again be loaded by unsetting this variable:
unset LSST_CONDA_ENV_NAME; \
source "/projects/HSC/LSST/stack/loadLSST.sh"
Once the Science Pipelines environment has been loaded, set up the main LSST software package, lsst_distrib, using setup:
setup lsst_distrib
It's possible to set up a specific tagged version of lsst_distrib using -t:
setup lsst_distrib -t w_2022_26
Check which version of lsst_distrib is in use with eups:
eups list lsst_distrib | grep setup
# g0b29ad24fb+e8b8cae3ca current w_2022_26 setup
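When scripting a reduction, it can be handy to record the stack version in a log. The following is a minimal sketch, assuming the output line has the whitespace-separated layout shown above (version first, then tags); the helper name parse_eups_line is hypothetical and not part of eups.

```python
def parse_eups_line(line):
    """Split one line of `eups list <product>` output into version and tags.

    Assumes the layout shown in this note: the version string first,
    followed by zero or more whitespace-separated tags.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty eups output line")
    version, tags = fields[0], fields[1:]
    return {"version": version, "tags": tags, "is_setup": "setup" in tags}


# Example using the output line quoted in this note.
info = parse_eups_line("g0b29ad24fb+e8b8cae3ca current w_2022_26 setup")
print(info["version"], info["tags"], info["is_setup"])
```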
This step is only required if the data to be ingested use a filter which is not already defined. Before raw science frames can be ingested, all filters in use need to be defined in the relevant obs_ package (update: and also in the skymap repo; see the end of this section for further details). Here, the relevant package is obs_decam, and the filters file is located at obs_decam/python/lsst/obs/decam/decamFilters.py.
For this example, the required observation filter (N708 DECam c0012 7080.0 400.0) was not previously defined and had to be added manually. This modification has now been merged into the main branch, but the instructions are maintained here for reference. First, git clone the obs_decam package into a local directory:
OBSDECAM=/home/lkelvin/repos/obs_decam
git clone git@github.com:lsst/obs_decam.git $OBSDECAM
cd $OBSDECAM
If this is the first time the package has been cloned, it will also need to be built using scons (as with all Science Pipelines packages), e.g.:
scons -j8
Next, check out a user branch from the main branch to work on:
git checkout -b u/lskelvin/merian
Now add the relevant filter definition. In this case:
FilterDefinition(physical_filter="N708 DECam c0012 7080.0 400.0",
band="N708",
lambdaEff=708),
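The lambdaEff value above can be cross-checked against the physical filter name. The following is a minimal sketch, assuming (this is not stated in this note) that the last two fields of the DECam physical filter name give the central wavelength and width in Angstroms; the helper parse_decam_filter is hypothetical.

```python
def parse_decam_filter(physical_filter):
    """Split a DECam physical filter name into its parts.

    Assumption: the five whitespace-separated fields are band name,
    instrument, serial code, central wavelength [Angstrom], width [Angstrom].
    """
    band, instrument, serial, center, width = physical_filter.split()
    return {
        "band": band,
        "instrument": instrument,
        "serial": serial,
        "center_nm": float(center) / 10.0,  # Angstrom -> nm
        "width_nm": float(width) / 10.0,    # Angstrom -> nm
    }


n708 = parse_decam_filter("N708 DECam c0012 7080.0 400.0")
print(n708["band"], n708["center_nm"], n708["width_nm"])
```

Under that assumption, the central wavelength of 708 nm agrees with the lambdaEff=708 entry in the filter definition.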
Finally, make sure both lsst_distrib and the relevant obs_ package (obs_decam here) are set up in the working shell:
setup -j -r $OBSDECAM
Double check that the local package has been loaded using:
eups list | grep LOCAL
# obs_decam LOCAL:/home/lkelvin/repos/obs_decam
Once complete, subsequent processing should be able to proceed. If a warning similar to "ingest WARN: Exposure DECam:ct4m20210318t032843 could not be registered: (sqlite3.IntegrityError) FOREIGN KEY constraint failed" is returned, check that all filters are correctly assigned in the filters file.
Finally, refObjLoader lookups for the new filter need to be added to a number of obs_decam config files to facilitate astrometric matching. This allows data processing to proceed beyond characterizeImage, i.e., the final step required to produce a calexp.
Here, we map the new N708 filter onto the existing i-band filter (the nearest broad-band filter in wavelength) by adding lines similar to:
refObjLoader.filterMap['N708'] = 'i'
into:
config/characterizeImage.py
config/calibrate.py
config/measureCoaddSources.py
Note: Ticket DM-30692 added these additional config lines into the main branch.
New filters also need to be registered in the skymap repository. Central wavelengths for all required filters should be added to python/lsst/skymap/packers.py.
A new butler will be created in the directory /projects/MERIAN/repo. Here we set a shell variable for the output repository directory:
REPO=/projects/MERIAN/repo
mkdir -p $REPO
chmod ug+rw $REPO
Whilst optional, it may be desirable to also construct a log directory, for log files to be stored within:
LOGDIR=/projects/MERIAN/logs
mkdir -p $LOGDIR
chmod ug+rw $LOGDIR
If this repository will be used by more than one user, modify the permissions of the output repository directory to ensure that all files constructed below are writeable by all members of that user group:
cd $REPO
umask 2
Note: if changing permissions after the butler has been used, and if using an SQLite database (see below), you will also need to run chmod ug+rw gen3.sqlite3 to make the SQLite database readable and writable by all members of your group. You will also need to run chmod ug+rw u to make the user output directory (here named u) readable and writable by all members of your group.
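The effect of umask 2 on newly created files can be worked through with a little bit arithmetic. This is an illustrative sketch only; the helper names are hypothetical, and the requested modes 0o666 (files) and 0o777 (directories) are the usual POSIX defaults rather than anything specific to the butler.

```python
import stat


def mode_after_umask(requested, umask):
    """Return the permission bits left after the umask clears its bits."""
    return requested & ~umask


def group_can_write(mode):
    """True if the group write bit is set in `mode`."""
    return bool(mode & stat.S_IWGRP)


# umask 2: new files become rw-rw-r--, new directories rwxrwxr-x.
file_mode = mode_after_umask(0o666, 0o002)
dir_mode = mode_after_umask(0o777, 0o002)
print(oct(file_mode), oct(dir_mode), group_can_write(file_mode))

# For comparison, a umask of 022 removes group write access.
print(group_can_write(mode_after_umask(0o666, 0o022)))
```

Note that the umask only affects files created afterwards, which is why files that already exist need an explicit chmod.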
Next, an empty Gen3 Butler repository is created, and then the instrument is registered in the data repository. In this example, the instrument is the Dark Energy Camera (DECam).
There are two types of database that can be constructed for use with the butler: SQLite or PostgreSQL. The former is the default and is simpler to set up. The latter provides significantly improved data processing times, but requires a PostgreSQL database to have already been set up on the data processing machine in advance. Both methods are summarized in the subsections below:
On the command line, create a butler repo:
butler create $REPO
This constructs a butler.yaml file in the $REPO directory.
Note: after the gen3.sqlite3 file has been constructed, you may have to manually add write permissions for group members by running the command: chmod g+w gen3.sqlite3.
Before beginning to create a butler, a PostgreSQL database must first be set up on the primary data processing machine.
Once a PostgreSQL server has been set up, it may be necessary to enable the btree_gist extension. To do so, log in to the server:
psql -h localhost -U USERNAME DBNAME
and then create the extension:
CREATE EXTENSION "btree_gist";
If hitting a superuser permissions issue, it may be necessary to reach out to the server admins to create this extension on your behalf.
With the server set up, a number of extra Science Pipelines configuration files also need to be in place. First, construct the seed config file. It is recommended that this file is created at, for example, $REPO/seed-config.yaml. Assuming the database is named 'merian', the contents of this file should look like this:
datastore:
  root: <butlerRoot>
registry:
  db: postgresql+psycopg2://localhost:5432/merian
Note: this file needs to be constructed only once, for the purposes of creating the butler.
Second, a Science Pipelines authentication file needs to be created at ~/.lsst/db-auth.yaml. The contents of this file should look like this:
- url: postgresql://localhost:5432/merian
  username: merian
  password: MYSECRETPASSWORD
where MYSECRETPASSWORD is the database password for the merian user on the merian database (the database name and the username are the same in the example above, but need not be).
Note: each user who wishes to interact with the butler repository needs to place a copy of this authentication file within their own home space.
The authentication file must be readable only by the user's account (chmod 600 db-auth.yaml).
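The permission requirement can be checked programmatically before running any butler command. The sketch below is illustrative: is_owner_only is a hypothetical helper, and the check is demonstrated on a throwaway file in a temporary directory rather than on a real ~/.lsst/db-auth.yaml.

```python
import os
import stat
import tempfile


def is_owner_only(path):
    """Return True if neither group nor other has any access to `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Demonstrate on a throwaway file rather than a real db-auth.yaml.
with tempfile.TemporaryDirectory() as tmpdir:
    auth_file = os.path.join(tmpdir, "db-auth.yaml")
    with open(auth_file, "w") as handle:
        handle.write("- url: postgresql://localhost:5432/merian\n")
    os.chmod(auth_file, 0o644)
    before = is_owner_only(auth_file)  # group and other can still read
    os.chmod(auth_file, 0o600)
    after = is_owner_only(auth_file)   # owner read/write only

print(before, after)
```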
Finally, create the butler repo:
butler create --seed-config seed-config.yaml $REPO
This constructs a butler.yaml file in the $REPO directory.
Once the butler repo has been created, register the instrument:
butler register-instrument $REPO lsst.obs.decam.DarkEnergyCamera
The register-instrument command will need to be re-run (once only) every time a new filter is added to the filter definitions file.
Note: the instrument name here needs to be the fully qualified name of an instrument subclass. Full names can be inferred from their respective obs_ package at github.com/lsst. For this example, the relevant obs_ package is obs_decam and the fully qualified name is lsst.obs.decam.DarkEnergyCamera.
Finally, double check that all required filters are correctly registered with the butler:
butler query-dimension-records $REPO physical_filter
In this case, double check that N708 DECam c0012 7080.0 400.0 appears in the filter list.
The Science Pipelines require reference catalogues ("refcats") to accurately calibrate photometric and astrometric results. Two reference catalogues are required here: Gaia DR2 for astrometry, and Pan-STARRS PS1 for photometry. Further information is also available on the Community forum and on pipelines.lsst.io.
A number of different methods are available to ingest these catalogues. These steps are summarized below:
butler ingest-files
The first step in constructing these reference catalogues is to gather the catalogue data together and ingest the files. This process is decidedly non-trivial, and may require several hours to complete even on a high-powered machine.
If ingesting for the first time, all raw files can be downloaded at the link described in this comment on the Community forum.
Fortunately for our purposes, these downloaded FITS files already exist on the machine used here, and can be used directly:
GAIADR2=/projects/HSC/refcats/htm/gaia_dr2_20200414
PANSTARRSPS1=/projects/HSC/refcats/htm/ps1_pv3_3pi_20170110
Once the files are in place, we create astropy-readable .ecsv table files containing one row per input file in each reference catalogue. To construct these, in Python:
import os
import glob
import astropy.table

# output directory to save .ecsv files
outdir = "/home/lkelvin"

# full paths to LSST sharded reference catalogues
gaiadr2 = "/projects/HSC/refcats/htm/gaia_dr2_20200414"
panstarrsps1 = "/projects/HSC/refcats/htm/ps1_pv3_3pi_20170110"
refcat_dirs = [gaiadr2, panstarrsps1]

# loop over each FITS file in all refcats
# note: this constructs a series of .ecsv files, each containing two columns:
# 1) the FITS filename, and 2) the htm7 pixel index
for refcat_dir in refcat_dirs:
    outfile = f"{outdir}/{os.path.basename(refcat_dir)}.ecsv"
    print(f"Saving to: {outfile}")
    table = astropy.table.Table(names=("filename", "htm7"), dtype=("str", "int"))
    files = glob.glob(f"{refcat_dir}/[0-9]*.fits")
    for ii, file in enumerate(files):
        print(f"{ii}/{len(files)} ({100*ii/len(files):0.1f}%)", end="\r")
        # try/except to catch extra .fits files which may be in this dir
        try:
            file_index = int(os.path.basename(os.path.splitext(file)[0]))
        except ValueError:
            continue
        else:
            table.add_row((file, file_index))
    table.write(outfile)
Note: the above script took ~20 minutes to run on the tiger machine, ~10 minutes per reference catalogue.
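The filename handling inside the loop above can be isolated into a small helper, which makes it easy to test without touching the reference catalogues. This is a sketch using only the standard library; the function name shard_index and the example paths are hypothetical.

```python
import os


def shard_index(path):
    """Return the integer htm7 index encoded in a shard filename.

    Returns None for files whose stem is not an integer, mirroring the
    try/except in the script above that skips extra .fits files.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    try:
        return int(stem)
    except ValueError:
        return None


# Hypothetical example paths, for illustration only.
print(shard_index("/some/refcat/131072.fits"))
print(shard_index("/some/refcat/1_extra_notes.fits"))
```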
A .ecsv file should now exist for each reference catalogue. Next, register the dataset types for each reference catalogue with the butler:
butler register-dataset-type $REPO gaia_dr2_20200414 SimpleCatalog htm7
butler register-dataset-type $REPO ps1_pv3_3pi_20170110 SimpleCatalog htm7
Check that both the Gaia DR2 and Pan-STARRS PS1 dataset types are now available using:
butler query-dataset-types $REPO
Finally, ingest the LSST-formatted files into the refcats/gen2 RUN collection in the repository:
butler ingest-files -t link $REPO gaia_dr2_20200414 refcats/gen2 gaia_dr2_20200414.ecsv
butler ingest-files -t link $REPO ps1_pv3_3pi_20170110 refcats/gen2 ps1_pv3_3pi_20170110.ecsv
convert_refcats.py
When these instructions were first written, it was not possible to convert these refcats directly into a gen3 repo (as we're setting up here). The instructions below are preserved for reference, but should no longer be required (assuming the above gen3 approach suffices). Here we use butler convert to map these data into our gen3 repo.
First, set up an empty temporary gen2 butler directory, and link the two reference datasets into this directory within a folder named ref_cats:
GEN2REPO=/projects/MERIAN/repo_gen2
mkdir -p $GEN2REPO/ref_cats
echo "lsst.obs.decam.DecamMapper" > $GEN2REPO/_mapper
ln -s $GAIADR2 $GEN2REPO/ref_cats/
ln -s $PANSTARRSPS1 $GEN2REPO/ref_cats/
Next, a simple config file is required:
echo 'config.refCats.append("gaia_dr2_20200414")
config.runs["gaia_dr2_20200414"] = "refcats"
config.refCats.append("ps1_pv3_3pi_20170110")
config.runs["ps1_pv3_3pi_20170110"] = "refcats"
config.doRegisterInstrument = False' \
> $GEN2REPO/convert_refcats.py
Finally, we need to run the Science Pipelines refcat conversion script rootRepoConverter.py.
Note: at the time of writing, a modified version of rootRepoConverter.py was required to complete the butler convert command below without raising errors. Specifically, L137-L140 (from "if not self.task.dry_run:" to "else:") were commented out, L141 (self.task.log.info...) was unindented, and L208-L226 (from "if self._refCats:" to "self._chain.append(chained)") were commented out. This approach is not recommended for general use. This script also adds curated calibrations, negating the relevant section below.
To set up this package, run setup -j -r ~/repos/obs_base and either switch to a branch with the changes listed above in place, or make the changes temporarily to the code in place.
Once the command below has completed successfully, any changes made to the master branch can be undone by running git reset --hard origin/master on the command line.
Ticket DM-30624 now makes ingesting refcats from a gen2 repo easier.
To run the butler conversion script:
LOGFILE=$LOGDIR/convert_refcats.log; \
date | tee $LOGFILE; \
butler convert $REPO \
--gen2root $GEN2REPO \
-C $GEN2REPO/convert_refcats.py \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Note: the runtime for this command was ~20 minutes.
butler transfer-datasets
If a butler already exists on the machine, the refcat datasets can be transferred over from the existing butler directly:
LOGFILE=$LOGDIR/transfer_refcats.log; \
date | tee $LOGFILE; \
butler transfer-datasets --register-dataset-types \
-t copy --collections refcats/gen2 \
/projects/MERIAN/repo $REPO \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Note: on the lsst-devl machine at NCSA, the runtime for this command was ~60 minutes. A number of transfer types are available, such as copy to make a copy of the files, or link to make only a symbolic link.
Once the refcats are in place, their collection can be confirmed:
butler query-collections $REPO "refcats*"
Following a dataset transfer, it may be necessary to set up the parent refcats CHAINED collection, which contains a comma-separated list of all of the transferred child refcats/... RUN collections:
PARENT=refcats; \
CHILDREN=refcats/gen2; \
butler collection-chain $REPO $PARENT $CHILDREN
Make a sky map and add it to the repository. Sky maps exist as dimensions, datasets and collections. Until ticket DM-34516 was merged, DECam data reductions made use of the HSC sky map, hsc_rings_v1. DECam data reductions now instead make use of a DECam-specific sky map, maintaining the native pixel scale of DECam imaging. By default, DECam tracts have the same centroids as their HSC counterparts, albeit with fewer patches per tract. An example DECam tract is shown below:
DECam Tract 9704 as defined using the decam_rings_v1 sky map. DECam tracts have the same centroids as HSC tracts (as defined by the hsc_rings_v1 sky map). Each DECam tract is further sub-divided into 6x6 patches, each of ~4k pixels on a side.
This command registers the DECam skyMap dataset using the default obs_decam configuration, setting the output name to the de facto standard decam_rings_v1:
LOGFILE=$LOGDIR/register_decam_rings_v1.log; \
date | tee $LOGFILE; \
butler register-skymap $REPO \
-C $OBS_DECAM_DIR/config/makeSkyMap.py \
-c name='decam_rings_v1' \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Note: the runtime for this command was ~5 minutes.
If the HSC skymap is also required, it may be generated as well:
LOGFILE=$LOGDIR/register_hsc_rings_v1.log; \
date | tee $LOGFILE; \
butler register-skymap $REPO \
-C $OBS_SUBARU_DIR/config/makeSkyMap.py \
-c name='hsc_rings_v1' \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Check that all required skyMap datasets now exist in the skymaps RUN collection in the Merian repo:
butler query-datasets $REPO skyMap
Note: this sky map step doesn't need to be performed until much later. However, if any errors occur during registration of the sky map, it may be necessary to delete the repo and start afresh. For this reason, it's usually better to perform this step as soon as possible.
Curated calibrations are collections of calibration data which describe various aspects of the camera and survey. If setting up a new butler on a new machine, an instrument's curated calibrations will need to be added to the data repository:
butler write-curated-calibrations $REPO lsst.obs.decam.DarkEnergyCamera
Note: if the modified version of rootRepoConverter.py was used above, the dataset types added by this command may already have been ingested into the repo. If so, running the above command will fail with an error similar to: A database constraint failure was triggered by inserting one or more datasets of type DatasetType('camera', {instrument}, Camera, isCalibration=True) into collection 'DECam/calib/unbounded'. This probably means a dataset with the same data ID and dataset type already exists, but it may also mean a dimension row is missing.
The instrument may be specified via either the fully qualified name, as above, or the short name (DECam here). The -h help text indicates the former is required, but this advice may change in the future.
This currently adds camera, crosstalk, defects and linearizer dataset types into the repository within a number of collections (DECam/calib, DECam/calib/unbounded, and DECam/calib/curated/{timestamp}). Check the current collections within the repo using:
butler query-collections $REPO "DECam/calib*"
We're now prepared to ingest raw science frames. If raw frames are stored in multiple directories, this command needs to be repeated for each directory. Alternatively, a glob pattern which matches all files of interest may be supplied. Here's an example data ingest command:
LOGFILE=$LOGDIR/merian9813_ingest_science.log; \
SCIFILES=/project/lskelvin/decam/raw-merian/science/raw_*.fz; \
date | tee $LOGFILE; \
butler ingest-raws $REPO $SCIFILES --transfer link \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Note: The runtime for this command was ~5 minutes, ingesting 2480 distinct Butler datasets from 40 exposures.
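When raw frames are spread over several night directories, a single glob pattern can gather them for inspection before ingestion. The sketch below assumes the root/*/object layout listed at the top of this note; the helper find_raws, the "*.fz" file pattern and the night directory names are hypothetical, and the demonstration runs against a temporary directory rather than real data.

```python
import glob
import os
import tempfile


def find_raws(root, kind, pattern="*.fz"):
    """Return sorted paths matching root/<any night>/<kind>/<pattern>."""
    return sorted(glob.glob(os.path.join(root, "*", kind, pattern)))


# Build a tiny fake directory tree to demonstrate the pattern.
with tempfile.TemporaryDirectory() as root:
    for night, kind, name in [
        ("20210305", "object", "raw_a.fits.fz"),
        ("20210306", "object", "raw_b.fits.fz"),
        ("20210305", "zero", "raw_c.fits.fz"),
    ]:
        os.makedirs(os.path.join(root, night, kind), exist_ok=True)
        open(os.path.join(root, night, kind, name), "w").close()
    science = [os.path.relpath(p, root) for p in find_raws(root, "object")]

print(science)
```

Listing the matches first gives a quick check that the pattern picks up only the intended frame type before anything is written to the repository.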
Raw science exposures have now been added to the DECam/raw/all collection in the Merian repo. Collections define groups of data, and can be listed (and searched) using:
butler query-collections $REPO "DECam/raw/all"
Ingested exposures can be listed on the command line using:
butler query-dimension-records $REPO exposure \
--where "instrument='DECam' AND exposure.observation_type='science'"
This view shows all available dimensions associated with each ingested science image, including the observation ID, the physical filter, and the observation type. Alternatively, datasets can be queried directly using query-datasets, with optional SQL-like --where arguments to search specific dimensions, e.g.:
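# Alternative example queries; set WHERE to whichever one is required.
WHERE="instrument='DECam' AND exposure=971666"
WHERE="instrument='DECam' AND detector=1"
WHERE="instrument='DECam'
AND exposure.observation_type='science'
AND exposure.day_obs > 20220101
AND exposure.day_obs < 20220201
AND detector=1"
# Quote $WHERE so that the expression is passed as a single argument.
butler query-datasets $REPO --where "$WHERE" raw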
Note: to successfully use the --where argument, other dimensions may be required, such as instrument. The butler will complain with a UserExpressionError if a required dimension is not found.
A list of science exposure IDs can similarly be extracted within python:
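from lsst.daf.butler import Butler

# assumes the repository created above; adjust the path as needed
butler = Butler("/projects/MERIAN/repo")
queryData = butler.registry.queryDatasets
where = "exposure.observation_type='science' AND detector=1"
exps = list(queryData("raw", collections="DECam/raw/all",
                      instrument="DECam", where=where))
expids = tuple(x.dataId["exposure"] for x in exps)
print(f'SCIEXPS="{expids}"')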
The test dataset used here returns this list of science exposures:
SCIEXPS="(971666, 971667, 971668, 971669, 971670, 971671, 971672, 971673, 971674, 971675, 971676, 971677, 971678, 971679, 971680, 971681, 971682, 971683, 971684, 971685, 1068554, 1068555, 1068556, 1068557, 1068558, 1068559, 1068560, 1068561, 1068562, 1068713, 1068714, 1068715, 1068716, 1068717, 1068718, 1068719, 1068720, 1068721, 1068722, 1068723)"
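Printing a Python tuple works well for many exposures, but a single exposure prints as "(971666,)", and the trailing comma may not be accepted in a data query. The following sketch formats the list explicitly; exposure_in_clause is a hypothetical helper, not part of the Science Pipelines.

```python
def exposure_in_clause(exposure_ids):
    """Format exposure IDs as an 'exposure IN (...)' query fragment.

    IDs are de-duplicated and sorted, and no trailing comma is emitted,
    even when only one exposure is given.
    """
    ids = sorted(set(int(x) for x in exposure_ids))
    if not ids:
        raise ValueError("no exposure IDs given")
    return "exposure IN (" + ", ".join(str(i) for i in ids) + ")"


print(exposure_in_clause([971667, 971666, 971666]))
print(exposure_in_clause([971666]))
print(str((971666,)))  # the plain tuple form keeps a trailing comma
```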
As with the raw science frames above, raw bias frames ('zero') are also ingested:
LOGFILE=$LOGDIR/merian9813_ingest_bias.log; \
BIASFILES=/project/lskelvin/decam/raw-merian/bias/raw_*.fz; \
date | tee $LOGFILE; \
butler ingest-raws $REPO $BIASFILES --transfer link \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Note: The runtime for this command was ~1 minute, ingesting 3100 distinct Butler datasets from 50 exposures.
Bias exposures have now been ingested into the Merian repo. Check that the bias calibration frames have been successfully ingested using:
butler query-dimension-records $REPO exposure \
--where "instrument='DECam' AND exposure.observation_type='zero'"
A list of bias exposure IDs can be extracted within python:
queryData = butler.registry.queryDatasets
where = "exposure.observation_type='zero' AND detector=1"
exps = list(queryData("raw", collections='DECam/raw/all',
instrument="DECam", where=where))
expids = tuple(x.dataId["exposure"] for x in exps)
print(f'BIASEXPS="{expids}"')
The test dataset used here returns this list of bias exposures:
BIASEXPS="(970488, 970489, 970490, 970491, 970492, 970493, 970494, 970495, 970496, 970497, 970498, 970589, 970590, 970591, 970592, 970593, 970594, 970595, 970596, 970597, 970598, 970599, 970823, 970824, 970825, 970826, 970827, 970828, 970829, 970830, 970831, 970832, 970833, 970924, 970925, 970926, 970927, 970928, 970929, 970930, 970931, 970932, 970933, 970934, 971161, 971162, 971163, 971164, 971165, 971166)"
The final set of data to be ingested are the raw flat frames ('dome flat'):
LOGFILE=$LOGDIR/merian9813_ingest_flat.log; \
FLATFILES=/project/lskelvin/decam/raw-merian/flat/raw_*.fz; \
date | tee $LOGFILE; \
butler ingest-raws $REPO $FLATFILES --transfer link \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Note: The runtime for this command was ~5 minutes, ingesting 6200 distinct Butler datasets from 100 exposures.
Flat exposures have now been ingested into the Merian repo. Check that the flat calibration frames have been successfully ingested using:
butler query-dimension-records $REPO exposure \
--where "instrument='DECam' AND exposure.observation_type='dome flat'"
Note: if an attempt is made to ingest a file which has already been ingested, the Science Pipelines will fail for that particular file. This behaviour is expected, and not a cause for concern.
A list of flat exposure IDs can be extracted within python:
queryData = butler.registry.queryDatasets
where = "exposure.observation_type='dome flat' AND detector=1"
exps = list(queryData("raw", collections="DECam/raw/all",
instrument="DECam", where=where))
expids = tuple(x.dataId["exposure"] for x in exps)
print(f'FLATEXPS="{expids}"')
The test dataset used here returns this list of flat exposures:
FLATEXPS="(970228, 970229, 970230, 970231, 970232, 970233, 970234, 970235, 970236, 970237, 970238, 970501, 970502, 970503, 970504, 970505, 970506, 970507, 970508, 970509, 970510, 970511, 970836, 970837, 970838, 970839, 970840, 970841, 970842, 970843, 970844, 970845, 970846, 971174, 971175, 971176, 971177, 971178, 971179, 971180, 971181, 971182, 971183, 971184, 971554, 971555, 971556, 971557, 971558, 971559, 1052706, 1052707, 1052708, 1052709, 1052710, 1052711, 1052712, 1052713, 1052714, 1052715, 1052716, 1053093, 1053094, 1053095, 1053096, 1053097, 1053098, 1053099, 1053100, 1053101, 1053102, 1053103, 1053485, 1053486, 1053487, 1053488, 1053489, 1053490, 1053491, 1053492, 1053493, 1053494, 1053495, 1053858, 1053859, 1053860, 1053861, 1053862, 1053863, 1053864, 1053865, 1053866, 1053867, 1053868, 1054287, 1054288, 1054289, 1054290, 1054291, 1054292)"
Once all raw data have been ingested, we can define visits from exposures in the butler registry. This sets up the exposure IDs within the butler, allowing future runs to use this information when using the -d or --where data queries. Without this step, processing steps after ISR (i.e., characterizeImage onwards) will fail with RuntimeError: QuantumGraph is empty.
LOGFILE=$LOGDIR/merian9813_define_visits.log; \
date | tee $LOGFILE; \
butler define-visits $REPO lsst.obs.decam.DarkEnergyCamera \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Note: this run took ~1 minute with these data. Do not use the -j N syntax here to run this process on more than one processor. Doing so will cause multiple "database is locked" errors as each processor attempts to write to the same butler. Ticket DM-30607 references this issue.
Now that raw bias (zero) and dome flat calibration frames have been ingested, validity date ranges need to be determined. Prior analyses of these data show that all Merian observation nights are consistent with each other to within a 1% flux deviation. For this reason, we opt to construct calibration frames which are certified across the entire timespan of our data: from 2021-01-01 to 2022-06-30.
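Before certifying over a single range, it is worth confirming that every observing night falls inside it. The sketch below uses the date range quoted above and integer day_obs values in YYYYMMDD form, as used in the queries earlier in this note; the helper name and the example nights are illustrative only.

```python
from datetime import date, datetime

# Certification range quoted in this note.
CERT_BEGIN = date(2021, 1, 1)
CERT_END = date(2022, 6, 30)


def day_obs_in_range(day_obs, begin=CERT_BEGIN, end=CERT_END):
    """True if an integer YYYYMMDD day_obs lies within [begin, end]."""
    night = datetime.strptime(str(day_obs), "%Y%m%d").date()
    return begin <= night <= end


# Illustrative nights: one inside the range, two outside.
for night in (20210318, 20201231, 20220701):
    print(night, day_obs_in_range(night))
```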
Next, the master bias frames are built. These frames need to be built for each valid date range (see section above).
First, check the build pipeline:
pipetask build \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpBias.yaml \
--show pipeline
The cpBias pipeline at the time of writing:
description: cp_pipe BIAS calibration construction for DECam
instrument: lsst.obs.decam.DarkEnergyCamera
tasks:
  isr:
    class: lsst.ip.isr.IsrTask
    config:
    - connections.ccdExposure: raw
      connections.outputExposure: cpBiasProc
      doWrite: true
      doBias: false
      doVariance: true
      doLinearize: false
      doCrosstalk: false
      doBrighterFatter: false
      doDark: false
      doFlat: false
      doApplyGains: false
      doFringe: false
  cpBiasCombine:
    class: lsst.cp.pipe.cpCombine.CalibCombineTask
    config:
    - connections.inputExpHandles: cpBiasProc
      connections.outputData: bias
      calibrationType: bias
      exposureScaling: Unity
contracts:
- contract: cpBiasCombine.calibrationType == "bias"
- contract: cpBiasCombine.exposureScaling == "Unity"
- contract: isr.doBias == False
The cpBias pipeline may also be viewed graphically:
pipetask build \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpBias.yaml \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_cpBias.pdf
Build master bias (zero) frames, ensuring that all required input collections are given as arguments to -i:
LOGFILE=$LOGDIR/merian9813_cpBias.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i DECam/raw/all,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded \
-o DECam/calib/merian9813/bias \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpBias.yaml \
-d "instrument='DECam' AND exposure IN $BIASEXPS" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Note: this job contained 3162 quanta, with a runtime of ~2 hours. The -d data query takes a wide range of SQL-like arguments. Instead of specifying exposures by their exposure ID, it may be preferable to select them by date instead, e.g.: "exposure.observation_type='zero' AND exposure.day_obs > 20210910".
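The quantum count above can be reproduced with simple arithmetic, which gives a quick sanity check on the size of a run before launching it. This sketch assumes (an interpretation, not something stated by the pipeline itself) one isr quantum per exposure and detector plus one cpBiasCombine quantum per detector, with 62 detectors per exposure as implied by the ingest counts earlier in this note.

```python
N_DETECTORS = 62  # datasets per exposure reported at ingest (3100 / 50)


def expected_cpbias_quanta(n_exposures, n_detectors=N_DETECTORS):
    """Estimate the number of quanta in a cpBias run.

    Assumes one isr quantum per (exposure, detector) pair and one
    cpBiasCombine quantum per detector.
    """
    isr_quanta = n_exposures * n_detectors
    combine_quanta = n_detectors
    return isr_quanta + combine_quanta


print(expected_cpbias_quanta(50))  # 3162, matching the run reported above
```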
To avoid an error regarding missing defects dataset types, an input collection containing defects must also be supplied in the cpBias run. These data are ingested into the repo when running write-curated-calibrations, for example. The instructions here make use of the DECam/calib/curated/19700101T000000Z collection. Other collections containing defects are also available; however, some of the commonly unused detectors are missing (i.e., fewer than 62 are present). If defects data are not available at all, adding -c isr:doDefect=False to the pipetask run command will disable defect masking when running the cpBias pipeline.
On occasion, some of the tasks (quanta) may fail, likely due to memory issues. In such cases, an afterburner can be run on a single core to try the failed tasks again. To do so, add --extend-run and --skip-existing to the pipetask run command, and remove -j N to prevent it from running on multiple cores. This helps ensure that the most memory-intensive quanta do not request too much memory simultaneously.
Check the collections, dataset types and datasets now present in the repo:
butler query-collections $REPO "*bias*"
butler query-dataset-types $REPO
butler query-datasets $REPO --collections DECam/calib/merian9813/bias
butler query-datasets $REPO --collections DECam/calib/merian9813/bias bias
Certify the biases for a given date range. Arguments: REPO, INPUT_COLLECTION, OUTPUT_COLLECTION, DATASET_TYPE_NAME:
butler certify-calibrations \
$REPO DECam/calib/merian9813/bias DECam/calib/merian9813 bias \
--begin-date 2021-01-01T00:00:00 --end-date 2022-06-30T23:59:59
You may check what certified date ranges have been applied to the bias data in Python by querying dataset associations in the output collection. For example, to check only detector #1:
qda = butler.registry.queryDatasetAssociations
coll = "DECam/calib/merian9813"
biases = [x for x in qda("bias", collections=coll) if x.ref.dataId["detector"] == 1]
print(biases)
which produces a list of all biases relating to detector #1 (in the case of this example document, there should be only 1 result at present). Inspecting the properties of this object gives the timespan, e.g.:
print(f"{biases[0].timespan.begin.value = }")
# biases[0].timespan.begin.value = '2021-01-01 00:00:00.000000'
print(f"{biases[0].timespan.end.value = }")
# biases[0].timespan.end.value = '2022-06-30 23:59:59.000000'
The next step is to generate crosstalk sources using step 0 of the Data Release Production (DRP) pipeline (DRP.yaml). Crosstalk sources need to be generated for any raw frames on which we want to run full ISR (i.e., raw flats and raw science frames). Step 0 of DRP.yaml runs only the doOverscan aspect of the ISR (instrument signature removal) task. It can be inspected using:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \
--show pipeline
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  isrForCrosstalkSources:
    class: lsst.ip.isr.IsrTask
    config:
    - connections.outputExposure: overscanRaw
      doOverscan: true
      doAssembleCcd: false
      doBias: false
      doCrosstalk: false
      doVariance: false
      doLinearize: false
      doDefect: false
      doNanMasking: false
      doDark: false
      doFlat: false
      doFringe: false
      doInterpolate: false
subsets:
  step0:
    subset:
    - isrForCrosstalkSources
    description: |
      Tasks which should be run once, prior to initial data processing.
      This step generates crosstalk sources for ISR/inter-chip crosstalk by
      applying overscan correction on raw frames. A new dataset is written,
      which should be used as an input for further data processing.
The current step 0 pipeline may also be viewed graphically:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step0.pdf
Run step0 for raw flats:
LOGFILE=$LOGDIR/merian9813_step0_flat.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i DECam/raw/all,DECam/calib/merian9813,DECam/calib/unbounded \
-o DECam/calib/merian9813/crosstalk \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \
-d "instrument='DECam' AND exposure IN $FLATEXPS" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Note: this run contained 6200 quanta, with a runtime of ~2 hours. The final run message for these data was:
Executed 6200 quanta successfully, 0 failed and 0 remain out of total 6200 quanta.
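Because each run is logged to $LOGFILE, the final summary line can also be checked programmatically. The sketch below parses a message of the form quoted above; the function name parse_run_summary is hypothetical, and the pattern is based only on the example message shown in this note, so it may need adjusting for other versions of the Science Pipelines.

```python
import re

# Pattern for the final summary line, as quoted in this note.
SUMMARY_RE = re.compile(
    r"Executed (\d+) quanta successfully, (\d+) failed and (\d+) remain "
    r"out of total (\d+) quanta"
)


def parse_run_summary(log_text):
    """Return quanta counts from the last summary line, or None if absent."""
    matches = SUMMARY_RE.findall(log_text)
    if not matches:
        return None
    succeeded, failed, remaining, total = (int(x) for x in matches[-1])
    return {
        "succeeded": succeeded,
        "failed": failed,
        "remaining": remaining,
        "total": total,
    }


example = (
    "Executed 6200 quanta successfully, 0 failed and 0 remain "
    "out of total 6200 quanta."
)
summary = parse_run_summary(example)
print(summary)
print("run complete:", summary["succeeded"] == summary["total"])
```

In practice the text would be read from the log file itself, for example with pathlib.Path(logfile).read_text().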
Extend the crosstalk RUN collection to also include science exposures:
LOGFILE=$LOGDIR/merian9813_step0_science.log; \
date | tee -a $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
--extend-run --skip-existing \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i DECam/raw/all,DECam/calib/merian9813,DECam/calib/unbounded \
-o DECam/calib/merian9813/crosstalk \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \
-d "instrument='DECam' AND exposure IN $SCIEXPS" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Note: this run contained 2480 quanta, with a runtime of ~30 minutes. The final run message for these data was:
Executed 2480 quanta successfully, 0 failed and 0 remain out of total 2480 quanta.
The overscanRaw datasets should now be available in the repository. Check the collections and datasets:
butler query-collections $REPO
butler query-datasets $REPO overscanRaw
This step constructs the master flat frames (which requires the biases). The cpFlat pipeline can be visualized using:
pipetask build \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpFlat.yaml \
--show pipeline
The cpFlat pipeline at the time of writing:
description: cp_pipe FLAT calibration construction for DECam
instrument: lsst.obs.decam.DarkEnergyCamera
tasks:
  isr:
    class: lsst.ip.isr.IsrTask
    config:
    - connections.ccdExposure: raw
      connections.outputExposure: cpFlatProc
      doWrite: true
      doBrighterFatter: false
      doFlat: false
      doFringe: false
      doApplyGains: false
      connections.crosstalkSources: overscanRaw
      connections.bias: bias
      doDark: false
  cpFlatMeasure:
    class: lsst.cp.pipe.cpFlatNormTask.CpFlatMeasureTask
    config:
    - connections.inputExp: cpFlatProc
      connections.outputStats: flatStats
      doVignette: false
  cpFlatNorm:
    class: lsst.cp.pipe.cpFlatNormTask.CpFlatNormalizationTask
    config:
    - connections.inputMDs: flatStats
      connections.outputScales: cpFlatNormScales
      level: AMP
  cpFlatCombine:
    class: lsst.cp.pipe.cpCombine.CalibCombineByFilterTask
    config:
    - connections.inputExpHandles: cpFlatProc
      connections.inputScales: cpFlatNormScales
      connections.outputData: flat
      calibrationType: flat
      exposureScaling: InputList
      scalingLevel: AMP
contracts:
- contract: isr.doFlat == False
The cpFlat pipeline may also be viewed graphically:
pipetask build \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpFlat.yaml \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_cpFlat.pdf
Build master flats:
LOGFILE=$LOGDIR/merian9813_cpFlat.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i DECam/raw/all,DECam/calib/merian9813,DECam/calib/merian9813/crosstalk,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded \
-o DECam/calib/merian9813/flat \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpFlat.yaml \
-d "instrument='DECam' AND exposure IN $FLATEXPS" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Note: this run contained 12526 quanta, with a runtime of ~3 hours. Initializing the task may also take some time, due to the complex nature of the quantum graph.
On occasion, some quanta (tasks) may fail, most likely due to memory issues. In such cases, the missing master flats can be reattempted in an afterburner run by omitting -j N (so that processing runs on a single core) and adding --extend-run --skip-existing to the pipetask run command.
Check what types of data now exist in the output collection:
butler query-collections $REPO
butler query-dataset-types $REPO
butler query-datasets $REPO --collections DECam/calib/merian9813/flat
butler query-datasets $REPO --collections DECam/calib/merian9813/flat flat
Note: the final two commands will not work if the output collection is a CHAINED collection containing a CALIBRATION child collection. If attempted, the above commands will fail with:
NotImplementedError: Query for dataset type 'camera' in CALIBRATION-type collection 'DECam/calib/merian' is not yet supported.
As a workaround, provide the full name of the child RUN collection instead, as given by query-collections.
Certify the flats for a given date range. Arguments: REPO, INPUT_COLLECTION, OUTPUT_COLLECTION, DATASET_TYPE_NAME:
butler certify-calibrations \
$REPO DECam/calib/merian9813/flat DECam/calib/merian9813 flat \
--begin-date 2021-01-01T00:00:00 --end-date 2022-06-30T23:59:59
You may check what certified date ranges have been applied to the flat data in Python by querying dataset associations in the output collection. For example, to check only detector #1:
qda = butler.registry.queryDatasetAssociations
coll = "DECam/calib/merian9813"
flats = [x for x in qda("flat", collections=coll) if x.ref.dataId["detector"] == 1]
print(flats)
which produces a list of all flats relating to detector #1 (in the case of this example document, there should be only 1 result at present). Inspecting the properties of this object gives the timespan, e.g.:
print(f"{flats[0].timespan.begin.value = }")
# flats[0].timespan.begin.value = '2021-01-01 00:00:00.000000'
print(f"{flats[0].timespan.end.value = }")
# flats[0].timespan.end.value = '2022-06-30 23:59:59.000000'
Data in the Science Pipelines are arranged into collections (groupings of data). Here we establish a default collection which contains the commonly required input collections. Whilst this step is not strictly necessary, it allows us to specify only a single INPUT collection for future raw data processing:
INPUT=DECam/defaults/merian9813
If this step is not performed, future data processing will need to specify all required input collections explicitly:
-i long,comma,separated,list,of,child,collections
If this step is followed then future data processing from raw data should only need to specify the default collection:
-i $INPUT
Note: it is not currently possible to query a CHAINED collection containing a CALIBRATION child collection. Constructing a dedicated CHAINED collection containing only the RUN collections of interest allows users to query the CHAINED collection and avoid this error.
A CHAINED collection can be set up either on the command line or in Python. To set up a CHAINED collection on the command line for all required input collections, run:
CHILDREN="DECam/raw/all,\
DECam/calib/merian9813,\
DECam/calib/merian9813/crosstalk,\
DECam/calib/curated/19700101T000000Z,\
DECam/calib/unbounded,\
skymaps,\
refcats"
butler collection-chain $REPO $INPUT $CHILDREN
Note: the CHILDREN list may be amended and the above command re-run to update this parent collection, if, for example, new data has been processed and a user would like to add the updated crosstalk RUN collection to this parent CHAINED collection.
Alternatively, this may also be achieved in Python:
import lsst.daf.butler as dafButler
REPO = "/project/lskelvin/repo"
default_collection = "DECam/defaults/merian9813"
# Set up a writeable butler
butler_writeable = dafButler.Butler(REPO, writeable=True)
registry_writeable = butler_writeable.registry
# Register a new default CHAINED collection
registry_writeable.registerCollection(default_collection,
type = dafButler.CollectionType.CHAINED)
# Add required CHILD collections into the CHAINED collection
registry_writeable.setCollectionChain(default_collection,
    ["DECam/raw/all",
     "DECam/calib/merian9813",
     "DECam/calib/merian9813/crosstalk",
     "DECam/calib/curated/19700101T000000Z",
     "DECam/calib/unbounded",
     "skymaps",
     "refcats"])
Note: as above, if reprocessing data in future runs, you can amend the list above to add your own collections, and then re-run setCollectionChain to update default_collection. This allows the default collection to stay relevant, linking to all necessary datasets as new data become available.
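When maintaining the list of child collections in Python, note that adjacent string literals without a separating comma are silently concatenated into a single name. The following minimal sketch shows one way to catch such mistakes before calling setCollectionChain. The helper find_unknown_children is hypothetical, and the list of known names is hard-coded here for illustration; in a live session it could instead be taken from the output of butler query-collections.

```python
def find_unknown_children(children, known_collections):
    """Return the entries of children that are not known collection names."""
    known = set(known_collections)
    return [name for name in children if name not in known]


# Collection names used in this note (hard-coded for this sketch).
known = [
    "DECam/raw/all",
    "DECam/calib/merian9813",
    "DECam/calib/merian9813/crosstalk",
    "DECam/calib/curated/19700101T000000Z",
    "DECam/calib/unbounded",
    "skymaps",
    "refcats",
]

# Deliberately faulty list: the comma after the crosstalk entry is missing,
# so Python merges it with the following literal.
children = [
    "DECam/raw/all",
    "DECam/calib/merian9813/crosstalk"
    "DECam/calib/unbounded",
]

print(find_unknown_children(children, known))
# ['DECam/calib/merian9813/crosstalkDECam/calib/unbounded']
```

An empty result indicates that every child name matches a known collection.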
In this section we proceed through all the relevant data processing steps to take raw DECam science data through to coadd outputs. The processed data will be written to the OUTPUT CHAINED collection:
OUTPUT=DECam/runs/merian9813/w_2022_26
Here, w_2022_26 refers to the weekly release of the LSST Science Pipelines used to reduce these data.
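Processing consists of three main steps: step 1 (single-frame processing), step 2 (visit-level aggregation and calibration) and step 3 (coadd processing).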
If outputting to an already existing collection, append the following arguments to the pipetask run commands below:
--extend-run --skip-existing --clobber-outputs
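Since the pipetask run commands in the following steps differ only in the pipeline step, the data query and the optional extension flags, they can be assembled programmatically. The sketch below is one hypothetical way to do this; the function build_pipetask_command and its parameters are illustrative only, the repository and pipeline paths in the example are placeholders, and the command-line flags are copied from the commands in this note.

```python
import shlex


def build_pipetask_command(step, repo, input_coll, output_coll, data_query,
                           pipe_dir, jobs=12, extend=False):
    """Assemble a pipetask run command for one DRP-Merian step as a list."""
    cmd = ["pipetask", "--long-log", "run", "--register-dataset-types",
           "-j", str(jobs)]
    if extend:
        # Needed when writing to an already existing output collection.
        cmd += ["--extend-run", "--skip-existing", "--clobber-outputs"]
    cmd += [
        "-b", repo,
        "--instrument", "lsst.obs.decam.DarkEnergyCamera",
        "-i", input_coll,
        "-o", output_coll,
        "-p", f"{pipe_dir}/pipelines/DECam/DRP-Merian.yaml#{step}",
        "-d", data_query,
    ]
    return cmd


# Placeholder paths; substitute the values of $REPO and $DRP_PIPE_DIR.
cmd = build_pipetask_command(
    "step1",
    repo="/path/to/repo",
    input_coll="DECam/defaults/merian9813",
    output_coll="DECam/runs/merian9813/w_2022_26",
    data_query="instrument='DECam'",
    pipe_dir="/path/to/drp_pipe",
    extend=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed directly to subprocess.run, which avoids shell quoting issues with the data query.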
Processed visit images (PVIs) and preliminary source tables are produced in step 1.
To generate the pipeline YAML:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step1 \
--show pipeline
which gives:
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  isr:
    class: lsst.ip.isr.IsrTask
    config:
    - connections.crosstalkSources: overscanRaw
      doCrosstalk: true
  characterizeImage:
    class: lsst.pipe.tasks.characterizeImage.CharacterizeImageTask
  calibrate:
    class: lsst.pipe.tasks.calibrate.CalibrateTask
    config:
    - photoCal.match.referenceSelection.magLimit.fluxField: i_flux
      photoCal.match.referenceSelection.magLimit.maximum: 22.0
  writePreSourceTable:
    class: lsst.pipe.tasks.postprocess.WriteSourceTableTask
    config:
    - connections.outputCatalog: preSource
  transformPreSourceTable:
    class: lsst.pipe.tasks.postprocess.TransformSourceTableTask
    config:
    - connections.inputCatalog: preSource
      connections.outputCatalog: preSourceTable
subsets:
  processCcd:
    subset:
    - characterizeImage
    - isr
    - calibrate
    description: 'Set of tasks to run when doing single frame processing, without
      any conversions to Parquet/DataFrames or visit-level summaries.

      '
  step1:
    subset:
    - writePreSourceTable
    - calibrate
    - transformPreSourceTable
    - characterizeImage
    - isr
    description: |
      Per-detector tasks that can be run together to start the DRP pipeline.
      These should never be run with 'tract' or 'patch' as part of the data ID
      expression if any later steps will also be run, because downstream steps
      require full visits and 'tract' and 'patch' constraints will always
      select partial visits that overlap that region.
To generate the pipeline graph:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step1 \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step1.pdf
Run step 1:
DATAQUERY="exposure.day_obs > 20210101
AND exposure.day_obs < 20220630
AND exposure.observation_type='science'
AND detector NOT IN (31,61)"
LOGFILE=$LOGDIR/merian9813_step1.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step1 \
-d "instrument='DECam' AND $DATAQUERY" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Note: the runtime for this 12000 quanta run was ~4 hours. As is common in DECam data processing, detectors 31 and 61 have been excluded from data reduction because their data are known to be corrupt. Instead of selecting by date as above, other data selectors may be used to identify the specific raw data to be processed. For example, 'exposure IN $SCIEXPS' would only process the exposures listed in the SCIEXPS variable.
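As an illustration of the exposure-based selector, the sketch below formats a list of exposure IDs into a clause suitable for the -d argument. The helper name exposure_in_clause is hypothetical, the second exposure ID is made up for the example, and the parenthesized, comma-separated form is an assumption about how a variable such as SCIEXPS is laid out.

```python
def exposure_in_clause(exposure_ids):
    """Format exposure IDs as 'exposure IN (a, b, c)' for a data query."""
    ids = sorted(set(int(e) for e in exposure_ids))
    if not ids:
        raise ValueError("at least one exposure ID is required")
    return "exposure IN (" + ", ".join(str(i) for i in ids) + ")"


# 971668 is a made-up ID for illustration; duplicates are removed.
query = "instrument='DECam' AND " + exposure_in_clause([971666, 971668, 971666])
print(query)
# instrument='DECam' AND exposure IN (971666, 971668)
```

Sorting and de-duplicating the IDs keeps the resulting query string stable between runs, which makes log files easier to compare.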
Initial visit aggregation takes place in step 2a, producing visit-wide preliminary source tables and visit summaries.
To generate the pipeline YAML:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2a \
--show pipeline
which gives:
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  consolidateVisitSummary:
    class: lsst.pipe.tasks.postprocess.ConsolidateVisitSummaryTask
  consolidatePreSourceTable:
    class: lsst.pipe.tasks.postprocess.ConsolidateSourceTableTask
    config:
    - connections.inputCatalogs: preSourceTable
      connections.outputCatalog: preSourceTable_visit
subsets:
  step2a:
    subset:
    - consolidateVisitSummary
    - consolidatePreSourceTable
    description: |
      Visit-level tasks
      Allowed data query constraints: visit
      Tasks aggregate all detectors for a given visit.
To generate the pipeline graph:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2a \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2a.pdf
Run step 2a:
LOGFILE=$LOGDIR/merian9813_step2a.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2a \
-d "instrument='DECam' AND $DATAQUERY" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Note: the runtime for this 80 quanta run was ~3 minutes.
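The quanta counts quoted so far can be cross-checked with some simple arithmetic. The sketch below is a consistency check only: the exposure counts are inferred from the step 0 totals rather than read from the repository, and the assumed granularity of each task (per exposure and detector, per filter, or per visit) is a guess that happens to reproduce the quoted numbers. It also assumes the date-based query selects the same science exposures as step 0.

```python
# All inputs below are inferred or assumed, not read from the repository.
N_DETECTORS = 62          # DECam science CCDs
N_EXCLUDED = 2            # detectors 31 and 61 are excluded in step 1
N_FILTERS = 2             # N540 and N708
N_FLAT_EXPOSURES = 100    # inferred: 6200 step 0 quanta / 62 detectors
N_SCIENCE_EXPOSURES = 40  # inferred: 2480 step 0 quanta / 62 detectors

expected = {
    # step 0: one ISR quantum per exposure per detector
    "step0_flat": N_FLAT_EXPOSURES * N_DETECTORS,
    "step0_science": N_SCIENCE_EXPOSURES * N_DETECTORS,
    # cpFlat: isr and cpFlatMeasure per exposure per detector,
    # cpFlatNorm per filter, cpFlatCombine per detector per filter
    "cpFlat": (2 * N_FLAT_EXPOSURES * N_DETECTORS
               + N_FILTERS
               + N_DETECTORS * N_FILTERS),
    # step 1: five per-detector tasks on the 60 retained detectors
    "step1": 5 * N_SCIENCE_EXPOSURES * (N_DETECTORS - N_EXCLUDED),
    # step 2a: two per-visit tasks
    "step2a": 2 * N_SCIENCE_EXPOSURES,
}

# Totals quoted in this note.
quoted = {
    "step0_flat": 6200,
    "step0_science": 2480,
    "cpFlat": 12526,
    "step1": 12000,
    "step2a": 80,
}

for name, n in expected.items():
    print(f"{name}: expected {n}, quoted {quoted[name]}")
```

A mismatch between the expected and reported totals for a new run would suggest that some exposures or detectors were not picked up by the data query.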
Photometric and astrometric calibration take place at the tract level in step 2b.
To generate the pipeline YAML:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2b \
--show pipeline
which gives:
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  isolatedStarAssociation:
    class: lsst.pipe.tasks.isolatedStarAssociation.IsolatedStarAssociationTask
    config:
    - python: config.band_order += ["N708", "N540"]
  jointcal:
    class: lsst.jointcal.JointcalTask
    config:
    - connections.inputSourceTableVisit: preSourceTable_visit
subsets:
  step2b:
    subset:
    - isolatedStarAssociation
    - jointcal
    description: |
      Tract-level tasks
      Allowed data query constraints: tract
      Jointcal and isolatedStarAssociation both use PreSources, generated
      by consolidatePreSourceTable, for all visits that overlap a tract.
      jointcal produces solutions per-tract, per-visit
      isolatedStarAssociation produces solutions per-tract.
To generate the pipeline graph:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2b \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2b.pdf
Run step 2b:
LOGFILE=$LOGDIR/merian9813_step2b.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2b \
-d "instrument='DECam' AND skymap='decam_rings_v1' AND tract=9813" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Step 2c no longer exists in the current (2023+) Merian/DECam data reduction pipeline. The notes in this section below have been preserved as-is, but should not be referenced for future data reduction efforts.
Global per-collection summaries of visits and detectors are generated in step 2c.
To generate the pipeline YAML:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2c \
--show pipeline
which gives:
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  makeCcdVisitTable:
    class: lsst.pipe.tasks.postprocess.MakeCcdVisitTableTask
  makeVisitTable:
    class: lsst.pipe.tasks.postprocess.MakeVisitTableTask
subsets:
  step2c:
    subset:
    - makeVisitTable
    - makeCcdVisitTable
    description: |
      Global-level tasks that must not be run with any data query constraints
      Can be run anytime after subset step2a.
      Allowed data query constraints: instrument
      Tasks generate one data product per collection.
      make[Ccd]VisitTable produces per-collection summary of the Visits
      and CcdVisits.
To generate the pipeline graph:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2c \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2c.pdf
Run step 2c:
LOGFILE=$LOGDIR/merian9813_step2c.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2c \
-d "instrument='DECam'" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Generation of final source tables with full calibrations applied takes place in step 2d.
To generate the pipeline YAML:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2d \
--show pipeline
which gives:
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  transformSourceTable:
    class: lsst.pipe.tasks.postprocess.TransformSourceTableTask
  consolidateSourceTable:
    class: lsst.pipe.tasks.postprocess.ConsolidateSourceTableTask
  finalizeCharacterization:
    class: lsst.pipe.tasks.finalizeCharacterization.FinalizeCharacterizationTask
  writeRecalibratedSourceTable:
    class: lsst.pipe.tasks.postprocess.WriteRecalibratedSourceTableTask
    config:
    - useGlobalExternalPhotoCalib: false
      connections.photoCalibName: jointcal
      connections.outputCatalog: source
subsets:
  step2d:
    subset:
    - writeRecalibratedSourceTable
    - finalizeCharacterization
    - consolidateSourceTable
    - transformSourceTable
    description: |
      Visit-level tasks.
      Allowed data query constraints: visit
      writeRecalibratedSourceTable, transformSourceTable run per-detector
      consolidateSourceTable produces one data product per visit.
      finalizeCharacterization will eventually model full focal plane PSFs.
To generate the pipeline graph:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2d \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2d.pdf
Run step 2d:
LOGFILE=$LOGDIR/merian9813_step2d.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2d \
-d "instrument='DECam' AND $DATAQUERY" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Generation of final visit tables and CCD visit tables takes place in step 2e.
To generate the pipeline YAML:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2e \
--show pipeline
which gives:
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  makeCcdVisitTable:
    class: lsst.pipe.tasks.postprocess.MakeCcdVisitTableTask
  makeVisitTable:
    class: lsst.pipe.tasks.postprocess.MakeVisitTableTask
subsets:
  step2e:
    subset:
    - makeVisitTable
    - makeCcdVisitTable
    description: |
      Global-level tasks that must not be run with any data query constraints
      Can be run anytime after subset step2d.
      Allowed data query constraints: instrument
      Tasks generate one data product per collection.
      make[Ccd]VisitTable produces per-collection summary of the Visits
      and CcdVisits.
To generate the pipeline graph:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2e \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2e.pdf
Run step 2e:
LOGFILE=$LOGDIR/merian9813_step2e.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2e \
-d "instrument='DECam' AND $DATAQUERY" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
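Coadd processing takes place in step 3. A large number of tasks are performed during this step, including, but not limited to, warping (makeWarp), coadd assembly (assembleCoadd), source detection, deblending, measurement and forced photometry on the coadds.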
To generate the pipeline YAML:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step3 \
--show pipeline
which gives:
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  makeWarp:
    class: lsst.pipe.tasks.makeCoaddTempExp.MakeWarpTask
    config:
    - makePsfMatched: true
    - python: |
        config.warpAndPsfMatch.psfMatch.kernel['AL'].alardSigGauss = [1.0, 2.0, 4.5]
        from lsst.pipe.tasks.selectImages import PsfWcsSelectImagesTask
        config.select.retarget(PsfWcsSelectImagesTask)
      connections.photoCalibName: jointcal
      matchingKernelSize: 29
      doApplyExternalPhotoCalib: true
      externalPhotoCalibName: jointcal
      doApplyExternalSkyWcs: true
      modelPsf.defaultFwhm: 7.7
      warpAndPsfMatch.warp.warpingKernelName: lanczos5
      coaddPsf.warpingKernelName: lanczos5
      useGlobalExternalPhotoCalib: false
      doWriteEmptyWarps: true
  assembleCoadd:
    class: lsst.pipe.tasks.assembleCoadd.CompareWarpAssembleCoaddTask
    config:
    - doInputMap: true
    - python: |
        config.removeMaskPlanes.append("CROSSTALK")
        config.badMaskPlanes += ["SUSPECT"]
        from lsst.pipe.tasks.selectImages import PsfWcsSelectImagesTask
        config.select.retarget(PsfWcsSelectImagesTask)
      matchingKernelSize: 29
      doApplyExternalPhotoCalib: true
      externalPhotoCalibName: jointcal
      doApplyExternalSkyWcs: true
      subregionSize: (10000, 200)
      doNImage: true
      interpImage.transpose: true
      coaddPsf.warpingKernelName: lanczos5
      assembleStaticSkyModel.subregionSize: (10000, 200)
      assembleStaticSkyModel.doApplyExternalPhotoCalib: true
      assembleStaticSkyModel.externalPhotoCalibName: jointcal
      assembleStaticSkyModel.doApplyExternalSkyWcs: true
      doFilterMorphological: true
      useGlobalExternalPhotoCalib: false
      assembleStaticSkyModel.useGlobalExternalPhotoCalib: false
      doAttachTransmissionCurve: false
  detection:
    class: lsst.pipe.tasks.multiBand.DetectCoaddSourcesTask
  mergeDetections:
    class: lsst.pipe.tasks.mergeDetections.MergeDetectionsTask
  deblend:
    class: lsst.pipe.tasks.deblendCoaddSourcesPipeline.DeblendCoaddSourcesMultiTask
  measure:
    class: lsst.pipe.tasks.multiBand.MeasureMergedCoaddSourcesTask
  mergeMeasurements:
    class: lsst.pipe.tasks.mergeMeasurements.MergeMeasurementsTask
  writeObjectTable:
    class: lsst.pipe.tasks.postprocess.WriteObjectTableTask
  transformObjectTable:
    class: lsst.pipe.tasks.postprocess.TransformObjectCatalogTask
  consolidateObjectTable:
    class: lsst.pipe.tasks.postprocess.ConsolidateObjectTableTask
  forcedPhotCoadd:
    class: lsst.meas.base.forcedPhotCoadd.ForcedPhotCoaddTask
  selectGoodSeeingVisits:
    class: lsst.pipe.tasks.selectImages.BestSeeingQuantileSelectVisitsTask
    config:
    - connections.goodVisits: goodSeeingVisits
  templateGen:
    class: lsst.pipe.tasks.assembleCoadd.CompareWarpAssembleCoaddTask
    config:
    - doSelectVisits: true
      assembleStaticSkyModel.doSelectVisits: true
      connections.selectedVisits: goodSeeingVisits
      connections.outputCoaddName: goodSeeing
      connections.coaddExposure: goodSeeingCoadd
    - python: |
        config.removeMaskPlanes.append("CROSSTALK")
        config.badMaskPlanes += ["SUSPECT"]
        from lsst.pipe.tasks.selectImages import PsfWcsSelectImagesTask
        config.select.retarget(PsfWcsSelectImagesTask)
      matchingKernelSize: 29
      doApplyExternalPhotoCalib: true
      externalPhotoCalibName: jointcal
      doApplyExternalSkyWcs: true
      subregionSize: (10000, 200)
      doNImage: true
      interpImage.transpose: true
      coaddPsf.warpingKernelName: lanczos5
      assembleStaticSkyModel.subregionSize: (10000, 200)
      assembleStaticSkyModel.doApplyExternalPhotoCalib: true
      assembleStaticSkyModel.externalPhotoCalibName: jointcal
      assembleStaticSkyModel.doApplyExternalSkyWcs: true
      doFilterMorphological: true
      useGlobalExternalPhotoCalib: false
      assembleStaticSkyModel.useGlobalExternalPhotoCalib: false
      doAttachTransmissionCurve: false
contracts:
- contract: '''calib_psf_candidate'' not in measure.propagateFlags.source_flags if
    makeWarp.doApplyFinalizedPsf else True'
- contract: '''calib_psf_reserved'' not in measure.propagateFlags.source_flags if
    makeWarp.doApplyFinalizedPsf else True'
- contract: '''calib_psf_used'' not in measure.propagateFlags.source_flags if makeWarp.doApplyFinalizedPsf
    else True'
- contract: selectGoodSeeingVisits.connections.goodVisits == templateGen.connections.selectedVisits
subsets:
  multiband:
    subset:
    - detection
    - mergeDetections
    - deblend
    - measure
    - mergeMeasurements
    - forcedPhotCoadd
    description: 'A set of tasks to run when making measurements on coadds.

      '
  objectTable:
    subset:
    - consolidateObjectTable
    - writeObjectTable
    - transformObjectTable
    description: 'A set of tasks to transform multiband outputs into a parquet object
      table.

      '
  step3:
    subset:
    - consolidateObjectTable
    - mergeMeasurements
    - detection
    - mergeDetections
    - selectGoodSeeingVisits
    - writeObjectTable
    - deblend
    - templateGen
    - assembleCoadd
    - transformObjectTable
    - measure
    - makeWarp
    - forcedPhotCoadd
    description: |
      Tract-level tasks that can be run together, but only after the 'step1'
      and 'step2' subsets.
      These should be run with explicit 'tract' constraints essentially all the
      time, because otherwise quanta will be created for jobs with only partial visit
      coverage.
To generate the pipeline graph:
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step3 \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step3.pdf
Run step 3:
LOGFILE=$LOGDIR/merian9813_step3.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step3 \
-d "instrument='DECam' AND skymap='decam_rings_v1' AND tract=9813" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
This section lists some useful commands for interacting with the data.
Query collections in the repo:
butler query-collections $REPO "*u/lskelvin*"
The final search pattern may use standard glob syntax (e.g., note the asterisks above).
Query the datasets which live in a given collection:
butler query-datasets $REPO \
--collections u/lskelvin/testrun/01 \
--where "instrument='DECam' AND skymap='hsc_rings_v1' AND tract=9813" \
calexp
If the final dataset type (calexp in the example above) is not given, all dataset types found will be printed to the command line.
Redefine a CHAINED collection to only contain certain child RUN collections:
butler collection-chain $REPO PARENT "CHILD1,CHILD2"
This command is useful prior to deleting a CHAINED collection, as it ensures that no attempt is made to delete the input raw collections.
Remove one or more RUN collections:
butler remove-runs $REPO COLLECTION
Remove one or more non-RUN collections:
butler remove-collections $REPO COLLECTION
The visitSummary tables produced in step 2a contain important information on single frame processed visits. This information may be used to find out which tracts overlap with your data.
To generate a list of tract overlaps for a single visit, in Python:
from collections import defaultdict
import lsst.daf.butler as dafButler
butler = dafButler.Butler('/project/lskelvin/repo')
grouped_by_tract = defaultdict(set)
# Find every (tract, visit, detector) combination with a visitSummary
for data_id in butler.registry.queryDataIds(
    ["tract", "visit", "detector"],
    datasets="visitSummary",
    collections="DECam/runs/merian9813/w_2022_26",
    instrument="DECam",
    visit=971666,
):
    grouped_by_tract[data_id["tract"]].add(data_id)
# Print the number of overlapping data IDs found for each tract
print({k: len(v) for k, v in grouped_by_tract.items()})
To get total tract coverage for all visits in a given collection, remove the visit= argument above.
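The grouping logic itself can be tried out without a Butler repository. The minimal sketch below applies the same defaultdict pattern to a handful of made-up (tract, visit, detector) rows standing in for the query results; the tract and detector values are hypothetical.

```python
from collections import defaultdict

# Made-up (tract, visit, detector) rows standing in for query results.
rows = [
    (9813, 971666, 1),
    (9813, 971666, 2),
    (9814, 971666, 2),
    (9813, 971666, 3),
]

grouped_by_tract = defaultdict(set)
for tract, visit, detector in rows:
    # A detector overlapping two tracts appears once under each tract.
    grouped_by_tract[tract].add((visit, detector))

counts = {k: len(v) for k, v in grouped_by_tract.items()}
print(counts)
# {9813: 3, 9814: 1}
```

Here the printed dictionary maps each tract to the number of overlapping detector images, so the tract with the largest count has the most coverage from this visit.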
To transfer datasets from one machine to another (e.g., from SLAC to Princeton), first, on the source machine in Python:
# Assumes `butler` is an existing Butler instance for the source repository
outdir = "/sdf/data/rubin/u/lskelvin/merian"
datasetType = ["objectTable_tract", "deepCoadd", "deepCoadd_calexp"]
collection = "HSC/runs/RC2/w_2022_04/DM-33402"
dataId = dict(skymap="hsc_rings_v1", tract=9813)
with butler.export(directory=outdir, format="yaml", transfer="copy") as export:
    items = []
    # Collect the unique datasets matching the query
    found = set(butler.registry.queryDatasets(datasetType,
                                              collections=collection,
                                              dataId=dataId))
    items.extend(found)
    export.saveDatasets(items)
Next, in the output directory on the source machine:
tar -czvf data_transfer.tar.gz *
Transfer the file (here named data_transfer.tar.gz) from the source machine to the destination machine. Extract the tarball on the destination machine:
tar -xzvf data_transfer.tar.gz
Next, on the destination machine:
LOGFILE=$LOGDIR/data_import.log; \
butler import $REPO \
/path/to/data_transfer_directory \
--transfer copy \
--skip-dimensions skymap,tract,patch \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
Finally, set up a similarly named parent collection, e.g.:
PARENT=HSC/runs/RC2/w_2022_04/DM-33402
CHILD=HSC/runs/RC2/w_2022_04/DM-33402/20220128T212035Z
butler collection-chain $REPO $PARENT $CHILD
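In the example above, the parent name is simply the child RUN name with its trailing timestamp removed. The sketch below derives one from the other; the helper parent_of is hypothetical, and the assumption that RUN names always end in a /YYYYMMDDTHHMMSSZ timestamp is based only on the example shown here.

```python
import re

# Trailing timestamp of the form /YYYYMMDDTHHMMSSZ, as in the example above.
RUN_SUFFIX = re.compile(r"/\d{8}T\d{6}Z$")


def parent_of(run_name):
    """Strip the trailing timestamp from a RUN collection name."""
    if not RUN_SUFFIX.search(run_name):
        raise ValueError(f"no timestamp suffix found in {run_name!r}")
    return RUN_SUFFIX.sub("", run_name)


child = "HSC/runs/RC2/w_2022_04/DM-33402/20220128T212035Z"
print(parent_of(child))
# HSC/runs/RC2/w_2022_04/DM-33402
```

Raising an error for names without a timestamp avoids accidentally chaining a collection to itself.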
To decertify a calibration collection (because, e.g., a new calibration collection has been generated and is intended to replace the existing certified data on disk):
import lsst.daf.butler as dafButler
writeable_butler = dafButler.Butler(
    '/projects/MERIAN/repo', writeable=True
)
# An unbounded timespan decertifies the datasets over all dates
writeable_butler.registry.decertify(
    collection='DECam/calib/merian',
    datasetType='bias',
    timespan=dafButler.Timespan(None, None),
)
writeable_butler.registry.decertify(
    collection='DECam/calib/merian',
    datasetType='flat',
    timespan=dafButler.Timespan(None, None),
)