# Merian Data Processing Using the LSST Science Pipelines
The [Merian Survey](https://merian.sites.ucsc.edu/) is an ambitious program designed to explore the nature of dark matter, star formation, and feedback in dwarf galaxies. Merian will use 62 nights on the 4m Blanco telescope in Chile using the Dark Energy Camera (DECam). A total of 800 square degrees of the sky will be imaged in two custom-made medium-band filters to create a sample of 100,000 star-forming dwarf galaxies (with 90% completeness) in the redshift range 0.058 < z < 0.10. Merian will cover the [HSC SSP](https://hsc.mtk.nao.ac.jp/ssp/) Wide field, which provides gravitational lensing capabilities to probe the dark matter component of dwarfs ([Leauthaud et al. 2020](https://ui.adsabs.harvard.edu/abs/2020PDU....3000719L/abstract)).
This note summarizes the ingestion and data reduction process for Merian data, initially observed in Feb/March 2021, using the [Rubin Observatory](https://www.lsst.org/) [LSST Science Pipelines](https://pipelines.lsst.io/).
There are three primary DECam dataset types:
- object:
    - primary science frames
    - observation type: `object`
    - filters:
        - `N540 DECam c0014 5403.2 210.0`
        - `N708 DECam c0012 7080.0 400.0`
    - data storage locations on `tiger2-sumire`:
        - `/projects/MERIAN/raw/*/object`
- zero:
    - bias frames
    - observation type: `zero`
    - data storage locations on `tiger2-sumire`:
        - `/projects/MERIAN/raw/*/zero`
- domeflat:
    - flat frames
    - observation type: `dome flat`
    - data storage locations on `tiger2-sumire`:
        - `/projects/MERIAN/raw/*/domeflat`
![The DECam focal plane](https://www.researchgate.net/profile/Deborah-Gulledge/publication/327259994/figure/fig1/AS:664441125347331@1535426517714/DECam-focal-plane-showing-the-62-2k-4k-CCDs-8-2k-2k-CCDs-labeled-F-for-the.png "The DECam focal plane")
:::info
The DECam Focal Plane; figure from Diehl et al. 2018. DECam focal plane showing the 62 2k x 4k CCDs, 8 2k x 2k CCDs (labeled "F") for the adaptive optics system, and 4 2k x 2k CCDs (labeled "G") for guiding. The orientation of the sky is indicated. The black label (e.g., S30) indicates a position on the focal plane. The green label (e.g., 2) indicates the number of the CCD as is in the multi-extension FITS header. When the focal plane is viewed with the real-time display at the telescope or with default SAOImage DS9 settings, the direction labeled "north" is displayed to the left and "east" at the top. The background colors of the CCDs indicate the electronics backplane that reads them out.
:::
## 1. Preparing the Science Pipelines
### 1.1. Set up the Science Pipelines
First, the LSST Science Pipelines ("*the stack*") needs to be set up on the local machine:
```bash=
source "/projects/HSC/LSST/stack/loadLSST.sh"
```
This will set up the most recent Rubin environment installed on the machine. A list of other installed Rubin environments can be displayed using `mamba`:
```bash=
mamba env list
```
Older Rubin environments contain older builds of the science pipelines. To switch to an older build, simply preface the shell source command above with the appropriate `LSST_CONDA_ENV_NAME` variable:
```bash=
LSST_CONDA_ENV_NAME=lsst-scipipe-4.0.0; \
source "/projects/HSC/LSST/stack/loadLSST.sh"
```
The most recently installed version can again be loaded by unsetting this variable:
```bash=
unset LSST_CONDA_ENV_NAME; \
source "/projects/HSC/LSST/stack/loadLSST.sh"
```
Once the science pipelines have been set up, we can now set up the main LSST software package, `lsst_distrib`, using `setup`:
```bash=
setup lsst_distrib
```
It's possible to set up a specific tagged version of `lsst_distrib` using `-t`:
```bash=
setup lsst_distrib -t w_2022_26
```
Check which version of `lsst_distrib` is set up using `eups`:
```bash=
eups list lsst_distrib | grep setup
# g0b29ad24fb+e8b8cae3ca current w_2022_26 setup
```
### 1.2. Register new filters
This step is only required if the data to be ingested uses a filter which is not already defined. Before raw science frames can be ingested, all necessary filters need to be defined in the relevant `obs_` package (and also in the `skymap` package; see the end of this section for further details). Here, the relevant package is [obs_decam](https://github.com/lsst/obs_decam), and the filters file is located at [obs_decam/python/lsst/obs/decam/decamFilters.py](https://github.com/lsst/obs_decam/blob/master/python/lsst/obs/decam/decamFilters.py).
For this example, the required observation filter (`N708 DECam c0012 7080.0 400.0`) was not previously defined and had to be added manually. This modification has since been merged into the main branch, but the instructions are maintained here for reference. First, `git clone` the `obs_decam` package into a local directory:
```bash=
OBSDECAM=/home/lkelvin/repos/obs_decam
git clone git@github.com:lsst/obs_decam.git $OBSDECAM
cd $OBSDECAM
```
If this is the first time the package has been cloned, it will also need to be built using `scons` (as with all Science Pipelines packages), e.g.:
```bash=
scons -j8
```
Next, we check out a user branch from the main branch to work on:
```bash=
git checkout -b u/lskelvin/merian
```
Now add the relevant filter definition. In this case:
```python=
FilterDefinition(physical_filter="N708 DECam c0012 7080.0 400.0",
                 band="N708",
                 lambdaEff=708),
```
Finally, make sure both `lsst_distrib` and the relevant `obs_` package (`obs_decam` here) are set up in the working shell:
```bash=
setup -j -r $OBSDECAM
```
Double check that the local package has been loaded using:
```bash=
eups list | grep LOCAL
# obs_decam LOCAL:/home/lkelvin/repos/obs_decam
```
Once complete, subsequent processing should be able to proceed. If a warning similar to `ingest WARN: Exposure DECam:ct4m20210318t032843 could not be registered: (sqlite3.IntegrityError) FOREIGN KEY constraint failed` is returned, check that all filters are correctly assigned in the filters file.
Finally, `refObjLoader` lookups for the new filter need to be added to a number of `obs_decam` config files to facilitate astrometric matching.
This allows data processing to proceed beyond `characterizeImage` and through `calibrate`, the final step required to produce a `calexp`.
Here, we map the new `N708` filter onto the existing `i`-band filter (the nearest broad-band filter in wavelength) by adding lines similar to:
```python=
refObjLoader.filterMap['N708'] = 'i'
```
into:
```bash=
config/characterizeImage.py
config/calibrate.py
config/measureCoaddSources.py
```
> Note: Ticket [DM-30692](https://jira.lsstcorp.org/browse/DM-30692) added these additional config lines into the main branch.
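For context, a minimal sketch of what such a config override file might contain; the `config.` prefix and the mapping of `N540` onto `g` (its nearest broad band) are assumptions based on standard pex_config override files, not verbatim `obs_decam` contents:
```python=
# e.g., config/characterizeImage.py in obs_decam (sketch, not verbatim)
# map each Merian narrow-band filter onto the nearest broad-band filter
# so that reference catalogue fluxes can be looked up during matching
config.refObjLoader.filterMap["N708"] = "i"
config.refObjLoader.filterMap["N540"] = "g"
```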
:::warning
New filters also need to be registered in the `skymap` repository. Central wavelengths for all required filters should be added to `python/lsst/skymap/packers.py`.
:::
### 1.3. Create a new butler
A new butler will be created in the directory `/projects/MERIAN/repo`. Here we set an environment variable pointing to the output repository directory, and create it:
```bash=
REPO=/projects/MERIAN/repo
mkdir -p $REPO
chmod ug+rw $REPO
```
Although optional, it may also be desirable to construct a log directory in which to store log files:
```bash=
LOGDIR=/projects/MERIAN/logs
mkdir -p $LOGDIR
chmod ug+rw $LOGDIR
```
If this repository will be used by more than one user, modify the permissions of the output repository directory to ensure that all files constructed below are writeable by all members of that user group:
```bash=
cd $REPO
umask 2
```
> Note: if changing permissions after the butler has been used, and if using an SQLite database (see below), you will also need to run `chmod ug+rw gen3.sqlite3` to make the SQLite database read/writable to all members of your group. You will also need to run `chmod ug+rw u` to make the user output directory (here named `u`) read/writable to all members of your group.
Next, an empty Gen3 Butler repository is created, and then the instrument is registered in the data repository. In this example, the instrument is the Dark Energy Camera (DECam).
There are two types of database that can be constructed for use with the butler: either an SQLite database or a PostgreSQL database. The former is the default, and simpler to set up. The latter provides significantly improved data processing times, but requires a PostgreSQL database to have been set up on the data processing machine in advance. Both methods are summarized in the subsections below:
:::spoiler {state="closed"} **Option 1: Create a SQLite database (quickest and easiest)**<br><br>
On the command line, create a butler repo:
```bash=
butler create $REPO
```
This constructs a `butler.yaml` file in the `$REPO` directory.
> Note: after the `gen3.sqlite3` file has been constructed, you may have to manually add write permissions for group members by running the command: `chmod g+w gen3.sqlite3`.
:::
:::spoiler {state="closed"} **Option 2: Create a PostgreSQL database (most efficient)**<br><br>
Before beginning to create a butler, a PostgreSQL database must first be set up on the primary data processing machine.
Once a PostgreSQL server has been set up, it may be necessary to enable the `btree_gist` extension. To do so, log in to the server:
```bash=
psql -h localhost -U USERNAME SERVERNAME
```
and then create the extension:
```bash=
CREATE EXTENSION "btree_gist";
```
If hitting a superuser permissions issue, it may be necessary to reach out to the server admins to create this extension on your behalf.
With the server set up, a number of extra Science Pipelines configuration files also need to be in place. First, construct the seed config file. It is recommended that this file is constructed in, for example, `$REPO/seed-config.yaml`. Assuming the database is named 'merian', the contents of this file should look like this:
```yaml=
datastore:
  root: <butlerRoot>
registry:
  db: postgresql+psycopg2://localhost:5432/merian
```
> Note: this file need only be constructed once, for the purposes of creating the butler.
Second, a Science Pipelines authentication file needs to be created at `~/.lsst/db-auth.yaml`. The contents of this file should look like this:
```yaml=
- url: postgresql://localhost:5432/merian
  username: merian
  password: MYSECRETPASSWORD
```
where `MYSECRETPASSWORD` is the database password for the `merian` database, for the `merian` user (the database and the username are the same in the example above, but do not necessarily need to be).
> Note: each user who wishes to interact with the butler repository needs to place a copy of this authentication file within their own home space.
The authentication file must be readable only by the user's account (`chmod 600 db-auth.yaml`).
Finally, create the butler repo:
```bash=
butler create --seed-config seed-config.yaml $REPO
```
This constructs a `butler.yaml` file in the `$REPO` directory.
:::
### 1.4. Register the instrument
Once the butler repo has been created, register the instrument:
```bash=
butler register-instrument $REPO lsst.obs.decam.DarkEnergyCamera
```
The `register-instrument` command will need to be re-run (once only) every time a new filter is added to the filter definitions file.
> Note: the instrument name here needs to be the fully qualified name of an instrument subclass. Full names can be inferred from their respective `obs_` package at [github.com/lsst](https://github.com/lsst). For this example, the relevant `obs_` package is [obs_decam](https://github.com/lsst/obs_decam) and the fully qualified name is `lsst.obs.decam.DarkEnergyCamera`.
Finally, double check that all required filters are correctly registered with the butler:
```bash=
butler query-dimension-records $REPO physical_filter
```
In this case, double check that `N708 DECam c0012 7080.0 400.0` appears in the filter list.
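The same check can be performed in Python via the butler registry; a minimal sketch, assuming the repo path used throughout this note:
```python=
import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
# print the Merian physical filters currently registered for DECam
for rec in butler.registry.queryDimensionRecords("physical_filter", instrument="DECam"):
    if rec.name.startswith("N708") or rec.name.startswith("N540"):
        print(rec.name, rec.band)
```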
### 1.5. Generate reference catalogues
The Science Pipelines require reference catalogues ("*refcats*") to accurately calibrate photometric and astrometric results. Two reference catalogues are required here: [Gaia DR2](https://community.lsst.org/t/gaia-dr2-reference-catalog-in-lsst-format) for astrometry, and [Pan-STARRS PS1](https://community.lsst.org/t/pan-starrs-reference-catalog-in-lsst-format) for photometry. Further information is also available on [the Community forum](https://community.lsst.org/t/gaia-dr2-reference-catalog-in-lsst-format/3901) and on [pipelines.lsst.io](https://pipelines.lsst.io/modules/lsst.meas.algorithms/creating-a-reference-catalog.html).
A number of different methods are available to ingest these catalogues. These steps are summarized below:
:::spoiler {state="closed"} **Option 1: Using `butler ingest-files`**<br><br>
#### Ingesting survey reference catalogues
The first step in constructing these reference catalogues is to gather the catalogue data together and ingest the files. This process is decidedly non-trivial, and may require several hours to complete even on a high-powered machine.
If ingesting for the first time, all raw files can be downloaded at the link described in [this comment on the Community forum](https://community.lsst.org/t/gaia-dr2-reference-catalog-in-lsst-format/3901/6).
Fortunately for our purposes, these downloaded FITS files already exist on the machine used here, and can be used directly:
```bash=
GAIADR2=/projects/HSC/refcats/htm/gaia_dr2_20200414
PANSTARRSPS1=/projects/HSC/refcats/htm/ps1_pv3_3pi_20170110
```
Once the files are in place, we create astropy-readable `.ecsv` table files containing one row per input file in each reference catalogue. To construct these, in Python:
```python=
import os
import glob

import astropy.table

# output directory to save .ecsv files
outdir = "/home/lkelvin"
# full paths to LSST sharded reference catalogues
gaiadr2 = "/projects/HSC/refcats/htm/gaia_dr2_20200414"
panstarrsps1 = "/projects/HSC/refcats/htm/ps1_pv3_3pi_20170110"
refcat_dirs = [gaiadr2, panstarrsps1]

# loop over each FITS file in all refcats
# note: this constructs a series of .ecsv files, each containing two columns:
# 1) the FITS filename, and 2) the htm7 pixel index
for refcat_dir in refcat_dirs:
    outfile = f"{outdir}/{os.path.basename(refcat_dir)}.ecsv"
    print(f"Saving to: {outfile}")
    table = astropy.table.Table(names=("filename", "htm7"), dtype=("str", "int"))
    files = glob.glob(f"{refcat_dir}/[0-9]*.fits")
    for ii, file in enumerate(files):
        print(f"{ii}/{len(files)} ({100*ii/len(files):0.1f}%)", end="\r")
        # try/except to catch extra .fits files which may be in this dir
        try:
            file_index = int(os.path.basename(os.path.splitext(file)[0]))
        except ValueError:
            continue
        else:
            table.add_row((file, file_index))
    table.write(outfile)
```
> Note: the above script running on the `tiger` machine took ~20 minutes, 10 minutes per reference catalogue.
A `.ecsv` file should now exist for each reference catalogue. Next, register the dataset types for each reference catalogue with the butler:
```bash=
butler register-dataset-type $REPO gaia_dr2_20200414 SimpleCatalog htm7
butler register-dataset-type $REPO ps1_pv3_3pi_20170110 SimpleCatalog htm7
```
Check that both the Gaia DR2 and Pan-STARRS PS1 dataset types are now available using:
```bash=
butler query-dataset-types $REPO
```
Finally, ingest the LSST-formatted files into the `refcats/gen2` RUN collection in the repository:
```bash=
butler ingest-files -t link $REPO gaia_dr2_20200414 refcats/gen2 gaia_dr2_20200414.ecsv
butler ingest-files -t link $REPO ps1_pv3_3pi_20170110 refcats/gen2 ps1_pv3_3pi_20170110.ecsv
```
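As a quick sanity check, the ingested shards can be counted and one retrieved through the butler; a sketch, assuming the repo and RUN collection used above:
```python=
import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
refs = list(butler.registry.queryDatasets("gaia_dr2_20200414",
                                          collections="refcats/gen2"))
print(f"{len(refs)} Gaia DR2 shards ingested")
# retrieve a single shard to confirm the linked files resolve on disk
cat = butler.get("gaia_dr2_20200414", dataId=refs[0].dataId,
                 collections="refcats/gen2")
print(len(cat), "rows in shard", refs[0].dataId["htm7"])
```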
:::
:::spoiler {state="closed"} **Option 2: Using `convert_refcats.py`**<br><br>
#### Legacy reference catalogue conversion
When these instructions were first written, it was not possible to convert these refcats directly into a gen3 repo (as we're setting up here). The instructions below are preserved for reference, but should no longer be required (assuming the above gen3 approach suffices). Here we use `butler convert` to map these data into our gen3 repo.
First, set up an empty temporary gen2 butler directory, and link the two reference datasets into this directory within a folder named `ref_cats`:
```bash=
GEN2REPO=/projects/MERIAN/repo_gen2
mkdir -p $GEN2REPO/ref_cats
echo "lsst.obs.decam.DecamMapper" > $GEN2REPO/_mapper
ln -s $GAIADR2 $GEN2REPO/ref_cats/
ln -s $PANSTARRSPS1 $GEN2REPO/ref_cats/
```
Next, a simple config file is required:
```bash=
echo 'config.refCats.append("gaia_dr2_20200414")
config.runs["gaia_dr2_20200414"] = "refcats"
config.refCats.append("ps1_pv3_3pi_20170110")
config.runs["ps1_pv3_3pi_20170110"] = "refcats"
config.doRegisterInstrument = False' \
> $GEN2REPO/convert_refcats.py
```
Finally, we need to run the Science Pipelines refcat conversion script [rootRepoConverter.py](https://github.com/lsst/obs_base/blob/master/python/lsst/obs/base/gen2to3/rootRepoConverter.py).
> Note: at the time of writing, a modified version of `rootRepoConverter.py` was required to complete the `butler convert` command below without raising errors. Specifically, [L137-L140](https://github.com/lsst/obs_base/blob/df941304a811ff8835745fa3b69621823ac968bd/python/lsst/obs/base/gen2to3/rootRepoConverter.py#L137-L140) (from `if not self.task.dry_run:` to `else:`) were commented out, [L141](https://github.com/lsst/obs_base/blob/df941304a811ff8835745fa3b69621823ac968bd/python/lsst/obs/base/gen2to3/rootRepoConverter.py#L141) (`self.task.log.info...`) was unindented, and [L208-L226](https://github.com/lsst/obs_base/blob/df941304a811ff8835745fa3b69621823ac968bd/python/lsst/obs/base/gen2to3/rootRepoConverter.py#L208-L226) (from `if self._refCats:` to `self._chain.append(chained)`) were commented out. This approach is not recommended for general use. This script also adds curated calibrations, negating the relevant section below.
>
> To set up this package, run `setup -j -r ~/repos/obs_base` and either switch to a branch with the changes listed above in place, or make the changes temporarily to the code in place.
>
> Once the command below has completed successfully, any changes made to the master branch can be undone using this on the command line: `git reset --hard origin/master`.
>
> At present, ticket [DM-30624](https://jira.lsstcorp.org/browse/DM-30624) now makes ingesting refcats from a gen2 repo easier.
To run the butler conversion script:
```bash=
LOGFILE=$LOGDIR/convert_refcats.log; \
date | tee $LOGFILE; \
butler convert $REPO \
--gen2root $GEN2REPO \
-C $GEN2REPO/convert_refcats.py \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
> Note: the runtime for this command was ~20 minutes.
:::
:::spoiler {state="closed"} **Option 3: Using `butler transfer-datasets`**<br><br>
#### Repo-to-Repo Refcat Dataset Transfer
If a butler already exists on the machine, the refcat datasets can be transferred over from the existing butler directly:
```bash=
LOGFILE=$LOGDIR/transfer_refcats.log; \
date | tee $LOGFILE; \
butler transfer-datasets --register-dataset-types \
-t copy --collections refcats/gen2 \
/projects/MERIAN/repo $REPO \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
> Note: on the `lsst-devl` machine at NCSA, the runtime for this command was ~60 minutes. The transfer type can be a number of different options, such as `copy` to make a copy of the files, or `link` to make only a symbolic link.
:::
Once the refcats are in place, their collection can be confirmed:
```bash=
butler query-collections $REPO "refcats*"
```
Following a dataset transfer, it may be necessary to set up the parent `refcats` CHAINED collection, which contains a comma-separated list of all of the transferred child `refcats/...` RUN collections:
```bash=
PARENT=refcats; \
CHILDREN=refcats/gen2; \
butler collection-chain $REPO $PARENT $CHILDREN
```
### 1.6. Register the skyMap
Make a sky map and add it to the repository. Sky maps exist as dimensions, datasets and collections. Until ticket [DM-34516](https://jira.lsstcorp.org/browse/DM-34516) was merged, DECam data reductions made use of the HSC sky map, `hsc_rings_v1`. DECam data reductions now instead use a DECam-specific sky map, which maintains the native pixel scale of DECam imaging. By default, DECam tracts have the same centroids as their HSC counterparts, albeit with fewer patches per tract. An example DECam tract is shown below:
![DECam tract 9704](https://jira.lsstcorp.org/secure/attachment/59780/decam9704.png "DECam tract 9704")
:::info
DECam Tract 9704 as defined using the `decam_rings_v1` sky map. DECam tracts have the same centroids as HSC tracts (as defined by the `hsc_rings_v1` sky map). Each DECam tract is further sub-divided into 6x6 patches, each of ~4k pixels on a side.
:::
This command registers the DECam `skyMap` dataset using the default `obs_decam` configuration, setting the output name to the commonly used, de facto standard `decam_rings_v1`:
```bash=
LOGFILE=$LOGDIR/register_decam_rings_v1.log; \
date | tee $LOGFILE; \
butler register-skymap $REPO \
-C $OBS_DECAM_DIR/config/makeSkyMap.py \
-c name='decam_rings_v1' \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
> Note: the runtime for this command was ~5 minutes.
If the HSC skymap is also required, it may be generated as well:
```bash=
LOGFILE=$LOGDIR/register_hsc_rings_v1.log; \
date | tee $LOGFILE; \
butler register-skymap $REPO \
-C $OBS_SUBARU_DIR/config/makeSkyMap.py \
-c name='hsc_rings_v1' \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
Check that all required `skyMap` dataset types now exist in the `skymaps` run collection in the Merian repo:
```bash=
butler query-datasets $REPO skyMap
```
> Note: this sky map step doesn't strictly need to be performed until much later; however, if any errors occur during registration of the sky map, it may be necessary to delete the repo and start afresh. For this reason, it's usually better to perform this step as soon as possible.
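The registered sky map can also be inspected in Python; a minimal sketch, with the `getCtrCoord`/`getNumPatches` method names assumed from the `lsst.skymap` tract API and the repo path as above:
```python=
import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
skymap = butler.get("skyMap", skymap="decam_rings_v1", collections="skymaps")

# inspect the tract used throughout this note
tract = skymap[9813]
print(tract.getCtrCoord())    # tract centre; should match its HSC counterpart
print(tract.getNumPatches())  # 6x6 patches per DECam tract
```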
### 1.7. Write curated calibrations
Curated calibrations are collections of calibration data which describe various aspects of the camera and survey. If setting up a new butler on a new machine, an instrument's curated calibrations will need to be added to the data repository:
```bash=
butler write-curated-calibrations $REPO lsst.obs.decam.DarkEnergyCamera
```
> Note: if the modified version of `rootRepoConverter.py` was used above, the dataset types added by this command may already have been ingested into the repo. If so, running the above command will fail with an error similar to `A database constraint failure was triggered by inserting one or more datasets of type DatasetType('camera', {instrument}, Camera, isCalibration=True) into collection 'DECam/calib/unbounded'. This probably means a dataset with the same data ID and dataset type already exists, but it may also mean a dimension row is missing.`.
The instrument may be specified via either the fully qualified name, as above, or the short name (`DECam` here). The `-h` help text indicates the former is required, but this advice may change in the future.
This currently adds `camera`, `crosstalk`, `defects` and `linearizer` dataset types into the repository within a number of collections (`DECam/calib`, `DECam/calib/unbounded`, and `DECam/calib/curated/{timestamp}`). Check the current collections within the repo using:
```bash=
butler query-collections $REPO "DECam/calib*"
```
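These curated calibrations can likewise be spot-checked from Python; a sketch, using an `...` (Ellipsis) dataset-type expression to match everything in the unbounded collection:
```python=
import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
# list every dataset written by write-curated-calibrations into the
# unbounded collection (camera, crosstalk, defects, linearizer)
for ref in set(butler.registry.queryDatasets(..., collections="DECam/calib/unbounded")):
    print(ref.datasetType.name, ref.dataId)
```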
## 2. Ingest raw data
### 2.1. Ingest raw science frames
We're now prepared to ingest raw science frames. If raw frames are being stored in multiple directories, this command needs to be repeated for each directory. Alternatively, a sufficient glob which is able to locate all files of interest may be supplied. Here's an example data ingest command:
```bash=
LOGFILE=$LOGDIR/merian9813_ingest_science.log; \
SCIFILES=/project/lskelvin/decam/raw-merian/science/raw_*.fz; \
date | tee $LOGFILE; \
butler ingest-raws $REPO $SCIFILES --transfer link \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
> Note: The runtime for this command was ~5 minutes, ingesting 2480 distinct Butler datasets from 40 exposures.
Raw science exposures have now been added to the `DECam/raw/all` collection in the Merian repo. Collections define groups of data, and can be listed (and searched) using:
```bash=
butler query-collections $REPO "DECam/raw/all"
```
Ingested exposures can be listed on the command line using:
```bash=
butler query-dimension-records $REPO exposure \
--where "instrument='DECam' AND exposure.observation_type='science'"
```
This view shows all available *dimensions* associated with each ingested science image, including the observation ID, the physical filter, and the observation type. Alternatively, datasets can be queried directly using `query-datasets`, with optional SQL-like `--where` arguments to search specific dimensions, e.g.:
```bash=
WHERE="instrument='DECam' AND exposure=971666"
WHERE="instrument='DECam' AND detector=1"
WHERE="instrument='DECam'
AND exposure.observation_type='science'
AND exposure.day_obs > 20220101
AND exposure.day_obs < 20220201
AND detector=1"
butler query-datasets $REPO --where $WHERE raw
```
> Note: To successfully use the `--where` argument, other dimensions may be required, such as `instrument`. The butler will complain with a `UserExpressionError` if a required dimension is not found.
A list of science exposure IDs can similarly be extracted within python:
```python=
import lsst.daf.butler as dafButler

# assumes the butler repo set up above
butler = dafButler.Butler("/projects/MERIAN/repo")

queryData = butler.registry.queryDatasets
where = "exposure.observation_type='science' AND detector=1"
exps = list(queryData("raw", collections="DECam/raw/all",
                      instrument="DECam", where=where))
expids = tuple(x.dataId["exposure"] for x in exps)
print(f'SCIEXPS="{expids}"')
```
The test dataset used here returns this list of science exposures:
```bash=
SCIEXPS="(971666, 971667, 971668, 971669, 971670, 971671, 971672, 971673, 971674, 971675, 971676, 971677, 971678, 971679, 971680, 971681, 971682, 971683, 971684, 971685, 1068554, 1068555, 1068556, 1068557, 1068558, 1068559, 1068560, 1068561, 1068562, 1068713, 1068714, 1068715, 1068716, 1068717, 1068718, 1068719, 1068720, 1068721, 1068722, 1068723)"
```
### 2.2. Ingest raw bias frames
As with the raw science frames above, raw bias frames ('*zero*') are also ingested:
```bash=
LOGFILE=$LOGDIR/merian9813_ingest_bias.log; \
BIASFILES=/project/lskelvin/decam/raw-merian/bias/raw_*.fz; \
date | tee $LOGFILE; \
butler ingest-raws $REPO $BIASFILES --transfer link \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
> Note: The runtime for this command was ~1 minute, ingesting 3100 distinct Butler datasets from 50 exposures.
Bias exposures have now been ingested into the Merian repo. Check that the bias calibration frames have been successfully ingested using:
```bash=
butler query-dimension-records $REPO exposure \
--where "instrument='DECam' AND exposure.observation_type='zero'"
```
A list of bias exposure IDs can be extracted within python:
```python=
queryData = butler.registry.queryDatasets
where = "exposure.observation_type='zero' AND detector=1"
exps = list(queryData("raw", collections="DECam/raw/all",
                      instrument="DECam", where=where))
expids = tuple(x.dataId["exposure"] for x in exps)
print(f'BIASEXPS="{expids}"')
```
The test dataset used here returns this list of bias exposures:
```bash=
BIASEXPS="(970488, 970489, 970490, 970491, 970492, 970493, 970494, 970495, 970496, 970497, 970498, 970589, 970590, 970591, 970592, 970593, 970594, 970595, 970596, 970597, 970598, 970599, 970823, 970824, 970825, 970826, 970827, 970828, 970829, 970830, 970831, 970832, 970833, 970924, 970925, 970926, 970927, 970928, 970929, 970930, 970931, 970932, 970933, 970934, 971161, 971162, 971163, 971164, 971165, 971166)"
```
### 2.3. Ingest raw flat frames
The final set of data to be ingested are the raw flat frames ('*dome flat*'):
```bash=
LOGFILE=$LOGDIR/merian9813_ingest_flat.log; \
FLATFILES=/project/lskelvin/decam/raw-merian/flat/raw_*.fz; \
date | tee $LOGFILE; \
butler ingest-raws $REPO $FLATFILES --transfer link \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
> Note: The runtime for this command was ~5 minutes, ingesting 6200 distinct Butler datasets from 100 exposures.
Flat exposures have now been ingested into the Merian repo. Check that the flat calibration frames have been successfully ingested using:
```bash=
butler query-dimension-records $REPO exposure \
--where "instrument='DECam' AND exposure.observation_type='dome flat'"
```
> Note: if an attempt is made to ingest a file which has already been ingested, the science pipelines will fail for that particular file. This behaviour is expected, and not a cause for concern.
A list of flat exposure IDs can be extracted within python:
```python=
queryData = butler.registry.queryDatasets
where = "exposure.observation_type='dome flat' AND detector=1"
exps = list(queryData("raw", collections="DECam/raw/all",
                      instrument="DECam", where=where))
expids = tuple(x.dataId["exposure"] for x in exps)
print(f'FLATEXPS="{expids}"')
```
The test dataset used here returns this list of flat exposures:
```bash=
FLATEXPS="(970228, 970229, 970230, 970231, 970232, 970233, 970234, 970235, 970236, 970237, 970238, 970501, 970502, 970503, 970504, 970505, 970506, 970507, 970508, 970509, 970510, 970511, 970836, 970837, 970838, 970839, 970840, 970841, 970842, 970843, 970844, 970845, 970846, 971174, 971175, 971176, 971177, 971178, 971179, 971180, 971181, 971182, 971183, 971184, 971554, 971555, 971556, 971557, 971558, 971559, 1052706, 1052707, 1052708, 1052709, 1052710, 1052711, 1052712, 1052713, 1052714, 1052715, 1052716, 1053093, 1053094, 1053095, 1053096, 1053097, 1053098, 1053099, 1053100, 1053101, 1053102, 1053103, 1053485, 1053486, 1053487, 1053488, 1053489, 1053490, 1053491, 1053492, 1053493, 1053494, 1053495, 1053858, 1053859, 1053860, 1053861, 1053862, 1053863, 1053864, 1053865, 1053866, 1053867, 1053868, 1054287, 1054288, 1054289, 1054290, 1054291, 1054292)"
```
### 2.4. Define visits
Once all raw data has been ingested, we can define visits from exposures in the butler registry. This sets up the visit IDs within the butler, allowing future runs to use this information in `-d` or `--where` data queries. Without this step, processing steps after ISR (i.e., `characterizeImage` onwards) will fail with `RuntimeError: QuantumGraph is empty.`.
```bash=
LOGFILE=$LOGDIR/merian9813_define_visits.log; \
date | tee $LOGFILE; \
butler define-visits $REPO lsst.obs.decam.DarkEnergyCamera \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
> Note: this run took ~1 minute with these data. Do *not* use the `-j N` syntax here to run this process on more than one processor. Doing so will cause multiple `database is locked` errors as each processor attempts to write to the same butler. Ticket [DM-30607](https://jira.lsstcorp.org/browse/DM-30607) references this issue.
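A quick check that visits are now defined; a minimal sketch in Python, assuming the repo path used throughout this note:
```python=
import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
visits = list(butler.registry.queryDimensionRecords("visit", instrument="DECam"))
print(f"{len(visits)} visits defined")
# each record carries the visit ID, physical filter, exposure time, etc.
print(visits[0])
```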
## 3. Calibration
### 3.1. Determine calibration frame validity ranges
Now that raw bias (zero) and dome flat calibration frames have been ingested, validity date ranges need to be determined. Prior analyses of these data show that all Merian observation nights are consistent with each other to within a 1% flux deviation. For this reason, we opt to construct calibration frames which are certified across the entire timespan of our data: from 2021-01-01 to 2022-06-30.
### 3.2. Build bias frames
Next, the master bias frames are built. These frames need to be built for each valid date range (see section above).
First, check the build pipeline:
```bash=
pipetask build \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpBias.yaml \
--show pipeline
```
:::spoiler Click here to toggle the DECam `cpBias` pipeline at the time of writing.<br><br>
```yaml=
description: cp_pipe BIAS calibration construction for DECam
instrument: lsst.obs.decam.DarkEnergyCamera
tasks:
  isr:
    class: lsst.ip.isr.IsrTask
    config:
    - connections.ccdExposure: raw
      connections.outputExposure: cpBiasProc
      doWrite: true
      doBias: false
      doVariance: true
      doLinearize: false
      doCrosstalk: false
      doBrighterFatter: false
      doDark: false
      doFlat: false
      doApplyGains: false
      doFringe: false
  cpBiasCombine:
    class: lsst.cp.pipe.cpCombine.CalibCombineTask
    config:
    - connections.inputExpHandles: cpBiasProc
      connections.outputData: bias
      calibrationType: bias
      exposureScaling: Unity
contracts:
- contract: cpBiasCombine.calibrationType == "bias"
- contract: cpBiasCombine.exposureScaling == "Unity"
- contract: isr.doBias == False
```
:::
The `cpBias` pipeline may also be viewed graphically:
```bash=
pipetask build \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpBias.yaml \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_cpBias.pdf
```
{%pdf https://tigress-web.princeton.edu/~lkelvin/pipelines/current/cp_pipe/DarkEnergyCamera/cpBias/pipeline_cp_pipe_DarkEnergyCamera_cpBias.pdf %}
[pipeline_cpBias.pdf](https://tigress-web.princeton.edu/~lkelvin/pipelines/current/cp_pipe/DarkEnergyCamera/cpBias/pipeline_cp_pipe_DarkEnergyCamera_cpBias.pdf)
Build master bias (zero) frames, ensuring that all required input collections are given as arguments to `-i`:
```bash=
LOGFILE=$LOGDIR/merian9813_cpBias.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i DECam/raw/all,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded \
-o DECam/calib/merian9813/bias \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpBias.yaml \
-d "instrument='DECam' AND exposure IN $BIASEXPS" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
> Note: This job contained 3162 quanta, with a runtime of ~2 hours. The `-d` data query takes a wide range of SQL-like arguments. Instead of specifying exposures by their exposure ID, it may be preferable to build them by date instead, e.g.: `"exposure.observation_type='zero' AND exposure.day_obs > 20210910"`.
To avoid an error regarding missing `defects` dataset types, an input collection containing `defects` must also be supplied to the `cpBias` run. Here, these data were ingested into the repo when running `write-curated-calibrations` (section 1.7). The instructions here make use of the `DECam/calib/curated/19700101T000000Z` collection. Other collections containing `defects` are also available; however, in some, the commonly unused detectors are missing (i.e., there are <62). If `defects` data are not available at all, adding `-c isr:doDefect=False` to the `pipetask run` command will disable defect masking when running the `cpBias` pipeline.
On occasion, some of the tasks (quanta) may fail, likely due to memory issues. In such cases, an afterburner can be run on a single core to retry the failed tasks. To do so, add `--extend-run` and `--skip-existing` to the `pipetask run` command, and remove `-j N` to prevent it from running on multiple cores. This helps ensure that the most memory-intensive quanta do not request too much memory simultaneously.
Check the collections, dataset types and datasets now present in the repo:
```bash=
butler query-collections $REPO "*bias*"
butler query-dataset-types $REPO
butler query-datasets $REPO --collections DECam/calib/merian9813/bias
butler query-datasets $REPO --collections DECam/calib/merian9813/bias bias
```
### 3.3. Certify bias frames
Certify the biases for a given date range. Arguments: `REPO`, `INPUT_COLLECTION`, `OUTPUT_COLLECTION`, `DATASET_TYPE_NAME`:
```bash=
butler certify-calibrations \
$REPO DECam/calib/merian9813/bias DECam/calib/merian9813 bias \
--begin-date 2021-01-01T00:00:00 --end-date 2022-06-30T23:59:59
```
You may check what certified date ranges have been applied to the bias data in Python by querying dataset associations in the output collection. For example, to check only detector #1:
```python=
qda = butler.registry.queryDatasetAssociations
coll = "DECam/calib/merian9813"
biases = [x for x in qda("bias", collections=coll) if x.ref.dataId["detector"] == 1]
print(biases)
```
which produces a list of all biases relating to detector #1 (in the case of this example document, there should be only 1 result at present). Inspecting the properties of this object gives the timespan, e.g.:
```python=
print(f"{biases[0].timespan.begin.value = }")
# biases[0].timespan.begin.value = '2021-01-01 00:00:00.000000'
print(f"{biases[0].timespan.end.value = }")
# biases[0].timespan.end.value = '2022-06-30 23:59:59.000000'
```
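With the biases certified, the butler can resolve the correct bias for any science exposure; a sketch, assuming the standard calibration lookup by exposure timespan (the exposure ID is one of the science exposures ingested above):
```python=
import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
# the CALIBRATION collection lookup uses the exposure's observation time
bias = butler.get("bias", instrument="DECam", detector=1, exposure=971666,
                  collections="DECam/calib/merian9813")
print(bias.getBBox())
```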
### 3.4. Generate crosstalk sources
The next step is to generate crosstalk sources using step 0 of the Data Release Production (DRP) pipeline (`DRP.yaml`). Crosstalk sources need to be generated for any raw frames on which we want to run full ISR (i.e., raw flats and raw science frames). Step 0 of `DRP.yaml` runs only the `doOverscan` aspect of the ISR (instrument signature removal) task. It can be visualized using:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \
--show pipeline
```
:::spoiler Click here to toggle the Merian step 0 pipeline at the time of writing.<br><br>
```yaml=
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  isrForCrosstalkSources:
    class: lsst.ip.isr.IsrTask
    config:
    - connections.outputExposure: overscanRaw
      doOverscan: true
      doAssembleCcd: false
      doBias: false
      doCrosstalk: false
      doVariance: false
      doLinearize: false
      doDefect: false
      doNanMasking: false
      doDark: false
      doFlat: false
      doFringe: false
      doInterpolate: false
subsets:
  step0:
    subset:
    - isrForCrosstalkSources
    description: |
      Tasks which should be run once, prior to initial data processing.
      This step generates crosstalk sources for ISR/inter-chip crosstalk by
      applying overscan correction on raw frames. A new dataset is written,
      which should be used as an input for further data processing.
```
:::
The current step 0 pipeline may also be viewed graphically:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step0.pdf
```
{%pdf https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step0.pdf %}
[pipeline_step0.pdf](https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step0.pdf)
Run `step0` for raw flats:
```bash=
LOGFILE=$LOGDIR/merian9813_step0_flat.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i DECam/raw/all,DECam/calib/merian9813,DECam/calib/unbounded \
-o DECam/calib/merian9813/crosstalk \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \
-d "instrument='DECam' AND exposure IN $FLATEXPS" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
> Note: this run contained 6200 quanta, with a runtime of ~2 hours. The final run message for these data was: `Executed 6200 quanta successfully, 0 failed and 0 remain out of total 6200 quanta.`
Extend the `crosstalk` RUN collection to also include science exposures:
```bash=
LOGFILE=$LOGDIR/merian9813_step0_science.log; \
date | tee -a $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
--extend-run --skip-existing \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i DECam/raw/all,DECam/calib/merian9813,DECam/calib/unbounded \
-o DECam/calib/merian9813/crosstalk \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step0 \
-d "instrument='DECam' AND exposure IN $SCIEXPS" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
> Note: this run contained 2480 quanta, with a runtime of ~30 minutes. The final run message for these data was: `Executed 2480 quanta successfully, 0 failed and 0 remain out of total 2480 quanta.`
The `overscanRaw` dataset types should now be available in the output repository. Check the collections and datasets:
```bash=
butler query-collections $REPO
butler query-datasets $REPO overscanRaw
```
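A rough count of the overscan-corrected frames written by step 0 can also be made in Python; a sketch, with the collection name matching the `-o` argument used above:
```python=
import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo")
# count the distinct overscan-corrected frames written by step 0
n = len(set(butler.registry.queryDatasets(
    "overscanRaw", collections="DECam/calib/merian9813/crosstalk")))
print(f"{n} overscanRaw datasets")  # expect 6200 + 2480 = 8680 for these data
```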
### 3.5. Build flat frames
This step constructs the master flat frames (which requires using the biases). The `cpFlat` pipeline can be visualized using:
```bash=
pipetask build \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpFlat.yaml \
--show pipeline
```
:::spoiler Click here to toggle the DECam `cpFlat` pipeline at the time of writing.<br><br>
```yaml=
description: cp_pipe FLAT calibration construction for DECam
instrument: lsst.obs.decam.DarkEnergyCamera
tasks:
  isr:
    class: lsst.ip.isr.IsrTask
    config:
    - connections.ccdExposure: raw
      connections.outputExposure: cpFlatProc
      doWrite: true
      doBrighterFatter: false
      doFlat: false
      doFringe: false
      doApplyGains: false
      connections.crosstalkSources: overscanRaw
      connections.bias: bias
      doDark: false
  cpFlatMeasure:
    class: lsst.cp.pipe.cpFlatNormTask.CpFlatMeasureTask
    config:
    - connections.inputExp: cpFlatProc
      connections.outputStats: flatStats
      doVignette: false
  cpFlatNorm:
    class: lsst.cp.pipe.cpFlatNormTask.CpFlatNormalizationTask
    config:
    - connections.inputMDs: flatStats
      connections.outputScales: cpFlatNormScales
      level: AMP
  cpFlatCombine:
    class: lsst.cp.pipe.cpCombine.CalibCombineByFilterTask
    config:
    - connections.inputExpHandles: cpFlatProc
      connections.inputScales: cpFlatNormScales
      connections.outputData: flat
      calibrationType: flat
      exposureScaling: InputList
      scalingLevel: AMP
contracts:
- contract: isr.doFlat == False
```
:::
The `cpFlat` pipeline may also be viewed graphically:
```bash=
pipetask build \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpFlat.yaml \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_cpFlat.pdf
```
{%pdf https://tigress-web.princeton.edu/~lkelvin/pipelines/current/cp_pipe/DarkEnergyCamera/cpFlat/pipeline_cp_pipe_DarkEnergyCamera_cpFlat.pdf %}
[pipeline_cpFlat.pdf](https://tigress-web.princeton.edu/~lkelvin/pipelines/current/cp_pipe/DarkEnergyCamera/cpFlat/pipeline_cp_pipe_DarkEnergyCamera_cpFlat.pdf)
Build master flats:
```bash=
LOGFILE=$LOGDIR/merian9813_cpFlat.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i DECam/raw/all,DECam/calib/merian9813,DECam/calib/merian9813/crosstalk,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded \
-o DECam/calib/merian9813/flat \
-p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpFlat.yaml \
-d "instrument='DECam' AND exposure IN $FLATEXPS" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
> Note: this run contained 12526 quanta, with a runtime of ~3 hours. Initializing the task may also take some time, due to the complex nature of the quantum graph.
On occasion, some quanta (tasks) may fail, likely due to memory issues. In such cases, the missing master flats can be reattempted in an afterburner by excluding `-j N` (to run on only a single core) and including `--extend-run --skip-existing` in the `pipetask` run command.
Check what types of data now exist in the output collection:
```bash=
butler query-collections $REPO
butler query-dataset-types $REPO
butler query-datasets $REPO --collections DECam/calib/merian9813/flat
butler query-datasets $REPO --collections DECam/calib/merian9813/flat flat
```
> Note: the final two commands will not work if the output collection is a CHAINED collection containing a CALIBRATION child collection. If attempted, the above commands will fail with `NotImplementedError: Query for dataset type 'camera' in CALIBRATION-type collection 'DECam/calib/merian' is not yet supported.`. As a work-around, instead provide the full name of the child RUN collection as given by `query-collections`.
### 3.6. Certify flat frames
Certify the flats for a given date range.
Arguments: `REPO`, `INPUT_COLLECTION`, `OUTPUT_COLLECTION`, `DATASET_TYPE_NAME`:
```bash=
butler certify-calibrations \
$REPO DECam/calib/merian9813/flat DECam/calib/merian9813 flat \
--begin-date 2021-01-01T00:00:00 --end-date 2022-06-30T23:59:59
```
You may check what certified date ranges have been applied to the flat data in Python by querying dataset associations in the output collection.
For example, to check only detector #1:
```python=
qda = butler.registry.queryDatasetAssociations
coll = "DECam/calib/merian9813"
flats = [x for x in qda("flat", collections=coll) if x.ref.dataId["detector"] == 1]
print(flats)
```
which produces a list of all flats relating to detector #1 (in the case of this example document, there should be only 1 result at present).
Inspecting the properties of this object gives the timespan, e.g.:
```python=
print(f"{flats[0].timespan.begin.value = }")
# flats[0].timespan.begin.value = '2021-01-01 00:00:00.000000'
print(f"{flats[0].timespan.end.value = }")
# flats[0].timespan.end.value = '2022-06-30 23:59:59.000000'
```
## 4. Set up a default collection
Data in the Science Pipelines are arranged into *collections*: groupings of data. Here we establish a default collection which contains the commonly required input collections. Whilst this step is not strictly necessary, it will allow us to specify only a single `INPUT` collection for future raw data processing:
```bash=
INPUT=DECam/defaults/merian9813
```
If this step is not performed, future data processing will need to specify all required input collections explicitly:
```bash=
-i long,comma,separated,list,of,child,collections
```
If this step is followed then future data processing from raw data should only need to specify the default collection:
```bash=
-i $INPUT
```
> Note: it is not currently possible to query a CHAINED collection containing a CALIBRATION child collection. By constructing a dedicated CHAINED collection containing only the RUN collections of interest, users can query the CHAINED collection and avoid this error.
A CHAINED collection can be set up either on the command line or in Python.
To set up a CHAINED collection on the command line for all required input collections, run:
```bash=
CHILDREN="DECam/raw/all,\
DECam/calib/merian9813,\
DECam/calib/merian9813/crosstalk,\
DECam/calib/curated/19700101T000000Z,\
DECam/calib/unbounded,\
skymaps,\
refcats"
butler collection-chain $REPO $INPUT $CHILDREN
```
> Note: the CHILDREN list may be amended and the above command re-run to update this parent collection, if, for example, new data has been processed and a user would like to add the updated crosstalk RUN collection to this parent CHAINED collection.
Alternatively, this may also be achieved in Python:
```python=
import lsst.daf.butler as dafButler

REPO = "/projects/MERIAN/repo"
default_collection = "DECam/defaults/merian9813"

# Set up a writeable butler
butler_writeable = dafButler.Butler(REPO, writeable=True)
registry_writeable = butler_writeable.registry

# Register a new default CHAINED collection
registry_writeable.registerCollection(default_collection,
                                      type=dafButler.CollectionType.CHAINED)

# Add required CHILD collections into the CHAINED collection
registry_writeable.setCollectionChain(default_collection,
                                      ["DECam/raw/all",
                                       "DECam/calib/merian9813",
                                       "DECam/calib/merian9813/crosstalk",
                                       "DECam/calib/curated/19700101T000000Z",
                                       "DECam/calib/unbounded",
                                       "skymaps",
                                       "refcats"])
```
> Note: as above, if reprocessing data in future runs, you can amend the list above to add your own collections, and then re-run `setCollectionChain` to update `default_collection`. This allows for the default collection to stay relevant in linking to all necessary datasets as new data becomes available.
## 5. Data release production
In this section we will proceed through all the relevant data processing steps to take raw DECam science data through to coadd outputs. These processed data will be output into the `OUTPUT` CHAINED collection:
```bash=
OUTPUT=DECam/runs/merian9813/w_2022_26
```
Here, the `w_2022_26` suffix refers to the weekly version of the LSST Science Pipelines used to reduce these data.
Processing consists of three main steps (1, 2 and 3):
* Step 1: single frame processing
    * instrument signature removal, initial background subtraction / calibration / PSF estimation
* Step 2: post single frame processing
    * step 2a - initial visit aggregation
    * step 2b - tract-level characterization
    * step 2c - global collection summaries
    * step 2d - final source table generation
* Step 3: coadd-level processing
    * warping visit-level images onto the coadd plane, constructing a coadd, running detection & deblending algorithms
If outputting to an already existing collection, the following arguments should be appended to the `pipetask run` commands below:
```bash=
--extend-run --skip-existing --clobber-outputs
```
### 5.1. Step 1 - single frame processing
Processed visit images (PVIs) and preliminary source tables are produced in step 1.
:::spoiler {state="closed"} Click here to toggle the Merian step 1 YAML.<br><br>
To generate the pipeline YAML:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step1 \
--show pipeline
```
which gives:
```yaml=
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  isr:
    class: lsst.ip.isr.IsrTask
    config:
    - connections.crosstalkSources: overscanRaw
      doCrosstalk: true
  characterizeImage:
    class: lsst.pipe.tasks.characterizeImage.CharacterizeImageTask
  calibrate:
    class: lsst.pipe.tasks.calibrate.CalibrateTask
    config:
    - photoCal.match.referenceSelection.magLimit.fluxField: i_flux
      photoCal.match.referenceSelection.magLimit.maximum: 22.0
  writePreSourceTable:
    class: lsst.pipe.tasks.postprocess.WriteSourceTableTask
    config:
    - connections.outputCatalog: preSource
  transformPreSourceTable:
    class: lsst.pipe.tasks.postprocess.TransformSourceTableTask
    config:
    - connections.inputCatalog: preSource
      connections.outputCatalog: preSourceTable
subsets:
  processCcd:
    subset:
    - characterizeImage
    - isr
    - calibrate
    description: 'Set of tasks to run when doing single frame processing, without
      any conversions to Parquet/DataFrames or visit-level summaries.
      '
  step1:
    subset:
    - writePreSourceTable
    - calibrate
    - transformPreSourceTable
    - characterizeImage
    - isr
    description: |
      Per-detector tasks that can be run together to start the DRP pipeline.
      These should never be run with 'tract' or 'patch' as part of the data ID
      expression if any later steps will also be run, because downstream steps
      require full visits and 'tract' and 'patch' constraints will always
      select partial visits that overlap that region.
```
:::
:::spoiler {state="closed"} Click here to toggle the Merian step 1 graph.<br><br>
To generate the pipeline graph:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step1 \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step1.pdf
```
which gives:
{%pdf https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step1.pdf %}
[pipeline_step1.pdf](https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step1.pdf)
:::
Run step 1:
```bash=
DATAQUERY="exposure.day_obs > 20210101
AND exposure.day_obs < 20220630
AND exposure.observation_type='science'
AND detector NOT IN (31,61)"
LOGFILE=$LOGDIR/merian9813_step1.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step1 \
-d "instrument='DECam' AND $DATAQUERY" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
> Note: the runtime for this 12000 quanta run was ~4 hours. As is common in DECam data processing, detectors 31 and 61 have been excluded from data reduction owing to known issues with these CCDs. Instead of selecting by date as above, a wide range of data selectors may be used to identify specific raw data to be processed. For example, `exposure IN $SCIEXPS` would only process exposures defined in the `SCIEXPS` object.
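Once step 1 completes, individual PVIs can be retrieved and inspected; a minimal sketch, where the visit and detector values are illustrative, with DECam visit IDs assumed to map one-to-one from the exposure IDs above:
```python=
import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo",
                          collections="DECam/runs/merian9813/w_2022_26")
calexp = butler.get("calexp", instrument="DECam", visit=971666, detector=1)
psf = calexp.getPsf()
print(psf.computeShape(psf.getAveragePosition()).getDeterminantRadius())  # PSF radius, pixels
print(calexp.getPhotoCalib().getCalibrationMean())  # mean photometric calibration
```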
### 5.2a. Step 2a - initial visit aggregation
Initial visit aggregation takes place in step 2a, producing visit-wide preliminary source tables and visit summaries.
:::spoiler {state="closed"} Click here to toggle the Merian step 2a YAML.<br><br>
To generate the pipeline YAML:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2a \
--show pipeline
```
which gives:
```yaml=
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  consolidateVisitSummary:
    class: lsst.pipe.tasks.postprocess.ConsolidateVisitSummaryTask
  consolidatePreSourceTable:
    class: lsst.pipe.tasks.postprocess.ConsolidateSourceTableTask
    config:
    - connections.inputCatalogs: preSourceTable
      connections.outputCatalog: preSourceTable_visit
subsets:
  step2a:
    subset:
    - consolidateVisitSummary
    - consolidatePreSourceTable
    description: |
      Visit-level tasks
      Allowed data query constraints: visit
      Tasks aggregate all detectors for a given visit.
```
:::
:::spoiler {state="closed"} Click here to toggle the Merian step 2a graph.<br><br>
To generate the pipeline graph:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2a \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2a.pdf
```
which gives:
{%pdf https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step2a.pdf %}
[pipeline_step2a.pdf](https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step2a.pdf)
:::
Run step 2a:
```bash=
LOGFILE=$LOGDIR/merian9813_step2a.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2a \
-d "instrument='DECam' AND $DATAQUERY" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
> Note: the runtime for this 80 quanta run was ~3 minutes.
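The aggregated visit-level table can then be read back as a single DataFrame; a sketch, with the visit ID illustrative, as before:
```python=
import lsst.daf.butler as dafButler

butler = dafButler.Butler("/projects/MERIAN/repo",
                          collections="DECam/runs/merian9813/w_2022_26")
# one pandas DataFrame per visit, one row per detection across all detectors
preSources = butler.get("preSourceTable_visit", instrument="DECam", visit=971666)
print(len(preSources), "sources in visit 971666")
```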
### 5.2b. Step 2b - Photometric and astrometric calibration
Photometric and astrometric calibration take place at the tract level in step 2b.
:::spoiler {state="closed"} Click here to toggle the Merian step 2b YAML.<br><br>
To generate the pipeline YAML:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2b \
--show pipeline
```
which gives:
```yaml=
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  isolatedStarAssociation:
    class: lsst.pipe.tasks.isolatedStarAssociation.IsolatedStarAssociationTask
    config:
    - python: config.band_order += ["N708", "N540"]
  jointcal:
    class: lsst.jointcal.JointcalTask
    config:
    - connections.inputSourceTableVisit: preSourceTable_visit
subsets:
  step2b:
    subset:
    - isolatedStarAssociation
    - jointcal
    description: |
      Tract-level tasks
      Allowed data query constraints: tract
      Jointcal and isolatedStarAssociation both use PreSources, generated
      by consolidatePreSourceTable, for all visits that overlap a tract.
      jointcal produces solutions per-tract, per-visit
      isolatedStarAssociation produces solutions per-tract.
```
:::
:::spoiler {state="closed"} Click here to toggle the Merian step 2b graph.<br><br>
To generate the pipeline graph:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2b \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2b.pdf
```
which gives:
{%pdf https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step2b.pdf %}
[pipeline_step2b.pdf](https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step2b.pdf)
:::
Run step 2b:
```bash=
LOGFILE=$LOGDIR/merian9813_step2b.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2b \
-d "instrument='DECam' AND skymap='decam_rings_v1' AND tract=9813" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
### 5.2c. Step 2c - global collection summaries (DEPRECATED IN 2023)
:::warning
Step 2c no longer exists in the current (2023+) Merian/DECam data reduction pipeline.
The notes in this section below have been preserved as-is, but should not be referenced for future data reduction efforts.
:::
Global per-collection summaries of visits and detectors are generated in step 2c.
:::spoiler {state="closed"} Click here to toggle the Merian step 2c YAML.<br><br>
To generate the pipeline YAML:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2c \
--show pipeline
```
which gives:
```yaml=
description: |
  The DRP pipeline specialized for the DECam instrument, developed against the
  Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
  band: i
tasks:
  makeCcdVisitTable:
    class: lsst.pipe.tasks.postprocess.MakeCcdVisitTableTask
  makeVisitTable:
    class: lsst.pipe.tasks.postprocess.MakeVisitTableTask
subsets:
  step2c:
    subset:
    - makeVisitTable
    - makeCcdVisitTable
    description: |
      Global-level tasks that must not be run with any data query constraints
      Can be run anytime after subset step2a.
      Allowed data query constraints: instrument
      Tasks generate one data product per collection.
      make[Ccd]VisitTable produces per-collection summary of the Visits
      and CcdVisits.
```
:::
:::spoiler {state="closed"} Click here to toggle the Merian step 2c graph.<br><br>
To generate the pipeline graph:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2c \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2c.pdf
```
which gives:
{%pdf https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step2c.pdf %}
[pipeline_step2c.pdf](https://lsst.ncsa.illinois.edu/~lskelvin/decam/pipeline_step2c.pdf)
:::
Run step 2c:
```bash=
LOGFILE=$LOGDIR/merian9813_step2c.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2c \
-d "instrument='DECam'" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
### 5.2d. Step 2d - final source table generation
Generation of final source tables with full calibrations applied takes place in step 2d.
:::spoiler {state="closed"} Click here to toggle the Merian step 2d YAML.<br><br>
To generate the pipeline YAML:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2d \
--show pipeline
```
which gives:
```yaml=
description: |
The DRP pipeline specialized for the DECam instrument, developed against the
Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
band: i
tasks:
transformSourceTable:
class: lsst.pipe.tasks.postprocess.TransformSourceTableTask
consolidateSourceTable:
class: lsst.pipe.tasks.postprocess.ConsolidateSourceTableTask
finalizeCharacterization:
class: lsst.pipe.tasks.finalizeCharacterization.FinalizeCharacterizationTask
writeRecalibratedSourceTable:
class: lsst.pipe.tasks.postprocess.WriteRecalibratedSourceTableTask
config:
- useGlobalExternalPhotoCalib: false
connections.photoCalibName: jointcal
connections.outputCatalog: source
subsets:
step2d:
subset:
- writeRecalibratedSourceTable
- finalizeCharacterization
- consolidateSourceTable
- transformSourceTable
description: |
Visit-level tasks.
Allowed data query constraints: visit
writeRecalibratedSourceTable, transformSourceTable run per-detector
consolidateSourceTable produces one data product per visit.
finalizeCharacterization will eventually model full focal plane PSFs.
```
:::
:::spoiler {state="closed"} Click here to toggle the Merian step 2d graph.<br><br>
To generate the pipeline graph:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2d \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2d.pdf
```
which gives:
{%pdf https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step2d.pdf %}
[pipeline_step2d.pdf](https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step2d.pdf)
:::
Run step 2d:
```bash=
LOGFILE=$LOGDIR/merian9813_step2d.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2d \
-d "instrument='DECam' AND $DATAQUERY" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
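To confirm that the recalibrated source catalogs were written (the `source` dataset type, per the `connections.outputCatalog` override in the YAML above), something like the following may be used:
```bash=
# list the recalibrated per-detector source catalogs from step 2d
butler query-datasets $REPO \
    --collections $OUTPUT \
    --where "instrument='DECam' AND $DATAQUERY" \
    source
```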
### 5.2e. Step 2e - make final visit and CCD visit tables (NEW IN 2023)
Generation of final visit tables and CCD visit tables takes place in step 2e.
:::spoiler {state="closed"} Click here to toggle the Merian step 2e YAML.<br><br>
To generate the pipeline YAML:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2e \
--show pipeline
```
which gives:
```yaml=
description: |
The DRP pipeline specialized for the DECam instrument, developed against the
Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
band: i
tasks:
makeCcdVisitTable:
class: lsst.pipe.tasks.postprocess.MakeCcdVisitTableTask
makeVisitTable:
class: lsst.pipe.tasks.postprocess.MakeVisitTableTask
subsets:
step2e:
subset:
- makeVisitTable
- makeCcdVisitTable
description: |
Global-level tasks that must not be run with any data query constraints
Can be run anytime after subset step2d.
Allowed data query constraints: instrument
Tasks generate one data product per collection.
make[Ccd]VisitTable produces per-collection summary of the Visits
and CcdVisits.
```
:::
:::spoiler {state="closed"} Click here to toggle the Merian step 2e graph.<br><br>
To generate the pipeline graph:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2e \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step2e.pdf
```
which gives:
{%pdf https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step2e.pdf %}
[pipeline_step2e.pdf](https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step2e.pdf)
:::
Run step 2e:
```bash=
LOGFILE=$LOGDIR/merian9813_step2e.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step2e \
-d "instrument='DECam' AND $DATAQUERY" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
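The resulting summary tables can then be listed (assuming the default `visitTable` and `ccdVisitTable` dataset type names for these two tasks):
```bash=
# list the per-collection visit and CCD visit summary tables
butler query-datasets $REPO \
    --collections $OUTPUT \
    visitTable ccdVisitTable
```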
### 5.3. Step 3 - coadd processing
Coadd processing takes place in step 3. A large number of tasks are performed during this step, including warping, coaddition, detection, deblending, measurement, forced photometry, and object table generation; the full task list is given in the pipeline YAML below.
:::spoiler {state="closed"} Click here to toggle the Merian step 3 YAML.<br><br>
To generate the pipeline YAML:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step3 \
--show pipeline
```
which gives:
```yaml=
description: |
The DRP pipeline specialized for the DECam instrument, developed against the
Merian dataset.
instrument: lsst.obs.decam.DarkEnergyCamera
parameters:
band: i
tasks:
makeWarp:
class: lsst.pipe.tasks.makeCoaddTempExp.MakeWarpTask
config:
- makePsfMatched: true
- python: |
config.warpAndPsfMatch.psfMatch.kernel['AL'].alardSigGauss = [1.0, 2.0, 4.5]
from lsst.pipe.tasks.selectImages import PsfWcsSelectImagesTask
config.select.retarget(PsfWcsSelectImagesTask)
connections.photoCalibName: jointcal
matchingKernelSize: 29
doApplyExternalPhotoCalib: true
externalPhotoCalibName: jointcal
doApplyExternalSkyWcs: true
modelPsf.defaultFwhm: 7.7
warpAndPsfMatch.warp.warpingKernelName: lanczos5
coaddPsf.warpingKernelName: lanczos5
useGlobalExternalPhotoCalib: false
doWriteEmptyWarps: true
assembleCoadd:
class: lsst.pipe.tasks.assembleCoadd.CompareWarpAssembleCoaddTask
config:
- doInputMap: true
- python: |
config.removeMaskPlanes.append("CROSSTALK")
config.badMaskPlanes += ["SUSPECT"]
from lsst.pipe.tasks.selectImages import PsfWcsSelectImagesTask
config.select.retarget(PsfWcsSelectImagesTask)
matchingKernelSize: 29
doApplyExternalPhotoCalib: true
externalPhotoCalibName: jointcal
doApplyExternalSkyWcs: true
subregionSize: (10000, 200)
doNImage: true
interpImage.transpose: true
coaddPsf.warpingKernelName: lanczos5
assembleStaticSkyModel.subregionSize: (10000, 200)
assembleStaticSkyModel.doApplyExternalPhotoCalib: true
assembleStaticSkyModel.externalPhotoCalibName: jointcal
assembleStaticSkyModel.doApplyExternalSkyWcs: true
doFilterMorphological: true
useGlobalExternalPhotoCalib: false
assembleStaticSkyModel.useGlobalExternalPhotoCalib: false
doAttachTransmissionCurve: false
detection:
class: lsst.pipe.tasks.multiBand.DetectCoaddSourcesTask
mergeDetections:
class: lsst.pipe.tasks.mergeDetections.MergeDetectionsTask
deblend:
class: lsst.pipe.tasks.deblendCoaddSourcesPipeline.DeblendCoaddSourcesMultiTask
measure:
class: lsst.pipe.tasks.multiBand.MeasureMergedCoaddSourcesTask
mergeMeasurements:
class: lsst.pipe.tasks.mergeMeasurements.MergeMeasurementsTask
writeObjectTable:
class: lsst.pipe.tasks.postprocess.WriteObjectTableTask
transformObjectTable:
class: lsst.pipe.tasks.postprocess.TransformObjectCatalogTask
consolidateObjectTable:
class: lsst.pipe.tasks.postprocess.ConsolidateObjectTableTask
forcedPhotCoadd:
class: lsst.meas.base.forcedPhotCoadd.ForcedPhotCoaddTask
selectGoodSeeingVisits:
class: lsst.pipe.tasks.selectImages.BestSeeingQuantileSelectVisitsTask
config:
- connections.goodVisits: goodSeeingVisits
templateGen:
class: lsst.pipe.tasks.assembleCoadd.CompareWarpAssembleCoaddTask
config:
- doSelectVisits: true
assembleStaticSkyModel.doSelectVisits: true
connections.selectedVisits: goodSeeingVisits
connections.outputCoaddName: goodSeeing
connections.coaddExposure: goodSeeingCoadd
- python: |
config.removeMaskPlanes.append("CROSSTALK")
config.badMaskPlanes += ["SUSPECT"]
from lsst.pipe.tasks.selectImages import PsfWcsSelectImagesTask
config.select.retarget(PsfWcsSelectImagesTask)
matchingKernelSize: 29
doApplyExternalPhotoCalib: true
externalPhotoCalibName: jointcal
doApplyExternalSkyWcs: true
subregionSize: (10000, 200)
doNImage: true
interpImage.transpose: true
coaddPsf.warpingKernelName: lanczos5
assembleStaticSkyModel.subregionSize: (10000, 200)
assembleStaticSkyModel.doApplyExternalPhotoCalib: true
assembleStaticSkyModel.externalPhotoCalibName: jointcal
assembleStaticSkyModel.doApplyExternalSkyWcs: true
doFilterMorphological: true
useGlobalExternalPhotoCalib: false
assembleStaticSkyModel.useGlobalExternalPhotoCalib: false
doAttachTransmissionCurve: false
contracts:
- contract: '''calib_psf_candidate'' not in measure.propagateFlags.source_flags if
makeWarp.doApplyFinalizedPsf else True'
- contract: '''calib_psf_reserved'' not in measure.propagateFlags.source_flags if
makeWarp.doApplyFinalizedPsf else True'
- contract: '''calib_psf_used'' not in measure.propagateFlags.source_flags if makeWarp.doApplyFinalizedPsf
else True'
- contract: selectGoodSeeingVisits.connections.goodVisits == templateGen.connections.selectedVisits
subsets:
multiband:
subset:
- detection
- mergeDetections
- deblend
- measure
- mergeMeasurements
- forcedPhotCoadd
description: 'A set of tasks to run when making measurements on coadds.
'
objectTable:
subset:
- consolidateObjectTable
- writeObjectTable
- transformObjectTable
description: 'A set of tasks to transform multiband outputs into a parquet object
table.
'
step3:
subset:
- consolidateObjectTable
- mergeMeasurements
- detection
- mergeDetections
- selectGoodSeeingVisits
- writeObjectTable
- deblend
- templateGen
- assembleCoadd
- transformObjectTable
- measure
- makeWarp
- forcedPhotCoadd
description: |
Tract-level tasks that can be run together, but only after the 'step1'
and 'step2' subsets.
These should be run with explicit 'tract' constraints essentially all the
time, because otherwise quanta will be created for jobs with only partial visit
coverage.
```
:::
:::spoiler {state="closed"} Click here to toggle the Merian step 3 graph.<br><br>
To generate the pipeline graph:
```bash=
pipetask build \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step3 \
--pipeline-dot /tmp/pipeline.dot; \
dot /tmp/pipeline.dot -Tpdf > $LOGDIR/pipeline_step3.pdf
```
which gives:
{%pdf https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step3.pdf %}
[pipeline_step3.pdf](https://tigress-web.princeton.edu/~lkelvin/pipelines/current/drp_pipe/DECam/DRP-Merian/pipeline_drp_pipe_DECam_DRP-Merian_step3.pdf)
:::
Run step 3:
```bash=
LOGFILE=$LOGDIR/merian9813_step3.log; \
date | tee $LOGFILE; \
pipetask --long-log run --register-dataset-types -j 12 \
-b $REPO --instrument lsst.obs.decam.DarkEnergyCamera \
-i $INPUT \
-o $OUTPUT \
-p $DRP_PIPE_DIR/pipelines/DECam/DRP-Merian.yaml#step3 \
-d "instrument='DECam' AND skymap='decam_rings_v1' AND tract=9813" \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
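Once step 3 completes, the coadd-level data products can be inspected; for example (dataset type names as also used in section C below):
```bash=
# list tract-level object tables and coadds for tract 9813
butler query-datasets $REPO \
    --collections $OUTPUT \
    --where "skymap='decam_rings_v1' AND tract=9813" \
    objectTable_tract deepCoadd
```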
## A. Useful commands
This section provides some useful `butler` command-line operations which may be used to interact with the data.
### query-collections
Query collections in the repo:
```bash=
butler query-collections $REPO "*u/lskelvin*"
```
The final search pattern may use standard glob syntax (note the asterisks in the example above).
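For example, to list all Merian production run collections (following the collection naming used elsewhere in this note):
```bash=
butler query-collections $REPO "DECam/runs/merian*"
```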
### query-datasets
Query the datasets which live in a given collection:
```bash=
butler query-datasets $REPO \
--collections u/lskelvin/testrun/01 \
--where "instrument='DECam' AND skymap='hsc_rings_v1' AND tract=9813" \
calexp
```
If the final dataset type (`calexp` in the example above) is not given, all dataset types found will be printed to the command line.
### collection-chain
Redefine a CHAINED collection to contain only certain child RUN collections:
```bash=
butler collection-chain $REPO PARENT "CHILD1,CHILD2"
```
This command is useful prior to deleting a CHAINED collection, as it ensures that no attempt is made to delete input raw collections.
### remove-runs
Remove one or more RUN collections:
```bash=
butler remove-runs $REPO COLLECTION
```
### remove-collections
Remove one or more non-RUN collections:
```bash=
butler remove-collections $REPO COLLECTION
```
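As a worked sketch tying the three commands above together (the timestamped RUN collection name here is hypothetical), a CHAINED collection can be safely deleted by first restricting it to the RUN collections to be removed, then deleting those runs, and finally deleting the now-empty chain:
```bash=
# inspect the current children of the chain
butler query-collections $REPO "u/lskelvin/testrun/01*"
# restrict the chain to the RUN collection(s) to be deleted
butler collection-chain $REPO u/lskelvin/testrun/01 \
    "u/lskelvin/testrun/01/20220101T000000Z"
# remove the RUN collection, then the CHAINED collection itself
butler remove-runs $REPO "u/lskelvin/testrun/01/20220101T000000Z"
butler remove-collections $REPO u/lskelvin/testrun/01
```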
## B. What tracts cover my data?
The `visitSummary` tables produced in step 2a contain important information on single-frame processed visits. This information may be used to determine which tracts overlap your data.
To generate a list of tract overlaps for a single visit, in Python:
```python=
from collections import defaultdict
import lsst.daf.butler as dafButler
butler = dafButler.Butler('/project/lskelvin/repo')
grouped_by_tract = defaultdict(set)
for data_id in butler.registry.queryDataIds(
["tract", "visit", "detector"],
datasets="visitSummary",
collections="DECam/runs/merian9813/w_2022_26",
instrument="DECam",
visit=971666,
):
grouped_by_tract[data_id["tract"]].add(data_id)
print({k: len(v) for k, v in grouped_by_tract.items()})
```
To get total tract coverage for *all* visits in a given collection, remove the `visit=` argument above.
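A similar query may also be run from the command line using `butler query-data-ids` (a sketch mirroring the Python example above):
```bash=
# print all (tract, visit, detector) combinations with a visitSummary for visit 971666
butler query-data-ids /project/lskelvin/repo \
    tract visit detector \
    --datasets visitSummary \
    --collections DECam/runs/merian9813/w_2022_26 \
    --where "instrument='DECam' AND visit=971666"
```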
## C. Transferring datasets from one machine to another
To transfer datasets from one machine to another (e.g., from SLAC to Princeton), first export the data on the source machine in Python:
```python=
import lsst.daf.butler as dafButler

# construct a Butler for the source repository (path here is illustrative)
butler = dafButler.Butler("/repo/main")
outdir = "/sdf/data/rubin/u/lskelvin/merian"
datasetType = ["objectTable_tract", "deepCoadd", "deepCoadd_calexp"]
collection = "HSC/runs/RC2/w_2022_04/DM-33402"
dataId = dict(skymap="hsc_rings_v1", tract=9813)
with butler.export(directory=outdir, format="yaml", transfer="copy") as export:
items = []
found = set(butler.registry.queryDatasets(datasetType,
collections=collection,
dataId=dataId))
items.extend(found)
export.saveDatasets(items)
```
Next, in the output directory on the source machine:
```bash=
tar -czvf data_transfer.tar.gz *
```
Transfer the file (here named `data_transfer.tar.gz`) from the source machine to the destination machine. Extract the tarball on the destination machine:
```bash=
tar -xzvf data_transfer.tar.gz
```
Next, import the data into the repository on the destination machine:
```bash=
LOGFILE=$LOGDIR/data_import.log; \
butler import $REPO \
/path/to/data_transfer_directory \
--transfer copy \
--skip-dimensions skymap,tract,patch \
2>&1 | tee -a $LOGFILE; \
date | tee -a $LOGFILE
```
Finally, set up a parent CHAINED collection named similarly to that on the source machine, e.g.:
```bash=
PARENT=HSC/runs/RC2/w_2022_04/DM-33402
CHILD=HSC/runs/RC2/w_2022_04/DM-33402/20220128T212035Z
butler collection-chain $REPO $PARENT $CHILD
```
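To verify the import (querying one of the dataset types exported above):
```bash=
# confirm the imported datasets are visible through the new parent collection
butler query-datasets $REPO \
    --collections HSC/runs/RC2/w_2022_04/DM-33402 \
    --where "skymap='hsc_rings_v1' AND tract=9813" \
    objectTable_tract
```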
## D. Decertifying a calibration dataset
To decertify a calibration collection (e.g., because a new calibration collection has been generated and is intended to replace the existing certified data on disk), in Python:
```python=
import lsst.daf.butler as dafButler

writeable_butler = dafButler.Butler(
    '/projects/MERIAN/repo', writeable=True
)
writeable_butler.registry.decertify(
    collection='DECam/calib/merian',
    datasetType='bias',
    timespan=dafButler.Timespan(None, None),
)
writeable_butler.registry.decertify(
    collection='DECam/calib/merian',
    datasetType='flat',
    timespan=dafButler.Timespan(None, None),
)
```
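After decertifying, replacement calibrations will typically be certified in their place. A minimal sketch using `butler certify-calibrations` (the input RUN collection name and validity dates here are hypothetical):
```bash=
# certify replacement biases into the Merian calibration collection
butler certify-calibrations $REPO \
    u/lskelvin/calib/bias/newrun \
    DECam/calib/merian \
    bias \
    --begin-date 1980-01-01 --end-date 2050-01-01
```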