# DDFacet tutorial for beginners
This documentation gives a brief overview of how to use DDFacet to take a LOFAR-type MeasurementSet as input and produce reconstructed images as output.
## Bibliography
:page_facing_up: [DDFacet](https://arxiv.org/pdf/1712.02078) *C. Tasse, B. Hugo, M. Mirmont, O. Smirnov, M. Atemkeng, L. Bester, M.J. Hardcastle, R. Lakhoo, S. Perkins and T. Shimwell «Faceting for direction-dependent spectral deconvolution»*
:page_facing_up: [DDFacet parallel](https://arxiv.org/pdf/1712.02078) *N. Monnier, D. Guibert, C. Tasse, N. Gac, F. Orieux, E. Raffin, O. M. Smirnov, B. V. Hugo «Multi-core multi-node parallelization of the radio interferometric imaging pipeline DDFacet»*
## Context
**DDFacet** is a radio astronomy imaging pipeline currently used in production on telescopes such as LOFAR and MeerKAT. Its main purpose is to transform **MeasurementSets** (MS), the raw observational data collected by radio interferometers, into reconstructed sky images in FITS format (`*.fits`). A MeasurementSet contains the measured visibilities, while the resulting FITS files provide calibrated images of the sky, ready for analysis or further calibration.
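For orientation, the MS-in, FITS-out workflow boils down to a single command. The sketch below is illustrative only: `obs.ms` and the output base name `image` are placeholders, while the options themselves come from the reference list later in this document.

```shell
# Sketch: deconvolve the visibilities in obs.ms (placeholder name)
# and write image*.fits products with base name "image".
DDF.py --Data-MS=obs.ms --Output-Name=image --Output-Mode=Clean
```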
The pipeline performs **computationally intensive** and **highly parameterized calculations**, allowing it to adapt to different observational setups and scientific requirements.
All available options are listed in the section [Exhaustive list of DDF commands](#exhaustive-list-of-ddf-commands), and their explanations are provided in [Key parameter explanation](#key-parameter-explanation).
To facilitate execution on **High Performance Computing (HPC) systems**, the DDFacet team provides a **parallel** and **parameterized** implementation packaged in a Singularity container (`.sif`). Singularity is a container technology designed for HPC that lets complex applications run with all their dependencies in an isolated, reproducible environment.
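A containerized run might look like the following sketch. The image and parset names are those from the Requirements section below; `/path/to/data` is a placeholder for the directory holding your MeasurementSet, and the bind mount simply makes it visible inside the container.

```shell
# Illustrative only: run DDF.py from inside the provided Singularity image.
# /path/to/data is a placeholder for the directory containing the MS and parset.
singularity exec --bind /path/to/data:/data ddf_dev_np1.22.4.sif \
    DDF.py /data/Template.parset --Output-Name=/data/image
```

Options given on the command line override the values in the parset, which is convenient for parameter sweeps without editing `Template.parset` each time.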
This tutorial provides:
- Examples of running DDFacet on your own laptop in the section [Running DDFacet on your own laptop](#running-ddfacet-on-your-own-laptop), to help users familiarize themselves with the pipeline and file formats.
- A procedure for deploying DDFacet on a multi-node HPC cluster using Slurm, described in the section *Multinode Execution*, for processing large MeasurementSets efficiently in parallel.
## Requirements
- Download the DDFacet Singularity image: `ddf_dev_np1.22.4.sif` [NAS - vaader](https://nasext-vaader.insa-rennes.fr/ietr-vaader/)
- Download the example parset: `Template.parset` [NAS - vaader](https://nasext-vaader.insa-rennes.fr/ietr-vaader/)
- Have a MeasurementSet available (i.e. a **\*.ms** directory tree)
- (optional) Download the script that facilitates reading **\*.fits** files (based on the ds9 tool): `dsm.py` [NAS - vaader](https://nasext-vaader.insa-rennes.fr/ietr-vaader/)
## Exhaustive list of DDF commands
<details>
<summary style="cursor: pointer; color: #007bff;"> Click here to reveal the section </summary>
| 📝 **Note** |
| ------------------------------------------------------------ |
|This section provides a comprehensive list of available DDF commands. It is based on the output of `DDF.py -h`. Not all commands are used in this tutorial; this is mainly for reference and advanced users. |
```
Usage: DDF.py [parset file] [options]
Questions and suggestions: cyril.tasse@obspm.fr
Options:
--version show program's version number and exit
-h, --help show this help message and exit
Visibility data options:
--Data-MS=MS(s) Single MS name, or list of comma-separated MSs, or
name of *.txt file listing MSs. Note that each MS
may also be specified as a glob pattern (e.g. *.MS),
and may be suffixed with "//Dx" and/or "//Fy" to
select specific DATA_DESC_ID and FIELD_IDs in the MS.
"x" and "y" can take the form of a single number, a
Pythonic range (e.g. "0:16"), an inclusive range
("0~15"), or "*" to select all. E.g.
"foo.MS//D*//F0:2" selects all DDIDs, and fields 0 and
1 from foo.MS. If D and/or F is not specified,
--Selection-Field and --Selection-DDID is used as the
default. (default: )
--Data-ColName=COLUMN
MS column to image (default: CORRECTED_DATA)
--Data-ChunkHours=N
Process data in chunks of less than or equal to N hours. Use 0 for no
chunking. (default: 0.0)
--Data-Sort=0|1 if True, data will be resorted by baseline-time order
internally. This usually speeds up processing.
(default: False)
Predict:
--Predict-ColName=COLUMN
MS column to write predict to. Can be empty to
disable. (default: none)
--Predict-MaskSquare=IMAGE
Use this field if you want to predict (in/out)side a
square region. Syntax is (MaskOutSide,NpixInside). For
example setting (0,1000) will predict the outer
(1000x1000) square only (default: none)
--Predict-FromImage=IMAGE
In --Image-Mode=Predict, will predict data from this
image, rather than --Data-InitDicoModel (default:
none)
--Predict-InitDicoModel=FILENAME
Resume deconvolution from given DicoModel (default:
none)
--Predict-Overwrite=0|1
Allow overwriting of predict column (default: True)
Data selection options:
--Selection-Field=FIELD
default FIELD_ID to read, if not specified in --Data-
MS. (default: 0)
--Selection-DDID=DDID
default DATA_DESC_ID to read, if not specified in
--Data-MS. (default: 0)
--Selection-TaQL=TaQL
additional TaQL selection string (default: )
--Selection-ChanStart=N
First channel (default: 0)
--Selection-ChanEnd=N
Last channel+1, -1 means up and including last
channel. (default: -1)
--Selection-ChanStep=N
Channel stepping (default: 1)
--Selection-FlagAnts=ANT,...
List of antennas to be flagged, e.g. "RS,CS017LBA"
(default: )
--Selection-UVRangeKm=KM_MIN,KM_MAX
Select baseline range (default: [0, 2000])
--Selection-TimeRange=SELECTION_TIMERANGE
Select time range (two comma separated values)
containing UTC start and end times in ISO8601
(default: )
--Selection-TimeRangeFromStartMin=SELECTION_TIMERANGEFROMSTARTMIN
In minutes before start of obs. (default: )
--Selection-DistMaxToCore=KM
Select antennas by specifying a maximum distance to
core (default: )
--Selection-AutoFlagNyquist=SELECTION_AUTOFLAGNYQUIST
flag those baselines that are not properly sampled
(default: 0)
Options for input and output image names:
--Output-Mode=Dirty|Clean|Predict|PSF
What to do. (default: Clean)
--Output-Clobber=0|1
Allow overwriting of existing parset and images (can't
be specified via parset!) (default: False)
--Output-Name=BASENAME
Base name of output images (default: image)
--Output-ShiftFacetsFile=OUTPUT_SHIFTFACETSFILE
Astrometric correction per facet, when Image-
Mode=RestoreAndShift (default: none)
--Output-RestoringBeam=OUTPUT_RESTORINGBEAM
(default: none)
--Output-Also=CODES
Save also these images (i.e. adds to the default set
of --Output-Images) (default: )
--Output-Cubes=CODES
Also save cube versions for these images (only MmRrIi
codes recognized) (default: )
--Output-Images=OUTPUT_IMAGES
Combination of letter codes indicating what images to
save. Uppercase for intrinsic flux scale [D]irty,
[M]odel, [C]onvolved model, [R]esiduals, restored
[I]mage; Lowercase for apparent flux scale [d]irty,
[m]odel, [c]onvolved model, [r]esiduals, restored
[i]mage; Other images: [P]SF, [N]orm, [n]orm facets,
[S] flux scale, [A]lpha (spectral index), [X] mixed-
scale (intrinsic model, apparent residuals, i.e.
Cyrils original output), [o] intermediate mOdels
(Model_i), [e] intermediate rEsiduals (Residual_i),
[k] intermediate masK image, [z] intermediate auto
mask-related noiZe image, [g] intermediate dirty
images (only if [Debugging]
SaveIntermediateDirtyImages is enabled). [F] intrinsic
MFS restored image [f] apparent MFS restored image Use
"all" to save all. (default: DdPAMRIikemz)
--Output-StokesResidues=OUTPUT_STOKESRESIDUES
After cleaning Stokes I, output specified residues if
[r] or [R] is specified in option Output-Images. Note
that the imager does not perform deconvolution on any
Stokes products other than I - it only outputs
residues. (default: I)
SPIMaps:
--SPIMaps-AlphaThreshold=N
Multiple of the RMS in final residual which determines
threshold for fitting alpha map. (default: 15)
General imager settings:
--Image-NPix=NPIX Image size. (default: 5000)
--Image-Cell=ARCSEC
Cell size. (default: 5.0)
--Image-PhaseCenterRADEC=RA,DEC
Use non-default phase centre. If "align" is used, all
MSs will be rephased to the phase centre of the first
MS. Otherwise, specify [HH:MM:SS,DD:MM:SS] direction.
If empty, no rephasing is done. (default: none)
--Image-SidelobeSearchWindow=NPIX
Size of PSF subwindow (centred around the main lobe)
to search for the highest sidelobe when fitting the
PSF size. (default: 200)
Spacial tessellation settings:
--Facets-NFacets=N Number of facets to use. (default: 3)
--Facets-CatNodes=FACETS_CATNODES
(default: none)
--Facets-DiamMax=DEG
Max facet size, for tessellations. Larger facets will
be broken up. (default: 180.0)
--Facets-DiamMin=DEG
Min facet size, for tessellations. Smaller facets will
be merged. (default: 0.0)
--Facets-MixingWidth=FACETS_MIXINGWIDTH
Sigma of the gaussian (in pixels) being used to mix
the facets on their edges (default: 10)
--Facets-PSFOversize=X
For cleaning, use oversize PSF relative to size of
facet. (default: 1.0)
--Facets-PSFFacets=N
Number of PSF facets to make. 0: same as NFacets (one
PSF per facet) 1: one PSF for entire field. (default:
0)
--Facets-Padding=FACTOR
Facet padding factor. (default: 1.7)
--Facets-Circumcision=N
Set to non-0 to override NPixMin computation in
FacetsToIm(). Debugging option, really. (default: 0)
--Facets-FluxPaddingAppModel=FACETS_FLUXPADDINGAPPMODEL
For flux-dependent facet-padding, the apparant model
image (or cube) (default: none)
--Facets-FluxPaddingScale=FACETS_FLUXPADDINGSCALE
The factor applied to the --Facets-Padding for the
facet with the highest flux (default: 2.0)
--Facets-SkipTh=FACETS_SKIPTH
Skip gridding/degridding if the mean Jones power is
lower than this level (useful in mosaicing mode)
(default: 0.0)
Data and imaging weight settings:
--Weight-ColName=COLUMN
Read data weights from specified column. Use
WEIGHT_SPECTRUM or WEIGHT, more rarely IMAGING_WEIGHT.
You can also specify a list of columns like using
--Weight-ColName=[WEIGHT_SPECTRUM,IMAGING_WEIGHT]
(default: WEIGHT_SPECTRUM)
--Weight-Mode=Natural|Uniform|Robust|Briggs
Image weighting. (default: Briggs)
--Weight-MFS=0|1 If True, MFS uniform/Briggs weighting is used (all
channels binned onto one uv grid). If 0, binning is
per-band. (default: True)
--Weight-Robust=R Briggs robustness parameter, from -2 to 2. (default:
0.0)
--Weight-SuperUniform=X
Super/subuniform weighting: FoV for weighting purposes
is taken as X*Image_Size (default: 1.0)
--Weight-OutColName=COLUMN
Save the internally computed weights into this column
(default: none)
--Weight-EnableSigmoidTaper=WEIGHT_ENABLESIGMOIDTAPER
Toggles sigmoid tapering type:bool (default: 0)
--Weight-SigmoidTaperInnerCutoff=WEIGHT_SIGMOIDTAPERINNERCUTOFF
Inner taper cutoff in uvwavelengths type:float
(default: 0.0)
--Weight-SigmoidTaperOuterCutoff=WEIGHT_SIGMOIDTAPEROUTERCUTOFF
Outer taper cutoff in uvwavelengths type:float
(default: 0.0)
--Weight-SigmoidTaperInnerRolloffStrength=WEIGHT_SIGMOIDTAPERINNERROLLOFFSTRENGTH
Rolloff strength on inner taper if enabled. 1.0 is
essentially a boxcar, 0.0 means very long rolloffs
type:float (default: 0.5)
--Weight-SigmoidTaperOuterRolloffStrength=WEIGHT_SIGMOIDTAPEROUTERROLLOFFSTRENGTH
Rolloff strength on outer taper if enabled. 1.0 is
essentially a boxcar, 0.0 means very long rolloffs
type:float (default: 0.5)
Low level parameters related to the forward and backward image to visibility spaces transforms:
--RIME-Precision=S|D
Single or double precision gridding. DEPRECATED?
(default: S)
--RIME-PolMode=I|IQ|IU|IV|IQU|IQUV
Polarization mode. (default: I)
--RIME-FullMTilde=RIME_FULLMTILDE
Uee the full MTilde as described in the paper to do
the image plane correction type:bool (default: False)
--RIME-FFTMachine=RIME_FFTMACHINE
(default: FFTW)
--RIME-ForwardMode=BDA-degrid|Classic|Montblanc
Forward predict mode. (default: BDA-degrid)
--RIME-BackwardMode=BDA-grid|Classic
Backward mode. (default: BDA-grid)
--RIME-DecorrMode=RIME_DECORRMODE
decorrelation mode (default: )
--RIME-DecorrLocation=Center|Edge
where decorrelation is estimated (default: Edge)
Imager convolution function settings:
--CF-OverS=N Oversampling factor. (default: 11)
--CF-Support=N CF support size. (default: 7)
--CF-Nw=PLANES Number of w-planes. Setting this to 1 enables AIPS
style faceting. (default: 100)
--CF-wmax=METERS Maximum w coordinate. Visibilities with larger w will
not be gridded. If 0, no maximum is imposed. (default:
0.0)
Compression settings (baseline-dependent averaging [BDA] and sparsification):
--Comp-GridDecorr=X
Maximum BDA decorrelation factor (gridding) (default:
0.02)
--Comp-GridFoV=Full|Facet
FoV over which decorrelation factor is computed
(gridding) (default: Facet)
--Comp-DegridDecorr=X
Maximum BDA decorrelation factor (degridding)
(default: 0.02)
--Comp-DegridFoV=Full|Facet
FoV over which decorrelation factor is computed
(degridding) (default: Facet)
--Comp-Sparsification=N1,N2,...
apply sparsification compression to initial major
cycles. Sparsification refers to throwing away random
visibilities. Supply a list of factors: e.g. 100,30,10
would mean only 1/100 of the data is used for the
first major cycle, 1/30 for the second, 1/10 for the
third, and full data for the fourth cycle onwards.
This can substantially accelerate deconvolution of
deep observations, since, in these regimes, very
little sensitivity is required for model construction
in the initial cycles. (default: 0)
--Comp-BDAMode=1|2 BDA block computation mode. 1 for Cyril's old mode, 2
for Oleg's new mode. 2 is faster but see issue #319.
(default: 1)
--Comp-BDAJones=COMP_BDAJONES
If disabled, gridders and degridders will apply a
Jones terms per visibility. If 'grid', gridder will
apply them per BDA block, if 'both' so will the
degridder. This is faster but possibly less accurate,
if you have rapidly evolving Jones terms. (default: 0)
Parallelization options:
--Parallel-NCPU=N Number of CPUs to use in parallel mode. 0: use all
available. 1: disable parallelism. (default: 0)
--Parallel-Affinity=PARALLEL_AFFINITY
pin processes to cores. -1/1/2 determines stepping
used in selecting cores. Alternatively specifies a
list of length NCPU. Alternatively "disable" to
disable affinity settings Alternatively "enable_ht"
uses stepping of 1 (equivalent to
Parallel.Affinity=1), will use all vthreads - the
obvious exception is if HT is disabled at BIOS level
Alternatively "disable_ht" autodetects the NUMA layout
of the chip for Debian-based systems and dont use both
vthreads per core Use 1 if unsure. (default: 1)
--Parallel-MainProcessAffinity=PARALLEL_MAINPROCESSAFFINITY
this should be set to a core that is not used by
forked processes, this option is ignored when using
option "disable or disable_ht" for Parallel.Affinity
(default: 0)
--Parallel-MotherNode=PARALLEL_MOTHERNODE
(default: localhost)
Cache management options:
--Cache-Reset=0|1 Reset all caches (including PSF and dirty image)
(default: False)
--Cache-Jones=reset|auto
Reset cached Jones (default: auto)
--Cache-SmoothBeam=reset|auto|force
Reset cached smooth beam (default: auto)
--Cache-Weight=reset|auto
Reset cached weight (default: auto)
--Cache-PSF=off|reset|auto|force
Cache PSF data. (default: auto)
--Cache-Dirty=off|reset|auto|forcedirty|forceresidual
Cache dirty image data. (default: auto)
--Cache-VisData=off|auto|force
Cache visibility data and flags at runtime. (default:
auto)
--Cache-LastResidual=0|1
Cache last residual data (at end of last minor cycle)
(default: True)
--Cache-Dir=CACHE_DIR
Directory to store caches in. Default is to keep cache
next to the MS, but this can cause performance issues
with e.g. NFS volumes. If you have fast local storage,
point to it. %metavar:DIR (default: )
--Cache-DirWisdomFFTW=CACHE_DIRWISDOMFFTW
Directory in which to store the FFTW wisdom files
(default: ~/.fftw_wisdom)
--Cache-ResetWisdom=0|1
Reset Wisdom file (default: False)
--Cache-CF=0|1 Cache convolution functions. With many CPUs, may be
faster to recompute. (default: True)
--Cache-HMP=0|1 Cache HMP basis functions. With many CPUs, may be
faster to recompute. (default: False)
Apply E-Jones (beam) during imaging:
--Beam-Model=None|LOFAR|FITS|GMRT|ATCA
Beam model to use. (default: none)
--Beam-At=facet|tessel
when DDESolutions are enabled, compute beam per facet,
or per larger solution tessel (default: facet)
--Beam-PhasedArrayMode=A|AE
PhasedArrayMode beam mode. (default: AE)
--Beam-NBand=N Number of channels over which same beam value is used.
0 means use every channel. (default: 0)
--Beam-CenterNorm=0|1
Normalize beam so that its amplitude at the centre is
1. (default: False)
--Beam-Smooth=BEAM_SMOOTH
Compute the interpolated smooth beam (default: False)
--Beam-SmoothNPix=BEAM_SMOOTHNPIX
Number of pixels the beam is evaluated and smoothed
(default: 11)
--Beam-SmoothInterpMode=BEAM_SMOOTHINTERPMODE
Linear/Log (default: Linear)
--Beam-FITSFile=BEAM_FITSFILE
Beam FITS file pattern. A beam pattern consists of
eight FITS files, i.e. a real and imaginary part for
each of the four Jones terms. The following
substitutions are performed to form up the eight
filenames: $(corr) or $(xy) is replaced by the Jones
element label (e.g. "xx" or "rr"), $(reim) is replaced
by "re" or "im", $(realimag) is replaced by "real" or
"imag". Uppercase variables are replaced by uppercase
values, e.g. $(REIM) by "RE" pr "IM". Use "unity" if
you want to apply a unity matrix for the E term (e.g.
only want to do visibility derotations). Correlation
labels (XY or RL) are determined by reading the MS,
but may be overridden by the FITSFeed option. To use a
heterogeneous mix of beams you have to first type
specialize the antennas using a json configuration of
the following format: {'lband': { 'patterns': {
'cmd::default': ['$(stype)_$(corr)_$(reim).fits',...],
}, 'define-stationtypes': { 'cmd::default': 'meerkat',
'ska000': 'ska' } }, ... } This will substitute
'meerkat' for all antennas but ska000, with
'meerkat_$(corr)_$(reim).fits' whereas beams for
ska000 will be loaded from 'ska_$(corr)_$(reim).fits'
in this example. The station name may be specified as
regex by adding a '~' infront of the pattern to match,
e.g '~ska[0-9]{3}': 'ska' will assgign all the 'ska'
type to all matching names such as ska000, ska001,
..., skaNNN. Each station type in the pattern section
may specify a list of patterns for different frequency
ranges. Multiple keyed dictionaries such as this may
be specified within one file. They will be treated as
chained configurations, adding more patterns and
station-types to the first such block. Warning: Once a
station is type-specialized the type applies to
**ALL** chained blocks! Blocks from more than one
config file can be loaded by comma separation, e.g. '
--Beam-FITSFile conf1.json,conf2.json,...', however no
block may define multiple types for any station. If
patterns for a particular station type already exists
more patterns are just appended to the existing list.
Warning: where multiple patterns specify the same
frequency range the first such pattern closest to the
MS SPW frequency coverage will be loaded. If no
configuration file is provided the pattern may not
contain $(stype) -- station independence is assumed.
This is the same as specifing the following config:
{'lband': { 'patterns': { 'cmd::default':
['$(corr)_$(reim).fits',...], }, 'define-
stationtypes': { 'cmd::default': 'cmd::default' } }
(default: beam_$(corr)_$(reim).fits)
--Beam-FITSFeed=None|xy|XY|rl|RL
If set, overrides correlation labels given by the
measurement set. (default: none)
--Beam-FITSFeedSwap=0|1
swap feed patterns (X to Y and R to L) (default:
False)
--Beam-DtBeamMin=MIN
change in minutes on which the beam is re-evaluated
(default: 5.0)
--Beam-FITSParAngleIncDeg=DEG
increment in PA in degrees at which the beam is to be
re-evaluated (on top of DtBeamMin) (default: 5.0)
--Beam-FITSLAxis=AXIS
L axis of FITS file. Minus sign indicates reverse
coordinate convention. (default: -X)
--Beam-FITSMAxis=AXIS
M axis of FITS file. Minus sign indicates reverse
coordinate convention. (default: Y)
--Beam-FITSVerbosity=LEVEL
set to >0 to have verbose output from FITS
interpolator classes. (default: 0)
--Beam-FITSFrame=BEAM_FITSFRAME
coordinate frame for FITS beams. Currently, alt-az,
equatorial and zenith mounts are supported. (default:
altaz)
--Beam-FeedAngle=BEAM_FEEDANGLE
offset feed angle to add to parallactic angle
(default: 0.0)
--Beam-ApplyPJones=BEAM_APPLYPJONES
derotate visibility data (only when FITS beam is
active and also time sampled). If you have equatorial
mounts this is not what you should be doing! (default:
0)
--Beam-FlipVisibilityHands=BEAM_FLIPVISIBILITYHANDS
apply anti-diagonal matrix if FITS beam is enabled
effectively swapping X and Y or R and L and their
respective hands (default: 0)
Multifrequency imaging options:
--Freq-BandMHz=MHz Gridding cube frequency step. If 0, --Freq-NBand is
used instead. (default: 0.0)
--Freq-FMinMHz=MHz Gridding cube frequency Min. If 0, is ignored.
(default: 0.0)
--Freq-FMaxMHz=MHz Gridding cube frequency Max. If 0, is ignored.
(default: 0.0)
--Freq-DegridBandMHz=MHz
Degridding cube frequency step. If 0, --Freq-
NDegridBand is used instead. (default: 0.0)
--Freq-NBand=N Number of image bands for gridding. (default: 1)
--Freq-NDegridBand=N
Number of image bands for degridding. 0 means degrid
each channel. (default: 0)
Apply DDE solutions during imaging (@cyriltasse please document this section):
--DDESolutions-DDSols=DDESOLUTIONS_DDSOLS
Name of the DDE solution file (default: )
--DDESolutions-SolsDir=DDESOLUTIONS_SOLSDIR
Name of the directry of the DDE Solutions which
contains
[SolsDir]/[MSNames]/killMS.[SolsName].sols.npz
(default: none)
--DDESolutions-GlobalNorm=DDESOLUTIONS_GLOBALNORM
Option to normalise the Jones matrices (options:
MeanAbs, MeanAbsAnt, BLBased or SumBLBased). See code
for more detail (default: none)
--DDESolutions-JonesNormList=DDESOLUTIONS_JONESNORMLIST
Deprecated? (default: AP)
--DDESolutions-JonesMode=Scalar|Diag|Full
(default: Full)
--DDESolutions-DDModeGrid=DDESOLUTIONS_DDMODEGRID
In the gridding step, apply Jones matrices Amplitude
(A) or Phase (P) or Amplitude&Phase (AP) (default: AP)
--DDESolutions-DDModeDeGrid=DDESOLUTIONS_DDMODEDEGRID
In the degridding step, apply Jones matrices Amplitude
(A) or Phase (P) or Amplitude&Phase (AP) (default: AP)
--DDESolutions-ScaleAmpGrid=DDESOLUTIONS_SCALEAMPGRID
Deprecated? (default: 0)
--DDESolutions-ScaleAmpDeGrid=DDESOLUTIONS_SCALEAMPDEGRID
Deprecated? (default: 0)
--DDESolutions-CalibErr=DDESOLUTIONS_CALIBERR
Deprecated? (default: 10.0)
--DDESolutions-Type=Krigging|Nearest
Deprecated? (default: Nearest)
--DDESolutions-Scale=DEG
Deprecated? (default: 1.0)
--DDESolutions-gamma=DDESOLUTIONS_GAMMA
Deprecated? (default: 4.0)
--DDESolutions-RestoreSub=DDESOLUTIONS_RESTORESUB
Deprecated? (default: False)
--DDESolutions-ReWeightSNR=DDESOLUTIONS_REWEIGHTSNR
Deprecated? (default: 0.0)
Apply pointing offsets to beam during DFT predict. Requires Montblanc in --RIME-ForwardMode.:
--PointingSolutions-PointingSolsCSV=POINTINGSOLUTIONS_POINTINGSOLSCSV
Filename of CSV containing time-variable pointing
solutions. None initializes all antenna pointing
offsets to 0, 0 (default: none)
--PointingSolutions-InterpolationMode=LERP
Interpolation mode (default: LERP)
Common deconvolution options. Not all of these apply to all deconvolution modes:
--Deconv-Mode=HMP|Hogbom|SSD|WSCMS
Deconvolution algorithm. (default: HMP)
--Deconv-MaxMajorMaxMajorIter=N
Max number of major cycles. (default: 20)
--Deconv-MaxMinorIter=N
Max number of (overall) minor cycle iterations (HMP,
Hogbom). (default: 20000)
--Deconv-AllowNegative=0|1
Allow negative components (HMP, Hogbom). (default:
True)
--Deconv-Gain=GAIN Loop gain (HMP, Hogbom). (default: 0.1)
--Deconv-FluxThreshold=Jy
Absolute flux threshold at which deconvolution is
stopped (HMP, Hogbom, SSD). (default: 0.0)
--Deconv-CycleFactor=X
Cycle factor: used to set a minor cycle stopping
threshold based on PSF sidelobe level (HMP, Hogbom).
Use 0 to disable, otherwise 2.5 is a reasonable value,
but may lead to very shallow minor cycle. (default:
0.0)
--Deconv-RMSFactor=X
Set minor cycle stopping threshold to X*{residual RMS
at start of major cycle} (HMP, Hogbom, SSD).
(default: 0.0)
--Deconv-PeakFactor=X
Set minor cycle stopping threshold to X*{peak residual
at start of major cycle} (HMP, Hogbom, SSD).
(default: 0.15)
--Deconv-PrevPeakFactor=X
Set minor cycle stopping threshold to X*{peak residual
at end of previous major cycle} (HMP). (default: 0.0)
--Deconv-NumRMSSamples=N
How many samples to draw for RMS computation. Use 0 to
use all pixels (most precise). (default: 10000)
--Deconv-ApproximatePSF=SF
when --Comp-Sparsification is on, use approximate
(i.e. central facet) PSF for cleaning while operating
above the given sparsification factor (SF). This
speeds up HMP reinitialization in major cycles. A
value of 1-10 is sensible. Set to 0 to always use
precise per-facet PSF. (default: 0)
--Deconv-PSFBox=BOX
determines the size of the PSF subtraction box used in
CLEAN-style deconvolution (if appropriate). Use "auto"
(or "sidelobe") for a Clark-CLEAN-style box taken out
to a certain sidelobe (faster). Use "full" to subtract
the full PSF, Hogbom-style (more accurate, can also
combine with --Image-PSFOversize for maximum
accuracy). Use an integer number to set an explicit
box radius, in pixels. (HMP) (default: auto)
Masking options. The logic being Mask_{i+1} = ExternalMask | ResidualMask | Mask_{i}:
--Mask-External=FILENAME
External clean mask image (FITS format). (default:
none)
--Mask-Auto=MASK_AUTO
Do automatic masking (default: False)
--Mask-AutoRMSFactor=MASK_AUTORMSFACTOR
RMS Factor for automasking HMP (default: 3)
--Mask-SigTh=MASK_SIGTH
set Threshold (in sigma) for automatic masking
(default: 10)
--Mask-FluxImageType=MASK_FLUXIMAGETYPE
If Auto enabled, does the cut of SigTh either on the
ModelConv or the Restored (default: ModelConv)
When using a noise map to HMP or to mask:
--Noise-MinStats=NOISE_MINSTATS
The parameters to compute the noise-map-based mask for
step i+1 from the residual image at step i. Should be
[box_size,box_step] (default: [60, 2])
--Noise-BrutalHMP=NOISE_BRUTALHMP
If noise map is computed, this option enabled, it
first computes an image plane deconvolution with a
high gain value, and compute the noise-map-based mask
using the brutal-restored image (default: True)
Hybrid Matching Pursuit (aka multiscale/multifrequency) mode deconvolution options:
--HMP-Alpha=MIN,MAX,N
List of alphas to fit. (default: [-1.0, 1.0, 11])
--HMP-Scales=LIST List of scales to use. (default: [0])
--HMP-Ratios=HMP_RATIOS
@cyriltasse please document (default: [''])
--HMP-NTheta=N Number of PA steps to use. (default: 6)
--HMP-SolverMode=PI|NNLS
Solver mode: pseudoinverse, or non-negative least
squares. (default: PI)
--HMP-AllowResidIncrease=FACTOR
Allow the maximum residual to increase by at most this
much relative to the lowest residual, before bailing
out due to divergence. (default: 0.1)
--HMP-MajorStallThreshold=X
Major cycle stall threshold. If the residual at the
beginning of a major cycle is above X*residual at the
beginning of the previous major cycle, then we
consider the deconvolution stalled and bail out.
(default: 0.8)
--HMP-Taper=HMP_TAPER
Weighting taper size for HMP fit. If 0, determined
automatically. (default: 0)
--HMP-Support=HMP_SUPPORT
Basis function support size. If 0, determined
automatically. (default: 0)
--HMP-PeakWeightImage=HMP_PEAKWEIGHTIMAGE
weigh the peak finding by given image (default: none)
--HMP-Kappa=HMP_KAPPA
Regularization parameter. If stddev of per-alpha
solutions exceeds the maximum solution amplitude
divided by Kappa, forces a fully-regularized solution.
Use 0 for no such regularization. (default: 0.0)
--HMP-OuterSpaceTh=HMP_OUTERSPACETH
(default: 2.0)
--HMP-FractionRandomPeak=HMP_FRACTIONRANDOMPEAK
(default: none)
Hogbom:
--Hogbom-PolyFitOrder=HOGBOM_POLYFITORDER
polynomial order for frequency fitting (default: 4)
--Hogbom-LinearPeakfinding=Joint|Separate
Perform EVPA-preserving (complex-valued) polarization
CLEAN (Pratley-Johnston-Hollitt) or separate Q and U
cleaning. (default: Joint)
WSCMS:
--WSCMS-NumFreqBasisFuncs=WSCMS_NUMFREQBASISFUNCS
number of basis functions to use for the fit to the
frequency axis (default: 4)
--WSCMS-MultiScale=WSCMS_MULTISCALE
whether to use multi-scale or not (recommended to use
Hogbom if not using multi-scale) (default: True)
--WSCMS-MultiScaleBias=WSCMS_MULTISCALEBIAS
scale bias parameter (smaller values give more weight
to larger scales) (default: 0.55)
--WSCMS-ScaleBasis=WSCMS_SCALEBASIS
the kind of scale kernels to use (only Gauss available
for now) (default: Gauss)
--WSCMS-Scales=WSCMS_SCALES
Scale sizes in pixels/FWHM eg. [0, 4, 8, 16] (if None
determined automatically) (default: none)
--WSCMS-MaxScale=WSCMS_MAXSCALE
The maximum extent of the scale functions in pixels
(default: 250)
--WSCMS-NSubMinorIter=WSCMS_NSUBMINORITER
Number of iterations for the sub minor loop (default:
250)
--WSCMS-SubMinorPeakFact=WSCMS_SUBMINORPEAKFACT
Peak factor of sub minor loop (default: 0.85)
--WSCMS-MinorStallThreshold=WSCMS_MINORSTALLTHRESHOLD
if the peak in the minor cycle decreases by less than
this fraction it has stalled and we go back to the
major cycle (default: 1e-07)
--WSCMS-MinorDivergenceFactor=WSCMS_MINORDIVERGENCEFACTOR
if the peak flux increases by more than this fraction
between minor cycles then it has diverged and we go
back to a major cycle (default: 1.3)
--WSCMS-AutoMask=WSCMS_AUTOMASK
whether to use scale dependent auto-masking (default:
True)
--WSCMS-AutoMaskThreshold=WSCMS_AUTOMASKTHRESHOLD
Threshold at which the scale dependent mask should be
fixed. (default: none)
--WSCMS-AutoMaskRMSFactor=WSCMS_AUTOMASKRMSFACTOR
Default multiple of RMS at which to start AutoMasking
in case no (default: 3)
--WSCMS-CacheSize=WSCMS_CACHESIZE
the number of items to keep in the cache dict before
spilling over to disk (default: 3)
--WSCMS-Padding=WSCMS_PADDING
padding in the minor cycle. Can often be much smaller
than facet padding (default: 1.2)
Montblanc settings (for --Image-PredictMode=Montblanc):
--Montblanc-TensorflowServerTarget=URL
URL for the TensorflowServer, e.g.
grpc://tensorflow.server.com:8888/ (default: )
--Montblanc-LogFile=FILENAME
None to dump as Output-Name.montblanc.log, otherwise
user-specified filename (default: none)
--Montblanc-MemoryBudget=MONTBLANC_MEMORYBUDGET
Predictor memory budget in GiB (default: 4.0)
--Montblanc-LogLevel=NOTSET|DEBUG|INFO|WARNING|ERROR|CRITICAL
Log level to write to console, rest of the messages
goes to log file (default: WARNING)
--Montblanc-SolverDType=single|double
Data type used in solver, (default: double)
--Montblanc-DriverVersion=tf
Backend to use, (default: tf)
SSD deconvolution mode settings:
--SSDClean-Parallel=0|1
Enable parallel mode. (default: True)
--SSDClean-IslandDeconvMode=SSDCLEAN_ISLANDDECONVMODE
Moresane, GA, Sasir, ... (default: GA)
--SSDClean-SSDSolvePars=SSDCLEAN_SSDSOLVEPARS
(default: ['S', 'Alpha'])
--SSDClean-SSDCostFunc=SSDCLEAN_SSDCOSTFUNC
(default: ['Chi2', 'MinFlux'])
--SSDClean-BICFactor=SSDCLEAN_BICFACTOR
(default: 0.0)
--SSDClean-ArtifactRobust=SSDCLEAN_ARTIFACTROBUST
(default: False)
--SSDClean-ConvFFTSwitch=SSDCLEAN_CONVFFTSWITCH
(default: 1000)
--SSDClean-NEnlargePars=SSDCLEAN_NENLARGEPARS
(default: 0)
--SSDClean-NEnlargeData=SSDCLEAN_NENLARGEDATA
(default: 2)
--SSDClean-RestoreMetroSwitch=SSDCLEAN_RESTOREMETROSWITCH
(default: 0)
--SSDClean-MinMaxGroupDistance=SSDCLEAN_MINMAXGROUPDISTANCE
(default: [10, 50])
--SSDClean-MaxIslandSize=SSDCLEAN_MAXISLANDSIZE
(default: 0)
--SSDClean-InitType=SSDCLEAN_INITTYPE
(default: HMP)
SSD2 deconvolution mode settings:
--SSD2-PolyFreqOrder=SSD2_POLYFREQORDER
Add Polyi to --SSDClean-SSDSolvePars. (default: 2)
--SSD2-SolvePars=SSD2_SOLVEPARS
(default: ['Poly'])
--SSD2-InitType=SSD2_INITTYPE
(default: ['HMP', 'MultiSlice:Orieux'])
--SSD2-ConvexifyIslands=SSD2_CONVEXIFYISLANDS
(default: 1)
--SSD2-NLastCyclesDeconvAll=SSD2_NLASTCYCLESDECONVALL
This parameter sets how many of the last cycles will
deconvolve all islands. If set to 0, SSD2 will use
--Deconv-CycleFactor, --Deconv-PeakFactor, --Deconv-
RMSFactor to determine threshold above which islands
are reestimated. If set to 2, in the last 2 major
cycle all islands are estimated. If -1: Always deconv
all islands regardless of the cycle number (default:
1)
MultiSliceDeconv:
--MultiSliceDeconv-Type=MULTISLICEDECONV_TYPE
MORESANE, Orieux, etc (default: MORESANE)
--MultiSliceDeconv-PolyFitOrder=MULTISLICEDECONV_POLYFITORDER
(default: 2)
GAClean:
--GAClean-NSourceKin=GACLEAN_NSOURCEKIN
(default: 50)
--GAClean-NMaxGen=GACLEAN_NMAXGEN
(default: 50)
--GAClean-MinSizeInit=GACLEAN_MINSIZEINIT
(default: 10)
--GAClean-AlphaInitHMP=GACLEAN_ALPHAINITHMP
(default: [-4.0, 1.0, 6])
--GAClean-ScalesInitHMP=GACLEAN_SCALESINITHMP
(default: [0, 1, 2, 4, 8, 16, 24, 32])
--GAClean-GainInitHMP=GACLEAN_GAININITHMP
(default: 0.1)
--GAClean-RatiosInitHMP=GACLEAN_RATIOSINITHMP
(default: [''])
--GAClean-NThetaInitHMP=GACLEAN_NTHETAINITHMP
(default: 4)
--GAClean-MaxMinorIterInitHMP=GACLEAN_MAXMINORITERINITHMP
(default: 10000)
--GAClean-AllowNegativeInitHMP=GACLEAN_ALLOWNEGATIVEINITHMP
(default: False)
--GAClean-RMSFactorInitHMP=GACLEAN_RMSFACTORINITHMP
(default: 3.0)
--GAClean-ParallelInitHMP=0|1
run island init in parallel. Serial mode may reduce
RAM pressure, and could be useful for debugging.
(default: True)
--GAClean-NCPU=GACLEAN_NCPU
number of cores to use for parallel fitness
calculations (in large-island mode). Default of 0
means use as many as specified by --Parallel-NCPU. If
you find yourself running out of memory here, you
might want to specify a small number of cores for this
step. (default: 0)
PyMoresane internal options:
--MORESANE-NMajorIter=MORESANE_NMAJORITER
Maximum number of iterations allowed in the major
loop. Exit condition. (default: 200)
--MORESANE-NMinorIter=MORESANE_NMINORITER
Maximum number of iterations allowed in the minor
loop. Serves as an exit condition when the SNR is does
not reach a maximum. (default: 200)
--MORESANE-Gain=MORESANE_GAIN
Loop gain for the deconvolution. (default: 0.1)
--MORESANE-ForcePositive=MORESANE_FORCEPOSITIVE
Boolean specifier for whether or not a model must be
strictly positive. (default: True)
--MORESANE-SigmaCutLevel=MORESANE_SIGMACUTLEVEL
Number of sigma at which thresholding is to be
performed. (default: 1)
Options related to logging:
--Log-Memory=0|1 log memory use (default: False)
--Log-Boring=0|1 disable progress bars and other pretty console output
(default: False)
--Log-Append=0|1 append to log file if it exists (default truncates)
(default: False)
Debugging options for the discerning masochist:
--Debug-PauseWorkers=0|1
Pauses worker processes upon launch (with SIGSTOP).
Useful to attach gdb to workers. (default: False)
--Debug-FacetPhaseShift=L,M
Shift in facet coordinates in arcseconds for l and m
(this phase steers the sky over the image plane).
(default: [0.0, 0.0])
--Debug-PrintMinorCycleRMS=0|1
Compute and print RMS in minor cycle iterations.
(default: False)
--Debug-DumpCleanSolutions=DEBUG_DUMPCLEANSOLUTIONS
Dump intermediate minor cycle solutions to a file. Use
0 or 1, or give an explicit list of things to dump
(default: 0)
--Debug-DumpCleanPostageStamps=X,Y,R
Also dump postage stamps when cleaning within a radius
R of X,Y. Implies --Debug-DumpCleanSolutions.
(default: )
--Debug-CleanStallThreshold=DEBUG_CLEANSTALLTHRESHOLD
Throw an exception when a fitted CLEAN component is
below this threshold in flux. Useful for debugging.
(default: 0.0)
--Debug-MemoryGreedy=0|1
Enable memory-greedy mode. Retain certain shared
arrays in RAM as long as possible. (default: True)
--Debug-APPVerbose=DEBUG_APPVERBOSE
Verbosity level for multiprocessing. (default: 0)
--Debug-Pdb=never|always|auto
Invoke pdb on unexpected error conditions (rather than
exit). If set to 'auto', then invoke pdb only if
--Log-Boring is 0. (default: auto)
Miscellaneous options:
--Misc-RandomSeed=N
seed random number generator with explicit seed, if
given. Useful for reproducibility of the random-based
optimizations (sparsification, etc.). (default: none)
--Misc-ConserveMemory=MISC_CONSERVEMEMORY
if true, tries to minimize memory use at possible
expense of runtime. (default: 0)
--Misc-IgnoreDeprecationMarking=0|1
if true, tries to run deprecated modes. Currently
this means that deconvolution machines are reset and
reinitialized each major cycle. (default: False)
```
</details>
## Key parameter explanation
<details>
<summary style="cursor: pointer; color: #007bff;"> Click here to reveal the section </summary>
| 📝 **Note** |
| ------------------------------------------------------------ |
|This section explains the main parameters used in DDFacet, their equivalents in the PREESM dataflow model, and their default values. These parameters control the imaging and deconvolution process. The listed values were obtained by running DDF with its defaults; they let pipeline developers configure their prototypes with the same parameter values for a fair comparison. |
### Parameters Exposed in the PREESM Dataflow Model:
| PREESM Parameter | DDFacet Equivalent | DDF Value | Description |
| --------------------- | --------------------- | ------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `NUM_MAJOR_CYCLE` | `Deconv-MaxMajorIter` | `20` | Number of CLEAN major cycles. More cycles improve convergence but increase sequential processing time. |
| `NUM_MINOR_CYCLE` | `Deconv-MaxMinorIter` | `1000` | Number of minor iterations per major cycle. Too small leads to incomplete deconvolution; too large adds no value. |
| `NUM_KERNEL_SUPPORT` | `CF-Support` | `7` | Size (in pixels) of the convolution kernel support. Small values cause artifacts; large values increase memory use without added benefit. |
| `OVERSAMPLING_FACTOR` | `CF-OverS` | `11` | Oversampling factor for the convolution kernel. Larger values increase accuracy but also memory usage. |
| `NUM_KERNELS` | `CF-Nw` | `100` | Number of w-projection kernels (planes) used to correct for non-coplanar baselines. |
| `GRID_SIZE` | `Image-NPix` | `4000` | Size (in pixels) of the image grid. Must match the expected field of view and resolution. $NPix =\frac{FoV (arcsec)}{CellSize(arcsec)}$ |
### Constants Fixed in the PREESM Configuration:
| PREESM Constant | DDFacet Equivalent | Value | Description |
| -------------------- | -------------------- | ------------- | ------------------------------------------------------------------------------------------------ |
| `Config.gain` | `Deconv-Gain` | `0.1` | Fraction of the peak flux removed at each minor iteration. Low values ensure stable convergence. |
| `Config.noisyfactor` | `Deconv-NoiseFactor` | `1.5` | CLEAN stopping threshold, defined as a multiple of the estimated image RMS noise. |
| `Config.max_w` | `CF-wmax` | `20000` | Maximum w-component used for gridding. Visibilities with higher w values are excluded. |
| `Config.cell_size` | `Image-Cell` | `15` (arcsec) | Angular size of an image pixel. Defines resolution and field of view. |
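As a quick sanity check of the `Image-NPix` relation above, the grid size follows directly from the field of view and the cell size. A minimal sketch (the 60000-arcsec field of view is a hypothetical value chosen to reproduce the defaults in the table):

```python
# Grid size from field of view and cell size: NPix = FoV / CellSize.
# The 60000-arcsec FoV is a hypothetical value; Image-Cell = 15 arcsec
# matches the value in the table above.
fov_arcsec = 60000.0
cell_arcsec = 15.0
npix = int(fov_arcsec / cell_arcsec)
print(npix)  # 4000, matching Image-NPix
```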
</details>
## Running DDFacet on your own laptop
a.k.a. Single-node (multicore) Execution
<details>
<summary style="cursor: pointer; color: #007bff;"> Click here to reveal the section </summary>
| 📝 **Note** |
| ------------------------------------------------------------ |
|The tutorial really starts here. This section covers how to install the tools and use DDFacet on your own laptop, assuming a Linux x86 machine. |
- Run the following commands to install dependencies and prepare your working environment:
```bash
# install singularity
sudo apt update
sudo apt install singularity-container -y
# Install DS9 viewer (for FITS files)
sudo apt install saods9
# Create a writable folder for data
sudo mkdir -p /media/tasse/data
sudo chown -R $USER /media/tasse
```
- Once everything is ready, you can start DDFacet inside the container:
- Single-node architecture (multicore):
```bash
# run the singularity environment
singularity shell -B/home -B/media/tasse/data ./ddf_dev_np1.22.4.sif
```
- Multi-node architecture
```bash
# run the singularity environment
mpirun -np 2 singularity exec -B/home -B/media/tasse/data ./ddf_dev_np1.22.4.sif [DDFacet command directly]
```
This will drop you into a shell where you can run `DDF.py` with your `.parset` file and `.ms` input.
The following examples show command-line usage and explain the expected results.
#### Default imaging (dirty map)
```bash
DDF.py Template.parset \
--Data-MS 0000.MS \
--Output-Name default/test \
--Data-ColName DATA
```
###### Output:
The command will generate two FITS files in the `default` folder:
* `test.dirty.fits`: The raw, uncalibrated image.
* `test.dirty.corr.fits`: The calibrated image (if calibration is applied).
###### Displaying the results:
`python dsm.py default/test.dirty.fits`
or, if you don't care about calibration: `ds9 *.fits -lock frame wcs -zoom to fit`

> :bulb: Color map suggestion in DS9: Color > `inferno`
#### Degridding
To generate model visibilities from an input image (useful for subtraction or calibration):
- Copy a `test.dirty.fits` file to `predict/test.dirty.fits` (the path given to `--Predict-FromImage`).
```bash
DDF.py Template.parset \
--Data-MS 0000.MS \
--Output-Name predict/test \
--Output-Mode Predict \
--Predict-ColName DDF_PREDICT \
--Predict-FromImage predict/test.dirty.fits
```
###### Output:
The command will generate the following FITS file in the `predict` folder:
* `test.cube.model.fits`: Spectral cube model
The command will also store the predicted visibilities in the column `DDF_PREDICT` of `0000.MS`
> :bulb: If the input image (`.fits`) only contains a single frequency channel and the MS has multiple channels, DDFacet will replicate the image across the spectral axis to create a matching cube.
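The channel-replication behaviour described in the note can be sketched with NumPy (a toy illustration of the idea, not DDFacet's actual code path; the channel count and image size are hypothetical):

```python
import numpy as np

# Toy illustration of replicating a single-channel image along the
# spectral axis (NOT DDFacet's internal code, just the idea).
n_chan = 4                        # number of MS channels (hypothetical)
image = np.random.rand(64, 64)    # single-frequency model image

# Broadcast the 2D image into an (n_chan, ny, nx) cube.
cube = np.broadcast_to(image, (n_chan, *image.shape)).copy()

print(cube.shape)                      # (4, 64, 64)
print(np.allclose(cube[0], cube[3]))   # True: every plane is identical
```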
#### Clean
When you observe the sky with a radio telescope, you don't get a perfect image right away. What you get is a dirty image — it contains real sources, but also lots of artifacts from the instrument's point spread function (PSF). The Clean mode helps remove these artifacts and reconstruct a clearer, more realistic image of the sky. Use this mode when you want to visualize a usable sky image or create accurate sky models for calibration or source subtraction.
```bash
DDF.py Template.parset \
--Data-MS 0000.MS \
--Output-Name clean/test \
--Output-Mode Clean \
--Deconv-Mode Hogbom \
--Freq-NBand 1 \
--Freq-NDegridBand 1 \
--Mask-Auto False \
--Deconv-MaxMajorIter 1 \
--Deconv-MaxMinorIter 3 \
--Parallel-NCPU 1
```
###### Output:
The command will generate the following FITS files in the `clean` folder:
* `test.alfa.fits`: Weight map showing confidence per pixel in the model.
* `test.app.model.fits`: The reconstructed model of the sky (just the detected sources).
* `test.app.restored.fits`: The final clean image: the model convolved with the restoring beam, plus the residual.
* `test.brutalModelConv01.fits`: Raw versions of model images (for advanced use/debugging).
* `test.brutalRestored01.fits`: Raw versions of restored images (for advanced use/debugging).
* `test.mask01.fits`: Mask used to guide automatic cleaning (based on signal thresholds).
* `test.noise01.fits`: Noise estimation map of the image.
* `test.residual01.fits`: What’s left after subtracting the model from the dirty image.
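The relation between the model, residual, and restored images can be sketched with NumPy (a toy 1D illustration of the "restored = model convolved with the beam, plus residual" idea, not DDFacet's actual restore code; the beam width and noise level are hypothetical):

```python
import numpy as np

# Toy 1D illustration: the restored image is the model convolved with a
# Gaussian restoring beam, plus the residual (NOT DDFacet's real code).
np.random.seed(0)
model = np.zeros(64)
model[32] = 1.0                           # a single point source
x = np.arange(-8, 9)
beam = np.exp(-0.5 * (x / 2.0) ** 2)      # hypothetical restoring beam
residual = np.random.normal(0, 0.01, 64)  # leftover noise

restored = np.convolve(model, beam, mode='same') + residual
print(int(np.argmax(restored)))  # the source stays at pixel 32
```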

|Deconv-Mode | Speed | Resources | Use When…|
| -------- | -------- | -------- | -------- |
|HMP (Hybrid Matching Pursuit)| 🟢 Fast | 🟢 Light | You want quick results and decent cleaning; a good default.|
|Hogbom | 🟡 Medium | 🟡 Medium | Classic CLEAN, for point sources.|
|SSD (SubSpace Deconvolution)| 🔴 Very Slow | 🔴 Heavy | Island-based deconvolution; avoid unless you really need precise large-scale structures.|
|WSCMS (Weighted Source Component Model Subtraction) | 🔴 Very Slow | 🔴 Heavy | Advanced method for complex sky models, but slow and memory-hungry.|
</details>
## Running DDFacet MPI on your own laptop
<details>
<summary style="cursor: pointer; color: #007bff;"> Click here to reveal the section </summary>
| 📝 **Note** |
| ------------------------------------------------------------ |
| This section shows how to install the tools and use DDFacet while faking a multi-node execution on a Linux x86 laptop. The procedure starts from a distributed MeasurementSet and applies the MPI version of DDFacet.|
```mermaid
graph LR;
A[Split-MS] --> B[MPI-DDFacet];
B --> C[image.fits];
```
### Distributed MeasurementSet
The following steps explain how to properly split a MeasurementSet for multi-node DDFacet execution.
- Install python-casacore: `pip install python-casacore`
- Download the script to split a MeasurementSet [here](https://github.com/Ophelie-Renaud/ddfacet-dft-fft-g2g-tutorials).
```python
import os
import shutil
import numpy as np
from casacore.tables import table
import argparse


def split_ms(ms_path, output_prefix, criterion='time', n_splits=2):
    assert criterion in ['time', 'scan', 'field', 'spw'], "Criterion not supported for a standard split."

    t = table(ms_path)
    colname = {
        'time': 'TIME',
        'scan': 'SCAN_NUMBER',
        'field': 'FIELD_ID',
        'spw': 'DATA_DESC_ID'
    }[criterion]

    column = t.getcol(colname)
    unique_vals = sorted(set(column))
    print(f"📊 {criterion.upper()} values found: {unique_vals}")

    if n_splits > len(unique_vals):
        print(f"⚠️ Only {len(unique_vals)} distinct values for {colname}, adjusting n_splits.")
        n_splits = len(unique_vals)

    # Round-robin assignment of the distinct values to the splits.
    groups = [[] for _ in range(n_splits)]
    for i, val in enumerate(unique_vals):
        groups[i % n_splits].append(val)

    for i, group_vals in enumerate(groups):
        val_str = ','.join(map(str, group_vals))
        query_str = f"{colname} IN [{val_str}]"
        sub = t.query(query_str)
        out_path = f"{output_prefix}_{criterion}_{i}.ms"
        if os.path.exists(out_path):
            shutil.rmtree(out_path)
        sub.copy(out_path, deep=True)
        print(f"✅ Split {i}: {sub.nrows()} rows -> {out_path}")

    t.close()


def split_by_channel(ms_path, output_prefix):
    t = table(ms_path)
    data = t.getcol('DATA')
    n_channels = data.shape[1]
    t.close()
    print(f"📡 Number of channels detected: {n_channels}")

    for chan_idx in range(n_channels):
        print(f"\n📤 Extracting channel {chan_idx}")
        out_path = f"{output_prefix}_channel_{chan_idx}.ms"
        if os.path.exists(out_path):
            shutil.rmtree(out_path)
        shutil.copytree(ms_path, out_path)

        # Keep only the selected channel, zeroing out all the others.
        t_out = table(out_path, readonly=False)
        data_all = t_out.getcol('DATA')
        masked_data = np.zeros_like(data_all)
        masked_data[:, chan_idx, :] = data_all[:, chan_idx, :]
        t_out.putcol('DATA', masked_data)
        t_out.close()
        print(f"✅ Channel {chan_idx} saved to: {out_path}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Split a Measurement Set (MS) according to various criteria.")
    parser.add_argument("ms_path", help="Path to the input MS.")
    parser.add_argument("output_prefix", help="Output prefix or folder.")
    parser.add_argument("--criterion", choices=['time', 'scan', 'field', 'spw', 'channel'], default='time',
                        help="Split criterion (default: time).")
    parser.add_argument("--n_splits", type=int, default=2,
                        help="Number of splits (unused if criterion=channel).")

    args = parser.parse_args()

    if args.criterion == 'channel':
        split_by_channel(args.ms_path, args.output_prefix)
    else:
        split_ms(args.ms_path, args.output_prefix, criterion=args.criterion, n_splits=args.n_splits)
```
Usage:
```bash
python split_ms_tool.py /path/to/my.ms output_prefix --criterion field --n_splits 3
```
*where `--criterion` ∈ [`time` (default: split by observation time), `scan` (split by scan number, i.e. a continuous observation sequence), `field` (split by field/source ID), `spw` (split by spectral window, i.e. frequency), `channel` (split each spectral channel into a separate MS)], and `n_splits` is capped by the number of distinct values of the chosen criterion in your MS.*
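The round-robin grouping used by `split_ms` can be illustrated in isolation (a toy example with hypothetical scan numbers):

```python
# Toy illustration of the round-robin grouping used by split_ms:
# each distinct criterion value is assigned to a split in turn.
unique_vals = [10, 11, 12, 13, 14]  # five hypothetical SCAN_NUMBERs
n_splits = 2

groups = [[] for _ in range(n_splits)]
for i, val in enumerate(unique_vals):
    groups[i % n_splits].append(val)

print(groups)  # [[10, 12, 14], [11, 13]]
```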
### Faking a multi-node execution on your laptop
The following examples run the MPI version of DDFacet on your laptop. For an explanation of the expected results, refer to the single-node section above; the DDFacet options are essentially the same.
#### Default imaging (dirty map)
Replace the MeasurementSet names with the names of your own distributed MeasurementSets.
```bash
mpirun -np 2 singularity exec -B /home -B /media/tasse/data \
./ddf_dev_np1.22.4.sif DDF.py Template.parset \
--Data-MS 0000.MS,0000.MS \
--Output-Name default/test \
--Data-ColName DATA
```
#### Degridding
```bash
mpirun -np 2 singularity exec -B /home -B /media/tasse/data \
./ddf_dev_np1.22.4.sif DDF.py Template.parset \
--Data-MS 0000.MS,0000.MS \
--Output-Name predict/test \
--Output-Mode Predict \
--Predict-ColName DDF_PREDICT \
--Predict-FromImage predict/test.dirty.fits
```
#### Clean
<p align="center">
<img src="https://hackmd.io/_uploads/rytkVN2Glx.png" alt="Figure DDFacet" width="400"/>
<br>
<sub>The Figure gives an overview of the parallelization implemented within DDFacet. The following command enables the reconstruction of a distributed MeasurementSet across a multi-node architecture, based on the work by N. Monnier et al: <a href="https://hal.science/hal-03729202/document">DDFacet parallel</a>.</sub>
</p>
```bash
mpirun -np 2 singularity exec -B /home -B /media/tasse/data \
./ddf_dev_np1.22.4.sif DDF.py Template.parset \
--Data-MS 0000.MS,0000.MS \
--Output-Name clean/test \
--Output-Mode Clean \
--Deconv-Mode HMP \
--Freq-NBand 3 \
--Freq-NDegridBand 1 \
--Mask-Auto True \
--Mask-SigTh 15 \
--Mask-AutoRMSFactor 3 \
--Deconv-MaxMajorIter 1 \
--Deconv-MaxMinorIter 5
```
</details>
## Running DDFacet MPI on Ruche Mesocentre
a.k.a. Multi-node Execution
<details>
<summary style="cursor: pointer; color: #007bff;"> Click here to reveal the section </summary>
| 📝 **Note** |
| ------------------------------------------------------------ |
| This section shows how to install the tools and use DDFacet on the [Ruche Mesocentre Cluster](https://mesocentre.pages.centralesupelec.fr/user_doc/), a SLURM-based cluster; however, feel free to crash-test it wherever you can. The procedure starts from a distributed MeasurementSet and applies the MPI version of DDFacet.|
- To run on the Ruche Mesocentre cluster, you first need a valid account.
- Check your eligibility: Ruche is accessible to all researchers from ENS Paris-Saclay, CentraleSupélec, Université Paris-Saclay and Maison de la Simulation; otherwise, paid access is possible.
- Prepare a [project request](https://mesocentre.universite-paris-saclay.fr/ruche/project_creation_form.html.txt)
- Send the document to the support: `ruche.support@universite-paris-saclay.fr`
Once the account is activated, you can connect via SSH and submit jobs through SLURM. SLURM manages resource allocation (nodes, CPUs, GPUs, memory, time limits) and schedules jobs. Here is an example job script, `slurm.sh`:
```shell
#!/bin/bash
#SBATCH --job-name=ddfacet_job
#SBATCH --output=%x.out
#SBATCH --error=%x.err
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --partition=cpu_med
set -x # debug mode
# Load modules
module purge
module load intel/19.0.3/gcc-4.8.5
module load singularity/3.8.3/gcc-11.2.0
# Clean the user's Python environment
unset PYTHONPATH
export PYTHONNOUSERSITE=1
export PYTHONUSERBASE=/tmp
# Fake HOME inside the container
FAKE_HOME="/tmp/fakehome_${SLURM_JOB_ID}"
mkdir -p $FAKE_HOME
# Launch without the real $HOME
srun singularity exec --cleanenv --no-home \
-H $FAKE_HOME \
-B /gpfs/users/renaudo/data:/media/tasse/data \
-B /home/renaudo/Template.parset:/media/tasse/Template.parset \
./ddf_dev_np1.22.4.sif DDF.py /media/tasse/Template.parset \
--Data-MS /media/tasse/data/0000.MS,/media/tasse/data/0000.MS \
--Output-Name /media/tasse/data/clean/test \
--Output-Mode Clean \
--Deconv-Mode HMP \
--Freq-NBand 3 \
--Freq-NDegridBand 1 \
--Mask-Auto True \
--Mask-SigTh 15 \
--Mask-AutoRMSFactor 3 \
--Deconv-MaxMajorIter 1 \
--Deconv-MaxMinorIter 5
```
- Transfer the required files to the cluster:
```bash
rsync -avh --progress ddf_dev_np1.22.4.sif renaudo@ruche.mesocentre.universite-paris-saclay.fr:/home/renaudo/
rsync -avh --progress Template.parset renaudo@ruche.mesocentre.universite-paris-saclay.fr:/home/renaudo/
rsync -avh --progress SB155.rebin.ms renaudo@ruche.mesocentre.universite-paris-saclay.fr:/home/renaudo/
rsync -avh --progress slurm.sh renaudo@ruche.mesocentre.universite-paris-saclay.fr:/home/renaudo/
```
- Connect and prepare the workspace
```bash
ssh renaudo@ruche.mesocentre.universite-paris-saclay.fr
cd /home/renaudo/
mkdir -p /gpfs/users/renaudo/data
```
- Submit the job
```bash
sbatch slurm.sh
```
- (optional) Commands for monitoring
```bash
squeue -u renaudo # check job status
cat ddfacet_job.out # standard output
cat ddfacet_job.err # error logs
```
- Generated files are stored in `/gpfs/users/renaudo/data/clean` (check with `ls -lh`). To copy them back locally:
`rsync -avh --progress renaudo@ruche.mesocentre.universite-paris-saclay.fr:/gpfs/users/renaudo/data/clean/ ~/local/path/clean/`
</details>
## Multiple Chain Execution of Distributed MSs on Ruche
<details>
<summary style="cursor: pointer; color: #007bff;"> Click here to reveal the section </summary>
| 📝 **Note** |
| ------------------------------------------------------------ |
|The goal here is to run DDFacet several times, imaging different MeasurementSets while controlling the number of visibilities (and, optionally, the grid size and the number of sources/minor cycles), and to record the latency of each run. This makes it possible to draw a trend line of DDFacet's behavior (all other options fixed), position it against existing imager trend lines at NenuFAR scale, extrapolate to SKAO scale, and compare against real-time execution targets.|
```mermaid
graph LR;
subgraph Loop["For ms ∊ [ms_min; ms_max]"]
B[MPI-DDFacet] --> C[image.fits]
end
```
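Once the timings are collected in `runtimes.log` (see the plotting script below), a trend line can be fitted to extrapolate toward larger problem sizes. A minimal sketch with NumPy; the visibility counts and runtimes here are hypothetical placeholders, to be replaced with your own measurements:

```python
import numpy as np

# Hypothetical (n_visibilities, runtime) pairs; replace with values
# extracted from runtimes.log for your own runs.
n_vis = np.array([1e6, 2e6, 4e6, 8e6])
runtime_s = np.array([30.0, 55.0, 110.0, 220.0])

# Fit runtime = a * n_vis + b and extrapolate to a larger problem size.
a, b = np.polyfit(n_vis, runtime_s, 1)
predicted = a * 16e6 + b
print(f"predicted runtime at 16e6 visibilities: {predicted:.0f} s")
```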
Here is the `slurm_multi.sh`:
```shell
#!/bin/bash
#SBATCH --job-name=ddfacet_multi
#SBATCH --output=%x.out
#SBATCH --error=%x.err
#SBATCH --time=04:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=40
#SBATCH --partition=cpu_med
set -x
module purge
module load intel/19.0.3/gcc-4.8.5
module load singularity/3.8.3/gcc-11.2.0
unset PYTHONPATH
export PYTHONNOUSERSITE=1
export PYTHONUSERBASE=/tmp
FAKE_HOME="/tmp/fakehome_${SLURM_JOB_ID}"
mkdir -p $FAKE_HOME
# List of MeasurementSets to process
MS_LIST=(
"/media/tasse/data/0000.MS"
"/media/tasse/data/0000.MS"
"/media/tasse/data/0000.MS"
)
# Loop over them
for MS in "${MS_LIST[@]}"; do
BASENAME=$(basename "$MS" .MS)
START=$(date +%s)
srun singularity exec --cleanenv --no-home \
-H $FAKE_HOME \
-B /gpfs/users/renaudo/data:/media/tasse/data \
-B /home/renaudo/Template.parset:/media/tasse/Template.parset \
./ddf_dev_np1.22.4.sif DDF.py /media/tasse/Template.parset \
--Data-MS "$MS" \
--Output-Name /media/tasse/data/clean/${BASENAME} \
--Output-Mode Clean \
--Deconv-Mode HMP \
--Freq-NBand 3 \
--Freq-NDegridBand 1 \
--Mask-Auto True \
--Mask-SigTh 15 \
--Mask-AutoRMSFactor 3 \
--Deconv-MaxMajorIter 1 \
--Deconv-MaxMinorIter 5 \
--Parallel-NCPU ${SLURM_CPUS_PER_TASK}
END=$(date +%s)
RUNTIME=$((END-START))
echo "${BASENAME} took ${RUNTIME} seconds" | tee -a runtimes.log
done
```
- Transfer the required files to the cluster:
```bash
rsync -avh --progress slurm_multi.sh renaudo@ruche.mesocentre.universite-paris-saclay.fr:/home/renaudo/
```
- Connect and prepare the workspace
```bash
ssh renaudo@ruche.mesocentre.universite-paris-saclay.fr
cd /home/renaudo/
mkdir -p /gpfs/users/renaudo/data
```
- Submit the job
```bash
sbatch slurm_multi.sh
```
- (optional) Commands for monitoring
```bash
squeue -u renaudo # check job status
cat ddfacet_multi.out # standard output
cat ddfacet_multi.err # error logs
```
- Retrieve the timings:
`rsync -avh --progress renaudo@ruche.mesocentre.universite-paris-saclay.fr:/home/renaudo/runtimes.log ~/Desktop/`
- Plot the runtime trend line:
```python
import matplotlib.pyplot as plt

# Read the log file (lines of the form "<name> took <seconds> seconds")
ms_names = []
runtimes = []

with open("runtimes.log") as f:
    for line in f:
        parts = line.strip().split()
        ms_names.append(parts[0])
        runtimes.append(float(parts[2]))  # seconds

# Build the figure
plt.figure(figsize=(8, 5))
plt.plot(ms_names, runtimes, marker='o', linestyle='-', color='blue')
plt.xlabel("MeasurementSet")
plt.ylabel("Runtime (s)")
plt.title("Execution time per MS")
plt.grid(True)
plt.tight_layout()
plt.show()
```
</details>
## Read a MeasurementSet
ToDo
## Note
:warning: *This tutorial has been tested with the `SB155.rebin.ms` MeasurementSet obtained from NenuFAR via nancep.*
Contributions and ideas are welcome!
## Contact
For questions or feedback, please contact:
- This tutorial was written by [Ophélie Renaud](mailto:ophelie.renaud@ens-paris-saclay.fr) from the PREESM/SimSDP community (previously @INSA, @IETR, now @SATIE Paris-Saclay, @IRISA, @ENS Rennes :fr:).