Makers - Data reduction#

Data reduction: from observations to binned datasets

Introduction#

The gammapy.makers sub-package contains classes to perform data reduction tasks from DL3 data to binned datasets. In the data reduction step the DL3 data is prepared for modeling and fitting, by binning events into a counts map and interpolating the exposure, background, psf and energy dispersion on the chosen analysis geometry.

Setup#

import numpy as np
from astropy import units as u
from regions import CircleSkyRegion
import matplotlib.pyplot as plt
from IPython.display import display
from gammapy.data import DataStore
from gammapy.datasets import Datasets, MapDataset, SpectrumDataset
from gammapy.makers import (
    DatasetsMaker,
    FoVBackgroundMaker,
    MapDatasetMaker,
    ReflectedRegionsBackgroundMaker,
    SafeMaskMaker,
    SpectrumDatasetMaker,
)
from gammapy.maps import MapAxis, RegionGeom, WcsGeom

Check setup#

from gammapy.utils.check import check_tutorials_setup

check_tutorials_setup()
System:

        python_executable      : /home/runner/work/gammapy-docs/gammapy-docs/gammapy/.tox/build_docs/bin/python
        python_version         : 3.9.16
        machine                : x86_64
        system                 : Linux


Gammapy package:

        version                : 1.1
        path                   : /home/runner/work/gammapy-docs/gammapy-docs/gammapy/.tox/build_docs/lib/python3.9/site-packages/gammapy


Other packages:

        numpy                  : 1.24.3
        scipy                  : 1.10.1
        astropy                : 5.2.2
        regions                : 0.7
        click                  : 8.1.3
        yaml                   : 6.0
        IPython                : 8.14.0
        jupyterlab             : not installed
        matplotlib             : 3.7.1
        pandas                 : not installed
        healpy                 : 1.16.2
        iminuit                : 2.21.3
        sherpa                 : 4.15.1
        naima                  : 0.10.0
        emcee                  : 3.1.4
        corner                 : 2.2.2
        ray                    : 2.5.0


Gammapy environment variables:

        GAMMAPY_DATA           : /home/runner/work/gammapy-docs/gammapy-docs/gammapy-datasets/1.1

Dataset#

The counts, exposure, background and IRF maps are bundled together in a data structure named MapDataset.

The first step of the data reduction is to create an empty dataset. A MapDataset can be created from any WcsGeom object. This is illustrated in the following example:

energy_axis = MapAxis.from_bounds(
    1, 10, nbin=11, name="energy", unit="TeV", interp="log"
)
geom = WcsGeom.create(
    skydir=(83.63, 22.01),
    axes=[energy_axis],
    width=5 * u.deg,
    binsz=0.05 * u.deg,
    frame="icrs",
)
dataset_empty = MapDataset.create(geom=geom)
print(dataset_empty)
MapDataset
----------

  Name                            : Xbps7GDv

  Total counts                    : 0
  Total background counts         : 0.00
  Total excess counts             : 0.00

  Predicted counts                : 0.00
  Predicted background counts     : 0.00
  Predicted excess counts         : nan

  Exposure min                    : 0.00e+00 m2 s
  Exposure max                    : 0.00e+00 m2 s

  Number of total bins            : 110000
  Number of fit bins              : 0

  Fit statistic type              : cash
  Fit statistic value (-2 log(L)) : nan

  Number of models                : 0
  Number of parameters            : 0
  Number of free parameters       : 0

It is possible to compute the instrument response functions with different spatial and energy bins as compared to the counts and background maps. For example, one can specify a true energy axis which defines the energy binning of the IRFs:

energy_axis_true = MapAxis.from_bounds(
    0.3, 10, nbin=31, name="energy_true", unit="TeV", interp="log"
)
dataset_empty = MapDataset.create(geom=geom, energy_axis_true=energy_axis_true)

For the detail of the other options available, you can always call the help:

Help on method create in module gammapy.datasets.map:

create(geom, energy_axis_true=None, migra_axis=None, rad_axis=None, binsz_irf=<Quantity 0.2 deg>, reference_time='2000-01-01', name=None, meta_table=None, reco_psf=False, **kwargs) method of abc.ABCMeta instance
    Create a MapDataset object with zero filled maps.

    Parameters
    ----------
    geom : `~gammapy.maps.WcsGeom`
        Reference target geometry in reco energy, used for counts and background maps
    energy_axis_true : `~gammapy.maps.MapAxis`
        True energy axis used for IRF maps
    migra_axis : `~gammapy.maps.MapAxis`
        If set, this provides the migration axis for the energy dispersion map.
        If not set, an EDispKernelMap is produced instead. Default is None
    rad_axis : `~gammapy.maps.MapAxis`
        Rad axis for the psf map
    binsz_irf : float
        IRF Map pixel size in degrees.
    reference_time : `~astropy.time.Time`
        the reference time to use in GTI definition
    name : str
        Name of the returned dataset.
    meta_table : `~astropy.table.Table`
        Table listing information on observations used to create the dataset.
        One line per observation for stacked datasets.
    reco_psf : bool
        Use reconstructed energy for the PSF geometry. Default is False

    Returns
    -------
    empty_maps : `MapDataset`
        A MapDataset containing zero filled maps

    Examples
    --------
    >>> from gammapy.datasets import MapDataset
    >>> from gammapy.maps import WcsGeom, MapAxis

    >>> energy_axis = MapAxis.from_energy_bounds(1.0, 10.0, 4, unit="TeV")
    >>> energy_axis_true = MapAxis.from_energy_bounds(
                0.5, 20, 10, unit="TeV", name="energy_true"
            )
    >>> geom = WcsGeom.create(
                skydir=(83.633, 22.014),
                binsz=0.02, width=(2, 2),
                frame="icrs",
                proj="CAR",
                axes=[energy_axis]
            )
    >>> empty = MapDataset.create(geom=geom, energy_axis_true=energy_axis_true, name="empty")

Once this empty “reference” dataset is defined, it can be filled with observational data using the MapDatasetMaker:

# get observation
data_store = DataStore.from_dir("$GAMMAPY_DATA/hess-dl3-dr1")
obs = data_store.get_observations([23592])[0]

# fill dataset
maker = MapDatasetMaker()
dataset = maker.run(dataset_empty, obs)
print(dataset)

dataset.counts.sum_over_axes().plot(stretch="sqrt", add_cbar=True)
plt.show()
makers
MapDataset
----------

  Name                            : ljPCb6a2

  Total counts                    : 2016
  Total background counts         : 1866.72
  Total excess counts             : 149.28

  Predicted counts                : 1866.72
  Predicted background counts     : 1866.72
  Predicted excess counts         : nan

  Exposure min                    : 1.19e+02 m2 s
  Exposure max                    : 1.09e+09 m2 s

  Number of total bins            : 110000
  Number of fit bins              : 110000

  Fit statistic type              : cash
  Fit statistic value (-2 log(L)) : nan

  Number of models                : 0
  Number of parameters            : 0
  Number of free parameters       : 0

The MapDatasetMaker fills the corresponding counts, exposure, background, psf and edisp map per observation. The MapDatasetMaker has a selection parameter, in case some of the maps should not be computed. There is also a background_oversampling parameter that defines the oversampling factor in energy used to compute the background (default is None).

Safe data range handling#

To exclude the data range from a MapDataset, that is associated with high systematics on instrument response functions, a mask_safe can be defined. The mask_safe is a Map object with bool data type, which indicates for each pixel, whether it should be included in the analysis. The convention is that a value of True or 1 includes the pixel, while a value of False or 0 excludes a pixels from the analysis. To compute safe data range masks according to certain criteria, Gammapy provides a SafeMaskMaker class. The different criteria are given by the methodsargument, available options are :

  • aeff-default, uses the energy ranged specified in the DL3 data files, if available.

  • aeff-max, the lower energy threshold is determined such as the effective area is above a given percentage of its maximum

  • edisp-bias, the lower energy threshold is determined such as the energy bias is below a given percentage

  • offset-max, the data beyond a given offset radius from the observation center are excluded

  • bkg-peak, the energy threshold is defined as the upper edge of the energy bin with the highest predicted background rate. This method was introduced in the HESS DL3 validation paper: https://arxiv.org/pdf/1910.08088.pdf

Note that currently some methods computing a safe energy range (“aeff-default”, “aeff-max” and “edisp-bias”) determine a true energy range and apply it to reconstructed energy, effectively neglecting the energy dispersion.

Multiple methods can be combined. Here is an example :

safe_mask_maker = SafeMaskMaker(
    methods=["aeff-default", "offset-max"], offset_max="3 deg"
)

dataset = maker.run(dataset_empty, obs)
dataset = safe_mask_maker.run(dataset, obs)
print(dataset.mask_safe)

dataset.mask_safe.sum_over_axes().plot()
plt.show()
makers
WcsNDMap

        geom  : WcsGeom
        axes  : ['lon', 'lat', 'energy']
        shape : (100, 100, 11)
        ndim  : 3
        unit  :
        dtype : bool

The SafeMaskMaker does not modify any data, but only defines the mask_safe attribute. This means that the safe data range can be defined and modified in between the data reduction and stacking and fitting. For a joint-likelihood analysis of multiple observations the safe mask is applied to the counts and predicted number of counts map during fitting. This correctly accounts for contributions (spill-over) by the PSF from outside the field of view.

Background estimation#

The background computed by the MapDatasetMaker gives the number of counts predicted by the background IRF of the observation. Because its actual normalization, or even its spectral shape, might be poorly constrained, it is necessary to correct it with the data themselves. This is the role of background estimation Makers.

FoV background#

If the background energy dependent morphology is well reproduced by the background model stored in the IRF, it might be that its normalization is incorrect and that some spectral corrections are necessary. This is made possible thanks to the FoVBackgroundMaker. This technique is recommended in most 3D data reductions. For more details and usage, see fov_background.

Here we are going to use a FoVBackgroundMaker that will rescale the background model to the data excluding the region where a known source is present. For more details on the way to create exclusion masks see the mask maps notebook.

Other backgrounds production methods are available as listed below.

Ring background#

If the background model does not reproduce well the morphology, a classical approach consists in applying local corrections by smoothing the data with a ring kernel. This allows to build a set of OFF counts taking into account the imperfect knowledge of the background. This is implemented in the RingBackgroundMaker which transforms the Dataset in a MapDatasetOnOff. This technique is mostly used for imaging, and should not be applied for 3D modeling and fitting.

For more details and usage, see ring_background.

Reflected regions background#

In the absence of a solid background model, a classical technique in Cherenkov astronomy for 1D spectral analysis is to estimate the background in a number of OFF regions. When the background can be safely estimated as radially symmetric w.r.t. the pointing direction, one can apply the reflected regions background technique. This is implemented in the ReflectedRegionsBackgroundMaker which transforms a SpectrumDataset in a SpectrumDatasetOnOff. This method is only used for 1D spectral analysis.

For more details and usage, see reflected_background.

Data reduction loop#

The data reduction steps can be combined in a single loop to run a full data reduction chain. For this the MapDatasetMaker is run first and the output dataset is the passed on to the next maker step. Finally, the dataset per observation is stacked into a larger map.

data_store = DataStore.from_dir("$GAMMAPY_DATA/hess-dl3-dr1")
observations = data_store.get_observations([23523, 23592, 23526, 23559])

energy_axis = MapAxis.from_bounds(
    1, 10, nbin=11, name="energy", unit="TeV", interp="log"
)
geom = WcsGeom.create(skydir=(83.63, 22.01), axes=[energy_axis], width=5, binsz=0.02)

dataset_maker = MapDatasetMaker()
safe_mask_maker = SafeMaskMaker(
    methods=["aeff-default", "offset-max"], offset_max="3 deg"
)

stacked = MapDataset.create(geom)

for obs in observations:
    local_dataset = stacked.cutout(obs.get_pointing_icrs(obs.tmid), width="6 deg")
    dataset = dataset_maker.run(local_dataset, obs)
    dataset = safe_mask_maker.run(dataset, obs)
    dataset = fov_bkg_maker.run(dataset)
    stacked.stack(dataset)

print(stacked)
MapDataset
----------

  Name                            : jBlAJZOA

  Total counts                    : 7972
  Total background counts         : 7555.42
  Total excess counts             : 416.58

  Predicted counts                : 7555.42
  Predicted background counts     : 7555.42
  Predicted excess counts         : nan

  Exposure min                    : 1.04e+06 m2 s
  Exposure max                    : 3.22e+09 m2 s

  Number of total bins            : 687500
  Number of fit bins              : 687214

  Fit statistic type              : cash
  Fit statistic value (-2 log(L)) : nan

  Number of models                : 0
  Number of parameters            : 0
  Number of free parameters       : 0

To maintain good performance it is always recommended to do a cutout of the MapDataset as shown above. In case you want to increase the offset-cut later, you can also choose a larger width of the cutout than 2 * offset_max.

Note that we stack the individual MapDataset, which are computed per observation into a larger dataset. During the stacking the safe data range mask (mask_safe) is applied by setting data outside to zero, then data is added to the larger map dataset. To stack multiple observations, the larger dataset must be created first.

The data reduction loop shown above can be done through the DatasetsMaker class that take as argument a list of makers. Note that the order of the makers list is important as it determines their execution order. Moreover the stack_datasets option offers the possibility to stack or not the output datasets, and the n_jobs option allow to use multiple processes on run.

Datasets
--------

Dataset 0:

  Type       : MapDataset
  Name       : EaXHe1f_
  Instrument : HESS
  Models     : ['EaXHe1f_-bkg']

Dataset 1:

  Type       : MapDataset
  Name       : 6sEPhYwN
  Instrument : HESS
  Models     : ['6sEPhYwN-bkg']

Dataset 2:

  Type       : MapDataset
  Name       : INbgAqMg
  Instrument : HESS
  Models     : ['INbgAqMg-bkg']

Dataset 3:

  Type       : MapDataset
  Name       : R80-KIIM
  Instrument : HESS
  Models     : ['R80-KIIM-bkg']

Spectrum dataset#

The spectrum datasets represent 1D spectra along an energy axis within a given on region. The SpectrumDataset contains a counts spectrum, and a background model. The SpectrumDatasetOnOff contains ON and OFF count spectra, background is implicitly modeled via the OFF counts spectrum.

The SpectrumDatasetMaker make spectrum dataset for a single observation. In that case the irfs and background are computed at a single fixed offset, which is recommended only for point-sources.

Here is an example of data reduction loop to create SpectrumDatasetOnOff datasets:

Datasets
--------

Dataset 0:

  Type       : SpectrumDatasetOnOff
  Name       : obs-23523
  Instrument : HESS
  Models     :

Dataset 1:

  Type       : SpectrumDatasetOnOff
  Name       : obs-23592
  Instrument : HESS
  Models     :

Dataset 2:

  Type       : SpectrumDatasetOnOff
  Name       : obs-23526
  Instrument : HESS
  Models     :

Dataset 3:

  Type       : SpectrumDatasetOnOff
  Name       : obs-23559
  Instrument : HESS
  Models     :

Total running time of the script: ( 0 minutes 20.997 seconds)

Gallery generated by Sphinx-Gallery