
PIG 28 - Gammapy version 2.0 Roadmap#

  • Author: Axel Donath, Bruno Khélifi, Régis Terrier and others

  • Created: March 20th, 2023

  • Accepted: withdrawn on May 23rd, 2025

  • Status: Withdrawn

  • Discussion: GH 4388

Abstract#

The second Long Term Stable (LTS) release of Gammapy will take place at the end of 2024. This PIG discusses the main areas of development foreseen for v2.0 and proposes a prioritization of the effort, together with plausible milestones for the expected intermediate feature releases (v1.x).

This document first describes a number of general use cases that should be made possible with Gammapy. It then describes specific projects and changes to be made to the library to support such use cases and further improvements.

Several specific aspects and projects are discussed in dedicated PIGs (e.g. unbinned analysis in GH 4253 or priors and likelihood in GH 4381).

Use cases to support#

This section describes new use cases that we would like to see supported in future versions of Gammapy.

Event type handling#

Starting from a datastore containing event lists and IRFs with event types and classes, the user produces a list of datasets per event type or class. The metadata of the datasets allows complex handling of event types at the modeling/fitting step (e.g. a joint fit of types A and B, stacking all types together, etc.).

Manipulation and selection of Datasets#

After reducing the data into a list of spectrum datasets, the user wants to stack the spectra obtained from observations in given bands of zenith angle. The metadata stored on the datasets allows complex manipulation at the modeling/fitting step.
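A minimal sketch of this use case is given below. It uses plain dictionaries as stand-ins for reduced datasets; the `zenith_deg` metadata key, the band edges and the `group_by_zenith_band` helper are assumptions made for illustration and are not part of the Gammapy API. Stacking is reduced to summing counts.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical stand-ins for reduced spectrum datasets: each carries a name,
# summed counts, and a metadata entry with the mean zenith angle.
datasets = [
    {"name": "obs-1", "counts": 120, "meta": {"zenith_deg": 12.0}},
    {"name": "obs-2", "counts": 95, "meta": {"zenith_deg": 27.5}},
    {"name": "obs-3", "counts": 80, "meta": {"zenith_deg": 18.3}},
    {"name": "obs-4", "counts": 60, "meta": {"zenith_deg": 44.1}},
]

# Band edges in degrees (assumed values): [0, 20), [20, 40), [40, 60)
band_edges = [0.0, 20.0, 40.0, 60.0]


def group_by_zenith_band(datasets, edges):
    """Return {band index: list of datasets} using the zenith metadata."""
    groups = defaultdict(list)
    for dataset in datasets:
        zenith = dataset["meta"]["zenith_deg"]
        if not edges[0] <= zenith < edges[-1]:
            continue  # outside all bands: skip this dataset
        groups[bisect_right(edges, zenith) - 1].append(dataset)
    return dict(groups)


groups = group_by_zenith_band(datasets, band_edges)

# "Stacking" is reduced here to summing the counts within each band.
stacked_counts = {
    band: sum(d["counts"] for d in members) for band, members in groups.items()
}
```

In a real workflow the selection would rely on validated dataset metadata rather than free-form dictionaries, which is one motivation for the metadata work described later in this document.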

Unbinned spectral or 3D analysis#

The user produces a dataset with reprojected IRFs and a list of events, and performs model fitting and parameter estimation by computing the likelihood of individual events.
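As an illustration of what computing the likelihood of individual events means, the sketch below evaluates the extended unbinned Poisson statistic, -2 ln L = 2 (N_pred - sum_i ln dN/dE(E_i)), for a power law in reconstructed energy. It deliberately omits any IRF folding, and the function names and event energies are made up for the example; this is not the design proposed in GH 4253.

```python
import math


def powerlaw_rate(energy, amplitude, index, e_ref=1.0):
    """Predicted differential counts dN/dE at `energy` (counts per TeV)."""
    return amplitude * (energy / e_ref) ** (-index)


def powerlaw_npred(amplitude, index, e_min, e_max, e_ref=1.0):
    """Analytic integral of the power law between e_min and e_max."""
    if math.isclose(index, 1.0):
        return amplitude * e_ref * math.log(e_max / e_min)
    k = 1.0 - index
    return amplitude * e_ref / k * ((e_max / e_ref) ** k - (e_min / e_ref) ** k)


def unbinned_stat(event_energies, amplitude, index, e_min, e_max):
    """-2 ln L of the extended unbinned Poisson likelihood."""
    npred = powerlaw_npred(amplitude, index, e_min, e_max)
    log_sum = sum(
        math.log(powerlaw_rate(energy, amplitude, index))
        for energy in event_energies
    )
    return 2.0 * (npred - log_sum)


# Made-up event energies in TeV, for illustration only.
event_energies = [1.2, 1.5, 2.0, 3.1, 5.0]
stat = unbinned_stat(event_energies, amplitude=5.0, index=2.0, e_min=1.0, e_max=10.0)
```

For a fixed index, this statistic is minimal when the amplitude makes the predicted number of events equal to the observed one, which gives a quick sanity check of any implementation.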

Source detection#

After creating a MapDataset, a user extracts a list of source candidate positions and fluxes with associated errors and estimated significance. The list can be used as input for model fitting at later steps.

Transient source detection#

The user wants to search for unknown transient sources in a given observation or set of observations.

The user wants to find flares in the long term light curve of a variable source, or to study source variability on various temporal scales. A number of standard quantities, such as the excess variance or the flux doubling time scale, can be extracted from datasets or light curve products.
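The two quantities named above can be sketched as follows. The normalized excess variance is taken here as (S^2 - mean(err^2)) / mean(flux)^2 with S^2 the sample variance, and the doubling time as dt * ln 2 / |ln(F2 / F1)| minimized over consecutive points. Other definitions of the doubling time exist, so this choice, the function names and the flux values are assumptions for illustration.

```python
import math
from statistics import mean, variance


def normalized_excess_variance(flux, flux_err):
    """(sample variance - mean squared error) / mean flux squared."""
    mean_flux = mean(flux)
    sample_var = variance(flux)  # uses the N - 1 denominator
    mean_sq_err = mean([err**2 for err in flux_err])
    return (sample_var - mean_sq_err) / mean_flux**2


def min_doubling_time(time, flux):
    """Shortest doubling (or halving) time between consecutive points.

    Assumes an exponential change between neighbours:
    T2 = dt * ln(2) / |ln(F2 / F1)|. Pairs with equal flux are skipped.
    """
    candidates = []
    for (t1, f1), (t2, f2) in zip(zip(time, flux), zip(time[1:], flux[1:])):
        if f1 > 0 and f2 > 0 and f1 != f2:
            candidates.append((t2 - t1) * math.log(2.0) / abs(math.log(f2 / f1)))
    return min(candidates) if candidates else math.inf


# Made-up light curve (arbitrary flux units, time in days).
time = [0.0, 1.0, 2.0, 3.0]
flux = [1.0, 2.0, 4.0, 2.0]
flux_err = [0.1, 0.1, 0.1, 0.1]

nxs = normalized_excess_variance(flux, flux_err)
t_double = min_doubling_time(time, flux)
```

Neither function propagates uncertainties; a production version would also return an error estimate for each quantity.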

Spectral unfolding#

The user wants to extract the intrinsic source spectrum with minimal assumptions on its shape (mostly a regularity criterion).

Morphology estimation#

An estimator API allows the user to test the model morphology parameters: extension profile and associated significance, position error contours. Applying it per energy band allows testing for energy dependent morphology.

Handling systematic effects#

The user wants to add a systematic effect of given amplitude to the IRFs of a reduced dataset (e.g. a bias in the absolute energy scale, or a possible broadening of the PSF) in order to quantify its impact on a measurement. Specific models for such IRF uncertainties could be defined on any dataset.

Nuisance parameters and priors#

The user wants to add a systematic effect of unknown amplitude (e.g. a bias in the absolute energy scale) and wants to estimate the impact of this effect on the parameter estimation assuming a prior distribution of the nuisance parameter.
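One common way to express this is to add a penalty term for the nuisance parameter to the fit statistic. The sketch below adds a Gaussian prior term, ((bias - 0) / sigma)^2, to a Cash statistic in which an energy-scale bias shifts the energies at which a power law is evaluated. The function names, the toy counts and the way the bias enters the model are assumptions for illustration, not the API discussed in GH 4381.

```python
import math


def cash(counts, npred):
    """Cash statistic: -2 ln L for Poisson counts, up to a constant."""
    return 2.0 * sum(mu - n * math.log(mu) for n, mu in zip(counts, npred))


def gaussian_prior_stat(value, mean, sigma):
    """-2 ln of a Gaussian prior, up to a constant."""
    return ((value - mean) / sigma) ** 2


def predicted_counts(energies, widths, norm, index, bias):
    """Power-law counts per bin, evaluated at energies scaled by (1 + bias)."""
    return [
        norm * ((1.0 + bias) * energy) ** (-index) * width
        for energy, width in zip(energies, widths)
    ]


def total_stat(counts, energies, widths, norm, index, bias, bias_sigma):
    """Data term plus the prior penalty on the nuisance parameter."""
    npred = predicted_counts(energies, widths, norm, index, bias)
    return cash(counts, npred) + gaussian_prior_stat(bias, 0.0, bias_sigma)


# Made-up binned data: bin centre energies (TeV), bin widths (TeV), counts.
energies = [1.0, 2.0, 4.0]
widths = [1.0, 2.0, 4.0]
counts = [10, 5, 3]

stat_nominal = total_stat(counts, energies, widths, 10.0, 2.0, 0.0, 0.1)
stat_biased = total_stat(counts, energies, widths, 10.0, 2.0, 0.1, 0.1)
```

Minimizing `total_stat` over both the physical parameters and `bias` then propagates the assumed 10% energy-scale uncertainty into the parameter errors.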

Specific Projects#

Here we list specific projects.

Configurable API#

To provide safety w.r.t. class instantiations and to allow for an easily configurable API, the main Gammapy API classes should be directly configurable.

This is a generic problem that could be tackled using an approach similar to that of ctapipe. Pydantic and its BaseModel class seem to be a widely used solution; Pydantic is already used in the v1.0 AnalysisConfig.

Gammapy Maps#

gammapy.maps is one of the biggest elements in Gammapy and requires expertise and dedication to maintain properly. It is also one of the subpackages with potentially the largest impact outside the gamma-ray community. If we find a few contributors from outside, it might be worth splitting out gammapy.maps as an independent package. This is of course a very long term perspective, beyond v2.0.

Proposed minor changes#

  • Improve the user interface to Map. In particular, better protect and improve the documentation of Map.create(), MapGeom.create() and the constructors. Improve the handling of MapCoord to ease slice extraction.

  • RegionGeom could support sizes changing with axis. This would handle energy dependent region sizes (see GH 3863).

  • The serialization code is complex and will become hard to maintain when new formats are introduced, see e.g. for MapAxis. Some clean-up and refactoring is necessary here.

Possible major changes#

We discuss here some aspects that should be explored.

  • IRF and Map share a similar data model: an N-dimensional Quantity with a MapAxes and an interpolator. In addition, Maps use the Geom object to represent the spherical coordinates.

    • Having a common data structure could help make maps fully re-usable for IRFs. This might be a common use case with pyirf.

    • One could allow Maps and MapCoord objects without spatial axes. Introducing specialized spatial axes such as WcsMapAxis, RegionMapAxis or HpxMapAxis could allow avoiding using Geom objects.

    • The evaluation of the feasibility will require some detailed prototyping. Such a major change would probably be possible at best only when releasing v2.0. Having a prototype at this timescale would be nice.

  • Migrate from the healpy dependency to using astropy/astropy-healpix or cds-astro/cds-healpix-python. Another option could be to interface multi resolution HPX maps: https://mhealpy.readthedocs.io ?

Data model and data formats#

As of v1.0, Gammapy’s internal DL3 data structures are very deeply intertwined with the GADF specification. Astropy tables are read from GADF compliant FITS files and stored as is, with part of the information being stored in table.meta.

This is problematic for the following reasons:

  • This prevents the support of multiple formats, since the internal data structure is tied to one specific format.

  • Data is not in the optimal in-memory representation. For instance, times should be stored as astropy.time.Time instances, and coordinates as SkyCoord.

  • Data is not validated on input. Errors can happen deep in the code for something that could have been caught when reading the input file or creating the object.

  • Writing data out is harder.

We should:

  • define the internal data model, via the corresponding data classes (EventList, IRFs, etc.) and introduce a validation mechanism on input.

  • build a clear IO boundary between internal and external data representations that supports various versions of various formats.

  • define a metadata structure
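The IO boundary described in the list above could, for example, take the form of a registry of reader functions keyed by format and version, each converting external data into one internal representation. Everything in the sketch below is hypothetical: the `EventTable` class, the `toy-format` name, its two versions and their column names are invented for illustration and do not describe GADF or any existing Gammapy code.

```python
from dataclasses import dataclass


@dataclass
class EventTable:
    """Hypothetical internal representation, independent of any file format."""

    energy_tev: list
    time_mjd: list


# Registry mapping (format name, format version) to a reader function.
READERS = {}


def register_reader(fmt, version):
    """Decorator that registers a reader for one version of one format."""

    def decorator(func):
        READERS[(fmt, version)] = func
        return func

    return decorator


@register_reader("toy-format", "1.0")
def read_toy_v1(raw):
    # Version 1.0 of the made-up format stores energies in GeV.
    return EventTable(
        energy_tev=[energy / 1000.0 for energy in raw["energy_gev"]],
        time_mjd=list(raw["time_mjd"]),
    )


@register_reader("toy-format", "2.0")
def read_toy_v2(raw):
    # Version 2.0 of the made-up format already stores energies in TeV.
    return EventTable(
        energy_tev=list(raw["energy_tev"]),
        time_mjd=list(raw["time_mjd"]),
    )


def read_events(raw, fmt, version):
    """Dispatch to the registered reader, failing early on unknown formats."""
    try:
        reader = READERS[(fmt, version)]
    except KeyError:
        raise ValueError(f"Unsupported format: {fmt} {version}") from None
    return reader(raw)


old = read_events({"energy_gev": [1500.0], "time_mjd": [60000.5]}, "toy-format", "1.0")
new = read_events({"energy_tev": [1.5], "time_mjd": [60000.5]}, "toy-format", "2.0")
```

With this layout the rest of the library only ever sees `EventTable`, and supporting a new format version means adding one reader (and one writer) rather than touching the internal data structure.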

Clarify internal Gammapy DL3 data model#

Each DL3 object should have its validate() method called on init.
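A minimal sketch of validation on init is shown below, using a standard-library dataclass whose `__post_init__` calls `validate()`. The `EventListSketch` class, its fields and the specific checks are assumptions chosen for illustration; the actual data classes and rules are the subject of the issues referenced below.

```python
from dataclasses import dataclass


@dataclass
class EventListSketch:
    """Hypothetical DL3-like container that validates itself on creation."""

    energy_tev: list
    ra_deg: list
    dec_deg: list

    def __post_init__(self):
        # Called automatically at the end of the generated __init__.
        self.validate()

    def validate(self):
        """Raise ValueError if the content is inconsistent."""
        n_events = len(self.energy_tev)
        if not (len(self.ra_deg) == len(self.dec_deg) == n_events):
            raise ValueError("All columns must have the same length")
        if any(energy <= 0 for energy in self.energy_tev):
            raise ValueError("Energies must be strictly positive")
        if any(not -90.0 <= dec <= 90.0 for dec in self.dec_deg):
            raise ValueError("Declination must lie within [-90, 90] deg")


events = EventListSketch(
    energy_tev=[0.8, 2.3],
    ra_deg=[83.6, 83.7],
    dec_deg=[22.0, 22.1],
)
```

Invalid input then fails at object creation, with a message that points at the data, instead of deep inside a later computation.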

See also the general discussion in GH 3767. The specific subparts are discussed in GH 4238, GH 4239, GH 4240 and GH 4241.

Version Support for I/O#

Meta Data Handling#

A metadata class structure specific to Gammapy should be designed and implemented. It should allow complex types (e.g. SkyCoord or even Map), validate its content, and allow hierarchical structure (i.e. a metadata object should be able to contain another one). Once defined, specific classes such as IRFMetaData, DatasetMetaData or ObservationMetaData can be introduced with their own serialization and validation. This is discussed in PIG 25, which proposes to handle metadata with pydantic, since it allows defining hierarchical structures and validating them. See GH 4491.

Once this is defined, a second question must be tackled, namely the metadata model: what is metadata, what is data, and where to draw the line?

Estimators#

Estimators should provide the sensitivity of given Datasets for an estimated quantity, in particular for flux. Flux map estimators should provide sensitivity maps, and flux point estimators could provide the spectral flux sensitivity.

Documentation#

Main documentation#

  • Introduce a deprecation system

  • Update pydata-sphinx-theme

  • More detailed

  • Use type hints in Gammapy everywhere?
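For the deprecation system listed above, one possible shape is a decorator that emits a warning naming the version and the replacement. The sketch below relies only on the standard `warnings` and `functools` modules; the `deprecated` decorator, its arguments and the example function names are hypothetical and are not taken from Gammapy.

```python
import functools
import warnings


def deprecated(since, alternative=None):
    """Mark a function as deprecated since a given version (sketch)."""

    def decorator(func):
        message = f"{func.__name__} is deprecated since version {since}."
        if alternative is not None:
            message += f" Use {alternative} instead."

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 makes the warning point at the caller's line.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated(since="1.1", alternative="new_flux")
def old_flux(value):
    """Hypothetical function kept only for backward compatibility."""
    return 2 * value
```

Calling `old_flux(3)` still returns its result but emits a `DeprecationWarning`, which test suites can turn into an error to detect remaining uses before a removal.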

Gammapy-recipes and additional resources#

The Gammapy-recipes gallery offers a nice additional source of tutorials for advanced or non standard use cases.

Several questions should be solved for the long term viability of such a repository. For instance, should the recipes be updated to the latest LTS? Currently, none of the existing recipes work with v1.0.

How should we keep track of and refer to material designed for hands-on sessions, schools, etc.? Currently we have one GitHub repository: gammapy-handson.

We could also develop tutorial videos.

Infrastructure#

  • Improve test coverage and quality.

  • Improve the tools that help create releases.

  • Create Docker images with an automated tool.

  • Re-use the Docker image for Binder: the configuration in gammapy/gammapy-webpage already creates a Docker image in Binder.

Distributed Computing and Performance#

  • Evaluate Jax for GPU acceleration and autograd (https://jax.readthedocs.io/en/latest/)

  • Evaluate Ray for distributed computing (https://www.ray.io)

  • Make Dataset distributable with same API

  • Probably rework Dataset API, split off model handling…

  • Split off statistic handling from datasets

Flexible Statistics API#

  • Support for priors in likelihood

  • Support for systematics terms in likelihood

  • Needs to be serialised, i.e. keep information on which statistics and priors have been used (meta data / provenance)

  • Split off statistics definition from datasets…

  • Support for statistical tests associated with periodic signals, in the frequency domain

  • Add more tests on model hypothesis? E.g. AIC, PS (https://arxiv.org/abs/2109.07443)

  • Add likelihood weights?
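As a small illustration of the model hypothesis tests mentioned in the list above, the Akaike information criterion can be computed directly from a fit statistic defined as -2 ln L: AIC = 2k - 2 ln L = stat + 2k, with k the number of free parameters. The helper name and the fit values below are made up for the example and do not come from a real analysis.

```python
def aic(stat, n_free_parameters):
    """Akaike information criterion for a statistic defined as -2 ln L."""
    return stat + 2 * n_free_parameters


# Made-up fit results for two competing spectral models.
fits = {
    "power law": {"stat": 112.4, "n_par": 2},
    "log parabola": {"stat": 108.9, "n_par": 3},
}

aic_values = {
    name: aic(result["stat"], result["n_par"]) for name, result in fits.items()
}

# The model with the lowest AIC is preferred.
best_model = min(aic_values, key=aic_values.get)
```

Unlike a likelihood ratio test, this comparison does not require the models to be nested, which is one reason such criteria could complement the existing test statistics.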

Models and Modeling#

  • Move amplitude parameter to SkyModel

  • Rely more on the SkyModel than the submodels

  • What about NPredModel, deprecate or introduce consistently as concept?

  • Fitting of theory-based lookup grids (abaques) as spatial/spectral models? (arbitrary axes as parameters, interpolation features during evaluation, definition of a ‘format’)

  • Formats for energy dependent temporal models

  • How to handle the FitResult object? Make this more important? Make it serialisable? Rely on it in later API, such as Estimators?

Decision#

The PIG discussion has stalled. The roadmap preparation process was not very well organized. This should be improved for the next development round.

A number of features discussed in the current draft have been implemented. Some of the use cases are supported, but work remains to be done for several others. Many decisions and major changes have been postponed.