PIG 20 - Global Model API

  • Authors: Axel Donath, Régis Terrier and Quentin Remy

  • Created: Jun 10, 2020

  • Withdrawn: Apr 26, 2021

  • Status: withdrawn

  • Discussion: GH 2942

Abstract

Gammapy already supports joint-likelihood analyses, where individual, statistically independent datasets are represented by Dataset objects. In a typical analysis scenario some model components are fitted to only one of the datasets, while other model components are shared between all datasets. This PIG proposes the introduction of a global model API for Gammapy, which handles all model components involved in an analysis in a single, global models object, resolving the model definition that is currently spread across datasets. We consider the global model API a key prerequisite for future support of distributed computing in Gammapy.

Proposal

Global Model Handling

Currently, Gammapy handles model components by assigning a different selection of models to each Dataset.models attribute; if a component is shared between multiple datasets, they point to the same model instance. This works as long as all objects reside in the same memory space.
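A minimal sketch of this current pattern (dataset_1 and dataset_2 stand in for two existing Dataset objects):

from gammapy.modeling.models import PowerLawSpectralModel, SkyModel

# both datasets reference the *same* model instance, so a parameter
# change on the shared component is visible to both of them
crab = SkyModel(spectral_model=PowerLawSpectralModel(), name="crab")

dataset_1.models = [crab]
dataset_2.models = [crab]

# modifies the predicted counts of both datasets
crab.parameters["index"].value = 2.5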

If datasets are distributed to different processes in the future, it is technically complex and probably inefficient to share model states between all sub-processes. It is conceptually simpler if the processes communicate with a single server process that holds the single global model object.

The fundamental difference to the current design is that the model objects defined in Dataset.models can represent copies of the global model components. To avoid inconsistent states between the global model and these copies, we propose to hide the dataset.models attribute and set models via a .set_models() API:

from gammapy.modeling.models import Models
from gammapy.datasets import Datasets

models = Models.read("my-models.yaml")
datasets = Datasets.read("my-datasets.yaml")

# the .set_models() call distributes the model components to the
# datasets and initialises the model evaluators
datasets.set_models(models)

# parameters changed during fitting, or modified manually by the user, have to
# be propagated as well; maybe we can "detect" parameter changes automatically
# by caching the last parameter state?
datasets.set_models_parameters(models.parameters)
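The automatic change detection mentioned in the comment above could be based on caching. A hypothetical sketch (the ParameterCache class is not part of Gammapy):

class ParameterCache:
    """Cache parameter values to detect changes (illustrative only)."""

    def __init__(self, parameters):
        self.parameters = parameters
        self._cached_values = None

    def changed(self):
        values = [par.value for par in self.parameters]
        has_changed = values != self._cached_values
        self._cached_values = values
        return has_changed


cache = ParameterCache(models.parameters)

# only push the parameter state to the datasets when it actually changed
if cache.changed():
    datasets.set_models_parameters(models.parameters)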

This also requires adapting our fitting API to handle the models separately:

from gammapy.estimators import FluxPointsEstimator
from gammapy.modeling import Fit

fit = Fit(datasets)

result = fit.optimize(models)

# or for estimators

fpe = FluxPointsEstimator(
    source="my_source"
)

fpe.run(datasets, models)

The public models attribute allows creating a global model during data reduction like so:

models = Models()

for obs in observations:
    # `dataset` results from the preceding data reduction steps (not shown)
    dataset, bkg_model = bkg_maker.run(dataset)
    dataset.write(f"dataset-{obs.obs_id}.fits.gz")
    models.append(bkg_model)

models.write("my-model.yaml")
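The resulting my-model.yaml then collects all components in one place, roughly along these lines (an illustrative sketch following the existing model serialisation format; all names are placeholders):

components:
- name: dataset-1-bkg
  type: BackgroundModel
  datasets_names: [dataset-1]
- name: dataset-2-bkg
  type: BackgroundModel
  datasets_names: [dataset-2]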

Interaction Between Models and Dataset Objects

The MapDataset object features methods such as .to_spectrum_dataset(), .to_image(), .stack() and .copy(). It is convenient for the user if these methods modify the models contained in the dataset as well; in particular this is useful for the background model. We propose a uniform scheme for how the dataset methods interact with the model.

We propose that, in general, datasets can modify their own models, i.e. the copies contained in DatasetModels, but never interact “bottom to top” with the global Models object. The global model object therefore needs to be re-defined or updated explicitly.

The proposed behaviour is as follows:

  • Dataset.copy() copies the dataset and model; if a new name is specified for the dataset, the Model.dataset_names are adapted.

  • Dataset.stack() stacks the model components by concatenating the model lists. The background model is stacked in place.

  • .to_image() sums up the background model component, and a TemplateSpatialModel if it defines an energy axis.

  • .to_spectrum_dataset() creates a fixed BackgroundModel by summing up the data in the same region; in this case we drop the model from the dataset. Further suggestions? Check which model contributes npred to the region?
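The following sketch illustrates this behaviour for stacking, assuming two existing MapDataset objects with their models already set:

# models are copied along, Model.dataset_names are adapted to "stacked"
stacked = dataset_1.copy(name="stacked")

# model lists are concatenated, the background model is stacked in place
stacked.stack(dataset_2)

# the change never propagates "bottom to top": the global Models object
# has to be re-defined or updated explicitly by the user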

Background Model Handling

We also propose to extend the BackgroundModel to include a spectral model component like so:

from gammapy.modeling.models import BackgroundIRFModel, PowerLawNormSpectralModel

# spectral correction applied to the background template
norm = PowerLawNormSpectralModel(norm=1, tilt=0.1)

# BackgroundIRFModel is the proposed new class combining the
# background IRF with the spectral model component
bkg_model = BackgroundIRFModel(
    spectral_model=norm,
    dataset_name="my-dataset"
)

# evaluate the model on a given background map
bkg_model.evaluate(map=map)

After the introduction of the global model, we propose to remove MapDataset.background_model and use MapDataset.models["dataset-name-bkg"] instead. Should we introduce a naming convention?
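A hypothetical convention could suffix the dataset name, so the background component is always retrievable like so:

# hypothetical naming convention: "<dataset-name>-bkg"
bkg_model = dataset.models[f"{dataset.name}-bkg"]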

The background data can be stored either in the BackgroundModel class or in the MapDataset object as an IRF. This has implications for serialisation and memory management once we introduce distributed computing: in one case the data is stored in the server process, in the other it is stored in the sub-process.

To enable spectral background models we propose to support RegionGeom in the BackgroundModel class.
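A sketch of how a background map on a RegionGeom could be created (the region string and binning are placeholders):

from gammapy.maps import MapAxis, RegionGeom, RegionNDMap

# spectral (1D) geometry: a sky region with an energy axis
energy_axis = MapAxis.from_energy_bounds("0.1 TeV", "10 TeV", nbin=10)
geom = RegionGeom.create("icrs;circle(83.63, 22.01, 0.5)", axes=[energy_axis])

# background counts map on the region geometry, to be passed to the
# BackgroundModel class once RegionGeom is supported there
bkg_map = RegionNDMap.from_geom(geom)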

Decision

The authors decided to withdraw this PIG. Most of the proposed changes have since been discussed and implemented independently in smaller contributions and in discussions with the Gammapy developer team.