PIG 20 - Global Model API#
Authors: Axel Donath, Régis Terrier and Quentin Remy
Created: Jun 10, 2020
Withdrawn: Apr 26, 2021
Status: withdrawn
Discussion: GH 2942
Abstract#
Gammapy already supports joint-likelihood analyses, where individual, statistically
independent datasets are represented by a Dataset object. In a typical analysis
scenario some model components are fitted to only one of the datasets,
while other model components are shared between all datasets. This PIG proposes the
introduction of a global model API for Gammapy, which handles all model components
involved in an analysis in a single global models object, to resolve the scattered
model definition of the current implementation. We consider the global model API
a key solution for future support of distributed computing in Gammapy.
Proposal#
Global Model Handling#
Currently, Gammapy handles different model components by assigning each dataset its
own selection of models in the Dataset.models attribute; a component that is shared
between multiple datasets points to the same model instance. This works as long as
all objects reside in the same memory space.
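As an illustration, here is a minimal sketch of this current behaviour; the dataset and source names as well as the geometry are hypothetical and only serve this example:

from gammapy.datasets import MapDataset
from gammapy.maps import MapAxis, WcsGeom
from gammapy.modeling.models import PowerLawSpectralModel, SkyModel

energy_axis = MapAxis.from_energy_bounds("1 TeV", "10 TeV", nbin=5)
geom = WcsGeom.create(npix=10, axes=[energy_axis])

# two empty datasets standing in for two independent observations
dataset_1 = MapDataset.create(geom, name="dataset-1")
dataset_2 = MapDataset.create(geom, name="dataset-2")

# a model component shared between both datasets: both model lists
# point to the same object in memory
shared_model = SkyModel(
    spectral_model=PowerLawSpectralModel(), name="shared-source"
)
dataset_1.models = [shared_model]
dataset_2.models = [shared_model]

# a parameter change is immediately visible in both datasets, because
# they reference the same model instance
shared_model.spectral_model.index.value = 2.5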
If datasets are distributed over different processes in the future, it is technically complex and probably inefficient to share model states between all sub-processes. It is conceptually simpler if the processes communicate with a single server process that holds the single global model object.
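To illustrate the idea, the following is a minimal, self-contained sketch of such a client / server layout, based on Python's multiprocessing module; the worker function and the placeholder fit statistic are assumptions for this example and not part of Gammapy:

from multiprocessing import Pipe, Process

def worker(conn):
    # each sub-process holds one dataset and only receives parameter
    # values from the server process, instead of sharing model state
    while True:
        parameters = conn.recv()
        if parameters is None:
            break
        # placeholder for evaluating the local fit statistic of the
        # dataset, e.g. dataset.stat_sum(), with the updated parameters
        conn.send(sum(parameters))

# the server process owns the single global model object and only
# sends parameter values to the worker processes
parent_conn, child_conn = Pipe()
process = Process(target=worker, args=(child_conn,))
process.start()

parent_conn.send([2.3, 1e-12])
total_stat = parent_conn.recv()

parent_conn.send(None)
process.join()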
The fundamental difference to the current design is that the model objects
defined in Dataset.models can represent copies of the global model components.
To avoid inconsistencies between these copies and the global model object,
we propose to hide the dataset.models attribute and set models exclusively
via the .set_models() API:
from gammapy.modeling.models import Models
from gammapy.datasets import Datasets

models = Models.read("my-models.yaml")
datasets = Datasets.read("my-datasets.yaml")

# the .set_models() call distributes the model components to the
# individual datasets and initialises the model evaluators
datasets.set_models(models)

# to update parameters during fitting, a manual parameter modification
# by the user requires an update as well; maybe we can "detect"
# parameter changes automatically by caching the last parameter state?
datasets.set_models_parameters(models.parameters)
It also requires adapting our fitting API to handle the models separately:
from gammapy.modeling import Fit
from gammapy.estimators import FluxPointsEstimator

fit = Fit(datasets)
result = fit.optimize(models)

# or for estimators
fpe = FluxPointsEstimator(source="my_source")
fpe.run(datasets, models)
The public models attribute makes it possible to create a global model during data reduction like so:
models = Models()

for obs in observations:
    # the dataset is assumed to result from a previous reduction step
    dataset, bkg_model = bkg_maker.run(dataset)
    dataset.write(f"dataset-{obs.obs_id}.fits.gz")
    models.append(bkg_model)

models.write("my-model.yaml")
Interaction Between Models and Dataset Objects#
The MapDataset object features methods such as .to_spectrum_dataset(),
.to_image(), .stack() and .copy(). It is convenient for the user if those
methods modify the models contained in the dataset as well. In particular,
this is useful for the background model.
We propose a uniform scheme for how the dataset methods interact with the model:
in general, datasets can modify their own models, i.e. the copies contained
in DatasetModels, but never interact “bottom to top” with the global Models
object. The global model object therefore needs to be re-defined or updated explicitly.
The proposed behaviour is as follows:
- Dataset.copy() copies the dataset and its models; if a new name is specified for the dataset, the Model.dataset_names are adapted (see the sketch after this list).
- Dataset.stack() stacks the model components by concatenating the model lists. The background model is stacked in place.
- .to_image() sums up the background model component, and the TemplateSpatialModel if it defines an energy axis.
- .to_spectrum_dataset() creates a fixed BackgroundModel by summing up the data in the same region. Further suggestions? Check which model contributes npred to the region? In this case we drop the model from the dataset.
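As a minimal sketch of the first rule, assuming a dataset and a global models object set up as in the examples above (the new dataset name is hypothetical):

# copy the dataset under a new name; the models contained in the
# dataset are copied as well and their dataset names are adapted
dataset_copy = dataset.copy(name="my-dataset-copy")

# the copy never propagates "bottom to top": the global Models object
# must be updated explicitly, e.g. by extending it with the new copies
models.extend(dataset_copy.models)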
Background Model Handling#
We also propose to extend the BackgroundModel
to include a spectral model
component like so:
from gammapy.modeling.models import BackgroundIRFModel, PowerLawNormSpectralModel

norm = PowerLawNormSpectralModel(norm=1, tilt=0.1)

bkg_model = BackgroundIRFModel(
    spectral_model=norm,
    dataset_name="my-dataset"
)

# evaluate the spectral model on the background IRF map
bkg_model.evaluate(map=map)
After the introduction of the global model, we propose to remove MapDataset.background_model
and use MapDataset.models["dataset-name-bkg"]
instead. Should we introduce a naming convention?
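A minimal sketch of the envisioned access pattern, assuming a hypothetical "<dataset-name>-bkg" naming convention:

# instead of the removed MapDataset.background_model attribute, the
# background component is looked up by its name
bkg_model = dataset.models["my-dataset-bkg"]

# e.g. to free the background norm before a fit
bkg_model.spectral_model.norm.frozen = False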
The background data can be stored either in the BackgroundModel class
or in the MapDataset object as an IRF. This has implications for the
serialisation and memory management once we introduce distributed
computing: in the first case the data is stored in the server process,
in the second case it is stored in the sub-process.
To support spectral background models we propose to support RegionGeom
in
the BackgroundModel
class.
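A minimal sketch of what such a region-based background model could look like; constructing a BackgroundModel from a RegionNDMap is the behaviour envisioned here, not the existing one, and the region and dataset name are assumptions for this example:

from gammapy.maps import MapAxis, RegionGeom, RegionNDMap
from gammapy.modeling.models import BackgroundModel

energy_axis = MapAxis.from_energy_bounds("1 TeV", "10 TeV", nbin=5)
geom = RegionGeom.create("icrs;circle(83.63, 22.01, 0.5)", axes=[energy_axis])

# a background counts spectrum defined on the region geometry
background = RegionNDMap.from_geom(geom)

# a spectral background model backed by the region-based map
bkg_model = BackgroundModel(background, datasets_names=["my-dataset"])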
Decision#
The authors decided to withdraw the PIG. Most of the proposed changes have been discussed and implemented independently in small contributions and in discussions with the Gammapy developer team.