.. include:: ../../references.txt

.. _pig-020:

*************************
PIG 20 - Global Model API
*************************

* Author: Axel Donath, Régis Terrier and Quentin Remy
* Created: Jun 10, 2020
* Withdrawn: Apr 26th, 2021
* Status: withdrawn
* Discussion: `GH 2942`_

Abstract
========

Gammapy already supports joint-likelihood analyses, where individual, statistically
independent datasets are represented by `Dataset` objects. In a typical analysis
scenario some model components are fitted to only one of the datasets, while other
model components are shared between all datasets. This PIG proposes the introduction
of a global model API for Gammapy, which handles all model components involved in an
analysis in a single global models object, resolving the spread-out model definition
of the current implementation. We consider the global model API a key prerequisite
for future support of distributed computing in Gammapy.

Proposal
========

Global Model Handling
---------------------

Currently, Gammapy handles model components by assigning a different selection of
models to each ``Dataset.models`` attribute, pointing to the same model instance if a
component is shared between multiple datasets. This works as long as all objects
reside in the same memory space. If datasets are distributed to different processes
in the future, it is technically complex and probably inefficient to share model
states between all sub-processes. It is conceptually simpler if the processes
communicate with a single server process that holds a single global model object. The
fundamental difference to the current design is that the model objects defined in
``Dataset.models`` can represent copies of the components of the global model object.
To enforce the use of the ``.set_models()`` API, we propose to hide the
``dataset.models`` attribute.

.. code::

    from gammapy.modeling.models import Models
    from gammapy.datasets import Datasets

    models = Models.read("my-models.yaml")
    datasets = Datasets.read("my-datasets.yaml")

    # the .set_models call distributes the model components to the datasets
    # and initialises the model evaluators
    datasets.set_models(models)

    # to update parameters during fitting, a manual parameter modification by the
    # user requires an update as well; maybe we can "detect" parameter changes
    # automatically by caching the last parameter state?
    datasets.set_models_parameters(models.parameters)

This also requires adapting our fitting API to handle the models separately:

.. code::

    from gammapy.modeling import Fit
    from gammapy.estimators import FluxPointsEstimator

    fit = Fit(datasets)
    result = fit.optimize(models)

    # or for estimators
    fpe = FluxPointsEstimator(source="my_source")
    fpe.run(datasets, models)

The public models object allows creating a global model during data reduction
like so:

.. code::

    from gammapy.modeling.models import Models

    models = Models()

    for obs in observations:
        # the proposed background maker API returns the dataset and the
        # corresponding background model separately
        dataset, bkg_model = bkg_maker.run(dataset)
        dataset.write(f"dataset-{obs.obs_id}.fits.gz")
        models.append(bkg_model)

    models.write("my-model.yaml")

Interaction Between Models and Dataset Objects
----------------------------------------------

The ``MapDataset`` object features methods such as ``.to_spectrum_dataset()``,
``.to_image()``, ``.stack()`` and ``.copy()``. It is convenient for the user if those
methods modify the models contained in the dataset as well. In particular this is
useful for the background model. We propose a uniform scheme for how the dataset
methods interact with the model.

We propose that, in general, datasets can modify their own models, i.e. the copies
contained in ``DatasetModels``, but never interact "bottom to top" with the global
``Models`` object. The global model object therefore needs to be re-defined or
updated explicitly. The proposed behaviour is as follows (see the sketch after this
list):

- ``Dataset.copy()`` copies the dataset and its models; if a new name is specified
  for the dataset, the ``Model.dataset_names`` are adapted accordingly.
- ``Dataset.stack()`` stacks the model components by concatenating the model lists.
  The background model is stacked in place.
- ``Dataset.to_image()`` sums up the background model component, as well as a
  ``TemplateSpatialModel`` if it defines an energy axis.
- ``Dataset.to_spectrum_dataset()`` creates a fixed ``BackgroundModel`` by summing up
  the data in the same region. An open question: should we check which models
  contribute ``npred`` to the region, and drop non-contributing models from the
  dataset?
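The following sketch illustrates this "top to bottom" flow using the API proposed
above; the file names are placeholders and the exact signatures are assumptions of
this proposal, not the current Gammapy API:

.. code::

    from gammapy.modeling.models import Models
    from gammapy.datasets import Datasets

    models = Models.read("my-models.yaml")
    datasets = Datasets.read("my-datasets.yaml")
    datasets.set_models(models)

    # copying with a new name adapts the Model.dataset_names of the copied
    # model components to point to the new dataset
    dataset = datasets[0].copy(name="my-dataset-copy")

    # stacking concatenates the model lists of both datasets; only the
    # background model is stacked in place
    dataset.stack(datasets[1])

    # the dataset only modified its own model copies, so the global models
    # object is unchanged; it has to be re-defined or updated explicitly,
    # e.g. by distributing it to the datasets again
    datasets.set_models(models)

After stacking, dropping or renaming components on a dataset, it is thus the user's
responsibility to bring the global ``Models`` object back in sync.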
Background Model Handling
-------------------------

We also propose to extend the ``BackgroundModel`` to include a spectral model
component like so:

.. code::

    from gammapy.modeling.models import BackgroundIRFModel, PowerLawNormSpectralModel

    norm = PowerLawNormSpectralModel(norm=1, tilt=0.1)

    bkg_model = BackgroundIRFModel(
        spectral_model=norm,
        dataset_name="my-dataset",
    )

    # evaluate the spectral correction on a given background map
    bkg_model.evaluate(map=map)

After the introduction of the global model we propose to remove
``MapDataset.background_model`` and use ``MapDataset.models["dataset-name-bkg"]``
instead. Should we introduce a naming convention for background models?

The background data can be stored either in the ``BackgroundModel`` class or in the
``MapDataset`` object as an IRF. This has implications for serialisation and memory
management once we introduce distributed computing: in one case the data is stored in
the server process, in the other it is stored in the sub-process. To support spectral
background models we propose to support ``RegionGeom`` in the ``BackgroundModel``
class.

Decision
========

The authors decided to withdraw the PIG. Most of the proposed changes have been
discussed and implemented independently in small contributions and in discussions
with the Gammapy developer team.

.. _GH 2942: https://github.com/gammapy/gammapy/pull/2942
.. _gammapy: https://github.com/gammapy/gammapy
.. _gammapy-web: https://github.com/gammapy/gammapy-webpage