.. include:: ../../references.txt .. _pig-002: *********************************************** PIG 2 - Organization of low level analysis code *********************************************** ----------------------------------- The case of image and cube analysis ----------------------------------- * Author: RĂ©gis Terrier & Christoph Deil * Created: Jan 12, 2018 * Accepted: Jul 27, 2018 * Status: accepted * Discussion: `GH 1277`_ Abstract ======== This PIG discusses the general structure of the low level analysis subpackages of gammapy. Low level analysis is based on the gammapy building blocks from ``gammapy.data``, ``gammapy.irf`` and ``gammapy.maps``. Low level analysis implements all the individual steps required to perform data reduction for IACT from DL3 inputs (event lists and IRFs) to DL4 data (spectra, maps and cubes) and their associated reduced IRFs. Low level analysis should be structured in a very modular way to allow easy implementation of high level analysis classes and scripts. General code style guidelines ============================= Functions or methods should be no longer than few tens of lines of code. Above that it is better to use multiple functions to make testing easier and allow more modular usage. One line functions are usually not needed unless this is a very complex line. Similarly, classes should have 3-10 methods. 2 methods classes (e.g. only ``__init__`` and ``__call__``) should usually be functions. Above 10-20 methodes, the class should be split into several classes/functions. It is important to keep the number of functions and classes needed by the user to a reasonable level. Modularity is therefore very important, since it allows to easily implement high level interfaces that orchestrates the common analysis patterns. Algorithms and data should be clearly separated. The naming scheme used should allow easy identification of the nature of a piece of code. For instance, functions creating maps and or cube should be named make_map_xxx. Data analysis subpackages in gammapy ==================================== Low level analysis produces reduced datasets and IRFs from the general event lists and multidimensional IRFs of each observation or GTI. The building blocks on which it relies are coded in gammapy.data (``EventList``, ``DataStore``, ``DataStoreObservation`` etc), in gammapy.maps (in particular ``WcsNDMap`` used both for images and cubes), in gammapy.irf (e.g. ``EffectiveAreaTable2D``, ``EnergyDispersion2D``, ``EnergyDependentTablePSF``, etc). Analysis subpackages are: * 1D or spectral analysis (in ``gammapy.spectrum``) * 2D and 3D (cube) analysis (in ``gammapy.cube``) * timing analysis (in ``gammapy.time``) Low level map and cube analysis =============================== The low level analysis cube package deals with the production of all maps/cubes and PSF kernels required to perform 2D and 3D modeling and fitting. This includes counts, exposure, acceptance and normalized background maps and cubes. These reduced data and IRFs are stored using the ``gammapy.maps.WcsNDMap`` class which describes multidimensional maps with their World Coordinate System (WCS) description and a set of non-spatial axis. The default map structure for most of the typical analysis will be 3 dimensional maps with an energy axis (with a single bin for 2D images). The low level analysis is performed on an observation per observation (or GTI) basis. This is required by the response and background rapid variations. Therefore, all basic functions operate on a single ``EventList`` or set of IRFs (i.e. ``EffectiveAreaTable2D``, ``EnergyDispersion2D``, ``EnergyDependentTablePSF``). The iterative production of the individual reduced datasets and IRFs and their combination is realized by the higher level class. The individual observation products can be serialized, mostly for analysis debugging purposes or to avoid reprocessing large databases when new data are added. Depending on the type of analysis, different reduced IRFs are to be produced. The main difference lies in the type of energy considered: reconstructed or true (i.e. incident) energy. Counts, hadronic acceptance and background always use reconstructed (i.e. measured) energy. Exposure and PSF kernels will be defined in reconstructed energy for 2D analysis whereas they will be defined in true energies for 3D analysis with their own energy binning. A reduced energy dispersion will then be produced to convert from true to reconstructed energies and used later to predict counts. The maker functions and the products have to clearly state what type of energy they are using to avoid any confusion. The serialization has to include a way to clearly differentiate the products. Some metadata, probably in the form of an ``OrderedDict`` as in the case of ``astropy.table.Table`` could be used to do so. In order to perform likelihood analysis of maps and cubes, as well as to apply *ON-OFF* significance estimation techniques it is important to have integers values for counts and OFF maps produced by ring background estimation techniques (on an observation per observation basis). Therefore, we want to avoid reprojecting individual maps onto a global mosaic. The approach should be to define the general geometry of the target mosaic map and to perform cutouts for each observation. This can be done using for instance ``astropy.Cutout2D``. The index range of the cutout in the general mosaic map should be kept for easy summation. This step is performed with: ``make_map_cutout`` * *takes* a ``WcsNDMap`` and a maximum offset angle ``Angle`` or ``Quantity`` * *returns* the ``WcsGeom`` of the cutout and its ``slice`` For individual observations/gti, the general arguments of all maker functions are: * Reference image and energy range. ``gammapy.maps.MapGeom`` * Maximum offset angle. ``astropy.coordinates.Angle`` The various maker functions are then: ``make_map_counts`` * *takes* an ``EventList`` * *returns* a count map/cube ``make_map_exposure_true_energy`` * *takes* a pointing direction, an ``EffectiveAreaTable2D`` and a livetime * *returns* an exposure map/cube in true energy ``make_map_exposure_reco_energy`` * *takes* a pointing direction, an ``EffectiveAreaTable2D``, an ``EnergyDispersion2D`` and a livetime * *returns* an exposure map/cube in reco energy ``make_map_hadron_acceptance`` * *takes* a pointing direction, a ``Background3D`` and a livetime * *returns* an hadronic acceptance map, i.e. a predicted background map/cube. ``make_map_FoV_background`` * *takes* maps/cube (``WcsNDMap``) of observed counts and hadron acceptance/predicted background and an exclusion map * *returns* the map of background normalized on the observed counts in the whole FoV (excluding regions with significant gamma-ray emission). * Different energy grouping schemes should be available to ensure a reasonable number of events are used for the normalization. This scheme and the number of events used for normalization should be included in the optional serialization. ``make_map_ring_background`` * *takes* maps/cube (``WcsNDMap``) of observed counts and hadron acceptance/predicted background and exclusion map. It also takes a ``gammapy.background.AdaptiveRingBackgroundEstimator`` or a ``gammapy.background.RingBackgroundEstimator`` * *returns* the map of background normalized on the observed counts with a ring filter (excluding regions with significant gamma-ray emission). The background estimator object also contains the *OFF* map and the *ON* and *OFF* exposure maps. * Most likely this technique is not meant to be used for too small energy bands, so that energy grouping is probably not relevant here. The general processing can then be performed by general classes or scripts, possibly config file driven. It should be sufficiently modular to allow for users to do their own scripts Existing code ============= Currently, maps and cubes rely on the ``SkyImage`` and ``SkyCube`` classes. There are various scripts and classes existing currently in gammapy to produce maps and cubes (mostly developed by @adonath and @ljouvin).Image processing can be performed with ``SingleObsImageMaker`` and ``StackedObsImageMaker``, while cube processing can be performed with ``SingleObsCubeMaker`` and ``StackedObsCubeMaker``. For images, one can also use the ``IACTBasicImageEstimator``. All this code relies on high level class which perform all the analysis sequentially (exposure, background, count maps etc). This approach is not modular and creates a lot of code duplication. Some cube-related analysis is required for images creating some cross-dependencies. The proposed scheme should be much more modular and allow user to use gammapy as a library to compose their own scripts and classes if needed. It should limit code duplication. In particular, it uses the more general ``gammapy.maps`` which allows to get rid of the cross dependencies of the image and cube package we have now. The existing code will remain in gammapy for the moment, with possibly some bugs fixed. The new code is largely independent so that the new development should bot break user scripts. Decision ======== This PIG was extensively discussed on GitHub, as well as in Gammapy weekly calls and at the Feb 2018 and July 2018 Gammapy meetings. Doing this move to new analysis code based on gammapy.maps was never controversial, bug API and implementation discussions were ongoing. On July 27, 2018, Regis and Christoph noticed that the description in this PIG had been mostly implemented in Gammapy master already, and that further progress would come from individual improvements, not a rewrite / update of this PIG with a complete design. So we decided to merge this PIG with status "approved" to have it on the record as part of the design and evolution process for Gammapy. .. _GH 1277: https://github.com/gammapy/gammapy/pull/1277