# PIG 9 - Event sampling#

Author: Fabio Pintore, Andrea Giuliani, Axel Donath

Created: May 03, 2019

Accepted: Aug 30, 2019

Status: accepted

Discussion: GH 2136

## Abstract#

An event sampler for gamma events is an important part of the science tools of the future Cherenkov Telescope Array (CTA) observatory. It will allow users to simulate observations of sources with different spectral, morphological and temporal properties and predict the performance of CTA on the simulated events e.g. to support observation proposals or study the sensitivity of future observations. For this reason, we propose to implement a framework for event simulation in Gammapy.

The proposed framework consists of individual building blocks, represented by classes and methods, that can be chained together to achieve a full simulation of an event list corresponding to a given observation. This includes the simulation of source events, background events, effects of instrument response functions (IRF) and arrival times. As underlying method for the actual event sampling we propose to use inverse cumulative distribution function (CDF) sampling (inverseCDF) with finely binned discrete source and background. Temporal models will be also taken into account and time will be sampled separately in a 1D analysis, assuming that the temporal dependency of the input source models factorizes.

## Sampling methods#

Inverse CDF sampling (inverseCDF) is an established method to sample from
discrete probability mass functions. It is used by `ASTRIsim`

(astrisim), the
event simulator of the AGILE collaboration. However it is not the method of
choice for other existing event samplers such as the the Fermi-LAT Science Tools
(gtobsim) and CTOOLS (ctobssim). The latter uses a combination of analytical
sampling for models, where a solution is known (e.g. power-laws) and the
rejection sampling method (rej_sampl), where the sampling has to be done
numerically (see an example here gammalib).

As rejection sampling can directly sample from continuous probability density functions, it is expected to yield very precise results. However an enveloping distribution is needed, which should be adapted to the target distribution to be efficient (see also rejection sampling in Python for an example implementation), otherwise a lot of computation time is spend in rejecting drawn samples.

For this reason we favour the inverse CDF sampling method, as a simple to implement and general sampling method. The precision of the inverse CDF sampling method can be controlled by the resolution of the input probability mass function (PMF) and is in practice only limited by the available memory. We will study the required bin-size of the PMFs to reach sufficient precision. If we find the inverse CDF sampling method to be not precise enough, it is still possible to achieve better precision adopting the rejection sampling. This will not have a strong impact on the structure of the event-sampler.

## Proposal#

We propose to include in `gammapy.cube`

an high level interface (HLI) class,
labelled as `MapDatasetEventSampler`

or `MapDataset.sample`

method. This
class handles the complete event sampling process, including the corrections
related to the IRF and source temporal variability, for a given GTI /
observation.

The dataset will be computed using the standard data reduction procedure of Gammapy, as illustrated in the following example:

```
obs = Observation(pointing, gti, aeff, psf, edisp, expomap)
maker = MapDatasetMaker(geom, geom_irf, ...)
dataset = maker.run(obs)
model = SkyModels.read("model.yaml")
dataset.model = model
sampler = MapDatasetEventSampler(dataset)
events = sampler.sample()
events.write()
```

After data reduction, the Dataset object should contain all the needed
information, such as the pointing sky coordinates, the GTI, and the setup of all
the models (spectra, spatial morphology, temporal model) for any given source,
and it is passed as input parameter to the `MapDatasetEventSampler`

. It is
important to note that the `MapDataset`

object can store information for more
than one source. Then, a `.sample`

method will draw the sampled events and
will provide an output `~astropy.table.Table`

object. The latter will contain
the reconstructed sky positions, energies, times, and an `EVENT_ID`

and
`MC_ID`

. `EVENT_ID`

is a unique number or a string to identify the sampled
event, while `MC_ID`

is a unique ID (number or string) to identify the model
component the event was sampled from. The `MapDatasetSampler`

should also fill
the mandatory header information for event list files described on gadf.

### MapDatasetEventSampler#

The general design of the `sample`

method is as follows:

```
def sample(dataset, random_state)
"""Sample events from a ``MapDataset``"""
events_list = []
for evaluator in dataset.evaluators:
npred = evaluator.compute_npred()
n_events = random_state.poisson(npred.data.sum())
events = npred.sample(n_events, random_state)
time = LightCurveTableModel.sample(n_events=, lc=, random_state=)
events = hstack(events,time)
events_list.append(events)
event_list["MC_ID"] = evaluator.model.name
events_src = vstack(events_list)
events_src = dataset.psf.sample(events_src, random_state)
events_src = dataset.edisp.sample(events_src, random_state)
n_events_bkg = random_state.poisson(dataset.background_model.map.data.sum())
events_bkg = dataset.background_model.sample(n_events, random_state)
events_total = vstack([events_src, events_bkg])
events_total.meta = get_events_meta_data(dataset)
return EventList(events_total)
```

In more detail, `sample`

starts a loop over the sources stored into the
`MapDataset`

model. Then, for each source, the `src.compute_npred`

method
will calculate the predicted number of source counts `npred`

. In particular,
it is important to note that `npred = exposure * flux`

, where `exposure`

is
defined as `effective_area * exposure_time`

. `npred`

is therefore calculated
irrespective of the energy dispersion and of PSF. Then, `npred`

will be the
input of the `npred.sample`

method. The latter uses a Poisson distribution,
with mean equal to the predicted counts, to estimate the random number of
sampled events.

We propose to add a `Map.sample(n_events=, random_state=)`

method in
`~gammapy.maps.Map`

that will be the core of the sampling process. The
`sample`

is based on the `~gammapy.utils.random.InverseCDFSampler`

class
described in GH 2229 . The output will be an `~astropy.table.Table`

with
columns: `RA_TRUE`

, `DEC_TRUE`

and `ENERGY_TRUE`

.

Then, the time will be sampled independently using the temporal information
stored into the `MapDataset`

model for each source of interest. This will be
done through a `.sample(n_events=, random_state=)`

method that we propose to
add to `~gammapy.time.models.LightCurveTableModel`

and
`~gammapy.time.models.PhaseCurveTableModel`

. This method will take as input
the GTIs (i.e. one Tstart and Tstop) in the `MapDataset`

object. Also in this
case the `InverseCDFSampler`

class is the machine used to sample the time of
the events. In the case the temporal model is not provided, the time is
uniformly sampled in the time range `t_min`

and `t_max`

. To define a
light-curve per model component, the current `SkyModel`

class will be extended
by a `SkyModel(..., temporal_model=)`

.

The IRF correction can now be applied to sampled events. We propose to add a
`.sample(events=)`

method in both `~gammapy.cube.PSFMap`

and
`~gammapy.cube.EdispMap`

. The method interpolates the “correct” IRF at the
position of a given event and applies it. In more detail, the method calculates
the psf and the energy dispersion at the events true positions and true
energies, which are given in input as an `~astropy.table.Table`

object. The
IRFs are assumed to be constant and not time-dependent. The output will be an
`~astropy.table.Table`

with the new columns `RA`

, `DEC`

and `ENERGY`

,
which are the reconstructed event energies and positions.

Finally, the times and the energies/coordinates of the events will be merged
into a single `~astropy.table.Table`

with the columns:`RA`

, `DEC`

and
`ENERGY`

and `TIME`

.

The `MapDatasetEventSampler`

can be used to sample background events using the
`Map.sample(n_events=, random_state=)`

as well. The time of the events is
sampled assuming a constant event rate. Finally, the IRF corrections are not
applied to background sampled events.

## Performance and precision evaluation#

To evaluate the precision and performance of the described framework we propose to implement a prototype for a simulation / fitting pipeline. Starting from a selection of spatial, spectral and temporal models, data are simulated and fitted multiple times to derive distributions and pull-distributions of the reconstructed model parameters. This pipeline should also monitor the required cpu and memory usage. This first prototype can be used to evaluate the optimal bin-size (with the best compromise between performance and precision) for the simulations and to verify the over-all correctness of the simulated data. This will be valid for a set of input maps and IRFs. Later this prototype can be developed further into a full simulation / fitting continuous integration system.

## Alternatives / Outlook#

So far Gammapy only supports binned likelihood analysis and technically most
use- cases for the event sampling could be solved with binned simulations. A
binned simulation can be basically achieved by a call to
`numpy.random.poisson()`

based on the predicted number of counts map. This is
conceptionally simpler as well as computationally more efficient than a sampling
of event lists. In `Gammapy`

a similar dataset simulation is already
implemented in `Dataset.fake()`

, although this has a limited number of use
cases than an event sampler. However, to support the full data access and data
reduction process for simulations, event lists are required. In future Gammapy
possibly also supports event based analysis methods (unbinned likelihood, but
also e.g. clustering algorithms), that also require event lists. For this reason
binned simulations cannot present a full equivalent solution to event sampling.

The question of the API to simulate multiple observations from e.g. an
`ObservationTable`

or a list of `GTIs`

as it is needed for simulating data
for the CTA data challenge is not addressed in this PIG. For the scope of this
PIG, the fundamental class `MapDatasetEventSampler`

to simulate events
corresponding to a given observation and/or single GTI is in place.

The proposed Event Sampler will not provide, for each event, the corresponding
`DETX`

and `DETY`

position. These will be added in a future development of
the simulator.

## Task list#

This is a proposal for a list of tasks to implement the proposed changes:

Implement the

`sample`

method in`gammapy.maps.Map`

and add tests.Implement the

`sample`

method in`gammapy.time.models.LightCurveTableModel`

and`gammapy.time.models.PhaseCurveTableModel`

and add tests.Implement the

`sample`

method in`gammapy.cube.PSFMap`

and add tests.Implement the

`sample`

method in`gammapy.cube.EdispMap`

and add tests.Introduce the

`MapDatasetEventSampler`

into`gammapy.cube.`

and add tests.Add tutorials for event simulations of different kinds of sources.

## Decision#

The PIG was discussed extensively in GH 2136, the weekly Gammapy developer calls and coding sprint in person. After the deadline for final review expired on August 20, all remaining comments were addressed and the PIG was accepted on August 30, 2019.