PIG 11 - Light curves#

Author: Régis Terrier, Axel Donath, David Fidalgo, Atreyee Sinha
Created: Jul 2, 2018
Withdrawn: Oct 29, 2019
Status: withdrawn
Discussion: GH 1451

Abstract#

In this PIG we want to discuss a restructuring of the way light curves are computed and stored in Gammapy. Lightcurves in gamma-ray astronomy are the result of a series of fits of the source flux in each time bin. Lightcurve extraction covers therefore both the data reduction and the data modeling steps. The lightcurve estimation will therefore have these two steps.

Here, we propose to perform the data reduction step in each of the time bins and store the result in the form of a Datasets (MapDataset or SpectrumOnOffDataset depending on the selected approach). Each individual Dataset is then modeled and fitted to extract the source flux and its errors in each time bin. The result is then stored in a LightCurve object which contains a Table of the fit results.

For this purpose, we propose to introduce two new classes to perform the data reduction first and then then the fitting. The time dependent data reduction could be a specific case of the high level pipeline, when a list of time bins is passed with the configuration dict. The data fitting should be performed by the new LightCurveEstimator class, which should essentially be a wrapper around the FluxPointEstimator class that does the same thing for spectrum and map datasets.

Introduction#

Lightcurves in gamma-ray astronomy#

In photon counting experiments, lightcurves are often simply obtained by counting events in a given energy range in a set of time bins. In ground based gamma-ray astronomy, things are usually more complex.

The response and the instrumental background of the instruments can strongly vary over time on a night scale, e.g. because the source elevation changes or on longer time scales given the possible changes of the atmosphere transparency or the instrument efficiency.

Another complexity comes from the astrophysical background which can often pollute a source and needs to be properly removed to extract the intrinsic source flux at any given time.

For these reasons, gamma-ray lightcurve are usually the results of a fit of model on the data performed on a number of time bins to extract the source flux in these time bins.

This is more limited than e.g. time resolved spectral analysis. Although the latter share many similarities with the lightcurve extraction, it is a more complex task which we do not cover here.

Background / What we have now#

The current gammapy.time.LightCurveEstimator class assumes that a part of the data reduction process has already been applied at it takes as input a gammapy.spectrum.SpectrumExtraction instance for which a list of gammapy.background.BackgroundEstimate is required. Apart from the time intervals, the user also has to provide a gammapy.spectrum.SpectralModel that is used to compute the expected counts in a time bin and to scale the integral flux with respect to the excess events found in that time bin. The parameters of the spectral model are generally obtained via a spectral fit to the whole data sample. See current lightcurve tutorial.

Drawbacks of this approach: no clear separation of the data reduction and modeling steps, and only 1D On-Off spectral analysis is supported and lacks supports for MapDataset based analysis.

Proposal#

General organization of the new approach#

The approach will be split into 3 main phases: * Time bin preparation * Data reduction in time bins to produce a list of Dataset * Light curve estimation to fit a model on the resulting Datasets

The end product should contain enough information to perform some rebinning and apply high level timing techniques without rerunning the whole data reduction and fitting steps.

Time bin preparation#

Independently of the actual data reduction technique chosen, the user should first provide a list/table of time intervals for which she/he wants to compute the source flux. The computation of this list/table will be done outside of the light curve estimation class.

While we could provide helper functions to prepare the time bins. astropy.time.Time is relatively easy to use, so that a user would execute code similar to the following example:

from astropy.time import Time
time_start = Time("2006-07-29 20:00:00.000")
time_step = 5.0 * u.min
nstep = 120
times = time_start + time_step * np.arange(nstep)
time_bins = []
for tmin, tmax in zip(times, times[1:]):
    time_bins.append((tmin,tmax))

Data reduction#

Once the time bins are determined, the user will have to select the relevant Observations from the Datastore. The Observations are then filtered and grouped according to the time bins using the ObservationFilter and passed to the light curve extraction function or class. The latter could take a geom or a region argument that will define the data reduction geometry (and in particular, if the data reduction is 3D or 1D). In the absence, of a RegionGeom we could simply expect a reco energy and true energy MapAxis.

We do not detail possible approaches in the current PIG. These should be re-evaluated in the more general context of data reduction. Some examples, using the current data reduction approach will be exposed to users in a dedicated notebook.

Both approaches will result in a list of Datasets consisting of MapDataset or SpectrumDatasetOnOff with identical geometries for each time bin.

In order to properly assign times to any Dataset, the latter must therefore carry the time information. Minimally, this can be done with a meta information such as time_start and time_stop. This does not give a full idea of the coverage of time bin though. Ideally, the Dataset should track the GTI table of the filtered Eventlist that were used for its production. It can be stored on the Dataset itself and be serialized independently or be added as an extension to the serialized count data container (a CountsSpectrum or Map), as is done for OGIP spectra. In order to stack several Dataset objects it is necessary to be able to combine GTI tables. While a simple stacking of tables is enough in many cases, the situation of overlapping time intervals should be considered and a proper GTI.union() should be introduced (a functionality recurrent in HEA software see e.g. ftools mgtime)

Data Fitting#

The data fitting is the step were the Datasets are converted into a lightcurve based on a model of the emission. The amplitude of the source model is fitted while other parameters are usually, but not necessarily, fixed.

The fitting is very similar to the extraction of FluxPoints in the spectral analysis and does not strongly depend on the type of Datasets considered. The new LightCurveEstimator class should therefore be able to take as input both SpectrumDatasetOnOff and MapDataset, in a similar fashion as the current FluxPointEstimator.

After creating the Datasets, the user would first define the model to be used and freeze its parameters. Then we would apply it to the various Dataset objects before calling the LightCurveEstimator:

sky_model = SkyModel(spatial_model=spatial_model, spectral_model=spectral_model)
sky_model.parameters["index"].frozen = True
sky_model.parameters["lon_0"].frozen = True
sky_model.parameters["lat_0"].frozen = True

for dataset in datasets:
    dataset.model = sky_model

lc_estimator = LightCurveEstimator(datasets)
lightcurve = lc.run()

To help, the user deal with model parameters some helper function could be introduced to freeze all parameters of a model. Another possibility is to assume that all parameters are fixed except the amplitude in the LightCurveEstimator.run() method, or pass as arguments the names of the parameters to leave free.

Storing the results and further studies#

The results are returned as a gammapy.time.LightCurve instance (the current container class for light curves) that, so far, essentially holds the integrated flux + errors and the time_min/time_max of each time bin. There are many other quantities which could be stored, such as the energy range of the flux, the number of excess events in the time bin considered, etc. The current container class already provides some methods to study variability, such as chi-square test, fractional variance estimation.

Example usage:

lightcurve.plot('amplitude')

# Get fractional variance
print(lightcurve.fvar('amplitude'))

The LightCurve object should be able to rpovide some rebinning functionalities. In particular, it stores the fit statistic scan, it should be possible to perform minimal significance rebinning:

simple_rebin_lc = lightcurve.rebin(factor=2)
# or
min_significance_lc = lightcurve.rebin(min_significance=3)

Discussion / Alternatives#

Time bins#

To work with time bins, we could also rely on astropy.timeseries if we force the dependency to astropy v>3.2. This would look like:

from astropy.timeseries import BinnedTimeSeries
times = BinnedTimeSeries(time_bin_start="2006-07-29 20:00:00.000",
                       time_bin_size=5 * u.min)
print(times.time_bin_start)
print(times.time_bin_end)

Light Curve Fitting#

We could provide distinct objects specialized for Map and Spectrum datasets instead of a single object.

Lightcurve#

The Lightcurve contains an astropy.table.Table object. It could be improved by using the astropy_timeseries.BinnedTimeSeries object which itself inherits from QTable and provides support for row selection with time, rebinning and more complex methods for detailed timing studies such as Lomb-Scargle periodograms.

Task list#

Add TSTART and TSTOP meta info on the Dataset as well as the GTI table.
Introduce GTI.union()` method to merge Datasets.
Refactor the Lightcurve class to add a number of new content in the Table e.g.:
- a fit_stat_scan column.
- rebinning methods
Implement the new LightCurveEstimator class
Provide a notebook showing data reduction and fitting examples for both 3D and 1D cases

Decision#

The authors have decided to withdraw the PIG. A significant part of the implementation is already there. Besides, the recent additions of the high level interface and of the new Maker scheme change the scope of the discussion.