PIG 11 - Light curves#
Author: RĂ©gis Terrier, Axel Donath, David Fidalgo, Atreyee Sinha
Created: Jul 2, 2018
Withdrawn: Oct 29, 2019
Status: withdrawn
Discussion: GH 1451
Abstract#
In this PIG we want to discuss a restructuring of the way light curves are computed and stored in Gammapy. Lightcurves in gamma-ray astronomy are the result of a series of fits of the source flux in each time bin. Lightcurve extraction covers therefore both the data reduction and the data modeling steps. The lightcurve estimation will therefore have these two steps.
Here, we propose to perform the data reduction step in each of the time bins and
store the result in the form of a Datasets
(MapDataset
or
SpectrumOnOffDataset
depending on the selected approach). Each individual
Dataset
is then modeled and fitted to extract the source flux and its errors
in each time bin. The result is then stored in a LightCurve
object which
contains a Table
of the fit results.
For this purpose, we propose to introduce two new classes to perform the data
reduction first and then then the fitting. The time dependent data reduction
could be a specific case of the high level pipeline, when a list of time bins is
passed with the configuration dict
. The data fitting should be performed by
the new LightCurveEstimator
class, which should essentially be a wrapper
around the FluxPointEstimator
class that does the same thing for spectrum
and map datasets.
Introduction#
Lightcurves in gamma-ray astronomy#
In photon counting experiments, lightcurves are often simply obtained by counting events in a given energy range in a set of time bins. In ground based gamma-ray astronomy, things are usually more complex.
The response and the instrumental background of the instruments can strongly vary over time on a night scale, e.g. because the source elevation changes or on longer time scales given the possible changes of the atmosphere transparency or the instrument efficiency.
Another complexity comes from the astrophysical background which can often pollute a source and needs to be properly removed to extract the intrinsic source flux at any given time.
For these reasons, gamma-ray lightcurve are usually the results of a fit of model on the data performed on a number of time bins to extract the source flux in these time bins.
This is more limited than e.g. time resolved spectral analysis. Although the latter share many similarities with the lightcurve extraction, it is a more complex task which we do not cover here.
Background / What we have now#
The current gammapy.time.LightCurveEstimator
class assumes that a part of
the data reduction process has already been applied at it takes as input a
gammapy.spectrum.SpectrumExtraction
instance for which a list of
gammapy.background.BackgroundEstimate
is required. Apart from the time
intervals, the user also has to provide a gammapy.spectrum.SpectralModel
that is used to compute the expected counts in a time bin and to scale the
integral flux with respect to the excess events found in that time bin. The
parameters of the spectral model are generally obtained via a spectral fit to
the whole data sample. See current lightcurve tutorial.
Drawbacks of this approach: no clear separation of the data reduction and
modeling steps, and only 1D On-Off spectral analysis is supported and lacks
supports for MapDataset
based analysis.
Proposal#
General organization of the new approach#
The approach will be split into 3 main phases:
* Time bin preparation
* Data reduction in time bins to produce a list of Dataset
* Light curve estimation to fit a model on the resulting Datasets
The end product should contain enough information to perform some rebinning and apply high level timing techniques without rerunning the whole data reduction and fitting steps.
Time bin preparation#
Independently of the actual data reduction technique chosen, the user should first provide a list/table of time intervals for which she/he wants to compute the source flux. The computation of this list/table will be done outside of the light curve estimation class.
While we could provide helper functions to prepare the time bins.
astropy.time.Time
is relatively easy to use, so that a user would execute
code similar to the following example:
from astropy.time import Time
time_start = Time("2006-07-29 20:00:00.000")
time_step = 5.0 * u.min
nstep = 120
times = time_start + time_step * np.arange(nstep)
time_bins = []
for tmin, tmax in zip(times, times[1:]):
time_bins.append((tmin,tmax))
Data reduction#
Once the time bins are determined, the user will have to select the relevant
Observations
from the Datastore
. The Observations
are then filtered
and grouped according to the time bins using the ObservationFilter
and
passed to the light curve extraction function or class. The latter could take a
geom
or a region
argument that will define the data reduction geometry
(and in particular, if the data reduction is 3D or 1D). In the absence, of a
RegionGeom
we could simply expect a reco energy and true energy MapAxis
.
We do not detail possible approaches in the current PIG. These should be re-evaluated in the more general context of data reduction. Some examples, using the current data reduction approach will be exposed to users in a dedicated notebook.
Both approaches will result in a list of Datasets
consisting of
MapDataset
or SpectrumDatasetOnOff
with identical geometries for each
time bin.
In order to properly assign times to any Dataset
, the latter must therefore
carry the time information. Minimally, this can be done with a meta information
such as time_start
and time_stop
. This does not give a full idea of the
coverage of time bin though. Ideally, the Dataset
should track the GTI
table of the filtered Eventlist
that were used for its production. It can be
stored on the Dataset
itself and be serialized independently or be added as
an extension to the serialized count
data container (a CountsSpectrum
or
Map
), as is done for OGIP spectra. In order to stack several Dataset
objects it is necessary to be able to combine GTI
tables. While a simple
stacking of tables is enough in many cases, the situation of overlapping time
intervals should be considered and a proper GTI.union()
should be introduced
(a functionality recurrent in HEA software see e.g. ftools mgtime
)
Data Fitting#
The data fitting is the step were the Datasets
are converted into a
lightcurve based on a model of the emission. The amplitude of the source model
is fitted while other parameters are usually, but not necessarily, fixed.
The fitting is very similar to the extraction of FluxPoints
in the spectral
analysis and does not strongly depend on the type of Datasets
considered.
The new LightCurveEstimator
class should therefore be able to take as
input both SpectrumDatasetOnOff
and MapDataset
, in a similar fashion as
the current FluxPointEstimator
.
After creating the Datasets
, the user would first define the model to be
used and freeze its parameters. Then we would apply it to the various
Dataset
objects before calling the LightCurveEstimator
:
sky_model = SkyModel(spatial_model=spatial_model, spectral_model=spectral_model)
sky_model.parameters["index"].frozen = True
sky_model.parameters["lon_0"].frozen = True
sky_model.parameters["lat_0"].frozen = True
for dataset in datasets:
dataset.model = sky_model
lc_estimator = LightCurveEstimator(datasets)
lightcurve = lc.run()
To help, the user deal with model parameters some helper function could be
introduced to freeze all parameters of a model. Another possibility is to assume
that all parameters are fixed except the amplitude
in the
LightCurveEstimator.run()
method, or pass as arguments the names of the
parameters to leave free.
Storing the results and further studies#
The results are returned as a gammapy.time.LightCurve
instance (the current
container class for light curves) that, so far, essentially holds the integrated
flux + errors and the time_min/time_max of each time bin. There are many other
quantities which could be stored, such as the energy range of the flux, the
number of excess events in the time bin considered, etc. The current container
class already provides some methods to study variability, such as chi-square
test, fractional variance estimation.
Example usage:
lightcurve.plot('amplitude')
# Get fractional variance
print(lightcurve.fvar('amplitude'))
The LightCurve
object should be able to rpovide some rebinning
functionalities. In particular, it stores the fit statistic scan, it should be
possible to perform minimal significance rebinning:
simple_rebin_lc = lightcurve.rebin(factor=2)
# or
min_significance_lc = lightcurve.rebin(min_significance=3)
Discussion / Alternatives#
Time bins#
To work with time bins, we could also rely on astropy.timeseries
if we force
the dependency to astropy v>3.2. This would look like:
from astropy.timeseries import BinnedTimeSeries
times = BinnedTimeSeries(time_bin_start="2006-07-29 20:00:00.000",
time_bin_size=5 * u.min)
print(times.time_bin_start)
print(times.time_bin_end)
Light Curve Fitting#
We could provide distinct objects specialized for Map
and Spectrum
datasets
instead of a single object.
Lightcurve#
The Lightcurve
contains an astropy.table.Table
object. It could be
improved by using the astropy_timeseries.BinnedTimeSeries
object which
itself inherits from QTable
and provides support for row selection with
time, rebinning and more complex methods for detailed timing studies such as
Lomb-Scargle periodograms.
Task list#
Add
TSTART
andTSTOP
meta info on theDataset
as well as theGTI
table.Introduce
GTI.union()`
method to mergeDatasets
.Refactor the
Lightcurve
class to add a number of new content in theTable
e.g.:a
fit_stat_scan
column.rebinning methods
Implement the new
LightCurveEstimator
classProvide a notebook showing data reduction and fitting examples for both 3D and 1D cases
Decision#
The authors have decided to withdraw the PIG. A significant part of the implementation is already there. Besides, the recent additions of the high level interface and of the new Maker scheme change the scope of the discussion.