Note

You are not reading the most up to date version of Gammapy documentation.
Access the latest stable version v1.3 or the list of Gammapy releases.

This is a fixed-text formatted version of a Jupyter notebook

Try online
You can contribute with your own notebooks in this GitHub repository.
Source files: spectrum_analysis.ipynb | spectrum_analysis.py

Spectral analysis with Gammapy¶

Prerequisites¶

Understanding how spectral extraction is performed in Cherenkov astronomy, in particular regarding OFF background measurements.
Understanding the basics data reduction and modeling/fitting process with the gammapy library API as shown in the first gammapy analysis with the gammapy library API tutorial

Context¶

While 3D analysis allows in principle to deal with complex situations such as overlapping sources, in many cases, it is not required to extract the spectrum of a source. Spectral analysis, where all data inside a ON region are binned into 1D datasets, provides a nice alternative.

In classical Cherenkov astronomy, it is used with a specific background estimation technique that relies on OFF measurements taken in the field-of-view in regions where the background rate is assumed to be equal to the one in the ON region.

This allows to use a specific fit statistics for ON-OFF measurements, the wstat (see gammapy.stats.fit_statistics), where no background model is assumed. Background is treated as a set of nuisance parameters. This removes some systematic effects connected to the choice or the quality of the background model. But this comes at the expense of larger statistical uncertainties on the fitted model parameters.

Objective: perform a full region based spectral analysis of 4 Crab observations of H.E.S.S. data release 1 and fit the resulting datasets.

Introduction¶

Here, as usual, we use the gammapy.data.DataStore to retrieve a list of selected observations (gammapy.data.Observations). Then, we define the ON region containing the source and the geometry of the gammapy.datasets.SpectrumDataset object we want to produce. We then create the corresponding dataset Maker.

We have to define the Maker object that will extract the OFF counts from reflected regions in the field-of-view. To ensure we use data in an energy range where the quality of the IRFs is good enough we also create a safe range Maker.

We can then proceed with data reduction with a loop over all selected observations to produce datasets in the relevant geometry.

We can then explore the resulting datasets and look at the cumulative signal and significance of our source. We finally proceed with model fitting.

In practice, we have to: - Create a gammapy.data.DataStore poiting to the relevant data - Apply an observation selection to produce a list of observations, a gammapy.data.Observations object. - Define a geometry of the spectrum we want to produce: - Create a ~regions.CircleSkyRegion for the ON extraction region - Create a gammapy.maps.MapAxis for the energy binnings: one for the reconstructed (i.e. measured) energy, the other for the true energy (i.e. the one used by IRFs and models) - Create the necessary makers : - the spectrum dataset maker : gammapy.makers.SpectrumDatasetMaker - the OFF background maker, here a gammapy.makers.ReflectedRegionsBackgroundMaker - and the safe range maker : gammapy.makers.SafeRangeMaker - Perform the data reduction loop. And for every observation: - Apply the makers sequentially to produce a gammapy.datasets.SpectrumDatasetOnOff - Append it to list of datasets - Define the gammapy.modeling.models.SkyModel to apply to the dataset. - Create a gammapy.modeling.Fit object and run it to fit the model parameters - Apply a gammapy.estimators.FluxPointsEstimator to compute flux points for the spectral part of the fit.

Setup¶

As usual, we’ll start with some setup …

[1]:

%matplotlib inline
import matplotlib.pyplot as plt

[2]:

# Check package versions
import gammapy
import numpy as np
import astropy
import regions

print("gammapy:", gammapy.__version__)
print("numpy:", np.__version__)
print("astropy", astropy.__version__)
print("regions", regions.__version__)

gammapy: 0.18.2
numpy: 1.18.5
astropy 4.0.1.post1
regions 0.4

[3]:

from pathlib import Path
import astropy.units as u
from astropy.coordinates import SkyCoord, Angle
from regions import CircleSkyRegion
from gammapy.maps import Map, MapAxis
from gammapy.modeling import Fit
from gammapy.data import DataStore
from gammapy.datasets import (
    Datasets,
    SpectrumDataset,
    SpectrumDatasetOnOff,
    FluxPointsDataset,
)
from gammapy.modeling.models import (
    PowerLawSpectralModel,
    create_crab_spectral_model,
    SkyModel,
)
from gammapy.makers import (
    SafeMaskMaker,
    SpectrumDatasetMaker,
    ReflectedRegionsBackgroundMaker,
)
from gammapy.estimators import FluxPointsEstimator
from gammapy.visualization import plot_spectrum_datasets_off_regions

Load Data¶

First, we select and load some H.E.S.S. observations of the Crab nebula (simulated events for now).

We will access the events, effective area, energy dispersion, livetime and PSF for containement correction.

[4]:

datastore = DataStore.from_dir("$GAMMAPY_DATA/hess-dl3-dr1/")
obs_ids = [23523, 23526, 23559, 23592]
observations = datastore.get_observations(obs_ids)

Define Target Region¶

The next step is to define a signal extraction region, also known as on region. In the simplest case this is just a CircleSkyRegion, but here we will use the Target class in gammapy that is useful for book-keeping if you run several analysis in a script.

[5]:

target_position = SkyCoord(ra=83.63, dec=22.01, unit="deg", frame="icrs")
on_region_radius = Angle("0.11 deg")
on_region = CircleSkyRegion(center=target_position, radius=on_region_radius)

Create exclusion mask¶

We will use the reflected regions method to place off regions to estimate the background level in the on region. To make sure the off regions don’t contain gamma-ray emission, we create an exclusion mask.

Using http://gamma-sky.net/ we find that there’s only one known gamma-ray source near the Crab nebula: the AGN called RGB J0521+212 at GLON = 183.604 deg and GLAT = -8.708 deg.

[6]:

exclusion_region = CircleSkyRegion(
    center=SkyCoord(183.604, -8.708, unit="deg", frame="galactic"),
    radius=0.5 * u.deg,
)

skydir = target_position.galactic
exclusion_mask = Map.create(
    npix=(150, 150), binsz=0.05, skydir=skydir, proj="TAN", frame="icrs"
)

mask = exclusion_mask.geom.region_mask([exclusion_region], inside=False)
exclusion_mask.data = mask
exclusion_mask.plot();

../_images/tutorials_spectrum_analysis_12_0.png

Run data reduction chain¶

We begin with the configuration of the maker classes:

[7]:

e_reco = MapAxis.from_energy_bounds(0.1, 40, 40, unit="TeV", name="energy")
e_true = MapAxis.from_energy_bounds(
    0.05, 100, 200, unit="TeV", name="energy_true"
)
dataset_empty = SpectrumDataset.create(
    e_reco=e_reco, e_true=e_true, region=on_region
)

[8]:

dataset_maker = SpectrumDatasetMaker(
    containment_correction=False, selection=["counts", "exposure", "edisp"]
)
bkg_maker = ReflectedRegionsBackgroundMaker(exclusion_mask=exclusion_mask)
safe_mask_masker = SafeMaskMaker(methods=["aeff-max"], aeff_percent=10)

[9]:

%%time
datasets = Datasets()

for obs_id, observation in zip(obs_ids, observations):
    dataset = dataset_maker.run(
        dataset_empty.copy(name=str(obs_id)), observation
    )
    dataset_on_off = bkg_maker.run(dataset, observation)
    dataset_on_off = safe_mask_masker.run(dataset_on_off, observation)
    datasets.append(dataset_on_off)

CPU times: user 3.02 s, sys: 115 ms, total: 3.14 s
Wall time: 2.94 s

Plot off regions¶

[10]:

plt.figure(figsize=(8, 8))
_, ax, _ = exclusion_mask.plot()
on_region.to_pixel(ax.wcs).plot(ax=ax, edgecolor="k")
plot_spectrum_datasets_off_regions(ax=ax, datasets=datasets)

../_images/tutorials_spectrum_analysis_18_0.png

Source statistic¶

Next we’re going to look at the overall source statistics in our signal region.

[11]:

info_table = datasets.info_table(cumulative=True)

[12]:

info_table

[12]:

Table length=4

name	counts	background	excess	sqrt_ts	npred	npred_background	npred_signal	exposure_min	exposure_max	livetime	ontime	counts_rate	background_rate	excess_rate	n_bins	n_fit_bins	stat_type	stat_sum	counts_off	acceptance	acceptance_off	alpha
								m2 s	m2 s	s	s	1 / s	1 / s	1 / s
str5	float32	float64	float64	float64	float64	float64	float64	float64	float64	float64	float64	float64	float64	float64	int64	int64	str5	float64	float32	float64	float64	float64
23523	177.0	14.916666666666666	162.08333333333334	21.050838590041725	27.384615384615387	27.384615384615387	nan	1808360.59208519	1099528528.5914836	1581.7367584109306	1687.0	0.11190231184727636	0.00943056206245879	0.10247174978481759	40	28	wstat	485.0079493226156	179.0	28.0	336.0	0.08333333333333336
23523	353.0	31.249998092651367	321.75	29.365438086745748	56.00000044227352	56.00000044227352	nan	7707104.953775072	2036737418.8943977	3154.4234824180603	3370.0	0.11190634420759628	0.009906722501538161	0.10199962110139973	40	29	wstat	903.6963227647335	375.0	29.0	348.0000305175781	0.083333320915699
23523	503.0	41.884151458740234	461.1158447265625	36.94259742359151	65.53241887997581	65.53241887997581	nan	11107774.944855023	2786019506.4521184	4732.546999931335	5056.0	0.10628526246169305	0.008850234653633219	0.09743502700200396	40	29	wstat	1425.8449748709932	811.0	29.0	561.5250244140625	0.05006258562207222
23523	632.0	53.3841552734375	578.6158447265625	41.679006011575446	77.982364962886	77.982364962886	nan	12189605.052072244	3625950388.293474	6313.811640620232	6742.0	0.1000980130503095	0.00845513903677262	0.09164287401353688	40	29	wstat	1826.8050329098885	1225.0	29.0	665.4596557617188	0.04139557480812073

[13]:

plt.plot(
    info_table["livetime"].to("h"), info_table["excess"], marker="o", ls="none"
)
plt.xlabel("Livetime [h]")
plt.ylabel("Excess");

../_images/tutorials_spectrum_analysis_22_0.png

[14]:

plt.plot(
    info_table["livetime"].to("h"),
    info_table["sqrt_ts"],
    marker="o",
    ls="none",
)
plt.xlabel("Livetime [h]")
plt.ylabel("Sqrt(TS)");

../_images/tutorials_spectrum_analysis_23_0.png

Finally you can write the extrated datasets to disk using the OGIP format (PHA, ARF, RMF, BKG, see here for details):

[15]:

path = Path("spectrum_analysis")
path.mkdir(exist_ok=True)

[16]:

for dataset in datasets:
    dataset.to_ogip_files(outdir=path, overwrite=True)

If you want to read back the datasets from disk you can use:

[17]:

datasets = Datasets()
for obs_id in obs_ids:
    filename = path / f"pha_obs{obs_id}.fits"
    datasets.append(SpectrumDatasetOnOff.from_ogip_files(filename))

Fit spectrum¶

Now we’ll fit a global model to the spectrum. First we do a joint likelihood fit to all observations. If you want to stack the observations see below. We will also produce a debug plot in order to show how the global fit matches one of the individual observations.

[18]:

spectral_model = PowerLawSpectralModel(
    index=2, amplitude=2e-11 * u.Unit("cm-2 s-1 TeV-1"), reference=1 * u.TeV
)
model = SkyModel(spectral_model=spectral_model, name="crab")

for dataset in datasets:
    dataset.models = model

fit_joint = Fit(datasets)
result_joint = fit_joint.run()

# we make a copy here to compare it later
model_best_joint = model.copy()

Fit quality and model residuals¶

We can access the results dictionary to see if the fit converged:

[19]:

print(result_joint)

OptimizeResult

        backend    : minuit
        method     : minuit
        success    : True
        message    : Optimization terminated successfully.
        nfev       : 47
        total stat : 124.41

A simple way to inspect the model residuals is using the function ~SpectrumDataset.plot_fit()

[20]:

ax_spectrum, ax_residuals = datasets[0].plot_fit()
ax_spectrum.set_ylim(0.1, 40)

[20]:

(0.1, 40)

../_images/tutorials_spectrum_analysis_35_1.png

For more ways of assessing fit quality, please refer to the dedicated modeling and fitting tutorial.

Compute Flux Points¶

To round up our analysis we can compute flux points by fitting the norm of the global model in energy bands. We’ll use a fixed energy binning for now:

[21]:

e_min, e_max = 0.7, 30
energy_edges = np.logspace(np.log10(e_min), np.log10(e_max), 11) * u.TeV

Now we create an instance of the gammapy.estimators.FluxPointsEstimator, by passing the dataset and the energy binning:

[22]:

fpe = FluxPointsEstimator(energy_edges=energy_edges, source="crab")
flux_points = fpe.run(datasets=datasets)

Here is a the table of the resulting flux points:

[23]:

flux_points.table_formatted

[23]:

Table length=10

counts [4]	e_ref	e_min	e_max	ref_dnde	ref_flux	ref_eflux	ref_e2dnde	norm	stat	success	norm_err	ts	norm_errp	norm_errn	norm_ul	norm_scan [11]	stat_scan [11]	sqrt_ts	dnde	dnde_ul	dnde_err	dnde_errp	dnde_errn
	TeV	TeV	TeV	1 / (cm2 s TeV)	1 / (cm2 s)	TeV / (cm2 s)	TeV / (cm2 s)												1 / (cm2 s TeV)	1 / (cm2 s TeV)	1 / (cm2 s TeV)	1 / (cm2 s TeV)	1 / (cm2 s TeV)
int64	float64	float64	float64	float64	float64	float64	float64	float64	float64	bool	float64	float64	float64	float64	float64	float64	float64	float64	float64	float64	float64	float64	float64
61 .. 42	0.877	0.701	1.099	3.772e-11	1.519e-11	1.309e-11	2.905e-11	0.966	16.146	True	0.074	584.992	0.075	0.072	1.121	0.200 .. 5.000	248.571 .. 934.424	24.187	3.645e-11	4.227e-11	2.780e-12	2.845e-12	2.715e-12
20 .. 25	1.276	1.099	1.482	1.432e-11	5.524e-12	6.992e-12	2.331e-11	0.990	10.939	True	0.115	264.797	0.120	0.111	1.238	0.200 .. 5.000	114.916 .. 385.557	16.273	1.418e-11	1.772e-11	1.651e-12	1.711e-12	1.592e-12
36 .. 20	1.856	1.482	2.323	5.432e-12	4.626e-12	8.430e-12	1.871e-11	1.213	7.844	True	0.125	350.376	0.129	0.121	1.478	0.200 .. 5.000	167.003 .. 342.942	18.718	6.587e-12	8.029e-12	6.768e-13	6.988e-13	6.551e-13
11 .. 12	2.699	2.323	3.135	2.061e-12	1.682e-12	4.503e-12	1.501e-11	1.195	9.790	True	0.183	157.668	0.192	0.174	1.596	0.200 .. 5.000	80.721 .. 165.447	12.557	2.464e-12	3.291e-12	3.765e-13	3.948e-13	3.587e-13
14 .. 8	3.924	3.135	4.913	7.823e-13	1.409e-12	5.429e-12	1.205e-11	0.861	22.656	True	0.158	92.915	0.167	0.150	1.214	0.200 .. 5.000	57.809 .. 214.137	9.639	6.736e-13	9.499e-13	1.239e-13	1.310e-13	1.171e-13
7 .. 4	5.707	4.913	6.629	2.968e-13	5.123e-13	2.900e-12	9.667e-12	1.111	6.505	True	0.275	63.182	0.296	0.253	1.749	0.200 .. 5.000	32.826 .. 73.940	7.949	3.298e-13	5.193e-13	8.148e-14	8.801e-14	7.523e-14
5 .. 5	8.299	6.629	10.390	1.126e-13	4.290e-13	3.496e-12	7.758e-12	1.107	14.176	True	0.294	44.347	0.318	0.271	1.793	0.200 .. 5.000	34.899 .. 74.713	6.659	1.248e-13	2.020e-13	3.309e-14	3.580e-14	3.050e-14
0 .. 0	12.068	10.390	14.018	4.275e-14	1.560e-13	1.867e-12	6.226e-12	0.771	11.511	True	0.413	8.495	0.476	0.355	1.858	0.200 .. 5.000	14.893 .. 40.219	2.915	3.298e-14	7.943e-14	1.766e-14	2.036e-14	1.517e-14
1 .. 0	17.550	14.018	21.971	1.622e-14	1.307e-13	2.252e-12	4.996e-12	0.424	7.712	True	0.300	3.175	0.374	0.234	1.333	0.200 .. 5.000	8.603 .. 41.037	1.782	6.873e-15	2.163e-14	4.860e-15	6.068e-15	3.794e-15
0 .. 0	25.521	21.971	29.645	6.156e-15	4.751e-14	1.203e-12	4.009e-12	-0.000	0.524	False	0.031	-0.000	0.001	0.001	1.124	0.200 .. 5.000	1.235 .. 18.312	nan	-2.687e-25	6.921e-15	1.935e-16	3.430e-18	3.430e-18

Now we plot the flux points and their likelihood profiles. For the plotting of upper limits we choose a threshold of TS < 4.

[24]:

plt.figure(figsize=(8, 5))
flux_points.table["is_ul"] = flux_points.table["ts"] < 4
ax = flux_points.plot(
    energy_power=2, flux_unit="erg-1 cm-2 s-1", color="darkorange"
)
flux_points.to_sed_type("e2dnde").plot_ts_profiles(ax=ax);

../_images/tutorials_spectrum_analysis_44_0.png

The final plot with the best fit model, flux points and residuals can be quickly made like this:

[25]:

flux_points_dataset = FluxPointsDataset(
    data=flux_points, models=model_best_joint
)

[26]:

flux_points_dataset.plot_fit();

../_images/tutorials_spectrum_analysis_47_0.png

Stack observations¶

And alternative approach to fitting the spectrum is stacking all observations first and the fitting a model. For this we first stack the individual datasets:

[27]:

dataset_stacked = Datasets(datasets).stack_reduce()

Again we set the model on the dataset we would like to fit (in this case it’s only a single one) and pass it to the gammapy.modeling.Fit object:

[28]:

dataset_stacked.models = model
stacked_fit = Fit([dataset_stacked])
result_stacked = stacked_fit.run()

# make a copy to compare later
model_best_stacked = model.copy()

[29]:

print(result_stacked)

OptimizeResult

        backend    : minuit
        method     : minuit
        success    : True
        message    : Optimization terminated successfully.
        nfev       : 31
        total stat : 29.88

[30]:

model_best_joint.parameters.to_table()

[30]:

Table length=3

name	value	unit	min	max	frozen	error
str9	float64	str14	float64	float64	bool	float64
index	2.5876e+00		nan	nan	False	5.867e-02
amplitude	2.6900e-11	cm-2 s-1 TeV-1	nan	nan	False	1.234e-12
reference	1.0000e+00	TeV	nan	nan	True	0.000e+00

[31]:

model_best_stacked.parameters.to_table()

[31]:

Table length=3

name	value	unit	min	max	frozen	error
str9	float64	str14	float64	float64	bool	float64
index	2.5912e+00		nan	nan	False	5.881e-02
amplitude	2.6933e-11	cm-2 s-1 TeV-1	nan	nan	False	1.234e-12
reference	1.0000e+00	TeV	nan	nan	True	0.000e+00

Finally, we compare the results of our stacked analysis to a previously published Crab Nebula Spectrum for reference. This is available in gammapy.modeling.models.create_crab_spectral_model.

[32]:

plot_kwargs = {
    "energy_range": [0.1, 30] * u.TeV,
    "energy_power": 2,
    "flux_unit": "erg-1 cm-2 s-1",
}

# plot stacked model
model_best_stacked.spectral_model.plot(
    **plot_kwargs, label="Stacked analysis result"
)
model_best_stacked.spectral_model.plot_error(**plot_kwargs)

# plot joint model
model_best_joint.spectral_model.plot(
    **plot_kwargs, label="Joint analysis result", ls="--"
)
model_best_joint.spectral_model.plot_error(**plot_kwargs)

create_crab_spectral_model("hess_pl").plot(
    **plot_kwargs, label="Crab reference"
)
plt.legend()

[32]:

<matplotlib.legend.Legend at 0x12449f978>

../_images/tutorials_spectrum_analysis_56_1.png

Exercises¶

Now you have learned the basics of a spectral analysis with Gammapy. To practice you can continue with the following exercises:

Fit a different spectral model to the data. You could try gammapy.modeling.models.ExpCutoffPowerLawSpectralModel or gammapy.modeling.models.LogParabolaSpectralModel.
Compute flux points for the stacked dataset.
Create a gammapy.estimators.FluxPointsDataset with the flux points you have computed for the stacked dataset and fit the flux points again with obe of the spectral models. How does the result compare to the best fit model, that was directly fitted to the counts data?

What next?¶

The methods shown in this tutorial is valid for point-like or midly extended sources where we can assume that the IRF taken at the region center is valid over the whole region. If one wants to extract the 1D spectrum of a large source and properly average the response over the extraction region, one has to use a different approach explained in the extended source spectral analysis tutorial.

[ ]: