stats - Statistics¶

Introduction¶

gammapy.stats holds statistical estimators, fit statistics and algorithms commonly used in gamma-ray astronomy.

It is mostly concerned with the evaluation of one or several observations that count events in a given region and time window, i.e. with Poisson-distributed counts measurements.

For on-off methods we will use the following variable names following the notation in [Cousins2007]:

Variable	Dataset attribute name	Definition
`n_on`	`counts`	Total observed counts in the on region
`n_off`	`counts_off`	Total observed counts in the off region
`mu_on`	`npred`	Total expected counts in the on region
`mu_off`	`npred_off`	Total expected counts in the off region
`mu_sig`	`npred_sig`	Signal expected counts in the on region
`mu_bkg`	`npred_bkg`	Background expected counts in the on region
`a_on`	`acceptance`	Relative background exposure in the on region
`a_off`	`acceptance_off`	Relative background exposure in the off region
`alpha`	`alpha`	Background efficiency ratio `a_on` / `a_off`
`n_bkg`	`background`	Background estimate in the on region

The following formulae show how an on-off measurement \((n_{on}, n_{off})\) is related to the quantities in the above table:

\[n_{on} \sim Pois(\mu_{on})\text{ with }\mu_{on} = \mu_s + \mu_b n_{off} \sim Pois(\mu_{off})\text{ with }\mu_{off} = \mu_b / \alpha\text{ with }\alpha = a_{on} / a_{off}\]

With the background estimate in the on region

\[n_{bkg} = \alpha\ n_{off},\]

the maximum likelihood estimate of a signal excess is

\[n_{excess} = n_{on} - n_{bkg}.\]

When the background is known and there is only an “on” region (sometimes also called “source region”), we use the variable names n_on, mu_on, mu_sig and mu_bkg.

These are references describing the available methods: [LiMa1983], [Cash1979], [Stewart2009], [Rolke2005], [Feldman1998], [Cousins2007].

Getting Started¶

Li & Ma Significance¶

[LiMa1983] (see equation 17)

As an example, assume you measured \(n_{on} = 18\) counts in a region where you suspect a source might be present and \(n_{off} = 97\) counts in a background control region where you assume no source is present and that is \(a_{off}/a_{on}=10\) times larger than the on-region.

Here’s how you compute the statistical significance of your detection with the Li & Ma formula:

>>> from gammapy.stats import significance_on_off
>>> significance_on_off(n_on=18, n_off=97, alpha=1. / 10, method='lima')
2.2421704424844875

Confidence Intervals¶

Assume you measured 6 counts in a Poissonian counting experiment with an expected background \(b = 3\). Here’s how you compute the 90% upper limit on the signal strength \(\mu\):

import numpy as np
from scipy import stats
import gammapy.stats as gstats

x_bins = np.arange(0, 100)
mu_bins = np.linspace(0, 50, 50 / 0.005 + 1, endpoint=True)

matrix = [stats.poisson(mu + 3).pmf(x_bins) for mu in mu_bins]
acceptance_intervals = gstats.fc_construct_acceptance_intervals_pdfs(matrix, 0.9)
LowerLimitNum, UpperLimitNum, _ = gstats.fc_get_limits(mu_bins, x_bins, acceptance_intervals)
mu_upper_limit = gstats.fc_find_limit(6, UpperLimitNum, mu_bins)

The result is mu_upper_limit == 8.465.

Using `gammapy.stats`¶

Reference/API¶

gammapy.stats Package¶

Statistics.

Functions¶

`background`(n_off, alpha)	Estimate background in the on-region from an off-region observation.
`background_error`(n_off, alpha)	Estimate standard error on background in the on region from an off-region observation.
`cash`(n_on, mu_on)	Cash statistic, for Poisson data.
`cash_sum_cython`()	Summed cash fit statistics.
`cstat`(n_on, mu_on[, n_on_min])	C statistic, for Poisson data.
`excess`(n_on, n_off, alpha)	Estimate excess in the on region for an on-off observation.
`excess_error`(n_on, n_off, alpha)	Estimate error on excess for an on-off measurement.
`excess_matching_significance`(mu_bkg, …[, …])	Compute excess matching a given significance.
`excess_matching_significance_on_off`(n_off, …)	Compute sensitivity of an on-off observation.
`excess_ul_helene`(excess, excess_error, …)	Compute excess upper limit using the Helene method.
`fc_construct_acceptance_intervals`(…)	Convenience function that calculates the PDF for the user.
`fc_construct_acceptance_intervals_pdfs`(…)	Numerically choose bins a la Feldman Cousins ordering principle.
`fc_find_acceptance_interval_gauss`(mu, sigma, …)	Analytical acceptance interval for Gaussian with boundary at the origin.
`fc_find_acceptance_interval_poisson`(mu, …)	Analytical acceptance interval for Poisson process with background.
`fc_find_average_upper_limit`(x_bins, matrix, …)	Calculate the average upper limit for a confidence belt.
`fc_find_limit`(x_value, x_values, y_values)	Find the limit for a given x measurement.
`fc_fix_limits`(lower_limit, upper_limit)	Push limits outwards as described in the FC paper.
`fc_get_limits`(mu_bins, x_bins, …)	Find lower and upper limit from acceptance intervals.
`get_wstat_gof_terms`(n_on, n_off)	Goodness of fit terms for WSTAT.
`get_wstat_mu_bkg`(n_on, n_off, alpha, mu_sig)	Background estimate `mu_bkg` for WSTAT.
`probability_to_significance_normal`(probability)	Convert one-sided tail probability to significance.
`probability_to_significance_normal_limit`(…)	Tail probability to significance in small probability limit.
`significance`(n_on, mu_bkg[, method, n_on_min])	Compute significance for an observed number of counts and known background.
`significance_on_off`(n_on, n_off, alpha[, method])	Compute significance of an on-off observation.
`significance_to_probability_normal`(significance)	Convert significance to one-sided tail probability.
`significance_to_probability_normal_limit`(…)	Significance to tail probability in large significance limit.
`wstat`(n_on, n_off, alpha, mu_sig[, mu_bkg, …])	W statistic, for Poisson data with Poisson background.