hierarchical_clustering

gammapy.utils.cluster.hierarchical_clustering(features, linkage_kwargs=None, fcluster_kwargs=None)

Hierarchical clustering using the given features.

Parameters:
features : Table

Table containing the features.

linkage_kwargs : dict, optional

Arguments forwarded to scipy.cluster.hierarchy.linkage. Default is None, which uses method="ward" and metric="euclidean".

fcluster_kwargs : dict, optional

Arguments forwarded to scipy.cluster.hierarchy.fcluster. Default is None, which uses criterion="maxclust" and t=3.

Returns:
features : Table

Table containing the features and an extra "labels" column holding the group label of each row.

Examples

Cluster features into t=2 groups with a corresponding label for each group:

>>> from gammapy.data.utils import get_irfs_features
>>> from gammapy.data import DataStore
>>> from gammapy.utils.cluster import standard_scaler, hierarchical_clustering
>>> from astropy.coordinates import SkyCoord
>>> import astropy.units as u
>>> data_store = DataStore.from_dir("$GAMMAPY_DATA/hess-dl3-dr1/")
>>> obs_ids = data_store.obs_table["OBS_ID"][13:20]
>>> obs = data_store.get_observations(obs_ids)
>>> position = SkyCoord(329.716 * u.deg, -30.225 * u.deg, frame="icrs")
>>> names = ["edisp-res", "psf-radius"]
>>> features_irfs = get_irfs_features(
...     obs,
...     energy_true="1 TeV",
...     position=position,
...     names=names
... )
>>> scaled_features_irfs = standard_scaler(features_irfs)
>>> features = hierarchical_clustering(scaled_features_irfs, fcluster_kwargs={"t": 2})
>>> print(features)
     edisp-res      obs_id      psf-radius     labels
------------------- ------ ------------------- ------
-1.3020791585772495  20326 -1.2471938975366008      2
-1.3319831545301117  20327 -1.4586649826004114      2
-0.7763307219821931  20339 -0.6705024680435898      2
 0.9677107409819438  20343  0.9500979841335693      1
  0.820562952023891  20344  0.8160964882165554      1
 0.7771617763704126  20345  0.7718272408581743      1
 0.8449575657133206  20346  0.8383396349722769      1
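
The defaults listed under Parameters can also be tried directly with SciPy. The sketch below is not the Gammapy implementation. It assumes the function passes the feature values to scipy.cluster.hierarchy.linkage and then to scipy.cluster.hierarchy.fcluster, as the parameter descriptions suggest. The array X and its values are made up for illustration, and t=2 is chosen to match the example above rather than the default t=3.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical, already-scaled feature matrix:
# one row per observation, one column per feature.
X = np.array(
    [
        [-1.30, -1.25],
        [-1.33, -1.46],
        [-0.78, -0.67],
        [0.97, 0.95],
        [0.82, 0.82],
        [0.78, 0.77],
    ]
)

# Build the merge tree with the documented default linkage settings.
Z = linkage(X, method="ward", metric="euclidean")

# Cut the tree so that at most t=2 flat clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")

print(labels)
```

With criterion="maxclust", t is an upper bound on the number of flat clusters, so the number of groups is controlled directly through fcluster_kwargs={"t": ...}.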