PIG 18 - Documentation

  • Author: Christoph Deil, Axel Donath, José Enrique Ruiz

  • Created: Oct 16, 2019

  • Accepted: Nov 6, 2019

  • Status: accepted

  • Discussion: GH 2463

Abstract

Over the past years the Gammapy package and documentation have grown organically. At the moment there is a lot of duplicated and missing content, especially for recently added functionality such as datasets, the high level interface, and the restructured Gammapy package. We propose to spend significant effort to reorganise and improve the Gammapy documentation in November and December 2019, roughly following the plan outlined here. Further discussion and planning will take place in GitHub issues and pull requests, and will be summarised on the Documentation GitHub project board.

Introduction

Gammapy started in 2013 and since then the package and documentation have continuously evolved (see Gammapy changelog). The oldest version of the documentation that is still readily available online is for Gammapy v0.6 from April 2017 (https://docs.gammapy.org/0.6). The current version of the documentation is for Gammapy v0.14 from September 2019 (https://docs.gammapy.org/0.14).

In 2018, following other projects such as Astropy or Sunpy, we created a “project webpage” at https://gammapy.org, which is hand-written and not versioned, in addition to https://docs.gammapy.org, which is versioned and auto-generated by Sphinx. We also introduced a new setup for tutorials (written as Jupyter notebooks and integrated into the Sphinx documentation) and gammapy download as the way for users to download versioned tutorial notebooks, example Python scripts and example datasets in a reproducible conda environment (see PIG 4 - Setup for tutorial notebooks and data, ADASS XVIII proceedings).

Currently there are 19 tutorial notebooks, plus 7 listed as “extra topics”. Among the notebooks there is a lot of duplicated content, but there is also still a lot of missing documentation (e.g. recently implemented large changes in Gammapy such as PIG 12 - High level interface and PIG 16 - Gammapy package structure are not completely documented yet). In addition to the Jupyter notebook tutorials, we have RST documentation pages for each Gammapy sub-package. In some cases there is a lot of content and examples (e.g. maps or modeling), in other cases there is only a sentence or two and the API docs (e.g. cube). The more technical documentation of the API classes, methods and objects is auto-generated from the Python docstrings written in the code.
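As an illustration of the last point, the sketch below shows a numpydoc-style docstring of the kind that Sphinx can turn into an API reference entry. The function energy_to_tev is hypothetical and not part of Gammapy; it only serves to show the docstring layout and how tools read the docstring from the object itself.

```python
import inspect


def energy_to_tev(energy_gev):
    """Convert an energy from GeV to TeV.

    Hypothetical helper, used only to illustrate the docstring layout.

    Parameters
    ----------
    energy_gev : float
        Energy in GeV.

    Returns
    -------
    energy_tev : float
        Energy in TeV.

    Examples
    --------
    >>> energy_to_tev(1000.0)
    1.0
    """
    return energy_gev / 1000.0


# Documentation tools read the docstring from the object itself;
# inspect.getdoc returns it with the indentation cleaned up.
doc = inspect.getdoc(energy_to_tev)
summary = doc.splitlines()[0]
print(summary)
```

The first line of the docstring is the one-line summary that typically appears in API overview tables, while the Parameters, Returns and Examples sections become the body of the API entry.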

The tutorials usually have the following structure: introduction, setup, main content, and sometimes at the end a summary, exercises or links to other documentation. The sub-package RST pages usually have the following structure, following the Astropy docs: Introduction (overview description), Getting Started (first examples), Using (links to tutorials and sometimes sub-pages), API reference.

We will not discuss how other projects structure their documentation, but we did look at a small list of projects and think it’s useful to compare and contrast them to figure out a good documentation structure for Gammapy.

Generally, one has to be aware that Gammapy is both a flexible and extensible library with building blocks that experts can use to implement advanced analyses, and a science tool package for IACTs (CTA, H.E.S.S.) with most analysis use cases pre-implemented, which users just need to configure and run. That is also the case for some of the examples considered (e.g. JupyterLab), whereas others (e.g. scikit-learn or Astropy) are just libraries, and thus their documentation is partly different.

Proposal

Guidelines and specific actions

We propose to undertake a minor general restructuring of the Getting started section, described below, while mostly keeping the existing Gammapy documentation setup (e.g. maintaining part of the documentation in RST pages and another part in Jupyter notebooks), though we admit that there is no clear separation between the content of the two. We will take the following items as guidelines and actions to improve the documentation:

  • More content should be moved to Jupyter notebooks (e.g. currently the RST pages for maps, modeling, catalog, detect, etc. have a few code examples). Those examples should be moved to the corresponding notebooks maps.ipynb, modeling.ipynb, catalog.ipynb and detect.ipynb, since in many cases there would be a hands-on tutorial introduction for each sub-package. More cross-links between IPYNB, RST and API docs should be created.

  • Sub-package RST pages will be kept short, with links to relevant hands-on tutorials or Python scripts at the top and the API docs at the bottom. Some pages have significant content in between that is not related to code examples (e.g. for maps, modeling or IRFs there is a description of the design).

  • Where it makes sense, the notebooks should use the high level interface (e.g. for automatic data reduction), and the lower level API only at the end for the very specific use case proposed, aiming for shorter notebooks that get to the point.

  • Add a Gammapy overview page to the RST docs, where the general data analysis concepts are explained (DL3, Datasets, Model, Fitting). This page would be similar to the description of Gammapy in the paper that we also plan to write now, and the same figures would be used for both.

  • Add a HowTo RST page with a list of short specific needs linking to subsections of notebooks exposing the solution.

  • Add a few examples of how to use Gammapy with Python scripts, and provide these scripts with gammapy download.

  • Extend the Glossary present in the References section with some non-obvious but common terms used throughout the documentation and tutorials.

  • Some effort will be put into reviewing the completeness and consistency of the API docstrings.
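The last item above, checking the completeness of the API docstrings, could be supported by a small script. The following is a minimal sketch under stated assumptions: the helper find_missing_docstrings and the toy module are hypothetical and not part of Gammapy, and the check only covers public functions, classes and methods defined in the module itself.

```python
import inspect
import types


def find_missing_docstrings(module):
    """Return sorted names of public functions, classes and methods
    defined in ``module`` that have no docstring."""
    missing = []
    for name, obj in vars(module).items():
        if name.startswith("_"):
            continue
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        # Skip objects that were only imported into the module.
        if getattr(obj, "__module__", None) != module.__name__:
            continue
        if not obj.__doc__:
            missing.append(name)
        if inspect.isclass(obj):
            for attr_name, attr in vars(obj).items():
                if attr_name.startswith("_"):
                    continue
                if inspect.isfunction(attr) and not attr.__doc__:
                    missing.append(f"{name}.{attr_name}")
    return sorted(missing)


# Toy module standing in for a real sub-package (hypothetical content).
SOURCE = '''
def documented():
    """Has a docstring."""

def undocumented():
    pass

class Model:
    """Has a docstring."""

    def evaluate(self):
        pass
'''
toy = types.ModuleType("toy")
exec(SOURCE, toy.__dict__)

print(find_missing_docstrings(toy))  # ['Model.evaluate', 'undocumented']
```

Such a check only finds missing docstrings; judging whether existing docstrings are consistent and accurate still needs a human reviewer.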

Getting started section restructuring

Gammapy overview

We suggest adding an overview page at the beginning of the section. It would be a ten-minute, non-hands-on introduction to Gammapy, explaining the details of data analysis and giving an overview of concepts such as Datasets, Fit, Models etc. and how they play together in a Gammapy analysis. People could skip this page, go directly to the installation or the hands-on tutorial notebooks, and come back to it later if they prefer.

Installation

Keep as it is.

Getting started

Keep as it is.

Tutorials

The tutorials will be reorganised in the following groups (items) and individual notebooks (sub-items).

  • First analysis
    • Config-driven 1D and 3D analysis of Crab (evolution of current hess.ipynb that could be renamed)

    • Extended source analysis, also showing the lower level API with customisation options for background modeling (to be implemented)

The group below will be a “starting page” for people from CTA, HESS and Fermi, and possibly other instruments in the future. We could remove https://gammapy.org/cta.html (it contains very little, and outdated, information), since it is better to have one starting page for CTA users instead of two.

  • What data can I analyse?
    • Observations and Datasets (to be implemented)

    • CTA, mention prod3 and DC1, show what the IRFs look like (cta_data_analysis.ipynb)

    • HESS, mention DR1, show what the IRFs look like (to be implemented)

    • Fermi-LAT, show how to create map and spectrum dataset using 3FHL example (fermi_lat.ipynb)

  • What analyses can I do?
    • IACT data selection and grouping (to be implemented)

    • IACT 3D cube analysis (data reduction and fitting)

    • IACT 2D image analysis (image_analysis.ipynb)

    • IACT 1D spectrum analysis (data reduction and fitting)

    • IACT light curve computation (light_curve.ipynb)

    • Flux point fitting (sed_fitting_gammacat_fermi.ipynb)

    • Binned simulation 1D / 3D (spectrum_simulation.ipynb and simulate_3d.ipynb combined)

    • Binned sensitivity computation (cta_sensitivity.ipynb)

    • Pulsar analysis (pulsar_analysis.ipynb)

    • Naima model fitting (to be implemented)

    • Joint Crab analysis using Fermi + HESS + some flux points (to be implemented)

For many Gammapy sub-packages, we plan to have a corresponding notebook that is a hands-on, executable tutorial introduction that complements the description and API docs on the sub-package RST page. These notebooks are listed in the group below.

  • Gammapy package
    • Overview (short section with one example for each sub-package, hands-on, an evolution of the current getting_started.ipynb)

    • Maps (gammapy.maps) (maps.ipynb)

    • Models (gammapy.modeling.models) (models.ipynb)

    • Modeling (gammapy.modeling) (to be implemented)

    • Statistics (gammapy.stats) (to be implemented, explains about likelihood, TS values, significance, …)

    • Source detection (gammapy.detect) (detect_ts.ipynb, cwt.ipynb)

    • Source catalogs (gammapy.catalog) (to be implemented)

  • Scripts

This group will contain a few examples of how to use Gammapy from Python scripts (e.g. to make a CTA 1DC survey counts map or to run some other long-running analysis or simulation). The Python scripts could be provided as links and also via gammapy download, as is the case with the notebooks.

  • Extra topics
    • MCMC sampling (mcmc_sampling.ipynb)

    • Dark matter models (gammapy.astro.darkmatter) (astro_dark_matter.ipynb)

    • Dark matter analysis (to be implemented)

    • Light curve simulation (to be implemented)

    • Source population modelling (gammapy.astro.population) (source_population_model.ipynb)

    • Background model making (background_model.ipynb)

    • Sherpa for Gammapy users (image_fitting_with_sherpa.ipynb, spectrum_fitting_with_sherpa.ipynb)?

    • HESS Galactic Plane Survey data (hgps.ipynb)

These are more specialised notebooks, in some cases of lower quality.

  • Basics

Leave the basics section at the end of the tutorials page, pretty much as-is.

How To

We suggest adding a HowTo RST file with short entries explaining how to do something specific in Gammapy. Each HowTo should probably be a short section with just 1-2 sentences and links to tutorials, to specific sections of the tutorials, or to the API docs. If some entries turn out to be small mini-tutorials with code snippets, they could possibly go on sub-pages. The HowTo documentation page shows a preliminary version of the content of this page.

Reference

Keep as it is and extend the Glossary.

Changelog

Keep as it is.

Alternatives

We could try to change to a documentation maintained purely as Jupyter notebooks (e.g. the “Python Data Science Handbook” is written just as Jupyter notebooks). Or we could change the documentation system and write all documentation as RST or MD, and then have a documentation processor that auto-generates notebooks. Jupytext, for example, does this, and the scikit-learn docs partly do that for their tutorials, which are maintained as Python scripts and RST files.

There are many ways to structure the documentation, or to shift its focus.
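To make the second alternative more concrete, here is a minimal sketch of a processor that turns a plain-text source into a notebook, using only the Python standard library. It is not how Jupytext is implemented; the function name text_to_notebook is hypothetical, and the use of "# %%" and "# %% [markdown]" as cell markers is an assumption chosen for this illustration.

```python
import json


def text_to_notebook(text):
    """Split ``text`` on '# %%' markers and return a notebook as a dict.

    Cells opened with '# %% [markdown]' become markdown cells (with the
    leading comment marker stripped from each line); all other cells
    become code cells.
    """
    cells = []
    for chunk in text.split("# %%")[1:]:
        header, _, body = chunk.partition("\n")
        body = body.strip("\n")
        if header.strip() == "[markdown]":
            lines = [
                line[2:] if line.startswith("# ") else line.lstrip("#")
                for line in body.splitlines()
            ]
            cells.append(
                {"cell_type": "markdown", "metadata": {}, "source": "\n".join(lines)}
            )
        else:
            cells.append(
                {
                    "cell_type": "code",
                    "execution_count": None,
                    "metadata": {},
                    "outputs": [],
                    "source": body,
                }
            )
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Hypothetical tutorial source, maintained as a plain Python script.
SOURCE = """# %% [markdown]
# # Maps tutorial
# A short introduction.
# %%
x = 1 + 1
print(x)
"""

notebook = text_to_notebook(SOURCE)
notebook_json = json.dumps(notebook, indent=1)  # could be written to an .ipynb file
print([cell["cell_type"] for cell in notebook["cells"]])  # ['markdown', 'code']
```

With such a setup the text file is the single source that is reviewed and version-controlled, and the notebooks are a build product. The cost is an extra processing step in the documentation build.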

Outlook

This is a short-term proposal to quickly improve the Gammapy documentation within the next 1-2 months, with the limited number of contributors we already have. In early 2020, we should run a Gammapy user survey and gather feedback on the Gammapy package and documentation. Examples of previous user surveys exist, e.g. from CTA 1DC or the Scipy documentation user survey, which we can use as a reference for how to get useful feedback.

We should also try to attract or hire contributors to Gammapy who have a strong interest in documentation. One concrete idea could be to participate in Google Season of Docs, to get a junior technical writer for a few months, if someone from the Gammapy team has time to work on and mentor the project.

Another thing to keep in mind is that we should work towards a setup and structure for the Gammapy package that supports CTA as well as possible, and that makes it easy for CTAO to choose and adapt Gammapy as a prototype of the CTA science tools and to evolve and maintain it. This PIG doesn’t propose a solution for this; that’s for later.

Implementation

Implementing this PIG is a lot of work, roughly 2 months of full-time work. We suggest that, after the PIG is accepted, one coordinator spends a few days doing quick additions / removals / renames / rearrangements, so that the structure of the RST and IPYNB files we want is in place. For this, we propose that the coordinator fills the Documentation GitHub project with a list of 20-30 tasks that should be done (each 1-2 days of work, not longer) and asks for help. Each task is usually to edit one notebook or RST page, and needs one author and one reviewer. It is then up to those two people to coordinate their work: they can open a GitHub issue to discuss, or they can just have a phone call or meet. Eventually there is a pull request, and when it’s in a state where both are happy, it’s merged. Whether to use “notes” or “issues” for each task will be discussed during the development of the PIG and will basically be up to the documentation coordinator.

Decision

The PIG was extensively discussed and received a lot of feedback on GH 2463. The main suggestions received were incorporated. There was some controversy, e.g. whether we should have more shorter pages and notebooks, or fewer longer ones. This PIG was accepted on Nov 6, 2019, although we’d like to note that the outline and changes described above aren’t set in stone. We expect the documentation to evolve and improve in an interactive fashion over the coming weeks, and also in 2020 and after.