PIG 18 - Documentation
Author: Christoph Deil, Axel Donath, José Enrique Ruiz
Created: Oct 16, 2019
Accepted: Nov 6, 2019
Discussion: GH 2463
Over the past years the Gammapy package and documentation have grown organically. At the moment there is a lot of duplicated and missing content, especially for recently added functionality such as datasets, the high level interface, and the restructured Gammapy package. We propose to spend significant effort to reorganise and improve the Gammapy documentation in Nov and Dec 2019, roughly following the plan outlined here. Further discussion and planning will occur in GitHub issues and pull requests, and will be summarised on the Documentation GitHub project board.
Gammapy started in 2013 and since then the package and documentation have continuously evolved (see Gammapy changelog). The oldest version of the documentation that is still readily available online is for Gammapy v0.6 from April 2017 (https://docs.gammapy.org/0.6). The current version of the documentation is for Gammapy v0.14 from September 2019 (https://docs.gammapy.org/0.14).
In 2018, following other projects such as Astropy or Sunpy, we created a
“project webpage” at https://gammapy.org which is not versioned and
hand-written, in addition to https://docs.gammapy.org which is versioned and
auto-generated by Sphinx. And we introduced a new setup for tutorials (written
as Jupyter notebooks, integrated into the Sphinx documentation) and
download as the way that users download versioned tutorial notebooks, example
python scripts and example datasets in a reproducible conda environment (see
PIG 4 - Setup for tutorial notebooks and data, ADASS XVIII proceedings).
Currently there are 19 tutorial notebooks, plus 7 listed as “extra topics”. Among the notebooks there is a lot of duplicated content, but there is also still a lot of missing documentation (e.g. recently implemented large changes in Gammapy such as PIG 12 - High level interface and PIG 16 - Gammapy package structure are not completely documented yet). In addition to the Jupyter notebook tutorials, we have RST documentation pages for each Gammapy sub-package. In some cases there is a lot of content and examples (e.g. maps or modeling), in other cases there is only a sentence or two and the API docs (e.g. cube). The more technical documentation of the API classes, methods and objects is auto-generated from the Python docstrings written in the code.
The tutorials usually have the following structure: introduction, setup, main content, and sometimes at the end a summary, exercises or links to other documentation. The sub-package RST pages usually have the following structure, following the Astropy docs: Introduction (overview description), Getting Started (first examples), Using (links to tutorials and sometimes sub-pages), API reference.
We will not discuss in detail how other projects structure their documentation, but we did look at a small list of projects and think it is useful to compare and contrast them to figure out a good documentation structure for Gammapy:
http://cxc.harvard.edu/sherpa/ and https://sherpa.readthedocs.io
https://www.djangoproject.com/ and https://docs.djangoproject.com
Generally one has to be aware that Gammapy is both a flexible and extensible library with building blocks that experts can use to implement advanced analyses, and a science tool package for IACTs (CTA, H.E.S.S.) with most analysis use cases pre-implemented, which users just need to configure and run. That is also the case for some of the examples considered (e.g. JupyterLab). Others (e.g. scikit-learn or Astropy) are just a library, and thus their documentation is partly different.
Guidelines and specific actions
We propose to undertake a minor general restructure of the Getting started section described below, mostly keeping the existing Gammapy documentation setup (e.g. to maintain part of the documentation in RST pages and another part in Jupyter notebooks), though we admit that there is no clear separation between the content of both. We will take the following items as guidelines and actions to improve the documentation:
More content should be moved to Jupyter notebooks (e.g. currently the RST pages for maps, modeling, catalog, detect, etc. have a few code examples). Those should be moved to corresponding notebooks, e.g. detect.ipynb, since in many cases there would be a hands-on tutorial introduction for each sub-package. More cross-links between IPYNB, RST and API docs should be created.
Sub-package RST pages will be kept short, with links to relevant hands-on tutorials or Python scripts at the top, and the API docs at the bottom. Some pages have significant content in between that is not related to code examples (e.g. for maps, modeling or IRFs there is a description of the design).
Where possible, the notebooks should use the high level interface everywhere it makes sense (e.g. automatic data reduction), and the lower level API only at the end, for the very specific use case proposed. The aim is to have shorter notebooks that get to the point.
Add a Gammapy overview page to the RST docs, where the general data analysis concepts are explained (DL3, Datasets, Model, Fitting). This page would be similar to the description of Gammapy in the paper that we also plan to write now, and the same figures would be used for both.
Add a HowTo RST page with a list of short specific needs linking to subsections of notebooks exposing the solution.
Add a few examples of how to use Gammapy with Python scripts, and provide these scripts with
Extend the Glossary present in the References section with some non-obvious but common terms used through the documentation and tutorials.
Some effort will be put into reviewing the completeness and consistency of the API docstrings.
Getting started section restructuring
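As a rough illustration of what a docstring completeness check could look like, the following sketch lists public functions, classes and methods of a module that have no docstring. It is a minimal example using only the Python standard library; the helper name `find_missing_docstrings` is hypothetical and not part of Gammapy, and a real review would also need to check the consistency of existing docstrings, which this does not attempt.

```python
import inspect
import types


def find_missing_docstrings(module: types.ModuleType) -> list[str]:
    """Return sorted names of public functions, classes and methods lacking a docstring.

    Hypothetical helper for illustration only; it is not part of Gammapy.
    """
    missing = []
    for name, obj in vars(module).items():
        if name.startswith("_"):
            continue  # private names are not part of the public API
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        # Skip objects that were only imported into this module.
        if getattr(obj, "__module__", None) != module.__name__:
            continue
        if not obj.__doc__:
            missing.append(name)
        if inspect.isclass(obj):
            # Only look at methods defined directly on the class.
            for attr_name, attr in vars(obj).items():
                if attr_name.startswith("_"):
                    continue
                if inspect.isfunction(attr) and not attr.__doc__:
                    missing.append(f"{name}.{attr_name}")
    return sorted(missing)
```

Such a function could be run over each sub-package module to produce a task list of docstrings to write.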
We suggest adding an overview page at the beginning of the section. This would be a ten-minute, non-hands-on introduction to Gammapy, explaining the main steps of data analysis and giving an overview of concepts such as Datasets, Fit, Models etc. and how they play together in a Gammapy analysis. People could skip this page, go directly to the installation or hands-on tutorial notebooks, and come back to it later if they prefer.
Keep as it is.
Keep as it is.
The tutorials will be reorganised in the following groups (items) and individual notebooks (sub-items).
- First analysis
Config-driven 1D and 3D analysis of Crab (evolution of current hess.ipynb that could be renamed)
Extended source analysis, also showing the lower level API with customisation options for background modeling (to be implemented)
The group below will be a “starting page” for people from CTA, HESS and Fermi, and possibly other instruments in the future. We could remove https://gammapy.org/cta.html (very little, and outdated, information), since it is better to have one starting page for CTA users instead of two.
- What data can I analyse?
Observations and Datasets (to be implemented)
CTA, mention prod3 and DC1, show what the IRFs look like (
HESS, mention DR1, show what the IRFs look like (to be implemented)
Fermi-LAT, show how to create map and spectrum dataset using 3FHL example (
- What analyses can I do?
IACT data selection and grouping (to be implemented)
IACT 3D cube analysis (data reduction and fitting)
IACT 2D image analysis (
IACT 1D spectrum analysis (data reduction and fitting)
IACT light curve computation (
Flux point fitting (
Binned simulation 1D / 3D (
Binned sensitivity computation (
Pulsar analysis (
Naima model fitting (to be implemented)
Joint Crab analysis using Fermi + HESS + some flux points (to be implemented)
For many Gammapy sub-packages, we plan to have a corresponding notebook that is a hands-on, executable tutorial introduction that complements the description and API docs on the sub-package RST page. These notebooks are listed in the group below.
- Gammapy package
Overview (short section with one example for each sub-package, hands-on, an evolution of the current
gammapy.modeling) (to be implemented)
gammapy.stats) (to be implemented, explains about likelihood, TS values, significance, …)
Source detection (
Source catalogs (
gammapy.catalog) (to be implemented)
This group will contain a few examples on how to use Gammapy from Python scripts (e.g. make a CTA 1DC survey counts map or some other long-running analysis or simulations). The Python scripts could be provided as links and also in gammapy download, as is the case with the notebooks.
- Extra topics
MCMC sampling (
Dark matter models (
Dark matter analysis (to be implemented)
Light curve simulation (to be implemented)
Source population modelling (
Background model making (
Sherpa for Gammapy users (
HESS Galactic Plane Survey data (
More specialised notebooks, and in some cases of lower quality.
Leave the basics section at the end of the tutorials page, pretty much as-is.
We suggest adding a HowTo RST file with short entries explaining how to do something specific in Gammapy. Each HowTo should probably be a short section with just 1-2 sentences and links to tutorials, to specific sections of the tutorials, or to the API docs. If some entries turn out to be small mini-tutorials with code snippets, they could possibly go on sub-pages. The HowTo documentation page shows a preliminary version of the content of this page.
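To make the proposed entry format concrete, here is a minimal sketch that renders a list of HowTo entries as an RST page, with one short section and one link per entry. The `HowTo` class, the `render_howto_page` function, and the entry title and link target used in the usage note are placeholders for illustration; they do not refer to existing Gammapy code or pages.

```python
from dataclasses import dataclass


@dataclass
class HowTo:
    """One HowTo entry: a short text plus a single link (all fields are placeholders)."""

    title: str
    text: str
    link_label: str
    link_target: str


def render_howto_page(entries):
    """Render HowTo entries as an RST page with one short section per entry."""
    lines = ["HowTo", "=====", ""]
    for entry in entries:
        lines += [
            entry.title,
            "-" * len(entry.title),  # RST section underline must match the title length
            "",
            f"{entry.text} See `{entry.link_label} <{entry.link_target}>`__.",
            "",
        ]
    return "\n".join(lines)
```

For example, `render_howto_page([HowTo("Do something specific", "One or two sentences.", "the tutorial", "notebooks/example.html#section")])` produces a page with a single section that links to the (hypothetical) tutorial anchor.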
Keep as it is and extend the Glossary.
Keep as it is.
We could try to change to a documentation maintained purely as Jupyter notebooks (e.g. the “Python Data Science Handbook” is written just as Jupyter notebooks). Or we could change the documentation system and write all documentation as RST or MD, and then have a documentation processor that auto-generates notebooks. E.g. Jupytext does this, and the scikit-learn docs partly do that for their tutorials: they maintain them in Python scripts and RST files.
There are many ways to structure the documentation, or to set a different focus.
This is a short-term proposal to quickly improve the Gammapy documentation within the next 1-2 months, with the limited contributors we already have. In early 2020, we should run a Gammapy user survey and gather feedback on the Gammapy package and documentation. Examples of previous user surveys exist, e.g. from CTA 1DC or the Scipy documentation user survey, which we can use as a reference for how to get useful feedback.
We should also try to attract or hire contributors to Gammapy who have a strong interest in documentation. One concrete idea could be to participate in Google Season of Docs, to get a junior technical writer for a few months, if someone from the Gammapy team has time to work on and mentor the project.
Another thing to keep in mind is that we should work towards a setup and structure for the Gammapy package that supports CTA as well as possible, and that makes it easy for CTAO to choose and adapt Gammapy as prototype of the CTA science tools and to evolve and maintain it. This PIG does not propose a solution for this; that is for later.
Implementing this PIG is a lot of work, roughly 2 months of full-time work. We suggest that, after the PIG is accepted, one coordinator spends a few days on quick additions / removals / renames / rearrangements, so that the structure of the RST and IPYNB files we want is in place. We then propose that the coordinator fills the Documentation GitHub project with a list of 20-30 tasks that should be done (each 1-2 days of work, not longer) and asks for help. Each task is usually to edit one notebook or RST page, and needs one author and one reviewer. It is then up to those two people to coordinate their work: they can open a GitHub issue to discuss, or they can just do a phone call or meet. Eventually there is a pull request, and when it is in a state where both are happy, it is merged. Whether to use “notes” or “issues” for each task will be discussed during the development of the PIG and will basically be up to the documentation coordinator.
The PIG was extensively discussed and received a lot of feedback on GH 2463. The main suggestions received were incorporated. There was some controversy, e.g. whether we should have more, shorter pages and notebooks, or fewer, longer ones. This PIG was accepted on Nov 6, 2019. We would like to note that the outline and changes described above are not set in stone: we expect the documentation to evolve and improve in an interactive fashion over the coming weeks, and also in 2020 and after.
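To illustrate the second alternative, the sketch below converts a plain Python script with `# %%` cell markers into a notebook structure following the nbformat 4 layout. It is a simplified, standard-library-only illustration of the idea of auto-generating notebooks from text sources; it is not how Jupytext or the scikit-learn documentation build is implemented, and the function names are hypothetical.

```python
import json


def _make_cell(kind, lines):
    """Build one notebook cell dict, or return None if there is no content."""
    text = "\n".join(lines).strip("\n")
    if not text:
        return None
    if kind == "markdown":
        # Markdown cells are stored as comments in the script: drop the "# " prefix.
        text = "\n".join(
            line[2:] if line.startswith("# ") else line.lstrip("#")
            for line in text.split("\n")
        )
        return {"cell_type": "markdown", "metadata": {}, "source": text}
    return {
        "cell_type": "code",
        "metadata": {},
        "source": text,
        "outputs": [],
        "execution_count": None,
    }


def script_to_notebook(source):
    """Convert a '# %%'-delimited script into an nbformat 4 notebook dict."""
    cells, kind, buffer = [], "code", []
    for line in source.splitlines():
        if line.startswith("# %%"):
            # A marker line closes the current cell and opens a new one.
            cell = _make_cell(kind, buffer)
            if cell is not None:
                cells.append(cell)
            kind = "markdown" if "[markdown]" in line else "code"
            buffer = []
        else:
            buffer.append(line)
    cell = _make_cell(kind, buffer)
    if cell is not None:
        cells.append(cell)
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
```

The resulting dict can be written to an `.ipynb` file with `json.dump`. The trade-off of this approach is that the text source stays easy to diff and review, while the notebooks become a build product.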