Technical documentation with Sphinx

This talk explains how one can use Sphinx for technical documentation, in particular for this very site, scicomp.aalto.fi. The focus is on giving an overview of how to contribute to this site (or similar ones), but it will also provide a strong basis for creating such a site yourself.

Basics

scicomp.aalto.fi

  • Home of Aalto Scientific Computing’s documentation

  • Before 2017, it was Triton’s documentation, hosted in Confluence

  • Now has information on many different topics about scientific computing.

  • Rather highly ranked in search engines.

  • Converted from wiki.aalto.fi (Triton) using _meta/confluence2html.py and then pandoc to convert HTML→ReST.

  • CC-BY license agreed at that time

Properties of good documentation

  • Organized, easy to use

  • Versioned

  • Anyone can contribute

  • Shareable, reusable, licensed

  • No lock-in, can migrate later

  • Plain text, so 50 years of text-processing tools (grep, sed, etc.) all work.

  • Not standalone, can integrate with other materials (e.g. literalinclude).

  • git? (naturally comes out of the above)

The basic documentation stack

  • Git repository

  • Hosted on GitHub

  • Documentation written in reStructuredText or Markdown

  • Built with Sphinx

    • With various extensions

  • Hosted on ReadTheDocs

Demo: making a change

I want to add the Journal of Open Source Software (JOSS) review checklist (https://joss.readthedocs.io/en/latest/review_checklist.html) to the RSE checklists section (https://scicomp.aalto.fi/rse/#checklists).

Through this, we will see:

  • Git repository layout

  • reStructuredText format

  • Sphinx table of contents directives (toctree)

  • Creating a pull request with git-pr

  • Reviewing the pull request

  • Merging

  • Seeing the rendered version

Building the site

  • Git repo: https://github.com/AaltoSciComp/scicomp-docs/

  • Python project, including requirements.txt

    • Until recently, it was buildable with stock Debian/Ubuntu packages. Now it may require some extra packages (see requirements.txt).

  • conf.py contains all configuration

  • index.rst is the root of all docs.

  • Makefile builds it

    • make html to make it

    • make clean html to rebuild

    • make clean check to build and check for any errors

    • sphinx-autobuild . _build/html/ may be useful

  • View results in _build
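
The Makefile itself is not reproduced here, so the following is only a sketch of what a target such as make html roughly boils down to, assuming the standard python -m sphinx entry point. The helper names build_command and run_build, and the strict flag, are illustrative and not part of the repository.

```python
import subprocess
import sys


def build_command(builder="html", source=".", out="_build/html", strict=False):
    """Return the argument list for a Sphinx build (hypothetical helper)."""
    cmd = [sys.executable, "-m", "sphinx", "-b", builder]
    if strict:
        # -W turns warnings into errors. This is an assumption about what
        # a "check" target might do, not a copy of the real Makefile.
        cmd.append("-W")
    # Source directory (where conf.py lives) and output directory.
    cmd += [source, out]
    return cmd


def run_build(**kwargs):
    """Run the build; requires Sphinx and the project's requirements."""
    subprocess.run(build_command(**kwargs), check=True)
```

Calling run_build() from the repository root would then write the HTML to _build/html, and run_build(builder="singlehtml", out="_build/singlehtml") would select a different builder.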

Editing on the web

  • The GitHub web interface is suitable for making changes.

  • You can either directly commit or open a PR.

  • Can we use this more?

Sphinx toctree (table of contents tree)

  • The toctree directive is the fundamental building block of the site.

  • It organizes documents into a tree, and that tree is used to make the sidebar. This directive can be put into any page.

  • Example:

    .. toctree::
       :maxdepth: 2
       :glob:

       aalto/*
       data/index
       README
    
  • Example: Follow it from index.rst → aalto/index.rst → aalto/jupyterhub.rst → aalto/jupyterhub-instructors/index.rst → various subpages.

  • It makes sense, but for complicated cases I often do trial and error.
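
To make the tree structure concrete, here is a minimal sketch that pulls the entries out of a toctree directive, which is a handy first step when following the tree by hand. It is a simplified stand-in, not how Sphinx parses pages: toctree_entries is a hypothetical helper that only looks at the first directive and returns "Title <path>" entries unsplit.

```python
def toctree_entries(text):
    """Return the entries of the first ``.. toctree::`` directive in *text*.

    Simplified sketch: skips ``:option:`` lines and blank lines, and
    stops at the first line that is not indented past the directive.
    """
    entries = []
    indent = None  # indentation of the directive line, once found
    for line in text.splitlines():
        stripped = line.strip()
        if indent is None:
            if stripped.startswith(".. toctree::"):
                indent = len(line) - len(line.lstrip())
            continue
        if not stripped:
            continue  # blank lines separate options from entries
        if len(line) - len(line.lstrip()) <= indent:
            break  # dedent: the directive body has ended
        if stripped.startswith(":"):
            continue  # directive option such as :maxdepth: 2
        entries.append(stripped)
    return entries


EXAMPLE = """\
.. toctree::
   :maxdepth: 2
   :glob:

   aalto/*
   data/index
   README
"""

print(toctree_entries(EXAMPLE))
```

Each returned entry names another document (or a glob of documents) whose own toctree can be read the same way, which is how the sidebar hierarchy emerges.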

Arrangement of the site

  • scicomp.aalto.fi started from the Triton wiki

  • It then grew top-level sections for Aalto, Triton, Data, Training, RSE, etc.

  • It is about time that we rethink how it is organized.

Other details

Sphinx

  • Sphinx is a full-fledged extendable documentation generator

  • We use many extensions such as sphinx_gitstamp, sphinx-{copybutton,tabs,togglebutton}, sphinx_rtd_theme.

  • Custom JavaScript and CSS in _static.
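
As a rough illustration of how these pieces meet in conf.py, a sketch follows. It is not the site's actual configuration: the extension module names are the usual import names of the packages listed above, and custom.css and custom.js are hypothetical file names.

```python
# conf.py (sketch): only the settings discussed above.
extensions = [
    "sphinx_gitstamp",
    "sphinx_copybutton",
    "sphinx_tabs.tabs",
    "sphinx_togglebutton",
]

html_theme = "sphinx_rtd_theme"

# Directory with custom assets, copied into the built site.
html_static_path = ["_static"]

# Hypothetical file names, resolved relative to html_static_path.
html_css_files = ["custom.css"]
html_js_files = ["custom.js"]
```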

reStructuredText syntax

Most surprising ReST points:

  • Double backquotes for literals:

    Run ``nano`` to begin
    

    (configurable)

  • Links are scoped:

    :doc:`/triton/index`
    :ref:`tutorials`
    

    (configurable)

  • Two underscores after external links:

    The main `Aalto website <https://aalto.fi/>`__
    

GitHub Actions checks

ReadTheDocs

  • https://readthedocs.org provides a management interface for the docs

  • There is a joint aalto-scicomp account to manage it

  • Demo if time, but pretty much self-explanatory

  • Occasionally a build fails for no apparent reason and I need to wipe and rebuild.

Little-known features

We could use Markdown or Jupyter

  • Via MyST-Parser, or MyST-NB for Jupyter.

  • They all work together in the same site.

  • ReST is really nicer for this than shoving directives into CommonMark.
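
For reference, enabling Markdown is mostly a one-line change in conf.py. The sketch below assumes MyST-Parser is installed and is not taken from the site's configuration; the colon_fence option is just one example of an optional MyST syntax extension.

```python
# conf.py (sketch): let Markdown pages live next to .rst pages.
extensions = [
    "myst_parser",  # parses .md files with MyST Markdown
    # For Jupyter notebooks, "myst_nb" would be listed here instead;
    # it builds on myst_parser.
]

# Optional: allow ":::" fences as an alternative directive syntax.
myst_enable_extensions = ["colon_fence"]
```

Pages written in either format can then be listed in the same toctree.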

Compatible with many other projects

  • Standard documentation system for many projects

  • Used in recent CodeRefinery lessons, for example

Minipres

Redirect to HTTPS

  • ReadTheDocs doesn’t natively do this for external domains

  • Done via JavaScript

  • Can anyone improve?

Other output formats

  • Sphinx can output to PDF, single-page HTML, epub, manual pages, and more.

  • Can anyone think of a use for this?

Substitution extension

sphinx-gitstamp

Open questions

Pull requests or not?

  • When should we use pull requests? When should we push directly?

  • In practice both are fine; it is up to you to decide.

Sharing with other sites

  • We had this long-term plan to build scicomp.aalto.fi so that other sites could share our HPC tutorials and customize them to their sites.

  • sphinx_ext_substitution (written by rkdarst) could make this easier

  • This has not yet been done, and by now scicomp-docs is so complex that I’m not sure whether it is a reasonable thing to do.

Others at Aalto can use scicomp.aalto.fi

  • Should we encourage others to join our project here?

Testable docs

  • Our dream would be to make examples in a testable form, where one can automatically run them all and find errors.

  • For example, this python-openmp example includes everything needed to submit and run the file.

  • Can this be automatically tested? A bit too complex for the typical doctest.
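
For comparison, this is roughly what the "typical doctest" case looks like: a sketch using Python's standard doctest module on a made-up documentation snippet. PAGE and check_page are hypothetical, and nothing here submits a job. That gap is exactly what makes the HPC examples harder to test.

```python
import doctest

# Hypothetical documentation snippet; on a real site this text would
# live in an .rst page.
PAGE = """
Computing core-hours
--------------------

A job using 4 cores for 2.5 hours consumes:

>>> cores, hours = 4, 2.5
>>> cores * hours
10.0
"""


def check_page(text, name="page"):
    """Run every ``>>>`` example found in *text* and return the results."""
    parser = doctest.DocTestParser()
    # Empty globals: the examples must be self-contained.
    test = parser.get_doctest(text, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # TestResults(failed, attempted)


results = check_page(PAGE)
print(results.failed, "failed of", results.attempted)
```

Sphinx ships a sphinx.ext.doctest extension that applies the same idea across a whole site, but it still only covers examples that run as plain Python in the build environment.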

Integrated HPC-examples

Don’t use ReadTheDocs anymore?

  • GitHub Pages or other hosting sites would work instead of ReadTheDocs now.

How can we keep things up to date?

  • Requires continuous work, like any docs.

  • What should the threshold be for removing old material?

  • The sphinx-dust extension to remind us to look at pages?

  • We clearly need to think about this more.
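
One low-tech starting point, independent of any extension, would be to list pages that have not changed for a long time. The sketch below is only an illustration: stale_pages is a hypothetical helper, the one-year threshold is arbitrary, and the dates are made up. In practice they could come from something like git log -1 --format=%cs -- PATH for each page.

```python
from datetime import date, timedelta


def stale_pages(last_modified, today, max_age_days=365):
    """Return page paths whose last change is older than *max_age_days*.

    *last_modified* maps a page path to the date of its last change.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        page for page, changed in last_modified.items() if changed < cutoff
    )


# Made-up dates, for illustration only.
pages = {
    "triton/index.rst": date(2023, 1, 10),
    "aalto/jupyterhub.rst": date(2020, 5, 1),
}
print(stale_pages(pages, today=date(2023, 6, 1)))
```

A last-changed date is of course only a proxy: an old page can still be correct, and a recently touched page can still be outdated.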

Visitor stats

  • ReadTheDocs provides limited stats based on web server logs

  • rkdarst is against detailed web tracking

  • Can we find a way to get both?

Building a community

  • How can we get more people to contribute?