Documenting your code is an integral part of the programming process. In this chapter I give some recommendations on how to write useful documentation and how dedicated tools can be used to generate HTML documentation for your project.
There are three major elements of code documentation:
At the very least, your own code should have inline comments. API documentation is also very important, but it carries a greater risk of falling out of sync with the code, which makes it potentially dangerous.
Inline comments are plain text explanations of your code. As written on cs.utah.edu: "All programs should be commented in such a manner as to easily describe the purpose of the code and any algorithms used to accomplish the purpose. A user should be able to utilize a previously written program (or function) without ever having to look at the code, simply by reading the comments."
They are often placed at the top of a file (header comments) or before a thematic code block:
# 1D interpolation example from scipy
# see: https://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
# Inline comments are my own (F. Maussion)
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
%matplotlib inline
# Create synthetic data for the plot
x = np.linspace(0, 10, num=11, endpoint=True)
y = np.cos(-x**2/9.0)
# Define two different interpolation functions to compare:
# linear (default) and cubic
fl = interp1d(x, y)
fc = interp1d(x, y, kind='cubic')
# Location at which we want to interpolate
xnew = np.linspace(0, 10, num=41, endpoint=True)
# Compute the interpolated points and plot
plt.plot(x, y, 'o', xnew, fl(xnew), '-', xnew, fc(xnew), '--')
plt.legend(['data', 'linear', 'cubic'], loc='best');
As you can see, the comments help to organize the code. The same example without comments is much less engaging:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
%matplotlib inline
x = np.linspace(0, 10, num=11, endpoint=True)
y = np.cos(-x**2/9.0)
fl = interp1d(x, y)
fc = interp1d(x, y, kind='cubic')
xnew = np.linspace(0, 10, num=41, endpoint=True)
plt.plot(x, y, 'o', xnew, fl(xnew), '-', xnew, fc(xnew), '--')
plt.legend(['data', 'linear', 'cubic'], loc='best');
Comments can include references to algorithms or indications about who wrote these lines. Writing them should become automatic when you write code.
Inline comments can also do more harm than good. See the following example:
# Numpy tutorial on matrix multiplication
# Author: mowglie
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
%matplotlib inline
# linspace between 0 and 10
x = np.linspace(0, 10, num=11, endpoint=True)
y = np.cos(-x**2/9.0) # apply cosinus
# from documentation:
# https://docs.numpy.org/doc/numpy/reference/generated/scipy.interpolate.interp1d.html#scipy.interpolate.interp1d
fl = interp1d(x, y)
fc = interp1d(x, y, kind='cubic')
# linspace between 0 and 10
xnew = np.linspace(0, 10, num=41, endpoint=True)
# Plot
plt.plot(x, y, 'o', xnew, fl(xnew), '-', xnew, fc(xnew), '--')
plt.legend(['data', 'linear', 'cubic'], loc='best');
What are the problems here?
Regarding obvious comments, I really like this comic by Abstruse Goose, which makes the point nicely.
Basically: think about future readers when writing both code and comments! And don't forget: this future reader might be you, and you'll be thanking yourself.
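As a small illustration of the difference between an obvious comment and a useful one, here is a sketch based on the `linspace` call from the examples above. The stated intent (a coarse sampling to make the interpolation methods distinguishable) is my own assumption for the sake of the example, and the name `x_coarse` is hypothetical.

```python
import numpy as np

# Unhelpful: the comment only restates what the code already says
# linspace between 0 and 10
x = np.linspace(0, 10, num=11, endpoint=True)

# More helpful: the comment explains the intent behind the numbers
# (assumed intent, for illustration only)
# Sample the signal coarsely (11 points) so that the difference between
# linear and cubic interpolation is visible in the plot
x_coarse = np.linspace(0, 10, num=11, endpoint=True)
```

Both lines compute exactly the same array; only the second comment tells a future reader something the code does not.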
A function signature (or type signature, or method signature) defines the inputs and outputs of functions or methods. When writing a function, you expect users (including yourself) to use it for a certain period of time. Ideally, you would like to understand what a function does long after writing it. This is what docstrings are for:
def repeat(phrase, n_times=2, sep=', '):
    """Repeat a phrase a given number of times.

    This uses the well known algorithm of string multiplication
    by GvR et al.

    Parameters
    ----------
    phrase : str
        The phrase to repeat
    n_times : int, optional
        The number of times the phrase should be repeated
    sep : str, optional
        The separator between each repetition

    Returns
    -------
    str
        The repeated phrase

    Raises
    ------
    ValueError
        When ``phrase`` is not a string
    """
    if not isinstance(phrase, str):
        raise ValueError('phrase should be a string!')
    return sep.join([phrase] * n_times)
Docstrings have a special meaning in Python. They are not used by the language itself, but Python offers ways to access them:
print(repeat.__doc__)
The docstring is also read by IPython when calling help (?) on a function.
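Outside of IPython, the standard library offers similar access. The following is a minimal sketch using a shortened copy of `repeat` so that the block runs on its own; `inspect.getdoc` and the built-in `help` are standard Python, the variable name `clean_doc` is mine.

```python
import inspect


def repeat(phrase, n_times=2, sep=', '):
    """Repeat a phrase a given number of times.

    Shortened copy of the function above, for illustration only.
    """
    return sep.join([phrase] * n_times)


# inspect.getdoc returns the docstring with its indentation cleaned up
clean_doc = inspect.getdoc(repeat)
print(clean_doc)

# help() prints the function signature followed by the docstring
help(repeat)
```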
There are no strict rules about how docstrings should be written, but the scientific community has more or less agreed on a convention: numpydoc, first written for NumPy and then adopted by many other projects in the scientific stack. By complying with this convention you'll make the job of your readers easier.
The convention specifies how to document the types of the input and output variables, as well as other information. More importantly, it can be parsed automatically by doc generators like Sphinx (see below).
I highly recommend writing numpydoc docstrings for your projects. There is one exception to this recommendation though: write docstrings only if they are accurate and if you plan to maintain them. Indeed, wrong documentation is worse than no documentation at all: it gives others confidence in what your function is supposed to do (a "contract"), and a function that does not comply with this contract will lead to bugs and disappointment.
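One possible way to reduce the risk of inaccurate docstrings is to put short usage examples in them and let Python check these examples. The sketch below is not part of the original lecture material: it assumes an "Examples" section (which numpydoc allows) and uses the standard library's `doctest` module to run it against a shortened copy of `repeat`.

```python
import doctest


def repeat(phrase, n_times=2, sep=', '):
    """Repeat a phrase a given number of times.

    Examples
    --------
    >>> repeat('hello')
    'hello, hello'
    >>> repeat('ab', n_times=3, sep='-')
    'ab-ab-ab'
    """
    return sep.join([phrase] * n_times)


# Collect the examples found in the docstring and run them
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(repeat, name='repeat', globs={'repeat': repeat}):
    runner.run(test)

# summarize() returns the number of failed and attempted examples
results = runner.summarize(verbose=False)
print(results)
```

If someone later changes the default separator without updating the docstring, the first example fails and the mismatch between code and documentation becomes visible.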
Writing documentation is hard and tedious. It is a task that most people want to avoid, but it is extremely important. Particularly in the Python world, where almost everything is open source and based on the work of volunteers, documentation is sometimes neglected.
Fortunately, some tools make it easier for open-source programmers to write documentation.
Sphinx is a tool that makes it easy to create intelligent and beautiful documentation. It can parse your documentation written as text files and convert them to nice, engaging HTML websites. Importantly, Sphinx can parse Python code and write API documentation automatically.
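To give an idea of what this looks like in practice, here is a minimal sketch of a Sphinx configuration file (`conf.py`). The project and author names are placeholders of mine. The two extensions listed ship with Sphinx; the separate `numpydoc` package would be an alternative to `sphinx.ext.napoleon` for parsing numpydoc docstrings.

```python
# conf.py: hypothetical minimal Sphinx configuration (names are placeholders)
project = 'myproject'
author = 'Your Name'

extensions = [
    'sphinx.ext.autodoc',   # pull documentation from the docstrings in the code
    'sphinx.ext.napoleon',  # understand numpydoc-style sections
]

# Parse numpy-style docstrings only, not Google-style ones
napoleon_google_docstring = False
napoleon_numpy_docstring = True

# Theme used for the generated HTML pages (alabaster is Sphinx's default)
html_theme = 'alabaster'
```

With such a configuration, a directive like `.. autofunction:: mypackage.repeat` in one of the text files (where `mypackage` is again a placeholder) asks Sphinx to insert the function's signature and docstring into the generated page.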
Many open-source projects use sphinx for their documentation, including numpy and xarray. In the lecture we will make a demo of sphinx by building the xarray documentation locally.
readthedocs.org is a platform hosting the documentation of open-source projects for free. It builds the documentation website using Sphinx and updates it at each code update. The documentation of the Open Global Glacier Model and of xarray is hosted on ReadTheDocs.