Good practices, programming style and conventions

Python is a very versatile programming language, and it has a very flexible syntax. Therefore, it is also very easy to write ugly code in python. This chapter reviews a number of guidelines and conventions widely accepted by the community.

Use PEP 8

PEP 8 is a style guide for python code. It introduces new rules where the syntax leaves them open. As an example:

In [1]:
# Not PEP8 compliant
a=2*3
# PEP8 compliant
a = 2*3

PEP8 gives guidelines, not strict rules. It is your choice to comply with them or not. As a matter of fact however, many open source projects have adopted PEP8 and require to use it if you want to contribute. If you want to use it too, a good choice is to turn on PEP8 checking in your favorite IDE (in Spyder: Tools -> Preferences -> Editor -> Code Introspection/Analysis -> Tick Real-time code style analysis for PEP8 style analysis).

Give meaningful names to variables and functions

Code space is less important than your time. Give meaningful name to your variables and functions, even if it makes them quite long! For example, prefer:

def fahrenheit_to_celsius(temp):
    """Converts degrees Fahrenheits to degrees Celsius."""

to the shorter but less explicit:

def f2c(tf):
    """Converts degrees Fahrenheits to degrees Celsius."""

Prefer writing many small functions instead of monolithic code blocks

Separation of concerns is an important design principle for separating a computer program into distinct sections, such that each section addresses a separate concern. This increases the code readability and facilitates unit testing.

For example, a scientific script can often be organized in well defined steps:

  • data input (read a NetCDF or text file)
  • data pre-processing (filtering missing data, discarding useless variables)
  • data processing (actual computation)
  • data visualization (producing a plot)
  • output (writing the processed data to a file for later use)

Each of these steps and sub-steps should be separated in different functions, maybe even in different scripts.

Document your code

Write comments in your code, and document the functions with docstrings. Here again there are some conventions that I recommend to follow.

Note that this is useful even if you don't plan to share your code. At the very last there is at least one person reading your code: yourself! And it's nice to be nice to yourself ;-)

Abstruse Goose made a funny comic about code documentation: http://abstrusegoose.com/432

Refactor your code

It is impossible to get all things right at the first shot. Some real-life examples:

  • after a while, a function that you found useful becomes clunky and its signature needs to be redefined
  • a code that worked with data A doesn't work with data B
  • you find yourself copy-pasting 5 lines of codes many times in the same script
  • the name of this variable made sense with data A, but not anymore with data B
  • nobody understands your code

All these are good indicators that your code needs some refactoring. This happens often (even for big programs) and becomes more and more difficult the longer you wait. For example, in the process of doing your assignments for next week you might need to revisit your code. Here also, good IDEs can help you in this process, either by letting you know where the functions are used, and by letting you know when your tests fail.

The Zen of Python

In [2]:
import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

What's next?