In this chapter we are going to learn how to check that the code you write is working as expected. This will be achieved by introducing testing tools which will help you to formalize and organize your tests, as well as by defining new concepts: unit testing and test-driven development.
Nobody can write bug-free code on the first try. Believe me on that one, nobody can. Therefore, every programmer constantly needs to test that newly written code is working as expected. Very often, scientists without formal training in computer science and programming will test their code "ad hoc", i.e. with trial-and-error approaches on an interactive console or with print statements throughout the code. While this seems like a quick way to get the job done at first sight, it is in fact a dangerous, non-scientific practice. Today and throughout the lecture I will often remind you of a simple rule:
Always write tests for your code.
Why am I insisting so much on this? Let's discuss it further.
As honest scientists we always have to provide results which are correct to the best of our knowledge. In recent years, erroneous software code led to several "scandals" in the scientific world. For example, in 2006 scientists had to retract five of their papers after discovering that their results were based on erroneous computations. Other authors retracted a paper in PLoS "as a result of a bug in a Perl script". The challenge of (ir)reproducible research can only be tackled with openly tested scientific software.
The requirement of "100% error-free code" announced in my title will of course never be achieved, but we have to do our best to avoid situations like the ones described above. This slide by Pietro Berkes, presented at a SciPy tutorial, makes the point quite well:
Checking that "the curve looks okay" or that "the relative humidity values seem realistic" is not a quantitative test: it is error prone and possibly wrong. Testing must be achieved in a formal and quantitative way:
This is a problem often overlooked by scientists: once written and tested "ad hoc", a script will be used and sometimes shared with colleagues. However, the same code might provide different results at a later time, as shown in these examples:
- the script now prints a `DeprecationWarning` when executed
- I rewrote `clever_compute()`, which was too slow, and now it doesn't work anymore
- I fixed `abc()`, where I found a bug, but now the other function `dfg()` does not work anymore

In all these cases, formal tests would have helped to isolate the problems at an early stage and avoid late and tedious debugging. Therefore, tests are sometimes called "regression tests": once a test is written and passes, it serves the purpose of making sure that future changes won't break things. Tests should be re-run each time something is changed in the code or when third-party packages are updated.
You would never use data from an uncalibrated temperature sensor, would you? Scientific code is no different: never trust untested code. Don't trust tested code either ;-), but untested code is much worse. When using new scripts/packages for your work, always check that these come with good tests for the functionality you are using.
One way to test your code is to write assertions within your scripts and functions. Let's say we would like to write a function converting degrees Fahrenheit to degrees Celsius:
def f2c(tf):
"""Converts degrees Fahrenheits to degrees Celsius"""
r = (tf - 32) * 5/9
# Check that we provide physically reasonable results
assert r > -273.15
return r
Assert statements are a convenient way to insert debugging assertions into a program. While programming, they help to check that some variables comply with certain rules you have decided are important. Here are some examples of assert statements:
result = 2.
assert result > 0
assert type(result) == float
Asserts are convenient, but they shouldn't be used too liberally. Indeed, the function above is a good example of bad code, since the assertion might fail because of bad user input:
f2c(-1000)
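The call fails with a bare AssertionError carrying no message at all; the output looks roughly like this (a sketch, the exact traceback depends on where you run the code):

Traceback (most recent call last):
  ...
AssertionError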
This is not very informative. A much better practice would be:
def f2c(tf):
"""Converts degrees Fahrenheits to degrees Celsius"""
if tf < -459.67:
raise ValueError('Input temperature below absolute zero!')
r = (tf - 32) * 5/9
return r
Assertions in code are similar to debug `print()` statements filling a script: they help to debug or understand an algorithm, but they are not sustainable and should not replace real, independent tests.
Now, how should we test our `f2c` function appropriately? Well, this should happen in a dedicated script or module testing that the function behaves as expected. A good start is to look up typical values in a conversion table and check that our function returns the expected results:
assert f2c(9941) == 5505
assert f2c(212) == 100
assert f2c(32) == 0
If one of the assertions fails, it lets us know that something is wrong in our function. These tests, however, are still considered "ad hoc": if you type them in the interpreter at the moment you write the function ("trial and error development") but don't store them for later, you might break your code later on without noticing. This is why test automation is needed.
pytest is a Python package helping you to formalize and run your tests efficiently. It is a command-line program that discovers and executes tests on demand, as well as a Python library providing modules that help programmers write better tests.
Let's start with an example:
Exercise: if not on a university computer, install the pytest package (e.g. `conda install pytest`). In your working directory, create a new `metutils.py` module, which provides the `f2c` function. In the same directory, create a new `test_metutils.py` module, which contains a `test_f2c` function. This function must implement the tests we described above (note: don't forget to import metutils in the test module!). Now, in a command-line environment (not in Python!), type:
$ pytest
You should see an output similar to:
============================= test session starts ==============================
platform linux -- Python 3.6.4, pytest-3.4.2, py-1.5.2, pluggy-0.6.0
rootdir: /home/c7071047/ptest, inifile:
collected 1 item
test_metutils.py . [100%]
=========================== 1 passed in 0.02 seconds ===========================
What did `pytest` just do? It searched the working directory for certain patterns (files and functions starting with `test_*`) and executed them.
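In case you got stuck, here is a minimal sketch of what the two files could contain (one possible solution among many):

# metutils.py
def f2c(tf):
    """Converts degrees Fahrenheit to degrees Celsius."""
    if tf < -459.67:
        raise ValueError('Input temperature below absolute zero!')
    return (tf - 32) * 5/9

# test_metutils.py
from metutils import f2c

def test_f2c():
    assert f2c(32) == 0
    assert f2c(212) == 100
    assert f2c(9941) == 5505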
Exercise: now write a new function in the `metutils` module called `c2f`, converting degrees Celsius to degrees Fahrenheit. Write a new test for this function. Then write a third test function (`test_roundtrip_f2c`), which tests that for any valid value the "roundtrip" `val == f2c(c2f(val))` holds. Run `pytest` to see that everything is working as expected.
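One possible solution (a sketch; your version may look different). The test values chosen below happen to round-trip exactly in floating point arithmetic; for arbitrary values you may prefer an approximate comparison such as `pytest.approx`:

# in metutils.py
def c2f(tc):
    """Converts degrees Celsius to degrees Fahrenheit."""
    return tc * 9/5 + 32

# in test_metutils.py
from metutils import f2c, c2f

def test_c2f():
    assert c2f(0) == 32
    assert c2f(100) == 212

def test_roundtrip_f2c():
    # these values round-trip exactly in floating point
    for val in [-40, 0, 37.5, 100]:
        assert f2c(c2f(val)) == val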
If you have read carefully, you probably noticed that earlier in this section I recommended against heavy use of `assert` statements in your code. So what about all these assertions in our tests? Well, `assert` statements in tests executed by `pytest` are very different: pytest rewrites them to produce enriched failure reports, showing for example the values that were compared.
Exercise: make a test fail, either by altering the original function or by writing a wrong assert statement. Verify that the `pytest` output tells you why the test is now failing.
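For example, you could temporarily replace one expectation with a deliberately wrong value (a sketch):

def test_f2c():
    # deliberately wrong expectation, just to see the failure report
    assert f2c(212) == 0

pytest will then show the computed value (100.0) next to the expected one, which makes the source of the failure easy to spot.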
You might be surprised to learn that there is a broad literature dedicated solely to software testing. Authors and researchers have agreed upon certain semantics and standards, and we are going to introduce a few of these concepts here.
Unit tests have the purpose of testing the smallest possible units of source code, i.e. single functions in Python. Unit tests have the advantage of being highly specialized and are often written together with, or even before, the actual function whose functionality they test.
Unit tests are useful because they encourage programmers to write small "code units" (functions) instead of monolithic code. Let's take an example: say you'd like to write a function which computes the dewpoint temperature from temperature (in Fahrenheit) and relative humidity (in %). This function will first convert °F to °C (a first code unit), then compute the dewpoint temperature in °C (another unit), and finally convert back to °F (a third unit). A good way to write this tool would be to write the three small functions first and test them independently, as sketched below.
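Such a decomposition could look like the following sketch (the function names and the Magnus-type dewpoint approximation are illustrative choices, not a prescribed solution):

import numpy as np

def f2c(tf):
    """Converts degrees Fahrenheit to degrees Celsius (first unit)."""
    return (tf - 32) * 5/9

def dewpoint_c(tc, rh):
    """Dewpoint (°C) from temperature (°C) and relative humidity (%) (second unit).

    Uses a Magnus-type approximation.
    """
    a, b = 17.625, 243.04
    gamma = np.log(rh / 100) + a * tc / (b + tc)
    return b * gamma / (a - gamma)

def c2f(tc):
    """Converts degrees Celsius to degrees Fahrenheit (third unit)."""
    return tc * 9/5 + 32

def dewpoint_f(tf, rh):
    """Dewpoint (°F) from temperature (°F) and relative humidity (%)."""
    return c2f(dewpoint_c(f2c(tf), rh))

Each of the three small functions can now be unit-tested on its own.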
Integration tests check that the units are working as a group. To stay with the dewpoint example from above, an integration test would check that the entire computation chain works as expected.
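Using the hypothetical `dewpoint_f` from the sketch above, an integration test could exercise the whole chain at once, for example by checking that at 100% relative humidity the dewpoint equals the air temperature:

def test_dewpoint_f_integration():
    # at saturation (100% relative humidity) the dewpoint equals the temperature
    np.testing.assert_allclose(dewpoint_f(68, 100), 68)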
Verification testing is the process of checking that a software system meets its specifications. Software specification is an important concept in engineering and commercial applications, where software is built to meet the customer's needs.
In science, verification tests check that a function really provides the documented features. For example: does my function really work with arrays of any dimension (as announced in the documentation), or only with scalars? Does my function really fail for relative humidity values below 0%?
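For example, a verification test for the claim that `f2c` accepts arrays could look like this (a sketch, assuming the numpy-friendly version of `f2c` introduced further down in this chapter):

import numpy as np

def test_f2c_accepts_arrays():
    # verify the documented behavior: f2c works on arrays of any dimension
    tf = np.array([[32, 212], [9941, 32]])
    expected = np.array([[0, 100], [5505, 0]])
    np.testing.assert_allclose(f2c(tf), expected)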
If you spend some time on the Wikipedia articles linked above, you will see that there is a large variety of concepts behind software testing practices. For us scientists these concepts are only partly applicable, because often we do not have to meet customer requirements the same way software developers do. For now we will stick with these three concepts and try to apply them as thoroughly as possible.
At the beginning you will probably find writing tests hard and confusing. "What should I test, and how?" are questions you will have to ask yourself very often.
In the course of the semester this task will become easier and easier thanks to an influential testing philosophy: test-driven development (TDD). In TDD, one writes the test before coding the actual feature one would like to implement. Together with the function's documentation, this encourages you to think thoroughly about the design of a function's interface before writing it.
Chart by [Pietro Berkes](http://archive.euroscipy.org/talk/6634)
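As a tiny illustration of the TDD cycle, suppose we wanted to add a `c2k` function (degrees Celsius to Kelvin; a hypothetical addition to `metutils`): the test is written first and fails, then the simplest implementation that makes it pass is added.

# step 1: write the test first (it fails, c2k does not exist yet)
def test_c2k():
    assert c2k(0) == 273.15
    assert c2k(-273.15) == 0

# step 2: write the simplest implementation that makes the test pass
def c2k(tc):
    """Converts degrees Celsius to Kelvin."""
    return tc + 273.15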
In principle all unit tests follow a very similar structure, the Arrange-Act-Assert workflow:

- Arrange: prepare the data and objects needed for the test. For example, in our `f2c` test we needed to gather the data in °F (to be converted) and in °C (to be compared to). This step is the most complicated and can sometimes occupy most of the test function.
- Act: call the function or code being tested.
- Assert: check that the results match the expectations.

Testing for an exact value is the most obvious kind of test and the one we used above. Following the AAA structure:
# Arrange
tf = 212
expected = 100
# Act
result = f2c(tf)
# Assert
assert result == expected
In most scientific applications, exact solutions cannot be guaranteed (one reason is floating point accuracy, as we will learn later in the course). Therefore, `numpy` comes with several handy testing functions. We first need to make our function a little more flexible:
import numpy as np
# Make f2c numpy friendly
def f2c(tf):
"""Converts degrees Fahrenheits to degrees Celsius.
Works with numpy arrays as well as with scalars.
"""
if np.any(tf < -459.67):
raise ValueError('Input temperature below absolute zero!')
r = (tf - 32) * 5/9
return r
And now the test:
# Arrange
tf = np.array([32, 212, 9941])
expected = np.array([0, 100, 5505])
# Act
result = f2c(tf)
# Assert
np.testing.assert_allclose(result, expected)
Other very useful assert functions are listed in the numpy documentation. Another common kind of test checks that the output contains only valid (e.g. non-NaN) values:
# Arrange
tf = np.array([32, 212, 9941])
# Act
result = f2c(tf)
# Assert
assert np.all(~np.isnan(result))
This kind of test verifies that a function complies with its documentation. The focus is not only the values of the result, but also the type (and size) of the result:
# Arrange
tf = np.array([32, 212, 9941])
# Act
result = f2c(tf)
# Assert
assert type(tf) == type(result)
assert len(tf) == len(result)
# Arrange
tf = 32.
# Act
result = f2c(tf)
# Assert
assert type(tf) == type(result)
These tests check the behavior of a function rather than its output. They are extremely important for libraries like numpy or matplotlib:
import pytest
with pytest.raises(ValueError):
f2c(-3000)
Or, even more precisely:
with pytest.raises(ValueError, match="Input temperature below absolute zero"):
f2c(-3000)
Writing tests is hard. Programmers often spend as much time writing tests as writing the actual code. Scientists cannot spend that much time writing tests, of course, but I strongly believe that learning to write tests is a very good long-term investment.
I often hear the question: "In science we are programming new models and equations, we can't test for that." This is only partly true, and it should not keep you from writing unit tests. In particular, automated tests that check that data types are conserved, that a function returns valid values, or that a function simply runs without errors are already much better than no test at all (see the sketch below).
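Even a bare "smoke test" that simply calls your code already has value (a sketch; `complex_model` stands for a hypothetical function of yours):

def test_complex_model_runs():
    # minimal "smoke test": the call completes without raising an error
    result = complex_model()
    assert result is not None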
Some parts of the code are really hard to test.
For these cases (and other issues such as automated testing and continuous integration) we will get back to the topic in a second testing chapter at the end of the semester.
Remember: tests are written in files and functions whose names start with `test_*`. They are run automatically with `pytest`.
Back to the table of contents, or jump to the next chapter.