Number representation in computers

In this chapter we are going to give elements of understanding of how numbers are represented in today's computers. We won't have time to go into the details (which are perhaps not so relevant for non informaticians), however it is extremely important to understand the pitfalls of the so-called "floating-point arithmetics", because sooner or later one of these pitfalls will hit you (I promise).

Binary representation of integers

Traditional arithmetics and mathematics rely on the decimal system. Every number can be decomposed in a sum of products such as: $908.2 = 9\times10^2 + 0\times10^1 + 8\times10^{0} + 2\times10^{-1}$. This system is called decimal because the factors can take the values $[0 - 9]$ while the fixed base (the number which is scaled by an exponent) is 10.

Nearly all modern processors however represent numbers in binary form, with each digit being represented by a two-valued physical quantity such as a "low" or "high" voltage: 0s and 1s. These binary digits are called bits, and are the basis of the binary representation of numbers.

Basics

With only 0s and 1s available the most efficient way to represent numbers is as a sum of 2s to the power of i, with i going from 0 to N-1. It is best shown with an example, here with N=4 digits:

0101

To convert our number to a decimal representation we compute $0\times2^3 + 1\times2^2 + 0\times2^1 + 1\times2^0 = 5$. In this convention, the leftmost digit is called the most significant bit and the rightmost the least significant bit. If all the elements from the left are zeros, the convention is to omit them when writing. This is for example what the built-in bin function chooses to do:

In [1]:
bin(5)
Out[1]:
'0b101'

which converts an integer number to a binary string prefixed with 0b. Now it appears quite clear that the number of different integers that a computer can represent with this system depends on the size N of the binary sequence. A positive integer represented with a byte (8 bits are called a byte) can thus be as large as:

In [2]:
sum([2**i for i in range(8)])
Out[2]:
255

but not larger. Unless specified otherwise, the first bit is often used to give the sign of the integer it represents. Therefore, the actual range of numbers that a byte can represent is $[-2^7; 2^7-1]= [-128; 127]$ (the reason for this asymmetry is a matter of definition, as we will see later). If you are sure that you only want to do arithmetics with positive number you can spare this one bit and specify your binary number as being of unsigned integer type.

So how many bits does our computer use to represent integers? Well, it depends on the platform and programming language you are using. Many older languages (including old python versions) made the distinction between short (32 bits) and long (64 bits) integers.

Exercise: what is the largest number that a 64-bit integer can represent? The smallest? And a 32-bit unsigned integer?

Now, what is the default length of binary integers in python 3? Let's try to find out:

In [3]:
from sys import getsizeof
a = 10
getsizeof(a)
Out[3]:
28

So 28 bytes? That's a lot of bits. This is because the getsizeof function returns the memory consumption of the object int(10). What does that mean? Well, in python numbers are not only numbers, they are also "things" (objects). And these things come with services, like for example the bit_length method:

In [4]:
a.bit_length()
Out[4]:
4

Exercise: what does bit_length() do? What is the bit length of 5? of 127?

These services have a memory cost (an "overhead"), and are required no matter how big our number is:

In [5]:
size_int = getsizeof(int())
size_int  # size of an empty integer object
Out[5]:
24
In [6]:
def get_int_bitsize(integer):
    """Returns the actual memory consumption of an integer (in bits) without the object overhead."""
    return (getsizeof(integer) - getsizeof(int())) * 8
In [7]:
get_int_bitsize(2)
Out[7]:
32

Ha! This looks more reasonable. So python uses 32 bits to store integers? But then, the largest number it can manipulate must be very small? Let's see if we can create a number larger than 2**32-1:

In [8]:
12**68
Out[8]:
24228399786958997352779519807679893609907752382877715390869269071681028096
In [9]:
get_int_bitsize(2**18), get_int_bitsize(2**68), get_int_bitsize(2**100000)  
Out[9]:
(32, 96, 106688)

As shown above, it turns out that modern python versions have no limitations on the size of integers (other than the total memory available on your computer). The memory slot used to store your number simply depends on how large it is.

In [10]:
get_int_bitsize(2**100000) / 8 / 1024
Out[10]:
13.0234375

So the 2**100000 number requires about 13 KB (Kilobytes) of memory.

Exercise: print 2**100000 on screen, just for "fun".

Overflow

This dynamic resizing of integers in python means that they cannot overflow. Overflow is a common pitfall of many numerical languages though, and to illustrate what it is we can either use floats (unlike python integers, python floats can overflow), or numpy, which uses integer types of fixed length:

In [11]:
import numpy as np
a = np.int8(127)
a.dtype
Out[11]:
dtype('int8')
In [12]:
a
Out[12]:
127
In [13]:
a + np.int8(1)
/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/ipykernel_launcher.py:1: RuntimeWarning: overflow encountered in byte_scalars
  """Entry point for launching an IPython kernel.
Out[13]:
-128

What happened here? To understand this we need to understand how binary numbers are added together first. Please read the addition chapter of the Wikipedia article on binary numbers before going on.

Basically, we added 1 (binary 00000001) to 127 (binary 01111111) which gives us the binary number 10000000, i.e. -128 in two's complement representation, which is the most common method of representing signed integers on computers. At least, python warned us that we are doing something wrong here. But be aware that this is not always the case:

In [14]:
np.int32(2147483648)
Out[14]:
-2147483648
In [15]:
a = np.array([18])
a**a
Out[15]:
array([-497033925936021504])

These are examples of silent overflows. Silent means that they do not warn you about the probable mistake and could happen in your code without you noticing that this happens.

Extending the binary representation to non-integers: fixed-point notation

A more general definition of the binary representation for integers is to use negative exponents as well. Any rational number a can be approximated by:

$$a = \pm \sum_{i=-j}^{k} z_i b^i$$ with $b > 1$ (base) and $0 \le z_i \le b-1$, $j$ and $k$ all positive integers. The precision depends on the size of $j$ while the maximum range depends on the size of $k$.

In the notation, a fixed point separates digits of positive powers of the base, from those of negative powers. In base 2 (binary), the number $10000.1_2$ is equal to $16.5_{10}$ in base 10.

Indeed:

$$1\times2^4 + 0\times2^3 + 0\times2^2 + 0\times2^1 + 0\times2^0 + 1\times2^{-1} = 16.5_{10}$$

Although this representation is convenient, the representable value range is heavily limited by the number of bits available to represent the number (see this week's assignments). Therefore, most computers today are relying on the floating point notation to represent real numbers.

Floating point numbers

Introduction

Because computers have a finite number of storage units available, they can only represent a finite number of distinguishable values. In fact, a memory slot with $N$ available bits cannot represent more then $2^N$ distinguishable values. The range of real (or complex) numbers is of course infinite, and therefore it becomes clear that in the computer representation of numbers there will always be a trade-off between the range of numbers one would like to represent and their relative accuracy (i.e. the absolute difference between two consecutive representable numbers).

Taking the decimal representation of the number 1/3 as an example: it can be written as 0.33, 0.333, 0.3333, etc. Depending on the numbers of digits available, the precision of the number will increase but never reach the exact value, at least in the decimal representation.

This fundamental limitation is the explanation for unexpected results of certain arithmetic operations. For example:

In [16]:
0.1 + 0.1  # so far so good
Out[16]:
0.2
In [17]:
0.1 + 0.2  # wtf?
Out[17]:
0.30000000000000004

This is a typical rounding error, happening because most computers do not represent numbers as decimal fractions, but as binary. Without going too much into details (which can be tricky), this chapter will give you some elements of understanding in order to prepare you for the most common pitfalls of floating point arithmetics.

Scientific (exponential) notation in base b

In the exponential notation, a number is approximated with a fixed number of significant digits (the significand) and scaled using an exponent in some fixed base; the base for the scaling is normally two, ten, or sixteen. A number that can be represented exactly is of the following form:

$\mathrm{number} = \mathrm{significand} \times \mathrm{base} ^{\mathrm{exponent}}$

For example, in base 10:

$$1.234 = 1234 \times 10^{-3}$$

The number 1.234 can easily be represented exactly in base 10, but the number 1/3 cannot. However, in base 3 the latter can be represented exactly by $1 \times 3^{-1}$.

To approximate the number 1/3 in base 10, the rule of "the more bits you have, the more precise you can be" that we learned with integers continues to hold true: $33 \times 10^{-2}$ and $33333333333333 \times 10^{-14}$ are two ways to approximate the number, the later being more expensive in terms of storage requirements but more accurate.

The exponential notation is the most common way to represent real numbers in computers and is the basis of the floating-point number representation. A floating point number in base 2, 10, or N will store the two attributes: the significand (including its sign) and the exponent.

IEEE 754

This precision/size trade-off raises many challenges: memory and computer time must be optimized while still allowing programs to realize computations on a wide range of numbers in the same program. A good example is provided with atmospheric models, which realize computations on specific humidity values ($\mathrm{kg}\,\mathrm{kg}^{-1}$) with typical values of 10$^{-5}$ or smaller up to values of several order of magnitude larger. Using different standards for each set of numbers would be impracticable. This lead to the development of the IEEE Standard for Floating-Point Arithmetic (IEEE 754).

The standard defines five basic formats that are named for their numeric base and the number of bits used in their encoding, as listed in this table. For example, binary64 is the famous "double precision" format, called float64 in numpy and simply float in the python standard library. In this format, 53 bits are used for the significand, and the remaining 11 for the exponent. Remember:

  • the significand determines the precision of the representation (significant digits)
  • the exponent determines the magnitude of the represented number

This has the important consequence that small numbers have a larger absolute precision than large numbers. See this example:

In [18]:
.99 == .98  # so far so good
Out[18]:
False
In [19]:
999999999999999.99 == 999999999999999.98  # wtf?
Out[19]:
True
In [20]:
999999999999999.99  # wtf?
Out[20]:
1000000000000000.0

Fortunately, it is possible to compute the approximate precision of IEEE754 floating point numbers according to their value:

img Source: wikipedia

Overflow

Like numpy integers, floating point numbers can overflow:

In [21]:
np.float16(65510) + np.float16(20)
/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/ipykernel_launcher.py:1: RuntimeWarning: overflow encountered in half_scalars
  """Entry point for launching an IPython kernel.
Out[21]:
inf

Fortunately, the IEEE 754 standard defines two new numbers (-inf and +inf) which are more informative (and less dangerous) than the negative reset of integer numbers. IEEE 754 also defines "Not A Number" (abbreviated NaN), which propagates through computations:

In [22]:
np.NaN * 10
Out[22]:
nan

What to do from here?

As we've learned, errors in floating-point numbers are unavoidable. Even if these errors are very small, simple calculations on approximated numbers can contain pitfalls that increase the error in the result way beyond just having the individual errors "add up".

Alternative data types

In certain cases where perfect decimal accuracy is needed (for example when dealing with currencies and money), it is possible to use a decimal floating point representation instead of the default binary one:

In [23]:
1/10*3  # not precise
Out[23]:
0.30000000000000004
In [24]:
from decimal import Decimal
Decimal(1) / Decimal(10) * 3  # precise
Out[24]:
Decimal('0.3')

With limited-precision decimals there are no unexpected rounding errors. In practice, however, alternative datatypes are used rarely because the precision gain comes with a performance overhead: computers work best with 0s and 1s, not with numbers humans can read.

Symbolic computations

Symbolic computations are realized literally (like in mathematics), not approximately. SymPy is the most famous python library for symbolic mathematics:

In [25]:
import sympy
a = sympy.sympify('1 / 3')
a + a
Out[25]:
2/3

Seams like the perfect solution, right? It probably is if you are a mathematician, but for actual numerical computations SymPy will be way too slow to use. Symbolic mathematics can only be used for problems where analytical solutions are known. Unfortunately, this is not always the case (take numerical models of the atmosphere for example).

Deal with it

There are no simple answers to numerical rounding errors. Be aware that they occur and try to mitigate their effect.

Awareness

Awareness is mostly hindered by the string representation of floating point numbers. In practice:

In [26]:
0.1
Out[26]:
0.1
In [27]:
format(0.1, '.16g')  # give 16 significant digits
Out[27]:
'0.1'
In [28]:
format(0.1, '.30g')  # give 30 significant digits
Out[28]:
'0.100000000000000005551115123126'

The default 0.1 print is therefore a "lie", but it is a useful one: in most cases you don't want to know about these insignificant digits at the end. The numpy.finfo is a useful function informing you about the machine limits for floating point types:

In [29]:
info = np.finfo(np.float)
info.bits, info.precision, info.max
Out[29]:
(64, 15, 1.7976931348623157e+308)

Error propagation

Preventing rounding errors to happen is not possible, but there are a few general rules:

  • Multiplication and division are "safer" operations
  • Addition and subtraction are dangerous, because when numbers of different magnitudes are involved, digits of the smaller-magnitude number are lost.
  • This loss of digits can be inevitable and benign (when the lost digits are also insignificant for the final result) or catastrophic (when the loss is magnified and distorts the result strongly).
  • The more calculations are done (especially when they form an iterative algorithm) the more important it is to consider this kind of problem.
  • A method of calculation can be stable (meaning that it tends to reduce rounding errors) or unstable (meaning that rounding errors are magnified). Very often, there are both stable and unstable solutions for a problem.

(list taken from from the floating point guide)

As illustration for the difference between addition and multiplication, see the following example:

In [30]:
a = 10 * 0.1
b = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
a == b
Out[30]:
False
In [31]:
a, b
Out[31]:
(1.0, 0.9999999999999999)

Realize safer computations therefore involves asking yourself at which stage the most precision is going to be lost: this is most often the case when adding numbers of very different magnitudes. When building numerical models, this should always be something you consider: is a formula leading to dangerous additions, then reformulate it and/or use other units for your variables (e.g. $\mathrm{g}\,\mathrm{kg}^{-1}$ instead of $\mathrm{kg}\,\mathrm{kg}^{-1}$ for specific humidity). Consider the following example:

In [32]:
a = 1
b = 1e-16
c = 10
In [33]:
c1 = c * (a + b)
c2 = c * a + c * b
In [34]:
c1 == c2
Out[34]:
False

c * (a + b) and c * a + c * b are mathematically equivalent, and the first is computationally less expensive (two operations instead of three). However, the second is less prone to rounding errors!

Safer tests

Fortunately, rounding errors often remain unnoticed, meaning that your computations are probably OK! In our field in particular, we often do not care if the post-processed temperature forecast is numerically precise at 0.001° since the forecast accuracy is much lower anyway. However, this can still lead to surprises when comparing arrays for equality (e.g. when testing that a temperature is equal to zero, or for matching coordinates like longitude or latitude). In these cases, prefer to use numpy's specialized functions:

In [35]:
np.isclose(c1, c2)  # OK is you don't care about small numerical errors
Out[35]:
True

Take home points

  • computers can only represent a finite number of distinguishable values.
  • the range of representable numbers depends on the size of memory allocated to store it. There is practically no limit to the size of integers in python, but there is a limit for floats. Numpy implements several types of variables named after their size in bits (e.g. np.float32, np.float64, np.float128).
  • there are many different ways to represent numbers on computers, all with different strengths and weaknesses. The vast majority of systems use the IEEE 754 standard for floating points, which is a good compromise between range and accuracy. Most systems are binary (base 2) per default, but there are other bases available: base 10 (decimal) and base 16 (hexadecimal) are frequent as well.
  • rounding errors happen because of these limitations. They always happen, even for the simplest arithmetics, and you shall not ignore them.
  • however, there are ways to mitigate the impact of these rounding errors. This includes the use of more precise datatypes (e.g. float64 instead of float32), alternative representations (decimal instead of binary), and the use of more conservative operations (e.g. multiplication before addition when dealing with numbers of different magnitude)
  • floating point errors have dramatic consequences in chaotic systems. A scary example is given in this paper about the influence of floating point computations on numerical weather forecasts.

Further reading

Because of the importance of floating point computations you will find many resources online. I highly recommend to go through the short floating point guide website, which explains the problem to non specialists. It will give you another point of view on the topic.

Other resources:

What's next?