Python dictionaries#

You’ve already learned about a few important datatypes which are built into the Python language: float and int (for numbers), str (for text), list (for mutable sequences), tuple (for immutable sequences), function (for functions)…

Today, you will learn about a further important built-in datatype in the Python language: dictionaries (dict).

Copyright notice: this lecture is inspired by the py4e lecture on dictionaries (CC-BY - Charles R. Severance).

What is a dictionary?#

You can think of a dictionary as a mapping between a set of indices (which are called keys) and a set of values. Each key maps to a value. The association of a key and a value is called a key-value pair or sometimes an item.

As an example, we’ll build a dictionary that maps from English to German words, so the keys and the values are all strings.

The function dict creates a new dictionary with no items. Because dict is the name of a built-in function, you should avoid using it as a variable name:

en2de = dict()
print(en2de)
{}

The curly brackets, {}, represent an empty dictionary. To add items to the dictionary, you can use square brackets:

en2de['one'] = 'eins'
en2de
{'one': 'eins'}

This output format is also an input format. For example, you can create a new dictionary with three items:

en2de = {'one': 'eins', 'two': 'zwei', 'three': 'drei'}
en2de
{'one': 'eins', 'two': 'zwei', 'three': 'drei'}

Important

In earlier versions of Python (pre 3.7, i.e. pre 2018), dictionaries were not guaranteed to be “ordered”, i.e. the order in which the key-value pairs were entered was not guaranteed to be preserved when printing the dict. Preserving insertion order is now a feature of the language, but you will find many tutorials online (including PY4E) that warn you about the old behavior. They are not “wrong”, they are just a little outdated.

In any case, the strength of dictionaries does not reside in their order (the previous behavior was OK in a large majority of the cases), but in the way one can access the values in them:

en2de['three']  # Access the value stored at the key "three"
'drei'

This is what dictionaries are very good at: finding stuff, quickly (we’ll get back to that).

If the key isn’t in the dictionary, you get an exception:

en2de['four']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_123926/90977062.py in <cell line: 1>()
----> 1 en2de['four']

KeyError: 'four'

The len function works on dictionaries; it returns the number of key-value pairs:

len(en2de)
3

The in operator works on dictionaries; it tells you whether something appears as a key in the dictionary (appearing as a value is not good enough):

'one' in en2de
True
'eins' in en2de
False
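The in operator is also a simple way to avoid the KeyError shown earlier: check for the key first, and only access the value if the key exists. A minimal sketch, reusing the en2de dictionary from above (the "(no translation)" text and the messages list are just illustrative choices):

```python
en2de = {'one': 'eins', 'two': 'zwei', 'three': 'drei'}

messages = []
for word in ['one', 'four']:
    if word in en2de:
        # The key exists, so accessing it is safe
        messages.append(f'{word} -> {en2de[word]}')
    else:
        # The key is missing: handle it instead of raising a KeyError
        messages.append(f'{word} -> (no translation)')

for m in messages:
    print(m)
```

The first word is found and translated, while the second one is reported as missing instead of stopping the program with an exception.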

To see whether something appears as a value in a dictionary, you can use the method .values(), which returns the values as a type that can be converted to a list, and then use the in operator:

values = list(en2de.values())
'eins' in values
True

This type of usage is rare for dictionaries though.

Why dictionaries?#

dict is the last built-in (“standard”) Python datatype you will learn from me in this class. But it is not any less important than the other ones. Dictionaries are used in plenty of situations! Let me show you a few “real world” examples.

Dictionary as “translators”#

Similar to the example above, dictionaries are good at storing data (values) organized by entries (keys). We use this feature to provide several languages in our world glacier explorer app. The translation algorithm looks a lot like this simplified code:

s1 = {
    'en': 'Hello {}! ',
    'fr': 'Bonjour {}! ',
    'de': 'Hallo {}! ',
}

s2 = {
    'en': 'How are you?',
    'fr': 'Comment ça va?',
    'de': 'Wie geht es dir?',
}

# Pick a language and print the sentence
# Note that we are looping over the keys with this simple syntax
for lan in s1:
    # Print the sentence
    print(s1[lan].format('Émilie') + s2[lan])
Hello Émilie! How are you?
Bonjour Émilie! Comment ça va?
Hallo Émilie! Wie geht es dir?

Here is the translation file for the glacier explorer app if you are interested.

Dictionary as a set of counters#

Suppose you are given a string and you want to count how many times each letter appears. There are several ways you could do it, but one good way is to use a dict:

word = 'brontosaurus'
d = dict()
for c in word:
    if c not in d:
        d[c] = 1
    else:
        d[c] = d[c] + 1
d
{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}

We are effectively computing a histogram, which is a statistical term for a set of counters (or frequencies).

The for loop traverses the string. Each time through the loop, if the character c is not in the dictionary, we create a new item with key c and the initial value 1 (since we have seen this letter once). If c is already in the dictionary we increment d[c] by 1.

Dictionaries have a method called .get() that takes a key and a default value. If the key appears in the dictionary, get returns the corresponding value; otherwise it returns the default value. For example:

counts = {'chuck': 1, 'annie': 42, 'jan': 100}
print(counts.get('jan', 0))
print(counts.get('tim', 0))
100
0

We can use .get() to write our histogram loop more concisely. Because the get method automatically handles the case where a key is not in a dictionary, we can reduce four lines down to one and eliminate the if statement:

word = 'brontosaurus'
d = dict()
for c in word:
    d[c] = d.get(c, 0) + 1
d
{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}

The use of the .get() method to simplify this counting loop ends up being a very commonly used “idiom” (strategy) in Python. So you should take a moment and compare the loop using the if statement and in operator with the loop using the get method. They do exactly the same thing, but one is more succinct.
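One way to make this comparison concrete is to wrap each loop in a small function and check that both return the same dictionary. This is only a sketch: the function names count_with_if and count_with_get are made up for this illustration.

```python
def count_with_if(word):
    """Count letters with an explicit membership test (if / in)."""
    d = dict()
    for c in word:
        if c not in d:
            d[c] = 1
        else:
            d[c] = d[c] + 1
    return d


def count_with_get(word):
    """Count letters with the .get() idiom."""
    d = dict()
    for c in word:
        # .get(c, 0) returns 0 the first time we see the letter c
        d[c] = d.get(c, 0) + 1
    return d


word = 'brontosaurus'
# Both approaches should build exactly the same dictionary
print(count_with_if(word) == count_with_get(word))
```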

Dictionaries as very efficient “search” tools#

Lists and dictionaries have a few things in common:

  • they both contain data (they are “containers”)

  • they have a length

  • they can be indexed

Lists are indexed by integer (location), while dicts are indexed by key. This makes the reading and structuring of data much easier (see the “Dictionaries as structured containers” section below).

One of the things that dictionaries excel at is checking whether or not a key is available. Consider the following examples:

l = list(range(1000))

l is a list of 1000 numbers (from 0 to 999). You can check if a value is in the list with:

2 in l
True

It turns out that the time needed for this check is much longer if the value is at the end of the list than if it is at the beginning:

%timeit 2 in l
43.1 ns ± 9.99 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit 998 in l
5.91 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

So more than a factor of 100 slower!

Exercise: can you explain why this is the case?

Now let’s repeat this exercise with a dict:

d = dict(zip(l, l))  # you can forget about this command for now

d is now a dict with 1000 key-value pairs (the keys 0 to 999, each mapped to itself):

print(d[0], d[1], d[999])
print(2 in d)
0 1 999
True

Let’s see the performance of our search:

%timeit 2 in d
35.5 ns ± 5.58 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

So more or less the performance of a list search when the value is at the beginning. What about the end?

%timeit 998 in d
41.9 ns ± 2.11 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

So almost the same performance! This is possible thanks to a very clever data structure (the hash table), which is not part of this lecture.
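The %timeit lines above only work in IPython or a notebook. If you want to repeat the comparison in a plain Python script, a rough equivalent can be sketched with the standard timeit module. The number of repetitions n is an arbitrary choice, and the absolute timings will differ from the ones shown above (they depend on your machine, and calling through a lambda adds a small overhead to each check):

```python
import timeit

l = list(range(1000))
d = dict(zip(l, l))

n = 10_000  # number of repetitions per measurement (arbitrary choice)

# timeit.timeit returns the total time in seconds for n repetitions
t_list_start = timeit.timeit(lambda: 2 in l, number=n)
t_list_end = timeit.timeit(lambda: 998 in l, number=n)
t_dict_start = timeit.timeit(lambda: 2 in d, number=n)
t_dict_end = timeit.timeit(lambda: 998 in d, number=n)

# Convert the totals to nanoseconds per single check
print(f'list, value near the start: {t_list_start / n * 1e9:.0f} ns')
print(f'list, value near the end:   {t_list_end / n * 1e9:.0f} ns')
print(f'dict, key near the start:   {t_dict_start / n * 1e9:.0f} ns')
print(f'dict, key near the end:     {t_dict_end / n * 1e9:.0f} ns')
```

You should see the same pattern as above: only the list search near the end of the list is noticeably slower.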

OK, so why is this useful? See for example this list of English words I downloaded from the internet. If you click on the link you will see that the author decided to store it as a json file (a dictionary) with the words as keys and only ones as values. But why? Let’s find out:

import json
with open('words_dictionary.json') as wf:
    words = json.load(wf)
len(words)
370101
words['computer']
1

This looks a bit silly (the “1s” are truly useless), but with this we can take advantage of the dictionary’s efficient hash-table search algorithm. Checking if “ambivalent” exists is as fast as checking for “zoo”:

for w in ['zoo', 'bird', 'spätzle']:
    if w in words:
        print(f'{w} is an english word')
    else:
        print(f'{w} is not an english word')
zoo is an english word
bird is an english word
spätzle is not an english word

Note

Another Python datatype (sets) would be even better than dicts (same speed, no need for “1s”), but we won’t talk about it today.

Dictionaries as structured containers#

This is probably the main use of dictionaries. They allow you to share information in a structured and human-readable form. For example, the ACINN weather data shared online comes in the JSON format (which is read as a dictionary in Python): the keys are the variable names and the values are the timeseries.
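To give an idea of what such a structure looks like, here is a small sketch with made-up data. The variable names and values below are invented for illustration and are not the actual ACINN format:

```python
import json

# Made-up example data: keys are variable names, values are time series
station = {
    'name': 'Example station',
    'time': ['2024-01-01 00:00', '2024-01-01 00:10', '2024-01-01 00:20'],
    'temperature_degC': [1.2, 1.0, 0.7],
    'wind_speed_ms': [3.4, 4.1, 2.9],
}

# Looping over a dict gives its keys, here the available variables
for key in station:
    print(key)

# Access one time series by its name and compute its mean
temps = station['temperature_degC']
mean_temp = sum(temps) / len(temps)
print(f'Mean temperature: {mean_temp:.2f} degC')

# Converting to JSON text and back gives a dictionary again
text = json.dumps(station)
loaded = json.loads(text)
print(type(loaded), loaded == station)
```

The point of the last three lines is that JSON text and Python dictionaries map onto each other naturally, which is why JSON files are read as dictionaries.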

The dictionary syntax (d['key'] = value) is also used extensively by high-level data structures such as pandas dataframes. This will be the topic of our next unit.

Learning checklist#