Modules, import mechanism, namespaces, scope(s)

In this chapter we will dive into one of the best features of python: the import mechanism. But before we go on, please go through section 6 of the python tutorial (Modules).

Prerequisites: you have read the python tutorial, sections 3 to 6.

Variable scopes

The scope of a variable is the region of the program where the variable is valid: where the variable name can be used to refer to the entity. Let's consider this first example:

In [1]:
def foo():
    i = 5
    print(i, 'in foo()')

print(i, 'global')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-0dfefe1d7f39> in <module>()
      3     print(i, 'in foo()')
      4 
----> 5 print(i, 'global')

NameError: name 'i' is not defined

It doesn't work: the name i is only assigned inside the function foo, which we defined but never called (nothing is printed), so the function's statements were never run. Let's see if the following example works better:

In [2]:
def foo():
    i = 5
    print(i, 'in foo()')

foo()

print(i, 'global')
5 in foo()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-afc465d69bd4> in <module>()
      5 foo()
      6 
----> 7 print(i, 'global')

NameError: name 'i' is not defined

Here, the function is called and i is defined and equal to 5. However, the scope of i is the block defined by the function foo: outside of this scope the variable doesn't exist anymore. If this is understood, the following example might now be clearer?

In [3]:
i = 1

def foo():
    i = 5
    print(i, 'in foo()')

print(i, 'global before foo()')

foo()

print(i, 'global after foo()')
1 global before foo()
5 in foo()
1 global after foo()

Global scope refers to the highest scope level (the module or, in interactive mode, the interpreter session). The function's scope in turn is called local scope. The relationship between the two is not symmetric: local names are invisible from the global scope, as we just saw, but the reverse is not true:

In [4]:
k = 2

def foo():
    print(k, 'is there a k in foo()?')

foo()
2 is there a k in foo()?

Yes, there is. Global variables are available in the nested local scopes. Assigning to the same name inside a function (like in example [3] above) creates a new local variable that hides the global one inside the function; the global variable itself is left unchanged.
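
To make the two directions concrete, here is a small sketch that uses the built-in locals() function to list the names living in the function's own scope. The names k, i and foo simply mirror the examples above; locals() itself is not used elsewhere in this chapter and is only here for illustration.

```python
k = 2  # defined at the global level

def foo():
    i = 5
    # locals() lists the names defined in the function's own scope
    local_names = sorted(locals())
    # k is not a local name, but the lookup falls back to the global scope
    return local_names, k

names, value = foo()
print(names, value)
```

The list of local names contains i but not k, and yet the function can still read the value of k.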

Exercise: do you think there is a way to really overwrite the value of a global variable from a local scope? If yes, can you find it?

Namespaces

Global variables are useful, but they should be used with care. Indeed, since they are available in the entire script or module, the namespace can quickly become "polluted", i.e. it becomes hard to keep track of which variable is available where, and in which order it was defined. This can become even more complicated when new modules with their own new variables and functions are used.

Fortunately, the global scope is constrained to the current script or module, and using external modules is unlikely to cause confusion thanks to the import mechanism:

In [5]:
pi = 3.14
import math
print(math)
print(pi, math.pi)
<module 'math' (built-in)>
3.14 3.141592653589793

By importing the math module (available in python's standard library), we have access to new variables and functions. What are they?

In [6]:
dir(math)
Out[6]:
['__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'ceil',
 'copysign',
 'cos',
 'cosh',
 'degrees',
 'e',
 'erf',
 'erfc',
 'exp',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'gcd',
 'hypot',
 'inf',
 'isclose',
 'isfinite',
 'isinf',
 'isnan',
 'ldexp',
 'lgamma',
 'log',
 'log10',
 'log1p',
 'log2',
 'modf',
 'nan',
 'pi',
 'pow',
 'radians',
 'sin',
 'sinh',
 'sqrt',
 'tan',
 'tanh',
 'trunc']

Exercise: now ask the documentation what dir does (remember how? Simply type dir? in the ipython interpreter, or use google).

In the documentation we encountered a new concept (again!): attributes. We'll get back to them soon, but for now let's just observe that the list above contains the module's functions (like math.radians) as well as variables (like math.nan). Some of the attributes have a prefix of two underscores (__); these will be explained later as well.
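
As a rough sketch of this observation, the list returned by dir can be split into functions and variables with the built-ins callable and getattr. The variable names public, functions and variables are chosen here for illustration only, and the exact content of the lists depends on your python version.

```python
import math

# Public names only: skip the double-underscore attributes
public = [name for name in dir(math) if not name.startswith('__')]

# Split them into functions (callable) and plain variables
functions = [name for name in public if callable(getattr(math, name))]
variables = [name for name in public if not callable(getattr(math, name))]

print(variables)
```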

As you have learned in the tutorial, the following four pieces of code give the same result:

In [7]:
# A
import math
math.sin(math.pi/2)
Out[7]:
1.0
In [8]:
# B
import math as ma
ma.sin(ma.pi/2)
Out[8]:
1.0
In [9]:
# C
from math import sin, pi
sin(pi/2)
Out[9]:
1.0
In [10]:
# D
from math import *
sin(pi/2)
Out[10]:
1.0
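
One way to convince yourself that the result has to be the same is to check that the different import forms give access to the very same objects. The sketch below compares forms A to C with the is operator; the names same_module, same_function and same_constant are made up for this illustration.

```python
import math               # form A
import math as ma         # form B
from math import sin, pi  # form C

# All three forms bind names to the same module, function and float objects
same_module = math is ma
same_function = math.sin is sin
same_constant = math.pi is pi

print(same_module, same_function, same_constant)
```

The only difference between the forms is therefore which names end up in your own namespace.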

Now, which one to use? It is up to you, but there are some implicit rules (conventions) which are widely accepted:

  1. in case of doubt, use option A. It is the most readable and the most explicit
  2. the exception to rule 1 is when the library you are using has a naming convention for its abbreviation. A good example is numpy, which recommends using import numpy as np
  3. option C might be useful if the names you are importing are very explicit, and if you expect to use them often in your script. Otherwise, it is not recommended
  4. option D is bad. Don't use it.
  5. if you really want to, use option D. It happens to me too. But keep it for yourself or for the command line (in interactive mode), and don't give your code to other people.

Exercise: try to find arguments as to why option D is a bad idea.

PYTHONPATH

But how does python know where to look for modules when you type in import mymodule? Well, it relies on a mechanism very similar to linux's PATH environment variable. Remember this one? Within python, you can ask in which directories the interpreter will look for modules with sys.path:

import sys
sys.path

['',
 '/scratch/c707/c7071047/miniconda3/bin',
 '/scratch/c707/c7071047/miniconda3/lib/python36.zip',
 '/scratch/c707/c7071047/miniconda3/lib/python3.6',
 '/scratch/c707/c7071047/miniconda3/lib/python3.6/lib-dynload',
 '/scratch/c707/c7071047/miniconda3/lib/python3.6/site-packages',
 '/scratch/c707/c7071047/miniconda3/lib/python3.6/site-packages/cycler-0.10.0-py3.6.egg',
 '/scratch/c707/c7071047/miniconda3/lib/python3.6/site-packages/IPython/extensions',
 '/home/c7071047/.ipython']

Similar to the linux PATH mechanism, python will look into each of these directories in order. The first entry (the empty string '' here) stands for the current working directory in interactive mode; when you run a script, it is the directory containing the script instead. The rest of the list may vary depending on your environment. When a file called mymodule.py is found, it is imported once (and only once) and added to the sys.modules variable, effectively telling python that the module has already been imported. This means that if you change the file and import it again in the same session, nothing will change. Fortunately, there are ways to avoid this behavior (see the next chapter).
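
The "once and only once" behavior can be observed directly. In this minimal sketch (using math simply because it is always available), the second import statement does not load anything new: it returns the object already stored in sys.modules.

```python
import sys
import math            # first import: the module is loaded and cached
import math as again   # second import: served from the sys.modules cache

cached = sys.modules['math']

# Both names and the cache entry point to one and the same module object
print(cached is math, again is math)
```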

As you can see, there are many folders related to miniconda, the tool we used to install python. This makes sense, because we want python to look for modules related to our installation of python. In particular, the site-packages folder is of interest to us. If you look into this folder (remember how?) you'll find the many packages we already installed together last week.

You can edit sys.path as you wish and add new folders to it, exactly as you would with the PATH environment variable. In practice, however, it is recommended to use standard folders to install your packages (as we will see later in the lecture).
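
For illustration only, here is a sketch of what adding a folder to sys.path looks like. It writes a tiny module into a temporary folder, makes that folder visible to the import mechanism, and imports the module. The module name my_scratch_module and its content are hypothetical and exist only for this example.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module name and content, chosen for this sketch only
module_name = 'my_scratch_module'

with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, module_name + '.py'), 'w') as f:
        f.write('answer = 42\n')

    sys.path.append(folder)        # make the folder visible to import
    importlib.invalidate_caches()  # recommended when files are created at runtime
    try:
        scratch = importlib.import_module(module_name)
    finally:
        sys.path.remove(folder)    # leave sys.path as we found it

print(scratch.answer)
```

Note that the module stays usable after the folder is removed from sys.path (and even deleted), because it has already been loaded and stored in sys.modules.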

Exercise: let's mess around with sys.path a little: create a fake module called matplotlib.py in your current working directory. Open a python interpreter and import matplotlib. Which of the modules (the official or the fake one) is loaded?

Because of this "feature", it is important to find meaningful (and unique) names for your own modules (don't forget to delete the fake matplotlib module!).

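Note: the "trick" above does not work with every module. Built-in modules (compiled into the interpreter, like sys) are found before sys.path is searched, and modules that were already imported during the interpreter's startup are served directly from sys.modules. Other modules of the standard library can be shadowed by a file of the same name, which is a common source of confusing errors.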
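
If you are unsure whether a given module is built into your interpreter, sys.builtin_module_names lists them. This is a small sketch; apart from sys itself, the content of this tuple depends on how your python was built, so the second line may print True or False.

```python
import sys

# Names of the modules compiled into this interpreter (varies by build)
builtin_names = sys.builtin_module_names

print('sys' in builtin_names)
print('math' in builtin_names)  # depends on how python was built
```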

Take home points

  • we learned that variables are defined in a scope, and we learned about two of them: local and global scope. There are certain rules as to how each scope can interact with the other.
  • the import mechanism is useful to avoid namespace pollution. Modules come with their own namespace, and it is recommended to keep each module's namespace clean (i.e. no from XYZ import *)
  • python borrowed the PATH mechanism from linux and uses a similar logic to look for modules and packages. Installing a new module is therefore super easy: just put a mymodule.py at the right place. (and don't mess with the existing modules!)

What's next?