Modules, import mechanism, namespaces, scope(s)#

In this chapter we will dive into one of the most important features of python: the import mechanism. We will explain why modules are useful, how to “import” them, and how you can write one yourself.

Variable scopes#

The scope of a variable is the region of the program where the variable is valid, i.e. the “location” where the variable name can be used to refer to the data it links to. Let’s consider this first example:

def foo():
    i = 5
    print(i, 'in foo()')

print(i, 'global')  # will throw an error
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[1], line 5
      2     i = 5
      3     print(i, 'in foo()')
----> 5 print(i, 'global')  # will throw an error

NameError: name 'i' is not defined

It doesn’t work because the name i is only defined inside the function foo. Moreover, we defined foo but never called it (you can see this because nothing is printed), so the function’s statements were never run.

Let’s see if the following example works better:

def foo():
    i = 5
    print(i, 'in foo()')

foo()

print(i, 'global')   # will throw an error
5 in foo()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[2], line 7
      3     print(i, 'in foo()')
      5 foo()
----> 7 print(i, 'global')   # will throw an error

NameError: name 'i' is not defined

Here, the function is called and i is defined and equal to 5. However, the scope of the variable i is the block defined by the function foo: outside of this scope the variable doesn’t exist anymore, leading to the NameError at the global scope.
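To get a value out of a function’s local scope, the usual approach is to return it. Here is a minimal sketch of that idea (the name result is introduced for illustration only):

```python
def foo():
    i = 5  # i exists only while foo() is running
    return i  # hand the value back to the caller


result = foo()
print(result, 'returned from foo()')

try:
    print(i)
except NameError as err:
    # the local name i is still unknown outside of foo()
    print('NameError:', err)
```

The value survives because it is now bound to the global name result; the local name i itself is gone once foo() returns.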

If this is clear, the following example should be clear as well:

i = 1

def foo():
    i = 5
    print(i, 'in foo()')

print(i, 'global before foo()')

foo()

print(i, 'global after foo()')
1 global before foo()
5 in foo()
1 global after foo()

The global scope refers to the highest scope level (the module or, in interactive mode, the interpreter). The function’s scope in turn is called local scope. One says that the global scope prevails because what is possible in one direction isn’t possible in the other:

k = 2

def foo():
    print(k, 'is there a k in foo()?')

foo()
2 is there a k in foo()?

Yes, k is available in foo(): global variables are also available in the nested local scopes. They can be shadowed locally by assigning to the same name (as we did with i in the previous example), but this change won’t be visible at the global scope level.
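One way to see that an assignment inside a function creates a separate local name is to look at the local scope with the built-in locals() function. This is only a small sketch; the name local_names is made up for the example:

```python
k = 2


def foo():
    k = 10  # creates a new local name; the global k is untouched
    return locals()  # the names defined in foo()'s local scope


local_names = foo()
print(local_names)  # {'k': 10}
print(k)            # 2
```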

Modules#

Python modules are used to store a collection of functions and variables that can be imported into other modules or into the main environment.

A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended.

For instance, I used my favorite text editor to create a file called greetings.py with the following content:

greetings.py:

"""A module to say hi in several languages."""

base = {
    'en': 'Hi {}!',
    'fr': 'Salut {}!',
    'cn': '你好 {}!',
}


def say_hi(name, language='en'):
    """Say hi to `name` in the given language."""
    print(base[language].format(name))

I copied this file into the current working directory (this “detail” is important, as we will see below). From the (i)python interpreter I can now import this module with the following command:

import greetings

By importing the module, I have access to the functions it contains:

greetings.say_hi('Fabi')
Hi Fabi!
greetings.say_hi('Jan', language='cn')
你好 Jan!

But also to the global variables defined at the top level:

greetings.base
{'en': 'Hi {}!', 'fr': 'Salut {}!', 'cn': '你好 {}!'}

Exercise 10

Repeat the steps above on your laptop and do additional experiments / ask questions until these variable scope mechanisms are well understood.

Namespaces#

Global variables in modules or scripts are useful, but they should be used with care. Indeed, since they are available from everywhere in the module, the namespace can quickly become “polluted”: for large modules, it becomes hard to keep track of all variables. This can become even more complicated when new modules with their own new variables and functions are used as well.

Fortunately, the global scope is constrained to the current script or module, and using external modules is unlikely to cause confusion thanks to the import mechanism:

pi = 3.14
import math
print(math)
print(pi, math.pi)
<module 'math' from '/Users/uu23343/.mambaforge/envs/oggm_env/lib/python3.12/lib-dynload/math.cpython-312-darwin.so'>
3.14 3.141592653589793

By importing the math module (available in python’s standard library), we have access to a new variable called math. The variable math has the type module, and provides new functions and variables (such as math.pi). What are all these functions and variables? Let’s use the built-in dir() function to find out:

print(dir(math))
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'cbrt', 'ceil', 'comb', 'copysign', 'cos', 'cosh', 'degrees', 'dist', 'e', 'erf', 'erfc', 'exp', 'exp2', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'isqrt', 'lcm', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'nextafter', 'perm', 'pi', 'pow', 'prod', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'sumprod', 'tan', 'tanh', 'tau', 'trunc', 'ulp']

Exercise 11

Ask the ipython documentation about dir (remember how?)

Like our greetings module, the math module contains functions (e.g. math.radians) as well as variables (e.g. math.nan or math.pi). For Python there is no fundamental difference between variables and functions: they are simply all attributes of the math module and therefore listed by dir (we’ll get back to attributes later: for now, remember the dot . syntax, dir and TAB autocompletion). Some of the attributes listed by dir have a prefix of two underscores __. We’ll get back to them soon.
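To make the “functions and variables are all attributes” point concrete, the sketch below sorts the public names returned by dir(math) into callables and non-callables using the built-ins getattr and callable. The list names (public, functions, constants) are chosen for this example only:

```python
import math

# keep the public names only (no leading underscore)
public = [name for name in dir(math) if not name.startswith('_')]

# getattr(math, 'sin') is the same as writing math.sin
functions = [name for name in public if callable(getattr(math, name))]
constants = [name for name in public if not callable(getattr(math, name))]

print(len(functions), 'functions, for example:', functions[:3])
print('constants:', constants)
```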

There are several ways to import a module. See these four examples:

# A
# Import an entire module and access its attributes with a "dot"
import math
math.sin(math.pi/2)
1.0
# B
# Same as A, but store the module under a new variable named "ma"
import math as ma
ma.sin(ma.pi/2)
1.0
# C
# Import the attributes sin and pi *only*
from math import sin, pi
sin(pi/2)
1.0
# D
# Import everything from the module and add it in our global scope
from math import *
sin(pi/2)
1.0

All four options lead to the same result: they compute the sine of π/2.
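The reason the results are identical is that every form of import gives access to the very same module object and attributes, only under different names. A short sketch checking this with the is operator (option D is left out, since it behaves like option C for every public name):

```python
import math
import math as ma
from math import sin, pi

print(ma is math)       # True: "ma" is just another name for the module
print(sin is math.sin)  # True: the same function object
print(pi is math.pi)    # True: the same float object
```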

Now, which one to use? It is up to you, but there are some implicit rules (conventions) which are widely accepted:

Important

  1. In case of doubt, use option A. It is the most readable and the most explicit.

  2. The exception to the first rule is when the library has a naming convention for its acronym. A good example is numpy, which recommends using import numpy as np.

  3. Option C might be useful if the names you are importing are very explicit, and if you expect to use them often in your script. Otherwise, it is not recommended.

  4. Option D is bad. Don’t use it.

  5. You can make an exception to rule 4 when working in the command line and exploring data. Never use option D in scripts.

Exercise 12

Try to find arguments why option D is a bad idea. If you can’t find any reason, ask me!

sys.path#

But how does Python know where to look for modules when you type import mymodule in the interpreter? Well, it relies on a mechanism very similar to Linux’s PATH environment variable. Remember this one? Within Python, the directories where the interpreter will look for modules to import are listed in sys.path:

import sys
sys.path

['',
 '/scratch/c707/c7071047/miniconda3/bin',
 '/scratch/c707/c7071047/miniconda3/lib/python36.zip',
 '/scratch/c707/c7071047/miniconda3/lib/python3.6',
 '/scratch/c707/c7071047/miniconda3/lib/python3.6/lib-dynload',
 '/scratch/c707/c7071047/miniconda3/lib/python3.6/site-packages',
 '/scratch/c707/c7071047/miniconda3/lib/python3.6/site-packages/cycler-0.10.0-py3.6.egg',
 '/scratch/c707/c7071047/miniconda3/lib/python3.6/site-packages/IPython/extensions',
 '/home/c7071047/.ipython']

Similar to the Linux PATH mechanism, Python will look into each of these directories in order. In an interactive session, the first entry is an empty string, which stands for the current working directory (when running a script, it is the directory containing the script instead); the rest of the list may vary depending on your environment. When a file called mymodule.py is found, it is imported once (and only once) and added to the sys.modules variable, effectively telling Python that the module has already been imported. This means that if you change the file and import it again in the same session, nothing will change. Fortunately, there are ways to avoid this behavior (see below).
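The role of sys.modules as a registry can be checked directly. In this small sketch, math_again is just an illustrative alias:

```python
import sys
import math

print('math' in sys.modules)  # True: the module is registered

# a second import finds the entry in sys.modules and reuses it
import math as math_again
print(math_again is sys.modules['math'])  # True
```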

As you can see, there are many folders related to miniconda, the tool we used to install python. This makes sense, because we want python to look for modules related to our installation of python (not the one used by our operating system). In particular, the site-packages folder is of interest to us. If you look into this folder using the linux command line or ipython’s magic commands (remember how?) you’ll find the many packages we already installed together last week.

You can edit sys.path as you wish and add new folders to it, exactly as you would with the PATH environment variable in Linux. In practice, however, it is recommended to use standard folders to install your packages instead of messing around with sys.path. I personally never had to change sys.path in my scripts, but it can come in handy if you want to use a collection of modules that you have not “packaged” yet: in this case, you can put all modules in a folder and add it to sys.path with:

import sys
sys.path.append('/path/to/a/folder')

But this will add the folder to the current session or script only. To add a folder permanently, you can use the PYTHONPATH environment variable.
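To try this out without touching your own folders, the sketch below creates a temporary folder, writes a tiny module into it and imports it after extending sys.path. The module name hypothetical_tools and its content are invented for the example:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# a throw-away folder holding one small module
folder = Path(tempfile.mkdtemp())
(folder / 'hypothetical_tools.py').write_text('ANSWER = 42\n')

# make the folder visible to the import mechanism (this session only)
sys.path.append(str(folder))
importlib.invalidate_caches()  # recommended when files are created at runtime

import hypothetical_tools
print(hypothetical_tools.ANSWER)    # 42
print(hypothetical_tools.__file__)  # located in the temporary folder
```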

Exercise 13

Print your own sys.path on your laptop and explore the folders listed there.

Now let’s play with sys.path a little. Create a fake module called matplotlib.py in your current working directory. Open a python interpreter and import matplotlib. Which of the modules (the official or the fake one) is loaded by python, and why?

Because of this “feature”, it is important to find meaningful (and unique) names for your own modules (don’t forget to delete the fake matplotlib module after trying that out!).

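Note: the “matplotlib trick” above does not work with every module of the standard library. Built-in modules (such as sys) are compiled into the interpreter and found before sys.path is searched, and modules already imported during the interpreter’s startup process are simply taken from sys.modules. Other standard library modules can be shadowed just like matplotlib, which is one more reason to choose unique names.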
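If you are unsure where an import would come from, importlib.util.find_spec and sys.builtin_module_names can help. A minimal sketch, with sys and json used only as example names:

```python
import importlib.util
import sys

for name in ('sys', 'json'):
    spec = importlib.util.find_spec(name)
    is_builtin = name in sys.builtin_module_names
    # spec.origin is 'built-in' for built-in modules, usually a file path otherwise
    print(name, '| built-in:', is_builtin, '| origin:', spec.origin)
```

For a module that is already imported, the __file__ attribute (when it exists) gives the same information.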

Reimporting modules#

Of course, there is a lot more behind an import statement than this short summary (see the documentation for a more comprehensive overview). What you should remember however is that an import statement is not computationally cheap: it has to look for a file among many folders, compile the python code to bytecode, add the module to a registry… Actually, for all the things it does, it is quite fast already!

Since import statements are used all the time in Python, it is a good idea to optimize them: therefore, Python will not re-import a module that is imported already.

Most of the time this is not a problem, but sometimes (mostly in interactive development) it might be useful to make changes to a module and have them available at the command line without having to restart a fresh Python interpreter each time. To do this, there are two mechanisms available to you:

  1. Using Python’s importlib.reload() function:

import importlib
importlib.reload(greetings)
<module 'greetings' from '/Users/uu23343/Library/CloudStorage/Dropbox/HomeDocs/git/scientific_programming/book/week_03/greetings.py'>

This can reimport a module which has been imported before (i.e. it won’t search for it again). But this has to be done each time you change a module, making it quite cumbersome in interactive mode. Here again, ipython comes with a more flexible solution:

  2. Using ipython’s autoreload extension:

%load_ext autoreload
%autoreload 2

From now on, autoreload will make sure that all modules are reloaded automatically before entering the execution of code typed at the ipython prompt.
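The difference between a repeated import and importlib.reload() can be reproduced in a few lines. The sketch below uses a temporary folder and an invented module called hypothetical_settings, so that nothing on your disk is modified:

```python
import importlib
import sys
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())
module_file = folder / 'hypothetical_settings.py'
module_file.write_text('VALUE = 1\n')
sys.path.append(str(folder))
importlib.invalidate_caches()

import hypothetical_settings
first = hypothetical_settings.VALUE

# edit the file on disk (the new content has a different length,
# so a stale bytecode cache cannot be mistaken for the new source)
module_file.write_text('VALUE = 2222\n')

import hypothetical_settings  # already in sys.modules: nothing happens
after_second_import = hypothetical_settings.VALUE

importlib.reload(hypothetical_settings)  # re-executes the file
after_reload = hypothetical_settings.VALUE

print(first, after_second_import, after_reload)  # 1 1 2222
```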

Executing modules as scripts#

You’ve already learned that importing modules actually runs them, the same way as $ python mymodule.py (from the linux terminal) or %run mymodule.py (in ipython) would. This can have some undesired consequences: what if, for example, you would like certain things to happen when the module is run as script but not when being imported?

There is a mechanism that allows you to do exactly this. Add the following block of code at the end of the greetings.py module:

if __name__ == '__main__':
    # execute only if run as a script
    import sys
    nargs = len(sys.argv)
    if nargs == 3:
        say_hi(sys.argv[1], language=sys.argv[2])
    else:
        print('Syntax:')
        print('%run greetings.py name language')
        print('Languages available: {}'.format(list(base)))

Now you can execute your script like this:

%run greetings.py Fabi fr
Salut Fabi!

But importing it will produce no output:

import greetings

How does this work exactly? The important bit is the if __name__ == '__main__': line. __name__ (like all other attributes starting and ending with two underscores) is “reserved” by the language. These attributes are no different from other variables, but you should not erase or replace them since they contain useful information and might be used by other tools. For example:

greetings.__doc__  # the documentation of the module as string
'A module to say hi in several languages.'
greetings.__name__  # the name of the module as string
'greetings'

Now, back to our script: the condition __name__ == '__main__' is false when the module is imported (since __name__ is then 'greetings'), but true when the module is run as a script. Indeed, Python sets __name__ to the string '__main__' in the file that is executed as a script.
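Both values of __name__ can be observed from a single session. In the sketch below, an invented module hypothetical_mod stores the __name__ it sees; it is then imported once and executed once with runpy.run_path from the standard library, which stands in here for running the file from the terminal:

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())
module_file = folder / 'hypothetical_mod.py'
module_file.write_text('seen_name = __name__\n')
sys.path.append(str(folder))
importlib.invalidate_caches()

# case 1: the file is imported as a module
import hypothetical_mod
print(hypothetical_mod.seen_name)  # hypothetical_mod

# case 2: the file is executed as a script
namespace = runpy.run_path(str(module_file), run_name='__main__')
print(namespace['seen_name'])  # __main__
```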

Take home points#

  • variables are defined in a scope, and we learned about two scopes: local and global. There are precise rules as to how each scope can interact with the other. As a general rule: global variables can be read in nested scopes, assigning to the same name inside a function creates a new local variable and leaves the global one unchanged, and local variables are not available in the global scope.

  • the import mechanism is useful to avoid namespace pollution. Modules come with their own namespace, and it is recommended to keep each module’s namespace clean (i.e. no from XYZ import *)

  • python borrowed the PATH mechanism from linux and uses a similar logic to look for modules and packages. Installing a new module is therefore super easy: just put a mymodule.py at the right place. (and don’t mess with the existing modules!)

  • the %autoreload command from ipython allows you to dynamically change a module while using it at the command line

  • when writing scripts, always put the code that needs to be executed in an if __name__ == '__main__' block: that way, your script can still be imported for the functions it provides