Language fundamentals#

This chapter starts with a short tutorial to familiarize you with the Python language. You will quickly see the similarities with the programming language you learned during your bachelor's. Remember, our goal here is to formalize and name the programming constructs (semantics). As we discussed last week, using clear semantics is essential to understand software documentation and to “ask questions the right way” in search engines.

An entry level tutorial#

Let’s start by following a simple tutorial together.

Tip

You can simply read through the examples and try to remember them. This might work for those of you with programming experience. For the majority of you, I highly recommend opening an ipython interpreter (or a jupyter notebook) to test the commands yourself as the tutorial goes on. You can open the interpreter on MyBinder or on your laptop; both will work.

Copyright notice: many of these examples and explanations are copy-pasted from the official python tutorial.

Python as a Calculator#

The interpreter acts as a simple calculator: you can type an expression at it and it will write the value. Expression syntax is straightforward: the operators +, -, * and / work just like in most other languages:

2 + 2
4
50 - 5*6
20
8 / 5  # division always returns a floating point number
1.6
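Two related operators are not shown above: floor division (//) and remainder (%). The short sketch below uses numbers of my own choosing to show how they relate to the classic division:

```python
# Classic division always returns a float
print(17 / 3)
# Floor division discards the fractional part
print(17 // 3)  # 5
# The % operator returns the remainder of the division
print(17 % 3)   # 2
# floored quotient * divisor + remainder gives back the original number
print(5 * 3 + 2)  # 17
```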

Comments in Python start with the hash character, #, and extend to the end of the physical line. A comment may appear at the start of a line or following whitespace or code:

# this is the first comment
spam = 1  # and this is the second comment
          # ... and now a third!

Parentheses () can be used for grouping:

(50 - 5 * 6) / 4
5.0

With Python, the ** operator is used to calculate powers:

5 ** 2
25

The equal sign (=) is used to assign a value to a variable (variable assignment). Afterwards, no result is displayed before the next interactive prompt:

width = 20
height = 5 * 9
width * height
900

Tip

I remember my first programming class very well: the professor wrote i = i + 1 on the blackboard, and I was horrified: how can one write something so obviously wrong?

Many programming instructors recommend against reading out variable assignments as “name equals value” (i.e. from the example above: “i equals i + 1”), because it wrongly associates the = operator with “equals” in spoken language or mathematics.

A much better translation in spoken language would be “i becomes i + 1” or “i is assigned i + 1”. Try to remember this - I’ll do my best to use this in class as well, but I might forget.
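To make the “becomes” reading concrete, here is a minimal sketch (the starting value 5 is arbitrary). The right-hand side is evaluated first, and only then is the result assigned to the name on the left:

```python
i = 5
i = i + 1  # read: "i becomes i + 1" (5 + 1 is computed, then assigned to i)
print(i)   # 6
i += 1     # augmented assignment: a shorthand for i = i + 1
print(i)   # 7
```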

If a variable is not “defined” (assigned a value), trying to use it will give you an error:

n  # trying to access an undefined variable raises an error
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_16510/1661493771.py in <cell line: 1>()
----> 1 n  # trying to access an undefined variable raises an error

NameError: name 'n' is not defined

In interactive mode, the last printed expression is assigned to the variable _. This means that when you are using Python as a desk calculator, it is somewhat easier to continue calculations, for example:

tax = 12.5 / 100
price = 100.50
price * tax
12.5625
price + _
113.0625

_ should be treated as a read-only variable, to use in the interpreter only.

Strings#

Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes ('...') or double quotes ("...") with the same result:

'spam eggs'
'spam eggs'
"spam eggs"
'spam eggs'

The double quotes are useful if you need to use a single quote in a string:

"doesn't"
"doesn't"

Alternatively, \ can be used to escape quotes:

'doesn\'t'
"doesn't"

If you don’t want characters prefaced by \ to be interpreted as special characters, you can use raw strings by adding an r before the first quote. This is useful for Windows paths:

print('C:\some\name')  # here \n means newline!
C:\some
ame
print(r'C:\some\name')  # note the r before the quote
C:\some\name

For Windows users

Windows users: remember this trick! Paths to files or folders are used constantly in programming.

Strings can be concatenated (glued together) with the + operator, and repeated with *:

("She's a " + 'witch! ') * 3
"She's a witch! She's a witch! She's a witch! "

Strings can be indexed (subscripted), with the first character having index 0:

word = 'Python'
word[0]  # character in position 0
'P'
word[5]  # character in position 5
'n'

Indices may also be negative numbers, to start counting from the right:

word[-1]  # last character
'n'
word[-2]  # second-last character
'o'

In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain a substring:

word[0:2]  # characters from position 0 (included) to 2 (excluded)
'Py'
word[2:5]  # characters from position 2 (included) to 5 (excluded)
'tho'

Note how the start is always included, and the end always excluded. This makes sure that s[:i] + s[i:] is always equal to s:

word[:2] + word[2:]
'Python'
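The example above relies on the defaults of slice indices: an omitted first index means “from the start”, and an omitted second index means “until the end”. A small sketch with variable names of my choosing:

```python
word = 'Python'
head = word[:2]       # from the start to position 2 (excluded)
tail = word[4:]       # from position 4 (included) to the end
last_two = word[-2:]  # from the second-last character to the end
print(head, tail, last_two)  # Py on on
```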

Attempting to use an index that is too large will result in an error:

word[42]  # the word only has 6 characters: this will raise an error
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_16510/3588560929.py in <cell line: 1>()
----> 1 word[42]  # the word only has 6 characters: this will raise an error

IndexError: string index out of range

However, out of range slice indexes are handled gracefully when used for slicing:

word[4:42]
'on'
word[42:]
''

The built-in function len() returns the length of a string:

s = 'supercalifragilisticexpialidocious'
len(s)
34

Basic data types#

Now that you are more familiar with the basics, let’s start to name things “the right way”. For example: an informal way to describe a programming language is to say that it “does things with stuff”.

This “stuff” is formally called “objects” in python. We will define objects more precisely towards the end of the lecture, but for now remember one important thing: in python, everything is an object. Yes, everything.

Python objects have a type (synonym: data type). In the previous tutorial, you used exclusively built-in types. Built-in data types are directly available in the interpreter, as opposed to other data types which may be obtained either by importing them (e.g. from collections import OrderedDict) or by creating new data types yourself.

Asking for the type of an object#

type(1)
int
a = 'Hello'
type(a)
str

Exercise 8

Try print(type(a)) instead to see the difference with ipython’s simplified print. What is the type of type, by the way?

Numeric types#

There are three distinct numeric types: integers (int), floating point numbers (float), and complex numbers (complex). We will talk about these in more detail in the numerics chapter.
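As a quick illustration ahead of the numerics chapter, the sketch below creates one object of each numeric type (the values and names are arbitrary) and shows that mixing an int and a float yields a float:

```python
n_int = 7
n_float = 7.0
n_complex = 7 + 2j  # the suffix j marks the imaginary part
print(type(n_int), type(n_float), type(n_complex))

# Mixing int and float in an operation yields a float
mixed = n_int + n_float
print(mixed, type(mixed))  # 14.0 <class 'float'>
```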

Booleans#

There is a built-in boolean data type (bool) useful to test for truth value. Examples:

type(True), type(False)
(bool, bool)
type(a == 'Hello')
bool
3 < 5
True

Note that there are other rules about testing for truth in python: for example, empty strings and empty containers are considered false. This is quite convenient if you want to avoid performing operations on invalid or empty containers:

if '':
    print('This should not happen')

In Python, like in C, any non-zero integer value is true; zero is false:

if 1 and 2:
    print('This will happen')
This will happen

Refer to the docs for an exhaustive list of boolean operations and comparison operators.
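If you are unsure how an object behaves in an if statement, the built-in function bool() returns its truth value. A minimal sketch with a few arbitrary values:

```python
empty_text = bool('')     # an empty string is false
some_text = bool('text')  # a non-empty string is true
empty_list = bool([])     # an empty list is false
zero = bool(0)            # zero is false
number = bool(42)         # any non-zero number is true
print(empty_text, some_text, empty_list, zero, number)
# False True False False True
```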

Text#

In python (and many other languages) text sequences are named strings (str), which can be of any length:

type('Français, 汉语')  # unicode characters are no problem in Python
str

Unlike some languages, there is no special type for characters:

for char in 'string':
    # "char" is also a string of length 1
    print(char, type(char))
s <class 'str'>
t <class 'str'>
r <class 'str'>
i <class 'str'>
n <class 'str'>
g <class 'str'>

Since strings behave like lists in many ways, they are often classified together with the sequence types, as we will see below.

Python strings cannot be changed - they are immutable. Therefore, assigning to an indexed position in the string results in an error:

word = 'Python'
word[0] = 'J'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_16510/3247962163.py in <cell line: 2>()
      1 word = 'Python'
----> 2 word[0] = 'J'

TypeError: 'str' object does not support item assignment
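If you need a different string, the usual way is to create a new one instead of modifying the old one. A short sketch using the concatenation and slicing introduced earlier (the names new_word and other_word are mine):

```python
word = 'Python'
new_word = 'J' + word[1:]     # build a new string from a slice
other_word = word[:2] + 'py'  # same idea, replacing the end
print(new_word, other_word)   # Jython Pypy
print(word)                   # Python (the original string is unchanged)
```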

Python objects have methods attached to them. We will learn more about methods later, but here is an example:

word.upper()  # the method .upper() converts all letters in a string to upper case
'PYTHON'
"She's a witch!".split(' ')  # the .split() method divides strings using a separator
["She's", 'a', 'witch!']

Sequence types - list, tuple, range#

Python knows a number of sequence data types, used to group together other values. The most versatile is the list, which can be written as a list of comma-separated values (items) between square brackets. Lists might contain items of different types, but usually the items all have the same type.

squares = [1, 4, 9, 16, 25, 36, 49]
squares
[1, 4, 9, 16, 25, 36, 49]

Lists can be indexed and sliced:

squares[0]
1
squares[-3:]
[25, 36, 49]
squares[0:7:2]  # new slicing! From element 0 to 7 in steps of 2
[1, 9, 25, 49]
squares[::-1]  # new slicing! All elements in steps of -1, i.e. reverse
[49, 36, 25, 16, 9, 4, 1]

Warning

Lists are not the equivalent of arrays in Matlab. One major difference is that the addition operator concatenates lists (like strings) instead of adding the numbers elementwise as in Matlab. For example:

squares + [64, 81, 100]
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
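If you do want an elementwise addition with plain lists, you have to write it explicitly. The sketch below is one possible way, using zip() and a list comprehension (a compact loop syntax not covered in this chapter); the lists a and b are made up for the example:

```python
a = [1, 2, 3]
b = [10, 20, 30]

concatenated = a + b  # the + operator glues the lists together
elementwise = [x + y for x, y in zip(a, b)]  # add the pairs one by one

print(concatenated)  # [1, 2, 3, 10, 20, 30]
print(elementwise)   # [11, 22, 33]
```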

Unlike strings, which are immutable, lists are a mutable type, i.e. it is possible to change their content:

cubes = [1, 8, 27, 65, 125]  # something's wrong here
cubes[3] = 64
cubes
[1, 8, 27, 64, 125]

Assignment to slices is also possible, and this can even change the size of the list:

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
letters[2:5] = ['C', 'D', 'E']  # replace some values
letters
['a', 'b', 'C', 'D', 'E', 'f', 'g']
letters[2:5] = []  # now remove them
letters
['a', 'b', 'f', 'g']

The built-in function len() also applies to lists:

len(letters)
4

It is possible to nest lists (create lists containing other lists), as it is possible to store different objects in lists. For example:

a = ['a', 'b', 'c']
n = [1, 2, 3]
x = [a, n, 3.14]
x
[['a', 'b', 'c'], [1, 2, 3], 3.14]
x[0][1]
'b'

Lists also have methods attached to them (see 5.1 More on lists for the most commonly used). For example:

alphabet = ['c', 'b', 'd']
alphabet.append('a')  # add an element to the list
alphabet.sort() # sort it
alphabet
['a', 'b', 'c', 'd']

Other sequence types include: string, tuple, range. Sequence types support a common set of operations and are therefore very similar:

l = [0, 1, 2]
t = (0, 1, 2)
r = range(3)
s = '123'
# Test if elements can be found in the sequence(s)
1 in l, 1 in t, 1 in r, '1' in s
(True, True, True, True)
# Ask for the length
len(l), len(t), len(r), len(s)
(3, 3, 3, 3)
# Addition
print(l + l)
print(t + t)
print(s + s)
[0, 1, 2, 0, 1, 2]
(0, 1, 2, 0, 1, 2)
123123

The addition operator won’t work for the range type though. Ranges are a little different than lists or strings:

r = range(2, 13, 2)
r  # r is an object of type "range". It doesn't print all the values, just the interval and steps
range(2, 13, 2)
list(r)  # applying list() converts range objects to a list of values
[2, 4, 6, 8, 10, 12]

Ranges are usually used as loop counters or to generate other sequences. Ranges have a strong advantage over lists and tuples: their elements are generated when they are needed, not before. Ranges therefore have a very low memory consumption. See the following:

range(2**100)  # no problem
range(0, 1267650600228229401496703205376)
list(range(2**100))  # trying to make a list of values out of it results in an error
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
/tmp/ipykernel_16510/3485074228.py in <cell line: 1>()
----> 1 list(range(2**100))  # trying to make a list of values out of it results in an error

OverflowError: Python int too large to convert to C ssize_t

The OverflowError tells me that the list I am trying to create would have more elements than a Python list can hold, and far more than would fit into memory.
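To get a feeling for the difference in memory consumption, one can compare a range with the list built from it. This is a rough sketch only: sys.getsizeof() reports the size of the object itself in bytes, and the exact numbers depend on your platform and Python version.

```python
import sys

big_range = range(10**5)
big_list = list(big_range)  # all 100000 values are created here

range_size = sys.getsizeof(big_range)
list_size = sys.getsizeof(big_list)  # size of the list object, without its items

print(range_size)  # a few dozen bytes
print(list_size)   # several hundred thousand bytes
```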

The “tuple” data type is probably a new concept for you, as tuples are quite specific to python. A tuple behaves almost like a list, but the major difference is that a tuple is immutable:

l[1] = 'ha!'  # I can change an element of a list
l
[0, 'ha!', 2]
t[1] = 'ha?'  # But I cannot change an element of a tuple
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_16510/2866505064.py in <cell line: 1>()
----> 1 t[1] = 'ha?'  # But I cannot change an element of a tuple

TypeError: 'tuple' object does not support item assignment

It is their immutability which makes tuples useful, but for beginners this is not really obvious at first sight. We will get back to tuples later in the lecture.

Sets#

Sets are an unordered collection of distinct objects:

s1 = {'why', 1, 9}
s2 = {9, 'not'}
s1
{1, 9, 'why'}
# Let's compute the union of these two sets. We use the method ".union()" for this purpose:
s1.union(s2)  # 9 was already in the set, however it is not doubled in the union
{1, 9, 'not', 'why'}

Sets are useful for operations such as intersection, union, difference, and symmetric difference between collections. You won’t see much use for them this semester, but remember that they exist.
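The other operations work like .union(). A short sketch reusing the two sets from above (the result names are mine). Since sets are unordered, the order in which the elements are printed may differ on your machine:

```python
s1 = {'why', 1, 9}
s2 = {9, 'not'}

common = s1.intersection(s2)          # elements present in both sets
only_s1 = s1.difference(s2)           # elements in s1 but not in s2
either = s1.symmetric_difference(s2)  # elements in exactly one of the sets

print(common)  # {9}
print(only_s1)
print(either)
```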

Mapping types - dictionaries#

A mapping object maps values (keys) to arbitrary objects (values): the most frequently used mapping object is called a dictionary. It is a collection of (key, value) pairs:

tel = {'jack': 4098, 'sape': 4139}
tel
{'jack': 4098, 'sape': 4139}
tel['guido'] = 4127
tel
{'jack': 4098, 'sape': 4139, 'guido': 4127}
del tel['sape']
tel
{'jack': 4098, 'guido': 4127}

Keys can be of any immutable type: e.g. strings and numbers are often used as keys. The keys in a dictionary are all unique (they have to be):

d = {'a':1, 2:'b', 'c':1}  # a, 2, and c are keys
d
{'a': 1, 2: 'b', 'c': 1}
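This is one place where the immutability of tuples pays off: a tuple can be used as a key, while a list cannot. The sketch below uses a made-up grid dictionary and a try ... except block (a construct for catching errors, not covered in this chapter) so that the script keeps running after the error:

```python
grid = {(0, 0): 'origin', (1, 0): 'east'}  # tuples are immutable: valid keys
print(grid[(1, 0)])  # east

try:
    bad = {[0, 0]: 'origin'}  # lists are mutable: not allowed as keys
except TypeError as err:
    message = str(err)
    print('TypeError:', message)
```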

You can ask whether a key is available in a dict with the in operator:

2 in d
True

However, you cannot check membership by value this way: the in operator only looks at the keys (the values are not necessarily unique anyway):

1 in d
False
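If you really need to search the values, you can ask for them explicitly with the .values() and .items() methods of dict. A minimal sketch, using a list comprehension (not covered in this chapter) to collect the matching keys:

```python
d = {'a': 1, 2: 'b', 'c': 1}

print(2 in d)           # True: 2 is a key
print(1 in d)           # False: 1 is a value, not a key
print(1 in d.values())  # True: search the values explicitly

# Several keys can map to the same value
matching_keys = [k for k, v in d.items() if v == 1]
print(matching_keys)    # ['a', 'c']
```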

Warning

Since Python 3.7, dictionaries are guaranteed to remember the order in which the keys have been added to them. In Python 3.6 this was only an implementation detail of CPython, and in earlier versions the order is not preserved: keep this in mind if your code has to run on old python versions.

Dictionaries are (together with lists) the container type you will use the most often.

Note: there are other container types in python, but they are used less often. See Container datatypes in the official documentation.

Exercise 9

Can you think of examples of application of a dict? Describe a couple of them!

Semantics parenthesis: “literals”#

Literals are the fixed values of a programming language (“notations”). Some of them are pretty universal, like numbers or strings (9, 3.14 and "Hi!" are all literals); others are more language specific and belong to the language’s syntax. Curly brackets {}, for example, are the literal representation of a dict. The literal syntax has been added for convenience only:

d1 = dict(bird='parrot', plant='crocus')  # one way to make a dict
d2 = {'bird':'parrot', 'plant':'crocus'}  # another way to make a dict
d1 == d2
True

Both {} and dict() are equivalent: using one or the other to construct your containers is a matter of taste, but in practice you will see the literal version more often.

Control flow#

First steps towards programming#

Of course, we can use Python for more complicated tasks than adding two and two together. For instance, we can write an initial sub-sequence of the Fibonacci series as follows:

# Fibonacci series:
# the sum of two previous elements defines the next
a, b = 0, 1
while a < 10:
    print(a)
    a, b = b, a+b
0
1
1
2
3
5
8

This example introduces several new features.

  • The first line contains a multiple assignment: the variables a and b simultaneously get the new values 0 and 1. On the last line this is used again, demonstrating that the expressions on the right-hand side are all evaluated first before any of the assignments take place. The right-hand side expressions are evaluated from the left to the right.

  • The while loop executes as long as the condition (here: a < 10) remains true. The standard comparison operators are written the same as in C: < (less than), > (greater than), == (equal to), <= (less than or equal to), >= (greater than or equal to) and != (not equal to).

  • The body of the loop is indented: indentation, rather than brackets or begin .. end statements, is Python’s way of grouping statements. Hate it or love it, this is how it is ;-). I learned to like this style a lot. Note that each line within a basic block must be indented by the same amount. Although the indentation could be anything (two spaces, three spaces, tabs…), the recommended way is to use four spaces.
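The first point above (all right-hand side expressions are evaluated before any assignment takes place) is what makes it possible to swap two variables in one line. A small sketch with arbitrary values, next to the step-by-step version that needs a temporary variable:

```python
a, b = 1, 2
a, b = b, a  # both right-hand values are read before anything is assigned
print(a, b)  # 2 1

# Assigning one variable at a time requires a temporary variable instead
x, y = 1, 2
tmp = x
x = y
y = tmp
print(x, y)  # 2 1
```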

The print() function accepts multiple arguments:

i = 256*256
print('The value of i is', i)
The value of i is 65536

The keyword argument (see definition below) end can be used to avoid the newline after the output, or end the output with a different string:

a, b = 0, 1
while a < 1000:
    print(a, end=',')
    a, b = b, a+b
0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,

The if statement#

Perhaps the most well-known statement type is the if statement:

x = 12
if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')
More

There can be zero or more elif parts, and the else part is optional. The keyword elif is short for “else if”, and is useful to avoid excessive indentation.

The for statement#

The for loops in python can be quite different from those in other languages: in python, one iterates over sequences, not indexes. This is a feature I very much like for its readability:

words = ['She', 'is', 'a', 'witch']
for w in words:
    print(w)
She
is
a
witch

The equivalent for loop with a counter is considered “unpythonic”, i.e. not elegant.

Unpythonic:

seq = ['This', 'is', 'very', 'unpythonic']
# Do not do this at home!
n = len(seq)
for i in range(n):
    print(seq[i])
This
is
very
unpythonic

Pythonic:

seq[-1] = 'pythonic'
for s in seq:
    print(s)
This
is
very
pythonic

for i in range(xx) is almost never what you want to do in python. If you want to iterate over several sequences at once, use zip():

squares = [1, 4, 9, 25]
for s, l in zip(seq, squares):
    print(l, s)
1 This
4 is
9 very
25 pythonic
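If you do need the position of each element in addition to the element itself, the built-in function enumerate() provides both without resorting to range(len(...)). A minimal sketch (the list content is made up):

```python
seq = ['This', 'is', 'more', 'pythonic']
for i, s in enumerate(seq):
    print(i, s)
# 0 This
# 1 is
# 2 more
# 3 pythonic

# enumerate() generates (index, element) pairs
pairs = list(enumerate(seq))
```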

The break and continue statements#

The break statement breaks out of the innermost enclosing for or while loop:

for letter in 'Python':
    if letter == 'h':
        break
    print('Current letter:', letter)
Current letter: P
Current letter: y
Current letter: t

The continue statement continues with the next iteration of the loop:

for num in range(2, 10):
    if num % 2 == 0:
        print("Found an even number", num)
        continue
    print("Found a number", num)
Found an even number 2
Found a number 3
Found an even number 4
Found a number 5
Found an even number 6
Found a number 7
Found an even number 8
Found a number 9

Defining functions#

A first example#

def fib(n):
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b

# Now call the function we just defined:
fib(2000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 

The def statement introduces a function definition. It must be followed by the function name and the parenthesized list of formal parameters. The statements that form the body of the function start at the next line, and must be indented.

The first statement of the function body can optionally be a string literal; this string literal is the function’s documentation string, or docstring (more about docstrings later: in the meantime, make a habit out of it).

A function definition introduces the function name in the current scope (we will learn about scopes soon). The value of the function name has a type that is recognized by the interpreter as a user-defined function. This value can be assigned to another name which can then also be used as a function. This serves as a general renaming mechanism:

fib
<function __main__.fib(n)>
f = fib
f(100)
0 1 1 2 3 5 8 13 21 34 55 89 

Coming from other languages, you might object that fib is not a function but a procedure since it doesn’t return a value. In fact, even functions without a return statement do return a value, albeit a rather boring one. This value is called None (it’s a built-in name). Writing the value None is normally suppressed by the interpreter if it would be the only value written. You can see it if you really want to by using print():

fib(0)  # shows nothing
print(fib(0))  # prints None
None

It is simple to write a function that returns a list of the numbers of the Fibonacci series, instead of printing it:

def fib2(n):  # return Fibonacci series up to n
    """Return a list containing the Fibonacci series up to n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a) 
        a, b = b, a+b
    return result

r = fib2(100)  # call it
r  # print the result
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

Positional and keyword arguments#

Functions have two types of arguments: positional arguments and keyword arguments.

Keyword arguments are preceded by an identifier (e.g. name=) and are assigned a default value. They are therefore optional:

def f(arg1, arg2, kwarg1=None, kwarg2='Something'):
    """Some function with arguments."""
    print(arg1, arg2, kwarg1, kwarg2)
f(1, 2)  # no need to specify them - they are optional and have default values
1 2 None Something
f(1, 2, kwarg1=3.14, kwarg2='Yes')  # but you can set them to a new value
f(1, 2, kwarg2='Yes', kwarg1=3.14)  # and the order is not important!
1 2 3.14 Yes
1 2 3.14 Yes

Unfortunately, it is also possible to set keyword arguments without naming them, in which case the order matters:

f(1, 2, 'Yes', 'No')
1 2 Yes No

I am not a big fan of this feature because it reduces the clarity of the code. I recommend always using the kwarg= syntax. Others agree with me, and therefore python provides a syntax to make calls like the above illegal:

# The * before the keyword arguments makes them keyword arguments ONLY
def f(arg1, arg2, *, kwarg1=None, kwarg2='None'):
    print(arg1, arg2, kwarg1, kwarg2)
f(1, 2, 'Yes', 'No')  # This now raises an error
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_16510/102273986.py in <cell line: 1>()
----> 1 f(1, 2, 'Yes', 'No')  # This now raises an error

TypeError: f() takes 2 positional arguments but 4 were given

Positional arguments are named like this because their position matters. Unlike keyword arguments, they don’t have a default value and they are mandatory. Forgetting to set them results in an error:

f(1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_16510/1567560659.py in <cell line: 1>()
----> 1 f(1)

TypeError: f() missing 1 required positional argument: 'arg2'
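To summarize the different ways of calling a function, here is a small sketch with a hypothetical function power() (it is not part of python or of this lecture's material). Note the last call: positional arguments can also be passed by name, in which case their order no longer matters.

```python
def power(base, exponent=2):
    """Return base raised to exponent (hypothetical example function)."""
    return base ** exponent

print(power(3))                   # 9: exponent takes its default value
print(power(3, exponent=3))       # 27: the default is overridden by keyword
print(power(exponent=5, base=2))  # 32: positional arguments can be named too
```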

Importing modules and functions#

Although python ships with some built-in functions available in the interpreter (e.g. len(), print()), it is by far not enough to do real world programming. Thankfully, python comes with a mechanism which allows us to access much more functionality:

import math
print(math)
print(math.pi)
<module 'math' from '/home/mowglie/.miniconda3/envs/py3/lib/python3.10/lib-dynload/math.cpython-310-x86_64-linux-gnu.so'>
3.141592653589793

math is a module, and it has attributes (e.g. pi) and functions attached to it:

math.sin(math.pi / 4)  # compute a sine
0.7071067811865475

math is available in the python standard library: this means that it comes pre-installed together with python itself. Other modules can be installed (like numpy or matplotlib), but we won’t need them for now.

Modules often have a thematic grouping, e.g. math, time, multiprocessing. You will learn more about them in the next lecture.
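The import statement comes in a few flavours that you will encounter in other people's code. The sketch below shows three of them with the math module; the alias m is an arbitrary choice of mine:

```python
import math                # access names with the "math." prefix
from math import pi, sqrt  # import selected names directly
import math as m           # give the module a shorter alias

print(math.sqrt(16))  # 4.0
print(sqrt(16))       # 4.0
print(m.pi == pi)     # True: all three refer to the same module
```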

Take home points#

  • in python, everything is an object - we’ll learn more about them later

  • for now you can remember that python objects have methods (“services”) attached to them, such as .split() for strings or .append() for lists

  • all objects have a data type: examples of data types include float, string, dict, list

  • you can ask for the type of an object with the built-in function type()

  • “built-in” means that a function or data type is available at the command prompt without import statement

  • the “standard library” is not the same as “built-in” (the standard library is the suite of modules which come pre-installed with python)

  • list and dict are the container data types you will use most often, tuple is often returned by Python itself or libraries.

  • certain objects are immutable (string, tuple), but others are mutable and can change their state (dict, list)

  • in python, indentation matters! This is how you define blocks of code. Keep your indentation consistent, with 4 spaces.

  • in python, one iterates over sequences, not indexes (for i in range(...) is very rare in python and so is the variable i)

  • functions are defined with def, and also rely on indentation to define blocks. They can have a return statement

  • there are two types of arguments in functions: positional (mandatory) and keyword (optional) arguments

  • the import statement opens a whole new world of possibilities: you can access other standard tools that are not available at the top-level prompt

We learned the basic elements of the python syntax: to become fluent with this new language you will have to get familiar with all of the elements presented above. With time, you might want to get back to this chapter (or to the python reference documentation) to revisit what you’ve learned. I also highly recommend following the official python tutorial, sections 3 to 5.