This chapter starts with a short tutorial to get you familiar with Python. You will quickly see the similarities with whatever programming language you already know. After this introduction we will formalize things and name them (semantics). As we discussed last week, using clear semantics is essential for understanding software documentation and for "asking questions the right way" in search engines.
Let's start by following a simple tutorial together. You can simply read through the examples; however, I highly recommend opening an ipython interpreter or a notebook (see the climate lecture) to test the commands yourself as the tutorial goes on.
In most online tutorials you will see >>> to represent the python prompt, but with ipython or this tutorial you will use the numbered prompt In [1]:.
Copyright notice: many of these examples and explanations are simply copy-pasted from the official python tutorial.
The interpreter acts as a simple calculator: you can type an expression at it and it will write the value. Expression syntax is straightforward: the operators +, -, * and / work just like in most other languages:
2 + 2
50 - 5*6
8 / 5 # division always returns a floating point number
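Since / always returns a float, it may help to know the two related operators that the official Python tutorial introduces alongside it: // (floor division) and % (remainder). A minimal sketch, with operand values chosen only for illustration:

```python
# Classic division always returns a float, even when the result is whole
print(17 / 3)    # a float close to 5.67
print(6 / 3)     # 2.0, still a float

# Floor division discards the fractional part
quotient = 17 // 3
# The % operator returns the remainder of the division
remainder = 17 % 3

# quotient * divisor + remainder gives back the original number
print(quotient, remainder, quotient * 3 + remainder)
```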
Comments in Python start with the hash character, #, and extend to the end of the physical line. A comment may appear at the start of a line or following whitespace or code:
# this is the first comment
spam = 1 # and this is the second comment
# ... and now a third!
Parentheses () can be used for grouping:
(50 - 5*6) / 4
With Python, the ** operator is used to calculate powers:
5 ** 2
The equal sign (=) is used to assign a value to a variable. Afterwards, no result is displayed before the next interactive prompt:
width = 20
height = 5 * 9
width * height
If a variable is not “defined” (assigned a value), trying to use it will give you an error:
n # try to access an undefined variable
In interactive mode, the last printed expression is assigned to the variable _. This means that when you are using Python as a desk calculator, it is somewhat easier to continue calculations, for example:
tax = 12.5 / 100
price = 100.50
price * tax
price + _
_ should be treated as a read-only variable, to use in the interpreter only.
Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes ('...') or double quotes ("...") with the same result:
'spam eggs'
"spam eggs"
The double quotes are useful if you need to use a single quote in a string:
"doesn't"
Alternatively, \ can be used to escape quotes:
'doesn\'t'
If you don’t want characters prefaced by \ to be interpreted as special characters, you can use raw strings by adding an r before the first quote. This is useful for Windows paths:
print('C:\some\name') # here \n means newline!
print(r'C:\some\name') # note the r before the quote
Strings can be concatenated (glued together) with the + operator, and repeated with *:
("She's a " + 'witch! ') * 3
Strings can be indexed (subscripted), with the first character having index 0:
word = 'Python'
word[0] # character in position 0
word[5] # character in position 5
Indices may also be negative numbers, to start counting from the right:
word[-1] # last character
word[-2] # second-last character
In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain a substring:
word[0:2] # characters from position 0 (included) to 2 (excluded)
word[2:5] # characters from position 2 (included) to 5 (excluded)
Note how the start is always included, and the end always excluded. This makes sure that s[:i] + s[i:] is always equal to s:
word[:2] + word[2:]
Attempting to use an index that is too large will result in an error:
word[42] # the word only has 6 characters
However, out of range slice indexes are handled gracefully when used for slicing:
word[4:42]
word[42:]
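The contrast between indexing and slicing past the end of a string can be checked side by side. The sketch below uses a try/except block, a construct not covered in this chapter, only so that the error does not interrupt the example:

```python
word = 'Python'

# Indexing past the end raises an IndexError ...
try:
    word[42]
except IndexError as err:
    print('IndexError:', err)

# ... whereas slices are silently clipped to the available characters
tail = word[4:42]
empty = word[42:]
print(repr(tail), repr(empty))
```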
The built-in function len() returns the length of a string:
s = 'supercalifragilisticexpialidocious'
len(s)
Now that you are more familiar with the basics, let's start to name things "the right way". For example: an informal way to describe a programming language is to say that it "does things with stuff".
This "stuff" is formally called "objects" in python. We will define objects more precisely towards the end of the lecture, but for now remember one important thing: in python, everything is an object. Yes, everything.
Python objects have a type (synonym: data type). In the previous tutorial, you used exclusively built-in types. Built-in data types are directly available in the interpreter, as opposed to other data types which may be obtained either by importing them (e.g. from collections import OrderedDict) or by creating new data types yourself.
type(1)
a = 'Hello'
type(a)
Exercise: add a print call in the statement above to see the difference with ipython's simplified print. What is the type of type, by the way?
There are three distinct numeric types: integers (int), floating point numbers (float), and complex numbers (complex). We will talk about these in more detail in the numerics chapter.
There is a built-in boolean data type (bool) useful to test for truth value. Examples:
type(True), type(False)
type(a == 'Hello')
3 < 5
Note that there are other rules about testing for truth in python: for example, empty strings and empty containers are considered false. This is quite convenient if you want to avoid performing operations on invalid or empty containers:
if '':
    print('This should not happen')
In Python, like in C, any non-zero integer value is true; zero is false:
if 1 and 2:
    print('This will happen')
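One way to explore these truth-testing rules is the built-in bool() function, which returns the truth value an if statement would see. A small sketch (the variable names are only illustrative):

```python
# bool() converts any object to True or False using the truth-testing rules
empty_string = bool('')        # empty strings are false
some_string = bool('text')     # non-empty strings are true
zero = bool(0)                 # zero is false
non_zero = bool(42)            # any non-zero number is true
nothing = bool(None)           # None is false

print(empty_string, some_string, zero, non_zero, nothing)
```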
Refer to the docs for an exhaustive list of boolean operations and comparison operators.
In python (and many other languages) text sequences are named strings (str), which can be of any length:
type('Français, 汉语') # unicode characters are no problem in Python
Unlike some languages, there is no special type for characters:
for char in 'string':
    # "char" is also a string of length 1
    print(char, type(char))
Since strings behave like lists in many ways, they are often classified together with the sequence types, as we will see below.
Python strings cannot be changed - they are immutable. Therefore, assigning to an indexed position in the string results in an error:
word = 'Python'
word[0] = 'J'
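If you need a different string, the usual approach (also shown in the official Python tutorial) is to create a new one, for instance by combining a literal with a slice of the original:

```python
word = 'Python'

# Strings are immutable, so build a new one from a literal and a slice
new_word = 'J' + word[1:]
print(new_word)

# The original string is left untouched
print(word)
```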
Python objects have methods attached to them. We will learn more about methods later, but here is an example:
word.upper() # the method .upper() converts all letters in a string to upper case
"She's a witch!".split(' ') # the .split() method divides strings using a separator
Python knows a number of sequence data types, used to group together other values. The most versatile is the list, which can be written as a list of comma-separated values (items) between square brackets. Lists might contain items of different types, but usually the items all have the same type.
squares = [1, 4, 9, 16, 25, 36, 49]
squares
Lists can be indexed and sliced:
squares[0]
squares[-3:]
squares[0:7:2] # new slicing! From element 0 to 7 in steps of 2
squares[::-1] # new slicing! All elements in steps of -1, i.e. reverse
Careful! Lists are not the equivalent of arrays in Matlab. One major difference is that the addition operator concatenates lists together (like strings), instead of adding the numbers elementwise like in Matlab:
squares + [64, 81, 100]
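If an element-wise sum of two lists is really what you want, it has to be written out explicitly. The sketch below is one possible way, using zip() together with a list comprehension (a construct not covered in this chapter); the list named more is made up for illustration:

```python
squares = [1, 4, 9]
more = [10, 20, 30]   # hypothetical second list, same length as squares

# The + operator concatenates the two lists
concatenated = squares + more
print(concatenated)

# Adding element by element requires pairing the items explicitly
elementwise = [a + b for a, b in zip(squares, more)]
print(elementwise)
```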
Unlike strings, which are immutable, lists are a mutable type, i.e. it is possible to change their content:
cubes = [1, 8, 27, 65, 125] # something's wrong here
cubes[3] = 64
cubes
Assignment to slices is also possible, and this can even change the size of the list:
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
letters[2:5] = ['C', 'D', 'E'] # replace some values
letters
letters[2:5] = [] # now remove them
letters
The built-in function len() also applies to lists:
len(letters)
It is possible to nest lists (create lists containing other lists), as it is possible to store different objects in lists. For example:
a = ['a', 'b', 'c']
n = [1, 2, 3]
x = [a, n, 3.14]
x
x[0][1]
Lists also have methods attached to them (see section 5.1, "More on lists", of the official python tutorial for the most commonly used). For example:
alphabet = ['c', 'b', 'd']
alphabet.append('a') # add an element to the list
alphabet.sort() # sort it
alphabet
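Note that .sort() changes the list itself. When the original order should be kept, the built-in function sorted() is an alternative that returns a new list. A short sketch of the difference:

```python
alphabet = ['c', 'b', 'd', 'a']

# sorted() returns a new sorted list and leaves the original unchanged
ordered = sorted(alphabet)
print(ordered, alphabet)

# .sort() changes the list in place and returns None
returned = alphabet.sort()
print(returned, alphabet)
```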
Other sequence types include: string, tuple, range. Sequence types support a common set of operations and are therefore very similar:
l = [0, 1, 2]
t = (0, 1, 2)
r = range(3)
s = '123'
# Test if elements can be found in the sequence(s)
1 in l, 1 in t, 1 in r, '1' in s
# Ask for the length
len(l), len(t), len(r), len(s)
# Addition
print(l + l)
print(t + t)
print(s + s)
The addition operator won't work for the range type though. Ranges are a little different from lists or strings:
r = range(2, 13, 2)
r # r is an object of type "range". It doesn't print all the values, just the interval and steps
list(r) # applying list() converts range objects to a list of values
Ranges are usually used as loop counters or to generate other sequences. Ranges have a strong advantage over lists and tuples: their elements are generated when they are needed, not before. Ranges therefore have a very low memory consumption. See the following:
range(2**100) # no problem
list(range(2**100)) # trying to make a list of values out of it results in an error
An OverflowError tells me that I'm trying to create a list too big to fit into memory.
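The memory difference can be made visible with sys.getsizeof() from the standard library. This is only a rough sketch: the exact byte counts depend on the Python implementation and version, and the size reported for the list does not include the integers it refers to.

```python
import sys

n = 100_000
r = range(n)
values = list(r)

# The range object only stores start, stop and step
range_size = sys.getsizeof(r)
# The list stores one reference per element
list_size = sys.getsizeof(values)

print('range object:', range_size, 'bytes')
print('list object: ', list_size, 'bytes')
```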
The "tuple" data type is probably a new concept for you, as tuples are quite specific to python. A tuple behaves almost like a list, but the major difference is that a tuple is immutable:
l[1] = 'ha!' # I can change an element of a list
l
t[1] = 'ha?' # But I cannot change an element of a tuple
It is immutability which makes tuples useful, but for beginners this is not really obvious at first sight. We will get back to tuples later in the lecture.
Sets are an unordered collection of distinct objects:
s1 = {'why', 1, 9}
s2 = {9, 'not'}
s1
# Let's compute the union of these two sets. We use the method ".union()" for this purpose:
s1.union(s2) # 9 was already in the set, however it is not doubled in the union
Sets are useful for operations such as intersection, union, difference, and symmetric difference between collections. You won't see much use for them this semester, but remember that they exist.
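The other operations mentioned here work much like .union(). A short sketch reusing the two sets from above:

```python
s1 = {'why', 1, 9}
s2 = {9, 'not'}

# Elements present in both sets
common = s1.intersection(s2)
# Elements of s1 that are not in s2
only_in_s1 = s1.difference(s2)
# Elements present in exactly one of the two sets
in_one_only = s1.symmetric_difference(s2)

print(common, only_in_s1, in_one_only)
```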
A mapping object maps values (keys) to arbitrary objects (values): the most frequently used mapping object is called a dictionary. It is a collection of (key, value) pairs:
tel = {'jack': 4098, 'sape': 4139}
tel
tel['guido'] = 4127
tel
del tel['sape']
tel
Keys can be of any immutable type: e.g. strings and numbers are often used as keys. The keys in a dictionary are all unique (they have to be):
d = {'a':1, 2:'b', 'c':1} # a, 2, and c are keys
d
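The requirement that keys be immutable can be illustrated with tuples and lists. In the sketch below, the coordinates and station names are made up, and the try/except block (not covered in this chapter) is only there to catch the error:

```python
# A tuple is immutable, so it can serve as a dictionary key
stations = {(47.3, 11.4): 'station A', (46.5, 10.7): 'station B'}
print(stations[(47.3, 11.4)])

# A list is mutable, so using it as a key raises a TypeError
try:
    bad = {[47.3, 11.4]: 'station A'}
    list_key_worked = True
except TypeError as err:
    list_key_worked = False
    print('TypeError:', err)
```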
You can ask whether a key is available in a dict with the statement:
2 in d
However, you cannot check membership by value this way, since the in operator only looks at the keys (and the values are not necessarily unique):
1 in d
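The in operator applied to a dict only looks at its keys. If you do need to search by value, the dict methods .values() and .items() can help. A minimal sketch:

```python
d = {'a': 1, 2: 'b', 'c': 1}

# "in" on the dict itself only tests the keys
key_found = 1 in d
# .values() gives access to the values instead
value_found = 1 in d.values()

# Several keys may share the same value, so collect all of them
keys_with_value = []
for key, value in d.items():
    if value == 1:
        keys_with_value.append(key)

print(key_found, value_found, keys_with_value)
```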
Dictionaries are (together with lists) the container type you will use the most often.
Note: there are other mapping types in python, but they are all related to the original dict. Examples include collections.OrderedDict, which is a dictionary preserving the order in which the keys are entered.
Exercise: can you think of examples of application of a dict? Describe a couple of them!
Literals are the fixed values of a programming language ("notations"). Some of them are pretty universal, like numbers or strings (9, 3.14, "Hi!", all literals); others are more language specific and belong to the language's syntax. Curly brackets {}, for example, are the literal representation of a dict. The literal syntax has been added for convenience only:
d1 = dict(bird='parrot', plant='crocus') # one way to make a dict
d2 = {'bird':'parrot', 'plant':'crocus'} # another way to make a dict
d1 == d2
Both {} and dict() are equivalent: using one or the other to construct your containers is a matter of taste, but in practice you will see the literal version more often.
Of course, we can use Python for more complicated tasks than adding two and two together. For instance, we can write an initial sub-sequence of the Fibonacci series as follows:
# Fibonacci series:
# the sum of two previous elements defines the next
a, b = 0, 1
while a < 10:
    print(a)
    a, b = b, a+b
This example introduces several new features.
The while loop executes as long as the condition (a < 10) remains true. The standard comparison operators are written the same as in C: < (less than), > (greater than), == (equal to), <= (less than or equal to), >= (greater than or equal to) and != (not equal to).
The body of the loop is indented: Python groups statements by indentation instead of begin .. end statements. Hate it or love it, this is how it is ;-). I learned to like this style a lot. Note that each line within a basic block must be indented by the same amount. Although the indentation could be anything (two spaces, three spaces, tabs...), the recommended way is to use four spaces.
The print() function accepts multiple arguments:
i = 256*256
print('The value of i is', i)
The keyword argument end (see definition below) can be used to avoid the newline after the output, or end the output with a different string:
a, b = 0, 1
while a < 1000:
    print(a, end=',')
    a, b = b, a+b
The if statement
Perhaps the most well-known statement type is the if statement:
x = 12
if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')
There can be zero or more elif parts, and the else part is optional. The keyword elif is short for "else if", and is useful to avoid excessive indentation.
The for statement
The for loops in python can be quite different from those in other languages: in python, one iterates over sequences, not indexes. This is a feature I very much like for its readability:
words = ['She', 'is', 'a', 'witch']
for w in words:
    print(w)
The equivalent for loop with a counter is considered "unpythonic", i.e. not elegant.
Unpythonic:
seq = ['This', 'is', 'very', 'unpythonic']
# Do not do this at home!
n = len(seq)
for i in range(n):
    print(seq[i])
Pythonic:
seq[-1] = 'pythonic'
for s in seq:
    print(s)
for i in range(xx) is almost never what you want to do in python. If you have several sequences you want to iterate over, then do:
squares = [1, 4, 9, 25]
for s, l in zip(seq, squares):
    print(l, s)
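For the cases where the position of each element is genuinely needed, the built-in enumerate() function is a common alternative to looping over range(len(seq)). A small sketch (the list contents are only illustrative):

```python
seq = ['This', 'is', 'more', 'pythonic']

numbered = []
# enumerate() yields (index, element) pairs, starting at index 0
for i, s in enumerate(seq):
    print(i, s)
    numbered.append((i, s))
```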
The break and continue statements
The break statement breaks out of the innermost enclosing for or while loop:
for letter in 'Python':
    if letter == 'h':
        break
    print('Current letter:', letter)
The continue statement continues with the next iteration of the loop:
for num in range(2, 10):
    if num % 2 == 0:
        print("Found an even number", num)
        continue
    print("Found a number", num)
def fib(n):
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b
# Now call the function we just defined:
fib(2000)
The def statement introduces a function definition. It must be followed by the function name and the parenthesized list of formal parameters. The statements that form the body of the function start at the next line, and must be indented.
The first statement of the function body can optionally be a string literal; this string literal is the function's documentation string, or docstring (more about docstrings later: in the meantime, make a habit out of it).
A function definition introduces the function name in the current scope (we will learn about scopes soon). The value of the function name has a type that is recognized by the interpreter as a user-defined function. This value can be assigned to another name which can then also be used as a function. This serves as a general renaming mechanism:
fib
f = fib
f(100)
Coming from other languages, you might object that fib is not a function but a procedure since it doesn't return a value. In fact, even functions without a return statement do return a value, albeit a rather boring one. This value is called None (it’s a built-in name). Writing the value None is normally suppressed by the interpreter if it would be the only value written. You can see it if you really want to by using print():
fib(0) # shows nothing
print(fib(0)) # prints None
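The returned None can also be stored in a variable and tested like any other value. The sketch below repeats the definition of fib so that it runs on its own; the is operator, which is not covered in this chapter, is the usual way to compare against None:

```python
def fib(n):
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while a < n:
        print(a, end=' ')
        a, b = b, a+b

# Nothing is printed for n = 0, but a value is still returned
result = fib(0)
is_none = result is None
print(is_none)
```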
It is simple to write a function that returns a list of the numbers of the Fibonacci series, instead of printing it:
def fib2(n): # return Fibonacci series up to n
    """Return a list containing the Fibonacci series up to n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a+b
    return result
r = fib2(100) # call it
r # print the result
Functions have two types of arguments: positional arguments and keyword arguments.
Keyword arguments are preceded by an identifier (e.g. name=) and are assigned a default value. They are therefore optional:
def f(arg1, arg2, kwarg1=None, kwarg2='Something'):
    """Some function with arguments."""
    print(arg1, arg2, kwarg1, kwarg2)
f(1, 2) # no need to specify them - they are optional and have default values
f(1, 2, kwarg1=3.14, kwarg2='Yes') # but you can set them to a new value
f(1, 2, kwarg2='Yes', kwarg1=3.14) # and the order is not important!
Unfortunately, it is also possible to set keyword arguments without naming them, in which case the order matters:
f(1, 2, 'Yes', 'No')
I am not a big fan of this feature because it reduces the clarity of the code. I recommend always using the kwarg= syntax. Others agree with me, and therefore python implemented a syntax to make calls like the above illegal:
# The * before the keyword arguments makes them keyword arguments ONLY
def f(arg1, arg2, *, kwarg1=None, kwarg2='None'):
    print(arg1, arg2, kwarg1, kwarg2)
f(1, 2, 'Yes', 'No') # This now raises an error
Positional arguments are named like this because their position matters. Unlike keyword arguments, they don't have a default value and are mandatory. Forgetting to set them results in an error:
f(1)
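Both kinds of mistakes raise a TypeError. The sketch below catches them with try/except (a construct not covered in this chapter) so that the messages can be compared; this variant of f returns its arguments instead of printing them, purely to make the result easy to inspect:

```python
def f(arg1, arg2, *, kwarg1=None, kwarg2='None'):
    """Return the arguments as a tuple (illustrative variant of f)."""
    return arg1, arg2, kwarg1, kwarg2

# A valid call: positional arguments first, keyword arguments by name
ok = f(1, 2, kwarg2='Yes')
print(ok)

errors = []
# Passing keyword-only arguments by position is rejected
try:
    f(1, 2, 'Yes', 'No')
except TypeError as err:
    errors.append(str(err))
# Leaving out a mandatory positional argument is rejected too
try:
    f(1)
except TypeError as err:
    errors.append(str(err))

for message in errors:
    print(message)
```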
Although python ships with some built-in functions available in the interpreter (e.g. len(), print()), they are by far not enough to do real world programming. Thankfully, python comes with a mechanism which allows us to access much more functionality:
import math
print(math)
print(math.pi)
math is a module, and it has attributes (e.g. pi) and functions attached to it:
math.sin(math.pi / 4) # compute a sine
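A few more functions from the math module, as a short sketch of what a module gives access to (the input values are arbitrary):

```python
import math

# Square root: the hypotenuse of a 3-4-5 right triangle
hypotenuse = math.sqrt(3**2 + 4**2)
# Convert an angle from radians to degrees
angle_deg = math.degrees(math.pi / 4)
# Round down to the nearest integer
rounded_down = math.floor(2.7)

print(hypotenuse, angle_deg, rounded_down)
```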
math is available in the python standard library (https://docs.python.org/3/library/): this means that it comes pre-installed together with python itself. Other modules can be installed (like numpy or matplotlib), but we won't need them for now.
Modules often have a thematic grouping, e.g. math, time, multiprocessing. You will learn more about them in the next lecture.
Objects have a type, which you can ask for with type().
Objects have methods attached to them, e.g. .upper() for strings, .append() for lists.
Functions are defined with def, and also rely on indentation to define blocks. They can have a return statement.
The import statement opens a whole new world of possibilities: you can access other standard tools that are not available at the top-level prompt.
We learned the basic elements of the python syntax: to become fluent with this new language you will have to get familiar with all of the elements presented above. With time, you might want to get back to this chapter (or to the python reference documentation) to revisit what you've learned. I also highly recommend following the official python tutorial, sections 3 to 5.