Language basics

In this chapter we revisit what you learned in the tutorial and give some examples of applications for each of the new concepts that were introduced there.

Learning outcomes: fundamentals of python syntax. New terms: data type, literal, sequence, mapping, keyword and positional arguments.

Prerequisites: you have read the python tutorial, sections 3 to 5.

Basic data types

An informal way to describe a programming language is to say that it "does things with stuff". In computer science, the kinds of "stuff" a language can handle are formally called data types. In python, the pieces of data themselves are called "objects". Objects can be described much more formally, but for now it is enough to know that everything in python is an object.

Therefore, data types are sometimes called object types in python textbooks; the two terms are interchangeable. In sections 3 and 4 of the tutorial you used only built-in types. Built-in data types are directly available in the interpreter, as opposed to other data types, which may be obtained either by importing them (e.g. from collections import OrderedDict) or by creating them yourself.

Asking for the type of an object

In [1]:
type(1)
Out[1]:
int
In [2]:
a = 'Hello'
type(a)
Out[2]:
str

Exercise: wrap the statement above in a print() call to see how the output differs from IPython's simplified display. By the way, what is the type of type?

Numeric types

There are three distinct numeric types: integers (int), floating point numbers (float) and complex numbers (complex). We will talk about these in more detail in the numerics chapter.
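A quick look at the three numeric types side by side, as a small sketch (the variable names i, x and z are arbitrary choices for this example). It also shows one behaviour worth knowing early: dividing two integers with / gives a float.

```python
# One value of each built-in numeric type
i = 42        # int: integer of arbitrary precision
x = 3.14      # float: double-precision floating point number
z = 1 + 2j    # complex: real part 1.0, imaginary part 2.0

print(type(i), type(x), type(z))  # <class 'int'> <class 'float'> <class 'complex'>

# / always returns a float; // is floor (integer) division
print(7 / 2, 7 // 2)              # 3.5 3

# The real and imaginary parts of a complex number are floats
print(z.real, z.imag)             # 1.0 2.0
```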

Booleans

There is a built-in boolean data type (bool), useful for testing truth values. Examples:

In [3]:
type(True), type(False)
Out[3]:
(bool, bool)
In [4]:
type(a == 'Hello')
Out[4]:
bool
In [5]:
3 < 5
Out[5]:
True

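Note that truth testing in python is not limited to booleans: any object can be tested for truth, and empty containers, empty strings, zero and None are all considered false. This is quite convenient if you want to avoid operating on empty containers or missing (None) values: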

In [6]:
if '':
    print('This should not happen')

Refer to the docs for an exhaustive list of boolean operations and comparison operators.
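The truth-testing rules can be checked with the bool() constructor. This sketch lists a few values of each kind (the lists are illustrative, not exhaustive) and ends with the typical "skip the work if the container is empty" pattern.

```python
# Values that python considers false in a truth test
falsy = [False, None, 0, 0.0, 0j, '', [], (), {}, set(), range(0)]
# A few values that are considered true
truthy = [True, 1, -0.5, 'a', ' ', [0], (None,), {'k': 0}]

print([bool(v) for v in falsy])   # all False
print([bool(v) for v in truthy])  # all True

# Typical use: only do the work when the container is not empty
data = []
if data:
    print('processing', len(data), 'items')
else:
    print('nothing to do')
```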

Text

In python (as in many other languages), text sequences are called strings (str). They can be of any length:

In [7]:
type('Français')
Out[7]:
str

Unlike in some languages, there is no special type for characters: a single character is simply a string of length 1:

In [8]:
for char in 'string':
    print(char, type(char))
s <class 'str'>
t <class 'str'>
r <class 'str'>
i <class 'str'>
n <class 'str'>
g <class 'str'>

Since strings behave like lists in many ways, they are also classified as sequence types, as we will see below.

Sequence types - list, tuple, range

Sequence types support a common set of operations and are therefore very similar:

In [9]:
l = [0, 1, 2]
t = (0, 1, 2)
r = range(3)
In [10]:
1 in l, 1 in t, 1 in r
Out[10]:
(True, True, True)
In [11]:
len(l), len(t), len(r)
Out[11]:
(3, 3, 3)
In [12]:
l + l
Out[12]:
[0, 1, 2, 0, 1, 2]
In [13]:
t + t
Out[13]:
(0, 1, 2, 0, 1, 2)

Concatenation, however, does not make sense for ranges (r + r raises a TypeError). Ranges are a little different:

In [14]:
r = range(10)
r
Out[14]:
range(0, 10)
In [15]:
list(r)
Out[15]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Ranges are mostly used in loops or to generate other sequences. They have a strong advantage over lists and tuples: their elements are computed on demand, so a range needs very little memory, however long it is. See the following:

In [16]:
range(2**100)
Out[16]:
range(0, 1267650600228229401496703205376)
In [17]:
list(range(2**100))
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-17-e76816f49c68> in <module>()
----> 1 list(range(2**100))

OverflowError: Python int too large to convert to C ssize_t

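The OverflowError tells us that the list we are trying to create is far too large: its length cannot even be represented by the C ssize_t type that the interpreter uses internally for sizes, let alone fit into memory.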
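The on-demand behaviour of ranges can be checked directly. A sketch, assuming CPython (the interpreter that produced the traceback above): indexing and membership tests on a huge range are computed arithmetically, without creating any elements, whereas len() runs into the same size limit as list().

```python
r = range(2**100)

# Indexing and membership tests do not create the 2**100 elements
print(r[-1] == 2**100 - 1)   # True
print(2**99 in r)            # True

# len() must return a C-sized integer, so it fails like list() did
try:
    len(r)
except OverflowError as err:
    print('OverflowError:', err)
```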

As mentioned earlier, strings also have much in common with lists:

In [18]:
'gg' in 'eggs'
Out[18]:
True
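A few more operations that strings share with the other sequence types: indexing (negative indices count from the end), slicing and counting. The same slicing syntax works on lists, tuples and ranges; slicing a range gives back a range. The name word is just an example.

```python
word = 'eggs'
print(word[0], word[-1])   # e s
print(word[1:3])           # gg
print(word.count('g'))     # 2

# The same slicing works on lists, tuples and ranges
print([0, 1, 2][1:], (0, 1, 2)[1:], range(3)[1:])   # [1, 2] (1, 2) range(1, 3)

# Strings cannot be modified in place: word[0] = 'E' would raise a
# TypeError, so build a new string instead
print('E' + word[1:])      # Eggs
```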

Tuples may be a new concept for you, depending on which languages you already know. A tuple is almost like a list, but it is immutable: once created, it cannot be modified:

In [19]:
l[1] = 'ha!'
l
Out[19]:
[0, 'ha!', 2]
In [20]:
t[1] = 'ha?'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-7f5185aad900> in <module>()
----> 1 t[1] = 'ha?'

TypeError: 'tuple' object does not support item assignment

It is precisely this property that makes tuples useful, although this is not obvious at first sight. We'll leave it at that for now.
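One concrete benefit, shown here as a small sketch: because a tuple cannot change, it is hashable (as long as its items are hashable too) and can therefore be used as a dictionary key or a set element, which a list cannot. The coordinates below are approximate and purely illustrative.

```python
# Illustrative mapping from approximate (latitude, longitude) to a city name
places = {(46.95, 7.45): 'Bern', (47.37, 8.54): 'Zurich'}
print(places[(46.95, 7.45)])   # Bern

# A list cannot be used as a key, because it could change after insertion
try:
    places[[46.95, 7.45]] = 'Bern again'
except TypeError as err:
    print('TypeError:', err)
```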

Sets

A set is an unordered collection of distinct objects:

In [21]:
s = {'why', 1, 9}
s.union({9, 'not'})
Out[21]:
{1, 9, 'not', 'why'}

Sets are useful for realising operations such as intersection, union, difference, and symmetric difference.
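The four operations have operator shortcuts (&, |, - and ^), equivalent to the intersection(), union(), difference() and symmetric_difference() methods. A short sketch with two example sets a and b; the last line shows a common use of sets, removing duplicates from a list (the original order is not kept).

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a & b)   # intersection:         {3, 4}
print(a | b)   # union:                {1, 2, 3, 4, 5}
print(a - b)   # difference:           {1, 2}
print(a ^ b)   # symmetric difference: {1, 2, 5}

# Converting a list to a set drops the duplicates
print(set([3, 1, 3, 2, 1]))   # {1, 2, 3}
```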

Mapping types - dict

A mapping object maps hashable values (the keys) to arbitrary objects (the values). It is a collection of (key, value) pairs; note that the in operator tests for keys, not values:

In [22]:
d = {'a':1, 2:'b'}
d
Out[22]:
{2: 'b', 'a': 1}
In [23]:
2 in d
Out[23]:
True
In [24]:
1 in d
Out[24]:
False
In [25]:
d['a']
Out[25]:
1

Dictionaries are (together with lists) the container types you will use most often.
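Here are the everyday dictionary operations in a short sketch, reusing the dictionary d from above: adding and overwriting entries, looking up a key that may be missing with get(), and looping over the (key, value) pairs with items().

```python
d = {'a': 1, 2: 'b'}

d['c'] = 3              # add a new (key, value) pair
d['a'] = 10             # overwrite the value of an existing key
print(d.get('z'))       # None: get() does not fail on a missing key
print(d.get('z', 0))    # 0: or it returns the default you give it
# d['z'], by contrast, would raise a KeyError

# Iterating over a dict yields its keys; items() yields (key, value) pairs
for key, value in d.items():
    print(key, '->', value)
```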

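Note: there are other mapping types in python, all related to the original dict. Examples include collections.OrderedDict, a dictionary that preserves the order in which the keys were inserted. Since python 3.7 the built-in dict also preserves insertion order, but OrderedDict still offers a few extra features, such as its move_to_end() method.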

Exercise: can you think of applications for a dict? Describe a couple of them!

Terminology aside: "literals"

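Literals are notations for fixed values written directly in the source code of a program. Some of them are pretty universal, like numbers or strings (9, 3.14 and "Hi!" are all literals); others are more language specific and belong to the language's syntax. Curly brackets, for example, are used to write sets ({1, 2, 3}) and dicts ({'a': 1}); beware that {} on its own is an empty dict, and that an empty set must be created with set(). (Strictly speaking, the python reference manual calls these container notations "displays" rather than literals, but the informal term is widely used.) This syntax exists purely for convenience: it builds the same objects as the corresponding constructors: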

In [26]:
s1 = set([1, 2, 3])
s2 = {1, 2, 3}
s1 == s2
Out[26]:
True
In [27]:
d1 = dict(bird='parrot', plant='crocus')
d2 = {'bird':'parrot', 'plant':'crocus'}
d1 == d2
Out[27]:
True

Using one or the other way to construct your containers is a matter of taste, but in practice you will see the literal version more often.
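The empty-container case is the one place where the curly-bracket notation can surprise you, so it is worth checking once. This sketch also shows the equivalent constructors for lists and tuples.

```python
# Empty curly brackets build an empty dict, not an empty set
print(type({}))        # <class 'dict'>
print(type({1, 2}))    # <class 'set'>

# There is no literal for the empty set: call the constructor instead
empty = set()
print(type(empty), len(empty))   # <class 'set'> 0

# The same equivalence holds for lists and tuples
print([1, 2] == list((1, 2)), (1, 2) == tuple([1, 2]))   # True True
```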

Control flow

In the tutorial you learned about for and while loops (the latter being used much less often). I'll just summarize the most important points here:

In python, one iterates over sequences, not indexes

This is the major difference from the loops you may know from other languages (and one of the best things about python):

Unpythonic:

In [28]:
seq = ['This', 'is', 'very', 'unpythonic']
for i in range(len(seq)):
    print(seq[i])
This
is
very
unpythonic

Pythonic:

In [29]:
seq[-1] = 'pythonic'
for s in seq:
    print(s)
This
is
very
pythonic

for i in range(len(seq)) is almost never what you want in python. If you need to iterate over several sequences at the same time, use zip():

In [30]:
lis = [1, 2, 3, 5]
for s, l in zip(seq, lis):
    print(l, s)
1 This
2 is
3 very
5 pythonic
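Two related tools complete the picture. When you really do need the index, the built-in enumerate() gives it to you together with the element, without range(len(...)). And zip() stops at the end of the shortest sequence, silently ignoring the extra elements of the longer ones, as this sketch shows.

```python
seq = ['This', 'is', 'very', 'pythonic']

# enumerate() yields (index, element) pairs
for i, s in enumerate(seq):
    print(i, s)

# zip() stops at the shortest sequence: the extra element 8 is ignored
for s, n in zip(seq, [1, 2, 3, 5, 8]):
    print(n, s)
```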

In python, indentation matters

Blocks are defined by their indentation, not by brackets or begin ... end statements. Hate it or love it, this is how it is ;-). I have learned to like this style a lot. Although any consistent indentation works (two spaces, tabs...), the recommended convention (PEP 8) is to use four spaces; mixing tabs and spaces inconsistently is an error in python 3.
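To see that indentation really is part of the syntax, here is a sketch that uses the built-in compile() to check two snippets without running them: one with an indented if body, one without. The exact wording of the error message varies between python versions, so only the exception type matters here.

```python
good = "if True:\n    print('indented')\n"
bad = "if True:\nprint('not indented')\n"

compile(good, '<example>', 'exec')   # compiles without complaint

try:
    compile(bad, '<example>', 'exec')
except IndentationError as err:
    print('IndentationError:', err.msg)
```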

In python, functions have two types of arguments

These are the positional arguments and keyword arguments.

Keyword arguments are passed with an identifier (e.g. name=value). In the function f below, kwarg1 and kwarg2 are given a default value in the definition: they are therefore optional, and when they are passed by keyword, the order in which they are given does not matter:

In [31]:
def f(arg1, arg2, kwarg1=None, kwarg2=None):
    print(arg1, arg2, kwarg1, kwarg2)
In [32]:
f(1, 2, kwarg2='Yes', kwarg1=3.14)
1 2 3.14 Yes
In [33]:
f(1, 2, 'Yes', 'No')
1 2 Yes No

I am not a big fan of passing optional arguments by position, as in the last call, because it makes the code less clear; I recommend always using the kwarg= syntax. Many python developers share this view, and python provides a syntax to make calls like the one above illegal:

In [34]:
# The * before the keyword arguments makes them keyword-only arguments
def f(arg1, arg2, *, kwarg1=None, kwarg2=None):
    print(arg1, arg2, kwarg1, kwarg2)
In [35]:
f(1, 2, 'Yes', 'No')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-affff2e98955> in <module>()
----> 1 f(1, 2, 'Yes', 'No')

TypeError: f() takes 2 positional arguments but 4 were given

Positional arguments are named like this because their position matters. In f, arg1 and arg2 have no default value and are therefore mandatory: forgetting one of them results in an error:

In [36]:
f(1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-36-281ab0a37d7d> in <module>()
----> 1 f(1)

TypeError: f() missing 1 required positional argument: 'arg2'
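The two kinds of arguments mix in a few more ways worth knowing. In this sketch f is redefined to return its arguments instead of printing them, so the results are easy to inspect, and g is a hypothetical second function made up for this example. Arguments declared before the * can also be passed by name, and they too may have a default value, which makes them optional while still allowing them to be given by position.

```python
def f(arg1, arg2, *, kwarg1=None, kwarg2=None):
    return arg1, arg2, kwarg1, kwarg2

# Positional arguments can also be passed by name, in any order
print(f(arg2=2, arg1=1))          # (1, 2, None, None)

# Keyword-only arguments must be named
print(f(1, 2, kwarg2='Yes'))      # (1, 2, None, 'Yes')

# Hypothetical example: arg2 has a default value, so it is optional,
# but it can still be given by position
def g(arg1, arg2=0, *, kwarg1=None):
    return arg1, arg2, kwarg1

print(g(1), g(1, 5), g(1, kwarg1='x'))   # (1, 0, None) (1, 5, None) (1, 0, 'x')
```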

Take home points

  • we learned the basic elements of the python syntax. To become fluent in this language you will have to master all of the elements presented above.
  • we introduced new concepts: data type, literal, sequences, mapping, keyword and positional arguments.

What's next?