In the first two OOP units you learned the basic semantics of OOP in python. In this unit we will attempt to provide concrete examples of the use of objects in python (and other OOP languages) and provide some arguments in favor of the use of OOP in your everyday programming tasks.
OOP is a tool that, when used wisely, can help you to structure your programs in a way which might be more readable, easier to maintain and more flexible than purely procedural programs. But why is that so? In this lecture, we will discuss five core concepts of OOP:
Data abstraction refers to the separation between the abstract properties of an object and its internal representation. By giving a name to things and hiding unnecessary details from the user, objects provide an intuitive interface to concepts which might be very complex internally.
Going back to our examples from the last two units: we used the term "objects" in programming as surrogate for actual objects in the real world: a cat, a pen, a car... These objects have a state (in OOP: attributes
) and realize actions (in OOP: methods
). For a pen, the states (attributes) could be: ink_color
, ink_volume
, point_size
, etc. The actions (methods) could be: write()
, fill_ink()
, etc.
OOP allows you to write programs which feel more natural and intuitive than functions and procedures. If a concept in your program is easily describable in terms of "state" and "actions", it might be a good candidate for writing a python class.
Let's make an example based on a widely used object in Python, with an instance of the class string
:
a = 'hello!'
The state of our object is relatively simple to describe: it is the sentence (list of characters) stored in the object. We have access to this state (we can print its values) but the way these values are stored in memory is abstracted away. We don't care about the details, we just want a string. Now, a string provides many actions:
a.capitalize()
a.capitalize().istitle()
a.split('l')
Abstractions should be as simple and well defined as possible. Sometimes there is more than one possible way to provide an abstraction to the user, and it becomes a debate among the developers of a project whether these abstractions are useful or not.
Well defined abstractions can be composed together. A good example is provided by the xarray library that you are using in the climate lecture: an xarray.DataSet
is composed of several xarray.DataArray
objects. These xarray.DataArray
objects have the function to store data (a numpy.ndarray
object) together with coordinates (other numpy.ndarray
objects) and attributes (units, name, etc.). The xarray.DataArray
objects have the task to storexarray.DataArray
s. This chain of abstractions is possible only if each of these concepts has a clearly defined role: xarray does not mess around with numbers in arrays: numpy does the numerical job behind the scenes. Inversely, numpy does not care whether an array has coordinates or not: xarray does the job of tracking variable descriptions and units.
Encapsulation is tied to the concept of abstraction. By hiding the internal implementation of a class behind a defined interface, users of the class do not need to know details about the internals of the class to use it. The implementation of the class can be changed (or internal data can be modified) without having to change the code of the users of the class.
In python, encapsulation is more difficult to achieve than in other languages like Java. Indeed, Java implements the concept of private methods and attributes, which are hidden from the user per definition. In python, nothing is hidden from the user: however, developers make use of important conventions to inform the users that a method or attribute is not meant to be used by the class alone, not by users. Let's take an xarray DataArray as an example:
import xarray as xr
import numpy as np
da = xr.DataArray([1, 2, 3])
print(dir(da))
In this (very) long list of methods and attributes, some of them are available and documented. For example:
da.values
Other methods/attributes start with one underscore. This underscore has no special meaning in the Python language other than being a warning to the users, saying as much as: "Don't use this method or attribute. If you do, do it at your own risk". For example:
da._in_memory
_in_memory
is an attribute which is meant for internal use in the class (it is called private). Setting it to another value might have unpredictable consequences, and relying on it for your own code is not recommended: the xarray developers might rename it or change it without notice.
The methods having two trailing and leading underscores have a special meaning in Python and are part of the language specifications. We already encountered __init__
for our class instantiation, and we will talk about some others later in this chapter.
Modularity is a technique to separate different portions of the program (modules) based on some logical boundary. Modularity is a general principle in programming, although object-oriented programming typically makes it more explicit by giving meaningful names and actions to the program's tools.
Taking the example of xarray.DataArray
and numpy.Array
again: both classes have very clear domains of functionality. The latter shines at doing fast numerical computations on arrays, the former provides an intuitive abstraction to the internal arrays by giving names and coordinates to its axes. Modularity is achieved thanks to the naming and documentation of each object's tasks and purpose.
Polymorphism is the name given to the technique of creating multiple classes that obey the same interface. The "interface" of an object is the set of public attributes and methods it defines.
Objects from different classes can be mixed at runtime if they obey the same interface. In other words, polymorphism originates from the fact that a certain action can have well defined but different meanings depending on the objects they apply to.
An example of polymorphism is provided by the addition operation in python:
1 + 1
1 + 1.2
[1, 2] + [3, 4] + [5]
np.array([1, 2]) + [3, 4]
Each of these addition operations are performing a different action depending on the object they are applied to.
OOP relies on polymorphism to provide higher levels of abstraction. In our Cat
and Dog
example from last week, both classes provided a say_name()
method: the internal implementation, however, was different in each case.
Many OOP languages (including Python) provide powerful tools for the purpose of polymorphism. One of them is operator overloading:
class ArrayList(list):
def __repr__(self):
"""Don't do this at home!"""
return 'ArrayList(' + super().__repr__() + ')'
def __add__(self, other):
"""Don't do this at home!"""
return [a + b for a, b in zip(self, other)]
What did we just do? The class definition (class ArrayList(list)
) indicates that we created a subclass of the parent class list
, a well known data type in python. Our child class has all the attributes and methods of the original parent class:
a = ArrayList([1, 2, 3])
len(a)
a
b = [1, 2, 3]
b
np.array([1, 2, 3])
Now, we defined a method __add__
, which allows us to do some python magic: __add__
is the method which is actually called when two objects are added together. This means that the two statements below are totally equivalent:
[1] + [2]
[1].__add__([2]) # the functional version of the literal above
Now, what does that mean for the addition on our ArrayList
class? Let's try and find out:
a + [11, 12, 13]
We just defined a new way to realize additions on lists! How did this happen? Well, exactly like the example above: the python interpreter understood that it has to apply the literal operator +
on the two objects a
and [11, 12, 13]
, which translates to a call to a.__add__([11, 12, 13])
, which calls our own implementation of the list addition.
This is a very powerful mechanism: a prominent example is provided by numpy: by implementing the __add__
method on ndarray objects, they provide a new functionality which is hidden from the user but intuitive at the same time. Numpy arrays not only implement __add__
, they also implement __mul__
, __div__
, __repr__
, etc.
Exercise: what does the
__repr__
operator do? Can you implement one for our ArrayList
class? For example, it could make clear that the __repr__
is that of an ArrayList
, not of a list
(like numpy arrays).
a
a + [1, 2, 3]
Operator overloading should be used with care and can be considered an advanced use case of python classes, in particular when used with inheritance as in the example above. People used to lists
in python won't be happy with your new behavior, i.e. you have to be careful to document what you are doing.
For example, our class above is not finished yet. Indeed, see what happens here:
[11, 12, 13] + a
Huh? This did not work as expected! The difference with above is that our custom list a
is now on the right-hand side of the operator. For this behavior there is a class interface as well, called __radd__
(for "right-hand side addition"). Let's define this operator to do the same as on the left-hand side (this makes sense, because addition is commutative anyway):
class ArrayList(list):
def __add__(self, other):
"""Don't do this at home!"""
return [a + b for a, b in zip(self, other)]
__radd__ = __add__
a = ArrayList([1, 2, 3])
[11, 12, 13] + a
This looks better now! But this example illustrates the complexity of the topic, and recommends due caution with operator overloading.
Inheritance is a core OOP mechanism which is very useful to provide abstraction, encapsulation, modularity, and polymorphism to python objects. Let's take the concrete example of the Pet
, Cat
and Dog
classes from last week. Inheritance provides:
.say_name_loudly()
), but the actual outcome depends on the class of the caller (cat or dog) Now, how does the concept of class inheritance apply to real-world scientific applications? There are several examples from the scientific libraries you are using yourself:
WritableCFDataStore
or BackendArray
.YAxis
and XAxis
classes inherit from the general Axis
class, which itself inherits from the base Artist
class, which is responsible of drawing all kind of things. Such relationships are sometimes visually summarized in a class diagram like this one.PlateCarree
projection is one realization of the more general Projection
parent class, which has many subclasses.Inheritance can also be useful for numerical models. In the glacier model we are developing, we are using inheritance to provide different ways to compute the surface mass-balance of glaciers. Some mass-balance models are very simple (e.g. LinearMassBalance) and others are more complex (e.g. PastMassBalance), but all models inherit from a common MassBalanceModel class which defines their interface, i.e. how their methods should be called and the units of the data they compute. This is very useful in the model structure, because the actual user of the mass-balance models (in our case, a glacier dynamics model) then doesn't have to care at all about which mass-balance model is actually providing the data: it just needs the mass-balance, not the details about how it's computed.
A "true" OOP lecture could occupy a full semester: as an engineering student, I had a full semester lecture of functional programming in C, then a full semester of OOP programming in Java. Here, we only scratched the surface of a complex topic, and if you want to dig deeper in the programming world you will have to learn these skills yourself or with more advanced lectures. Fortunately, programming is one of the easiest skill to train online, and I hope that my introduction will help you to get started.
Back to the table of contents, or jump to this week's assignment!