# 02 - The scientific python stack

Unlike Matlab, the set of Python tools used by scientists does not come from one single source. It is the result of an uncoordinated, chaotic and creative development process originating from a community of volunteers and professionals.

In this chapter I will briefly describe some of the essential tools that every scientific Python programmer should know about. The list is neither representative nor complete: these are just packages I happen to know about, and I have surely missed many.

## Python's scientific ecosystem

The set of scientific Python packages is sometimes referred to as the "scientific Python ecosystem". I didn't find an official explanation for this name, but I guess that it has something to do with the fact that many packages build new features on top of other packages, like species depending on each other in a natural ecosystem.

Jake Vanderplas made a great graphic in a [2015 presentation](https://speakerdeck.com/jakevdp/the-state-of-the-stack-scipy-2015-keynote) (the video of the presentation is also [available here](https://www.youtube.com/watch?v=5GlNDD7qbP4) if you are interested), and I took the liberty of adapting it a little:

![img](https://fabienmaussion.info/acinn_python_workshop/figures/scipy_ecosystem.png)

## The core packages 


- **numpy**: [documentation](https://docs.scipy.org/doc/), [code repository](https://github.com/numpy/numpy)
- **scipy**: [documentation](https://docs.scipy.org/doc/scipy/reference/), [code repository](https://github.com/scipy/scipy)
- **matplotlib**: [documentation](https://matplotlib.org/), [code repository](https://github.com/matplotlib/matplotlib)

NumPy provides the N-dimensional arrays necessary to do fast computations, and SciPy adds the fundamental scientific tools on top of it. SciPy is a very large package and covers many aspects of the scientific workflow. It is organized in submodules, each dedicated to a specific aspect of data processing. For example: [scipy.integrate](https://docs.scipy.org/doc/scipy/reference/integrate.html), [scipy.optimize](https://docs.scipy.org/doc/scipy/reference/optimize.html), or [scipy.linalg](https://docs.scipy.org/doc/scipy/reference/linalg.html). Matplotlib is the traditional package for making graphics in Python.
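To give a feeling for how NumPy and SciPy work together, here is a minimal sketch. The function and the integration bounds are my own choice for illustration: a NumPy array holds values of sin(x) computed without any explicit loop, and `scipy.integrate.quad` integrates the same function numerically.

```python
import numpy as np
from scipy import integrate

# Evaluate sin(x) on a regular grid: the computation is vectorized,
# i.e. applied to the whole array at once without a Python loop
x = np.linspace(0, np.pi, 101)
y = np.sin(x)

# Integrate sin(x) between 0 and pi with one of SciPy's submodules.
# quad returns the integral and an estimate of the absolute error.
area, abs_err = integrate.quad(np.sin, 0, np.pi)

print(y.shape)
print(area)  # the analytical result is 2
```

The other SciPy submodules follow the same pattern: import the submodule you need, and pass it NumPy arrays or functions operating on them.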

## Essential numpy "extensions"

There are two packages which I consider essential when it comes to data processing:
- **pandas** provides data structures designed to make working with labeled data both easy and intuitive ([documentation](http://pandas.pydata.org/pandas-docs/stable/), [code repository](https://github.com/pandas-dev/pandas)). 
- **xarray** extends pandas to N-dimensional arrays ([documentation](http://xarray.pydata.org), [code repository](https://github.com/pydata/xarray)).

They both add a layer of abstraction to numpy arrays, giving "names" and "labels" to their dimensions and the data they contain. We will talk about them in the lecture, and most importantly, you will use both of them during the climate and cryosphere master lectures.
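As a small sketch of what "labels" mean in practice, the example below wraps a NumPy array in a pandas `Series`. The station names and temperature values are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Plain NumPy array: values can only be accessed by integer position
temps = np.array([2.5, 4.1, 8.3])

# The same data as a pandas Series, with (made-up) station names as labels
series = pd.Series(temps, index=["innsbruck", "vienna", "graz"], name="temperature")

# Select a value by its label instead of its position
vienna_temp = series.loc["vienna"]

# The underlying NumPy array is still available if needed
values = series.to_numpy()

print(vienna_temp)
print(values)
```

xarray applies the same idea to arrays with more than one dimension, where each dimension gets a name and its own set of labels.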

## Domain-specific packages

There are so many of them! I can't list them all, but here are a few that you will probably come across in your career:

**Geosciences/Meteorology**:
- [MetPy](https://unidata.github.io/MetPy/latest/index.html): the meteorology toolbox
- [Cartopy](https://scitools.org.uk/cartopy): maps and map projections
- [xESMF](https://xesmf.readthedocs.io/en/latest/): Universal Regridder for Geospatial Data
- [xgcm](https://xgcm.readthedocs.io/en/latest/): General Circulation Model Postprocessing with xarray
- [GeoPandas](http://geopandas.org/): Pandas for vector data
- [Rasterio](https://rasterio.readthedocs.io/en/latest/): geospatial raster data I/O

**Statistics/Machine Learning**:
- [Statsmodels](https://www.statsmodels.org/stable/index.html): statistics toolbox for models and tests
- [Seaborn](https://seaborn.pydata.org/index.html): statistical data visualization
- [Scikit-learn](http://scikit-learn.org/): machine learning tools
- [TensorFlow](https://www.tensorflow.org/): Google's brain
- [PyTorch](https://pytorch.org/): Facebook's brain

**Miscellaneous**:
- [Scikit-image](https://scikit-image.org/): image processing
- [Bokeh](https://bokeh.pydata.org/en/latest/): interactive plots
- [Dask](http://docs.dask.org/en/latest/): parallel computing
- ...