Assignment #05

Assignment #05#

numpy and matplotlib are two fundamental pillars of the scientific python stack. You will find numerous tutorials for both libraries online. I am asking you to learn the basics of both tools by yourself, at the pace that suits you. I can recommend these two tutorials:

They can be quite long if you are new to numpy - I’m not asking to do them all today! Sections 1.4.1.1 to 1.4.1.5 in the numpy tutorial should get you enough information for today’s assignments, or you can try without it and learn on the fly - your choice!

Exercise #05-01: numpy cycles#

Monthly averages of temperature data at Innsbruck can be downloaded from this lecture’s github via:

from urllib.request import Request, urlopen
import json

# Parse the given url
url = 'https://raw.githubusercontent.com/fmaussion/scientific_programming/master/data/innsbruck_temp.json'
req = urlopen(Request(url)).read()
# Read the data
inn_data = json.loads(req.decode('utf-8'))

(original data obtained from NOAA’s Global Surface Summary of the Day)

Explore the inn_data variable. What is the type of “inn_data”, and of the data it contains? Convert the data series to numpy arrays.

Using numpy/scipy, matplotlib, and the standard library only, compute and plot the mean monthly annual cycle for 1981-2010 and the mean annual temperature timeseries for 1977-2017. Compute the linear trend (using scipy.stats.linregress) of the average annual temperature over 1977-2017. Repeat with winter (DJF) and summer (JJA) trends.

Tip 1: to select part of an array (indexing) based on a condition, you can use the following syntax:

import numpy as np
x = np.arange(10)
y = x**2
y[x > 4]  # select y based on the values in x

array([25, 36, 49, 64, 81])

Tip 2: there is more than one way to compute the annual and monthly means. Some use loops, some use reshaping on the original 1D array.

Exercise #05-02: indexing#

Given a 2D numpy array defined as:

import numpy as np
x = np.array([[1, 2, 3],
              [4, 5, 6]])

The following indexing operations all select the same values out of the array:

x[:, 1]
x[slice(0, 2, 1), 1]
x[(slice(0, 2, 1), 1)]
x[slice(0, 2, 1), slice(1, 2, 1)]
x[..., 1]
x[::1, 1]
x[[0, 1], 1]
x[:, -2]
x[:, 1:2]
x[:, [1]]

This can be checked with the following test:

from numpy.testing import assert_equal

ref = 7

assert_equal(ref, x[:, 1].sum())
assert_equal(ref, x[..., 1].sum())
assert_equal(ref, x[::1, 1].sum())
assert_equal(ref, x[slice(0, 2, 1), 1].sum())
assert_equal(ref, x[(slice(0, 2, 1), 1)].sum())
assert_equal(ref, x[slice(0, 2, 1), slice(1, 2, 1)].sum())
assert_equal(ref, x[[0, 1], 1].sum())
assert_equal(ref, x[:, -2].sum())
assert_equal(ref, x[:, 1:2].sum())
assert_equal(ref, x[:, [1]].sum())

Questions:

What is the ... syntax doing? Again, it is the literal equivalent of an actual python object: what is it?
some of these indexing operations are truly equivalent to the “obvious” one, x[:, 1]. List them.
Classify these operations (i) in basic and advanced operations, and (ii) by the shape of their output. Explain.
I’d like my array a = x[:, 1:2] to have a shape of (2, ) like most of the other operations listed above. What can I do to reshape it?

Exercise #05-03: the difference#

Consider the following example:

a = np.array([1, 2, 3])
b = a
c = a

b = a - 10
c -= 100

What will be the values printed by print(a, b, c) after this code snippet? Explain.

Exercise #05-04: Greenwich#

ERA-Interim reanalysis provides global atmospheric fields from 1979 to today. Someone prepared a grid of average temperature available here:

from urllib.request import Request, urlopen
from io import BytesIO
import json

# Parse the given url
url = 'https://github.com/fmaussion/scientific_programming/raw/master/data/monthly_temp.npz'
req = urlopen(Request(url)).read()
with np.load(BytesIO(req)) as data:
    temp = data['temp']
    lon = data['lon']
    lat = data['lat']

However, the data is not well processed! The longitudes are ranging from 0 to 360°, thus cutting UK and Africa in half! Reorganize the data and the corresponding coordinate to obtain a plot similar to this one:

Exercise #05-05: ACINN meteorological data#

Warning

Online access to ACINN data does not work anymore. For a similar excercise I recommend to have a look at GeoSphere Austria’s API. I may update the resources here in the future.

The institute website provides raw data from several stations around Innsbruck using a live feed at the following addresses:

https://acinn-data.uibk.ac.at/innsbruck/3 for the three days data
https://acinn-data.uibk.ac.at/innsbruck/7 for the seven days data

The datasets for the other stations are available, per analogy:

The data is shared by ACINN under a Creative Commons Attribution-ShareAlike 4.0 International License.

The data is provided in the json format, often used for web applications. Fortunately, this is very easy to read in python:

from urllib.request import Request, urlopen
import json

url = 'https://acinn-data.uibk.ac.at/innsbruck/3'
# Parse the given url
req = urlopen(Request(url)).read()
# Read the data
data = json.loads(req.decode('utf-8'))

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Cell In[7], line 6
      4 url = 'https://acinn-data.uibk.ac.at/innsbruck/3'
      5 # Parse the given url
----> 6 req = urlopen(Request(url)).read()
      7 # Read the data
      8 data = json.loads(req.decode('utf-8'))

File ~/.mambaforge/envs/oggm_env/lib/python3.12/urllib/request.py:215, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    213 else:
    214     opener = _opener
--> 215 return opener.open(url, data, timeout)

File ~/.mambaforge/envs/oggm_env/lib/python3.12/urllib/request.py:521, in OpenerDirector.open(self, fullurl, data, timeout)
    519 for processor in self.process_response.get(protocol, []):
    520     meth = getattr(processor, meth_name)
--> 521     response = meth(req, response)
    523 return response

File ~/.mambaforge/envs/oggm_env/lib/python3.12/urllib/request.py:630, in HTTPErrorProcessor.http_response(self, request, response)
    627 # According to RFC 2616, "2xx" code indicates that the client's
    628 # request was successfully received, understood, and accepted.
    629 if not (200 <= code < 300):
--> 630     response = self.parent.error(
    631         'http', request, response, code, msg, hdrs)
    633 return response

File ~/.mambaforge/envs/oggm_env/lib/python3.12/urllib/request.py:559, in OpenerDirector.error(self, proto, *args)
    557 if http_err:
    558     args = (dict, 'default', 'http_error_default') + orig_args
--> 559     return self._call_chain(*args)

File ~/.mambaforge/envs/oggm_env/lib/python3.12/urllib/request.py:492, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
    490 for handler in handlers:
    491     func = getattr(handler, meth_name)
--> 492     result = func(*args)
    493     if result is not None:
    494         return result

File ~/.mambaforge/envs/oggm_env/lib/python3.12/urllib/request.py:639, in HTTPDefaultErrorHandler.http_error_default(self, req, fp, code, msg, hdrs)
    638 def http_error_default(self, req, fp, code, msg, hdrs):
--> 639     raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: HTTP Error 404: Not Found

Now I will help you to parse the timestamp of the data:

from datetime import datetime, timedelta
data['time'] = [datetime(1970, 1, 1) + timedelta(milliseconds=ds) for ds in data['datumsec']]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[8], line 2
      1 from datetime import datetime, timedelta
----> 2 data['time'] = [datetime(1970, 1, 1) + timedelta(milliseconds=ds) for ds in data['datumsec']]

File ~/.mambaforge/envs/oggm_env/lib/python3.12/site-packages/numpy/lib/npyio.py:263, in NpzFile.__getitem__(self, key)
    261         return self.zip.read(key)
    262 else:
--> 263     raise KeyError(f"{key} is not a file in the archive")

KeyError: 'datumsec is not a file in the archive'

And make a first plot to get you started:

import matplotlib.pyplot as plt
plt.plot(data['time'], data['dd'], '.');
plt.ylabel('Wind direction (°)'); 
plt.title('Wind direction at Innsbruck');

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[9], line 2
      1 import matplotlib.pyplot as plt
----> 2 plt.plot(data['time'], data['dd'], '.');
      3 plt.ylabel('Wind direction (°)'); 
      4 plt.title('Wind direction at Innsbruck');

File ~/.mambaforge/envs/oggm_env/lib/python3.12/site-packages/numpy/lib/npyio.py:263, in NpzFile.__getitem__(self, key)
    261         return self.zip.read(key)
    262 else:
--> 263     raise KeyError(f"{key} is not a file in the archive")

KeyError: 'time is not a file in the archive'

You will get much more time to get used to these data in the mid-semester projects. For today, I’m asking to write a script that takes the station and number of days as input (either as command line arguments or user input, your choice) and prints the following information in the terminal:

At station XXX, over the last X days, the dominant wind direction was XX (xx% of the time). 
The second most dominant wind direction was XX (xx% of the time), the least dominant wind
direction was XX (xx% of the time). The maximum wind speed was XX m/s (DATE and TIME), 
while the strongest wind speed averaged over an hour was XX m/s (DATE and TIME).

With the wind directions being of 8 classes: N, NW, W, SW, S, SE, E, NE.