Reading space-separated files

Contents

Reading space-separated files#

In one of the radiation lecture, you are asked to read files in .txt format in which the header contains metadata, followed by the data in columns, separated by spaces. The file looks like this.

Let’s read it with pandas.

Read the metadata#

This has to be done in pure python, line by line:

df_meta = {}
with open('SCIA_GSFC_NO2.txt') as file:
    for i, line in enumerate(file):
        line = line.rstrip()
        print(line)
        if line.startswith('Column'):
            k, v = line.split(':')
            df_meta[k.strip()] = v.strip()
        # Stop after 30 lines
        if i > 30:
            break

SCIAMACHY retrievals of total NO2 vertical column amounts
Pixel footprint overlaps Pandora location GSFC, Lat 38.9926, Long -76.8396
Climatological footprint correction is based on 0.05 grid resolution and the effective Pandora location.
----------------------------------------------------------------
Column 1: UT date and time for center of measurement, yyyymmddThhmmssZ (ISO 8601)
Column 2: Solar zenith angle at the center-time of the measurement in degree
Column 3: Footprint pixel center latitude
Column 4: Footprint pixel center longitude
Column 5: Footprint pixel latitude corner 1
Column 6: Footprint pixel latitude corner 2
Column 7: Footprint pixel latitude corner 3
Column 8: Footprint pixel latitude corner 4
Column 9: Footprint pixel longitude corner 1
Column 10: Footprint pixel longitude corner 2
Column 11: Footprint pixel longitude corner 3
Column 12: Footprint pixel longitude corner 4
Column 13: NO2 vertical column amount [Dobson Units]
Column 14: Uncertainty of NO2 vertical column amount [Dobson Units]
Column 15: Climatological footprint correction of NO2 vertical column amount [Dobson Units]
Column 16: Uncertainty climatological footprint correction of NO2 vertical column amount [Dobson Units]
Column 17: Cloud fraction
Column 18: Footprint pixel area [deg^2]
Column 19: Effective Pandora location latitudinal correction [deg]
Column 20: Effective Pandora location longitudinal correction [deg]
----------------------------------------------------------------
20090505T154440Z 28.4650 39.0214 -76.7878 39.0695 38.8444 39.1940 38.9728 -76.4204 -76.5029 -77.0681 -77.1471 1.5182e-01 8.2979e-03 5.4211e-04 7.7206e-04 0.9796 0.1544 -1.7804e-03 3.3307e-03
20090509T151909Z 31.2893 39.0146 -77.0317 39.0811 38.8425 39.1922 38.9447 -76.6278 -76.6920 -77.3865 -77.4505 1.4768e-01 7.5719e-03 3.3673e-03 7.7109e-04 0.1864 0.1912 -1.6040e-03 4.4252e-03
20090509T151910Z 31.2880 38.9243 -76.6652 39.3051 39.0468 38.7999 38.5739 -78.1856 -78.2494 -75.2876 -75.3527 1.3956e-01 7.2270e-03 1.2312e-02 7.6775e-04 0.1858 0.7330 -1.6040e-03 4.4248e-03
20090518T153611Z 27.0193 38.9538 -76.6282 39.0085 38.7916 39.1146 38.8982 -76.2898 -76.3627 -76.8938 -76.9647 2.3173e-01 1.1801e-02 8.9077e-04 7.7275e-04 0.0000 0.1383 -1.4539e-03 3.5464e-03
20090524T154731Z 24.4820 39.0604 -76.7830 39.1051 38.8745 39.2404 39.0153 -76.3954 -76.4825 -77.0766 -77.1595 1.2280e-01 7.5891e-03 8.6057e-04 7.7163e-04 0.2446 0.1665 -1.3754e-03 3.0335e-03
20090525T151619Z 29.2774 39.0623 -77.1074 39.1328 38.8852 39.2461 38.9878 -76.6839 -76.7478 -77.4854 -77.5491 1.4499e-01 7.7248e-03 5.2112e-03 7.7081e-04 0.2241 0.2096 -1.2583e-03 4.3692e-03
20090525T151619Z 29.2767 38.8651 -75.9646 39.2461 38.9878 38.7406 38.5146 -77.4854 -77.5491 -74.5870 -74.6520 1.3296e-01 7.9510e-03 1.1861e-02 7.6768e-04 0.2433 0.7332 -1.2583e-03 4.3690e-03

df_meta

{'Column 1': 'UT date and time for center of measurement, yyyymmddThhmmssZ (ISO 8601)',
 'Column 2': 'Solar zenith angle at the center-time of the measurement in degree',
 'Column 3': 'Footprint pixel center latitude',
 'Column 4': 'Footprint pixel center longitude',
 'Column 5': 'Footprint pixel latitude corner 1',
 'Column 6': 'Footprint pixel latitude corner 2',
 'Column 7': 'Footprint pixel latitude corner 3',
 'Column 8': 'Footprint pixel latitude corner 4',
 'Column 9': 'Footprint pixel longitude corner 1',
 'Column 10': 'Footprint pixel longitude corner 2',
 'Column 11': 'Footprint pixel longitude corner 3',
 'Column 12': 'Footprint pixel longitude corner 4',
 'Column 13': 'NO2 vertical column amount [Dobson Units]',
 'Column 14': 'Uncertainty of NO2 vertical column amount [Dobson Units]',
 'Column 15': 'Climatological footprint correction of NO2 vertical column amount [Dobson Units]',
 'Column 16': 'Uncertainty climatological footprint correction of NO2 vertical column amount [Dobson Units]',
 'Column 17': 'Cloud fraction',
 'Column 18': 'Footprint pixel area [deg^2]',
 'Column 19': 'Effective Pandora location latitudinal correction [deg]',
 'Column 20': 'Effective Pandora location longitudinal correction [deg]'}

I think that at this point it would be best to rename the columns to better variable names. Also, the exact line at which the data starts could also be inferred programmatically. This exercise is left to the reader.

Read the data#

import pandas as pd

df = pd.read_csv('SCIA_GSFC_NO2.txt',
                 header=None,  # There is no proper header in the file
                 sep=' ',  # The separator is spaces
                 skiprows=25,  # The first rows are not organized (25 could be fetched automatically)
                 index_col=0,  # The first column is the time index
                 parse_dates=True,  # Parse the time automatically
                )

# Give "nicer" names to columns
df.index.name = 'Time (UTC)'
df.columns = list(df_meta.keys())[1:]

df

	Column 2	Column 3	Column 4	Column 5	Column 6	Column 7	Column 8	Column 9	Column 10	Column 11	Column 12	Column 13	Column 14	Column 15	Column 16	Column 17	Column 18	Column 19	Column 20
Time (UTC)
2009-05-05 15:44:40+00:00	28.4650	39.0214	-76.7878	39.0695	38.8444	39.1940	38.9728	-76.4204	-76.5029	-77.0681	-77.1471	0.15182	0.008298	0.000542	0.000772	0.9796	0.1544	-0.001780	0.003331
2009-05-09 15:19:09+00:00	31.2893	39.0146	-77.0317	39.0811	38.8425	39.1922	38.9447	-76.6278	-76.6920	-77.3865	-77.4505	0.14768	0.007572	0.003367	0.000771	0.1864	0.1912	-0.001604	0.004425
2009-05-09 15:19:10+00:00	31.2880	38.9243	-76.6652	39.3051	39.0468	38.7999	38.5739	-78.1856	-78.2494	-75.2876	-75.3527	0.13956	0.007227	0.012312	0.000768	0.1858	0.7330	-0.001604	0.004425
2009-05-18 15:36:11+00:00	27.0193	38.9538	-76.6282	39.0085	38.7916	39.1146	38.8982	-76.2898	-76.3627	-76.8938	-76.9647	0.23173	0.011801	0.000891	0.000773	0.0000	0.1383	-0.001454	0.003546
2009-05-24 15:47:31+00:00	24.4820	39.0604	-76.7830	39.1051	38.8745	39.2404	39.0153	-76.3954	-76.4825	-77.0766	-77.1595	0.12280	0.007589	0.000861	0.000772	0.2446	0.1665	-0.001375	0.003034
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2012-03-18 15:36:11+00:00	45.6646	38.8337	-76.5165	39.1740	38.9530	38.7062	38.4945	-77.7083	-77.7721	-75.3132	-75.3815	0.10088	0.009215	0.010842	0.000768	0.5504	0.5484	-0.003730	0.005392
2012-03-23 15:52:37+00:00	41.9500	39.3591	-78.0177	39.7136	39.5019	39.1953	38.9750	-79.1681	-79.2379	-76.7408	-76.8210	0.15875	0.010232	NaN	NaN	0.0000	0.5623	-0.003456	0.004192
2012-03-23 15:52:39+00:00	41.9475	39.0725	-77.1190	39.1203	38.9000	39.2407	39.0243	-76.7593	-76.8395	-77.3939	-77.4707	0.21058	0.010698	0.004900	0.000773	0.0000	0.1478	-0.003456	0.004190
2012-04-06 15:39:29+00:00	38.0911	39.0904	-76.9473	39.1477	38.9354	39.2458	39.0318	-76.6175	-76.6847	-77.2129	-77.2788	0.18388	0.010499	0.002880	0.000773	0.0000	0.1333	-0.002774	0.004349
2012-04-06 15:39:31+00:00	38.0871	39.0305	-77.2920	39.3710	39.1500	38.9027	38.6910	-78.4871	-78.5511	-76.0853	-76.1538	0.15921	0.011494	0.012565	0.000768	0.0000	0.5500	-0.002774	0.004348

271 rows × 19 columns

df['Column 2'].plot();

../_images/33ebd0cc9d042cc764876717e34b994306b34af18d1ff64dacf94338fbdff2b5.png