String formatting and file paths#

A scientific data analysis workflow almost always involves downloading, manipulating and opening files. Often, it also involves writing new files (for example with post-processed data). Fortunately, Python comes with many handy tools to format strings and to handle file paths.

String Formatting#

“String formatting” refers to formatting the content of variables (strings, numbers, paths, etc.) into strings, for example to display them on screen or to write them to a text file. Unfortunately, there is more than one way to format strings in Python (actually, there are at least four!). This short section will guide you to the ones you should prefer.

The modern way: formatted string (“f-string”) literals#

Consider the following example:

name = 'Assane'
print(f'Hello {name}!')
Hello Assane!

The important bit here is the f prefix to the string literal. It tells the interpreter that the string may contain curly braces enclosing variable names (or other expressions), which are replaced with their values.

Not only strings can be formatted into a string. Numbers can too:

n = 3
print(f'{name} has {n} cats.')
Assane has 3 cats.
pi = 3.14 
print(f'pi ≈ {pi}')
pi ≈ 3.14

f-strings are quite powerful. They can evaluate arbitrary expressions:

print(f'2 pi ≈ {2 * pi}')
2 pi ≈ 6.28

This feature is to be used with care: you probably don’t want to have very complicated expressions within your f-strings!
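
One way to keep f-strings readable is to compute the values first and format them afterwards. The sketch below illustrates this; the variable names and the radius value are made up for the example.

```python
import math

radius = 2.5  # hypothetical value, for illustration only

# Harder to read: all the computation happens inside the curly braces
crowded = f'Area: {math.pi * radius ** 2:.2f}, circumference: {2 * math.pi * radius:.2f}'

# Easier to read: compute first, then format
area = math.pi * radius ** 2
circumference = 2 * math.pi * radius
tidy = f'Area: {area:.2f}, circumference: {circumference:.2f}'

print(tidy)  # Area: 19.63, circumference: 15.71
print(crowded == tidy)  # True
```

Both strings are identical, but the second version is easier to debug because the intermediate values have names.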

Formatting numbers in strings#

Very often, you want your strings to be of predictable length and format. For example, you may want float numbers to be printed only with a chosen precision:

frac = 2 / 3
print(f'Set free: {frac}')
print(f'Formatted to 2 decimals: {frac:.2f}')  # f is for fixed-point notation
print(f'Formatted to an integer: {int(frac)}')
print(f'Formatted to a rounded integer: {round(frac)}')
print(f'Formatted to a rounded integer with leading spaces: {round(frac):4d}')  # d is for decimal integer
print(f'Formatted to a rounded integer with leading zeros: {round(frac):04d}')
Set free: 0.6666666666666666
Formatted to 2 decimals: 0.67
Formatted to an integer: 0
Formatted to a rounded integer: 1
Formatted to a rounded integer with leading spaces:    1
Formatted to a rounded integer with leading zeros: 0001
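
The same format specifiers also control alignment, thousands separators, percentages and scientific notation, which helps when printing tables. The following is a minimal sketch; the station names and numbers are hypothetical and serve only to show the syntax.

```python
# Hypothetical station data: (name, elevation in m, snow-covered fraction)
stations = [
    ('Vent', 3064, 0.8342),
    ('Innsbruck', 578, 0.051),
    ('Zugspitze', 2962, 0.42),
]

lines = []
for station, elevation, fraction in stations:
    # <10  : left-align in a field of 10 characters
    # >6,d : right-align in 6 characters, with a thousands separator
    # 7.1% : multiply by 100, keep 1 decimal, append a percent sign
    lines.append(f'{station:<10}{elevation:>6,d} m{fraction:7.1%}')
print('\n'.join(lines))

# e is for scientific (exponent) notation
avogadro = 6.02214076e23
sci = f'{avogadro:.3e}'
print(sci)  # 6.022e+23
```

Because each field has a fixed width, all printed lines have the same length and the columns line up.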

Formatting dates in strings#

Dates and times can be formatted too, using the strftime format codes (%Y, %m, %d, etc.) after the colon:

import datetime
now = datetime.datetime.now()

print(now)
print(f'{now:%Y-%m-%d %H:%M}')
2022-10-13 19:24:32.452266
2022-10-13 19:24
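
A common use case is to put a timestamp into a file name. The sketch below assumes a made-up naming scheme (the "temperature_" prefix and the ".csv" extension are not from this lesson) and uses a fixed date so that the output is reproducible.

```python
import datetime

# A fixed date, chosen arbitrarily for illustration
acquired = datetime.datetime(2022, 10, 13, 19, 24)

# Year-month-day order: sorting the file names alphabetically
# also sorts them chronologically
filename = f'temperature_{acquired:%Y%m%d_%H%M}.csv'
print(filename)  # temperature_20221013_1924.csv

# The same format codes work with the strftime() method
same = 'temperature_' + acquired.strftime('%Y%m%d_%H%M') + '.csv'
print(filename == same)  # True
```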

The old ways to format strings#

f-strings were added to Python in version 3.6 (end of 2016: that’s only a few years back!). Before that, two other string formatting tools were available, and both are still used today. Therefore, you should learn them as well.

The “not too old” way: .format() (still useful!)#

This string formatting method was introduced with Python 3 and backported to Python 2.6 (so that’s already a bit older). It works by appending a call to the .format() method to a string:

print('Hello, {}!'.format(name))
Hello, Assane!

No f prefix here (don’t mix them up!). Otherwise, it works more or less the same:

print('{} has {} cats.'.format(name, n))
print('{name_of_person} has {n_cats:02d} cats, right?'.format(n_cats=n, name_of_person=name))
Assane has 3 cats.
Assane has 03 cats, right?

You see where this is going! This is pretty much the same syntax, but a bit more verbose. f-strings are generally more readable and should be preferred, but .format() can be useful in specific cases, as shown in the greetings example from the previous lesson, which I adapt here:

template_string = 'Hey {name}! I think you should come over to {city} and visit {place}.'
# some code omitted here...
print(template_string.format(name='Lesedi', city='Cape Town', place='Table Mountain'))
Hey Lesedi! I think you should come over to Cape Town and visit Table Mountain.

The nice thing in the example above is that you can create a template string early in your script and “fill” it with values later in your program.
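
Templates are also handy when the same pattern has to be filled many times, for example in a loop. Here is a small sketch, assuming a hypothetical file naming scheme (the variable name, year and ".nc" extension are invented for the example):

```python
# Hypothetical naming scheme, defined once at the top of a script
file_template = '{variable}_{year:04d}_{month:02d}.nc'

filenames = []
for month in range(1, 4):
    # Format specifiers such as :02d work in templates too
    filenames.append(file_template.format(variable='precip', year=2021, month=month))

print(filenames)
# ['precip_2021_01.nc', 'precip_2021_02.nc', 'precip_2021_03.nc']
```

If the naming scheme changes later, only the template has to be edited.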

The very old way: the % operator#

This was the standard in Python 2 but still works today (and will continue to work in the future):

print('Hello %s!' % name)
Hello Assane!

We don’t like this way, but you may encounter it in some older code.
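
If you need to read or modernize such code, it may help to see both styles side by side. The sketch below (with example values chosen for illustration) shows that the format codes after % correspond closely to the ones used after the colon in f-strings:

```python
name = 'Assane'
frac = 2 / 3

# Old style: the values are passed as a tuple after the % operator
old_style = '%s: %.2f (%04d)' % (name, frac, 42)

# Equivalent f-string
new_style = f'{name}: {frac:.2f} ({42:04d})'

print(old_style)  # Assane: 0.67 (0042)
print(old_style == new_style)  # True
```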

Take home: string formatting#

  • there is probably no string format you can think of that doesn’t have a formatting solution (people are manipulating strings all the time)

  • use f-strings! They are great.

  • sometimes, use .format(). This is also great.

  • if you want more examples, read this tutorial from the Python docs

Path handling in Python#

Handling file paths is one of the not-so-fun parts of a scientist’s job, I agree. But we just have to do it: there is no way around it.

Please read this short blog post which is a good entry-level tutorial to path handling.

Take home: path handling#

  • never use + and other string shenanigans to handle your file paths

  • os.path and, in particular, os.path.join is a simple way to deal with paths as strings

  • pathlib is the new cool kid on the block. It works very well but might be a bit confusing at first.
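
To make these points more concrete, here is a minimal sketch that builds the same path with os.path.join and with pathlib. The directory and file names are hypothetical, and no file is actually read or written.

```python
import os
from pathlib import Path

# os.path: paths are plain strings, joined with the right separator for your OS
data_dir = os.path.join('data', 'raw')
path_str = os.path.join(data_dir, 'temperature_2022.csv')

# pathlib: paths are objects, and the / operator joins their parts
path_obj = Path('data') / 'raw' / 'temperature_2022.csv'

print(path_obj.name)    # temperature_2022.csv
print(path_obj.stem)    # temperature_2022
print(path_obj.suffix)  # .csv

# Derive an output file name without any string slicing
out_path = path_obj.with_name(f'{path_obj.stem}_processed{path_obj.suffix}')
print(out_path.name)    # temperature_2022_processed.csv

# Both approaches describe the same path
print(Path(path_str) == path_obj)  # True
```

Neither version hard-codes "/" or "\", which is why the same script can run on Linux, macOS and Windows.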