What open-source can do for you

and what you can do for open-source

An incentive for better software development practices in the research community

           

ACINN Graduate Seminar, 08.11.2017

Fabien Maussion
Institute of Atmospheric and Cryospheric Sciences (ACINN)
University of Innsbruck

Disclaimer

I am a Python user, so the examples in this talk come from the Python world. But the message is the same for any other programming language, such as C, R, or Julia.


(if you are not a programmer yourself, there is still something to learn from open source!)
(and yes, this also works with proprietary languages like Matlab or IDL, although that partly misses the point)

Take home points

What can open-source do for you?

Open-source isn't only "free": it is a programming culture conveying the concepts of code readability, documentation, reproducibility, and open review.

These concepts can (and should) be applied to the scientific practice (Open Science).

Embracing this culture and getting involved in the community (even with small contributions!) will offer you much more than simple "tools": you will learn from this experience and develop ideas for your own work.

Take home points

What can you do for open-source?

The responsibility for the development and maintenance of today's major scientific tools has changed hands, from private companies to volunteers and academics. Funding agencies and universities are saving money as a result, and this money should be redistributed to the open-source community (directly or indirectly).

Open science takes time! Scientific papers should be evaluated according to new standards: transparency and reproducibility of the analysis chain, availability of data and code and its documentation.

Open source takes time! The work of open source developers should be acknowledged and should become an asset for academic jobs, not a handicap.

Not convinced yet?

Let's start with a provocative statement...

Science relies on

  • peer review
  • skepticism
  • transparency
  • attribution
  • accountability
  • collaboration
  • impact


Academic science has always been perfecting these tenets.
Open source software is now excellent (superior?) at all of them.
© original slide by Katy Huff

A common misconception

Open source ≠ free (as in "free beer")

Free ≠ open source

Open source ≠ source-available

Important elements of an open-source project:

  • License (copyleft or permissive)
  • Documentation
  • Contributing rules
  • Tests
  • Peer review
  • Community governance (rules depend on the project)

Collaborative development workflow

Example


New feature in the Astropy package

A common misconception

“Open-source software is unreliable / untested / not working properly.”

Well, it depends:
OS development is self-regulating, and it works best if people actually use the system and report issues.

Bug fixes, with...

... proprietary software:
  • User files a ticket
  • If lucky, after a certain time the user gets notified that the bug was fixed
  • The fix is available in the upcoming version of the software
    (sometimes at the price of a new license)


... open-source software:
  • User files a ticket
  • Maintainers and user discuss the cause of the bug
  • If lucky, someone volunteers to fix the bug
  • A fix is proposed to the code base, discussed, and eventually merged

Things to consider when using OS software:
  • How large is the user base?
  • Are the maintainers responsive?
  • Are there bug reports? (the more, the better!)
  • Where are the tests, do they cover the functionality I'm using?
  • Continuous integration?
  • (Is there an institution supporting the software?)

Testing and continuous integration



(is it getting boring already? We'll only do a short introduction ;)

I'll throw a few statements into the room

(correct me if I'm wrong ;)
  • “coding” is a menial task, not worthy of the attention of a distinguished scientist
  • I can write error-free code in one shot
  • Scientists do test their code...
  • ... but often not in a formal/reproducible way
  • A script that worked once will always work.

A glimpse into software testing techniques
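
To make "formal and reproducible" testing concrete, here is a minimal sketch. The function `running_mean` and its three tests are hypothetical, invented for this illustration and not taken from any real package. The tests are plain functions whose names start with `test_`, which is the naming convention that test runners such as pytest discover automatically; the example also runs without any test runner installed.

```python
def running_mean(values, window):
    """Return the running mean of `values` over `window` consecutive items.

    Hypothetical example function, written only for this illustration.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def test_known_values():
    # Compare against a result computed by hand for a tiny input.
    assert running_mean([1, 2, 3, 4], window=2) == [1.5, 2.5, 3.5]


def test_constant_series():
    # The mean of a constant series is the constant itself.
    assert running_mean([5, 5, 5], window=3) == [5.0]


def test_invalid_window():
    # A window longer than the data should fail loudly, not silently.
    try:
        running_mean([1, 2], window=3)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")


if __name__ == "__main__":
    # Run the tests directly, without a test runner.
    for test in (test_known_values, test_constant_series, test_invalid_window):
        test()
    print("all tests passed")
```

Because the checks live in a file rather than in someone's memory, they can be re-run after every change, by anyone, on any machine. That is what a continuous integration service automates.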

A common misconception

“If I share my data / code people will steal my results and I will be left with nothing.”

Publishing First

  • Good: share code/data once all papers have been published
  • Better: share code/data as soon as there is a pre-print
  • Best: share, with a license, while working, if not sooner

Congratulations: your online repository history is an insurance policy against thievery.
© original slide by Katy Huff

Data and the cake

Success stories

Software carpentry
Project Jupyter
and its contribution to the ACINN lectures
and, incidentally, to the Nobel Prize in Physics
xarray

And many more...

Challenges

Growth of popular data-science languages

https://insights.stackoverflow.com/trends       https://stackoverflow.blog/2017/09/06/incredible-growth-python/

Sustainability crisis?

  • The number of users is increasing, but what about the number of maintainers?
  • User demand for stable, reliable, and cross-platform software impedes innovation (e.g. the Python 2 vs. Python 3 conundrum)
  • Did you know that NumPy isn't funded? (this will change soon)
  • Did you know that most OS packages can be cited in scientific publications?


See e.g. initiatives like

Cloud computing and big data

  • CMIP3 (40 TB), CMIP5 (2 PB), CMIP6 (~10 PB) (source)
  • Earth Observation data
Google Earth Engine, good or evil?
With open source, modern software development practices and education, we can tackle these.

There is a train to catch, and it is time that (EU) universities and funding agencies realize it, so that we don't miss it.

Thank you!


The presentation "What open-source can do for you and what you can do for open-source"
by Fabien Maussion is licensed under a Creative Commons Attribution 4.0 International License.
Further reading: