It’s a typical scenario: a paper is published describing a new algorithm for data analysis. The mathematical background is described in the paper, roughly. Software that implements the algorithm is written and made available for download from a website. You visit the website, download the program and run it. You get unexpected results. You wonder what’s happening. You go back to the site and look for the source code, and it’s not there.

I’ve recently tested two pieces of software that do basically the same thing: predicting missing genotypes. There is no source code for either of them, and fastPHASE additionally requires you to register and accept an academic license before you can use it, which introduces an annoying delay in obtaining the program.

By the way, why are all those scientific program names written in UPPERCASE? Because it creates an impression of IMPORTANCE? Just a side note.

Scientists work for the sake of humanity (I hope), striving to make our world a better place. Right? So why don’t they make the source code available?

Not releasing the source code of scientific software is a Bad Thing, because it harms research in the field and is antisocial. The losers are the closed-source project itself, other projects in the field, and subsequently everyone who could have benefited from the research. The only one who can possibly benefit is the author, but I highly doubt that they ever do.

Keeping the source code secret is a typical practice for corporations that seek to profit from selling the binaries. I don’t know what business model can be built on restricted source code access in science, but I don’t think they’re ever going to make any money on that.

What other reasons could there be not to release the source code? Remaining the sole author and keeping all the credit? Keeping complete control? Hoping to sell licenses to business clients?

The main effect of making the source code unavailable is that the program internals cannot be inspected and analyzed. It’s only a binary that is available; people can obtain it and run it, without being able to modify it.

All the general arguments for open-source software apply to scientific software too. Withholding the source code has several negative effects.

  • Fewer people use the program.
  • None of the users can adapt or fix the program.
  • Other developers cannot learn from the program, or base new work on it.

I think that should be enough, but I would like to add two points that apply specifically to scientific software.

Loss of credibility

In scientific research, the key point is to prove and verify the results. With closed source, other scientists can only run the software and examine the output, without being able to check whether the program really does what the paper describes. Unable to do that, the rest of the world has to take the authors’ word for it. Do they have something to hide?

I don’t think scientists would actually question a paper as a whole because the source code is unavailable, but it certainly raises some concerns about its quality.

It’s antisocial

Scientific research is usually funded by government grants, which in turn come from taxpayers. Scientists are not corporations that fund themselves. It’s society, it’s other people who effectively pay for the research (through various funding organizations), and I believe that if scientists share their research results, they have a moral obligation to share them fully.

By not releasing the source code, they only create an impression of publishing their work. They can get away with that, because many people will think that if they can download the program and run it, it’s “available”. But it’s not!

Please, dear scientists, do what the people behind projects such as GNU Octave and the R project do: share your source code. Everybody will benefit from it, including your projects and yourselves.

Author: automatthias

  1. “Publish or Perish”


    Scientists’ funding is tied to how much they publish. That may include how well researched the work is, but not how competently prepared it is, unless by coincidence.

    So, theory is a concern, results are a concern, but software isn’t (it’s only engineering, after all). They can skimp on the software. It’s probably an ad-hoc single-use hack, and far too embarrassing to show to the world. The results are *probably* “right”, but the risk of them being discredited on the grounds of a stupid bug is more than the author’s job is worth.

