Ben Marwick gets into How computers broke science – and what we can do to fix it. The issue has a high-profile example in climatology right now: the Congressional brouhaha over an agency’s manipulation of data, and that agency’s reluctance to comply with requests for information, is a case in point.
The problem:
For most of the history of science, researchers have reported their methods in a way that enabled independent reproduction of their results. But, since the introduction of the personal computer – and the point-and-click software programs that have evolved to make it more user-friendly – reproducibility of much research has become questionable, if not impossible. Too much of the research process is now shrouded by the opaque use of computers that many researchers have come to depend on. This makes it almost impossible for an outsider to recreate their results.
…
The problem is that most modern science is so complicated, and most journal articles so brief, it’s impossible for the article to include details of many important methods and decisions made by the researcher as he analyzed his data on his computer. How, then, can another researcher judge the reliability of the results, or reproduce the analysis?
Stanford statisticians Jonathan Buckheit and David Donoho described this issue as early as 1995:
An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.
…
It means all those private files on our personal computers, and the private analysis tasks we do as we work toward preparing for publication should be made public along with the journal article.
This would be a huge change in the way scientists work. We’d need to prepare from the start for everything we do on the computer to eventually be made available for others to see. For many researchers, that’s an overwhelming thought. Victoria Stodden has found the biggest objection to sharing files is the time it takes to prepare them by writing documentation and cleaning them up. The second biggest concern is the risk of not receiving credit for the files if someone else uses them.
What to do? FOSS is gaining attention.
Currently, these are the tools and methods of the avant-garde, and many midcareer and senior researchers have only a vague awareness of them. But many undergraduates are learning them now. Many graduate students, seeing personal advantages to getting organized, using open formats, free software and streamlined collaboration, are seeking out training and tools from volunteer organizations such as Software Carpentry, Data Carpentry and rOpenSci to fill the gaps in their formal training.
Reports of lab measurements have always described the tools being used, and any peculiarities in the test and measurement setup and procedure, as a routine matter. What Ben notes is that this bit of background in the reporting of investigations has become rather sloppy when it comes to twiddling with numbers using modern computer technologies. The suggestion for a fix is to go back to fundamentals: describe the methods used to obtain and manipulate measurements, and provide the software and the well-sourced data in a manner that anyone can replicate and examine.
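As a rough illustration of what "provide the software and the data" can look like, here is a minimal Python sketch that runs an analysis from a script and writes down what went into it. Nothing here comes from Ben's article: the function names (`sha256_of`, `analyze`, `provenance`), the one-column CSV layout and the sample readings are all made up for the example, and a real project would record far more (package versions, instrument settings, and so on).

```python
import hashlib
import json
import platform
import statistics
import sys


def sha256_of(data: bytes) -> str:
    """Fingerprint some bytes so others can confirm they hold the same file."""
    return hashlib.sha256(data).hexdigest()


def analyze(raw_csv: str) -> dict:
    """Hypothetical analysis step: mean of a one-column CSV with a header row."""
    values = [float(line) for line in raw_csv.strip().splitlines()[1:]]
    return {"n": len(values), "mean": statistics.mean(values)}


def provenance(raw_csv: str, script_source: str) -> dict:
    """Bundle the result with fingerprints of the data and the code that made it."""
    return {
        "data_sha256": sha256_of(raw_csv.encode("utf-8")),
        "script_sha256": sha256_of(script_source.encode("utf-8")),
        "python_version": platform.python_version(),
        "platform": sys.platform,
        "result": analyze(raw_csv),
    }


if __name__ == "__main__":
    # Invented sample data, standing in for a real measurement file.
    sample = "reading\n1.0\n2.0\n4.5\n"
    record = provenance(sample, "placeholder for the text of this script")
    print(json.dumps(record, indent=2, sort_keys=True))
```

The point is only that every number in the result traces back to a named script and a fingerprinted data file, instead of to a sequence of mouse clicks nobody wrote down.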
Complicated? That’s an excuse. When there are enthusiasts reverse engineering cheap VHF-UHF radio firmware, hackers trying to see if they can get past security barriers in cell phones and business databases as entertainment, and FOSS projects such as the Linux kernel and the GNU project, excuses don’t cut it. What should cause wonderment is why “many graduate students” are only now beginning to see personal advantages in the FOSS paradigm, and why proprietary data formats such as those native to Microsoft Office are so predominant. It’s been more than thirty years since VisiCalc took the financial world by storm. Isn’t “avant-garde” getting a bit stale for this sort of technology?