Memorandum submitted by Professor Darrel Ince (CRU 34)

I am Professor of Computing at the Open University and the author of 18 books and over a hundred papers on software topics. My submission to the committee is an expanded version of an article that I wrote for the Guardian, published on 5 February 2010.

1. First a disclosure: I am not a fan of computer modelling. However, most of the modelling work that has been carried out is in a sense irrelevant in that there is plenty of evidence that the earth is changing and that a potential result of this could be cataclysm. Because of the high stakes I support some of the efforts to bring our planet back to what it was forty years ago.

2. My favourite quote about science is by Karl Popper, almost certainly the most influential philosopher of science to this day:

'Every intellectual has a very special responsibility. He has the privilege and opportunity of studying. In return, he owes it to his fellow men (or 'to society') to represent the results of his study as simply, clearly and modestly as he can. The worst thing that intellectuals can do - the cardinal sin - is to try to set themselves up as great prophets vis-a-vis their fellow men and to impress them with puzzling philosophies. Anyone who cannot speak simply and clearly should say nothing and continue to work until he can do so.'

3. This is one of the reasons why I feel strongly about one or two of the issues you will be considering.

4. One of the spin-offs from the emails that were leaked from the Climate Research Unit at the University of East Anglia is the light that was shone on the role of program code in climate research. There is a particularly revealing set of emails that were produced by a programmer at UEA known as Harry ReadMe. The emails indicate someone struggling with undocumented, baroque code and missing data, which form part of one of the three major climate databases used by researchers throughout the world.

5. A number of climate scientists have refused to publish their computer programs. What I want to suggest is that this is unscientific behaviour and, equally importantly, that it ignores a major problem: scientific software has a poor reputation for errors.

6. There is enough evidence for us to regard a lot of scientific software with worry. For example, Professor Les Hatton, an international expert in software testing at the Universities of Kent and Kingston, carried out an extensive analysis of several million lines of scientific code. He showed that the software had an unacceptably high level of detectable inconsistencies. Interface inconsistencies between software modules occurred at the rate of one in every 7 interfaces on average in the programming language Fortran, and one in every 37 interfaces in the language C. This is hugely worrying when you realise that just one error, just one, will often invalidate a computer program. What he also discovered, even more worryingly, is that the accuracy of results declined from 6 significant figures to 1 significant figure during the running of programs.
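To make the last point concrete, the short Python sketch below shows one well-known way in which a calculation can lose significant figures: subtracting two nearly equal values. It is an illustration only. It is not Professor Hatton's method, nor necessarily the cause of the decline he measured, and the 6-digit arithmetic, the input values and the helper name significant_figures are all assumptions chosen for the example.

```python
from decimal import Context, Decimal

# Assumption: simulate arithmetic that carries only 6 significant figures.
six_digits = Context(prec=6)


def significant_figures(value: Decimal) -> int:
    """Count the digits of the value, ignoring leading and trailing zeros."""
    return len(value.normalize().as_tuple().digits)


# Two hypothetical inputs, each rounded to 6 significant figures on entry.
x = six_digits.create_decimal("1.2345678")  # stored as 1.23457
y = six_digits.create_decimal("1.2345612")  # stored as 1.23456

# The leading digits cancel, so only the last stored digit survives.
diff = six_digits.subtract(x, y)  # 0.00001

print(x, significant_figures(x))
print(diff, significant_figures(diff))
```

Each input is good to 6 significant figures, yet their difference retains only 1, and that one surviving digit (0.00001) is itself some way from the exact answer (0.0000066).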

7. The work of Hatton and other researchers indicates that scientific software is often of poor quality. What is staggering about the research that has been done is that it examines scientific software that is commercial: produced by software engineers who have to undergo a regime of thorough testing, quality assurance and a change control discipline known as configuration management. Scientific software developed in our universities and research institutes is often produced by scientists with no training in software engineering and with no quality mechanisms in place and so, no doubt, the occurrence of errors will be even higher. The Climate Research Unit Harry ReadMe files are a graphic indication of such working conditions.

8. Computer code is also at the heart of a scientific issue. One of the key features of science is falsifiability: if you erect a theory and anyone produces evidence that it is wrong, then it falls. This is how science works: by openness, by publishing minute details of an experiment, some mathematical equations or a simulation; by doing this you embrace falsifiability. This does not seem to have happened in climate research. Researchers have refused to release their computer programs, even though they are still in existence and not subject to commercial agreements. One example is Professor Mann's initial refusal to give up the code that was used to construct the hockey stick model, which demonstrated that human-made global warming is a unique artefact of the last few decades (he has now released all his code).

9. The situation is by no means bad across academia: most academics release code and data. Also, a number of journals, for example those in the area of economics and econometrics, insist on an author lodging both the data and the programs with the journal before publication. There is also an object lesson in a landmark piece of mathematics: the proof of the four colour conjecture by Appel and Haken. They showed that the regions of a map can be coloured using at most four colours so that no two adjacent regions have the same colour. Their proof was controversial in that, instead of an elegant mathematical exposition, they partly used a computer program. Their work was criticised for inelegance, but it was correct and the computer program was published for checking.

10. The problem of large-scale scientific computing and the publication of data is being addressed by organisations and individuals that have signed up to the idea of the fourth paradigm. This was the idea of Jim Gray, a senior researcher at Microsoft, who identified the problem well before the Climategate affair. There is now a lot of R and D work going into mechanisms whereby the web can be used as a repository for scientific publications and, more importantly, the computer programs and the huge amount of data that they use and generate. A number of workers are even devising systems that show the progress of a scientific idea from first thoughts to the final published papers. The problems with climate research will no doubt provide an impetus for this work to be accelerated.

11. I believe that if you publish research articles that use computer programs, you want to claim that you are engaging in science, the programs are in your possession and you will not release them, then you are not a scientist; I would also regard any papers based on the software as null and void. There are of course some exceptions, which would apply both now and in the past and would excuse many of those who have refused to release code and will in the future refuse: for example, a scientist may have a commercial agreement with some body for the whole software, or part of the code may be commercial; another issue, which complicated Prof Mann's position, is that of intellectual property rights. A further issue is that developing software is hard to do and considerable effort goes into it. There should be a period in which it is not released so that a researcher can make the most of his or her efforts by, for example, publishing more papers. Steve Schneider of MIT has suggested two years.

12. There are a number of ways that this can be enforced: by journals insisting that code and data be lodged with them; by the research councils insisting, as a condition of granting research funds, that all data and software be lodged somewhere, with a failure to do this resulting in no further funding until it is remedied; and by our universities making it a clause in an academic's terms and conditions that data and software must be lodged.

13. I would be happy to meet the committee.

Professor Darrel Ince

February 2010