Disturbing revelations about climate science

Posted on Nov 25 in Climate Change, Software by Trevor Hicks

I’m not talking about the hardball or dirty pool revealed in the leaked emails and data from the Climatic Research Unit at the University of East Anglia.  I never had any reason to suspect that academics were nicer to each other than, say, investment bankers.  Nope, what has me bothered is much more relevant to the results of the research.  Some of the leaked email and source code also points to shockingly poor software engineering processes that call into question the validity of any results produced by the standard climate models.

As far as I can tell, there are three legs of support for the notion that increased atmospheric concentrations of greenhouse gases such as CO2, resulting from human activity, are warming the planet.  First, there is the theoretical science based on the physics of the gases and water vapor, the energy of the sun’s rays and how they are affected by passing through or reflecting off various materials, etc.  The leak shows that the global warming side has been very aggressive about suppressing academic dissent, but it may well be a case of the police framing a guilty man.  That is, their dirty tricks don’t, in and of themselves, discredit the science.  Next, there is the empirical data, both from modern measurements and from other sources such as tree rings and ice cores, that allows us to track and correlate the changes over time in temperature and greenhouse gas concentrations.  And third, there are the computer models that are intended to apply the theoretical science to the accumulated data and make predictions about what will be observed in the future.  Unfortunately, this leak should cause us to doubt the validity of both the data and the models.

First, with respect to the data, it’s clear that the data files were never under source control.  Thus there is no plausible chain of custody, if you will, that can give us confidence that the raw data being fed into the model is a faithful representation of what was actually recorded by the various sensors or other data collection methods.  This wouldn’t be a huge problem in isolation, but it’s clear that some files of raw data were lost and subsequently ‘synthesized’ or reverse engineered from secondary sources.  Also, there is no controlled taxonomy or metadata, which leaves the interpretation of the various bits of data in question.  The emails and comments in the leak also indicate that some liberties were taken in data input routines to massage or alter the data to ensure that the models would run correctly.  That is potentially really scary.  And I say potentially on purpose: none of this yet amounts to a smoking gun that the data is wrong; it just means we should have a lot less trust in its fidelity.  What we don’t know is how sensitive the model is to the possible problems in the data.
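To make the chain-of-custody point concrete, here is a minimal sketch of the kind of safeguard I have in mind: record a checksum for every raw data file once, then check later whether anything has been altered, added, or lost.  The function names and the directory layout are my own illustrative assumptions, not anything found in the leaked material, and a real project would more likely lean on a proper version control system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its digest."""
    return {
        str(p.relative_to(data_dir)): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing, added, or altered since the manifest was built."""
    current = build_manifest(data_dir)
    names = set(manifest) | set(current)
    return sorted(n for n in names if manifest.get(n) != current.get(n))
```

Build the manifest when the raw readings first arrive and store it somewhere the modelers cannot quietly edit; an empty list from `changed_files` is then at least some evidence that the inputs are still what was originally recorded.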

But we have the same set of problems with the source code for the models as well.  No source or revision control processes appear to exist.  There was clearly a ‘band-aid’ approach to debugging and no systematic review of the architecture of the model at any time.  And worst of all, testing appears to be a completely foreign notion to this project.  There’s no evidence of any kind of comprehensive test plan and no regression testing, which means there is no reason to have any confidence that the model is an accurate implementation of the science.  And unlike the similarly complex modeling systems I’ve worked on throughout my career, there isn’t a good way to really ‘field test’ the predictions of the model against what is observed in actual operation because the time scales are so long.  The approach to making the model’s predictions fit out-of-sample real-world measurements appears to be to apply arbitrary tweaks to either the code or the data to get the pre-ordained ‘correct’ answer.
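For readers who haven’t seen one, a regression test can be as simple as the sketch below: run the model on a fixed input and compare the result with a stored baseline that was reviewed once and then frozen.  The `toy_model` here is a purely hypothetical stand-in (a smoothed running sum), not the climate code, and the baseline numbers are just what that toy produces.

```python
import math


def toy_model(forcings):
    """Hypothetical stand-in for a model run: exponentially smoothed inputs."""
    state = 0.0
    outputs = []
    for f in forcings:
        state = 0.9 * state + 0.1 * f
        outputs.append(state)
    return outputs


def matches_baseline(outputs, baseline, rel_tol=1e-9):
    """True when every output agrees with the stored baseline within tolerance."""
    return len(outputs) == len(baseline) and all(
        math.isclose(o, b, rel_tol=rel_tol) for o, b in zip(outputs, baseline)
    )


# Frozen reference: the reviewed output of toy_model for FIXED_INPUT.
FIXED_INPUT = [1.0, 1.0, 1.0]
BASELINE = [0.1, 0.19, 0.271]
```

Rerunning `matches_baseline(toy_model(FIXED_INPUT), BASELINE)` after every code change flags any ‘band-aid’ fix that silently shifts the results, which forces someone to decide whether the shift is a bug or an intended correction.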

This software system is very similar to what a hedge fund running a quantitative trading strategy would use.  A hedge fund with a similar approach to quality in its software would probably go out of business very rapidly unless it was just spectacularly lucky.

I’ve tried to be careful to point out that what I’ve seen so far is certainly not a definitive refutation of global warming; far from it.  It’s quite possible that all of these faulty processes and errors in the code or data make very little difference in the accuracy of the published results and forecasts.  But it also forces us to consider the possibility that the data and climate models are utter garbage.  With so many people around the world now poring over the emails and source code, I think we will get a clearer picture of the trustworthiness of the data and models soon.
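One way to find out which of those two possibilities we are facing is a perturbation test: deliberately add noise of a known size to the inputs, rerun the model many times, and see how far the output moves.  The sketch below does this for a toy ‘model’ that is nothing more than a least-squares trend line; the function names, the Gaussian noise, and the trial count are all my own assumptions for illustration.

```python
import random
import statistics


def trend_per_step(values):
    """Least-squares slope of values against their index (a toy stand-in for a model)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def output_spread(values, noise_sd, trials=200, seed=0):
    """Standard deviation of the model output when inputs get Gaussian noise."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    outputs = [
        trend_per_step([v + rng.gauss(0.0, noise_sd) for v in values])
        for _ in range(trials)
    ]
    return statistics.stdev(outputs)
```

If the spread stays small compared with the trend itself even for pessimistic noise levels, the sloppy data handling probably doesn’t matter much; if it doesn’t, the published numbers deserve far less trust.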

Personally, I’m still willing to state that I think man-made global warming is probably real, and even if the models are (or ought to be) just thrown away I expect that opinion will not immediately change.  But I certainly have a lot less confidence in that opinion than before.  In fact, it gives me enough pause to think that we probably ought to have a thorough review, and possibly a rewrite, of the data and models before we undertake any expensive emissions reduction policies.

