Wednesday, December 2, 2009

Why the CRU is misinforming about the "Harry" Readme...

Berchmans posted below (in comments) the response by CRU to the "HARRY_Readme.txt" file that I referenced earlier:

HARRY_read_me.txt. This is a 4 year-long work log of Ian (Harry) Harris who was working to upgrade the documentation, metadata and databases associated with the legacy CRU TS 2.1 product, which is not the same as the HadCRUT data (see Mitchell and Jones, 2003 for details). The CSU TS 3.0 is available now (via ClimateExplorer for instance), and so presumably the database problems got fixed. Anyone who has ever worked on constructing a database from dozens of individual, sometimes contradictory and inconsistently formatted datasets will share his evident frustration with how tedious that can be.
In other words, they are suggesting two things: first, that CRU TS 2.1 is not connected to later data; and second, that CRU TS 3.0 has fixed the problems of earlier data sets. It seems to me that these claims are not supported by an examination of the emails and files that were discovered.

See, for example, this passage from the 15,000 or so lines of the HARRY_read_me.txt file:

"So, you release a dataset that people have been clamouring for, and the buggers only start using it! And finding problems. For instance:

Hi Tim (good start! -ed)

I realise you are likely to be very busy at the moment, but we have come across something in the CRU TS 3.0 data set which I hope you can help out with.

We have been looking at the monthly precipitation totals over southern Africa (Angola, to be precise), and have found some rather large differences between precipitation as specified in the TS 2.1 data set, and the new TS 3.0 version. Specifically, April 1967 for the cell 12.75 south, 16.25 east, the monthly total in the TS 2.1 data set is 251mm, whereas in TS 3.0 it is 476mm. The anomaly does not only appear in this cell, but also in a number of neighbouring cells. This is quite a large difference, and the new TS 3.0 value doesn't entirely tie in with what we might have expected from the station-based precip data we have for this area. Would it be possible for you could have a quick look into this issue?

Many thanks,

Daniel.

--------------------------------------------------------
Dr Daniel Kingston
Post Doctoral Research Associate
Department of Geography
University College London
Gower Street
London
WC1E 6BT
UK
Email d.kingston@ucl.ac.uk
Tel. +44 (0)20 7679 0510

Well, it's a good question! And it took over two weeks to answer. I wrote angola.m, which pretty much established that three local stations had been augmented for 3.0, and that April 1967 was anomalously wet. Lots of non-reporting stations (ie too few years to form normals) also had high values. As part of this, I also wrote angola3.m, which added two rather interesting plots: the climatology, and the output from the Fortran gridder I'd just completed. This raised a couple of points of interest:

1. The 2.10 output doesn't look like the climatology, despite there being no stations in the area. It ought to have simply relaxed to the clim, instead it's wetter.

2. The gridder output is lower than 3.0, and much lower than the stations!

I asked Tim and Phil about 1., they couldn't give a definitive opinion. As for 2., their guesses were correct, I needed to mod the distance weighting. As usual, see gridder.sandpit for the full info."

To my reading, this suggests two things. First, they clearly hadn't corrected the problems in CRU TS 3.0 either. Second, if 2.10 wasn't connected to 3.0, why would they examine 2.10 to clarify the data included in 3.0?
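
For readers wondering what "relaxed to the clim" and "distance weighting" mean in the log, here is a minimal sketch in Python. To be clear, this is not the CRU's gridder (theirs is Fortran, and I haven't reproduced it). The function names, the exponential weight, and the 450 km cutoff are all my own illustrative assumptions. The idea is only this: a grid cell's value is its climatology plus a distance-weighted average of nearby station anomalies, and when no station is in range the cell should fall back to the climatology alone.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def grid_cell_value(cell_lat, cell_lon, climatology, stations, cutoff_km=450.0):
    """Hypothetical gridder step, for illustration only.

    stations: list of (lat, lon, anomaly) tuples, where anomaly is the
    station's departure from its own normal for that month.
    cutoff_km and the exponential weight are assumptions, not CRU's scheme.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, lon, anomaly in stations:
        distance = haversine_km(cell_lat, cell_lon, lat, lon)
        if distance > cutoff_km:
            continue  # station too far away to influence this cell
        weight = math.exp(-distance / cutoff_km)  # closer stations count more
        weighted_sum += weight * anomaly
        weight_total += weight
    if weight_total == 0.0:
        # No stations in range: the cell "relaxes to the climatology".
        return climatology
    return climatology + weighted_sum / weight_total
```

With a made-up climatology of 100.0 mm for the Angolan cell at 12.75 south, 16.25 east, `grid_cell_value(-12.75, 16.25, 100.0, [])` gives back the climatology unchanged. That is the behaviour Harry says the 2.10 output ought to have shown in an area with no stations, and didn't.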

It's worth noting as well that the HARRY_read_me.txt file was still being edited and added to after CRU TS 3.0 was released. Why? Because (to use the words of the code expert I linked to earlier):

They keep trying to match the results that came from v2.10 because it was made public with v3.0. The only problem with that is the catalog of errors that have been found in the 2.10 code, the databases, etc.

So now they're going back and doing **** to make it look right....
Rather than acknowledge that 3.0 had issues, they decided it was better to try and erase the problems in version 2.10 so that the provenance of 3.0 couldn't be questioned.

1 comment:

  1. Btw, I'm only highlighting one of the many comments that tie TS 3.0 to 2.10. The passage above isn't the only connection. I just wanted to use it as an example.
