Guy Callendar vs the GCMs

As many readers have already surmised, the “GCM-Q” model that visually out-performed the Met Office CMIP5 contribution (HadGEM2) originated with Guy Callendar, and, in particular, Callendar 1938 (QJRMS). My attention was drawn to Callendar 1938 by occasional CA reader Phil Jones (see here and cover blog post by co-author Ed Hawkins here.) See postscript for some comments on these articles.
Callendar 1938 proposed (his Figure 2) a logarithmic relationship between CO2 levels and global temperature (expressed as an anomaly to then present mean temperature.) In my teaser post, I used Callendar’s formula (with no modification whatever) together with RCP4.5 total forcing and compared the result to the UK Met Office’s CMIP5 contribution (HadGEM2) also using RCP4.5 forcing.
In today’s post, I’ll describe Callendar’s formula in more detail. I’ll also present skill scores for global temperature (calculated in a conventional way) for all 12 CMIP5 RCP4.5 models for 1940-2013 relative to simple application of the Callendar formula. Remarkably, none of the 12 GCMs outperform Callendar and 10 of 12 do much worse.
I’m not arguing that this proves that Callendar’s parameterization is therefore engraved in stone. Callendar would undoubtedly have been the first to say so. It is undoubtedly rather fortuitous that the parameters of Callendar’s Figure 2 outperform so many GCMs. The RCP4.5 forcing used in my previous post included an aerosol history, the provenance of which I have not parsed. I’ve done a similar reconstruction using RCP4.5 GHG only with a re-estimate of the Callendar parameters, which I will show below.
Guy Callendar
Guy Callendar (see profile here) seems entirely free of the bile and rancor of the Climategate correspondents that characterizes too much modern climate science.
Callendar’s life got off to a good start in 1898: he was born in Canada (though he was raised and lived in England). (The next major AGW figure, Gilbert Plass, was born and brought up in Canada, though he later moved to the U.S.) He was the son of a prominent physicist, Hugh Callendar, who was succeeded at Montreal’s McGill University by Ernest Rutherford. Hugh Callendar appears to have been very prominent in his day and, among other activities, had developed steam tables that were widely used in industry. (Much of the present profile is drawn from here.)
Guy Callendar lost one eye in a childhood accident, but nonetheless was a keen tennis player, reaching the finals of the club singles championship in 1928 at Ealing Lawn Tennis Club and winning the club doubles championship at Horsham Lawn Tennis Club at the age of 49 in 1947. Not easy for someone with only one eye.
Callendar earned a certificate in Mechanics and Mathematics in 1922 at City & Guilds College and then went to work for his father examining the physics of steam until his father’s death in 1930. In 1938, Callendar was employed as a “steam technologist” by the British Electrical and Allied Industries Research Association and his seminal 1938 paper was therefore communicated to the Royal Meteorological Society by Dr G.M.B. Dobson, F.R.S:

Although Callendar’s qualifications would undoubtedly lead a modern Real Climate or Skeptical Science reader to dismiss him as suffering from Dunning-Kruger syndrome, Callendar (1938) is the first article that provides a clear scientific basis for modern AGW theory, albeit of a low-sensitivity and unhysterical type. Callendar’s detailed and first-hand technical expertise on steam and water vapour enabled him to articulate the infrared properties of increased carbon dioxide in the atmosphere, an understanding that appears to have eluded the contemporary establishment, whose views seem related to modern skydragons. Indeed, the structure of Callendar (1938) includes a discussion of issues that frequently trouble newcomers to the debate (spectral overlap, CO2 dissolution in the ocean) and, in my opinion, IPCC reports are diminished by not including modern reviews of such topics.
Callendar’s “Formula”
In Figure 2 of Callendar (1938) – see below, Callendar showed his estimate of the change in temperature (as an anomaly) arising from varying CO2 levels in “temperate” zones. Although Callendar did not characterize the curve in this figure as logarithmic, it obviously can be closely approximated by a log curve, as shown by the red overplot which shows a log curve fitted to the Callendar graphic. Its 2xCO2 sensitivity is 1.67 deg.

Figure 1. Callendar 1938 showing temperate zone relationship. Log curve (red) fitted by digitizing 13 points on the graphic and fitting a log curve by regression: y = -2.635113 + 2.410493*x. This yields sensitivity of 2.41*log(2) = 1.67 deg.
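The arithmetic in the caption can be checked in a few lines. This is only a sketch: it assumes, as the 2.41*log(2) step implies, that x is the natural log of the CO2 level, and the function names are mine for illustration. The units of the CO2 axis are deliberately left open, since the intercept cancels whenever two levels are compared.

```python
import math

# Coefficients of the fitted log curve, taken from the Figure 1 caption.
INTERCEPT = -2.635113
SLOPE = 2.410493

def fitted_curve(x):
    """Fitted anomaly for x, where x is assumed to be ln(CO2) in the figure's own units."""
    return INTERCEPT + SLOPE * x

def sensitivity_per_doubling(slope=SLOPE):
    """Temperature change for a doubling of CO2 under a natural-log fit."""
    return slope * math.log(2)

def delta_t(c_new, c_old, slope=SLOPE):
    """Temperature difference between two CO2 levels given in the same units.

    The intercept cancels, so only the ratio of the two levels matters.
    """
    return slope * math.log(c_new / c_old)

print(round(sensitivity_per_doubling(), 2))  # 1.67
```

Because only ratios enter delta_t, any doubling (say 300 to 600) returns the same 1.67 deg regardless of how the CO2 axis is scaled.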
Callendar did not show corresponding graphics for tropical or polar regions, but commented that the results for other zones were similar. Nor did Callendar show the derivation of his results in Callendar (1938). It is my understanding that he derived these results from his knowledge of the infrared properties of carbon dioxide and water vapour (and not by curve fitting to observations, though he had also carried out his own estimates of changes in global temperature.)
Callendar implicitly discounted the arguments for substantial positive feedbacks on initial forcing that characterize subsequent GCMs, observing the negative feedback from clouds as follows:

On the earth the supply of water vapour is unlimited over the greater part of the surface, and the actual mean temperature results from a balance reached between the solar “constant” and the properties of water and air. Thus a change of water vapour, sky radiation and temperature is corrected by a change of cloudiness and atmospheric circulation, the former increasing the reflection loss and thus reducing the effective sun heat.

“GCM-Q”
Although Callendar (1938) included projections of future carbon dioxide emissions and levels, Callendar had no inkling of the astonishing economic development of the second half of the 20th century. As Gavin Schmidt has (reasonably in this case) observed in connection with Hansen’s Scenario A, the ability to forecast future emissions has no bearing on evaluating how well a model estimates temperature given GHG levels.
I thought that it would be an interesting exercise to see how Callendar’s 1938 “formula” performed out-of-sample when applied to observed forcing and to compare it to the UK contribution to CMIP5 (HadGEM2), which I had been discussing. For my comparison, I used IPCC RCP4.5 forcing as a mainstream estimate, inputting their “CO2 equivalent” of all forcings (RCP4.5 column 2). It turns out that RCP4.5 column 2 “CO2 equivalent” includes the other GHGs (CH4, N2O, CFCs etc) as well as aerosols, somehow converted to ppm CO2. At present, I don’t know how these estimates have been constructed and make no comment on their validity: for this exercise, I am merely taking them at face value for a relatively apples-to-apples comparison.
Callendar’s relationship was based on anomaly to “present mean temperature”. For my calculations, I adopted the 1921-1940 anomaly as an interpretation (differences from this are slight) and therefore centered HadCRUt4 observations and HadGEM2 on 1921-40 for comparison.
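Re-centering on a reference period simply means subtracting each series’ own 1921-1940 mean. A minimal sketch, using a hypothetical function name and made-up placeholder numbers rather than real HadCRUT4 or HadGEM2 values:

```python
def recenter(years, values, start=1921, end=1940):
    """Return values expressed as anomalies to their own start..end mean."""
    ref = [v for y, v in zip(years, values) if start <= y <= end]
    if not ref:
        raise ValueError("no data in the reference period")
    base = sum(ref) / len(ref)
    return [v - base for v in values]

# Placeholder series (illustrative only, not observations).
years = [1920, 1921, 1940, 1941]
temps = [0.0, 0.25, 0.75, 1.0]
print(recenter(years, temps))  # [-0.5, -0.25, 0.25, 0.5]
```

Applying the same function to the observations and to each model puts them on a common baseline before any comparison.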
Callendar’s Figure 2 is for “temperate” zones but he reported that the relationship was “remarkably uniform for the different climate zones of the earth”. For the purpose of the exercise, I therefore used the relationship of Callendar’s Figure 2 to estimate GLB temperature, recognizing that the parameters of this figure would only be an approximation to Callendar’s GLB calculation. I have not examined whether the Callendar formula might work better or worse for 60S-60N, as, in carrying out the exercise, I was not taking the position that the parameters in the Callendar formula were “right” – only seeing what would result.
Here’s what resulted (as I showed in the previous post). A reconstruction from the Callendar 1938 formula applied to RCP4.5 CO2 equivalent seemingly out-performed the HadGEM2 GCM. While some readers presumed that “GCM-Q” must have incorporated some knowledge or information of second-half 20th century temperature history in the development of the “model”, this is not the case. “GCM-Q” directly used the formula implicit in Callendar 1938 Figure 2. (I realize that my interest in the results arises in large part from their coherence with subsequent observations, but it wasn’t as though I foraged around or did multiple experiments before arriving at the results that I showed here, the first runs of which I sent to Ross McKitrick and Steve Mosher.)

Figure 2. Temperature estimate using Callendar relationship versus HadGEM2.
Skill Scores
Next in today’s post, I will quantify the visual impression that “GCM-Q” outperformed HadGEM2 by using a skill score statistic that is commonplace in the evaluation of forecasts, estimating the “skill” of a model from the sum of squares of the residuals from the proposed model as opposed to a base case, as expressed below where obs is a vector of observations and “model” and “base” are vectors of estimates.

skill = 1 - sum( (model-obs)^2 ) / sum( (base-obs)^2 )

This calculation is closely related to the RE statistic in proxy reconstructions, where the base case is the mean temperature in the calibration period. However, the concept of a skill score is more general and long preceded the use of RE statistics in proxy reconstructions. In today’s calculation, I used 1940-2013 for comparison (using 2013 YTD as an estimate of 2013.)
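For readers who want to reproduce the statistic, here is a minimal sketch of the skill score as defined above. The function name and the toy numbers are mine for illustration; in this post the base case is the Callendar-formula estimate and “model” is a GCM.

```python
def skill_score(model, base, obs):
    """1 - SSE(model) / SSE(base).

    1 is perfect, 0 ties the base case, and a negative value means the
    model does worse than the base case.
    """
    if not (len(model) == len(base) == len(obs)):
        raise ValueError("series must have equal length")
    sse_model = sum((m - o) ** 2 for m, o in zip(model, obs))
    sse_base = sum((b - o) ** 2 for b, o in zip(base, obs))
    if sse_base == 0:
        raise ZeroDivisionError("base case matches observations exactly")
    return 1 - sse_model / sse_base

# Toy numbers (illustrative only).
obs = [0.0, 1.0, 2.0]
base = [1.0, 2.0, 3.0]   # off by 1 everywhere: SSE = 3
model = [2.0, 3.0, 4.0]  # off by 2 everywhere: SSE = 12
print(skill_score(model, base, obs))  # -3.0
```

Note that the score is unbounded below: a model whose errors are twice the base case’s scores -3, not -1.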
In addition to calculating the skill score of HadGEM2, I also calculated skill scores for the 12 CMIP5 RCP4.5 averages on file at KNMI. These skill scores (perfect is 1) are shown in the barplot below:

Figure 2. Skill Scores of CMIP5 RCP4.5 models relative to Callendar 1938.
Remarkably, none of the 12 CMIP5 models has any “skill” in reconstructing GLB temperature relative to the simple GCM-Q formula. Indeed, 10 of 12 do dramatically worse.
Aerosols
In the comments to my previous post, there was some discussion about the importance of aerosols and whether 20th century temperature history could be accounted for without invoking aerosols.
Directly using the Callendar 1938 “formula” on RCP4.5 GHG CO2 equivalent (RCP column 3) leads to a substantial overshoot of present temperatures. As an exercise, I re-calibrated a Callendar-style logarithmic relationship of temperature to RCP4.5 GHG and did the corresponding reconstruction of 20th century temperature history, once again calculating skill scores for each of the CMIP5 GCMs, this time against the Callendar-style estimate only using GHG (no aerosols), as shown in the graphic below:

Figure 3. Skill Scores of CMIP5 RCP4.5 models relative to re-calibrated Callendar-style estimate using GHGs only.
The temperature reconstruction using the reparameterization is shown in the graphic below. This reconstruction is not out-of-sample, as observations have been used to re-calibrate. Its climate sensitivity is lower than the Callendar 1938 model: it is 1.34 deg C.

Figure 4. As Figure 2 above, but including recalibrated temperature reconstruction using RCP4.5 GHG (column 3).
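The re-calibration described above amounts to fitting temperature against the log of GHG CO2 equivalent and reading the sensitivity off the slope. The sketch below assumes ordinary least squares and a natural log (the post does not spell out the fitting procedure); the function name is hypothetical and the data are synthetic, generated from a known curve purely to check that the fit recovers it.

```python
import math

def fit_log_relationship(co2eq, temps):
    """Least-squares fit of temps = a + b * ln(co2eq); returns (a, b)."""
    x = [math.log(c) for c in co2eq]
    n = len(x)
    mx = sum(x) / n
    my = sum(temps) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, temps))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Synthetic check (not real forcing or temperature data).
true_a, true_b = -11.0, 1.9
co2eq = [300.0, 330.0, 370.0, 420.0]
temps = [true_a + true_b * math.log(c) for c in co2eq]

a, b = fit_log_relationship(co2eq, temps)
sensitivity = b * math.log(2)  # deg C per doubling of CO2 equivalent
```

Under these assumptions, a sensitivity of 1.34 deg C per doubling would correspond to a slope of about 1.34/log(2) = 1.93.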
Comments
What does this mean? I’m not entirely sure: these are relatively new topics for me.
For sure, it is completely bizarre that a simple reconstruction from Callendar out-performs the CMIP5 GCMs – and, for most of them, by a lot. For the purposes of this observation, it is irrelevant that Callendar reconstructed temperate zones (both given his comment that other zones were remarkably similar and the fact that the specific parameters of Callendar Figure 2 are not engraved in stone). Even if the Callendar parameters had been calculated using the observed temperature history, it is surely surprising that such a simple formula can out-perform the GCMs, especially given the enormous amount of time, resources and effort expended in these GCMs. And, yes, I recognize that GCMs provide much more information than GLB temperature, but GLB temperature is surely the most important single statistic yielded by these models and it is disquieting that the GCMs have no skill relative to a reconstruction using only the Callendar 1938 formula. As Mosher observed in a comment on the predecessor post, a more complicated model ought to be able to advance beyond the simple model and, if there is a deterioration in performance, there’s something wrong with the model.
From time to time, others have pointed out this ability of simple models (and a couple of readers have sent me interesting essays on this topic offline). In one sense, “GCM-Q” is merely one more example. However, the fact that the parameters were estimated in 1938 adds a certain shall-we-say piquancy to the results. Nor do I believe that one can ignore the relative coherence of Callendar’s low sensitivity results to observations in forming an opinion on the still highly uncertain issue of sensitivity. That GCM-Q performed so well out of sample would interest me if I were a climate modeler.
Third, all the GCMs that underperform the Callendar formula run too hot. It seems evident to me (and I do not claim that this observation is original) that the range of IPCC models do not fully sample the range of physically possible or even plausible GCMs at lower sensitivities. Perhaps it’s time that the climate community turned down some of the tuning knobs.
Finally, Callendar 1938 closed with the relatively optimistic comment that, in addition to the direct benefits of heat and power, there would be indirect benefits: at the northern margin of cultivation, through carbon dioxide fertilization of plant growth, and even through delaying the return of Northern Hemisphere glaciation:

it may be said that the combustion of fossil fuel, whether it be peat from the surface or oil from 10,000 feet below, is likely to prove beneficial to mankind in several ways, besides the provision of heat and power. For instance the above mentioned small increases of mean temperature would be important at the northern margin of cultivation, and the growth of favourably situated plants is directly proportional to the carbon dioxide pressure (Brown and Escombe, 1905): In any case the return of the deadly glaciers should be delayed indefinitely.

This last comment was noted up in Hawkins and Jones 2013, who sniffed in contradiction that “great progress” had subsequently been made in determining whether warming was “beneficial or not”, bowdlerizing Callendar by removing Callendar’s reference to direct benefits (heat and power) and carbon dioxide fertilization:

Since Callendar (1938), great progress has been made in understanding the past changes in Earth’s climate, and whether continued warming is beneficial or not. In 1938, Callendar himself concluded that, “the combustion of fossil fuel [. . .] is likely to prove beneficial to mankind in several ways” , notably allowing cultivation at higher northern latitudes, and because, “the return of the deadly glaciers should be delayed indefinitely”.

Postscript
As noted above, my attention was drawn to Callendar 1938 by occasional CA reader Phil Jones in Hawkins and Jones (2013) (here), which was discussed by coauthor Hawkins here.
Hawkins and Jones (2013) focused on one small aspect of Callendar’s work: his compilation of World Weather Records station temperature data into zonal and global temperature anomalies, in effect, delimiting Callendar, whose contribution was much more diverse, as a sort of John the Baptist of temperature accountancy, merely preparing the way for Phil Jones.
They noted that Callendar was “meticulous” in his work, an adjective that future historians will find hard to apply to present-day CRU. Hawkins and Jones observed that Callendar’s original working papers and station histories had been carefully preserved (at the University of East Anglia). The preservation of Callendar’s original work at East Anglia seems all the more remarkable given that Jones’ CRU notoriously reported that it had failed to preserve the original CRUTEM station data supposedly because of insufficient computer storage – an excuse that ought to have been soundly rejected by the climate community at the time, but which seems even more laughable given the preservation of Callendar’s records.
Postscript2: “Were Callendar’s Estimates Accurate?” is in the background to Richard Allen’s pod-snippet linked by Bishop Hill here. The screenshot in the background appears to be from the poster presentation by Hawkins and Jones – an odd choice of background.
