Two Minutes to Midnight

There is much in the news about how IPCC will handle the growing discrepancy between models and observations – long an issue at skeptic blogs. According to BBC News, a Dutch participant says that “governments are demanding a clear explanation” of the discrepancy. On the other hand, Der Spiegel reports:

German ministries insist that it is important not to detract from the effectiveness of climate change warnings by discussing the past 15 years’ lack of global warming. Doing so, they say, would result in a loss of the support necessary for pursuing rigorous climate policies.

According to Der Spiegel (h/t Judy Curry), Joachim Marotzke has promised that the IPCC will “address this subject head-on”. Troublingly, Marotzke felt it necessary to add that “climate researchers have an obligation not to environmental policy but to the truth”.
Unfortunately, as Judy Curry recently observed, it is now two minutes to midnight in the IPCC timetable. It is now far too late to attempt to craft an assessment of a complicated issue.
Efforts to craft an assessment on the run are further complicated by past failures and neglect both by IPCC and the wider climate science community. In its two Draft Reports sent to external scientific review, while IPCC mostly evaded the problem, its perfunctory assessment of the developing discrepancy between models and observations, such as it was, included major errors and misrepresentations, all tending in the direction of minimizing the issue.
IPCC has a further dilemma in coopering up an assessment on the run. Although the topic is obviously an important one, it received negligible coverage in academic literature, especially prior to the IPCC publication cutoff date, and the few relevant peer-reviewed articles (e.g. Easterling and Wehner 2009; Knight et al 2009) are unconvincing.
The IPCC assessment has also been compromised by gatekeeping by fellow-traveler journal editors, who have routinely rejected skeptic articles on the discrepancy between models and observations or pointing out the weaknesses of articles now relied upon by IPCC. Despite exposure of these practices in Climategate, little has changed. Had the skeptic articles been published (as they ought to have been), the resulting debate would have been more robust and IPCC would have had more to draw on in its present assessment dilemma. As it is, IPCC is surely in a well-earned quandary.
Interested readers should also consult Lucia’s recent post which also comments on leaked IPCC draft material. Lucia’s diagnosis of IPCC’s quandary is very similar to mine. She also uses boxplots.

IPCC Statements
First, I’ll briefly review how IPCC’s position on the discrepancy has developed.
The First Order Draft stated (chapter 1):

The [temperature] observations through 2010 fall within the upper range of the TAR projections (IPCC, 2001) and roughly in the middle of the AR4 model results.

This assertion was flat-out untrue. Their Figure 1.4 (see below), which purported to support this claim, was not derived from peer reviewed literature and was botched. They misplaced observations relative to AR4 model projections (presumably due to an error in transposing reference periods).

Figure 1. IPCC AR5 First Draft Figure 1.4. The brown wedge purports to show AR4 projections. HadCRUT4 values have been overplotted in yellow (and within amendment correspond to the black squares plotted by IPCC) and appear to support IPCC’s summary. However, IPCC mislocated the AR4 and other projections. The red arrows show the actual AR4 envelope for 2005, 2010 and 2015 (digitized from the original AR4 diagram). Observations are outside the properly plotted envelope.
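Since a reference-period mismatch is the presumed source of the misplacement, it may help to see what aligning reference periods involves. The sketch below is only an illustration: the function name `rebaseline`, the 1990-1994 reference window and all of the numbers are hypothetical, and none of it is taken from the IPCC figure or from HadCRUT4.

```python
from statistics import mean

def rebaseline(series, ref_start, ref_end):
    """Shift a {year: anomaly} series so its mean over ref_start..ref_end is zero."""
    ref_values = [v for y, v in series.items() if ref_start <= y <= ref_end]
    if not ref_values:
        raise ValueError("series has no data in the reference period")
    offset = mean(ref_values)
    return {y: v - offset for y, v in series.items()}

# Hypothetical anomalies for illustration only (not real observations).
obs = {1990: 0.10, 1991: 0.05, 1992: -0.05, 1993: 0.00, 1994: 0.10, 2010: 0.40}

# Express the series relative to an assumed 1990-1994 reference period.
obs_aligned = rebaseline(obs, 1990, 1994)
print(obs_aligned[2010])
```

If observations and projections are each expressed relative to a different reference period, one of them is displaced vertically by a constant, which is enough to move points into or out of a projection envelope.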
While the First Draft was a “draft”, the error nonetheless passed IPCC’s own internal review process. The error also went in a “favorable” direction. In the Second Draft, IPCC (chapter 1) re-iterated the assertion that observations were “in the middle” of projections:

the globally-averaged surface temperatures are well within the uncertainty range of all previous IPCC projections, and generally are in the middle of the scenario ranges.

However, their revised Figure 1.4 directly contradicted their claim. Observations since 2007, including the most recent ones, were now outside the AR4 envelope, as shown below.

Figure 2. IPCC AR5 Second Draft Figure 1.4 with annotations: red squares are 2012 and 2013 (to date) HadCRUT4. The orange wedge illustrates combined AR4 A1B-A1T projections. The yellow arrows show verified confidence intervals in 2005, 2010 and 2015 digitized from the original AR4 diagram (Figure 10.26) for A1B. Observed values have been outside the AR4 envelope for all but one year since publication of AR4. IPCC authors added a grey envelope around the AR4 envelope, presumably to give rhetorical support for their false claim about models and observations; however, this envelope did not occur in AR4 or any peer reviewed literature.
In a recent article in National Post, Ross McKitrick pointed out the inconsistency between IPCC’s language and its graphic, acidly observing:

The IPCC must take everybody for fools. Its own graph shows that observed temperatures are not within the uncertainty range of projections; they have fallen below the bottom of the entire span.

Reiner Grundmann at Klimazwiebel also recently drew attention to the discrepancy in this graphic (citing McKitrick).
SPM Draft, June 2013
The Summary for Policy Makers attached to the Second Draft avoided any discussion of the discrepancy between models and observations.
Presumably responding to demands that the discrepancy be addressed, the Government Draft in June 2013 added a lengthy section (Box 9.2) purporting to address the discrepancy between models and observations and the Summary for Policy Makers included two somewhat inconsistent discussions of this issue in connection with both chapter 9 (Evaluation of Climate Models) and chapter 10 (Detection and Attribution).
The chapter 10 summary attributed the discrepancy in “roughly equal measure” to internal variability and a reduced trend in radiative forcing due to recent volcanic activity and downward solar phase:

The observed reduction in warming trend over the period 1998-2012 as compared to the period 1951-2012, is due in roughly equal measure to a cooling contribution from internal variability and a reduced trend in radiative forcing (medium confidence). The reduced trend in radiative forcing is primarily due to volcanic eruptions and the downward phase of the current solar cycle. However, there is low confidence in quantifying the role of changes in radiative forcing in causing this reduced warming trend. {Box 9.2; 10.3.1; Box 10.2}

The chapter 9 summary also conceded the discrepancy, but attributed it “to a substantial degree” to natural variability, with “possible” contributions from forcing – mentioning aerosols as well as solar and volcanics – and, “in some models”, to too strong a response to greenhouse forcing:

Models do not generally reproduce the observed reduction in surface warming trend over the last 10-15 years. There is medium confidence that this difference between models and observations is to a substantial degree caused by unpredictable climate variability, with possible contributions from inadequacies in the solar, volcanic, and aerosol forcings used by the models and, in some models, from too strong a response to increasing greenhouse-gas forcing. {9.4.1, 10.3.1, 11.3.2; Box 9.2} [SPM – evaluation]

The IPCC Second Draft had cited four articles supposedly supporting the consistency of models and observations, three of which were also cited in the Government Draft (Mitchell et al 2012b GRL does not exist at GRL nor can an article by its title be located):

it is found that global temperature trends since 1998 are consistent with internal variability overlying the forced trends seen in climate model projections (Easterling and Wehner, 2009; Mitchell et al., 2012b); see also Figure 1.1, where differences between the observed and multimodel response of comparable duration occurred earlier. Liebmann et al. (2010) conclude that observed HadCRUT3 global mean temperature trends of 2-10 years ending in 2009 are not unusual in the context of the record since 1850. After removal of ENSO influence, Knight et al. (2009) concluded that observed global mean temperature changes over a range of periods to 2008 are within the 90% range of simulated temperature changes in HadCM3.

Both Easterling and Wehner 2009 and Knight et al 2009 had been severely criticized by Lucia in blog posts (see here, here, here, here and here). Lucia was sufficiently annoyed by the defects in Easterling and Wehner 2009 that she submitted a comment to GRL. Though her comment was accurate on all points, it was bench rejected by GRL (see retrospective here). Subsequently, Lucia was co-author of another submission on the discrepancy between models and observations (a group that ecumenically included both Pat Michaels and James Annan), but this too was rejected (see discussion at Judy Curry’s here).
The criticisms in both the Liljegren comment and the Michaels et al submission were valid at the time and remain valid today. Many of their criticisms surfaced recently in Fyfe et al 2013, though this did not rebut Easterling and Wehner 2009 or Knight et al 2009 as directly. Fyfe et al 2013 was not published until after the IPCC deadline and, thus, Easterling and Wehner 2009 and Knight et al 2009 remained unrebutted in academic journals and were essentially all that was in the cupboard for the IPCC assessment.
Ross and I had experienced something similar in our comment on Santer et al 2008, which was likewise rejected by the original journal (International Journal of Climatology.) A couple of years later, Ross managed to get much of this material into print as McKitrick et al 2010. However, in the meantime, Santer et al 2008 continued to be cited in assessment reports. As an ironic footnote to our earlier controversy, AR5 now cites McKitrick et al 2010 and concedes that the discrepancy between models and observations in the tropical troposphere is unresolved.
The Problem Re-stated
IPCC’s Government Draft attempt to frame the discrepancy between models and observations as due to “natural variability” is ultimately a statistical problem – never a strong point of IPCC authors. Further, as noted above, the statistical analysis in the Government Draft purporting to support “natural variability” is not drawn from previously published literature, but was developed within the chapter (despite frequent protestations that IPCC does not itself do research.)
IPCC conceded in the Government Draft that there has been a 15-year “hiatus” (their term) in temperature increase, but asserted that “individual decades” of hiatus are also “exhibited” in climate models, during which time the “energy budget is balanced” by energy uptake in the deep ocean:

However, climate models exhibit individual decades of GMST trend hiatus even during a prolonged phase of energy uptake of the climate system (e. g., Figure 9.8, (Easterling and Wehner, 2009; Knight et al., 2009)), in which case the energy budget would be balanced by increasing subsurface-ocean heat uptake (Meehl et al., 2011; Guemas et al., 2013; Meehl et al., 2013a).

However, pointing to the deep ocean doesn’t actually resolve the discrepancy between models and observations, since, as Hans von Storch recently observed, climate models did not include this effect:

Among other things, there is evidence that the oceans have absorbed more heat than we initially calculated. Temperatures at depths greater than 700 meters (2,300 feet) appear to have increased more than ever before. The only unfortunate thing is that our simulations failed to predict this effect.

IPCC also asserted that similar hiatuses are “common” in the instrumental record:

15-year-long hiatus periods are common in both the observed and CMIP5 historical GMST time series (see [Figure 9.8] and also Section 2.4.3, Figure 2.20; Easterling and Wehner, 2009, Liebmann et al., 2010).

As shown below, there is indeed a lengthy “hiatus” in the 20th century record, stretching almost 40 years from the 1940s until 1980. However, IPCC is surely being a bit sly in saying that 15-year-long hiatus periods were “common” in the 20th century. It is far more reasonable to say that there was a steady temperature increase from the 19th century to the 1940s, followed by a 30-40 year hiatus, then a 30-year period of increase to the end of the century.

Figure 3. HadCRUT4 GLB (black) versus CMIP5 ensemble average (red). Note the lengthy hiatus from the 1940s to 1980. Also note the divergence between models and observations in the 21st century.
The simplest inspection of the above graphic also shows important differences between the present hiatus and the long hiatus between the 1940s and 1980. In the long earlier hiatus, the models ran cooler than observations, whereas the opposite is the case right now. The model ensemble has been running hot for about 14 years and counting. Despite assertions by climate scientists of the supposed statistical insignificance of the divergence, in fact, it is, so to speak, unprecedented: there is no corresponding period in which models ran hot for such an extended period. In most statistical circumstances, residuals that consistently run in one direction at the end of a sample give grounds for statistical concern and not reassurance.
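The point about residuals running in one direction at the end of a sample can be made concrete with a simple run-length check. This is a rough sketch under loudly stated assumptions: the residual values are hypothetical, the helper names are mine, and the probability calculation assumes independent residuals that are equally likely to be positive or negative. Real residuals are autocorrelated, so the calculation understates the chance of a long run and should be read as a diagnostic, not a formal test.

```python
def trailing_run_length(residuals):
    """Length of the final run of residuals sharing the sign of the last one."""
    if not residuals:
        return 0
    last_positive = residuals[-1] > 0
    run = 0
    for r in reversed(residuals):
        # A zero or a sign change ends the run.
        if r == 0 or (r > 0) != last_positive:
            break
        run += 1
    return run

def same_sign_tail_probability(k):
    """P(last k residuals share one sign) under independent 50/50 signs.

    Crude null hypothesis only: 2 * (1/2)**k, covering either sign.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return 0.5 ** (k - 1)

# Hypothetical residuals (model minus observation), for illustration only.
residuals = [-0.05, 0.02, -0.01, 0.03, 0.04, 0.06, 0.08]
k = trailing_run_length(residuals)
print(k, same_sign_tail_probability(k))
```

A more careful treatment would replace the coin-flip null with one that models the serial correlation of the residuals.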
The suddenly-fashionable attribution of the present hiatus to unmodeled energy accumulation in the deep ocean also invites questions about the earlier hiatus, which the climate “community” conventionally attributes to aerosols. There is no independent record of historical aerosol levels, which (e.g. the prominent GISS series by Hansen’s group) have primarily been developed by climate modelers seeking to explain the long hiatus. Skeptics have long argued that aerosol histories have been used as a sort of deus ex machina to paper over excessively sensitive climate models.
Once again, IPCC invoked both “volcanic” and “aerosol” forcing as possible contributors for the present hiatus, but one feels that these efforts were somewhat half-hearted, though they did make their way to the SPM. The failure of IPCC scientists to draw attention in real time to the supposedly responsible volcanic events inevitably compromises any attempts to do so after the fact.
Thus, the sudden interest in positing energy accumulation in the deep ocean.
However, if the present hiatus is attributed to an unmodeled accumulation of energy in the deep ocean, how do we know that something similar didn’t happen during the long earlier hiatus? Could some portion of the earlier hiatus be due to deep ocean accumulation as opposed to aerosols? It’s a big door that’s being opened.
Opening the door also opens up questions about the potential length of the present hiatus. If unmodeled deep ocean processes are involved, how can we say with any certainty that the present hiatus won’t extend for 30-40 years?
Boxplots
In the (new) Box 9.2 of the Government Draft, IPCC conceded that recent 15-year observations run below models, but argued that 15-year trends ending with the big 1998 El Nino undershoot models:

an analysis of the full suite of CMIP5 historical simulations (augmented for the period 2006-2012 by RCP4.5 simulations, Section 9.3.2) reveals that 111 out of 114 realisations show a GMST trend over 1998-2012 that is higher than the entire HadCRUT4 trend ensemble …
During the 15-year period beginning in 1998, the ensemble of HadCRUT4 GMST trends lies below almost all model-simulated trends whereas during the 15-year period ending in 1998, it lies above 93 out of 114 modelled trends.

They then assert that models and observations cohere over the 62-year period from 1951-2012, concluding that there is therefore “very high confidence” in the models and that the 15-year discrepancies are mere fluctuations with the 1998 El Nino skewing recent comparisons:

Over the 62-year period 1951- 2012, observed and CMIP5 ensemble-mean trend agree to within 0.02 ºC per decade (Box 9.2 Figure 1c; CMIP5 ensemble-mean trend 0.13°C per decade). There is hence very high confidence that the CMIP5 models show long-term GMST trends consistent with observations, despite the disagreement over the most recent 15-year period. Due to internal climate variability, in any given 15-year period the observed GMST trend sometimes lies near one end of a model ensemble an effect that is pronounced in Box 9.2, Figure 1a,b since GMST was influenced by a very strong El Niño event in 1998

None of the above analysis by IPCC appears in peer reviewed literature. It is ad hoc analysis that can and should be parsed. Demonstrating that 15-year trend comparisons can yield inconsistent results does not remotely settle the statistical question of models running too hot that is evident in the opening graph. Indeed, it is little more than a debating trick.
The following graph compares models to observations over the period 1979-2013, long enough to place the 1998 El Nino in the middle, but excluding the earlier hiatus of the 1950s and 1960s. 1979 is also when the satellite record commences. The figure is a standard box-and-whiskers diagram of a type routinely used in statistics (rather than some ad hoc method). I’ve shown models with multiple runs as separate boxes and grouped models with singleton runs together. On the right in orange, I’ve done a separate box-and-whisker plot for all models. (Lucia has recently done plots in a similar style: her results look similar, but I haven’t parsed them yet as I’ve been working on this post.)
The figure shows that nearly every run of every model ran too hot over the 1979-2013 period, with many models running substantially too hot. The discrepancy can be seen with box-and-whiskers of the ensemble, but it pervades all models.

Figure 4. Boxplot of GLB temperature (tas) trends (1979-2013) from 109 CMIP5 RCP4.5 model runs versus HadCRUT4.
The boxplot shows fundamental discrepancies that pervade all models. Nor do these inconsistencies have anything to do with 15-year trends or the 1998 El Nino. IPCC’s entire discussion of 15-year trends is completely worthless.
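For readers who want to reproduce this style of comparison, the numbers behind a box-and-whisker plot of trends can be sketched as follows. This is not the script used for Figure 4. The data are synthetic stand-ins generated on the spot, the model names and slopes are invented, and the whiskers here simply run to the minimum and maximum, which is one common convention among several.

```python
import random
from statistics import mean, quantiles

def ols_trend_per_decade(years, values):
    """Ordinary least-squares slope of values against years, per decade."""
    y_bar, v_bar = mean(years), mean(values)
    sxx = sum((y - y_bar) ** 2 for y in years)
    sxy = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
    return 10.0 * sxy / sxx

def box_stats(trends):
    """Minimum, quartiles and maximum of a list of trends."""
    q1, q2, q3 = quantiles(trends, n=4, method="inclusive")
    return {"min": min(trends), "q1": q1, "median": q2, "q3": q3, "max": max(trends)}

# Synthetic stand-in data: NOT CMIP5 output and NOT HadCRUT4.
rng = random.Random(0)
years = list(range(1979, 2014))

def synthetic_run(slope_per_year):
    """Linear trend plus white noise; a deliberately simple placeholder."""
    return [slope_per_year * (y - 1979) + rng.gauss(0, 0.1) for y in years]

runs = {
    "model_A": [synthetic_run(0.025) for _ in range(5)],  # hypothetical model
    "model_B": [synthetic_run(0.020) for _ in range(3)],  # hypothetical model
}
obs_trend = ols_trend_per_decade(years, synthetic_run(0.015))

for name, members in runs.items():
    trends = [ols_trend_per_decade(years, m) for m in members]
    above = sum(t > obs_trend for t in trends)
    print(name, box_stats(trends), f"{above}/{len(trends)} runs above observed")
```

With real data, each list in `runs` would hold the global mean series for one model's runs and `obs_trend` would come from the observational series over the same 1979-2013 window.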
Hiatuses in a Warming World
One final figure demonstrating the problem.
As noted above, IPCC (and others) have observed that hiatuses occur from time to time in climate models but didn’t disclose the scarcity of hiatuses of the length of the present negative trend (13 years from 2001 to 2013, a period that does not include the 1998 El Nino).
To assess this (varying a form of analysis that Lucia has used), I calculated all 13-year trends for all 109 CMIP5 RCP4.5 models presently at KNMI for the warming period 2005-2050, yielding a population of 3379 trends (109 models * 31 starting years). Only about 0.6% of the population were negative (19 of 3379) and only 0.3% (10 of 3379) were lower than the slightly negative observed trend.

So while it is true that 13-year hiatuses occur from time to time in CMIP5 models of a future warming world, they are statistically rather scarce. Given this scarceness, no one can “with medium confidence” attribute the present hiatus to “natural variability” and, whatever the ultimate explanation of the hiatus, IPCC’s attribution “with medium confidence” to “natural variability” is merely wishful thinking.
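The counting exercise described above can be sketched in a few lines. The sketch uses synthetic series (a linear trend plus white noise, with invented slope and noise values) in place of the KNMI model output, so it will not reproduce the reported counts of 19 and 10 out of 3379. Only the population size, 109 series times 31 starting years, is matched, by giving each series 43 annual values.

```python
import random
from statistics import mean

def ols_slope(values):
    """OLS slope of values against time steps 0, 1, 2, ..."""
    n = len(values)
    t_bar = (n - 1) / 2
    v_bar = mean(values)
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in enumerate(values))
    return sxy / sxx

def rolling_trends(series, window):
    """Slopes of every consecutive window of the given length."""
    return [ols_slope(series[i:i + window]) for i in range(len(series) - window + 1)]

def fraction_below(trends, threshold):
    """Share of trends strictly below the threshold."""
    return sum(t < threshold for t in trends) / len(trends)

# Synthetic placeholder for 109 model series; slope and noise are assumptions.
rng = random.Random(42)
N_SERIES, N_YEARS, WINDOW = 109, 43, 13
series_set = [
    [0.02 * t + rng.gauss(0, 0.1) for t in range(N_YEARS)]
    for _ in range(N_SERIES)
]

all_trends = [tr for s in series_set for tr in rolling_trends(s, WINDOW)]
print(len(all_trends))                   # population size
print(fraction_below(all_trends, 0.0))   # share of negative 13-year trends
```

White noise is a simplification. Model series have persistence from year to year, which changes how often negative trends occur, so the real model output is needed for any substantive conclusion.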
Tropical Troposphere
While recent discussion of the discrepancy between models and observations has focused on global surface temperature, the discrepancy between models and observations was first raised in connection with the tropical troposphere, where the discrepancy is even stronger.
In this earlier controversy as well, IPCC and other assessments (e.g. the US CCSP) placed far too much credence in pettifogging arguments by Santer and coauthors that the discrepancy was not “statistically significant”, arguments that were untrue at the time, but which have gone even further offside with the passage of time.
The IPCC has now unequivocally conceded the discrepancy, even citing McKitrick et al 2010 (though not without taking an unwarranted sideswipe at us). In the Second Draft, the IPCC said that explanation of the discrepancy was “elusive”. The new draft refrains from the word “elusive”, but concedes the over-estimate, noting that much of the over-estimate arises from an over-estimate of tropical ocean SST propagated upwards.

In summary, most, though not all, CMIP3 and CMIP5 models overestimate the observed warming trend in the tropical troposphere during the satellite period 1979–2012. Roughly one-half to two-thirds of this difference from the observed trend is due to an overestimate of the SST trend, which is propagated upward because models attempt to maintain static stability.

The inconsistency between models and observations for tropical SST is even stronger than for global temperature – casting further doubt on IPCC’s attribution of the global inconsistency to “natural variability”. In addition, even IPCC’s seemingly broad concession somewhat understates the problem, as all (not “most”) CMIP5 RCP8.5 runs and models run too hot, as shown in the following boxplot:

Figure 5. Boxplot of TRP TLT trends 1979-2013 for CMIP5 RCP8.5 models. Here I’ve used John Christy’s collation of CMIP5 runs. Christy collated RCP8.5 because that was used in Santer et al 2013. The historic portion of RCP8.5 and RCP 4.5 is very similar. It is possible that a couple of RCP4.5 runs will yield lower trends, but the overwhelming point will remain.
Although IPCC largely conceded the discrepancy, they couldn’t help taking a thoroughly unwarranted sideswipe at McKitrick et al 2010, stating:

The very high significance levels of model–observation discrepancies in LT and MT trends that were obtained in some studies (e.g., Douglass et al., 2008; McKitrick et al., 2010) thus arose to a substantial degree from using the standard error of the model ensemble mean as a measure of uncertainty, instead of the ensemble standard deviation or some other appropriate measure for uncertainty arising from internal climate variability.

The very high levels of significance observed in McKitrick et al 2010 occurred because there were very high levels of significance, not because of the use of “inappropriate” statistics. Indeed, as observed in our rejected submission, had Santer et al 2008 used up-to-date data, their own method would have demonstrated the “very high significance levels” that IPCC objects to here.
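For readers unfamiliar with the two quantities named in the IPCC passage, the sketch below computes both for a made-up ensemble. It is not the method of McKitrick et al 2010, Douglass et al 2008 or Santer et al 2008. The trend values are hypothetical, the function name is mine, and observational uncertainty is ignored entirely.

```python
from math import sqrt
from statistics import mean, stdev

def discrepancy_scores(model_trends, observed_trend):
    """Scale the model-minus-observed gap by ensemble SD and by SE of the mean."""
    n = len(model_trends)
    ens_mean = mean(model_trends)
    ens_sd = stdev(model_trends)   # spread of individual model trends
    ens_se = ens_sd / sqrt(n)      # uncertainty of the ensemble mean
    gap = ens_mean - observed_trend
    return {
        "gap": gap,
        "z_using_sd": gap / ens_sd,
        "z_using_se": gap / ens_se,
    }

# Hypothetical trends in deg C per decade; illustration only.
model_trends = [0.20, 0.25, 0.30, 0.22, 0.28, 0.35, 0.18, 0.26, 0.24]
scores = discrepancy_scores(model_trends, observed_trend=0.10)
print(scores)
```

The two scores always differ by a factor of the square root of the ensemble size. The standard error asks whether the ensemble mean differs from observations; the standard deviation asks whether observations fall within the spread of individual models.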
Conclusion
No credence should be given to IPCC’s last-minute attribution of the discrepancy to “natural variability”. IPCC’s ad hoc analysis purporting to support this claim does not stand up to the light of day.
Gavin Schmidt excused IPCC’s failure to squarely address the discrepancy between models and observations saying that it was “just ridiculous” that IPCC be “up to date”:

The idea that IPCC needs to be up to date on what was written last week is just ridiculous.

But the problem did not arise “last week”. While the issue has only recently become acute, it has become acute because of accumulating failure during the AR5 assessment process, including errors and misrepresentations by IPCC in the assessments sent out for external review; the almost total failure of the academic climate community to address the discrepancy; and gatekeeping by fellow-traveling journal editors that suppressed criticism of the defects in the limited academic literature on the topic.
Whatever the ultimate scientific explanation for the pause and its implications for the apparent discrepancy between models and observations, policy-makers must be feeling very let down by the failure of IPCC and its contributing academic community to adequately address an issue that is critical to them and to the public.
That academics (e.g. Fyfe et al here; von Storch here) have finally begun to touch on the problem, but only after the IPCC deadline, must surely add to their frustration. Von Storch neatly summarized the problem and calmly (as he does well) set it out as an important topic of ongoing research, but any investor in the climate research process must surely wonder why this wasn’t brought up six years ago in the scoping of the AR5 report.
One cannot help but wonder whether WG1 Chair Thomas Stocker might not have served the policy community better by spending more time ensuring that the discrepancy between models and observations was properly addressed in the IPCC draft reports, perhaps even highlighting research problems while there was time in the process, than figuring out how IPCC could evade FOI requests.
