On the likelihood of recent record warmth

by Judith Curry
[O]ur results suggest that the recent record temperature years are roughly 600 to 130,000 times more likely to have occurred under conditions of anthropogenic than in its absence.  – Mann et al.

Recently published in Nature [link; full paper available]:
The Likelihood of Recent Record Warmth
M.E. Mann, S. Rahmstorf, B.A. Steinman, M. Tingley, and S.K. Miller
Abstract. 2014 was nominally the warmest year on record for both the globe and northern hemisphere based on historical records spanning the past one and a half centuries (refs. 1, 2). It was the latest in a recent run of record temperatures spanning the past decade and a half. Press accounts reported odds as low as one-in-650 million that the observed run of global temperature records would be expected to occur in the absence of human-caused global warming. Press reports notwithstanding, the question of how likely observed temperature records may have been both with and without human influence is interesting in its own right. Here we attempt to address that question using a semi-empirical approach that combines the latest (CMIP5, ref. 3) climate model simulations with observations of global and hemispheric mean temperature. We find that individual record years and the observed runs of record-setting temperatures were extremely unlikely to have occurred in the absence of human-caused climate change, though not nearly as unlikely as press reports have suggested. These same record temperatures were, by contrast, quite likely to have occurred in the presence of anthropogenic climate forcing.
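The abstract describes a semi-empirical recipe: take a forced-signal estimate from model simulations, add a statistical noise model, and count how often records occur. Below is a minimal sketch of that generic Monte Carlo idea. It is not Mann et al.'s code. The linear 0.008 K/yr stand-in for the forced signal, the AR(1) parameters (phi = 0.5, sigma = 0.1 K) and the function names are hypothetical choices made for illustration. The ratio of the two probabilities is the kind of "times more likely" figure quoted at the top of the post.

```python
import random

def ar1_noise(n, phi, sigma, rng):
    """Return n values of stationary AR(1) noise: x[t] = phi * x[t-1] + e[t], e ~ N(0, sigma^2)."""
    # Start from the stationary distribution so there is no spin-up transient.
    x = rng.gauss(0.0, sigma / (1.0 - phi ** 2) ** 0.5)
    out = [x]
    for _ in range(n - 1):
        x = phi * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out

def prob_final_year_record(forced, phi, sigma, trials=10000, seed=0):
    """Monte Carlo estimate of P(final year is the warmest) for forced signal + AR(1) noise."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        noise = ar1_noise(len(forced), phi, sigma, rng)
        series = [f + e for f, e in zip(forced, noise)]
        if series[-1] > max(series[:-1]):
            hits += 1
    return hits / trials

# Hypothetical settings: 135 years, with a linear trend standing in for the forced signal.
n_years = 135
unforced = [0.0] * n_years
forced = [0.008 * t for t in range(n_years)]
p_unforced = prob_final_year_record(unforced, phi=0.5, sigma=0.1)
p_forced = prob_final_year_record(forced, phi=0.5, sigma=0.1)
print(p_unforced, p_forced, p_forced / p_unforced)
```

The output depends entirely on the assumed forced signal and noise parameters. Changing either one shifts the ratio.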
The paper is getting some media attention; here are two articles showing the range:

JC comments
My comments were solicited for the Examiner article; here is what I sent the reporter:
The analysis of Mann et al. glosses over 3 major disputes in climate research:
i) Errors and uncertainty in the temperature record, and reconciling the surface temperature record (which shows some warming in the recent decades) against the global satellite record (which shows essentially no warming for the past 18 years).
ii) Climate models that are running significantly too hot. For the past decade, global average surface temperatures have been at the bottom of the envelope of climate model simulations. [link]  Even the very warm year 2015 (anomalously warm owing to a very strong El Nino) is cooler than the multi-model ensemble mean prediction.
iii) How to separate out human-caused climate variability from natural climate variability remains a challenging and unsolved problem. Mann et al. use the method of Steinman et al. to infer the forced variability (e.g. CO2, solar, volcanoes), calculating the internal variability (e.g. from ocean circulations) as a residual. In effect, the multi-model ensemble used by Steinman et al. assumes that all of the recent warming is forced by CO2. My colleagues and I, led by Sergey Kravtsov, recently published a paper in Science [link; see also this blog post] arguing that the method of Steinman et al. is flawed, resulting in a substantial underestimate of the internal variability from large-scale, multi-decadal ocean oscillations.
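Here is a minimal sketch of the residual approach described in iii), run on synthetic data. The 0.007 K/yr trend, the 65-year, 0.1 K oscillation and the 0.08 K white noise are hypothetical values. This is not the Steinman et al. or Kravtsov et al. procedure. It only shows the mechanism under dispute. If the forced-signal estimate absorbs a multidecadal oscillation, the residual treated as internal variability comes out smaller and less persistent. The same is then true of any AR(1) noise model later fit to that residual.

```python
import math
import random

def residual(obs, forced_estimate):
    """Internal variability estimated as observations minus the estimated forced signal."""
    return [o - f for o, f in zip(obs, forced_estimate)]

def std(x):
    """Population standard deviation of a list."""
    m = sum(x) / len(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, i.e. the coefficient an AR(1) noise model would be fit with."""
    m = sum(x) / len(x)
    d = [v - m for v in x]
    return sum(d[t] * d[t - 1] for t in range(1, len(d))) / sum(v * v for v in d)

# Hypothetical synthetic "observations": linear forced trend + 65-year oscillation + white noise.
rng = random.Random(1)
years = range(135)
trend = [0.007 * t for t in years]
oscillation = [0.1 * math.sin(2 * math.pi * t / 65) for t in years]
obs = [a + b + rng.gauss(0.0, 0.08) for a, b in zip(trend, oscillation)]

# Case A: the forced estimate is the true forced trend; the oscillation stays in the residual.
res_a = residual(obs, trend)
# Case B: the forced estimate has absorbed the oscillation as if it were forced.
res_b = residual(obs, [a + b for a, b in zip(trend, oscillation)])

print(std(res_a), lag1_autocorr(res_a))  # larger, more persistent "internal variability"
print(std(res_b), lag1_autocorr(res_b))  # smaller, close to white noise
```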
Global temperatures have overall been increasing for more than 200 years. Human-caused CO2 emissions do not explain a significant amount of this warming prior to 1950. How to attribute the recent variations in global temperature remains an issue associated with substantial uncertainty. The IPCC assessment reports conclude ‘more than half’ of the warming since 1950 is caused by humans, with more than half implying >50% [who knows what this actually implies; see my disagreement with Gavin]. This assessment acknowledges uncertainties in climate models, which find that all of the warming since 1950 is caused by humans. The Mann et al. paper assumes that all of the warming has been caused by humans, which, given our current state of knowledge, is an unwarranted assumption.
———–
An additional comment, too technical to send to the Examiner:
iv) The use of the multi-model ensemble in this way is simply inappropriate from a statistical perspective. See my previous post “How should we interpret an ensemble of climate models?” Excerpts:
Given the inadequacies of current climate models, how should we interpret the multi-model ensemble simulations of the 21st-century climate used in the IPCC assessment reports? This ensemble-of-opportunity consists of models with generally similar structures but different parameter choices and calibration histories. McWilliams (2007) and Parker (2010) argue that current climate model ensembles are not designed to sample representational uncertainty in a thorough or strategic way.
Stainforth et al. (2007) argue that model inadequacy and an insufficient number of simulations in the ensemble preclude producing meaningful probability distributions from the frequency of model outcomes of future climate. Stainforth et al. state: “[G]iven nonlinear models with large systematic errors under current conditions, no connection has been even remotely established for relating the distribution of model states under altered conditions to decision-relevant probability distributions. . . Furthermore, they are liable to be misleading because the conclusions, usually in the form of PDFs, imply much greater confidence than the underlying assumptions justify.”
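To make the sample-size part of the Stainforth et al. argument concrete, here is a sketch that treats the fraction of ensemble members above a threshold as a probability and attaches a binomial standard error. The ten member values are made up. The standard error assumes members are independent, equally plausible draws, which an ensemble of opportunity does not guarantee. It also says nothing about model inadequacy, which is the larger part of the argument.

```python
import math

def ensemble_exceedance(values, threshold):
    """Fraction of ensemble members above threshold, with its binomial standard error.

    Assumes members are independent, equally likely draws (an assumption, not a property
    of an ensemble of opportunity).
    """
    n = len(values)
    k = sum(1 for v in values if v > threshold)
    p = k / n
    se = math.sqrt(p * (1.0 - p) / n)
    return p, se

# Hypothetical 10-member ensemble of temperature anomalies (K).
members = [0.62, 0.71, 0.55, 0.80, 0.68, 0.74, 0.59, 0.66, 0.77, 0.70]
print(ensemble_exceedance(members, 0.75))  # two members exceed 0.75
print(ensemble_exceedance(members, 0.90))  # no member exceeds 0.90
```

When no member exceeds the threshold, the estimate is p = 0 with zero standard error. That is a frequency implying far more confidence than ten simulations can support.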
Nic Lewis’ comments
I solicited comments from Nic Lewis on the paper; he sent some quick initial comments, and the revised version is included below:
Hi Judy, It is a paper that would be of very little scientific value even if it were 100% correct. I have some specific comments:
1. They say “It is appropriate to define a stationary stochastic time series model for using parameters estimated from the residual series.” This is an unsupported assertion. On the contrary, this method of using residuals between the recorded and model-simulated temperature changes to estimate internal variability appears unsatisfactory; the observed record is too short to fully sample internal variability and there is only one instance of it. Moreover, the model parameters and their forcing strengths have quite likely been tuned so that model simulations of the historical period provide a good match to observed temperature changes (e.g. by using strongly negative aerosol forcing changes in the third quarter of the 20th century to provide a better match to the ‘grand hiatus’). Doing so artificially reduces the residuals. Using estimates based on long-period AOGCM unforced control runs, as routinely done in detection and attribution studies, is a much less unsatisfactory method, albeit far from perfect.
2. The proper way of separating anthropogenically forced from naturally forced climate changes and from internal variability is to perform a multimodel detection and attribution analysis, using gridded data not just global or NH means. Two thorough recent studies that did so were used in the IPCC AR5 report to reach its anthropogenic attribution statements. This study does nothing of comparable sophistication and does not enable any stronger statements to be made than in AR5.
3. They say that a long range dependence (long memory) noise process is not supported: ‘Some researchers have argued that climate noise may exhibit first order non-stationary behaviour, i.e. so-called “long-range dependence”. Analyses of both modern and paleoclimate observations, however, support the conclusion that it is only the anthropogenic climate change signal that displays first-order non-stationarity behavior, with climatic noise best described by a stationary noise model. We have nonetheless considered the additional case of (3) ‘persistent’ red noise wherein the noise model is fit to the raw observational series’
Three problems there.
a) The “analyses” they cite is an editorial comment by Michael Mann.
b) Long-range dependency does NOT in general involve first-order non-stationary behavior. A classical case of long-range dependence is a fractional difference model, which is not first-order non-stationary provided that the difference parameter is under 0.5. Such a model is considered a physically plausible, simple, one-adjustable-parameter characterisation of climate internal variability, just as is the first-order autoregressive (AR(1)), short-range dependency model that they use (Hasselmann 1979; Vyushin and Kushner 2009). Imbers et al. (2013) found that both models adequately fitted climate internal variability in GMST over the historical period, but that using the long-range dependency model the uncertainty ranges were larger.
c) Their ‘persistent’ red noise model does NOT have any long range dependence at all (it is an AR(1) model with a differently estimated autocorrelation parameter), so it provides little or no test of the effects of the true internal variability process having long range dependency.
4. Nothing in this study considers the probability of the high recent recorded temperatures having arisen in the case where there is an anthropogenic component but it is less strong than that simulated by the CMIP5 models, e.g. because they are too sensitive. Moreover, simple models involving lower sensitivity to greenhouse gas forcing, but also less aerosol cooling, than in most CMIP5 models can provide as good or better a match to the historical record as the CMIP5 multimodel mean, and a better one if known natural multidecadal variability (AMO) is taken into account. These are really the key questions: is all the warming over the historical period anthropogenic; and, even assuming it is, can it be accounted for by models that are less sensitive to increasing greenhouse gas concentrations? Few serious people these days argue that no part of the warming over the historical period has an anthropogenic cause, which is all that Mann’s method can seek to rule out.
5. I think that their extension of the CMIP5 Historical simulations from 2005 to 2014 is highly questionable. They say “We extend the CMIP5 series through 2014 using the estimates provided by ref. 13 (Supplementary Information).” That means, I think, that they reduce the CMIP5 model temperature trends after 2005 to account for supposedly lower actual than modelled forcing. The SI refers instead to ref. 12. Neither ref. 12 nor ref. 13 appears to provide any such forcing estimates. Ref. 14, which in their reference list bears the title of a different paper (the correct title is ‘Reconciling warming trends’), is highly likely wrong in its conclusion that forcings in CMIP5 models have been overestimated over the last decade or so. They only considered forcings that they thought were changed too positively in models. The issue was properly investigated in a more recent paper, Outten et al. (2015), which considered all forcings and concluded that there was “no evidence that forcing errors play a significant role in explaining the hiatus” – they found a negligible difference in forcing when substituting recent observational estimates for those used in a CMIP5 model.
6. They use modelled SST (TOS) rather than 2 m air temperature (TAS) over the ocean. In principle this is appropriate when comparing with HadCRUT4 and old versions of GISTEMP, but not with the latest version of GISTEMP or with the new NOAA (Karl et al. – MLOST?) record, as these adjust SST to match near-surface air temperature on decadal and multidecadal timescales.
Hope this helps. Nic
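Point 3(b) turns on the difference between short-range and long-range dependence. The sketch below compares two theoretical autocorrelation functions. The first uses the standard recursion for fractionally differenced noise, ARFIMA(0, d, 0). The second is an AR(1) process with phi = d / (1 - d), chosen so that both agree at lag 1. The value d = 0.3 is hypothetical and not fitted to any temperature record. The long-memory correlations decay as a power law rather than exponentially. That is one reason a long-range dependency model can yield wider uncertainty ranges, as Imbers et al. (2013) reported.

```python
def fracdiff_acf(d, max_lag):
    """Autocorrelations rho(0..max_lag) of fractionally differenced noise ARFIMA(0, d, 0).

    Uses rho(k) = rho(k-1) * (k - 1 + d) / (k - d); the process is stationary for -0.5 < d < 0.5.
    """
    if not -0.5 < d < 0.5:
        raise ValueError("d must lie in (-0.5, 0.5) for a stationary process")
    rho = [1.0]
    for k in range(1, max_lag + 1):
        rho.append(rho[-1] * (k - 1 + d) / (k - d))
    return rho

def ar1_acf(phi, max_lag):
    """Autocorrelations rho(k) = phi**k of a stationary AR(1) process."""
    return [phi ** k for k in range(max_lag + 1)]

d = 0.3              # hypothetical long-memory parameter
phi = d / (1 - d)    # AR(1) coefficient giving the same lag-1 autocorrelation
long_memory = fracdiff_acf(d, 50)
short_memory = ar1_acf(phi, 50)
for lag in (1, 10, 50):
    print(lag, round(long_memory[lag], 4), short_memory[lag])
```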
JC Conclusion
The Mann et al. paper certainly provides a punchy headline, and it is a challenge to communicate the problems with the paper to the public.
As I see it, this paper is a giant exercise in circular reasoning:

  1. Assume that the global surface temperature estimates are accurate; ignore the differences with the satellite atmospheric temperatures
  2. Assume that the CMIP5 multi-model ensemble can be used to accurately portray probabilities
  3. Assume that the CMIP5 models adequately simulate internal variability
  4. Assume that external forcing data is sufficiently certain
  5. Assume that the climate models are correct in explaining essentially 100% of the recent warming from CO2

In order for Mann et al.’s analysis to work, you have to buy each of these 5 assumptions; each of these is questionable to varying degrees.
