Does the observational evidence in AR5 support its/the CMIP5 models’ TCR ranges?

A guest post by Nic Lewis
Steve McIntyre pointed out some time ago, here, that almost all the global climate models around which much of the IPCC’s AR5 WGI report was centred had been warming faster than the real climate system over the last 35-odd years, in terms of the key metric of global mean surface temperature. The relevant figure from Steve’s post is reproduced as Figure 1 below.

Figure 1. Modelled versus observed decadal global surface temperature trend 1979–2013
Temperature trends in °C/decade. Models with multiple runs have separate boxplots; models with single runs are grouped together in the boxplot marked ‘singleton’. The orange boxplot at the right combines all model runs together. The default settings in the R boxplot function have been used. The red dotted line shows the actual increase in global surface temperature over the same period per the HadCRUT4 observational dataset.

Transient climate response
Virtually all the projections of future climate change in AR5 are based on the mean and range of outcomes simulated by this latest CMIP5 generation of climate models (AOGCMs). Changes in other variables largely scale with changes in global surface temperature. The key determinant of the range and mean level of projected increases in global temperature over the rest of this century is the transient climate response (TCR) exhibited by each CMIP5 model, and their mean TCR. Model equilibrium climate sensitivity (ECS) values, although important for other purposes, provide little information regarding surface warming to the last quarter of this century beyond that given by TCR values.
TCR represents the increase in 20-year mean global temperature over a 70-year timeframe during which CO2 concentrations, rising throughout at 1% p.a. compound, double. More generally, paraphrasing from Section 10.8.1 of AR5 WG1, TCR can be thought of as a generic property of the climate system that determines the global temperature response ΔT to any gradual increase in (effective) radiative forcing (ERF – see AR5 WGI glossary, here) ΔF taking place over a ~70-year timescale, normalised by the ratio of the forcing change to the forcing due to doubling CO2, F2xCO2: TCR = F2xCO2 × ΔT/ΔF. This equation permits warming resulting from a gradual change in ERF over a 60–80 year timescale, at least, to be estimated from the change in ERF and TCR. Equally, it permits TCR to be estimated from such changes in global temperature and in ERF.
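For reference, the relationship above can be written in both of the directions the paragraph describes. The notation follows the text; the second form is only an algebraic rearrangement, not a formula quoted from AR5.

```latex
\[
\mathrm{TCR} \;=\; F_{2\times\mathrm{CO_2}}\,\frac{\Delta T}{\Delta F}
\qquad\Longleftrightarrow\qquad
\Delta T \;=\; \mathrm{TCR}\,\frac{\Delta F}{F_{2\times\mathrm{CO_2}}}
\]
```

When ΔF equals F2xCO2, the second form gives ΔT = TCR, which recovers the definition of TCR as the warming at the point of CO2 doubling.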
The TCRs of the 30 AR5 CMIP5 models featured in WGI Table 9.5 vary from 1.1°C to 2.6°C, with a mean of slightly over 1.8°C. Many projections in AR5 are for changes up to 2081–2100. Applying the CMIP5 TCRs to the changes in CO2 concentration and other drivers of climate change from the first part of this century up to 2081–2100, expressed as the increase in total ERF, explains most of the projected rises in global temperature on the business-as-usual RCP8.5 scenario, although the relationship varies from model to model. Overall the models project about 10–20% faster warming than would be expected from their TCR values, allowing for warming ‘in-the-pipeline’. That discrepancy, which will not be investigated in this article, implies that the mean ‘effective’ TCR of the AR5 CMIP5 models for warming towards the end of this century under RCP8.5 is in the region of 2.0–2.2°C.
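The ‘effective’ TCR range quoted above can be reproduced with a rough calculation. This is a minimal sketch that assumes the 10–20% excess warming simply scales the mean model TCR. The function name is illustrative, and the mean is taken as 1.8°C although the text says "slightly over 1.8°C".

```python
# Mean nominal TCR of the 30 CMIP5 models in AR5 WGI Table 9.5, per the text.
# The text says "slightly over 1.8", so 1.8 is a slight underestimate.
MEAN_CMIP5_TCR = 1.8  # degC

# Excess of projected warming over the TCR-based expectation, per the text.
EXCESS_LOW, EXCESS_HIGH = 0.10, 0.20


def effective_tcr(nominal_tcr, excess_fraction):
    """Scale a nominal TCR by a fractional excess in projected warming.

    Assumption: the excess acts as a simple multiplier on TCR.
    """
    return nominal_tcr * (1.0 + excess_fraction)


low = effective_tcr(MEAN_CMIP5_TCR, EXCESS_LOW)
high = effective_tcr(MEAN_CMIP5_TCR, EXCESS_HIGH)
print(f"Effective TCR: {low:.2f} to {high:.2f} degC")
```

With a mean slightly above 1.8°C, both ends move up a little, consistent with the 2.0–2.2°C region stated in the text.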

Observational evidence in AR5 about TCR
AR5 gives a ‘likely’ (17–83% probability) range for TCR of 1.0–2.5°C, pretty much in line with the 5–95% CMIP5 model TCR range (from fitting a Normal distribution) but with a downgraded certainty level. How does that compare with the observational evidence in AR5? Figure 10.20a thereof, reproduced as Figure 2 here, shows various observationally based TCR estimates.
 
Figure 2. Reproduction of Figure 10.20a from AR5
Bars show 5–95% uncertainty ranges for TCR.[i]
On the face of it, the observational study TCR estimates in Figure 2 offer reasonable support to the AR5 1.0–2.5°C range, leaving aside the Tung et al. (2008) study, which uses a method that AR5 WGI discounts as unreliable. However, I have undertaken a critical analysis of all these TCR studies, here. I find serious fault with all the studies other than Gillett et al. (2013), Otto et al. (2013) and Schwartz (2012). Examples of the faults that I find with other studies are:
Harris et al. (2013): This perturbed physics/parameter ensemble (PPE) study’s TCR range, like its ECS range, almost entirely reflects the characteristics of the UK Met Office HadCM3 model. Despite the HadCM3 PPE (as extended by emulation) sampling a wide range of values for 31 key model atmospheric parameters, the model’s structural rigidities are so strong that none of the cases results in the combination of low-to-moderate climate sensitivity and low-to-moderate aerosol forcing that the observational data best supports – nor could perturbing aerosol model parameters achieve this.
Knutti and Tomassini (2008): This study used initial estimates of aerosol forcing totalling −1.3 W/m2 in 2000, in line with AR4 but far stronger than the best estimate in AR5. Although it attempted to constrain these initial estimates observationally, the study’s use of only global temperature data makes it impossible to separate greenhouse gas and aerosol forcing properly, since their evolution is very highly (negatively) correlated at a global scale. The resulting final estimates of aerosol forcing are still significantly stronger than the AR5 estimates, biasing up TCR estimation. The use of inappropriate uniform and expert priors for ECS in the Bayesian statistical analysis further biases TCR estimation.
Rogelj et al. (2012): This study does not actually provide an observationally-based estimate for TCR. It explicitly sets out to generate a PDF for ECS that simply reflects the AR4 ‘likely’ range and best estimate; in fact it reflects a slightly higher range. Moreover, the paper and its Supplementary Information do not even mention estimation of TCR or provide any estimated PDF for TCR.
Stott and Forest (2007): This TCR estimate is based on the analysis in Stott et al. (2006), an AR4 study from which all four of the unlabelled grey dashed-line PDFs in Figure 10.20a are sourced. It used a detection-and-attribution regression method applied to 20th century temperature observations to scale TCR values, and 20th century warming attributable to greenhouse gases, for three AOGCMs. Gillett et al. (2012) found that just using 20th century data for this purpose biased TCR estimation up by almost 40% compared with when 1851–2010 data was used. Moreover, the 20th century greenhouse gas forcing increase used in Stott and Forest (2007) to derive TCR (from the Stott et al. (2006) attributable warming estimate) is 11% below that per AR5, biasing up its TCR estimation by a further 12%.
In relation to the three studies that I do not find any serious fault with, some relevant details from my analysis are:
Gillett et al. (2013): This study uses temperature observations over 1851–2010 and a detection-and-attribution regression method to scale AOGCM TCR values. The individual CMIP5 model regression-based observationally-constrained TCRs shown in a figure in the Gillett et al. (2013) study imply a best (median[ii]) estimate for TCR of 1.4°C, with a 5–95% range of 0.8–2.0°C.[iii] That compares with a range of 0.9–2.3°C given in the study based on a single regression incorporating all models at once; it is unclear whether that is as suitable a method.
Otto et al. (2013): There are two TCR estimates from this energy budget study included in Figure 10.20a. One estimate uses 2000–2009 data and has a median of 1.3°C, with a 5–95% range of 0.9–2.0°C. The other estimate uses 1970–2009 data and has a median of slightly over 1.35°C, with a 5–95% range of 0.7–2.5°C. Since mean forcing was substantially higher over 2000–2009 than over 1970–2009, and was also less affected by volcanic activity, the TCR estimate based on 2000–2009 data is less uncertain, and arguably more reliable, than that based on 1970–2009 data.
Schwartz (2012): This study derived TCR by zero-intercept regressions of changes, from the 1896–1901 mean, in observed global surface temperature on corresponding changes in forcing, up to 2009, based on forcing histories used in historical model simulations. The mean change in forcing up to 1990 (before the Mount Pinatubo eruption) per the five datasets used to derive the TCR range is close to the best estimate of the forcing change per AR5. The study’s TCR range is 0.85–1.9°C, with a median estimate of 1.3°C.
So the three unimpeached studies in Figure 10.20a support a median TCR estimate of about 1.35°C, and a top of the ‘likely’ range for TCR of about 2.0°C based on downgrading 5–95% ranges, following AR5.

The implication for TCR of the substantial revision in AR5 to aerosol forcing estimates
There has been a 43% increase in the best estimate of total anthropogenic radiative forcing between that for 2005 per AR4, and that for 2011 per AR5. Yet global surface temperatures remain almost unchanged: 2012 was marginally cooler than 2007, whilst the trailing decadal mean temperature was marginally higher. The same 0.8°C warming now has to be spread over a 43% greater change in total forcing, natural forcing being small in 2005 and little different in 2012. The warming per unit of forcing is a measure of climate sensitivity, in this case a measure close to TCR, since most of the increase in forcing has occurred over the last 60–70 years. It follows that TCR estimates that reflect the best estimates of forcing in AR5 should be of the order of 30% lower than those that reflected AR4 forcing estimates.
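The step from a 43% larger forcing change to TCR estimates roughly 30% lower follows from the TCR = F2xCO2 ΔT/ΔF relationship when ΔT is held fixed. A minimal sketch of that arithmetic, assuming the observed warming is unchanged and the whole forcing change scales by the stated factor (the function name is illustrative):

```python
def tcr_scale_factor(fractional_forcing_increase):
    """Factor by which a TCR estimate changes when the forcing change
    grows by the given fraction while the warming stays the same.

    Follows from TCR being proportional to delta_T / delta_F.
    """
    return 1.0 / (1.0 + fractional_forcing_increase)


# 43% increase in best-estimate total anthropogenic forcing (AR4 -> AR5).
scale = tcr_scale_factor(0.43)
reduction_pct = (1.0 - scale) * 100.0
print(f"TCR scales by {scale:.3f}, i.e. about {reduction_pct:.0f}% lower")
```

The reduction is 1 − 1/1.43, not 43%, because forcing sits in the denominator.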
Two thirds of the 43% increase in estimated total anthropogenic forcing between AR4 and AR5 is accounted for by revisions to the 2005 estimate, reflecting improved understanding, with the increase in greenhouse gas concentrations between 2005 and 2011 accounting for almost all of the remainder. Almost all of the revision to the 2005 estimate relates to aerosol forcing. The AR5 best (median) estimate of recent total aerosol forcing is −0.9 W/m2, a large reduction from −1.3 W/m2 (for a more limited measure of aerosol forcing) in AR4. This reduction has major implications for TCR and ECS estimates.
Moreover, the best estimate the IPCC gives in AR5 for total aerosol forcing is not fully based on observations. It is an expert judgement based on a composite of estimates derived from simulations by global climate models and from satellite observations. The nine satellite-observation-derived aerosol forcing estimates featured in Figure 7.19 of AR5 WGI range from −0.09 W/m2 to −0.95 W/m2, with a mean of −0.65 W/m2. Of these, six satellite studies with a mean best estimate of −0.78 W/m2 were taken into account in deciding on the −0.9 W/m2 AR5 composite best estimate of total aerosol forcing.

TCR calculation based on AR5 forcing estimates
Arguably the most important question is: what do the new ERF estimates in AR5 imply about TCR? Over the last century or more we have had a period of gradually increasing ERF, with some 80% of the decadal mean increase occurring fairly smoothly, volcanic eruptions apart, over the last ~70 years. We can therefore use the TCR = F2xCO2 ΔT/ΔF equation to estimate TCR from ΔT and ΔF, taking the change in each between the means for two periods, each long enough for internal variability to be small.
That is exactly the method used, with a base period of 1860–1879, by the ‘energy budget’ study Otto et al. (2013), of which I was a co-author. That study used estimates of radiative forcing that are approximately consistent with estimates from Chapters 7 and 8 of AR5, but since AR5 had not at that time been published the forcings were actually diagnosed from CMIP5 models, with an adjustment being made to reflect satellite-observation-derived estimates of aerosol forcing. However, in a blog-published study, here, I did use the same method but with forcing estimates (satellite-based for aerosols) taken from the second draft of AR5. That study estimated only ECS, based on changes between 1871–1880 and 2002–2011, but a TCR estimate of 1.30°C is readily derived from information in it.
We can now use the robust method of the Otto et al. (2013) paper in conjunction with the published AR5 forcing best (median) estimates up to 2011, the most recent year given. The best periods to compare appear to be 1859–1882 and 1995–2011. These two periods are the longest ones in respectively the earliest and latest parts of the instrumental period that were largely unaffected by major volcanic eruptions. Volcanic forcing appears to have substantially less effect on global temperature than other forcings, and so can distort TCR estimation. Using a final period that ends as recently as possible is important for obtaining a well-constrained TCR estimate, since total forcing (and the signal-to-noise ratio) declines as one goes back in time. Measuring the change from early in the instrumental period maximises the ratio of temperature change to internal variability, and since non-volcanic forcings were small then it matters little that they are known less accurately than in recent decades. Moreover, these two periods are both near the peak of the quasi-periodic ~65 year AMO cycle. Using a base period extending before 1880 limits one to using the HadCRUT surface temperature dataset. However, that is of little consequence since the HadCRUT4 v2 change in global temperature from 1880–1900 to 1995–2011 is identical to that per NCDC MLOST and only marginally below that per GISS.
In order to obtain a TCR estimate that is as independent of global climate models as possible, one should scale the aerosol component of the AR5 total forcing estimates to match the AR5 recent satellite-observation-derived mean of −0.78 W/m2. Putting this all together gives ΔF = 2.03 W/m2 and ΔT = 0.71°C, which, since AR5 uses F2xCO2 = 3.71 W/m2, gives a best estimate of 1.30°C for TCR. The best estimate for TCR would be 1.36°C without scaling aerosol forcing to match the satellite-observation-derived mean.
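The arithmetic behind the 1.30°C figure can be checked directly from the numbers in the paragraph above. This is a minimal sketch and the function name is illustrative. The forcing change for the unadjusted case is not stated in the text, so the value below is back-calculated from the quoted 1.36°C and is only approximate.

```python
F_2XCO2 = 3.71  # W/m2, forcing from doubled CO2 as used in AR5


def tcr_estimate(delta_t, delta_f, f_2xco2=F_2XCO2):
    """TCR = F2xCO2 * delta_T / delta_F, for changes between two periods."""
    if delta_f <= 0:
        raise ValueError("delta_f must be positive for this estimate")
    return f_2xco2 * delta_t / delta_f


# Changes between 1859-1882 and 1995-2011, aerosol forcing scaled to the
# satellite-derived mean (values from the text).
DELTA_T = 0.71  # degC
DELTA_F = 2.03  # W/m2

tcr_scaled = tcr_estimate(DELTA_T, DELTA_F)
print(f"TCR (satellite-scaled aerosols): {tcr_scaled:.2f} degC")

# Back-calculated, NOT quoted in the text: the forcing change implied by
# the 1.36 degC estimate for unadjusted AR5 aerosol forcing, same delta_T.
implied_delta_f_unadjusted = F_2XCO2 * DELTA_T / 1.36
print(f"Implied unadjusted delta_F: ~{implied_delta_f_unadjusted:.2f} W/m2")
```

The implied forcing change is smaller in the unadjusted case, as expected, since the stronger (more negative) aerosol forcing offsets more of the greenhouse gas forcing.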
So, based on the most up to date numbers from the IPCC AR5 report itself and using the most robust methodology on the data with the best signal-to-noise ratio, one arrives at an observationally based best estimate for TCR of 1.30°C, or 1.36°C based on the unadjusted AR5 aerosol forcing estimate.
I selected 1859–1882 and 1995–2011 as they seem to me to be the best periods for estimating TCR. But it is worth looking at longer periods as well, even though the signal-to-noise ratio is lower. Using 1850–1900 and 1985–2011, two periods with mean volcanic forcing levels that, although significant, are well matched, gives a TCR best estimate of 1.24°C, or 1.30°C based on the unadjusted AR5 aerosol forcing estimate. The TCR estimates are even lower using 1850–1900 to 1972–2011, periods that are also well-matched volcanically.
What about estimating TCR over a shorter timescale? If one took ~65 rather than ~130 years between the middles of the base and end periods, and compared 1923–1946 with 1995–2011, the TCR estimates would be almost unchanged. But there is some sensitivity to the exact periods used. An alternative approach is to use information in the AR5 Summary for Policymakers (SPM) about anthropogenic-only changes over 1951–2010, a well-observed period. The mid-range estimated contributions to global mean surface temperature change over 1951–2010 per Section D.3 of the SPM are 0.9°C for greenhouse gases and −0.25°C for other anthropogenic forcings, total 0.65°C. Dividing that 0.65°C by the estimated change in total anthropogenic radiative forcing between 1950 and 2011 of 1.72 W/m2 per Figure SPM.5, reduced by 0.04 W/m2 to adjust to 1951–2010, and multiplying by an F2xCO2 of 3.71 W/m2, implies a TCR of 1.4°C. When the estimate is instead based on the linear trend increase in observed total warming of 0.64°C over 1951–2010 per Jones et al. (2013), the study cited in the section to which the SPM refers (the estimated contribution from internal variability being zero), and on the linear trend increase in total forcing per AR5 of 1.73 W/m2, the implied TCR is also 1.4°C. Scaling the AR5 aerosol forcing estimates to match the mean satellite-observation-derived aerosol forcing estimate would reduce the mean of these two TCR estimates to 1.3°C.
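The two 1951–2010 estimates can be reproduced from the SPM and Jones et al. (2013) figures quoted above. This is a minimal sketch with illustrative variable names. It does not attempt the aerosol-scaled variant, because the scaled forcing change is not given in the text.

```python
F_2XCO2 = 3.71  # W/m2, forcing from doubled CO2 as used in AR5

# Estimate 1: attributed anthropogenic warming per SPM Section D.3.
ghg_warming = 0.9             # degC, greenhouse gases
other_anthro_warming = -0.25  # degC, other anthropogenic forcings
anthro_warming = ghg_warming + other_anthro_warming  # 0.65 degC

# Anthropogenic forcing change 1950-2011 per Figure SPM.5, adjusted by
# 0.04 W/m2 to correspond to 1951-2010.
anthro_forcing = 1.72 - 0.04  # W/m2

tcr_attributed = F_2XCO2 * anthro_warming / anthro_forcing

# Estimate 2: linear-trend changes in observed warming and total forcing.
trend_warming = 0.64  # degC, per Jones et al. (2013)
trend_forcing = 1.73  # W/m2, per AR5

tcr_trend = F_2XCO2 * trend_warming / trend_forcing

print(f"Attributed-warming estimate: {tcr_attributed:.2f} degC")
print(f"Linear-trend estimate:       {tcr_trend:.2f} degC")
```

Both values round to 1.4°C, matching the figures in the text.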

So does the observational evidence in AR5 support its/the CMIP5 models’ TCR ranges?
The evidence from AR5 best estimates of forcing, combined with that in solid observational studies cited in AR5, points to a best (median) estimate for TCR of 1.3°C if the AR5 aerosol forcing best estimate is scaled to match the satellite-observation-derived best estimate thereof, or 1.4°C if not (giving a somewhat less observationally-based TCR estimate). We can compare this with model TCRs. The distribution of CMIP5 model TCRs is shown in Figure 3 below, with a maximally observationally-based TCR estimate of 1.3°C for comparison.

Figure 3. Transient climate response distribution for CMIP5 models in AR5 Table 9.5
The bar heights show how many models in Table 9.5 exhibit each level of TCR
Figure 3 shows an evident mismatch between the observational best estimate and the model range. Nevertheless, AR5 states (Box 12.2) that:
“the ranges of TCR estimated from the observed warming and from AOGCMs agree well, increasing our confidence in the assessment of uncertainties in projections over the 21st century.”
How can this be right, when the median model TCR is 40% higher than an observationally-based best estimate of 1.3°C, and almost half the models have TCRs 50% or more above that? Moreover, the fact that effective model TCRs for warming to 2081–2100 are 10–20% higher than their nominal TCRs means that over half the models project future warming on the RCP8.5 scenario that is over 50% higher than what an observational TCR estimate of 1.3°C implies.
Interestingly, the final draft of AR5 WG1 dropped the statement in the second draft that TCR had a most likely value near 1.8°C, in line with CMIP5 models, and marginally reduced the ‘likely’ range from 1.2–2.6°C to 1.0–2.5°C, at the same time as making the above claim.
So, in their capacity as authors of Otto et al. (2013), we have fourteen lead or coordinating lead authors of the WG1 chapters relevant to climate sensitivity stating that the most reliable data and methodology give ‘likely’ and 5–95% ranges for TCR of 1.1–1.7°C and 0.9–2.0°C, respectively. They go on to suggest that some CMIP5 models have TCRs that are too high to be consistent with recent observations. On the other hand, we have Chapter 12, Box 12.2, stating that the ranges of TCR estimated from the observed warming and from AOGCMs agree well. Were the Chapter 10 and 12 authors misled by the flawed TCR estimates included in Figure 10.20a? Or, given the key role of the CMIP5 models in AR5, did the IPCC process offer the authors little choice but to endorse the CMIP5 models’ range of TCR values?

[i] Note that the PDFs and ranges given for Otto et al. (2013) are slightly too high in the current version of Figure 10.20a. It is understood that those in the final version of AR5 will agree to the ranges in the published study.

[ii] All best estimates given are medians (50% probability points for continuous distributions), unless otherwise stated.

[iii] This range for Gillett et al. (2013) excludes an outlier at either end; doing so does not affect the median.
