Reply to Patrick Brown’s response to my article commenting on his Nature paper

Introduction
I thank Patrick Brown for his detailed response to the statistical issues that I raised in my critique “Brown and Caldeira: A closer look shows global warming will not be greater than we thought” of his and Ken Caldeira’s recent paper (BC17).[1] The provision of more detailed information than was given in BC17, and in particular the results of testing using synthetic data, is welcome. I would reply as follows.
Brown comments that I suggested that, rather than focusing on the simultaneous use of all predictor fields, BC17 should have focused on the results associated with the single predictor field that showed the most skill: the magnitude of the seasonal cycle in outgoing longwave radiation (OLR). He goes on to say: “Thus, Lewis is arguing that we actually undersold the strength of the constraints that we reported, not that we oversold their strength.”
To clarify, I argued that BC17 undersold the statistical strength of the relationships involved, in the RCP8.5 2090 case focussed on in their Abstract, for which the signal-to-noise ratio is highest. But I went on to say that I did not think the stronger relationships would really provide a guide to how much global warming there would actually be late this century on the RCP8.5 scenario, or any other scenario. That is because, as I stated, I disagree with BC17’s fundamental assumption that the relationship of future warming to certain aspects of the recent climate that holds in climate models necessarily also applies in the real climate system. I will return to that point later. But first I will discuss the statistical issues.
Statistical issues
When there are many more predictor variables than observations, the dimensionality of the predictor information has to be reduced in some way to avoid over-fitting. There are a number of statistical approaches to achieving this using a linear model, of which the partial least squares (PLS) regression method used in BC17 is arguably one of the best, at least when its assumptions are satisfied. All methods estimate a statistical model fit that provides a set of coefficients, one for each predictor variable.[2] The general idea is to preserve as much of the explanatory power of the predictors as possible without over-fitting, thus maximizing the fit’s predictive power when applied to new observations.
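As a purely illustrative sketch (Python with scikit-learn, using synthetic data with made-up dimensions rather than BC17’s fields or code), fitting a PLS regression with far fewer observations (models) than predictor variables still yields one coefficient per predictor variable, the dimensionality reduction being achieved through a small number of composite components:

```python
# Minimal illustrative sketch, not BC17's code: PLS regression with many more
# predictor variables than observations, using synthetic stand-in data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_models, n_predictors = 36, 2000                    # hypothetical sizes
X = rng.standard_normal((n_models, n_predictors))    # stand-in predictor fields
y = X[:, :50].mean(axis=1) + 0.3 * rng.standard_normal(n_models)  # stand-in predictand

pls = PLSRegression(n_components=3, scale=True)      # dimensionality reduced to 3 components
pls.fit(X, y)

# The fit provides one coefficient per predictor variable (plus an intercept), so a
# prediction is a weighted sum of the predictors, as noted in footnote [2].
print(pls.coef_.shape)   # one coefficient per predictor (array orientation varies by sklearn version)
y_hat = pls.predict(X)
```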
If the PLS method is functioning as intended, adding new predictors should not worsen the predictive skill of the resulting fitted statistical model. That is because, if those additional predictors contain useful information about the predictand(s), that information should be incorporated appropriately, while if they do not contain any such information they should be given zero coefficients in the model fit. Therefore, the fact that, in the RCP8.5 2090 case – the case with the highest signal-to-noise ratio, focussed on both in BC17 and in my article – the prediction skill when using just the OLR seasonal cycle predictor field is very significantly reduced by adding the remaining eight predictor fields indicates that something is amiss.
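The diagnostic involved can be illustrated with synthetic data (a sketch only; the sizes and variables are hypothetical): compare the leave-one-out RMS prediction error of a PLS fit using a single informative predictor field with that obtained after appending a large block of pure-noise predictors. If appending uninformative predictors materially degrades the cross-validated skill, the method is not weighting them down towards zero as intended.

```python
# Illustrative sketch with synthetic data: does appending uninformative predictor
# fields degrade leave-one-out cross-validated PLS prediction skill?
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
n_models = 36
signal = rng.standard_normal(n_models)                # stand-in for the informative quantity
y = signal + 0.3 * rng.standard_normal(n_models)      # stand-in predictand (e.g. 2090 warming)

X_single = signal[:, None] + 0.1 * rng.standard_normal((n_models, 200))  # one informative field
X_noise = rng.standard_normal((n_models, 1600))                          # uninformative fields
X_all = np.hstack([X_single, X_noise])

def loo_rmse(X, y, n_components=3):
    """RMS leave-one-out prediction error of a PLS regression fit."""
    pred = cross_val_predict(PLSRegression(n_components=n_components), X, y,
                             cv=LeaveOneOut()).ravel()
    return np.sqrt(np.mean((y - pred) ** 2))

print(loo_rmse(X_single, y))   # skill using the informative field alone
print(loo_rmse(X_all, y))      # frequently worse once many pure-noise predictors are added
```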
Brown says that studies are often criticized for highlighting the single statistical relationship that appears to be the strongest while ignoring or downplaying weaker relationships that could have been discussed. However, the logic with PLS is to include progressively weaker relationships but to stop at the point where they are so weak that doing so worsens predictive accuracy: some relationships are sufficiently weak that including them adds too much noise relative to information useful for prediction. My proposal of using just the OLR seasonal cycle to predict RCP8.5 2090 temperature was accordingly in line with the logic underlying PLS; it was not a case of simply ignoring weaker relationships.
Indeed, the first reference that BC17 give for the PLS method (de Jong, 1993) justified PLS by reference to a paper[3] that specifically proposed carrying out the analysis in steps, selecting one variable/component at a time and not adding a further one if doing so worsened the statistical model fit’s predictive accuracy. At the predictor-field level, that strongly suggests that, in the RCP8.5 2090 case, having started with the OLR seasonal cycle field one would not go on to add any of the other predictor fields, since in all cases doing so worsens the fit’s predictive accuracy. Nor would there be any question of using all predictor fields simultaneously, since that also worsens predictive accuracy compared to using just the OLR seasonal cycle field.
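In code, that add-one-at-a-time stopping rule might look like the following sketch (my illustration of the principle, not the procedure actually implemented in BC17 or in the cited paper):

```python
# Sketch of an add-one-at-a-time rule: keep adding PLS components only while each
# additional component improves the leave-one-out RMS prediction error.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def select_n_components(X, y, max_components=10):
    best_rmse, best_k = np.inf, 0
    for k in range(1, max_components + 1):
        pred = cross_val_predict(PLSRegression(n_components=k), X, y,
                                 cv=LeaveOneOut()).ravel()
        rmse = np.sqrt(np.mean((y - pred) ** 2))
        if rmse >= best_rmse:
            break              # adding this component worsened predictive accuracy: stop
        best_rmse, best_k = rmse, k
    return best_k, best_rmse

# Example call on synthetic data (hypothetical sizes):
rng = np.random.default_rng(0)
X = rng.standard_normal((36, 500))
y = X[:, :20].mean(axis=1) + 0.2 * rng.standard_normal(36)
print(select_n_components(X, y))
```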
In principle, even when given all the predictor fields simultaneously, PLS should have been able to weight the predictor variables optimally, building composite components in order of decreasing predictive power, to which the add-one-at-a-time principle could then be applied. However, it evidently was unable to do so in the RCP8.5 2090 case or in other cases. I can think of two reasons for this. One is that the measure of prediction accuracy used, the RMS prediction error under leave-one-out cross-validation, is imperfect. But I think the underlying problem is the non-satisfaction of a key assumption of the PLS method: that the predictor variables are free of uncertainty. Here, although the CMIP5-model-derived predictor variables are accurately measured, they are affected by the GCMs’ internal variability. This uncertainty-in-predictor-values problem was made worse by BC17’s decision to take each model’s predictor values from a single simulation run rather than averaging across all of its available runs.
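A toy numerical illustration of the errors-in-predictors point (the numbers are hypothetical): if a predictor value taken from a single run equals the model’s forced value plus internal-variability noise, averaging over the available runs shrinks the noise variance in proportion to the number of runs.

```python
# Toy illustration: averaging predictor values over m runs reduces the internal-
# variability noise standard deviation by a factor of sqrt(m).
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0          # hypothetical noise-free (forced) predictor value
internal_sd = 0.5         # hypothetical internal-variability standard deviation
n_trials, n_runs = 10000, 4
runs = true_value + internal_sd * rng.standard_normal((n_trials, n_runs))

single_run_error = runs[:, 0] - true_value            # using one run only
run_mean_error = runs.mean(axis=1) - true_value       # averaging the available runs
print(single_run_error.std(), run_mean_error.std())   # the latter is about half (1/sqrt(4)) the former
```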
Brown claims (a) that each model’s own value is included in the multi-model average, which gives the multi-model average an inherent advantage over the cross-validated PLSR estimate, and (b) that this means PLSR is able to provide meaningful Prediction Ratios even when the Spread Ratio is near or slightly above 1. Point (a) is true, but the effect is very minor: based on the RCP8.5 2090 predictions, it would normally cause a 1.4% upwards bias in the Spread Ratio. Since Brown did not adjust for the difference of one in the degrees of freedom involved, the bias is twice that level, but still under 3%. Brown’s claim (b), that PLS regression can provide meaningful Prediction Ratios even when the Spread Ratio is at, or virtually at, the level indicating no more skill than always predicting warming equal to the mean value for the models used to estimate the fit, is self-evidently without merit.
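The arithmetic behind these percentages can be sketched as follows, assuming an ensemble of n = 36 models (a value consistent with the 1.4% figure; the stand-in values below are random, and only the ratios matter):

```python
# Sketch of the degrees-of-freedom arithmetic, assuming n = 36 models.
import numpy as np

n = 36
rng = np.random.default_rng(1)
x = rng.standard_normal(n)                    # stand-in warming values for the n models
xbar = x.mean()

# Spread about the all-model mean (each model's own value included), n-1 degrees of freedom:
s_included = np.sqrt(((x - xbar) ** 2).sum() / (n - 1))

# Fair out-of-sample baseline: each model predicted by the mean of the OTHER n-1 models.
loo_means = (x.sum() - x) / (n - 1)
rms_loo = np.sqrt(((x - loo_means) ** 2).mean())

print(rms_loo / s_included)   # = sqrt(n/(n-1)) ~ 1.014, i.e. the ~1.4% bias
print(n / (n - 1))            # ~ 1.029, i.e. under 3%, if the dof adjustment is also omitted
```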
As Brown indicates, adding random noise affects correlations, and can produce spurious correlations between unrelated variables. His test results using synthetic data are interesting, although they only show Spread Ratios. They show that one of the nine synthetic predictor fields produced a reduction in the Spread Ratio below one that was very marginally (5%) greater than that when using all nine fields simultaneously. But the difference I highlighted in the highest-signal RCP8.5 2090 case, between the reduction in Spread Ratio when using just the OLR seasonal cycle predictor and that when using all predictors simultaneously, was an order of magnitude larger, at 40%. It seems very unlikely that the superior performance of the OLR seasonal cycle on its own arose by chance.
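The ease with which spurious relationships arise in this setting can be seen from a simple sketch (hypothetical dimensions): with only about 36 models, the correlation between the predictand and a completely unrelated noise variable has a standard deviation of roughly 1/sqrt(n-1), about 0.17, so the best-correlated of thousands of pure-noise grid cells will comfortably exceed 0.5 by chance alone.

```python
# Sketch: spurious correlations between a predictand and pure-noise predictors
# when there are only ~36 models but thousands of grid-cell predictors.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_cells = 36, 5000
y = rng.standard_normal(n_models)                  # stand-in predictand
noise = rng.standard_normal((n_models, n_cells))   # unrelated noise "grid cells"

yc = y - y.mean()
Xc = noise - noise.mean(axis=0)
r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())

print(r.std())             # ~0.17, as expected for unrelated variables with n = 36
print(np.abs(r).max())     # the best-correlated noise cell typically exceeds 0.5 by chance
```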
Moreover, the large variation in Spread Ratios and Prediction Ratios between different cases and different (sets of) predictors calls into question the reliability of estimation using PLS. In view of the non-satisfaction of the PLS assumption of error-free predictor variables, a statistical method that does take account of errors in the predictors would arguably be more appropriate. One such method is the RegEM (regularized expectation maximization) algorithm, which was developed for use in climate science.[4] The main version of RegEM uses ridge regression, with the ridge coefficient (the inverse of which is analogous to the number of retained components in PLS) chosen by generalized cross-validation. Ridge-regression RegEM, unlike the TTLS variant used by Michael Mann, produces very stable estimation. I have applied RegEM to BC17’s data in the RCP8.5 2090 case, using all predictors simultaneously.[5] The resulting Prediction Ratio was 1.08 (8% greater warming), well below the comparative 1.12 value Brown arrives at (for grid-level standardization). And using just the OLR seasonal cycle, the excess of the Prediction Ratio over one was only half that for the comparative PLS estimate.
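By way of a rough analogue for readers wanting something concrete: the sketch below is not RegEM itself (RegEM is an expectation-maximization scheme for incomplete data, with its own ridge and GCV machinery), but it illustrates the same basic idea of ridge-type shrinkage with the regularization strength chosen by cross-validation, applied to predictors standardized to unit variance.

```python
# Rough stand-in illustration (not the RegEM algorithm): ridge regression on
# standardized predictors, with the ridge parameter chosen by efficient
# leave-one-out cross-validation.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
n_models, n_predictors = 36, 2000                    # hypothetical sizes
X = rng.standard_normal((n_models, n_predictors))
y = X[:, :50].mean(axis=1) + 0.3 * rng.standard_normal(n_models)

Xs = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize predictors to unit variance

ridge = RidgeCV(alphas=np.logspace(-2, 4, 25))       # cv=None (default) uses leave-one-out
ridge.fit(Xs, y)
print(ridge.alpha_)                                  # selected ridge (regularization) coefficient
```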
Issues with the predictor variables and the emergent constraints approach
I return now to BC17’s fundamental assumption that the relationship of future warming to certain aspects of the recent climate that holds in climate models also applies in the real climate system. They advance various physical arguments for why this might be the case in relation to their choice of predictor variables. They focus on the climatology and seasonal cycle magnitude predictors because they find that, compared with the monthly variability predictor, these have PLS loading patterns more similar to those obtained when targeting shortwave cloud feedback, the prime source of intermodel variation in equilibrium climate sensitivity (ECS).
There are major problems in using climatological values (mean values in recent years) for OLR, outgoing shortwave radiation (OSR) and the top-of-atmosphere (TOA) radiative imbalance N. Most modelling groups target agreement of simulated climatological values of these variables with observed values (very likely spatially as well as in the global mean) when tuning their GCMs, although some do not do so. Seasonal cycle magnitudes may also be considered when tuning GCMs. Accordingly, how close values simulated by each model are to observed values may very well reflect whether and how closely the model has been tuned to match observations, rather than indicating how good the GCM is at representing the real climate system, let alone how realistic its strength of multidecadal warming in response to forcing is.
There are further serious problems with the use of climatological values of TOA radiation variables. First, in some CMIP5 GCMs substantial energy leakages occur, for example at the interface between their atmospheric and ocean grids.[6] Such models are not necessarily any worse at simulating future warming than other models, but they need to be tuned to have TOA radiation fluxes significantly different from observed values in order for their ocean surface temperature change, both to date and in the future, to be realistic.
Secondly, at least two of the CMIP5 models used in BC17 (NorESM1-M and NorESM1-ME) have TOA fluxes and a flux imbalance that differ substantially from CERES observed values, but it appears that this merely reflects differences between derived TOA values and actual top-of-model values. There is very little flux imbalance within the GCM itself.[7] Therefore, it is unfair to treat these models as having lower fidelity – as BC17’s method does for climatology variables – on account of their TOA radiation variables differing, in the mean, from observed values.
Thirdly, most CMIP5 GCMs simulate too cold an Earth: their global mean surface temperature (GMST) is below the actual value, by up to several degrees. It is claimed, for instance in IPCC AR5, that this does not affect their GMST response to forcing. However, it does affect their radiative fluxes. A colder model that simulates TOA fluxes in agreement with observations should not be treated as having good fidelity: with a colder surface its OLR should be significantly lower than observed, so if its OLR is in line with observations then either the model has compensating errors or its OLR has been tuned to compensate, either of which indicates that its fidelity is poorer than it appears. Moreover, complicating the picture, there is an intriguing, non-trivial correlation between preindustrial absolute GMST and ECS in CMIP5 models.
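A rough blackbody calculation (my own illustration, not from BC17) indicates the size of the effect involved: differentiating the Stefan-Boltzmann law at an effective emission temperature of about 255 K gives roughly 3.7–3.8 W/m² per kelvin (the Planck response measured against surface temperature is somewhat smaller, around 3.2 W/m² per kelvin), so a hypothetical cold bias of 2 K implies OLR lower by several W/m², other things being equal.

```python
# Rough illustrative estimate: change in OLR per kelvin from the Stefan-Boltzmann
# law at Earth's effective emission temperature, and the OLR deficit implied by a
# hypothetical 2 K cold bias (a crude blackbody estimate only).
SIGMA = 5.670e-8                  # Stefan-Boltzmann constant, W m^-2 K^-4
T_eff = 255.0                     # approximate effective emission temperature, K
dOLR_dT = 4 * SIGMA * T_eff ** 3  # ~3.8 W/m^2 per K (blackbody estimate)
cold_bias = 2.0                   # hypothetical GMST cold bias, K
print(dOLR_dT)
print(cold_bias * dOLR_dT)        # ~7.5 W/m^2 lower OLR, other things being equal
```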
Perhaps the most serious shortcoming of the predictor variables is that none of them are directly related to feedbacks operating over a multidecadal scale, which (along with ocean heat uptake) is what most affects projected GMST rise to 2055 and 2090. Predictor variables that are related to how much GMST has increased in the model since its preindustrial control run, relative to the increase in forcing – which varies substantially between CMIP5 models – would seem much more relevant. Unfortunately, however, historical forcing changes have not been measured for most CMIP5 models. Although one would expect some relationship between seasonal cycle magnitude of TOA variables and intra-annual feedback strengths, feedbacks operating over the seasonal cycle may well be substantially different from feedbacks acting on a multidecadal timescale in response to greenhouse gas forcing.
Finally, a recent paper by scientists at GFDL laid bare the extent of the problem with the whole emergent constraints approach. They found that, by a simple alteration of the convective parameterization scheme, they could engineer the climate sensitivity of the GCM they were developing, varying it over a wide range, without being able to say that one model version showed greater fidelity in representing recent climate system characteristics than another version with a very different ECS.[8] The conclusion from their Abstract is worth quoting: “Given current uncertainties in representing convective precipitation microphysics and the current inability to find a clear observational constraint that favors one version of the authors’ model over the others, the implications of this ability to engineer climate sensitivity need to be considered when estimating the uncertainty in climate projections.” This strongly suggests that at present emergent constraints cannot offer reliable insight into the magnitude of future warming. And that is before taking account of the possibility that there may be shortcomings common to all or almost all GCMs that lead them to misestimate the climate system response to increased forcing.
 
Nicholas Lewis, 23 December 2017
 
[1] Patrick T. Brown & Ken Caldeira, 2017. Greater future global warming inferred from Earth’s recent energy budget. Nature, doi:10.1038/nature24672.
[2] The predicted value of the predictand is the sum of the predictor variables each weighted by its coefficient, plus an intercept term.
[3] A. Höskuldsson, 1992. The H-principle in modelling with applications to chemometrics. Chemometrics and Intelligent Laboratory Systems, 14, 139–153.
[4] Schneider, T., 2001: Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values. J. Climate, 14, 853–871.
[5] Due to memory limitations I had to reduce the longitudinal resolution by a factor of three when using all predictor fields simultaneously. Note that RegEM standardizes all predictor variables to unit variance.
[6] Hobbs et al., 2016. An Energy Conservation Analysis of Ocean Drift in the CMIP5 Global Coupled Models. J. Climate, doi:10.1175/JCLI-D-15-0477.1.
[7] See discussion following this blog comment.
[8] Ming Zhao et al., 2016. Uncertainty in model climate sensitivity traced to representations of cumulus precipitation microphysics. J. Climate, 29, 543–560.
