by Judith Curry
“Letting go of the phantastic mathematical objects and achievables of model-land can lead to more relevant information on the real world and thus better-informed decision-making.” – Erica Thompson and Lenny Smith
The title and motivation for this post comes from a new paper by Erica Thompson and Lenny Smith, Escape from Model-Land. Excerpts from the paper:
“Model-land is a hypothetical world (Figure 1) in which mathematical simulations are evaluated against other mathematical simulations, mathematical models against other (or the same) mathematical model, everything is well-posed and models (and their imperfections) are known perfectly.”
“It also promotes a seductive, fairy-tale state of mind in which optimising a simulation invariably reflects desirable pathways in the real world. Decision-support in model-land implies taking the output of model simulations at face value (perhaps using some form of statistical processing to account for blatant inconsistencies), and then interpreting frequencies in model-land to represent probabilities in the real-world.”
“It is comfortable for researchers to remain in model-land as far as possible, since within model-land everything is well-defined, our statistical methods are all valid, and we can prove and utilise theorems. Exploring the furthest reaches of model-land in fact is a very productive career strategy, since it is limited only by the available computational resource.”
“For what we term “climate-like” tasks, the realms of sophisticated statistical processing which variously “identify the best model”, “calibrate the parameters of the model”, “form a probability distribution from the ensemble”, “calculate the size of the discrepancy” etc., are castles in the air built on a single assumption which is known to be incorrect: that the model is perfect. These mathematical “phantastic objects”, are great works of logic but their outcomes are relevant only in model-land until a direct assertion is made that their underlying assumptions hold “well enough”; that they are shown to be adequate for purpose, not merely today’s best available model. Until the outcome is known, the ultimate arbiter must be expert judgment, as a model is always blind to things it does not contain and thus may experience Big Surprises.”
The Hawkmoth Effect
The essential, and largely unrecognized, problem with global climate models is model structural uncertainty/error, which Thompson and Smith refer to as the Hawkmoth Effect. A poster by Thompson and Smith provides a concise description of the effect:
“The term “butterfly effect”, coined by Ed Lorenz, has been surprisingly successful as a device for communication of one aspect of nonlinear dynamics, namely, sensitive dependence on initial conditions (dynamical instability), and has even made its way into popular culture. The problem is easily solved using probabilistic forecasts.
“A non-technical summary of the Hawkmoth Effect is that “you can be arbitrarily close to the correct equations, but still not be close to the correct solutions”.
“Due to the Hawkmoth Effect, it is possible that even a good approximation to the equations of the climate system may not give output which accurately reflects the future climate.”
From their (2019) paper:
“It is sometimes suggested that if a model is only slightly wrong, then its outputs will correspondingly be only slightly wrong. The Butterfly Effect revealed that in deterministic nonlinear dynamical systems, a “slightly wrong” initial condition can yield wildly wrong outputs. The Hawkmoth Effect implies that when the mathematical structure of the model is only “slightly wrong”, then even the best formulated probability forecasts will be wildly wrong in time. These results from pure mathematics hold consequences not only for the aims of prediction but also for model development and calibration, ensemble interpretation and for the formation of initial condition ensembles.”
“Naïvely, we might hope that by making incremental improvements to the “realism” of a model (more accurate representations, greater details of processes, finer spatial or temporal resolution, etc.) we would also see incremental improvement in the outputs. Regarding the realism of short-term trajectories, this may well be true. It is not expected to be true in terms of probability forecasts. The nonlinear compound effects of any given small tweak to the model structure are so great that calibration becomes a very computationally-intensive task and the marginal performance benefits of additional subroutines or processes may be zero or even negative. In plainer terms, adding detail to the model can make it less accurate, less useful.”
JC note: This effect bears on the controversy surrounding the very high values of ECS in the latest CMIP6 global model simulations (see section 5 in What’s the worst case?), which are largely related to the incorporation of more sophisticated parameterizations of cloud-aerosol interactions.
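To make the distinction between the two effects concrete, here is a minimal numerical sketch (my own, not from Thompson and Smith) using the logistic map as a toy nonlinear system. The map, the parameter value, and the form of the structural perturbation are illustrative assumptions only.

```python
# Toy illustration of initial-condition error (butterfly) versus
# structural error (hawkmoth-like) in a chaotic map.
import numpy as np

def logistic(x, r=3.99):
    """The standard logistic map, standing in for 'the correct equations'."""
    return r * x * (1.0 - x)

def logistic_perturbed(x, r=3.99, eps=1e-3):
    """A structurally 'slightly wrong' model: same parameter, slightly different equation."""
    return r * x * (1.0 - x) + eps * np.sin(2.0 * np.pi * x)

def trajectory(step, x0, n=50):
    """Iterate a map n times from x0 and return the full trajectory."""
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1]))
    return np.array(xs)

x0 = 0.4
truth     = trajectory(logistic, x0)            # "reality"
butterfly = trajectory(logistic, x0 + 1e-8)     # tiny initial-condition error
hawkmoth  = trajectory(logistic_perturbed, x0)  # tiny structural error, exact x0

print("step   |truth-butterfly|   |truth-hawkmoth|")
for n in (10, 20, 30, 40, 50):
    print(f"{n:4d}   {abs(truth[n] - butterfly[n]):16.3e}   {abs(truth[n] - hawkmoth[n]):15.3e}")
```

Both error columns grow to order one within a few dozen iterations: being “arbitrarily close” to the correct initial condition, or to the correct equations, does not keep the simulated trajectory close to reality for long.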
Fitness for purpose
From the Thompson and Smith paper:
“How good is a model before it is good enough to support a particular decision – i.e., adequate for the intended purpose (Parker, 2009)? This of course depends on the decision as well as on the model, and is particularly relevant when the decision to take no action at this time could carry a very high cost. When the justification of the research is to inform some real-world time-sensitive decision, merely employing the best available model can undermine (and has undermined) the notion of the science-based support of decision making, when limitations like those above are not spelt out clearly.”
“Is the model used simply the “best available” at the present time, or is it arguably adequate for the specific purpose of interest? How would adequacy for purpose be assessed, and what would it look like? Are you working with a weather-like task, where adequacy for purpose can more or less be quantified, or a climate-like task, where relevant forecasts cannot be evaluated fully? How do we evaluate models: against real-world variables, or against a contrived index, or against other models? Or are they primarily evaluated by means of their epistemic or physical foundations? Or, one step further, are they primarily explanatory models for insight and understanding rather than quantitative forecast machines? Does the model in fact assist with human understanding of the system, or is it so complex that it becomes a prosthesis of understanding in itself?”
“Using expert judgment, informed by the realism of simulations of the past, to define the expected relationship of model with reality and critically, to be very clear on the known limitations of today’s models and the likelihood of solving them in the near term, for the questions of interest.”
My report, Climate Models for Laypersons, addressed the issue of fitness for purpose of global climate models for attribution of 20th century global warming:
“Evidence that the climate models are not fit for the purpose of identifying with high confidence the relative proportions of natural and human causes to the 20th century warming is as follows:
- substantial uncertainties in equilibrium climate sensitivity (ECS)
- the inability of GCMs to simulate the magnitude and phasing of natural internal variability on decadal-to-century timescales
- the use of 20th century observations in calibrating/tuning the GCMs
- the failure of climate models to provide a consistent explanation of the early 20th century warming and the mid-century cooling.”
From my article in the CLIVAR Newsletter:
“Assessing the adequacy of climate models for the purpose of predicting future climate is particularly difficult and arguably impossible. It is often assumed that if climate models reproduce current and past climates reasonably well, then we can have confidence in future predictions. However, empirical accuracy, to a substantial degree, may be due to tuning rather than to the model structural form. Further, the model may lack representations of processes and feedbacks that would significantly influence future climate change. Therefore, reliably reproducing past and present climate is not a sufficient condition for a model to be adequate for long-term projections, particularly for high-forcing scenarios that are well outside those previously observed in the instrumental record.”
With regards to 21st century climate model projections, Thompson and Smith make the following statement:
“An example: the most recent IPCC climate change assessment uses an expert judgment that there is only approximately a 2/3 chance that the actual outcome of global average temperatures in 2100 will fall into the central 90% confidence interval generated by climate models. Again, this is precisely the information needed for high-quality decision support: a model-based forecast, completed by a statement of its own limitations (the Probability of a “Big Surprise”).”
While the above statement is mostly correct, the IPCC does not provide a model-based forecast, since it admittedly ignores future volcanic and solar variability.
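As a rough back-of-the-envelope illustration of what that expert judgment implies (my own sketch, assuming, purely for illustration, that both the model-land and real-world uncertainties are Gaussian): if a nominal 90% interval has only about 2/3 real-world coverage, the implied real-world spread is roughly 1.7 times the model-land spread.

```python
# How much wider is the "real" uncertainty than the model-land uncertainty,
# if a nominal 90% interval is judged to have only ~2/3 real-world coverage?
# Gaussian errors are an illustrative assumption, not a claim about the IPCC's method.
from scipy.stats import norm

nominal_coverage = 0.90      # coverage claimed in model-land
judged_coverage = 2.0 / 3.0  # expert-judged real-world coverage

z_nominal = norm.ppf(0.5 + nominal_coverage / 2)  # half-width in model-land sigmas (~1.645)
z_real = norm.ppf(0.5 + judged_coverage / 2)      # same half-width in real-world sigmas (~0.967)

print(f"implied real-world spread ~ {z_nominal / z_real:.2f} x model-land spread")
# -> roughly 1.7
```

In other words, the expert judgment quietly inflates the model-based uncertainty by a large factor – which is exactly the kind of information that should be stated explicitly for decision support.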
Personally, I think that the situation with regards to 21st century climate projections is much worse. From Climate Models for Laypersons:
“The IPCC’s projections of 21st century climate change explicitly assume that carbon dioxide is the control knob for global climate. The CMIP climate model projections of the 21st century climate used by the IPCC are not convincing as predictions because of:
- failure to predict the warming slowdown in the early 21st century
- inability to simulate the patterns and timing of multidecadal ocean oscillations
- lack of account for future solar variations and solar indirect effects on climate
- neglect of the possibility of volcanic eruptions that are more active than the relatively quiet 20th century
- apparent oversensitivity to increases in greenhouse gases”
With regards to fitness for purpose of global/regional climate models for climate adaptation decision making, there are two particularly relevant articles:
- The Myopia of Imperfect Climate Models, by Frigg, Smith and Stainforth
- On the use and misuse of climate change projections in international development by Nissan et al.
“When a long-term view genuinely is relevant to decision making, much of the information available is not fit for purpose. Climate model projections are able to capture many aspects of the climate system and so can be relied upon to guide mitigation plans and broad adaptation strategies, but the use of these models to guide local, practical adaptation actions is unwarranted. Climate models are unable to represent future conditions at the degree of spatial, temporal, and probabilistic precision with which projections are often provided, which gives a false impression of confidence to users of climate change information.”
Pathways out of model-land and back to reality
Thompson and Smith provide the following criteria for identifying whether you are stuck in model land with a model that is not adequate for purpose:
“You may be living in model-land if you…
- try to optimize anything regarding the future;
- believe that decision-relevant probabilities can be extracted from models;
- believe that there are precise parameter values to be found;
- refuse to believe in anything that has not been seen in the model;
- think that learning more will reduce the uncertainty in a forecast;
- explicitly or implicitly set the Probability of a Big Surprise to zero (that there is nothing your model cannot simulate);
- want “one model to rule them all”;
- treat any failure, no matter how large, as a call for further extension to the existing modelling strategy.”
“Where we rely more on expert judgment, it is likely that models with not-too-much complexity will be the most intuitive and informative, and reflect their own limitations most clearly.”
“In escaping from model-land we do not discard models completely; rather, we aim to use them more effectively. The choice is not between model-land or nothing. Instead, models and simulations are used to the furthest extent that confidence in their utility can be established, either by quantitative out-of-sample performance assessment or by well-founded critical expert judgment.”
Thompson and Smith focus on the desire to provide probabilistic forecasts to support real-world decision making, while at the same time providing some sense of uncertainty/confidence about these probabilities. IMO once you start talking about the ‘probability of the probabilities,’ then you’ve lost the plot in terms of anything meaningful for decision making.
Academic climate economists seem to want probabilities (with or without any meaningful confidence in them), as do some in the insurance sector and the broader financial sector. The decision makers that I work with seem less interested in probabilities. Those in the financial sector want a very large number of scenarios (including plausible worst cases) and are less interested in the actual probabilities of weather/climate outcomes. Those in non-financial sectors mostly want a ‘best guess’ with a range of uncertainty (nominally the ‘very likely’ range), in order to assess to what degree they should be concerned about local climate change relative to other concerns.
As argued in my paper Climate Change: What’s the Worst Case?, model inadequacy and an inadequate number of simulations in the ensemble preclude producing unique or meaningful probability distributions from the frequency of model outcomes of future climate. I further argued that statistical creation of ‘fat tails’ from limited information about a distribution can produce very misleading information. Instead, I argued for creating a possibility distribution of future scenarios, which can be generated in a variety of ways (including with global climate models), with a ‘necessity’ function describing the level and type of justification for each scenario.
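For concreteness, here is a minimal sketch (with entirely made-up scenario labels and numbers) of what a possibility assignment over discrete scenarios, and the corresponding necessity measure, could look like; it uses the standard possibility-theory relations rather than anything specific from my paper.

```python
# Illustrative possibility degrees for a handful of hypothetical scenarios
# (0 = ruled out by current evidence, 1 = fully plausible). Values are made up.
SCENARIOS = {
    "scenario_low":      0.3,
    "scenario_central":  1.0,
    "scenario_high":     0.6,
    "scenario_extreme":  0.2,
}

def possibility(event):
    """Possibility of an event (a set of scenarios): the max over its members."""
    return max(SCENARIOS[s] for s in event)

def necessity(event):
    """Necessity = 1 - possibility of the complement: how strongly the
    evidence requires the event, rather than merely allowing it."""
    complement = set(SCENARIOS) - set(event)
    return 1.0 - possibility(complement) if complement else 1.0

event = {"scenario_central", "scenario_high"}
print("possibility:", possibility(event))  # 1.0 -> fully consistent with the evidence
print("necessity:  ", necessity(event))    # 0.7 -> moderately well justified
```

The necessity value is one way of expressing the level of justification: an outcome can be entirely possible yet only weakly necessary if alternative scenarios remain nearly as plausible.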
Expert judgment is unavoidable in dealing with projections of future climates, but expert judgment on model adequacy for purpose is arguably more associated with model ‘comfort’ than with any rigorous assessment (see my previous post Culture of building confidence in climate models).
The ‘experts’ are currently stymied by the latest round of CMIP6 climate model simulations, where about half of them (so far) have equilibrium climate sensitivity values exceeding 4.7C – well outside the bounds of the long-established likely range of 1.5-4.5C. It will be very interesting to see how this plays out – do you toss out the climate model simulations, or the long-standing range of ECS values that is supported by multiple lines of evidence?
Application of expert judgment to assess the plausibility of future scenario outcomes, rather than assessing the plausibility of climate model adequacy, is arguably more useful.
Alternative scenario generation methods
An earlier paper by Smith and Stern (2011) argues that there is value in scientific speculation on policy-relevant aspects of plausible, high-impact scenarios, even though we can neither model them realistically nor provide a precise estimate of their probability. A surprise occurs if a possibility that had not even been articulated becomes true. Efforts to avoid surprises begin with ensuring there has been a fully imaginative consideration of possible future outcomes.
For examples of alternative scenario generation that are of particular relevance to regional climatic change (which is exceptionally poorly simulated by climate models), see these previous posts:
Historical and paleoclimate data, statistical forecast models, climate dynamics considerations and simple climate models can provide the basis for alternative scenario generation.
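As one example of the last option, here is a minimal sketch of generating a bounding set of warming scenarios with a zero-dimensional energy-balance model; the forcing pathways, heat capacity and feedback values are illustrative assumptions of mine, not calibrated estimates.

```python
# Scenario generation with a zero-dimensional energy-balance model:
#   C dT/dt = F(t) - lambda * T
# All parameter values and forcing pathways below are illustrative only.
import numpy as np

def ebm_scenario(forcing_end_wm2, lam, heat_capacity=8.0, years=80):
    """Integrate the energy-balance model with a linear forcing ramp.

    forcing_end_wm2 : forcing reached at the end of the run (W/m^2)
    lam             : climate feedback parameter (W/m^2 per K); smaller means more sensitive
    heat_capacity   : effective heat capacity (W yr/m^2 per K), illustrative value
    """
    dt = 1.0  # one-year explicit Euler steps
    T = 0.0
    temps = []
    for year in range(years):
        F = forcing_end_wm2 * (year + 1) / years  # linear ramp in forcing
        T += dt * (F - lam * T) / heat_capacity
        temps.append(T)
    return np.array(temps)

# Bound a plausible range by sweeping feedback strength and forcing pathway,
# rather than assigning probabilities to individual runs.
for lam in (0.8, 1.2, 1.8):            # roughly high, medium, low sensitivity
    for forcing in (3.0, 4.5, 6.0):    # illustrative end-of-run forcings (W/m^2)
        warming = ebm_scenario(forcing, lam)[-1]
        print(f"lambda={lam:.1f} W/m^2/K, F_end={forcing:.1f} W/m^2 -> ~{warming:.1f} K")
```

The point is not the particular numbers but the workflow: a cheap, transparent model swept over a wide range of assumptions produces a spread of scenarios whose plausibility can then be argued scenario by scenario, rather than read off as frequencies from an ensemble.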
Given the level and types of uncertainty, efforts to bound the plausible range of future scenarios make more sense for decision making than assessing the probability of probabilities and statistically manufacturing ‘fat tails.’
Further, this approach is a heck of a lot less expensive than endless enhancements to climate models run on the world’s most powerful supercomputers – enhancements that don’t address the fundamental structural problems arising from the nonlinear interactions of two chaotic fluids.
Kudos to Thompson and Smith for their insightful paper and for drawing attention to this issue.