Picking Cherries in the Gulf of Alaska

The bias arising from ex post selection of sites for regional tree ring chronologies has been a long-standing issue at Climate Audit, especially in connection with Briffa’s chronologies for Yamal and Polar Urals (see tag).  I discussed it most recently in connection with the Central Northwest Territories (CNWT) regional chronology of D’Arrigo et al 2006, in which I showed a remarkable example of ex post selection.
In today’s post, I’ll show a third vivid example of the impact of ex post site selection on the divergence problem in Gulf of Alaska regional chronologies.  I did not pick this chronology as a particularly lurid example after examining multiple sites. This chronology is the first column in the Wilson et al 2016 N-TREND spreadsheet and was the first site in that collection that I examined closely.  It is also a site for which most (but not all) of the relevant data has been archived and which can therefore be examined. Unfortunately, data for many of the Wilson et al 2016 sites has not been archived and, if past experience is any guide, it might take another decade to become available (by which time we will all have “moved on”).
The 2006 and 2014 Chronologies
In this case, the Gulf of Alaska chronology of D’Arrigo et al 2006 was the first long chronology using mountain hemlocks (TSME) from the Gulf of Alaska coast.   It had a pronounced divergence problem (top panel) and was never reported in a technical publication. In 2007, Wilson et al published a second long chronology, which purported to somewhat mitigate the divergence problem. (See Postscript).  In 2014, Wiles et al published a third long Gulf of Alaska TSME chronology (later used in Wilson et al 2016), which was virtually identical to the 2006 version through its early history up to the 18th century or so, but which goes up in the 20th century, seemingly avoiding the divergence problem of the earlier series:

Figure 1. Gulf of Alaska TSME regional chronologies: top – D’Arrigo et al 2006; bottom – Wiles et al 2014, as used in Wilson et al 2016. 
Effect of Site Selection
Both Gulf of Alaska chronologies (D’Arrigo et al 2006 and Wiles et al 2014) used the same two subfossil data sets, both from the coast of Prince William Sound, at the left of the location map shown below as Figure 2 (large red-pink icons).  The identity of subfossil data explains the remarkable similarity of the two versions of the chronology up to about the 18th century: they are similar because they used the same data in this period.
However, the modern portion of the chronologies differs: the D’Arrigo et al 2006 version has a divergence problem, whereas the Wiles et al 2014 version does not.   Both D’Arrigo et al 2006 and Wiles et al 2014 used RCS variations, but Wiles et al only used three (yellow) of ten D06 sites; Wiles et al discarded seven sites used in D’Arrigo et al 2006 (red below) and added five sites not used in D’Arrigo et al (green).  The D06 sites were first listed in the D’Arrigo et al 2006 Supplementary Information in 2012, over seven years after the article was cited by IPCC.
Remarkably, nearly all of the modern sites discarded by Wiles et al (red pins) are located close to and even almost contiguous with the two subfossil sites (both near the coast of Prince William Sound), while the five sites added by Wiles et al are all located about 800 km away near Juneau.

Figure 2. Location map comparing sites in D’Arrigo et al 2006 and Wiles et al 2014. Large red-pink – two subfossil sites used in both studies; red – seven modern sites only used in D’Arrigo et al 2006; yellow – three modern sites used in both studies; green – five modern sites only used in Wiles et al 2014. 
 
The only information in D’Arrigo et al 2006 on the provenance of their Gulf of Alaska data was that they used 820 cores and that its reference was “Wiles et al., Tree-ring evidence for a medieval warm period along the southern coast of Alaska, manuscript in preparation, 2005.”   Unfortunately, this article never appeared and, to my knowledge, there was never any technical publication of the D’Arrigo et al 2006 Gulf of Alaska series.  In 2012, an amendment to the D’Arrigo et al 2006 Supplementary Information finally listed the sites used in the D06 Gulf of Alaska regional chronology (used in the above location map.)
Wiles et al did not reconcile their sites against the sites previously used in D’Arrigo et al and, based on the location map, it is very difficult to contemplate a plausible ex ante rationale.  Indeed, it is hard to think of any rationale for the 800 km migration other than an intent by Wiles et al to “partially circumvent” the divergence problem by only using modern sites that went up, a program described in D’Arrigo et al 2009 (quoted in the previous post) as follows:

The divergence problem can be partially circumvented by utilizing tree-ring data for dendroclimatic reconstructions from sites where divergence is either absent or minimal. (Wilson et al., 2007; Buntgen et al., in press; Youngblut and Luckman, in press).

And, indeed, the divergence problem was definitely on the minds of Wiles et al. In their abstract, they stated that the modern sites in their network showed no “evidence of the so-called divergence effect”. They attributed this to the “moderate elevation” of the sites in their selection of sites:

The moderate elevation at the tree-ring sites has allowed these trees to retain their temperature signal without evidence of the so-called divergence effect, or underestimation of tree-ring inferred temperature trends, which is observed at many northern latitude forest locations.

Later, in the running text, they explained that they “target[ed]” sites where the “trees appear to still be responding positively to temperature” to avoid “bias[ing]” their results:

Here, we use tree-ring records from living hemlock at mid-elevation GOA sites where such trees appear to still be responding positively to temperature as in the past. Targeting such sites, we minimize divergence in the recent period that might bias our results and thus provide a more accurate assessment of contemporary warming relative to previous centuries.

 
It was either cheeky or ignorant on their part to characterize such blatant cherrypicking as a technique to avoid “bias[ing] their results”.   That such strategies are accepted without qualm both by referees and other specialists in the field speaks volumes.
A Replication Puzzle
Even spotting Wiles et al their modern sites, I do not believe that it is possible to replicate their non-declining chronology based on available data.
Wiles et al used 8 modern sites and two subfossil sites (listed in their Table 1).  Measurement data for the two subfossil sites and six of eight modern sites appears to be fully archived at NOAA, but one data set (Wright Mountain) is completely unarchived and an unarchived (and expanded) second version of Eyak Mountain appears to have been used in Wiles et al 2014.  Ironically, Wiles et al 2014 Table 1 specifically (but incorrectly) stated that the Wright Mountain data had been archived at ITRDB.
Nonetheless, the archived data for the two subfossil sites and 6.5 (of 8) modern sites permits calculation of an RCS chronology that one would expect to be quite similar to the chronology reported in Wiles et al 2014.  Using the available data, I therefore calculated an RCS chronology (see bottom panel) using a one-size-fits-all standardization curve, the RCS variant said to have been used according to the running text of Wiles et al 2014.   The correspondence between the Wiles chronology and my emulation is very close up to the 18th century, but I was unable to replicate the closing uptick of the Wiles et al 2014 reconstruction, obtaining instead the closing decline also seen in the D’Arrigo et al 2006 version.
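For readers unfamiliar with RCS, the one-size-fits-all variant can be sketched roughly as follows. This is a minimal Python illustration, not the code behind the emulation in Figure 3. The function name `rcs_chronology`, the input layout and the synthetic cores are all assumptions made for the example; pith offsets are taken as zero and the regional curve is left unsmoothed, whereas practical implementations usually fit a smooth curve.

```python
import numpy as np

def rcs_chronology(series):
    """One-size-fits-all RCS sketch.

    series: list of (first_year, ring_widths) pairs, one per core
            (hypothetical layout; pith offset assumed to be zero).
    Returns (years, chronology).
    """
    max_age = max(len(w) for _, w in series)

    # Regional curve: mean ring width at each ring age, pooled over ALL cores
    sums = np.zeros(max_age)
    counts = np.zeros(max_age)
    for _, w in series:
        w = np.asarray(w, dtype=float)
        sums[:len(w)] += w
        counts[:len(w)] += 1
    regional_curve = sums / counts  # the longest core covers every age

    first = min(y0 for y0, _ in series)
    last = max(y0 + len(w) - 1 for y0, w in series)
    years = np.arange(first, last + 1)

    # Index = ring width / regional curve at that age, averaged by calendar year
    idx_sum = np.zeros(len(years))
    idx_n = np.zeros(len(years))
    for y0, w in series:
        w = np.asarray(w, dtype=float)
        span = slice(y0 - first, y0 - first + len(w))
        idx_sum[span] += w / regional_curve[:len(w)]
        idx_n[span] += 1

    chron = np.full(len(years), np.nan)
    covered = idx_n > 0
    chron[covered] = idx_sum[covered] / idx_n[covered]
    return years, chron

# Synthetic check: two cores with the same age-related decline and no climate signal
age = np.arange(50)
growth = 2.0 * np.exp(-0.05 * age) + 0.5
years, chron = rcs_chronology([(1900, growth), (1920, growth)])
print(years[0], years[-1], np.nanmin(chron), np.nanmax(chron))
```

A site-by-site variant would call the same routine once per site and then average the site chronologies, which is one of the implementation differences that could plausibly matter in a comparison like this one.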

Figure 3. Top – Wiles et al 2014 reconstruction re-scaled to match chronology scale; bottom – emulated RCS chronology using available ITRDB data for sites listed in Wiles et al Table 1.
In the next figure, I’ve tried to highlight the 20th century difference between the two versions by zooming in.  At high frequency, the Wiles et al version and the emulation are very similar, but the emulation (red) shows the characteristic decline (divergence problem), while the Wiles version goes up slightly in the 20th century, with most of the increase due to higher post-1975 values in the Wiles reconstruction.

Figure 4. Detail of chronologies shown in Figure 3.
It is possible that inclusion of the unarchived data from Wright Mountain and Eyak Mountain will reconcile the differences; if so, there is considerable irony in the proposed mitigation of the divergence problem depending on only two sites, neither of which has been archived.  It is also possible that the difference arises from different implementations of poorly described RCS protocols: maybe the chronologies were estimated site by site and averaged, rather than one-size-fits-all.  There is one final possibility that I would never have postulated prior to my recent reconciliation of the D’Arrigo et al Central Northwest Territories regional chronology: in that case, D’Arrigo selectively included cores from a site that went up, while selectively excluding cores from a site that went down.  Without a complete measurement archive, there is little point reflecting further on such matters.
Conclusion
My underlying issue with “regional chronologies” is that the 20th century shape of the chronologies can be dramatically impacted by ex post selection of modern data.  I originally raised the question of ex post data collection in the earliest days of Climate Audit in connection with the NH reconstruction of Jacoby and D’Arrigo 1989.  I wrote many posts on this issue in connection with Briffa’s Yamal and Polar Urals chronologies, where site selection clearly impacted the shape of the chronology (see e.g. here here here here here.)  This was a large controversy leading into Climategate.
In a recent post, I showed that D’Arrigo consciously attempted to “circumvent” the divergence problem by ex post selection of sites that went up, with a surprisingly blunt implementation of this questionable strategy in the CNWT regional chronology of D’Arrigo et al 2006. In today’s post, I showed that the Gulf of Alaska regional chronology is one more example, where the shape of the regional chronology has been impacted by ex post site selection, in this case, with the selective use of sites over 800 km distant from the target subfossil sites.
Some time ago, Gavin Schmidt observed of a chronology of which he disapproved (his objections not actually being valid, but that’s another story):

if any actual scientist had produced such a poorly explained, unvalidated, uncalibrated, reconstruction with no error bars or bootstrapping or demonstrations of common signals etc., McIntyre would have been (rightly) scornful.

Even though the most recent Gulf of Alaska chronology amply meets Schmidt’s criteria of being a “poorly explained, unvalidated, uncalibrated, reconstruction with no error bars or bootstrapping or demonstrations of common signals”, I will content myself with mild (Canadian) disapproval, but would not strongly argue with Schmidt if he wrote a review that was more severely “scornful”.
 
 
 
Postscript – Wilson et al 2007
In 2007, Wilson et al published a third regional chronology using Gulf of Alaska TSME sites.  While this chronology was not used in the Wilson et al 2016 composite, the Supplementary Information of D’Arrigo et al 2006 stated that the cores used in D’Arrigo et al 2006 were identical to the cores used in Wilson et al 2007.   I have concluded that this information is false, but it took me quite a bit of time to be confident of this conclusion and I wish to document my reasoning while it is fresh in my mind.
Wilson et al 2007 had been discussed at Climate Audit soon after publication (also here on varimax rotation).  Needless to say, the measurement data required for analysis was not available at the time of publication.  The comments thread contained a lively exchange between Willis Eschenbach and Rob Wilson about archiving: Eschenbach sharply criticized Wilson and coauthors for failing to archive data concurrent with publication; Wilson attempted to deflect the criticism as overwrought on the grounds that archiving delay, while regrettable, would be slight.  As it turned out, the majority of the missing data wasn’t archived for another five years (2012) and a little is still unarchived, a delay which, in my opinion, more than vindicates Eschenbach’s side of the dispute.
In fall 2009, Kaufman et al (2009) published a multi-proxy Arctic reconstruction, one item in which was a Gulf of Alaska temperature reconstruction attributed to D’Arrigo et al 2006 (which had produced an RCS chronology but not a temperature reconstruction.)  In December 2009, the Supplementary Information to D’Arrigo et al 2006 was amended, including the archiving of the Gulf of Alaska temperature reconstruction used in the recently published Kaufman et al. (All other D06 chronologies remained unarchived until 2012!!)
The 2009 SI amendment stated that the D06 Gulf of Alaska chronology had used the same 820 cores as the Wilson et al 2007 reconstruction:

Wilson et al. 2007 produced a Gulf of Alaska reconstruction based on an STD chronology derived from the same 820 ringwidth series….

820 individual series that were published in the two articles listed above. The Standard Chronology (ak096.crn) was used for the reconstruction by Wilson et al. 2007. The RCS chronology (ak096c.crn) was used in the D’Arrigo et al. 2006 reconstruction.

At this time, two chronologies (ak096.crn and ak096c.crn) and one measurement dataset (ak096.rwl) were contributed to the ITRDB data bank.
However, Wilson et al 2007 (of which Wiles was a coauthor) described an entirely different network than that illustrated in my Figure 2 (based on my reconciliation of the core numbers of ak096.rwl).  Wilson et al listed an opening network of 31 sites in their Table 1.   Wilson et al appear to have calculated RCS chronologies on a site-by-site basis for all 31 sites, which were then screened for correlation to instrumental data, resulting in nine sites being discarded.  The 31 Wilson et al 2007 sites were shown in a location map in the original article, reproduced and annotated below, showing a stretch of the Alaska coastline almost 1000 km long, reaching from the Juneau area on the right to Kodiak Island on the left:
 

Figure 5. Location map from Wilson et al 2007, showing the 31 sites (22 used sites in solid colors), overprinting the D06 sites (magenta +). 
In 2012, more major changes were made to the SI to D’Arrigo et al 2006.  Seven years after my request to IPCC, the 19 regional STD and RCS chronologies were finally archived.  While the STD and RCS chronologies archived in 2012 for Gulf of Alaska matched the two ak096 chronologies archived in 2009, the chronologies for most of the sites appeared in 2012 for the first time.  New 2012 commentary on the Gulf of Alaska chronologies stated that the D06 chronology had been developed from 10 modern sites:

Coastal Alaska
10 Living chronologies:

Data with ITRDB code:
Ellsworth Glacier, Alaska (EL) ITRDB AK015
Rock Glacier (RG) ITRDB AK024
Water Supply (WS) ITRDB AK029
Wolverine Glacier (WV) ITRDB AK030
Tebenkof Glacier (TB) ITRDB AK025
Miners Well (MW) ITRDB AK021
Nichawak Mountain (NK) ITRDB AK022
Cordova Eyak Mountain (CV) ITRDB AK020
Massive Rock near Cordova (MR) ITRDB AK090
Rock Tor (RT) ITRDB AK091

Sub-fossil material: Data not archived and continually being updated.
Relevant contact is Greg Wiles (gwiles@xxx) – primary generator
of the data - and Rob Wilson (rjsw@xxx) who has original
2006 version used.

I’ve marked the location of these sites used in D’Arrigo et al 2006 with a magenta + sign.  Nearly all of the 10 come from the Prince William Sound area (top towards the left), whereas the W2007 sites stretch for about 1000 km along the coast.  The two subfossil sites (used in all long chronologies) both come from the Prince William Sound area (marked with solid magenta dots).  Ironically, although the 2012 SI amendment said that the subfossil data was not archived, it had actually been archived in 2009 (as part of ak096.rwl).
Obviously, the 10 modern sites used in D’Arrigo et al (2006) do not match the 22 modern sites used in Wilson et al 2007. Only nine sites are common. Thirteen sites used in Wilson et al 2007 are not used in D’Arrigo et al 2006, while one site used in D’Arrigo et al (Tebenkof Glacier) was not used in Wilson et al 2007.  It is obviously impossible for the 820 cores used in D’Arrigo et al 2006 to be identical to the cores used in Wilson et al 2007, unless the descriptions in Wilson et al 2007 are completely incorrect.
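The site bookkeeping above can be laid out as a small set comparison. In this sketch the D06 codes are the ones listed in the 2012 SI amendment quoted earlier; the labels for the thirteen Wilson et al 2007-only sites are placeholders invented for the example, since their names are not listed in this post.

```python
# D06 modern sites, using the two-letter codes from the 2012 SI amendment
d06 = {"EL", "RG", "WS", "WV", "TB", "MW", "NK", "CV", "MR", "RT"}

# Per the text, every D06 site except Tebenkof Glacier (TB) also appears in W2007
common = d06 - {"TB"}

# Placeholder labels (hypothetical) for the 13 sites used only in W2007
w07_only = {f"W07_only_{i}" for i in range(1, 14)}
w07 = common | w07_only

print(len(d06), len(w07))   # sizes of the two networks
print(len(d06 & w07))       # sites in both
print(sorted(d06 - w07))    # sites only in D06
print(len(w07 - d06))       # sites only in W2007
print(d06 == w07)           # identical cores would require identical site sets
```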
It is also instructive to review the multivariate methodology of Wilson et al 2007 as a potential contributor to their “circumventing” the divergence problem.  After they had screened their original network from 31 to 22 sites (ex post screening of the type long criticized at Climate Audit), they carried out principal components analysis on the 22 site-by-site chronologies (each of which was calculated as a site STD chronology).  They retained four principal components, which were then subjected to varimax rotation.  They then calculated a temperature reconstruction by regressing instrumental temperature onto the four (rotated) principal components in a calibration period.   The resulting temperature reconstruction (not shown in this post, but its shape is similar to the Wiles et al 2014 reconstruction shown above) did not have the 20th century decline that characterized the D’Arrigo et al (2006) reconstruction.
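The effect of the screening step (31 sites down to 22) can be illustrated with synthetic data. The sketch below is not based on the Wilson et al 2007 network: the series are pure white noise, the rising "instrumental" target and the 0.2 correlation threshold are arbitrary assumptions, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_series, n_years, n_cal = 200, 100, 50

# White-noise "chronologies": no temperature signal by construction
noise = rng.standard_normal((n_series, n_years))

# Hypothetical rising instrumental target over the calibration window
target = np.linspace(0.0, 1.0, n_cal)

# Ex post screening: keep only series that correlate with the target
cal = noise[:, -n_cal:]
r = np.array([np.corrcoef(row, target)[0, 1] for row in cal])
keep = r > 0.2  # arbitrary threshold for illustration

# Compare calibration-period trends of the unscreened and screened composites
t = np.arange(n_cal)
slope_all = np.polyfit(t, cal.mean(axis=0), 1)[0]
slope_kept = np.polyfit(t, cal[keep].mean(axis=0), 1)[0]

print("series kept:", int(keep.sum()), "of", n_series)
print("trend, all series:   ", slope_all)
print("trend, screened only:", slope_kept)
```

Because every retained series is positively correlated with the target, the screened composite necessarily trends upward in the calibration window, even though the inputs contain no signal at all.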
In a recent discussion at Bishop Hill,  Rob Wilson likened the improvement in recent regional chronologies to the improvement from a Trabant to a 2016 BMW Series 1:

Of course there are older versions, but only a fool would use an old version with less data or that had calibration issues etc. Would you rather drive a Trabant or a 2016 BMW series 1. Duh!

Each of the multivariate operations in their PC methodology is linear and thus the temperature reconstruction is necessarily a linear function of the underlying 22 chronologies. However, the technique of Wilson et al (2007) does not constrain the coefficients to remain positive.  Their method can result in negative coefficients, i.e. flipping of series upside down (an issue that Jeff Id and I have discussed on many occasions in the context of Mannian methodology). Even if it is possible to extract information on regional temperature from the tree ring data, in my opinion, complicated multivariate methods like that of Wilson et al 2007 are a retrogression from simpler regional averages, rather than an improvement – let alone an improvement on the order of a Trabant to a BMW Series 1 – unless one were attempting to quantify “improvements” in the technology of “data torture”.
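The linearity point can be checked numerically. The sketch below runs the chain (standardize, PCA, varimax, regression) on synthetic noise standing in for the 22 site chronologies, then collapses it into a single weight per site. The `varimax` routine is a generic textbook implementation, and rotating the scores with the same orthogonal matrix as the loadings is an assumed convention; the exact implementation in Wilson et al 2007 is not described here, so none of this should be read as their code.

```python
import numpy as np

def varimax(loadings, n_iter=100, tol=1e-8):
    """Return an orthogonal rotation matrix for the varimax criterion."""
    p, k = loadings.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(n_iter):
        L = loadings @ R
        grad = loadings.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt
        if s.sum() < crit * (1 + tol):
            break
        crit = s.sum()
    return R

rng = np.random.default_rng(0)
n_years, n_sites, n_pc, n_cal = 120, 22, 4, 60

# Synthetic stand-ins (assumptions): site chronologies and calibration temperature
chron = rng.standard_normal((n_years, n_sites))
temp = rng.standard_normal(n_cal)  # "instrumental" data for the last n_cal years

Z = (chron - chron.mean(axis=0)) / chron.std(axis=0)  # standardize each site
_, _, vt = np.linalg.svd(Z, full_matrices=False)      # PCA via SVD
V = vt[:n_pc].T                                       # loadings of 4 retained PCs
R = varimax(V)
scores = Z @ V @ R                                    # rotated PC scores

# Regress temperature on the rotated PCs over the calibration period
A = np.column_stack([np.ones(n_years), scores])
coef, *_ = np.linalg.lstsq(A[-n_cal:], temp, rcond=None)
recon = A @ coef

# Collapse the whole chain into one weight per (standardized) site chronology
weights = V @ R @ coef[1:]
recon_direct = coef[0] + Z @ weights

print("max difference:", np.abs(recon - recon_direct).max())
print("negative weights:", int((weights < 0).sum()), "of", n_sites)
```

The two reconstructions agree to rounding error, confirming that the procedure is just a weighted sum of the site chronologies, and nothing in it prevents individual weights from coming out negative.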
