The Kaufman Tautology

The revised PAGES2K Arctic reconstruction used 56 proxies (down three from the original 59). Although McKay and Kaufman 2014 didn’t mention the elephant in the room, the changes in their reconstruction (as discussed previously at CA), they reported with some satisfaction that “decadal-scale variability in the revised [PAGES2K] reconstruction is quite similar to that determined by Kaufman et al. (2009)”, presumably thinking that this replication in the larger dataset was evidence of robustness of at least this property of the data. However, while the decadal-scale similarity is real enough, it is more a tautology than evidence of robustness, as 16 of the 18 most highly weighted PAGES2K proxies come from the Kaufman et al 2009 network (the 22 Kaufman 2009 proxies being assigned over 80% of the total weight and the other 34 proxies under 20%).
The Decadal-Scale Similarity
McKay and Kaufman illustrated the decadal-scale similarity between the Kaufman et al 2009 reconstruction and the PAGES2K Arctic (revised) reconstruction in their Figure 2d (shown below). A similar point could have been made about the PAGES2K-2013 version as well. The decadal-scale similarity is real enough.

Figure 1. McKay and Kaufman 2014 Figure 2d showing, on inconsistent scales, revised PAGES Arctic 2K (red) and Kaufman 2009 (black).
As noted in our opening discussions, the scales of the Kaufman et al 2009 and PAGES2K reconstructions are not the same in the above diagram. Figure 2 below compares the Kaufman and PAGES2K Arctic (revised) reconstructions on a consistent scale, overplotting PAGES2K data onto an image used by the New York Times to illustrate the Kaufman reconstruction. From this perspective, the difference in scale is most manifest as much cooler temperatures in the Little Ice Age, especially in the early 19th century.

Figure 2. Comparison of revised PAGES Arctic 2K (blue) and Kaufman 2009 (black), overplotting onto New York Times figure.
Barplot of Weights
The reason for the similarity is not robustness of the results to the new data, but the weights assigned to proxies by the paico algorithm as implemented by PAGES2K (presumably unintentionally). In Figure 3 below, I show a barplot of weights for the PAGES2K proxies calculated by Jean S using the paico decomposition method that we recently used in connection with the Hanhijärvi data.
In the previous post on “Paico Decomposition”, Jean S and I showed that the effective weights of each proxy in a paico reconstruction could be estimated from a dataset in which each column was the difference between the base (uncalibrated) reconstruction and a reconstruction in which each proxy was individually inverted.  The sum of these columns closely approximated the base reconstruction. Thus, the standard deviation of each column measured the effective weight of each proxy.
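The inversion approach just described can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `reconstruct` is a hypothetical stand-in (a fixed linear combination of proxies), not the paico algorithm, and the proxy series and `hidden` coefficients are synthetic. In this linear toy case the difference between the base and the inverted-proxy reconstruction is exactly twice that proxy's contribution, so the sketch halves it; the normalization to unit sum of squares makes that scaling immaterial.

```python
import math
import random
import statistics


def reconstruct(proxies, coeffs):
    """Hypothetical stand-in for a reconstruction method (NOT paico):
    a fixed linear combination of the proxy series."""
    n_years = len(proxies[0])
    return [sum(c * p[t] for c, p in zip(coeffs, proxies)) for t in range(n_years)]


def effective_weights(proxies, recon_fn):
    """Estimate effective weights by inverting one proxy at a time."""
    base = recon_fn(proxies)
    raw = []
    for i in range(len(proxies)):
        flipped = [list(p) for p in proxies]
        flipped[i] = [-v for v in flipped[i]]  # invert proxy i only
        alt = recon_fn(flipped)
        # One column of the decomposition: half the base-minus-inverted difference.
        column = [(b - a) / 2 for b, a in zip(base, alt)]
        raw.append(statistics.pstdev(column))
    # Scale so that the squared weights sum to 1.
    norm = math.sqrt(sum(w * w for w in raw))
    return [w / norm for w in raw]


# Synthetic data: four white-noise "proxies" and assumed hidden coefficients.
random.seed(0)
n_years = 200
proxies = [[random.gauss(0, 1) for _ in range(n_years)] for _ in range(4)]
hidden = [0.6, 0.3, 0.08, 0.02]

weights = effective_weights(proxies, lambda p: reconstruct(p, hidden))
for i, w in enumerate(weights):
    print(f"proxy {i}: effective weight {w:.3f}")
```

Because `effective_weights` only calls the reconstruction as a black box, the same loop could in principle be wrapped around any method whose output responds approximately linearly to each proxy, which is the property the decomposition relies on.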
The barplot in Figure 3 below shows the effective weight of each proxy calculated using this methodology (with the weights standardized so the sum of squares is equal to 1). Proxies previously used in Kaufman et al 2009 are shown in red and other proxies in black.
One easily observes that nearly all of the most heavily weighted proxies had been previously used in Kaufman et al 2009 (16 of top 18; 18 of top 21), with over 80% of the total weight assigned to Kaufman proxies and under 20% to the 34 “new” proxies.  The average weight of a Kaufman-2009 proxy is nearly 5 times greater than the average weight assigned to other proxies.
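The group comparison is simple arithmetic on the weight vector. The sketch below uses made-up weights purely for illustration (they are not the PAGES2K values), and the function name `group_summary` is hypothetical. Since the text does not spell out whether "total weight" means the plain sum of weights or the sum of squared weights, the sketch reports both alongside the ratio of average weights.

```python
def group_summary(weights, in_kaufman):
    """Summarize how weight is split between Kaufman-2009 and other proxies."""
    k09 = [w for w, flag in zip(weights, in_kaufman) if flag]
    other = [w for w, flag in zip(weights, in_kaufman) if not flag]
    return {
        # Share of the plain sum of weights held by Kaufman-2009 proxies.
        "share_of_sum": sum(k09) / sum(weights),
        # Share of the sum of squared weights held by Kaufman-2009 proxies.
        "share_of_squares": sum(w * w for w in k09) / sum(w * w for w in weights),
        # Ratio of the average weight in each group.
        "mean_ratio": (sum(k09) / len(k09)) / (sum(other) / len(other)),
    }


# Illustrative numbers only: three "Kaufman" proxies and four "new" ones.
weights = [0.5, 0.4, 0.3, 0.1, 0.1, 0.05, 0.05]
in_kaufman = [True, True, True, False, False, False, False]

summary = group_summary(weights, in_kaufman)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```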

Figure 3. Barplot of estimated effective weight of each proxy in PAGES2K 2014 (Arctic). Red: also in Kaufman et al 2009; black: not in Kaufman et al 2009. Interestingly, the top weighted proxy is the Hvitarvatn proxy, the orientation of which was inverted between PAGES2K-2013 and PAGES2K-2014. Its high weighting undoubtedly explains the majority of the large change between the two reconstructions. Also note that three Briffa series are in the top 10 (including the 2008 Yamal superstick in the top four).
Discussion
Because of the heavy weighting of Kaufman et al 2009 proxies, the McKay and Kaufman conclusion that the “decadal-scale variability in the revised [PAGES2K] reconstruction is quite similar to that determined by Kaufman et al.” is, as advertised above, more a tautology than evidence of robustness of the result in the additional data.
At the end of the day, any proxy reconstruction either is a linear combination of the underlying proxies or can be closely approximated by such a linear combination. Over the years, I’ve consistently urged that the effective weights be shown for novel methods. Had this been done, I doubt that the above weights would have been the result, since it’s hard to believe that the Arctic2K authors intentionally adopted the above weights. Jean S has done some experiments and there are definitely alternative weighting schemes that can result from slightly varied implementations of paico.
As CA readers are aware, I remain dubious that material benefits arise from putting relatively simple datasets into increasingly complicated and poorly understood multivariate methods. I remain of the opinion that there are better opportunities for improving analysis by first comparing like proxies across regions and unlike proxies within a region, prior to venturing into the assimilation of unlike proxies in different regions. But this recommendation has been mostly rejected by specialists in the field, who remain committed to dumping data into black boxes, but who get huffy when resulting defects are criticized.
Finally, nearly all the difference between the PAGES2K-2013 and the revised result arises from a single proxy (Hvitarvatn, used upside down in the earlier version). Some readers have expressed surprise at the idea that specialists could use proxies upside down, particularly in a multi-author Nature article subsequently relied upon by IPCC, observing that their interpretation as temperature proxies must be very tenuous if even specialists didn’t know which way was up. I agree with this and have written numerous articles critical of varvology; varve proxies have become widely used in post-AR4 multiproxy studies. I think that there may well be usable information in this data, but as long as thick varves are interpreted by some specialists as evidence of cold and by other specialists as evidence of warmth, the first order of business for assessment is to reconcile varve thickness data before dumping the data into a multiproxy composite, rather than after.
