Polar Bears, Inadequate Data, and Statistical Lipstick


A recent paper, Internet Blogs, Polar Bears, and Climate-Change Denial by Proxy, by Jeffrey A. Harvey and 13 others, has been creating something of a stir in the blogosphere. The paper’s abstract purports to achieve the following:

Increasing surface temperatures, Arctic sea-ice loss, and other evidence of anthropogenic global warming (AGW) are acknowledged by every major scientific organization in the world. However, there is a wide gap between this broad scientific consensus and public opinion. Internet blogs have strongly contributed to this consensus gap by fomenting misunderstandings of AGW causes and consequences. Polar bears (Ursus maritimus) have become a “poster species” for AGW, making them a target of those denying AGW evidence. *Here, focusing on Arctic sea ice and polar bears, we show that blogs that deny or downplay AGW disregard the overwhelming scientific evidence of Arctic sea-ice loss and polar bear vulnerability.* By denying the impacts of AGW on polar bears, bloggers aim to cast doubt on other established ecological consequences of AGW, aggravating the consensus gap. To counter misinformation and reduce this gap, scientists should directly engage the public in the media and blogosphere.

Reading further into the paper, we find that this seems to be yet another piece of propaganda to push a climate-change agenda. In line with the high standards of climate science “communication”, there are over 50 occurrences of various forms of the derogatory labels “denier” or “deny” in a mere five pages of text and two pages of references. Such derogatory language has become commonplace in the climate change academic world and reflects badly on the authors who use it.

The paper offers nothing new in terms of scientific research on polar bears or on any other topic, so to justify publication, it includes a superfluous “study” of the views held on the subject within blogs and academic papers. My concern is with the accompanying data and statistical analysis providing the “scientific” veneer for their discussion.

The “lipstick” on the paper’s Figure 2 is perhaps one of the best examples of creating a misleading aura of “real science” that I have encountered in some time.



The closely bunched points massed together in the circled groups create the impression that the underlying data must be based on a set of carefully crafted “measurements” capable of representing in-depth, rigorous scientific assessments of a set of papers and blogs. To understand what the figure actually represents and how it came to be, we start by examining the methodology described in the paper for creating an appropriate data set:

As natural and social scientists, we grounded our study in Nisbet’s (2014) typology of frames used by science policymakers and journalists and provide full context and statistical analysis with objective interpretation.

Say what? To make sense of what this might mean, we look at the Nisbet reference cited in the above quotation. I found the following passage indicating what “framing” can be used for.

(From Nisbet, p.43):
Scientists can use framing to motivate greater interest and concern; to shape preferences for policies informed by or supportive of science; to influence political or personal behavior; to go beyond polarization and unite various publics around common ground; to define policy choices or options; and/or to rally fellow scientists around shared goals or strategy.

OK. Got it. Translation: We need to produce some climate science propaganda so we will design the method for collecting our data with that in mind.

Next step:

We conducted a content analysis to categorize how blogs presented evidence of or opinions on AGW to explain the current and future effects of AGW on Arctic ice extent and polar-bear status (Braun and Clarke 2006). On the basis of statements regarding the current and future status of Arctic sea-ice extent and polar-bear populations, we entered keywords, including global warming, climate change, polar bear, and Arctic ice, into Google’s search engine. From the blogs, we identified common positions on Arctic ice extent (1–3) and polar-bear status (4–6) and methodically coded each entry’s stated positions from the 90 blogs using a constant comparative approach, ensuring that no additional codes were required (Kolb 2012).

No wonder they needed 14 authors to create this paper! “Methodical coding” of this type must require a lot of work.

From Kolb, p.1:
In grounded theory the researcher uses multiple stages of collecting, refining, and categorizing the data (Strauss & Corbin). As identified in the literature, making constant comparisons and applying theoretical sampling are necessary strategies used for developing grounded theory.

The constant comparative method “combines systematic data collection, coding, and analysis with theoretical sampling in order to generate theory that is integrated, close to the data, and expressed in a form clear enough for further testing” (Conrad, Neumann, Haworth, & Scott, 1993, p. 280).

That sounds impressive! Using the above “guidelines”, we searched for some hot button key words on blogs to produce some data which we can then code and interpret.

Continuing with the data-gathering process:

Each blog was coded for stated positions on these two topics (Arctic ice extent and polar-bear status). The six codes identified were the following: (1) sea-ice extent is on average declining rapidly in the Arctic; (2) seaice extent is decreasing only marginally, is not decreasing significantly, or is currently recovering in the Arctic; (3) changes in sea-ice extent in the Arctic are due to natural variability, and it is impossible to predict future conditions; (4) polar bears are threatened with extinction by present and future AGW; (5) polar bears are not threatened with extinction by present and future AGW; and (6) polar bears will adapt to any future changes in Arctic ice extent whether because of AGW or natural variability. We also collected every peer-reviewed scientific paper that we could find that investigated both polar bears and sea ice in our search process (92 papers) and scored their positions for the same six statements. The scores for both blogs and papers were analyzed, and a principle component analysis was used to visualize their relations.

After all this heavy lifting, we have six binary variables. This does not appear to be sufficient for any sort of robust statistical analysis. There is no description of any intermediate procedures used in examining the blogs or papers, nor any further information in the data set to demonstrate that “multiple stages of collecting, refining, and categorizing the data” (or anything else) were used to categorize the data in a formal, scientifically valid fashion.

Furthermore, several of the codes are simply opposite sides of the same question. Codes (1) and (2) are two sides of the same concept, and the second conflates three possibilities (although to me “decreasing only marginally” and “not decreasing significantly” look like pretty much the same thing). These might better have been posed as a single variable with three or more ordinal categories, providing some nuance to an ill-conceived design.

The same criticism obviously applies to codes (4) and (5). Nobody would answer “yes” to both of these simultaneously (except perhaps a person holding two contradictory ideas at once, a state previously known to exist only in the mind of a certain psychology professor), so a better choice would have been to combine the two into a single question with three choices: “yes”, “no”, and “don’t know” (or “doesn’t say”). Unfortunately, their method leads to a very large number (154) of “missing values” in the resulting data set, strongly interfering with the robustness of subsequent statistical analyses. These missing values are due not to some unforeseen inability to access the required information, but to the poor design used for collecting and coding it.

From looking at the data released by the authors, I am guessing that an explicit statement that either (4) or (5) is true might have been required for a “yes” to be recorded for each of these. However, dealing with the missing values separately is still clumsy and distorts the analysis.

Note that the above quote states

We also collected every peer-reviewed scientific paper that we could find that investigated both polar bears and sea ice in our search process (92 papers) and scored their positions for the same six statements.

One of the variables in the data is code, which identifies individual blogs and papers. The papers have values running from paper_4 to paper_165, with gaps within the sequence. From this one can deduce that at least 165 papers were initially considered and subsequently reduced to the 86 Pro (AGW) and 6 Contra (skeptic) papers used in the final study. I did not find any real explanation of either the methods or the reasons used to remove the excluded papers.

A much greater problem is the lack of diversity in the authorship of the Pro papers. As noted elsewhere, three persons (two authors of this paper, Steven Amstrup and Ian Stirling, along with Andrew Derocher) were lead or co-authors of 60 Pro papers. It is not surprising that the data gleaned from these papers might be virtually identical, given that the same people are beating the same drum repeatedly.

But we are not quite there yet. From the paper’s Supplement document:

Citation of Susan Crockford was also recorded. Blogs were assigned ‘science-based’ and ‘denier’ categories on the basis of their positions taken relative to those drawn by the IPCC on global warming (e.g. whether it is warming or not and the anthropogenic contribution).

The possible mislabeling of several blogs, unavoidable given the lack of proper information about them, has been noted by others. I won’t address this issue because it does not bear on the salient point of Figure 2.

The creation of a seventh binary variable – whether a paper or a blog references Dr. Crockford – gets to the raison d’être of the whole exercise:

Although the effects of warming on some polar-bear subpopulations are not yet documented and other subpopulations are apparently still faring well, the fundamental relationship between polar-bear welfare and sea-ice availability is well established, and unmitigated AGW assures that all polar bears ultimately will be negatively affected. Indeed, credible estimates suggest that the entire Arctic may be ice-free during summer within several decades (Snape and Forster 2014, Stroeve and Notz 2015, Notz and Stroeve 2017), a process that, as has been suggested by both theoretical and empirical evidence, will drastically reduce polar-bear populations across their range (Amstrup et al. 2010, Stirling and Derocher 2012, Atwood et al. 2016, Regehr et al. 2016).

It appears that the paper is meant as a hit piece on Dr. Susan Crockford, who has done work on bears and presents some of her results on her blog. She maintains that polar bears are currently doing reasonably well, and this seems to have bothered the authors of the paper. Interestingly enough, they reluctantly admit the polar bears ARE doing well enough now, and seem to be upset that others may not be totally convinced that the projected catastrophe will occur as scheduled, or that the bears might be able to adapt as they have for centuries.

Merely discussing or even citing what Dr. Crockford has written is sufficient grounds for being declared a true “denier” by the authors of this paper. That the future is certain and the polar bears will all suffer is the only allowable belief.

We are now ready to tackle the “statistical lipstick” which adds that dollop of “science” to this propagandistic diatribe.

As we have seen, the data on which the plot is ostensibly based is a set of 182 vectors, each consisting of 7 binary elements; that is, each element can take only one of two possible responses, yes or no in this case. This means the data can at best distinguish among a maximum of 2^7 = 128 different cases. Missing values create a third possibility but, as indicated earlier, this is a self-inflicted detriment to a proper analysis because it interferes with the analytic procedures. Infilling the missing data is not the same as structuring the information properly, and it affects the end result according to how the infilling is implemented.

The actual situation is much worse. A simple check leads to the fact that there are only 25 unique vectors among the 182 observations. Furthermore, only 7 of those 25 do not contain missing values.
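The check is trivial to perform. The toy matrix below is invented for illustration, but the same two operations applied to the released 182 × 7 score matrix yield the 25 unique vectors and 7 complete vectors quoted above:

```python
import numpy as np

# Toy stand-in for the released scores (0 = missing, 1 = no, 2 = yes);
# the real file has 182 rows of 7 codes, this sketch just shows the check.
data = np.array([
    [1, 2, 1, 1, 2, 0, 1],
    [1, 2, 1, 1, 2, 0, 1],   # duplicate of row 0
    [2, 1, 2, 2, 1, 1, 2],
    [1, 2, 1, 1, 2, 2, 1],
])

unique_rows, counts = np.unique(data, axis=0, return_counts=True)
n_unique = len(unique_rows)                             # distinct response vectors
n_complete = int((unique_rows != 0).all(axis=1).sum())  # vectors with no missing values
print(n_unique, n_complete)   # -> 3 2
```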

Using the coding: 0 = Missing, 1 = No, 2 = Yes. Contra = skeptics; Pro = AGW proponents.

  1. citation of Susan Crockford;
  2. sea-ice extent is on average declining rapidly in the Arctic;
  3. sea-ice extent is decreasing only marginally, is not decreasing significantly, or is currently recovering in the Arctic;
  4. changes in sea-ice extent in the Arctic are due to natural variability, and it is impossible to predict future conditions;
  5. polar bears are threatened with extinction by present and future AGW;
  6. polar bears are not threatened with extinction by present and future AGW;
  7. polar bears will adapt to any future changes in Arctic ice extent whether because of AGW or natural variability.

For the Contra group (45 blogs and 6 papers):

  1002122: 1    1111122: 1    1112101: 1    1112122: 1    1122102: 7    1122122: 2    1211020: 2
  1212102: 1    2110102: 1    2120102: 2    2120201: 1    2122102: 23   2122122: 7    2212122: 1

For the Pro group (45 blogs and 86 papers):

  1001201: 1    1200200: 1    1200201: 6    1200211: 2    1201201: 1    1210201: 4    1210211: 2
  1211200: 2    1211201: 63   1211211: 47   1211221: 2

The number after each sequence is the frequency of that response pattern. You will note that there is NO overlap between the two sets of answers.

Blogs were selected from the internet by simplistic criteria guaranteed to separate the subject matter into two disjoint groups, Pro and Contra. The data lack any capability for discriminating among various levels of acceptance or rejection of the statements above, so nearly every blog and paper is automatically identical to many others within the same group. The lack of diversity in the Pro group is very apparent: only two answer sequences account for 84% of the “scientific” papers and blogs. In the Contra group, almost 73% of the sources share just three different answer sequences.
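The group totals and percentages can be verified directly from the frequency tables above:

```python
# Frequencies transcribed from the Pro and Contra tables above
# (answer pattern -> number of sources with that pattern).
pro = {"1001201": 1, "1200200": 1, "1200201": 6, "1200211": 2,
       "1201201": 1, "1210201": 4, "1210211": 2, "1211200": 2,
       "1211201": 63, "1211211": 47, "1211221": 2}
contra = {"1002122": 1, "1111122": 1, "1112101": 1, "1112122": 1,
          "1122102": 7, "1122122": 2, "1211020": 2, "1212102": 1,
          "2110102": 1, "2120102": 2, "2120201": 1, "2122102": 23,
          "2122122": 7, "2212122": 1}

def top_share(freqs, k):
    """Fraction of sources covered by the k most common answer patterns."""
    top = sum(sorted(freqs.values(), reverse=True)[:k])
    return top / sum(freqs.values())

print(sum(pro.values()), sum(contra.values()))   # -> 131 51
print(round(top_share(pro, 2) * 100))            # -> 84 (two patterns cover 84% of Pro)
print(round(top_share(contra, 3) * 100))         # -> 73 (three patterns cover ~73% of Contra)
```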

I have had undergraduate students who have done more professional work designing a study  such as this after taking a single elementary statistics course.

Why do we need the following gratuitous high-powered statistical method mentioned in the Supplementary document to see if the groups are separable?

Blogs were assigned ‘science-based’ and ‘denier’ categories on the basis of their positions taken relative to those drawn by the IPCC on global warming (e.g. whether it is warming or not and the anthropogenic contribution). The assignment was confirmed by creating a distance matrix from the scores using absolute distance (Manhattan distance) and performing a hierarchical cluster analysis on the result (Ward.D2 method from R 3.3.3, R Core Team, 2017). Both methods yielded two large clusters with identical content.

Identical content indeed! Clusters consisting of blogs and papers with practically identical classifying variables, as explained above, which had been pre-binned into two groups by some nebulous criteria for evaluating their “scientific” attitude status. One could demonstrate this far more easily with simple cross-tabulation tables, but that would draw attention to the poorly done data gathering.
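The cross-tabulation alternative takes one line. The data frame here is invented for illustration, with `ice_declining` as a hypothetical stand-in for code (1); with answers as uniform as those in the real data, the table shows the separation at a glance:

```python
import pandas as pd

# Toy illustration (values invented): when the two groups give uniformly
# opposite answers, a plain cross-tabulation of group against any single
# code already exhibits the separation the cluster analysis "confirmed".
df = pd.DataFrame({
    "group": ["Pro"] * 4 + ["Contra"] * 4,
    "ice_declining": ["yes"] * 4 + ["no"] * 4,   # stand-in for code (1)
})
ct = pd.crosstab(df["group"], df["ice_declining"])
print(ct)
```

No distance matrix, no Ward linkage: the disjointness of the groups is visible in the table itself.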

The next step was to perform a principal component analysis of the data, from which the graph above was plotted. Again from the Supplementary document:

A broad keyword search on the internet and the ISI Web of Science database yielded 90 blogs (described above) and 92 peer reviewed papers reporting on both Polar bears and arctic ice. Author’s positions in papers were scored in in same “position space” defined by binary answers to the six statements formulated in the main papers and citation of Dr. Susan Crockford as an expert. Missing values were replaced by zero after scaling and centering to minimize the influence of the replacement. The final data matrix contained the sources in the rows and the scores in the columns. The PCA was conducted using the prcomp routine from R 3.3.3 (R Core Team, 2017). Papers were classified as controversial when they evoked critical comments and discussion in the peer reviewed literature., blogs were colour coded using the results of a hierarchical cluster analysis (Ward.D2 method from R 3.3.3, R Core Team, 2017). Datapoint[s] were slightly jittered to improve visibility of overlapping points.

The 182 vectors, each consisting of the seven scaled, centred, and infilled answers, were passed through a sophisticated mathematical procedure to calculate a pair of coordinates for plotting each point on the graph. This should generate 182 points, right? Well, yes and no. Principal components have the property that if all of the answers are identical for two blogs or papers, then the coordinates for those sources will be exactly the same, and when plotted, one point will completely cover the other. Since there are only 25 distinct sets of answers, only 25 points will actually show on the plot. I have added a fifth category consisting of several points which were shared by Pro blogs and Pro papers.


So how did the authors create the plot near the top of the post? The last line of the quoted text provides the answer: “Datapoint[s] were slightly jittered…” For those not familiar with the concept, “jittering” is the addition of small random amounts to each coordinate of overlapping points in order to produce a slight separation. I would suggest that in this case “slightly” is a gross understatement. The relatively large amounts added to each coordinate create the very different impression of a data set with sufficient information and discriminatory power to qualify as real “science”, something that is clearly not the case for this data. Furthermore, it appears that all of the points were treated in this fashion, including the ones that were not covered by any of their neighbors and therefore should have been left alone.
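A minimal sketch of what jittering does, with invented values (25 coincident points and a hypothetical noise width of ±0.05); the chosen width is the whole game, since wide noise manufactures apparent spread that is not in the data:

```python
import numpy as np

# 25 points all sitting at exactly the same location, as happens when
# identical answer vectors share identical PCA coordinates.
points = np.tile([0.5, 0.5], (25, 1))

# Jitter: add uniform noise to each coordinate so the points separate
# visually.  The half-width (0.05 here) is an arbitrary choice.
rng = np.random.default_rng(42)
jittered = points + rng.uniform(-0.05, 0.05, points.shape)

print(jittered.shape)   # -> (25, 2): one cluster now looks like 25 points
```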

Figure 1 also has some problems, but this post is long enough.

The data and the statistical aspects of this paper are lame lipstick for a propaganda attack on everyone who does not share the beliefs of the authors. It is sad that cursory peer review persists in the climate change world, allowing incompetent papers to pass through over and over again.

The supplementary paper can be found here.

The data for the paper is here.


Climate Audit
