By Mike Masnick | Techdirt | August 14, 2013
Last week we wrote about the NSA’s ridiculous attempt to justify its surveillance efforts, including this really wacky callout designed to show just how “little” data the NSA collects:
Scope and Scale of NSA Collection
According to figures published by a major tech provider, the Internet carries 1,826 Petabytes of information per day. In its foreign intelligence mission, NSA touches about 1.6% of that. However, of the 1.6% of the data, only 0.025% is actually selected for review. The net effect is that NSA analysts look at 0.00004% of the world’s traffic in conducting their mission — that’s less than one part in a million. Put another way, if a standard basketball court represented the global communications environment, NSA’s total collection would be represented by an area smaller than a dime on that basketball court.
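Taking the NSA’s figures at face value, the chain of percentages can be worked through directly. The minimal Python sketch below uses only the numbers in the NSA’s callout and assumes decimal units (1 PB = 1,000 TB). Note that multiplying the two stated percentages (1.6% × 0.025%) gives 0.0004%, or about four parts per million, rather than the 0.00004% in the NSA’s text.

```python
# Figures as stated in the NSA callout, taken at face value.
DAILY_INTERNET_PB = 1826      # petabytes of Internet traffic per day
TOUCHED_FRACTION = 0.016      # 1.6% "touched" by the NSA
SELECTED_FRACTION = 0.00025   # 0.025% of the touched data "selected for review"

# Volume the NSA "touches" each day.
touched_pb = DAILY_INTERNET_PB * TOUCHED_FRACTION

# Volume actually selected for review each day.
selected_pb = touched_pb * SELECTED_FRACTION

# Share of all Internet traffic that ends up selected.
overall_fraction = TOUCHED_FRACTION * SELECTED_FRACTION

print(f"Touched per day:  {touched_pb:.3f} PB")
print(f"Selected per day: {selected_pb * 1000:.3f} TB")  # assumes 1 PB = 1,000 TB
print(f"Share of all traffic selected: {overall_fraction * 100:.4f}%")
```

The touched figure, roughly 29.2 PB a day, is the same number Sean Gallagher arrives at below.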
This was bizarre on a number of levels, not the least of which is the wacky basketball court-to-dime scale. Next time, maybe we can play “is it bigger than a breadbox” with the NSA. But what any of this actually meant was never made clear. Since the NSA has already redefined basic English words like “collect,” “target,” “datamine,” and “relevant,” it’s anyone’s guess what it means by “touch.” However, some people are starting to dig into the numbers, and contrary to the NSA’s attempt to suggest that this is “nothing to fear,” even a little analysis suggests the agency is collecting quite a lot of information.
First up, we have Jeff Jarvis, who highlights a bunch of important comparative data points, including that Sandvine claims only 2.9% of US traffic is communication traffic and that 68.8% of all email is spam. That means it’s entirely possible that the NSA collects nearly all non-spam email and would still be within its 1.6% figure. He also points out that 62% of internet traffic is considered entertainment, and we can assume that the NSA doesn’t need to collect every copy of Game of Thrones that people are passing around (I’m sure one or two will do the job). He similarly notes that Google itself claims to index only about 0.004% of internet traffic, suggesting that the NSA may be collecting more info than Google indexes by more than two orders of magnitude.
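The comparisons above reduce to simple ratios. This short Python sketch uses only the percentages cited in that paragraph. The second calculation assumes that Sandvine’s US share of communication traffic is roughly representative of global traffic, which the original figures do not establish, so treat it as a rough illustration only.

```python
import math

# Percentages cited above (attributed as in the text).
NSA_TOUCHED_PCT = 1.6       # NSA's own "touched" figure, share of global traffic
GOOGLE_INDEXED_PCT = 0.004  # Google's claimed share of Internet traffic indexed
COMMUNICATION_PCT = 2.9     # Sandvine: share of US traffic that is communication

# How many times larger the NSA's share is than Google's indexed share.
ratio = NSA_TOUCHED_PCT / GOOGLE_INDEXED_PCT
print(f"NSA touches {ratio:.0f}x the share Google indexes "
      f"(about {math.log10(ratio):.1f} orders of magnitude)")

# Assumption: the US communication share roughly holds globally.
# Under that assumption, 1.6% of all traffic could cover this much of it:
coverage = NSA_TOUCHED_PCT / COMMUNICATION_PCT
print(f"1.6% of all traffic equals about {coverage:.0%} of communication traffic")
```

A factor of 400 is about 2.6 orders of magnitude, and 1.6% of all traffic is more than half of a 2.9% communication slice.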
Meanwhile, Sean Gallagher, over at Ars Technica, digs a bit deeper into the numbers, suggesting that the NSA’s data collection is closer to Google’s in scale, though still larger:
The dime on the basketball court, as NSA describes it, is still 29.21 petabytes of data a day. That means NSA is “touching” more data than Google processes every day (a mere 20 petabytes).
Gallagher also looks much more closely at the recently revealed details of the XKeyscore program to show how that 1.6% of “touched” internet communications can cover pretty much everything important.
As a result, if properly tuned, the packet analyzer gear at the front-end of XKeyscore (and other deep packet inspection systems) can pick out a very small fraction of the actual packets sent over the wire while still extracting a great deal of information (or metadata) about who is sending what to who. This leaves disk space for “full log data” on connections of particular interest.
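To make Gallagher’s point concrete, here is a toy Python model. It is not how XKeyscore actually works. The packet records, the documentation-range IP addresses, and the `SELECTORS` set are all hypothetical. Every packet’s header is examined, but a (source, destination, port) tuple is recorded only once per flow, and full payloads are kept only for flows that touch a selector. Even in this tiny example, the who-talks-to-whom map covers every flow while only a small share of payload bytes is stored.

```python
from collections import namedtuple

# Hypothetical packet record: addresses, destination port, and raw payload.
Packet = namedtuple("Packet", "src dst port payload")

# Hypothetical addresses "of particular interest" to an analyst.
SELECTORS = {"203.0.113.7"}

def triage(packets):
    """Toy front-end filter: note each new flow once as metadata, and keep
    full packets only for flows that involve a selector address."""
    seen_flows = set()
    metadata = []   # (src, dst, port): who is talking to whom, and on what port
    full_log = []   # complete packets kept for flows of particular interest
    bytes_kept = 0
    for p in packets:
        flow = (p.src, p.dst, p.port)
        if flow not in seen_flows:
            seen_flows.add(flow)
            metadata.append(flow)
        if p.src in SELECTORS or p.dst in SELECTORS:
            full_log.append(p)
            bytes_kept += len(p.payload)
    total = sum(len(p.payload) for p in packets)
    share_kept = bytes_kept / total if total else 0.0
    return metadata, full_log, share_kept

# Usage with made-up traffic: two web packets, one mail packet, one flagged flow.
traffic = [
    Packet("198.51.100.1", "192.0.2.10", 443, b"x" * 1400),
    Packet("198.51.100.1", "192.0.2.10", 443, b"x" * 1400),
    Packet("198.51.100.2", "192.0.2.20", 25, b"x" * 900),
    Packet("198.51.100.3", "203.0.113.7", 443, b"x" * 300),
]
meta, full, share = triage(traffic)
print(f"{len(meta)} flows recorded, {len(full)} packet(s) stored in full, "
      f"{share:.1%} of payload bytes retained")
```

A real system would extract far richer metadata (protocol fields, email addresses, and so on), but the trade-off is the same: cheap metadata on everything, expensive full logs only where a selector matches.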
In other words, the NSA put forth the 1.6% figure to make people think this is no big deal, but when you look at what it actually means, it’s a very big deal indeed. In fact, the NSA may be collecting even more information than people had previously believed.