Guccifer 2: From January to May, 2016

Within the small community conducting technical analysis of the DNC hack, there has been ongoing controversy over whether Guccifer 2 (G2) was a false flag for the Russians, whether G2 was located in the US rather than Russia, whether the G2 files were copied locally rather than hacked, and whether G2 was a false flag for the DNC (i.e. didn’t hack any documents at all).
In today’s post, I’ll try to shed a little light on the puzzle by presenting a case that metadata from G2’s cf.7z dossier shows that, between at least January 7, 2016 and May 4, 2016, Guccifer 2 copied numerous documents (primarily from the Democratic Party of Virginia – DPVA) within a few minutes of the documents being saved.  This strongly suggests to me that Guccifer 2 was a genuine hacker who had indeed installed malware on a Democrat computer, which was then used to automatically exfiltrate documents.
Unlike the ngpvan.7z previously analysed by Forensicator, the copying structure of cf.7z is formidably complex, with evidence of both Unix-type and Windows-type copying, possibly in multiple stages. 
Stale Documents
Forensicator’s analysis of the ngpvan.7z dossier was restricted to the 7z (or directory) modification dates and times in the 7z archive i.e. the modification times displayed by the 7z software.  In the ngpvan.7z dossier, all documents had directory modification dates of July 5, 2016 and modification times within one 14-minute session.   In addition to their properties in 7z, the documents (pdf, docx, xlsx) also have metadata from their original software. I’ve manually opened and examined document modification times of examples from all ngpvan.7z directories, without finding a single document that wasn’t extremely stale  (2008-2011).
There are many stale documents in the cf.7z dossier as well, though typically somewhat less stale (2013-2014). Some documents even came from the same July 5, 2016 copy operation as ngpvan.7z (as previously discussed at CA here and Forensicator here.)  The July 5 copying incident appears to me to be an internal re-arrangement of Guccifer 2’s inventory of documents, rather than an exfiltration event, as G2 was almost certainly expelled from DNC computers by July 5.
“Bulk” Unix Copying
The July 5 copy incident was an example of “bulk” Unix-style copying i.e. copies linked together in one copying session with the same modification date and sequential modification times. Before the July 5 incident, there were earlier “bulk” copying sessions on April 18, May 23, June 4, June 6 and June 20.  These typically retrieved stale documents, but the June 4 session was an exception: it included the most recent documents in the entire G2 corpus – documents dated to June 1 and June 2, 2016 not just by metadata but by contents e.g. Orange Pod Press Clips 6.1.16.docx in the Intern Sandbox directory.
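The idea of a “bulk” session (same date, sequential times) can be sketched in a few lines. This is a Python illustration, not the R code used for this post; the function name group_into_sessions, the 5-minute gap threshold and the sample timestamps are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def group_into_sessions(mtimes, max_gap=timedelta(minutes=5)):
    """Group timestamps into sessions.

    A new session starts whenever the gap to the previous timestamp
    exceeds max_gap (the 5-minute default is an assumed threshold).
    """
    sessions = []
    for t in sorted(mtimes):
        if sessions and t - sessions[-1][-1] <= max_gap:
            sessions[-1].append(t)   # continues the current session
        else:
            sessions.append([t])     # starts a new session
    return sessions

# Hypothetical 7z modification times, for illustration only
example = [
    datetime(2016, 7, 5, 18, 39, 2),
    datetime(2016, 7, 5, 18, 39, 4),
    datetime(2016, 7, 5, 18, 41, 30),
    datetime(2016, 6, 4, 9, 15, 0),
    datetime(2016, 6, 4, 9, 15, 1),
]
sessions = group_into_sessions(example)
for s in sessions:
    print(s[0].date(), len(s), "files")
```

A larger threshold merges pauses within one sitting into a single session; a smaller one splits them, so the session count depends on that choice.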
There are also examples of “fossilized” bulk copying e.g. the Insurance Benefits Summary directory, where the document modification times (in addition to the 7z modification times) show the sequential modifications characteristic of a bulk Unix copy. In this case, the bulk Unix copy appears to have been followed by a Windows-type copy (which carries the document modification times over into the 7z modification times).
Same-Day Copies – Timezone Issues
Unlike ngpvan.7z, the cf.7z dossier contained numerous documents from 2015 and the first half of 2016, including numerous documents with identical 7z and document modification dates. However, the modification times presented problems as shown in the table below: the document modification time and 7z modification time were exactly four hours apart. The minutes and seconds matched exactly, but not the hours. This shows that a Windows-type copying operation has taken place, after which 7z interprets the modification time incorrectly. My surmise is that 1) the document modification time is saved as absolute seconds in local time; 2) the 7z software presumes that the absolute seconds are in UTC i.e. the document is 10:31 UTC rather than 10:31 Eastern; 3) 7z then displays the directory modification time in local time (6:31 Eastern), 4 hours “earlier” than the corresponding document modification time.
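The surmise can be reproduced with a little arithmetic. The sketch below is Python (not the R used for this post) and only illustrates the hypothesized mechanism, not how 7z is actually implemented; the calendar date and the function name misread_local_as_utc are assumptions, while the 10:31 and 6:31 clock times come from the example in the text.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for Eastern Daylight Time (UTC-4)
EDT = timezone(timedelta(hours=-4), "EDT")

def misread_local_as_utc(doc_local, display_tz):
    """Treat a naive local wall-clock time as if it were UTC,
    then convert it to display_tz for display."""
    as_utc = doc_local.replace(tzinfo=timezone.utc)
    return as_utc.astimezone(display_tz)

# Document saved at 10:31 Eastern (the date is hypothetical)
doc_time = datetime(2016, 4, 20, 10, 31, 0)
shown = misread_local_as_utc(doc_time, EDT)

# Difference between the true local time and the displayed local time
offset = doc_time - shown.replace(tzinfo=None)
print(shown.strftime("%H:%M %Z"), offset)
```

Under this mechanism the minutes and seconds are untouched and only the hour shifts, which matches the pattern described in the text.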

Complicating matters further, 7z handles timezone metadata for pdf documents differently than docx or xlsx documents.  The next table shows directory and document modification times for selected pdf, docx and xlsx documents when inspected in Eastern (columns 6-7) and UTC (columns 8-9).  Pdf documents display the same local time in both Eastern and UTC (and all other timezones) i.e. different absolute times, while docx and xlsx documents display different local times in Eastern and UTC timezones (but a constant absolute time).

Constructing a Database of Metadata
In order to advance from manual inspection and collation to analysis of the full population, I constructed a database as follows.
The R function file.info is able to extract directory modification times (also creation and access times, not relevant here). I was able to locate an R package (pdftools) which extracts pdf metadata, including document modification and creation times, but I was unable to locate a corresponding package for Word or Excel (though one probably exists.)
I first set my computer to UTC and then extracted the cf.7z dossier into a Windows directory. (I originally did this in Eastern, but eventually settled on UTC with the objective of simplifying analysis.) Using R, I then sequentially extracted document names in each directory down all directory and subdirectory trees, keeping track of the directory tree and document name.  This resulted in 2105 documents without unpacking the zip directories (which contained stale documents anyway.)
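For readers who want to replicate the walk outside R, a rough Python counterpart is sketched below. It uses os.walk and os.stat in place of file.info; the function name list_files_with_mtime and the dictionary keys are assumptions for illustration (otime follows the column name used later in this post).

```python
import os
from datetime import datetime, timezone

def list_files_with_mtime(root):
    """Walk every directory under root and record, for each file, its
    directory path relative to root, its name, and its modification
    time as a timezone-aware UTC datetime."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)
            rows.append({
                "dir": os.path.relpath(dirpath, root),
                "name": name,
                "otime": mtime,
            })
    return rows
```

Calling list_files_with_mtime on the extraction directory returns one row per document. The times come back in UTC regardless of the computer's timezone setting, because the conversion is requested explicitly.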
I then added a column in which I distinguished pdf, docx and xlsx documents using grep: there were 815 doc, 597 pdf and 356 xls.  There were also a few txt, xml and miscellaneous documents, which I didn’t consider for the analysis.
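The grep step can be sketched as below, in Python rather than R. The exact patterns used for this post are not given, so the regular expressions here are assumptions: “doc” is taken to cover both .doc and .docx, and “xls” both .xls and .xlsx. Apart from the first file name, which appears earlier in this post, the names are hypothetical.

```python
import re
from collections import Counter

# Assumed patterns: case-insensitive match on the file extension
PATTERNS = {
    "doc": re.compile(r"\.docx?$", re.IGNORECASE),
    "pdf": re.compile(r"\.pdf$", re.IGNORECASE),
    "xls": re.compile(r"\.xlsx?$", re.IGNORECASE),
}

def classify(name):
    """Return 'doc', 'pdf', 'xls', or 'other' based on the extension."""
    for label, pattern in PATTERNS.items():
        if pattern.search(name):
            return label
    return "other"

names = [
    "Orange Pod Press Clips 6.1.16.docx",
    "summary.PDF",      # hypothetical
    "budget.xlsx",      # hypothetical
    "notes.txt",        # hypothetical
]
counts = Counter(classify(n) for n in names)
print(counts)
```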
I then extracted the directory modification time (as a POSIXct object) for each document using the R-function file.info. This enables a separate extraction of timezone. The timezone for all documents was shown as EDT and/or EST  even when I set the computer to UTC. I’m not sure whether this is an artifact of my usual computer setting (Eastern) or whether it is additional evidence that G2 operated in Eastern time (evidence of which has been presented previously by Forensicator and myself.) Someone may be able to shed light on this for me.
I then extracted document modification and creation times for the 597 pdf’s. Not all pdf documents had readable document modification times and/or some retrieved pdf document dates were in the 22nd century and clearly an artifact. These dates were set to NA. This left a dataset of 530 pdf documents with both document modification times and directory modification times.
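The clean-up rule can be sketched as follows, in Python rather than R, with None standing in for NA. The cutoff date, the function name clean_doc_time and the sample records are assumptions; the post says only that unreadable dates and 22nd-century dates were set to NA.

```python
from datetime import datetime

def clean_doc_time(dtime, latest=datetime(2016, 12, 31)):
    """Return dtime unchanged, or None if it is missing or later than
    the latest plausible date (an assumed cutoff)."""
    if dtime is None or dtime > latest:
        return None
    return dtime

# Hypothetical records: dtime is the document time, otime the directory time
records = [
    {"name": "a.pdf", "dtime": datetime(2016, 1, 7, 15, 2), "otime": datetime(2016, 1, 7, 15, 9)},
    {"name": "b.pdf", "dtime": datetime(2106, 2, 7, 6, 28), "otime": datetime(2016, 3, 1, 12, 0)},
    {"name": "c.pdf", "dtime": None, "otime": datetime(2016, 3, 1, 12, 0)},
]

# Keep only records with both a usable document time and a directory time
usable = [
    r for r in records
    if clean_doc_time(r["dtime"]) is not None and r["otime"] is not None
]
print([r["name"] for r in usable])
```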
The next graph shows the number of days between document modification date and 7z modification date for these 530 pdf documents. The “stale” documents are typically 3-4 years old (with some nearly 10 years old). There are nearly 200 pdf documents with less than 50 days between document and 7z modifications, including 122 documents in which modification dates are identical. 
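The quantity plotted is simply the difference in calendar days between the two dates. A minimal Python sketch (not the R code used for this post) is below; the function name day_gap and the sample pairs are hypothetical.

```python
from datetime import datetime

def day_gap(dtime, otime):
    """Calendar days from document modification date to 7z modification date."""
    return (otime.date() - dtime.date()).days

# Hypothetical (document time, 7z time) pairs, for illustration only
pairs = [
    (datetime(2016, 1, 7, 15, 2), datetime(2016, 1, 7, 15, 9)),    # same day
    (datetime(2016, 4, 1, 9, 0), datetime(2016, 4, 18, 11, 30)),   # recent
    (datetime(2013, 5, 20, 14, 0), datetime(2016, 7, 5, 18, 40)),  # stale
]
gaps = [day_gap(d, o) for d, o in pairs]
same_day = sum(g == 0 for g in gaps)     # identical modification dates
recent = sum(0 <= g < 50 for g in gaps)  # less than 50 days apart
print(gaps, same_day, recent)
```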
From the same-day inventory, we wish to exclude Windows-type copying (using Forensicator’s distinction) which is uninformative because the copy modification metadata simply preserves the document modification metadata. The pdf’s in the Insurance Benefits Summary directory (shown in the first example above) are of this type.  This excluded 20 documents and left a dataset of 102 documents with modification dates ranging from January 7, 2016 to May 4, 2016. An extract showing the first five examples is shown below (otime: directory modification time; dtime: document modification time; ozone: directory timezone).  In each case, the directory modification time is 3-12 minutes after the document was saved (document modification time). All but two EST documents fit this pattern.
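One way to express the exclusion rule is sketched below in Python (not the R used for this post). It assumes that a Windows-type copy can be recognized by minutes and seconds that match exactly while only the hour differs; the function names, the labels and the sample times are hypothetical.

```python
from datetime import datetime

def classify_same_day(dtime, otime):
    """Label a (document time, directory time) pair.

    'windows-type' : same date, identical minutes and seconds
                     (times equal modulo whole hours), so uninformative
    'informative'  : same date, but minutes or seconds differ
    'different-day': the dates differ
    """
    if dtime.date() != otime.date():
        return "different-day"
    if (dtime.minute, dtime.second) == (otime.minute, otime.second):
        return "windows-type"
    return "informative"

def lag_minutes(dtime, otime):
    """Minutes from document save to directory modification time."""
    return (otime - dtime).total_seconds() / 60

# Hypothetical examples
preserved = classify_same_day(datetime(2016, 4, 20, 10, 31, 7),
                              datetime(2016, 4, 20, 6, 31, 7))
fresh = classify_same_day(datetime(2016, 1, 7, 15, 2, 10),
                          datetime(2016, 1, 7, 15, 9, 40))
lag = lag_minutes(datetime(2016, 1, 7, 15, 2, 10),
                  datetime(2016, 1, 7, 15, 9, 40))
print(preserved, fresh, lag)
```

The rule is deliberately simple: a whole-hour shift that crosses midnight would be labelled “different-day”, and a genuine copy whose minutes and seconds happen to match would be labelled “windows-type”.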

However, for documents with EDT timezones, the 7z modification time is “earlier” than the document modification time. The anomaly seems to be something to do with a difference between EST and EDT, but it is not just that: a bodge of an additional hour gets rid of some discrepancies, but many documents are still 10-30 minutes “early”. I’m presently stumped.

Again, we know that the 7z modification time cannot be earlier than the document modification time.  Even without being able to precisely pin down the reason for the discrepancy, we can still safely record the range of dates on which we’ve observed documents with identical directory and document modification dates and non-identical modification times – from January 7, 2016 to May 4, 2016.
xlsx and docx Documents
I did a spot check of xlsx and docx documents (doing manual comparison) – not especially thorough. Examples in the spot check with 2015 modification dates were all Windows-type copying (exactly the same modification time “modulo” hours – to borrow a math term) and thus uninformative on the copy date.
Discussion
The short time interval between a document being saved (document modification time) and being copied to an archive used in compilation of cf.7z indicates to me that these particular documents were not exfiltrated manually, but instead with some sort of “eavesdropping” software. (This observation applies only to this subset of documents, not to “batch” copying.)
The range of observed dates seems interpretable to me:
The terminus a quo date of January 7, 2016 for automated eavesdropping is only a couple of weeks after a computer security incident in which Sanders supporters obtained access to NGP files that were supposed to be private to Hillary Clinton supporters (incident described by DNC here). Guccifer 2 claimed to Vice magazine to have obtained access to DNC computers through a “0-day exploit of NGP VAN soft”, after which he claimed to have “installed shell-code into the DNC server”. In the same interview, he claimed to have first hacked them in summer 2015.  Guccifer 2’s claim to have accessed DNC servers through a 0-day exploit of NGP VAN software commencing in summer 2015 has been widely repudiated e.g. ThreatConnect here. It seems entirely possible to me that Guccifer 2’s access began in December 2015 or January 2016 rather than summer 2015 – statements in a hacker interview ought to be considered, but deception and misdirection need to be allowed for. It also seems possible to me that the NGP-VAN incident in December 2015 might have functioned similarly to the Mole incident in Climategate – an analogy that will not have any meaning to anyone other than long-time Climate Audit readers but may nonetheless be useful. The Mole incident resulted in numerous Climate Audit readers looking through and parsing the University of East Anglia website and FTP site for clues. A couple of readers reported falling through trapdoors into unexpected areas of the computer, but chose not to investigate. My guess is that Mr FOIA did so as well, but, unlike the other readers, continued into the UEA computer, eventually discovering the backup email server. I can readily imagine a computer nerd/geek/hacker being drawn to the DNC computer by the NGP VAN incident and gaining access (just as Mr FOIA obtained access to UEA.)  Such a scenario is consistent with the terminus a quo of January 7 (but obviously not proven by this).
The terminus ad quem date is May 4, 2016, only a couple of days prior to Crowdstrike’s installation of Falcon software. Although Crowdstrike was unsuccessful in interrupting the leak of DNC emails, it would be odd if their anti-hacking software didn’t do anything. Based on these dates, perhaps Crowdstrike did indeed interrupt Guccifer 2’s automatic eavesdropping, but without preventing access entirely (based on subsequent batch copying sessions on May 23, June 6 and June 10 plus emails continuing to May 25.)
To my eye, there is convincing evidence that G2 actually hacked Democrat Party computers from at least January 2016 on. This is inconsistent with Adam Carter’s theory that G2 was a false flag operation by Crowdstrike and the DNC – the metadata points to too early a start to support such a theory. G2 metadata also points too early for G2 to be a false flag by Fancy Bear/APT28 who are said to have gained access only in April 2016.
The hacking dates of Guccifer 2 more plausibly connect to the dates assigned to the user of the tools ascribed to Cozy Bear/APT 29.  This  in turn points to a very specific attribution question: how unique are the tools ascribed to the “Cozy Bear” group (as opposed to the Fancy Bear group)? Are they generic enough to be available to a lone wolf hacker, making unique attribution subject to great uncertainty?
No bleaching of metadata: in the Climategate CG-1 release, Mr FOIA (a lone individual, not an intelligence service) bleached all metadata showing date of access and download of the emails, but neglected to bleach directory timestamp metadata for numerous documents.  See discussion and compilation at ijish.livejournal.com. The timestamp information showed that Mr FOIA’s access to documents began on or about Sept 15, 2009 (a month after the Mole Incident) and ended on November 16, 2009. It also showed that Mr FOIA uploaded documents specifically pertaining to Yamal within a few hours of a widely publicized Climate Audit post on Yamal. In CG-2, Mr FOIA bleached all directory timestamp information. In contrast, G2 did not bleach any directory metadata – for some reason, omitting a precaution taken by Mr FOIA.
G2 and the Russian “Clown Outfit”:  In Climategate, while Mr FOIA bleached directory metadata, he did not change or modify any internal document (pdf, doc, xls) metadata on modification times, default language or anything else. Nor did Guccifer 2 for any of the documents in cf.7z, ngpvan.7z or any of the documents released at the G2 blog from July on. However, as discussed endlessly, in G2’s announcement blogpost, he attached four documents (1.doc, 2.doc, 3.doc and 5.doc) which had been materially altered earlier on June 15 with the sole purpose of adding “Russian” metadata (see recent CA review here).  A distinction between directory metadata and document metadata has been emphasized over and over in this post and I hope that this highlights the baroque-ness of G2’s “Russian” alterations on June 15. Some commenters, even so-called “experts” such as Thomas Rid, have grossly misled their readers on these alterations: Rid claimed that later G2 releases “were now scrubbed of the sort of distinguishing metadata that had allowed analysts to trace the leak back to Russian intelligence”.  What total rubbish.  No such metadata was “scrubbed” in cf.7z or other later releases. The situation is the opposite to what Rid describes: the “distinguishing metadata” had been manually added to a few early documents on June 15.
My own working hypothesis is that G2 was a lone wolf hacker. This is a surmise only. This surmise is NOT proven by the analysis provided above, but I do not believe that it is inconsistent with the information marshalled here. I’ll try to outline why I believe G2 to have been a lone wolf hacker on another occasion.
