"A lot of people believe that if it's on the Web it will stay on the Web. Chances are that it won't" (Jill Lepore)

This page is far from the only one that isn't available!

"The average life of a Web page is about a hundred days. Strelkov's 'We just downed a plane' post lasted barely two hours. It might seem, and it often feels, as though stuff on the Web lasts forever, for better and frequently for worse. . . . No one believes any longer, if anyone ever did, that 'if it's on the Web it must be true,' but a lot of people do believe that if it's on the Web it will stay on the Web. Chances are, though, that it actually won't."
-- from Jill Lepore's "The Cobweb," in the Jan. 26 New Yorker

by Ken

We'll come back to "Strelkov's 'We just downed a plane' post" in a moment, because it's a nifty story in its own right, and it introduces those of us who haven't yet had a proper introduction to the Wayback (well, more properly WABAC) machine housed at the Internet Archive in a converted church in San Francisco.

First, though, I should explain that Harvard historian Jill Lepore, in her January 26 New Yorker "Annals of Technology" piece, "The Cobweb," has much larger ambitions than the limited one I've set for us in this post. Jill sets out to answer the question posed in the piece's subtitle: "Can the Internet be archived?" I think a lot of readers will be interested in the story Jill has to tell about people who are attempting to figure out just how to do that -- i.e., to archive the Internet -- starting with the creator of the storage system that he dubbed the Wayback Machine, Brewster Kahle, the founder of the Internet Archive. (I guess it's properly spelled WABAC, but Kahle is upfront about the device being named for the wondrous machine with which the cartoon Mr. Peabody used to attempt to educate his boy Sherman.)

For that part of the story, however, you'll need to consult the piece itself. What concerns us here is that, as Jill indicates in the passage I've plunked atop this post, a lot of people don't realize just how transitory the Internet is.

In fact, I suspect that a lot of people, not having given it much thought, think that the Internet is itself an archive -- and depend on it as such, again without giving it much thought, even though most of us are thoroughly used to encountering error messages like the Facebook one I've also placed at the top of this post. As Jill says, "It might seem, and it often feels, as though stuff on the Web lasts forever, for better and frequently for worse," and she provides examples. How often do the media regale us with stories of things that people wish desperately to make disappear from the Web?

And usually the upshot is that you can't ever get rid of it, no matter how much you might wish you could.

The reality, though, Jill says, is that "The Web dwells in a never-ending present. It is -- elementally -- ethereal, ephemeral, unstable, and unreliable." With regard to the belief "that if it's on the Web it will stay on the Web," she counters, as we've already read, "Chances are, though, that it actually won't." And she provides some charming for-instances.

In 2006, David Cameron gave a speech in which he said that Google was democratizing the world, because “making more information available to more people” was providing “the power for anyone to hold to account those who in the past might have had a monopoly of power.” Seven years later, Britain’s Conservative Party scrubbed from its Web site ten years’ worth of Tory speeches, including that one.

Last year, BuzzFeed deleted more than four thousand of its staff writers’ early posts, apparently because, as time passed, they looked stupider and stupider. Social media, public records, junk: in the end, everything goes.

Which is in fact disastrous in many walks of life we may not normally stop to consider.

WHICH IS "A DISASTER" IN MANY AREAS

To begin with, it's not just via deliberate deletion that Web pages become unfindable. Jill is quick to point out: "Web pages don’t have to be deliberately deleted to disappear."

Sites hosted by corporations tend to die with their hosts. When MySpace, GeoCities, and Friendster were reconfigured or sold, millions of accounts vanished. (Some of those companies may have notified users, but Jason Scott, who started an outfit called Archive Team—its motto is “We are going to rescue your shit”—says that such notification is usually purely notional: “They were sending e-mail to dead e-mail addresses, saying, ‘Hello, Arthur Dent, your house is going to be crushed.’ ”) Facebook has been around for only a decade; it won’t be around forever. Twitter is a rare case: it has arranged to archive all of its tweets at the Library of Congress. In 2010, after the announcement, Andy Borowitz tweeted, “Library of Congress to acquire entire Twitter archive—will rename itself Museum of Crap.” Not long after that, Borowitz abandoned that Twitter account. You might, one day, be able to find his old tweets at the Library of Congress, but not anytime soon: the Twitter Archive is not yet open for research. Meanwhile, on the Web, if you click on a link to Borowitz’s tweet about the Museum of Crap, you get this message: “Sorry, that page doesn’t exist!”

The Web dwells in a never-ending present. It is -- elementally -- ethereal, ephemeral, unstable, and unreliable. Sometimes when you try to visit a Web page what you see is an error message: “Page Not Found.” This is known as “link rot,” and it’s a drag, but it’s better than the alternative. More often, you see an updated Web page; most likely the original has been overwritten. (To overwrite, in computing, means to destroy old data by storing new data in their place; overwriting is an artifact of an era when computer storage was very expensive.) Or maybe the page has been moved and something else is where it used to be. This is known as “content drift,” and it’s more pernicious than an error message, because it’s impossible to tell that what you’re seeing isn’t what you went to look for: the overwriting, erasure, or moving of the original is invisible.

For consequences that most of us probably haven't thought about, consider pretty much the whole of our judicial system.

For the law and for the courts, link rot and content drift, which are collectively known as “reference rot,” have been disastrous. In providing evidence, legal scholars, lawyers, and judges often cite Web pages in their footnotes; they expect that evidence to remain where they found it as their proof, the way that evidence on paper—in court records and books and law journals—remains where they found it, in libraries and courthouses. But a 2013 survey of law- and policy-related publications found that, at the end of six years, nearly fifty per cent of the URLs cited in those publications no longer worked. According to a 2014 study conducted at Harvard Law School, “more than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs within United States Supreme Court opinions, do not link to the originally cited information.” The overwriting, drifting, and rotting of the Web is no less catastrophic for engineers, scientists, and doctors. Last month, a team of digital library researchers based at Los Alamos National Laboratory reported the results of an exacting study of three and a half million scholarly articles published in science, technology, and medical journals between 1997 and 2012: one in five links provided in the notes suffers from reference rot. It’s like trying to stand on quicksand.

WHICH BRINGS US BACK TO WHERE WE STARTED

Specifically, to "Strelkov's 'We just downed a plane' post" -- a ripping good story I said we'd come back to, ripped out of the headlines, as it were:

Malaysia Airlines Flight 17 took off from Amsterdam at 10:31 A.M. G.M.T. on July 17, 2014, for a twelve-hour flight to Kuala Lumpur. Not much more than three hours later, the plane, a Boeing 777, crashed in a field outside Donetsk, Ukraine. All two hundred and ninety-eight people on board were killed. The plane’s last radio contact was at 1:20 P.M. G.M.T. At 2:50 P.M. G.M.T., Igor Girkin, a Ukrainian separatist leader also known as Strelkov, or someone acting on his behalf, posted a message on VKontakte, a Russian social-media site: “We just downed a plane, an AN-26.” (An Antonov 26 is a Soviet-built military cargo plane.) The post includes links to video of the wreckage of a plane; it appears to be a Boeing 777.

Two weeks before the crash, Anatol Shmelev, the curator of the Russia and Eurasia collection at the Hoover Institution, at Stanford, had submitted to the Internet Archive, a nonprofit library in California, a list of Ukrainian and Russian Web sites and blogs that ought to be recorded as part of the archive’s Ukraine Conflict collection. Shmelev is one of about a thousand librarians and archivists around the world who identify possible acquisitions for the Internet Archive’s subject collections, which are stored in its Wayback Machine, in San Francisco. Strelkov’s VKontakte page was on Shmelev’s list. “Strelkov is the field commander in Slaviansk and one of the most important figures in the conflict,” Shmelev had written in an e-mail to the Internet Archive on July 1st, and his page “deserves to be recorded twice a day.”

On July 17th, at 3:22 P.M. G.M.T., the Wayback Machine saved a screenshot of Strelkov’s VKontakte post about downing a plane. Two hours and twenty-two minutes later, Arthur Bright, the Europe editor of the Christian Science Monitor, tweeted a picture of the screenshot, along with the message “Grab of Donetsk militant Strelkov’s claim of downing what appears to have been MH17.” By then, Strelkov’s VKontakte page had already been edited: the claim about shooting down a plane was deleted. The only real evidence of the original claim lies in the Wayback Machine.

Later Jill tells us:

The day after Strelkov’s “We just downed a plane” post was deposited into the Wayback Machine, Samantha Power, the U.S. Ambassador to the United Nations, told the U.N. Security Council, in New York, that Ukrainian separatist leaders had “boasted on social media about shooting down a plane, but later deleted these messages.” In San Francisco, the people who run the Wayback Machine posted on the Internet Archive’s Facebook page, “Here’s why we exist.”

IT'S POSSIBLE THAT PAGE PRESERVATION COULD HAVE BEEN DESIGNED INTO HTTP

English computer scientist Tim Berners-Lee, the father of the "hypertext transfer protocol" created "to link pages on what he called the World Wide Web," says it was considered. Partly it didn't happen, Jill says, because of "the preference for the most up-to-date information: a bias against obsolescence."

But the chief reason was the premium placed on ease of use. “We were so young then, and the Web was so young,” Berners-Lee told me. “I was trying to get it to go. Preservation was not a priority. But we’re getting older now.”

And Berners-Lee isn't alone in his concern. Vint Cerf, another developer who was in at the beginning, and is now Google's "Chief Internet Evangelist," e-mailed Jill: "I worry that the twenty-first century will become an informational black hole."

As she documents, there are people all over the world, in addition to the Internet Archive's Kahle, tackling the problem of archiving the Internet. But there is so much information, of so many different kinds, coming from so many different sources, that the complexity of the problem is humongous. Just consider the difference between copyrighted and non-copyrighted material, which dictates entirely different ways of handling the stuff. (Jill describes copyright as "the elephant in the archive.") Or consider that practically every country on the planet has different laws "relating to legal deposit, copyright, and privacy."

National libraries are one place where a lot of archiving is being done, but "they collect chiefly what's in their own domains." And certainly no one else is attempting anything on the scale of the Internet Archive, whose WABAC machine (the name, we're told, was designed to sound sort of like legendary early computers such as UNIVAC, but really was borrowed from the Wayback Machine in which that most erudite of cartoon dogs, Mr. Peabody, takes "his boy Sherman" on doggedly educational time trips) captures picture after picture after picture of a growing number of websites around the world. "More than 30 billion Web pages" is the count Jill offers for what IA has archived. (Which of course creates the obvious problem: Once you've preserved all that, er, stuff, how do you find anything in it?)

The original Wayback Machine -- with the inventor, Mr. Peabody, and his boy Sherman

A lot of people are working on the problem, and you can read about more of them in the article. You may be buoyed to know that a solution, in the form of what Jill describes as "an excellent patch," has been devised for the vanishing-footnote problem: a collaboratively supported thing called Perma.cc, which scholars writing papers can use to create links that really will be permanent (maybe we should say "permanent-ish"). "Perma.cc has already been adopted by law reviews and state courts," Jill tells us, and "it’s only a matter of time before it’s universally adopted as the standard in legal, scientific, and scholarly citation."

Well, that's something. But already how many of us can't access files of our own that we created as recently as a couple of years ago? Perhaps they're parked in a storage medium we no longer have access to, or perhaps they're trapped in antique software. Considering the frantic pace at which new content is being spewed onto the Web, it's not hard to imagine whole new worlds of missing-document pain.

#

Link

http://downwithtyranny.blogspot.com/2015/02/a-lot-of-people-believe-that-if-its…
