AJAX, Web 2.0 and the Threat to Digital Archives
All the online world’s abuzz over the seeming resurgence of enthusiasm around web development; Jeffrey Zeldman’s Web 3.0 post notwithstanding, it can seem as though the past six years didn’t happen and all is sunshine, roses and optimism for the online world. While the move toward dynamic scripting and web applications is clearly changing the ways in which we interact with the world within our browser pane (in most cases clearly for the better), there’s a small but nasty thundercloud on the horizon that hardly anyone’s talking about, but which has the potential to seriously rain on the parade.
Here’s the problem: since human beings first started scratching marks on stone, we’ve followed a trajectory away from a culture of story-telling to a culture of record-keeping. Walter Ong calls this the difference between orality and literacy; while it used to be that people would pass knowledge from generation to generation by telling each other stories, now we write things down and trust that they’ll still be there later. In short, we’ve taken our collective memory and stored it outside of our heads.
This externalization, in general, is seen as a good thing – many have argued that it lets us develop far more complex ideas, and that the encyclopedic collection of knowledge gathered in the age of Science and Reason is really only possible if you can keep it someplace objective and everlasting while each individual adds his or her own contribution. The flip side, however, is that if the warehouse where you’re storing all your books burns down, you’re left with nothing. The unseen brace holding up the entire structure of modern culture is what librarians call “preservation,” the set of practices that keep our books from crumbling and our digital tapes from demagnetizing.
Every form of media requires some form of preservation, but some are more delicate than others. If you’ve got a set of stone tablets, preservation pretty much means making sure that nobody takes a sledgehammer to them (and that acid rain doesn’t eat them away). Books are trickier, what with the acid in their paper and their vulnerability to both water and fire, but they’re still pretty robust [1]. Film and magnetic media are even more problematic, because you need not only to make sure that the celluloid doesn’t degrade or the tapes demagnetize (thus losing the information encoded therein), but also to make sure that you keep around a machine that can actually read that information should you want to.
Call it the “Betamax” dilemma – you’ve got a home movie of your now-25-year-old daughter’s 3rd birthday party, but it’s on a Betamax videocassette. Nobody makes Betamax VCRs anymore, so you’re stuck trying to find an old machine on which you can play the tape. Now, imagine that your great-great-grandson comes across the tape in a box in an attic somewhere. The point is that the more layers of mediation there are between you and the information you’re trying to preserve, the more likely it is that you won’t be able to access that information in the future [2].
For historians, this problem is particularly painful; as information gets wrapped in more and more layers of technology, the profession increasingly relies in a very real sense on the work of preservationists who keep this “stuff of history” around for future generations. Such professional archivists are often incredibly good at what they do, and have developed an extensive set of best practices for storing and preserving knowledge in its many forms. In the past decade, there’s been an explosion of digital information, and archivists have ably kept pace; data once stored on magnetic reels was moved to new formats as old ones became obsolete, and standardized XML allows archivists to extract the important information from any digital medium and move it into a platform-agnostic Esperanto of sorts. Since the web entered the picture, an increasing number of forward-thinking archivists have worked to take snapshots of both individual sites and the global web as a whole, and while these snapshots are not totally comprehensive, there’s hope that our current sea of blogs, podcasts and discussion boards need not be lost to the forces of digital erosion.
Here, however, we run headlong into the Web 2.0 brick wall. In a sense, the whole point of Web 2.0 is to make it harder to archive and preserve knowledge. As websites move from being documents to being applications, their actual content is wrapped up in increasingly difficult-to-penetrate layers of code, both client-side (which is at least capturable by webcrawlers) and server-side (which is impossible to archive without effort on the part of the site creator). Moreover, the movement toward remixing and mashing up both content and functionality means that information is increasingly inextricable from its context in very real technical ways – a good thing in some ways for user experience, but a remarkably bad thing for archivists looking not at the now but at the 200-years-from-now.
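To make the client-side half of that concrete, here is a minimal sketch in Python of what a crawler that does not run scripts would capture from a page whose sidebar is filled in after load. Everything in it is an assumption made for illustration: the page markup, the `loadPhotos` function named inside the script, and the `TextOnlyCrawler` class are hypothetical, and real archival crawlers are considerably more sophisticated than this.

```python
from html.parser import HTMLParser

# Hypothetical page: the sidebar is empty in the markup and would only be
# filled in by a script after the page loads in a browser.
PAGE = """<html><body>
<h1>My blog</h1>
<p>A post about archives.</p>
<div id="sidebar"></div>
<script>
  // would fetch photos from a remote service and insert them into #sidebar
  loadPhotos("sidebar");
</script>
</body></html>"""


class TextOnlyCrawler(HTMLParser):
    """Collects the text available to a crawler that never executes scripts."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script source is code, not content, so it is skipped here.
        if not self.in_script and data.strip():
            self.text.append(data.strip())


def visible_text(html):
    """Return the text fragments found in the static markup."""
    crawler = TextOnlyCrawler()
    crawler.feed(html)
    crawler.close()
    return crawler.text


captured = visible_text(PAGE)
print(captured)
```

The heading and the paragraph survive, but the sidebar contributes nothing, because its content never existed in the document that was fetched. A crawler could keep the script source as well, but that preserves the instructions for fetching the content, not the content itself.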
[1] Nicholson Baker does a great job of demystifying some of the fragility of books.
[2] I recently heard Clay Shirky use the metaphor of “shearing layers” à la Stewart Brand’s “How Buildings Learn” to make essentially the same point.
January 23rd, 2006 at 3:24 pm
Retaining electronic records for long-term storage is definitely a real issue. Fortunately, the International Organization for Standardization has already identified the concerns and recommended a standard way of satisfying them:
“ISO 19005, Document management – Electronic document file format for long-term preservation”
http://www.iso.org/iso/en/commcentre/pressreleases/2005/Ref974.html
For more info, particularly from a user’s point of view, try this ten-page PDF… eye-opening:
“PDFs as a Standard for Archiving”
http://www.adobe.com/products/acrobat/pdfs/pdfarchiving.pdf
For web pages, it’s easy to capture a full presentation with linked files into a single static PDF for archiving. It’s still good to keep the info live in its original database, as well as the templates into which that data flows, but for long-term storage of a website as it was viewed at the time, ISO 19005 can handle this problem.
jd/adobe
January 24th, 2006 at 8:02 am
I don’t agree with the assertion that:
“In a sense, the whole point of Web 2.0 is to make it harder to archive and preserve knowledge.”
To me, the whole point of Web 2.0 apps is that data is freer than it has previously been. Instead of forcing people to visit a website to peruse their own data, Web 2.0 apps offer APIs that allow third party developers to create other tools for viewing the same data.
To my mind, that freedom of data movement is what defines the Web 2.0 mindset. Personally, I don’t even visit sites like Flickr, del.icio.us or Upcoming all that often, even though I’m constantly adding to the store of my data they hold. I add and retrieve my content using desktop apps or more lightweight websites.
The ability to easily archive and retrieve my own content is the most important factor in my mind when it comes to choosing a centralised hosted service. As I understand it (and I may well be wrong here), that ease of access to my own content is the defining element of Web 2.0.
How the website itself chooses to present the stored data — through Ajax, Flash, or whatever — is largely irrelevant. That is, it’s very relevant for the current, short-term user experience but it’s irrelevant for the purposes of archiving and storage.
You say:
“As websites move from being documents to being applications, their actual content is wrapped up in increasingly difficult-to-penetrate layers of code, both client-side (which is at least capturable by webcrawlers) and server-side (which is impossible to archive without effort on the part of the site creator).”
But if the site creator hasn’t provided easy access to the user’s content (through an open API) then that application is, de facto, not Web 2.0.
January 24th, 2006 at 11:57 pm
I should clarify a point…as a historian, my concern isn’t just with the pure content that’s being streamed via web services; especially given my particular research interests (in the everyday experience of technology), it’s with preserving the actual end-user experience of that content, embedded in whatever Ajaxified/Flashified/DHTMLified interface it’s been streamed/mashed-up/remixed into.
Thus my point; as the actual user experience of digital content is increasingly mediated by web apps run within the browser, preservationists are going to have to move from archiving documents (which we know how to do well) to archiving applications, as well as the underlying data streams that power them.
In short – I have a Flickr stream. I can access and preserve those images and captions very easily (Jeremy’s point re: Web 2.0). What’s much harder is to preserve them as they appear in a sidebar on my blog (where most people might see them) if they’re dynamically put there by an AJAX call after the page loads. For the purposes of a web crawler (still the state of the art when you’re talking about archiving, be it the Internet Archive or the Library of Congress), that sidebar would be an impenetrable black box.
January 25th, 2006 at 8:20 am
Really interesting stuff, Josh. Given your example in your post (the Betamax tape) and your clarification in your last comment, it seems like the idea that “preservationists are going to have to move from archiving documents…to archiving applications…” would be a GOOD thing. For an archive to access that Betamax tape, they will have had to archive the “application” (i.e., the hardware) to play the tape. Or, figure out another way to access that tape, resulting in yet another “application”. The stone tablet you mention also requires these “applications”, though in a different sense. Where was it used or read? Who read it and used it? How was it transported? What language is it in? There’s no way to archive these “actual user experiences” with this stone tablet, and it’s equally difficult to archive the applications used to access the information on the tablet. The very nature of the externalization of experience itself creates layers of mediation that require some work to access, understand, and share. It seems, then, that preservationists have, since the beginning of the idea of preservation, also had to preserve the “applications” that come along with the materials they preserve, but have had difficulty preserving how the user experienced the materials.
On a different note, it seems that, overall, Web 2.0 has in fact made digital preservationists out of everyone. We’re creating our own little digital archives with blogging, Flickr, del.icio.us, Ning, etc. Web 2.0 might be making it harder for “traditional” (for lack of a better word) archivists and preservationists, but it’s made it easier for enthusiasts and amateurs to become preservationists. And, knowing your interests in amateurization, enthusiasts, and tinkering, I’m curious what your thoughts are about this. I think all this might threaten digital archives like the September 11 Archive or the Internet Archive, but I think it might in fact increase the number of digital preservationists in practice.
January 25th, 2006 at 8:58 am
[…] Josh’s recent (and really good) post examines the impact of Web 2.0 and the consequences of a “web as application” for digital preservationists. Ultimately, Josh is concerned that it has become increasingly difficult for digital preservationists to archive the “end user experience” of a web that has moved quickly away from static documents and more toward interactive applications. […]
January 25th, 2006 at 8:43 pm
[…] Mostly because this is an essay I ran across and haven’t read yet – it talks about how AJAX, or the model of software being replaced by web services, might not be so great for archiving digital information: […]
January 27th, 2006 at 11:10 am
[…] Interesting article about how it becomes increasingly difficult to archive and preserve knowledge. As websites move from being documents to being applications, their actual content is wrapped up in increasingly difficult-to-penetrate layers of code, both client-side (which is at least capturable by webcrawlers) and server-side (which is impossible to archive without effort on the part of the site creator). Moreover, the movement toward remixing and mashing up both content and functionality means that information is increasingly inextricable from its context in very real technical ways – a good thing in some ways for user experience, but a remarkably bad thing for archivists looking not at the now but at the 200-years-from-now. […]