Losing Ourselves

Vint Cerf, Internet pioneer and current Google VP, recently expressed concerns about an approaching “digital dark age” in which our prized digital memories will be lost forever, not through cataclysm or degrading media, but because of the obsolescence of the hardware and software needed to access those digital objects. Cerf proposed the creation of what he calls “digital vellum”: a snapshot that packages content together with its related software and hardware. This object would be preserved in some sort of cloud server structure along with a description of the machine needed to run it, thus allowing those in the distant future to access this era’s content by reproducing the environment in which it was originally created and stored.

This is a very real problem, one that anyone who grew up playing ancient video games is painfully aware of. My first computer was a ZX Spectrum+. It was a big improvement on the original Spectrum, with hard plastic keys (as opposed to the softer, much less user-friendly rubber keys of the previous version) and a whopping 48k of RAM (as opposed to 16k). When I was 13 my family moved from Uruguay to the US, and the Spectrum didn’t survive the trip. I moved on to the NES, and that was that, at least until the age of the emulator dawned. Today, anyone feeling the pangs of nostalgia can go to one of several fan sites and download a Spectrum emulator, along with a broad complement of games and other programs. The machine itself has disappeared, but its spirit, if you will, lives on to captivate new generations of enthusiasts, at least in theory. I mean, there’s only so much you can do with 48k.

The problem with the emulator model is that it requires a fan base with enough critical mass and resources to support the effort necessary to create (and update) the emulation software, something that will become less and less likely as the original users of these machines give way to new generations that lack familiarity with them. My son will grow up playing Wii, or PS4; why would he care about the Spectrum, or the original NES? Digital vellum is a better idea, more akin to digital archeology and/or anthropology, but who is going to devote resources to a project of this magnitude? Software and hardware creators are unlikely to carve room out of their bottom lines for something like this. If Mr. Cerf is this concerned about the idea, then Google may get involved, but we’ve seen what happens when they grow bored with a project (or find that it’s not as profitable as they once thought). Organizations like the Internet Archive are committed to digital preservation, but their resources are limited. So who is going to do this?

Providing Access

I recently spent some time helping out on the Digital Library Collections (DLC) project here at Columbia, an initiative to create a single portal for discovery of the Libraries’ non-textual digital assets. Pre-DLC, access to Columbia University Libraries (CUL) digital assets was provided through custom project sites like this one. These sites place individual assets within the context of the overall collection, which is great, but they limit a researcher’s ability to search across multiple collections: you would have to go to this page and search each collection separately, one by one. The DLC portal provides a more global approach to digital object discovery.

In order to maximize the tool’s effectiveness, the team designed a universal, extensible schema that can be applied to any digital collection included in the site. This schema will inform data collection and creation for all future digital efforts, giving an individual project stakeholder a guide that will help her optimize collection discovery. It also ensures a consistent metadata baseline across a diverse set of collections (images, audio, video, the born-digital versions of each, etc.). But first, metadata for existing collections must be updated wherever possible to bring it up to the new DLC standard (a process that is currently ongoing).
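To give a flavor of what a baseline record under such a schema might look like, here is a minimal sketch in Python. The field names and values are my own illustration, not the actual DLC schema; the point is a small set of required fields shared by every collection, plus room for collection-specific extensions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative sketch of a baseline descriptive record; these field
# names are hypothetical, not the actual DLC schema.
@dataclass
class DigitalObjectRecord:
    identifier: str                      # persistent identifier for the asset
    title: str
    collection: str                      # parent collection
    object_type: str                     # e.g. "image", "audio", "video"
    date_created: Optional[str] = None   # ISO 8601 where known
    rights: Optional[str] = None
    subjects: List[str] = field(default_factory=list)
    # Extensible: collection-specific fields live here without
    # breaking the shared baseline.
    extensions: dict = field(default_factory=dict)

record = DigitalObjectRecord(
    identifier="cul:0001",
    title="Letter, sender to recipient",
    collection="Example Papers",
    object_type="image",
    subjects=["correspondence"],
)
```

The baseline fields are what make cross-collection search possible; the extensions are what keep individual projects from having to flatten away everything distinctive about their materials.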

The project team also took into account the various technical formats used across these collections, and decided to create a homogeneous derivative set that would support the desired image manipulation functionality (in this case, JPEG 2000). This is a decision that will have to be revisited periodically as available technology improves; format obsolescence or future functionality needs may drive a shift to a different format.
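As a rough illustration of what producing such a derivative set involves (and emphatically not the Libraries’ actual pipeline), here is a small Python sketch using Pillow, which can write JPEG 2000 when the OpenJPEG codec is available. It assumes, hypothetically, TIFF masters in one directory and writes .jp2 derivatives to another:

```python
from pathlib import Path
from PIL import Image  # Pillow; JPEG 2000 output requires the OpenJPEG codec

def make_jp2_derivatives(master_dir: str, deriv_dir: str) -> None:
    """Walk a directory of master TIFFs and write JPEG 2000 derivatives."""
    out = Path(deriv_dir)
    out.mkdir(parents=True, exist_ok=True)
    for tiff in Path(master_dir).glob("*.tif"):
        with Image.open(tiff) as img:
            img.save(out / (tiff.stem + ".jp2"), "JPEG2000")

make_jp2_derivatives("masters", "derivatives")
```

This is also why the decision is revisitable at all: as long as the master files are preserved, moving to a different derivative format later is a matter of re-running a pass like this one.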

Because digital texts are cataloged in CLIO (Columbia’s online catalog), they were excluded from this tool, which is geared toward the display and manipulation of images. Of course, the demarcation between text and image is not always clear: what about, for example, correspondence? And what about mixed collections (those that include both image- and text-based objects)? Ultimately, the project’s scope was widened to include archival types of textual material (like correspondence) but not full book-like objects that merit individual description through a cataloging system. DLC will include images and archival parts of mixed collections, while the book-like objects will be cataloged individually and made available through CLIO. DLC will also be used as a tool to examine born-digital collections.

This project allowed me to dig into the complex issues surrounding the provision of meaningful access to a contemporary academic library’s collections. It’s not enough to make objects available; the metadata foundation needs to be there for researchers to find everything the library has on their specific subject. This means a lot of work up front, and a need to constantly re-examine standards as collections incorporate new types of materials that may not be readily described by existing schemas. For example, the distinction between image-based and text-based objects makes sense for the project right now, but will it be meaningful in five years, as the proportion of born-digital materials increases?