I recently spent some time helping out on the Digital Library Collections (DLC) project here at Columbia, an initiative to create a single portal for discovery of the Libraries’ non-textual digital assets. Before DLC, access to Columbia University Libraries (CUL) digital assets was provided through custom project sites like this one. The sites place individual assets within the context of the overall collection, which is great, but they limit a researcher’s ability to search across multiple collections: you would have to go to this page and search each collection separately, one by one. The DLC portal provides a more global approach to digital object discovery.
To maximize the tool’s effectiveness, the team designed a universal, extensible schema that can be applied to any digital collection included in the site. This schema will inform data collection and creation for all future digital efforts, providing an individual project stakeholder with a guide that will help her optimize collection discovery. It also ensures a consistent metadata baseline across a diverse set of collections (images, audio, video, and the born-digital versions of each, etc.). But first, metadata for existing collections must be updated wherever possible to bring it up to the new DLC standard (a process that is currently ongoing).
The project team also took into account the various technical formats used across these collections, and decided to create a homogeneous set of derivatives (in this case, JPEG 2000) that would support the desired image manipulation functionality. This decision will have to be revisited periodically as available technology improves; format obsolescence or future functionality needs may drive a shift to a different format.
Because digital texts are cataloged in CLIO (Columbia’s online catalog), they were excluded from this tool, which is geared toward the display and manipulation of images. Of course, the demarcation between text and image is not always clear: what about, for example, correspondence? And what about mixed collections (those that include both image- and text-based objects)? Ultimately, the project’s scope was widened to include archival types of textual material (like correspondence) but not full book-like objects that merit individual description through a cataloging system. DLC will include images and archival parts of mixed collections, while the book-like objects will be cataloged individually and made available through CLIO. DLC will also be used as a tool to examine born-digital collections.
This project allowed me to dig into the complex issues surrounding the provision of meaningful access to a contemporary academic library’s collections. It’s not enough to make objects available; the metadata foundation needs to be there for researchers to be able to find everything the library has on their specific subject. This means a lot of work up front, and the need to constantly re-examine standards as collections incorporate new types of materials that may not readily be described by existing schemas. For example, the distinction between image-based and text-based objects made sense for the project right now, but will that be meaningful in five years, as the proportion of born-digital materials increases?