Requirements and National Digital Infrastructures: Digital Preservation in the Humanities

Author: Sharon Webb (Digital Repository of Ireland)

We historians, literary scholars, linguists, philosophers, musicians, and others, as practitioners of the humanities, have embraced the use of digital technology in all aspects of our work. We use online digital infrastructures to access the vast majority of our sources, we use bibliographic management systems to automate the referencing and organization of source material, we use spreadsheets and databases to structure, analyze, and visualize our data, and we use powerful text editors to typeset our work which, in many ways, support “rapid prototyping” of scholarly texts. We also disseminate our research online through tweets, blogs, and personal and academic websites. In a sense these new tools have seeped into the culture of humanities research and have become part of what we do—they are immersed in the culture (of the humanities) and extend the humanities toolkit.[1] Yet, while some theorists consider technologies as agents of change and new mediums as entities that, as Willard McCarty states, shape our thought, the majority of users who employ technology within the humanities today are agnostic about the impact its use has on their research and are unaware of how, or if, it shapes their thought.[2] They make use of various digital tools but make little to no reference to the theories or frameworks that occupy scholars working within the realm of Digital Humanities; for the most part, they simply use the tools or technologies at their disposal. It is within the remit, then, of Digital Humanities to acknowledge these considerations and to scope out functionality based on the needs of humanities researchers; but it is also within its remit to consider that, from this emerging culture—that is, Digital Humanities—there are specific needs that only come to the fore because of its existence, because of the capabilities provided by the field.
This paper will consider the importance of requirements analysis to the further development of the humanities toolkit and will look at how national digital infrastructures that focus on long-term digital preservation can support new developments in humanities and Digital Humanities. It posits that Digital Humanities projects, as well as other digital research, should utilize existing digital environments, especially those that focus on providing long-term digital preservation support and guidance, in order to fulfill requirements such as storage and access; in doing so, researchers exploiting these infrastructures can focus more usefully on developing digital tools that engage different users in new types of knowledge creation and development.

Both the importance and value of the humanities have gone through a number of transformative epochs linked to technological progress and advances in our information mediums. Technologies—consisting of what we do and what we make—extend our natural abilities and help us overcome natural shortfalls or deficiencies; in essence, they supplement what we, as humans, do. Digital Humanities is a result of the current digital technological revolution and is the latest in a series of transformations that have fundamentally changed how the humanities, as a discipline, function, as well as how scholars carry out their work. The use of digital technology in contemporary research is omnipresent. Yet, as Cathy Davidson states, historically we have adopted new technologies “without any particular new social or philosophical arrangements”—their use “simply drift[s] into the culture.”[3] Within the humanities, however, there is a difference between scholars who simply use these new technologies—that is, for whom their use “drift[s] into the culture” of contemporary humanities research—and scholars who experiment, build, innovate, and create within the Digital Humanities culture. It is the remit of Digital Humanities to add value to the “conventional” mainstream use of technology, yet what counts as “added value” is highly subjective and ultimately depends upon “your” perspective on Digital Humanities as well as upon your research priorities and goals. It is also dependent on the artifactual evidence you require in order to resolve or inform particular research statements. Traditional artifactual evidence, consisting of letters, journals, ledgers, minutes, diaries, photos, scripts, novels, and so on, is the focal point of humanities research. These artifacts contain the evidence required by researchers to create dialogue between sources, people, places, and events. These in turn inform the arguments and research statements within scholarly writings.
But what added value can Digital Humanities projects bring to these icons of traditional humanities research, beyond online or digital access, and what provisions can we make for interaction and engagement with born digital artifacts? What digital tools can enhance humanities research without detracting from the core of humanities research? Or, alternatively, must we consider how the traditional sense or functionality of humanities research is changing? In order to answer some of these questions, we must first give thought to the “requirements” of the humanities scholar: what are their needs? What are the essential, or necessary, features of humanities research activities and endeavors?

Within software engineering, the process known as “requirements engineering” is an essential activity that informs the shape, functionality, and development of any software project; as such, it should be one of the first activities any Digital Humanities project—or software development project with a humanities focus—should undertake. In order to build the correct digital tools, we must first understand what it is we are trying to build. What questions do we want answered, what are the needs of the user, and how can these be supported? We need to understand the problem domain before we start building the solution. There are many ways to inform requirements, including participant observation, requirements interviews, surveys, and focus groups.[4] Essentially, all requirements elicitation forums should concentrate on listening to the needs of all potential users—in this case, the humanities scholar. However, careful consideration must also be given to future users, as well as to public audiences and the wider community of researchers. For example, a digital history project will have particular requirements based on the needs of academic scholars or, depending on the scope of the project, the general public (for example, citizens who are interested in a particular historical narrative or era, and who may include amateur historians as well as individuals interested in public history). Importantly, a project may have to cater to all of these needs. Dialogue between those who are developing digital tools for humanities research and any potential end user is crucial. This dialogue is necessary in all software development projects. It is within this dialogue that the two cultures—in other words, the software engineers and the humanists—explicitly meet. However, within Digital Humanities, this dialogue is essentially and intrinsically part of that culture, as many Digital Humanities scholars both carry out humanities research and build software tools.
The needs of a humanities scholar, or indeed of any “user,” are dependent upon their activities, tasks, and goals. Therefore, we must also examine the needs of the Digital Humanities scholar. Understanding the user within these contexts allows for a comprehensive understanding of how digital tools or technologies should support, augment, or transform existing methodologies as well as create new ones. It is only from this perspective that we can develop projects and tools that are fit for purpose and that cater to the authentic needs of the user.

In terms of the sources used to support, inform, and drive humanities scholarship, what are the essential criteria or features of this artifactual evidence? From the perspective of the humanities scholar, he or she essentially requires access to sources. More than that, he or she requires sustainable access to sources in order to cultivate and maintain trust, credibility, and authority in his or her scholarly endeavors—this is essential for current as well as future users. Linked to this requirement, the humanities scholar also needs to cite and reference the evidence. In addition, a humanities scholar may need to authenticate a source, to ensure that it is what it purports to be. Finally, he or she may also want to republish sources within publications—like articles, websites, and so on—or organize and collate sources for dissemination or aggregated interrogation.

These requirements exist outside, and are independent of, the digital realm—they are not driven by the digital domain but reflect the authentic needs of humanities scholars within the context of their sources. Yet, the digital medium has revolutionized the dissemination of information, and all manner of cultural objects are now available digitally online, and are often publicly accessible. Within a digital framework, new requirements arise as a result of the digital medium with the introduction of new capabilities, new opportunities, and indeed new audiences; the Digital Humanities framework also has the same effect.

To create digitally enabled access to traditional, physical sources (such as letters, books, and so forth) we must first create a digital surrogate, or representation, of that object. It is for this reason that digitization programs that incentivize the creation of digital surrogates of artifactual evidence for access as well as conservation purposes are a popular activity within, and outside of, Digital Humanities. We must remember, though, that while digitization can help us conserve objects—as it reduces physical handling, for one—it should not be viewed as preservation. So, while the high-level requirement for access is fulfilled, the requirement for sustainable access is not. It is a mistake to fund digitization projects in order to “preserve” a physical object without also providing, or allocating, funds to stabilize and preserve the newly created digital one. Digitization efforts can often divert funds from conservation or preservation of the physical object; while this may not have an immediate effect, negligence in the short to medium term is costly in the long term.[5] The same can also be said for preserving digital surrogates and objects.

The requirement to provide sustained access is linked to the need for openness and transparency with the artifactual evidence used in academic and public research. The digitization process can support access to sources, but this does not ensure that the object is available, accessible, or usable in the future. Traditionally the responsibility to store and preserve cultural objects has fallen to cultural institutions, archives, and libraries. However, their remit now extends to creating digital surrogates of these objects in order to make them accessible and functional online or in a digital environment. Digitization—that is, making a digital image of an object, creating associated metadata, and, where applicable, encoding the text—has been a key activity within Digital Humanities projects. It can be either the final goal of a project or a feature that supports or enables further research and development. Access may be considered the minimum in terms of the functionality of digital objects and Digital Humanities research, but the effort involved to fulfill this requirement is not trivial. And yet, this process or technique is not wholly or solely the concern or remit of Digital Humanities; rather, it is an activity that extends to all areas of humanities and is part of the humanities’ extensive toolkit. Whether digitization of particular sources satisfies a project’s goals or deliverables is, of course, dependent on those goals and, indeed, the requirements of the various users and the project.

Beyond digitization, a huge and often complex part of access is the building and managing of websites, online exhibitions, digital libraries, archives, and repositories; once an object is digitized, it needs a home to facilitate access. In this context, and in the context of the capabilities of such environments, we must further consider, and reconsider, the needs of the various users. What functionality will a user expect beyond access, and how can added value be achieved through tools that can help users interrogate, manipulate, and repurpose content? The development of such digital infrastructures must be driven by those user requirements, but inevitably policy and business requirements will also drive this development—considerations such as copyright law, intellectual property rights, and licensing for reuse and repurposing must be addressed.[6] However, when we provide access, how can we ensure that the same object is available to future users? Can we provide sustained access to an object’s form and functionality? Can we satisfy the needs of the future user or researcher? These types of projects, which often use open-source repository and content management systems, are susceptible to problems related to maintenance and sustainability.[7] So, while projects that digitize and build portals for access are fulfilling a number of primary requirements such as access and citation functionality, they require continued and persistent resources to sustain and preserve both the content and the form, that is, the infrastructure, in order to fulfill the requirement of sustained access.[8]

Sustained access to sources—as well as to the websites, mobile apps, and other digital environments that use, repurpose, and create new resources and research data—is paramount to building trust between the various “users” of humanities scholarship; but, importantly, it also gives credibility to that scholarship. Many of these environments are also the output of various research projects, and therefore their form and functionality must also be considered for preservation. But long-term digital preservation is a complex task, and it is one requirement that we can never truly say is fulfilled or complete. It is an activity that requires constant awareness given the volatile nature of the digital medium and the quick pace of current technological change and advances. As a process and an activity, it is dependent upon many individual, interconnecting components—factors include hardware obsolescence (for example, degradation of storage media and other equipment), software obsolescence (for example, problems associated with backward compatibility and software upgrades, unsupported software applications, outdated code, and so on), as well as the financial and institutional supports required to sustain software applications and online environments. And yet, the question must be asked: is this a body of work that should be part of Digital Humanities? If so, at what point does the “digital” become disruptive to the “humanities”? Sustaining and developing architectures that are tasked with long-term preservation is a complex task but also an essential one; however, even though Digital Humanities research and project outputs should include provisions for long-term digital preservation, they should not necessarily have to supply the solutions.

Providing sustainable access to objects is an aspect of Digital Humanities that should utilize the national digital infrastructures that are being built and developed everywhere. National digital (information) infrastructures can support humanities endeavors in the digital environment and can offer digital tools to help with user engagement, interrogation, and manipulation of digital content. One such example, in the Irish context, is the Digital Repository of Ireland (DRI), which publicly launched its “Data Seal of Approval”-accredited Trusted Digital Repository in June 2015. The Digital Repository of Ireland is built by a research consortium of six academic partners working together to deliver the repository, policies, guidelines, and training. These research consortium partners are: Royal Irish Academy (RIA, lead institute), National University of Ireland, Maynooth (NUIM), Trinity College Dublin (TCD), Dublin Institute of Technology (DIT), National University of Ireland, Galway (NUIG), and National College of Art and Design (NCAD). DRI is also supported by a network of academic, cultural, social, and industry partners, including the National Library of Ireland (NLI), the National Archives of Ireland (NAI), and RTÉ. Originally awarded €5.2M from the Higher Education Authority PRTLI Cycle 5 for the period of 2011-2015, DRI has also received awards from Enterprise Ireland, Science Foundation Ireland, the European Commission’s Seventh Framework Programme (FP7), and the Ireland Funds, and has extended its funding out to 2019. The DRI is charged with the responsibility of “building an interactive national trusted digital repository for contemporary and historical, social and cultural data held by Irish institutions.”[9] Additionally, it seeks to “link together and preserve the rich data held by Irish institutions, providing a central internet access point and interactive multimedia tools [...] for use by the public, students and scholars.”[10]

National digital infrastructures—that support national data curators and custodians of culture—remove some of the burden associated with particular aspects of Digital Humanities projects, particularly that of curatorial or archival roles. As stated, sustainability of Digital Humanities projects is a crucial concern: leveraging state-funded digital infrastructures can alleviate some of the unknown or potential risks involved when project funding runs out. Utilizing national digital infrastructures such as DRI allows the Digital Humanities community to concentrate on developing new tools for user engagement, to develop those added-value features that support new digital methods based upon the capabilities and underlying structure of a trusted digital repository that supports long-term digital preservation. The utilization of these infrastructures can also relieve projects of some of the responsibility for creating, maintaining, and/or acquiring digital repositories, archives, or libraries, and it provides the means or the room to explore new requirements for current and future users. That said, these national infrastructures still require community and user engagement and cooperation, much as do those institutions that are traditionally charged with preserving our cultural heritage.

Pragmatic concerns related to long-term digital preservation thus become a national concern, and the use of a national digital infrastructure supports and promotes collaborative efforts among and within the local, regional, national, and international community. National discourse and dialogue also promote and identify areas in which collaboration can enrich Digital Humanities projects that are created nationally. DRI’s national survey on requirements and policy—carried out, along with the subsequent report, between November 2011 and August 2012—outlines the need for national digital infrastructures that cater to local, national, and international audiences.[11] While the specific needs of this community of users are diverse, the need for provisions, education, and collaboration on digital preservation was evident. In addition, on a national level, best-practice guidelines in areas such as digitization, metadata, and controlled vocabularies, among others, were identified as essential to the development of coherent national strategies to support not only Digital Humanities projects but also projects further afield (as, for example, within the social sciences). These national strategies are essential to the sustainability and longevity of our digital cultural heritage and rely upon the cooperation and collaboration between and within the national cultural heritage institutions—the custodians of our cultural heritage, now alongside Digital Humanities projects. DRI promotes this essential dialogue between the cultural partners and has tasked itself with providing information and standards for the community; the development of these guidelines, however, relies upon cooperation with that same community.

Importantly, DRI encourages the use and reuse of publicly accessible content and allows third parties to develop tools on top of the DRI platform, using its REST-based API.[12] This requirement was identified through DRI’s stakeholder interviews, in which it was revealed that there existed a significant gap in the production and development of digital tools to support user engagement and interaction with digital cultural heritage. It was in this area, as well as long-term digital preservation, that stakeholders required additional resources to meet user demand and increased user expectations. This development of digital tools to support engagement with content also reflects the needs and requirements of humanities researchers as the digital medium creates new ways and means to interact with sources. The Digital Humanities community can capitalize on existing architectures such as DRI to provide the means to store and preserve digital content, thereby encouraging and supporting the development of tools that motivate new interactions, manipulation, and interrogation of the artifacts related to humanities research. Indeed, utilizing already existing architectures that are geared towards and focused on providing sustainable access encourages and supports the development of new requirements, for new users and new audiences. One such project that leveraged the DRI infrastructure is Inspiring Ireland. This project demonstrates the benefit and value of using existing national infrastructure to promote cultural heritage and highlights the need to preserve our digital cultural record. As the Inspiring Ireland website states, “our digital records and images are fragile and will be lost over time without a trusted national infrastructure to preserve them.”[13] The Inspiring Ireland project was able to develop quickly and efficiently given the existence of a national infrastructure.

Democratization of knowledge through national digital channels not only supports humanities-based research but can also provide levels of trust associated with authenticity of documents, as well as with long-term preservation of data. Additionally, the use of centralized digital infrastructures promotes transparency in scholarship and enables scholars and readers to engage more thoroughly with the sources, resources, and artifacts that influence and inform scholarly narratives. National digital infrastructures allow us to fulfill the stated needs of the humanities scholar in terms of artifactual evidence and encourage the development of new digital methods and tools through the capabilities of the infrastructure, the community, and shared resources.

When we consider the humanities within the context of current technological change, we are reconfiguring the requirements of the humanities scholar through the digital lens and extending the functionality of humanities endeavors: we are supplementing our memory, our methods of research, and our natural scholarly ability to cope with and consider the scale of information available to us. Digitization and the development of supporting architectures enhance access and usability of humanists’ sources. Through requirements engineering and engagement with potential users, Digital Humanities projects can add value to those objects by exploring new possibilities born from the digital medium. But, as stated, we must make provisions for long-term digital preservation of those research outputs. Preservation is an activity that cannot happen retrospectively, and it is only useful if procedures and processes are in place now. We must explicitly make those social and philosophical, as well as cultural, arrangements to support digital preservation activity. Digital preservation is about anticipating and observing changes in technology that could potentially result in file corruption, loss, or inaccessibility—formats evolve, websites die, hard drives fail. But these procedures have as much to do with policy and business strategies as they do with technological solutions that contribute to the preservation effort. Digital preservation must be an active process; it cannot be reactive or passive. In the digital medium, the focal point of humanities research—artifactual evidence—is volatile and fragile, and it requires strategies and procedures to sustain it into the future. National digital infrastructures can provide supports for these issues. Additionally, aggregating content through national digital channels can support comprehensive searches and allow centralized access to our knowledge and cultural economies; it can also support new and unanticipated requirements.
Utilizing national infrastructures such as DRI can remove some of the burden related to Digital Humanities projects—that of sustainability and maintenance—and can help Digital Humanities practitioners, researchers, observers, and theorists focus on adding value to digital resources through the creation and generation of new digital tools and technologies that will, ultimately, help shape our thought.

