Nobody knows what Digital Curation is. But they’re all doing it.
January 9, 2013
As you may have noticed, I am quite keen on conceptual clarity, and the important role of semantics (word meanings) in achieving that. The field of the information professional is, rather paradoxically I would have thought, fraught with vague terms, obscure phrases, mots du jour and recondite terminology. Well, I suppose every field of endeavour has to keep things a little elusive in order to protect its territory from invasion by barbarians – those who have not been properly initiated. But with the increasing fascination with ‘Digital Curation’, the passion for trend-following and re-inventing the wheel, together with a basic lack of clarity, seems to have reached its zenith.
Consider, if you will, the following interpretations of the phrase. This will undoubtedly remind you of Lewis Carroll’s Humpty Dumpty (“‘When I use a word,’ Humpty Dumpty said, in rather a scornful tone, ‘it means just what I choose it to mean – neither more nor less.’”).
1. A digital curator is a normal curator, but works with digital materials.
The traditional definition of curator is someone who is the keeper of a museum or other collection. In social media, a curator is the keeper of their interest graphs. By discovering, organizing, and sharing relevant content from around the Web, curators invest in the integrity and vibrancy of their nicheworks and the relationships that define them. Information becomes currency and the ability to repackage something of interest as a compelling, consumable and also sharable social object is an art. As a result, the social capital of a curator is earned through qualifying, filtering, and refining relevant content and how well objects spark engagement and learning (Solis, 2011, online).
2. Digital Curation.
Digital curation is the selection, preservation, maintenance, collection and archiving of digital assets. Digital curation generally refers to the process of establishing and developing long term repositories of digital assets for current and future reference by researchers, scientists, historians, and scholars. Enterprises are starting to utilize digital curation to improve the quality of information and data within their operational and strategic processes (Wikipedia, 2012b).
Digital curation, broadly interpreted, is about maintaining and adding value to a trusted body of digital information for both current and future use: in other words, it is the active management and appraisal of digital information over its entire life cycle (Pennock, 2007, online).
Pennock’s ‘lifecycle’ approach is a variation of that used in records management, which has, to all intents and purposes, been replaced by the continuum approach as developed by Upward and McKemmish (see McKemmish, 1997).
3. Similar to, or the same as, ‘data curation’
Data curation is a term used to indicate management activities required to maintain research data long-term such that it is available for reuse and preservation. In science, data curation may indicate the process of extraction of important information from scientific texts, such as research articles by experts, to be converted into an electronic format, such as an entry of a biological database. The term is also used in the humanities, where increasing cultural and scholarly data from digital humanities projects requires the expertise and analytical practices of data curation. In broad terms, curation means a range of activities and processes done to create, manage, maintain, and validate a component (Wikipedia, 2012a, online).
… data curation is a means to collect, organize, validate, and preserve data so that scientists can find new ways to address the grand research challenges that face society (Choudury, 2010, online).
Data curation views the whole life cycle of digital objects and views the object in its intellectual context. It strives to preserve not only the data item but also the meaning of the digital object to foster comprehension and reusability in the future when the original context (e.g., the creators, organisational environment) has disappeared.
Components of data curation include data and metadata modelling, appraisal of the object’s value for selection as well as retention/disposition, integration of the data item in usage environments, object versioning, and others. Examples include the capture, cleaning, filtering, and iterative processing and visualisation of instrumental data; or the enrichment of ancient texts with dictionaries and other contextual data from their time of creation.
Data curation tasks usually involve considerable domain knowledge and depend on the cooperation of the creators, as well as close embedding in the creation context. Generic service providers are therefore usually not in place to fully cover data curation. Data-quality issues (e.g., accuracy, completeness, and consistency) are clearly very community specific. However, more specialised service providers, such as World Data Centers, can adequately support curation within a community (GRDI 2020, 2011).
4. Content Curation as a synonym.
Content curation, meant as the capacity of filtering and adding value to the content we receive and are exposed to everyday from all the online sources… A content curator is a critical knowledge broker who seeks, collects and shares on a continuous base the most relevant content in her area of expertise (Fiorelli, 2011).
Content curation is the act of discovering, gathering, and presenting digital content that surrounds specific subject matter (Mullan, 2011).
Content curation seems to be very similar to what librarians have traditionally done, except in digital mode.
5. Digital Library
A digital library is a library in which collections are stored in digital formats (as opposed to print, microform, or other media) and accessible via computers. The digital content may be stored locally, or accessed remotely via computer networks. A digital library is a type of information retrieval system (Wikipedia, 2012c, online).
An informal definition of a digital library is a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network. A crucial part of this definition is that the information is managed. A stream of data sent to earth from a satellite is not a library. The same data, when organized systematically, becomes a digital library collection. Most people would not consider a database containing financial records of one company to be a digital library, but would accept a collection of such information from many companies as part of a library. Digital libraries contain diverse information for use by many different users. Digital libraries range in size from tiny to huge. They can use any type of computing equipment and any suitable software. The unifying theme is that information is organized on computers and available over a network, with procedures to select the material in the collections, to organize it, to make it available to users, and to archive it (Arms, 2000, p. 9).
Digital libraries basically store materials in electronic format and manipulate large collections of those materials effectively. Research into digital libraries is research into network information systems, concentrating on how to develop the necessary infrastructure to effectively mass-manipulate the information on the Net (NSF, 1999).
A digital library, like any library, is a service which is based on principles of selection, acquisition, access, management and preservation, related to a specific client community (Cathro, 1999, online).
6. Digital archives
Of particular note is the definition of ‘digital archiving’ which, for the purposes of this paper, is taken to include all of the processes associated with selecting, acquiring, describing, managing, preserving and providing access to digital collections. The choice of the phrase ‘archiving’ is deliberate in that it aims to reclaim and reinstate meaning to a word that has been co-opted and used in a narrow and distorted way by the information technology industry and profession (CAARA, 2006, p. 7).
A digital archives is a repository that stores one or more collections of digital information objects with the intention of providing long-term access to the information. A digital archives can be a sophisticated, multi-tiered storage system or simply a C:\ drive on someone’s home computer (van Garderen, 2005, online).
7. Digital repositories
Digital archives accept and preserve digital content for long-term use. Increasingly, stakeholders are creating large-scale digital repositories to ingest surrogates of archival resources or digitized books whose intellectual value as surrogates may exceed that of the original sources themselves. Although digital repository developers have expended significant effort to establish the trustworthiness of repository procedures and infrastructures, relatively little attention has been paid to the quality and usefulness of the preserved content itself. In situations where digital content has been created by third-party firms, content quality (or its absence in the form of unacceptable error) may directly influence repository trustworthiness. This article establishes a conceptual foundation for the association of archival quality and information quality research. It outlines a research project that is designed to develop and test measures of quality for digital content preserved in HathiTrust, a large-scale preservation repository. The research establishes methods of measuring error in digitized books at the data, page, and volume level and applies the measures to statistically valid samples of digitized books, adjusting for inter-coder inconsistencies and the effects of sampling strategies. The research findings are then validated with users who conform to one of four use-case scenarios: reading online, printing on demand, data mining, and print collection management. The paper concludes with comments on the implications of assessing archival quality within a digital preservation context (Conway, 2011, online).
8. Digital Humanities
The digital humanities is an area of research, teaching, and creation concerned with the intersection of computing and the disciplines of the humanities. Developing from an earlier field called humanities computing, today digital humanities embrace a variety of topics ranging from curating online collections to data mining large cultural data sets. Digital Humanities currently incorporates both digitized and born-digital materials and combines the methodologies from the traditional humanities disciplines (such as history, philosophy, linguistics, literature, art, archaeology, music, and cultural studies) with tools provided by computing (such as data visualisation, information retrieval, data mining, statistics, computational analysis) and digital publishing (Wikipedia, 2012d, online).
Schnapp et al. (2008) make this overlap (or confusion) very clear: they consider the digital humanities to be an ‘array of practices’ and give these sites as examples.
Index Thomisticus http://www.corpusthomisticum.org/it/
Internet Archive http://archive.org/
Perseus Digital Library http://www.perseus.tufts.edu/hopper/
The wide array of work undertaken in the area of digital humanities, as described by Zorich (2008), indicates its scope. It is yet to be clarified which of these tasks are to be undertaken by digital librarians, and which by humanities scholars (or even computer scientists).
A digital humanities center is an entity where new media and technologies are used for humanities-based research, teaching, and intellectual engagement and experimentation. The goals of the center are to further humanities scholarship, create new forms of knowledge, and explore technology’s impact on humanities based disciplines. To accomplish these goals, a digital humanities center undertakes some or all of the following activities:
- builds digital collections as scholarly or teaching resources;
- creates tools for
  - authoring (i.e., creating multimedia products and applications with minimal technical knowledge or training)
  - building digital collections
  - analyzing humanities collections, data, or research processes
  - managing the research process;
- uses digital collections and analytical tools to generate new intellectual products;
- offers digital humanities training (in the form of workshops, courses, academic degree programs, postgraduate and faculty training, fellowships, and internships);
- offers lectures, programs, conferences, or seminars on digital humanities topics for general or academic audiences;
- has its own academic appointments and staffing (i.e., staff does not rely solely on faculty located in another academic department);
- provides collegial support for, and collaboration with, members of other academic departments within the DHC’s home institution (e.g., offers free or fee-based consultation services; enters into collaborative projects with other campus departments);
- provides collegial support for, and collaboration with, members of other academic departments, organizations, or projects outside the DHC’s home institution (e.g., offers free or fee-based consultation to outside groups; enters into collaborative projects with external groups);
- conducts research in humanities and humanities computing (digital scholarship);
- creates a zone of experimentation and innovation for humanists;
- serves as an information portal for a particular humanities discipline;
- serves as a repository for humanities-based digital collections (e.g., Web sites, electronic text projects, QuickTime movie clips);
- provides technology solutions to humanities departments (e.g., serves an information …) (Zorich, 2008).
Please note that all material in grey boxes is extracted from other works, which are listed below.
Arms, William. 2000. Digital libraries. Cambridge [MA]: MIT Press.
Bodnar, Kipp. 2010. The marketer’s guide to content curation [Online]. http://blog.hubspot.com/blog/tabid/6307/bid/6800/A-Marketer-s-Guide-to-Content-Curation.aspx
Cathro, Warwick. 1999. Digital libraries: a national library perspective. [Online]. http://www.nla.gov.au/openpublish/index.php/nlasp/article/viewArticle/1112/1375
Choudury, Sayeed. 2010. Data curation: an ecological perspective. In: College and research libraries news. [Online]. http://crln.acrl.org/content/71/4/194.full
Conway, Paul. 2011. Archival quality and long-term preservation: a research framework for validating the usefulness of digital surrogates. In: Archival science. [Online]. http://hathitrust-quality.projects.si.umich.edu/files/Article.pdf
Council of Australasian Archives and Records Authorities. 2006. Digital archiving in the 21st century: archives domain discussion paper. Canberra: National Archives of Australia. [Online]. http://www.caara.org.au/wp-content/uploads/2010/03/DigitalArchiving21C.pdf
Fiorelli, Gianluca. 2011. Content Curation: definition and generation. [Online]. http://www.iloveseo.net/content-curation-definition-and-generation/
Global Research Data Infrastructure 2020. 2011. [Online]. http://grdi2020.eu/mediawiki/index.php/Data_Curation
Lord, Philip et al. 2010. From data deluge to data curation. [Online]. http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/150.pdf
McKemmish, Sue. 1997. Yesterday, today and tomorrow: a continuum of responsibility. In: Proceedings of the Records Management Association of Australia 14th National Convention, 15-17 Sept 1997, RMAA Perth 1997. [Online]. http://www.infotech.monash.edu.au/research/groups/rcrg/publications/recordscontinuum-smckp2.html
Mullan, Eileen. 2011. What is content curation? In: EContent. [Online]. http://www.econtentmag.com/Articles/Resources/Defining-EContent/What-is-Content-Curation-79167.htm
Pennock, Maureen. 2007. Digital curation: a life-cycle approach to managing and preserving usable digital information. In: Library and archives, (1). [Online]. http://www.dcc.ac.uk
Schnapp, Jeffrey et al. 2008. What is(n’t) Digital Humanities? In: The digital humanities manifesto 2.0. [Online]. http://www.humanitiesblast.com/manifesto/Manifesto_V2.pdf
Solis, Brian. 2011. The curation economy and the 3Cs of information commerce. [Online]. http://www.briansolis.com/2011/04/the-curation-economy-and-the-three-3c%E2%80%99s-of-information-commerce/
Van Garderen, Peter. 2005. Digital archives. [Online]. http://archivemati.ca/2005/11/08/digital-archives/
Wikipedia. 2012a. Data curation. [Online]. http://en.wikipedia.org/wiki/Data_curation
Wikipedia. 2012b. Digital curation. [Online]. http://en.wikipedia.org/wiki/Digital_curation
Wikipedia. 2012c. Digital library. [Online]. http://en.wikipedia.org/wiki/Digital_libraries
Wikipedia. 2012d. Digital humanities. [Online]. http://en.wikipedia.org/wiki/Digital_humanities
Zorich, Diane. 2008. A survey of digital humanities centers in the United States. [Online]. http://www.clir.org/pubs/reports/pub143/pub143.pdf