Nobody knows what Digital Curation is. But they’re all doing it.

Humpty Dumpty and Alice. From Through the Looking-Glass. Illustration by John Tenniel. (Photo credit: Wikipedia)

As you may have noticed, I am quite keen on conceptual clarity, and on the important role of semantics (word meanings) in achieving it.  The field of the information professional is, rather paradoxically I would have thought, fraught with vague terms, obscure phrases, mots du jour and recondite terminology.  Well, I suppose every field of endeavour has to keep things a little elusive in order to protect its territory from invasion by barbarians – those who have not been properly initiated. But with the increasing fascination with ‘Digital Curation’, the passion for trend-following, the re-inventing of the wheel and the basic lack of clarity seem to have reached their zenith.

Consider, if you will, the following interpretations of the phrase. This will undoubtedly remind you of Lewis Carroll’s Humpty Dumpty (“‘When I use a word,’ Humpty Dumpty said, in rather a scornful tone, ‘it means just what I choose it to mean — neither more nor less.’”).

1.  A digital curator is a normal curator, but works with digital materials.

The traditional definition of curator is someone who is the keeper of a museum or other collection. In social media, a curator is the keeper of their interest graphs. By discovering, organizing, and sharing relevant content from around the Web, curators invest in the integrity and vibrancy of their nicheworks and the relationships that define them. Information becomes currency and the ability to repackage something of interest as a compelling, consumable and also sharable social object is an art.  As a result, the social capital of a curator is earned through qualifying, filtering, and refining relevant content and how well objects spark engagement and learning (Solis, 2011, online).

2.  Digital Curation.

Digital curation is the selection, preservation, maintenance, collection and archiving of digital assets.  Digital curation generally refers to the process of establishing and developing long term repositories of digital assets for current and future reference by researchers, scientists, historians, and scholars. Enterprises are starting to utilize digital curation to improve the quality of information and data within their operational and strategic processes (Wikipedia, 2012b).

Digital curation, broadly interpreted, is about maintaining and adding value to a trusted body of digital information  for both current and future use: in other words, it is the active management and appraisal of digital information over its entire life cycle (Pennock, 2007, online).

Pennock’s ‘lifecycle’ approach is a variation of that used in records management, which has, to all intents and purposes, been replaced by the continuum approach as developed by Upward and McKemmish (see McKemmish, 1997).

3.  Similar to, or the same as, ‘data curation’

Data curation is a term used to indicate management activities required to maintain research data long-term such that it is available for reuse and preservation. In science, data curation may indicate the process of extraction of important information from scientific texts, such as research articles by experts, to be converted into an electronic format, such as an entry of a biological database.[1] The term is also used in the humanities, where increasing cultural and scholarly data from digital humanities projects requires the expertise and analytical practices of data curation.[2] In broad terms, curation means a range of activities and processes done to create, manage, maintain, and validate a component (Wikipedia, 2012a, online).

… data curation is a means to collect, organize, validate, and preserve data so that scientists can find new ways to address the grand research challenges that face society (Choudury, 2010, online).

Data curation views the whole life cycle of digital objects and views the object in its intellectual context. It strives to preserve not only the data item but also the meaning of the digital object to foster comprehension and reusability in the future when the original context (e.g., the creators, organisational environment) has disappeared.

Components of data curation include data and metadata modelling, appraisal of the object’s value for selection as well as retention/disposition, integration of the data item in usage environments, object versioning, and others. Examples include the capture, cleaning, filtering, and iterative processing and visualisation of instrumental data; or the enrichment of ancient texts with dictionaries and other contextual data from their time of creation.

Data curation tasks usually involve considerable domain knowledge and depend on the cooperation of the creators, as well as close embedding in the creation context. Generic service providers are therefore usually not in place to fully cover data curation. Data-quality issues (e.g., accuracy, completeness, and consistency) are clearly very community specific. However, more specialised service providers, such as World Data Centers, can adequately support curation within a community (GRDI 2020, 2011).

4.  Content Curation as a synonym.

Content curation, meant as the capacity of filtering and adding value to the content we receive and are exposed to everyday from all the online sources… A content curator is a critical knowledge broker who seeks, collects and shares on a continuous base the most relevant content in her area of expertise (Fiorelli, 2011).

Content curation is the act of discovering, gathering, and presenting digital content that surrounds specific subject matter (Mullan, 2011).

Content curation seems to be very similar to what librarians have traditionally done, except in digital mode.

5.  Digital Library

A digital library is a library in which collections are stored in digital formats (as opposed to print, microform, or other media) and accessible via computers. The digital content may be stored locally, or accessed remotely via computer networks. A digital library is a type of information retrieval system (Wikipedia, 2012c, online).

An informal definition of a digital library is a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network. A crucial part of this definition is that the information is managed. A stream of data sent to earth from a satellite is not a library. The same data, when organized systematically, becomes a digital library collection. Most people would not consider a database containing financial records of one company to be a digital library, but would accept a collection of such information from many companies as part of a library. Digital libraries contain diverse information for use by many different users. Digital libraries range in size from tiny to huge. They can use any type of computing equipment and any suitable software. The unifying theme is that information is organized on computers and available over a network, with procedures to select the material in the collections, to organize it, to make it available to users, and to archive it (Arms, 2000, p. 9).

Digital libraries basically store materials in electronic format and manipulate large collections of those materials effectively. Research into digital libraries is research into network information systems, concentrating on how to develop the necessary infrastructure to effectively mass-manipulate the information on the Net (NSF, 1999).

A digital library, like any library, is a service which is based on principles of selection, acquisition, access, management and preservation, related to a specific client community (Cathro, 1999, online).

6.  Digital archives

Of particular note is the definition of ‘digital archiving’ which, for the purposes of this paper, is taken to include all of the processes associated with selecting, acquiring, describing, managing, preserving and providing access to digital collections. The choice of the phrase ‘archiving’ is deliberate in that it aims to reclaim and reinstate meaning to a word that has been co-opted and used in a narrow and distorted way by the information technology industry and profession (CAARA, 2006, p. 7).

A digital archives is a repository that stores one or more collections of digital information objects with the intention of providing long-term access to the information. A digital archives can be a sophisticated, multi-tiered storage system or simply a C:\ drive on someone’s home computer (van Garderen, 2005, online).

7.  Digital repositories

Digital archives accept and preserve digital content for long-term use. Increasingly, stakeholders are creating large-scale digital repositories to ingest surrogates of archival resources or digitized books whose intellectual value as surrogates may exceed that of the original sources themselves. Although digital repository developers have expended significant effort to establish the trustworthiness of repository procedures and infrastructures, relatively little attention has been paid to the quality and usefulness of the preserved content itself. In situations where digital content has been created by third-party firms, content quality (or its absence in the form of unacceptable error) may directly influence repository trustworthiness. This article establishes a conceptual foundation for the association of archival quality and information quality research. It outlines a research project that is designed to develop and test measures of quality for digital content preserved in HathiTrust, a large-scale preservation repository. The research establishes methods of measuring error in digitized books at the data, page, and volume level and applies the measures to statistically valid samples of digitized books, adjusting for inter-coder inconsistencies and the effects of sampling strategies. The research findings are then validated with users who conform to one of four use-case scenarios: reading online, printing on demand, data mining, and print collection management. The paper concludes with comments on the implications of assessing archival quality within a digital preservation context (Conway, 2011, online).

8.  Digital Humanities

The digital humanities is an area of research, teaching, and creation concerned with the intersection of computing and the disciplines of the humanities. Developing from an earlier field called humanities computing, today digital humanities embrace a variety of topics ranging from curating online collections to data mining large cultural data sets. Digital Humanities currently incorporates both digitized and born-digital materials and combines the methodologies from the traditional humanities disciplines (such as history, philosophy, linguistics, literature, art, archaeology, music, and cultural studies) with tools provided by computing (such as data visualisation, information retrieval, data mining, statistics, computational analysis) and digital publishing (Wikipedia, 2012d, online).

Schnapp et al. (2008) make this overlap (or confusion) very clear, where the digital humanities are considered to be an ‘array of practices’ and these sites are given as examples.

Europeana http://www.europeana.eu/portal/

Gallica http://gallica.bnf.fr

GoogleBooks http://books.google.com/

Index Thomisticus http://www.corpusthomisticum.org/it/

Internet Archive http://archive.org/

Perseus Digital Library http://www.perseus.tufts.edu/hopper/

The wide array of work undertaken in the area of digital humanities, as described by Zorich (2008), indicates its scope.  It is yet to be clarified which of these tasks are to be undertaken by digital librarians, and which by humanities scholars (or even computer scientists).

A digital humanities center is an entity where new media and technologies are used for humanities-based research, teaching, and intellectual engagement and experimentation. The goals of the center are to further humanities scholarship, create new forms of knowledge, and explore technology’s impact on humanities based disciplines. To accomplish these goals, a digital humanities center undertakes some or all of the following activities:

  • builds digital collections as scholarly or teaching resources;

  • creates tools for

    • authoring (i.e., creating multimedia products and applications with minimal technical knowledge or training)

    • building digital collections

    • analyzing humanities collections, data, or research processes

    • managing the research process;

  • uses digital collections and analytical tools to generate new intellectual products;

  • offers digital humanities training (in the form of workshops, courses, academic degree programs, postgraduate and faculty training, fellowships, and internships);

  • offers lectures, programs, conferences, or seminars on digital humanities topics for general or academic audiences;

  • has its own academic appointments and staffing (i.e., staff does not rely solely on faculty located in another academic department);

  • provides collegial support for, and collaboration with, members of other academic departments within the DHC’s home institution (e.g., offers free or fee-based consultation services; enters into collaborative projects with other campus departments);

  • provides collegial support for, and collaboration with, members of other academic departments, organizations, or projects outside the DHC’s home institution (e.g., offers free or fee-based consultation to outside groups; enters into collaborative projects with external groups);

  • conducts research in humanities and humanities computing (digital scholarship);

  • creates a zone of experimentation and innovation for humanists;

  • serves as an information portal for a particular humanities discipline;

  • serves as a repository for humanities-based digital collections (e.g., Web sites, electronic text projects, QuickTime movie clips);

  • provides technology solutions to humanities departments (e.g., serves an information …) (Zorich, 2008).

Please note that all material in grey boxes is extracted from other works, listed below.

References

Arms, William.  2000.   Digital libraries.  Cambridge [MA]: MIT Press.

Bodnar, Kipp.  2010.  The marketer’s guide to content curation [Online].  http://blog.hubspot.com/blog/tabid/6307/bid/6800/A-Marketer-s-Guide-to-Content-Curation.aspx

Cathro, Warwick.  1999.  Digital libraries: a national library perspective.  [Online].  http://www.nla.gov.au/openpublish/index.php/nlasp/article/viewArticle/1112/1375

Choudury, Sayeed.  2010.  Data curation: an ecological perspective.  In: College and research libraries news.  [Online].  http://crln.acrl.org/content/71/4/194.full

Conway, Paul.  2011.  Archival quality and long-term preservation: a research framework for validating the usefulness of digital surrogates.  In: Archival science.  [Online].  http://hathitrust-quality.projects.si.umich.edu/files/Article.pdf

Council of Australasian Archives and Records Authorities.  2006.  Digital archiving in the 21st century: archives domain discussion paper.  Canberra: National Archives of Australia.  [Online].  http://www.caara.org.au/wp-content/uploads/2010/03/DigitalArchiving21C.pdf

Fiorelli, Gianluca. 2011.  Content Curation: definition and generation. [Online]. http://www.iloveseo.net/content-curation-definition-and-generation/

Global Research Data Infrastructure 2020 (GRDI 2020).  2011.  [Online].  http://grdi2020.eu/mediawiki/index.php/Data_Curation

Lord, Philip et al.  2010. From data deluge to data curation.  [Online].  http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/150.pdf

McKemmish, Sue.  1997.  Yesterday, today and tomorrow: a continuum of responsibility.  In: Proceedings of the Records Management Association of Australia 14th National Convention, 15-17 Sept 1997, RMAA Perth 1997. [Online]. http://www.infotech.monash.edu.au/research/groups/rcrg/publications/recordscontinuum-smckp2.html

Mullan, Eileen.  2011.  What is content curation?  In: EContent.  [Online]. http://www.econtentmag.com/Articles/Resources/Defining-EContent/What-is-Content-Curation-79167.htm

Pennock, Maureen.  2007.  Digital curation: a life-cycle approach to managing and preserving usable digital information.  In:  Library and archives, (1).  [Online].  http://www.dcc.ac.uk

Schnapp, Jeffrey et al.  2008.  What is(n’t) Digital Humanities?  In:  The digital humanities manifesto 2.0.  [Online]. http://www.humanitiesblast.com/manifesto/Manifesto_V2.pdf

Solis, Brian.  2011.  The curation economy and the 3Cs of information commerce.  [Online].  http://www.briansolis.com/2011/04/the-curation-economy-and-the-three-3c%E2%80%99s-of-information-commerce/

Van Garderen, Peter.  2005.  Digital archives.  [Online].  http://archivemati.ca/2005/11/08/digital-archives/

Wikipedia.  2012a.  Data curation.  [Online].  http://en.wikipedia.org/wiki/Data_curation

Wikipedia.  2012b.  Digital curation.  [Online].  http://en.wikipedia.org/wiki/Digital_curation

Wikipedia.  2012c.  Digital library.  [Online].  http://en.wikipedia.org/wiki/Digital_libraries

Wikipedia.  2012d.  Digital humanities.  [Online].  http://en.wikipedia.org/wiki/Digital_humanities

Zorich, Diane.  2008. A survey of digital humanities centers in the United States.  [Online].  http://www.clir.org/pubs/reports/pub143/pub143.pdf

About Susan
Retired academic, website creator, SEO advisor, grandmother. I love the sea, dogs and walks; I hate fluorescent lights and TV sport.

18 Responses to Nobody knows what Digital Curation is. But they’re all doing it.

  6. Dear David
    Thank you so much for your thoughtful comments – and congratulations on your PhD.
    I should point out that just because I posted these definitions doesn’t necessarily imply that I agree with them. My point was simply to show that there are areas of overlap between all these fields, each of which regards itself as distinct and separate from the others. The notion of ‘digital curation’ seems to me to provide an opportunity to do some serious work on NOT re-inventing the wheel (which has been done for some time in the information professions) but it seems rather to have caused more division in an already absurdly fragmented area.
    I would agree that ‘digital preservation’ does appear to be separate from ‘digital curation’. However, I would rather suggest that ‘digital preservation’ is a process or function which is part of all the areas mentioned here: big data, data science, digital (or data) repositories, digital archives and of course all the digital materials (documents) that are used for advanced studies in the humanities (apart from the tools which aid such investigations, such as text analysis, concordances, and so forth).
    To me, the idea of digital curation involves selecting, organising, providing access to, securing, protecting and preserving digital documents (which I consider to include, for example, music scores, paintings, books, invoices, globes, statues or anything else that can be digitised, no matter what software is used to create it). In other words, digital curation is what information professionals, such as librarians, archivists, recordkeepers, museologists and gallery curators have always done, except now they are dealing with digital versions. But because of the quantitative bent in our society, there is an obsession with ‘data’, by which is usually meant ‘numbers’, ‘figures’, ‘facts’ or ‘statistics’. I have put these words in inverted commas because each term carries some baggage, depending on who is using it.

    • Oh, I didn’t think you were endorsing the definition, certainly not! Since it’s on Wikipedia, perhaps I should go change the definition, seeing as it doesn’t please me, though? 😉

      As to curation involving preservation, though … well, if we include such activities as managing one’s Flickr photos as a curatorial activity, then preservation is separated from curation, at least insofar as the image formats are dictated by the application and any preservation at the filetype level is managed by the application owners, rather than the curator. Agreed, the individual choosing which photos to preserve is part of the process, but I would argue that activity to be selection, rather than preservation.

      As to all of the buzz-words floating about, I blame it on the field of technology, and upon marketing efforts. So many of these things were invented / adopted in order to sell a particular application or business process. And, as in “big data,” the buzz-words are quite meaningless.

      • I couldn’t agree with you more regarding the inventing of buzzwords by the technology industry. They range from the meaningless to the absurd to the downright contradictory.
        With regard to your other point, do you see preservation as an entirely separate process from curation then, with ‘selection’ as a process under the ‘curation’ umbrella?

        • I tend to view preservation as distinct from curation, yes, primarily because preservation doesn’t require the same level of judgment – preservation, particularly of the digital, may not involve much judgment at all, particularly when applied to massive collections of digital objects. For example, the Flickr images out there: if they’re to be migrated to a new object type (converted from .jpg to something else) then that would fall under digital preservation, yet would not include any decisions regarding how those images are curated. The same could be said for many other preservation activities: that they are not curatorial.

          Now, this is “just” my opinion, of course, but I tend to think of preservation activities as being more along the lines of a technician, rather than someone with deeper knowledge of the subject matter under curation. For example, the person preserving an art object may have deep knowledge of construction methods, etc., but have no knowledge of the history surrounding the artifact itself; the book binder may know how the book was originally bound, using what adhesives, materials, etc., but is not called upon to understand the content. To me, curation is more about content and context than merely objects themselves.

          • Excellent explanation and differentiation – I like this a lot. I believe that the two concepts are often confused or used synonymously, as you identified earlier. Very useful contribution. Thanks, David.

    • Hello Susan, thank you so much for bringing together these valuable definitions of what digital curation is all about.

      With the goal of better understanding your context and objectives in doing this, I wanted to kindly ask why you are linking (fourth link in this article) the word “Web” to this url: http://www.symantec.com/web-security-software and why you also have additional dubious or way-too-generic links such as “service providers” or “world data centers” or “digital information” linked to the Wikipedia page for “digital”. Are there specific reasons to do so?

      When I notice such links I am always quite skeptical of the content that surrounds them, as they do not seem to support the topic being explored.

      On the other hand I am sincerely wondering why references to specific and relevant works such as “The wide array of work undertaken in the area of digital humanities, as described by Zorich (2008)” or “Schnapp et al. (2008)” or many of the definitions you have placed in grey boxes do not have an appropriate clickable link accompanying them.

      Can you help me make sense of why you made these decisions?

      I wanted to reference your article, but I need to better understand your motives and these references before I can do so reliably.

      Many thanks in advance,

      Robin Good

      • Dear Robin
        I think I have a virus that randomly identifies words and links them to things that [it] thinks are relevant – including the usual spam-type ads. I did not link ‘web’ to the Symantec site, for example. It seems too that this virus – or perhaps WordPress? – made the links to the non-cited Wikipedia articles you mention. I suppose they do serve the purpose of providing some (rather general) description of the terms.
        There are no links within the boxes because these are direct quotes from other sources, which did not provide such links, rather than my own writing. I am sorry if this is not clear.
        All the best

        PS – sorry about the delay in response – I have been in transit.

        • It’s possibly a plug-in – this would have to have taken place prior to publication, I believe, as you’re hosted by WordPress. So, unless you’re using something else to author your posts, that would be my first guess.

          • Dear Susan, thank you for your kind reply.

            Unfortunately, if you choose to leave those spammy links in your text, as it is now, I do not feel it appropriate to share and recommend it to others, as they will have the same reaction to it that I did.

            I suggest therefore that you do take this situation very seriously and you correct links that should not be there.

            Though I trust your honesty and good will, you are responsible for your own content, pages and links, and I think it is very risky for your own reputation to let this go for much longer without some corrective action.

            P.S.: Do not expect everyone who notices this strange situation to have the time and will to stop and let you know their disappointment about it. Most will think this is software-generated text and leave you in a blink. And that would be a pity, since you are producing some quality work.

            I look forward to seeing this resolved and updated.

            • Obviously, hyperlinks don’t matter much to some of us, as we’re happy to read the high-quality content, and can judge for ourselves whether a link is worth clicking. I suppose not everyone is tech-savvy, though I wonder how they survive the regular distractions of life without having someone else curate their existence.

            • Dear Robin
              I am sorry that this presented such a problem for you, but I am glad that you brought these issues to my attention. You will note that I have now modified the text accordingly.
              All the best
              S

  8. Hmm. First off, let me disagree with definition #8. Having recently (2012) been awarded my PhD in Humanities Computing, I wouldn’t say that the degree is “older” and I’m also vaguely uncomfortable with the definition provided in that it places undue emphasis upon the use of digital objects in the research itself, rather than having a balanced inclusion of the humanities (it seems that the digital objects are the primary object of study, rather than the other fields). That said, perhaps I’m an outlier (my study had to do more with sociology and cognitive psychology as applied to the destruction & retention of digital objects).

    Second, it would seem that digital curation must include or involve digital preservation, at least in the longer term. I think that there’s some room here to perhaps disentangle digital preservation from digital curation, and to draw a distinction between the two, such that preservation is a distinct activity necessarily performed upon longer-term collections whereas curation does not necessarily occur in such collections. Drawing such a distinction between curation and preservation might help the thinking about both activities.
