ITIG TechCorner March 18, 2005
ITIG ACRL/NEC Information Technology Interest Group Homepage


ITIG Tech Corner Report on:

"Institutional Repositories:
Capturing and Preserving Digital Collections"

-Presenters-
Eliot Wilczek, Kevin Glick,
Michael Leach and Jeff Riedel

Medical School's Arthur and Martha Pappas Amphitheatre
University of Massachusetts
Worcester, MA
Report by: Janice Schuster - Providence College

Co-sponsored by the Preservation and Conservation Interest Group (P/CIG)

March 18, 2005


Introductions

Melissa Behney, co-chair of ITIG, welcomed the attendees and thanked David Walls, chair of the Preservation and Conservation Interest Group, for his and his group's assistance in planning the program. She also expressed our appreciation to the librarians at the UMass Medical School for their help with the planning and for arranging the wonderful facilities.

David Walls added his welcome to Ms. Behney's and introduced the first speaker. David introduced the other speakers as well, before their presentations.


Here today, here tomorrow: issues concerning digital repositories.
Presented by:  Eliot Wilczek, University Records Manager, Tufts University

Mr. Wilczek's presentation discussed the nature of digital repositories and gave an overview of the issues institutions must consider when implementing one. He began with a definition of a repository taken from Lynch, Clifford A. "Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age," portal: Libraries and the Academy, v. 3, issue 2, pp. 327-336 [also published in ARL, no. 226 (February 2003): 1-7, at http://www.arl.org/newsltr/226/ir.html]:

"...a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its members. It is most essentially an organizational commitment to the stewardship of these digital materials…" (page 328; emphasis mine, JS)
He broke down the above definition into its component parts and discussed each part:
  • Set of services: including receiving the documents into the repository; maintaining them (including who has rights to access them); preserving them; and delivering them to those who need them.
  • Management: including establishment of policies at the institution level and the drafting and execution of procedures.
  • Dissemination: search and delivery tools to deliver meaningful and functional data/objects/records; access and use rights (including copyright, etc.)
  • Digital materials: including objects, publications, data, and records.
  • Stewardship: the institution's commitment to the repository, both long- and short-term, which defines who is responsible for the repository, to whom the responsible entity reports, and the relationship between the responsible party and the rest of the institution.
Mr. Wilczek mentioned that there is a distinct difference between a repository and a library: A repository is a collection of output regardless of subject, whereas a library contains holdings which are usually built around some kind of subject order.

He then mentioned some of the issues which must be addressed when setting up a repository:
  • What is the repository's role in the institution? Will it be an active workspace with digital materials entering and possibly leaving, or will it be an inactive space just for holding on to materials? What will its relationship be to other systems at the institution (such as the library, archives, etc.)?
  • Scope of holdings? Intellectual appraisal (what will be accepted from which contributors?); technical appraisal (which formats will be accepted? PDF, Word, XML, HTML, etc.); creation of the holdings (what format should they be created in?); management of the holdings (retention and disposition).
  • Sustainability: Institutional commitment to preserving the materials (i.e. strategy for ensuring that the materials are still readable in 5, 10, 20 years, in order to meet future demand for the materials).
  • Delivery: Ensure that the material is discoverable, i.e. that metadata is included so that the material can be retrieved later; define use rights (who has the right to access? what restrictions apply to use by secondary parties, i.e. those outside the institution?).
  • Service level: agreements as to which digital materials to accept and what levels of service to provide for each type of material.
  • Faculty/researcher participation: What do faculty want from the repository? (They usually want it to disseminate their work in their own area of study; they want to retain ownership of their work; they DON'T want to be the administrator of the repository.)
  • Repository needs: To have easy submission so that faculty will want to submit their work; to control access to the materials; low administrative burden.
Strategies/approaches:
  • Determine needs: both of the institution and of the faculty/researcher
  • Envision needs: Is the repository driven by the institution or by the faculty? For example, at Tufts, the librarians determined the need for the repository, not the faculty.
  • First steps: Can determine policies first, to lay out the ground rules ahead of time; or can build the repository first and develop policies from that (Tufts took the latter approach).
Mr. Wilczek ended his presentation by mentioning A Guide to Institutional Repository Software, 2nd ed. (Open Society Institute, 2004) [View 3rd edition at: http://www.soros.org/openaccess/pdf/OSI_Guide_to_IR_Software_v3.pdf].

View Mr. Wilczek's MS Powerpoint presentation


Fedora and institutional repositories
Presented by:  Kevin Glick, Electronic Records Archivist, Yale University

Mr. Glick's presentation focused on Fedora, the Flexible Extensible Digital Object Repository Architecture, which is an underlying architecture of a digital repository. Fedora supports a variety of tools that compose a repository.

Digital content includes more than documents: many different resources are kept as a by-product of teaching and research, including dissertations, statistics, e-prints, tapes, files, maps, CDs, drawings, etc. Many of the resources are tied to a specific application, which must itself be preserved. The creators of the content want to be able to organize and maintain it but don't have the resources to do so.

Institutional repository issues: how researchers interact with complex digital materials; how to associate services and tools with objects to provide different presentations of the same objects.

Mr. Glick mentioned that the name Fedora is both the acronym for this architecture and a trademark of a separate product, and that the two parties are disputing the trademark. At Yale, they wanted a central digital repository, and they needed flexibility.

Fedora enables an institution to contain the content; to manage digital resources and associated metadata; and to make content available in different contexts. Yale uses it mainly for shared content, not for shared applications or hardware.

Fedora timeline:

  • 1997 Defense Advanced Research Projects Agency (DARPA)
  • 2002-present Mellon Foundation grant was made to the University of Virginia for the development and implementation of a large-scale digital object repository and retrieval system that provides fast, integrated access to hundreds of millions of items in a variety of formats. The system is to be deployed by digital libraries and project-oriented humanities groups. (from the Mellon Foundation website, http://juicy.mellon.org/RIT/MellonOSProjects/Fedora/)
  • 2003 Fedora v. 1.0 released
  • 2005 Fedora v. 2.0 released

Fedora co-exists with other tools, including the Open Archival Information System (OAIS) reference model.

Types of Fedora digital objects:
  • data objects representing digital collections/content
  • behavior definition objects: used as building blocks for dissemination; they store service descriptions
  • behavior mechanism objects: used as building blocks for dissemination; they store concrete service binding metadata
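
The relationship among the three object types above can be illustrated with a minimal sketch. It is not Fedora's actual API or storage format: the class names, the `disseminate` method, and the "text-view" service are hypothetical names chosen only to show how a service description (definition), a concrete binding (mechanism), and a data object might fit together to give different presentations of the same content.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BehaviorDefinition:
    """Abstract service description: only names the methods a service offers."""
    name: str
    methods: List[str]


@dataclass
class BehaviorMechanism:
    """Concrete binding: maps each method of a definition to runnable code."""
    definition: BehaviorDefinition
    bindings: Dict[str, Callable[[bytes], str]]

    def __post_init__(self) -> None:
        # Every method promised by the definition must have a binding.
        missing = set(self.definition.methods) - set(self.bindings)
        if missing:
            raise ValueError(f"unbound methods: {sorted(missing)}")


@dataclass
class DataObject:
    """Digital content plus the mechanisms that can present it."""
    pid: str  # hypothetical identifier for the object
    content: bytes
    mechanisms: List[BehaviorMechanism] = field(default_factory=list)

    def disseminate(self, definition_name: str, method: str) -> str:
        """Return one presentation of the content via a bound service method."""
        for mech in self.mechanisms:
            if mech.definition.name == definition_name and method in mech.bindings:
                return mech.bindings[method](self.content)
        raise LookupError(f"{definition_name}.{method} is not available for {self.pid}")


# Hypothetical example: one object, two presentations of the same content.
text_view = BehaviorDefinition("text-view", ["as_text", "summary"])
plain = BehaviorMechanism(
    text_view,
    {
        "as_text": lambda raw: raw.decode("utf-8"),
        "summary": lambda raw: f"{len(raw)} bytes",
    },
)
report = DataObject("demo:1", b"Annual report", [plain])
print(report.disseminate("text-view", "as_text"))
print(report.disseminate("text-view", "summary"))
```

Keeping the definition separate from the mechanism means, in this sketch, that the same service description could be bound to different code for different content formats without changing how a caller requests a presentation.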


Unique to Fedora:
  • supports complex objects
  • developer-focused, not user-focused (it's not as easy to set up as some other systems; not "right out of the box"; not designed for users; lacks explanatory material, etc.)
  • designed for federation


Mr. Glick ended by mentioning websites:
  • Fedora Website -- http://www.fedora.info
  • Yale Manuscripts and Archives -- http://www.library.yale.edu/mssa/
  • Fedora and the Preservation of University Electronic Records research project -- http://dca.tufts.edu/features/nhpoc

View Mr. Glick's MS PowerPoint presentation


DSpace implementation and use
Presented by:  Michael Leach, Director, Physics Research and Kummel Libraries, Harvard University

Mr. Leach gave details about the Harvard Sciences Digital Library (HSDL) digital repository, based on MIT's open source DSpace code.

Mission of HSDL: to support the research and teaching needs of the university and to support the missions of each science library at Harvard. The repository reflects the scholarly output of Harvard and is designed to increase access to scholarly materials.

Digital objects included: articles; theses; video; serials; datasets; learning objects (JPEG files, etc.)

Why do this?

  • Publishers have granted authors the right to self-archive on the web
  • More teaching is done on the web
  • Seminars and other scholarly activity are being recorded in digital form (no longer just in print)


Faculty response:
  • Varies from enthusiastic to no interest at all
  • Convenience and access are key
  • Archival possibilities (material on a faculty member's individual web site can be archived in the repository, where it will be permanent).


Harvard librarian response:
  • Some individual schools have no interest; some are curious (wait-and-see attitude); some are contentious (they feel the repository will draw resources from other projects); some are very excited.

Development of policies: Policies are time-consuming to draft but are needed in order to have some control over the repository; they will evolve over time. Policies should cover:
  • What can be submitted and by whom? (must be scholarly and curriculum-related; may be ephemeral; fun stuff (games, etc.) not allowed; if submitted by student, must be connected to a Harvard faculty member or lab.)
  • Rights/responsibilities of authors (fee v. free, etc.)
  • Legal issues (Harvard requires contributor to sign statement indicating that they have the legal right to give the material to the repository)
  • Workflow issues (volume issues: how much work does the repository create? At Harvard, they are redesigning services within the library to accommodate the additional work; something must go in order to do this since there are not unlimited resources.)
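
As a rough illustration of how submission rules like those above might be checked at intake, here is a minimal sketch. It is not Harvard's or DSpace's actual implementation; the `Submission` fields, the role labels, and the `policy_problems` function are hypothetical names, and only three of the rules from the report are modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Submission:
    """Hypothetical intake record; the field names are assumptions."""
    title: str
    submitter_role: str              # "faculty" or "student" (assumed labels)
    sponsor: Optional[str]           # faculty member or lab backing a student
    is_scholarly: bool               # scholarly and curriculum-related
    rights_statement_signed: bool    # contributor asserts legal right to deposit


def policy_problems(sub: Submission) -> List[str]:
    """Return the policy rules a submission fails; empty means acceptable."""
    problems = []
    if not sub.is_scholarly:
        problems.append("material must be scholarly and curriculum-related")
    if sub.submitter_role == "student" and not sub.sponsor:
        problems.append("student work must be connected to a faculty member or lab")
    if not sub.rights_statement_signed:
        problems.append("contributor must sign the rights statement")
    return problems


accepted = Submission("Lab dataset", "student", "Physics lab", True, True)
rejected = Submission("Puzzle game", "student", None, False, False)
print(policy_problems(accepted))
for problem in policy_problems(rejected):
    print("-", problem)
```

Returning a list of failed rules, rather than a single yes/no answer, lets library staff tell a contributor everything that needs fixing in one pass, which may help with the workflow volume concerns noted above.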


Training: Harvard is working on training manuals as well as on a "train-the-trainer" project; learn by sharing (listserv, etc.)

Awareness: Publicity/marketing are vital
  • Attend administrative meetings
  • Emphasize the public service aspects of the repository, such as copyright resolution
  • Develop user needs analysis programs
  • Brand the repository and show it off


Future:
  • Possibly add non-scientific objects?
  • Re-visit which objects to accept (currently don't accept undergraduate theses; possibly accept portfolio items for students; Harvard internal reports?)
  • How much bureaucracy is needed beyond the steering committee?


View Mr. Leach's MS PowerPoint presentation


ProQuest Digital Commons
Presented by:  Jeff Riedel, Program Manager, Digital Commons, ProQuest Information and Learning

Mr. Riedel's presentation provided an overview of digital repositories, including system architecture, organizational challenges, and technological solutions, and touched on repositories as scholarly publication tools.

Digital Commons is ProQuest's institutional repository service:

  • Powered by Bepress (the Berkeley Electronic Press); Bepress provides the technical expertise; ProQuest provides setup; training; electronic tech support for librarians, faculty and end-users; documentation; upgrades; hosting.


Key features:
  • Included in Google and other search engines
  • Full-text searching
  • Can save searches; email notification when new content is added
  • Service is hosted at a secure, redundant site (a mirror of the primary site provides instant backup)
  • Nightly backups
  • URL in institution's domain, e.g. http://digitalcommons.myuniv.edu
  • You own and keep your content (ProQuest retains no copyright/ownership)
  • Upon termination of contract, objects can be imported into a new system


Current Digital Commons customers include:
  • Boston College's eScholarship@Boston College http://escholarship.bc.edu/
  • University of Connecticut http://digitalcommons.uconn.edu


Challenges:
  • Content recruitment (email faculty inviting them to upload their papers to the repository)
  • Faculty engagement (easy to use; a graduate student can enter the information)
  • Author retention (confirming email to the author when work is posted, including URL; monthly email to author with usage statistics for works; reminder about links to papers and to series; invitation to submit additional work)




View Mr. Riedel's MS PowerPoint presentation.

More information can be found at http://www.umi.com/proquest/digitalcommons/.


    Report by:
    Janice Schuster
    Reference Services
    Providence College
    Providence RI
    jschustr@providence.edu






    © Copyright 1999-2005, ITIG ACRL/NEC Information Technology Interest Group. All Rights Reserved.
    Website currently maintained by ITIG Webmaster, Olga Verbeek
    ITIG URL: http://www.acrlnec.org/sigs/itig/
    Last updated: Friday May 27, 2005.