the OAI Protocol for Metadata Harvesting
an update
Herbert Van de Sompel
Los Alamos National Laboratory – Research Library
herbert van de sompel

The Open Archives Initiative has been set up to create a forum to discuss and solve matters of interoperability between preprint solutions, as a way to promote their global acceptance.
Paul Ginsparg, Rick Luce & Herbert Van de Sompel

Luce * Van de Sompel * Ginsparg

2 core motivations
• as a systems librarian: change the system
• as a researcher: find (technical) ways to facilitate the change

as a systems librarian
• optimizing the output
• the input is far from optimal
[diagram labels: A, PUB, DIS, LIB]

eprint systems
• xxx e-print archive (Physics - 1991 - Los Alamos - Ginsparg)
• RePEc (Economics - Surrey U - Krichel)
• NCSTRL (Computer Science - Cornell U - Lagoze)
• NDLTD (Theses - Virginia Tech - Fox)
• CogPrints (Cognitive Sciences - Southampton U - Harnad)

as a researcher
• eprints are an attractive building block in the ongoing transformation of scholarly communication
• but: interoperability could increase the impact of e-prints:
  • amongst e-print solutions
  • with building blocks that implement other functions of scholarly communication
  • with the established communication system

UPS Prototype: eprints discovery
• 1999: Van de Sompel, Krichel, Nelson
• results:
  • insights regarding how un-interoperable the systems were
  • a cross-repository searching and linking service
  • recommendations to the Santa Fe meeting:
    • data provider / service provider model
    • metadata harvesting
    • simplicity

evolution towards OAI-PMH v.2.0
• Santa Fe Convention [02/2000]
• OAI-PMH 1.0 [01/2001]
• OAI-PMH 2.0 [06/2002]

            Santa Fe convention   OAI-PMH v.1.0/1.1         OAI-PMH v.2.0
nature      experimental          experimental              stable
verbs       Dienst                OAI-PMH                   OAI-PMH
requests    HTTP GET/POST         HTTP GET/POST             HTTP GET/POST
responses   XML                   XML                       XML
transport   HTTP                  HTTP                      HTTP
metadata    OAMS                  unqualified Dublin Core   unqualified Dublin Core
about       eprints               document like objects     resources
model       metadata harvesting   metadata harvesting       metadata harvesting

OAI-PMH model
[diagram: service provider (harvester) and data provider (repository); 6 OAI-PMH requests, replies]

OAI-PMH model
Supporting protocol requests:
• Identify
• ListMetadataFormats
• ListSets
Harvesting protocol requests:
• ListRecords
• ListIdentifiers
• GetRecord
[diagram: service provider (harvester) and data provider (repository)]

OAI-PMH model
[diagram: service provider (harvester) and data provider (repository); labels: Datestamp, Identifier, Set, Records]

federated services
[diagram: e-print, FTXT, A&I, OPAC, image]

metadata harvesting via OAI-PMH
[diagram: harvester, metadata; e-print, FTXT, A&I, OPAC, image]

metadata harvesting via OAI-PMH
[diagram: e-print metadata (Author, Title, Abstract, Identifier); FTXT, A&I, OPAC, image]

issue solved?
• no, just a tiny part of the technical challenges to support discovery
• many more technical issues
• even more non-technical issues

issue solved?
technical
[diagram: registration, certification, awareness, archiving, rewarding; interoperable grid]

issue solved?
non-technical
• I am happy to leave those to you
• but: even for non-technological issues, part of the answer might be found in applying technology

indicators of adoption of OAI-PMH
• data providers
• service providers
• tools
• structural support

data providers
• 49 registered repositories [11/2001]
• 65 registered repositories [03/2002]
• 5+ million records
• many unregistered repositories

service providers
• Arc: cross-searching of registered repositories [Old Dominion U] [ http://arc.cs.odu.edu ]
• OLAC: cross-searching of Language Archive Community repositories [ http://www.language-archives.org/index.html ]

service providers
• Scirus scientific search engine [Elsevier] [ http://www.scirus.com ]
• my.OAI: user-tailorable cross-searching of registered repositories [FS Consulting, Inc.] [ http://www.myoai.com ]
• growing interest from web search engines

OAI-PMH tools
• Repository Explorer: interactive exploration of repositories [Virginia Tech] [ http://www.purl.org/NET/oai_explorer ]
• eprints.org: generic OAI-PMH compliant repository software [U of Southampton] [ http://www.eprints.org ]
• ALCME repository and harvester software [OCLC] [ http://alcme.oclc.org/index.html ]

OAI-PMH flies: structural support
• Metadata Harvesting Initiative of the Mellon Foundation
• NSDL (NSF funded)
• UK FAIR call for proposals to support disclosure of institutional assets (papers, learning materials, etc.)
• Institute of Museum and Library Services
• several EC projects exploring/supporting usage of OAI-PMH: TEL, Leaf, Cyclades, OA Forum

OAI-PMH flies: and also …
• Australian Museums Online & CIMI: OAI conference
• NIMH white paper on data archiving for Animal Cognition Research
• Library of Congress
• National Library of Canada
• OCLC thesis database
• Illinois State Library Catalogue

future OAI
• OAI-PMH
• communities
• adoption

the OAI-PMH
• release of OAI-PMH v.2.0 [06/2002]
  • no backwards compatibility with v.1.0/1.1
  • stable
  • migration process for registered repos
• ? formal standardization ?
• ? SOAP version ~ web services framework [SOAP, WSDL, UDDI] ?

communities
• proliferation of community-specific add-ons for:
  • collection & set level metadata
  • expressive metadata formats (e.g. qualified DC XML Schema)
  • shared set-structures
  • machine readable rights (about the metadata)

adoption
• evolution
  • from talking about OAI-PMH
  • to talking about projects that use OAI-PMH
  • to talking about projects and failing to mention they use OAI-PMH
=> OAI-PMH becomes part of the infrastructure

I just wanted to report what I consider an OAI success. I discovered that RLG had harvested records for two of the American Memory collections I had made available and integrated them into their Cultural Materials Initiative service without the need for a single e-mail or phone call. They reported that it was working very well for them.
[Caroline Arms, Library of Congress]

http://www.openarchives.org
[email protected]

the OAI: not really an organization
• Executive: Carl Lagoze & Herbert Van de Sompel
• 2000 – 2002 funding from CNI and DLF
• Steering Committee
• Technical Committee:
  • protocol revision & stabilization
• Alpha testers

OAI-tech
US representatives
• Thomas Krichel (Long Island U)
• Jeff Young (OCLC)
• Tim Cole (U of Illinois at Urbana-Champaign)
• Hussein Suleman (Virginia Tech)
• Simeon Warner (Cornell U)
• Michael Nelson (NASA)
• Caroline Arms (LoC)
• Muhammad Zubair (Old Dominion U)
• Steven Bird (U Penn.)
European representatives
• Andy Powell (Bath U. & UKOLN)
• Mogens Sandfaer (DTV)
• Thomas Baron (CERN)
• Les Carr (U of Southampton)

OAI-PMH 2.0 alpha testers (1/2)
• The British Library
• Cornell U. -- NSDL project & e-print arXiv
• Ex Libris
• FS Consulting Inc -- harvester for my.OAI
• Humboldt-Universität zu Berlin
• InQuirion Pty Ltd, RMIT University
• Library of Congress
• NASA
• OCLC

OAI-PMH 2.0 alpha testers (2/2)
• Old Dominion U. -- ARC, DP9
• U. of Illinois at Urbana-Champaign
• U. of Southampton -- OAIA, CiteBase, eprints.org
• UCLA, Johns Hopkins U., Indiana U., NYU -- sheet music collection
• UKOLN, U. of Bath -- RDN
• Virginia Tech -- repository explorer
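
appendix: sketch of an OAI-PMH request

The OAI-PMH model slides name six requests, issued over HTTP GET/POST and answered in XML, with records carrying an identifier, a datestamp and sets. The Python sketch below illustrates that exchange under stated assumptions: the base URL (http://repository.example.org/oai), the sample response, and the helper names build_request_url and parse_record_headers are hypothetical and not part of the talk. The argument names (metadataPrefix, from) and the response namespace follow OAI-PMH 2.0. It is a minimal illustration, not a complete harvester.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# The six OAI-PMH requests (verbs) named on the model slides.
SUPPORTING_VERBS = ("Identify", "ListMetadataFormats", "ListSets")
HARVESTING_VERBS = ("ListRecords", "ListIdentifiers", "GetRecord")

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request_url(base_url, verb, arguments=None):
    """Return the HTTP GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in SUPPORTING_VERBS + HARVESTING_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    params = {"verb": verb}
    params.update(arguments or {})
    return f"{base_url}?{urlencode(params)}"


def parse_record_headers(xml_text):
    """Extract (identifier, datestamp, sets) from every record header."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        sets = [s.text for s in header.iterfind("oai:setSpec", OAI_NS)]
        headers.append((identifier, datestamp, sets))
    return headers


# Hand-written sample reply (assumption): a real repository returns more
# elements, e.g. responseDate, request and the metadata itself.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2002-06-14</datestamp>
        <setSpec>physics</setSpec>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Harvest request for records changed since a given date (hypothetical URL).
url = build_request_url(
    "http://repository.example.org/oai",
    "ListRecords",
    {"metadataPrefix": "oai_dc", "from": "2002-06-01"},
)
print(url)
print(parse_record_headers(SAMPLE_RESPONSE))
```

A harvester would fetch the URL and feed the reply to the parser; the network step is left out here so the sketch runs offline.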