6 OAI-PMH

the OAI Protocol for Metadata Harvesting
an update
Herbert Van de Sompel
Los Alamos National Laboratory – Research Library
herbert van de sompel
The Open Archives Initiative has been set up
to create a forum to discuss and solve matters
of interoperability between preprint solutions,
as a way to promote their global acceptance.
Paul Ginsparg, Rick Luce & Herbert Van de Sompel
Luce * Van de Sompel * Ginsparg
• 2 core motivations
• as a systems librarian: change the system
• as a researcher: find (technical) ways to facilitate the change
as a systems librarian
optimizing the output
[Figure: chain diagram with nodes labeled A, PUB, DIS, LIB, R]
the input is far from optimal
eprint systems
• xxx e-print archive
(Physics - 1991 - Los Alamos - Ginsparg)
• RePEc
(Economy - Surrey U - Krichel)
• NCSTRL
(Computer Science - Cornell U - Lagoze)
• NDLTD
(Theses - Virginia Tech - Fox)
• CogPrints
(Cognitive Sciences - Southampton U - Harnad)
as a researcher
• eprints are an attractive building block in the ongoing transformation of scholarly communication
• but: interoperability could increase the impact of e-prints:
• amongst e-print solutions
• with building blocks that implement other functions of scholarly communication
• with the established communication system
UPS Prototype: eprints discovery
• 1999: Van de Sompel, Krichel, Nelson
• results:
• insights regarding how un-interoperable the systems were
• a cross-repository searching and linking service
• recommendations to the Santa Fe meeting:
• data provider / service provider model
• metadata harvesting
• simplicity
evolution towards OAI-PMH v.2.0
• Santa Fe Convention [02/2000]
• OAI-PMH 1.0 [01/2001]
• OAI-PMH 2.0 [06/2002]
            Santa Fe convention   OAI-PMH v.1.0/1.1         OAI-PMH v.2.0
nature      experimental          experimental              stable
verbs       Dienst                OAI-PMH                   OAI-PMH
requests    HTTP GET/POST         HTTP GET/POST             HTTP GET/POST
responses   XML                   XML                       XML
transport   HTTP                  HTTP                      HTTP
metadata    OAMS                  unqualified Dublin Core   unqualified Dublin Core
about       eprints               document-like objects     resources
model       metadata harvesting   metadata harvesting       metadata harvesting
OAI-PMH model
[Figure: the service provider (harvester) sends the 6 OAI-PMH requests to the data provider (repository), which returns replies]
OAI-PMH model
service provider (harvester)
Supporting protocol requests:
• Identify
• ListMetadataFormats
• ListSets
Harvesting protocol requests:
• ListRecords
• ListIdentifiers
• GetRecord
data provider (repository)
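As a rough illustration of the six requests listed above, the sketch below builds the HTTP GET URL a harvester might send for each verb. The base URL and the record identifier are hypothetical placeholders, and `build_request` is an illustrative helper, not part of any OAI tool. The argument names `metadataPrefix` and `identifier` follow OAI-PMH v.2.0.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs, grouped as on the slide.
SUPPORTING_VERBS = ("Identify", "ListMetadataFormats", "ListSets")
HARVESTING_VERBS = ("ListRecords", "ListIdentifiers", "GetRecord")


def build_request(base_url, verb, **arguments):
    """Return the HTTP GET URL for a single OAI-PMH request (illustrative helper)."""
    if verb not in SUPPORTING_VERBS + HARVESTING_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    # The verb and its arguments travel as ordinary query parameters.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository address, used for illustration only.
BASE_URL = "http://repository.example.org/oai"

print(build_request(BASE_URL, "Identify"))
print(build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
print(build_request(BASE_URL, "GetRecord",
                    identifier="oai:example.org:1", metadataPrefix="oai_dc"))
```

The same requests could also be sent as HTTP POST; GET is used here only because the resulting URL is easy to inspect.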
OAI-PMH model
[Figure: the service provider (harvester) and the data provider (repository); the repository's Records are labeled with Datestamp, Identifier, and Set]
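To make the Identifier, Datestamp, and Set labels concrete, here is a minimal sketch that reads them out of record headers in a v.2.0-style ListIdentifiers reply. The sample XML, the identifiers, and the set name are made up for illustration, and `parse_headers` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH v.2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Made-up reply from a hypothetical repository.
SAMPLE_REPLY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T12:00:00Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">http://repository.example.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2002-05-01</datestamp>
      <setSpec>physics</setSpec>
    </header>
    <header>
      <identifier>oai:example.org:2</identifier>
      <datestamp>2002-05-15</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_headers(xml_text):
    """Return one dict per record header: identifier, datestamp, and set memberships."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.findall(".//oai:header", OAI_NS):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
            # A record may belong to zero or more sets.
            "sets": [s.text for s in header.findall("oai:setSpec", OAI_NS)],
        })
    return headers


for entry in parse_headers(SAMPLE_REPLY):
    print(entry)
```

A harvester could compare the datestamps against the date of its previous visit to decide which records to fetch again.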
federated services
[Figure: separate sources: e-print, FTXT, A&I, OPAC, image]
metadata harvesting via OAI-PMH
[Figure: a harvester collects metadata from e-print, FTXT, A&I, OPAC, and image sources]
metadata harvesting via OAI-PMH
[Figure: harvested metadata for an e-print (Author, Title, Abstract, Identifier), alongside FTXT, A&I, OPAC, and image sources]
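The Author, Title, Abstract, and Identifier fields in the figure could map onto unqualified Dublin Core roughly as sketched below. The mapping (Author to `creator`, Abstract to `description`) is an assumption made for illustration, as are the sample record and the `extract_fields` helper.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
DC = "{http://purl.org/dc/elements/1.1/}"  # unqualified Dublin Core namespace

# Assumed mapping from the slide's labels to Dublin Core element names.
FIELDS = {
    "Author": "creator",
    "Title": "title",
    "Abstract": "description",
    "Identifier": "identifier",
}

# Made-up GetRecord reply from a hypothetical repository.
SAMPLE_RECORD = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2002-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example e-print</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:description>A made-up abstract.</dc:description>
          <dc:identifier>http://repository.example.org/eprint/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def extract_fields(xml_text):
    """Return {label: [values]} for the Dublin Core elements listed in FIELDS."""
    root = ET.fromstring(xml_text)
    metadata = root.find(".//oai:metadata", OAI_NS)
    # Every Dublin Core element is repeatable, so each label maps to a list.
    return {
        label: [el.text for el in metadata.iter(DC + name)]
        for label, name in FIELDS.items()
    }


for label, values in extract_fields(SAMPLE_RECORD).items():
    print(label, values)
```

A service provider would typically index these values and use the identifier to link back to the full text held by the data provider.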
issue solved?
• no, just a tiny part of the technical challenges to support discovery
• many more technical issues
• even more non-technical issues
issue solved? technical
[Figure: an interoperable grid between A and R, spanning registration, certification, awareness, archiving, and rewarding]
issue solved? non-technical
• I am happy to leave those to you
• but: even for non-technological issues, part of the answer might be found in applying technology
indicators of adoption of OAI-PMH
• data providers
• service providers
• tools
• structural support
data providers
• 49 registered repositories [11/2001]
• 65 registered repositories [03/2002]
• 5+ million records
• many unregistered repositories
service providers
• Arc: cross-searching of registered repositories [Old Dominion U]
[ http://arc.cs.odu.edu ]
• OLAC: cross-searching of Language Archive Community repositories
[ http://www.language-archives.org/index.html ]
service providers
• Scirus scientific search engine [Elsevier]
[ http://www.scirus.com ]
• my.OAI: user-tailorable cross-searching of registered repositories [FS Consulting, Inc.]
[ http://www.myoai.com ]
• growing interest from web search engines
OAI-PMH tools
• Repository Explorer: interactive exploration
of repositories [Virginia Tech]
[ http://www.purl.org/NET/oai_explorer ]
• eprints.org: generic OAI-PMH compliant
repository software [U of Southampton]
[ http://www.eprints.org ]
• ALCME repository and harvester software
[OCLC]
[ http://alcme.oclc.org/index.html ]
OAI-PMH flies: structural support
• Metadata Harvesting Initiative of the Mellon
Foundation
• NSDL (NSF funded)
• UK FAIR call for proposals to support
disclosure of institutional assets (papers,
learning materials, etc.)
• Institute of Museum and Library Services
• several EC projects exploring/supporting
usage of OAI-PMH: TEL, Leaf, Cyclades, OA
Forum
OAI-PMH flies: and also …
• Australian Museums Online & CIMI : OAI
conference
• NIMH white paper on data archiving for
Animal Cognition Research
• Library of Congress
• National Library of Canada
• OCLC thesis database
• Illinois State Library Catalogue
future
• OAI
• OAI-PMH
• communities
• adoption
the OAI-PMH
• release of OAI-PMH v.2.0 [06/2002]
• no backwards compatibility with v.1.0/1.1
• stable
• migration process for registered repos
• ? formal standardization ?
• ? SOAP version ~ web services framework
[SOAP, WSDL, UDDI] ?
communities
• proliferation of community-specific add-ons
for:
• collection & set level metadata
• expressive metadata formats (e.g. qualified DC
XML Schema)
• shared set-structures
• machine readable rights (about the metadata)
adoption
• evolution
• from talking about OAI-PMH
• to talking about projects that use OAI-PMH
• to talking about projects and failing to mention
they use OAI-PMH
=> OAI-PMH becomes part of the infrastructure
I just wanted to report what I consider an
OAI success. I discovered that RLG had
harvested records for two of the American
Memory collections I had made available and
integrated them into their Cultural Materials
Initiative service without the need for a single
e-mail or phone call. They reported that it was
working very well for them.
[Caroline Arms, Library of Congress]
http://www.openarchives.org
[email protected]
the OAI: not really an organization
• Executive: Carl Lagoze & Herbert Van de Sompel
• 2000 – 2002 funding from CNI and DLF
• Steering Committee
• Technical Committee:
• protocol revision & stabilization
• Alpha testers
OAI-tech
US representatives
Thomas Krichel (Long Island U) - Jeff Young (OCLC) - Tim Cole (U of Illinois at Urbana-Champaign) - Hussein Suleman (Virginia Tech) - Simeon Warner (Cornell U) - Michael Nelson (NASA) - Caroline Arms (LoC) - Muhammad Zubair (Old Dominion U) - Steven Bird (U Penn.)
European representatives
Andy Powell (Bath U. & UKOLN) - Mogens Sandfaer
(DTV) - Thomas Baron (CERN) - Les Carr (U of
Southampton)
OAI-PMH 2.0 alpha testers (1/2)
• The British Library
• Cornell U. -- NSDL project & e-print arXiv
• Ex Libris
• FS Consulting Inc -- harvester for my.OAI
• Humboldt-Universität zu Berlin
• InQuirion Pty Ltd, RMIT University
• Library of Congress
• NASA
• OCLC
OAI-PMH 2.0 alpha testers (2/2)
• Old Dominion U. -- Arc, DP9
• U. of Illinois at Urbana-Champaign
• U. of Southampton -- OAIA, CiteBase, eprints.org
• UCLA, Johns Hopkins U., Indiana U., NYU -- sheet music collection
• UKOLN, U. of Bath -- RDN
• Virginia Tech -- repository explorer