Commit b9b656d0 authored by Georg Petz
Metadata slides update

parent 7148422a
%% Cell type:markdown id: tags:
# 2 - Metadata and Catalogue
[https://labs.onb.ac.at/en/dataset/lod/](https://labs.onb.ac.at/en/dataset/lod/)
[https://labs.onb.ac.at/en/tool/sparql/](https://labs.onb.ac.at/en/tool/sparql/)
%% Cell type:markdown id: tags:
### In this block:
* Overview data formats
* Overview container formats
* Overview protocols
%% Cell type:markdown id: tags:
* Example: SRU (2.1)
* Example: data harvesting OAI-PMH (2.2)
* Example: SPARQL (2.3)
%% Cell type:markdown id: tags:
## Overview data formats
%% Cell type:markdown id: tags:
* Dublin Core
* set of vocabulary terms to describe digital resources
* 15 classic metadata terms, known as the Dublin Core Metadata Element Set (DCMES)
* [Dublin Core Metadata Initiative](http://dublincore.org/)
%% Cell type:markdown id: tags:
* MARC
* MARC (MAchine-Readable Cataloging) standards
* developed in the 1960s to create records that could be read by computers and shared among libraries
* MARC 21, MARC record format for the 21st century
%% Cell type:markdown id: tags:
* Dublin Core Metadata Element Set (DCMES) 1.1
1. Title: The name of the object
2. Creator: An entity primarily responsible for making the resource
3. Subject: The topic addressed by the work
4. Description: An account of the resource
5. Publisher: The agent or agency responsible for making the object available
6. Contributor: An entity responsible for making contributions to the resource
7. Date: The date of publication
8. Type: The nature or genre of the resource
9. Format: The file format, physical medium, or dimensions of the resource
10. Identifier: String or number used to uniquely identify the object
%% Cell type:markdown id: tags:
* Dublin Core Metadata Element Set (DCMES) 1.1
11. Source: Objects, either print or electronic, from which this object is derived, if applicable
12. Language: Language of the intellectual content
13. Relation: Relationship to other objects
14. Coverage: The spatial locations and temporal durations characteristic of the object
15. Rights: Information about rights held in and over the resource
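A few of these elements can be written out as a flat XML record. The following is only a minimal sketch: the `dc` namespace URI is the one published for DCMES 1.1, while the `record` wrapper element, the helper name `build_dc_record` and the sample values are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build a flat XML record from (element name, value) pairs.

    The <record> wrapper is a hypothetical container used only for this sketch.
    """
    root = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Illustrative values; elements may be repeated or omitted
record = build_dc_record([
    ("title", "Biblia polyglotta"),
    ("publisher", "Arn. Giull. de Brocario"),
    ("language", "mul"),
])
dc_xml = ET.tostring(record, encoding="unicode")
print(dc_xml)
```

All 15 elements are optional and repeatable, which is why the sketch takes a list of pairs rather than a dictionary.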
%% Cell type:markdown id: tags:
## Overview container formats
%% Cell type:markdown id: tags:
* Simple DC container XML Schema [http://www.dublincore.org/schemas/xmls/](http://www.dublincore.org/schemas/xmls/)
![simpledc xml schema](./media/simpledc.png)
%% Cell type:markdown id: tags:
* JSON
* a string = { "name":"John" }
* a number = { "age":30 }
* an object (JSON object) = {"employee":{ "name":"John", "age":30, "city":"New York" }}
* an array = {"employees":[ "John", "Anna", "Peter" ]}
* a boolean = { "sale":true }
* null = { "middlename":null }
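To see how these JSON value types arrive in Python, the standard `json` module can be used. This is a small sketch; the sample document simply combines the values from the list above.

```python
import json

# One JSON document containing each value type listed above
text = """{
  "name": "John",
  "age": 30,
  "employee": {"name": "John", "city": "New York"},
  "employees": ["John", "Anna", "Peter"],
  "sale": true,
  "middlename": null
}"""

data = json.loads(text)

# Map every key to the name of the Python type it was decoded to
python_types = {key: type(value).__name__ for key, value in data.items()}
print(python_types)
```

Strings become `str`, numbers `int` or `float`, objects `dict`, arrays `list`, booleans `bool`, and `null` becomes `None`.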
%% Cell type:markdown id: tags:
* JSON-LD
* JSON for Linked Data
* keywords
* @context to provide additional mappings from JSON to an RDF model (map terms to IRIs)
* @id to uniquely identify things
* @type to set the data type of a node or typed value
* @container to set the default container type for a term
* "@container": "@set" defines a container as an unordered set
%% Cell type:markdown id: tags:
```javascript
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {
      "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
      "@type": "@id"
    },
    "Person": "http://xmlns.com/foaf/0.1/Person"
  },
  "@id": "https://me.example.com",
  "@type": "Person",
  "name": "John Smith",
  "homepage": "https://www.example.com/"
}
```
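What `@context` does can be illustrated by replacing the short terms in the document with the IRIs they are mapped to. The sketch below is deliberately simplified and is not a JSON-LD processor: the helper names `term_to_iri` and `expand_terms` are hypothetical, and real expansion as defined by the JSON-LD specification also handles nested contexts, value objects, containers and more.

```python
doc = {
    "@context": {
        "name": "http://xmlns.com/foaf/0.1/name",
        "homepage": {
            "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
            "@type": "@id",
        },
        "Person": "http://xmlns.com/foaf/0.1/Person",
    },
    "@id": "https://me.example.com",
    "@type": "Person",
    "name": "John Smith",
    "homepage": "https://www.example.com/",
}


def term_to_iri(term, context):
    """Look up a term in the context; unknown terms are returned unchanged."""
    definition = context.get(term, term)
    if isinstance(definition, dict):
        # Expanded term definition: the IRI is stored under "@id"
        return definition["@id"]
    return definition


def expand_terms(document):
    """Replace context terms in keys (and in @type) with their IRIs."""
    context = document["@context"]
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        if key == "@type":
            expanded[key] = term_to_iri(value, context)
        elif key.startswith("@"):
            expanded[key] = value
        else:
            expanded[term_to_iri(key, context)] = value
    return expanded


expanded = expand_terms(doc)
for key, value in expanded.items():
    print(key, "->", value)
```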
%% Cell type:markdown id: tags:
[https://json-ld.org/playground/](https://json-ld.org/playground/)
%% Cell type:markdown id: tags:
* DCMES and DCMI Metadata Terms [http://www.dublincore.org/specifications/dublin-core/dcmi-terms/](http://www.dublincore.org/specifications/dublin-core/dcmi-terms/) within JSON-LD
```javascript
{
...
"publisher": "Arn. Giull. de Brocario",
"place_of_publication": "Compluti",
"language": "http://id.loc.gov/vocabulary/iso639-2/mul",
"@id": "https://open-na.hosted.exlibrisgroup.com/alma/43ACC_ONB/bibs/990028618530603338",
"title": "Biblia polyglotta",
"@context": "https://open-na.hosted.exlibrisgroup.com/alma/contexts/bib"
}
```
* [https://open-na.hosted.exlibrisgroup.com/alma/contexts/bib](https://open-na.hosted.exlibrisgroup.com/alma/contexts/bib)
%% Cell type:code id: tags:
``` python
import requests

# Fetch one bibliographic record as JSON-LD from the Alma linked-data endpoint
cont = requests.get("https://open-na.hosted.exlibrisgroup.com/alma/43ACC_NETWORK/bibs/990106901740203331")
cont.json()
```
%% Output
{'date': '9999',
'note': 'Aus: (Sammelband von 63 Hochzeitsgedichten).',
'identifier': [{'label': '(DE-599)OBVAC10480601'},
{'label': '(Aleph)010690174ACC01'},
{'label': '(AT-OBV)AC10480601'},
{'label': 'AC10480601'}],
'@type': 'Book',
'place_of_publication': 's.l.',
'language': 'http://id.loc.gov/vocabulary/iso639-2/ger',
'@id': 'https://open-na.hosted.exlibrisgroup.com/alma/43ACC_NETWORK/bibs/990106901740203331',
'title': 'Bey dem hochadelichen Helmrich- und Bassronischen Beylager, welches ... zu sonderbahren Ehren beyder Vermählten ...',
'@context': 'https://open-na.hosted.exlibrisgroup.com/alma/contexts/bib'}
%% Cell type:markdown id: tags:
* MARCXML
* MARCXML is an XML schema based on the common MARC21 standards
* [http://www.loc.gov/standards/marcxml/](http://www.loc.gov/standards/marcxml/)
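Reading fields out of a MARCXML record mostly comes down to selecting `datafield` elements by `tag` and `subfield` elements by `code`. The following sketch uses only the standard library; the shortened sample record and the helper name `get_subfields` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# MARCXML elements live in the "MARC21 slim" namespace
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Shortened sample record for illustration
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">990030217420603338</controlfield>
  <datafield tag="264" ind1=" " ind2="1">
    <subfield code="a">[Meißen]</subfield>
    <subfield code="b">[Gödsche]</subfield>
    <subfield code="c">1814</subfield>
  </datafield>
  <datafield tag="300" ind1=" " ind2=" ">
    <subfield code="a">32 S.</subfield>
  </datafield>
</record>"""


def get_subfields(record, tag, code):
    """Return the text of all subfields with the given code in fields with the given tag."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [subfield.text for subfield in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE)
control_number = record.find("marc:controlfield[@tag='001']", MARC_NS).text
print(control_number, get_subfields(record, "264", "c"))
```

The helper returns a list because MARC fields and subfields are repeatable; an absent field yields an empty list.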
%% Cell type:markdown id: tags:
## Overview protocols
%% Cell type:markdown id: tags:
* SRU
* SRU (Search/Retrieve via URL) permits targeted searches within the catalogue based on well-established internet standards.
* [https://developers.exlibrisgroup.com/alma/integrations/SRU/](https://developers.exlibrisgroup.com/alma/integrations/SRU/)
* [http://www.loc.gov/standards/sru/](http://www.loc.gov/standards/sru/)
* based on CQL (Contextual Query Language) to search within the catalogue
* a single bibliographic record is retrieved by its barcode or its Metadata Management System ID (MMS ID)
* CQL query
* alma.mms_id=990055772160603338 ([https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.mms_id=990055772160603338&startRecord=0&maximumRecords=1&operation=searchRetrieve&recordSchema=marcxml](https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.mms_id=990055772160603338&startRecord=0&maximumRecords=1&operation=searchRetrieve&recordSchema=marcxml))
%% Cell type:markdown id: tags:
* CQL query
* alma.title=transzendental ([https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.title=transzendental&startRecord=0&maximumRecords=5&operation=searchRetrieve&recordSchema=dc](https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.title=transzendental&startRecord=0&maximumRecords=5&operation=searchRetrieve&recordSchema=dc))
* alma.barcode=%2BZ199052304 ([https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.barcode=%2BZ199052304&startRecord=0&maximumRecords=1&operation=searchRetrieve&recordSchema=marcxml](https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.barcode=%2BZ199052304&startRecord=0&maximumRecords=1&operation=searchRetrieve&recordSchema=marcxml))
* alma.mms_id=990034300920603338%20or%20alma.mms_id=990028618530603338 ([https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.mms_id=990034300920603338%20or%20alma.mms_id=990028618530603338&startRecord=1&maximumRecords=5&operation=searchRetrieve&recordSchema=dc](https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.mms_id=990034300920603338%20or%20alma.mms_id=990028618530603338&startRecord=1&maximumRecords=5&operation=searchRetrieve&recordSchema=dc))
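The request URLs above all follow the same pattern, so they can be assembled from a CQL query instead of being written by hand. This is a sketch under assumptions: the helper name `build_sru_url` and its defaults are hypothetical (the examples above use both 0 and 1 for `startRecord`), and the code only builds the URL without sending a request.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Base URL taken from the example links above
SRU_BASE = "https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB"


def build_sru_url(cql_query, record_schema="marcxml", start_record=1, maximum_records=1):
    """Assemble an SRU 1.2 searchRetrieve URL for an unencoded CQL query."""
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
    }
    # quote_via=quote encodes spaces as %20 and "+" as %2B
    return f"{SRU_BASE}?{urlencode(params, quote_via=quote)}"


barcode_url = build_sru_url("alma.barcode=+Z199052304")
or_url = build_sru_url(
    "alma.mms_id=990034300920603338 or alma.mms_id=990028618530603338",
    record_schema="dc",
    maximum_records=5,
)
print(barcode_url)
print(or_url)
```

Passing the query unencoded and letting `urlencode` do the escaping avoids mistakes such as a literal `+` in a barcode being read as a space.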
%% Cell type:code id: tags:
``` python
import requests
from lxml import etree

# Look up a single record by its barcode via SRU and request it as MARCXML
cont = requests.get("https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.barcode=%2BZ199052304&startRecord=0&maximumRecords=1&operation=searchRetrieve&recordSchema=marcxml").content
# Parse the response and pretty-print the XML
e = etree.XML(cont)
print(etree.tostring(e, encoding='unicode', pretty_print=True))
```
%% Output
<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
<version>1.2</version>
<numberOfRecords>1</numberOfRecords>
<records>
<record>
<recordSchema>marcxml</recordSchema>
<recordPacking>xml</recordPacking>
<recordData>
<record xmlns="http://www.loc.gov/MARC21/slim">
<leader>00000nam a2200000 c 4500</leader>
<controlfield tag="001">990030217420603338</controlfield>
<controlfield tag="005">20180123084300.0</controlfield>
<controlfield tag="007">cr#|||||||||||</controlfield>
<controlfield tag="007">tu</controlfield>
<controlfield tag="008">000101|1814####xx############|||#|#ger#u</controlfield>
<controlfield tag="009">AC09865194</controlfield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="a">AC09865194</subfield>
</datafield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="a">(Aleph)009871525ACC01</subfield>
</datafield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="a">(DE-599)OBVAC09865194</subfield>
</datafield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="a">(AT-OBV)AC09865194</subfield>
</datafield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="a">(EXLNZ-43ACC_NETWORK)990098715250203331</subfield>
</datafield>
<datafield tag="040" ind1=" " ind2=" ">
<subfield code="a">ONB</subfield>
<subfield code="b">ger</subfield>
<subfield code="c">ONB-AK-RETRO</subfield>
<subfield code="d">AT-OeNB</subfield>
<subfield code="e">pi</subfield>
</datafield>
<datafield tag="041" ind1=" " ind2=" ">
<subfield code="a">ger</subfield>
</datafield>
<datafield tag="044" ind1=" " ind2=" ">
<subfield code="c">XA-DXDE</subfield>
</datafield>
<datafield tag="245" ind1="0" ind2="0">
<subfield code="a">&lt;&lt;Die&gt;&gt; Flucht über den Rhein odar Das unverhoffte Wiedersehen</subfield>
<subfield code="b">Ein erlustirend historisch-rührendes Familiengemälde mit Erscheinungen und vollstimmigen Chören von Baschkiren und Cosaken, und allen Batterien der Deutschen</subfield>
</datafield>
<datafield tag="264" ind1=" " ind2="1">
<subfield code="a">[Meißen]</subfield>
<subfield code="b">[Gödsche]</subfield>
<subfield code="c">1814</subfield>
</datafield>
<datafield tag="300" ind1=" " ind2=" ">
<subfield code="a">32 S.</subfield>
</datafield>
<datafield tag="689" ind1="0" ind2="0">
<subfield code="a">Deutschland</subfield>
<subfield code="D">g</subfield>
<subfield code="0">(DE-588)4011882-4</subfield>
</datafield>
<datafield tag="689" ind1="0" ind2="1">
<subfield code="a">Krieg</subfield>
<subfield code="D">s</subfield>
<subfield code="0">(DE-588)4033114-3</subfield>
</datafield>
<datafield tag="689" ind1="0" ind2="3">
<subfield code="a">Belletristische Darstellung</subfield>
<subfield code="A">f</subfield>
</datafield>
<datafield tag="689" ind1="0" ind2=" ">
<subfield code="5">AT-OBV</subfield>
<subfield code="5">ONB-AK</subfield>
</datafield>
<datafield tag="689" ind1="1" ind2="0">
<subfield code="a">Drama</subfield>
<subfield code="D">s</subfield>
<subfield code="0">(DE-588)4012899-4</subfield>
</datafield>
<datafield tag="689" ind1="1" ind2="1">
<subfield code="a">Deutsch</subfield>
<subfield code="D">s</subfield>
<subfield code="0">(DE-588)4113292-0</subfield>
</datafield>
<datafield tag="689" ind1="1" ind2=" ">
<subfield code="5">AT-OBV</subfield>
<subfield code="5">ONB-AK</subfield>
</datafield>
<datafield tag="710" ind1="2" ind2=" ">
<subfield code="a">Goedsche, Friedrich Wilhelm</subfield>
<subfield code="4">pbl</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="u">http://data.onb.ac.at/imgk/AZ00308934SZ00220134SZ00628562</subfield>
<subfield code="z">Zettel</subfield>
<subfield code="o">Katalogkarte</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2="0">
<subfield code="m">V:AT-OBV;B:AT-OeNB</subfield>
<subfield code="q">application/html</subfield>
<subfield code="u">http://data.onb.ac.at/ABO/%2BZ182067107</subfield>
<subfield code="x">ONB-ABO</subfield>
<subfield code="3">Volltext</subfield>
<subfield code="o">OBV-ONB-ABO</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2="0">
<subfield code="m">V:AT-OBV;B:AT-OeNB</subfield>
<subfield code="q">application/html</subfield>
<subfield code="u">http://data.onb.ac.at/ABO/%2BZ199052304</subfield>
<subfield code="x">ONB-ABO</subfield>
<subfield code="3">Volltext</subfield>
<subfield code="o">OBV-ONB-ABO</subfield>
</datafield>
<datafield tag="974" ind1="0" ind2="s">
<subfield code="V">029</subfield>
<subfield code="a">LZ01187985</subfield>
</datafield>
<datafield tag="974" ind1="0" ind2="s">
<subfield code="F">030</subfield>
<subfield code="A">u|1uf||||||37</subfield>
</datafield>
<datafield tag="974" ind1="0" ind2="s">
<subfield code="F">050</subfield>
<subfield code="A">a|a|||||g|||||</subfield>
</datafield>
<datafield tag="974" ind1="0" ind2="s">
<subfield code="F">051</subfield>
<subfield code="A">m|||||||</subfield>
</datafield>
<datafield tag="980" ind1="0" ind2=" ">
<subfield code="a">0</subfield>
<subfield code="9">LOCAL</subfield>
</datafield>
<datafield tag="980" ind1="0" ind2=" ">
<subfield code="a">ONB-AK-RETRO</subfield>
<subfield code="9">LOCAL</subfield>
</datafield>
<datafield tag="982" ind1=" " ind2=" ">
<subfield code="f">Drama</subfield>
<subfield code="9">LOCAL</subfield>
</datafield>
<datafield tag="982" ind1=" " ind2=" ">
<subfield code="f">Dramen / deutsche / 19. Jh.</subfield>
<subfield code="9">LOCAL</subfield>
</datafield>
<datafield tag="AVA" ind1=" " ind2=" ">
<subfield code="0">990030217420603338</subfield>
<subfield code="8">22288570940003338</subfield>
<subfield code="a">43ACC_ONB</subfield>
<subfield code="b">ZALT</subfield>
<subfield code="c">State Hall at Josefsplatz</subfield>
<subfield code="d">80.J.58</subfield>
<subfield code="e">available</subfield>
<subfield code="f">1</subfield>
<subfield code="g">0</subfield>
<subfield code="i">ONB</subfield>
<subfield code="j">PRUNK</subfield>
<subfield code="p">1</subfield>
<subfield code="q">Department of Manuscripts and Rare Books (ALT)</subfield>
</datafield>
<datafield tag="AVA" ind1=" " ind2=" ">
<subfield code="0">990030217420603338</subfield>
<subfield code="8">22288570920003338</subfield>
<subfield code="a">43ACC_ONB</subfield>
<subfield code="b">ZFID</subfield>
<subfield code="c">Bildarchiv und Grafiksammlung</subfield>
<subfield code="d">288765-B</subfield>
<subfield code="e">available</subfield>
<subfield code="f">1</subfield>
<subfield code="g">0</subfield>
<subfield code="i">ONB</subfield>
<subfield code="j">MAG</subfield>
<subfield code="p">2</subfield>
<subfield code="q">Picture Archives and Graphics Department (FID)</subfield>
</datafield>
</record>
</recordData>
<recordIdentifier>990030217420603338</recordIdentifier>
<recordPosition>0</recordPosition>
</record>
</records>
<extraResponseData xmlns:xb="http://www.exlibris.com/repository/search/xmlbeans/">
<xb:exact>true</xb:exact>
<xb:responseDate>2019-04-27T11:10:11+0200</xb:responseDate>
</extraResponseData>
</searchRetrieveResponse>
%% Cell type:markdown id: tags:
* OAI-PMH
* OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is used for metadata harvesting
* 6 verbs
* GetRecord – Used to retrieve an individual metadata record.
* Identify – Used to retrieve repository information (e.g. name, version).
* ListIdentifiers – Used to retrieve only record headers, without the metadata.
* ListMetadataFormats – Used to retrieve the available metadata formats.
* ListRecords – Used to retrieve actual item metadata records.
* ListSets – Used to retrieve the set structure of a repository.
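Each verb is sent as a `verb` parameter on the repository's base URL, with further arguments added as needed. The sketch below only constructs such URLs: the helper name `build_oai_url` is hypothetical, the base URL is the ONB endpoint used in this notebook, and whether the `ABODC` set can be combined with the `oai_dc` metadata prefix on that server is an assumption.

```python
from urllib.parse import urlencode

# ONB OAI-PMH endpoint as used in this notebook
OAI_BASE = "https://eu02.alma.exlibrisgroup.com/view/oai/43ACC_ONB/request"

# The six verbs defined by OAI-PMH
OAI_VERBS = {
    "GetRecord",
    "Identify",
    "ListIdentifiers",
    "ListMetadataFormats",
    "ListRecords",
    "ListSets",
}


def build_oai_url(verb, **arguments):
    """Build an OAI-PMH request URL for one verb plus optional arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb, **arguments}
    return f"{OAI_BASE}?{urlencode(params)}"


identify_url = build_oai_url("Identify")
# Selective harvesting: restrict ListRecords to one set (argument values are assumptions)
records_url = build_oai_url("ListRecords", metadataPrefix="oai_dc", set="ABODC")
print(identify_url)
print(records_url)
```

Libraries such as Sickle wrap the same requests and additionally follow resumption tokens when a response is split over several pages.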
%% Cell type:code id: tags:
``` python
from sickle import Sickle

# Connect to the ONB OAI-PMH endpoint and list the sets offered for selective harvesting
sickle = Sickle('https://eu02.alma.exlibrisgroup.com/view/oai/43ACC_ONB/request')
oai_sets = sickle.ListSets()
for oai_set in oai_sets:
    print('setSpec value for selective harvesting: ' + oai_set.setSpec)
    print('Name of the set (setName): ' + oai_set.setName + '\n')
```
%% Output
setSpec value for selective harvesting: PAPYRUSDC
Name of the set (setName): Papyri records in DC simple
setSpec value for selective harvesting: FULLMARC
Name of the set (setName): Complete set of ONB records in MARC
setSpec value for selective harvesting: HANNAMARC
Name of the set (setName): HANNA records in MARC
setSpec value for selective harvesting: ESPERANTOMARC
Name of the set (setName): Esperanto records in MARC
setSpec value for selective harvesting: ESPERANTODC
Name of the set (setName): Esperanto Records in DC simple
setSpec value for selective harvesting: PAPYRUSMARC
Name of the set (setName): Papyri records in MARC
setSpec value for selective harvesting: HANNADC
Name of the set (setName): HANNA records in DC simple
setSpec value for selective harvesting: ABODC
Name of the set (setName): Austrian Books Online in DC simple
setSpec value for selective harvesting: ARIADNEDC
Name of the set (setName): Ariadne records in DC simple
setSpec value for selective harvesting: ARIADNEMARC
Name of the set (setName): Ariadne records in MARC
setSpec value for selective harvesting: MAPMARC
Name of the set (setName): Maps and Globes records in MARC
setSpec value for selective harvesting: FULLDC
Name of the set (setName): Complete set of ONB records in DC simple
setSpec value for selective harvesting: MAPDC
Name of the set (setName): Maps and Globes records in DC simple
setSpec value for selective harvesting: ABOMARC
Name of the set (setName): Austrian Books Online in MARC
setSpec value for selective harvesting: OAIBIBLIOA
Name of the set (setName): Austrian Bibliography A
setSpec value for selective harvesting: MUSHANDC
Name of the set (setName): Musikhandschriften in DC
setSpec value for selective harvesting: MUSHANMARC
Name of the set (setName): Music Manuscripts
setSpec value for selective harvesting: CERLMARC
Name of the set (setName): Old prints and manuscripts for CERL portal