%% Cell type:markdown id: tags:

# 2 - Metadata and Catalogue

%% Cell type:markdown id: tags:

### In this block:

* Overview data formats
* Overview container formats
* Overview protocols

%% Cell type:markdown id: tags:

* Example SRU (2.1)
* Example data harvesting OAI-PMH (2.2)
* Example SPARQL (2.3)

%% Cell type:markdown id: tags:

## Overview data formats

%% Cell type:markdown id: tags:

* Dublin Core
  * a set of vocabulary terms for describing digital resources
  * 15 classic metadata terms, known as the Dublin Core Metadata Element Set (DCMES)
  * [Dublin Core Metadata Initiative](http://dublincore.org/)

%% Cell type:markdown id: tags:

* MARC
  * MARC (MAchine-Readable Cataloging) standards
  * developed in the 1960s to create records that could be read by computers and shared among libraries
  * MARC 21: the MARC record format for the 21st century

%% Cell type:markdown id: tags:

* Dublin Core Metadata Element Set (DCMES) 1.1
  1. Title: The name of the resource
  2. Creator: An entity primarily responsible for making the resource
  3. Subject: The topic addressed by the resource
  4. Description: An account of the resource
  5. Publisher: The agent or agency responsible for making the resource available
  6. Contributor: An entity responsible for making contributions to the resource
  7. Date: A point or period of time associated with an event in the lifecycle of the resource (e.g. the date of publication)
  8. Type: The nature or genre of the resource
  9. Format: The file format, physical medium, or dimensions of the resource
  10. Identifier: A string or number used to uniquely identify the resource

%% Cell type:markdown id: tags:

* Dublin Core Metadata Element Set (DCMES) 1.1 (continued)
  11. Source: Resources, either print or electronic, from which this resource is derived, if applicable
  12. Language: The language of the intellectual content
  13. Relation: A relationship to other resources
  14. Coverage: The spatial locations and temporal durations characteristic of the resource
  15. Rights: Information about rights held in and over the resource

%% Cell type:markdown id: tags:

## Overview container formats

%% Cell type:markdown id: tags:

* Simple DC container: XML Schema [http://www.dublincore.org/schemas/xmls/](http://www.dublincore.org/schemas/xmls/)

%% Cell type:markdown id: tags:

* JSON
  * a string = `{ "name":"John" }`
  * a number = `{ "age":30 }`
  * an object (JSON object) = `{"employee":{ "name":"John", "age":30, "city":"New York" }}`
  * an array = `{"employees":[ "John", "Anna", "Peter" ]}`
  * a boolean = `{ "sale":true }`
  * null = `{ "middlename":null }`

%% Cell type:markdown id: tags:

* JSON-LD
  * JSON for Linked Data
  * keywords
    * `@context` provides additional mappings from JSON to an RDF model (maps terms to IRIs)
    * `@id` uniquely identifies things
    * `@type` sets the data type of a node or typed value
    * `@container` sets the default container type for a term
      * `"@container": "@set"` defines a container as an unordered set

%% Cell type:markdown id: tags:

```javascript
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {
      "@id": "http://xmlns.com/foaf/0.1/workplaceHomepage",
      "@type": "@id"
    },
    "Person": "http://xmlns.com/foaf/0.1/Person"
  },
  "@id": "https://me.example.com",
  "@type": "Person",
  "name": "John Smith",
  "homepage": "https://www.example.com/"
}
```

%% Cell type:markdown id: tags:

* DCMES and DCMI Metadata Terms ([http://www.dublincore.org/specifications/dublin-core/dcmi-terms/](http://www.dublincore.org/specifications/dublin-core/dcmi-terms/)) within JSON-LD

```javascript
{
  ...
  "publisher": "Arn. Giull. de Brocario",
  "place_of_publication": "Compluti",
  "language": "http://id.loc.gov/vocabulary/iso639-2/mul",
  "@id": "https://open-na.hosted.exlibrisgroup.com/alma/43ACC_ONB/bibs/990028618530603338",
  "title": "Biblia polyglotta",
  "@context": "https://open-na.hosted.exlibrisgroup.com/alma/contexts/bib"
}
```

* JSON-LD context used by these records: [https://open-na.hosted.exlibrisgroup.com/alma/contexts/bib](https://open-na.hosted.exlibrisgroup.com/alma/contexts/bib)

%% Cell type:code id: tags:

``` python
import requests

# Request a bibliographic record from the Ex Libris open-na service, which returns JSON-LD
cont = requests.get("https://open-na.hosted.exlibrisgroup.com/alma/43ACC_NETWORK/bibs/990106901740203331")
cont.raise_for_status()  # raise an exception on HTTP errors (e.g. 404)
cont.json()  # parse the JSON-LD body into a Python dict
```

%% Output

{'date': '9999',
 'note': 'Aus: (Sammelband von 63 Hochzeitsgedichten).',
 'identifier': [{'label': '(DE-599)OBVAC10480601'},
  {'label': '(Aleph)010690174ACC01'},
  {'label': '(AT-OBV)AC10480601'},
  {'label': 'AC10480601'}],
 '@type': 'Book',
 'place_of_publication': 's.l.',
 'language': 'http://id.loc.gov/vocabulary/iso639-2/ger',
 '@id': 'https://open-na.hosted.exlibrisgroup.com/alma/43ACC_NETWORK/bibs/990106901740203331',
 'title': 'Bey dem hochadelichen Helmrich- und Bassronischen Beylager, welches ... zu sonderbahren Ehren beyder Vermählten ...',
 '@context': 'https://open-na.hosted.exlibrisgroup.com/alma/contexts/bib'}

%% Cell type:markdown id: tags:

* MARCXML
  * MARCXML is an XML schema for representing records based on the MARC 21 standards
  * [http://www.loc.gov/standards/marcxml/](http://www.loc.gov/standards/marcxml/)

%% Cell type:markdown id: tags:

## Overview protocols

%% Cell type:markdown id: tags:

* SRU
  * SRU (Search/Retrieve via URL) allows targeted searches of the catalogue using well-established web standards (HTTP requests and XML responses).
  * [https://developers.exlibrisgroup.com/alma/integrations/SRU/](https://developers.exlibrisgroup.com/alma/integrations/SRU/)
  * [http://www.loc.gov/standards/sru/](http://www.loc.gov/standards/sru/)
  * queries are written in CQL (Contextual Query Language)
  * a bibliographic record can be retrieved by its barcode or by its Metadata Management System ID (MMS ID)
  * CQL query
    * alma.mms_id=990055772160603338 ([https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.mms_id=990055772160603338&startRecord=0&maximumRecords=1&operation=searchRetrieve&recordSchema=marcxml](https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.mms_id=990055772160603338&startRecord=0&maximumRecords=1&operation=searchRetrieve&recordSchema=marcxml))

%% Cell type:markdown id: tags:

* CQL query
  * alma.title=transzendental ([https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.title=transzendental&startRecord=0&maximumRecords=5&operation=searchRetrieve&recordSchema=dc](https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.title=transzendental&startRecord=0&maximumRecords=5&operation=searchRetrieve&recordSchema=dc))
  * alma.barcode=+Z199052304, with the `+` URL-encoded as `%2B` ([https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.barcode=%2BZ199052304&startRecord=0&maximumRecords=1&operation=searchRetrieve&recordSchema=marcxml](https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.barcode=%2BZ199052304&startRecord=0&maximumRecords=1&operation=searchRetrieve&recordSchema=marcxml))
  * alma.mms_id=990034300920603338 or alma.mms_id=990028618530603338 ([https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.mms_id=990034300920603338%20or%20alma.mms_id=990028618530603338&startRecord=1&maximumRecords=5&operation=searchRetrieve&recordSchema=dc](https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.mms_id=990034300920603338%20or%20alma.mms_id=990028618530603338&startRecord=1&maximumRecords=5&operation=searchRetrieve&recordSchema=dc))

%% Cell type:markdown id: tags:

* OAI-PMH
  * OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is used to harvest metadata in bulk from a repository
  * six verbs (request types)
    * GetRecord: retrieves an individual metadata record
    * Identify: retrieves information about the repository (e.g. name, protocol version)
    * ListIdentifiers: retrieves only the record headers
    * ListMetadataFormats: retrieves the available metadata formats
    * ListRecords: retrieves full metadata records
    * ListSets: retrieves the set structure of the repository

%% Cell type:code id: tags:

``` python
from sickle import Sickle

# Connect to the OAI-PMH endpoint of the ONB Alma instance
sickle = Sickle('https://eu02.alma.exlibrisgroup.com/view/oai/43ACC_ONB/request')
# ListSets returns an iterator over all sets offered by the repository
oai_sets = sickle.ListSets()
for oai_set in oai_sets:
    print('setSpec value for selective harvesting: ' + oai_set.setSpec)
    print('Name of the set (setName): ' + oai_set.setName + '\n')
```

%% Output

setSpec value for selective harvesting: PAPYRUSDC
Name of the set (setName): Papyri records in DC simple

setSpec value for selective harvesting: FULLMARC
Name of the set (setName): Complete set of ONB records in MARC

setSpec value for selective harvesting: HANNAMARC
Name of the set (setName): HANNA records in MARC

setSpec value for selective harvesting: ESPERANTOMARC
Name of the set (setName): Esperanto records in MARC

setSpec value for selective harvesting: ESPERANTODC
Name of the set (setName): Esperanto Records in DC simple

setSpec value for selective harvesting: PAPYRUSMARC
Name of the set (setName): Papyri records in MARC

setSpec value for selective harvesting: HANNADC
Name of the set (setName): HANNA records in DC simple

setSpec value for selective harvesting: ABODC
Name of the set (setName): Austrian Books Online in DC simple

setSpec value for selective harvesting: ARIADNEDC
Name of the set (setName): Ariadne records in DC simple

setSpec value for selective harvesting: ARIADNEMARC
Name of the set (setName): Ariadne records in MARC

setSpec value for selective harvesting: MAPMARC
Name of the set (setName): Maps and Globes records in MARC

setSpec value for selective harvesting: FULLDC
Name of the set (setName): Complete set of ONB records in DC simple

setSpec value for selective harvesting: MAPDC
Name of the set (setName): Maps and Globes records in DC simple

setSpec value for selective harvesting: ABOMARC
Name of the set (setName): Austrian Books Online in MARC

setSpec value for selective harvesting: OAIBIBLIOA
Name of the set (setName): Austrian Bibliography A

setSpec value for selective harvesting: MUSHANDC
Name of the set (setName): Musikhandschriften in DC

setSpec value for selective harvesting: MUSHANMARC
Name of the set (setName): Music Manuscripts

setSpec value for selective harvesting: CERLMARC
Name of the set (setName): Old prints and manuscripts for CERL portal
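Sickle hides the HTTP details of OAI-PMH; its `ListRecords` iterator fetches page after page on its own. As a rough, standard-library-only sketch of what that paging involves, the code below builds a `ListRecords` request for one of the sets listed above, reads the record identifiers and the `resumptionToken` from a response page, and keeps requesting until the token is empty. It assumes the `oai_dc` metadata prefix (which every OAI-PMH 2.0 repository must support) and the `ESPERANTODC` set; the helper names and the sample page are hypothetical, and parsing works on XML text so it can be tried without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# OAI-PMH 2.0 namespace used by all protocol responses
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "https://eu02.alma.exlibrisgroup.com/view/oai/43ACC_ONB/request"


def list_records_url(base_url, metadata_prefix=None, set_spec=None, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # Follow-up requests carry only the verb and the resumptionToken
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        # The first request names the metadata format and, optionally, a set
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumptionToken or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [header.findtext("oai:identifier", namespaces=OAI_NS)
           for header in root.iterfind(".//oai:record/oai:header", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken marks the last page of the list
    token = (token_el.text or "").strip() if token_el is not None else ""
    return ids, token or None


def harvest_identifiers(fetch, base_url, metadata_prefix, set_spec=None):
    """Yield identifiers from every page, following resumptionTokens.

    `fetch` maps a URL to the response text; with network access this could be
    lambda url: requests.get(url).text (assumption: no retry or error handling).
    """
    url = list_records_url(base_url, metadata_prefix, set_spec)
    while True:
        ids, token = parse_list_records(fetch(url))
        yield from ids
        if token is None:
            return
        url = list_records_url(base_url, resumption_token=token)


# Illustrative response page (made up, not real ONB data)
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL, "oai_dc", "ESPERANTODC"))
print(parse_list_records(SAMPLE_PAGE))  # (['oai:example:1', 'oai:example:2'], 'page-2')
```

Passing `fetch` in as a parameter keeps the paging logic independent of the HTTP client, so the loop can be tested with canned pages and later wired to `requests`.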
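The SRU requests listed in the protocols overview are ordinary HTTP GET URLs, so they can also be assembled and read with the Python standard library. The sketch below builds a `searchRetrieve` URL from a CQL query, letting `urlencode` encode characters such as the `+` in a barcode, and extracts `numberOfRecords` and the Dublin Core titles from a response. It assumes an SRU 1.2 response in the `http://www.loc.gov/zing/srw/` namespace with titles appearing as `dc:title` elements somewhere in the record data; the helper names and the sample response are illustrative, not actual output of the ONB catalogue.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

SRU_BASE = "https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB"
# Namespaces assumed for an SRU 1.2 response carrying simple Dublin Core records
NS = {"srw": "http://www.loc.gov/zing/srw/", "dc": "http://purl.org/dc/elements/1.1/"}


def sru_search_url(base_url, cql_query, start_record=1, maximum_records=10, record_schema="dc"):
    """Build an SRU 1.2 searchRetrieve URL (hypothetical helper)."""
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "query": cql_query,
        "startRecord": start_record,  # SRU counts record positions from 1
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
    }
    # quote_via=quote encodes spaces as %20 and "+" as %2B
    # (a raw "+" in a query string is typically decoded as a space)
    return base_url + "?" + urlencode(params, quote_via=quote)


def parse_sru_dc(xml_text):
    """Return (total number of hits, list of dc:title values) from one response."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    titles = [el.text for el in root.iterfind(".//dc:title", NS)]
    return total, titles


# Made-up response for illustration; a real one would come from requests.get(url).text
SAMPLE_RESPONSE = """<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.2</version>
  <numberOfRecords>1</numberOfRecords>
  <records>
    <record>
      <recordSchema>dc</recordSchema>
      <recordData>
        <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Biblia polyglotta</dc:title>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>"""

print(sru_search_url(SRU_BASE, "alma.barcode=+Z199052304", maximum_records=1, record_schema="marcxml"))
print(parse_sru_dc(SAMPLE_RESPONSE))  # (1, ['Biblia polyglotta'])
```

Because `urlencode` also encodes the `=` inside the CQL query as `%3D`, the generated URLs look slightly different from the hand-written ones above; a standard HTTP server decodes both forms to the same query.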