Commit 16ee3069 authored by Georg Petz

Notebook Updates

parent 2faf8e80
+111 −1
%% Cell type:markdown id: tags:

# 2 - Metadata and Catalogue

%% Cell type:markdown id: tags:

### In this block:

* Overview data formats
* Overview container formats
* Overview protocols

%% Cell type:markdown id: tags:

* Example SRU (2.1)
* Example data harvesting OAI-PMH (2.2)
* Example SPARQL (2.3)

%% Cell type:markdown id: tags:

## Overview data formats

%% Cell type:markdown id: tags:

* Dublin Core
    * set of vocabulary terms to describe digital resources
    * 15 classic metadata terms, known as the Dublin Core Metadata Element Set (DCMES)
    * [Dublin Core Metadata Initiative](http://dublincore.org/)

%% Cell type:markdown id: tags:

* MARC
    * MARC (MAchine-Readable Cataloging) standards
        * developed in the 1960s to create records that could be read by computers and shared among libraries
    * MARC 21: the MARC record format for the 21st century

%% Cell type:markdown id: tags:

* Dublin Core Metadata Element Set (DCMES) 1.1

    1. Title: The name of the object
    2. Creator: An entity primarily responsible for making the resource
    3. Subject: The topic addressed by the work
    4. Description: An account of the resource
    5. Publisher: The agent or agency responsible for making the object available
    6. Contributor: An entity responsible for making contributions to the resource
    7. Date: A point or period of time associated with the resource, typically the date of publication
    8. Type: The nature or genre of the resource
    9. Format: The file format, physical medium, or dimensions of the resource
    10. Identifier: String or number used to uniquely identify the object

%% Cell type:markdown id: tags:

* Dublin Core Metadata Element Set (DCMES) 1.1

    11. Source: Objects, either print or electronic, from which this object is derived, if applicable
    12. Language: Language of the intellectual content
    13. Relation: Relationship to other objects
    14. Coverage: The spatial locations and temporal durations characteristic of the object
    15. Rights: Information about rights held in and over the resource
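
A minimal sketch, not part of the original notebook: a Dublin Core description can be held as a Python mapping from element names to lists of values, because every DCMES element is optional and repeatable. The helper `check_dc_record` and the sample record are hypothetical; the sample values are taken from the JSON-LD record in this notebook.

```python
# The 15 elements of the Dublin Core Metadata Element Set (DCMES) 1.1
DCMES_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def check_dc_record(record):
    """Return the keys of `record` that are not DCMES 1.1 element names.

    Hypothetical helper: values are lists because every element
    may be repeated (and every element may be omitted).
    """
    return sorted(key for key in record if key not in DCMES_ELEMENTS)


# Illustrative record; values taken from the JSON-LD example in this notebook
record = {
    "title": ["Biblia polyglotta"],
    "publisher": ["Arn. Giull. de Brocario"],
    "language": ["mul"],
    "place_of_publication": ["Compluti"],  # not a DCMES element
}
print(check_dc_record(record))  # ['place_of_publication']
```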

%% Cell type:markdown id: tags:

## Overview container formats

%% Cell type:markdown id: tags:

* Simple DC container XML Schema [http://www.dublincore.org/schemas/xmls/](http://www.dublincore.org/schemas/xmls/)
![simpledc xml schema](./media/simpledc.png)
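
As an illustration of how simple DC elements sit inside a container, the sketch below serializes a `{element: [values]}` mapping into an `oai_dc:dc` element using only the standard library. It assumes the `oai_dc` container namespace used by OAI-PMH (`http://www.openarchives.org/OAI/2.0/oai_dc/`) and the DCMES namespace `http://purl.org/dc/elements/1.1/`; the function name `to_simple_dc_xml` is hypothetical, and the `xsi:schemaLocation` attribute that such records usually carry is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Container namespace (OAI-PMH oai_dc) and element namespace (DCMES 1.1)
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def to_simple_dc_xml(record):
    """Serialize {element: [values]} into an oai_dc:dc XML string."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        for value in values:
            # one child per value, since DC elements are repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


dc_xml = to_simple_dc_xml({"title": ["Biblia polyglotta"], "language": ["mul"]})
print(dc_xml)
```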

%% Cell type:markdown id: tags:

* DCMES and DCMI Metadata Terms [http://www.dublincore.org/specifications/dublin-core/dcmi-terms/](http://www.dublincore.org/specifications/dublin-core/dcmi-terms/) serialized as JSON-LD

```javascript
{
...
    "publisher": "Arn. Giull. de Brocario",
    "place_of_publication": "Compluti",
    "language": "http://id.loc.gov/vocabulary/iso639-2/mul",
    "@id": "https://open-na.hosted.exlibrisgroup.com/alma/43ACC_ONB/bibs/990028618530603338",
    "title": "Biblia polyglotta",
    "@context": "https://open-na.hosted.exlibrisgroup.com/alma/contexts/bib"
}
```

* JSON-LD context: [https://open-na.hosted.exlibrisgroup.com/alma/contexts/bib](https://open-na.hosted.exlibrisgroup.com/alma/contexts/bib)
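
The record above can be read as plain JSON with the standard library. This sketch does not perform JSON-LD processing (the `@context` is not resolved); it only extracts the title and derives two identifiers from URIs in the record. Treating the last path segment of `@id` as the MMS ID is an assumption based on this example, where it matches an MMS ID used in the SRU queries below.

```python
import json

# Fields copied from the JSON-LD example above; the elided "..." part is left out
raw = """{
    "publisher": "Arn. Giull. de Brocario",
    "place_of_publication": "Compluti",
    "language": "http://id.loc.gov/vocabulary/iso639-2/mul",
    "@id": "https://open-na.hosted.exlibrisgroup.com/alma/43ACC_ONB/bibs/990028618530603338",
    "title": "Biblia polyglotta",
    "@context": "https://open-na.hosted.exlibrisgroup.com/alma/contexts/bib"
}"""

record = json.loads(raw)
# Assumption: the last path segment of "@id" is the Alma MMS ID
mms_id = record["@id"].rstrip("/").rsplit("/", 1)[-1]
# The language is an id.loc.gov URI whose last segment is the ISO 639-2 code
language_code = record["language"].rsplit("/", 1)[-1]
print(record["title"], mms_id, language_code)  # Biblia polyglotta 990028618530603338 mul
```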

%% Cell type:markdown id: tags:

* MARCXML
    * MARCXML is an XML schema for representing records that follow the MARC 21 standards
    * [http://www.loc.gov/standards/marcxml/](http://www.loc.gov/standards/marcxml/)
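
To show how MARCXML is structured, the following sketch parses a short, hand-written record (not retrieved from the catalogue) with `xml.etree.ElementTree`. In MARC 21, field 245 is the title statement and its subfield `a` holds the title proper; control field 001 carries the record's control number. The helper `marc_subfields` is hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hand-written, shortened MARCXML record for illustration only
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 c 4500</leader>
  <controlfield tag="001">990028618530603338</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Biblia polyglotta</subfield>
  </datafield>
</record>"""


def marc_subfields(record, tag, code):
    """Return all values of subfield `code` in the datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [subfield.text for subfield in record.findall(path, MARC_NS)]


record = ET.fromstring(MARCXML)
control_001 = record.find("marc:controlfield[@tag='001']", MARC_NS).text
print(control_001, marc_subfields(record, "245", "a"))
```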

%% Cell type:markdown id: tags:

## Overview protocols

%% Cell type:markdown id: tags:

* OAI-PMH
* SRU
    * SRU (Search/Retrieve via URL) enables targeted searches in the catalogue using established web standards (HTTP requests, XML responses)
    * [https://developers.exlibrisgroup.com/alma/integrations/SRU/](https://developers.exlibrisgroup.com/alma/integrations/SRU/)
    * [http://www.loc.gov/standards/sru/](http://www.loc.gov/standards/sru/)
    * queries are written in CQL (Contextual Query Language)
    * to retrieve a specific bibliographic record, query by its barcode or by its Metadata Management System ID (MMS ID)
    * CQL query
        * alma.mms_id=990055772160603338  ([https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.mms_id=990055772160603338&startRecord=0&maximumRecords=1&operation=searchRetrieve&recordSchema=marcxml](https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.mms_id=990055772160603338&startRecord=0&maximumRecords=1&operation=searchRetrieve&recordSchema=marcxml))
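
The SRU links in this notebook are assembled by hand. The sketch below builds the same kind of `searchRetrieve` URL and lets `urllib.parse.urlencode` escape the CQL query; the function name `sru_url` and its defaults are assumptions. It defaults to `startRecord=1` because the SRU specification counts records from 1, whereas the links here use `0`.

```python
from urllib.parse import urlencode

# Alma SRU endpoint used in the links above
SRU_BASE = "https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB"


def sru_url(cql_query, record_schema="marcxml", start_record=1, maximum_records=1):
    """Build an SRU 1.2 searchRetrieve URL; urlencode escapes the CQL query."""
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
    }
    return SRU_BASE + "?" + urlencode(params)


url = sru_url("alma.mms_id=990055772160603338")
print(url)
```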


%% Cell type:markdown id: tags:

* CQL query
    * alma.title=transzendental ([https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.title=transzendental&startRecord=0&maximumRecords=5&operation=searchRetrieve&recordSchema=dc](https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.title=transzendental&startRecord=0&maximumRecords=5&operation=searchRetrieve&recordSchema=dc))
    * alma.barcode=%2BZ199052304, i.e. the barcode `+Z199052304` with the `+` URL-encoded ([https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.barcode=%2BZ199052304&startRecord=0&maximumRecords=1&operation=searchRetrieve&recordSchema=marcxml](https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.barcode=%2BZ199052304&startRecord=0&maximumRecords=1&operation=searchRetrieve&recordSchema=marcxml))
    * alma.mms_id=990034300920603338 or alma.mms_id=990028618530603338 ([https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.mms_id=990034300920603338%20or%20alma.mms_id=990028618530603338&startRecord=1&maximumRecords=5&operation=searchRetrieve&recordSchema=dc](https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB?version=1.2&query=alma.mms_id=990034300920603338%20or%20alma.mms_id=990028618530603338&startRecord=1&maximumRecords=5&operation=searchRetrieve&recordSchema=dc))
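
The last two queries rely on a CQL boolean operator and on URL encoding. As a sketch (the helper `cql_or` is hypothetical), the OR query can be composed from a list of MMS IDs and then percent-encoded with `urllib.parse.quote`, which turns spaces into `%20` and `+` into `%2B`, reproducing the encoded forms used in the links above. Encoding `+` matters because a literal `+` in a query string is commonly decoded as a space.

```python
from urllib.parse import quote


def cql_or(index, values):
    """Combine index=value clauses with the CQL boolean operator 'or'."""
    return " or ".join(f"{index}={value}" for value in values)


query = cql_or("alma.mms_id", ["990034300920603338", "990028618530603338"])
# keep '=' readable; spaces become %20
encoded_query = quote(query, safe="=")
# a literal '+' would often be decoded as a space, so it must become %2B
encoded_barcode = quote("alma.barcode=+Z199052304", safe="=")

print(query)
print(encoded_query)    # alma.mms_id=990034300920603338%20or%20alma.mms_id=990028618530603338
print(encoded_barcode)  # alma.barcode=%2BZ199052304
```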

%% Cell type:markdown id: tags:

* OAI-PMH
    * OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is used to harvest metadata in bulk from a repository
    * 6 verbs
        * GetRecord: retrieves an individual metadata record
        * Identify: retrieves information about the repository (e.g. name, protocol version)
        * ListIdentifiers: retrieves only record headers
        * ListMetadataFormats: retrieves the metadata formats available from the repository
        * ListRecords: harvests complete metadata records
        * ListSets: retrieves the set structure of the repository
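
Each verb is sent as an HTTP request with a `verb` parameter plus verb-specific arguments. The sketch below (the helper `oai_url` is hypothetical) builds such request URLs and checks the arguments that OAI-PMH 2.0 requires: `GetRecord` needs `identifier` and `metadataPrefix`, while `ListIdentifiers` and `ListRecords` need `metadataPrefix` unless a `resumptionToken` is given. The base URL is the Alma endpoint from the Sickle example; `oai_dc` is the format every OAI-PMH repository must support.

```python
from urllib.parse import urlencode

# Required arguments per OAI-PMH 2.0 verb (a resumptionToken replaces them)
REQUIRED_ARGS = {
    "GetRecord": {"identifier", "metadataPrefix"},
    "Identify": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListMetadataFormats": set(),
    "ListRecords": {"metadataPrefix"},
    "ListSets": set(),
}


def oai_url(base_url, verb, **args):
    """Build an OAI-PMH request URL after checking the verb's required arguments."""
    if verb not in REQUIRED_ARGS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    if "resumptionToken" not in args:
        missing = REQUIRED_ARGS[verb] - set(args)
        if missing:
            raise ValueError(f"{verb} requires {sorted(missing)}")
    return base_url + "?" + urlencode({"verb": verb, **args})


# Alma OAI-PMH endpoint used with Sickle in this notebook
BASE = "https://eu02.alma.exlibrisgroup.com/view/oai/43ACC_ONB/request"
url = oai_url(BASE, "ListRecords", metadataPrefix="oai_dc", set="PAPYRUSDC")
print(url)
```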




%% Cell type:code id: tags:

``` python
from sickle import Sickle

# OAI-PMH endpoint of the ONB Alma instance
sickle = Sickle('https://eu02.alma.exlibrisgroup.com/view/oai/43ACC_ONB/request')
# ListSets returns an iterator over the sets offered by the repository
oai_sets = sickle.ListSets()
for oai_set in oai_sets:
    print('setSpec value for selective harvesting: ' + oai_set.setSpec)
    print('Name of the set (setName): ' + oai_set.setName + '\n')
```

%% Output

    setSpec value for selective harvesting: PAPYRUSDC
    Name of the set (setName): Papyri records in DC simple
    
    setSpec value for selective harvesting: FULLMARC
    Name of the set (setName): Complete set of ONB records in MARC
    
    setSpec value for selective harvesting: HANNAMARC
    Name of the set (setName): HANNA records in MARC
    
    setSpec value for selective harvesting: ESPERANTOMARC
    Name of the set (setName): Esperanto records in MARC
    
    setSpec value for selective harvesting: ESPERANTODC
    Name of the set (setName): Esperanto Records in DC simple
    
    setSpec value for selective harvesting: PAPYRUSMARC
    Name of the set (setName): Papyri records in MARC
    
    setSpec value for selective harvesting: HANNADC
    Name of the set (setName): HANNA records in DC simple
    
    setSpec value for selective harvesting: ABODC
    Name of the set (setName): Austrian Books Online in DC simple
    
    setSpec value for selective harvesting: ARIADNEDC
    Name of the set (setName): Ariadne records in DC simple
    
    setSpec value for selective harvesting: ARIADNEMARC
    Name of the set (setName): Ariadne records in MARC
    
    setSpec value for selective harvesting: MAPMARC
    Name of the set (setName): Maps and Globes records in MARC
    
    setSpec value for selective harvesting: FULLDC
    Name of the set (setName): Complete set of ONB records in DC simple
    
    setSpec value for selective harvesting: MAPDC
    Name of the set (setName): Maps and Globes records in DC simple
    
    setSpec value for selective harvesting: ABOMARC
    Name of the set (setName): Austrian Books Online in MARC
    
    setSpec value for selective harvesting: OAIBIBLIOA
    Name of the set (setName): Austrian Bibliography A
    
    setSpec value for selective harvesting: MUSHANDC
    Name of the set (setName): Musikhandschriften in DC
    
    setSpec value for selective harvesting: MUSHANMARC
    Name of the set (setName): Music Manuscripts
    
    setSpec value for selective harvesting: CERLMARC
    Name of the set (setName): Old prints and manuscripts for CERL portal
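
%% Cell type:markdown id: tags:

For selective harvesting, a `setSpec` is combined with a `metadataPrefix`. Judging only from the set names listed above, DC sets end in `DC` and MARC sets in `MARC`; the sketch below turns that naming pattern into a guess. Both the pattern and the MARC prefix `marc21` are assumptions to be confirmed with `ListMetadataFormats`. The hypothetical `harvest_dc_titles` calls Sickle's `ListRecords(metadataPrefix=..., set=..., ignore_deleted=True)`, which follows resumption tokens while iterating, and stops after `limit` records.

```python
from itertools import islice


def guess_metadata_prefix(set_spec):
    """Guess a metadataPrefix from the ONB set naming pattern (assumption).

    Sets ending in 'DC' are assumed to be served as 'oai_dc' and sets
    ending in 'MARC' as 'marc21'; confirm with ListMetadataFormats.
    """
    if set_spec.endswith("MARC"):
        return "marc21"
    if set_spec.endswith("DC"):
        return "oai_dc"
    return None  # e.g. OAIBIBLIOA follows neither pattern


def harvest_dc_titles(sickle, set_spec, limit=5):
    """Return up to `limit` titles from a DC set (sends network requests)."""
    if guess_metadata_prefix(set_spec) != "oai_dc":
        raise ValueError(f"{set_spec} does not look like a DC set")
    records = sickle.ListRecords(metadataPrefix="oai_dc", set=set_spec,
                                 ignore_deleted=True)
    # record.metadata maps DC element names to lists of values
    return [record.metadata.get("title", [""])[0]
            for record in islice(records, limit)]
```

For example, `harvest_dc_titles(sickle, 'PAPYRUSDC')` with the `sickle` object created above would fetch up to five titles from the papyri set; this sends requests to the live endpoint.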
    

2.1 SRU.ipynb

0 → 100644
+42 −0
%% Cell type:code id: tags:

``` python
from lxml import etree
import requests
import pandas as pd
```

%% Cell type:code id: tags:

``` python
def getDCDataMMS(mms_id):
    """Fetch a record by MMS ID via SRU (Dublin Core schema); return [title, contributor, date]."""
    params = {
        'version': '1.2',
        'operation': 'searchRetrieve',
        'query': 'alma.mms_id=' + mms_id,
        'startRecord': 0,
        'maximumRecords': 1,
        'recordSchema': 'dc',
    }
    # requests URL-encodes the parameters (including '=' and '+' inside the CQL query)
    response = requests.get('https://obv-at-oenb.alma.exlibrisgroup.com/view/sru/43ACC_ONB', params=params)
    response.raise_for_status()  # fail on HTTP errors instead of parsing an error page
    e = etree.XML(response.content)
    namespaces = {
        'srw': 'http://www.loc.gov/zing/srw/',
        'srw_dc': 'info:srw/schema/1/dc-schema',
        'dc': 'http://purl.org/dc/elements/1.1/'
    }
    xpath = '/srw:searchRetrieveResponse/srw:records/srw:record/srw:recordData/srw_dc:dc/dc:{}/text()'
    # Repeated elements are joined with '; '; a missing element yields ''
    return ["; ".join(e.xpath(xpath.format(field), namespaces=namespaces))
            for field in ('title', 'contributor', 'date')]
```

%% Cell type:code id: tags:

``` python
getDCDataMMS('990048102650603338')
```

%% Output

    ['Sammlung der besten Reisebeschreibungen', '', '1784']
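
%% Cell type:markdown id: tags:

The XPath logic of `getDCDataMMS` can be exercised without network access. The sketch below applies the same namespace paths with the standard library's `xml.etree.ElementTree` to a hand-written, simplified SRU 1.2 response (not real server output) whose content mirrors the result above; `parse_dc_fields` and `SAMPLE_RESPONSE` are hypothetical names.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "srw": "http://www.loc.gov/zing/srw/",
    "srw_dc": "info:srw/schema/1/dc-schema",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written, simplified SRU 1.2 response for offline testing
SAMPLE_RESPONSE = """<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.2</version>
  <numberOfRecords>1</numberOfRecords>
  <records><record>
    <recordData>
      <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Sammlung der besten Reisebeschreibungen</dc:title>
        <dc:date>1784</dc:date>
      </srw_dc:dc>
    </recordData>
  </record></records>
</searchRetrieveResponse>"""


def parse_dc_fields(xml_text, fields=("title", "contributor", "date")):
    """Mirror getDCDataMMS: join repeated DC elements with '; ' ('' if absent)."""
    root = ET.fromstring(xml_text)
    dc = root.find("srw:records/srw:record/srw:recordData/srw_dc:dc", NAMESPACES)
    if dc is None:  # no record in the response
        return ["" for _ in fields]
    return ["; ".join(el.text or "" for el in dc.findall(f"dc:{field}", NAMESPACES))
            for field in fields]


print(parse_dc_fields(SAMPLE_RESPONSE))
```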

2.3 SPARQL.ipynb

0 → 100644
+144 −0

File added.

Preview size limit exceeded, changes collapsed.

+2 −0
@@ -3,3 +3,5 @@ sickle
lxml
requests
pyswagger
pandas
SPARQLWrapper
 No newline at end of file