{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from lxml import etree\n",
    "import requests\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Linked Data from ALMA (the library management system) can be retrieved in the following formats:\n",
    "\n",
    "* BIBFRAME via `https://open-na.hosted.exlibrisgroup.com/alma//bf/entity/instance/`\n",
    "* JSON-LD via `https://open-na.hosted.exlibrisgroup.com/alma//bibs/.jsonld`\n",
    "* RDA/RDF via `https://open-na.hosted.exlibrisgroup.com/alma//rda/entity/manifestation/.rdf`\n",
    "\n",
    "For a Network Zone MMS ID, the institution code is `43ACC_NETWORK`; for an Institution MMS ID, it is `43ACC_ONB`.\n",
    "\n",
    "The XPath expression `/rdf:RDF/bf:Instance/bf:hasItem/bf:Item/bf:electronicLocator/rdfs:Resource/bflc:locator/@rdf:resource` selects the viewer URLs. To extract the barcode, each URL is split at the URL-encoded + sign (`%2B`); the segment after the first `%2B` is the barcode, which we prefix with `+`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def getLinksAndBarcodes(local_mms_id):\n",
    "    # Fetch the BIBFRAME instance for an Institution MMS ID from the ALMA Linked Data service\n",
    "    response = requests.get('https://open-na.hosted.exlibrisgroup.com/alma/43ACC_ONB/bf/entity/instance/' + local_mms_id)\n",
    "    response.raise_for_status()  # fail early instead of trying to parse an error page as XML\n",
    "    e = etree.XML(response.content)\n",
    "    namespaces = {\n",
    "        'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',\n",
    "        'bf': 'http://id.loc.gov/ontologies/bibframe/',\n",
    "        'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',\n",
    "        'bflc': 'http://id.loc.gov/ontologies/bflc/'\n",
    "    }\n",
    "    # Viewer URLs of all items of the instance\n",
    "    result = e.xpath('/rdf:RDF/bf:Instance/bf:hasItem/bf:Item/bf:electronicLocator/rdfs:Resource/bflc:locator/@rdf:resource', namespaces=namespaces)\n",
    "    barcodes = []\n",
    "    for link in result:\n",
    "        # The barcode follows the URL-encoded + sign (%2B)\n",
    "        splits = link.split('%2B')\n",
    "        if len(splits) >= 2:\n",
    "            barcodes.append('+' + splits[1])\n",
    "    print(local_mms_id + ': ' + \", \".join(barcodes))\n",
    "    linksJoined = \", \".join(result)\n",
    "    barcodesJoined = \", \".join(barcodes)\n",
    "    # Return a list with the joined URLs and the joined barcodes\n",
    "    return [linksJoined, barcodesJoined]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Export a list from ALMA as an Excel file and read it into a pandas DataFrame (the column `MMS-ID` contains Institution MMS IDs). Only a random sample of three records is processed here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "NaN 990032334150603338\n",
       "NaN 990035648370603338\n",
       "NaN 990043237990603338\n",
       "Name: MMS-ID, dtype: int64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_excel('ABOExamplesFromALMA.xlsx')\n",
    "df_sample = df.sample(3).copy()\n",
    "df_sample['MMS-ID']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add two columns with the viewer links and the barcodes (`Viewerlinks`, `Barcodes`) to the DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "990032334150603338: +Z227525900, +Z172047601\n",
      "990035648370603338: +Z219890307, +Z255756803\n",
      "990043237990603338: +Z172048009, +Z207476305\n"
     ]
    }
   ],
   "source": [
    "# result_type='expand' turns the two-element list returned for each row into two columns\n",
    "df_sample[['Viewerlinks','Barcodes']] = df_sample.apply(lambda row: getLinksAndBarcodes(str(row['MMS-ID'])), axis=1, result_type='expand')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write the extended DataFrame to a new Excel file. The XlsxWriter option `strings_to_urls` is disabled so that the viewer links are written as plain strings rather than as hyperlinks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Recent pandas versions (>= 1.3) pass XlsxWriter options via engine_kwargs instead of options=\n",
    "with pd.ExcelWriter('ABOExamplesFromALMAextended.xlsx', engine='xlsxwriter',\n",
    "                    engine_kwargs={'options': {'strings_to_urls': False}}) as writer:\n",
    "    df_sample.to_excel(writer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}