ViewerlinksFromBibframe.ipynb 4.66 KiB
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from lxml import etree\n",
    "import requests\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Linked Data from ALMA (library management system) can be retrieved in the following formats:\n",
    "\n",
    "* BIBFRAME via `https://open-na.hosted.exlibrisgroup.com/alma/<institution code>/bf/entity/instance/<mms id>`\n",
    "* JSON-LD via `https://open-na.hosted.exlibrisgroup.com/alma/<institution code>/bibs/<mms_id>.jsonld`\n",
    "* RDA/RDF via `https://open-na.hosted.exlibrisgroup.com/alma/<institution code>/rda/entity/manifestation/<mms id>.rdf`\n",
    "\n",
    "For a Network Zone MMS ID the institution code is `43ACC_NETWORK`; for an Institution MMS ID it is `43ACC_ONB`.\n",
    "\n",
    "The XPath `/rdf:RDF/bf:Instance/bf:hasItem/bf:Item/bf:electronicLocator/rdfs:Resource/bflc:locator/@rdf:resource` selects the Viewer URLs. We split each URL at the + sign (URL-encoded as %2B) to extract the barcode."
   ]
  },
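  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three URL patterns above can be assembled programmatically. The following is a minimal sketch; `build_url`, `BASE` and `TEMPLATES` are hypothetical names introduced for illustration and are not part of any ALMA client library.\n",
    "\n",
    "```python\n",
    "# Base URL and path templates copied from the list above.\n",
    "BASE = 'https://open-na.hosted.exlibrisgroup.com/alma'\n",
    "TEMPLATES = {\n",
    "    'bibframe': '{base}/{inst}/bf/entity/instance/{mms_id}',\n",
    "    'jsonld': '{base}/{inst}/bibs/{mms_id}.jsonld',\n",
    "    'rda': '{base}/{inst}/rda/entity/manifestation/{mms_id}.rdf',\n",
    "}\n",
    "\n",
    "def build_url(mms_id, fmt='bibframe', inst='43ACC_ONB'):\n",
    "    # Hypothetical helper: fill the chosen template with the\n",
    "    # institution code and the MMS ID.\n",
    "    return TEMPLATES[fmt].format(base=BASE, inst=inst, mms_id=mms_id)\n",
    "\n",
    "print(build_url('990032334150603338'))\n",
    "print(build_url('990032334150603338', fmt='jsonld'))\n",
    "```\n",
    "\n",
    "Passing `inst='43ACC_NETWORK'` would address a Network Zone MMS ID instead."
   ]
  },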
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def getLinksAndBarcodes(local_mms_id):\n",
    "    # Fetch the BIBFRAME (RDF/XML) record for an Institution MMS ID\n",
    "    url = 'https://open-na.hosted.exlibrisgroup.com/alma/43ACC_ONB/bf/entity/instance/' + local_mms_id\n",
    "    response = requests.get(url, timeout=30)\n",
    "    response.raise_for_status()\n",
    "    e = etree.XML(response.content)\n",
    "    namespaces = {\n",
    "        'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',\n",
    "        'bf': 'http://id.loc.gov/ontologies/bibframe/',\n",
    "        'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',\n",
    "        'bflc': 'http://id.loc.gov/ontologies/bflc/'\n",
    "    }\n",
    "    # Select the Viewer URLs\n",
    "    result = e.xpath('/rdf:RDF/bf:Instance/bf:hasItem/bf:Item/bf:electronicLocator/rdfs:Resource/bflc:locator/@rdf:resource', namespaces=namespaces)\n",
    "    barcodes = []\n",
    "    for link in result:\n",
    "        # The barcode follows the URL-encoded plus sign (%2B)\n",
    "        splits = link.split('%2B')\n",
    "        if len(splits) >= 2:\n",
    "            barcodes.append('+' + splits[1])\n",
    "    print(local_mms_id + ': ' + \", \".join(barcodes))\n",
    "    linksJoined = \", \".join(result)\n",
    "    barcodesJoined = \", \".join(barcodes)\n",
    "    # Return a two-element list: the joined URLs and the joined barcodes\n",
    "    return [linksJoined, barcodesJoined]"
   ]
  },
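  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The barcode extraction can be tried offline, without calling the ALMA endpoint. This is only a sketch: `barcode_from_link` is a hypothetical helper, and the example URL is invented for illustration, so the real Viewer URL layout may differ.\n",
    "\n",
    "```python\n",
    "def barcode_from_link(link):\n",
    "    # Split at the URL-encoded plus sign and restore '+' as a prefix.\n",
    "    # Returns None when the link contains no '%2B'.\n",
    "    parts = link.split('%2B')\n",
    "    return '+' + parts[1] if len(parts) >= 2 else None\n",
    "\n",
    "# Hypothetical viewer URL, assumed only for this example.\n",
    "example = 'https://viewer.example.org/view?doc=ABO_%2BZ227525900'\n",
    "print(barcode_from_link(example))\n",
    "```\n",
    "\n",
    "Links without `%2B` yield `None` here, whereas `getLinksAndBarcodes` simply skips them."
   ]
  },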
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Export a list from ALMA as an Excel file and read it into a pandas DataFrame (the column MMS-ID contains Institution MMS IDs)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "NaN    990032334150603338\n",
       "NaN    990035648370603338\n",
       "NaN    990043237990603338\n",
       "Name: MMS-ID, dtype: int64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_excel('ABOExamplesFromALMA.xlsx')\n",
    "df_sample = df.sample(3).copy()\n",
    "df_sample['MMS-ID']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add two columns to the DataFrame, one with the Viewer links and one with the barcodes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "990032334150603338: +Z227525900, +Z172047601\n",
      "990035648370603338: +Z219890307, +Z255756803\n",
      "990043237990603338: +Z172048009, +Z207476305\n"
     ]
    }
   ],
   "source": [
    "df_sample[['Viewerlinks','Barcodes']] = df_sample.apply(lambda row: getLinksAndBarcodes(str(row['MMS-ID'])), axis=1, result_type='expand')"
   ]
  },
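  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above relies on `DataFrame.apply` with `result_type='expand'`, which turns each returned list into separate columns. A minimal sketch on toy data, where `split_pair` is a hypothetical stand-in for `getLinksAndBarcodes`:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "def split_pair(value):\n",
    "    # Hypothetical stand-in: returns two values per row.\n",
    "    return [value.lower(), len(value)]\n",
    "\n",
    "df = pd.DataFrame({'MMS-ID': ['AB', 'CDE']})\n",
    "# Each two-element list is expanded into two new columns.\n",
    "df[['Lower', 'Length']] = df.apply(lambda row: split_pair(row['MMS-ID']), axis=1, result_type='expand')\n",
    "print(df)\n",
    "```\n",
    "\n",
    "Without `result_type='expand'`, `apply` would return a single Series of lists, which cannot be assigned to two columns in this way."
   ]
  },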
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write the extended DataFrame to an Excel file again."
   ]
  },
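  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# engine_kwargs replaces the options keyword, which was removed in pandas 2.0;\n",
    "# strings_to_urls=False keeps the Viewer links as plain text cells\n",
    "writer = pd.ExcelWriter('ABOExamplesFromALMAextended.xlsx', engine='xlsxwriter', engine_kwargs={'options': {'strings_to_urls': False}})\n",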
    "df_sample.to_excel(writer)\n",
    "writer.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}