From 31950397dba5fa9795dc55c86267c6f90409d52d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Gabriele=20H=C3=B6fler?= Date: Tue, 19 Nov 2019 15:50:24 +0100 Subject: [PATCH] Query records by MMS-ID Not all records exported from Alma will have an AC-number in 009 plus this makes the Notebook useable for a wider audience. Using MMS-ID instead of AC-number as this is the system's default uid. --- Extract_Bibliographic_Info_From_Alma.ipynb | 99 +++++++++++++--------- 1 file changed, 60 insertions(+), 39 deletions(-) diff --git a/Extract_Bibliographic_Info_From_Alma.ipynb b/Extract_Bibliographic_Info_From_Alma.ipynb index c2dd740..02b0cb7 100644 --- a/Extract_Bibliographic_Info_From_Alma.ipynb +++ b/Extract_Bibliographic_Info_From_Alma.ipynb @@ -5,10 +5,15 @@ "metadata": {}, "source": [ "# Extract Bibliographic Data by Unique ID\n", + "## Introduction\n", "\n", - "This notebook assumes that the user has a list of unique IDs for bibliographic records within the library software system [Alma](https://knowledge.exlibrisgroup.com/Alma/Product_Documentation/010Alma_Online_Help_(English)/010Getting_Started/010Alma_Introduction/010Alma_Overview).\n", + "This notebook assumes that the user has a list of unique IDs for bibliographic records stored in the library software system [Alma](https://knowledge.exlibrisgroup.com/Alma/Product_Documentation/010Alma_Online_Help_(English)/010Getting_Started/010Alma_Introduction/010Alma_Overview).\n", "\n", - "In the following code the unique IDs are AC-numbers, which are a special identifier within the [Austrian Library Network](https://www.obvsg.at/). You could also provide other unique IDs like MMS-IDs or barcodes. In that case find and replace the function *by_marc_009()* with one of the other two functions provided by the catalogue submodule: *by_barcode()* or *by_mms_id()*.\n", + "These records are then filtered by categories contained in MARC-XML. 
Documentation on the MARC-XML format is available on the website of the [Library of Congress](https://www.loc.gov/marc/bibliographic/); for Austrian cataloging standards specifically, refer to the second and third columns of the (German-only) [Konkordanz](https://wiki.obvsg.at/Katalogisierungshandbuch/KonKordanz).\n", + "\n", + "In the following code the unique IDs are MMS-IDs, the default unique identifier for records in Alma. You could also provide other unique IDs, such as barcodes or any ID from MARC 009 (e.g. AC-numbers in the Austrian Library Network). If you need to use another unique ID, do the following:\n", + "* find and replace the function *by_mms_id()* with one of the other two functions provided by the catalogue submodule: *by_barcode()* or *by_marc_009()*\n", + "* replace the *regex_pattern*\n", "\n", "In this example the catalogue of the Austrian National Library is the source. We use SRU to fetch the data and python's pandas module to export the data to Excel." ] @@ -66,12 +71,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "RegEx-pattern for AC-numbers:" + "*ac_pattern* is needed to resolve hierarchies between records within the Austrian Library Network (OBV). Within OBV the hierarchies are linked using MARC categories 773 and 830, identifying the parent by its AC-number.\n", + "\n", + "If you want to query Alma instances outside OBV, contact your local consortium or institution to find out how hierarchies are linked in the MARC record. You will need to change the function *find_parent_id_in_child_xml()* accordingly." 
] }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 3, "metadata": {}, "outputs": [], "source": [ @@ -98,7 +105,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ @@ -115,7 +122,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 5, "metadata": {}, "outputs": [ { @@ -184,7 +191,7 @@ "33 Signatur aus Subfield $$d, danach ohne Trennze... " ] }, - "execution_count": 4, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } @@ -209,7 +216,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "metadata": {}, "outputs": [], "source": [ @@ -227,7 +234,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": {}, "outputs": [ { @@ -285,20 +292,20 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "With a given list of AC-numbers, create an Excel-file of the bibliographic data for the records." + "With a given list of unique identifiers, create an Excel-file of the bibliographic data for the records." ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 9, "metadata": {}, "outputs": [], "source": [ - "def ac_list_to_excel(ac_list, excel_file_name_stem):\n", - " data = [get_bibliographic_for_ac(ac) for ac in ac_list]\n", + "def uid_list_to_excel(uid_list, excel_file_name_stem):\n", + " data = [get_bibliographic_for_uid(uid) for uid in uid_list]\n", " df = pd.DataFrame(data)\n", " df_post = post(df)\n", - " df_post.to_excel(f'Output/{excel_file_name_stem} {now()}.xlsx')" + " df_post.to_excel(f'Output/{excel_file_name_stem}_{now()}.xlsx', index=False)" ] }, { @@ -310,7 +317,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 10, "metadata": {}, "outputs": [], "source": [ @@ -324,17 +331,24 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Extract AC-numbers from a given Excel-file." 
+ "Extract unique IDs from a given Excel file. In this example we use a search export from Alma, in which the MMS-ID is listed in the column 'MMS ID'.\n", + "\n", + "For any other Excel file, use the header of the column that holds the IDs. This means the first row of your Excel file must not contain data; it must contain a distinct header for the data listed in the column below it." ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 11, "metadata": {}, "outputs": [], "source": [ - "def load_ac_list(file_name):\n", - " return pd.read_excel(file_name)['Datensatznummer'].apply(lambda s: ac_pattern.findall(s)[0])" + "def load_uid_list(file_name):\n", + " try:\n", + " record_numbers = pd.read_excel(file_name)['MMS ID']\n", + " except Exception as e:\n", + " print(f'Exception encountered while reading Excel-file: {str(e)}', file=sys.stderr)\n", + " else:\n", + " return record_numbers" ] }, { @@ -353,26 +367,26 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 12, "metadata": {}, "outputs": [], "source": [ - "def get_bibliographic_for_ac(ac):\n", + "def get_bibliographic_for_uid(uid):\n", " try:\n", - " marc_xml = alma.by_marc_009(ac)\n", - " parent_acnum = find_parent_id_in_child_xml(marc_xml)\n", - " if parent_acnum:\n", - " parent_xml = fetch_parent_xml(parent_acnum)\n", + " marc_xml = alma.by_mms_id(uid)\n", + " parent_uid = find_parent_id_in_child_xml(marc_xml)\n", + " if parent_uid:\n", + " parent_xml = fetch_parent_xml(parent_uid)\n", " parent_title, parent_categories, parent_contents = inherit_from_parent(parent_xml)\n", " except almasru.NoRecord:\n", - " print(f'No record for AC number \"{ac}\" found.', file=sys.stderr)\n", + " print(f'No record for unique ID \"{uid}\" found.', file=sys.stderr)\n", " d = OrderedDict()\n", " for column, _ in column_extractors.items():\n", " d[column] = None\n", - " d[\"Systemnummer\"] = ac\n", + " d[\"Systemnummer\"] = uid\n", " return d\n", " except Exception as e:\n", - " print(f'Exception encountered: 
{str(e)}', file=sys.stderr)\n", + " print(f'Exception encountered while fetching bibliographic data: {str(e)}', file=sys.stderr)\n", " else:\n", " d = OrderedDict()\n", " for column, extractor in column_extractors.items():\n", @@ -388,12 +402,12 @@ "source": [ "### Fetch Data of Parent Record\n", "\n", - "Parents can be referenced by unique ID either in MARC 773 \\$\\$w or 830 \\$\\$w. In our case references can only be resolved if they are AC-numbers." + "Parents can be referenced by unique ID either in MARC 773 Subfield w or 830 Subfield w. In our case references can only be resolved if they are AC-numbers." ] }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 13, "metadata": {}, "outputs": [], "source": [ @@ -404,10 +418,10 @@ " for subfield in datafield:\n", " if subfield.attrib.items() >= {\"code\": \"w\"}.items():\n", " try:\n", - " parent_acnum = ac_pattern.findall(subfield.text)[0]\n", + " parent_uid = ac_pattern.findall(subfield.text)[0]\n", " except Exception as e:\n", " print(f\"ERROR: Couldn't find AC-Num in 773 or 830 of the child. {e}\", file=sys.stderr)\n", - " return parent_acnum" + " return parent_uid" ] }, { @@ -419,15 +433,15 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 14, "metadata": {}, "outputs": [], "source": [ - "def fetch_parent_xml(parent_acnum):\n", + "def fetch_parent_xml(parent_uid):\n", " try:\n", - " parent_xml = alma.by_marc_009(parent_acnum)\n", + " parent_xml = alma.by_marc_009(parent_uid)\n", " except Exception as e:\n", - " print(f\"ERROR: Fetching XML of parent {parent_acnum} caused an error. {e}\", file=sys.stderr)\n", + " print(f\"ERROR: Fetching XML of parent {parent_uid} caused an error. 
{e}\", file=sys.stderr)\n", " return parent_xml" ] }, @@ -440,7 +454,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 15, "metadata": {}, "outputs": [], "source": [ @@ -477,7 +491,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 16, "metadata": {}, "outputs": [], "source": [ @@ -526,8 +540,15 @@ "metadata": {}, "outputs": [], "source": [ - "ac_list_to_excel(ac_list, file_name_stem)" + "uid_list_to_excel(uid_list, file_name_stem)" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { -- GitLab
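Note on the parent lookup: the notebook text in this patch says that parents are referenced in MARC 773 or 830, subfield w, and that within OBV only AC-numbers can be resolved. The following is a minimal standalone sketch of that lookup, not the notebook's own `find_parent_id_in_child_xml()`. It rests on several assumptions: that the record is MARC-XML in the `http://www.loc.gov/MARC21/slim` namespace, that an AC-number is "AC" followed by eight digits, and that the sample record (including its `(AT-OBV)` prefix) is made-up illustration data. The names `AC_PATTERN` and `find_parent_ac_number` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: an AC-number is "AC" followed by eight digits.
AC_PATTERN = re.compile(r"AC\d{8}")
# Assumption: the record uses the MARC21 slim XML namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up sample record for illustration only.
SAMPLE_CHILD = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="0" ind2="0">
    <subfield code="a">Child title</subfield>
  </datafield>
  <datafield tag="773" ind1="0" ind2="8">
    <subfield code="w">(AT-OBV)AC12345678</subfield>
  </datafield>
</record>
"""


def find_parent_ac_number(record_xml):
    """Return the first AC-number found in 773 $w or 830 $w, else None."""
    root = ET.fromstring(record_xml)
    for tag in ("773", "830"):
        path = f".//marc:datafield[@tag='{tag}']/marc:subfield[@code='w']"
        for subfield in root.findall(path, MARC_NS):
            # subfield.text is None for an empty element.
            match = AC_PATTERN.search(subfield.text or "")
            if match:
                return match.group(0)
    # No parent reference found: return None explicitly so callers
    # can test the result with a plain "if parent:".
    return None


if __name__ == "__main__":
    print(find_parent_ac_number(SAMPLE_CHILD))
```

Returning `None` explicitly when no reference is present keeps the caller's `if parent_uid:` check meaningful for records without a parent. For an Alma instance outside OBV, the pattern and the tags would need to be adapted to however that consortium links hierarchies.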