Enriching Metadata for Travelogues
The interdisciplinary FWF/DFG-funded project "Travelogues: Perceptions of the Other 1500-1876" analyses historical travelogues from the Austrian National Library's collections by combining historical and computational methods. To study intertextual relations among travelogues and concepts of otherness, we have built a corpus of over 3,000 travelogues with the help of a genre classifier for travelogues. Because bibliographic metadata forms the core of the project, we decided to edit and enrich our metadata in the integrated library system (the catalogue), making it a single openly accessible source. Bring-your-project considerably supported our metadata workflow by setting up an Alma Data Extractor and by running automated matching against the VD17.
The VD17 (Verzeichnis der im deutschen Sprachraum erschienenen Drucke des 17. Jahrhunderts) has been the most extensive database of German Baroque prints since its foundation in 1996. As the collective volume "Schmelze des barocken Eisbergs?" (2010) shows, it provides reference material for bibliographic research based on quantitative methods. More recently, an SRU (Search/Retrieve via URL) interface has been added that allows querying single elements or combinations of them and exporting the data as Dublin Core (DC), MODS or MARCXML. This also makes it possible to match results automatically against queries in other library catalogues. For the Travelogues project, matching was based on selected PICA elements of the VD17 (language and genre) and selected elements of the Austrian National Library's MARCXML. It had two aims: first, to find out how many travelogues listed in the VD17 are not part of the Travelogues corpus; second, to find further travelogues in the library catalogue that have not yet been marked as travelogues. Because the data in the library catalogue are inconsistent and messy, potential new travelogues could not be narrowed down automatically and had to be checked manually. As a result, five additional travelogues were identified. The project's bibliographers now also have a better understanding of how representative the Austrian National Library's collection of 17th-century travelogues is.
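The two matching aims can be pictured with a minimal Python sketch. Only the SRU request parameters (`version`, `operation`, `query`, `startRecord`, `maximumRecords`, `recordSchema`) come from the SRU standard; the endpoint `SRU_BASE`, the record dictionaries with `title`, `year` and `travelogue` keys, and the crude match key (first five normalised title words plus year) are hypothetical assumptions for illustration, not the project's actual matching procedure.

```python
"""Sketch of SRU-based matching between the VD17 and a library catalogue.

Assumptions (hypothetical): SRU_BASE, the record fields and the match key.
"""
import re
import unicodedata
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the VD17's documented SRU base URL.
SRU_BASE = "https://sru.example.org/vd17"


def build_sru_url(cql_query: str, start: int = 1, size: int = 100,
                  schema: str = "marcxml") -> str:
    """Build an SRU 1.2 searchRetrieve request URL for one result page."""
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": size,
        "recordSchema": schema,  # hypothetical schema name
    }
    return f"{SRU_BASE}?{urlencode(params)}"


def match_key(title: str, year: str) -> tuple:
    """Normalise a title and year into a crude comparison key."""
    # Strip diacritics and lower-case the title
    text = unicodedata.normalize("NFKD", title)
    text = "".join(c for c in text if not unicodedata.combining(c)).lower()
    # Keep the first five words and the first four digits of the year
    words = re.findall(r"\w+", text)[:5]
    return (" ".join(words), re.sub(r"\D", "", year)[:4])


def compare_catalogues(vd17_travelogues: list, onb_records: list) -> tuple:
    """Return (VD17 travelogues missing from the corpus, unflagged candidates)."""
    corpus_keys = {match_key(r["title"], r["year"])
                   for r in onb_records if r["travelogue"]}
    vd17_keys = {match_key(r["title"], r["year"]) for r in vd17_travelogues}
    # Aim 1: VD17 travelogues with no counterpart in the travelogue corpus
    not_in_corpus = [r for r in vd17_travelogues
                     if match_key(r["title"], r["year"]) not in corpus_keys]
    # Aim 2: catalogue records matching a VD17 travelogue but not yet flagged
    candidates = [r for r in onb_records
                  if not r["travelogue"]
                  and match_key(r["title"], r["year"]) in vd17_keys]
    return not_in_corpus, candidates
```

Exact-key matching like this fails on variant spellings such as "Reyse" and "Reise", which is one reason candidate records still need to be checked by hand.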
Title page and frontispiece of a 17th-century travelogue (http://data.onb.ac.at/rec/AC05902884)
Users usually access the Austrian National Library's metadata by running a query in Primo/Quicksearch, the graphical user interface of the integrated library system. A query returns a single record or a list of records, but metadata can only be exported one record at a time and through a fixed set of export options (BibTeX, RIS, RefWorks, EndNote, EasyBib, Mail, Print); a list of results cannot be exported at all. Access to catalogue data via the Austrian National Library's SRU interface and OAI-PMH sets, as offered by ONB Labs, is well documented but may be too technically demanding for users who are not familiar with these standards. For most users, and for researchers in particular, spreadsheets are still a common format for getting a quick overview of datasets or larger amounts of structured data. The British Library, for example, offers its data not only in native formats such as RDF/XML, JSON-LD or MARCXML but also in a "Researcher Format" (CSV) as part of its Collection Metadata Strategy.
For the Austrian National Library, an extraction based on a Python script was developed to bridge this gap. Librarians without access to the analytics tool of the Ex Libris integrated library system Alma get only a fixed XLSX output that cannot be adjusted to individual needs; the newly developed extraction, by contrast, can be adapted to individual needs and projects by changing the mapping. For librarians, these customised extractions offer a practical tool for data profiling and quality control of more than one dataset at a time.
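The idea of a mapping-driven extraction can be illustrated with a short Python sketch that uses only the standard library. It reads MARCXML in the MARC 21 slim namespace, pulls out the fields listed in a mapping and writes one CSV row per record. The `MAPPING` dictionary, its column names and the chosen MARC tags are illustrative assumptions, not the configuration of the actual Alma Data Extractor.

```python
"""Sketch: flatten MARCXML records into CSV rows via a configurable mapping.

Assumption: MAPPING is illustrative, not the Alma Data Extractor's mapping.
"""
import csv
import io
import xml.etree.ElementTree as ET
from typing import Optional

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Column name -> (MARC tag, subfield code); None marks a control field.
# Editing this dictionary is all it takes to change the spreadsheet columns.
MAPPING = {
    "id": ("001", None),
    "title": ("245", "a"),
    "place": ("264", "a"),
    "year": ("264", "c"),
    "language": ("041", "a"),
}


def extract(record: ET.Element, tag: str, code: Optional[str]) -> str:
    """Return all values of a field/subfield in one record, joined by ' | '."""
    if code is None:
        nodes = record.findall(f"marc:controlfield[@tag='{tag}']", MARC_NS)
    else:
        nodes = record.findall(
            f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']",
            MARC_NS)
    return " | ".join((n.text or "").strip() for n in nodes)


def marcxml_to_csv(xml_text: str, mapping: dict = MAPPING) -> str:
    """Convert a MARCXML collection into CSV text, one row per record."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(mapping))
    writer.writeheader()
    for record in root.iter(f"{{{MARC_NS['marc']}}}record"):
        writer.writerow({col: extract(record, tag, code)
                         for col, (tag, code) in mapping.items()})
    return out.getvalue()
```

Adding a column, for instance `"publisher": ("264", "b")`, only changes the mapping, not the parsing code. To open the result in a spreadsheet program, the text can be written with `open("records.csv", "w", newline="", encoding="utf-8-sig")`, where the byte-order mark helps Excel recognise UTF-8.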
Mag. Martin Krickl, scientist at Travelogues
Alma Data Extractor
Tool for extracting metadata from the Alma catalogue for further processing in spreadsheet software
Project website with published materials of the Travelogues project