The project “Bibliotheca Eugeniana Digital” (BED) is a cooperation project between the Austrian National Library and the University for Continuing Education Krems, funded by the Austrian Academy of Sciences under the “go!digital 3.0” program. The project runs for two years, from November 2022 to November 2024.
The aim of BED is the digital reconstruction and visual representation of Prince Eugene’s book collection (listed in the UNESCO “Memory of Austria” register), one of the most famous collections of the Baroque period. Since 1738, the collection has been part of the Habsburg Court Library, today the Austrian National Library (ONB). To this day, the exact composition, extent, and locations of the printed books in the ONB’s collections have not been analyzed, as the task has been considered too vast and complex for traditional methods. Digitized sources, combined with digital methods, now make it possible to open up large cultural collections such as the “Bibliotheca Eugeniana”.
The project applies tools and methods from Digital Humanities and Data Science to digitally reconstruct and visually explore the library in a systematic way, examining its composition and history through a variety of sources.
Most of the books of the Bibliotheca Eugeniana were digitized as part of the project “Austrian Books Online” (ABO). The majority of its bound volumes are uniformly decorated on the front and back covers with Prince Eugene’s coat of arms (see Figure 1). These bindings, here referred to as “supralibros bindings”, will be analyzed in the project using Machine Learning (ML) to detect visual features. In addition, the historical handwritten catalog of the Bibliotheca Eugeniana as well as archival material on the transformation of this library in the 19th century will be digitally processed using ML for handwritten text recognition (HTR) and published in the Austrian National Library’s digital editions infrastructure.
All of these data will be merged with the metadata from the Austrian National Library’s public catalog. Titles from the digital edition of the catalog and full texts from ABO will also be classified into subject groups using ML and natural language processing (NLP) algorithms. This classification will provide new insights into the internal structure of the library and its relationship to the color system of the supralibros bindings.
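One simple way to examine how subject groups relate to binding colors is a cross-tabulation of the two fields. The sketch below assumes a hypothetical record layout (`subject`, `binding_color`) and uses toy values for illustration only; it is not project data and not the project's actual analysis code.

```python
from collections import Counter, defaultdict


def crosstab(records, row_key, col_key):
    """Count co-occurrences of two categorical fields across records.

    Returns a dict mapping each row value to a Counter of column values.
    """
    table = defaultdict(Counter)
    for rec in records:
        table[rec[row_key]][rec[col_key]] += 1
    return dict(table)


def dominant_color_share(table):
    """For each subject group, return its most common color and that color's share."""
    shares = {}
    for subject, colors in table.items():
        color, count = colors.most_common(1)[0]
        shares[subject] = (color, count / sum(colors.values()))
    return shares


# Toy records (hypothetical field names and values, not project data).
records = [
    {"subject": "Theology", "binding_color": "blue"},
    {"subject": "Theology", "binding_color": "blue"},
    {"subject": "Theology", "binding_color": "red"},
    {"subject": "History", "binding_color": "red"},
    {"subject": "History", "binding_color": "red"},
]

table = crosstab(records, "subject", "binding_color")
shares = dominant_color_share(table)
print(shares)
```

A low dominant-color share for a subject group would flag volumes whose binding color does not fit the group's usual pattern, which could then be checked by close reading.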
The University for Continuing Education Krems (UWK) will develop a set of coordinated visualizations based on the multilayered historical collection data, enabling analysis and research into the structure, transformation, and localization of the Bibliotheca Eugeniana collection. For public communication of the project outcomes, complementary narrative visualizations will be created. BED will publish the results in a variety of formats tailored to both experts and a general audience.
All data generated within the project will be made available via the ONB Labs and shared with European research infrastructures in line with the FAIR principles. As a collaboration between a cultural heritage institution and a research institution, BED contributes to the strategy “DH Austria 2021” by promoting knowledge transfer between the two sectors.
To obtain as much information as possible for the reconstruction of the book collection, various approaches are combined:
In this step, ML-based image classification models are used to identify provenance markers. In a pilot study at the ONB, this method was successfully applied using CNN models for the binary classification of Bibliotheca Eugeniana supralibros in the ABO corpus. In the BED project, the approach will be revised and expanded by comparing different types of CNN architectures and configurations (e.g., varying network depths). A two-step model will be pursued. In the first step, the provenance marker (the supralibros) is detected and returned as a cropped image.
In the second step, the cropped image is processed by a binary classifier; because only the cropped region is passed on, optical detail that would otherwise be lost when the whole cover is scaled down is preserved. Particular emphasis will be placed on building the training corpus appropriately, both in terms of size and quality, so that the different types of supralibros can be structurally distinguished. This will make it possible to develop a multi-class classification model. For true positive attributions, the corresponding descriptions will be integrated into the ONB’s publicly accessible catalog.
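The two-step idea can be sketched as: detect the supralibros, crop it, and resize only the crop to the classifier's input size, so the marker fills the input instead of shrinking with the whole cover. This is a minimal, dependency-free sketch under stated assumptions: images are plain grayscale pixel lists, and `detect` and `classify` are hypothetical stand-ins for the trained detector and CNN classifier, not the project's models.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]          # grayscale pixels, row-major
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the detected region out of the full cover image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def resize_nearest(image: Image, size: int) -> Image:
    """Nearest-neighbour resize to size x size (stand-in for a real resize op)."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def classify_binding(
    image: Image,
    detect: Callable[[Image], Optional[Box]],   # hypothetical detector (step 1)
    classify: Callable[[Image], float],         # hypothetical classifier (step 2)
    input_size: int = 224,
    threshold: float = 0.5,
) -> Tuple[bool, Optional[float]]:
    # Step 1: locate the supralibros; no box means no marker was found.
    box = detect(image)
    if box is None:
        return False, None
    # Step 2: resize only the crop, so the marker occupies the full input.
    patch = resize_nearest(crop(image, box), input_size)
    score = classify(patch)
    return score >= threshold, score


# Toy example: a 10x10 "cover" with a bright 4x4 "marker".
cover = [[255 if 2 <= r < 6 and 2 <= c < 6 else 0 for c in range(10)]
         for r in range(10)]
fake_detect = lambda img: (2, 2, 6, 6)
mean_brightness = lambda img: sum(map(sum, img)) / (255 * len(img) * len(img[0]))
result = classify_binding(cover, fake_detect, mean_brightness, input_size=8)
```

In the toy example the crop is entirely marker, so the stand-in classifier sees only marker pixels; resizing the whole 10x10 cover instead would dilute them with background.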
Using an HTR model in Transkribus, the information from the five-volume handwritten historical collection catalog digitized at the ONB will be extracted. For the (semi-)automatic tagging of authors and publication places, NLP methods (e.g., named entity recognition) will then be applied. The entries will be mapped to TEI-XML elements on the basis of a schema already developed by the ONB for the digital edition of another historical library catalog. The XML files and page images will be published as a digital edition in the sustainable infrastructure of the ONB for digital editions (edition.onb.ac.at). This digital edition will contain indices with bibliographic information on all titles, persons, and publication places.
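The mapping from a recognized and NER-tagged catalog entry to TEI-XML could look roughly like the sketch below. It uses standard TEI elements (`bibl`, `author`, `title`, `pubPlace`, `date`) but is an assumption for illustration: the ONB's actual schema is not reproduced here, and the input dict with keys `author`, `title`, `place`, and `year` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def entry_to_tei(entry: dict) -> ET.Element:
    """Map one catalog entry (hypothetical dict) to a TEI <bibl> element.

    Keys that are missing or empty are skipped rather than emitted empty.
    """
    bibl = ET.Element(tei("bibl"))
    if entry.get("author"):
        ET.SubElement(bibl, tei("author")).text = entry["author"]
    if entry.get("title"):
        ET.SubElement(bibl, tei("title")).text = entry["title"]
    if entry.get("place"):
        ET.SubElement(bibl, tei("pubPlace")).text = entry["place"]
    if entry.get("year"):
        # Machine-readable year in @when, transcribed value as text.
        date = ET.SubElement(bibl, tei("date"), when=str(entry["year"]))
        date.text = str(entry["year"])
    return bibl


# Illustrative entry (not taken from the catalog).
entry = {"author": "Example Author", "title": "Example title",
         "place": "Paris", "year": 1700}
bibl = entry_to_tei(entry)
print(ET.tostring(bibl, encoding="unicode"))
```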
To identify those books that are still in the ONB today, the search API of the digital catalog will be used. With the aid of fuzzy string matching, the titles, places, and years of publication in the historical catalog will be compared with those in the modern catalog. Furthermore, titles and available full texts will be classified into subject categories with the Annif toolkit for automated subject indexing, to gain deeper insights into the classification and subject areas of the library. The results of the subject classification will be mapped to the subject areas in the modern library catalog and later integrated into the digital edition of the historical catalog to create an additional subject index. The descriptions of the identified volumes with supralibros bindings will be generated automatically from the results of the image classification and supplemented manually where necessary. The digital edition will be enriched with descriptions of the identified objects, links to named entities, and references to the open-access catalog of the ONB. This approach has already been tested.
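Fuzzy matching of a historical entry against candidate records can be sketched with the standard library's `difflib.SequenceMatcher`. This is an illustrative sketch, not the project's matcher: the record fields, the weights (0.6 title, 0.25 place, 0.15 year), and the 0.8 threshold are assumptions, and the candidate list stands in for results from the catalog search API, whose response format is not shown here.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 0 if either side is missing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_score(hist: dict, modern: dict, weights=(0.6, 0.25, 0.15)) -> float:
    """Weighted combination of title, place, and exact-year agreement."""
    w_title, w_place, w_year = weights
    same_year = hist.get("year") is not None and hist.get("year") == modern.get("year")
    return (w_title * similarity(hist.get("title", ""), modern.get("title", ""))
            + w_place * similarity(hist.get("place", ""), modern.get("place", ""))
            + w_year * (1.0 if same_year else 0.0))


def best_match(hist: dict, candidates: list, threshold: float = 0.8):
    """Return (candidate, score) for the best candidate above threshold, else None."""
    if not candidates:
        return None
    score, cand = max(((match_score(hist, c), c) for c in candidates),
                      key=lambda sc: sc[0])
    return (cand, score) if score >= threshold else None


# Illustrative records (hypothetical, not project data).
hist = {"title": "Histoire de l'Academie royale des sciences", "place": "Paris", "year": 1702}
candidates = [
    {"title": "Histoire de l'Académie Royale des Sciences", "place": "Paris", "year": 1702},
    {"title": "Opera omnia", "place": "Venezia", "year": 1650},
]
print(best_match(hist, candidates))
```

Normalizing before comparison means differences in accents, capitalization, and punctuation between historical and modern cataloging practice do not lower the score.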
The metadata of the bibliographic entries will be published as a Linked Open Data (LOD) set that follows the DINI schema for RDF representations of bibliographic resources and is aligned with the DARIAH collection description schema.
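To show what such an RDF record might look like, the sketch below serializes one entry as N-Triples. It is an assumption-laden illustration: it uses only generic Dublin Core terms and the BIBO `Book` class, not the full DINI application profile or the DARIAH schema, and all `example.org` URIs and record fields are hypothetical.

```python
from typing import List, Optional

DCT = "http://purl.org/dc/terms/"
BIBO = "http://purl.org/ontology/bibo/"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def literal(value: str, datatype: Optional[str] = None) -> str:
    """Escape a string as an N-Triples literal, optionally typed."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    lit = f'"{escaped}"'
    return f"{lit}^^<{datatype}>" if datatype else lit


def record_to_ntriples(record: dict) -> List[str]:
    """Serialize one bibliographic record (hypothetical dict) as N-Triples lines."""
    s = iri(record["uri"])
    triples = [
        (s, iri(RDF_TYPE), iri(BIBO + "Book")),
        (s, iri(DCT + "title"), literal(record["title"])),
    ]
    if "year" in record:
        triples.append((s, iri(DCT + "issued"), literal(str(record["year"]), XSD + "gYear")))
    for creator in record.get("creators", []):
        # Creators as IRIs, e.g. links to an authority file (hypothetical here).
        triples.append((s, iri(DCT + "creator"), iri(creator)))
    return [f"{subj} {pred} {obj} ." for subj, pred, obj in triples]


record = {
    "uri": "https://example.org/bed/record/1",        # hypothetical
    "title": 'A "quoted" title',
    "year": 1702,
    "creators": ["https://example.org/person/1"],     # hypothetical
}
lines = record_to_ntriples(record)
print("\n".join(lines))
```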
Data visualizations are intended to enable the exploration, representation, and public communication of the collection. They will allow the Eugeniana collection, its metadata, and their quality assessment to be represented from different analytical perspectives. In this way, they are meant to visually support distant reading and exploration of the collection, making it easier to analyze questions of composition and provenance, and to identify relevant patterns and information for further analyses and close reading.
The visualizations are developed following a user-centered, iterative data-user-task approach: in sessions with target users, the most relevant options for visual analysis and exploration are defined collaboratively, and the available data are examined. This analysis will serve as the basis for defining user requirements for the subsequent design and implementation of relevant visual perspectives and possible interactions. To ensure that the visualizations sufficiently support the intended tasks, interaction with the novel visualizations will be observed in a small user study, and the design will be refined based on the evaluation results.
In addition, a visual storytelling approach will be applied to present the history and provenance of the Bibliotheca Eugeniana in an engaging way to the public. The storyboard will be enriched with (interactive) visualizations and implemented by the UWK team in the form of a web-based story. The interface will be tested with target users from the general public and adjusted on the basis of these evaluation results. The visualized story of the provenance of the Bibliotheca Eugeniana will be an important outcome for supporting the public communication of the project’s results.
A prototype already allows a first exploration of Prince Eugene’s historical holdings through a visualization of the State Hall.
The digital edition of the handwritten catalog of the “Bibliotheca Eugeniana” is available via the platform for digital editions of the Austrian National Library.
The code and the data of the project can be viewed via the project’s open GitLab repository.
For questions or suggestions, please contact: bed-project@onb.ac.at.
On request, information on the data generated in the project can be viewed in the data management plan.