Planned Languages

Info

Data Set created: December 2019

Data Set updated: 25 November 2024

Data Provenance & Quality

The objects relating to this data set have been digitized by the digitization department of the ONB (Österreichische Nationalbibliothek, Austria’s largest library and a cultural and research institution located in Vienna, founded in the 14th century).
The metadata is based on the library catalogue and contains 2,818 records, one for each object held in the collection of the Department of Planned Languages at the ONB. For more information on the data points and their contents, see the Readme linked below.
The data set has not been cleaned, augmented or corrected. It does not represent the entire collection of the Department of Planned Languages, only an excerpt of it (see Historical Background below). When the data set was originally put together in 2018, only objects that had already been both catalogued and digitized were chosen.
The data set comprises 40 languages.

Historical Background

The Department of Planned Languages has existed under this name since 1990. However, in 1927 Hugo Steiner founded the International Esperanto Museum, which has been part of the ONB since 1928. Today, the Esperanto Museum’s library contains more than 150,000 items (e.g. 40,000 flyers, 40,000 printed volumes, 25,000 newspaper articles, 22,000 photographs, 20,000 handwritten texts and manuscripts, 4,100 journals). Since 2005, the Esperanto Museum has been located in Palais Mollard, Herrengasse 9, Vienna.

Maintenance

This data set may be updated at irregular intervals.

  • 2,818 IIIF Manifests
  • 310,217 Images
  • 2,818 Documents

Detail from the cover of Harald Clegg: Esperanto. The Why and The What (London 1906) [ONB call number: 701561-A ESP MAG].

Reuse

Information on Rights and Reuse

Cite as

ONB Labs. “Dataset Planned Languages.” ONB Labs. Nov 25, 2024. Accessed on Aug 19, 2025, https://labs.onb.ac.at/en/datasets/esperanto/.


    Data

    Download and Access Options

    Sample

    • 5 selected documents from the collection with metadata records
    • 296 .jpg files
    • Readme explanation of properties used
    • .zip archive (242 MB)

    Metadata Records

    • Bibliographic metadata for all documents
    • Readme explanation of properties used
    • .csv table (2.9 MB)

    IIIF Collection

    • IIIF collection with URLs to all 310,217 images
    • 2,818 documents with metadata
    • .json file format
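
As a rough illustration of how the IIIF collection file might be used, the sketch below pulls the manifest URLs out of a collection document. It assumes the file follows the IIIF Presentation API, where version 3 lists children under "items" and version 2 under "manifests" or "members"; which version this data set uses is not stated here, so both are handled. The function name `manifest_urls` and the example.org URLs are invented for the example.

```python
import json


def manifest_urls(collection):
    """Return the manifest URLs listed in an IIIF collection document.

    Handles the Presentation API 3 layout ("items", "id", "type") and the
    version 2 layout ("manifests"/"members", "@id", "@type").
    """
    urls = []
    for key in ("items", "manifests", "members"):
        for entry in collection.get(key, []):
            entry_type = entry.get("type") or entry.get("@type") or ""
            entry_id = entry.get("id") or entry.get("@id")
            # "Manifest" (v3) and "sc:Manifest" (v2) both end in "Manifest";
            # nested collections are skipped.
            if entry_id and entry_type.endswith("Manifest"):
                urls.append(entry_id)
    return urls


# Small inline stand-in for the downloaded .json file (hypothetical URLs).
sample = json.loads("""
{
  "type": "Collection",
  "items": [
    {"id": "https://example.org/iiif/1/manifest", "type": "Manifest"},
    {"id": "https://example.org/iiif/sub", "type": "Collection"},
    {"id": "https://example.org/iiif/2/manifest", "type": "Manifest"}
  ]
}
""")

for url in manifest_urls(sample):
    print(url)
```

With the real file, `json.load(open(path, encoding="utf-8"))` would take the place of the inline sample; each manifest URL can then be fetched to reach the image URLs of one document.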

    OAI-PMH Data Set

    • Metadata from the library catalogue
    • OAI-PMH standard for data harvesting
    • .xml file format (MARC21)
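
A minimal sketch of how OAI-PMH harvesting works in general may be useful here. It builds a `ListRecords` request and reads record identifiers and the resumption token from a response page. The verb, the `resumptionToken` mechanism and the XML namespace come from the OAI-PMH 2.0 protocol; the endpoint URL, the metadata prefix `marc21` and the set name `some_set` are placeholders, not the actual values for this data set.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, set_spec=None, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # A resumption token is an exclusive argument: it replaces
        # metadataPrefix and set on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical endpoint, prefix and set name, for illustration only.
print(list_records_url("https://example.org/oai", "marc21", "some_set"))

# Invented response page standing in for a real server reply.
sample_page = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""
print(parse_page(sample_page))
```

A full harvest would repeat the request with the returned token until the server sends an empty or missing `resumptionToken`.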

    Use Cases

    Areas of Application Related to the Data Set

    Possible Uses

    Since the contents of the data set are highly multilingual, the images could be used to train a multilingual computer vision (CV) or optical character recognition (OCR) model. Furthermore, since the data set contains metadata on text documents, it could be used to explore the distribution of languages over time and thereby gain insight into characteristics that might hold for the entire collection. The metadata could also be analyzed to get an overview of the places where the text documents mostly originate (via the data point 'countryCodes'). It might also be possible to visualize a network of people and/or publishers (via the data points 'persons' and/or 'publishers').
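
The metadata analyses mentioned above could start from something like the following sketch. It counts country codes and builds weighted person-to-publisher edges from the CSV table. The column names 'countryCodes', 'persons' and 'publishers' are taken from the text; the assumption that multi-valued cells are separated by ";" is a guess that should be checked against the Readme, and all sample values are invented.

```python
import csv
import io
from collections import Counter
from itertools import product


def split_cell(cell, sep=";"):
    """Split a multi-valued cell into clean values (separator is an assumption)."""
    return [part.strip() for part in (cell or "").split(sep) if part.strip()]


def summarize(rows, sep=";"):
    """Count country codes and person-publisher pairs across metadata rows."""
    countries = Counter()
    edges = Counter()
    for row in rows:
        countries.update(split_cell(row.get("countryCodes"), sep))
        persons = split_cell(row.get("persons"), sep)
        publishers = split_cell(row.get("publishers"), sep)
        # Every person on a record is linked to every publisher on it.
        edges.update(product(persons, publishers))
    return countries, edges


# Invented rows standing in for the real .csv table.
sample_csv = """countryCodes,persons,publishers
GB,Person A,Publisher X
AT,Person B; Person C,Publisher Y
AT,Person B,Publisher X
"""

countries, edges = summarize(csv.DictReader(io.StringIO(sample_csv)))
print(countries.most_common())
print(edges.most_common())
```

The edge counts can be passed to a graph library or exported as an edge list for visualization. A language-over-time count would follow the same pattern once the relevant column names are confirmed in the Readme.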