Esperanto Newspaper Excerpts
Info
Data Set created: April 2024
Data Set updated: 25 April 2024
Content
Images, text and metadata for a collection of historical newspaper excerpts (from the period 1898–1915), containing information about the Esperanto language and community.
Data Provenance and Quality
The data set contains recently scanned images (at ONB, the Österreichische Nationalbibliothek, Austria’s largest library and a cultural and research institution located in Vienna) ...
Historical Background
Esperanto is a language designed to enable easy communication between people of different countries and cultures. It was launched in 1887 by Dr. Ludwik L. Zamenhof under the pseudonym “Doktoro Esperanto”, which literally means “Doctor Hoper”. The goal was to create an international auxiliary language that everyone on Earth could speak, facilitating global communication and understanding. With its regular grammar and structure, Esperanto has been used as a universal second language for international communication.
The “Hachette Collection” consists of 17,204 articles taken from newspapers published in many European countries between 1898 and 1915, which are held at the Department of Planned Languages. The articles deal with Esperanto-themed events and persons, e.g., reports from the Esperanto World Congress, and as such offer a unique opportunity to study the history of the Esperanto movement in Europe in the early 20th century.
Maintenance
This data set may be updated at irregular intervals.
Links
- 17,204 Articles
- 22,247 Images
- 17,204 IIIF Manifests

Detail from an article (taken from the 13 August 1910 issue of the Washington Times) featuring an image of the founder of Esperanto, Ludwik L. Zamenhof
Data
Download and Access Options
Sample
- 15 selected articles from the collection with metadata records
- 26 .jpg, .xml (ALTO), .txt files
- Readme explanation of properties used
- .zip archive (72.7 MB)
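The ALTO (.xml) files in the sample carry the OCR text word by word. A minimal sketch of how the text lines could be recovered from such a file is shown here. It assumes the usual ALTO layout of `TextLine` elements containing `String` elements with a `CONTENT` attribute, and it ignores the XML namespace because the ALTO version used in the data set is not stated on this page. The `SAMPLE_ALTO` document is a made-up miniature for illustration, not an excerpt from the collection.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_to_lines(alto_xml: str) -> list[str]:
    """Return one string per TextLine, joining the CONTENT of its String children."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        words = [
            child.attrib.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


# Hypothetical miniature ALTO document (assumption, not taken from the data set).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="La"/><SP/><String CONTENT="Espero"/></TextLine>
    <TextLine><String CONTENT="Esperanto"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for line in alto_to_lines(SAMPLE_ALTO):
        print(line)
```

For a file from the unpacked archive, the same function could be fed with the file's contents; the .txt files shipped alongside can serve as a cross-check for the extracted lines.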
Metadata Records
- Bibliographic metadata for all articles
- Readme explanation of properties used
- .csv table (3.4 MB) and .xlsx file (1.1 MB)
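As a rough illustration of working with the .csv table, the following sketch counts how often each value occurs in one column, using only the Python standard library. The column names (`id`, `title`, `language`) and the comma delimiter are assumptions made for the example; the actual properties are described in the Readme that accompanies the metadata records.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column: str, delimiter: str = ",") -> Counter:
    """Count the values found in one column of an open CSV file."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    # Short rows yield None for missing cells, so fall back to an empty string.
    return Counter((row[column] or "").strip() for row in reader)


# Hypothetical rows with assumed column names, for illustration only.
SAMPLE_CSV = """id,title,language
1,Article A,de
2,Article B,fr
3,Article C,de
"""

if __name__ == "__main__":
    counts = count_by_column(io.StringIO(SAMPLE_CSV), "language")
    for value, n in counts.most_common():
        print(value, n)
```

With the downloaded table, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="", encoding="utf-8")`, where the encoding is again an assumption to verify against the file.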
IIIF Collection
- IIIF collection with URLs to all 22,247 images
- 17,204 articles with metadata
- .json file format
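To give an idea of how the IIIF collection file could be processed, the sketch below lists the manifest URLs referenced by a collection. Which version of the IIIF Presentation API the file follows is not stated here, so the function looks at the `items` list used by version 3 as well as the `manifests` and `members` lists used by version 2. The sample collection and its example.org URLs are hypothetical.

```python
import json


def manifest_urls(collection: dict) -> list[str]:
    """Collect manifest URLs from a IIIF collection (Presentation API 2 or 3)."""
    urls = []
    # Presentation API 3: entries live in "items" and use "id" / "type".
    for item in collection.get("items", []):
        if item.get("type") == "Manifest" and "id" in item:
            urls.append(item["id"])
    # Presentation API 2: entries live in "manifests" and use "@id".
    for item in collection.get("manifests", []):
        if "@id" in item:
            urls.append(item["@id"])
    # Presentation API 2.1 may also mix entry types in "members".
    for item in collection.get("members", []):
        if item.get("@type") == "sc:Manifest" and "@id" in item:
            urls.append(item["@id"])
    return urls


# Hypothetical version 3 collection, for illustration only.
SAMPLE_COLLECTION = json.loads("""{
  "type": "Collection",
  "items": [
    {"id": "https://example.org/iiif/article-1/manifest", "type": "Manifest"},
    {"id": "https://example.org/iiif/sub", "type": "Collection"},
    {"id": "https://example.org/iiif/article-2/manifest", "type": "Manifest"}
  ]
}""")

if __name__ == "__main__":
    for url in manifest_urls(SAMPLE_COLLECTION):
        print(url)
```

For the downloaded .json file, `json.load` on the opened file would take the place of the inline sample.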
Code
- Repository with Python source code used in the project
- Jupyter Notebooks
- Readme file with requirements and installation instructions
Use Cases
Areas of Application Related to the Data Set
Possible Uses
Since the contents of the data set are highly multilingual, the images could be used to train a multilingual CV (computer vision) / OCR (optical character recognition) ...