Run the notebooks in this repository in your browser via Binder: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Flabs.onb.ac.at%2Fgitlab%2Flabs-team%2Fdigital-methods-of-newspaper-analysis/main)

# Digital methods of newspaper analysis

This is the public repository for the presentation *ANNO – von Daten zur Forschung. Arbeiten mit dem Zeitschriftenportal der Österreichischen Nationalbibliothek* ("ANNO: From Data to Research. Working with the Newspaper Portal of the Austrian National Library"), given by staff of ONB Labs and the ONB Digitization Department at the summer school *Digitale Methoden der Zeitungsanalyse* ("Digital Methods of Newspaper Analysis"; see https://www.zb.uzh.ch/en/events/summer-school-digitale-methoden-der-zeitungsanalyse).

The repository contains the Jupyter notebooks shown in the presentation, along with text data and sample images.

## Installation

Install the required packages with pip into your local Python environment (Python 3.12) via `pip install -r requirements.txt`. Then start the Jupyter server with `jupyter lab`.

## Contents of notebooks

### [ONB_IIIF_API](ONB_IIIF_API.ipynb)

This notebook shows how to access the ONB's newspaper data with Python through an API. The API follows the IIIF specification (https://iiif.io) and supplies images, metadata and text annotations.

### [OCR_samples](OCR_samples.ipynb)

This notebook presents two methods for creating and improving your own OCR from images with [Tesseract OCR](https://github.com/tesseract-ocr/tesseract): first, improving the image quality and correcting the orientation of the text; second, using language data specifically trained for German Fraktur script.
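The exact preprocessing used in the notebook is not described here. As a minimal, dependency-free sketch of the first method, the code below binarizes an 8-bit greyscale page with Otsu's method (a common global thresholding technique, swapped in here for illustration) and builds a Tesseract command line for the second method. The function names, the flat list-of-pixels input and the model name `frak2021` are assumptions; substitute whichever Fraktur `.traineddata` file is installed in your tessdata directory.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximises the between-class
    variance when pixels are split into <= t (dark) and > t (light)."""
    pixels = list(pixels)  # accept any iterable, e.g. Pillow's getdata()
    if not pixels:
        raise ValueError("need at least one pixel")
    # Histogram of grey values; expects integers in 0..255 (8-bit greyscale).
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_dark = 0     # running sum of grey levels in the dark class
    weight_dark = 0  # running pixel count of the dark class
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance (up to a constant factor of 1/total**2).
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map dark pixels (<= threshold, i.e. ink) to 0 and the rest to 255."""
    return [0 if p <= threshold else 255 for p in pixels]


def tesseract_command(image_path, lang="frak2021"):
    """Build a Tesseract CLI call that prints the recognised text to stdout.
    'frak2021' is an assumed Fraktur model name, not taken from the notebook."""
    return ["tesseract", image_path, "stdout", "-l", lang]
```

With Pillow, the pixels of a page could be obtained via `Image.open(path).convert("L").getdata()` and written back with `putdata()` before saving; the command list can then be run with `subprocess.run(tesseract_command("page_bin.png"), capture_output=True, text=True)`. Correcting the text orientation (deskewing) is a separate step not covered by this sketch.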