
# Extract figures by IIIF manifest

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Flabs.onb.ac.at%2Fgitlab%2Fa.rabensteiner%2Fextract_figures_abo/HEAD?labpath=extract_figures.ipynb)

This repository provides a Jupyter notebook, [extract_figures.ipynb](extract_figures.ipynb), that uses a YOLOv8 model to extract figures from a book, given the URL of its IIIF manifest.
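To illustrate the first step, going from a manifest to page images, here is a minimal sketch of how the image URLs could be collected from a IIIF manifest. It is not taken from the notebook: the function names `load_manifest` and `image_urls` are hypothetical, and the code assumes the manifest follows the standard IIIF Presentation API layout (2.x `sequences`/`canvases`/`images`, or 3.0 nested `items` with a single image body per annotation).

```python
import json
from urllib.request import urlopen


def load_manifest(url):
    """Download and parse a IIIF manifest (hypothetical helper, needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def image_urls(manifest):
    """Return the image URLs of all pages listed in a IIIF manifest dict.

    Assumption: the manifest uses the Presentation API 2.x or 3.0 structure.
    """
    urls = []
    # Presentation API 2.x: sequences -> canvases -> images -> resource["@id"]
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                urls.append(image["resource"]["@id"])
    # Presentation API 3.0: canvases -> annotation pages -> annotations -> body["id"]
    # (assumes each annotation body is a single image resource, not a list or choice)
    for canvas in manifest.get("items", []):
        for page in canvas.get("items", []):
            for annotation in page.get("items", []):
                urls.append(annotation["body"]["id"])
    return urls
```

With a manifest URL at hand, `image_urls(load_manifest(manifest_url))` would yield one URL per page, which can then be downloaded and passed to the detection model.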

The model was trained on the following five books from the ABO corpus:
- http://data.onb.ac.at/ABO/%2BZ97792402
- http://data.onb.ac.at/ABO/%2BZ155502807
- http://data.onb.ac.at/ABO/%2BZ156318706
- http://data.onb.ac.at/ABO/%2BZ164403901
- http://data.onb.ac.at/ABO/%2BZ22101290X

Of the approximately 1700 pages in these books, 250 contain figures, which were annotated with bounding boxes using the image annotation web service [CVAT](https://www.cvat.ai/). Training was done locally with the nano version of YOLOv8. The resulting figure detection model is provided as [model_extract_figures.pt](model_extract_figures.pt).
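As a rough sketch of how the model file might be used outside the notebook, the following example loads it with the `ultralytics` package and converts the predicted boxes into integer crop rectangles. The helper names `to_crop_box` and `detect_figures`, the `pad` parameter, and the confidence threshold of 0.25 are assumptions for illustration, and the notebook may proceed differently.

```python
import math


def to_crop_box(xyxy, width, height, pad=0):
    """Turn a float (x1, y1, x2, y2) box into an integer crop box clamped to the image.

    `pad` (hypothetical parameter) adds a margin in pixels around the figure.
    """
    x1, y1, x2, y2 = xyxy
    left = max(0, math.floor(x1) - pad)
    top = max(0, math.floor(y1) - pad)
    right = min(width, math.ceil(x2) + pad)
    bottom = min(height, math.ceil(y2) + pad)
    return (left, top, right, bottom)


def detect_figures(image_path, model_path="model_extract_figures.pt", conf=0.25):
    """Run the detector on one page image and return crop boxes for all figures.

    Requires the `ultralytics` package; it is imported here so that
    `to_crop_box` stays usable without it.
    """
    from ultralytics import YOLO

    model = YOLO(model_path)
    result = model.predict(image_path, conf=conf)[0]
    height, width = result.orig_shape  # original image size as (height, width)
    return [to_crop_box(box, width, height) for box in result.boxes.xyxy.tolist()]
```

Each returned tuple has the `(left, top, right, bottom)` form accepted by, for example, Pillow's `Image.crop`, so the detected figures can be cut out of the page image directly.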

![Suggested bounding boxes of figures by the trained model.](example.png)