# Creating your own text using [Tesseract OCR](https://github.com/tesseract-ocr/tesseract)
If the available OCR text is too poor, we can create our own. Here are some ideas on what to try:
**Improve image quality and text orientation**
We will use a custom Python script developed at ONB for two projects supported by CLARIAH-AT (see https://clariah.at/en/projects/machine-learning-suite-iiif-resources/ and https://clariah.at/en/projects/esperanto-newspaper-excerpts/). The script is intended for historical scans: it removes scanning borders, deskews the image, and converts it to grayscale. See also the demonstration of this script at the ANNO event 2023 (https://labs.onb.ac.at/gitlab/labs-team/anno-event-2023).
**Improve data basis of OCR software**
For German Fraktur script, Tesseract with its default data will not give good results. Instead, we can use specialized models trained for Fraktur, available from https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/.
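Tesseract loads each model from a `.traineddata` file in its tessdata directory, and the file name without the extension is the name passed to `-l`. After adding a model, it helps to confirm that Tesseract can actually see it. The following is a minimal sketch, not part of the ONB script: the helper names are hypothetical, and it assumes that `tesseract --list-langs` prints a header line followed by one model name per line.

```python
import subprocess


def parse_list_langs(output):
    """Extract model names from the text printed by `tesseract --list-langs`."""
    names = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header ("List of available languages ...").
        if not line or line.startswith("List of available languages"):
            continue
        names.add(line)
    return names


def missing_models(language_option, available):
    """Return the parts of a `-l` value such as "deu+frk" that are not installed."""
    return [name for name in language_option.split("+") if name not in available]


def installed_models():
    """Ask the local Tesseract installation which models it can find."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Assumption: some older versions print the list to stderr instead of stdout.
    return parse_list_langs(result.stdout or result.stderr)
```

Calling `missing_models("deu+Fraktur+frk", installed_models())` returns an empty list when every requested model is available.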
### Apply Tesseract to original and preprocessed images
This assumes your system has Tesseract installed and that it is available (i.e. added to the PATH variable) under the command `tesseract`. We make the call with the following options:
- language data `-l deu+Fraktur+frk`: combines the models `deu` (German), `Fraktur` (Fraktur script) and `frk` (German Fraktur) provided by Tesseract (see https://github.com/tesseract-ocr/tessdata_best)
- page segmentation mode `--psm 3`: fully automatic page segmentation, but no OSD (orientation and script detection); this is the default. Other available modes are listed by the command `tesseract --help-extra`
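
The call with these options can be sketched in Python as follows. This is an illustrative wrapper under stated assumptions, not the code used in the projects above: the function names are hypothetical, and it relies on Tesseract writing its plain-text result to `<output_base>.txt`.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base,
                            languages="deu+Fraktur+frk", psm=3):
    """Return the argument list for one Tesseract call (nothing is executed here)."""
    return [
        "tesseract",
        str(image_path),   # input image
        str(output_base),  # output path without extension
        "-l", languages,   # language data, joined with "+"
        "--psm", str(psm), # page segmentation mode
    ]


def run_ocr(image_path, output_base):
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on the PATH")
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    # Tesseract appends ".txt" to the output base for plain-text output.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

To compare results, `run_ocr` can be called once on the original scan and once on the preprocessed image, each with its own output base (for instance the hypothetical names `ocr_original` and `ocr_preprocessed`).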