Examples for using the webarchive API in Python
Since March 2009, Webarchive Austria (https://webarchiv.onb.ac.at) has been archiving the "Austrian webspace". The data is collected via regular domain, thematic and event-based crawls. These crawls cover all .at, ac.at and gv.at domains, all .wien and .tirol domains, and further websites with Austrian content. The data set includes metadata on selected domains within topical collections. The webarchive API supports URL searches and partial full-text searches.
Webarchive Austria includes more than 2 million websites. The metadata is licensed under the Creative Commons Zero license (CC0).
Currently the following data is accessible:
Basis for the webarchive collection 'Laufende Crawls'
Basis for the webarchive collection 'Event Crawls'
Other Web Archives
Links to other Wayback Machines that accept queries in the same format as Webarchive Austria
Number of objects currently in the webarchive
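Because the other web archives are described as accepting queries in the same format, one query can in principle be reused against several of them. The following is a minimal sketch under stated assumptions: the base URLs are placeholders, and the path layout `<base>/<YYYYMMDDhhmmss>/<url>` is the common Wayback replay convention, assumed here rather than taken from this repository.

```python
from datetime import datetime


def build_wayback_url(base_url, target_url, when):
    """Build a Wayback-style replay URL: <base>/<YYYYMMDDhhmmss>/<target>.

    The path layout is an assumption (common Wayback convention),
    not something confirmed by this repository.
    """
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{base_url.rstrip('/')}/{timestamp}/{target_url}"


# Hypothetical base URLs; replace them with entries from the list of other web archives.
ARCHIVES = {
    "example-a": "https://wayback-a.example.org/web",
    "example-b": "https://wayback-b.example.org/web/",
}


def build_all(target_url, when):
    """Return one replay URL per configured archive for the same target and time."""
    return {
        name: build_wayback_url(base, target_url, when)
        for name, base in ARCHIVES.items()
    }


if __name__ == "__main__":
    for name, url in build_all("http://www.onb.ac.at", datetime(2019, 3, 1, 12, 0, 0)).items():
        print(name, url)
```

Keeping the archive base URLs in a single dictionary means that adding another archive only requires one new entry.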
APIs and modules
API description using Swagger - swagger.json
Python module for using the webarchive API
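Before using the Python module, it can help to see which operations the Swagger description declares. The sketch below lists method, path and summary for every operation in a Swagger 2.0 document. The inline sample specification and its `/example/search` path are purely hypothetical stand-ins; in practice the dictionary would come from loading the repository's swagger.json with `json.load`.

```python
import json

# HTTP methods that may appear as keys of a Swagger path item.
HTTP_METHODS = {"get", "post", "put", "delete", "patch", "head", "options"}


def list_operations(spec):
    """Return (METHOD, path, summary) tuples for every operation in a Swagger spec."""
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method, details in item.items():
            # Skip non-operation keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path, details.get("summary", "")))
    return operations


# Hypothetical sample; the real endpoints are defined in swagger.json.
SAMPLE_SPEC = json.loads("""
{
  "swagger": "2.0",
  "paths": {
    "/example/search": {
      "parameters": [],
      "get": {"summary": "Hypothetical search endpoint"},
      "post": {"summary": "Hypothetical search with request body"}
    }
  }
}
""")

if __name__ == "__main__":
    for method, path, summary in list_operations(SAMPLE_SPEC):
        print(f"{method:6} {path}  {summary}")
```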
Instructive Jupyter Notebooks
Extract all seeds from a selective crawl
Notebook Wayback Search
Search for all captures of a URL and process the results
Notebook Text Search
Search within the webarchive's text and process the results
Notebook Combined Search
Run a Wayback search for all URLs of a selective crawl
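As a rough illustration of what "processing the results" of a capture search can look like, the sketch below counts captures per year. It does not call the webarchive API. It assumes the capture timestamps have already been extracted from a response as 14-digit strings (YYYYMMDDhhmmss), which is the usual Wayback convention but is an assumption here; the sample values are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def captures_per_year(timestamps):
    """Count captures per calendar year from 14-digit Wayback-style timestamps."""
    years = Counter()
    for ts in timestamps:
        # strptime raises ValueError for malformed timestamps, which surfaces bad input early.
        captured = datetime.strptime(ts, "%Y%m%d%H%M%S")
        years[captured.year] += 1
    return dict(sorted(years.items()))


# Invented sample timestamps; real values would come from a capture search.
SAMPLE_TIMESTAMPS = ["20090315101500", "20090702083000", "20150101000000"]

if __name__ == "__main__":
    print(captures_per_year(SAMPLE_TIMESTAMPS))  # {2009: 2, 2015: 1}
```

The same grouping can be applied to the captures of every URL in a selective crawl to compare how densely different seeds were archived over time.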