Jupyter Notebooks
Examples for using the webarchive API in Python
Since March 2009, Webarchive Austria has been archiving the “Austrian webspace” (https://webarchiv.onb.ac.at). The data is collected through regular domain crawls, thematic crawls and event-based crawls. These crawls cover all .at, .ac.at and .gv.at domains, all .wien and .tirol domains, and further websites with Austrian content. The data set includes metadata on selected domains within topical collections. The webarchive API supports URL searches and partial full-text searches.
Webarchive Austria includes more than 2 million websites. The metadata is licensed under the Creative Commons Zero license (CC0).
Currently the following data is accessible:
Name | Description
---|---
Selective Crawls | Basis for the webarchive collection “Laufende Crawls” (ongoing crawls)
Event Crawls | Basis for the webarchive collection “Event Crawls”
Other Web Archives | Links to other Wayback Machine instances that accept queries in the same format as Webarchive Austria
Object Count | Number of objects currently in the webarchive
APIs and modules
Name | Description
---|---
API description | API description using Swagger (swagger.json)
Python binding | Python module for using the webarchive API
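
Because the API description is published as a Swagger document, its operations can be listed programmatically before turning to the Python binding. The following is a minimal sketch that relies only on the standard Swagger/OpenAPI layout, in which a top-level `paths` object maps each path to its HTTP methods. The function name `list_operations` and the inline `SAMPLE_SPEC` are illustrative assumptions; the sample paths are invented placeholders and are not the actual webarchive endpoints.

```python
import json

# HTTP methods that may appear as keys under a path in a Swagger/OpenAPI document.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}


def list_operations(spec):
    """Return (METHOD, path, summary) tuples for every operation in a Swagger document."""
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method, details in sorted(item.items()):
            if method.lower() not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            operations.append((method.upper(), path, details.get("summary", "")))
    return operations


# Hypothetical stand-in for the contents of a downloaded swagger.json.
# These paths are placeholders, NOT the real webarchive API endpoints.
SAMPLE_SPEC = json.loads("""
{
  "swagger": "2.0",
  "paths": {
    "/example/search": {
      "parameters": [],
      "get": {"summary": "Example search"}
    },
    "/example/login": {
      "post": {"summary": "Example login"}
    }
  }
}
""")

for method, path, summary in list_operations(SAMPLE_SPEC):
    print(f"{method:6} {path}  {summary}")
```

To use this with the real API description, replace `SAMPLE_SPEC` with the parsed contents of the published swagger.json, for example via `json.load` on a locally saved copy.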
Instructive Jupyter Notebooks
Name | Description
---|---
Notebook Selective | Extract all seeds from a selective crawl
Notebook Wayback Search | Search for all captures of a URL and process the results
Notebook Text Search | Search within the webarchive’s text and process the results
Notebook Combined Search | Run a Wayback search on all URLs of a selective crawl