Webarchive Austria

Since March 2009, Webarchive Austria (https://webarchiv.onb.ac.at) has been archiving the “Austrian webspace”. The data is collected through regular domain crawls, thematic crawls and event-based crawls. These crawls include all .at, .ac.at and .gv.at domains, all .wien and .tirol domains, and further websites with Austrian content. The data set includes metadata on selected domains within topical collections. The webarchive API supports URL searches and partial full-text searches.

Webarchive Austria includes more than 2 million websites. The metadata is licensed under Creative Commons Zero (CC0).

Tools & Experiments

Data

Currently the following data is accessible:

Selective Crawls
Description: Basis for the webarchive collection “Laufende Crawls”
Link: /data/selective.json

Event Crawls
Description: Basis for the webarchive collection “Event Crawls”
Link: /data/events.json

Other Web Archives
Description: Links to other Wayback Machines that accept queries in the same format as Webarchive Austria
Link: /data/otherwebarchives.json

Object Count
Description: Number of objects currently in the webarchive
Link: /data/objectcount.json
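
The data files listed above are plain JSON, so they can be read with the Python standard library alone. The following is a minimal sketch, not the project's own tooling: the paths come from the list above, while `BASE_URL` is a placeholder host and the function names are hypothetical. The schema of each file is not described here, so the sketch only decodes the payload and makes no assumption about its keys.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: placeholder host. Replace it with the site that serves this page.
BASE_URL = "https://example.org"

# Relative paths exactly as listed in the Data section.
DATA_FILES = {
    "selective": "/data/selective.json",
    "events": "/data/events.json",
    "other_archives": "/data/otherwebarchives.json",
    "object_count": "/data/objectcount.json",
}


def data_url(name, base_url=BASE_URL):
    """Build the absolute URL of one of the listed data files."""
    return urljoin(base_url, DATA_FILES[name])


def parse_data(raw):
    """Decode a JSON payload (bytes or str) without assuming its schema."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


def fetch_data(name, base_url=BASE_URL, timeout=30):
    """Download and decode one data file (performs a network request)."""
    with urlopen(data_url(name, base_url), timeout=timeout) as response:
        return parse_data(response.read())
```

With the real host filled in, `fetch_data("object_count")` would return whatever structure that file contains. Inspecting the result (for example with `type(...)` or by printing it) is a sensible first step before writing code that depends on particular keys.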

Code

APIs and modules

API description
Description: API description using Swagger (swagger.json)
Link: Swagger Tool

Python binding
Description: Python module for using the webarchive API
Link: webarchiv.py

Tutorials

Instructive Jupyter Notebooks

Notebook Selective
Description: Extract all seeds from a selective crawl
Link: sample1.ipynb

Notebook Wayback Search
Description: Search for all captures of a URL and process the results
Link: sample2.ipynb

Notebook Text Search
Description: Search within the webarchive’s text and process the results
Link: sample3.ipynb

Notebook Combined Search
Description: Run a Wayback search for every URL of a selective crawl
Link: sample4.ipynb
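
The Wayback Search notebook is described as processing the captures of a URL. One typical processing step can be sketched as follows, under a clearly stated assumption: captures are identified by 14-digit timestamps of the form YYYYMMDDhhmmss, as is common among Wayback-style archives. Whether the webarchive API returns timestamps in exactly this form is not confirmed here, the function names are hypothetical, and the sample values are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def parse_wayback_timestamp(ts):
    """Parse a 14-digit YYYYMMDDhhmmss string into a datetime (assumed format)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def captures_per_year(timestamps):
    """Count how many captures fall into each calendar year."""
    return Counter(parse_wayback_timestamp(ts).year for ts in timestamps)


# Made-up sample timestamps, not real capture data.
sample = ["20090315120000", "20090920083000", "20150101000000"]
counts = captures_per_year(sample)
```

Grouping by year is only one option. The same parsed `datetime` values could equally be used to find the first and last capture or to measure gaps between crawls.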