Tools for accessing the Webarchive API
webarchiv.py from the ONB Labs
The documentation is available at https://webarchiv.onb.ac.at/api.html#.
API_KEY = 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ'
Take a look at https://webarchiv.onb.ac.at/api.html#/ and try it out.
import requests
BASE_URL = 'https://webarchiv.onb.ac.at/api'
Let's take a look at /welcome
r = requests.get(f'{BASE_URL}/welcome')
r.json()
We need a fingerprint and a valid API key. A key has been generated for PyDays19.
import uuid
FINGERPRINT = str(uuid.uuid4())
API_KEY = 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ'
FINGERPRINT, API_KEY
We need to authenticate first in order to get a valid token.
auth_r = requests.post(f'{BASE_URL}/authentication', json={
    'apikey': API_KEY,
    'version': '0.1.0',
    'fingerprint': FINGERPRINT
})
auth_r.status_code
auth_r.json()
token = auth_r.json()['t']
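The authentication step can also be wrapped in a small function that fails early on HTTP errors. The following is only a sketch: the name `authenticate` and its `session` parameter are assumptions made for illustration, while the endpoint, the payload fields and the `t` key are taken from the cell above.

```python
import uuid

BASE_URL = 'https://webarchiv.onb.ac.at/api'


def authenticate(session, api_key, fingerprint=None, base_url=BASE_URL):
    """Request a token from /authentication and return it.

    `session` is any object with a requests-style `post` method,
    for example `requests.Session()`.
    """
    payload = {
        'apikey': api_key,
        'version': '0.1.0',
        # Fall back to a random fingerprint, as in the cell above
        'fingerprint': fingerprint or str(uuid.uuid4()),
    }
    response = session.post(f'{base_url}/authentication', json=payload)
    # Raise on 4xx/5xx so that a rejected key does not surface as a KeyError
    response.raise_for_status()
    return response.json()['t']
```

With requests, this would be called as `token = authenticate(requests.Session(), API_KEY)`.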
Now we can submit other requests, a search for example.
search_r = requests.get(f'{BASE_URL}/search/domainname', params={
    'apikey': API_KEY,
    't': token,
    'q': 'wien'
})
search_r.status_code
search_r.json()
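Every request after authentication repeats the `apikey` and `t` parameters. A minimal sketch of a wrapper that adds them automatically is shown below. `TokenClient` is a hypothetical name, not part of the Webarchive API or of webarchiv.py, and `session` is assumed to be a requests-style object such as `requests.Session()`.

```python
class TokenClient:
    """Hypothetical helper that adds 'apikey' and 't' to every GET request."""

    def __init__(self, session, api_key, token,
                 base_url='https://webarchiv.onb.ac.at/api'):
        self.session = session
        self.api_key = api_key
        self.token = token
        self.base_url = base_url

    def get(self, path, **params):
        # Merge the credentials with the request-specific parameters
        query = {'apikey': self.api_key, 't': self.token, **params}
        response = self.session.get(f'{self.base_url}{path}', params=query)
        # Raise on 4xx/5xx before trying to decode the body
        response.raise_for_status()
        return response.json()
```

The search above would then read `TokenClient(requests.Session(), API_KEY, token).get('/search/domainname', q='wien')`.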
To generate a Python client from the API's Swagger specification, we use the online generator at https://generator.swagger.io/.
import io
import zipfile
import shutil
def generate_swagger_client():
    # Generate Python Client
    generated_r = requests.post('https://generator.swagger.io/api/gen/clients/python', json={
        'swaggerUrl': 'https://webarchiv.onb.ac.at/api/swagger.json'
    })
    generated_r.raise_for_status()
    link = generated_r.json()['link']
    # Download ZIP with generated client
    zip_r = requests.get(link)
    zip_r.raise_for_status()
    # Open and extract
    zip_file = zipfile.ZipFile(io.BytesIO(zip_r.content))
    zip_file.extractall()
    # Move package to working directory and clean up
    shutil.move('python-client/swagger_client', 'swagger_client')
    shutil.rmtree('python-client')
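The function above unpacks the whole archive into the current working directory, and a second run would not cleanly replace an existing swagger_client directory. One possible variant, sketched here under the assumption that only a single directory of the archive is needed, extracts into a temporary directory first. `extract_package` and its parameters are hypothetical names.

```python
import io
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_package(zip_bytes, package_path, target_dir='.'):
    """Extract one directory from an in-memory ZIP archive into target_dir."""
    target = Path(target_dir) / Path(package_path).name
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
            archive.extractall(tmp)
        source = Path(tmp) / package_path
        if not source.is_dir():
            raise FileNotFoundError(f'{package_path} not found in archive')
        # Replace a package left over from a previous run
        if target.exists():
            shutil.rmtree(target)
        shutil.move(str(source), str(target))
    # The temporary directory, including the rest of the archive, is gone here
    return target
```

Inside `generate_swagger_client()`, the extract and move steps could then become `extract_package(zip_r.content, 'python-client/swagger_client')`.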
import swagger_client
Set base URL
client = swagger_client.ApiClient()
client.configuration.host = 'https://webarchiv.onb.ac.at/api'
Authenticate
auth_obj = swagger_client.Authenticate(apikey=API_KEY, fingerprint=str(uuid.uuid4()))
aa = swagger_client.AuthenticationApi(client)
auth_r = aa.authenticate(body=auth_obj)
auth_r
token = auth_r.t
Search for domain name
search_api = swagger_client.SearchApi(client)
search_r = search_api.search_domainname(q='wien', t=token, apikey=API_KEY)
search_r
from pyswagger import App
from pyswagger.contrib.client.requests import Client
from pyswagger.utils import jp_compose
Create client and app
app = App.create(url='https://webarchiv.onb.ac.at/api/swagger.json')
client = Client()
Add missing support for JSON-LD
app.mime_codec.register('application/ld+json', app.mime_codec._codecs['application/json'])
List operations
app.op
Authenticate
r = client.request(app.op['authenticate'](body={
    'apikey': API_KEY,
    'fingerprint': '1234'
}))
r.status
r.data
token = r.data['t']
Search for domain name
r = client.request(app.op['searchDomainname'](
    apikey=API_KEY,
    t=token,
    q='wien'
))
r.status
r.data
webarchiv.py is part of this repository. It makes extensive use of requests.
If you need the direct download link:
https://labs.onb.ac.at/gitlab/labs-team/webarchive-api/raw/master/webarchiv.py?inline=false
import webarchiv
Authentication is automatic
session = webarchiv.WebarchivSession(API_KEY, allow_tracking=True)
Search for domain name
r = session.domain_name_search('wien')
r.status_code
r.json()
Available access methods for WebarchivSession
help(session)
More samples using webarchiv.py: