Commit 4ba9f5f8 authored by Stefan Karner

Add allow_tracking option to webarchive session

parent b9b656d0
+1 −1
%% Cell type:markdown id: tags:

# 4.1 - Webarchive - Interacting With The API

*Tools for accessing the Webarchive API*

%% Cell type:markdown id: tags:

* Variant 1: Exploring the API manually
* Variant 2: Generate Code from Swagger JSON
* Variant 3: Use Swagger JSON dynamically
* Variant 4: Use `webarchiv.py` from the ONB Labs

%% Cell type:markdown id: tags:

The API documentation is available at [https://webarchiv.onb.ac.at/api.html#](https://webarchiv.onb.ac.at/api.html#).

%% Cell type:code id: tags:

``` python
API_KEY = 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ'
```

%% Cell type:markdown id: tags:

## Variant 1: Exploring the API manually

Take a look at [https://webarchiv.onb.ac.at/api.html#/](https://webarchiv.onb.ac.at/api.html#/) and try it out.

%% Cell type:code id: tags:

``` python
import requests

BASE_URL = 'https://webarchiv.onb.ac.at/api'
```

%% Cell type:markdown id: tags:

Let's take a look at the `/welcome` endpoint, which describes the API and lists its entry points.

%% Cell type:code id: tags:

``` python
r = requests.get(f'{BASE_URL}/welcome')
r.json()
```

%% Output

    {'@context': 'http://schema.org/',
     '@type': 'WebAPI',
     'name': 'Webarchive Austria Search API',
     'version': '0.1.0',
     'description': 'The Webarchive Austria Search API lets you find archived webpages by Fulltext or URL. The API uses standard schema.org types and is compliant with the JSON-LD specification.',
     'documentation': 'https://webarchiv.onb.ac.at/api.html',
     'provider': {'@type': 'Organization',
      'name': 'Austrian National Library',
      'contactPoint': [{'@type': 'ContactPoint',
        'name': 'Webarchive Austria',
        'url': 'https://webarchiv.onb.ac.at'}]},
     'versions': ['0.1.0'],
     'license': 'https://creativecommons.org/publicdomain/mark/1.0/',
     'transport': 'HTTP',
     'apiProtocol': 'JSON API',
     'webApiDefinitions': [{'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/authenticate',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'POST'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/search/domainname',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/search/domainname?q={q}&page={page}&pagesize={pagesize}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/search/fulltext',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/search/fulltext?q={q}&from={from}&to={to}&maxaggs={maxaggs}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/search/fulltext/seed',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/search/fulltext/seed?q={q}&g={g}&from={from}&to={to}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/search/fulltext/capture',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/search/fulltext/capture?q={q}&g={g}&from={from}&to={to}&page={page}&pagesize={pagesize}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/search/wayback',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/search/wayback?q={q}&from={from}&to={to}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/status/fulltext',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/status/fulltext?requestid={requestid}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/status/wayback',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/status/wayback?requestid={requestid}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/status/kill',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'DELETE'}]}
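
%% Cell type:markdown id: tags:

The `webApiDefinitions` list in the response above can be condensed into a short overview of the entry points. The following is a minimal sketch, not part of the original notebook: `list_entry_points` and `sample_welcome` are names made up for this illustration, and the sample is an abbreviated copy of the output above so that the cell runs without network access.

%% Cell type:code id: tags:

```python
def list_entry_points(welcome):
    """Return (HTTP method, URL) pairs from a /welcome-style response dict."""
    return [
        (entry.get('httpMethod'), entry['url'])
        for entry in welcome.get('webApiDefinitions', [])
    ]


# Abbreviated copy of the /welcome output shown above (hypothetical sample).
sample_welcome = {
    'webApiDefinitions': [
        {'@type': 'EntryPoint',
         'url': 'https://webarchiv.onb.ac.at/api/authenticate',
         'httpMethod': 'POST'},
        {'@type': 'EntryPoint',
         'url': 'https://webarchiv.onb.ac.at/api/search/domainname',
         'httpMethod': 'GET'},
        {'@type': 'EntryPoint',
         'url': 'https://webarchiv.onb.ac.at/api/status/kill',
         'httpMethod': 'DELETE'},
    ]
}

# Print one line per entry point: method first, then URL.
for method, url in list_entry_points(sample_welcome):
    print(f'{method:6} {url}')
```

With a live response, `list_entry_points(r.json())` should give the same kind of overview for all entry points.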

%% Cell type:markdown id: tags:

Authentication requires a client fingerprint and a valid API key.
Here the fingerprint is a random UUID; the API key has been generated for PyDays19.

%% Cell type:code id: tags:

``` python
import uuid

FINGERPRINT = str(uuid.uuid4())
API_KEY = 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ'

FINGERPRINT, API_KEY
```

%% Output

    ('c941f5c6-c97b-4f75-bf7d-5419df62cf5f', 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ')

%% Cell type:markdown id: tags:

We need to authenticate first in order to get a valid token.

%% Cell type:code id: tags:

``` python
auth_r = requests.post(f'{BASE_URL}/authentication', json={
    'apikey': API_KEY,
    'version': '0.1.0',
    'fingerprint': FINGERPRINT
})
auth_r.status_code
```

%% Output

    201

%% Cell type:code id: tags:

``` python
auth_r.json()
```

%% Output

    {'@context': 'https://webarchiv.onb.ac.at/contexts/authenticate.jsonld',
     'apikey': 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ',
     'fingerprint': 'c941f5c6-c97b-4f75-bf7d-5419df62cf5f',
     'timestamp': 1556089763561,
     't': '9defd49246b9e8c36202ce33d6a43e268530996a',
     'version': '0.1.0'}

%% Cell type:code id: tags:

``` python
token = auth_r.json()['t']
```
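
%% Cell type:markdown id: tags:

Every `GET` entry point listed by `/welcome` expects both `apikey` and `t` as query parameters, so it can help to keep that bookkeeping in one place. The sketch below is an illustration only: `extract_token`, `with_credentials` and the sample values are hypothetical names and data, not part of the API or of `webarchiv.py`.

%% Cell type:code id: tags:

```python
def extract_token(auth_json):
    """Return the token ('t') from an authentication response body."""
    token = auth_json.get('t')
    if not token:
        raise ValueError('authentication response contains no token')
    return token


def with_credentials(api_key, token, **query):
    """Merge the API key and token into the query parameters of a request."""
    params = {'apikey': api_key, 't': token}
    params.update(query)
    return params


# Hypothetical response body with the same keys as the output above.
sample_auth = {'apikey': 'my-key', 't': 'my-token', 'version': '0.1.0'}
params = with_credentials('my-key', extract_token(sample_auth), q='wien')
print(params)
```

The resulting dictionary can be passed as `params=` to `requests.get`.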

%% Cell type:markdown id: tags:

With the token we can now submit other requests, for example a domain name search.

%% Cell type:code id: tags:

``` python
search_r = requests.get(f'{BASE_URL}/search/domainname', params={
    'apikey': API_KEY,
    't': token,
    'q': 'wien'
})
search_r.status_code
```

%% Output

    200

%% Cell type:code id: tags:

``` python
search_r.json()
```

%% Output

    {'hits': [{'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wieno.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien1.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiener.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-wien.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiengut.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienmed.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienwin.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiental.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wieners.wien'}],
     'searchstring': 'wien',
     'context': 'https://webarchiv.onb.ac.at/contexts/domainnamesearchresult.jsonld',
     'requestid': '',
     'message': '',
     'returncode': 0,
     'total': 35101,
     'type': 1,
     'took': 427,
     'version': '0.1.0'}
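
%% Cell type:markdown id: tags:

The response reports 35101 hits in `total` but returns only a handful of them, and the URL template for `/search/domainname` lists `page` and `pagesize` parameters. The following sketch shows how the paging parameters could be computed. It assumes that pages are numbered from 1 and that the server accepts the chosen page size; `page_count` and `page_params` are hypothetical helpers written for this example.

%% Cell type:code id: tags:

```python
def page_count(total, pagesize):
    """Number of pages needed to cover `total` hits (integer ceiling division)."""
    return (total + pagesize - 1) // pagesize


def page_params(total, pagesize=100):
    """Yield one dict of paging parameters per page, starting at page 1 (assumed)."""
    for page in range(1, page_count(total, pagesize) + 1):
        yield {'page': page, 'pagesize': pagesize}


# 35101 hits at 100 per page need 352 requests.
print(page_count(35101, 100))
print(list(page_params(250, 100)))
```

Each yielded dictionary can be merged into the query parameters of a search request.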

%% Cell type:markdown id: tags:

## Variant 2: Generate Code from Swagger JSON

We use the online generator at [https://generator.swagger.io/](https://generator.swagger.io/).

%% Cell type:code id: tags:

``` python
import io
import zipfile
import shutil

def generate_swagger_client():
    # Generate Python Client
    generated_r = requests.post('https://generator.swagger.io/api/gen/clients/python', json={
        'swaggerUrl': 'https://webarchiv.onb.ac.at/api/swagger.json'
    })
    generated_r.raise_for_status()
    link = generated_r.json()['link']
    # Download ZIP with generated client
    zip_r = requests.get(link)
    zip_r.raise_for_status()
    # Open and extract
    zip_file = zipfile.ZipFile(io.BytesIO(zip_r.content))
    zip_file.extractall()
    # Move package to working directory and clean up
    shutil.move('python-client/swagger_client', 'swagger_client')
    shutil.rmtree('python-client')
```

%% Cell type:code id: tags:

``` python
# Requires the generated package; call generate_swagger_client() once if it is missing.
import swagger_client
```

%% Cell type:markdown id: tags:

Set base URL

%% Cell type:code id: tags:

``` python
client = swagger_client.ApiClient()
client.configuration.host = 'https://webarchiv.onb.ac.at/api'
```

%% Cell type:markdown id: tags:

Authenticate

%% Cell type:code id: tags:

``` python
auth_obj = swagger_client.Authenticate(apikey=API_KEY, fingerprint=str(uuid.uuid4()))
aa = swagger_client.AuthenticationApi(client)
auth_r = aa.authenticate(body=auth_obj)
auth_r
```

%% Output

    {'apikey': 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ',
     'fingerprint': '635fbeae-50d5-4df7-8372-7bc93bcbec74',
     't': 'b831ef03103dd7bb74838e0678e7d2bf2aaef809',
     'timestamp': 1555515615761,
     'version': '0.1.0'}

%% Cell type:code id: tags:

``` python
token = auth_r.t
```

%% Cell type:markdown id: tags:

Search for domain name

%% Cell type:code id: tags:

``` python
search_api = swagger_client.SearchApi(client)
search_r = search_api.search_domainname(q='wien', t=token, apikey=API_KEY)
```

%% Cell type:code id: tags:

``` python
search_r
```

%% Output

    {'hits': [{'value': 'wieno.wien'},
              {'value': 'wien.wien'},
              {'value': 'wien1.wien'},
              {'value': 'wiener.wien'},
              {'value': 'wien-wien.at'},
              {'value': 'wiengut.wien'},
              {'value': 'wienmed.wien'},
              {'value': 'wienwin.wien'},
              {'value': 'wiental.wien'},
              {'value': 'wieners.wien'}],
     'message': '',
     'requestid': '',
     'returncode': 0,
     'searchstring': 'wien',
     'took': 615,
     'total': 35101,
     'type': 1,
     'version': '0.1.0'}

%% Cell type:markdown id: tags:

## Variant 3: Use Swagger JSON dynamically

This variant uses the [`pyswagger`](https://github.com/pyopenapi/pyswagger) package.

%% Cell type:code id: tags:

``` python
from pyswagger import App
from pyswagger.contrib.client.requests import Client
from pyswagger.utils import jp_compose
```

%% Cell type:markdown id: tags:

Create client and app

%% Cell type:code id: tags:

``` python
app = App.create(url='https://webarchiv.onb.ac.at/api/swagger.json')
client = Client()
```

%% Cell type:markdown id: tags:

Add the missing support for JSON-LD by registering the existing JSON codec for `application/ld+json`

%% Cell type:code id: tags:

``` python
app.mime_codec.register('application/ld+json', app.mime_codec._codecs['application/json'])
```

%% Cell type:markdown id: tags:

List operations

%% Cell type:code id: tags:

``` python
app.op
```

%% Output

    {'welcome!##!welcome': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8033160>,
     'snapshot!##!getSnapshot': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8021eb8>,
     'search!##!searchhistogram': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8021be0>,
     'search!##!searchcapturegroup': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8021898>,
     'search!##!searchdomaingroup': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc80215c0>,
     'search!##!searchDomainname': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8021320>,
     'search!##!killSearchRequest': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc80210f0>,
     'search!##!getWaybackCalheatmapSearchRequestStatus': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8010dd8>,
     'search!##!getFulltextsearchRequestStatus': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8010b00>,
     'search!##!searchWayback': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8010860>,
     'search!##!searchFulltext': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8010588>,
     'savepage!##!send': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc80103c8>,
     'authentication!##!authenticate': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc80100b8>}

%% Cell type:markdown id: tags:

Authenticate

%% Cell type:code id: tags:

``` python
r = client.request(app.op['authenticate'](body={
    'apikey': API_KEY,
    'fingerprint': '1234'
}))
r.status
```

%% Output

    201

%% Cell type:code id: tags:

``` python
r.data
```

%% Output

    {'apikey': 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ',
     'fingerprint': '1234',
     'timestamp': 1555515632891,
     't': '7cf715f4487b1ace3eacf19bf3febda27f854819',
     'version': '0.1.0',
     '@context': 'https://webarchiv.onb.ac.at/contexts/authenticate.jsonld'}

%% Cell type:code id: tags:

``` python
token = r.data['t']
```

%% Cell type:markdown id: tags:

Search for domain name

%% Cell type:code id: tags:

``` python
r = client.request(app.op['searchDomainname'](
    apikey=API_KEY,
    t=token,
    q='wien'
))
r.status
```

%% Output

    200

%% Cell type:code id: tags:

``` python
r.data
```

%% Output

    {'hits': [{'value': 'wieno.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wien.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wien1.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wiener.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wien-wien.at',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wiengut.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wienmed.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wienwin.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wiental.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wieners.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'}],
     'searchstring': 'wien',
     'requestid': '',
     'message': '',
     'returncode': 0,
     'total': 35101,
     'type': 1,
     'took': 37,
     'version': '0.1.0',
     'context': 'https://webarchiv.onb.ac.at/contexts/domainnamesearchresult.jsonld'}

%% Cell type:markdown id: tags:

## Variant 4: Use `webarchiv.py` from the ONB Labs

`webarchiv.py` is part of this repository. It makes extensive use of `requests`.

If you need the direct download link:

[https://labs.onb.ac.at/gitlab/labs-team/webarchive-api/raw/master/webarchiv.py?inline=false](https://labs.onb.ac.at/gitlab/labs-team/webarchive-api/raw/master/webarchiv.py?inline=false)

%% Cell type:code id: tags:

``` python
import webarchiv
```

%% Cell type:markdown id: tags:

Authentication is handled automatically by the session

%% Cell type:code id: tags:

``` python
session = webarchiv.WebarchivSession(API_KEY)
session = webarchiv.WebarchivSession(API_KEY, allow_tracking=True)
```
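
%% Cell type:markdown id: tags:

The `allow_tracking` option controls which fingerprint is sent during authentication. Judging from the change to `webarchiv.py` in this commit, the fingerprint is the SHA-256 hex digest of the machine's hardware address when tracking is allowed, and an empty string otherwise. The sketch below mirrors that logic in a standalone function. `make_fingerprint` and its `node` parameter are names invented for this illustration and are not part of `webarchiv.py`.

%% Cell type:code id: tags:

```python
import hashlib
import uuid


def make_fingerprint(allow_tracking, node=None):
    """Return a hashed hardware address, or '' when tracking is not allowed."""
    if not allow_tracking:
        return ''
    if node is None:
        # uuid.getnode() returns the hardware address as an integer; Python may
        # fall back to a random number if no address can be determined.
        node = uuid.getnode()
    return hashlib.sha256(str(node).encode('utf-8')).hexdigest()


print(repr(make_fingerprint(False)))
print(len(make_fingerprint(True)))
```

Because only the hash leaves the machine, the server can recognize a returning client without seeing the raw hardware address.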

%% Cell type:markdown id: tags:

Search for domain name

%% Cell type:code id: tags:

``` python
r = session.domain_name_search('wien')
r.status_code
```

%% Output

    200

%% Cell type:code id: tags:

``` python
r.json()
```

%% Output

    {'hits': [{'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wieno.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien1.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiener.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-wien.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiengut.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienmed.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienwin.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiental.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wieners.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien24.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'h-m.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'f-u-c-k.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'b-z.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'h-d.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'm-k.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 's-v-h.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'v-i-p.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'a-z.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'i.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u-4.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'v-1.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'p-7.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'z-u-g.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u-d-o.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'f-w.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 's-k.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'h-i-p.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'akh-wien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'gkk-wien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'hno-wien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'seo-wien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienfoto.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wientaxi.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienwahl.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wieninfo.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiener-gkk.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienwert.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienview.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerin.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u1.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u5.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u2.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u6.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u4.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u3.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'a1.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienclean.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienergkk.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienliebe.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienguide.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienkarte.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienscout.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienhotel.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienfluss.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-haus.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-wahl.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'f2f.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'e4b.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'a2z.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'b2b.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-2.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-6.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-7.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'm2m.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'm4j.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'c-sk.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-3.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-9.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'b-it.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'se-a.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-1.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-4.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-8.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'c2b.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'h-a-c-wien.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': '24-7.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'e-wien.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-5.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'i2c.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u-wien.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-x.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienbibliothek.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiener-biene.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiener-madln.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienergebietskrankenkassegesundheitsverbund.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerjugendstil.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerlinien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerphilharmoniker.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerstaedtischeversicherung.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienfuehrung.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiendomain.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienergesundheitsverbund.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienernaschmarkt.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienersalon.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerwein.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerwirtschaft.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerwohnen.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienflughafentaxi.wien'}],
     'searchstring': 'wien',
     'context': 'https://webarchiv.onb.ac.at/contexts/domainnamesearchresult.jsonld',
     'requestid': '',
     'message': '',
     'returncode': 0,
     'total': 35101,
     'type': 1,
     'took': 138,
     'version': '0.1.0'}

%% Cell type:markdown id: tags:

Available access methods for `WebarchivSession`

%% Cell type:code id: tags:

``` python
help(session)
```

%% Output

    Help on WebarchivSession in module webarchiv object:
    
    class WebarchivSession(builtins.object)
     |  WebarchivSession(api_key)
     |
     |  Methods defined here:
     |
     |  __init__(self, api_key)
     |      Initialize self.  See help(type(self)) for accurate signature.
     |
     |  connect(self)
     |      Connect to the Webarchive API, request and save a token.
     |
     |  domain_name_search(self, query_string, page_=1, pagesize_=100)
     |      Start a domain name search in the Webarchive.
     |      The current status of running queries can be read via status_open_queries().
     |
     |      :param query_string: String to search for
     |      :param page_: The page number parameter works with the page size parameter to control the offset of the records returned in the results. Default value is 1
     |      :param pagesize_: The page size parameter works with the page number parameter to control the offset of the records returned in the results. It also controls how many results are returned with each request. Default value is 10
     |      :return: result as json
     |
     |  fulltext_search(self, query_string, from_=None, to_=None)
     |      Start a fulltext search query in the Webarchive.
     |      The current status of running queries can be read via status_open_queries().
     |
     |      :param query_string: String to search for
     |      :param from_: Optional earliest date bound for the search
     |        in the format YYYYMM.
     |      :param to_: Optional latest date bound for the search
     |        in the format YYYYMM.
     |      :return: None
     |
     |  getSnapshotUrl(self, seed, capture, onlysvg)
     |
     |  histogram_search(self, query_string, interval_=3, from_=None, to_=None)
     |      Start a domain name search in the Webarchive.
     |      The current status of running queries can be read via status_open_queries().
     |
     |      :param query_string: String to search for
     |      :param page_: The page number parameter works with the page size parameter to control the offset of the records returned in the results. Default value is 1
     |      :param pagesize_: The page size parameter works with the page number parameter to control the offset of the records returned in the results. It also controls how many results are returned with each request. Default value is 10
     |      :return: result as json
     |
     |  savePage(self, url)
     |
     |  status_query(self, resp)
     |      this is the pollingrequest for the given typen of request
     |
     |      :param response: String to search for
     |      :return: response
     |
     |  waitForResponse(self, response)
     |      Polls until the server responds with a result
     |
     |      :param response: String to search for
     |      :return: response
     |
     |  wayback_search(self, query_string, from_=None, to_=None)
     |      Start a wayback search query in the Webarchive.
     |      The current status of running queries can be read via status_open_queries().
     |
     |      :param query_string: String to search for
     |      :param from_: Optional earliest date bound for the search
     |        in the format YYYYMM.
     |      :param to_: Optional latest date bound for the search
     |        in the format YYYYMM.
     |      :return: None
     |
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |
     |  __dict__
     |      dictionary for instance variables (if defined)
     |
     |  __weakref__
     |      list of weak references to the object (if defined)
     |
     |  api_path
     |      Protocol, domain and path prefix for the Webarchive API,
     |      with a single positional format string placeholder
     |      for the REST operation and parameters.
     |
     |  base_url
     |      Protocol, domain and path prefix for the Webarchive API,
     |      with a single positional format string placeholder
     |      for the REST operation and parameters.
     |
     |  version
     |      Current protocol version
    

%% Cell type:markdown id: tags:

More samples using `webarchiv.py`:

[https://labs.onb.ac.at/gitlab/labs-team/webarchive-api](https://labs.onb.ac.at/gitlab/labs-team/webarchive-api)
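
%% Cell type:markdown id: tags:

The help text above says that `waitForResponse` polls until the server responds with a result. How `webarchiv.py` decides that a result is final is not shown here, so the following is only a generic polling sketch: `poll_until`, the `done` field and the fake responses are all hypothetical, and the caller supplies both the request function and the completion test.

%% Cell type:code id: tags:

```python
import time


def poll_until(fetch, is_done, interval=1.0, timeout=60.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_done(result) is true or the timeout expires."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if clock() >= deadline:
            raise TimeoutError('no final result before the timeout')
        sleep(interval)


# Offline demonstration with fake responses; 'done' is a made-up field.
fake_responses = iter([{'done': False}, {'done': False}, {'done': True, 'hits': [1]}])
final = poll_until(lambda: next(fake_responses), lambda r: r['done'], interval=0)
print(final)
```

In real use, `fetch` would issue the status request for a given `requestid`, and `is_done` would inspect whatever field the API uses to signal completion.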
+15 −3
import sys
import time
import requests
import uuid
import hashlib
from requests import HTTPError

_datetime_format_string = '%Y%m%d%H%M%S'
@@ -45,8 +47,9 @@ class WebarchivSession:
        """
         return 'HTTP ERROR - status code {status_code}\n----\n{response_text}\n----\n\n'

-    def __init__(self, api_key):
+    def __init__(self, api_key, allow_tracking=False):
         self.api_key = api_key
+        self.allow_tracking = allow_tracking
         self.token = None

     def connect(self):
@@ -59,12 +62,21 @@ class WebarchivSession:
             self._display_http_error(e)

     def _authenticate(self):
+        if self.allow_tracking:
+            from uuid import getnode as get_mac
+            mac = get_mac()
+            sha256 = hashlib.sha256()
+            sha256.update(str(mac).encode('utf-8'))
+            fingerprint = sha256.hexdigest()
+        else:
+            fingerprint = ''
+
         r = requests.post(self.base_url.format('authentication'),
                           data='''{{
                               "apikey": "{api_key}",
-                              "fingerprint": "string",
+                              "fingerprint": "{fingerprint}",
                               "version": "{version}"
-                          }}'''.format(api_key=self.api_key, version=self.version),
+                          }}'''.format(api_key=self.api_key, version=self.version, fingerprint=fingerprint),
                           headers={
                               'content-type': 'application/json',
                               'accept': 'application/ld+json'