Commit 7148422a authored by Stefan Karner

Add stubs for 4.2 and 4.3; start 4.4; start 3.2

parent 2158121e
+26 −0
%% Cell type:markdown id: tags:

# Free Stuff For Devs

**Use Images, Text, Webarchive and Catalogue Data from the Austrian National Library in Jupyter Notebooks**

**Workshop 2019-05-03 - PyDays19**

[https://labs.onb.ac.at](https://labs.onb.ac.at)

*Georg Petz, Stefan Karner - Austrian National Library*

%% Cell type:markdown id: tags:

# 1 - Overview

[https://labs.onb.ac.at](https://labs.onb.ac.at)

%% Cell type:markdown id: tags:

### What's gonna happen here?

* Part 1: Overview
  * What's this all about?
  * Who are these clowns?
  * What do I need?
* Part 2: Metadata and Catalogue
* Part 3: Images and Text
* Part 4: Webarchive

%% Cell type:code id: tags:

``` python
```
+98 −17
%% Cell type:markdown id: tags:

# 3 - Images and Text

[https://labs.onb.ac.at/en/tool/sacha/](https://labs.onb.ac.at/en/tool/sacha/)

[https://labs.onb.ac.at/en/dataset/akon/](https://labs.onb.ac.at/en/dataset/akon/)

[https://labs.onb.ac.at/en/dataset/anno/](https://labs.onb.ac.at/en/dataset/anno/)

%% Cell type:markdown id: tags:

### In this block

* Overview IIIF
* Overview OCR formats

%% Cell type:markdown id: tags:

* What is IIIF?
    * International Image Interoperability Framework ([http://iiif.io/](http://iiif.io/))
    * standardised method of describing and delivering images over the web
    * community that develops APIs and implements them in software
* Example: Create IIIF collection from SPARQL query result
* Example: Download pre-downsized images for machine learning
* Example: Download OCR text

%% Cell type:markdown id: tags:

![iiif apis](./media/api_puzzle_pieces.png)
## Overview IIIF

[http://iiif.io/](http://iiif.io/)

%% Cell type:markdown id: tags:

* IIIF Image API
    * requesting and delivering images on the Web
    * [https://iiif.io/api/image/2.1/](https://iiif.io/api/image/2.1/)
    * Image Request URI Syntax
        * {scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
        * [http://iiif.onb.ac.at/images/ABO/Z196807705/00000009/full/full/0/native.jpg](http://iiif.onb.ac.at/images/ABO/Z196807705/00000009/full/full/0/native.jpg)
        * 90 degree rotation: [http://iiif.onb.ac.at/images/ABO/Z196807705/00000009/full/full/90/native.jpg](http://iiif.onb.ac.at/images/ABO/Z196807705/00000009/full/full/90/native.jpg)
        * "Articulus Quartus": [https://iiif.onb.ac.at/images/ABO/Z196807705/00000009/pct:0,0,100,33/full/0/native.jpg](https://iiif.onb.ac.at/images/ABO/Z196807705/00000009/pct:0,0,100,33/full/0/native.jpg)
### What is IIIF?

%% Cell type:markdown id: tags:

* International Image Interoperability Framework ([http://iiif.io/](http://iiif.io/) - well written, worth a read)
* Standardised method of **describing and delivering images over the web**
* Community that develops APIs and implements them in software

%% Cell type:markdown id: tags:

<img src="./media/api_puzzle_pieces.png" style="max-height: 500px;" />

*Image courtesy of [https://github.com/IIIF/training](https://github.com/IIIF/training), CC-BY 4.0*

%% Cell type:markdown id: tags:

### Why would I use this?

%% Cell type:markdown id: tags:

#### If you want to display images

* If you want to use one of several nice viewers for images (zoom, rotate, fullscreen out of the box)
* If you want to include image data hosted elsewhere

%% Cell type:markdown id: tags:

#### If you want to process images

* If you want structured access to potentially huge sets of images
* If you want included metadata
* If you want to resize images *before* downloading

%% Cell type:markdown id: tags:

### How would I use this?

%% Cell type:markdown id: tags:

* You could access an **image** directly (Image API)
  * Parameters can be changed in the URL
  * [https://iiif.onb.ac.at/images/AKON/AK035_199/199/full/full/0/native.jpg](https://iiif.onb.ac.at/images/AKON/AK035_199/199/full/full/0/native.jpg)


%% Cell type:markdown id: tags:

* You could get a **manifest** JSON (Presentation API)
  * Contains images and metadata
  * [https://iiif.onb.ac.at/presentation/AKON/AK035_199/manifest/](https://iiif.onb.ac.at/presentation/AKON/AK035_199/manifest/)


%% Cell type:markdown id: tags:

* You could get a **collection** JSON (Presentation API)
  * Contains manifests and possibly other collections
  * [https://iiif.onb.ac.at/presentation/collection/pydays19](https://iiif.onb.ac.at/presentation/collection/pydays19)

%% Cell type:markdown id: tags:

### Pics or didn't happen!

%% Cell type:markdown id: tags:

* The ONB Labs viewers use IIIF: [https://labs.onb.ac.at/en/dataset/akon/](https://labs.onb.ac.at/en/dataset/akon/)

**TODO**: Available viewers, available data sources (europeana, ?), applications

%% Cell type:code id: tags:

``` python
```

%% Cell type:markdown id: tags:


%% Cell type:markdown id: tags:


%% Cell type:markdown id: tags:

* IIIF Presentation API
    * returns JSON-LD structured documents that together describe the structure and layout of a digitized object or other collection of images and related content
    * [https://iiif.io/api/presentation/2.1/](https://iiif.io/api/presentation/2.1/)
    * [https://iiif.onb.ac.at/presentation/ABO/+Z196807705/manifest/](https://iiif.onb.ac.at/presentation/ABO/+Z196807705/manifest/)
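
A manifest is plain JSON, so pulling the image URLs out of it takes only a few loops. The sketch below assumes the Presentation API 2.1 layout (`sequences` → `canvases` → `images` → `resource`); the `sample_manifest` is made up for illustration and is not the actual ONB response, and `canvas_image_urls` is a hypothetical helper name.

```python
def canvas_image_urls(manifest):
    """Collect the image resource URLs of every canvas in a IIIF 2.x manifest."""
    urls = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image.get("resource", {})
                if "@id" in resource:
                    urls.append(resource["@id"])
    return urls


# Hand-written stand-in for a manifest (illustrative only, not real ONB data)
sample_manifest = {
    "@type": "sc:Manifest",
    "label": "Sample object",
    "sequences": [{
        "canvases": [
            {"label": "1", "images": [{"resource": {
                "@id": "https://example.org/iiif/page1/full/full/0/native.jpg"}}]},
            {"label": "2", "images": [{"resource": {
                "@id": "https://example.org/iiif/page2/full/full/0/native.jpg"}}]},
        ]
    }],
}

image_urls = canvas_image_urls(sample_manifest)
for url in image_urls:
    print(url)
```

With a live manifest you would pass the parsed JSON of the manifest URL (for example via `requests.get(url).json()`) instead of `sample_manifest`.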


%% Cell type:markdown id: tags:

* Example download OCR text
* Example download pre-resized images for machine learning
* Example create IIIF collection from SPARQL query result
* IIIF Image API
    * requesting and delivering images on the Web
    * [https://iiif.io/api/image/2.1/](https://iiif.io/api/image/2.1/)
    * Image Request URI Syntax
        * {scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
        * [http://iiif.onb.ac.at/images/ABO/Z196807705/00000009/full/full/0/native.jpg](http://iiif.onb.ac.at/images/ABO/Z196807705/00000009/full/full/0/native.jpg)
        * 90 degree rotation: [http://iiif.onb.ac.at/images/ABO/Z196807705/00000009/full/full/90/native.jpg](http://iiif.onb.ac.at/images/ABO/Z196807705/00000009/full/full/90/native.jpg)
        * "Articulus Quartus": [https://iiif.onb.ac.at/images/ABO/Z196807705/00000009/pct:0,0,100,33/full/0/native.jpg](https://iiif.onb.ac.at/images/ABO/Z196807705/00000009/pct:0,0,100,33/full/0/native.jpg)
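
The URL syntax above is easy to assemble in code. This is a minimal sketch, not an official client: `iiif_image_url` is a hypothetical helper, and treating `images` as the prefix and `ABO/Z196807705/00000009` as the identifier is an assumption about how the ONB URLs split. The `size="256,"` form (fixed width, proportional height) comes from the Image API 2.1 spec; whether a given server supports it is not guaranteed.

```python
def iiif_image_url(server, prefix, identifier, region="full", size="full",
                   rotation=0, quality="native", fmt="jpg", scheme="https"):
    """Assemble a IIIF Image API request URL from its path components."""
    return (f"{scheme}://{server}/{prefix}/{identifier}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


page = "ABO/Z196807705/00000009"  # assumed identifier split, see lead-in

full_page = iiif_image_url("iiif.onb.ac.at", "images", page)
rotated = iiif_image_url("iiif.onb.ac.at", "images", page, rotation=90)
top_third = iiif_image_url("iiif.onb.ac.at", "images", page,
                           region="pct:0,0,100,33")
# Ask the server for a downsized version instead of resizing locally
thumbnail = iiif_image_url("iiif.onb.ac.at", "images", page, size="256,")

for url in (full_page, rotated, top_third, thumbnail):
    print(url)
```

Requesting a small `size` is what makes "resize before downloading" possible: the server does the scaling and only the small image travels over the network.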


%% Cell type:code id: tags:

``` python
```
+643 −0

File added.

Preview size limit exceeded, changes collapsed.

+8 −3
%% Cell type:markdown id: tags:

# 4 - Webarchive

[https://webarchiv.onb.ac.at](https://webarchiv.onb.ac.at)

[https://labs.onb.ac.at/dataset/webarchive/](https://labs.onb.ac.at/dataset/webarchive/)

%% Cell type:markdown id: tags:

### In this block

* Overview Webarchive
* Overview Content
* Overview API

%% Cell type:markdown id: tags:

* Example: Interacting with the API
* Example: Wayback search via API
* Example: Full text search via API
* Example: Download preview SVG thumb of saved page

%% Cell type:markdown id: tags:

## Overview Webarchive

[https://webarchiv.onb.ac.at](https://webarchiv.onb.ac.at)

%% Cell type:markdown id: tags:

![ÖNB Webarchive Terminal](https://webarchiv.onb.ac.at/web/20170925041718/https://webarchiv.onb.ac.at/img/webarchiv_terminal1.jpg)

%% Cell type:markdown id: tags:

### What is the Webarchive Austria?

%% Cell type:markdown id: tags:

* Attempt to conserve online data for future generations
* Webarchive Austria has been crawling officially since March 2009
* All domains within `.at`, `.ac.at`, `.gv.at`, `.wien`, `.tirol`
* Selected other domains with 'Austrian content'
* About 2 million websites saved

%% Cell type:markdown id: tags:

### Who is the Webarchive Austria?

%% Cell type:markdown id: tags:

* Andreas Predikaka
* webarchiv@onb.ac.at

%% Cell type:markdown id: tags:

### What can I use?

%% Cell type:markdown id: tags:

* Websites: no public access
  * Access on premises at the ÖNB
  * Exception: onb.ac.at
* Metadata: public access
* Full text search: public access
  * Viewing the results of the full text search: no public access

%% Cell type:markdown id: tags:

### What does that mean?

%% Cell type:markdown id: tags:

* Searching outside the ÖNB gives you URLs, but not the page content

%% Cell type:markdown id: tags:

### What if I really really need to see the content?

%% Cell type:markdown id: tags:

* You can come to the ÖNB in person and use one of two offline computers...

%% Cell type:markdown id: tags:

* ...to `PRINT OUT THE INTERNET!`

%% Cell type:markdown id: tags:

![Office folders labeled 'Internet'](./media/internet-folders.jpg)

%% Cell type:markdown id: tags:

### How is a search where I don't see detailed results useful to me?

%% Cell type:markdown id: tags:

* Sometimes the content is still online
* Sometimes the Internet Archive has a copy
* You can observe the emergence of certain terms ('Westbalkanroute', 'Soldatna')

%% Cell type:markdown id: tags:

* ???

%% Cell type:markdown id: tags:

## Overview Content

[https://webarchiv.onb.ac.at](https://webarchiv.onb.ac.at)

%% Cell type:markdown id: tags:

### What's inside?

%% Cell type:markdown id: tags:

* High crawl frequency (daily or weekly)
  * Media sites (ORF)
  * Political parties
* Low crawl frequency (a few times per year)
  * Topic: Gender
  * Austrian domains (via nic.at)
* Event crawls (daily or weekly within a certain timespan)
  * Elections
  * Olympic Games
  * Refugee crisis 2015
  * Song Contest 2015

%% Cell type:markdown id: tags:

### Can I have a list?

%% Cell type:markdown id: tags:

* Sure, there you go:
  * Media, political, gender: [https://webarchiv.onb.ac.at/data/selective.json](https://webarchiv.onb.ac.at/data/selective.json)
  * Events: [https://webarchiv.onb.ac.at/data/events.json](https://webarchiv.onb.ac.at/data/events.json)
  * All domains: [https://webarchiv.onb.ac.at/data/domainnames.json](https://webarchiv.onb.ac.at/data/domainnames.json)

```json
[
  {
    "id": 37,
    "name": "Frau/Gender",
    "begin": "29.11.2016",
    "groups": [
      {
        "seeds": [
          "http://abtreibung.at/"
        ],
        "group_id": 1,
        "name": "Abtreibung.at"
      },
      {
        "seeds": [
          "http://aep.at"
        ],
        "group_id": 2,
        "name": "Arbeitskreis Emanzipation Partnerschaft"
      },
```
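
To turn such a list into something you can filter or count, flatten it into rows. A small sketch, assuming the structure shown in the excerpt above (collections with `groups`, each group with `seeds`) and that `begin` is a day.month.year date; `list_seeds` is a hypothetical helper and the inline data is just the excerpt, closed off by hand.

```python
from datetime import datetime

# The excerpt from above, completed by hand so that it is valid JSON-like data
selective = [
    {
        "id": 37,
        "name": "Frau/Gender",
        "begin": "29.11.2016",
        "groups": [
            {"seeds": ["http://abtreibung.at/"], "group_id": 1,
             "name": "Abtreibung.at"},
            {"seeds": ["http://aep.at"], "group_id": 2,
             "name": "Arbeitskreis Emanzipation Partnerschaft"},
        ],
    }
]


def list_seeds(collections):
    """Flatten collections into (collection, begin date, group, seed URL) rows."""
    rows = []
    for collection in collections:
        # Assumption: 'begin' uses the day.month.year format
        begin = datetime.strptime(collection["begin"], "%d.%m.%Y").date()
        for group in collection.get("groups", []):
            for seed in group.get("seeds", []):
                rows.append((collection["name"], begin, group["name"], seed))
    return rows


seed_rows = list_seeds(selective)
for row in seed_rows:
    print(row)
```

For the full list, load the JSON from the `selective.json` URL above and pass it to `list_seeds` in place of the inline sample.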

%% Cell type:markdown id: tags:

### How big is the Austrian Webarchive?

%% Cell type:markdown id: tags:

* About 500GiB indexed text
* About 100 million HTML documents
* Raw data: 115.28TiB uncompressed

%% Cell type:markdown id: tags:

### Where's the catch?

%% Cell type:markdown id: tags:

* Social media is currently too hard to crawl
* Limited disk space necessitates a size limit per page
  * Ex: domain crawl 10MB -> 100MB -> 7GB
* Limitations of public access
  * This applies to practically every webarchive except the Internet Archive

%% Cell type:markdown id: tags:

## Overview API

[https://webarchiv.onb.ac.at](https://webarchiv.onb.ac.at)

%% Cell type:markdown id: tags:

### How can I access the Austrian Webarchive?

%% Cell type:markdown id: tags:

* On site at the ÖNB
* Online: [https://webarchiv.onb.ac.at](https://webarchiv.onb.ac.at)
* REST API: [https://webarchiv.onb.ac.at/api.html](https://webarchiv.onb.ac.at/api.html)
  * Swagger definition: [https://webarchiv.onb.ac.at/api/swagger.json](https://webarchiv.onb.ac.at/api/swagger.json)
* Python module for easier access: [https://labs.onb.ac.at/gitlab/labs-team/webarchive-api/blob/master/webarchiv.py](https://labs.onb.ac.at/gitlab/labs-team/webarchive-api/blob/master/webarchiv.py) ([raw](https://labs.onb.ac.at/gitlab/labs-team/webarchive-api/raw/master/webarchiv.py?inline=false))

%% Cell type:markdown id: tags:

### Why is access via API useful?

%% Cell type:markdown id: tags:

* Individual searches may take up to 1 minute
* Sift through loads of metadata
* API-only goodies
  * Easily nominate pages with Austrian content to be saved
  * Download SVG thumbnails of rendered websites
* It's way more fun

%% Cell type:markdown id: tags:

* Make Andreas happy :)
+4 −4
%% Cell type:markdown id: tags:

# 4.1 - Webarchive - Interacting With The API

*Tools for accessing the Webarchive API*

%% Cell type:markdown id: tags:

* Variant 1: Exploring the API manually
* Variant 2: Generate Code from Swagger JSON
* Variant 3: Use Swagger JSON dynamically
* Variant 4: Use `webarchiv.py` from the ONB Labs

%% Cell type:markdown id: tags:

The documentation is available at [https://webarchiv.onb.ac.at/api.html#](https://webarchiv.onb.ac.at/api.html#).

%% Cell type:code id: tags:

``` python
API_KEY = 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ'
```

%% Cell type:markdown id: tags:

## Variant 1: Exploring the API manually

Take a look at [https://webarchiv.onb.ac.at/api.html#/](https://webarchiv.onb.ac.at/api.html#/) and try it out.

%% Cell type:code id: tags:

``` python
import requests

BASE_URL = 'https://webarchiv.onb.ac.at/api'
```

%% Cell type:markdown id: tags:

Let's take a look at `/welcome`

%% Cell type:code id: tags:

``` python
r = requests.get(f'{BASE_URL}/welcome')
r.json()
```

%% Output

    {'@context': 'http://schema.org/',
     '@type': 'WebAPI',
     'name': 'Webarchive Austria Search API',
     'version': '0.1.0',
     'description': 'The Webarchive Austria Search API lets you find archived webpages by Fulltext or URL. The API uses standard schema.org types and is compliant with the JSON-LD specification.',
     'documentation': 'https://webarchiv.onb.ac.at/api.html',
     'provider': {'@type': 'Organization',
      'name': 'Austrian National Library',
      'contactPoint': [{'@type': 'ContactPoint',
        'name': 'Webarchive Austria',
        'url': 'https://webarchiv.onb.ac.at'}]},
     'versions': ['0.1.0'],
     'license': 'https://creativecommons.org/publicdomain/mark/1.0/',
     'transport': 'HTTP',
     'apiProtocol': 'JSON API',
     'webApiDefinitions': [{'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/authenticate',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'POST'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/search/domainname',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/search/domainname?q={q}&page={page}&pagesize={pagesize}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/search/fulltext',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/search/fulltext?q={q}&from={from}&to={to}&maxaggs={maxaggs}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/search/fulltext/seed',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/search/fulltext/seed?q={q}&g={g}&from={from}&to={to}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/search/fulltext/capture',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/search/fulltext/capture?q={q}&g={g}&from={from}&to={to}&page={page}&pagesize={pagesize}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/search/wayback',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/search/wayback?q={q}&from={from}&to={to}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/status/fulltext',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/status/fulltext?requestid={requestid}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/status/wayback',
       'urlTemplate': 'https://webarchiv.onb.ac.at/api/status/wayback?requestid={requestid}&t={t}&apikey={apikey}',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'GET'},
      {'@type': 'EntryPoint',
       'url': 'https://webarchiv.onb.ac.at/api/status/kill',
       'encodingType': 'application/json',
       'contentType': 'application/ld+json',
       'httpMethod': 'DELETE'}]}
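
That is a lot of output to read by eye. A short sketch of how the `webApiDefinitions` list could be condensed into one line per endpoint, including the query parameters named in each `urlTemplate`. The `welcome` dict below is a trimmed copy of the response above (three of the entry points only), and `describe_entry_points` is a hypothetical helper.

```python
import re

# Trimmed copy of the /welcome response shown above
welcome = {
    "name": "Webarchive Austria Search API",
    "webApiDefinitions": [
        {"url": "https://webarchiv.onb.ac.at/api/authenticate",
         "httpMethod": "POST"},
        {"url": "https://webarchiv.onb.ac.at/api/search/domainname",
         "urlTemplate": "https://webarchiv.onb.ac.at/api/search/domainname"
                        "?q={q}&page={page}&pagesize={pagesize}&t={t}&apikey={apikey}",
         "httpMethod": "GET"},
        {"url": "https://webarchiv.onb.ac.at/api/status/kill",
         "httpMethod": "DELETE"},
    ],
}


def describe_entry_points(welcome):
    """Return (HTTP method, URL, template parameter names) per entry point."""
    rows = []
    for entry in welcome.get("webApiDefinitions", []):
        # Placeholders look like {q}; entries without a template yield []
        params = re.findall(r"\{(\w+)\}", entry.get("urlTemplate", ""))
        rows.append((entry["httpMethod"], entry["url"], params))
    return rows


endpoints = describe_entry_points(welcome)
for method, url, params in endpoints:
    print(f"{method:6} {url}  params={params}")
```

In the notebook you could call `describe_entry_points(r.json())` on the real response instead of the trimmed copy.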

%% Cell type:markdown id: tags:

We need a fingerprint and a valid API key.
A key has been generated for PyDays19.

%% Cell type:code id: tags:

``` python
import uuid

FINGERPRINT = str(uuid.uuid4())
API_KEY = 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ'

FINGERPRINT, API_KEY
```

%% Output

    ('c941f5c6-c97b-4f75-bf7d-5419df62cf5f', 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ')

%% Cell type:markdown id: tags:

We need to authenticate first in order to get a valid token.

%% Cell type:code id: tags:

``` python
auth_r = requests.post(f'{BASE_URL}/authentication', json={
    'apikey': API_KEY,
    'version': '0.1.0',
    'fingerprint': FINGERPRINT
})
auth_r.status_code
```

%% Output

    201

%% Cell type:code id: tags:

``` python
auth_r.json()
```

%% Output

    {'@context': 'https://webarchiv.onb.ac.at/contexts/authenticate.jsonld',
     'apikey': 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ',
     'fingerprint': 'c941f5c6-c97b-4f75-bf7d-5419df62cf5f',
     'timestamp': 1556089763561,
     't': '9defd49246b9e8c36202ce33d6a43e268530996a',
     'version': '0.1.0'}

%% Cell type:code id: tags:

``` python
token = auth_r.json()['t']
```

%% Cell type:markdown id: tags:

Now we can submit other requests, a search for example.

%% Cell type:code id: tags:

``` python
search_r = requests.get(f'{BASE_URL}/search/domainname', params={
    'apikey': API_KEY,
    't': token,
    'q': 'wien'
})
search_r.status_code
```

%% Output

    200

%% Cell type:code id: tags:

``` python
search_r.json()
```

%% Output

    {'hits': [{'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wieno.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien1.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiener.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-wien.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiengut.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienmed.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienwin.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiental.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wieners.wien'}],
     'searchstring': 'wien',
     'context': 'https://webarchiv.onb.ac.at/contexts/domainnamesearchresult.jsonld',
     'requestid': '',
     'message': '',
     'returncode': 0,
     'total': 35101,
     'type': 1,
     'took': 427,
     'version': '0.1.0'}
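
Only 10 of the 35101 hits come back in one response. The `urlTemplate` for `/search/domainname` lists `page` and `pagesize` parameters, so the remaining hits can presumably be fetched page by page. The following is a sketch under assumptions: `iter_domain_hits` is a hypothetical helper, the number of the first page (`start_page=1`) is a guess that should be checked against the API documentation, and the fetching is passed in as a function so the loop can be tried offline with fake data.

```python
def iter_domain_hits(fetch_page, pagesize=10, start_page=1, max_pages=None):
    """Yield hit values page by page until all 'total' hits have been seen.

    fetch_page(page, pagesize) must return a dict shaped like the
    domain name search result above ('hits' and 'total' keys).
    """
    page = start_page
    seen = 0
    while max_pages is None or page - start_page < max_pages:
        result = fetch_page(page, pagesize)
        hits = result.get("hits", [])
        if not hits:
            break
        for hit in hits:
            yield hit["value"]
        seen += len(hits)
        if seen >= result.get("total", 0):
            break
        page += 1


# Offline stand-in for the API so the loop can be run without a token
FAKE_DOMAINS = ["wieno.wien", "wien.wien", "wien1.wien",
                "wiener.wien", "wien-wien.at"]


def fake_fetch(page, pagesize):
    start = (page - 1) * pagesize
    chunk = FAKE_DOMAINS[start:start + pagesize]
    return {"hits": [{"value": d} for d in chunk], "total": len(FAKE_DOMAINS)}


all_domains = list(iter_domain_hits(fake_fetch, pagesize=2))
first_page_only = list(iter_domain_hits(fake_fetch, pagesize=2, max_pages=1))
print(all_domains)
print(first_page_only)
```

Against the real API, `fetch_page` would wrap the `requests.get` call from above and add `page` and `pagesize` to `params`. Setting `max_pages` keeps the number of requests bounded while experimenting.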

%% Cell type:markdown id: tags:

## Variant 2: Generate Code from Swagger JSON

We use the online generator at [https://generator.swagger.io/](https://generator.swagger.io/).

%% Cell type:code id: tags:

``` python
import io
import zipfile
import shutil

def generate_swagger_client():
    # Generate Python Client
    generated_r = requests.post('https://generator.swagger.io/api/gen/clients/python', json={
        'swaggerUrl': 'https://webarchiv.onb.ac.at/api/swagger.json'
    })
    generated_r.raise_for_status()
    link = generated_r.json()['link']
    # Download ZIP with generated client
    zip_r = requests.get(link)
    zip_r.raise_for_status()
    # Open and extract
    zip_file = zipfile.ZipFile(io.BytesIO(zip_r.content))
    zip_file.extractall()
    # Move package to working directory and clean up
    shutil.move('python-client/swagger_client', 'swagger_client')
    shutil.rmtree('python-client')
```

%% Cell type:code id: tags:

``` python
import swagger_client
```

%% Cell type:markdown id: tags:

Set base URL

%% Cell type:code id: tags:

``` python
client = swagger_client.ApiClient()
client.configuration.host = 'https://webarchiv.onb.ac.at/api'
```

%% Cell type:markdown id: tags:

Authenticate

%% Cell type:code id: tags:

``` python
auth_obj = swagger_client.Authenticate(apikey=API_KEY, fingerprint=str(uuid.uuid4()))
aa = swagger_client.AuthenticationApi(client)
auth_r = aa.authenticate(body=auth_obj)
auth_r
```

%% Output

    {'apikey': 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ',
     'fingerprint': '635fbeae-50d5-4df7-8372-7bc93bcbec74',
     't': 'b831ef03103dd7bb74838e0678e7d2bf2aaef809',
     'timestamp': 1555515615761,
     'version': '0.1.0'}

%% Cell type:code id: tags:

``` python
token = auth_r.t
```

%% Cell type:markdown id: tags:

Search for domain name

%% Cell type:code id: tags:

``` python
search_api = swagger_client.SearchApi(client)
search_r = search_api.search_domainname(q='wien', t=token, apikey=API_KEY)
```

%% Cell type:code id: tags:

``` python
search_r
```

%% Output

    {'hits': [{'value': 'wieno.wien'},
              {'value': 'wien.wien'},
              {'value': 'wien1.wien'},
              {'value': 'wiener.wien'},
              {'value': 'wien-wien.at'},
              {'value': 'wiengut.wien'},
              {'value': 'wienmed.wien'},
              {'value': 'wienwin.wien'},
              {'value': 'wiental.wien'},
              {'value': 'wieners.wien'}],
     'message': '',
     'requestid': '',
     'returncode': 0,
     'searchstring': 'wien',
     'took': 615,
     'total': 35101,
     'type': 1,
     'version': '0.1.0'}

%% Cell type:markdown id: tags:

## Variant 3: Use Swagger JSON dynamically

Uses package [`pyswagger`](https://github.com/pyopenapi/pyswagger)

%% Cell type:code id: tags:

``` python
from pyswagger import App
from pyswagger.contrib.client.requests import Client
from pyswagger.utils import jp_compose
```

%% Cell type:markdown id: tags:

Create client and app

%% Cell type:code id: tags:

``` python
app = App.create(url='https://webarchiv.onb.ac.at/api/swagger.json')
client = Client()
```

%% Cell type:markdown id: tags:

Add missing support for JSON-LD

%% Cell type:code id: tags:

``` python
app.mime_codec.register('application/ld+json', app.mime_codec._codecs['application/json'])
```

%% Cell type:markdown id: tags:

List operations

%% Cell type:code id: tags:

``` python
app.op
```

%% Output

    {'welcome!##!welcome': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8033160>,
     'snapshot!##!getSnapshot': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8021eb8>,
     'search!##!searchhistogram': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8021be0>,
     'search!##!searchcapturegroup': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8021898>,
     'search!##!searchdomaingroup': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc80215c0>,
     'search!##!searchDomainname': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8021320>,
     'search!##!killSearchRequest': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc80210f0>,
     'search!##!getWaybackCalheatmapSearchRequestStatus': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8010dd8>,
     'search!##!getFulltextsearchRequestStatus': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8010b00>,
     'search!##!searchWayback': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8010860>,
     'search!##!searchFulltext': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc8010588>,
     'savepage!##!send': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc80103c8>,
     'authentication!##!authenticate': <pyswagger.spec.v2_0.objects.Operation at 0x7f9fc80100b8>}

%% Cell type:markdown id: tags:

Authenticate

%% Cell type:code id: tags:

``` python
r = client.request(app.op['authenticate'](body={
    'apikey': API_KEY,
    'fingerprint': '1234'
}))
r.status
```

%% Output

    201

%% Cell type:code id: tags:

``` python
r.data
```

%% Output

    {'apikey': 'wGdLmWMlaM2V6j73V9zS0KHqBgfG67vJ',
     'fingerprint': '1234',
     'timestamp': 1555515632891,
     't': '7cf715f4487b1ace3eacf19bf3febda27f854819',
     'version': '0.1.0',
     '@context': 'https://webarchiv.onb.ac.at/contexts/authenticate.jsonld'}

%% Cell type:code id: tags:

``` python
token = r.data['t']
```

%% Cell type:markdown id: tags:

Search for domain name

%% Cell type:code id: tags:

``` python
r = client.request(app.op['searchDomainname'](
    apikey=API_KEY,
    t=token,
    q='wien'
))
r.status
```

%% Output

    200

%% Cell type:code id: tags:

``` python
r.data
```

%% Output

    {'hits': [{'value': 'wieno.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wien.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wien1.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wiener.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wien-wien.at',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wiengut.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wienmed.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wienwin.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wiental.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'},
      {'value': 'wieners.wien',
       'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld'}],
     'searchstring': 'wien',
     'requestid': '',
     'message': '',
     'returncode': 0,
     'total': 35101,
     'type': 1,
     'took': 37,
     'version': '0.1.0',
     'context': 'https://webarchiv.onb.ac.at/contexts/domainnamesearchresult.jsonld'}

%% Cell type:markdown id: tags:

## Variant 4: Use webarchiv.py from the ONB Labs

`webarchiv.py` is part of this repository. It makes extensive use of `requests`.

If you need the direct download link:

[https://labs.onb.ac.at/gitlab/labs-team/webarchive-api/raw/master/webarchiv.py?inline=false](https://labs.onb.ac.at/gitlab/labs-team/webarchive-api/raw/master/webarchiv.py?inline=false)

%% Cell type:code id: tags:

``` python
import webarchiv
```

%% Cell type:markdown id: tags:

Authentication is automatic

%% Cell type:code id: tags:

``` python
session = webarchiv.WebarchivSession(API_KEY)
```

%% Cell type:markdown id: tags:

Search for domain name

%% Cell type:code id: tags:

``` python
r = session.domain_name_search('wien')
r.status_code
```

%% Output

    200

%% Cell type:code id: tags:

``` python
r.json()
```

%% Output

    {'hits': [{'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wieno.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien1.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiener.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-wien.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiengut.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienmed.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienwin.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiental.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wieners.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien24.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'h-m.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'f-u-c-k.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'b-z.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'h-d.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'm-k.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 's-v-h.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'v-i-p.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'a-z.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'i.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u-4.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'v-1.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'p-7.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'z-u-g.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u-d-o.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'f-w.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 's-k.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'h-i-p.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'akh-wien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'gkk-wien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'hno-wien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'seo-wien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienfoto.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wientaxi.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienwahl.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wieninfo.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiener-gkk.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienwert.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienview.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerin.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u1.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u5.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u2.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u6.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u4.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u3.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'a1.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienclean.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienergkk.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienliebe.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienguide.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienkarte.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienscout.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienhotel.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienfluss.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-haus.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-wahl.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'f2f.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'e4b.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'a2z.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'b2b.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-2.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-6.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-7.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'm2m.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'm4j.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'c-sk.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-3.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-9.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'b-it.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'se-a.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-1.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-4.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-8.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'c2b.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'h-a-c-wien.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': '24-7.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'e-wien.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-5.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'i2c.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'u-wien.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wien-x.at'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienbibliothek.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiener-biene.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiener-madln.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienergebietskrankenkassegesundheitsverbund.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerjugendstil.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerlinien.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerphilharmoniker.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerstaedtischeversicherung.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienfuehrung.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wiendomain.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienergesundheitsverbund.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienernaschmarkt.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienersalon.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerwein.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerwirtschaft.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienerwohnen.wien'},
      {'context': 'https://webarchiv.onb.ac.at/contexts/dnhit.jsonld',
       'value': 'wienflughafentaxi.wien'}],
     'searchstring': 'wien',
     'context': 'https://webarchiv.onb.ac.at/contexts/domainnamesearchresult.jsonld',
     'requestid': '',
     'message': '',
     'returncode': 0,
     'total': 35101,
     'type': 1,
     'took': 138,
     'version': '0.1.0'}
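
%% Cell type:markdown id: tags:

The `total` field in the response above reports 35101 matching domain names, far more than one request returns. The following is a minimal sketch of how many pages would be needed to list them all, assuming one page holds `pagesize_` hits (100 by default according to the `domain_name_search` signature). `page_count` is a hypothetical helper for illustration, not part of `webarchiv.py`.

%% Cell type:code id: tags:

```python
import math


def page_count(total, pagesize=100):
    """Return how many pages are needed to list `total` hits.

    Hypothetical helper, not part of webarchiv.py. Assumes every page
    except possibly the last one holds exactly `pagesize` hits.
    """
    if pagesize <= 0:
        raise ValueError("pagesize must be positive")
    return math.ceil(total / pagesize)


# 'total': 35101 is taken from the response above;
# 100 is the default value of pagesize_ in domain_name_search().
print(page_count(35101))
```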

%% Cell type:markdown id: tags:

Available access methods for `WebarchivSession`:

%% Cell type:code id: tags:

``` python
help(session)
```

%% Output

    Help on WebarchivSession in module webarchiv object:
    
    class WebarchivSession(builtins.object)
     |  WebarchivSession(api_key)
     |
     |  Methods defined here:
     |
     |  __init__(self, api_key)
     |      Initialize self.  See help(type(self)) for accurate signature.
     |
     |  connect(self)
     |      Connect to the Webarchive API, request and save a token.
     |
     |  domain_name_search(self, query_string, page_=1, pagesize_=100)
     |      Start a domain name search in the Webarchive.
     |      The current status of running queries can be read via status_open_queries().
     |
     |      :param query_string: String to search for
     |      :param page_: The page number parameter works with the page size parameter to control the offset of the records returned in the results. Default value is 1
     |      :param pagesize_: The page size parameter works with the page number parameter to control the offset of the records returned in the results. It also controls how many results are returned with each request. Default value is 10
     |      :return: result as json
     |
     |  fulltext_search(self, query_string, from_=None, to_=None)
     |      Start a fulltext search query in the Webarchive.
     |      The current status of running queries can be read via status_open_queries().
     |
     |      :param query_string: String to search for
     |      :param from_: Optional earliest date bound for the search
     |        in the format YYYYMM.
     |      :param to_: Optional latest date bound for the search
     |        in the format YYYYMM.
     |      :return: None
     |
     |  getSnapshotUrl(self, seed, capture, onlysvg)
     |
     |  histogram_search(self, query_string, interval_=3, from_=None, to_=None)
     |      Start a domain name search in the Webarchive.
     |      The current status of running queries can be read via status_open_queries().
     |
     |      :param query_string: String to search for
     |      :param page_: The page number parameter works with the page size parameter to control the offset of the records returned in the results. Default value is 1
     |      :param pagesize_: The page size parameter works with the page number parameter to control the offset of the records returned in the results. It also controls how many results are returned with each request. Default value is 10
     |      :return: result as json
     |
     |  savePage(self, url)
     |
     |  status_query(self, resp)
     |      this is the pollingrequest for the given typen of request
     |
     |      :param response: String to search for
     |      :return: response
     |
     |  waitForResponse(self, response)
     |      Polls until the server responds with a result
     |
     |      :param response: String to search for
     |      :return: response
     |
     |  wayback_search(self, query_string, from_=None, to_=None)
     |      Start a wayback search query in the Webarchive.
     |      The current status of running queries can be read via status_open_queries().
     |
     |      :param query_string: String to search for
     |      :param from_: Optional earliest date bound for the search
     |        in the format YYYYMM.
     |      :param to_: Optional latest date bound for the search
     |        in the format YYYYMM.
     |      :return: None
     |
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |
     |  __dict__
     |      dictionary for instance variables (if defined)
     |
     |  __weakref__
     |      list of weak references to the object (if defined)
     |
     |  api_path
     |      Protocol, domain and path prefix for the Webarchive API,
     |      with a single positional format string placeholder
     |      for the REST operation and parameters.
     |
     |  base_url
     |      Protocol, domain and path prefix for the Webarchive API,
     |      with a single positional format string placeholder
     |      for the REST operation and parameters.
     |
     |  version
     |      Current protocol version
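
%% Cell type:markdown id: tags:

According to the help text above, `fulltext_search` and `wayback_search` accept optional date bounds `from_` and `to_` in the format YYYYMM. The following is a minimal sketch of building those strings from Python `date` objects. `to_yyyymm` and `fulltext_search_between` are hypothetical helpers, not part of `webarchiv.py`. The sketch assumes only the `fulltext_search` signature shown in the help output and passes on whatever that method returns.

%% Cell type:code id: tags:

```python
from datetime import date


def to_yyyymm(d):
    """Format a date as YYYYMM, the format documented for from_ and to_."""
    return d.strftime("%Y%m")


def fulltext_search_between(session, query_string, start, end):
    """Run session.fulltext_search() limited to the months from start to end.

    Hypothetical helper: `session` is expected to be a connected
    WebarchivSession (or any object with a compatible fulltext_search method).
    """
    if start > end:
        raise ValueError("start must not be later than end")
    return session.fulltext_search(
        query_string,
        from_=to_yyyymm(start),
        to_=to_yyyymm(end),
    )


# Example call, assuming `session` was created and connected earlier:
# fulltext_search_between(session, "wien", date(2018, 1, 1), date(2018, 12, 31))
print(to_yyyymm(date(2018, 1, 1)))
```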
    

%% Cell type:markdown id: tags:

More samples using `webarchiv.py`:

[https://labs.onb.ac.at/gitlab/labs-team/webarchive-api](https://labs.onb.ac.at/gitlab/labs-team/webarchive-api)