{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# AKON Metadata - Data Overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Get a first impression of the postcard metadata*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook uses [pandas, the Python Data Analysis Library](https://pandas.pydata.org/).\n",
    "\n",
    "For an introduction to pandas, have a look at the [Workshop for CBioVikings](https://github.com/dblyon/PandasIntro) by David Lyon."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`df` is short for *DataFrame*, the pandas table structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/kst/tmp/dingsdi/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3049: DtypeWarning: Columns (13) have mixed types. Specify dtype option on import or set low_memory=False.\n",
      "  interactivity=interactivity, compiler=compiler, result=result)\n"
     ]
    }
   ],
   "source": [
    "df = pd.read_csv('https://labs.onb.ac.at/gitlab/labs-team/raw-metadata/raw/master/akon_postcards_public_domain.csv.bz2', compression='bz2')"
   ]
  },
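  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `DtypeWarning` above and the `Unnamed: 0` column in the outputs below can both be handled at load time. Here is a minimal sketch on a tiny in-memory CSV (two rows copied from this notebook's outputs, not the real file), assuming the unnamed first column of the AKON CSV is only a saved row index, as its values suggest. `index_col=0` turns that column into the index, and `low_memory=False` lets pandas infer each column type from the whole file, which is what the warning recommends. For the real data, the same two keywords could be added to the `pd.read_csv(...)` call above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "import pandas as pd\n",
    "\n",
    "# Tiny stand-in for the AKON CSV: an unnamed first column followed by a few data columns.\n",
    "csv_text = ''',akon_id,id,city,color,country_id\n",
    "0,AK111_021,74682,Kiel,False,DE\n",
    "28908,AK066_086,40120,Innsbruck,False,AT\n",
    "'''\n",
    "\n",
    "# index_col=0: use the unnamed first column as the index instead of a data column called 'Unnamed: 0'.\n",
    "# low_memory=False: infer each column's dtype from all rows at once rather than chunk by chunk.\n",
    "df_small = pd.read_csv(io.StringIO(csv_text), index_col=0, low_memory=False)\n",
    "print(df_small.columns.tolist())"
   ]
  },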
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## View Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Rough Overview"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many datasets are in there?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df)"
   ]
  },
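  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`len(df)` only counts rows. `DataFrame.shape` returns rows and columns together. A small sketch on a three-row stand-in frame built from values in the outputs below (only 3 of the 32 columns, not the real data):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Three records copied from this notebook's outputs, reduced to three columns.\n",
    "df_small = pd.DataFrame({\n",
    "    'akon_id': ['AK111_021', 'AK066_086', 'AK034_386'],\n",
    "    'city': ['Kiel, Blücherplatz', 'Innsbruck, Maria Theresienstrasse', 'Gars-Thunau am Kamp'],\n",
    "    'country_id': ['DE', 'AT', 'AT'],\n",
    "})\n",
    "\n",
    "# shape is a (rows, columns) tuple; its first entry equals len(df_small).\n",
    "n_rows, n_cols = df_small.shape\n",
    "print(n_rows, n_cols)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On the full data, `df.shape[1]` should match the `1 rows × 32 columns` footer shown below."
   ]
  },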
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What does a dataset look like?\n",
    "Show me the first one!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>akon_id</th>\n",
       "      <th>id</th>\n",
       "      <th>altitude</th>\n",
       "      <th>building</th>\n",
       "      <th>city</th>\n",
       "      <th>color</th>\n",
       "      <th>comment</th>\n",
       "      <th>mountain</th>\n",
       "      <th>other</th>\n",
       "      <th>...</th>\n",
       "      <th>geoname_id</th>\n",
       "      <th>latitude</th>\n",
       "      <th>longitude</th>\n",
       "      <th>name</th>\n",
       "      <th>country_id</th>\n",
       "      <th>admin_name_1</th>\n",
       "      <th>admin_code_1</th>\n",
       "      <th>geo</th>\n",
       "      <th>download_link</th>\n",
       "      <th>download_link_256x256</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>AK111_021</td>\n",
       "      <td>74682</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Kiel, Blücherplatz</td>\n",
       "      <td>False</td>\n",
       "      <td>1921 gel</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>2891122.0</td>\n",
       "      <td>54.32133</td>\n",
       "      <td>10.13489</td>\n",
       "      <td>Kiel</td>\n",
       "      <td>DE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>54.32133, 10.13489</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK111_021/0...</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK111_021/0...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1 rows × 32 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0    akon_id     id  altitude building                city  color  \\\n",
       "0           0  AK111_021  74682       NaN      NaN  Kiel, Blücherplatz  False   \n",
       "\n",
       "    comment mountain other  ... geoname_id  latitude longitude  name  \\\n",
       "0  1921 gel      NaN   NaN  ...  2891122.0  54.32133  10.13489  Kiel   \n",
       "  country_id  admin_name_1 admin_code_1                 geo  \\\n",
       "0         DE           NaN          NaN  54.32133, 10.13489   \n",
       "                                       download_link  \\\n",
       "0  https://iiif.onb.ac.at/images/AKON/AK111_021/0...   \n",
       "                               download_link_256x256  \n",
       "0  https://iiif.onb.ac.at/images/AKON/AK111_021/0...  \n",
       "[1 rows x 32 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some columns are hidden behind `...` in the output, because pandas truncates wide tables by default. Let's raise the maximum number of displayed columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.set_option('display.max_columns', 100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's try again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>akon_id</th>\n",
       "      <th>id</th>\n",
       "      <th>altitude</th>\n",
       "      <th>building</th>\n",
       "      <th>city</th>\n",
       "      <th>color</th>\n",
       "      <th>comment</th>\n",
       "      <th>mountain</th>\n",
       "      <th>other</th>\n",
       "      <th>photographer</th>\n",
       "      <th>publisher</th>\n",
       "      <th>publisher_place</th>\n",
       "      <th>region</th>\n",
       "      <th>water_body</th>\n",
       "      <th>year</th>\n",
       "      <th>inventory_number</th>\n",
       "      <th>signature</th>\n",
       "      <th>revision_date</th>\n",
       "      <th>date</th>\n",
       "      <th>feature_class</th>\n",
       "      <th>feature_code</th>\n",
       "      <th>geoname_id</th>\n",
       "      <th>latitude</th>\n",
       "      <th>longitude</th>\n",
       "      <th>name</th>\n",
       "      <th>country_id</th>\n",
       "      <th>admin_name_1</th>\n",
       "      <th>admin_code_1</th>\n",
       "      <th>geo</th>\n",
       "      <th>download_link</th>\n",
       "      <th>download_link_256x256</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>AK111_021</td>\n",
       "      <td>74682</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Kiel, Blücherplatz</td>\n",
       "      <td>False</td>\n",
       "      <td>1921 gel</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Geogr. Topogr. Bilder-Samml. 1943, 7735</td>\n",
       "      <td>2014-09-05 10:13:06.342</td>\n",
       "      <td>gelaufen 1921</td>\n",
       "      <td>P</td>\n",
       "      <td>PPLA</td>\n",
       "      <td>2891122.0</td>\n",
       "      <td>54.32133</td>\n",
       "      <td>10.13489</td>\n",
       "      <td>Kiel</td>\n",
       "      <td>DE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>54.32133, 10.13489</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK111_021/0...</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK111_021/0...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0    akon_id     id  altitude building                city  color  \\\n",
       "0           0  AK111_021  74682       NaN      NaN  Kiel, Blücherplatz  False   \n",
       "    comment mountain other photographer publisher publisher_place region  \\\n",
       "0  1921 gel      NaN   NaN          NaN       NaN             NaN    NaN   \n",
       "  water_body  year inventory_number                                signature  \\\n",
       "0        NaN   NaN              NaN  Geogr. Topogr. Bilder-Samml. 1943, 7735   \n",
       "\n",
       "             revision_date           date feature_class feature_code  \\\n",
       "0  2014-09-05 10:13:06.342  gelaufen 1921             P         PPLA   \n",
       "\n",
       "   geoname_id  latitude  longitude  name country_id admin_name_1 admin_code_1  \\\n",
       "0   2891122.0  54.32133   10.13489  Kiel         DE          NaN          NaN   \n",
       "                  geo                                      download_link  \\\n",
       "0  54.32133, 10.13489  https://iiif.onb.ac.at/images/AKON/AK111_021/0...   \n",
       "                               download_link_256x256  \n",
       "0  https://iiif.onb.ac.at/images/AKON/AK111_021/0...  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(1)"
   ]
  },
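  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`pd.set_option` changes the display setting for the rest of the session. As an alternative, `pd.option_context` applies a setting only inside a `with` block, and transposing a single record with `.T` lists one column per line, which can be easier to read for a table this wide. A sketch on a one-row stand-in frame using a few values from the record shown above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# One record with a few of the columns shown above.\n",
    "df_small = pd.DataFrame({\n",
    "    'akon_id': ['AK111_021'],\n",
    "    'city': ['Kiel, Blücherplatz'],\n",
    "    'date': ['gelaufen 1921'],\n",
    "    'country_id': ['DE'],\n",
    "})\n",
    "\n",
    "# Show all columns only inside this block; the previous setting is restored afterwards.\n",
    "with pd.option_context('display.max_columns', None):\n",
    "    print(df_small.head(1))\n",
    "\n",
    "# Transpose: column names become row labels, the record becomes a single column labelled 0.\n",
    "record = df_small.head(1).T\n",
    "print(record)"
   ]
  },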
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we see all columns."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What are all the columns called again?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Unnamed: 0', 'akon_id', 'id', 'altitude', 'building', 'city', 'color',\n",
       "       'comment', 'mountain', 'other', 'photographer', 'publisher',\n",
       "       'publisher_place', 'region', 'water_body', 'year', 'inventory_number',\n",
       "       'signature', 'revision_date', 'date', 'feature_class', 'feature_code',\n",
       "       'geoname_id', 'latitude', 'longitude', 'name', 'country_id',\n",
       "       'admin_name_1', 'admin_code_1', 'geo', 'download_link',\n",
       "       'download_link_256x256'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
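  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Many cells in the outputs are `NaN`. One quick way to see how well each column is filled is to count the non-missing values per column. A minimal sketch on three records copied from the random samples below, reduced to four columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Records 28908, 21317 and 23201 from the sample outputs, four columns only.\n",
    "df_small = pd.DataFrame({\n",
    "    'altitude': [np.nan, 251.0, 251.0],\n",
    "    'publisher': ['Gratl', 'Ledermann', 'Ledermann'],\n",
    "    'year': [np.nan, 1909.0, 1908.0],\n",
    "    'signature': [np.nan, np.nan, np.nan],\n",
    "}, index=[28908, 21317, 23201])\n",
    "\n",
    "# notna() is True for filled cells; summing each column counts them.\n",
    "filled = df_small.notna().sum()\n",
    "print(filled)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`df.isna().mean()` would give the share of missing values per column instead of a count."
   ]
  },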
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Show Random Entries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Show me 3 random entries:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>akon_id</th>\n",
       "      <th>id</th>\n",
       "      <th>altitude</th>\n",
       "      <th>building</th>\n",
       "      <th>city</th>\n",
       "      <th>color</th>\n",
       "      <th>comment</th>\n",
       "      <th>mountain</th>\n",
       "      <th>other</th>\n",
       "      <th>photographer</th>\n",
       "      <th>publisher</th>\n",
       "      <th>publisher_place</th>\n",
       "      <th>region</th>\n",
       "      <th>water_body</th>\n",
       "      <th>year</th>\n",
       "      <th>inventory_number</th>\n",
       "      <th>signature</th>\n",
       "      <th>revision_date</th>\n",
       "      <th>date</th>\n",
       "      <th>feature_class</th>\n",
       "      <th>feature_code</th>\n",
       "      <th>geoname_id</th>\n",
       "      <th>latitude</th>\n",
       "      <th>longitude</th>\n",
       "      <th>name</th>\n",
       "      <th>country_id</th>\n",
       "      <th>admin_name_1</th>\n",
       "      <th>admin_code_1</th>\n",
       "      <th>geo</th>\n",
       "      <th>download_link</th>\n",
       "      <th>download_link_256x256</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>28908</th>\n",
       "      <td>28908</td>\n",
       "      <td>AK066_086</td>\n",
       "      <td>40120</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Innsbruck, Maria Theresienstrasse</td>\n",
       "      <td>False</td>\n",
       "      <td>1907 gel</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Gratl</td>\n",
       "      <td>Innsbruck</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2014-08-04 07:59:10.424</td>\n",
       "      <td>vor 1907</td>\n",
       "      <td>P</td>\n",
       "      <td>PPLA</td>\n",
       "      <td>2775220.0</td>\n",
       "      <td>47.26266</td>\n",
       "      <td>11.39454</td>\n",
       "      <td>Innsbruck</td>\n",
       "      <td>AT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>47.26266, 11.39454</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK066_086/0...</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK066_086/0...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21317</th>\n",
       "      <td>21317</td>\n",
       "      <td>AK034_386</td>\n",
       "      <td>20303</td>\n",
       "      <td>251.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Gars-Thunau am Kamp</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Ledermann</td>\n",
       "      <td>Wien</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1909.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2014-08-04 07:59:10.272</td>\n",
       "      <td>1909</td>\n",
       "      <td>P</td>\n",
       "      <td>PPL</td>\n",
       "      <td>2763660.0</td>\n",
       "      <td>48.58333</td>\n",
       "      <td>15.65000</td>\n",
       "      <td>Thunau am Kamp</td>\n",
       "      <td>AT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>48.58333, 15.65</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK034_386/3...</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK034_386/3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23201</th>\n",
       "      <td>23201</td>\n",
       "      <td>AK041_572</td>\n",
       "      <td>24699</td>\n",
       "      <td>251.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Gars-Thunau am Kamp</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Ledermann</td>\n",
       "      <td>Wien</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1908.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2014-08-04 07:59:10.328</td>\n",
       "      <td>1908</td>\n",
       "      <td>P</td>\n",
       "      <td>PPL</td>\n",
       "      <td>2763660.0</td>\n",
       "      <td>48.58333</td>\n",
       "      <td>15.65000</td>\n",
       "      <td>Thunau am Kamp</td>\n",
       "      <td>AT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>48.58333, 15.65</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK041_572/5...</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK041_572/5...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Unnamed: 0    akon_id     id  altitude building  \\\n",
       "28908       28908  AK066_086  40120       NaN      NaN   \n",
       "21317       21317  AK034_386  20303     251.0      NaN   \n",
       "23201       23201  AK041_572  24699     251.0      NaN   \n",
       "                                    city  color   comment mountain other  \\\n",
       "28908  Innsbruck, Maria Theresienstrasse  False  1907 gel      NaN   NaN   \n",
       "21317                Gars-Thunau am Kamp   True       NaN      NaN   NaN   \n",
       "23201                Gars-Thunau am Kamp  False       NaN      NaN   NaN   \n",
       "      photographer  publisher publisher_place region water_body    year  \\\n",
       "28908          NaN      Gratl       Innsbruck    NaN        NaN     NaN   \n",
       "21317          NaN  Ledermann            Wien    NaN        NaN  1909.0   \n",
       "23201          NaN  Ledermann            Wien    NaN        NaN  1908.0   \n",
       "      inventory_number signature            revision_date      date  \\\n",
       "28908              NaN       NaN  2014-08-04 07:59:10.424  vor 1907   \n",
       "21317              NaN       NaN  2014-08-04 07:59:10.272      1909   \n",
       "23201              NaN       NaN  2014-08-04 07:59:10.328      1908   \n",
       "      feature_class feature_code  geoname_id  latitude  longitude  \\\n",
       "28908             P         PPLA   2775220.0  47.26266   11.39454   \n",
       "21317             P          PPL   2763660.0  48.58333   15.65000   \n",
       "23201             P          PPL   2763660.0  48.58333   15.65000   \n",
       "                 name country_id admin_name_1 admin_code_1  \\\n",
       "28908       Innsbruck         AT          NaN          NaN   \n",
       "21317  Thunau am Kamp         AT          NaN          NaN   \n",
       "23201  Thunau am Kamp         AT          NaN          NaN   \n",
       "\n",
       "                      geo                                      download_link  \\\n",
       "28908  47.26266, 11.39454  https://iiif.onb.ac.at/images/AKON/AK066_086/0...   \n",
       "21317     48.58333, 15.65  https://iiif.onb.ac.at/images/AKON/AK034_386/3...   \n",
       "23201     48.58333, 15.65  https://iiif.onb.ac.at/images/AKON/AK041_572/5...   \n",
       "\n",
       "                                   download_link_256x256  \n",
       "28908  https://iiif.onb.ac.at/images/AKON/AK066_086/0...  \n",
       "21317  https://iiif.onb.ac.at/images/AKON/AK034_386/3...  \n",
       "23201  https://iiif.onb.ac.at/images/AKON/AK041_572/5...  "
      ]
     },
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sample(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calling `sample` again yields different entries:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>akon_id</th>\n",
       "      <th>id</th>\n",
       "      <th>altitude</th>\n",
       "      <th>building</th>\n",
       "      <th>city</th>\n",
       "      <th>color</th>\n",
       "      <th>comment</th>\n",
       "      <th>mountain</th>\n",
       "      <th>other</th>\n",
       "      <th>photographer</th>\n",
       "      <th>publisher</th>\n",
       "      <th>publisher_place</th>\n",
       "      <th>region</th>\n",
       "      <th>water_body</th>\n",
       "      <th>year</th>\n",
       "      <th>inventory_number</th>\n",
       "      <th>signature</th>\n",
       "      <th>revision_date</th>\n",
       "      <th>date</th>\n",
       "      <th>feature_class</th>\n",
       "      <th>feature_code</th>\n",
       "      <th>geoname_id</th>\n",
       "      <th>latitude</th>\n",
       "      <th>longitude</th>\n",
       "      <th>name</th>\n",
       "      <th>country_id</th>\n",
       "      <th>admin_name_1</th>\n",
       "      <th>admin_code_1</th>\n",
       "      <th>geo</th>\n",
       "      <th>download_link</th>\n",
       "      <th>download_link_256x256</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>18810</th>\n",
       "      <td>18810</td>\n",
       "      <td>AK025_111</td>\n",
       "      <td>14618</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Bruck an der Mur</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Mugel</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Ledermann</td>\n",
       "      <td>Wien</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1916.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2014-10-15 12:03:01.028</td>\n",
       "      <td>1916</td>\n",
       "      <td>P</td>\n",
       "      <td>PPLA3</td>\n",
       "      <td>2781371.0</td>\n",
       "      <td>47.41667</td>\n",
       "      <td>15.28333</td>\n",
       "      <td>Bruck an der Mur</td>\n",
       "      <td>AT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>47.41667, 15.28333</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK025_111/1...</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK025_111/1...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28146</th>\n",
       "      <td>28146</td>\n",
       "      <td>AK061_165</td>\n",
       "      <td>36541</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Orosháza</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Vágner</td>\n",
       "      <td>Orosháza</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1917.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Kartensammlung 79/66 G</td>\n",
       "      <td>2015-08-25 15:28:56.547</td>\n",
       "      <td>1917</td>\n",
       "      <td>P</td>\n",
       "      <td>PPL</td>\n",
       "      <td>716736.0</td>\n",
       "      <td>46.56667</td>\n",
       "      <td>20.66667</td>\n",
       "      <td>Oroshaza</td>\n",
       "      <td>HU</td>\n",
       "      <td>Bekes County</td>\n",
       "      <td>03</td>\n",
       "      <td>46.56667, 20.66667</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK061_165/1...</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK061_165/1...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8335</th>\n",
       "      <td>8335</td>\n",
       "      <td>AK088_563</td>\n",
       "      <td>56103</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Bad Reichenhall</td>\n",
       "      <td>False</td>\n",
       "      <td>1907 gel</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Geogrphisch-topographische Bildersammlung 1076/43</td>\n",
       "      <td>2014-08-28 16:20:02.029</td>\n",
       "      <td>vor 1907</td>\n",
       "      <td>P</td>\n",
       "      <td>PPLA3</td>\n",
       "      <td>2953371.0</td>\n",
       "      <td>47.72947</td>\n",
       "      <td>12.87819</td>\n",
       "      <td>Bad Reichenhall</td>\n",
       "      <td>DE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>47.72947, 12.87819</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK088_563/5...</td>\n",
       "      <td>https://iiif.onb.ac.at/images/AKON/AK088_563/5...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Unnamed: 0    akon_id     id  altitude building              city  \\\n",
       "18810       18810  AK025_111  14618       NaN      NaN  Bruck an der Mur   \n",
       "28146       28146  AK061_165  36541       NaN      NaN          Orosháza   \n",
       "8335         8335  AK088_563  56103       NaN      NaN   Bad Reichenhall   \n",
       "       color   comment mountain other photographer  publisher publisher_place  \\\n",
       "18810   True       NaN    Mugel   NaN          NaN  Ledermann            Wien   \n",
       "28146   True       NaN      NaN   NaN          NaN     Vágner        Orosháza   \n",
       "8335   False  1907 gel      NaN   NaN          NaN        NaN             NaN   \n",
       "      region water_body    year inventory_number  \\\n",
       "18810    NaN        NaN  1916.0              NaN   \n",
       "28146    NaN        NaN  1917.0              NaN   \n",
       "8335     NaN        NaN     NaN              NaN   \n",
       "                                               signature  \\\n",
       "18810                                                NaN   \n",
       "28146                             Kartensammlung 79/66 G   \n",
       "8335   Geogrphisch-topographische Bildersammlung 1076/43   \n",
       "                 revision_date      date feature_class feature_code  \\\n",
       "18810  2014-10-15 12:03:01.028      1916             P        PPLA3   \n",
       "28146  2015-08-25 15:28:56.547      1917             P          PPL   \n",
       "8335   2014-08-28 16:20:02.029  vor 1907             P        PPLA3   \n",
       "\n",
       "       geoname_id  latitude  longitude              name country_id  \\\n",
       "18810   2781371.0  47.41667   15.28333  Bruck an der Mur         AT   \n",
       "28146    716736.0  46.56667   20.66667          Oroshaza         HU   \n",
       "8335    2953371.0  47.72947   12.87819   Bad Reichenhall         DE   \n",
       "\n",
       "       admin_name_1 admin_code_1                 geo  \\\n",
       "18810           NaN          NaN  47.41667, 15.28333   \n",
       "28146  Bekes County           03  46.56667, 20.66667   \n",
       "8335            NaN          NaN  47.72947, 12.87819   \n",
       "\n",
       "                                           download_link  \\\n",
       "18810  https://iiif.onb.ac.at/images/AKON/AK025_111/1...   \n",
       "28146  https://iiif.onb.ac.at/images/AKON/AK061_165/1...   \n",
       "8335   https://iiif.onb.ac.at/images/AKON/AK088_563/5...   \n",
       "                                   download_link_256x256  \n",
       "18810  https://iiif.onb.ac.at/images/AKON/AK025_111/1...  \n",
       "28146  https://iiif.onb.ac.at/images/AKON/AK061_165/1...  \n",
       "8335   https://iiif.onb.ac.at/images/AKON/AK088_563/5...  "
      ]
     },
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sample(3)"
   ]
  },
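  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If a sample should stay the same between runs (for example, to refer to it in the text), `DataFrame.sample` accepts a `random_state` seed. A minimal sketch on the seven `akon_id` values shown in the outputs above; the seed `42` is arbitrary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# The seven records shown in this notebook's outputs, identifier column only.\n",
    "df_small = pd.DataFrame({\n",
    "    'akon_id': ['AK111_021', 'AK066_086', 'AK034_386', 'AK041_572',\n",
    "                'AK025_111', 'AK061_165', 'AK088_563'],\n",
    "})\n",
    "\n",
    "# The same seed draws the same three rows on every call.\n",
    "first = df_small.sample(3, random_state=42)\n",
    "second = df_small.sample(3, random_state=42)\n",
    "print(first['akon_id'].tolist())\n",
    "print(second['akon_id'].tolist())"
   ]
  },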
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Count Things"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many postcards show places in Italy?\n",
    "\n",
    "Let's use the `country_id` column, which holds two-letter country codes such as `AT` or `DE`:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_in_italy = df[df['country_id'] == 'IT']"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
      ]
     },
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df_in_italy)"
   ]
  },
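  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To count every country at once instead of filtering for one, `value_counts()` tallies each distinct `country_id`, most frequent first, and skips missing values by default. A sketch on the `country_id` values of the seven records shown above (not the full data):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Place names and country codes of the seven records shown in this notebook's outputs.\n",
    "df_small = pd.DataFrame({\n",
    "    'name': ['Kiel', 'Innsbruck', 'Thunau am Kamp', 'Thunau am Kamp',\n",
    "             'Bruck an der Mur', 'Oroshaza', 'Bad Reichenhall'],\n",
    "    'country_id': ['DE', 'AT', 'AT', 'AT', 'AT', 'HU', 'DE'],\n",
    "})\n",
    "\n",
    "# One count per country code, sorted from most to least frequent.\n",
    "per_country = df_small['country_id'].value_counts()\n",
    "print(per_country)\n",
    "\n",
    "# Look up a single country; .get returns the default 0 if the code does not occur.\n",
    "print(per_country.get('IT', 0))"
   ]
  },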
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many postcards are in color?"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_in_color = df[df['color'] == True]"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
      ]
     },
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df_in_color)"
   ]
  },
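  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assuming `color` holds only `True` and `False` (as in the outputs above), summing the column also counts the colored postcards, because `True` counts as 1 and `False` as 0; no filtered copy of the DataFrame is needed. A sketch on the seven records shown above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# The seven records shown in this notebook's outputs, with their color flags.\n",
    "df_small = pd.DataFrame({\n",
    "    'akon_id': ['AK111_021', 'AK066_086', 'AK034_386', 'AK041_572',\n",
    "                'AK025_111', 'AK061_165', 'AK088_563'],\n",
    "    'color': [False, False, True, False, True, True, False],\n",
    "})\n",
    "\n",
    "# Summing a boolean column counts its True values.\n",
    "n_color = int(df_small['color'].sum())\n",
    "print(n_color)"
   ]
  },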
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Can I do this in one line?"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
      ]
     },
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df[df['color'] == True])"
   ]