diff --git a/3.2 - Images - Download pre-downsized images for machine learning.ipynb b/3.2 - Images - Download pre-downsized images for machine learning.ipynb
index 413b54ed22705fb64444ab2e136cfb54377dbdf8..31979d1ce614f7f78e95055657bcd310ec782cb1 100644
--- a/3.2 - Images - Download pre-downsized images for machine learning.ipynb
+++ b/3.2 - Images - Download pre-downsized images for machine learning.ipynb
@@ -28,7 +28,7 @@
},
"source": [
"Let's say you got a bunch of old timey scenery photographs.\n",
- "And you want to extract all images of lakes, why not.\n",
+ "And you want to extract all images containing mountains, why not.\n",
"And, because you can, you want an AI to do all the dirty work for you.\n",
"\n",
"What that has to do with this workshop?\n",
@@ -51,7 +51,7 @@
"* [Keras](https://www.tensorflow.org/guide/keras) or\n",
"* [PyTorch](https://pytorch.org/).\n",
"\n",
- "One way to do it: Download a VGG16 network that's pre-trained on ImageNet, remove the last layer (the actual classifier), add your own output layer with 2 outputs ('lakes', 'no lakes') and train that one.\n",
+ "One way to do it: Download a VGG16 network that's pre-trained on ImageNet, remove the last layer (the actual classifier), add your own output layer with 2 outputs ('mountain', 'no mountain') and train that one.\n",
"\n",
"Now back to the show."
]
@@ -68,11 +68,11 @@
"\n",
"* **Download Metadata**\n",
" * List of all available postcards\n",
- " * Info about the 'lake-ness' of postcards\n",
+ " * Info about the 'mountain-ness' of postcards\n",
"* **Create Download Links**\n",
" * To fetch all images\n",
"* **Split Into Two Sets**\n",
- " * Lakes and non-lakes\n",
+ " * Mountain and non-mountain\n",
"* **Download Images**"
]
},
@@ -80,7 +80,7 @@
"cell_type": "markdown",
"metadata": {
"slideshow": {
- "slide_type": "subslide"
+ "slide_type": "slide"
}
},
"source": [
@@ -103,7 +103,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
- "/home/kst/tmp/dingsdi/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3049: DtypeWarning: Columns (13) have mixed types. Specify dtype option on import or set low_memory=False.\n",
+ "/home/oida/labs/pydays19/venv/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3049: DtypeWarning: Columns (13) have mixed types. Specify dtype option on import or set low_memory=False.\n",
" interactivity=interactivity, compiler=compiler, result=result)\n"
]
}
@@ -120,7 +120,7 @@
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "subslide"
@@ -182,274 +182,282 @@
" \n",
"
\n",
" \n",
- " 21958 | \n",
- " 21958 | \n",
- " AK036_452 | \n",
- " 21573 | \n",
- " 355.0 | \n",
+ " 6683 | \n",
+ " 6683 | \n",
+ " AK121_352 | \n",
+ " 80931 | \n",
" NaN | \n",
- " Piesting | \n",
- " True | \n",
+ " Zwinger | \n",
+ " Dresden | \n",
+ " False | \n",
+ " v. 1907 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
- " Ledermann | \n",
- " Wien | \n",
" NaN | \n",
" NaN | \n",
- " 1906.0 | \n",
" NaN | \n",
" NaN | \n",
- " 2014-08-04 07:59:10.288 | \n",
- " 1906 | \n",
+ " NaN | \n",
+ " Geogr. Topogr. Bilder-Samml. 1943, 7402 | \n",
+ " 2014-08-25 13:52:35.479 | \n",
+ " vor 1907 | \n",
" P | \n",
- " PPLA3 | \n",
- " 2771869.0 | \n",
- " 47.87358 | \n",
- " 16.12510 | \n",
- " Piesting | \n",
- " AT | \n",
+ " PPLA | \n",
+ " 2935022.0 | \n",
+ " 51.05089 | \n",
+ " 13.73832 | \n",
+ " Dresden | \n",
+ " DE | \n",
" NaN | \n",
" NaN | \n",
- " 47.87358, 16.1251 | \n",
+ " 51.05089, 13.73832 | \n",
"
\n",
" \n",
- " 31428 | \n",
- " 31428 | \n",
- " AK102_214 | \n",
- " 66938 | \n",
- " NaN | \n",
+ " 1060 | \n",
+ " 1060 | \n",
+ " AK074_287 | \n",
+ " 45904 | \n",
" NaN | \n",
- " Schaffhausen | \n",
- " False | \n",
- " v. 1907 | \n",
" NaN | \n",
- " Hohfluh | \n",
+ " Solingen | \n",
+ " True | \n",
+ " 1908 gel | \n",
" NaN | \n",
+ " Kaiser Wilhelm-Brücke | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
- " Geogr. Topogr. Bilder-Samml. 1951, 1093 | \n",
- " 2014-09-03 10:50:31.524 | \n",
- " vor 1907 | \n",
- " P | \n",
- " PPLA | \n",
- " 2658761.0 | \n",
- " 47.69732 | \n",
- " 8.63493 | \n",
- " Schaffhausen | \n",
- " CH | \n",
" NaN | \n",
" NaN | \n",
- " 47.69732, 8.63493 | \n",
+ " 2014-08-19 15:22:42.160 | \n",
+ " gelaufen 1908 | \n",
+ " P | \n",
+ " PPLA3 | \n",
+ " 2831580.0 | \n",
+ " 51.17343 | \n",
+ " 7.08450 | \n",
+ " Solingen | \n",
+ " DE | \n",
+ " Nordrhein-Westfalen | \n",
+ " 07 | \n",
+ " 51.17343, 7.0845 | \n",
"
\n",
" \n",
- " 13828 | \n",
- " 13828 | \n",
- " AK008_015 | \n",
- " 4271 | \n",
+ " 34225 | \n",
+ " 34225 | \n",
+ " AK087_169 | \n",
+ " 54994 | \n",
" NaN | \n",
" NaN | \n",
- " Eiskaarspitze | \n",
+ " Venezia, Piazza S. Marco | \n",
" False | \n",
+ " 1925 gel | \n",
+ " NaN | \n",
+ " NaN | \n",
" NaN | \n",
- " Dachstein | \n",
" NaN | \n",
" NaN | \n",
- " Ledermann | \n",
- " Wien | \n",
" NaN | \n",
" NaN | \n",
- " 1921.0 | \n",
" NaN | \n",
" NaN | \n",
- " 2014-08-04 07:59:09.895 | \n",
- " 1921 | \n",
- " T | \n",
- " MT | \n",
- " 2775701.0 | \n",
- " 47.47545 | \n",
- " 13.60588 | \n",
- " Dachstein | \n",
- " AT | \n",
" NaN | \n",
+ " 2014-08-25 09:26:12.544 | \n",
+ " gelaufen 1925 | \n",
+ " P | \n",
+ " PPLA | \n",
+ " 3164603.0 | \n",
+ " 45.43713 | \n",
+ " 12.33265 | \n",
+ " Venecia | \n",
+ " IT | \n",
" NaN | \n",
- " 47.47545, 13.60588 | \n",
+ " NaN | \n",
+ " 45.43713, 12.33265 | \n",
"
\n",
" \n",
- " 22725 | \n",
- " 22725 | \n",
- " AK039_299 | \n",
- " 23224 | \n",
+ " 20250 | \n",
+ " 20250 | \n",
+ " AK030_367 | \n",
+ " 17883 | \n",
" NaN | \n",
" NaN | \n",
- " Attnang | \n",
+ " Vorder Stoder | \n",
" False | \n",
" NaN | \n",
+ " Todtengebirge, Spitzmauer, Kleiner Priel, Groß... | \n",
" NaN | \n",
" NaN | \n",
- " NaN | \n",
- " Topf | \n",
- " Attnang-Puchheim | \n",
+ " Ledermann | \n",
+ " Wien | \n",
" NaN | \n",
" NaN | \n",
- " 1919.0 | \n",
+ " 1909.0 | \n",
" NaN | \n",
" NaN | \n",
- " 2014-08-04 07:59:10.309 | \n",
- " 1919 | \n",
+ " 2014-08-04 07:59:10.235 | \n",
+ " 1909 | \n",
" P | \n",
- " PPLX | \n",
- " 2782285.0 | \n",
- " 48.01667 | \n",
- " 13.71667 | \n",
- " Attnang | \n",
+ " PPL | \n",
+ " 2762185.0 | \n",
+ " 47.71337 | \n",
+ " 14.22712 | \n",
+ " Vorderstoder | \n",
" AT | \n",
" NaN | \n",
" NaN | \n",
- " 48.01667, 13.71667 | \n",
+ " 47.71337, 14.22712 | \n",
"
\n",
" \n",
- " 32047 | \n",
- " 32047 | \n",
- " AK067_186 | \n",
- " 40906 | \n",
+ " 19981 | \n",
+ " 19981 | \n",
+ " AK029_173 | \n",
+ " 17088 | \n",
" NaN | \n",
" NaN | \n",
- " Salzburg | \n",
+ " Pöggstall | \n",
" False | \n",
- " 1918 gel | \n",
- " Mönchsberg | \n",
+ " 1903 gel | \n",
" NaN | \n",
" NaN | \n",
- " Würthle & Sohn Nachfolger G. m. b. H | \n",
- " Salzburg | \n",
" NaN | \n",
+ " Hofmeister | \n",
+ " Pöggstall | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
- " 2014-08-12 14:04:34.797 | \n",
- " gelaufen 1918 | \n",
+ " NaN | \n",
+ " 2014-08-04 07:59:10.223 | \n",
+ " gelaufen 1903 | \n",
" P | \n",
- " PPLA | \n",
- " 2766824.0 | \n",
- " 47.79941 | \n",
- " 13.04399 | \n",
- " Salzburg | \n",
+ " PPLA3 | \n",
+ " 2768616.0 | \n",
+ " 48.31667 | \n",
+ " 15.18333 | \n",
+ " Pöggstall | \n",
" AT | \n",
" NaN | \n",
" NaN | \n",
- " 47.79941, 13.04399 | \n",
+ " 48.31667, 15.18333 | \n",
"
\n",
" \n",
- " 17407 | \n",
- " 17407 | \n",
- " AK020_443 | \n",
- " 11927 | \n",
+ " 30492 | \n",
+ " 30492 | \n",
+ " AK088_055 | \n",
+ " 55510 | \n",
" NaN | \n",
" NaN | \n",
- " Saalfelden | \n",
+ " Neutitschein, Obertorstrasse | \n",
" False | \n",
- " v 1907 | \n",
+ " 1920 gel | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
- " Ledermann | \n",
- " Wien | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
- " 2014-08-04 07:59:10.121 | \n",
- " vor 1907 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 2014-08-28 13:39:02.860 | \n",
+ " gelaufen 1920 | \n",
" P | \n",
- " PPLA3 | \n",
- " 2766922.0 | \n",
- " 47.42681 | \n",
- " 12.84800 | \n",
- " Saalfelden am Steinernen Meer | \n",
- " AT | \n",
+ " PPL | \n",
+ " 3069305.0 | \n",
+ " 49.59438 | \n",
+ " 18.01028 | \n",
+ " Neutitschein | \n",
+ " CZ | \n",
" NaN | \n",
" NaN | \n",
- " 47.42681, 12.848 | \n",
+ " 49.59438, 18.01028 | \n",
"
\n",
" \n",
"\n",
""
],
"text/plain": [
- " Unnamed: 0 akon_id id altitude building city color \\\n",
- "21958 21958 AK036_452 21573 355.0 NaN Piesting True \n",
- "31428 31428 AK102_214 66938 NaN NaN Schaffhausen False \n",
- "13828 13828 AK008_015 4271 NaN NaN Eiskaarspitze False \n",
- "22725 22725 AK039_299 23224 NaN NaN Attnang False \n",
- "32047 32047 AK067_186 40906 NaN NaN Salzburg False \n",
- "17407 17407 AK020_443 11927 NaN NaN Saalfelden False \n",
+ " Unnamed: 0 akon_id id altitude building \\\n",
+ "6683 6683 AK121_352 80931 NaN Zwinger \n",
+ "1060 1060 AK074_287 45904 NaN NaN \n",
+ "34225 34225 AK087_169 54994 NaN NaN \n",
+ "20250 20250 AK030_367 17883 NaN NaN \n",
+ "19981 19981 AK029_173 17088 NaN NaN \n",
+ "30492 30492 AK088_055 55510 NaN NaN \n",
"\n",
- " comment mountain other photographer \\\n",
- "21958 NaN NaN NaN NaN \n",
- "31428 v. 1907 NaN Hohfluh NaN \n",
- "13828 NaN Dachstein NaN NaN \n",
- "22725 NaN NaN NaN NaN \n",
- "32047 1918 gel Mönchsberg NaN NaN \n",
- "17407 v 1907 NaN NaN NaN \n",
+ " city color comment \\\n",
+ "6683 Dresden False v. 1907 \n",
+ "1060 Solingen True 1908 gel \n",
+ "34225 Venezia, Piazza S. Marco False 1925 gel \n",
+ "20250 Vorder Stoder False NaN \n",
+ "19981 Pöggstall False 1903 gel \n",
+ "30492 Neutitschein, Obertorstrasse False 1920 gel \n",
"\n",
- " publisher publisher_place region \\\n",
- "21958 Ledermann Wien NaN \n",
- "31428 NaN NaN NaN \n",
- "13828 Ledermann Wien NaN \n",
- "22725 Topf Attnang-Puchheim NaN \n",
- "32047 Würthle & Sohn Nachfolger G. m. b. H Salzburg NaN \n",
- "17407 Ledermann Wien NaN \n",
+ " mountain \\\n",
+ "6683 NaN \n",
+ "1060 NaN \n",
+ "34225 NaN \n",
+ "20250 Todtengebirge, Spitzmauer, Kleiner Priel, Groß... \n",
+ "19981 NaN \n",
+ "30492 NaN \n",
+ "\n",
+ " other photographer publisher publisher_place region \\\n",
+ "6683 NaN NaN NaN NaN NaN \n",
+ "1060 Kaiser Wilhelm-Brücke NaN NaN NaN NaN \n",
+ "34225 NaN NaN NaN NaN NaN \n",
+ "20250 NaN NaN Ledermann Wien NaN \n",
+ "19981 NaN NaN Hofmeister Pöggstall NaN \n",
+ "30492 NaN NaN NaN NaN NaN \n",
"\n",
" water_body year inventory_number \\\n",
- "21958 NaN 1906.0 NaN \n",
- "31428 NaN NaN NaN \n",
- "13828 NaN 1921.0 NaN \n",
- "22725 NaN 1919.0 NaN \n",
- "32047 NaN NaN NaN \n",
- "17407 NaN NaN NaN \n",
+ "6683 NaN NaN NaN \n",
+ "1060 NaN NaN NaN \n",
+ "34225 NaN NaN NaN \n",
+ "20250 NaN 1909.0 NaN \n",
+ "19981 NaN NaN NaN \n",
+ "30492 NaN NaN NaN \n",
"\n",
" signature revision_date \\\n",
- "21958 NaN 2014-08-04 07:59:10.288 \n",
- "31428 Geogr. Topogr. Bilder-Samml. 1951, 1093 2014-09-03 10:50:31.524 \n",
- "13828 NaN 2014-08-04 07:59:09.895 \n",
- "22725 NaN 2014-08-04 07:59:10.309 \n",
- "32047 NaN 2014-08-12 14:04:34.797 \n",
- "17407 NaN 2014-08-04 07:59:10.121 \n",
+ "6683 Geogr. Topogr. Bilder-Samml. 1943, 7402 2014-08-25 13:52:35.479 \n",
+ "1060 NaN 2014-08-19 15:22:42.160 \n",
+ "34225 NaN 2014-08-25 09:26:12.544 \n",
+ "20250 NaN 2014-08-04 07:59:10.235 \n",
+ "19981 NaN 2014-08-04 07:59:10.223 \n",
+ "30492 NaN 2014-08-28 13:39:02.860 \n",
"\n",
" date feature_class feature_code geoname_id latitude \\\n",
- "21958 1906 P PPLA3 2771869.0 47.87358 \n",
- "31428 vor 1907 P PPLA 2658761.0 47.69732 \n",
- "13828 1921 T MT 2775701.0 47.47545 \n",
- "22725 1919 P PPLX 2782285.0 48.01667 \n",
- "32047 gelaufen 1918 P PPLA 2766824.0 47.79941 \n",
- "17407 vor 1907 P PPLA3 2766922.0 47.42681 \n",
+ "6683 vor 1907 P PPLA 2935022.0 51.05089 \n",
+ "1060 gelaufen 1908 P PPLA3 2831580.0 51.17343 \n",
+ "34225 gelaufen 1925 P PPLA 3164603.0 45.43713 \n",
+ "20250 1909 P PPL 2762185.0 47.71337 \n",
+ "19981 gelaufen 1903 P PPLA3 2768616.0 48.31667 \n",
+ "30492 gelaufen 1920 P PPL 3069305.0 49.59438 \n",
"\n",
- " longitude name country_id admin_name_1 \\\n",
- "21958 16.12510 Piesting AT NaN \n",
- "31428 8.63493 Schaffhausen CH NaN \n",
- "13828 13.60588 Dachstein AT NaN \n",
- "22725 13.71667 Attnang AT NaN \n",
- "32047 13.04399 Salzburg AT NaN \n",
- "17407 12.84800 Saalfelden am Steinernen Meer AT NaN \n",
+ " longitude name country_id admin_name_1 admin_code_1 \\\n",
+ "6683 13.73832 Dresden DE NaN NaN \n",
+ "1060 7.08450 Solingen DE Nordrhein-Westfalen 07 \n",
+ "34225 12.33265 Venecia IT NaN NaN \n",
+ "20250 14.22712 Vorderstoder AT NaN NaN \n",
+ "19981 15.18333 Pöggstall AT NaN NaN \n",
+ "30492 18.01028 Neutitschein CZ NaN NaN \n",
"\n",
- " admin_code_1 geo \n",
- "21958 NaN 47.87358, 16.1251 \n",
- "31428 NaN 47.69732, 8.63493 \n",
- "13828 NaN 47.47545, 13.60588 \n",
- "22725 NaN 48.01667, 13.71667 \n",
- "32047 NaN 47.79941, 13.04399 \n",
- "17407 NaN 47.42681, 12.848 "
+ " geo \n",
+ "6683 51.05089, 13.73832 \n",
+ "1060 51.17343, 7.0845 \n",
+ "34225 45.43713, 12.33265 \n",
+ "20250 47.71337, 14.22712 \n",
+ "19981 48.31667, 15.18333 \n",
+ "30492 49.59438, 18.01028 "
]
},
- "execution_count": 4,
+ "execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
@@ -466,12 +474,12 @@
}
},
"source": [
- "Ok, we have metadata. And look, there's a column *water_body:*"
+ "Ok, we have metadata. And look, there's a column *mountain:*"
]
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": 3,
"metadata": {},
"outputs": [
{
@@ -496,61 +504,55 @@
" \n",
" | \n",
" akon_id | \n",
- " water_body | \n",
+ " mountain | \n",
"
\n",
" \n",
" \n",
" \n",
- " 19411 | \n",
- " AK027_436 | \n",
- " Millstätter See | \n",
+ " 33148 | \n",
+ " AK076_442 | \n",
+ " Watzmann, Hochkalter | \n",
"
\n",
" \n",
- " 24183 | \n",
- " AK045_467 | \n",
+ " 29503 | \n",
+ " AK070_327 | \n",
" NaN | \n",
"
\n",
" \n",
- " 7038 | \n",
- " AK082_171 | \n",
+ " 5663 | \n",
+ " AK075_470 | \n",
" NaN | \n",
"
\n",
" \n",
- " 21425 | \n",
- " AK035_008 | \n",
+ " 31604 | \n",
+ " AK107_561 | \n",
" NaN | \n",
"
\n",
" \n",
- " 8853 | \n",
- " AK107_209 | \n",
+ " 8748 | \n",
+ " AK091_235 | \n",
" NaN | \n",
"
\n",
- " \n",
- " 27306 | \n",
- " AK057_094 | \n",
- " Csorba See | \n",
- "
\n",
" \n",
"\n",
""
],
"text/plain": [
- " akon_id water_body\n",
- "19411 AK027_436 Millstätter See\n",
- "24183 AK045_467 NaN\n",
- "7038 AK082_171 NaN\n",
- "21425 AK035_008 NaN\n",
- "8853 AK107_209 NaN\n",
- "27306 AK057_094 Csorba See"
+ " akon_id mountain\n",
+ "33148 AK076_442 Watzmann, Hochkalter\n",
+ "29503 AK070_327 NaN\n",
+ "5663 AK075_470 NaN\n",
+ "31604 AK107_561 NaN\n",
+ "8748 AK091_235 NaN"
]
},
- "execution_count": 4,
+ "execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
- "meta.sample(6)[['akon_id', 'water_body']]"
+ "meta.sample(5)[['akon_id', 'mountain']]"
]
},
{
@@ -564,7 +566,7 @@
"cell_type": "markdown",
"metadata": {
"slideshow": {
- "slide_type": "subslide"
+ "slide_type": "slide"
}
},
"source": [
@@ -615,7 +617,7 @@
},
{
"cell_type": "code",
- "execution_count": 11,
+ "execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
@@ -625,7 +627,7 @@
},
{
"cell_type": "code",
- "execution_count": 12,
+ "execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "fragment"
@@ -638,7 +640,7 @@
"'https://iiif.onb.ac.at/presentation/AKON/AK024_176/manifest'"
]
},
- "execution_count": 12,
+ "execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -660,7 +662,7 @@
},
{
"cell_type": "code",
- "execution_count": 14,
+ "execution_count": 6,
"metadata": {
"scrolled": true,
"slideshow": {
@@ -735,7 +737,7 @@
" 'on': 'https://iiif.onb.ac.at/presentation/AKON/AK024_176/canvas/176'}]}]}]}"
]
},
- "execution_count": 14,
+ "execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -755,12 +757,12 @@
}
},
"source": [
- "The manifest link seems to work. Let's add them to the dataframe."
+ "The manifest link seems to work. Let's add manifest links for all postcards to the dataframe:"
]
},
{
"cell_type": "code",
- "execution_count": 16,
+ "execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "fragment"
@@ -773,7 +775,7 @@
},
{
"cell_type": "code",
- "execution_count": 17,
+ "execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "fragment"
@@ -807,34 +809,34 @@
" \n",
" \n",
" \n",
- " 34406 | \n",
- " AK072_262 | \n",
- " https://iiif.onb.ac.at/presentation/AKON/AK072... | \n",
+ " 32242 | \n",
+ " AK049_538 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK049... | \n",
"
\n",
" \n",
- " 30373 | \n",
- " AK085_299 | \n",
- " https://iiif.onb.ac.at/presentation/AKON/AK085... | \n",
+ " 10827 | \n",
+ " AK001_237 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK001... | \n",
"
\n",
" \n",
- " 18098 | \n",
- " AK023_263 | \n",
- " https://iiif.onb.ac.at/presentation/AKON/AK023... | \n",
+ " 14148 | \n",
+ " AK009_081 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK009... | \n",
"
\n",
" \n",
- " 3152 | \n",
- " AK122_506 | \n",
- " https://iiif.onb.ac.at/presentation/AKON/AK122... | \n",
+ " 8074 | \n",
+ " AK087_246 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK087... | \n",
"
\n",
" \n",
- " 31981 | \n",
- " AK097_136 | \n",
- " https://iiif.onb.ac.at/presentation/AKON/AK097... | \n",
+ " 33232 | \n",
+ " AK082_006 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK082... | \n",
"
\n",
" \n",
- " 160 | \n",
- " AK111_325 | \n",
- " https://iiif.onb.ac.at/presentation/AKON/AK111... | \n",
+ " 22877 | \n",
+ " AK040_083 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK040... | \n",
"
\n",
" \n",
"\n",
@@ -842,15 +844,15 @@
],
"text/plain": [
" akon_id manifest_link\n",
- "34406 AK072_262 https://iiif.onb.ac.at/presentation/AKON/AK072...\n",
- "30373 AK085_299 https://iiif.onb.ac.at/presentation/AKON/AK085...\n",
- "18098 AK023_263 https://iiif.onb.ac.at/presentation/AKON/AK023...\n",
- "3152 AK122_506 https://iiif.onb.ac.at/presentation/AKON/AK122...\n",
- "31981 AK097_136 https://iiif.onb.ac.at/presentation/AKON/AK097...\n",
- "160 AK111_325 https://iiif.onb.ac.at/presentation/AKON/AK111..."
+ "32242 AK049_538 https://iiif.onb.ac.at/presentation/AKON/AK049...\n",
+ "10827 AK001_237 https://iiif.onb.ac.at/presentation/AKON/AK001...\n",
+ "14148 AK009_081 https://iiif.onb.ac.at/presentation/AKON/AK009...\n",
+ "8074 AK087_246 https://iiif.onb.ac.at/presentation/AKON/AK087...\n",
+ "33232 AK082_006 https://iiif.onb.ac.at/presentation/AKON/AK082...\n",
+ "22877 AK040_083 https://iiif.onb.ac.at/presentation/AKON/AK040..."
]
},
- "execution_count": 17,
+ "execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -879,7 +881,7 @@
},
{
"cell_type": "code",
- "execution_count": 18,
+ "execution_count": 9,
"metadata": {
"scrolled": true
},
@@ -951,7 +953,7 @@
" 'on': 'https://iiif.onb.ac.at/presentation/AKON/AK024_176/canvas/176'}]}]}]}"
]
},
- "execution_count": 18,
+ "execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -976,7 +978,7 @@
},
{
"cell_type": "code",
- "execution_count": 21,
+ "execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
@@ -987,7 +989,7 @@
},
{
"cell_type": "code",
- "execution_count": 23,
+ "execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "fragment"
@@ -1000,7 +1002,7 @@
"['https://iiif.onb.ac.at/images/AKON/AK024_176/176/full/full/0/native.jpg']"
]
},
- "execution_count": 23,
+ "execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -1022,7 +1024,7 @@
},
{
"cell_type": "code",
- "execution_count": 27,
+ "execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
@@ -1033,40 +1035,26 @@
"    try:\n",
"        json = r.json()\n",
"    except:\n",
- "        # default to empty on any errors\n",
- "        # makes batch processing easier in pandas\n",
+ "        # default to empty on exceptions - makes batch processing easier in pandas\n",
"        json = {}\n",
"    image_links = [match.value for match in image_id_jp.find(json)]\n",
"    return image_links"
]
},
{
- "cell_type": "code",
- "execution_count": 28,
+ "cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "['https://iiif.onb.ac.at/images/AKON/AK024_176/176/full/full/0/native.jpg']"
- ]
- },
- "execution_count": 28,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
"source": [
- "image_links_for_manifest_link(akon_id_to_manifest_link('AK024_176'))"
+ "Let's test it:"
]
},
{
"cell_type": "code",
- "execution_count": 29,
+ "execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "fragment"
@@ -1076,21 +1064,27 @@
{
"data": {
"text/plain": [
- "['https://iiif.onb.ac.at/images/AKON/AK111_325/325/full/full/0/native.jpg']"
+ "['https://iiif.onb.ac.at/images/AKON/AK036_284/284/full/full/0/native.jpg']"
]
},
- "execution_count": 29,
+ "execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
- "image_links_for_manifest_link(akon_id_to_manifest_link('AK111_325'))"
+ "random_akon_id = meta.sample().iloc[0]['akon_id']\n",
+ "manifest_link = akon_id_to_manifest_link(random_akon_id)\n",
+ "image_links_for_manifest_link(manifest_link)"
]
},
{
"cell_type": "markdown",
- "metadata": {},
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
"source": [
"Looking good."
]
@@ -1110,22 +1104,13 @@
},
{
"cell_type": "code",
- "execution_count": 31,
+ "execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "CPU times: user 7min 26s, sys: 18 s, total: 7min 44s\n",
- "Wall time: 12min 34s\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"# %%time\n",
"# meta['image_links'] = meta['manifest_link'].apply(image_links_for_manifest_link)"
@@ -1133,120 +1118,914 @@
},
{
"cell_type": "code",
- "execution_count": 33,
+ "execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
- "/home/kst/tmp/dingsdi/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3049: DtypeWarning: Columns (14) have mixed types. Specify dtype option on import or set low_memory=False.\n",
+ "/home/oida/labs/pydays19/venv/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3049: DtypeWarning: Columns (14) have mixed types. Specify dtype option on import or set low_memory=False.\n",
" interactivity=interactivity, compiler=compiler, result=result)\n"
]
}
],
"source": [
- "meta = pd.read_csv('postcards_with_image_links.csv.bz2', compression='bz2')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "slideshow": {
- "slide_type": "subslide"
- }
- },
- "source": [
- "## Split Into Two Sets"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "**TODO**"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "slideshow": {
- "slide_type": "subslide"
- }
- },
- "source": [
- "## Download"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "**TODO**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "slideshow": {
- "slide_type": "slide"
- }
- },
- "source": [
- "# Just The Code"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import pandas as pd\n",
- "\n",
- "\n",
- "def akon_id_to_manifest_link(akon_id):\n",
- "    return f'https://iiif.onb.ac.at/presentation/AKON/{akon_id}/manifest'\n",
+ "import json\n",
"\n",
+ "def load_json(s):\n",
+ "    try:\n",
+ "        return json.loads(s.replace(\"'\", '\"'))\n",
+ "    except (ValueError, AttributeError):\n",
+ "        return []\n",
"\n",
- "meta = pd.read_csv('https://labs.onb.ac.at/gitlab/labs-team/' \\\n",
- " 'raw-metadata/raw/master/akon_postcards_public_domain.csv.bz2', compression='bz2')\n",
- "meta['manifest_link'] = meta['akon_id'].apply(akon_id_to_manifest_link)"
+ "meta = pd.read_csv('postcards_with_image_links.csv.bz2', compression='bz2', converters={\n",
+ " 'image_links': load_json\n",
+ "})"
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
- "outputs": [],
- "source": [
- "import requests\n",
- "from jsonpath_ng import jsonpath, parse\n",
- "\n",
- "\n",
- "image_id_jp = parse('$.sequences[*].canvases[*].images[*].resource.@id')\n",
- "\n",
- "def image_links_for_manifest_link(manifest_link):\n",
- "    r = requests.get(manifest_link)\n",
- "    try:\n",
- "        json = r.json()\n",
- "    except:\n",
- "        json = {}\n",
- "    return [match.value for match in image_id_jp.find(json)]\n",
- "\n",
- "\n",
- "meta['image_links'] = meta['manifest_link'].apply(image_links_for_manifest_link)"
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " Unnamed: 0 | \n",
+ " Unnamed: 0.1 | \n",
+ " akon_id | \n",
+ " id | \n",
+ " altitude | \n",
+ " building | \n",
+ " city | \n",
+ " color | \n",
+ " comment | \n",
+ " mountain | \n",
+ " other | \n",
+ " photographer | \n",
+ " publisher | \n",
+ " publisher_place | \n",
+ " region | \n",
+ " water_body | \n",
+ " year | \n",
+ " inventory_number | \n",
+ " signature | \n",
+ " revision_date | \n",
+ " date | \n",
+ " feature_class | \n",
+ " feature_code | \n",
+ " geoname_id | \n",
+ " latitude | \n",
+ " longitude | \n",
+ " name | \n",
+ " country_id | \n",
+ " admin_name_1 | \n",
+ " admin_code_1 | \n",
+ " geo | \n",
+ " manifest_link | \n",
+ " image_links | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 243 | \n",
+ " 243 | \n",
+ " 243 | \n",
+ " AK111_476 | \n",
+ " 75139 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " Rochlitz | \n",
+ " False | \n",
+ " v. 1907 | \n",
+ " Rochlitzer Berg | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " Niederösterreichische Landesbibliothek 1672 | \n",
+ " 2014-09-05 11:30:43.299 | \n",
+ " vor 1907 | \n",
+ " T | \n",
+ " HLL | \n",
+ " 2846260.0 | \n",
+ " 51.02678 | \n",
+ " 12.77079 | \n",
+ " Rochlitzer Berg | \n",
+ " DE | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 51.02678, 12.77079 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK111... | \n",
+ " [https://iiif.onb.ac.at/images/AKON/AK111_476/... | \n",
+ "
\n",
+ " \n",
+ " 34809 | \n",
+ " 34809 | \n",
+ " 34809 | \n",
+ " AK073_578 | \n",
+ " 45523 | \n",
+ " NaN | \n",
+ " Kgl. Residenz | \n",
+ " Würzburg | \n",
+ " False | \n",
+ " 1909 gel | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " Martin | \n",
+ " Nürnberg | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 2014-08-19 14:22:35.340 | \n",
+ " gelaufen 1909 | \n",
+ " P | \n",
+ " PPLA2 | \n",
+ " 2805615.0 | \n",
+ " 49.79391 | \n",
+ " 9.95121 | \n",
+ " Würzburg | \n",
+ " DE | \n",
+ " Bayern | \n",
+ " 02 | \n",
+ " 49.79391, 9.95121 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK073... | \n",
+ " [https://iiif.onb.ac.at/images/AKON/AK073_578/... | \n",
+ "
\n",
+ " \n",
+ " 18069 | \n",
+ " 18069 | \n",
+ " 18069 | \n",
+ " AK023_145 | \n",
+ " 13445 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " Villach | \n",
+ " True | \n",
+ " NaN | \n",
+ " Mittagskogel | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1912.0 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 2014-08-04 07:59:10.156 | \n",
+ " 1912 | \n",
+ " P | \n",
+ " PPLA2 | \n",
+ " 2762372.0 | \n",
+ " 46.61028 | \n",
+ " 13.85583 | \n",
+ " Villach | \n",
+ " AT | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 46.61028, 13.85583 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK023... | \n",
+ " [https://iiif.onb.ac.at/images/AKON/AK023_145/... | \n",
+ "
\n",
+ " \n",
+ " 4554 | \n",
+ " 4554 | \n",
+ " 4554 | \n",
+ " AK034_086 | \n",
+ " 20003 | \n",
+ " 693.0 | \n",
+ " Chorherrensift Vorau | \n",
+ " Vorau | \n",
+ " False | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " Raza | \n",
+ " Vorau | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1924.0 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 2014-09-16 14:48:11.455 | \n",
+ " 1924 | \n",
+ " S | \n",
+ " MSTY | \n",
+ " 2762297.0 | \n",
+ " 47.40000 | \n",
+ " 15.90000 | \n",
+ " Stift Vorau | \n",
+ " AT | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 47.4, 15.9 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK034... | \n",
+ " [https://iiif.onb.ac.at/images/AKON/AK034_086/... | \n",
+ "
\n",
+ " \n",
+ " 20907 | \n",
+ " 20907 | \n",
+ " 20907 | \n",
+ " AK032_497 | \n",
+ " 19311 | \n",
+ " NaN | \n",
+ " Schloss Purgstall | \n",
+ " NaN | \n",
+ " False | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1918.0 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 2014-08-04 07:59:10.257 | \n",
+ " 1918 | \n",
+ " A | \n",
+ " ADM3 | \n",
+ " 7873031.0 | \n",
+ " 48.05513 | \n",
+ " 15.13316 | \n",
+ " Purgstall an der Erlauf | \n",
+ " AT | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 48.05513, 15.13316 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK032... | \n",
+ " [https://iiif.onb.ac.at/images/AKON/AK032_497/... | \n",
+ "
\n",
+ " \n",
+ " 5136 | \n",
+ " 5136 | \n",
+ " 5136 | \n",
+ " AK111_054 | \n",
+ " 74715 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " Kindberg | \n",
+ " False | \n",
+ " 1901 gel | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " Niederösterreichische Landesbibliothek 1664 | \n",
+ " 2014-09-05 10:17:42.132 | \n",
+ " gelaufen 1901 | \n",
+ " P | \n",
+ " PPLA3 | \n",
+ " 2774437.0 | \n",
+ " 47.50000 | \n",
+ " 15.45000 | \n",
+ " Kindberg | \n",
+ " AT | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 47.5, 15.45 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK111... | \n",
+ " [https://iiif.onb.ac.at/images/AKON/AK111_054/... | \n",
+ "
\n",
+ " \n",
+ " 3871 | \n",
+ " 3871 | \n",
+ " 3871 | \n",
+ " AK125_381 | \n",
+ " 83488 | \n",
+ " 601.0 | \n",
+ " Hans Hackl's Gasthof zum Jaidhaus | \n",
+ " Hinterstoder | \n",
+ " False | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 1911.0 | \n",
+ " NaN | \n",
+ " Nationalbibliothek Karten Abteilung 5862 | \n",
+ " 2014-09-12 16:07:31.780 | \n",
+ " 1911 | \n",
+ " P | \n",
+ " PPL | \n",
+ " 2776235.0 | \n",
+ " 47.69957 | \n",
+ " 14.15468 | \n",
+ " Hinterstoder | \n",
+ " AT | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 47.69957, 14.15468 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK125... | \n",
+ " [https://iiif.onb.ac.at/images/AKON/AK125_381/... | \n",
+ "
\n",
+ " \n",
+ " 1174 | \n",
+ " 1174 | \n",
+ " 1174 | \n",
+ " AK116_235 | \n",
+ " 77922 | \n",
+ " NaN | \n",
+ " Burgruine Gars | \n",
+ " Gars a. Kamp | \n",
+ " False | \n",
+ " 1913 gel | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " Kiennast | \n",
+ " Gars | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 79/59 K | \n",
+ " NaN | \n",
+ " 2014-09-09 12:22:52.928 | \n",
+ " gelaufen 1913 | \n",
+ " P | \n",
+ " PPLA3 | \n",
+ " 2778845.0 | \n",
+ " 48.58333 | \n",
+ " 15.65000 | \n",
+ " Gars am Kamp | \n",
+ " AT | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 48.58333, 15.65 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK116... | \n",
+ " [https://iiif.onb.ac.at/images/AKON/AK116_235/... | \n",
+ "
\n",
+ " \n",
+ " 1897 | \n",
+ " 1897 | \n",
+ " 1897 | \n",
+ " AK118_376 | \n",
+ " 65136 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " False | \n",
+ " 1925 gel | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " Grundlsee | \n",
+ " NaN | \n",
+ " 11/44 Kt. | \n",
+ " Geogr. Topogr. Bilder-Samml. 1944, 4144 | \n",
+ " 2014-09-10 07:51:30.611 | \n",
+ " gelaufen 1925 | \n",
+ " H | \n",
+ " LK | \n",
+ " 2777424.0 | \n",
+ " 47.63333 | \n",
+ " 13.86667 | \n",
+ " Grundlsee | \n",
+ " AT | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 47.63333, 13.86667 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK118... | \n",
+ " [https://iiif.onb.ac.at/images/AKON/AK118_376/... | \n",
+ "
\n",
+ " \n",
+ " 33243 | \n",
+ " 33243 | \n",
+ " 33243 | \n",
+ " AK083_217 | \n",
+ " 52264 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " Höllenthal | \n",
+ " False | \n",
+ " v 1905 | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " Johannes | \n",
+ " Partenkirchen-Garmisch | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " NaN | \n",
+ " 2014-08-26 12:14:56.005 | \n",
+ " vor 1905 | \n",
+ " T | \n",
+ " CRQ | \n",
+ " 2900507.0 | \n",
+ " 47.43333 | \n",
+ " 11.01667 | \n",
+ " Höllental Kar | \n",
+ " DE | \n",
+ " Bayern | \n",
+ " 02 | \n",
+ " 47.43333, 11.01667 | \n",
+ " https://iiif.onb.ac.at/presentation/AKON/AK083... | \n",
+ " [https://iiif.onb.ac.at/images/AKON/AK083_217/... | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " Unnamed: 0 Unnamed: 0.1 akon_id id altitude \\\n",
+ "243 243 243 AK111_476 75139 NaN \n",
+ "34809 34809 34809 AK073_578 45523 NaN \n",
+ "18069 18069 18069 AK023_145 13445 NaN \n",
+ "4554 4554 4554 AK034_086 20003 693.0 \n",
+ "20907 20907 20907 AK032_497 19311 NaN \n",
+ "5136 5136 5136 AK111_054 74715 NaN \n",
+ "3871 3871 3871 AK125_381 83488 601.0 \n",
+ "1174 1174 1174 AK116_235 77922 NaN \n",
+ "1897 1897 1897 AK118_376 65136 NaN \n",
+ "33243 33243 33243 AK083_217 52264 NaN \n",
+ "\n",
+ " building city color comment \\\n",
+ "243 NaN Rochlitz False v. 1907 \n",
+ "34809 Kgl. Residenz Würzburg False 1909 gel \n",
+ "18069 NaN Villach True NaN \n",
+ "4554 Chorherrensift Vorau Vorau False NaN \n",
+ "20907 Schloss Purgstall NaN False NaN \n",
+ "5136 NaN Kindberg False 1901 gel \n",
+ "3871 Hans Hackl's Gasthof zum Jaidhaus Hinterstoder False NaN \n",
+ "1174 Burgruine Gars Gars a. Kamp False 1913 gel \n",
+ "1897 NaN NaN False 1925 gel \n",
+ "33243 NaN Höllenthal False v 1905 \n",
+ "\n",
+ " mountain other photographer publisher publisher_place \\\n",
+ "243 Rochlitzer Berg NaN NaN NaN NaN \n",
+ "34809 NaN NaN NaN Martin Nürnberg \n",
+ "18069 Mittagskogel NaN NaN NaN NaN \n",
+ "4554 NaN NaN NaN Raza Vorau \n",
+ "20907 NaN NaN NaN NaN NaN \n",
+ "5136 NaN NaN NaN NaN NaN \n",
+ "3871 NaN NaN NaN NaN NaN \n",
+ "1174 NaN NaN NaN Kiennast Gars \n",
+ "1897 NaN NaN NaN NaN NaN \n",
+ "33243 NaN NaN NaN Johannes Partenkirchen-Garmisch \n",
+ "\n",
+ " region water_body year inventory_number \\\n",
+ "243 NaN NaN NaN NaN \n",
+ "34809 NaN NaN NaN NaN \n",
+ "18069 NaN NaN 1912.0 NaN \n",
+ "4554 NaN NaN 1924.0 NaN \n",
+ "20907 NaN NaN 1918.0 NaN \n",
+ "5136 NaN NaN NaN NaN \n",
+ "3871 NaN NaN 1911.0 NaN \n",
+ "1174 NaN NaN NaN 79/59 K \n",
+ "1897 NaN Grundlsee NaN 11/44 Kt. \n",
+ "33243 NaN NaN NaN NaN \n",
+ "\n",
+ " signature revision_date \\\n",
+ "243 Niederösterreichische Landesbibliothek 1672 2014-09-05 11:30:43.299 \n",
+ "34809 NaN 2014-08-19 14:22:35.340 \n",
+ "18069 NaN 2014-08-04 07:59:10.156 \n",
+ "4554 NaN 2014-09-16 14:48:11.455 \n",
+ "20907 NaN 2014-08-04 07:59:10.257 \n",
+ "5136 Niederösterreichische Landesbibliothek 1664 2014-09-05 10:17:42.132 \n",
+ "3871 Nationalbibliothek Karten Abteilung 5862 2014-09-12 16:07:31.780 \n",
+ "1174 NaN 2014-09-09 12:22:52.928 \n",
+ "1897 Geogr. Topogr. Bilder-Samml. 1944, 4144 2014-09-10 07:51:30.611 \n",
+ "33243 NaN 2014-08-26 12:14:56.005 \n",
+ "\n",
+ " date feature_class feature_code geoname_id latitude \\\n",
+ "243 vor 1907 T HLL 2846260.0 51.02678 \n",
+ "34809 gelaufen 1909 P PPLA2 2805615.0 49.79391 \n",
+ "18069 1912 P PPLA2 2762372.0 46.61028 \n",
+ "4554 1924 S MSTY 2762297.0 47.40000 \n",
+ "20907 1918 A ADM3 7873031.0 48.05513 \n",
+ "5136 gelaufen 1901 P PPLA3 2774437.0 47.50000 \n",
+ "3871 1911 P PPL 2776235.0 47.69957 \n",
+ "1174 gelaufen 1913 P PPLA3 2778845.0 48.58333 \n",
+ "1897 gelaufen 1925 H LK 2777424.0 47.63333 \n",
+ "33243 vor 1905 T CRQ 2900507.0 47.43333 \n",
+ "\n",
+ " longitude name country_id admin_name_1 \\\n",
+ "243 12.77079 Rochlitzer Berg DE NaN \n",
+ "34809 9.95121 Würzburg DE Bayern \n",
+ "18069 13.85583 Villach AT NaN \n",
+ "4554 15.90000 Stift Vorau AT NaN \n",
+ "20907 15.13316 Purgstall an der Erlauf AT NaN \n",
+ "5136 15.45000 Kindberg AT NaN \n",
+ "3871 14.15468 Hinterstoder AT NaN \n",
+ "1174 15.65000 Gars am Kamp AT NaN \n",
+ "1897 13.86667 Grundlsee AT NaN \n",
+ "33243 11.01667 Höllental Kar DE Bayern \n",
+ "\n",
+ " admin_code_1 geo \\\n",
+ "243 NaN 51.02678, 12.77079 \n",
+ "34809 02 49.79391, 9.95121 \n",
+ "18069 NaN 46.61028, 13.85583 \n",
+ "4554 NaN 47.4, 15.9 \n",
+ "20907 NaN 48.05513, 15.13316 \n",
+ "5136 NaN 47.5, 15.45 \n",
+ "3871 NaN 47.69957, 14.15468 \n",
+ "1174 NaN 48.58333, 15.65 \n",
+ "1897 NaN 47.63333, 13.86667 \n",
+ "33243 02 47.43333, 11.01667 \n",
+ "\n",
+ " manifest_link \\\n",
+ "243 https://iiif.onb.ac.at/presentation/AKON/AK111... \n",
+ "34809 https://iiif.onb.ac.at/presentation/AKON/AK073... \n",
+ "18069 https://iiif.onb.ac.at/presentation/AKON/AK023... \n",
+ "4554 https://iiif.onb.ac.at/presentation/AKON/AK034... \n",
+ "20907 https://iiif.onb.ac.at/presentation/AKON/AK032... \n",
+ "5136 https://iiif.onb.ac.at/presentation/AKON/AK111... \n",
+ "3871 https://iiif.onb.ac.at/presentation/AKON/AK125... \n",
+ "1174 https://iiif.onb.ac.at/presentation/AKON/AK116... \n",
+ "1897 https://iiif.onb.ac.at/presentation/AKON/AK118... \n",
+ "33243 https://iiif.onb.ac.at/presentation/AKON/AK083... \n",
+ "\n",
+ " image_links \n",
+ "243 [https://iiif.onb.ac.at/images/AKON/AK111_476/... \n",
+ "34809 [https://iiif.onb.ac.at/images/AKON/AK073_578/... \n",
+ "18069 [https://iiif.onb.ac.at/images/AKON/AK023_145/... \n",
+ "4554 [https://iiif.onb.ac.at/images/AKON/AK034_086/... \n",
+ "20907 [https://iiif.onb.ac.at/images/AKON/AK032_497/... \n",
+ "5136 [https://iiif.onb.ac.at/images/AKON/AK111_054/... \n",
+ "3871 [https://iiif.onb.ac.at/images/AKON/AK125_381/... \n",
+ "1174 [https://iiif.onb.ac.at/images/AKON/AK116_235/... \n",
+ "1897 [https://iiif.onb.ac.at/images/AKON/AK118_376/... \n",
+ "33243 [https://iiif.onb.ac.at/images/AKON/AK083_217/... "
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "meta.sample(10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Split Into Two Sets"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We'll split the dataframe in two: one set of postcards that have an entry in the `mountain` column of the metadata, and one set that don't."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "nomountain = meta[meta['mountain'].isnull()]\n",
+ "mountain = meta[meta['mountain'].notnull()]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(34846, 29271, 5575)"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "len(meta), len(nomountain), len(mountain)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Yeah, that adds up: 29271 + 5575 = 34846."
+ ]
+ },
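+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal sketch of the same split on a made-up three-row dataframe, in case you want to see the check spelled out. The `toy` data and all variable names here are invented for illustration; only the `mountain` column name comes from the metadata above.\n",
+ "\n",
+ "```python\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Invented stand-in for `meta`: one postcard with a mountain entry, two without\n",
+ "toy = pd.DataFrame({'akon_id': ['a', 'b', 'c'],\n",
+ "                    'mountain': ['Mittagskogel', None, None]})\n",
+ "\n",
+ "has_mountain = toy['mountain'].notnull()\n",
+ "toy_mountain = toy[has_mountain]\n",
+ "toy_nomountain = toy[~has_mountain]\n",
+ "\n",
+ "# Every row lands in exactly one of the two sets\n",
+ "assert len(toy_mountain) + len(toy_nomountain) == len(toy)\n",
+ "assert toy_mountain.index.intersection(toy_nomountain.index).empty\n",
+ "```\n",
+ "\n",
+ "Building the boolean mask once and negating it guarantees that the two sets are complements of each other."
+ ]
+ },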
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Download"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Ok, so what's left to do?\n",
+ "\n",
+ "* Download all image data into two separate directories for training\n",
+ "* Resize the images for the CNN used\n",
+ "\n",
+ "VGG16 and VGG19 expect 224x224 pixel RGB images."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Luckily, IIIF allows us to request images already resized to our demands. That saves on bandwidth, time and code complexity.\n",
+ "\n",
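+ "According to the [standard](https://iiif.io/api/image/2.1/#size) we can set the `size` parameter to `w,h` to get the image scaled to exactly that width and height. Note that this form ignores the original aspect ratio, so a postcard that isn't square comes back distorted.\n",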
+ "\n",
+ "The links, before and after, would be:\n",
+ "\n",
+ "`https://iiif.onb.ac.at/images/AKON/AK024_176/176/full/full/0/native.jpg`\n",
+ "\n",
+ "`https://iiif.onb.ac.at/images/AKON/AK024_176/176/full/224,224/0/native.jpg`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Let's try it:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "r = requests.get('https://iiif.onb.ac.at/images/AKON/AK024_176/176/full/224,224/0/native.jpg')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from IPython.display import display, Image"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/jpeg": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "display(Image(r.content))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "That looks about right."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Download to file:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "-"
+ }
+ },
+ "outputs": [],
+ "source": [
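+ "import shutil\n",
+ "\n",
+ "def download_to_file(url, filename):\n",
+ "    # Stream the response straight to disk instead of holding it in memory\n",
+ "    with requests.get(url, stream=True) as r:\n",
+ "        r.raise_for_status()  # don't save an error page as a .jpg\n",
+ "        r.raw.decode_content = True  # undo gzip/deflate content encoding, if any\n",
+ "        with open(filename, 'wb') as fh:\n",
+ "            shutil.copyfileobj(r.raw, fh)\n",
+ "\n",
+ "def sized_link(iiif_url, size='224,224'):\n",
+ "    # IIIF image URLs end in .../{region}/{size}/{rotation}/{quality}.{format},\n",
+ "    # so the size is the third path segment from the end\n",
+ "    frags = iiif_url.split('/')\n",
+ "    frags[-3] = size\n",
+ "    return '/'.join(frags)"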
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "Test that:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "link = sized_link('https://iiif.onb.ac.at/images/AKON/AK024_176/176/full/full/0/native.jpg')\n",
+ "download_to_file(link, 'testimg.jpg')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/jpeg": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "with open('testimg.jpg', 'rb') as fh:\n",
+ "    display(Image(fh.read()))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Create directories:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "-"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "# exist_ok=True lets this cell be re-run without raising FileExistsError\n",
+ "os.makedirs('./images/mountain', exist_ok=True)\n",
+ "os.makedirs('./images/nomountain', exist_ok=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Now let's download!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "...................."
+ ]
+ }
+ ],
+ "source": [
+ "# For this demonstration we'll just take 10 postcards from each set\n",
+ "for label, subset in [('mountain', mountain), ('nomountain', nomountain)]:\n",
+ "    for idx, row in subset.sample(10).iterrows():\n",
+ "        akon_id = row['akon_id']\n",
+ "        # image_links is a list, hence the running number n in the file name\n",
+ "        for n, link in enumerate(row['image_links']):\n",
+ "            small_image_link = sized_link(link)\n",
+ "            file_name = f'./images/{label}/{akon_id}_{n}.jpg'\n",
+ "            download_to_file(small_image_link, file_name)\n",
+ "        print('.', end='')"
]
}
],