{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AKON Metadata - Data Overview"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Get a first impression of the postcard metadata*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook uses the [pandas Python Data Analysis Library](https://pandas.pydata.org/).\n",
"\n",
"For an introduction to pandas, take a look at this [Workshop for CBioVikings](https://github.com/dblyon/PandasIntro) by David Lyon."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The variable name `df` stands for *DataFrame*, the pandas table structure that holds the metadata."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/kst/tmp/dingsdi/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3049: DtypeWarning: Columns (13) have mixed types. Specify dtype option on import or set low_memory=False.\n",
" interactivity=interactivity, compiler=compiler, result=result)\n"
]
}
],
"source": [
"df = pd.read_csv('https://labs.onb.ac.at/gitlab/labs-team/raw-metadata/raw/master/akon_postcards_public_domain.csv.bz2', compression='bz2')"
]
},
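{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `DtypeWarning` above means pandas found mixed value types in column 13 (counting from 0, which appears to be `region`). A minimal sketch of one way to avoid the guessing is to pass an explicit `dtype`. The tiny table below is made-up stand-in data, not the real file, and treating `region` as text is an assumption:\n",
"\n",
"```python\n",
"import io\n",
"\n",
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in rows: one text value and one number in the same column\n",
"raw = pd.DataFrame({'akon_id': ['AK001_001', 'AK001_002'], 'region': ['Tirol', 7]})\n",
"csv_text = raw.to_csv(index=False)\n",
"\n",
"# Reading the column as str keeps every value as text instead of letting pandas guess\n",
"demo = pd.read_csv(io.StringIO(csv_text), dtype={'region': str})\n",
"print(demo['region'].tolist())\n",
"```\n",
"\n",
"Passing `low_memory=False`, as the warning suggests, is another option; it lets pandas infer each column type from the whole file at once."
]
},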
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## View Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Rough Overview"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How many records are in the dataset?"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What does a record look like?\n",
"Show me the first one!"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>akon_id</th>\n",
" <th>id</th>\n",
" <th>altitude</th>\n",
" <th>building</th>\n",
" <th>city</th>\n",
" <th>color</th>\n",
" <th>comment</th>\n",
" <th>mountain</th>\n",
" <th>other</th>\n",
" <th>...</th>\n",
" <th>geoname_id</th>\n",
" <th>latitude</th>\n",
" <th>longitude</th>\n",
" <th>name</th>\n",
" <th>country_id</th>\n",
" <th>admin_name_1</th>\n",
" <th>admin_code_1</th>\n",
" <th>geo</th>\n",
" <th>download_link</th>\n",
" <th>download_link_256x256</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>AK111_021</td>\n",
" <td>74682</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Kiel, Blücherplatz</td>\n",
" <td>False</td>\n",
" <td>1921 gel</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>2891122.0</td>\n",
" <td>54.32133</td>\n",
" <td>10.13489</td>\n",
" <td>Kiel</td>\n",
" <td>DE</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>54.32133, 10.13489</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK111_021/0...</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK111_021/0...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1 rows × 32 columns</p>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 akon_id id altitude building city color \\\n",
"0 0 AK111_021 74682 NaN NaN Kiel, Blücherplatz False \n",
"\n",
" comment mountain other ... geoname_id latitude longitude name \\\n",
"0 1921 gel NaN NaN ... 2891122.0 54.32133 10.13489 Kiel \n",
" country_id admin_name_1 admin_code_1 geo \\\n",
"0 DE NaN NaN 54.32133, 10.13489 \n",
" download_link \\\n",
"0 https://iiif.onb.ac.at/images/AKON/AK111_021/0... \n",
" download_link_256x256 \n",
"0 https://iiif.onb.ac.at/images/AKON/AK111_021/0... \n",
"[1 rows x 32 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `...` column shows that pandas has left out some columns to keep the output narrow. Let's fix that by raising the `display.max_columns` option:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"pd.set_option('display.max_columns', 100)"
]
},
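{
"cell_type": "markdown",
"metadata": {},
"source": [
"`pd.set_option` changes the setting for the rest of the session. As a sketch of an alternative, `pd.option_context` applies an option only inside a `with` block. The small frame here is made-up demo data:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical wide frame with 40 columns, standing in for the postcard table\n",
"demo = pd.DataFrame([list(range(40))])\n",
"\n",
"before = pd.get_option('display.max_columns')\n",
"with pd.option_context('display.max_columns', 100):\n",
"    # Inside the block the raised limit is active\n",
"    inside = pd.get_option('display.max_columns')\n",
"    wide_text = repr(demo)\n",
"# Leaving the block restores the previous value\n",
"after = pd.get_option('display.max_columns')\n",
"\n",
"print(inside, before == after)\n",
"```\n",
"\n",
"This keeps the wider display local to one cell, which can be handy when only a single output needs every column."
]
},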
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try again:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>akon_id</th>\n",
" <th>id</th>\n",
" <th>altitude</th>\n",
" <th>building</th>\n",
" <th>city</th>\n",
" <th>color</th>\n",
" <th>comment</th>\n",
" <th>mountain</th>\n",
" <th>other</th>\n",
" <th>photographer</th>\n",
" <th>publisher</th>\n",
" <th>publisher_place</th>\n",
" <th>region</th>\n",
" <th>water_body</th>\n",
" <th>year</th>\n",
" <th>inventory_number</th>\n",
" <th>signature</th>\n",
" <th>revision_date</th>\n",
" <th>date</th>\n",
" <th>feature_class</th>\n",
" <th>feature_code</th>\n",
" <th>geoname_id</th>\n",
" <th>latitude</th>\n",
" <th>longitude</th>\n",
" <th>name</th>\n",
" <th>country_id</th>\n",
" <th>admin_name_1</th>\n",
" <th>admin_code_1</th>\n",
" <th>geo</th>\n",
" <th>download_link</th>\n",
" <th>download_link_256x256</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>AK111_021</td>\n",
" <td>74682</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Kiel, Blücherplatz</td>\n",
" <td>False</td>\n",
" <td>1921 gel</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Geogr. Topogr. Bilder-Samml. 1943, 7735</td>\n",
" <td>2014-09-05 10:13:06.342</td>\n",
" <td>gelaufen 1921</td>\n",
" <td>PPLA</td>\n",
" <td>2891122.0</td>\n",
" <td>54.32133</td>\n",
" <td>10.13489</td>\n",
" <td>Kiel</td>\n",
" <td>DE</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>54.32133, 10.13489</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK111_021/0...</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK111_021/0...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 akon_id id altitude building city color \\\n",
"0 0 AK111_021 74682 NaN NaN Kiel, Blücherplatz False \n",
" comment mountain other photographer publisher publisher_place region \\\n",
"0 1921 gel NaN NaN NaN NaN NaN NaN \n",
" water_body year inventory_number signature \\\n",
"0 NaN NaN NaN Geogr. Topogr. Bilder-Samml. 1943, 7735 \n",
"\n",
" revision_date date feature_class feature_code \\\n",
"0 2014-09-05 10:13:06.342 gelaufen 1921 P PPLA \n",
"\n",
" geoname_id latitude longitude name country_id admin_name_1 admin_code_1 \\\n",
"0 2891122.0 54.32133 10.13489 Kiel DE NaN NaN \n",
" geo download_link \\\n",
"0 54.32133, 10.13489 https://iiif.onb.ac.at/images/AKON/AK111_021/0... \n",
" download_link_256x256 \n",
"0 https://iiif.onb.ac.at/images/AKON/AK111_021/0... "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we see all columns."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What are all the columns called again?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Unnamed: 0', 'akon_id', 'id', 'altitude', 'building', 'city', 'color',\n",
" 'comment', 'mountain', 'other', 'photographer', 'publisher',\n",
" 'publisher_place', 'region', 'water_body', 'year', 'inventory_number',\n",
" 'signature', 'revision_date', 'date', 'feature_class', 'feature_code',\n",
" 'geoname_id', 'latitude', 'longitude', 'name', 'country_id',\n",
" 'admin_name_1', 'admin_code_1', 'geo', 'download_link',\n",
" 'download_link_256x256'],\n",
" dtype='object')"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Show Random Entries"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show me 3 random entries:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>akon_id</th>\n",
" <th>id</th>\n",
" <th>altitude</th>\n",
" <th>building</th>\n",
" <th>city</th>\n",
" <th>color</th>\n",
" <th>comment</th>\n",
" <th>mountain</th>\n",
" <th>other</th>\n",
" <th>photographer</th>\n",
" <th>publisher</th>\n",
" <th>publisher_place</th>\n",
" <th>region</th>\n",
" <th>water_body</th>\n",
" <th>year</th>\n",
" <th>inventory_number</th>\n",
" <th>signature</th>\n",
" <th>revision_date</th>\n",
" <th>date</th>\n",
" <th>feature_class</th>\n",
" <th>feature_code</th>\n",
" <th>geoname_id</th>\n",
" <th>latitude</th>\n",
" <th>longitude</th>\n",
" <th>name</th>\n",
" <th>country_id</th>\n",
" <th>admin_name_1</th>\n",
" <th>admin_code_1</th>\n",
" <th>geo</th>\n",
" <th>download_link</th>\n",
" <th>download_link_256x256</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>28908</th>\n",
" <td>28908</td>\n",
" <td>AK066_086</td>\n",
" <td>40120</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Innsbruck, Maria Theresienstrasse</td>\n",
" <td>False</td>\n",
" <td>1907 gel</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Gratl</td>\n",
" <td>Innsbruck</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2014-08-04 07:59:10.424</td>\n",
" <td>vor 1907</td>\n",
" <td>PPLA</td>\n",
" <td>2775220.0</td>\n",
" <td>47.26266</td>\n",
" <td>11.39454</td>\n",
" <td>Innsbruck</td>\n",
" <td>AT</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>47.26266, 11.39454</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK066_086/0...</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK066_086/0...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21317</th>\n",
" <td>21317</td>\n",
" <td>AK034_386</td>\n",
" <td>20303</td>\n",
" <td>251.0</td>\n",
" <td>Gars-Thunau am Kamp</td>\n",
" <td>True</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Ledermann</td>\n",
" <td>Wien</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1909.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2014-08-04 07:59:10.272</td>\n",
" <td>1909</td>\n",
" <td>P</td>\n",
" <td>PPL</td>\n",
" <td>2763660.0</td>\n",
" <td>48.58333</td>\n",
" <td>15.65000</td>\n",
" <td>Thunau am Kamp</td>\n",
" <td>AT</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>48.58333, 15.65</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK034_386/3...</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK034_386/3...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23201</th>\n",
" <td>23201</td>\n",
" <td>AK041_572</td>\n",
" <td>24699</td>\n",
" <td>251.0</td>\n",
" <td>Gars-Thunau am Kamp</td>\n",
" <td>False</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Ledermann</td>\n",
" <td>Wien</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1908.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2014-08-04 07:59:10.328</td>\n",
" <td>1908</td>\n",
" <td>P</td>\n",
" <td>PPL</td>\n",
" <td>2763660.0</td>\n",
" <td>48.58333</td>\n",
" <td>15.65000</td>\n",
" <td>Thunau am Kamp</td>\n",
" <td>AT</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>48.58333, 15.65</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK041_572/5...</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK041_572/5...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 akon_id id altitude building \\\n",
"28908 28908 AK066_086 40120 NaN NaN \n",
"21317 21317 AK034_386 20303 251.0 NaN \n",
"23201 23201 AK041_572 24699 251.0 NaN \n",
" city color comment mountain other \\\n",
"28908 Innsbruck, Maria Theresienstrasse False 1907 gel NaN NaN \n",
"21317 Gars-Thunau am Kamp True NaN NaN NaN \n",
"23201 Gars-Thunau am Kamp False NaN NaN NaN \n",
" photographer publisher publisher_place region water_body year \\\n",
"28908 NaN Gratl Innsbruck NaN NaN NaN \n",
"21317 NaN Ledermann Wien NaN NaN 1909.0 \n",
"23201 NaN Ledermann Wien NaN NaN 1908.0 \n",
" inventory_number signature revision_date date \\\n",
"28908 NaN NaN 2014-08-04 07:59:10.424 vor 1907 \n",
"21317 NaN NaN 2014-08-04 07:59:10.272 1909 \n",
"23201 NaN NaN 2014-08-04 07:59:10.328 1908 \n",
" feature_class feature_code geoname_id latitude longitude \\\n",
"28908 P PPLA 2775220.0 47.26266 11.39454 \n",
"21317 P PPL 2763660.0 48.58333 15.65000 \n",
"23201 P PPL 2763660.0 48.58333 15.65000 \n",
" name country_id admin_name_1 admin_code_1 \\\n",
"28908 Innsbruck AT NaN NaN \n",
"21317 Thunau am Kamp AT NaN NaN \n",
"23201 Thunau am Kamp AT NaN NaN \n",
"\n",
" geo download_link \\\n",
"28908 47.26266, 11.39454 https://iiif.onb.ac.at/images/AKON/AK066_086/0... \n",
"21317 48.58333, 15.65 https://iiif.onb.ac.at/images/AKON/AK034_386/3... \n",
"23201 48.58333, 15.65 https://iiif.onb.ac.at/images/AKON/AK041_572/5... \n",
"\n",
" download_link_256x256 \n",
"28908 https://iiif.onb.ac.at/images/AKON/AK066_086/0... \n",
"21317 https://iiif.onb.ac.at/images/AKON/AK034_386/3... \n",
"23201 https://iiif.onb.ac.at/images/AKON/AK041_572/5... "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sample(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Calling `sample` again yields different entries:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Unnamed: 0</th>\n",
" <th>akon_id</th>\n",
" <th>id</th>\n",
" <th>altitude</th>\n",
" <th>building</th>\n",
" <th>city</th>\n",
" <th>color</th>\n",
" <th>comment</th>\n",
" <th>mountain</th>\n",
" <th>other</th>\n",
" <th>photographer</th>\n",
" <th>publisher</th>\n",
" <th>publisher_place</th>\n",
" <th>region</th>\n",
" <th>water_body</th>\n",
" <th>year</th>\n",
" <th>inventory_number</th>\n",
" <th>signature</th>\n",
" <th>revision_date</th>\n",
" <th>date</th>\n",
" <th>feature_class</th>\n",
" <th>feature_code</th>\n",
" <th>geoname_id</th>\n",
" <th>latitude</th>\n",
" <th>longitude</th>\n",
" <th>name</th>\n",
" <th>country_id</th>\n",
" <th>admin_name_1</th>\n",
" <th>admin_code_1</th>\n",
" <th>geo</th>\n",
" <th>download_link</th>\n",
" <th>download_link_256x256</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>18810</th>\n",
" <td>18810</td>\n",
" <td>AK025_111</td>\n",
" <td>14618</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Bruck an der Mur</td>\n",
" <td>True</td>\n",
" <td>Mugel</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Ledermann</td>\n",
" <td>Wien</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1916.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2014-10-15 12:03:01.028</td>\n",
" <td>1916</td>\n",
" <td>P</td>\n",
" <td>PPLA3</td>\n",
" <td>2781371.0</td>\n",
" <td>47.41667</td>\n",
" <td>15.28333</td>\n",
" <td>Bruck an der Mur</td>\n",
" <td>AT</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>47.41667, 15.28333</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK025_111/1...</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK025_111/1...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28146</th>\n",
" <td>28146</td>\n",
" <td>AK061_165</td>\n",
" <td>36541</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Orosháza</td>\n",
" <td>True</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Vágner</td>\n",
" <td>Orosháza</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1917.0</td>\n",
" <td>NaN</td>\n",
" <td>Kartensammlung 79/66 G</td>\n",
" <td>2015-08-25 15:28:56.547</td>\n",
" <td>1917</td>\n",
" <td>P</td>\n",
" <td>PPL</td>\n",
" <td>716736.0</td>\n",
" <td>46.56667</td>\n",
" <td>20.66667</td>\n",
" <td>Oroshaza</td>\n",
" <td>HU</td>\n",
" <td>Bekes County</td>\n",
" <td>03</td>\n",
" <td>46.56667, 20.66667</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK061_165/1...</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK061_165/1...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8335</th>\n",
" <td>8335</td>\n",
" <td>AK088_563</td>\n",
" <td>56103</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Bad Reichenhall</td>\n",
" <td>False</td>\n",
" <td>1907 gel</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Geogrphisch-topographische Bildersammlung 1076/43</td>\n",
" <td>2014-08-28 16:20:02.029</td>\n",
" <td>vor 1907</td>\n",
" <td>P</td>\n",
" <td>PPLA3</td>\n",
" <td>2953371.0</td>\n",
" <td>47.72947</td>\n",
" <td>12.87819</td>\n",
" <td>Bad Reichenhall</td>\n",
" <td>DE</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>47.72947, 12.87819</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK088_563/5...</td>\n",
" <td>https://iiif.onb.ac.at/images/AKON/AK088_563/5...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 akon_id id altitude building city \\\n",
"18810 18810 AK025_111 14618 NaN NaN Bruck an der Mur \n",
"28146 28146 AK061_165 36541 NaN NaN Orosháza \n",
"8335 8335 AK088_563 56103 NaN NaN Bad Reichenhall \n",
" color comment mountain other photographer publisher publisher_place \\\n",
"18810 True NaN Mugel NaN NaN Ledermann Wien \n",
"28146 True NaN NaN NaN NaN Vágner Orosháza \n",
"8335 False 1907 gel NaN NaN NaN NaN NaN \n",
" region water_body year inventory_number \\\n",
"18810 NaN NaN 1916.0 NaN \n",
"28146 NaN NaN 1917.0 NaN \n",
"8335 NaN NaN NaN NaN \n",
" signature \\\n",
"18810 NaN \n",
"28146 Kartensammlung 79/66 G \n",
"8335 Geogrphisch-topographische Bildersammlung 1076/43 \n",
" revision_date date feature_class feature_code \\\n",
"18810 2014-10-15 12:03:01.028 1916 P PPLA3 \n",
"28146 2015-08-25 15:28:56.547 1917 P PPL \n",
"8335 2014-08-28 16:20:02.029 vor 1907 P PPLA3 \n",
"\n",
" geoname_id latitude longitude name country_id \\\n",
"18810 2781371.0 47.41667 15.28333 Bruck an der Mur AT \n",
"28146 716736.0 46.56667 20.66667 Oroshaza HU \n",
"8335 2953371.0 47.72947 12.87819 Bad Reichenhall DE \n",
"\n",
" admin_name_1 admin_code_1 geo \\\n",
"18810 NaN NaN 47.41667, 15.28333 \n",
"28146 Bekes County 03 46.56667, 20.66667 \n",
"8335 NaN NaN 47.72947, 12.87819 \n",
"\n",
" download_link \\\n",
"18810 https://iiif.onb.ac.at/images/AKON/AK025_111/1... \n",
"28146 https://iiif.onb.ac.at/images/AKON/AK061_165/1... \n",
"8335 https://iiif.onb.ac.at/images/AKON/AK088_563/5... \n",
" download_link_256x256 \n",
"18810 https://iiif.onb.ac.at/images/AKON/AK025_111/1... \n",
"28146 https://iiif.onb.ac.at/images/AKON/AK061_165/1... \n",
"8335 https://iiif.onb.ac.at/images/AKON/AK088_563/5... "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sample(3)"
]
},
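{
"cell_type": "markdown",
"metadata": {},
"source": [
"`sample` draws new rows on every call. If a reproducible selection is wanted, a fixed `random_state` can be passed. A minimal sketch on made-up IDs (the seed value 42 is arbitrary):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical IDs, standing in for the postcard table\n",
"demo = pd.DataFrame({'akon_id': ['AK001', 'AK002', 'AK003', 'AK004', 'AK005']})\n",
"\n",
"# The same seed yields the same rows on every call\n",
"first = demo.sample(3, random_state=42)\n",
"second = demo.sample(3, random_state=42)\n",
"\n",
"print(first.equals(second))\n",
"```\n",
"\n",
"Without `random_state`, the two calls would usually return different rows, as in the outputs above."
]
},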
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Count Things"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How many entries show places in Italy?\n",
"\n",
"Let's use the `country_id` column to answer this:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"df_in_italy = df[df['country_id'] == 'IT']"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(df_in_italy)"
]
},
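{
"cell_type": "markdown",
"metadata": {},
"source": [
"Filtering answers the question for one country. As a sketch, `value_counts` tallies every country in one call, and summing a boolean mask gives a single count without building a filtered frame. The country codes below are made-up demo values:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical country codes, standing in for the country_id column\n",
"demo = pd.DataFrame({'country_id': ['AT', 'IT', 'AT', 'DE', 'IT', 'AT']})\n",
"\n",
"# One count per distinct value, sorted from most to least frequent\n",
"counts = demo['country_id'].value_counts()\n",
"\n",
"# True counts as 1, so the sum of the mask is the number of matching rows\n",
"n_italy = (demo['country_id'] == 'IT').sum()\n",
"\n",
"print(counts['IT'], n_italy)\n",
"```\n",
"\n",
"On the real data the same calls would be `df['country_id'].value_counts()` and `(df['country_id'] == 'IT').sum()`."
]
},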
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How many postcards are in color?"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"df_in_color = df[df['color'] == True]"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(df_in_color)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Can I do this in one line?"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(df[df['color'] == True])"
]