{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# AKON Metadata - Data Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Get a first impression of the postcard metadata*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the [Pandas Python Data Analysis Library](https://pandas.pydata.org/).\n", "\n", "For an intro to pandas, feel free to take a look at this [Workshop for CBioVikings](https://github.com/dblyon/PandasIntro) by David Lyon." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`df` stands for *Data Frame*" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('https://labs.onb.ac.at/gitlab/labs-team/raw-metadata/raw/master/akon_postcards_public_domain.csv.bz2', compression='bz2')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## View Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Rough Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many records are in there?" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "34846" }, "metadata": {}, "execution_count": 5 } ], "source": [ "len(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What does a record look like?\n", "Show me the first one!" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": " Unnamed: 0 akon_id id altitude building city color \\\n0 0 AK111_021 74682 NaN NaN Kiel, Blücherplatz False \n\n comment mountain other ... 
geoname_id latitude longitude name \\\n0 1921 gel NaN NaN ... 2891122.0 54.32133 10.13489 Kiel \n\n country_id admin_name_1 admin_code_1 geo \\\n0 DE NaN NaN 54.32133, 10.13489 \n\n download_link \\\n0 https://iiif.onb.ac.at/images/AKON/AK111_021/0... \n\n download_link_256x256 \n0 https://iiif.onb.ac.at/images/AKON/AK111_021/0... \n\n[1 rows x 32 columns]", "text/html": "
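<div>
<table border=\"1\" class=\"dataframe\">
<thead><tr><th></th><th>Unnamed: 0</th><th>akon_id</th><th>id</th><th>altitude</th><th>building</th><th>city</th><th>color</th><th>comment</th><th>mountain</th><th>other</th><th>...</th><th>geoname_id</th><th>latitude</th><th>longitude</th><th>name</th><th>country_id</th><th>admin_name_1</th><th>admin_code_1</th><th>geo</th><th>download_link</th><th>download_link_256x256</th></tr></thead>
<tbody><tr><th>0</th><td>0</td><td>AK111_021</td><td>74682</td><td>NaN</td><td>NaN</td><td>Kiel, Blücherplatz</td><td>False</td><td>1921 gel</td><td>NaN</td><td>NaN</td><td>...</td><td>2891122.0</td><td>54.32133</td><td>10.13489</td><td>Kiel</td><td>DE</td><td>NaN</td><td>NaN</td><td>54.32133, 10.13489</td><td>https://iiif.onb.ac.at/images/AKON/AK111_021/0...</td><td>https://iiif.onb.ac.at/images/AKON/AK111_021/0...</td></tr></tbody>
</table>
<p>1 rows × 32 columns</p>
</div>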
" }, "metadata": {}, "execution_count": 6 } ], "source": [ "df.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There seem to be a few columns missing from the output. Let's fix that by setting pandas output options:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "pd.set_option('display.max_columns', 100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try again:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": " Unnamed: 0 akon_id id altitude building city color \\\n0 0 AK111_021 74682 NaN NaN Kiel, Blücherplatz False \n\n comment mountain other photographer publisher publisher_place region \\\n0 1921 gel NaN NaN NaN NaN NaN NaN \n\n water_body year inventory_number signature \\\n0 NaN NaN NaN Geogr. Topogr. Bilder-Samml. 1943, 7735 \n\n revision_date date feature_class feature_code \\\n0 2014-09-05 10:13:06.342 gelaufen 1921 P PPLA \n\n geoname_id latitude longitude name country_id admin_name_1 admin_code_1 \\\n0 2891122.0 54.32133 10.13489 Kiel DE NaN NaN \n\n geo download_link \\\n0 54.32133, 10.13489 https://iiif.onb.ac.at/images/AKON/AK111_021/0... \n\n download_link_256x256 \n0 https://iiif.onb.ac.at/images/AKON/AK111_021/0... ", "text/html": "
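<div>
<table border=\"1\" class=\"dataframe\">
<thead><tr><th></th><th>Unnamed: 0</th><th>akon_id</th><th>id</th><th>altitude</th><th>building</th><th>city</th><th>color</th><th>comment</th><th>mountain</th><th>other</th><th>photographer</th><th>publisher</th><th>publisher_place</th><th>region</th><th>water_body</th><th>year</th><th>inventory_number</th><th>signature</th><th>revision_date</th><th>date</th><th>feature_class</th><th>feature_code</th><th>geoname_id</th><th>latitude</th><th>longitude</th><th>name</th><th>country_id</th><th>admin_name_1</th><th>admin_code_1</th><th>geo</th><th>download_link</th><th>download_link_256x256</th></tr></thead>
<tbody><tr><th>0</th><td>0</td><td>AK111_021</td><td>74682</td><td>NaN</td><td>NaN</td><td>Kiel, Blücherplatz</td><td>False</td><td>1921 gel</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>Geogr. Topogr. Bilder-Samml. 1943, 7735</td><td>2014-09-05 10:13:06.342</td><td>gelaufen 1921</td><td>P</td><td>PPLA</td><td>2891122.0</td><td>54.32133</td><td>10.13489</td><td>Kiel</td><td>DE</td><td>NaN</td><td>NaN</td><td>54.32133, 10.13489</td><td>https://iiif.onb.ac.at/images/AKON/AK111_021/0...</td><td>https://iiif.onb.ac.at/images/AKON/AK111_021/0...</td></tr></tbody>
</table>
</div>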
" }, "metadata": {}, "execution_count": 8 } ], "source": [ "df.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we see all columns." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What are all the columns called again?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "Index(['Unnamed: 0', 'akon_id', 'id', 'altitude', 'building', 'city', 'color',\n 'comment', 'mountain', 'other', 'photographer', 'publisher',\n 'publisher_place', 'region', 'water_body', 'year', 'inventory_number',\n 'signature', 'revision_date', 'date', 'feature_class', 'feature_code',\n 'geoname_id', 'latitude', 'longitude', 'name', 'country_id',\n 'admin_name_1', 'admin_code_1', 'geo', 'download_link',\n 'download_link_256x256'],\n dtype='object')" }, "metadata": {}, "execution_count": 9 } ], "source": [ "df.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Show Random Entries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Show me 3 random entries:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": " Unnamed: 0 akon_id id altitude building \\\n7880 7880 AK086_229 54365 NaN Montraux-Palace, Belmont \n16014 16014 AK015_033 8502 NaN Schloss Waldhausen \n26648 26648 AK053_218 31570 NaN NaN \n\n city color comment mountain \\\n7880 Montreux True 1911 gel Alpes de la Savoie \n16014 NaN False NaN NaN \n26648 Kalksburg, Breitenfurterstrasse False NaN NaN \n\n other photographer publisher publisher_place region water_body \\\n7880 NaN NaN Photoglob Co. 
Zürich NaN NaN \n16014 NaN NaN NaN NaN NaN NaN \n26648 NaN NaN Janko Kalksburg NaN NaN \n\n year inventory_number signature revision_date \\\n7880 NaN NaN NaN 2014-08-27 15:44:51.079 \n16014 1910.0 NaN NaN 2014-08-04 07:59:10.026 \n26648 1918.0 NaN NaN 2014-08-04 07:59:10.386 \n\n date feature_class feature_code geoname_id latitude \\\n7880 gelaufen 1911 P PPL 2659601.0 46.43301 \n16014 1910 P PPL 2762012.0 48.27377 \n26648 1918 A ADM4 2774904.0 48.13754 \n\n longitude name country_id admin_name_1 \\\n7880 6.91143 Montreux CH Waadt \n16014 14.94750 Waldhausen im Strudengau AT NaN \n26648 16.24599 Kalksburg AT NaN \n\n admin_code_1 geo \\\n7880 VD 46.43301, 6.91143 \n16014 NaN 48.27377, 14.9475 \n26648 NaN 48.13754, 16.24599 \n\n download_link \\\n7880 https://iiif.onb.ac.at/images/AKON/AK086_229/2... \n16014 https://iiif.onb.ac.at/images/AKON/AK015_033/0... \n26648 https://iiif.onb.ac.at/images/AKON/AK053_218/2... \n\n download_link_256x256 \n7880 https://iiif.onb.ac.at/images/AKON/AK086_229/2... \n16014 https://iiif.onb.ac.at/images/AKON/AK015_033/0... \n26648 https://iiif.onb.ac.at/images/AKON/AK053_218/2... ", "text/html": "
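<div>
<table border=\"1\" class=\"dataframe\">
<thead><tr><th></th><th>Unnamed: 0</th><th>akon_id</th><th>id</th><th>altitude</th><th>building</th><th>city</th><th>color</th><th>comment</th><th>mountain</th><th>other</th><th>photographer</th><th>publisher</th><th>publisher_place</th><th>region</th><th>water_body</th><th>year</th><th>inventory_number</th><th>signature</th><th>revision_date</th><th>date</th><th>feature_class</th><th>feature_code</th><th>geoname_id</th><th>latitude</th><th>longitude</th><th>name</th><th>country_id</th><th>admin_name_1</th><th>admin_code_1</th><th>geo</th><th>download_link</th><th>download_link_256x256</th></tr></thead>
<tbody>
<tr><th>7880</th><td>7880</td><td>AK086_229</td><td>54365</td><td>NaN</td><td>Montraux-Palace, Belmont</td><td>Montreux</td><td>True</td><td>1911 gel</td><td>Alpes de la Savoie</td><td>NaN</td><td>NaN</td><td>Photoglob Co.</td><td>Zürich</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>2014-08-27 15:44:51.079</td><td>gelaufen 1911</td><td>P</td><td>PPL</td><td>2659601.0</td><td>46.43301</td><td>6.91143</td><td>Montreux</td><td>CH</td><td>Waadt</td><td>VD</td><td>46.43301, 6.91143</td><td>https://iiif.onb.ac.at/images/AKON/AK086_229/2...</td><td>https://iiif.onb.ac.at/images/AKON/AK086_229/2...</td></tr>
<tr><th>16014</th><td>16014</td><td>AK015_033</td><td>8502</td><td>NaN</td><td>Schloss Waldhausen</td><td>NaN</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>1910.0</td><td>NaN</td><td>NaN</td><td>2014-08-04 07:59:10.026</td><td>1910</td><td>P</td><td>PPL</td><td>2762012.0</td><td>48.27377</td><td>14.94750</td><td>Waldhausen im Strudengau</td><td>AT</td><td>NaN</td><td>NaN</td><td>48.27377, 14.9475</td><td>https://iiif.onb.ac.at/images/AKON/AK015_033/0...</td><td>https://iiif.onb.ac.at/images/AKON/AK015_033/0...</td></tr>
<tr><th>26648</th><td>26648</td><td>AK053_218</td><td>31570</td><td>NaN</td><td>NaN</td><td>Kalksburg, Breitenfurterstrasse</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>Janko</td><td>Kalksburg</td><td>NaN</td><td>NaN</td><td>1918.0</td><td>NaN</td><td>NaN</td><td>2014-08-04 07:59:10.386</td><td>1918</td><td>A</td><td>ADM4</td><td>2774904.0</td><td>48.13754</td><td>16.24599</td><td>Kalksburg</td><td>AT</td><td>NaN</td><td>NaN</td><td>48.13754, 16.24599</td><td>https://iiif.onb.ac.at/images/AKON/AK053_218/2...</td><td>https://iiif.onb.ac.at/images/AKON/AK053_218/2...</td></tr>
</tbody>
</table>
</div>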
" }, "metadata": {}, "execution_count": 10 } ], "source": [ "df.sample(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calling `sample` again yields different entries:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": " Unnamed: 0 akon_id id altitude building city color \\\n32175 32175 AK068_223 41634 NaN NaN Baden True \n17757 17757 AK021_519 12608 NaN NaN Zell am See False \n2833 2833 AK121_438 81017 NaN Festenburg Festenburg False \n\n comment mountain other photographer publisher publisher_place region \\\n32175 NaN NaN NaN NaN Bauer Wien NaN \n17757 v 1907 NaN NaN NaN Ledermann Wien NaN \n2833 NaN NaN NaN NaN Pelnitschar Aspang NaN \n\n water_body year inventory_number \\\n32175 NaN 1913.0 NaN \n17757 Zeller See NaN NaN \n2833 NaN 1920.0 NaN \n\n signature revision_date \\\n32175 Vues-Sammlung I. 7425 2014-08-13 14:19:10.145 \n17757 NaN 2014-08-04 07:59:10.136 \n2833 Nationalbibliothek Karten Abteilung 3062 2014-09-12 08:38:22.055 \n\n date feature_class feature_code geoname_id latitude longitude \\\n32175 1913 P PPLA3 2782067.0 48.00543 16.23264 \n17757 vor 1907 P PPLA3 2760634.0 47.32556 12.79444 \n2833 1920 S CSTL 2779616.0 47.45000 15.91667 \n\n name country_id admin_name_1 admin_code_1 \\\n32175 Baden bei Wien AT NaN NaN \n17757 Zell am See AT NaN NaN \n2833 Festenburg AT NaN NaN \n\n geo download_link \\\n32175 48.00543, 16.23264 https://iiif.onb.ac.at/images/AKON/AK068_223/2... \n17757 47.32556, 12.79444 https://iiif.onb.ac.at/images/AKON/AK021_519/5... \n2833 47.45, 15.91667 https://iiif.onb.ac.at/images/AKON/AK121_438/4... \n\n download_link_256x256 \n32175 https://iiif.onb.ac.at/images/AKON/AK068_223/2... \n17757 https://iiif.onb.ac.at/images/AKON/AK021_519/5... \n2833 https://iiif.onb.ac.at/images/AKON/AK121_438/4... ", "text/html": "
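<div>
<table border=\"1\" class=\"dataframe\">
<thead><tr><th></th><th>Unnamed: 0</th><th>akon_id</th><th>id</th><th>altitude</th><th>building</th><th>city</th><th>color</th><th>comment</th><th>mountain</th><th>other</th><th>photographer</th><th>publisher</th><th>publisher_place</th><th>region</th><th>water_body</th><th>year</th><th>inventory_number</th><th>signature</th><th>revision_date</th><th>date</th><th>feature_class</th><th>feature_code</th><th>geoname_id</th><th>latitude</th><th>longitude</th><th>name</th><th>country_id</th><th>admin_name_1</th><th>admin_code_1</th><th>geo</th><th>download_link</th><th>download_link_256x256</th></tr></thead>
<tbody>
<tr><th>32175</th><td>32175</td><td>AK068_223</td><td>41634</td><td>NaN</td><td>NaN</td><td>Baden</td><td>True</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>Bauer</td><td>Wien</td><td>NaN</td><td>NaN</td><td>1913.0</td><td>NaN</td><td>Vues-Sammlung I. 7425</td><td>2014-08-13 14:19:10.145</td><td>1913</td><td>P</td><td>PPLA3</td><td>2782067.0</td><td>48.00543</td><td>16.23264</td><td>Baden bei Wien</td><td>AT</td><td>NaN</td><td>NaN</td><td>48.00543, 16.23264</td><td>https://iiif.onb.ac.at/images/AKON/AK068_223/2...</td><td>https://iiif.onb.ac.at/images/AKON/AK068_223/2...</td></tr>
<tr><th>17757</th><td>17757</td><td>AK021_519</td><td>12608</td><td>NaN</td><td>NaN</td><td>Zell am See</td><td>False</td><td>v 1907</td><td>NaN</td><td>NaN</td><td>NaN</td><td>Ledermann</td><td>Wien</td><td>NaN</td><td>Zeller See</td><td>NaN</td><td>NaN</td><td>NaN</td><td>2014-08-04 07:59:10.136</td><td>vor 1907</td><td>P</td><td>PPLA3</td><td>2760634.0</td><td>47.32556</td><td>12.79444</td><td>Zell am See</td><td>AT</td><td>NaN</td><td>NaN</td><td>47.32556, 12.79444</td><td>https://iiif.onb.ac.at/images/AKON/AK021_519/5...</td><td>https://iiif.onb.ac.at/images/AKON/AK021_519/5...</td></tr>
<tr><th>2833</th><td>2833</td><td>AK121_438</td><td>81017</td><td>NaN</td><td>Festenburg</td><td>Festenburg</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>Pelnitschar</td><td>Aspang</td><td>NaN</td><td>NaN</td><td>1920.0</td><td>NaN</td><td>Nationalbibliothek Karten Abteilung 3062</td><td>2014-09-12 08:38:22.055</td><td>1920</td><td>S</td><td>CSTL</td><td>2779616.0</td><td>47.45000</td><td>15.91667</td><td>Festenburg</td><td>AT</td><td>NaN</td><td>NaN</td><td>47.45, 15.91667</td><td>https://iiif.onb.ac.at/images/AKON/AK121_438/4...</td><td>https://iiif.onb.ac.at/images/AKON/AK121_438/4...</td></tr>
</tbody>
</table>
</div>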
" }, "metadata": {}, "execution_count": 11 } ], "source": [ "df.sample(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Count Things" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many entries show things in Italy?\n", "\n", "Let's use the `country_id` for this question:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "df_in_italy = df[df['country_id'] == 'IT']" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "3221" }, "metadata": {}, "execution_count": 13 } ], "source": [ "len(df_in_italy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many postcards are in color?" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "df_in_color = df[df['color'] == True]" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "7667" }, "metadata": {}, "execution_count": 15 } ], "source": [ "len(df_in_color)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can I do this in one line?" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "7667" }, "metadata": {}, "execution_count": 16 } ], "source": [ "len(df[df['color'] == True])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many different publisher places are in the data set?" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "1545" }, "metadata": {}, "execution_count": 17 } ], "source": [ "len(df['publisher_place'].unique())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Show me some! 
Let's wrap it in a pandas DataFrame, step by step:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "publisher_places = df['publisher_place'].unique()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "array([nan, 'Wien', 'Kierling', ..., 'Königstein i. T.', 'Detmold',\n 'Furth i. W.'], dtype=object)" }, "metadata": {}, "execution_count": 19 } ], "source": [ "publisher_places" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "pp = pd.DataFrame(publisher_places)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": " 0\n0 NaN\n1 Wien\n2 Kierling\n3 Kindberg\n4 Kirchau\n... ...\n1540 Pisa\n1541 Straßburg i./E.\n1542 Königstein i. T.\n1543 Detmold\n1544 Furth i. W.\n\n[1545 rows x 1 columns]", "text/html": "
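<div>
<table border=\"1\" class=\"dataframe\">
<thead><tr><th></th><th>0</th></tr></thead>
<tbody>
<tr><th>0</th><td>NaN</td></tr>
<tr><th>1</th><td>Wien</td></tr>
<tr><th>2</th><td>Kierling</td></tr>
<tr><th>3</th><td>Kindberg</td></tr>
<tr><th>4</th><td>Kirchau</td></tr>
<tr><th>...</th><td>...</td></tr>
<tr><th>1540</th><td>Pisa</td></tr>
<tr><th>1541</th><td>Straßburg i./E.</td></tr>
<tr><th>1542</th><td>Königstein i. T.</td></tr>
<tr><th>1543</th><td>Detmold</td></tr>
<tr><th>1544</th><td>Furth i. W.</td></tr>
</tbody>
</table>
<p>1545 rows × 1 columns</p>
</div>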
" }, "metadata": {}, "execution_count": 21 } ], "source": [ "pp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Better. Now show me some randomly:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": " 0\n624 Braunau a. Inn\n1360 Kratzau\n1319 Nový Bydžov\n592 Hyères\n1060 Kapellen a. d. Mürz", "text/html": "
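<div>
<table border=\"1\" class=\"dataframe\">
<thead><tr><th></th><th>0</th></tr></thead>
<tbody>
<tr><th>624</th><td>Braunau a. Inn</td></tr>
<tr><th>1360</th><td>Kratzau</td></tr>
<tr><th>1319</th><td>Nový Bydžov</td></tr>
<tr><th>592</th><td>Hyères</td></tr>
<tr><th>1060</th><td>Kapellen a. d. Mürz</td></tr>
</tbody>
</table>
</div>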
" }, "metadata": {}, "execution_count": 22 } ], "source": [ "pp.sample(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sort Things" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just sort the sample, please:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": " 0\n29 Arys\n576 Buenos Aires\n1372 Getzersdorf\n148 Kochel\n1119 Maria Trost\n238 Mariazell\n1096 Melk\n273 Münden\n1196 Stein\n1047 Vitis", "text/html": "
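<div>
<table border=\"1\" class=\"dataframe\">
<thead><tr><th></th><th>0</th></tr></thead>
<tbody>
<tr><th>29</th><td>Arys</td></tr>
<tr><th>576</th><td>Buenos Aires</td></tr>
<tr><th>1372</th><td>Getzersdorf</td></tr>
<tr><th>148</th><td>Kochel</td></tr>
<tr><th>1119</th><td>Maria Trost</td></tr>
<tr><th>238</th><td>Mariazell</td></tr>
<tr><th>1096</th><td>Melk</td></tr>
<tr><th>273</th><td>Münden</td></tr>
<tr><th>1196</th><td>Stein</td></tr>
<tr><th>1047</th><td>Vitis</td></tr>
</tbody>
</table>
</div>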
" }, "metadata": {}, "execution_count": 23 } ], "source": [ "pp.sample(10).sort_values(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why the '0' in `sort_values(0)`? That's the name of the column to sort by." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sort the whole thing:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": " 0\n1248 Békéscsaba \n1303 Łuck\n389 #\n1489 A B.\n1239 A.\n... ...\n861 w\n417 Č. Krumlov\n1304 Łuck\n893 Šibenik\n0 NaN\n\n[1545 rows x 1 columns]", "text/html": "
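<div>
<table border=\"1\" class=\"dataframe\">
<thead><tr><th></th><th>0</th></tr></thead>
<tbody>
<tr><th>1248</th><td> Békéscsaba </td></tr>
<tr><th>1303</th><td>Łuck</td></tr>
<tr><th>389</th><td>#</td></tr>
<tr><th>1489</th><td>A B.</td></tr>
<tr><th>1239</th><td>A.</td></tr>
<tr><th>...</th><td>...</td></tr>
<tr><th>861</th><td>w</td></tr>
<tr><th>417</th><td>Č. Krumlov</td></tr>
<tr><th>1304</th><td>Łuck</td></tr>
<tr><th>893</th><td>Šibenik</td></tr>
<tr><th>0</th><td>NaN</td></tr>
</tbody>
</table>
<p>1545 rows × 1 columns</p>
</div>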
" }, "metadata": {}, "execution_count": 24 } ], "source": [ "pp.sort_values(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It seems like there's something weird going on with 'Békéscsaba', it doesn't sort right. What is wrong?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's extract the datum:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "0 Békéscsaba \nName: 1248, dtype: object" }, "metadata": {}, "execution_count": 25 } ], "source": [ "pp.iloc[1248]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More specifically the column '0':" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": "' Békéscsaba '" }, "metadata": {}, "execution_count": 26 } ], "source": [ "pp.iloc[1248][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Seems there's a space in front of the 'B'. That's why it sorts wrong." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.7.7 64-bit ('venv': venv)", "language": "python", "name": "python37764bitvenvvenveb3c9aa788d446a5bb7cfee674062d0a" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7-final" } }, "nbformat": 4, "nbformat_minor": 2 }