{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# LOC Colors - Data Management" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Export data as minimal JSON files - only the essentials to create the swatches in the browser*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use the created swatches for [https://labs.onb.ac.at/en/topic/akon-swatches/](https://labs.onb.ac.at/en/topic/akon-swatches/), I have to create a JSON file looking like this:\n", "\n", "```json\n", "[[\"AK111_461\", [\"#f4e6cd\", \"#cac4b2\", \"#7e8077\", \"#3e4139\", \"#2f3431\", \"#000304\"], \"Nonza\", \"gelaufen 1903\"],\n", "[\"AK111_072\", [\"#e2d7c1\", \"#a19c8f\", \"#504e42\", \"#494a44\", \"#010500\", \"#393c39\"], \"Kirchberg am Walde\", \"gelaufen 1914\"],\n", "[\"AK111_077\", [\"#454234\", \"#3e3b1f\", \"#7f7e77\", \"#a9b8be\", \"#3b4347\", \"#425a6b\"], \"Kirchberg am Wechsel\", \"gelaufen 1913\"]]\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first part (the id and the colors) are part of the created swatches, the second part (a name and an approximate date) are part of the metadata to download here: [https://labs.onb.ac.at/gitlab/labs-team/raw-metadata/raw/master/akon_postcards_public_domain.csv.bz2?inline=false](https://labs.onb.ac.at/gitlab/labs-team/raw-metadata/raw/master/akon_postcards_public_domain.csv.bz2?inline=false)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read created swatches" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('https://labs.onb.ac.at/gitlab/labs-team/color-swatches-data/-/raw/master/historic_postcards_color_swatches.csv.bz2?inline=false', compression='bz2')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0akon_idimage_linkhex_colorshtml
2191421914AK036_405https://iiif.onb.ac.at/images/AKON/AK036_405/4...['#f2e3c1', '#e6dec6', '#8e8a7a', '#7b7864', '...<a href=\"https://iiif.onb.ac.at/images/AKON/AK...
\n", "
" ], "text/plain": [ " Unnamed: 0 akon_id \\\n", "21914 21914 AK036_405 \n", "\n", " image_link \\\n", "21914 https://iiif.onb.ac.at/images/AKON/AK036_405/4... \n", "\n", " hex_colors \\\n", "21914 ['#f2e3c1', '#e6dec6', '#8e8a7a', '#7b7864', '... \n", "\n", " html \n", "21914 \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
akon_idhex_colors
14980AK010_595['#cfbfa4', '#62583b', '#c8c7b6', '#4f5144', '...
7474AK084_243['#aca693', '#414028', '#444537', '#5c5f57', '...
23730AK043_578['#4e502c', '#444735', '#51554a', '#dae1d1', '...
30352AK085_096['#b5aa9d', '#f3e7d7', '#756d62', '#211a0f', '...
22389AK038_067['#f2e9cb', '#989384', '#545245', '#fcf7db', '...
\n", "" ], "text/plain": [ " akon_id hex_colors\n", "14980 AK010_595 ['#cfbfa4', '#62583b', '#c8c7b6', '#4f5144', '...\n", "7474 AK084_243 ['#aca693', '#414028', '#444537', '#5c5f57', '...\n", "23730 AK043_578 ['#4e502c', '#444735', '#51554a', '#dae1d1', '...\n", "30352 AK085_096 ['#b5aa9d', '#f3e7d7', '#756d62', '#211a0f', '...\n", "22389 AK038_067 ['#f2e9cb', '#989384', '#545245', '#fcf7db', '..." ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "id_and_hex_colors.sample(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parse Color Array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to properly export the color array as a JSON array later, convert the data representation slightly." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import json" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "id_and_hex_colors['colors'] = id_and_hex_colors['hex_colors'].apply(\n", " lambda c: json.loads(c.replace(\"'\", '\"'))\n", ")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
akon_idhex_colorscolors
1291AK116_455['#f4ece0', '#d4d2ce', '#716d5c', '#65665e', '...[#f4ece0, #d4d2ce, #716d5c, #65665e, #b2b4bb, ...
\n", "
" ], "text/plain": [ " akon_id hex_colors \\\n", "1291 AK116_455 ['#f4ece0', '#d4d2ce', '#716d5c', '#65665e', '... \n", "\n", " colors \n", "1291 [#f4ece0, #d4d2ce, #716d5c, #65665e, #b2b4bb, ... " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "id_and_hex_colors.sample()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do you see the subtle difference? The entry in the colors column is now an array with strings _without_ the single quotes `'`." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "id_and_colors = id_and_hex_colors[['akon_id', 'colors']].copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add Metadata From Original Records" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next up: Combining the id and the color array with names and dates. Read the metadata dump directly from the link found on [https://labs.onb.ac.at/en/dataset/akon/](https://labs.onb.ac.at/en/dataset/akon/):" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2785: DtypeWarning: Columns (13) have mixed types. Specify dtype option on import or set low_memory=False.\n", " interactivity=interactivity, compiler=compiler, result=result)\n" ] } ], "source": [ "original = pd.read_csv('https://labs.onb.ac.at/gitlab/labs-team/raw-metadata/raw/master/akon_postcards_public_domain.csv.bz2?inline=false', compression='bz2')" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
akon_idnamedate
16251AK016_087Neulengbachvor 1907
23864AK044_401Gloriette1907
33292AK085_517Milanovor 1905
13875AK008_070Schloß Schönbrunn1906
12223AK004_040Pürgg1909
4129AK125_097Altausseegelaufen 1901
23756AK044_080Radstädter Tauern1907
26479AK069_067Attersee1906
14251AK109_329Josefsthalvor 1907
28518AK063_083Gaußig1908
\n", "
" ], "text/plain": [ " akon_id name date\n", "16251 AK016_087 Neulengbach vor 1907\n", "23864 AK044_401 Gloriette 1907\n", "33292 AK085_517 Milano vor 1905\n", "13875 AK008_070 Schloß Schönbrunn 1906\n", "12223 AK004_040 Pürgg 1909\n", "4129 AK125_097 Altaussee gelaufen 1901\n", "23756 AK044_080 Radstädter Tauern 1907\n", "26479 AK069_067 Attersee 1906\n", "14251 AK109_329 Josefsthal vor 1907\n", "28518 AK063_083 Gaußig 1908" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "original[['akon_id', 'name', 'date']].sample(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These are the columns needed." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "original_info = original[['akon_id', 'name', 'date']].copy()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
akon_idnamedate
24871AK048_377Reichenau an der Rax1925
4191AK054_543Aflenz Kurortvor 1905
\n", "
" ], "text/plain": [ " akon_id name date\n", "24871 AK048_377 Reichenau an der Rax 1925\n", "4191 AK054_543 Aflenz Kurort vor 1905" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "original_info.sample(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas offers a handy function for merging two dataframes _not on the index_, but on a shared column:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "colors_and_info = pd.merge(id_and_colors, original_info, on='akon_id')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
akon_idcolorsnamedate
16356AK016_310[#5f5a57, #9c938c, #4d453f, #cabbab, #3c3b39, ...Kapellen1908
1117AK116_129[#dcc9ab, #cec0a7, #a8a290, #48473a, #6c6c62, ...Garstengelaufen 1902
\n", "
" ], "text/plain": [ " akon_id colors name \\\n", "16356 AK016_310 [#5f5a57, #9c938c, #4d453f, #cabbab, #3c3b39, ... Kapellen \n", "1117 AK116_129 [#dcc9ab, #cec0a7, #a8a290, #48473a, #6c6c62, ... Garsten \n", "\n", " date \n", "16356 1908 \n", "1117 gelaufen 1902 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "colors_and_info.sample(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's exactly what's needed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save to JSON-File" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas can export the data exactly in the target format:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "colors_and_info.to_json('historic_postcards__id_colors_name_date.json', orient='values')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extract Subset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's almost all. For the preview on [https://labs.onb.ac.at/en/dataset/akon/](https://labs.onb.ac.at/en/dataset/akon/) I need a subset of 100 swatches and save them:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "id_and_colors_100 = colors_and_info.iloc[:100]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
akon_idcolorsnamedate
99AK111_207[#403e2c, #f9f8eb, #7e7f6b, #35362d, #32342f, ...Klosterneuburggelaufen 1908
34AK111_293[#e6d8bf, #c7c1b1, #4a4c3c, #aeafa5, #464740, ...Komotauvor 1905
86AK111_184[#f7edd6, #d1cbb8, #f6f0db, #2d2a1b, #9e9f93, ...Klausenburggelaufen 1904
3AK111_026[#e2cba6, #9e8e73, #574c39, #3c311a, #4c473b, ...Kierling1922
39AK111_072[#e5d9c2, #c0baac, #928e81, #4c493e, #484943, ...Kirchberg am Waldegelaufen 1914
\n", "
" ], "text/plain": [ " akon_id colors \\\n", "99 AK111_207 [#403e2c, #f9f8eb, #7e7f6b, #35362d, #32342f, ... \n", "34 AK111_293 [#e6d8bf, #c7c1b1, #4a4c3c, #aeafa5, #464740, ... \n", "86 AK111_184 [#f7edd6, #d1cbb8, #f6f0db, #2d2a1b, #9e9f93, ... \n", "3 AK111_026 [#e2cba6, #9e8e73, #574c39, #3c311a, #4c473b, ... \n", "39 AK111_072 [#e5d9c2, #c0baac, #928e81, #4c493e, #484943, ... \n", "\n", " name date \n", "99 Klosterneuburg gelaufen 1908 \n", "34 Komotau vor 1905 \n", "86 Klausenburg gelaufen 1904 \n", "3 Kierling 1922 \n", "39 Kirchberg am Walde gelaufen 1914 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "id_and_colors_100.sample(5)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "id_and_colors_100.to_json('historic_postcards__id_colors_name_date__100.json', orient='values')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And done!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 User Default", "language": "python", "name": "python_3_user_default" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }