{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# LOC Colors - Data Management"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Export data as minimal JSON files - only the essentials to create the swatches in the browser*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To use the created swatches for [https://labs.onb.ac.at/en/topic/akon-swatches/](https://labs.onb.ac.at/en/topic/akon-swatches/), I have to create a JSON file looking like this:\n",
"\n",
"```json\n",
"[[\"AK111_461\", [\"#f4e6cd\", \"#cac4b2\", \"#7e8077\", \"#3e4139\", \"#2f3431\", \"#000304\"], \"Nonza\", \"gelaufen 1903\"],\n",
"[\"AK111_072\", [\"#e2d7c1\", \"#a19c8f\", \"#504e42\", \"#494a44\", \"#010500\", \"#393c39\"], \"Kirchberg am Walde\", \"gelaufen 1914\"],\n",
"[\"AK111_077\", [\"#454234\", \"#3e3b1f\", \"#7f7e77\", \"#a9b8be\", \"#3b4347\", \"#425a6b\"], \"Kirchberg am Wechsel\", \"gelaufen 1913\"]]\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first part (the id and the colors) are part of the created swatches, the second part (a name and an approximate date) are part of the metadata to download here: [https://labs.onb.ac.at/gitlab/labs-team/raw-metadata/raw/master/akon_postcards_public_domain.csv.bz2?inline=false](https://labs.onb.ac.at/gitlab/labs-team/raw-metadata/raw/master/akon_postcards_public_domain.csv.bz2?inline=false)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read created swatches"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('https://labs.onb.ac.at/gitlab/labs-team/color-swatches-data/-/raw/master/historic_postcards_color_swatches.csv.bz2?inline=false', compression='bz2')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Unnamed: 0 | \n",
" akon_id | \n",
" image_link | \n",
" hex_colors | \n",
" html | \n",
"
\n",
" \n",
" \n",
" \n",
" 21914 | \n",
" 21914 | \n",
" AK036_405 | \n",
" https://iiif.onb.ac.at/images/AKON/AK036_405/4... | \n",
" ['#f2e3c1', '#e6dec6', '#8e8a7a', '#7b7864', '... | \n",
" <a href=\"https://iiif.onb.ac.at/images/AKON/AK... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Unnamed: 0 akon_id \\\n",
"21914 21914 AK036_405 \n",
"\n",
" image_link \\\n",
"21914 https://iiif.onb.ac.at/images/AKON/AK036_405/4... \n",
"\n",
" hex_colors \\\n",
"21914 ['#f2e3c1', '#e6dec6', '#8e8a7a', '#7b7864', '... \n",
"\n",
" html \n",
"21914 \n",
"\n",
"\n",
" \n",
" \n",
" | \n",
" akon_id | \n",
" hex_colors | \n",
"
\n",
" \n",
" \n",
" \n",
" 14980 | \n",
" AK010_595 | \n",
" ['#cfbfa4', '#62583b', '#c8c7b6', '#4f5144', '... | \n",
"
\n",
" \n",
" 7474 | \n",
" AK084_243 | \n",
" ['#aca693', '#414028', '#444537', '#5c5f57', '... | \n",
"
\n",
" \n",
" 23730 | \n",
" AK043_578 | \n",
" ['#4e502c', '#444735', '#51554a', '#dae1d1', '... | \n",
"
\n",
" \n",
" 30352 | \n",
" AK085_096 | \n",
" ['#b5aa9d', '#f3e7d7', '#756d62', '#211a0f', '... | \n",
"
\n",
" \n",
" 22389 | \n",
" AK038_067 | \n",
" ['#f2e9cb', '#989384', '#545245', '#fcf7db', '... | \n",
"
\n",
" \n",
"
\n",
""
],
"text/plain": [
" akon_id hex_colors\n",
"14980 AK010_595 ['#cfbfa4', '#62583b', '#c8c7b6', '#4f5144', '...\n",
"7474 AK084_243 ['#aca693', '#414028', '#444537', '#5c5f57', '...\n",
"23730 AK043_578 ['#4e502c', '#444735', '#51554a', '#dae1d1', '...\n",
"30352 AK085_096 ['#b5aa9d', '#f3e7d7', '#756d62', '#211a0f', '...\n",
"22389 AK038_067 ['#f2e9cb', '#989384', '#545245', '#fcf7db', '..."
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"id_and_hex_colors.sample(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parse Color Array"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to properly export the color array as a JSON array later, convert the data representation slightly."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import json"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"id_and_hex_colors['colors'] = id_and_hex_colors['hex_colors'].apply(\n",
" lambda c: json.loads(c.replace(\"'\", '\"'))\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" akon_id | \n",
" hex_colors | \n",
" colors | \n",
"
\n",
" \n",
" \n",
" \n",
" 1291 | \n",
" AK116_455 | \n",
" ['#f4ece0', '#d4d2ce', '#716d5c', '#65665e', '... | \n",
" [#f4ece0, #d4d2ce, #716d5c, #65665e, #b2b4bb, ... | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" akon_id hex_colors \\\n",
"1291 AK116_455 ['#f4ece0', '#d4d2ce', '#716d5c', '#65665e', '... \n",
"\n",
" colors \n",
"1291 [#f4ece0, #d4d2ce, #716d5c, #65665e, #b2b4bb, ... "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"id_and_hex_colors.sample()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Do you see the subtle difference? The entry in the colors column is now an array with strings _without_ the single quotes `'`."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"id_and_colors = id_and_hex_colors[['akon_id', 'colors']].copy()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add Metadata From Original Records"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next up: Combining the id and the color array with names and dates. Read the metadata dump directly from the link found on [https://labs.onb.ac.at/en/dataset/akon/](https://labs.onb.ac.at/en/dataset/akon/):"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2785: DtypeWarning: Columns (13) have mixed types. Specify dtype option on import or set low_memory=False.\n",
" interactivity=interactivity, compiler=compiler, result=result)\n"
]
}
],
"source": [
"original = pd.read_csv('https://labs.onb.ac.at/gitlab/labs-team/raw-metadata/raw/master/akon_postcards_public_domain.csv.bz2?inline=false', compression='bz2')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" akon_id | \n",
" name | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 16251 | \n",
" AK016_087 | \n",
" Neulengbach | \n",
" vor 1907 | \n",
"
\n",
" \n",
" 23864 | \n",
" AK044_401 | \n",
" Gloriette | \n",
" 1907 | \n",
"
\n",
" \n",
" 33292 | \n",
" AK085_517 | \n",
" Milano | \n",
" vor 1905 | \n",
"
\n",
" \n",
" 13875 | \n",
" AK008_070 | \n",
" Schloß Schönbrunn | \n",
" 1906 | \n",
"
\n",
" \n",
" 12223 | \n",
" AK004_040 | \n",
" Pürgg | \n",
" 1909 | \n",
"
\n",
" \n",
" 4129 | \n",
" AK125_097 | \n",
" Altaussee | \n",
" gelaufen 1901 | \n",
"
\n",
" \n",
" 23756 | \n",
" AK044_080 | \n",
" Radstädter Tauern | \n",
" 1907 | \n",
"
\n",
" \n",
" 26479 | \n",
" AK069_067 | \n",
" Attersee | \n",
" 1906 | \n",
"
\n",
" \n",
" 14251 | \n",
" AK109_329 | \n",
" Josefsthal | \n",
" vor 1907 | \n",
"
\n",
" \n",
" 28518 | \n",
" AK063_083 | \n",
" Gaußig | \n",
" 1908 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" akon_id name date\n",
"16251 AK016_087 Neulengbach vor 1907\n",
"23864 AK044_401 Gloriette 1907\n",
"33292 AK085_517 Milano vor 1905\n",
"13875 AK008_070 Schloß Schönbrunn 1906\n",
"12223 AK004_040 Pürgg 1909\n",
"4129 AK125_097 Altaussee gelaufen 1901\n",
"23756 AK044_080 Radstädter Tauern 1907\n",
"26479 AK069_067 Attersee 1906\n",
"14251 AK109_329 Josefsthal vor 1907\n",
"28518 AK063_083 Gaußig 1908"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"original[['akon_id', 'name', 'date']].sample(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These are the columns needed."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"original_info = original[['akon_id', 'name', 'date']].copy()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" akon_id | \n",
" name | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 24871 | \n",
" AK048_377 | \n",
" Reichenau an der Rax | \n",
" 1925 | \n",
"
\n",
" \n",
" 4191 | \n",
" AK054_543 | \n",
" Aflenz Kurort | \n",
" vor 1905 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" akon_id name date\n",
"24871 AK048_377 Reichenau an der Rax 1925\n",
"4191 AK054_543 Aflenz Kurort vor 1905"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"original_info.sample(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas offers a handy function for merging two dataframes _not on the index_, but on a shared column:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"colors_and_info = pd.merge(id_and_colors, original_info, on='akon_id')"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" akon_id | \n",
" colors | \n",
" name | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 16356 | \n",
" AK016_310 | \n",
" [#5f5a57, #9c938c, #4d453f, #cabbab, #3c3b39, ... | \n",
" Kapellen | \n",
" 1908 | \n",
"
\n",
" \n",
" 1117 | \n",
" AK116_129 | \n",
" [#dcc9ab, #cec0a7, #a8a290, #48473a, #6c6c62, ... | \n",
" Garsten | \n",
" gelaufen 1902 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" akon_id colors name \\\n",
"16356 AK016_310 [#5f5a57, #9c938c, #4d453f, #cabbab, #3c3b39, ... Kapellen \n",
"1117 AK116_129 [#dcc9ab, #cec0a7, #a8a290, #48473a, #6c6c62, ... Garsten \n",
"\n",
" date \n",
"16356 1908 \n",
"1117 gelaufen 1902 "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"colors_and_info.sample(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's exactly what's needed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save to JSON-File"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas can export the data exactly in the target format:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"colors_and_info.to_json('historic_postcards__id_colors_name_date.json', orient='values')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extract Subset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's almost all. For the preview on [https://labs.onb.ac.at/en/dataset/akon/](https://labs.onb.ac.at/en/dataset/akon/) I need a subset of 100 swatches and save them:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"id_and_colors_100 = colors_and_info.iloc[:100]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" akon_id | \n",
" colors | \n",
" name | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 99 | \n",
" AK111_207 | \n",
" [#403e2c, #f9f8eb, #7e7f6b, #35362d, #32342f, ... | \n",
" Klosterneuburg | \n",
" gelaufen 1908 | \n",
"
\n",
" \n",
" 34 | \n",
" AK111_293 | \n",
" [#e6d8bf, #c7c1b1, #4a4c3c, #aeafa5, #464740, ... | \n",
" Komotau | \n",
" vor 1905 | \n",
"
\n",
" \n",
" 86 | \n",
" AK111_184 | \n",
" [#f7edd6, #d1cbb8, #f6f0db, #2d2a1b, #9e9f93, ... | \n",
" Klausenburg | \n",
" gelaufen 1904 | \n",
"
\n",
" \n",
" 3 | \n",
" AK111_026 | \n",
" [#e2cba6, #9e8e73, #574c39, #3c311a, #4c473b, ... | \n",
" Kierling | \n",
" 1922 | \n",
"
\n",
" \n",
" 39 | \n",
" AK111_072 | \n",
" [#e5d9c2, #c0baac, #928e81, #4c493e, #484943, ... | \n",
" Kirchberg am Walde | \n",
" gelaufen 1914 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" akon_id colors \\\n",
"99 AK111_207 [#403e2c, #f9f8eb, #7e7f6b, #35362d, #32342f, ... \n",
"34 AK111_293 [#e6d8bf, #c7c1b1, #4a4c3c, #aeafa5, #464740, ... \n",
"86 AK111_184 [#f7edd6, #d1cbb8, #f6f0db, #2d2a1b, #9e9f93, ... \n",
"3 AK111_026 [#e2cba6, #9e8e73, #574c39, #3c311a, #4c473b, ... \n",
"39 AK111_072 [#e5d9c2, #c0baac, #928e81, #4c493e, #484943, ... \n",
"\n",
" name date \n",
"99 Klosterneuburg gelaufen 1908 \n",
"34 Komotau vor 1905 \n",
"86 Klausenburg gelaufen 1904 \n",
"3 Kierling 1922 \n",
"39 Kirchberg am Walde gelaufen 1914 "
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"id_and_colors_100.sample(5)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"id_and_colors_100.to_json('historic_postcards__id_colors_name_date__100.json', orient='values')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And done!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 User Default",
"language": "python",
"name": "python_3_user_default"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}