{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# LOC Colors - Production" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Calculate color swatches for historic postcards*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Code" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from PIL import Image\n", "from sys import exit\n", "from io import BytesIO\n", "from colorsys import rgb_to_hsv, hsv_to_rgb\n", "from scipy.cluster.vq import kmeans\n", "from numpy import array\n", "import requests  # used by colorz() to download the image" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "DEFAULT_NUM_COLORS = 6\n", "# default minimum and maximum values are used to clamp the color values to a specific range\n", "# originally this was set to 170 and 200, but I'm running with 0 and 256 in order to\n", "# not clamp the values. This can also be set as a parameter.\n", "DEFAULT_MINV = 0\n", "DEFAULT_MAXV = 256\n", "\n", "THUMB_SIZE = (200, 200)\n", "SCALE = 256.0\n", "\n", "def down_scale(x):\n", "    return x / SCALE\n", "\n", "def up_scale(x):\n", "    return int(x * SCALE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The original code by Laura Wrubel uses the [RGB (red, green, blue) color space](https://en.wikipedia.org/wiki/RGB_color_space) for most color computations.\n", "\n", "We're using the [HSV (hue, saturation, value) color space](https://en.wikipedia.org/wiki/HSL_and_HSV) for clustering in the hope of getting prettier and more colorful results for our historic postcards.\n", "\n", "That necessitates modifying some utility functions:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def clamp_hsv(color, min_v, max_v):\n", " \"\"\"\n", " Clamps a color such that the value (brightness) is between min_v and max_v.\n", " \"\"\"\n", " # use down_scale to convert each channel of the color to a value between 0 and 1\n", " h, s, v = [down_scale(c) 
for c in color]\n", " # also convert the min_v and max_v to values between 0-1\n", " min_v, max_v = map(down_scale, (min_v, max_v))\n", " # get the maximum of the min value and the color's value (therefore bumping it up if needed)\n", " # then get the minimum of that number and the max_v (bumping the value down if needed)\n", " v = min(max(min_v, v), max_v)\n", " # apply up_scale to get the h, s, v (which has been clamped) back to 0-255, return as tuple\n", " return tuple(map(up_scale, (h, s, v)))\n", "\n", "\n", "def order_by_hue_hsv(colors):\n", " \"\"\"\n", " Orders HSV colors by hue and returns them as RGB tuples.\n", " \"\"\"\n", " hsvs = [list(map(down_scale, color)) for color in colors]\n", " hsvs.sort(key=lambda t: t[0])\n", " return [tuple(map(up_scale, hsv_to_rgb(*hsv))) for hsv in hsvs]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All postcards are scanned in front of a black background, and many contain a lot of very dark colors. This function lets us experiment with removing all colors below a certain saturation or value threshold, that is, colorless (grey-ish) and dark colors, respectively." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def clip_hsv(colors_hsv, min_s, min_v):\n", "    \"\"\"\n", "    Drops all colors whose saturation is below min_s or whose value is below min_v.\n", "    \"\"\"\n", "    min_s = down_scale(min_s)\n", "    min_v = down_scale(min_v)\n", "    hsvs = [tuple(map(down_scale, color)) for color in colors_hsv]\n", "    hsvs = filter(lambda hsv: (hsv[1] >= min_s) and (hsv[2] >= min_v), hsvs)\n", "    return [tuple(map(up_scale, hsv)) for hsv in hsvs]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If a certain color appears more than once in the picture (when `count > 1`), we add it once per occurrence to the dataset. 
This way, large areas of a single color weigh heavily in the resulting clusters:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def get_colors(img, colorspace='HSV'):\n", "    \"\"\"\n", "    Returns a list with one color tuple per pixel of the image, in the given colorspace.\n", "    \"\"\"\n", "    w, h = img.size\n", "    # convert(colorspace) converts the image's pixel data to the requested color space\n", "    # getcolors() returns an unsorted list of (count, color) tuples\n", "    # passing w * h as maxcolors ensures the call succeeds even if every pixel is unique\n", "    # each color is repeated count times, so the result has one entry per pixel\n", "    return [single_color for count, color in img.convert(colorspace).getcolors(w * h) for single_color in [color] * count]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def hexify(rgb):\n", "    return \"#{0:02x}{1:02x}{2:02x}\".format(*rgb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For experimentation, `colorz` accepts per-channel `coefficients` that scale the color space before clustering: differences along a scaled-down axis count for less, so colors that differ mainly on that axis are more likely to end up in the same cluster, and the opposite holds for a scaled-up axis. It can also clip pixels with low saturation and/or low value.\n", "\n", "The scaling is undone after clustering, so the resulting cluster centers are back in the original color space." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def colorz(image_url, n=DEFAULT_NUM_COLORS, min_v=DEFAULT_MINV, max_v=DEFAULT_MAXV,\n", "           order_colors=True, coefficients=[1.0, 1.0, 1.0], clip_colors=False, min_clip_s=20, min_clip_v=20):\n", "    \"\"\"\n", "    Get the n most dominant colors of an image.\n", "    Clamps value to between min_v and max_v.\n", "\n", "    Total number of colors returned is n, optionally ordered by hue.\n", "    Returns a list of hex color strings (e.g. '#beaea3').\n", "\n", "    \"\"\"\n", "    try:\n", "        r = requests.get(image_url)\n", "    except ValueError:\n", "        print(\"{0} was not a valid URL.\".format(image_url))\n", "        exit(1)\n", "    img = Image.open(BytesIO(r.content))\n", "    img.thumbnail(THUMB_SIZE) # replace with a thumbnail with same aspect ratio, no larger than THUMB_SIZE\n", "\n", "    obs = get_colors(img, 'HSV') # gets a list of HSV colors (e.g. (213, 191, 152)), one entry per pixel\n", "    # adjust the value of each color, if you've chosen to change minimum and maximum values\n", "    clamped = [clamp_hsv(color, min_v, max_v) for color in obs]\n", "    # optionally drop colors with low saturation and/or low value\n", "    clipped = clip_hsv(clamped, min_clip_s, min_clip_v) if clip_colors else clamped\n", "    # turns the list of colors into a numpy array of floats, scales each channel by its coefficient,\n", "    # then applies scipy's k-means function\n", "    clusters, _ = kmeans(array(clipped).astype(float) * coefficients, n)\n", "    # undo the scaling so the cluster centers are plain HSV values again\n", "    normalized_clusters = clusters / coefficients\n", "    if order_colors:\n", "        # order_by_hue_hsv also converts the HSV cluster centers to RGB\n", "        colors = order_by_hue_hsv(normalized_clusters)\n", "    else:\n", "        # convert the HSV cluster centers to RGB without reordering them\n", "        colors = [tuple(map(up_scale, hsv_to_rgb(*map(down_scale, hsv)))) for hsv in normalized_clusters]\n", "\n", "    hex_colors = list(map(hexify, colors)) # turn RGB into hex colors for web\n", "    return hex_colors" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def draw_row_with_links(link_and_colors):\n", "    html = \"\"\n", "    url, colors = link_and_colors\n", "    for count, color in enumerate(colors):\n", "        # one 30x30 SVG square per color, laid out from left to right\n", "        square = '<rect x=\"{0}\" y=\"{1}\" width=\"30\" height=\"30\" fill=\"{2}\"></rect>'.format(count * 30, 0, color)\n", "        html += square\n", "    # wrap the row of squares in an SVG element that links to the image\n", "    full_html = '<a href=\"{0}\" target=\"_blank\"><svg height=\"30\" width=\"{2}\">{1}</svg></a>'.format(url, html, len(colors) * 30)\n", "    return 
full_html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at how different parameters affect how the swatches look." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll need an image link. Grab a link to a IIIF manifest from [https://labs.onb.ac.at/en/dataset/akon/](https://labs.onb.ac.at/en/dataset/akon/) or take the one provided down below." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import requests" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'@context': 'https://iiif.io/api/presentation/2/context.json',\n", " '@id': 'https://iiif.onb.ac.at/presentation/AKON/AK115_479/manifest',\n", " '@type': 'sc:Manifest',\n", " 'label': 'Dresden',\n", " 'metadata': [{'label': [{'@value': 'Id', '@language': 'en'},\n", " {'@value': 'Id', '@language': 'ger'}],\n", " 'value': 'AK115_479'},\n", " {'label': [{'@value': 'Title', '@language': 'en'},\n", " {'@value': 'Titel', '@language': 'ger'}],\n", " 'value': 'Dresden'},\n", " {'label': [{'@value': 'Place', '@language': 'en'},\n", " {'@value': 'Ort', '@language': 'ger'}],\n", " 'value': \"Dresden\"},\n", " {'label': [{'@value': 'Year', '@language': 'en'},\n", " {'@value': 'Jahr', '@language': 'ger'}],\n", " 'value': '1906'},\n", " {'label': [{'@value': 'Disseminator', '@language': 'en'},\n", " {'@value': 'Anbieter', '@language': 'ger'}],\n", " 'value': \"Ansichtskarten Online\"},\n", " {'label': [{'@value': 'Physical Location', '@language': 'en'},\n", " {'@value': 'Standort', '@language': 'ger'}],\n", " 'value': 'Niederösterreichische Landesbibliothek 1672 - ÖNB'}],\n", " 'description': 'Ministerium, Dampferlandeplatz',\n", " 'viewingDirection': 'left-to-right',\n", " 'viewingHint': 'paged',\n", " 'license': 'http://creativecommons.org/publicdomain/mark/1.0/',\n", " 'attribution': [{'@value': 'Austrian 
National Library', '@language': 'en'},\n", " {'@value': 'Österreichische Nationalbibliothek', '@language': 'ger'}],\n", " 'logo': 'https://iiif.onb.ac.at/logo/',\n", " 'seeAlso': [{'@id': 'http://data.onb.ac.at/AKON/AK115_479',\n", " 'format': 'text/html'},\n", " {'@id': 'http://data.onb.ac.at/AKON/AK115_479.rdf',\n", " 'format': 'application/rdf+xml'}],\n", " 'sequences': [{'@context': 'https://iiif.io/api/presentation/2/context.json',\n", " '@id': 'https://iiif.onb.ac.at/presentation/AKON/AK115_479/sequence/normal',\n", " '@type': 'sc:Sequence',\n", " 'startCanvas': 'https://iiif.onb.ac.at/presentation/AKON/AK115_479/canvas/479',\n", " 'canvases': [{'@context': 'https://iiif.io/api/presentation/2/context.json',\n", " '@id': 'https://iiif.onb.ac.at/presentation/AKON/AK115_479/canvas/479',\n", " '@type': 'sc:Canvas',\n", " 'label': 'Dresden',\n", " 'height': 1462,\n", " 'width': 2200,\n", " 'images': [{'@context': 'https://iiif.io/api/presentation/2/context.json',\n", " '@id': 'https://iiif.onb.ac.at/presentation/AKON/AK115_479/annotation/479',\n", " '@type': 'oa:Annotation',\n", " 'motivation': 'sc:painting',\n", " 'resource': {'@id': 'https://iiif.onb.ac.at/images/AKON/AK115_479/479/full/full/0/native.jpg',\n", " '@type': 'dctypes:Image',\n", " 'height': 1462,\n", " 'width': 2200,\n", " 'format': 'image/jpeg',\n", " 'service': {'@context': 'https://iiif.io/api/image/2/context.json',\n", " '@id': 'https://iiif.onb.ac.at/images/AKON/AK115_479/479',\n", " 'profile': 'https://iiif.io/api/image/2/level2.json'}},\n", " 'on': 'https://iiif.onb.ac.at/presentation/AKON/AK115_479/canvas/479'}]}]}]}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "r = requests.get('https://iiif.onb.ac.at/presentation/AKON/AK115_479/manifest/')\n", "r.json()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The image link can be found under `sequences[*].canvases[*].images[*].resource.@id`" ] }, { "cell_type": "code", 
"execution_count": 11, "metadata": {}, "outputs": [], "source": [ "image_link = 'https://iiif.onb.ac.at/images/AKON/AK115_479/479/full/!200,200/0/native.jpg'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at it. For our calculation, we'll use a much smaller variant of the image. Using the [IIIF Image API](https://iiif.io/api/image/2.1/), we can request an image of a certain size. To do this, we'll substitute the second `full` parameter by `!200,200`, meaning the resulting image should fit inside a 200x200 square." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/jpeg": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import IPython.display as ipd\n", "\n", "im_r = requests.get(image_link)\n", "ipd.display(ipd.Image(im_r.content))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create the color swatches..." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['#beaea3', '#494440', '#211d11', '#313c3f', '#8e9da2', '#534a4e']" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cols1 = colorz(image_link)\n", "cols1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...and display them as well:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def display_colors(color_array, link):\n", " html = draw_row_with_links((link, color_array))\n", " ipd.display(ipd.HTML(html))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display_colors(cols1, image_link)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['#c4aa9e', '#5b5651', '#242411', '#8a9597', '#435158', '#5d5559']" ] 
}, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cols2 = colorz(image_link, coefficients=[1.0, 2.0, 0.6])\n", "cols2" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display_colors(cols2, image_link)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['#c8aea2', '#4b4540', '#3d3422', '#4b5e62', '#829396', '#594f52']" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cols3 = colorz(image_link, coefficients=[1.0, 2.0, 0.6], clip_colors=True, min_clip_v=30)\n", "cols3" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display_colors(cols3, image_link)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['#c9ac9f', '#49413b', '#332c1c', '#8ba9b0', '#2f393d', '#5d6b72']" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cols5 = colorz(image_link, clip_colors=True, min_clip_s=30, min_clip_v=30)\n", "cols5" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display_colors(cols5, image_link)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is getting tedious.\n", "\n", "Let's define a function that computes swatches and then displays the original image and the swatches side by side:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def colorize_and_display(image_link=image_link, **kwargs):\n", " cols = colorz(image_link, 
**kwargs)\n", " display_colors(cols, image_link)\n", " ipd.display(ipd.Image(requests.get(image_link).content))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/jpeg": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "colorize_and_display(clip_colors=True, min_clip_s=50, min_clip_v=0)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/jpeg": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "colorize_and_display(clip_colors=True, min_clip_s=50, min_clip_v=30)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/jpeg": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "colorize_and_display(clip_colors=True, min_clip_s=20, min_clip_v=30)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/jpeg": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "colorize_and_display(clip_colors=True, min_clip_s=20, min_clip_v=30, coefficients=[1.0, 2.0, 0.6])" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/jpeg": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "colorize_and_display(clip_colors=True, 
coefficients=[1.0, 2.0, 0.6], min_clip_s=20, min_clip_v=20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This looks like a winner to me. We'll use `create_swatches.py` to create swatches for all available images in batch." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 User Default", "language": "python", "name": "python_3_user_default" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }