Republishing Old Books: Black and White Gordo

The Gordo books are black and white, except for the covers. There is some text on the introductory pages, but once we get to the actual comic strips, they are all just black and white. Or maybe shades of gray? When we scanned the images, we told the scanner program "Gray", but what did we actually get?

First we convert from TIFF to GIF, since most of our software works on GIF images. Both formats are lossless, so we can easily convert from one to the other. A GIF image has a 256-entry color table of (R,G,B) values, and each pixel in the image is an index into this table. So any given image has no more than 256 different colors. Different images, of course, might have different color tables, in which case the total number of colors across images may exceed 256, but for one image we have only 256. TIFF files can hold up to 24 bits of color data (8 bits for each of R, G, B) for each pixel.

But of course, if we actually have gray scale, then the value of R is the same as the value of G and the value of B for each pixel, since gray is the result of equal R, G, B values. So even with 24 bits, we still only have 256 different levels of gray.

We have a program that will read a GIF file and produce a list of all the different colors in the file, and the number of times each color is used. If we run that on all the GIF files for both books, we find there are only 256 different colors, in all the files for all the pages, and each of those colors is a gray level, with R = G = B.
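The counting program itself is not shown here. As a rough sketch of the idea, assuming the pixels of each page have already been decoded into (R, G, B) tuples (the function names and the tiny sample pages below are made up for illustration), the counting and the gray check might look like:

```python
from collections import Counter

def count_colors(pixels):
    """Count how many times each (R, G, B) color occurs in one image."""
    return Counter(pixels)

def merge_counts(per_image_counts):
    """Add up the per-image counts to get totals across all pages."""
    total = Counter()
    for counts in per_image_counts:
        total.update(counts)
    return total

def all_gray(counts):
    """True if every color seen has R == G == B."""
    return all(r == g == b for (r, g, b) in counts)

# Two tiny made-up "pages" standing in for decoded GIF pixel data.
page_a = [(255, 255, 255)] * 6 + [(9, 9, 9)] * 3
page_b = [(255, 255, 255)] * 4 + [(248, 248, 248)] * 2

totals = merge_counts([count_colors(page_a), count_colors(page_b)])
for color, n in sorted(totals.items()):
    print(color, n)
print(len(totals), "distinct colors, all gray:", all_gray(totals))
```

Since every color is gray, the totals can be plotted against a single gray level from 0 to 255.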

If we plot the number of occurrences of each color in a file, for example p135.gif, we get a plot like:

where we see a number of black pixels on the left of the plot, a very small number of gray pixels, followed by a whole bunch of white pixels at the right of the plot. The least-used color, (90,90,90), occurs 11,339 times, and the most-used color, (255,255,255) (white), occurs 21,667,543 times. This is out of a total of 33,660,000 pixels, which is an 8.5 x 11 inch image at 600 dpi (5100 x 6600 pixels). So for this image 64% of the pixels are pure white.
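The arithmetic behind those figures can be checked in a few lines. This is only a sanity check using the numbers quoted above; the variable names are my own:

```python
DPI = 600
width_px = int(8.5 * DPI)    # 8.5 inches wide
height_px = 11 * DPI         # 11 inches tall
total_pixels = width_px * height_px

white_pixels = 21_667_543    # count for (255,255,255) in p135.gif, from the text
white_fraction = white_pixels / total_pixels

print(width_px, "x", height_px, "=", total_pixels, "pixels")
print(f"pure white: {white_fraction:.0%}")
```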

The second most common color, with 2,194,373 pixels, is (248,248,248). I suggest that for our purposes, the difference between (248,248,248) and (255,255,255) is not meaningful: a person cannot tell the difference between these two colors. Both are effectively "white". Similarly the next two most common colors are (14,14,14) and (9,9,9), both of which are effectively "black".

If we create a similar graph for all the pixels in all the pages, we have the following:

which again has a small cluster of black pixels at the left, and a very high cluster of white pixels at the right, with a vast sequence of gray colors in between which are almost never used. There is that unusual spike just to the left of the white spike. We believe that is in fact caused by the scanning of the rectangular holes for the GBC binding. If we take just the blank pages, which have nothing other than the holes and random variation from a white surface, and plot the colors from those, we get:

Thus, I think if we collapse all the colors from around 150 and above to 255 (mapping all those really light grays to white), we will both get rid of the scanning artifact of the GBC holes, and get a much brighter, whiter background.

Similarly, if we map all the colors from about 100 down to 0 (black), we will get a solid black for the images. That leaves a small number of gray level colors (between 100 and 150) to be considered.
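Because a GIF stores its colors in a table, this mapping only has to be applied to the (at most 256) table entries, not to every pixel. A minimal sketch, assuming inclusive cutoffs of exactly 100 and 150 (the text only says "about") and with hypothetical function names:

```python
BLACK_MAX = 100  # levels 0..100 become black (approximate cutoff from the text)
WHITE_MIN = 150  # levels 150..255 become white (approximate cutoff from the text)

def collapse_level(level):
    """Map one gray level (0-255) to black, white, or itself."""
    if level <= BLACK_MAX:
        return 0
    if level >= WHITE_MIN:
        return 255
    return level

def collapse_palette(palette):
    """Apply the mapping to each gray (R, G, B) entry of a color table."""
    return [(collapse_level(r),) * 3 for (r, g, b) in palette]

# A few gray levels mentioned in the text, plus one mid-gray.
palette = [(v, v, v) for v in (9, 14, 90, 120, 248, 255)]
print(collapse_palette(palette))
```

Only levels 101 through 149 survive unchanged under these cutoffs.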

If we try this on a sample from p135.gif, we start with

and converting the range 0 to 100 to black (0) and 150 to 255 to white (255), we get

Notice that the GBC holes are gone. Otherwise, the only change I can see is that the lamp shade is somewhat lighter.

This still leaves gray colors from 102 to 145 (for this image). Where are those gray colors? If we zoom in further, and convert all those gray level colors to red,

we can see that these gray colors are in the transition from black to white (or white to black). If we convert all these to white, it will "thin" the strokes that make up the image; converting them all to black will make them "thicker".
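The red highlighting described above is easy to reproduce. The following is a sketch on a made-up row of pixels crossing the edge of a stroke, not the actual tool used for the picture; the function name and sample data are assumptions:

```python
RED = (255, 0, 0)

def highlight_midgray(pixels, low=102, high=145):
    """Replace gray pixels whose level is in [low, high] with red."""
    return [
        RED if (r == g == b and low <= r <= high) else (r, g, b)
        for (r, g, b) in pixels
    ]

# One made-up scan row going from white paper into a black stroke.
row = [(255, 255, 255), (145, 145, 145), (120, 120, 120), (102, 102, 102), (0, 0, 0)]
print(highlight_midgray(row))
```

In this row the three in-between pixels turn red, which is the same pattern the zoomed image shows along the edges of each stroke.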

However, about halfway between these two cutoffs is 128, which is also roughly the midpoint of the full 0 to 255 range. If we use 128 as the dividing line between black and white, we would get something like:

where green represents all the colors above 128, and red all the colors below 128. It looks roughly equal, so that should not make the strokes thicker or thinner.
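The "roughly equal" impression can also be checked with counts rather than by eye. Here is a small sketch, assuming a histogram of gray level to pixel count for the transition band; the sample numbers are invented, and sending level 128 itself to white is my assumption, since the text does not say which side it falls on:

```python
def binarize(level, threshold=128):
    """Map a gray level to pure black (0) or pure white (255).

    Assumption: a level exactly equal to the threshold goes to white.
    """
    return 255 if level >= threshold else 0

def split_counts(histogram, threshold=128):
    """Return (pixels that become black, pixels that become white)."""
    to_black = sum(n for level, n in histogram.items() if level < threshold)
    to_white = sum(n for level, n in histogram.items() if level >= threshold)
    return to_black, to_white

# Invented counts for a few transition gray levels.
transition = {102: 40, 115: 35, 127: 30, 128: 28, 136: 37, 145: 42}
to_black, to_white = split_counts(transition)
print("to black:", to_black, "to white:", to_white)
```

If the two totals are close, thresholding at 128 should leave the stroke widths about the same.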

Friday, November 11, 2022
