Tuesday, December 13, 2022

Starting the Second Gordo Book

Most of the technical issues for the first Gordo book (Gordo Redux) are handled, so we turn our attention to the second Gordo book (Gordo Galore). As with the first one, we scanned it at 600 dpi. Then we can run the same set of programs we used on the first book to convert the pages from gray scale to black and white, clean up the images, create the cboxes, and pull the individual strips out as images.
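The conversion program from the first book isn't shown here. As a minimal sketch, assuming each page is available as rows of 8-bit gray values (0 = black, 255 = white) and using a hypothetical fixed threshold of 128, the gray-to-black-and-white step can be as simple as:

```python
def to_black_and_white(gray, threshold=128):
    """Convert rows of 8-bit gray values to 1 (black) / 0 (white).

    threshold is a hypothetical cutoff: pixels darker than it become black.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


page = [
    [250, 240, 30],
    [20, 130, 127],
]
print(to_black_and_white(page))  # [[0, 0, 1], [1, 0, 1]]
```

A real scan may need a per-page or adaptive threshold; the fixed cutoff is only for illustration.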

But this particular book presents a new problem: at some point when it was photocopied, thin white lines were introduced into the copy. This seems to be a particular problem on the backs of the pages, the left-hand sides, which would be the odd pages. For example, look at part of p029:


I've seen these sorts of lines on Xerox copies for decades. They are caused by dirty or defective spots in the photocopy machinery or toner.

But we would like to identify these lines, and fill them in with the missing black to restore the original image drawn by Arriola.  

If we zoom in and look at the individual pixels, we see that the lines are not uniform, but rather a mixture of light and dark pixels.

When we convert this to pure black and white pixels, the picture changes somewhat, but we still have these lines.


It appears that many of these could be handled by simply changing small patches of white surrounded by black to black, but, as with the first book, there are some places where that would change the image. For the first book, though, the case that limited us to clusters of 10 pixels or fewer was a black cluster in a white background.
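One way to do that simple fill, sketched here under assumptions (1 = black, 0 = white, 4-connected clusters, and the 10-pixel limit carried over from the first book), is to flood-fill each white cluster and blacken it only if it is small and does not touch the image edge. A cluster that stays away from the edge is, by construction, bordered only by black pixels. The function name is hypothetical, not the author's program:

```python
from collections import deque


def fill_small_white_clusters(img, max_size=10):
    """Blacken 4-connected white clusters of at most max_size pixels that are enclosed by black.

    img is a list of rows with 1 = black and 0 = white; it is modified in place.
    A cluster touching the image edge is left alone, since we cannot tell
    whether it is enclosed.
    """
    height, width = len(img), len(img[0])
    seen = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if img[r][c] != 0 or seen[r][c]:
                continue
            # Breadth-first flood fill to collect this white cluster.
            cluster, touches_edge = [], False
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                if y in (0, height - 1) or x in (0, width - 1):
                    touches_edge = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and not seen[ny][nx] and img[ny][nx] == 0:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(cluster) <= max_size and not touches_edge:
                for y, x in cluster:
                    img[y][x] = 1
    return img


img = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
]
print(fill_small_white_clusters(img))
# [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]
```

The enclosed two-pixel gap is filled; the white pixel on the edge is kept. On its own, this rule knows nothing about whether a gap belongs to a photocopier line.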

Still, there are some places where that would be insufficient. If we follow the lines shown above further to the left in this same image, we get:


At least for the line on the right, the little white channels breaking the otherwise solid vertical line probably should not be filled in without the context of knowing that there is a thin white line crossing the page.

To study this problem, I looked at where the white lines are in this image by examining the panel borders. This image has 3 panels, so there is a left border, two white gutters with borders at roughly 1/3 and 2/3 of the way across, and a right border. If I list the white lines at each vertical border, I get:


The largest, most obvious white lines are marked with an "@" symbol. Each entry gives the starting row of a white line and its height in pixels, so "447,3" is a gap that starts at row 447 and continues through row 449. There are a couple of things we can note from this chart.
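As an aside on how such a chart could be produced: assuming a black-and-white image (1 = black, 0 = white) and a column that should be solid black along a panel border, the white runs in that column can be listed as (starting row, height) pairs. This is a hypothetical sketch, not the author's tool; a run (447, 3) covers rows 447 through 449, that is, start through start + height - 1:

```python
def white_runs(img, col):
    """Return (start_row, height) for each run of white pixels in column col.

    img is a list of rows with 1 = black and 0 = white.
    """
    runs = []
    start = None
    for row in range(len(img)):
        is_white = img[row][col] == 0
        if is_white and start is None:
            start = row                      # a white run begins
        elif not is_white and start is not None:
            runs.append((start, row - start))  # run ended on the previous row
            start = None
    if start is not None:                    # run reaches the bottom of the image
        runs.append((start, len(img) - start))
    return runs


def format_runs(runs):
    """Format runs the way the chart does, e.g. "447,3"."""
    return " ".join(f"{start},{height}" for start, height in runs)


# A single border column, solid black except for rows 447-449.
column = [[1] for _ in range(460)]
for r in range(447, 450):
    column[r][0] = 0
print(format_runs(white_runs(column, 0)))  # 447,3
```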

One is that the white lines are not strictly horizontal. Following a white line from left to right in this image, its row position decreases, so there is skew. The last column shows the difference between the row position on the left and the row position on the right. For this image, the row decreases by 20 to 27 pixels over a span of 4446 columns, a skew of roughly 0.45% to 0.6%. But notice also that, at least in this case, the skew is not constant across the image; it seems to decrease almost uniformly as the row changes. And since the skew is not constant, it is not a property of the scanning: a page sitting crookedly on the scanner would tilt every line by the same amount.
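The skew figures are simple arithmetic. As a sketch, assuming a line's row changes linearly between the left and right borders (an assumption; only the endpoints were measured), the skew percentage and the expected row of that line at any intermediate column could be computed as below. Because the skew differs from line to line, each line needs its own endpoints. The function names and the endpoint rows in the comments are hypothetical:

```python
def skew_percent(row_drop, span_columns):
    """Row change per column, as a percentage of the horizontal span."""
    return 100.0 * row_drop / span_columns


def row_at(col, left_col, left_row, right_col, right_row):
    """Linearly interpolate the row of a white line at column col."""
    t = (col - left_col) / (right_col - left_col)
    return round(left_row + t * (right_row - left_row))


print(f"{skew_percent(27, 4446):.2f}%")  # 0.61%
print(f"{skew_percent(20, 4446):.2f}%")  # 0.45%

# Hypothetical line: row 447 at column 0, row 420 at column 4446.
print(row_at(1482, 0, 447, 4446, 420))  # 438
```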

Also, lines do not always extend across the entire image. A line may start or stop in the middle of the image. For example, from p029:


Here we see a major line toward the bottom that continues through the image, although it varies somewhat in "strength". Other lines, at the top, start and stop at intervals.
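Because a line can start and stop, one hedged way to locate its pieces is to walk along its predicted path (for example, interpolated from its endpoints at the borders) and keep only the columns where the path falls in a thin white gap with black directly above and below. Open white areas such as gutters are skipped because their white run is too tall. The names, the 1 = black / 0 = white convention, and the 3-pixel height limit are assumptions for illustration:

```python
def thin_gap_at(img, row, col, max_height=3):
    """True if (row, col) is white and lies in a vertical white run of at most
    max_height pixels with black directly above and below it."""
    if img[row][col] != 0:
        return False
    top = bottom = row
    while top > 0 and img[top - 1][col] == 0:
        top -= 1
    while bottom < len(img) - 1 and img[bottom + 1][col] == 0:
        bottom += 1
    enclosed = top > 0 and bottom < len(img) - 1
    return enclosed and bottom - top + 1 <= max_height


def line_segments(img, path, max_height=3):
    """Group consecutive columns where the line shows up as a thin white gap.

    path[col] is the predicted row of the line at each column.
    Returns inclusive (first_col, last_col) pairs.
    """
    segments, start = [], None
    for col, row in enumerate(path):
        if thin_gap_at(img, row, col, max_height):
            if start is None:
                start = col
        elif start is not None:
            segments.append((start, col - 1))
            start = None
    if start is not None:
        segments.append((start, len(path) - 1))
    return segments


# Black image with a short white line in row 2 (columns 1-3)
# and an all-white "gutter" in column 5.
img = [[1] * 6 for _ in range(5)]
for c in (1, 2, 3):
    img[2][c] = 0
for r in range(5):
    img[r][5] = 0
print(line_segments(img, [2] * 6))  # [(1, 3)]
```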

If we can identify the lines, we can fill them in to change the above image to


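Once a line's path is known, the fill itself might look like the sketch below. It is hypothetical, not the author's program, and assumes 1 = black, 0 = white, a predicted row for every column, and a 3-pixel limit on line height. Only short white runs that the path actually crosses, with black directly above and below, are blackened, so white channels elsewhere in a column are left alone unless a known line passes through them:

```python
def fill_line(img, path, max_height=3):
    """Blacken thin white gaps that the predicted line path passes through.

    img: list of rows, 1 = black, 0 = white (modified in place).
    path[col]: predicted row of the white line at column col.
    Returns the number of pixels changed.
    """
    changed = 0
    last_row = len(img) - 1
    for col, row in enumerate(path):
        if img[row][col] != 0:
            continue
        # Find the full vertical white run containing the path pixel.
        top = bottom = row
        while top > 0 and img[top - 1][col] == 0:
            top -= 1
        while bottom < last_row and img[bottom + 1][col] == 0:
            bottom += 1
        # Only fill short runs with black on both sides, so gutters and
        # other open white areas along the path stay white.
        if top > 0 and bottom < last_row and bottom - top + 1 <= max_height:
            for y in range(top, bottom + 1):
                img[y][col] = 1
                changed += 1
    return changed


img = [[1] * 4 for _ in range(5)]
img[2][1] = img[3][1] = 0   # two-pixel-high gap in column 1
img[2][2] = 0               # one-pixel gap in column 2
for r in range(5):
    img[r][3] = 0           # column 3 is an open white gutter
print(fill_line(img, [2, 2, 2, 2]))  # 3
```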