From reading graphics processing books, it appears that fixing the thin little horizontal lines that have been introduced into the images by the photocopying process should be fairly easy.  All we have to do is use a Hough transform.  
A Hough transform is used to find straight lines in an image. The idea is to look at each pixel and say "if there were a straight line going thru this point, what would it be?". The equation for a line with slope a and y-intercept b going thru the point (x,y) is y = ax+b. Now if we have (x,y) and want to solve for a and b, that would be better presented as b = y - ax. Now given x, y and a, we can solve for b.
A Hough transform defines an accumulator matrix m. For each pixel (x,y) of interest, it steps thru the allowable slopes a, computes b = y - ax for each one, and adds one to the matrix location m[a,b]. The idea is that if the line is actually there, all the points along the line will add to m[a,b] for the line with that particular slope a and intercept b. All the points that are not on a line will just sort of add randomly to other locations in the matrix. So the matrix locations with large counts correspond to lines and the others are all just noise.
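As a rough illustration of the voting scheme just described, here is a minimal Python sketch. The function name, the sample points, and the list of candidate slopes are all made up for the example, and the accumulator is a Counter keyed by (a, b) rather than a real matrix; this is not the code used on the actual scans.

```python
from collections import Counter

def hough_slope_intercept(points, slopes):
    """Every point votes once per candidate slope a, for the bin (a, b)."""
    votes = Counter()
    for x, y in points:
        for a in slopes:
            b = round(y - a * x)  # b = y - ax, rounded to an integer bin
            votes[(a, b)] += 1
    return votes

# Made-up data: five points on y = 2x + 1, plus two stray points.
points = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (1, 8), (3, 0)]
slopes = [-2, -1, 0, 1, 2]

votes = hough_slope_intercept(points, slopes)
best_line, best_count = votes.most_common(1)[0]
print(best_line, best_count)
```

The five collinear points all land in the same (a, b) bin, while the stray points scatter their votes over bins that nobody else uses.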
There are problems with this approach when the line approaches vertical, since the slope tends towards infinity, but in our case we are only interested in horizontal (or mostly horizontal) lines. In fact, we expect that if we limit our work to each panel of a strip, instead of an entire strip, then the skew will be minor and we are actually looking for horizontal lines. In that case a = 0, the line is defined by b = y - 0x, or just b = y, and we simply accumulate into a vector m[b].
So for each point (x,y) that we think might be on a line, we add one to m[y] and look for the high values of the resulting vector.
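With a fixed at 0, the accumulator collapses to a per-row count. A small sketch of that step, again with a hypothetical function name and made-up candidate points:

```python
def accumulate_rows(candidates, height):
    """With a = 0 we have b = y, so each candidate just adds one to m[y]."""
    m = [0] * height
    for _x, y in candidates:
        m[y] += 1
    return m

# Made-up candidates: six points on row 3, plus two stray points.
candidates = [(x, 3) for x in range(6)] + [(2, 0), (4, 5)]

m = accumulate_rows(candidates, height=8)
peak_row = max(range(len(m)), key=m.__getitem__)
print(m, peak_row)
```

Rows with a high count are the likely horizontal lines; the stray points only contribute a vote or two to their own rows.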
So we are interested in a point if it is on a line. We know the lines look like:
 
So, going down each column, we look for a small number of white pixels with black pixels above and below. Then we count how many of these fall in each row, and look for the rows with the largest counts. For a row like that, we then step along it looking for the small clusters of white pixels and setting them all to black.
There are some parameters for this, like how many consecutive white pixels we allow (the height of the line), and how many black pixels we need above and below the white pixels. We start with up to 5 white pixels and needing 10 black pixels above and below.
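Putting the pieces together, the whole procedure might be sketched like this in Python. Several things here are assumptions for the sake of the example: the image is a list of rows with 0 for black and 1 for white, every row covered by a run gets a vote, and the min_votes threshold (half the image width by default) is invented. The demo uses smaller parameters than the 5 and 10 above so the test image stays tiny.

```python
BLACK, WHITE = 0, 1  # assumed pixel encoding

def thin_white_runs(image, max_white=5, min_black=10):
    """Yield (x, top, bottom) for short vertical white runs framed by black."""
    height, width = len(image), len(image[0])
    for x in range(width):
        y = 0
        while y < height:
            if image[y][x] != WHITE:
                y += 1
                continue
            top = y
            while y < height and image[y][x] == WHITE:
                y += 1
            bottom = y - 1  # last white row of this run, inclusive
            if bottom - top + 1 > max_white:
                continue  # too tall to be a thin line
            if top - min_black < 0 or bottom + min_black >= height:
                continue  # not enough room for the black margins
            above = range(top - min_black, top)
            below = range(bottom + 1, bottom + 1 + min_black)
            if all(image[r][x] == BLACK for r in above) and \
               all(image[r][x] == BLACK for r in below):
                yield x, top, bottom

def remove_thin_lines(image, max_white=5, min_black=10, min_votes=None):
    """Fill thin white runs on high-vote rows; return those rows."""
    height, width = len(image), len(image[0])
    if min_votes is None:
        min_votes = width // 2  # hypothetical threshold, not from the post
    runs = list(thin_white_runs(image, max_white, min_black))
    m = [0] * height
    for _x, top, bottom in runs:
        for y in range(top, bottom + 1):
            m[y] += 1  # accumulate into m[y]
    line_rows = {y for y, count in enumerate(m) if count >= min_votes}
    for x, top, bottom in runs:
        if any(y in line_rows for y in range(top, bottom + 1)):
            for y in range(top, bottom + 1):
                image[y][x] = BLACK
    return sorted(line_rows)

# Demo: 20 x 8 black image, a 2-pixel-high white line, and one white speck.
image = [[BLACK] * 8 for _ in range(20)]
for x in range(8):
    image[9][x] = image[10][x] = WHITE
image[4][2] = WHITE  # isolated speck, stands in for a bit of real art

rows = remove_thin_lines(image, max_white=2, min_black=3)
print(rows, image[4][2])
```

In the demo the speck qualifies as a candidate point but its row collects only one vote, so it is left alone; a speck that happened to sit on a high-vote row would be filled in along with the line.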
As we suspected, the fine white lines mainly show up on the odd-numbered files, which would be the left hand pages, which were the "back" of the sheets as they were copied.
When we use this approach, we find two types of errors. One is some very noticeable white lines that are not caught. Going thru all 1884 panels, we find 150 that still need work.
Some of those are because the white lines are bigger than we anticipated. For example, here on panel 3 of strip 4 on p013, the white line is sometimes 6 pixels high.
Another problem is where we have what we think might be a white line, but is actually part of the original art. For example in the 2nd panel of the 2nd strip of p034, we have the following art:
Which actually has no white lines in it. But notice that there are two marks on the plate, just below the top rim. The distance here between the white and the black is such that my code thinks this is part of a thin white line, and fills those white pixels in, producing:
So I need to either improve my code, or ferret out all these sorts of problems and fix them by hand.




 