Tuesday, May 16, 2023

Understanding the Hough Transform

 Our basic code to find the white lines that the photocopying process has introduced is to use a Hough Transform, which we described in a previous post "Fixing the Photocopy Lines".  But we are having a number of problems with our code for that use.  So let us try to understand Hough Transforms and how they work better.

Let us start with a really simple case: a simple  48 pixel long well-defined perfectly horizontal line (from (5,16) to (52,16)).


If we feed this into our Hough Transform code and look at the matrix that is produced by it, we find it looks like:


The maximum value (48) is right in the middle of row 16, corresponding to a slope of 0 and an intercept of 16.  But we also compute for lines that go thru each point of the line for 31 different slopes, each a tiny bit different from the previous (both positive and negative).

So although the maximum is a slope of 0 and an intercept of 16, we also have the same maximum for a slope slightly less and slightly more (a slope so that at the end of the image, the line would be one pixel higher or lower, so not quite level).  This is because of the integer arithmetic used.  As the slope gets higher (or lower), some pixels will actually intercept the edge of the image higher and lower.  Notice that the sum in each column (sum over all slopes) is 48, since all 48 green pixels contribute to some b-intercept line, for each different slope.

If the green line gets thicker, then there are multiple intercepts for the thicker line (from 15 to 17, for example for a 3-pixel wide line), and the Hough transform matrix expands similarly.

Note that each column still adds up to 48 (times 3 now that we have 3 lines of 48 green pixels).  But the maximum value (48) occurs 15 times -- five times on each of the 3 lines for the intercept lines corresponding to the 3 straight, horizontal lines.

And we start to see other local maximums, for example the 39 up one line from the 3 real intercepts, and to the left and right of the center.  This probably corresponds to a line that would start at the lower of the 3 lines at the left and then move right into the center line and up to the upper line, exiting at the right-most pixel of the upper line.  And a similar line going down from the left-most green pixel for the upper line to the right-most green pixel for the lower line. 

So while a maximum value in the Hough Transform matrix corresponds to a particular slope/intercept line, there may be multiple maximums suggesting a thicker line.

In a noisy environment, however, we would expect that there are no perfect lines.  For a perfectly horizontal single-pixel line, that would mean a lower maximum -- some of the pixels that "should" contribute to it are missing -- but we would still expect that the maximum is larger than the lower values in the matrix. 

If we poke holes in the lines, corresponding to noise, we can end up with more or fewer non-green holes in each of the 3 lines corresponding to the width of the line, and higher or lower values in the Hough Transform matrix.  For example, putting one non-green pixel in the lowest line, five in the central line, and 3 in the upper line, moves the maximum to 47 in the line corresponding to the lowest line, 43 for the line corresponding to the middle line, and 45 in the upper line.   This gives a cluster of values from from 47 to 43, and then a drop to 38, 37 ... and much lower values.  But it is not clear how to differentiate between the 43 (which is 4 below the maximum) ad the 38 (which is only 5 below that)

But there could be other high-value entries "close" to the maximum that correspond to a thicker line -- not just one pixel.

If we look at a real example, panel p027_1_1, we find the maximum at 783,22 with a value of 276.  There is a 274,  at 784,20, and a 239 at 785,18.  We have to drop to 182 to get away from the cluster of high numbers in rows 784 to 790.