The idea is fairly straightforward. For each point in the image, we assume that it is on a line. What would the line be? Well, every line thru the point (x,y) is described by the standard linear equation y = ax+b for a slope a and intercept b. If we range over all possible slopes (a), we can then solve for the intercept b: b= y-ax. So if we step a from 0 (horizontal) line to infinity (vertical line), then we get a set of points (a,b) and can plot those in feature space. If we do this for all the points, then for an actual line, with slope A and intercept B, there will be a bunch of points at (A,B) in the feature space. Think of the feature space as an and accumulator array, which starts at all 0, and for each (a,b) corresponding to a point (x,y), we add 1 to that index in the accumulator array. Then the largest value in the accumulator corresponds to the longest line in the image, and gives you the slope and intercept.
The main problem with this, of course, is the slope going to infinity, for a vertical line. So the standard solution is to replace the description of a line as ax+b by a representation as r = x cos a + y sin a, for an angle a and radius r. If we step thru the angles, a, from 90 degrees to -90 degrees, we can solve for r and get a point (a,r) where a is limited from -90 to 90, and r is limited from 0 to the longest distance in the image (which would be the square root of the sum of w squared and h squared for an image of size w by h.) Again, we can use the accumulator array and look for the maximum to find the longest line. Of course, it's not as easy to map the maximum point (A,R) in the feature space back to the actual line in the image space. We solved that by finding the maximum, and then running over the image array a second time, computing (a,r) and marking the image whenever they equal (A,R).
We coded that up, and applying it to Figure 3.34.
This looks pretty good, but notice the little red parts on the circles:
The algorithm finds a line, but it's the abstract line that goes from one end of the image to the other, not just the line segments that are in the image. But we can probably use other code to find the segments, and keep only the longest ones.
Also notice that the section on the circle (and the little black part on the line below it, shows that it isn't just a line (a one-dimensional mathematical concept), but a line with width. The width is determined by the size of the accumulator array. While we step thru the angles from -90 to +90, we can do that in steps of 1 degree, or 5 degrees, or 1/10th of a degree, defining a different size accumulator array. If the accumulator array is too small, the line found is really wide. Plus, it looks like there may be more than one line mapping into an accumulator cell. So stepping by 2 degrees, recognizes a much wider area as being part of the "line":
Look at the amount of red on the top circle, or on the width of the cross lines.
If we increase the number of steps, increasing the size of the array, for example to 1/10 of a degree, we get a much smaller line being recognized.
This line is only 4 pixels wide. Increasing the size of the array by another factor of 4 (steps of 0.025 degrees) reduces this to one pixel wide. Of course it also runs much slower, because of the larger accumulator array, which must be set to zero and then searched for the largest value after the transform.
Plus, the line that is found is not centered, but seems to go from the right edge of the actual line at the top to the left edge at the bottom, which seems a bit strange. I'm guessing it's something to do with the "angle" part of the description of the line.
And I don't really understand why we have to switch to the angle,distance representation of a line -- Wikipedia calls this the Hesse normal form. I can see the difficulty of an infinite slope for a vertical line, but this is a computer program. We can easily constrain our transformation to slopes between say 45 degrees and -45 degrees (a slope of 1 to -1). That will find all "mostly horizontal" lines, then transpose the image and run it again. Transposing the image will make a vertical line into a horizontal line. We run the algorithm, and then transpose the results back. So first we find all the "mostly horizontal lines", then all the "mostly vertical lines". As long as this partitions the possible slopes we consider, we will get all the lines (since we cover all slopes this way). No need for sines and cosines, and we get the lines in the form of a slope and intercept.
But the Hough transform has another advantage -- it can be used for circles. For circle detection, we use the equation (i - a)2 + (j - b)2 = r2 where (a,b) is the center of a circle of radius r, and (i,j) is a point on that circle. We need accumulator cells for a 3-dimensional array, (a,b,r), in general. If the radius is known, this is just another 2-dimensional accumulator array. A limited sampling of our figures suggests all the circles are about 200 pixels in diameter. Our experience with the linear Hough transform suggests that a larger array means that our diameter has to be more accurate, while a smaller array will compensate for variation in diameter, and noise in the scanning.
And after coding it up, it seems that is the case. The Wikipedia article for the Circle Hough Transform makes it quite clear how to run the code (look down for "The algorithm" and "The voting" for "Find circle parameter with unknown radius"). In our case, since the diameters seem to be about 200 pixels, the radius is assumed to be about 100 pixels, so we run the loop for the radius from 80 to 120. This ends up being a bit problematic for finding circles, since each point now takes 40 loops for the radius, and then 360 for the angle, so the time for computation increases substantially. But for test purposes, we cut out just the circle part of a Figure, and use it to develop our code:
Our first run at recognizing the circle says it is a success, but the result, with the circle in red, looks to be the same as what we started with:
There is just the hint of a red section at the bottom, and enlarging the image, we see, at the pixel level,
The code found the circle, and from the position of the maximum in the array, we know it is a circle with radius 100 and center (158,119) which certainly looks right, but what we see is a circle 9 pixels thick, while the algorithm finds only a 1-pixel wide circle.
Further experimenting with the program and our code reveals a couple of things. Notice how in the image above, while the red codes do define a circle, they are not continuous. A circle with radius 100 would have a circumference of roughly 628 (2πr). But our loop runs from 0 to 360 degrees, so we will notice only up to 360 of these points. For the example above, that's roughly half the points (which looks roughly correct). This suggests we can easily tell a circle from a non-circle -- a circle will have (roughly) 360 points in the accumulator array; significantly less than this means no circle. A short set of tests suggests that 350 or more is a circle (Or we could increase or decrease the number of points around a circle that we test, which would directly impact performance.)
If we use 350 as a cut-off, notice that we are no longer looking for the maximum value, just one that is "high enough". That means we can recognize multiple circles in one run. So if we have 15 circles in an image, we will find 15 peaks above 350, each of which defines the center of one of the circles. In fact, we find peaks both for the maximum radius (as above), but also for all the slightly smaller radii for the width of the circle line. And also maximums for all the slightly off-center circles that use some inner bits and some outer bits of the circle line. So a line with a width of n points creates at least n circles with the same center, and varying diameters, plus circles where the center varies by at most n in both x and y directions. We will need code to consolidate all of these into one wide-line circle.
That turns out to work well, so we can use the Hough Transform for Circles to find all the circles, record them, remove them from the figures and be able to put them back in later. Once the circles are gone, the lines are a bit easier, since (for Petri nets) the lines go from place (circle) to transition (perpendicular line) to place (circle). So almost all our lines are now broken into much smaller pieces.
We did some simple pixel level clean-up of the images. Things like saying any instance of bits that look like
can have the little "pimple" of a bit removed. At 600 dpi, you really can't see that individual bit; you just get a sort of a feeling that the line is not "crisp".
Then the program looked explicitly for horizontal lines. Just horizontal lines. It would check each bit to see if it might be part of a horizontal line, and if so, try to expand it, both left and right to see how long it could be. And also up and down to get a "height" for the line. It would allow a certain amount of "mess" -- the wrong color bit, but not too much. Then the image was rotated 180 degrees, and the vertical lines were found. These were all removed and replaced later.
This meant that the "background" image was just the arrow heads, and non-vertical/horizontal or non-straight lines. Those I cleaned up by hand, put the circles and lines and text back on top of, and the figures were in pretty good shape. The code helped, but it wasn't (isn't) perfect. The figures aren't perfect either. But they are "good enough".
No comments:
Post a Comment