Saturday, May 25, 2019

Hough Transform

A search for how to recognize straight lines in an image leads me to the Hough Transform, a technique for finding exactly that.  The idea is to transform the bits in the image space into a "feature space" which reveals the lines.

The idea is fairly straightforward.  For each point in the image, we assume that it is on a line.  What would the line be?  Well, every line thru the point (x,y) is described by the standard linear equation y = ax + b for a slope a and intercept b.  If we range over all possible slopes (a), we can then solve for the intercept b: b = y - ax.  So if we step a from 0 (horizontal line) to infinity (vertical line), we get a set of points (a,b) and can plot those in feature space.  If we do this for all the points, then for an actual line, with slope A and intercept B, there will be a bunch of points at (A,B) in the feature space.  Think of the feature space as an accumulator array, which starts at all 0; for each (a,b) corresponding to a point (x,y), we add 1 to that index in the accumulator array.  Then the largest value in the accumulator corresponds to the longest line in the image, and gives you its slope and intercept.

The main problem with this, of course, is the slope going to infinity for a vertical line.  So the standard solution is to replace the description of a line as y = ax + b by the representation r = x cos a + y sin a, for an angle a and radius r.  If we step thru the angles a from 90 degrees to -90 degrees, we can solve for r and get a point (a,r), where a is limited to the range -90 to 90, and r is limited by the longest distance in the image (√(w² + h²) for an image of size w by h).  Again, we can use the accumulator array and look for the maximum to find the longest line.  Of course, it's not as easy to map the maximum point (A,R) in the feature space back to the actual line in the image space.  We solved that by finding the maximum, and then running over the image array a second time, computing (a,r) for each point and marking the image whenever it equals (A,R).
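
Here is a minimal sketch of that accumulator loop in Python (my reconstruction for illustration, not the actual code used here; NumPy only, with the angle step as a parameter):

import numpy as np

def hough_lines(img, angle_step=1.0):
    # Line Hough transform over a binary image (nonzero = ink).
    # Returns the accumulator plus the angle bins and the radius offset,
    # so the maximum cell can be mapped back to (A, R).
    h, w = img.shape
    diag = int(np.ceil(np.hypot(w, h)))       # longest distance in the image
    angles = np.deg2rad(np.arange(-90, 90, angle_step))
    # r = x cos a + y sin a can go negative for angles below zero,
    # so the accumulator covers radii in [-diag, diag].
    acc = np.zeros((len(angles), 2 * diag + 1), dtype=np.int32)
    ys, xs = np.nonzero(img)                  # all the ink pixels
    for i, a in enumerate(angles):
        r = np.round(xs * np.cos(a) + ys * np.sin(a)).astype(int)
        np.add.at(acc, (i, r + diag), 1)      # vote for this (a, r)
    return acc, angles, diag

The largest cell gives the dominant line: ai, ri = np.unravel_index(acc.argmax(), acc.shape), so A = angles[ai] and R = ri - diag.  The angle_step parameter is the same knob discussed below: a coarser step gives a smaller, faster accumulator that lumps more pixels into each "line".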

We coded that up and applied it to Figure 3.34:



and here it is, with the line found marked in red:



This looks pretty good, but notice the little red parts on the circles:


The algorithm finds a line, but it's the abstract line that goes from one end of the image to the other, not just the line segments that are in the image. But we can probably use other code to find the segments, and keep only the longest ones.
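
That segment-finding code might look something like this (a sketch under my own assumptions: it walks the detected line (A, R) along x, so it suits mostly-horizontal lines, and min_len and max_gap are invented tolerances):

import numpy as np

def line_segments(img, A, R, min_len=20, max_gap=3):
    # Walk along the detected line and keep only the stretches of
    # consecutive ink pixels -- the actual segments in the image.
    # (Assumes sin A is not 0, i.e. the line is not perfectly vertical.)
    h, w = img.shape
    sin_a, cos_a = np.sin(A), np.cos(A)
    segments, start, last, gap = [], None, None, 0
    for x in range(w):
        y = int(round((R - x * cos_a) / sin_a))   # from r = x cos a + y sin a
        ink = 0 <= y < h and img[y, x]
        if ink:
            if start is None:
                start = (x, y)
            last, gap = (x, y), 0
        elif start is not None:
            gap += 1
            if gap > max_gap:                     # this segment has ended
                if last[0] - start[0] + 1 >= min_len:
                    segments.append((start, last))
                start = None
    if start is not None and last[0] - start[0] + 1 >= min_len:
        segments.append((start, last))
    return segments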

Also notice that the section on the circle (and the little black part on the line below it) shows that it isn't just a line (a one-dimensional mathematical concept), but a line with width.  The width is determined by the size of the accumulator array.  While we step thru the angles from -90 to +90, we can do that in steps of 1 degree, or 5 degrees, or 1/10th of a degree, each defining a different size of accumulator array.  If the accumulator array is too small, the line found is really wide.  Plus, it looks like there may be more than one line mapping into an accumulator cell.  So stepping by 2 degrees recognizes a much wider area as being part of the "line":


Look at the amount of red on the top circle, or at the width of the cross lines.

If we increase the number of steps, increasing the size of the array -- stepping by 1/10 of a degree, for example -- we get a much narrower line being recognized.


This line is only 4 pixels wide.  Increasing the size of the array by another factor of 4 (steps of 0.025 degrees) reduces this to one pixel wide.  Of course it also runs much slower, because of the larger accumulator array, which must be set to zero and then searched for the largest value after the transform.

Plus, the line that is found is not centered, but seems to go from the right edge of the actual line at the top to the left edge at the bottom, which seems a bit strange.  I'm guessing it's something to do with the "angle" part of the description of the line.

And I don't really understand why we have to switch to the angle, distance representation of a line -- Wikipedia calls this the Hesse normal form.  I can see the difficulty of an infinite slope for a vertical line, but this is a computer program.  We can easily constrain our transformation to slopes between, say, 45 degrees and -45 degrees (a slope of 1 to -1).  That will find all the "mostly horizontal" lines; then we transpose the image and run it again.  Transposing the image turns a vertical line into a horizontal line, so the second run finds all the "mostly vertical" lines, and we transpose the results back.  As long as the two ranges partition the possible slopes, we will get all the lines.  No need for sines and cosines, and we get the lines in the form of a slope and intercept.
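
A sketch of that alternative (again my own reconstruction; the slope step and array bounds are choices for illustration):

import numpy as np

def hough_slope_intercept(img, slope_step=0.01):
    # Slope/intercept Hough for "mostly horizontal" lines (|slope| <= 1).
    # With |a| <= 1, the intercept b = y - a*x stays within [-(w-1), h+w-2],
    # so a bounded accumulator works and no trigonometry is needed.
    h, w = img.shape
    slopes = np.arange(-1.0, 1.0 + slope_step, slope_step)
    acc = np.zeros((len(slopes), h + 2 * w), dtype=np.int32)
    ys, xs = np.nonzero(img)
    for i, a in enumerate(slopes):
        b = np.round(ys - a * xs).astype(int)
        np.add.at(acc, (i, b + w), 1)         # offset so b = -w maps to index 0
    return acc, slopes

Running it once on the image and once on the transposed image covers all slopes: hough_slope_intercept(img) finds the mostly-horizontal lines, hough_slope_intercept(img.T) the mostly-vertical ones (with the coordinates swapped back afterwards).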

But the Hough transform has another advantage -- it can be used for circles.  For circle detection, we use the equation (i - a)² + (j - b)² = r², where (a,b) is the center of a circle of radius r, and (i,j) is a point on that circle.  We need accumulator cells for a 3-dimensional array, (a,b,r), in general.  If the radius is known, this is just another 2-dimensional accumulator array.  A limited sampling of our figures suggests all the circles are about 200 pixels in diameter.  Our experience with the linear Hough transform suggests that a larger array means that our diameter has to be more accurate, while a smaller array will compensate for variation in diameter, and noise in the scanning.
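
A sketch of that voting scheme (my reconstruction from the description; I'm assuming (i,j) is (row, column), which isn't specified, and the default radius range reflects the roughly 200-pixel diameters):

import numpy as np

def hough_circles(img, r_min=80, r_max=120):
    # Circle Hough transform with unknown radius: a 3-D accumulator indexed
    # by center (b, a) and radius r.  For every ink pixel (i, j) and every
    # candidate radius, vote for all centers that would put (i, j) on that
    # circle.
    h, w = img.shape
    acc = np.zeros((h, w, r_max - r_min + 1), dtype=np.int32)
    thetas = np.deg2rad(np.arange(360))       # 360 points around each circle
    ys, xs = np.nonzero(img)
    for r in range(r_min, r_max + 1):
        for j, i in zip(xs, ys):
            a = np.round(j - r * np.cos(thetas)).astype(int)   # center column
            b = np.round(i - r * np.sin(thetas)).astype(int)   # center row
            ok = (a >= 0) & (a < w) & (b >= 0) & (b < h)
            np.add.at(acc, (b[ok], a[ok], r - r_min), 1)
    return acc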

And after coding it up, it seems that is the case.  The Wikipedia article for the Circle Hough Transform makes it quite clear how to run the code (look down for "The algorithm" and "The voting" under "Find circle parameter with unknown radius").  In our case, since the diameters seem to be about 200 pixels, the radius is assumed to be about 100 pixels, so we run the loop for the radius from 80 to 120.  This ends up being a bit problematic, since each point now takes 40 loops for the radius, and then 360 for the angle, so the computation time increases substantially.  But for test purposes, we cut out just the circle part of a Figure, and use it to develop our code:


Our first run at recognizing the circle says it is a success, but the result, with the circle in red, looks to be the same as what we started with:




There is just the hint of a red section at the bottom; enlarging the image, we see, at the pixel level:

The code found the circle, and from the position of the maximum in the array, we know it is a circle with radius 100 and center (158,119), which certainly looks right; but what we see is a circle 9 pixels thick, while the algorithm finds only a 1-pixel-wide circle.

Further experimenting with the program and our code reveals a couple of things.  Notice how in the image above, while the red dots do define a circle, they are not continuous.  A circle with radius 100 would have a circumference of roughly 628 pixels (2πr).  But our loop runs from 0 to 360 degrees, so we will mark at most 360 of these points.  For the example above, that's roughly half the points (which looks roughly correct).  This suggests we can easily tell a circle from a non-circle -- a circle will have (roughly) 360 points in the accumulator array; significantly fewer than this means no circle.  A short set of tests suggests that 350 or more is a circle.  (Or we could increase or decrease the number of points around a circle that we test, which would directly impact performance.)

If we use 350 as a cut-off, notice that we are no longer looking for the maximum value, just one that is "high enough".  That means we can recognize multiple circles in one run.  So if we have 15 circles in an image, we will find 15 peaks above 350, each of which defines the center of one of the circles.  In fact, we find peaks not only for the maximum radius (as above), but also for all the slightly smaller radii within the width of the circle line, and for all the slightly off-center circles that use some inner bits and some outer bits of the circle line.  So a line with a width of n points creates at least n circles with the same center and varying diameters, plus circles where the center varies by at most n in both the x and y directions.  We will need code to consolidate all of these into one wide-line circle.
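
That consolidation might look something like this (a sketch; the merge distance is a guess, and the exact grouping rule used isn't recorded here):

import numpy as np

def find_circles(acc, r_min=80, threshold=350, merge_dist=10):
    # Collect every cell with at least `threshold` votes, strongest first,
    # then drop any peak whose center lands within merge_dist of a circle
    # already kept -- the slightly smaller and slightly off-center
    # duplicates all collapse into the one strongest peak.
    idx = np.argwhere(acc >= threshold)         # (row, col, radius) triples
    idx = idx[np.argsort(-acc[tuple(idx.T)])]   # sort by vote count, descending
    circles = []
    for b, a, r in idx:
        if all(abs(b - b0) > merge_dist or abs(a - a0) > merge_dist
               for b0, a0, _ in circles):
            circles.append((int(b), int(a), int(r) + r_min))
    return circles                              # one (row, col, radius) per circle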

That turns out to work well, so we can use the Hough Transform for Circles to find all the circles, record them, remove them from the figures and be able to put them back in later.  Once the circles are gone, the lines are a bit easier, since (for Petri nets) the lines go from place (circle) to transition (perpendicular line) to place (circle).  So almost all our lines are now broken into much smaller pieces.

We did some simple pixel-level clean-up of the images.  Things like saying any instance of bits that looks like






can have the little "pimple" of a bit removed.  At 600 dpi, you really can't see that individual bit; you just get a sort of feeling that the line is not "crisp".
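
A plausible version of that rule in code (this is my guess at the pattern, since the exact bitmap is in the picture above: a lone ink bit with at most one inked 4-neighbor, sticking out of an otherwise clean edge):

import numpy as np

def remove_pimples(img):
    # Clear any ink pixel that touches at most one other ink pixel
    # horizontally or vertically -- the lone "pimple" bits.
    # (img is a 2-D array with 0 = white, nonzero = ink.)
    out = img.copy()
    padded = np.pad((img > 0).astype(np.int32), 1)
    neighbors = (padded[:-2, 1:-1] + padded[2:, 1:-1] +   # above, below
                 padded[1:-1, :-2] + padded[1:-1, 2:])    # left, right
    out[(img > 0) & (neighbors <= 1)] = 0
    return out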

Then the program looked explicitly for horizontal lines.  Just horizontal lines.  It would check each bit to see if it might be part of a horizontal line, and if so, try to expand it, both left and right, to see how long it could be, and also up and down to get a "height" for the line.  It would allow a certain amount of "mess" -- the occasional wrong-color bit, but not too much.  Then the image was transposed, so that vertical lines became horizontal, and those were found the same way.  These were all removed and replaced later.
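
In outline, that scan might look like the following (a sketch; the length and "mess" tolerances are invented for illustration):

def find_horizontal_runs(img, min_len=50, max_mess=2):
    # Scan each row for long runs of ink, tolerating up to max_mess
    # consecutive wrong-color bits inside a run.  Returns a
    # (row, start_col, end_col) triple for each run at least min_len long;
    # merging runs on adjacent rows then gives the line its "height".
    h, w = img.shape
    runs = []
    for y in range(h):
        x = 0
        while x < w:
            if img[y, x]:
                start = end = x
                gap = 0
                while x < w and gap <= max_mess:
                    if img[y, x]:
                        end, gap = x, 0
                    else:
                        gap += 1
                    x += 1
                if end - start + 1 >= min_len:
                    runs.append((y, start, end))
            else:
                x += 1
    return runs

The vertical pass is the same scan on the transposed image, with the coordinates swapped back.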

This meant that the "background" image was just the arrow heads, plus the non-vertical/horizontal and non-straight lines.  Those I cleaned up by hand, then put the circles and lines and text back on top, and the figures were in pretty good shape.  The code helped, but it wasn't (isn't) perfect.  The figures aren't perfect either.  But they are "good enough".






Friday, May 24, 2019

The Figures for the Petri Net book

The Petri net book has a lot of Figures.  152 of them, at least.  Some of the Figures are really just tables, so they can be typeset like the rest of the book, but Petri nets are basically a graphical representation of information or computation flow, so it makes sense there would be a lot of Figures.

The Figures are mainly line drawings, lines and circles, with labels attached to the lines and circles.  The labels are text, and can be typeset like the text of the book.  We have a "line" type in our representation, but it's really only for either horizontal or vertical lines.  It uses the same "box" notation we have for the character boxes -- start and end of x and y coordinates, 4 numbers -- so it's really a black box, not a black line.  But if the box is very thin in one dimension or the other, it becomes a line (horizontal or vertical).

But the lines in the Figures are not horizontal or vertical, in general, and most of them actually have arrow heads on them, being directional lines.  And then there are the circles.

But we have "images" in our representation.  Normally an "image" is used to represent the Figures.  In the text, we leave a blank space for the Figure, and place the text above and below it.

However, we can also use the same approach to define the Figures themselves.  So we have a two-level structure.  First we have a set of files that represent the Figures.  These are used to produce GIF images of the Figures.  Then we have the set of files for the book, which include the images for the Figures.

For the representation of the Figures, we have two basic parts -- the text, which is the labels, and the actual drawing.  If we create a GIF file which is just the drawing part (the circles and arrows) with no text labels, and include it as an image that covers the entire "page" of the Figure, then we can list the characters which are the text of the labels, and they will be placed on top of the background line drawing.  So we can represent each Figure by a background image and a box file with the text.

As an example, consider page 132.  The box file for it contains both the text and the image:

...
c gif/p132.gif 1682 314 1698 361 l 8pt i
c gif/p132.gif 1697 318 1710 361 i 8pt i
c gif/p132.gif 1714 323 1732 361 t 8pt i
c gif/p132.gif 1730 328 1765 372 y 8pt i
n gif/p132.gif 371 313 1765 373 newline
i gif/p132.gif 1339 532 2066 1353 figures/gif/5_7.gif
n gif/p132.gif 1339 532 2066 1353 newline
c gif/p132.gif 914 1404 949 1451 F 8pt b
c gif/p132.gif 953 1404 970 1451 i 8pt b
c gif/p132.gif 975 1418 1003 1462 g 8pt b
c gif/p132.gif 1010 1418 1045 1451 u 8pt b
...

We have first the boxes that define where the characters go for the page heading (which is in 8 point italic), followed by the image, which takes up all the space from row 532 to row 1353, so it is (1353 - 532 + 1 =) 822 pixels tall.  Then after the Figure, we have its caption, which starts with "Figu..." in 8pt bold.

This looks like:

where if you look at the full image, the various parts have boxes drawn around them to show where each box is.  One box around the trailing italic "y" and then a bigger box around the image, and another box for the bold capital "F", and so on.

The description of the Figure itself is the same sort of thing, and consists, in its entirety, of:

v 2
i gif/5_7.gif 0 0 727 821 back/5_7.gif
c gif/5_7.gif 6 76 40 119 p 8pt i
c gif/5_7.gif 54 91 64 126 1 6pt sub
c gif/5_7.gif 33 382 51 420 t 8pt i
c gif/5_7.gif 63 404 73 439 1 6pt sub
c gif/5_7.gif 678 382 696 420 t 8pt i
c gif/5_7.gif 705 404 724 439 2 6pt sub
c gif/5_7.gif 1 710 35 753 p 8pt i
c gif/5_7.gif 46 725 65 760 2 6pt sub

This is a version two file (v 2).  Then the backing image, which takes up the entire size of the Figure.  We computed above that the Figure should be 822 pixels tall, and the backing image goes from row 0 to row 821, 822 pixels.  Then it is overlaid with the various labels (p sub 1, t sub 1, t sub 2, p sub 2).


The backing image is just the part without the labels:


So now the problem comes down to cleaning up the line drawings.  These are scanned (at 600 dpi) just like the text, and just like the text, are not perfect.  If you look at it at a high enough level of magnification, you see the bits are:


It seems like it should be easy enough to write some code to clean up the various lines and circles and such and make them "better", less "fuzzy".

The problem is how to recognize the lines (and circles), so that we know what should be a straight edge, and where it starts and stops, and so on.  One option would be to just use a bit map editor and clean them up by hand -- that's what we did with the fonts -- but that seems like a lot of work.  It seems a program would be better -- it could run over all the figures and do them all the same.

Let's start with the lines -- how do we recognize a line from a set of pixels?  Horizontal or Vertical lines for a start, then straight lines at any angle, then maybe curved lines and arcs.  You would think that someone has already done this.