The Petri net book has a lot of Figures. 152 of them, at least. Some of the Figures are really just tables, so they can be typeset like the rest of the book, but Petri nets are basically a graphical representation of information or computation flow, so it makes sense there would be a lot of Figures.
The Figures are mainly line drawings, lines and circles, with labels attached to the lines and circles. The labels are text, and can be typeset like the text of the book. We have a "line" type in our representation, but it's really only for either horizontal or vertical lines. It uses the same "box" notation we have for the character boxes -- start and end of x and y coordinates, 4 numbers -- so it's really a black box, not a black line. But if the box is very thin in one dimension or the other, it becomes a line (horizontal or vertical).
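To make the "thin box is a line" idea concrete, here is a minimal sketch. It assumes the four numbers are ordered x start, y start, x end, y end with inclusive ends, as the records later in this post suggest. The function name `classify_box` and the `max_thickness` cutoff are made up for illustration; they are not part of our actual tools.

```python
def classify_box(x0, y0, x1, y1, max_thickness=4):
    """Classify a black box as a horizontal line, a vertical line, or a plain box.

    Coordinates are inclusive pixel positions. max_thickness is a hypothetical
    cutoff (in pixels) below which a box is thin enough to call a line.
    """
    width = x1 - x0 + 1
    height = y1 - y0 + 1
    if height <= max_thickness and width > height:
        return "horizontal"
    if width <= max_thickness and height > width:
        return "vertical"
    return "box"


print(classify_box(100, 200, 900, 202))  # 801 wide, 3 tall
print(classify_box(100, 200, 102, 900))  # 3 wide, 701 tall
```

Nothing here handles a slanted line, which is exactly the limitation the Figures run into.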
But the lines in the Figures are not horizontal or vertical, in general, and most of them actually have arrow heads on them, being directional lines. And then there are the circles.
But we have "images" in our representation. Normally an "image" is used to represent the Figures. In the text, we leave a blank space for the Figure, and place the text above and below it.
However, we can also use the same approach to defining the Figures themselves. So we have a two level structure. First we have a set of files that represent the Figures. These are used to produce GIF images of the Figures. Then we have the set of files for the book, which include the images for the Figures.
For the representation of the Figures, we have two basic parts -- the text of the labels and the actual drawing. If we create a GIF file which is just the drawing part (the circles and arrows) with no text labels, and include it as an image that covers the entire "page" of the Figure, then we can list the characters that make up the text of the labels, and they will be placed on top of the background line drawing. So we can represent each Figure by a background image and a box file with the text.
As an example, consider page 132. The box file for it contains both the text and the image:
...
c gif/p132.gif 1682 314 1698 361 l 8pt i
c gif/p132.gif 1697 318 1710 361 i 8pt i
c gif/p132.gif 1714 323 1732 361 t 8pt i
c gif/p132.gif 1730 328 1765 372 y 8pt i
n gif/p132.gif 371 313 1765 373 newline
i gif/p132.gif 1339 532 2066 1353 figures/gif/5_7.gif
n gif/p132.gif 1339 532 2066 1353 newline
c gif/p132.gif 914 1404 949 1451 F 8pt b
c gif/p132.gif 953 1404 970 1451 i 8pt b
c gif/p132.gif 975 1418 1003 1462 g 8pt b
c gif/p132.gif 1010 1418 1045 1451 u 8pt b
...
First we have the boxes that define where the characters go for the page heading (which is in 8 point italic), followed by the image, which takes up all the space from row 532 to row 1353, so it is (1353 - 532 + 1 =) 822 pixels tall. Then after the Figure, we have its caption, which starts with "Figu..." in 8pt bold.
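The arithmetic above can be checked with a few lines of code. This is only a sketch of how one record might be read, assuming whitespace-separated fields in the order shown in the listing (record type, page file, x start, y start, x end, y end, then whatever is left). The names `Record` and `parse_record` are invented for the example.

```python
from typing import NamedTuple, Tuple


class Record(NamedTuple):
    kind: str               # "c" character, "i" image, "n" newline
    page: str               # page image the box refers to
    x0: int
    y0: int
    x1: int
    y1: int
    rest: Tuple[str, ...]   # e.g. ("l", "8pt", "i") or an image path

    @property
    def width(self) -> int:
        return self.x1 - self.x0 + 1    # ends are inclusive

    @property
    def height(self) -> int:
        return self.y1 - self.y0 + 1


def parse_record(line: str) -> Record:
    """Parse one box record; assumes at least six whitespace-separated fields."""
    fields = line.split()
    x0, y0, x1, y1 = (int(f) for f in fields[2:6])
    return Record(fields[0], fields[1], x0, y0, x1, y1, tuple(fields[6:]))


image = parse_record("i gif/p132.gif 1339 532 2066 1353 figures/gif/5_7.gif")
print(image.height)  # 1353 - 532 + 1
print(image.width)   # 2066 - 1339 + 1
```

The same function reads the character records, where `rest` holds the character, its size, and its style.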
This looks like:
where if you look at the full image, the various parts have boxes drawn around them to show where each box is. One box around the trailing italic "y" and then a bigger box around the image, and another box for the bold capital "F", and so on.
The description of the Figure itself is the same sort of thing; in its entirety, it consists of:
v 2
i gif/5_7.gif 0 0 727 821 back/5_7.gif
c gif/5_7.gif 6 76 40 119 p 8pt i
c gif/5_7.gif 54 91 64 126 1 6pt sub
c gif/5_7.gif 33 382 51 420 t 8pt i
c gif/5_7.gif 63 404 73 439 1 6pt sub
c gif/5_7.gif 678 382 696 420 t 8pt i
c gif/5_7.gif 705 404 724 439 2 6pt sub
c gif/5_7.gif 1 710 35 753 p 8pt i
c gif/5_7.gif 46 725 65 760 2 6pt sub
This is a version two file (v 2). Next comes the backing image, which takes up the entire size of the Figure. We computed above that the Figure should be 822 pixels tall, and the backing image goes from row 0 to row 821, which is 822 pixels. The backing image is then overlaid with the various labels (p sub 1, t sub 1, t sub 2, p sub 2).
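Since the two levels have to agree, a small consistency check seems worthwhile. The sketch below compares the slot reserved on page 132 with the backing image of the Figure and confirms every label box falls inside the backing image. The helper names (`parse`, `size`, `contains`) are invented here, and the field order is assumed from the listings above.

```python
def parse(line):
    """Split one record into (kind, (x0, y0, x1, y1), remaining fields)."""
    fields = line.split()
    x0, y0, x1, y1 = (int(v) for v in fields[2:6])
    return fields[0], (x0, y0, x1, y1), fields[6:]


def size(box):
    """Width and height of a box with inclusive ends."""
    x0, y0, x1, y1 = box
    return (x1 - x0 + 1, y1 - y0 + 1)


def contains(outer, inner):
    """True if the inner box lies entirely within the outer box."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


page_slot = parse("i gif/p132.gif 1339 532 2066 1353 figures/gif/5_7.gif")[1]

figure_lines = [
    "i gif/5_7.gif 0 0 727 821 back/5_7.gif",
    "c gif/5_7.gif 6 76 40 119 p 8pt i",
    "c gif/5_7.gif 54 91 64 126 1 6pt sub",
    "c gif/5_7.gif 33 382 51 420 t 8pt i",
    "c gif/5_7.gif 63 404 73 439 1 6pt sub",
    "c gif/5_7.gif 678 382 696 420 t 8pt i",
    "c gif/5_7.gif 705 404 724 439 2 6pt sub",
    "c gif/5_7.gif 1 710 35 753 p 8pt i",
    "c gif/5_7.gif 46 725 65 760 2 6pt sub",
]
records = [parse(line) for line in figure_lines]
backing = next(box for kind, box, _ in records if kind == "i")

# Labels whose box pokes outside the backing image (there should be none).
stray = [rest[0] for kind, box, rest in records
         if kind == "c" and not contains(backing, box)]

print(size(page_slot), size(backing))
print(stray)
```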
The backing image is just the part without the labels:
So now the problem comes down to cleaning up the line drawings. These are scanned (at 600 dpi) just like the text, and just like the text, are not perfect. If you look at it at a high enough level of magnification, you see the bits are:
It seems like it should be easy enough to write some code to clean up the various lines and circles and such and make them "better", less "fuzzy".
The problem is how to recognize the lines (and circles), so that we know what should be a straight edge, and where it starts and stops, and so on. One option would be to just use a bit map editor and clean them up by hand -- that's what we did with the fonts -- but that seems like a lot of work. It seems a program would be better -- it could run over all the figures and do them all the same.
Let's start with the lines -- how do we recognize a line from a set of pixels? Horizontal or Vertical lines for a start, then straight lines at any angle, then maybe curved lines and arcs. You would think that someone has already done this.
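As a first stab at the horizontal and vertical case, here is a naive sketch: scan each row for long unbroken runs of black pixels, then do the same for columns by transposing. This is not a finished method, just one obvious starting point. The bitmap is assumed to be a list of rows of 0/1 values, and `min_length` is a made-up threshold; a real scan with ragged edges would need some tolerance for gaps.

```python
def horizontal_runs(bitmap, min_length):
    """Return (row, x_start, x_end) for each run of set pixels >= min_length."""
    runs = []
    for y, row in enumerate(bitmap):
        start = None
        # A trailing 0 acts as a sentinel so a run touching the right edge is closed.
        for x, pixel in enumerate(list(row) + [0]):
            if pixel and start is None:
                start = x
            elif not pixel and start is not None:
                if x - start >= min_length:
                    runs.append((y, start, x - 1))
                start = None
    return runs


def vertical_runs(bitmap, min_length):
    """Return (column, y_start, y_end) by running the row scan on the transpose."""
    transposed = [list(column) for column in zip(*bitmap)]
    return horizontal_runs(transposed, min_length)


# A tiny test pattern: a 2-pixel-thick horizontal bar with a vertical stem below it.
bitmap = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
print(horizontal_runs(bitmap, 4))
print(vertical_runs(bitmap, 4))
```

Adjacent runs with similar ends could then be merged into one thin box, which is the "line" form our representation already has. Lines at other angles and circles need something more than this.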