Thursday, February 21, 2019

Republishing the MIX book

Part of the motivation for bringing The Oak Island Enigma back into print was to develop tools and techniques that I could use on other books. Specifically, I have a couple of books that I wrote in the 1970s, and I would like to bring them back into print as print-on-demand books. With the techniques developed for The Oak Island Enigma, I was ready to start.

The first book I wanted to do was "Computer Organization and Assembly Language Programming", originally published in 1978 by Academic Press. This is a lower-level college computer science textbook aimed at teaching assembly language. Specifically, it uses MIX, the hypothetical computer Donald Knuth defined in The Art of Computer Programming, to teach the concepts of assembly language, going so far as to develop an assembler for MIX assembly language, written in MIX assembly language.

Some years back, I scanned the book and converted it to HTML, but HTML is not sufficient for page layout. Now I had the tools to convert it to PDF using the same format as when it was first published. I needed a scan of the book to use with the tools I had developed. It turned out I had a PDF of the book, provided to me by American Elsevier. But examining that PDF showed it to be just 400 pages of images, one image per page. So they had basically just photocopied the book to make the PDF. Still, I could extract the images from that PDF to get the scanned pages.

After a month or so of working with the scanned pages, converting them to text along with boxes indicating where each character is on the page, it became clear that this process would not work for the MIX book. With the Oak Island book, there were only a few typos to correct; with the MIX book there were lots of them. The MIX book was used by a lot of students, and I have pages of corrections that needed to be made. Plus, the scanned images were of the second printing, which already had a number of changes, while the HTML I have is from the first printing (all my copies of the book are first printings), and the two do not match very well.

So I concluded that I needed to do a (minor) rewrite and republishing of the book; I couldn't just reproduce the existing one. Well, I could, but it would be wrong.

So I considered converting the HTML version (which has all the error fixes in it) to troff. I had previous experience (back in the 1980s) using troff to produce page proofs for an operating system book. But again, after spending the time to write a program to convert HTML to troff, re-learning how to write troff macros, and so on, it looked like that would not work either. When we did the OS book using troff, we were on the leading edge of producing page proofs, and it worked for all the text. But the figures were handled by leaving a big hole in the page and pasting the figures onto the page proofs after they were created, before sending them to the printer. That's not how figures are done anymore. We have image files for the figures, and we want them included directly in the pages.

It would seem that troff's evolution effectively stopped in the mid-80s, because interest shifted to TeX and LaTeX. TeX and LaTeX provided most of the same functionality as troff and had more cachet. And LaTeX provided a way to include images directly in your output. We could have taken the source for groff (the GNU implementation of troff) and added image support, but for the purpose of re-creating the MIX book, it seemed more cost-effective to simply convert it to LaTeX.

So we went back and changed our HTML-to-troff program into an HTML-to-LaTeX program, and then started making changes by hand. There was a lot of learning, and significant frustration. There was much trial and error, and searching through online LaTeX forums. The programs were fast enough that it was reasonable to make a change (for example, moving a figure from one spot to another) and then re-run latex to generate a new PDF for review. But there were weeks of formatting and checking how things were laid out on pages. It is important to settle the page size and font size early. The book design called for an 11 point font, which we specified, but it turns out that if LaTeX doesn't have an 11 point font for your style, it ignores the setting and uses 10 point. Switching later to an actual 11 point font changes the entire page layout.
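The post does not show the HTML-to-LaTeX program itself. As a rough illustration of the general idea only, here is a minimal Python sketch built on the standard library's `html.parser`. The tag mapping (`h1` to `\chapter`, `pre` to `verbatim`, and so on) and the names `html_to_latex` and `HtmlToLatex` are assumptions for illustration, not the converter used for the book; a real conversion would also need to handle tables, figures, and lists.

```python
from html.parser import HTMLParser

# Characters that are special in LaTeX and must be escaped in ordinary text.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

def escape_latex(text: str) -> str:
    # Map each character independently, so escapes are never re-escaped.
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)

# Hypothetical mapping from HTML tags to (opening, closing) LaTeX text.
TAG_MAP = {
    "h1": (r"\chapter{", "}\n"),
    "h2": (r"\section{", "}\n"),
    "p": ("", "\n\n"),
    "b": (r"\textbf{", "}"),
    "i": (r"\emph{", "}"),
    "tt": (r"\texttt{", "}"),
    "pre": ("\\begin{verbatim}\n", "\\end{verbatim}\n"),
}

class HtmlToLatex(HTMLParser):
    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; etc. into text
        self.out = []
        self.in_pre = False

    def handle_starttag(self, tag, attrs):
        if tag in TAG_MAP:
            self.out.append(TAG_MAP[tag][0])
        if tag == "pre":
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self.in_pre = False
        if tag in TAG_MAP:
            self.out.append(TAG_MAP[tag][1])

    def handle_data(self, data):
        # Inside verbatim, text is copied unchanged; elsewhere it is escaped.
        self.out.append(data if self.in_pre else escape_latex(data))

def html_to_latex(html: str) -> str:
    parser = HtmlToLatex()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

print(html_to_latex("<h1>Intro</h1><p>100% of <b>rA</b></p>"))
```

Even a small converter like this only gets the text into LaTeX; as the post describes, most of the effort goes into the hand adjustments to page layout afterward.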

But eventually we were able to get a LaTeX version of the book that produced reasonable PDF output. This could easily be modified to include all the error corrections and additional text.

With the PDF of the book, plus a separate PDF for the cover, we could upload both to Amazon's print-on-demand site and get the book back into print.

It is now available as Computer Organization and Assembly Language Programming.