Monday, February 23, 2015

Uploading the Book Contents

The biggest problem with the production of any book (once it has been written) is formatting it.  The most important part of a book is its content, but that content has to be presented so that the focus is on the content, not the presentation.

In this case, we have all the content -- it was written by Thomas Leary in 1953.  All we need to do is format it.

There are many ways to format things.  For the print-on-demand version of the book, I am working on page images that will look exactly like the original book (or as close as I can come).  So these should be suitable for a 6-inch by 9-inch page with a 10-point font, matching the original book.

But for an eBook, like the Kindle edition, we want to match the properties of the Kindle device, which has a smaller screen: 600 by 800 pixels at 167 pixels per inch (about 3.6 by 4.8 inches).  The Kindle allows the user to choose both the font and the font size, so that the words can be bigger (or smaller), as the reader chooses.  When the reader chooses a different font or size, all the paging changes.  The book could be 30 pages or 50 pages, depending on the size of the font chosen by the reader.
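The screen-size arithmetic is easy to check.  A minimal sketch, where the function name is my own choice for illustration:

```python
def screen_inches(width_px, height_px, ppi):
    """Convert a pixel resolution to a physical size in inches."""
    return width_px / ppi, height_px / ppi


# Figures quoted above: 600 by 800 pixels at 167 pixels per inch.
width_in, height_in = screen_inches(600, 800, 167)
print(f"{width_in:.1f} by {height_in:.1f} inches")  # prints "3.6 by 4.8 inches"
```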

HTML is almost perfect for that -- it defines what needs to be done (a header, centering, italic, start a paragraph), but does not dictate exactly how that is presented.  But it is meant for computer screens, not for the printed page, so it has no concept of "start a new page".  The Kindle presents things as "pages", and you can go from one page to another on its fixed-size screen.  So the Kindle does have a concept of "start a new page", which is not part of HTML.  So it would seem that HTML is not sufficient to define the formatting for a Kindle.

The Kindle documentation seems to suggest that a text formatting program, like Microsoft Word, is the way to go.  Word (and similar programs) allow the text to be formatted for the printed page, and have the ability to "Insert page break".

I run a Linux computer and don't use Word, but have an open-source "equivalent" in Libre Office.  Libre Office will read in Word documents (in .doc format) and write them back out.  So it seems that this is the way to go: create a Word document (.doc format) and upload that for Kindle.

After going through all that (importing the text into Libre Office and formatting it, inserting the figures and photos, adjusting the point sizes for the titles, adjusting margins for quotes, and so on), I find that the Kindle documentation then says to "convert this to HTML and upload the HTML"!  My guess is that they have some Microsoft-specific extension to HTML to define page breaks.  Looking for that, we find a Kindle Direct Publishing web page on "Custom HTML Tag" that says to use <mbp:pagebreak /> for a page break and <mbp:section> for a book section, as well as a list, on a different page, of "Supported HTML Tags in Book Content".
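To illustrate how the page-break tag might be used, here is a small sketch that inserts <mbp:pagebreak /> before every chapter heading except the first.  It assumes (my assumption, not something the Kindle documentation says) that each chapter starts with an <h1> heading; the function name and the sample chapter titles are made up for the example.

```python
import re

PAGE_BREAK = "<mbp:pagebreak />"


def add_page_breaks(html, tag="h1"):
    """Insert a Kindle page break before every <tag> heading except the first."""
    pattern = re.compile(rf"<{tag}\b", re.IGNORECASE)
    seen = 0

    def repl(match):
        nonlocal seen
        seen += 1
        if seen == 1:
            # The first heading already starts on the first page.
            return match.group(0)
        return PAGE_BREAK + "\n" + match.group(0)

    return pattern.sub(repl, html)


# Hypothetical two-chapter sample.
sample = (
    "<h1>Chapter One</h1>\n<p>Text.</p>\n"
    "<h1>Chapter Two</h1>\n<p>More text.</p>\n"
)
print(add_page_breaks(sample))
```

A regular expression is good enough for simple hand-written HTML like this; anything more tangled would want a real HTML parser.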

The documentation for Kindle says to have Word generate the HTML.  I can ask Libre Office to generate HTML, but if I then upload that to Kindle, and use the Kindle Previewer to see the result, I end up with a 3321-page book, with all but the first two pages being blank (I think).  The HTML produced by Libre Office is, at the least, not a good representation of the original file.

This gives us two possibilities.  (1) I can fire up a Windows machine, and move my oi.doc file to it, to use Microsoft Word to produce the HTML to feed to Kindle, or (2) I can modify my HTML by hand to match the list of tags supported by Kindle, according to this web page, and upload that.

Let's try the first possibility first.  I can write the files out to a USB flash drive and take it to a Windows computer.  The Windows computer, running an older version of Word, has significant problems.  For example, it is not able to find some of the figures, and when I re-insert them, it goes into an infinite loop.  But still, the HTML file it creates from what it has is much closer to what we want than what we got from Libre Office.  It looks like, using the Word-generated HTML as a guide, I can hand-modify my current HTML to produce the desired result.

This works out pretty well.  HTML written by hand is very easy to produce and gives a quite reasonable result.
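When hand-editing, it helps to know which tags in a file are not on the supported list.  The sketch below collects every tag used in an HTML string and reports the ones missing from an allowed set.  The SUPPORTED set here is a small made-up subset for illustration only; the real list is the one on the Kindle "Supported HTML Tags in Book Content" page.

```python
from html.parser import HTMLParser

# Hypothetical subset for illustration; NOT the actual Kindle list.
SUPPORTED = {
    "html", "head", "title", "body", "h1", "h2", "p", "i", "b",
    "center", "img", "br", "blockquote", "mbp:pagebreak",
}


class TagCollector(HTMLParser):
    """Record the name of every start tag seen in the input."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.tags.add(tag)


def unsupported_tags(html, supported=SUPPORTED):
    """Return a sorted list of tags used in html but absent from supported."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.tags - set(supported))


sample = '<p>ok</p><font size="2">x</font><span>y</span>'
print(unsupported_tags(sample))  # prints ['font', 'span']
```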

I use "zip" to create a package of both the HTML file (oik.html) and all of the image files that it references (in our case, all GIFs), and upload that zip file to the Kindle site.  Amazon then runs a spell check, which in our case shows up only proper names and the "mis-spellings" that result from the early English spellings of words that are quoted from original writings of Francis Bacon.  Amazon then allows the book to be previewed.
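I did the packaging with the command-line "zip", but the same step could be scripted.  This is only a sketch: it assumes the images sit next to the HTML file and are referenced by relative src attributes, and the function names are my own.

```python
import re
import zipfile
from pathlib import Path


def referenced_images(html):
    """Return the src value of every <img> tag, in document order."""
    return re.findall(
        r'<img\b[^>]*\bsrc\s*=\s*["\']([^"\']+)["\']', html, re.IGNORECASE
    )


def package_book(html_path, zip_path):
    """Zip the HTML file together with every image it references."""
    html_path = Path(html_path)
    html = html_path.read_text(encoding="utf-8")
    names = [html_path.name]
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(html_path, arcname=html_path.name)
        # dict.fromkeys drops repeated references but keeps their order.
        for src in dict.fromkeys(referenced_images(html)):
            # Raises FileNotFoundError if a referenced image is missing.
            archive.write(html_path.parent / src, arcname=src)
            names.append(src)
    return names
```

Calling package_book("oik.html", "oik.zip") would then produce the file to upload (the zip file name is just an example).  Failing loudly on a missing image seems better than uploading a book with holes in it.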

The presentation of the preview is very well done, with one apparent bug.  The page layout of the figures seems to vary depending on which direction (forward or backward) you are paging.  For example, the caption of the first figure (on the first page) shows up, as desired, at the bottom of the first figure when you start, but if you page forward one -- to the title page -- and then back one, you get a page with just the caption (at the top of the page).  Paging back one more gives the first figure (with the caption underneath it).

The preview allows us to consider the book on multiple devices.  The main issue is the figures, which seem to change size, depending on the resolution and screen size of the device.


