The Chicago Manual of Style (14th Edition, Section 19.40) says
"Makeup is a highly skilled procedure. If the text is merely divided mechanically into portions of equal length, without regard to where the divisions fall, some of the pages that result are bound to be unacceptable logically or aesthetically: they will incorporate bad breaks." This is especially true for a programming text, which normally contains lots of programs, figures, and tables.
This document is an attempt to describe how I perform page layout for the books that I write, using some real examples. These examples are from UNIX Network Programming Volume 1, Second Edition (1998).
I do all my page layout by hand, chapter by chapter. I start by running troff on a chapter and printing the pages 2-up, side by side, so that I can see what the facing pages look like. I use the shell pipeline make -s Chap.ioctl | ps.odd | 2up | lp, where ps.odd is a short awk program that inserts a blank page at the beginning of standard input.
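The ps.odd script itself is not shown here. As a rough sketch only, a filter of that kind might look like the following. It assumes the PostScript carries DSC "%%Page:" comments and that a bare showpage yields an empty page; the function name ps_odd and the label on the inserted page are made up for illustration.

```shell
# Hypothetical stand-in for ps.odd (not the author's script): emit a blank
# page ahead of the first real page, so that page 1 lands on the right-hand
# side of a 2-up sheet.
# Assumes DSC-style PostScript with "%%Page:" comments.
ps_odd() {
    awk '
        /^%%Page:/ && !done {
            print "%%Page: blank 0"   # assumed label for the inserted page
            print "showpage"          # an empty page
            done = 1
        }
        { print }
    '
}
```

A real filter would probably also need to adjust the page count in the document header, which this sketch leaves alone.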
Here is an example (40 Kbytes PostScript) showing the second, third, fourth, and fifth pages from Chapter 16. These pages are how things "fell out" without doing anything.
Here are the final pages from this example. I have left the hash marks in at the bottom, but naturally I delete these before the final PostScript is generated. Here are the changes that I made in doing the layout. Note that the page numbers have changed also.
As you can see, much of this is trial and error, and you don't just sit down and do an entire chapter in one fell swoop. The normal scenario goes like this. I print the first copy of the chapter, make the changes that I think are needed on the first few pages, then look at the pages using something like
make -s "PAGES=-o425-435" Chap.ioctl | ps.odd | 2up | pageview -right -geometry 1100x800+0+0 -
I do all my writing on a SPARCstation and use Sun's pageview program to preview PostScript on my 17-inch color monitor. Troff's -o option lets me specify the pages for which I want it to generate PostScript, and in this example I look at only the 11 pages starting with page 425.
After making some changes, I rerun troff, look at the result, and make more changes. As I move through a chapter, finishing pages, I increment the starting and ending page numbers to be viewed with the -o option. I find that to lay out N pages, it takes about 2N troff runs. It takes me about 3.5 minutes per final page, so for a 1,000-page book this totals to just over 58 hours.
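The time estimate is easy to check. A small sketch (the function name layout_hours is hypothetical):

```shell
# Layout time estimate: minutes per final page times page count, in hours.
layout_hours() {   # usage: layout_hours MINUTES_PER_PAGE PAGES
    awk -v m="$1" -v p="$2" 'BEGIN { printf "%.1f\n", m * p / 60 }'
}

layout_hours 3.5 1000   # 3500 minutes; prints 58.3
```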
Is my technique ideal? Not really, but it gives me complete control over how my pages look, and I think the appearance of a book is very important, especially a programming book. I keep thinking of writing a program to do all this for me, but the amount of work appears enormous. I think the approach is to have troff just generate a long stream of PostScript, assuming an infinitely long page, and then do the page layout on the resulting PostScript, just moving it around on the pages. But then I would really have to learn PostScript (ugh) and you somehow have to generate the running headers and page numbers.
The last line of a right-hand page should not end with a hyphen. This has been a style rule for many years, yet it is amazing that most word processors do not enforce it! I just smile when I pick up a book produced with something like Frame and immediately find these errors. Needless to say, troff does this correctly, and has for 20+ years. A friend commented to me that normal evolution would have gone Word to Frame to troff, but instead, the computer industry has gone the other way!
Occasionally, however, hand tweaks are needed. Page 875 originally ended at the word "out-of-band", with "out-" on the bottom of this page and "of-band" at the top of the next page. (Troff allows this, since the word contains an explicit hyphen.) To avoid this I first added backslash-p to the word before: that\p out-of-band data. This tells troff to end the line when it hits the backslash-p, and the resulting line is spread out to fill the line length.
While this is OK, I then noticed the word "by" that terminated the line above it, and the line above was quite tight (the interword space appeared quite minimal). So to make these two lines even better, and avoid the hyphen at the end of the page, I wrote provided\p by to move "by" down one line, producing two lines that both looked better.
Another example like this is the bottom of page 927. I preprocess all my files through some sed scripts before troff sees them. One tweak I make is to change all figure references, such as "Figure 5.5", into Fig\%ure\~5.5. The backslash-% is an explicit hyphenation indicator and the backslash-tilde is gtroff's unbreakable space that then stretches like a normal interword space when the line is adjusted. What I am telling troff is not to break a line with "Figure" at the end and "5.5" at the beginning of the next line, because I would rather have "Fig-" at the end of the line and "ure 5.5" at the beginning of the next line. (I learned this trick from Chapter 14 of Knuth's The TeXbook, a wealth of typesetting information, even if you do not use TeX.)
The problem with this example is that page 927 then ended with "Fig-", since the word had an explicit hyphenation indicator. In the preceding line, the term "3 bytes" was split with "3" at the end of the line and "bytes" at the beginning of the next line. To fix these two lines I typed first\p 3 bytes and then in\p Figure 5.5.
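The sed preprocessing of figure references described above could be approximated as follows. This is only a sketch, since the real sed scripts are not shown; the function name fix_figure_refs is hypothetical, and it assumes "Figure" and its number sit on the same input line, separated by a single space.

```shell
# Sketch of the figure-reference rewrite (not the author's actual script).
# "Figure 5.5" becomes "Fig\%ure\~5.5": \% marks the allowed hyphenation
# point and \~ is gtroff's unbreakable, stretchable space.
fix_figure_refs() {
    sed 's/Figure \([0-9][0-9]*\.[0-9][0-9]*\)/Fig\\%ure\\~\1/g'
}

printf '%s\n' 'as shown in Figure 5.5 and Figure 26.11.' | fix_figure_refs
```

Only references of the form "Figure N.M" are touched, so a plural such as "Figures 9.2 and 9.3" passes through unchanged.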
Another potential hyphenation problem is when the last word of a paragraph is hyphenated, and the part on the line by itself is a small suffix. Look at the sentence preceding Figure 26.11 on page 717. Originally the last word "signal" was hyphenated, causing "nal" to be the entire second line. To prevent this I wrote \%signal, which tells troff not to hyphenate the word, forcing the end of the line before "signal" is output. While I was able to do this on page 717, notice that I could not do this on the fourth line from the bottom of page 716: "happen-ing". If I had told troff not to hyphenate this word, the interword spacing on the line would have appeared excessive. This is one of those decisions that you just have to make.
The Chicago Manual of Style (14th Edition, Section 6.58) says that no more than three succeeding lines should be allowed to end with hyphens. Vanilla troff does not handle this, but gtroff does. I specify .hlm 2 so that no more than two lines in a row can end in a hyphen.
Ideally all figures should appear sequentially and then their text references should also be in order. That's how I write my books. Look at the top of page 667: the reference is to Figure 25.8, but Figure 25.7 appears next, even though it is not referenced until later on page 667. Indeed, the tv_sub function on page 667 is not referenced until line 21 of the proc_v4 function on page 668. Figure 25.8 would have fit on page 667, but I moved it to keep it and Figure 25.9 together. (Both of these figures would not have fit on page 667.) I thought it was more important to keep these two figures together (showing the two pointers into the headers along with the three lengths on the same page as the code), than to keep the figure references in order.
Another slight violation of the rules is on page 729: Figures 27.2 and 27.3 appear on this page, but they are not even referenced until the next page (730). I did this to keep these two figures along with Figure 27.1 together, on facing pages. Since all three figures are referenced throughout this chapter, keeping them together on facing pages just makes sense.
By default, the troff macros that I use center each picture on the page, based on the picture's width (calculated by pic). But there are occasions when you have multiple pictures on a page, and you want them lined up. Page 242, Figures 9.2 and 9.3 are one example: I had to add an invisible box to the right of Figure 9.2 so that it aligned "correctly" with Figure 9.3. Page 809, Figures 30.7 and 30.8 are another example.
An overall goal is to keep all program listings on a single page. First, this promotes the coding style of trying to keep each function to one page or less, and second it is much easier to read a piece of code that you can see as one piece. When I have functions that exceed one page, I break them into pieces, showing each piece as a separate figure, hopefully on a single page.
When I first wrote Chapter 28, Figure 28.13 was in two pieces: the first figure contained lines 1-34, and the second lines 35-55. But when I was laying out this chapter the first figure ended right at the bottom of the page, so I combined the two figures into one. By getting rid of one figure, I also saved the room occupied by the two sentences leading into the second figure, the vertical space to that figure, the rule above the figure, the rule below the figure, and the figure caption (with its space above and below). This ended up making the book two pages smaller: notice that the chapter ends on page 782 right at the bottom of a left-hand page. Had I retained the second figure by itself, the chapter would have spilled over to page 783, requiring a blank page 784, causing Chapter 29 to start on page 785.
Sometimes it is just not possible to keep a program listing on a single page, without adding lots of blank space to a page. In that case I just break the listing across a page boundary, being careful where the break occurs (with regard to C). If the first part is on a left-hand page, and the second part on a right-hand page, that is great, but it does not always work out this way. Page 324 is an example: I purposely put the page break between the final case and the default.
Normally I do not change the vertical spacing between text lines, but page 910 is an example of when to do this. The page was about one-half of a vertical space too short (the page bottom was not even with page 911), but there is no place to add the extra space (no headings or the like). There are 45 lines in the option summary, and 0.5v/45 yields 0.011v, so I added the line .vs +0.011v to increase the line spacing for the option summary.
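The adjustment is the shortfall divided by the number of lines it is spread over. A minimal sketch of that arithmetic (the function name extra_vs is hypothetical):

```shell
# Spread a page shortfall evenly over a number of lines and print the
# troff request that widens the line spacing by that amount.
extra_vs() {   # usage: extra_vs SHORTFALL_IN_V LINES
    awk -v s="$1" -v n="$2" 'BEGIN { printf ".vs +%.3fv\n", s / n }'
}

extra_vs 0.5 45   # prints .vs +0.011v
```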
When I laid out the Bibliography, the first troff run had the Bibliography end on page 969. This means the Index will start on page 971, giving me page 970 to "play with". What I did was lay out each page of the Bibliography about 2 lines "short"; that is, there is an extra space of about 2 lines at the bottom of each page of the Bibliography. My reason for doing this is that the most common change that I make to the Bibliography when the book is reprinted (when I get to correct typos and the like) is to add something, often an URL that I have found for a paper in the Bibliography. By giving myself some room on each page, it makes this easier for me in the future. I do the same thing with the Index, because the most common change that I make to the index with a reprint is to add, not to delete.
To align the page bottoms of the Bibliography I first go through it and set all the page breaks, ignoring the evenness. Then I measure how much space remains at the bottom of each page, divide it by the number of entries on that page minus one, and just insert that much space between each entry. For example, page 964 needed 3.6 lines of vertical space, and 3.6v/16 is 0.2250v, so there are 16 .sp 0.2250v commands on that page.
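The same division can be scripted. This sketch (the function name pad_requests is hypothetical) prints one .sp request per gap between entries:

```shell
# Print the .sp requests needed to pad one Bibliography page.
# REMAINING is the leftover space in lines (v); GAPS is the number of
# places to insert space (one fewer than the entries on the page).
pad_requests() {   # usage: pad_requests REMAINING GAPS
    awk -v r="$1" -v g="$2" 'BEGIN {
        for (i = 0; i < g; i++) printf ".sp %.4fv\n", r / g
    }'
}

pad_requests 3.6 16   # sixteen lines, each ".sp 0.2250v"
```

In practice each request still has to be placed by hand between two entries; the script only saves the arithmetic.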
The paper Page Makeup by Postprocessing Text Formatter Output by Brian W. Kernighan and Christopher J. Van Wyk, Computing Systems, Vol. 2, No. 2, pp. 103-113, 1989, is required reading for anyone really interested in page layout. This paper describes a program that does the page layout, but alas, the program was only available with the latest versions of Bell Labs' troff, which was not widely available. You can see the results of this program by looking at some of the books published by Bell Labs authors since 1989.
The paper Breaking Paragraphs into Lines by Donald E. Knuth and Michael F. Plass, Software--Practice and Experience, Vol. 11, pp. 1119-1184, 1981, describes the TeX way of breaking paragraphs into lines, using a dynamic programming algorithm. I wish troff did this operation as well as TeX. This paper has been reprinted in Knuth's Digital Typography book.
As mentioned earlier, The TeXbook by Donald E. Knuth (Addison-Wesley, 1984), is a wealth of typesetting information.