|Part 1. Introduction: EBNF Rules for XML|
|Chapter 1. Statement of the Problem|
|1.2.||Overview of XML---Examples|
|1. XML for this book|
|2. XML version for EBNF production rules for XML|
|3. XML for vrule reference page in TeX Reference Manual|
|1.3.||XML Applications---Our Goals|
|Screen and paper representations of the three examples|
|1.4.||A Strategy for Writing Code|
|Includes thoughts on writer's block|
|1.5.||A Program's Shell and Makefile|
|Chapter 2. A Quick and Dirty Parser: Version 1|
|2.2.||Version 1's Goals and Tasks|
|2.3.||Data Structures---The Heading|
|2.4.||Initialization and File Input|
|2.6.||The Output Phase---Writing Reports|
|Chapter 3. A Quick and Dirty Parser: Version 2|
|3.1.||Program Tasks and Goals|
|3.3.||The Heading's New Data Elements and Error Messages|
|3.4.||Initialization and Output|
|Chapter 4. A Quick and Dirty Parser: Version 3|
|4.1.||Program Tasks and Goals|
|4.2.||Task 1: Initialization and File Input|
|4.3.||Task 2: Parse|
|4.4.||Task 3: Error Handling|
|4.5.||Task 4: Format and Write Data|
|4.6.||Task 5: Reports---Meet the Goals|
|Chapter 5. A Quick and Dirty Parser: Versions 4 and 5|
|5.2.||Version 4 reports|
|5.3.||The Character Translation File|
|5.4.||Version 5 reports|
|5.5.||Summary: Problems with the Q&D Parser|
|Part 2. A Front-end for XML and Dual Publishing|
|Chapter 6. Program Design|
|6.3.||Unicode and ISO-8859-1|
|6.4.||Structures for tokens and nodes|
|Chapter 7. Parsing and Errors---Version One|
|Chapter 8. String and Memory Management|
|Chapter 9. File I/O Routines|
|Chapter 10. Output Buffers and Character Translation|
|Chapter 11. Parsing and Errors---Version Two|
|Chapter 12. Validation|
|Part 3. Applications to Dual Publishing|
|Chapter 13. A Program to Document Character Translation Files---axpct|
|Chapter 14. A Program to Document Tags---axpdoc|
|Chapter 15. The Program which Built this Book---axp|
Do we need another book dealing with XML?
My answer to the above question is simple, ``No! Not unless it is significantly different from existing books.''
In the fall of 2002, before I began work on the book described here, I checked my favorite online technical bookstore and discovered that it carried 195 books dealing with XML. I downloaded the store's list, 25 titles at a time, and wrote a small program that converted it into XML. I then had the title, author or authors, publisher, price, and other facts about each of the store's books in a format I could work with. After carefully studying the list and looking at additional information about specific books, I realized that the book I had in mind was significantly different from other XML books.
The two parts of my title, Advanced XML Programming: Applications to Dual Publishing (AXP), are immediate clues that my book is different. Of the 195 titles I located last fall, only 17 contained the word application or applications; only 13 contained programming; only 2 contained publishing; and none contained advanced.
AXP is for programmers who want to learn how to parse XML data or to study the code of a large project (AXP includes over 8,000 lines of code), and for students taking a course in compiler design, software engineering, or software applications.
My goals in AXP are:
I am particularly well suited to write AXP for the following reasons:
AXP's competition consists of three distinctly different classes of books. Here are examples from each class:
1. Beginning and intermediate books dealing with XML.
The XML Schema Complete Reference by Cliff Binstock et al. (Addison-Wesley, 2003).
These books are examples of the `first' and `second' books a person might buy on XML (in fact, they are the XML books I own). There are dozens of such books. The above two lack a coherent style, which is not surprising, since both were written by committee: each lists six authors.
2. Introductory books emphasizing writing code.
Software Tools by Brian W. Kernighan and P.J. Plauger (Addison-Wesley, 1976).
This is an excellent book. Its goals may be simply stated: software grows, and exposure to good code makes good code writers. I read this book in the early 1980s and recently looked at it for the first time in at least 15 years. Although it is somewhat dated, I find that it is still relevant.
3. Books that document a significant project.
TeX: The Program (Computers & Typesetting, Volume B) by Donald E. Knuth (Addison-Wesley, 1986).
This is a specialized book with two goals: to list the code that comprises the program TeX and to explain how that code works. However, in my opinion, its approach has drawbacks. Knuth keeps a program's code and his explanations of how the code works in a single file, a WEB program, and runs software that converts the WEB program into Pascal and into a book. Most programmers don't use such a system. The code generated from a WEB program is sterile: it contains no comments. It is also chopped into pieces that I find difficult to read. In addition, Pascal was popular in the late 1970s and early 1980s, but it is used much less frequently today. In my opinion, only dedicated TeX gurus will spend much time with this book.
A Retargetable C Compiler: Design and Implementation by Christopher Fraser and David Hanson (The Benjamin/Cummings Publishing Company, Inc., 1995).
Interestingly, this book's preface pays homage to Software Tools and TeX: The Program. It also makes the point that few books document a large program. It is similar to TeX: The Program in that the authors combine the book's code and prose in a single file, which they call a literate program. The result, again, is code without comments, chopped into pieces. Also, the book's copyright date is misleading: it was actually written in the 1980s, and its code uses old-style (pre-ANSI) function definitions. In short, the book is dated.
TCP/IP Illustrated, Volume 2 by Gary R. Wright and W. Richard Stevens (Addison-Wesley, 1995).
This is an excellent book. Like all of Stevens' books, it is a pleasure to work with. It did not originate as a WEB or literate program; its code is plain C. The authors combine code, diagrams, and other pedagogical aids with their documentation.
AXP has three things in common with most existing books dealing with XML: the letters X, M, and L in the title. If you ignore that superficiality, AXP is completely different from other XML books.
AXP shares Software Tools's underlying philosophy, but the two books differ in several important ways. The problems that AXP examines are larger and more difficult than the ones in Software Tools. AXP's solutions are also more complete; its code is industrial-strength (e.g., AXP's error routines don't cut corners). Finally, AXP looks like a modern book.
The first two books in the third class document important topics, but AXP's approach differs from theirs. AXP's code is ANSI-standard C. It is thoroughly commented and is not chopped into pieces. Any programmer who knows C will feel at home with AXP. Also, writing a typesetting program is a highly specialized undertaking; very few programmers will ever write one. Even writing a compiler is a specialized task; not many programmers write one that compiles real programs. But an undergraduate could take AXP's front end and write a dual-publishing program that does something useful.
XML's popularity is growing. It is a tool that code writers will use for the next ten years and more. There is a need for a second generation of books about XML that do more than slap together a trivial B2B example and wrap 500 pages around it. These new books will assume their readers know what tags, attributes, and other facets of XML are, and will use XML to solve real problems. AXP is such a book.
I've written AXP for two audiences:
In short, AXP's material is accessible and relevant to a large audience.
Actually, there is a third group of people who will find AXP relevant: Linux programmers. I work on Linux, and, with one exception, I developed AXP using only tools that come free on every Linux system. The exception is fonts: I use Caslon 224 and LucidaSans as my serif and typewriter fonts. I find they work well together, and I prefer them to Times Roman and Courier. There is talk in the trade press that IBM will replace its version of Unix, AIX, with Linux. There is talk in the Linux community that someday there will be only Linux and whatever Microsoft is currently doing. My point is this: the platform I've used to develop AXP's code will still be around in ten years.
Here are three suggestions for ways to market AXP:
|LightHouse & Associates|
|3207 S. Flack Road|
|Beloit, WI 53511|
I began writing software professionally in 1984 as the sole programmer for Master Software, Inc. (MSI), a pre-IPO startup funded by a venture capitalist who lived in New York City. I wrote MSI's Accounting and Materials Requirements Planning packages. When MSI ran out of cash in 1989, I started LightHouse & Associates. In the early 1990s I wrote a program that received, processed, stored, and displayed the real-time stock market data transmitted via satellite by DBC. In 1998 I completed two projects that would have been XML projects had I known about XML; at the time, I had not yet heard of it.
I graduated Phi Beta Kappa from Lehigh University, and I hold a doctorate from Yale University. From 1974 to 1984 I taught mathematics and computer science at the college and university level. During that period I also took a year off and built a house, the Lighthouse, which was my home for twenty years.
|1.*1||TeX Reference Manual. 2002. 396 pages. Kluwer Academic Publishers.|
|2.||Review of Unix Network Programming, Volume 1, Second Edition: Networking APIs: Sockets and XTI by W. Richard Stevens. It appeared in The Linux Journal, Issue 52, August 1998, 86-87.|
|3.||Review of Advanced Programming in the Unix Environment by W. Richard Stevens. It appeared in The Linux Journal, Issue 42, October 1997, 40-41.|
|4.*2||The Journal of Military History Cumulative Index: Vols. 1-58, 1937-1994. 1995. 600 pages.|
|5.*3||Master Journal Editor Reference Manual. 1989. 300 pages. Xeroxed and distributed by Master Software, Inc.|
|6.*3||Mail Master Reference Manual. 1989. 300 pages. Printed and distributed by Master Software, Inc.|
|7.||``Embeddings and Immersions of Manifolds in Euclidean Space,'' Trans. Amer. Math. Soc. 213 (1975), 263-303.|
|*1||I prepared camera-ready copy for this book.|
|*2||I am co-editor. Among other things I prepared camera-ready copy for this.|
|*3||I wrote the software and the manual. Also, I prepared camera-ready copy.|