Advanced XML Programming:

Applications to Dual Publishing

David Bausum


Table of Contents

Part 1. Introduction: EBNF Rules for XML
 Chapter 1. Statement of the Problem
   1.1. Introduction
   1.2. Overview of XML---Examples
      1. XML for this book
      2. XML version for EBNF production rules for XML
      3. XML for vrule reference page in TeX Reference Manual
   1.3. XML Applications---Our Goals
      Screen and paper representations of the 3 examples
   1.4. A Strategy for Writing Code
      Includes thoughts on writer's block
   1.5. A Program's Shell and Makefile
 Chapter 2. A Quick and Dirty Parser: Version 1
   2.1. Introduction
   2.2. Version 1's Goals and Tasks
   2.3. Data Structures---The Heading
   2.4. Initialization and File Input
   2.6. The Output Phase---Writing Reports
 Chapter 3. A Quick and Dirty Parser: Version 2
   3.1. Program Tasks and Goals
   3.3. The Heading's New Data Elements and Error Messages
   3.4. Initialization and Output
 Chapter 4. A Quick and Dirty Parser: Version 3
   4.1. Program Tasks and Goals
   4.2. Task 1: Initialization and File Input
   4.3. Task 2: Parse
   4.4. Task 3: Error Handling
   4.5. Task 4: Format and Write Data
   4.6. Task 5: Reports---Meet the Goals
 Chapter 5. A Quick and Dirty Parser: Versions 4 and 5
   5.1. Program Goals
   5.2. Version 4 Reports
   5.3. The Character Translation File
   5.4. Version 5 Reports
   5.5. Summary: Problems with the Q&D Parser
Part 2. A Front-end for XML and Dual Publishing
 Chapter 6. Program Design
   6.2. Program Modules
   6.3. Unicode and ISO-8859-1
   6.4. Structures for Tokens and Nodes
 Chapter 7. Parsing and Errors---Version One
 Chapter 8. String and Memory Management
 Chapter 9. File I/O Routines
 Chapter 10. Output Buffers and Character Translation
 Chapter 11. Parsing and Errors---Version Two
 Chapter 12. Validation
Part 3. Applications to Dual Publishing
 Chapter 13. A Program to Document Character Translation Files---axpct
 Chapter 14. A Program to Document Tags---axpdoc
 Chapter 15. The Program which Built this Book---axp

A Proposal Prepared for Waterside Productions, Inc.



Do we need another book dealing with XML?

My answer to the above question is simple, ``No! Not unless it is significantly different from existing books.''

In the fall of 2002, before I began work on the book described here, I checked my favorite online technical bookstore and discovered that it carried 195 books dealing with XML. I downloaded its list, 25 titles at a time, and wrote a small program that turned the list into XML. Now I had the title, author or authors, publisher, price, and other facts about each of the store's books in a format I could work with. After carefully studying the list and looking at additional information about specific books, I realized that the book in my head was significantly different from other XML books.

The two parts of my title, Advanced XML Programming: Applications to Dual Publishing (AXP), are immediate clues that my book is different. Of the 195 titles I located last fall, only 17 contained the words applications or application; only 13 contained programming; only 2 contained publishing; none contained the word advanced.

AXP is for programmers who want to learn how to parse XML data or to see the code in a large project (AXP includes over 8000 lines of code). It is also for students who are taking a course in compiler design, software engineering, or software applications.

My goals in AXP are:

I am particularly well suited to write AXP for the following reasons:

Outstanding features of AXP:


AXP's competition consists of three distinctly different classes of books. Here are examples from each class:

1. Beginning and intermediate books dealing with XML.

2. Introductory books emphasizing writing code.

3. Books that document a significant project.


AXP has three things in common with most existing books dealing with XML: the letters X, M, and L in the title. If you ignore that superficiality, AXP is completely different from other XML books.

AXP shares the underlying philosophy of Software Tools (ST). But there are several important differences between the two books. The problems that AXP examines are larger and more difficult than the ones in ST. Also, AXP's solutions are more complete; its code is industrial-strength (e.g., AXP's error routines don't cut corners). Finally, AXP looks like a modern book.

The first two books in the third class document important topics. But, AXP's approach is different from theirs. AXP's code is ANSI-standard C. It is filled with comments and not chopped to pieces. Any programmer who knows C will feel at home with AXP. Also, writing a typesetting program is a highly specialized undertaking. Very few programmers will ever write one. Even writing a compiler is a specialized task. Not many programmers write one that compiles real programs. But, an undergraduate could take AXP's front end and write a dual-publishing program that does something useful.

XML's popularity is growing. It is a tool that code writers will use for the next ten years and more. There is a need for a second generation of books about XML that do more than slap together a trivial B2B example and wrap 500 pages around it. These new books will assume their readers know what tags, attributes, and other facets of XML are, and they will use XML to solve real problems. AXP is such a book.


I've written AXP for two audiences:

  1. Professionals who write software, particularly with C or C++, and who want to learn how to parse XML data and what to do with it after it is parsed.
  2. Students, particularly undergraduate computer science majors. AXP includes exercises. The front end it develops in Part 2 uses the same tools (string and memory management, hashing, error reporting, file I/O, and parsing) that a compiler's front end uses. So, AXP could be used as a supplemental text in a course on compiler design. Its practice complements the dragon book's theory (Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman. Addison-Wesley, 1986). Or, AXP would make an excellent text for a course in software engineering or applications programming. Or, it could be used in a special projects course, perhaps by a student working individually on a senior thesis.

In short, AXP's material is accessible and relevant to a large audience.

Actually, there is a third group of people who will find AXP relevant: Linux programmers. I work on Linux, and with one exception, I developed AXP using only tools that come free on every Linux system. The exception is fonts. I use Caslon 224 and LucidaSans as my serif and typewriter fonts. I find they work well together, and I prefer them to Times Roman and Courier. There is talk in the trade press that IBM will replace its version of Unix, AIX, with Linux. There is talk in the Linux community that someday there will only be Linux and whatever Microsoft is currently doing. My point is this: the platform I've used to develop AXP's code will still be around in ten years.


Here are four suggestions for ways to market AXP:

  1. With small ads in appropriate journals. Many serious programmers subscribe to at least one of the following journals: C/C++ Users Journal, Dr. Dobb's Journal, or Linux Journal.
  2. With reviews. C/C++ Users Journal and Dr. Dobb's Journal should each be interested in reviewing AXP. So should Linux Journal (LJ) if the connection between AXP and Linux is pointed out to them. I approached LJ twice in the 90s and offered to review a serious programming book with a small connection to Linux, and each time they accepted my offer.
  3. At professional meetings. For example, Linux has an annual meeting.
  4. With normal promotions (fliers and complimentary copies) to computer science faculty.

Status of AXP:

AXP has two components: code and prose.
  1. The code, currently 8000+ lines, is essentially complete, and it works. I may make minor tweaks to things, but I don't see the need for anything new that is major.
  2. I've completed the first 6 chapters (Part 1).

Vitae for David Bausum


Personal Information:

Address: David Bausum
LightHouse & Associates
3207 S. Flack Road
Beloit, WI 53511
Web site:

Biographical Statement:

I began writing software professionally in 1984. I was the sole programmer for Master Software, Inc., a pre-IPO startup funded by a venture capitalist who lived in New York City. I wrote their Accounting and Materials Requirements Planning packages. When MSI ran out of cash in 1989, I started LightHouse & Associates. In the early 90s I wrote a program that received, processed, stored, and displayed the real-time stock market data transmitted via satellite by DBC. In 1998 I completed two projects that would have been XML projects, only I had not yet heard of XML.

I graduated Phi Beta Kappa from Lehigh University, and I hold a doctorate from Yale University. From 1974 to 1984 I taught mathematics and computer science on the college and university levels. Also, during that period, I took a year off and built a house, the Lighthouse. It was my home for twenty years.


1. *1 TeX Reference Manual. 2002. 396 pages. Kluwer Academic Publishers.
2. Review of Unix Network Programming, Volume 1, Second Edition: Networking APIs: Sockets and XTI by W. Richard Stevens. It appeared in The Linux Journal, Issue 52, August 1998, 86-87.
3. Review of Advanced Programming in the Unix Environment by W. Richard Stevens. It appeared in The Linux Journal, Issue 42, October 1997, 40-41.
4. *2 The Journal of Military History Cumulative Index: Vols. 1-58, 1937-1994. 1995. 600 pages.
5. *3 Master Journal Editor Reference Manual. 1989. 300 pages. Xeroxed and distributed by Master Software, Inc.
6. *3 Mail Master Reference Manual. 1989. 300 pages. Printed and distributed by Master Software, Inc.
7. ``Embeddings and Immersions of Manifolds in Euclidean Space,'' Trans. Amer. Math. Soc. 213 (1975), 263-303.

   *1 I prepared camera-ready copy for this book.
   *2 I am co-editor. Among other things, I prepared camera-ready copy for this.
   *3 I wrote the software and the manual. Also, I prepared camera-ready copy.