1 An Overview of rtftohtml
rtftohtml reads RTF format documents and translates them to HTML. In
processing text, the filter chooses HTML markup based on three characteristics:
- The destination
of the text. Example destinations are header, footer, footnote, picture.
- The paragraph style
Paragraph styles are user-definable entities, but some are pre-defined by the
word processing package. For Microsoft Word (on the Macintosh) examples are
"Normal" and "heading 1".
- The text attributes
Examples of text styles are bold, courier, 12 point.
The filter has built-in rules for dealing with destinations. For paragraph and
text styles, the rules for translation are contained in a file called html-trans.
By modifying this file, you can train rtftohtml to perform the correct
translations for your documents. The most common change that you will need to
make is to add your own paragraph styles to html-trans.
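As a rough illustration of how the three characteristics could combine, here is a minimal Python sketch. It is not rtftohtml's actual code and does not reproduce the html-trans file syntax, which is not shown in this section. The names TextRun, PARAGRAPH_RULES, TEXT_RULES, DISCARDED and translate are hypothetical, as are the "body" destination and the mapping of "heading 1" to <h1>. The text-attribute tags and the discarded destinations follow the expected output described in this section.

```python
import warnings
from dataclasses import dataclass

# Hypothetical stand-ins for the rules kept in html-trans.
# "Normal" -> no markup and "heading 1" -> <h1> are assumptions for illustration.
PARAGRAPH_RULES = {"Normal": None, "heading 1": "h1"}
TEXT_RULES = {"bold": "b", "italic": "i", "underline": "u", "courier": "tt"}

# Destinations handled by built-in rules; headers and footers are discarded.
DISCARDED = {"header", "footer"}


@dataclass
class TextRun:
    text: str
    destination: str = "body"        # hypothetical name for the main text
    paragraph_style: str = "Normal"
    attributes: tuple = ()


def translate(run):
    """Choose markup from destination, paragraph style and text attributes."""
    if run.destination in DISCARDED:
        return ""
    html = run.text
    # Wrap the text once for every attribute that has a rule.
    for attr in run.attributes:
        tag = TEXT_RULES.get(attr)
        if tag:
            html = f"<{tag}>{html}</{tag}>"
    # An unknown paragraph style gives a warning and no special markup.
    if run.paragraph_style not in PARAGRAPH_RULES:
        warnings.warn(f"paragraph style {run.paragraph_style!r} not found")
        return html
    tag = PARAGRAPH_RULES[run.paragraph_style]
    return f"<{tag}>{html}</{tag}>" if tag else html


print(translate(TextRun("Overview", paragraph_style="heading 1",
                        attributes=("bold",))))
```

In this sketch, adding your own paragraph style amounts to adding one more entry to PARAGRAPH_RULES, which mirrors the kind of change you would make in html-trans.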
rtftohtml should produce reasonable HTML output
for most documents. Here is what you can expect:
- Your output should appear in a file called "xx.html" where "xx" or
"xx.rtf" was your input file name.
- Bold, italic and underlined text should appear with <b>, <i>
and <u> markup.
- Courier font text should appear with <tt> markup.
- Tables will be formatted using <pre> markup (only plain text is
supported in tables.)
- Footnotes will appear in a separate document with hypertext links to them.
- Table of contents, indexes, headers and footers are discarded.
- Table of Contents
entries and paragraphs with the style "heading 1..6" will generate a hypertext
Table of Contents in a separate file. Each table of contents entry will link to
the correct location in the main document.
- All paragraph styles used in your document must appear in the file
"html-trans" (see also
This allows you to create a mapping from any paragraph style to any HTML
markup. There are many pre-defined styles in html-trans, including "heading
1..6". (If a paragraph style is not found, a warning will be generated and the
text will be written to the HTML file with no special markup.)
- Each graphic
in your file will be written out to a separate file. The filename will be
"xxn.ext" where "xx" or "xx.rtf" was your input, "n" is a unique number and
"ext" will be either "pict" for Macintosh PICT format graphics or "wmf" for
Windows Meta-Files format graphics. The HTML file will create links to these
files, using either "<A HREF=" or "<IMG SRC=" links. Since most WWW
browsers do not understand "wmf" or "pict" format files, the link will be to
xxn.gif. This presumes that you will run some other filter to
translate your graphic files to gif.
- Text that is connected with copy/paste-link constructs will generate