AscToHTM Documentation for the AscToHTM conversion utility
This documentation can be downloaded as part of the documentation set in .zip format (200k).


7 Using the preprocessor

The preprocessor was introduced in version V1.05 to allow users more flexibility in the HTML they generate. As such it moves AscToHTM towards being an HTML authoring tool, as opposed to a simple text conversion or migration tool. Although this wasn't AscToHTM's original intention, it is increasingly the use to which AscToHTM's "power users" are putting it, and so this is a rapidly growing area of functionality within the product.

The preprocessor looks for lines that begin with a special character sequence. Presently this is "$_$_", but this will become configurable in later versions.

Preprocessor lines are not normally output to the HTML generated. Instead they are used to modify AscToHTM's behaviour in a number of ways.


7.1 Marking up sections of text

The pre-processor can be used to mark sections in your document so that AscToHTM will process them as you wish.

Note:
AscToHTM does attempt to spot much user-formatted text automatically, but this is a difficult area and prone to error. Hence the use of these directives can reduce the error rate on such occasions.

7.1.1 User SECTIONS

This directive is used to divide the document up into named section types. Section type names can be repeated through the document, and by default text is assumed to belong to a section called "all", indicating that this text is always copied to the output file.

Section type names must contain no white space, but may contain underscores.

This has no effect unless the user supplies a policy file indicating that they wish to select only certain section types for output.

For example, if the text document looks like this

                Some text that'll always get copied, because it is in an
                "all" section type by default.
        $_$_SECTION Private
                Some text that will be copied either when the preprocessor
                is switched off, or when the user's policy file indicates
                that "private" section types are to be included.
        $_$_SECTION Other
                Likewise, this is an "other" section type.
        $_$_SECTION Private
                And here's some more "private" text.
        $_$_SECTION all
                Some text that will always get copied because it is explicitly
                in an "all" section type.

If the user then supplies a document policy file which includes the lines (see 6.3.5)

[Preprocessor]
Use Preprocessor : Yes

then the two section types marked "private" won't be copied into the converted file unless the line

Include document section : Private

is added to the policy file. The same applies to the "other" section.

Note_1:
Strictly speaking the "use preprocessor" line above isn't needed as this is set to "yes" by default. This means that any $_$_SECTION lines will cause text to be omitted unless you supply an appropriate policy file.
Note_2:
Be aware that any sections omitted are also omitted from the analysis pass. This may have unexpected results as AscToHTM responds only to the input text that is to be included in the output.

7.1.2 TABLE and DELIMITED_TABLE sections

The BEGIN_TABLE ... END_TABLE directives are used to bracket a table in the source text. AscToHTM will then attempt to analyse this table as best it can.

This is explained more in the AscToTab documentation.

Inside this section you can use other TABLE pre-processor commands to tailor the HTML generated (see 7.4).

Similarly, the BEGIN_DELIMITED_TABLE ... END_DELIMITED_TABLE directives can be used to delimit a series of tab-delimited data values that should be interpreted as a table (e.g. data originally exported from a spreadsheet such as Excel).

The presence of these directives overrides any value set in the "Attempt table generation" policy.
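
As an illustration, a marked-up table might look like the sketch below. It assumes that each directive occupies its own line starting with "$_$_", as in the SECTION example in 7.1.1. The data values are invented, and the gaps between values in the second block stand for single tab characters.

```text
$_$_BEGIN_TABLE
Item        Quantity    Price
Widgets           12     3.50
Gadgets            4    12.00
$_$_END_TABLE

$_$_BEGIN_DELIMITED_TABLE
Item	Quantity	Price
Widgets	12	3.50
Gadgets	4	12.00
$_$_END_DELIMITED_TABLE
```

The first form leaves AscToHTM to work out the column boundaries from the layout; the second relies on the tab characters to separate the values.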


7.1.3 CONTENT sections

The BEGIN_CONTENTS ... END_CONTENTS directives are used to bracket a contents list in the source document. AscToHTM will attempt to automatically detect the presence and location of any contents list in the document, but the algorithm can be problematic.

Use this markup only when the document contains a contents list that AscToHTM fails to detect correctly.

See the discussion in 5.6.2.


7.1.4 HTML sections

The BEGIN_HTML ... END_HTML directives are used to bracket actual HTML in the source document.
The bracketed HTML will be transcribed to the output file unconverted. This device will allow you to embed images, tables and other HTML constructs not normally generated by AscToHTM.

In the HTML version of this manual, this is how the image alongside this section was added.

If you simply wish to insert a single line of HTML, the HTML_LINE command (see 7.3.2) offers a more compact form.

For in-line HTML, use the HTML in-line tag (see 8.2.7).
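
A minimal sketch of an HTML section that embeds a right-aligned image is shown below. It assumes the directives sit on their own lines with the "$_$_" prefix; the file name "logo.gif" and the ALT text are hypothetical.

```text
$_$_BEGIN_HTML
<P ALIGN="right">
<IMG SRC="logo.gif" ALT="Company logo">
</P>
$_$_END_HTML
```

Everything between the two directives is passed through as written, so it is up to the author to make sure the HTML is valid.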


7.1.5 CODE sections

The BEGIN_CODE ... END_CODE directives are used to bracket a piece of sample code in the source text.

AscToHTM will either render this in <PRE> ... </PRE> markup or <CODE> ... </CODE> markup (see the discussion about the policy "Use <CODE>..</CODE> markup" to see why the former is used as default).


7.1.6 DIAGRAM sections

The BEGIN_DIAGRAM ... END_DIAGRAM directives are used to bracket a piece of ASCII art or text diagram in the source text.

AscToHTM will render this in <PRE> ... </PRE> markup.


7.1.7 PRE (pre-formatted text) sections

The BEGIN_PRE ... END_PRE directives are largely replaced by the TABLE, CODE and DIAGRAM directives. They are maintained for backwards compatibility, and have the same effect as the DIAGRAM commands (see 7.1.6).


7.1.8 IGNORE sections

New in version 3.2

The BEGIN_IGNORE ... END_IGNORE directives delimit a section of text that should be ignored. This could be used to place comments in the source file, or to mark text that shouldn't be converted when the file is being generated by some third party software package.
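
For example, an author's comment might be hidden from the output as sketched below. This assumes the directives sit on their own lines with the "$_$_" prefix; the comment text is invented.

```text
Text before the comment is converted as usual.
$_$_BEGIN_IGNORE
Reviewer's note: check the figures in the next paragraph
before this file is published.
$_$_END_IGNORE
Text after the comment is converted as usual.
```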


7.2 Commands that influence the <HEAD>..</HEAD> of a file

7.2.1 The TITLE command

This directive allows you to specify the <TITLE>...</TITLE> to be inserted into the <HEAD> section of the output page. This title will appear in the browser's frame title whenever the page is viewed, and will be the text shown in your browser's history.

The presence of a TITLE command overrides any title specified in a policy file (see 6.3.1).

To fully understand how titles are calculated, see the discussion in 5.6.1.


7.2.2 The DESCRIPTION command

This directive allows you to specify a description of your document that is added to a META tag inserted into the <HEAD> section of the output page(s) as follows :-

<META NAME="description" CONTENT="your description">

This tag is often used by search engines (e.g. AltaVista) as a brief description of the contents of your page. If omitted the first few lines may be shown instead, which is often less satisfactory.

The presence of a DESCRIPTION pre-processor command overrides any description specified via a "Document description" policy line.


7.2.3 The KEYWORDS command

This directive allows you to specify keywords that are added to a META tag inserted into the <HEAD> section of the output page(s) as follows :-

<META NAME="keywords" CONTENT="your list of keywords">

This tag is often used by search engines when indexing your HTML page. You should add here any relevant keywords possibly not contained in the text itself.

The presence of a KEYWORDS pre-processor command overrides any keywords specified via a "Document keywords" policy line.


7.2.4 The STYLE_SHEET command

This directive allows you to specify the URL of a style sheet file, usually with a .css extension. Style sheet files are a new HTML feature that allow you to specify fonts and colours to be applied to your document.

The resulting HTML is inserted into the <HEAD> section of the output page(s) as follows :-

<LINK REL="STYLESHEET" HREF="URL" TYPE="text/css">

The presence of a STYLE_SHEET pre-processor command overrides any style sheet specified via a "Document style sheet" policy line.
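
The four commands described in 7.2 might be combined at the top of a source file as sketched below. The exact argument syntax is not spelled out in this chapter, so this sketch assumes that each command takes the remainder of its line as its value, by analogy with the SECTION and CHANGE_POLICY examples. The title, description, keywords and style sheet URL are all invented.

```text
$_$_TITLE Widget Maintenance Guide
$_$_DESCRIPTION How to service and repair Mark II widgets
$_$_KEYWORDS widgets, maintenance, repair, servicing
$_$_STYLE_SHEET styles/manual.css
```

Keeping these lines in the source file, rather than in a policy file, means the document carries its own <HEAD> details wherever it is converted.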


7.3 One line pre-processor commands

7.3.1 The INCLUDE command

This directive allows you to specify the name of a source file to be included at this point. This is useful if you wish some standard text inserted into many related documents, or into the same documents at many locations.

The included file will be treated as though it were part of the original file during both the analysis and output passes.

The include will fail if the file cannot be found, and a test for recursive include files will be made.
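
A sketch of an include line is shown below. It assumes the file name follows the command on the same line; the file name "standard_footer.txt" is hypothetical.

```text
Main body of the document.

$_$_INCLUDE standard_footer.txt
```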


7.3.2 The HTML_LINE command

This directive allows you to embed a single line of HTML in your source file. The rest of the line is copied across faithfully to the output file.

Essentially this offers the same functionality as the HTML section commands (see 7.1.4), but in a more compact form.
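
For instance, a single image tag could be embedded as sketched below. The "$_$_" prefix is assumed as for the other directives, and the file name "logo.gif" is hypothetical.

```text
$_$_HTML_LINE <IMG SRC="logo.gif" ALT="Company logo">
```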


7.3.3 The CONTENTS_LIST and NAVIGATION_BAR commands

New in version 3.2

These commands allow you to add navigational aids to your document.

The CONTENTS_LIST command inserts a contents list at the present location. When this is present the normal generation of a contents list at the top of the document is suppressed.

The CONTENTS_LIST directive may also be supplied as an in-line tag (see 8.2.3). The same user arguments apply.

The NAVIGATION_BAR command inserts a navigation bar that takes you to the next, previous and contents files. This will only be generated when you have selected to split your file by setting the "Split level" policy.


7.3.4 The LINERULE command

New in version 3.2

The LINERULE directive allows you to insert a horizontal line into your text. It has the syntax:-

LINERULE <length>,<thickness>

where

<length>       length of line in pixels/pts
<thickness>    thickness of line in pixels/pts
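
As a sketch, and assuming the usual "$_$_" prefix, the following line would request a rule 400 units long and 2 units thick (the numbers are chosen purely for illustration):

```text
$_$_LINERULE 400,2
```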


7.3.5 The TOC (table of contents) command

New in version 3.2

The TOC directive marks a point in the file that will receive an anchor point, and then be linked to from any generated contents lists.

This can be useful to index non-headings like key diagrams and tables.

The syntax is:

TOC <level>, <link name>, <display text>

where

<level>          The level in the TOC, starting with 1 being the most
                 significant, equivalent to "chapter".

<link name>      The (usually short) name by which this linkpoint may
                 be known. This is the value used to create an ANCHOR
                 point, and which may be referenced in any
                 HYPERLINK tag.

<display text>   The text to be shown in the TOC. This will also be
                 used to generate an ANCHOR name, and may be used in
                 a TOC type HYPERLINK tag, although this is marginally
                 less portable than referencing the link name.

                 If omitted, defaults to the link name, and only one
                 ANCHOR is created.

See also the section on HYPERLINK tags (8.2.9).
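
As a sketch of the syntax above, the lines below index a diagram as a second-level contents entry. The "$_$_" prefix is assumed as for the other directives; the link name "fig_layout" and the display text are hypothetical.

```text
$_$_TOC 2, fig_layout, Network layout diagram
$_$_BEGIN_DIAGRAM
  +--------+      +--------+
  | Server |------| Client |
  +--------+      +--------+
$_$_END_DIAGRAM
```

Leaving out the display text, as in "$_$_TOC 2, fig_layout", would show the link name itself in the contents list.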


7.4 The TABLE commands

These directives are used to tailor the HTML generated in any tables AscToHTM creates. They are placed either

  1. At the top of the file

     Directives placed here become defaults for the whole file, and will replace any policies that have been set (see 6.3.7).

  2. Inside a BEGIN_TABLE ... END_TABLE section

     Directives placed here will apply only to the table marked up by these commands (see 7.1.2).

The table commands are described (naturally enough) in the following table.

Directive                     Value   Effect

TABLE_BGCOLOR                 Colour  Colour of background

TABLE_BORDER                  Number  Size of border. 0 = None

TABLE_BORDERCOLOR             Colour  Colour of border

TABLE_CAPTION                 Text    Table caption. Added centred at
                                      the top

TABLE_CELL_ALIGN              Align   Specifies the default alignment of
                                      cells. Left, right or center

TABLE_CELLSPACING             Number  Spacing between cells

TABLE_CELLPADDING             Number  Padding inside each cell

TABLE_COLOUR_ROWS or          (none)  If present this specifies that the
TABLE_COLOR_ROWS                      odd and even rows of the table should
                                      be coloured differently. See also the
                                      "Colour data rows" policy.

TABLE_CONVERT_XREFS           (none)  If present, indicates that any section
                                      cross-references in the table may
                                      be converted to hyperlinks
                                      (see also the policy line
                                      "Convert TABLE X-refs to links")

TABLE_EVEN_COLOUR or          Colour  When data rows are to be coloured
TABLE_EVEN_COLOR                      this specifies the colour of the
                                      even numbered rows.

TABLE_HEADER_ROWS             Number  Number of header rows. These
                                      will be placed in <TH> .. </TH> markup

TABLE_HEADER_COLS             Number  Number of header columns.
                                      These will be marked up in bold

TABLE_MAY_BE_SPARSE           (none)  If present, indicates that the TABLE
                                      may be sparse (see also the policy
                                      "Expect sparse tables")

TABLE_MIN_COLUMN_SEPARATION   Number  Number of spaces to be taken as a
                                      column separator when analysing the
                                      table (see also the policy
                                      "Minimum TABLE column separation").

TABLE_ODD_COLOUR or           Colour  When data rows are to be coloured
TABLE_ODD_COLOR                       this specifies the colour of the
                                      odd numbered rows.

TABLE_WIDTH                   Text    The width of the table (see also the
                                      policy "Default TABLE width")

Colours must be values acceptable to HTML, which will be placed in the various attributes of the <BODY> tag and other tags.

You can enter any value acceptable to HTML. Normally a value is expressed as a "#" and a 6-digit hexadecimal value in the range #000000 (black) to #FFFFFF (white), but certain colours such as "white", "blue", "red" etc may also be recognised by HTML. AscToHTM simply transcribes your value into the output file.

A value of "none" signals the defaults are to be used. By default AscToHTM changes the background colour to be white (the true HTML default is a light gray whose value is "#C0C0C0").
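
The sketch below shows several table commands placed inside a BEGIN_TABLE ... END_TABLE section, so that they apply to that table only. It assumes each directive sits on its own "$_$_" line with its value separated from the directive name by a space; the caption, colours and data are invented.

```text
$_$_BEGIN_TABLE
$_$_TABLE_BORDER 2
$_$_TABLE_HEADER_ROWS 1
$_$_TABLE_CAPTION Stock levels
$_$_TABLE_COLOUR_ROWS
$_$_TABLE_ODD_COLOUR #FFFFFF
$_$_TABLE_EVEN_COLOUR #E0E0E0
Item        Quantity    Price
Widgets           12     3.50
Gadgets            4    12.00
$_$_END_TABLE
```

Moving the same TABLE_ lines to the top of the file would instead make them the defaults for every table in the document.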


7.5 The CHANGE_POLICY command

NOTE:
This feature has the potential to cause mayhem, and as such is offered to users on an "as is" basis. That is, we offer no support for getting this feature to have the effect a user may desire.

This directive allows you to change a particular policy in part of a document. This is a potentially powerful feature, allowing you to tailor the conversion of your file in different sections of that file, or to embed the policy particular to a file in commands inserted at the top of the file itself.

The syntax of the command line is

$_$_CHANGE_POLICY <Policy line>

where <Policy line> is a policy line as it would appear in a policy file, and (usually) as it appears in the Policy manual.

For example the following would all be valid directives

        $_$_CHANGE_POLICY Background Colour : red
        $_$_CHANGE_POLICY Ignore multiple blank lines : Yes

How and when they take effect, however, will depend on the policy.

For example, the background colour would only take effect if splitting the file up, and only on the next file generation. This works, BTW, so if anyone wants to split a file into many pages, all different colours, then be my guest.

There are many caveats to this behaviour :-

Not all policies may be changed in this way. In particular, policies that open other policy files are not supported. Even if a policy is "changed", it does not follow that changing the policy will have an effect.

It is unlikely that this feature can be sensibly used to influence the analysis of a file, other than when placed at the top of the file. Used in such a manner it is simply an alternative to using a separate policy file.

Output policies are referenced at different times. Only those that are referenced after the line is read from the source file may be influenced, so changing things like the output file name may have no effect.

Not all policies, once changed, can be changed back. This is particularly true of policies that contain values to be added to a list. This is an issue that may be addressed in later versions.

Messing with policies can cause unpredictable behaviour. For example if you alter the section splitting parameters, then the chances of a section cross-reference elsewhere in the document being calculated as a correct hyperlink diminishes.

That's why this feature is offered UNSUPPORTED.

To further complicate matters, AscToHTM uses a read-ahead, write-behind buffer, which means that you may need to experiment with the placing of your policy change to within 40 lines (the size of the buffer).

This problem has been alleviated since version 3.2.
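
Given the caveats above, the least surprising use is probably to group policy changes at the very top of the source file, where they act as an embedded policy file. The sketch below reuses the two policy lines from the earlier example; the heading and text that follow are invented.

```text
$_$_CHANGE_POLICY Ignore multiple blank lines : Yes
$_$_CHANGE_POLICY Background Colour : red

1 Introduction

The document text starts here.
```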





Converted from a single text file by AscToHTM
© 1997-99 John A. Fotheringham