Miniproc documentation

Version 1.06, 22-DEC-1997

    Introduction
    Building instructions
    Command line usage
    Command file usage
    Comment lines
    Variables
    Reserved and special variables
    Scope of variables and macros
    Command pass through
    Functions
        f$out
        f$in
        f$read, f$write
        f$exit, f$break
        f$date
        f$file_info
        f$type
        f$evaluate, f$<-
        Math, String, and Logical Operations
    Macros
        f$macro_record
        f$macro_break, f$macro_return
        f$macro_repeat
    If structures
    Loop structures
    <<>> embedded substitution tags
    Copyright
    Reporting bugs, getting more information
    Example miniproc script (testfile.mpc)

Introduction

Miniproc is a tiny preprocessor for use in formatting HTML or other documents, and for performing similar tasks. It may be used to embed preprocessing information for any language into the source code, so that platform specific versions of the final code result after the presource code is processed. (Like the C preprocessor, but it will work with any language.)

Miniproc scripts are case sensitive in all locations (variable names, function names, macro names). In order to keep Miniproc very small the script syntax is extremely rigid. Although this can make Miniproc scripts a bit ugly, it also eliminates many common coding problems (for instance, incorrectly nested if/then/else constructs, use of = for ==, or operator precedence problems). That isn't to say it is impossible to miscode a Miniproc script, just that miscoded scripts will usually exit with an error message, which is better than going on to do the wrong thing.

It is called "miniproc" because it is a mini processor, and because the term "miniproc" appeared not to be in general use at the time of writing (fewer than 10 hits in AltaVista on 22-NOV-1997).

Building instructions

Miniproc is completely contained in miniproc.c. It is an ANSI C program, and compiles cleanly on OpenVMS and Irix with the pickiest ANSI C compiler settings. Just compile it in ANSI C mode, link it (if that's a separate step on your OS), and run it.

Command line usage

(This varies a bit with operating system; add quotes, slashes, etc. as required to pass the double quotes seen here.)

    miniproc input.mpc int1=123 s1="a string" s2=&123

That is, the first parameter is the name of the first input file to open. If that isn't provided, the program prompts for one. Subsequent parameters are pairs of "variablename"="variablecontents". They are equivalent to input file commands like:

    #__intvar=123
    #__stringvar="this is a string"
    #__alsostring=&123

Variable names are case sensitive, and you may need to use OS specific quoting to get the desired results. Variables may be defined on the command line, but neither macros nor functions may be invoked.

Command file usage

Miniproc scripts or command files consist of two types of lines:

1. Pass through. A line is read from the current input file, substitutions are performed, and the line is written to the current default output file. Pass through lines may not be continued. The substitution tag is <<name>>, where the contents of the variable named are inserted into that position in the line. These lines may not begin with "#__", the command line indicator.

2. Commands. These begin with "#__" and may be continued by placing a final "-" as the last character on the line, with the final line lacking this character. There is only one command per command line. The maximum length of a command line is 2048 characters.

There are exactly 6 types of command line:

    #__! this is a comment                Comment lines
    #__var1=var2                          Variable assignment
    #__macroname parameter parameter      Macro invocations
    #__f$whatever parameter parameter     Function invocations
    #__if/elseif/else/endif test          If structures
    #__"#__command"                       Command pass through

Commands are read in, continued lines are assembled into a single command, substitutions are performed, and then the final command line is passed to the command line interpreter. Trailing spaces and tabs on command lines are ignored, as are spaces and tabs between "#__" and the command. Multiple spaces and tabs are reduced to a single space between each token in a command line. For assignments, white space around "=" is allowed, but has no effect.

Comment lines

    #__! Comment, rest of string is ignored

Variables

There are an unlimited number of variables. Variables are created when they first appear to the left of "=". (There are also some predefined variables, see Reserved and special variables, below.) Variables can hold either strings or integers. For most variables, once the type is set it cannot be changed. For certain special predefined variables, it can be changed.

String variables can be used as pointers to other string or integer variables. Multiple levels of indirection can be obtained by prepending as many "*" as needed. Variable names may NOT start with a digit; a digit indicates an immediate integer value.

There are two ways to indicate "what follows is a string literal": enclose it in double quotes, or prefix it with "&". If the string literal has no spaces or tabs they are equivalent in all usages. However, if the string literal contains trailing spaces or tabs, then you must use the "" form to prevent them from being trimmed off. Furthermore, except in variable assignment statements, the literal area for & and " extends only to the next space, tab, or end of line. Consequently, and this is IMPORTANT: "this is a string" will only be treated as a single string in a variable assignment statement. Anyplace else it will be broken up into the separate tokens: ["this] [is] [a] [string"].

Examples:

    #__name="string"      put the string literal into name (without the
                          outermost set of quotes). If name doesn't exist,
                          create it. The type is determined by the value
                          it will hold.

Double quotes mark off a region from the first quote on the line to the last. So that:

    ="foobar" "foobar" "boo"

will store the string:

    foobar" "foobar" "boo

This is different from most other languages! Even though Miniproc is written in C, \t, \n and so forth have no special meaning in strings.

    #__name=&string       put the string literal into name
    #__name=""            reset name value to an empty string
    #__name=&             (same as above)
    #__name=name2         copy contents of name2 into name
    #__name=&12           put the string "12" into name
    #__count=12           put the integer 12 into count
    #__name2=&count       name2 contains "count"
    #__pointme1=&name2    pointme1 contains "name2"
    #__pointme2=&count    pointme2 contains "count"
    #__name3=*pointme1    name3 takes on the value of name2 = "count"
    #__count2=*pointme2   count2 takes on the value of count
    #__count2=**pointme1  count2 takes on the value of count
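The literal and indirection rules can be modelled in a few lines of Python. This is only a sketch of the behaviour described in this section, not Miniproc's actual implementation: the function name resolve and the use of a dict as the variable table are assumptions made for illustration.

```python
def resolve(token, variables):
    """Return the value that a right-hand-side token stands for.

    `variables` is a plain dict standing in for Miniproc's variable
    table (an assumption of this sketch, not the real data structure).
    """
    if token.startswith("&"):                  # &literal -> string literal
        return token[1:]
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]                     # strip only the outermost quotes
    if token[0].isdigit():                     # leading digit -> immediate integer
        return int(token)
    depth = len(token) - len(token.lstrip("*"))
    value = variables[token[depth:]]           # plain name -> its contents
    for _ in range(depth):                     # each * follows one more pointer
        value = variables[value]
    return value


# Replay the assignments from the examples in this section.
variables = {}
for name, rhs in [
    ("count", "12"),
    ("name2", "&count"),
    ("pointme1", "&name2"),
    ("pointme2", "&count"),
    ("name3", "*pointme1"),
    ("count2", "**pointme1"),
]:
    variables[name] = resolve(rhs, variables)

print(variables["name3"])    # count
print(variables["count2"])   # 12
```

The sketch ignores type locking and scope; it only shows how "&", double quotes, leading digits, and one or more "*" select a value.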

Reserved and special variables

Some variables are reserved and have special meanings and uses. These are:

STATUS
    Integer returned by macros, functions, and command files. 0 = failed, anything else = ok. Functions usually return 1 for ok. To set STATUS on macro exit use f$macro_return or f$macro_break. To set STATUS on input file exit use f$exit or f$break.

RESULT
    The special variable used by the f$<- function to return the results of a calculation. It may be either an integer or a string.

trace
    Integer. Sets the trace level, for debugging. This is a bit mask; any bit set causes that operation to be logged to stdout. (But the integer must be specified in decimal syntax!)

        1    Log command lines
        2    Log noncommand lines
        4    Log variable creation
        8    Log variable setting
        16   Log macro invocation
        32   Log function invocation
        64   Log output lines (to stderr)
        128  Log results of substitution passes

    Default is 0, nothing is logged. It is not possible to log command line actions!

subs
    Integer. The number of levels of <<>> substitution to perform. The default is 1, so if a new <<>> is created by a substitution, it will not itself be substituted. If the value is set higher than 1, then after the first full pass through a line a second or third pass will be made. Set it to something very large and it will go until it cannot find any more <<>>. If it is set to 0, no <<>> substitutions are done.

macrosubs
    Integer. Similar to subs, but controls replacements while macros are recording. The default level is 0: no replacements while macros are recording. It is important to note that line continuation resolution occurs during macro execution, so that if a substituted variable is split across two lines it will not be substituted during recording no matter what macrosubs is set to.

safety
    Integer. Set on the command line ONLY, to restrict actions taken by possibly hostile input files. Bit map. Default is 0.

        1    use only the string to the right of /\]>: in file names,
             disabling paths (excluding the file name passed from the
             command line)
        2    disables f$in
        4    disables f$out (all output to stdout)
        8    disables f$file_info

P1-P9
    Special variables (integer or string) used to pass parameters into macros.

MC1, MC2, MC3, MC1MAX, MC2MAX, MC3MAX
    Integers. These hold macro repeat count information. See f$macro_repeat for more information.
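Because trace and safety are bit masks that must be written in decimal, it can help to see how a value decomposes. The following Python sketch decodes a trace value and mimics the file name rule of safety bit 1. The names describe_trace and strip_path are hypothetical, and the code only illustrates the rules stated above; it is not taken from miniproc.c.

```python
# Bit values and meanings copied from the trace table in this section.
TRACE_BITS = {
    1: "command lines",
    2: "noncommand lines",
    4: "variable creation",
    8: "variable setting",
    16: "macro invocation",
    32: "function invocation",
    64: "output lines",
    128: "results of substitution passes",
}


def describe_trace(level):
    """List what a decimal trace value would log."""
    return [what for bit, what in TRACE_BITS.items() if level & bit]


def strip_path(filename):
    """Keep only the text to the right of the last / \\ ] > or : character."""
    cut = max(filename.rfind(ch) for ch in "/\\]>:")
    return filename[cut + 1:]


print(describe_trace(1 + 16))   # ['command lines', 'macro invocation']
print(strip_path("/etc/passwd"))
print(strip_path("DISK$A:[USER.DOCS]file.txt"))
```

So trace=17 logs command lines and macro invocations, and with safety bit 1 set, a script-supplied file name is reduced to its final component.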

Scope of variables and macros

Variables and macros are global (visible in all modules) unless they are explicitly created with local scope by preceding the name, in every location where it is used, with a colon, as in ":var". Macros may also be global or local: give them global names when other input files will use the same macro; otherwise, give them local names. That way ":calculate" can be a different function in each input file. Note that "var1" and ":var1" are different variables even when declared in the same module. Local variables may not be set on the command line.

Scope rules are:

    Global variable or macro "name":
        visible in all input files and macros
        internal name = "name"

    Local variable or macro declared in "input.mpc", but not inside a macro:
        visible only in that input file
        internal name = "^input.mpc^name"

    Local variable or macro declared in "input.mpc", inside a macro "amacro":
        visible only in that macro inside that input file
        internal name = "^^input.mpc^amacro^name"

(If you are familiar with DCL from OpenVMS, the scoping rules for global versus local are exactly the same as for = vs ==.)

It is a very bad idea to refer to local variables indirectly through global variables. That is:

    #__aglobal=&*:var         In one module
    #__whatever=aglobal       later, in another module

If there is not a local variable ":var" in the new module, the reference will cause a fatal error. If there is a local variable ":var", it will be referenced instead of the original, which is probably not the intended use.

It is ok to pass the values of local variables into a macro, as in:

    #__somemacro :localvar

but local variables should not be passed by reference (by name), for the same reason as described above.
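The three internal name forms can be reproduced with a small Python function. This is a sketch built only from the patterns quoted in the scope rules; the function name internal_name and its parameters are hypothetical.

```python
def internal_name(name, input_file, macro=None):
    """Build the internal name for a variable or macro reference.

    name       -- the name as written in the script, e.g. "var" or ":var"
    input_file -- the input file in which the reference appears
    macro      -- the enclosing macro, or None when outside any macro
    """
    if not name.startswith(":"):
        return name                              # global: used as written
    bare = name[1:]
    if macro is None:
        return f"^{input_file}^{bare}"           # local to the input file
    return f"^^{input_file}^{macro}^{bare}"      # local to a macro in that file


print(internal_name("name", "input.mpc"))             # name
print(internal_name(":name", "input.mpc"))            # ^input.mpc^name
print(internal_name(":name", "input.mpc", "amacro"))  # ^^input.mpc^amacro^name
```

It also shows why a stored ":var" is dangerous in a global: the same text maps to a different internal name as soon as it is evaluated in another file or macro.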

Command pass through

Command pass through is a special shorthand for handling lines that would normally be interpreted as commands. Without the shorthand one of these two forms must be used:

    #__var="#__some command line <<name>>"
    #__f$write var

or

    #__var="#__some command line <<name>>"
    <<var>>

But these take two lines and require the creation of a temporary variable. The shorthand forms are:

    #__"#__some command line <<name>>"
    #__&#__some command line <<name>>

Functions

Functions cause certain actions to take place and most change the value of one or more variables. All set the variable STATUS (uppercase) when they return. In the rest of this section, "string" means either an explicit string, like "string" or &string, or a string variable, like name. "Integer" means either an explicit integer, like 123, or an integer variable, like count.

    f$out
    f$in
    f$read, f$write
    f$exit, f$break
    f$date
    f$file_info
    f$type
    f$evaluate, f$<-
    Math, String, and Logical Operations

f$out

    #__f$out filename [filenumber [disposition]]

Opens the file "filename" for output. Filename is a variable name, or "string", or &string. With no other parameters, it redirects the primary output stream (filenumber 0) to the new file. Filenumbers may be in the range 0-9, inclusive.

Disposition is a string variable and may be either "new" or "append". Default is "new", that is, the output file is created when opened. On most operating systems this will destroy any previous versions, but if file versions are allowed it will just create a new version. To use disposition you must include a filenumber.

f$out automatically closes open files if a filenumber is reused. If filename is an empty string, it closes that filenumber.

f$in

    #__f$in filename [filenumber]

Opens the file "filename" for input. Filename is a variable name, or "string", or &string. With no other parameters, it redirects the primary input stream (filenumber 10) to the new file. Filenumbers may be in the range 10-19, inclusive.

The primary input stream may be redirected up to 10 levels deep with f$in commands. When a redirected stream executes f$exit or f$break, that input file is closed and the input stream continues from the previous file.

Filenumbers 11-19 are automatically closed if reused. This does not generate a warning or error. If filename is an empty string, the file is closed without opening another file. Filenumber 10 may only be closed via f$exit or f$break.

f$read,f$write

    #__f$read string filenumber
    #__f$write string filenumber

Read or write a string variable from/to a filenumber. Note that it is VERY DANGEROUS to read from filenumber 0 (the command stream), since any mistakes will corrupt the logic of the script it contains.

An input string may not be larger than any input line, but the output string can be any size that the operating system supports.

Read returns 1 if the read was normal, and 0 on any error or EOF. If the string is truncated on read, it is a fatal error. Write returns 1 for normal operation, 0 for any error.

f$exit,f$break

    #__f$exit integer [bang]
    #__f$break integer [bang]

Close the current input file and return the integer status. If status isn't specified, it defaults to 1 (true). If the input stream has been redirected, return to the last input stream. When all input streams are closed the program exits.

If the second parameter is present, it causes an immediate exit from the entire program, passing the status value to the operating system. Either f$exit or f$break may be used for this function anywhere in a miniproc script.

Use f$exit to exit unconditionally from an input script. f$exit checks for dangling bits from if/elseif/else structures, which indicate bad command file syntax. As a consequence, it may not be used conditionally.

Use f$break to exit from within an if/elseif/else structure. f$break does not check for dangling if/elseif/else structures on exit. f$break may not be used outside of such a structure.

Except when executing an unconditional program exit, neither of these may be used within a macro.

f$date

    #__f$date

Sets the following variables (implicitly):

    day        the day (Sun - Sat)        string
    month      the month (Jan - Dec)      string
    dd         the date (1-31)            integer
    mm         the month (1-12)           integer
    wday       day of the week (1-7)      integer
    yday       day of the year (1-365)    integer
    yyyy       the year (4 digit)         integer
    hour       the hour (0-23)            integer
    minute     the minute (0-59)          integer
    second     the second (0-59)          integer
    unixtime   the time in Unix format

f$file_info

    #__f$file_info filename

Sets the following variables for the file named in the immediate string variable filename:

    file_exists     1=true, 0=false
    file_size       In bytes. The size may not be exact on some operating
                    systems and for some types of files.
    file_modified   Time the file was last modified, in Unix time

f$type

    #__f$type name

Returns the type of the variable named in the immediate string value.

    STATUS   Meaning
    0        not defined
    1        integer
    2        string
    3        macro

Math, String, and Logical Operations

f$evaluate

f$<-

    #__f$evaluate result op operand operand operand ...
    #__f$<- op operand operand operand ...

Evaluate an expression, which will produce a single result using a single operator "op" and up to N operands. The types of the operands and result must match, and the result variable must already exist. In general, operands can be either variables or immediate values, except that strings containing delimiters may only be used from within a variable. Operations available are for integer math, boolean logic, and string manipulation.

f$evaluate and f$<- are equivalent, except that the result for f$<- is always stored in RESULT. The f$<- form is primarily for use in if/else/elseif structures, when the result should be tested and then not used further. "result" is buffered internally, so that any variable may be both the result and an operand, and the operation will always work as expected.

Integer operands, integer result:

    add        result = op1 + op2 [...+ opN]
    subtract   result = op1 - op2 [...- opN]
    multiply   result = op1 * op2 [...* opN]
    divide     result = op1 / op2 [.../ opN]
    power      result = op1 ^ op2 (op1 raised to op2 power)
    modulo     result = op1 modulo op2

Integer operands, integer/logical result (1=true, 0=false):

    eq, neq, ge, le, lt, gt
               result = if op1 (operator) op2 [...AND op1 (operator) opN]

Logical operands, logical result:

    and        result = op1 AND op2 [... AND opN]
    or         result = op1 OR op2 [... OR opN]
    xor        result = op1 XOR op2 [... AND (op1 XOR opN)]
    not        result = NOT op1
    nand       result = NOT (op1 AND op2 [... AND opN])
    nor        result = NOT (op1 OR op2 [... OR opN])

String operations, string result:

    append     result = op1 // op2 [... // opN]
    uppercase  result = uppercase(op1)
    lowercase  result = lowercase(op1)
    element    op1 holds index integer (1 is first)
               op2 holds delimiter string (any character from it delimits)
               op3 holds delimited string
               Result is set to the indicated token, or to "" with STATUS
               set to false if the index is not valid.
               Example: op1=4 op2="," op3="a,b,c,d"
               then result = "d", STATUS=1
               But if op1=20, result="", STATUS=0
    shortest   Result is shortest string in op1,op2...
    longest    Result is longest string in op1,op2...
               In case of a tie, the first one encountered wins.
    lexhigh
    lexlow     result = op with the highest/lowest lexical value.
               Operands are compared left to right, and if lengths don't
               match, the shorter one is extended with zeroes.
    head       result = first op1 characters of op2 [//op3//op4...//opN]
               If op1 > length of op2, all of op2.
    tail       result = last op1 characters of op2.
               If op1 > length of op2, all of op2.
    segment    result = starting from position op1, extract op2 characters
               from op3 (op4..opN). If op2 > all string lengths, then just
               to the last character in op3.
    locate     result = position in op2 that matches the string in op1.
               If the string isn't found, result=0.
    eliminate  result = op1, minus any characters in op2.
    retain     result = op1, keeping only characters in op2.
    stringdel  result = op1 minus any patterns that appear in op2, .. opN,
               applied sequentially.
               Example: op1=foobar op2=ob op3=oa (this forms when ob
               comes out), result=fr
               STATUS is 1 if no changes, 2 if changes.
    resize     op1 is an integer. Changes the size of the result string's
               memory area to op1 characters, and replaces the last
               character with a string terminator. Size must be more than
               zero. If the string is truncated, STATUS is 0, otherwise 1.
               (op1=1 truncates a string to the empty string.)

String operands, integer/logical result:

    compare    result = 1 if op1 is exactly the same as op2.
    ccompare   result = 1 if op1 differs from op2, case doesn't count.
    length     result = length(op1) [ ... + length(opN)]
               (does not include the final \0 on the string)

Integer operand, string result:

    tostring   result = string representation of integer op1.
               op2 is a C formatting string; use "%d" when in doubt.
               The final formatted string may not be more than 31
               characters in size. (Use %c to store control characters
               like bell (7) or escape (27).)
    tointeger  result (integer) = op1 (string). I.e., result = 123 when
               op1 = &123.
    tohex      like tointeger, but hexadecimal
    tooctal    like tointeger, but octal
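The element operation is the least obvious of the string operations, so here is a small Python model of it. It is a sketch of the documented behaviour only: the function name element and the returned (result, STATUS) pair are illustrative choices, and the handling of adjacent delimiters (kept here as empty tokens) is an assumption that the text above does not settle.

```python
def element(index, delimiters, text):
    """Model of the "element" operation.

    index      -- op1, 1-based token index
    delimiters -- op2, any character in this string ends a token
    text       -- op3, the delimited string
    Returns (result, status) where status plays the role of STATUS.
    """
    tokens, current = [], []
    for ch in text:
        if ch in delimiters:
            tokens.append("".join(current))   # close the current token
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current))           # last token has no delimiter after it
    if 1 <= index <= len(tokens):
        return tokens[index - 1], 1
    return "", 0                              # out of range: empty result, STATUS false


print(element(4, ",", "a,b,c,d"))    # ('d', 1)
print(element(20, ",", "a,b,c,d"))   # ('', 0)
print(element(2, ",;", "a;b,c"))     # ('b', 1)
```

The first two calls reproduce the example given for element; the third shows that every character of op2 acts as a delimiter on its own.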

Macros

Macros contain a series of lines, command or pass through, and are permanent: they may be recorded exactly one time. Macro names must start with a letter, and may not be the same as a variable name. Macros are invoked by name. If the name doesn't correspond to a known macro it is assumed to be a string variable, with the value of that variable being the macro's name. Examples:

    foobar     Execute the macro named foobar
    string     Execute the macro named in the string variable
    *string    Execute the macro pointed to by the string variable

Macros accept up to 9 parameters, which are passed by value. (To pass a variable by name just enclose it in double quotes or precede it with a &.)

    #__name "foo" &boo name2 1 count

The preceding line says: execute the macro "name", and pass it the string literals foo and boo (which may be the names of other variables), the value of name2 and count, and the integer value 1.

Parameters show up inside a macro in variables P1 - P9. These are special variables, and may contain either strings or integers. They may not contain a macro, but may contain the name of a macro. Since P variables are globals, if a macro will invoke another macro, it must first save the contents of the P variables in named variables.

To pass more than one string literal, use string variables or & operators:

    #__null=""
    #__name &foo $null $null $null 10

or either of these forms:

    #__name &foo & & & 10
    #__name "foo" "" "" "" 10

but this won't work as expected:

    #__name "foo" " " " " " " 10

as the use of " " to enclose spaces is only allowed in a variable assignment statement.

For macros (but not functions) you can use local variables to pass a string which, in effect, contains spaces. Pass the values like this:

    #__name "foo<<:s>>has<<:s>>spaces"

and inside the macro name (but NOT in the calling routine) have this local variable assignment:

    #__ :s = " "

If subs is at least 2, this line in the macro:

    P1 is [<<P1>>]

would be substituted out to:

    P1 is [foo has spaces]

f$macro_record

    #__f$macro_record name [deck]

Create and begin recording a macro. When a macro is recording it goes in verbatim, with no substitutions or other expansions performed. Only one macro may be recording at a time. The name is a literal string; the only way to change it during execution is by <<var>> substitution. deck is also a string literal. Deck terminates the macro when it appears on a line like #__deck. If deck is not supplied it defaults to "f$macro_end".

It is a fatal error to try to rerecord a macro, so if there is any chance that a file will be reexecuted during a single run, protect the macros as you would C header files, like this:

    #__ifnot f$test macroname
    #__f$macro_record macroname deck
    ...(macro contents)...
    #__deck
    #__endif a

f$macro_break, f$macro_return

    #__f$macro_break status
    #__f$macro_return status

All macros MUST end with an f$macro_return command. It marks the end of the macro, handles updating any counters that are active in that macro, and also checks syntax for dangling if/elseif/else/endif constructs.

Use f$macro_break to immediately terminate a macro and return to the calling script or macro. f$macro_break does not check for syntax of incomplete if structures.

Macros return a status value in the integer variable STATUS. If it isn't explicitly set it comes back as 1 (true).

f$macro_repeat

    #__f$macro_repeat name [int1 [int2 [int3]]]

Defines up to 3 repeat counters that are initiated each time the named macro is executed. These are named MC1, MC2, and MC3, with corresponding range limits of MC1MAX, MC2MAX, and MC3MAX. These are readonly integer variables. (Actually, you can rewrite their values, but they are reset on each repeat through the macro without regard to your actions.) The default setting for f$macro_repeat is that the macro executes once.

    #__f$macro_repeat foobar 3 2

Means that the macro command #__foobar would execute 6 times, and while it did so the counter MC1 would count from 1 to 3, and for each of those, the counter MC2 would count from 1 to 2.

    #__f$macro_repeat foobar 0

Disables the macro foobar. The next instance of #__foobar would be skipped, without even touching the STATUS variable.
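The counter behaviour amounts to nested counting loops, with MC1 as the outermost. A Python sketch of the sequence of counter values for a given f$macro_repeat setting follows. The function name repeat_counters is hypothetical, and the sketch only yields values for the counters that were actually given, since this section does not say what the unused counters hold.

```python
from itertools import product


def repeat_counters(*limits):
    """Yield one tuple of counter values (MC1, MC2, ...) per macro pass.

    limits -- the int1 [int2 [int3]] arguments given to f$macro_repeat.
    With no limits a single empty tuple is yielded: the macro runs once.
    """
    # product() varies the last range fastest, so MC1 is the outer loop.
    return product(*(range(1, limit + 1) for limit in limits))


passes = list(repeat_counters(3, 2))
print(len(passes))   # 6
print(passes)        # [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]
print(list(repeat_counters(0)))   # [] : the macro is disabled
```

A limit of 0 produces no passes at all, which matches the way "#__f$macro_repeat foobar 0" disables the macro.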

If structures

    #__ifnot label test
    #__if label test
    #__elseif label test
    #__elseifnot label test
    #__else label
    #__endif label

If, elseif, else structure. Label is an arbitrary immediate string, case sensitive. If a variable is to be used for the label it must be substituted all the way to a value, i.e.

    #__if label <<alabel>>

The function of the label is to allow detection of overlapping if structures at run time. The "not" forms invert the logic of the test.

    Test type   Interpretation
    int         0 = false, anything else is true
    string      zero length string is false, anything else is true
    *string     as for string, but indirect reference
    macro       check STATUS returned, 0 is false, anything else true.
                Note that a macro which has been set to loop zero times
                returns a status of 1 when invoked, so if used in a test
                in this state it will always be true.
    function    check STATUS, false if 0, true if not.
                EXCEPTION: if the function was f$evaluate or f$<-, a
                STATUS of 0 is a fatal error; otherwise, check the value
                returned and act on that.

You can use macros and f$evaluate together to construct arbitrarily complicated tests.

Loop structures

The only loop mechanism in miniproc is to use f$macro_repeat to set the repeat counters for a macro, and then execute that macro. There is no way to set an infinite loop condition since the counter limits are finite. However, if you do

    #__f$macro_repeat foobar 2000000000 2000000000 2000000000
    #__foobar

that is effectively the same thing as an infinite loop, since the macro will take 8 x 10^27 cycles to complete.

Typical loop structures can be implemented within a macro without much difficulty. For instance:

do 100 times

    #__f$macro_record do100 deck
    ...(operations)...
    #__f$macro_return 1
    #__deck
    #__f$macro_repeat do100 100

do while variable is true

    #__f$macro_record dowhile deck
    #__if a variable
    ...(operations)...
    #__else a
    #__  f$macro_break 1
    #__endif a
    #__f$macro_return 1
    #__deck

do until variable is true

    #__f$macro_record dountil deck
    ...(operations)...
    #__if a variable
    #__  f$macro_break 1
    #__endif a
    #__f$macro_return 1
    #__deck

and so forth.
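The cycle count quoted for the three-counter case is simply the product of the counter limits. A short Python check, in which the helper name total_passes is hypothetical:

```python
from math import prod


def total_passes(*limits):
    """Number of times a macro body runs for the given repeat limits."""
    return prod(limits)


print(total_passes(3, 2))                                     # 6
print(total_passes(2000000000, 2000000000, 2000000000))       # 8 followed by 27 zeros
```

With all three limits at 2000000000 the product is 8 x 10^27, which is why such a setting behaves like an infinite loop in practice.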

<<>> embedded substitution tags

The <<>> tag is the only miniproc operation that can be mixed with other characters in an output line. <<>> substitutions are done before ANYTHING else on each line. See above for the action of "subs", which controls how many times the line is processed to remove <<>>. The * operator does not work inside <<>>, that is, <<*name>> will not resolve to whatever name points to. This is not an error; it will leave <<*name>> as is on the output line.

    <<name>>    Insert the string variable into text.
    <<name>>    Insert the integer variable into text.

Typical usage might be:

    #__whichstory=&murderweapon
    #__whichpocket="right coat"
    #__killer="Robert"
    #__! then much, much later...
    #__! the next three lines have some single or double
    #__! substitutions and then go right to the output
    "I have an invitation to dinner," said <<killer>> as he
    gripped the <<<<whichstory>>>> in his <<whichpocket>> pocket
    ever more tightly.

See testfile.mpc for an example miniproc script.
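The interaction between nested tags and the subs level can be modelled with one regular expression pass per level. The Python below is a sketch under stated assumptions, not the real substitution code: the function name substitute is hypothetical, tags naming unknown variables are simply left alone (Miniproc itself may treat that differently), and the value "knife" for murderweapon is invented for the demonstration.

```python
import re

# Innermost tags only: "<<", a name with no angle brackets, then ">>".
TAG = re.compile(r"<<([^<>]+)>>")


def substitute(line, variables, subs=1):
    """Apply up to `subs` passes of <<name>> substitution to one line."""
    def replace(match):
        name = match.group(1)
        if name in variables:
            return str(variables[name])
        return match.group(0)          # unknown name (or *name): leave as is

    for _ in range(subs):
        replaced = TAG.sub(replace, line)
        if replaced == line:           # nothing left to substitute
            break
        line = replaced
    return line


variables = {
    "whichstory": "murderweapon",
    "murderweapon": "knife",           # hypothetical value, not from this manual
    "whichpocket": "right coat",
    "killer": "Robert",
}
line = "gripped the <<<<whichstory>>>> in his <<whichpocket>> pocket"
print(substitute(line, variables, subs=1))
print(substitute(line, variables, subs=2))
```

With subs=1 the doubled tag becomes <<murderweapon>> and stops there; a second pass is needed to reach the final value, which is the reason for raising subs above its default.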

Copyright

Copyright 1997 by David Mathog and California Institute of Technology. This software may be used freely, but may not be redistributed. You may modify this software for your own use, but you may not incorporate any part of the original code into any other piece of software which will then be distributed (whether free or commercial) unless prior written consent is obtained.

Reporting bugs, getting more information

For more information, or to report bugs, contact: mathog@seqaxp.bio.caltech.edu