Document revision date: 30 March 2001

OpenVMS RTL String Manipulation (STR$) Manual




Chapter 2
Introduction to String Manipulation (STR$) Routines

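This chapter explains the following topics in detail: string semantics in the Run-Time Library (Section 2.1) and descriptor classes and string semantics (Section 2.2).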

Descriptor Names and Field Names

In this chapter and throughout this manual it is generally the practice to use only the main part of a descriptor name or a descriptor field name, without the 32-bit or 64-bit prefix used in the actual code. For example, the length field is referred to using LENGTH rather than by mentioning both DSC$W_LENGTH and DSC64$Q_LENGTH. The complete descriptor or field name, including the prefix, is used only when referring to one particular form of the descriptor.

2.1 String Semantics in the Run-Time Library

The semantics of a string refers to the conventions that determine how a string is stored, written, and read. The Alpha and VAX architectures support three string semantics: fixed length, varying length, and dynamic length.

2.1.1 Fixed-Length Strings

Fixed-length strings have the following attributes:

The length of a fixed-length string is constant. It is usually initialized when the program is compiled or linked. After initialization, this length is read but never written. When a Run-Time Library routine copies a source string into a longer fixed-length destination string, the routine pads the destination string with trailing blanks.
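The padding rule above can be sketched as a small Python model. This is an illustration only, not Run-Time Library code: the function name copy_to_fixed is hypothetical, and truncation of an over-long source is assumed from the truncation behavior described for output strings later in this chapter.

```python
def copy_to_fixed(source: bytes, dest_length: int) -> bytes:
    """Model a copy into a fixed-length destination of dest_length bytes."""
    # The destination length never changes: keep at most dest_length bytes
    # of the source, then pad on the right with blanks.
    return source[:dest_length].ljust(dest_length, b" ")


print(copy_to_fixed(b"ABC", 5))      # b'ABC  '
print(copy_to_fixed(b"ABCDEFG", 5))  # b'ABCDE'
```

In both calls the result is exactly 5 bytes long, which mirrors the rule that the length of a fixed-length string is read but never written.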

When you pass a string to a Run-Time Library routine, you pass the string by descriptor. For a fixed-length string, the descriptor must contain this information:

In most cases, you do not have to construct an actual descriptor. By default, most OpenVMS Alpha and OpenVMS VAX languages pass strings by descriptor. For information about how the language you are using handles strings, see your language reference manual. For more information about descriptors used for fixed-length strings, refer to OpenVMS Programming Interfaces: Calling a System Routine (see footnote 1).

Note

In contrast to Run-Time Library routines, system services do not pad output strings. For this reason, when a program calls a system service that returns a fixed-length string, the program should supply an additional argument that indicates how many bytes the system service actually deposited in the fixed-length buffer of the calling program. Some system service routines have corresponding Run-Time Library routines that provide the proper semantics for fixed-length, varying-length, and dynamic output strings.

2.1.2 Varying-Length Strings

Varying-length strings have the following attributes:

The current length, in bytes, of a varying-length string is stored in a two-byte field, called CURLEN, preceding the text of the string. The address of the string points to the beginning of this CURLEN field, not to the beginning of the string's text.

The maximum string length is a field in the string's descriptor. This field specifies how much space is allocated to the string in a program. The maximum string length is fixed and does not change.

The value in the CURLEN field specifies how many bytes beyond the CURLEN field are occupied by the string's text. The character positions beyond this range are reserved for the growth of the string. Their contents are undefined.

For example, assume a varying string whose CURLEN is 3 and whose maximum length is 6. If a string 'ABCD' is copied into this string, the result is 'ABCD' and the CURLEN is changed to 4. If a string 'XYZ' is now copied into the same varying string, the resulting string is 'XYZ' with a CURLEN of 3. The maximum length is still 6. The bytes beyond the range designated by CURLEN are undefined.

For varying-length strings pointed to by both 32-bit and 64-bit descriptors, CURLEN is a two-byte field. Because of this, the maximum length of a varying-length string is limited to 2^16 - 1, or 65,535, characters.
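The layout described in this section (a two-byte CURLEN field followed by the text area) can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the Run-Time Library implementation: the class name VaryingString and its methods are hypothetical, CURLEN is stored little-endian as on VAX and Alpha systems, and a source longer than the maximum length is assumed to be truncated.

```python
class VaryingString:
    """Toy model: a two-byte CURLEN field followed by MAXSTRLEN bytes of text."""

    def __init__(self, maxstrlen: int, text: bytes = b"") -> None:
        if not 0 <= maxstrlen <= 2**16 - 1:
            raise ValueError("MAXSTRLEN must fit in a two-byte field")
        self.maxstrlen = maxstrlen               # fixed for the string's lifetime
        self.storage = bytearray(2 + maxstrlen)  # CURLEN word, then the text area
        self.assign(text)

    @property
    def curlen(self) -> int:
        return int.from_bytes(self.storage[:2], "little")

    @property
    def text(self) -> bytes:
        return bytes(self.storage[2:2 + self.curlen])

    def assign(self, source: bytes) -> None:
        kept = source[:self.maxstrlen]           # assumption: excess is truncated
        self.storage[2:2 + len(kept)] = kept
        self.storage[:2] = len(kept).to_bytes(2, "little")
        # Bytes beyond CURLEN are left untouched; their contents are undefined.


s = VaryingString(6, b"ABC")
s.assign(b"ABCD")
print(s.curlen, s.text)  # 4 b'ABCD'
s.assign(b"XYZ")
print(s.curlen, s.text)  # 3 b'XYZ'
```

The two assignments reproduce the 'ABCD' and 'XYZ' example given earlier in this section. The allocated storage stays at 2 + 6 bytes throughout; only CURLEN changes.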

2.1.3 Dynamic-Length Strings

Dynamic-length strings have the following attributes:

Theoretically, dynamic strings have unbounded length. However, the descriptor LENGTH field contains the length of the string as an unsigned value. This effectively limits the maximum length of the string to the maximum unsigned integer value this field can hold.

For 32-bit dynamic descriptors, the LENGTH field is an unsigned value occupying two bytes. Because its maximum value is 2^16 - 1, or 65,535, the maximum length of a string is limited to 65,535 characters.

On Alpha systems, the LENGTH field of a 64-bit dynamic descriptor is an unsigned value occupying eight bytes. Because its maximum value is 2^64 - 1, the maximum length of a string is 2^64 - 1 characters.

The actual space for a dynamic-length string is allocated from heap storage by the Run-Time Library. When a Run-Time Library routine copies a character string into a dynamic string, and the currently allocated heap storage is not large enough to contain the string, the currently allocated storage returns to a pool of heap storage maintained by the string routines. Then the string routines obtain a new area of the correct size. As a result of this process of deallocation and reallocation, both the current-length field and the address portion of the string's descriptor may change. Often, dynamic strings are the most convenient type to write.
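The deallocation and reallocation behavior can be illustrated with a toy Python model. Everything here is an assumption made for illustration: the class name DynamicString, the integer block identifiers that stand in for heap addresses, and the choice to reuse the current block whenever the new string fits. It is not how the string routines manage their storage pool internally.

```python
import itertools

MAX_LENGTH_32 = 2**16 - 1  # largest value a two-byte unsigned LENGTH field holds


class DynamicString:
    """Toy model of a 32-bit dynamic string descriptor (LENGTH and POINTER)."""

    _block_ids = itertools.count(1)  # stand-in for heap addresses

    def __init__(self) -> None:
        self.length = 0
        self.pointer = None          # no storage allocated yet
        self._block = bytearray()    # the heap block that POINTER refers to

    @property
    def text(self) -> bytes:
        return bytes(self._block[:self.length])

    def copy_from(self, source: bytes) -> None:
        if len(source) > MAX_LENGTH_32:
            raise ValueError("source exceeds the 65,535-character limit")
        if len(source) > len(self._block):
            # Current block is too small: give it up and obtain a new one,
            # so POINTER changes as well as LENGTH.
            self._block = bytearray(len(source))
            self.pointer = next(self._block_ids)
        self._block[:len(source)] = source
        self.length = len(source)


d = DynamicString()
d.copy_from(b"ABC")
first = d.pointer
d.copy_from(b"AB")       # fits in the current block: only LENGTH changes
d.copy_from(b"ABCDEF")   # too large: a new block is obtained
print(d.length, d.text, d.pointer != first)  # 6 b'ABCDEF' True
```

The point of the model is that a caller must reread both fields of the descriptor after any copy, because either may have changed.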

Note

The Run-Time Library STR$ routines are the only routines that you should use to alter the length or address of a dynamic string. Do not use LIB$GET_VM or LIB$GET_VM_64 for this purpose.

2.1.4 Examples

The following examples illustrate what happens when the string 'ABCDEF' (of length 6) is copied into various destination strings:

Note

1 This manual has been archived but is available on the OpenVMS Documentation CD-ROM.

2.2 Descriptor Classes and String Semantics

A calling program passes strings to an STR$ routine by descriptor. That is, the argument list entry for an input or output string is actually the address of a string descriptor. All STR$ routines handle both 32-bit and 64-bit descriptors in the argument list.

The calling program allocates a descriptor for the input string that indicates the string's address and length, so that the called routine can find the string's text and operate on it. The calling program also allocates a descriptor for the output string. In addition to length and address fields, each descriptor contains a field (DSC$B_CLASS or DSC64$B_CLASS) indicating the descriptor's class. The STR$ routine reads the class field to determine whether to write the output string as a fixed-length, varying-length, or dynamic string.

To determine the address and length of the data in the input string, Run-Time Library routines call one of the string descriptor analysis routines: LIB$ANALYZE_SDESC, LIB$ANALYZE_SDESC_64, STR$ANALYZE_SDESC, or STR$ANALYZE_SDESC_64.

The STR$ routines provide a centralized facility for analyzing string descriptors, allowing string-handling routines to function independently of the class of the input string. This means that if the Run-Time Library recognizes new string types, only the analysis routine needs to be changed, not the string routines themselves. If you are writing a routine that recognizes all the string types recognized by the Run-Time Library, your routine should first call the appropriate string-descriptor analysis routine to obtain the address and length of the input string.

You can also use the string descriptor analysis routines to find the length of a returned string. Assume that your called routine calls one of the Run-Time Library string-copying routines to create a new string. You now want the called routine to return the actual length of the new string to the calling program. The called routine calls one of the string-descriptor analysis routines to determine this length. This sequence of calls allows you to create the new string without knowing its ultimate length at the time it is created.
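The idea of a single analysis step that hides the descriptor class from the caller can be sketched in Python. This is a simplified model, not the interface of LIB$ANALYZE_SDESC or STR$ANALYZE_SDESC: the Descriptor tuple, the analyze_sdesc function, and the use of a byte string to stand in for memory are all hypothetical, and the field selection follows Table 2-2 for 32-bit descriptors only.

```python
from typing import NamedTuple, Tuple


class Descriptor(NamedTuple):
    """Hypothetical stand-in for a 32-bit string descriptor."""
    dclass: str      # "Z", "S", "D", "A", "SD", "NCA", or "VS"
    length: int      # LENGTH field (MAXSTRLEN for class VS)
    pointer: int     # POINTER field, modeled as an offset into `memory`
    arsize: int = 0  # ARSIZE field, used only by classes A and NCA


def analyze_sdesc(desc: Descriptor, memory: bytes) -> Tuple[int, int]:
    """Return (string length, address of first data byte) for any class."""
    if desc.dclass in ("Z", "S", "D", "SD"):
        return desc.length, desc.pointer
    if desc.dclass in ("A", "NCA"):
        return desc.arsize, desc.pointer
    if desc.dclass == "VS":
        # CURLEN is the word at POINTER; the text starts two bytes later.
        curlen = int.from_bytes(memory[desc.pointer:desc.pointer + 2], "little")
        return curlen, desc.pointer + 2
    raise ValueError(f"unrecognized descriptor class {desc.dclass!r}")


memory = b"HELLO" + (3).to_bytes(2, "little") + b"ABC???"
print(analyze_sdesc(Descriptor("S", 5, 0), memory))   # (5, 0)
print(analyze_sdesc(Descriptor("VS", 6, 5), memory))  # (3, 7)
```

A routine built on such a helper works with the returned length and address alone, so supporting a new descriptor class would mean changing only the helper.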

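The Run-Time Library routines recognize the following classes of string descriptors: Z (unspecified), S (scalar, string), D (dynamic string), A (array), SD (scalar decimal), NCA (noncontiguous array), and VS (varying string).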

For a detailed description of these descriptor classes and their fields, see the OpenVMS Calling Standard.

Table 2-1 indicates how the Run-Time Library routines access the fields of the descriptor for input and output string arguments. Given the class of the string and the field of the descriptor, the table shows whether the routine reads, writes, or modifies the field.

Table 2-1 String Passing Techniques Used by the Run-Time Library

                                                String Descriptor Fields
String Type                                     Class    Length                 Pointer
----------------------------------------------  -------  ---------------------  ---------------------
Input Argument to Routines
  Input string passed by descriptor             Read     Read                   Read
Output Argument from Routines; Called Routine Assumes the Descriptor Class
  Output string passed by descriptor,           Ignored  Read                   Read
    fixed-length
  Output string passed by descriptor, dynamic   Ignored  Read, can be modified  Read, can be modified
Output Argument from Routines; Calling Program Specifies the Descriptor Class in the Descriptor
  Output string, fixed-length                   Read     Read                   Read
    (descriptor class: S, Z, A, NCA, SD)
  Output string, dynamic                        Read     Read, can be modified  Read, can be modified
    (descriptor class: D)
  Output string, varying-length                 Read     MAXSTRLEN is read;     Read
    (descriptor class: VS)                               CURLEN is modified

2.2.1 Conventions for Reading Input String Arguments

When a calling program passes a string as an argument to a Run-Time Library routine, the argument contains the address of a descriptor. The called routine examines the CLASS field of the descriptor to determine in which fields it can find the length of the string and the first byte of the string's text. For each descriptor class, Table 2-2 indicates which descriptor fields the routine uses to locate this information. For diagrams of the descriptors, see the OpenVMS Calling Standard manual.

Table 2-2 How Run-Time Library Routines Read Strings

Class  String Length             Address of First Byte of Data
-----  ------------------------  -----------------------------
Z      DSC$W_LENGTH              DSC$A_POINTER
       DSC64$Q_LENGTH            DSC64$PQ_POINTER
S      DSC$W_LENGTH              DSC$A_POINTER
       DSC64$Q_LENGTH            DSC64$PQ_POINTER
D      DSC$W_LENGTH              DSC$A_POINTER
       DSC64$Q_LENGTH            DSC64$PQ_POINTER
A      DSC$L_ARSIZE              DSC$A_POINTER
       DSC64$Q_ARSIZE            DSC64$PQ_POINTER
SD     DSC$W_LENGTH              DSC$A_POINTER
       DSC64$Q_LENGTH            DSC64$PQ_POINTER
NCA    DSC$L_ARSIZE              DSC$A_POINTER
       DSC64$Q_ARSIZE            DSC64$PQ_POINTER
VS     Word at DSC$A_POINTER or  Value of DSC$A_POINTER + 2 or
       at DSC64$PQ_POINTER       of DSC64$PQ_POINTER + 2
       (CURLEN field)            (byte after CURLEN field)

Note:

2.2.2 Semantics for Writing Output String Arguments

Normally, Run-Time Library routines return the result of an operation in one of the following ways: as a function value, or as an output argument.

The STR$ routines that produce string results use the first method to pass the results back to the calling program. Because a result string, by definition, does not fit in R0/R1, the function value from an STR$ routine is placed in the first position in the argument list.

The string manipulation routines in the LIB$ and OTS$ facilities use the second method, returning their results as output arguments.

For example, there are three entry points for the string-copying routine: LIB$SCOPY_DXDX, OTS$SCOPY_DXDX, and STR$COPY_DX. These copy the source string to the destination string. Their formats are as follows:

LIB$SCOPY_DXDX(source-string ,destination-string)
OTS$SCOPY_DXDX(source-string ,destination-string)
STR$COPY_DX(destination-string ,source-string)

Because the STR$ entry point places the result string in the first position, you can call STR$COPY_DX using a function reference in languages that support string functions. In Fortran, for example, you can use a function reference to invoke STR$COPY_DX in the following ways:


CHARACTER*80 STR$COPY_DX 
RETURN_STATUS = STR$COPY_DX(DESTINATION_STRING, SOURCE_STRING) 

or


DESTINATION_STRING = STR$COPY_DX(SOURCE_STRING) 

If you use the second form, you cannot access the return status, which is used to indicate truncation.

If you use a function reference to invoke a string manipulation routine in a language that does not support the concept of a string function (such as MACRO, BLISS, and Pascal), you must place the destination string variable in the argument list. In Pascal, for example, you can use a function reference to invoke STR$COPY_DX as follows:


STATUS := STR$COPY_DX(DESTINATION_STRING, SOURCE_STRING); 

However, the following statement results in an error:


DESTINATION_STRING := STR$COPY_DX(SOURCE_STRING) 

In addition to allocating a variable for the output string, the calling program must allocate the space for and fill in the fields of the output string descriptor at compile, link, or run time. High-level languages do this automatically.

When a Run-Time Library routine returns an output string argument to the calling program, the argument list entry is the address of a descriptor. The routine determines the semantics of the output string (fixed, varying, or dynamic) by examining the class of the descriptor for the destination string. Given the class of the output string's descriptor, Table 2-3 specifies the semantics used by Run-Time Library routines when writing the string.

Table 2-3 Output String Semantics and Descriptor Classes

Class  Description          Restrictions                              Semantics
-----  -------------------  ----------------------------------------  ---------------------
Z      Unspecified          Treated as class S.                       Fixed-length string
S      Scalar, string       None.                                     Fixed-length string
D      Dynamic string       String length:                            Dynamic-length string
                              DSC$W_LENGTH < 2^16 (64K)
                              DSC64$Q_LENGTH < 2^64
A      Array                Array is one-dimensional (DIMCT = 1).     Fixed-length string
                            String length:
                              DSC$L_ARSIZE < 2^16 (64K)
                              DSC64$Q_ARSIZE < 2^64
                            Length of array elements is 1 byte
                              (LENGTH = 1).
SD     Scalar decimal       The DIGITS and SCALE fields are ignored.  Fixed-length string
NCA    Noncontiguous array  Array is one-dimensional (DIMCT = 1).     Fixed-length string
                            String length:
                              DSC$L_ARSIZE < 2^16 (64K)
                              DSC64$Q_ARSIZE < 2^64
                            Length of array elements is 1 byte
                              (LENGTH = 1).
                            Array is contiguous (S1 = LENGTH).
VS     Varying string       Current length less than maximum string   Varying-length string
                            length.
                            (CURLEN <= MAXSTRLEN <= 2^16 (64K))
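The mapping in Table 2-3 amounts to a lookup from descriptor class to output semantics. The following Python sketch shows that lookup and the length each kind of destination reports after a copy. The names OUTPUT_SEMANTICS and length_after_copy are hypothetical, and the varying-length case assumes that a source longer than MAXSTRLEN is truncated.

```python
FIXED, VARYING, DYNAMIC = "fixed-length", "varying-length", "dynamic-length"

# Descriptor class -> semantics used when writing the output string.
OUTPUT_SEMANTICS = {
    "Z": FIXED, "S": FIXED, "A": FIXED, "SD": FIXED, "NCA": FIXED,
    "D": DYNAMIC,
    "VS": VARYING,
}


def length_after_copy(dclass: str, source_len: int, dest_len: int) -> int:
    """Length the destination reports after a copy of source_len bytes.

    dest_len is the destination's LENGTH (MAXSTRLEN for class VS).
    """
    semantics = OUTPUT_SEMANTICS[dclass]
    if semantics == DYNAMIC:
        return source_len               # LENGTH is rewritten to fit the source
    if semantics == FIXED:
        return dest_len                 # LENGTH is read but never written
    return min(source_len, dest_len)    # CURLEN changes, bounded by MAXSTRLEN


for dclass, dest_len in (("S", 4), ("D", 2), ("VS", 10), ("VS", 4)):
    print(dclass, OUTPUT_SEMANTICS[dclass], length_after_copy(dclass, 6, dest_len))
```

The loop models copying a 6-byte source such as 'ABCDEF' into four different destinations.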

When a called routine returns a string whose length cannot be determined by the calling routine, the calling routine should also pass an optional argument to contain the output length. If the output string is a fixed-length string, the length argument would reflect the number of characters written, not counting the fill characters.

The output length argument is useful, for instance, when your program is reading variable-length records. The program can read the input strings into a buffer that is large enough to contain the largest. When you want to perform the next operation on the contents of the buffer, the length argument indicates exactly how many characters have been read, so that the program does not need to manipulate the whole buffer.

For example, LIB$GET_INPUT has the optional argument resultant-length. If LIB$GET_INPUT is called with a fixed-length, 5-character string as an argument, and the routine reads a record containing 'ABC', then resultant-length has a value of 3 and the output string contains the characters ABC followed by two blanks. But if the routine reads a record containing the value 'ABCDEFG', resultant-length has a value of 5 and the output string is 'ABCDE'. In either case, the calling program knows exactly how many characters (not counting fillers) the routine has read.
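The relationship between the padded output string and the resultant-length value in this example can be modeled as follows. The function read_into_fixed is a hypothetical stand-in written for illustration; it does not call or reproduce LIB$GET_INPUT.

```python
from typing import Tuple


def read_into_fixed(record: bytes, buffer_length: int) -> Tuple[bytes, int]:
    """Model a fixed-length output string plus its resultant-length value."""
    resultant_length = min(len(record), buffer_length)
    # Characters that fit are kept; the rest of the buffer is blank-filled.
    buffer = record[:resultant_length].ljust(buffer_length, b" ")
    return buffer, resultant_length


print(read_into_fixed(b"ABC", 5))      # (b'ABC  ', 3)
print(read_into_fixed(b"ABCDEFG", 5))  # (b'ABCDE', 5)
```

A caller can then slice the buffer with the returned length to work only on the characters that were actually read, ignoring the fill characters.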

A routine such as STR$COPY_DX does not need the length argument, because the calling program can determine the length of the output string. If the output string is dynamic, the length is the same as the input string length. If the output string is fixed-length, the length is the shorter of the source and destination string lengths.

