OpenVMS RTL String Manipulation (STR$) Manual

Document revision date: 30 March 2001

Chapter 2
Introduction to String Manipulation (STR$) Routines

This chapter explains in detail the following topics:

Types of strings recognized by Run-Time Library routines
Relationship of descriptor classes to string semantics
Differences in string handling among the LIB$, OTS$, and STR$ facilities of the Run-Time Library
Conventions for reading and writing string arguments in the Run-Time Library string routines
Selection of the proper string manipulation routines
Allocation and deallocation of dynamic string resources

Descriptor Names and Field Names

In this chapter and throughout this manual, descriptor names and descriptor field names are generally shown by their main part only, without the 32-bit or 64-bit prefix used in actual code. For example, the length field is called LENGTH rather than DSC$W_LENGTH and DSC64$Q_LENGTH. The complete name, including the prefix, is used only when referring to one particular form of the descriptor.

2.1 String Semantics in the Run-Time Library

The semantics of a string refers to the conventions that determine how a string is stored, written, and read. The Alpha and VAX architectures support three string semantics: fixed length, varying length, and dynamic length.

2.1.1 Fixed-Length Strings

Fixed-length strings have the following attributes:

An address
A length

The length of a fixed-length string is constant. It is usually initialized when the program is compiled or linked. After initialization, this length is read but never written. When a Run-Time Library routine copies a source string into a longer fixed-length destination string, the routine pads the destination string with trailing blanks.
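The padding rule can be illustrated with a small portable model. This is only a sketch in Python, not Run-Time Library code: the function name `copy_to_fixed` is hypothetical, strings are modeled as `bytes`, and the right-truncation branch follows the fixed-length case shown in the examples in Section 2.1.4.

```python
def copy_to_fixed(source: bytes, dest_length: int) -> bytes:
    """Model a copy into a fixed-length destination of dest_length bytes.

    Hypothetical helper for illustration; the destination length never changes.
    """
    if len(source) >= dest_length:
        # Source is as long as or longer than the destination: truncate on the right.
        return source[:dest_length]
    # Source is shorter: fill the remainder with trailing blanks (ASCII space).
    return source + b" " * (dest_length - len(source))


print(copy_to_fixed(b"ABCDEF", 10))  # b'ABCDEF    '
print(copy_to_fixed(b"ABCDEF", 3))   # b'ABC'
```

In both calls the result has exactly the destination length, which mirrors the rule that the length of a fixed-length string is read but never written.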

When you pass a string to a Run-Time Library routine, you pass the string by descriptor. For a fixed-length string, the descriptor must contain this information:

The descriptor class
The data type of the string
The length of the string in bytes
The address of the beginning of the string

In most cases, you do not have to construct an actual descriptor. By default, most OpenVMS Alpha and OpenVMS VAX languages pass strings by descriptor. For information about how the language you are using handles strings, see your language reference manual. For more information about descriptors used for fixed-length strings, refer to OpenVMS Programming Interfaces: Calling a System Routine.¹

Note

In contrast to Run-Time Library routines, system services do not pad output strings. For this reason, when a program calls a system service that returns a fixed-length string, the program should supply an additional argument that indicates how many bytes the system service actually deposited in the fixed-length buffer of the calling program. Some system service routines have corresponding Run-Time Library routines that provide the proper semantics for fixed-length, varying-length, and dynamic output strings.

2.1.2 Varying-Length Strings

Varying-length strings have the following attributes:

A current length
An address
A maximum length

The current length, in bytes, of a varying-length string is stored in a two-byte field, called CURLEN, preceding the text of the string. The address of the string points to the beginning of this CURLEN field, not to the beginning of the string's text.

The maximum string length is a field in the string's descriptor. This field specifies how much space is allocated to the string in a program. The maximum string length is fixed and does not change.

The value in the CURLEN field specifies how many bytes beyond the CURLEN field are occupied by the string's text. The character positions beyond this range are reserved for the growth of the string. Their contents are undefined.

For example, assume a varying string whose CURLEN is 3 and whose maximum length is 6. If a string 'ABCD' is copied into this string, the result is 'ABCD' and the CURLEN is changed to 4. If a string 'XYZ' is now copied into the same varying string, the resulting string is 'XYZ' with a CURLEN of 3. The maximum length is still 6. The bytes beyond the range designated by CURLEN are undefined.
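The example above can be traced with a toy model of the storage layout. This is a sketch under stated assumptions, not the actual implementation: the class `VaryingString` and its members are hypothetical names, the storage is a Python `bytearray`, and CURLEN is encoded as a little-endian 16-bit word.

```python
import struct


class VaryingString:
    """Toy model of a varying-length string: a 2-byte CURLEN word, then the text."""

    def __init__(self, max_length: int):
        if not 0 <= max_length <= 0xFFFF:
            raise ValueError("maximum length must fit in an unsigned 16-bit word")
        self.max_length = max_length              # fixed for the life of the string
        self.storage = bytearray(2 + max_length)  # CURLEN word + reserved text bytes

    @property
    def curlen(self) -> int:
        # Read CURLEN from the first two bytes (assumed little-endian).
        return struct.unpack_from("<H", self.storage, 0)[0]

    @property
    def text(self) -> bytes:
        # Only the first CURLEN bytes after the CURLEN word are meaningful.
        return bytes(self.storage[2:2 + self.curlen])

    def copy_from(self, source: bytes) -> None:
        data = source[:self.max_length]           # truncate on the right if needed
        self.storage[2:2 + len(data)] = data      # bytes beyond CURLEN are left alone
        struct.pack_into("<H", self.storage, 0, len(data))


s = VaryingString(6)
s.copy_from(b"ABCD")
print(s.text, s.curlen)  # b'ABCD' 4
s.copy_from(b"XYZ")
print(s.text, s.curlen)  # b'XYZ' 3
```

After the second copy the model still holds a stale byte beyond CURLEN. That corresponds to the statement that the contents beyond the current length are undefined and must not be relied on.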

For varying-length strings pointed to by both 32-bit and 64-bit descriptors, CURLEN is a two-byte field. Because of this, the maximum length of a varying-length string is limited to 2¹⁶ - 1, or 65,535, characters.

2.1.3 Dynamic-Length Strings

Dynamic-length strings have the following attributes:

A current length
An address pointing to the beginning of the text

Theoretically, dynamic strings have unbounded length. However, the descriptor LENGTH field contains the length of the string as an unsigned value. This effectively limits the maximum length of the string to the maximum unsigned integer value this field can hold.

For 32-bit dynamic descriptors, the LENGTH field is an unsigned value occupying two bytes. Because its maximum value is 2¹⁶ - 1, or 65,535, the maximum length of a string is limited to 65,535 characters.

On Alpha systems, the LENGTH field of a 64-bit dynamic descriptor is an unsigned value occupying eight bytes. Because its maximum value is 2⁶⁴ - 1, the maximum length of a string is 2⁶⁴ - 1 characters.

The actual space for a dynamic-length string is allocated from heap storage by the Run-Time Library. When a Run-Time Library routine copies a character string into a dynamic string, and the currently allocated heap storage is not large enough to contain the string, the currently allocated storage returns to a pool of heap storage maintained by the string routines. Then the string routines obtain a new area of the correct size. As a result of this process of deallocation and reallocation, both the current-length field and the address portion of the string's descriptor may change. Often, dynamic strings are the most convenient type to write.
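The effect of deallocation and reallocation on the descriptor can be pictured with a simplified model. The class `DynamicString` below is hypothetical, and a Python `bytearray` object stands in for the heap block that the descriptor points to. As a simplifying assumption, the model obtains a new block whenever the length changes, whereas the Run-Time Library is described as doing so when the current allocation is too small, and optionally when it is too large.

```python
class DynamicString:
    """Toy model of a dynamic string descriptor: a LENGTH and a pointer to heap text."""

    def __init__(self):
        self.length = 0
        self.buffer = bytearray()  # stands in for the heap block the pointer addresses

    def copy_from(self, source: bytes) -> None:
        if len(source) == self.length:
            # Same size: overwrite in place; length and "address" are unchanged.
            self.buffer[:] = source
        else:
            # Different size: release the old block and obtain one of the right size.
            # Both fields of the descriptor change as a result.
            self.buffer = bytearray(source)
            self.length = len(source)


d = DynamicString()
d.copy_from(b"ABCDEF")
print(d.length, bytes(d.buffer))  # 6 b'ABCDEF'
```

Because both fields can change during a copy, a caller in this model must reread `length` and `buffer` after every call instead of caching them.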

Note

The Run-Time Library STR$ routines are the only routines that you should use to alter the length or address of a dynamic string. Do not use LIB$GET_VM or LIB$GET_VM_64 for this purpose.

2.1.4 Examples

The following examples illustrate what happens when the string 'ABCDEF' (of length 6) is copied into various destination strings:

Fixed-length string
If 'ABCDEF' is copied into a fixed-length string, three results are possible:
1. If the length of the output string is greater than the length of the source string, the string is padded with trailing spaces.
  Length of output string: 10
  Result: 'ABCDEF    '
2. If the length of the output string is the same as that of the input string, the string is simply copied with no modification.
  Length of output string: 6
  Result: 'ABCDEF'
3. If the length of the output string is less than the length of the source string, truncation on the right occurs.
  Length of output string: 3
  Result: 'ABC'
Varying-length string
If the string 'ABCDEF' is copied into a varying-length string, two results are possible:
1. If the MAXSTRLEN field of the destination is greater than or equal to the length of the source, the input string is written into the output string without modification, and the CURLEN (current length) field of the output string becomes 6.
2. If the MAXSTRLEN field of the destination is less than the length of the source string, the source string is truncated on the right and the CURLEN field is rewritten to its current length. For example, if MAXSTRLEN = 4, the resulting string contains 'ABCD' and CURLEN = 4.
Dynamic-length string
If the string 'ABCDEF' is copied into a dynamic destination string, three results are possible:
1. If the length of the destination string is greater than the length of the source string (6), the result is a dynamic string of length 6 containing 'ABCDEF'. No padding takes place. The Run-Time Library may deallocate the string and reallocate a new string closer in length to the length of the source string.
2. If the length of the destination string is less than the length of the source string, the result is also 'ABCDEF', with a length of 6. The Run-Time Library deallocates the destination string and allocates a new string large enough to hold the 6 characters.
3. If the destination string and source string are of equal length, a simple copy is done. No allocation, deallocation, or padding takes place, and the destination descriptor is not modified.

Note

¹ This manual has been archived but is available on the OpenVMS Documentation CD-ROM.

2.2 Descriptor Classes and String Semantics

A calling program passes strings to an STR$ routine by descriptor. That is, the argument list entry for an input or output string is actually the address of a string descriptor. All STR$ routines handle both 32-bit and 64-bit descriptors in the argument list.

The calling program allocates a descriptor for the input string that indicates the string's address and length, so that the called routine can find the string's text and operate on it. The calling program also allocates a descriptor for the output string. In addition to length and address fields, each descriptor contains a field (DSC$B_CLASS or DSC64$B_CLASS) indicating the descriptor's class. The STR$ routine reads the class field to determine whether to write the output string as a fixed-length, varying-length, or dynamic string.

To determine the address and length of the data in the input string, Run-Time Library routines call one of the string descriptor analysis routines: LIB$ANALYZE_SDESC, LIB$ANALYZE_SDESC_64, STR$ANALYZE_SDESC, or STR$ANALYZE_SDESC_64.

The STR$ routines provide a centralized facility for analyzing string descriptors, allowing string-handling routines to function independently of the class of the input string. This means that if the Run-Time Library recognizes new string types, only the analysis routine needs to be changed, not the string routines themselves. If you are writing a routine that recognizes all the string types recognized by the Run-Time Library, your routine should first call the appropriate string-descriptor analysis routine to obtain the address and length of the input string.

You can also use the string descriptor analysis routines to find the length of a returned string. Assume that your called routine calls one of the Run-Time Library string-copying routines to create a new string. You now want the called routine to return the actual length of the new string to the calling program. The called routine calls one of the string-descriptor analysis routines to determine this length. This sequence of calls allows you to create the new string without knowing its ultimate length at the time it is created.

The Run-Time Library routines recognize the following classes of string descriptors:

Z---unspecified
S---scalar, fixed-length string
SD---decimal scalar
VS---varying-length string
D---dynamic string
A---array
NCA---noncontiguous array

For a detailed description of these descriptor classes and their fields, see the OpenVMS Calling Standard.

Table 2-1 indicates how the Run-Time Library routines access the fields of the descriptor for input and output string arguments. Given the class of the string and the field of the descriptor, the table shows whether the routine reads, writes, or modifies the field.

**Table 2-1 String Passing Techniques Used by the Run-Time Library**
	String Descriptor Fields
String Type	Class	Length	Pointer
Input Argument to Routines
Input string passed by descriptor	Read	Read	Read
Output Argument from Routines; Called Routine Assumes the Descriptor Class
Output string passed by descriptor, fixed-length	Ignored	Read	Read
Output string passed by descriptor, dynamic	Ignored	Read, can be modified	Read, can be modified
Output Argument from Routines; Calling Program Specifies the Descriptor Class in the Descriptor
Output string, fixed-length--- Descriptor class: S, Z, A, NCA, SD	Read	Read	Read
Output string, dynamic--- Descriptor class: D	Read	Read, can be modified	Read, can be modified
Output string, varying-length--- Descriptor class: VS	Read	MAXSTRLEN is read; CURLEN is modified	Read

2.2.1 Conventions for Reading Input String Arguments

When a calling program passes a string as an argument to a Run-Time Library routine, the argument contains the address of a descriptor. The called routine examines the CLASS field of the descriptor to determine in which fields it can find the length of the string and the first byte of the string's text. For each descriptor class, Table 2-2 indicates which descriptor fields the routine uses to locate this information. For diagrams of the descriptors, see the OpenVMS Calling Standard manual.

**Table 2-2 How Run-Time Library Routines Read Strings**
Class	String Length	Address of First Byte of Data
Z	DSC$W_LENGTH DSC64$Q_LENGTH	DSC$A_POINTER DSC64$PQ_POINTER
S	DSC$W_LENGTH DSC64$Q_LENGTH	DSC$A_POINTER DSC64$PQ_POINTER
D	DSC$W_LENGTH DSC64$Q_LENGTH	DSC$A_POINTER DSC64$PQ_POINTER
A	DSC$L_ARSIZE DSC64$Q_ARSIZE	DSC$A_POINTER DSC64$PQ_POINTER
SD	DSC$W_LENGTH DSC64$Q_LENGTH	DSC$A_POINTER DSC64$PQ_POINTER
NCA	DSC$L_ARSIZE DSC64$Q_ARSIZE	DSC$A_POINTER DSC64$PQ_POINTER
VS	Word at DSC$A_POINTER or at DSC64$PQ_POINTER (CURLEN field)	Value of DSC$A_POINTER + 2 or of DSC64$PQ_POINTER + 2 (byte after CURLEN field)

Note:

If the descriptor class is NCA, it is assumed that the string is actually contiguous.
If the descriptor class is A or NCA, the element size is assumed to be 1 byte.
If the descriptor class is A or NCA and the array being passed is multidimensional, you should be aware of how your language stores arrays (by column or by row).
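The lookup in Table 2-2 can be sketched as a single dispatch on the descriptor class. This is a portable illustration, not the code of the analysis routines: the `Descriptor` dataclass and the `analyze` function are hypothetical, the POINTER field is modeled as an offset into a `bytes` object, and CURLEN is assumed to be a little-endian 16-bit word.

```python
import struct
from dataclasses import dataclass


@dataclass
class Descriptor:
    dclass: str      # descriptor class: "Z", "S", "D", "A", "SD", "NCA", or "VS"
    pointer: int     # POINTER field, modeled as an offset into a bytes "memory"
    length: int = 0  # LENGTH field (element size for A and NCA)
    arsize: int = 0  # ARSIZE field, used only by classes A and NCA


def analyze(desc: Descriptor, memory: bytes) -> tuple[int, int]:
    """Return (string length, offset of first data byte), following Table 2-2."""
    if desc.dclass in ("Z", "S", "D", "SD"):
        return desc.length, desc.pointer
    if desc.dclass in ("A", "NCA"):
        # The element size is assumed to be 1 byte, so ARSIZE is the string length.
        return desc.arsize, desc.pointer
    if desc.dclass == "VS":
        # The length is the CURLEN word at the pointer; the text starts 2 bytes later.
        (curlen,) = struct.unpack_from("<H", memory, desc.pointer)
        return curlen, desc.pointer + 2
    raise ValueError(f"unrecognized descriptor class: {desc.dclass!r}")


memory = b"\x03\x00XYZ\x00\x00\x00"
length, start = analyze(Descriptor("VS", pointer=0, length=6), memory)
print(memory[start:start + length])  # b'XYZ'
```

Keeping the class-specific logic in one function is the design point made earlier in this section: a routine that calls it works unchanged if another class is added to the dispatch.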

2.2.2 Semantics for Writing Output String Arguments

Normally, Run-Time Library routines return the result of an operation in one of the following ways:

The called routine returns the result as a function value in R0/R1. If the result is too large to fit in R0/R1, it is returned as a function value in the first position in the argument list, and the other arguments are shifted one position to the right.
The called routine returns the result as an output argument. The calling program passes to the called routine an argument naming a variable in which the routine writes the output string. In each RTL routine, the access field of an output argument contains "write only".

The STR$ routines that produce string results use the first method to pass the results back to the calling program. Because a result string, by definition, does not fit in R0/R1, the function value from an STR$ routine is placed in the first position in the argument list.

The string manipulation routines in the LIB$ and OTS$ facilities use the second method, returning their results as output arguments.

For example, there are three entry points for the string-copying routine: LIB$SCOPY_DXDX, OTS$SCOPY_DXDX, and STR$COPY_DX. These copy the source string to the destination string. Their formats are as follows:

LIB$SCOPY_DXDX(source-string ,destination-string)
OTS$SCOPY_DXDX(source-string ,destination-string)
STR$COPY_DX(destination-string ,source-string)

Because the STR$ entry point places the result string in the first position, you can call STR$COPY_DX using a function reference in languages that support string functions. In Fortran, for example, you can use a function reference to invoke STR$COPY_DX in the following ways:

CHARACTER*80 STR$COPY_DX
RETURN_STATUS = STR$COPY_DX(DESTINATION_STRING, SOURCE_STRING)

DESTINATION_STRING = STR$COPY_DX(SOURCE_STRING)

If you use the second form, you cannot access the return status, which is used to indicate truncation.

If you use a function reference to invoke a string manipulation routine in a language that does not support the concept of a string function (such as MACRO, BLISS, and Pascal), you must place the destination string variable in the argument list. In Pascal, for example, you can use a function reference to invoke STR$COPY_DX as follows:

STATUS := STR$COPY_DX(DESTINATION_STRING, SOURCE_STRING);

However, the following statement results in an error:

DESTINATION_STRING := STR$COPY_DX(SOURCE_STRING)

In addition to allocating a variable for the output string, the calling program must allocate the space for and fill in the fields of the output string descriptor at compile, link, or run time. High-level languages do this automatically.

When a Run-Time Library routine returns an output string argument to the calling program, the argument list entry is the address of a descriptor. The routine determines the semantics of the output string (fixed, varying, or dynamic) by examining the class of the descriptor for the destination string. Given the class of the output string's descriptor, Table 2-3 specifies the semantics used by Run-Time Library routines when writing the string.

**Table 2-3 Output String Semantics and Descriptor Classes**
Class	Description	Restrictions	Semantics
Z	Unspecified	Treated as class S.	Fixed-length string
S	Scalar, string	None.	Fixed-length string
D	Dynamic string	String length: DSC$W_LENGTH < 2¹⁶ (64K) DSC64$Q_LENGTH < 2⁶⁴	Dynamic-length string
A	Array	Array is one-dimensional (DIMCT = 1). String length: DSC$L_ARSIZE < 2¹⁶ (64K) DSC64$Q_ARSIZE < 2⁶⁴ Length of array elements is 1 byte (LENGTH = 1).	Fixed-length string
SD	Scalar decimal	The DIGITS and SCALE fields are ignored.	Fixed-length string
NCA	Noncontiguous array	Array is one-dimensional (DIMCT = 1). String length: DSC$L_ARSIZE < 2¹⁶ (64K) DSC64$Q_ARSIZE < 2⁶⁴ Length of array elements is 1 byte (LENGTH = 1). Array is contiguous (S1 = LENGTH).	Fixed-length string
VS	Varying string	Current length less than or equal to maximum string length. (CURLEN <= MAXSTRLEN < 2¹⁶ (64K))	Varying-length string
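Table 2-3 amounts to a mapping from descriptor class to output semantics, plus a few preconditions on the array classes. A minimal sketch of that mapping follows. The names `OUTPUT_SEMANTICS`, `output_semantics`, and `array_meets_restrictions` are hypothetical, the size limits on ARSIZE are omitted, and nothing here reflects how the Run-Time Library is actually coded.

```python
OUTPUT_SEMANTICS = {
    "Z": "fixed-length",    # unspecified, treated as class S
    "S": "fixed-length",
    "SD": "fixed-length",   # DIGITS and SCALE fields are ignored
    "A": "fixed-length",
    "NCA": "fixed-length",
    "VS": "varying-length",
    "D": "dynamic-length",
}


def output_semantics(dclass: str) -> str:
    """Return the semantics used to write an output string of the given class."""
    try:
        return OUTPUT_SEMANTICS[dclass]
    except KeyError:
        raise ValueError(f"unrecognized descriptor class: {dclass!r}") from None


def array_meets_restrictions(dclass: str, dimct: int, length: int, s1=None) -> bool:
    """Check the Table 2-3 shape restrictions for classes A and NCA."""
    if dclass not in ("A", "NCA"):
        raise ValueError("restrictions modeled here apply only to classes A and NCA")
    ok = dimct == 1 and length == 1   # one dimension, 1-byte elements
    if dclass == "NCA":
        ok = ok and s1 == length      # stride equals element size: data is contiguous
    return ok


print(output_semantics("NCA"))                      # fixed-length
print(array_meets_restrictions("NCA", 1, 1, s1=1))  # True
```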

When a called routine returns a string whose length cannot be determined by the calling routine, the calling routine should also pass an optional argument to contain the output length. If the output string is a fixed-length string, the length argument reflects the number of characters written, not counting the fill characters.

The output length argument is useful, for instance, when your program is reading variable-length records. The program can read the input strings into a buffer that is large enough to contain the largest. When you want to perform the next operation on the contents of the buffer, the length argument indicates exactly how many characters have been read, so that the program does not need to manipulate the whole buffer.

For example, LIB$GET_INPUT has the optional argument resultant-length. If LIB$GET_INPUT is called with a fixed-length, 5-character string as an argument, and the routine reads a record containing 'ABC', then resultant-length has a value of 3 and the output string contains the characters ABC followed by two blanks. But if the routine reads a record containing the value 'ABCDEFG', resultant-length has a value of 5 and the output string is 'ABCDE'. In either case, the calling program knows exactly how many characters (not counting fillers) the routine has read.
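The two cases in this example can be reproduced with a small model of a routine that fills a fixed-length buffer and reports how much of it holds real data. The function `read_into_fixed` is hypothetical and only imitates the behavior described above; it does not call or represent LIB$GET_INPUT itself.

```python
def read_into_fixed(record: bytes, buffer_length: int) -> tuple[bytes, int]:
    """Model a read into a fixed-length buffer.

    Returns (buffer contents, resultant length), where the resultant length
    counts the characters written, not the trailing fill blanks.
    """
    resultant_length = min(len(record), buffer_length)
    buffer = record[:resultant_length].ljust(buffer_length, b" ")
    return buffer, resultant_length


print(read_into_fixed(b"ABC", 5))      # (b'ABC  ', 3)
print(read_into_fixed(b"ABCDEFG", 5))  # (b'ABCDE', 5)
```

A caller can then work on `buffer[:resultant_length]` and ignore the fill characters.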

A routine such as STR$COPY_DX does not need the length argument, because the calling program can determine the length of the output string. If the output string is dynamic, its length is the same as the length of the source string. If the output string is fixed-length, the number of characters copied is the shorter of the source length and the destination length.
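The calculation the calling program can do for itself is short enough to write out. The helper below is a hypothetical illustration of the rule for a copy. It covers only the dynamic and fixed-length cases named in the paragraph above.

```python
def copied_length(source_length: int, dest_length: int, dest_is_dynamic: bool) -> int:
    """Number of source characters present in the destination after a copy."""
    if dest_is_dynamic:
        # A dynamic destination takes on the length of the source.
        return source_length
    # A fixed-length destination holds at most dest_length characters of the source.
    return min(source_length, dest_length)


print(copied_length(6, 10, dest_is_dynamic=False))  # 6
print(copied_length(6, 3, dest_is_dynamic=False))   # 3
print(copied_length(6, 3, dest_is_dynamic=True))    # 6
```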
