WASD Hypertext Services - Technical Overview

16 - Proxy Services

16.1 - HTTP Proxy Serving
    16.1.1 - Enabling A Proxy Service
    16.1.2 - Proxy Bind
    16.1.3 - Proxy Chaining
    16.1.4 - Controlling Proxy Serving
16.2 - Caching
    16.2.1 - Cache Device
    16.2.2 - Enabling Caching
    16.2.3 - Cache Management
    16.2.4 - Cache Invalidation
    16.2.5 - Cache Retention
    16.2.6 - Reporting and Maintenance
    16.2.7 - PCACHE Utility
16.3 - CONNECT Serving
    16.3.1 - Enabling CONNECT Serving
    16.3.2 - Controlling CONNECT Serving
16.4 - FTP Proxy Serving
    16.4.1 - FTP Query String Keywords
    16.4.2 - "login" Keyword
16.5 - Gatewaying Using Proxy
    16.5.1 - Reverse Proxy
    16.5.2 - One-Shot Proxy
    16.5.3 - DNS Wildcard Proxy
    16.5.4 - Originating SSL
16.6 - Browser Proxy Configuration


A proxy server acts as an intermediary between Web clients and Web servers. It listens for requests from the clients and forwards these to remote servers. The proxy server then receives the responses from the servers and returns them to the clients. Why go to this trouble? There are several reasons, the most common being:

To allow internal clients access to the Internet from behind a firewall. Browsers behind the firewall have full Web access via the proxy system.
To provide controlled access to internal resources for external clients. The proxy server provides a managed gateway through a firewall into an organisation's Web resources.
Many proxy servers provide caching, or local storage, of responses. For frequent or commonly accessed resources this can not only significantly reduce apparent network latency but also greatly reduce the total traffic downloaded by a site.
For anonymity. Although often related directly to firewall security considerations, it can also sometimes be an advantage to just not reveal the exact source of Web transactions from within your local network.

Proxy Serving Quick-Start

No additional software needs to be installed to provide proxy serving. The following steps provide a brief outline of proxy configuration.

Enable proxy serving and specify which particular services are to be proxies (16.1.1 - Enabling A Proxy Service and 9 - Service Configuration).
If proxy caching is required (most probably, see 16.2 - Caching)
- Decide on a cache device, create the cache root directory, modify server startup procedures to include the HT_CACHE_ROOT logical name (16.2.1 - Cache Device).
- Enable caching on required services (16.2.2 - Enabling Caching).
- Adjust relevant cache management configuration parameters if required (16.2.3 - Cache Management).
- If required adjust cache retention parameter (16.2.5 - Cache Retention).
If providing SSL tunnelling (proxy of Secure Sockets Layer transactions) add/modify a service for that (16.3 - CONNECT Serving).
Add HTTPD$MAP mapping rules for controlling this/these services (16.1.4 - Controlling Proxy Serving, 16.3.2 - Controlling CONNECT Serving, and 16.4 - FTP Proxy Serving).
Restart server (HTTPD/DO=RESTART).

Error Messages

When proxy processing is enabled and HTTPD$CONFIG directive [ReportBasicOnly] is disabled it is necessary to make adjustments to the contents of the HTTPD$MSG message configuration file [status] item beginning "Additional Information". Each of the "/httpd/-/statusnxx.html" links

  <A HREF="/httpd/-/status1xx.html">1<I>xx</I></A>
  <A HREF="/httpd/-/status2xx.html">2<I>xx</I></A>
  <A HREF="/httpd/-/status3xx.html">3<I>xx</I></A>
  <A HREF="/httpd/-/status4xx.html">4<I>xx</I></A>
  <A HREF="/httpd/-/status5xx.html">5<I>xx</I></A>
  <A HREF="/httpd/-/statushelp.html">Help</A>

should be changed to include a local host component

  <A HREF="http://local.host.name/httpd/-/status1xx.html">1<I>xx</I></A>
  <A HREF="http://local.host.name/httpd/-/status2xx.html">2<I>xx</I></A>
  <A HREF="http://local.host.name/httpd/-/status3xx.html">3<I>xx</I></A>
  <A HREF="http://local.host.name/httpd/-/status4xx.html">4<I>xx</I></A>
  <A HREF="http://local.host.name/httpd/-/status5xx.html">5<I>xx</I></A>
  <A HREF="http://local.host.name/httpd/-/statushelp.html">Help</A>

If this is not provided, the browser will interpret the links in any error report as relative to the server the proxy was attempting to reach, and the error explanation will not be accessible.
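The link adjustment described above can be scripted rather than edited by hand. The following is a minimal sketch only; the function name is hypothetical and it assumes the links appear in the message text exactly in the HREF="/httpd/-/status..." form shown above.

```python
import re

def absolutize_status_links(html: str, local_host: str) -> str:
    """Prefix server-relative /httpd/-/status*.html links with the local host."""
    # Only HREF values that begin with the server-relative status path are
    # changed; links that are already absolute do not match and are left alone.
    return re.sub(
        r'HREF="(/httpd/-/status[^"]*)"',
        lambda m: f'HREF="http://{local_host}{m.group(1)}"',
        html,
    )

original = '<A HREF="/httpd/-/status4xx.html">4<I>xx</I></A>'
print(absolutize_status_links(original, "local.host.name"))
```

Because already-absolute links are not matched, running the function twice over the same text gives the same result as running it once.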

16.1 - HTTP Proxy Serving

WASD provides a proxy service for the HTTP scheme (protocol).

Proxy serving generally relies on DNS resolution of the requested host name. DNS lookup can introduce significant latency to transactions. To help ameliorate this WASD incorporates a host name cache. To ensure cache consistency the contents are regularly flushed, after which host names must use DNS lookup again, refreshing the information in the cache. The period of this cache purge is controlled with the [ProxyHostCachePurgeHours] configuration parameter.
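The idea of a host name cache that is flushed in its entirety at a fixed interval can be illustrated with a short sketch. This is not WASD's implementation; the class name, the injected resolver and the injected clock are all assumptions made so the behaviour can be shown without real DNS traffic.

```python
import time

class HostNameCache:
    """Toy host name cache that is emptied entirely every purge_hours."""

    def __init__(self, resolver, purge_hours, clock=time.time):
        self._resolver = resolver              # callable: host name -> address
        self._purge_seconds = purge_hours * 3600
        self._clock = clock                    # injectable for demonstration
        self._entries = {}
        self._last_purge = clock()

    def lookup(self, host):
        now = self._clock()
        if now - self._last_purge >= self._purge_seconds:
            self._entries.clear()              # flush everything, not per entry
            self._last_purge = now
        if host not in self._entries:
            self._entries[host] = self._resolver(host)   # the slow lookup
        return self._entries[host]

# Demonstration with a fake resolver and a fake clock (no network access).
lookups = []
fake_now = [0.0]

def fake_resolver(host):
    lookups.append(host)
    return "192.0.2.1"

cache = HostNameCache(fake_resolver, purge_hours=1, clock=lambda: fake_now[0])
cache.lookup("host.name.domain")
cache.lookup("host.name.domain")   # answered from the cache
fake_now[0] = 3600.0               # one hour later the cache is flushed
cache.lookup("host.name.domain")   # resolved again
print(len(lookups))
```

In real use the resolver could be something like socket.gethostbyname. The trade-off is the one described above: a longer purge period saves lookups but tolerates stale addresses for longer.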

When a request is made by a proxy server it is common for it to add a line to the request header stating that it is a forwarded request and the agent doing the forwarding. With WASD proxying this line would look something like this:

  Forwarded: by http://host.name.domain (HTTPd-WASD/8.4.0 OpenVMS/IA64 SSL)

It is enabled using the [ProxyForwarded] configuration parameter.

An additional, and perhaps more widely used facility, is the Squid extension field to the proxied request header supplying the originating client host name or IP address.

  X-Forwarded-For: client.host.name

It is enabled using the [ProxyXForwardedFor] configuration parameter.
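As an illustration of the two header fields described above, the following sketch adds them to a dictionary of request header fields. The function and its parameters are hypothetical, and how WASD treats a request that already carries these fields (for example from an earlier proxy in a chain) is not modelled; the sketch simply sets them.

```python
def add_proxy_headers(headers, proxy_url, agent, client_host,
                      forwarded=True, x_forwarded_for=True):
    """Return a copy of the request headers with the proxy fields added."""
    out = dict(headers)
    if forwarded:
        # Same shape as the example line: "by <proxy URL> (<agent>)"
        out["Forwarded"] = f"by {proxy_url} ({agent})"
    if x_forwarded_for:
        out["X-Forwarded-For"] = client_host
    return out

request = add_proxy_headers(
    {"Host": "www.example.com"},
    proxy_url="http://host.name.domain",
    agent="HTTPd-WASD/8.4.0 OpenVMS/IA64 SSL",
    client_host="client.host.name",
)
for name, value in request.items():
    print(f"{name}: {value}")
```

The two boolean arguments stand in for the [ProxyForwarded] and [ProxyXForwardedFor] settings, each of which can be enabled independently.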

16.1.1 - Enabling A Proxy Service

Proxy serving is enabled on a per-server basis using the [ProxyServing] configuration parameter.

WASD can configure services using the HTTPD$CONFIG [service] directive, the HTTPD$SERVICE configuration file, or even the /SERVICE= qualifier.

HTTPD$CONFIG [Service]

The actual services providing the proxy serving (i.e. the host and port) are specified on a per-service basis. This means it is possible to have proxy and non-proxy services deployed on the one server (on different ports of course). Proxying is enabled by appending the proxy keyword to the particular service specification. The following example shows a non-proxy and proxy service.

  [Service]
  http://alpha.wasd.dsto.defence.gov.au:80
  http://alpha.wasd.dsto.defence.gov.au:8080;proxy

HTTPD$SERVICE

Proxy service configuration using the HTTPD$SERVICE configuration is slightly simpler, with a specific configuration directive for each aspect (9 - Service Configuration). This example illustrates configuring the same services as used in the previous section.

  [[http://alpha.wasd.dsto.defence.gov.au:80]]
 
  [[http://alpha.wasd.dsto.defence.gov.au:8080]]
  [ServiceProxy]  enabled

Examples in the following sections all show configuration using the HTTPD$CONFIG [Service] directive. When the HTTPD$SERVICE configuration file is maintained using the Server Administration facility interface, all relevant proxy directives are provided for selection.

16.1.2 - Proxy Bind

Using the HTTPD$MAP SET proxy=bind=<IP-address> rule it becomes possible to make the outgoing request appear to originate from a particular source. The Network Interface must be able to bind to the specified IP address (i.e. it cannot be an arbitrary address).

  SET http://*.fred.com proxy=bind=131.185.250.1

16.1.3 - Proxy Chaining

Some sites may already be firewalled and have corporate proxy servers providing Internet access. It is quite possible to use WASD proxying in this environment, where the WASD server makes its proxied requests via the next proxy server in the hierarchy. This is known as proxy chaining. Using the chain keyword specify the host name of the next server when enabling the proxy service, as in this example:

  [Service]
  http://alpha.wasd.dsto.defence.gov.au:8080;proxy;chain=next.proxy.host

Chaining may also be controlled on a virtual service or path basis using the HTTPD$MAP SET proxy=chain=<host:port> rule.

  SET http://*.com proxy=chain=next.proxy.host:8080

16.1.4 - Controlling Proxy Serving

Controlling both access-to and access-via proxy serving is possible.

Proxy Password

Access to the proxy service can be directly controlled through the use of WASD authorization. Proxy authorization is distinct from general access authorization. It uses specific proxy authorization fields provided by HTTP, and by this allows a proxied transaction to also supply transaction authorization for the remote server.

The following example shows a service specification using the "pauth" parameter making the proxy service require authorization for use.

  [Service]
  http://alpha.wasd.dsto.defence.gov.au:8080;proxy;pauth

In addition to the service being specified as requiring authorization it is also necessary to configure the source of the authentication. This is done using the HTTPD$AUTH configuration file. The following example shows all requests for the proxy virtual service must be authorized (GET as well as POST, etc.), although it is possible to restrict access to only read (GET), preventing data being sent out via the server.

  [[alpha.wasd.dsto.defence.gov.au:8080]]
  ["Proxy Access"=PROXY_ACCESS=id]
  http://* read+write

Local Password

It is also possible to control proxy access via local authorization, although this is less flexible because it removes the ability to then pass authorization information to the remote service. In other respects it is set up in the same way as proxy authorization, only using the "lauth" parameter.

Access Filtering

Extensive control of how, by whom and what a proxy service is used for may be exercised using WASD general and conditional mapping (13 - Mapping Rules and 13.7 - Conditional Mapping) possibly in the context of a virtual service specification for the particular connect service host and port (13.6 - Virtual Servers). The following examples provide a small indication of how mapping could be used in a proxy service context.

It is possible, though more often not practical, to regulate which hosts are connected to via the proxy service. For example, the following rule forbids accessing any site with the string "hacker" in it (for the proxy service "alpha...:8080").
```
  [[alpha.wasd.dsto.defence.gov.au:8080]]
  pass http://*hacker*/* "403 Proxy access to this host is forbidden."
  pass http://*
```

Or as in the following example, only allow access to specific sites.

  [[alpha.wasd.dsto.defence.gov.au:8080]]
  pass http://*.org/*
  pass http://*.digital.com/*
  pass http://* "403 Proxy access to this host is forbidden."
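The first-match behaviour these rule sets rely on can be sketched in a few lines. This is an illustration only: it uses Python's fnmatch wildcards as a stand-in for WASD's own template matching (which may differ in detail), and the rule table and function name are hypothetical.

```python
from fnmatch import fnmatchcase

# (template, refusal message); None means the request is passed.
# Mirrors the "only allow access to specific sites" example above.
RULES = [
    ("http://*.org/*", None),
    ("http://*.digital.com/*", None),
    ("http://*", "403 Proxy access to this host is forbidden."),
]

def evaluate(url, rules):
    """Return (allowed, message) from the first rule whose template matches."""
    for template, refusal in rules:
        if fnmatchcase(url, template):
            return (refusal is None, refusal)
    return (False, None)   # no rule matched at all

for url in ("http://www.example.org/index.html", "http://www.example.com/"):
    print(url, evaluate(url, RULES))
```

Because evaluation stops at the first match, the order of the rules matters: the catch-all refusal must come last, otherwise it would shadow the more specific pass rules.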

It is also possible to restrict access via the proxy service to selected hosts on the internal subnet. Here only a range of literal addresses plus a single host in another subnet are allowed access to the service.
```
  [[alpha.wasd.dsto.defence.gov.au:8080]]
  pass http://* "403 Restricted access." ![ho:131.185.250.* ho:131.185.200.10]
  pass http://*
```

In the following example POSTing to a particular proxied server is not allowed (why I can't imagine, but hey, this is an example!)

  [[alpha.wasd.dsto.defence.gov.au:8080]]
  pass http://subscribe.sexy.com/* "403 POSTing not allowed." [me:POST]
  pass http://*

It is possible to redirect proxied requests to other sites.

  [[alpha.wasd.dsto.defence.gov.au:8080]]
  redirect http://www.sexy.com/* http://www.disney.com/
  pass http://*

A proxy service is just a specialized capability of a general HTTP service. Therefore it is quite in order for the one service to respond to standard HTTP requests as well as proxy-format HTTP requests. To enforce the use of a particular service as proxy-only, add a final rule to a virtual service's mapping restricting non-proxy requests.
```
  [[alpha.wasd.dsto.defence.gov.au:8080]]
  pass http://*
  pass /* "403 This is a proxy-only service."
```

This example provides the essentials when supporting reverse proxying. Note that mappings may become quite complex when supporting access to resources across multiple internal systems (e.g. access to directory icons).

  [[main.corporate.server.com:80]]
  pass /sales/* http://sales.corporate.server.com/*
  pass /shipping/* http://shipping.corporate.server.com/*
  pass /support/* http://support.corporate.server.com/*
  pass * "403 Nothing to access here!"
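The effect of the reverse proxy rules above, where the wildcard portion of the request path is carried over to the internal server, can be sketched as follows. The table and function are hypothetical and model only simple prefix substitution, not the full mapping rule syntax.

```python
# Path prefix on the public server -> base URL of the internal server,
# taken from the example rules above.
REVERSE_MAP = [
    ("/sales/", "http://sales.corporate.server.com/"),
    ("/shipping/", "http://shipping.corporate.server.com/"),
    ("/support/", "http://support.corporate.server.com/"),
]

def map_reverse(path):
    """Return the internal URL for a request path, or None for the 403 case."""
    for prefix, target in REVERSE_MAP:
        if path.startswith(prefix):
            # Whatever followed the prefix replaces the trailing wildcard.
            return target + path[len(prefix):]
    return None

print(map_reverse("/sales/q3/report.html"))
print(map_reverse("/payroll/"))
```

A path that matches none of the prefixes falls through to the final rule, represented here by None.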

NOTE
To expedite proxy mapping it is recommended to have a final rule for the proxy virtual service that explicitly passes the request. This would most commonly be a permissive pass as in example 1, could quite easily be a restrictive pass as in example 2, or a combination as in example 6.

16.2 - Caching

Caching involves using the local file-system for storage of responses that can be reused when a request for the same URL is made. The WASD server does not have to be configured for caching; it will provide proxied access without any caching taking place.

When a proxied request is processed, and its characteristics would allow the response to be cached, a unique identifier generated from the URL is used to create a corresponding file name. The response header and any body are stored in this file. This may be the data of an HTML page, a graphic, etc.

When a proxied request is being processed, and its characteristics would allow the request to be cached, the unique identifier generated allows for a previously created cache file to be checked for. If it exists, and is current enough, the response is returned from it, instead of from the remote server. If it exists and is no longer current the request is re-made to the remote server, and the response if still cacheable is re-cached, keeping the contents current. If it does not exist the response is delivered from the remote server.
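The lookup flow just described can be summarised in a small sketch. Two things here are assumptions made purely for illustration: the use of an MD5 digest as the "unique identifier generated from the URL" (the actual algorithm is not stated in this section) and the form of the resulting file name. The action names are likewise hypothetical.

```python
import hashlib

def cache_file_name(url):
    """Derive a fixed-length file name from a URL (MD5 is an assumption)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest().upper() + ".HTC"

def cache_action(cacheable, exists, current):
    """Decide how a proxied request is satisfied."""
    if not cacheable:
        return "fetch"                  # straight from the remote server
    if exists and current:
        return "serve-from-cache"       # no network access needed
    if exists:
        return "refetch-and-recache"    # stale: reload, keep the cache current
    return "fetch-and-cache"            # first time this URL has been seen

print(cache_file_name("http://www.example.com/index.html"))
print(cache_action(cacheable=True, exists=True, current=False))
```

The point of deriving the name from the URL is that the same URL always maps to the same file, so checking for a cached copy requires no separate index.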

Not all responses can be cached!

The main criteria are for the response to be successful (200 status), general (i.e. one not in response to a specialized query or action), and not too volatile (i.e. the same page may be expected to be returned more than once, preferably over an extended period).

Proxied requests can only be cached if ...

uses the GET method
does not contain a query string

Proxied responses will only be cached if ...

status code begins with "2" (success)
contains a Last-Modified: header field
one or more hours since the last modification
any Expires: date/time is still in the future
does not contain a Pragma: no-cache field
does not exceed a configuration parameter in size

The [ProxyCacheFileKbytesMax] configuration parameter controls the maximum size of a response before it will not be cached. This can be determined from any "Content-Length:" response header field, in which case it will proactively not be cached, or if during cache load the maximum size of the file increases beyond the specified limit the load is aborted.
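The request and response criteria listed above can be expressed as two predicates. This is a sketch of the stated rules, not the server's code: the function names are hypothetical, header names are assumed to be supplied in exactly the capitalisation shown, a kilobyte is assumed to be 1024 bytes, and only the proactive "Content-Length:" size check is modelled (not the abort during cache load).

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def request_cacheable(method, url):
    """GET method and no query string."""
    return method == "GET" and "?" not in url

def response_cacheable(status, headers, max_kbytes, now=None):
    """Apply the response criteria to a status code and header dictionary."""
    now = now or datetime.now(timezone.utc)
    if not str(status).startswith("2"):
        return False                                   # not a success
    last_modified = headers.get("Last-Modified")
    if last_modified is None:
        return False                                   # age cannot be judged
    if now - parsedate_to_datetime(last_modified) < timedelta(hours=1):
        return False                                   # too recently modified
    expires = headers.get("Expires")
    if expires is not None and parsedate_to_datetime(expires) <= now:
        return False                                   # already expired
    if "no-cache" in headers.get("Pragma", "").lower():
        return False
    length = headers.get("Content-Length")
    if length is not None and int(length) > max_kbytes * 1024:
        return False                                   # larger than the limit
    return True

now = datetime(2024, 1, 2, tzinfo=timezone.utc)
headers = {"Last-Modified": "Mon, 01 Jan 2024 00:00:00 GMT",
           "Content-Length": "2048"}
print(request_cacheable("GET", "http://host.name.domain/page.html"))
print(response_cacheable(200, headers, max_kbytes=100, now=now))
```

A malformed date in either header would raise an exception in this sketch; a real implementation would more likely treat such a response as not cacheable.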

Not all sites may benefit from cache!

As many transactions on today's Web contain query strings, etc., and therefore cannot be meaningfully cached, it should not be assumed the cost/benefit of having a proxy cache enabled is a foregone conclusion. Each site should monitor the proxy traffic reports and decide on a local policy.

The facilities described in 16.2.6 - Reporting and Maintenance allow a reasonably informed decision to be made. Items to be considered:

The ratio of cache reads to network accesses.
The number of non-cacheable requests and responses, particularly as a percentage of total proxy traffic.
The ratio of network to cache traffic, although this may be skewed by having a high ratio of 304 (not-modified) responses from cache (which contain few bytes). Check the cache 304 reporting item.

Last, but by no means least, understanding the characteristics of local usage. For example, are there a small number of requests generating lots of non-cacheable traffic? For instance, a few users accessing streaming content.

16.2.1 - Cache Device

Selection of a disk device for supporting the proxy cache should not be made without careful consideration, doubly so if significant traffic is experienced. Here are some common-sense suggestions.

avoid locating it as a subdirectory of HT_ROOT:[000000]
use a disk with as little other activity as possible (both I/O and space usage)
use a disk with as much free space as possible
use the fastest disk available

Initially the directory will need to be created. This can be done manually as described below, or if using the supplied server startup procedures (STARTUP.COM) it is checked for and if it does not exist is automatically created during startup. The directory must be owned by the HTTP$SERVER account and have full read+write+execute+delete access. It is suggested to name it [HT_CACHE]; it may be created manually using the following command.

  $ CREATE /DIR /OWN=HTTP$SERVER /PROT=(O:RWED,G,W) device:[HT_CACHE]

It is a relatively simple matter to relocate the cache at any stage. Simply create the required directory in the new location, modify the startup procedures to reflect this, shut the server down completely then restart it using the procedures (not a /DO=RESTART!). The contents of the previous location could be transferred to the new using the BACKUP utility if desired.

HT_CACHE_ROOT Logical

It is required to define the logical name HT_CACHE_ROOT if any proxy services are specified in the server configuration. The server will not start unless it is correctly defined. The logical should be a concealed device logical specifying the top level directory of the cache tree. The following example shows how to define such a logical name.

  $ DEFINE /SYSTEM /EXEC /TRANSLATION=CONCEALED HT_CACHE_ROOT device:[HT_CACHE.]

If the example startup procedure is in use then it is quite straightforward to have the logical created during server startup (STARTUP.COM).

16.2.2 - Enabling Caching

Caching may be enabled on a per-service basis. This means it is possible to have a caching proxy service and a non-caching service active on the one server. Caching is enabled by appending the cache keyword to the particular service specification. The following example shows a non-proxy and a caching proxy service.

  [Service]
  http://alpha.wasd.dsto.defence.gov.au:80
  http://alpha.wasd.dsto.defence.gov.au:8080;proxy;cache

Proxy caching may be selectively disabled for a particular site, sites or paths within sites using the set nocache mapping rule. This rule, used to disable caching for local requests, also disables proxy file caching for that subset of requests. This example shows a couple of variations.

  [[alpha.wasd.dsto.defence.gov.au:8080]]
  # disable caching for local site's servers that respond fairly quickly
  set http://*.local.domain/* nocache
  # disable caching of log files
  set http://*.log nocache
  pass http://*

NOTE
It is also recommended to place the cache directory under some authorization control to prevent casual browsing and access of the cache contents. Something local, similar in intention to
[VMS] /ht_cache_root/* ~webadmin,131.185.250.*,r+w ;

16.2.3 - Cache Management

As the proxy cache is implemented using the local file system, management of the cache implies controlling the number of, and exactly which files remain in cache. Essentially then, management means when and which to delete. The [ProxyReportLog] configuration parameter enables the server process log reporting of cache management activities.

Cache file deletion has three variants.

ROUTINE
This ensures files that have not been accessed within specified limits are periodically and regularly deleted. The [ProxyCacheRoutineHourOfDay] configuration parameter controls this activity.
The ROUTINE form occurs once per day at the specified hour. The cache files are scanned looking for those that exceed the configuration parameter for maximum period since last access, which are then deleted (the largest number of [ProxyCachePurgeList], as described below).
BACKGROUND
Setting the [ProxyCacheRoutineHourOfDay] configuration parameter to 24 enables background purging.
In this mode the server continuously scans through the cache files in the same manner as for ROUTINE purging. The difference is it is not all done in a single burst once a day, pushing disk activity to its maximum. The background purge regulates the period between each file access, pacing the scan so that the entire cache is passed through once a day. It adjusts this pace according to the size of the cache.
REACTIVE
This is a remedial action, when cache device usage is reaching its configuration limit and files need to be deleted to free up space. The following parameters control this behaviour.

[ProxyCacheDeviceCheckMinutes]
[ProxyCacheDeviceMaxPercent]
[ProxyCacheDevicePurgePercent]
[ProxyCachePurgeList]

The cache device space usage is checked at the specified interval.
If the device reaches the specified percentage used, a cache purge is initiated, deleting files until the specified reduction in the total space in use on the disk is attained.
The cache files are scanned using the [ProxyCachePurgeList] parameter described below, working from the greatest to least number of hours in the steps provided. At each scan files not accessed within that period are deleted. After every few files deleted the device free space is checked to see whether it has reached the lower purge percentage limit, at which point the scan terminates.
This parameter has as its input a series of comma-separated integers representing a series of hours since files were last accessed. In this way the cache can be progressively reduced until percentage usage targets are realized. Such a parameter would be specified as follows,
```
  [ProxyCachePurgeList] 168,48,24,8,0
```
meaning the purge would first delete files not accessed in the last week, then not for the last two days, then the last twenty-four hours, then eight, then finally all files. The largest of the specified periods (in this case 168) is also used as the limit for the ROUTINE scan and file delete.
Once the target reduction percentage is reached the purge stops. During the purge operation further cache files are not created. Even when cache files cannot be created for any reason proxy serving still continues transparently to the clients.
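The stepped nature of the reactive purge can be seen in a small simulation. It is a sketch under stated assumptions: the function and its arguments are hypothetical, each file's size is expressed directly as a percentage of the device, usage is checked after every deletion rather than every few files, and stop_percent stands for the lower limit without modelling how it is derived from [ProxyCacheDevicePurgePercent].

```python
def reactive_purge(files, used_percent, purge_hours, max_percent, stop_percent):
    """Simulate a reactive purge.

    files maps a name to (hours_since_last_access, percent_of_device_used)
    and is modified in place. Returns (deleted_names, used_percent_afterwards).
    """
    deleted = []
    if used_percent < max_percent:
        return deleted, used_percent            # limit not reached, no purge
    for hours in sorted(purge_hours, reverse=True):     # greatest to least
        for name, (idle_hours, size_percent) in list(files.items()):
            if idle_hours >= hours:             # not accessed in that period
                del files[name]
                deleted.append(name)
                used_percent -= size_percent
                if used_percent <= stop_percent:
                    return deleted, used_percent        # target reached
    return deleted, used_percent

files = {"a": (200, 10.0), "b": (50, 10.0), "c": (30, 10.0), "d": (1, 10.0)}
deleted, used = reactive_purge(files, 90.0, [168, 48, 24, 8, 0],
                               max_percent=85, stop_percent=65)
print(deleted, used)
```

In this run the least recently used files go first and the purge stops as soon as the target is met, so the most recently accessed file survives even though the list ends in 0.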
NOTE
Cache files can be manually deleted at any time (from the command line) without disturbing the proxy-caching server and without rebuilding any databases. When deleting, the /BEFORE=date/time qualifier can be used, with /CREATED being the document's last-modified date, /REVISED being the last time it was loaded, and /EXPIRED the last time the file was accessed (used to supply a request). Be aware that on an active server it is quite possible some files may be locked at time of attempted deletion.

From The Command-Line

If [ProxyCacheRoutineHourOfDay] is empty or non-numeric the automatic, once-a-day routine purge of the cache by the server is disabled and it is expected to be performed via some other mechanism, such as a periodic batch job. This allows routine purging more or less frequently than is provided-for by server configuration, and/or the purge activity being performed by a process or cluster node other than that of the HTTPd server (reducing server and/or node impact of this highly I/O intensive activity). Progress and other messages are provided via SYS$OUTPUT, and if configured in the [Opcom...] directives to the operator log and designated operator terminal as well. If a process already has the cache locked the initiated activity aborts.

The following example shows a routine purge being performed from the command-line. This form uses the hours from [ProxyCachePurgeList].

  $ HTTPD /PROXY=PURGE=ROUTINE

A variant on this allows the maximum age to be explicitly specified.

  $ HTTPD /PROXY=PURGE=ROUTINE=168

Reactive purging and statistic scans may also be initiated from the command line. For a reactive purge the first number can be the device usage percentage (indicated by the trailing "%"), if not the configuration limit is used.

  $ HTTPD /PROXY=PURGE=REACTIVE=80%,168,48,24,8,0
  $ HTTPD /PROXY=CACHE=STATISTICS

Any in-progress scan of the cache (i.e. reactive or routine purges, or a statistics scan) can be halted from the command line (and online Server Administration facility).

  $ HTTPD /PROXY=STOP=SCAN

16.2.4 - Cache Invalidation

For the purposes of this document, cache invalidation is defined as the determination when a cache file's data is no longer valid and needs to be reloaded.

The method used for cache validation is deliberately quite simple in algorithm and implementation. In this first attempt at a proxy server the overriding criteria have been efficiency, simplicity of implementation, and reliability. Wishing to avoid complicated revalidation using behind-the-scenes HEAD requests the basic approach has been to just invalidate the cache item upon expiry of a period related to its "Last-Modified:" age or upon a no-cache request, both described further below.

If a "Pragma: no-cache" request header field is present (as is generated by Netscape Navigator when using the reload function) then the server should completely reload the response from the remote server. (Too often the author seems to have received incomplete responses where the proxy server caches only part of a response and has seemed to refuse to explicitly re-request.) OK it's a bit more expensive but who's to say the proxy server is right all the time! The response is still cached ... the next request may not have the no-cache parameter.
When a response is cached the file creation date/time is set to the local equivalent of the "Last-Modified:" GMT date and time supplied with the response. In this manner the file's absolute age can be determined quickly and easily from the file header. This is used as described in 16.2.5 - Cache Retention.
When a file is cached, the revision and expires date/times are set to current. The revision date/time is used when assessing when the file was last loaded/validated/reloaded. Once a file is cached the RMS expires date/time is updated every time it is subsequently accessed. In this way recency of usage of the item can be easily tracked, allowing the routine and reactive purges to operate by merely checking the file header.

The revision count (automatically updated by VMS) tracks the absolute number of accesses since the file was created (actually a maximum of 65535, or an unsigned short, but that should be enough for informational purposes).

16.2.5 - Cache Retention

The [ProxyCacheReloadList] configuration parameter is used to control when a file being accessed is reloaded from source.

This parameter supplies a series of integers representing the hours after which an access to a cache file causes the file to be invalidated and reloaded from its source during the proxied request. Each number in the series represents the lower boundary of the range between it and the next number of hours. A file with a last-loaded age falling within a range is reloaded at the lower boundary of that particular range. The following example

  [ProxyCacheReloadList] 1,2,4,8,12,24,48,96,168

would result in a file 1.5 hours old being reloaded every hour, 3.25 hours old every 2 hours, 7 hours old every 4 hours, etc. Here "old" means since last (or of course first) loaded. Files not reloaded since the final integer, in this example 168 (one week), are always reloaded.
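The range lookup described above amounts to finding the largest list entry that does not exceed the file's age. The sketch below reproduces the figures in the example; the function name and return conventions are hypothetical, and the treatment of a file younger than the first entry (no reload due) is an assumption, since that case is not described here.

```python
from bisect import bisect_right

def reload_interval_hours(age_hours, reload_list):
    """Return the reload interval, in hours, for a file of the given age.

    None means the age is below the first boundary (assumed: no reload due).
    0 means the age is at or beyond the final boundary (always reload).
    """
    boundaries = sorted(reload_list)
    index = bisect_right(boundaries, age_hours)   # entries <= age_hours
    if index == 0:
        return None
    if index == len(boundaries):
        return 0
    return boundaries[index - 1]                  # lower boundary of the range

RELOAD_LIST = [1, 2, 4, 8, 12, 24, 48, 96, 168]
for age in (1.5, 3.25, 7, 200):
    print(age, reload_interval_hours(age, RELOAD_LIST))
```

Widely spaced entries at the upper end of the list mean that older, presumably more stable, files are revalidated less often than recently loaded ones.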

16.2.6 - Reporting and Maintenance

The HTTPDMON utility allows real-time monitoring of proxy serving activity (23.8 - HTTPd Monitor).

Proxy reports and some administrative control may be exercised from the online Server Administration facility (18 - Server Administration). The information reported includes:

some proxy serving statistics
current cache device status
whether cache space is available
if a purge is in progress
the results from the last routine and reactive purges
the results from the last scan of the cache
contents of the host name/address cache

The following actions can be initiated from this menu. Note that three of these relate to proxy file cache and so may take varying periods to complete, depending on the number of files. If the cache is particularly large the scan/purge may take some considerable time.

generate proxy cache statistics by scanning the entire cache
perform a routine purge
perform a reactive purge
purge the proxy host name/address cache

Also available from the Server Administration facility is a dialog allowing the proxy characteristics of the running server to be adjusted on an ad hoc basis. This only affects the executing server; to make changes to permanent configuration the HTTPD$CONFIG configuration file must be changed.

This dialog can be used to modify the device free space percentages according to recent changes in device usage, alter the reload or purge hour list characteristics, etc. After making these changes a routine or reactive purge will automatically be initiated to reduce the space in use by the proxy cache if implied by the new settings.

16.2.7 - PCACHE Utility

It is often useful to be able to list the contents of the proxy cache directory or the characteristics or contents of a particular cache file. Cache files have a specific internal format and so require a tool capable of dealing with this. The HT_ROOT:[SRC.UTILS]PCACHE.C program provides a versatile command-line utility as well as CGI(plus) script, making cache file information accessible from a browser. It also allows cache files to be selected by wildcard filtering on the basis of the contents of the associated URL or response header. For detailed information on the various command-line options and CGI query-string options see the description at the start of the source code file.

Command-Line Use

Make the HT_EXE:PCACHE.EXE executable a foreign verb. It is then possible to

list the basic characteristics of all/selected files in the cache directory tree
list the characteristics plus the HTTP response header of a single file
extract the response header
extract the response body (text, graphic, file, etc.)
do all of the above while filtering on URL or response header contents, number of hits, when last accessed, last loaded, and last modified (in hours)

Script Use

To make the PCACHE script available to the server ensure the following line exists in the HTTPD$CONFIG configuration file in the [AddType] section.

  .HTC  application/x-script  /cgiplus-bin/pcache  WASD proxy cache file

The following rule needs to be in the HTTPD$MAP configuration file.

  pass /ht_cache_root/*

NOTE
It is also recommended to place the utility and the cache directory under some authorization control to prevent casual browsing and access of the cache contents. Something local, similar in intention to
[VMS] /pcache/* ~webadmin,131.185.250.*,r+w ; /ht_cache_root/* ~webadmin,131.185.250.*,r+w ;

Once available the following is then possible.

From a directory listing ("Index Of") access a cache file and be presented with the following information:
- blocks used/allocated
- last modification date/time of the response
- date/time the response was (re)loaded into cache
- date/time the cache file was last accessed
- number of times the cache file has been accessed since it was first created
- the URL the cache file represents (as a link)
- the full response header (as received from the proxied server)
- a series of "buttons" allowing
  - the cache content (response body) to be viewed (note that self-relative embedded graphics, etc., probably will not be displayed in such documents)
  - the cache file to be VMS DUMPed
  - the cache file to be VMS ANALYZE/RMSed
  - the cache file to be VMS DELETEd
If the configuration changes described above have been made the following link will return such an index.
/ht_cache_root/
Have the utility generate a form providing a convenient interface to the various capabilities and filters available. If the configuration changes described above have been made the following link will return this form.
PCACHE
The utility's form does not have to be used. By supplying the appropriate query string components, either from a custom form or forms, or directly embedded into links, profiles, listings and deletions may be generated.

NOTE
Cache directory trees have the potential to become heavily populated, so the use of the script to generate listings of the cache contents could return extremely large listing documents.

16.3 - CONNECT Serving

The connect service provides firewall proxying for any connection-oriented TCP/IP access. Essentially it provides the ability to tunnel any other protocol via a Web proxy server. In the context of Web services it is most commonly used to provide firewall-transparent access for Secure Sockets Layer (SSL) transactions.

The WASD CONNECT service implements the de facto standard HTTP CONNECT method, described in a number of Internet Drafts.
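
As a rough illustration of what a client sends when asking a proxy for such a tunnel, the following Python sketch builds a CONNECT request. The function name and the example host are hypothetical, and the request layout follows common HTTP CONNECT usage rather than anything specific to WASD.

```python
def build_connect_request(host: str, port: int = 443) -> bytes:
    """Return the bytes a client might send to request a tunnel to host:port."""
    target = f"{host}:{port}"
    lines = [
        f"CONNECT {target} HTTP/1.1",  # request line names the tunnel end-point
        f"Host: {target}",
        "",                            # blank line terminates the header
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    # "the.remote.host" is a placeholder, not a real server
    print(build_connect_request("the.remote.host").decode("ascii"))
```

Once the proxy accepts the request, everything subsequently written to the connection is relayed unchanged to the target.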

16.3.1 - Enabling CONNECT Serving

As with proxy serving in general, CONNECT serving may be enabled on a per-service basis using the HTTPD$CONFIG [service] directive, the HTTPD$SERVICE configuration file, or even the /SERVICE= qualifier.

HTTPD$CONFIG [Service]

The actual services providing the CONNECT access (i.e. the host and port) are specified on a per-service basis. This means it is possible to have CONNECT and non-CONNECT services deployed on the one server, as part of a general proxy service or standalone. CONNECT proxying is enabled by appending the connect keyword to the particular service specification. The following example shows non-proxy and proxy services, with and without additional connect processing enabled.

  [Service]
  http://alpha.wasd.dsto.defence.gov.au:80
  http://alpha.wasd.dsto.defence.gov.au:8080;proxy
  http://alpha.wasd.dsto.defence.gov.au:8081;connect
  http://alpha.wasd.dsto.defence.gov.au:8082;proxy;connect

HTTPD$SERVICE

Proxy service configuration using the HTTPD$SERVICE configuration is slightly simpler, with a specific configuration directive for each aspect (9 - Service Configuration). This example illustrates configuring the same services as used in the previous section.

  [[http://alpha.wasd.dsto.defence.gov.au:80]]
 
  [[http://alpha.wasd.dsto.defence.gov.au:8080]]
  [ServiceProxy]  enabled
 
  [[http://alpha.wasd.dsto.defence.gov.au:8081]]
  [ServiceProxySSL]  enabled
 
  [[http://alpha.wasd.dsto.defence.gov.au:8082]]
  [ServiceProxy]  enabled
  [ServiceProxySSL]  enabled

16.3.2 - Controlling CONNECT Serving

The connect service poses a significant security dilemma when in use in a firewalled environment. Once a CONNECT service connection has been accepted and established it essentially acts as a relay to whatever data is passed through it. Therefore any transaction whatsoever can occur via the connect service, which in many environments may be considered undesirable.

In the context of the Web, and the use of the connect service for proxying SSL transactions, it may be advisable to restrict possible connections to the well-known SSL port, 443. This may be done using conditional mapping rules, as in the following example:

  [[alpha.wasd.dsto.defence.gov.au:8080]]
  pass *:443 [me:connect]
  pass * "403 CONNECT only allowed to port 443." [me:connect]

All of the comments on the use of general and conditional mapping made in 16.1.4 - Controlling Proxy Serving can also be applied to the connect service.
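
The effect of the two rules in the example above can be pictured with a small Python sketch. It is only a model of the decision being made, not of how the server evaluates mapping rules, and the function name is hypothetical.

```python
from typing import Tuple


def check_connect_target(target: str) -> Tuple[int, str]:
    """Model the port-443-only policy for a CONNECT target of the form host:port."""
    host, sep, port = target.rpartition(":")
    if sep and host and port == "443":
        return 200, "tunnel permitted"
    # anything else falls through to the second rule in the example
    return 403, "CONNECT only allowed to port 443."


if __name__ == "__main__":
    for candidate in ("the.remote.host:443", "the.remote.host:22"):
        print(candidate, check_connect_target(candidate))
```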

16.4 - FTP Proxy Serving

WASD provides a proxy service for the FTP scheme (protocol). This provides the facility to list directories on the remote FTP server, and to download and upload files.

The (probable) file system of the FTP server host is determined by examining the results of an FTP PWD command. If it returns a current working directory specification containing a "/" then it is assumed to be Unix(-like), if ":[" then VMS, if a "\" then DOS. (Some DOS-based FTP servers respond with a Unix-like "/" so a second level of file-system determination is undertaken with the first entry of the actual listing.) Anything else is unknown and reported as such.
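
The first-level check described above might be sketched in Python as follows. The function name and the sample PWD replies are illustrative only, and the second-level check against the first listing entry is deliberately left out.

```python
def guess_ftp_filesystem(pwd_reply: str) -> str:
    """Guess the remote file system from the text of an FTP PWD reply."""
    if "/" in pwd_reply:
        return "unix"      # Unix(-like), tested first as described
    if ":[" in pwd_reply:
        return "vms"
    if "\\" in pwd_reply:
        return "dos"
    return "unknown"       # reported as such


if __name__ == "__main__":
    # sample replies are made up for illustration
    for reply in ('257 "/pub" is current directory.',
                  '257 "DISK$USER:[ANONYMOUS]" is current directory.',
                  '257 "C:\\PUB" is current directory.'):
        print(guess_ftp_filesystem(reply), "<-", reply)
```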

Note that the content-type of the transfer is determined by the way the proxy server interprets the FTP request path's "file" extension. This may or may not correspond with what the remote system might consider the file type to be. The default content-type for unknown file types is "application/octet-stream" (binary). When the alt query string keyword is used, the icon for each file in a listing provides an alternate content-type. If the file link provides a text document then the icon will provide a binary file. If the link returns a binary file then the icon will return a file with a plain-text content-type.

In addition to content-type, the FTP mode in which the file transfer occurs is determined by either of two conditions. If the content-type is "text/.." then the transfer mode will be ASCII (i.e. record carriage-control adjusted between systems). If not text then the file is transferred in Image mode (i.e. a binary, opaque octet-stream). For any given content-type this default behaviour may be adjusted using the [AddType] directive (8.2 - Alphabetic Listing), or the "#!+" MIME.TYPES directive (6.6.2 - MIME.TYPES).
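
A minimal Python sketch of the default behaviour just described: the content-type is looked up from the path's extension, and the transfer mode follows from the content-type. The small extension table is a made-up stand-in for the server's configured types, and the function names are hypothetical.

```python
import posixpath

# illustrative stand-in for the configured content-types
EXAMPLE_TYPES = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".gif": "image/gif",
}
DEFAULT_TYPE = "application/octet-stream"  # default for unknown file types


def content_type_for(path: str) -> str:
    """Choose a content-type from the request path's "file" extension."""
    extension = posixpath.splitext(path)[1].lower()
    return EXAMPLE_TYPES.get(extension, DEFAULT_TYPE)


def transfer_mode_for(content_type: str) -> str:
    """text/.. is transferred in ASCII mode, everything else in Image mode."""
    return "ASCII" if content_type.lower().startswith("text/") else "IMAGE"


if __name__ == "__main__":
    for path in ("/pub/README.TXT", "/pub/logo.gif", "/pub/data.bin"):
        ctype = content_type_for(path)
        print(path, ctype, transfer_mode_for(ctype))
```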

The following rules are required in HTTPD$MAP for mapping FTP proxy. They are preferably made against the virtual service providing the FTP proxy. The service must explicitly make the icon path used available, or it must be available to the proxy service in some other part of the mappings. The general requirement for error message URLs also applies to FTP proxying (Error Messages).

  [[proxy.host.name:8080]]
  pass http://* http://* 
  pass ftp://* ftp://* 
  pass /*/-/* /ht_root/runtime/*/*

16.4.1 - FTP Query String Keywords

Keywords added to an FTP request query string allow the basic FTP action to be somewhat tailored. These case-insensitive keywords can be in the form of query keys or query form fields and values. This allows considerable flexibility in how they are supplied, allowing easy use from a browser URL field or for inclusion as form fields.

FTP Query String Keywords

Keyword	Description
alt	Adds alternate access (complementary content-type at the icon) for directory listings.
ascii	Force the file transfer type to be done as ASCII (i.e. with carriage-control conversion between systems with different representations).
content	Explicitly specify the content type for the returned file (e.g. "content:text/plain", or "content=image/gif").
dos	When generating a directory listing force the interpretation to be DOS.
email	Explicitly specify the anonymous access email address (e.g. "email:daniel@wasd.vsm.com.au" or "email=daniel@wasd.vsm.com.au").
image	Force the file transfer type to be done as an opaque binary stream of octets.
list	Displays the actual directory plain-text listing returned by the remote FTP server. Can be used for problem analysis.
login	Results in the server prompting for a username and password pair that are then used as the login credentials on the remote FTP server.
octet	Force the content-type of the file returned to be specified as "application/octet-stream".
text	Force the content-type of the file returned to be specified as "text/plain".
unix	When generating a directory listing force the interpretation to be Unix.
upload	Causes the server to return a simple file transfer form allowing the upload of a file from the local system to the remote FTP server.
vms	When generating a directory listing force the interpretation to be VMS.
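
To make the two accepted forms concrete ("keyword:value" and "keyword=value"), here is a hedged Python sketch of how such a query string could be interpreted. It assumes keywords are separated by "&" and does no URL-decoding; both are simplifications, and the function name is hypothetical.

```python
import re

# keywords taken from the table above
KNOWN_KEYWORDS = {
    "alt", "ascii", "content", "dos", "email", "image", "list",
    "login", "octet", "text", "unix", "upload", "vms",
}


def parse_ftp_keywords(query: str) -> dict:
    """Return {keyword: value} for recognised keywords (value may be empty)."""
    found = {}
    for item in query.split("&"):      # assumption: "&" separates keywords
        if not item:
            continue
        pieces = re.split(r"[:=]", item, maxsplit=1)
        name = pieces[0].lower()       # keywords are case-insensitive
        value = pieces[1] if len(pieces) > 1 else ""
        if name in KNOWN_KEYWORDS:
            found[name] = value
    return found


if __name__ == "__main__":
    print(parse_ftp_keywords("ALT&content:text/plain"))
```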

16.4.2 - "login" Keyword

The usual mechanism for supplying the username and password for access to a non-anonymous proxied FTP server area is to place it as part of the request line (i.e. "ftp://username:password@the.host.name/path/"). This has the obvious disadvantage that it's there for all and sundry to see.

The "login" query string keyword is provided to work around the most obvious of these issues: having the authentication credentials visible as part of the request URL. When this keyword is placed in the request query string the FTP proxy requests the browser to prompt for authentication (i.e. returns a 401 status). When request header authentication data is present it is used as the remote FTP server username and password. Hence the remote username and password never need to appear in plain-text on screen or in server logs.
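
The second half of that exchange, recovering a username and password from request header authentication data, might look like the following Python sketch. It assumes the browser responds with the HTTP Basic scheme, which the text above does not state; the function name is hypothetical.

```python
import base64
from typing import Optional, Tuple


def ftp_login_from_authorization(header_value: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (username, password) from a Basic "Authorization:" value.

    None means no usable credentials were supplied, in which case a proxy
    would answer with a 401 status so the browser prompts the user.
    """
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":      # assumption: only Basic is handled here
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:                 # covers bad base64 and bad UTF-8
        return None
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    return username, password


if __name__ == "__main__":
    sample = "Basic " + base64.b64encode(b"username:password").decode("ascii")
    print(ftp_login_from_authorization(sample))
```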

16.5 - Gatewaying Using Proxy

WASD is fully capable of mapping non-proxy into proxy requests, with various limitations on effectiveness considering the nature of what is being performed.

Gatewaying between request schemes (protocols)

HTTP to HTTP (a gateway of sorts - standard proxy)

HTTP to HTTP-over-SSL (non-secure to secure)

HTTP to FTP

HTTP-over-SSL to HTTP (secure to non-secure)

HTTP-over-SSL to HTTP-over-SSL (secure to secure)

HTTP-over-SSL to FTP

and also gatewaying between IP versions

IPv4 to IPv6

IPv6 to IPv4

All can be useful for various reasons. One example might be where a script is required to obtain a resource from a secure server via SSL. The script can either be made SSL-aware, sometimes a not insignificant undertaking, or it can use standard HTTP to the proxy and have that access the required server via SSL. Another example might be accessing an internal HTTP resource from an external browser securely, with SSL being used from the browser to the proxy server, which then accesses the internal HTTP resource on its behalf.

Request Redirect

The basic mechanism allowing this gatewaying is "internal" redirection. The redirect mapping rule (13.4.2 - REDIRECT Rule) either returns the new URL to the originating client (requiring it to reinitiate the request) or begins reprocessing the request internally (transparently to the client). It is this latter function that is obviously used for gatewaying.

16.5.1 - Reverse Proxy

The use of WASD proxy serving as a firewall component assumes two configured network interfaces on the system, one of which is connected to the internal network, the other to the external network. (Firewalling could also be accomplished using a single network interface with a router blocking external access to all but the server system.) Outgoing (internal to external) proxying is the most common configuration, however a proxy server can also be used to provide controlled external access to selected internal resources. This is sometimes known as reverse proxy and is a specific example of WASD's general non-proxy to proxy request redirection capability (16.5 - Gatewaying Using Proxy).

In this configuration the proxy server is contacted by an external browser with a standard HTTP request. Proxy server rules map this request onto a proxy-request format result. For example:

  redirect /sales/* /http://sales.server.com/*?

Note that the trailing question-mark is required to propagate any query string (13.4.2 - REDIRECT Rule).

The server recognises the result format and performs a proxy request to a system on the internal network. Note that the mappings required could become quite complex, but it is possible. See example 7 in 16.1.4 - Controlling Proxy Serving.

Redirection Location Field

If a reverse proxied server returns a redirection response (302) containing a "Location: url" field with the host component the same as the reverse-proxied-to server it can be rewritten to instead contain the proxy server host. If these do not match the rewrite does not occur. Using the redirection example above, the SET mapping rule proxy=reverse=location specifies the path that will be prefixed to the path component in the location field URL. Usually this would be the same path used to map the reverse proxy redirect (in this example "/sales/"), though it could be any string (presumably detected and processed by some other part of the mapping).

  set /sales/* proxy=reverse=location=/sales/
  redirect /sales/* /http://sales.server.com/*?

This could be simplified a little by using a postfix SET rule along with the original redirect.

  redirect /sales/* /http://sales.server.com/*? proxy=reverse=location=/sales/

If the proxy=reverse=location=<string> ends in an asterisk the entire 302 location field URL is appended (rather than just the path) resulting in something along the lines of

  Location: http://proxy.server.com/sales/http://sales.server.com/path/

which once redirected by the client can be subsequently tested for and some action made by the proxy server according to the content (just a bell or whistle ;^).
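
The location rewriting described in this section can be modelled roughly in Python as below. This is a sketch of the described behaviour only: the function and parameter names are hypothetical, the proxy's own base URL is passed in explicitly, and avoiding a doubled "/" when joining prefix and path is an assumption.

```python
from urllib.parse import urlsplit


def rewrite_location(location: str, proxied_host: str,
                     proxy_base: str, prefix: str) -> str:
    """Rewrite a 302 "Location:" URL so it points back at the proxy.

    proxied_host  host the reverse proxy forwards to (e.g. sales.server.com)
    proxy_base    scheme and host of the proxy (e.g. http://proxy.server.com)
    prefix        the proxy=reverse=location string (e.g. /sales/ or /sales/*)
    """
    parts = urlsplit(location)
    if parts.netloc.lower() != proxied_host.lower():
        return location                    # hosts differ, no rewrite occurs
    if prefix.endswith("*"):
        # trailing asterisk: the entire location URL is appended
        return proxy_base + prefix[:-1] + location
    tail = parts.path.lstrip("/")          # assumption: avoid a doubled "/"
    if parts.query:
        tail += "?" + parts.query
    return proxy_base + prefix + tail


if __name__ == "__main__":
    print(rewrite_location("http://sales.server.com/path/", "sales.server.com",
                           "http://proxy.server.com", "/sales/*"))
```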

Authorization Verification

WASD can authorize reverse proxy requests locally (perhaps from the SYSUAF) and rewrite that username into the proxied request's "Authorization: ..." field. The proxied-to server can then verify that the request originated from the proxy server and extract and use that username as authenticated.

This functionality is described in the HT_ROOT:[SRC.HTTPD]PROXYVERIFY.C module.

16.5.2 - One-Shot Proxy

This looks a little like reverse proxy, providing access to a non-local resource via a standard (non-proxy) request. The difference is that it allows the client to determine which remote resource is accessed. This works quite effectively for non-HTML resources (e.g. image, binary files, etc.) but non-self-referential links in HTML documents will generally be inaccessible to the client. This can provide scripts with access to protocols they do not support, as with HTTP to FTP, HTTP to HTTP-over-SSL, etc.

Mappings appropriate to the protocols to be supported must be made against the proxy service. Of course mapping rules may also be used to control who connects, and to what.

  [[the.proxy.service:port]]
  # support "one-shot" non-proxy to proxy redirect
  redirect  /http://*   http://*
  redirect  /https://*  https://*
  redirect  /ftp://*    ftp://*
  # OK to process these (already, or now) proxy format requests
  pass  http://*   http://*
  pass  https://*  https://*
  pass  ftp://*    ftp://*

The client may then provide the desired URL as the path of the request to the proxy service. Notice that the scheme provided in the desired URL can be any supported by the service and its mappings.

  http://the.proxy.service:port/http://the.remote.host/path
  http://the.proxy.service:port/https://the.remote.host/path
  http://the.proxy.service:port/ftp://the.remote.host/pub/
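
Seen from the proxy's side, the redirect rules above strip the leading "/" from a path that begins with a supported scheme. A small Python model of that step (hypothetical function name, not the server's implementation):

```python
from typing import Optional

SUPPORTED_SCHEMES = ("http://", "https://", "ftp://")


def one_shot_target(request_path: str) -> Optional[str]:
    """Return the embedded URL for a one-shot request path, else None."""
    if not request_path.startswith("/"):
        return None
    candidate = request_path[1:]
    if candidate.lower().startswith(SUPPORTED_SCHEMES):
        return candidate       # now in proxy-request format
    return None                # an ordinary, non-proxy path


if __name__ == "__main__":
    print(one_shot_target("/ftp://the.remote.host/pub/"))
```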

16.5.3 - DNS Wildcard Proxy

This relies on being able to manipulate host records in the DNS or local name resolution database. If a "*.the.proxy.host" DNS (CNAME) record is resolved it allows any host name ending in ".the.proxy.host" to be resolved to the corresponding IP address. Similarly (at least with Compaq TCP/IP Services) the local host database allows an alias like "another.host.name.proxy.host.name" for the proxy host name. Both of these would allow a browser to access "another.host.name.proxy.host.name" with it resolved to the proxy service. The request "Host:" field would contain "another.host.name.proxy.host.name".

Using this approach a fully functioning proxy may be implemented for the browser without actually configuring it for proxy access, where returned HTML documents contain links that are always correct with reference to the host used to request them. This allows the client an ad hoc proxy for selected requests. For a wildcard (CNAME) record the browser user may enter any host name prepended to the proxy service host name and port and have the request proxied to that host name. Entering the following URL into the browser location field

  http://the.host.name.the.proxy.service:8080/path

would result in a standard HTTP proxy request for "/path" being made to "the.host.name:80". The URL

  https://the.host.name.the.proxy.service:8443/path

would result in an SSL proxy request. Note that normally the well-known port is the one connected to (80 for http: and 443 for https:). If the final, period-separated component of the wildcard host name is all digits it is interpreted as a specific port to connect to. The example

  http://the.host.name.8001.the.proxy.service:8080/path

would connect to "the.host.name:8001", and

  https://the.host.name.8443.the.proxy.service:8443/path

to "the.host.name:8443".
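
The host name interpretation described above can be sketched in Python as follows. The function name is hypothetical and the code models only the naming convention (proxy host suffix, optional all-digit port element), not the server's actual processing.

```python
from typing import Optional, Tuple


def wildcard_target(host_field: str, proxy_host: str,
                    default_port: int = 80) -> Optional[Tuple[str, int]]:
    """Derive (target host, target port) from a request "Host:" field."""
    # drop the proxy service's own port, if present
    host = host_field.rsplit(":", 1)[0] if ":" in host_field else host_field
    host = host.lower()
    suffix = "." + proxy_host.lower()
    if not host.endswith(suffix):
        return None                        # not a wildcard request
    target = host[:-len(suffix)]
    if not target:
        return None
    name, dot, last = target.rpartition(".")
    if dot and last.isdigit():
        return name, int(last)             # all-digit final element is a port
    return target, default_port            # otherwise the well-known port


if __name__ == "__main__":
    print(wildcard_target("the.host.name.8001.the.proxy.service:8080",
                          "the.proxy.service"))
```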

NOTE
It has been observed that some browsers insist that an all-digit host name element is a port number despite it being prefixed by a period not a colon. These browsers then attempt to contact the host/port directly. This obviously precludes using an all-digit element to indicate a target port number with these browsers.

This wildcard DNS entry approach is a more fully functional analogue to common proxy behaviour but is slightly less flexible in providing gatewaying between protocols and does require more care in configuration. It also relies on the contents of the request "Host:" field to provide mapping information (which generally is not a problem with modern browsers). The mappings must be performed in two parts, the first to handle the wildcard DNS entry, the second is the fairly standard rule(s) providing access for proxy processing.

  [[the.proxy.service:port1]]
  if (host:*.the.proxy.service:port1)
     redirect  *  /http://*
  else
     pass  http://*   http://*
  endif

The obvious difference between this and one-shot proxy is that the desired host name is provided as part of the URL host, not part of the request path. This allows the browser to correctly resolve HTML links etc. It is less flexible because a different proxy service needs to be provided for each protocol mapping. Therefore, to allow HTTP to HTTP-over-SSL proxy gatewaying another service and mapping would be required.

  [[the.proxy.service:port2]]
  if (host:*.the.proxy.service:port2)
     redirect  *  /https://*
  else
     pass  https://*   https://*
  endif

16.5.4 - Originating SSL

This proxy function allows standard HTTP clients to connect to Secure Sockets Layer (17 - Secure Sockets Layer) services. This is very different to the CONNECT service (16.3 - CONNECT Serving), allowing scripts and standard character-cell browsers supporting only HTTP to access secure services.

Standard username/password authentication is supported (as are all other standard HTTP request/response interactions). The use of X.509 client certificates (17.3.7 - Authorization Using X.509 Certification) to establish outgoing identity is not currently supported.

Enabling SSL

Unlike HTTP and FTP proxy it requires the service to be specifically configured using the [ServiceClientSSL] directive.

There are a number of Secure Sockets Layer related service parameters that should also be considered (9 - Service Configuration). Although most have workable defaults, unless [ServiceProxyClientSSLverifyCA] and [ServiceProxyClientSSLverifyCAfile] are specifically set the outgoing connection will be established without any checking of the remote server's certificate. This means the host's secure service could be considered unworthy of trust as its credentials have not been established.

As with other proxy serving, HTTP-to-SSL gatewaying may be enabled on a per-service basis using the HTTPD$CONFIG [service] directive, the HTTPD$SERVICE configuration file, or even the /SERVICE= qualifier, although not all options are available unless using HTTPD$SERVICE.

HTTPD$CONFIG [Service]

The actual services providing the SSL gateway (i.e. the host and port) are specified on a per-service basis, enabled by appending the pclientssl keyword to the particular service specification. The following example shows such a service.

  [Service]
  http://alpha.wasd.dsto.defence.gov.au:8080;proxy;pclientssl

HTTPD$SERVICE

When proxy service configuration is done using the HTTPD$SERVICE configuration file (9 - Service Configuration) it is performed with specific directives. This example illustrates configuring the same service as used in the previous section.

  [[http://alpha.wasd.dsto.defence.gov.au:8080]]
  [ServiceProxy]  enabled
  [ServiceClientSSL]  enabled
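
From the client side, using such a gateway amounts to sending an ordinary proxy-format request for an https: URL over a plain HTTP connection to the proxy service. The Python sketch below shows one way a script might do this with the standard http.client module; the function names and the host names in the usage comment are placeholders, and nothing here is taken from WASD itself.

```python
import http.client
from urllib.parse import urlsplit


def gateway_request_parts(url: str):
    """Return (method, request-target, headers) for a proxy-format request."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("expected an absolute https: URL")
    # the full URL is the request target; Host names the remote server
    return "GET", url, {"Host": parts.netloc}


def fetch_via_gateway(url: str, proxy_host: str, proxy_port: int):
    """Fetch an https: URL using plain HTTP to the proxy (no SSL in the script)."""
    method, target, headers = gateway_request_parts(url)
    connection = http.client.HTTPConnection(proxy_host, proxy_port, timeout=30)
    try:
        connection.request(method, target, headers=headers)
        response = connection.getresponse()
        return response.status, response.read()
    finally:
        connection.close()


# Example (placeholder names, requires a reachable proxy service):
#   status, body = fetch_via_gateway("https://the.remote.host/path",
#                                    "the.proxy.service", 8080)
```

Note that the traffic between the script and the proxy is not encrypted in this arrangement; only the proxy-to-server leg uses SSL.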

16.6 - Browser Proxy Configuration

The browser needs to be configured to access URLs via the proxy server. This is done using two basic approaches, manual and automatic.

Manual
Most browsers allow the configuration for access via a proxy server. This commonly consists of an entry for each of the common Web protocol schemes ("http:", "ftp:", "gopher:", etc.). Supply the configured WASD proxy service host name and port for the HTTP scheme. This is currently the only one available. This would be similar to the following example:
```
  http: www.wasd.dsto.defence.gov.au 8080
```
To exclude local hosts, and other servers that do not require proxy access, there is usually a field that allows a list of hosts and/or domain names for which the browser should not use proxy access. This might be something like:
```
  wasd.dsto.defence.gov.au,dsto.defence.gov.au,defence.gov.au
```
Automatic
At least Netscape Navigator/Communicator and Microsoft Internet Explorer (4.n and following) provide the facility to download a small JavaScript function for establishing proxy policy. Information on this function and its deployment may be found at
```
  http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html
```
The following is a very simple proxy configuration JavaScript function. This specifies that all URL host names that aren't fully qualified, or that are in the "defence.gov.au" domain, will be connected to directly, with all others being accessed via the specified proxy server.
```
  function FindProxyForURL(url,host)
  { 
     if (isPlainHostName(host) ||
         dnsDomainIs(host, ".defence.gov.au"))
        return "DIRECT";
     else
        return "PROXY www.wasd.dsto.defence.gov.au:8080; DIRECT";
  }
```
This JavaScript is contained in a file with a specific, associated MIME file type, "application/x-ns-proxy-autoconfig". For WASD it is recommended the file be placed in HT_ROOT:[LOCAL] and have a file extension of .PAC (which follows Netscape naming convention).
The following HTTPD$CONFIG directive would map the file extension to the required MIME type:
```
  [AddType]
  .PAC  application/x-ns-proxy-autoconfig  -  proxy autoconfig
```
This file is commonly made the default document available from the proxy service. The following example shows the HTTPD$MAP rules required to do this:
```
  [[www.wasd.dsto.defence.gov.au:8080]]
  pass http://* http://*
  pass / /ht_root/local/proxy.pac
  pass *
```
All that remains is to provide the browser with the location from which to load this automatic proxy configuration file. In the case of the above set-up this would be:
```
  http://www.wasd.dsto.defence.gov.au:8080/
```
A template for a proxy auto-configuration file may be found at HT_ROOT:[EXAMPLE]PROXY_AUTOCONFIG.TXT
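
Before deploying a policy it can be handy to check its decisions outside a browser. The following Python sketch mirrors the logic of the FindProxyForURL example above; the two helper functions only approximate the browser-supplied isPlainHostName and dnsDomainIs, so treat the results as indicative.

```python
def is_plain_host_name(host: str) -> bool:
    """Approximates isPlainHostName(): true when the name contains no dots."""
    return "." not in host


def dns_domain_is(host: str, domain: str) -> bool:
    """Approximates dnsDomainIs(): true when host ends with the domain."""
    return host.lower().endswith(domain.lower())


def find_proxy_for_url(url: str, host: str) -> str:
    """Python mirror of the example policy (url is unused, as in the original)."""
    if is_plain_host_name(host) or dns_domain_is(host, ".defence.gov.au"):
        return "DIRECT"
    return "PROXY www.wasd.dsto.defence.gov.au:8080; DIRECT"


if __name__ == "__main__":
    for name in ("alpha", "www.wasd.dsto.defence.gov.au", "the.remote.host"):
        print(name, "->", find_proxy_for_url("http://" + name + "/", name))
```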
