From:	CRDGW2::CRDGW2::MRGATE::"SMTP::CRVAX.SRI.COM::RELAY-INFO-VAX"  7-SEP-1990 15:04:59.78
To:	MRGATE::"ARISIA::EVERHART"
CC:
Subj:	LAT rating - The truth at last!

Received: by crdgw1.ge.com (5.57/GE 1.70)
	id AA22194; Fri, 7 Sep 90 13:17:30 EDT
Message-Id: <9009071717.AA22194@crdgw1.ge.com>
Received: From SUN2.NSFNET-RELAY.AC.UK by CRVAX.SRI.COM with TCP; Fri, 7 SEP 90 10:01:48 PDT
Received: from vax.nsfnet-relay.ac.uk by sun2.nsfnet-relay.ac.uk with SMTP
	inbound id aa03778; Fri, 7 Sep 90 16:50:27 +0000
Received: from sun.nsfnet-relay.ac.uk by vax.NSFnet-Relay.AC.UK via Janet
	with NIFTP id aa09266; 7 Sep 90 16:51 BST
Date: Fri, 7 Sep 90 17:27 BST
From: Nick de Smith
To: INFO-VAX <@nsfnet-relay.ac.uk:INFO-VAX@crvax.sri.com>
Subject: LAT rating - The truth at last!

Hi,

Here is the *word* on LAT rating values. I have seen several statements on
Info-VAX about this issue; all have been incorrect, several substantially
so. I believe that the following is totally correct (it's been checked by
DEC country support). It's a bit long, but I think that there is general
interest in this, so I am posting it. Hopefully this will scotch the
disinformation surrounding this issue and answer a few questions for those
puzzled by anomalies in LAT rating and load sharing.

If I have made any errors, I apologise in advance - mail me direct and I'll
update/correct it. I especially want an updated CPU_WEIGHT table and
individual experiences of the various rating methods detailed below.

Comments etc. as usual, to...

	nick
	NICK@NCDLAB.ULCC.AC.UK

Read on...

............................. cut here ...................................

		LAT Rating - The truth behind the rumours!

There appears to be some confusion about the rating algorithm that LAT uses
to determine load balancing. Many statements have been made, most of which
are either totally inaccurate in terms of VMS V5 (LAT V5.1) or incomplete.
The manuals do not tell the whole story - there are components that have
subtle effects that can only be found in the sources.

Edit	Edit date	By		Why
----	---------	--------------	--------------
01	07-Sep-90	Nick de Smith	First attempt.

In this discussion the following values are used:

Name		Stored in		Meaning
-------------	-------------------	----------------------
IJOBLIM		SCH$GW_IJOBLIM		Maximum number of allowed interactive
					jobs. Set by SYSGEN or
					SET LOGIN/INTERACTIVE
IJOBCNT		SCH$GW_IJOBCNT		Current number of interactive jobs.
FREEGOAL	SCH$GL_FREEGOAL		Desired minimum size of the free
					list. Set by SYSGEN.
FREECOUNT	SCH$GL_FREECNT		Current size of the free list.

CPU idle time is computed as a percentage of the idle time for all CPUs in
the box since the last time the idle time was computed.

Prior to VMS V5.0
-----------------

The original rating algorithm was:

                                           IJOBLIM - IJOBCNT
  rating = (100 * %cpu_idle_time  +  155 * -----------------) * CPU_WEIGHT
                                                IJOBLIM

This gave a rating value between 0 and 255, weighted by the CPU type
factor. The type factor was a normalised value between 0.125 and 1.0, taken
from a fixed table in LTDRIVER depending on CPU power. Typical CPU_WEIGHT
values were:

	8800, 8650, 8500	1.0
	785, 8300		0.5
	780, 8200		0.375
	750, uVAX II		0.25
	730, uVAX I		0.125

VMS V5.0 - V5.1
---------------

In the VMS V5.0 and V5.1 LTDRIVERs, the rating algorithm was:

                                        IJOBLIM - IJOBCNT     CPU_WEIGHT
  (100 * %cpu_idle_time)  +  ( N*155 * ----------------- ) * ----------
                                             IJOBLIM             NMAX

Where:

	N	Number of active CPUs
	NMAX	8 (maximum number of supported CPUs under VMS)

The main problem with this algorithm was that on some sites significant
loading changes did not produce significant rating changes, due to the
CPU_WEIGHT and N/NMAX multipliers.
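To make the two pre-V5.2 formulas concrete, here is a minimal sketch (in
Python, not DEC's code) of both computations. The names follow the text;
"idle_fraction" is the CPU idle time expressed as a fraction 0..1, and the
sample values at the end are entirely made up:

```python
def rating_pre_v50(idle_fraction, ijoblim, ijobcnt, cpu_weight):
    """Original (pre-VMS V5.0) rating: the whole sum is scaled by CPU_WEIGHT."""
    r = (100 * idle_fraction + 155 * (ijoblim - ijobcnt) / ijoblim) * cpu_weight
    return max(0, min(255, int(r)))

def rating_v50_v51(idle_fraction, ijoblim, ijobcnt, cpu_weight, n_cpus, nmax=8):
    """VMS V5.0/V5.1 rating: only the job-slot term is scaled, by both
    CPU_WEIGHT and the active-CPU ratio N/NMAX."""
    r = (100 * idle_fraction
         + (n_cpus * 155 * (ijoblim - ijobcnt) / ijoblim) * (cpu_weight / nmax))
    return max(0, min(255, int(r)))

# A half-idle uniprocessor uVAX II (CPU_WEIGHT 0.25) with half its
# interactive job slots in use (invented figures):
print(rating_pre_v50(0.5, 64, 32, 0.25))
print(rating_v50_v51(0.5, 64, 32, 0.25, 1))
```

With these invented figures the V5.0/V5.1 job-slot term contributes only a
couple of points, which is exactly the complaint above: the CPU_WEIGHT and
N/NMAX multipliers stop realistic load changes from moving the rating much.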
VMS V5.2 and later
------------------

Now the rating is computed every multicast timer tick (normally 60
seconds):

	rating = availability + load_part - mem_penalty

The "availability" is computed as follows:

	IJOBLIM - IJOBCNT
	----------------- * 20
	     IJOBLIM

If IJOBCNT >= IJOBLIM, the overall rating is forced to 0 regardless of the
other values.

The "load_part" is computed as follows:

	min( 235, CPU_RATING )
	---------------------- * 100
	  100 + LOAD_AVERAGE

LTDRIVER counts the number of processes in the COM, COMO, COLPGWQ, MWAIT,
PFWQ and FPGWQ system work queues, totals them, and averages the total
using a moving window average. Processes with a current priority less than
DEFPRI are not counted; thus background jobs (eg. BATCH processes) no
longer affect the LAT rating. This is called the LOAD_AVERAGE.

The CPU_RATING term is initially set to IJOBLIM, but may be changed by the
LATCP SET NODE /CPU_RATING=n command. If "n" is specified as "0", then
LTDRIVER will use the value of IJOBLIM. A maximum value of 100 may be
specified in LATCP, and this value is scaled by 2.35 to yield a value in
the range 0..235 in the equation above.

One other factor was introduced here: mem_penalty. This is a previously
undocumented penalty applied by LTDRIVER. It is computed as:

	40 * (FREEGOAL + 2048 - FREECOUNT)
	----------------------------------
	          FREEGOAL + 2048

The value therefore ranges (in theory) between 0 and 40. Negative values,
which arise when FREECOUNT > (FREEGOAL + 2048), are not applied.

The rating is stored in an unsigned number, but no overflow check was
applied to the subtraction of "mem_penalty". The effect of this bug was
that systems that had a rating below 40 AND were low on free memory could
wrongly advertise a large rating value, as follows:

	20 - 30 = -10 = 246 as an unsigned 8 bit number

This bug was fixed in LTDRIVER version X5.0-128.

Another parameter to be aware of is the "virtual circuit bias" applied by
the connection mechanism of terminal servers.
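Before moving on, the whole V5.2 computation just described - availability,
load_part, mem_penalty and the unsigned-subtraction bug fixed in X5.0-128 -
can be sketched as follows. This is an illustration, not the LTDRIVER
source; the names follow the text and the sample values are invented:

```python
def rating_v52(ijoblim, ijobcnt, cpu_rating, load_average,
               freegoal, freecount, overflow_fixed=True):
    """VMS V5.2+ LAT rating as described above (sketch only)."""
    if ijobcnt >= ijoblim:
        return 0                       # forced to 0 regardless of other terms
    availability = (ijoblim - ijobcnt) * 20 // ijoblim
    load_part = min(235, cpu_rating) * 100 // (100 + load_average)
    # mem_penalty ranges 0..40; negative values are not applied
    mem_penalty = max(0, 40 * (freegoal + 2048 - freecount) // (freegoal + 2048))
    r = availability + load_part - mem_penalty
    if not overflow_fixed:
        return r & 0xFF                # pre-X5.0-128: no overflow check
    return max(0, r)

# A lightly rated node that is also short of free memory (invented figures):
buggy = rating_v52(100, 50, 10, 0, 2048, 1024, overflow_fixed=False)
fixed = rating_v52(100, 50, 10, 0, 2048, 1024)
print(buggy, fixed)
```

With these invented figures availability + load_part is 20 and mem_penalty
is 30, so the unfixed driver advertises 246 - the same "20 - 30 = -10 =
246" anomaly described above - while the fixed driver advertises 0.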
This is a previously undocumented feature applied by the TERMINAL SERVER.

LAT maintains ONE virtual circuit between a host and a terminal server,
regardless of the number of sessions (>0) on the target terminal server
using that host. This is achieved by multiplexing the different sessions
within one virtual circuit. eg. if a terminal server has 100 sessions
talking to a total of 5 different hosts, there are only 5 virtual circuits
to maintain. The use of multiplexed virtual circuits to handle multiple
sessions makes much more efficient use of the Ethernet, and presents less
load on the server and host.

In the situation where a user requests a connection to a service offered by
two or more hosts with the same rating value for that service, this BIAS is
applied towards any host that ALREADY has a virtual circuit to it from that
terminal server. In previous versions of server software this was a
constant value of 15. This could cause problems where there were systems
offering small rating values, as the bias value swamped the rating value,
leading to unbalanced connections.

All current versions of server software for the DECserver 200/300/500 (that
is, versions V3.0F, V1.0D and V2.0C respectively) use the following:

	                       service_rating
	virtual_circuit_bias = -------------- * 15
	                            255

which yields a value between 0 and 15, scaled by the service rating (which
has a value in the range 0..255), thus providing a more effective bias.

More discussion may be found in the VMS V5.2 release notes, page 3-52,
section 3.24.2. There is no published documentation on "mem_penalty" or
"virtual_circuit_bias".

DYNRAT
------

DYNRAT was an attempt to allow users more control over the rating algorithm
for service nodes. It runs as a detached process, and controls service
rating values via LTDRIVER's $QIO interface.
One aspect of this is that LTDRIVER marks a rating value as implicitly
STATIC once a value has been set this way, thus disabling dynamic
alteration of the value by LTDRIVER - this also disables use of the
"mem_penalty" described above.

The DYNRAT V5.1 kit for VMS V5.1 - V5.3 sets static ratings once a minute,
computed as follows:

	rating = availability + load_part

The "availability" is computed as follows:

	IJOBLIM - IJOBCNT
	----------------- * K1		where K1 defaults to 55
	     IJOBLIM

This is similar to a part of the old algorithm. If IJOBCNT >= IJOBLIM, the
overall rating is forced to 1.

The "load_part" is computed as follows:

	min( K2, IJOBLIM )
	------------------		where K2 defaults to 200
	 1 + LOAD_AVERAGE

Every 5 seconds, DYNRAT counts the number of processes in the COM, COMO,
COLPGWQ, MWAIT, PFWQ and FPGWQ system work queues, totals them, and
averages the total using a moving window average. This is called the
LOAD_AVERAGE. DYNRAT also ignores processes with a current priority less
than DEFPRI.

The "availability" and "load_part" are then added together to give the LAT
service rating.

Note that this algorithm does not need to know the CPU type or relative
power, because it infers this from IJOBLIM and LOAD_AVERAGE. Users can
tailor the values of K1 and K2 to suit their system.

The DYNRAT algorithm assumes that the IJOBLIM value set on the system is
realistic for the type of CPU being used. This is VERY IMPORTANT for an
effective load balance between nodes. You may change your system's IJOBLIM
value up or down to get better rating values (using
SET LOGINS/INTERACTIVE=n).

The Future
----------

The latest host software (as I write, on 07-Sep-90) is LTDRIVER X5.0-131,
which will ship with VMS V5.4-1. I understand that a version of this driver
has been built on VMS V5.3-1, but that it would not normally be available
to users.
A totally re-engineered LTDRIVER, allowing peer-to-peer operation, and thus
SET HOST/LAT (as in VWSLAT), has been developed, but will not be released
until "the next major release" of VMS (read as not for some time).

	nick
	NICK@NCDLAB.ULCC.AC.UK