From:	CRDGW2::CRDGW2::MRGATE::"SMTP::CRVAX.SRI.COM::RELAY-INFO-VAX"  7-SEP-1990 15:04:59.78
To:	MRGATE::"ARISIA::EVERHART"
CC:
Subj:	LAT rating - The truth at last!

Received: by crdgw1.ge.com (5.57/GE 1.70)
	id AA22194; Fri, 7 Sep 90 13:17:30 EDT
Message-Id: <9009071717.AA22194@crdgw1.ge.com>
Received: From SUN2.NSFNET-RELAY.AC.UK by CRVAX.SRI.COM with TCP; Fri, 7 SEP 90 10:01:48 PDT
Received: from vax.nsfnet-relay.ac.uk by sun2.nsfnet-relay.ac.uk with SMTP
	inbound id aa03778; Fri, 7 Sep 90 16:50:27 +0000
Received: from sun.nsfnet-relay.ac.uk by vax.NSFnet-Relay.AC.UK via Janet
	with NIFTP id aa09266; 7 Sep 90 16:51 BST
Date: Fri, 7 Sep 90 17:27 BST
From: Nick de Smith
To: INFO-VAX <@nsfnet-relay.ac.uk:INFO-VAX@crvax.sri.com>
Subject: LAT rating - The truth at last!

Hi,

Here is the *word* on LAT rating values. I have seen several statements on
Info-VAX about this issue; all have been incorrect, several substantially
so. I believe that the following is totally correct (it's been checked by
DEC country support). It's a bit long, but I think that there is general
interest in this, so I am posting it. Hopefully this will scotch the
disinformation surrounding this issue and answer a few questions for those
puzzled by anomalies in LAT rating and load sharing.

If I have made any errors, I apologise in advance - mail me direct and I'll
update/correct it. I especially want an updated CPU_WEIGHT table and
individual experiences of the various rating methods detailed below.

Comments etc. as usual, to...

	nick
	NICK@NCDLAB.ULCC.AC.UK

Read on...

............................. cut here ...................................

		LAT Rating - The truth behind the rumours!

There appears to be some confusion about the rating algorithm that LAT uses
to determine load balancing. Many statements have been made, most of which
are either totally inaccurate in terms of VMS V5 (LAT V5.1) or incomplete.
The manuals do not tell the whole story - there are components that have
subtle effects that can only be found in the sources.

Edit	Edit date	By		Why
----	---------	--------------	--------------
01	07-Sep-90	Nick de Smith	First attempt.

In this discussion the following values are used:

Name		Stored in		Meaning
-------------	-------------------	----------------------
IJOBLIM		SCH$GW_IJOBLIM		Maximum number of allowed interactive
					jobs. Set by SYSGEN or
					SET LOGIN/INTERACTIVE
IJOBCNT		SCH$GW_IJOBCNT		Current number of interactive jobs.
FREEGOAL	SCH$GL_FREEGOAL		Desired minimum size of the free
					list. Set by SYSGEN.
FREECOUNT	SCH$GL_FREECNT		Current size of the free list.

CPU idle time is computed as a percentage of the idle time for all CPUs in
the box since the last time the idle time was computed.

Prior to VMS V5.0
-----------------

The original rating algorithm was:

                                           IJOBLIM - IJOBCNT
  rating = (100 * %cpu_idle_time  +  155 * -----------------) * CPU_WEIGHT
                                                IJOBLIM

This gave a rating value between 0 and 255, weighted by the CPU type
factor. The type factor was a normalised value between 0.125 and 1.0, taken
from a fixed table in LTDRIVER depending on CPU power. Typical CPU_WEIGHT
values were:

	8800, 8650, 8500	1.0
	785, 8300		0.5
	780, 8200		0.375
	750, uVAX II		0.25
	730, uVAX I		0.125

VMS V5.0 - V5.1
---------------

In the VMS V5.0 and V5.1 LTDRIVERs, the rating algorithm was:

                                        IJOBLIM - IJOBCNT     CPU_WEIGHT
  (100 * %cpu_idle_time)  +  ( N*155 * ----------------- ) * ----------
                                             IJOBLIM             NMAX

Where:

	N	Number of active CPUs
	NMAX	8 (maximum number of supported CPUs under VMS)

The main problem with this algorithm was that on some sites significant
loading changes did not produce significant rating changes, due to the
CPU_WEIGHT and N/NMAX multipliers.
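To make the two pre-V5.2 formulas concrete, here is a minimal sketch (in
Python, not DEC's code) of both computations. The names follow the text;
"idle_fraction" is the CPU idle time expressed as a fraction 0..1, and the
sample values at the end are entirely made up:

```python
def rating_pre_v50(idle_fraction, ijoblim, ijobcnt, cpu_weight):
    """Original (pre-VMS V5.0) rating: the whole sum is scaled by CPU_WEIGHT."""
    r = (100 * idle_fraction + 155 * (ijoblim - ijobcnt) / ijoblim) * cpu_weight
    return max(0, min(255, int(r)))

def rating_v50_v51(idle_fraction, ijoblim, ijobcnt, cpu_weight, n_cpus, nmax=8):
    """VMS V5.0/V5.1 rating: only the job-slot term is scaled, by both
    CPU_WEIGHT and the active-CPU ratio N/NMAX."""
    r = (100 * idle_fraction
         + (n_cpus * 155 * (ijoblim - ijobcnt) / ijoblim) * (cpu_weight / nmax))
    return max(0, min(255, int(r)))

# A half-idle uniprocessor uVAX II (CPU_WEIGHT 0.25) with half its
# interactive job slots in use (invented figures):
print(rating_pre_v50(0.5, 64, 32, 0.25))
print(rating_v50_v51(0.5, 64, 32, 0.25, 1))
```

With these invented figures the V5.0/V5.1 job-slot term contributes only a
couple of points, which is exactly the complaint above: the CPU_WEIGHT and
N/NMAX multipliers stop realistic load changes from moving the rating much.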
VMS V5.2 and later
------------------

Now the rating is computed every multicast timer tick (normally 60
seconds):

	rating = availability + load_part - mem_penalty

The "availability" is computed as follows:

	IJOBLIM - IJOBCNT
	----------------- * 20
	     IJOBLIM

If IJOBCNT >= IJOBLIM, the overall rating is forced to 0 regardless of the
other values.

The "load_part" is computed as follows:

	min( 235, CPU_RATING )
	---------------------- * 100
	  100 + LOAD_AVERAGE

LTDRIVER counts the number of processes in the COM, COMO, COLPGWQ, MWAIT,
PFWQ and FPGWQ system work queues, totals them, and averages the total
using a moving window average. Processes with a current priority less than
DEFPRI are not counted; thus background jobs (eg. BATCH processes) no
longer affect the LAT rating. This is called the LOAD_AVERAGE.

The CPU_RATING term is initially set to IJOBLIM, but may be changed by the
LATCP SET NODE /CPU_RATING=n command. If "n" is specified as "0", then
LTDRIVER will use the value of IJOBLIM. A maximum value of 100 may be
specified in LATCP, and this value is scaled by 2.35 to yield a value in
the range 0..235 in the equation above.

One other factor was introduced here: mem_penalty. This is a previously
undocumented penalty applied by LTDRIVER. It is computed as:

	40 * (FREEGOAL + 2048 - FREECOUNT)
	----------------------------------
	          FREEGOAL + 2048

The value therefore ranges (in theory) between 0 and 40. Negative values,
which arise when FREECOUNT > (FREEGOAL + 2048), are not applied.

The rating is stored in an unsigned number, but no overflow check was
applied to the subtraction of "mem_penalty". The effect of this bug was
that systems that had a rating below 40 AND were low on free memory could
wrongly advertise a large rating value, as follows:

	20 - 30 = -10 = 246 as an unsigned 8 bit number

This bug was fixed in LTDRIVER version X5.0-128.

Another parameter to be aware of is the "virtual circuit bias" applied by
the connection mechanism of terminal servers.
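Before moving on, the whole V5.2 computation just described - availability,
load_part, mem_penalty and the unsigned-subtraction bug fixed in X5.0-128 -
can be sketched as follows. This is an illustration, not the LTDRIVER
source; the names follow the text and the sample values are invented:

```python
def rating_v52(ijoblim, ijobcnt, cpu_rating, load_average,
               freegoal, freecount, overflow_fixed=True):
    """VMS V5.2+ LAT rating as described above (sketch only)."""
    if ijobcnt >= ijoblim:
        return 0                       # forced to 0 regardless of other terms
    availability = (ijoblim - ijobcnt) * 20 // ijoblim
    load_part = min(235, cpu_rating) * 100 // (100 + load_average)
    # mem_penalty ranges 0..40; negative values are not applied
    mem_penalty = max(0, 40 * (freegoal + 2048 - freecount) // (freegoal + 2048))
    r = availability + load_part - mem_penalty
    if not overflow_fixed:
        return r & 0xFF                # pre-X5.0-128: no overflow check
    return max(0, r)

# A lightly rated node that is also short of free memory (invented figures):
buggy = rating_v52(100, 50, 10, 0, 2048, 1024, overflow_fixed=False)
fixed = rating_v52(100, 50, 10, 0, 2048, 1024)
print(buggy, fixed)
```

With these invented figures availability + load_part is 20 and mem_penalty
is 30, so the unfixed driver advertises 246 - the same "20 - 30 = -10 =
246" anomaly described above - while the fixed driver advertises 0.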
This is a previously undocumented feature applied by the TERMINAL SERVER.

LAT maintains ONE virtual circuit between a host and a terminal server,
regardless of the number of sessions (>0) on the target terminal server
using that host. This is achieved by multiplexing the different sessions
within one virtual circuit. eg. if a terminal server has 100 sessions
talking to a total of 5 different hosts, there are only 5 virtual circuits
to maintain. The use of multiplexed virtual circuits to handle multiple
sessions makes much more efficient use of the Ethernet, and presents less
load on the server and host.

In the situation where a user requests a connection to a service offered by
two or more hosts with the same rating value for that service, this BIAS is
applied towards any host that ALREADY has a virtual circuit to it from that
terminal server. In previous versions of server software this was a
constant value of 15. This could cause problems where there were systems
offering small rating values, as the bias value swamped the rating value,
leading to unbalanced connections.

All current versions of server software for the DECserver 200/300/500 (that
is, versions V3.0F, V1.0D and V2.0C respectively) use the following:

	                       service_rating
	virtual_circuit_bias = -------------- * 15
	                            255

which yields a value between 0 and 15, scaled by the service rating (which
has a value in the range 0..255), thus providing a more effective bias.

More discussion may be found in the VMS V5.2 release notes, page 3-52,
section 3.24.2. There is no published documentation on "mem_penalty" or
"virtual_circuit_bias".

DYNRAT
------

DYNRAT was an attempt to allow users more control over the rating algorithm
for service nodes. It runs as a detached process, and controls service
rating values via LTDRIVER's $QIO interface.
One aspect of this is that LTDRIVER marks a rating value as implicitly
STATIC once a value has been set this way, thus disabling dynamic
alteration of the value by LTDRIVER - this also disables use of the
"mem_penalty" described above.

The DYNRAT V5.1 kit for VMS V5.1 - V5.3 sets static ratings once a minute,
computed as follows:

	rating = availability + load_part

The "availability" is computed as follows:

	IJOBLIM - IJOBCNT
	----------------- * K1		where K1 defaults to 55
	     IJOBLIM

This is similar to a part of the old algorithm. If IJOBCNT >= IJOBLIM, the
overall rating is forced to 1.

The "load_part" is computed as follows:

	min( K2, IJOBLIM )
	------------------		where K2 defaults to 200
	 1 + LOAD_AVERAGE

Every 5 seconds, DYNRAT counts the number of processes in the COM, COMO,
COLPGWQ, MWAIT, PFWQ and FPGWQ system work queues, totals them, and
averages the total using a moving window average. This is called the
LOAD_AVERAGE. DYNRAT also ignores processes with a current priority less
than DEFPRI.

The "availability" and "load_part" are then added together to give the LAT
service rating.

Note that this algorithm does not need to know the CPU type or relative
power, because it infers this from IJOBLIM and LOAD_AVERAGE. Users can
tailor the values of K1 and K2 to suit their system.

The DYNRAT algorithm assumes that the IJOBLIM value set on the system is
realistic for the type of CPU being used. This is VERY IMPORTANT for an
effective load balance between nodes. You may change your system's IJOBLIM
value up or down to get better rating values (using
SET LOGINS/INTERACTIVE=n).

The Future
----------

The latest host software (as I write, on 07-Sep-90) is LTDRIVER X5.0-131,
which will ship with VMS V5.4-1. I understand that a version of this driver
has been built on VMS V5.3-1, but that it would not normally be available
to users.
A totally re-engineered LTDRIVER, allowing peer-to-peer operation, and thus
SET HOST/LAT (as in VWSLAT), has been developed, but will not be released
until "the next major release" of VMS (read as not for some time).

	nick
	NICK@NCDLAB.ULCC.AC.UK