From: SMTP%"RELAY-INFO-VAX@CRVAX.SRI.COM" 19-JAN-1994 13:40:57.56 To: EVERHART CC: Subj: Thoughts on VMS crashes under heavy system load... Date: Tue, 18 Jan 1994 15:46:24 -0500 (EST) From: "Clayton, Paul D." To: INFO-VAX@SRI.COM Cc: CLAYTON@radium.xrt.upenn.edu Message-Id: <940118154624.3fc0274b@radium.xrt.upenn.edu> Subject: Thoughts on VMS crashes under heavy system load... Phillip Green @ "GREEN@GREY.SPS.MOT.COM" reports the following problem. >We have a cluster of 6610s and a variety of satellites. We have Phase II >volume shadowing on critical disks and utilise I/O Express extensively. > >Under circumstances of heavy CPU, Network and disk i/o load with lots of Phase >II shadow copying going on, we get a 6610 crashing with the INVEXCEPTN bugcheck >code. At DEC's request we switched the DECps data collector off and the crash >code changed to PGFIPLHI... pigs will fly!! > >All 6000's clock ETA0 errors prior to the crash but the DEBNAs have been >replaced. Analysis of the crash shows typically a REMQUE instruction in >SYSTEM_PRIMITIVES attempting to access a location in S1 space. That's as far >as we & DEC have got. This problem sounds very close to a severe and costly problem that we had on a program I am currently working for NASA and MMC. We originally purchased VAX 6310/20's, found them to be wanting in CPU cycles due to our application load and ended up upgrading sixty five (65) 63xx systems to 6610's with 128MB of memory. This was a COSTLY deal for everyone. We had no sooner put the 6610's on our development systems at the time and we started having crashes of VMS. The bugchecks were either INVEXCPTN or PGFLTIPLHI. They were sporadic in nature, spread over four MI clusters and all had double entries in the system error log prior to the crash reporting a PEA0 error (such as a satellite leaving the cluster) and then an ETA0 error indicating that the internal command buffer/queue had overflowed to the DEBNA. A EVALUATE/PSL of the PSL at the time of the crash showed the system to consisently be executing at IPL 6. Examination of the code stream showed that the problem revolved around a REMQUE instruction and if the target packet was formatted, the back link (second longword) was always found to be in error. It typically had values that were totally incorrect for the packet type and queue structure. There appeared to be a massive confusion factor working in that the packets started as an absolute queue but then one of the links started looking like a self relative deal. If you VALIDATEd QUEUE on the IRP queue, you my end up finding hundreds/thousands of packets waiting for work to be performed. The LAST packet in the queue ended up with the 'bogus' values for the linked list portion. At the time of the crashes, we were ALWAYS doing MASSIVE amounts of Ethernet based I/O consisting of MSCP, DECnet, SCS and/or LAT. In our case doing copies of thousands of files using MSCP (to a RAM disk on another cluster node) or concurrent DECnet file copies to other nodes in our network. The precursor to most crashes was a satellite exiting the cluster for some reason or another. The reason the satellite went away was a don't care to our version of this problem. Replacement of the DEBNA boards with new ones had no impact on the problem. We ended up with VERY LENGTHY talks with CSC, both the VMS Internals and CDDS/VAST groups. Mark on the Internals team and Rob on the VAST group ended up on the call. 
Mark coded up an SPM event trace program that recorded all threads of code that executed at IPL 6 into a block of nonpaged pool for later SDA analysis when the crash occurred. This showed that ETDRIVER and PEADRIVER were the big users of the IPL 6 fork queue. Mark then coded up another SPM event trace program that recorded all portions of VMS code that requested packets from nonpaged pool and who returned them. This showed us that the culprits were definitely the ET and PEA drivers, the packets they used, and how they worked together. Or were supposed to work together.

One side effect of running BOTH of these programs on the 6610's was that the crashes did NOT occur as frequently as before. Instead of one every couple of days, we could now go weeks. And even considering what was being recorded by both programs, the overall throughput on the 6610's was not impacted all that adversely.

The problem was then raised to an LOR level and the Ethernet, cluster and VMS groups started throwing this problem around between them. Then the group that deals with the DEBNA's got involved and pointed to the command overflow errors in the one errorlog entry of each pair. It turns out that the DEBNA has the ability to work with many IRP's concurrently, each putting various commands for sending/receiving data into the command buffer on the DEBNA.

The problem, as it turned out in our case, was when ANY Ethernet error occurred. We were effectively doomed (unless the SPM programs were running) any time an error occurred. In the steps taken to correct an error that occurred on a DEBNA, the PEA and ET drivers work on the SAME IRP packets to get various pieces of info so that proper error handling can take place. One driver uses absolute queues, the other uses self-relative queues. Each maintains its own list. The PROBLEM is that either driver apparently can deallocate packets it deems no longer needed due to the error, while the other driver MAY STILL NEED the same packet for other things. There apparently is no high level synching between the two drivers in the case of error recovery for IRP packets. One driver steps on the other. The end result is a system crash at IPL 6 on a REMQUE instruction with bad 'linkages' to other packets in the queue. (A rough sketch of this race appears below.)

Knowing this, the finger pointing went several more rounds as to who was going to fix the problem. All groups ended up declining to provide a software fix for the problem. They all got together and wrote an Eng. position paper regarding the problem, and effectively said that the use of DEBNA Ethernet adapters on VAX 6610 platforms was unsupported. This was due to the speed of the processor in executing device driver code and the lack of speed within the DEBNA for processing commands within its interface. The Eng. solution to the problem is/was to UPGRADE ALL DEBNA interfaces to AT LEAST DEBNIs. The best solution was to upgrade to the XMI Ethernet adapters (I forget their name). That was the end of the story as far as DEC Eng. (all groups involved) were concerned.

FORGET the FACT that there is NOTHING AT ALL in the Systems and Options book about NOT being able to use DEBNA interfaces on 6610's. You, the user of the VAX 6610's, are now STUCK out in the cold with interfaces that are useless for the platform DEC said we COULD upgrade to with NO PROBLEMS!

Now we had a REAL PROBLEM on our hands. We have one hundred and ninety (190) DEBNA interfaces on the sixty five (65) 6610's. Quick multiplication of replacement board prices times 190 shows the cost to be six or seven digits in size.
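As promised above, here is a toy C model of the ownership problem: one driver releases an IRP during error recovery while the other driver still has it linked into its own queue, and the later remove then walks a garbage back link. The driver names, fields and functions are all invented for illustration; this is a sketch of the failure mode as I understand it, not anything from the actual ET/PEA sources.

#include <stdio.h>
#include <string.h>

typedef struct packet {
    struct packet *flink;      /* absolute links kept by "driver A"         */
    struct packet *blink;
    int            busy;       /* crude ownership flag -- clearly not enough */
    char           payload[32];
} packet;

/* "Driver B" decides the packet is no longer needed after an Ethernet
 * error and hands it back to the pool.  The pool promptly recycles the
 * memory for something else (modelled here by scribbling over it).         */
static void driver_b_error_recovery(packet *p)
{
    p->busy = 0;
    memset(p, 0xEE, sizeof *p);            /* simulate reuse of the packet   */
}

/* "Driver A" still has the packet queued and later tries to remove it.     */
static void driver_a_remove(packet *p)
{
    /* On the real system this is a REMQUE at IPL 6; p->blink now holds
     * garbage, so the dereference faults -- the INVEXCEPTN / PGFIPLHI crash. */
    printf("driver A sees blink = %p (garbage)\n", (void *)p->blink);
    /* p->blink->flink = p->flink;   <-- this is where the crash happens     */
}

int main(void)
{
    packet head = { &head, &head, 0, "" };
    packet irp  = { &head, &head, 1, "error context" };
    head.flink = head.blink = &irp;

    driver_b_error_recovery(&irp);  /* no check that driver A is finished    */
    driver_a_remove(&irp);          /* use after release: bad back link      */
    return 0;
}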
Then DEC dusted off the DEBNA to DEBNI upgrade kit with a much lower cost, which when multiplied by 190 still amounted to a hefty sum. The upgrade kit is a set of EEPROMS for the DEBNA along with a couple of other little parts.

We ended up getting 190 upgrade kits and installing them. Since the upgrade, we have NOT had the IPL 6 crashes with the ET/PEA driver footprint. We also do not run the SPM event trace programs anymore.

There are several footnotes to this complicated story.

1. DEC apparently felt/feels NO RESPONSIBILITY for MISINFORMATION in the S&O book with regard to this problem. I have not looked to see if the 6610 upgrade options NOW list DEBNA upgrades as mandatory.

2. If you call the person you are dealing with on your version of this problem and ask THEM to go talk with Mark/Rob and mention the IPL 6 ET/PEA problem at the MMC/GE STGT program, there should be all sorts of light bulbs going off. They should also be able to download the two SPM event trace programs so that you can run them to at least SLOW DOWN the occurrence of these crashes. If you are running DSNlink, they are easily downloadable. The use of the SPM programs is one option until you decide what you want to do about upgrading the DEBNA's.

3. I have a copy of the DEC Eng. letter stating the 'position' that DEBNA interfaces are not supported on 6610 platforms. I can FAX a copy to those interested, as long as the request list does not get too far out of hand in length.

4. To me, this problem shows that there REMAINS A PROBLEM between the ET/PEA drivers under heavy load. Note that at NO TIME have there been code changes for this issue. The solution is hardware based. The DEBNI interface is quicker than the DEBNA; I do not know if the command queue is any larger as well. It seems to me that we could very well end up back in the SAME PROBLEM given the 'right conditions' of I/O load on the DEBNI interfaces. While I am not holding my breath waiting for the problem to surface yet again on the DEBNI's, I fully expect to see it again on this program. At that point, we will be faced with the truly ugly task of getting approvals to upgrade 190 DEBNI interfaces to the XMI versions. And since this is a NASA program, that means getting approvals from the US Congress for the funds. Like I mentioned earlier, an UGLY task.

5. It is my understanding that DEC has closed the plant that manufactured the DEBNA to DEBNI upgrade kits as part of its cost reduction plans, and that the kits are not being manufactured by any other plant and therefore are not available for purchase anymore. Rumor had it that our 190 kit order resulted in the plant in New England remaining open a few weeks longer than planned in order to get the parts together. Therefore any site that has this problem now may have as its only option purchasing complete DEBNI interface cards at a significantly higher cost than the DEBNA to DEBNI upgrade kits. Maybe if enough sites have this problem and SCREAM about it to DEC, the kits may become available once again.

If there are comments or questions regarding the information presented here, I will respond to them as they are received.

Take care, and good luck... pdc

Paul D. Clayton
Address - CLAYTON@RADIUM.XRT.UPENN.EDU (USA)

Disclaimer: All thoughts and statements here are my own and NOT those of my employer, and they are also not based on, nor do they contain, restricted information.