From: SMTP%"RELAY-INFO-VAX@CRVAX.SRI.COM" 19-JAN-1994 13:40:57.56 To: EVERHART CC: Subj: Thoughts on VMS crashes under heavy system load... Date: Tue, 18 Jan 1994 15:46:24 -0500 (EST) From: "Clayton, Paul D." To: INFO-VAX@SRI.COM Cc: CLAYTON@radium.xrt.upenn.edu Message-Id: <940118154624.3fc0274b@radium.xrt.upenn.edu> Subject: Thoughts on VMS crashes under heavy system load... Phillip Green @ "GREEN@GREY.SPS.MOT.COM" reports the following problem. >We have a cluster of 6610s and a variety of satellites. We have Phase II >volume shadowing on critical disks and utilise I/O Express extensively. > >Under circumstances of heavy CPU, Network and disk i/o load with lots of Phase >II shadow copying going on, we get a 6610 crashing with the INVEXCEPTN bugcheck >code. At DEC's request we switched the DECps data collector off and the crash >code changed to PGFIPLHI... pigs will fly!! > >All 6000's clock ETA0 errors prior to the crash but the DEBNAs have been >replaced. Analysis of the crash shows typically a REMQUE instruction in >SYSTEM_PRIMITIVES attempting to access a location in S1 space. That's as far >as we & DEC have got. This problem sounds very close to a severe and costly problem that we had on a program I am currently working for NASA and MMC. We originally purchased VAX 6310/20's, found them to be wanting in CPU cycles due to our application load and ended up upgrading sixty five (65) 63xx systems to 6610's with 128MB of memory. This was a COSTLY deal for everyone. We had no sooner put the 6610's on our development systems at the time and we started having crashes of VMS. The bugchecks were either INVEXCPTN or PGFLTIPLHI. They were sporadic in nature, spread over four MI clusters and all had double entries in the system error log prior to the crash reporting a PEA0 error (such as a satellite leaving the cluster) and then an ETA0 error indicating that the internal command buffer/queue had overflowed to the DEBNA. A EVALUATE/PSL of the PSL at the time of the crash showed the system to consisently be executing at IPL 6. Examination of the code stream showed that the problem revolved around a REMQUE instruction and if the target packet was formatted, the back link (second longword) was always found to be in error. It typically had values that were totally incorrect for the packet type and queue structure. There appeared to be a massive confusion factor working in that the packets started as an absolute queue but then one of the links started looking like a self relative deal. If you VALIDATEd QUEUE on the IRP queue, you my end up finding hundreds/thousands of packets waiting for work to be performed. The LAST packet in the queue ended up with the 'bogus' values for the linked list portion. At the time of the crashes, we were ALWAYS doing MASSIVE amounts of Ethernet based I/O consisting of MSCP, DECnet, SCS and/or LAT. In our case doing copies of thousands of files using MSCP (to a RAM disk on another cluster node) or concurrent DECnet file copies to other nodes in our network. The precursor to most crashes was a satellite exiting the cluster for some reason or another. The reason the satellite went away was a don't care to our version of this problem. Replacement of the DEBNA boards with new ones had no impact on the problem. We ended up with VERY LENGTHY talks with CSC, both the VMS Internals and CDDS/VAST groups. Mark on the Internals team and Rob on the VAST group ended up on the call. 
Mark coded up an SPM event trace program that recorded all threads of code that executed at IPL 6 into a block of nonpaged pool for later SDA analysis when the crash occurred. This showed that ETDRIVER and PEADRIVER were the big users of the IPL 6 fork queue. Mark then coded up another SPM event trace program that recorded all portions of VMS code that requested packets from nonpaged pool and who returned them. This showed us that the culprits were definitely the ET and PEA drivers, the packets they used, and how they worked together. Or were supposed to work together.

One side effect of running BOTH of these programs on the 6610's was that the crashes did NOT occur as frequently as before. Instead of one every couple of days, we could now go weeks. And even considering what was being recorded by both programs, the overall throughput on the 6610's was not impacted all that adversely.

The problem was then raised to an LOR level and the Ethernet, cluster and VMS groups started throwing this problem around between them. Then the group that deals with the DEBNA's got involved and pointed to the command overflow errors in the one errorlog entry of each pair. It turns out that the DEBNA has the ability to work with many IRP's concurrently, each putting various commands for sending/receiving data into the command buffer on the DEBNA.

The problem, as it turned out in our case, was when ANY Ethernet error occurred. We were effectively doomed (unless the SPM programs were running) any time an error occurred. In the steps taken to correct an error that occurred on a DEBNA, the PEA and ET drivers work on the SAME IRP packets to get various pieces of info so that proper error handling can take place. One driver uses absolute queues, the other uses self-relative queues. Each maintains its own list. The PROBLEM is that either driver apparently can deallocate packets it deems no longer needed due to the error, while the other driver MAY STILL NEED the same packet for other things. There apparently is no high level synching between the two drivers in the case of error recovery for IRP packets. One driver steps on the other. The end result is a system crash at IPL 6 on a REMQUE instruction with bad 'linkages' to other packets in the queue. (A rough sketch of this race appears below.)

Knowing this, the finger pointing went several more rounds as to who was going to fix the problem. All groups ended up declining to provide a software fix for the problem. They all got together and wrote an Eng. position paper regarding the problem, and effectively said that the use of DEBNA Ethernet adapters on VAX 6610 platforms was unsupported. This was due to the speed of the processor in executing device driver code and the lack of speed within the DEBNA for processing commands within its interface. The Eng. solution to the problem is/was to UPGRADE ALL DEBNA interfaces to AT LEAST DEBNIs. The best solution was to upgrade to the XMI Ethernet adapters (I forget their name). That was the end of the story as far as DEC Eng. (all groups involved) were concerned.

FORGET the FACT that there is NOTHING AT ALL in the Systems and Options book about NOT being able to use DEBNA interfaces on 6610's. You, the user of the VAX 6610's, are now STUCK out in the cold with interfaces that are useless for the platform DEC said we COULD upgrade to with NO PROBLEMS!

Now we had a REAL PROBLEM on our hands. We have one hundred and ninety (190) DEBNA interfaces on the sixty five (65) 6610's. Quick multiplication of replacement board prices times 190 shows the cost to be six or seven digits in size.
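As promised above, here is a toy C model of the ownership problem: one driver releases an IRP during error recovery while the other driver still has it linked into its own queue, and the later remove then walks a garbage back link. The driver names, fields and functions are all invented for illustration; this is a sketch of the failure mode as I understand it, not anything from the actual ET/PEA sources.

#include <stdio.h>
#include <string.h>

typedef struct packet {
    struct packet *flink;      /* absolute links kept by "driver A"         */
    struct packet *blink;
    int            busy;       /* crude ownership flag -- clearly not enough */
    char           payload[32];
} packet;

/* "Driver B" decides the packet is no longer needed after an Ethernet
 * error and hands it back to the pool.  The pool promptly recycles the
 * memory for something else (modelled here by scribbling over it).         */
static void driver_b_error_recovery(packet *p)
{
    p->busy = 0;
    memset(p, 0xEE, sizeof *p);            /* simulate reuse of the packet   */
}

/* "Driver A" still has the packet queued and later tries to remove it.     */
static void driver_a_remove(packet *p)
{
    /* On the real system this is a REMQUE at IPL 6; p->blink now holds
     * garbage, so the dereference faults -- the INVEXCEPTN / PGFIPLHI crash. */
    printf("driver A sees blink = %p (garbage)\n", (void *)p->blink);
    /* p->blink->flink = p->flink;   <-- this is where the crash happens     */
}

int main(void)
{
    packet head = { &head, &head, 0, "" };
    packet irp  = { &head, &head, 1, "error context" };
    head.flink = head.blink = &irp;

    driver_b_error_recovery(&irp);  /* no check that driver A is finished    */
    driver_a_remove(&irp);          /* use after release: bad back link      */
    return 0;
}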
Then DEC dusted off the DEBNA to DEBNI upgrade kit with a much lower cost, which when multiplied by 190 still amounted to a hefty sum. The upgrade kit is a set of EEPROMS for the DEBNA along with a couple of other little parts.

We ended up getting 190 upgrade kits and installing them. Since the upgrade, we have NOT had the IPL 6 crashes with the ET/PEA driver footprint. We also do not run the SPM event trace programs anymore.

There are several footnotes to this complicated story.

1. DEC apparently felt/feels NO RESPONSIBILITY for MISINFORMATION in the S&O book with regard to this problem. I have not looked to see if the 6610 upgrade options NOW list DEBNA upgrades as mandatory.

2. If you call the person you are dealing with on your version of this problem and ask THEM to go talk with Mark/Rob and mention the IPL 6 ET/PEA problem at the MMC/GE STGT program, there should be all sorts of light bulbs going off. They should also be able to download the two SPM event trace programs so that you can run them to at least SLOW DOWN the occurrence of these crashes. If you are running DSNlink, they are easily downloadable. The use of the SPM programs is one option until you decide what you want to do about upgrading the DEBNA's.

3. I have a copy of the DEC Eng. letter stating the 'position' that DEBNA interfaces are not supported on 6610 platforms. I can FAX a copy to those interested, as long as the request list does not get too far out of hand in length.

4. To me, this problem shows that there REMAINS A PROBLEM between the ET/PEA drivers under heavy load. Note that at NO TIME have there been code changes for this issue. The solution is hardware based. The DEBNI interface is quicker than the DEBNA; I do not know if the command queue is any larger as well. It seems to me that we could very well end up back in the SAME PROBLEM given the 'right conditions' of I/O load on the DEBNI interfaces. While I am not holding my breath waiting for the problem to surface yet again on the DEBNI's, I fully expect to see it again on this program. At that point, we will be faced with the truly ugly task of getting approvals to upgrade 190 DEBNI interfaces to the XMI versions. And since this is a NASA program, that means getting approvals from the US Congress for the funds. Like I mentioned earlier, an UGLY task.

5. It is my understanding that DEC has closed the plant that manufactured the DEBNA to DEBNI upgrade kits as part of its cost reduction plans, and that the kits are not being manufactured by any other plant and therefore are not available for purchase anymore. Rumor had it that our 190 kit order resulted in the plant in New England remaining open a few weeks longer than planned in order to get the parts together. Therefore any site that has this problem now may have as its only option purchasing complete DEBNI interface cards at a significantly higher cost than the DEBNA to DEBNI upgrade kits. Maybe if enough sites have this problem and SCREAM about it to DEC, the kits may become available once again.

If there are comments or questions regarding the information presented here, I will respond to them as they are received.

Take care, and good luck... pdc

Paul D. Clayton
Address - CLAYTON@RADIUM.XRT.UPENN.EDU (USA)

Disclaimer: All thoughts and statements here are my own and NOT those of my employer, and they are also not based on, nor do they contain, restricted information.