Re: crashing problem related to multipipe and p22


Rob Jenkins (robj++at++sgi.com)
Wed, 12 Aug 1998 10:45:18 +0100


Sulen, Andy B wrote:
>
> We're experiencing a similar problem using Performer2.2 in a
> 2-channel/pipe configuration as shown from the following SYSLOG excerpt.
> Unfortunately, we haven't discovered the cause of the problem.
> Please pass along any info you obtain and we'll do the same on our end.
>
> > unix: WARNING: IR2: Pipe hung: no TBUS or ARM activity.
> >
> > unix: WARNING: IR2: swapbuffer timeout (rnp=0xc000000001402eb0)
> >
> unix: WARNING: IR2: idma timed out for io_addr 0x834000, len 0x1ab80
>
> unix: WARNING: IR2: Fatal error. bdata = 0xa800000000a79800 death_reason = 4
>

These are all fairly generic pipe crash warnings: the pipe hung, so
swapbuffers eventually timed out. The idma message might help narrow
things down a bit. If you don't have the CPU panics that Eddy got then
you could well have a very different problem. Neither scenario gives
enough info to make much of a diagnosis, so I'll paste some generic
steps below that should help narrow in on the problem. In general, with
any repeated iR pipe crashes:

- Check you have the latest iR patches
- Log a support call
- Check the HW ( irsaudit )
- Come up with a way to reproduce the crash at will so support can debug
it.

I have some more detailed steps in a web page. I haven't posted the
HTML itself, to avoid the bandwidth overhead of sending a fair-sized
HTML doc to over 1000 people; instead I saved it as text and posted
that. The HTML has links to internal pages and tools that wouldn't work
outside the SGI firewall anyway.

Cheers
Rob

  
> Thanks,
>
> Andy B. Sulen
> Boeing Defense & Space Group
> Integrated Technology Development Laboratories
> PH: 206-544-6438 / FAX: 206-655-1875
> E-Mail: andy.b.sulen++at++boeing.com
>
> > ----------
> > From: Eddy Kuo[SMTP:ekuo++at++ait.nrl.navy.mil]
> > Sent: Tuesday, August 11, 1998 7:47 AM
> > To: info-performer++at++sgi.com
> > Cc: Gary Samuel
> > Subject: crashing problem related to multipipe and p22
> >
> > Hello:
> >
> > I am trying to render four channels on a two pipe Onyx2 IR machine.
> > Each pipe is rendering two channels. Besides rendering, I also
> > set up an xformer and get mouse input from the master channel, and
> > that's all the program does. The crash occurs only when I render
> > into four windows (I can do three windows, two on one pipe, and one
> > on the other pipe). I am using Performer 2.2 with the latest patch,
> > and IRIX 6.4.
> >
> > I would appreciate any help.
> >
> > I am including an excerpt from SYSLOG below.
> >
> >
> > ......
> >
> >
> > Aug 11 10:08:17 4A:fargo unix: WARNING: IR0: Pipe hung: no TBUS or ARM activity.
> > Aug 11 10:08:17 2A:fargo unix:
> > Aug 11 10:08:17 4A:fargo unix: WARNING: IR0: swapbuffer timeout (rnp=0xc0000000014c4b58)
> > Aug 11 10:08:17 2A:fargo unix:
> > Aug 11 10:15:29 6F:fargo syslogd: restart
> >
> > ......
> >
> > Aug 11 10:15:48 5D:fargo sn0log: B Fatal: PANIC:
> > Aug 11 10:15:48 5D:fargo sn0log: B Fatal: CPU 1:
> > Aug 11 10:15:48 5D:fargo sn0log: B Fatal: assertion failure!
> > Aug 11 10:15:48 5D:fargo sn0log: B Fatal:
> >
> > .....
> > =======================================================================
> > List Archives, FAQ, FTP: http://www.sgi.com/Technology/Performer/
> > Submissions: info-performer++at++sgi.com
> > Admin. requests: info-performer-request++at++sgi.com
> >

-- 
________________________________________________________________
Rob Jenkins	Silicon Graphics 	mailto:robj++at++sgi.com


Debugging a Crashing InfiniteReality Pipe

------------------------------------------------------------------------

Introduction

This page goes over some steps that are common when trying to isolate a problem with an InfiniteReality ( iR ) graphics pipe hanging or crashing. It's a collection of things that invariably get checked or done in the process of working a support call or bug logged for such a problem. You can use these things to help work a call/bug and ideally go through this list before opening a new support call or bug. Identifying the nature of the problem you're seeing may well point you to a known bug and so potentially a solution or workaround.

Checking Patches

Checking Hardware

Debugging the Pipe

Narrowing Down a Testcase

Flashing the EEPROM

Stopping and Starting the Pipe

Link to Useful Graphics Tools and Resources

Checking Patches

Always double check that the machine with the problem has the latest iR gfx patch installed. It's vital to make sure that you're not chasing a problem that is already fixed. At the time of writing the latest released iR gfx patches were:

patch 2922: Onyx2 6.4 graphics rollup #5 including GVO support or

patch 2327: Onyx (not Onyx2) InfiniteReality 6.2 Fifth Release

These apply to Onyx2 and Onyx respectively. These patches get replaced often, so always check their lineage to see whether later replacements have been released. You can go to the Patchworks page and follow the lineage link for any patch, or click on the links on the patch numbers above to view that patch in Patchworks, then scroll down to follow the lineage link for the specific patch you are viewing. For completeness you should really install the iR gfx patch together with the whole recommended patchset that it's in, as it will have been tested against those patches. You should at least make sure you have the latest kernel rollup patch for the platform you're on: take the number of the kernel rollup installed on the problem machine, use Patchworks to trace the lineage of that patch, and make sure you have the most recent released version.

Checking Hardware

Check that the latest Onyx Diagnostics patch is installed. The newer the diagnostics patch you have, the more likely it is that any HW problem will get detected. Passing diagnostics doesn't mean you can be 100% sure that there's no HW problem but it at least gives some confidence that HW is OK. At the time of writing the latest iR diagnostics patches were:

patch 2795: Onyx2 Diagnostics 7th release or

patch 2371: InfiniteReality (Onyx) Diagnostics Fifth Release

These apply to Onyx2 and Onyx respectively. These patches get replaced often, so always check their lineage to see whether later replacements have been released. You can go to the Patchworks page and then follow the lineage link for any patch, or click on the links on the patch numbers above to view that patch in Patchworks, then scroll down to follow the lineage link for the specific patch you are viewing.

Once you have the latest diags patch, run the iR diagnostics, irsaudit.

The InfiniteReality, Essential Kona Information page has a link with lots of irsaudit info and there's also a man page.

Debugging the Pipe

If you've checked the patches and checked the HW and you still have a problem, then here are some suggestions for narrowing down and debugging it.

Try to describe the symptoms. These problems generally fall into a couple of loose areas; some useful things to establish are:

* Does the problem happen on only one machine? If possible, try to reproduce it on other similar machines. If it really happens on more than one similar machine then the chances of it being a HW problem are reduced.
* Does the problem happen with one specific application?
* If it always happens with the same application, is it with the same combination of actions?
* Does the pipe crash ( after which it should restart, putting you back at the login screen ) or does the pipe just hang, in that the gfx locks up but the rest of the machine is still fine?
* Can you get useful Kona Post Mortem ( kpm ) dumps?

You use kpm to dump the state of an iR gfx pipe. The InfiniteReality, Essential Kona Information page has a link with lots of information about kpm and there's a man page. If a pipe crashes it will try to restart; if gfxinit() detects that the pipe was in a bad state it should trigger kpm automatically. Alternatively, it's possible to force kpm to dump when the pipe is hung in a bad state. Both methods are described below.

You can find out whether kpm ran when the pipe restarted: in /var/adm/SYSLOG you would see messages like:

Jan 21 07:56:29 4D:mental gfxinit: Pipe 0 in bad state (0x11) ... starting postmortem
Jan 21 07:56:30 6B:mental kpm[1613]: Initiating kpm dump
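A quick way to look for such messages is sketched below. This is only an illustration: the function name kpm_events is made up, and the patterns are copied from the sample messages above, so other releases may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: list SYSLOG lines showing that gfxinit found a pipe
# in a bad state or that kpm began a dump. The patterns are taken from the
# sample messages in the text and are an assumption for other releases.
kpm_events() {
    # $1: log file to search; defaults to the standard SYSLOG location
    grep -E 'gfxinit: Pipe [0-9]+ in bad state|kpm\[[0-9]+\]: Initiating kpm dump' \
        "${1:-/var/adm/SYSLOG}"
}
```

Called with no argument it searches /var/adm/SYSLOG; pass a file name to search a saved copy of the log instead.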

If you do ls -l /var/adm/crash/diags/gfx/IR then you'll see any files produced by kpm. If any are from the time that the pipe crashed with this problem then they might be useful. Unfortunately the kpm dump started automatically by gfxinit sometimes times out on Onyx2 ( rarely on Onyx ), so you might see a dump file ( with a name like kpm_980121_073720.dump ) that has a length > 0 but whose corresponding .rslt and .sum files have length 0; in this case the dump file is usually no good. If there are kpm_xxxxx.dump files of length > 0 with .rslt and .sum files of length > 0, copy them somewhere to save them. kpm only keeps the 5 most recent dumps, so if the pipe keeps crashing you'll eventually overwrite the dumps you have with new ones.
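The check-and-copy step can be scripted. The sketch below is an illustration only: save_kpm_dumps is a made-up name, it assumes a POSIX shell, and it assumes the .dump, .rslt and .sum files of one dump share a base name as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: copy every kpm dump whose .dump, .rslt and .sum
# files are all non-empty into a backup directory, so that later crashes
# cannot rotate them away. The shared base name is an assumption.
save_kpm_dumps() {
    dest=$1                                  # backup directory (required)
    src=${2:-/var/adm/crash/diags/gfx/IR}    # where kpm writes its dumps
    if [ -z "$dest" ]; then
        echo "usage: save_kpm_dumps dest_dir [dump_dir]" >&2
        return 2
    fi
    mkdir -p "$dest" || return 1
    for dump in "$src"/kpm_*.dump; do
        [ -e "$dump" ] || continue           # glob matched nothing
        base=${dump%.dump}
        # test -s is true only for files that exist and are non-empty
        if [ -s "$dump" ] && [ -s "$base.rslt" ] && [ -s "$base.sum" ]; then
            cp -p "$dump" "$base.rslt" "$base.sum" "$dest"/ && echo "saved: $base"
        else
            echo "skipped (incomplete): $dump" >&2
        fi
    done
}
```

For example, save_kpm_dumps /usr/tmp/kpm_saved copies the complete dumps from the default directory; the destination path here is just a placeholder.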

Realistically kpm dumps are most useful to engineers in the iR SW group but it is often possible to work out if you're seeing a known problem or get a rough idea of the kind of problem you're seeing by searching the database of support calls and bugs with the Oasis search tool and some key words from a kpm dump. The InfiniteReality, Essential Kona Information page link about kpm describes how to run kpm and look at the dumps. I've found that these things can sometimes turn up useful info:

* Finding known bugs with similar combinations of 'Current command' and/or 'Previous command', maybe also tied in with similar 'GE<N> stopped at' type messages, can sometimes indicate that you're seeing the same bug. It's easy to get a misleading impression from kpm dumps that you're seeing a known problem, so always keep any dumps you have for engineering to look at if need be.
* Seeing a consistent pattern of messages in different dumps generated from the problem you're seeing. If you can't pin it to a known bug then at least you/engineering can narrow down the type of things an application might need to do to trigger such things.

Note: I wouldn't suggest spending too much time looking at kpm dumps. Having a reproducible test case and then generating some dumps for engineering to look at is likely to be most useful; it's just worth making sure you're not obviously seeing a known bug.

If kpm never seems to have run successfully at the time of the pipe crash ( i.e. no useful files in /var/adm/crash/diags/gfx/IR ) then it might be worth trying to set the machine up to not automatically restart a crashed gfx pipe. When the pipe does crash you can force the dump of the pipe state by hand and then restart the pipe once the dump is done. You'll need to run these commands as root:

chkconfig windowsystem off      ( this means that gfx won't restart if they crash )
run app to the point where gfx pipe crashes/hangs
/usr/gfx/stopgfx                ( make sure gfx is stopped )
/usr/gfx/KONA/bin/kpm -dump

Once done, there should be a kpm dump file /var/adm/crash/diags/gfx/IR/kona.dump

Starting the gfx again ( /usr/gfx/startgfx ) will turn windowsystem back on and restart the pipe normally.

Narrowing Down a Testcase

The primary thing with gfx pipe problems is getting a reliable way to reproduce the problem. If the best testcase turns out to be a large application, that is still better than no way to reproduce at all. Ideally, though, it's worth narrowing the test case right down to OpenGL only. The best way to start doing this is with ogldebug ( even IRIS GL applications run through the IGLOO layer on iR and generate OpenGL calls for the pipe, so ogldebug will still pick up the underlying OpenGL ). This Pipeline article has an ogldebug overview and also refers to other useful ogldebug info. If you can get an impression of what kind of things the gfx is doing when it fails, you can use ogldebug to zoom in further. If you can get a GL trace into a form where it's compilable and reproduces the problem, you can start to eliminate things. I would normally try to:

* Take the serial GL call trace ( often very long ), identify repeated sections of code and make a loop which goes round them.
* Turn on ogldebug GL error checking and use any error codes to search the GL header files in /usr/include for more clues about the error. Also, look at the man page for the function giving the error, especially the limitations and machine dependencies sections.
* Establish which GL state needs to be enabled to see the problem and/or which modes need to be set to what values.
* Iterate, turning off states, trying different combinations and removing data until you have the smallest GL code that still makes the problem happen. This can be a huge help to engineering in locating and fixing any new bugs. Bear in mind that problems aren't always due to a certain combination of modes and functions in OpenGL; they might be related to certain data sizes or types, for example.
* Reduce the number of contexts used. Knowing whether the problem is related to multiple contexts is very useful.
* If you suspect certain OpenGL functions are triggering the problem, you can set ogldebug to skip over those calls to test your suspicions.
* Reduce the requirements of the visual used, and try to work out whether the problem is at all dependent on the visual. Bear in mind that changing the visual may affect the OpenGL features or modes being used. The command findvis can be used from the command line to list OpenGL capable visuals on a machine. The command glxinfo can be used to display information about OpenGL capable visuals and the OpenGL renderer of an X server. The xwininfo command can be used to determine what visual an application is using by specifying the -tree and then -id options. Within a program, glXChooseVisual() returns a visual that matches specified attributes, while glXGetConfig() returns information about GLX visuals.

Flashing the EEPROM

The iR video format combination can be changed with ircombine. The InfiniteReality, Essential Kona Information page has a link with information about ircombine and there's a man page. If an iR pipe seems to come up without a valid video format, it can be useful to be able to flash the GE EEPROM and restart the pipe with a default vof. This is a bulletin on the GE EEPROM; it describes the conditions under which you might do this and the ireeprom utility that you might use.

Stopping and Starting the Pipe

In theory all that is needed to cycle the graphics pipe is stop/startgfx ( or the 'Vulcan Death Grip': <shift><ctrl><alt><f12><numpad /> ). In practice, however, especially on multipipe systems, it's often worth taking a more 'belt and braces' approach and doing:

(/usr/gfx/stopgfx;/etc/init.d/xdm stop;sleep 2;killall -9 xdm;/usr/gfx/startgfx) &

NOTE: This can be done from the graphics monitor and from Unix. Do not forget the ( ) and the &

------------------------------------------------------------------------

Please contact Rob Jenkins if you have any comments or suggestions.



This archive was generated by hypermail 2.0b2 on Wed Aug 12 1998 - 04:17:32 PDT

This message has been cleansed for anti-spam protection. Replace '++at++' in any mail addresses with the '@' symbol.