Bugzilla – Full Text Bug Listing

| Summary: | xfs_force_shutdown(dm-0,0x8) called from line 1091 of file fs/xfs/xfs_trans.c | | |
|---|---|---|---|
| Product: | XFS | Reporter: | Peter Nealy <pnealy> |
| Component: | XFS kernel code | Assignee: | XFS power people <xfs-masters> |
| Status: | RESOLVED WORKSFORME | QA Contact: | |
| Severity: | blocker | | |
| Priority: | P1 | CC: | hch |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Hardware: | PC | | |
| OS: | Linux | | |
| Whiteboard: | | | |

Description
Peter Nealy
2007-08-13 17:23:46 CDT
Looks like it's canceling a dirty transaction:
1084         /*
1085          * See if the caller is relying on us to shut down the
1086          * filesystem. This happens in paths where we detect
1087          * corruption and decide to give up.
1088          */
1089         if ((tp->t_flags & XFS_TRANS_DIRTY) &&
1090             !XFS_FORCED_SHUTDOWN(tp->t_mountp))
1091                 xfs_force_shutdown(tp->t_mountp, XFS_CORRUPT_INCORE);
2.6.12 is awfully old, but I understand that it's an embedded product.
There may have been a fix for this since then, it rings a bell, but I don't
recall offhand.
You could set the panic mask sysctl:
fs.xfs.panic_mask (Min: 0 Default: 0 Max: 127)
Causes certain error conditions to call BUG(). Value is a bitmask;
OR together the tags which represent errors which should cause panics:
XFS_NO_PTAG 0
XFS_PTAG_IFLUSH 0x00000001
XFS_PTAG_LOGRES 0x00000002
XFS_PTAG_AILDELETE 0x00000004
XFS_PTAG_ERROR_REPORT 0x00000008
XFS_PTAG_SHUTDOWN_CORRUPT 0x00000010
XFS_PTAG_SHUTDOWN_IOERROR 0x00000020
XFS_PTAG_SHUTDOWN_LOGERROR 0x00000040
to trip a panic when you get a shutdown; you could then get a BUG, backtrace,
and perhaps a dump at the moment shutdown was called... though I don't know if
that's feasible in the field.
-Eric
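
To make the bitmask arithmetic concrete, here is a minimal sketch in C that combines the tags listed above into a value for fs.xfs.panic_mask. The tag values are copied from the list in this comment; the two helper function names are hypothetical and exist only for illustration, they are not part of XFS.

```c
#include <stdio.h>

/* Tag values as quoted in the fs.xfs.panic_mask description above. */
#define XFS_PTAG_IFLUSH            0x00000001
#define XFS_PTAG_LOGRES            0x00000002
#define XFS_PTAG_AILDELETE         0x00000004
#define XFS_PTAG_ERROR_REPORT      0x00000008
#define XFS_PTAG_SHUTDOWN_CORRUPT  0x00000010
#define XFS_PTAG_SHUTDOWN_IOERROR  0x00000020
#define XFS_PTAG_SHUTDOWN_LOGERROR 0x00000040

/*
 * Hypothetical helper (not an XFS function): mask that asks for a
 * panic on any of the three shutdown tags, built with bitwise OR.
 */
unsigned int xfs_panic_mask_shutdowns(void)
{
    return XFS_PTAG_SHUTDOWN_CORRUPT |
           XFS_PTAG_SHUTDOWN_IOERROR |
           XFS_PTAG_SHUTDOWN_LOGERROR;
}

/*
 * Hypothetical helper (not an XFS function): mask with every tag
 * from the list set.
 */
unsigned int xfs_panic_mask_all(void)
{
    return XFS_PTAG_IFLUSH | XFS_PTAG_LOGRES | XFS_PTAG_AILDELETE |
           XFS_PTAG_ERROR_REPORT | xfs_panic_mask_shutdowns();
}
```

With these values the shutdown-only mask works out to 0x70 (112 decimal), and the mask with every listed tag set is 0x7f (127 decimal), which agrees with the "Max: 127" shown for the sysctl. Either number is what would be written to fs.xfs.panic_mask.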
Thanks Eric for responding so soon. I'll have them set the panic mask sysctl, but will they be able to relay that information to me easily when the failure occurs? They're not running a kdb enabled kernel, and us not being there locally that would be difficult anyway. Is this information spewed into /var/log/messages or something similar to access after they reboot the appliance?

If it triggers a BUG() you might get it in /var/log/messages or maybe only as far as the console. I don't know what you guys have in place for remote debugging or post-mortems... that part is up to you I think. :)
-Eric

We're back chasing this again. Is there a reliable way to force a crashdump of xfs so we can ensure our kdb/collection are working for us?

See Eric's original reply (comment #1), i.e. 'echo 255 > /proc/sys/fs/xfs/panic_mask' will cause the machine to panic on detection of the problem.
FWIW, are the machines at/near ENOSPC when this shutdown occurs?

Yes, I have the panic_mask set to all flags on every bootup. Filesystems can have as little as 7% on them when they fail, so not near ENOSPC.
We're actually putting 2.6.21 with all of our special sauce on them at that level. KDB is in place and on by default. My hope is we never see a failure again because - if it is even an xfs issue - the problem was fixed since 2.6.12.6.
Will keep you informed.

If you want to test crash collection, echo c > /proc/sysrq-trigger should start a crash sequence. Also look into netdump, diskdump, kdump, or whatnot for collecting a core from the field (I suppose in an embedded product, diskdump or kdump would be the options... though if this is an arm box, I don't know if such things work there).

Did you guys make any progress on this one?

Closed due to lack of feedback. Please re-open if you have new information.