XFS crashing system with general protection fault
Bruno Prémont
bonbons at linux-vserver.org
Tue Feb 10 01:05:47 CST 2015
Hi Dave,
On Tue, 10 Feb 2015 08:24:20 +1100 Dave Chinner wrote:
> On Mon, Feb 09, 2015 at 09:47:01AM +0100, Bruno Prémont wrote:
> > On Fri, 6 Feb 2015 09:15:16 +1100 Dave Chinner wrote:
> > > On Thu, Feb 05, 2015 at 03:10:07PM +0100, Bruno Prémont wrote:
> > > > New crash, new trace, this time on 3.18.2.
> > > > It looks like this time a NULL dereference happened before the overwritten memory poison was detected.
> > > >
> > > > Once again it's during normal system operation (no mount/umount activity)
> > >
> > > Can you rebuild the kernel with CONFIG_XFS_WARN=y and see if that
> > > throws any interesting messages into logs?
> >
> > Will try and see
> >
> > > However:
> > >
> > > > [1900390.261491] =============================================================================
> > > > [1900390.272989] BUG task_struct (Tainted: G D W ): Poison overwritten
> > > > [1900390.283021] -----------------------------------------------------------------------------
> > > > [1900390.283021]
> > > > [1900390.297056] INFO: 0xffff880213d651b3-0xffff880213d651b3. First byte 0x6d instead of 0x6b
> > > > [1900390.309044] INFO: Slab 0xffffea00084f5800 objects=16 used=16 fp=0x (null) flags=0x8000000000004080
> > > > [1900390.323087] INFO: Object 0xffff880213d64ba0 @offset=19360 fp=0xffff880213d61e40
> > > > [1900390.323087]
> > > > [1900390.336988] Bytes b4 ffff880213d64b90: 60 2d d6 13 02 88 ff ff 5a 5a 5a 5a 5a 5a 5a 5a `-......ZZZZZZZZ
> > > > [1900390.350988] Object ffff880213d64ba0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > > [1900390.364943] Object ffff880213d64bb0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > ....
> > > > [1900391.674636] Object ffff880213d651b0: 6b 6b 6b 6d 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkmkkkkkkkkkkkk
> > > ^^
> > >
> > > There's a single bit that has been flipped in the task_struct slab.
> > > So more than just XFS is seeing memory corruption - this is in core
> > > kernel structure slab caches. I'm not sure, either, how XFS could
> > > cause corruption in this slab.
> > >
> > > So, I'd be checking all the previous memory corruptions to see if
> > > they are single bit errors, and if there is any pattern to the
> > > addresses at which they occur. The above bit flip makes me think
> > > "hardware issue" and everything else stems from that...
> >
> > System has ECC RAM so faulty RAM looks less probable (no complaint seen
> > by kernel nor recorded by firmware).
>
> Sure, but that's not the only hardware in the memory path so single
> bit errors can occur elsewhere as data moves across the bus or sits
> in CPU caches. And if you're not using an IOMMU then it could even
> be hardware writing to memory incorrectly...
>
> > All previous crashes for which I have some logs were dereferences after
> > free, not attempts to allocate memory from free slabs whose poison had
> > been modified.
> >
> > Though what does that single bit represent in that area if it was
> > used/modified after free?
>
> It means that there's either a use after free, or you have a
> hardware problem. Being in the task struct slab, if it's a use after
> free then it's unlikely to be an XFS problem.

I mean: which field of task_struct does the affected byte/bit belong to?
Knowing that would help tell whether this could be a write-after-free
(of a task_struct) or not.

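As a starting point for that question, the byte offset of the damaged byte inside the object can be worked out from the two INFO lines of the SLUB report quoted above. The following is only a minimal sketch: the regular expressions and the `analyse` helper are illustrative names made up here, not part of any kernel tooling, and it assumes the "Object" address is the start of the freed task_struct. Mapping the offset to an actual field would still need the struct layout of this exact kernel build (for example from debug info with a tool such as pahole).

```python
import re

# Two lines copied verbatim from the SLUB report quoted above.
REPORT = """\
INFO: 0xffff880213d651b3-0xffff880213d651b3. First byte 0x6d instead of 0x6b
INFO: Object 0xffff880213d64ba0 @offset=19360 fp=0xffff880213d61e40
"""

# Hypothetical patterns written for this sketch; they only cover the
# line shapes seen in this particular report.
RANGE_RE = re.compile(
    r"INFO: 0x([0-9a-f]+)-0x([0-9a-f]+)\. "
    r"First byte 0x([0-9a-f]{2}) instead of 0x([0-9a-f]{2})"
)
OBJECT_RE = re.compile(r"INFO: Object 0x([0-9a-f]+) @offset=")


def analyse(report):
    """Return offset and bit-difference details for one SLUB poison report."""
    rng = RANGE_RE.search(report)
    obj = OBJECT_RE.search(report)
    first, last = int(rng.group(1), 16), int(rng.group(2), 16)
    found, expected = int(rng.group(3), 16), int(rng.group(4), 16)
    obj_addr = int(obj.group(1), 16)
    diff = found ^ expected  # bits that differ from the poison value
    return {
        # Assumption: the object address is the start of the task_struct.
        "offset_in_object": first - obj_addr,
        "bytes_affected": last - first + 1,
        "xor": diff,
        "bits_changed": bin(diff).count("1"),
    }


result = analyse(REPORT)
print(result)
```

For this report the sketch gives an offset of 0x613 (1555) bytes into the object and a single affected byte. The XOR of 0x6d and 0x6b is 0x06, so by this arithmetic two adjacent bits differ from the poison value (equivalently, the byte value is 2 higher than 0x6b).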
> FWIW, can you post the output of "grep PARAVIRT <kernel config
> file>"?

grep does not find any match (full config, from before enabling XFS_WARN,
attached).

Cheers,
Bruno
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xfs.config
Type: application/octet-stream
Size: 83962 bytes
Desc: not available
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20150210/af777497/attachment-0001.obj>