[Cc'ed lkml, hence the full-quote]
On Fri, 12 Oct 2012 at 08:33, Dave Chinner wrote:
> On Thu, Oct 11, 2012 at 11:13:14AM -0700, Christian Kujau wrote:
> > Hi,
> > since Linux 3.5 I'm seeing these "inconsistent lock state" lockdep
> > warnings. They show up in 3.6 as well. I was told that I
> > may have run out of inode attributes. This may well be the case, but I
> > cannot reformat the disk right now and will have to live with that warning
> > a while longer.
> > I got the warning again today, and 8h later the system hung and eventually
> > shut down. The last message from the box was received via netconsole:
> > XFS: possible memory allocation deadlock in xfs_buf_allocate_memory (mode:0x250)
> > This has been reported for 3.2.1, but when the message was printed I
> > was not around and could not watch /proc/slabinfo.
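One way to have this data next time is to log a periodic snapshot of /proc/slabinfo. Below is a minimal Python sketch. It assumes the `slabinfo - version: 2.1` file format. `parse_slabinfo()` and `snapshot()` are hypothetical helpers. The size estimate ignores per-slab overhead, and reading the real file normally needs root.

```python
# Hypothetical helpers: summarize a /proc/slabinfo snapshot (format
# "slabinfo - version: 2.1" assumed) so slab usage can be logged periodically.
from typing import List, Tuple


def parse_slabinfo(text: str) -> List[Tuple[str, int]]:
    """Return (cache name, approximate bytes) pairs, largest first.

    Approximate size = num_objs * objsize; per-slab overhead is ignored.
    """
    caches = []
    for line in text.splitlines():
        # Skip blank lines, the version banner and the "# name ..." header.
        if not line.strip() or line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a cache line
        # Columns: name, active_objs, num_objs, objsize, ...
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        caches.append((name, num_objs * objsize))
    caches.sort(key=lambda item: item[1], reverse=True)
    return caches


def snapshot(path: str = "/proc/slabinfo", top: int = 5) -> List[Tuple[str, int]]:
    """Read the slabinfo file at `path` and return the `top` largest caches."""
    with open(path) as f:
        return parse_slabinfo(f.read())[:top]
```

You could run `snapshot()` every minute from cron or a small loop and send the output to syslog, which would then reach netconsole as well. `slabtop -o` gives a similar one-shot view.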
> > The last -MARK- message (some kind of heartbeat message from syslog,
> > printing "MARK" every 5min) was received at 07:55 local time, and the
> > netconsole message above was received at 08:09, so two -MARK- messages were
> > lost. sar(1) stopped recording at 08:05.
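Here is the gap arithmetic as a small sketch. With a 5-minute MARK interval, heartbeats were due at 08:00 and 08:05. Both fall between the last one seen (07:55) and the netconsole message (08:09). `missed_marks()` is a hypothetical helper, and the fixed interval is an assumption.

```python
# Hypothetical helper: list the heartbeats that should have arrived after the
# last one seen and before a later event, assuming a fixed MARK interval.
from datetime import datetime, timedelta
from typing import List


def missed_marks(last_mark: datetime, event: datetime,
                 interval: timedelta = timedelta(minutes=5)) -> List[datetime]:
    """Return the expected heartbeat times strictly between last_mark and event."""
    missed = []
    expected = last_mark + interval
    while expected < event:
        missed.append(expected)
        expected += interval
    return missed
```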
> > These two incidents (the lockdep warning and the final lockup) may be
> > unrelated (and timestamp-wise, they seem to be), but I thought I'd
> > better report it.
> > Full dmesg and .config: http://nerdbynature.de/bits/3.6.0/xfs/o
> The inconsistent lock state is this path:
> Oct 11 00:18:27 alice kernel: [261506.767190] [e5e2fc70] [c00b5d44]
> Oct 11 00:18:27 alice kernel: [261506.768354] [e5e2fcf0] [c01a6f9c]
> Oct 11 00:18:27 alice kernel: [261506.769522] [e5e2fd10] [c01a8090]
> Oct 11 00:18:27 alice kernel: [261506.770698] [e5e2fd30] [c01ffad8]
> Oct 11 00:18:27 alice kernel: [261506.772106] [e5e2fd50] [c01e9504]
> Oct 11 00:18:27 alice kernel: [261506.773309] [e5e2fde0] [c01eabc0]
> Oct 11 00:18:27 alice kernel: [261506.774493] [e5e2fe20] [c01bd5c0]
> Oct 11 00:18:27 alice kernel: [261506.775672] [e5e2fe60] [c01b8450]
> Oct 11 00:18:27 alice kernel: [261506.776863] [e5e2fe70] [c00dfa88]
> Oct 11 00:18:27 alice kernel: [261506.778046] [e5e2fe90] [c00db968]
> Oct 11 00:18:27 alice kernel: [261506.779235] [e5e2feb0] [c00d2e24]
> Oct 11 00:18:27 alice kernel: [261506.780425] [e5e2fed0] [c00d2f10]
> Oct 11 00:18:27 alice kernel: [261506.781619] [e5e2ff40] [c0010aac]
> Which indicates that it is this:
> http://oss.sgi.com/archives/xfs/2012-09/msg00305.html
> which is a real bug in the VM code that the VM developers refuse to
> fix even though it is simple to do and patches have been posted
> several times to fix it.
> /me points to his recent rant rather than repeating it:
I read through it but only understood half of it. Still, it's interesting to
see that there seems to be a real issue.
> However, that is unrelated to this message:
> XFS: possible memory allocation deadlock in xfs_buf_allocate_memory
> which triggers when memory cannot be allocated. mode 0x250 is
> ___GFP_NOWARN | ___GFP_IO | ___GFP_WAIT
> or more commonly known as GFP_NOFS with warnings turned off.
> Basically the warning is saying "we're trying really hard to allocate
> memory, but we're not making progress". If it was only emitted once,
> however, it means that progress was made, as the message is emitted
> every 100 times through the loop and so only one message means it
> looped fewer than 200 times.
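Below is a sketch of the mode decoding above. It assumes the low ___GFP_* bit values from the Linux 3.x include/linux/gfp.h; later kernels renumbered them, so check the header for your version. `decode_gfp()` and `warnings_emitted()` are hypothetical helpers. The latter is a simplified model of a retry loop that logs on every 100th failed attempt, not the actual XFS code.

```python
# Hypothetical helpers. Bit values assumed from Linux 3.x include/linux/gfp.h
# (low bits only); they differ in later kernels.
from typing import List

GFP_BITS = {
    0x01: "___GFP_DMA", 0x02: "___GFP_HIGHMEM", 0x04: "___GFP_DMA32",
    0x08: "___GFP_MOVABLE", 0x10: "___GFP_WAIT", 0x20: "___GFP_HIGH",
    0x40: "___GFP_IO", 0x80: "___GFP_FS", 0x100: "___GFP_COLD",
    0x200: "___GFP_NOWARN", 0x400: "___GFP_REPEAT", 0x800: "___GFP_NOFAIL",
    0x1000: "___GFP_NORETRY",
}

GFP_NOFS = 0x10 | 0x40        # __GFP_WAIT | __GFP_IO
GFP_KERNEL = GFP_NOFS | 0x80  # GFP_NOFS | __GFP_FS
GFP_NOWARN = 0x200            # __GFP_NOWARN


def decode_gfp(mask: int) -> List[str]:
    """Name each known bit set in `mask`, lowest bit first."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mask & bit]
    unknown = mask & ~sum(GFP_BITS)  # bits outside the table above
    if unknown:
        names.append(f"unknown(0x{unknown:x})")
    return names


def warnings_emitted(failed_attempts: int, every: int = 100) -> int:
    """Model a retry loop that logs once every `every` failed attempts."""
    messages = 0
    for retries in range(1, failed_attempts + 1):
        if retries % every == 0:
            messages += 1
    return messages
```

`decode_gfp(0x250)` yields WAIT, IO and NOWARN, i.e. GFP_NOFS | __GFP_NOWARN. Under the loop model, exactly one message corresponds to 100 to 199 failed attempts.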
Memory usage at that time was no different from other days, so I don't
know why it had such a hard time allocating memory. But I don't have any
> What it does imply, however, is that vm_map_ram() is being called
> from GFP_NOFS context quite regularly and might be blocking there,
> and so the lockdep warning is more than just a nuisance - your
> system may indeed have hung there, but I don't have enough
> information to say for sure.
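A toy model (Python, not kernel code) shows why an allocation with GFP_KERNEL from GFP_NOFS context matters. It assumes that vm_map_ram() in kernels of that era takes no gfp_mask argument and does its internal allocations with GFP_KERNEL. The helper names are hypothetical.

```python
# Toy model, not kernel code. Bit values as in Linux 3.x gfp.h (assumed).
___GFP_WAIT = 0x10
___GFP_IO = 0x40
___GFP_FS = 0x80

GFP_NOFS = ___GFP_WAIT | ___GFP_IO
GFP_KERNEL = GFP_NOFS | ___GFP_FS


def escapes_context(caller_mask: int, inner_mask: int) -> bool:
    """True if a nested allocation uses IO/FS bits the caller had dropped."""
    return bool(inner_mask & ~caller_mask & (___GFP_FS | ___GFP_IO))


def nested_alloc_mask(caller_mask: int, honors_caller: bool) -> int:
    """Mask a helper uses internally: the caller's mask, or a hardcoded GFP_KERNEL."""
    return caller_mask if honors_caller else GFP_KERNEL
```

Suppose the inner mask carries __GFP_FS that the caller had dropped. Reclaim triggered by that allocation may then call back into the filesystem while the caller holds filesystem locks. That is the kind of recursion lockdep's reclaim-state tracking flags.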
> When it hangs next time, can you get a blocked task dump from sysrq
> (i.e. sysrq-w, or "echo w > /proc/sysrq-trigger")? That's the only
> way we're going to know where the system hung.
OK, will try to get this information the next time this happens.
> You might also want to ensure the hung task functionality is built
> into your kernel, so it automatically dumps stack traces for tasks
> hung longer than
Yes, the option was already set:
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
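CONFIG_BOOTPARAM_HUNG_TASK_PANIC only decides whether a detected hang panics the machine. The detector itself is CONFIG_DETECT_HUNG_TASK, and its timeout is exposed at runtime in /proc/sys/kernel/hung_task_timeout_secs. The sketch below checks a .config for the detector and issues the sysrq-w dump. The helper names are hypothetical, and writing to /proc/sysrq-trigger requires root.

```python
# Hypothetical helpers: inspect a kernel .config and trigger a sysrq dump.
from typing import Dict, Optional


def parse_config(text: str) -> Dict[str, Optional[str]]:
    """Map CONFIG_* names to values; '# CONFIG_X is not set' maps to None."""
    options: Dict[str, Optional[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def hung_task_detector_enabled(options: Dict[str, Optional[str]]) -> bool:
    # The panic-on-hang option says nothing about whether detection is on.
    return options.get("CONFIG_DETECT_HUNG_TASK") == "y"


def trigger_sysrq(key: str = "w", path: str = "/proc/sysrq-trigger") -> None:
    """Equivalent of `echo w > /proc/sysrq-trigger` (blocked-task dump)."""
    with open(path, "w") as f:
        f.write(key)
```

The value in /proc/sys/kernel/sysrq only restricts the keyboard combinations. Root can always use /proc/sysrq-trigger, so `trigger_sysrq()` works even where the keyboard sysrq is disabled.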
Thanks for digging through this and for the explanation!
BOFH excuse #398:
Data for intranet got routed through the extranet and landed on the internet.