On August 16, David Sparks wrote:
> I have a server where the `dmesg` is full of:
>
> Filesystem "hda4": xlog_state_do_callback: looping 10
> Filesystem "hda4": xlog_state_do_callback: looping 20
> Filesystem "hda4": xlog_state_do_callback: looping 10
> Filesystem "hda4": xlog_state_do_callback: looping 10
> Filesystem "hda4": xlog_state_do_callback: looping 10
> Filesystem "hda4": xlog_state_do_callback: looping 20
> [...]
>
> The hda4 partition contains the /var tree.

First, the purpose of that function is to execute I/O completion
callbacks from XFS log buffer writes. The callbacks have to be run in
log block sequence order; that is, if writes complete out of order, it
is this function's job to make sure the callbacks are called in order
(the faster I/O completion has its callbacks delayed until the slower
one completes).
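
As a rough illustration of that ordering rule, here is a toy model in
Python. It is not the XFS code: the class, its fields, and the use of
plain integer sequence numbers are all hypothetical, chosen only to show
how an early completion is held back until its predecessors finish.

```python
class OrderedCompletions:
    """Toy model: run completion callbacks strictly in sequence order."""

    def __init__(self):
        self.next_seq = 0   # lowest sequence number not yet called back
        self.done = {}      # seq -> callbacks of writes that finished early

    def complete(self, seq, callbacks):
        """Record that write `seq` finished, then run whatever is runnable.

        Returns the sequence numbers whose callbacks ran during this call.
        """
        self.done[seq] = callbacks
        ran = []
        # Drain every buffer whose predecessors have all completed.
        while self.next_seq in self.done:
            for cb in self.done.pop(self.next_seq):
                cb()
            ran.append(self.next_seq)
            self.next_seq += 1
        return ran


order = []
oc = OrderedCompletions()
oc.complete(1, [lambda: order.append("second")])  # finishes first, is held back
oc.complete(0, [lambda: order.append("first")])   # releases both, in order
```

In this sketch the callback for write 1 does not run when write 1
completes; it runs only once write 0 has completed as well.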

It is theoretically possible for this function to never complete. If
the filesystem is under an intense metadata load and the log device is
fast enough, one CPU could spend all of its time running callbacks
while other CPUs are creating transactions and writing them to disk.
Since callbacks are an interrupt-time event, a CPU or interrupt
handler could be forever locked up. That, in turn, could result in
other interrupts not being serviced.

I didn't come up with a way to avoid this livelock that wasn't racy,
so instead I chose to let it exist, but to put a warning message in if
the function looped excessively. The count printed is the number of
times that the function has looped over all log buffers and is printed
every 10 loops.
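
The warning scheme described above can be sketched roughly as follows.
This is a Python model, not the kernel code: `do_callbacks`, `one_pass`,
and `warn_every` are hypothetical names, and the sketch assumes the
counter starts from zero on each call. Only the message text is taken
from the quoted `dmesg` output.

```python
def do_callbacks(one_pass, fs_name, warn_every=10, warn=print):
    """Keep making passes over the log buffers until a pass finds no work.

    `one_pass` is a hypothetical callable that makes one pass over all log
    buffers and returns True if it ran any callbacks (so more may be due).
    """
    loops = 0
    while one_pass():
        loops += 1
        # Do not try to break the loop; just report that it is running long.
        if loops % warn_every == 0:
            warn(f'Filesystem "{fs_name}": xlog_state_do_callback: '
                 f'looping {loops}')
    return loops
```

Under that assumption, a call that needs 25 passes would warn at 10 and
at 20 and then go quiet, and the next busy call would start again at 10.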

Nothing is wrong with your filesystem.

I'm curious what your system config is: CPU speed, how many CPUs, and
what type of disk and controller are under your XFS filesystems. It
could be time to revisit my definition of "looping excessively" in this
context.

Glen Overby