Re: fc10, some processes stuck in D state

To: yuji_touya@xxxxxxxxxxxxxxxxxxxx
Subject: Re: fc10, some processes stuck in D state
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 6 Jan 2011 16:00:57 +1100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <8529A87D856C184491994079B5F87B68C1A8289FCC@xxxxxxxxxxxxxxxxxxxx>
References: <8529A87D856C184491994079B5F87B68C1A8289FCC@xxxxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Thu, Jan 06, 2011 at 01:18:27PM +0900, yuji_touya@xxxxxxxxxxxxxxxxxxxx wrote:
> Hello folks,
> We need to save a bunch of transport-stream(TS) data(4MB/sec, 300GB/day), and
> are using xfs formatted hardware RAID system to save TS data.
> Some processes (pdflush, kswapd, our own services etc) stuck in D-state and
> our system stops saving and down-converting TS data.

Everything is waiting for log space to be freed. That is typically a
sign that metadata has not been flushed, or that IO completion has not
occurred, so the tail of the log is not moving forward.
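
To see which tasks are stuck and what they are sleeping on, something
like the following can be run while the hang is in progress. It is only
a minimal sketch, assuming a Linux /proc that exposes
/proc/<pid>/stat and /proc/<pid>/wchan; the function names
parse_stat and d_state_processes are made up for this illustration.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def read_wchan(path):
    """Return the kernel symbol a task sleeps in, or '?' if unavailable."""
    try:
        with open(path) as f:
            return f.read().strip() or "?"
    except OSError:
        return "?"


def d_state_processes(proc_root="/proc"):
    """Return [(pid, comm, wchan)] for tasks in uninterruptible sleep (D)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # the process exited while we were reading it
        if state == "D":
            found.append((pid, comm, read_wchan(os.path.join(base, "wchan"))))
    return sorted(found)
```

Printing the result of d_state_processes() during a hang gives one line
per blocked task. If most of them report the same wait channel, that
points at the common resource they are all waiting for.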

> It rarely happens (3 times in recent 3 months), but it's quite serious for us.
> How can we avoid this?

What did you change 3 months ago? Or did this always happen?

> One more thing, in that situation when I run "ls /mnt/raid/foo" command, 
> all stuck processes suddenly wake up and continue running. Very strange...
> (/mnt/raid is where we mount xfs)

So doing new read IOs starts stuff moving again? That sounds like an IO
completion has not arrived from the lower layers until a new IO is
issued and completes. Perhaps the hardware RAID is not issuing an
interrupt when it should?
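
One rough way to check this is to snapshot /proc/interrupts twice during
a hang and see whether the controller's interrupt count moves at all.
The sketch below assumes the usual /proc/interrupts layout (a header row
of CPU names, then one row per IRQ); parse_interrupts and stalled_irqs
are hypothetical helper names, and the pattern to match depends on
whatever driver name your controller shows in that file.

```python
def parse_interrupts(text):
    """Map IRQ label -> (total count across CPUs, description)."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Rows such as ERR/MIS carry fewer counters than there are CPUs.
        for tok in fields[:ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        desc = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), desc)
    return table


def stalled_irqs(before, after, pattern):
    """IRQs whose description contains pattern and whose count is unchanged."""
    return sorted(
        irq
        for irq, (count, desc) in after.items()
        if pattern in desc and irq in before and before[irq][0] == count
    )


def snapshot(path="/proc/interrupts"):
    """Read and parse the current interrupt counters."""
    with open(path) as f:
        return parse_interrupts(f.read())
```

Taking snapshot() a few seconds apart while processes are stuck, and
again after the "ls" wakes them up, shows whether the count only
advances once the new IO is issued. An unchanged counter during the
stall is consistent with a missing interrupt, but it does not prove it.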

What type of RAID controller/storage hardware are you using? Is it
all running the latest firmware, appropriate drivers, etc?


Dave Chinner
