
Re: xfs deadlock in stable kernel 3.0.4

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: xfs deadlock in stable kernel 3.0.4
From: Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx>
Date: Tue, 20 Sep 2011 19:23:00 +0200
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, "xfs-masters@xxxxxxxxxxx" <xfs-masters@xxxxxxxxxxx>, aelder@xxxxxxx, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
In-reply-to: <20110920160226.GA25542@xxxxxxxxxxxxx>
References: <C6515E45-5724-43DD-95A8-1F89AFE29601@xxxxxxxxxxxx> <20110912200543.GA22409@xxxxxxxxxxxxx> <4E6EF274.7050007@xxxxxxxxxxxx> <20110913205018.GA8543@xxxxxxxxxxxxx> <4E70571A.80108@xxxxxxxxxxxx> <4E705C42.6020909@xxxxxxxxxxxx> <20110914143005.GA28496@xxxxxxxxxxxxx> <4E75B660.1030502@xxxxxxxxxxxx> <20110918230245.GF15688@dastard> <4E78665E.8030409@xxxxxxxxxxxx> <20110920160226.GA25542@xxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20110902 Thunderbird/6.0.2
Can you summarize all the data that we gathered over this thread into one
summary, e.g.

Yes, I hope it helps.

  - what kernel does it happen on?  Seems like 3.0 and 3.1 hit it easily,
    2.6.38 sometimes, 2.6.32 is fine.  Did you test anything between
    2.6.32 and 2.6.38?
Hits very easily: 3.0.4 and 3.1-rc5
Very rare: 2.6.38. As it happened only a few times, I cannot guarantee 100% that it is really the same issue.
No issues at all: 2.6.32

I've not tested anything between 2.6.32 and 2.6.38, as I cannot reproduce it under 2.6.38 at all - seen once a week out of 500.

  - what hardware hits it often/sometimes/never?
I've seen this only on multi-core CPUs with > 2.8 GHz and fast SAS RAID 10 or SSD. I cannot say whether it's the CPU or the fast disks, as our low-cost systems have only small CPUs and the high-end ones have big CPUs with fast disks.

  - what is the fs geometry?
What exactly do you mean? I've seen this on 1TB and 160GB SSD devices with totally different disk layouts.

  - what is the hardware?
see above

  - is this a 32 or 64-bit kernel, or do you run both?
Always 64-bit.

I'm pretty sure most got posted somewhere, but let's get a summary
as things were a bit confusing sometimes.

No problem.

Note that 2.6.38 moved the whole log grant code to a lockless algorithm,
so this might be a likely culprit if you're managing to hit race windows
no one else does, i.e. this really is a timing issue.

I'm willing to do nearly anything to solve this. What can I do to help? My last hope today was to get some code lines with kgdb - sadly, it does not happen at all when kgdb is attached ;-(

