netdev

Re: Fw: [Bugme-new] [Bug 4628] New: Test server hang while running rhr (

To: Jian Jun He <hejianj@xxxxxxxxxx>
Subject: Re: Fw: [Bugme-new] [Bug 4628] New: Test server hang while running rhr (network) test on RHEL4 with kernel 2.6.12-rc1-mm4
From: Andrew Morton <akpm@xxxxxxxx>
Date: Thu, 26 May 2005 13:31:23 -0700
Cc: ganesh.venkatesan@xxxxxxxxx, anton@xxxxxxxxx, rende@xxxxxxxxxx, ganesh.venkatesan@xxxxxxxxx, herbert@xxxxxxxxxxxxxxxxxxx, jesse.brandeburg@xxxxxxxxx, jgarzik@xxxxxxxxx, wangjs@xxxxxxxxxx, john.ronciak@xxxxxxxxx, cdlwangl@xxxxxxxxxx, linuxppc64-dev@xxxxxxxxxxxxxxxxxxxxxxxxxx, netdev@xxxxxxxxxxx
In-reply-to: <OFA06E9B66.53E35770-ON4825700D.00585BAF-4825700D.00589E4E@xxxxxxxxxx>
References: <468F3FDA28AA87429AD807992E22D07E056B8C25@orsmsx408> <OFA06E9B66.53E35770-ON4825700D.00585BAF-4825700D.00589E4E@xxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
Jian Jun He <hejianj@xxxxxxxxxx> wrote:
>
>  2. Download rhr2-rhel4-1.0-14a.noarch.rpm from rhn.redhat.com and install
>  it on
>  the test machine.
>  3. Configure and run the rhr test via invoking redhat-ready.

This is the problematic bit.

- Please provide a full URL which can be used to obtain rhr. 
  rhn.redhat.com is subscription-based.

- Please describe the hardware setup - surely the test requires at least
  two machines.  How are they configured?

- Provide an exact transcript of the commands which are to be used.  Is
  it just 

        redhat-ready

  with no arguments?



All that being said, we already have a quite specific diagnosis via code
inspection, from Herbert:


Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Andrew Morton <akpm@xxxxxxxx> wrote:
> >
> > Might be a bug in the e100 driver, might not be.
> >
> > I assume this is the
> >
> >        BUG_ON(skb->list != NULL);
>
> It certainly is a bug in e100.
>
> e100_tx_timeout -> e100_down -> e100_rx_clean_list
>
> is racing against
>
> e100_poll -> e100_rx_clean -> e100_rx_indicate
>
> e100_rx_clean/e100_rx_indicate takes an skb off the RX ring and
> while it's being processed e100_rx_clean_list comes along and
> frees it.
>
> From a quick check similar problems may exist in other drivers that
> have lockless ->poll() functions with RX rings.

Do the e100 maintainers agree with this diagnosis?  If so then more testing
isn't required at this stage - the next step is to fix the above bug, no?
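
To make the race Herbert describes concrete, here is a minimal user-space
sketch of one way to close it. It is not e100 code and not a proposed patch:
struct buf, struct rx_ring, ring_init(), ring_poll_one() and
ring_clean_list() are hypothetical stand-ins, and a pthread mutex stands in
for whatever serialization the driver would really use. The idea is only
that the poll path detaches a buffer from the ring before processing it, so
the teardown path can free only buffers that are still in the ring.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4

/* Hypothetical stand-in for struct sk_buff. */
struct buf {
        int len;
};

/* Hypothetical stand-in for the driver's RX ring. */
struct rx_ring {
        pthread_mutex_t lock;
        struct buf *slot[RING_SIZE];
};

/* Fill every slot with a freshly allocated buffer. */
static void ring_init(struct rx_ring *r)
{
        int i;

        pthread_mutex_init(&r->lock, NULL);
        for (i = 0; i < RING_SIZE; i++) {
                r->slot[i] = malloc(sizeof(struct buf));
                assert(r->slot[i] != NULL);
                r->slot[i]->len = 0;
        }
}

/*
 * Poll path: detach the buffer from the ring under the lock, then
 * process it outside the lock.  Once detached, teardown cannot see it.
 * Returns 1 if a buffer was processed, 0 if the slot was already empty.
 */
static int ring_poll_one(struct rx_ring *r, int i)
{
        struct buf *b;

        pthread_mutex_lock(&r->lock);
        b = r->slot[i];
        r->slot[i] = NULL;
        pthread_mutex_unlock(&r->lock);

        if (b == NULL)
                return 0;

        /* "Indicate" the buffer; here that just means consuming it. */
        free(b);
        return 1;
}

/*
 * Teardown path: free only the buffers still attached to the ring.
 * Returns the number of buffers freed.
 */
static int ring_clean_list(struct rx_ring *r)
{
        int i, freed = 0;

        pthread_mutex_lock(&r->lock);
        for (i = 0; i < RING_SIZE; i++) {
                if (r->slot[i] != NULL) {
                        free(r->slot[i]);
                        r->slot[i] = NULL;
                        freed++;
                }
        }
        pthread_mutex_unlock(&r->lock);
        return freed;
}
```

If one slot is polled and the ring is then cleaned, the clean pass frees only
the remaining RING_SIZE - 1 buffers, and neither path touches a buffer the
other one owns. Whether a lock or some other way of keeping the poll routine
out of the teardown path suits e100 better is for the maintainers to decide.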

