
Re: XFS hangs

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: XFS hangs
From: Amit Sahrawat <amit.sahrawat83@xxxxxxxxx>
Date: Wed, 22 Dec 2010 12:27:51 +0530
Cc: xfs@xxxxxxxxxxx, Eric Sandeen <sandeen@xxxxxxxxxxx>
In-reply-to: <AANLkTin44EAmOtYrE4bd==ux4s11_QfDLVd7ROSf_+K3@xxxxxxxxxxxxxx>
References: <AANLkTikOcJA5a6u2YDXVGAEY0CqKiATTF02VeBsSKtU0@xxxxxxxxxxxxxx> <20101222060254.GH4907@dastard> <AANLkTin44EAmOtYrE4bd==ux4s11_QfDLVd7ROSf_+K3@xxxxxxxxxxxxxx>
I tried changing the locking in
 
File : xfs_sync.c
Function : int xfs_quiesce_data(struct xfs_mount *mp)
    /* write superblock and hoover up shutdown errors */
-   error = xfs_sync_fsdata(mp, SYNC_WAIT);
+   error = xfs_sync_fsdata(mp, SYNC_TRYLOCK);
 
I made this change purely out of curiosity. I am trying to reproduce the hang with it applied, but have not observed one over many iterations so far.
I am also looking into possible side effects of this change. Please let me know if you are aware of any.
 
To add to this, the code area I suspect is:
fs/xfs/xfs_buf_item.c
Function: void xfs_buf_iodone_callbacks(xfs_buf_t *bp)
In this function, XFS_BUF_SET_BRELSE_FUNC(bp, xfs_buf_error_relse) registers xfs_buf_error_relse as the release callback, which should unlock the lock that is held. However, I doubt that the callback is actually getting called. I am still analyzing this code area.
 
Please let me know if this is the right direction.
 
Thanks & Regards,
Amit Sahrawat
 


 
On Wed, Dec 22, 2010 at 12:11 PM, Amit Sahrawat <amit.sahrawat83@xxxxxxxxx> wrote:
Extremely sorry for the inconvenience. I will take care to post complete details in future.
 
Test case:
Copy a complex directory structure (a large number of files and directories) to my XFS-formatted partition:
cp -ar /LibExe /usb/sda2
Unplug the USB device while the copy is in progress.
 
Storage: USB Flash, USB HDD (Both)
 
Kernel: 2.6.34
Target: MIPS
LOGS:
usb 2-1: USB disconnect, address 7
Device sda2, XFS metadata write error block 0x0 in sda2
xfs_force_shutdown(sda2,0x1) called from line 1004 of file fs/xfs/linux-2.6/xfs_buf.c.  Return address = 0x801cc294
Filesystem "sda2": I/O Error Detected.  Shutting down filesystem: sda2
Please umount the filesystem, and rectify the problem(s)
 
Plug in USB Port1
sd 7:0:0:0: [sdb] Attached SCSI disk
Filesystem "sda2": xfs_log_force: error 5 returned.
Filesystem "sda2": xfs_log_force: error 5 returned.
Filesystem "sda2": xfs_log_force: error 5 returned.
Filesystem "sda2": xfs_log_force: error 5 returned.
INFO: task usb_mount:1858 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
usb_mount        D [84a42440] 8032d62c     0  1858   1816                             (user thread)
Stack : 00000107 00000000 85e7be80 00030002 84a425c8 8032d62c 7fffffff 84a42440
        00000002 8496e200 00000001 00000000 85e7bf00 85e7bef8 7fa2f2e0 8032d62c
        00000001 801d69a8 85e7bd40 801d6b34 85e7bd4c 8032dc6c 00000000 801dbc80
        85e7be80 864315a8 8662c980 00000001 00000742 00000000 00000000 84b85800
        85e7bd90 801d6cc0 7fffffff 84a42440 00000002 8032ee74 00000081 804158a0
        ...
Call Trace:
[<8032d574>] __schedule+0x618/0x6b8 from[<8032d62c>] schedule+0x18/0x3c
[<8032d62c>] schedule+0x18/0x3c from[<8032dc6c>] schedule_timeout+0x2c/0x1c0
[<8032dc6c>] schedule_timeout+0x2c/0x1c0 from[<8032ee74>] __down+0x8c/0xdc
[<8032ee74>] __down+0x8c/0xdc from[<8004500c>] down+0x40/0x88
[<8004500c>] down+0x40/0x88 from[<801ca838>] xfs_buf_lock+0xcc/0x15c
[<801ca838>] xfs_buf_lock+0xcc/0x15c from[<801b71a0>] xfs_getsb+0x38/0x54
[<801b71a0>] xfs_getsb+0x38/0x54 from[<801d64a8>] xfs_sync_fsdata+0x7c/0x154
[<801d64a8>] xfs_sync_fsdata+0x7c/0x154 from[<801d7284>] xfs_quiesce_data+0x34/0x60
[<801d7284>] xfs_quiesce_data+0x34/0x60 from[<801d3514>] xfs_fs_sync_fs+0x30/0xec
[<801d3514>] xfs_fs_sync_fs+0x30/0xec from[<800ba09c>] __fsync_super+0xa4/0xc8
[<800ba09c>] __fsync_super+0xa4/0xc8 from[<800ba0d4>] fsync_super+0x14/0x28
[<800ba0d4>] fsync_super+0x14/0x28 from[<800ba4a0>] generic_shutdown_super+0x34/0x190
[<800ba4a0>] generic_shutdown_super+0x34/0x190 from[<800ba654>] kill_block_super+0x58/0x80
[<800ba654>] kill_block_super+0x58/0x80 from[<800bac6c>] deactivate_super+0x7c/0x110
[<800bac6c>] deactivate_super+0x7c/0x110 from[<800d2bbc>] sys_umount+0x310/0x358
[<800d2bbc>] sys_umount+0x310/0x358 from[<8000ff44>] stack_done+0x20/0x3c
-------------------------------------------------------------------------------------
Filesystem "sda2": xfs_log_force: error 5 returned.

Please let me know in case more information is needed.
 
Thanks & Regards,
Amit Sahrawat
On Wed, Dec 22, 2010 at 11:32 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
On Wed, Dec 22, 2010 at 11:05:26AM +0530, Amit Sahrawat wrote:
> Hi,
> I am encountering hang of XFS filesystem, please find the logs as given
> below:
> INFO: task usb_mount:1858 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> usb_mount        D [84a42440] 8032d62c     0  1858
> 1816                             (user thread)
> Stack : 00000107 00000000 85e7be80 00030002 84a425c8 8032d62c 7fffffff
> 84a42440
>         00000002 8496e200 00000001 00000000 85e7bf00 85e7bef8 7fa2f2e0
> 8032d62c
>         00000001 801d69a8 85e7bd40 801d6b34 85e7bd4c 8032dc6c 00000000
> 801dbc80
>         85e7be80 864315a8 8662c980 00000001 00000742 00000000 00000000
> 84b85800
>         85e7bd90 801d6cc0 7fffffff 84a42440 00000002 8032ee74 00000081
> 804158a0
>         ...
> Call Trace:
> [<8032d574>] __schedule+0x618/0x6b8 from[<8032d62c>] schedule+0x18/0x3c
> [<8032d62c>] schedule+0x18/0x3c from[<8032dc6c>] schedule_timeout+0x2c/0x1c0
> [<8032dc6c>] schedule_timeout+0x2c/0x1c0 from[<8032ee74>] __down+0x8c/0xdc
> [<8032ee74>] __down+0x8c/0xdc from[<8004500c>] down+0x40/0x88
> [<8004500c>] down+0x40/0x88 from[<801ca838>] xfs_buf_lock+0xcc/0x15c
> [<801ca838>] xfs_buf_lock+0xcc/0x15c from[<801b71a0>] xfs_getsb+0x38/0x54
> [<801b71a0>] xfs_getsb+0x38/0x54 from[<801d64a8>] xfs_sync_fsdata+0x7c/0x154
> [<801d64a8>] xfs_sync_fsdata+0x7c/0x154 from[<801d7284>]
> xfs_quiesce_data+0x34/0x60
> [<801d7284>] xfs_quiesce_data+0x34/0x60 from[<801d3514>]
> xfs_fs_sync_fs+0x30/0xec
> [<801d3514>] xfs_fs_sync_fs+0x30/0xec from[<800ba09c>]
> __fsync_super+0xa4/0xc8
> [<800ba09c>] __fsync_super+0xa4/0xc8 from[<800ba0d4>] fsync_super+0x14/0x28
> [<800ba0d4>] fsync_super+0x14/0x28 from[<800ba4a0>]
> generic_shutdown_super+0x34/0x190
> [<800ba4a0>] generic_shutdown_super+0x34/0x190 from[<800ba654>]
> kill_block_super+0x58/0x80
> [<800ba654>] kill_block_super+0x58/0x80 from[<800bac6c>]
> deactivate_super+0x7c/0x110
> [<800bac6c>] deactivate_super+0x7c/0x110 from[<800d2bbc>]
> sys_umount+0x310/0x358
> [<800d2bbc>] sys_umount+0x310/0x358 from[<8000ff44>] stack_done+0x20/0x3c

Please make sure you paste stack traces cleanly in your emails so we
can read them easily.
--
> After reboot it works fine, but while in this state no operation on XFS
> works.

What kernel? What did you do to produce the error? What is the output
of "echo w > /proc/sysrq-trigger"? Do you have a repeatable test
case? What sort of storage are you using? Were there any IO errors
before the hang? etc, etc, etc....

--

For future reference, when you are reporting a problem you need to
be specific about what you were doing to cause the problem you are
reporting.  Describe your kernel, your storage, your test case, any
errors that occurred before the problem you are reporting, etc.

We need this information to make any sense of your bug report, but
I'm getting tired of having to ask for it every time you report a
problem. The more information you put in your bug report, the more
likely we are to be able to help you. We don't have unlimited
amounts of time (or patience) to drag all the basic details of your
problem out of you over 3 or 4 emails, so including it up front will
help a lot....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

