Re: Bug#274988: XFS crash in kernel-image-2.6.8-1-686-smp

To: Jan Eringa <jan.eringa@xxxxxxxxxx>, 274988@xxxxxxxxxxxxxxx
Subject: Re: Bug#274988: XFS crash in kernel-image-2.6.8-1-686-smp
From: Horms <horms@xxxxxxxxxx>
Date: Mon, 18 Oct 2004 16:25:08 +0900
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <200410050943.05174.jan.eringa@xxxxxxxxxx>
References: <200410050943.05174.jan.eringa@xxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.6+20040907i
Hi Jan,

Thanks for your bug report. The only xfs patch that is applied to the
Debian kernel is an ioctl patch which is pending a merge upstream, so I
guess this is a bug in the XFS code as present in Linus' tree. I have
CCed the linux-xfs mailing list in the hope that someone there can help
you out.

linux-xfs people, please feel free to include or omit the Debian bug
tracking system on any resulting thread by CCing 274988@xxxxxxxxxxxxxxx,
as you see fit. I have attached the one patch that is applied to the
Debian 2.6.8 kernel for reference. Please let me or the bug's address
know if there is a resolution, as I am not on the linux-xfs list.

On Tue, Oct 05, 2004 at 09:43:03AM +0100, Jan Eringa wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Package: kernel-image-2.6.8-1-686-smp
> Version: 2.6.8-3
> Severity: Important
> 
> Hardware layout
> - ---------------------
> Dual Xeon  + Latest BIOS
> 1 GB ram
> 2 x 3ware SATA raid controllers + Latest Firmware
> 
> All disks live on the 3ware 9xxx controllers
> Controllers provide 3 x 1.5TB raid-5 stripes
> One of which holds /, swap and /var.
> 
> The rest of the free space I've built as a 4.5TB raid-0 stripe
> for the backup volume
> 
> This is then carved into.....
> - ----------------------------------------------------------------------------------------
> backup-srv:~# df -k
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/sda1              3937220   2285880   1451336  62% /
> /dev/sda3              3937252   2497220   1240024  67% /var
> /dev/md0             4677217408 3090281796 1586935612  67% /backups
> 
> 
> - ----------------------------------------------------------------------------------------
> 
> 
> The /dev/md0 device is ....
> - ----------------------------------------------------------------------------------------
> backup-srv:~# cat /proc/mdstat
> md0 : active raid0 sdc1[2] sdb1[1] sda4[0]
>       4677348544 blocks 64k chunks
> 
> unused devices: <none>
> 
> - ----------------------------------------------------------------------------------------
> 
> I had to use XFS as this was the only FS that would build that large.
> ext3 seems to barf at anything over 2TB
> 
> 
> 
> Problem Description
> - -------------------------
> This machine is the production backup server for all the *nix machines on
> the network. cron runs rsync via ssh to grab the files from each client target
> The bulk of the systems are backed up weekly and a few daily
> The system seems to survive anywhere between a couple of days and no more
> than 2 weeks under this sort of heavy IO & network loading before
> giving up the ghost. dmesg dumps follow......
> 
> This problem was also exhibited by 2.6.7 and 2.6.6
> 
> I'm dropping back to 2.4.27 now & will let you know if pain persists
> 
> 
> 
> - From Dmesg
> - ----------------------------------------------------------------------------------------
> Unable to handle kernel paging request at virtual address 20fda90c
>  printing eip:
> f8b26144
> *pde = 00000000
> Oops: 0000 [#1]
> PREEMPT SMP
> Modules linked in: af_packet ipv6 piix hw_random uhci_hcd usbcore shpchp 
> pciehp pci_hotplug floppy parport_pc parport pcspkr evdev e1000 xfs raid0 md 
> dm_mod ide_cd ide_core cdrom rtc ext3 jbd mbcache sd_mod unix 3w_9xxx 
> scsi_mod
> CPU:    3
> EIP:    0060:[<f8b26144>]    Not tainted
> EFLAGS: 00010213   (2.6.8.20040927)
> EIP is at xfs_ail_insert+0x24/0xd0 [xfs]
> eax: 000003e7   ebx: 00000000   ecx: 000003e7   edx: 00000000
> esi: 20fda904   edi: f7198c18   ebp: c2005168   esp: f7703dd4
> ds: 007b   es: 007b   ss: 0068
> Process xfslogd/3 (pid: 604, threadinfo=f7702000 task=f7cb87d0)
> Stack: 0002050a 0000052a 549b2041 ed9cd202 c2005168 f7198c18 f7198c00 c0f1d30c
>        f8b25e5d f7198c18 c2005168 00000000 c2005168 0002050a 0000052a 00000000
>        c2005168 0002050a 0000052a f8b258bc f7198c00 c2005168 0002050a 0000052a
> Call Trace:
>  [<f8b25e5d>] xfs_trans_update_ail+0x5d/0xf0 [xfs]
>  [<f8b258bc>] xfs_trans_chunk_committed+0x17c/0x240 [xfs]
>  [<f8b2566a>] xfs_trans_committed+0x4a/0x120 [xfs]
>  [<f8b17743>] xlog_state_do_callback+0x2c3/0x3d0 [xfs]
>  [<f8b178d0>] xlog_state_done_syncing+0x80/0xc0 [xfs]
>  [<f8b15fe5>] xlog_iodone+0x55/0xf0 [xfs]
>  [<f8b359bd>] pagebuf_iodone_work+0x4d/0x50 [xfs]
>  [<c0131a26>] worker_thread+0x1f6/0x2e0
>  [<f8b35970>] pagebuf_iodone_work+0x0/0x50 [xfs]
>  [<c011c4f0>] default_wake_function+0x0/0x20
>  [<c011c4f0>] default_wake_function+0x0/0x20
>  [<c0131830>] worker_thread+0x0/0x2e0
>  [<c0135f8a>] kthread+0xba/0xc0
>  [<c0135ed0>] kthread+0x0/0xc0
>  [<c01042c5>] kernel_thread_helper+0x5/0x10
> Code: 8b 46 08 8b 56 0c 89 44 24 08 89 54 24 0c 8b 55 0c 8b 45 08
>  <6>note: xfslogd/3[604] exited with preempt_count 1
> - ----------------------------------------------------------------------------------------
> 
> 
> Machine locks up a little while after this & after a kick in the guts gives
> on next startup....
> - ----------------------------------------------------------------------------------------
> backup-srv:~# mount /backups/
> Oct  4 12:47:03 ouprci05 kernel: Filesystem "md0": XFS internal error xlog_clear_stale_blocks(2) at line 1253 of file fs/xfs/xfs_log_recover.c.  Caller 0xf8b28876
> Oct  4 12:47:03 ouprci01 kernel: Filesystem "md0": XFS internal error xlog_clear_stale_blocks(2) at line 1253 of file fs/xfs/xfs_log_recover.c.  Caller 0xf8b28876
> mount: Unknown error 990
> - ----------------------------------------------------------------------------------------
> 
> 
> So I try.....
> - ----------------------------------------------------------------------------------------
> backup-srv:~# xfs_repair /dev/md0
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
> ERROR: The filesystem has valuable metadata changes in a log which needs to
> be replayed.  Mount the filesystem to replay the log, and unmount it before
> re-running xfs_repair.  If you are unable to mount the filesystem, then use
> the -L option to destroy the log and attempt a repair.
> Note that destroying the log may cause corruption -- please attempt a mount
> of the filesystem before doing this.
> - ----------------------------------------------------------------------------------------
> 
> 
> So I ....
> - ----------------------------------------------------------------------------------------
> backup-srv:~# xfs_repair -L /dev/md0
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
> ALERT: The filesystem has valuable metadata changes in a log which is being
> destroyed because the -L option was used.
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
> LEAFN node level is 1 inode 2820138 bno = 8388608
> 
> entry contains offset out of order in shortform dir 19126020
> corrected entry offsets in directory 19126020
>         - agno = 1
>         - agno = 2
> LEAFN node level is 1 inode 2147942164 bno = 8388608
> LEAFN node level is 1 inode 2148480815 bno = 8388608
> ....
> And so on for a few hours, for  the rest of the 4.5TB file system check to 
> complete :(
> ....
> 
> 
> 
> 
> ________________________________
> It is by caffeine alone I set my mind in motion,
> It is by the beans of Java that thoughts acquire speed,
> The hands acquire shaking, the shaking becomes a warning,
> It is by caffeine alone I set my mind in motion.
> (author unknown)
> with thanks and apologies to Frank Herbert
> ________________________________
> Jan Eringa
> Unix Admin
> Orbian Management Ltd
> ________________________________
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.4 (GNU/Linux)
> 
> iD8DBQFBYl6XX4LWCZ7JjaMRAtH0AJwPIxdCA6xO88hHtJa27qo7UBlG/QCgigGI
> dhtLCXAxPd1W46KbnFMdMcY=
> =nuOo
> -----END PGP SIGNATURE-----
> 
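
For anyone reading the trace, the figures in the oops quoted above appear
to be self-consistent. Below is a rough sketch in Python that checks this.
It assumes the "Code:" line starts at the faulting EIP (the oops does not
mark the byte) and that the bytes are ordinary 32-bit x86 encoding. The
helper name decode_mov_disp8 is made up for illustration and handles only
this one instruction form.

```python
# Values copied from the oops; everything else here is illustrative.
FAULT_ADDR = 0x20FDA90C            # "virtual address 20fda90c"
ESI = 0x20FDA904                   # "esi: 20fda904"
CODE = bytes.fromhex("8b 46 08")   # first three bytes of the "Code:" line

# 32-bit general-purpose registers in ModR/M encoding order.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp8(insn):
    """Decode 'mov r32, [r32 + disp8]' (opcode 0x8B, ModR/M mod=01, no SIB).

    Hypothetical helper: returns (destination, base register, displacement).
    """
    opcode, modrm, disp = insn[0], insn[1], insn[2]
    if opcode != 0x8B:
        raise ValueError("not a MOV r32, r/m32 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [reg + disp8] form without SIB is handled")
    if disp >= 0x80:               # disp8 is a signed byte
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


dest, base, disp = decode_mov_disp8(CODE)
print(f"mov {dest}, [{base}{disp:+#x}]")
print(f"{base} + {disp} = {ESI + disp:#010x}, fault at {FAULT_ADDR:#010x}")
```

If that reading is right, the faulting instruction is a load from offset 8
of whatever %esi points at, and 0x20fda904 is well below 0xc0000000, where
kernel addresses usually start on i386 (assuming the default 3G/1G split).
So xfs_ail_insert was probably handed, or walked onto, a bad pointer. This
says nothing about how the pointer came to be bad.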

-- 
Horms

Attachment: xfs-ioctl32.dpatch
Description: Text document
