Hi Jan,
Thanks for your bug report. The only xfs patch that is applied to the
Debian kernel is an ioctl patch which is pending a merge upstream, so I
guess this is a bug in the XFS code as present in Linus' tree. I have
CCed the linux-xfs mailing list in the hope that someone there can help
you out.
linux-xfs people, please feel free to include or exclude the Debian bug
tracking system from any resulting thread by CCing
274988@xxxxxxxxxxxxxxx as you see fit. I have attached the one patch
that is applied to the Debian 2.6.8 kernel for reference. Please let me
or the bug's address know if there is a resolution, as I am not on the
linux-xfs list.
On Tue, Oct 05, 2004 at 09:43:03AM +0100, Jan Eringa wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Package: kernel-image-2.6.8-1-686-smp
> Version: 2.6.8-3
> Severity: Important
>
> Hardware layout
> - ---------------------
> Dual Xeon + Latest BIOS
> 1 GB ram
> 2 x 3ware SATA raid controllers + Latest Firmware
>
> All disks live on the 3ware 9xxx controllers
> The controllers provide 3 x 1.5TB raid-5 stripes,
> one of which holds /, swap and /var.
>
> The rest of the free space I've built as a 4.5TB raid-0 stripe
> for the backup volume
>
> This is then carved into.....
> - ----------------------------------------------------------------------------------------
> backup-srv:~# df -k
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/sda1 3937220 2285880 1451336 62% /
> /dev/sda3 3937252 2497220 1240024 67% /var
> /dev/md0 4677217408 3090281796 1586935612 67% /backups
>
>
> - ----------------------------------------------------------------------------------------
>
>
> The /dev/md0 device is ....
> - ----------------------------------------------------------------------------------------
> backup-srv:~# cat /proc/mdstat
> md0 : active raid0 sdc1[2] sdb1[1] sda4[0]
> 4677348544 blocks 64k chunks
>
> unused devices: <none>
>
> - ----------------------------------------------------------------------------------------
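[For anyone trying to reproduce this layout: an array like the one shown
in /proc/mdstat above could be assembled roughly as follows. The device
names (sda4, sdb1, sdc1) and the 64k chunk size come from the report;
everything else is an assumption, a sketch rather than the exact commands
the reporter used.]

```shell
# Sketch only: recreate a raid0 layout like the one in /proc/mdstat above.
# Device names and the 64k chunk size are taken from the report; the rest
# (mdadm defaults, mkfs options) is assumed.
mdadm --create /dev/md0 --level=raid0 --chunk=64 \
    --raid-devices=3 /dev/sda4 /dev/sdb1 /dev/sdc1

# XFS was chosen because it handles filesystems larger than 2TB;
# mkfs.xfs with default options is assumed here.
mkfs.xfs /dev/md0
```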
>
> I had to use XFS as this was the only FS that would build that large.
> ext3 seems to barf at anything over 2TB
>
>
>
> Problem Description
> - -------------------------
> This machine is the production backup server for all the *nix machines on
> the network. cron runs rsync via ssh to grab the files from each client.
> The bulk of the systems are backed up weekly and a few daily.
> The system seems to survive anywhere between a couple of days and no more
> than 2 weeks under this sort of heavy IO & network loading before
> giving up the ghost. dmesg dumps follow......
>
> This problem was also exhibited by 2.6.7 and 2.6.6
>
> I'm dropping back to 2.4.27 now & will let you know if pain persists
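[To make the workload concrete, the cron-driven rsync-over-ssh job
described above would look something like the following. Hostnames,
paths, the schedule, and the `pull-backup` helper name are all
hypothetical, not taken from the report.]

```shell
# Hypothetical sketch of the backup job described above.
# /etc/cron.d/backups might contain an entry per client, e.g.:
#   30 2 * * 0  root  /usr/local/sbin/pull-backup client1.example.com
#
# pull-backup: pull one client's filesystem tree into /backups/<host>/
HOST="$1"
rsync -aHx --numeric-ids --delete \
    -e ssh "root@${HOST}:/" "/backups/${HOST}/"
```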
>
>
>
> From dmesg
> - ----------------------------------------------------------------------------------------
> Unable to handle kernel paging request at virtual address 20fda90c
> printing eip:
> f8b26144
> *pde = 00000000
> Oops: 0000 [#1]
> PREEMPT SMP
> Modules linked in: af_packet ipv6 piix hw_random uhci_hcd usbcore shpchp
> pciehp pci_hotplug floppy parport_pc parport pcspkr evdev e1000 xfs raid0 md
> dm_mod ide_cd ide_core cdrom rtc ext3 jbd mbcache sd_mod unix 3w_9xxx
> scsi_mod
> CPU: 3
> EIP: 0060:[<f8b26144>] Not tainted
> EFLAGS: 00010213 (2.6.8.20040927)
> EIP is at xfs_ail_insert+0x24/0xd0 [xfs]
> eax: 000003e7 ebx: 00000000 ecx: 000003e7 edx: 00000000
> esi: 20fda904 edi: f7198c18 ebp: c2005168 esp: f7703dd4
> ds: 007b es: 007b ss: 0068
> Process xfslogd/3 (pid: 604, threadinfo=f7702000 task=f7cb87d0)
> Stack: 0002050a 0000052a 549b2041 ed9cd202 c2005168 f7198c18 f7198c00 c0f1d30c
> f8b25e5d f7198c18 c2005168 00000000 c2005168 0002050a 0000052a 00000000
> c2005168 0002050a 0000052a f8b258bc f7198c00 c2005168 0002050a 0000052a
> Call Trace:
> [<f8b25e5d>] xfs_trans_update_ail+0x5d/0xf0 [xfs]
> [<f8b258bc>] xfs_trans_chunk_committed+0x17c/0x240 [xfs]
> [<f8b2566a>] xfs_trans_committed+0x4a/0x120 [xfs]
> [<f8b17743>] xlog_state_do_callback+0x2c3/0x3d0 [xfs]
> [<f8b178d0>] xlog_state_done_syncing+0x80/0xc0 [xfs]
> [<f8b15fe5>] xlog_iodone+0x55/0xf0 [xfs]
> [<f8b359bd>] pagebuf_iodone_work+0x4d/0x50 [xfs]
> [<c0131a26>] worker_thread+0x1f6/0x2e0
> [<f8b35970>] pagebuf_iodone_work+0x0/0x50 [xfs]
> [<c011c4f0>] default_wake_function+0x0/0x20
> [<c011c4f0>] default_wake_function+0x0/0x20
> [<c0131830>] worker_thread+0x0/0x2e0
> [<c0135f8a>] kthread+0xba/0xc0
> [<c0135ed0>] kthread+0x0/0xc0
> [<c01042c5>] kernel_thread_helper+0x5/0x10
> Code: 8b 46 08 8b 56 0c 89 44 24 08 89 54 24 0c 8b 55 0c 8b 45 08
> <6>note: xfslogd/3[604] exited with preempt_count 1
> - ----------------------------------------------------------------------------------------
>
>
> Machine locks up a little while after this and, after a kick in the guts,
> gives the following on the next startup....
> - ----------------------------------------------------------------------------------------
> backup-srv:~# mount /backups/
> Oct 4 12:47:03 ouprci05 kernel: Filesystem "md0": XFS internal error
> xlog_clear_stale_blocks(2) at line 1253 of file fs/xfs/xfs_log_recover.c.
> Caller 0xf8b28876
> Oct 4 12:47:03 ouprci01 kernel: Filesystem "md0": XFS internal error
> xlog_clear_stale_blocks(2) at line 1253 of file fs/xfs/xfs_log_recover.c.
> Caller 0xf8b28876
> mount: Unknown error 990
> - ----------------------------------------------------------------------------------------
>
>
> So I try.....
> - ----------------------------------------------------------------------------------------
> backup-srv:~# xfs_repair /dev/md0
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> ERROR: The filesystem has valuable metadata changes in a log which needs to
> be replayed. Mount the filesystem to replay the log, and unmount it before
> re-running xfs_repair. If you are unable to mount the filesystem, then use
> the -L option to destroy the log and attempt a repair.
> Note that destroying the log may cause corruption -- please attempt a mount
> of the filesystem before doing this.
> - ----------------------------------------------------------------------------------------
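[For reference, the order of operations that xfs_repair's message asks
for is roughly the following. The device and mount point come from the
report; the rest is the standard procedure as I understand it, a sketch
rather than a prescription.]

```shell
# Sketch of the recovery order xfs_repair's message asks for.
# 1. Try to mount so the kernel replays the XFS log:
mount /dev/md0 /backups
# 2. If the mount succeeds, unmount cleanly, then run repair:
umount /backups
xfs_repair /dev/md0
# 3. Only if the mount itself fails (as it did above) zero the log.
#    This discards any uncommitted metadata changes and can itself
#    cause corruption, so it is a last resort:
xfs_repair -L /dev/md0
```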
>
>
> So I ....
> - ----------------------------------------------------------------------------------------
> backup-srv:~# xfs_repair -L /dev/md0
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> ALERT: The filesystem has valuable metadata changes in a log which is being
> destroyed because the -L option was used.
> - scan filesystem freespace and inode maps...
> - found root inode chunk
> Phase 3 - for each AG...
> - scan and clear agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> LEAFN node level is 1 inode 2820138 bno = 8388608
>
> entry contains offset out of order in shortform dir 19126020
> corrected entry offsets in directory 19126020
> - agno = 1
> - agno = 2
> LEAFN node level is 1 inode 2147942164 bno = 8388608
> LEAFN node level is 1 inode 2148480815 bno = 8388608
> ....
> And so on for a few hours, until the check of the rest of the 4.5TB
> filesystem completes :(
> ....
>
>
>
>
> ________________________________
> It is by caffeine alone I set my mind in motion,
> It is by the beans of Java that thoughts acquire speed,
> The hands acquire shaking, the shaking becomes a warning,
> It is by caffeine alone I set my mind in motion.
> (author unknown)
> with thanks and apologies to Frank Herbert
> ________________________________
> Jan Eringa
> Unix Admin
> Orbian Management Ltd
> ________________________________
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.4 (GNU/Linux)
>
> iD8DBQFBYl6XX4LWCZ7JjaMRAtH0AJwPIxdCA6xO88hHtJa27qo7UBlG/QCgigGI
> dhtLCXAxPd1W46KbnFMdMcY=
> =nuOo
> -----END PGP SIGNATURE-----
>
--
Horms
xfs-ioctl32.dpatch
Description: Text document