
Re: xfs_logprint segfault with external log

To: Alexander Tsvetkov <alexander.tsvetkov@xxxxxxxxxx>, xfs@xxxxxxxxxxx
Subject: Re: xfs_logprint segfault with external log
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Wed, 11 Feb 2015 09:51:54 -0600
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <54DB5E70.80607@xxxxxxxxxx>
References: <54DB5E70.80607@xxxxxxxxxx>

On 2/11/15 7:51 AM, Alexander Tsvetkov wrote:
> Hello,
> I've obtained a corrupted xfs log after some xfs sanity testing:
> "log=logfile
> log_size=855
> dd if=/dev/zero "of=$log" bs=4096 count=$log_size
> loopdev=$(losetup -f)
> losetup $loopdev $log
> mkfs.xfs -f -m crc=1 -llogdev=$loopdev,size=${log_size}b $SCRATCH_DEV
> mount -t xfs -ologdev=$loopdev $SCRATCH_DEV $SCRATCH_MNT
> ./fdtree.sh  -l 4 -d 4 -C -o $SCRATCH_MNT
> sync
> umount $SCRATCH_MNT
> xfs_logprint -l $loopdev $SCRATCH_DEV"
> The test creates a CRC-enabled xfs filesystem with an external log of the
> minimal allowed size, then uses the fdtree utility to populate it with a
> small directory tree of subdirectories and files of fixed depth and size:
> https://computing.llnl.gov/?set=code&page=sio_downloads
> After that, xfs_logprint consistently reports bad data in the log:

TBH, xfs_logprint has always been a little buggy in corners.  It's
a diagnostic/developer tool, and as such has not been made as robust
as tools that users need to use every day.  Still, we'd hope for
no segfaults or errors.  ;)

> "Oper (307): tid: eec9b0c7  len: 16  clientid: TRANS  flags: none
> EXTENTS inode data
> Oper (308): tid: 41000000  len: 805306368  clientid: ERROR  flags: none
> LOCAL attr data
> ============================================================================
> cycle: 1        version: 2              lsn: 1,3138     tail_lsn: 1,2
> length of Log Record: 32256     prev offset: 3074               num ops: 375
> uuid: 39a962b7-4c0d-4e0e-8bcd-39471f93bc1d   format: little endian linux
> h_size: 32768
> ----------------------------------------------------------------------------
> Oper (0): tid: eec9b0c7  len: 48  clientid: TRANS  flags: none
> **********************************************************************
> * ERROR: data block=3138                                              *
> **********************************************************************
> xfs_logprint: unknown log operation type (2e00)
> Bad data in log"

It's probably just mis-parsing something.  i.e. more likely a logprint bug
than an xfs bug.
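
One small observation that fits a parsing problem, offered as a sketch rather
than a diagnosis: the implausible len reported for Oper (308), 805306368, is
0x30000000, and with its bytes reversed that is 48, the same len that Oper (0)
of the next record reports. That may or may not be a coincidence, but it is
what a field read at the wrong offset or in the wrong byte order would look
like. The helper name swap32 is made up for illustration.

```shell
# Value copied from the quoted xfs_logprint output ("len:" for Oper (308)).
bad_len=805306368

# Hypothetical helper: print a 32-bit value with its four bytes reversed.
swap32() {
    v=$(( $1 & 0xffffffff ))
    printf '%08x\n' $(( ((v & 0xff) << 24) | ((v & 0xff00) << 8) | ((v >> 8) & 0xff00) | ((v >> 24) & 0xff) ))
}

printf '%08x\n' "$bad_len"   # prints 30000000
swap32 "$bad_len"            # prints 00000030, i.e. 48 decimal
```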

If you could provide an xfs_metadump of the filesystem at this point, that
would probably be the simplest reproducer for us.  Fixing it may not be the
very highest priority, but I have dug into and fixed logprint bugs in the
past.  It's not very fun.  ;)
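
A minimal sketch of capturing that, assuming the variable names from the
quoted test script and an unmounted filesystem. The function name metadump_cmd,
the fallback device names, and the output file name are placeholders made up
for illustration; the function only prints the command so it can be reviewed
before being run as root. The xfs_metadump man page has noted that with -l the
external log itself is not copied, so it is probably worth keeping a copy of
the file backing the loop device ("logfile" in the script) alongside the dump.

```shell
# Print an xfs_metadump command line for a filesystem with an external log.
# Usage: metadump_cmd LOGDEV DATADEV OUTFILE
metadump_cmd() {
    printf 'xfs_metadump -l %s %s %s\n' "$1" "$2" "$3"
}

# Falls back to placeholder device names if the test variables are unset.
metadump_cmd "${loopdev:-/dev/loop0}" "${SCRATCH_DEV:-/dev/sdX1}" scratch.metadump
```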

> A subsequent call to "xfs_repair -n -l $loopdev $SCRATCH_DEV" passes and the
> filesystem mounts without errors.
> I suspected the log size was inappropriate, so I updated log_size to the
> default mkfs.xfs value for this device: "log_size=2560".
> After that, xfs_logprint dumped core with a segfault (race condition):
> "Feb 11 13:55:42 fedora.fedora kernel: xfs_logprint[14007]: segfault at 
> 29f16768 ip 00000000004028ed sp 00007fff61b46850 error 4 in 
> xfs_logprint[400000+4e000]"

A metadump of this filesystem would be useful as well, assuming it reproduces.

