
Re: Corruption of in-memory data detected.

To: Jonathan Dill <dill@xxxxxxxxxxxx>
Subject: Re: Corruption of in-memory data detected.
From: Steve Lord <lord@xxxxxxx>
Date: Wed, 24 Oct 2001 09:58:58 -0500
Cc: Marc Schmitt <schmitt@xxxxxxxxxxx>, Steve Lord <lord@xxxxxxx>, Eric Sandeen <sandeen@xxxxxxx>, linux-xfs@xxxxxxxxxxx, florin@xxxxxxx
In-reply-to: Message from Jonathan Dill <dill@xxxxxxxxxxxx> of "Wed, 24 Oct 2001 08:56:57 EDT." <3BD6BA99.4CAE0D71@xxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
In this case it looks very much like the start of the filesystem got
whacked with a bunch of zeros somehow. The file size limit is somewhat odd;
there is nothing in xfs which will prevent a file from being extremely
large - 2^44 is about where issues would start for buffered I/O. Possibly
the size issue is an interaction between the vfs in the ac kernel
and xfs - we will run some tests on this.
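
For scale, here is a rough arithmetic sketch of how far a ~6 GB file sits below the 2^44 figure. It assumes that 2^44 counts bytes and that "~6 GB" means decimal gigabytes; the names are illustrative only and are not taken from xfs.

```python
# Rough arithmetic only; constants come from the thread, names are made up.
BUFFERED_IO_THRESHOLD = 2 ** 44   # assumed to be bytes (the "2^44" figure)
REPORTED_FILE_SIZE = 6 * 10 ** 9  # "~6 GB" partition, assumed decimal GB


def to_tib(num_bytes):
    """Convert a byte count to tebibytes (2^40 bytes each)."""
    return num_bytes / 2 ** 40


def headroom_factor(file_size, threshold=BUFFERED_IO_THRESHOLD):
    """How many times larger the threshold is than the given file size."""
    return threshold / file_size


if __name__ == "__main__":
    print("threshold in TiB:", to_tib(BUFFERED_IO_THRESHOLD))
    print("headroom factor: ", round(headroom_factor(REPORTED_FILE_SIZE)))
```

Under those assumptions the threshold is thousands of times larger than the dump that failed, which is why the limit does not look like it comes from xfs itself.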

Your second email about xfs_repair finding everything does not tie in
with this output at all; you lost the root inode in this case.
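
A small sketch for reading the inode numbers in the quoted xfs_repair log. The decimal values are copied from that log; the helper names are hypothetical. The stored value is 2^64 - 1, i.e. every bit set, which as far as I know is what xfs uses as its "null" inode number, so the superblock that was found did not carry a usable root inode pointer.

```python
# Values copied from the quoted xfs_repair log; helper names are illustrative.
STORED_INODE = 18446744073709551615       # root/rbitmap/rsummary as found
RESET_ROOT_INODE = 18446744069414584448   # root inode pointer after reset


def is_all_ones_64(value):
    """True if value is 2^64 - 1, i.e. all 64 bits set."""
    return value == 2 ** 64 - 1


def as_hex64(value):
    """Format a value as a zero-padded 64-bit hexadecimal string."""
    return f"{value:#018x}"


if __name__ == "__main__":
    for name, ino in (("stored", STORED_INODE), ("reset", RESET_ROOT_INODE)):
        print(name, as_hex64(ino), "all ones:", is_all_ones_64(ino))
```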

So far I have been unable to make things fall over here at all - which
is frustrating.

Steve

> Hi guys,
> 
> I just managed to get this error and severe fs corruption without RAID,
> mongo, huge filesystem, or anything weird.  (BTW I'm a bleeding edge
> kind of guy, so the fs wasn't critical and I've got backups :-).  This
> was with the kernel-2.4.9-6SGI_XFS_PR1, and I'm not using any of the
> modules that had symbol problems.
> 
> Initially, I was trying to xfsdump and gzip a whole filesystem to an xfs
> on another disk and I got "file size limit exceeded" and "core dumped." 
> So I said, "OK so what's the max filesize?"  I thought it was pretty
> high for XFS, but apparently not for Linux XFS--I was backing up a ~6 GB
> partition, so the file size had to be less than that.  I didn't find any
> clues in a cursory glance at xfs.h, so I decided to test it in a
> not-so-nice way:
> 
> dd if=/dev/zero of=test_size.img bs=10240k
> 
> The process choked and this is what turned up in the system log:
> 
> Oct 24 07:47:08 localhost kernel: xfs_force_shutdown(ide1(22,65),0x8)
> called from line 1120 of file xfs_trans.c.  Return address = 0xc01ca409
> Oct 24 07:47:08 localhost kernel: Corruption of in-memory data
> detected.  Shutting down filesystem: ide1(22,65)
> Oct 24 07:47:08 localhost kernel: Please umount the filesystem, and
> rectify the problem(s)
> 
> When I tried to mount the disk again, I got this error:
> 
> Oct 24 07:47:30 localhost kernel: XFS: bad magic number
> Oct 24 07:47:30 localhost kernel: XFS: SB validate failed
> 
> xfs_repair had to search for quite a while to find a good alternate SB.
> I attached the log of xfs_repair so you can see I did a capital job of
> trashing the FS :-)  Later, I'll try it again to see if I can reproduce
> the problem, then again with the newer 2.4.9 kernel.
> 
> -- 
> "Jonathan F. Dill" (dill@xxxxxxxxxxxx)
> 
> [Attachment: xfs_repair.log]
> 
> [root@localhost ~]# mount /trans
> mount: wrong fs type, bad option, bad superblock on /dev/hdd1,
>        or too many mounted file systems
> [root@localhost ~]# xfs_repair /dev/hdd1
> xfs_repair: warning - cannot set blocksize on block device /dev/hdd1: Input/output error
> Phase 1 - find and verify superblock...
> bad primary superblock - bad magic number !!!
> 
> attempting to find secondary superblock...
> .......................................................................
> .......................................................................
> .......................................................................
> .......................................................................
> .......................................................................
> .......................................................................
> .......................................................................
> .......................................................................
> .......................................................................
> .......................................................................
> .......................................................................
> .......................................................................
> .......................................................................
> .......................................................................
> ...............found candidate secondary superblock...
> verified secondary superblock...
> writing modified primary superblock
> sb root inode value 18446744073709551615 inconsistent with calculated value 13835049396628095104
> resetting superblock root inode pointer to 18446744069414584448
> sb realtime bitmap inode 18446744073709551615 inconsistent with calculated value 13835049396628095105
> resetting superblock realtime bitmap ino pointer to 18446744069414584449
> sb realtime summary inode 18446744073709551615 inconsistent with calculated value 13835049396628095106
> resetting superblock realtime summary ino pointer to 18446744069414584450
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
> bad magic # 0x0 for agf 0
> bad version # 0 for agf 0
> bad length 0 for agf 0, should be 262144
> bad magic # 0x0 for agi 0
> bad version # 0 for agi 0
> bad length # 0 for agi 0, should be 262144
> reset bad agf for ag 0
> reset bad agi for ag 0
> bad agbno 0 for btbno root, agno 0
> bad agbno 0 for btbcnt root, agno 0
> bad agbno 0 for inobt root, agno 0
> root inode chunk not found
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
> error following ag 0 unlinked list
>         - process known inodes and perform inode discovery...
>         - agno = 0
> imap claims in-use inode 131 is free, correcting imap
> imap claims in-use inode 132 is free, correcting imap
> imap claims in-use inode 133 is free, correcting imap
> imap claims in-use inode 134 is free, correcting imap
> imap claims in-use inode 135 is free, correcting imap
> imap claims in-use inode 136 is free, correcting imap
> imap claims in-use inode 137 is free, correcting imap
> imap claims in-use inode 141 is free, correcting imap
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - agno = 4
>         - agno = 5
>         - agno = 6
>         - agno = 7
>         - agno = 8
>         - agno = 9
>         - agno = 10
>         - agno = 11
>         - agno = 12
>         - agno = 13
>         - agno = 14
>         - agno = 15
>         - agno = 16
>         - agno = 17
>         - agno = 18
>         - agno = 19
>         - agno = 20
>         - agno = 21
>         - agno = 22
>         - agno = 23
>         - agno = 24
>         - agno = 25
>         - agno = 26
>         - agno = 27
>         - agno = 28
>         - agno = 29
>         - agno = 30
>         - agno = 31
>         - agno = 32
>         - agno = 33
>         - agno = 34
>         - agno = 35
>         - agno = 36
>         - agno = 37
>         - process newly discovered inodes...
> imap claims in-use inode 929 is free, correcting imap
> ...snip...
> imap claims in-use inode 991 is free, correcting imap
> imap claims in-use inode 4176897 is free, correcting imap
> ...snip...
> imap claims in-use inode 4176959 is free, correcting imap
> found inodes not in the inode allocation tree
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - clear lost+found (if it exists) ...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
> entry "test_size.img" at block 0 offset 1024 in directory inode 136 references free inode 138
>       clearing inode number in entry at offset 1024...
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - agno = 4
>         - agno = 5
>         - agno = 6
>         - agno = 7
>         - agno = 8
>         - agno = 9
>         - agno = 10
>         - agno = 11
>         - agno = 12
>         - agno = 13
>         - agno = 14
>         - agno = 15
>         - agno = 16
>         - agno = 17
>         - agno = 18
>         - agno = 19
>         - agno = 20
>         - agno = 21
>         - agno = 22
>         - agno = 23
>         - agno = 24
>         - agno = 25
>         - agno = 26
>         - agno = 27
>         - agno = 28
>         - agno = 29
>         - agno = 30
>         - agno = 31
>         - agno = 32
>         - agno = 33
>         - agno = 34
>         - agno = 35
>         - agno = 36
>         - agno = 37
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - ensuring existence of lost+found directory
>         - traversing filesystem starting at / ...
> rebuilding directory inode 136
>         - traversal finished ...
>         - traversing all unattached subtrees ...
>         - traversals finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> Note - stripe unit (0) and width (0) fields have been reset.
> Please set with mount -o sunit=<value>,swidth=<value>
> done
