
Re: ext4 error

To: Theodore Ts'o <tytso@xxxxxxx>
Subject: Re: ext4 error
From: Eric Shang <EricShang@xxxxxxxxxxx>
Date: Thu, 14 Apr 2016 06:33:42 +0000
Accept-language: zh-CN, en-US
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxxxx>, Toshi Kani <toshi.kani@xxxxxxx>, "akpm@xxxxxxxxxxxxxxxxxxxx" <akpm@xxxxxxxxxxxxxxxxxxxx>, "dan.j.williams@xxxxxxxxx" <dan.j.williams@xxxxxxxxx>, "viro@xxxxxxxxxxxxxxxxxx" <viro@xxxxxxxxxxxxxxxxxx>, "ross.zwisler@xxxxxxxxxxxxxxx" <ross.zwisler@xxxxxxxxxxxxxxx>, "kirill.shutemov@xxxxxxxxxxxxxxx" <kirill.shutemov@xxxxxxxxxxxxxxx>, "david@xxxxxxxxxxxxx" <david@xxxxxxxxxxxxx>, "jack@xxxxxxx" <jack@xxxxxxx>, "adilger.kernel@xxxxxxxxx" <adilger.kernel@xxxxxxxxx>, "linux-nvdimm@xxxxxxxxxxxx" <linux-nvdimm@xxxxxxxxxxx>, "linux-fsdevel@xxxxxxxxxxxxxxx" <linux-fsdevel@xxxxxxxxxxxxxxx>, "linux-ext4@xxxxxxxxxxxxxxx" <linux-ext4@xxxxxxxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20160414032207.GC16656@xxxxxxxxx>
References: <0255994B402DE243B1DFC1057A00655201AC6F@xxxxxxxxxxxxxxxxxxxxx> <20160414032207.GC16656@xxxxxxxxx>
Thread-index: AQHRlfzMpqYnb+GHm0aSFaBrEMYRhZ+I+nHA
Thread-topic: ext4 error
Hi Theodore:
        Thanks so much! The issue is very random, but it can sometimes be 
reproduced during a 12-hour stress run. I want to do some file system stress 
testing, but first I need to find out the main cause of this issue, or the 
steps that trigger it. I reproduced this panic through the process you 
described, using the following steps, but I still don't know whether this is 
the root cause.
        1) Force-free an inode that is in use by a file, using the debugfs 
freei command (after booting from another filesystem on the SD card).
        2) Boot the system and create a new file. The new file uses the same 
inode that I freed.
        3) Delete the newly created file, then drop the caches with # echo 3 > 
/proc/sys/vm/drop_caches
        4) Run # ls; this triggers an ext4 error and a kernel panic.
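The four steps above can be modelled with a toy, in-memory filesystem. This is a hypothetical sketch: the `ToyFS` class and its methods are invented for illustration and are not ext4 or debugfs code. The model has no cache layer, so the drop_caches part of step 3 is not represented; it only shows how a surviving directory entry ends up pointing at an inode that a second file reused and then released.

```python
class ToyFS:
    """Hypothetical, highly simplified filesystem model (not ext4 code)."""

    def __init__(self, num_inodes=8):
        self.inode_bitmap = [False] * num_inodes  # True = inode allocated
        self.nlink = [0] * num_inodes             # link count per inode
        self.dirents = {}                         # directory entry name -> inode number

    def create(self, name):
        # Allocate the first inode that the bitmap reports as free.
        ino = self.inode_bitmap.index(False)
        self.inode_bitmap[ino] = True
        self.nlink[ino] = 1
        self.dirents[name] = ino
        return ino

    def unlink(self, name):
        # Remove the entry; release the inode when its link count hits zero.
        ino = self.dirents.pop(name)
        self.nlink[ino] -= 1
        if self.nlink[ino] == 0:
            self.inode_bitmap[ino] = False

    def freei(self, ino):
        # Models step 1: clear the bitmap bit while the inode is still in use.
        self.inode_bitmap[ino] = False

    def lookup(self, name):
        # Models the access in step 4: follow a directory entry to its inode.
        ino = self.dirents[name]
        if self.nlink[ino] == 0:
            raise RuntimeError(f"entry {name!r} points to deleted inode {ino}")
        return ino


fs = ToyFS()
old = fs.create("old_file")  # the original file's inode
fs.freei(old)                # step 1: bitmap now claims that inode is free
new = fs.create("new_file")  # step 2: the new file reuses the same inode
fs.unlink("new_file")        # step 3: link count drops to 0, inode released
try:
    fs.lookup("old_file")    # step 4: the surviving entry is now stale
except RuntimeError as err:
    print(err)
```

In this model the inconsistency is introduced at step 1 and only noticed at step 4, which matches the point that where the error is reported is not where the corruption happened.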

    But I still have some questions. Each time the system boots, it runs 
fsck, and if the filesystem is corrupted I would expect fsck to fix it. However, 
after step 1), the fsck run at boot does not detect this free-but-in-use inode 
error; it is only found when I run fsck.ext4 with -f. We often cut the power 
directly; can this corrupt the inode bitmap?
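As a rough illustration of the fsck behaviour described above, the sketch below cross-checks a toy inode bitmap against the inodes referenced by directory entries. It is hypothetical: the function names and data layout are invented, and real e2fsck does far more. The one grounded detail is that e2fsck normally skips the full check when the superblock says the file system is clean, and its `-f` option forces the check anyway.

```python
def check_inode_bitmap(inode_bitmap, dirents):
    """Return inode numbers that a directory entry references but the bitmap marks free.

    Simplified, hypothetical cross-check (not e2fsck's actual code).
    """
    referenced = set(dirents.values())
    return sorted(ino for ino in referenced if not inode_bitmap[ino])


def boot_time_fsck(inode_bitmap, dirents, marked_clean, force=False):
    # Assumption modelled on e2fsck: skip the full scan when the file system
    # is marked clean, unless the check is forced (like `-f`).
    if marked_clean and not force:
        return []
    return check_inode_bitmap(inode_bitmap, dirents)


bitmap = [True, False, True]        # inode 1 is marked free...
dirents = {"a": 0, "b": 1, "c": 2}  # ...but "b" still references it
print(boot_time_fsck(bitmap, dirents, marked_clean=True))              # []
print(boot_time_fsck(bitmap, dirents, marked_clean=True, force=True))  # [1]
```

Because the model's `freei` step never clears the "clean" flag, only the forced run reports the inconsistency.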
    I checked the kernel fs code, and it's very complex. When a file is 
deleted from its parent directory, the kernel uses a bh (buffer_head) to read 
the parent directory block, deletes the file's entry in that buffer, and marks 
the bh dirty, then waits for the VM to flush this dirty page to the block 
device(?). On the other side, the parent dentry may be put on the unused LRU 
list, from which it can be freed under memory pressure. Is there any 
synchronization between the dirty page flush and the dcache eviction? Thanks!

Best Regards
EricShang


-----Original Message-----
From: Theodore Ts'o [mailto:tytso@xxxxxxx] 
Sent: April 14, 2016 11:22
To: Eric Shang
Cc: Matthew Wilcox; Toshi Kani; akpm@xxxxxxxxxxxxxxxxxxxx; 
dan.j.williams@xxxxxxxxx; viro@xxxxxxxxxxxxxxxxxx; 
ross.zwisler@xxxxxxxxxxxxxxx; kirill.shutemov@xxxxxxxxxxxxxxx; 
david@xxxxxxxxxxxxx; jack@xxxxxxx; adilger.kernel@xxxxxxxxx; 
linux-nvdimm@xxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx; 
linux-ext4@xxxxxxxxxxxxxxx; xfs@xxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: ext4 error

On Wed, Apr 13, 2016 at 01:44:55PM +0000, Eric Shang wrote:
> Hi all,
>   I met an ext4 error; the error log follows. After the panic, I checked the 
> eMMC with debugfs: inode 69878's i_nlink is not zero. And this inode 
> doesn't belong to parent dir 6987; it belongs to another file (the inode 
> belongs to two files when checked with debugfs ncheck). I guess this inode 
> had been deleted in memory and already reused by another file, but the parent 
> directory's buffer_head was not flushed to the eMMC. Then, when this dentry 
> was looked up, it couldn't be found in the dentry cache, so lookup_real read 
> the directory entry from the eMMC and got the file's inode, which had already 
> been deleted.
>   Can anyone give me some help on how to check this issue? My kernel version 
> is 3.18 from Android. I think something is wrong with the dentry cache flush 
> and the dirty buffer_head flush to eMMC. Thanks all!

If I had to guess, this started with a corrupted file system, where the inode 
allocation bitmap erroneously showed an inode that was in use by the file 
system as free.
This allowed it to be allocated for use in a second file (which would have 
wiped out the contents of the original file stored at that inode).  Later on, 
the file was deleted via either the older or newer pathname, which dropped the 
ref count to zero, and then an access via the other pathname would have 
resulted in this error.

After the panic, the on-disk data structures wouldn't have been updated from 
whatever the in-memory data structures might have been ("Kernel panic - not 
syncing").  So what you see from using debugfs after the crash might not be 
representative of what you saw before the crash.

I'm not sure there's much debugging that can be done, because there are any 
number of sources for the original corruption.  It could be caused by a 
hardware issue in the flash or the memory, or it could be caused by a wild 
pointer corrupting a disk buffer, etc. etc.  The panic won't result in a useful 
stack trace because that's when the problem was *noticed*.  But that's very 
different from where the file system corruption was *introduced*.

If you can reliably reproduce this sort of failure, then it becomes possible to 
try to track it down.  But if it's a one-off event, there's not much anyone can 
do.

Best regards,

                                                - Ted