xfs
[Top] [All Lists]

Re: xfs resize: primary superblock is not updated immediately

To: Alex Lyakas <alex@xxxxxxxxxxxxxxxxx>
Subject: Re: xfs resize: primary superblock is not updated immediately
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 1 Mar 2016 08:16:28 +1100
Cc: xfs@xxxxxxxxxxx, Christoph Hellwig <hch@xxxxxxxxxxxxx>, Danny Shavit <danny@xxxxxxxxxxxxxxxxx>, Yair Hershko <yair@xxxxxxxxxxxxxxxxx>, Shyam Kaushik <shyam@xxxxxxxxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <BC0CC25E00CE4CEDA1FFDFC0A2F38742@alyakaslap>
References: <3685DFAD20214109878873CF81232704@alyakaslap> <20160222212019.GI25832@dastard> <CAOcd+r1XY2kcp+qJ=mPOAQSmb90QUnLfmT3-FkMjQN_+Ejmt8A@xxxxxxxxxxxxxx> <20160222235628.GK25832@dastard> <D5578D3E98EE48B88EA9A1126B671155@alyakaslap> <20160223225926.GN25832@dastard> <BC0CC25E00CE4CEDA1FFDFC0A2F38742@alyakaslap>
User-agent: Mutt/1.5.21 (2010-09-15)
On Mon, Feb 29, 2016 at 11:47:54AM +0200, Alex Lyakas wrote:
> Hello Dave,
> I have tried the same scenario with the 4.5 kernel from about a week
> ago, latest commit being [1].
> The same crash is happening, stack trace being [2].
> 
> I am not proficient with xfstests, unfortunately. I tried running
> them several times, but I am not sure I was doing that properly.
> 
> As for your question why the "block beyond end of the filesystem
> fails". I tried to debug it further and added a print into
> _xfs_buf_find. What happens is that at some point, the sb_dblocks
> value is updated to the new value, but the in-memory pag object is
> not created. So the test:
> 
> eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks);
> if (blkno < 0 || blkno >= eofs) {
> ...
> 
> still holds, but the needed pag does not exist.
> 
> Here are the results of the prints that I added:
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.546542] SGI XFS with
> ACLs, security attributes, realtime, no debug enabled
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.551243] XFS (dm-0):
> Mounting V4 Filesystem
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.576677] XFS (dm-0):
> Starting recovery (logdev: internal)
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.577866]
> _xfs_buf_find: blkno=0 eofs=200704 >m_sb.sb_dblocks=25088
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.577872]
> _xfs_buf_find: blkno=0 eofs=200704 >m_sb.sb_dblocks=25088
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.577882]
> _xfs_buf_find: blkno=0 eofs=200704 >m_sb.sb_dblocks=25088
> ===> Here we start seeing the new value of "sb_dblocks" and hence
> the new value of "eofs":

We shouldn't see sb_dblocks change until log recovery completes the
first phase of log recovery and the in-core superblock is re-read
from disk in xlog_do_recover().

> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.606796]
> _xfs_buf_find: blkno=1 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.606804]
> _xfs_buf_find: blkno=1 eofs=204800 >m_sb.sb_dblocks=25600

looking up AGF 0.

> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.606988]
> _xfs_buf_find: blkno=2 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.606991]
> _xfs_buf_find: blkno=2 eofs=204800 >m_sb.sb_dblocks=25600

Now AGI 0.

> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607205]
> _xfs_buf_find: blkno=50177 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607210]
> _xfs_buf_find: blkno=50177 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607375]
> _xfs_buf_find: blkno=50178 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607378]
> _xfs_buf_find: blkno=50178 eofs=204800 >m_sb.sb_dblocks=25600

AGF/AGI 1.

> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607512]
> _xfs_buf_find: blkno=100353 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607515]
> _xfs_buf_find: blkno=100353 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607652]
> _xfs_buf_find: blkno=100354 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607655]
> _xfs_buf_find: blkno=100354 eofs=204800 >m_sb.sb_dblocks=25600

AGF/AGI 2.

> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607829]
> _xfs_buf_find: blkno=150529 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607832]
> _xfs_buf_find: blkno=150529 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607980]
> _xfs_buf_find: blkno=150530 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.607982]
> _xfs_buf_find: blkno=150530 eofs=204800 >m_sb.sb_dblocks=25600
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.608073]

AGF/AGI 4.

> _xfs_buf_find: blkno=200705 eofs=204800 >m_sb.sb_dblocks=25600
> ===> and here we crash, but as you see, blkno is valid WRT eofs value.
> Feb 29 11:47:09 vc-00-00-350-dev kernel: [   91.608120] BUG: unable
> to handle kernel NULL pointer dereference at 0000000000000098

AGF 5 goes splat.

Which means it's through the first phase of log recovery and it's
not failing in log recovery. i.e. we are now running
xfs_initialize_perag_data() after log recovery. So, as I said a
couple of posts back up this thread:

| If log recovery succeeds, then yes, I can see that there is a
| problem here because the per-ag tree is not reinitialised after
| the superblock is re-read. That's a pretty easy fix, though (3-4
| lines of code in xlog_do_recover() to detect a change in
| filesystem block count and call xfs_initialize_perag() again..

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>