xfs
[Top] [All Lists]

Bug#713994: marked as done (xfsprogs: Failure to allocate new space, pot

To: AnÃbal Monsalve Salazar <anibal@xxxxxxxxxx>
Subject: Bug#713994: marked as done (xfsprogs: Failure to allocate new space, potentially after an xfs_grow, to existing files)
From: owner@xxxxxxxxxxxxxxx (Debian Bug Tracking System)
Date: Mon, 26 May 2014 10:57:17 +0000
Delivered-to: xfs@xxxxxxxxxxx
References: <20140526105428.GA1589@xxxxxxxxxxxxxxxxx> <CAMjbrKpGsyWCca_VX5yXpen5V6zbT_QxS2JpcR6B=XNUaHipZw@xxxxxxxxxxxxxx>
Sender: Debian BTS <debbugs@xxxxxxxxxxxxxxxxxxxx>
Your message dated Mon, 26 May 2014 20:54:28 +1000
with message-id <20140526105428.GA1589@xxxxxxxxxxxxxxxxx>
and subject line Re: Bug#713994: XFS repair
has caused the Debian Bug report #713994,
regarding xfsprogs: Failure to allocate new space, potentially after an 
xfs_grow, to existing files
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@xxxxxxxxxxxxxxx
immediately.)


-- 
713994: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=713994
Debian Bug Tracking System
Contact owner@xxxxxxxxxxxxxxx with problems
--- Begin Message ---
To: submit@xxxxxxxxxxxxxxx
Subject: xfsprogs: Failure to allocate new space, potentially after an xfs_grow, to existing files
From: Wes Deviers <yhetti@xxxxxxxxx>
Date: Mon, 24 Jun 2013 12:32:56 -0400
Delivered-to: submit@xxxxxxxxxxxxxxx
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=u5HgIL9wE0opCl1/+kmOC7bRDBAc9+ARcFpfbayKSvc=; b=S09N60hC1weLoygqwUzhauRM+zpavSMq3GxnhePlNUixgDXC7u0QHfoLy2J4GAPkJG GH+zUzDgjWCgR8bIk5FYeMhw+eGxoGpVTriLajsfqESOiNpiczIHt9xbhCK3yoFRukOd PRSebgrEmsQMDruMOwB+Bb1Y9mtDB0sMDJfnWD2iDX/hEZzXZz+3h3XzkQV2CMpFNsPu Ofay4FUAu93EoLD3EihSoXgz/jjcCEu0YKAgAIbbbLeOFkmCMUjM3lLqf9Y49sRjqGqe Gq76aAwQpBvhqjamgxN/WWyGNZaIrSXwjZ02eNw6W6BGSZTH1xR014xbscToHlEbulkg apgw==
Package: xfsprogs
Version: 3.1.4
Justification: breaks unrelated software
Severity: critical
Tags: upstream

*** Please type your report below this line ***

As part of my standard virtual machine template for MySQL, I created an LVM setup as follows:
pvcreate /dev/sdb
vgcreate vg0 /dev/sdb
lvcreate -n mysql -L 10G vg0
lvcreate -n mysqlbinlog -L 5G vg0
lvcreate -n mysqllog -L 5G vg0

Then, after I deploy the template, I resize the LVs and xfs_growfs them, apt-get upgrade, and restart.

I also frequently (multiple times per week) extend MySQL data volumes on the fly using
echo 1 > /sys/block/sdb/device/rescane
pvresize /dev/sdb
lvresize -l +100%FREE /dev/vg0/mysql
xfs_growfs /mysql

This has worked hundreds of times.  Twice, though, on Debian 6.x (I don't have enough 7.x boxes in production yet) I've had something happen with XFS.  Once, after an extend, it was unmountable. I assumed this was an error on my part, created the FS, and moved on.

Today, I got an SQL error on a MySQL replication box:

 Error 'The table 'XXXXX' is full' on query. Default database: XXXXXYYYYYY

Normally this would be a configuration issue in MySQL related to a setting called "innodb_data_file_path".  Mine is configured correctly; otherwise all of my templates would have died years ago.  On this particular host, I extended /mysql (the data volume) to 120G before reboot.  Today, when it hit what appears to be the *original* size:

/dev/mapper/vg0-mysql         
            125814784  10473148 115341636   9% /mysql

XFS threw allocation errors.

[4051547.986869] ffff880638219200: 58 41 47 46 00 00 00 01 00 00 00 0c 00 05 00 00  XAGF............
[4051547.987636] Filesystem "dm-2": XFS internal error xfs_alloc_read_agf at line 2293 of file /build/buildd-linux-2.6_2.6.32-48squeeze1-amd64-qu4MIV/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa021fcb0
[4051547.987639]
[4051547.988226] Pid: 3367, comm: mysqld Not tainted 2.6.32-5-amd64 #1
[4051547.988228] Call Trace:
[4051547.988287]  [<ffffffffa021fcb0>] ? xfs_alloc_read_agf+0x22/0xa4 [xfs]
[4051547.988298]  [<ffffffffa021fc73>] ? xfs_read_agf+0x12e/0x149 [xfs]
[4051547.988307]  [<ffffffffa021fcb0>] ? xfs_alloc_read_agf+0x22/0xa4 [xfs]
[4051547.988317]  [<ffffffffa021fcb0>] ? xfs_alloc_read_agf+0x22/0xa4 [xfs]
[4051547.988331]  [<ffffffffa021fd45>] ? xfs_alloc_pagf_init+0x13/0x34 [xfs]
[4051547.988342]  [<ffffffffa022c3f1>] ? xfs_inode_is_filestream+0x24/0x51 [xfs]
[4051547.988353]  [<ffffffffa022c67d>] ? xfs_bmap_btalloc+0x25f/0x732 [xfs]
[4051547.988368]  [<ffffffffa0228b70>] ? xfs_bmap_search_extents+0x57/0xd1 [xfs]
[4051547.988379]  [<ffffffffa022ee83>] ? xfs_bmapi+0x876/0x104d [xfs]
[4051547.988386]  [<ffffffff8118eec3>] ? _atomic_dec_and_lock+0x33/0x50
[4051547.988406]  [<ffffffffa0257217>] ? xfs_trans_reserve+0xc8/0x19d [xfs]
[4051547.988419]  [<ffffffffa024a51f>] ? xfs_iomap_write_direct+0x305/0x3f5 [xfs]
[4051547.988432]  [<ffffffffa024a7bc>] ? xfs_iomap+0x1ad/0x285 [xfs]
[4051547.988444]  [<ffffffffa025db53>] ? __xfs_get_blocks+0x6c/0x15c [xfs]
[4051547.988450]  [<ffffffff81062311>] ? queue_work_on+0x45/0x4f
[4051547.988455]  [<ffffffff8111442a>] ? __blockdev_direct_IO+0x575/0xa60
[4051547.988461]  [<ffffffff810482ef>] ? finish_task_switch+0x3a/0xaf
[4051547.988472]  [<ffffffffa025d983>] ? xfs_vm_direct_IO+0xd4/0x102 [xfs]
[4051547.988483]  [<ffffffffa025dc43>] ? xfs_get_blocks_direct+0x0/0x11 [xfs]
[4051547.988494]  [<ffffffffa025d791>] ? xfs_end_io_direct+0x0/0x5f [xfs]
[4051547.988499]  [<ffffffff810b55d6>] ? generic_file_direct_write+0xdd/0x152
[4051547.988510]  [<ffffffffa0264532>] ? xfs_write+0x44f/0x6ea [xfs]
[4051547.988523]  [<ffffffffa02442ca>] ? xfs_iunlock+0x31/0x7c [xfs]
[4051547.988528]  [<ffffffff810ef21e>] ? do_sync_write+0xce/0x113
[4051547.988532]  [<ffffffff810652b2>] ? autoremove_wake_function+0x0/0x2e
[4051547.988535]  [<ffffffff810482ef>] ? finish_task_switch+0x3a/0xaf
[4051547.988538]  [<ffffffff810efb70>] ? vfs_write+0xa9/0x102
[4051547.988541]  [<ffffffff810efc20>] ? sys_pwrite64+0x57/0x77
[4051547.988545]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
[4053278.621482] ffff880638219200: 58 41 47 46 00 00 00 01 00 00 00 0c 00 05 00 00  XAGF............
[4053278.621761] Filesystem "dm-2": XFS internal error xfs_alloc_read_agf at line 2293 of file /build/buildd-linux-2.6_2.6.32-48squeeze1-amd64-qu4MIV/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa021fcb0
[4053278.621764]
[4053278.622352] Pid: 3367, comm: mysqld Not tainted 2.6.32-5-amd64 #1
[4053278.622354] Call Trace:
[4053278.622381]  [<ffffffffa021fcb0>] ? xfs_alloc_read_agf+0x22/0xa4 [xfs]
[4053278.622392]  [<ffffffffa021fc73>] ? xfs_read_agf+0x12e/0x149 [xfs]
[4053278.622401]  [<ffffffffa021fcb0>] ? xfs_alloc_read_agf+0x22/0xa4 [xfs]
[4053278.622411]  [<ffffffffa021fcb0>] ? xfs_alloc_read_agf+0x22/0xa4 [xfs]
[4053278.622421]  [<ffffffffa021fd45>] ? xfs_alloc_pagf_init+0x13/0x34 [xfs]
[4053278.622432]  [<ffffffffa022c3f1>] ? xfs_inode_is_filestream+0x24/0x51 [xfs]
[4053278.622442]  [<ffffffffa022c67d>] ? xfs_bmap_btalloc+0x25f/0x732 [xfs]
[4053278.622453]  [<ffffffffa0228b70>] ? xfs_bmap_search_extents+0x57/0xd1 [xfs]
[4053278.622465]  [<ffffffffa022ee83>] ? xfs_bmapi+0x876/0x104d [xfs]
[4053278.622486]  [<ffffffff8118eec3>] ? _atomic_dec_and_lock+0x33/0x50
[4053278.622501]  [<ffffffffa0257217>] ? xfs_trans_reserve+0xc8/0x19d [xfs]
[4053278.622514]  [<ffffffffa024a51f>] ? xfs_iomap_write_direct+0x305/0x3f5 [xfs]
[4053278.622527]  [<ffffffffa024a7bc>] ? xfs_iomap+0x1ad/0x285 [xfs]
[4053278.622538]  [<ffffffffa025db53>] ? __xfs_get_blocks+0x6c/0x15c [xfs]
[4053278.622551]  [<ffffffff8111442a>] ? __blockdev_direct_IO+0x575/0xa60
[4053278.622563]  [<ffffffffa025d983>] ? xfs_vm_direct_IO+0xd4/0x102 [xfs]
[4053278.622574]  [<ffffffffa025dc43>] ? xfs_get_blocks_direct+0x0/0x11 [xfs]
[4053278.622585]  [<ffffffffa025d791>] ? xfs_end_io_direct+0x0/0x5f [xfs]
[4053278.622593]  [<ffffffff810b55d6>] ? generic_file_direct_write+0xdd/0x152
[4053278.622605]  [<ffffffffa0264532>] ? xfs_write+0x44f/0x6ea [xfs]
[4053278.622618]  [<ffffffffa02442ca>] ? xfs_iunlock+0x31/0x7c [xfs]
[4053278.622625]  [<ffffffff810ef21e>] ? do_sync_write+0xce/0x113
[4053278.622637]  [<ffffffff810652b2>] ? autoremove_wake_function+0x0/0x2e
[4053278.622646]  [<ffffffff810482ef>] ? finish_task_switch+0x3a/0xaf
[4053278.622650]  [<ffffffff810efb70>] ? vfs_write+0xa9/0x102
[4053278.622652]  [<ffffffff810efc20>] ? sys_pwrite64+0x57/0x77
[4053278.622660]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
[4055487.227132] ffff880638219200: 58 41 47 46 00 00 00 01 00 00 00 0c 00 05 00 00  XAGF............
[4055487.227678] Filesystem "dm-2": XFS internal error xfs_alloc_read_agf at line 2293 of file /build/buildd-linux-2.6_2.6.32-48squeeze1-amd64-qu4MIV/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/xfs/xfs_alloc.c.  Caller 0xffffffffa021fcb0
[4055487.227681]
[4055487.228267] Pid: 32572, comm: mysqld Not tainted 2.6.32-5-amd64 #1
[4055487.228269] Call Trace:
[4055487.228292]  [<ffffffffa021fcb0>] ? xfs_alloc_read_agf+0x22/0xa4 [xfs]
[4055487.228302]  [<ffffffffa021fc73>] ? xfs_read_agf+0x12e/0x149 [xfs]
[4055487.228311]  [<ffffffffa021fcb0>] ? xfs_alloc_read_agf+0x22/0xa4 [xfs]
[4055487.228321]  [<ffffffffa021fcb0>] ? xfs_alloc_read_agf+0x22/0xa4 [xfs]
[4055487.228330]  [<ffffffffa021fd45>] ? xfs_alloc_pagf_init+0x13/0x34 [xfs]
[4055487.228341]  [<ffffffffa022c3f1>] ? xfs_inode_is_filestream+0x24/0x51 [xfs]
[4055487.228352]  [<ffffffffa022c67d>] ? xfs_bmap_btalloc+0x25f/0x732 [xfs]
[4055487.228363]  [<ffffffffa0228b70>] ? xfs_bmap_search_extents+0x57/0xd1 [xfs]
[4055487.228374]  [<ffffffffa022ee83>] ? xfs_bmapi+0x876/0x104d [xfs]
[4055487.228388]  [<ffffffffa0257217>] ? xfs_trans_reserve+0xc8/0x19d [xfs]
[4055487.228400]  [<ffffffffa024a51f>] ? xfs_iomap_write_direct+0x305/0x3f5 [xfs]
[4055487.228413]  [<ffffffffa024a7bc>] ? xfs_iomap+0x1ad/0x285 [xfs]
[4055487.228425]  [<ffffffffa025db53>] ? __xfs_get_blocks+0x6c/0x15c [xfs]
[4055487.228431]  [<ffffffff8111442a>] ? __blockdev_direct_IO+0x575/0xa60
[4055487.228442]  [<ffffffffa025d983>] ? xfs_vm_direct_IO+0xd4/0x102 [xfs]
[4055487.228453]  [<ffffffffa025dc43>] ? xfs_get_blocks_direct+0x0/0x11 [xfs]
[4055487.228464]  [<ffffffffa025d791>] ? xfs_end_io_direct+0x0/0x5f [xfs]
[4055487.228468]  [<ffffffff810b55d6>] ? generic_file_direct_write+0xdd/0x152
[4055487.228479]  [<ffffffffa0264532>] ? xfs_write+0x44f/0x6ea [xfs]
[4055487.228492]  [<ffffffffa02442ca>] ? xfs_iunlock+0x31/0x7c [xfs]
[4055487.228496]  [<ffffffff810ef21e>] ? do_sync_write+0xce/0x113
[4055487.228500]  [<ffffffff810652b2>] ? autoremove_wake_function+0x0/0x2e
[4055487.228503]  [<ffffffff810efb70>] ? vfs_write+0xa9/0x102
[4055487.228505]  [<ffffffff810efc20>] ? sys_pwrite64+0x57/0x77
[4055487.228509]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b

Throwing a dd at the FS works fine:

dd if=/dev/urandom of=/mysql/data/somefile.out bs=1M count=512

Two things might be going on here that are special

1) I'm using a pretty customized MySQL configuration, including a setting called O_DIRECT, which turns on direct IO for InnoDB flushes.  Therefore, the code path for some InnoDB writes is going to be different from what dd does.
2) This is SAN-backed storage; it shouldn't matter, but it does introduce IO latency.


I've been trying to find a way to fix this "hot"; these are production systems.  Luckily it was just a read-only replica. If this had happened to the master, there would probably have been data corruption (hence the sev 1 ticket).  Couldn't find a hot fix, so I'm going to unmount/repair and then reboot. I'll keep track of what happens so I can attach it to this report later if possible.

Thanks!

Wes

-- System Information:
Debian Release: 6.0.7
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages xfsprogs depends on:
ii  libblkid1                     2.17.2-9   block device id library
ii  libc6                         2.11.3-4   Embedded GNU C Library: Shared lib
ii  libreadline5                  5.2-7      GNU readline and history libraries
ii  libuuid1                      2.17.2-9   Universally Unique ID library

xfsprogs recommends no packages.

Versions of packages xfsprogs suggests:
pn  acl                           <none>     (no description available)
pn  attr                          <none>     (no description available)
pn  quota                         <none>     (no description available)
pn  xfsdump                       <none>     (no description available)

-- no debconf information

--- End Message ---
--- Begin Message ---
To: 713994-done@xxxxxxxxxxxxxxx
Subject: Re: Bug#713994: XFS repair
From: Aníbal Monsalve Salazar <anibal@xxxxxxxxxx>
Date: Mon, 26 May 2014 20:54:28 +1000
In-reply-to: <20130624224753.GW29338@dastard>
References: <CAMjbrKp164a2CU2SPn94NGP+urwKUCtGz2vKyt6snSWaYdHebA@xxxxxxxxxxxxxx> <20130624224753.GW29338@dastard>
Sender: Anibal Monsalve Salazar <anibal@xxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.23 (2014-03-12)
Version: 3.1.4

On Tue, 2013-06-25 08:47:53 +1000, Dave Chinner wrote:
> On Mon, Jun 24, 2013 at 12:43:55PM -0400, Wes Deviers wrote:
>> Update:
>> 
>> xfs_repair has fixed the problem:
>> 
>> root@ashdbtrk-s008-02:/home/wdeviers# xfs_check /dev/vg0/mysql
>> agf_freeblks 327676, counted 655100 in ag 4
>> agf_longest 327676, counted 655100 in ag 4
>> block 12/1 expected type unknown got btbno
>> block 12/4 expected type unknown got free2
>> ...
>> block 12/45268 expected type unknown got free2
>> block 12/45269 expected type unknown got free2
>> block 12/45270 expected type unknown got free2
>> block 12/45271 expected type unknown got free2
>> block 12/45272 expected type unknown got free2
>> block 12/45273 expected type unknown got free2
>> block 12/45274 expected type unknown got free2
>> block 12/45275 expected type unknown got free2
>> 
>> root@ashdbtrk-s008-02:/home/wdeviers# xfs_repair /dev/vg0/mysql
>> Phase 1 - find and verify superblock...
>> Phase 2 - using internal log
>>         - zero log...
>>         - scan filesystem freespace and inode maps...
>> primary/secondary superblock 4 conflict - AG superblock geometry info
>> conflicts with filesystem geometry
>> bad sequence # 12 for agf 4
>> bad length 327680 for agf 4, should be 655104
>> bad sequence # 12 for agi 4
>> bad length # 327680 for agi 4, should be 655104
>> reset bad sb for ag 4
>> reset bad agf for ag 4
>> reset bad agi for ag 4
>> sb_fdblocks 28836636, counted 28509212
> 
> The AG headers for AG #4 have been overwritten with header from a
> different AG# (12 instead of 4) that has a different length (327680
> instead of 655104).  The sequence number mismatch is why the runtime
> errors are being thrown
> 
> However, the superblock for AG 4 has the wrong geometry and that's
> indicative of it belonging to some other filesystem. i.e. a
> misdirected write. I'd be taking a careful look at your storage,
> because misdirected writes are evil and *extremely hard* to diagnose
> without specialised equipment...
> 
> Anyway, you're running an old, old kernel (2.6.32-5-amd64) and
> xfsprogs, so unless you can reproduce this reliably on a couple of
> different types of hardware with a more recent kernel and xfsprogs,
> we aren't really going to be able to do anything about your problem.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx

I'm closing this bug report after 11 months since Dave's response.

Please feel free to reopen when you have the information requested by
Dave.

Attachment: signature.asc
Description: Digital signature


--- End Message ---
<Prev in Thread] Current Thread [Next in Thread>
  • Bug#713994: marked as done (xfsprogs: Failure to allocate new space, potentially after an xfs_grow, to existing files), Debian Bug Tracking System <=