XFS Error Possibly Related To Quota?

To: linux-xfs@xxxxxxxxxxx
Subject: XFS Error Possibly Related To Quota?
From: andy liebman <andyliebman@xxxxxxx>
Date: Sun, 05 Mar 2006 14:25:56 -0500
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Thunderbird 1.5 (Windows/20051201)
Hi,

I'm hoping one of the XFS gurus can weigh in on the incident described below:

I just experienced an XFS Error on an 11 TB volume. It seems that I was able to repair the error with "xfs_repair", but I am looking for some insights into why the error occurred.

To make a long story short, at 13:11 on February 28, users of this system -- which serves video files for video editing -- noticed two odd symptoms:

1) For the most part, the tens of thousands of clips captured before February 28 played back fine. But any new files captured or rendered to the volume after 13:11 on February 28 played back with green flashes or with parts of frames missing (the missing parts showing up green). Green is usually what you get when data is missing from a digital video file.

2) SOME files that had been captured prior to February 28 periodically displayed images from completely unrelated files when played back. It's as if bits of different clips got mixed together -- as if the XFS filesystem suddenly started pointing to incorrect data in the middle of otherwise intact files.

Below, I include a snippet from /var/log/messages showing when the trouble started. The "messages" file is actually 1 GB in size, because so many XFS error messages were generated. Even compressed as tar.gz, it is still 12 MB!

I also include the output from xfs_repair. After running repair, the files seem to play fine again, and we can render and capture new files without getting the green stuff. But the question is, what happened??

/var/log/messages:

Feb 28 13:01:00 eshare CROND[498]: (root) CMD (nice -n 19 run-parts
/etc/cron.hourly)
Feb 28 13:11:11 eshare kernel: 0x0: 76 22 86 22 76 22 86 22 76 22 86 22
76 22 86 22
Feb 28 13:11:11 eshare kernel: Filesystem "md2": XFS internal error
xfs_alloc_read_agf at line 2195 of file fs/xfs/xfs_alloc.c.  Caller 0xf8c1880a
Feb 28 13:11:11 eshare kernel:  [pg0+948030533/1069556736]
xfs_alloc_read_agf+0x111/0x1f9 [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c18c45>]
xfs_alloc_read_agf+0x111/0x1f9 [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948029450/1069556736]
xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c1880a>]
xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948029450/1069556736]
xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948029450/1069556736]
xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c1880a>]
xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948386575/1069556736]
xfs_trans_log_buf+0x6b/0xa4 [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c6fb0f>]
xfs_trans_log_buf+0x6b/0xa4 [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948033063/1069556736]
xfs_alloc_search_busy+0x97/0xdc [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c19627>]
xfs_alloc_search_busy+0x97/0xdc [xfs]
Feb 28 13:11:11 eshare kernel:  [activate_task+147/167]
activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel:  [<c0115fc2>] activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel:  [activate_task+147/167]
activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel:  [<c0115fc2>] activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel:  [pg0+948031275/1069556736]
xfs_alloc_vextent+0x1fe/0x5bb [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c18f2b>]
xfs_alloc_vextent+0x1fe/0x5bb [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948100610/1069556736]
xfs_bmap_alloc+0x1190/0x195d [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c29e02>]
xfs_bmap_alloc+0x1190/0x195d [xfs]
Feb 28 13:11:11 eshare kernel:  [lock_timer_base+36/73]
lock_timer_base+0x24/0x49
Feb 28 13:11:11 eshare kernel:  [<c01252eb>] lock_timer_base+0x24/0x49
Feb 28 13:11:11 eshare kernel:  [sk_reset_timer+28/41]
sk_reset_timer+0x1c/0x29
Feb 28 13:11:11 eshare kernel:  [<c0281ad8>] sk_reset_timer+0x1c/0x29
Feb 28 13:11:11 eshare kernel:  [pg0+948148400/1069556736]
xfs_bmbt_get_state+0x2f/0x3b [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c358b0>]
xfs_bmbt_get_state+0x2f/0x3b [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948109278/1069556736]
xfs_bmap_do_search_extents+0xf7/0x48d [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c2bfde>]
xfs_bmap_do_search_extents+0xf7/0x48d [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948119268/1069556736]4>
[free_pages_bulk+471/497] free_pages_bulk+0x1d7/0x1f
1
Feb 28 13:11:11 eshare kernel: 4> [<c0143e1a>] free_pages_bulk+0x1d7/0x1f1
Feb 28 13:11:11 eshare kernel:  [test_clear_page_dirty+188/250]
test_clear_page_dirty+0xbc/0xfa
Feb 28 13:11:11 eshare kernel: 4> [<c0143e1a>] free_pages_bulk+0x1d7/0x1f1
Feb 28 13:11:11 eshare kernel:  [test_clear_page_dirty+188/250]
test_clear_page_dirty+0xbc/0xfa
Feb 28 13:11:11 eshare kernel:  [<c01465df>]
test_clear_page_dirty+0xbc/0xfa
Feb 28 13:11:11 eshare kernel:  [pg0+948444228/1069556736]
linvfs_writepage+0x72/0x128 [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c7dc44>]
linvfs_writepage+0x72/0x128 [xfs]
Feb 28 13:11:11 eshare kernel:  [pageout+179/309] pageout+0xb3/0x135
Feb 28 13:11:11 eshare kernel:  [<c014b1e3>] pageout+0xb3/0x135
Feb 28 13:11:11 eshare kernel:  [__remove_from_page_cache+30/99]
__remove_from_page_cache+0x1e/0x63
Feb 28 13:11:11 eshare kernel:  [<c013f19a>]
__remove_from_page_cache+0x1e/0x63
Feb 28 13:11:11 eshare kernel:  [shrink_list+482/1070]
shrink_list+0x1e2/0x42e
Feb 28 13:11:11 eshare kernel:  [<c014b447>] shrink_list+0x1e2/0x42e
Feb 28 13:11:11 eshare kernel:  [try_to_wake_up+680/869]
try_to_wake_up+0x2a8/0x365
Feb 28 13:11:11 eshare kernel:  [<c0116899>] try_to_wake_up+0x2a8/0x365
Feb 28 13:11:11 eshare kernel:  [shrink_cache+275/669]
shrink_cache+0x113/0x29d
Feb 28 13:11:11 eshare kernel:  [<c014b845>] shrink_cache+0x113/0x29d
Feb 28 13:11:11 eshare kernel:  [wake_up_process+30/32]
wake_up_process+0x1e/0x20
Feb 28 13:11:11 eshare kernel:  [<c0116974>] wake_up_process+0x1e/0x20
Feb 28 13:11:11 eshare kernel:  [shrink_slab+149/415]
shrink_slab+0x95/0x19f
Feb 28 13:11:11 eshare kernel:  [<c014af34>] shrink_slab+0x95/0x19f
Feb 28 13:11:11 eshare kernel:  [shrink_zone+184/222]
shrink_zone+0xb8/0xde
Feb 28 13:11:11 eshare kernel:  [<c014be86>] shrink_zone+0xb8/0xde
Feb 28 13:11:11 eshare kernel:  [balance_pgdat+612/1005]
balance_pgdat+0x264/0x3ed
Feb 28 13:11:11 eshare kernel:  [<c014c345>] balance_pgdat+0x264/0x3ed
Feb 28 13:11:11 eshare kernel:  [prepare_to_wait+32/105]
prepare_to_wait+0x20/0x69
Feb 28 13:11:11 eshare kernel:  [<c01311e9>] prepare_to_wait+0x20/0x69
Feb 28 13:11:11 eshare kernel:  [kswapd+232/312] kswapd+0xe8/0x138
Feb 28 13:11:11 eshare kernel:  [<c014c5b6>] kswapd+0xe8/0x138
Feb 28 13:11:11 eshare kernel:  [autoremove_wake_function+0/87]
autoremove_wake_function+0x0/0x57
Feb 28 13:11:11 eshare kernel:  [<c0131306>]
autoremove_wake_function+0x0/0x57
Feb 28 13:11:11 eshare kernel:  [ret_from_fork+6/20]
ret_from_fork+0x6/0x14
Feb 28 13:11:11 eshare kernel:  [<c0102cd2>] ret_from_fork+0x6/0x14
Feb 28 13:11:11 eshare kernel:  [autoremove_wake_function+0/87]
autoremove_wake_function+0x0/0x57
Feb 28 13:11:11 eshare kernel:  [<c0131306>]
autoremove_wake_function+0x0/0x57
Feb 28 13:11:11 eshare kernel:  [kswapd+0/312] kswapd+0x0/0x138
Feb 28 13:11:11 eshare kernel:  [<c014c4ce>] kswapd+0x0/0x138
Feb 28 13:11:11 eshare kernel:  [kernel_thread_helper+5/11]
kernel_thread_helper+0x5/0xb
Feb 28 13:11:11 eshare kernel:  [<c0101145>] kernel_thread_helper+0x5/0xb
Feb 28 13:11:11 eshare kernel: 0x0: 76 22 86 22 76 22 86 22 76 22 86 22
76 22 86 22
Feb 28 13:11:11 eshare kernel: Filesystem "md2": XFS internal error
xfs_alloc_read_agf at line 2195 of file fs/xfs/xfs_alloc.c.  Caller 0xf8c1880a
Feb 28 13:11:11 eshare kernel:  [pg0+948030533/1069556736]
xfs_alloc_read_agf+0x111/0x1f9 [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c18c45>]
xfs_alloc_read_agf+0x111/0x1f9 [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948029450/1069556736]
xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c1880a>]
xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948029450/1069556736]
xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c1880a>]
xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948029450/1069556736]
xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c1880a>]
xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948386575/1069556736]
xfs_trans_log_buf+0x6b/0xa4 [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c6fb0f>]
xfs_trans_log_buf+0x6b/0xa4 [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948033063/1069556736]
xfs_alloc_search_busy+0x97/0xdc [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c19627>]
xfs_alloc_search_busy+0x97/0xdc [xfs]
Feb 28 13:11:11 eshare kernel:  [activate_task+147/167]
activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel:  [<c0115fc2>] activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel:  [activate_task+147/167]
activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel:  [<c0115fc2>] activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel:  [pg0+948031275/1069556736]
xfs_alloc_vextent+0x1fe/0x5bb [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c18f2b>]
xfs_alloc_vextent+0x1fe/0x5bb [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948100610/1069556736]
xfs_bmap_alloc+0x1190/0x195d [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c29e02>]
xfs_bmap_alloc+0x1190/0x195d [xfs]
Feb 28 13:11:11 eshare kernel:  [lock_timer_base+36/73]
lock_timer_base+0x24/0x49
Feb 28 13:11:11 eshare kernel:  [<c01252eb>] lock_timer_base+0x24/0x49
Feb 28 13:11:11 eshare kernel:  [sk_reset_timer+28/41]
sk_reset_timer+0x1c/0x29
Feb 28 13:11:11 eshare kernel:  [<c0281ad8>] sk_reset_timer+0x1c/0x29
Feb 28 13:11:11 eshare kernel:  [pg0+948148400/1069556736]
xfs_bmbt_get_state+0x2f/0x3b [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c358b0>]
xfs_bmbt_get_state+0x2f/0x3b [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948109278/1069556736]
xfs_bmap_do_search_extents+0xf7/0x48d [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c2bfde>]
xfs_bmap_do_search_extents+0xf7/0x48d [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948119268/1069556736]
xfs_bmapi+0xff9/0x1826 [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c2e6e4>] xfs_bmapi+0xff9/0x1826 [xfs]
Feb 28 13:11:11 eshare kernel:  [mempool_alloc+51/230]
mempool_alloc+0x33/0xe6
Feb 28 13:11:11 eshare kernel:  [<c01432ef>] mempool_alloc+0x33/0xe6
Feb 28 13:11:11 eshare kernel:  [pg0+948148400/1069556736]
xfs_bmbt_get_state+0x2f/0x3b [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c358b0>]
xfs_bmbt_get_state+0x2f/0x3b [xfs]
Feb 28 13:11:11 eshare kernel:  [pg0+948109278/1069556736]
xfs_bmap_do_search_extents+0xf7/0x48d [xfs]
Feb 28 13:11:11 eshare kernel:  [<f8c2bfde>]
xfs_bmap_do_search_extents+0xf7/0x48d [xfs]
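
Since the full messages file is far too large to read by eye, something like the following could be used to find the first error and count the distinct failure sites. This is only a rough sketch: the function name summarize_xfs_errors is made up for illustration, and it assumes each syslog entry sits on one unwrapped line (unlike the mail-wrapped excerpt above) in the "XFS internal error ... at line N" form shown there.

```python
import re
from collections import Counter

# Matches the header line of each XFS error report as it appears in the
# excerpt above, e.g.:
#   Filesystem "md2": XFS internal error xfs_alloc_read_agf at line 2195 ...
PATTERN = re.compile(
    r'Filesystem "([^"]+)": XFS internal error (\S+) at line (\d+)'
)


def summarize_xfs_errors(lines):
    """Stream over syslog lines; return (timestamp of first error, counts).

    counts is keyed by (filesystem, function, source line number).
    Hypothetical helper, not part of any XFS tooling.
    """
    first_seen = None
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue
        if first_seen is None:
            # Traditional syslog timestamps occupy the first 15 characters.
            first_seen = line[:15]
        key = (match.group(1), match.group(2), int(match.group(3)))
        counts[key] += 1
    return first_seen, counts


# Possible usage (reads the file line by line, so 1 GB is not a problem):
#   with open("/var/log/messages", errors="replace") as log:
#       first_seen, counts = summarize_xfs_errors(log)
```

If every report names the same function and line, that would suggest a single damaged structure being hit repeatedly rather than damage spread across the volume.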


XFS_REPAIR

[root@eshare RAID_1]# xfs_repair -v /dev/md2
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
zero_log: head block 40861 tail block 40861
        - scan filesystem freespace and inode maps...
bad on-disk superblock 31 - bad magic number
primary/secondary superblock 31 conflict - AG superblock geometry info
conflicts with filesystem geometry
non-null user quota inode field in superblock 31
non-null group quota inode field in superblock 31
bad magic # 0x76228622 for agf 31
bad version # 1981974050 for agf 31
bad sequence # 1981974050 for agf 31
bad length 1981974050 for agf 31, should be 83919936
flfirst 1965262628 in agf 31 too large (max = 128)
fllast 1948550948 in agf 31 too large (max = 128)
bad magic # 0x752f862f for agi 31
bad version # 1965983278 for agi 31
bad sequence # 1982760492 for agi 31
bad length # 1965590054 for agi 31, should be 83919936
reset bad sb for ag 31
reset bad agf for ag 31
reset bad agi for ag 31
bad agbno 1748602424 in agfl, agno 31
freeblk count 1 != flcount 1948550948 in ag 31
bad agbno 1981974050 for btbno root, agno 31
bad agbno 1981974050 for btbcnt root, agno 31
bad agbno 1948550948 for inobt root, agno 31
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
error following ag 31 unlinked list
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
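
One thing that can be checked directly from the two outputs above: the "bad magic", "bad version" and "bad sequence" numbers xfs_repair reports for agf 31 appear to be the same repeating 4-byte pattern the kernel dumped at offset 0x0. A minimal sketch, assuming the header fields are read as big-endian 32-bit words (XFS stores its on-disk metadata big-endian); be32_words is just an illustrative helper name:

```python
# Expected AGF magic: the ASCII characters "XAGF".
XAGF_MAGIC = 0x58414746


def be32_words(hex_dump):
    """Split a space-separated hex dump into big-endian 32-bit integers."""
    data = bytes.fromhex(hex_dump)
    usable = len(data) - len(data) % 4  # ignore any trailing partial word
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, usable, 4)]


# The 16 bytes the kernel printed for the AGF block at 13:11:11.
kernel_dump = "76 22 86 22 76 22 86 22 76 22 86 22 76 22 86 22"
words = be32_words(kernel_dump)

for word in words:
    print(hex(word), word, "magic ok" if word == XAGF_MAGIC else "not XAGF")
# Each word is 0x76228622, which is 1981974050 in decimal.
```

So the first four header words of AG 31 all hold the same value, which is why xfs_repair reports 0x76228622 as the magic and 1981974050 as the version, sequence and length. That looks more like the AG 31 header sector having been overwritten with foreign data than like a single flipped field.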

Andy Liebman

