| To: | linux-xfs@xxxxxxxxxxx |
|---|---|
| Subject: | XFS Error Possibly Related To Quota? |
| From: | andy liebman <andyliebman@xxxxxxx> |
| Date: | Sun, 05 Mar 2006 14:25:56 -0500 |
| Sender: | linux-xfs-bounce@xxxxxxxxxxx |
| User-agent: | Thunderbird 1.5 (Windows/20051201) |
Hi,

I'm hoping one of the XFS gurus can weigh in on the incident described below.

I just experienced an XFS error on an 11 TB volume. It seems that I was able to repair the error with xfs_repair, but I am looking for some insight into why the error occurred.

To make a long story short, at 13:11 on February 28, users of this system (which serves video files for video editing) noticed two odd symptoms:

1) For the most part, tens of thousands of clips captured before February 28 played back fine. But any new files captured or rendered to the volume after 13:11 on February 28 played back with green flashes or with parts of frames missing (the missing parts showing up green). Green is usually what you get when data is missing from a digital video file.

2) SOME files captured before February 28 periodically displayed images from completely unrelated files when played back. It's as if bits of different clips got mixed together, as if the XFS filesystem suddenly started pointing to incorrect data in the middle of otherwise intact files.

Below, I include a snippet from /var/log/messages showing when the trouble started. The messages file is actually 1 GB in size because so many XFS error messages were generated; even compressed as a tar.gz, it is still 12 MB!

I also include the output from xfs_repair. After running the repair, the files seem to play fine again, and we can render and capture new files without getting the green stuff. But the question is: what happened?

/var/log/messages:
Feb 28 13:01:00 eshare CROND[498]: (root) CMD (nice -n 19 run-parts /etc/cron.hourly)
Feb 28 13:11:11 eshare kernel: 0x0: 76 22 86 22 76 22 86 22 76 22 86 22 76 22 86 22
Feb 28 13:11:11 eshare kernel: Filesystem "md2": XFS internal error xfs_alloc_read_agf at line 2195 of file fs/xfs/xfs_alloc.c. Caller 0xf8c1880a
Feb 28 13:11:11 eshare kernel: [pg0+948030533/1069556736] xfs_alloc_read_agf+0x111/0x1f9 [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c18c45>] xfs_alloc_read_agf+0x111/0x1f9 [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948029450/1069556736] xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c1880a>] xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948029450/1069556736] xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948029450/1069556736] xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c1880a>] xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948386575/1069556736] xfs_trans_log_buf+0x6b/0xa4 [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c6fb0f>] xfs_trans_log_buf+0x6b/0xa4 [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948033063/1069556736] xfs_alloc_search_busy+0x97/0xdc [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c19627>] xfs_alloc_search_busy+0x97/0xdc [xfs]
Feb 28 13:11:11 eshare kernel: [activate_task+147/167] activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel: [<c0115fc2>] activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel: [activate_task+147/167] activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel: [<c0115fc2>] activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel: [pg0+948031275/1069556736] xfs_alloc_vextent+0x1fe/0x5bb [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c18f2b>] xfs_alloc_vextent+0x1fe/0x5bb [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948100610/1069556736] xfs_bmap_alloc+0x1190/0x195d [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c29e02>] xfs_bmap_alloc+0x1190/0x195d [xfs]
Feb 28 13:11:11 eshare kernel: [lock_timer_base+36/73] lock_timer_base+0x24/0x49
Feb 28 13:11:11 eshare kernel: [<c01252eb>] lock_timer_base+0x24/0x49
Feb 28 13:11:11 eshare kernel: [sk_reset_timer+28/41] sk_reset_timer+0x1c/0x29
Feb 28 13:11:11 eshare kernel: [<c0281ad8>] sk_reset_timer+0x1c/0x29
Feb 28 13:11:11 eshare kernel: [pg0+948148400/1069556736] xfs_bmbt_get_state+0x2f/0x3b [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c358b0>] xfs_bmbt_get_state+0x2f/0x3b [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948109278/1069556736] xfs_bmap_do_search_extents+0xf7/0x48d [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c2bfde>] xfs_bmap_do_search_extents+0xf7/0x48d [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948119268/1069556736]4> [free_pages_bulk+471/497] free_pages_bulk+0x1d7/0x1f1
Feb 28 13:11:11 eshare kernel: 4> [<c0143e1a>] free_pages_bulk+0x1d7/0x1f1
Feb 28 13:11:11 eshare kernel: [test_clear_page_dirty+188/250] test_clear_page_dirty+0xbc/0xfa
Feb 28 13:11:11 eshare kernel: 4> [<c0143e1a>] free_pages_bulk+0x1d7/0x1f1
Feb 28 13:11:11 eshare kernel: [test_clear_page_dirty+188/250] test_clear_page_dirty+0xbc/0xfa
Feb 28 13:11:11 eshare kernel: [<c01465df>] test_clear_page_dirty+0xbc/0xfa
Feb 28 13:11:11 eshare kernel: [pg0+948444228/1069556736] linvfs_writepage+0x72/0x128 [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c7dc44>] linvfs_writepage+0x72/0x128 [xfs]
Feb 28 13:11:11 eshare kernel: [pageout+179/309] pageout+0xb3/0x135
Feb 28 13:11:11 eshare kernel: [<c014b1e3>] pageout+0xb3/0x135
Feb 28 13:11:11 eshare kernel: [__remove_from_page_cache+30/99] __remove_from_page_cache+0x1e/0x63
Feb 28 13:11:11 eshare kernel: [<c013f19a>] __remove_from_page_cache+0x1e/0x63
Feb 28 13:11:11 eshare kernel: [shrink_list+482/1070] shrink_list+0x1e2/0x42e
Feb 28 13:11:11 eshare kernel: [<c014b447>] shrink_list+0x1e2/0x42e
Feb 28 13:11:11 eshare kernel: [try_to_wake_up+680/869] try_to_wake_up+0x2a8/0x365
Feb 28 13:11:11 eshare kernel: [<c0116899>] try_to_wake_up+0x2a8/0x365
Feb 28 13:11:11 eshare kernel: [shrink_cache+275/669] shrink_cache+0x113/0x29d
Feb 28 13:11:11 eshare kernel: [<c014b845>] shrink_cache+0x113/0x29d
Feb 28 13:11:11 eshare kernel: [wake_up_process+30/32] wake_up_process+0x1e/0x20
Feb 28 13:11:11 eshare kernel: [<c0116974>] wake_up_process+0x1e/0x20
Feb 28 13:11:11 eshare kernel: [shrink_slab+149/415] shrink_slab+0x95/0x19f
Feb 28 13:11:11 eshare kernel: [<c014af34>] shrink_slab+0x95/0x19f
Feb 28 13:11:11 eshare kernel: [shrink_zone+184/222] shrink_zone+0xb8/0xde
Feb 28 13:11:11 eshare kernel: [<c014be86>] shrink_zone+0xb8/0xde
Feb 28 13:11:11 eshare kernel: [balance_pgdat+612/1005] balance_pgdat+0x264/0x3ed
Feb 28 13:11:11 eshare kernel: [<c014c345>] balance_pgdat+0x264/0x3ed
Feb 28 13:11:11 eshare kernel: [prepare_to_wait+32/105] prepare_to_wait+0x20/0x69
Feb 28 13:11:11 eshare kernel: [<c01311e9>] prepare_to_wait+0x20/0x69
Feb 28 13:11:11 eshare kernel: [kswapd+232/312] kswapd+0xe8/0x138
Feb 28 13:11:11 eshare kernel: [<c014c5b6>] kswapd+0xe8/0x138
Feb 28 13:11:11 eshare kernel: [autoremove_wake_function+0/87] autoremove_wake_function+0x0/0x57
Feb 28 13:11:11 eshare kernel: [<c0131306>] autoremove_wake_function+0x0/0x57
Feb 28 13:11:11 eshare kernel: [ret_from_fork+6/20] ret_from_fork+0x6/0x14
Feb 28 13:11:11 eshare kernel: [<c0102cd2>] ret_from_fork+0x6/0x14
Feb 28 13:11:11 eshare kernel: [autoremove_wake_function+0/87] autoremove_wake_function+0x0/0x57
Feb 28 13:11:11 eshare kernel: [<c0131306>] autoremove_wake_function+0x0/0x57
Feb 28 13:11:11 eshare kernel: [kswapd+0/312] kswapd+0x0/0x138
Feb 28 13:11:11 eshare kernel: [<c014c4ce>] kswapd+0x0/0x138
Feb 28 13:11:11 eshare kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb
Feb 28 13:11:11 eshare kernel: [<c0101145>] kernel_thread_helper+0x5/0xb
Feb 28 13:11:11 eshare kernel: 0x0: 76 22 86 22 76 22 86 22 76 22 86 22 76 22 86 22
Feb 28 13:11:11 eshare kernel: Filesystem "md2": XFS internal error xfs_alloc_read_agf at line 2195 of file fs/xfs/xfs_alloc.c. Caller 0xf8c1880a
Feb 28 13:11:11 eshare kernel: [pg0+948030533/1069556736] xfs_alloc_read_agf+0x111/0x1f9 [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c18c45>] xfs_alloc_read_agf+0x111/0x1f9 [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948029450/1069556736] xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c1880a>] xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948029450/1069556736] xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c1880a>] xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948029450/1069556736] xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c1880a>] xfs_alloc_fix_freelist+0x44a/0x48e [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948386575/1069556736] xfs_trans_log_buf+0x6b/0xa4 [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c6fb0f>] xfs_trans_log_buf+0x6b/0xa4 [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948033063/1069556736] xfs_alloc_search_busy+0x97/0xdc [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c19627>] xfs_alloc_search_busy+0x97/0xdc [xfs]
Feb 28 13:11:11 eshare kernel: [activate_task+147/167] activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel: [<c0115fc2>] activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel: [activate_task+147/167] activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel: [<c0115fc2>] activate_task+0x93/0xa7
Feb 28 13:11:11 eshare kernel: [pg0+948031275/1069556736] xfs_alloc_vextent+0x1fe/0x5bb [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c18f2b>] xfs_alloc_vextent+0x1fe/0x5bb [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948100610/1069556736] xfs_bmap_alloc+0x1190/0x195d [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c29e02>] xfs_bmap_alloc+0x1190/0x195d [xfs]
Feb 28 13:11:11 eshare kernel: [lock_timer_base+36/73] lock_timer_base+0x24/0x49
Feb 28 13:11:11 eshare kernel: [<c01252eb>] lock_timer_base+0x24/0x49
Feb 28 13:11:11 eshare kernel: [sk_reset_timer+28/41] sk_reset_timer+0x1c/0x29
Feb 28 13:11:11 eshare kernel: [<c0281ad8>] sk_reset_timer+0x1c/0x29
Feb 28 13:11:11 eshare kernel: [pg0+948148400/1069556736] xfs_bmbt_get_state+0x2f/0x3b [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c358b0>] xfs_bmbt_get_state+0x2f/0x3b [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948109278/1069556736] xfs_bmap_do_search_extents+0xf7/0x48d [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c2bfde>] xfs_bmap_do_search_extents+0xf7/0x48d [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948119268/1069556736] xfs_bmapi+0xff9/0x1826 [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c2e6e4>] xfs_bmapi+0xff9/0x1826 [xfs]
Feb 28 13:11:11 eshare kernel: [mempool_alloc+51/230] mempool_alloc+0x33/0xe6
Feb 28 13:11:11 eshare kernel: [<c01432ef>] mempool_alloc+0x33/0xe6
Feb 28 13:11:11 eshare kernel: [pg0+948148400/1069556736] xfs_bmbt_get_state+0x2f/0x3b [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c358b0>] xfs_bmbt_get_state+0x2f/0x3b [xfs]
Feb 28 13:11:11 eshare kernel: [pg0+948109278/1069556736] xfs_bmap_do_search_extents+0xf7/0x48d [xfs]
Feb 28 13:11:11 eshare kernel: [<f8c2bfde>] xfs_bmap_do_search_extents+0xf7/0x48d [xfs]
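
As a quick cross-check, here is a minimal Python sketch that decodes the 16 bytes the kernel printed ("0x0: 76 22 86 22 ...") as the first four fields of an XFS AGF header (magic, version, sequence number, length). It assumes that this dump is the start of the AGF block that failed the check in xfs_alloc_read_agf, and it relies on XFS storing its on-disk metadata big-endian. The constant name XFS_AGF_MAGIC mirrors the kernel's name for the expected "XAGF" magic.

```python
import struct

# The 16 bytes the kernel hex-dumped before each xfs_alloc_read_agf error.
dump = bytes.fromhex("76 22 86 22 76 22 86 22 76 22 86 22 76 22 86 22")

# Assumption: these are the first four big-endian 32-bit AGF fields
# (magic number, version, sequence number, length).
magic, version, seqno, length = struct.unpack(">4I", dump)

# A valid AGF header is expected to start with the ASCII bytes "XAGF".
XFS_AGF_MAGIC = 0x58414746

print(hex(magic), version, seqno, length)  # → 0x76228622 1981974050 1981974050 1981974050
print(magic == XFS_AGF_MAGIC)              # → False
```

Because the 4-byte pattern simply repeats, every field decodes to 0x76228622 (1981974050). These are the same values xfs_repair reports below as the bad magic, version, sequence and length for agf 31, which suggests that the block where the AG 31 header should be apparently held other data.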
[root@eshare RAID_1]# xfs_repair -v /dev/md2
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
zero_log: head block 40861 tail block 40861
- scan filesystem freespace and inode maps...
bad on-disk superblock 31 - bad magic number
primary/secondary superblock 31 conflict - AG superblock geometry info
conflicts with filesystem geometry
non-null user quota inode field in superblock 31
non-null group quota inode field in superblock 31
bad magic # 0x76228622 for agf 31
bad version # 1981974050 for agf 31
bad sequence # 1981974050 for agf 31
bad length 1981974050 for agf 31, should be 83919936
flfirst 1965262628 in agf 31 too large (max = 128)
fllast 1948550948 in agf 31 too large (max = 128)
bad magic # 0x752f862f for agi 31
bad version # 1965983278 for agi 31
bad sequence # 1982760492 for agi 31
bad length # 1965590054 for agi 31, should be 83919936
reset bad sb for ag 31
reset bad agf for ag 31
reset bad agi for ag 31
bad agbno 1748602424 in agfl, agno 31
freeblk count 1 != flcount 1948550948 in ag 31
bad agbno 1981974050 for btbno root, agno 31
bad agbno 1981974050 for btbcnt root, agno 31
bad agbno 1948550948 for inobt root, agno 31
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
error following ag 31 unlinked list
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- clear lost+found (if it exists) ...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- ensuring existence of lost+found directory
- traversing filesystem starting at / ...
- traversal finished ...
- traversing all unattached subtrees ...
- traversals finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

Andy Liebman