Bug 922 - kernel hangs in xlog_grant_log_space
: kernel hangs in xlog_grant_log_space
Status: NEW
Product: XFS
Classification: Unclassified
Component: XFS kernel code
: Current
: All Linux
: P3 major
: ---
Assigned To: XFS power people
:
:
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-05-08 19:52 CDT by Chris J Arges
Modified: 2012-11-15 16:57 CST (History)
3 users (show)

See Also:


Attachments
Script to create files for reproducer. (397 bytes, text/plain)
2012-05-08 20:14 CDT, Chris J Arges
Details
Script to copy files and cause hang. (2.99 KB, text/plain)
2012-05-08 20:15 CDT, Chris J Arges
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Chris J Arges 2012-05-08 19:52:12 CDT
* Overview

When running a large number of file operations, occassionally an XFS filesystem will cause the kernel to hang. This can be reproduced easily using a set of scripts that perform file operations and the XFS partition's logsize is set to a size of 576b.

* Steps to Reproduce

1) mkfs.xfs -b size=1024 -l size=576b <dev path>
2) mount the volume
3) copy check-files, create-files, and copy-files to partition (in the attached archive)
4) run ./create-files
5) run ./copy-files
6) wait about 2-4 hours, the dots will stop printing and check dmesg

* Actual Results

An xfs kernel task hangs, and a backtrace occurs. Output has been placed in this bug: 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/979498
In addition are logs that have been requested in this email thread:
http://oss.sgi.com/archives/xfs/2012-04/msg00951.html

* Expected Results

This should run for a very long time, and not cause hangs.

* Build Date & Platform

This has been tested on the following kernels which all exhibit the same failures:
- 3.2.0-24 (Ubuntu Precise)
- 3.4.0-rc4
- 3.0.29
- 3.1.10
- 3.2.15
- 3.3.2

* Additional Information

This is the related Ubuntu bug with some additional information:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/979498

This is an email thread describing the issue:
http://oss.sgi.com/archives/xfs/2012-04/msg00951.html

Not sure if the following bug is related or not:
http://oss.sgi.com/bugzilla/show_bug.cgi?id=854

In addition there is an older thread that could also be related:
http://oss.sgi.com/archives/xfs/2011-11/msg00401.html
Comment 1 Chris J Arges 2012-05-08 20:14:31 CDT
Created attachment 304 [details]
Script to create files for reproducer.
Comment 2 Chris J Arges 2012-05-08 20:15:04 CDT
Created attachment 305 [details]
Script to copy files and cause hang.
Comment 3 Ben Myers 2012-05-16 12:48:46 CDT
Is the customer using this crazy small log size, or was that done to make this reproduceable?
Comment 4 Chris J Arges 2012-05-16 14:01:31 CDT
(In reply to comment #3)
> Is the customer using this crazy small log size, or was that done to make this
> reproduceable?

This was done to reproduce the issue. Changing the log size to the minimum seemed to produce the backtrace as the original failure. 

The description here should have part of the backtrace here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/979498

The syslog here should be the backtrace using the reproducer:
https://launchpadlibrarian.net/102951054/syslog

This has also been reproduced on 3.4-rc4, 3.2.0-24-server (Ubuntu precise), and a few other versions in between.

This thread has some good info on the case so far:
http://oss.sgi.com/archives/xfs/2012-04/msg00953.html
Comment 5 Chris J Arges 2012-11-15 16:57:10 CST
Just tested this with the xfs tree commit fb59581404ab7ec5075299065c22cb211a9262a9 on Nov 12th 2012, and I can still reproduce this issue.