XFS is not really part of the problem here, it is just triggering lots of
I/O.
There appears to be a deadlock between the lru_list_lock and the
io_request_lock.
On the one hand the lru_list_lock does not disable interrupts, so an I/O
completion interrupt can arrive whilst it is held. I/O completion functions
tend to grab the io_request_lock.
On the other hand there are queue request functions (rd_request from the
ram disk driver is the example we hit) which end up grabbing the lru_list_lock.
In this case rd_request calls getblk. Since the unplug_device layer holds the
io_request_lock around the request_fn call we have a deadlock.
So either doing anything which touches the lru_list_lock in a request function
is a BUG, or the lru_list_lock should hold off interrupts, or some more complex
fix is needed.
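To make the ordering concrete, here is a small userspace sketch of the interleaving as I understand it. It is not kernel code: the two atomic_flag variables only stand in for the real spinlocks, the two "CPUs" are just steps in one function, and the names try_lock and simulate_abba are made up for illustration. I am also assuming the two paths run on different CPUs. A failed try_lock marks a point where a real spinlock would spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-ins for the two kernel spinlocks (not kernel code). */
static atomic_flag lru_list_lock = ATOMIC_FLAG_INIT;
static atomic_flag io_request_lock = ATOMIC_FLAG_INIT;

/* Non-blocking acquire: true if we got the lock, false if a real
 * spinlock would sit spinning here. */
static bool try_lock(atomic_flag *lock)
{
    return !atomic_flag_test_and_set(lock);
}

/*
 * Replay the interleaving on two hypothetical CPUs. Returns true when
 * both CPUs end up spinning on a lock the other one holds.
 */
static bool simulate_abba(bool lru_lock_masks_irqs)
{
    bool cpu0_stuck = false;
    bool cpu1_stuck;
    bool got;

    /* CPU 0: buffer cache code takes lru_list_lock. */
    got = try_lock(&lru_list_lock);
    assert(got);

    /* CPU 1: the unplug path takes io_request_lock before request_fn. */
    got = try_lock(&io_request_lock);
    assert(got);
    (void)got;

    /* CPU 0: an I/O completion interrupt arrives, unless masked, and
     * its handler wants io_request_lock. */
    if (!lru_lock_masks_irqs)
        cpu0_stuck = !try_lock(&io_request_lock);

    /* CPU 1: request_fn (rd_request -> getblk) wants lru_list_lock. */
    cpu1_stuck = !try_lock(&lru_list_lock);

    /* Reset both locks so the function can be called again. */
    atomic_flag_clear(&lru_list_lock);
    atomic_flag_clear(&io_request_lock);

    return cpu0_stuck && cpu1_stuck;
}
```

With lru_lock_masks_irqs false, both sides end up stuck, which is the deadlock described above. With it true (the "hold off interrupts" option), CPU 0 is never interrupted while holding the lru_list_lock, so it can drop the lock and CPU 1 only waits for a while.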
I am not sure if the ram disk driver is an isolated case or one of many
possible culprits.
Steve