Greetings! I have a file server that gets a pretty nasty load (about 15
million files created every day). After some time, I noticed that the load
average had spiked from the usual 30 to about 180. dmesg revealed:
[434042.318401] INFO: task php:2185 blocked for more than 120 seconds.
[434042.318403] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
[434042.318405] php D 000000010675d6cd 0 2185 27306 0x00000000
[434042.318408] ffff88008d735a48 0000000000000086 ffff88008d735938
[434042.318412] ffff88008d734010 ffff88000e28e340 0000000000012000
[434042.318416] ffff88008d735fd8 0000000000012000 ffff8807ef9966c0
[434042.318419] Call Trace:
[434042.318442] [<ffffffffa0087a9b>] ? xfs_trans_brelse+0xee/0xf7 [xfs]
[434042.318464] [<ffffffffa00689de>] ? xfs_da_brelse+0x71/0x96 [xfs]
[434042.318485] [<ffffffffa006df10>] ? xfs_dir2_leaf_lookup_int+0x211/0x225
[434042.318489] [<ffffffff8141481e>] schedule+0x55/0x57
[434042.318512] [<ffffffffa0083de2>] xlog_reserveq_wait+0x115/0x1c0 [xfs]
[434042.318515] [<ffffffff810381f1>] ? try_to_wake_up+0x23d/0x23d
[434042.318539] [<ffffffffa0083f45>] xlog_grant_log_space+0xb8/0x1be [xfs]
[434042.318562] [<ffffffffa0084164>] xfs_log_reserve+0x119/0x133 [xfs]
[434042.318585] [<ffffffffa0080cf1>] xfs_trans_reserve+0xca/0x199 [xfs]
[434042.318605] [<ffffffffa00500dc>] xfs_create+0x18d/0x467 [xfs]
[434042.318623] [<ffffffffa00485be>] xfs_vn_mknod+0xa0/0xf9 [xfs]
[434042.318640] [<ffffffffa0048632>] xfs_vn_create+0xb/0xd [xfs]
[434042.318644] [<ffffffff810f0c5d>] vfs_create+0x6e/0x9e
[434042.318647] [<ffffffff810f1c5e>] do_last+0x302/0x642
[434042.318651] [<ffffffff810f2068>] path_openat+0xca/0x344
[434042.318654] [<ffffffff810f23d1>] do_filp_open+0x38/0x87
[434042.318658] [<ffffffff810fb22e>] ? alloc_fd+0x76/0x11e
[434042.318661] [<ffffffff810e40b1>] do_sys_open+0x10b/0x1a4
[434042.318664] [<ffffffff810e4173>] sys_open+0x1b/0x1d
It makes sense that the load average would spike if some major lock were
held longer than it should have been.
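For context, Linux counts tasks in uninterruptible sleep (state D, which is what the "php D" line above shows) in the load average alongside runnable tasks. That means processes stuck waiting in xlog_grant_log_space push the load up even while using no CPU. Below is a rough Python sketch for checking how many tasks are in D state at a given moment. The helper names are made up for illustration, and it reads only /proc/<pid>/stat, which has one entry per process rather than per thread, so it will undercount for multithreaded programs.

```python
import os


def parse_stat_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The comm field is wrapped in parentheses and may itself contain spaces
    or ')', so split after the *last* ')' instead of on whitespace.
    """
    after_comm = stat_line[stat_line.rfind(")") + 2:]
    return after_comm.split()[0]


def count_running_and_blocked(states):
    """Count tasks that are runnable (R) or in uninterruptible sleep (D)."""
    running = sum(1 for s in states if s == "R")
    blocked = sum(1 for s in states if s == "D")
    return running, blocked


def scan_proc(proc="/proc"):
    """Yield (pid, state) for each process visible under /proc (Linux only)."""
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                yield int(name), parse_stat_state(f.read())
        except OSError:
            # The process may have exited between listdir() and open().
            continue


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        procs = list(scan_proc())
        running, blocked = count_running_and_blocked(s for _, s in procs)
        with open("/proc/loadavg") as f:
            loadavg = f.read().split()[:3]
        print(f"R: {running}  D: {blocked}  loadavg: {' '.join(loadavg)}")
        for pid, state in procs:
            if state == "D":
                print(f"D-state pid {pid}")
```

If the D count stays high and tracks the load average, that supports the idea that the load comes from blocked writers and not from CPU pressure.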
The box has 32GB RAM, 6 cores, and it's running kernel 3.2.2.
I've looked at the commits in the stable tree since 3.2.2 was tagged and I
do see a couple of useful ones, so I'll try to get the kernel updated
anyway, but I don't see any of those fixes addressing this "hang".