I've noticed one of our postgresql servers having a high load, so I looked into
it.
Kernel = 3.2.15, disk space is free:
# df
/dev/sda2 6281216 457888 5823328 8% /disks/work
/dev/sdc1 52402180 10135176 42267004 20% /disks/pg-db
And I got these warnings since Jul 12 until today:
Jul 12 00:31:41 db24 kernel: XFS (sda2): xlog_space_left: head behind tail
Jul 12 00:31:41 db24 kernel: tail_cycle = 4, tail_bytes = 248320
Jul 12 00:31:41 db24 kernel: GH cycle = 4, GH bytes = 248312
[snip]
Aug 8 03:45:48 db24 kernel: XFS (sdc1): xlog_space_left: head behind tail
Aug 8 03:45:48 db24 kernel: tail_cycle = 607, tail_bytes = 21702144
Aug 8 03:45:48 db24 kernel: GH cycle = 607, GH bytes = 21701856
And this morning it changed into warnings:
Aug 9 10:34:57 db24 kernel: INFO: task vmtoolsd:1426 blocked for more than 120
seconds.
Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Aug 9 10:34:57 db24 kernel: ffff880059e0fc78 0000000000000082 ffff880059af3ba0
0000000000000001
Aug 9 10:34:57 db24 kernel: ffff880059e0e010 ffff880058ea5cc0 0000000000010940
ffff880059e0ffd8
Aug 9 10:34:57 db24 kernel: ffff880059e0ffd8 0000000000010940 ffff880056264f80
ffff880058ea5cc0
Aug 9 10:34:57 db24 kernel: Call Trace:
Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:34:57 db24 kernel: [<ffffffff8136fb1e>]
rwsem_down_failed_common+0xe4/0x116
Aug 9 10:34:57 db24 kernel: [<ffffffff8136fb63>]
rwsem_down_write_failed+0x13/0x15
Aug 9 10:34:57 db24 kernel: [<ffffffff811c4503>]
call_rwsem_down_write_failed+0x13/0x20
Aug 9 10:34:57 db24 kernel: [<ffffffff8136f35c>] ? down_write+0x20/0x22
Aug 9 10:34:57 db24 kernel: [<ffffffff810d1e44>] thaw_super+0x1c/0xae
Aug 9 10:34:57 db24 kernel: [<ffffffff810f6b5e>] thaw_bdev+0x45/0x74
Aug 9 10:34:57 db24 kernel: [<ffffffffa002019c>] VmSyncThawDevices+0x56/0x9e
[vmsync]
Aug 9 10:34:57 db24 kernel: [<ffffffffa0020542>]
VmSyncUnlockedIoctl+0x1b4/0x208 [vmsync]
Aug 9 10:34:57 db24 kernel: [<ffffffff81115dd1>]
proc_reg_unlocked_ioctl+0x82/0xa1
Aug 9 10:34:57 db24 kernel: [<ffffffff8102c05d>] ?
pick_next_task_fair+0xfc/0x10c
Aug 9 10:34:57 db24 kernel: [<ffffffffa002038e>] ? VmSyncAddPath+0xff/0xff
[vmsync]
Aug 9 10:34:57 db24 kernel: [<ffffffff810dcf9f>] do_vfs_ioctl+0x3f6/0x47b
Aug 9 10:34:57 db24 kernel: [<ffffffff810dd066>] sys_ioctl+0x42/0x65
Aug 9 10:34:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:34:57 db24 kernel: INFO: task postmaster:3112 blocked for more than
120 seconds.
Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Aug 9 10:34:57 db24 kernel: ffff8800563f5d18 0000000000000082 0000000000000000
0000000000000000
Aug 9 10:34:57 db24 kernel: ffff8800563f4010 ffff8800590d93e0 0000000000010940
ffff8800563f5fd8
Aug 9 10:34:57 db24 kernel: ffff8800563f5fd8 0000000000010940 ffff88005ab9c240
ffff8800590d93e0
Aug 9 10:34:57 db24 kernel: Call Trace:
Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:34:57 db24 kernel: [<ffffffff8114ed8b>] xfs_file_aio_write+0x118/0x224
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:34:57 db24 kernel: [<ffffffff810cf526>] do_sync_write+0xc9/0x106
Aug 9 10:34:57 db24 kernel: [<ffffffff8100875a>] ? read_tsc+0x9/0x1b
Aug 9 10:34:57 db24 kernel: [<ffffffff810528b0>] ? timekeeping_get_ns+0x15/0x34
Aug 9 10:34:57 db24 kernel: [<ffffffff81053370>] ? ktime_get_ts+0x80/0x89
Aug 9 10:34:57 db24 kernel: [<ffffffff810dda04>] ?
poll_select_copy_remaining+0xc6/0xea
Aug 9 10:34:57 db24 kernel: [<ffffffff810cfe57>] vfs_write+0xa9/0x105
Aug 9 10:34:57 db24 kernel: [<ffffffff810cff69>] sys_write+0x45/0x69
Aug 9 10:34:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:34:57 db24 kernel: INFO: task postmaster:3115 blocked for more than
120 seconds.
Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Aug 9 10:34:57 db24 kernel: ffff8800563edb38 0000000000000082 0000000000000000
ffff880000000000
Aug 9 10:34:57 db24 kernel: ffff8800563ec010 ffff8800590dcf80 0000000000010940
ffff8800563edfd8
Aug 9 10:34:57 db24 kernel: ffff8800563edfd8 0000000000010940 ffff88005ab9c240
ffff8800590dcf80
Aug 9 10:34:57 db24 kernel: Call Trace:
Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:34:57 db24 kernel: [<ffffffff8118a64a>] xfs_trans_alloc+0x79/0xa7
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:34:57 db24 kernel: [<ffffffff8115b99a>] xfs_create+0x18c/0x4c7
Aug 9 10:34:57 db24 kernel: [<ffffffff81150b56>] ? xfs_ilock+0x59/0x5e
Aug 9 10:34:57 db24 kernel: [<ffffffff810df065>] ? spin_lock+0x9/0xb
Aug 9 10:34:57 db24 kernel: [<ffffffff810df877>] ? _d_rehash+0x1f/0x22
Aug 9 10:34:57 db24 kernel: [<ffffffff811541d1>] xfs_vn_mknod+0xca/0x15e
Aug 9 10:34:57 db24 kernel: [<ffffffff81154280>] xfs_vn_create+0xb/0xd
Aug 9 10:34:57 db24 kernel: [<ffffffff810da7ad>] vfs_create+0x4e/0x6e
Aug 9 10:34:57 db24 kernel: [<ffffffff810db68c>] do_last+0x201/0x502
Aug 9 10:34:57 db24 kernel: [<ffffffff810dba61>] path_openat+0xd4/0x31a
Aug 9 10:34:57 db24 kernel: [<ffffffff810c13f4>] ? kmem_cache_free+0x15/0x6e
Aug 9 10:34:57 db24 kernel: [<ffffffff810dbd96>] do_filp_open+0x38/0x86
Aug 9 10:34:57 db24 kernel: [<ffffffff810e507a>] ? alloc_fd+0x116/0x129
Aug 9 10:34:57 db24 kernel: [<ffffffff810ce147>] do_sys_open+0x6e/0x107
Aug 9 10:34:57 db24 kernel: [<ffffffff810ce209>] sys_open+0x1b/0x1d
Aug 9 10:34:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:34:57 db24 kernel: INFO: task flush-259:26214:19297 blocked for more
than 120 seconds.
Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Aug 9 10:34:57 db24 kernel: ffff880052039b10 0000000000000046 ffff880052039aa0
ffff880047536078
Aug 9 10:34:57 db24 kernel: ffff880052038010 ffff880056263500 0000000000010940
ffff880052039fd8
Aug 9 10:34:57 db24 kernel: ffff880052039fd8 0000000000010940 ffff880059620000
ffff880056263500
Aug 9 10:34:57 db24 kernel: Call Trace:
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c605>] ?
spin_unlock_irqrestore+0x9/0xb
Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:34:57 db24 kernel: [<ffffffff8118a64a>] xfs_trans_alloc+0x79/0xa7
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:34:57 db24 kernel: [<ffffffff8109180a>] ? generic_writepages+0x4b/0x57
Aug 9 10:34:57 db24 kernel: [<ffffffff8115906b>] xfs_log_dirty_inode+0x2b/0x9f
Aug 9 10:34:57 db24 kernel: [<ffffffff8115716d>] xfs_fs_write_inode+0x47/0x107
Aug 9 10:34:57 db24 kernel: [<ffffffff810ec01b>]
writeback_single_inode+0x1b8/0x2ef
Aug 9 10:34:57 db24 kernel: [<ffffffff810ec705>]
writeback_sb_inodes+0x168/0x201
Aug 9 10:34:57 db24 kernel: [<ffffffff810eccfe>]
__writeback_inodes_wb+0x6d/0xab
Aug 9 10:34:57 db24 kernel: [<ffffffff810ecea6>] wb_writeback+0xf1/0x18b
Aug 9 10:34:57 db24 kernel: [<ffffffff8101f406>] ?
default_spin_lock_flags+0x9/0xf
Aug 9 10:34:57 db24 kernel: [<ffffffff8136fbc4>] ?
_raw_spin_lock_irqsave+0x22/0x2b
Aug 9 10:34:57 db24 kernel: [<ffffffff810ed07f>] wb_do_writeback+0x13f/0x1a0
Aug 9 10:34:57 db24 kernel: [<ffffffff8103fdbe>] ? del_timer+0x81/0x81
Aug 9 10:34:57 db24 kernel: [<ffffffff810ed170>]
bdi_writeback_thread+0x90/0x164
Aug 9 10:34:57 db24 kernel: [<ffffffff810ed0e0>] ? wb_do_writeback+0x1a0/0x1a0
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c17c>] kthread+0x7d/0x85
Aug 9 10:34:57 db24 kernel: [<ffffffff81371eb4>] kernel_thread_helper+0x4/0x10
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c0ff>] ?
kthread_worker_fn+0x153/0x153
Aug 9 10:34:57 db24 kernel: [<ffffffff81371eb0>] ? gs_change+0x13/0x13
Aug 9 10:34:57 db24 kernel: INFO: task postmaster:26077 blocked for more than
120 seconds.
Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Aug 9 10:34:57 db24 kernel: ffff8800257e7d18 0000000000000086 ffffea00014d6680
0000000000000000
Aug 9 10:34:57 db24 kernel: ffff8800257e6010 ffff880056261a80 0000000000010940
ffff8800257e7fd8
Aug 9 10:34:57 db24 kernel: ffff8800257e7fd8 0000000000010940 ffffffff8160d020
ffff880056261a80
Aug 9 10:34:57 db24 kernel: Call Trace:
Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:34:57 db24 kernel: [<ffffffff8114ed8b>] xfs_file_aio_write+0x118/0x224
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:34:57 db24 kernel: [<ffffffff810cf526>] do_sync_write+0xc9/0x106
Aug 9 10:34:57 db24 kernel: [<ffffffff810cfe57>] vfs_write+0xa9/0x105
Aug 9 10:34:57 db24 kernel: [<ffffffff810cff69>] sys_write+0x45/0x69
Aug 9 10:34:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:36:57 db24 kernel: INFO: task vmtoolsd:1426 blocked for more than 120
seconds.
Aug 9 10:36:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Aug 9 10:36:57 db24 kernel: ffff880059e0fc78 0000000000000082 ffff880059af3ba0
0000000000000001
Aug 9 10:36:57 db24 kernel: ffff880059e0e010 ffff880058ea5cc0 0000000000010940
ffff880059e0ffd8
Aug 9 10:36:57 db24 kernel: ffff880059e0ffd8 0000000000010940 ffff880056264f80
ffff880058ea5cc0
Aug 9 10:36:57 db24 kernel: Call Trace:
Aug 9 10:36:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:36:57 db24 kernel: [<ffffffff8136fb1e>]
rwsem_down_failed_common+0xe4/0x116
Aug 9 10:36:57 db24 kernel: [<ffffffff8136fb63>]
rwsem_down_write_failed+0x13/0x15
Aug 9 10:36:57 db24 kernel: [<ffffffff811c4503>]
call_rwsem_down_write_failed+0x13/0x20
Aug 9 10:36:57 db24 kernel: [<ffffffff8136f35c>] ? down_write+0x20/0x22
Aug 9 10:36:57 db24 kernel: [<ffffffff810d1e44>] thaw_super+0x1c/0xae
Aug 9 10:36:57 db24 kernel: [<ffffffff810f6b5e>] thaw_bdev+0x45/0x74
Aug 9 10:36:57 db24 kernel: [<ffffffffa002019c>] VmSyncThawDevices+0x56/0x9e
[vmsync]
Aug 9 10:36:57 db24 kernel: [<ffffffffa0020542>]
VmSyncUnlockedIoctl+0x1b4/0x208 [vmsync]
Aug 9 10:36:57 db24 kernel: [<ffffffff81115dd1>]
proc_reg_unlocked_ioctl+0x82/0xa1
Aug 9 10:36:57 db24 kernel: [<ffffffff8102c05d>] ?
pick_next_task_fair+0xfc/0x10c
Aug 9 10:36:57 db24 kernel: [<ffffffffa002038e>] ? VmSyncAddPath+0xff/0xff
[vmsync]
Aug 9 10:36:57 db24 kernel: [<ffffffff810dcf9f>] do_vfs_ioctl+0x3f6/0x47b
Aug 9 10:36:57 db24 kernel: [<ffffffff810dd066>] sys_ioctl+0x42/0x65
Aug 9 10:36:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:36:57 db24 kernel: INFO: task postmaster:3112 blocked for more than
120 seconds.
Aug 9 10:36:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Aug 9 10:36:57 db24 kernel: ffff8800563f5d18 0000000000000082 0000000000000000
0000000000000000
Aug 9 10:36:57 db24 kernel: ffff8800563f4010 ffff8800590d93e0 0000000000010940
ffff8800563f5fd8
Aug 9 10:36:57 db24 kernel: ffff8800563f5fd8 0000000000010940 ffff88005ab9c240
ffff8800590d93e0
Aug 9 10:36:57 db24 kernel: Call Trace:
Aug 9 10:36:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:36:57 db24 kernel: [<ffffffff8114ed8b>] xfs_file_aio_write+0x118/0x224
Aug 9 10:36:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:36:57 db24 kernel: [<ffffffff810cf526>] do_sync_write+0xc9/0x106
Aug 9 10:36:57 db24 kernel: [<ffffffff8100875a>] ? read_tsc+0x9/0x1b
Aug 9 10:36:57 db24 kernel: [<ffffffff810528b0>] ? timekeeping_get_ns+0x15/0x34
Aug 9 10:36:57 db24 kernel: [<ffffffff81053370>] ? ktime_get_ts+0x80/0x89
Aug 9 10:36:57 db24 kernel: [<ffffffff810dda04>] ?
poll_select_copy_remaining+0xc6/0xea
Aug 9 10:36:57 db24 kernel: [<ffffffff810cfe57>] vfs_write+0xa9/0x105
Aug 9 10:36:57 db24 kernel: [<ffffffff810cff69>] sys_write+0x45/0x69
Aug 9 10:36:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:36:57 db24 kernel: INFO: task postmaster:3115 blocked for more than
120 seconds.
Aug 9 10:36:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Aug 9 10:36:57 db24 kernel: ffff8800563edb38 0000000000000082 0000000000000000
ffff880000000000
Aug 9 10:36:57 db24 kernel: ffff8800563ec010 ffff8800590dcf80 0000000000010940
ffff8800563edfd8
Aug 9 10:36:57 db24 kernel: ffff8800563edfd8 0000000000010940 ffff88005ab9c240
ffff8800590dcf80
Aug 9 10:36:57 db24 kernel: Call Trace:
Aug 9 10:36:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:36:57 db24 kernel: [<ffffffff8118a64a>] xfs_trans_alloc+0x79/0xa7
Aug 9 10:36:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:36:57 db24 kernel: [<ffffffff8115b99a>] xfs_create+0x18c/0x4c7
Aug 9 10:36:57 db24 kernel: [<ffffffff81150b56>] ? xfs_ilock+0x59/0x5e
Aug 9 10:36:57 db24 kernel: [<ffffffff810df065>] ? spin_lock+0x9/0xb
Aug 9 10:36:57 db24 kernel: [<ffffffff810df877>] ? _d_rehash+0x1f/0x22
Aug 9 10:36:57 db24 kernel: [<ffffffff811541d1>] xfs_vn_mknod+0xca/0x15e
Aug 9 10:36:57 db24 kernel: [<ffffffff81154280>] xfs_vn_create+0xb/0xd
Aug 9 10:36:57 db24 kernel: [<ffffffff810da7ad>] vfs_create+0x4e/0x6e
Aug 9 10:36:57 db24 kernel: [<ffffffff810db68c>] do_last+0x201/0x502
Aug 9 10:36:57 db24 kernel: [<ffffffff810dba61>] path_openat+0xd4/0x31a
Aug 9 10:36:57 db24 kernel: [<ffffffff810c13f4>] ? kmem_cache_free+0x15/0x6e
Aug 9 10:36:57 db24 kernel: [<ffffffff810dbd96>] do_filp_open+0x38/0x86
Aug 9 10:36:57 db24 kernel: [<ffffffff810e507a>] ? alloc_fd+0x116/0x129
Aug 9 10:36:57 db24 kernel: [<ffffffff810ce147>] do_sys_open+0x6e/0x107
Aug 9 10:36:57 db24 kernel: [<ffffffff810ce209>] sys_open+0x1b/0x1d
Aug 9 10:36:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:36:57 db24 kernel: INFO: task postmaster:13762 blocked for more than
120 seconds.
Aug 9 10:36:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
Aug 9 10:36:57 db24 kernel: ffff880029025d18 0000000000000082 ffff880000000000
0000000000000000
Aug 9 10:36:57 db24 kernel: ffff880029024010 ffff880056260d40 0000000000010940
ffff880029025fd8
Aug 9 10:36:57 db24 kernel: ffff880029025fd8 0000000000010940 ffffffff8160d020
ffff880056260d40
Aug 9 10:36:57 db24 kernel: Call Trace:
Aug 9 10:36:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:36:57 db24 kernel: [<ffffffff8114ed8b>] xfs_file_aio_write+0x118/0x224
Aug 9 10:36:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:36:57 db24 kernel: [<ffffffff810cf526>] do_sync_write+0xc9/0x106
Aug 9 10:36:57 db24 kernel: [<ffffffff810cfe57>] vfs_write+0xa9/0x105
Aug 9 10:36:57 db24 kernel: [<ffffffff810cff69>] sys_write+0x45/0x69
Aug 9 10:36:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
I then tried
# echo 3 >/proc/sys/vm/drop_caches
but the system hung completely.
This is a VMware VM, and I have no idea how to send SYS-T or similar keys to it.
I had to reset it (reboot hung), and everything worked again. This VM is now on
kernel 3.5
--
mit freundlichen Grüssen,
Michael Monnerie, Ing. BSc
it-management Internet Services: Protéger
http://proteger.at [gesprochen: Prot-e-schee]
Tel: +43 660 / 415 6531
signature.asc
Description: This is a digitally signed message part.
|