Hi Ben,
>
>
Thanks for the comments.
>
> What was the symptom that led to the discovery of this problem?
>
> Reviewed-by: Ben Myers <bpm@xxxxxxx>
>
It started with messages like the ones below being logged by syslog:
shrink_slab: xfs_buftarg_shrink+0x0/0x160 [xfs] negative objects to delete nr=-61993820
shrink_slab: xfs_buftarg_shrink+0x0/0x160 [xfs] negative objects to delete nr=-146
shrink_slab: xfs_buftarg_shrink+0x0/0x160 [xfs] negative objects to delete nr=-240601220
shrink_slab: xfs_buftarg_shrink+0x0/0x160 [xfs] negative objects to delete nr=-152
shrink_slab: xfs_buftarg_shrink+0x0/0x160 [xfs] negative objects to delete nr=-2921236993
These messages came from shrink_slab().
After that, I added a second counter to xfs_buftarg_shrink() that counts the
elements in the list (via the list_for_each() macro), which confirmed the
discrepancy between the counter and the real number of elements in the list.
Finally, Eric added a second, local counter to xfs_buftarg_shrink() to account
for the number of buffers added to and removed from the dispose list in each
call to xfs_buftarg_shrink(). Once the problem started, we could see a wrong
number of buffers being added to and/or removed from the dispose list.
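
To illustrate the kind of cross-check described above, here is a minimal
user-space sketch. It is not the debug patch we used and it is not XFS code:
struct lru, lru_add(), lru_del_unchecked(), lru_walk_count() and lru_check()
are all hypothetical names, and the double removal is only an assumed way of
making a cached counter drift away from the real list length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a list that keeps a cached element count. */
struct node {
	struct node *next;
};

struct lru {
	struct node *head;
	long nr;		/* cached count, signed like the nr in the log */
};

static void lru_add(struct lru *l, struct node *n)
{
	assert(n != NULL);
	n->next = l->head;
	l->head = n;
	l->nr++;
}

/*
 * Deliberately buggy removal (an assumption for illustration): the cached
 * counter is decremented even when the node is no longer on the list.
 */
static void lru_del_unchecked(struct lru *l, struct node *n)
{
	struct node **pp;

	for (pp = &l->head; *pp; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			n->next = NULL;
			break;
		}
	}
	l->nr--;
}

/* Second counter: walk the list and count what is really there. */
static long lru_walk_count(const struct lru *l)
{
	const struct node *n;
	long count = 0;

	for (n = l->head; n; n = n->next)
		count++;
	return count;
}

/* Return cached minus real; anything other than 0 means the counter drifted. */
static long lru_check(const struct lru *l)
{
	long real = lru_walk_count(l);

	if (real != l->nr)
		fprintf(stderr, "counter mismatch: nr=%ld real=%ld\n",
			l->nr, real);
	return l->nr - real;
}
```

Calling lru_check() after every add and remove makes the first bad update
visible, instead of only noticing much later that the counter went negative.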
Cheers.
--
--Carlos