From d20b823b7a9ffd7a5eb468636e4c3d4536449fd7 Mon Sep 17 00:00:00 2001
From: Scott Emery <emery@xxxxxxx>
Date: Tue, 15 Jun 2010 20:47:10 -0700
Subject: [PATCH 2/5] Put something more informative into the Lustrecomm help
file.
---
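Note on the lustrecomm.timeout text: the help entry derives several
intervals from the single timeout value. As a quick sanity check of that
arithmetic, here is a minimal sketch. The function and key names are
hypothetical and exist only for illustration; the ratios are the ones
quoted in the help text from the Lustre 1.6 Operations Manual.

```python
def derived_intervals(timeout):
    """Return the intervals the help text derives from the timeout, in seconds.

    Key names are illustrative only; they are not Lustre or PCP identifiers.
    """
    return {
        "server_rpc_wait": timeout / 2,       # servers wait half the timeout for a normal RPC
        "server_bulk_wait": timeout / 4,      # and a quarter for a single bulk request
        "client_ping_interval": timeout / 4,  # clients ping recoverable targets at a quarter
        "stale_eviction_wait": timeout * 1.5, # eviction after one and a half times the timeout
    }


# 100s is the 1.6 default quoted in the help text.
intervals = derived_intervals(100)
```

With the default of 100s this gives 50s and 25s for the server waits, a
25s ping interval, and 150s before a stale client is evicted.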
src/pmdas/lustrecomm/help | 100 +++++++++++++++++++++++++++++++--------------
1 files changed, 69 insertions(+), 31 deletions(-)
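Note on the lustrecomm.stats.* text: the help entries identify each
metric by its position among the eleven numbers in /proc/sys/lnet/stats.
The sketch below shows that mapping. It assumes the file is a single
line of whitespace-separated integers, which the help text implies but
does not state; the function name and the sample line are made up for
illustration and are not taken from the PMDA source or a real system.

```python
# Field order as documented in the help text (first through eleventh number).
LNET_STATS_FIELDS = (
    "msgs_alloc", "msgs_max", "errors",
    "send_count", "recv_count", "route_count", "drop_count",
    "send_length", "recv_length", "route_length", "drop_length",
)


def parse_lnet_stats(text):
    """Map the whitespace-separated numbers in text to the metric names."""
    values = text.split()
    if len(values) != len(LNET_STATS_FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(LNET_STATS_FIELDS), len(values))
        )
    return dict(zip(LNET_STATS_FIELDS, (int(v) for v in values)))


# Made-up sample line, not captured from a real system.
sample = "0 18 0 1204 1190 0 0 98304 524288 0 0"
stats = parse_lnet_stats(sample)
```

On a live node the input would instead come from reading
/proc/sys/lnet/stats; the counts come first and the lengths follow in
the same send, recv, route, drop order.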
diff --git a/src/pmdas/lustrecomm/help b/src/pmdas/lustrecomm/help
index 0876863..af3b27b 100644
--- a/src/pmdas/lustrecomm/help
+++ b/src/pmdas/lustrecomm/help
@@ -34,35 +34,73 @@
# blank lines before the @ line are ignored
#
-@ lustrecomm.timeout
-
-@ lustrecomm.ldlm_timeout
-
-@ lustrecomm.dump_on_timeout
-
-@ lustrecomm.lustre_memused
-
-@ lustrecomm.lnet_memused
-
-@ lustrecomm.stats.msgs_alloc unknown
-
-@ lustrecomm.stats.msgs_max unknown
-
-@ lustrecomm.stats.errors unknown
-
-@ lustrecomm.stats.send_count unknown
-
-@ lustrecomm.stats.recv_count unknown
-
-@ lustrecomm.stats.route_count unknown
-
-@ lustrecomm.stats.drop_count unknown
-
-@ lustrecomm.stats.send_length unknown
-
-@ lustrecomm.stats.recv_length unknown
-
-@ lustrecomm.stats.route_length unknown
-
-@ lustrecomm.stats.drop_length unknown
+@ lustrecomm.timeout contents of /proc/sys/lustre/timeout
+The time period that a client waits for a server to complete an RPC
+(default in 1.6 is 100s). Servers wait half this time for a normal
+client RPC to complete and a quarter of this time for a single
+bulk request to complete. The client pings recoverable targets
+(MDS and OSTs) at one quarter of the timeout, and the server
+waits one and a half times the timeout before evicting a client
+for being "stale". (source: Lustre 1.6 Operations Manual)
+
+@ lustrecomm.ldlm_timeout contents of /proc/sys/lustre/ldlm_timeout
+This is the time period for which a server will wait for a client
+to reply to an initial AST (lock cancellation request). The default
+is 20s for an OST and 6s for an MDS. (source: Lustre 1.6 Operations
+Manual)
+
+@ lustrecomm.dump_on_timeout contents of /proc/sys/lustre/dump_on_timeout
+A 1 triggers dumps of the Lustre debug log when timeouts occur.
+Default value 0. (source: Lustre 1.6 Operations Manual)
+
+@ lustrecomm.lustre_memused contents of /proc/sys/lustre/memused
+lustre/obdclass/linux/linux-sysctl.c: &proc_memory_alloc
+lustre/include/obd_support.h: obd_memory_sum()
+Total bytes allocated by Lustre (inferred from lustre/include/obd_support.h)
+
+@ lustrecomm.lnet_memused contents of /proc/sys/lnet/memused
+lnet/libcfs/linux/linux-proc.c: (int *)&libcfs_kmemory.counter
+Total bytes allocated by LNET. (inferred from lustre/include/obd_support.h)
+
+@ lustrecomm.stats.msgs_alloc first number from /proc/sys/lnet/stats
+routerstat source file: messages currently allocated (first number after M)
+
+@ lustrecomm.stats.msgs_max second number from /proc/sys/lnet/stats
+routerstat source file: messages maximum (highwater mark) (second
+number after M)
+
+@ lustrecomm.stats.errors third number from /proc/sys/lnet/stats
+routerstat source file: errors (number after E)
+
+@ lustrecomm.stats.send_count fourth number from /proc/sys/lnet/stats
+routerstat source file: send_count (raw data from which second number
+after S is derived)
+
+@ lustrecomm.stats.recv_count fifth number from /proc/sys/lnet/stats
+routerstat source file: recv_count (raw data from which second number
+after R is derived)
+
+@ lustrecomm.stats.route_count sixth number from /proc/sys/lnet/stats
+routerstat source file: route_count (raw data from which second number
+after F is derived)
+
+@ lustrecomm.stats.drop_count seventh number from /proc/sys/lnet/stats
+routerstat source file: drop_count (raw data from which second number
+after D is derived)
+
+@ lustrecomm.stats.send_length eighth number from /proc/sys/lnet/stats
+routerstat source file: send_length (raw data from which first number
+after S is derived)
+
+@ lustrecomm.stats.recv_length ninth number from /proc/sys/lnet/stats
+routerstat source file: recv_length (raw data from which first number
+after R is derived)
+
+@ lustrecomm.stats.route_length tenth number from /proc/sys/lnet/stats
+routerstat source file: route_length (raw data from which first number
+after F is derived)
+
+@ lustrecomm.stats.drop_length eleventh number from /proc/sys/lnet/stats
+routerstat source file: drop_length (raw data from which first number
+after D is derived)
--
1.6.1.2