Another permissions problem I ran into when trying to test Network
Connectivity:
$ cd /usr/lib/sysadm/privbin;
$ ls -l ClusterDiags
-rwxrwxr-x 1 root root 79 Jan 2 00:54 ClusterDiags
(the group write bit needs to be off on this executable)
Once I fixed that, it told me:
Cluster Diagnostics have not been implemented in this release.
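For reference, the fix itself was just clearing the group write bit. Sketched
here on a scratch copy so it's safe to try anywhere (the real target was
/usr/lib/sysadm/privbin/ClusterDiags):

```shell
# Demonstrate the permissions fix on a scratch file rather than the
# real /usr/lib/sysadm/privbin/ClusterDiags binary.
f=/tmp/ClusterDiags.demo
touch "$f"
chmod 775 "$f"     # reproduce the bad mode: -rwxrwxr-x
chmod g-w "$f"     # clear the group write bit
ls -l "$f"         # now shows -rwxr-xr-x
stat -c %a "$f"    # prints 755
rm -f "$f"
```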
------------------------------------------------------------------------------
From the GUI, I added my nodes to my cluster definition, and then
chose:
Fix or Upgrade Cluster Nodes
--> Start Failsafe HA Services
--> Start
It only started HA services on the first node, not the second. (No, I didn't
select a node from the optional drop-down list.) I then went back and
started it specifically on the second node (this time, I did select the
node from the drop-down list), and then the GUI updated the
status of the second node from 'Inactive' to 'OK'. No big deal, I
just wondered why it didn't come up on both nodes the first time.
I've attached a portion of /var/log/messages below. From the time I
attempted to start the cluster, there are some ugly-looking messages
about:
Stale CDB handle.
CI_IPCERR_NOSERVER, cms ipc: ipcclnt_connect() failed, file
/var/cluster/ha/comm/cmsd-ipc_dru1a .Check if the cmsd daemon is running.
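In case it helps with triage, here are the quick checks I can run the next
time this happens (daemon name and socket path taken straight from the error
message; on a box where FailSafe isn't up, the fallback messages just print):

```shell
# Check whether the cmsd daemon is up and whether its IPC socket exists.
# (Names come from the CI_IPCERR_NOSERVER message above; on a machine
# without FailSafe running these simply report daemon/socket missing.)
pgrep -l cmsd || echo "no cmsd process found"
ls -l /var/cluster/ha/comm/cmsd-ipc_dru1a 2>/dev/null \
  || echo "no cmsd IPC socket at /var/cluster/ha/comm/cmsd-ipc_dru1a"
```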
------------------------------------------------------------------------------
OK, now I'm at the point where I want to define my resources.
From FailSafe Manager:
--> Resources & Resource Types
--> Define a New Resource
I get a dialog: Create a new Resource Definition
It looks like I don't have any default resource types defined. All I've
got in the 'Resource Type' drop-down list is 'template'. The admin
guide says there are some pre-defined resource types that look handy.
Do you have any definitions already made up for Linux for:
IP Address
filesystem
I'll also need a RAID resource. I guess I need to crack open the
Programming guide. I'll have to do it eventually for my application
anyway.
-----------------------------------------------------------------------------
Just for yucks, I took a look at /var/cluster/ha/log. It looks like
one of the files is growing quite a bit!
[root@dru1a log]# ls -l
total 23669
-rw-r--r-- 1 root root 23892291 Jan 2 22:43 cad_log
-rw-r--r-- 1 root root 5922 Jan 2 22:36 cli_dru1a
-rw------- 1 root root 127777 Jan 2 22:36 cmond_log
-rw-r--r-- 1 root root 6173 Jan 2 22:24 cmsd_dru1a
-rw-r--r-- 1 root root 4782 Jan 2 22:15 crsd_dru1a
-rw-r--r-- 1 root root 8435 Jan 2 22:25 failsafe_dru1a
-rw------- 1 root root 50639 Jan 2 22:16 fs2d_log
-rw-r--r-- 1 root root 36799 Jan 2 22:43 gcd_dru1a
-rw-r--r-- 1 root root 1153 Jan 2 22:16 srmd_dru1a
I take it there is some kind of debugging turned on in the 'cad'
daemon?!? I only configured a 100MB /var partition, so it looks like
I'll only be able to run my cluster for a few days :-)
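The "few days" is back-of-the-envelope arithmetic from the sizes in the
ls -l output above (assuming cad_log keeps growing at roughly this rate,
and that it has been roughly one day of uptime so far):

```shell
# Rough estimate of how long a 100MB /var lasts at the observed cad_log
# growth rate (~24MB in about a day, per the listing above).
log_bytes=23892291                   # cad_log size after ~1 day
part_bytes=$((100 * 1024 * 1024))    # the 100MB /var partition
days=$((part_bytes / log_bytes))
echo "roughly $days day(s) until /var fills"   # prints: roughly 4 day(s) until /var fills
```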
Regards,
-Eric.
---
(/var/log/messages excerpt)
Jan 2 22:10:10 dru1a PAM_pwdb[1593]: (su) session closed for user root
Jan 2 22:10:14 dru1a runpriv[1598]: Running privilege ClusterDiags for user root.
Jan 2 22:10:59 dru1a runpriv[1604]: Running privilege ClusterDiags for user root.
Jan 2 22:11:05 dru1a runpriv[1605]: Running privilege ClusterDiags for user root.
Jan 2 22:13:08 dru1a runpriv[1620]: Running privilege ClusterDiags for user root.
Jan 2 22:14:40 dru1a runpriv[1631]: Running privilege haParamsModify for user root.
Jan 2 22:14:40 dru1a cli[1631]: <<CI> E config 0> CI_ERR_INVAL, Internal error: internal argument is invalid : Internal error no nodes in cluster
Jan 2 22:14:41 dru1a cli[1631]: <<CI> E config 0> CI_ERR_INVAL, CLI private command: failed (Internal error no nodes in cluster)
Jan 2 22:14:53 dru1a runpriv[1637]: Running privilege clusterAddMachine for user root.
Jan 2 22:14:56 dru1a cmond[537]: <cmond_cdb.c:477> Notification can not be processed, local machine and cluster name is not known.
Jan 2 22:14:56 dru1a cmond[537]: <cmond_cdb.c:558> Local machine belongs to cluster dru.
Jan 2 22:14:56 dru1a cmond[537]: <cmond_cdb.c:579> Local machine name is dru1a.
Jan 2 22:15:02 dru1a cmond[537]: <cmond_cdb.c:910> Stale CDB handle.
Jan 2 22:15:02 dru1a crsd[549]: <<CI> N log 0> Additional crsd logs can be found in /var/cluster/ha/log/crsd_dru1a.
Jan 2 22:15:21 dru1a runpriv[1692]: Running privilege haActivate for user root.
Jan 2 22:15:21 dru1a cmond[537]: <cmond_proc.c:142> New process ha_cmsd pid 1702
Jan 2 22:15:21 dru1a cmond[537]: <cmond_proc.c:142> New process ha_gcd pid 1703
Jan 2 22:15:21 dru1a cmond[537]: <cmond_proc.c:142> New process ha_srmd pid 1704
Jan 2 22:15:21 dru1a cmond[537]: <cmond_proc.c:142> New process ha_fsd pid 1706
Jan 2 22:15:22 dru1a ha_cmsd[1702]: <<CI> N log 0> Additional ha_cmsd logs can be found in /var/cluster/ha/log/cmsd_dru1a.
Jan 2 22:15:22 dru1a ha_gcd[1703]: <<CI> N log 0> Additional ha_gcd logs can be found in /var/cluster/ha/log/gcd_dru1a.
Jan 2 22:15:22 dru1a ha_cmsd[1702]: <<CI> N cms 0> ha_cmsd restarted.
Jan 2 22:15:22 dru1a ha_fsd[1706]: <<CI> N log 0> Additional ha_fsd logs can be found in /var/cluster/ha/log/failsafe_dru1a.
Jan 2 22:15:22 dru1a ha_fsd[1706]: <<CI> N fsd 0> /usr/cluster/bin/ha_fsd is running as foreground process
Jan 2 22:15:23 dru1a ha_srmd[1704]: <<CI> N log 0> Additional ha_srmd logs can be found in /var/cluster/ha/log/srmd_dru1a.
Jan 2 22:15:23 dru1a ha_cmsd[1702]: <<CI> N log 0> Additional ha_cmsd logs can be found in /var/cluster/ha/log/cmsd_dru1a.
Jan 2 22:15:23 dru1a ha_gcd[1703]: <<CI> N log 0> Additional ha_gcd logs can be found in /var/cluster/ha/log/gcd_dru1a.
Jan 2 22:15:23 dru1a ha_gcd[1703]: <<CI> N gcd 0> My node name = dru1a.
Jan 2 22:15:23 dru1a ha_gcd[1703]: <<CI> E cms 0> CI_IPCERR_NOSERVER, cms ipc: ipcclnt_connect() failed, file /var/cluster/ha/comm/cmsd-ipc_dru1a .Check if the cmsd daemon is running.
Jan 2 22:15:24 dru1a ha_gcd[1703]: <<CI> E cms 0> CI_IPCERR_NOSERVER, cms ipc: ipcclnt_connect() failed, file /var/cluster/ha/comm/cmsd-ipc_dru1a .Check if the cmsd daemon is running.
Jan 2 22:15:26 dru1a ha_cmsd[1702]: <<CI> N cms 0> Confirmed Membership: sqn 1 G_sqn = 1, ack false node dru1a [1] : UP incarnation 1 age 1:0 node dru1b [2] : DOWN* incarnation 0 age 0:0
Jan 2 22:15:27 dru1a ha_gcd[1703]: <<CI> N gcd 0> My nodeid = 1 [0x1].
Jan 2 22:15:46 dru1a ha_srmd[1733]: <<CI> N srm 2> SRM ready to accept clients
Jan 2 22:16:30 dru1a ha_fsd[1706]: <<CI> N fsd 0> FailSafe initialization complete -- Move to state: UP
Jan 2 22:24:07 dru1a runpriv[1746]: Running privilege haActivate for user root.
Jan 2 22:24:08 dru1a ha_cmsd[1717]: <<CI> N log 1> Additional ha_cmsd logs can be found in /var/cluster/ha/log/cmsd_dru1a.
Jan 2 22:24:38 dru1a ha_cmsd[1702]: <<CI> N cms 0> Node dru1b id 2 added/enabled.
Jan 2 22:24:41 dru1a ha_cmsd[1702]: <<CI> N cms 0> Confirmed Membership: sqn 2 G_sqn = 2, ack false node dru1a [1] : UP incarnation 1 age 2:0 node dru1b [2] : UP incarnation 1 age 1:0