DHCP failover failure when peer crashed
Hi,
I ran into an interesting scenario using dhcp-3.0.1rc9 with DHCP failover.
One of the peers lost its one and only filesystem, / (XFS on Linux; the
kernel shut it down due to filesystem errors, and applications just get
an I/O error once the kernel has unmounted it). So the system (kernel,
daemons, network) is still running, but no application can complete disk
I/O.
The other peer could not hand out leases because the dhcpd still running
on the crashed machine interfered and did not give up control.
These are some log lines from the working peer, which could not hand out
leases because 192.168.9.50 (the crashed peer) was still alive on the
network:
May 24 09:08:56 hermes dhcpd: DHCPREQUEST for 192.168.9.173 (192.168.9.50) from 00:01:02:c8:cd:8a via eth0: lease in transition state expired
May 24 09:08:56 hermes dhcpd: DHCPREQUEST for 192.168.9.173 (192.168.9.50) from 00:01:02:c8:cd:8a via 192.168.9.6: lease in transition state expired
May 24 09:09:00 hermes dhcpd: DHCPREQUEST for 192.168.9.173 (192.168.9.50) from 00:01:02:c8:cd:8a via eth0: lease in transition state expired
May 24 09:09:00 hermes dhcpd: DHCPREQUEST for 192.168.9.173 (192.168.9.50) from 00:01:02:c8:cd:8a via 192.168.9.6: lease in transition state expired
May 24 09:09:08 hermes dhcpd: DHCPREQUEST for 192.168.9.173 (192.168.9.50) from 00:01:02:c8:cd:8a via eth0: lease in transition state expired
May 24 09:09:08 hermes dhcpd: DHCPREQUEST for 192.168.9.173 (192.168.9.50) from 00:01:02:c8:cd:8a via 192.168.9.6: lease in transition state expired
My opinion is that dhcpd does not gracefully handle disk I/O errors when
trying to write dhcpd.leases. If an important filesystem is shut down
due to errors, the dhcpd on the crashed host interferes with failover and
does not fully hand control over to the surviving peer.
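
As an illustration only, here is a minimal sketch of the kind of check
that could detect this condition from outside dhcpd. It is not something
dhcpd itself does; the function name lease_dir_is_writable and the probe
approach are hypothetical. It tries to create, write and fsync a small
file in the directory that holds dhcpd.leases and reports whether that
worked.

```python
import os
import tempfile


def lease_dir_is_writable(directory):
    """Return True if a small probe file can be created, written and
    fsynced in `directory`; return False on any OS-level I/O error.

    Hypothetical helper: `directory` would be wherever dhcpd.leases
    lives on the host in question.
    """
    try:
        # mkstemp fails with OSError if the directory is missing,
        # read-only, or the filesystem has been shut down.
        fd, path = tempfile.mkstemp(prefix=".probe-", dir=directory)
    except OSError:
        return False
    try:
        os.write(fd, b"probe\n")
        # fsync forces the data to disk, so an error that only shows up
        # on writeback is caught here rather than silently ignored.
        os.fsync(fd)
        return True
    except OSError:
        return False
    finally:
        # Best-effort cleanup; failures here do not change the result.
        try:
            os.close(fd)
        except OSError:
            pass
        try:
            os.unlink(path)
        except OSError:
            pass
```

A watchdog could call this periodically and stop dhcpd when it returns
False, so that the failover peer no longer sees the broken host as
alive. Whether such a watchdog can still run once / itself is gone
depends on the setup, so treat this as a sketch of the idea rather than
a tested fix.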