Takashi Sato wrote:
What is the difference between the timeout and AUTO-THAW?
When the kernel detects a deadlock, does it occur to solve it?
TIMEOUT is a user-specified limit for the freeze. It is
not a deadlock preventer or deadlock breaker. The reason
it exists is:
- middle of the night (low but not zero users)
- cron triggers freeze and hardware snapshot
- san is overloaded by tape copy traffic so
hardware will take 2 hours to ack snapshot done
- user "company president" tries to create a report
needed for an AM meeting with bankers
- with so few users, system will just patiently
wait for hardware to finish
- after 10 minutes "company president" pages
admin, admin's boss, and "IT vice president"
in a real unhappy mood
AUTO-THAW is simply a name for the effect of all deadlock
preventer and deadlock breaker code that the kernel has
in the freeze implementation paths... if that code would
unfreeze the filesystem. We also implemented deadlock
preventer code that does not thaw the freeze.
None of the AUTO-THAW code is there to stop a stupid
userspace program caller of freeze. It handles things
like "a system in our cluster is going down so we
must have this filesystem unfrozen or the whole
cluster will crash". In places where there could be
a kernel deadlock we made it "lock-only-if-non-blocking"
and if we could not wait to retry later, the failure
to lock would trigger an immediate unfreeze.
Deadlock prevention needs code in critical paths in more
than just filesystems. Sometimes this is as simple as
an "I can't wait on freeze" flag added to a vm-filesystem
Timers just don't work for keeping the kernel alive
because they don't trigger on resource exhaustion.