JBoss.org Community Documentation

19.5.1. FD

FD is a failure detection protocol based on heartbeat messages. Each node periodically sends an are-you-alive message to its neighbour. If the neighbour fails to respond, the sending node sends a SUSPECT message to the cluster. The current group coordinator can optionally double-check whether the suspected node is indeed dead; if it still considers the node dead, it updates the cluster's view. Here is an example FD configuration.

<FD timeout="2000"
    max_tries="3"
    shun="true"
    down_thread="false" up_thread="false"/>

The available attributes in the FD element are listed below.

  • timeout specifies the maximum number of milliseconds to wait for a response to an are-you-alive message. The default is 3000.

  • max_tries specifies the number of missed are-you-alive messages from a node before the node is suspected. The default is 2.

  • shun specifies whether a failed node will be shunned. Once shunned, the node is expelled from the cluster even if it comes back later, and it has to rejoin the cluster through the discovery process. JGroups can be configured so that shunning leads to an automatic rejoin and state transfer, which is the default behaviour within JBoss Application Server.

Note

Regular traffic from a node counts as evidence that it is alive, so are-you-alive messages are only sent when there has been no regular traffic from the node for some time.
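
The interaction between max_tries and regular traffic can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the JGroups implementation: the class FdSketch and its methods are hypothetical names, time is reduced to one call per elapsed timeout interval, and it assumes that a neighbour is suspected once max_tries consecutive intervals pass without any message from it.

```java
// Toy model of FD bookkeeping for a single monitored neighbour.
// Hypothetical names; this is not the JGroups FD class.
final class FdSketch {
    private final int maxTries;          // mirrors the max_tries attribute
    private int missed = 0;              // consecutive intervals without any sign of life
    private boolean heardSinceLastTick;  // traffic or heartbeat reply seen in this interval

    FdSketch(int maxTries) {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be >= 1");
        }
        this.maxTries = maxTries;
    }

    // Any message from the neighbour (regular traffic or a heartbeat
    // reply) counts as evidence that it is alive and resets the counter.
    void onMessageFromNeighbour() {
        heardSinceLastTick = true;
        missed = 0;
    }

    // Call once each time "timeout" milliseconds have elapsed.
    // Returns true when the neighbour should be reported as SUSPECT.
    boolean onTimeoutElapsed() {
        if (heardSinceLastTick) {
            heardSinceLastTick = false;
            missed = 0;
            return false;
        }
        missed++;
        return missed >= maxTries;
    }

    public static void main(String[] args) {
        FdSketch fd = new FdSketch(3);   // max_tries="3", as in the example configuration
        fd.onTimeoutElapsed();           // first silent interval
        fd.onTimeoutElapsed();           // second silent interval
        fd.onMessageFromNeighbour();     // regular traffic arrives, counter resets
        fd.onTimeoutElapsed();           // interval with traffic, nothing counted
        fd.onTimeoutElapsed();           // silent interval 1
        fd.onTimeoutElapsed();           // silent interval 2
        System.out.println(fd.onTimeoutElapsed()); // silent interval 3: prints true
    }
}
```

Under these assumptions, the example configuration (timeout="2000", max_tries="3") would suspect a completely silent neighbour after roughly 3 × 2000 = 6000 milliseconds; the exact timing in a real cluster may differ.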