VOTEQUORUM(5)	  Corosync Cluster Engine Programmer's Manual	 VOTEQUORUM(5)

NAME
       votequorum - Votequorum Configuration Overview

OVERVIEW
       The votequorum service is part of the corosync project. This service
       can be optionally loaded into the nodes of a corosync cluster to avoid
       split-brain situations. It does this by assigning a number of votes to
       each system in the cluster and ensuring that cluster operations are
       only allowed to proceed when a majority of the votes are present. The
       service must be loaded into all nodes or none. If it is loaded into
       only a subset of the cluster nodes, the results will be unpredictable.

       The following corosync.conf extract will enable the votequorum service
       within corosync:

       quorum {
	   provider: corosync_votequorum
       }

       votequorum reads its configuration from corosync.conf. Some values can
       be changed at runtime; others are only read at corosync startup. It is
       very important that those values are consistent across all the nodes
       participating in the cluster, or votequorum behavior will be
       unpredictable.

       votequorum requires an expected_votes value to function; this can be
       provided in two ways. The number of expected votes will be
       automatically calculated when the nodelist { } section is present in
       corosync.conf, or expected_votes can be specified in the quorum { }
       section. Lack of both will disable votequorum. If both are present at
       the same time, the quorum.expected_votes value will override the one
       calculated from the nodelist (see the third example below).

       Example (no nodelist) of	an 8 node cluster (each	node has 1 vote):

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 8
       }

       Example (with nodelist) of a 3 node cluster (each node has 1 vote):

       quorum {
	   provider: corosync_votequorum
       }

       nodelist	{
	   node	{
	       ring0_addr: 192.168.1.1
	   }
	   node	{
	       ring0_addr: 192.168.1.2
	   }
	   node	{
	       ring0_addr: 192.168.1.3
	   }
       }
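
       A third example (values purely illustrative) with both a nodelist and
       quorum.expected_votes present. Here the quorum.expected_votes value of
       4 takes precedence over the 3 votes that would be calculated from the
       nodelist:

       quorum {
           provider: corosync_votequorum
           expected_votes: 4
       }

       nodelist {
           node {
               ring0_addr: 192.168.1.1
           }
           node {
               ring0_addr: 192.168.1.2
           }
           node {
               ring0_addr: 192.168.1.3
           }
       }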

SPECIAL FEATURES
       two_node: 1

       Enables two node	cluster	operations (default: 0).

       The "two node cluster" is a use case that requires special
       consideration. With a standard two node cluster, where each node has a
       single vote, there are 2 votes in the cluster. Using the simple
       majority calculation (50% of the votes + 1) to calculate quorum, the
       quorum would be 2. This means that both nodes would always have to be
       alive for the cluster to be quorate and operate.

       When two_node: 1 is enabled, quorum is set artificially to 1.

       Example configuration 1:

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 2
	   two_node: 1
       }

       Example configuration 2:

       quorum {
	   provider: corosync_votequorum
	   two_node: 1
       }

       nodelist	{
	   node	{
	       ring0_addr: 192.168.1.1
	   }
	   node	{
	       ring0_addr: 192.168.1.2
	   }
       }

       NOTES: enabling two_node: 1 automatically enables wait_for_all. It is
       still possible to override wait_for_all by explicitly setting it to 0.
       If more than 2 nodes join the cluster, the two_node option is
       automatically disabled.
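
       For example, a sketch (based on example configuration 2 above) of a
       two node setup that explicitly overrides the implicit wait_for_all:

       quorum {
           provider: corosync_votequorum
           two_node: 1
           wait_for_all: 0
       }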

       wait_for_all: 1

       Enables Wait For	All (WFA) feature (default: 0).

       The general behaviour of votequorum is to switch a cluster from
       inquorate to quorate as soon as possible. For example, in an 8 node
       cluster where every node has 1 vote, expected_votes is set to 8 and
       quorum is 5 (50% + 1). As soon as 5 (or more) nodes are visible to
       each other, the partition of 5 (or more) becomes quorate and can start
       operating.

       When  WFA  is  enabled,	the cluster will be quorate for	the first time
       only after all nodes have been visible at least once at the same	time.

       This feature has the advantage of avoiding some startup race
       conditions, with the cost that all nodes need to be up at the same
       time at least once before the cluster can operate.

       A common startup race condition, based on the above example, is that
       as soon as 5 nodes become quorate, with the other 3 still offline, the
       remaining 3 nodes will be fenced.

       WFA is very useful when combined with last_man_standing (see below).

       Example configuration:

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 8
	   wait_for_all: 1
       }

       last_man_standing: 1 / last_man_standing_window:	10000

       Enables	Last  Man  Standing  (LMS)  feature  (default:	0).    Tunable
       last_man_standing_window	(default: 10 seconds, expressed	in ms).

       The general behaviour of	votequorum is to set expected_votes and	quorum
       at startup (unless modified by the user at runtime, see below) and  use
       those values during the whole lifetime of the cluster.

       Take for example an 8 node cluster where each node has 1 vote:
       expected_votes is set to 8 and quorum to 5. This condition allows a
       total failure of 3 nodes. If a 4th node fails, the cluster becomes
       inquorate and will stop providing services.

       Enabling LMS allows the cluster to dynamically recalculate
       expected_votes and quorum under specific circumstances. It is
       essential to enable WFA when using LMS in High Availability clusters.

       Using the above 8 node cluster example, with LMS enabled the cluster
       can retain quorum and continue operating by losing, in a cascade
       fashion, up to 6 nodes with only 2 remaining active.

       Example chain of	events:
       1) cluster is fully operational with 8 nodes.
	  (expected_votes: 8 quorum: 5)

       2) 3 nodes die, cluster is quorate with 5 nodes.

       3) after	last_man_standing_window timer expires,
	  expected_votes and quorum are	recalculated.
	  (expected_votes: 5 quorum: 3)

       4) at this point, 2 more	nodes can die and
	  cluster will still be	quorate	with 3.

       5) once again, after last_man_standing_window
	  timer	expires	expected_votes and quorum are
	  recalculated.
	  (expected_votes: 3 quorum: 2)

       6) at this point, 1 more	node can die and
	  cluster will still be	quorate	with 2.

       7) after one more last_man_standing_window timer expires,
          expected_votes and quorum are recalculated.
          (expected_votes: 2 quorum: 2)

       NOTES: In order for the cluster to downgrade automatically from 2
       nodes to a 1 node cluster, the auto_tie_breaker feature must also be
       enabled (see below). If auto_tie_breaker is not enabled, and one more
       failure occurs, the remaining node will not be quorate. LMS does not
       work with asymmetric voting schemes: each node must have 1 vote. LMS
       is also incompatible with quorum devices: if last_man_standing is
       specified in corosync.conf then the quorum device will be disabled.

       Example configuration 1:

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 8
	   last_man_standing: 1
       }

       Example configuration 2 (increase timeout to 20 seconds):

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 8
	   last_man_standing: 1
	   last_man_standing_window: 20000
       }
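
       Example configuration 3, a sketch combining LMS with wait_for_all (as
       recommended above) and with auto_tie_breaker so that the cluster can
       downgrade all the way down to 1 node (see the NOTES above):

       quorum {
           provider: corosync_votequorum
           expected_votes: 8
           wait_for_all: 1
           last_man_standing: 1
           auto_tie_breaker: 1
       }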

       auto_tie_breaker: 1

       Enables Auto Tie	Breaker	(ATB) feature (default:	0).

       The general behaviour of votequorum allows a simultaneous failure of
       up to 50% - 1 of the nodes, assuming each node has 1 vote.

       When ATB is enabled, the cluster can suffer up to 50% of the nodes
       failing at the same time, in a deterministic fashion. By default the
       cluster partition that is still in contact with the node that has the
       lowest nodeid will remain quorate. The other nodes will be inquorate.
       This behaviour can be changed by also specifying

       auto_tie_breaker_node: lowest|highest|<list of node IDs>

       'lowest' is the default. 'highest' is similar, in that if the current
       set of nodes contains the highest nodeid then it will remain quorate.
       Alternatively it is possible to specify a particular node ID or a list
       of node IDs that will be required to maintain quorum. If a
       (space-separated) list is given, the nodes are evaluated in order: if
       the first node is present then it will be used to determine the
       quorate partition; if that node is not in either half (i.e. it was not
       in the cluster before the split) then the second node ID will be
       checked, and so on. ATB is incompatible with quorum devices: if
       auto_tie_breaker is specified in corosync.conf then the quorum device
       will be disabled.

       Example configuration 1:

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 8
	   auto_tie_breaker: 1
	   auto_tie_breaker_node: lowest
       }

       Example configuration 2:

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 8
	   auto_tie_breaker: 1
	   auto_tie_breaker_node: 1 3 5
       }

       allow_downscale:	1

       Enables allow downscale (AD) feature (default: 0).

       THIS FEATURE IS INCOMPLETE AND CURRENTLY	UNSUPPORTED.

       The general behaviour of	votequorum is to never decrease	expected votes
       or quorum.

       When AD is enabled, both expected votes and quorum are recalculated
       when a node leaves the cluster in a clean state (normal corosync
       shutdown process), down to the configured expected_votes.

       Example use case:

       1) N node cluster (where	N is any value higher than 3)

       2) expected_votes set to	3 in corosync.conf

       3) only 3 nodes are running

       4) admin needs to increase processing power and adds 10 nodes

       5) internal expected_votes is automatically set to 13

       6) minimum expected_votes is 3 (from configuration)

       - up to this point this is standard votequorum behavior -

       7) once the work	is done, admin wants to	remove nodes from the cluster

       8) using	an ordered shutdown the	admin can reduce the cluster size
	  automatically	back to	3, but not below 3, where normal quorum
	  operation will work as usual.

       Example configuration:

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 3
	   allow_downscale: 1
       }

       allow_downscale implicitly enables EVT (see below).

       expected_votes_tracking:	1

       Enables Expected	Votes Tracking (EVT) feature (default: 0).

       Expected Votes Tracking stores the highest-seen value of expected
       votes on disk and uses that as the minimum value for expected votes in
       the absence of any higher authority (e.g. a current quorate cluster).
       This is useful when a group of nodes becomes detached from the main
       cluster and after a restart could have enough votes to provide quorum,
       which can happen after using allow_downscale.

       Note that even if the in-memory version of expected_votes is reduced,
       e.g. by removing nodes or using corosync-quorumtool, the stored value
       will still be the highest value seen - it never gets reduced.

       The value is held in the file /var/lib/corosync/ev_tracking, which can
       be deleted if you really do need to reduce the expected votes for some
       reason, such as when the node has been moved to a different cluster.
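
       A minimal example configuration enabling EVT (the expected_votes value
       of 8 is purely illustrative):

       quorum {
           provider: corosync_votequorum
           expected_votes: 8
           expected_votes_tracking: 1
       }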

VARIOUS NOTES
       * WFA / LMS / ATB / AD can be combined with each other.

       * In order to change the default votes for a node there are two
       options:

       1) nodelist:

       nodelist	{
	   node	{
	       ring0_addr: 192.168.1.1
	       quorum_votes: 3
	   }
	   ....
       }

       2) quorum section (deprecated):

       quorum {
	   provider: corosync_votequorum
	   expected_votes: 2
	   votes: 2
       }

       In the event that both nodelist and quorum { votes: } are defined,  the
       value from the nodelist will be used.

       *  Only votes, quorum_votes, expected_votes and two_node	can be changed
       at runtime. Everything else requires a cluster restart.
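
       For example, assuming the options documented in corosync-quorumtool(8)
       for the installed version, the expected votes of a running cluster
       could be changed with something like:

       corosync-quorumtool -e 8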

BUGS
       No known bugs at the time of writing. The authors are from outer
       space. Deal with it.

SEE ALSO
       corosync(8), corosync.conf(5), corosync-quorumtool(8),
       corosync-qdevice(8), votequorum_overview(3)

corosync Man Page		  2012-01-24			 VOTEQUORUM(5)
