nonstop.conf(5)		   Slurm Configuration File	       nonstop.conf(5)

NAME
       nonstop.conf - Slurm configuration file for fault-tolerant computing.

DESCRIPTION
       nonstop.conf is an ASCII file which describes the configuration used
       for fault-tolerant computing with Slurm using the optional
       slurmctld/nonstop plugin.  This plugin provides a means for users to
       notify Slurm of nodes they believe are suspect, replace a job's
       failing or failed nodes, and extend a job's time limit in response to
       failures.  The file location can be modified at system build time
       using the DEFAULT_SLURM_CONF parameter or at execution time by
       setting the SLURM_CONF environment variable.  The file will always be
       located in the same directory as the slurm.conf file.

       Parameter  names	are case insensitive.  Any text	following a "#"	in the
       configuration file is treated as	a comment  through  the	 end  of  that
       line.   Changes	to  the	configuration file take	effect upon restart of
       Slurm daemons, daemon receipt of	the SIGHUP signal, or execution	of the
       command	"scontrol reconfigure" unless otherwise	noted.	The configura-
       tion parameters available include:

       BackupAddr
              Communications address used for the backup slurmctld daemon.
              This can either be a hostname or IP address.  This value would
              typically be the same as the secondary SlurmctldHost in the
              slurm.conf file, when applicable.

       ControlAddr
	      Communications  address used for the slurmctld daemon.  This can
	      either be	a hostname or IP address.  This	value would  typically
	      be the same as the SlurmctldHost in the slurm.conf file.

       Debug  A	 number	indicating the level of	additional logging desired for
	      the plugin.  The default value is	zero, which generates no addi-
	      tional logging.

       HotSpareCount
              This identifies how many nodes in each partition should be
              maintained as spare resources.  When a job fails, this pool of
              resources will be depleted and then replenished when possible
              using idle resources.  The value should be a comma-delimited
              list of partition:count pairs, where each pair is a partition
              name and a node count separated by a colon.

       MaxSpareNodeCount
	      This  identifies	the maximum number of nodes any	single job may
	      replace through the job's	entire lifetime.  This could prevent a
	      single  job  from	causing	all of the nodes in a cluster to fail.
	      By default, there	is no maximum node count.

       Port   Port used	for communications.  The default value is 6820.

       TimeLimitDelay
	      If a job requires	replacement resources and none are immediately
	      available,  then	permit	a  job to extend its time limit	by the
	      length of	time required to secure	replacement  resources	up  to
	      the  number of minutes specified by TimeLimitDelay.  This	option
	      will only	take effect if no hot spare resources are available at
	      the  time	 replacement resources are requested.  This time limit
              extension is in addition to the value calculated using
              TimeLimitExtend.  The default value is zero (no time limit
              extension).  The value may not exceed 65533 seconds.

       TimeLimitDrop
	      Specifies	the number of minutes that a job can extend  its  time
	      limit for	each failed or failing node removed from the job's al-
	      location.	 The default value is zero (no time limit  extension).
	      The value	may not	exceed 65533 seconds.

       TimeLimitExtend
	      Specifies	 the  number of	minutes	that a job can extend its time
	      limit for	each replaced node.  The default  value	 is  zero  (no
	      time limit extension).  The value	may not	exceed 65533 seconds.

       UserDrainAllow
	      This identifies a	comma delimited	list of	user names or user IDs
	      of users who are authorized to  drain  nodes  they  believe  are
	      failing.	 Specify  a value of "ALL" to permit any user to drain
	      nodes.  By default, no users may drain nodes using  this	inter-
	      face.

       UserDrainDeny
	      This identifies a	comma delimited	list of	user names or user IDs
	      of users who are NOT authorized to drain nodes they believe  are
	      failing.	Specifying a value for UserDrainDeny implicitly	allows
	      all other	users to drain nodes (sets the value of	UserDrainAllow
	      to "ALL").
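
       The interaction between UserDrainAllow and UserDrainDeny can be
       sketched in Python as follows.  This illustrates the rules as
       described above and is not the plugin's own code: the function name
       may_drain and its arguments are hypothetical, and letting
       UserDrainDeny take precedence when both parameters are set is an
       assumption.

```python
def may_drain(user, uid, allow=None, deny=None):
    """Return True if the user may drain nodes under this reading of the rules.

    allow and deny are the raw UserDrainAllow and UserDrainDeny strings,
    or None when the parameter is absent from nonstop.conf.
    """
    def listed(value):
        # Entries may be user names or numeric user IDs, comma delimited.
        entries = {e.strip() for e in value.split(",") if e.strip()}
        return user in entries or str(uid) in entries

    if deny is not None:
        # Setting UserDrainDeny implicitly sets UserDrainAllow to "ALL",
        # so everyone who is not on the deny list is permitted.
        # Assumption: the deny list wins if both parameters are present.
        return not listed(deny)
    if allow is None:
        # Default: no users may drain nodes using this interface.
        return False
    if allow.strip() == "ALL":
        return True
    return listed(allow)
```

       With UserDrainAllow=adam,brenda, the call may_drain("adam", 1001,
       allow="adam,brenda") returns True, while the same call for any user
       not in the list returns False.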

EXAMPLE
       #
       # Sample	nonstop.conf file
       # Date: 12 Feb 2013
       #
       ControlAddr=12.34.56.78
       BackupAddr=12.34.56.79
       Port=1234
       #
       HotSpareCount=batch:6,interactive:0
       MaxSpareNodeCount=4
       TimeLimitDelay=30
       TimeLimitDrop=20
       TimeLimitExtend=10
       UserDrainAllow=adam,brenda
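
       The file format rules from DESCRIPTION (case-insensitive parameter
       names, "#" comments running to the end of the line) and the
       HotSpareCount value format can be sketched in Python as follows.
       This is a minimal illustration and not Slurm's parser: the function
       names are hypothetical, and the sketch assumes one Name=Value pair
       per line and lets a later duplicate overwrite an earlier one.

```python
def parse_nonstop_conf(text):
    """Parse nonstop.conf-style text into a dict keyed by lower-cased name."""
    params = {}
    for raw in text.splitlines():
        # Any text following "#" is a comment through the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected Name=Value, got: {raw!r}")
        # Parameter names are case insensitive, so normalize them.
        params[name.strip().lower()] = value.strip()
    return params


def parse_hot_spare_count(value):
    """Turn 'batch:6,interactive:0' into {'batch': 6, 'interactive': 0}."""
    counts = {}
    for pair in value.split(","):
        partition, _, count = pair.strip().partition(":")
        counts[partition] = int(count)
    return counts


# Hypothetical input used only to exercise the two helpers.
sample = """
# Sample nonstop.conf fragment
ControlAddr=12.34.56.78
port=1234   # trailing comment
HotSpareCount=batch:6,interactive:0
"""

conf = parse_nonstop_conf(sample)
spares = parse_hot_spare_count(conf["hotsparecount"])
print(conf["port"], spares)
```

       Normalizing names to lower case lets "Port" and "port" address the
       same setting, which matches the case-insensitivity rule above.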

COPYING
       Copyright (C) 2013-2014 SchedMD LLC. All	rights reserved.

       Slurm  is  distributed  in the hope that	it will	be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of  MERCHANTABILITY  or
       FITNESS	FOR  A PARTICULAR PURPOSE.  See	the GNU	General	Public License
       for more	details.

SEE ALSO
       slurm.conf(5)

April 2015		   Slurm Configuration File	       nonstop.conf(5)
