gres.conf(5)		   Slurm Configuration File		  gres.conf(5)

NAME
       gres.conf  -  Slurm configuration file for Generic RESource (GRES) man-
       agement.

DESCRIPTION
       gres.conf is an ASCII file which	describes the configuration of Generic
       RESource	 (GRES)	 on each compute node.	If the GRES information	in the
       slurm.conf file	does  not  fully  describe  those  resources,  then  a
       gres.conf file should be	included on each compute node.	The file loca-
       tion can	be modified at system build time using the  DEFAULT_SLURM_CONF
       parameter  or  at  execution time by setting the	SLURM_CONF environment
       variable. The file will always be located in the	same directory as  the
       slurm.conf file.

       If  the	GRES  information in the slurm.conf file fully describes those
       resources (i.e. no "Cores", "File" or "Links" specification is required
       for that	GRES type or that information is automatically detected), that
       information may be omitted from the gres.conf file and only the config-
       uration information in the slurm.conf file will be used.	 The gres.conf
       file may	be omitted completely if the configuration information in  the
       slurm.conf file fully describes all GRES.

       If  using  the  gres.conf  file	to describe the	resources available to
       nodes, the first	parameter on the line should be	NodeName. If configur-
       ing  Generic Resources without specifying nodes,	the first parameter on
       the line	should be Name.

       Parameter names are case	insensitive.  Any text following a "#" in  the
       configuration  file  is	treated	 as  a comment through the end of that
       line.  Changes to the configuration file	take effect  upon  restart  of
       Slurm daemons, daemon receipt of	the SIGHUP signal, or execution	of the
       command "scontrol reconfigure" unless otherwise noted.

       NOTE:  Slurm  support  for  gres/mps  requires  the  use	 of  the   se-
       lect/cons_tres  plugin.	For  more information on how to	configure MPS,
       see https://slurm.schedmd.com/gres.html#MPS_Management.

       For   more   information	  on   GRES   scheduling   in	general,   see
       https://slurm.schedmd.com/gres.html.

       The overall configuration parameters available include:

       AutoDetect
	      The hardware detection mechanisms to enable for automatic GRES
	      configuration.  This should be on a line by itself.  Currently,
	      the options are:

	      nvml   Used to automatically detect NVIDIA GPUs

	      rsmi   Used to automatically detect AMD GPUs
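
	      For example, a minimal sketch of a gres.conf relying entirely on
	      automatic detection (assuming the NVML library is installed on
	      the node) would contain only the line:

	      AutoDetect=nvml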

       Count  Number  of  resources  of	this type available on this node.  The
	      default value is set to the number of File values	specified  (if
	      any),  otherwise the default value is one. A suffix of "K", "M",
	      "G", "T" or "P" may be used to  multiply	the  number  by	 1024,
	      1048576,	  1073741824,	etc.   respectively.	For   example:
	      "Count=10G".

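	      For example, a count-only resource with no device files might be
	      configured as below; the name "bandwidth" is purely illustrative
	      and would also have to be listed in GresTypes in slurm.conf:

	      # Hypothetical countable resource; "4G" expands to 4*1073741824
	      Name=bandwidth Count=4G
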
       Cores  Optionally specify the core index numbers for the specific cores
	      which can use this resource.  For example, it may be strongly
	      preferable to use specific cores with specific GRES devices
	      (e.g. on a NUMA architecture).  While Slurm can track and assign
	      resources at the CPU or thread level, the scheduling algorithm
	      used to co-allocate GRES devices with CPUs operates at the
	      socket or NUMA level.  Therefore it is not possible to
	      preferentially assign GRES to different specific CPUs within the
	      same NUMA node or socket, and this option should be used to
	      identify all of the cores on some socket.

	      Multiple cores may be specified using a comma-delimited list, or
	      a range may be specified using a "-" separator (e.g. "0,1,2,3"
	      or "0-3").  If a job specifies --gres-flags=enforce-binding,
	      then only the identified cores can be allocated with each
	      generic resource.  This will tend to improve performance of
	      jobs, but delay the allocation of resources to them.  If Cores
	      is specified and a job is not submitted with the
	      --gres-flags=enforce-binding option, the identified cores will
	      be preferred for scheduling with each generic resource.

	      If  --gres-flags=disable-binding is specified, then any core can
	      be used with the resources, which	also increases	the  speed  of
	      Slurm's  scheduling  algorithm  but  can degrade the application
	      performance.  The	--gres-flags=disable-binding  option  is  cur-
	      rently  required to use more CPUs	than are bound to a GRES (i.e.
	      if a GPU is bound	to the CPUs on one socket,  but	 resources  on
	      more  than one socket are	required to run	the job).  If any core
	      can be effectively used with the resources, then do not  specify
	      the  cores  option  for  improved	 speed in the Slurm scheduling
	      logic.  A	restart	of the slurmctld is needed for changes to  the
	      Cores option to take effect.

	      NOTE: Since Slurm	must be	able to	perform	resource management on
	      heterogeneous clusters having various processing unit  numbering
	      schemes,	a  logical core	index must be specified	instead	of the
	      physical core index.  That logical core index might  not	corre-
	      spond  to	 your  physical	core index number.  Core 0 will	be the
	      first core on the	first socket, while core 1 will	be the	second
	      core  on	the  first  socket.  This numbering coincides with the
	      logical core number (Core	L#) seen in "lstopo -l"	 command  out-
	      put.
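
	      As a sketch, a node with one GPU attached to each of two sockets
	      might pair each device with the logical cores of its local
	      socket (the device paths and core ranges below are illustrative;
	      verify the logical numbering with "lstopo -l"):

	      Name=gpu Type=tesla File=/dev/nvidia0 Cores=0-7
	      Name=gpu Type=tesla File=/dev/nvidia1 Cores=8-15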

       File   Fully  qualified	pathname of the	device files associated	with a
	      resource.	 The name can include a	numeric	range suffix to	be in-
	      terpreted	by Slurm (e.g. File=/dev/nvidia[0-3]).

	      This  field  is generally	required if enforcement	of generic re-
	      source allocations is to be supported (i.e. prevents users  from
	      making  use  of  resources  allocated to a different user).  En-
	      forcement	of the	file  allocation  relies  upon	Linux  Control
	      Groups  (cgroups)	 and  Slurm's  task/cgroup  plugin, which will
	      place the	allocated files	into the job's cgroup and prevent  use
	      of  other	 files.	 Please	see Slurm's Cgroups Guide for more in-
	      formation: https://slurm.schedmd.com/cgroups.html.

	      If File is specified then	Count must be either set to the	number
	      of  file	names  specified  or not set (the default value	is the
	      number of	files specified).  The exception to this is  MPS.  For
	      MPS,  each GPU would be identified by device file	using the File
	      parameter	and Count would	specify	the number of MPS entries that
	      would  correspond	to that	GPU (typically 100 or some multiple of
	      100).

	      NOTE: If you specify the File parameter for a resource  on  some
	      node,  the  option must be specified on all nodes	and Slurm will
	      track the	assignment of each specific  resource  on  each	 node.
	      Otherwise	 Slurm	will only track	a count	of allocated resources
	      rather than the state of each individual device file.

	      NOTE: Drain a node before	changing the  count  of	 records  with
	      File  parameters	(i.e. if you want to add or remove GPUs	from a
	      node's configuration).  Failure to do so will result in any  job
	      using those GRES being aborted.
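
	      For example, the following sketch uses a numeric range suffix to
	      describe four GPUs on one line; Count defaults to the number of
	      expanded file names and need not be set:

	      # Expands to /dev/nvidia0 through /dev/nvidia3, so Count=4
	      Name=gpu File=/dev/nvidia[0-3]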

       Flags  Optional flags that can be specified to change configured	behav-
	      ior of the GRES.

	      Allowed values at	present	are:

	      CountOnly           Do not attempt to load a plugin, as this
	                          GRES will only be used to track counts of
	                          GRES used.  This avoids attempting to load a
	                          non-existent plugin, which can be costly on
	                          filesystems with high-latency metadata
	                          operations for non-existent files.
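
	      For example, a resource that is tracked purely by count and for
	      which no plugin should be loaded might be configured as below
	      (the name "scratch" is purely illustrative):

	      Name=scratch Count=200 Flags=CountOnly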

       Links  A	comma-delimited	list of	numbers	identifying the	number of con-
	      nections	between	 this  device  and  other  devices  to	 allow
	      coscheduling  of	better	connected devices.  This is an ordered
	      list in which the	number of connections this specific device has
	      to device	number 0 would be in the first position, the number of
	      connections it has to device number 1 in	the  second  position,
	      etc.  A -1 indicates the device itself and a 0 indicates no con-
	      nection.	If specified, then this	line can only contain a	single
	      GRES device (i.e.	can only contain a single file via File).

	      This  is	an  optional value and is usually automatically	deter-
	      mined if AutoDetect is enabled.  A typical use case would	be  to
	      identify	GPUs  having NVLink connectivity.  Note	that for GPUs,
	      the minor	number assigned	by the OS and used in the device  file
	      (i.e.  the X in /dev/nvidiaX) is not necessarily the same	as the
	      device number/index.  The device number is created by sorting
	      the GPUs by PCI bus ID and then numbering them starting from the
	      smallest bus ID.  See
	      https://slurm.schedmd.com/gres.html#GPU_Management.
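
	      As a sketch, a hypothetical four-GPU node on which device pairs
	      0/1 and 2/3 are linked might be described as below; the Links
	      values are illustrative and would normally be filled in
	      automatically when AutoDetect is enabled:

	      # -1 marks the device itself, 0 indicates no connection
	      Name=gpu File=/dev/nvidia0 Links=-1,2,0,0
	      Name=gpu File=/dev/nvidia1 Links=2,-1,0,0
	      Name=gpu File=/dev/nvidia2 Links=0,0,-1,2
	      Name=gpu File=/dev/nvidia3 Links=0,0,2,-1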

       Name   Name of the generic resource. Any	desired	name may be used.  The
	      name must	match  a  value	 in  GresTypes	in  slurm.conf.	  Each
	      generic  resource	 has  an optional plugin which can provide re-
	      source-specific functionality.  Generic resources	that currently
	      include an optional plugin are:

	      gpu    Graphics Processing Unit

	      mps    CUDA Multi-Process	Service	(MPS)

	      nic    Network Interface Card

	      mic    Intel Many	Integrated Core	(MIC) processor
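
	      For example, a GRES named "gpu" must also appear in GresTypes in
	      slurm.conf.  A minimal sketch pairing the two files (node names,
	      counts and device paths are illustrative):

	      # slurm.conf (excerpt)
	      GresTypes=gpu
	      NodeName=tux[0-15] Gres=gpu:4

	      # gres.conf on each of those nodes
	      Name=gpu File=/dev/nvidia[0-3]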

       NodeName
	      An  optional  NodeName  specification  can be used to permit one
	      gres.conf	file to	be used	for all	compute	nodes in a cluster  by
	      specifying  the  node(s)	that  each  line should	apply to.  The
	      NodeName specification can use a Slurm hostlist specification as
	      shown in the example below.

       Type   An  optional  arbitrary  string  identifying the type of device.
	      For example, this	might be used to identify a specific model  of
	      GPU,  which users	can then specify in a job request.  If Type is
	      specified, then Count is limited in size (currently 1024).

EXAMPLES
       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define	GPU devices with MPS support
       ##################################################################
       AutoDetect=nvml
       Name=gpu	Type=gtx560 File=/dev/nvidia0 COREs=0,1
       Name=gpu	Type=tesla  File=/dev/nvidia1 COREs=2,3
       Name=mps	Count=100 File=/dev/nvidia0 COREs=0,1
       Name=mps	Count=100  File=/dev/nvidia1 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Overwrite system defaults and explicitly configure three GPUs
       ##################################################################
       Name=gpu	Type=tesla File=/dev/nvidia[0-1] COREs=0,1
       # Name=gpu Type=tesla  File=/dev/nvidia[2-3] COREs=2,3
       # NOTE: nvidia2 device is out of	service
       Name=gpu	Type=tesla  File=/dev/nvidia3 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use a single gres.conf	file for all compute nodes - positive method
       ##################################################################
       ## Explicitly specify devices on	nodes tux0-tux15
       # NodeName=tux[0-15]  Name=gpu File=/dev/nvidia[0-3]
       # NOTE: tux3 nvidia1 device is out of service
       NodeName=tux[0-2]  Name=gpu File=/dev/nvidia[0-3]
       NodeName=tux3  Name=gpu File=/dev/nvidia[0,2-3]
       NodeName=tux[4-15]  Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use NVML to gather GPU	configuration information
       # Information about all other GRES gathered from	slurm.conf
       ##################################################################
       AutoDetect=nvml

COPYING
       Copyright (C) 2010 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf. DISCLAIMER).
       Copyright (C) 2010-2019 SchedMD LLC.

       This  file  is  part  of	Slurm, a resource management program.  For de-
       tails, see <https://slurm.schedmd.com/>.

       Slurm is	free software; you can redistribute it and/or modify it	 under
       the  terms  of  the GNU General Public License as published by the Free
       Software	Foundation; either version 2 of	the License, or	(at  your  op-
       tion) any later version.

       Slurm  is  distributed  in the hope that	it will	be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of  MERCHANTABILITY  or
       FITNESS	FOR  A PARTICULAR PURPOSE.  See	the GNU	General	Public License
       for more	details.

SEE ALSO
       slurm.conf(5)

October	2020		   Slurm Configuration File		  gres.conf(5)
