MPIRUN(1)			   Open	MPI			     MPIRUN(1)

       orterun, mpirun, mpiexec - Execute serial and parallel jobs in Open
       MPI.  oshrun, shmemrun - Execute serial and parallel jobs in Open
       SHMEM.

       Note: mpirun, mpiexec, and orterun are all synonyms for each other, as
       are oshrun and shmemrun when Open SHMEM is installed.  Using any of
       the names will produce the same behavior.

       Single Process Multiple Data (SPMD) Model:

       mpirun [	options	] <program> [ <args> ]

       Multiple	Instruction Multiple Data (MIMD) Model:

       mpirun [	global_options ]
	      [	local_options1 ] <program1> [ <args1> ]	:
	      [	local_options2 ] <program2> [ <args2> ]	:
	      ... :
	      [	local_optionsN ] <programN> [ <argsN> ]

       Note  that in both models, invoking mpirun via an absolute path name is
       equivalent to specifying	the --prefix option with a _dir_ value equiva-
       lent  to	 the  directory	where mpirun resides, minus its	last subdirec-
       tory.  For example:

	   % /usr/local/bin/mpirun ...

       is equivalent to

	   % mpirun --prefix /usr/local
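
       The rule above (the directory where mpirun resides, minus its last
       subdirectory) can be sketched in Python.  This is only an
       illustration: the helper name implied_prefix is hypothetical and is
       not part of Open MPI, and POSIX-style paths are assumed.

```python
import posixpath


def implied_prefix(mpirun_path):
    """Return the --prefix value implied by an absolute mpirun path.

    Hypothetical helper for illustration only. Returns None when the
    path is not absolute, since the rule applies to absolute paths.
    """
    if not posixpath.isabs(mpirun_path):
        return None
    bin_dir = posixpath.dirname(mpirun_path)  # e.g. /usr/local/bin
    return posixpath.dirname(bin_dir)         # drop the last subdirectory


print(implied_prefix("/usr/local/bin/mpirun"))  # /usr/local
```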

       If you are simply looking for how to run	an MPI application, you	proba-
       bly want	to use a command line of the following form:

	   % mpirun [ -np X ] [	--hostfile <filename> ]	 <program>

       This will run X copies of _program_ in your current run-time
       environment, scheduling (by default) in a round-robin fashion by CPU
       slot.  If running under a supported resource manager, Open MPI's
       mpirun will usually automatically use the corresponding resource
       manager process starter rather than, for example, rsh or ssh, which
       require the use of a hostfile or will default to running all X copies
       on the localhost.  See the rest of this page for more details.

       Please  note  that mpirun automatically binds processes as of the start
       of the v1.8 series. Three binding patterns are used in the  absence  of
       any further directives:

       Bind to core:	 when the number of processes is <= 2

       Bind to socket:	 when the number of processes is > 2

       Bind to none:	 when oversubscribed
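
       The three default patterns listed above can be written as a small
       decision function.  This is a sketch of the rule as stated here, not
       Open MPI's implementation; the function name default_binding and its
       oversubscribed flag are hypothetical.

```python
def default_binding(num_procs, oversubscribed=False):
    """Return the default binding target described in the text.

    Hypothetical helper: oversubscription wins, otherwise the process
    count decides between core and socket.
    """
    if oversubscribed:
        return "none"
    if num_procs <= 2:
        return "core"
    return "socket"


for n in (1, 2, 3, 16):
    print(n, default_binding(n))
```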

       If your application uses	threads, then you probably want	to ensure that
       you are either not bound	at all	(by  specifying	 --bind-to  none),  or
       bound  to multiple cores	using an appropriate binding level or specific
       number of processing elements per application process.

       The term	"slot" is used extensively in the rest of this manual page.  A
       slot  is	 an  allocation	 unit for a process.  The number of slots on a
       node indicates how many processes can potentially execute on that node.
       By default, Open	MPI will allow one process per slot.

       If  Open	 MPI  is not explicitly	told how many slots are	available on a
       node (e.g., if a	hostfile is used and the number	of slots is not	speci-
       fied for	a given	node), it will determine a maximum number of slots for
       that node in one	of two ways:

       1. Default behavior
	  By default, Open MPI will attempt to discover the number of
	  processor cores on the node, and use that as the number of slots
	  available.

       2. When --use-hwthread-cpus is used
	  If --use-hwthread-cpus is specified on the mpirun command line, then
	  Open	MPI will attempt to discover the number	of hardware threads on
	  the node, and	use that as the	number of slots	available.

       This default behavior also occurs when specifying the -host option with
       a single	host.  Thus, the command:

       mpirun --host node1 ./a.out
	   launches a number of	processes equal	to the number of cores on node
	   node1, whereas:

       mpirun --host node1 --use-hwthread-cpus ./a.out
	   launches a number of	processes equal	 to  the  number  of  hardware
	   threads on node1.
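
       The slot-count rules above can be summarized in a short Python
       sketch.  The function default_slot_count and its parameters are
       hypothetical names used only for illustration; the core and hardware
       thread counts are passed in rather than discovered.

```python
def default_slot_count(cores, hwthreads, use_hwthread_cpus=False,
                       declared_slots=None):
    """Return the number of slots assumed for one node.

    Hypothetical helper. An explicitly declared slot count (for example
    from a hostfile) is used as-is; otherwise the count falls back to
    cores, or to hardware threads when use_hwthread_cpus is set.
    """
    if declared_slots is not None:
        return declared_slots
    return hwthreads if use_hwthread_cpus else cores


# A node with 20 cores and 2 hardware threads per core (assumed values).
print(default_slot_count(cores=20, hwthreads=40))
print(default_slot_count(cores=20, hwthreads=40, use_hwthread_cpus=True))
```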

       When  Open  MPI applications are	invoked	in an environment managed by a
       resource	manager	(e.g., inside of a SLURM job), and Open	MPI was	 built
       with  appropriate support for that resource manager, then Open MPI will
       be informed of the number of slots for each node	by the	resource  man-
       ager.  For example:

       mpirun ./a.out
	   launches  one process for every slot	(on every node)	as dictated by
	   the resource	manager	job specification.

       Also note that the one-process-per-slot restriction can	be  overridden
       in  unmanaged  environments  (e.g.,  when using hostfiles without a re-
       source manager) if oversubscription is enabled (by default, it is  dis-
       abled).	 Most  MPI  applications  and HPC environments do not oversub-
       scribe; for simplicity, the majority of this documentation assumes that
       oversubscription	is not enabled.

   Slots are not hardware resources
       Slots are frequently incorrectly	conflated with hardware	resources.  It
       is important to realize that slots are  an  entirely  different	metric
       than the	number (and type) of hardware resources	available.

       Here are	some examples that may help illustrate the difference:

       1. More processor cores than slots

	  Consider a resource manager job environment that tells Open MPI that
	  there	is a single node with 20 processor cores and  2	 slots	avail-
	  able.	 By default, Open MPI will only	let you	run up to 2 processes.

	  Meaning: you run out of slots long before you run out of processor
	  cores.

       2. More slots than processor cores

	  Consider a hostfile with a single  node  listed  with	 a  "slots=50"
	  qualification.   The	node has 20 processor cores.  By default, Open
	  MPI will let you run up to 50	processes.

	  Meaning: you can run many more processes than you have processor
	  cores.

       By default, Open	MPI defines that a "processing element"	is a processor
       core.  However, if --use-hwthread-cpus is specified on the mpirun  com-
       mand line, then a "processing element" is a hardware thread.

       mpirun  will send the name of the directory where it was	invoked	on the
       local node to each of the remote	nodes, and attempt to change  to  that
       directory.   See	the "Current Working Directory"	section	below for fur-
       ther details.

       <program> The program executable. This is identified as the first  non-
		 recognized argument to	mpirun.

       <args>	 Pass  these  run-time	arguments to every new process.	 These
		 must always be	the last arguments to mpirun. If an  app  con-
		 text file is used, _args_ will	be ignored.

       -h, --help
		 Display help for this command

       -q, --quiet
		 Suppress informative messages from orterun during application
		 execution.

       -v, --verbose
		 Be verbose

       -V, --version
		 Print version number.	If no other arguments are given,  this
		 will also cause orterun to exit.

       -N <num>
		 Launch	num processes per node on all allocated	nodes (synonym
		 for npernode).

       -display-map, --display-map
		 Display a table showing the mapped location of	 each  process
		 prior to launch.

       -display-allocation, --display-allocation
		 Display the detected resource allocation.

       -output-proctable, --output-proctable
		 Output	the debugger proctable after launch.

       -dvm, --dvm
		 Create	a persistent distributed virtual machine (DVM).

       -max-vm-size, --max-vm-size <size>
		 Number	of processes to	run.

       -novm, --novm
		 Execute without creating an allocation-spanning virtual
		 machine (only start daemons on nodes hosting application
		 processes).

       -hnp, --hnp <arg0>
		 Specify the URI of the Head Node Process (HNP), or the name
		 of the file (specified as file:filename) that contains that
		 info.

       Use  one	of the following options to specify which hosts	(nodes)	of the
       cluster to run on. Note that as of  the	start  of  the	v1.8  release,
       mpirun  will launch a daemon onto each host in the allocation (as modi-
       fied by the following options) at the very beginning of execution,  re-
       gardless	 of  whether  or  not application processes will eventually be
       mapped to execute there.	This is	done to	allow collection  of  hardware
       topology	 information  from  the	 remote	nodes, thus allowing us	to map
       processes against known topology. However, it is	a change from the  be-
       havior in prior releases	where daemons were only	launched after mapping
       was complete, and thus only occurred on nodes  where  application  pro-
       cesses would actually be	executing.

       -H, -host, --host <host1,host2,...,hostN>
	      List of hosts on which to	invoke processes.

       -hostfile, --hostfile <hostfile>
	      Provide a	hostfile to use.

       -default-hostfile, --default-hostfile <hostfile>
	      Provide a	default	hostfile.

       -machinefile, --machinefile <machinefile>
	      Synonym for -hostfile.

       -cpu-set, --cpu-set <list>
	      Restrict	launched  processes  to	 the specified logical cpus on
	      each node	(comma-separated list).	Note that the binding  options
	      will  still  apply within	the specified envelope - e.g., you can
	      elect to bind each process to only one cpu within	the  specified
	      cpu set.

       The  following  options specify the number of processes to launch. Note
       that none of the	options	imply a	particular binding policy - e.g.,  re-
       questing	 N processes for each socket does not imply that the processes
       will be bound to	the socket.

       -c, -n, --n, -np	<#>
	      Run this many copies of the program on the  given	 nodes.	  This
	      option  indicates	 that the specified file is an executable pro-
	      gram and not an application context. If no value is provided for
	      the number of copies to execute (i.e., neither the "-np" nor its
	      synonyms are provided on the command line), Open MPI will	 auto-
	      matically	 execute  a  copy  of the program on each process slot
	      (see below for description of a "process slot").	This  feature,
	      however,	can  only be used in the SPMD model and	will return an
	      error (without beginning execution of the application)
	      otherwise.

       --map-by ppr:N:<object>
	      Launch  N	 times	the number of objects of the specified type on
	      each node.

       -npersocket, --npersocket <#persocket>
	      On each node, launch this	many processes	times  the  number  of
	      processor	 sockets  on  the  node.   The -npersocket option also
	      turns on the -bind-to-socket option.  (deprecated	 in  favor  of
	      --map-by ppr:n:socket)

       -npernode, --npernode <#pernode>
	      On  each node, launch this many processes.  (deprecated in favor
	      of --map-by ppr:n:node)

       -pernode, --pernode
	      On each node, launch one process -- equivalent to	 -npernode  1.
	      (deprecated in favor of --map-by ppr:1:node)

       To map processes:

       --map-by	<foo>
	      Map  to  the specified object, defaults to socket. Supported op-
	      tions include slot, hwthread, core, L1cache,  L2cache,  L3cache,
	      socket,  numa,  board,  node, sequential,	distance, and ppr. Any
	      object can include modifiers by adding a : and  any  combination
	      of  PE=n	(bind  n processing elements to	each proc), SPAN (load
	      balance the processes across the allocation), OVERSUBSCRIBE (al-
	      low  more	 processes  on	a  node	than processing	elements), and
	      NOOVERSUBSCRIBE.	This includes PPR, where the pattern would  be
	      terminated by another colon to separate it from the modifiers.

       -bycore,	--bycore
	      Map processes by core (deprecated	in favor of --map-by core)

       -byslot,	--byslot
	      Map and rank processes round-robin by slot.

       -nolocal, --nolocal
	      Do  not  run  any	copies of the launched application on the same
	      node as orterun is running.  This	option will  override  listing
	      the localhost with --host or any other host-specifying
	      mechanism.

       -nooversubscribe, --nooversubscribe
	      Do not oversubscribe any nodes; error (without starting any pro-
	      cesses)  if  the requested number	of processes would cause over-
	      subscription.  This option implicitly sets "max_slots" equal  to
	      the "slots" value	for each node. (Enabled	by default).

       -oversubscribe, --oversubscribe
	      Allow nodes to be oversubscribed, even on a managed system, and
	      allow overloading of processing elements.

       -bynode,	--bynode
	      Launch processes one per node, cycling by	node in	a  round-robin
	      fashion.	 This spreads processes	evenly among nodes and assigns
	      MPI_COMM_WORLD ranks in a	round-robin, "by node" manner.

       -cpu-list, --cpu-list <cpus>
	      Comma-delimited list of processor	IDs to which to	bind processes
	      [default=NULL].	Processor IDs are interpreted as hwloc logical
	      core IDs.	 Run the hwloc lstopo(1) command  to  see  a  list  of
	      available	cores and their	logical	IDs.

       To order	processes' ranks in MPI_COMM_WORLD:

       --rank-by <foo>
	      Rank  in	round-robin fashion according to the specified object,
	      defaults to slot.	 Supported  options  include  slot,  hwthread,
	      core, L1cache, L2cache, L3cache, socket, numa, board, and	node.

       For process binding:

       --bind-to <foo>
	      Bind  processes  to the specified	object,	defaults to core. Sup-
	      ported options include slot, hwthread, core,  l1cache,  l2cache,
	      l3cache, socket, numa, board, cpu-list, and none.

       -cpus-per-proc, --cpus-per-proc <#perproc>
	      Bind  each process to the	specified number of cpus.  (deprecated
	      in favor of --map-by <obj>:PE=n)

       -cpus-per-rank, --cpus-per-rank <#perrank>
	      Alias for -cpus-per-proc.  (deprecated in favor of --map-by
	      <obj>:PE=n)

       -bind-to-core, --bind-to-core
	      Bind processes to	cores (deprecated in favor of --bind-to	core)

       -bind-to-socket,	--bind-to-socket
	      Bind  processes  to  processor  sockets  (deprecated in favor of
	      --bind-to	socket)

       -report-bindings, --report-bindings
	      Report any bindings for launched processes.

       For rankfiles:

       -rf, --rankfile <rankfile>
	      Provide a rankfile.

       To manage standard I/O:

       -output-filename, --output-filename <filename>
	      Redirect the stdout, stderr, and stddiag of all processes	 to  a
	      process-unique version of the specified filename. Any directories
	      in the filename will automatically be created.  Each output file
	      name will consist of filename.id, where the id will be the
	      process's rank in MPI_COMM_WORLD, left-filled with zeros for
	      correct ordering in listings. A relative path value will be
	      converted to an absolute path based on the cwd where mpirun is
	      executed. Note that this will not work in environments where the
	      file system on compute nodes differs from that where mpirun is
	      executed.
       -stdin, --stdin <rank>
	      The MPI_COMM_WORLD rank of the process that is to	receive	stdin.
	      The default is to	forward	stdin to MPI_COMM_WORLD	 rank  0,  but
	      this  option  can	be used	to forward stdin to any	process. It is
	      also acceptable to specify none, indicating  that	 no  processes
	      are to receive stdin.

       -merge-stderr-to-stdout,	--merge-stderr-to-stdout
	      Merge stderr to stdout for each process.

       -tag-output, --tag-output
	      Tag each line of output to stdout, stderr, and stddiag with
	      [jobid, MCW_rank]<stdxxx> indicating the process jobid and
	      MPI_COMM_WORLD rank of the process that generated the output,
	      and the channel which generated it.

       -timestamp-output, --timestamp-output
	      Timestamp	each line of output to stdout, stderr, and stddiag.

       -xml, --xml
	      Provide all output to stdout, stderr, and stddiag in an xml
	      format.

       -xml-file, --xml-file <filename>
	      Provide all output in XML	format to the specified	file.

       -xterm, --xterm <ranks>
	      Display  the  output  from  the  processes  identified  by their
	      MPI_COMM_WORLD ranks in separate xterm windows.  The  ranks  are
	      specified	 as  a comma-separated list of ranges, with a -1 indi-
	      cating all. A separate window will be created for	each specified
	      process.	 Note:	xterm  will normally terminate the window upon
	      termination of the process running within	it. However, by	adding
	      a	 "!" to	the end	of the list of specified ranks,	the proper op-
	      tions will be provided to	ensure that  xterm  keeps  the	window
	      open  after the process terminates, thus allowing	you to see the
	      process' output.	Each xterm window will subsequently need to be
	      manually	closed.	 Note: In some environments, xterm may require
	      that the executable be in	the user's path, or  be	 specified  in
	      absolute or relative terms. Thus,	it may be necessary to specify
	      a	local executable as "./foo" instead of just  "foo".  If	 xterm
	      fails  to	 find  the executable, mpirun will hang, but still re-
	      spond correctly to a ctrl-c.  If this happens, please check that
	      the executable is	being specified	correctly and try again.

       To manage files and runtime environment:

       -path, --path <path>
	      <path> that will be used when attempting to locate the requested
	      executables.  This is used prior to using the local PATH
	      setting.

       --prefix	<dir>
	      Prefix directory that will be used to set the PATH and
	      LD_LIBRARY_PATH on the remote node before invoking Open MPI or
	      the target process.  See the "Remote Execution" section, below.

	      Disable the automatic --prefix behavior

       -s, --preload-binary
	      Copy  the	 specified  executable(s)  to remote machines prior to
	      starting remote processes. The executables will be copied	to the
	      Open  MPI	 session directory and will be deleted upon completion
	      of the job.

       --preload-files <files>
	      Preload the comma	separated list of files	to the current working
	      directory	 of  the  remote  machines  where  processes  will  be
	      launched prior to	starting those processes.

       -set-cwd-to-session-dir,	--set-cwd-to-session-dir
	      Set the working directory	of the started processes to their ses-
	      sion directory.

       -wd <dir>
	      Synonym for -wdir.

       -wdir <dir>
	      Change  to  the  directory  <dir>	before the user's program exe-
	      cutes.  See the "Current Working Directory" section for notes on
	      relative	paths.	 Note: If the -wdir option appears both	on the
	      command line and in an application  context,  the	 context  will
	      take  precedence over the	command	line. Thus, if the path	to the
	      desired wdir is different	on the backend nodes, then it must  be
	      specified as an absolute path that is correct for the backend
	      node.

       -x <env>
	      Export the specified environment variables to the	 remote	 nodes
	      before executing the program.  Only one environment variable can
	      be specified per -x option.  Existing environment	variables  can
	      be  specified or new variable names specified with corresponding
	      values.  For example:
		  % mpirun -x DISPLAY -x OFILE=/tmp/out	...

	      The parser for the -x option is not very sophisticated; it  does
	      not  even	 understand  quoted  values.  Users are	advised	to set
	      variables	in the environment, and	then use -x to export (not de-
	      fine) them.
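
	      The two accepted forms of the -x argument (an existing variable
	      name, or NAME=value) can be illustrated with a short Python
	      sketch.  It is not Open MPI's parser; the function name
	      resolve_x_option is hypothetical.  Like the parser described
	      above, it takes the value verbatim and does not interpret
	      quotes.

```python
def resolve_x_option(arg, environ):
    """Return (name, value) for one -x argument.

    Hypothetical helper. 'NAME=value' defines the variable directly;
    a bare 'NAME' exports the value found in environ (None if unset).
    """
    name, sep, value = arg.partition("=")
    if sep:
        return name, value           # value is used verbatim, quotes and all
    return name, environ.get(name)   # export the existing variable


env = {"DISPLAY": ":0"}
for arg in ("DISPLAY", "OFILE=/tmp/out"):
    print(resolve_x_option(arg, env))
```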

       Setting MCA parameters:

       -gmca, --gmca <key> <value>
	      Pass  global MCA parameters that are applicable to all contexts.
	      _key_ is the parameter name; _value_ is the parameter value.

       -mca, --mca <key> <value>
	      Send arguments to various MCA modules.  See the "MCA" section,
	      below.

       -am <arg0>
	      Aggregate	MCA parameter set file list.

       -tune, --tune <tune_file>
	      Specify a	tune file to set arguments for various MCA modules and
	      environment variables.  See the "Setting MCA parameters and  en-
	      vironment	variables from file" section, below.

       For debugging:

       -debug, --debug
	      Invoke	the    user-level    debugger	 indicated    by   the
	      orte_base_user_debugger MCA parameter.

       --get-stack-traces
	      When paired with the --timeout option, mpirun will obtain and
	      print  out  stack	 traces	 from  all launched processes that are
	      still alive when the timeout expires.  Note that obtaining stack
	      traces can take a	little time and	produce	a lot of output, espe-
	      cially for large process-count jobs.

       -debugger, --debugger <args>
	      Sequence of debuggers to search for when --debug is  used	 (i.e.
	      a	synonym	for orte_base_user_debugger MCA	parameter).

       --timeout <seconds>
	      The  maximum  number  of	seconds	 that  mpirun  (also  known as
	      mpiexec, oshrun, orterun,	etc.)  will run.  After	this many sec-
	      onds,  mpirun  will  abort the launched job and exit with	a non-
	      zero exit status.  Using --timeout can also be useful when
	      combined with the --get-stack-traces option.

       -tv, --tv
	      Launch processes under the TotalView debugger.  Deprecated back-
	      wards compatibility flag.	Synonym	for --debug.

       There are also other options:

	      Allow mpirun to run when executed	by the root user  (mpirun  de-
	      faults  to aborting when launched	as the root user).  Be sure to
	      see the Running as root section, below, for more detail.

       --app <appfile>
	      Provide an appfile, ignoring all other command line options.

       -cf, --cartofile	<cartofile>
	      Provide a	cartography file.

       -continuous, --continuous
	      Job is to	run until explicitly terminated.

       -disable-recovery, --disable-recovery
	      Disable recovery (resets all recovery options to off).

       -do-not-launch, --do-not-launch
	      Perform all necessary operations to prepare to launch the	appli-
	      cation, but do not actually launch it.

       -do-not-resolve,	--do-not-resolve
	      Do not attempt to	resolve	interfaces.

       -enable-recovery, --enable-recovery
	      Enable recovery from process failure [Default = disabled].

       -index-argv-by-rank, --index-argv-by-rank
	      Uniquely index argv[0] for each process using its	rank.

       -leave-session-attached,	--leave-session-attached
	      Do not detach OmpiRTE daemons used by this application. This al-
	      lows error messages from the daemons as well as  the  underlying
	      environment (e.g., when failing to launch a daemon) to be
	      output.

       -max-restarts, --max-restarts <num>
	      Max number of times to restart a failed process.

       -ompi-server, --ompi-server <uri	or file>
	      Specify the URI of the Open MPI server (or the mpirun to be used
	      as  the  server),	 the name of the file (specified as file:file-
	      name) that contains that info, or	the PID	(specified  as	pid:#)
	      of  the mpirun to	be used	as the server.	The Open MPI server is
	      used to support multi-application	data exchange  via  the	 MPI-2
	      MPI_Publish_name and MPI_Lookup_name functions.

       -personality, --personality <list>
	      Comma-separated  list  of	programming model, languages, and con-
	      tainers being used (default="ompi").

       --ppr <list>
	      Comma-separated list of number of	processes on a given  resource
	      type [default: none].

       -report-child-jobs-separately, --report-child-jobs-separately
	      Return the exit status of	the primary job	only.

       -report-events, --report-events <URI>
	      Report events to a tool listening	at the specified URI.

       -report-pid, --report-pid <channel>
	      Print  out  mpirun's PID during startup. The channel must	be ei-
	      ther a '-' to indicate that the pid is to	be output to stdout, a
	      '+'  to  indicate	 that  the pid is to be	output to stderr, or a
	      filename to which	the pid	is to be written.

       -report-uri, --report-uri <channel>
	      Print out	mpirun's URI during startup. The channel must  be  ei-
	      ther a '-' to indicate that the URI is to	be output to stdout, a
	      '+' to indicate that the URI is to be output  to	stderr,	 or  a
	      filename to which	the URI	is to be written.

       -show-progress, --show-progress
	      Output a brief periodic report on	launch progress.

       -terminate, --terminate
	      Terminate	the DVM.

       -use-hwthread-cpus, --use-hwthread-cpus
	      Use hardware threads as independent CPUs.

	      Note  that  if  a	 number	 of  slots is not provided to Open MPI
	      (e.g., via the "slots" keyword in	a hostfile or from a  resource
	      manager  such  as	SLURM),	the use	of this	option changes the de-
	      fault calculation	of number of slots on a	node.  See "DEFINITION
	      OF 'SLOT'", above.

	      Also note that the use of this option changes Open MPI's
	      definition of a "processor element" from a processor core to a
	      hardware thread.  See "DEFINITION OF 'PROCESSOR ELEMENT'",
	      above.

       -use-regexp, --use-regexp
	      Use regular expressions for launch.

       The following options are useful	for developers;	they are not generally
       useful to most ORTE and/or MPI users:

       -d, --debug-devel
	      Enable  debugging	 of  the  OmpiRTE  (the	run-time layer in Open
	      MPI).  This is not generally useful for most users.

	      Enable debugging of any OmpiRTE daemons used by this
	      application.

	      Enable  debugging	 of  any OmpiRTE daemons used by this applica-
	      tion, storing output in files.

       -display-devel-allocation, --display-devel-allocation
	      Display a detailed list of the allocation being used by this
	      job.

       -display-devel-map, --display-devel-map
	      Display  a  more	detailed  table	showing	the mapped location of
	      each process prior to launch.

       -display-diffable-map, --display-diffable-map
	      Display a	diffable process map just before launch.

       -display-topo, --display-topo
	      Display the topology as part of the process map just before
	      launch.

       -launch-agent, --launch-agent
	      Name  of the executable that is to be used to start processes on
	      the remote nodes.	The default is "orted".	 This  option  can  be
	      used to test new daemon concepts,	or to pass options back	to the
	      daemons without having mpirun  itself  see  them.	 For  example,
	      specifying  a launch agent of orted -mca odls_base_verbose 5 al-
	      lows the developer to ask	the orted for debugging	output without
	      clutter from mpirun itself.

	      When  paired  with the --timeout command line option, report the
	      run-time subsystem state of each process when the timeout
	      expires.

       There may be other options listed with mpirun --help.

   Environment Variables
	      Synonym for the --timeout	command	line option.

       One  invocation	of mpirun starts an MPI	application running under Open
       MPI. If the application is single process multiple data (SPMD), the ap-
       plication can be	specified on the mpirun	command	line.

       If the application is multiple instruction multiple data (MIMD),
       comprising multiple programs, the set of programs and arguments can be
       specified in one of two ways: Extended Command Line Arguments, and
       Application Context.

       An application context describes	the MIMD program set including all ar-
       guments	in  a  separate	file.  This file essentially contains multiple
       mpirun command lines, less the command name  itself.   The  ability  to
       specify	different options for different	instantiations of a program is
       another reason to use an	application context.

       Extended	command	line arguments allow for the description of the	appli-
       cation  layout  on  the	command	 line using colons (:) to separate the
       specification of	programs and arguments.	Some options are globally  set
       across  all specified programs (e.g. --hostfile), while others are spe-
       cific to	a single program (e.g. -np).
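
       To make the colon-separated layout concrete, the following Python
       sketch splits an extended command line into one token list per
       program.  It is an illustration only: the function name
       split_app_contexts is hypothetical, and the sketch does not
       distinguish global options from program-specific ones.

```python
import shlex


def split_app_contexts(command_line):
    """Split an extended mpirun command line on ':' separators.

    Hypothetical helper. Returns one list of tokens per program
    specification, with the leading command name removed.
    """
    tokens = shlex.split(command_line)[1:]  # drop the command name itself
    contexts, current = [], []
    for token in tokens:
        if token == ":":
            contexts.append(current)
            current = []
        else:
            current.append(token)
    contexts.append(current)
    return contexts


cmd = "mpirun -H aa -np 1 hostname : -H bb,cc -np 2 uptime"
for context in split_app_contexts(cmd):
    print(context)
```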

   Specifying Host Nodes
       Host nodes can be identified on the mpirun command line with the	 -host
       option or in a hostfile.

       For example,

       mpirun -H aa,aa,bb ./a.out
	   launches two	processes on node aa and one on	bb.

       Or, consider the	hostfile

	  % cat	myhostfile
	  aa slots=2
	  bb slots=2
	  cc slots=2

       Here, we list both the host names (aa, bb, and cc) and how many
       slots there are for each.

       mpirun -hostfile	myhostfile ./a.out
	   will	launch two processes on	each of	the three nodes.

       mpirun -hostfile	myhostfile -host aa ./a.out
	   will	launch two processes, both on node aa.

       mpirun -hostfile	myhostfile -host dd ./a.out
	   will	find no	hosts to run on	and abort with an error.  That is, the
	   specified host dd is	not in the specified hostfile.
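
       The interaction between -hostfile and -host shown above can be
       modeled with a short Python sketch.  It is a simplified illustration,
       not Open MPI's hostfile parser: the function names parse_hostfile and
       restrict_to_hosts are hypothetical, and only the "slots" keyword is
       read.

```python
def parse_hostfile(text):
    """Parse 'name key=value ...' lines into {name: slots}.

    Hypothetical helper. A host without a slots= entry maps to None.
    """
    hosts = {}
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        name = fields[0]
        options = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        slots = options.get("slots")
        hosts[name] = int(slots) if slots is not None else None
    return hosts


def restrict_to_hosts(hosts, requested):
    """Keep only the requested hosts; fail if one is not in the hostfile."""
    missing = [h for h in requested if h not in hosts]
    if missing:
        raise ValueError("not in hostfile: " + ", ".join(missing))
    return {h: hosts[h] for h in requested}


myhostfile = "aa slots=2\nbb slots=2\ncc slots=2\n"
hosts = parse_hostfile(myhostfile)
print(sum(hosts.values()))               # 6
print(restrict_to_hosts(hosts, ["aa"]))  # {'aa': 2}
```

       Requesting a host that is not listed, such as dd, raises an error in
       this sketch, mirroring the abort described above.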

       When  running under resource managers (e.g., SLURM, Torque, etc.), Open
       MPI will	obtain both the	hostnames and the  number  of  slots  directly
       from the resource manager.

   Specifying Number of	Processes
       As  we  have just seen, the number of processes to run can be set using
       the hostfile.  Other mechanisms exist.

       The number of processes launched	can be specified as a multiple of  the
       number of nodes or processor sockets available.	For example,

       mpirun -H aa,bb -npersocket 2 ./a.out
	   launches processes 0-3 on node aa and processes 4-7 on node bb,
	   where aa and bb are both dual-socket nodes.  The -npersocket
	   option also turns on the -bind-to-socket option, which is
	   discussed in a later section.

       mpirun -H aa,bb -npernode 2 ./a.out
	   launches processes 0-1 on node aa and processes 2-3 on node bb.

       mpirun -H aa,bb -npernode 1 ./a.out
	   launches one	process	per host node.

       mpirun -H aa,bb -pernode	./a.out
	   is the same as -npernode 1.

       Another alternative is to specify the number of processes with the  -np
       option.	Consider now the hostfile

	  % cat	myhostfile
	  aa slots=4
	  bb slots=4
	  cc slots=4


       mpirun -hostfile	myhostfile -np 6 ./a.out
	   will	 launch	processes 0-3 on node aa and processes 4-5 on node bb.
	   The remaining slots in the hostfile will not	be used	since the  -np
	   option indicated that only 6	processes should be launched.

   Mapping Processes to	Nodes: Using Policies
       The examples above illustrate the default mapping of processes
       to nodes.  This mapping can also	be controlled with various mpirun  op-
       tions that describe mapping policies.

       Consider	the same hostfile as above, again with -np 6:

				 node aa      node bb	   node	cc

	 mpirun			 0 1 2 3      4	5

	 mpirun	--map-by node	 0 3	      1	4	   2 5

	 mpirun	-nolocal		      0	1 2 3	   4 5

       The  --map-by  node  option  will load balance the processes across the
       available nodes,	numbering each process in a round-robin	fashion.

       The -nolocal option prevents any	processes from being mapped  onto  the
       local host (in this case	node aa).  While mpirun	typically consumes few
       system resources, -nolocal can be helpful for launching very large jobs
       where  mpirun  may  actually  need  to use noticeable amounts of	memory
       and/or processing time.
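
       The three rows of the table above can be reproduced with a small
       Python model.  This is a sketch of the placement rules as described
       in this section, not Open MPI's mapper; the function names
       map_by_slot and map_by_node are hypothetical, and oversubscription is
       not modeled.

```python
def map_by_slot(slots, np):
    """Fill each node's slots in order before moving to the next node."""
    placement = {host: [] for host in slots}
    rank = 0
    for host, count in slots.items():
        for _ in range(count):
            if rank == np:
                return placement
            placement[host].append(rank)
            rank += 1
    if rank < np:
        raise ValueError("not enough slots for %d processes" % np)
    return placement


def map_by_node(slots, np):
    """Place one process per node in turn, cycling round-robin."""
    placement = {host: [] for host in slots}
    rank = 0
    while rank < np:
        placed = False
        for host, count in slots.items():
            if rank == np:
                break
            if len(placement[host]) < count:
                placement[host].append(rank)
                rank += 1
                placed = True
        if not placed:
            raise ValueError("not enough slots for %d processes" % np)
    return placement


slots = {"aa": 4, "bb": 4, "cc": 4}
print(map_by_slot(slots, 6))   # default mapping
print(map_by_node(slots, 6))   # --map-by node
# -nolocal: drop the local node (aa here) and map the rest by slot.
remote = {h: n for h, n in slots.items() if h != "aa"}
print(map_by_slot(remote, 6))
```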

       Just as -np can specify fewer processes than there are  slots,  it  can
       also oversubscribe the slots.  For example, with	the same hostfile:

       mpirun -hostfile	myhostfile -np 14 ./a.out
	   will	 launch	 processes  0-3	on node	aa, 4-7	on bb, and 8-11	on cc.
	   It will then add the remaining two processes to whichever nodes it
	   chooses.

       One can also specify limits to oversubscription.	 For example, with the
       same hostfile:

       mpirun -hostfile	myhostfile -np 14 -nooversubscribe ./a.out
           will produce an error since -nooversubscribe prevents
           oversubscription.

       Limits to oversubscription can also be specified in the hostfile
       itself:

	% cat myhostfile
	aa slots=4 max_slots=4
	bb	   max_slots=4
	cc slots=4

       The max_slots field specifies such a limit.  When it  does,  the	 slots
       value defaults to the limit.  Now:

       mpirun -hostfile	myhostfile -np 14 ./a.out
	   causes the first 12 processes to be launched	as before, but the re-
	   maining two processes will be forced	onto node cc.  The  other  two
	   nodes  are  protected  by  the hostfile against oversubscription by
	   this	job.
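
       The hostfile rules above can be modeled as follows.  This is a
       sketch under stated assumptions, not Open MPI's parser: the function
       names are hypothetical, a host with neither field is assumed to
       have one slot, and extra processes are dealt round-robin to nodes
       that still have room (Open MPI may choose differently when several
       nodes qualify).

```python
def parse_hostfile(text):
    """Return (name, slots, max_slots) per line; max_slots is None if unset."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        opts = dict(f.split("=", 1) for f in fields[1:])
        max_slots = int(opts["max_slots"]) if "max_slots" in opts else None
        # Per the text above, slots defaults to max_slots when only the
        # limit is given.  The fallback of 1 is an assumption of this sketch.
        default = max_slots if max_slots is not None else 1
        hosts.append((fields[0], int(opts.get("slots", default)), max_slots))
    return hosts


def place(hosts, np):
    """Fill declared slots first, then oversubscribe nodes below their limit."""
    counts = {name: 0 for name, _, _ in hosts}
    remaining = np
    for name, slots, _ in hosts:
        take = min(slots, remaining)
        counts[name] += take
        remaining -= take
    while remaining:
        progressed = False
        for name, _, limit in hosts:
            if remaining and (limit is None or counts[name] < limit):
                counts[name] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise RuntimeError("every node is already at max_slots")
    return counts


hostfile = """\
aa slots=4 max_slots=4
bb         max_slots=4
cc slots=4
"""
print(place(parse_hostfile(hostfile), 14))
```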

       Using the --nooversubscribe option can be helpful since Open  MPI  cur-
       rently does not get "max_slots" values from the resource	manager.

       Of course, -np can also be used with the -H or -host option.  For
       example:

       mpirun -H aa,bb -np 8 ./a.out
	   launches 8 processes.  Since	only two hosts	are  specified,	 after
	   the	first  two  processes are mapped, one to aa and	one to bb, the
	   remaining processes oversubscribe the specified hosts.

       And here	is a MIMD example:

       mpirun -H aa -np	1 hostname : -H	bb,cc -np 2 uptime
	   will	launch process 0 running hostname on node aa and  processes  1
	   and 2 each running uptime on	nodes bb and cc, respectively.

   Mapping, Ranking, and Binding: Oh My!
       Open  MPI  employs  a three-phase procedure for assigning process loca-
       tions and ranks:

       mapping	 Assigns a default location to each process

       ranking	 Assigns an MPI_COMM_WORLD rank	value to each process

       binding	 Constrains each process to run	on specific processors

       The mapping step is used to assign a default location to each process
       based on the mapper being employed.  Mapping by slot, by node, and
       sequentially results in the assignment of the processes at the node
       level.  In contrast, mapping by object allows the mapper to assign the
       process to an actual object on each node.

       Note: the location assigned to the process is independent of where it
       will be bound - the assignment is used solely as input to the binding
       algorithm.

       The mapping of processes to nodes can be defined not just with
       general	policies but also, if necessary, using arbitrary mappings that
       cannot be described by a	simple policy.	One can	 use  the  "sequential
       mapper,"	 which reads the hostfile line by line,	assigning processes to
       nodes in	whatever order the hostfile specifies.	Use the	-mca rmaps seq
       option.	For example, using the same hostfile as	before:

       mpirun -hostfile	myhostfile -mca	rmaps seq ./a.out

       will  launch  three processes, one on each of nodes aa, bb, and cc, re-
       spectively.  The	slot counts don't matter;  one process is launched per
       line on whatever	node is	listed on the line.
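
       The sequential mapper's behavior, as described above, amounts to the
       following sketch.  The function name is hypothetical, and only the
       host name at the start of each line is considered.

```python
def seq_map(hostfile_text):
    """One process per non-empty line, in file order; slot counts are ignored."""
    ranks = []
    for line in hostfile_text.splitlines():
        fields = line.split()
        if fields:
            ranks.append((len(ranks), fields[0]))
    return ranks


print(seq_map("aa slots=4\nbb slots=4\ncc slots=4\n"))
```

       Listing a host on several lines would place several processes on
       it, in exactly the order given.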

       Another	way  to	 specify  arbitrary mappings is	with a rankfile, which
       gives you detailed control over process binding as well.	 Rankfiles are
       discussed below.

       The second phase	focuses	on the ranking of the process within the job's
       MPI_COMM_WORLD.	Open MPI separates this	from the mapping procedure  to
       allow more flexibility in the relative placement	of MPI processes. This
       is best illustrated by considering the following	 two  cases  where  we
       used the --map-by ppr:2:socket option:

				 node aa       node bb

	   rank-by core		0 1 ! 2	3     4	5 ! 6 7

	  rank-by socket	0 2 ! 1	3     4	6 ! 5 7

	  rank-by socket:span	0 4 ! 1	5     2	6 ! 3 7

       Ranking by core and by slot provide identical results: a simple
       progression of MPI_COMM_WORLD ranks across each node.  Ranking by
       socket does a round-robin ranking within each node until all
       processes have been assigned an MPI_COMM_WORLD (MCW) rank, and then
       progresses to the next node.  Adding the span modifier to the ranking
       directive causes the ranking algorithm to treat the entire allocation
       as a single entity; thus, the MCW ranks are assigned across all
       sockets before circling back around to the beginning.
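
       The three rows of the table above differ only in the order in which
       the mapped positions are visited.  The following sketch (a model of
       the table, not Open MPI's ranking code; the function name and the
       nested-list layout are assumptions) makes that order explicit for
       two nodes with two sockets each and two processes per socket.

```python
def rank_layout(nodes, sockets, ppr, policy):
    """Return layout[node][socket] = list of MCW ranks placed on that socket."""
    layout = [[[None] * ppr for _ in range(sockets)] for _ in range(nodes)]
    if policy == "core":
        # Walk each socket completely, socket after socket, node after node.
        order = [(n, s, p) for n in range(nodes)
                 for s in range(sockets) for p in range(ppr)]
    elif policy == "socket":
        # Round-robin over the sockets of one node before moving on.
        order = [(n, s, p) for n in range(nodes)
                 for p in range(ppr) for s in range(sockets)]
    elif policy == "socket:span":
        # Round-robin over every socket in the whole allocation.
        order = [(n, s, p) for p in range(ppr)
                 for n in range(nodes) for s in range(sockets)]
    else:
        raise ValueError(policy)
    for rank, (n, s, p) in enumerate(order):
        layout[n][s][p] = rank
    return layout


for policy in ("core", "socket", "socket:span"):
    print(policy, rank_layout(2, 2, 2, policy))
```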

       The binding phase actually binds	each process to	a given	set of proces-
       sors.  This  can	improve	performance if the operating system is placing
       processes suboptimally.	 For  example,	it  might  oversubscribe  some
       multi-core  processor  sockets,	leaving	 other sockets idle;  this can
       lead processes to contend unnecessarily for common resources.   Or,  it
       might  spread  processes	out too	widely;	 this can be suboptimal	if ap-
       plication performance is	sensitive to interprocess communication	costs.
       Binding can also	keep the operating system from migrating processes ex-
       cessively, regardless of	how optimally those processes were  placed  to
       begin with.

       The  processors	to  be	used for binding can be	identified in terms of
       topological groupings - e.g., binding to	 an  l3cache  will  bind  each
       process	to all processors within the scope of a	single L3 cache	within
       their assigned location.	Thus, if a process is assigned by  the	mapper
       to a certain socket, then a --bind-to l3cache directive will cause the
       process to be bound to the processors that  share  a  single  L3	 cache
       within that socket.

       Alternatively,  processes  can be assigned to processors	based on their
       local rank on a node using the --bind-to	cpu-list:ordered  option  with
       an associated --cpu-list	"0,2,5". In this example, the first process on
       a node will be bound to cpu 0, the second process on the	node  will  be
       bound  to cpu 2,	and the	third process on the node will be bound	to cpu
       5. --bind-to will also accept cpulist:ordered as a synonym for
       cpu-list:ordered.  Note that an error will result if more processes
       are assigned to a node than cpus are provided.
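
       As a sketch of the ordered cpu-list rule just described (the
       function name is hypothetical and the error type is an assumption;
       Open MPI reports its own error), the Nth local process simply takes
       the Nth entry of the list:

```python
def bind_ordered(cpu_list, local_procs):
    """Map local rank i on a node to the i-th cpu named in cpu_list."""
    cpus = [int(c) for c in cpu_list.split(",")]
    if local_procs > len(cpus):
        raise ValueError("more processes on the node than cpus in the list")
    return {local_rank: cpus[local_rank] for local_rank in range(local_procs)}


print(bind_ordered("0,2,5", 3))
```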

       To help balance loads, the binding directive uses a round-robin	method
       when binding to levels lower than used in the mapper. For example, con-
       sider the case where a job is mapped to	the  socket  level,  and  then
       bound  to  core.	 Each  socket will have	multiple cores,	so if multiple
       processes are mapped to a given socket, the binding algorithm will
       assign each process located on a socket to a unique core in a
       round-robin fashion.

       Alternatively, processes	mapped by l2cache and  then  bound  to	socket
       will simply be bound to all the processors in the socket	where they are
       located.	In this	manner,	users can exert	detailed control over relative
       MCW rank	location and binding.

       Finally,	--report-bindings can be used to report	bindings.

       As  an  example,	 consider a node with two processor sockets, each com-
       prised of four cores, and each of those	cores  contains	 one  hardware
       thread.	 We  run mpirun	with -np 4 --report-bindings and the following
       additional options:

	% mpirun ... --map-by core --bind-to core
	[...] ... binding child	[...,0]	to cpus	0001
	[...] ... binding child	[...,1]	to cpus	0002
	[...] ... binding child	[...,2]	to cpus	0004
	[...] ... binding child	[...,3]	to cpus	0008

	% mpirun ... --map-by socket --bind-to socket
	[...] ... binding child	[...,0]	to socket 0 cpus 000f
	[...] ... binding child	[...,1]	to socket 1 cpus 00f0
	[...] ... binding child	[...,2]	to socket 0 cpus 000f
	[...] ... binding child	[...,3]	to socket 1 cpus 00f0

	% mpirun ... --map-by slot:PE=2	--bind-to core
	[...] ... binding child	[...,0]	to cpus	0003
	[...] ... binding child	[...,1]	to cpus	000c
	[...] ... binding child	[...,2]	to cpus	0030
	[...] ... binding child	[...,3]	to cpus	00c0

	% mpirun ... --bind-to none

       Here, --report-bindings shows the binding of each process  as  a	 mask.
       In  the first case, the processes bind to successive cores as indicated
       by the masks 0001, 0002,	0004, and 0008.	 In the	second case, processes
       bind  to	all cores on successive	sockets	as indicated by	the masks 000f
       and 00f0.  The processes	cycle  through	the  processor	sockets	 in  a
       round-robin fashion as many times as are	needed.

       In  the	third case, the	masks show us that 2 cores have	been bound per
       process.	 Specifically, the mapping by slot with	the PE=2 qualifier in-
       dicated that each slot (i.e., process) should consume two processor el-
       ements.	Since --use-hwthread-cpus was not specified, Open MPI  defined
       "processor  element" as "core", and therefore the --bind-to core	caused
       each process to be bound	to both	of the cores to	which it was mapped.

       In the fourth case, binding is turned off and no	bindings are reported.
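
       The masks in this example can be decoded mechanically.  The sketch
       below assumes, as the discussion above implies, that each mask is a
       hexadecimal bit field in which bit i stands for cpu i; the function
       name is hypothetical.

```python
def mask_to_cpus(mask):
    """Decode a hexadecimal binding mask into a sorted list of cpu indexes."""
    bits = int(mask, 16)
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


for mask in ("0001", "0008", "000f", "00f0", "0003", "00c0"):
    print(mask, mask_to_cpus(mask))
```

       For instance, 00f0 decodes to cpus 4-7 (the second socket), and
       000c decodes to cpus 2 and 3 (the second PE=2 pair).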

       Open MPI's support for process binding depends on the underlying	 oper-
       ating  system.	Therefore,  certain process binding options may	not be
       available on every system.

       Process binding can also	be set with MCA	parameters.   Their  usage  is
       less  convenient	 than  that of mpirun options.	On the other hand, MCA
       parameters can be set not only on the mpirun command line, but alterna-
       tively in a system or user mca-params.conf file or as environment vari-
       ables, as described in the MCA section below.  Some examples include:

	   mpirun option	  MCA parameter	key	    value

	 --map-by core		rmaps_base_mapping_policy   core
	 --map-by socket	rmaps_base_mapping_policy   socket
	 --rank-by core		rmaps_base_ranking_policy   core
	 --bind-to core		hwloc_base_binding_policy   core
	 --bind-to socket	hwloc_base_binding_policy   socket
	 --bind-to none		hwloc_base_binding_policy   none

       Rankfiles are text files	that specify detailed  information  about  how
       individual  processes  should  be mapped	to nodes, and to which proces-
       sor(s) they should be bound.  Each line of a rankfile specifies the lo-
       cation  of one process (for MPI jobs, the process' "rank" refers	to its
       rank in MPI_COMM_WORLD).  The general form of each line in the rankfile
       is:

	   rank	<N>=<hostname> slot=<slot list>

       For example:

	   $ cat myrankfile
	   rank	0=aa slot=1:0-2
	   rank	1=bb slot=0:0,1
	   rank	2=cc slot=1-2
	   $ mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out

       Means that

	 Rank 0	runs on	node aa, bound to logical socket 1, cores 0-2.
	 Rank 1	runs on	node bb, bound to logical socket 0, cores 0 and	1.
	 Rank 2	runs on	node cc, bound to logical cores	1 and 2.
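
       To make the line format concrete, here is a sketch of a reader for
       the logical rankfile forms shown in this example only.  It is not
       Open MPI's parser; the names LINE, expand, and parse_rankfile are
       hypothetical, and other slot-list forms are not handled.

```python
import re

LINE = re.compile(r"rank\s+(\d+)=(\S+)\s+slot=(\S+)")


def expand(spec):
    """Expand a list such as "0-2" or "0,1" into integers."""
    out = []
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        out.extend(range(int(lo), int(hi or lo) + 1))
    return out


def parse_rankfile(text):
    """Return {rank: (host, socket or None, [cores])}."""
    entries = {}
    for line in text.splitlines():
        m = LINE.fullmatch(line.strip())
        if not m:
            continue  # skip blank or unrecognized lines
        rank, host, slot = int(m.group(1)), m.group(2), m.group(3)
        socket, sep, cores = slot.rpartition(":")
        entries[rank] = (host, int(socket) if sep else None, expand(cores))
    return entries


rankfile = "rank 0=aa slot=1:0-2\nrank 1=bb slot=0:0,1\nrank 2=cc slot=1-2\n"
print(parse_rankfile(rankfile))
```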

       Rankfiles can alternatively be used to specify physical processor loca-
       tions. In this case, the	syntax is somewhat different. Sockets  are  no
       longer  recognized, and the slot	number given must be the number	of the
       physical	PU as most OS's	do not assign a	unique physical	identifier  to
       each core in the	node. Thus, a proper physical rankfile looks something
       like the	following:

	   $ cat myphysicalrankfile
	   rank	0=aa slot=1
	   rank	1=bb slot=8
	   rank	2=cc slot=6

       This means that

         Rank 0 will run on node aa, bound to the core that contains physical PU 1
         Rank 1 will run on node bb, bound to the core that contains physical PU 8
         Rank 2 will run on node cc, bound to the core that contains physical PU 6

       Rankfiles  are  treated	as  logical  by	default, and the MCA parameter
       rmaps_rank_file_physical	must be	set to 1 to indicate that the rankfile
       is to be	considered as physical.

       The hostnames listed above are "absolute," meaning that actual resolve-
       able hostnames are specified.  However, hostnames can also be specified
       as "relative," meaning that they	are specified in relation to an	exter-
       nally-specified list of hostnames (e.g.,	by mpirun's --host argument, a
       hostfile, or a job scheduler).

       The  "relative" specification is	of the form "+n<X>", where X is	an in-
       teger specifying	the Xth	hostname in the	set  of	 all  available	 host-
       names, indexed from 0.  For example:

	   $ cat myrankfile
	   rank	0=+n0 slot=1:0-2
	   rank	1=+n1 slot=0:0,1
	   rank	2=+n2 slot=1-2
	   $ mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out
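
       Resolving a relative name is a simple index lookup.  A sketch,
       assuming the available hosts are already known as an ordered list
       (the function name is hypothetical):

```python
def resolve_host(spec, available):
    """Map "+n<X>" to the Xth available hostname (0-based); pass others through."""
    if spec.startswith("+n"):
        return available[int(spec[2:])]
    return spec


available = ["aa", "bb", "cc", "dd"]  # e.g. from -H aa,bb,cc,dd
print([resolve_host(s, available) for s in ("+n0", "+n1", "+n2")])
```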

       Starting with Open MPI v1.7, all socket/core slot locations are
       specified as logical indexes (the Open MPI v1.6 series used physical
       indexes).  You can use tools such as HWLOC's "lstopo" to find the
       logical indexes of sockets and cores.

   Application Context or Executable Program?
       To distinguish the two different forms, mpirun looks on the command
       line for the --app option.  If it is specified, then the file named on
       the command line is assumed to be an application context.  If it is
       not specified, then the file is assumed to be an executable program.

   Locating Files
       If  no relative or absolute path	is specified for a file, Open MPI will
       first look for files by searching  the  directories  specified  by  the
       --path  option.	If there is no --path option set or if the file	is not
       found at	the --path location, then Open MPI will	search the user's PATH
       environment variable as defined on the source node(s).

       If  a  relative directory is specified, it must be relative to the ini-
       tial working directory determined by the	specific starter used. For ex-
       ample  when  using  the	rsh  or	ssh starters, the initial directory is
       $HOME by	default. Other starters	may set	the initial directory  to  the
       current working directory from the invocation of	mpirun.

   Current Working Directory
       The  -wdir  mpirun  option  (and	 its  synonym, -wd) allows the user to
       change to an arbitrary directory	before the program is invoked.	It can
       also  be	 used in application context files to specify working directo-
       ries on specific	nodes and/or for specific applications.

       If the -wdir option appears both	in a context file and on  the  command
       line, the context file directory	will override the command line value.

       If  the	-wdir  option is specified, Open MPI will attempt to change to
       the specified directory on all of the  remote  nodes.  If  this	fails,
       mpirun will abort.

       If  the -wdir option is not specified, Open MPI will send the directory
       name where mpirun was invoked to	each of	the remote nodes.  The	remote
       nodes  will  try	to change to that directory. If	they are unable	(e.g.,
       if the directory	does not exist on that node), then Open	MPI  will  use
       the default directory determined	by the starter.

       All  directory changing occurs before the user's	program	is invoked; it
       does not	wait until MPI_INIT is called.

   Standard I/O
       Open MPI	directs	UNIX standard input to /dev/null on all	processes  ex-
       cept  the  MPI_COMM_WORLD  rank	0  process.  The MPI_COMM_WORLD	rank 0
       process inherits	standard input from mpirun.  Note: The node  that  in-
       voked  mpirun need not be the same as the node where the	MPI_COMM_WORLD
       rank 0 process resides. Open MPI	handles	the  redirection  of  mpirun's
       standard	input to the rank 0 process.

       Open  MPI  directs  UNIX	standard output	and error from remote nodes to
       the node	that invoked mpirun and	prints it on the standard output/error
       of mpirun.  Local processes inherit the standard	output/error of	mpirun
       and transfer to it directly.

       Thus it is possible to redirect standard	I/O for	Open MPI  applications
       by using	the typical shell redirection procedure	on mpirun.

	     % mpirun -np 2 my_app < my_input >	my_output

       Note  that  in this example only	the MPI_COMM_WORLD rank	0 process will
       receive the stream from my_input	on stdin.  The stdin on	all the	 other
       nodes  will  be	tied to	/dev/null.  However, the stdout	from all nodes
       will be collected into the my_output file.

   Signal Propagation
       When orterun receives a SIGTERM or SIGINT, it will attempt to kill the
       entire job by sending all processes in the job a SIGTERM, waiting a
       small number of seconds, then sending all processes in the job a
       SIGKILL.

       SIGUSR1	and  SIGUSR2 signals received by orterun are propagated	to all
       processes in the	job.

       A SIGTSTP signal to mpirun will cause a SIGSTOP signal to be sent to
       all of the programs started by mpirun, and likewise a SIGCONT signal
       to mpirun will cause a SIGCONT to be sent.

       Other signals are not currently propagated by orterun.

   Process Termination / Signal	Handling
       During the run of an MPI	application, if	any  process  dies  abnormally
       (either exiting before invoking MPI_FINALIZE, or	dying as the result of
       a signal), mpirun will print out	an error message and kill the rest  of
       the MPI application.

       User signal handlers should probably avoid trying to clean up MPI state
       (Open MPI is currently not  async-signal-safe;  see  MPI_Init_thread(3)
       for details about MPI_THREAD_MULTIPLE and thread	safety).  For example,
       if a segmentation fault occurs in MPI_SEND (perhaps because a bad  buf-
       fer  was	 passed	in) and	a user signal handler is invoked, if this user
       handler attempts	to invoke MPI_FINALIZE,	Bad Things could happen	 since
       Open MPI was already "in" MPI when the error occurred.  Since mpirun
       will notice that the process died due to a signal, cleaning up MPI
       state is probably not necessary; it is safest for the user handler to
       clean up only non-MPI state.

   Process Environment
       Processes  in  the  MPI	application inherit their environment from the
       Open RTE	daemon upon the	node on	which they are running.	 The  environ-
       ment  is	 typically  inherited from the user's shell.  On remote	nodes,
       the exact environment is	determined by the boot MCA module  used.   The
       rsh  launch module, for example,	uses either rsh/ssh to launch the Open
       RTE daemon on remote nodes, and typically executes one or more  of  the
       user's  shell-setup  files  before launching the	Open RTE daemon.  When
       running	dynamically  linked  applications  which  require  the	LD_LI-
       BRARY_PATH environment variable to be set, care must be taken to	ensure
       that it is correctly set	when booting Open MPI.

       See the "Remote Execution" section for more details.

   Remote Execution
       Open MPI	requires that the PATH environment variable be set to find ex-
       ecutables  on remote nodes (this	is typically only necessary in rsh- or
       ssh-based environments -- batch/scheduled environments  typically  copy
       the current environment to the execution	of remote jobs,	so if the cur-
       rent environment	has PATH and/or	LD_LIBRARY_PATH	set properly, the  re-
       mote  nodes  will also have it set properly).  If Open MPI was compiled
       with shared library support, it may  also  be  necessary	 to  have  the
       LD_LIBRARY_PATH environment variable set on remote nodes as well
       (especially to find the shared libraries required to run user MPI
       applications).

       However,	 it  is	not always desirable or	possible to edit shell startup
       files to	set PATH and/or	LD_LIBRARY_PATH.  The --prefix option is  pro-
       vided for some simple configurations where this is not possible.

       The  --prefix option takes a single argument: the base directory	on the
       remote node where Open MPI is installed.	 Open MPI will use this	direc-
       tory  to	 set  the remote PATH and LD_LIBRARY_PATH before executing any
       Open MPI or user applications.  This allows running Open MPI jobs
       without having pre-configured the PATH and LD_LIBRARY_PATH on the
       remote node.

       Open MPI	adds the basename of the current node's	"bindir"  (the	direc-
       tory where Open MPI's executables are installed)	to the prefix and uses
       that to set the PATH on the remote node.	 Similarly, Open MPI adds  the
       basename	of the current node's "libdir" (the directory where Open MPI's
       libraries are installed)	to the prefix and uses that to set the	LD_LI-
       BRARY_PATH on the remote	node.  For example:

       Local bindir:  /local/node/directory/bin

       Local libdir:  /local/node/directory/lib64

       If the following	command	line is	used:

	   % mpirun --prefix /remote/node/directory

       Open  MPI  will	add "/remote/node/directory/bin" to the	PATH and "/re-
       mote/node/directory/lib64" to the LD_LIBRARY_PATH on  the  remote  node
       before attempting to execute anything.
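
       The rule above (prefix plus the basename of the local directory) can
       be written out as a short sketch.  The function name is
       hypothetical; posixpath is used so that the result does not depend
       on the platform running the example.

```python
import posixpath


def remote_dirs(prefix, local_bindir, local_libdir):
    """Join the prefix with the basenames of the local bindir and libdir."""
    bindir = posixpath.join(prefix, posixpath.basename(local_bindir))
    libdir = posixpath.join(prefix, posixpath.basename(local_libdir))
    return bindir, libdir


print(remote_dirs("/remote/node/directory",
                  "/local/node/directory/bin",
                  "/local/node/directory/lib64"))
```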

       The  --prefix option is not sufficient if the installation paths	on the
       remote node are different than the local	node (e.g., if "/lib" is  used
       on  the local node, but "/lib64"	is used	on the remote node), or	if the
       installation paths are something	other than a subdirectory under	a com-
       mon prefix.

       Note  that  executing  mpirun via an absolute pathname is equivalent to
       specifying --prefix without the last subdirectory in the	absolute path-
       name to mpirun.	For example:

	   % /usr/local/bin/mpirun ...

       is equivalent to

	   % mpirun --prefix /usr/local
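
       In other words, the implied prefix is the absolute path to mpirun
       with its last two components removed.  A sketch (the function name
       is hypothetical):

```python
import posixpath


def implied_prefix(mpirun_path):
    """Drop the executable name and its containing subdirectory."""
    return posixpath.dirname(posixpath.dirname(mpirun_path))


print(implied_prefix("/usr/local/bin/mpirun"))
```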

   Exported Environment	Variables
       All  environment	variables that are named in the	form OMPI_* will auto-
       matically be exported to	new processes on the local and	remote	nodes.
       Environmental parameters	can also be set/forwarded to the new processes
       using the MCA parameter mca_base_env_list.  The -x option to mpirun has
       been deprecated, but the syntax of the MCA parameter follows that of
       the -x option.  While the syntax of the -x option and the MCA parameter
       allows the definition of new variables, note that the parser for these
       options is currently not very sophisticated; it does not even
       understand quoted values.  Users are advised to set variables in the
       environment and use the option to export them, not to define them.

   Setting MCA Parameters
       The -mca	switch allows the passing of parameters	to various MCA	(Modu-
       lar Component Architecture) modules.  MCA modules have direct impact on
       MPI programs because they allow tunable parameters to  be  set  at  run
       time (such as which BTL communication device driver to use, what	param-
       eters to	pass to	that BTL, etc.).

       The -mca	switch takes two arguments: _key_ and _value_.	The _key_  ar-
       gument  generally  specifies  which  MCA	module will receive the	value.
       For example, the	_key_ "btl" is used to select which BTL	to be used for
       transporting  MPI  messages.  The _value_ argument is the value that is
       passed.	For example:

       mpirun -mca btl tcp,self	-np 1 foo
           Tells Open MPI to use the "tcp" and "self" BTLs, and to run a
           single copy of "foo" on an allocated node.

       mpirun -mca btl self -np	1 foo
           Tells Open MPI to use the "self" BTL, and to run a single copy of
           "foo" on an allocated node.

       The -mca	switch can be used multiple times to specify  different	 _key_
       and/or  _value_	arguments.   If	 the same _key_	is specified more than
       once, the _value_s are concatenated with	a comma	(",") separating them.

       Note that the -mca switch is simply a shortcut for setting  environment
       variables.  The same effect may be accomplished by setting
       corresponding environment variables before running mpirun.  The form
       of the environment variables that Open MPI sets is:

           OMPI_MCA_<key>=<value>


       Thus,  the  -mca	 switch	overrides any previously set environment vari-
       ables.  The -mca	settings similarly override MCA	parameters set in  the
       $OPAL_PREFIX/etc/openmpi-mca-params.conf	    or	   $HOME/.openmpi/mca-
       params.conf file.
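
       As an illustration of the -mca to environment correspondence, the
       sketch below collects -mca pairs from a command line into
       OMPI_MCA_<key> variables and joins repeated keys with commas, as
       described above.  It is a model only (the function name is
       hypothetical), not how mpirun itself parses its arguments.

```python
def mca_to_env(argv):
    """Collect -mca/--mca key value pairs as OMPI_MCA_<key> variables."""
    env = {}
    i = 0
    while i < len(argv):
        if argv[i] in ("-mca", "--mca") and i + 2 < len(argv):
            name = "OMPI_MCA_" + argv[i + 1]
            value = argv[i + 2]
            # A repeated key has its values concatenated with a comma.
            env[name] = env[name] + "," + value if name in env else value
            i += 3
        else:
            i += 1
    return env


print(mca_to_env("mpirun -mca btl tcp -mca btl self -np 1 foo".split()))
```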

       Unknown _key_ arguments are still set as environment variables -- they
       are  not	 checked  (by  mpirun)	for correctness.  Illegal or incorrect
       _value_ arguments may or	may not	be reported -- it depends on the  spe-
       cific MCA module.

       To find the available component types under the MCA architecture, or to
       find the	 available  parameters	for  a	specific  component,  use  the
       ompi_info command.  See the ompi_info(1)	man page for detailed informa-
       tion on the command.

   Setting MCA parameters and environment variables from file.
       The -tune command line option and its synonym -mca
       mca_base_envar_file_prefix allow a user to set MCA parameters and
       environment variables with the syntax described below.  This option
       requires a single file or a list of files separated by "," to follow.

       A valid line in the file may contain zero or more "-x", "-mca", or
       "--mca" arguments.  The following patterns are supported:

           -mca var val
           -mca var "val"
           -x var=val
           -x var

       If any argument is duplicated in the file, the last value read will be
       used.

       MCA parameters and environment variables specified on the command line
       have higher precedence than variables specified in the file.
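
       A sketch of a reader for the four supported patterns follows.  It is
       not Open MPI's parser: the function name is hypothetical, shlex is
       used here to handle the quoted form, and a bare "-x var" is
       recorded as None to mean "export the value from the current
       environment".

```python
import shlex


def parse_tune(text):
    """Return (mca, env) dictionaries; the last value read for a name wins."""
    mca, env = {}, {}
    for line in text.splitlines():
        tokens = shlex.split(line)
        i = 0
        while i < len(tokens):
            if tokens[i] in ("-mca", "--mca") and i + 2 < len(tokens):
                mca[tokens[i + 1]] = tokens[i + 2]
                i += 3
            elif tokens[i] == "-x" and i + 1 < len(tokens):
                name, sep, value = tokens[i + 1].partition("=")
                env[name] = value if sep else None
                i += 2
            else:
                raise ValueError("unsupported argument: " + tokens[i])
    return mca, env


tune = '-mca btl tcp,self -x FOO=bar\n--mca btl "self" -x HOME\n'
print(parse_tune(tune))
```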

   Running as root
       The Open	MPI team strongly advises against executing mpirun as the root
       user.  MPI applications should be run as	regular	(non-root) users.

       Reflecting this advice, mpirun will refuse to run as root  by  default.
       To override this	default, you can add the --allow-run-as-root option to
       the mpirun command line, or you can set the environmental parameters
       OMPI_ALLOW_RUN_AS_ROOT=1 and OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1.  Note
       that it takes setting two environment variables to effect the same
       behavior as --allow-run-as-root, in order to stress the Open MPI team's
       strong advice against running as the root user.  After extended
       discussions with communities who use containers (where running as the
       root user is the default), there was a persistent desire to be able to
       enable root execution of mpirun via an environmental control (vs. the
       existing --allow-run-as-root command line parameter).  The compromise
       of using two environment variables was reached: it allows root
       execution via an environmental control, but it conveys the Open MPI
       team's strong recommendation against this behavior.

   Exit	status
       There  is  no  standard	definition for what mpirun should return as an
       exit status. After considerable discussion, we settled on the following
       method for assigning the	mpirun exit status (note: in the following de-
       scription, the "primary"	job is	the  initial  application  started  by
       mpirun  -  all  jobs  that are spawned by that job are designated "sec-
       ondary" jobs):

       o if all	processes in the primary job normally terminate	with exit sta-
	 tus 0,	we return 0

       o if  one  or more processes in the primary job normally	terminate with
	 non-zero exit status, we return the exit status of the	 process  with
	 the lowest MPI_COMM_WORLD rank	to have	a non-zero status

       o if all	processes in the primary job normally terminate	with exit sta-
	 tus 0,	and one	or more	processes in a secondary job  normally	termi-
	 nate  with non-zero exit status, we (a) return	the exit status	of the
	 process with the lowest MPI_COMM_WORLD	rank in	the  lowest  jobid  to
	 have a	non-zero status, and (b) output	a message summarizing the exit
	 status	of the primary and all secondary jobs.

       o if the command line option --report-child-jobs-separately is set, we
         will return -only- the exit status of the primary job.  Any non-zero
         exit status in secondary jobs will be reported solely in a summary
         print statement.

       By  default,  the  job will abort when any process terminates with non-
       zero status. The	MCA parameter "orte_abort_on_non_zero_status"  can  be
       set to "false" (or "0") to cause	OMPI to	not abort a job	if one or more
       processes return a non-zero status.  In that situation, OMPI records
       and notes that processes exited with non-zero termination status so
       that it can report the appropriate exit status of mpirun (per the
       bullet points above).
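
       The bullet points above can be summarized as a decision procedure.
       The sketch below models only normal termination; the function name
       and the data layout (a list of exit statuses indexed by
       MPI_COMM_WORLD rank, and a dictionary of such lists keyed by jobid
       for secondary jobs) are assumptions made for illustration.

```python
def mpirun_exit_status(primary, secondary=None,
                       report_child_jobs_separately=False):
    """Pick the exit status mpirun would report for normally exiting jobs."""
    # The lowest-ranked primary process with a non-zero status wins.
    for status in primary:
        if status != 0:
            return status
    if report_child_jobs_separately or not secondary:
        return 0
    # Otherwise use the lowest rank in the lowest jobid with non-zero status.
    for jobid in sorted(secondary):
        for status in secondary[jobid]:
            if status != 0:
                return status
    return 0


print(mpirun_exit_status([0, 3, 5]))
print(mpirun_exit_status([0, 0], {2: [0, 7], 1: [0, 0]}))
```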

       Be sure also to see the examples	throughout the sections	above.

       mpirun -np 4 -mca btl ib,tcp,self prog1
	   Run 4 copies	of prog1 using the "ib", "tcp",	and "self"  BTL's  for
	   the transport of MPI	messages.

       mpirun -np 4 -mca btl tcp,sm,self
	   --mca btl_tcp_if_include eth0 prog1
	   Run 4 copies	of prog1 using the "tcp", "sm" and "self" BTLs for the
	   transport of	MPI messages, with TCP using only the  eth0  interface
           to communicate.  Note that other BTLs have similar if_include MCA
           parameters.

       mpirun returns 0	if all processes started by mpirun exit	after  calling
       MPI_FINALIZE.   A  non-zero  value is returned if an internal error oc-
       curred in mpirun, or  one  or  more  processes  exited  before  calling
       MPI_FINALIZE.  If an internal error occurred in mpirun, the
       corresponding error code is returned.  In the event that one or more
       processes exit before calling MPI_FINALIZE, the return value of the
       MPI_COMM_WORLD rank of the process that mpirun first notices died
       before calling MPI_FINALIZE will be returned.  Note that, in general,
       this will be the first process that died, but it is not guaranteed to
       be so.

       If  the	--timeout  command line	option is used and the timeout expires
       before the job completes	(thereby  forcing  mpirun  to  kill  the  job)
       mpirun  will return an exit status equivalent to	the value of ETIMEDOUT
       (which is typically 110 on Linux; other systems define different
       values, e.g., 60 on OS X).
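
       Because the numeric value of ETIMEDOUT is platform-specific, a
       wrapper script can compare against the symbolic constant rather
       than a literal number.  A sketch (the function name is
       hypothetical):

```python
import errno


def timed_out(returncode):
    """True if an mpirun exit status equals this platform's ETIMEDOUT."""
    return returncode == errno.ETIMEDOUT


print(timed_out(errno.ETIMEDOUT), timed_out(0))
```

       The argument would typically be the returncode attribute of the
       object returned by subprocess.run() for an mpirun invocation that
       used --timeout.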


4.1.3				 Mar 31, 2022			     MPIRUN(1)

