MPIEXEC(1)			      OSC			    MPIEXEC(1)

NAME
       mpiexec - MPI parallel job initializer

SYNOPSIS
       mpiexec [OPTION]... executable [args]...
       mpiexec [OPTION]... -config configfile
       mpiexec -server

DESCRIPTION
       Mpiexec	is  a replacement program for the script mpirun, which is part
       of the mpich package.  It is used to initialize	a  parallel  job  from
       within  a  pbs  batch or	interactive environment.  It further generates
       the environment variables and configuration files necessary to
       initialize a parallel program for the appropriate MPI message-passing
       library.

       Mpiexec	uses  the  task	 manager  library, tm(3B), of PBS(1B) to spawn
       copies of the executable	on all the nodes in a pbs allocation.	It  is
       almost functionally equivalent to

	     rsh node "cd $cwd;	exec executable	arguments",

       using  the current working directory from where mpiexec is invoked, and
       the shell specified in the environment, or from the password file.
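
       The equivalence above can be sketched locally without rsh.  This is
       only an illustration, not how mpiexec is implemented: the helper name
       run_in_cwd is hypothetical, and /bin/sh stands in for the user's shell.

```sh
# Hypothetical helper: change to a directory, then exec the command,
# mirroring the documented "cd $cwd; exec executable arguments" pattern.
run_in_cwd() {
    dir=$1
    shift
    /bin/sh -c 'cd "$1" && shift && exec "$@"' sh "$dir" "$@"
}

# Local stand-in for a spawned task: report its working directory.
where=$(run_in_cwd / pwd)
echo "task started in: $where"
```

       Because of the exec, the shell is replaced by the program rather than
       remaining as its parent.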

       The standard input of the mpiexec process is forwarded to  task	number
       zero in the parallel job, allowing for use of the construct

	     mpiexec mycode < inputfile

       This  behavior  can  be modified	using the -nostdin or -allstdin	flags.
       Standard	output and error are also forwarded to mpiexec,	allowing redi-
       rection	of the outputs of all processes.  This can be turned off using
       -nostdout so that the standard output and error streams go through  the
       normal PBS mechanisms, to the batch job output files, or	to your	termi-
       nal in the case of an interactive job.  See qsub(1) for	more  informa-
       tion.

OPTIONS
       All options may be introduced using either a single dash or double
       dashes, as is common in most GNU utilities.  Options may be shortened
       as long as they remain unambiguous.  Options that require arguments may
       appear as separate words	in the argument	list, or they may be separated
       from the	option by an equals sign.
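
       The abbreviation rule can be modeled as a unique-prefix lookup.  The
       sketch below is not taken from the mpiexec source: resolve_option is a
       hypothetical name, the option list is a subset of the options in this
       page, and letting an exact name win over longer names is an assumption
       (suggested by -n and -np both being accepted).

```sh
# Option names taken from the OPTIONS list in this page (a subset).
OPTION_NAMES="n np verbose nostdin allstdin nostdout comm pernode npernode nolocal kill config version"

# Hypothetical resolver: print the full option name for an abbreviation,
# or fail if the abbreviation matches no option or more than one.
resolve_option() {
    abbrev=${1#-}          # strip one leading dash
    abbrev=${abbrev#-}     # strip a second dash if present
    found=""
    count=0
    for opt in $OPTION_NAMES; do
        case $opt in
            "$abbrev")                 # exact name: assumed to win outright
                echo "$opt"
                return 0 ;;
            "$abbrev"*)                # proper prefix: remember the candidate
                found=$opt
                count=$((count + 1)) ;;
        esac
    done
    if [ "$count" -eq 1 ]; then
        echo "$found"
        return 0
    fi
    return 1
}

resolve_option -conf      # prints "config"
resolve_option --verb     # prints "verbose"
resolve_option -no || echo "-no is ambiguous"
```

       Here -no fails because it is a prefix of -nostdin, -nostdout, and
       -nolocal alike.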

       -n numproc, -np numproc
	     Use  only	the  specified number of processes.  Default is	to use
	     all which were provided in	the pbs	environment.

       -verbose
	     Talk more about what mpiexec is doing.

       -nostdin
	     Do	not connect the	standard input stream  of  process  0  to  the
	     mpiexec  process.	 If the	process	attempts to read from stdin it
	     will see an end-of-file.

       -allstdin
	     Send the standard input stream of mpiexec to all processes.  Each
	     character	typed  to  mpiexec (or read from a file) is duplicated
	     numproc times, and	sent to	 each  process.	  This	permits	 every
	     process  to read, for example, configuration information from the
	     input stream.

       -nostdout
	     Do	not connect the	standard output	 and  error  streams  of  each
	     process  back  to	the  mpiexec process.  Output on these streams
	     will go through the normal	PBS mechanisms instead,	to wit:	 files
	     of	 the  form  job.ojobid	and job.ejobid for batch jobs, and di-
	     rectly to the controlling terminal	for interactive	jobs.

       -comm type
	     Specify the communication library used by your  code.   Each  MPI
	     library  has  different mechanisms	for starting all the processes
	     of	a parallel job,	thus you must specify to mpiexec which library
	     you  use  so  that	it can set up the environment of the processes
	     correctly.	 The argument type must	be one of:   mpich-gm,	mpich-
	     mx,  mpich-p4,  mpich-ib, mpich-rai, mpich2-pmi, lam, shmem, emp,
	     none; although the	code may not have been compiled	 with  support
	     for  some	of  those.  If this argument is	not specified, mpiexec
	     will look for the environment variable MPIEXEC_COMM  which	 could
	     specify  one  of those arguments.	If this	fails, the compiled-in
	     default communication library is chosen.

       -mpich-p4-shmem

       -mpich-p4-no-shmem
	     The MPICH/P4 library may be configured either to  support	shared
	     memory within a multiprocessor node or not.  It is	necessary that
	     mpiexec know in which way the library was configured to  success-
	     fully start jobs.	While this is generally	chosen at compile time
	     using the --disable-p4-shmem configure flag, it  is  possible  to
	     choose explicitly at runtime with one of these flags.

       -pernode	(SMP only)
	     Allocate  only one	process	per compute node.  For SMP nodes, only
             one processor will be allocated a job.  This flag is used to
             implement multiple-level parallelism with MPI between nodes and
             threads within a node, assuming the code is set up to do that.

       -npernode nprocs	(SMP only)
	     Allocate no more than nprocs processes per	compute	node.  This is
	     a	generalization of the -pernode flag that can be	used to	place,
	     for example, two tasks on each 4-way SMP.

       -nolocal	(not MPICH/P4)
	     Do	not run	any MPI	processes on the local	compute	 node.	 In  a
	     batch  job,  one  of the machines allocated to run	a parallel job
	     will run the batch	script and thus	invoke mpiexec.	  Normally  it
             participates in running the parallel application, but this
             option disables that for special situations where that node is
	     needed for	other processing.

       -transform-hostname sed_expression
	     Use an alternate hostname for message passing.  Processes will be
	     spawned using a separate hostname for their message passing  com-
	     munications.   This  is  necessary	 if you	use, say, one ethernet
	     card for PBS hostnames, and another  ethernet  card  for  message
	     passing.	The transformation is provided by a general expression
	     which will	be parsed by  sed  at  runtime	by  invoking:  sed  -e
	     sed_expression.   The  argument is	not split at space boundaries,
	     and can use any feature supported by sed  including  the  use  of
	     hold spaces.  See below for an example.  Note that	currently only
	     MPICH/P4, MPICH2 and EMP  change  their  behavior	for  different
	     names.

       -transform-hostname-program executable
	     Similar  to  the  previous	 option, but instead of	using sed, the
	     list of hostnames will be passed on standard input	to the	exter-
	     nal  script  or program you specify.  It is expected to generate,
	     in	order, the alternate names to be  used	for  message  passing.
	     This  option  is  a generalization	of the previous	one and	is ex-
	     pected to be used only by power users at sites with complex  net-
	     work setups.

       -gige This option is deprecated, but still accepted and synonymous with
             the preferred option -transform-hostname=s/node/gige/.

       -tv, -totalview
	     Debug using totalview.  The process on node zero attempts to open
	     an	 X  window  to $DISPLAY, and all processes are attached	by to-
	     talview threads.  See totalview(1)	for more information.

       -kill If	any one	of the processes dies, wait a little,  then  kill  all
	     the  other	 processes  in the parallel job.  Your message passing
	     library should handle this	for you	in most	circumstances.

       -config configfile
	     Process executable	and arguments are specified in the given  con-
	     figuration	file.  This flag permits the use of heterogeneous jobs
	     using multiple executables, architectures,	and command line argu-
	     ments.  No	executable is given on the command line	when using the
	     -config flag.  If configfile is "-", then	the  configuration  is
	     read  from	 standard  input.   In	this case the flag -nostdin is
	     mandatory,	as it is not possible to separate the contents of  the
	     configuration file	from process input.

       -version
	     Display the mpiexec version number	and configure arguments.
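
       A program for -transform-hostname-program can be a small filter.  The
       sketch below assumes that hostnames arrive one per line, which this
       page does not state explicitly.  The function name alt_names and the
       "-ib" suffix are hypothetical stand-ins for a site's naming scheme.

```sh
# Hypothetical filter: read one hostname per line on standard input and
# write the alternate name for each, in the same order.
alt_names() {
    while IFS= read -r host; do
        case $host in
            *.*) printf '%s-ib.%s\n' "${host%%.*}" "${host#*.}" ;;  # keep the domain part
            *)   printf '%s-ib\n' "$host" ;;                        # short name only
        esac
    done
}

printf '%s\n' node03 node04.example.org | alt_names
```

       The function body would be saved as an executable script and its path
       given as the argument to -transform-hostname-program.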

MPI LIBRARY OPTIONS
       Different  MPI  libraries  may  support tuning options which can	change
       their behavior or performance.  Mpiexec	does  not  explicitly  support
       these, but it does pass along the environment variables used to set the
       options.  For example, MPICH/GM has an option to set the maximum size
       for "eager" (as opposed to rendezvous) messages.  In sh or bash, this
       can be set with:

	     GMPI_EAGER=16384 mpiexec mycode

       or in csh or tcsh:

	     setenv GMPI_EAGER 16384
	     mpiexec mycode

       Other  options  can  be	found  in  the	MPI  documentation,  such   as
       GMPI_SHMEM, GMPI_RECV, P4_SOCKBUFSIZE and P4_GLOBMEMSIZE.

       Although	 not  an  MPI library implementation, the "none" communication
       device can be handy for running many copies of the same serial program.
       Programs	 spawned  with	this  device are provided an extra environment
       variable, MPIEXEC_RANK, which they can use to generate a	unique identi-
       fier in the context of the pseudo-parallel job.
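
       A serial program started with the "none" device might use MPIEXEC_RANK
       as follows.  This is a minimal sketch: the function name pick_input
       and the input.N file naming are hypothetical, and falling back to 0
       when the variable is unset is an assumption made for local testing.

```sh
# Hypothetical serial worker: choose an input file from MPIEXEC_RANK.
pick_input() {
    rank=${MPIEXEC_RANK:-0}      # fallback outside mpiexec is an assumption
    printf 'input.%d\n' "$rank"
}

# Local dry run: pretend to be the copy whose MPIEXEC_RANK is 3.
(MPIEXEC_RANK=3; pick_input)     # prints "input.3"
```

       Each copy of the program then works on its own file without any
       message passing.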

CONFIG FILE
       Each  line  of a	configuration file contains a node specification and a
       command line, separated by a single colon (:).  A command line consists
       of  an  executable  name	and arguments to be passed to that executable,
       just like when running mpiexec without a	config file.  A	node  specifi-
       cation can be either:

       -n numproc
	     Run the executable	on a certain number of processors.

       nodespec
	     Run the executable	on the named nodes specified by	nodespec.

       A node specification is a space-separated list of hostnames.  Each
       element in the list is interpreted as a case-insensitive standard shell
       wildcard pattern (see glob(7) and fnmatch(3)), so a single element may
       match multiple hostnames.  It is not an error to specify nodes in the
       nodespec that are not actually part of the pbs allocation.  This allows a
       single generic configuration file to be used in multiple	situations.
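
       The matching rule can be approximated in the shell itself.  The sketch
       below is an illustration, not mpiexec's own code: nodespec_matches is
       a hypothetical name, and case-insensitivity is emulated by lowercasing
       both the hostname and the pattern before an ordinary case match.

```sh
# Return 0 if hostname $1 matches any of the remaining wildcard patterns,
# ignoring case; return 1 otherwise.
nodespec_matches() {
    host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    shift
    for pat in "$@"; do
        pat=$(printf '%s' "$pat" | tr '[:upper:]' '[:lower:]')
        case $host in
            $pat) return 0 ;;    # unquoted, so $pat is used as a pattern
        esac
    done
    return 1
}

for h in node03 NODE12 node05; do
    if nodespec_matches "$h" node03 node04 'node1*'; then
        echo "$h: selected"
    else
        echo "$h: skipped"
    fi
done
```

       With these patterns node03 and NODE12 are selected and node05 is
       skipped.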

       Config file example
	     node03 node04 node1* : myexe -s 4
	     -n	5 : otherexe -f	2 -large

	     If	processors are available on the	nodes, run the code  myexe  on
	     node03,  node04, and any machine with a hostname matching node1*.
	     Pick up to	five other nodes on which to run  otherexe,  depending
	     on	availability and any -n	arguments.

       Note  that each node listed in a	node specification is chosen only once
       to run a	given process.	If using multiprocessor	nodes, and you do want
       to  run	two or more copies of the code on a given node,	list that node
       twice in	the line, or duplicate the config file entry.  Also note  that
       node-anonymous  specifications (e.g., -n	6) may choose other processors
       on a node that already has processes assigned; use the -pernode flag on
       the command line	if you want node-exclusive behavior.

       There  is  no  way  to  run  more  than one process per processor using
       mpiexec.	 You must explicitly spawn threads in your code	if you wish to
       do  this.  The presence of a -n argument	on the command line limits the
       total number of processors available to the configuration  file	selec-
       tion process, just as the flag -pernode limits the available nodes.

       It  is  not an error if some lines in the configuration file can	not be
       satisfied with the available nodes.  If,	however, a -n <numproc>	 argu-
       ment requests more than can be satisfied, or if no tasks	could be allo-
       cated, an error is reported.

       Finally,	the order of lines in the configuration	file is	 the  same  as
       the  order of tasks in the MPI sense when the process is	started.  Com-
       ments starting with '#' to the end of the  line	are  ignored  anywhere
       they appear in the configuration	file.
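
       To see what remains of a configuration file once comments are ignored,
       a filter such as the following can help.  It is a sketch only: the
       name strip_config is hypothetical, and the example file reuses the
       node and program names from the example above.

```sh
# Hypothetical helper: print a configuration file with comments, trailing
# blanks, and empty lines removed, leaving one "nodespec : command" per line.
strip_config() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Example file; the first remaining line supplies the first MPI tasks.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# generic layout, reusable across allocations
node03 node04 node1* : myexe -s 4    # first tasks in MPI order
-n 5 : otherexe -f 2 -large
EOF

stripped=$(strip_config "$cfg")
printf '%s\n' "$stripped"
rm -f "$cfg"
```

       Since -config accepts "-" for standard input, output like this could
       also be piped into mpiexec -nostdin -config - inside a PBS job.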

CONCURRENT MPIEXEC
       You can invoke mpiexec multiple times in the same batch job, one
       after the other,	sequentially.  But you can also	run multiple  mpiexecs
       in  the	same batch job concurrently.  In a 10-node PBS allocation, for
       example:

	     mpiexec -n	5 a.out	args1 <	input1 > output1 &
	     mpiexec -n	5 a.out	args2 <	input2 > output2 &
	     wait

       This runs two different instances of the	parallel code, each on its own
       set of 5	nodes with its own input file and output file.
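
       A plain wait discards the exit statuses of the background runs.  One
       way to keep them is to wait on each process ID in turn, as in this
       sketch.  The helper name run_both is hypothetical, and the dry run
       uses stand-in commands so that it works outside a PBS job.

```sh
# Hypothetical helper: run two command lines concurrently and succeed
# only if both exit with status zero.
run_both() {
    sh -c "$1" &
    pid1=$!
    sh -c "$2" &
    pid2=$!
    st1=0
    st2=0
    wait "$pid1" || st1=$?
    wait "$pid2" || st2=$?
    echo "first exited with $st1, second with $st2"
    [ "$st1" -eq 0 ] && [ "$st2" -eq 0 ]
}

# Dry run; inside a PBS job the two arguments would be the mpiexec
# command lines from the example above.
run_both 'exit 0' 'exit 3' || echo "at least one run failed"
```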

       The  first  invocation of mpiexec handles all interactions with PBS and
       thus waits for all subsequent ones to finish before it exits.  Communi-
       cation between the concurrent mpiexecs is mediated through a named pipe
       in /tmp that is created by the first mpiexec.

       Note that none of the command line arguments apply from one mpiexec  to
       other  concurrent ones.	For instance, -pernode applies as a constraint
       separately to each one.	The first "mpiexec -pernode" will not  reserve
       its  unused  processors	from use by subsequent concurrent ones.	 To do
       something like this, a configuration file may be	your best option.

       Finally,	since only one mpiexec can be the master at a  time,  if  your
       code  setup requires that mpiexec exit to get a result, you can start a
       "dummy" mpiexec first in	your batch job:

	     mpiexec -server

       It runs no tasks	itself but handles the connections of other  transient
       mpiexec clients.	 It will shut down cleanly when	the batch job exits or
       you may kill the	server explicitly.   If	 the  server  is  killed  with
       SIGTERM	(or  HUP  or INT), it will exit	with a status of zero if there
       were no clients connected at the	time.  If there	were still clients us-
       ing  the	 server, the server will kill all their	tasks, disconnect from
       the clients, and	exit with status 1.

       If you are using	mpich/p4, be aware that	limitations  in	 the  mpich/p4
       library	restrict  all task zeros to be on the same node	as the mpiexec
       process itself, hence concurrency is severely  limited.	 You  can  use
       -pernode	to permit one concurrent job for each CPU in the node, though.

EXAMPLES
       mpiexec a.out
             Run the executable a.out as a parallel MPI code on each processor
             allocated by PBS.

       mpiexec -n 2 a.out -b 4
	     Run the code with arguments -b 4 on only two processors.

       mpiexec -pernode	-conf my.config
	     Run only one process on each node,	using the nodes	 and  executa-
	     bles listed in the	configuration file my.config.

       mpiexec mycode >out 2>err
             Using a sh-compatible shell, send the standard output of all
             processes to the file out, and the standard error to err.

       mpiexec mycode >& output
	     Using a csh-compatible shell, combine the standard	output and er-
	     ror streams of all	processes to the file output.

       mpiexec mycode |	sort > output
	     Sort  the output of the processes.	 Standard error	will appear as
	     the standard error	of the mpiexec process.

       mpiexec -comm none -pernode mkdir /tmp/my-temp-dir
	     Run the standard unix command mkdir on each of the	SMP  nodes  in
	     your PBS node allocation for this job.

       mpiexec -comm mpich-p4 mycode-p4
	     Run  a  code compiled using MPICH/P4, even	though your system ad-
	     ministrator has chosen MPICH/GM as	a default.

       mpiexec --transform-hostname='s/su/10.1./; s/cn/./'
	     For each hostname provided	by PBS,	translate it using  the	 given
	     sed  command  to generate the list	of names passed	to the MPI li-
	     brary.
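
       A sed expression can be checked by hand before it is given to
       -transform-hostname, since mpiexec is documented to invoke sed -e
       sed_expression.  The hostnames below are hypothetical; real names
       depend on the site.

```sh
# Hypothetical PBS hostnames, one per line.
hosts='su1cn4
su2cn17'

# Apply the expression from the example above.
transformed=$(printf '%s\n' "$hosts" | sed -e 's/su/10.1./; s/cn/./')
printf '%s\n' "$transformed"
```

       For these two names the result is 10.1.1.4 and 10.1.2.17.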

ENVIRONMENT VARIABLES
       Mpiexec uses PBS_JOBID as deposited in the environment by pbs  to  con-
       tact the	pbs daemons.  When looking for the executable to run, the PATH
       environment variable is consulted, as well as searching in the  current
       working	directory,  and	jobs are started using SHELL on	all the	nodes.
       For totalview debugging	runs,  the  settings  in  DISPLAY  and	LM_LI-
       CENSE_FILE may be important.

       To  specify  a default communication library, the variable MPIEXEC_COMM
       may be set to one of the	accepted values	for -comm as documented	above.
       The  command-line  argument takes precedence over the environment vari-
       able, and if neither is set, the	compiled-in default is used.
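
       The precedence just described can be written out as a small model.
       This is a sketch of the documented order, not mpiexec's own code: the
       function name effective_comm is hypothetical, and the compiled-in
       default is passed as an argument because it varies by installation.

```sh
# Hypothetical model of the documented precedence:
# -comm argument, then MPIEXEC_COMM, then the compiled-in default.
effective_comm() {
    cli_value=$1           # value given to -comm, or empty
    builtin_default=$2     # compiled-in default; site-specific
    if [ -n "$cli_value" ]; then
        echo "$cli_value"
    elif [ -n "${MPIEXEC_COMM:-}" ]; then
        echo "$MPIEXEC_COMM"
    else
        echo "$builtin_default"
    fi
}

MPIEXEC_COMM=mpich-p4
effective_comm lam mpich-gm     # prints "lam"
effective_comm "" mpich-gm      # prints "mpich-p4"
unset MPIEXEC_COMM
effective_comm "" mpich-gm      # prints "mpich-gm"
```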

       Note that mpiexec does pass all variables in the environment which it
       was given, but PBS will not copy your entire environment for batch jobs
       at job submission time unless you invoke qsub with the -V argument.

DIAGNOSTICS
       mpiexec:	Warning: tasks <tasknum>,... exited with status	<exitval>.
	     One  or  more  of the tasks in the	parallel process exited	with a
	     non-zero exit status.  This is the	value a	program	returns	to its
	     environment  when	it  finishes,  either with "return exitval" or
	     "exit(exitval)", or in FORTRAN, "STOP exitval".  Tradition	 holds
	     that a program which terminates correctly should return zero, and
	     hence mpiexec warns if it sees otherwise.	Due to race conditions
	     inherent  in  the	TM interface, sometimes	mpiexec	will report an
	     exit value	of zero	even though it was actually otherwise.

       mpiexec:	Warning: task <tasknum>	died with signal <signum>
	     One of the	tasks in the parallel process exited due to receipt of
	     an	 uncaught signal.  The symbolic	names of signal	numbers	can be
	     listed with "kill -l".  Common ones are SIGSEGV (11)  and	SIGBUS
	     (7),  both	 of which generally indicate a program error.  Others,
	     SIGINT (2), SIGKILL (9), and SIGTERM (15),	 may  occur  when  the
	     task is killed or interrupted externally.

ERRORS
       tm: not connected
	     A	fatal  error  occurred	in  communications between the mpiexec
	     process and the local pbs_mom.  This might	occur due to  bugs  in
	     pbs_mom, and is not recoverable.

       mpiexec:	 Error:	 PBS_JOBID  not	 set in	environment.  Code must	be run
       from a PBS script, perhaps interactively	using "qsub -I".
	     It	is not possible	to run mpiexec unless you are within a PBS en-
	     vironment,	either created in a batch or interactive PBS job.  See
	     the man page for qsub on how to submit a job.

EXIT VALUE
       Mpiexec returns to its environment the exit status  of  process	number
       zero  in	a parallel task.  With this, scripts which use mpiexec can ac-
       cess the	return value of	the parallel program.	If  task  zero	exited
       with a signal, as opposed to naturally with STOP	or exit(), mpiexec re-
       turns 256 + signum, where signum	is the signal that killed  task	 zero.
       This is a convention inherited from PBS.
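
       The convention above can be decoded with simple arithmetic.  The
       sketch below only interprets a number that the caller already holds;
       how a particular shell reports values above 255 is outside its scope.
       The function name describe_status is hypothetical.

```sh
# Hypothetical decoder for the documented convention:
# values of 256 or more mean "256 + signal number".
describe_status() {
    status=$1
    if [ "$status" -ge 256 ]; then
        signum=$((status - 256))
        echo "task zero killed by signal $signum ($(kill -l "$signum"))"
    elif [ "$status" -eq 0 ]; then
        echo "task zero exited normally"
    else
        echo "task zero exited with status $status"
    fi
}

describe_status 0
describe_status 2
describe_status 265
```

       The last call corresponds to signal 9, for which kill -l supplies the
       symbolic name.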

AUTHOR
       Pete Wyckoff <pw@osc.edu>

SEE ALSO
       mpirun(1), pbs(1B), tm(3B), qsub(1B), totalview(1), kill(1)

OSC MPI	utilities		  21 Sep 2004			    MPIEXEC(1)
