fi_endpoint(3)			   @VERSION@			fi_endpoint(3)

NAME
       fi_endpoint - Fabric endpoint operations

       fi_endpoint / fi_scalable_ep / fi_passive_ep / fi_close
	      Allocate or close	an endpoint.

       fi_ep_bind
	      Associate	 an  endpoint  with  hardware resources, such as event
	      queues, completion queues, counters, address vectors, or	shared
	      transmit/receive contexts.

       fi_scalable_ep_bind
	      Associate	a scalable endpoint with an address vector

       fi_pep_bind
	      Associate	a passive endpoint with	an event queue

       fi_enable
	      Transitions an active endpoint into an enabled state.

       fi_cancel
	      Cancel a pending asynchronous data transfer

       fi_ep_alias
	      Create an	alias to the endpoint

       fi_control
	      Control endpoint operation.

       fi_getopt / fi_setopt
	      Get or set endpoint options.

       fi_rx_context / fi_tx_context / fi_srx_context /	fi_stx_context
	      Open a transmit or receive context.

       fi_tc_dscp_set /	fi_tc_dscp_get
	      Convert between a	DSCP value and a network traffic class

       fi_rx_size_left / fi_tx_size_left (DEPRECATED)
	      Query the lower bound on how many RX/TX operations may be posted
	      without an operation returning -FI_EAGAIN.  These functions have
	      been deprecated and will be removed in a future version of the
	      library.

SYNOPSIS
	      #include <rdma/fabric.h>

	      #include <rdma/fi_endpoint.h>

	      int fi_endpoint(struct fid_domain	*domain, struct	fi_info	*info,
		  struct fid_ep	**ep, void *context);

	      int fi_scalable_ep(struct	fid_domain *domain, struct fi_info *info,
		  struct fid_ep	**sep, void *context);

	      int fi_passive_ep(struct fid_fabric *fabric, struct fi_info *info,
		  struct fid_pep **pep,	void *context);

	      int fi_tx_context(struct fid_ep *sep, int	index,
		  struct fi_tx_attr *attr, struct fid_ep **tx_ep,
		  void *context);

	      int fi_rx_context(struct fid_ep *sep, int	index,
		  struct fi_rx_attr *attr, struct fid_ep **rx_ep,
		  void *context);

	      int fi_stx_context(struct	fid_domain *domain,
		  struct fi_tx_attr *attr, struct fid_stx **stx,
		  void *context);

	      int fi_srx_context(struct	fid_domain *domain,
		  struct fi_rx_attr *attr, struct fid_ep **rx_ep,
		  void *context);

	      int fi_close(struct fid *ep);

	      int fi_ep_bind(struct fid_ep *ep,	struct fid *fid, uint64_t flags);

	      int fi_scalable_ep_bind(struct fid_ep *sep, struct fid *fid, uint64_t flags);

	      int fi_pep_bind(struct fid_pep *pep, struct fid *fid, uint64_t flags);

	      int fi_enable(struct fid_ep *ep);

	      int fi_cancel(struct fid_ep *ep, void *context);

	      int fi_ep_alias(struct fid_ep *ep, struct	fid_ep **alias_ep, uint64_t flags);

	      int fi_control(struct fid	*ep, int command, void *arg);

	      int fi_getopt(struct fid *ep, int	level, int optname,
		  void *optval,	size_t *optlen);

	      int fi_setopt(struct fid *ep, int	level, int optname,
		  const	void *optval, size_t optlen);

	      uint32_t fi_tc_dscp_set(uint8_t dscp);

	      uint8_t fi_tc_dscp_get(uint32_t tclass);

	      DEPRECATED ssize_t fi_rx_size_left(struct	fid_ep *ep);

	      DEPRECATED ssize_t fi_tx_size_left(struct	fid_ep *ep);

ARGUMENTS
       fid    On creation, specifies a fabric  or  access  domain.   On	 bind,
	      identifies  the  event  queue, completion	queue, counter,	or ad-
	      dress vector to bind to the endpoint.  In	other  cases,  it's  a
	      fabric identifier	of an associated resource.

       info   Details  about  the  fabric interface endpoint to	be opened, ob-
	      tained from fi_getinfo.

       ep     A	fabric endpoint.

       sep    A	scalable fabric	endpoint.

       pep    A	passive	fabric endpoint.

       context
	      Context associated with the endpoint or asynchronous operation.

       index  Index to retrieve	a specific transmit/receive context.

       attr   Transmit or receive context attributes.

       flags  Additional flags to apply	to the operation.

       command
	      Command of control operation to perform on endpoint.

       arg    Optional control argument.

       level  Protocol level at	which the desired option resides.

       optname
	      The protocol option to read or set.

       optval The option value that was	read or	to set.

       optlen The size of the optval buffer.

DESCRIPTION
       Endpoints are transport level communication  portals.   There  are  two
       types  of endpoints: active and passive.	 Passive endpoints belong to a
       fabric domain and are most often	used to	listen for incoming connection
       requests.   However, a passive endpoint may be used to reserve a	fabric
       address that can	be granted to an active	 endpoint.   Active  endpoints
       belong to access	domains	and can	perform	data transfers.

       Active  endpoints may be	connection-oriented or connectionless, and may
       provide data reliability.  The data  transfer  interfaces  --  messages
       (fi_msg),  tagged  messages  (fi_tagged),  RMA  (fi_rma),  and  atomics
       (fi_atomic) -- are associated with active endpoints.  In	basic configu-
       rations,	an active endpoint has transmit	and receive queues.  In	gener-
       al, operations that generate traffic on the fabric are  posted  to  the
       transmit	 queue.	  This	includes  all RMA and atomic operations, along
       with sent messages and sent tagged messages.  Operations	that post buf-
       fers for	receiving incoming data	are submitted to the receive queue.

       Active  endpoints are created in	the disabled state.  They must transi-
       tion into an enabled state before accepting data	 transfer  operations,
       including  posting  of  receive buffers.	 The fi_enable call is used to
       transition an active endpoint into an enabled  state.   The  fi_connect
       and fi_accept calls will also transition an endpoint into the enabled
       state, if it is not already enabled.

       In order	to transition an endpoint into an enabled state,  it  must  be
       bound  to one or	more fabric resources.	An endpoint that will generate
       asynchronous completions, either	through	data  transfer	operations  or
       communication  establishment  events,  must be bound to the appropriate
       completion queues or event queues, respectively,	before being  enabled.
       Additionally,  endpoints	 that  use  manual progress must be associated
       with relevant completion	queues or  event  queues  in  order  to	 drive
       progress.   For	endpoints  that	 are only used as the target of	RMA or
       atomic operations, this means binding  the  endpoint  to	 a  completion
       queue  associated  with receive processing.  Unconnected	endpoints must
       be bound	to an address vector.
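
       As a minimal sketch, the following fragment shows the typical
       allocate, bind, and enable sequence for a connectionless endpoint.
       The domain, completion queue, and address vector objects (domain, cq,
       av) are assumed to have already been opened, and error handling is
       abbreviated.

              struct fid_ep *ep;
              int ret;

              /* Allocate an active endpoint using attributes from fi_getinfo */
              ret = fi_endpoint(domain, info, &ep, NULL);
              if (ret)
                  return ret;

              /* Bind a CQ for both transmit and receive completions */
              ret = fi_ep_bind(ep, &cq->fid, FI_TRANSMIT | FI_RECV);
              if (ret)
                  return ret;

              /* Unconnected endpoints must also be bound to an address vector */
              ret = fi_ep_bind(ep, &av->fid, 0);
              if (ret)
                  return ret;

              /* Transition the endpoint into the enabled state */
              ret = fi_enable(ep);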

       Once an endpoint	has been activated, it may be associated with  an  ad-
       dress  vector.	Receive	 buffers  may be posted	to it and calls	may be
       made to connection establishment	 routines.   Connectionless  endpoints
       may also	perform	data transfers.

       The behavior of an endpoint may be adjusted by setting its control data
       and protocol options.  This allows the underlying provider to  redirect
       function	 calls to implementations optimized to meet the	desired	appli-
       cation behavior.

       If an endpoint experiences a critical error, it	will  transition  back
       into  a disabled	state.	Critical errors	are reported through the event
       queue associated	with the EP.  In certain cases,	 a  disabled  endpoint
       may  be	re-enabled.   The  ability  to transition back into an enabled
       state is	provider specific and depends on the type of  error  that  the
       endpoint	 experienced.	When  an endpoint is disabled as a result of a
       critical	error, all pending operations are discarded.

   fi_endpoint / fi_passive_ep / fi_scalable_ep
       fi_endpoint allocates a new active endpoint.  fi_passive_ep allocates a
       new  passive  endpoint.	 fi_scalable_ep	allocates a scalable endpoint.
       The properties and behavior of the endpoint are defined	based  on  the
       provided	 struct	 fi_info.   See	 fi_getinfo  for additional details on
       fi_info.	 fi_info flags that control the	operation of an	 endpoint  are
       defined below.  See section SCALABLE ENDPOINTS.

       If  an active endpoint is allocated in order to accept a	connection re-
       quest, the fi_info parameter must be the	same as	the fi_info  structure
       provided	with the connection request (FI_CONNREQ) event.

       An  active endpoint may acquire the properties of a passive endpoint by
       setting the fi_info handle field	to the	passive	 endpoint  fabric  de-
       scriptor.   This	 is  useful  for applications that need	to reserve the
       fabric address of an endpoint prior to knowing if the endpoint will  be
       used  on	the active or passive side of a	connection.  For example, this
       feature is useful for simulating	socket semantics.  Once	an active end-
       point  acquires	the properties of a passive endpoint, the passive end-
       point is	no longer bound	to any fabric resources	and must no longer  be
       used.  The user is expected to close the	passive	endpoint after opening
       the active endpoint in order to free up any  lingering  resources  that
       had been	used.

   fi_close
       Closes an endpoint and releases all resources associated with it.

       When closing a scalable endpoint, there must be no open transmit or
       receive contexts associated with the scalable endpoint.  If resources
       are still associated with the scalable endpoint when attempting to
       close, the call will return -FI_EBUSY.

       Outstanding operations posted to	the endpoint when fi_close  is	called
       will be discarded.  Discarded operations	will silently be dropped, with
       no completions reported.	 Additionally, a provider may  discard	previ-
       ously  completed	 operations  from  the associated completion queue(s).
       The behavior to discard completed operations is provider	specific.

   fi_ep_bind
       fi_ep_bind is used to associate an endpoint with	 other	allocated  re-
       sources,	 such  as  completion queues, counters,	address	vectors, event
       queues, shared contexts,	and memory regions.  The type of objects  that
       must be bound with an endpoint depend on	the endpoint type and its con-
       figuration.

       Passive endpoints must be bound with an	EQ  that  supports  connection
       management  events.  Connectionless endpoints must be bound to a	single
       address vector.	If an endpoint is using	a shared transmit  and/or  re-
       ceive context, the shared contexts must be bound	to the endpoint.  CQs,
       counters, AVs, and shared contexts must be bound to endpoints before
       the endpoints are enabled, either explicitly or implicitly.

       An endpoint must	be bound with CQs capable of reporting completions for
       any asynchronous	operation initiated on the endpoint.  For example,  if
       the  endpoint  supports	any  outbound  transfers (sends, RMA, atomics,
       etc.), then it must be bound to a  completion  queue  that  can	report
       transmit	 completions.  This is true even if the	endpoint is configured
       to suppress successful completions, in order that operations that  com-
       plete in	error may be reported to the user.

       An  active  endpoint  may  direct asynchronous completions to different
       CQs,  based  on	the  type  of  operation.   This  is  specified	 using
       fi_ep_bind flags.  The following	flags may be OR'ed together when bind-
       ing an endpoint to a completion domain CQ.

       FI_RECV
	      Directs the notification of inbound data transfers to the	speci-
	      fied  completion	queue.	This includes received messages.  This
	      binding automatically includes FI_REMOTE_WRITE, if applicable to
	      the endpoint.

       FI_SELECTIVE_COMPLETION
	      By default, data transfer	operations write CQ completion entries
	      into the associated completion queue after they have successful-
	      ly completed.  Applications can use this bind flag to selective-
	      ly enable	when completions are generated.	 If  FI_SELECTIVE_COM-
	      PLETION is specified, data transfer operations will not generate
	      CQ entries for successful	completions  unless  FI_COMPLETION  is
	      set  as an operational flag for the given	operation.  Operations
	      that fail	asynchronously will still generate  completions,  even
	      if  a completion is not requested.  FI_SELECTIVE_COMPLETION must
	      be OR'ed with FI_TRANSMIT	and/or FI_RECV flags.

       When FI_SELECTIVE_COMPLETION is set, the user must determine when a
       request that does NOT have FI_COMPLETION set has completed indirectly,
       usually based on the completion of a subsequent operation or by using
       completion counters.  Use of this flag may improve performance by
       allowing the provider to avoid writing a CQ completion entry for every
       operation.  A usage sketch follows the flag descriptions below.

       See Notes section below for additional information on how this flag in-
       teracts with the	FI_CONTEXT and FI_CONTEXT2 mode	bits.

       FI_TRANSMIT
	      Directs the completion of	outbound data transfer requests	to the
	      specified	 completion  queue.   This includes send message, RMA,
	      and atomic operations.
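
       As referenced above, the following sketch shows one way to combine an
       FI_SELECTIVE_COMPLETION binding with per-operation completion flags.
       The endpoint, CQ, buffer, and destination address (ep, cq, buf, len,
       dest_addr) are assumed to exist already; fi_sendmsg is used because it
       accepts per-operation flags, and includes and error handling are
       abbreviated.

              struct fi_msg msg;
              struct iovec iov;
              ssize_t ret;

              /* Suppress successful transmit completions unless FI_COMPLETION
               * is set on the individual operation. */
              ret = fi_ep_bind(ep, &cq->fid, FI_TRANSMIT | FI_SELECTIVE_COMPLETION);
              ret = fi_enable(ep);

              iov.iov_base = buf;
              iov.iov_len = len;
              memset(&msg, 0, sizeof msg);
              msg.msg_iov = &iov;
              msg.iov_count = 1;
              msg.addr = dest_addr;

              /* This send writes a CQ entry even on success... */
              ret = fi_sendmsg(ep, &msg, FI_COMPLETION);

              /* ...while this one only reports a completion if it fails. */
              ret = fi_sendmsg(ep, &msg, 0);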

       An endpoint may optionally be bound to a	completion counter.  Associat-
       ing  an endpoint	with a counter is in addition to binding the EP	with a
       CQ.  When binding an endpoint to	a counter, the following flags may  be
       specified.

       FI_READ
	      Increments  the  specified  counter whenever an RMA read,	atomic
	      fetch, or	atomic compare operation initiated from	 the  endpoint
	      has completed successfully or in error.

       FI_RECV
	      Increments  the specified	counter	whenever a message is received
	      over the endpoint.  Received messages include  both  tagged  and
	      normal message operations.

       FI_REMOTE_READ
	      Increments  the  specified  counter whenever an RMA read,	atomic
	      fetch, or	atomic compare operation is initiated  from  a	remote
	      endpoint	that targets the given endpoint.  Use of this flag re-
	      quires that the endpoint be created using	FI_RMA_EVENT.

       FI_REMOTE_WRITE
	      Increments the specified counter whenever	an RMA write  or  base
	      atomic  operation	 is initiated from a remote endpoint that tar-
	      gets the given endpoint.	Use of this  flag  requires  that  the
	      endpoint be created using	FI_RMA_EVENT.

       FI_SEND
	      Increments  the  specified  counter  whenever a message transfer
	      initiated	over the endpoint has completed	successfully or	in er-
	      ror.  Sent messages include both tagged and normal message oper-
	      ations.

       FI_WRITE
	      Increments the specified counter whenever	an RMA write  or  base
	      atomic  operation	initiated from the endpoint has	completed suc-
	      cessfully	or in error.

       An endpoint may only be bound to	a single CQ or	counter	 for  a	 given
       type of operation.  For example, an EP may not bind to two counters both
       using FI_WRITE.	Furthermore, providers may limit CQ and	counter	 bind-
       ings to endpoints of the	same endpoint type (DGRAM, MSG,	RDM, etc.).
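
       As a sketch of counter bindings, the fragment below uses one counter to
       track completed sends and received messages and a second counter to
       track locally initiated RMA writes.  The counters (cntr, write_cntr)
       are assumed to have been opened with fi_cntr_open, and error handling
       is omitted.

              /* Count completed sends and received messages on this endpoint */
              ret = fi_ep_bind(ep, &cntr->fid, FI_SEND | FI_RECV);

              /* Separately count RMA writes and base atomics initiated locally */
              ret = fi_ep_bind(ep, &write_cntr->fid, FI_WRITE);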

   fi_scalable_ep_bind
       fi_scalable_ep_bind  is	used  to associate a scalable endpoint with an
       address vector.	See section on SCALABLE	ENDPOINTS.   A	scalable  end-
       point  has  a  single  transport	level address and can support multiple
       transmit	and receive contexts.  The transmit and	receive	contexts share
       the  transport-level  address.  Address vectors that are	bound to scal-
       able endpoints are implicitly bound to any transmit or receive contexts
       created using the scalable endpoint.
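
       The fragment below is a sketch of scalable endpoint setup, assuming
       fi_getinfo returned an fi_info whose ep_attr requested multiple
       transmit and receive contexts.  NULL attribute pointers are passed to
       inherit defaults, and whether the scalable endpoint itself requires a
       separate fi_enable call may vary by provider; error handling is
       omitted.

              struct fid_ep *sep, *tx_ep, *rx_ep;
              int ret;

              ret = fi_scalable_ep(domain, info, &sep, NULL);
              ret = fi_scalable_ep_bind(sep, &av->fid, 0);

              /* Derive one transmit and one receive context (index 0) */
              ret = fi_tx_context(sep, 0, NULL, &tx_ep, NULL);
              ret = fi_rx_context(sep, 0, NULL, &rx_ep, NULL);

              /* Each context is bound to its own CQ and enabled separately */
              ret = fi_ep_bind(tx_ep, &tx_cq->fid, FI_TRANSMIT);
              ret = fi_ep_bind(rx_ep, &rx_cq->fid, FI_RECV);
              ret = fi_enable(tx_ep);
              ret = fi_enable(rx_ep);
              ret = fi_enable(sep);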

   fi_enable
       This  call transitions the endpoint into	an enabled state.  An endpoint
       must be enabled before it may be	used to	perform	data  transfers.   En-
       abling  an  endpoint  typically results in hardware resources being as-
       signed to it.  Endpoints	making use  of	completion  queues,  counters,
       event queues, and/or address vectors must be bound to them before being
       enabled.

       Calling connect or accept on an endpoint	will implicitly	enable an end-
       point if	it has not already been	enabled.

       fi_enable  may also be used to re-enable	an endpoint that has been dis-
       abled as	a result  of  experiencing  a  critical	 error.	  Applications
       should check the return value from fi_enable to see if a disabled
       endpoint has been successfully re-enabled.

   fi_cancel
       fi_cancel attempts to cancel  an	 outstanding  asynchronous  operation.
       Canceling an operation causes the fabric	provider to search for the op-
       eration and, if it is still pending, complete it	as  having  been  can-
       celed.	An error queue entry will be available in the associated error
       queue with error	code FI_ECANCELED.  On the other hand, if  the	opera-
       tion completed before the call to fi_cancel, then the completion	status
       of that operation will be available in the associated completion	queue.
       No specific entry related to fi_cancel itself will be posted.

       Cancel uses the context parameter associated with an operation to iden-
       tify the	request	to cancel.  Operations posted without a	valid  context
       parameter  --  either  no context parameter is specified	or the context
       value was ignored by the	provider -- cannot be canceled.	  If  multiple
       outstanding  operations	match  the context parameter, only one will be
       canceled.  In this case,	the operation which is	canceled  is  provider
       specific.   The	cancel	operation  is  asynchronous, but will complete
       within a	bounded	period of time.
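
       As a sketch, the fragment below posts a receive identified by an
       application context and later attempts to cancel it, then retrieves
       the resulting error completion.  The endpoint, CQ, and receive buffer
       (ep, cq, buf, len) are assumed to exist, and fi_cancel is invoked on
       the endpoint's fid as done by the inline helpers in
       rdma/fi_endpoint.h.

              struct fi_context ctx;
              struct fi_cq_err_entry err;
              ssize_t ret;

              /* Post a receive identified by &ctx */
              ret = fi_recv(ep, buf, len, NULL, FI_ADDR_UNSPEC, &ctx);

              /* Later: request cancellation of that specific operation */
              ret = fi_cancel(&ep->fid, &ctx);

              /* If the receive was still pending, it completes in error with
               * FI_ECANCELED and is reported through the CQ error path. */
              ret = fi_cq_readerr(cq, &err, 0);
              if (ret > 0 && err.err == FI_ECANCELED)
                  ;   /* the posted receive was canceled */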

   fi_ep_alias
       This call creates an alias to the specified endpoint.  Conceptually, an
       endpoint	alias provides an alternate software path from the application
       to the underlying provider hardware.  An	alias EP differs from its par-
       ent  endpoint only by its default data transfer flags.  For example, an
       alias EP	may be configured to use a different completion	mode.  By  de-
       fault,  an alias	EP inherits the	same data transfer flags as the	parent
       endpoint.  An application can use fi_control to modify the alias	EP op-
       erational flags.

       When  allocating	 an  alias,  an	 application  may configure either the
       transmit	or receive operational flags.  This avoids needing a  separate
       call to fi_control to set those flags.  The flags passed	to fi_ep_alias
       must include FI_TRANSMIT	or FI_RECV (not	both) with  other  operational
       flags  OR'ed in.	 This will override the	transmit or receive flags, re-
       spectively, for operations posted through the alias endpoint.  All  al-
       located	aliases	 must  be closed for the underlying endpoint to	be re-
       leased.
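
       The fragment below sketches alias creation, giving the alias different
       default transmit completion behavior than its parent endpoint; the
       flag combination is illustrative only.

              struct fid_ep *alias_ep;
              int ret;

              /* Alias whose transmit operations request explicit completions */
              ret = fi_ep_alias(ep, &alias_ep, FI_TRANSMIT | FI_COMPLETION);
              if (ret)
                  return ret;

              /* Transfers posted through alias_ep use the alias' flags, while
               * transfers posted through ep keep the parent's defaults.  Both
               * the alias and the parent must eventually be closed. */
              ret = fi_close(&alias_ep->fid);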

   fi_control
       The control operation is	used to	adjust the default behavior of an end-
       point.  It allows the underlying	provider to redirect function calls to
       implementations optimized to meet the desired application behavior.  As
       a result, calls to fi_control must be serialized against all other
       calls to an endpoint.

       The base	operation of an	endpoint is  selected  during  creation	 using
       struct  fi_info.	  The  following control commands and arguments	may be
       assigned	to an endpoint.

       FI_BACKLOG - int *value
	      This option only applies to passive endpoints.  It  is  used  to
	      set the connection request backlog for listening endpoints.

       FI_GETOPSFLAG -- uint64_t *flags
	      Used  to retrieve	the current value of flags associated with the
	      data transfer operations initiated on the	endpoint.  The control
	      argument must include FI_TRANSMIT	or FI_RECV (not	both) flags to
	      indicate the type	of data	transfer flags to  be  returned.   See
	      below for	a list of control flags.

       FI_GETWAIT -- void **
	      This command allows the user to retrieve the file	descriptor as-
	      sociated with a socket endpoint.	The fi_control	arg  parameter
	      should  be  an  address where a pointer to the returned file de-
	      scriptor will be written.  See fi_eq.3 for additional details on
	      using fi_control with FI_GETWAIT.  The file descriptor may be used
	      for notification that the	endpoint is ready to send  or  receive
	      data.

       FI_SETOPSFLAG -- uint64_t *flags
	      Used to change the data transfer operation flags associated with
	      an endpoint.  The	control	argument must include  FI_TRANSMIT  or
	      FI_RECV  (not  both)  to indicate	the type of data transfer that
	      the flags	should apply to, with other flags OR'ed	in.  The given
	      flags will override the previous transmit	and receive attributes
	      that were	set when the  endpoint	was  created.	Valid  control
	      flags are	defined	below.
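
       As a sketch of FI_GETOPSFLAG and FI_SETOPSFLAG usage, the fragment
       below reads the current default transmit flags and then replaces them;
       the flag values chosen are illustrative.

              uint64_t flags;
              int ret;

              /* Retrieve the current default transmit operation flags */
              flags = FI_TRANSMIT;
              ret = fi_control(&ep->fid, FI_GETOPSFLAG, &flags);

              /* Replace the defaults so transmits request explicit completions */
              flags = FI_TRANSMIT | FI_COMPLETION;
              ret = fi_control(&ep->fid, FI_SETOPSFLAG, &flags);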

   fi_getopt / fi_setopt
       Endpoint protocol options may be retrieved using fi_getopt or set
       using fi_setopt.  Applications specify the level at which a desired
       option exists, identify the option, and provide input/output buffers
       to get or set the option.  fi_setopt provides an application a way to
       adjust low-level protocol and implementation-specific details of an
       endpoint.

       The  following  option  levels  and option names	and parameters are de-
       fined.

       FI_OPT_ENDPOINT

       FI_OPT_BUFFERED_LIMIT - size_t
	      Defines the maximum size of a buffered message that will be  re-
	      ported  to  users	 as  part  of  a  receive  completion when the
	      FI_BUFFERED_RECV mode is enabled on an endpoint.

       fi_getopt() will return the currently configured threshold, or the
       provider's default threshold if one has not been set by the application.
       fi_setopt() allows an application to configure the threshold.   If  the
       provider	 cannot	 support  the  requested  threshold,  it will fail the
       fi_setopt()  call  with	FI_EMSGSIZE.   Calling	fi_setopt()  with  the
       threshold  set  to  SIZE_MAX will set the threshold to the maximum sup-
       ported by the provider.	fi_getopt() can	then be	used to	 retrieve  the
       set size.

       In most cases, the sending and receiving endpoints must be configured
       to use the same threshold value, and the threshold must be set prior to
       enabling the endpoint.

       FI_OPT_BUFFERED_MIN - size_t
	      Defines  the minimum size	of a buffered message that will	be re-
	      ported.  Applications would set this to a size large enough to
	      decide whether to discard or claim a buffered receive when a
	      buffered receive completion arrives.  The value is typically
	      used by a provider when sending
	      a	rendezvous protocol request  where  it	would  send  at	 least
	      FI_OPT_BUFFERED_MIN  bytes of application	data along with	it.  A
	      smaller sized rendezvous protocol	 message  usually  results  in
	      better latency for the overall transfer of a large message.

       FI_OPT_CM_DATA_SIZE - size_t
	      Defines  the size	of available space in CM messages for user-de-
	      fined data.  This	value limits the amount	of data	that  applica-
	      tions  can exchange between peer endpoints using the fi_connect,
	      fi_accept, and fi_reject operations.  The	size returned  is  de-
	      pendent  upon the	properties of the endpoint, except in the case
	      of passive endpoints, in which the  size	reflects  the  maximum
	      size of the data that may	be present as part of a	connection re-
	      quest event.  This option	is read	only.

       FI_OPT_MIN_MULTI_RECV - size_t
	      Defines the minimum receive buffer space available when the  re-
	      ceive  buffer  is	 released by the provider (see FI_MULTI_RECV).
	      Modifying	this value is only guaranteed to set the minimum  buf-
	      fer  space  needed  on  receives posted after the	value has been
	      changed.	It is recommended that applications that want to over-
	      ride the default MIN_MULTI_RECV value set	this option before en-
	      abling the corresponding endpoint.
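
       The fragment below sketches setting and querying two of the options
       above, assuming the endpoint has not yet been enabled and that the
       provider supports the requested minimum; the 4096-byte value is
       illustrative.

              size_t min_recv = 4096;
              size_t cm_size;
              size_t len = sizeof cm_size;
              int ret;

              /* Set the minimum free space retained in a multi-recv buffer */
              ret = fi_setopt(&ep->fid, FI_OPT_ENDPOINT, FI_OPT_MIN_MULTI_RECV,
                              &min_recv, sizeof min_recv);

              /* Query how much user data fits in CM messages (read only) */
              ret = fi_getopt(&ep->fid, FI_OPT_ENDPOINT, FI_OPT_CM_DATA_SIZE,
                              &cm_size, &len);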

   fi_tc_dscp_set
       This call converts a defined DSCP value into a libfabric traffic class
       value.  It should be used when assigning a DSCP value to the tclass
       field in either the domain or endpoint attributes.

   fi_tc_dscp_get
       This call returns the DSCP value	associated with	the tclass  field  for
       the domain or endpoint attributes.
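
       As a sketch, the conversion helpers may be used when requesting a
       traffic class through fi_getinfo hints; the DSCP value 46 (expedited
       forwarding) and the hints/info structures are illustrative.

              /* Request a DSCP-based traffic class for transmit operations */
              hints->tx_attr->tclass = fi_tc_dscp_set(46);

              /* Recover the DSCP value from a returned traffic class */
              uint8_t dscp = fi_tc_dscp_get(info->tx_attr->tclass);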

   fi_rx_size_left (DEPRECATED)
       This  function has been deprecated and will be removed in a future ver-
       sion of the library.  It	may not	be supported by	all providers.

       The fi_rx_size_left call	returns	a lower	bound on the number of receive
       operations that may be posted to	the given endpoint without that	opera-
       tion returning -FI_EAGAIN.  Depending on	the specific  details  of  the
       subsequently  posted  receive  operations (e.g.,	number of iov entries,
       which receive function is called, etc.),	it may	be  possible  to  post
       more receive operations than originally indicated by fi_rx_size_left.

   fi_tx_size_left (DEPRECATED)
       This  function has been deprecated and will be removed in a future ver-
       sion of the library.  It	may not	be supported by	all providers.

       The fi_tx_size_left call	returns	a lower	bound on the number of	trans-
       mit  operations	that  may be posted to the given endpoint without that
       operation returning -FI_EAGAIN.	Depending on the specific  details  of
       the  subsequently  posted  transmit operations (e.g., number of iov en-
       tries, which transmit function is called, etc.),	it may be possible  to
       post   more   transmit	operations   than   originally	 indicated  by
       fi_tx_size_left.

ENDPOINT ATTRIBUTES
       The fi_ep_attr structure	defines	the set	of attributes associated  with
       an  endpoint.   Endpoint	 attributes  may  be further refined using the
       transmit	and receive context attributes as shown	below.

	      struct fi_ep_attr	{
		  enum fi_ep_type type;
		  uint32_t	  protocol;
		  uint32_t	  protocol_version;
		  size_t	  max_msg_size;
		  size_t	  msg_prefix_size;
		  size_t	  max_order_raw_size;
		  size_t	  max_order_war_size;
		  size_t	  max_order_waw_size;
		  uint64_t	  mem_tag_format;
		  size_t	  tx_ctx_cnt;
		  size_t	  rx_ctx_cnt;
		  size_t	  auth_key_size;
		  uint8_t	  *auth_key;
	      };

   type	- Endpoint Type
       If specified, indicates the type	of fabric interface communication  de-
       sired.  Supported types are:

       FI_EP_DGRAM
	      Supports	a  connectionless,  unreliable datagram	communication.
	      Message boundaries are maintained, but the maximum message  size
	      may  be  limited to the fabric MTU.  Flow	control	is not guaran-
	      teed.

       FI_EP_MSG
	      Provides a reliable, connection-oriented data  transfer  service
	      with flow	control	that maintains message boundaries.

       FI_EP_RDM
	      Reliable datagram	message.  Provides a reliable, unconnected da-
	      ta transfer service with flow  control  that  maintains  message
	      boundaries.

       FI_EP_SOCK_DGRAM
	      A	 connectionless,  unreliable  datagram endpoint	with UDP sock-
	      et-like semantics.  FI_EP_SOCK_DGRAM is most useful for applica-
	      tions  designed  around  using UDP sockets.  See the SOCKET END-
	      POINT section for	additional details and restrictions that apply
	      to datagram socket endpoints.

       FI_EP_SOCK_STREAM
	      Data  streaming  endpoint	 with TCP socket-like semantics.  Pro-
	      vides a reliable,	connection-oriented data transfer service that
	      does not maintain	message	boundaries.  FI_EP_SOCK_STREAM is most
	      useful for applications designed around using TCP	sockets.   See
	      the  SOCKET ENDPOINT section for additional details and restric-
	      tions that apply to stream endpoints.

       FI_EP_UNSPEC
	      The type of endpoint is not specified.  This is usually provided
	      as  input, with other attributes of the endpoint or the provider
	      selecting	the type.
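
       The fragment below is a sketch of selecting an endpoint type through
       fi_getinfo hints; the choice of FI_EP_RDM, the FI_MSG capability, and
       the API version are illustrative.

              struct fi_info *hints, *info;
              int ret;

              hints = fi_allocinfo();
              hints->ep_attr->type = FI_EP_RDM;   /* reliable, unconnected */
              hints->caps = FI_MSG;

              ret = fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, hints, &info);
              /* ...open the fabric, domain, and endpoint using info... */
              fi_freeinfo(hints);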

   Protocol
       Specifies the low-level end to end protocol employed by	the  provider.
       A  matching  protocol must be used by communicating endpoints to	ensure
       interoperability.  The following	protocol values	are defined.  Provider
       specific	 protocols are also allowed.  Provider specific	protocols will
       be indicated by having the upper	bit of the protocol value set to one.

       FI_PROTO_GNI
	      Protocol runs over Cray GNI low-level interface.

       FI_PROTO_IB_RDM
	      Reliable-datagram	protocol  implemented  over  InfiniBand	 reli-
	      able-connected queue pairs.

       FI_PROTO_IB_UD
	      The protocol runs over InfiniBand unreliable datagram queue
	      pairs.

       FI_PROTO_IWARP
	      The protocol runs	over the  Internet  wide  area	RDMA  protocol
	      transport.

       FI_PROTO_IWARP_RDM
	      Reliable-datagram	 protocol implemented over iWarp reliable-con-
	      nected queue pairs.

       FI_PROTO_NETWORKDIRECT
	      Protocol runs over Microsoft NetworkDirect service provider
	      interface.  This adds reliable-datagram semantics over the
	      NetworkDirect connection-oriented endpoint semantics.

       FI_PROTO_PSMX
	      The protocol is based on an Intel	proprietary protocol known  as
	      PSM,  performance	scaled messaging.  PSMX	is an extended version
	      of the PSM protocol to support the libfabric interfaces.

       FI_PROTO_PSMX2
	      The protocol is based on an Intel	proprietary protocol known  as
	      PSM2,  performance  scaled messaging version 2.  PSMX2 is	an ex-
	      tended version of	the PSM2 protocol to support the libfabric in-
	      terfaces.

       FI_PROTO_RDMA_CM_IB_RC
	      The protocol runs over InfiniBand reliable-connected queue
	      pairs, using the RDMA CM protocol for connection establishment.

       FI_PROTO_RXD
	      Reliable-datagram	protocol implemented over datagram  endpoints.
	      RXD  is a	libfabric utility component that adds RDM endpoint se-
	      mantics over DGRAM endpoint semantics.

       FI_PROTO_RXM
	      Reliable-datagram	protocol implemented over  message  endpoints.
	      RXM  is a	libfabric utility component that adds RDM endpoint se-
	      mantics over MSG endpoint	semantics.

       FI_PROTO_SOCK_TCP
	      The protocol is layered over TCP packets.

       FI_PROTO_UDP
	      The protocol sends and receives UDP datagrams.  For example,  an
	      endpoint	using  FI_PROTO_UDP will be able to communicate	with a
	      remote peer that is using	Berkeley SOCK_DGRAM sockets using  IP-
	      PROTO_UDP.

       FI_PROTO_UNSPEC
	      The  protocol is not specified.  This is usually provided	as in-
	      put, with	other attributes of the	socket or the provider select-
	      ing the actual protocol.

   protocol_version - Protocol Version
       Identifies  which  version of the protocol is employed by the provider.
       The protocol version allows providers to extend an existing protocol,
       for example by adding support for additional features or functionality,
       in a backward-compatible manner.  Providers that support different ver-
       sions  of  the  same protocol should inter-operate, but only when using
       the capabilities	defined	for the	lesser version.

   max_msg_size	- Max Message Size
       Defines the maximum size	for an application data	transfer as  a	single
       operation.

   msg_prefix_size - Message Prefix Size
       Specifies  the  size of any required message prefix buffer space.  This
       field will be 0 unless the FI_MSG_PREFIX mode is enabled.  If
       msg_prefix_size is > 0, the specified value will be a multiple of 8
       bytes.

   Max RMA Ordered Size
       The maximum ordered size	specifies the delivery order of	transport data
       into target memory for RMA and atomic  operations.   Data  ordering  is
       separate from, but dependent on, message ordering (defined below).
       Data ordering is unspecified where message order is not defined.

       Data ordering refers to the access of target memory by subsequent oper-
       ations.	When back to back RMA read or write operations access the same
       registered memory location, data	ordering indicates whether the	second
       operation  reads	 or writes the target memory after the first operation
       has completed.  Because RMA ordering applies  between  two  operations,
       and not within a	single data transfer, ordering is defined per byte-ad-
       dressable memory	location.  I.e.	 ordering specifies whether location X
       is accessed by the second operation after the first operation.  Nothing
       is implied about	the completion of the first operation before the  sec-
       ond operation is	initiated.

       In  order  to  support  large data transfers being broken into multiple
       packets and sent	using multiple paths through the fabric, data ordering
       may  be	limited	 to  transfers	of a specific size or less.  Providers
       specify when data ordering is maintained	through	the following  values.
       Note that even if data ordering is not maintained, message ordering may
       be.

       max_order_raw_size
	      Read after write size.  If set, an RMA or	atomic read  operation
	      issued after an RMA or atomic write operation, both of which are
	      smaller than the size, will be ordered.  Where the target	memory
	      locations	overlap, the RMA or atomic read	operation will see the
	      results of the previous RMA or atomic write.

       max_order_war_size
	      Write after read size.  If set, an RMA or	atomic write operation
	      issued  after an RMA or atomic read operation, both of which are
	      smaller than the size, will be ordered.  The RMA or atomic  read
	      operation	 will see the initial value of the target memory loca-
	      tion before a subsequent RMA or atomic write updates the value.

       max_order_waw_size
	      Write after write	size.  If set, an RMA or atomic	 write	opera-
	      tion  issued  after  an  RMA  or atomic write operation, both of
	      which are	smaller	than the size, will be	ordered.   The	target
	      memory  location	will  reflect the results of the second	RMA or
	      atomic write.

       An order	size value of 0	indicates that ordering	is not guaranteed.   A
       value of	-1 guarantees ordering for any data size.
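
       As a sketch, an application that relies on write-after-write data
       ordering for a given transfer size might check the returned attributes
       as follows; the 4 KiB threshold is illustrative.

              /* Require WAW data ordering for transfers up to 4 KiB */
              if (info->ep_attr->max_order_waw_size != (size_t) -1 &&
                  info->ep_attr->max_order_waw_size < 4096) {
                  /* The provider cannot guarantee the needed ordering; fall
                   * back to completing the first write before issuing the
                   * second. */
              }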

   mem_tag_format - Memory Tag Format
       The  memory  tag	 format	 is  a	bit array used to convey the number of
       tagged bits supported by	a provider.  Additionally, it may be  used  to
       divide  the bit array into separate fields.  The	mem_tag_format option-
       ally begins with	a series of bits set to	0, to signify bits  which  are
       ignored by the provider.	 Following the initial prefix of ignored bits,
       the array will consist of alternating groups of bits set	to all 1's  or
       all 0's.	 Each group of bits corresponds	to a tagged field.  The	impli-
       cation of defining a tagged field is that when a	mask is	applied	to the
       tagged  bit  array, all bits belonging to a single field	will either be
       set to 1	or 0, collectively.

       For example, a mem_tag_format of	0x30FF indicates support for 14	tagged
       bits, separated into 3 fields.  The first field consists	of 2-bits, the
       second field 4-bits, and	the final field	8-bits.	 Valid masks for  such
       a tagged	field would be a bitwise OR'ing	of zero	or more	of the follow-
       ing values: 0x3000, 0x0F00, and 0x00FF.	The provider may not  validate
       the mask	provided by the	application for	performance reasons.

       By  identifying fields within a tag, a provider may be able to optimize
       their search routines.  An application which requests tag  fields  must
       provide	tag  masks  that  either  set all mask bits corresponding to a
       field to	all 0 or all 1.	 When negotiating tag fields,  an  application
       can  request  a	specific number	of fields of a given size.  A provider
       must return a tag format	that supports the requested number of  fields,
       with each field being at	least the size requested, or fail the request.
       A provider may increase the size	of the fields.	When reporting comple-
       tions (see FI_CQ_FORMAT_TAGGED),	it is not guaranteed that the provider
       would clear out any unsupported tag bits	in the tag field of  the  com-
       pletion entry.

       It is recommended that field sizes be ordered from smallest to largest.
       A generic, unstructured tag and mask can	be achieved  by	 requesting  a
       bit array consisting of alternating 1's and 0's.

   tx_ctx_cnt -	Transmit Context Count
       Number  of  transmit  contexts  to associate with the endpoint.	If not
       specified (0), 1	context	will be	assigned if the	endpoint supports out-
       bound  transfers.   Transmit  contexts  are independent transmit	queues
       that may	be separately configured.  Each	transmit context may be	 bound
       to  a  separate CQ, and no ordering is defined between contexts.	 Addi-
       tionally, no synchronization is needed when accessing contexts in  par-
       allel.

       If  the	count is set to	the value FI_SHARED_CONTEXT, the endpoint will
       be configured to	use a shared transmit context,	if  supported  by  the
       provider.   Providers that do not support shared	transmit contexts will
       fail the	request.

       See the scalable	endpoint and shared contexts sections  for  additional
       details.
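
       The fragment below sketches use of a shared transmit context, assuming
       the provider accepts FI_SHARED_CONTEXT and a NULL attribute pointer
       for default attributes; error handling is omitted.

              struct fid_stx *stx;
              struct fid_ep *ep;
              int ret;

              /* Request a shared transmit context in the fi_getinfo hints */
              hints->ep_attr->tx_ctx_cnt = FI_SHARED_CONTEXT;

              /* After fi_getinfo: open the shared context, then bind it to
               * each endpoint that will share it. */
              ret = fi_stx_context(domain, NULL, &stx, NULL);
              ret = fi_endpoint(domain, info, &ep, NULL);
              ret = fi_ep_bind(ep, &stx->fid, 0);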

   rx_ctx_cnt -	Receive	Context	Count
       Number  of  receive  contexts  to  associate with the endpoint.	If not
       specified, 1 context will be assigned if	the endpoint supports  inbound
       transfers.  Receive contexts are	independent processing queues that may
       be separately configured.  Each receive context may be bound to a sepa-
       rate CQ,	and no ordering	is defined between contexts.  Additionally, no
       synchronization is needed when accessing	contexts in parallel.

       If the count is set to the value	FI_SHARED_CONTEXT, the	endpoint  will
       be  configured  to  use	a  shared receive context, if supported	by the
       provider.  Providers that do not	support	shared receive	contexts  will
       fail the	request.

       See  the	 scalable endpoint and shared contexts sections	for additional
       details.

   auth_key_size - Authorization Key Length
       The length of the authorization key in bytes.  This field will be 0  if
       authorization  keys  are	 not available or used.	 This field is ignored
       unless the fabric is opened with	API version 1.5	or greater.

   auth_key - Authorization Key
       If supported by the fabric, an authorization key	(a.k.a.	 job  key)  to
       associate  with	the  endpoint.	 An authorization key is used to limit
       communication between endpoints.	 Only peer  endpoints  that  are  pro-
       grammed	to use the same	authorization key may communicate.  Authoriza-
       tion keys are often used	to implement job keys, to ensure that process-
       es  running  in	different jobs do not accidentally cross traffic.  The
       domain authorization key	will be	used if	auth_key_size  is  set	to  0.
       This  field is ignored unless the fabric	is opened with API version 1.5
       or greater.

TRANSMIT CONTEXT ATTRIBUTES
       Attributes specific to the transmit capabilities	 of  an	 endpoint  are
       specified using struct fi_tx_attr.

	      struct fi_tx_attr	{
		  uint64_t  caps;
		  uint64_t  mode;
		  uint64_t  op_flags;
		  uint64_t  msg_order;
		  uint64_t  comp_order;
		  size_t    inject_size;
		  size_t    size;
		  size_t    iov_limit;
		  size_t    rma_iov_limit;
		  uint32_t  tclass;
	      };

   caps	- Capabilities
       The  requested capabilities of the context.  The	capabilities must be a
       subset of those requested of the	associated endpoint.  See the CAPABIL-
       ITIES  section  of  fi_getinfo(3)  for capability details.  If the caps
       field is	0 on input to fi_getinfo(3), the  applicable  capability  bits
       from the	fi_info	structure will be used.

       The  following  capabilities  apply to the transmit attributes: FI_MSG,
       FI_RMA, FI_TAGGED,  FI_ATOMIC,  FI_READ,	 FI_WRITE,  FI_SEND,  FI_HMEM,
       FI_TRIGGER,  FI_FENCE,  FI_MULTICAST, FI_RMA_PMEM, FI_NAMED_RX_CTX, and
       FI_COLLECTIVE.

       Many applications will be able to ignore	this field and rely solely  on
       the  fi_info::caps field.  Use of this field provides fine grained con-
       trol over the transmit capabilities associated with an endpoint.	 It is
       useful  when  handling  scalable	endpoints, with	multiple transmit con-
       texts, for example, and allows configuring a specific transmit  context
       with  fewer  capabilities  than that supported by the endpoint or other
       transmit	contexts.

   mode
       The operational mode bits of the	context.  The mode bits	will be	a sub-
       set  of	those  associated  with	the endpoint.  See the MODE section of
       fi_getinfo(3) for details.  A mode value	of 0 will be ignored on	 input
       to fi_getinfo(3), with the mode value of	the fi_info structure used in-
       stead.  On return from fi_getinfo(3), the mode  will  be	 set  only  to
       those constraints specific to transmit operations.

   op_flags - Default transmit operation flags
       Flags  that  control  the operation of operations submitted against the
       context.	 Applicable flags are listed in	the Operation Flags section.

   msg_order - Message Ordering
       Message ordering	refers to the order in which transport	layer  headers
       (as  viewed  by the application)	are identified and processed.  Relaxed
       message order enables data transfers to be sent and received out	of or-
       der,  which may improve performance by utilizing	multiple paths through
       the fabric from the initiating endpoint to a target endpoint.   Message
       order  applies  only  between  a	single source and destination endpoint
       pair.  Ordering between different target	endpoints is not defined.

       Message order is	determined using a set of ordering bits.  Each set bit
       indicates  that	ordering  is  maintained between data transfers	of the
       specified type.	Message	order is defined for [read | write | send] op-
       erations	submitted by an	application after [read	| write	| send]	opera-
       tions.

       Message ordering	only applies to	the end	to end transmission of	trans-
       port headers.  Message ordering is necessary for, but does not
       guarantee, the order in which message data is sent or received by the
       transport layer.  Message ordering requires matching ordering semantics on the
       receiving side of a data	transfer operation in order to guarantee  that
       ordering	is met.

       FI_ORDER_ATOMIC_RAR
	      Atomic  read  after  read.   If set, atomic fetch	operations are
	      transmitted in the order	submitted  relative  to	 other	atomic
	      fetch operations.	 If not	set, atomic fetches may	be transmitted
	      out of order from	their submission.

       FI_ORDER_ATOMIC_RAW
	      Atomic read after	write.	If set,	atomic	fetch  operations  are
	      transmitted in the order submitted relative to atomic update op-
	      erations.	 If not	set, atomic fetches may	be  transmitted	 ahead
	      of atomic	updates.

       FI_ORDER_ATOMIC_WAR
	      Atomic write after read.  If set, atomic update operations are
	      transmitted in the order submitted relative to atomic fetch  op-
	      erations.	  If  not set, atomic updates may be transmitted ahead
	      of atomic	fetches.

       FI_ORDER_ATOMIC_WAW
	      Atomic write after write.  If set, atomic update operations are
	      transmitted in the order submitted relative to other atomic
	      update operations.  If not set, atomic updates may be
	      transmitted out of order from their submission.

       FI_ORDER_NONE
	      No  ordering  is	specified.  This value may be used as input in
	      order to obtain the  default  message  order  supported  by  the
	      provider.	 FI_ORDER_NONE is an alias for the value 0.

       FI_ORDER_RAR
	      Read  after  read.   If  set, RMA	and atomic read	operations are
	      transmitted in the order submitted relative  to  other  RMA  and
	      atomic read operations.  If not set, RMA and atomic reads	may be
	      transmitted out of order from their submission.

       FI_ORDER_RAS
	      Read after send.	If set,	RMA and	 atomic	 read  operations  are
	      transmitted  in the order	submitted relative to message send op-
	      erations,	including tagged sends.	 If not	set,  RMA  and	atomic
	      reads may	be transmitted ahead of	sends.

       FI_ORDER_RAW
	      Read  after  write.   If set, RMA	and atomic read	operations are
	      transmitted in the order submitted relative to  RMA  and	atomic
	      write  operations.   If  not  set,  RMA  and atomic reads	may be
	      transmitted ahead	of RMA and atomic writes.

       FI_ORDER_RMA_RAR
	      RMA read after read.  If set, RMA	read operations	are  transmit-
	      ted  in  the  order  submitted relative to other RMA read	opera-
	      tions.  If not set, RMA reads may	be transmitted	out  of	 order
	      from their submission.

       FI_ORDER_RMA_RAW
	      RMA read after write.  If	set, RMA read operations are transmit-
	      ted in the order submitted relative to RMA write operations.  If
	      not set, RMA reads may be	transmitted ahead of RMA writes.

       FI_ORDER_RMA_WAR
	      RMA  write  after	read.  If set, RMA write operations are	trans-
	      mitted in	the order submitted relative to	RMA  read  operations.
	      If not set, RMA writes may be transmitted	ahead of RMA reads.

       FI_ORDER_RMA_WAW
	      RMA  write after write.  If set, RMA write operations are	trans-
	      mitted in	the order submitted relative to	other RMA write	opera-
	      tions.   If  not set, RMA	writes may be transmitted out of order
	      from their submission.

       FI_ORDER_SAR
	      Send after read.  If set, message send operations, including
	      tagged sends, are transmitted in the order submitted relative to
	      RMA and atomic read operations.  If not set, message sends may
	      be transmitted ahead of RMA and atomic reads.

       FI_ORDER_SAS
	      Send  after  send.   If  set, message send operations, including
	      tagged sends, are	transmitted in the order submitted relative to
	      other message send operations.  If not set, message sends may be
	      transmitted out of order from their submission.

       FI_ORDER_SAW
	      Send after write.  If set, message send operations, including
	      tagged sends, are transmitted in the order submitted relative to
	      RMA and atomic write operations.  If not set, message sends may
	      be transmitted ahead of RMA and atomic writes.

       FI_ORDER_WAR
	      Write  after  read.  If set, RMA and atomic write	operations are
	      transmitted in the order submitted relative to  RMA  and	atomic
	      read  operations.	  If  not  set,	 RMA  and atomic writes	may be
	      transmitted ahead	of RMA and atomic reads.

       FI_ORDER_WAS
	      Write after send.	 If set, RMA and atomic	write  operations  are
	      transmitted  in the order	submitted relative to message send op-
	      erations,	including tagged sends.	 If not	set,  RMA  and	atomic
	      writes may be transmitted	ahead of sends.

       FI_ORDER_WAW
	      Write  after write.  If set, RMA and atomic write	operations are
	      transmitted in the order submitted relative  to  other  RMA  and
	      atomic  write operations.	 If not	set, RMA and atomic writes may
	      be transmitted out of order from their submission.
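
       As a sketch, an application that depends on in-order sends and on
       reads being ordered after writes could request those orderings in the
       fi_getinfo hints as shown below; the chosen bits are illustrative, and
       the provider reports the ordering it actually supports in the returned
       attributes.

              /* Request send-after-send and read-after-write ordering */
              hints->tx_attr->msg_order = FI_ORDER_SAS | FI_ORDER_RAW;

              /* Match the ordering requirement on the receive side */
              hints->rx_attr->msg_order = FI_ORDER_SAS | FI_ORDER_RAW;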

   comp_order -	Completion Ordering
       Completion ordering refers to the order in which	completed requests are
       written	into  the completion queue.  Completion	ordering is similar to
       message order.  Relaxed completion order	may enable faster reporting of
       completed  transfers,  allow  acknowledgments to	be sent	over different
       fabric paths, and support more sophisticated  retry  mechanisms.	  This
       can result in lower-latency completions,	particularly when using	uncon-
       nected  endpoints.   Strict  completion	ordering  may	require	  that
       providers queue completed operations or limit available optimizations.

       For transmit requests, completion ordering depends on the endpoint com-
       munication type.	 For unreliable	communication, completion ordering ap-
       plies  to all data transfer requests submitted to an endpoint.  For re-
       liable communication, completion	ordering only applies to requests that
       target  a single	destination endpoint.  Completion ordering of requests
       that target different endpoints over a reliable transport  is  not  de-
       fined.

       Applications  should  specify the completion ordering that they support
       or require.  Providers should return the	completion order that they ac-
       tually  provide,	 with  the  constraint	that  the returned ordering is
       stricter	than that specified by the application.	 Supported  completion
       order values are:

       FI_ORDER_NONE
	      No  ordering is defined for completed operations.	 Requests sub-
	      mitted to	the transmit context may complete in any order.

       FI_ORDER_STRICT
	      Requests complete	in the order in	which they  are	 submitted  to
	      the transmit context.

   inject_size
       The  requested  inject operation	size (see the FI_INJECT	flag) that the
       context will support.  This is the maximum size data transfer that  can
       be  associated  with  an	inject operation (such as fi_inject) or	may be
       used with the FI_INJECT data transfer flag.

   size
       The size	of the context.	 The size is specified as the  minimum	number
       of  transmit  operations	that may be posted to the endpoint without the
       operation returning -FI_EAGAIN.

   iov_limit
       This is the maximum number of IO	vectors	(scatter-gather	elements) that
       a single	posted operation may reference.

   rma_iov_limit
       This  is	the maximum number of RMA IO vectors (scatter-gather elements)
       that an RMA or atomic operation may reference.  The rma_iov_limit  cor-
       responds	to the rma_iov_count values in RMA and atomic operations.  See
       struct fi_msg_rma and struct fi_msg_atomic in fi_rma.3 and fi_atomic.3,
       for  additional	details.  This limit applies to	both the number	of RMA
       IO vectors that may be specified	when initiating	an operation from  the
       local endpoint, as well as the maximum number of	IO vectors that	may be
       carried in a single request from	a remote endpoint.

   Traffic Class (tclass)
       Traffic classes can be a	differentiated services	code point (DSCP) val-
       ue, one of the following	defined	labels,	or a provider-specific defini-
       tion.  If tclass	is unset or set	to FI_TC_UNSPEC, the endpoint will use
       the default traffic class associated with the domain.

       FI_TC_BEST_EFFORT
	      This  is the default in the absence of any other local or	fabric
	      configuration.  This class carries the traffic for a  number  of
	      applications executing concurrently over the same	network	infra-
	      structure.  Even though it is shared, network capacity  and  re-
	      source  allocation  are  distributed  fairly across the applica-
	      tions.

       FI_TC_BULK_DATA
	      This class is intended for large data transfers associated  with
	      I/O and is present to separate sustained I/O transfers from oth-
	      er application inter-process communications.

       FI_TC_DEDICATED_ACCESS
	      This class operates at the highest priority, except the  manage-
	      ment class.  It carries a	high bandwidth allocation, minimum la-
	      tency targets, and the highest scheduling	and arbitration	prior-
	      ity.

       FI_TC_LOW_LATENCY
	      This  class supports low latency,	low jitter data	patterns typi-
	      cally caused by transactional data exchanges,  barrier  synchro-
	      nizations, and collective	operations that	are typical of HPC ap-
	      plications.  This	class often requires maximum tolerable	laten-
	      cies that	data transfers must achieve for	correct	or performance
	      operations.  Fulfillment of such requests	 in  this  class  will
	      typically	 require accompanying bandwidth	and message size limi-
	      tations so as not	to consume excessive bandwidth at high priori-
	      ty.

       FI_TC_NETWORK_CTRL
	      This  class  is  intended	for traffic directly related to	fabric
	      (network)	management, which is critical to the correct operation
	      of  the  network.	 Its use is typically restricted to privileged
	      network management applications.

       FI_TC_SCAVENGER
	      This class is used for data that is desired but  does  not  have
	      strict  delivery requirements, such as in-band network or	appli-
	      cation level monitoring data.  Use of this class indicates  that
	      the  traffic  is considered lower	priority and should not	inter-
	      fere with	higher priority	workflows.

       fi_tc_dscp_set /	fi_tc_dscp_get
	      DSCP values are supported	via the	DSCP get  and  set  functions.
	      The definitions for DSCP values are outside the scope of libfab-
	      ric.  See	the fi_tc_dscp_set and fi_tc_dscp_get function defini-
	      tions for	details	on their use.

RECEIVE	CONTEXT	ATTRIBUTES
       Attributes  specific  to	 the  receive  capabilities of an endpoint are
       specified using struct fi_rx_attr.

	      struct fi_rx_attr	{
		  uint64_t  caps;
		  uint64_t  mode;
		  uint64_t  op_flags;
		  uint64_t  msg_order;
		  uint64_t  comp_order;
		  size_t    total_buffered_recv;
		  size_t    size;
		  size_t    iov_limit;
	      };

   caps	- Capabilities
       The requested capabilities of the context.  The capabilities must be  a
       subset of those requested of the	associated endpoint.  See the CAPABIL-
       ITIES section of fi_getinfo(3) for capability details.  If the caps
       field  is  0  on	input to fi_getinfo(3),	the applicable capability bits
       from the	fi_info	structure will be used.

       The following capabilities apply	to  the	 receive  attributes:  FI_MSG,
       FI_RMA, FI_TAGGED, FI_ATOMIC, FI_REMOTE_READ, FI_REMOTE_WRITE, FI_RECV,
       FI_HMEM,	FI_TRIGGER,  FI_RMA_PMEM,  FI_DIRECTED_RECV,  FI_VARIABLE_MSG,
       FI_MULTI_RECV,  FI_SOURCE,  FI_RMA_EVENT, FI_SOURCE_ERR,	and FI_COLLEC-
       TIVE.

       Many applications will be able to ignore	this field and rely solely  on
       the  fi_info::caps field.  Use of this field provides fine grained con-
       trol over the receive capabilities associated with an endpoint.	It  is
       useful  when  handling  scalable	 endpoints, with multiple receive con-
       texts, for example, and allows configuring a specific  receive  context
       with  fewer  capabilities  than that supported by the endpoint or other
       receive contexts.

   mode
       The operational mode bits of the	context.  The mode bits	will be	a sub-
       set  of	those  associated  with	the endpoint.  See the MODE section of
       fi_getinfo(3) for details.  A mode value	of 0 will be ignored on	 input
       to fi_getinfo(3), with the mode value of	the fi_info structure used in-
       stead.  On return from fi_getinfo(3), the mode  will  be	 set  only  to
       those constraints specific to receive operations.

   op_flags - Default receive operation	flags
       Flags  that  control  the operation of operations submitted against the
       context.	 Applicable flags are listed in	the Operation Flags section.

   msg_order - Message Ordering
       For a description of message ordering, see the msg_order	field  in  the
       Transmit	 Context  Attribute section.  Receive context message ordering
       defines the order in which received transport message headers are  pro-
       cessed  when  received  by an endpoint.	When ordering is set, it indi-
       cates that message headers will be processed in order, based on how the
       transmit	 side has identified the messages.  Typically, this means that
       messages	will be	handled	in order based on  a  message  level  sequence
       number.

       The following ordering flags, as defined for transmit ordering, also
       apply to the processing of received operations: FI_ORDER_NONE,
       FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_RAS, FI_ORDER_WAR, FI_ORDER_WAW,
       FI_ORDER_WAS, FI_ORDER_SAR, FI_ORDER_SAW, FI_ORDER_SAS,
       FI_ORDER_RMA_RAR, FI_ORDER_RMA_RAW, FI_ORDER_RMA_WAR, FI_ORDER_RMA_WAW,
       FI_ORDER_ATOMIC_RAR, FI_ORDER_ATOMIC_RAW, FI_ORDER_ATOMIC_WAR, and
       FI_ORDER_ATOMIC_WAW.

   comp_order -	Completion Ordering
       For  a  description of completion ordering, see the comp_order field in
       the Transmit Context Attribute section.

       FI_ORDER_DATA
	      When set,	this bit indicates that	received data is written  into
	      memory  in  order.   Data	ordering applies to memory accessed as
	      part of a	single operation and between operations	if message or-
	      dering is	guaranteed.

       FI_ORDER_NONE
	      No ordering is defined for completed operations.	Receive	opera-
	      tions may	complete in any	order, regardless of their  submission
	      order.

       FI_ORDER_STRICT
	      Receive  operations complete in the order	in which they are pro-
	      cessed by the receive context, based on the receive side
	      msg_order attribute.

   total_buffered_recv
       This  field is supported	for backwards compatibility purposes.  It is a
       hint to the provider of the total available space that may be needed to
       buffer  messages	 that  are received for	which there is no matching re-
       ceive operation.	 The provider may adjust or ignore  this  value.   The
       allocation of internal network buffering among received messages is
       provider	specific.  For instance, a provider may	limit the size of mes-
       sages  which  can be buffered or	the amount of buffering	allocated to a
       single message.

       If receive side buffering is disabled (total_buffered_recv = 0)	and  a
       message	is  received by	an endpoint, then the behavior is dependent on
       whether resource management has been enabled (FI_RM_ENABLED has been
       set or not).  See the Resource Management section of fi_domain(3) for
       further clarification.  It is recommended that applications enable re-
       source  management  if  they  anticipate	receiving unexpected messages,
       rather than modifying this value.

   size
       The size	of the context.	 The size is specified as the  minimum	number
       of  receive  operations	that may be posted to the endpoint without the
       operation returning -FI_EAGAIN.

   iov_limit
       This is the maximum number of IO	vectors	(scatter-gather	elements) that
       a single posted operation may reference.

SCALABLE ENDPOINTS
       A  scalable  endpoint  is a communication portal	that supports multiple
       transmit	and receive contexts.  Scalable	endpoints are loosely  modeled
       after  the  networking  concept	of transmit/receive side scaling, also
       known as	multi-queue.  Support for scalable endpoints is	domain specif-
       ic.   Scalable  endpoints may improve the performance of	multi-threaded
       and parallel applications, by allowing threads  to  access  independent
       transmit	 and  receive queues.  A scalable endpoint has a single	trans-
       port level address, which can reduce the	memory requirements needed  to
       store  remote  addressing data, versus using standard endpoints.	 Scal-
       able endpoints cannot be	used directly  for  communication  operations,
       and  require  the application to	explicitly create transmit and receive
       contexts	as described below.

   fi_tx_context
       Transmit	contexts are independent transmit queues.  Ordering  and  syn-
       chronization between contexts are not defined.  Conceptually a transmit
       context behaves similar to a send-only endpoint.	  A  transmit  context
       may  be	configured  with fewer capabilities than the base endpoint and
       with different attributes (such as  ordering  requirements  and	inject
       size)  than  other contexts associated with the same scalable endpoint.
       Each transmit context has its own  completion  queue.   The  number  of
       transmit	 contexts associated with an endpoint is specified during end-
       point creation.

       The fi_tx_context call is used to retrieve a specific context,  identi-
       fied  by	 an  index  (see  above	 for  details  on transmit context at-
       tributes).  Providers may dynamically allocate contexts when fi_tx_con-
       text  is	called,	or may statically create all contexts when fi_endpoint
       is invoked.  By default,	a transmit context inherits the	properties  of
       its  associated	endpoint.   However,  applications may request context
       specific	attributes through the attr parameter.	Support	for per	trans-
       mit  context  attributes	 is  provider  specific	 and  not  guaranteed.
       Providers will return the actual	attributes  assigned  to  the  context
       through the attr	parameter, if provided.
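
       A typical allocation sequence might look like the following minimal
       sketch.  NUM_CTX, the av and cq objects, and the bind/enable ordering
       are illustrative; 'domain' and 'info' are assumed to come from
       fi_domain(3) and fi_getinfo(3), and error handling is omitted.

	      struct fid_ep *sep, *tx_ctx[NUM_CTX];
	      int i;

	      /* Request NUM_CTX transmit contexts at endpoint creation;
	       * this must not exceed info->domain_attr->max_ep_tx_ctx. */
	      info->ep_attr->tx_ctx_cnt = NUM_CTX;
	      fi_scalable_ep(domain, info, &sep, NULL);
	      fi_scalable_ep_bind(sep, &av->fid, 0);

	      for (i = 0; i < NUM_CTX; i++) {
		  /* NULL attr: inherit the endpoint's transmit attributes. */
		  fi_tx_context(sep, i, NULL, &tx_ctx[i], NULL);
		  fi_ep_bind(tx_ctx[i], &cq[i]->fid, FI_TRANSMIT);
		  fi_enable(tx_ctx[i]);
	      }
	      fi_enable(sep);

       Data transfers are then posted to the individual tx_ctx[i] contexts,
       typically one per thread, rather than to the scalable endpoint itself.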

   fi_rx_context
       Receive	contexts are independent receive queues	for receiving incoming
       data.  Ordering and synchronization between contexts  are  not  guaran-
       teed.  Conceptually a receive context behaves similar to	a receive-only
       endpoint.  A receive context may	be configured with fewer  capabilities
       than  the base endpoint and with	different attributes (such as ordering
       requirements and	inject size) than other	contexts associated  with  the
       same  scalable  endpoint.   Each	receive	context	has its	own completion
       queue.  The number of receive contexts associated with an  endpoint  is
       specified during	endpoint creation.

       Receive contexts are often associated with steering flows that specify
       which incoming packets targeting a scalable endpoint each context will
       process.  However, receive contexts may be targeted directly by the
       initiator, if
       supported by the	underlying protocol.  Such contexts are	referred to as
       'named'.	  Support  for named contexts must be indicated	by setting the
       caps FI_NAMED_RX_CTX capability when the	corresponding endpoint is cre-
       ated.   Support	for named receive contexts is coordinated with address
       vectors.	 See fi_av(3) and fi_rx_addr(3).

       The fi_rx_context call is used to retrieve a specific context,  identi-
       fied by an index	(see above for details on receive context attributes).
       Providers may  dynamically  allocate  contexts  when  fi_rx_context  is
       called,	or  may	statically create all contexts when fi_endpoint	is in-
       voked.  By default, a receive context inherits the  properties  of  its
       associated endpoint.  However, applications may request context specif-
       ic attributes through the attr parameter.  Support for per receive con-
       text  attributes	 is  provider  specific	and not	guaranteed.  Providers
       will return the actual attributes assigned to the context  through  the
       attr parameter, if provided.
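
       A complementary receive side sketch follows.  It assumes 'sep' was
       allocated as in the transmit sketch above with
       info->ep_attr->rx_ctx_cnt also set to NUM_CTX, that rx_cq, buf, len,
       peer, and av_attr exist, and that a NULL descriptor is acceptable (no
       local memory registration required).  The fi_rx_addr construction of
       a 'named' context address is described in fi_av(3); error handling is
       omitted.

	      struct fid_ep *rx_ctx[NUM_CTX];
	      fi_addr_t peer_rx1;
	      int i;

	      for (i = 0; i < NUM_CTX; i++) {
		  fi_rx_context(sep, i, NULL, &rx_ctx[i], NULL);
		  fi_ep_bind(rx_ctx[i], &rx_cq[i]->fid, FI_RECV);
		  fi_enable(rx_ctx[i]);
		  fi_recv(rx_ctx[i], buf[i], len, NULL, FI_ADDR_UNSPEC, NULL);
	      }

	      /* If FI_NAMED_RX_CTX is supported, a sender may target receive
	       * context 1 of the peer directly.  'peer' must have been
	       * inserted into an AV opened with rx_ctx_bits reserved. */
	      peer_rx1 = fi_rx_addr(peer, 1, av_attr.rx_ctx_bits);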

SHARED CONTEXTS
       Shared  contexts	 are  transmit	and receive contexts explicitly	shared
       among one or more endpoints.  A shareable context allows	an application
       to  use	a  single dedicated provider resource among multiple transport
       addressable endpoints.  This can	greatly	reduce the resources needed to
       manage  communication  over multiple endpoints by multiplexing transmit
       and/or receive processing, with the potential cost of  serializing  ac-
       cess  across multiple endpoints.	 Support for shareable contexts	is do-
       main specific.

       Conceptually, shareable transmit	contexts are transmit queues that  may
       be accessed by many endpoints.  The use of a shared transmit context is
       mostly opaque to	an application.	 Applications must allocate  and  bind
       shared  transmit	 contexts  to endpoints, but operations	are posted di-
       rectly to the endpoint.	Shared transmit	contexts  are  not  associated
       with completion queues or counters.  Completed operations are posted to
       the CQs bound to	the endpoint.  An endpoint may only be associated with
       a single	shared transmit	context.

       Unlike  shared  transmit	 contexts, applications	interact directly with
       shared receive contexts.	 Users post  receive  buffers  directly	 to  a
       shared  receive	context, with the buffers usable by any	endpoint bound
       to the shared receive context.  Shared receive contexts are not associ-
       ated  with completion queues or counters.  Completed receive operations
       are posted to the CQs bound to the endpoint.  An	endpoint may  only  be
       associated  with	 a single receive context, and all connectionless end-
       points associated with a	shared receive context	must  also  share  the
       same address vector.

       Endpoints  associated  with a shared transmit context may use dedicated
       receive contexts, and vice-versa.  Or an	endpoint may use shared	trans-
       mit  and	 receive  contexts.  And there is no requirement that the same
       group of	endpoints sharing a context of one type	also share the context
       of  an  alternate type.	Furthermore, an	endpoint may use a shared con-
       text of one type, but a scalable	set of contexts	of the alternate type.

   fi_stx_context
       This call is used to open a shareable transmit context (see  above  for
       details on the transmit context attributes).  Endpoints associated with
       a shared	transmit context must use a subset of the  transmit  context's
       attributes.   Note  that	 this  is  the	reverse	of the requirement for
       transmit	contexts for scalable endpoints.

   fi_srx_context
       This allocates a	shareable receive context (see above  for  details  on
       the  receive  context  attributes).  Endpoints associated with a	shared
       receive context must use	a subset of the	receive	context's  attributes.
       Note  that  this	is the reverse of the requirement for receive contexts
       for scalable endpoints.
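
       The following minimal sketch shares one transmit and one receive con-
       text between two endpoints.  It assumes 'domain', 'ep1', 'ep2', 'buf',
       and 'len' exist, that NULL attributes and descriptors are acceptable
       to the provider, and omits error handling.

	      struct fid_stx *stx;
	      struct fid_ep *srx;

	      /* NULL attributes request the provider's default attributes. */
	      fi_stx_context(domain, NULL, &stx, NULL);
	      fi_srx_context(domain, NULL, &srx, NULL);

	      /* Bind both endpoints to the shared contexts (flags are 0). */
	      fi_ep_bind(ep1, &stx->fid, 0);
	      fi_ep_bind(ep1, &srx->fid, 0);
	      fi_ep_bind(ep2, &stx->fid, 0);
	      fi_ep_bind(ep2, &srx->fid, 0);

	      /* Sends are posted to the endpoints as usual; receive buffers
	       * are posted directly to the shared receive context. */
	      fi_recv(srx, buf, len, NULL, FI_ADDR_UNSPEC, NULL);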

SOCKET ENDPOINTS
       The following feature and description should be	considered  experimen-
       tal.  Until the experimental tag	is removed, the	interfaces, semantics,
       and data	structures associated with socket endpoints may	change between
       library versions.

       This  section  applies  to  endpoints  of  type	FI_EP_SOCK_STREAM  and
       FI_EP_SOCK_DGRAM, commonly referred to as socket	endpoints.

       Socket endpoints	are defined with semantics that	 allow	them  to  more
       easily  be  adopted by developers familiar with the UNIX	socket API, or
       by middleware that exposes the socket API, while	still taking advantage
       of high-performance hardware features.

       The key difference between socket endpoints and other active endpoints
       is that socket endpoints use synchronous data transfers.  Buffers passed
       into  send and receive operations revert	to the control of the applica-
       tion upon returning from	the function  call.   As  a  result,  no  data
       transfer	 completions  are reported to the application, and socket end-
       points are not associated with completion queues	or counters.

       Socket endpoints	support	 a  subset  of	message	 operations:  fi_send,
       fi_sendv,  fi_sendmsg,  fi_recv,	 fi_recvv,  fi_recvmsg,	and fi_inject.
       Because data transfers are synchronous, the return value	from send  and
       receive operations indicates the number of bytes transferred on success,
       or a negative value on error, including -FI_EAGAIN if the endpoint can-
       not  send  or receive any data because of full or empty queues, respec-
       tively.

       Socket endpoints	are associated with event queues and address  vectors,
       and  process  connection	 management  events asynchronously, similar to
       other endpoints.  Unlike UNIX sockets, socket endpoints must still be
       declared	as either active or passive.

       Socket endpoints	behave like non-blocking sockets.  In order to support
       select and poll semantics, active socket	endpoints are associated  with
       a  file	descriptor  that is signaled whenever the endpoint is ready to
       send and/or receive data.  The file descriptor may be  retrieved	 using
       fi_control.
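
       As an illustration, the descriptor might be retrieved and waited on
       roughly as follows.  This sketch assumes the provider exposes the
       descriptor through the FI_GETWAIT control command and that 'ep' is an
       enabled socket endpoint; error handling is omitted.

	      #include <poll.h>

	      int fd;
	      struct pollfd pfd;

	      fi_control(&ep->fid, FI_GETWAIT, &fd);

	      pfd.fd = fd;
	      pfd.events = POLLIN | POLLOUT;
	      poll(&pfd, 1, -1);   /* block until the endpoint is ready */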

OPERATION FLAGS
       Operation  flags	 are  obtained by OR-ing the following flags together.
       Operation flags define the default flags	applied	to an endpoint's  data
       transfer	 operations,  where  a flags parameter is not available.  Data
       transfer	operations that	take flags as input override the op_flags val-
       ue of transmit or receive context attributes of an endpoint.

       FI_COMMIT_COMPLETE
	      Indicates	 that a	completion should not be generated (locally or
	      at the peer) until the result of an operation has been made
	      persistent.   See	 fi_cq(3) for additional details on completion
	      semantics.

       FI_COMPLETION
	      Indicates	that a completion queue	entry should  be  written  for
	      data  transfer operations.  This flag only applies to operations
	      issued on	an endpoint that was bound to a	completion queue  with
	      the FI_SELECTIVE_COMPLETION flag set; otherwise, it is ignored.
	      See the fi_ep_bind section above for more	detail.

       FI_DELIVERY_COMPLETE
	      Indicates	that a completion should be generated when the	opera-
	      tion  has	 been  processed  by the destination endpoint(s).  See
	      fi_cq(3) for additional details on completion semantics.

       FI_INJECT
	      Indicates	that all outbound data buffers should be  returned  to
	      the  user's  control  immediately	after a	data transfer call re-
	      turns, even if the operation is  handled	asynchronously.	  This
	      may  require that	the provider copy the data into	a local	buffer
	      and transfer out of that buffer.	A provider can limit the total
	      amount  of  send	data that may be buffered and/or the size of a
	      single send that can use this flag.  This	limit is indicated us-
	      ing inject_size (see inject_size above).
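
	      For instance, a small message could be sent with its buffer
	      immediately reusable.  This sketch assumes 'ep', 'dest', 'buf',
	      and 'len' exist and that len does not exceed the reported
	      inject_size; error handling is omitted.

		  struct iovec iov = { .iov_base = buf, .iov_len = len };
		  struct fi_msg msg = {
		      .msg_iov = &iov, .iov_count = 1, .addr = dest,
		  };

		  /* fi_inject(ep, buf, len, dest) applies this flag
		   * implicitly and also suppresses the completion entry. */
		  fi_sendmsg(ep, &msg, FI_INJECT);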

       FI_INJECT_COMPLETE
	      Indicates	 that a	completion should be generated when the	source
	      buffer(s)	may be reused.	See fi_cq(3) for additional details on
	      completion semantics.

       FI_MULTICAST
	      Indicates	that data transfers will target	multicast addresses by
	      default.	Any fi_addr_t passed into a  data  transfer  operation
	      will be treated as a multicast address.

       FI_MULTI_RECV
	      Applies to posted	receive	operations.  This flag allows the user
	      to post a	single buffer that will	receive	multiple incoming mes-
	      sages.  Received messages	will be	packed into the	receive	buffer
	      until the	buffer has been	consumed.  Use of this flag may	 cause
	      a	 single	 posted	receive	operation to generate multiple comple-
	      tions as messages	are placed into	the buffer.  The placement  of
	      received	data into the buffer may be subjected to provider spe-
	      cific alignment restrictions.  The buffer	will  be  released  by
	      the  provider  when  the	available buffer space falls below the
	      specified	minimum	(see FI_OPT_MIN_MULTI_RECV).
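
	      For example, one large buffer can be posted to absorb many
	      incoming messages.  This sketch assumes 'ep' is an enabled
	      endpoint bound to a completion queue, that 'buf' and 'len'
	      describe the receive buffer, and that a NULL descriptor is
	      acceptable; the threshold value is illustrative and error
	      handling is omitted.

		  size_t min_free = 4096;   /* illustrative threshold */
		  struct iovec iov = { .iov_base = buf, .iov_len = len };
		  struct fi_msg msg = {
		      .msg_iov   = &iov,
		      .iov_count = 1,
		      .addr      = FI_ADDR_UNSPEC,
		  };

		  /* Release the buffer once fewer than min_free bytes of
		   * space remain available. */
		  fi_setopt(&ep->fid, FI_OPT_ENDPOINT, FI_OPT_MIN_MULTI_RECV,
			    &min_free, sizeof min_free);

		  fi_recvmsg(ep, &msg, FI_MULTI_RECV);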

       FI_TRANSMIT_COMPLETE
	      Indicates	that a completion should be generated when the	trans-
	      mit operation has	completed relative to the local	provider.  See
	      fi_cq(3) for additional details on completion semantics.

NOTES
       Users should call fi_close to release all resources  allocated  to  the
       fabric endpoint.

       Endpoints allocated with the FI_CONTEXT or FI_CONTEXT2 mode bits set
       must typically provide struct fi_context or struct fi_context2, respec-
       tively, as their per operation context parameter.  (See fi_getinfo(3)
       for details.)  However, when FI_SELECTIVE_COMPLETION is enabled to
       suppress CQ completion entries, and an operation is initiated without
       the FI_COMPLETION flag set, then the context parameter is ignored.  An
       application does not need to pass a valid context structure into such
       data transfers.

       Operations that complete in error and are not associated with a valid
       operational context will use the endpoint context in any error report-
       ing structures.

       Although	 applications  typically associate individual completions with
       either completion queues	or counters, an	endpoint can  be  attached  to
       both  a	counter	and completion queue.  When combined with using	selec-
       tive completions, this allows an	application to use counters  to	 track
       successful  completions,	 with  a CQ used to report errors.  Operations
       that complete with an error increment the error counter and generate  a
       CQ completion event.
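
       For example, an endpoint might be configured so that a transmit coun-
       ter tracks successful sends while the completion queue reports only
       errors and explicitly requested completions.  In this sketch, 'cntr'
       and 'cq' are assumed to have been opened beforehand (see fi_cntr(3)
       and fi_cq(3)); error handling is omitted.

	      /* Count completed send operations on the counter ... */
	      fi_ep_bind(ep, &cntr->fid, FI_SEND);

	      /* ... and write CQ entries only for operations that set
	       * FI_COMPLETION.  Errors are still reported to the CQ. */
	      fi_ep_bind(ep, &cq->fid, FI_TRANSMIT | FI_SELECTIVE_COMPLETION);

	      fi_enable(ep);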

       As  mentioned  in  fi_getinfo(3),  the ep_attr structure	can be used to
       query providers that support various endpoint  attributes.   fi_getinfo
       can return provider info	structures that	can support the	minimal	set of
       requirements (such that the application maintains correctness).	Howev-
       er, it can also return provider info structures that exceed application
       requirements.   As  an  example,	 consider  an  application  requesting
       msg_order  as  FI_ORDER_NONE.  The resulting output from	fi_getinfo may
       have all	the ordering bits set.	The application	can reset the ordering
       bits it does not	require	before creating	the endpoint.  The provider is
       free to implement a stricter ordering than is required by the  applica-
       tion.
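
       Continuing that example, the unneeded bits can be cleared before
       creating the endpoint.  This minimal sketch assumes 'info' is the
       structure returned by fi_getinfo(3) and keeps only send-after-send
       ordering on the transmit side.

	      info->tx_attr->msg_order &= FI_ORDER_SAS;
	      info->rx_attr->msg_order = FI_ORDER_NONE;

	      fi_endpoint(domain, info, &ep, NULL);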

RETURN VALUES
       Returns 0 on success.  On error,	a negative value corresponding to fab-
       ric errno is returned.  For fi_cancel, a	return value  of  0  indicates
       that the	cancel request was submitted for processing.

       Fabric errno values are defined in rdma/fi_errno.h.

ERRORS
       -FI_EDOMAIN
	      A	 resource  domain  was not bound to the	endpoint or an attempt
	      was made to bind multiple	domains.

       -FI_ENOCQ
	      The endpoint has not been configured with a necessary completion
	      queue.

       -FI_EOPBADSTATE
	      The endpoint's state does	not permit the requested operation.

SEE ALSO
       fi_getinfo(3), fi_domain(3), fi_cq(3), fi_msg(3), fi_tagged(3),
       fi_rma(3)

AUTHORS
       OpenFabrics.

Libfabric Programmer's Manual	  2020-08-07			fi_endpoint(3)
