NETMAP(4)	       FreeBSD Kernel Interfaces Manual		     NETMAP(4)

NAME
     netmap -- a framework for fast packet I/O
     VALE -- a fast VirtuAl Local Ethernet using the netmap API
     netmap pipes -- a shared memory packet transport channel

SYNOPSIS
     device netmap

DESCRIPTION
     netmap is a framework for extremely fast and efficient packet I/O for
     both userspace and	kernel clients.	 It runs on FreeBSD and	Linux, and
     includes VALE, a very fast	and modular in-kernel software switch/data-
     plane, and	netmap pipes, a	shared memory packet transport channel.	 All
     these are accessed	interchangeably	with the same API.

     netmap, VALE and netmap pipes are at least	one order of magnitude faster
     than standard OS mechanisms (sockets, bpf,	tun/tap	interfaces, native
     switches, pipes), reaching	14.88 million packets per second (Mpps)	with
     much less than one	core on	a 10 Gbit NIC, about 20	Mpps per core for VALE
     ports, and	over 100 Mpps for netmap pipes.
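
     The 14.88 Mpps figure is the line rate of 10 Gbit Ethernet with minimum
     size frames.  A rough sketch of the arithmetic follows, assuming 64-byte
     frames (CRC included) plus 20 bytes of preamble and inter-frame gap on
     the wire; the function and macro names are illustrative and not part of
     netmap:

```c
#include <assert.h>
#include <stdint.h>

/* Per-frame wire overhead: 8 bytes preamble/SFD + 12 bytes inter-frame gap. */
#define DEMO_ETH_OVERHEAD	20

/*
 * Packets per second that fit on a link of 'link_bps' bits per second
 * when every frame is 'frame_bytes' long (CRC included).
 */
static uint64_t
demo_line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
	assert(frame_bytes > 0);
	return link_bps / ((uint64_t)(frame_bytes + DEMO_ETH_OVERHEAD) * 8);
}
```

     With these assumptions, demo_line_rate_pps(10000000000ULL, 64) gives
     14880952, i.e. about 14.88 Mpps.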

     Userspace clients can dynamically switch NICs into	netmap mode and	send
     and receive raw packets through memory mapped buffers.  Similarly,	VALE
     switch instances and ports, and netmap pipes can be created dynamically,
     providing high speed packet I/O between processes,	virtual	machines, NICs
     and the host stack.

     netmap supports both non-blocking I/O through ioctl(2), and
     synchronization and blocking I/O through a file descriptor and standard
     OS mechanisms such as select(2), poll(2), epoll(2) and kqueue(2).  VALE
     and netmap pipes are implemented by a single kernel module, which also
     emulates the netmap API over standard drivers for devices without native
     netmap support.  For best performance, netmap requires explicit support
     in device drivers.

     In	the rest of this (long)	manual page we document	various	aspects	of the
     netmap and	VALE architecture, features and	usage.

ARCHITECTURE
     netmap supports raw packet	I/O through a port, which can be connected to
     a physical interface (NIC), to the host stack, or to a VALE switch.
     Ports use preallocated circular queues of buffers (rings) residing	in an
     mmapped region.  There is one ring	for each transmit/receive queue	of a
     NIC or virtual port.  An additional ring pair connects to the host	stack.

     After binding a file descriptor to	a port,	a netmap client	can send or
     receive packets in	batches	through	the rings, and possibly	implement
     zero-copy forwarding between ports.

     All NICs operating	in netmap mode use the same memory region, accessible
     to	all processes who own /dev/netmap file descriptors bound to NICs.
     Independent VALE and netmap pipe ports by default use separate memory
     regions, but can be independently configured to share memory.

ENTERING AND EXITING NETMAP MODE
     The following section describes the system	calls to create	and control
     netmap ports (including VALE and netmap pipe ports).  Simpler, higher
     level functions are described in section LIBRARIES.

     Ports and rings are created and controlled	through	a file descriptor,
     created by	opening	a special device
	   fd = open("/dev/netmap", O_RDWR);
     and then bound to a specific port with an
	   ioctl(fd, NIOCREGIF,	(struct	nmreq *)arg);

     netmap has	multiple modes of operation controlled by the struct nmreq
     argument.	arg.nr_name specifies the port name, as	follows:

     OS	network	interface name (e.g. 'em0', 'eth1', ...)
	   the data path of the	NIC is disconnected from the host stack, and
	   the file descriptor is bound	to the NIC (one	or all queues),	or to
	   the host stack;

     valeXXX:YYY (arbitrary XXX	and YYY)
	   the file descriptor is bound	to port	YYY of a VALE switch called
	   XXX,	both dynamically created if necessary.	The string cannot
	   exceed IFNAMSIZ characters, and YYY cannot be the name of any
	   existing OS network interface.
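
     The naming rule for VALE ports can be checked before calling NIOCREGIF.
     The sketch below is only an illustration: demo_vale_name_ok() and
     DEMO_IFNAMSIZ are hypothetical names, the size 16 is an assumed stand-in
     for IFNAMSIZ from <net/if.h>, and the length test is deliberately strict
     so that the name and its terminating NUL fit in nr_name.  It does not
     check that YYY differs from existing interface names.

```c
#include <assert.h>
#include <string.h>

#define DEMO_IFNAMSIZ	16	/* assumed stand-in for IFNAMSIZ */

/*
 * Return 1 if 'name' has the form "valeXXX:YYY" with a non-empty YYY and
 * fits, NUL included, in a char[DEMO_IFNAMSIZ]; return 0 otherwise.
 * On success *port points at YYY inside 'name'.
 */
static int
demo_vale_name_ok(const char *name, const char **port)
{
	const char *colon;

	assert(name != NULL && port != NULL);
	if (strncmp(name, "vale", 4) != 0)
		return 0;
	if (strlen(name) >= DEMO_IFNAMSIZ)
		return 0;
	colon = strchr(name + 4, ':');	/* XXX may be empty */
	if (colon == NULL || colon[1] == '\0')
		return 0;
	*port = colon + 1;
	return 1;
}
```

     For example, "vale1:a" and "vale:x" are accepted, whereas "em0" and
     "vale1" are rejected.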

     On	return,	arg indicates the size of the shared memory region, and	the
     number, size and location of all the netmap data structures, which	can be
     accessed by mmapping the memory
	   char *mem = mmap(NULL, arg.nr_memsize, PROT_READ | PROT_WRITE,
	       MAP_SHARED, fd, 0);

     Non-blocking I/O is done with special ioctl(2) calls; select(2) and
     poll(2) on the file descriptor permit blocking I/O.  epoll(2) and
     kqueue(2) are not supported on netmap file descriptors.

     While a NIC is in netmap mode, the	OS will	still believe the interface is
     up and running.  OS-generated packets for that NIC end up in a netmap
     ring, and another ring is used to send packets into the OS	network	stack.
     A close(2)	on the file descriptor removes the binding, and	returns	the
     NIC to normal mode	(reconnecting the data path to the host	stack),	or
     destroys the virtual port.

DATA STRUCTURES
     The data structures in the	mmapped	memory region are detailed in
     sys/net/netmap.h, which is	the ultimate reference for the netmap API. The
     main structures and fields	are indicated below:

     struct netmap_if (one per interface)

	  struct netmap_if {
	      ...
	      const uint32_t   ni_flags;      /* properties		 */
	      ...
	      const uint32_t   ni_tx_rings;   /* NIC tx	rings		 */
	      const uint32_t   ni_rx_rings;   /* NIC rx	rings		 */
	      uint32_t	       ni_bufs_head;  /* head of extra bufs list */
	      ...
	  };

	  Indicates the number of available rings (struct netmap_ring) and
	  their	position in the	mmapped	region.	 The number of tx and rx rings
	  (ni_tx_rings,	ni_rx_rings) normally depends on the hardware.	NICs
	  also have an extra tx/rx ring	pair connected to the host stack.
	  NIOCREGIF can	also request additional	unbound	buffers	in the same
	  memory space,	to be used as temporary	storage	for packets.
	  ni_bufs_head contains the index of the first of these free buffers,
	  which	are connected in a list	(the first uint32_t of each buffer
	  being	the index of the next buffer in	the list).  A 0	indicates the
	  end of the list.
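
	  The list layout described above (next index stored in the first
	  uint32_t of each buffer, 0 as terminator) can be walked as in the
	  following sketch.  It is a self-contained model, not netmap code:
	  'pool' stands in for the buffer area of the mmapped region, 'head'
	  plays the role of ni_bufs_head, and every demo_ name and the
	  2048-byte buffer size are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BUF_SIZE	2048	/* illustrative, matches the usual 2 KB */

/* Count the buffers on the list that starts at index 'head'. */
static uint32_t
demo_count_extra_bufs(const unsigned char *pool, uint32_t nbufs, uint32_t head)
{
	uint32_t n = 0;
	uint32_t idx = head;

	while (idx != 0) {		/* index 0 ends the list */
		uint32_t next;

		assert(idx < nbufs);	/* stay inside the pool */
		assert(n < nbufs);	/* guard against a cyclic list */
		/* The first uint32_t of a buffer is the next index. */
		memcpy(&next, pool + (size_t)idx * DEMO_BUF_SIZE, sizeof(next));
		n++;
		idx = next;
	}
	return n;
}

/* Link buffer 'idx' in front of the list and return the new head. */
static uint32_t
demo_push_extra_buf(unsigned char *pool, uint32_t head, uint32_t idx)
{
	memcpy(pool + (size_t)idx * DEMO_BUF_SIZE, &head, sizeof(head));
	return idx;
}
```

	  Since index 0 marks the end of the list, buffer 0 can never be a
	  member of it.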

     struct netmap_ring	(one per ring)

	  struct netmap_ring {
	      ...
	      const uint32_t num_slots;	  /* slots in each ring		   */
	      const uint32_t nr_buf_size; /* size of each buffer	   */
	      ...
	      uint32_t	     head;	  /* (u) first buf owned by user   */
	      uint32_t	     cur;	  /* (u) wakeup	position	   */
	      const uint32_t tail;	  /* (k) first buf owned by kernel */
	      ...
	      uint32_t	     flags;
	      struct timeval ts;	  /* (k) time of last rxsync()	   */
	      ...
	      struct netmap_slot slot[0]; /* array of slots		   */
	  };

	  Implements transmit and receive rings, with read/write pointers,
	  metadata and an array of slots describing the buffers.

     struct netmap_slot	(one per buffer)

	  struct netmap_slot {
	      uint32_t buf_idx;		  /* buffer index		  */
	      uint16_t len;		  /* packet length		  */
	      uint16_t flags;		  /* buf changed, etc.		  */
	      uint64_t ptr;		  /* address for indirect buffers */
	  };

	  Describes a packet buffer, which normally is identified by an	index
	  and resides in the mmapped region.

     packet buffers
	  Fixed	size (normally 2 KB) packet buffers allocated by the kernel.

     The offset	of the struct netmap_if	in the mmapped region is indicated by
     the nr_offset field in the	structure returned by NIOCREGIF.  From there,
     all other objects are reachable through relative references (offsets or
     indexes).  Macros and functions in <net/netmap_user.h> help convert
     them into actual pointers:

	   struct netmap_if *nifp = NETMAP_IF(mem, arg.nr_offset);
	   struct netmap_ring *txr = NETMAP_TXRING(nifp, ring_index);
	   struct netmap_ring *rxr = NETMAP_RXRING(nifp, ring_index);

	   char	*buf = NETMAP_BUF(ring,	buffer_index);

RINGS, BUFFERS AND DATA	I/O
     Rings are circular	queues of packets with three indexes/pointers (head,
     cur, tail); one slot is always kept empty.	 The ring size (num_slots)
     should not	be assumed to be a power of two.
     (NOTE: older versions of netmap used head/count format to indicate	the
     content of	a ring).

     head is the first slot available to userspace;
     cur is the	wakeup point: select/poll will unblock when tail passes	cur;
     tail is the first slot reserved to	the kernel.

     Slot indexes MUST only move forward; for convenience, the function
	   nm_ring_next(ring, index)
     returns the next index modulo the ring size.

     head and cur are only modified by the user	program; tail is only modified
     by	the kernel.  The kernel	only reads/writes the struct netmap_ring slots
     and buffers during	the execution of a netmap-related system call.	The
     only exceptions are slots (and buffers) in the range tail ... head-1, that
     are explicitly assigned to	the kernel.
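
     The index rules above can be modelled without any netmap header.  The
     sketch below uses a toy structure holding only the four index fields;
     struct demo_ring and the demo_ functions are hypothetical and merely
     mirror what nm_ring_next() is described to do, without assuming that
     num_slots is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Toy ring: only the index fields discussed above. */
struct demo_ring {
	uint32_t num_slots;	/* ring size, any value > 0 */
	uint32_t head;		/* first slot owned by userspace */
	uint32_t cur;		/* wakeup position */
	uint32_t tail;		/* first slot owned by the kernel */
};

/* Next index, wrapping to 0 at the end of the ring. */
static uint32_t
demo_ring_next(const struct demo_ring *r, uint32_t i)
{
	assert(r->num_slots > 0 && i < r->num_slots);
	return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Number of slots in the range cur ... tail-1. */
static uint32_t
demo_ring_space(const struct demo_ring *r)
{
	assert(r->cur < r->num_slots && r->tail < r->num_slots);
	return (r->tail >= r->cur) ? r->tail - r->cur :
	    r->tail + r->num_slots - r->cur;
}
```

     With num_slots = 10, cur = 8 and tail = 3, the slots 8, 9, 0, 1 and 2 are
     available, so demo_ring_space() returns 5.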

   TRANSMIT RINGS
     On	transmit rings,	after a	netmap system call, slots in the range
     head ... tail-1 are available for transmission.  User code	should fill
     the slots sequentially and	advance	head and cur past slots	ready to
     transmit.	cur may	be moved further ahead if the user code	needs more
     slots before further transmissions	(see SCATTER GATHER I/O).

     At	the next NIOCTXSYNC/select()/poll(), slots up to head-1	are pushed to
     the port, and tail	may advance if further slots have become available.
     Below is an example of the	evolution of a TX ring:

	 after the syscall, slots between cur and tail are (a)vailable
		   head=cur   tail
		    |	       |
		    v	       v
	  TX  [.....aaaaaaaaaaa.............]

	 user creates new packets to (T)ransmit
		     head=cur tail
			 |     |
			 v     v
	  TX  [.....TTTTTaaaaaa.............]

	 NIOCTXSYNC/poll()/select() sends packets and reports new slots
		     head=cur	   tail
			 |	    |
			 v	    v
	  TX  [..........aaaaaaaaaaa........]

     select() and poll() will block if there is no space in the ring, i.e.
	   ring->cur ==	ring->tail
     and return	when new slots have become available.

     High speed	applications may want to amortize the cost of system calls by
     preparing as many packets as possible before issuing them.

     A transmit	ring with pending transmissions	has
	   ring->head != ring->tail + 1	(modulo	the ring size).
     The function int nm_tx_pending(ring) implements this test.
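
     The evolution of a transmit ring can be traced with a small model.  This
     is a sketch under stated assumptions, not the netmap implementation: the
     structure and all demo_ names are made up, and demo_tx_complete_all()
     only imitates a sync call after which every queued packet has been sent.

```c
#include <assert.h>
#include <stdint.h>

struct demo_txring {
	uint32_t num_slots, head, cur, tail;
};

static uint32_t
demo_next(uint32_t i, uint32_t n)
{
	return (i + 1 == n) ? 0 : i + 1;
}

/* Pending test from the text: head != tail + 1 (modulo the ring size). */
static int
demo_tx_pending(const struct demo_txring *r)
{
	assert(r->num_slots > 1);
	return demo_next(r->tail, r->num_slots) != r->head;
}

/* Userspace side: queue up to 'want' packets, return how many fit. */
static uint32_t
demo_tx_queue(struct demo_txring *r, uint32_t want)
{
	uint32_t n = 0;

	while (n < want && r->cur != r->tail) {
		/* A real sender would fill the slot at r->cur here. */
		r->head = r->cur = demo_next(r->cur, r->num_slots);
		n++;
	}
	return n;
}

/* Kernel side stand-in: all slots up to head-1 were sent and reclaimed. */
static void
demo_tx_complete_all(struct demo_txring *r)
{
	r->tail = (r->head == 0) ? r->num_slots - 1 : r->head - 1;
}
```

     Starting from an idle ring of 8 slots (head = cur = 0, tail = 7), at
     most 7 packets can be queued before cur reaches tail, because one slot
     is always kept empty.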

   RECEIVE RINGS
     On	receive	rings, after a netmap system call, the slots in	the range
     head... tail-1 contain received packets.  User code should	process	them
     and advance head and cur past slots it wants to return to the kernel.
     cur may be	moved further ahead if the user	code wants to wait for more
     packets without returning all the previous	slots to the kernel.

     At	the next NIOCRXSYNC/select()/poll(), slots up to head-1	are returned
     to	the kernel for further receives, and tail may advance to report	new
     incoming packets.
     Below is an example of the	evolution of an	RX ring:

	 after the syscall, there are some (h)eld and some (R)eceived slots
		head  cur     tail
		 |     |       |
		 v     v       v
	  RX  [..hhhhhhRRRRRRRR..........]

	 user advances head and	cur, releasing some slots and holding others
		    head cur  tail
		      |	 |     |
		      v	 v     v
	  RX  [..*****hhhRRRRRR...........]

	 NIOCRXSYNC/poll()/select() recovers slots and reports new packets
		    head cur	    tail
		      |	 |	     |
		      v	 v	     v
	  RX  [.......hhhRRRRRRRRRRRR....]
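
     The difference between head and cur on a receive ring can be shown with
     a similar toy model.  The names below are hypothetical; the function
     releases some slots to the kernel while keeping others held, as in the
     second step of the diagram above.

```c
#include <assert.h>
#include <stdint.h>

struct demo_rxring {
	uint32_t num_slots, head, cur, tail;
};

/* Slots in head ... tail-1, i.e. received and not yet released. */
static uint32_t
demo_rx_avail(const struct demo_rxring *r)
{
	assert(r->head < r->num_slots && r->tail < r->num_slots);
	return (r->tail >= r->head) ? r->tail - r->head :
	    r->tail + r->num_slots - r->head;
}

/*
 * Mark 'seen' slots as examined (cur) and give the first 'release' of
 * them back to the kernel (head).  The slots in between stay held.
 */
static void
demo_rx_advance(struct demo_rxring *r, uint32_t release, uint32_t seen)
{
	assert(release <= seen && seen <= demo_rx_avail(r));
	r->cur = (r->head + seen) % r->num_slots;	/* uses the old head */
	r->head = (r->head + release) % r->num_slots;
}
```

     Setting seen equal to the number of available slots moves cur to tail,
     so the next select() or poll() blocks until new packets arrive, while
     the held slots remain untouched by the kernel.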

SLOTS AND PACKET BUFFERS
     Normally, packets should be stored	in the netmap-allocated	buffers
     assigned to slots when ports are bound to a file descriptor.  One packet
     is	fully contained	in a single buffer.

     The following flags affect	slot and buffer	processing:

     NS_BUF_CHANGED
	  it MUST be used when the buf_idx in the slot is changed.  This can
	  be used to implement zero-copy forwarding, see ZERO-COPY FORWARDING.

     NS_REPORT
	  reports when this buffer has been transmitted.  Normally, netmap
	  notifies transmit completions in batches, hence signals can be
	  delayed indefinitely.  This flag helps detect when packets have
	  been sent and a file descriptor can be closed.

     NS_FORWARD
	  When a ring is in 'transparent' mode (see TRANSPARENT	MODE), packets
	  marked with this flag are forwarded to the other endpoint at the
	  next system call, thus restoring (in a selective way)	the connection
	  between a NIC	and the	host stack.

     NS_NO_LEARN
	  tells	the forwarding code that the SRC MAC address for this packet
	  must not be used in the learning bridge code.

     NS_INDIRECT
	  indicates that the packet's payload is in a user-supplied buffer,
	  whose	user virtual address is	in the 'ptr' field of the slot.	 The
	  size can reach 65535 bytes.
	  This is only supported on the transmit ring of VALE ports, and it
	  helps reduce data copies in the interconnection of virtual
	  machines.

     NS_MOREFRAG
	  indicates that the packet continues with subsequent buffers; the
	  last buffer in a packet must have the	flag clear.

SCATTER	GATHER I/O
     Packets can span multiple slots if	the NS_MOREFRAG	flag is	set in all but
     the last slot.  The maximum length	of a chain is 64 buffers.  This	is
     normally used with	VALE ports when	connecting virtual machines, as	they
     generate large TSO	segments that are not split unless they	reach a	physi-
     cal device.

     NOTE: The length field always refers to the individual fragment; there is
     no	place with the total length of a packet.

     On	receive	rings the macro	NS_RFRAGS(slot)	indicates the remaining	number
     of	slots for this packet, including the current one.  Slots with a	value
     greater than 1 also have NS_MOREFRAG set.
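
     Since no field holds the total length of a multi-slot packet, a client
     has to add up the fragments itself.  The sketch below shows one way to
     do it on a simplified slot type.  struct demo_slot and the demo_ names
     are hypothetical, and DEMO_MOREFRAG is only a placeholder value: real
     code should test NS_MOREFRAG from <net/netmap.h> on struct netmap_slot.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MOREFRAG	0x0020	/* placeholder for NS_MOREFRAG */
#define DEMO_MAX_FRAGS	64	/* maximum chain length stated above */

struct demo_slot {
	uint16_t len;		/* length of this fragment only */
	uint16_t flags;
};

/*
 * Add up the fragments of the packet starting at slot 'first', looking
 * at no more than 'avail' slots.  On success return 1 and store the
 * total length in *total and the index after the packet in *next.
 * Return 0 if the chain is incomplete or longer than DEMO_MAX_FRAGS.
 */
static int
demo_packet_len(const struct demo_slot *slots, uint32_t num_slots,
    uint32_t first, uint32_t avail, uint32_t *total, uint32_t *next)
{
	uint32_t sum = 0, i = first, n;

	assert(num_slots > 0 && first < num_slots);
	for (n = 0; n < avail && n < DEMO_MAX_FRAGS; n++) {
		int more = (slots[i].flags & DEMO_MOREFRAG) != 0;

		sum += slots[i].len;
		i = (i + 1 == num_slots) ? 0 : i + 1;
		if (!more) {		/* last fragment: flag clear */
			*total = sum;
			*next = i;
			return 1;
		}
	}
	return 0;
}
```

     A caller would typically pass the number of slots between cur and tail
     as 'avail', and leave head unchanged when the function returns 0.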

IOCTLS
     netmap uses two ioctls (NIOCTXSYNC, NIOCRXSYNC) for non-blocking I/O.
     They take no argument.  Two more ioctls (NIOCGINFO, NIOCREGIF) are	used
     to	query and configure ports, with	the following argument:

     struct nmreq {
	 char	   nr_name[IFNAMSIZ]; /* (i) port name			*/
	 uint32_t  nr_version;	      /* (i) API version		*/
	 uint32_t  nr_offset;	      /* (o) nifp offset in mmap region	*/
	 uint32_t  nr_memsize;	      /* (o) size of the mmap region	*/
	 uint32_t  nr_tx_slots;	      /* (i/o) slots in	tx rings	*/
	 uint32_t  nr_rx_slots;	      /* (i/o) slots in	rx rings	*/
	 uint16_t  nr_tx_rings;	      /* (i/o) number of tx rings	*/
	 uint16_t  nr_rx_rings;	      /* (i/o) number of rx rings	*/
	 uint16_t  nr_ringid;	      /* (i/o) ring(s) we care about	*/
	 uint16_t  nr_cmd;	      /* (i) special command		*/
	 uint16_t  nr_arg1;	      /* (i/o) extra arguments		*/
	 uint16_t  nr_arg2;	      /* (i/o) extra arguments		*/
	 uint32_t  nr_arg3;	      /* (i/o) extra arguments		*/
	 uint32_t  nr_flags;	      /* (i/o) open mode		*/
	 ...
     };

     A file descriptor obtained through /dev/netmap also supports the ioctls
     supported by network devices, see netintro(4).

     NIOCGINFO
	   returns EINVAL if the named port does not support netmap.  Other-
	   wise, it returns 0 and (advisory) information about the port.  Note
	   that	all the	information below can change before the	interface is
	   actually put	in netmap mode.

	   nr_memsize
	       indicates the size of the netmap	memory region. NICs in netmap
	       mode all	share the same memory region, whereas VALE ports have
	       independent regions for each port.

	   nr_tx_slots,	nr_rx_slots
	       indicate	the size of transmit and receive rings.

	   nr_tx_rings,	nr_rx_rings
	       indicate	the number of transmit and receive rings.  Both	ring
	       number and sizes	may be configured at runtime using interface-
	       specific	functions (e.g.	 ethtool ).

     NIOCREGIF
	   binds the port named	in nr_name to the file descriptor. For a phys-
	   ical	device this also switches it into netmap mode, disconnecting
	   it from the host stack.  Multiple file descriptors can be bound to
	   the same port, with proper synchronization left to the user.

	   NIOCREGIF can also bind a file descriptor to one endpoint of a
	   netmap pipe, consisting of two netmap ports with a crossover
	   connection.  A netmap pipe shares the same memory space as the
	   parent port, and is meant to enable configurations where a master
	   process acts as a dispatcher towards slave processes.

	   To enable this function, the	nr_arg1	field of the structure can be
	   used	as a hint to the kernel	to indicate how	many pipes we expect
	   to use, and reserve extra space in the memory region.

	   On return, it gives the same	info as	NIOCGINFO, with	nr_ringid and
	   nr_flags indicating the identity of the rings controlled through
	   the file descriptor.

	   nr_flags and nr_ringid select which rings are controlled through this
	   file	descriptor.  Possible values of	nr_flags are indicated below,
	   together with the naming schemes that application libraries (such
	   as the nm_open indicated below) can use to indicate the specific
	   set of rings.  In the example below,	"netmap:foo" is	any valid
	   netmap port name.

	   NR_REG_ALL_NIC netmap:foo
		  (default) all	hardware ring pairs

	   NR_REG_SW netmap:foo^
		  the ``host rings'', connecting to the	host stack.

	   NR_REG_NIC_SW netmap:foo+
		  all hardware rings and the host rings

	   NR_REG_ONE_NIC netmap:foo-i
		  only the i-th	hardware ring pair, where the number is	in
		  nr_ringid;

	   NR_REG_PIPE_MASTER netmap:foo{i
		  the master side of the netmap	pipe whose identifier (i) is
		  in nr_ringid;

	   NR_REG_PIPE_SLAVE netmap:foo}i
		  the slave side of the	netmap pipe whose identifier (i) is in
		  nr_ringid.

		  The identifier of a pipe must be thought of as part of the pipe
		  name,	and does not need to be	sequential. On return the pipe
		  will only have a single ring pair with index 0, irrespective
		  of the value of i.
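
	   The pipe naming scheme listed above ('{' for the master side, '}'
	   for the slave side, followed by the identifier) can be parsed as
	   in the following sketch.  This is not the parser used by nm_open;
	   the enum and the function are hypothetical, and the identifier is
	   assumed to be a decimal number since it ends up in nr_ringid.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum demo_pipe_side {
	DEMO_NOT_A_PIPE,
	DEMO_PIPE_MASTER,
	DEMO_PIPE_SLAVE
};

/*
 * Split "base{id" or "base}id" at the brace.  On success store the
 * length of the base name in *base_len and the identifier in *id.
 */
static enum demo_pipe_side
demo_parse_pipe(const char *name, size_t *base_len, unsigned long *id)
{
	const char *sep;
	char *end;

	assert(name != NULL && base_len != NULL && id != NULL);
	sep = strpbrk(name, "{}");
	if (sep == NULL || sep == name)
		return DEMO_NOT_A_PIPE;
	/* The identifier must be a non-empty run of decimal digits. */
	if (sep[1] < '0' || sep[1] > '9')
		return DEMO_NOT_A_PIPE;
	*id = strtoul(sep + 1, &end, 10);
	if (*end != '\0')
		return DEMO_NOT_A_PIPE;
	*base_len = (size_t)(sep - name);
	return (*sep == '{') ? DEMO_PIPE_MASTER : DEMO_PIPE_SLAVE;
}
```

	   For "netmap:foo{3" the function reports the master side, a base
	   name of 10 characters and identifier 3; a range check against the
	   16-bit nr_ringid field is left to the caller.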

	   By default, a poll(2) or select(2) call pushes out any pending
	   packets on the transmit ring, even if no write events are
	   specified.  The feature can be disabled by or-ing NETMAP_NO_TX_POLL
	   into the value written to nr_ringid.  When this flag is set,
	   packets are transmitted only when ioctl(NIOCTXSYNC) is issued, or
	   when select()/poll() are called with a write event
	   (POLLOUT/wfdset) or a full ring.

	   When	registering a virtual interface	that is	dynamically created to
	   a vale(4) switch, we	can specify the	desired	number of rings	(1 by
	   default, and	currently up to	16) on it using	nr_tx_rings and
	   nr_rx_rings fields.

     NIOCTXSYNC
	   tells the hardware of new packets to	transmit, and updates the num-
	   ber of slots	available for transmission.

     NIOCRXSYNC
	   tells the hardware of consumed packets, and asks for	newly avail-
	   able	packets.

SELECT,	POLL, EPOLL, KQUEUE.
     select(2) and poll(2) on a netmap file descriptor process rings as
     indicated in TRANSMIT RINGS and RECEIVE RINGS, respectively when write
     (POLLOUT) and read (POLLIN) events are requested.  Both block if no
     slots are available in the ring (ring->cur == ring->tail).  Depending on
     the platform, epoll(2) and kqueue(2) are supported too.

     Packets in transmit rings are normally pushed out (and buffers reclaimed)
     even without requesting write events.  Passing the NETMAP_NO_TX_POLL flag
     to NIOCREGIF disables this feature.  By default, receive rings are
     processed only if read events are requested.  Passing the
     NETMAP_DO_RX_POLL flag to NIOCREGIF updates receive rings even without
     read events.  Note that on epoll and kqueue, NETMAP_NO_TX_POLL and
     NETMAP_DO_RX_POLL only have an effect when some event is posted for the
     file descriptor.

LIBRARIES
     The netmap	API is supposed	to be used directly, both because of its sim-
     plicity and for efficient integration with	applications.

     For convenience, the <net/netmap_user.h> header provides a few macros and
     functions to ease creating a file descriptor and doing I/O with a netmap
     port.  These are loosely modeled after the pcap(3) API, to ease porting of
     libpcap-based applications to netmap.  To use these extra functions,
     programs should
	   #define NETMAP_WITH_LIBS
     before
	   #include <net/netmap_user.h>

     The following functions are available:

     struct nm_desc * nm_open(const char *ifname, const	struct nmreq *req,
	    uint64_t flags, const struct nm_desc *arg)
	    similar to pcap_open, binds	a file descriptor to a port.

	    ifname
		is a port name,	in the form "netmap:XXX" for a NIC and
		"valeXXX:YYY" for a VALE port.

	    req
		provides the initial values for	the argument to	the NIOCREGIF
		ioctl.  The nr_flags and nr_ringid values are overwritten by
		parsing	ifname and flags, and other fields can be overridden
		through	the other two arguments.

	    arg
		points to a struct nm_desc containing arguments	(e.g. from a
		previously open	file descriptor) that should override the
		defaults.  The fields are used as described below.

	    flags
		can be set to a	combination of the following flags:
		NETMAP_NO_TX_POLL, NETMAP_DO_RX_POLL (copied into nr_ringid);
		NM_OPEN_NO_MMAP	(if arg	points to the same memory region,
		avoids the mmap	and uses the values from it); NM_OPEN_IFNAME
		(ignores ifname	and uses the values in arg); NM_OPEN_ARG1,
		NM_OPEN_ARG2, NM_OPEN_ARG3 (uses the fields from arg);
		NM_OPEN_RING_CFG (uses the ring	number and sizes from arg).

     int nm_close(struct nm_desc *d)
	    closes the file descriptor,	unmaps memory, frees resources.

     int nm_inject(struct nm_desc *d, const void *buf, size_t size)
	    similar to pcap_inject(), pushes a packet to a ring, returns the
	    size of the packet if successful, or 0 on error;

     int nm_dispatch(struct nm_desc *d,	int cnt, nm_cb_t cb, u_char *arg)
	    similar to pcap_dispatch(),	applies	a callback to incoming packets

     u_char * nm_nextpkt(struct	nm_desc	*d, struct nm_pkthdr *hdr)
	    similar to pcap_next(), fetches the	next packet

SUPPORTED DEVICES
     netmap natively supports the following devices:

     On	FreeBSD: em(4),	igb(4),	ixgbe(4), lem(4), re(4).

     On Linux: e1000(4), e1000e(4), igb(4), ixgbe(4), mlx4(4), forcedeth(4),
     r8169(4).

     NICs without native support can still be used in netmap mode through emu-
     lation. Performance is inferior to	native netmap mode but still signifi-
     cantly higher than	sockets, and approaching that of in-kernel solutions
     such as Linux's pktgen.

     Emulation is also available for devices with native netmap	support, which
     can be used for testing or	performance comparison.	 The sysctl variable
     dev.netmap.admode globally	controls how netmap mode is implemented.

SYSCTL VARIABLES AND MODULE PARAMETERS
     Some aspects of the operation of netmap are controlled through sysctl
     variables on FreeBSD (dev.netmap.*) and module parameters on Linux
     (/sys/module/netmap_lin/parameters/*):

     dev.netmap.admode:	0
	     Controls the use of native	or emulated adapter mode.  0 uses the
	     best available option, 1 forces native and	fails if not avail-
	     able, 2 forces emulated hence never fails.

     dev.netmap.generic_ringsize: 1024
	     Ring size used for	emulated netmap	mode

     dev.netmap.generic_mit: 100000
	     Controls interrupt	moderation for emulated	mode

     dev.netmap.mmap_unreg: 0

     dev.netmap.fwd: 0
	     Forces NS_FORWARD mode

     dev.netmap.flags: 0

     dev.netmap.txsync_retry: 2

     dev.netmap.no_pendintr: 1
	     Forces recovery of	transmit buffers on system calls

     dev.netmap.mitigate: 1
	     Propagates	interrupt mitigation to	user processes

     dev.netmap.no_timestamp: 0
	     Disables the update of the	timestamp in the netmap	ring

     dev.netmap.verbose: 0
	     Verbose kernel messages

     dev.netmap.buf_num: 163840

     dev.netmap.buf_size: 2048

     dev.netmap.ring_num: 200

     dev.netmap.ring_size: 36864

     dev.netmap.if_num:	100

     dev.netmap.if_size: 1024
	     Sizes and number of objects (netmap_if, netmap_ring, buffers) for
	     the global	memory region. The only	parameter worth	modifying is
	     dev.netmap.buf_num	as it impacts the total	amount of memory used
	     by	netmap.

     dev.netmap.buf_curr_num: 0

     dev.netmap.buf_curr_size: 0

     dev.netmap.ring_curr_num: 0

     dev.netmap.ring_curr_size:	0

     dev.netmap.if_curr_num: 0

     dev.netmap.if_curr_size: 0
	     Actual values in use.

     dev.netmap.bridge_batch: 1024
	     Batch size	used when moving packets across	a VALE switch. Values
	     above 64 generally	guarantee good performance.

SYSTEM CALLS
     netmap uses select(2), poll(2), epoll and kqueue to wake up processes
     when significant events occur, and	mmap(2)	to map memory.	ioctl(2) is
     used to configure ports and VALE switches.

     Applications may need to create threads and bind them to specific cores
     to	improve	performance, using standard OS primitives, see pthread(3).  In
     particular, pthread_setaffinity_np(3) may be of use.

CAVEATS
     No	matter how fast	the CPU	and OS are, achieving line rate	on 10G and
     faster interfaces requires	hardware with sufficient performance.  Several
     NICs are unable to	sustain	line rate with small packet sizes. Insuffi-
     cient PCIe	or memory bandwidth can	also cause reduced performance.

     Another frequent reason for low performance is the	use of flow control on
     the link: a slow receiver can limit the transmit speed.  Be sure to dis-
     able flow control when running high speed experiments.

   SPECIAL NIC FEATURES
     netmap is orthogonal to some NIC features such as multiqueue, schedulers,
     packet filters.

     Multiple transmit and receive rings are supported natively	and can	be
     configured	with ordinary OS tools,	such as	ethtool	or device-specific
     sysctl variables.	The same goes for Receive Packet Steering (RPS)	and
     filtering of incoming traffic.

     netmap does not use features such as checksum offloading, TCP
     segmentation offloading, encryption, VLAN encapsulation/decapsulation,
     etc.  When using netmap to exchange packets with the host stack, make
     sure to disable these features.

EXAMPLES
   TEST	PROGRAMS
     netmap comes with a few programs that can be used for testing or simple
     applications.  See	the examples/ directory	in netmap distributions, or
     tools/tools/netmap/ directory in FreeBSD distributions.

     pkt-gen is	a general purpose traffic source/sink.

     As	an example
	   pkt-gen -i ix0 -f tx	-l 60
     can generate an infinite stream of	minimum	size packets, and
	   pkt-gen -i ix0 -f rx
     is	a traffic sink.	 Both print traffic statistics,	to help	monitor	how
     the system	performs.

     pkt-gen has many options that can be used to set packet sizes, addresses,
     rates, and	use multiple send/receive threads and cores.

     bridge is another test program which interconnects	two netmap ports. It
     can be used for transparent forwarding between interfaces,	as in
	   bridge -i ix0 -i ix1
     or	even connect the NIC to	the host stack using netmap
	   bridge -i ix0 -i ix0

   USING THE NATIVE API
     The following code	implements a traffic generator

     #include <sys/types.h>
     #include <sys/ioctl.h>
     #include <sys/mman.h>
     #include <fcntl.h>
     #include <poll.h>
     #include <string.h>
     #include <strings.h>
     #include <net/netmap_user.h>

     /* Error checking is omitted for brevity. */
     void sender(void)
     {
	 struct netmap_if *nifp;
	 struct netmap_ring *ring;
	 struct nmreq nmr;
	 struct pollfd fds;
	 char *buf;
	 void *p;
	 uint32_t i;
	 int fd;

	 fd = open("/dev/netmap", O_RDWR);
	 bzero(&nmr, sizeof(nmr));
	 strcpy(nmr.nr_name, "ix0");
	 nmr.nr_version = NETMAP_API;
	 ioctl(fd, NIOCREGIF, &nmr);	/* bind fd to ix0 */
	 p = mmap(NULL, nmr.nr_memsize, PROT_READ | PROT_WRITE,
	     MAP_SHARED, fd, 0);
	 nifp = NETMAP_IF(p, nmr.nr_offset);
	 ring = NETMAP_TXRING(nifp, 0);
	 fds.fd = fd;
	 fds.events = POLLOUT;
	 for (;;) {
	     poll(&fds, 1, -1);		/* wait for free tx slots */
	     while (!nm_ring_empty(ring)) {
		 i = ring->cur;
		 buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
		 ... prepare packet in buf ...
		 ring->slot[i].len = ... packet length ...
		 ring->head = ring->cur = nm_ring_next(ring, i);
	     }
	 }
     }

   HELPER FUNCTIONS
     A simple receiver can be implemented using	the helper functions
     #define NETMAP_WITH_LIBS
     #include <poll.h>
     #include <net/netmap_user.h>

     void receiver(void)
     {
	 struct nm_desc *d;
	 struct pollfd fds;
	 u_char *buf;
	 struct nm_pkthdr h;
	 ...
	 d = nm_open("netmap:ix0", NULL, 0, NULL);
	 fds.fd = NETMAP_FD(d);
	 fds.events = POLLIN;
	 for (;;) {
	     poll(&fds, 1, -1);
	     /* drain every packet currently available */
	     while ( (buf = nm_nextpkt(d, &h)) )
		 consume_pkt(buf, h.len);
	 }
	 nm_close(d);
     }

   ZERO-COPY FORWARDING
     Since physical interfaces share the same memory region, it	is possible to
     do	packet forwarding between ports	swapping buffers. The buffer from the
     transmit ring is used to replenish	the receive ring:
	 uint32_t tmp;
	 struct	netmap_slot *src, *dst;
	 ...
	 src = &rxr->slot[rxr->cur];
	 dst = &txr->slot[txr->cur];
	 tmp = dst->buf_idx;
	 dst->buf_idx =	src->buf_idx;
	 dst->len = src->len;
	 dst->flags = NS_BUF_CHANGED;
	 src->buf_idx =	tmp;
	 src->flags = NS_BUF_CHANGED;
	 rxr->head = rxr->cur =	nm_ring_next(rxr, rxr->cur);
	 txr->head = txr->cur =	nm_ring_next(txr, txr->cur);
	 ...

   ACCESSING THE HOST STACK
     The host stack is for all practical purposes just a regular ring pair,
     which you can access with the netmap API, e.g. with
	   nm_open("netmap:eth0^", ...);
     All packets that the host would send to an interface in netmap mode end
     up in the RX ring, whereas all packets queued to the TX ring are sent
     up to the host stack.

   VALE	SWITCH
     A simple way to test the performance of a VALE switch is to attach	a
     sender and	a receiver to it, e.g. running the following in	two different
     terminals:
	   pkt-gen -i vale1:a -f rx # receiver
	   pkt-gen -i vale1:b -f tx # sender
     The same example can be used to test netmap pipes,	by simply changing
     port names, e.g.
	   pkt-gen -i vale:x{3 -f rx # receiver	on the master side
	   pkt-gen -i vale:x}3 -f tx # sender on the slave side

     The following command attaches an interface and the host stack to a
     switch:
	   vale-ctl -h vale2:em0
     Other netmap clients attached to the same switch can now communicate with
     the network card or the host.

SEE ALSO
     http://info.iet.unipi.it/~luigi/netmap/

     Luigi Rizzo, Revisiting network I/O APIs: the netmap framework, Communi-
     cations of	the ACM, 55 (3), pp.45-51, March 2012

     Luigi Rizzo, netmap: a novel framework for	fast packet I/O, Usenix
     ATC'12, June 2012,	Boston

     Luigi Rizzo, Giuseppe Lettieri, VALE, a switched ethernet for virtual
     machines, ACM CoNEXT'12, December 2012, Nice

     Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,	Speeding up packet I/O
     in	virtual	machines, ACM/IEEE ANCS'13, October 2013, San Jose

AUTHORS
     The netmap framework was originally designed and implemented at the
     Università di Pisa in 2011 by Luigi Rizzo, and further extended with
     help from Matteo Landi, Gaetano Catalli, Giuseppe Lettieri and Vincenzo
     Maffione.

     netmap and	VALE have been funded by the European Commission within	FP7
     Projects CHANGE (257422) and OPENLAB (287581).

FreeBSD	10.1		       February	13, 2014		  FreeBSD 10.1
