JEMALLOC(3)		 BSD Library Functions Manual		   JEMALLOC(3)

NAME
     jemalloc -- the default system allocator

LIBRARY
     Standard C	Library	(libc, -lc)

SYNOPSIS
     const char *_malloc_options;

DESCRIPTION
     The jemalloc allocator is a general-purpose concurrent malloc(3)
     implementation specifically designed to be scalable on modern
     multi-processor systems.  It is the default user space system allocator
     in NetBSD.

     When the first call is made to one	of the memory allocation routines such
     as	malloc() or realloc(), various flags that affect the workings of the
     allocator are set or reset.  These	are described below.

     The "name"	of the file referenced by the symbolic link named
     /etc/malloc.conf, the value of the	environment variable MALLOC_OPTIONS,
     and the string pointed to by the global variable _malloc_options will be
     interpreted, in that order, character by character	as flags.

     Most flags	are single letters.  Uppercase letters indicate	that the be-
     havior is set, or on, and lowercase letters mean that the behavior	is not
     set, or off.  The following options are available.

	A     All warnings (except for the warning about unknown flags being
	      set) become fatal.  The process will call	abort(3) in these
	      cases.

        H     Use madvise(2) when pages within a chunk are no longer in use,
              but the chunk as a whole cannot yet be deallocated.  Because
              the madvise() system call has a high overhead, this is
              primarily of use when swapping is a real possibility.

        J     Each byte of new memory allocated by malloc() or realloc() will
              be initialized to 0xa5.  All memory returned by free() or
              realloc() will be initialized to 0x5a.  This is intended for
              debugging and will impact performance negatively.

	K     Increase/decrease	the virtual memory chunk size by a factor of
	      two.  The	default	chunk size is 1	MB.  This option can be	speci-
	      fied multiple times.

	N     Increase/decrease	the number of arenas by	a factor of two.  The
	      default number of	arenas is four times the number	of CPUs, or
	      one if there is a	single CPU.  This option can be	specified mul-
	      tiple times.

	P     Various statistics are printed at	program	exit via an atexit(3)
	      function.	 This has the potential	to cause deadlock for a	multi-
	      threaded process that exits while	one or more threads are	exe-
	      cuting in	the memory allocation functions.  Therefore, this op-
	      tion should only be used with care; it is	primarily intended as
	      a	performance tuning aid during application development.

	Q     Increase/decrease	the size of the	allocation quantum by a	factor
	      of two.  The default quantum is the minimum allowed by the ar-
	      chitecture (typically 8 or 16 bytes).  This option can be	speci-
	      fied multiple times.

	S     Increase/decrease	the size of the	maximum	size class that	is a
	      multiple of the quantum by a factor of two.  Above this size,
	      power-of-two spacing is used for size classes.  The default
	      value is 512 bytes.  This	option can be specified	multiple
	      times.

	U     Generate "utrace"	entries	for ktrace(1), for all operations.
	      Consult the source for details on	this option.

	V     Attempting to allocate zero bytes	will return a NULL pointer in-
	      stead of a valid pointer.	 (The default behavior is to make a
	      minimal allocation and return a pointer to it.)  This option is
	      provided for System V compatibility.  This option	is incompati-
	      ble with the X option.

	X     Rather than return failure for any allocation function, display
	      a	diagnostic message on stderr and cause the program to drop
	      core (using abort(3)).  This option should be set	at compile
	      time by including	the following in the source code:

		    _malloc_options = "X";

        Z     Each byte of new memory allocated by malloc() or realloc() will
              be initialized to 0.  Note that this initialization only
              happens once for each byte, so realloc() does not zero memory
              that was previously allocated.  This is intended for debugging
              and will impact performance negatively.
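
     The flag conventions above can be sketched in a few lines of C.  This
     is an illustration only, not the allocator's own parser: the structure,
     the function names, the assumed defaults and the bounds on the chunk
     size are all hypothetical, and only four of the flags are handled.  It
     reads K as "double the chunk size" and k as "halve it", following the
     uppercase/lowercase rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical settings record; the field names are illustrative only. */
struct malloc_flags {
    bool abort_on_warning;  /* A / a */
    bool junk;              /* J / j */
    bool zero;              /* Z / z */
    unsigned chunk_shift;   /* log2 of the chunk size, adjusted by K / k */
};

/* Assumed bounds, not taken from the manual page. */
#define CHUNK_SHIFT_MIN 13u  /* chunk must exceed an assumed 4 KB page */
#define CHUNK_SHIFT_MAX 30u

/* Assumed defaults: boolean flags off, 1 MB chunks (2^20 bytes). */
void flags_init(struct malloc_flags *f)
{
    assert(f != NULL);
    f->abort_on_warning = false;
    f->junk = false;
    f->zero = false;
    f->chunk_shift = 20;
}

/*
 * Apply one option string character by character.  Calling this once per
 * source, in the documented order, lets later sources override earlier ones.
 */
void flags_apply(struct malloc_flags *f, const char *opts)
{
    assert(f != NULL);
    if (opts == NULL)
        return;
    for (; *opts != '\0'; opts++) {
        switch (*opts) {
        case 'A': f->abort_on_warning = true;  break;
        case 'a': f->abort_on_warning = false; break;
        case 'J': f->junk = true;  break;
        case 'j': f->junk = false; break;
        case 'Z': f->zero = true;  break;
        case 'z': f->zero = false; break;
        case 'K':
            if (f->chunk_shift < CHUNK_SHIFT_MAX)
                f->chunk_shift++;
            break;
        case 'k':
            if (f->chunk_shift > CHUNK_SHIFT_MIN)
                f->chunk_shift--;
            break;
        default:
            /* The real allocator warns about unknown flags; ignored here. */
            break;
        }
    }
}
```

     Under these assumptions, applying "KJ" and then "kkjA" leaves junk
     filling off, A set, and a chunk size of 2^19 bytes.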

     Extra care	should be taken	when enabling any of the options in production
     environments.  The	A, J, and Z options are	intended for testing and de-
     bugging.  An application which changes its	behavior when these options
     are used is flawed.
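
     When debugging with the J option, it can help to test whether a region
     still holds one of the two fill patterns.  The helper below is a
     minimal sketch; the function and macro names are hypothetical, and only
     the byte values 0xa5 and 0x5a come from the description of J above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define JUNK_ALLOC 0xa5  /* pattern written to newly allocated memory */
#define JUNK_FREE  0x5a  /* pattern written to released memory */

/* Return true if every one of the len bytes at buf equals pattern. */
bool filled_with(const void *buf, size_t len, unsigned char pattern)
{
    const unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (p[i] != pattern)
            return false;
    }
    return true;
}
```

     A field that still compares equal to JUNK_ALLOC was probably never
     initialized by the program, although a legitimate value can match the
     pattern by coincidence.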

IMPLEMENTATION NOTES
     The jemalloc allocator uses multiple arenas in order to reduce lock con-
     tention for threaded programs on multi-processor systems.	This works
     well with regard to threading scalability,	but incurs some	costs.	There
     is	a small	fixed per-arena	overhead, and additionally, arenas manage mem-
     ory completely independently of each other, which means a small fixed in-
     crease in overall memory fragmentation.  These overheads are not gener-
     ally an issue, given the number of	arenas normally	used.  Note that using
     substantially more	arenas than the	default	is not likely to improve per-
     formance, mainly due to reduced cache performance.	 However, it may make
     sense to reduce the number	of arenas if an	application does not make much
     use of the	allocation functions.

     Memory is conceptually broken into	equal-sized chunks, where the chunk
     size is a power of	two that is greater than the page size.	 Chunks	are
     always aligned to multiples of the	chunk size.  This alignment makes it
     possible to find metadata for user	objects	very quickly.
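
     Because chunks are aligned to their own power-of-two size, the chunk
     that contains an address can be found by masking off the low bits.  The
     function below sketches that arithmetic; its name is hypothetical and
     it is not taken from the allocator source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the base address of the chunk containing addr.
 * chunk_size must be a nonzero power of two.
 */
uintptr_t chunk_base(uintptr_t addr, size_t chunk_size)
{
    assert(chunk_size != 0 && (chunk_size & (chunk_size - 1)) == 0);
    return addr & ~((uintptr_t)chunk_size - 1);
}
```

     With the default 1 MB chunk size, address 0x12345678 lies in the chunk
     that starts at 0x12300000.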

     User objects are broken into three	categories according to	size:

	1.   Small objects are smaller than one	page.

	2.   Large objects are smaller than the	chunk size.

	3.   Huge objects are a	multiple of the	chunk size.

     Small and large objects are managed by arenas; huge objects are managed
     separately	in a single data structure that	is shared by all threads.
     Huge objects are used by applications infrequently	enough that this sin-
     gle data structure	is not a scalability issue.

     Each chunk	that is	managed	by an arena tracks its contents	in a page map
     as	runs of	contiguous pages (unused, backing a set	of small objects, or
     backing one large object).	 The combination of chunk alignment and	chunk
     page maps makes it	possible to determine all metadata regarding small and
     large allocations in constant time.
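
     The constant-time lookup can be pictured as follows, assuming the page
     map is an array with one entry per page of the chunk (the text above
     does not spell out the layout).  The function name and the 4 KB page
     size used in the example are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the page within its chunk that contains addr.
 * Both sizes must be powers of two, with page_size < chunk_size.
 */
size_t page_index(uintptr_t addr, size_t chunk_size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(chunk_size > page_size && (chunk_size & (chunk_size - 1)) == 0);

    /* Offset of addr from the start of its (aligned) chunk. */
    size_t offset = (size_t)(addr & ((uintptr_t)chunk_size - 1));
    return offset / page_size;
}
```

     For address 0x12345678 with 1 MB chunks and 4 KB pages, the offset into
     the chunk is 0x45678, which falls in page 69.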

     Small objects are managed in groups by page runs.	Each run maintains a
     bitmap that tracks	which regions are in use.  Allocation requests can be
     grouped as	follows.

        o   Allocation requests that are no more than half the quantum (see
            the Q option) are rounded up to the nearest power of two
            (typically 2, 4, or 8).

        o   Allocation requests that are more than half the quantum, but no
            more than the maximum quantum-multiple size class (see the S
            option) are rounded up to the nearest multiple of the quantum.

        o   Allocation requests that are larger than the maximum
            quantum-multiple size class, but no larger than one half of a
            page, are rounded up to the nearest power of two.

        o   Allocation requests that are larger than half of a page, but small
            enough to fit in an arena-managed chunk (see the K option), are
            rounded up to the nearest run size.

        o   Allocation requests that are too large to fit in an arena-managed
            chunk are rounded up to the nearest multiple of the chunk size.
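
     The first three groups above can be sketched as a rounding function.
     This is an illustration under stated assumptions, not the allocator's
     code: the function names are hypothetical, the minimum class of 2 bytes
     is inferred from "typically 2, 4, or 8", and requests above half a page
     are not handled because the run sizes are not listed here.

```c
#include <assert.h>
#include <stddef.h>

/* Round n (n >= 1) up to the nearest power of two. */
size_t pow2_ceil(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Size class for a request of size bytes, given the quantum (Q option),
 * the maximum quantum-multiple class qmax (S option) and the page size.
 * Returns 0 for requests larger than half a page.
 */
size_t small_size_class(size_t size, size_t quantum, size_t qmax, size_t page)
{
    assert(size > 0 && quantum > 0);

    if (size <= quantum / 2) {
        size_t c = pow2_ceil(size);
        return c < 2 ? 2 : c;   /* assumed minimum class of 2 bytes */
    }
    if (size <= qmax)
        return (size + quantum - 1) / quantum * quantum;
    if (size <= page / 2)
        return pow2_ceil(size);
    return 0;                   /* larger than half a page: not covered */
}
```

     With a 16-byte quantum, the default 512-byte maximum quantum-multiple
     class and an assumed 4096-byte page, a 100-byte request rounds up to
     112 bytes and a 513-byte request rounds up to 1024 bytes.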

     Allocations are packed tightly together, which can	be an issue for	multi-
     threaded applications.  If	you need to assure that	allocations do not
     suffer from cache line sharing, round your	allocation requests up to the
     nearest multiple of the cache line	size.
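
     The rounding suggested above is a one-line computation when the cache
     line size is a power of two.  The helper name is hypothetical, and the
     64-byte line size in the example is an assumption that must be checked
     for the target CPU.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round size up to the nearest multiple of line.
 * line must be a nonzero power of two.
 */
size_t cacheline_round_up(size_t size, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    return (size + line - 1) & ~(line - 1);
}
```

     For example, malloc(cacheline_round_up(sizeof(struct counter), 64))
     requests 64 bytes for a hypothetical 24-byte struct counter.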

DEBUGGING
     The first thing to	do is to set the A option.  This option	forces a core-
     dump (if possible)	at the first sign of trouble, rather than the normal
     policy of trying to continue if at	all possible.

     It	is probably also a good	idea to	recompile the program with suitable
     options and symbols for debugger support.

     If	the program starts to give unusual results, coredump or	generally be-
     have differently without emitting any of the messages mentioned in	the
     next section, it is likely	because	it depends on the storage being	filled
     with zero bytes.  Try running it with the Z option	set; if	that improves
     the situation, this diagnosis has been confirmed.	If the program still
     misbehaves, the likely problem is accessing memory	outside	the allocated
     area.

     Alternatively, if the symptoms are	not easy to reproduce, setting the J
     option may	help provoke the problem.  In truly difficult cases, the U op-
     tion, if supported	by the kernel, can provide a detailed trace of all
     calls made	to these functions.

     Unfortunately, jemalloc does not provide much detail about	the problems
     it	detects; the performance impact	for storing such information would be
     prohibitive.  There are a number of allocator implementations available
     on	the Internet which focus on detecting and pinpointing problems by
     trading performance for extra sanity checks and detailed diagnostics.

ENVIRONMENT
     The following environment variables affect	the execution of the alloca-
     tion functions:

     MALLOC_OPTIONS  If	the environment	variable MALLOC_OPTIONS	is set,	the
		     characters	it contains will be interpreted	as flags to
		     the allocation functions.

EXAMPLES
     To	dump core whenever a problem occurs:

	   ln -s 'A' /etc/malloc.conf

     To	specify	in the source that a program does no return value checking on
     calls to these functions:

	   _malloc_options = "X";

DIAGNOSTICS
     If	any of the memory allocation/deallocation functions detect an error or
     warning condition,	a message will be printed to file descriptor
     STDERR_FILENO.  Errors will result	in the process dumping core.  If the A
     option is set, all	warnings are treated as	errors.

     The _malloc_message variable allows the programmer	to override the	func-
     tion which	emits the text strings forming the errors and warnings if for
     some reason the stderr file descriptor is not suitable for	this.  Please
     note that doing anything which tries to allocate memory in	this function
     is	likely to result in a crash or deadlock.
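
     A replacement message function must therefore work without allocating.
     The sketch below collects message fragments in a fixed static buffer.
     The handler name, the buffer size and the accessor are hypothetical,
     and the four-string signature is an assumption, since the prototype of
     _malloc_message is not given in this page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fixed-size buffer reserved at link time, so nothing is allocated. */
static char msg_log[256];
static size_t msg_len;

/* Append one fragment, truncating silently once the buffer is full. */
static void msg_append(const char *s)
{
    size_t n, room;

    if (s == NULL)
        return;
    assert(msg_len < sizeof(msg_log));
    n = strlen(s);
    room = sizeof(msg_log) - 1 - msg_len;
    if (n > room)
        n = room;
    memcpy(msg_log + msg_len, s, n);
    msg_len += n;
    msg_log[msg_len] = '\0';
}

/*
 * Hypothetical handler.  The four-fragment signature is an assumption;
 * on a system that provides the hook it would be installed with
 * something like:  _malloc_message = capture_message;
 */
void capture_message(const char *p1, const char *p2,
                     const char *p3, const char *p4)
{
    msg_append(p1);
    msg_append(p2);
    msg_append(p3);
    msg_append(p4);
}

/* Return everything captured so far as a NUL-terminated string. */
const char *captured_messages(void)
{
    return msg_log;
}
```

     The buffer can then be inspected or written out later from a context
     in which it is safe to do so.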

     All messages are prefixed by "<progname>: (malloc)".

SEE ALSO
     emalloc(3), malloc(3), memory(3), memoryallocators(9)

     Jason Evans, A Scalable Concurrent	malloc(3) Implementation for FreeBSD,
     http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf, April
     16, 2006, BSDCan 2006.

     Poul-Henning Kamp,	"Malloc(3) revisited", Proceedings of the FREENIX
     Track: 1998 USENIX	Annual Technical Conference, USENIX Association,
     http://www.usenix.org/publications/library/proceedings/usenix98/freenix/kamp.pdf,
     June 15-19, 1998.

     Paul R. Wilson, Mark S. Johnstone,	Michael	Neely, and David Boles,
     Dynamic Storage Allocation: A Survey and Critical Review, University of
     Texas at Austin, ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps, 1995.

HISTORY
     The jemalloc allocator became the default system allocator	first in
     FreeBSD 7.0 and then in NetBSD 5.0.  In both systems it replaced the
     older so-called "phkmalloc" implementation.

AUTHORS
     Jason Evans <jasone@canonware.com>

BSD				 June 21, 2011				   BSD
