Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages


home | help
Devel::NYTProf(3)     User Contributed Perl Documentation    Devel::NYTProf(3)

       Devel::NYTProf -	Powerful fast feature-rich Perl	source code profiler

	 # profile code	and write database to ./nytprof.out
	 perl -d:NYTProf

	 # convert database into a set of html files, e.g., ./nytprof/index.html
	 # and open a web browser on the nytprof/index.html file
	 nytprofhtml --open

	 # or into comma separated files, e.g.,	./nytprof/*.csv

       I give talks on profiling perl code, including a	detailed look at how
       to use NYTProf and how to optimize your code, every year. A video of my
       YAPC::NA	2014 talk can be found at <>

       Devel::NYTProf is a powerful, fast, feature-rich	perl source code

       o   Performs per-line statement profiling for fine detail

       o   Performs per-subroutine statement profiling for overview

       o   Performs per-opcode profiling for slow perl builtins

       o   Performs per-block statement	profiling (the first profiler to do

       o   Accounts correctly for time spent after calls return

       o   Performs inclusive and exclusive timing of subroutines

       o   Subroutine times are	per calling location (a	powerful feature)

       o   Can profile compile-time activity, just run-time, or	just END time

       o   Uses	novel techniques for efficient profiling

       o   Sub-microsecond (100ns) resolution on supported systems

       o   Very	fast - the fastest statement and subroutine profilers for perl

       o   Handles applications	that fork, with	no performance cost

       o   Immune from noise caused by profiling overheads and I/O

       o   Program being profiled can stop/start the profiler

       o   Generates richly annotated and cross-linked html reports

       o   Captures source code, including string evals, for stable results

       o   Trivial to use with mod_perl	- add one line to httpd.conf

       o   Includes an extensive test suite

       o   Tested on very large	codebases

       NYTProf is effectively two profilers in one: a statement	profiler, and
       a subroutine profiler.

   Statement Profiling
       The statement profiler measures the time	between	entering one perl
       statement and entering the next.	Whenever execution reaches a new
       statement, the time since entering the previous statement is calculated
       and added to the	time associated	with the line of the source file that
       the previous statement starts on.

       By default the statement	profiler also determines the first line	of the
       current block and the first line	of the current statement, and
       accumulates times associated with those.

       Another innovation unique to NYTProf is automatic compensation for a
       problem inherent	in simplistic statement-to-statement timing. Consider
       a statement that	calls a	subroutine and then performs some other	work
       that doesn't execute new	statements, for	example:

	 foo(...) + mkdir(...);

       In all other statement profilers	the time spent in remainder of the
       expression (mkdir in the	example) will be recorded as having been spent
       on the last statement executed in foo()!	Here's another example:

	 while (<>) {

       After the first time around the loop, any further time spent evaluating
       the condition (waiting for input	in this	example) would be recorded as
       having been spent on the	last statement executed	in the loop! (Until
       perl bug	#60954 is fixed	this problem still applies to some loops. For
       more information	see

       NYTProf avoids these problems by	intercepting the opcodes which
       indicate	that control is	returning into some previous statement and
       adjusting the profile accordingly.

       The statement profiler naturally	generates a lot	of data	which is
       streamed	out to a file in a very	compact	format.	NYTProf	takes care to
       not include the measurement and writing overheads in the	profile	times
       (some profilers produce 'noisy' data due	to periodic stdio flushing).

   Subroutine Profiling
       The subroutine profiler measures	the time between entering a subroutine
       and leaving it. It then increments a call count and accumulates the
       duration.  For each subroutine called, separate counts and durations
       are stored for each location that called	the subroutine.

       Subroutine entry	is detected by intercepting the	"entersub" opcode.
       Subroutine exit is detected via perl's internal save stack. As a	result
       the subroutine profiler is both fast and	robust.

       Subroutine Recursion

       For subroutines that recurse directly or	indirectly, such as
       Error::try, the inclusive time is only measured for the outer-most

       The inclusive times of recursive	calls are still	measured and are
       accumulated separately. Also the	'maximum recursion depth' per calling
       location	is recorded.

       Goto _Subroutine

       Perl implements a "goto &destination" as	a "return" followed by a call
       to &destination,	so that's how it will appear in	the report.

       The "goto" will be shown	with a very short time because it's
       effectively just	a "return". The	&destination sub will show a call not
       from the	location of the	"goto" but from	the location of	the call to
       the sub that performed the "goto".


       The perl	built-in accept() function waits listening for a connection on
       a socket, and so	is a key part of pure-perl network service

       The time	spent waiting for a remotely initiated connection can be
       relatively high but is not relevant to the performance of the
       application. So the accept() function is	treated	as a special case. The
       subroutine profiler discounts the time spent in the accept() function.
       It does this in a way that also discounts that time from	all the
       callers up the call stack. The effect on	the reports is that all
       accept()	calls appear to	be instant.

       The statement profiler still shows the time actually spent in the
       statement that executed the accept() call.

   Application Profiling
       NYTProf records extra information in the	data file to capture details
       that may	be useful when analyzing the performance. It also records the
       filename	and line ranges	of all the subroutines.

       NYTProf can profile applications	that fork, and does so with no loss of
       performance.  NYTProf detects the fork and starts writing a new profile
       file with the pid appended to the filename. Since nytprofhtml only
       works with a single profile file	you may	want to	merge multiple files
       using nytprofmerge.

   Fast	Profiling
       The NYTProf profiler is written almost entirely in C and	great care has
       been taken to ensure it's very efficient.

   Apache Profiling
       Just add	one line near the start	of your	httpd.conf file:

	 PerlModule Devel::NYTProf::Apache

       By default you'll get a /tmp/nytprof.$$.out file	for the	parent process
       and a /tmp/nytprof.$parent.out.$$ file for each worker process.

       NYTProf takes care to detect when control is returning back from	perl
       to mod_perl so time spent in mod_perl (such as waiting for the next
       request)	does not get allocated to the last statement executed.

       Works with mod_perl 1 and 2. See	Devel::NYTProf::Apache for more

       Usually you'd load Devel::NYTProf on the	command	line using the perl -d

	 perl -d:NYTProf

       To save typing the ':NYTProf' you could set the PERL5DB env var

	 PERL5DB='use Devel::NYTProf'

       and then	just perl -d would work:

	 perl -d

       Or you can avoid	the need to add	the -d option at all by	using the
       "PERL5OPT" env var:


       That's also very	handy when you can't alter the perl command line being
       used to run the script you want to profile. Usually you'll want to
       enable the "addpid=1" option to ensure any nested invocations of	perl
       don't overwrite the profile.

       The behavior of Devel::NYTProf may be modified by setting the
       environment variable "NYTPROF".	It is possible to use this environment
       variable	to effect multiple setting by separating the values with a
       ":".  For example:

	 export	NYTPROF=trace=2:start=init:file=/tmp/nytprof.out

       Any colon or equal characters in	a value	can be escaped by preceding
       them with a backslash.

       Append the current process id to	the end	of the filename.

       This avoids concurrent, or consecutive, processes from overwriting the
       same file.  If a	fork is	detected during	profiling then the child
       process will automatically add the process id to	the filename.

       Append the current time,	as integer epoch seconds, to the end of	the

       Set trace level to N. 0 is off (the default). Higher values cause more
       detailed	trace output. Trace output is written to STDERR	or wherever
       the "log=F" option has specified.

       Specify the name	of the file that "trace=N" output should be written

       Specify at which	phase of program execution the profiler	should be

	 start=begin - start immediately (the default)
	 start=init  - start at	beginning of INIT phase	(after compilation/use/BEGIN)
	 start=end   - start at	beginning of END phase
	 start=no    - don't automatically start

       The start=no option is handy if you want	to explicitly control
       profiling by calling DB::enable_profile() and DB::disable_profile()
       yourself.  See "RUN-TIME	CONTROL	OF PROFILING".

       The start=init option is	handy if you want to avoid profiling the
       loading and initialization of modules.

       Disable the perl	optimizer.

       By default NYTProf leaves perl's	optimizer enabled.  That gives you
       more accurate profile timing overall, but can lead to odd statement
       counts for individual sets of lines. That's because the perl's peephole
       optimizer has effectively rewritten the statements but you can't	see
       what the	rewritten version looks	like.

       For example:

	 1     if (...)	{
	 2	   return;
	 3     }

       may be rewritten	as

	 1    return if	(...)

       so the profile won't show a statement count for line 2 in your source
       code because the	"return" was merged into the "if" statement on the
       preceding line.

       Also 'empty' statements like "1;" are removed entirely.	Such
       statements are empty because the	optimizer has already removed the
       pointless constant in void context. It then goes	on to remove the now
       empty statement (in perl	>= 5.13.7).

       Using the "optimize=0" option disables the optimizer so you'll get
       lower overall performance but more accurately assigned statement

       If you find any other examples of the effect of optimizer on NYTProf
       output (other than performance, obviously) please let us	know.

       Set to 0	to disable the collection of subroutine	caller and timing

       Set to 1	to enable the determination of block and subroutine location
       per statement.  This makes the profiler about 50% slower	(as of July
       2008) and produces larger output	files, but you gain some valuable
       insight in where	time is	spent in the blocks within large subroutines
       and scripts.

       Set to 0	to disable the statement profiler. (Implies "blocks=0".)  The
       reports won't contain any statement timing detail.

       This significantly reduces the overhead of the profiler and can also be
       useful for profiling large applications that would normally generate a
       very large profile data file.

       This option is new and experimental.

       With calls=1 (the default) subroutine call return events	are emitted
       into the	data stream as they happen.  With calls=2 subroutine call
       entry events are	also emitted. With calls=0 no subroutine call events
       are produced.  This option depends on the "subs"	option being enabled,
       which it	is by default.

       The nytprofcalls	utility	can be used to process this data. It too is
       new and experimental and	so likely to change.

       The subroutine profiler normally	gathers	data in	memory and outputs a
       summary when the	profile	data is	being finalized, usually when the
       program has finished.  The summary contains aggregate information for
       all the calls from one location to another, but the details of
       individual calls	have been lost.	 The calls option enables the
       recording of individual call events and thus more detailed analysis and
       reporting of that data.

       Set to 0	to disable the extra work done by the statement	profiler to
       allocate	times accurately when returning	into the middle	of statement.
       For example leaving a subroutine	and returning into the middle of
       statement, or re-evaluating a loop condition.

       This feature also ensures that in embedded environments,	such as
       mod_perl, the last statement executed doesn't accumulate	the time spent
       'outside	perl'.

       Force NYTProf to	recalculate the	name of	the caller of the each sub
       instead of 'inheriting' the name	calculated when	the caller was
       entered.	(Rarely	needed,	but might be useful in some odd	cases.)

       Set to 1	to enable use of the traditional DB::DB() subroutine to
       perform profiling, instead of the faster	'opcode	redirection' technique
       that's used by default. Also effectively	sets "leave=0" (see above).

       The default 'opcode redirection'	technique can't	profile	subroutines
       that were compiled before NYTProf was loaded. So	using use_db_sub=1 can
       be useful in cases where	you can't load the profiler early in the life
       of the application.

       Another side effect of "use_db_sub=1" is	that it	enables	recording of
       the source code of the "perl -e '...'" and "perl	-" input for old
       versions	of perl. See also "savesrc=0".

       Disable the saving of source code.

       By default NYTProf saves	a copy of all source code into the profile
       data file.  This	makes the file self-contained, so the reporting	tools
       no longer depend	on having the unmodified source	code files available.

       With "savesrc=0"	some source code is still saved: the arguments to the
       "perl -e" option, the script fed	to perl	via STDIN when using "perl -",
       and the source code of string evals.

       Saving the source code of string	evals requires perl version 5.8.9+,
       5.10.1+,	or 5.12	or later.

       Saving the source code of the "perl -e '...'" or	"perl -" input
       requires	either a recent	perl version, as above,	or setting the
       "use_db_sub=1" option.

       Profile perl opcodes that can be	slow. These include opcodes that make
       system calls, such as "print", "read", "sysread", "socket" etc.,	plus
       regular expression opcodes like "subst" and "match".

       If "N" is 0 then	slowops	profiling is disabled.

       If "N" is 1 then	all the	builtins are treated as	being defined in the
       "CORE" package. So times	for "print" calls from anywhere	in your	code
       are merged and accounted	for as calls to	an xsub	called "CORE::print".

       If "N" is 2 (the	default) then builtins are treated as being defined in
       the package that	calls them. So calls to	"print"	from package "Foo" are
       treated as calls	to an xsub called "Foo::CORE:print". Note the single
       colon after CORE.

       The opcodes are currently profiled using	their internal names, so
       "printf"	is "prtf" and the "-x" file test is "fteexec". This may	change
       in future.

       Opcodes that call subroutines, perhaps by triggering a FETCH from a
       tied variable, currently	appear in the call tree	as the caller of the
       sub. This is likely to change in	future.

       This option has been removed. Profiling won't be	enabled	if set.

       Use the "clock=N" option	to select a high-resolution CPU	time clock, if
       available on your system, instead. That will give you higher resolution
       and work	for the	subroutine profiler as well.

       Specify the output file to write	profile	data to	(default:

       Specify the compression level to	use, if	NYTProf	is compiled with
       compression support. Valid values are 0 to 9, with 0 disabling
       compression. The	default	is 6 as	higher values yield little extra
       compression but the cpu cost starts to rise significantly. Using	level
       1 still gives you a significant reduction in file size.

       If NYTProf was not compiled with	compression support, this option is
       silently	ignored.

       Systems which support the "clock_gettime()" system call typically
       support several clocks. By default NYTProf uses CLOCK_MONOTONIC.

       This option enables you to select a different clock by specifying the
       integer id of the clock (which may vary between operating system
       types).	If the clock you select	isn't available	then CLOCK_REALTIME is

       See "CLOCKS" for	more information.

       When perl exits normally	it runs	any code defined in "END" blocks.
       NYTProf defines an END block that finishes profiling and	writes out the
       final profile data.

       If the process ends due to a signal then	END blocks are not executed so
       the profile will	be incomplete and unusable.  The "sigexit" option
       tells NYTProf to	catch some signals (e.g. INT, HUP, PIPE, SEGV, BUS)
       and ensure a usable profile by executing:

	   exit	1;

       You can also specify which signals to catch in this way by listing
       them, separated by commas, as the value of the option (case is not


       The NYTProf subroutine profiler normally	detects	calls to
       "POSIX::_exit()"	(which exits the process without running END blocks)
       and automatically calls "DB::finish_profile()" for you, so NYTProf
       'just works'.

       When using the "subs=0" option to disable the subroutine	profiler the
       "posix_exit" option can be used to tell NYTProf to take other steps to
       arrange for "DB::finish_profile()" to be	called before

       Arranges	for "finish_profile" to	be called via the C library "atexit()"
       function.  This may help	some tricky cases where	the process may	exit
       without perl executing the "END"	block that NYTProf uses	to call

       Sets the	PERL_EXIT_DESTRUCT_END flag in the PL_exit_flags of the	perl
       interpreter.  This makes	perl run "END" blocks in perl_destruct()
       instead of perl_run() which may help in cases, like Apache, where perl
       is embedded but perl_run() isn't	called.

       When a perl process that	is being profiled executes a fork() the	child
       process is also profiled. The forkdepth option can be used to control
       this. If	forkdepth is zero then profiling will be disabled in the child

       If forkdepth is greater than zero then profiling	will be	enabled	in the
       child process and the forkdepth value in	that process is	decremented by

       If forkdepth is -1 (the default)	then there's no	limit on the number of
       generations of children that are	profiled.

       The 'file name' of a string eval	is normally a string like ""(eval
       N)"", where "N" is a sequence number. By	default	NYTProf	asks perl to
       give evals more informative names like ""(eval N)[file:line]"", where
       "file" and "line" are the file and line number where the	string "eval"
       was executed.

       The "nameevals=0" option	can be used to disable the more	informative
       names and return	to the default behaviour. This may be need in rare
       cases where the application code	is sensitive to	the name given to a
       "eval". (The most common	case in	when running test suites undef

       The downside is that the	NYTProf	reporting tools	are less useful	and
       may get confused	if this	option is used.

       The name	of a anonymous subroutine is normally ""__ANON__"".  By
       default NYTProf asks perl to give anonymous subroutines more
       informative names like ""__ANON__[file:line]"", where "file" and	"line"
       are the file and	line number where the anonymous	subroutine was

       The "nameanonsubs=0" option can be used to disable the more informative
       names and return	to the default behaviour. This may be need in rare
       cases where the application code	is sensitive to	the name given to a
       anonymous subroutines.  (The most common	case in	when running test
       suites undef NYTProf.)

       The downside is that the	NYTProf	reporting tools	are less useful	and
       may get confused	if this	option is used.

       You can profile only parts of an	application by calling
       DB::disable_profile() to	stop collecting	profile	data, and calling
       DB::enable_profile() to start collecting	profile	data.

       Using the "start=no" option lets	you leave the profiler disabled
       initially until you call	DB::enable_profile() at	the right moment. You
       still need to load Devel::NYTProf as early as possible, even if you
       don't call DB::enable_profile() until much later. That's	because	any
       code that's compiled before Devel::NYTProf is loaded will not be
       profiled	by default. See	also "use_db_sub=1".

       The profile output file can't be	used until it's	been properly
       completed and closed.  Calling DB::disable_profile() doesn't do that.
       To make a profile file usable before the	profiled application has
       completed you can call DB::finish_profile(). Alternatively you could
       call DB::enable_profile($newfile).

       Always call the DB::enable_profile(), DB::disable_profile() or
       DB::finish_profile() function with the "DB::" prefix as shown because
       you can't import	them. They're provided automatically when NYTProf is
       in use.


       Stops collection	of profile data	until DB:enable_profile() is called.

       Subroutine calls	which were made	while profiling	was enabled and	are
       still on	the call stack (have not yet exited) will still	have their
       profile data collected when they	exit. Compare with "finish_profile"


       Enables collection of profile data. If $newfile is specified the
       profile data will be written to $newfile	(after completing and closing
       the previous file, if any).  If $newfile	already	exists it will be
       deleted first.  If DB::enable_profile() is called without a filename
       argument	then profile data will continue	to be written to the current
       file (nytprof.out by default).


       Calls DB::disable_profile(), then completes the profile data file by
       writing subroutine profile data,	and then closes	the file. The in
       memory subroutine profile data is then discarded.

       Normally	NYTProf	arranges to call finish_profile() for you via an END

       NYTProf tries very hard to gather accurate information.	The nature of
       the internals of	perl mean that,	in some	cases, the information that's
       gathered	is accurate but	surprising. In some cases it can appear	to be
       misleading.  (Of	course,	in some	cases it may actually be plain wrong.
       Caveat lector.)

   If Statement	and Subroutine Timings Don't Match
       NYTProf has two profilers: a statement profiler that's invoked when
       perl moves from one perl	statement to another, and a subroutine
       profiler	that's invoked when perl calls or returns from a subroutine.

       The individual statement	timings	for a subroutine usually add up	to
       slightly	less than the exclusive	time for the subroutine. That's
       because the handling of the subroutine call and return overheads	is
       included	in the exclusive time for the subroutine. The difference may
       only be a few microseconds but that may become noticeable for
       subroutines that	are called hundreds of thousands of times.

       The statement profiler keeps track how much time	was spent on
       overheads, like writing statement profile data to disk. The subroutine
       profiler	subtracts the overheads	that have accumulated between entering
       and leaving the subroutine in order to give a more accurate profile.
       The statement profiler is generally very	fast because most writes get
       buffered	for zip	compression so the profiler overhead per statement
       tends to	be very	small, often a single 'tick'.  The result is that the
       accumulated overhead is quite noisy. This becomes more significant for
       subroutines that	are called frequently and are also fast.  This may be
       another,	smaller, contribution to the discrepancy between statement
       time and	exclusive times.

   If Headline Subroutine Timings Don't	Match the Called Subs
       Overall subroutine times	are reported with a headline like "spent 10s
       (2+8) within ...".  In this example, 10 seconds were spent inside the
       subroutine (the "inclusive time") and, of that, 8 seconds were spent in
       subroutines called by this one.	That leaves 2 seconds as the time
       spent in	the subroutine code itself (the	"exclusive time", sometimes
       also called the "self time").

       The report shows	the source code	of the subroutine. Lines that make
       calls to	other subroutines are annotated	with details of	the time spent
       in those	calls.

       Sometimes the sum of the	times for calls	made by	the lines of code in
       the subroutine is less than the inclusive-exclusive time	reported in
       the headline (10-2 = 8 seconds in the example above).

       What's happening	here is	that calls to other subroutines	are being made
       but NYTProf isn't able to determine the calling location	correctly so
       the calls don't appear in the report in the correct place.

       Using an	old version of perl is one cause (see below). Another is
       calling subroutines that	exit via "goto &sub;" -	most frequently
       encountered in AUTOLOAD subs and	code using the Memoize module.

       In general the overall subroutine timing	is accurate and	should be
       trusted more than the sum of statement or nested	sub call timings.

   Perl	5.10.1+	(or else 5.8.9+) is Recommended
       These versions of perl yield much more detailed information about calls
       to BEGIN, CHECK,	INIT, and END blocks, the code handling	tied or
       overloaded variables, and callbacks from	XS code.

       Perl 5.12 will hopefully	also fix an inaccuracy in the timing of	the
       last statement and the condition	clause of some kinds of	loops:

   eval	$string
       Perl treats each	execution of a string eval ("eval $string;" not	"eval
       { ...  }") as a distinct	file, so NYTProf does as well. The 'files' are
       given names with	this structure:

	       (eval $sequence)[$filename:$line]

       for example ""(eval 93)[/foo/]"" would be the name given to
       the 93rd	execution of a string eval by that process and,	in this	case,
       the 93rd	eval happened to be one	at line	42 of "/foo/".

       Nested string evals can give rise to file names like

	       (eval 1047)[(eval 93)[/foo/]:17]

       Merging Evals

       Some applications execute a great many string eval statements. If
       NYTProf generated a report page for each	one it would not only slow
       report generation but also make the overall report less useful by
       scattering performance data too widely.	On the other hand, being able
       to see the actual source	code executed by an eval, along	with the
       timing details, is often	very useful.

       To try to balance these conflicting needs, NYTProf currently merges
       uninteresting string eval siblings.

       What does that mean? Well, for each source code line that executed any
       string evals, NYTProf first gathers the corresponding eval 'files' for
       that line (known	as the 'siblings') into	groups keyed by	distinct
       source code.

       Then, for each of those groups of siblings, NYTProf will	'merge'	a
       group that shares the same source code and doesn't execute any string
       evals itself.  Merging means to pick one	sibling	as the survivor	and
       merge and delete	all the	data from the others into it.

       If there	are a large number of sibling groups then the data for all of
       them are	merged into one	regardless.

       The report annotations will indicate when evals have been merged

       Merging Anonymous Subroutines

       Anonymous subroutines defined within string evals have names like this:

	       main::__ANON__[(eval 75)[/foo/]:12]

       That anonymous subroutine was defined on	line 12	of the source code
       executed	by the string eval on line 42 of /foo/ That was the
       75th string eval	executed by the	program.

       Anonymous subroutines defined on	the same line of sibling evals that
       get merged are also merged. That	is, the	profile	information is merged
       into one	and the	others are discarded.


       Care should be taken when interpreting the report annotations
       associated with a string	eval statement.	 Normally the report
       annotations embedded into the source code related to timings from the
       subroutine profiler. This isn't (currently) true	of annotations for
       string eval statements.

       This makes a significant	different if the eval defines any subroutines
       that get	called after the eval has returned. Because the	time shown for
       a string	eval is	based on the statement times it	will include time
       spent executing statements within the subs defined by the eval.

       In future NYTProf may involve the subroutine profiler in	timings	evals
       and so be able to avoid this issue.

   Calls from XSUBs and	Opcodes
       Calls record the	current	filename and line number of the	perl code at
       the time	the call was made. That's fine and accurate for	calls from
       perl code. For calls that originate from	C code however,	such as	an
       XSUB or an opcode, the filename and line	number recorded	are still
       those of	the last perl statement	executed.

       For example, a line that	calls an xsub will appear in reports to	also
       have also called	any subroutines	that that xsub called. This can	be
       construed as a feature.

       As an extreme example, the first	time a regular expression that uses
       character classes is executed on	a unicode string you'll	find profile
       data like this:

	     # spent 1ms within	main::BEGIN@4 which was	called
	     #	  once (1ms+0s)	by main::CORE:subst at line 0
	 4   s/	(?: [A-Z] | [\d] )+ (?=	[\s] ) //x;
	     # spent  38.8ms making 1 call to main::CORE:subst
	     # spent  25.4ms making 2 calls to utf8::SWASHNEW, avg 12.7ms/call
	     # spent  12.4ms making 1 call to utf8::AUTOLOAD

       You can reduce the cost of profiling by adjusting some options. The
       trade-off is reduced detail and/or accuracy in reports.

       If you don't need statement-level profiling then	you can	disable	it via
       "stmts=0".  To further boost statement-level profiling performance try
       "leave=0" but note that will apportion timings for some kinds of
       statements less accurate).

       If you don't need call stacks or	flamegraph then	disable	it via
       "calls=0".  If you don't	need subroutine	profiling then you can disable
       it via "subs=0".	 If you	do need	it but don't need timings for perl
       opcodes then set	"slowops=0".

       Generally speaking, setting calls=0 and slowops=0 will give you a
       useful boost with the least loss	of detail.

       Another approach	is to only enable NYTProf in the sections of code that
       interest	you. See "RUN-TIME CONTROL OF PROFILING" for more details.

       To speed	up nytprofhtml try using the --minimal (-m) or --no-flame

       The Devel::NYTProf::Data	module provides	a low-level interface for
       loading the profile data.

       The Devel::NYTProf::Reader module provides an interface for generating
       arbitrary reports.  This	means that you can implement your own output
       format in perl. (Though the module is in	a state	of flux	and may	be
       deprecated soon.)

       Included	in the bin directory of	this distribution are some scripts
       which turn the raw profile data into more useful	formats:

       Creates attractive, richly annotated, and fully cross-linked html
       reports (including statistics, source code and color highlighting).
       This is the main	report generation tool for NYTProf.

       Translates a profile into a format that can be loaded into KCachegrind

       Reads a profile and processes the calls events it contains.

       Reads multiple profile data files and writes out	a new file containing
       the merged profile data.

   Threads and Multiplicity
       "Devel::NYTProf"	is not currently thread	safe or	multiplicity safe.  If
       you'd be	interested in helping to fix that then please get in touch
       with us.	Meanwhile, profiling is	disabled when a	thread is created, and
       NYTProf tries to	ignore any activity from perl interpreters other than
       the first one that loaded it.

       The "Devel::NYTProf" subroutine profiler	gets confused by the stack
       gymnastics performed by the Coro	module and aborts. When	profiling
       applications that use Coro you should disable the subroutine profiler
       using the "subs=0" option.

       Using "open('-|')" in code running under	FCGI::Engine causes a panic in
       nytprofcalls.  See
       for more	information.

   For perl < 5.8.8 it may change what caller()	returns
       For example, the	Readonly module	croaks with "Invalid tie" when
       profiled	with perl versions before 5.8.8. That's	because	Readonly
       explicitly checking for certain values from caller(). The NEXT module
       is also affected.

   For perl < 5.10.1 it	can't see some implicit	calls and callbacks
       For perl	versions prior to 5.8.9	and 5.10.1, some implicit subroutine
       calls can't be seen by the subroutine profiler. Technically this
       affects calls made via the various perl "call_*()" internal APIs.

       For example, BEGIN/CHECK/INIT/END blocks, the "TIE"whatever subroutine
       called by "tie()", all calls made via operator overloading, and
       callbacks from XS code, are not seen.

       The effect is that time in those	subroutines is accumulated by the subs
       that triggered the call to them.	So time	spent in calls invoked by perl
       to handle overloading are accumulated by	the subroutines	that trigger
       overloading (so it is measured, but the cost is dispersed across
       possibly	many calling locations).

       Although	the calls aren't seen by the subroutine	profiler, the
       individual statements executed by the code in the called	subs are
       profiled	by the statement profiler.

   #line directives
       The reporting code currently doesn't handle #line directives, but at
       least it	warns about them. Patches welcome.

   Freed values	in @_ may be mutated
       Perl has	a class	of bugs	related	to the fact that values	placed in the
       stack are not reference counted.	Consider this example:

	 @a = (1..9);  sub s { undef @a; print $_ for @_ }  s(@a);

       The "undef @a" frees the	values that @_ refers to. Perl can sometimes
       detect when a freed value is accessed and treats	it as an undef.
       However,	if the freed value is assigned some new	value then @_ is
       effectively corrupted.

       NYTProf allocates new values while it's profiling, in order to record
       program activity, and so	may appear to corrupt @_ in this (rare)
       situation.  If this happens, NYTProf is simply exposing an existing
       problem in the code.

   Lvalue subroutines aren't profiled when using use_db_sub=1
       Currently 'lvalue' subroutines (subs that can be	assigned to, like
       "foo() =	42") are not profiled when using use_db_sub=1.

       Here we discuss the way NYTProf gets high-resolution timing information
       from your system	and related issues.

   POSIX Clocks
       These are the clocks that your system may support if it supports	the
       POSIX "clock_gettime()" function. Other clock sources are listed	in the
       "Other Clocks" section below.

       The "clock_gettime()" interface allows clocks to	return times to
       nanosecond precision. Of	course few offer nanosecond accuracy but the
       extra precision helps reduce the	cumulative error that naturally	occurs
       when adding together many timings. When using these clocks NYTProf
       outputs timings as a count of 100 nanosecond ticks.


       CLOCK_MONOTONIC represents the amount of	time since an unspecified
       point in	the past (typically system start-up time).  It increments
       uniformly independent of	adjustments to 'wallclock time'. NYTProf will
       use this	clock by default, if available.


       CLOCK_REALTIME is typically the system's	main high resolution 'wall
       clock time' source.  The	same source as used for	the gettimeofday()
       call used by most kinds of perl benchmarking and	profiling tools.

       The problem with	real time is that it's far from	simple.	It tends to
       drift and then be reset to match	'reality', either sharply or by	small
       adjustments (via	the adjtime() system call).

       Surprisingly, it	can also go backwards, for reasons explained in so CLOCK_MONOTONIC is preferred.


       CLOCK_VIRTUAL increments	only when the CPU is running in	user mode on
       behalf of the calling process.


       CLOCK_PROF increments when the CPU is running in	user or	kernel mode.


       CLOCK_PROCESS_CPUTIME_ID	represents the amount of execution time	of the
       process associated with the clock.


       CLOCK_THREAD_CPUTIME_ID represents the amount of	execution time of the
       thread associated with the clock.

       Finding Available POSIX Clocks

       On unix-like systems you	can find the CLOCK_* clocks available on you
       system using a command like:

	 grep -r 'define *CLOCK_' /usr/include

       Look for	a group	that includes CLOCK_REALTIME. The integer values
       listed are the clock ids	that you can use with the "clock=N" option.

       A future	version	of NYTProf should be able to list the supported

   Other Clocks
       This section lists other	clock sources that NYTProf may use.

       If your system doesn't support clock_gettime() then NYTProf will	use
       gettimeofday(), or the nearest equivalent,


       This is the traditional high resolution time of day interface for most
       unix-like systems.  With	this clock NYTProf outputs timings as a	count
       of 1 microsecond	ticks.


       On Mac OS X the mach_absolute_time() function is	used. With this	clock
       NYTProf outputs timings as a count of 100 nanosecond ticks.


       On systems which	don't support other clocks, NYTProf falls back to
       using the Time::HiRes module.  With this	clock NYTProf outputs timings
       as a count of 1 microsecond ticks.

   Clock References
       Relevant	specifications and manual pages:

       Why 'realtime' can appear to go backwards:

       The PostgreSQL pg_test_timing utility documentation has a good summary
       of timing issues:

   SMP Systems
       On systems with multiple	processors, which includes most	modern
       machines, (from Linux docs though applicable to most SMP	systems):

	 many platforms	using timers from the CPUs (TSC	on i386, AR.ITC	on Itanium).
	 These registers may differ between CPUs and as	a consequence these clocks may
	 return	bogus results if a process is migrated to another CPU.

	 If the	CPUs in	an SMP system have different clock sources then	there is no way
	 to maintain a correlation between the timer registers since each CPU will run
	 at a slightly different frequency. If that is the case	then
	 clock_getcpuclockid(0)	will return ENOENT to signify this condition. The two
	 clocks	will then only be useful if it can be ensured that a process stays on a
	 certain CPU.

	 The processors	in an SMP system do not	start all at exactly the same time and
	 therefore the timer registers are typically running at	an offset. Some
	 architectures include code that attempts to limit these offsets on bootup.
	 However, the code cannot guarantee to accurately tune the offsets. Glibc
	 contains no provisions	to deal	with these offsets (unlike the Linux Kernel).
	 Typically these offsets are small and therefore the effects may be negligible
	 in most cases.

       In summary, SMP systems are likely to give 'noisy' profiles.  Setting a
       "Processor Affinity" may	help.

       Processor Affinity

       Processor affinity is an	aspect of task scheduling on SMP systems.
       "Processor affinity takes advantage of the fact that some remnants of a
       process may remain in one processor's state (in particular, in its
       cache) from the last time the process ran, and so scheduling it to run
       on the same processor the next time could result	in the process running
       more efficiently	than if	it were	to run on another processor." (From

       Setting an explicit processor affinity can avoid	the problems described
       in "SMP Systems".

       Processor affinity can be set using the "taskset" command on Linux.

       Note that processor affinity is inherited by child processes, so	if the
       process you're profiling	spawns cpu intensive sub processes then	your
       process will be impacted	by those more than it otherwise	would.



       On Windows NYTProf uses Time::HiRes which uses the windows
       QueryPerformanceCounter() API with some extra logic to adjust for the
       current clock speed and try to resync the raw counter to	wallclock time
       every so	often (every 30	seconds	or if the timer	drifts by more than
       0.5 of a	seconds).  This	extra logic may	lead to	occasional spurious

       (It would be great if someone could contribute a	patch to NYTProf to
       use QueryPerformanceCounter() directly and avoid	the overheads and
       resyncing behaviour of Time::HiRes.)

   Virtual Machines
       I recommend you don't do	performance profiling while running in a
       virtual machine.	 If you	do you're likely to find inexplicable spikes
       of real-time appearing at unreasonable places in	your code. You should
       pay less	attention to the statement timings and rely more on the
       subroutine timings. They	will still be noisy but	less so	than the
       statement times.

       You could also try using	the "clock=N" option to	select a high-
       resolution cpu-time clock instead of a real-time	one. That should be
       much less noisy,	though you will	lose visibility	of wait-times due to
       network and disk	I/O, for example.

       Possibly. All complex software has bugs.	Let me know if you find	one.

       Screenshots of nytprofhtml v2.01	reports	can be seen at
       A writeup of the	new features of	NYTProf	v2 can be found	at
       and the background story, explaining the	"why", can be found at

       Mailing list and	discussion at

       Blog posts <>

       Public Github Repository	and hacking instructions at

       nytprofhtml is a	script included	that produces html reports.
       nytprofcsv is another script included that produces plain text CSV

       Devel::NYTProf::Reader is the module that powers	the report scripts.
       You might want to check this out	if you plan to implement a custom
       report (though it's very	likely to be deprecated	in a future release).

       Devel::NYTProf::ReadStream is the module	that lets you read a profile
       data file as a stream of	chunks of data.

       Other tools:

       DTrace <>

   "Profile data incomplete, ..." or "Profile format error: ..."
       This error message means	the file doesn't contain all the expected data
       or the data has been corrupted in some way.  That may be	because	it was
       truncated (perhaps the filesystem was full) or, more commonly, because
       the all the expected data hasn't	been written.

       NYTProf writes some important data to the data file when	finishing
       profiling.  If you read the file	before the profiling has finished
       you'll get this error.

       If the process being profiled is	still running you'll need to wait
       until it	exits cleanly (runs "END" blocks or "finish_profile" is	called

       If the process being profiled has exited	then it's likely that it met
       with a sudden and unnatural death that didn't give NYTProf a chance to
       finish the profile.  If the sudden death	was due	to a signal, like
       SIGTERM,	or a SIGINT from pressing Ctrl-C, then the "sigexit=1" option
       may help.

       If the sudden death was due to calling "POSIX::_exit($status)" then
       you'll need to call "finish_profile" before calling "POSIX::_exit".

       You'll also get this error if the code trying to	read the profile is
       itself being profiled. That's most likely to happen if you enable
       profiling via the "PERL5OPT" environment	variable and have forgotten to
       unset it.

       If you've encountered this error	message, and you're sure you've
       understood the concerns described above,	and you're sure	they don't
       apply in	your case, then	please open an issue.  Be sure to include
       sufficient information so I can see how you've addressed	these likely

   Some	source files don't have	profile	information
       This is usually due to NYTProf being initialized	after some perl	files
       have already been compiled.

       If you can't alter the command line to add ""-d:NYTProf"" you could try
       using the "PERL5OPT" environment	variable. See "PROFILING".

       You could also try using	the "use_db_sub=1" option.

   Eval	... has	unknown	invoking fid
       When using the statement	profiler you may see a warning message like

	 Eval '(eval 2)' (fid 9, flags:viastmt,savesrc)	has unknown invoking fid 10

       Notice that the eval file id (fid 9 in this case) is lower than the
       file id that invoked the	eval (fid 10 in	this case). This is a known
       problem caused by the way perl works and	how the	profiler assigns and
       outputs the file	ids.  The invoking fid is known	but gets assigned a
       fid and output after the	fid for	the eval, and that causes the warning
       when the	file is	read.

   Warning: %d subroutine calls	had negative time
       There are two likely causes for this: clock instability,	or accumulated
       timing errors.

       Clock instability, if present on	your system, is	most likely to be
       noticeable on very small/fast subroutines that are called very few

       Accumulated timing errors can arise because the subroutine profiler
       uses floating point values (NVs)	to store the times.  They are most
       likely to be noticed on subroutines that	are called a few times but
       which make a large number of calls to very fast subroutines (such as
       opcodes). In this case the accumulated time apparently spent making
       those calls can be greater than the time	spent in the calling

       If you rerun nytprofhtml	(etc.) with the	"trace=N" option set >0	you'll
       see trace messages like	"%s call has negative time: incl %fs, excl
       %fs" for	each affected subroutine.

       Try profiling with the "slowops=N" option set to	0 to disable the
       profiling of opcodes. Since opcodes often execute in a few microseconds
       they are	a common cause of this warning.

       You could also try recompiling perl to use 'long	doubles' for the NV
       floating	point type (use	Configure -Duselongdouble). If you try this
       please let me know.  I'd	also happily take a patch to use long doubles,
       if available, by	default.

   panic: buffer overflow ...
       You have	unusually long subroutine names	in your	code. You'll need to
       rebuild Devel::NYTProf with the NYTP_MAX_SUB_NAME_LEN environment
       variable	set to a value longer than the longest subroutine names	in
       your code.

       Tim Bunce (<> and <>)
       leads the project and has done most of the development work thus	far.

       Nicholas	Clark contributed zip compression and "nytprofmerge".  Chia-
       liang Kao contributed "nytprofcg".  Peter (Stig)	Edwards	contributed
       the VMS port.  Jan Dubois contributed the Windows port.	Gisle Aas
       contributed the Devel::NYTProf::ReadStream module.  Steve Peters
       contributed greater perl	version	portability and	use of POSIX high-
       resolution clocks.  Other contributors are noted	in the Changes file.

       Many thanks to Adam Kaplan who created "NYTProf"	initially by forking
       "Devel::FastProf" adding	reporting from "Devel::Cover" and a test
       suite.  For more	details	see "HISTORY" below.

	 Copyright (C) 2008 by Adam Kaplan and The New York Times Company.
	 Copyright (C) 2008-2016 by Tim	Bunce, Ireland.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself, either Perl	version	5.8.8 or, at
       your option, any	later version of Perl 5	you may	have available.

       Subroutine-level	profilers:

	 Devel::DProf	     | 1995-10-31 | ILYAZ
	 Devel::AutoProfiler | 2002-04-07 | GSLONDON
	 Devel::Profiler     | 2002-05-20 | SAMTREGAR
	 Devel::Profile	     | 2003-04-13 | JAW
	 Devel::DProfLB	     | 2006-05-11 | JAW
	 Devel::WxProf	     | 2008-04-14 | MKUTTER

       Statement-level profilers:

	 Devel::SmallProf    | 1997-07-30 | ASHTED
	 Devel::FastProf     | 2005-09-20 | SALVA
	 Devel::NYTProf	     | 2008-03-04 | AKAPLAN
	 Devel::Profit	     | 2008-05-19 | LBROCARD

       Devel::NYTProf is a (now	distant) fork of Devel::FastProf, which	was
       itself an evolution of Devel::SmallProf.

       Adam Kaplan forked Devel::FastProf and added html report	generation
       (based on Devel::Cover) and a test suite	- a tricky thing to do for a
       profiler.  Meanwhile Tim	Bunce had been extending Devel::FastProf to
       add novel per-sub and per-block timing, plus subroutine caller

       When Devel::NYTProf was released	Tim switched to	working	on
       Devel::NYTProf because the html report would be a good way to show the
       extra profile data, and the test	suite made development much easier and

       Then Tim	went a little crazy and	added a	slew of	new features, in
       addition	to per-sub and per-block timing	and subroutine caller
       tracking. These included	the 'opcode interception' method of profiling,
       ultra-fast and robust inclusive subroutine timing, doubling
       performance, plus major changes to html reporting to display all	the
       extra profile call and timing data in richly annotated and cross-linked

       Steve Peters came on board along	the way	with patches for portability
       and to keep NYTProf working with	the latest development perl versions.
       Nicholas	Clark added zip	compression, many optimizations, and
       "nytprofmerge".	Jan Dubois contributed Windows support.

       Adam's work was sponsored by The	New York Times Co.
       <>.  Tim's work was partly sponsored by
       Shopzilla <> during 2008	but hasn't been
       sponsored since then.

       For the record, Tim has never worked for	the New	York Times nor has he
       received	any kind of sponsorship	or support from	them in	relation to
       NYTProf.	The name of this module	is simply result of the	history
       outlined	above, which can be summarised as: Adam	forked an existing
       module when he added his	enhancements and Tim didn't.

perl v5.32.0			  2018-06-04		     Devel::NYTProf(3)


Want to link to this manual page? Use this URL:

home | help