PT-STALK(1)	      User Contributed Perl Documentation	   PT-STALK(1)

       pt-stalk	- Collect forensic data	about MySQL when problems occur.

       Usage: pt-stalk [OPTIONS]

       pt-stalk	waits for a trigger condition to occur,	then collects data to
       help diagnose problems.	The tool is designed to	run as a daemon	with
       root privileges,	so that	you can	diagnose intermittent problems that
       you cannot observe directly.  You can also use it to execute a custom
       command,	or to collect data on demand without waiting for the trigger
       to occur.

       Percona Toolkit is mature, proven in the	real world, and	well tested,
       but all database	tools can pose a risk to the system and	the database
       server.	Before using this tool,	please:

       o   Read	the tool's documentation

       o   Review the tool's known "BUGS"

       o   Test	the tool on a non-production server

       o   Back up your production server and verify the backups

       Sometimes a problem happens infrequently	and for	a short	time, giving
       you no chance to	see the	system when it happens.	How do you solve
       intermittent MySQL problems when you can't observe them? That's why
       pt-stalk exists. In addition to using it when there's a known problem on
       your servers, it	is a good idea to run pt-stalk all the time, even when
       you think nothing is wrong.  You	will appreciate	the data it collects
       when a problem occurs, because problems such as MySQL lockups or	spikes
       in activity typically leave no evidence to use in root cause analysis.

       pt-stalk	does two things: it watches a MySQL server and waits for a
       trigger condition to occur, and it collects diagnostic data when	that
       trigger occurs.  To avoid false positives caused by short-lived
       problems, the trigger condition must be true at least "--cycles"	times
       before a	"--collect" is triggered.

       To use pt-stalk effectively, you	need to	define a good trigger.	A good
       trigger is sensitive enough to fire reliably when a problem occurs, so
       that you	don't miss a chance to solve problems.	On the other hand, a
       good trigger isn't prone	to false positives, so you don't gather
       information when	the server is functioning normally.

       The most	reliable triggers for MySQL tend to be the number of
       connections to the server, and the number of queries running
       concurrently. These are available in the	SHOW GLOBAL STATUS command as
       Threads_connected and Threads_running.  Sometimes Threads_connected is
       not a reliable indicator	of trouble, but	Threads_running	usually	is.
       Your job, as the	tool's user, is	to define an appropriate trigger
       condition for the tool.	Choose carefully, because the quality of your
       results will depend on the trigger you choose.

       You define the trigger with the "--function", "--variable",
       "--threshold", and "--cycles" options.  The default values for these
       options define a reasonable trigger, but you should adjust or change
       them to suit your particular system and needs.

       By default, pt-stalk watches MySQL forever until the trigger
       occurs, then it collects	diagnostic data	for a while, and sleeps
       afterwards to avoid repeatedly collecting data if the trigger remains
       true.  The general order	of operations is:

	  while	true; do
	     if	--variable from	--function > --threshold; then
		if cycles_true >= --cycles; then
		   if --collect; then
		      if --disk-bytes-free and --disk-pct-free ok; then
			 (--collect for	--run-time seconds) &
		      rm files in --dest older than --retention-time
		if iter	< --iterations;	then
		   sleep --sleep seconds
		if iter	< --iterations;	then
		   sleep --interval seconds
	  rm old --dest	files older than --retention-time
          if --collect processes are still running; then
	     wait up to	--run-time * 3 seconds
	     kill any remaining	--collect processes
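
       The trigger bookkeeping in the loop above can be sketched as a small
       Bash function.  This is an illustration, not pt-stalk's own code:
       simulate_trigger is a hypothetical name, and the sketch assumes that
       a check at or below the threshold resets the count and that the count
       restarts after a collection.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of pt-stalk: replay a list of sampled
# values and report the 1-based sample at which collection would fire.
# Usage: simulate_trigger THRESHOLD CYCLES VALUE...
simulate_trigger() {
   local threshold=$1 cycles=$2
   shift 2
   local cycles_true=0 i=0 value
   for value in "$@"; do
      i=$((i + 1))
      if [ "$value" -gt "$threshold" ]; then
         cycles_true=$((cycles_true + 1))
      else
         cycles_true=0   # assumption: matches must be consecutive
      fi
      if [ "$cycles_true" -ge "$cycles" ]; then
         echo "collect at sample $i"
         cycles_true=0   # assumption: the count restarts after collecting
      fi
   done
}

# Threshold 25, three cycles: the short spike (30, 31) is ignored.
simulate_trigger 25 3 10 30 31 12 40 41 42 50   # prints "collect at sample 7"
```

       The two-sample spike never reaches three matching checks, so only the
       longer run of high values triggers a collection.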

       The diagnostic data is written to files whose names begin with a
       timestamp, so you can distinguish samples from each other in case the
       tool collects data multiple times.  The pt-sift tool is designed	to
       help you	browse and analyze the resulting data samples.

       Although	this sounds simple enough, in practice there are a number of
       subtleties, such	as detecting when the disk is beginning	to fill	up so
       that the	tool doesn't cause the server to run out of disk space.	 This
       tool handles these types	of potential problems, so it's a good idea to
       use this	tool instead of	writing	something from scratch and possibly
       experiencing some of the	hazards	this tool is designed to avoid.

       You can use standard Percona Toolkit configuration files	to set command
       line options.

       You will	probably want to run the tool as a daemon and customize	at
       least the "--threshold".	 Here's	a sample configuration file for
       triggering when there are more than 20 queries running at once:
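
       A minimal sketch of such a file, written here with a shell
       here-document.  It assumes the usual Percona Toolkit config syntax
       (one long option per line, without the leading dashes); the file name
       pt-stalk.conf and the cycles value are illustrative choices, not
       values taken from this page.

```shell
# Write a sample config file; the path is an arbitrary example.
conf=./pt-stalk.conf
cat > "$conf" <<'EOF'
# Config for pt-stalk: trigger when more than 20 queries run at once
variable=Threads_running
threshold=20
# illustrative: require the condition to hold for two checks
cycles=2
EOF
```

       The file can then be named with "--config" as the first option on the
       command line, for example "pt-stalk --config ./pt-stalk.conf
       --daemonize".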


       If you don't run the tool as root, then you will need to specify
       several options, such as "--pid", "--log", and "--dest"; otherwise the
       tool will probably fail to start.

	   Prompt for a	password when connecting to MySQL.

	   default: yes; negatable: yes

           Collect diagnostic data when the trigger occurs.  Specify
           "--no-collect" to make the tool watch the system but not collect
           data.

	   See also "--stalk".

           Collect GDB stack traces.  This is achieved by attaching to MySQL
	   and printing	stack traces from all threads. This will freeze	the
	   server for some period of time, ranging from	a second or so to much
	   longer on very busy systems with a lot of memory and	many threads
	   in the server.  For this reason, it is disabled by default.
	   However, if you are trying to diagnose a server stall or lockup,
	   freezing the	server causes no additional harm, and the stack	traces
	   can be vital	for diagnosis.

	   In addition to freezing the server, there is	also some risk of the
	   server crashing or performing badly after GDB detaches from it.

	   Collect oprofile data.  This	is achieved by starting	an oprofile
	   session, letting it run for the collection time, and	then stopping
	   and saving the resulting profile data in the	system's default
	   location.  Please read your system's	oprofile documentation to
	   learn more about this.

	   Collect strace data.	This is	achieved by attaching strace to	the
	   server, which will make it run very slowly until strace detaches.
	   The same cautions apply as those listed in --collect-gdb.  You
	   should not enable this option together with --collect-gdb, because
	   GDB and strace can't	attach to the server process simultaneously.

	   Collect tcpdump data. This option causes tcpdump to capture all
	   traffic on all interfaces for the port on which MySQL is listening.
	   You can later use pt-query-digest to	decode the MySQL protocol and
	   extract a log of query traffic from it.

	   type: string

	   Read	this comma-separated list of config files.  If specified, this
	   must	be the first option on the command line.

	   type: int; default: 5

	   How many times "--variable" must be greater than "--threshold"
	   before triggering "--collect".  This	helps prevent false positives,
	   and makes the trigger condition less	likely to fire when the
	   problem recovers quickly.

	   Daemonize the tool.	This causes the	tool to	fork into the
	   background and log its output as specified in --log.

	   short form: -F; type: string

	   Only	read mysql options from	the given file.	 You must give an
	   absolute pathname.

	   type: string; default: /var/lib/pt-stalk

	   Where to save diagnostic data from "--collect".  Each time the tool
	   collects data, it writes to a new set of files, which are named
	   with	the current system timestamp.

	   type: size; default:	100M

           Do not "--collect" if the disk has less than this much free space.
           This prevents the tool from filling up the disk with diagnostic
           data.

	   If the "--dest" directory contains a	previously captured sample of
	   data, the tool will measure its size	and use	that as	an estimate of
	   how much data is likely to be gathered this time, too.  It will
	   then	be even	more pessimistic, and will refuse to collect data
	   unless the disk has enough free space to hold the sample and	still
	   have	the desired amount of free space.  For example,	if you'd like
	   100MB of free space and the previous	diagnostic sample consumed
           100MB, the tool won't collect any data unless the disk has 200MB
           free.

	   Valid size value suffixes are k, M, G, and T.

	   type: int; default: 5

	   Do not "--collect" if the disk has less than	this percent free
	   space.  This	prevents the tool from filling up the disk with
	   diagnostic data.

	   This	option works similarly to "--disk-bytes-free" but specifies a
	   percentage margin of	safety instead of a bytes margin of safety.
	   The tool honors both	options, and will not collect any data unless
	   both	margins	are satisfied.
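
           The two margins can be sketched as plain integer arithmetic.  The
           function below is a hypothetical helper, not pt-stalk's
           implementation; in particular, it assumes the percentage check is
           applied to the same "free space minus previous sample" estimate as
           the bytes check.

```shell
# Hypothetical helper, not pt-stalk's implementation.
# Usage: disk_space_ok FREE_BYTES TOTAL_BYTES PREV_SAMPLE_BYTES MIN_FREE_BYTES MIN_FREE_PCT
disk_space_ok() {
   local free=$1 total=$2 prev=$3 min_bytes=$4 min_pct=$5
   # Free space expected to remain if this sample is as large as the last one.
   local after=$((free - prev))
   [ "$after" -ge "$min_bytes" ] || return 1
   # Assumption: the percentage margin is checked against the same estimate.
   [ $((after * 100 / total)) -ge "$min_pct" ] || return 1
   return 0
}

MB=$((1024 * 1024))
# 200MB free, previous sample 100MB, 100MB margin: just enough.
disk_space_ok $((200 * MB)) $((1000 * MB)) $((100 * MB)) $((100 * MB)) 5 && echo "collect"
# 199MB free: one megabyte short of the bytes margin.
disk_space_ok $((199 * MB)) $((1000 * MB)) $((100 * MB)) $((100 * MB)) 5 || echo "skip"
```

           The first call mirrors the 100MB + 100MB example above; the second
           shows the tool refusing to collect when either margin is missed.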

	   type: string; default: status

	   What	to watch for the trigger.  The default value watches "SHOW
	   GLOBAL STATUS", but you can also watch "SHOW	PROCESSLIST" and
	   specify a file with your own	custom code.  This function supplies
           the value of "--variable", which is then compared against
           "--threshold" to see if the trigger condition is met.
	   Additional options may be required as well; see below. Possible
	   values are:

	   o   status

	       Watch "SHOW GLOBAL STATUS" for the trigger.  The	value of
	       "--variable" then defines which status counter is the trigger.

	   o   processlist

	       Watch "SHOW FULL	PROCESSLIST" for the trigger.  The trigger
	       value is	the count of processes whose "--variable" column
	       matches the "--match" option.  For example, to trigger
	       "--collect" when	more than 10 processes are in the "statistics"
	       state, specify:

		  --function processlist \
		  --variable State	 \
		  --match statistics	 \
		  --threshold 10

	   In addition,	you can	specify	a file that contains your custom
	   trigger function, written in	Unix shell script.  This can be	a
	   wrapper that	executes anything you wish.  If	the argument to
	   "--function"	is a file, then	it takes precedence over built-in
	   functions, so if there is a file in the working directory named
	   "status" or "processlist" then the tool will	use that file even
           though those are valid built-in values.

	   The file works by providing a function called "trg_plugin", and the
	   tool	simply sources the file	and executes the function.  For
	   example, the	file might contain:

              trg_plugin() {
                 mysql $EXT_ARGV -e "SHOW ENGINE INNODB STATUS" \
                   | grep -c "has waited at"
              }

	   This	snippet	will count the number of mutex waits inside InnoDB.
	   It illustrates the general principle: the function must output a
	   number, which is then compared to "--threshold" as usual.  The
	   $EXT_ARGV variable contains the MySQL options mentioned in the
	   "SYNOPSIS" above.

	   The file should not alter the tool's	existing global	variables.
	   Prefix any file-specific global variables with "PLUGIN_" or make
	   them	local.
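
           As a further sketch of a custom trigger, the function below counts
           non-sleeping connections that have spent more than a given number
           of seconds in their current state.  The PLUGIN_MAX_TIME variable,
           its default of 60, and the file name are hypothetical; the column
           positions assume the standard SHOW FULL PROCESSLIST layout (Id,
           User, Host, db, Command, Time, State, Info).

```shell
# Hypothetical trigger file, e.g. long-queries.sh, for --function.
PLUGIN_MAX_TIME=${PLUGIN_MAX_TIME:-60}

trg_plugin() {
   # -N drops the header row, -B gives tab-separated batch output.
   mysql $EXT_ARGV -N -B -e "SHOW FULL PROCESSLIST" \
      | awk -F'\t' -v max="$PLUGIN_MAX_TIME" \
           '$5 != "Sleep" && $6 > max { n++ } END { print n + 0 }'
}
```

           The function always prints a number (0 when nothing matches), so
           it can be compared to "--threshold" like any built-in trigger, for
           example with "--function ./long-queries.sh --threshold 5".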

	   Print help and exit.

	   short form: -h; type: string

	   Host	to connect to.

	   type: int; default: 1

           How often to check if the trigger is true, in seconds.

	   type: int

	   How many times to "--collect" diagnostic data.  By default, the
	   tool	runs forever and collects data every time the trigger occurs.
	   Specify "--iterations" to collect data a limited number of times.
	   This	option is also useful with "--no-stalk"	to collect data	once
	   and exit, for example.

	   type: string; default: /var/log/pt-stalk.log

	   Print all output to this file when daemonized.

	   type: string

	   The pattern to use when watching SHOW PROCESSLIST.  See
	   "--function"	for details.

	   type: string

	   Send	an email to these addresses for	every "--collect".

	   short form: -p; type: string

	   Password to use when	connecting.  If	password contains commas they
	   must	be escaped with	a backslash: "exam\,ple"

	   type: string; default: /var/run/

	   Create the given PID	file.  The tool	won't start if the PID file
	   already exists and the PID it contains is different than the
	   current PID.	 However, if the PID file exists and the PID it
	   contains is no longer running, the tool will	overwrite the PID file
	   with	the current PID.  The PID file is removed automatically	when
	   the tool exits.
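
           The PID file rules described above can be sketched as follows.
           make_pid_file is a hypothetical name and this is not the tool's
           own code; it assumes "kill -0" is an adequate liveness test, which
           can report a live process owned by another user as not running.

```shell
# Hypothetical helper illustrating the PID file rules.
# Usage: make_pid_file FILE [PID]
make_pid_file() {
   local file=$1 pid=${2:-$$}
   if [ -f "$file" ]; then
      local old_pid
      old_pid=$(cat "$file")
      # Refuse to start only if a different, still-running PID owns the file.
      if [ -n "$old_pid" ] && [ "$old_pid" != "$pid" ] \
         && kill -0 "$old_pid" 2>/dev/null; then
         echo "PID file $file exists and PID $old_pid is running" >&2
         return 1
      fi
   fi
   # Either no file, or a stale one: (over)write it with the current PID.
   echo "$pid" > "$file"
}
```

           A caller would typically pair this with a trap such as
           trap 'rm -f "$file"' EXIT so the file is removed on exit.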

	   type: string

           Load a plugin to hook into the tool and extend its functionality.
           The specified file does not need to be executable, nor does its
           first line need to be a shebang line.  It only needs to define one
           or more of these Bash functions:

           o   before_stalk

               Called before stalking.

           o   before_collect

               Called when the trigger occurs, before running the "--collect"
               subprocesses in the background.

           o   after_collect

               Called after running a collector process.  The PID of the
               collector process is passed as the first argument.  This hook
               is called before "after_collect_sleep".

           o   after_collect_sleep

               Called after sleeping "--sleep" seconds for the collector
               process to finish.  This hook is called after "after_collect".

           o   after_interval_sleep

               Called after sleeping "--interval" seconds after each trigger
               check.

           o   after_stalk

               Called after stalking.  Since pt-stalk stalks forever by
               default, this hook is only called if "--iterations" is
               specified.

	   For example,	a very simple plugin that touches a file when
	   "--collect" is triggered:

              before_collect() {
                 touch /tmp/foo
              }

	   Since the plugin is completely sourced (imported) into the tool's
	   namespace, be careful not to	define other functions or global
           variables that already exist in the tool.  You should prefix all
           plugin-specific functions and global variables with "plugin_" or
           "PLUGIN_".

	   Plugins have	access to all command line options but they should not
	   modify them.	 Each option is	a global variable like $OPT_DEST which
	   corresponds to "--dest".  Therefore,	the global variable for	each
	   command line	option is "OPT_" plus the option name in all caps with
	   hyphens replaced by underscores.
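
           The naming rule can be written as a small function.  opt_var_name
           is a hypothetical helper shown only to make the mapping concrete;
           it uses tr rather than newer Bash case-conversion syntax so that
           it also runs on Bash v3.

```shell
# Hypothetical helper: map a command line option to its global variable name.
# Usage: opt_var_name --option-name
opt_var_name() {
   local name=${1#--}     # strip the leading dashes
   name=${name//-/_}      # hyphens become underscores
   printf 'OPT_%s\n' "$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')"
}

opt_var_name --dest              # prints "OPT_DEST"
opt_var_name --disk-bytes-free   # prints "OPT_DISK_BYTES_FREE"
```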

	   Plugins can stop the	tool by	setting	the global variable "OKTORUN"
	   to 1.  In this case,	the global variable "EXIT_REASON" should also
	   be set to indicate why the tool was stopped.

	   Plugin writers should keep in mind that the file destination	prefix
	   currently in	use should be accessed through the $prefix variable,
	   rather than $OPT_PREFIX.
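
           Putting these rules together, a slightly larger plugin might look
           like the sketch below.  The file name, the plugin_note helper, and
           the notes file name are all hypothetical, and the sketch assumes
           that $prefix is already set when the hooks run.

```shell
# Hypothetical plugin file, e.g. notes-plugin.sh, for --plugin.

# Append a timestamped line to a notes file stored next to the samples.
plugin_note() {
   # Assumption: $prefix is already set when the hooks run.
   echo "$(date '+%Y-%m-%d %H:%M:%S') $1" >> "$OPT_DEST/$prefix-plugin-notes"
}

before_collect() {
   plugin_note "trigger fired, starting collection"
}

after_collect() {
   local collector_pid=$1   # passed by the tool as the first argument
   plugin_note "collector running with PID $collector_pid"
}
```

           The helper carries the "plugin_" prefix, and the hooks only read
           $OPT_DEST and $prefix without modifying them.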

           Collect only MySQL-related data and skip everything else.  The
           only non-MySQL value still collected is the free disk space,
           because it is needed to decide whether the result files can be
           written.  This option is useful for RDS instances.

	   short form: -P; type: int

	   Port	number to use for connection.

	   type: string

	   The filename	prefix for diagnostic samples.	By default, all	files
	   created by the same "--collect" instance have a timestamp prefix
	   based on the	current	local time, like "2011_12_06_14_02_02",	which
	   is December 6, 2011 at 14:02:02.
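
           A prefix in the same format can be produced with date.  This is a
           sketch of the format only; make_prefix is a hypothetical name and
           the tool may build its prefix differently.

```shell
# Hypothetical helper: print a timestamp such as 2011_12_06_14_02_02
# based on the current local time.
make_prefix() {
   date '+%Y_%m_%d_%H_%M_%S'
}

make_prefix
```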

	   type: int; default: 0

	   Keep	the data for the last N	runs. If N > 0,	the program will keep
	   the data for	the last N runs	and will delete	the older data.

	   type: int; default: 0

           Keep up to --retention-size MB of data.  The tool keeps at least
           one run even if that run is larger than the size specified here.

	   type: int; default: 30

	   Number of days to retain collected samples.	Any samples that are
	   older will be purged.
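
           Age-based purging of this kind is commonly done with find.  The
           function below is a hypothetical sketch of the idea, not the
           tool's own cleanup code, which may differ in detail.

```shell
# Hypothetical helper; pt-stalk's own cleanup may differ in detail.
# Usage: purge_old_samples DIR DAYS
purge_old_samples() {
   local dir=$1 days=$2
   # -mtime +N matches files whose age, in whole days, is greater than N.
   find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}
```

           For example, "purge_old_samples /var/lib/pt-stalk 30" would remove
           regular files under the default "--dest" directory that are more
           than 30 days old.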

	   type: int; default: 30

	   How long to "--collect" diagnostic data when	the trigger occurs.
	   The value is	in seconds and should not be longer than "--sleep".
	   It is usually not necessary to change this; if the default 30
	   seconds doesn't collect enough data,	running	longer is not likely
	   to help because the system or MySQL server is probably too busy to
           respond.  In fact, in many cases a shorter collection period is
           appropriate.

	   This	value is used two other	times.	After collecting, the collect
	   subprocess will wait	another	"--run-time" seconds for its commands
           to finish.  Some commands can take a while if the system is running
           very slowly (which is likely, given that a collection
	   was triggered).  Since empty	files are deleted, the extra wait
	   gives commands time to finish and write their data.	The value is
	   potentially used again just before the tool exits to	wait again for
	   any collect subprocesses to finish.	In most	cases this won't
	   happen because of the aforementioned	extra wait.  If	it happens,
	   the tool will log "Waiting up to N seconds for subprocesses to
	   finish..." where N is three times "--run-time".  In both cases,
	   after waiting, the tool kills all of	its subprocesses.

	   type: int; default: 300

	   How long to sleep after "--collect".	 This prevents the tool	from
	   triggering continuously, which might	be a problem if	the collection
	   process is intrusive.  It also prevents filling up the disk or
	   gathering too much data to analyze reasonably.

	   type: int; default: 1

	   How long to sleep between collection	loop cycles.  This is useful
	   with	"--no-stalk" to	do long	collections.  For example, to collect
	   data	every minute for an hour, specify: "--no-stalk --run-time 3600
	   --sleep-collect 60".

	   short form: -S; type: string

	   Socket file to use for connection.

	   default: yes; negatable: yes

	   Watch the server and	wait for the trigger to	occur.	Specify
	   "--no-stalk"	to collect diagnostic data immediately,	that is,
	   without waiting for the trigger to occur.  You probably also	want
	   to specify values for "--interval", "--iterations", and "--sleep".
           For example, to immediately collect data for 1 minute then exit,
           specify:

	      --no-stalk --run-time 60 --iterations 1

	   "--cycles", "--daemonize", "--log" and "--pid" have no effect with
	   "--no-stalk".  Safeguard options, like "--disk-bytes-free" and
	   "--disk-pct-free", are still	respected.

	   See also "--collect".

	   type: int; default: 25

	   The maximum acceptable value	for "--variable".  "--collect" is
	   triggered when the value of "--variable" is greater than
	   "--threshold" for "--cycles"	many times.  Currently,	there is no
	   way to define a lower threshold to check for	a "--variable" value
	   that	is too low.

	   See also "--function".

	   short form: -u; type: string

	   User	for login if not current user.

	   type: string; default: Threads_running

           The variable to compare against "--threshold".  See also
           "--function".

	   type: int; default: 2

	   Print more or less information while	running.  Since	the tool is
	   designed to be a long-running daemon, the default verbosity level
	   only	prints the most	important information.	If you run the tool
	   interactively, you may want to use a	higher verbosity level.

             LEVEL PRINTS
             ===== =====================================
	     0	   Errors
	     1	   Warnings
	     2	   Matching triggers and collection info
	     3	   Non-matching	triggers

	   Print tool's	version	and exit.

       This tool does not require any environment variables for configuration,
       although several variables can change how it works.  Keep in mind that
       these are expert settings, and should not be used in most cases.

       Specifically, the variables that	can be set are:


       For example, during collection iostat is	called with a -dx argument,
       but because you have an NFS partition, you also need the	-n flag	there.
       Instead of editing the source, you can call pt-stalk as

	   CMD_IOSTAT="iostat -n" pt-stalk ...

       which will do exactly what you need.  Combined with the plugin hooks,
       this gives you fine-grained control of what the tool does.
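
       The override mechanism can be pictured as "use the environment
       variable if it is set, otherwise fall back to a default command".
       The sketch below illustrates that pattern only; resolve_cmd is a
       hypothetical name and the tool's actual lookup may differ.

```shell
# Illustration of the override pattern; resolve_cmd is a hypothetical name.
# Usage: resolve_cmd VARIABLE_NAME DEFAULT_COMMAND
resolve_cmd() {
   local var=$1 default=$2
   # ${!var} is Bash indirect expansion: the value of the variable named by $var.
   printf '%s\n' "${!var:-$default}"
}

CMD_IOSTAT="iostat -n"
echo "$(resolve_cmd CMD_IOSTAT iostat) -dx"   # prints "iostat -n -dx"
```

       When the variable is unset or empty, the default command is used
       unchanged.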

       It is possible to enable "debug" mode in mysqladmin by specifying:

       "CMD_MYSQLADMIN='mysqladmin debug' pt-stalk params ..."

       This tool requires Bash v3 or newer.  Certain options require other
       programs:

       "--collect-gdb" requires	"gdb"
       "--collect-oprofile" requires "opcontrol" and "opreport"
       "--collect-strace" requires "strace"
       "--collect-tcpdump" requires "tcpdump"

       For a list of known bugs, see <>.

       Please report bugs at <>.  Include
       the following information in your bug report:

       o   Complete command-line used to run the tool

       o   Tool	"--version"

       o   MySQL version of all	servers	involved

       o   Output from the tool	including STDERR

       o   Input files (log/dump/config	files, etc.)

       If possible, include debugging output by	running	the tool with
       "PTDEBUG"; see "ENVIRONMENT".

       Visit <>	to download
       the latest release of Percona Toolkit.  Or, get the latest release from
       the command line:




       You can also get	individual tools from the latest release:


       Replace "TOOL" with the name of any tool.

       Baron Schwartz, Justin Swanhart,	Fernando Ipar, Daniel Nichter, and
       Brian Fraser

       This tool is part of Percona Toolkit, a collection of advanced command-
       line tools for MySQL developed by Percona.  Percona Toolkit was forked
       from two	projects in June, 2011:	Maatkit	and Aspersa.  Those projects
       were created by Baron Schwartz and primarily developed by him and
       Daniel Nichter.	Visit <> to learn
       about other free, open-source software from Percona.

       This program is copyright 2011-2018 Percona LLC and/or its affiliates,
       2010-2011 Baron Schwartz.


       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation, version 2; OR the Perl	Artistic License.  On
       UNIX and	similar	systems, you can issue `man perlgpl' or	`man
       perlartistic' to	read these licenses.

       You should have received	a copy of the GNU General Public License along
       with this program; if not, write	to the Free Software Foundation, Inc.,
       59 Temple Place,	Suite 330, Boston, MA  02111-1307  USA.

       pt-stalk	3.2.0

perl v5.32.1			  2020-04-23			   PT-STALK(1)

