PT-QUERY-DIGEST(1)    User Contributed Perl Documentation   PT-QUERY-DIGEST(1)

NAME
       pt-query-digest - Analyze MySQL queries from logs, processlist, and
       tcpdump.

SYNOPSIS
       Usage: pt-query-digest [OPTIONS]	[FILES]	[DSN]

       pt-query-digest analyzes	MySQL queries from slow, general, and binary
       log files.  It can also analyze queries from "SHOW PROCESSLIST" and
       MySQL protocol data from	tcpdump.  By default, queries are grouped by
       fingerprint and reported	in descending order of query time (i.e.	the
       slowest queries first).	If no "FILES" are given, the tool reads
       "STDIN".	 The optional "DSN" is used for	certain	options	like "--since"
       and "--until".

       Report the slowest queries from "slow.log":

	  pt-query-digest slow.log

       Report the slowest queries from the processlist on host1:

	  pt-query-digest --processlist	h=host1

       Capture MySQL protocol data with tcpdump, then report the slowest
       queries:

	  tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 > mysql.tcp.txt

	  pt-query-digest --type tcpdump mysql.tcp.txt

       Save query data from "slow.log" to host2	for later review and trend
       analysis:

	  pt-query-digest --review h=host2 --no-report slow.log

RISKS
       Percona Toolkit is mature, proven in the	real world, and	well tested,
       but all database	tools can pose a risk to the system and	the database
       server.	Before using this tool,	please:

       o   Read	the tool's documentation

       o   Review the tool's known "BUGS"

       o   Test	the tool on a non-production server

       o   Backup your production server and verify the	backups

DESCRIPTION
       pt-query-digest is a sophisticated but easy to use tool for analyzing
       MySQL queries.  It can analyze queries from MySQL slow, general,	and
       binary logs. (Binary logs must first be converted to text, see
       "--type").  It can also use "SHOW PROCESSLIST" and MySQL	protocol data
       from tcpdump.  By default, the tool reports which queries are the
       slowest,	and therefore the most important to optimize.  More complex
       and custom-tailored reports can be created by using options like
       "--group-by", "--filter", and "--embedded-attributes".

       Query analysis is a best practice that should be done frequently.  To
       make this easier, pt-query-digest has two features: query review
       ("--review") and query history ("--history").  When the "--review"
       option is used, all unique queries are saved to a database.  When the
       tool is run again with "--review", queries marked as reviewed in the
       database are not printed in the report.  This highlights new queries
       that need to be reviewed.  When the "--history" option is used, query
       metrics (query time, lock time, etc.) for each unique query are saved
       to a database.  Each time the tool is run with "--history", more
       historical data is saved, which can be used to trend and analyze query
       performance over time.

ATTRIBUTES
       pt-query-digest works on	events,	which are a collection of key-value
       pairs called attributes.	 You'll	recognize most of the attributes right
       away: "Query_time", "Lock_time",	and so on.  You	can just look at a
       slow log	and see	them.  However,	there are some that don't exist	in the
       slow log, and slow logs may actually include different kinds of
       attributes (for example,	you may	have a server with the Percona
       patches).

       See "ATTRIBUTES REFERENCE" near the end of this documentation for a
       list of common and "--type" specific attributes.	 A familiarity with
       these attributes	is necessary for working with "--filter",
       "--ignore-attributes", and other	attribute-related options.

       With creative use of "--filter",	you can	create new attributes derived
       from existing attributes.  For example, to create an attribute called
       "Row_ratio" for examining the ratio of "Rows_sent" to "Rows_examined",
       specify a filter	like:

	 --filter '($event->{Row_ratio}	= $event->{Rows_sent} /	($event->{Rows_examined})) && 1'

       The "&& 1" trick is needed to create a valid one-line syntax that is
       always true, even if the assignment happens to evaluate to false.  The
       new attribute automatically appears in the output:

	 # Row ratio	    1.00    0.00      1	   0.50	     1	  0.71	  0.50

       Attributes created this way can be specified for	"--order-by" or	any
       option that requires an attribute.
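
       For example, to sort the report by the hypothetical "Row_ratio"
       attribute created above (the attribute name and filter are
       illustrative, not built in):

	  pt-query-digest slow.log \
	     --filter '($event->{Row_ratio} = $event->{Rows_sent} / ($event->{Rows_examined})) && 1' \
	     --order-by Row_ratio:max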

OUTPUT
       The default "--output" is a query analysis report.  The "--[no]report"
       option controls whether or not this report is printed.  Sometimes you
       may want	to parse all the queries but suppress the report, for example
       when using "--review" or	"--history".

       There is	one paragraph for each class of	query analyzed.	 A "class" of
       queries all have	the same value for the "--group-by" attribute which is
       "fingerprint" by	default.  (See "ATTRIBUTES".)  A fingerprint is	an
       abstracted version of the query text with literals removed, whitespace
       collapsed, and so forth.	 The report is formatted so it's easy to paste
       into emails without wrapping, and all non-query lines begin with	a
       comment,	so you can save	it to a	.sql file and open it in your favorite
       syntax-highlighting text	editor.	 There is a response-time profile at
       the beginning.

       The output described here is controlled by "--report-format".  That
       option allows you to specify what to print and in what order.  The
       default output in the default order is described	here.

       The report, by default, begins with a paragraph about the entire
       analysis run.  The information is very similar to what you'll see for
       each class of queries in the log, but it doesn't have some information
       that would be too expensive to keep globally for the analysis.  It also
       has some statistics about the code's execution itself, such as the CPU
       and memory usage, the local date and time of the run, and a list of
       input files read/parsed.

       Following this is the response-time profile over	the events.  This is a
       highly summarized view of the unique events in the detailed query
       report that follows.  It	contains the following columns:

	Column	      Meaning
	============  ==========================================================
	Rank	      The query's rank within the entire set of	queries	analyzed
	Query ID      The query's fingerprint
	Response time The total	response time, and percentage of overall total
	Calls	      The number of times this query was executed
	R/Call	      The mean response	time per execution
	V/M	      The Variance-to-mean ratio of response time
	Item	      The distilled query

       A final line whose rank is shown	as MISC	contains aggregate statistics
       on the queries that were	not included in	the report, due	to options
       such as "--limit" and "--outliers".  For	details	on the variance-to-
       mean ratio, please see
       http://en.wikipedia.org/wiki/Index_of_dispersion.
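
       For example, a profile line for the sample query discussed in the next
       section (2 calls, 1105s total exec time, 13% of the overall total)
       might look roughly like this; the exact spacing and the V/M value are
       illustrative:

	 # Profile
	 # Rank Query ID           Response time  Calls R/Call   V/M  Item
	 # ==== ================== =============  ===== ======== ==== ============
	 #    1 0xFDEA8D2993C9CAF3 1105.0000 13.0%    2 552.5000 0.01 SELECT table1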

       Next, the detailed query	report is printed.  Each query appears in a
       paragraph.  Here	is a sample, slightly reformatted so 'perldoc' will
       not wrap	lines in a terminal.  The following will all be	one paragraph,
       but we'll break it up for commentary.

	# Query	2: 0.01	QPS, 0.02x conc, ID 0xFDEA8D2993C9CAF3 at byte 160665

       This line identifies the	sequential number of the query in the sort
       order specified by "--order-by".	 Then there's the queries per second,
       and the approximate concurrency for this	query (calculated as a
       function	of the timespan	and total Query_time).	Next there's a query
       ID.  This ID is a hex version of	the query's checksum in	the database,
       if you're using "--review".  You	can select the reviewed	query's
       details from the	database with a	query like "SELECT ....	WHERE
       checksum=0xFDEA8D2993C9CAF3".
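
       For example, assuming the default "percona_schema.query_review" table
       (see "--review"):

	  SELECT * FROM percona_schema.query_review
	  WHERE checksum=0xFDEA8D2993C9CAF3\G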

       If you are investigating	the report and want to print out every sample
       of a particular query, then the following "--filter" may	be helpful:

	  pt-query-digest slow.log	     \
	     --no-report		     \
	     --output slowlog		     \
	     --filter '$event->{fingerprint} \
		  && make_checksum($event->{fingerprint}) eq "FDEA8D2993C9CAF3"'

       Notice that you must remove the "0x" prefix from	the checksum.

       Finally,	in case	you want to find a sample of the query in the log
       file, there's the byte offset where you can look.  (This	is not always
       accurate, due to	some anomalies in the slow log format, but it's
       usually right.)	The position refers to the worst sample, which we'll
       see more	about below.

       Next is the table of metrics about this class of	queries.

	#	    pct	  total	   min	  max	  avg	  95%  stddev  median
	# Count	      0	      2
	# Exec time  13	  1105s	  552s	 554s	 553s	 554s	   2s	 553s
	# Lock time   0	  216us	  99us	117us	108us	117us	 12us	108us
	# Rows sent  20	  6.26M	 3.13M	3.13M	3.13M	3.13M	12.73	3.13M
	# Rows exam   0	  6.26M	 3.13M	3.13M	3.13M	3.13M	12.73	3.13M

       The first line is column	headers	for the	table.	The percentage is the
       percent of the total for	the whole analysis run,	and the	total is the
       actual value of the specified metric.  For example, in this case	we can
       see that	the query executed 2 times, which is 13% of the	total number
       of queries in the file.	The min, max and avg columns are self-
       explanatory.  The 95% column shows the 95th percentile; 95% of the
       values are less than or equal to	this value.  The standard deviation
       shows you how tightly grouped the values	are.  The standard deviation
       and median are both calculated from the 95th percentile,	discarding the
       extremely large values.

       The stddev, median and 95th percentile statistics are approximate.
       Exact statistics	require	keeping	every value seen, sorting, and doing
       some calculations on them.  This	uses a lot of memory.  To avoid	this,
       we keep 1000 buckets, each of them 5% bigger than the one before,
       ranging from .000001 up to a very big number.  When we see a value we
       increment the bucket into which it falls.  Thus we have fixed memory
       per class of queries.  The drawback is the imprecision, which typically
       falls in	the 5 percent range.

       Next we have statistics on the users, databases and time	range for the
       query.

	# Users	      1	  user1
	# Databases   2	    db1(1), db2(1)
	# Time range 2008-11-26	04:55:18 to 2008-11-27 00:15:15

       The users and databases are shown as a count of distinct	values,
       followed	by the values.	If there's only	one, it's shown	alone; if
       there are many, we show each of the most	frequent ones, followed	by the
       number of times it appears.

	# Query_time distribution
	#   1us
	#  10us
	# 100us
	#   1ms
	#  10ms	 #####
	# 100ms	 ####################
	#    1s	 ##########
	#  10s+

       The execution times show	a logarithmic chart of time clustering.	 Each
       query goes into one of the "buckets" and	is counted up.	The buckets
       are powers of ten.  The first bucket is all values in the "single
       microsecond range" -- that is, less than	10us.  The second is "tens of
       microseconds," which is from 10us up to (but not	including) 100us; and
       so on.  The charted attribute can be changed by specifying
       "--report-histogram" but	is limited to time-based attributes.

	# Tables
	#    SHOW TABLE	STATUS LIKE 'table1'\G
	#    SHOW CREATE TABLE `table1`\G
	# EXPLAIN
	SELECT * FROM table1\G

       This section is a convenience: if you're	trying to optimize the queries
       you see in the slow log,	you probably want to examine the table
       structure and size.  These are copy-and-paste-ready commands to do
       that.

       Finally,	we see a sample	of the queries in this class of	query.	This
       is not a	random sample.	It is the query	that performed the worst,
       according to the	sort order given by "--order-by".  You will normally
       see a commented "# EXPLAIN" line	just before it,	so you can copy-paste
       the query to examine its EXPLAIN plan.  But that isn't possible for
       non-SELECT queries, so the tool tries to transform the query into a
       roughly equivalent SELECT query, and adds that below.

       If you want to find this	sample event in	the log, use the offset
       mentioned above,	and something like the following:

	 tail -c +<offset> /path/to/file | head
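
       For instance, using the byte offset 160665 from the sample query
       header above:

	 tail -c +160665 /path/to/file | head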

       See also	"--report-format".

QUERY REVIEW
       A query "--review" is the process of storing all	the query fingerprints
       analyzed.  This has several benefits:

       o   You can add metadata	to classes of queries, such as marking them
	   for follow-up, adding notes to queries, or marking them with	an
	   issue ID for	your issue tracking system.

       o   You can refer to the	stored values on subsequent runs so you'll
	   know	whether	you've seen a query before.  This can help you cut
	   down	on duplicated work.

       o   You can store historical data such as the row count,	query times,
	   and generally anything you can see in the report.

       To use this feature, you	run pt-query-digest with the "--review"
       option.	It will	store the fingerprints and other information into the
       table you specify.  Next	time you run it	with the same option, it will
       do the following:

       o   It won't show you queries you've already reviewed.  A query is
	   considered to be already reviewed if	you've set a value for the
	   "reviewed_by" column.  (If you want to see queries you've already
	   reviewed, use the "--report-all" option.)

       o   Queries that	you've reviewed, and don't appear in the output, will
	   cause gaps in the query number sequence in the first	line of	each
	   paragraph.  And the value you've specified for "--limit" will still
	   be honored.	So if you've reviewed all queries in the top 10	and
	   you ask for the top 10, you won't see anything in the output.

       o   If you want to see the queries you've already reviewed, you can
	   specify "--report-all".  Then you'll	see the	normal analysis
	   output, but you'll also see the information from the	review table,
	   just	below the execution time graph.	 For example,

	     # Review information
	     #	    comments: really bad IN() subquery,	fix soon!
	     #	  first_seen: 2008-12-01 11:48:57
	     #	 jira_ticket: 1933
	     #	   last_seen: 2008-12-18 11:49:07
	     #	    priority: high
	     #	 reviewed_by: xaprb
	     #	 reviewed_on: 2008-12-18 15:03:11

	   This	metadata is useful because, as you analyze your	queries, you
	   get your comments integrated	right into the report.
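
       To mark a class of queries as reviewed, set its "reviewed_by" column in
       the review table.  A minimal sketch, assuming the default
       "percona_schema.query_review" table (the values are illustrative):

	 UPDATE percona_schema.query_review
	 SET reviewed_by='xaprb', reviewed_on=NOW(),
	     comments='really bad IN() subquery, fix soon!'
	 WHERE checksum=0xFDEA8D2993C9CAF3;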

FINGERPRINTS
       A query fingerprint is the abstracted form of a query, which makes it
       possible	to group similar queries together.  Abstracting	a query
       removes literal values, normalizes whitespace, and so on.  For example,
       consider	these two queries:

	 SELECT	name, password FROM user WHERE id='12823';
	 select	name,	password from user
	    where id=5;

       Both of those queries will fingerprint to

	 select	name, password from user where id=?

       Once the	query's	fingerprint is known, we can then talk about a query
       as though it represents all similar queries.

       What "pt-query-digest" does is analogous	to a GROUP BY statement	in
       SQL.  (But note that "multiple columns" doesn't define a	multi-column
       grouping; it defines multiple reports!) If your command-line looks like
       this,

	 pt-query-digest	       \
	     --group-by	fingerprint    \
	     --order-by	Query_time:sum \
	     --limit 10		       \
	     slow.log

       The corresponding pseudo-SQL looks like this:

	 SELECT	WORST(query BY Query_time), SUM(Query_time), ...
	 FROM /path/to/slow.log
	 GROUP BY FINGERPRINT(query)
	 ORDER BY SUM(Query_time) DESC
	 LIMIT 10

       You can also use	the value "distill", which is a	kind of	super-
       fingerprint.  See "--group-by" for more.

       Query fingerprinting accommodates many special cases, which have	proven
       necessary in the	real world.  For example, an "IN" list with 5 literals
       is really equivalent to one with	4 literals, so lists of	literals are
       collapsed to a single one.  If you find something that is not
       fingerprinted properly, please submit a bug report with a reproducible
       test case.

       Here is a list of transformations during	fingerprinting,	which might
       not be exhaustive:

       o   Group all SELECT queries from mysqldump together, even if they are
	   against different tables.  The same applies to all queries from pt-
	   table-checksum.

       o   Shorten multi-value INSERT statements to a single VALUES() list.

       o   Strip comments.

       o   Abstract the	databases in USE statements, so	all USE	statements are
	   grouped together.

       o   Replace all literals, such as quoted	strings.  For efficiency, the
	   code	that replaces literal numbers is somewhat non-selective, and
	   might replace some things as	numbers	when they really are not.
	   Hexadecimal literals	are also replaced.  NULL is treated as a
	   literal.  Numbers embedded in identifiers are also replaced,	so
	   tables named	similarly will be fingerprinted	to the same values
	   (e.g. users_2009 and	users_2010 will	fingerprint identically).

       o   Collapse all	whitespace into	a single space.

       o   Lowercase the entire	query.

       o   Replace all literals	inside of IN() and VALUES() lists with a
	   single placeholder, regardless of cardinality.

       o   Collapse multiple identical UNION queries into a single one.
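
       For example, after these transformations, queries such as

	 SELECT * FROM users WHERE id IN (1, 2, 3)
	 INSERT INTO t (a) VALUES (1), (2), (3)

       fingerprint to something like the following (the exact placeholder
       syntax is approximate):

	 select * from users where id in(?+)
	 insert into t (a) values(?+)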

OPTIONS
       This tool accepts additional command-line arguments.  Refer to the
       "SYNOPSIS" and usage information	for details.

       --ask-pass
	   Prompt for a	password when connecting to MySQL.

       --attribute-aliases
	   type: array;	default: db|Schema

	   List	of attribute|alias,etc.

	   Certain attributes have multiple names, like	db and Schema.	If an
	   event does not have the primary attribute, pt-query-digest looks
	   for an alias	attribute.  If it finds	an alias, it creates the
	   primary attribute with the alias attribute's	value and removes the
	   alias attribute.

	   If the event	has the	primary	attribute, all alias attributes	are
	   deleted.

	   This	helps simplify event attributes	so that, for example, there
	   will	not be report lines for	both db	and Schema.

       --attribute-value-limit
	   type: int; default: 0

	   A sanity limit for attribute	values.

	   This option deals with bugs in slow-logging functionality that
	   cause large values for attributes.  If the attribute's value is
	   bigger than this, the last-seen value for that class of query is
	   used instead.  Disabled by default.

       --charset
	   short form: -A; type: string

	   Default character set.  If the value	is utf8, sets Perl's binmode
	   on STDOUT to	utf8, passes the mysql_enable_utf8 option to
	   DBD::mysql, and runs	SET NAMES UTF8 after connecting	to MySQL.  Any
	   other value sets binmode on STDOUT without the utf8 layer, and runs
	   SET NAMES after connecting to MySQL.

       --config
	   type: Array

	   Read	this comma-separated list of config files; if specified, this
	   must	be the first option on the command line.

       --[no]continue-on-error
	   default: yes

	   Continue parsing even if there is an	error.	The tool will not
	   continue forever: it	stops once any process causes 100 errors, in
	   which case there is probably	a bug in the tool or the input is
	   invalid.

       --[no]create-history-table
	   default: yes

	   Create the "--history" table	if it does not exist.

	   This	option causes the table	specified by "--history" to be created
	   with	the default structure shown in the documentation for
	   "--history".

       --[no]create-review-table
	   default: yes

	   Create the "--review" table if it does not exist.

	   This	option causes the table	specified by "--review"	to be created
	   with	the default structure shown in the documentation for
	   "--review".

       --daemonize
	   Fork	to the background and detach from the shell.  POSIX operating
	   systems only.

       --database
	   short form: -D; type: string

	   Connect to this database.

       --defaults-file
	   short form: -F; type: string

	   Only	read mysql options from	the given file.	 You must give an
	   absolute pathname.

       --embedded-attributes
	   type: array

	   Two Perl regex patterns to capture pseudo-attributes	embedded in
	   queries.

	   Embedded attributes might be	special	attribute-value	pairs that
	   you've hidden in comments.  The first regex should match the	entire
	   set of attributes (in case there are	multiple).  The	second regex
	   should match	and capture attribute-value pairs from the first
	   regex.

	   For example,	suppose	your query looks like the following:

	     SELECT * from users -- file: /login.php, line: 493;

	   You might run pt-query-digest with the following option:

	     pt-query-digest --embedded-attributes ' --	.*','(\w+): ([^\,]+)'

	   The first regular expression	captures the whole comment:

	     " -- file:	/login.php, line: 493;"

	   The second one splits it into attribute-value pairs and adds	them
	   to the event:

	      ATTRIBUTE	 VALUE
	      =========	 ==========
	      file	 /login.php
	      line	 493

	   NOTE: All commas in the regex patterns must be escaped with a
	   backslash, otherwise the pattern will break.

       --expected-range
	   type: array;	default: 5,10

	   Explain items when there are	more or	fewer than expected.

	   Defines the number of items expected	to be seen in the report given
	   by "--[no]report", as controlled by "--limit" and "--outliers".  If
	   there  are more or fewer items in the report, each one will explain
	   why it was included.

       --explain
	   type: DSN

	   Run EXPLAIN for the sample query with this DSN and print results.

	   This	works only when	"--group-by" includes fingerprint.  It causes
	   pt-query-digest to run EXPLAIN and include the output into the
	   report.  For	safety,	queries	that appear to have a subquery that
	   EXPLAIN will	execute	won't be EXPLAINed.  Those are typically
	   "derived table" queries of the form

	     select ...	from ( select .... ) der;

	   The EXPLAIN results are printed as a	full vertical format in	the
	   event report, which appears at the end of each event	report in
	   vertical style ("\G") just like MySQL prints	it.

       --filter
	   type: string

	   Discard events for which this Perl code doesn't return true.

	   This	option is a string of Perl code	or a file containing Perl code
	   that	gets compiled into a subroutine	with one argument: $event.
	   This	is a hashref.  If the given value is a readable	file, then pt-
	   query-digest	reads the entire file and uses its contents as the
	   code.  The file should not contain a	shebang	(#!/usr/bin/perl)
	   line.

	   If the code returns true, the chain of callbacks continues;
	   otherwise it	ends.  The code	is the last statement in the
	   subroutine other than "return $event".  The subroutine template is:

	     sub { $event = shift; filter && return $event; }

	   Filters given on the	command	line are wrapped inside	parentheses
	   like "( filter )".  For complex, multi-line filters, you must
	   put the code	inside a file so it will not be	wrapped	inside
	   parentheses.	 Either	way, the filter	must produce syntactically
	   valid code given the	template.  For example,	an if-else branch
	   given on the	command	line would not be valid:

	     --filter 'if () { } else {	}'  # WRONG

	   Since it's given on the command line, the if-else branch would be
	   wrapped inside parentheses which is not syntactically valid.	 So to
	   accomplish something	more complex like this would require putting
	   the code in a file, for example filter.txt:

	     my	$event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok

	   Then	specify	"--filter filter.txt" to read the code from
	   filter.txt.

	   If the filter code won't compile, pt-query-digest will die with an
	   error.  If the filter code does compile, an error may still occur
	   at runtime if the code tries	to do something	wrong (like pattern
	   match an undefined value).  pt-query-digest does not	provide	any
	   safeguards so code carefully!

	   An example filter that discards everything but SELECT statements:

	     --filter '$event->{arg} =~	m/^select/i'

	   This	is compiled into a subroutine like the following:

	     sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }

	   It is permissible for the code to have side effects (to alter
	   $event).

	   See "ATTRIBUTES REFERENCE" for a list of common and "--type"
	   specific attributes.

	   Here	are more examples of filter code:

	   Host/IP matches domain.com
	       --filter	'($event->{host} || $event->{ip} || "")	=~
	       m/domain.com/'

	       Sometimes MySQL logs the	host where the IP is expected.
	       Therefore, we check both.

	   User	matches	john
	       --filter	'($event->{user} || "")	=~ m/john/'

	   More	than 1 warning
	       --filter	'($event->{Warning_count} || 0)	> 1'

	   Query does full table scan or full join
	       --filter	'(($event->{Full_scan} || "") eq "Yes")	||
	       (($event->{Full_join} ||	"") eq "Yes")'

	   Query was not served	from query cache
	       --filter	'($event->{QC_Hit} || "") eq "No"'

	   Query is 1 MB or larger
	       --filter	'$event->{bytes} >= 1_048_576'

	   Since "--filter" allows you to alter	$event,	you can	use it to do
	   other things, like create new attributes.  See "ATTRIBUTES" for an
	   example.

       --group-by
	   type: Array;	default: fingerprint

	   Which attribute of the events to group by.

	   In general, you can group queries into classes based	on any
	   attribute of	the query, such	as "user" or "db", which will by
	   default show	you which users	and which databases get	the most
	   "Query_time".  The default attribute, "fingerprint",	groups
	   similar, abstracted queries into classes; see below and see also
	   "FINGERPRINTS".

	   A report is printed for each	"--group-by" value (unless
	   "--no-report" is given).  Therefore,	"--group-by user,db" means
	   "report on queries with the same user and report on queries with
	   the same db"; it does not mean "report on queries with the same
	   user	and db."  See also "OUTPUT".

	   Every value must have a corresponding value in the same position in
	   "--order-by".  However, adding values to "--group-by" will
	   automatically add values to "--order-by", for your convenience.

	   There are several magical values that cause some extra data mining
	   to happen before the	grouping takes place:

	   fingerprint
	       This causes events to be	fingerprinted to abstract queries into
	       a canonical form, which is then used to group events together
	       into a class.  See "FINGERPRINTS" for more about
	       fingerprinting.

	   tables
	       This causes events to be	inspected for what appear to be
	       tables, and then	aggregated by that.  Note that a query that
	       contains	two or more tables will	be counted as many times as
	       there are tables; so a join against two tables will count the
	       Query_time against both tables.

	   distill
	       This is a sort of super-fingerprint that	collapses queries down
	       into a suggestion of what they do, such as "INSERT SELECT
	       table1 table2".
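
	   For example, grouping by user and db produces two separate
	   reports, one per attribute, as described above:

	     pt-query-digest --group-by user,db slow.log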

       --help
	   Show	help and exit.

       --history
	   type: DSN

	   Save	metrics	for each query class in	the given table.  pt-query-
	   digest saves	query metrics (query time, lock	time, etc.) to this
	   table so you	can see	how query classes change over time.

	   The default table is	"percona_schema.query_history".	 Specify
	   database (D)	and table (t) DSN options to override the default.
	   The database	and table are automatically created unless
	   "--no-create-history-table" is specified (see
	   "--[no]create-history-table").

	   pt-query-digest inspects the	columns	in the table.  The table must
	   have	at least the following columns:

	     CREATE TABLE query_review_history (
	       checksum	    CHAR(32) NOT NULL,
	       sample	    TEXT NOT NULL
	     );

	   Any columns not mentioned above are inspected to see	if they	follow
	   a certain naming convention.	 The column is special if the name
	   ends	with an	underscore followed by any of these values:

	     pct|avg|cnt|sum|min|max|pct_95|stddev|median|rank

	   If the column ends with one of those	values,	then the prefix	is
	   interpreted as the event attribute to store in that column, and the
	   suffix is interpreted as the	metric to be stored.  For example, a
	   column named	"Query_time_min" will be used to store the minimum
	   "Query_time"	for the	class of events.

	   The table should also have a	primary	key, but that is up to you,
	   depending on	how you	want to	store the historical data.  We suggest
	   adding ts_min and ts_max columns and	making them part of the
	   primary key along with the checksum.	 But you could also just add a
	   ts_min column and make it a DATE type, so you'd get one row per
	   class of queries per	day.

	   The following table definition is used for
	   "--[no]create-history-table":

	    CREATE TABLE IF NOT	EXISTS query_history (
	      checksum		   CHAR(32) NOT	NULL,
	      sample		   TEXT	NOT NULL,
	      ts_min		   DATETIME,
	      ts_max		   DATETIME,
	      ts_cnt		   FLOAT,
	      Query_time_sum	   FLOAT,
	      Query_time_min	   FLOAT,
	      Query_time_max	   FLOAT,
	      Query_time_pct_95	   FLOAT,
	      Query_time_stddev	   FLOAT,
	      Query_time_median	   FLOAT,
	      Lock_time_sum	   FLOAT,
	      Lock_time_min	   FLOAT,
	      Lock_time_max	   FLOAT,
	      Lock_time_pct_95	   FLOAT,
	      Lock_time_stddev	   FLOAT,
	      Lock_time_median	   FLOAT,
	      Rows_sent_sum	   FLOAT,
	      Rows_sent_min	   FLOAT,
	      Rows_sent_max	   FLOAT,
	      Rows_sent_pct_95	   FLOAT,
	      Rows_sent_stddev	   FLOAT,
	      Rows_sent_median	   FLOAT,
	      Rows_examined_sum	   FLOAT,
	      Rows_examined_min	   FLOAT,
	      Rows_examined_max	   FLOAT,
	      Rows_examined_pct_95 FLOAT,
	      Rows_examined_stddev FLOAT,
	      Rows_examined_median FLOAT,
	      -- Percona extended slowlog attributes
	      -- http://www.percona.com/docs/wiki/patches:slow_extended
	      Rows_affected_sum		    FLOAT,
	      Rows_affected_min		    FLOAT,
	      Rows_affected_max		    FLOAT,
	      Rows_affected_pct_95	    FLOAT,
	      Rows_affected_stddev	    FLOAT,
	      Rows_affected_median	    FLOAT,
	      Rows_read_sum		    FLOAT,
	      Rows_read_min		    FLOAT,
	      Rows_read_max		    FLOAT,
	      Rows_read_pct_95		    FLOAT,
	      Rows_read_stddev		    FLOAT,
	      Rows_read_median		    FLOAT,
	      Merge_passes_sum		    FLOAT,
	      Merge_passes_min		    FLOAT,
	      Merge_passes_max		    FLOAT,
	      Merge_passes_pct_95	    FLOAT,
	      Merge_passes_stddev	    FLOAT,
	      Merge_passes_median	    FLOAT,
	      InnoDB_IO_r_ops_min	    FLOAT,
	      InnoDB_IO_r_ops_max	    FLOAT,
	      InnoDB_IO_r_ops_pct_95	    FLOAT,
	      InnoDB_IO_r_ops_stddev	    FLOAT,
	      InnoDB_IO_r_ops_median	    FLOAT,
	      InnoDB_IO_r_bytes_min	    FLOAT,
	      InnoDB_IO_r_bytes_max	    FLOAT,
	      InnoDB_IO_r_bytes_pct_95	    FLOAT,
	      InnoDB_IO_r_bytes_stddev	    FLOAT,
	      InnoDB_IO_r_bytes_median	    FLOAT,
	      InnoDB_IO_r_wait_min	    FLOAT,
	      InnoDB_IO_r_wait_max	    FLOAT,
	      InnoDB_IO_r_wait_pct_95	    FLOAT,
	      InnoDB_IO_r_wait_stddev	    FLOAT,
	      InnoDB_IO_r_wait_median	    FLOAT,
	      InnoDB_rec_lock_wait_min	    FLOAT,
	      InnoDB_rec_lock_wait_max	    FLOAT,
	      InnoDB_rec_lock_wait_pct_95   FLOAT,
	      InnoDB_rec_lock_wait_stddev   FLOAT,
	      InnoDB_rec_lock_wait_median   FLOAT,
	      InnoDB_queue_wait_min	    FLOAT,
	      InnoDB_queue_wait_max	    FLOAT,
	      InnoDB_queue_wait_pct_95	    FLOAT,
	      InnoDB_queue_wait_stddev	    FLOAT,
	      InnoDB_queue_wait_median	    FLOAT,
	      InnoDB_pages_distinct_min	    FLOAT,
	      InnoDB_pages_distinct_max	    FLOAT,
	      InnoDB_pages_distinct_pct_95  FLOAT,
	      InnoDB_pages_distinct_stddev  FLOAT,
	      InnoDB_pages_distinct_median  FLOAT,
	      -- Boolean (Yes/No) attributes.  Only the	cnt and	sum are	needed
	      -- for these.  cnt is how many times this attribute was recorded,
	      -- and sum is how	many of	those times the	value was Yes.	So
	      -- sum/cnt * 100 equals the percentage of	recorded times that
	      -- the value was Yes.
	      QC_Hit_cnt	  FLOAT,
	      QC_Hit_sum	  FLOAT,
	      Full_scan_cnt	  FLOAT,
	      Full_scan_sum	  FLOAT,
	      Full_join_cnt	  FLOAT,
	      Full_join_sum	  FLOAT,
	      Tmp_table_cnt	  FLOAT,
	      Tmp_table_sum	  FLOAT,
	      Tmp_table_on_disk_cnt FLOAT,
	      Tmp_table_on_disk_sum FLOAT,
	      Filesort_cnt	    FLOAT,
	      Filesort_sum	    FLOAT,
	      Filesort_on_disk_cnt  FLOAT,
	      Filesort_on_disk_sum  FLOAT,
	      PRIMARY KEY(checksum, ts_min, ts_max)
	    );

	   Note that we store the count (cnt) for the ts attribute only; it
	   would be redundant to store it for other attributes.

	   Starting from Percona Toolkit 3.0.11, the checksum function uses
	   32 chars in the MD5 sum.  This causes the checksum field in the
	   history table to have a different value than in previous versions
	   of the tool.
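
	   Once populated, the history table can be queried to trend a query
	   class over time.  A sketch, assuming the default table; replace
	   <checksum> with a real value:

	     SELECT ts_min, ts_max, Query_time_sum, Query_time_pct_95
	     FROM percona_schema.query_history
	     WHERE checksum=<checksum>
	     ORDER BY ts_min;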

       --host
	   short form: -h; type: string

	   Connect to host.

       --ignore-attributes
	   type: array;	default: arg, cmd, insert_id, ip, port,	Thread_id,
	   timestamp, exptime, flags, key, res,	val, server_id,	offset,
	   end_log_pos,	Xid

	   Do not aggregate these attributes.  Some attributes are not query
	   metrics but metadata	which doesn't need to be (or can't be)
	   aggregated.

       --inherit-attributes
	   type: array;	default: db,ts

	   If missing, inherit these attributes	from the last event that had
	   them.

	   This	option sets which attributes are inherited or carried forward
	   to events which do not have them.  For example, if one event	has
	   the db attribute equal to "foo", but	the next event doesn't have
	   the db attribute, then it inherits "foo" for	its db attribute.

       --interval
	   type: float;	default: .1

	   How frequently to poll the processlist, in seconds.

       --iterations
	   type: int; default: 1

	   How many times to iterate through the collect-and-report cycle.  If
	   0, iterate to infinity.  Each iteration runs	for "--run-time"
	   amount of time.  An iteration is usually determined by an amount of
	   time	and a report is	printed	when that amount of time elapses.
	   With	"--run-time-mode" "interval", an interval is instead
	   determined by the interval time you specify with "--run-time".  See
	   "--run-time"	and "--run-time-mode" for more information.

       --limit
	   type: Array;	default: 95%:20

	   Limit output	to the given percentage	or count.

	   If the argument is an integer, report only the top N	worst queries.
	   If the argument is an integer followed by the "%" sign, report that
	   percentage of the worst queries.  If	the percentage is followed by
	   a colon and another integer,	report the top percentage or the
	   number specified by that integer, whichever comes first.

	   The value is	actually a comma-separated array of values, one	for
	   each	item in	"--group-by".  If you don't specify a value for	any of
	   those items,	the default is the top 95%.
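
	   For example, all three forms:

	     --limit 10      # top 10 worst queries
	     --limit 90%     # worst 90% of queries
	     --limit 95%:20  # worst 95%, but at most 20 queries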

	   See also "--outliers".

       --log
	   type: string

	   Print all output to this file when daemonized.

       --max-hostname-length
	   type: int; default: 10

	   Trim	host names in reports to this length. 0=Do not trim host
	   names.

       --max-line-length
	   type: int; default: 74

	   Trim	lines to this length. 0=Do not trim lines.

       --order-by
	   type: Array;	default: Query_time:sum

	   Sort	events by this attribute and aggregate function.

	   This	is a comma-separated list of order-by expressions, one for
	   each	"--group-by" attribute.	 The default "Query_time:sum" is used
	   for "--group-by" attributes without explicitly given	"--order-by"
	   attributes (that is,	if you specify more "--group-by" attributes
	   than	corresponding "--order-by" attributes).	 The syntax is
	   "attribute:aggregate".  See "ATTRIBUTES" for	valid attributes.
	   Valid aggregates are:

	      Aggregate	Meaning
	      =========	============================
	      sum	Sum/total attribute value
	      min	Minimum	attribute value
	      max	Maximum	attribute value
	      cnt	Frequency/count	of the query

	   For example,	the default "Query_time:sum" means that	queries	in the
	   query analysis report will be ordered (sorted) by their total query
	   execution time ("Exec time").  "Query_time:max" orders the queries
	   by their maximum query execution time, so the query with the	single
	   largest "Query_time" will be listed first.  "cnt" refers more to the
	   frequency of	the query as a whole, how often	it appears; "Count" is
	   its corresponding line in the query analysis	report.	 So any
	   attribute and "cnt" should yield the	same report wherein queries
	   are sorted by the number of times they appear.

	   When	parsing	general	logs ("--type" "genlog"), the default
	   "--order-by"	becomes	"Query_time:cnt".  General logs	do not report
	   query times so only the "cnt" aggregate makes sense because all
	   query times are zero.

	   If you specify an attribute that doesn't exist in the events, then
	   pt-query-digest falls back to the default "Query_time:sum" and
	   prints a notice at the beginning of the report for each query
	   class.  You can create attributes with "--filter" and order by
	   them; see "ATTRIBUTES" for an example.
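
	   For example, to order queries by their single worst execution time
	   instead of their total:

	     pt-query-digest --order-by Query_time:max slow.log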

       --outliers
	   type: array;	default: Query_time:1:10

	   Report outliers by attribute:percentile:count.

	   The syntax of this option is	a comma-separated list of colon-
	   delimited strings.  The first field is the attribute	by which an
	   outlier is defined.	The second is a	number that is compared	to the
	   attribute's 95th percentile.	 The third is optional,	and is
	   compared to the attribute's cnt aggregate.  Queries that pass this
	   specification are added to the report, regardless of	any limits you
	   specified in	"--limit".

	   For example,	to report queries whose	95th percentile	Query_time is
	   at least 60 seconds and which are seen at least 5 times, use	the
	   following argument:

	     --outliers	Query_time:60:5

	   You can specify an --outliers option	for each value in
	   "--group-by".

       --output
	   type: string; default: report

	   How to format and print the query analysis results.	Accepted
	   values are:

	      VALUE	     FORMAT
	      =======	     ==============================
	      report	     Standard query analysis report
	      slowlog	     MySQL slow	log
	      json	     JSON, one array per query class
	      json-anon	     JSON without example queries
	      secure-slowlog MySQL slow log without example queries

	   The entire "report" output can be disabled by specifying
	   "--no-report" (see "--[no]report"), and its sections	can be
	   disabled or rearranged by specifying	"--report-format".

	   "json" output was introduced	in 2.2.1 and is	still in development,
	   so the data structure may change in future versions.

       --password
	   short form: -p; type: string

	   Password to use when connecting.  If the password contains commas,
	   they must be escaped with a backslash: "exam\,ple"

       --pid
	   type: string

	   Create the given PID	file.  The tool	won't start if the PID file
	   already exists and the PID it contains is different than the
	   current PID.	 However, if the PID file exists and the PID it
	   contains is no longer running, the tool will	overwrite the PID file
	   with	the current PID.  The PID file is removed automatically	when
	   the tool exits.

       --port
	   short form: -P; type: int

	   Port	number to use for connection.

       --preserve-embedded-numbers
	   Preserve numbers in database/table names when fingerprinting
	   queries.  The standard fingerprint method replaces numbers in
	   db/table names, so a query like 'SELECT * FROM db1.table2' is
	   fingerprinted as 'SELECT * FROM db?.table?'.  This option changes
	   that behaviour so the fingerprint becomes 'SELECT * FROM
	   db1.table2'.

       --processlist
	   type: DSN

	   Poll	this DSN's processlist for queries, with "--interval" sleep
	   between.

	   If the connection fails, pt-query-digest tries to reopen it once
	   per second.
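
	   For example, to poll the processlist on host1 every half second
	   (see "--interval"):

	     pt-query-digest --processlist h=host1 --interval 0.5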

       --progress
	   type: array;	default: time,30

	   Print progress reports to STDERR.  The value	is a comma-separated
	   list	with two parts.	 The first part	can be percentage, time, or
	   iterations; the second part specifies how often an update should be
	   printed, in percentage, seconds, or number of iterations.

       --read-timeout
	   type: time; default:	0

	   Wait	this long for an event from the	input; 0 to wait forever.

	   This	option sets the	maximum	time to	wait for an event from the
	   input.  It applies to all types of input except "--processlist".
	   If an event is not received after the specified time, the script
	   stops reading the input and prints its reports.  If "--iterations"
	   is 0	or greater than	1, the next iteration will begin, else the
	   script will exit.

	   This	option requires	the Perl POSIX module.

       --[no]report
	   default: yes

	   Print query analysis	reports	for each "--group-by" attribute.  This
	   is the standard slow	log analysis functionality.  See "OUTPUT" for
	   the description of what this	does and what the results look like.

	   If you don't	need a report (for example, when using "--review" or
	   "--history"), it is best to specify "--no-report" because this
	   allows the tool to skip some	expensive operations.

       --report-all
	   Report all queries, even ones that have been	reviewed.  This	only
	   affects the "report"	"--output" when	using "--review".  Otherwise,
	   all queries are always printed.

       --report-format
	   type: Array;	default:
	   rusage,date,hostname,files,header,profile,query_report,prepared

	   Print these sections	of the query analysis report.

	     SECTION	  PRINTS
	     ============ ======================================================
	     rusage	  CPU times and	memory usage reported by ps
	     date	  Current local	date and time
	     hostname	  Hostname of machine on which pt-query-digest was run
	     files	  Input files read/parsed
	     header	  Summary of the entire	analysis run
	     profile	  Compact table	of queries for an overview of the report
	     query_report Detailed information about each unique query
	     prepared	  Prepared statements

	   The sections	are printed in the order specified.  The rusage, date,
	   files and header sections are grouped together if specified
	   together; other sections are	separated by blank lines.

	   See "OUTPUT"	for more information on	the various parts of the query
	   report.

       --report-histogram
	   type: string; default: Query_time

	   Chart the distribution of this attribute's values.

	   The distribution chart is limited to	time-based attributes, so
	   charting "Rows_examined", for example, will produce a useless
	   chart.  Charts look like:

	     # Query_time distribution
	     #	 1us
	     #	10us
	     # 100us
	     #	 1ms
	     #	10ms  ###########################
	     # 100ms  ########################################################
	     #	  1s  ########
	     #	10s+

	   See "OUTPUT"	for more information.

       --resume
	   type: string

	   If specified, the tool writes the last file offset, if there is
	   one, to the given filename.  When run again with the same value
	   for this option, the tool reads the last file offset from the
	   file, seeks to that position in the log, and resumes parsing
	   events from that point onward.
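
	   A sketch, with an illustrative offset-file name:

	     pt-query-digest --resume /tmp/slow.log.offset slow.log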

       --review
	   type: DSN

	   Save	query classes for later	review,	and don't report already
	   reviewed classes.

	   The default table is	"percona_schema.query_review".	Specify
	   database (D)	and table (t) DSN options to override the default.
	   The database	and table are automatically created unless
	   "--no-create-review-table" is specified (see
	   "--[no]create-review-table").

	   If the table	was created manually, it must have at least the
	   following columns.  You can add more	columns	for your own special
	   purposes, but they won't be used by pt-query-digest.

	     CREATE TABLE IF NOT EXISTS	query_review (
		checksum     CHAR(32) NOT NULL PRIMARY KEY,
		fingerprint  TEXT NOT NULL,
		sample	     TEXT NOT NULL,
		first_seen   DATETIME,
		last_seen    DATETIME,
		reviewed_by  VARCHAR(20),
		reviewed_on  DATETIME,
		comments     TEXT
	     )

	   The columns are:

	     COLUMN	  MEANING
	     ===========  ====================================================
	     checksum	  A checksum of the query fingerprint (32-char MD5)
	     fingerprint  The abstracted version of the	query; its primary key
	     sample	  The query text of a sample of	the class of queries
	     first_seen	  The smallest timestamp of this class of queries
	     last_seen	  The largest timestamp	of this	class of queries
	     reviewed_by  Initially NULL; if set, query	is skipped thereafter
	     reviewed_on  Initially NULL; not assigned any special meaning
	     comments	  Initially NULL; not assigned any special meaning

	   Note	that the "fingerprint" column is the true primary key for a
	   class of queries.  The "checksum" is	just a cryptographic hash of
	   this	value, which provides a	shorter	value that is very likely to
	   also	be unique.

	   After parsing and aggregating events, your table should contain a
	   row for each	fingerprint.  This option depends on "--group-by
	   fingerprint"	(which is the default).	 It will not work otherwise.
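
	   For example, to store query classes on host2 using explicit
	   database (D) and table (t) DSN options:

	     pt-query-digest --review h=host2,D=percona_schema,t=query_review \
	       --no-report slow.log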

       --run-time
	   type: time

	   How long to run for each "--iterations".  The default is to run
	   forever (you	can interrupt with CTRL-C).  Because "--iterations"
	   defaults to 1, if you only specify "--run-time", pt-query-digest
	   runs	for that amount	of time	and then exits.	 The two options are
	   specified together to do collect-and-report cycles.	For example,
	   specifying "--iterations" 4 "--run-time" "15m" with a continuous
	   input (like STDIN or	"--processlist") will cause pt-query-digest to
	   run for 1 hour (15 minutes x	4), reporting four times, once at each
	   15 minute interval.

       --run-time-mode
	   type: string; default: clock

	   Set what the	value of "--run-time" operates on.  Following are the
	   possible values for this option:

	   clock
	       "--run-time" specifies an amount	of real	clock time during
	       which the tool should run for each "--iterations".

	   event
	       "--run-time" specifies an amount	of log time.  Log time is
	       determined by timestamps	in the log.  The first timestamp seen
	       is remembered, and each timestamp after that is compared	to the
	       first to	determine how much log time has	passed.	 For example,
	       if the first timestamp seen is "12:00:00" and the next is
	       "12:01:30", that	is 1 minute and	30 seconds of log time.	 The
	       tool will read events until the log time	is greater than	or
	       equal to	the specified "--run-time" value.

	       Since timestamps	in logs	are not	always printed,	or not always
	       printed frequently, this	mode varies in accuracy.

	   interval
	       "--run-time" specifies interval boundaries of log time into
	       which events are	divided	and reports are	generated.  This mode
	       is different from the others because it doesn't specify how
	       long to run.  The value of "--run-time" must be an interval
	       that divides evenly into minutes, hours or days.  For example,
	       "5m" divides evenly into hours (60/5=12, so twelve 5-minute
	       intervals per hour) but "7m" does not (60/7=8.6).

	       Specifying "--run-time-mode interval --run-time 30m
	       --iterations 0" is similar to specifying	"--run-time-mode clock
	       --run-time 30m --iterations 0".	In the latter case, pt-query-
	       digest will run forever,	producing reports every	30 minutes,
	       but this only works effectively with continuous inputs like
	       STDIN and the processlist.  For fixed inputs, like log files,
	       the former example produces multiple reports by dividing the
	       log into 30-minute intervals based on timestamps.

	       Intervals are calculated	from the zeroth	second/minute/hour in
	       which a timestamp occurs, not from whatever time	it specifies.
	       For example, with 30 minute intervals and a timestamp of
	       "12:10:30", the interval	is not "12:10:30" to "12:40:30", it is
	       "12:00:00" to "12:29:59".  Or, with 1 hour intervals, it	is
	       "12:00:00" to "12:59:59".  When a new timestamp exceeds the
	       interval, a report is printed, and the next interval is
	       recalculated based on the new timestamp.

	       Since "--iterations" is 1 by default, you probably want to
	       specify a new value else	pt-query-digest	will only get and
	       report on the first interval from the log since 1 interval = 1
	       iteration.  If you want to get and report every interval	in a
	       log, specify "--iterations" 0.
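
	       For example, to divide a fixed log into 30-minute intervals
	       and report on every interval:

		 pt-query-digest --run-time-mode interval --run-time 30m \
		   --iterations 0 slow.log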

       --sample
	   type: int

	   Filter out all but the first	N occurrences of each query.  The
	   queries are filtered	on the first value in "--group-by", so by
	   default, this will filter by	query fingerprint.  For	example,
	   "--sample 2"	will permit two	sample queries for each	fingerprint.
	   Useful in conjunction with "--output	slowlog" to print the queries.
	   You probably	want to	set "--no-report" to avoid the overhead	of
	   aggregating and reporting if	you're just using this to print	out
	   samples of queries.	A complete example:

	     pt-query-digest --sample 2	--no-report --output slowlog slow.log

       --slave-user
	   type: string

	   Sets the user to be used to connect to the slaves.  This parameter
	   allows you to have a different user with fewer privileges on the
	   slaves, but that user must exist on all slaves.

       --slave-password
	   type: string

	   Sets the password to be used to connect to the slaves.  It can be
	   used with --slave-user, and the password for the user must be the
	   same on all slaves.

       --set-vars
	   type: Array

	   Set the MySQL variables in this comma-separated list	of
	   "variable=value" pairs.

	   By default, the tool	sets:

	      wait_timeout=10000

	   Variables specified on the command line override these defaults.
	   For example,	specifying "--set-vars wait_timeout=500" overrides the
	   default value of 10000.

	   The tool prints a warning and continues if a	variable cannot	be
	   set.

       --show-all
	   type: Hash

	   Show	all values for these attributes.

	   By default pt-query-digest only shows as many of an attribute's
	   values as fit on a single line.  This option allows you to specify
	   attributes for which	all values will	be shown (line width is
	   ignored).  This only	works for attributes with string values	like
	   user, host, db, etc.	 Multiple attributes can be specified, comma-
	   separated.

       --since
	   type: string

	   Parse only queries newer than this value (parse queries since this
	   date).

	   This	option allows you to ignore queries older than a certain value
	   and parse only those	queries	which are more recent than the value.
	   The value can be several types:

	     * Simple time value N with	optional suffix: N[shmd], where
	       s=seconds, h=hours, m=minutes, d=days (default s	if no suffix
	       given); this is like saying "since N[shmd] ago"
	     * Full date with optional hours:minutes:seconds:
	       YYYY-MM-DD [HH:MM:SS]
	     * Short, MySQL-style date:
	       YYMMDD [HH:MM:SS]
	     * Any time	expression evaluated by	MySQL:
	       CURRENT_DATE - INTERVAL 7 DAY

	   If you give a MySQL time expression,	and you	have not also
	   specified a DSN for "--explain", "--processlist", or	"--review",
	   then	you must specify a DSN on the command line so that pt-query-
	   digest can connect to MySQL to evaluate the expression.

	   The MySQL time expression is	wrapped	inside a query like "SELECT
	   UNIX_TIMESTAMP(<expression>)", so be	sure that the expression is
	   valid inside	this query.  For example, do not use UNIX_TIMESTAMP()
	   because UNIX_TIMESTAMP(UNIX_TIMESTAMP()) returns 0.
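
	   For example (the host DSN is illustrative):

	     pt-query-digest --since 12h slow.log
	     pt-query-digest --since 'CURRENT_DATE - INTERVAL 7 DAY' h=host1 slow.log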

	   Events are assumed to be in chronological order: older events at
	   the beginning of the log and newer events at the end of the log.
	   "--since" is	strict:	it ignores all queries until one is found that
	   is new enough.  Therefore, if the query events are not consistently
	   timestamped,	some may be ignored which are actually new enough.

	   See also "--until".

       --socket
	   short form: -S; type: string

	   Socket file to use for connection.

       --timeline
	   Show	a timeline of events.

	   This	option makes pt-query-digest print another kind	of report: a
	   timeline of the events.  Each query is still grouped and
	   aggregated into classes according to "--group-by", but then they
	   are printed in chronological order.  The timeline report prints
	   out the timestamp, interval, count and value of each class.

	   If all you want is the timeline report, then	specify	"--no-report"
	   to suppress the default query analysis report.  Otherwise, the
	   timeline report will	be printed at the end before the response-time
	   profile (see	"--report-format" and "OUTPUT").

	   For example,	this:

	     pt-query-digest /path/to/log --group-by distill --timeline

	   will	print something	like:

	     # ########################################################
	     # distill report
	     # ########################################################
	     # 2009-07-25 11:19:27 1+00:00:01	2 SELECT foo
	     # 2009-07-27 11:19:30	00:01	2 SELECT bar
	     # 2009-07-27 11:30:00 1+06:30:00	2 SELECT foo

       --type
	   type: Array;	default: slowlog

	   The type of input to	parse.	The permitted types are

	   binlog
	       Parse a binary log file that has	first been converted to	text
	       using mysqlbinlog.

	       For example:

		  mysqlbinlog mysql-bin.000441 > mysql-bin.000441.txt

		  pt-query-digest --type binlog	mysql-bin.000441.txt

	   genlog
	       Parse a MySQL general log file.	General	logs lack a lot	of
	       "ATTRIBUTES", notably "Query_time".  The	default	"--order-by"
	       for general logs	changes	to "Query_time:cnt".

	   slowlog
	       Parse a log file	in any variation of MySQL slow log format.

	   tcpdump
	       Inspect network packets and decode the MySQL client protocol,
	       extracting queries and responses	from it.

	       pt-query-digest does not	actually watch the network (i.e. it
	       does NOT	"sniff packets").  Instead, it's just parsing the
	       output of tcpdump.  You are responsible for generating this
	       output; pt-query-digest does not	do it for you.	Then you send
	       this to pt-query-digest as you would any	log file: as files on
	       the command line	or to STDIN.

	       The parser expects the input to be formatted with the following
	       options:	"-x -n -q -tttt".  For example,	if you want to capture
	       output from your	local machine, you can do something like the
	       following (the port must	come last on FreeBSD):

		 tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 \
		   > mysql.tcp.txt
		 pt-query-digest --type	tcpdump	mysql.tcp.txt

	       The other tcpdump parameters, such as -s, -c, and -i, are up to
	       you.  Just make sure the	output looks like this (there is a
	       line break in the first line to avoid man-page problems):

		 2009-04-12 09:50:16.804849 IP 127.0.0.1.42167
			> 127.0.0.1.3306: tcp 37
		     0x0000:  4508 0059	6eb2 4000 4006 cde2 7f00 0001
		     0x0010:  ....

	       Remember	tcpdump	has a handy -c option to stop after it
	       captures	some number of packets!	 That's	very useful for
	       testing your tcpdump command.  Note that	tcpdump	can't capture
	       traffic on a Unix socket.  Read
	       <http://bugs.mysql.com/bug.php?id=31577>	if you're confused
	       about this.

	       Devananda Van Der Veen explained	on the MySQL Performance Blog
	       how to capture traffic without dropping packets on busy
	       servers.	 Dropped packets cause pt-query-digest to miss the
	       response	to a request, then see the response to a later request
	       and assign the wrong execution time to the query.  You can
	       change the filter to something like the following to help
	       capture a subset	of the queries.	 (See
	       <http://www.mysqlperformanceblog.com/?p=6092> for details.)

		 tcpdump -i any	-s 65535 -x -n -q -tttt	\
		    'port 3306 and tcp[1] & 7 == 2 and tcp[3] &	7 == 2'

	       All MySQL servers running on port 3306 are automatically
	       detected in the tcpdump output.  Therefore, if the tcpdump
	       output contains packets from multiple servers on port 3306
	       (for example, 10.0.0.1:3306, 10.0.0.2:3306, etc.), all
	       packets/queries from all these servers will be analyzed
	       together as if they were one server.

	       If you're analyzing traffic for a MySQL server that is not
	       running on port 3306, see "--watch-server".

	       Also note that pt-query-digest may fail to report the database
	       for queries when parsing tcpdump output.  The database is
	       discovered only in the initial connect events for a new client
	       or when "USE db" is executed.  If the tcpdump output contains
	       neither of these, then pt-query-digest cannot discover the
	       database.

	       Server-side prepared statements are supported.  SSL-encrypted
	       traffic cannot be inspected and decoded.

	   rawlog
	       Raw logs	are not	MySQL logs but simple text files with one SQL
	       statement per line, like:

		 SELECT	c FROM t WHERE id=1
		 /* Hello, world! */ SELECT * FROM t2 LIMIT 1
		 INSERT	INTO t (a, b) VALUES ('foo', 'bar')
		 INSERT	INTO t SELECT *	FROM monkeys

	       Since raw logs do not have any metrics, many options and
	       features	of pt-query-digest do not work with them.

	       One use case for	raw logs is ranking queries by count when the
	       only information	available is a list of queries,	from polling
	       "SHOW PROCESSLIST" for example.

       --until
	   type: string

	   Parse only queries older than this value (parse queries until this
	   date).

	   This	option allows you to ignore queries newer than a certain value
	   and parse only those	queries	which are older	than the value.	 The
	   value can be	one of the same	types listed for "--since".

	   Unlike "--since", "--until" is not strict: all queries are parsed
	   until one has a timestamp that is equal to or greater than
	   "--until".  Then all	subsequent queries are ignored.

       --user
	   short form: -u; type: string

	   User	for login if not current user.

       --variations
	   type: Array

	   Report the number of	variations in these attributes'	values.

	   Variations show how many distinct values an attribute had within a
	   class.  The usual value for this option is "arg", which shows how
	   many distinct queries were in the class.  This can be useful for
	   determining a query's cacheability.
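
	   For example, to report how many distinct queries each class
	   contains:

	     pt-query-digest --variations arg slow.log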

	   Distinct values are determined by CRC32 checksums of	the
	   attributes' values.	These checksums	are reported in	the query
	   report for attributes specified by this option, like:

	     # arg crc	    109	(1/25%), 144 (1/25%)...	2 more

	   In that class there were 4 distinct queries.	 The checksums of the
	   first two variations	are shown, and each one	occurred once (or, 25%
	   of the time).

	   The counts of distinct variations are approximate because only
	   1,000 variations are saved: the full CRC32 checksum modulo 1,000
	   is saved, so some distinct checksums are treated as equal.

       --version
	   Show	version	and exit.

       --[no]version-check
	   default: yes

	   Check for the latest	version	of Percona Toolkit, MySQL, and other
	   programs.

	   This	is a standard "check for updates automatically"	feature, with
	   two additional features.  First, the	tool checks its	own version
	   and also the	versions of the	following software: operating system,
	   Percona Monitoring and Management (PMM), MySQL, Perl, MySQL driver
	   for Perl (DBD::mysql), and Percona Toolkit. Second, it checks for
	   and warns about versions with known problems. For example, MySQL
	   5.5.25 had a	critical bug and was re-released as 5.5.25a.

	   A secure connection to Percona's Version Check database server is
	   made to perform these checks.  Each request is logged by the
	   server, including software version numbers and the unique ID of
	   the checked system.  The ID is generated by the Percona Toolkit
	   installation script or when the Version Check database call is
	   made for the first time.

	   Any updates or known	problems are printed to	STDOUT before the
	   tool's normal output.  This feature should never interfere with the
	   normal operation of the tool.

	   For more information, visit
	   <https://www.percona.com/doc/percona-toolkit/LATEST/version-check.html>.

       --[no]vertical-format
	   default: yes

	   Output a trailing "\G" in the reported SQL queries.

	   This	makes the mysql	client display the result using	vertical
	   format.  Non-native MySQL clients like phpMyAdmin do	not support
	   this.
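
	   For example, a query in the report would be printed like this
	   (the query itself is illustrative):

	     SELECT col FROM tbl WHERE id=5\G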

       --watch-server
	   type: string

	   This	option tells pt-query-digest which server IP address and port
	   (like "10.0.0.1:3306") to watch when	parsing	tcpdump	(for "--type"
	   tcpdump); all other servers are ignored.  If	you don't specify it,
	   pt-query-digest watches all servers by looking for any IP address
	   using port 3306 or "mysql".	If you're watching a server with a
	   non-standard	port, this won't work, so you must specify the IP
	   address and port to watch.

	   If you want to watch	a mix of servers, some running on standard
	   port	3306 and some running on non-standard ports, you need to
	   create separate tcpdump outputs for the non-standard	port servers
	   and then specify this option	for each.  At present pt-query-digest
	   cannot auto-detect servers on port 3306 and also be told to watch a
	   server on a non-standard port.
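
	   For example, to watch only a server on a non-standard port (the
	   address and port are illustrative):

	     pt-query-digest --type tcpdump --watch-server 10.0.0.1:3307 \
	       mysql.tcp.txt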

DSN OPTIONS
       These DSN options are used to create a DSN.  Each option	is given like
       "option=value".	The options are	case-sensitive,	so P and p are not the
       same option.  There cannot be whitespace	before or after	the "="	and if
       the value contains whitespace it	must be	quoted.	 DSN options are
       comma-separated.	 See the percona-toolkit manpage for full details.
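
       For example, a "--review" DSN built from the options below might look
       like this (the host, database, and table names are illustrative):

	  pt-query-digest --review h=host2,D=percona,t=query_review \
	    --no-report slow.log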

       o   A

	   dsn:	charset; copy: yes

	   Default character set.

       o   D

	   dsn:	database; copy:	yes

	   Default database to use when	connecting to MySQL.

       o   F

	   dsn:	mysql_read_default_file; copy: yes

	   Only	read default options from the given file.

       o   h

	   dsn:	host; copy: yes

	   Connect to host.

       o   p

	   dsn:	password; copy:	yes

	   Password to use when connecting.  If the password contains commas,
	   they must be escaped with a backslash: "exam\,ple"

       o   P

	   dsn:	port; copy: yes

	   Port	number to use for connection.

       o   S

	   dsn:	mysql_socket; copy: yes

	   Socket file to use for connection.

       o   t

	   The "--review" or "--history" table.

       o   u

	   dsn:	user; copy: yes

	   User	for login if not current user.

ENVIRONMENT
       The environment variable	"PTDEBUG" enables verbose debugging output to
       STDERR.	To enable debugging and	capture	all output to a	file, run the
       tool like:

	  PTDEBUG=1 pt-query-digest ...	> FILE 2>&1

       Be careful: debugging output is voluminous and can amount to several
       megabytes.

SYSTEM REQUIREMENTS
       You need	Perl, DBI, DBD::mysql, and some	core packages that ought to be
       installed in any	reasonably new version of Perl.

BUGS
       For a list of known bugs, see
       <http://www.percona.com/bugs/pt-query-digest>.

       Please report bugs at <https://jira.percona.com/projects/PT>.  Include
       the following information in your bug report:

       o   Complete command-line used to run the tool

       o   Tool	"--version"

       o   MySQL version of all	servers	involved

       o   Output from the tool	including STDERR

       o   Input files (log/dump/config	files, etc.)

       If possible, include debugging output by	running	the tool with
       "PTDEBUG"; see "ENVIRONMENT".

DOWNLOADING
       Visit <http://www.percona.com/software/percona-toolkit/>	to download
       the latest release of Percona Toolkit.  Or, get the latest release from
       the command line:

	  wget percona.com/get/percona-toolkit.tar.gz

	  wget percona.com/get/percona-toolkit.rpm

	  wget percona.com/get/percona-toolkit.deb

       You can also get	individual tools from the latest release:

	  wget percona.com/get/TOOL

       Replace "TOOL" with the name of any tool.
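
       For example, to get this tool:

	  wget percona.com/get/pt-query-digest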

ATTRIBUTES REFERENCE
       Events may have the following attributes.  If writing a "--filter", be
       sure to check that an attribute is defined in each event	before using
       it, else	the filter code	may crash the tool with	a "use of
       uninitialized value" error.
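
       For example, a "--filter" that guards against undefined attributes
       might look like this (a sketch; the attribute and threshold are
       illustrative):

	 pt-query-digest slow.log \
	   --filter 'defined $event->{Query_time} && $event->{Query_time} > 1'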

       You can dump event attributes for any input like:

	 $ pt-query-digest		    \
	     slow.log			    \
	     --filter 'print Dumper $event' \
	     --no-report		    \
	     --sample 1

       That will produce a lot of output with "attribute => value" pairs like:

	  $VAR1	= {
	    Query_time => '0.033384',
	    Rows_examined => '0',
	    Rows_sent => '0',
	    Thread_id => '10',
	    Tmp_table => 'No',
	    Tmp_table_on_disk => 'No',
	    arg	=> 'SELECT col FROM tbl	WHERE id=5',
	    bytes => 103,
	    cmd	=> 'Query',
	    db => 'db1',
	    fingerprint	=> 'select col from tbl	where id=?',
	    host => '',
	    pos_in_log => 1334,
	    ts => '071218 11:48:27',
	    user => '[SQL_SLAVE]'
	  };

   COMMON
       These attributes are common to all input "--type"s and
       "--processlist", except where noted.

       arg The query text, or the command for admin commands like "Ping".

       bytes
	   The byte length of the "arg".

       cmd "Query" or "Admin".

       db  The current database.  The value comes from USE database
	   statements.	By default, "Schema" is	an alias which is
	   automatically changed to "db"; see "--attribute-aliases".

       fingerprint
	   An abstracted form of the query.  See "FINGERPRINTS".

       host
	   Client host which executed the query.

       pos_in_log
	   The byte offset of the event	in the log or tcpdump, except for
	   "--processlist".

       Query_time
	   The total time the query took, including lock time.

       ts  The timestamp of when the query ended.

   SLOW, GENERAL, AND BINARY LOGS
       Events have all available attributes from the log file.	Therefore, you
       only need to look at the	log file to see	which events are available,
       but remember: not all events have the same attributes.

       Percona Server adds many attributes to the slow log; see
       <http://www.percona.com/docs/wiki/patches:slow_extended> for more
       information.

   TCPDUMP
       These attributes	are available when parsing "--type" tcpdump.

       Error_no
	   The MySQL error number if the query caused an error.

       ip  The client's	IP address.  Certain log files may also	contain	this
	   attribute.

       No_good_index_used
	   Yes or No if	no good	index existed for the query (flag set by
	   server).

       No_index_used
	   Yes or No if	the query did not use any index	(flag set by server).

       port
	   The client's	port number.

       Warning_count
	   The number of warnings, as otherwise	shown by "SHOW WARNINGS".

   PROCESSLIST
       If using	"--processlist", an "id" attribute is available	for the
       process ID, in addition to the common attributes.

AUTHORS
       Baron Schwartz, Daniel Nichter, and Brian Fraser

ABOUT PERCONA TOOLKIT
       This tool is part of Percona Toolkit, a collection of advanced command-
       line tools for MySQL developed by Percona.  Percona Toolkit was forked
       from two	projects in June, 2011:	Maatkit	and Aspersa.  Those projects
       were created by Baron Schwartz and primarily developed by him and
       Daniel Nichter.	Visit <http://www.percona.com/software/> to learn
       about other free, open-source software from Percona.

COPYRIGHT, LICENSE, AND	WARRANTY
       This program is copyright 2008-2018 Percona LLC and/or its affiliates.

       THIS PROGRAM IS PROVIDED	"AS IS"	AND WITHOUT ANY	EXPRESS	OR IMPLIED
       WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
       MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation, version 2; OR the Perl	Artistic License.  On
       UNIX and	similar	systems, you can issue `man perlgpl' or	`man
       perlartistic' to	read these licenses.

       You should have received	a copy of the GNU General Public License along
       with this program; if not, write	to the Free Software Foundation, Inc.,
       59 Temple Place,	Suite 330, Boston, MA  02111-1307  USA.

VERSION
       pt-query-digest 3.2.0

perl v5.32.1			  2020-04-23		    PT-QUERY-DIGEST(1)
