Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages


home | help
PT-ARCHIVER(1)	      User Contributed Perl Documentation	PT-ARCHIVER(1)

       pt-archiver - Archive rows from a MySQL table into another table	or a

       Usage: pt-archiver [OPTIONS] --source DSN --where WHERE

       pt-archiver nibbles records from	a MySQL	table.	The --source and
       --dest arguments	use DSN	syntax;	if COPY	is yes,	--dest defaults	to the
       key's value from	--source.


       Archive all rows	from oltp_server to olap_server	and to a file:

	 pt-archiver --source h=oltp_server,D=test,t=tbl --dest	h=olap_server \
	   --file '/var/log/archive/%Y-%m-%d-%D.%t'			      \
	   --where "1=1" --limit 1000 --commit-each

       Purge (delete) orphan rows from child table:

	 pt-archiver --source h=host,D=db,t=child --purge \
	   --where 'NOT	EXISTS(SELECT *	FROM parent WHERE col=child.col)'

       Percona Toolkit is mature, proven in the	real world, and	well tested,
       but all database	tools can pose a risk to the system and	the database
       server.	Before using this tool,	please:

       o   Read	the tool's documentation

       o   Review the tool's known "BUGS"

       o   Test	the tool on a non-production server

       o   Backup your production server and verify the	backups

       pt-archiver is the tool I use to	archive	tables as described in
       <>.  The goal is a low-impact,
       forward-only job	to nibble old data out of the table without impacting
       OLTP queries much.  You can insert the data into	another	table, which
       need not	be on the same server.	You can	also write it to a file	in a
       format suitable for LOAD	DATA INFILE.  Or you can do neither, in	which
       case it's just an incremental DELETE.

       pt-archiver is extensible via a plugin mechanism.  You can inject your
       own code	to add advanced	archiving logic	that could be useful for
       archiving dependent data, applying complex business rules, or building
       a data warehouse	during the archiving process.

       You need	to choose values carefully for some options.  The most
       important are "--limit",	"--retries", and "--txn-size".

       The strategy is to find the first row(s), then scan some	index forward-
       only to find more rows efficiently.  Each subsequent query should not
       scan the	entire table; it should	seek into the index, then scan until
       it finds	more archivable	rows.  Specifying the index with the 'i' part
       of the "--source" argument can be crucial for this; use "--dry-run" to
       examine the generated queries and be sure to EXPLAIN them to see	if
       they are	efficient (most	of the time you	probably want to scan the
       PRIMARY key, which is the default).  Even better, examine the
       difference in the Handler status	counters before	and after running the
       query, and make sure it is not scanning the whole table every query.

       You can disable the seek-then-scan optimizations	partially or wholly
       with "--no-ascend" and "--ascend-first".	 Sometimes this	may be more
       efficient for multi-column keys.	 Be aware that pt-archiver is built to
       start at	the beginning of the index it chooses and scan it forward-
       only.  This might result	in long	table scans if you're trying to	nibble
       from the	end of the table by an index other than	the one	it prefers.
       See "--source" and read the documentation on the	"i" part if this
       applies to you.

Percona	XtraDB Cluster
       pt-archiver works with Percona XtraDB Cluster (PXC) 5.5.28-23.7 and
       newer, but there	are three limitations you should consider before
       archiving on a cluster:

       Error on	commit
	   pt-archiver does not	check for error	when it	commits	transactions.
	   Commits on PXC can fail, but	the tool does not yet check for	or
	   retry the transaction when this happens.  If	it happens, the	tool
	   will	die.

       MyISAM tables
	   Archiving MyISAM tables works, but MyISAM support in	PXC is still
	   experimental	at the time of this release.  There are	several	known
	   bugs	with PXC, MyISAM tables, and "AUTO_INCREMENT" columns.
	   Therefore, you must ensure that archiving will not directly or
	   indirectly result in	the use	of default "AUTO_INCREMENT" values for
	   a MyISAM table.  For	example, this happens with "--dest" if
	   "--columns" is used and the "AUTO_INCREMENT"	column is not
	   included.  The tool does not	check for this!

       Non-cluster options
	   Certain options may or may not work.	 For example, if a cluster
	   node	is not also a slave, then "--check-slave-lag" does not work.
	   And since PXC tables	are usually InnoDB, but	InnoDB doesn't support
	   "INSERT DELAYED", then "--delayed-insert" does not work.  Other
	   options may also not	work, but the tool does	not check them,
	   therefore you should	test archiving on a test cluster before
	   archiving on	your real cluster.

       If you specify "--progress", the	output is a header row,	plus status
       output at intervals.  Each row in the status output lists the current
       date and	time, how many seconds pt-archiver has been running, and how
       many rows it has	archived.

       If you specify "--statistics", "pt-archiver" outputs timing and other
       information to help you identify	which part of your archiving process
       takes the most time.

       pt-archiver tries to catch signals and exit gracefully; for example, if
       you send	it SIGTERM (Ctrl-C on UNIX-ish systems), it will catch the
       signal, print a message about the signal, and exit fairly normally.  It
       will not	execute	"--analyze" or "--optimize", because these may take a
       long time to finish.  It	will run all other code	normally, including
       calling after_finish() on any plugins (see "EXTENDING").

       In other	words, a signal, if caught, will break out of the main
       archiving loop and skip optimize/analyze.

       Specify at least	one of "--dest", "--file", or "--purge".

       "--ignore" and "--replace" are mutually exclusive.

       "--txn-size" and	"--commit-each"	are mutually exclusive.

       "--low-priority-insert" and "--delayed-insert" are mutually exclusive.

       "--share-lock" and "--for-update" are mutually exclusive.

       "--analyze" and "--optimize" are	mutually exclusive.

       "--no-ascend" and "--no-delete" are mutually exclusive.

       DSN values in "--dest" default to values	from "--source"	if COPY	is

	   type: string

	   Run ANALYZE TABLE afterwards	on "--source" and/or "--dest".

	   Runs	ANALYZE	TABLE after finishing.	The argument is	an arbitrary
	   string.  If it contains the letter 's', the source will be
	   analyzed.  If it contains 'd', the destination will be analyzed.
	   You can specify either or both.  For	example, the following will
	   analyze both:


	   See <>	for details on

	   Ascend only first column of index.

	   If you do want to use the ascending index optimization (see
	   "--no-ascend"), but do not want to incur the	overhead of ascending
	   a large multi-column	index, you can use this	option to tell pt-
	   archiver to ascend only the leftmost	column of the index.  This can
	   provide a significant performance boost over	not ascending the
	   index at all, while avoiding	the cost of ascending the whole	index.

	   See "EXTENDING" for a discussion of how this	interacts with

	   Prompt for a	password when connecting to MySQL.

	   Buffer output to "--file" and flush at commit.

	   Disables autoflushing to "--file" and flushes "--file" to disk only
	   when	a transaction commits.	This typically means the file is
	   block-flushed by the	operating system, so there may be some
	   implicit flushes to disk between commits as well.  The default is
	   to flush "--file" to	disk after every row.

	   The danger is that a	crash might cause lost data.

	   The performance increase I have seen	from using "--buffer" is
	   around 5 to 15 percent.  Your mileage may vary.

	   Delete each chunk with a single statement (implies

	   Delete each chunk of	rows in	bulk with a single "DELETE" statement.
	   The statement deletes every row between the first and last row of
	   the chunk, inclusive.  It implies "--commit-each", since it would
	   be a	bad idea to "INSERT" rows one at a time	and commit them	before
	   the bulk "DELETE".

	   The normal method is	to delete every	row by its primary key.	 Bulk
	   deletes might be a lot faster.  They	also might not be faster if
	   you have a complex "WHERE" clause.

	   This	option completely defers all "DELETE" processing until the
	   chunk of rows is finished.  If you have a plugin on the source, its
	   "before_delete" method will not be called.  Instead,	its
	   "before_bulk_delete"	method is called later.

	   WARNING: if you have	a plugin on the	source that sometimes doesn't
	   return true from "is_archivable()", you should use this option only
	   if you understand what it does.  If the plugin instructs
	   "pt-archiver" not to	archive	a row, it will still be	deleted	by the
	   bulk	delete!

	   default: yes

	   Add "--limit" to "--bulk-delete" statement.

	   This	is an advanced option and you should not disable it unless you
	   know	what you are doing and why!  By	default, "--bulk-delete"
	   appends a "--limit" clause to the bulk delete SQL statement.	 In
	   certain cases, this clause can be omitted by	specifying
	   "--no-bulk-delete-limit".  "--limit"	must still be specified.

	   Insert each chunk with LOAD DATA INFILE (implies "--bulk-delete"

	   Insert each chunk of	rows with "LOAD	DATA LOCAL INFILE".  This may
	   be much faster than inserting a row at a time with "INSERT"
	   statements.	It is implemented by creating a	temporary file for
	   each	chunk of rows, and writing the rows to this file instead of
	   inserting them.  When the chunk is finished,	it uploads the rows.

	   To protect the safety of your data, this option forces bulk deletes
	   to be used.	It would be unsafe to delete each row as it is found,
	   before inserting the	rows into the destination first.  Forcing bulk
	   deletes guarantees that the deletion	waits until the	insertion is

	   The "--low-priority-insert",	"--replace", and "--ignore" options
	   work	with this option, but "--delayed-insert" does not.

	   If "LOAD DATA LOCAL INFILE" throws an error in the lines of "The
	   used	command	is not allowed with this MySQL version", refer to the
	   documentation for the "L" DSN option.

	   type: string

	   Channel name	used when connected to a server	using replication
	   channels.  Suppose you have two masters, master_a at	port 12345,
	   master_b at port 1236 and a slave connected to both masters using
	   channels chan_master_a and chan_master_b.  If you want to run pt-
	   archiver to syncronize the slave against master_a, pt-archiver
	   won't be able to determine what's the correct master	since SHOW
	   SLAVE STATUS	will return 2 rows. In this case, you can use
	   --channel=chan_master_a to specify the channel name to use in the
	   SHOW	SLAVE STATUS command.

	   short form: -A; type: string

	   Default character set.  If the value	is utf8, sets Perl's binmode
	   on STDOUT to	utf8, passes the mysql_enable_utf8 option to
	   DBD::mysql, and runs	SET NAMES UTF8 after connecting	to MySQL.  Any
	   other value sets binmode on STDOUT without the utf8 layer, and runs
	   SET NAMES after connecting to MySQL.

	   Note	that only charsets as known by MySQL are recognized; So	for
	   example, "UTF8" will	work, but "UTF-8" will not.

	   See also "--[no]check-charset".

	   default: yes

	   Ensure connection and table character sets are the same.  Disabling
	   this	check may cause	text to	be erroneously converted from one
	   character set to another (usually from utf8 to latin1) which	may
	   cause data loss or mojibake.	 Disabling this	check may be useful or
	   necessary when character set	conversions are	intended.

	   default: yes

	   Ensure "--source" and "--dest" have same columns.

	   Enabled by default; causes pt-archiver to check that	the source and
	   destination tables have the same columns.  It does not check	column
	   order, data type, etc.  It just checks that all columns in the
	   source exist	in the destination and vice versa.  If there are any
	   differences,	pt-archiver will exit with an error.

	   To disable this check, specify --no-check-columns.

	   type: time; default:	1s

	   If "--check-slave-lag" is given, this defines how long the tool
	   pauses each
	    time it discovers that a slave is lagging.
	    This check is performed every 100 rows.

	   type: string; repeatable: yes

	   Pause archiving until the specified DSN's slave lag is less than
	   "--max-lag".	 This option can be specified multiple times for
	   checking more than one slave.

	   short form: -c; type: array

	   Comma-separated list	of columns to archive.

	   Specify a comma-separated list of columns to	fetch, write to	the
	   file, and insert into the destination table.	 If specified, pt-
	   archiver ignores other columns unless it needs to add them to the
	   "SELECT" statement for ascending an index or	deleting rows.	It
	   fetches and uses these extra	columns	internally, but	does not write
	   them	to the file or to the destination table.  It does pass them to

	   See also "--primary-key-only".

	   Commit each set of fetched and archived rows	(disables

	   Commits transactions	and flushes "--file" after each	set of rows
	   has been archived, before fetching the next set of rows, and	before
	   sleeping if "--sleep" is specified.	Disables "--txn-size"; use
	   "--limit" to	control	the transaction	size with "--commit-each".

	   This	option is useful as a shortcut to make "--limit" and
	   "--txn-size"	the same value,	but more importantly it	avoids
	   transactions	being held open	while searching	for more rows.	For
	   example, imagine you	are archiving old rows from the	beginning of a
	   very	large table, with "--limit" 1000 and "--txn-size" 1000.	 After
	   some	period of finding and archiving	1000 rows at a time, pt-
	   archiver finds the last 999 rows and	archives them, then executes
	   the next SELECT to find more	rows.  This scans the rest of the
	   table, but never finds any more rows.  It has held open a
	   transaction for a very long time, only to determine it is finished
	   anyway.  You	can use	"--commit-each"	to avoid this.

	   type: Array

	   Read	this comma-separated list of config files; if specified, this
	   must	be the first option on the command line.

	   short form: -D; type: string

	   Connect to this database.

	   Add the DELAYED modifier to INSERT statements.

	   Adds	the DELAYED modifier to	INSERT or REPLACE statements.  See
	   <> for details.

	   type: DSN

	   DSN specifying the table to archive to.

	   This	item specifies a table into which pt-archiver will insert rows
	   archived from "--source".  It uses the same key=val argument	format
	   as "--source".  Most	missing	values default to the same values as
	   "--source", so you don't have to repeat options that	are the	same
	   in "--source" and "--dest".	Use the	"--help" option	to see which
	   values are copied from "--source".

	   WARNING: Using a default options file (F) DSN option	that defines a
	   socket for "--source" causes	pt-archiver to connect to "--dest"
	   using that socket unless another socket for "--dest"	is specified.
	   This	means that pt-archiver may incorrectly connect to "--source"
	   when	it connects to "--dest".  For example:

	     --source F=host1.cnf,D=db,t=tbl --dest h=host2

	   When	pt-archiver connects to	"--dest", host2, it will connect via
	   the "--source", host1, socket defined in host1.cnf.

	   Print queries and exit without doing	anything.

	   Causes pt-archiver to exit after printing the filename and SQL
	   statements it will use.

	   type: string

	   File	to archive to, with DATE_FORMAT()-like formatting.

	   Filename to write archived rows to.	A subset of MySQL's
	   DATE_FORMAT() formatting codes are allowed in the filename, as

	      %d    Day	of the month, numeric (01..31)
	      %H    Hour (00..23)
	      %i    Minutes, numeric (00..59)
	      %m    Month, numeric (01..12)
	      %s    Seconds (00..59)
	      %Y    Year, numeric, four	digits

	   You can use the following extra format codes	too:

	      %D    Database name
	      %t    Table name


	      --file '/var/log/archive/%Y-%m-%d-%D.%t'

	   The file's contents are in the same format used by SELECT INTO
	   OUTFILE, as documented in the MySQL manual: rows terminated by
	   newlines, columns terminated	by tabs, NULL characters are
	   represented by \N, and special characters are escaped by \.	This
	   lets	you reload a file with LOAD DATA INFILE's default settings.

	   If you want a column	header at the top of the file, see "--header".
	   The file is auto-flushed by default;	see "--buffer".

	   Adds	the FOR	UPDATE modifier	to SELECT statements.

	   For details,	see

	   Print column	header at top of "--file".

	   Writes column names as the first line in the	file given by
	   "--file".  If the file exists, does not write headers; this keeps
	   the file loadable with LOAD DATA INFILE in case you append more
	   output to it.

	   Show	help and exit.

	   Adds	the HIGH_PRIORITY modifier to SELECT statements.

	   See <> for details.

	   short form: -h; type: string

	   Connect to host.

	   Use IGNORE for INSERT statements.

	   Causes INSERTs into "--dest"	to be INSERT IGNORE.

	   type: int; default: 1

	   Number of rows to fetch and archive per statement.

	   Limits the number of	rows returned by the SELECT statements that
	   retrieve rows to archive.  Default is one row.  It may be more
	   efficient to	increase the limit, but	be careful if you are
	   archiving sparsely, skipping	over many rows;	this can potentially
	   cause more contention with other queries, depending on the storage
	   engine, transaction isolation level,	and options such as

	   Do not write	OPTIMIZE or ANALYZE queries to binlog.

	   Adds	the NO_WRITE_TO_BINLOG modifier	to ANALYZE and OPTIMIZE
	   queries.  See "--analyze" for details.

	   Adds	the LOW_PRIORITY modifier to DELETE statements.

	   See <> for details.

	   Adds	the LOW_PRIORITY modifier to INSERT or REPLACE statements.

	   See <> for details.

	   type: float

	   Somewhat similar to --max-lag but for PXC clusters.	Check average
	   time	cluster	spent pausing for Flow Control and make	tool pause if
	   it goes over	the percentage indicated in the	option.	 Default is no
	   Flow	Control	checking.  This	option is available for	PXC versions
	   5.6 or higher.

	   type: time; default:	1s

	   Pause archiving if the slave	given by "--check-slave-lag" lags.

	   This	option causes pt-archiver to look at the slave every time it's
	   about to fetch another row.	If the slave's lag is greater than the
	   option's value, or if the slave isn't running (so its lag is	NULL),
	   pt-table-checksum sleeps for	"--check-interval" seconds and then
	   looks at the	lag again.  It repeats until the slave is caught up,
	   then	proceeds to fetch and archive the row.

	   This	option may eliminate the need for "--sleep" or "--sleep-coef".

	   Do not use ascending	index optimization.

	   The default ascending-index optimization causes "pt-archiver" to
	   optimize repeated "SELECT" queries so they seek into	the index
	   where the previous query ended, then	scan along it, rather than
	   scanning from the beginning of the table every time.	 This is
	   enabled by default because it is generally a	good strategy for
	   repeated accesses.

	   Large, multiple-column indexes may cause the	WHERE clause to	be
	   complex enough that this could actually be less efficient.
	   Consider for	example	a four-column PRIMARY KEY on (a, b, c, d).
	   The WHERE clause to start where the last query ended	is as follows:

	      WHERE (a > ?)
		 OR (a = ? AND b > ?)
		 OR (a = ? AND b = ? AND c > ?)
		 OR (a = ? AND b = ? AND c = ? AND d >=	?)

	   Populating the placeholders with values uses	memory and CPU,	adds
	   network traffic and parsing overhead, and may make the query	harder
	   for MySQL to	optimize.  A four-column key isn't a big deal, but a
	   ten-column key in which every column	allows "NULL" might be.

	   Ascending the index might not be necessary if you know you are
	   simply removing rows	from the beginning of the table	in chunks, but
	   not leaving any holes, so starting at the beginning of the table is
	   actually the	most efficient thing to	do.

	   See also "--ascend-first".  See "EXTENDING" for a discussion	of how
	   this	interacts with plugins.

	   Do not delete archived rows.

	   Causes "pt-archiver"	not to delete rows after processing them.
	   This	disallows "--no-ascend", because enabling them both would
	   cause an infinite loop.

	   If there is a plugin	on the source DSN, its "before_delete" method
	   is called anyway, even though "pt-archiver" will not	execute	the
	   delete.  See	"EXTENDING" for	more on	plugins.

	   type: string

	   Run OPTIMIZE	TABLE afterwards on "--source" and/or "--dest".

	   Runs	OPTIMIZE TABLE after finishing.	 See "--analyze" for the
	   option syntax and <>
	   for details on OPTIMIZE TABLE.

	   type: string

	   Used	with "--file" to specify the output format.

	   Valid formats are:
	     dump: MySQL dump format using tabs	as field separator (default)
	     csv : Dump	rows using ',' as separator and	optionally enclosing
	   fields by '"'.
		   This	format is equivalent to	FIELDS TERMINATED BY ','

	   short form: -p; type: string

	   Password to use when	connecting.  If	password contains commas they
	   must	be escaped with	a backslash: "exam\,ple"

	   type: string

	   Create the given PID	file.  The tool	won't start if the PID file
	   already exists and the PID it contains is different than the
	   current PID.	 However, if the PID file exists and the PID it
	   contains is no longer running, the tool will	overwrite the PID file
	   with	the current PID.  The PID file is removed automatically	when
	   the tool exits.

	   type: string

	   Perl	module name to use as a	generic	plugin.

	   Specify the Perl module name	of a general-purpose plugin.  It is
	   currently used only for statistics (see "--statistics") and must
	   have	"new()"	and a "statistics()" method.

	   The "new( src =" $src, dst => $dst, opts => $o )> method gets the
	   source and destination DSNs,	and their database connections,	just
	   like	the connection-specific	plugins	do.  It	also gets an
	   OptionParser	object ($o) for	accessing command-line options
	   (example: "$o-"get('purge');>).

	   The "statistics(\%stats, $time)" method gets	a hashref of the
	   statistics collected	by the archiving job, and the time the whole
	   job started.

	   short form: -P; type: int

	   Port	number to use for connection.

	   Primary key columns only.

	   A shortcut for specifying "--columns" with the primary key columns.
	   This	is an efficiency if you	just want to purge rows; it avoids
	   fetching the	entire row, when only the primary key columns are
	   needed for "DELETE" statements.  See	also "--purge".

	   type: int

	   Print progress information every X rows.

	   Prints current time,	elapsed	time, and rows archived	every X	rows.

	   Purge instead of archiving; allows omitting "--file"	and "--dest".

	   Allows archiving without a "--file" or "--dest" argument, which is
	   effectively a purge since the rows are just deleted.

	   If you just want to purge rows, consider specifying the table's
	   primary key columns with "--primary-key-only".  This	will prevent
	   fetching all	columns	from the server	for no reason.

	   Adds	the QUICK modifier to DELETE statements.

	   See <> for details.  As
	   stated in the documentation,	in some	cases it may be	faster to use
	   DELETE QUICK	followed by OPTIMIZE TABLE.  You can use "--optimize"
	   for this.

	   short form: -q

	   Do not print	any output, such as for	"--statistics".

	   Suppresses normal output, including the output of "--statistics",
	   but doesn't suppress	the output from	"--why-quit".

	   Causes INSERTs into "--dest"	to be written as REPLACE.

	   type: int; default: 1

	   Number of retries per timeout or deadlock.

	   Specifies the number	of times pt-archiver should retry when there
	   is an InnoDB	lock wait timeout or deadlock.	When retries are
	   exhausted, pt-archiver will exit with an error.

	   Consider carefully what you want to happen when you are archiving
	   between a mixture of	transactional and non-transactional storage
	   engines.  The INSERT	to "--dest" and	DELETE from "--source" are on
	   separate connections, so they do not	actually participate in	the
	   same	transaction even if they're on the same	server.	 However, pt-
	   archiver implements simple distributed transactions in code,	so
	   commits and rollbacks should	happen as desired across the two

	   At this time	I have not written any code to handle errors with
	   transactional storage engines other than InnoDB.  Request that
	   feature if you need it.

	   type: time

	   Time	to run before exiting.

	   Optional suffix s=seconds, m=minutes, h=hours, d=days; if no
	   suffix, s is	used.

	   default: yes

	   Do not archive row with max AUTO_INCREMENT.

	   Adds	an extra WHERE clause to prevent pt-archiver from removing the
	   newest row when ascending a single-column AUTO_INCREMENT key.  This
	   guards against re-using AUTO_INCREMENT values if the	server
	   restarts, and is enabled by default.

	   The extra WHERE clause contains the maximum value of	the auto-
	   increment column as of the beginning	of the archive or purge	job.
	   If new rows are inserted while pt-archiver is running, it will not
	   see them.

	   type: string; default: /tmp/pt-archiver-sentinel

	   Exit	if this	file exists.

	   The presence	of the file specified by "--sentinel" will cause pt-
	   archiver to stop archiving and exit.	 The default is
	   /tmp/pt-archiver-sentinel.  You might find this handy to stop cron
	   jobs	gracefully if necessary.  See also "--stop".

	   type: string

	   Sets	the user to be used to connect to the slaves.  This parameter
	   allows you to have a	different user with less privileges on the
	   slaves but that user	must exist on all slaves.

	   type: string

	   Sets	the password to	be used	to connect to the slaves.  It can be
	   used	with --slave-user and the password for the user	must be	the
	   same	on all slaves.

	   type: Array

	   Set the MySQL variables in this comma-separated list	of
	   "variable=value" pairs.

	   By default, the tool	sets:


	   Variables specified on the command line override these defaults.
	   For example,	specifying "--set-vars wait_timeout=500" overrides the
	   default value of 10000.

	   The tool prints a warning and continues if a	variable cannot	be

	   Adds	the LOCK IN SHARE MODE modifier	to SELECT statements.

	   See <>.

	   Disables foreign key	checks with SET	FOREIGN_KEY_CHECKS=0.

	   type: int

	   Sleep time between fetches.

	   Specifies how long to sleep between SELECT statements.  Default is
	   not to sleep	at all.	 Transactions are NOT committed, and the
	   "--file" file is NOT	flushed, before	sleeping.  See "--txn-size" to
	   control that.

	   If "--commit-each" is specified, committing and flushing happens
	   before sleeping.

	   type: float

	   Calculate "--sleep" as a multiple of	the last SELECT	time.

	   If this option is specified,	pt-archiver will sleep for the query
	   time	of the last SELECT multiplied by the specified coefficient.

	   This	is a slightly more sophisticated way to	throttle the SELECTs:
	   sleep a varying amount of time between each SELECT, depending on
	   how long the	SELECTs	are taking.

	   short form: -S; type: string

	   Socket file to use for connection.

	   type: DSN

	   DSN specifying the table to archive from (required).	 This argument
	   is a	DSN.  See "DSN OPTIONS"	for the	syntax.	 Most options control
	   how pt-archiver connects to MySQL, but there	are some extended DSN
	   options in this tool's syntax.  The D, t, and i options select a
	   table to archive:

	     --source h=my_server,D=my_database,t=my_tbl

	   The a option	specifies the database to set as the connection's
	   default with	USE.  If the b option is true, it disables binary
	   logging with	SQL_LOG_BIN.  The m option specifies pluggable
	   actions, which an external Perl module can provide.	The only
	   required part is the	table; other parts may be read from various
	   places in the environment (such as options files).

	   The 'i' part	deserves special mention.  This	tells pt-archiver
	   which index it should scan to archive.  This	appears	in a FORCE
	   INDEX or USE	INDEX hint in the SELECT statements used to fetch
	   archivable rows.  If	you don't specify anything, pt-archiver	will
	   auto-discover a good	index, preferring a "PRIMARY KEY" if one
	   exists.  In my experience this usually works	well, so most of the
	   time	you can	probably just omit the 'i' part.

	   The index is	used to	optimize repeated accesses to the table; pt-
	   archiver remembers the last row it retrieves	from each SELECT
	   statement, and uses it to construct a WHERE clause, using the
	   columns in the specified index, that	should allow MySQL to start
	   the next SELECT where the last one ended, rather than potentially
	   scanning from the beginning of the table with each successive
	   SELECT.  If you are using external plugins, please see "EXTENDING"
	   for a discussion of how they	interact with ascending	indexes.

	   The 'a' and 'b' options allow you to	control	how statements flow
	   through the binary log.  If you specify the 'b' option, binary
	   logging will	be disabled on the specified connection.  If you
	   specify the 'a' option, the connection will "USE" the specified
	   database, which you can use to prevent slaves from executing	the
	   binary log events with "--replicate-ignore-db" options.  These two
	   options can be used as different methods to achieve the same	goal:
	   archive data	off the	master,	but leave it on	the slave.  For
	   example, you	can run	a purge	job on the master and prevent it from
	   happening on	the slave using	your method of choice.

	   WARNING: Using a default options file (F) DSN option	that defines a
	   socket for "--source" causes	pt-archiver to connect to "--dest"
	   using that socket unless another socket for "--dest"	is specified.
	   This	means that pt-archiver may incorrectly connect to "--source"
	   when	it is meant to connect to "--dest".  For example:

	     --source F=host1.cnf,D=db,t=tbl --dest h=host2

	   When	pt-archiver connects to	"--dest", host2, it will connect via
	   the "--source", host1, socket defined in host1.cnf.

	   Collect and print timing statistics.

	   Causes pt-archiver to collect timing	statistics about what it does.
	   These statistics are	available to the plugin	specified by

	   Unless you specify "--quiet", "pt-archiver" prints the statistics
	   when	it exits.  The statistics look like this:

	    Started at 2008-07-18T07:18:53, ended at 2008-07-18T07:18:53
	    Source: D=db,t=table
	    SELECT 4
	    INSERT 4
	    DELETE 4
	    Action	   Count       Time	   Pct
	    commit	      10     0.1079	 88.27
	    select	       5     0.0047	  3.87
	    deleting	       4     0.0028	  2.29
	    inserting	       4     0.0028	  2.28
	    other	       0     0.0040	  3.29

	   The first two (or three) lines show times and the source and
	   destination tables.	The next three lines show how many rows	were
	   fetched, inserted, and deleted.

	   The remaining lines show counts and timing.	The columns are	the
	   action, the total number of times that action was timed, the	total
	   time	it took, and the percent of the	program's total	runtime.  The
	   rows	are sorted in order of descending total	time.  The last	row is
	   the rest of the time	not explicitly attributed to anything.
	   Actions will	vary depending on command-line options.

	   If "--why-quit" is given, its behavior is changed slightly.	This
	   option causes it to print the reason	for exiting even when it's
	   just	because	there are no more rows.

	   This	option requires	the standard Time::HiRes module, which is part
	   of core Perl	on reasonably new Perl releases.

	   Stop	running	instances by creating the sentinel file.

	   Causes pt-archiver to create	the sentinel file specified by
	   "--sentinel"	and exit.  This	should have the	effect of stopping all
	   running instances which are watching	the same sentinel file.

	   type: int; default: 1

	   Number of rows per transaction.

	   Specifies the size, in number of rows, of each transaction. Zero
	   disables transactions altogether.  After pt-archiver	processes this
	   many	rows, it commits both the "--source" and the "--dest" if
	   given, and flushes the file given by	"--file".

	   This	parameter is critical to performance.  If you are archiving
	   from	a live server, which for example is doing heavy	OLTP work, you
	   need	to choose a good balance between transaction size and commit
	   overhead.  Larger transactions create the possibility of more lock
	   contention and deadlocks, but smaller transactions cause more
	   frequent commit overhead, which can be significant.	To give	an
	   idea, on a small test set I worked with while writing pt-archiver,
	   a value of 500 caused archiving to take about 2 seconds per 1000
	   rows	on an otherwise	quiet MySQL instance on	my desktop machine,
	   archiving to	disk and to another table.  Disabling transactions
	   with	a value	of zero, which turns on	autocommit, dropped
	   performance to 38 seconds per thousand rows.

	   If you are not archiving from or to a transactional storage engine,
	   you may want	to disable transactions	so pt-archiver doesn't try to

	   short form: -u; type: string

	   User	for login if not current user.

	   Show	version	and exit.

	   default: yes

	   Check for the latest	version	of Percona Toolkit, MySQL, and other

	   This	is a standard "check for updates automatically"	feature, with
	   two additional features.  First, the	tool checks its	own version
	   and also the	versions of the	following software: operating system,
	   Percona Monitoring and Management (PMM), MySQL, Perl, MySQL driver
	   for Perl (DBD::mysql), and Percona Toolkit. Second, it checks for
	   and warns about versions with known problems. For example, MySQL
	   5.5.25 had a	critical bug and was re-released as 5.5.25a.

	   A secure connection to Perconaas Version Check database server is
	   done	to perform these checks. Each request is logged	by the server,
	   including software version numbers and unique ID of the checked
	   system. The ID is generated by the Percona Toolkit installation
	   script or when the Version Check database call is done for the
	   first time.

	   Any updates or known	problems are printed to	STDOUT before the
	   tool's normal output.  This feature should never interfere with the
	   normal operation of the tool.

	   For more information, visit

	   type: string

	   WHERE clause	to limit which rows to archive (required).

	   Specifies a WHERE clause to limit which rows	are archived.  Do not
	   include the word WHERE.  You	may need to quote the argument to
	   prevent your	shell from interpreting	it.  For example:

	      --where 'ts < current_date - interval 90 day'

	   For safety, "--where" is required.  If you do not require a WHERE
	   clause, use "--where" 1=1.

	   Print reason	for exiting unless rows	exhausted.

	   Causes pt-archiver to print a message if it exits for any reason
	   other than running out of rows to archive.  This can	be useful if
	   you have a cron job with "--run-time" specified, for	example, and
	   you want to be sure pt-archiver is finishing	before running out of

	   If "--statistics" is	given, the behavior is changed slightly.  It
	   will	print the reason for exiting even when it's just because there
	   are no more rows.

	   This	output prints even if "--quiet"	is given.  That's so you can
	   put "pt-archiver" in	a "cron" job and get an	email if there's an
	   abnormal exit.

       These DSN options are used to create a DSN.  Each option	is given like
       "option=value".	The options are	case-sensitive,	so P and p are not the
       same option.  There cannot be whitespace	before or after	the "="	and if
       the value contains whitespace it	must be	quoted.	 DSN options are
       comma-separated.	 See the percona-toolkit manpage for full details.

       o   a

	   copy: no

	   Database to USE when	executing queries.

       o   A

	   dsn:	charset; copy: yes

	   Default character set.

       o   b

	   copy: no

	   If true, disable binlog with	SQL_LOG_BIN.

       o   D

	   dsn:	database; copy:	yes

	   Database that contains the table.

       o   F

	   dsn:	mysql_read_default_file; copy: yes

	   Only	read default options from the given file

       o   h

	   dsn:	host; copy: yes

	   Connect to host.

       o   i

	   copy: yes

	   Index to use.

       o   L

	   copy: yes

	   Explicitly enable LOAD DATA LOCAL INFILE.

	   For some reason, some vendors compile libmysql without the
	   --enable-local-infile option, which disables	the statement.	This
	   can lead to weird situations, like the server allowing LOCAL
	   INFILE, but the client throwing exceptions if it's used.

	   However, as long as the server allows LOAD DATA, clients can	easily
	   re-enable it; See
	   <> and
	   This	option does exactly that.

	   Although we've not found a case where turning this option leads to
	   errors or differing behavior, to be on the safe side, this option
	   is not on by	default.

       o   m

	   copy: no

	   Plugin module name.

       o   p

	   dsn:	password; copy:	yes

	   Password to use when	connecting.  If	password contains commas they
	   must	be escaped with	a backslash: "exam\,ple"

       o   P

	   dsn:	port; copy: yes

	   Port	number to use for connection.

       o   S

	   dsn:	mysql_socket; copy: yes

	   Socket file to use for connection.

       o   t

	   copy: yes

	   Table to archive from/to.

       o   u

	   dsn:	user; copy: yes

	   User	for login if not current user.

       pt-archiver is extensible by plugging in	external Perl modules to
       handle some logic and/or	actions.  You can specify a module for both
       the "--source" and the "--dest",	with the 'm' part of the
       specification.  For example:

	  --source D=test,t=test1,m=My::Module1	--dest m=My::Module2,t=test2

       This will cause pt-archiver to load the My::Module1 and My::Module2
       packages, create	instances of them, and then make calls to them during
       the archiving process.

       You can also specify a plugin with "--plugin".

       The module must provide this interface:

       new(dbh => $dbh,	db => $db_name,	tbl => $tbl_name)
	   The plugin's	constructor is passed a	reference to the database
	   handle, the database	name, and table	name.  The plugin is created
	   just	after pt-archiver opens	the connection,	and before it examines
	   the table given in the arguments.  This gives the plugin a chance
	   to create and populate temporary tables, or do other	setup work.

       before_begin(cols => \@cols, allcols => \@allcols)
	   This	method is called just before pt-archiver begins	iterating
	   through rows	and archiving them, but	after it does all other	setup
	   work	(examining table structures, designing SQL queries, and	so
	   on).	 This is the only time pt-archiver tells the plugin column
	   names for the rows it will pass the plugin while archiving.

	   The "cols" argument is the column names the user requested to be
	   archived, either by default or by the "--columns" option.  The
	   "allcols" argument is the list of column names for every row	pt-
	   archiver will fetch from the	source table.  It may fetch more
	   columns than	the user requested, because it needs some columns for
	   its own use.	 When subsequent plugin	functions receive a row, it is
	   the full row	containing all the extra columns, if any, added	to the

       is_archivable(row => \@row)
	   This	method is called for each row to determine whether it is
	   archivable.	This applies only to "--source".  The argument is the
	   row itself, as an arrayref.	If the method returns true, the	row
	   will	be archived; otherwise it will be skipped.

	   Skipping a row adds complications for non-unique indexes.  Normally
	   pt-archiver uses a WHERE clause designed to target the last
	   processed row as the	place to start the scan	for the	next SELECT
	   statement.  If you have skipped the row by returning	false from
	   is_archivable(), pt-archiver	could get into an infinite loop
	   because the row still exists.  Therefore, when you specify a	plugin
	   for the "--source" argument,	pt-archiver will change	its WHERE
	   clause slightly.  Instead of	starting at "greater than or equal to"
	   the last processed row, it will start "strictly greater than."
	   This	will work fine on unique indexes such as primary keys, but it
	   may skip rows (leave	holes) on non-unique indexes or	when ascending
	   only	the first column of an index.

	   "pt-archiver" will change the clause	in the same way	if you specify
	   "--no-delete", because again	an infinite loop is possible.

	   If you specify the "--bulk-delete" option and return	false from
	   this	method,	"pt-archiver" may not do what you want.	 The row won't
	   be archived,	but it will be deleted,	since bulk deletes operate on
	   ranges of rows and don't know which rows the	plugin selected	to

	   If you specify the "--bulk-insert" option, this method's return
	   value will influence	whether	the row	is written to the temporary
	   file	for the	bulk insert, so	bulk inserts will work as expected.
	   However, bulk inserts require bulk deletes.

       before_delete(row => \@row)
	   This	method is called for each row just before it is	deleted.  This
	   applies only	to "--source".	This is	a good place for you to	handle
	   dependencies, such as deleting things that are foreign-keyed	to the
	   row you are about to	delete.	 You could also	use this to
	   recursively archive all dependent tables.

	   This	plugin method is called	even if	"--no-delete" is given,	but
	   not if "--bulk-delete" is given.

       before_bulk_delete(first_row => \@row, last_row => \@row)
	   This	method is called just before a bulk delete is executed.	 It is
	   similar to the "before_delete" method, except its arguments are the
	   first and last row of the range to be deleted.  It is called	even
	   if "--no-delete" is given.

       before_insert(row => \@row)
	   This	method is called for each row just before it is	inserted.
	   This	applies	only to	"--dest".  You could use this to insert	the
	   row into multiple tables, perhaps with an ON	DUPLICATE KEY UPDATE
	   clause to build summary tables in a data warehouse.

	   This	method is not called if	"--bulk-insert"	is given.

       before_bulk_insert(first_row => \@row, last_row => \@row, filename =>
	   This	method is called just before a bulk insert is executed.	 It is
	   similar to the "before_insert" method, except its arguments are the
	   first and last row of the range to be deleted.

       custom_sth(row => \@row,	sql => $sql)
	   This	method is called just before inserting the row,	but after
	   "before_insert()".  It allows the plugin to specify different
	   "INSERT" statement if desired.  The return value (if	any) should be
	   a DBI statement handle.  The	"sql" parameter	is the SQL text	used
	   to prepare the default "INSERT" statement.  This method is not
	   called if you specify "--bulk-insert".

	   If no value is returned, the	default	"INSERT" statement handle is

	   This	method applies only to the plugin specified for	"--dest", so
	   if your plugin isn't	doing what you expect, check that you've
	   specified it	for the	destination and	not the	source.

       custom_sth_bulk(first_row => \@row, last_row => \@row, sql => $sql,
       filename	=> $bulk_insert_filename)
	   If you've specified "--bulk-insert",	this method is called just
	   before the bulk insert, but after "before_bulk_insert()", and the
	   arguments are different.

	   This	method's return	value etc is similar to	the "custom_sth()"

	   This	method is called after pt-archiver exits the archiving loop,
	   commits all database	handles, closes	"--file", and prints the final
	   statistics, but before pt-archiver runs ANALYZE or OPTIMIZE (see
	   "--analyze" and "--optimize").

       If you specify a	plugin for both	"--source" and "--dest", pt-archiver
       constructs, calls before_begin(), and calls after_finish() on the two
       plugins in the order "--source",	"--dest".

       pt-archiver assumes it controls transactions, and that the plugin will
       NOT commit or roll back the database handle.  The database handle
       passed to the plugin's constructor is the same handle pt-archiver uses
       itself.	Remember that "--source" and "--dest" are separate handles.

       A sample	module might look like this:

	  package My::Module;

	  sub new {
	     my	( $class, %args	) = @_;
	     return bless(\%args, $class);

	  sub before_begin {
	     my	( $self, %args ) = @_;
	     # Save column names for later
	     $self->{cols} = $args{cols};

	  sub is_archivable {
	     my	( $self, %args ) = @_;
	     # Do some advanced	logic with $args{row}
	     return 1;

	  sub before_delete {} # Take no action
	  sub before_insert {} # Take no action
	  sub custom_sth    {} # Take no action
	  sub after_finish  {} # Take no action


       The environment variable	"PTDEBUG" enables verbose debugging output to
       STDERR.	To enable debugging and	capture	all output to a	file, run the
       tool like:

	  PTDEBUG=1 pt-archiver	... > FILE 2>&1

       Be careful: debugging output is voluminous and can generate several
       megabytes of output.

       You need	Perl, DBI, DBD::mysql, and some	core packages that ought to be
       installed in any	reasonably new version of Perl.

       For a list of known bugs, see

       Please report bugs at <>.  Include
       the following information in your bug report:

       o   Complete command-line used to run the tool

       o   Tool	"--version"

       o   MySQL version of all	servers	involved

       o   Output from the tool	including STDERR

       o   Input files (log/dump/config	files, etc.)

       If possible, include debugging output by	running	the tool with
       "PTDEBUG"; see "ENVIRONMENT".

       Visit <>	to download
       the latest release of Percona Toolkit.  Or, get the latest release from
       the command line:




       You can also get	individual tools from the latest release:


       Replace "TOOL" with the name of any tool.

       Baron Schwartz

       Andrew O'Brien

       This tool is part of Percona Toolkit, a collection of advanced command-
       line tools for MySQL developed by Percona.  Percona Toolkit was forked
       from two	projects in June, 2011:	Maatkit	and Aspersa.  Those projects
       were created by Baron Schwartz and primarily developed by him and
       Daniel Nichter.	Visit <> to learn
       about other free, open-source software from Percona.

       This program is copyright 2011-2018 Percona LLC and/or its affiliates,
       2007-2011 Baron Schwartz.


       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation, version 2; OR the Perl	Artistic License.  On
       UNIX and	similar	systems, you can issue `man perlgpl' or	`man
       perlartistic' to	read these licenses.

       You should have received	a copy of the GNU General Public License along
       with this program; if not, write	to the Free Software Foundation, Inc.,
       59 Temple Place,	Suite 330, Boston, MA  02111-1307  USA.

       pt-archiver 3.2.0

       Hey! The	above document had some	coding errors, which are explained

       Around line 8233:
	   Non-ASCII character seen before =encoding in	'Perconaas'. Assuming

perl v5.32.1			  2020-04-23			PT-ARCHIVER(1)


Want to link to this manual page? Use this URL:

home | help