PPI(3)		      User Contributed Perl Documentation		PPI(3)

NAME
       PPI - Parse, Analyze and Manipulate Perl (without perl)

SYNOPSIS
         use PPI;

         # Create a new empty document
         my $Document = PPI::Document->new;

         # Create a document from source
         $Document = PPI::Document->new(\'print "Hello World!\n"');

         # Load a Document from a file
         $Document = PPI::Document->new('');

         # Does it contain any POD?
         if ( $Document->find_any('PPI::Token::Pod') ) {
             print "Module contains POD\n";
         }

         # Get the name of the main package
         my $pkg = $Document->find_first('PPI::Statement::Package')->namespace;

         # Remove all that nasty documentation
         $Document->prune('PPI::Token::Pod');
         $Document->prune('PPI::Token::Comment');

         # Save the file ($output_file is a placeholder for the target path)
         $Document->save($output_file);

DESCRIPTION
   About this Document
       This is the PPI manual. It describes its reason for existing, its
       general structure, its use, an overview of the API, and provides a few
       implementation samples.

   Background
       The ability to read and manipulate Perl (the language)
       programmatically other than with perl (the application) was one that
       caused difficulty for a long time.

       The cause of this problem was Perl's complex and dynamic grammar.
       Although there is typically not a huge diversity in the grammar of most
       Perl code, certain issues cause large problems when it comes to
       parsing.

       Indeed, quite early in Perl's history Tom Christiansen introduced the
       Perl community to the quote "Nothing but	perl can parse Perl", or as it
       is more often stated now	as a truism:

       "Only perl can parse Perl"

       One example of the sorts of things that prevent Perl being easily
       parsed are function signatures, as demonstrated by the following.

	 @result = (dothis $foo, $bar);

	 # Which of the	following is it	equivalent to?
	 @result = (dothis($foo), $bar);
	 @result = dothis($foo,	$bar);

       The first line above can	be interpreted in two different	ways,
       depending on whether the	&dothis	function is expecting one argument, or
       two, or several.
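
       The ambiguity can be reproduced with nothing but core Perl. The
       following is a minimal sketch, assuming the difference comes from a
       prototype. The packages Unary and ListOp and the body of dothis are
       hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

our ( @as_unary, @as_list );

{
    package Unary;

    # ($) prototype: dothis takes one scalar, like a named unary operator
    sub dothis ($) { return 'dothis(' . join( ',', @_ ) . ')' }

    # Compiled as ( dothis('foo'), 'bar' )
    @main::as_unary = ( dothis 'foo', 'bar' );
}

{
    package ListOp;

    # (@) prototype: dothis slurps the whole list
    sub dothis (@) { return 'dothis(' . join( ',', @_ ) . ')' }

    # Compiled as dothis('foo', 'bar')
    @main::as_list = ( dothis 'foo', 'bar' );
}

print scalar(@as_unary), " element(s): @as_unary\n";   # 2 element(s): dothis(foo) bar
print scalar(@as_list),  " element(s): @as_list\n";    # 1 element(s): dothis(foo,bar)
```

       The two call sites are character-for-character identical. Only the
       declaration seen earlier by the compiler decides how many elements end
       up in the list.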

       A "code parser" (something that parses for the purpose of execution)
       such as perl needs information that is not found	in the immediate
       vicinity	of the statement being parsed.

       The information might not just be elsewhere in the file,	it might not
       even be in the same file	at all.	It might also not be able to determine
       this information	without	the prior execution of a "BEGIN	{}" block, or
       the loading and execution of one	or more	external modules. Or worse the
       &dothis function	may not	even have been written yet.

       When parsing Perl as code, you must also	execute	it

       Even perl itself	never really fully understands the structure of	the
       source code after and indeed as it processes it,	and in that sense
       doesn't "parse" Perl source into	anything remotely like a structured
       document.  This makes it	of no real use for any task that needs to
       treat the source	code as	a document, and	do so reliably and robustly.

       For more information on why it is impossible to parse Perl, see Randal
       Schwartz's seminal response to the question of "Why can't you parse
       Perl?".


       The purpose of PPI is not to parse Perl Code, but to parse Perl
       Documents. By treating the problem this way, we are able to parse a
       single file containing Perl source code "isolated" from any other
       resources, such as libraries upon which the code may depend, and
       without needing to run an instance of perl alongside or inside the
       parser.

       Historically, using an embedded perl parser was widely considered to be
       the most likely avenue for finding a solution to parsing Perl. It has
       been investigated from time to time, but attempts have generally failed
       or suffered from sufficiently bad corner cases that they were
       abandoned.

   What	Does PPI Stand For?
       "PPI" is	an acronym for the longer original module name
       "Parse::Perl::Isolated".	And in the spirit of the silly acronym games
       played by certain unnamed Open Source projects you may have hurd	of, it
       is also a reverse backronym of "I Parse Perl".

       Of course, I could just be lying	and have just made that	second bit up
       10 minutes before the release of	PPI 1.000. Besides, all	the cool Perl
       packages	have TLAs (Three Letter	Acronyms). It's	a rule or something.

       Why don't you just think of it as the Perl Parsing Interface for
       simplicity.

       The original name was shortened to prevent the author (and you the
       users) from contracting RSI by having to	type crazy things like
       "Parse::Perl::Isolated::Token::QuoteLike::Backtick" 100 times a day.

       In acknowledgment that someone may some day come up with a valid
       solution for the grammar problem, it was decided at the commencement of
       the project to leave the "Parse::Perl" namespace free for any such
       effort.

       Since that time I've been able to prove to my own satisfaction that it
       is truly impossible to accurately parse Perl as both code and document
       at once. For the academics, parsing Perl suffers from the "Halting
       Problem".

   Why Parse Perl?
       Once you	can accept that	we will	never be able to parse Perl well
       enough to meet the standards of things that treat Perl as code, it is
       worth re-examining why we want to "parse" Perl at all.

       What are	the things that	people might want a "Perl parser" for?

       Documentation
           Analyzing the contents of a Perl document to automatically generate
           documentation, in parallel to, or as a replacement for, POD
           documentation.

           Allow an indexer to locate and process all the comments and
           documentation from code for "full text search" applications.

       Structural and Quality Analysis
           Determine quality or other metrics across a body of code, and
           identify situations relating to particular phrases, techniques or
           locations.

           Index functions, variables and packages within Perl code, and doing
           search and graph (in the node/edge sense) analysis of large code
           bases.

           Perl::Critic, based on PPI, is a large, thriving tool for bug
           detection and style analysis of Perl code.

       Refactoring
           Make structural, syntax, or other changes to code in an automated
           manner, either independently or in assistance to an editor. This
           sort of task list includes backporting, forward porting, partial
           evaluation, "improving" code, or whatever. All the sort of things
           you'd want from a Perl::Editor.

       Layout
           Change the layout of code without changing its meaning. This
           includes techniques such as tidying (like perltidy), obfuscation,
           compressing and "squishing", or to implement formatting preferences
           or policies.

       Presentation
           This includes methods of improving the presentation of code,
           without changing the content of the code. Modify, improve, syntax
           colour etc the presentation of a Perl document. Generating
           "IntelliText"-like functions.

       If we treat this	as a baseline for the sort of things we	are going to
       have to build on	top of Perl, then it becomes possible to identify a
       standard	for how	good a Perl parser needs to be.

   How good is Good Enough(TM)
       PPI seeks to be good enough to achieve all of the above tasks, or to
       provide a sufficiently good API on which	to allow others	to implement
       modules in these	and related areas.

       However,	there are going	to be limits to	this process. Because PPI
       cannot adapt to changing	grammars, any code written using source
       filters should not be assumed to	be parsable.

       At one extreme, this includes anything munged by	Acme::Bleach, as well
       as (arguably) more common cases like Switch. We do not pretend to be
       able to always parse code using these modules, although as long as it
       still follows a format that looks like Perl syntax, it may be possible
       to extend the lexer to handle them.

       The ability to extend PPI to handle lexical additions to the language
       is on the drawing board, to be done some time post-1.0.

       The goal	for success was	originally to be able to successfully parse
       99% of all Perl documents contained in CPAN. This means the entire file
       in each case.

       PPI has succeeded in this goal far beyond the expectations of even the
       author. At time of writing there	are only 28 non-Acme Perl modules in
       CPAN that PPI is	incapable of parsing. Most of these are	so badly
       broken they do not compile as Perl code anyway.

       So unless you are actively going	out of your way	to break PPI, you
       should expect that it will handle your code just	fine.

   Internationalisation
       PPI provides partial support for internationalisation and localisation.

       Specifically, it	allows the use of characters from the Latin-1
       character set to	be used	in quotes, comments, and POD. Primarily, this
       covers languages	from Europe and	South America.

       PPI does	not currently provide support for Unicode.  If you need
       Unicode support and would like to help, contact the author. (contact
       details below)

   Round Trip Safe
       When PPI	parses a file it builds	everything into	the model, including
       whitespace. This	is needed in order to make the Document	fully "Round
       Trip" safe.

       The general concept behind a "Round Trip" parser	is that	it knows what
       it is parsing is	somewhat uncertain, and	so expects to get things wrong
       from time to time. In the cases where it	parses code wrongly the	tree
       will serialize back out to the same string of code that was read	in,
       repairing the parser's mistake as it heads back out to the file.

       The end result is that if you parse in a	file and serialize it back out
       without changing	the tree, you are guaranteed to	get the	same file you
       started with. PPI does this correctly and reliably for 100% of all
       known cases.

       What goes in, will come out. Every time.
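
       The round trip property can be checked directly. The following is a
       minimal sketch, assuming PPI is installed; the sample source and the
       variable names are made up for illustration.

```perl
use strict;
use warnings;
use PPI;

# Sample source with deliberately irregular spacing and a comment,
# both of which the PDOM is expected to preserve.
my $source = <<'END_PERL';
#!/usr/bin/perl

my   $answer=42 ;   # odd spacing on purpose

print "The answer is $answer\n";
END_PERL

my $document = PPI::Document->new( \$source )
    or die "Failed to parse: " . PPI::Document->errstr;

# Serialize the untouched tree back into a string
my $round_trip = $document->serialize;

print $round_trip eq $source
    ? "Round trip preserved the source\n"
    : "Round trip changed the source\n";
```

       Because the source here already uses "\n" newlines, the newline
       localisation described below should not come into play.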

       The one minor exception at this time is that if the newlines for your
       file are wrong (meaning they do not match the platform newline
       format), PPI will localise them for you. This is not done for
       convenience; supporting arbitrary newlines would make some of the code
       more complicated.

       Better control of the newline type is on	the wish list though, and
       anyone wanting to help out is encouraged	to contact the author.

   General Layout
       PPI is built upon two primary "parsing" components, PPI::Tokenizer and
       PPI::Lexer, and a large tree of about 70 classes which implement the
       Perl Document Object Model (PDOM).

       The PDOM	is conceptually	similar	in style and intent to the regular DOM
       or other	code Abstract Syntax Trees (ASTs), but contains	some
       differences to handle perl-specific cases, and to assist	in treating
       the code	as a document. Please note that	it is not an implementation of
       the official Document Object Model specification, only somewhat similar
       to it.

       On top of the Tokenizer,	Lexer and the classes of the PDOM, sit a
       number of classes intended to make life a little	easier when dealing
       with PDOM trees.

       Both the	major parsing components were hand-coded from scratch with
       only plain Perl code and	a few small utility modules. There are no
       grammar or patterns mini-languages, no YACC or LEX style	tools and only
       a small number of regular expressions.

       This is primarily because of the	sheer volume of	accumulated cruft that
       exists in Perl. Not even	perl itself is capable of parsing Perl
       documents (remember, it just parses and executes	it as code).

       As a result, PPI	needed to be cruftier than perl	itself.	Feel free to
       shudder at this point, and hope you never have to understand the
       Tokenizer codebase. Speaking of which...

   The Tokenizer
       The Tokenizer takes source code and converts it into a series of
       tokens. It does this using a slow but thorough character	by character
       manual process, rather than using a pattern system or complex regexes.

       Or at least it does so conceptually. If you were	to actually trace the
       code you	would find it's	not truly character by character due to	a
       number of regexps and optimisations throughout the code.	This lets the
       Tokenizer "skip ahead" when it can find shortcuts, so it	tends to jump
       around a	line a bit wildly at times.

       In practice, the	number of times	the Tokenizer will actually move the
       character cursor	itself is only about 5%	- 10% higher than the number
       of tokens contained in the file.	This makes it about as optimal as it
       can be made without implementing	it in something	other than Perl.
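
       To see the token stream for yourself, the Tokenizer can be driven by
       hand. This is a minimal sketch, assuming PPI is installed; the sample
       source and variable names are illustrative only.

```perl
use strict;
use warnings;
use PPI;
use PPI::Tokenizer;

my $source = 'my $greeting = "Hello";';

my $tokenizer = PPI::Tokenizer->new( \$source )
    or die "Could not create a tokenizer";

# get_token returns a PPI::Token object for each token,
# and a false value once the stream is exhausted (or on error).
my @tokens;
while ( my $token = $tokenizer->get_token ) {
    push @tokens, $token;
    printf "%-28s '%s'\n", ref($token), $token->content;
}
```

       Each line of output pairs the token class with its exact content,
       whitespace included.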

       In 2001 when PPI	was started, this structure made PPI quite slow, and
       not really suitable for interactive tasks. This situation has improved
       greatly with multi-gigahertz processors,	but can	still be painful when
       working with very large files.

       The target parsing rate for PPI is about	5000 lines per gigacycle. It
       is currently believed to	be at about 1500, and the main avenue for
       making it to the	target speed has now become PPI::XS, a drop-in XS
       accelerator for PPI.

       Since PPI::XS has only just gotten off the ground and is	currently only
       at proof-of-concept stage, this may take	a little while.	Anyone
       interested in helping out with PPI::XS is highly	encouraged to contact
       the author. In fact, the design of PPI::XS means it's possible to port
       one function at a time safely and reliably. So every little bit will
       help.

   The Lexer
       The Lexer takes a token stream, and converts it to a lexical tree.
       Because we are parsing Perl documents this includes whitespace,
       comments, and all manner of weird things that have no relevance when
       code is actually executed.

       An instantiated PPI::Lexer consumes PPI::Tokenizer objects and produces
       PPI::Document objects. However you should probably never	be working
       with the	Lexer directly.	You should just	be able	to create
       PPI::Document objects and work with them	directly.
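
       For the curious, the pipeline can still be wired up manually. The
       sketch below assumes PPI is installed and uses made-up variable names.
       It feeds a Tokenizer into a Lexer and compares the result with a
       Document created the usual way.

```perl
use strict;
use warnings;
use PPI;
use PPI::Tokenizer;
use PPI::Lexer;

my $source = "print 'Hello World!';\n";

# The long way: Tokenizer -> Lexer -> Document
my $tokenizer = PPI::Tokenizer->new( \$source )
    or die "Could not create a tokenizer";
my $lexer      = PPI::Lexer->new;
my $from_lexer = $lexer->lex_tokenizer($tokenizer)
    or die "Lexing failed";

# The recommended way: let PPI::Document drive both components
my $direct = PPI::Document->new( \$source )
    or die "Failed to parse: " . PPI::Document->errstr;

print ref($from_lexer), "\n";
print $from_lexer->serialize eq $direct->serialize
    ? "Both routes produce the same document text\n"
    : "The two routes differ\n";
```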

   The Perl Document Object Model
       The PDOM	is a structured	collection of data classes that	together
       provide a correct and scalable model for	documents that follow the
       standard	Perl syntax.

   The PDOM Class Tree
       The following lists all of the 72 current PDOM classes, listing with
       indentation based on inheritance.


       To summarize the	above layout, all PDOM objects inherit from the
       PPI::Element class.

       Under this are PPI::Token, strings of content with a known type, and
       PPI::Node, syntactically significant containers that hold other
       Elements.

       The three most important	of these are the PPI::Document,	the
       PPI::Statement and the PPI::Structure classes.

   The Document, Statement and Structure
       At the top of all complete PDOM trees is a PPI::Document object. It
       represents a complete file of Perl source code as you might find it on
       disk.

       There are some specialised types	of document, such as
       PPI::Document::File and PPI::Document::Normalized but for the purposes
       of the PDOM they	are all	just considered	to be the same thing.

       Each Document will contain a number of Statements, Structures and
       Tokens.

       A PPI::Statement is any series of Tokens and Structures that are
       treated as a single contiguous statement by perl itself. You should
       note that a Statement is as close as PPI can get to "parsing" the code
       in the sense that perl itself parses Perl code when it is building the
       op-tree.

       Because of the isolation	and Perl's syntax, it is provably impossible
       for PPI to accurately determine precedence of operators or which	tokens
       are implicit arguments to a sub call.

       So rather than lead you on with a bad guess that	has a strong chance of
       being wrong, PPI	does not attempt to determine precedence or sub
       parameters at all.

       At a fundamental	level, it only knows that this series of elements
       represents a single Statement as	perl sees it, but it can do so with
       enough certainty	that it	can be trusted.
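
       As an illustration, here is how the ambiguous call from the Background
       section looks to PPI. This is a minimal sketch, assuming PPI is
       installed; the variable names are illustrative. The expression comes
       back as a flat run of tokens, with no grouping of arguments.

```perl
use strict;
use warnings;
use PPI;

my $document = PPI::Document->new( \'@result = (dothis $foo, $bar);' )
    or die "Failed to parse: " . PPI::Document->errstr;

# The contents of the parentheses form a single expression statement
my $expression = $document->find_first('PPI::Statement::Expression')
    or die "No expression found";

# schildren returns only the significant children (no whitespace)
my @parts = $expression->schildren;
printf "%-24s '%s'\n", ref($_), $_->content for @parts;
```

       Nothing in the tree says whether $bar belongs to dothis or to the
       surrounding list, which is exactly the decision PPI declines to make.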

       However,	for specific Statement types the PDOM is able to derive
       additional useful information about their meaning. For the best,	most
       useful, and most	heavily	used example, see PPI::Statement::Include.
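
       A short sketch of what that looks like in practice, assuming PPI is
       installed (the sample source and variable names are illustrative):

```perl
use strict;
use warnings;
use PPI;

my $source = <<'END_PERL';
use strict;
use List::Util qw(sum);
require File::Spec;
no warnings 'once';
END_PERL

my $document = PPI::Document->new( \$source )
    or die "Failed to parse: " . PPI::Document->errstr;

# find returns an array reference, or a false value if nothing matches
my $includes = $document->find('PPI::Statement::Include') || [];

my @modules;
for my $include (@$includes) {
    push @modules, $include->module;
    printf "%-8s %s\n", $include->type, $include->module;
}
```

       The type method reports which keyword was used (use, no or require),
       and module reports what is being loaded, all without executing any of
       the code.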

       A PPI::Structure	is any series of tokens	contained within matching
       braces.	This includes code blocks, conditions, function	argument
       braces, anonymous array and hash	constructors, lists, scoping braces
       and all other syntactic structures represented by a matching pair of
       braces, including (although it may not seem obvious at first)
       "<READLINE>" braces.

       Each Structure contains none, one, or many Tokens and Structures	(the
       rules for which vary for	the different Structure	subclasses)
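
       The different Structure subclasses can be listed for a piece of code
       as follows. This is a minimal sketch, assuming PPI is installed; the
       sample source and variable names are illustrative.

```perl
use strict;
use warnings;
use PPI;

my $source = 'my %h = ( list => [ 1, 2 ], code => sub { return $_[0]{key} } );';

my $document = PPI::Document->new( \$source )
    or die "Failed to parse: " . PPI::Document->errstr;

my $structures = $document->find('PPI::Structure') || [];

for my $structure (@$structures) {
    # start and finish return the opening and closing brace tokens;
    # finish can be missing if the source is unbalanced
    my $finish = $structure->finish;
    printf "%-28s %s ... %s\n",
        ref($structure),
        $structure->start->content,
        $finish ? $finish->content : '(unclosed)';
}
```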

       Under the PDOM structure	rules, a Statement can never directly contain
       another child Statement,	a Structure can	never directly contain another
       child Structure,	and a Document can never contain another Document
       anywhere	in the tree.

       Aside from these	three rules, the PDOM tree is extremely	flexible.

   The PDOM at Work
       To demonstrate the PDOM in use let's start with an example showing how
       the tree might look for the following chunk of simple Perl code.

         #!/usr/bin/perl

         print( "Hello World!" );

         exit();

       Translated into a PDOM tree it would have the following structure (as
       shown via the included PPI::Dumper).

         PPI::Document
           PPI::Token::Comment                '#!/usr/bin/perl\n'
           PPI::Token::Whitespace             '\n'
           PPI::Statement
             PPI::Token::Word                 'print'
             PPI::Structure::List             ( ... )
               PPI::Token::Whitespace         ' '
               PPI::Statement::Expression
                 PPI::Token::Quote::Double    '"Hello World!"'
               PPI::Token::Whitespace         ' '
             PPI::Token::Structure            ';'
           PPI::Token::Whitespace             '\n'
           PPI::Token::Whitespace             '\n'
           PPI::Statement
             PPI::Token::Word                 'exit'
             PPI::Structure::List             ( ... )
             PPI::Token::Structure            ';'
           PPI::Token::Whitespace             '\n'

       Please note that	in this	example, strings are only listed for the
       actual PPI::Token that contains that string. Structures are listed with
       the type	of brace characters they represent noted.

       The PPI::Dumper module can be used to generate similar trees yourself.

       We can make that PDOM dump a little easier to read if we strip out all
       the whitespace. Here it is again, sans the distracting whitespace:

         PPI::Document
           PPI::Token::Comment                '#!/usr/bin/perl\n'
           PPI::Statement
             PPI::Token::Word                 'print'
             PPI::Structure::List             ( ... )
               PPI::Statement::Expression
                 PPI::Token::Quote::Double    '"Hello World!"'
             PPI::Token::Structure            ';'
           PPI::Statement
             PPI::Token::Word                 'exit'
             PPI::Structure::List             ( ... )
             PPI::Token::Structure            ';'
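
       Dumps like these can be produced with a few lines of code. This is a
       minimal sketch, assuming PPI is installed; the variable names are
       illustrative. The whitespace option of PPI::Dumper hides the
       whitespace tokens.

```perl
use strict;
use warnings;
use PPI;
use PPI::Dumper;

my $source = "#!/usr/bin/perl\n\nprint( \"Hello World!\" );\n\nexit();\n";

my $document = PPI::Document->new( \$source )
    or die "Failed to parse: " . PPI::Document->errstr;

# Full dump, including every whitespace token
PPI::Dumper->new($document)->print;

print "\n";

# Compact dump with the whitespace tokens hidden
my $compact = PPI::Dumper->new( $document, whitespace => 0 )->string;
print $compact;
```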

       As you can see, the tree can get fairly deep at times, especially when
       every isolated token in a bracket becomes its own statement. This is
       needed to allow anything	inside the tree	the ability to grow. It	also
       makes the search	and analysis algorithms	much more flexible.

       Because of the depth and complexity of PDOM trees, a vast number of
       very easy to use methods have been added wherever possible to help
       people working with PDOM trees do normal tasks relatively quickly and
       efficiently.

   Overview of the Primary Classes
       The main	PPI classes, and links to their	own documentation, are listed
       here in alphabetical order.

       PPI::Document
           The Document object, the root of the PDOM.

       PPI::Document::Fragment
           A cohesive fragment of a larger Document. Although not of any real
           current use, it is needed for use in certain internal tree
           manipulation algorithms.

           For example, doing things like cut/copy/paste etc. Very similar to
           a PPI::Document, but has some additional methods and does not
           represent a lexical scope boundary.

           A document fragment is also non-serializable, and so cannot be
           written out to a file.

       PPI::Dumper
           A simple class for dumping readable debugging versions of PDOM
           structures, such as in the demonstration above.

       PPI::Element
           The Element class is the abstract base class for all objects within
           the PDOM.

       PPI::Find
           Implements an instantiable object form of a PDOM tree search.

       PPI::Lexer
           The PPI Lexer. Converts Token streams into PDOM trees.

       PPI::Node
           The Node object, the abstract base class for all PDOM objects that
           can contain other Elements, such as the Document, Statement and
           Structure objects.

       PPI::Statement
           The base class for all Perl statements. Generic "evaluate for side-
           effects" statements are of this actual type. Other more interesting
           statement types belong to one of its children.

           See its own documentation for a longer description and list of all
           of the different statement types and sub-classes.

       PPI::Structure
           The abstract base class for all structures. A Structure is a
           language construct consisting of matching braces containing a set
           of other elements.

           See the PPI::Structure documentation for a description and list of
           all of the different structure types and sub-classes.

       PPI::Token
           A token is the basic unit of content. At its most basic, a Token is
           just a string tagged with metadata (its class, and some additional
           flags in some cases).

       PPI::Token::_QuoteEngine
           The PPI::Token::Quote and PPI::Token::QuoteLike classes provide
           abstract base classes for the many and varied types of quote and
           quote-like things in Perl. However, much of the actual quote logic
           is implemented in a separate quote engine, based at
           PPI::Token::_QuoteEngine.

           Classes that inherit from PPI::Token::Quote, PPI::Token::QuoteLike
           and PPI::Token::Regexp are generally parsed only by the Quote
           Engine.

       PPI::Tokenizer
           The PPI Tokenizer. One Tokenizer consumes a chunk of text and
           provides access to a stream of PPI::Token objects.

           The Tokenizer is very very complicated, to the point where even the
           author treads carefully when working with it.

           Most of the complication is the result of optimizations which have
           tripled the tokenization speed, at the expense of maintainability.
           We cope with the spaghetti by heavily commenting everything.

       PPI::Transform
           The Perl Document Transformation API. Provides a standard interface
           and abstract base class for objects and classes that manipulate
           Documents.

INSTALLING
       The core PPI distribution is pure Perl and has been kept as tight as
       possible and with as few dependencies as possible.

       It should download and install normally on any platform from within the
       CPAN and CPANPLUS applications, or directly using the distribution
       tarball. If installing by hand, you may need to install a few small
       utility modules first. The exact ones will depend on your version of
       perl.

       There are no special install instructions for PPI, and the normal "perl
       Makefile.PL", "make", "make test", "make install" instructions apply.

EXTENDING
       The PPI namespace itself is reserved for use by PPI itself. You are
       recommended to use the PPIx:: namespace for PPI-specific modifications
       or prototypes thereof, or Perl:: for modules which provide general
       Perl language-related functions.

       If what you wish	to implement looks like	it fits	into the PPIx::
       namespace, you should consider contacting the PPI maintainers on	GitHub
       first, as what you want may already be in progress, or you may wish to
       consider	contributing to	PPI itself.

TO DO
       - Many more analysis and utility methods for PDOM classes

       - Creation of a PPI::Tutorial document

       - Add many more key functions to	PPI::XS

       - We can	always write more and better unit tests

       - Complete the full implementation of ->literal (1.200)

       - Full understanding of scoping (due 1.300)

SUPPORT
       The most recent version of PPI is available at the following address.


       PPI source is maintained in a GitHub repository at the following
       address.


       Contributions via GitHub	pull request are welcome.

       Bug fixes in the	form of	pull requests or bug reports with new
       (failing) unit tests have the best chance of being addressed by busy
       maintainers, and	are strongly encouraged.

       If you cannot provide a test or fix, or don't have time to do so, then
       regular bug reports are still accepted and appreciated via the GitHub
       bug tracker.


       The "ppidump" utility that is part of the Perl::Critic distribution is
       a useful	tool for demonstrating how PPI is parsing (or misparsing)
       small code snippets, and	for providing information for bug reports.

       For other issues, questions, or commercial or media-related enquiries,
       contact the author.

AUTHOR
       Adam Kennedy

ACKNOWLEDGMENTS
       A huge thank you to Phase N Australia for permitting the original open
       sourcing and release of this distribution from what was originally
       several thousand hours of commercial work.

       Another big thank you to The Perl Foundation for funding for the final
       big refactoring and completion run.

       Also, to	the various co-maintainers that	have contributed both large
       and small with tests and	patches	and especially to those	rare few who
       have deep-dived into the	guts to	(gasp) add a feature.

	 - Dan Brook	   : PPIx::XPath, Acme::PerlML
	 - Audrey Tang	   : "Line Noise" Testing
	 - Arjen Laarhoven : Three-element ->location support
	 - Elliot Shank	   : Perl 5.10 support,	five-element ->location

       And finally, thanks to those brave ( and	foolish	:) ) souls willing to
       dive in and use,	test drive and provide feedback	on PPI before version
       1.000, in some cases before it made it to beta quality, and still did
       extremely distasteful things (like eating 50 meg	of RAM a second).

       I owe you all a beer. Corner me somewhere and collect at	your
       convenience.  If	I missed someone who wasn't in my email	history, thank
       you too :)

	 # In approximate order	of appearance
	 - Claes Jacobsson
	 - Michael Schwern
	 - Jeff	T. Parsons
	 - Robert Rotherberg
	 - Richard Soderberg
	 - Nadim ibn Hamouda el	Khemir
	 - Graciliano M. P.
	 - Leon	Brocard
	 - Jody	Belka
	 - Curtis Ovid
	 - Yuval Kogman
	 - Michael Schilli
	 - Slaven Rezic
	 - Lars	Thegler
	 - Tony	Stubblebine
	 - Tatsuhiko Miyagawa
	 - Matisse Enzer
	 - Roy Fulbright
	 - Dan Brook
	 - Johnny Lee
	 - Johan Lindstrom

       And to single one person out, thanks go to Randal Schwartz who spent a
       great number of hours in IRC over a critical 6 month period explaining
       why Perl is impossibly unparsable and constantly shoving evil and ugly
       corner cases in my face. He remained a tireless devil's advocate, and
       without his support this project genuinely could never have been
       completed.

       So for my schooling in the Deep Magiks, you have my deepest gratitude.

COPYRIGHT
       Copyright 2001 - 2011 Adam Kennedy.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

       The full	text of	the license can	be found in the	LICENSE	file included
       with this module.

perl v5.32.1			  2019-07-09				PPI(3)

