PDFGREP(1)			Pdfgrep	Manual			    PDFGREP(1)

NAME
       pdfgrep - search	PDF files for a	regular	expression

SYNOPSIS
       pdfgrep [OPTION...] PATTERN [FILE...]
       pdfgrep [OPTION...] [-e PATTERN | -f FILE] [FILE...]

DESCRIPTION
       Search for PATTERN in each PDF FILE and print matching lines. By
       default,	PATTERN	is an extended regular expression.

       pdfgrep tries to	be mostly compatible with GNU grep with	some
       PDF-specific distinctions and additional	options. Most notably, -n
       prints page instead of line numbers.

OPTIONS
   General Information
       --help
	   Print a short summary of the	options.

       -V, --version
	   Show	version	information.

   Pattern Interpretation
       -F, --fixed-strings
	   Interpret PATTERN as	a list of fixed	strings	separated by newlines,
	   any of which	is to be matched.

       -P, --perl-regexp
	   Interpret PATTERN as	a Perl compatible regular expression (PCRE).
	   See pcresyntax(3) for a quick overview.

   Matching Control
       -e PATTERN, --regexp=PATTERN
	   Use PATTERN as the pattern to search	for. If	this option is
	   specified multiple times or combined	with --file, all patterns are
	   tried in turn until one of them matches.

       -f FILE,	--file=FILE
	   Read	patterns from FILE, one	per line. If FILE contains multiple
	   patterns or if this option is applied multiple times	or combined
	   with	-e, all	patterns are tried in turn until one of	them matches.
	   An empty pattern list matches nothing.

       -i, --ignore-case
	   Ignore case distinctions in both the	PATTERN	and the	input files.

   General Output Control
       -c, --count
	   Suppress normal output. Instead print the number of matches for
	   each	input file. Note that unlike grep, multiple matches on the
	   same	page will be counted individually.

       -p, --page-count
	   Like	-c, but	prints the number of matches per page. Implies -n.

       --color WHEN
	   Surround file names,	page numbers and matched text with escape
	   sequences to	display	them in	color on the terminal.	WHEN can be:

	   always   Always use colors, even
		    when stdout	is not a
		    terminal.
	   never    Do not use colors.
	   auto	    Use	colors only when
		    stdout is a	terminal (this
		    is the default).

       -L, --files-without-match
	   Suppress normal output. Instead print the name of each input	file
	   that	doesn't	contain	a match. This works well with -Z, but many
	   other output	options	like -n	or -c are ignored when -L is
	   specified.

       -l, --files-with-matches
	   Suppress normal output. Instead print the name of each input	file
	   that	contains a match. This works well with -Z, but many other
	   output options like -n or -c	are ignored when -l is specified.

       -m, --max-count NUM
	   Stop	reading	a file after NUM matches. When the -c or --count
	   option is also used,	pdfgrep	does not output	a count	greater	than
	   NUM.

       -o, --only-matching
	   Print only the matched part of a line without any surrounding
	   context.

       -q, --quiet
	   Suppress all	normal output to stdout. Exit immediately with exit
	   status 0 if a match is found, even in case of errors. Use this if
	   you only care about the presence of matches,	not their number or
	   content.

   Line	Prefix Control
       -H, --with-filename
	   Print the file name for each	match. This is the default setting
	   when	there is more than one file to search.

       -h, --no-filename
	   Suppress the	prefixing of file name on output. This is the default
	   setting when	there is only one file to search.

       -n, --page-number
	   Prefix each match with the number of	the page where it was found.

       -Z, --null
	   Output a null byte (called NUL in ASCII and '\0' in C) instead of
	   the colon that usually separates a filename from the	rest of	the
	   line. This option makes the output unambiguous in the presence of
	   colons, spaces or newlines in the filename. It can be used in
	   conjunction with commands such as xargs -0 or perl -0.

       --match-prefix-separator	SEP
	   Change the colon used to separate file name, page number and text
	   in the output to SEP, which can be an arbitrary string. This is
	   useful when file names contain colons, but only for interactive
	   usage. For scripting, --null should be used.
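
       As a rough sketch of why --null is the safer choice in scripts, the
       following uses printf as a stand-in for the output of
       pdfgrep -l -Z. The function name and both file names are made up,
       and it is assumed that each name is terminated by a NUL byte, as
       with grep:

```shell
# Stand-in for `pdfgrep -l -Z pattern *.pdf`: two hypothetical file names,
# each terminated by a NUL byte, one containing a space and one a colon.
fake_pdfgrep_output() {
    printf 'annual report.pdf\000notes:draft.pdf\000'
}

# xargs -0 splits only on NUL, so each name arrives as a single argument
# even though it contains a space or a colon.
fake_pdfgrep_output | xargs -0 -n 1 printf '[%s]\n'
```

       Each name is printed whole inside one pair of brackets. Splitting
       the same output on colons or whitespace would break both names
       apart.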

   Context Control
       -A NUM, --after-context=NUM
	   Print NUM lines of context after matching lines. Contiguous groups
	   of matches are separated by a line containing --. With -o, this
	   option has no effect.

       -B NUM, --before-context=NUM
	   Print NUM lines of context before matching lines. Contiguous	groups
	   of matches are separated by a line containing --. With -o, this
	   option has no effect.

       -C NUM, --context=NUM
	   Print NUM lines of context before and after matching	lines.
	   Contiguous groups of	matches	are separated by a line	containing --.
	   With	-o, this option	has no effect.

   File	Selection
       -r, --recursive
	   Recursively search all files	(restricted by --include and
	   --exclude) under each directory, following symlinks only if they
	   are on the command line.

       -R, --dereference-recursive
	   Same	as -r, but follows all symlinks.

       --exclude=GLOB
	   Skip files whose base name matches GLOB. See glob(7) for wildcards
	   you can use. You can use this option multiple times to exclude more
	   patterns. It takes precedence over --include. Note that includes
	   and excludes apply only to files found via --recursive, not to
	   files named on the command line.

       --include=GLOB
	   Only	search files whose base	name matches GLOB. See --exclude for
	   details. The	default	is *.pdf.

   Other Options
       --cache
	   Use a cache for the rendered	text to	speed up the operation on
	   large files.

       --password=PASSWORD
	   Use PASSWORD to decrypt the PDF files. Can be specified multiple
	   times; all passwords will be tried on all PDFs. Note that the
	   password will show up in your command history and in the output of
	   ps(1), so do not use this option if the security of PASSWORD is
	   important.

       --page-range=RANGE
	   Limit search	to a specified set of pages.  RANGE is a comma
	   separated list of either a single page number or a range expression
	   of the form PAGE1-PAGE2. Example: 2-3,5,7-10.

       --debug
	   Enable debug	output.	 Note: Due to limitations of poppler before
	   version 0.30.0, some	debug output is	also printed without --debug
	   when	using such a poppler version.

       --warn-empty
	   Print a warning to stderr if	a PDF contains no searchable text.
	   This	is the case for	PDFs that consist only of images, for example
	   scanned documents.

       --unac
	   Remove accents and ligatures	from both the search pattern and the
	   PDF documents. This is useful if you	want to	search for a word
	   containing "ae", but the PDF uses the single character "æ"
	   instead. See	unac(3)	and unaccent(1)	for details.

	   This	option is experimental and only	available if pdfgrep is
	   compiled with unac support.
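
       To make the RANGE syntax of --page-range concrete, here is a small
       sketch that expands a range expression into individual page
       numbers. expand_page_range is a hypothetical helper written for
       this illustration; it is not part of pdfgrep and does not validate
       its input:

```shell
# Expand a RANGE such as "2-3,5,7-10" into one page number per line.
# expand_page_range is a hypothetical helper, not part of pdfgrep.
expand_page_range() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        # A single page number has no "-", so $last is empty.
        last=${last:-$first}
        page=$first
        while [ "$page" -le "$last" ]; do
            printf '%s\n' "$page"
            page=$((page + 1))
        done
    done
}

expand_page_range "2-3,5,7-10"
```

       For the example from the option description this prints the pages
       2, 3, 5, 7, 8, 9 and 10, one per line.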

EXIT STATUS
       Normally, the exit status is 0 if at least one match is found, 1	if no
       match is	found and 2 if an error	occurred. But if the --quiet or	-q
       option is used and a match was found, pdfgrep will return 0 regardless
       of errors.
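
       A script can branch on these values. The sketch below maps an exit
       status to a short message; describe_status is a hypothetical helper,
       and the pattern and file name in the usage comment are placeholders:

```shell
# describe_status is a hypothetical helper that turns a pdfgrep exit
# status into a short message, following the values listed above.
describe_status() {
    case "$1" in
        0) echo "match found" ;;
        1) echo "no match" ;;
        2) echo "error" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Typical use (pattern and file name are placeholders):
#   pdfgrep -q pattern report.pdf; describe_status "$?"
describe_status 1
```

       Keep in mind that with -q a status of 0 only says that a match was
       found; it does not rule out errors on other files.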

ENVIRONMENT VARIABLES
       The behavior of pdfgrep is affected by the following environment
       variable.

       GREP_COLORS
	   Specifies the colors	and other attributes used to highlight various
	   parts of the	output.	The syntax and values are like GREP_COLORS of
	   grep. See grep(1) for more details. Currently only the capabilities
	   mt, ms, mc, fn, ln and se are used by pdfgrep, where	mt, ms and mc
	   have	the same effect.

FILES
       ${XDG_CACHE_HOME}/pdfgrep/*
	   Cache files written and used	when --cache is	enabled. At most 200
	   cache entries older than a day are retained.
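
       To inspect the cache from a script, the directory can be derived
       from the environment. This is only a sketch: pdfgrep_cache_dir is a
       hypothetical helper, and the fallback to $HOME/.cache is the XDG
       Base Directory default, assumed here to apply when XDG_CACHE_HOME
       is unset:

```shell
# pdfgrep_cache_dir is a hypothetical helper. The fallback to
# $HOME/.cache is the XDG Base Directory default and is an assumption
# about where the cache ends up when XDG_CACHE_HOME is unset.
pdfgrep_cache_dir() {
    printf '%s\n' "${XDG_CACHE_HOME:-$HOME/.cache}/pdfgrep"
}

# List the cache directory only if it exists.
cache_dir=$(pdfgrep_cache_dir)
if [ -d "$cache_dir" ]; then
    ls -l "$cache_dir"
else
    echo "no cache directory at $cache_dir"
fi
```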

EXAMPLES
       Print the first ten lines matching pattern and print their page number:

	       pdfgrep -n --max-count 10 pattern foo.pdf

       Search all .pdf files whose names begin with foo	recursively in the
       current directory:

	       pdfgrep -r --include "foo*.pdf" pattern

       Search for foo in all PDFs in the current directory that also
       contain bar:

	       pdfgrep -Z --files-with-matches "bar" *.pdf | xargs -0 pdfgrep -H foo

       Search all .pdf files that are smaller than 12M recursively in the
       current directory:

	       find . -name "*.pdf" -size -12M -print0 | xargs -0 pdfgrep pattern

	   Note	that in	contrast to the	previous examples, this	task could not
	   be solved with pdfgrep alone, but the Unix tools find(1) and
	   xargs(1) had	to be used. That's because pdfgrep itself doesn't
	   include options to exclude files by their size. But as you see, it
	   doesn't have	to!

BUGS
   Reporting Bugs
       Bugs can be reported either to the mailing list
       (pdfgrep-users@pdfgrep.org) or to the bug tracker on GitLab
       (https://gitlab.com/pdfgrep/pdfgrep/issues).

AUTHORS
       pdfgrep is maintained by	Hans-Peter Deifel.

       See the AUTHORS file in the source for a	full list of contributors.

SEE ALSO
       grep(1),	pcre(3), regex(7)

       See pdfgrep's website, https://pdfgrep.org, for more information,
       downloads and the git repository.

Pdfgrep	2.0.1			  04/28/2018			    PDFGREP(1)
