extract_url(1)			 User Commands			extract_url(1)

       extract_url -- extract URLs from	email messages

       extract_url [options] file

       This is a Perl script that extracts URLs	from correctly-encoded MIME
       email messages. This can	be used	either as a pre-parser for urlview, or
       to replace urlview entirely.

       Urlview is a great program, but has some deficiencies. In particular,
       it isn't very configurable, and cannot handle URLs that have
       been broken over several lines in format=flowed delsp=yes email
       messages. Nor can it handle quoted-printable email messages. Also,
       urlview doesn't eliminate duplicate URLs. This Perl script handles all
       of that. It also sanitizes URLs so that they can't break out of the
       command shell.

       This is designed	primarily for use with the mutt	emailer. The idea is
       that if you want	to access a URL	in an email, you pipe the email	to a
       URL extractor (like this	one) which then	lets you select	a URL to view
       in some third program (such as Firefox).	An alternative design is to
       access URLs from	within mutt's pager by defining	macros and tagging the
       URLs in the display to indicate which macro to use. A script you	can
       use to do that is

       -h, --help
	   Display this	help and exit.

       -m, --man
	   Display the full man	page documentation.

       -l, --list
	   Prevent use of Ncurses, and simply output a list of extracted URLs.

       -t, --text
	   Prevent MIME	handling; treat	the input as plain text.

       -q, --quoted
	   Force a quoted-printable decode on plain text.

       -c, --config
	   Specify a config file to read.

       -V, --version
	   Output version information and exit.
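
       As a rough illustration of how the -l option can be used from a
       script, the sketch below wraps it in a small shell function. The
       function name first_url is made up for this example, and it assumes
       that -l prints one URL per line, which this page does not spell out.

```shell
# Hypothetical helper (not part of extract_url): print only the first
# URL found in the message file given as $1.
# Assumption: "extract_url -l" writes one URL per line.
first_url() {
    extract_url -l "$1" | head -n 1
}
```

       Calling first_url message.txt would then print at most one URL,
       without starting the Curses menu.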

       Mandatory dependencies are MIME::Parser and HTML::Parser.  These
       usually come with Perl.

       Optional	dependencies are URI::Find (recognizes more exotic URL
       variations in plain text	(without HTML tags)), Curses::UI (allows it to
       fully replace urlview), MIME::Quoted (does a more standardized decode
       of quoted-printable characters in plain text), and Getopt::Long (if
       present, recognizes long options --version and --list).

       This Perl script	expects	a valid	email to be either piped in via	STDIN
       or in a file listed as the script's only	argument. Its STDOUT can be a
       pipe into urlview (it will detect this).	Here's how you can use it:

           cat message.txt | extract_url
           cat message.txt | extract_url | urlview
           extract_url message.txt
           extract_url message.txt | urlview

       For use with mutt 1.4.x,	here's a macro you can use:

           macro index,pager \cb "\
           <enter-command> \
           unset pipe_decode<enter>\
           <pipe-message>extract_url<enter>" \
           "get URLs"

       For use with mutt 1.5.x,	here's a more complicated macro	you can	use:

           macro index,pager \cb "\
           <enter-command> set my_pdsave=\$pipe_decode<enter>\
           <enter-command> unset pipe_decode<enter>\
           <pipe-message>extract_url<enter>\
           <enter-command> set pipe_decode=\$my_pdsave<enter>" \
           "get URLs"

       Here's a	suggestion for how to handle encrypted email:

           macro index,pager ,b "\
           <enter-command> set my_pdsave=\$pipe_decode<enter>\
           <enter-command> unset pipe_decode<enter>\
           <pipe-message>extract_url<enter>\
           <enter-command> set pipe_decode=\$my_pdsave<enter>" \
           "get URLs"

           macro index,pager ,B "\
           <enter-command> set my_pdsave=\$pipe_decode<enter>\
           <enter-command> set pipe_decode<enter>\
           <pipe-message>extract_url<enter>\
           <enter-command> set pipe_decode=\$my_pdsave<enter>" \
           "decrypt message, then get URLs"

	   message-hook	.  'macro index,pager \cb ,b "URL viewer"'
	   message-hook	~G 'macro index,pager \cb ,B "URL viewer"'

       If you're using it with Curses::UI (i.e. as a standalone URL selector),
       this Perl script will try to figure out what command to use based on
       the contents of your ~/.urlview file. However, it also has its own
       configuration file (~/.extract_urlview) that will be used instead, if
       it exists. So far, there are eight kinds of lines you can have in this
       file:

       COMMAND ...
	       This line specifies the command that will be used to view URLs.
	       This command CAN	contain	a %s, which will be replaced by	the
	       URL inside single-quotes. If it does not	contain	a %s, the URL
	       will simply be appended to the command. If this line is not
	       present,	the command is taken from the environment variable
	       $BROWSER. If BROWSER is not set,	the command is assumed to be
	       "open", which is	the correct command for	MacOS X	systems.

	       This line specifies that	if an email contains only 1 URL, that
	       URL will	be opened without prompting. The default (without this
	       line) is	to always prompt.

	       Normally, if a URL is too long to display on screen in the
	       menu, the user will be prompted with the	full URL before
	       opening it, just	to make	sure it's correct. This	line turns
	       that behavior off.

               By default, when a URL has been selected and viewed from the
               menu, the script will exit. If you would like it to be
               ready to view another URL without re-parsing the email (i.e.
               much like standard urlview behavior), add this line to the
               config file.

	       By default, the script collects all the URLs it can find.
	       Sometimes, though, HTML messages	contain	links that don't
	       correspond to any text (and aren't normally rendered or
	       accessible). This tells the script to ignore these links.

	       By default, the script sanitizes	URLs pretty thoroughly,
	       eliminating all characters that are not part of the Unreserved
	       class (per RFC 3986). Sometimes,	though,	this is	not desirable.
	       This tells the script to	leave the Reserved Characters un-
	       encoded (with the exception of the single quote).

       HTML_TAGS ...
	       This line specifies which HTML tags will	be examined for	URLs.
	       By default, the script is very generous,	looking	in a, applet,
	       area, blockquote, embed,	form, frame, iframe, input, ins,
	       isindex,	head, layer, link, object, q, script, and xmp tags for
	       links. If you would like	it to examine just a subset of these
	       (e.g. you only want a tags to be	examined), merely list the
	       subset you want.	The list is expected to	be a comma-separated
	       list. If	there are multiple of these lines in the config	file,
	       the script will look for	the minimum set	of specified tags.

       ALTSELECT ...
               This line specifies a key for an alternate URL viewing
               behavior. By default, the script will quit after the URL
               viewer has been launched for the selected URL. This key will
               instead make the script launch the URL viewer without
               quitting. However, if PERSISTENT is specified in the config
               file, the opposite is true: normal selection of a URL will
               launch the URL viewer and will not cause the script to exit,
               but this key will. This setting defaults to a.

       DEFAULT_VIEW {url|context}
               This line specifies whether to show the list of URLs at first
               or to show the URL contexts when the program is run. By
               default, the script shows a list of URLs.

       Here is an example config file:

	   COMMAND mozilla-firefox -remote "openURL(%s,new-window)"
	   HTML_TAGS a,iframe,link
	   DEFAULT_VIEW	context
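
       To make the COMMAND rule concrete, here is a minimal sketch of the
       %s substitution described above. It is not the script's own code:
       the function name expand_command is hypothetical, only the first %s
       is replaced, and appending the URL unquoted when there is no %s is
       an assumption.

```shell
# Hypothetical helper: build the viewer command line from a COMMAND
# template ($1) and a URL ($2), following the rule described above.
expand_command() {
    template=$1
    url=$2
    case $template in
        *%s*)
            # Replace the first %s with the URL inside single quotes.
            printf '%s\n' "${template%%%s*}'${url}'${template#*%s}"
            ;;
        *)
            # No %s present: append the URL to the command
            # (left unquoted here, which is an assumption).
            printf '%s %s\n' "$template" "$url"
            ;;
    esac
}
```

       With the COMMAND line from the example config file, the URL ends up
       single-quoted inside openURL(...); with a bare command such as open,
       the URL is added as the last word.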



       mutt(1) urlview(1) urlscan(1)

       All URLs	have any potentially dangerous shell characters	(namely	a
       single quote and	a dollar sign) removed (transformed into percent-
       encoding) before	they are used in a shell. This should eliminate	the
       possibility of a	bad URL	breaking the shell.
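
       The idea can be sketched in a few lines of shell. This is only an
       illustration of percent-encoding the two characters named above,
       not the script's actual sanitizer (which, by default, encodes more
       than these two); the function name sanitize_url is hypothetical.

```shell
# Hypothetical helper: percent-encode the single quote (%27) and the
# dollar sign (%24) in the URL given as $1, leaving everything else alone.
sanitize_url() {
    printf '%s\n' "$1" | sed -e "s/'/%27/g" -e 's/\$/%24/g'
}
```

       After this step a URL can no longer close a surrounding pair of
       single quotes or trigger variable expansion in the shell.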

       If using Curses::UI, and a URL is too big for your terminal, when you
       select it, the script will (by default) ask you to review it in a
       way that you can see the whole thing.

       Program was written by Kyle Wheeler <>

       Released under the BSD-2-Clause (simplified) license. For more
       information about the license, visit <>.

perl v5.32.1			  2021-11-05			extract_url(1)

