extract_url(1)			 User Commands			extract_url(1)

NAME
       extract_url -- extract URLs from	email messages

SYNOPSIS
       extract_url [options] file

DESCRIPTION
       This is a Perl script that extracts URLs	from correctly-encoded MIME
       email messages. This can	be used	either as a pre-parser for urlview, or
       to replace urlview entirely.

       Urlview is a great program, but it has some deficiencies. In
       particular, it is not very configurable, and it cannot handle URLs
       that have been broken over several lines in format=flowed delsp=yes
       email messages. Nor can it handle quoted-printable email messages,
       and it does not eliminate duplicate URLs. This Perl script handles
       all of that. It also sanitizes URLs so that they can't break out of
       the command shell.
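
       As a rough illustration of two of the steps named above
       (quoted-printable decoding and duplicate elimination), here is a
       minimal Python sketch. It is not how extract_url.pl is implemented;
       the function name extract_unique_urls and the deliberately simple
       URL pattern are assumptions made for this example.

```python
import quopri
import re

# Assumption: a deliberately simple pattern; real URL detection is broader.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_unique_urls(raw_body):
    """Decode a quoted-printable body and return its URLs without duplicates."""
    # Decoding removes soft line breaks ("=" at end of line) and =XX escapes,
    # so a URL that was split across lines is joined before matching.
    text = quopri.decodestring(raw_body).decode("utf-8", errors="replace")
    seen = set()
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0)
        if url not in seen:  # keep only the first occurrence, in order
            seen.add(url)
            urls.append(url)
    return urls


body = b"Visit http://example.com/lo=\nng?id=3D7 or http://example.com/long?id=3D7\n"
print(extract_unique_urls(body))
```

       Both occurrences decode to the same URL, so the sketch reports it
       only once.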

       This is designed	primarily for use with the mutt	emailer. The idea is
       that if you want	to access a URL	in an email, you pipe the email	to a
       URL extractor (like this	one) which then	lets you select	a URL to view
       in some third program (such as Firefox).	An alternative design is to
       access URLs from	within mutt's pager by defining	macros and tagging the
       URLs in the display to indicate which macro to use. A script you	can
       use to do that is tagurl.pl.

OPTIONS
       -h, --help
	   Display this	help and exit.

       -m, --man
	   Display the full man	page documentation.

       -l, --list
	   Prevent use of Ncurses, and simply output a list of extracted URLs.

       -t, --text
	   Prevent MIME	handling; treat	the input as plain text.

       -q, --quoted
	   Force a quoted-printable decode on plain text.

       -c, --config
	   Specify a config file to read.

       -V, --version
	   Output version information and exit.

DEPENDENCIES
       Mandatory dependencies are MIME::Parser and HTML::Parser.  These
       usually come with Perl.

       Optional	dependencies are URI::Find (recognizes more exotic URL
       variations in plain text	(without HTML tags)), Curses::UI (allows it to
       fully replace urlview), MIME::Quoted (does a more standardized decode
       of quoted-printable characters in plain text), and Getopt::Long (if
       present,	extract_url.pl recognizes long options --version and --list).

EXAMPLES
       This Perl script	expects	a valid	email to be either piped in via	STDIN
       or in a file listed as the script's only	argument. Its STDOUT can be a
       pipe into urlview (it will detect this).	Here's how you can use it:

	   cat message.txt | extract_url.pl
	   cat message.txt | extract_url.pl | urlview
	   extract_url.pl message.txt
	   extract_url.pl message.txt |	urlview

       For use with mutt 1.4.x,	here's a macro you can use:

	   macro index,pager \cb "\
	   <enter-command> \
	   unset pipe_decode<enter>\
	       <pipe-message>extract_url.pl<enter>" \
	   "get	URLs"

       For use with mutt 1.5.x,	here's a more complicated macro	you can	use:

	   macro index,pager \cb "\
	   <enter-command> set my_pdsave=\$pipe_decode<enter>\
	   <enter-command> unset pipe_decode<enter>\
	   <pipe-message>extract_url.pl<enter>\
	   <enter-command> set pipe_decode=\$my_pdsave<enter>" \
	   "get	URLs"

       Here's a	suggestion for how to handle encrypted email:

	   macro index,pager ,b	"\
	   <enter-command> set my_pdsave=\$pipe_decode<enter>\
	   <enter-command> unset pipe_decode<enter>\
	   <pipe-message>extract_url.pl<enter>\
	   <enter-command> set pipe_decode=\$my_pdsave<enter>" \
	   "get	URLs"

	   macro index,pager ,B	"\
	   <enter-command> set my_pdsave=\$pipe_decode<enter>\
	   <enter-command> set pipe_decode<enter>\
	   <pipe-message>extract_url.pl<enter>\
	   <enter-command> set pipe_decode=\$my_pdsave<enter>" \
	   "decrypt message, then get URLs"

	   message-hook	.  'macro index,pager \cb ,b "URL viewer"'
	   message-hook	~G 'macro index,pager \cb ,B "URL viewer"'

CONFIGURATION
       If you're using it with Curses::UI (i.e. as a standalone URL selector),
       this Perl script will try to figure out what command to use based on
       the contents of your ~/.urlview file. However, it also has its own
       configuration file (~/.extract_urlview) that will be used instead, if
       it exists. So far, there are nine kinds of lines you can have in this
       file:

       COMMAND ...
	       This line specifies the command that will be used to view URLs.
	       This command CAN	contain	a %s, which will be replaced by	the
	       URL inside single-quotes. If it does not	contain	a %s, the URL
	       will simply be appended to the command. If this line is not
               present, the command is taken from the environment variable
               $BROWSER. If $BROWSER is not set, the command is assumed to be
               "open", which is the correct command for Mac OS X systems.

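               The rules for the COMMAND line can be sketched in Python as
               follows. This is an illustration only, not the script's
               code: the function name build_view_command and its
               parameters are hypothetical, the URL is assumed to have been
               sanitized already, and because the text above does not say
               whether an appended URL is quoted, the sketch appends it
               as-is.

```python
import os


def build_view_command(url, command=None, environ=None):
    """Return the shell command line used to view an already-sanitized URL."""
    env = os.environ if environ is None else environ
    if command is None:
        # No COMMAND line: fall back to $BROWSER, then to "open".
        command = env.get("BROWSER", "open")
    if "%s" in command:
        # %s is replaced by the URL inside single quotes.
        return command.replace("%s", "'" + url + "'")
    # No %s: the URL is simply appended (quoting here is an assumption).
    return command + " " + url


print(build_view_command("http://example.com/", "firefox %s", {}))
print(build_view_command("http://example.com/", None, {"BROWSER": "w3m"}))
print(build_view_command("http://example.com/", None, {}))
```
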
       SHORTCUT
               This line specifies that if an email contains only one URL, that
               URL will be opened without prompting. The default (without this
               line) is to always prompt.

       NOREVIEW
	       Normally, if a URL is too long to display on screen in the
	       menu, the user will be prompted with the	full URL before
	       opening it, just	to make	sure it's correct. This	line turns
	       that behavior off.

       PERSISTENT
	       By default, when	a URL has been selected	and viewed from	the
	       menu, extract_url.pl will exit. If you would like it to be
	       ready to	view another URL without re-parsing the	email (i.e.
	       much like standard urlview behavior), add this line to the
	       config file.

       IGNORE_EMPTY_TAGS
	       By default, the script collects all the URLs it can find.
	       Sometimes, though, HTML messages	contain	links that don't
	       correspond to any text (and aren't normally rendered or
	       accessible). This tells the script to ignore these links.

       RAW_RESERVED
               By default, the script sanitizes URLs pretty thoroughly,
               percent-encoding all characters that are not part of the
               Unreserved class (per RFC 3986). Sometimes, though, this is not
               desirable. This line tells the script to leave the Reserved
               characters unencoded (with the exception of the single quote).
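
               To make the two character classes concrete, here is a small
               Python sketch of a literal reading of this option, applied
               to a text fragment. It is not the script's own code:
               encode_fragment and its raw_reserved parameter are
               hypothetical names, and the sketch also encodes "%", so it
               would double-encode text that is already percent-encoded.

```python
from urllib.parse import quote

# RFC 3986 reserved characters: gen-delims plus sub-delims.
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="


def encode_fragment(text, raw_reserved=False):
    """Percent-encode text, optionally leaving reserved characters alone."""
    if raw_reserved:
        # Keep reserved characters, except the single quote.
        safe = (GEN_DELIMS + SUB_DELIMS).replace("'", "")
    else:
        # Keep nothing beyond the unreserved class (letters, digits, - . _ ~).
        safe = ""
    return quote(text, safe=safe)


print(encode_fragment("a b'c$d/e"))
print(encode_fragment("a b'c$d/e", raw_reserved=True))
```

               In both modes the single quote becomes %27, which is what
               keeps the URL from closing a single-quoted shell argument.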

       HTML_TAGS ...
	       This line specifies which HTML tags will	be examined for	URLs.
	       By default, the script is very generous,	looking	in a, applet,
	       area, blockquote, embed,	form, frame, iframe, input, ins,
	       isindex,	head, layer, link, object, q, script, and xmp tags for
	       links. If you would like	it to examine just a subset of these
	       (e.g. you only want a tags to be	examined), merely list the
	       subset you want.	The list is expected to	be a comma-separated
	       list. If	there are multiple of these lines in the config	file,
	       the script will look for	the minimum set	of specified tags.
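
               The effect of restricting the tag list can be sketched in
               Python with the standard html.parser module. This is only an
               illustration: the class name TagURLCollector, the helper
               urls_in_tags, and the choice of attributes in URL_ATTRS are
               assumptions, and the handling of several HTML_TAGS lines is
               not modeled.

```python
from html.parser import HTMLParser

# Assumption: attributes treated as link-bearing in this sketch.
URL_ATTRS = ("href", "src", "action", "cite")


class TagURLCollector(HTMLParser):
    """Collect URL-like attribute values from a chosen set of tags."""

    def __init__(self, tags):
        super().__init__()
        self.tags = {t.strip().lower() for t in tags}
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag and attribute names in lower case.
        if tag not in self.tags:
            return
        for name, value in attrs:
            if name in URL_ATTRS and value:
                self.urls.append(value)


def urls_in_tags(html, tag_list):
    """tag_list is a comma-separated string such as "a,iframe,link"."""
    collector = TagURLCollector(tag_list.split(","))
    collector.feed(html)
    collector.close()
    return collector.urls


page = ('<a href="http://a.example/">A</a>'
        '<img src="http://b.example/x.png">'
        '<iframe src="http://c.example/"></iframe>')
print(urls_in_tags(page, "a,iframe,link"))
print(urls_in_tags(page, "a"))
```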

       ALTSELECT ...
               This line specifies a key for an alternate URL-viewing
               behavior. By default, extract_url.pl will quit after the URL
	       viewer has been launched	for the	selected URL. This key will
	       then make extract_url.pl	launch the URL viewer but will not
	       quit. However, if PERSISTENT is specified in the	config file,
	       the opposite is true: normal selection of a URL will launch the
	       URL viewer and will not cause extract_url.pl to exit, but this
	       key will. This setting defaults to a.
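
               The interaction between PERSISTENT and the ALTSELECT key
               amounts to a four-row truth table, sketched here in Python.
               The function name exits_after_launch is hypothetical and
               only restates the behavior described above.

```python
def exits_after_launch(persistent, used_altselect):
    """Return True if the selector quits once the URL viewer is launched."""
    # Without PERSISTENT, a normal selection quits and the alternate key
    # stays open. With PERSISTENT the roles are swapped. Both cases reduce
    # to a simple equality test.
    return persistent == used_altselect


for persistent in (False, True):
    for used_altselect in (False, True):
        print(persistent, used_altselect,
              exits_after_launch(persistent, used_altselect))
```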

       DEFAULT_VIEW {url|context}
               This line specifies whether the program initially shows the
               list of URLs or the URL contexts when it is run. By default,
               extract_url.pl shows a list of URLs.

       Here is an example config file:

	   SHORTCUT
	   COMMAND mozilla-firefox -remote "openURL(%s,new-window)"
	   HTML_TAGS a,iframe,link
	   ALTSELECT Q
	   DEFAULT_VIEW	context
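
       To show how a file like this breaks down into settings, here is a
       minimal Python sketch of a line-oriented reader. It is an
       assumption-laden illustration rather than the script's actual
       parser: parse_config is a hypothetical name, comment syntax is not
       handled, and a repeated keyword simply overwrites the earlier value.

```python
def parse_config(text):
    """Split config text into bare flags and keyword/value settings."""
    flags = set()
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # The first word is the keyword; anything after it is the value.
        parts = line.split(None, 1)
        if len(parts) == 1:
            flags.add(parts[0])
        else:
            values[parts[0]] = parts[1]
    return flags, values


EXAMPLE = """\
SHORTCUT
COMMAND mozilla-firefox -remote "openURL(%s,new-window)"
HTML_TAGS a,iframe,link
ALTSELECT Q
DEFAULT_VIEW context
"""

flags, values = parse_config(EXAMPLE)
print(sorted(flags))
print(values["HTML_TAGS"].split(","))
```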

STANDARDS
       None.

AVAILABILITY
       http://www.memoryhole.net/~kyle/extract_url/

SEE ALSO
       mutt(1), urlview(1), urlscan(1)

CAVEATS
       Before a URL is used in a shell, any potentially dangerous shell
       characters in it (namely the single quote and the dollar sign) are
       transformed into percent-encoding. This should eliminate the
       possibility of a bad URL breaking the shell.

       If using Curses::UI, and a URL is too big for your terminal, when you
       select it, extract_url.pl will (by default) ask you to review it in a
       way that lets you see the whole thing.

AUTHOR
       Program was written by Kyle Wheeler <kyle@memoryhole.net>.

       Released under the BSD-2-Clause (simplified) license. For more
       information about the license, visit
       <http://spdx.org/licenses/BSD-2-Clause>.

perl v5.32.0			  2020-08-17			extract_url(1)
