uri_string(3)		   Erlang Module Definition		 uri_string(3)

NAME
       uri_string - URI	processing functions.

DESCRIPTION
       This module contains functions for parsing and handling URIs (RFC 3986)
       and form-urlencoded query strings (HTML 5.2).

       Parsing and serializing non-UTF-8  form-urlencoded  query  strings  are
       also supported (HTML 5.0).

       A  URI is an identifier consisting of a sequence	of characters matching
       the syntax rule named URI in RFC	3986.

       The generic URI syntax consists of a hierarchical  sequence  of	compo-
       nents referred to as the	scheme,	authority, path, query,	and fragment:

	   URI	       = scheme	":" hier-part [	"?" query ] [ "#" fragment ]
	   hier-part   = "//" authority	path-abempty
			  / path-absolute
			  / path-rootless
			  / path-empty
	   scheme      = ALPHA *( ALPHA	/ DIGIT	/ "+" /	"-" / "." )
	   authority   = [ userinfo "@"	] host [ ":" port ]
	   userinfo    = *( unreserved / pct-encoded / sub-delims / ":"	)

	   reserved    = gen-delims / sub-delims
	   gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
	   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
		       / "*" / "+" / "," / ";" / "="

	   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

       The interpretation of a URI depends only	on the characters used and not
       on how those characters are represented in a network protocol.

       The functions implemented by this module	cover the following use	cases:

         * Parsing a URI into its components and returning a map
	   parse/1

	 * Recomposing a map of	URI components into a URI string
	   recompose/1

	 * Changing inbound binary and percent-encoding	of URIs
	   transcode/2

	 * Transforming	URIs into a normalized form
	   normalize/1
	   normalize/2

	 * Composing form-urlencoded query strings from	a  list	 of  key-value
	   pairs
	   compose_query/1
	   compose_query/2

	 * Dissecting  form-urlencoded	query strings into a list of key-value
	   pairs
	   dissect_query/1
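
       As a rough illustration of how parse/1 and recompose/1 fit together,
       the sketch below replaces the port of a URI. The module name
       uri_roundtrip and the function set_port/2 are hypothetical names
       chosen for this example; they are not part of uri_string.

```erlang
-module(uri_roundtrip).
-export([set_port/2]).

%% Parse URI, replace (or add) the port component, and recompose.
%% An error tuple from parse/1 is passed through unchanged.
%% set_port/2 is a hypothetical helper, not part of uri_string.
set_port(URI, Port) when is_integer(Port), Port >= 0 ->
    case uri_string:parse(URI) of
        {error, _Reason, _Info} = Error ->
            Error;
        Map when is_map(Map) ->
            uri_string:recompose(Map#{port => Port})
    end.
```

       With this sketch, uri_roundtrip:set_port("http://example.com/a?b=c",
       8080) should yield "http://example.com:8080/a?b=c".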

       There are four different	encodings present during the handling of URIs:

	 * Inbound binary encoding in binaries

	 * Inbound percent-encoding in lists and binaries

	 * Outbound binary encoding in binaries

	 * Outbound percent-encoding in	lists and binaries

       Functions with a uri_string() argument accept lists, binaries, and
       mixed lists (lists with binary elements) as input. All functions
       except transcode/2 expect input as lists of Unicode code points,
       UTF-8 encoded binaries, and UTF-8 percent-encoded URI parts
       ("%C3%B6" corresponds to the Unicode character "ö").

       Unless otherwise specified, the return value type and encoding are
       the same as the input type and encoding. That is, binary input
       returns binary output and list input returns list output, but mixed
       input returns list output.
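
       The following sketch shows this rule with normalize/1, assuming the
       behavior described above. The module name uri_io_types and the
       function demo/0 are hypothetical and exist only for illustration.

```erlang
-module(uri_io_types).
-export([demo/0]).

%% Call normalize/1 with the three accepted input shapes and return
%% the results as a tuple. demo/0 is a hypothetical helper.
demo() ->
    ListResult  = uri_string:normalize("/a/b/../c"),          % list in
    BinResult   = uri_string:normalize(<<"/a/b/../c">>),      % binary in
    MixedResult = uri_string:normalize(["/a/", <<"b/../c">>]),% mixed in
    {ListResult, BinResult, MixedResult}.
```

       The expected result is {"/a/c", <<"/a/c">>, "/a/c"}: the list and
       mixed inputs produce lists, and the binary input produces a binary.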

       For lists there is only percent-encoding. For binaries, however,
       both binary encoding and percent-encoding must be considered.
       transcode/2 provides the means to convert between the supported
       encodings. It takes a uri_string() and a list of options specifying
       the inbound and outbound encodings.

       RFC 3986 does not mandate any specific character encoding; the
       encoding is usually defined by the protocol or surrounding text.
       This library makes the same assumption: binary encoding and
       percent-encoding are handled as one configuration unit and cannot
       be set to different values.
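
       As a minimal sketch of converting between encodings, the helper
       below wraps transcode/2 for URIs whose percent-encoding is Latin-1.
       The module name uri_legacy and the function latin1_to_utf8/1 are
       hypothetical names used only here.

```erlang
-module(uri_legacy).
-export([latin1_to_utf8/1]).

%% Re-encode a URI from Latin-1 to UTF-8. The single in_encoding
%% option covers both the binary encoding and the percent-encoding
%% of the input. Returns the transcoded URI or an error tuple.
%% latin1_to_utf8/1 is a hypothetical helper, not part of uri_string.
latin1_to_utf8(URI) ->
    uri_string:transcode(URI, [{in_encoding, latin1},
                               {out_encoding, utf8}]).
```

       For example, uri_legacy:latin1_to_utf8("foo%F6bar") should return
       "foo%C3%B6bar".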

DATA TYPES
       error() = {error, atom(), term()}

	      Error tuple indicating the type of error.	Possible values	of the
	      second component:

		* invalid_character

		* invalid_encoding

		* invalid_input

		* invalid_map

		* invalid_percent_encoding

		* invalid_scheme

		* invalid_uri

		* invalid_utf8

		* missing_value

	      The  third  component is a term providing	additional information
	      about the	cause of the error.
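
              Because functions return the error tuple instead of raising
              an exception, callers usually match on it. The sketch below
              is one way to do that; the module name uri_errors, the
              function describe/1, and the ok and invalid tags are
              hypothetical choices for this example.

```erlang
-module(uri_errors).
-export([describe/1]).

%% Classify the result of parse/1. On success, return the sorted list
%% of components that were found; on failure, expose the reason and
%% the additional information from the error tuple.
%% describe/1 is a hypothetical helper, not part of uri_string.
describe(URI) ->
    case uri_string:parse(URI) of
        {error, Reason, Info} ->
            {invalid, Reason, Info};
        Map when is_map(Map) ->
            {ok, lists:sort(maps:keys(Map))}
    end.
```

              For "http://example.com/" this should return
              {ok, [host, path, scheme]}. Input that is not a valid URI,
              such as a path containing an unencoded space, is expected
              to produce an invalid tuple.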

       uri_map() =
	   #{fragment => unicode:chardata(),
	     host => unicode:chardata(),
	     path => unicode:chardata(),
	     port => integer() >= 0 | undefined,
	     query => unicode:chardata(),
	     scheme => unicode:chardata(),
	     userinfo => unicode:chardata()} |
	   #{}

	      Map holding the main components of a URI.

       uri_string() = iodata()

	      List of unicode codepoints, a UTF-8 encoded binary, or a mix  of
	      the two, representing an RFC 3986	compliant URI (percent-encoded
	      form). A URI is a	sequence of characters	from  a	 very  limited
	      set:  the	letters	of the basic Latin alphabet, digits, and a few
	      special characters.

EXPORTS
       compose_query(QueryList)	-> QueryString

	      Types:

		 QueryList = [{unicode:chardata(), unicode:chardata()}]
		 QueryString = uri_string() | error()

	      Composes a form-urlencoded QueryString based on a	 QueryList,  a
	      list of non-percent-encoded key-value pairs. Form-urlencoding is
	      defined in section 4.10.21.6 of the HTML 5.2  specification  and
	      in section 4.10.22.6 of the HTML 5.0 specification for non-UTF-8
	      encodings.

	      See also the opposite operation dissect_query/1.

	      Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
              "foo+bar=1&city=%C3%B6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"örebro"/utf8>>}]).
              <<"foo+bar=1&city=%C3%B6rebro">>
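
              A composed query string is typically placed in the query
              component of a URI map. The sketch below assumes a fixed
              scheme, host, and path for illustration; the module name
              uri_build, the function search_url/1, and the example.com
              address are hypothetical.

```erlang
-module(uri_build).
-export([search_url/1]).

%% Build a URI whose query component is composed from Params, a list
%% of non-percent-encoded {Key, Value} string pairs.
%% search_url/1 is a hypothetical helper, not part of uri_string.
search_url(Params) ->
    case uri_string:compose_query(Params) of
        {error, _Reason, _Info} = Error ->
            Error;
        Query ->
            uri_string:recompose(#{scheme => "https",
                                   host   => "example.com",
                                   path   => "/search",
                                   query  => Query})
    end.
```

              With this sketch,
              uri_build:search_url([{"foo bar","1"},{"q","a&b"}]) should
              return "https://example.com/search?foo+bar=1&q=a%26b".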

       compose_query(QueryList,	Options) -> QueryString

	      Types:

		 QueryList = [{unicode:chardata(), unicode:chardata()}]
		 Options = [{encoding, atom()}]
		 QueryString = uri_string() | error()

              Same as compose_query/1 but with an additional Options
              parameter that controls the encoding ("charset") used by the
              encoding algorithm. Two encodings are supported: utf8 (or
              unicode) and latin1.

              Each character in the entry's name and value that cannot be
              expressed using the selected character encoding is replaced by
              a string consisting of a U+0026 AMPERSAND character (&), a "#"
              (U+0023) character, one or more ASCII digits representing the
              Unicode code point of the character in base ten, and finally a
              ";" (U+003B) character.

              Bytes outside the ranges 0x2A, 0x2D, 0x2E, 0x30 to 0x39,
              0x41 to 0x5A, 0x5F, 0x61 to 0x7A are percent-encoded (a U+0025
              PERCENT SIGN character (%) followed by uppercase ASCII hex
              digits representing the hexadecimal value of the byte).

	      See also the opposite operation dissect_query/1.

	      Example:

              1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
              1> [{encoding, latin1}]).
              "foo+bar=1&city=%F6rebro"
              2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
              2> {<<"city">>,<<"東京"/utf8>>}], [{encoding, latin1}]).
              <<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>

       dissect_query(QueryString) -> QueryList

	      Types:

		 QueryString = uri_string()
		 QueryList =
		     [{unicode:chardata(), unicode:chardata()}]	| error()

              Dissects a urlencoded QueryString and returns a QueryList, a
              list of non-percent-encoded key-value pairs. Form-urlencoding is
              defined in section 4.10.21.6 of the HTML 5.2 specification and
              in section 4.10.22.6 of the HTML 5.0 specification for non-UTF-8
              encodings.

	      See also the opposite operation compose_query/1.

	      Example:

	      1> uri_string:dissect_query("foo+bar=1&city=%C3%B6rebro").
              [{"foo bar","1"},{"city","örebro"}]
	      2> uri_string:dissect_query(<<"foo+bar=1&city=%26%2326481%3B%26%2320140%3B">>).
	      [{<<"foo bar">>,<<"1">>},
	       {<<"city">>,<<230,157,177,228,186,172>>}]
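
              Since the result is a list of two-element tuples, it can be
              searched with the standard proplists module. The sketch
              below looks up one parameter of a URI; the module name
              uri_query_lookup and the function param/2 are hypothetical.

```erlang
-module(uri_query_lookup).
-export([param/2]).

%% Return the value of query parameter Key in URI, 'undefined' if the
%% URI has no query or the key is absent, or an error tuple.
%% Key must have the same type (list or binary) as the URI.
%% param/2 is a hypothetical helper, not part of uri_string.
param(URI, Key) ->
    case uri_string:parse(URI) of
        {error, _, _} = Error ->
            Error;
        #{query := Query} ->
            case uri_string:dissect_query(Query) of
                {error, _, _} = Error -> Error;
                Pairs -> proplists:get_value(Key, Pairs)
            end;
        #{} ->
            undefined
    end.
```

              For example, uri_query_lookup:param(
              "http://example.com/?foo+bar=1&city=%C3%B6rebro", "foo bar")
              should return "1".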

       normalize(URI) -> NormalizedURI

	      Types:

		 URI = uri_string() | uri_map()
		 NormalizedURI = uri_string() |	error()

              Transforms a URI into a normalized form using Syntax-Based
              Normalization as defined by RFC 3986.

	      This function implements	case  normalization,  percent-encoding
	      normalization,  path segment normalization and scheme based nor-
	      malization for HTTP(S) with basic	support	for FTP, SSH, SFTP and
	      TFTP.

	      Example:

	      1> uri_string:normalize("/a/b/c/./../../g").
	      "/a/g"
	      2> uri_string:normalize(<<"mid/content=5/../6">>).
	      <<"mid/6">>
	      3> uri_string:normalize("http://localhost:80").
              "http://localhost/"
	      4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}).
	      "http://localhost-%C3%B6rebro/a/g"
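
              One common use of normalization is comparing two URIs for
              equivalence. The sketch below treats URIs as equivalent when
              their normalized forms are identical; the module name
              uri_equiv and the function equivalent/2 are hypothetical,
              and this is a simplification rather than a complete
              equivalence test.

```erlang
-module(uri_equiv).
-export([equivalent/2]).

%% Compare two URIs by their normalized forms. URIs that fail to
%% normalize are never considered equivalent. Both arguments should
%% have the same type, because a list result never equals a binary.
%% equivalent/2 is a hypothetical helper, not part of uri_string.
equivalent(A, B) ->
    case {uri_string:normalize(A), uri_string:normalize(B)} of
        {{error, _, _}, _} -> false;
        {_, {error, _, _}} -> false;
        {Same, Same}       -> true;
        _                  -> false
    end.
```

              With this sketch, "http://localhost:80" and
              "http://localhost/" compare as equivalent, because both
              normalize to "http://localhost/".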

       normalize(URI, Options) -> NormalizedURI

	      Types:

		 URI = uri_string() | uri_map()
		 Options = [return_map]
		 NormalizedURI = uri_string() |	uri_map()

              Same as normalize/1 but with an additional Options parameter
              that controls whether the normalized URI is returned as a
              uri_map(). There is one supported option: return_map.

	      Example:

	      1> uri_string:normalize("/a/b/c/./../../g", [return_map]).
	      #{path =>	"/a/g"}
	      2> uri_string:normalize(<<"mid/content=5/../6">>,	[return_map]).
	      #{path =>	<<"mid/6">>}
	      3> uri_string:normalize("http://localhost:80", [return_map]).
	      #{scheme => "http",path => "/",host => "localhost"}
	      4> uri_string:normalize(#{scheme => "http",port => 80,path => "/a/b/c/./../../g",
              4> host => "localhost-örebro"}, [return_map]).
              #{scheme => "http",path => "/a/g",host => "localhost-örebro"}

       parse(URIString)	-> URIMap

	      Types:

		 URIString = uri_string()
		 URIMap	= uri_map() | error()

	      Parses an	RFC 3986 compliant uri_string()	into a uri_map(), that
	      holds the	parsed components of the URI. If parsing fails,	an er-
	      ror tuple	is returned.

	      See also the opposite operation recompose/1.

	      Example:

	      1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
	      #{fragment => "nose",host	=> "example.com",
		path =>	"/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}
	      2> uri_string:parse(<<"foo://user@example.com:8042/over/there?name=ferret">>).
	      #{host =>	<<"example.com">>,path => <<"/over/there">>,
		port =>	8042,query => <<"name=ferret">>,scheme => <<"foo">>,
		userinfo => <<"user">>}

       recompose(URIMap) -> URIString

	      Types:

		 URIMap	= uri_map()
		 URIString = uri_string() | error()

	      Creates an RFC 3986 compliant URIString (percent-encoded), based
	      on the components	of URIMap. If the URIMap is invalid, an	 error
	      tuple is returned.

	      See also the opposite operation parse/1.

	      Example:

	      1> URIMap	= #{fragment =>	"nose",	host =>	"example.com", path => "/over/there",
	      1> port => 8042, query =>	"name=ferret", scheme => "foo",	userinfo => "user"}.
              #{fragment => "nose",host => "example.com",
                path => "/over/there",port => 8042,query => "name=ferret",
                scheme => "foo",userinfo => "user"}

	      2> uri_string:recompose(URIMap).
              "foo://user@example.com:8042/over/there?name=ferret#nose"

       transcode(URIString, Options) ->	Result

	      Types:

		 URIString = uri_string()
		 Options =
		     [{in_encoding, unicode:encoding()}	|
		      {out_encoding, unicode:encoding()}]
		 Result	= uri_string() | error()

              Transcodes an RFC 3986 compliant URIString, where Options is a
              list of tagged tuples specifying the inbound (in_encoding) and
              outbound (out_encoding) encodings. in_encoding and out_encoding
              each specify both the binary encoding and the percent-encoding
              of the input and output data, respectively. Mixed encoding,
              where the binary encoding differs from the percent-encoding,
              is not supported. If an argument is invalid, an error tuple is
              returned.

	      Example:

	      1> uri_string:transcode(<<"foo%00%00%00%F6bar"/utf32>>,
	      1> [{in_encoding,	utf32},{out_encoding, utf8}]).
	      <<"foo%C3%B6bar"/utf8>>
	      2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
	      2> {out_encoding,	utf8}]).
	      "foo%C3%B6bar"

Ericsson AB			  stdlib 3.8			 uri_string(3)
