Unicode::Stringprep(3)  User Contributed Perl Documentation  Unicode::Stringprep(3)

       Unicode::Stringprep - Preparation of Internationalized Strings
       (RFC 3454)

         use Unicode::Stringprep;
         use Unicode::Stringprep::Mapping;
         use Unicode::Stringprep::Prohibited;

         my $prepper = Unicode::Stringprep->new(
           3.2,                        # Unicode version (must be 3.2)
           [ { 32 => '<SPACE>'},  ],   # mapping tables
           'KC',                       # Unicode normalization form
           [ @Unicode::Stringprep::Prohibited::C12, @Unicode::Stringprep::Prohibited::C22,
             @Unicode::Stringprep::Prohibited::C3, @Unicode::Stringprep::Prohibited::C4,
             @Unicode::Stringprep::Prohibited::C5, @Unicode::Stringprep::Prohibited::C6,
             @Unicode::Stringprep::Prohibited::C7, @Unicode::Stringprep::Prohibited::C8,
             @Unicode::Stringprep::Prohibited::C9 ],
           1, 0 );                     # bidi check on, unassigned check off
         $output = $prepper->($input);

       This module implements the stringprep framework for preparing Unicode
       text strings in order to increase the likelihood that string input and
       string comparison work in ways that make sense for typical users
       throughout the world.  The stringprep protocol is useful for protocol
       identifier values, company and personal names, internationalized domain
       names, and other text strings.

       The stringprep framework does not specify how protocols should prepare
       text strings. Protocols must create profiles of stringprep in order to
       fully specify the processing options.

       This module provides a single function, "new", that creates a Perl
       function implementing a stringprep profile.

       This module exports nothing.

       new($unicode_version, $mapping_tables, $unicode_normalization,
       $prohibited_tables, $bidi_check, $unassigned_check)
           Creates a "bless"ed function reference that implements a stringprep
           profile.

           This function takes the following parameters:

           $unicode_version
               The Unicode version specified by the stringprep profile.

               Currently, this parameter must be 3.2 (numeric).

           $mapping_tables
               The mapping tables used for stringprep.

               The parameter may be a reference to a hash or an array, or
               "undef". A hash must map Unicode codepoints (as integers,
               e.g. 0x0020 for U+0020) to replacement strings (as Perl
               strings).  An array may contain pairs of Unicode codepoints and
               replacement strings as well as references to nested hashes and
               arrays.

               Unicode::Stringprep::Mapping provides the tables from
               RFC 3454, Appendix B.

               For further information on the mapping step, see RFC 3454,
               section 3.

           $unicode_normalization
               The Unicode normalization to be used.

               Currently, "undef"/'' (no normalization) and 'KC'
               (compatibility composed) are specified for stringprep.

               For further information on the normalization step, see
               RFC 3454, section 4.

               Normalization form KC also enables checks for some problem
               sequences for which the normalization cannot be implemented in
               an interoperable way.

               For more information, see "CAVEATS" below.

           $prohibited_tables
               The list of prohibited output characters for stringprep.

               The parameter may be a reference to an array, or "undef". The
               array contains pairs of codepoints, which define the start and
               end of a Unicode character range (as integers). The end
               character may be "undef", specifying a single-character range.
               The array may also contain references to nested arrays.

               Unicode::Stringprep::Prohibited provides the tables from
               RFC 3454, Appendix C.

               For further information on the prohibition checking step, see
               RFC 3454, section 5.

           $bidi_check
               Whether to employ checks for confusing bidirectional text. A
               boolean value.

               For further information on the bidi checking step, see
               RFC 3454, section 6.

           $unassigned_check
               Whether to check for and prohibit unassigned characters. A
               boolean value.

               The check must be used when creating stored strings. It should
               not be used for query strings; skipping it there increases the
               chance that newly assigned characters work as expected.

               For further information on stored and query strings, see
               RFC 3454, section 7.
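
           The table formats described above can be sketched as follows. This
           is only an illustration: the codepoints, the table contents, and
           the $prep variable are assumptions made for this example and do not
           correspond to any registered stringprep profile.

```perl
use strict;
use warnings;
use Unicode::Stringprep;

# Illustrative profile only; these tables are made up for this sketch.
my $prep = Unicode::Stringprep->new(
  3.2,                      # Unicode version
  [ { 0x200B => '' },       # hash entry: U+200B ZERO WIDTH SPACE maps to nothing
    0x017F, 's',            # codepoint/string pair: U+017F maps to "s"
  ],
  '',                       # no normalization
  [ 0x0000, 0x001F,         # range U+0000..U+001F
    0x007F, undef,          # single character U+007F
    [ 0x0080, 0x009F ],     # nested array holding one more range
  ],
  0, 0,                     # no bidi check, no unassigned check
);

# Both mapped characters are replaced before the prohibition check runs.
my $mapped = $prep->("co\x{200B}op \x{17F}ail");
print "$mapped\n";
```

           Mixing a hash with plain pairs in one array keeps small custom
           mappings close to the profile definition, while nested arrays
           allow whole tables to be reused by reference.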

           The function returned can be called with a single parameter, the
           string to be prepared, and returns the prepared string. It will die
           if the input string cannot be successfully prepared because it
           would contain invalid output (so use "eval" if necessary).
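
           A minimal sketch of trapping such a failure with "eval" is shown
           here. The profile and the try_prepare() helper are hypothetical
           and exist only for this example; the only prohibited table used is
           C.1.2 (non-ASCII space characters).

```perl
use strict;
use warnings;
use Unicode::Stringprep;
use Unicode::Stringprep::Prohibited;

# Hypothetical profile: no mapping, no normalization, table C.1.2 prohibited.
my $prep = Unicode::Stringprep->new(
  3.2, undef, '', [ \@Unicode::Stringprep::Prohibited::C12 ], 0, 0,
);

# try_prepare() is a helper invented for this sketch. It returns the
# prepared string, or undef if the preparation function died.
sub try_prepare {
  my ($string) = @_;
  my $result = eval { $prep->($string) };
  return $result;
}

my $good = try_prepare("plain text");
my $bad  = try_prepare("wide\x{3000}space");   # U+3000 IDEOGRAPHIC SPACE

print defined $good ? "accepted: $good\n" : "rejected\n";
print defined $bad  ? "accepted\n"        : "rejected\n";
```

           Returning undef keeps the caller simple; code that needs the reason
           for the failure can inspect $@ right after the "eval" instead.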

           For performance reasons, it is strongly recommended to call the
           "new" function as few times as possible, i.e. exactly once per
           stringprep profile. It might also be better not to use this module
           directly but to use (or write) a module implementing a profile,
           such as Authen::SASL::SASLprep.

       You can easily implement	a stringprep profile without subclassing:

	 package ACME::ExamplePrep;

	 use Unicode::Stringprep;

	 use Unicode::Stringprep::Mapping;
	 use Unicode::Stringprep::Prohibited;

         *exampleprep = Unicode::Stringprep->new(
           3.2,                                       # Unicode version
           [ \@Unicode::Stringprep::Mapping::B1, ],
           '',                                        # no normalization
           [ \@Unicode::Stringprep::Prohibited::C12,
             \@Unicode::Stringprep::Prohibited::C22, ],
           1,                                         # enable bidi check
         );

       This binds "ACME::ExamplePrep::exampleprep" to the function created by
       "Unicode::Stringprep->new".

       Usually, it is not necessary to subclass this module. Subclassing this
       module is not recommended.
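
       A profile module may want one function for stored strings and one for
       query strings, which differ only in the unassigned check. The following
       is a sketch under that assumption; the package name, the function names
       exampleprep_stored and exampleprep_query, and the choice of tables are
       hypothetical.

```perl
package ACME::ExamplePrep;   # hypothetical package name

use strict;
use warnings;
use Unicode::Stringprep;
use Unicode::Stringprep::Mapping;
use Unicode::Stringprep::Prohibited;

# Arguments shared by both variants: Unicode version, mapping tables,
# normalization (none), prohibited tables, bidi check.
my @profile = (
  3.2,
  [ \@Unicode::Stringprep::Mapping::B1 ],
  '',
  [ \@Unicode::Stringprep::Prohibited::C12,
    \@Unicode::Stringprep::Prohibited::C22 ],
  1,
);

# "new" is called once per variant, when the package is loaded.
*exampleprep_stored = Unicode::Stringprep->new(@profile, 1);  # unassigned check on
*exampleprep_query  = Unicode::Stringprep->new(@profile, 0);  # unassigned check off

package main;

# U+200B is mapped to nothing by table B.1 in both variants.
my $stored = ACME::ExamplePrep::exampleprep_stored("zero\x{200B}width");
my $query  = ACME::ExamplePrep::exampleprep_query("zero\x{200B}width");
print "$stored\n$query\n";

# U+0221 is listed as unassigned in table A.1, so only the stored variant dies.
my $stored_ok = defined eval { ACME::ExamplePrep::exampleprep_stored("\x{221}") };
my $query_ok  = defined eval { ACME::ExamplePrep::exampleprep_query("\x{221}") };
print "stored: ", ($stored_ok ? "accepted" : "rejected"), "\n";
print "query: ",  ($query_ok  ? "accepted" : "rejected"), "\n";
```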

       The following modules contain the data tables from RFC 3454.  These
       modules are automatically loaded when loading "Unicode::Stringprep".

       o   Unicode::Stringprep::Unassigned

	     @Unicode::Stringprep::Unassigned::A1  # Appendix A.1

       o   Unicode::Stringprep::Mapping

	     @Unicode::Stringprep::Mapping::B1	   # Appendix B.1
	     @Unicode::Stringprep::Mapping::B2	   # Appendix B.2
	     @Unicode::Stringprep::Mapping::B3	   # Appendix B.3

       o   Unicode::Stringprep::Prohibited

	     @Unicode::Stringprep::Prohibited::C11 # Appendix C.1.1
	     @Unicode::Stringprep::Prohibited::C12 # Appendix C.1.2
	     @Unicode::Stringprep::Prohibited::C21 # Appendix C.2.1
	     @Unicode::Stringprep::Prohibited::C22 # Appendix C.2.2
	     @Unicode::Stringprep::Prohibited::C3  # Appendix C.3
	     @Unicode::Stringprep::Prohibited::C4  # Appendix C.4
	     @Unicode::Stringprep::Prohibited::C5  # Appendix C.5
	     @Unicode::Stringprep::Prohibited::C6  # Appendix C.6
	     @Unicode::Stringprep::Prohibited::C7  # Appendix C.7
	     @Unicode::Stringprep::Prohibited::C8  # Appendix C.8
	     @Unicode::Stringprep::Prohibited::C9  # Appendix C.9

       o   Unicode::Stringprep::BiDi

	     @Unicode::Stringprep::BiDi::D1	   # Appendix D.1
	     @Unicode::Stringprep::BiDi::D2	   # Appendix D.2
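
       To illustrate the range layout documented for the prohibited tables
       (pairs of start and end codepoints, an undefined end for a single
       character, and nested array references), here is a small lookup
       sketch. The in_table() helper and the @sample data are invented for
       this example; they are not part of the module, and the sample is not
       a copy of any RFC 3454 table.

```perl
use strict;
use warnings;

# in_table() walks a list of (start, end) codepoint pairs. An undefined end
# means a single-character range; array references are searched recursively.
sub in_table {
  my ($codepoint, @table) = @_;
  while (@table) {
    my $start = shift @table;
    if (ref $start eq 'ARRAY') {
      return 1 if in_table($codepoint, @$start);
      next;
    }
    my $end = shift @table;
    $end = $start unless defined $end;
    return 1 if $codepoint >= $start && $codepoint <= $end;
  }
  return 0;
}

# Made-up sample data in the documented layout.
my @sample = ( 0x0080, 0x009F, 0x00A0, undef, [ 0x2000, 0x200B ] );

printf "U+%04X: %s\n", $_, in_table($_, @sample) ? "listed" : "not listed"
  for 0x0041, 0x00A0, 0x2005;
```

       A linear scan like this is only meant for inspecting tables by hand;
       the preparation function created by "new" should be used for actual
       string checks.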

CAVEATS
       In Unicode 3.2 to 4.0.1, the specification of UAX #15: Unicode
       Normalization Forms for forms NFC and NFKC is not logically self-
       consistent.  This has been fixed in Corrigendum #5.

       Unfortunately, this yields two ways to implement NFC and NFKC in
       Unicode 3.2, on which the Stringprep standard is based: one based on a
       literal interpretation of the original specification and one based on
       the corrected specification. The output of these implementations
       differs for a small class of strings, none of which can appear in
       meaningful text. See UAX #15, section 19, for details.

       This module will check for these strings and, if normalization is done,
       prohibit them in output as it is not possible to interoperate under
       these circumstances.

       Please note that due to this, the normalization step may cause the
       preparation to fail. That is, the preparation function may die even if
       there are no prohibited characters and no checks for bidi sequences and
       unassigned characters, which may be surprising.

       Claus Färber

       Copyright 2007-2009 Claus Färber.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

       Unicode::Normalize, RFC 3454

perl v5.32.1			  2014-09-03		Unicode::Stringprep(3)

