Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages


home | help
Catalyst::UTF8(3)     User Contributed Perl Documentation    Catalyst::UTF8(3)

       Catalyst::UTF8 -	All About UTF8 and Catalyst Encoding

       Starting	in 5.90080 Catalyst will enable	UTF8 encoding by default for
       text like body responses.  In addition we've made a ton of fixes	around
       encoding	and utf8 scattered throughout the codebase.  This document
       attempts	to give	an overview of the assumptions and practices that
       Catalyst	uses when dealing with UTF8 and	encoding issues.  You should
       also review the Changes file, Catalyst::Delta and Catalyst::Upgrading
       for more.

       We attempt to describe all relevant processes, try to give some advice
       and explain where we may	have been exceptional to respect our
       commitment to backwards compatibility.

UTF8 in	Controller Actions
       Using UTF8 characters in	your Controller	classes	and actions.

       In this section we will review changes to how UTF8 characters can be
       used in controller actions, how it looks	in the debugging screens (and
       your logs) as well as how you construct URL objects to actions with
       UTF8 paths (or using UTF8 args or captures).

   Unicode in Controllers and URLs
	   package MyApp::Controller::Root;

	   use utf8;
	   use base 'Catalyst::Controller';

	   sub heart_with_arg :Path('aY') Args(1)  {
	     my	($self,	$c, $arg) = @_;

	   sub base :Chained('/') CaptureArgs(0) {
	     my	($self,	$c) = @_;

	     sub capture :Chained('base') PathPart('aY') CaptureArgs(1)	{
	       my ($self, $c, $capture)	= @_;

	       sub arg :Chained('capture') PathPart('aY') Args(1) {
		 my ($self, $c,	$arg) =	@_;

       In the example controller above we have constructed two matchable URL


       The first one is	a classic Path type action and the second uses
       Chaining, and spans three actions in total.  As you can see, you	can
       use unicode characters in your Path and PathPart	attributes (remember
       to use the "utf8" pragma	to allow these multibyte characters in your
       source).	 The two constructed matchable routes would match the
       following incoming URLs:

	   (heart_with_arg) -> http://localhost/root/%E2%99%A5/{arg}
	   (base/capture/arg) -> http://localhost/base/%E2%99%A5/{capture}/%E2%99%A5/{arg}

       That path path "%E2%99%A5" is url encoded unicode (assuming you are
       hitting this with a reasonably modern browser).	Its basically what
       goes over HTTP when your	type a browser location	that has the unicode
       'heart' in it.  However we will use the unicode symbol in your
       debugging messages:

	   [debug] Loaded Path actions:
	   | Path				 | Private				|
	   | /root/aY/*				 | /root/heart_with_arg			 |

	   [debug] Loaded Chained actions:
	   | Path Spec				 | Private				|
	   | /base/aY/*/aY/*			   | /root/base	(0)			  |
	   |					 | -> /root/capture (1)			|
	   |					 | => /root/arg				|

       And if the requested URL	uses unicode characters	in your	captures or
       args (such as "http://localhost:/base/aY/aY/aY/aY") you should see the
       arguments and captures as their unicode characters as well:

	   [debug] Arguments are "aY"
	   [debug] "GET" request for "base/aY/aY/aY/aY"	from ""
	   | Action							| Time	    |
	   | /root/base							| 0.000080s |
	   | /root/capture						| 0.000075s |
	   | /root/arg							| 0.000755s |

       Again, remember that we are display the unicode character and using it
       to match	actions	containing such	multibyte characters BUT over HTTP you
       are getting these as URL	encoded	bytes.	For example if you looked at
       the PSGI	$env value for "REQUEST_URI" you would see (for	the above

	   REQUEST_URI => "/base/%E2%99%A5/%E2%99%A5/%E2%99%A5/%E2%99%A5"

       So on the incoming request we decode so that we can match and display
       unicode characters (after decoding the URL encoding).  This makes it
       straightforward to use these types of multibyte characters in your
       actions and see them incoming in	captures and arguments.	 Please	keep
       this in might if	you are	doing for example regular expression matching,
       length determination or other string comparisons, you will need to try
       these incoming variables	as though UTF8 strings.	 For example in	the
       following action:

	       sub arg :Chained('capture') PathPart('aY') Args(1) {
		 my ($self, $c,	$arg) =	@_;

       when $arg is "aY" you should expect "length($arg)" to be	1 since	it is
       indeed one character although it	will take more than one	byte to	store.

   UTF8	in constructing	URLs via $c->uri_for
       For the reverse (constructing meaningful	URLs to	actions	that contain
       multibyte characters in their paths or path parts, or when you want to
       include such characters in your captures	or arguments) Catalyst will do
       the right thing (again just remember to use the "utf8" pragma).

	   use utf8;
	   my $url = $c->uri_for( $c->controller('Root')->action_for('arg'), ['aY','aY']);

       When you	stringify this object (for use in a template, for example) it
       will automatically do the right thing regarding utf8 encoding and url


       Since again what	you want is a properly url encoded version of this.
       In this case your string	length will reflect URL	encoded	bytes, not the
       character length.  Ultimately what you want to send over	the wire via
       HTTP needs to be	bytes.

UTF8 in	GET Query and Form POST
       What Catalyst does with UTF8 in your GET	and classic HTML Form POST

   UTF8	in URL query and keywords
       The same	rules that we find in URL paths	also cover URL query parts.
       That is if one types a URL like this into the browser


       When this goes 'over the	wire' to your application server its going to
       be as percent encoded bytes:


       When Catalyst encounters	this we	decode the percent encoding and	the
       utf8 so that we can properly display this information (such as in the
       debugging logs or in a response.)

	   [debug] Query Parameters are:
	   | Parameter				 | Value				|
	   | aY					  | aYaY				   |

       All the values and keys that are	part of	$c->req->query_parameters will
       be utf8 decoded.	 So you	should not need	to do anything special to take
       those values/keys and send them to the body response (since as we will
       see later Catalyst will do all the necessary encoding for you).

       Again, remember that values of your parameters are now decode into
       Unicode strings.	 so for	example	you'd expect the result	of length to
       reflect the character length not	the byte length.

       Just like with arguments	and captures, you can use utf8 literals	(or
       utf8 strings) in	$c->uri_for:

	   use utf8;
	   my $url = $c->uri_for( $c->controller('Root')->action_for('example'), {'aY' => 'aYaY'});

       When you	stringify this object (for use in a template, for example) it
       will automatically do the right thing regarding utf8 encoding and url


       Since again what	you want is a properly url encoded version of this.
       Ultimately what you want	to send	over the wire via HTTP needs to	be
       bytes (not unicode characters).

       Remember	if you use any utf8 literals in	your source code, you should
       use the "use utf8" pragma.

       NOTE: Assuming UTF-8 in your query parameters and keywords may be an
       issue if	you have legacy	code where you created URL in templates
       manually	and used an encoding other than	UTF-8.	In these cases you may
       find versions of	Catalyst after 5.90080+	will incorrectly decode.  For
       backwards compatibility we offer	three configurations settings, here
       described in order of precedence:


       If true,	then do	not try	to character decode any	wide characters	in
       your request URL	query or keywords.  You	will need to handle this
       manually	in your	action code (although if you choose this setting,
       chances are you already do this).


       This setting allows one to specify a fixed value	for how	to decode your
       query, instead of using the default, UTF-8.


       If this is true we decode using whatever	you set	"encoding" to.

   UTF8	in Form	POST
       In general most modern browsers will follow the specification, which
       says that POSTed	form fields should be encoded in the same way that the
       document	was served with.  That means that if you are using modern
       Catalyst	and serving UTF8 encoded responses, a browser is supposed to
       notice that and encode the form POSTs accordingly.

       As a result since Catalyst now serves UTF8 encoded responses by
       default,	this means that	you can	mostly rely on incoming	form POSTs to
       be so encoded.  Catalyst	will make this assumption and decode
       accordingly (unless you explicitly turn off encoding...)	 If you	are
       running Catalyst	in developer debug, then you will see the correct
       unicode characters in the debug output.	For example if you generate a
       POST request:

	   use Catalyst::Test 'MyApp';
	   use utf8;

	   my $res = request POST "/example/posted", ['aY'=>'aY', 'aYaY'=>'aY'];

       Running in CATALYST_DEBUG=1 mode	you should see output like this:

	   [debug] Body	Parameters are:
	   | Parameter				 | Value				|
	   | aY					  | aY					  |
	   | aYaY				   | aY					   |

       And if you had a	controller like	this:

	   package MyApp::Controller::Example;

	   use base 'Catalyst::Controller';

	   sub posted :POST Local {
	       my ($self, $c) =	@_;
	       $c->res->body("hearts =>	${\$c->req->post_parameters->{aY}}");

       The following test case would be	true:

	   use Encode 2.21 'decode_utf8';
	   is decode_utf8($req->content), 'hearts => aY';

       In this case we decode so that we can print and compare strings with
       multibyte characters.

       NOTE  In	some cases some	browsers may not follow	the specification and
       set the form POST encoding based	on the server response.	 Catalyst
       itself doesn't attempt any workarounds, but one common approach is to
       use a hidden form field with a UTF8 value (You might be familiar	with
       this from how Ruby on Rails has HTML form helpers that do that
       automatically).	In that	case some browsers will	send UTF8 encoded if
       it notices the hidden input field contains such a character.  Also, you
       can add an HTML attribute to your form tag which	many modern browsers
       will respect to set the encoding	(accept-charset="utf-8").  And lastly
       there are some javascript based tricks and workarounds for even more
       odd cases (just search the web for this will return a number of
       approaches.  Hopefully as more compliant	browsers become	popular	these
       edge cases will fade.

       NOTE  It	is possible for	a form POST multipart response (normally a
       file upload) to contain inline content with mixed content character
       sets and	encoding.  For example one might create	a POST like this:

	   use utf8;
	   use HTTP::Request::Common;

	   my $utf8 = 'test aY';
	   my $shiftjs = 'test aa^1a';
	   my $req = POST '/root/echo_arg',
	       Content_Type => 'form-data',
		 Content =>  [
		   arg0	=> 'helloworld',
		   Encode::encode('UTF-8','aY')	=> Encode::encode('UTF-8','aYaY'),
		   arg1	=> [
		     undef, '',
		     'Content-Type' =>'text/plain; charset=UTF-8',
		     'Content' => Encode::encode('UTF-8', $utf8)],
		   arg2	=> [
		     undef, '',
		     'Content-Type' =>'text/plain; charset=SHIFT_JIS',
		     'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],
		   arg2	=> [
		     undef, '',
		     'Content-Type' =>'text/plain; charset=SHIFT_JIS',
		     'Content' => Encode::encode('SHIFT_JIS', $shiftjs)],

       In this case we've created a POST request but each part specifies its
       own content character set (and setting a	content	encoding would also be
       possible).  Generally one would not run into this situation in a	web
       browser context but for completeness sake Catalyst will notice if a
       multipart POST contains parts with complex or extended header
       information.  In	these cases we will try	to inspect the meta data and
       do the right thing (in the above	case we'd use SHIFT_JIS	to decode, not
       UTF-8).	However	if after inspecting the	headers	we cannot figure out
       how to decode the data, in those	cases it will not attempt to apply
       decoding	to the form values.  Instead the part will be represented as
       an instance of an object	Catalyst::Request::PartData which will contain
       all the header information needed for you to perform custom parser of
       the data.

       Ideally we'd fix	Catalyst to be smarter about decoding so please	submit
       your cases of this so we	can add	intelligence to	the parser and find a
       way to extract a	valid value out	of it.

UTF8 Encoding in Body Response
       When does Catalyst encode your response body and	what rules does	it use
       to determine when that is needed.

	   use utf8;
	   use warnings;
	   use strict;

	   package MyApp::Controller::Root;

	   use base 'Catalyst::Controller';
	   use File::Spec;

	   sub scalar_body :Local {
	       my ($self, $c) =	@_;
	       $c->response->body("<p>This is scalar_body action aY</p>");

	   sub stream_write :Local {
	       my ($self, $c) =	@_;
	       $c->response->write("<p>This is stream_write action aY</p>");

	   sub stream_write_fh :Local {
	       my ($self, $c) =	@_;

	       my $writer = $c->res->write_fh;
	       $writer->write_encoded('<p>This is stream_write_fh action aY</p>');

	   sub stream_body_fh :Local {
	       my ($self, $c) =	@_;
	       my $path	= File::Spec->catfile('t', 'utf8.txt');
	       open(my $fh, '<', $path)	|| die "trouble: $!";

       Beginning with Catalyst version 5.90080 You no longer need to set the
       encoding	configuration (although	doing so won't hurt anything).

       Currently we only encode	if the content type is one of the types	which
       generally expects a UTF8	encoding.  This	is determined by the following
       regular expression:

	   our $DEFAULT_ENCODE_CONTENT_TYPE_MATCH = qr{text|xml$|javascript$};
	   $c->response->content_type =~ /$DEFAULT_ENCODE_CONTENT_TYPE_MATCH/

       This is a global	variable in Catalyst::Response which is	stored in the
       "encodable_content_type"	attribute of $c->response.  You	may currently
       alter this directly on the response or globally.	 In the	future we may
       offer a configuration setting for this.

       This would match	content-types like the following (examples)


       You should set your content type	prior to header	finalization if	you
       want Catalyst to	encode.

       NOTE We do not attempt to encode	"application/json" since the two most
       commonly	used approaches	(Catalyst::View::JSON and
       Catalyst::Action::REST) have already configured their JSON encoders to
       produce properly	encoding UTF8 responses.  If you are rolling your own
       JSON encoding, you may need to set the encoder to do the	right thing
       (or override the	global regular expression to include the JSON media

   Encoding with Scalar	Body
       Catalyst	supports several methods of supplying your response with body
       content.	 The first and currently most common is	to set the
       Catalyst::Response ->body with a	scalar string (	as in the example):

	   use utf8;

	   sub scalar_body :Local {
	       my ($self, $c) =	@_;
	       $c->response->body("<p>This is scalar_body action aY</p>");

       In general you should need to do	nothing	else since Catalyst will
       automatically encode this string	during body finalization.  The only
       matter to watch out for is to make sure the string has not already been
       encoded,	as this	will result in double encoding errors.

       NOTE pay	attention to the content-type setting in the example.
       Catalyst	inspects that content type carefully to	determine if the body
       needs encoding).

       NOTE If you set the character set of the	response Catalyst will skip
       encoding	IF the character set is	set to something that doesn't match
       $c->encoding->mime_name.	We will	assume if you are setting an
       alternative character set, that means you want to handle	the encoding
       yourself.  However it might be easier to	set $c->encoding for a given
       response	cycle since you	can override this for a	given response.	 For
       example here's how to override the default encoding and set the correct
       character set in	the response:

	   sub override_encoding :Local	{
	       my ($self, $c) =	@_;

       This will use the alternative encoding for a single response.

       NOTE If you manually set	the content-type character set to whatever
       $c->encoding->mime_name is set to, we STILL encode, rather than assume
       your manual setting is a	flag to	override.  This	is done	to support
       backward	compatible assumptions (in particular Catalyst::View::TT has
       set a utf-8 character set in its	default	content-type for ages, even
       though it does not itself do any	encoding on the	body response).	 If
       you are going to	handle encoding	manually you may set
       $c->clear_encoding for a	single request response	cycle, or as in	the
       above example set an alternative	encoding.

   Encoding with streaming type	responses
       Catalyst	offers two approaches to streaming your	body response.	Again,
       you must	remember to set	your content type prior	to streaming, since
       invoking	a streaming response will automatically	finalize and send your
       HTTP headers (and your content type MUST	be one that matches the
       regular expression given	above.)

       Also, if	you are	going to override $c->encoding (or invoke
       $c->clear_encoding), you	should do that before anything else!

       The first streaming method is to	use the	"write"	method on the response
       object.	This method allows 'inlined' streaming and is generally	used
       with blocking style servers.

	   sub stream_write :Local {
	       my ($self, $c) =	@_;
	       $c->response->write("<p>This is stream_write action aY</p>");

       You may call the	"write"	method as often	as you need to finish
       streaming all your content.  Catalyst will encode each line in turn as
       long as the content-type	meets the 'encodable types' requirement	and
       $c->encoding is set (which it is, as long as you	did not	change it).

       NOTE If you try to change the encoding after you	start the stream, this
       will invoke an error response.  However since you've already started
       streaming this will not show up as an HTTP error	status code, but
       rather error information	in your	body response and an error in your

       NOTE If you use ->body AFTER using ->write (for example you may do this
       to write	your HTML HEAD information as fast as possible)	we expect the
       contents	to body	to be encoded as it normally would be if you never
       called ->write.	In general unless you are doing	weird custom stuff
       with encoding this is likely to just already do the correct thing.

       The second way to stream	a response is to get the response writer
       object and invoke methods on that directly:

	   sub stream_write_fh :Local {
	       my ($self, $c) =	@_;

	       my $writer = $c->res->write_fh;
	       $writer->write_encoded('<p>This is stream_write_fh action aY</p>');

       This can	be used	just like the "write" method, but typically you
       request this object when	you want to do a nonblocking style response
       since the writer	object can be closed over or sent to a model that will
       invoke it in a non blocking manner.  For	more on	using the writer
       object for non blocking responses you should review the "Catalyst"
       documentation and also you can look at several articles from last years
       advent, in particular:


       The main	difference this	year is	that previously	calling	->write_fh
       would return the	actual Plack writer object that	was supplied by	your
       Plack application handler, whereas now we wrap that object in a
       lightweight decorator object that proxies the "write" and "close"
       methods and supplies an additional "write_encoded" method.
       "write_encoded" does the	exact same thing as "write" except that	it
       will first encode the string when necessary.  In	general	if you are
       streaming encodable content such	as HTML	this is	the method to use.  If
       you are streaming binary	content, you should just use the "write"
       method (although	if the content type is set correctly we	would skip
       encoding	anyway,	but you	may as well avoid the extra noop overhead).

       The last	style of content response that Catalyst	supports is setting
       the body	to a filehandle	like object.  In this case the object is
       passed down to the Plack	application handler directly and currently we
       do nothing to set encoding.

	   sub stream_body_fh :Local {
	       my ($self, $c) =	@_;
	       my $path	= File::Spec->catfile('t', 'utf8.txt');
	       open(my $fh, '<', $path)	|| die "trouble: $!";

       In this example we create a filehandle to a text	file that contains
       UTF8 encoded characters.	We pass	this down without modification,	which
       I think is correct since	we don't want to double	encode.	 However this
       may change in a future development release so please be sure to double
       check the current docs and changelog.  Its possible a future release
       will require you	to to set a encoding on	the IO layer level so that we
       can be sure to properly encode at body finalization.  So	this is	still
       an edge case we are writing test	examples for.  But for now if you are
       returning a filehandle like response, you are expected to make sure you
       are following the PSGI specification and	return raw bytes.

   Override the	Encoding on Context
       As already noted	you may	change the current encoding (or	remove it) by
       setting an alternative encoding on the context;


       Please note that	you can	continue to change encoding UNTIL the headers
       have been finalized.  The last setting always wins.  Trying to change
       encoding	after header finalization is an	error.

   Setting the Content Encoding	HTTP Header
       In some cases you may set a content encoding on your response.  For
       example if you are encoding your	response with gzip.  In	this case you
       are again on your own.  If we notice that the content encoding header
       is set when we hit finalization,	we skip	automatic encoding:

	   use Encode;
	   use Compress::Zlib;
	   use utf8;

	   sub gzipped :Local {
	       my ($self, $c) =	@_;


		   Encode::encode_utf8("manual_1 aY")));

       If you are using	Catalyst::Plugin::Compress you need to upgrade to the
       most recent version in order to be compatible with changes introduced
       in Catalyst 5.90080.  Other plugins may require updates (please open
       bugs if you find	them).

       NOTE Content encoding may be set	to 'identify' and we will still
       perform automatic encoding if the content type is encodable and an
       encoding	is present for the context.

   Using Common	Views
       The following common views have been updated so that their tests	pass
       with default UTF8 encoding for Catalyst:

       Catalyst::View::TT, Catalyst::View::Mason, Catalyst::View::HTML::Mason,

       See Catalyst::Upgrading for additional information on Catalyst
       extensions that require upgrades.

       In generally for	the common views you should not	need to	do anything
       special.	 If your actual	template files contain UTF8 literals you
       should set configuration	on your	View to	enable that.  For example in
       TT, if your template has	actual UTF8 character in it you	should do the

	   MyApp::View::TT->config(ENCODING => 'utf-8');

       However Catalyst::View::Xslate wants to do the UTF8 encoding for	you
       (We assume that the authors of that view	did this as a workaround to
       the fact	that until now encoding	was not	core to	Catalyst.  So if you
       use that	view, you either need to tell it to not	encode,	or you need to
       turn off	encoding for Catalyst.

	   MyApp::View::Xslate->config(encode_body => 0);



       Preference is to	disable	it in the View.

       Other views may be similar.  You	should review View documentation and
       test during upgrading.  We tried	to make	sure most common views worked
       properly	and noted all workaround but if	we missed something please
       alert the development team (instead of introducing a local hack into
       your application	that will mean nobody will ever	upgrade	it...).

   Setting the response	from an	external PSGI application.
       Catalyst::Response allows one to	set the	response from an external PSGI
       application.  If	you do this, and that external application sets	a
       character set on	the content-type, we "clear_encoding" for the rest of
       the response.  This is done to prevent double encoding.

       NOTE Even if the	character set of the content type is the same as the
       encoding	set in $c->encoding, we	still skip encoding.  This is a
       regrettable difference from the general rule outlined above, where if
       the current character set is the	same as	the current encoding, we
       encode anyway.  Nevertheless I think this is the	correct	behavior since
       the earlier rule	exists only to support backward	compatibility with

       In general if you want Catalyst to handle encoding, you should avoid
       setting the content type	character set since Catalyst will do so
       automatically based on the requested response encoding.	Its best to
       request alternative encodings by	setting	$c->encoding and if you
       really want manual control of encoding you should always
       $c->clear_encoding so that programmers that come	after you are very
       clear as	to your	intentions.

   Disabling default UTF8 encoding
       You may encounter issues	with your legacy code running under default
       UTF8 body encoding.  If so you can disable this with the	following
       configurations setting:


       Where "MyApp" is	your Catalyst subclass.

       If you do not wish to disable all the Catalyst encoding features, you
       may disable specific features via two additional	configuration options:
       'skip_body_param_unicode_decoding' and
       'skip_complex_post_part_handling'.  The first will skip any attempt to
       decode POST parameters in the creating of body parameters and the
       second will skip	creation of instances of Catalyst::Request::PartData
       in the case that	the multipart form upload contains parts with a	mix of
       content character sets.

       If you believe you have discovered a bug	in UTF8	body encoding, I
       strongly	encourage you to report	it (and	not try	to hack	a workaround
       in your local code).  We	also recommend that you	regard such a
       workaround as a temporary solution.  It is ideal	if Catalyst extension
       authors can start to count on Catalyst doing the	right thing for

       This document has attempted to be a complete review of how UTF8 and
       encoding	works in the current version of	Catalyst and also to document
       known issues, gotchas and backward compatible hacks.  Please report
       issues to the development team.

       John Napiorkowski <>

perl v5.32.1			  2020-07-26		     Catalyst::UTF8(3)

Name | Description | UTF8 in Controller Actions | UTF8 in GET Query and Form POST | UTF8 Encoding in Body Response | Conclusion | Author

Want to link to this manual page? Use this URL:

home | help