LaTeXML::Package(3)   User Contributed Perl Documentation  LaTeXML::Package(3)

       "LaTeXML::Package" - Support for package implementations and document
       customization.

       This package defines and	exports	most of	the procedures users will need
       to customize or extend LaTeXML. The LaTeXML implementation of some
       package might look something like the following,	but see	the installed
       "LaTeXML/Package" directory for realistic examples.

         package LaTeXML::Package::Pool;  # to put new subs & variables in common pool
         use LaTeXML::Package;            # to load these definitions
         use strict;                      # good style
         use warnings;
         # Load "anotherpackage"
         RequirePackage('anotherpackage');
         # A simple macro, just like in TeX
         DefMacro('\thesection', '\thechapter.\roman{section}');
         # A constructor defines how a control sequence generates XML:
         DefConstructor('\thanks{}', "<ltx:thanks>#1</ltx:thanks>");
         # And a simple environment ...
         DefEnvironment('{abstract}', '<ltx:abstract>#body</ltx:abstract>');
         # A math symbol \Real to stand for the Reals:
         DefMath('\Real', "\x{211D}", role=>'ID');
         # Or a semantic floor:
         DefMath('\floor{}', '\left\lfloor#1\right\rfloor');
         # More esoteric ...
         # Use a RelaxNG schema
         RelaxNGSchema("MySchema");
         # Or use a special DocType if you have to:
         # DocType("rootelement",
         #         "-//Your Site//Your DocType",'your.dtd',
         #          prefix=>"http://whatever/");
         # Allow sometag elements to be automatically closed if needed
         Tag('prefix:sometag', autoClose=>1);
         # Don't forget this, so perl knows the package loaded.
         1;

       This module provides a large set	of utilities and declarations that are
       useful for writing `bindings': LaTeXML-specific implementations of a
       set of control sequences	such as	would be defined in a LaTeX style or
       class file. They are also useful for controlling and customizing
       LaTeXML's processing.  See the "See also" section, below, for
       additional lower-level modules imported & re-exported.

       To a limited extent (and	currently only when explicitly enabled),
       LaTeXML can process the raw TeX code found in style files.  However, to
       preserve	document structure and semantics, as well as for efficiency,
       it is usually necessary to supply a LaTeXML-specific `binding' for
       style and class files. For example, a binding "mypackage.sty.ltxml"
       would encode LaTeXML-specific implementations of	all the	control
       sequences in "mypackage.sty" so that "\usepackage{mypackage}" would
       work.  Similarly for "myclass.cls.ltxml".  Additionally, document-
       specific bindings can be supplied: before processing a TeX source file,
       e.g. "mydoc.tex", LaTeXML will automatically include the definitions
       and settings in "mydoc.latexml".  These ".ltxml" and ".latexml" files
       should be placed in LaTeXML's search paths, where it will find them:
       in the current directory, in a directory given to the --path option,
       or in a directory added to the variable SEARCHPATHS.

       Since LaTeXML mimics TeX, a familiarity with TeX's processing model is
       critical.  LaTeXML models: catcodes and tokens (See
       LaTeXML::Core::Token,  LaTeXML::Core::Tokens) which are extracted from
       the plain source	text characters	by the LaTeXML::Core::Mouth; "Macros",
       which are expanded within the LaTeXML::Core::Gullet; and	"Primitives",
       which are digested within the LaTeXML::Core::Stomach to produce
       LaTeXML::Core::Box, LaTeXML::Core::List.	 A key additional feature is
       the "Constructors": when	digested they generate a
       LaTeXML::Core::Whatsit which, upon absorption by
       LaTeXML::Core::Document,	inserts	text or	XML fragments in the final
       document	tree.

       Notation: Many of the following forms take code references as arguments
       or options.  That is, either a reference	to a defined sub, eg.
       "\&somesub", or an anonymous function "sub { ...	}".  To	document these
       cases, and the arguments	that are passed	in each	case, we'll use	a
       notation	like "code($stomach,...)".

   Control Sequences
       Many of the following forms define the behaviour	of control sequences.
       While in	TeX you'll typically only define macros, LaTeXML is
       effectively redefining TeX itself, so we	define "Macros"	as well	as
       "Primitives", "Registers", "Constructors" and "Environments".  These
       define the behaviour of these control sequences when processed during
       the various phases of LaTeXML's imitation of TeX's digestive tract.


       LaTeXML uses a more convenient method of specifying parameter patterns
       for control sequences. The first argument to each of these defining
       forms ("DefMacro", "DefPrimitive", etc) is a prototype consisting of
       the control sequence being defined along with the specification of
       parameters required by the control sequence.  Each parameter describes
       how to parse tokens following the control sequence into arguments or
       how to delimit them.  To simplify coding and capture common idioms in
       TeX/LaTeX programming, LaTeXML's parameter specifications are more
       expressive than TeX's "\def" or LaTeX's "\newcommand".  Examples of
       the prototypes for familiar TeX or LaTeX control sequences are:

	  DefPrimitive('\multiply Variable SkipKeyword:by Number',..
	  DefPrimitive('\newcommand OptionalMatch:* DefToken[]{}', ...

       The general syntax for parameter	specification is

       "{spec}"
           Reads a regular TeX argument.  spec can be omitted (ie. "{}").
           Otherwise spec is itself a parameter specification and the argument
           is reparsed accordingly.  ("{}" is a shorthand for "Plain".)

       "[spec]"
           Reads a LaTeX-style optional argument.  spec can be omitted (ie.
           "[]").  Otherwise, if spec is of the form Default:stuff, then stuff
           would be the default value.  Otherwise spec is itself a parameter
           specification and the argument, if supplied, is reparsed according
           to that specification.  ("[]" is a shorthand for "Optional".)

       "Type"
           Reads an argument of the given type, where either Type has been
           declared, or there exists a ReadType function accessible from
           LaTeXML::Package::Pool.  See the available types, below.

       "Type:value | Type:value1:value2..."
	   These forms invoke the parser for Type but pass additional Tokens
	   to the reader function.  Typically this would supply	defaults or
	   parameters to a match.

       "OptionalType"
           Similar to Type, but it is not considered an error if the reader
           returns undef.

       "SkipType"
           Similar to "Optional"Type, but the value returned from the reader
           is ignored, and does not occupy a position in the arguments list.

       The predefined argument Types are as follows.

       "Plain, Semiverbatim"

	   Reads a standard TeX	argument being either the next token, or if
	   the next token is an	{, the balanced	token list.  In	the case of
	   "Semiverbatim", many	catcodes are disabled, which is	handy for
	   URL's, labels and similar.

       "Token, XToken"

	   Read	a single TeX Token.  For "XToken", if the next token is
	   expandable, it is repeatedly	expanded until an unexpandable token
	   remains, which is returned.

       "Number,	Dimension, Glue	| MuGlue"

           Read an Object corresponding to Number, Dimension, Glue or MuGlue,
           using TeX's rules for parsing these objects.

       "Until:match | XUntil:match"

	   Reads tokens	until a	match to the tokens match is found, returning
	   the tokens preceding	the match. This	corresponds to TeX delimited
	   arguments.  For "XUntil", tokens are	expanded as they are matched
	   and accumulated (but	a brace	reads and accumulates till a matching
	   close brace,	without	expanding).


       "UntilBrace"

           Reads tokens until the next open brace "{".  This corresponds to
           the peculiar TeX construct "\def\foo#{...".

       "Match:match(|match)* | Keyword:match(|match)*"

	   Reads tokens	expecting a match to one of the	token lists match,
	   returning the one that matches, or undef.  For "Keyword", case and
	   catcode of the matches are ignored.	Additionally, any leading
	   spaces are skipped.


       "Balanced"

           Reads tokens until a closing }, but respecting nested {} pairs.

       "BalancedParen"

           Reads parenthesis-delimited tokens, but does not balance any
           nested parentheses.

       "Undigested, Digested, DigestUntil:match"

	   These types alter the usual sequence	of tokenization	and digestion
           in separate stages (like TeX).  An "Undigested" parameter inhibits
	   digestion completely	and remains in token form.  A "Digested"
	   parameter gets digested until the (required)	opening	{ is balanced;
	   this	is useful when the content would usually need to have been
	   protected in	order to correctly deal	with catcodes.	"DigestUntil"
	   digests tokens until	a token	matching match is found.


       "Variable"

           Reads a token, expanding if necessary, and expects a control
           sequence naming a writable register.  If such is found, it returns
           an array of the corresponding definition object, and any arguments
           required by that definition.

       "SkipSpaces, Skip1Space"

           Skips any number of space tokens ("SkipSpaces") or at most one
           ("Skip1Space"), if present, but contributes nothing to the
           argument list.
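
           As a rough illustration of how these parameter types combine in
           prototypes, the following binding fragment is a sketch only.  The
           control sequences \mynote, \mylink and \myemph are hypothetical,
           the package line assumes the LaTeXML::Package::Pool namespace named
           in the Type description, and the "ltx:ref" element with an "href"
           attribute is assumed to be acceptable to the schema in use.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# Hypothetical \mynote[label]{text}: "[]" reads an optional argument
# (empty if omitted), "{}" reads a regular TeX argument.
DefMacro('\mynote[]{}', '\textbf{#1} #2');

# Hypothetical \mylink{url}: Semiverbatim disables many catcodes, so
# characters such as % or _ inside the URL are read literally.
DefConstructor('\mylink Semiverbatim',
  "<ltx:ref href='#1'>#1</ltx:ref>");

# Hypothetical \myemph*{text}: OptionalMatch:* yields the matched tokens
# or undef; a code expansion returns the list of tokens to expand to.
DefMacro('\myemph OptionalMatch:* {}', sub {
    my ($gullet, $star, $text) = @_;
    my $cs = ($star ? T_CS('\textbf') : T_CS('\emph'));
    return ($cs, T_BEGIN, $text->unlist, T_END); });

1;
```

           Note that the starred form still occupies an argument position
           (here $star); a "SkipMatch:*" parameter would consume the star
           without passing it on.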

       Common Options

       "scope=>'local' | 'global' | scope"
           Most defining commands accept an option to control how the
           definition is stored, for global or local definitions, or using a
           named scope.  A named scope saves a set of definitions and values
           that can be activated at a later time.

	   Particularly	interesting forms of scope are those that get
	   automatically activated upon	changes	of counter and label.  For
	   example, definitions	that have "scope=>'section:1.1'"  will be
	   activated when the section number is	"1.1", and will	be deactivated
	   when	that section ends.

       "locked=>boolean"
           This option controls whether this definition is locked from further
           changes in the TeX sources; this keeps local 'customizations' by an
           author from overriding important LaTeXML definitions and breaking
           the conversion.


       "DefMacro(prototype, expansion, %options);"

	   Defines the macro expansion for prototype; a	macro control sequence
	   that	is expanded during macro expansion time	in the
           LaTeXML::Core::Gullet.  The expansion should be one of "tokens |
           string | code($gullet,@args)": a string will be tokenized upon
	   first usage.	 Any macro arguments will be substituted for parameter
	   indicators (eg #1) in the tokens or tokenized string	and the	result
	   is used as the expansion of the control sequence. If	code is	used,
	   it is called	at expansion time and should return a list of tokens
	   as its result.

	   DefMacro options are

           "scope=>scope, locked=>boolean"
               See "Common Options".

           "mathactive=>boolean"
               specifies a definition that will only be expanded in math mode;
               the control sequence must be a single character.


	     DefMacro('\today',sub { ExplodeText(today()); });

       "DefMacroI(cs, paramlist, expansion, %options);"

	   Internal form of "DefMacro" where the control sequence and
	   parameter list have already been separated; useful for definitions
	   from	within code.  Also, slightly more efficient for	macros with no
	   arguments (use "undef" for paramlist), and useful for obscure cases
	   like	defining "\begin{something*}" as a Macro.


       "DefConditional(prototype, test,	%options);"

	   Defines a conditional for prototype;	a control sequence that	is
	   processed during macro expansion time (in the
	   LaTeXML::Core::Gullet).  A conditional corresponds to a TeX "\if".
	   If the test is "undef", a "\newif" type of conditional is defined,
	   which is controlled with control sequences like "\footrue" and
	   "\foofalse".	 Otherwise the test should be "code($gullet,@args)"
	   (with the control sequence's	arguments) that	is called at expand
	   time	to determine the condition.  Depending on whether the result
	   of that evaluation returns a	true or	false value (in	the usual Perl
	   sense), the result of the expansion is either the first or else
	   code	following, in the usual	TeX sense.

	   DefConditional options are

           "scope=>scope, locked=>boolean"
               See "Common Options".

           "skipper=>code($gullet)"
               This option is only used to define "\ifcase".


	     DefConditional('\ifmmode',sub {
		LookupValue('IN_MATH');	});

       "DefConditionalI(cs, paramlist, test, %options);"

           Internal form of "DefConditional" where the control sequence and
           parameter list have already been parsed; useful for definitions
           from within code.  Also, slightly more efficient for conditionals
           with no arguments (use "undef" for "paramlist").

       "IfCondition($ifcs, @args)"

           "IfCondition" allows you to test a conditional from within Perl.
           Thus something like "if(IfCondition('\ifmmode')){ domath } else {
           dotext }" might be equivalent to TeX's "\ifmmode domath \else
           dotext \fi".
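
           The two kinds of conditional, and a test from Perl, might be
           sketched as follows.  This is an illustration under assumptions:
           the names \ifmydraft, \ifmyverbose, \mymodename and the state key
           MY_VERBOSE are hypothetical, and "IfCondition" is called with a
           string exactly as in the description above.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# \newif-style conditional: the test is undef, so the document
# controls it with \mydrafttrue and \mydraftfalse.
DefConditional('\ifmydraft', undef);

# Computed conditional: true when the (hypothetical) state value
# MY_VERBOSE holds a true value at expansion time.
DefConditional('\ifmyverbose', sub { LookupValue('MY_VERBOSE'); });

# Testing an existing conditional from Perl inside a macro's code;
# the macro expands to the letters "math" or "text".
DefMacro('\mymodename', sub {
    return ExplodeText(IfCondition('\ifmmode') ? 'math' : 'text'); });

1;
```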


       "DefPrimitive(prototype,	replacement, %options);"

	   Defines a primitive control sequence; a primitive is	processed
	   during digestion (in	the  LaTeXML::Core::Stomach), after macro
	   expansion but before	Construction time.  Primitive control
	   sequences generate Boxes or Lists, generally	containing basic
	   Unicode content, rather than	structured XML.	 Primitive control
	   sequences are also executed for side	effect during digestion,
	   effecting changes to	the LaTeXML::Core::State.

	   The replacement can be a string used	as the text content of a Box
	   to be created (using	the current font).  Alternatively replacement
	   can be "code($stomach,@args)" (with the control sequence's
	   arguments) which is invoked at digestion time, probably for side-
	   effect, but returning Boxes or Lists	or nothing.  replacement may
	   also	be undef, which	contributes nothing to the document, but does
	   record the TeX code that created it.

	   DefPrimitive	options	are

           "scope=>scope, locked=>boolean"
               See "Common Options".

           "mode=> ('text' | 'display_math' | 'inline_math')"
               Changes to this mode during digestion.

           "font=>{%fontspec}"
               Specifies the font to use (see "Fonts").  If the font change is
               to only apply to material generated within this command, you
               would also use "bounded=>1"; otherwise, the font will remain
               in effect afterwards as for a font switching command.

           "bounded=>boolean"
               If true, TeX grouping (ie. "{}") is enforced around this
               invocation.

           "requireMath=>boolean, forbidMath=>boolean"
               specifies whether the given constructor can only appear, or
               cannot appear, in math mode.

           "beforeDigest=>code($stomach)"
               supplies a hook to execute during digestion just before the
               main part of the primitive is executed (and before any
               arguments have been read).  The code should either return
               nothing (return;) or a list of digested items
               (Box's,List,Whatsit).  It can thus change the State and/or add
               to the digested output.

           "afterDigest=>code($stomach)"
               supplies a hook to execute during digestion just after the main
               part of the primitive is executed.  It should either return
               nothing (return;) or digested items.  It can thus change the
               State and/or add to the digested output.

           "isPrefix=>boolean"
               indicates whether this is a prefix type of command; this is
               only used for the special TeX assignment prefixes, like
               "\global".

	      DefPrimitive('\begingroup',sub { $_[0]->begingroup; });

       "DefPrimitiveI(cs, paramlist, code($stomach,@args), %options);"

	   Internal form of "DefPrimitive" where the control sequence and
	   parameter list have already been separated; useful for definitions
	   from	within code.
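
           As a sketch of the two replacement styles described for
           "DefPrimitive", the fragment below defines one primitive that acts
           purely by side effect and one that contributes a Box of text.  The
           control sequences and the state key MY_VERBOSE are hypothetical;
           "AssignValue" and "ToString" are assumed to be available from
           LaTeXML::Package, although they are not described in this section.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# Hypothetical \mysetverbose{value}: runs at digestion time, records
# the argument in the State, and contributes nothing to the document.
DefPrimitive('\mysetverbose{}', sub {
    my ($stomach, $value) = @_;
    AssignValue(MY_VERBOSE => ToString($value), 'global');
    return; });

# Hypothetical \mydash: a string replacement becomes the text content
# of a Box (here an en dash) in the current font.
DefPrimitive('\mydash', "\x{2013}");

1;
```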


       "DefRegister(prototype, value, %options);"

           Defines a register with value as the initial value (a Number,
           Dimension, Glue, MuGlue or Tokens; Boxes are not yet handled).
	   Usually, the	prototype is just the control sequence,	but registers
	   are also handled by prototypes like "\count{Number}". "DefRegister"
	   arranges that the register value can	be accessed when a numeric,
	   dimension, ... value	is being read, and also	defines	the control
	   sequence for	assignment.

	   Options are

           "readonly=>boolean"
               specifies whether changing this value is disallowed.

           "getter=>code(@args), setter=>code($value,@args)"
               By default value is stored in the State's Value table under a
               name concatenating the control sequence and argument values.
               These options allow other means of fetching and storing the
               value.


       "DefRegisterI(cs, paramlist, value, %options);"

	   Internal form of "DefRegister" where	the control sequence and
	   parameter list have already been parsed; useful for definitions
	   from	within code.
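
           A few register definitions might look like the sketch below.  The
           register names are hypothetical, and the "Dimension" and "Number"
           constructors are assumed to be exported by LaTeXML::Package for
           building the initial values.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# Hypothetical dimension register; documents may assign to it with
# the usual TeX syntax, e.g. \mymargin=12pt
DefRegister('\mymargin' => Dimension('10pt'));

# Hypothetical count register with initial value 2
DefRegister('\mydepth' => Number(2));

# A register whose value the document is not allowed to change
DefRegister('\myversion' => Number(3), readonly => 1);

1;
```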


       "DefConstructor(prototype, $replacement,	%options);"

	   The Constructor is where LaTeXML really starts getting interesting;
	   invoking the	control	sequence will generate an arbitrary XML
	   fragment in the document tree.  More	specifically: during
	   digestion, the arguments will be read and digested, creating	a
           LaTeXML::Core::Whatsit to represent the object. During absorption
           by the LaTeXML::Core::Document, the "Whatsit" will generate the XML
           fragment according to replacement. The replacement can be
           "code($document,@args,%properties)" which is called during document
           absorption to create the appropriate XML (See the methods of
           LaTeXML::Core::Document).

           More conveniently, replacement can be a pattern: simply a bit of
           XML as a string with certain substitutions to be made. The
           substitutions are of the following forms:

	   "#1,	#2 ... #name"
               These are replaced by the corresponding argument (for #1) or
               property (for #name) stored with the Whatsit. Each is turned
               into a string when it appears in an attribute position, or
               recursively processed when it appears as content.

           "&function(arguments)"
               Another form of substituted value is prefixed with "&" which
               invokes a function.  For example, " &func(#1) " would invoke
               the function "func" on the first argument to the control
               sequence; what it returns will be inserted into the document.

	   "?test(pattern)"  or	"?test(ifpattern)(elsepattern)"
               Patterns can be conditionalized using this form.  The test is
               any of the above expressions (eg. "#1"), considered true if the
               result is non-empty.  Thus "?#1(<foo/>)" would add the empty
               element "foo" if the first argument were given.

           "^" If the replacement begins with "^", the XML fragment is allowed
               to float up to a parent node that is allowed to contain it,
               according to the Document Type.

	   The Whatsit property	"font" is defined by default.  Additional
	   properties "body" and "trailer" are defined when "captureBody" is
	   true, or for	environments.  By using
	   "$whatsit->setProperty(key=>$value);" within	"afterDigest", or by
	   using the "properties" option, other	properties can be added.

	   DefConstructor options are

           "scope=>scope, locked=>boolean"
               See "Common Options".

           "mode=>mode, font=>{%fontspec}, bounded=>boolean,
           requireMath=>boolean, forbidMath=>boolean"
               These options are the same as for "Primitives".

           "reversion=>texstring | code($whatsit,#1,#2,...)"
               specifies the reversion of the invocation back into TeX tokens
               (if the default reversion is not appropriate).  The texstring
               string can include "#1", "#2"...  The code is called with the
               $whatsit and digested arguments and must return a list of
               Tokens.

           "alias=>control_sequence"
               provides a control sequence to be used in the "reversion"
               instead of the one defined in the "prototype".  This is a
               convenient alternative for reversion when a 'public' command
               conditionally expands into an internal one, but the reversion
               should be for the public command.

	   "sizer=>string | code($whatsit)"
               specifies how to compute (approximate) the displayed size of
               the object, if that size is ever needed (typically needed for
               graphics generation).  If a string is given, it should contain
               only a sequence of "#1" or "#name" to access arguments and
               properties of the Whatsit: the size is computed from these
               items laid out side-by-side.  If code is given, it should
               return the three Dimensions (width, height and depth).  If
               neither is given, and the "reversion" specification is of
               suitable format, it will be used for the sizer.

	   "properties=>{%properties} |	code($stomach,#1,#2...)"
               supplies additional properties to be set on the generated
               Whatsit.  In the first form, the values can be of any type, but
               if a value is a code reference, it takes the same args
               ($stomach,#1,#2,...) and should return the value; it is
               executed before creating the Whatsit.  In the second form, the
               code should return a hash of properties.

           "beforeDigest=>code($stomach)"
               supplies a hook to execute during digestion just before the
               Whatsit is created.  The code should either return nothing
               (return;) or a list of digested items (Box's,List,Whatsit).  It
               can thus change the State and/or add to the digested output.

           "afterDigest=>code($stomach,$whatsit)"
               supplies a hook to execute during digestion just after the
               Whatsit is created (and so the Whatsit already has its
               arguments and properties). It should either return nothing
               (return;) or digested items.  It can thus change the State,
               modify the Whatsit, and/or add to the digested output.

           "beforeConstruct=>code($document,$whatsit)"
               supplies a hook to execute before constructing the XML
               (generated by replacement).

           "afterConstruct=>code($document,$whatsit)"
               supplies code to execute after constructing the XML.

	   "captureBody=>boolean | Token"
	       if true,	arbitrary following material will be accumulated into
	       a `body'	until the current grouping level is reverted, or till
	       the "Token" is encountered if the option	is a "Token".  This
	       body is available as the	"body" property	of the Whatsit.	 This
	       is used by environments and math.

           "nargs=>nargs"
               This gives a number of args for cases where it can't be
               inferred directly from the prototype (eg. when more args are
               explicitly read by hooks).

       "DefConstructorI(cs, paramlist, replacement, %options);"

	   Internal form of "DefConstructor" where the control sequence	and
	   parameter list have already been separated; useful for definitions
	   from	within code.
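
           To make the pattern substitutions concrete, here is a minimal
           sketch of two constructors.  The control sequences \mytag and
           \mystamp and the property name "origin" are hypothetical, and the
           "ltx:text" and "ltx:note" elements (with "class" and "role"
           attributes) are assumed to be acceptable to the schema in use.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# Hypothetical \mytag[class]{text}: the ?#1(...) form adds the class
# attribute only when the optional argument was actually given.
DefConstructor('\mytag[]{}',
  "<ltx:text ?#1(class='#1')>#2</ltx:text>");

# Hypothetical \mystamp{text}: an afterDigest hook stores an extra
# property on the Whatsit, which the pattern then reads as #origin.
DefConstructor('\mystamp{}',
  "<ltx:note role='stamp' class='#origin'>#1</ltx:note>",
  afterDigest => sub {
    my ($stomach, $whatsit) = @_;
    $whatsit->setProperty(origin => 'binding');
    return; });

1;
```

           The same property could be supplied without a hook through
           "properties=>{origin=>'binding'}"; the hook form is useful when the
           value depends on the State at digestion time.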

       "DefMath(prototype, tex,	%options);"

	   A common shorthand constructor; it defines a	control	sequence that
	   creates a mathematical object, such as a symbol, function or
	   operator application.  The options given can	effectively create
	   semantic macros that	contribute to the eventual parsing of
	   mathematical	content.  In particular, it generates an XMDual	using
	   the replacement tex for the presentation.  The content information
           is drawn from the name and options.

           "DefMath" accepts the options:

           "scope=>scope, locked=>boolean"
               See "Common Options".

           "font, reversion, alias, sizer, properties, beforeDigest,
           afterDigest"
               These options are the same as for "Constructors".

           "name=>name"
               gives a name attribute for the object.

           "omcd=>cdname"
               gives the OpenMath content dictionary that name is from.

           "role=>grammatical_role"
               adds a grammatical role attribute to the object; this specifies
               the grammatical role that the object plays in surrounding
               expressions.  This direly needs documentation!

	   "mathstyle=>('display' | 'text' | 'script' |	'scriptscript')"
               Controls whether this object will be presented in a
               specific mathstyle, or according to the current setting of
               "mathstyle".

	   "scriptpos=>('mid' |	'post')"
	       Controls	the positioning	of any sub and super-scripts relative
	       to this object; whether they be stacked over or under it, or
	       whether they will appear	in the usual position.	TeX.pool
	       defines a function "doScriptpos()" which	is useful for
	       operators like "\sum" in	that it	sets to	"mid" position when in
	       displaystyle, otherwise "post".

           "stretchy=>boolean"
               Whether or not the object is stretchy when displayed.

           "operator_role=>grammatical_role, operator_scriptpos=>boolean,
           operator_stretchy=>boolean"
               These three are similar to "role", "scriptpos" and "stretchy",
               but are used in unusual cases.  They apply the given
               attributes to the operator token in the content branch.

           "nogroup=>boolean"
               Normally, these commands are digested with an implicit grouping
               around them, localizing changes to fonts, etc; "nogroup=>1"
               inhibits this.

             DefMath('\infty', "\x{221E}",
                role=>'ID', meaning=>'infinity');

       "DefMathI(cs, paramlist,	tex, %options);"

	   Internal form of "DefMath" where the	control	sequence and parameter
	   list	have already been separated; useful for	definitions from
	   within code.
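
           Two further "DefMath" declarations are sketched below for
           illustration.  The control sequences are hypothetical, the "ADDOP"
           role is assumed to be one of the grammatical roles known to the
           math parser, and the "meaning" option is used in the same way as
           in the \infty example rather than from a documented option list.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# Hypothetical symbol: presented as the union sign, treated
# grammatically as an additive operator.
DefMath('\myunion', "\x{222A}", role => 'ADDOP', meaning => 'union');

# Hypothetical semantic macro with one argument: the tex string gives
# the presentation, the options describe the content.
DefMath('\mynorm{}', '\left\|#1\right\|', meaning => 'norm');

1;
```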


       "DefEnvironment(prototype, replacement, %options);"

	   Defines an Environment that generates a specific XML	fragment.
	   "replacement" is of the same	form as	for DefConstructor, but	will
	   generally include reference to the "#body" property.	Upon
	   encountering	a "\begin{env}":  the mode is switched,	if needed,
	   else	a new group is opened; then the	environment name is noted; the
	   beforeDigest	hook is	run.  Then the Whatsit representing the	begin
	   command (but	ultimately the whole environment) is created and the
	   afterDigestBegin hook is run.  Next,	the body will be digested and
	   collected until the balancing "\end{env}".	Then, any afterDigest
	   hook	is run,	the environment	is ended, finally the mode is ended or
	   the group is	closed.	 The body and "\end{env}" whatsit are added to
	   the "\begin{env}"'s whatsit as body and trailer, respectively.

	   "DefEnvironment" takes the following	options:

           "scope=>scope, locked=>boolean"
               See "Common Options".

           "mode=>mode, requireMath=>boolean, forbidMath=>boolean"
               These options are the same as for "Primitives".

           "reversion, alias, sizer, properties, nargs"
               These options are the same as for "DefConstructor".

           "beforeDigest=>code($stomach)"
               This hook is similar to that for "DefConstructor", but it
               applies to the "\begin{environment}" control sequence.

           "afterDigestBegin=>code($stomach,$whatsit)"
               This hook is similar to "DefConstructor"'s "afterDigest" but it
               applies to the "\begin{environment}" control sequence.  The
               Whatsit is the one for the beginning control sequence, but
               represents the environment as a whole.  Note that although the
               arguments and properties are present in the Whatsit, the body
               of the environment is not yet available!

           "beforeDigestEnd=>code($stomach)"
               This hook is similar to "DefConstructor"'s "beforeDigest" but
               it applies to the "\end{environment}" control sequence.

           "afterDigest=>code($stomach,$whatsit)"
               This hook is similar to "DefConstructor"'s "afterDigest" but it
               applies to the "\end{environment}" control sequence.  Note,
               however that the Whatsit is only for the ending control
               sequence, not the Whatsit for the environment as a whole.

           "afterDigestBody=>code($stomach,$whatsit)"
               This option supplies a hook to be executed during digestion
               after the ending control sequence has been digested (and all
               four other digestion hooks have executed) and after the body of
               the environment has been obtained.  The Whatsit is the (useful)
               one representing the whole environment, and it now does have
               the body and trailer available, stored as properties.

             DefConstructor('\emph{}',
                "<ltx:emph>#1</ltx:emph>", mode=>'text');

       "DefEnvironmentI(name, paramlist, replacement, %options);"

	   Internal form of "DefEnvironment" where the control sequence	and
	   parameter list have already been separated; useful for definitions
	   from	within code.
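
           An environment definition using the "#body" property might be
           sketched as follows.  The environment name {mywarning} and the
           property "origin" are hypothetical, and the "ltx:note" element is
           assumed to accept the body content under the schema in use.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# Hypothetical \begin{mywarning}[class] ... \end{mywarning}
DefEnvironment('{mywarning}[]',
  "<ltx:note role='warning' ?#1(class='#1')>#body</ltx:note>",
  # Runs after the \begin has been digested: arguments and properties
  # are available on the Whatsit, but the body is not yet.
  afterDigestBegin => sub {
    my ($stomach, $whatsit) = @_;
    $whatsit->setProperty(origin => 'mywarning');
    return; });

1;
```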

   Inputting Content and Definitions
       "FindFile(name, %options);"

	   Find	an appropriate file with the given name	in the current
	   directories in "SEARCHPATHS".  If a file ending with	".ltxml" is
	   found, it will be preferred.

           Note that if the "name" starts with a recognized protocol
           (currently one of "(literal|http|https|ftp)") followed by a colon,
           the name is returned, as is, and no search for files is carried
           out.

           The options are:

           "type=>type"
               specifies the file type.  If not set, it will search for both
               "name.tex" and name.

           "noltxml=>boolean"
               inhibits searching for a LaTeXML binding ("name.type.ltxml") to
               use instead of the file itself.

           "notex=>boolean"
               inhibits searching for raw tex version of the file.  That is,
               it will only search for the LaTeXML binding.

       "InputContent(request, %options);"

	   "InputContent" is used for cases when the file (or data) is plain
	   TeX material	that is	expected to contribute content to the document
	   (as opposed to pure definitions).  A	Mouth is opened	onto the file,
	   and subsequent reading and/or digestion will	pull Tokens from that
	   Mouth until it is exhausted,	or closed.

           In some circumstances it may be useful to provide a string
           containing the TeX material explicitly, rather than referencing a
           file.  In this case, the "literal" pseudo-protocol may be used:

             InputContent('literal:\textit{Hey}');


	   If a	file named "$request.latexml" exists, it will be read in as if
	   it were a latexml binding file, before processing.  This can	be
	   used	for adhoc customization	of the conversion of specific files,
	   without modifying the source, or creating more elaborate bindings.

	   The only option to "InputContent" is:

           "noerror=>boolean"
               Inhibits signalling an error if no appropriate file is found.

       "Input(request);"

           "Input" is analogous to LaTeX's "\input", and is used in cases
           where it isn't completely clear whether content or definitions are
           expected.  Once a file is found, the approach specified by
           "InputContent" or "InputDefinitions" is used, depending on which
           type of file is found.

       "InputDefinitions(request, %options);"

	   "InputDefinitions" is used for loading definitions, ie. various
	   macros, settings, etc, rather than document content;	it can be used
	   to load LaTeXML's binding files, or for reading in raw TeX
	   definitions or style	files.	It reads and processes the material
	   completely before returning,	even in	the case of TeX	definitions.
           This procedure optionally supports the conventions used for
           standard LaTeX packages and classes (see "RequirePackage" and
           "LoadClass").

           Options for "InputDefinitions" are:

           "type=>type"
               the file type to search for.

           "noltxml=>boolean"
               inhibits searching for a LaTeXML binding; only raw TeX files
               will be sought and loaded.

           "notex=>boolean"
               inhibits searching for raw TeX files, only a LaTeXML binding
               will be sought and loaded.

           "noerror=>boolean"
               inhibits reporting an error if no appropriate file is found.

	   The following options are primarily useful when "InputDefinitions"
	   is supporting standard LaTeX	package	and class loading.

           "withoptions=>boolean"
               indicates whether to pass in any options from the calling class
               or package.

           "handleoptions=>boolean"
               indicates whether options processing should be handled.

           "options=>[...]"
               specifies a list of options (in the 'package options' sense) to
               be passed (possibly in addition to any provided by the calling
               class or package).

           "after=>tokens | code($gullet)"
               provides tokens or code to be processed by a "name.type-h@@k"
               macro.

           "as_class=>boolean"
               fishy option that indicates that this definitions file should
               be treated as if it were defining a class; typically shows up
               in latex compatibility mode, or AMSTeX.

           A handy method to use most of the TeX distribution's raw TeX
           definitions for a package, but override only a few with LaTeXML
           bindings is by defining a binding file, say "tikz.sty.ltxml", to
           contain

             InputDefinitions('tikz', type => 'sty', noltxml => 1);

           which would find and read in "tikz.sty", and then follow it by a
           couple of strategic LaTeXML definitions, "DefMacro", etc.
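
           Spelled out as a complete binding file, that approach might look
           like the sketch below.  The package "mypackage" and its control
           sequences are hypothetical; the file would be saved as
           "mypackage.sty.ltxml" somewhere on the search paths.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# Read the raw TeX definitions from mypackage.sty; noltxml keeps the
# search from finding this binding file again.
InputDefinitions('mypackage', type => 'sty', noltxml => 1);

# Then override the few definitions that need LaTeXML-specific
# treatment (both control sequences are hypothetical).
DefMacro('\mypackagelogo', 'MyPackage');
DefConstructor('\mypackagenote{}',
  "<ltx:note role='mypackage'>#1</ltx:note>");

1;
```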

   Class and Packages
       "RequirePackage(package,	%options);"

           Finds and loads a package implementation (usually
           "package.sty.ltxml", unless "noltxml" is specified) for the
           requested package.  It returns the pathname of the loaded package.
           The options are:

           "type=>type"
               specifies the file type (default "sty").

           "options=>[...]"
               specifies a list of package options.

           "noltxml=>boolean"
               inhibits searching for the LaTeXML binding for the file (ie.
               "name.type.ltxml").

           "notex=>boolean"
               inhibits searching for raw tex version of the file.  That is,
               it will only search for the LaTeXML binding.

       "LoadClass(class, %options);"

	   Finds and loads a class definition (usually "class.cls.ltxml").  It
	   returns the pathname	of the loaded class.  The only option is

           "options=>[...]"
               specifies a list of class options.

       "LoadPool(pool, %options);"

	   Loads a pool	file (usually "pool.pool.ltxml"), one of the top-level
	   definition files, such as TeX, LaTeX	or AMSTeX.  It returns the
	   pathname of the loaded file.

       "DeclareOption(option, tokens | string |	code($stomach));"

	   Declares an option for the current package or class.	 The 2nd
	   argument can	be a string (which will	be tokenized and expanded) or
	   tokens (which will be macro expanded), to provide the value for the
	   option, or it can be	a code reference which is treated as a
	   primitive for side-effect.

	   If a package or class wants to accommodate options, it should start
	   with one or more "DeclareOption" calls, followed by "ProcessOptions()".
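
	   A minimal sketch of this pattern for a hypothetical package
	   "mypkg" follows; the option names and the value name "mypkg@draft"
	   are assumptions made for illustration.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# Each option is given as code, run for its side effect when the option
# is processed; here it simply records the choice as a global value.
DeclareOption('draft', sub { AssignValue('mypkg@draft' => 1, 'global'); return; });
DeclareOption('final', sub { AssignValue('mypkg@draft' => 0, 'global'); return; });

# Act on whichever options were passed to the package.
ProcessOptions();

1;
```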

       "PassOptions(name, ext, @options); "

	   Causes the given @options (strings) to be passed to the package (if
	   ext is "sty") or class (if ext is "cls") named by name.


	   Processes the options that have been	passed to the current package
	   or class in a fashion similar to LaTeX.  The only option to
	   "ProcessOptions" is "inorder=>boolean", indicating whether the
	   (package) options are processed in the order	they were used,	like


	   Process the options given explicitly	in @options.

       "AtBeginDocument(@stuff); "

	   Arranges for	@stuff to be carried out after the preamble, at	the
	   beginning of	the document.  @stuff should typically be macro-level
	   stuff, but carried out for side effect; it should be	tokens,	tokens
	   lists, strings (which will be tokenized), or	"code($gullet)"	which
	   would yield tokens to be expanded.

	   This	operation is useful for	style files loaded with	"--preload" or
	   document specific customization files (ie. ending with ".latexml");
	   normally the	contents would be executed before LaTeX	and other
	   style files are loaded and thus can be overridden by	them.  By
	   deferring the evaluation to begin-document time, these contents can
	   override those style	files.	This is	likely to only be meaningful
	   for LaTeX documents.
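
	   For instance, a document-specific customization file (say a
	   hypothetical "mydoc.latexml") could defer a redefinition like this;
	   the choice of "\abstractname" is only an example.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# The string is tokenized now but carried out at \begin{document}, after
# the class and the packages named in the preamble have been loaded, so
# this definition is the one that survives.
AtBeginDocument('\def\abstractname{Summary}');

1;
```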

	   Arranges for	@stuff to be carried out just before
	   "\\end{document}".  These tokens can	be used	for side effect, or
	   any content they generate will appear as the	last children of the

   Counters and	IDs
       "NewCounter(ctr,	within,	%options);"

	   Defines a new counter, like LaTeX's \newcounter, but	extended.  It
	   defines a counter that can be used to generate reference numbers,
	   and defines "\thectr", etc. It also defines an "uncounter" which
	   can be used to generate ID's	(xml:id) for unnumbered	objects.  ctr
	   is the name of the counter.	If defined, within is the name of
	   another counter which, when incremented, will cause this counter to
	   be reset.  The options are

	       Specifies a prefix to be	used to	generate ID's when using this

	       Not sure	that this is even sane.

       "$num = CounterValue($ctr);"

	   Fetches the value associated	with the counter $ctr.

       "$tokens	= StepCounter($ctr);"

	   Analog of "\stepcounter", steps the counter and returns the
	   expansion of	"\the$ctr".  Usually you should	use
	   "RefStepCounter($ctr)" instead.

       "$keys =	RefStepCounter($ctr);"

	   Analog of "\refstepcounter",	steps the counter and returns a	hash
	   containing the keys "refnum=>$refnum, id=>$id".  This makes it
	   suitable for	use in a "properties" option to	constructors.  The
	   "id"	is generated in	parallel with the reference number to assist

       "$keys =	RefStepID($ctr);"

	   Like "RefStepCounter", but only steps the "uncounter", and
	   returns only the id.  This is useful for unnumbered cases of
	   objects that	normally get both a refnum and id.
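
	   The following sketch shows how these pieces might fit together.
	   The counter "remark", the two control sequences and the choice of
	   "ltx:note" as target element are all assumptions made for
	   illustration.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# A counter for remarks, reset whenever the section counter is stepped.
NewCounter('remark', 'section');

# Numbered form: RefStepCounter steps the counter, and the keys it returns
# become properties of the constructor, so '#id' in the template is filled in.
DefConstructor('\remark{}',
  "<ltx:note role='remark' xml:id='#id'>#1</ltx:note>",
  properties => sub { RefStepCounter('remark') });

# Unnumbered form: RefStepID steps only the "uncounter" and supplies an id.
DefConstructor('\plainremark{}',
  "<ltx:note role='remark' xml:id='#id'>#1</ltx:note>",
  properties => sub { RefStepID('remark') });

1;
```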


	   Resets the counter $ctr to zero.


	   Generates an	ID for nodes during the	construction phase, useful for
	   cases where the counter based scheme	is inappropriate.  The calling
	   pattern makes it appropriate	for use	in Tag,	as in

	      Tag('ltx:para',afterClose=>sub { GenerateID(@_,'p'); })

	   If $node doesn't already have an xml:id set,	it computes an
	   appropriate id by concatenating the xml:id of the closest ancestor
	   with	an id (if any),	the prefix (if any) and	a unique counter.

   Document Model
       Constructors define how TeX markup will generate	XML fragments, but the
       Document	Model is used to control exactly how those fragments are

       "Tag(tag, %properties);"

	   Declares properties of elements with	the name tag.  Note that "Tag"
	   can set or add properties to	any element from any binding file,
	   unlike the properties set on control sequences by "DefPrimitive",
	   "DefConstructor", etc.  And, since the properties are recorded in
	   the current Model, they are not subject to TeX grouping; once set,
	   they	remain in effect until changed or the end of the document.

	   The tag can be specified in one of three forms:

	      prefix:name matches specific name	in specific namespace
	      prefix:*	  matches any tag in the specific namespace;
	      *		  matches any tag in any namespace.

	   There are two kinds of properties:

	   Scalar properties
	       For scalar properties, only a single value is returned for a
	       given element.  When the	property is looked up, each of the
	       above forms is considered (the specific element name, the
	       namespace, and all elements); the first defined value is

	       The recognized scalar properties	are:

		   Specifies whether tag can be	automatically opened if	needed
		   to insert an	element	that can only be contained by tag.
		   This	property can help match	the more  SGML-like LaTeX to

		   Specifies whether this tag can be automatically closed if
		   needed to close an ancestor node, or	insert an element into
		   an ancestor.	 This property can help	match the more	SGML-
		   like	LaTeX to XML.

	   Code	properties
	       These properties	provide	a bit of code to be run	at the times
	       of certain events associated with an element.  All the code
	       bits that match a given element will be run, and	since they can
	       be added by any binding file, and be specified in an arbitrary
	       order, a little bit of extra control is desirable.

	       Firstly,	any early codes	are run	(eg "afterOpen:early"),	then
	       any normal codes	(without modifier) are run, and	finally	any
	       late codes are run (eg. "afterOpen:late").

	       Within each of those groups, the	codes assigned for an
	       element's specific name are run first, then those assigned for
	       its package and finally the generic one ("*"); that is, the
	       most specific codes are run first.

	       When code properties are	accumulated by "Tag" for normal	or
	       late events, the	code is	appended to the	end of the current
	       list (if	there were any previous	codes added); for early	event,
	       the code	is prepended.
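
	       The ordering rules above can be modelled in a few lines of
	       plain Perl.  This is only a toy model of the behaviour as
	       described here, not LaTeXML's implementation; "add_code" and
	       "run_codes" are hypothetical helpers.

```perl
use strict;
use warnings;

# Toy registry: $codes{$event}{$pattern} holds a list of code references.
my %codes;

# Accumulate code the way the text describes: early codes are prepended,
# normal and late codes are appended.
sub add_code {
  my ($pattern, $event, $code) = @_;
  my $list = ($codes{$event}{$pattern} //= []);
  if ($event =~ /:early$/) { unshift(@$list, $code); }
  else                     { push(@$list, $code); }
  return; }

# Run every code matching the element "prefix:name": first the early group,
# then the normal group, then the late group; within each group, the
# specific name first, then "prefix:*", then "*".
sub run_codes {
  my ($qname, $base, @args) = @_;
  my ($prefix) = split(/:/, $qname);
  my @out;
  foreach my $event ("$base:early", $base, "$base:late") {
    foreach my $pattern ($qname, "$prefix:*", '*') {
      push(@out, map { $_->(@args) } @{ $codes{$event}{$pattern} || [] }); } }
  return @out; }

# Register in a deliberately scrambled order.
add_code('*',        'afterOpen',       sub { 'generic' });
add_code('ltx:para', 'afterOpen:late',  sub { 'para-late' });
add_code('ltx:para', 'afterOpen',       sub { 'para' });
add_code('ltx:*',    'afterOpen',       sub { 'namespace' });
add_code('ltx:para', 'afterOpen:early', sub { 'para-early' });

my @order = run_codes('ltx:para', 'afterOpen');
print join(' ', @order), "\n";
```

	       Whatever the registration order, this prints
	       "para-early para namespace generic para-late".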

	       The recognized code properties are:

		   Provides code to be run whenever a node with	this tag is
		   opened.  It is called with the document being constructed,
		   and the initiating digested object as arguments.  It	is
		   called after	the node has been created, and after any
		   initial attributes due to the constructor (passed to
		   openElement)	are added.

		   "afterOpen:early" or	"afterOpen:late" can be	used in	place
		   of "afterOpen"; these will be run as	a group	before,	or
		   after (respectively)	the unmodified blocks.

		   Provides code to be run whenever a node with	this tag is
		   closed.  It is called with the document being constructed,
		   and the initiating digested object as arguments.

		   "afterClose:early" or "afterClose:late" can be used in
		   place of "afterClose"; these will be run as a group before,
		   or after (respectively) the unmodified blocks.


	   Specifies the schema	to use for determining document	model.	You
	   can leave off the extension;	it will	look for "schemaname.rng" (and
	   maybe eventually, ".rnc" if that is ever implemented).

       "RegisterNamespace(prefix, URL);"

	   Declares the	prefix to be associated	with the given URL.  These
	   prefixes may	be used	in ltxml files,	particularly for constructors,
	   xpath expressions, etc.  They are not necessarily the same as the
	   prefixes that will be used in the generated document.  Use the prefix
	   "#default" for the default, non-prefixed, namespace.	 (See
	   RegisterDocumentNamespace, as well as DocType or RelaxNGSchema).

       "RegisterDocumentNamespace(prefix, URL);"

	   Declares the	prefix to be associated	with the given URL used	within
	   the generated XML. They are not necessarily the same	as the
	   prefixes used in code (RegisterNamespace).  This function is
	   rarely needed, as the namespace declarations are generally obtained
	   from the DTD or Schema themselves.  Use the prefix "#default" for the
	   default, non-prefixed, namespace.  (See DocType or RelaxNGSchema).

       "DocType(rootelement, publicid, systemid, %namespaces);"

	   Declares the	expected rootelement, the public and system ID's of
	   the document	type to	be used	in the final document.	The hash
	   %namespaces specifies the namespaces	prefixes that are expected to
	   be found in the DTD,	along with each	associated namespace URI.  Use
	   the prefix "#default" for the default namespace (ie.	the namespace
	   of non-prefixed elements in the DTD).

	   The prefixes	defined	for the	DTD may	be different from the prefixes
	   used	in implementation CODE (eg. in ltxml files; see
	   RegisterNamespace).	The generated document will use	the namespaces
	   and prefixes	defined	for the	DTD.

   Document Rewriting
       During document construction, as	each node gets closed, the text
       content gets simplified.  We'll call it applying ligatures, for lack of
       a better	name.

       "DefLigature(regexp, %options);"

	   Apply the regular expression	(given as a string: "/fa/fa/" since it
	   will	be converted internally	to a true regexp), to the text
	   content.  The only option is	"fontTest=>code($font)"; if given,
	   then	the substitution is applied only when "fontTest" returns true.

	   Predefined Ligatures	combine	sequences of "." or single-quotes into
	   appropriate Unicode characters.


	   A Math Ligature typically combines a	sequence of math tokens
	   (XMTok) into	a single one.  A simple	example	is

	      DefMathLigature(":=" => ":=", role => 'RELOP', meaning =>	'assign');

	   replaces the	two tokens for colon and equals	by a token
	   representing	assignment.  The options are those characterising an
	   XMTok, namely: "role", "meaning" and	"name".

	   For more complex cases (recognizing numbers,	for example), you may
	   supply a function "matcher=>CODE($document,$node)", which is passed
	   the current document	and the	last math node in the sequence.	 It
	   should examine $node	and any	preceding nodes	(using
	   "previousSibling") and return a list	of "($n,$string,%attributes)"
	   to replace the $n nodes by a	new one	with text content being
	   $string content and the given attributes.  If no replacement	is
	   called for, CODE should return undef.

       After document construction, various rewriting and augmenting of	the
       document	can take place.


	   These two declarations define document rewrite rules	that are
	   applied to the document tree	after it has been constructed, but
	   before math parsing,	or any other postprocessing, is	done.  The
	   %specification consists of a	sequence of key/value pairs with the
	   initial specs successively narrowing	the selection of document
	   nodes, and the remaining specs indicating how to modify or replace
	   the selected	nodes.

	   The following select	portions of the	document:

	       Selects the part	of the document	with label=$label

	       The scope could be "label:foo" or "section:1.2.3" or something
	       similar.	These select a subtree labelled	'foo', or a section
	       with reference number "1.2.3"

	       Select those nodes matching an explicit xpath expression.

	       Selects nodes that look like what the processing	of tex would

	       Selects text nodes that match the regular expression.

	   The following act upon the selected node:

	       Adds the	attributes given in the	hash reference to the node.

	       Interprets replacement as TeX code to generate nodes that will
	       replace the selected nodes.

   Mid-Level support
       "$tokens	= Expand($tokens);"

	   Expands the given $tokens according to current definitions.

       "$boxes = Digest($tokens);"

	   Processes and digests the $tokens.  Any arguments needed by
	   control sequences in	$tokens	must be	contained within the $tokens

       "@tokens	= Invocation($cs,@args);"

	   Constructs a	sequence of tokens that	would invoke the token $cs on
	   the arguments.
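
	   As an illustration, a macro defined with code might build its
	   expansion with "Invocation" rather than assembling braces by hand.
	   The macro "\twice" is hypothetical; "T_CS" is the control-sequence
	   token constructor from LaTeXML::Core::Token, one of the re-exported
	   modules listed under "See also".

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# \twice{x} expands to \emph{x}\emph{x}; Invocation supplies the tokens
# needed to call \emph on the argument each time.
DefMacro('\twice{}', sub {
    my ($gullet, $arg) = @_;
    my $emph = T_CS('\emph');
    return (Invocation($emph, $arg), Invocation($emph, $arg)); });

1;
```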

       "RawTeX('... tex	code ...');"

	   RawTeX is a convenience function for	including chunks of raw	TeX
	   (or LaTeX) code in a	Package	implementation.	 It is useful for
	   copying portions of the normal implementation that can be handled
	   simply using	macros and primitives.
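
	   A short sketch, assuming a LaTeX document so that "\newcommand" is
	   available; the two macros are made-up examples of definitions that
	   need nothing beyond plain macro expansion.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# A single-quoted here-document keeps every backslash literal, so the TeX
# code can be pasted in unchanged.
RawTeX(<<'EoTeX');
\newcommand{\vect}[1]{\mathbf{#1}}
\newcommand{\abs}[1]{\left|#1\right|}
EoTeX

1;
```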


	   Gives $token1 the same `meaning' (definition) as $token2; like
	   TeX's \let.

       "StartSemiVerbatim(); ... ; EndSemiVerbatim();"
	   Disables most TeX catcodes.

       "$tokens	= Tokenize($string);"
	   Tokenizes the $string using the standard catcodes, returning	a

       "$tokens	= TokenizeInternal($string);"
	   Tokenizes the $string according to the internal cattable (where @
	   is a	letter), returning a LaTeXML::Core::Tokens.

   Argument Readers

	   Reads from $gullet the tokens corresponding to $spec	(a Parameters

       "DefParameterType(type, code($gullet,@values), %options);"

	   Defines a new Parameter type, type, with code for its reader.

	   Options are:

	       This code is responsible	for converting a previously parsed
	       argument	back into a sequence of	Token's.

	       whether it is an	error if no matching input is found.

	       whether the value returned should contribute to argument	lists,
	       or simply be passed over.

	       whether the catcode table should	be modified before reading

       "DefColumnType(proto, expansion);"

	   Defines a new column	type for tabular and arrays.  proto is the
	   prototype for the pattern, analogous	to the pattern used for	other
	   definitions, except that the macro being defined is a single character.
	   The expansion is a string specifying what it should expand into,
	   typically a more verbose column specification.

   Access to State
       "$value = LookupValue($name);"

	   Looks up the current value associated with the string $name.


	   Assign $value to be associated with the string $name, according
	   to the given	scoping	rule.

	   Values are also used	to specify most	configuration parameters
	   (which can therefore also be scoped).  The recognized configuration
	   parameters are:

	    VERBOSITY	      :	the level of verbosity for debugging
				output,	with 0 being default.
	    STRICT	      :	whether	errors (eg. undefined macros)
				are fatal.
	    INCLUDE_COMMENTS  :	whether	to preserve comments in	the
				source,	and to add occasional line
				number comments. (Default true).
	    PRESERVE_NEWLINES :	whether	newlines in the	source should
				be preserved (not 100% TeX-like).
				By default this	is true.
	    SEARCHPATHS	      :	a list of directories to search	for
				sources, implementations, etc.
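
	   A brief sketch of setting and reading values from a binding or
	   customization file.  It assumes the "AssignValue($name => $value,
	   $scope)" calling form with the scopes 'global' and 'local'; the
	   name "mypkg_draft" is hypothetical.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# A configuration parameter: stop preserving source comments, everywhere.
AssignValue(INCLUDE_COMMENTS => 0, 'global');

# A package-private flag, in effect only within the current TeX group.
AssignValue('mypkg_draft' => 1, 'local');

# Both kinds of value are read back the same way.
my $draft = LookupValue('mypkg_draft');

1;
```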


	   This	function, along	with the next three are	like "AssignValue",
	   but maintain	a global list of values.  "PushValue" pushes the
	   provided values onto	the end	of a list.  The	data stored for	$name
	   is global and must be a LIST	reference; it is created if needed.


	   Similar to  "PushValue", but	pushes a value onto the	front of the
	   list.  The data stored for $name is global and must be a LIST
	   reference; it is created if needed.


	   Removes and returns the value on the	end of the list	named by
	   $name.  The data stored for $name is	global and must	be a LIST
	   reference.  Returns "undef" if there	is no data in the list.


	   Removes and returns the first value in the list named by $name.
	   The data stored for $name is	global and must	be a LIST reference.
	   Returns "undef" if there is no data in the list.


	   This	function maintains a hash association named by $name.  It
	   returns the value associated	with $key within that mapping.	The
	   data	stored for $name is global and must be a HASH reference.
	   Returns "undef" if there is no data associated with $key in the
	   mapping, or the mapping is not (yet)	defined.


	   This	function associates $value with	$key within the	mapping	named
	   by $name.  The data stored for $name	is global and must be a	HASH
	   reference; it is created if needed.
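
	   The list and mapping helpers might be used as sketched below.  The
	   function names and argument orders are those of the LaTeXML API as
	   best understood here ("PushValue", "UnshiftValue", "PopValue",
	   "ShiftValue", "AssignMapping", "LookupMapping"); the value names
	   "mypkg_files" and "mypkg_labels" are hypothetical.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# A global list, created on first use.
PushValue('mypkg_files', 'a.tex', 'b.tex');    # a.tex, b.tex
UnshiftValue('mypkg_files', 'first.tex');      # first.tex, a.tex, b.tex
my $last  = PopValue('mypkg_files');           # removes and returns b.tex
my $first = ShiftValue('mypkg_files');         # removes and returns first.tex

# A global mapping, also created on first use.
AssignMapping('mypkg_labels', intro => 'Introduction');
my $title   = LookupMapping('mypkg_labels', 'intro');
my $missing = LookupMapping('mypkg_labels', 'nosuchkey');    # undef

1;
```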

       "$value = LookupCatcode($char);"

	   Looks up the current catcode associated with the character $char.


	   Set $char to	have the given $catcode, with the assignment made
	   according to	the given scoping rule.

	   This	method is also used to specify whether a given character is
	   active in math mode,	by using "math:$char" for the character, and
	   using a value of 1 to specify that it is active.

       "$meaning = LookupMeaning($token);"

	   Looks up the	current	meaning	of the given $token which may be a
	   Definition, another token, or the token itself if it	has not
	   otherwise been defined.

       "$defn =	LookupDefinition($token);"

	   Looks up the	current	definition, if any, of the $token.


	   Install the Definition $defn	into $STATE under its control

	   Tests whether the two tokens	are equal in the sense that they are
	   either equal	tokens,	or if defined, have the	same definition.

       "MergeFont(%fontspec); "

	   Set the current font	by merging the font style attributes with the
	   current font.  The %fontspec	specifies the properties of the
	   desired font.  Likely values	include	(the values aren't required to
	   be in this set):

	    family : serif, sansserif, typewriter, caligraphic,
		     fraktur, script
	    series : medium, bold
	    shape  : upright, italic, slanted, smallcaps
	    size   : tiny, footnote, small, normal, large,
		     Large, LARGE, huge, Huge
	    color  : any named color, default is black

	   Some	families will only be used in math.  This function returns
	   nothing so it can be	easily used in beforeDigest, afterDigest.
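
	   For example, a font-switching declaration in the style of
	   "\bfseries" could be sketched as follows; the control sequence
	   "\loudfont" is hypothetical.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# Merge bold and Large into whatever font is current; the family and shape
# already in effect are left as they are.
DefPrimitive('\loudfont', sub {
    MergeFont(series => 'bold', size => 'Large');
    return; });

1;
```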

	   Declares a font map for the encoding	$name. The map $map is an
	   array of 128	or 256 entries,	each element is	either a unicode
	   string for the representation of that codepoint, or undef if	that
	   codepoint is	not supported  by this encoding.  The only option
	   currently is	"family" used because some fonts (notably cmr!)	 have
	   different glyphs in some font families, such	as

	   Returns the unicode string representing the given codepoint $code
	   (an integer)	in the given font encoding $encoding.  If $encoding is
	   undefined, the usual	case, the current font encoding	and font
	   family is used for the lookup.  Explicit decoding is	used when
	   "\\char" or similar are invoked ($implicit is false), and the
	   codepoint must be represented in the	fontmap, otherwise undef is
	   returned.  Implicit decoding	(ie. $implicit is true)	occurs within
	   the Stomach when a Token's content is being digested	and converted
	   to a	Box; in	that case only the lower 128 codepoints	are converted;
	   all codepoints above	128 are	assumed	to already be Unicode.

	   The font map	for $encoding is automatically loaded if it has	not
	   already been	loaded.

	   Returns the unicode string resulting	from decoding the individual
	   characters in $string according to FontDecode, above.

	   Finds and loads the font map	for the	encoding named $encoding, if
	   it hasn't been loaded before.  It looks for
	   "encoding.fontmap.ltxml", which would typically define the font map
	   using "DeclareFontMap", possibly including extra maps for families
	   like	"typewriter".

	   Lookup the color object associated with $name.

	   Associates the $name	with the given $color (a color object),	with
	   the given scoping.

	   Defines a color model $model	that is	derived	from the core color
	   model $coremodel.  The two functions	$tocore	and $fromcore convert
	   a color object in that model	to the core model, or from the core
	   model to the	derived	model.	Core models are	rgb, cmy, cmyk,	hsb
	   and gray.

   Low-level Functions

	   Cleans an $id of disallowed characters, trimming space.


	   Cleans a $label of disallowed characters, trimming space.  The
	   prefix $prefix is prepended (or "LABEL", if none given).


	   Cleans an index key,	so it can be used as an	ID.

	   Cleans a bibliographic citation key,	so it can be used as an	ID.


	   Cleans a url.


	   Generates a UTF character, handy for the 8 bit characters.  For
	   example, "UTF(0xA0)"	generates the non-breaking space.

       "@tokens	= roman($number);"

	   Formats the $number in (lowercase) roman numerals, returning	a list
	   of the tokens.

       "@tokens	= Roman($number);"

	   Formats the $number in (uppercase) roman numerals, returning	a list
	   of the tokens.

       See also	LaTeXML::Global, LaTeXML::Common::Object,
       LaTeXML::Common::Error, LaTeXML::Core::Token, LaTeXML::Core::Tokens,
       LaTeXML::Core::Box, LaTeXML::Core::List,	LaTeXML::Common::Number,
       LaTeXML::Common::Float, LaTeXML::Common::Dimension,
       LaTeXML::Common::Glue, LaTeXML::Core::MuDimension,
       LaTeXML::Core::MuGlue, LaTeXML::Core::Pair, LaTeXML::Core::PairList,
       LaTeXML::Common::Color, LaTeXML::Core::Alignment, LaTeXML::Common::XML,

       Bruce Miller <>

       Public domain software, produced	as part	of work	done by	the United
       States Government & not subject to copyright in the US.

perl v5.32.1			  2020-11-16		   LaTeXML::Package(3)

