FreeBSD Manual Pages

lpOD::Document(3)     User Contributed Perl Documentation    lpOD::Document(3)

       ODF::lpOD::Document - General ODF package handling and metadata

       This manual page describes the "odf_document", the common features of
       any "odf_part" of an "odf_document", and the particular features of the
       "odf_meta" and "odf_manifest" parts (which handle the global document
       metadata and the manifest of the associated container).

       Every "odf_document" is associated with an "odf_container" that
       encapsulates all the physical access logic. On the other hand, every
       "odf_document" is made of several components, the so-called parts. The
       lpOD API is mainly focused on parts that describe the global metadata,
       the text content, the layout and the structure of the document, and
       that are physically stored according to an XML schema. The common lpOD
       class for these parts is "odf_xmlpart" (whose Perl implementation is
       the "ODF::lpOD::XMLPart" package).

       lpOD provides specialized classes for the conventional ODF XML parts,
       namely "odf_meta", "odf_content", "odf_styles", "odf_settings", and
       "odf_manifest".

       In order to process particular pieces of content in the most complex
       parts, i.e. "odf_content" and "odf_styles", the "odf_element" class and
       its various specialized derivatives are available. They are described
       in other chapters of the lpOD documentation.

Document initialization and termination
       Any access to a document requires a valid "odf_document" instance, which
       may be created from an existing document or from scratch, using one of
       the constructors introduced below. Once created, this instance gives
       access to individual parts through the "get_part()" method.

       Since the API is object-oriented, a document instance is initialized
       through an "odf_document->new()" class method; however, lpOD provides
       a functional wrapper for each use case of this method.


       See "odf_new_document(doc_type)".

       odf_get_document(uri)

       This function creates a read-write document instance from an existing
       resource (i.e. a physical, local or remote, ODF file). The returned
       object is associated with the ODF resource, which may be updated. The
       required argument is the URI (or file path) of the resource.


               use ODF::lpOD;
               # single quotes keep the backslashes of the Windows path literal
               my $doc = odf_get_document('C:\MyDocuments\test.odt');

       If the "save()" method of "odf_document" is later used without an
       explicit target, the document is written back to the same resource (if
       this resource is not read-only).

       Alternatively, the argument may be an "IO::File" object corresponding
       to an open, seekable file handle:

               use IO::File;
               my $fh = IO::File->new("test.odt", "r") or die "test.odt: $!";
               $fh->binmode;    # an ODF package is a binary ZIP archive
               my $doc = odf_get_document($fh);

       odf_new_document_from_template(uri)

       Same as "odf_get_document()", but the ODF resource is used in read-only
       mode, i.e. it's used as a template in order to generate other physical
       ODF documents.

       Some metadata of the new document are initialized to the following
       values:

       o   the creation and modification dates are set to the current date;

       o   the creator and initial creator are set to the owner of the current
           process as reported by the operating system (if this information is
           available);

       o   the number of editing cycles is set to 1;

       o   the "ODF::lpOD" string followed by the lpOD version number is used
           as the generator identifier string.

       Each piece of metadata may be changed later by the application.

       odf_new_document(doc_type)

       Unlike other constructors, this one generates an "odf_document"
       instance from scratch. Technically, it's a variant of
       "odf_new_document_from_template()" that uses the default template
       (provided with the lpOD library). The required argument specifies the
       document type, which must be 'text', 'spreadsheet', 'presentation', or
       'drawing'. The new document instance is not persistent; no file is
       created before an explicit call to the "save()" method.

       The following example creates a spreadsheet document instance:

	       my $doc = odf_new_document('spreadsheet');

       Note that each of the instructions below is equivalent to the one above:

	       my $doc = odf_create_document('spreadsheet');
	       my $doc = odf_document->create('spreadsheet');

       The real content of the instance depends on the default template.

       A set of valid template ODF files is transparently installed with the
       standard lpOD distribution. Advanced users may use their own template
       files. To do so, they have to replace the ODF files present in the
       "templates" subdirectory of the lpOD installation; the path to the
       lpOD installation may be retrieved through the
       "lpod->installation_path" common function. The user-provided template
       files must have the same

       Some metadata are initialized in the same way as with
       "odf_new_document_from_template()".

       Document instance termination

       In a long-running process, as soon as a document instance is no longer
       used, it's strongly recommended to issue an explicit call to its
       "forget()" method. Without an explicit destructor call, the allocated
       memory is not automatically released when the object goes out of
       scope. This constraint comes mainly from deliberately implemented
       circular references that allow applications to navigate back and
       forth between objects through direct links.
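
       The plain-Perl sketch below does not use lpOD. It shows why such
       circular references defeat Perl's reference counting, and how an
       explicit call that breaks the back links, playing the role of
       "forget()", lets the memory be released. The MyDoc class and its
       methods are hypothetical, and lpOD's real "forget()" may do more work.

```perl
use strict;
use warnings;

# Hypothetical classes (NOT lpOD code): a document owning its parts, each
# part keeping a direct link back to its document, i.e. a reference cycle.
package MyDoc;

my $destroyed = 0;    # number of documents actually released

sub new { my ($class) = @_; return bless { parts => [] }, $class }

sub add_part {
    my ($self, $name) = @_;
    my $part = bless { name => $name, doc => $self }, 'MyPart';  # back link
    push @{ $self->{parts} }, $part;                             # forward link
    return $part;
}

# Plays the role of an explicit forget(): break the cycle by hand.
sub forget {
    my ($self) = @_;
    delete $_->{doc} for @{ $self->{parts} };
    $self->{parts} = [];
    return;
}

sub DESTROY         { $destroyed++ }
sub destroyed_count { return $destroyed }

package main;

{
    my $doc = MyDoc->new;
    $doc->add_part('content');
}    # out of scope, but the cycle keeps the document alive (leaked)

{
    my $doc = MyDoc->new;
    $doc->add_part('content');
    $doc->forget;
}    # out of scope and no cycle left: the document is freed here

print 'documents freed so far: ', MyDoc::destroyed_count(), "\n";    # 1
```

       Scalar::Util's "weaken()" is the usual alternative for this kind of
       structure. An explicit release call keeps every link strong until the
       application decides otherwise.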

Document MIME type check and control

       get_mimetype()

       Returns the MIME type of the document (i.e. the full string that
       identifies the document type). An example of a regular ODF MIME type
       is:

               application/vnd.oasis.opendocument.text

       lpOD also allows the user to force a new arbitrary MIME type (this is
       not intended for ordinary lpOD applications!).

Access to individual document parts
       get_part(name [options])

       Generic "odf_document" method allowing access to any part of a
       previously created document instance, including parts that are not
       handled by lpOD. The lpOD library provides symbolic constants that
       represent the usual ODF parts: "CONTENT", "STYLES", "META",
       "SETTINGS", "MANIFEST", and "MIMETYPE".

       This instruction returns the CONTENT part of a document as an
       "odf_content" object:

               $content = $document->get_part(CONTENT);

       With "MIMETYPE" as argument, "get_part()" returns the MIME type of the
       document as a text string, i.e. the same result as "get_mimetype()".

       Note that "get_part(CONTENT)" may be replaced by the "content()"
       accessor, so the short form of the instruction above is:

               $content = $document->content;

       The parts are loaded for read-write use by default. However, an
       "update" boolean option may be provided; if set to "FALSE", this
       option tells lpOD that the loaded part will not be persistently
       changed. In that case the part is not really in "read-only" mode,
       since the user can still insert, update or delete any element, but the
       changes made in this part are not committed to the ODF file when the
       "save()" method is used. However, the user can make an XML export
       reflecting these changes at any time through the part-based
       "serialize()" method.

       For special purposes with XML parts, "get_part()" may be called with
       optional "handlers" and/or "roots" parameters that specify a custom
       behavior during parsing, before the full document is available. These
       parameters are respectively linked to the "twig_handlers" and
       "twig_roots" options of the underlying XML::Twig API, so you can find
       details about them in the XML::Twig documentation. The value of each
       one must be a hash reference whose keys are XML tags and whose values
       are user-defined function references.

       The given handlers are triggered each time the corresponding XML tags
       are found by the XML parser when the part is loaded, before any other
       processing. As an example, the following sequence displays the total
       number of paragraphs found in a document content, knowing that 'text:p'
       is the ODF tag for paragraphs:

               use ODF::lpOD;
               use feature 'say';

               my $doc = odf_get_document($filename);
               my $count = 0;
               my $content = $doc->get_part(
                       CONTENT,
                       handlers    => {
                           'text:p'    => sub { $count++ }
                           }
                       );
               say "This document contains $count paragraphs";

       Of course there are more user-friendly ways to count objects once the
       document part is loaded, and this feature is probably not needed in
       most cases. However, it's the most efficient way to process elements
       "on the fly" in huge documents.

       Note that the "handlers" option works only when the document part is
       loaded for the first time. So, in the following sequence, it does not
       work, because the "CONTENT" part is implicitly and automatically loaded
       and parsed by "get_body()" (knowing that the body context is located
       inside the "CONTENT"):

               sub process_paragraph   { say "Hello paragraph !" }
               $doc = odf_document->get($filename);
               $context = $doc->get_body;      # loads and parses CONTENT
               $content = $doc->get_part(      # too late: handlers ignored
                       CONTENT,
                       handlers        => {
                               'text:p'   => \&process_paragraph
                               }
                       );

       The user-defined callback function receives 2 arguments. The first one
       is the XML::Twig instance internally used by lpOD to handle the XML
       part (you can ignore it as long as you work with documented ODF::lpOD
       features only). The second one is the parsed ODF element itself.

       Note that every key in the handlers hash may be a quoted regexp in
       order to provide more flexibility. If, in the code example above,
       'text:p' is replaced by "qr'text:(p|h)'", then the corresponding
       handler is triggered for paragraphs and headings (knowing that 'text:h'
       is the ODF tag for headings).
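
       The sketch below illustrates only the matching rule, not XML::Twig
       itself, which does the real dispatching. It looks up the handlers for a
       few simulated tags. The "matching_handlers" helper is hypothetical, and
       real callbacks receive the twig and the element rather than a tag name.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT XML::Twig code): returns the handlers whose key
# matches a tag name, where a key is either a literal tag or a qr// regexp.
sub matching_handlers {
    my ($tag, $handlers) = @_;
    my @found;
    for my $key (sort keys %$handlers) {
        # A qr// used as a hash key is stringified (e.g. "(?^:text:(p|h))"),
        # so such keys are recompiled; other keys are compared literally.
        my $re = $key =~ /^\(\?[\^\w-]*:/ ? qr/^(?:$key)$/ : qr/^\Q$key\E$/;
        push @found, $handlers->{$key} if $tag =~ $re;
    }
    return @found;
}

my %count = (paragraph => 0, block => 0);
my %handlers = (
    'text:p'       => sub { $count{paragraph}++ },
    qr'text:(p|h)' => sub { $count{block}++ },
);

# Simulated sequence of tags, as a parser could report them.
for my $tag (qw(text:h text:p text:span text:p)) {
    $_->($tag) for matching_handlers($tag, \%handlers);
}
print "paragraphs: $count{paragraph}, paragraphs+headings: $count{block}\n";
```

       With the four simulated tags, the literal key fires twice and the
       regexp key fires three times, because 'text:span' matches neither key.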

       The "roots" option produces a more drastic effect. If this option is
       set, "get_part()" ignores any XML content outside of the given roots
       (with the exception of the root element of the XML part). As an
       example, the instruction below tells "get_part()" to load only the
       'office:automatic-styles' element of the "CONTENT" part:

               my $content = $doc->get_part(
                       CONTENT,
                       roots   => {
                           'office:automatic-styles' => TRUE
                           }
                       );

       In the example above, each root tag is given with an associated "TRUE"
       value. The given value may be a user-defined function as well; if so,
       the given function is triggered each time the given XML tag is
       processed, in the same way as with the "handlers" option. The next
       example illustrates the fastest way to parse a large document just to
       extract and display its headings (i.e. the 'text:h' elements), without
       any other processing (this code, with some more output presentation
       sugar, could be used in order to quickly export a table of contents):

               sub say_heading_text {
                       my ($twig, $heading) = @_;
                       say $heading->get_text;
                       }
               my $doc = odf_get_document($filename);
               $doc->get_part(
                       CONTENT,
                       roots   => {
                               'text:h' => \&say_heading_text
                               }
                       );

       Note that, after such a sequence, the loaded content includes only the
       root element and the 'text:h' elements.

       The "roots" option allows applications to avoid performance issues
       when they just need read-only access to particular portions of huge
       documents. On the other hand, this option should not be used when the
       part is loaded for update, because it would produce truncated and
       inconsistent documents. So, as soon as "roots" is set, the default
       value of the "update" option is silently set to "FALSE" (but the user
       can explicitly set this option to "TRUE"... and live with the
       consequences).

       Caution: These options work only with a previously existing document,
       and only if the given part has not already been loaded.

       "get_part()" may be used in order to get any other document part, such
       as an image or any other non-XML part. To do so, the real path of the
       needed part must be specified instead of one of the XML part symbolic
       names. As an example, the instruction below returns the binary content
       of an image:

	       $img = $document->get_part('Pictures/logo.jpg');

       In such a case, the method returns the data as an uninterpreted
       sequence of bytes.

       (Remember that image files included in an ODF package are stored in a
       "Pictures" folder.)

       Returns "undef" in case of failure.

       For each of the "CONTENT", "STYLES", "META", and "MANIFEST" parts,
       there is a shortcut for "get_part()": an accessor whose name is the
       part name in lower case. It's just syntactic sugar. As an example, the
       two following instructions are equivalent:

               $part = $doc->get_part(CONTENT);
               $part = $doc->content;

       A special "get_body()" or "body()" accessor is available. "get_body()"
       is mainly a part-based method, introduced later, but, when called from
       a document object, it returns the body element of the "CONTENT" part.
       So the four instructions below are equivalent:

               $context = $doc->get_body;
               $context = $doc->get_part(CONTENT)->get_body;
               $context = $doc->content->get_body;
               $context = $doc->body;

       Note that "get_body()" may be called with an optional argument that
       specifies the type of content, typically 'text', 'spreadsheet',
       'presentation', or 'drawing'. Of course, a well-formed ODF document
       should contain only one body, and its content type depends on the
       document type (for example the content type of a text document is
       always 'text'). Providing a content type to "get_body()" is just one
       way among others to check the document type, since this method returns
       "undef" if the given content type doesn't match the real one.

               my $context = $doc->get_body('spreadsheet');
               if ($context) {
                       # do something
               } else {
                       warn "We are not in spreadsheet context!\n";
               }


       Returns the list of the document parts.

Accessing data inside a part
       Everything in the part is stored as a set of "odf_element" instances.
       So, for complex parts (such as "CONTENT") or parts that are not
       explicitly covered in the present documentation, the applications need
       to get access to an "entry point" that is a particular element. The
       most used entry points are the "root" and the "body". Every part
       handler provides the "get_root()" and "get_body()" methods, each one
       returning an "odf_element" instance that provides all the
       element-based features (including the creation, insertion or retrieval
       of other elements that may in turn become working contexts).

       For those who know the ODF XML schema, two part-based methods allow the
       selection of elements according to XPath expressions, namely
       "get_element()" and "get_elements()". The first one requires an XPath
       expression and a positional number; it returns the element
       corresponding to the given position in the result set of the XPath
       expression (if any). The second one returns the full result set (i.e. a
       list of "odf_element" instances). For example, the instructions below
       return respectively the first paragraph and all the paragraphs of a
       part (assuming $part is a previously selected document part):

               my $paragraph = $part->get_element('text:p', 0);
               my @paragraphs = $part->get_elements('text:p');

       Beware that such instructions should not appear in a real application,
       since lpOD provides more user-friendly methods to retrieve paragraphs
       (see ODF::lpOD::TextElement).

       Note that the position argument of "get_element()" is zero-based, and
       that it may be a negative value (if so, it specifies a position counted
       backward from the last matching element, -1 being the position of the
       last one).
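
       The positional rule matches Perl's own array indexing. The small sketch
       below restates it over a plain list of matches. The "nth_match" helper
       is hypothetical and does not run any XPath query.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the positional rule of get_element():
# positions are zero-based, and negative ones count back from the end.
sub nth_match {
    my ($position, @matches) = @_;
    my $n = scalar @matches;
    return undef if $position >= $n || $position < -$n;    # out of range
    return $matches[$position];
}

my @paragraphs = ('first', 'second', 'third');
print nth_match(0,  @paragraphs), "\n";    # first
print nth_match(-1, @paragraphs), "\n";    # third
```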

       So a large part of the lpOD functionality is described with the
       "odf_element" class, i.e. ODF::lpOD::Element.

Global document metadata
       From the handler provided by "get_part(META)" (or "meta()"), several
       pieces of document metadata may be directly read or set.

   Simple metadata accessors
       Most metadata are just text strings. The user may read or write each
       one using a "get_xxx" or "set_xxx" accessor, where "xxx" is the lpOD
       name of a particular property. The presently supported simple
       properties are:

       o   "creation_date": the date of the initial version of the document,
           expressed in ISO-8601 date format

       o   "creator": the name of the user who created the current version of
           the document

       o   "description": the long description of the document

       o   "editing_cycles": the number of edit sessions (may be regarded as a
           version number)

       o   "editing_duration": the total editing time through interactive
           software, expressed as a time delta in ISO-8601 format

       o   "generator": the signature of the application that created the
           document

       o   "initial_creator": the name of the user who created the first
           version of the document

       o   "language": the ISO code of the main language used in the document

       o   "modification_date": the date of the last modification (i.e. of the
           current version)

       o   "subject": the subject (or short description) of the document

       o   "title": the title of the document.

       When used without argument, some "set" accessors may automatically set
       default values, according to the capabilities of the run-time
       environment. For "set_creation_date()" and "set_modification_date()",
       the default is the current system date. For "set_creator()" and
       "set_initial_creator()", the default is the identifier of the current
       system user. For "set_generator()" the default is the system name of
       the current program (as it would appear in a command line) or, if not
       available, the current process identifier. If the execution
       environment can't provide such information, no default value is
       provided. "set_editing_cycles()", without argument, increments the
       "editing_cycles" indicator by 1.
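
       As an illustration only, the sketch below shows one way to obtain the
       kind of defaults described above from a Perl run-time environment. The
       helper names are hypothetical, and it is an assumption that lpOD
       proceeds this way.

```perl
use strict;
use warnings;

# Hypothetical helpers (NOT lpOD code) showing how such defaults can be
# obtained from the run-time environment.
sub default_user {
    # getpwuid() is unimplemented on some platforms (e.g. Windows), hence
    # the eval and the environment-variable fallbacks; may return undef.
    my $name = eval { (getpwuid $<)[0] };
    return $name // $ENV{USER} // $ENV{USERNAME};
}

sub default_generator {
    # Program name as it appears on the command line, else the process id.
    return (defined $0 && length $0) ? $0 : $$;
}

sub incremented_cycles {
    my ($cycles) = @_;
    $cycles //= 0;              # no current value counts as 0
    return $cycles + 1;
}

print 'creator:   ', default_user() // '(unknown)', "\n";
print 'generator: ', default_generator(), "\n";
print 'cycles:    ', incremented_cycles(1), "\n";    # 2
```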

       Both "set_creation_date()" and "set_modification_date()" allow the user
       to provide the date in the ODF-compliant (ISO-8601) format, or in
       numeric format (like the Perl "time" format). In the second case, the
       provided time is automatically converted to the required format. Of
       course, the numeric format is more convenient for time calculations.

       The instruction below, for example, sets the modification date to one
       hour earlier than the current system time:

               $meta->set_modification_date(time() - 3600);

       The corresponding "get_" accessors always return the dates in their
       storage format. However, the lpOD library provides a "numeric_date"
       global function that translates a regular ISO date into a Perl numeric
       "time" value (a symmetric "iso_date" global function translates a Perl
       "time" into an ISO date).
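
       The kind of conversion these two functions perform can be sketched with
       core modules only. The helpers below are hypothetical stand-ins, not
       lpOD's "iso_date" and "numeric_date". They assume UTC and the plain
       YYYY-MM-DDThh:mm:ss form, whereas the real functions may use local time
       and accept other variants.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# Hypothetical stand-in for iso_date: Perl time -> ISO-8601 (UTC assumed).
sub to_iso_date {
    my ($time) = @_;
    return strftime('%Y-%m-%dT%H:%M:%S', gmtime $time);
}

# Hypothetical stand-in for numeric_date: ISO-8601 -> Perl time (UTC).
sub from_iso_date {
    my ($iso) = @_;
    my ($y, $mo, $d, $h, $mi, $s) =
        $iso =~ /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)/
        or return undef;                       # not an ISO date
    return timegm($s, $mi, $h, $d, $mo - 1, $y);
}

my $one_hour_ago = time() - 3600;
my $iso = to_iso_date($one_hour_ago);
print "$iso\n";
print from_iso_date($iso) == $one_hour_ago ? "round trip ok\n" : "mismatch\n";
```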

       Examples of use:

               $meta->set_title("The lpOD Cookbook");
               $meta->set_creator("The lpOD Project team");
               my $old_version = $meta->get_editing_cycles;

   Document statistics
       The global document statistics (as defined in section 3.1.18 of the
       ODF 1.1 specification) may be read or set using the "get_statistics()"
       and "set_statistics()" accessors. The first one returns the statistic
       properties as a hash reference. The second one takes a hash reference
       with the same structure, containing the attribute names and values. The
       following example displays the page count of the document (assuming
       it's a text document):

               my $meta = $document->meta;
               my $stat = $meta->get_statistics;
               say $stat->{'meta:page-count'};

       Note that nothing prevents applications from using "set_statistics()"
       to set any arbitrary figure.

   Keywords
       The document metadata include a list of keywords (possibly empty). This
       list may be read or changed.

       get_keywords()

       Since a document may be "tagged" by one or more keywords, "odf_meta"
       provides a "get_keywords()" method that returns the list of the
       current keywords as a comma-separated string.

       set_keywords(list)

       "set_keywords()" allows the user to set a full list of keywords,
       provided as a single comma-separated string; the provided list
       replaces any previously existing keywords. Used without argument or
       with an empty string, this method just removes all the keywords.
       Example:

               $meta->set_keywords("ODF, OpenDocument, Python, Perl, Ruby, XML");

       The spaces after the commas are ignored, and it's not possible to set a
       keyword that contains comma(s) through "set_keywords()".
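
       The parsing rule stated above can be sketched in a few lines of plain
       Perl. The "parse_keywords" helper is hypothetical, and lpOD's internal
       code may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch of the parsing rule described for set_keywords():
# split on commas, dropping the spaces that follow each comma.
sub parse_keywords {
    my ($list) = @_;
    $list //= '';
    return () if $list eq '';    # no argument or '': no keywords at all
    return split /,\s*/, $list;
}

my @keywords = parse_keywords("ODF, OpenDocument, Python, Perl, Ruby, XML");
print join('|', @keywords), "\n";    # ODF|OpenDocument|Python|Perl|Ruby|XML
```

       Because the comma is the separator, a keyword containing a comma cannot
       be expressed in this form, which is why "set_keywords()" cannot set
       one.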

       set_keyword(keyword)

       "set_keyword()" appends a new, given keyword to the list; it has no
       effect if the given keyword is already present. It allows commas in
       the given keyword (but we don't recommend such a practice).

       check_keyword(expression)

       "check_keyword()" returns "TRUE" if its argument (which may be a
       regular expression) matches an existing keyword, or "FALSE" if the
       keyword is not present.

       remove_keyword(expression)

       "remove_keyword()" deletes any keyword that matches the argument (which
       may be a regular expression).
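
       The documented behaviors of "set_keyword()", "check_keyword()" and
       "remove_keyword()" can be mirrored on an in-memory list. This is a
       hypothetical sketch, not lpOD's implementation, which works on the
       metadata XML.

```perl
use strict;
use warnings;

# Hypothetical in-memory keyword list (NOT lpOD code).
my @keywords = ('ODF', 'OpenDocument', 'Perl');

sub add_keyword {      # like set_keyword(): no effect if already present
    my ($kw) = @_;
    push @keywords, $kw unless grep { $_ eq $kw } @keywords;
    return;
}

sub has_keyword {      # like check_keyword(): string or qr// pattern
    my ($pattern) = @_;
    return scalar grep { $_ =~ $pattern } @keywords;
}

sub drop_keyword {     # like remove_keyword(): removes every match
    my ($pattern) = @_;
    @keywords = grep { $_ !~ $pattern } @keywords;
    return;
}

add_keyword('Perl');                       # already there: no change
add_keyword('XML');
print has_keyword(qr/^Open/) ? "found\n" : "missing\n";    # found
drop_keyword(qr/^(ODF|XML)$/);
print join(', ', @keywords), "\n";         # OpenDocument, Perl
```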

   User-defined metadata
       Each user-defined metadata element has a unique name (or key), a value
       and a data type.

       get_user_field(name)

       Retrieves a user-defined field according to its name (which should be
       unique for the document). In scalar context, returns the value of the
       field. In list context, returns the value and the data type.

       The regular ODF data types are "float", "date", "time", "boolean", and
       "string".
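
       The scalar/list behavior described above is the usual Perl "wantarray"
       idiom. The sketch below illustrates it with a hypothetical in-memory
       store. It is not lpOD code, which reads the fields from the metadata
       part.

```perl
use strict;
use warnings;

# Hypothetical store of user-defined fields: name => [value, type].
my %user_fields = ('Ready for release' => ['false', 'boolean']);

# Context-sensitive getter in the style described for get_user_field():
# scalar context yields the value, list context yields (value, type).
sub user_field {
    my ($name) = @_;
    my $field = $user_fields{$name} or return;    # unknown name
    return wantarray ? @$field : $field->[0];
}

my $value      = user_field('Ready for release');    # 'false'
my ($v, $type) = user_field('Ready for release');    # ('false', 'boolean')
print "$value / $v ($type)\n";
```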

       The "odf_meta" API provides a "get_user_fields()" method that returns a
       list of hash refs whose (self-documenting) keys are "name", "value",
       and "type".

       As an example, the following loop displays the name, the value and the
       type of each user field in the metadata part of a document:

               my $doc = odf_get_document($source);
               my $meta = $doc->meta;
               foreach my $uf ($meta->get_user_fields) {
                       say "Name   " . $uf->{name} .
                           "  Value  " . $uf->{value} .
                           "  Type   " . $uf->{type};
               }


       Allows the applications to set or change all the user-defined items.
       Its argument is a list of hash refs with the same structure as the
       result of "get_user_fields()".

       set_user_field(name, value, type)

       Creates or changes a user field. The first argument is the name
       (identifier). The last argument is the data type, which must be
       ODF-compliant (see "get_user_field()"). If the type is not specified,
       its default value is 'string'. If the type is "date", the value is
       automatically converted to ISO-8601 format if provided as a numeric
       "time" value.


               $meta->set_user_field("Development status", "Working draft");
               $meta->set_user_field("Security status", "Classified");
               $meta->set_user_field("Ready for release", FALSE, "boolean");

How to persistently update a document
       Every part may be updated using specific methods that create, change
       or remove elements, but these methods don't produce any persistent
       effect.

       The updates done in a given part may be either exported as an XML
       string, or returned to the "odf_document" instance the part belongs
       to. With the first option, the user is responsible for the management
       of the exported XML (which can't be used as is through a typical
       office application), and the original document is not persistently
       changed. The second option tells the "odf_document" that the part has
       been changed and that this change should be reflected as soon as the
       physical resource is written back. However, a part-based method can't
       directly update the resource. The changes may be made persistent
       through the "save()" method of the "odf_document" object.


       Same as "serialize()", introduced below.

       serialize()

       This part-based method returns a full XML export of the part. The
       returned XML string may be stored somewhere and used later in order to
       create or replace a part in another document, or to feed another

       This method may be ignored by users who just need to save created or
       changed documents in a regular compressed ODF format, because the
       document-based "save()" method does the whole job.

       An "indent" or "pretty" named option may be provided. If set to "TRUE",
       this option specifies that the XML export should be indented, so as to
       be as human-readable as possible. The default value of this option is
       "FALSE".

       The example below returns a conveniently indented XML representation of
       the content part of a document:

               $doc = odf_document->get('C:\MyDocuments\test.odt');
               $part = $doc->get_part(CONTENT);
               $xml = $part->serialize(indent => TRUE);

       Note that this XML export is not affected by the encoding/decoding
       mechanism that works for user content, so its character set doesn't
       depend on the custom text output character set possibly selected
       through the "set_output_charset()" method introduced in

       lpOD allows applications to export individually selected XML elements
       instead of full XML parts; to do so, a "serialize()" or "export()"
       element-based method is provided (see ODF::lpOD::Element).

       store()

       This part-based method stores the present state (possibly changed) of
       the part in a temporary, non-persistent space, waiting for the
       execution of the next call of the document-based "save()" method.

       This method may be ignored by users who just need to save created or
       changed documents in a regular compressed ODF format, because the
       document-based "save()" method does the whole job.

       The following example selects the "CONTENT" part of a document, removes
       the last paragraph of this content, then sends back the changed content
       to the document, which in turn is made persistent:

               $content = $document->get_part(CONTENT);
               $p = $content->get_body->get_paragraph(-1);
               $p->delete;             # remove the last paragraph
               $content->store;        # send the change back to the document
               $document->save;        # write the ODF package

       Like "serialize()", "store()" allows the "pretty" option, in order to
       store human-readable XML in the file that will be generated by
       "save()" (for debugging only).

       Note that "store()" doesn't write anything to persistent storage; it
       just tells the "odf_document" that this part needs to be updated.

       The explicit use of "store()" to commit the changes made in an
       individual part is not mandatory. When the whole document is made
       persistent through the document-based "save()" method, each part is
       automatically stored by default. However, this automatic storage may
       be deactivated using "needs_update()".

       needs_update(boolean)

       This part-based method allows the user to prevent the automatic storage
       of the part when the "save()" method of the corresponding
       "odf_document" is executed.

       As soon as a document part is used, either explicitly through the
       "get_part()" document method or indirectly, it may be modified. By
       default, the document-based "save()" method stores back in the
       container every part that may have been used. The user may change this
       default behavior using the part-based "needs_update()" method, whose
       argument is "TRUE" or "FALSE".

       In the example below, the application uses the "CONTENT" and "META"
       parts, but only the "META" part is really updated, whatever the changes
       made in the "CONTENT":

               $doc = odf_get_document('source.odt');
               $content = $doc->get_part(CONTENT);
               $meta = $doc->get_part(META);
               # ... changes in $content and $meta ...
               $content->needs_update(FALSE);  # don't write CONTENT back
               $doc->save;

       Note that "needs_update(FALSE)" deactivates the automatic update only;
       the explicit use of the "store()" part-based method always remains
       possible.

       add_file(source [options])

       This document-based method stores an external file "as is" in the
       document container, without interpretation. The mandatory argument is
       the path of the source file, provided according to either the local
       file system rules or a URL.

       If the path contains a ":" and if this sign is preceded by anything
       other than a single letter, then it's regarded as a remote URL. So, as
       examples, a path that looks like "http:..." is supposed to be aimed at
       a distant resource, while "C:\...", "/xxx/yyy..." and "aaa" are
       supposed to specify local files. As soon as a resource is regarded as
       remote, lpOD tries to load it through "LWP::Simple", so you should read
       the "LWP::Simple" documentation for details about the supported
       protocols. Beware that this module is not required at ODF::lpOD
       installation time, and that "add_file()" just fails, without a fatal
       error, when it's called with a remote URL while "LWP::Simple" is not
       installed.
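
       The remote/local rule above can be restated as a small predicate. The
       sketch below is a hypothetical re-statement, not the code used by
       "add_file()". It assumes the rule applies to the first ":" in the
       path.

```perl
use strict;
use warnings;

# Hypothetical re-statement of the rule: a ":" preceded by anything other
# than a single letter (a drive letter such as "C:") marks a remote URL.
sub looks_remote {
    my ($path) = @_;
    return 0 unless $path =~ /:/;            # no colon: local path
    return $path =~ /^[A-Za-z]:/ ? 0 : 1;    # "C:..." is a local drive
}

for my $path ('http://example.org/logo.png', 'C:\\images\\logo.png',
              '/xxx/yyy/logo.png', 'aaa') {
    printf "%-30s %s\n", $path, looks_remote($path) ? 'remote' : 'local';
}
```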

       Optional named parameters "path" and "type" are allowed; "path"
       specifies the destination path in the ODF package, while "type" is the
       MIME type of the added resource. Note that the "path" parameter is by
       no means related to the source path specified by the first argument.

       As an example, the instruction below inserts a binary image file
       available in the current directory in the "Thumbnails" folder of the
       document package:

               # "thumbnail.png" stands for the local source file name
               $doc->add_file("thumbnail.png",
                       path => "Thumbnails/thumbnail.png"
                       );

       If the "path" parameter is omitted, the destination folder in the
       package is either "Pictures", if the source is identified as an image
       file (caution: such recognition may not work with every image type in
       every environment), or the root folder.
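
       The sketch below illustrates this default. It is hypothetical and rests
       on two assumptions: that image recognition relies on the file suffix
       only, and that the source file name is kept as the entry name. lpOD's
       own detection and naming may differ.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# ASSUMPTION: suffix-based image recognition (lpOD's own may differ).
my %image_suffix = map { $_ => 1 } qw(png jpg jpeg gif bmp tif tiff svg);

# Hypothetical helper: default destination path when no "path" is given.
sub default_package_path {
    my ($source) = @_;
    my $name     = basename($source);
    my ($suffix) = $name =~ /\.([^.]+)$/;
    my $is_image = defined $suffix && $image_suffix{ lc $suffix };
    return $is_image ? "Pictures/$name" : $name;    # else package root
}

print default_package_path('/home/images/logo.png'), "\n";  # Pictures/logo.png
print default_package_path('notes.txt'), "\n";              # notes.txt
```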

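       The following example creates an entry whose every property is
       explicitly specified:

               # "portrait.jpg" stands for the source file name or URL
               $doc->add_file("portrait.jpg",
                       path    => "Pictures/portrait.jpg",
                       type    => "image/jpeg"
                       );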

       If the "type" option is not provided, lpOD attempts to automatically
       determine the MIME type using "File::Type", provided that the file is
       available in the local file system. If the file format is not
       recognized, lpOD doesn't provide any default value, so the MIME type of
       the resource is not registered in the document. Note that correct MIME
       types are not strictly required by typical ODF-compatible software, but
       it's a good practice to provide them when possible.

       The return value is the destination path. If the imported file is an
       image, this return value may be used as a reference each time the
       corresponding image is inserted in the document through a "frame" (for
       details about the ways to insert image frames in documents, see
       ODF::lpOD::StructuredContainer).

       This method may be used in order to import an external XML file as a
       replacement for a conventional ODF XML part, without interpretation.
       As an example, the following instruction replaces the "STYLES" part of
       a document with an arbitrary file:

               $document->add_file("custom_styles.xml", path => STYLES);

       (For mnemonic reasons, it's possible to replace "path" by "part",
       knowing that each part of a document is practically identified by a
       path in the physical archive.)

       Note that the physical effect of "add_file()" is not immediate; the
       file is really added (and the source is really read) only when the
       "save()" method, introduced below, is called. As a consequence, any
       update made in a document part loaded using "add_file()" is lost.
       According to the same logic, a document part loaded using "add_file()"
       is never available in the current document instance; it becomes
       available if the current instance is made persistent through a
       "save()" call and if a new instance is created from the saved package
       with "odf_get_document()".


       Specialized derivative of "add_file()", to be used in order to import
       image files used in the document without explicit "type" and "path"
       options.

       In scalar context, the return value is the same as "add_file()", so it
       may be used as the image reference in order to associate the image with
       a "frame" that will make it visible in the document (see the "Frames"
       section in ODF::lpOD::StructuredContainer).

       In array context, "add_image_file()" returns the image reference
       followed (if everything goes well) by the image size. This size, if
       defined, may be used to set the size of the corresponding image
       container in the document (see the "Frames" section in
       ODF::lpOD::StructuredContainer), as in the following example:

	       my ($link, $size) = $doc->add_image_file('/home/images/logo.png');
	       my $frame = odf_create_image_frame($link, size => $size);

       However, the automatic size detection works only if the image file is
       recognized by Image::Size (fortunately, the most popular formats, such
       as PNG, JPG, BMP, XPM and TIFF, are supported).

       If the "type" option is not set, lpOD attempts to determine the MIME
       type using "File::Type", but a specific rule applies in case of
       failure. If the type is not automatically recognized, lpOD arbitrarily
       appends the suffix of the file name to the "image/" string (so if the
       source file name is "foo.jpeg", the assumed MIME type is
       "image/jpeg"), which may yield a correct MIME type in some situations.
       If nothing works (i.e., if there is no application-provided type, if
       "File::Type" doesn't answer, and if there is no file suffix), the type
       is set to "image/unknown". Users are encouraged to avoid such a result;
       fortunately, a wrong MIME type doesn't prevent typical ODF-compatible
       office software from correctly rendering an image in a document
       (provided that the image format itself is supported, which doesn't
       depend on lpOD).

       Note that it's strongly recommended to avoid intensive use of
       "add_image_file()" in array context, especially in long-running
       processes and/or with remote resources: in order to get the image
       size, lpOD immediately loads the file and stores it in memory. If
       "add_image_file()" is called in scalar context, the actual file load is
       deferred until the ODF target file is generated by "save()".
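
       The difference between the two calling contexts relies on the Perl
       "wantarray" built-in. The hypothetical "import_image()" sub below is
       not lpOD code; it only mimics the documented return convention of
       "add_image_file()", and its destination path and size value are
       placeholders.

```perl
use strict;
use warnings;

# Hypothetical stand-in for add_image_file(), not lpOD code: it only shows how
# a Perl sub can return a single reference in scalar context and a
# (reference, size) pair in list context, using wantarray.
sub import_image {
    my ($source) = @_;
    (my $name = $source) =~ s{.*/}{};      # keep the base name only
    my $link = "Pictures/$name";           # assumed destination path
    return $link unless wantarray;         # scalar context: no file access
    # List context: a real implementation would load and measure the file
    # here (lpOD relies on Image::Size); this placeholder is not lpOD's format.
    my $size = '100pt, 50pt';
    return ($link, $size);
}

my $link_only     = import_image('/home/images/logo.png');  # scalar context
my ($link, $size) = import_image('/home/images/logo.png');  # list context
```

       Only the list-context call pays for loading the file, which is why the
       scalar form is preferable when the size is not needed.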


       Allows the user to create or replace a document part using data in
       memory.	The first argument is the target ODF part, while the second
       one is the source string.


       Deletes a part in the document package. The deletion is physically
       performed by the next call to "save()". The argument may be either the
       symbolic constant standing for a conventional ODF XML part or the real
       path of the part in the package.

       The following sequence replaces (without interpretation) the current
       document content part with external content:

	       $document->add_file("/somewhere/stuff.xml", path	=> CONTENT);

       Note that the order of these instructions is not significant; when
       "save()" is called, it performs all the deletions first, then all the
       part insertions and/or updates.
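
       The deferred behavior and the ordering rule of "save()" can be
       pictured with the small model below. "PendingPackage",
       "request_delete()" and "request_insert()" are hypothetical names
       chosen for this sketch; they are not part of the lpOD API, and the
       model only illustrates the ordering rule.

```perl
use strict;
use warnings;

# Hypothetical model, not lpOD's implementation: update requests are only
# recorded, and save() applies them all at once, running every deletion
# before any insertion, whatever the order of the calls.
package PendingPackage;

sub new {
    my ($class, %parts) = @_;
    return bless { parts => {%parts}, deletes => [], inserts => [] }, $class;
}

sub request_delete {
    my ($self, $path) = @_;
    push @{ $self->{deletes} }, $path;
}

sub request_insert {
    my ($self, $path, $data) = @_;
    push @{ $self->{inserts} }, [ $path, $data ];
}

sub save {
    my ($self) = @_;
    delete $self->{parts}{$_} for @{ $self->{deletes} };            # deletions first
    $self->{parts}{ $_->[0] } = $_->[1] for @{ $self->{inserts} };  # then insertions
    $self->{deletes} = [];
    $self->{inserts} = [];
    return $self->{parts};
}

package main;

my $pkg = PendingPackage->new('content.xml' => 'old content');
$pkg->request_insert('content.xml', 'new content');   # requested first
$pkg->request_delete('content.xml');                  # requested second
my $before = $pkg->{parts}{'content.xml'};            # still 'old content'
my $after  = $pkg->save->{'content.xml'};             # 'new content'
```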


       This method is provided by the "odf_document". If the document instance
       is associated with a regular ODF resource available for update (meaning
       that it has been created using "odf_get_container" and that the user
       has write access to the resource), the resource is written back and
       reflects all the changes previously committed by one or more document
       parts using their respective "store" methods.

       The general form of a document processing section looks like this:

	       $doc = odf_get_document($filepath);
	       # various document updates
	       $doc->save;

       As an example, the sequence below updates an ODF file according to
       changes made in the "META" and "CONTENT" parts:

	       my $doc = odf_get_document("/home/users/jmg/report.odt");
	       my $meta = $doc->get_part(META);
	       my $content = $doc->get_part(CONTENT);
	       # meta updates are made here
	       # content updates are made here
	       $doc->save;

       The "save()" method accepts a "pretty" option in order to get
       human-readable XML in the resulting ODF files. Warning: this feature is
       intended for debugging only and must be avoided in production, since it
       may insert undesirable spaces in the text contents and increase the
       file size. Example:

		       $document->save(pretty => TRUE);

       The "pretty" feature may be customized through the XML_PRETTY_PRINT()
       global setting function, which allows the application to select a
       particular XML export style. The default is 'indented'; other legal
       values are 'nice', 'indented_c', 'indented_a', 'indented_close_tag',
       'cvs', 'wrapped', 'record', 'record_c', 'nsgmls' and 'none'. For
       details about the effects of each option, see "set_pretty_print()" in
       XML::Twig.

       In the following example, the XML is stored according to the 'nsgmls'
       style:

		       XML_PRETTY_PRINT('nsgmls');
		       $document->save(pretty => TRUE);

       An optional "target" parameter may be provided to "save()". If set,
       this parameter specifies an alternative destination for the file (it
       produces the same effect as the "File/Save As" feature of typical
       office software). The "target" option is always allowed, but it's
       mandatory with "odf_document" instances created using an
       "odf_new_document_from..." constructor.

       The manifest part of a document holds the list of the files included in
       the container associated with the "odf_document". It's represented by
       an "odf_manifest" object, which is a particular "odf_xmlpart".

       Each included file is represented by an "odf_file_entry" object, whose
       properties are:

       o   "path": full	path of	the file in the	container;

       o   "type" : the	media type (or MIME type) of the file.

       An "odf_manifest" instance is created through the "get_part()" method
       of "odf_document", with "MANIFEST" as part selector:

	       $manifest = $document->get_part(MANIFEST);

   Entry access
       The full list of manifest entries may be obtained using the
       "get_entries()" method.

       It's possible to restrict the list with an optional "type" parameter,
       whose value is interpreted as a regular expression. If "type" is set,
       the method returns only the entries whose media type string matches
       the given expression.

       As an example, the first	instruction below returns the entries that
       correspond to XML parts only, while the next one	returns	all the	XML
       entries,	including those	whose type is not "text/xml" (such as
       "application/rdf+xml"), and the last returns all	the "image/xxx"
       entries (whatever the image format):

	       @xmlp_entries = $manifest->get_entries(type => 'text/xml');
	       @xml_entries = $manifest->get_entries(type => 'xml');
	       @image_entries =	$manifest->get_entries(type => 'image');
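
       The matching rule can be illustrated without lpOD, by modeling
       manifest entries as plain Perl hashes and applying the "type" value as
       an unanchored regular expression, which is consistent with the
       examples above. "filter_entries()" is a hypothetical helper, and the
       sample entries are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical sketch of type-based filtering, mimicking get_entries(type => ...):
# each entry is a plain hash here, and the "type" filter is an unanchored
# regular expression, so 'xml' also matches 'application/rdf+xml'.
my @entries = (
    { path => '/',                  type => 'application/vnd.oasis.opendocument.text' },
    { path => 'content.xml',        type => 'text/xml' },
    { path => 'manifest.rdf',       type => 'application/rdf+xml' },
    { path => 'Pictures/logo.png',  type => 'image/png' },
    { path => 'Pictures/photo.jpg', type => 'image/jpeg' },
);

sub filter_entries {
    my ($type, @list) = @_;
    return @list unless defined $type;             # no filter: every entry
    return grep { $_->{type} =~ /$type/ } @list;   # unanchored match
}

my @xmlp  = filter_entries('text/xml', @entries);  # content.xml only
my @xml   = filter_entries('xml', @entries);       # content.xml and manifest.rdf
my @image = filter_entries('image', @entries);     # both pictures
```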

       An individual entry may be selected according to	its "path", knowing
       that the	path is	the entry identifier. The "get_entry()"	method,	whose
       mandatory argument is the "path", does the job. The following
       instruction returns the entry that stands for a given image resource
       included	in the package (if any):

	       $img_entry = $manifest->get_entry('Pictures/13BE2000BDD8EFA.jpg');

   Entry creation and removal
       Once selected, an entry may be deleted using the	generic	"delete"
       method.	The "del_entry()" method, whose	mandatory argument is an entry
       path, deletes the corresponding entry, if any. If the given entry
       doesn't exist, nothing is done. The return value	is the removed entry,
       or "undef".

       A new entry may be added using the "set_entry()" method. This method
       requires a unique path as its mandatory argument. An optional "type"
       named parameter may be provided, but is not required; without a "type"
       specification, the media type remains empty. This method returns the
       new entry object, or a null value in case of failure. The example below
       adds an entry corresponding to an image file:

	       $manifest->set_entry('Pictures/xyz.jpg',	type =>	'image/jpeg');

       If "set_entry()"	is called with the same	path as	an existing entry, the
       old entry is removed and	replaced by the	new one.

       If the entry path is a folder, i.e. if its last character is "/", then
       the media type is automatically set to an empty value. However, this
       rule doesn't apply to the root folder, i.e. "/",	whose type should be
       the MIME	type of	the document.
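
       The entry rules above (one entry per path, replacement by
       "set_entry()", "undef" from "del_entry()" when nothing is removed,
       empty type for folders except the root) can be modeled in a few lines.
       "MiniManifest" is a hypothetical in-memory class written for this
       sketch; it doesn't read or write any real manifest and is not the lpOD
       implementation.

```perl
use strict;
use warnings;

# Hypothetical in-memory model, not lpOD's implementation, of the manifest
# entry rules: one entry per path, set_entry() replacing any existing entry,
# del_entry() returning the removed entry or undef, and an empty type for
# folder paths (ending with "/"), except the root folder "/".
package MiniManifest;

sub new {
    my ($class) = @_;
    return bless { entries => {} }, $class;
}

sub set_entry {
    my ($self, $path, %opt) = @_;
    return undef unless defined $path && length $path;    # failure
    my $type = defined $opt{type} ? $opt{type} : '';      # empty by default
    $type = '' if $path ne '/' && $path =~ m{/\z};         # folder rule
    return $self->{entries}{$path} = { path => $path, type => $type };
}

sub get_entry {
    my ($self, $path) = @_;
    return $self->{entries}{$path};
}

sub del_entry {
    my ($self, $path) = @_;
    return delete $self->{entries}{$path};                 # undef if absent
}

package main;

my $m = MiniManifest->new;
$m->set_entry('Pictures/xyz.jpg', type => 'image/jpeg');
$m->set_entry('Pictures/', type => 'image/jpeg');        # folder: type forced to ''
$m->set_entry('Pictures/xyz.jpg', type => 'image/png');  # replaces the old entry
```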

       Beware: adding or removing a manifest entry doesn't automatically add
       or remove the corresponding file	in the container, and there is no
       automatic consistency check between the real content of the part	and
       the manifest.

   Entry property handling
       An individual manifest entry is an "odf_file_entry" object, which is a
       particular "odf_element" object.

       It provides the "get_path()", "set_path()", "get_type()" and
       "set_type()" accessors, to get or set the "path" and "type" properties.
       "set_type()" performs no check, so the user is responsible for the
       consistency between the given type and the real content of the
       corresponding file. On the other hand, "set_path()" fails if the given
       "path" is already used by another entry; but there is no other check
       regarding this property, so the user must check the consistency
       between the given path and the real path of the corresponding
       resource.

       If "set_path()" sets a path whose last character is "/", the media
       type of the entry is automatically set to an empty string. However,
       users who know exactly what they are doing can use "set_type()" to
       force a non-empty type after "set_path()".
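
       The property rules of "set_path()" and "set_type()" can be sketched
       with plain hashes as follows. "new_entry()", "set_path()" and
       "set_type()" here are hypothetical functions operating on a hash that
       stands in for the manifest; they only mimic the documented behavior of
       the lpOD accessors.

```perl
use strict;
use warnings;

# Hypothetical model, not lpOD code: %by_path stands in for the manifest,
# mapping each path to its entry (a plain hash with "path" and "type").
my %by_path;

sub new_entry {
    my ($path, $type) = @_;
    return $by_path{$path} = { path => $path, type => $type };
}

# Fails (returns undef) if another entry already uses the path; a path
# ending with "/" clears the media type.
sub set_path {
    my ($entry, $path) = @_;
    my $other = $by_path{$path};
    return undef if defined $other && $other != $entry;
    delete $by_path{ $entry->{path} };
    $entry->{path} = $path;
    $by_path{$path} = $entry;
    $entry->{type} = '' if $path =~ m{/\z};
    return $path;
}

# No consistency check at all, as with the documented accessor
sub set_type {
    my ($entry, $type) = @_;
    return $entry->{type} = $type;
}

my $e1 = new_entry('Pictures/a.jpg', 'image/jpeg');
my $e2 = new_entry('Pictures/b.jpg', 'image/jpeg');
my $clash = set_path($e2, 'Pictures/a.jpg');   # undef: path already in use
set_path($e2, 'Thumbnails/');                   # folder path: type becomes ''
set_type($e2, 'image/jpeg');                    # forced non-empty type
```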

       Developer/Maintainer: Jean-Marie	Gouarne
       <> Contact:

       Copyright (c) 2010 Ars Aperta, Itaapy, Pierlis, Talend.	Copyright (c)
       2011 Jean-Marie Gouarne.

       This work was sponsored by the Agence Nationale de la Recherche

       License:	GPL v3,	Apache v2.0 (see LICENSE).

perl v5.32.1			  2014-03-07		     lpOD::Document(3)
