lpOD(3)		      User Contributed Perl Documentation	       lpOD(3)

NAME
       ODF::lpOD - An OpenDocument management interface

SYNOPSIS
               use ODF::lpOD;

               my $document = odf_document->get("report.odt");

               my $meta = $document->get_part(META);
               $meta->set_title("The best document format");

               my $content = $document->get_part(CONTENT);
               my $context = $content->get_body;
               my $paragraph = $context->get_paragraph(
                       content => "I look for it"
                       );
               $paragraph->set_text("I found it");
               $paragraph->set_style("Standout");
               my $new_paragraph = odf_paragraph->create (
                                       style => "Standard",
                                       text => "A new content"
                                       );
               $context->append_element($new_paragraph);
               my $table = odf_table->create (
                       "Main Figures", height => 20, width => 16
                       );
               $context->insert_element($table, before => $paragraph);
               my $cell = $table->get_cell("B4");
               $cell->set_text("Here B4");

               $document->save;


       The code	example	above loads a document from an existing	"report.odt"
       file, updates various data in the document, then	saves the changes. The
       following actions are done in the document:

       1) The title is set to "The best	document format";

       2) The first paragraph containing "I look for it" is retrieved (this
       paragraph is supposed to exist; otherwise get_paragraph would return
       "undef" and the following calls would fail);

       3) The content of the found paragraph is	replaced by "I found it", and
       its style is set	to "Standout" (this style is supposed to exist or to
       be defined later);

       4) A new	paragraph, whose text is "A new	content" and style is
       "Standard", is created then appended to the document body;

       5) A new	table whose name is "Main Figures" and size is 20x16 is
       created then inserted just before the first retrieved paragraph;

       6) The "B4" cell	(i.e. the cell belonging to the	4th row	and the	2nd
       column, whatever	the document type) is retrieved, and its content is
       set to "Here B4"	(the cell data type is automatically set to 'string').

DESCRIPTION
       This module is an office document management interface. It allows the
       users to	create or transform office documents, or to extract data from
       them. It	can handle files which comply with the ODF standard and	whose
       type is text (odt), spreadsheet (ods), presentation (odp) or drawing
       (odg). It interacts directly with the files and doesn't depend on a
       particular office software.

ABOUT lpOD
       This is the Perl implementation of the lpOD project.

       lpOD is a Free Software project that offers, for high level use cases,
       an application programming interface dedicated to document processing
       with the Python, Perl and Ruby languages. It complies with the OASIS
       Open Document Format (ODF), i.e. the ISO/IEC 26300 international
       standard.

       lpOD is designed	according to a top-down	approach. The API is bound to
       the document functional structure and the user's	point of view. As a
       consequence, it may be used without full	knowledge of the ODF
       specification, and allows the application developer to be focused on
       the business needs instead of the low level storage concerns.

       The lpOD	API is object oriented.

Basic document access principles
       The general access to the documents uses	the "odf_document" class.
       Before processing a document, an	odf_document instance must be created
       using one of the	allowed	constructors. While an odf_document object
       encapsulates the	physical resource access logic,	the real data must be
       handled through document	parts, knowing that each part represents a
       specialized aspect of the document.

       Each part contains a set	of "odf_element" objects, knowing that
       odf_element is the common base class for	any kind of document simple or
       complex element (an odf_element may be a	visible	object,	such as	a
       paragraph or a table, as	well as	a piece	of data	that specifies the
       layout or the behavior of other objects,	such as	a text style or	a page
       layout).	Each part contains a root element, that	is a special
       odf_element containing all the elements of the part. A part may contain
       a body element, that is a more restricted but in	some cases more
       interesting context than	the root.

       lpOD is a read-write API. However, the changes made by the applications
       aren't automatically persistent.	The API	provides methods that insert,
       delete, or update elements in memory, but these changes must be
       explicitly committed using other, package-oriented methods, in order to
       become persistent.

   Global document initialization
       A few specialized constructors may be used in order to create
       odf_document objects. All these constructors return an odf_document
       object in case of success, a FALSE value	otherwise.

       Once an odf_document is created, its content may be written back to a
       persistent storage using its "save" method.


       "odf_get_document()" instantiates an "odf_document" object, which is a
       read-write interface to an existing ODF package corresponding to the
       given source. The package should be an ODF-compliant zip file (odt,
       ods, odp, and so on).

               # Single quotes keep the backslashes of a Windows path literal
               my $document = odf_get_document('C:\Path\Doc.odt');

       "odf_get_document()" is just a functional way to	call the "get()"
       constructor of the "odf_document" class; so the example above produces
       the same effect as the following one:

               my $document = odf_document->get('C:\Path\Doc.odt');

       The source argument must be provided either as a regular file path or
       as an "IO::File" object.


       "odf_new_document()" returns a new odf_document corresponding to the
       given ODF document type. Allowed document types are presently 'text',
       'spreadsheet', 'presentation', and 'drawing'. Example:

	       my $document = odf_new_document('spreadsheet');

       Knowing that this functional constructor is just a way to call the
       "create()" method of the "odf_document" class, the following code is
       equivalent:

	       my $document = odf_document->create('spreadsheet');

       Technically, the	new document is	generated as a clone of	an existing
       template	document, provided with	the lpOD distribution. It operates in
       the same	way as "odf_new_document_from_template", but the user doesn't
       need to provide the template document.


       "odf_new_document_from_template()" returns a new odf_document
       instantiated from an existing ODF template package. Same as
       "odf_get_document", but the source package is read-only.


       The "save" function is a method. It must be called from an odf_document
       object.

       Without argument, it attempts to write its content back to the
       resource that was used to create it. A warning is issued and nothing is
       done if the document has been created without source file or from a
       read-only template (i.e. through "odf_new_document" or
       "odf_new_document_from_template").

       This method produces a file whose basic format is the same as the
       format of the source document or	template (whatever the target file
       name, if	any).

       If the optional parameter "target" is provided, it's regarded as the
       storage destination. Its value may be a regular file path or an
       "IO::File" object. This parameter is mandatory if the "odf_document"
       instance has been created through "odf_new_document_from_template" or
       "odf_new_document". Example:


	       $document->save(target => "/myfiles/target.odt");
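
       The persistence rule can be checked with a short round trip. The
       sketch below is only an illustration: it assumes that ODF::lpOD and
       its default text template are installed, and the scratch file name
       "saved.odt" is arbitrary. It creates a document, appends a paragraph,
       saves it to an explicit target, then reloads the file and looks for
       the paragraph.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use ODF::lpOD;

# A document created from the built-in template has no source file,
# so "save" needs an explicit target.
my $document = odf_document->create('text')
    or die "Document creation failed\n";

my $body = $document->get_part(CONTENT)->get_body;
$body->append_element(
    odf_paragraph->create(style => "Standard", text => "Saved on purpose")
);

# Arbitrary target path in a scratch directory (removed at exit).
my $target = File::Spec->catfile(tempdir(CLEANUP => 1), "saved.odt");
$document->save(target => $target);

# Reload the saved package and look for the appended paragraph.
my $reloaded = odf_document->get($target)
    or die "Cannot reload $target\n";
my $found = $reloaded->get_part(CONTENT)->get_body
                     ->get_paragraph(content => "Saved on purpose");
print "Change is persistent: ", ($found ? "yes" : "no"), "\n";
```

       Without the "save" call, the appended paragraph would exist in memory
       only and would be lost at the end of the program.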

   Document part initialization	and handling
       A regular ODF document contains various parts, some of them mandatory.
       The interesting parts in	the lpOD scope are 'content', 'styles',
       'meta', 'settings', and 'manifest'.

       The odf_document	class provides a "get_part()" method, that must	be
       used with an argument that specifies the	needed part. Example:

	       my $content = $document->get_part(CONTENT);
	       my $meta	= $document->get_part(META);

       The sequence above gives	access to the content and meta parts of	a
       previously created "odf_document" instance.

       Beware: if "get_part()" is called twice or more from the	same
       "odf_document" instance and with	the same part designation, it returns
       the same	object.	As a consequence, after	the sequence below, $p1	and
       $p2 will	be synonyms:

	       my $p1 =	$document->get_part(CONTENT);
	       my $p2 =	$document->get_part(CONTENT);
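
       A quick way to observe this behavior is to compare the addresses of
       the two references. The sketch below is only an illustration and
       assumes that ODF::lpOD and its default text template are installed;
       it also shows the practical consequence, i.e. a change made through
       one variable is visible through the other.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use ODF::lpOD;

my $document = odf_document->create('text')
    or die "Document creation failed\n";

my $p1 = $document->get_part(CONTENT);
my $p2 = $document->get_part(CONTENT);

# Both variables should refer to the very same object.
my $same = refaddr($p1) == refaddr($p2);
print "Same object: ", ($same ? "yes" : "no"), "\n";

# A paragraph appended through $p1 can be retrieved through $p2.
$p1->get_body->append_element(
    odf_paragraph->create(text => "Added through p1")
);
my $seen = $p2->get_body->get_paragraph(content => "Added through p1");
print "Visible through p2: ", ($seen ? "yes" : "no"), "\n";
```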

       "serialize()" returns an	XML export of the whole	part (the application
       is then responsible for the fate of this export). An optional "pretty"
       argument, if set	to TRUE, specifies that	the XML	output must be human-
       readable. Example:

	       my $content = $document->get_part(CONTENT);
	       # here some content processing
	       my $xml = $content->serialize(pretty => TRUE);

Basic ODF element handling
       Every "odf_part" object provides a low level "get_element" method
       whose first argument is an XPath expression and whose second one is a
       numeric position. The numeric argument specifies the order number of
       the required element among the set of elements matching the XPath. The
       position is zero-based (i.e. a zero value means the first matching
       element). If the order number is negative, the position is counted
       backward from the end. As an example, the code below returns the last
       paragraph of the document.

	       my $document = odf_document->get($source);
	       my $content = $document->get_part(CONTENT);
               my $p = $content->get_element("//text:p", -1);
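
       The position rule (zero-based, negative values counted from the end)
       is the same as ordinary Perl array indexing. The sketch below uses
       plain Perl only; "pick_match" is a hypothetical helper written for
       this illustration and is not part of lpOD, whose internal
       implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: select one item in a list of matches according
# to a zero-based position; a negative position counts from the end.
# Returns nothing when the position is out of range.
sub pick_match {
    my ($position, @matches) = @_;
    my $count = scalar @matches;
    return if $position >= $count || $position < -$count;
    return $matches[$position];
}

my @matches = ("first", "second", "last");
print "position  0: ", scalar pick_match(0,  @matches), "\n";
print "position -1: ", scalar pick_match(-1, @matches), "\n";
```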

       However,	this way is not	the smartest one because it requires the
       knowledge of the	ODF schema (and	some XPath skills for more complicated
       cases). There are better	ways to	select the last	paragraph of a
       document	(and various other objects at any position in a	document).

       lpOD provides more user-friendly, XPath-free methods for	the most used
       elements	in the "CONTENT" part of a document. These methods are
       provided	through	the "odf_element" class. Any individual	element	in a
       part is an "odf_element"	object.	There is a shortcut to get the top (or
       root) element of	any part: the "get_root()" method. Once	selected, the
       top element provides all	the context methods of the lpOD	API.

       A context method	is a method owned by an	element	(the context) and
       whose effect is related to the children and descendants of this
       element. So, the "get_xxx" method of a given element is a retrieval
       method intended to select something below the current element. Thanks
       to the "get_paragraph" method provided by the "odf_element" class, the
       last example could be written as shown below:

	       my $document = odf_document->get($source);
	       my $context = $document->get_part(CONTENT)->get_root;
	       my $p = $context->get_paragraph(-1);

       In most cases (including the previous example), "get_root" may be
       replaced by "get_body", which returns a context containing all the
       visible elements (including the paragraphs).
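
       The following sketch puts the body context to work on a new document.
       It is only an illustration: it assumes that ODF::lpOD and its default
       text template are installed, and that "get_paragraph" accepts the
       named "position" parameter in the same way as the "content" one.

```perl
use strict;
use warnings;
use ODF::lpOD;

my $document = odf_document->create('text')
    or die "Document creation failed\n";
my $body = $document->get_part(CONTENT)->get_body;

# Append three paragraphs at the end of the document body.
$body->append_element(odf_paragraph->create(text => $_))
    for ("One", "Two", "Three");

# Select by position (assumed named parameter), then by content.
my $last   = $body->get_paragraph(position => -1);
my $second = $body->get_paragraph(content => "Two");

print "Last paragraph: ", $last->get_text, "\n";
print "Paragraph found by content: ", $second->get_text, "\n";
```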

       There is	a generic context-based	"get_element" that differs from	the
       part-based one. It allows the user to select an element according to
       its text	content, one of	its attributes,	and/or its sequential position
       in the context. As an example, the sequence below displays the name of
       the last	page that uses the draw	page style "dp1" (assuming we are
       using a presentation or drawing document):

               my $context = $document->get_part(CONTENT)->get_body;
               my $page = $context->get_element(
                       attribute       => 'style name',
                       value           => 'dp1',
                       position        => -1
                       );
               say $page->get_attribute('name');

       lpOD provides special name-based	retrieval methods for some elements
       that own	unique names. For example the instruction below	selects	the
       table whose name	is "T1"	(if any):

	       $table =	$context->get_table_by_name("T1");
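
       A minimal sketch of name-based retrieval, assuming that ODF::lpOD and
       its default text template are installed; the table names are
       arbitrary. A table is created and attached first, so that there is
       something to retrieve; a second lookup with an unknown name shows
       the "if any" case.

```perl
use strict;
use warnings;
use ODF::lpOD;

my $document = odf_document->create('text')
    or die "Document creation failed\n";
my $context = $document->get_part(CONTENT)->get_body;

# Create a small table named "T1" and attach it to the document body.
$context->append_element(
    odf_table->create("T1", height => 3, width => 2)
);

# Retrieve the table by its unique name.
my $table = $context->get_table_by_name("T1");
print "Found table: ", $table->get_attribute('name'), "\n";

# A name that doesn't exist yields a false value.
my $missing = $context->get_table_by_name("No such table");
print "Unknown table found: ", ($missing ? "yes" : "no"), "\n";
```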

       The "meta" document part, unlike others such as the "content" one,
       provides direct "get" and "set" accessors for the usual metadata, so
       there is no need for a context element. The following example displays
       the title of a document:

	       my $document = odf_document->get($source);
	       my $meta	= $document->get_part(META);
	       say $meta->get_title;

       The title (like any other metadata value) may be updated or created with
       the corresponding "set" accessor:

	       $meta->set_title("The new title");
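
       The two accessors can be combined in a short round trip. The sketch
       below is only an illustration and assumes that ODF::lpOD and its
       default text template are installed; the change stays in memory
       because the document is not saved.

```perl
use strict;
use warnings;
use ODF::lpOD;

my $document = odf_document->create('text')
    or die "Document creation failed\n";
my $meta = $document->get_part(META);

# Set the title, then read it back through the matching accessor.
$meta->set_title("The new title");
my $title = $meta->get_title;
print "Title: $title\n";
```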

       All the properties of a previously selected element are stored in one
       or more attributes and in a text. So, for any "odf_element" lpOD
       provides	corresponding "get" and	"set" accessors.

       "get_text" returns the current text, while "set_text" replaces the
       current content by a new	text (possibly empty). Without argument,
       "get_text" returns the text directly contained in the calling element,
       but with	a "recursive" optional named parameter set to "TRUE", it
       returns the concatenated	texts of all the descendants of	the calling
       element.	On the other hand, "set_text" deletes any previous content
       (i.e. direct text content and embedded elements such as bookmarks,
       variable	fields,	text segments with special styles, and so on).
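
       A minimal sketch of both accessors, assuming that ODF::lpOD is
       installed. It works on a free paragraph, i.e. one that is not
       attached to any document, which is enough to exercise "get_text" and
       "set_text".

```perl
use strict;
use warnings;
use ODF::lpOD;

# A free paragraph is enough to try the text accessors.
my $paragraph = odf_paragraph->create(text => "Original text");
my $before = $paragraph->get_text;

# set_text replaces the whole previous content.
$paragraph->set_text("Replaced text");
my $after = $paragraph->get_text(recursive => TRUE);

print "Before: $before\n";
print "After:  $after\n";
```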

       The "get_attribute" method requires the name of the needed attribute.
       This name may be the technical name according to the OpenDocument
       specification, or a simpler and more meaningful name. For example,
       assuming	$item is a list	item, and knowing that such an object may own
       a so-called "text:restart-numbering" attribute telling that the list
       numbering must be restarted at this point from a	given value, the
       following instruction sets this value to	6:

	       $item->set_attribute('restart numbering'	=> 6);

       "set_attribute" deletes an existing attribute as	soon as	the given
       value is	"undef"; so the	instruction below cancels the "restart
       numbering" feature:

	       $item->set_attribute('restart numbering'	=> undef);

       Note that "set_attribute", provided with	a non-null value,
       automatically creates the attribute if it doesn't exist;	there is no
       need to separately check	an attribute for existence and create it
       before setting a	value.

       It's possible to get or set more than one attribute in a single call
       using "get_attributes" or "set_attributes". The first one returns the
       attributes as a hash reference (with the	real ODF names), while the
       second one requires a hash reference as argument.
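
       The attribute accessors can be tried on a free paragraph. The sketch
       below is only an illustration and assumes that ODF::lpOD is
       installed. It also assumes that, for a paragraph, the simplified
       name 'style name' maps to the ODF attribute "text:style-name"; this
       mapping is inferred from the naming rule described above, not stated
       in this manual page.

```perl
use strict;
use warnings;
use ODF::lpOD;

my $paragraph = odf_paragraph->create(text => "Some text");

# Create the attribute by giving it a value (simplified name).
$paragraph->set_attribute('style name' => "Standout");
my $style = $paragraph->get_attribute('style name');
print "Style: $style\n";

# get_attributes returns a hash reference keyed by the real ODF names.
my $attributes = $paragraph->get_attributes;
print "Attribute names: ", join(", ", sort keys %$attributes), "\n";

# An undef value deletes the attribute.
$paragraph->set_attribute('style name' => undef);
my $removed = !defined $paragraph->get_attribute('style name');
print "Attribute removed: ", ($removed ? "yes" : "no"), "\n";
```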

       An element may be removed (with all its descendants) using its "delete"
       method. (Beware: the deletion of a high level element may destroy a
       lot of content!) It's possible to delete the whole content of an
       element without removing the element itself by issuing a "set_text"
       with an empty string.
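
       The difference between removing an element and keeping it can be
       observed on a scratch document. The sketch below is only an
       illustration; it assumes that ODF::lpOD and its default text template
       are installed, and the paragraph texts are arbitrary.

```perl
use strict;
use warnings;
use ODF::lpOD;

my $document = odf_document->create('text')
    or die "Document creation failed\n";
my $body = $document->get_part(CONTENT)->get_body;

$body->append_element(odf_paragraph->create(text => $_))
    for ("Keep me", "Obsolete");

# Remove one paragraph (and anything it contains) from the document.
my $obsolete = $body->get_paragraph(content => "Obsolete");
$obsolete->delete;

# The deleted paragraph can no longer be found; the other one remains.
my $gone = !$body->get_paragraph(content => "Obsolete");
my $kept = $body->get_paragraph(content => "Keep me");
print "Obsolete paragraph removed: ", ($gone ? "yes" : "no"), "\n";
print "Other paragraph still there: ", ($kept ? "yes" : "no"), "\n";
```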

       The user	is allowed to create a new element using the
       "odf_create_element" constructor, that requires an appropriate ODF tag
       (corresponding to the type of element) or a valid XML string.
       Fortunately, lpOD provides a set	of specialized constructors (such as
       "odf_create_paragraph", "odf_create_table", and so on) that may be used
       without knowledge of the	XML stuff. Once	created	through	such a
       constructor, the	new element is not automatically included in a
       document. To do so, lpOD	provides the "insert_element" and
       "append_element"	methods, both context-based, i.e. called from an
       existing	element	that will become the parent of the new element.	As an
       example,	the sequence below creates a new paragraph (with given style
       and content), then appends it to	a selected section:

               my $document = odf_document->get($source);
               my $context = $document->get_part(CONTENT)->get_body;
               my $section = $context->get_section("Prologue");
               my $paragraph = odf_paragraph->create(
                       style => "Standard", text => "The End of the Beginning"
                       );
               $section->append_element($paragraph);

       Elements	may be created by replication of existing elements, thanks to
       the "clone" method. The result of the instruction below is a copy of an
       existing	section	(with all its content);	this copy is a "free" element
       (i.e. it's not included in any document,	and it has no link with	its
       prototype element), so it may be	inserted elsewhere in the same
       document	or in another document:

	       my $section = $context->get_section("Reusable");
	       my $free_section	= $section->clone;
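
       The independence between a clone and its prototype can be checked
       with a free paragraph. The sketch below is only an illustration and
       assumes that ODF::lpOD is installed; a paragraph is used instead of
       a section to keep the example short.

```perl
use strict;
use warnings;
use ODF::lpOD;

my $original = odf_paragraph->create(
    style => "Standard", text => "Reusable text"
);

# The clone is a free copy; changing it leaves the original untouched.
my $copy = $original->clone;
$copy->set_text("Changed in the copy");

print "Original: ", $original->get_text, "\n";
print "Copy:     ", $copy->get_text, "\n";
```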

Getting	started
   The "Hello World" example
       Unsurprisingly, we suggest that you test your lpOD installation and
       your knowledge of the big picture through this simple program:

               use ODF::lpOD;

               my $doc = odf_document->create('text');
               my $content = $doc->get_part(CONTENT);
               my $context = $content->get_body;
               $context->append_element(
                       odf_paragraph->create(
                               style => "Standard",
                               text => "Hello World !"
                               )
                       );
               $doc->save(target => "helloworld.odt");

       If this script runs without warning, open the "helloworld.odt" file
       using your favorite ODF-compliant text processor, and look at the text
       content.	You may	then introduce more sophistication using the metadata
       part of the document.  To do so,	you can	(for example) insert the lines
       below somewhere before the "save" instruction (and after the
       "odf_document->create()" one).

	       my $meta	= $doc->get_part(META);
	       $meta->set_title("Hello World Test");

       After execution of the extended version,	check the author's name	and
       the title through the File/Properties dialog of your ODF	text editor.

   Using the documentation
       The ODF::lpOD::Tutorial is a recommended	first reading that may help to
       quickly gain a basic understanding and get started with lpOD. The
       reference documentation is split	into the following manual chapters:

       o   ODF::lpOD::Document:	General	document packaging and metadata

       o   ODF::lpOD::Element: Common features,	available with any element.

       o   ODF::lpOD::TextElement: Text	containers (paragraphs,	headings), and
	   various elements that may take place	in paragraphs (bookmarks,
	   index marks,	bibliography marks, text variables and fields).

       o   ODF::lpOD::Table: Access to tables and their	content.

       o   ODF::lpOD::StructuredContainer: High-level structures such as
	   sections, lists, draw pages,	shapes,	image or text frames, tables
	   of contents.

       o   ODF::lpOD::Style: Style retrieval, update, or creation

       o   ODF::lpOD::Common: Common utility functions

       An alternative tutorial,	intended for French-reading users, is
       available at

AUTHOR/COPYRIGHT
       Developer/Maintainer: Jean-Marie Gouarne
       <> Contact:

       Copyright (c) 2010 Ars Aperta, Itaapy, Pierlis, Talend.	Copyright (c)
       2014 Jean-Marie Gouarne.

       This work was sponsored by the Agence Nationale de la Recherche

       License:	GPL v3,	Apache v2.0 (see LICENSE).

perl v5.24.1			  2014-05-21			       lpOD(3)
