Text::BibTeX::Entry(3)  User Contributed Perl Documentation  Text::BibTeX::Entry(3)

NAME
       Text::BibTeX::Entry - read and parse BibTeX files

SYNOPSIS
          use Text::BibTeX::Entry;

	  # ...assuming	that $bibfile and $newbib are both objects of class
	  # Text::BibTeX::File,	opened for reading and writing (respectively):

	  # Entry creation/parsing methods:
	  $entry = Text::BibTeX::Entry->new();
	  $entry->read ($bibfile);
	  $entry->parse	($filename, $filehandle);
	  $entry->parse_s ($entry_text);

	  # or:
	  $entry = Text::BibTeX::Entry->new( $bibfile );
	  $entry = Text::BibTeX::Entry->new( $filename,	$filehandle );
	  $entry = Text::BibTeX::Entry->new( $entry_text );

	  # Entry query	methods
	  warn "error in input"	unless $entry->parse_ok;
	  $metatype = $entry->metatype;
	  $type	= $entry->type;

	  # if metatype	is BTE_REGULAR or BTE_MACRODEF:
	  $key = $entry->key;		       # only for BTE_REGULAR metatype
	  $num_fields =	$entry->num_fields;
	  @fieldlist = $entry->fieldlist;
	  $has_title = $entry->exists ('title');
	  $title = $entry->get ('title');
	  # or:
	  ($val1,$val2,...$valn) = $entry->get ($field1, $field2, ..., $fieldn);

	  # if metatype	is BTE_COMMENT or BTE_PREAMBLE:
	  $value = $entry->value;

	  # Author name	methods
	  @authors = $entry->split ('author');
	  ($first_author) = $entry->names ('author');

	  # Entry modification methods
	  $entry->set_type ($new_type);
	  $entry->set_key ($new_key);
	  $entry->set ('title',	$new_title);
	  # or:
	  $entry->set ($field1,	$val1, $field2,	$val2, ..., $fieldn, $valn);
	  $entry->delete (@fields);
	  $entry->set_fieldlist	(\@fieldlist);

	  # Entry output methods
	  $entry->write	($newbib);
	  $entry->print	($filehandle);
	  $entry_text =	$entry->print_s;

	  # Reset internal parser state:
	  $entry = Text::BibTeX::Entry->new();
	  $entry->parse	($filename, undef);
	  $entry->parse_s (undef);

	  # or:
	  $entry = Text::BibTeX::Entry->new( $filename,	undef );
	  $entry = Text::BibTeX::Entry->new( undef );

	  # Miscellaneous methods
	  $entry->warn ($entry_warning);
	  # or:
	  $entry->warn ($field_warning,	$field);

DESCRIPTION
       "Text::BibTeX::Entry" does all the real work of reading and parsing
       BibTeX files.  (Well, actually it just provides an object-oriented Perl
       front-end to a C	library	that does all that.  But that's	not important
       right now.)

       BibTeX entries can be read either from "Text::BibTeX::File" objects
       (using the "read" method), or directly from a filehandle	(using the
       "parse" method),	or from	a string (using	"parse_s").  The first is
       preferable, since you don't have	to worry about supplying the filename,
       and because of the extra	functionality provided by the
       "Text::BibTeX::File" class.  Currently, this means that you may specify
       the database structure to which entries are expected to conform via the
       "File" class.  This lets	you ensure that	entries	follow the rules for
       required	fields and mutually constrained	fields for a particular	type
       of database, and	also gives you access to all the methods of the
       structured entry	class for this database	structure.  See
       Text::BibTeX::Structure for details on database structures.

       Once you	have the entry,	you can	query it or change it in a variety of
       ways.  The query	methods	are "parse_ok",	"type",	"key", "num_fields",
       "fieldlist", "exists", and "get".  Methods for changing the entry are
       "set_type", "set_key", "set_fieldlist", "delete", and "set".

       Finally,	you can	output BibTeX entries, again either to an open
       "Text::BibTeX::File" object, a filehandle or a string.  (A filehandle
       or "File" object	must, of course, have been opened in write mode.)
       Output to a "File" object is done with the "write" method, to a
       filehandle via "print", and to a	string with "print_s".	Using the
       "File" class is recommended for future extensibility, although it
       currently doesn't offer anything	extra.

METHODS
   Entry creation/parsing methods
       new ([OPTS ,] [SOURCE])
	   Creates a new "Text::BibTeX::Entry" object.	If the SOURCE
	   parameter is	supplied, it must be one of the	following: a
	   "Text::BibTeX::File"	(or descendant class) object, a
	   filename/filehandle pair, or	a string.  Calls "read"	to read	from a
	   "Text::BibTeX::File"	object,	"parse"	to read	from a filehandle, and
	   "parse_s" to	read from a string.

	   A filehandle	can be specified as a GLOB reference, or as an
	   "IO::Handle"	(or descendants) object, or as a "FileHandle" (or
	   descendants)	object.	 (But there's really no	point in using
	   "FileHandle"	objects, since "Text::BibTeX" requires Perl 5.004,
           which always includes the "IO" modules.)  You cannot pass in the
	   name	of a filehandle	as a string, though, because
	   "Text::BibTeX::Entry" conforms to the "use strict" pragma (which
	   disallows such symbolic references).

	   The corresponding filename should be	supplied in order to allow for
	   accurate error messages; if you simply don't	have the filename, you
	   can pass "undef" and	you'll get error messages without a filename.
	   (It's probably better to rearrange your code	so that	the filename
	   is available, though.)

	   Thus, the following are equivalent to read from a file named	by
	   $filename (error handling ignored):

	      #	good ol' fashioned filehandle and GLOB ref
	      open (BIBFILE, $filename);
	      $entry = Text::BibTeX::Entry->new($filename, \*BIBFILE);

	      #	newfangled IO::File thingy
	      $file = IO::File->new($filename);
	      $entry = Text::BibTeX::Entry->new($filename, $file);

	   But using a "Text::BibTeX::File" object is simpler and preferred:

	      $file  = Text::BibTeX::File->new($filename);
	      $entry = Text::BibTeX::Entry->new($file);

	   Returns the new object, unless SOURCE is supplied and
	   reading/parsing the entry fails (e.g., due to end of	file) -- then
	   it returns false.
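
           A minimal sketch of the preferred pattern, looping over every entry
           in a file through a "Text::BibTeX::File" object.  The temporary
           file, its two entries, and the variable names are illustrative
           assumptions; the methods used are the ones described on this page.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Text::BibTeX;

# Write a small throwaway .bib file so the sketch is self-contained.
my ($out, $filename) = tempfile(SUFFIX => '.bib', UNLINK => 1);
print {$out} <<'BIB';
@article{smith97,
  author = {John Smith},
  title  = {A First Paper},
  year   = 1997
}

@book{jones99,
  author = {Mary Jones},
  title  = {A Book},
  year   = 1999
}
BIB
close $out or die "close failed: $!";

my $bibfile = Text::BibTeX::File->new($filename)
    or die "cannot open $filename: $!";

my @keys;
# new() returns false once the file is exhausted, which ends the loop.
while (my $entry = Text::BibTeX::Entry->new($bibfile)) {
    next unless $entry->parse_ok;    # skip entries with syntax errors
    push @keys, $entry->key;
    printf "%s (%s)\n", $entry->key, $entry->type;
}
```

           Testing the return value of "new" only detects end of file; the
           separate "parse_ok" check is what filters out malformed entries.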

           You may supply a reference to an option hash as the first argument.
           Supported options are:

           BINMODE
               Sets the way Text::BibTeX deals with strings.  By default it
               manages strings as bytes.  You can set BINMODE to 'utf-8' to get
               NFC normalized UTF-8 strings, and you can customise the
               normalization with the NORMALIZATION option:

                  Text::BibTeX::Entry->new(
                     { binmode => 'utf-8', normalization => 'NFD' },
                     $file);

       clone ()
           Clones a "Text::BibTeX::Entry" object, returning the clone.  This
           reuses the reference to any "Text::BibTeX::Structure" or
           "Text::BibTeX::File" object but copies everything else, so that the
           clone can be modified independently of the original.

       read (BIBFILE)
	   Reads and parses an entry from BIBFILE, which must be a
	   "Text::BibTeX::File"	object (or descendant).	 The next entry	will
	   be read from	the file associated with that object.

	   Returns the same as "parse" (or "parse_s"): false if	no entry found
	   (e.g., at end-of-file), true	otherwise.  To see if the parse	itself
	   failed (due to errors in the	input),	call the "parse_ok" method.

       parse (FILENAME, FILEHANDLE)
           Reads and parses the next entry from FILEHANDLE.  (That is, it
	   scans the input until an '@'	sign is	seen, and then slurps up to
	   the next '@'	sign.  Everything between the two '@' signs [including
	   the first one, but not the second one -- it's pushed	back onto the
	   input stream	for the	next entry] is parsed as a BibTeX entry, with
	   the simultaneous construction of an abstract	syntax tree [AST].
	   The AST is traversed	to ferret out the most interesting
	   information,	and this is stuffed into a Perl	hash, which
	   coincidentally is the "Text::BibTeX::Entry" object you've been
	   tossing around.  But	you don't need to know any of that -- I	just
	   figured if you've read this far, you	might want to know something
	   about the inner workings of this module.)

	   The success of the parse is stored internally so that you can later
	   query it with the "parse_ok"	method.	 Even in the presence of
	   syntax errors, you'll usually get something resembling your input,
	   but it's usually not	wise to	try to do anything with	it.  Just call
	   "parse_ok", and if it returns false then silently skip to the next
	   entry.  (The	error messages printed out by the parser should	be
	   quite adequate for the user to figure out what's wrong.  And	no,
	   there's currently no	way for	you to capture or redirect those error
	   messages -- they're always printed to "stderr" by the underlying C
	   code.  That should change in	future releases.)

	   If no '@' signs are seen on the input before	reaching end-of-file,
	   then	we've exhausted	all the	entries	in the file, and "parse"
	   returns a false value.  Otherwise, it returns a true	value -- even
           if there were syntax errors.  Hence, it's important to check
           "parse_ok".

	   The FILENAME	parameter is only used for generating error messages,
	   but anybody using your program will certainly appreciate your
	   setting it correctly!
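
           As a rough sketch, reading entries straight from a filehandle with
           a filename/filehandle pair might look like the following.  The
           temporary file and its contents are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Text::BibTeX;

# Create a small input file (illustrative data only).
my ($out, $filename) = tempfile(SUFFIX => '.bib', UNLINK => 1);
print {$out} <<'BIB';
@article{first, title = {One}, year = 2001}
@book{second, title = {Two}, year = 2002}
BIB
close $out or die "close failed: $!";

open my $fh, '<', $filename or die "cannot open $filename: $!";

my @seen;
# The filename is passed only so that error messages can mention it.
while (my $entry = Text::BibTeX::Entry->new($filename, $fh)) {
    next unless $entry->parse_ok;
    push @seen, $entry->key . ':' . $entry->get('year');
}
close $fh;

print "$_\n" for @seen;
```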

           Passing "undef" to FILEHANDLE will reset the state of the
           underlying C parser, which is required in order to parse multiple
           files.

       parse_s (TEXT)
	   Parses a BibTeX entry (using	the above rules) from the string TEXT.
	   The string is not modified; repeatedly calling "parse_s" with the
	   same	string will give you the same results each time.  Thus,
	   there's no point in putting multiple	entries	in one string.

	   Passing "undef" to TEXT will	reset the state	of the underlying C
	   parser, which may be	required in order to parse multiple strings.
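
           A short sketch of parsing a single entry from a string and checking
           the result before using it.  The entry text is invented for the
           example.

```perl
use strict;
use warnings;
use Text::BibTeX;

# One entry per string: parse_s always starts from the beginning of TEXT.
my $text = <<'BIB';
@book{example84,
  author    = {A. N. Author},
  title     = {An Example Book},
  publisher = {Example Press},
  year      = 1984
}
BIB

my $entry = Text::BibTeX::Entry->new;
$entry->parse_s($text) or die "no entry found in string";
die "syntax errors in entry" unless $entry->parse_ok;

my $type = $entry->type;    # the word following the '@' sign
my $key  = $entry->key;
print "$type $key\n";
```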

   Entry query methods
       parse_ok	()
	   Returns false if there were any serious errors encountered while
	   parsing the entry.  (A "serious" error is a lexical or syntax
	   error; currently, warnings such as "undefined macro"	result in an
	   error message being printed to "stderr" for the user's edification,
	   but no notice is available to the calling code.)

       type ()
	   Returns the type of the entry.  (The	`type' is the word that
	   follows the '@' sign; e.g. `article', `book', `inproceedings', etc.
	   for the standard BibTeX styles.)

       metatype	()
	   Returns the metatype	of the entry.  (The `metatype' is a numeric
	   value used to classify entry	types into four	groups:	comment,
	   preamble, macro definition (@string entries), and regular (all
	   other entry types).	"Text::BibTeX" exports four constants for
           these metatypes: "BTE_COMMENT", "BTE_PREAMBLE", "BTE_MACRODEF", and
           "BTE_REGULAR".)
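
           The sketch below classifies a few entries by metatype.  It assumes
           that the constants can be imported from "Text::BibTeX" with the
           ":metatypes" export tag; the entry texts and the %label table are
           made up for the example.

```perl
use strict;
use warnings;
use Text::BibTeX qw(:metatypes);    # assumption: tag that exports BTE_* constants

# Map each numeric metatype to a readable label (hypothetical table).
my %label = (
    BTE_COMMENT()  => 'comment',
    BTE_PREAMBLE() => 'preamble',
    BTE_MACRODEF() => 'macro definition',
    BTE_REGULAR()  => 'regular',
);

my @sources = (
    '@string{jss = "Journal of Suburban Studies"}',
    '@preamble{"Some preamble text"}',
    '@article{homer97, title = {Territorial Imperatives}, year = 1997}',
);

my @kinds;
for my $source (@sources) {
    my $entry = Text::BibTeX::Entry->new($source) or next;
    push @kinds, $label{ $entry->metatype } // 'unknown';
    print $entry->type, ' => ', $kinds[-1], "\n";
}
```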

       key ()
	   Returns the key of the entry.  (The key is the token	immediately
	   following the opening `{' or	`(' in "regular" entries.  Returns
	   "undef" for entries that don't have a key, such as macro definition
	   (@string) entries.)

       num_fields ()
	   Returns the number of fields	in the entry.  (Note that, currently,
	   this	is not equivalent to putting "scalar" in front of a call to
	   "fieldlist".	 See below for the consequences	of calling "fieldlist"
	   in a	scalar context.)

       fieldlist ()
	   Returns the list of fields in the entry.

	   WARNING In scalar context, it no longer returns a reference to the
	   object's own	list of	fields.
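
           A small sketch of the two field-listing methods, using an invented
           entry.  "fieldlist" is called in list context here, which is the
           usage that the warning above leaves unaffected.

```perl
use strict;
use warnings;
use Text::BibTeX;

my $text = <<'BIB';
@article{homer97,
  author = {Homer Simpson},
  title  = {Territorial Imperatives in Modern Suburbia},
  year   = 1997
}
BIB

my $entry = Text::BibTeX::Entry->new($text) or die "no entry found";

my $count  = $entry->num_fields;    # number of fields, as a plain scalar
my @fields = $entry->fieldlist;     # field names, in list context

print "$count fields: ", join(', ', @fields), "\n";
```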

       exists (FIELD)
           Returns true if a field named FIELD is present in the entry, false
           otherwise.

       get (FIELD, ...)
           Returns the value of one or more FIELDs, as a list of values.  For
           example:

	      $author =	$entry->get ('author');
	      ($author,	$editor) = $entry->get ('author', 'editor');

	   If a	FIELD is not present in	the entry, "undef" will	be returned at
	   its place in	the return list.  However, you can't completely	trust
	   this	as a test for presence or absence of a field; it is possible
	   for a field to be present but undefined.  Currently this can	only
	   happen due to certain syntax	errors in the input, or	if you pass an
           undefined value to "set", or if you create a new field with
           "set_fieldlist" (the new field's value is implicitly set to
           "undef").

	   Normally, the field value is	what the input looks like after
	   "maximal processing"--quote characters are removed, whitespace is
	   collapsed (the same way that	BibTeX itself does it),	macros are
           expanded, and multiple tokens are pasted together.  (See
           bt_postprocess for details on the post-processing performed by
           the underlying C library.)

	   For example,	if your	input file has the following:

	      @string{of = "of"}
	      @string{foobars =	"Foobars"}

		title =	{   The	Mating Habits	   } # of # " Adult   "	# foobars

           then using "get" to query the value of the "title" field from the
           "foobar" entry would give the string "The Mating Habits of Adult
           Foobars".

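           The example above can be tried out with a sketch along these lines.
           The two macro definitions and the "title" line come from the text
           above; the "article" entry type is an assumption, since only the
           key "foobar" is named.

```perl
use strict;
use warnings;
use Text::BibTeX;

# Define the two macros first; parsing a @string entry records its macro.
for my $macro ('@string{of = "of"}', '@string{foobars = "Foobars"}') {
    my $def = Text::BibTeX::Entry->new($macro) or die "macro not parsed";
    die "bad macro definition" unless $def->parse_ok;
}

# Entry type "article" is assumed for illustration.
my $text = <<'BIB';
@article{foobar,
  title = {   The Mating Habits      } # of # " Adult   " # foobars
}
BIB

my $entry = Text::BibTeX::Entry->new($text) or die "no entry found";
die "syntax errors in entry" unless $entry->parse_ok;

# Quotes removed, macros expanded, tokens pasted, whitespace collapsed.
my $title = $entry->get('title');
print "$title\n";
```
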
	   However, in certain circumstances you may wish to preserve the
	   values as they appear in the	input.	This is	done by	setting	a
	   "preserve_values" flag at some point; then, "get" will return not
	   strings but "Text::BibTeX::Value" objects.  Each "Value" object is
	   a list of "Text::BibTeX::SimpleValue" objects, which	in turn
	   consists of a simple	value type (string, macro, or number) and the
	   text	of the simple value.  Various ways to set the
	   "preserve_values" flag and the interface to both "Value" and
	   "SimpleValue" objects are described in Text::BibTeX::Value.

       value ()
	   Returns the single string associated	with @comment and @preamble
	   entries.  For instance, the entry

	      @preamble{" This is   a preamble"	#
			{---the	concatenation of several strings}}

	   would return	a value	of "This is a preamble---the concatenation of
	   several strings".
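
           A sketch of querying a preamble entry, guarded by a metatype check.
           It assumes the ":metatypes" export tag of "Text::BibTeX"; the
           resulting string is printed rather than asserted here.

```perl
use strict;
use warnings;
use Text::BibTeX qw(:metatypes);    # assumption: tag that exports BTE_* constants

my $text = <<'BIB';
@preamble{" This is   a preamble" #
          {---the concatenation of several strings}}
BIB

my $entry = Text::BibTeX::Entry->new($text) or die "no entry found";
die "syntax errors in entry" unless $entry->parse_ok;

# value() only makes sense for comment and preamble entries.
my $value;
if (   $entry->metatype == BTE_PREAMBLE()
    || $entry->metatype == BTE_COMMENT()) {
    $value = $entry->value;
    print "[$value]\n";
}
```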

	   If this entry was parsed in "value preservation" mode, then "value"
           acts like "get", and returns a "Value" object rather than a simple
           string.

   Author name methods
       This is the only	part of	the module that	makes any assumption about the
       nature of the data, namely that certain fields are lists	delimited by a
       simple word such	as "and", and that the delimited sub-strings are human
       names of	the "First von Last" or	"von Last, Jr.,	First" style used by
       BibTeX.	If you are using this module for anything other	than
       bibliographic data, you can most	likely forget about these two methods.
       However,	if you are in fact hacking on BibTeX-style bibliographic data,
       these could come	in very	handy -- the name-parsing done by BibTeX is
       not trivial, and	the list-splitting would also be a pain	to implement
       in Perl because you have	to pay attention to brace-depth.  (Not that it
       wasn't a	pain to	implement in C -- it's just a lot more efficient than
       a Perl implementation would be.)

       Incidentally, both of these methods assume that the strings being split
       have already been "collapsed" in	the BibTeX way,	i.e. all leading and
       trailing	whitespace removed and internal	whitespace reduced to single
       spaces.	This should always be the case when using these	two methods on
       a "Text::BibTeX::Entry" object, but these are actually just front ends
       to more general functions in "Text::BibTeX".  (More general in that you
       supply the string to be parsed, rather than supplying the name of an
       entry field.)  Should you ever use those	more general functions
       directly, you might have	to worry about collapsing whitespace; see
       Text::BibTeX (the "split_list" and "split_name" functions in
       particular) for more information.

       Please note that	the interface to author	name parsing is	experimental,
       subject to change, and open to discussion.  Please let me know if you
       have problems with it, think it's just perfect, or whatever.

       split (FIELD [, DELIM [,	DESC]])
	   Splits the value of FIELD on	DELIM (default:	`and').	 Don't assume
	   that	this works the same as Perl's builtin "split" just because the
	   names are the same: in particular, DELIM must be a simple string
	   (no regexps), and delimiters	that are at the	beginning or end of
	   the string, or at non-zero brace depth, or not surrounded by
	   whitespace, are ignored.  Some examples might illuminate matters:

	      if field F is...		      then split (F) returns...
	      'Name1 and Name2'		      ('Name1',	'Name2')
	      'Name1 and and Name2'	      ('Name1',	undef, 'Name2')
	      'Name1 and'		      ('Name1 and')
	      'and Name2'		      ('and Name2')
	      'Name1 {and} Name2 and Name3'   ('Name1 {and} Name2', 'Name3')
	      '{Name1 and Name2} and Name3'   ('{Name1 and Name2}', 'Name3')

	   Note	that a warning will be issued for empty	names (as in the
	   second example above).  A warning ought to be issued	for delimiters
	   at the beginning or end of a	string,	but currently this isn't done.

	   DESC	is a one-word description of the substrings; it	defaults to
	   'name'.  It is only used for	generating warning messages.
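
           A brief sketch of "split" on an author list.  The entry type and
           key are invented; note how the braced group is kept as one item
           because its "and" sits at non-zero brace depth.

```perl
use strict;
use warnings;
use Text::BibTeX;

my $text = <<'BIB';
@article{demo,
  author = {John Smith and
            Hacker, J. Random and
            Ludwig van Beethoven and
            {Foo, Bar and Company}}
}
BIB

my $entry = Text::BibTeX::Entry->new($text) or die "no entry found";
die "syntax errors in entry" unless $entry->parse_ok;

# Split on the default delimiter 'and', respecting brace depth.
my @authors = $entry->split('author');
print scalar(@authors), " authors\n";
print "  $_\n" for @authors;
```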

       names (FIELD)
	   Splits FIELD	as described above, and	further	splits each name into
	   four	components: first, von,	last, and jr.

	   Returns a list of "Text::BibTeX::Name" objects, each	of which
	   represents one name.	 Use the "part"	method to query	these objects;
	   see Text::BibTeX::Name for details on the interface to name objects
	   (and	on name-parsing	as well).

           For example, if this entry:

		       author =	{John Smith and
				 Hacker, J. Random and
				 Ludwig	van Beethoven and
				 {Foo, Bar and Company}}}

	   has been parsed into	a "Text::BibTeX::Entry"	object $entry, then

	      @names = $entry->names ('author');

	   will	put a list of "Text::BibTeX::Name" objects in @names.  These
           can be queried individually as described in Text::BibTeX::Name; for
           example,

	      @last = $names[0]->part ('last');

	   would put the list of tokens	comprising the last name of the	first
	   author into the @last array:	"('Smith')".
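
           Put together as a runnable sketch (the entry type and key are
           assumptions, as is the choice to print only the last-name part):

```perl
use strict;
use warnings;
use Text::BibTeX;

my $text = <<'BIB';
@article{demo,
  author = {John Smith and
            Hacker, J. Random and
            Ludwig van Beethoven and
            {Foo, Bar and Company}}
}
BIB

my $entry = Text::BibTeX::Entry->new($text) or die "no entry found";
die "syntax errors in entry" unless $entry->parse_ok;

my @names = $entry->names('author');    # Text::BibTeX::Name objects
my @last  = $names[0]->part('last');    # tokens of the first author's last name

# Print the last-name tokens of every author.
for my $name (@names) {
    my @tokens = grep { defined } $name->part('last');
    print join(' ', @tokens), "\n";
}
```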

   Entry modification methods
       set_type	(TYPE)
	   Sets	the entry's type.

       set_metatype (METATYPE)
	   Sets	the entry's metatype (must be one of the four constants
	   which are all optionally exported from "Text::BibTeX").

       set_key (KEY)
	   Sets	the entry's key.

       set (FIELD, VALUE, ...)
	   Sets	the value of field FIELD.  (VALUE might	be "undef" or
	   unsupplied, in which	case FIELD will	simply be set to "undef" --
	   this	is where the difference	between	the "exists" method and
	   testing the definedness of field values becomes clear.)
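
           The difference can be seen in a sketch like this one, where the
           entry and the field names are made up for the example:

```perl
use strict;
use warnings;
use Text::BibTeX;

my $entry = Text::BibTeX::Entry->new('@misc{demo, title = {A Title}}')
    or die "no entry found";

# 'note' now exists in the entry, but its value is undefined.
$entry->set('note', undef);

my %state;
for my $field (qw(title note pages)) {
    my $present = $entry->exists($field)               ? 'present' : 'absent';
    my $defined = defined(scalar $entry->get($field)) ? 'defined' : 'undefined';
    $state{$field} = "$present/$defined";
    print "$field: $state{$field}\n";
}
```

           Only "exists" distinguishes the "note" field (present but
           undefined) from the "pages" field (absent).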

	   Multiple (FIELD, VALUE) pairs may be	supplied; they will be
	   processed in	order (i.e. the	input is treated like a	list, not a
	   hash).  For example:

	      $entry->set ('author', $author);
	      $entry->set ('author', $author, 'editor',	$editor);

	   VALUE can be	either a simple	string or a "Text::BibTeX::Value"
	   object; it doesn't matter if	the entry was parsed in	"full post-
	   processing" or "preserve input values" mode.

       delete (FIELD)
	   Deletes field FIELD from an entry.

       set_fieldlist (FIELDLIST)
	   Sets	the entry's list of fields to FIELDLIST, which must be a list
	   reference.  If any of the field names supplied in FIELDLIST are not
	   currently present in	the entry, they	are created with the value
	   "undef" and a warning is printed.  Conversely, if any of the	fields
	   currently present in	the entry are not named	in the list of fields
           supplied to "set_fieldlist", they are deleted from the entry and
	   another warning is printed.
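
           A sketch that exercises the modification methods together and then
           renders the result with "print_s".  The entry and the new values
           are illustrative; the exact layout of the printed text is not
           assumed.

```perl
use strict;
use warnings;
use Text::BibTeX;

my $text = <<'BIB';
@article{oldkey,
  author = {Homer Simpson},
  title  = {Old Title},
  year   = 1997
}
BIB

my $entry = Text::BibTeX::Entry->new($text) or die "no entry found";
die "syntax errors in entry" unless $entry->parse_ok;

$entry->set_type('misc');
$entry->set_key('newkey');
$entry->set('title', 'New Title', 'note', 'Added later');   # two pairs, in order
$entry->delete('year');

# Reorder the existing fields; no field is added or dropped here.
$entry->set_fieldlist([qw(title author note)]);

my $output = $entry->print_s;
print $output;
```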

   Entry output	methods
       write (BIBFILE)
	   Prints a BibTeX entry on the	filehandle associated with BIBFILE
	   (which should be a "Text::BibTeX::File" object, opened for output).
	   Currently the printout is not particularly human-friendly; a	highly
	   configurable	pretty-printer will be developed eventually.

       print (FILEHANDLE)
	   Prints a BibTeX entry on FILEHANDLE.

       print_s ()
	   Prints a BibTeX entry to a string, which is the return value.

   Miscellaneous methods
       warn (WARNING [,	FIELD])
	   Prepends a bit of location information (filename and	line
	   number(s)) to WARNING, appends a newline, and passes	it to Perl's
	   "warn".  If FIELD is	supplied, the line number given	is just	that
	   of the field; otherwise, the	range of lines for the whole entry is
	   given.  (Well, almost -- currently, the line	number of the last
	   field is used as the	last line of the whole entry.  This is a bug.)

	   For example,	if lines 10-15 of file foo.bib look like this:

		author = {Homer	Simpson	and Ned	Flanders},
		title =	{Territorial Imperatives in Modern Suburbia},
		journal	= {Journal of Suburban Studies},
		year = 1997

	   then, after parsing this entry to $entry, the calls

	      $entry->warn ('what a silly entry');
	      $entry->warn ('what a silly journal', 'journal');

	   would result	in the following warnings being	issued:

	      foo.bib, lines 10-14: what a silly entry
	      foo.bib, line 13:	what a silly journal

       line ([FIELD])
	   Returns the line number of FIELD.  If the entry was parsed from a
	   string, this	still works--it's just the line	number relative	to the
	   start of the	string.	 If the	entry was parsed from a	file, this
	   works just as you'd expect it to: it	returns	the absolute line
	   number with respect to the whole file.  Line	numbers	are one-based.

	   If FIELD is not supplied, returns a two-element list	containing the
	   line	numbers	of the beginning and end of the	whole entry.
	   (Actually, the "end"	line number is currently inaccurate: it's
           really the line number of the last field in the entry.  But
	   it's	better than nothing.)
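
           A sketch of "line" for an entry parsed from a string, where line
           numbers are relative to the start of the string.  The entry key
           "homer97" is hypothetical.

```perl
use strict;
use warnings;
use Text::BibTeX;

my $text = <<'BIB';
@article{homer97,
  author  = {Homer Simpson and Ned Flanders},
  title   = {Territorial Imperatives in Modern Suburbia},
  journal = {Journal of Suburban Studies},
  year    = 1997
}
BIB

my $entry = Text::BibTeX::Entry->new($text) or die "no entry found";
die "syntax errors in entry" unless $entry->parse_ok;

my $journal_line   = $entry->line('journal');   # line of one field
my ($first, $last) = $entry->line;              # range for the whole entry

print "entry spans lines $first-$last; journal is on line $journal_line\n";
```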

       filename	()
	   Returns the name of the file	from which the entry was parsed.  Only
	   works if the	file is	represented by a "Text::BibTeX::File"
	   object---if you just	passed a filename/filehandle pair to "parse",
	   you can't get the filename back.  (Sorry.)

SEE ALSO
       Text::BibTeX, Text::BibTeX::File, Text::BibTeX::Structure

AUTHOR
       Greg Ward

COPYRIGHT
       Copyright (c) 1997-2000 by Gregory P. Ward.  All rights reserved.  This
       file is part of the Text::BibTeX	library.  This library is free
       software; you may redistribute it and/or	modify it under	the same terms
       as Perl itself.

perl v5.32.0			  2020-08-08		Text::BibTeX::Entry(3)