Tutorial(3)	      User Contributed Perl Documentation	   Tutorial(3)

NAME
       Chemistry::Tutorial - PerlMol Quick Tutorial

Introduction
       The modules in the PerlMol toolkit are designed to simplify the
       handling of molecules from Perl programs in a general and extensible
       way.  These modules are object-oriented; however, this tutorial tries
       to assume
       little or no knowledge of object-oriented programming in	Perl. For a
       general introduction about how to use object-oriented modules, see

       This document shows some	of the more common methods included in the
       PerlMol toolkit,	in a reasonable	order for a quick introduction.	For
       more details see	the perldoc pages for each module.

How to read a molecule from a file
       The following code will read a PDB file:

	   use Chemistry::Mol;
	   use Chemistry::File::PDB;
	   my $mol = Chemistry::Mol->read("test.pdb");

       The first two lines (which only need to appear once in a given
       program) tell Perl that you want to "use" the specified modules. The
       third line reads the file and returns a molecule object.

       To read other formats such as MDL molfiles, you need to "use" the
       corresponding module, such as Chemistry::File::MDLMol. Readers for
       several formats are under development.

The molecule object
       "Chemistry::Mol->read" returns a	Chemistry::Mol object. An object is a
       data structure of a given class that has	methods	(i.e. subroutines)
       associated with it. To access or	modify an object's properties, you
       call the	methods	on the object through "arrow syntax":

	   my $name = $mol->name; # return the name of the molecule
	   $mol->name("water");	  # set	the name of the	molecule to "water"

       Note that these so-called accessor methods return the molecule object
       when they are used to set a property. A consequence of that is that,
       if you want, you can "chain" several method calls to set several
       properties in one statement.

       A Chemistry::Mol	object contains	essentially a list of atoms, a list of
       bonds, and a few	generic	properties such	as name, type, and id. The
       atoms and bonds themselves are also objects.

Writing	a molecule file
       To write a molecule to a file, call the "write" method on the molecule
       object with the output file name, as in "$mol->write($filename)".


       Make sure you "use"d the	right file I/O module. If you want to load all
       the available file I/O modules, you can do it with

	   use Chemistry::File ':auto';

Selecting atoms	in a molecule
       You can get an array of all the atoms by	calling	the atoms method
       without parameters, or a	specific atom by giving	its index:

	   @all_atoms =	$mol->atoms;
	   $atom3 = $mol->atoms(3);

       Note: Atom and bond indices are counted from 1, not from	0. This
       deviation from common Perl usage	was made to be consistent with the way
       atoms are numbered in most common file formats.

       You can select atoms that match an arbitrary expression by using	Perl's
       built-in	"grep" function:

	   # get all oxygen atoms within 3.0 Angstroms of atom 37
	   @close_oxygens = grep {
	       $_->symbol eq 'O'
	       and $_->distance($mol->atoms(37)) < 3.0
	   } $mol->atoms;

       The "grep" function loops through all the atoms returned	by
       "$mol->atoms", aliasing each to $_ at each iteration, and returns only
       those for which the expression in braces	is true.
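
       The same filtering idea can be tried on plain data. This is a
       self-contained sketch in which hashes stand in for atom objects and
       the distance subroutine is a local helper written for the example, not
       a PerlMol call.

```perl
use strict;
use warnings;

# Stand-in atoms: plain hashes instead of Chemistry::Atom objects.
my @atoms = (
    { symbol => 'O', xyz => [0.00, 0.0, 0.0] },
    { symbol => 'H', xyz => [0.96, 0.0, 0.0] },
    { symbol => 'O', xyz => [2.80, 0.0, 0.0] },
    { symbol => 'O', xyz => [9.00, 0.0, 0.0] },
);

# Euclidean distance between two stand-in atoms.
sub distance {
    my ($p, $q) = @_;
    my $sum = 0;
    $sum += ($p->{xyz}[$_] - $q->{xyz}[$_]) ** 2 for 0 .. 2;
    return sqrt($sum);
}

my $ref = $atoms[0];

# Keep oxygens, other than the reference atom itself, within 3.0 Angstroms of it.
my @close_oxygens = grep {
    $_ != $ref
    and $_->{symbol} eq 'O'
    and distance($_, $ref) < 3.0
} @atoms;

printf "%d oxygen(s) within 3.0 A\n", scalar @close_oxygens;   # prints "1 oxygen(s) within 3.0 A"
```

       Excluding the reference atom is a choice made in this sketch; without
       that test an oxygen always matches itself at distance zero.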

       Using "grep" is a general way of	finding	atoms; however,	since finding
       atoms by name is common, a convenience method is available for that:

	   $HB1	    = $mol->atoms_by_name('HB1');
	   @H_atoms = $mol->atoms_by_name('H.*'); # name treated as a regex

       Since the atom name is not generally unique, even the first example
       above might match more than one atom. In	that case, only	the first one
       found is	returned. In the second	case, since you	are assigning to an
       array, all matching atoms are returned.
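
       Returning one item or many depending on how the result is used relies
       on Perl's calling context. The sketch below shows one way such a
       method could be written using the built-in "wantarray"; find_by_name
       is a hypothetical helper over a list of names, and anchoring the
       pattern at both ends is an assumption of this example.

```perl
use strict;
use warnings;

# Stand-in atom names; find_by_name is a hypothetical helper, not a PerlMol method.
my @names = qw(N CA HA HB1 HB2 C O);

sub find_by_name {
    my ($pattern) = @_;
    # Treat the argument as a regular expression that must match the whole name.
    my @found = grep { /^(?:$pattern)$/ } @names;
    # List context: return every match. Scalar context: return only the first.
    return wantarray ? @found : $found[0];
}

my $first = find_by_name('H.*');   # scalar assignment: scalar context
my @all   = find_by_name('H.*');   # array assignment: list context

print "first: $first\n";   # prints "first: HA"
print "all: @all\n";       # prints "all: HA HB1 HB2"
```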

The atom object
       Atoms are usually the most interesting objects in a molecule. Some of
       their main properties are Z, symbol, and	coords.

	   $atom->Z(8);	# set atomic number to 8
	   $symbol = $atom->symbol;
	   $coords = $atom->coords;

   Atom	coordinates
       The coordinates returned	by "$atom->coords" are a Math::VectorReal
       object. You can print these objects and use them	to do vector algebra:

	   $c1		  = $atom1->coords;
	   $c2		  = $atom2->coords;
	   $dot_product	  = $c1	. $c2;	     # returns a scalar
	   $cross_product = $c1	x $c2;	     # returns a vector
	   $delta	  = $c2	- $c1;	     # returns a vector
	   $distance	  = $delta->length;  # returns a scalar
	   ($x,	$y, $z)	  = $c1->array;	     # get the components of $c1
	   print $c1;	  # prints something like "[ 1.0E0  2.0E0  3.0E0 ]"

       Since one is very often interested in calculating the distance between
       atoms, Atom objects provide a "distance"	method to save some typing:

	   $d  = $atom1->distance($atom2);
	   $d2 = $atom1->distance($molecule2);

       In the second case, the value obtained is the minimum distance between
       the atom	and the	molecule. This can be useful for things	such as
       finding the water molecules closest to a	given atom.
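
       The atom-to-molecule distance described above is a minimum over
       atom-to-atom distances. A minimal sketch with plain coordinate arrays
       follows; the subroutine names and the water coordinates are made up
       for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Points are [x, y, z] array references standing in for atoms.
sub dist {
    my ($p, $q) = @_;
    return sqrt( ($p->[0] - $q->[0]) ** 2
               + ($p->[1] - $q->[1]) ** 2
               + ($p->[2] - $q->[2]) ** 2 );
}

# Atom-to-molecule distance: the smallest distance to any atom of the molecule.
sub min_distance {
    my ($point, $molecule) = @_;
    return min( map { dist($point, $_) } @$molecule );
}

my $atom   = [0, 0, 0];
my %waters = (
    w1 => [ [3, 4, 0], [3, 5, 0],   [4, 4, 0]   ],
    w2 => [ [0, 0, 2], [0, 1, 2.5], [1, 0, 2.5] ],
);

# Order the water labels by how close each molecule comes to the atom.
my @by_distance = sort {
    min_distance($atom, $waters{$a}) <=> min_distance($atom, $waters{$b})
} keys %waters;

printf "closest water: %s at %.2f\n",
    $by_distance[0], min_distance($atom, $waters{ $by_distance[0] });
# prints "closest water: w2 at 2.00"
```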

       Atoms may also have internal coordinates, which define the position of
       an atom relative	to the positions of other atoms	by means of a
       distance, an angle, and a dihedral angle. Those coordinates can be
       accessed	through	the $atom->internal_coords method, which uses
       Chemistry::InternalCoords objects.
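
       To make the three quantities concrete, the following sketch computes a
       distance, an angle, and a dihedral angle from Cartesian points using
       only core Perl. It does not use Chemistry::InternalCoords; all
       subroutine names are local to the example, and the sign of the
       dihedral follows from the formula chosen here rather than from any
       convention stated in this tutorial.

```perl
use strict;
use warnings;

my $PI = 4 * atan2(1, 1);

# Small vector helpers on [x, y, z] array references.
sub vsub  { my ($p, $q) = @_; return [ map { $p->[$_] - $q->[$_] } 0 .. 2 ] }
sub dot   { my ($p, $q) = @_; return $p->[0]*$q->[0] + $p->[1]*$q->[1] + $p->[2]*$q->[2] }
sub norm  { my ($p) = @_; return sqrt(dot($p, $p)) }
sub cross {
    my ($p, $q) = @_;
    return [ $p->[1]*$q->[2] - $p->[2]*$q->[1],
             $p->[2]*$q->[0] - $p->[0]*$q->[2],
             $p->[0]*$q->[1] - $p->[1]*$q->[0] ];
}
sub deg { my ($rad) = @_; return $rad * 180 / $PI }

# Distance between two points.
sub bond_length { my ($p1, $p2) = @_; return norm(vsub($p1, $p2)) }

# Angle p1-p2-p3 in degrees, measured at p2.
sub bond_angle {
    my ($p1, $p2, $p3) = @_;
    my ($u, $v) = (vsub($p1, $p2), vsub($p3, $p2));
    return deg(atan2(norm(cross($u, $v)), dot($u, $v)));
}

# Signed dihedral p1-p2-p3-p4 in degrees, about the p2-p3 axis.
sub dihedral {
    my ($p1, $p2, $p3, $p4) = @_;
    my ($b1, $b2, $b3) = (vsub($p2, $p1), vsub($p3, $p2), vsub($p4, $p3));
    my $n2 = cross($b2, $b3);
    return deg(atan2(norm($b2) * dot($b1, $n2), dot(cross($b1, $b2), $n2)));
}

my @p = ([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1]);

my $r     = bond_length($p[0], $p[1]);
my $theta = bond_angle(@p[0 .. 2]);
my $phi   = dihedral(@p);

printf "r = %.3f, angle = %.1f, dihedral = %.1f\n", $r, $theta, $phi;
# prints "r = 1.000, angle = 90.0, dihedral = 90.0"
```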

The Bond object
       A Chemistry::Bond object	is a list of atoms with	an associated bond
       order.  In most cases, a	bond has exactly two atoms, but	we don't want
       to exclude possibilities	such as	three-center bonds. You	can get	the
       list of atoms in	a bond by using	the "atoms" method; the	bond order is
       accessed through the "order" method:

	   @atoms_in_bond = $bond->atoms;
	   $bond_order	  = $bond->order;

       The other interesting method for	Bond objects is	"length", which
       returns the distance between the	two atoms in a bond (this method
       requires	that the bond have two atoms).

	   my $bondlength = $bond->length;

       In addition to these properties,	Bond objects have the generic
       properties described below. The most important of these,	as far as
       bonds are concerned, is "type".

Generic	properties
       There are three generic properties that all PerlMol objects have:

       id  Each	object must have a unique ID. In most cases you	don't have to
	   worry about it, because it is assigned automatically	unless you
	   specify it. You can use the "by_id" method to select	an object
	   contained in	a molecule:

	       $atom = $mol->by_id("a42");

	   In general, ids are preferable to indices because they don't	change
	   if you delete or move atoms or other	objects.

       name
           The name of the object does not have any meaning from the point of
           view of the core modules, but most file types have the concept of
           molecule name, and some (such as PDB) have the concept of atom
           names.

       type
           Again, the meaning of type is not universally defined, but it would
           likely be used to specify atom types and bond orders.

       Besides these, the user can specify arbitrary attributes, as discussed
       in the next section.
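
       The remark that ids survive deletions while indices do not can be
       illustrated with a plain array. In this sketch the atoms are hashes,
       and atom_at and atom_by_id are hypothetical helpers standing in for
       index-based and id-based lookup.

```perl
use strict;
use warnings;

# Stand-in atoms as hashes; the ids mimic the "a42" style used above.
my @atoms = (
    { id => 'a1', symbol => 'H' },
    { id => 'a2', symbol => 'O' },
    { id => 'a3', symbol => 'H' },
);

# 1-based positional lookup, mirroring the indexing convention of this tutorial.
sub atom_at { my ($i) = @_; return $atoms[$i - 1] }

# Lookup by id: scan for the element whose id matches.
sub atom_by_id {
    my ($id) = @_;
    my ($hit) = grep { $_->{id} eq $id } @atoms;
    return $hit;
}

my $before = atom_at(2)->{id};   # index 2 is atom "a2"
splice @atoms, 0, 1;             # delete the first atom
my $after  = atom_at(2)->{id};   # index 2 now refers to a different atom

print "index 2 before: $before, after: $after\n";   # prints "index 2 before: a2, after: a3"
print "id a2 still finds: ", atom_by_id('a2')->{symbol}, "\n";   # prints "id a2 still finds: O"
```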

User-specified attributes
       The core	PerlMol	classes	define very few, very generic properties for
       atoms and molecules. This was chosen as a "minimum common denominator"
       because every file format and program has different ideas about the
       names, values and meaning of these properties. For example, some
       programs	only allow bond	orders of 1, 2,	and 3; some also have
       "aromatic" bonds; some use calculated non-integer bond orders. PerlMol
       tries not to commit to any particular convention, but it	allows you to
       specify whatever	attributes you want for	any object (be it a molecule,
       an atom,	or a bond). This is done through the "attr" method.

	   $mol->attr("melting point", "273.15"); # set	m.p.
	   $color = $atom->attr("color"); # get	atom color

       The core	modules	store these values but they don't know what they mean
       and they	don't care about them. Attributes can have whatever name you
       want, and they can be of	any type. However, by convention, non-core
       modules that need additional attributes should prefix their name	with a
       namespace, followed by a	slash.	(This is done to avoid modules
       fighting	over the same attribute	name.)	For example, atoms created by
       the PDB reader module (Chemistry::File::PDB) have the
       "pdb/residue_name" attribute:

	   $mol	 = Chemistry::Mol->read("test.pdb");
	   $atom = $mol->atoms(1234);
	   print $atom->attr("pdb/residue_name"); # prints "ALA123"
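
       How a namespace prefix keeps attributes from colliding can be shown
       with a small stand-in class. Toy::Obj below is hypothetical; only the
       "attr" method name and the "pdb/residue_name" key come from this
       tutorial. The "myviewer" and "othertool" namespaces are invented, and
       returning the object when setting is a choice made in this sketch.

```perl
use strict;
use warnings;

# Toy::Obj is a hypothetical stand-in class, not part of PerlMol.
package Toy::Obj;

sub new { return bless { attr => {} }, shift }

# One argument reads an attribute; two arguments store it.
sub attr {
    my ($self, $name, @value) = @_;
    return $self->{attr}{$name} unless @value;
    $self->{attr}{$name} = $value[0];
    return $self;
}

package main;

my $atom = Toy::Obj->new;
$atom->attr("pdb/residue_name", "ALA123");      # as a PDB reader might set it
$atom->attr("myviewer/color",   "red");         # one module's idea of "color"
$atom->attr("othertool/color",  [255, 0, 0]);   # another module, another value type

# The two "color" attributes coexist because their full names differ.
printf "%s %s\n", $atom->attr("pdb/residue_name"), $atom->attr("myviewer/color");
# prints "ALA123 red"
```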

Molecule subclasses
       You can do lots of interesting things with plain molecules. However, for
       some applications you may want to extend	the features of	the main
       Chemistry::Mol class. There are several subclasses of Chemistry::Mol
       available already:

       Chemistry::MacroMol
           Used for macromolecules.

       Chemistry::Pattern
           Used for substructure matching.

       Chemistry::Ring
           Used for representing rings (cycles) in molecules.

       Chemistry::Reaction
           Used for representing and applying chemical transformations.

       As an example we'll discuss macromolecules. Future versions of this
       tutorial	may also include a discussion about patterns and rings.

Macromolecules
       So far we have assumed that we are dealing with molecules of the
       Chemistry::Mol class.  However, one of the interesting things about
       object-oriented programming is that classes can be extended. For
       dealing with macromolecules, we have the	MacroMol class,	which extends
       the Chemistry::Mol class. This means that in practice you can use a
       Chemistry::MacroMol object exactly as you would use a Chemistry::Mol
       object, but with	some added functionality. In fact, the PDB reader can
       return Chemistry::MacroMol instead of Chemistry::Mol objects just by
       changing	the first example like this:

	   use Chemistry::MacroMol;
	   use Chemistry::File::PDB;
	   my $macromol	= Chemistry::MacroMol->read("test.pdb");

       Now the question	is, what is the	"added functionality" that MacroMol
       objects have on top of the original Chemistry::Mol object?

   The MacroMol	object
       For the purposes	of this	module,	a macromolecule	is considered to be a
       big molecule where atoms are divided into Domains. A domain is just a
       subset of the atoms in the molecule; in a protein, a domain would be
       just a residue.

       You can select domains in a molecule in a way similar to	that used for
       atoms and bonds,	in this	case through the "domains" method:

	   my @all_domains = $macromol->domains;
	   my $domain	   = $macromol->domains(57);

   The Domain object
       A domain	is a substructure of a larger molecule.	Other than having a
       parent molecule,	a domain is just like a	molecule. In other words, the
       Domain class extends the	Chemistry::Mol class; it is basically a
       collection of atoms and bonds.

	   my @atoms_in_domain = $domain->atoms;
	   my $atom5_in_domain = $domain->atoms(5);

       If you want to get at a given atom in a given domain in a
       macromolecule, you can "chain" the method calls without having to save
       the Domain object in a temporary	variable:

	   my $domain57_atom5 =	$macromol->domains(57)->atoms(5);
	   my $res233_HA = $macromol->domains(233)->atoms_by_name('HA');

       The second example is a good way	of selecting an	atom from a PDB	file
       when you	know the residue number	and atom name.


SEE ALSO
       Chemistry::Mol, Chemistry::Atom, Chemistry::Bond, Chemistry::File,
       Chemistry::MacroMol, Chemistry::Domain.

       The PerlMol website

AUTHOR
       Ivan Tubert-Brohman

COPYRIGHT
       Copyright (c) 2005 Ivan Tubert-Brohman. All rights reserved. This
       program is free software; you can redistribute it and/or	modify it
       under the same terms as Perl itself.

perl v5.32.1			  2005-10-26			   Tutorial(3)
