Chemistry::FormulaPattern(3)  User Contributed Perl Documentation  Chemistry::FormulaPattern(3)

       Chemistry::FormulaPattern - Match molecule by formula

	   use Chemistry::FormulaPattern;

	   # somehow get a bunch of molecules...
	   use Chemistry::File::SDF;
	   my @mols = Chemistry::Mol->read("file.sdf");

	   # we	want molecules with six	carbons	and 8 or more hydrogens
	   my $patt = Chemistry::FormulaPattern->new("C6H8-");

	   for my $mol (@mols) {
	       if ($patt->match($mol)) {
		   print $mol->name, " has a nice formula!\n";
	       }
	   }

	   # a concise way of selecting	molecules with grep
	   # (inside the block, grep aliases each molecule to $_)
	   my @matches = grep {	$patt->match($_) } @mols;

       This module implements a	simple language	for describing a range of
       molecular formulas and allows one to find out whether a molecule
       matches the formula specification. It can be used for searching for
       molecules by formula, in	a way similar to the NIST WebBook formula
       search (<>). Note
       however that the	language used by this module is	different from the one
       used by the WebBook!

       Chemistry::FormulaPattern shares	the same interface as
       Chemistry::Pattern.  To perform a pattern matching operation on a
       molecule, follow	these steps.

       1) Create a pattern object, by parsing a	string.	Let's assume that the
       pattern object is stored	in $patt and that the molecule is $mol.

       2) Execute the pattern on the molecule by calling $patt->match($mol).

       If $patt->match returns true, there was a match. If $patt->match is
       called repeatedly with the same molecule, it returns true on the first
       call (if there is a match), false on the second, true on the third,
       and so on. This is because the Chemistry::Pattern interface is
       designed to allow multiple matches for a given molecule and returns
       false when there are no further matches; in the case of a formula
       pattern, there is only one possible match.

	   $patt->match($mol); # may return true
	   $patt->match($mol); # always	false
	   $patt->match($mol); # may return true
	   $patt->match($mol); # always	false
	   # ...

       This allows one to use the standard while loop for all kinds of
       patterns	without	having to worry	about endless loops:

	   # $patt might be a Chemistry::Pattern, Chemistry::FormulaPattern,
	   # or	Chemistry::MidasPattern	object
	   while ($patt->match($mol)) {
	       # do something
	   }
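
       The alternation described above can be mimicked with a toy class to
       see why the while loop terminates. This is only a sketch: ToyPattern
       is a hypothetical stand-in, not part of PerlMol, and unlike the real
       interface it does not track which molecule it was given.

```perl
use strict;
use warnings;

# ToyPattern is a hypothetical stand-in, NOT part of PerlMol. It only
# mimics the alternating return value of match() for a single molecule.
package ToyPattern;

sub new {
    my ($class, $matches) = @_;
    return bless { matches => $matches, already_matched => 0 }, $class;
}

sub match {
    my ($self, $mol) = @_;
    if ($self->{already_matched}) {
        # The single possible match was already reported: reset and stop.
        $self->{already_matched} = 0;
        return 0;
    }
    return 0 unless $self->{matches};
    $self->{already_matched} = 1;
    return 1;
}

package main;

my $patt  = ToyPattern->new(1);   # a pattern that matches
my $count = 0;
while ($patt->match("benzene")) {
    $count++;
}
print "loop body ran $count time(s)\n";
```

       Because the second call returns false, the loop body runs once per
       matching molecule and then exits.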

       Also note that formula patterns don't really have the concept of	an
       atom map, so $patt->atom_map and	$patt->bond_map	always return the
       empty list.

       In the simplest case, a formula pattern may be just a regular formula,
       as used by the Chemistry::File::Formula module. For example, the
       pattern "C6H6" will only	match molecules	with six carbons, six
       hydrogens, and no other atoms.

       The interesting thing is	that one can also specify ranges for the
       elements, as two	hyphen-separated numbers. "C6H8-10" will match
       molecules with six carbons and eight to ten hydrogens.

       Ranges may also be open,	by omitting the	upper part of the range.
       "C6H0-" will match molecules with six carbons and any number of
       hydrogens (i.e.,	zero or	more).
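
       The range rules can be sketched in a few lines of plain Perl. The
       helper in_range() below is hypothetical and is not how the module is
       implemented; it only illustrates the three forms a count may take
       ("N", "N-M", and "N-").

```perl
use strict;
use warnings;

# in_range() is a hypothetical helper, not a function of the module.
# $spec is "N" (exactly N), "N-M" (N through M), or "N-" (N or more).
sub in_range {
    my ($count, $spec) = @_;
    my ($low, $dash, $high) = $spec =~ /^(\d+)(-?)(\d*)$/
        or die "bad range spec: $spec";
    return $count == $low if $dash eq '';    # plain number
    return $count >= $low if $high eq '';    # open range
    return $count >= $low && $count <= $high;
}

# Check a few hydrogen counts against the two ranges used in the text.
for my $h (6, 8, 10, 12) {
    printf "H%-2d  8-10: %-3s  0-: %s\n", $h,
        (in_range($h, "8-10") ? "yes" : "no"),
        (in_range($h, "0-")   ? "yes" : "no");
}
```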

       A formula pattern may also allow	for unspecified	elements by means of
       the asterisk special character, which can be placed anywhere in the
       formula pattern.	For example, "C2H6*" (or "C2*H6", etc.) will match
       C2H6, and also C2H6O, C2H6S, C2H6SO, etc.
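
       The effect of the asterisk can be illustrated with a small sketch.
       The function formula_ok() and its data layout are assumptions made
       for this example, not the module's internals: element counts are
       checked against per-element ranges, and elements not named in the
       pattern are tolerated only when the wildcard flag is set.

```perl
use strict;
use warnings;

# formula_ok() is a hypothetical helper for illustration only.
# $counts   : hashref, element => number of atoms in the molecule
# $wanted   : hashref, element => [low, high]; high is undef if open
# $wildcard : true if the pattern contained "*"
sub formula_ok {
    my ($counts, $wanted, $wildcard) = @_;
    for my $el (keys %$wanted) {
        my ($low, $high) = @{ $wanted->{$el} };
        my $n = $counts->{$el} // 0;
        return 0 if $n < $low;
        return 0 if defined $high && $n > $high;
    }
    return 1 if $wildcard;
    # Without "*", elements not named in the pattern are not allowed.
    for my $el (keys %$counts) {
        return 0 if !exists $wanted->{$el} && $counts->{$el} > 0;
    }
    return 1;
}

my %c2h6    = (C => [2, 2], H => [6, 6]);
my %ethane  = (C => 2, H => 6);
my %ethanol = (C => 2, H => 6, O => 1);

print "C2H6  vs C2H6O: ", (formula_ok(\%ethanol, \%c2h6, 0) ? "match" : "no match"), "\n";
print "C2H6* vs C2H6O: ", (formula_ok(\%ethanol, \%c2h6, 1) ? "match" : "no match"), "\n";
```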

       Ranges can also be used after a subformula in parentheses: "(CH2)1-2"
       will match molecules with one or	two carbons and	two to four hydrogens.
       Note, however, that the "structure" of the bracketed part of the
       formula is forgotten, i.e., the multiplier applies to each element
       individually and	does not have to be an integer.	That is, the above
       pattern will match CH2, CH3, CH4, C2H2, C2H3, and C2H4.
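
       The following sketch shows why the structure of the subformula is
       lost. It is an illustration written for this page, not the module's
       code: the range bounds are applied to each element independently,
       and every combination inside the resulting ranges is listed.

```perl
use strict;
use warnings;

# Hypothetical illustration of "(CH2)1-2": each element count inside
# the parentheses is scaled by the lower and upper bound separately.
my %inside = (C => 1, H => 2);   # the subformula CH2
my ($low, $high) = (1, 2);       # the range 1-2

my %range;
for my $el (keys %inside) {
    $range{$el} = [ $inside{$el} * $low, $inside{$el} * $high ];
}

# Enumerate every C/H combination that falls inside the two ranges.
my @matches;
for my $c ($range{C}[0] .. $range{C}[1]) {
    for my $h ($range{H}[0] .. $range{H}[1]) {
        push @matches, "C" . ($c > 1 ? $c : "") . "H$h";
    }
}
print join(", ", @matches), "\n";
# prints: CH2, CH3, CH4, C2H2, C2H3, C2H4
```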



       The PerlMol website <>

       Ivan Tubert-Brohman <>

       Copyright (c) 2004 Ivan Tubert-Brohman. All rights reserved. This
       program is free software; you can redistribute it and/or	modify it
       under the same terms as Perl itself.

perl v5.32.1			  2004-08-11	  Chemistry::FormulaPattern(3)