Text::Levenshtein(3)  User Contributed Perl Documentation Text::Levenshtein(3)

       Text::Levenshtein - calculate the Levenshtein edit distance between two
       strings

	use Text::Levenshtein qw(distance);

	print distance("foo","four");
	# prints "2"

	my @words     =	qw/ four foo bar /;
	my @distances =	distance("foo",@words);

	print "@distances";
	# prints "2 0 3"

       This module implements the Levenshtein edit distance, which measures
       the difference between two strings. The distance is the minimum number
       of substitutions, deletions, or insertions ("edits") needed to
       transform one string into the other (and vice versa). When two strings
       have distance 0, they are the same.
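
       The definition above can be illustrated with a minimal sketch of the
       classic dynamic-programming (Wagner-Fischer) computation. The
       function name edit_distance() is hypothetical and is not part of
       this module, and the module's own implementation may differ:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Levenshtein: a two-row
# dynamic-programming computation of the Levenshtein distance.
sub edit_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # $prev[$j] is the distance from the empty string to the
    # first $j characters of $t (that is, $j insertions).
    my @prev = (0 .. @t);

    for my $i (1 .. @s) {
        my @curr = ($i);    # $i deletions turn a prefix of $s into ""
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            my $best = $prev[$j - 1] + $cost;        # match or substitution
            $best = $prev[$j] + 1
                if $prev[$j] + 1 < $best;            # deletion
            $best = $curr[$j - 1] + 1
                if $curr[$j - 1] + 1 < $best;        # insertion
            $curr[$j] = $best;
        }
        @prev = @curr;
    }
    return $prev[-1];
}

print edit_distance('foo', 'four'), "\n";      # 2
print edit_distance('brown', 'green'), "\n";   # 3
```

       Only two rows of the table are kept at a time, so memory use grows
       with the length of the second string rather than with the product
       of both lengths.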

       To learn more about the Levenshtein metric, have a look at the
       Wikipedia page on Levenshtein distance.

       The simplest usage will take two	strings	and return the edit distance:

	$distance = distance('brown', 'green');
	# returns 3, as	'r' and	'n' don't change

       Instead of a single second string, you can pass a list of strings.
       Each string will	be compared to the first string	passed,	and a list of
       the edit	distances returned:

	@words	   = qw/ green trainee brains /;
	@distances = distance('brown', @words);
	# returns (3, 5, 3)
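
       One common use of the list form is picking the closest candidate
       to a given word. The following is a small sketch of that idea; the
       variable names are illustrative, and ties are resolved in favour of
       the earliest word, which is a choice made here rather than anything
       the module prescribes:

```perl
use strict;
use warnings;
use Text::Levenshtein qw(distance);

my $target    = 'brown';
my @words     = qw/ green trainee brains /;
my @distances = distance($target, @words);    # (3, 5, 3)

# Find the index of the smallest distance; on a tie the
# earliest word in the list wins.
my $best = 0;
for my $i (1 .. $#distances) {
    $best = $i if $distances[$i] < $distances[$best];
}

print "$words[$best] ($distances[$best])\n";  # green (3)
```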

       Previous versions of this module provided an alternative
       implementation, in the function "fastdistance()".  This function is
       still provided for backwards compatibility, but "distance()" and
       "fastdistance()" now use the same code to calculate the edit distance.

       Unlike "distance()", "fastdistance()" only takes	two strings, and
       returns the edit	distance between them.

       Both the	"distance()" and "fastdistance()" functions can	take a hashref
       with optional arguments,	as the final argument.	At the moment the only
       option is "ignore_diacritics".  If this is true,	then any diacritics
       are ignored when calculating edit distance. For example, "cafe" and
       "café" normally have an edit distance of 1, but when diacritics are
       ignored, the distance will be 0:

	use Text::Levenshtein 0.11 qw/ distance	/;
	$distance = distance($word1, $word2, {ignore_diacritics	=> 1});

       If you turn on this option, then	Unicode::Collate will be loaded, and
       used when comparing characters in the words.
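
       To give a feel for what a diacritic-insensitive character
       comparison looks like, here is a sketch using Unicode::Collate
       directly. This page says only that Unicode::Collate is used; the
       constructor arguments below are an assumption and may not match
       what the module passes internally:

```perl
use strict;
use warnings;
use utf8;                  # the source below contains a literal "é"
use Unicode::Collate;

# Assumed settings: level 1 compares primary weights (base letters)
# only, so accent differences, and also case differences, are ignored.
my $collator = Unicode::Collate->new(normalization => undef, level => 1);

print $collator->eq('e', 'é') ? "same\n" : "different\n";   # same
print $collator->eq('e', 'f') ? "same\n" : "different\n";   # different
```

       With a comparison like this substituted for plain string equality,
       "cafe" and "café" match character for character, which is why
       their distance drops to 0.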

       Early versions of "Text::Levenshtein" didn't support this option, so
       you should require version 0.11 or later, as above.

       There are many different	modules	on CPAN	for calculating	the edit
       distance	between	two strings. Here's just a selection.

       Text::LevenshteinXS and Text::Levenshtein::XS are both versions of the
       Levenshtein algorithm that require a C compiler,	but will be a lot
       faster than this	module.

       Text::Levenshtein::Flexible is another C	implementation,	but offers
       some twists: you	can specify a maximum distance that you're interested
       in, which makes it faster; you can also give different costs to
       insertion, deletion, and	substitution. Hasn't been updated since	2014.

       Text::Levenshtein::Edlib	is a Perl wrapper around a C++ library that
       provides	the Levenshtein	edit distance and optimal alignment path for a
       pair of strings.	 It doesn't support UTF-8 strings, though.

       Text::Levenshtein::BV implements	the Levenshtein	algorithm using	bit
       vectors,	and claims to be faster	than this implementation.  I haven't
       benchmarked them.

       The Damerau-Levenshtein edit distance is	like the Levenshtein distance,
       but in addition to insertion, deletion and substitution,	it also
       considers the transposition of two adjacent characters to be a single
       edit.  The module Text::Levenshtein::Damerau defaults to	using a	pure
       perl implementation, but	if you've installed
       Text::Levenshtein::Damerau::XS then it will be a	lot quicker.
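
       The effect of counting a transposition as a single edit can be
       sketched as follows. This is the restricted variant (often called
       optimal string alignment), written here for illustration only; the
       function name osa_distance() is hypothetical, and the code is not
       taken from Text::Levenshtein::Damerau, which may use a different
       variant:

```perl
use strict;
use warnings;

# Hypothetical helper: restricted Damerau-Levenshtein distance
# (optimal string alignment), using a full distance table.
sub osa_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    my @d;
    $d[$_][0] = $_ for 0 .. @s;    # delete every character of a prefix
    $d[0][$_] = $_ for 0 .. @t;    # insert every character of a prefix

    for my $i (1 .. @s) {
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            my $best = $d[$i - 1][$j - 1] + $cost;     # match or substitution
            $best = $d[$i - 1][$j] + 1
                if $d[$i - 1][$j] + 1 < $best;         # deletion
            $best = $d[$i][$j - 1] + 1
                if $d[$i][$j - 1] + 1 < $best;         # insertion

            # Two adjacent characters swapped: one edit instead of two.
            if (   $i > 1 && $j > 1
                && $s[$i - 1] eq $t[$j - 2]
                && $s[$i - 2] eq $t[$j - 1]
                && $d[$i - 2][$j - 2] + 1 < $best)
            {
                $best = $d[$i - 2][$j - 2] + 1;
            }
            $d[$i][$j] = $best;
        }
    }
    return $d[-1][-1];
}

print osa_distance('ab', 'ba'), "\n";      # 1 (plain Levenshtein gives 2)
print osa_distance('foo', 'four'), "\n";   # 2
```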

       Text::WagnerFischer is an implementation	of the Wagner-Fischer edit
       distance, which is similar to the Levenshtein, but applies different
       weights to each edit type.

       Text::Brew is an	implementation of the Brew edit	distance, which	is
       another algorithm based on edit weights.

       Text::Fuzzy provides a number of	operations for partial or fuzzy
       matching	of text	based on edit distance.	Text::Fuzzy::PP	is a pure perl
       implementation of the same interface.

       String::Similarity takes	two strings and	returns	a value	between	0
       (meaning	entirely different) and	1 (meaning identical).	Apparently
       based on	edit distance.

       Text::Dice calculates Dice's coefficient (also known as the
       Sørensen-Dice coefficient) for two strings. This formula was
       originally developed to measure the similarity of two different
       populations in ecological research.

       String::KeyboardDistance	and String::KeyboardDistanceXS calculate the
       "keyboard distance" between two strings.


       Dree Mistrut originally wrote this module and released it to CPAN in

       Josh Goldberg then took over maintenance	and released versions between
       2004 and	2008.

       Neil Bowers (NEILB on CPAN) is now maintaining this module.  Version
       0.07 was	a complete rewrite, based on one of the	algorithms on the
       wikipedia page.

       This software is	copyright (C) 2002-2004	Dree Mistrut.  Copyright (C)
       2004-2014 Josh Goldberg.	 Copyright (C) 2014- Neil Bowers.

       This is free software; you can redistribute it and/or modify it under
       the same	terms as the Perl 5 programming	language system	itself.

perl v5.32.1			  2021-03-24		  Text::Levenshtein(3)

