FreeBSD Manual Pages


BMF(1)									BMF(1)

       bmf - efficient Bayesian	mail filter

       bmf [-t]	[-n] [-s] [-N] [-S] [-f	fmt] [-d db] [-i file] [-k n] [-m type]	[-p]
	   [-v]	[-V] [-h]

       bmf  is	a  Bayesian  mail  filter. In its normal mode of operation, it
       takes an	email message or other text on standard	input, does a  statis-
       tical check against lists of "good" and "spam" words, registers the new
       data, and returns a status code indicating whether or not  the  message
       is spam.	BMF is written with fast, zero-copy algorithms,	coded directly
       in C, and tuned for speed. It aims to be	faster,	smaller, and more ver-
       satile than similar applications.

       bmf  supports both mbox and maildir mail	storage	formats. It will auto-
       matically process multiple messages within an mbox file separately.

       Without command-line options, bmf processes the input, registers	it  as
       either  "good"  or  "spam", and returns the appropriate error code. The
       wordlist	directory and nonexistent wordfiles are	created	if absent.

       -t Test to see if the input is spam. The word lists are not updated. A
       report is written to stdout showing the final score and the tokens with
       the highest deviation from a mean of 0.5.

       -n Register the input as	non-spam.

       -s Register the input as	spam.

       -N Register the input as non-spam and undo a prior registration as
       spam.

       -S Register the input as spam and undo a prior registration as
       non-spam.

       -f fmt Specify database format. Valid formats are text, db, and	mysql.
       Text  is	 always	 valid.	 The others may	not be available if the	corre-
       sponding	option was not enabled at compile time.	The default is	db  if
       available, else text.

       -d  db Specify database or directory for	loading	and saving word	lists.
       The default is ~/.bmf in	text mode.

       -i file Use file	for input instead of stdin.

       -k n Specify the	number of extrema (keepers) to use in the Bayes	calcu-
       lation. The default is 15.

       -m type Specify mail storage format. Valid types are mbox and maildir.
       The default is to automatically detect the mail	storage	 format.  This
       option is deprecated.

       -p  Copy	 the input to the output (passthrough) and insert spam headers
       in the style of SpamAssassin. An	X-Spam-Status  header  is  always  in-
       serted  with processing details.	The contents of	this header always be-
       gin with	either "Yes" or	"No". If the input is judged to	be  spam,  the
       header "X-Spam-Flag: YES" is also inserted.
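
       A program that receives passthrough output can recover the verdict
       from the inserted header. The following is a minimal Python sketch,
       not part of bmf; the function name spam_verdict is hypothetical, and
       it relies only on the documented fact that the X-Spam-Status contents
       begin with "Yes" or "No".

```python
from email import message_from_string


def spam_verdict(raw_message):
    """Return True for spam, False for non-spam, None if never filtered."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None  # no header: the message did not pass through the filter
    # Only the leading "Yes"/"No" is relied on; the remaining processing
    # details in the header are deliberately ignored here.
    return status.lstrip().lower().startswith("yes")
```

       Checking X-Spam-Status rather than X-Spam-Flag lets the caller tell
       "judged non-spam" apart from "never filtered", because X-Spam-Flag is
       only present for spam.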

       -v Be more verbose. This	option is not well supported yet.

       -V Display version information.

       -h Display usage	information.

       bmf  treats its input as	a bag of tokens. Each token is checked against
       "good" and "bad"	wordlists, which maintain counts  of  the  numbers  of
       times  it  has  occurred	 in non-spam and spam mails. These numbers are
       used to compute the probability that a mail in which the	 token	occurs
       is spam.	After probabilities for	all input tokens have been computed, a
       fixed number of the probabilities that deviate  furthest	 from  average
       are combined using Bayes's theorem on conditional probabilities.
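
       The calculation described above can be sketched in Python as follows.
       This is an illustration under stated assumptions, not bmf's C code:
       the function names, the per-message frequency estimate, and the
       0.01/0.99 clamp are assumptions made for the example. Only the
       default of 15 keepers and the 0.5 midpoint come from this page.

```python
from math import prod


def token_spam_probability(good_count, bad_count, n_good_msgs, n_bad_msgs):
    """Estimate the probability that a mail containing the token is spam."""
    good_freq = good_count / max(n_good_msgs, 1)
    bad_freq = bad_count / max(n_bad_msgs, 1)
    if good_freq + bad_freq == 0:
        return 0.5  # unseen token: carries no evidence either way
    p = bad_freq / (good_freq + bad_freq)
    # Assumed clamp so one token can never force the result to 0 or 1.
    return min(0.99, max(0.01, p))


def combine_extrema(probabilities, keepers=15):
    """Combine the `keepers` probabilities that lie furthest from 0.5."""
    extrema = sorted(probabilities, key=lambda p: abs(p - 0.5), reverse=True)
    extrema = extrema[:keepers]
    spam = prod(extrema)
    ham = prod(1.0 - p for p in extrema)
    return spam / (spam + ham)
```

       Tokens with a probability near 0.5 say little about the message, which
       is why only the extrema are kept; the -k option changes how many.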

       While this method sounds crude compared to the more usual
       pattern-matching approach, it turns out to be extremely effective.
       Paul Graham's paper "A Plan for Spam" is recommended reading.

       bmf improves on Paul's proposal by doing	smarter	lexical	 analysis.  In
       particular,  hostnames  and IP addresses	are not	discarded, and certain
       types of	MTA information	are discarded (such as message ids and dates).
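
       The effect of these lexing choices can be illustrated with a small
       Python sketch. It is not bmf's lexer: the token pattern, the list of
       skipped headers, and the name tokenize are assumptions chosen to show
       hostnames and IP addresses surviving as whole tokens while message ids
       and dates are dropped.

```python
import re

# Headers treated as MTA bookkeeping in this sketch (assumed list).
SKIPPED_HEADERS = ("message-id:", "date:")

# A token is a run of letters/digits that may contain inner dots or hyphens,
# so "mail.example.com" and "192.0.2.7" each stay in one piece.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[.-][A-Za-z0-9]+)*")


def tokenize(message_text):
    """Return lowercase tokens, skipping Message-ID and Date header lines."""
    tokens = []
    in_headers = True
    for line in message_text.splitlines():
        if in_headers and line == "":
            in_headers = False  # blank line ends the header block
            continue
        if in_headers and line.lower().startswith(SKIPPED_HEADERS):
            continue
        tokens.extend(t.lower() for t in TOKEN_RE.findall(line))
    return tokens
```

       Like the lexer described on this page, the sketch does not recognize
       multiline (folded) headers.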

       MIME and	other attachments are not decoded.  Experience	from  watching
       the  token  streams suggests that spam with enclosures invariably gives
       itself away through cues	in the headers and non-enclosure parts.	 None-
       theless,	I would	like to	add the	ability	to decode quoted-printable and
       perhaps base64 encodings	for textual attachments.

       Please see the README for samples and suggestions.

       In passthrough mode, bmf exits with zero for success and nonzero for
       failure.

       In non-passthrough mode, bmf exits with 0 for spam, 1 for non-spam,
       and 2 for I/O or other errors.
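
       Because 0 means spam in non-passthrough mode, a caller should not
       treat the exit status as a plain success flag. A minimal Python sketch
       of a wrapper follows; the function names are hypothetical, and it
       assumes that bmf is on the PATH and that -t reports its verdict
       through the exit statuses listed above.

```python
import subprocess

# Exit statuses documented for non-passthrough mode.
VERDICTS = {0: "spam", 1: "non-spam"}


def interpret_exit_status(code):
    """Map a bmf exit status to a label; anything else counts as an error."""
    return VERDICTS.get(code, "error")


def classify_file(path, bmf="bmf"):
    """Run bmf in test mode on `path` without updating the word lists."""
    # -t and -i are the documented test-mode and input-file options.
    result = subprocess.run([bmf, "-t", "-i", path], stdout=subprocess.DEVNULL)
    return interpret_exit_status(result.returncode)
```

       In a shell pipeline the same inversion applies: "bmf && action" runs
       the action when the message is spam.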

	      List of good tokens for text mode.

	      List of bad tokens for text mode.

	      List of good tokens for libdb mode.

	      List of bad tokens for libdb mode.

       The lexer should	recognize multiline headers.

       The lexer should	recognize MIME attachments.

       Content-Transfer-Encoding is not	decoded.

       Tom Marshall <>.

       The  Bayes  algorithm  is from bogofilter by Eric S. Raymond <esr@thyr->. bogofilter can	be  found  at  the  bogofilter	project	 page:


