IFILE(1)			 User Commands			      IFILE(1)

       ifile - core executable for the ifile mail filtering system

       ifile  [-b  file] [-q|-Q] [-g] [-k] [-o]	[-v num] [lexing options] file
       ifile -c	-q|-Q [-T threshold] [-b file] [-g] [-k] [-o] [lexing options]
       file ...
       ifile  [-b  file]  [-d folder] [-i folder|-u folder] [-g] [-k] [-o] [-v
       num] [lexing options] file ...
       ifile -r	[-b file]

       ifile is a mail filter client that uses machine learning to classify
       e-mail into folders/mailboxes.  The algorithm that it uses is called
       Naive Bayes.  Naive Bayes treats each document as an unordered
       collection of words and classifies it by matching the document's word
       distribution against the most closely matching folder/mailbox
       distribution.
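
       The idea can be illustrated with a minimal Python sketch.  This is
       not ifile's implementation: the function names, the add-one smoothing
       and the omission of per-folder priors are all assumptions made for
       illustration.  It only shows how word counts per folder can be turned
       into log scores, with the highest (least negative) score winning.

```python
import math
from collections import Counter

def train(labeled_docs):
    """Build per-folder word counts from (folder, list_of_words) pairs."""
    counts = {}
    for folder, words in labeled_docs:
        counts.setdefault(folder, Counter()).update(words)
    return counts

def score(counts, words):
    """Return {folder: log-likelihood of words}, using add-one smoothing.

    Assumption: folder priors are ignored; a real filter may include them.
    """
    vocab = set()
    for folder_counts in counts.values():
        vocab.update(folder_counts)
    vocab_size = len(vocab)
    scores = {}
    for folder, folder_counts in counts.items():
        total = sum(folder_counts.values())
        scores[folder] = sum(
            math.log((folder_counts[w] + 1) / (total + vocab_size))
            for w in words
        )
    return scores

# Hypothetical training data: two folders, a handful of words each.
counts = train([
    ("spam", ["cheap", "pills", "cheap", "offer"]),
    ("friends", ["lunch", "tomorrow", "offer", "lunch"]),
])
scores = score(counts, ["cheap", "offer"])
best = max(scores, key=scores.get)
print(best)  # spam
```

       Word order never enters the calculation, which is what "unordered
       collection of words" means in practice.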

       -b, --db-file=file
	      Location to read/store ifile database.  Default is ~/.idata

       -c, --concise
	      Equivalent of "ifile -v 0 | head -1 | cut -f1 -d' '".  Must be
	      used with -q or -Q.

       -d, --delete=folder
	      Delete the statistics for each of the files from the category
	      folder

       -f, --folder-calcs=folder
	      Show the word-probability	calculations for folder

       -g, --log-file
	      Create and store debugging information in	~/.ifile.log

       -i, --insert=folder
	      Add the statistics for each of the files to the category folder

       -k, --keep-infrequent
	      Leave in the database words that	occur  infrequently  (normally
	      they are tossed)

       -l, --query-loocv=folder
	      For  each	 of  the  files, temporarily removes file from folder,
	      performs query and then reinserts	file in	folder.	  Database  is
	      not modified.

       -o, --occur
	      Uses  document  bit-vector representation.  Count	each word once
	      per document.

       -q, --query
	      Output rating scores for each of the files

       -Q, --query-insert
	      For each of the files, output rating scores and  add  statistics
	      for the folder with the highest score

       -T, --threshold=threshold
	      When used with both -c and -q, output the two highest-ranking
	      categories if their scores differ by at most threshold / 1000,
	      which can be used to detect border cases.  When used with -q
	      only and any threshold > 0, also output the score difference
	      as a percentage.  For example,
		     ifile -T1 -q foo.txt
	      might result in
		     spam -15570.48640776
		     non-spam -18728.00272369
		     diff[spam,non-spam](%) 9.21
	      If so, then
		     ifile -T93 -q -c foo.txt
	      will result in
		     foo.txt spam,non-spam
	      whereas
		     ifile -T92 -q -c foo.txt
	      will result in
		     foo.txt spam

       -r, --reset-data
	      Erases all currently stored information

       -u, --update=folder
	      Same as 'insert' except only adds	stats if folder	already	exists

       -v, --verbosity=num
	      Amount  of  output while running:	0=silent, 1=quiet, 2=progress,
	      3=verbose, 4=debug
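
       The numbers in the -T example above are consistent with the score
       difference being taken relative to the combined magnitude of the two
       scores.  The Python sketch below reproduces them under that
       assumption.  The formula is inferred from the example output, not
       taken from the ifile source, and the function names are hypothetical.

```python
def diff_percent(best, second):
    """Gap between two scores as a percentage of their combined magnitude.

    Assumption: this formula is inferred from the man page example.
    """
    return abs(best - second) / (abs(best) + abs(second)) * 100

def is_border_case(best, second, threshold):
    """True if the gap, as a fraction, is at most threshold / 1000."""
    return diff_percent(best, second) / 100 <= threshold / 1000

spam, non_spam = -15570.48640776, -18728.00272369
print(f"{diff_percent(spam, non_spam):.2f}")  # 9.21
print(is_border_case(spam, non_spam, 93))     # True:  both folders listed
print(is_border_case(spam, non_spam, 92))     # False: only the best folder
```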

       Lexing options:

       -a, --alpha-lexer
	      Lex words	as sequences of	alphabetic characters (default)

       -A, --alpha-only-lexer
	      Only lex space-separated character sequences which are  composed
	      entirely of alphabetic characters

       -h, --strip-header
	      Skip all of the header lines except Subject:, From: and To:

       -m, --max-length=char
	      Ignore  portion of message after first char characters.  Use en-
	      tire message if char set to 0.  Default is 50,000.

       -p, --print-tokens
	      Just tokenize and	print, don't do	any other  processing.	 Docu-
	      ments are	returned as a list of word, frequency pairs.

       -s, --no-stoplist
	      Do not throw out overly frequent (stoplist) words	when lexing

       -S, --stemming
	      Use 'Porter' stemming algorithm when lexing documents

       -w, --white-lexer
	      Lex words	as sequences of	space separated	characters

       If  no files are	specified on the command line, ifile will use standard
       input as	its message to process.
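
       The difference between the -a, -A and -w lexers described above can
       be sketched in Python as follows.  This is only an approximation of
       the documented behavior: the function names are hypothetical, and
       details such as case folding, stoplist filtering and non-ASCII
       handling are not modeled.

```python
import re

def alpha_lexer(text):
    """-a style: every maximal run of alphabetic characters is a word."""
    return re.findall(r"[A-Za-z]+", text)

def alpha_only_lexer(text):
    """-A style: keep whitespace-separated tokens made entirely of letters."""
    return [tok for tok in text.split() if tok.isalpha()]

def white_lexer(text):
    """-w style: every whitespace-separated token is a word."""
    return text.split()

sample = "Call 555-1234 now, e-mail bob@example.com"
print(alpha_lexer(sample))
# ['Call', 'now', 'e', 'mail', 'bob', 'example', 'com']
print(alpha_only_lexer(sample))
# ['Call']
print(white_lexer(sample))
# ['Call', '555-1234', 'now,', 'e-mail', 'bob@example.com']
```

       The alphabetic lexer splits "e-mail" into two words, while the
       alpha-only lexer discards it altogether because the token contains a
       non-alphabetic character.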

       -?, --help
	      Give this	help list

	      Give a short usage message

       -V, --version
	      Print program version

       Mandatory or optional arguments to long options are also	 mandatory  or
       optional	for any	corresponding short options.

	      ifile  database  (default	 location).  See FAQ included in ifile
	      package for description of database format.

       Jason  Rennie  <>  and  many  others.    See   the
       ChangeLog for the full list.

       Before  using  ifile,  you  need	 to train it.  Let's say that you have
       three folders, "spam", "ifile" and "friends", and the following	direc-
       tory structure:

		 |	    +--2
		 |	    +--3
		 |	    +--2
		 |	    +--3

       The following commands build the ifile database in ~/.idata (use the -b
       option to specify a different location for the database):

	      ifile -h -i spam /spam/*
	      ifile -h -i ifile	/ifile/*
	      ifile -h -i friends /friends/*

       The -h option strips off	headers	besides	"Subject:", "From:" and	"To:".
       I find that -h improves ifile's performance, but	you may	find otherwise
       for your	personal collection.

       Note that we have made the argument to -i the same as the corresponding
       folder name.  This is not necessary.  The argument to -i can be any word
       you want to use to identify a category of e-mails.  The argument to -i
       must not include whitespace characters (space, tab, line feed, etc.).

       At this point, your ~/.idata file should	look something like this:

	      spam ifile friends
	      662 1020 6451
	      3	3 3
	      jrennie 9	0:3 1:18 2:16
	      mindspring 6 1:7 2:5
	      make 9 0:5 1:3
	      yahoo 9 0:1 1:22 2:2

       The  first  line	is the space-separated list of folders.	Their ordering
       specifies a numbering (spam=0, ifile=1, friends=2). The second line  is
       a  token	 count	for each folder	(e.g. 662 tokens observed in the three
       spam messages). The third line is an e-mail count for each folder (e.g.
       3  e-mails  for	each  of spam, ifile and friends). Each	following line
       specifies statistics for	a word.	The format of a	line is

	      word age folder:count [folder:count ...]

       where folder is the folder number determined by the first  line	order-
       ing.  Folders  with a count of zero are not listed. So, the line	begin-
       ning with "jrennie" indicates that "jrennie" appeared 3 times in	"spam"
       e-mails,	18 times in "ifile" e-mails and	16 times in "friends" e-mails.
       The age is the number of	e-mails	that have  been	 processed  since  the
       word  was  added	to the database. Very infrequent words are pruned from
       the database to keep the	database size down.
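
       The format described above is simple enough to read with a few lines
       of Python.  The sketch below parses the sample database shown
       earlier.  It assumes the file contains exactly the three header lines
       and the word lines described here; the function name and the shape of
       the returned dictionary are hypothetical.

```python
def parse_idata(text):
    """Parse ifile database text into a dictionary keyed by folder name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    folders = lines[0].split()                       # line 1: folder names
    token_counts = [int(n) for n in lines[1].split()]  # line 2: tokens
    mail_counts = [int(n) for n in lines[2].split()]   # line 3: e-mails
    words = {}
    for ln in lines[3:]:
        # word age folder:count [folder:count ...]
        word, age, *pairs = ln.split()
        counts = {}
        for pair in pairs:
            index, count = pair.split(":")
            counts[folders[int(index)]] = int(count)
        words[word] = {"age": int(age), "counts": counts}
    return {
        "folders": folders,
        "tokens": dict(zip(folders, token_counts)),
        "mails": dict(zip(folders, mail_counts)),
        "words": words,
    }

sample = """\
spam ifile friends
662 1020 6451
3 3 3
jrennie 9 0:3 1:18 2:16
mindspring 6 1:7 2:5
make 9 0:5 1:3
yahoo 9 0:1 1:22 2:2
"""
db = parse_idata(sample)
print(db["words"]["jrennie"]["counts"])
# {'spam': 3, 'ifile': 18, 'friends': 16}
print(db["words"]["mindspring"]["counts"])
# {'ifile': 7, 'friends': 5}
```

       Folders with a count of zero are simply absent from a word's counts,
       mirroring the file format.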

       Now that	you have a database, you might want to	filter	some  e-mails.
       Say you have the	following incoming e-mails:


       To find out what	folders	ifile thinks these e-mails belong in, run

	      ifile -c -q /inbox/1
	      ifile -c -q /inbox/2
	      ifile -c -q /inbox/3

       Let's  say that 1 is about ifile, 2 is spam and 3 is from a friend. As-
       suming ifile does its job correctly, you'll see output like this:

	      /inbox/1 ifile
	      /inbox/2 spam
	      /inbox/3 friends

       With so little training data, ifile is unlikely to get the labels
       correct, but you should get the idea :-)
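
       A script that acts on the concise output shown above has to split
       each line into a file name and one or more folder names.  The Python
       sketch below does this; the function name is hypothetical.  It
       assumes each line ends with the folder names, comma-separated when -T
       reports a border case.  Since folder names cannot contain whitespace,
       splitting on the last space also copes with file names that contain
       spaces.

```python
def parse_concise(line):
    """Split one line of `ifile -c -q` output into (file, [folders])."""
    path, _, labels = line.rstrip("\n").rpartition(" ")
    return path, labels.split(",")

print(parse_concise("/inbox/1 ifile"))
# ('/inbox/1', ['ifile'])
print(parse_concise("foo.txt spam,non-spam"))
# ('foo.txt', ['spam', 'non-spam'])
```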

       Now,  if	you move the e-mails to	the folders suggested by ifile,	you'll
       want to update the database accordingly.	You can	do this	 with  the  -i
       option,	like  before.  Or, you can simply use -Q in place of -q	above.
       This automatically adds the e-mail to the folder	ifile suggests.

       Now, assume for a moment that e-mail 1 was actually spam.  We've added
       e-mail 1 to the ifile category and put it in the ifile folder.  We need
       to move it to the spam folder and update the ifile database
       accordingly.  We can update the database with the following command:

	      ifile -d ifile -i	spam /inbox/1

       This deletes the	e-mail from "ifile" and	adds it	to "spam".

       Examples	 of how	to use ifile together with procmail(1) and metamail(1)
       can be found in the directory /usr/share/doc/ifile/examples.

ifile 1.3.4			 November 2004			      IFILE(1)

