RDF2(1)			    General Commands Manual		       RDF2(1)

       prdf - test a protein sequence similarity for significance

       prdf [-f # -g # -h -k # -O filename -s SMATRIX -w window-size ]
       sequence-file-1 sequence-file-2 [ ktup ] [ #-of-shuffles ]

       prdf [-fghks] - interactive mode

       prdf evaluates the significance of a protein sequence similarity score.
       It compares two sequences and calculates initial and optimized
       similarity scores, then repeatedly shuffles the second sequence and
       recalculates the initial and optimized scores for each shuffle.
       Extreme value distributions are then fit to each of the three
       distributions of scores. The characteristic parameters of the extreme
       value distribution are then used to estimate the probability that each
       of the unshuffled sequence scores would be obtained by chance in one
       sequence, or in a number of sequences equal to the number of shuffles.
       This program is derived from rdf2, which was described by Pearson and
       Lipman, PNAS (1988) 85:2444-2448, and Pearson (Meth. Enz. 183:63-98).
       Use of the extreme value distribution for estimating the probabilities
       of similarity scores was described by Altschul and Karlin, PNAS (1990)
       87:2264-2268. The 'z-values' calculated by rdf2 are not as informative
       as the P-values and expectations calculated by prdf.
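
       The estimate described above can be illustrated with a minimal
       sketch. It assumes a Gumbel (type I extreme value) distribution and
       fits it by the method of moments, which is a simpler technique
       swapped in for illustration; it is not the fitting code used by prdf
       (see rweibull.c). The function names are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate Gumbel location and scale from shuffled scores.

    Method-of-moments fit (an assumption for this sketch, not prdf's method):
    scale = sd * sqrt(6) / pi, location = mean - gamma * scale.
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    scale = sd * math.sqrt(6.0) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def tail_probability(score, loc, scale):
    """P(S >= score) for a single comparison under the fitted Gumbel."""
    # 1 - exp(-exp(-z)), written with expm1 to keep precision for tiny values
    z = (score - loc) / scale
    return -math.expm1(-math.exp(-z))


def probability_in_n(p, n):
    """Probability of seeing at least one such score in n comparisons."""
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)**n, computed stably for very small p
    return -math.expm1(n * math.log1p(-p))


# Made-up shuffled scores, for illustration only
shuffled = [30, 32, 35, 31, 29, 40, 33, 34, 36, 28]
loc, scale = fit_gumbel_moments(shuffled)
p_one = tail_probability(100, loc, scale)
p_many = probability_in_n(p_one, 250)
```

       Here p_one corresponds to the chance of the score in one sequence,
       and p_many to the chance in a number of sequences equal to the number
       of shuffles (250 in this sketch).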

       prdf also allows a more sophisticated shuffling method: residues can be
       shuffled within a local window, so that the order of residues 1-10,
       11-20, etc., is destroyed, but a residue in the first 10 is never
       swapped with a residue outside the first 10, and so on for each local
       window.
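
       A local window shuffle of this kind might look like the following
       sketch. The function name and the handling of a short final window
       are assumptions made for illustration; prdf's own implementation may
       differ.

```python
import random


def window_shuffle(seq, window, rng=None):
    """Shuffle residues only within consecutive blocks of `window` residues."""
    rng = rng or random.Random()
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)  # residues never leave their own block
        out.extend(block)
    return "".join(out)


# A window of 10, as in "prdf -w 10"; the sequence is made up
original = "ACDEFGHIKLMNPQRSTVWYACD"
shuffled = window_shuffle(original, 10, random.Random(0))
```

       Each block of ten residues keeps its composition and position in the
       sequence; only the order inside the block changes.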

       (1)    prdf -w 10 musplfm.aa lcbo.aa 1 250

       Compare	the  amino  acid  sequence in the file musplfm.aa with that in
       lcbo.aa,	then shuffle lcbo.aa 250 times using a local  shuffle  with  a
       window  of 10 and calculate initial and optimized similarity scores us-
       ing Ktup	= 1.  Report the significance of the  unshuffled  musplfm/lcbo
       comparison scores with respect to the shuffled scores.

       (2)    prdf musplfm.aa lcbo.aa 2

       Compare	the  amino  acid  sequence in the file musplfm.aa with the se-
       quences in the file lcbo.aa using ktup =	2.

       (3)    prdf

       Run prdf in interactive mode. The program will prompt for the file
       names of the two query sequence files, the ktup, and the number of
       shuffles to be used. 100 shuffles are calculated by default; 250-500
       shuffles should provide more accurate probability estimates.

       prdf can be directed to change the scoring matrix, gap penalties, and
       shuffle parameters by entering options on the command line (preceded
       by a `-'). All of the options should precede the file names number of

       -f #   Penalty for the first residue in a gap (-12 by default).

       -g #   Penalty for additional residues in a gap (-2 by default).

       -h     Do not display histogram of similarity scores.

       -k #   (GAPCUT) Sets the	threshold for joining the initial regions  for
	      calculating the initn score.

       -Q -q  "quiet" -	do not prompt for filename.

       -O filename
              Send a copy of results to "filename".

       -s str (SMATRIX) the filename of an alternative scoring matrix file.
              For protein sequences, BLOSUM50 is used by default; PAM250 can
              be used with the command line option -s 250 (or with -s
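
       As a worked example of the -f and -g options listed above, the total
       score for a gap can be sketched as the first-residue penalty plus the
       additional-residue penalty for each further residue. The function
       name is hypothetical, and the formula is an assumption based on the
       option descriptions rather than on the prdf source.

```python
def gap_score(length, first=-12, extend=-2):
    """Score contribution of a gap spanning `length` residues.

    `first` mirrors -f (first residue in a gap) and `extend` mirrors -g
    (each additional residue); the defaults are those given for prdf.
    """
    if length <= 0:
        return 0
    return first + (length - 1) * extend


one_residue_gap = gap_score(1)    # -12
three_residue_gap = gap_score(3)  # -12 + 2 * (-2) = -16
```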


       Bill Pearson

       The curve fitting routines in rweibull.c	were provided by  Phil	Green,
       Washington U., St. Louis.

				     local			       RDF2(1)

