RSS(1)			    General Commands Manual			RSS(1)

       prss - test a protein sequence similarity for significance

       prss  [-Q  -f  #	 -g # -h -O file -s SMATRIX -w # ] sequence-file-1 se-
       quence-file-2 [ #-of-shuffles ]

       prss [-fghsw] - interactive mode

       prss is used to evaluate	the significance of a protein  sequence	 simi-
       larity  score  by comparing two sequences and calculating optimal simi-
       larity scores, and then repeatedly shuffling the	second	sequence,  and
       calculating  optimal  similarity	 scores	using the Smith-Waterman algo-
       rithm. An extreme value distribution is then fit	 to  the  shuffled-se-
       quence scores.  The characteristic parameters of	the extreme value dis-
       tribution are then used to estimate the probability that	 each  of  the
       unshuffled sequence scores would	be obtained by chance in one sequence,
       or in a number of sequences equal to the	number of shuffles.  This pro-
       gram  is	 derived from rdf2, which was described	by Pearson and Lipman,
       PNAS (1988) 85:2444-2448, and Pearson (Meth. Enz.  183:63-98).  Use  of
       the extreme value distribution for estimating the probabilities of sim-
       ilarity  scores  was  described  by  Karlin  and Altschul, PNAS (1990)
       87:2264-2268.  The 'z-values' calculated by rdf2 are not as informative
       as the P-values and expectations calculated by prdf.  prss  calculates
       optimal  scores using the same rigorous Smith-Waterman algorithm (Smith
       and Waterman, J. Mol. Biol. (1981) 147:195-197) used  by  the  ssearch
       program.
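
       The fitting step described above can be illustrated  with  a  short
       Python  sketch.   It is not the code used by prss: the curve fitting
       in prss is done by the routines in rweibull.c, and the sketch swaps
       in  a  simple  method-of-moments  fit  of  a  Gumbel (extreme value)
       distribution instead.  The function  names  fit_gumbel,  p_value  and
       expectation are hypothetical and chosen only for this illustration.

```python
import math
import statistics

# Euler-Mascheroni constant, used by the method-of-moments estimate.
EULER_GAMMA = 0.5772156649015329


def fit_gumbel(scores):
    """Estimate Gumbel location (mu) and scale (beta) from shuffled scores.

    Assumption: method of moments, not the fit performed by prss itself.
    Requires at least two scores that are not all identical.
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    beta = sd * math.sqrt(6) / math.pi   # variance = (pi * beta)^2 / 6
    mu = mean - EULER_GAMMA * beta       # mean = mu + gamma * beta
    return mu, beta


def p_value(score, mu, beta):
    """Probability of a score >= `score` by chance in one comparison."""
    z = (score - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 to keep precision for small values.
    return -math.expm1(-math.exp(-z))


def expectation(score, mu, beta, n):
    """Expected number of chance scores >= `score` in n comparisons."""
    return n * p_value(score, mu, beta)
```

       With the scores from the shuffled comparisons in a  list,  fit_gumbel
       gives  the  two  parameters,  and p_value and expectation correspond
       to the "one sequence" and "number of sequences equal to  the  number
       of shuffles" estimates described above.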

       prss also allows	a more sophisticated shuffling method: residues	can be
       shuffled	within a local window, so that the  order  of  residues	 1-10,
       11-20, etc, is destroyed	but a residue in the first 10 is never swapped
       with a residue outside the first	ten, and so on for each	local window.
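
       The  local  shuffle  described  above  can  be sketched in Python as
       follows.  This is only an illustration of  the  idea,  not  the  prss
       implementation;  the  function  name  window_shuffle and its optional
       rng parameter are hypothetical.

```python
import random


def window_shuffle(seq, window, rng=None):
    """Shuffle residues only within consecutive blocks of `window` residues.

    Residues 1..window are permuted among themselves, then the next block,
    and so on; the last block may be shorter than `window`.
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    rng = rng or random.Random()
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)          # in-place shuffle of this block only
        out.extend(block)
    return "".join(out)
```

       Because each block is permuted  independently,  the  composition  of
       every  window  is  preserved while the order of residues inside it is
       destroyed.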

       (1)    prss  -w 10 musplfm.aa lcbo.aa

       Compare the amino acid sequence in the file  musplfm.aa	with  that  in
       lcbo.aa,	 then  shuffle	lcbo.aa	100 times using	a local	shuffle	with a
       window of 10.  Report the significance of the  unshuffled  musplfm/lcbo
       comparison scores with respect to the shuffled scores.

       (2)    prss musplfm.aa lcbo.aa

       Compare	the  amino  acid  sequence in the file musplfm.aa with the se-
       quences in the file lcbo.aa.

       (3)    prss

       Run prss in interactive mode.  The program will prompt for  the  names
       of  the two sequence files and the number of shuffles to be used.  100
       shuffles are calculated by default; 250 - 500 shuffles should  provide
       more accurate probability estimates.

       prss  can  be directed to change the scoring matrix, gap penalties, and
       shuffle parameters by entering options on the command  line  (preceded
       by  a  `-').  All  of the options should precede the file names and the
       number of shuffles.

       -f #   Penalty for the first residue in a gap (-12 by default).

       -g #   Penalty for additional residues in a gap (-2 by default).

       -h     Do not display histogram of similarity scores.

       -Q -q  "quiet" -	do not prompt for filename.

       -O filename
	      send copy	of results to "filename."

       -s str (SMATRIX)	the filename of	an alternative	scoring	 matrix	 file.
	      For  protein  sequences, BLOSUM50	is used	by default; PAM250 can
	      be  used	with  the  command  line  option  -s  250(or  with  -s
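
       As a small worked example of the -f and  -g  penalties  listed  above,
       the  following  Python sketch computes the score contribution of a gap
       of a given length.  It assumes that a gap is charged the -f value  for
       its first residue and the -g value for each additional residue, as the
       option descriptions state; the function name gap_score is hypothetical.

```python
def gap_score(length, first=-12, extend=-2):
    """Score contribution of a gap spanning `length` residues.

    `first` mirrors the -f option (first residue in a gap) and `extend`
    mirrors the -g option (each additional residue); the defaults are the
    defaults given in the option list.
    """
    if length <= 0:
        return 0                      # no gap, no penalty
    return first + extend * (length - 1)
```

       With the default values, a gap of one residue contributes -12  and  a
       gap of three residues contributes -12 + 2 * (-2) = -16.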

       ssearch(1), prdf(1), fasta(1), lfasta(1), protcodes(5)

       Bill Pearson

       The  curve  fitting routines in rweibull.c were provided	by Phil	Green,
       Washington U., St. Louis.
