SSEARCH(1)		    General Commands Manual		    SSEARCH(1)

       ssearch - scan a	protein	or DNA sequence	library	for similar sequences

       ssearch	[-a -b # -d # -E # -f #	-g # -h	-i -l FASTLIBS	-L -r STATFILE
       -m # -O filename	-Q -s SMATRIX -w # -z ]	 query-sequence-file  library-

       ssearch [-QabdEfghilmOrswz] query-file @library-name-file

       ssearch [-QabdEfghilmOrswz] query-file "%PRMVI"

       ssearch [-aEfghilmrsw] -	interactive mode

       ssearch compares a protein or DNA sequence to all of the entries in a
       sequence library using the rigorous Smith-Waterman algorithm (Smith and
       Waterman, J. Mol. Biol. (1983) 147:195-197).  For example, ssearch can
       compare a protein sequence to all of the sequences in the NBRF PIR
       protein sequence database.  ssearch automatically decides whether the
       query sequence is DNA or protein by reading the query sequence as
       protein and determining whether the `amino-acid composition' is more
       than 85% A+C+G+T.  The program can be invoked either with command line
       arguments or in interactive mode.

       ssearch compares a query sequence to a sequence library, which consists
       of sequence data interspersed with comments (see below).  The fasta
       programs, including ssearch, use a standard text format sequence file.
       Lines beginning with '>' are treated as comments; sequences may be in
       upper or lower case, and blanks, tabs and unrecognizable characters are
       ignored.  ssearch expects sequences to use the single letter amino acid
       codes; see protcodes(5).  Library files for ssearch should have the
       form shown below.
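
       The DNA/protein decision described above can be illustrated with a
       short sketch.  This is not ssearch's own code: the function name
       looks_like_dna and the choice to count only alphabetic characters are
       assumptions made for illustration; only the 85% A+C+G+T threshold
       comes from this page.

```python
def looks_like_dna(sequence: str, threshold: float = 0.85) -> bool:
    """Guess whether a sequence is DNA from its letter composition.

    Hypothetical helper: returns True when more than `threshold` of the
    alphabetic characters are A, C, G or T (case-insensitive).
    """
    # Assumption: non-letters (blanks, tabs, digits) do not count.
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        return False
    acgt = sum(1 for ch in letters if ch in "ACGT")
    return acgt / len(letters) > threshold


if __name__ == "__main__":
    for seq in ("ACGTTGCAACGTAGCT", "WILLLSQNLVK"):
        print(seq, "->", "DNA" if looks_like_dna(seq) else "protein")
```

       A protein that happens to be rich in A, C, G and T residues would be
       misclassified by any composition test of this kind, which is worth
       keeping in mind for very short queries.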

       ssearch can be directed to change the scoring matrix, search
       parameters, output format, and default search directories by entering
       options on the command line (preceded by a `-').  All of the options
       should precede the file name and ktup arguments.  Alternatively, these
       options can be changed by setting environment variables.  The options
       and environment variables are:

       -a     (SHOWALL) Modifies the display of the two sequences in
              alignments.  Normally, both sequences are shown only where they
              overlap (SHOWALL=0); if -a is given or the environment variable
              SHOWALL=1, both sequences are shown in their entirety.

       -b #   The  number  of similarity scores	to be shown when the -Q	option
	      is used.	This value is usually calculated based on  the	actual

       -d #   The  number  of alignments to be shown.  Normally, ssearch shows
	      the same number of alignments as similarity  scores.   By	 using
	      ssearch  -Q  -b 200 -d 50, one would see the top scoring 200 se-
	      quences and alignments for the 50	best scores.

       -E #   The expectation value threshold for displaying similarity scores
              and sequence alignments.  ssearch -Q -E 2.0 would show all
              library sequences with scores expected to occur no more than 2
              times by chance in a search of the library.

       -f #   Penalty for the first residue in a gap (-12 by default).

       -g #   Penalty for additional residues in a gap (-2 by default).

       -h     Do not display histogram of similarity scores.

       -l file
	      (FASTLIBS)  The  name  of	 the library menu file.	 Normally this
	      will be determined by the	environment variable  FASTLIBS.	  How-
	      ever, a library menu file	can also be specified with -l.

       -L     display  more  information  about	 the  library  sequence	in the

       -m #   (MARKX) =0,1,2,3.  Alternate display of matches and mismatches
              in alignments.  MARKX=0 uses ":", "." and " " for identities,
              conservative replacements, and non-conservative replacements,
              respectively.  MARKX=1 uses " ", "x" and "X".  MARKX=2 does not
              show the second sequence, but uses the second alignment line to
              display matches with a "." for identity, or with the mismatched
              residue for mismatches.  MARKX=2 is useful for aligning large
              numbers of similar sequences.  MARKX=3 writes out a file of
              library sequences in FASTA format.  MARKX=3 should always be
              used with the "SHOWALL" (-a) option, but this does not
              completely ensure that all of the sequences output will be
              aligned.

       -O filename
	      Sends copy of results to "filename".

       -Q     Quiet option.  This allows ssearch to search a database and
              report the results without asking any questions.  ssearch -Q
              file library > output can be put in the background or run at a
              later time with the Unix 'at' command.  The number of
              similarity scores and alignments displayed with the -Q option
              can be modified with the -b (scores) and -d (alignments)
              options.

       -r STATFILE
              Causes ssearch to write out the sequence identifier,
	      superfamily  number  (if	available),  and  similarity scores to
	      STATFILE for every sequence in the library.  These  results  are
	      not sorted.

       -s str (SMATRIX) The filename of an alternative scoring matrix file.
	      For protein sequences, BLOSUM50 is used by default;  PAM250  can
	      be used with the command line option -s 250.

       -w #   (LINLEN) Output line length for sequence alignments (normally
              60; can be set up to 200).

       -z     Do not do	statistical significance calculation.
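
       To make the -f and -g penalties concrete: with the defaults listed
       above, a gap of k residues costs -12 for its first residue and -2 for
       each additional one, i.e. -12 - 2(k-1).  The sketch below computes a
       Smith-Waterman local score with that gap model.  It is an
       illustration only, not ssearch's implementation; the function name
       and the simple match/mismatch scores (+5/-4) are assumptions standing
       in for a real matrix such as BLOSUM50.

```python
def local_score(a: str, b: str, match: int = 5, mismatch: int = -4,
                gap_first: int = -12, gap_next: int = -2) -> int:
    """Best local alignment score of a and b with affine gap penalties.

    Hypothetical sketch: gap_first/gap_next mirror the -f/-g defaults;
    match/mismatch are placeholder scores, not a real scoring matrix.
    """
    neg_inf = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols          # best score ending at (i-1, j)
    f_prev = [neg_inf] * cols    # best score ending in a gap in b, row i-1
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf              # best score ending in a gap in a, this row
        for j in range(1, cols):
            # Open a new gap or extend an existing one, in each direction.
            e = max(h_cur[j - 1] + gap_first, e + gap_next)
            f_cur[j] = max(h_prev[j] + gap_first, f_prev[j] + gap_next)
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell's score never drops below zero.
            h_cur[j] = max(0, h_prev[j - 1] + pair, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


if __name__ == "__main__":
    print(local_score("ACDEFGHIKL", "ACDEFGHIKL"))
    print(local_score("ACDEFGHIKL", "ACDEFWWGHIKL"))
```

       Every residue of the query is compared against every residue of each
       library sequence, so the running time grows with the product of the
       two lengths.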

       (1)    ssearch musplfm.aa $AABANK

       Compare the amino acid sequence in the file musplfm.aa  with  the  com-
       plete  PIR protein sequence library.  This is extremely slow and	should
       almost never be done.  ssearch is designed to  search  very  small  li-
       braries of sequences.

	    >LCBO bovine preprolactin
	    WILLLSQ ...
	    >LCHU human	...

       (2)    ssearch -a -w 80 musplfm.aa lcbo.aa

       Compare	the  amino  acid  sequence in the file musplfm.aa with the se-
       quences in the file lcbo.aa using ktup =	1.   Show  both	 sequences  in
       their entirety, with 80 residues	on each	output line.

       (3)    ssearch

       Run the ssearch program in interactive mode.  The program will prompt
       for the file name for the query sequence, list alternative libraries to
       be searched (if FASTLIBS is set), and prompt for the ktup.

       You can use your own sequence files for ssearch; just be certain to put
       a '>' and comment as the first line before the sequence.
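
       As a rough illustration of that file layout, the sketch below reads
       entries that each start with a '>' comment line followed by sequence
       lines.  The function name read_fasta, the sample entries and the
       decision to upper-case residues are assumptions for illustration;
       this is not the parser used by ssearch.

```python
import io


def read_fasta(handle):
    """Yield (comment, sequence) pairs from a FASTA-style text stream.

    Hypothetical reader: a line starting with '>' begins a new entry;
    in sequence lines only letters are kept, so blanks, tabs and other
    characters are dropped.
    """
    comment, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if comment is not None:
                yield comment, "".join(chunks)
            comment, chunks = line[1:].strip(), []
        elif comment is not None:
            # Assumption: case is not significant, so normalize to upper.
            chunks.append("".join(ch for ch in line if ch.isalpha()).upper())
    if comment is not None:
        yield comment, "".join(chunks)


if __name__ == "__main__":
    # Invented sample data, not taken from any real library.
    sample = io.StringIO(
        ">SEQ1 hypothetical first entry\n"
        "MKT AYI\n"
        "akqr\n"
        ">SEQ2 hypothetical second entry\n"
        "GGH\tLLV\n"
    )
    for comment, sequence in read_fasta(sample):
        print(comment, sequence)
```

       Text that appears before the first '>' line is skipped by this
       sketch rather than treated as sequence data.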

       rss(1), align(1), fasta(1), rdf2(1), protcodes(5), dnacodes(5)

       Bill Pearson

				     local			    SSEARCH(1)

