SENSEIDX(5WN)            WordNet™ File Formats            SENSEIDX(5WN)

       index.sense, sense.idx - WordNet's sense index

       The WordNet sense index provides an alternate method for accessing
       synsets and word senses in the WordNet database. It is useful to
       applications that retrieve synsets or other information related to a
       specific sense in WordNet, rather than all the senses of a word or
       collocation. It can also be used with tools like grep and Perl to find
       all senses of a word in one or more parts of speech. A specific WordNet
       sense, encoded as a sense_key, can be used as an index into this file
       to obtain its WordNet sense number, the database byte offset of the
       synset containing the sense, and the number of times it has been tagged
       in the semantic concordance texts.
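       As a rough C illustration of the grep-style use described above, the
       sketch below scans index.sense for lines whose sense_key begins with
       lemma%, optionally narrowed to one part of speech by also matching the
       ss_type digit that follows the %. The function name count_senses is
       hypothetical (it is not part of the WordNet library), it only counts
       matches, and it assumes every line fits in a 1024-byte buffer.

```c
#include <stdio.h>
#include <string.h>

/* Count index.sense lines whose sense_key starts with "lemma%".
 * If ss_type > 0, also require that ss_type digit right after '%';
 * pass 0 to accept every part of speech.
 * Assumes each line fits in the 1024-byte buffer. */
int count_senses(FILE *fp, const char *lemma, int ss_type)
{
    char prefix[256];
    char line[1024];
    int n = 0;

    if (ss_type > 0)
        snprintf(prefix, sizeof prefix, "%s%%%d:", lemma, ss_type);
    else
        snprintf(prefix, sizeof prefix, "%s%%", lemma);

    size_t plen = strlen(prefix);
    rewind(fp);                        /* always scan from the top */
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, prefix, plen) == 0)
            n++;
    }
    return n;
}
```

       Including the % in the prefix keeps a search for bank from also
       matching banker; count_senses(fp, "bank", 2) would count only the verb
       senses.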

       Concatenating the lemma and lex_sense fields of a semantically tagged
       word (represented in a <wf ... > attribute/value pair) in a semantic
       concordance file, using % as the concatenation character, creates the
       sense_key for that sense, which can in turn be used to search the sense
       index file.
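       A minimal sketch of that concatenation in C, assuming the lemma and
       lex_sense strings have already been extracted from the <wf ... > tag.
       make_sense_key is a hypothetical helper name.

```c
#include <stdio.h>

/* Form a sense_key by joining lemma and lex_sense with '%'.
 * Returns 0 on success, or -1 if buf is too small for the result. */
int make_sense_key(char *buf, size_t size, const char *lemma,
                   const char *lex_sense)
{
    int n = snprintf(buf, size, "%s%%%s", lemma, lex_sense);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

       For example, make_sense_key(key, sizeof key, "bank", "1:14:00::")
       stores "bank%1:14:00::" in key. The %% in the format string emits a
       literal % character.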

       A sense_key is the best way to represent a sense in semantic tagging or
       other systems that refer to WordNet senses. sense_keys are independent
       of WordNet sense numbers and synset_offsets, which vary between
       versions of the database. Using the sense index and a sense_key, the
       corresponding synset (via the synset_offset) and WordNet sense number
       can easily be obtained. A mapping from noun sense_keys in WordNet 1.6
       to corresponding 2.0 sense_keys is provided with version 2.0, and is
       described in sensemap(5WN).

       See wndb(5WN) for a thorough discussion of the WordNet database files.

   File Format
       The sense index file lists all of the senses in the WordNet database,
       with each line representing one sense. The file is in alphabetical
       order, fields are separated by one space, and each line is terminated
       with a newline character.

       Each line is of the form:

              sense_key synset_offset sense_number tag_cnt

       sense_key is an encoding of the word sense. Programs can construct a
       sense_key in this format and use it as a binary search key into the
       sense index file. The format of a sense_key is described below.
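       Because the file is sorted, a lookup can be a binary search. The sketch
       below is a simplified, in-memory variant using bsearch(3) over an array
       of lines that has already been read into memory. It assumes the lines
       are in plain byte (C locale) order and compares the search key only
       against the first field of each line. find_sense and cmp_key_line are
       hypothetical names; the WordNet library's binsrch(3WN) functions
       operate on the open file instead.

```c
#include <stdlib.h>
#include <string.h>

/* bsearch comparator: 'key' is the sense_key being looked up, 'elem'
 * points to one line of index.sense. Only the first field is compared. */
static int cmp_key_line(const void *key, const void *elem)
{
    const char *k = key;
    const char *line = *(const char *const *)elem;
    size_t n = strcspn(line, " ");     /* length of the sense_key field */
    int c = strncmp(k, line, n);

    if (c != 0)
        return c;
    return k[n] == '\0' ? 0 : 1;       /* a longer key sorts after the field */
}

/* Return the matching line, or NULL if sense_key is not present.
 * 'lines' must be sorted in byte order, as index.sense is assumed to be. */
const char *find_sense(const char *const *lines, size_t count,
                       const char *sense_key)
{
    const char *const *hit =
        bsearch(sense_key, lines, count, sizeof *lines, cmp_key_line);
    return hit != NULL ? *hit : NULL;
}
```

       Comparing only up to the first space ensures that a key matches a whole
       sense_key field, never just a prefix of one.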

       synset_offset is the byte offset at which the synset containing the
       sense is found in the database "data" file corresponding to the part
       of speech encoded in the sense_key. synset_offset is an 8 digit,
       zero-filled decimal integer, and can be used with fseek(3) to read a
       synset from the data file. When it is passed to the WordNet library
       function read_synset() along with the syntactic category, a data
       structure containing the parsed synset is returned.
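       Here is a sketch of reading a synset line with plain stdio instead of
       read_synset(). It assumes the data file for the sense's part of speech
       is already open and that the offset field has been taken from
       index.sense. The offset is parsed in base 10 because the field is
       zero-filled, and strtol(3) with base 0 would read it as octal.
       read_synset_line and parse_offset are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Convert the 8 digit, zero-filled synset_offset field to a number.
 * Base 10 is required: base 0 would treat the leading zero as octal. */
long parse_offset(const char *field)
{
    return strtol(field, NULL, 10);
}

/* Seek to 'offset' in an open data file and read the synset line found
 * there into buf. Returns buf, or NULL on a seek or read failure.
 * At most size - 1 bytes of the line are read. */
char *read_synset_line(FILE *data, long offset, char *buf, int size)
{
    if (fseek(data, offset, SEEK_SET) != 0)
        return NULL;
    return fgets(buf, size, data);
}
```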

       sense_number is a decimal integer indicating the sense number of the
       word, within the part of speech encoded in sense_key, in the WordNet
       database. See wndb(5WN) for information about how sense numbers are
       assigned.

       tag_cnt represents the decimal number of times the sense is tagged in
       various semantic concordance texts. A tag_cnt of 0 indicates that the
       sense has not been semantically tagged.
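       A complete line can be parsed into these four fields with sscanf(3).
       This sketch assumes each sense_key is shorter than 256 bytes, and the
       struct and function names are hypothetical. The %ld and %d conversions
       read decimal, so the zero-filled offset is handled correctly; %i would
       treat it as octal.

```c
#include <stdio.h>

/* One parsed line of index.sense. */
struct sense_entry {
    char sense_key[256];
    long synset_offset;   /* byte offset into the matching data file */
    int sense_number;     /* sense number within the part of speech */
    int tag_cnt;          /* times tagged in the semantic concordances */
};

/* Parse "sense_key synset_offset sense_number tag_cnt".
 * Returns 1 if all four fields were read, 0 otherwise. */
int parse_sense_line(const char *line, struct sense_entry *e)
{
    return sscanf(line, "%255s %ld %d %d", e->sense_key, &e->synset_offset,
                  &e->sense_number, &e->tag_cnt) == 4;
}
```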

   Sense Key Encoding
       A sense_key is represented as:

              lemma%lex_sense

       where lex_sense is encoded as:

              ss_type:lex_filenum:lex_id:head_word:head_id

       lemma is the ASCII text of the word or collocation as found in the
       WordNet database index file corresponding to pos. lemma is in lower
       case, and collocations are formed by joining individual words with an
       underscore (_) character.
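       These lemma rules can be sketched as an in-place normalization that
       lower-cases every character and joins words with a single underscore.
       The sketch assumes single-byte ASCII input and treats any run of
       whitespace as one word boundary, dropping leading and trailing
       whitespace. normalize_lemma is a hypothetical name.

```c
#include <ctype.h>

/* Rewrite a word or collocation in place into sense_key lemma form:
 * letters are lower-cased and each run of whitespace between words
 * becomes one underscore. Leading/trailing whitespace is dropped. */
void normalize_lemma(char *s)
{
    char *out = s;          /* write position; never passes the read position */
    int pending_sep = 0;    /* whitespace seen after at least one output char */

    for (const char *p = s; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) {
            pending_sep = (out != s);
        } else {
            if (pending_sep)
                *out++ = '_';
            *out++ = (char)tolower((unsigned char)*p);
            pending_sep = 0;
        }
    }
    *out = '\0';
}
```

       With this sketch, "Kick the Bucket" becomes "kick_the_bucket".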

       ss_type is a one digit decimal integer representing the synset type
       for the sense. See Synset Type below for a listing of the numbers
       corresponding to each synset type.

       lex_filenum is a two digit decimal integer representing the name of
       the lexicographer file containing the synset for the sense. See
       lexnames(5WN) for the list of lexicographer file names and their
       corresponding numbers.

       lex_id is a two digit decimal integer that, when appended onto lemma,
       uniquely identifies a sense within a lexicographer file. lex_id numbers
       usually start with 00, and are incremented as additional senses of the
       word are added to the same file, although there is no requirement that
       the numbers be consecutive or begin with 00. Note that a value of 00 is
       the default, and therefore is not present in lexicographer files. Only
       non-default lex_id values must be explicitly assigned in lexicographer
       files. See wninput(5WN) for information on the format of lexicographer
       files.

       head_word is only present if the sense is in an adjective satellite
       synset. It is the lemma of the first word of the satellite's head
       synset.
       head_id is a two digit decimal integer that, when appended onto
       head_word, uniquely identifies the sense of head_word within a
       lexicographer file, as described for lex_id. There is a value in this
       field only if head_word is present.
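       A sketch that assembles a complete sense_key from these fields with
       snprintf(3). It zero-pads the two digit fields, and for non-satellite
       senses it leaves head_word and head_id empty while keeping their :
       separators. The function name and argument order are hypothetical.

```c
#include <stdio.h>

/* Assemble lemma%ss_type:lex_filenum:lex_id:head_word:head_id.
 * For non-satellite senses pass head_word as NULL or "": both head
 * fields are then left empty, but their ':' separators are kept.
 * Returns 0 on success, or -1 if buf is too small. */
int format_sense_key(char *buf, size_t size, const char *lemma, int ss_type,
                     int lex_filenum, int lex_id,
                     const char *head_word, int head_id)
{
    int n;

    if (head_word != NULL && head_word[0] != '\0')
        n = snprintf(buf, size, "%s%%%d:%02d:%02d:%s:%02d", lemma, ss_type,
                     lex_filenum, lex_id, head_word, head_id);
    else
        n = snprintf(buf, size, "%s%%%d:%02d:%02d::", lemma, ss_type,
                     lex_filenum, lex_id);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

       For example, format_sense_key(k, sizeof k, "bank", 1, 14, 0, NULL, 0)
       produces "bank%1:14:00::".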

   Synset Type
       The synset type is encoded as follows:

              1    NOUN
              2    VERB
              3    ADJECTIVE
              4    ADVERB
              5    ADJECTIVE SATELLITE

       For non-satellite senses the head_word and head_id fields have no
       values; however, the field separator character (:) is present.
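       The reverse operation, splitting a sense_key back into its fields, can
       be sketched with strchr(3) and sscanf(3). The helper assumes the lemma
       and head_word each fit in 128 bytes and that ss_type 5 denotes an
       adjective satellite, as in the WordNet distribution. The struct and
       function names, and the use of -1 for an absent head_id, are
       hypothetical choices.

```c
#include <stdio.h>
#include <string.h>

/* Fields of lemma%ss_type:lex_filenum:lex_id:head_word:head_id. */
struct sense_key_parts {
    char lemma[128];
    int ss_type;
    int lex_filenum;
    int lex_id;
    char head_word[128];   /* empty for non-satellite senses */
    int head_id;           /* -1 when head_word is empty */
};

/* Split a sense_key into its fields. Returns 1 on success, 0 if malformed. */
int parse_sense_key(const char *key, struct sense_key_parts *p)
{
    const char *pct = strchr(key, '%');
    if (pct == NULL || (size_t)(pct - key) >= sizeof p->lemma)
        return 0;
    memcpy(p->lemma, key, (size_t)(pct - key));
    p->lemma[pct - key] = '\0';

    /* %n records how many characters "ss_type:lex_filenum:lex_id:" used;
     * it stays 0 if the trailing ':' is missing. */
    int used = 0;
    if (sscanf(pct + 1, "%d:%d:%d:%n", &p->ss_type, &p->lex_filenum,
               &p->lex_id, &used) != 3 || used == 0)
        return 0;

    /* What remains is "head_word:head_id", or just ":" if both are empty. */
    const char *rest = pct + 1 + used;
    const char *colon = strchr(rest, ':');
    if (colon == NULL || (size_t)(colon - rest) >= sizeof p->head_word)
        return 0;
    memcpy(p->head_word, rest, (size_t)(colon - rest));
    p->head_word[colon - rest] = '\0';

    p->head_id = -1;
    if (p->head_word[0] != '\0' && sscanf(colon + 1, "%d", &p->head_id) != 1)
        return 0;
    return 1;
}

/* Name of a synset type number, or "UNKNOWN" if out of range. */
const char *ss_type_name(int ss_type)
{
    static const char *const names[] = {
        "NOUN", "VERB", "ADJECTIVE", "ADVERB", "ADJECTIVE SATELLITE"
    };
    return (ss_type >= 1 && ss_type <= 5) ? names[ss_type - 1] : "UNKNOWN";
}
```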

   Environment Variables
       WNHOME              Base directory for WordNet. Default is /usr/lo-

       WNSEARCHDIR         Directory in which the WordNet database has been
                           installed. Default is WNHOME/dict.

                           Base directory for WordNet. Default is
                           C:\Program Files\WordNet\3.0.
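       A sketch of how a program might locate the database directory from the
       WNHOME and WNSEARCHDIR variables: use WNSEARCHDIR if it is set,
       otherwise WNHOME/dict. The fallback used when WNHOME is unset is
       passed in by the caller rather than hard-coded, paths are joined with
       a Unix-style /, and wn_dict_dir is a hypothetical name, not a WordNet
       library function.

```c
#define _POSIX_C_SOURCE 200809L   /* expose POSIX declarations such as setenv */
#include <stdio.h>
#include <stdlib.h>

/* Write the WordNet database directory into buf:
 *   1. WNSEARCHDIR, if set and non-empty;
 *   2. otherwise WNHOME/dict, if WNHOME is set and non-empty;
 *   3. otherwise default_home/dict, where default_home comes from the caller.
 * Returns 0 on success, or -1 if buf is too small. */
int wn_dict_dir(char *buf, size_t size, const char *default_home)
{
    const char *dir = getenv("WNSEARCHDIR");
    int n;

    if (dir != NULL && dir[0] != '\0') {
        n = snprintf(buf, size, "%s", dir);
    } else {
        const char *home = getenv("WNHOME");
        if (home == NULL || home[0] == '\0')
            home = default_home;
        n = snprintf(buf, size, "%s/dict", home);
    }
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```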

   Files
       index.sense         sense index

   See Also
       binsrch(3WN), wnsearch(3WN), lexnames(5WN), wnintro(5WN),
       sensemap(5WN), wndb(5WN), wninput(5WN).

WordNet	3.0			   Dec 2006			 SENSEIDX(5WN)

