WNINPUT(5WN)		    WordNet(TM) File Formats		  WNINPUT(5WN)

NAME
       noun.suffix, verb.suffix, adj.suffix, adv.suffix - WordNet lexicogra-
       pher files that are input to grind(1WN)

DESCRIPTION
       WordNet's source files are written by lexicographers.  They are the
       product of a detailed relational	analysis of lexical semantics: a vari-
       ety of lexical and semantic relations are used to represent the organi-
       zation  of lexical knowledge.  Two kinds	of building blocks are distin-
       guished in the source files: word forms and word	meanings.  Word	 forms
       are represented in their	familiar orthography; word meanings are	repre-
       sented by synonym sets (synsets)	- lists	of synonymous word forms  that
       are interchangeable in some context.  Two kinds of relations are	recog-
       nized: lexical and  semantic.   Lexical	relations  hold	 between  word
       forms; semantic relations hold between word meanings.

       Lexicographer  files correspond to the syntactic	categories implemented
       in WordNet - noun, verb,	adjective and adverb.  All of the synsets in a
       lexicographer  file  are	 in  the same syntactic	category.  Each	synset
       consists of a list of synonymous words or collocations (e.g. "fountain
       pen", "take in"), and pointers that describe the	relations between this
       synset and other	synsets.  These	relations include (but are not limited
       to) hypernymy/hyponymy, antonymy, entailment, and meronymy/holonymy.  A
       word or collocation may appear in more than one	synset,	 and  in  more
       than  one  part of speech.  Each	use of a word in a synset represents a
       sense of	that word in the part of speech	corresponding to the synset.

       Adjectives may be organized into	clusters containing head  synsets  and
       satellite  synsets.   Adverbs  generally	 point	to the adjectives from
       which they are derived.

       See wngloss(7WN)	for a glossary of WordNet terminology and a discussion
       of the database's content and logical organization.

   Lexicographer File Names
       The names of the lexicographer files are of the form:

	      pos.suffix

       where  pos is either noun, verb,	adj or adv.  suffix may	be used	to or-
       ganize groups of	synsets	into different files, for example  noun.animal
       and  noun.plant.	  See  lexnames(5WN)  for a list of lexicographer file
       names that are used in building WordNet.
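
       As a rough illustration of the naming convention, the following Python
       sketch splits a file name into its pos and suffix parts.  The function
       name split_lexfile_name is hypothetical and is not part of WordNet.

```python
# Hypothetical helper (not part of WordNet): split "pos.suffix" into its parts.
VALID_POS = ("noun", "verb", "adj", "adv")

def split_lexfile_name(name):
    """Return (pos, suffix) for a lexicographer file name, or raise ValueError."""
    pos, sep, suffix = name.partition(".")
    if not sep or pos not in VALID_POS or not suffix:
        raise ValueError(f"not a lexicographer file name: {name!r}")
    return pos, suffix

print(split_lexfile_name("noun.animal"))
```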

   Pointers
       Pointers are used to represent the relations between the words in one
       synset and another.  Semantic pointers represent	relations between word
       meanings, and therefore pertain to all of the words in the  source  and
       target  synsets.	  Lexical  pointers  represent	relations between word
       forms, and pertain only to specific words  in  the  source  and	target
       synsets.	 The following pointer types are usually used to indicate lex-
       ical relations: Antonym,	Pertainym, Participle, Also  See,  Derivation-
       ally Related.  The remaining pointer types are generally	used to	repre-
       sent semantic relations.

       A relation from a source	to a target synset is formed by	 specifying  a
       word  from  the	target	synset	in  the	source synset, followed	by the
       pointer_symbol indicating the pointer type.  The	location of a  pointer
       within a	synset defines it as either lexical or semantic.  The Lexicog-
       rapher File Format section describes the	syntax for entering a semantic
       pointer, and Word Syntax describes the syntax for entering a lexical
       pointer.

       Although	there are many pointer types, only certain types of  relations
       are permitted between synsets of	each syntactic category.

       The pointer_symbols for nouns are:
	      !	   Antonym
	      @	   Hypernym
	      @i   Instance Hypernym
	      ~	   Hyponym
	      ~i   Instance Hyponym
	      #m   Member holonym
	      #s   Substance holonym
	      #p   Part	holonym
	      %m   Member meronym
	      %s   Substance meronym
	      %p   Part	meronym
	      =	   Attribute
	      +	   Derivationally related form
	      ;c   Domain of synset - TOPIC
	      -c   Member of this domain - TOPIC
	      ;r   Domain of synset - REGION
	      -r   Member of this domain - REGION
	      ;u   Domain of synset - USAGE
	      -u   Member of this domain - USAGE

       The pointer_symbols for verbs are:
	      !	   Antonym
	      @	   Hypernym
	      ~	   Hyponym
	      *	   Entailment
	      >	   Cause
	      ^	   Also	see
	      $	   Verb	Group
	      +	   Derivationally related form
	      ;c   Domain of synset - TOPIC
	      ;r   Domain of synset - REGION
	      ;u   Domain of synset - USAGE

       The pointer_symbols for adjectives are:
	      !	   Antonym
	      &	   Similar to
	      <	   Participle of verb
	      \	   Pertainym (pertains to noun)
	      =	   Attribute
	      ^	   Also	see
	      ;c   Domain of synset - TOPIC
	      ;r   Domain of synset - REGION
	      ;u   Domain of synset - USAGE

       The pointer_symbols for adverbs are:
	      !	   Antonym
	      \	   Derived from	adjective
	      ;c   Domain of synset - TOPIC
	      ;r   Domain of synset - REGION
	      ;u   Domain of synset - USAGE

       Many  pointer  types are	reflexive, meaning that	if a synset contains a
       pointer to another synset, the other synset  should  contain  a	corre-
       sponding	 reflexive  pointer.  grind(1WN) automatically inserts missing
       reflexive pointers for the following pointer types:

		  |	  Pointer	  |	   Reflect	   |
		  |Antonym		  | Antonym		   |
		  |Hyponym		  | Hypernym		   |
		  |Hypernym		  | Hyponym		   |
		  |Instance Hyponym	  | Instance Hypernym	   |
		  |Instance Hypernym	  | Instance Hyponym	   |
		  |Holonym		  | Meronym		   |
		  |Meronym		  | Holonym		   |
		  |Similar to		  | Similar to		   |
		  |Attribute		  | Attribute		   |
		  |Verb	Group		  | Verb Group		   |
		  |Derivationally Related | Derivationally Related |
		  |Domain of synset	  | Member of Domain	   |
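
       The table above can be read as a simple lookup.  The Python sketch
       below transcribes it into a dictionary and reports which reflected
       pointers are absent from a set of relations.  It only illustrates the
       idea and is not how grind(1WN) is implemented; the function name and
       the (source, pointer, target) triples are hypothetical.

```python
# Pointer -> reflected pointer, transcribed from the table above.
REFLECT = {
    "Antonym": "Antonym",
    "Hyponym": "Hypernym",
    "Hypernym": "Hyponym",
    "Instance Hyponym": "Instance Hypernym",
    "Instance Hypernym": "Instance Hyponym",
    "Holonym": "Meronym",
    "Meronym": "Holonym",
    "Similar to": "Similar to",
    "Attribute": "Attribute",
    "Verb Group": "Verb Group",
    "Derivationally Related": "Derivationally Related",
    "Domain of synset": "Member of Domain",
}

def missing_reflections(pointers):
    """pointers is a set of (source, pointer_type, target) triples.

    Return the reflected triples that are not already in the set.
    """
    missing = set()
    for source, ptype, target in pointers:
        reflected = REFLECT.get(ptype)
        if reflected is None:
            continue  # this pointer type is not reflected automatically
        back = (target, reflected, source)
        if back not in pointers:
            missing.add(back)
    return missing

# Hypothetical relations: the antonym pair is complete, the hypernym is not.
example = {
    ("collie", "Hypernym", "dog"),
    ("hot", "Antonym", "cold"),
    ("cold", "Antonym", "hot"),
}
print(missing_reflections(example))
```
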
   Verb	Frames
       Each verb synset	contains a list	of generic sentence frames  illustrat-
       ing  the	types of simple	sentences in which the verbs in	the synset can
       be used.	 For some verb senses, example sentences  illustrating	actual
       uses  of	 the  verb  are	 provided.   (See  Verb	 Example  Sentences in
       wndb(5WN).)  Whenever there is no example sentence,  the	 generic  sen-
       tence frames specified by the lexicographer are used.  The generic sen-
       tence frames are	entered	in a synset as a comma-separated list of inte-
       ger  frame  numbers.   The  following  list  is the text	of the generic
       frames, preceded	by their frame numbers:

	      1			     Something ----s
	      2			     Somebody ----s
	      3			     It	is ----ing
	      4			     Something is ----ing PP
	      5			     Something ----s something Adjective/Noun
	      6			     Something ----s Adjective/Noun
	      7			     Somebody ----s Adjective
	      8			     Somebody ----s something
	      9			     Somebody ----s somebody
	      10		     Something ----s somebody
	      11		     Something ----s something
	      12		     Something ----s to	somebody
	      13		     Somebody ----s on something
	      14		     Somebody ----s somebody something
	      15		     Somebody ----s something to somebody
	      16		     Somebody ----s something from somebody
	      17		     Somebody ----s somebody with something
	      18		     Somebody ----s somebody of	something
	      19		     Somebody ----s something on somebody
	      20		     Somebody ----s somebody PP
	      21		     Somebody ----s something PP
	      22		     Somebody ----s PP
	      23		     Somebody's	(body part) ----s
	      24		     Somebody ----s somebody to	INFINITIVE
	      25		     Somebody ----s somebody INFINITIVE
	      26		     Somebody ----s that CLAUSE
	      27		     Somebody ----s to somebody
	      28		     Somebody ----s to INFINITIVE
	      29		     Somebody ----s whether INFINITIVE
	      30		     Somebody ----s somebody into V-ing	something
	      31		     Somebody ----s something with something
	      32		     Somebody ----s INFINITIVE
	      33		     Somebody ----s VERB-ing
	      34		     It	----s that CLAUSE
	      35		     Something ----s INFINITIVE
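
       To show how frame numbers map to sentences, the Python sketch below
       holds a subset of the frames above in a dictionary and fills in a
       verb.  The substitution is deliberately naive (the verb replaces the
       "----" placeholder with no morphological processing), and the names
       FRAMES and render_frames are hypothetical.

```python
# A subset of the generic frames listed above, keyed by frame number.
FRAMES = {
    1: "Something ----s",
    2: "Somebody ----s",
    8: "Somebody ----s something",
    9: "Somebody ----s somebody",
    10: "Something ----s somebody",
    11: "Something ----s something",
    26: "Somebody ----s that CLAUSE",
}

def render_frames(verb, frame_numbers):
    """Substitute the verb for the "----" placeholder (no real morphology)."""
    return [FRAMES[n].replace("----", verb) for n in frame_numbers]

for sentence in render_frames("obscure", [8, 10]):
    print(sentence)
```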

   Lexicographer File Format
       Synsets are entered one per line, and each line is  terminated  with  a
       newline character.  A line containing a synset may be as	long as	neces-
       sary, but no newlines can be entered within a synset.  Within a synset,
       spaces or tabs may be used to separate entities.  Items enclosed in
       italicized square brackets are optional.

       The general synset syntax is:

	      {	  words	 pointers   (  gloss  )	 }

       Synsets of this form are	valid  for  all	 syntactic  categories	except
       verb,  and  are	referred to as basic synsets.  At least	one word and a
       gloss are required to form a valid synset.  Pointers entered  following
       all  the	words in a synset represent semantic relations between all the
       words in	the source and target synsets.
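
       A minimal Python sketch of how a basic synset line might be split into
       its parts follows.  It assumes a simplified input: no word/pointer
       sets, no syntactic markers, and no space between a word and its comma.
       It is not the parser used by grind(1WN), and parse_basic_synset is a
       hypothetical name.

```python
def parse_basic_synset(line):
    """Split a simplified basic synset into (words, pointers, gloss)."""
    body = line.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("a synset must be enclosed in braces")
    body = body[1:-1].strip()

    gloss = None
    if body.endswith(")"):
        # Assumes the first "(" opens the gloss, i.e. no "(marker)" on a word.
        start = body.index("(")
        gloss = body[start + 1:-1].strip()
        body = body[:start]

    words, pointers = [], []
    for token in body.split():
        if token.endswith(","):
            words.append(token[:-1])   # "collie," is a word
        else:
            pointers.append(token)     # "dog1,@" is a pointer
    return words, pointers, gloss

print(parse_basic_synset(
    "{ collie, dog1,@ (large multi-colored dog with pointy nose) }"))
```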

       For verbs, the basic synset syntax is defined as	follows:

	      {	  words	 pointers  frames   (  gloss  )	 }

       Adjectives may be organized into clusters containing one or more head
       synsets and optional satellite synsets.  Adjective clusters are of the
       form:

	      head synset
	      [satellite synsets]
	      [additional head/satellite synsets]

       Each adjective cluster is enclosed in square brackets, and may have one
       or more parts.  Each part consists of a head synset and optional	satel-
       lite synsets that are conceptually similar to the head  synset's	 mean-
       ing.   Parts of a cluster are separated by one or more hyphens (-) on a
       line by themselves, with	the terminating	square bracket	following  the
       last  synset.   Head  and  satellite synsets follow the syntax of basic
       synsets; however, a "Similar to" pointer must be specified in a head
       synset for each of its satellite	synsets.  Most adjective clusters con-
       tain two	antonymous parts.  See wngloss(7WN) for	a discussion of	adjec-
       tive clusters, and Special Adjective Syntax for more information	on ad-
       jective cluster syntax.

       Synsets for relational adjectives (pertainyms) and  participial	adjec-
       tives  do  not  adhere  to  the	cluster	structure.  They use the basic
       synset syntax.

       Comments	can be entered in a lexicographer file by enclosing  the  text
       of the comment in parentheses.  Note that comments cannot appear	within
       a synset, as parentheses	within a synset	 have  an  entirely  different
       meaning	(see  Gloss  Syntax  ).	 However, entire synsets (or adjective
       clusters) can be	"commented out"	 by  enclosing	them  in  parentheses.
       This  is	often used by the lexicographers to verify the syntax of files
       under development or to leave a note to oneself while working on en-
       tries.

   Word	Syntax
       A  synset  must	have at	least one word,	and the	words of a synset must
       appear after the	opening	brace and before any other synset  constructs.
       A word may be entered in	either the simple word or word/pointer syntax.

       A simple	word is	of the form:

	      word[ ( marker ) ][lex_id] ,

       word  may  be entered in	any combination	of upper and lower case	unless
       it is in	an adjective cluster.  A collocation is	entered	by joining the
       individual words	with an	underscore character (_).  Numbers (integer or
       real) may be entered, either by themselves or as	part of	a word string,
       by following the	number with a double quote (").

       See  Special  Adjective	Syntax for a description of adjective clusters
       and markers.

       word may	be followed by an integer lex_id from 1	to 15.	The lex_id  is
       used to distinguish different senses of the same	word within a lexicog-
       rapher file.  The lexicographer assigns lex_id values, usually  in  as-
       cending	order,	although  there	 is no requirement that	the numbers be
       consecutive.  The default is 0, and does	not have to be	specified.   A
       lex_id  must  be	 used  on pointers if the desired sense	has a non-zero
       lex_id in its synset specification.

       Word/pointer syntax is of the form:

	      [	  word[	( marker ) ][lex_id] ,	 pointers   ]

       This syntax is used when	one or more pointers correspond	 only  to  the
       specific	word in	the word/pointer set, rather than all the words	in the
       synset, and represents a	lexical	relation.  Note	 that  a  word/pointer
       set appears within a synset; therefore, the square brackets used to en-
       close it are treated differently from those used to define an adjective
       cluster.	  Only one word	can be specified in each word/pointer set, and
       any number of pointers may be included.	A synset can have  any	number
       of  word/pointer	 sets.	Each is	treated	by grind(1WN) essentially as a
       word, so	they all must appear before any	synset	pointers  representing
       semantic	relations.

       For  verbs, the word/pointer syntax is extended in the following	manner
       to allow	the user to specify generic sentence frames that, like	point-
       ers,  correspond	 only to a specific word, rather than all the words in
       the synset.  In this case, pointers are optional.

	      [	  word ,   [pointers]  frames	]

   Pointer Syntax
       Pointers	are optional in	synsets.  If a pointer is specified outside of
       a  word/pointer set, the	relation is applied to all of the words	in the
       synset, including any words specified using  the	 word/pointer  syntax.
       This indicates a	semantic relation between the meanings of the words in
       the synsets.  If	specified within a word/pointer	set, the relation cor-
       responds	only to	the word in the	set and	represents a lexical relation.

       A pointer is of the form:

	      [lex_filename: ]word[lex_id],pointer_symbol

       or:

	      [lex_filename: ]word[lex_id]^word[lex_id],pointer_symbol

       For pointers, word indicates a word in another synset.  When the	second
       form of a pointer is used, the first word indicates a word  in  a  head
       synset,	and the	second is a word in a satellite	of that	cluster.  word
       may be followed by a lex_id that	is used	to match the  pointer  to  the
       correct	target	synset.	  The synset containing	word may reside	in an-
       other lexicographer file.  In this case,	word is	preceded by  lex_file-
       name as shown.

       See Pointers for	a list of pointer_symbols and their meanings.
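
       The two pointer forms can be sketched as a single regular expression.
       The Python example below is an approximation under stated assumptions:
       words contain only letters, underscores, periods, apostrophes and hy-
       phens (so numbers entered with a double quote are not handled), and
       the pointer_symbol is any run of non-blank characters.  The names
       POINTER_RE and parse_pointer are hypothetical.

```python
import re

POINTER_RE = re.compile(
    r"""
    (?:(?P<lex_filename>[a-z]+\.[A-Za-z]+):\s*)?   # optional "pos.suffix:" prefix
    (?P<word>[A-Za-z_.'-]+)(?P<lex_id>\d+)?        # word with optional lex_id
    (?:\^(?P<sat_word>[A-Za-z_.'-]+)(?P<sat_lex_id>\d+)?)?   # optional ^satellite word
    ,(?P<symbol>\S+)                               # comma, then the pointer_symbol
    """,
    re.VERBOSE,
)

def parse_pointer(text):
    """Return the parts of a pointer that are present, as a dict of strings."""
    match = POINTER_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a pointer: {text!r}")
    return {k: v for k, v in match.groupdict().items() if v is not None}

print(parse_pointer("dog1,@"))
print(parse_pointer("adj.all:essential^basic,\\"))
```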

   Verb	Frame List Syntax
       Frame  numbers corresponding to generic sentence	frames must be entered
       in each verb synset.  If	 a  frame  list	 is  specified	outside	 of  a
       word/pointer set, the verb frames in the	list apply to all of the words
       in the synset, including	any words  specified  using  the  word/pointer
       syntax.	If specified within a word/pointer set,	the verb frames	in the
       list correspond only to the word	in the set.

       A frame number list is entered as follows:

	      frames:  f_num[,f_num...]

       Where f_num specifies a generic frame number.  See Verb	Frames	for  a
       list of generic sentences and their corresponding frame numbers.
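
       As a sketch of the frame list syntax, the Python function below
       extracts the frame numbers that follow "frames:" in a piece of synset
       text and checks them against the range 1 to 35 listed in Verb Frames.
       It tolerates blanks after the commas, as in the sample synsets.  The
       names FRAME_LIST_RE and parse_frame_list are hypothetical.

```python
import re

# "frames:" followed by one or more comma-separated integers.
FRAME_LIST_RE = re.compile(r"frames:\s*(\d+(?:\s*,\s*\d+)*)")

def parse_frame_list(text):
    """Return the frame numbers of the first frame list in text, or []."""
    match = FRAME_LIST_RE.search(text)
    if match is None:
        return []
    numbers = [int(n) for n in match.group(1).split(",")]
    for n in numbers:
        if not 1 <= n <= 35:
            raise ValueError(f"unknown generic frame number: {n}")
    return numbers

print(parse_frame_list("blur, obscure, frames: 8, 10 }"))
```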

   Gloss Syntax
       A gloss is included in all synsets.  The	lexicographer may enter	a text
       string of any length desired.  A	gloss is simply	a string  enclosed  in
       parentheses  with  no embedded carriage returns.	 It provides a defini-
       tion of what the	synset represents and/or example sentences.

   Special Adjective Syntax
       The syntax for representing antonymous adjective	synsets	requires  sev-
       eral additional conditions.

       The  first word of a head synset	must be	entered	in upper case, and can
       be thought of as	the head word of the head synset.  The word part of  a
       pointer	from  one  head	 synset	to another head	synset within the same
       cluster (usually	an antonym) must also be entered in upper case.	  Usu-
       ally  antonymous	 adjectives  are entered using the word/pointer	syntax
       described in Word Syntax	to indicate a lexical relation.	 There	is  no
       restriction  on	the  number of parts that a cluster may	have, and some
       clusters	have three parts, representing antonymous  triplets,  such  as
       solid, liquid, and gas.

       A  cross-cluster	pointer	may be specified, allowing a head or satellite
       synset to point to a head synset	in a different cluster.	 A cross-clus-
       ter  pointer  is	 indicated by entering the word	part of	the pointer in
       upper case.

       An adjective may	be annotated with a syntactic marker indicating	a lim-
       itation on the syntactic	position the adjective may have	in relation to
       the noun that it modifies.  If so marked, the marker appears between the
       word and	its following comma.  If a lex_id is specified,	the marker im-
       mediately follows it.  The syntactic markers are:
	      (p)		     predicate position
	      (a)		     prenominal	(attributive) position
	      (ip)		     immediately postnominal position

EXAMPLES
       (Note that these are hypothetical examples not found in the WordNet
       lexicographer files.)

       Sample noun synsets:
	      {	canine,	[ dog1,	cat,! ]	pooch, canid,@ }
	      {	collie,	dog1,@ (large multi-colored dog	with pointy nose) }
	      {	hound, hunting_dog, pack,#m dog1,@ }
	      {	dog, }

       Sample verb synsets:
	      {	[ confuse, clarify,! frames: 1 ] blur, obscure,	frames:	8, 10 }
	      {	[ clarify, confuse,! ] make_clear, interpret,@ frames: 8 }
	      {	interpret, construe, understand,@ frames: 8 }

       Sample adjective	clusters:
	      {	[ HOT, COLD,! ]	lukewarm(a), TEPID,^ (hot to the touch)	}
	      {	warm, }
	      {	[ COLD,	HOT,! ]	frigid,	(cold to the touch) }
	      {	freezing, }

       Sample adverb synsets:
	      {	[ basically, adj.all:essential^basic,\ ] [ essentially,	adj.all:basic^fundamental,\ ] (	by one's very nature )}
	      {	pointedly, adj.all:pungent^pointed,\ }
	      {	[ badly, adj.all:bad,\ well,! ]	ill, ("He was badly prepared") }

SEE ALSO
       grind(1WN), wnintro(5WN), lexnames(5WN), wndb(5WN), uniqbeg(7WN), wn-
       gloss(7WN).

       Fellbaum, C. (1998), ed.	 "WordNet: An  Electronic  Lexical  Database".
       MIT Press, Cambridge, MA.

WordNet	3.0			   Dec 2006			  WNINPUT(5WN)

