MORPHY(7WN)                       WordNet(TM)                      MORPHY(7WN)

       morphy -	discussion of WordNet's	morphological processing

       Although	 only  base  forms  of	words  are  usually stored in WordNet,
       searches	may be done on inflected forms.	 A  set	 of  morphology	 func-
       tions,  Morphy, is applied to the search	string to generate a form that
       is present in WordNet.

       Morphology in WordNet uses two types of processes to try	to convert the
       string  passed  into  one  that	can  be	found in the WordNet database.
       There are lists of inflectional endings,	based on  syntactic  category,
       that can	be detached from individual words in an	attempt	to find	a form
       of the word that	is in WordNet.	There are also exception  list	files,
       one  for	 each  syntactic  category, in which a search for an inflected
       form is done.  Morphy tries to use these	two processes in  an  intelli-
       gent  manner  to	 translate the string passed to	the base form found in
       WordNet.	 Morphy	first checks for exceptions, then uses	the  rules  of
       detachment.  The	Morphy functions are not independent from WordNet. Af-
       ter each	transformation,	WordNet	is searched for	the  resulting	string
       in the syntactic	category specified.

       The  Morphy  functions are passed a string and a	syntactic category.  A
       string is either	a single word or a  collocation.   Since  some	words,
       such as axes, can have more than one base form (axe and axis), Morphy
       works in	the following manner.  The first time that  Morphy  is	called
       with  a	specific  string, it returns a base form.  For each subsequent
       call to Morphy made with	a NULL string argument,	Morphy returns another
       base form.  Whenever Morphy cannot perform a transformation, whether on
       the first call for a word or subsequent calls,  NULL  is	 returned.   A
       transformation  to  a valid English string will return NULL if the base
       form of the string is not in WordNet.
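
       The calling protocol described above can be sketched as follows.
       This is a minimal illustration in Python, not the WordNet library
       interface: the class BaseFormCursor, the helper all_base_forms, and
       the table contents are hypothetical, and None stands in for the NULL
       of the C functions.

```python
# Hypothetical stand-in for the Morphy calling protocol; not the WordNet API.
class BaseFormCursor:
    """Hands out one base form per call; None plays the role of NULL."""

    def __init__(self, table):
        self._table = table   # maps (inflected form, pos) -> list of base forms
        self._pending = []    # base forms not yet returned

    def morph(self, string, pos):
        if string is not None:
            # First call for a word: load its base forms.
            self._pending = list(self._table.get((string, pos), []))
        # A None string continues with whatever is left from the last word.
        if not self._pending:
            return None
        return self._pending.pop(0)


# Toy data for illustration only, not taken from the WordNet database.
TOY_TABLE = {("axes", "noun"): ["axe", "axis"]}


def all_base_forms(cursor, string, pos):
    """Collect base forms by calling morph() until it returns None."""
    forms = []
    form = cursor.morph(string, pos)
    while form is not None:
        forms.append(form)
        form = cursor.morph(None, pos)
    return forms
```

       A caller loops until None is returned, in the same way that a C
       caller would loop until NULL.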

       The morphological functions are found  in  the  WordNet	library.   See
       morph(3WN) for information on using these functions.

   Rules of Detachment
       The following table shows the rules of detachment used by Morphy.  If a
       word ends with one of the suffixes, it is stripped from	the  word  and
       the  corresponding  ending  is added.  Then WordNet is searched for the
       resulting string.  No rules are applicable to adverbs.

                               POS  | Suffix | Ending
                               -----+--------+-------
                               NOUN | "s"    | ""
                               NOUN | "ses"  | "s"
                               NOUN | "xes"  | "x"
                               NOUN | "zes"  | "z"
                               NOUN | "ches" | "ch"
                               NOUN | "shes" | "sh"
                               NOUN | "men"  | "man"
                               NOUN | "ies"  | "y"
                               VERB | "s"    | ""
                               VERB | "ies"  | "y"
                               VERB | "es"   | "e"
                               VERB | "es"   | ""
                               VERB | "ed"   | "e"
                               VERB | "ed"   | ""
                               VERB | "ing"  | "e"
                               VERB | "ing"  | ""
                               ADJ  | "er"   | ""
                               ADJ  | "est"  | ""
                               ADJ  | "er"   | "e"
                               ADJ  | "est"  | "e"
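
       The table can be read as a list of (suffix, ending) pairs per
       syntactic category.  The Python sketch below generates the candidate
       strings that would then be looked up in WordNet.  The function name
       detachment_candidates is hypothetical, and the guard against
       stripping an entire word is an assumption of the sketch, not
       something stated in this page.

```python
# Rules transcribed from the table: (suffix, replacement ending) per category.
DETACHMENT_RULES = {
    "noun": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
             ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
    "verb": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
             ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "adj": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
    "adv": [],  # no rules are applicable to adverbs
}


def detachment_candidates(word, pos):
    """Apply every matching rule once, in table order, without duplicates."""
    candidates = []
    for suffix, ending in DETACHMENT_RULES[pos]:
        # Assumption: a rule is skipped if it would consume the whole word.
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[:len(word) - len(suffix)] + ending
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates
```

       Not every candidate is a word; for the noun churches the sketch
       yields both churche and church, and only the database lookup decides
       which of them survives.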

   Exception Lists
       There is	one exception list file	for each syntactic category.  The  ex-
       ception	lists  contain	the  morphological transformations for strings
       that are	not regular and	therefore cannot be processed in an  algorith-
       mic  manner.  Each line of an exception list contains an	inflected form
       of a word or collocation, followed by one or more base forms.  The list
       is kept in alphabetical order and a binary search is used to find words
       in these	lists.	See wndb(5WN) for information on the format of the ex-
       ception list files.
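
       A lookup in a sorted exception list can be sketched with a binary
       search.  The Python example below uses the standard bisect module;
       the entries are toy data in the "inflected form, then base forms"
       layout described above and are not copied from a WordNet exception
       file, and the function name lookup_exception is hypothetical.

```python
import bisect

# Toy lines: an inflected form followed by one or more base forms.
TOY_EXCEPTION_LINES = sorted(
    [
        "mice mouse",
        "axes axe axis",
        "geese goose",
        "children child",
    ],
    key=lambda line: line.split()[0],  # keep the list ordered by inflected form
)


def lookup_exception(lines, inflected):
    """Binary-search the sorted lines; return the base forms or an empty list."""
    keys = [line.split()[0] for line in lines]
    i = bisect.bisect_left(keys, inflected)
    if i < len(keys) and keys[i] == inflected:
        return lines[i].split()[1:]
    return []
```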

   Single Words
       In  general, single words are relatively	easy to	process.  Morphy first
       looks for the word in the exception list.  If it is found, the first
       base  form  is  returned.  Subsequent calls with	a NULL argument	return
       additional base forms, if present.  A NULL is returned when  there  are
       no more base forms of the word.

       If  the	word  is  not found in the exception list corresponding	to the
       syntactic category, an algorithmic process using	the rules  of  detach-
       ment  looks  for	 a  matching suffix.  If a matching suffix is found, a
       corresponding ending is applied (sometimes this ending is an empty
       string, so in effect the suffix is removed from the word), and WordNet
       is consulted to see if the resulting word is found in the desired  part
       of speech.
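
       The two-step process for single words (exception list first, then
       rules of detachment checked against the database) can be sketched as
       below.  This is a simplified Python illustration for nouns only: the
       exception table and the lexicon are toy stand-ins for the WordNet
       files, and the function name noun_base_forms is hypothetical.

```python
# Noun rules from the detachment table: (suffix, replacement ending).
NOUN_RULES = [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
              ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")]

# Toy stand-ins for the noun exception list and the WordNet noun lookup.
TOY_NOUN_EXCEPTIONS = {"geese": ["goose"]}
TOY_NOUN_LEXICON = {"goose", "box", "dog", "lady"}


def noun_base_forms(word):
    """Return base forms for a noun: exceptions first, then detachment."""
    # Step 1: the exception list is consulted before any rule is tried.
    if word in TOY_NOUN_EXCEPTIONS:
        return list(TOY_NOUN_EXCEPTIONS[word])
    # Step 2: try each rule and keep only results present in the lexicon.
    found = []
    for suffix, ending in NOUN_RULES:
        if word.endswith(suffix):
            candidate = word[:len(word) - len(suffix)] + ending
            if candidate in TOY_NOUN_LEXICON and candidate not in found:
                found.append(candidate)
    return found
```

       An empty result corresponds to Morphy returning NULL on the first
       call for a word.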

   Collocations
       As opposed to single words, collocations can be quite difficult to
       transform into a	base form that is present  in  WordNet.	  In  general,
       only  base  forms  of  words,  even  those comprising collocations, are
       stored in WordNet, such as attorney general.  Transforming the
       collocation attorneys general is then simply a matter of finding the
       base forms of the individual words comprising the collocation.  This
       usually works for nouns; non-conforming nouns, such as customs duty,
       are therefore entered in the noun exception list.
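
       The word-by-word treatment of noun collocations can be sketched as
       follows.  The Python example assumes a toy mapping from inflected
       words to base forms and a toy lexicon; both are illustrative, and
       the function name noun_collocation_base is hypothetical.

```python
# Toy data standing in for per-word morphology and the WordNet lookup.
TOY_NOUN_BASE = {"attorneys": "attorney", "lines": "line", "products": "product"}
TOY_LEXICON = {"attorney general", "line of products"}


def noun_collocation_base(collocation):
    """Replace each word by its base form, then look up the joined string."""
    words = collocation.split()
    candidate = " ".join(TOY_NOUN_BASE.get(word, word) for word in words)
    return candidate if candidate in TOY_LEXICON else None
```

       Because every word is reduced independently, a collocation whose
       stored form keeps an inflected word is not found by this approach.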

       Verb collocations that contain prepositions, such  as  ask for it,  are
       more  difficult.	  As with single words,	the exception list is searched
       first.  If the collocation is not found,	special	code in	Morphy	deter-
       mines whether a verb collocation	includes a preposition.	 If it does, a
       function	is called to try to find the base form in the  following  man-
       ner.   It  is  assumed that the first word in the collocation is	a verb
       and that	the last word is a noun.  The algorithm	then builds  a	search
       string  with the	base forms of the verb and noun, leaving the remainder
       of the collocation (usually just	the preposition, but more words	may be
       involved) in the middle.  For example, if passed asking for it, the
       database search would be performed with ask for it, which is found in
       WordNet and is therefore returned from Morphy.  If a verb collocation does
       not contain a preposition, then the base	form of	each word in the  col-
       location	is found and WordNet is	searched for the resulting string.
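
       The handling of verb collocations that contain a preposition can be
       sketched as below.  This Python illustration makes several
       assumptions: the preposition set, the base-form tables, and the
       lexicon are toy data, the preposition is looked for among the middle
       words only, and the function name verb_collocation_base is
       hypothetical.

```python
# Toy data; the real preposition list and database are not reproduced here.
TOY_PREPOSITIONS = {"for", "of", "to", "with", "on"}
TOY_VERB_BASE = {"asking": "ask", "asked": "ask"}
TOY_NOUN_BASE = {}
TOY_LEXICON = {"ask for it"}


def verb_collocation_base(collocation):
    """Reduce the first word as a verb and the last as a noun; keep the middle."""
    words = collocation.split()
    middle = words[1:-1]
    # Assumption: this path applies only when a middle word is a preposition.
    if not any(word in TOY_PREPOSITIONS for word in middle):
        return None
    verb = TOY_VERB_BASE.get(words[0], words[0])
    noun = TOY_NOUN_BASE.get(words[-1], words[-1])
    candidate = " ".join([verb] + middle + [noun])
    return candidate if candidate in TOY_LEXICON else None
```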

   Hyphenation
       Hyphenation also presents special difficulties when searching WordNet.
       It is often a subjective	decision as to whether a word  is  hyphenated,
       joined  as one word, or is a collocation	of several words, and which of
       the various forms are entered  into  WordNet.   When  Morphy  breaks  a
       string into "words", it looks for both spaces and hyphens as
       delimiters.  It also looks for periods in strings and removes them if
       an exact match is not found.  A search for an abbreviation like oct.
       returns the synset for { October, Oct }.  Not every pattern of
       hyphenated and collocated strings is searched for properly, so it may
       be advantageous to specify several search strings if the results of a
       search attempt seem incomplete.
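
       The delimiter and period handling can be sketched in Python as
       below.  The function names split_words and
       lookup_with_period_fallback are hypothetical, and the lexicon passed
       in is a toy stand-in for the WordNet search.

```python
import re


def split_words(string):
    """Break a search string into words at spaces and hyphens."""
    return [word for word in re.split(r"[ -]+", string) if word]


def lookup_with_period_fallback(string, lexicon):
    """Try an exact match first; if that fails, retry with periods removed."""
    if string in lexicon:
        return string
    stripped = string.replace(".", "")
    if stripped != string and stripped in lexicon:
        return stripped
    return None
```

       With a toy lexicon containing oct, a search for oct. falls back to
       the form without the period.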

   Special Processing for nouns	ending with 'ful'
       Morphy contains code that searches for nouns ending with ful and
       performs a transformation on the substring preceding it.  It then appends
       'ful' back onto the resulting string and returns it.  For example, if
       passed the noun boxesful, it will return boxful.
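
       A minimal Python sketch of the 'ful' handling follows.  It applies
       the noun rules of detachment to the part of the word before 'ful'
       and re-appends the suffix.  Checking the re-suffixed string against
       a lexicon is an assumption of the sketch, since the point at which
       the database is consulted is not stated here; the function name
       ful_noun_base and the lexicon are hypothetical.

```python
# Noun rules from the detachment table: (suffix, replacement ending).
NOUN_RULES = [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
              ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")]


def ful_noun_base(word, lexicon):
    """Transform the stem before 'ful', then put 'ful' back on the result."""
    if not word.endswith("ful") or len(word) <= len("ful"):
        return None
    stem = word[:-len("ful")]
    for suffix, ending in NOUN_RULES:
        if stem.endswith(suffix):
            candidate = stem[:len(stem) - len(suffix)] + ending + "ful"
            # Assumption: the re-suffixed form is what gets looked up.
            if candidate in lexicon:
                return candidate
    return None
```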

       Since many noun collocations contain prepositions, such as
       line of products, an algorithm similar to that used for verbs should be
       written for nouns.  In the present scheme, if Morphy is passed
       lines of products, the search string becomes line of product, which is
       not in WordNet.

       Morphy will allow non-words to be converted to words,  if  they	follow
       one of the rules	described above.  For example, it will happily convert
       plantes to plants.

       WNHOME		   Base	directory for WordNet.	 Default  is  /usr/lo-

       WNSEARCHDIR	   Directory  in  which	 the WordNet database has been
			   installed.  Default is WNHOME/dict.

                           Base directory for WordNet.  Default is
                           C:\Program Files\WordNet\3.0.

       pos.exc		   morphology exception	lists

       wn(1WN),	wnb(1WN), binsrch(3WN),	morph(3WN), wndb(5WN), wninput(7WN).

WordNet	3.0			   Dec 2006			   MORPHY(7WN)

