
FreeBSD Manual Pages


lt-proc(1)							    lt-proc(1)

       lt-proc - This application is part of the lexical processing modules
       and tools (lttoolbox)

       This tool is part of the apertium machine translation architecture.

       lt-proc [ -a | -b | -o | -c | -d | -e | -g | -n | -p | -s | -t | -v |
       -h -z -w ] fst_file [input_file [output_file]]

       lt-proc [ --analysis | --bilingual | --surf-bilingual |
       --case-sensitive | --debugged-gen | --decompose-nouns | --generation |
       --non-marked-gen | --tagged-gen | --post-generation | --sao |
       --transliteration | --null-flush --dictionary-case
       --decompose-compounds | --version | --help ] fst_file
       [input_file [output_file]]

       lt-proc is the application responsible for providing the four lexical
       processing functionalities:

              o morphological analyser (option -a)

              o lexical transfer (option -b)

              o morphological generator (option -g)

              o post-generator (option -p)

       It accomplishes these tasks by reading binary files containing a
       compact and efficient representation of dictionaries (a class of
       finite-state transducers called augmented letter transducers). These
       files are generated by lt-comp(1).
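
       The idea of a letter transducer can be illustrated with a small
       Python toy. Every transition reads one input symbol and writes one
       output symbol, and either of them may be empty (epsilon). Tags such as
       <vblex> count as single symbols. This is a sketch under those
       assumptions only. It is an unminimised tree, not the binary format
       that lt-comp(1) writes. The class, its methods and the sample entries
       are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest

EPS = ""  # epsilon: the transition reads or writes nothing


class LetterTransducer:
    """Toy letter transducer stored as a tree of (input, output) transitions."""

    def __init__(self) -> None:
        self.trans = defaultdict(list)  # state -> [(in_sym, out_sym, target)]
        self.finals = set()
        self.n_states = 1  # state 0 is the initial state

    def add(self, surface: list[str], lexical: list[str]) -> None:
        """Add one surface/lexical pair, aligning symbols left to right."""
        state = 0
        for a, b in zip_longest(surface, lexical, fillvalue=EPS):
            for x, y, target in self.trans[state]:
                if (x, y) == (a, b):  # share an existing prefix
                    state = target
                    break
            else:
                self.trans[state].append((a, b, self.n_states))
                state = self.n_states
                self.n_states += 1
        self.finals.add(state)

    def analyse(self, word: str) -> list[str]:
        """Return the output of every path that consumes all of `word`."""
        results: list[str] = []

        def walk(state: int, i: int, out: list[str]) -> None:
            if i == len(word) and state in self.finals:
                results.append("".join(out))
            for a, b, target in self.trans[state]:
                if a == EPS:
                    walk(target, i, out + [b])
                elif i < len(word) and word[i] == a:
                    walk(target, i + 1, out + [b])

        walk(0, 0, [])
        return results


if __name__ == "__main__":
    t = LetterTransducer()
    t.add(list("cantar"), list("cantar") + ["<vblex>", "<inf>"])
    t.add(list("canto"), list("cantar") + ["<vblex>", "<pri>", "<p1>", "<sg>"])
    print(t.analyse("canto"))  # ['cantar<vblex><pri><p1><sg>']
```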

       Some characters (`[', `]', `$', `^', `/', `+') are special characters
       used for formatting and encapsulation. They must be escaped with a
       backslash (`\') when they are meant literally. For instance, text
       enclosed in `[' ... `]' is treated as format and ignored, and a
       lexical unit has the format `^...$'.
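
       The escaping rule can be sketched in Python. This sketch assumes that
       the backslash is the escape character, as in the Apertium stream
       format, so the backslash itself must also be escaped. The helper names
       are hypothetical. parse_unit splits one `^...$' unit at unescaped `/'
       separators into the surface form and its analyses.

```python
# Characters listed above as special, plus the backslash itself, which is
# assumed to be the escape character and so must be escaped as well.
SPECIAL = set("[]$^/+\\")


def escape(text: str) -> str:
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def split_unescaped(text: str, sep: str) -> list[str]:
    """Split on `sep`, skipping separators that are escaped with a backslash.

    Escape pairs are kept as they are, so each piece is still escaped text.
    """
    pieces, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            current.append(text[i : i + 2])  # keep the escape pair intact
            i += 2
        elif text[i] == sep:
            pieces.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    pieces.append("".join(current))
    return pieces


def parse_unit(unit: str) -> tuple[str, list[str]]:
    """Split '^surface/analysis1/analysis2$' into (surface, [analyses])."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError(f"not a ^...$ lexical unit: {unit!r}")
    surface, *analyses = split_unescaped(unit[1:-1], "/")
    return surface, analyses


if __name__ == "__main__":
    print(escape("50$ [net]"))  # 50\$ \[net\]
    print(parse_unit("^cantar/cantar<vblex><inf>$"))
    # ('cantar', ['cantar<vblex><inf>'])
```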

       -a, --analysis
              Tokenizes the text into surface forms (lexical units as they
              appear in texts) and delivers, for each surface form, one or
              more lexical forms consisting of lemma, lexical category and
              morphological inflection information. Tokenization is not
              straightforward because of contractions on the one hand and
              multi-word lexical units on the other. For contractions, the
              system reads in a single surface form and delivers the
              corresponding sequence of lexical forms. Multi-word surface
              forms are analysed in a left-to-right, longest-match fashion.
              Multi-word surface forms may be invariable (such as a
              multi-word preposition or conjunction) or inflected (for
              example, in Spanish, "echaban de menos", "they missed", is a
              form of the imperfect indicative tense of the verb "echar de
              menos", "to miss"). Limited support for some kinds of
              discontinuous multi-word units is also available. Analysis of
              single-word surface forms produces output such as "cantar" ->
              `^cantar/cantar<vblex><inf>$'.
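
              A toy segmenter over whitespace-separated words can illustrate
              the left-to-right, longest-match policy. It is only a sketch of
              the policy. lt-proc works on the compiled transducer and does
              not look up word sequences in a set. The lexicon, the max_len
              limit and the function name below are hypothetical.

```python
# Hypothetical toy lexicon of known surface forms, single- and multi-word.
LEXICON = {"echaban", "de", "menos", "echaban de menos", "a", "pesar", "a pesar de"}


def longest_match(words: list[str], lexicon: set[str], max_len: int = 4) -> list[str]:
    """Segment `words` left to right, always taking the longest known unit."""
    units, i = [], 0
    while i < len(words):
        # Try the longest candidate first; a single word is always accepted,
        # so unknown words become one-word units of their own.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i : i + n])
            if n == 1 or candidate in lexicon:
                units.append(candidate)
                i += n
                break
    return units


if __name__ == "__main__":
    print(longest_match("ellos echaban de menos a pesar de todo".split(), LEXICON))
    # ['ellos', 'echaban de menos', 'a pesar de', 'todo']
```

              Trying the longest span first is what makes "echaban de menos"
              win over the three separate words "echaban", "de" and "menos".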

       -b, --bilingual
              Performs lexical transfer, attaching queues of morphological
              symbols not specified in the dictionaries. As in analysis
              mode, multiple target-language lexical forms are supported for
              a given source-language lexical form. Typically works on the
              output of apertium-pretransfer.
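
              The following is a simplified model of the tag queue. It
              assumes a toy bilingual dictionary keyed by lemma and leading
              tags, and all names are hypothetical. The longest matching tag
              prefix is translated. The remaining tags are copied onto every
              target form, and several target forms are separated by `/'.
              The `@' mark for forms missing from the dictionary is an
              assumption borrowed from common Apertium usage. The text above
              does not state it.

```python
import re

TAG_RE = re.compile(r"<([^<>]+)>")

# Hypothetical toy bilingual dictionary: (lemma, leading tags) -> target forms.
BIDIX = {
    ("gato", ("n",)): ["cat<n>"],
    ("banco", ("n",)): ["bank<n>", "bench<n>"],
}


def split_form(form: str) -> tuple[str, list[str]]:
    """'gato<n><m><sg>' -> ('gato', ['n', 'm', 'sg'])."""
    return form.split("<", 1)[0], TAG_RE.findall(form)


def transfer(form: str) -> str:
    """Translate one source lexical form, copying its unmatched tag queue."""
    lemma, tags = split_form(form)
    # Longest tag prefix first, so the dictionary specifies as much as it can.
    for k in range(len(tags), -1, -1):
        targets = BIDIX.get((lemma, tuple(tags[:k])))
        if targets:
            queue = "".join(f"<{t}>" for t in tags[k:])
            return f"^{form}/" + "/".join(t + queue for t in targets) + "$"
    # Assumed convention for forms missing from the dictionary.
    return f"^{form}/@{form}$"


if __name__ == "__main__":
    print(transfer("banco<n><m><sg>"))
    # ^banco<n><m><sg>/bank<n><m><sg>/bench<n><m><sg>$
```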

       -o, --surf-bilingual
              As with -b, but takes input with surface forms from
              apertium-tagger -p; if a lexical form is not found in the
              bilingual dictionary, the surface form of the word is output.

       -c, --case-sensitive
              Use the literal case of the incoming characters.

       -d, --debugged-gen
              Morphological generation (like -g) in debugging mode.

       -e, --decompose-compounds
	      Try to treat unknown words as compounds, and decompose them.

       -w, --dictionary-case
              Use the case information contained in the lexicon instead of
              the surface case (only applied in analysis mode).

       -g, --generation
              Delivers a target-language surface form for each
              target-language lexical form, by suitably inflecting it.

       -n, --non-marked-gen
              Morphological generation (like -g) but without unknown word
              marks (asterisk `*').
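
              A toy generation table can sketch the difference between -g
              and -n. All names in it are hypothetical. Unknown words arrive
              from analysis marked with an asterisk. Here -g keeps the mark
              and -n drops it. The sketch also returns `#' followed by the
              lexical form when nothing can be generated. That convention is
              an assumption taken from common Apertium usage.

```python
# Hypothetical toy generation table: lexical form -> surface form.
GENERATION = {
    "cantar<vblex><inf>": "cantar",
    "gato<n><m><pl>": "gatos",
}


def generate(unit: str, marked: bool = True) -> str:
    """Generate a surface form for one '^...$' unit.

    marked=True mimics -g (keep the '*' on unknown words);
    marked=False mimics -n (drop it).
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError(f"not a ^...$ lexical unit: {unit!r}")
    body = unit[1:-1]
    if body.startswith("*"):  # unknown word passed through from analysis
        return body if marked else body[1:]
    # Assumed convention: '#' flags a lexical form that could not be generated.
    return GENERATION.get(body, "#" + body)


if __name__ == "__main__":
    for u in ["^gato<n><m><pl>$", "^*blorp$"]:
        print(generate(u), generate(u, marked=False))
```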

       -l, --tagged-gen
              Morphological generation (like -g) but retaining
              part-of-speech information.

       -p, --post-generation
              Performs orthographical operations such as contractions and
              apostrophations. The post-generator is usually dormant (just
              copies the input to the output) until a special alarm symbol
              contained in some target-language surface forms wakes it up to
              perform a particular string transformation if necessary; then
              it goes back to sleep.
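
              The dormant and wake-up behaviour can be sketched as a copying
              loop that consults its rules only after an alarm symbol. Here
              the symbol is `~', which Apertium language pairs commonly use.
              The text above does not name the symbol, so this is an
              assumption. The Spanish-style rule table is also hypothetical.
              Longer rules are tried first, and a rule fires only at a word
              boundary.

```python
# Hypothetical rule table: text right after the alarm symbol -> replacement.
RULES = {"de el": "del", "a el": "al"}
ALARM = "~"  # assumed alarm symbol


def post_generate(text: str) -> str:
    """Copy text to the output, rewriting only where the alarm symbol appears."""
    out, i = [], 0
    while i < len(text):
        if text[i] != ALARM:
            out.append(text[i])  # dormant: plain copy
            i += 1
            continue
        i += 1  # the alarm symbol itself is never copied
        rest = text[i:]
        for src in sorted(RULES, key=len, reverse=True):
            # Fire only if the rule matches a whole word.
            if rest.startswith(src) and (
                len(rest) == len(src) or not rest[len(src)].isalpha()
            ):
                out.append(RULES[src])
                i += len(src)
                break
        # If no rule matched, copying simply resumes after the alarm symbol.
    return "".join(out)


if __name__ == "__main__":
    print(post_generate("vengo ~de el mercado"))  # vengo del mercado
```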

       -s, --sao
              Input is processed in the orthoepikon (previously `sao')
              annotation system format.

       -t, --transliteration
              Apply a transliteration dictionary.

       -z, --null-flush
              Flush output on the null character.
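
              When lt-proc is kept running as a long-lived process, -z lets a
              client send one block of text at a time. The Python sketch
              below assumes that lt-proc -z answers each NUL-terminated block
              with its output followed by a NUL byte. The class name and the
              dictionary file name in the usage comment are hypothetical.

```python
import io
import subprocess
from typing import BinaryIO


def read_until_null(stream: BinaryIO) -> bytes:
    """Read one byte at a time until a NUL byte or EOF; the NUL is dropped."""
    data = bytearray()
    while True:
        b = stream.read(1)
        if not b or b == b"\0":
            return bytes(data)
        data += b


class NullFlushProcess:
    """Keep one `lt-proc -z ...` process alive and exchange NUL-terminated blocks."""

    def __init__(self, args: list[str]) -> None:
        self.proc = subprocess.Popen(
            args, stdin=subprocess.PIPE, stdout=subprocess.PIPE
        )

    def process(self, text: str) -> str:
        assert self.proc.stdin is not None and self.proc.stdout is not None
        self.proc.stdin.write(text.encode("utf-8") + b"\0")
        self.proc.stdin.flush()  # make sure the block actually reaches lt-proc
        return read_until_null(self.proc.stdout).decode("utf-8")

    def close(self) -> None:
        assert self.proc.stdin is not None
        self.proc.stdin.close()
        self.proc.wait()


if __name__ == "__main__":
    # Offline check of the reader; a real session would look like
    #   p = NullFlushProcess(["lt-proc", "-z", "-a", "analyser.bin"])
    #   print(p.process("cantar"))
    #   p.close()
    print(read_until_null(io.BytesIO(b"^cantar/cantar<vblex><inf>$\0rest")))
```

              Reading one byte at a time keeps the reader from consuming
              output that belongs to the next block.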

       -v, --version
	      Display the version number.

       -h, --help
	      Display this help.

       fst_file The input compiled dictionary.

       lt-expand(1), lt-comp(1), apertium-tagger(1), apertium(1).

       Lots of...lurking in the	dark and waiting for you!

       (c) 2005,2006 Universitat d'Alacant / Universidad de Alicante.

				  2006-03-23			    lt-proc(1)

