LT-PROC(1)		FreeBSD	General	Commands Manual		    LT-PROC(1)

NAME
     lt-proc -- lexical processor for Apertium

SYNOPSIS
     lt-proc [-a | -b | -o | -c | -d | -e | -g | -n | -p | -s | -t | -v | -h |
	     -z | -w] [-W] [-N N] [-L N] [-i icx_file] fst_file
	     [input_file [output_file]]

DESCRIPTION
     lt-proc is the application responsible for providing the four lexical
     processing	functionalities:

     o	 morphological analyser (option -a)

     o	 lexical transfer (option -b)

     o	 morphological generator (option -g)

     o	 post-generator (option -p)

     It	accomplishes these tasks by reading binary files containing a compact
     and efficient representation of dictionaries (a class of finite-state
     transducers called	augmented letter transducers).	These files are	gener-
     ated by lt-comp(1).

     Some characters (`[', `]', `$', `^', `/', `+') are special: they are used
     for format and encapsulation, and must be escaped when they are meant
     literally.  For instance, text enclosed in `['...`]' is ignored, and each
     lexical unit is delimited as `^...$'.
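
     The escaping rule above can be sketched in a few lines of Python.  This
     is an illustration only, not part of lt-proc: the function name
     escape_literal is hypothetical, and the use of a backslash as the escape
     character is an assumption based on the usual Apertium stream format.
     Only the special characters listed in this manual are handled.

```python
# Characters this manual lists as special in the stream format.
SPECIAL_CHARS = set("[]$^/+")


def escape_literal(text: str, escape: str = "\\") -> str:
    """Prefix each special character (and the escape itself) with a backslash.

    Assumption: backslash is the escape character, as in the usual
    Apertium stream format. This helper is hypothetical, not an lt-proc API.
    """
    out = []
    for ch in text:
        if ch in SPECIAL_CHARS or ch == escape:
            out.append(escape)
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(escape_literal("a/b [c]"))
```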

OPTIONS
     -a, --analysis
	     Tokenizes the text	in surface forms (lexical units	as they	appear
	     in	texts) and delivers, for each surface form, one	or more	lexi-
	     cal forms consisting of lemma, lexical category and morphological
	     inflection	information.  Tokenization is not straightforward due
	     to	the existence, on the one hand,	of contractions, and, on the
	     other hand, of multi-word lexical units.  For contractions, the
	     system reads in a single surface form and delivers	the corre-
	     sponding sequence of lexical forms.  Multi-word surface forms are
	     analysed in a left-to-right, longest-match	fashion.  Multi-word
	     surface forms may be invariable (such as a multi-word preposition
	     or conjunction) or inflected (for example, in Spanish, "echaban
	     de menos", "they missed", is a form of the imperfect indicative
	     tense of the verb "echar de menos", "to miss").  Limited support
	     for some kinds of discontinuous multi-word units is also
	     available.  The analysis of single-word surface forms produces
	     output like the following examples:

	     "cantar" -> "^cantar/cantar<vblex><inf>$" or "daba" ->

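     The left-to-right, longest-match behaviour described for -a can be
     illustrated with a toy word-level tokenizer.  This is a simplified
     sketch, not the lt-proc algorithm: the real program works on characters
     with a finite-state transducer and preserves blanks.  The dictionary
     TOY_DICT, its tags, the function analyse and the `*' marking of unknown
     words in the output are all assumptions made for illustration.

```python
# Hypothetical toy dictionary: surface form -> list of lexical forms.
TOY_DICT = {
    "echaban": ["echar<vblex><pii><p3><pl>"],
    "echaban de menos": ["echar de menos<vblex><pii><p3><pl>"],
    "cantar": ["cantar<vblex><inf>"],
}


def analyse(text, dictionary=TOY_DICT, max_words=3):
    """Tokenize left to right, always preferring the longest known match."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_words, len(words) - i), 0, -1):
            surface = " ".join(words[i:i + n])
            if surface in dictionary:
                analyses = "/".join(dictionary[surface])
                out.append("^%s/%s$" % (surface, analyses))
                i += n
                break
        else:
            # Assumption: unknown words are marked with an asterisk.
            out.append("^%s/*%s$" % (words[i], words[i]))
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(analyse("echaban de menos"))
    print(analyse("cantar xyz"))
```

     Because the three-word entry is tried before the single word "echaban",
     the multi-word unit wins whenever the full sequence is present.
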
     -b, --bilingual
	     Performs lexical transfer, attaching queues (trailing sequences)
	     of morphological symbols not specified in the dictionaries.  Like
	     the analysis mode, it supports multiple target-language lexical
	     forms for a given source-language lexical form.  It typically
	     works on the output of apertium-pretransfer(1).

     -o, --surf-bilingual
	     Like -b, but takes input with surface forms from
	     apertium-tagger(1) -p; if the lexical form is not found in the
	     bilingual dictionary, it outputs the surface form of the word.

     -c, --case-sensitive
	     Use the literal case of the incoming characters

     -d, --debugged-gen
	     Morphological generation (like -g) that keeps all marks and
	     debugging information in the output.

     -e, --decompose-compounds
	     Try to treat unknown words	as compounds, and decompose them.

     -w, --dictionary-case
	     Use the case information contained	in the lexicon,	instead	of the
	     surface case (only	applied	in analysis mode).

     -g, --generation
	     Delivers a	target-language	surface	form for each target-language
	     lexical form, by suitably inflecting it.

     -n, --non-marked-gen
	     Morphological generation (like -g)	but without unknown word marks
	     (asterisk `*').

     -l, --tagged-gen
	     Morphological generation (like -g) but retaining the
	     part-of-speech tags.

     -p, --post-generation
	     Performs orthographical operations	such as	contractions and apos-
	     trophations.  The post-generator is usually dormant (just copies
	     the input to the output) until a special alarm symbol contained
	     in	some target-language surface forms wakes it up to perform a
	     particular	string transformation if necessary; then it goes back
	     to	sleep.
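
     The dormant/wake-up behaviour of the post-generator can be sketched as
     follows.  This is a toy model, not the lt-proc implementation: the alarm
     symbol `~', the contraction rules in RULES and the function
     post_generate are assumptions for illustration, and the sketch ignores
     word boundaries and case handling.

```python
ALARM = "~"  # assumed wake-up symbol
RULES = {"de el": "del", "a el": "al"}  # hypothetical contraction rules


def post_generate(text, rules=RULES, alarm=ALARM):
    """Copy input to output; transform only right after an alarm symbol."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != alarm:
            out.append(text[i])  # dormant: plain copy
            i += 1
            continue
        i += 1  # woken up: consume the alarm symbol itself
        # Prefer the longest matching rule.
        for source in sorted(rules, key=len, reverse=True):
            if text.startswith(source, i):
                out.append(rules[source])
                i += len(source)
                break
        # With or without a match, go back to sleep and keep copying.
    return "".join(out)


if __name__ == "__main__":
    print(post_generate("vengo ~de el cine"))
```

     Input without an alarm symbol passes through unchanged, which matches
     the "just copies the input to the output" description above.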

     -s, --sao
	     Input processing is in orthoepikon	(previously sao) annotation
	     system format:

     -t, --transliteration
	     Apply a transliteration dictionary

     -i	icx_file, --ignored-chars icx_file
	     Ignores characters	specified in the file icx_file

     -z, --null-flush
	     Flush output on the null character
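
     A program that keeps lt-proc -z running as a long-lived filter needs to
     know where one response ends.  The sketch below shows one way a consumer
     might read output up to the next null character.  The helper
     read_until_null is hypothetical, and the assumption that each flushed
     chunk ends with a null byte is made for illustration only.

```python
import io


def read_until_null(stream):
    """Read bytes from a binary stream up to (not including) the next NUL.

    Returns the bytes read so far if the stream ends first.
    """
    chunk = bytearray()
    while True:
        b = stream.read(1)
        if not b or b == b"\0":
            return bytes(chunk)
        chunk += b


if __name__ == "__main__":
    # Stand-in for the output pipe of a running process.
    fake_output = io.BytesIO(b"^a/a<n>$\0^b/b<n>$\0")
    print(read_until_null(fake_output))
    print(read_until_null(fake_output))
```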

     -C, --careful-case
	     Use dictionary case if present, else surface

     -N N, --analyses N
	     Output no more than N analyses (if the transducer is weighted,
	     the N best analyses).

     -L N, --weight-classes N
	     Output no more than the N best weight classes (where analyses
	     with equal weight constitute a class).
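
     The difference between -N and -L can be shown with a small sketch.  It
     is not taken from the lt-proc sources: the function names are
     hypothetical, analyses are modelled as (analysis, weight) pairs, and
     the sketch assumes that a lower weight means a better analysis.

```python
def n_best(analyses, n):
    """Keep at most n analyses, best (lowest weight) first."""
    return sorted(analyses, key=lambda a: a[1])[:n]


def best_weight_classes(analyses, n):
    """Keep every analysis whose weight is among the n best distinct weights."""
    ranked = sorted(analyses, key=lambda a: a[1])
    kept_weights = sorted({weight for _, weight in ranked})[:n]
    return [a for a in ranked if a[1] in kept_weights]


if __name__ == "__main__":
    # Hypothetical weighted analyses; "a" and "c" form one weight class.
    analyses = [("a", 1.0), ("b", 2.0), ("c", 1.0), ("d", 3.0)]
    print(n_best(analyses, 2))
    print(best_weight_classes(analyses, 2))
```

     With the sample data, limiting to two analyses keeps only "a" and "c",
     while limiting to two weight classes also keeps "b", because the two
     analyses with equal weight count as a single class.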

     -W, --show-weights
	     Print final analysis weights (if any)

     -v, --version
	     Display the version number.

     -h, --help
	     Display this help.

FILES
     fst_file
	     The input compiled dictionary.

SEE ALSO
     apertium(1), apertium-tagger(1), lt-comp(1), lt-expand(1)

COPYRIGHT
     Copyright (C) 2005, 2006 Universitat d'Alacant / Universidad de Alicante.
     This is free software.  You may redistribute copies of it under the terms
     of	the GNU	General	Public License:

BUGS
     Many... lurking in the dark and waiting for you!

Apertium			March 23, 2006			      Apertium

