apertium(1)							   apertium(1)

       apertium - This application is part of (apertium)

       This  tool  is  part  of	the apertium machine translation architecture:

       apertium [-d datadir] [-f format] [-u] [-a] {language-pair} [infile [outfile]]

       apertium is the application that most people will use, as it simplifies
       the use of the apertium/lt-toolbox tools for machine translation.

       This tool eases the use of lt-toolbox (which contains all the lexical
       processing modules and tools) and apertium (which contains the rest of
       the engine) by providing a single front end to the end user.

       The modules behind the apertium machine translation architecture are,
       in order:
	      o	de-formatter: Separates	the text to  be	 translated  from  the
	      format information.

              o morphological-analyser: Tokenizes the text into surface forms.

              o part-of-speech tagger: Chooses one surface form among homo-

	      o	lexical	transfer module: Reads	each  source-language  lexical
	      form and delivers	a corresponding	target-language	lexical	form.

	      o	 structural  transfer module: Detects fixed-length patterns of
	      lexical forms (chunks or phrases)	needing	special	processing due
	      to  grammatical  divergences  between the	two languages and per-
	      forms the	corresponding transformations.

	      o	morphological generator: Delivers  a  target-language  surface
	      form for each target-language lexical form, by suitably inflect-
	      ing it.

	      o	post-generator:	Performs  orthographical  operations  such  as
	      contractions and apostrophations.

	      o	 re-formatter: Restores	the format information encapsulated by
	      the de-formatter into the	translated text	and removes the	encap-
	      sulation	sequences  used	 to  protect certain characters	in the
	      source text.
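
       The order of the modules above can be pictured as a chain in which
       each stage consumes the output of the previous one. The following
       Python sketch is only an illustration of that ordering: the stage
       functions are hypothetical placeholders that pass the text through
       unchanged, and none of the names are part of apertium or lt-toolbox.

```python
from functools import reduce
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[str], str]]


def passthrough(text: str) -> str:
    """Hypothetical placeholder: a real stage would transform the text."""
    return text


# Stage names follow the order listed in this manual page.
# The functions are stand-ins, not the real modules.
STAGES: List[Stage] = [
    ("de-formatter", passthrough),
    ("morphological analyser", passthrough),
    ("part-of-speech tagger", passthrough),
    ("lexical transfer", passthrough),
    ("structural transfer", passthrough),
    ("morphological generator", passthrough),
    ("post-generator", passthrough),
    ("re-formatter", passthrough),
]


def run_pipeline(text: str,
                 stages: List[Stage] = STAGES,
                 trace: Optional[List[str]] = None) -> str:
    """Feed the text through every stage, in order."""
    def step(acc: str, stage: Stage) -> str:
        name, func = stage
        if trace is not None:
            trace.append(name)  # record the order in which stages ran
        return func(acc)

    return reduce(step, stages, text)
```

       Passing a list as trace records the stage names in the order they
       were applied, which makes the chaining visible without implying
       anything about how the real modules are implemented.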

       -d datadir The directory holding the linguistic data. By default, the
       expected installation path is used.

       language-pair The language pair:	LANG1-LANG2 (for instance es-ca	or ca-

       -f format Specifies the format of the input and output files. It can
       take these values:
	      o	txt (default value) Input and output files are in text format.

	      o	 html Input and	output files are in "html" format. This	"html"
	      is the one accepted by the vast majority of web browsers.

	      o	html-noent Input and output files are in  "html"  format,  but
	      preserving  native  encoding  characters	rather than using HTML
	      text entities.

	      o	rtf Input and output files are in "rtf"	format.	 The  accepted
	      "rtf"  is	 the one generated by Microsoft	WordPad	(C) and	Micro-
	      soft Office (C) up to and	including Office-97.
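
       A caller that handles several kinds of files may want to derive the
       -f value from a file name. The helper below is a sketch under
       assumptions: the function name and the extension mapping are
       hypothetical conveniences, not something apertium provides, and
       unrecognised extensions fall back to the documented default, txt.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a documented -f value.
# html-noent is left out because no extension distinguishes it from html.
EXTENSION_TO_FORMAT = {
    ".txt": "txt",
    ".html": "html",
    ".htm": "html",
    ".rtf": "rtf",
}


def guess_format(filename: str) -> str:
    """Return a value for -f based on the file extension (default: txt)."""
    suffix = Path(filename).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix, "txt")
```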

       -u Disable marking of unknown words with	the '*'	character.

       -a Enable marking of disambiguated words	with the '=' character.
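
       When -u is not given, unknown words can be collected from the
       translated text by looking for the '*' marker. The sketch below
       assumes that the marker is placed immediately before the word it
       flags; this manual page does not state the position, so treat that
       as an assumption. The function name is hypothetical.

```python
import re
from typing import List

# Assumption: '*' directly precedes each unknown word in the output.
UNKNOWN_WORD = re.compile(r"\*(\w+)")


def unknown_words(translated: str) -> List[str]:
    """Return the words flagged with '*' in translated text, in order."""
    return UNKNOWN_WORD.findall(translated)
```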

       -m memory.tmx Use a translation memory to recycle translations.

       -o direction Translation direction to use with the translation memory.
       By default, 'direction' is used instead.

       -l Lists the available translation directions and exits.

       direction Typically LANG1-LANG2, but see modes.xml in the language data.

       These are the two files that can be used with this command:

       infile Input file (stdin by default).

       outfile Output file (stdout by default).
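
       To drive the command from a script, the options described in this
       page can be assembled into an argument list. The following Python
       sketch is one possible way to do that, not an official interface:
       the function and parameter names are hypothetical, it assumes that
       apertium is on the PATH and that the requested language pair is
       installed, and it assumes that outfile may only follow infile.

```python
import subprocess
from typing import List, Optional


def build_command(language_pair: str,
                  datadir: Optional[str] = None,
                  fmt: Optional[str] = None,
                  mark_unknown: bool = True,
                  mark_disambiguated: bool = False,
                  infile: Optional[str] = None,
                  outfile: Optional[str] = None) -> List[str]:
    """Assemble an apertium argument list from the documented options."""
    if outfile is not None and infile is None:
        # Assumption: outfile is positional and must come after infile.
        raise ValueError("outfile requires infile")
    cmd = ["apertium"]
    if datadir is not None:
        cmd += ["-d", datadir]
    if fmt is not None:
        cmd += ["-f", fmt]
    if not mark_unknown:
        cmd.append("-u")  # disable '*' marking of unknown words
    if mark_disambiguated:
        cmd.append("-a")  # enable '=' marking of disambiguated words
    cmd.append(language_pair)
    if infile is not None:
        cmd.append(infile)
    if outfile is not None:
        cmd.append(outfile)
    return cmd


def translate(text: str, language_pair: str, **options) -> str:
    """Send text on stdin and return what apertium writes to stdout."""
    result = subprocess.run(build_command(language_pair, **options),
                            input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

       For instance, build_command("es-ca", fmt="html", mark_unknown=False)
       yields the list apertium, -f, html, -u, es-ca. Keeping the argument
       assembly separate from the subprocess call makes it possible to
       inspect the command without having apertium installed.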

       lt-proc(1), lt-comp(1), lt-expand(1), apertium-tagger(1).

       Lots of...lurking in the	dark and waiting for you!

       (c) 2005,2006 Universitat d'Alacant  /  Universidad  de	Alicante.  All
       rights reserved.

				  2006-03-08			   apertium(1)

