RECOLL.CONF(5)		      File Formats Manual		RECOLL.CONF(5)

       recoll.conf - main personal configuration file for Recoll

       This  file  defines  the	 index	configuration for the Recoll full-text
       search system.

       The system-wide configuration file is normally located inside
       /usr/[local]/share/recoll/examples. Any parameter set in the common
       file may be overridden by setting it in the personal configuration
       file, by default: $HOME/.recoll/recoll.conf

       Please note that while I try to keep this manual page reasonably up to
       date, it will frequently lag behind the current state of the software.
       The best sources of information about the configuration are the
       comments in the system-wide configuration file and the user manual,
       which you can access from the recoll GUI help menu or on the recoll
       web site.

       A short extract of the file might look as follows:

	      #	Space-separated	list of	directories to index.
	      topdirs =	 ~/docs	/usr/share/doc

	      defaultcharset = utf-8

       There are three kinds of lines:

              o      Comment or empty

              o      Parameter assignment

              o      Section definition

       Empty lines or lines beginning with # are ignored.

       Assignment lines are in the form 'name = value'.

       Section lines allow redefining a parameter for a directory subtree.
       Some of the parameters used for indexing are looked up hierarchically
       from the more to the less specific. Not all parameters can be
       meaningfully redefined; this is specified for each in the next
       section.

       The tilde character (~) is expanded in file names to the	 name  of  the
       user's home directory.

       Where  values  are  lists, white	space is used for separation, and ele-
       ments with embedded spaces can be quoted	with double-quotes.
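
       As a minimal sketch of these rules, the fragment below combines the
       three kinds of lines. The directory names are hypothetical, and the
       bracketed section syntax is taken from the Recoll user manual rather
       than from this page:

       ```conf
       # Comment lines start with '#'; empty lines are ignored.

       # Parameter assignment. '~' expands to the user's home directory, and
       # the list element with an embedded space is double-quoted.
       topdirs = ~/docs "~/My Documents" /usr/share/doc

       # Section definition: the assignment below only applies to this subtree.
       [~/docs/legacy]
       defaultcharset = iso-8859-1
       ```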

       topdirs = string
              Space-separated list of files or directories to recursively
              index. Defaults to ~ (indexes $HOME). Symbolic links in the
              list are followed, regardless of the value of the followLinks
              variable.

       monitordirs = string
	      Space-separated  list of files or	directories to monitor for up-
	      dates. When running the real-time	indexer, this allows  monitor-
	      ing  only	 a subset of the whole indexed area. The elements must
	      be included in the tree defined by the 'topdirs' members.

       skippedNames = string
              Files and directories which should be ignored. White
              space-separated list of simple wildcard patterns (not paths:
              they must contain no '/'), which are tested against file and
              directory names. The list in the default configuration does not
              exclude hidden directories (names beginning with a dot), which
              means that it may index quite a few things that you do not
              want. On the other hand, email user agents like Thunderbird
              usually store messages in hidden directories, and you probably
              want these indexed. One possible solution is to have ".*" in
              "skippedNames", and add things like "~/.thunderbird"
              "~/.evolution" to "topdirs". Not even the file names are
              indexed for patterns in this list; see the "noContentSuffixes"
              variable for an alternative approach which indexes the file
              names. Can be redefined for any subtree.
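
              The solution suggested above could be sketched as follows. This
              is only an illustration: it assumes that the "skippedNames+"
              variant described further down is used to extend the default
              list rather than replace it.

              ```conf
              # Ignore hidden files and directories by default...
              skippedNames+ = .*

              # ...but still index the mail stores, by listing them explicitly.
              topdirs = ~ ~/.thunderbird ~/.evolution
              ```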

       skippedNames- = string
              List of name patterns to remove from the default skippedNames
              list.

       skippedNames+ = string
              List of name patterns to add to the default skippedNames list.

       noContentSuffixes = string
	      List of name endings (not	 necessarily  dot-separated  suffixes)
	      for  which  we don't try MIME type identification, and don't un-
	      compress or index	content. Only the names	will be	indexed.  This
	      complements  the	now  obsoleted	recoll_noindex	list  from the
	      mimemap file, which will go away in a future release  (the  move
	      from  mimemap to recoll.conf allows editing the list through the
	      GUI). This is different from skippedNames	because	these are name
	      ending  matches  only (not wildcard patterns), and the file name
              itself gets indexed normally. This can be redefined for
              subdirectories.

       noContentSuffixes- = string
	      List  of	name  endings to remove	from the default noContentSuf-
	      fixes list.

       noContentSuffixes+ = string
              List of name endings to add to the default noContentSuffixes
              list.
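
              For instance, the fragment below (a sketch only, with
              arbitrarily chosen endings) keeps the default list and adds two
              endings, so that the names of such files remain searchable
              while their contents are left alone:

              ```conf
              # Index only the names of files ending in .iso or .vmdk.
              noContentSuffixes+ = .iso .vmdk
              ```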

       skippedPaths = string
	      Absolute	paths  we  should not go into. Space-separated list of
	      wildcard expressions for absolute	filesystem paths. Must be  de-
	      fined  at	the top	level of the configuration file, not in	a sub-
	      section. Can contain files and  directories.  The	 database  and
	      configuration  directories  will automatically be	added. The ex-
	      pressions	are matched using 'fnmatch(3)' with  the  FNM_PATHNAME
	      flag  set	 by  default.  This  means that	'/' characters must be
	      matched explicitly. You can set 'skippedPathsFnmPathname'	 to  0
	      to  disable the use of FNM_PATHNAME (meaning that	'/*/dir3' will
	      match '/dir1/dir2/dir3').	The default value contains  the	 usual
	      mount  point  for	removable media	to remind you that it is a bad
	      idea to have Recoll work on these	(esp. with the monitor:	 media
              gets indexed on mount, all data gets erased on unmount).
              Explicitly adding '/media/xxx' to the 'topdirs' variable will
              override this.

       skippedPathsFnmPathname = bool
              Set to 0 to override use of FNM_PATHNAME for matching skipped
              paths.
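
              The effect of FNM_PATHNAME can be illustrated with the sketch
              below. The paths are hypothetical, and note that assigning
              skippedPaths directly replaces the default value.

              ```conf
              # With FNM_PATHNAME in effect (the default), '*' does not match
              # '/': this skips /home/me/tmp but not /home/me/work/tmp.
              skippedPaths = /home/*/tmp

              # Uncomment to let '*' also match '/' characters, so that the
              # pattern above would skip /home/me/work/tmp as well.
              #skippedPathsFnmPathname = 0
              ```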

       nowalkfn	= string
	      File name	which will cause its parent directory to  be  skipped.
	      Any  directory  containing a file	with this name will be skipped
	      as if it was part	of the skippedPaths list. Ex: .recoll-noindex

       daemSkippedPaths	= string
	      skippedPaths equivalent specific to real time indexing. This en-
	      ables  having  parts of the tree which are initially indexed but
              not monitored. If daemSkippedPaths is not set, the daemon uses
              skippedPaths.

       zipUseSkippedNames = bool
	      Use  skippedNames	 inside	 Zip archives. Fetched directly	by the
	      rclzip handler. Skip the patterns	defined	by skippedNames	inside
	      Zip   archives.	Can  be	 redefined  for	 subdirectories.   See

       zipSkippedNames = string
	      Space-separated  list  of	 wildcard  expressions	for names that
	      should be	ignored	inside zip archives. This is used directly  by
	      the  zip	handler. If zipUseSkippedNames is not set, zipSkipped-
	      Names defines the	patterns to be skipped inside archives.	If zi-
	      pUseSkippedNames	is  set,  the  two  lists are concatenated and
	      used. Can	be redefined for subdirectories.  See https://www.les-
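
              A possible combination of the two Zip-related variables is
              sketched below, with arbitrary example patterns:

              ```conf
              # Apply the skippedNames patterns inside Zip archives...
              zipUseSkippedNames = 1

              # ...and also skip these names, inside archives only.
              zipSkippedNames = *.bak Thumbs.db
              ```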

       followLinks = bool
	      Follow symbolic links during indexing. The default is to	ignore
	      symbolic	links  to  avoid multiple indexing of linked files. No
	      effort is	made to	avoid duplication when this option is  set  to
	      true.  This  option  can	be  set	 individually  for each	of the
	      'topdirs'	members	by using sections. It can not be changed below
	      the  'topdirs' level. Links in the 'topdirs' list	itself are al-
	      ways followed.
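
              As an illustration, the sketch below follows links under only
              one of two (hypothetical) 'topdirs' members. The bracketed
              section syntax is taken from the Recoll user manual.

              ```conf
              topdirs = ~/docs ~/projects

              # Links are followed under ~/projects only; ~/docs keeps the
              # default behaviour (links ignored).
              [~/projects]
              followLinks = 1
              ```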

       indexedmimetypes	= string
	      Restrictive list of indexed mime types.  Normally	 not  set  (in
	      which  case all supported	types are indexed). If it is set, only
	      the types	from the list will have	their  contents	 indexed.  The
	      names  will  be  indexed anyway if indexallfilenames is set (de-
	      fault). MIME type	names should be	taken from  the	 mimemap  file
	      (the  values may be different from xdg-mime or file -i output in
	      some cases). Can be redefined for	subtrees.

       excludedmimetypes = string
	      List of excluded MIME types. Lets	you exclude  some  types  from
	      indexing.	 MIME type names should	be taken from the mimemap file
	      (the values may be different from	xdg-mime or file -i output  in
              some cases). Can be redefined for subtrees.
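
              The two lists could be used as in the sketch below. The subtree
              is hypothetical, and the type names are assumed to be the ones
              listed in the mimemap file.

              ```conf
              # Top level: never index the contents of these types.
              excludedmimetypes = audio/mpeg image/jpeg

              # In this subtree, only index the contents of these types.
              [~/docs/reports]
              indexedmimetypes = text/plain text/html application/pdf
              ```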

       nomd5types = string
	      Don't  compute  md5 for these types. md5 checksums are used only
	      for deduplicating	results, and can be very expensive to  compute
	      on  multimedia  or  other	big files. This	list lets you turn off
	      md5 computation for selected types. It is	global	(no  redefini-
	      tion for subtrees). At the moment, it only has an	effect for ex-
	      ternal handlers (exec and	execm).	The file types can  be	speci-
	      fied  by	listing	either MIME types (e.g.	audio/mpeg) or handler
	      names (e.g. rclaudio).

       compressedfilemaxkbs = int
	      Size limit for compressed	files. We need to decompress these  in
	      a	 temporary directory for identification, which can be wasteful
	      in some cases. Limit the waste. Negative means no	limit.	0  re-
	      sults in no processing of	any compressed file. Default 50	MB.

       textfilemaxmbs =	int
	      Size limit for text files. Mostly	for skipping monster logs. De-
	      fault 20 MB.

       indexallfilenames = bool
              Index the file names of unprocessed files. Index the names of
              files the contents of which we don't index because of an
              excluded or unsupported MIME type.

       usesystemfilecommand = bool
              Use a system command for file MIME type guessing as a final
              step in file type identification. This is generally useful, but
              will usually cause the indexing of many bogus 'text' files. See
              'systemfilecommand' for the command used.

       systemfilecommand = string
              Command used to guess MIME types if the internal methods fail.
              This should be a "file -i" workalike. The file path will be
              added as a last parameter to the command line. "xdg-mime" works
              better than the traditional "file" command, and is now the
              configured default (with a hard-coded fallback to "file").

       processwebqueue = bool
	      Decide  if  we  process  the Web queue. The queue	is a directory
	      where the	Recoll Web browser plugins create the copies  of  vis-
	      ited pages.

       textfilepagekbs = int
	      Page  size for text files. If this is set, text/plain files will
	      be divided into documents	of approximately this size.  Will  re-
	      duce  memory  usage  at index time and help with loading data in
	      the preview window at query time.	Particularly useful with  very
	      big  files,  such	 as  application  or  system  logs.  Also  see
	      textfilemaxmbs and compressedfilemaxkbs.
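
              The size-related parameters can be combined as in the following
              sketch. The values are illustrative only and are not meant as
              recommendations.

              ```conf
              # Do not decompress compressed files larger than about 50 MB.
              compressedfilemaxkbs = 50000

              # Skip text files larger than 20 MB...
              textfilemaxmbs = 20

              # ...and split the remaining ones into documents of about 1 MB.
              textfilepagekbs = 1000
              ```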

       membermaxkbs = int
	      Size limit for archive members. This is passed to	the filters in
	      the environment as RECOLL_FILTER_MAXMEMBERKB.

       indexStripChars = bool
              Decide if we store character case and diacritics in the index.
              If we do, searches sensitive to case and diacritics can be
              performed, but the index will be bigger, and some marginal
              weirdness may sometimes occur. The default is a stripped index.
              When using multiple indexes for a search, this parameter must
              be defined identically for all. Changing the value implies an
              index reset.

       indexStoreDocText = bool
	      Decide  if  we  store  the documents' text content in the	index.
	      Storing the text allows extracting snippets  from	 it  at	 query
	      time,  instead of	building them from index position data.	 Newer
	      Xapian index formats have	rendered our use of positions list un-
	      acceptably slow in some cases. The last Xapian index format with
	      good performance for the old method is Chert, which  is  default
	      for  1.2,	 still	supported  but	not default in 1.4 and will be
	      dropped in 1.6.  The stored document text	is translated from its
	      original	format to UTF-8	plain text, but	not stripped of	upper-
	      case, diacritics,	or punctuation signs. Storing it increases the
	      index  size by 10-20% typically, but also	allows for nicer snip-
	      pets, so it may be worth enabling	it even	if not strictly	needed
	      for  performance if you can afford the space.  The variable only
	      has an effect when creating an index, meaning that the  xapiandb
	      directory	 must  not  exist yet. Its exact effect	depends	on the
	      Xapian version.  For Xapian 1.4, if the variable is  set	to  0,
	      the  Chert format	will be	used, and the text will	not be stored.
	      If the variable is 1, Glass will be used,	and the	 text  stored.
              For Xapian 1.2, and for versions 1.5 and newer, the index
              format is always the default, but the variable controls if the
              text is stored or not, and the abstract generation method. With
	      Xapian 1.5 and later, and	the variable set to 0, abstract	gener-
	      ation  may be very slow, but this	setting	may still be useful to
	      save space if you	do not use abstract generation at all.

       nonumbers = bool
              Decides if terms will be generated for numbers. For example,
              "123" or "1.5e6" would not be indexed if nonumbers is set
              ("value123" would still be). Numbers are often quite
              interesting to search for, and this should probably not be set
              except for special situations, i.e. scientific documents with
              huge amounts of numbers in them, where setting nonumbers will
              reduce the index size. This can only be set for a whole index,
              not for a subtree.

       dehyphenate = bool
	      Determines  if  we  index	'coworker' also	when the input is 'co-
	      worker'. This is new in version 1.22, and	on by default. Setting
	      the variable to off allows restoring the previous	behaviour.

       backslashasletter = bool
              Process backslash as a normal letter. This may make sense for
              people wanting to index TeX commands as such but is not of much
              general use.

       maxtermlength = int
	      Maximum  term  length. Words longer than this will be discarded.
	      The default is 40	and used to be hard-coded, but it can  now  be
	      adjusted.	You need an index reset	if you change the value.

       nocjk = bool
              Decides if specific East Asian (Chinese, Korean, Japanese)
              character/word splitting is turned off. This will save a small
              amount of CPU if you have no CJK documents. If your document
              base does include such text but you are not interested in
              searching it, setting nocjk may be a significant time and space
              saver.

       cjkngramlen = int
	      This  lets  you adjust the size of n-grams used for indexing CJK
	      text. The	default	value of 2 is  probably	 appropriate  in  most
	      cases. A value of	3 would	allow more precision and efficiency on
              longer words, but the index will be approximately twice as
              large.

       indexstemminglanguages =	string
	      Languages	 for  which to create stemming expansion data. Stemmer
	      names can	be found by executing 'recollindex -l',	 or  this  can
	      also be set from a list in the GUI.

       defaultcharset =	string
	      Default  character set. This is used for files which do not con-
	      tain a character set definition (e.g.: text/plain). Values found
	      inside files, e.g. a 'charset' tag in HTML documents, will over-
	      ride it. If this is not set, the default character  set  is  the
	      one  defined by the NLS environment ($LC_ALL, $LC_CTYPE, $LANG),
	      or ultimately iso-8859-1 (cp-1252	in fact).  If for some	reason
	      you want a general default which does not	match your LANG	and is
              not 8859-1, use this variable. This can be redefined for any
              subtree.

       unac_except_trans = string
              A list of characters, encoded in UTF-8, which should be handled
              specially when converting text to unaccented lowercase. For
              example, in Swedish, the letter a with diaeresis has full
              alphabet citizenship and should not be turned into an a. Each
              element in the space-separated list has the special character
              as first element and the translation following. The handling of
              both the lowercase and upper-case versions of a character
              should be specified, as membership in the list will turn off
              both standard accent and case processing. The value is global
              and affects both indexing and querying.

              Examples:

              Swedish:

                     unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå

              German:

                     unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

              In French, you probably want to decompose oe and ae and nobody
              would type a German ß:

                     unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

              The decompositions in the following value are not performed by
              unac, but it is unlikely that someone would type the composed
              forms in a search:

                     unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

       maildefcharset =	string
	      Overrides	 the  default  character  set for email	messages which
	      don't specify one. This is mainly	useful	for  readpst  (libpst)
	      dumps, which are utf-8 but do not	say so.

       localfields = string
              Set fields on all files (usually of a specific fs area). Syntax
              is the usual: name = value ; attr1 = val1 ; [...]. The value is
              empty, so this needs an initial semi-colon. This is useful, e.g.,
	      for setting the rclaptg field for	application  selection	inside

       testmodifusemtime = bool
	      Use  mtime instead of ctime to test if a file has	been modified.
	      The time is used in addition to the size,	which is always	 used.
	      Setting  this  can  reduce re-indexing on	systems	where extended
	      attributes are used (by some other  application),	 but  not  in-
	      dexed,  because changing extended	attributes only	affects	ctime.
              Notes:

              -      This may prevent detection of change in some marginal
                     file rename cases (the target would need to have the
                     same size and mtime).

              -      You should probably also set noxattrfields to 1 in this
                     case, except if you still prefer to perform xattr
                     indexing, for example if the local file update pattern
                     makes it of value (as in general, there is a risk for
                     pure extended attributes updates without file
                     modification to go undetected).

              Perform a full index reset after changing this.

       noxattrfields = bool
	      Disable  extended	attributes conversion to metadata fields. This
	      probably needs to	be set if testmodifusemtime is set.
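
              The pairing suggested in the two entries above would look like
              the following sketch. Remember that a full index reset is
              needed after changing testmodifusemtime.

              ```conf
              # Detect modifications from mtime and size, ignoring ctime.
              testmodifusemtime = 1

              # Extended attribute changes only affect ctime, so do not turn
              # extended attributes into metadata fields.
              noxattrfields = 1
              ```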

       metadatacmds = string
	      Define commands to gather	external  metadata,  e.g.  tmsu	 tags.
	      There  can  be  several  entries,	separated by semi-colons, each
	      defining which field name	the data goes into and the command  to
	      use.  Don't  forget  the initial semi-colon. All the field names
	      must be different. You can use aliases in	the  "field"  file  if
	      necessary.   As  a  not too pretty hack conceded to convenience,
	      any field	name beginning with "rclmulti" will be taken as	an in-
	      dication that the	command	returns	multiple field values inside a
	      text blob	formatted as a recoll configuration file ("fieldname =
	      fieldvalue"  lines).  The	 rclmultixx  name will be ignored, and
              field names and values will be parsed from the data. Example:

                     metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf

       cachedir	= dfn
	      Top directory for	Recoll data. Recoll data directories are  nor-
	      mally  located  relative	to  the	 configuration directory (e.g.
	      ~/.recoll/xapiandb, ~/.recoll/mboxcache).	If 'cachedir' is  set,
	      the  directories	are  stored  under the specified value instead
	      (e.g. if cachedir	is ~/.cache/recoll, the	default	dbdir would be
	      ~/.cache/recoll/xapiandb).   This	 affects  dbdir,  webcachedir,
	      mboxcachedir, aspellDicDir,  which  can  still  be  individually
	      specified	 to override cachedir.	Note that if you have multiple
              configurations, each must have a different cachedir: there is
              no automatic computation of a subpath under cachedir.
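
              The sketch below shows cachedir together with one individual
              override. The /data path is hypothetical.

              ```conf
              # Data directories now default to locations under this
              # directory, e.g. ~/.cache/recoll/mboxcache.
              cachedir = ~/.cache/recoll

              # An individually specified directory still overrides cachedir.
              dbdir = /data/recoll/xapiandb
              ```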

       maxfsoccuppc = int
	      Maximum  file system occupation over which we stop indexing. The
	      value is a percentage, corresponding to what the	"Capacity"  df
              output column shows. The default value is 0, meaning no
              checking.

       dbdir = dfn
	      Xapian database directory	location.  This	 will  be  created  on
	      first indexing. If the value is not an absolute path, it will be
	      interpreted as relative to cachedir if set, or the configuration
              directory (-c argument or $RECOLL_CONFDIR). If nothing is
              specified, the default is then ~/.recoll/xapiandb/.

       idxstatusfile = fn
	      Name of the scratch file where the indexer process  updates  its
              status. Default: idxstatus.txt inside the configuration
              directory.

       mboxcachedir = dfn
	      Directory	location for storing mbox message offsets cache	files.
	      This  is normally	'mboxcache' under cachedir if set, or else un-
	      der the configuration directory, but it may be useful to share a
	      directory	between	different configurations.

       mboxcacheminmbs = int
	      Minimum mbox file	size over which	we cache the offsets. There is
	      really no	sense in caching offsets for small files. The  default
	      is 5 MB.

       webcachedir = dfn
	      Directory	 where	we  store the archived web pages. This is only
              used by the web history indexing code. Default:
              cachedir/webcache if cachedir is set, else
              $RECOLL_CONFDIR/webcache.

       webcachemaxmbs =	int
	      Maximum  size in MB of the Web archive. This is only used	by the
	      web history indexing code.  Default: 40 MB.  Reducing  the  size
	      will not physically truncate the file.

       webqueuedir = fn
	      The  path	 to the	Web indexing queue. This used to be hard-coded
	      in the old plugin	as ~/.recollweb/ToIndex	so there would	be  no
	      need  or	possibility to change it, but the WebExtensions	plugin
	      now downloads the	files to the user Downloads directory,	and  a
	      script  moves  them  to webqueuedir. The script reads this value
	      from the config so it has	become possible	to change it.

       webdownloadsdir = fn
	      The path to browser downloads directory. This is where  the  new
	      browser  add-on extension	has to create the files. They are then
	      moved by a script	to webqueuedir.
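
              Taken together, the Web-related parameters could be set as in
              this sketch. The downloads path is an assumption about the
              local browser setup, and the queue path simply reuses the
              formerly hard-coded location.

              ```conf
              # Process the Web queue during indexing.
              processwebqueue = 1

              # Where the browser extension creates the files.
              webdownloadsdir = ~/Downloads

              # Where the script moves them for indexing.
              webqueuedir = ~/.recollweb/ToIndex

              # Maximum size of the Web archive, in MB.
              webcachemaxmbs = 40
              ```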

       aspellDicDir = dfn
	      Aspell dictionary	storage	directory location. The	aspell dictio-
	      nary  (aspdict.(lang).rws)  is  normally stored in the directory
              specified by cachedir if set, or under the configuration
              directory.

       filtersdir = dfn
	      Directory	location for executable	input handlers.	If RECOLL_FIL-
	      TERSDIR is set in	the environment, we use	it  instead.  Defaults
              to $prefix/share/recoll/filters. Can be redefined for
              subdirectories.

       iconsdir	= dfn
	      Directory	location for icons. The	only  reason  to  change  this
	      would be if you want to change the icons displayed in the	result
	      list. Defaults to	$prefix/share/recoll/images

       idxflushmb = int
	      Threshold	(megabytes of new data)	where we flush from memory  to
	      disk  index.  Setting this allows	some control over memory usage
	      by the indexer process. A	value of 0 means no explicit flushing,
              which lets Xapian do its own thing, meaning flushing every
              $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted.
              As memory usage depends on average document size, not only
              document count, the Xapian approach is not very useful, and you
              should let Recoll manage the flushes. The program compiled value
	      is 0. The	configured default value (from this file)  is  now  50
	      MB, and should be	ok in many cases.  You can set it as low as 10
	      to conserve memory, but if you are looking  for  maximum	speed,
	      you may want to experiment with values between 20	and 200. In my
	      experience, values beyond	this are always	counterproductive.  If
	      you find otherwise, please drop me a note.

       filtermaxseconds	= int
              Maximum external filter execution time in seconds. Default 1200
              (20 minutes). Set to 0 for no limit. This is mainly to avoid
              infinite
	      loops in postscript files	(

       filtermaxmbytes = int
	      Maximum	virtual	 memory	 space	for  filter  processes	(setr-
	      limit(RLIMIT_AS)), in megabytes. Note  that  this	 includes  any
	      mapped  libs  (there  is no reliable Linux way to	limit the data
              space only), so we need to be a bit generous here. Anything over
              2000 will be ignored on 32-bit machines.

       thrQSizes = string
	      Stage  input  queues  configuration.  There  are	three internal
	      queues in	the indexing pipeline stages  (file  data  extraction,
	      terms  generation,  index	 update).  This	 parameter defines the
	      queue depths for each stage (three integer values). If  a	 value
	      of  -1  is  given	 for  a	given stage, no	queue is used, and the
	      thread will go on	performing the next stage. In  practise,  deep
	      queues  have  not	been shown to increase performance. Default: a
	      value of 0 for the first queue tells Recoll to perform  autocon-
	      figuration based on the detected number of CPUs (no need for the
	      two other	values in this case).  Use thrQSizes =	-1  -1	-1  to
	      disable multithreading entirely.

       thrTCounts = string
              Number of threads used for each indexing stage. The three stages
              are: file data extraction, terms generation, index update. The
	      use  of  the counts is also controlled by	some special values in
	      thrQSizes: if the	first queue depth is 0,	all counts are ignored
	      (autoconfigured);	 if  a	value of -1 is used for	a queue	depth,
	      the corresponding	thread count is	ignored. It makes no sense  to
	      use a value other	than 1 for the last stage because updating the
	      Xapian index is necessarily single-threaded (and protected by  a
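
              The interaction between thrQSizes and thrTCounts can be
              summarized by the alternatives sketched below. The explicit
              queue depths and thread counts are arbitrary example values.

              ```conf
              # Alternative 1 (default): autoconfiguration from the number of
              # CPUs; thrTCounts is ignored.
              thrQSizes = 0

              # Alternative 2: explicit queue depths and thread counts for
              # file data extraction, terms generation and index update.
              #thrQSizes = 2 2 2
              #thrTCounts = 4 2 1

              # Alternative 3: no multithreading at all.
              #thrQSizes = -1 -1 -1
              ```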

       loglevel	= int
	      Log  file	verbosity 1-6. A value of 2 will print only errors and
	      warnings.	3 will print information like document updates,	 4  is
	      quite verbose and	6 very verbose.

       logfilename = fn
	      Log  file	 destination.  Use  'stderr' (default) to write	to the

       idxloglevel = int
	      Override loglevel	for the	indexer.

       idxlogfilename =	fn
	      Override logfilename for the indexer.

       daemloglevel = int
	      Override loglevel	for the	indexer	in real	time mode. The default
	      is to use	the idx... values if set, else the log... values.

       daemlogfilename = fn
              Override logfilename for the indexer in real time mode. The
              default is to use the idx... values if set, else the log...
              values.
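
              The fallback chain between the three sets of logging variables
              can be sketched as follows. The log file path is hypothetical.

              ```conf
              # General settings: errors and warnings only, written to stderr.
              loglevel = 2
              logfilename = stderr

              # The indexer is more verbose and logs to its own file.
              idxloglevel = 3
              idxlogfilename = ~/.recoll/idxlog.txt

              # No daemloglevel or daemlogfilename: the real time indexer
              # uses the idx... values above.
              ```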

       orgidxconfdir = dfn
	      Original	location  of the configuration directory. This is used
	      exclusively for movable datasets.	Locating the configuration di-
	      rectory  inside  the directory tree makes	it possible to provide
	      automatic	query time path	translations once  the	data  set  has
              moved (for example, because it has been mounted on another
              location).

       curidxconfdir = dfn
	      Current location	of  the	 configuration	directory.  Complement
	      orgidxconfdir  for  movable datasets. This should	be used	if the
	      configuration directory has been copied from the dataset to  an-
	      other  location,	either	because	the dataset is readonly	and an
	      r/w copy is desired, or for performance  reasons.	 This  records
	      the  original moved location before copy,	to allow path transla-
	      tion computations.  For example if a dataset originally  indexed
	      as  '/home/me/mydata/config'  has	been mounted to	'/media/me/my-
	      data', and the GUI  is  running  from  a	copied	configuration,
	      orgidxconfdir  would  be	'/home/me/mydata/config',  and curidx-
	      confdir (as set in the copied configuration) would be

       idxrundir = dfn
	      Indexing process current directory. The input handlers sometimes
	      leave  temporary	files  in  the	current	directory, so it makes
	      sense to have recollindex	chdir to some temporary	directory.  If
	      the value	is empty, the current directory	is not changed.	If the
	      value is (literal) tmp, we use the temporary directory as	set by
	      the  environment	(RECOLL_TMPDIR	else TMPDIR else /tmp).	If the
	      value is an absolute path	to a directory,	we go there.

       checkneedretryindexscript = fn
	      Script used to heuristically check if we need to retry  indexing
	      files  which  previously	failed.	 The default script checks the
	      modified dates on	/usr/bin and /usr/local/bin. A	relative  path
	      will  be looked up in the	filters	dirs, then in the path.	Use an
	      absolute path to do otherwise.

       recollhelperpath	= string
	      Additional places	to search for helper executables. This is only
	      used on Windows for now.

       idxabsmlen = int
	      Length  of  abstracts  we	store while indexing. Recoll stores an
	      abstract for each	indexed	file.  The text	can come from  an  ac-
	      tual  'abstract' section in the document or will just be the be-
	      ginning of the document. It is stored in the index  so  that  it
	      can  be  displayed  inside the result lists without decoding the
	      original file. The idxabsmlen parameter defines the size of  the
	      stored  abstract.	The default value is 250 bytes.	The search in-
	      terface gives you	the choice to display this stored  text	 or  a
	      synthetic	 abstract  built  by extracting	text around the	search
	      terms. If	you always prefer the synthetic	abstract, you can  re-
	      duce this	value and save a little	space.

       idxmetastoredlen	= int
	      Truncation  length  of stored metadata fields. This does not af-
	      fect indexing (the whole field is	processed  anyway),  just  the
	      amount of	data stored in the index for the purpose of displaying
	      fields inside result lists or previews. The default value	is 150
	      bytes which may be too low if you	have custom fields.

       idxtexttruncatelen = int
	      Truncation  length for all document texts. Only index the	begin-
	      ning of documents. This is not recommended  except  if  you  are
	      sure  that  the interesting keywords are at the top and have se-
	      vere disk	space issues.

       aspellLanguage =	string
	      Language definitions to use when creating	the aspell dictionary.
	      The  value must match a set of aspell language definition	files.
              You can type "aspell dicts" to see a list. The default if this
              is not set is to use the NLS environment to guess the value.

       aspellAddCreateParam = string
	      Additional  option  and  parameter to aspell dictionary creation
	      command. Some aspell packages  may  need	an  additional	option
	      (e.g.  on	 Debian	Jessie:	--local-data-dir=/usr/lib/aspell). See
	      Debian bug 772415.

       aspellKeepStderr	= bool
	      Set this to have a look at aspell	 dictionary  creation  errors.
	      There are	always many, so	this is	mostly for debugging.

       noaspell	= bool
	      Disable aspell use. The aspell dictionary	generation takes time,
	      and some combinations of aspell  version,	 language,  and	 local
	      terms, result in aspell crashing,	so it sometimes	makes sense to
	      just disable the thing.

       monauxinterval =	int
	      Auxiliary	database update	interval. The real time	 indexer  only
	      updates  the  auxiliary databases	(stemdb, aspell) periodically,
	      because it would be too costly  to  do  it  for  every  document
	      change. The default period is one	hour.

       monixinterval = int
	      Minimum  interval	 (seconds) between processings of the indexing
	      queue. The real time indexer does	not process each event when it
	      comes  in,  but  lets the	queue accumulate, to diminish overhead
	      and to aggregate multiple events affecting the same file. The
	      default is 30 seconds.
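
	      The queueing behaviour described above can be pictured with a
	      small Python sketch. This is only an illustration of the idea
	      (accumulate events, keep one per file, process at most once per
	      interval), not the actual indexer code. The EventQueue class,
	      its method names and the injectable clock are all hypothetical.

```python
import time


class EventQueue:
    """Hypothetical queue that coalesces events per file path."""

    def __init__(self, min_interval=30.0, clock=time.monotonic):
        self.min_interval = min_interval  # plays the role of monixinterval
        self.clock = clock
        self.pending = {}                 # path -> latest event seen
        self.last_run = clock()

    def add(self, path, event):
        # A later event for the same file replaces the earlier one.
        self.pending[path] = event

    def drain_if_due(self):
        """Return the queued events if the interval has elapsed, else []."""
        now = self.clock()
        if now - self.last_run < self.min_interval:
            return []
        self.last_run = now
        batch = sorted(self.pending.items())
        self.pending.clear()
        return batch
```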

       mondelaypatterns	= string
	      Timing  parameters  for  the real	time indexing. Definitions for
	      files which get a	longer delay  before  reindexing  is  allowed.
	      This  is	for fast-changing files, that should only be reindexed
	      once in a	while. A list of  wildcardPattern:seconds  pairs.  The
	      patterns are matched with fnmatch(pattern, path, 0). You can
	      quote entries containing white space with double quotes (quote
	      the whole entry, not the pattern). The default is empty.
	      Example: mondelaypatterns = *.log:20 "*with spaces.*:30"
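
	      As a rough illustration of how such a value breaks down, the
	      following Python sketch splits the example into pattern and
	      delay pairs and looks up the delay for a path. It is not the
	      indexer's code: the function names are hypothetical, shlex is
	      used as a stand-in for the quoting rules, Python's fnmatchcase()
	      approximates fnmatch(pattern, path, 0), and returning the first
	      matching entry is an assumption.

```python
import shlex
from fnmatch import fnmatchcase


def parse_delay_patterns(value):
    """Split a mondelaypatterns value into (pattern, seconds) pairs."""
    pairs = []
    for entry in shlex.split(value):  # keeps double-quoted entries whole
        # The delay follows the last colon of the entry.
        pattern, _, seconds = entry.rpartition(":")
        pairs.append((pattern, int(seconds)))
    return pairs


def delay_for(path, pairs):
    """Return the delay of the first pattern matching path, or None."""
    for pattern, seconds in pairs:
        if fnmatchcase(path, pattern):
            return seconds
    return None


pairs = parse_delay_patterns('*.log:20 "*with spaces.*:30"')
```

	      Because no FNM_PATHNAME flag is used, a pattern such as *.log
	      also matches across directory separators, so it applies to a
	      full path like /var/log/app.log.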

       monioniceclass =	int
	      ionice class for the real time indexing process, on platforms
	      where this is supported. The default value is 3.

       monioniceclassdata = string
	      ionice class parameter for the real time indexing process, on
	      platforms where this is supported. The default is empty.

       autodiacsens = bool
	      Auto-trigger diacritics sensitivity (raw index only). If the
	      index is not stripped, decide whether to automatically trigger
	      diacritics sensitivity when the search term has accented
	      characters (not in unac_except_trans). Otherwise you need to use
	      the query language and the "D" modifier to specify diacritics
	      sensitivity. The default is no.

       autocasesens = bool
	      Auto-trigger case sensitivity (raw index only). If the index is
	      not stripped (see indexStripChars), decide whether to
	      automatically trigger character case sensitivity when the search
	      term has uppercase characters in any but the first position.
	      Otherwise you need to use the query language and the "C"
	      modifier to specify character-case sensitivity. The default is
	      yes.
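
	      The two triggering rules can be approximated in a few lines of
	      Python. This is a sketch of the conditions as worded here, not
	      the code Recoll runs: both function names are hypothetical, the
	      unac_except_trans exceptions are ignored, and "accented" is
	      approximated as "decomposes to a base character plus a combining
	      mark".

```python
import unicodedata


def has_inner_uppercase(term):
    """True if any character after the first one is uppercase."""
    return any(ch.isupper() for ch in term[1:])


def has_diacritics(term):
    """True if the term contains a character carrying a combining mark."""
    decomposed = unicodedata.normalize("NFD", term)
    return any(unicodedata.combining(ch) for ch in decomposed)
```

	      Under these rules a term capitalised only on its first letter
	      does not trigger case sensitivity, while a term with an
	      uppercase letter anywhere else does.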

       maxTermExpand = int
	      Maximum query expansion count for a single term (e.g. when using
	      wildcards). This only affects queries, not indexing. This used
	      to be unlimited (except for file names, where the limit of 1000
	      was too low), but that is unreasonable with a big index. The
	      default is 10000.

       maxXapianClauses	= int
	      Maximum number of	clauses	we add to a single Xapian query.  This
	      only affects queries, not	indexing. In some cases, the result of
	      term expansion can be multiplicative, and	we want	to avoid  eat-
	      ing all the memory. Default 50000.

       snippetMaxPosWalk = int
	      Maximum  number  of positions we walk while populating a snippet
	      for the result list. The default of 1,000,000 may be insufficient
	      for very big documents, in which case snippets may be missing
	      words, possibly altering their meaning.

       pdfocr =	bool
	      Attempt OCR of PDF files with no text content if both  tesseract
	      and pdftoppm are installed. The default is off because OCR is so
	      very slow.

       pdfocrlang = string
	      Language to assume for PDF OCR. This is very important for  hav-
	      ing a reasonable rate of errors with tesseract. This can also be
	      set through a configuration variable or directory-local  parame-
	      ters. See	the script.

       pdfattach = bool
	      Enable  PDF  attachment extraction by executing pdftk (if	avail-
	      able). This is normally disabled,	because	it does	slow down  PDF
	      indexing a bit even if not one attachment	is ever	found.

       pdfextrameta = string
	      Extract text from selected XMP metadata tags. This is a
	      space-separated list of qualified XMP tag names. Each element can
	      also include a translation to a Recoll field name, separated by
	      a '|' character. If the second element is absent, the tag name
	      is used as the Recoll field name. You will also need to add
	      specifications to the "fields" file to direct processing of the
	      extracted data.
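
	      To make the value format concrete, here is a minimal Python
	      sketch that turns such a list into a mapping from tag name to
	      field name. The function name and the tag names in the usage
	      line are made up for illustration. Recoll's own parsing may
	      differ in details.

```python
def parse_pdfextrameta(value):
    """Map each qualified XMP tag name to its Recoll field name."""
    mapping = {}
    for element in value.split():
        tag, sep, field = element.partition("|")
        # Without a '|' translation the tag name doubles as field name.
        mapping[tag] = field if sep and field else tag
    return mapping


# Hypothetical value: one translated tag, one untranslated tag.
fields = parse_pdfextrameta("bibtex:pages|pages dc:subject")
```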

       pdfextrametafix = fn
	      Define  name  of XMP field editing script. This defines the name
	      of a script to be	loaded	for  editing  XMP  field  values.  The
	      script should define a 'MetaFixer' class with a metafix()	method
	      which will be called with	the qualified tag name	and  value  of
	      each  selected  field, for editing or erasing. A new instance is
	      created for each document, so that the  object  can  keep	 state
	      for, e.g.	eliminating duplicate values.
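
	      A minimal sketch of such a script follows, assuming it is
	      written in Python. Only the class name and the metafix() method
	      come from the description above. The parameter names, the
	      convention that the method returns the edited value, and the use
	      of an empty string to erase a value are assumptions made for
	      this example.

```python
class MetaFixer:
    """One instance per document, so self.seen is per-document state."""

    def __init__(self):
        self.seen = set()

    def metafix(self, tagname, value):
        # Normalise runs of white space inside the value.
        value = " ".join(value.split())
        key = (tagname, value)
        if key in self.seen:
            # Duplicate within this document: return an empty string,
            # assumed here to mean "erase this value".
            return ""
        self.seen.add(key)
        return value
```

	      Because a new instance is created for each document, the
	      duplicate check never carries over from one file to the next.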

       mhmboxquirks = string
	      Enable thunderbird/mozilla-seamonkey mbox format quirks. Set this
	      for the directory where the email mbox files are stored.

       recollindex(1) recoll(1)

			       14 November 2012			RECOLL.CONF(5)

