INDEXER.CONF(5)		 mnoGoSearch reference manual	       INDEXER.CONF(5)

NAME
       indexer.conf - configuration file for indexer

DESCRIPTION
       This is the configuration file for indexer(1). The configuration file
       consists of commands and their arguments. All commands are
       case-insensitive. You can use # to comment out lines.

VARIABLES
       Global parameters

	      These  commands  should be used only once	and take global	effect
	      for the whole configuration file.

       DBType type
	      Database type. Currently supported values are mysql, pgsql,
	      msql, solid, mssql, oracle, ibase, and sqlite. The value does
	      not matter for native library support, but ODBC users must
	      specify one of the supported values. If your database type is
	      not supported, use unknown instead.

       DBHost host
	      SQL host name (Not required for ODBC)

	      Default: localhost

       DBName name
	      SQL database name or ODBC DSN

	      Default: mnogosearch

       DBUser user
	      Database user name used to connect to the database

	      Default: no user

       DBPass password
	      Database password used to connect to the database

	      Default: no password

       DBMode single/multi/crc/crc-multi
	      SQL database word storage mode. Does not apply to the built-in
	      database. When single is specified, all words are stored in the
	      same table. multi means that words are stored in different tables
	      depending on word length. multi mode is usually faster, but it
	      requires more tables in the database. In crc mode, mnoGoSearch
	      stores 32-bit integer word IDs calculated with the CRC32
	      algorithm instead of the words themselves. crc mode requires less
	      disk space and is faster than the single and multi modes.
	      crc-multi mode shares its storage structure with crc mode, but
	      stores words in different tables depending on word length, like
	      multi mode. The default DBMode value is single.
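
	      The crc modes store a 32-bit CRC32 checksum in place of each
	      word. The Python sketch below shows how such an ID can be
	      computed, assuming the standard CRC32 polynomial (as in zlib)
	      and lower-casing before hashing; mnoGoSearch's own normalization
	      and byte encoding may differ. crc_word_id and to_signed32 are
	      hypothetical helper names, and the signed conversion only matters
	      if the ID column is a signed 32-bit SQL INTEGER (an assumption).

```python
import zlib


def crc_word_id(word: str, charset: str = "utf-8") -> int:
    """Return an unsigned 32-bit CRC32 ID for `word` (hypothetical helper).

    Assumption: the word is lower-cased and encoded in `charset` before
    hashing; the indexer's real normalization may differ.
    """
    data = word.lower().encode(charset)
    # zlib.crc32 already returns an unsigned value in Python 3;
    # the mask keeps the result explicitly within 32 bits.
    return zlib.crc32(data) & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value >= (1 << 31) else value


# Usage: compute IDs for a few words.
for w in ("search", "Search", "engine"):
    wid = crc_word_id(w)
    print(w, wid, to_signed32(wid))
```

	      Because different words can share a CRC32 value, crc mode trades
	      a small risk of collisions for smaller tables.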

       LocalCharset charset
	      Defines the charset of the local file system. It is required if
	      you are using 8-bit characters and does not apply to 7-bit
	      characters. This command is to be used once and takes global
	      effect for the whole configuration file.

	      Example:
	      LocalCharset windows-1250

       CrossWords yes|no
	      Enables building of the CrossWords index. CrossWords are the
	      words used in links that point to the current page. The default
	      value is no.

       StopWordFile filename
	      Specifies a file containing a list of stopwords to load. You may
	      give either an absolute file name or a file name relative to the
	      mnoGoSearch /etc directory. You may use several StopWordFile
	      commands.

       MinWordLength characters
       MaxWordLength characters
	      With these commands you can change the default length range of
	      words stored in the database. By default mnoGoSearch stores
	      words that are longer than 1 and shorter than 32 characters.

	      Example:
	      MaxWordLength 35
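
	      Taken together, MinWordLength and MaxWordLength act as a length
	      filter on words before they are stored. A minimal sketch,
	      assuming both limits are inclusive and counted in characters
	      (the text above does not say which); filter_words is a
	      hypothetical helper and does not reproduce the indexer's
	      tokenizer.

```python
def filter_words(words, min_len, max_len):
    """Keep words whose length lies within [min_len, max_len].

    Assumption: both bounds are inclusive; this is an illustration of
    the MinWordLength/MaxWordLength limits, not mnoGoSearch code.
    """
    return [w for w in words if min_len <= len(w) <= max_len]


# Usage: with MaxWordLength 35 a 33-character word is kept.
print(filter_words(["a", "ab", "x" * 33], 2, 35))
```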

       MaxDocSize bytes
	      Specifies the maximum size, in bytes, of a document that can be
	      indexed. The default value is 1048576 (1 MB). This command takes
	      global effect for the whole configuration file.

       HTTPHeader header
	      Adds a custom HTTP header to the indexer's HTTP requests. Do not
	      use the "If-Modified-Since" and "Accept-Charset" headers, since
	      they are composed by the indexer itself. "User-Agent:
	      mnoGoSearch/version" is sent too, although you may override it.
	      The command has global effect for the whole configuration file.

       ServerTable table_name
	      Loads servers with all their parameters from the table
	      table_name. This command works only with an SQL database and is
	      not applicable in built-in database mode. For an example of the
	      table structure, please refer to the file
	      create/mysql/server.txt. You may give several arguments:

	      ServerTable my_servers1 my_servers2 my_servers3

	      or just a single argument:

	      ServerTable server

       DeleteNoServer yes|no
	      Specifies whether to delete URLs that have no corresponding
	      Server command. The default value is yes.

       VarDir /path/to/my/var/dir
	      Specifies a custom path to the directory where the indexer stores
	      its data when used with the built-in database and in cache mode.
	      By default, the /var directory of the mnoGoSearch installation is
	      used.

URL Control Configuration
       Allow [Match|NoMatch] [NoCase|Case] [String|Regex] <arg> [<arg> ...]
	      Use this command to allow URLs that match (or, with NoMatch, do
	      not match) the given argument. The first three optional
	      parameters describe the type of comparison; the default values
	      are Match, NoCase, and String. Use NoCase or Case to choose
	      case-insensitive or case-sensitive comparison. Use Regex to choose
	      regular expression comparison. Use String to choose string
	      comparison with wildcards: * matches any number of characters and
	      ? matches one character. Because * and ? have a special meaning in
	      the String match type, use Regex to describe documents whose URLs
	      contain literal ? or * characters. String match is much faster
	      than Regex, so use String where possible. You may give several
	      arguments to one Allow command and use this command any number of
	      times. It takes global effect for the configuration file. Note
	      that mnoGoSearch automatically adds one Allow Regex .* command
	      after reading the configuration file, which means that everything
	      that is not disallowed is allowed.

       Disallow [Match|NoMatch] [NoCase|Case] [String|Regex] <arg> [<arg> ...]
	      Use this command to disallow indexing of documents whose URLs
	      match the given argument. The first three optional parameters
	      have exactly the same meaning as with the Allow command. You can
	      give several arguments to one Disallow command. It takes global
	      effect for the configuration file.

       Example:
	      #Exclude cgi-bin and non-parsed-headers
	      Disallow Regex /cgi-bin/ \.cgi /nph

	      #Exclude some known extensions
	      Disallow Regex \.b$  \.sh$     \.md5$
	      Disallow Regex \.arj$  \.tar$  \.zip$  \.tgz$  \.gz$
	      Disallow Regex \.lha$ \.lzh$ \.tar\.Z$ \.rar$ \.zoo$
	      Disallow Regex \.gif$  \.jpg$  \.jpeg$ \.bmp$  \.tiff$
	      Disallow Regex \.vdo$  \.mpeg$ \.mpe$  \.mpg$  \.avi$  \.movie$
	      Disallow Regex \.mid$  \.mp3$  \.rm$   \.ram$  \.wav$  \.aiff$ \.ra$
	      Disallow Regex \.vrml$ \.wrl$
	      Disallow Regex \.exe$  \.cab$  \.dll$  \.bin$  \.class$
	      Disallow Regex \.tex$  \.texi$ \.xls$  \.doc$  \.texinfo$
	      Disallow Regex \.rtf$  \.pdf$  \.cdf$  \.ps$
	      Disallow Regex \.ai$   \.eps$  \.ppt$  \.hqx$
	      Disallow Regex \.cpt$  \.bms$  \.oda$  \.tcl$
	      Disallow Regex \.rpm$

	      #Exclude Apache directory listings in different sort orders
	      Disallow Regex \?D=A$ \?D=D$ \?M=A$ \?M=D$ \?N=A$ \?N=D$ \?S=A$ \?S=D$

	      #Exclude ./. and ./.. from Apache and Squid directory listings
	      Disallow Regex /[.]{1,2} /\%2e /\%2f
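
	      The matching rules above can be sketched in Python as follows.
	      This is an illustration under stated assumptions, not
	      mnoGoSearch's code: String patterns are assumed to match the
	      whole URL, Regex patterns may match anywhere in it, and filters
	      are assumed to be tried in configuration order with the first
	      applicable one deciding, which is consistent with the implicit
	      trailing Allow Regex .* rule. Filter, is_allowed and the other
	      names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Filter:
    allow: bool              # True for Allow, False for Disallow
    patterns: List[str]
    match: bool = True       # False for NoMatch
    case: bool = False       # True for Case (default is NoCase)
    regex: bool = False      # True for Regex (default is String)


def _wildcard_to_regex(pattern: str) -> str:
    # String type: '*' = any run of characters, '?' = one character,
    # everything else is literal.
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def _hits(f: Filter, url: str) -> bool:
    flags = 0 if f.case else re.IGNORECASE
    for p in f.patterns:
        if f.regex:
            found = re.search(p, url, flags) is not None          # anywhere
        else:
            found = re.fullmatch(_wildcard_to_regex(p), url, flags) is not None
        if found:
            return True
    return False


def is_allowed(filters: List[Filter], url: str) -> bool:
    # Assumption: first applicable filter wins; the implicit
    # "Allow Regex .*" appended at the end allows everything else.
    for f in filters + [Filter(allow=True, patterns=[".*"], regex=True)]:
        if _hits(f, url) == f.match:   # NoMatch applies when nothing hits
            return f.allow
    return True  # not reached: the implicit rule always applies


rules = [
    Filter(allow=False, patterns=[r"\.gif$", r"\.jpg$"], regex=True),
    Filter(allow=False, patterns=["*/cgi-bin/*"]),
]
print(is_allowed(rules, "http://localhost/index.html"))
```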

       CheckOnly regexp	[regexp	[...] ]
	      The indexer will use the HEAD instead of the GET HTTP method for
	      URLs that match regexp. This means that such files are only
	      checked and are not downloaded. Useful for zip, exe, arj, etc.
	      files. You can give several arguments to one CheckOnly command.
	      You can use this command any number of times, but no more than
	      MAXFILTER (defined in indexer.h) times. It takes global effect for
	      the configuration file.

       Examples:
	      #Use HEAD	method for some	known non-text extensions:
	      CheckOnly	\.b$ \.sh$     \.md5$
	      CheckOnly	\.arj$	\.tar$	\.zip$	\.tgz$	\.gz$
	      CheckOnly	\.lha$ \.lzh$ \.tar\.Z$	 \.rar$	 \.zoo$
	      CheckOnly	\.gif$	\.jpg$	\.jpeg$	\.bmp$	\.tiff$
	      CheckOnly	\.vdo$	\.mpeg$	\.mpe$	\.mpg$	\.avi$	\.movie$
	      CheckOnly	\.mid$	\.mp3$	\.rm$	\.ram$	\.wav$	\.aiff$
	      CheckOnly	\.vrml$	\.wrl$
	      CheckOnly	\.exe$	\.cab$	\.dll$	\.bin$	\.class$
	      CheckOnly	\.tex$	\.texi$	\.xls$	\.doc$	\.texinfo$
	      CheckOnly	\.rtf$	\.pdf$	\.cdf$	\.ps$
	      CheckOnly	\.ai$	\.eps$	\.ppt$	\.hqx$
	      CheckOnly	\.cpt$	\.bms$	\.oda$	\.tcl$
	      CheckOnly	\.rpm$
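
	      A minimal sketch of the method choice, assuming CheckOnly
	      patterns are matched case-sensitively anywhere in the URL (hence
	      the trailing $ in the examples); request_method is a hypothetical
	      name.

```python
import re


def request_method(url, checkonly_patterns):
    """Return "HEAD" if any CheckOnly regex matches the URL, else "GET".

    Assumption: unanchored, case-sensitive re.search semantics.
    """
    for pattern in checkonly_patterns:
        if re.search(pattern, url):
            return "HEAD"
    return "GET"


print(request_method("http://localhost/pub/file.zip", [r"\.zip$", r"\.exe$"]))
```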

       HrefOnly	regexp [regexp [...] ]
	      The indexer scans HTML documents that match regexp as it would
	      scan any other URL, except that it does not index their contents.
	      It adds any URLs it finds in such HTML documents to the database.
	      Useful when indexing mailing list archives with big index pages
	      that contain mostly URLs. You can give several arguments to one
	      HrefOnly command. You can use this command any number of times,
	      but no more than MAXFILTER (defined in indexer.h) times. It takes
	      global effect for the configuration file.

       Examples:
	      #Scan these files for href tags only, but do not index their contents.
	      HrefOnly mail.*\.html$ thr.*\.html$
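
	      The HrefOnly behaviour can be sketched with Python's standard
	      html.parser module: links are always collected, but documents
	      whose URL matches an HrefOnly pattern are flagged as not to be
	      indexed. process and _HrefCollector are hypothetical names; the
	      sketch only looks at <a href> and is not the indexer's parser.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _HrefCollector(HTMLParser):
    """Collects absolute href values of <a> tags, ignoring text content."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def process(url, html, hrefonly_patterns):
    """Return (links, index_contents) for a fetched HTML document."""
    collector = _HrefCollector(url)
    collector.feed(html)
    collector.close()
    index_contents = not any(re.search(p, url) for p in hrefonly_patterns)
    return collector.links, index_contents


print(process("http://lists.example/mail1.html",
              '<a href="msg1.html">msg</a><p>text</p>',
              [r"mail.*\.html$", r"thr.*\.html$"]))
```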

MIME types and external	parsers
       UseRemoteContentType yes|no
	      This command specifies whether the indexer should get the content
	      type from the HTTP server headers (yes) or from its AddType
	      settings (no). If set to no and the indexer could not determine
	      the content type with its AddType settings,

       SyslogFacility facility
	      Useful only if the indexer is compiled with syslog support and you
	      want to change the default facility. The argument is the same as
	      used in the syslog.conf file (for example: local7, daemon). For a
	      list of possible facilities, see syslog.conf(5). Takes global
	      effect and should be used only once.

	      Default: depends on compilation.

       LogdAddr	host[:port]
	      Use cachelogd at the given host and, if specified, port. Required
	      for cache mode only. The default values are localhost and port
	      7000.

       FollowOutside yes|no
	      Allows or disallows the indexer to walk outside the current
	      server. Should be used carefully (see the MaxHops command).

	      Default: no

       Period seconds
	      Reindex period in seconds (604800 = 1 week). May be used before
	      every Server command and takes effect until the end of the
	      configuration file or until the next Period command.

       Tag number
	      Use this parameter for your own purposes, for example for
	      grouping some servers into one group. May be used multiple times
	      before every Server command and takes effect until the end of the
	      configuration file or until the next Tag command.

       MaxHops number
	      Maximum path length, in "mouse clicks", from the start URL given
	      in the Server command. May be used multiple times before every
	      Server command and takes effect until the end of the
	      configuration file or until the next MaxHops command.

	      Default: 256
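
	      MaxHops limits crawl depth. A breadth-first sketch, assuming the
	      start URL counts as hop 0 and that a page's hop count is fixed
	      when it is first discovered; links stands in for fetching and
	      parsing pages, and crawl_order is a hypothetical name.

```python
from collections import deque


def crawl_order(start, links, max_hops):
    """Return URLs reachable from `start` within `max_hops` link clicks.

    `links` maps a URL to the URLs it links to (a stand-in for fetching
    and parsing pages). The start URL is hop 0.
    """
    hops = {start: 0}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        if hops[url] == max_hops:
            continue  # limit reached: do not follow links from this page
        for nxt in links.get(url, []):
            if nxt not in hops:          # first discovery fixes the hop count
                hops[nxt] = hops[url] + 1
                queue.append(nxt)
    return order


site = {"/": ["/a"], "/a": ["/b"], "/b": ["/c"]}
print(crawl_order("/", site, 2))
```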

       MaxNetErrors number
	      Maximum number of network errors for each server. If there are
	      too many network errors on a server (server is down, host
	      unreachable, etc.), the indexer will try not to make more than
	      number attempts to connect to this server. May be used multiple
	      times before the Server command and takes effect until the end of
	      the configuration file or until the next MaxNetErrors command.

	      Default: 16

       TitleWeight number
	      Weight of the words in the <title>...</title>. Can be set
	      multiple times before the Server command and takes effect until
	      the end of the configuration file or until the next TitleWeight
	      command.

	      Default: 2

       BodyWeight number
	      Weight of the words in the <body>...</body> of HTML documents
	      and in the contents of text/plain documents. Can be set multiple
	      times before the Server command and takes effect until the end
	      of the configuration file or until the next BodyWeight command.

	      Default: 1

       DescWeight number
	      Weight of the words in the <META NAME="Description"
	      Content="...">. Can be set multiple times before the Server
	      command and takes effect until the end of the configuration file
	      or until the next DescWeight command.

	      Default: 2

       KeywordWeight number
	      Weight of the words in the <META NAME="Keywords" Content="...">.
	      Can be set multiple times before the Server command and takes
	      effect until the end of the configuration file or until the next
	      KeywordWeight command.

	      Default: 2

       UrlWeight number
	      Weight of the words in the URL of the documents. Can be set
	      multiple times before the Server command and takes effect until
	      the end of the configuration file or until the next UrlWeight
	      command.

	      Default: 0
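
	      The *Weight commands assign a weight to each part of a document.
	      This page does not describe how the weights are stored or
	      combined during ranking; the sketch below assumes a simple
	      additive model purely for illustration, using the defaults listed
	      above. word_weights and DEFAULT_WEIGHTS are hypothetical names.

```python
from collections import Counter

# Defaults from the entries above; the dictionary layout is hypothetical.
DEFAULT_WEIGHTS = {"title": 2, "body": 1, "description": 2,
                   "keywords": 2, "url": 0}


def word_weights(parts, weights=DEFAULT_WEIGHTS):
    """Sum per-part weights for every word occurrence in a document.

    `parts` maps a part name ("title", "body", ...) to its list of words.
    Assumption: an additive model, not mnoGoSearch's real formula.
    """
    totals = Counter()
    for part, words in parts.items():
        w = weights.get(part, 0)
        if w == 0:
            continue  # weight 0: words from this part contribute nothing
        for word in words:
            totals[word.lower()] += w
    return totals


print(word_weights({"title": ["FreeBSD"], "body": ["freebsd", "ports"]}))
```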

       DeleteBad yes|no
	      Controls whether the indexer deletes bad (not found, forbidden,
	      etc.) URLs from the database. Setting it to no is useful if you
	      want to check the 'integrity' of your server(s): "bad" URLs then
	      remain in the database. Can be set multiple times before the
	      Server command and takes effect until the end of the
	      configuration file or until the next DeleteBad command.

	      Default: yes

       Robots yes|no
	      Allows or disallows using robots.txt and <META NAME="robots">
	      exclusions. Disabling them is useful if you want to check the
	      'integrity' of your server(s). Can be set multiple times before
	      the Server command and takes effect until the end of the
	      configuration file or until the next Robots command.

	      Default: yes

       Section <string>	<number>
	      where <string> is a section name and <number> is a section ID
	      between 0 and 255. Use 0 if you don't want to index a section. It
	      is better to use different section IDs for different document
	      parts. In this case, at search time you will be able to give a
	      different weight to each part or even disallow some sections.

       Index yes|no
	      Prevents the indexer from storing words into the database when
	      set to no. Useful if you want to check the 'integrity' of your
	      server(s). Can be set multiple times before the Server command
	      and takes effect until the end of the configuration file or until
	      the next Index command.

	      Note: Instead of Index no	you can	use the	alternate form NoIndex

	      Default: yes

       Follow yes|no
	      Allows or disallows the indexer to store <a href="..."> links in
	      the database. Can be set multiple times before the Server command
	      and takes effect until the end of the configuration file or until
	      the next Follow command.

	      Note: Instead of Follow no you can use the alternate form NoFollow

	      Default: yes

       MaxDocSize size
	      Limits the maximum document size; size is in bytes. If a
	      document is larger than size, the indexer parses only the first
	      size bytes of the document.

	      Default: 1048576 (which is 1 megabyte)
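
	      A sketch of the truncation described above, assuming a plain
	      byte-level cut; truncate_document is a hypothetical name.

```python
def truncate_document(body: bytes, max_doc_size: int = 1048576) -> bytes:
    """Keep only the first max_doc_size bytes of a document."""
    return body[:max_doc_size]


# A byte-level cut can split a multi-byte character, so decode tolerantly.
cut = truncate_document("h\u00e9llo".encode("utf-8"), 2)
print(cut.decode("utf-8", errors="ignore"))
```

	      In this usage example the cut falls inside the two-byte UTF-8
	      encoding of "\u00e9", and errors="ignore" drops the incomplete
	      character.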

       Mime _from_mime_ _to_mime_[;charset] ["command line [$1]"]
	      This is used to add support for parsing documents with MIME
	      types other than text/plain and text/html. It can be done via an
	      external parser (which should produce plain text or HTML output)
	      or just by substituting the MIME type so that the indexer can
	      understand it directly.

	      _from_mime_ and _to_mime_ are standard MIME types. _to_mime_
	      should be either text/plain or text/html, because these are the
	      only types that the indexer understands.

	      The external parser is expected to write its results to stdout
	      (if it does not, write a small script that cats the results to
	      stdout).

	      The optional charset parameter is used to change the charset if
	      needed.

	      The command line parameter is optional. Without a command line,
	      the command only changes the MIME type. The command line may
	      contain the $1 parameter, which stands for a temporary file name:
	      some parsers cannot operate on stdin, so the indexer creates a
	      temporary file for the parser and passes its name in place of $1.
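
	      The $1 convention can be sketched in Python as follows. This
	      assumes the command line is run through /bin/sh (so quoting,
	      pipes and redirections work) and that $1 is replaced with the
	      quoted temporary file name; when $1 is absent, the document is
	      fed on stdin. run_parser is a hypothetical name, and the
	      indexer's actual invocation may differ.

```python
import os
import shlex
import subprocess
import tempfile


def run_parser(command_line: str, content: bytes) -> bytes:
    """Run an external parser and return its stdout (plain text or HTML)."""
    if "$1" in command_line:
        # The parser wants a file: write the document to a temporary file
        # and substitute its (shell-quoted) name for $1.
        fd, path = tempfile.mkstemp(prefix="indexer-")
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(content)
            cmd = command_line.replace("$1", shlex.quote(path))
            return subprocess.run(cmd, shell=True, stdout=subprocess.PIPE,
                                  check=True).stdout
        finally:
            os.remove(path)
    # No $1: the parser reads the document on stdin.
    return subprocess.run(command_line, shell=True, input=content,
                          stdout=subprocess.PIPE, check=True).stdout


print(run_parser("tr a-z A-Z < $1", b"plain text"))
```

	      For instance, a hypothetical rule such as
	      Mime application/msword text/plain "catdoc $1" would correspond
	      to run_parser("catdoc $1", data) in this sketch.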

       CharSet charset
	      Useful for 8-bit character sets. WWW servers send data in
	      different character sets. charset is the default character set
	      of the server(s) in the following Server command(s). May be used
	      before every Server command and takes effect until the end of the
	      configuration file or until the next CharSet command.

	      Currently the indexer supports Cyrillic koi8-r, cp1251, cp866,
	      iso8859-5, x-mac-cyrillic, Arabic cp1256, Western iso-8859-1,
	      Central European iso-8859-2 and cp1250 character sets.

	      This parameter is the default character set for "bad" servers
	      that do not send charset information in the header (just
	      "Content-type: text/html" instead of, for example, "Content-type:
	      text/html; charset=koi8-r") and do not send charset information
	      in META tags.

       Examples:

	      CharSet koi8-r
	      CharSet windows-1250
	      CharSet ISO-8859-1

       ForceIISCharset1251 yes/no
	      This option is useful for users dealing with Cyrillic content and
	      broken (or misconfigured?) Microsoft IIS web servers, which tend
	      to report the charset incorrectly. This is a really dirty hack,
	      but if this option is turned on, all servers that report
	      themselves as 'Microsoft' or 'IIS' are assumed to have content in
	      the Windows-1251 codepage. This command should be used only once
	      in the configuration file and takes global effect.

	      Default: no
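
	      How CharSet and ForceIISCharset1251 interact with the charset
	      information sent by a server can be sketched as below. The
	      precedence is an assumption: the IIS hack first (when enabled),
	      then a charset in the Content-Type header, then a charset in a
	      META tag, and finally the CharSet default. pick_charset is a
	      hypothetical name, and the META scan is deliberately crude.

```python
import re

# charset=... inside a Content-Type value or a META tag.
_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)
_META_RE = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)


def pick_charset(content_type, html, default_charset,
                 server_header="", force_iis_1251=False):
    """Choose a document charset (hypothetical precedence, see above)."""
    if force_iis_1251 and re.search(r"Microsoft|IIS", server_header,
                                    re.IGNORECASE):
        return "windows-1251"
    m = _CHARSET_RE.search(content_type or "")
    if m:
        return m.group(1).lower()
    for tag in _META_RE.findall(html or ""):
        m = _CHARSET_RE.search(tag)
        if m:
            return m.group(1).lower()
    return default_charset  # the CharSet value for "bad" servers


print(pick_charset("text/html", "<p>no charset here</p>", "koi8-r"))
```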

       AuthBasic login:passwd
	      Use basic HTTP authorization. Can be set before every Server
	      command and takes effect only for the next Server command.

       Examples:

	      AuthBasic	somebody:something

	      If you have password-protected directories but the rest of the
	      server is open, use:

	      AuthBasic	login1:passwd1
	      Server http://my.server.com/my/secure/directory1/
	      AuthBasic	login2:passwd2
	      Server http://my.server.com/my/secure/directory2/
	      Server http://my.server.com/
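
	      Basic authentication (RFC 7617) sends the login:passwd pair
	      base64-encoded in an Authorization header; ProxyAuthBasic,
	      described next, uses the Proxy-Authorization header instead. A
	      sketch of building these header lines; basic_auth_header is a
	      hypothetical name, and encoding the credentials as UTF-8 is an
	      assumption.

```python
import base64


def basic_auth_header(credentials: str, proxy: bool = False) -> str:
    """Build the header line for an AuthBasic/ProxyAuthBasic argument."""
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    name = "Proxy-Authorization" if proxy else "Authorization"
    return f"{name}: Basic {token}"


print(basic_auth_header("login1:passwd1"))
print(basic_auth_header("somebody:smth", proxy=True))
```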

       ProxyAuthBasic login:passwd
	      Use HTTP proxy basic authorization. Can be used before every
	      Server command and takes effect only for the next Server
	      command. It should also be placed before the Proxy command.

       Example:
	      ProxyAuthBasic somebody:smth

       Proxy your.proxy.host[:port]
	      Connect via a proxy rather than directly. FTP servers can be
	      indexed only when using a proxy. If the port is not specified, it
	      defaults to 3128 (Squid). If the proxy host is not specified, a
	      direct connection is used. Can be set before every Server command
	      and takes effect until the end of the configuration file or until
	      the next Proxy command.

       Examples:
	      Proxy atoll.anywhere.com
	       - proxy on atoll.anywhere.com, port 3128

	      Proxy lota.anywhere.com:8090
	       - proxy on lota.anywhere.com, port 8090

	      Proxy
	       - turn off proxy	usage (direct connection)
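
	      The Proxy argument rules above can be sketched as a small
	      parser; parse_proxy is a hypothetical name. The same shape fits
	      LogdAddr, whose default port is 7000.

```python
def parse_proxy(arg: str, default_port: int = 3128):
    """Split a host[:port] argument into (host, port).

    Returns None for an empty argument (Proxy with no host turns proxy
    usage off). The port defaults to 3128 (Squid) unless overridden.
    """
    arg = arg.strip()
    if not arg:
        return None
    host, sep, port = arg.partition(":")
    return host, (int(port) if sep else default_port)


print(parse_proxy("lota.anywhere.com:8090"))
print(parse_proxy("localhost", default_port=7000))  # LogdAddr-style default
```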

       Server URL
	      This is the main configuration command. Use it to add the start
	      URL of a server to be indexed. You may use many Server commands
	      in the same indexer.conf file.

       Examples:

	      Server http://localhost/
	      Server http://www.yoursite.com/
	      Server http://www.yoursite.com/~yourname/
	      Server ftp://ftp.yourdomain.com/pub/
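
	      Commands such as Period, Tag, MaxHops, CharSet and Proxy stay in
	      effect until they are repeated, while AuthBasic and
	      ProxyAuthBasic apply only to the next Server command. The
	      simplified Python sketch below collects the settings in effect
	      for each Server line. It is an illustration, not the indexer's
	      parser: it ignores global commands and does no validation, and
	      servers_from_config is a hypothetical name.

```python
# Commands that stay in effect until the end of the file or the next
# command of the same name (a subset taken from this page).
STICKY = {"period", "tag", "maxhops", "maxneterrors", "charset", "proxy",
          "follow", "index", "robots", "deletebad", "titleweight",
          "bodyweight", "descweight", "keywordweight", "urlweight"}
# Commands that apply only to the next Server command.
ONE_SHOT = {"authbasic", "proxyauthbasic"}


def servers_from_config(text: str):
    """Return one settings dict per Server line."""
    sticky, pending, servers = {}, {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        parts = line.split(None, 1)
        cmd = parts[0].lower()            # commands are case-insensitive
        value = parts[1].strip() if len(parts) > 1 else ""
        if cmd in ("noindex", "nofollow"):
            cmd, value = cmd[2:], "no"    # alternate forms of "Index no" etc.
        if cmd in STICKY:
            sticky[cmd] = value
        elif cmd in ONE_SHOT:
            pending[cmd] = value
        elif cmd == "server":
            servers.append({"url": value, **sticky, **pending})
            pending = {}                  # one-shot settings are consumed
    return servers


cfg = """
Period 604800
AuthBasic login1:passwd1
Server http://my.server.com/my/secure/directory1/
AuthBasic login2:passwd2
Server http://my.server.com/my/secure/directory2/
Period 86400
NoIndex
Server http://my.server.com/
"""
for s in servers_from_config(cfg):
    print(s)
```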

EXAMPLE
       This is a minimal sample indexer configuration file:

	      DBHost	     localhost
	      DBName	     udmsearch
	      DBUser	     foo
	      DBPass	     bar
	      Server	     http://localhost/
	      Disallow Regex /cgi-bin/ \.cgi /nph
	      Disallow Regex \.b$  \.sh$     \.md5$
	      Disallow Regex \.arj$  \.tar$  \.zip$  \.tgz$  \.gz$
	      Disallow Regex \.lha$ \.lzh$ \.tar\.Z$ \.rar$ \.zoo$
	      Disallow Regex \.gif$  \.jpg$  \.jpeg$ \.bmp$  \.tiff$
	      Disallow Regex \.vdo$  \.mpeg$ \.mpe$  \.mpg$  \.avi$  \.movie$
	      Disallow Regex \.mid$  \.mp3$  \.rm$   \.ram$  \.wav$  \.aiff$ \.ra$
	      Disallow Regex \.vrml$ \.wrl$
	      Disallow Regex \.exe$  \.cab$  \.dll$  \.bin$  \.class$
	      Disallow Regex \.tex$  \.texi$ \.xls$  \.doc$  \.texinfo$
	      Disallow Regex \.rtf$  \.pdf$  \.cdf$  \.ps$
	      Disallow Regex \.ai$   \.eps$  \.ppt$  \.hqx$
	      Disallow Regex \.cpt$  \.bms$  \.oda$  \.tcl$
	      Disallow Regex \.rpm$
	      Disallow Regex \?D=A$ \?D=D$ \?M=A$ \?M=D$ \?N=A$ \?N=D$ \?S=A$ \?S=D$
	      Disallow Regex /[.]{1,2} /\%2e /\%2f

SEE ALSO
       indexer(1), syslog.conf(5)

mnoGoSearch 3.1			 23 March 2001		       INDEXER.CONF(5)
