awffull.conf(5)						       awffull.conf(5)

       AWFFull - A Webalizer Fork, Full	o' features

       awffull.conf is the configuration file for awffull(1). It is a standard
       ASCII(7) text file that can be created or edited using any standard
       editor.

       Blank lines and lines that begin	with a pound sign ('#')	are ignored.

       Any  other lines	are considered to be configuration lines, and have the
       form `Keyword Value', where the	`Keyword'  is  one  of	the  currently
       available configuration keywords, and `Value' is	the value to assign to
       that particular option.

       Any text found after the keyword, up to the end of the line, is
       considered the keyword's value, so do not put anything on the line
       after the value that is not part of the value being assigned. The file
       sample.conf provided with the distribution contains further
       documentation and examples.

       Some `Keywords' will accept a second value. In those situations, the
       first value may be enclosed in double quotes (") to allow for
       white-space.

       Keywords are case insensitive. Values are case sensitive, with some
       gotchas: see Ignore* for details.

       Wildcards within AWFFull are a little non-standard and may cause some
       confusion.

       Wildcards are only valid within the Value of certain keywords.

       A Value can have either a leading or a trailing '*' to signify a
       wildcard. If no wildcard is present, a match can occur anywhere in the
       string. For example, against a hostname that begins with `www.your',
       the Values `your' and `www.your*' will both match, as will a Value
       with a leading `*' followed by the end of the hostname.

       Thus  the use of	the wildcard signifies that the	other end of the Value
       is anchored at the Beginning or End of a	field to be searched against.

       eg. A Value of `Bot*' implies that the field (probably UserAgent in
       this case) MUST start with the letters Bot. Likewise, for a Hostname,
       a leading `*' followed by a domain suffix implies a match ONLY against
       hosts in that domain, for example Australian Government hosts.
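
       As a rough sketch of the anchoring rules, the fragment below shows the
       three forms side by side. IgnoreURL is named on this page; IgnoreAgent
       is assumed to follow the same Ignore* naming, and all values are made
       up for illustration.

       ```
       # Trailing '*': the URL must START with /admin
       IgnoreURL       /admin*
       # Leading '*': the URL must END with .bak
       IgnoreURL       *.bak
       # No wildcard: matches anywhere in the user agent string
       IgnoreAgent     Bot
       ```

       Check sample.conf for which keywords actually accept wildcards before
       relying on a pattern like these.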

       The  Run	 Options are the generic ones that tell	AWFFull	where stuff is
       and how to generally operate. Some of these can modify the results that
       AWFFull will produce.

	      OutputDir is where you want to put the output files. This should
	      be a full path name, however relative ones might work as well.
	      If no output directory is specified, the current directory will
	      be used.

	      LogFile defines the web server log file to use. If not specified
	      here or on the command line, input will default to STDIN. If the
	      log filename ends in '.gz' (ie: a gzip compressed file), it will
	      be decompressed on the fly as it is being read.

	      LogType defines the log type being processed. Normally, AWFFull
	      expects a CLF or Combined web server log as input. Using this
	      option, you can process ftp logs as well (xferlog as produced by
	      wu-ftpd and others), or Squid native logs. Values can be 'auto',
	      'clf', 'combined', 'ftp', 'domino' or 'squid', with 'auto' the
	      default. The 'auto' value means that AWFFull will try to work
	      out what log format you are sending to it. If it cannot, AWFFull
	      will immediately exit.
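
	      A minimal sketch of these three options together might look like
	      the following. The paths are hypothetical and only illustrate
	      the shape of the values.

	      ```
	      # Write reports to an absolute path
	      OutputDir       /var/www/html/usage
	      # A name ending in .gz is decompressed on the fly
	      LogFile         /var/log/httpd/access_log.gz
	      # Name the format explicitly instead of relying on 'auto'
	      LogType         combined
	      ```

	      Leaving LogType out (or set to 'auto') is usually enough; naming
	      the format is only useful when detection fails.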

       GeoIP  GeoIP enables or disables	the use	of the	GeoIP  capability  for
	      more  accurate detection of countries. Default is	`no'. NOTE! Do
	      not enable GeoIP if you analyse files that have had the  IP  Ad-
	      dress  translated	to a Fully Qualified Host Name.	Use either raw
	      IP Addresses and GeoIP, or Names and disable  GeoIP.  ie.	 Don't
	      use GeoIP	AND DNShistory.

	      GeoIPDatabase  is	 the  location of the GeoIP database file. De-
	      fault is /usr/local/share/GeoIP/GeoIP.dat, which is where	a  de-
	      fault  GeoIP  install will put it. Note that the database	is up-
	      dated   monthly.	 For   the   details   see:   <http://www.max->

	      Incremental processing allows multiple partial log files to be
	      used instead of one huge one. Useful for large sites that have
	      to rotate their log files more than once a month. AWFFull will
	      save its internal state before exiting, and restore it the next
	      time it is run, in order to continue processing where it left
	      off. This mode also causes AWFFull to scan for and ignore
	      duplicate records (records already processed by a previous run).
	      The value may be 'yes' or 'no', with a default of 'no'. The file
	      awffull.current is used to store the current state data, and is
	      located in the output directory of the program (unless changed
	      with the IncrementalName option below). Please read at least the
	      section on Incremental processing in the README file before you
	      enable this option.

       TimeMe TimeMe allows you	to force the display of	timing information  at
	      the  end	of  processing.	A value	of 'yes' will force the	timing
	      information to be	displayed. A value of 'no' has no effect.

	      IgnoreHist should not be used in a standard configuration, but
	      it is here because it is useful in certain analysis situations.
	      If the history file is ignored, the main `index.html' file will
	      only report on the current log file's contents. Incremental data
	      (if present) is still processed. Useful when you want to
	      reproduce the reports from scratch, for example. USE WITH
	      CAUTION! Valid values are `yes' or `no'. Default is `no'.

	      IncrementalName allows you to specify the filename for saving
	      the incremental data in. As with the HistoryName option, the
	      name is relative to the specified output directory unless an
	      absolute filename is specified. The default is a file named
	      `awffull.current' kept in the normal output directory. If you
	      don't specify Incremental as 'yes' then this option has no
	      effect.

	      HistoryName allows you to specify the name of the history file
	      produced by AWFFull. The history file keeps the data for up to
	      12 months' worth of logs, used for generating the main HTML page
	      (index.html). The default is a file named awffull.hist, stored
	      in the specified output directory. A filename without a path is
	      kept in the output directory; a relative path is taken relative
	      to the output directory; an absolute path (leading /) is used
	      as given.
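
	      As an illustration of how the state and history files relate to
	      the output directory, a fragment along these lines could be
	      used. The file names and the absolute path are hypothetical.

	      ```
	      Incremental     yes
	      # Plain names are kept in the output directory
	      IncrementalName site1.current
	      HistoryName     site1.hist
	      # An absolute path is used as given
	      #HistoryName    /var/lib/awffull/site1.hist
	      ```

	      Giving each analysed site its own pair of file names is one way
	      to keep several sites from sharing state by accident.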

       These  are the basic analysis options that one can and should modify to
       start fine tuning AWFFull against a given website.

	      PageType lets you tell AWFFull what types of URLs you consider
	      a 'page'. Most people consider html and cgi documents to be
	      pages, but not images or audio files. If no types are specified,
	      defaults will be used ('htm', 'html', 'cgi' and HTMLExtension if
	      different for web logs, 'txt' for ftp logs). Putting the more
	      likely page types first in the list should increase the speed of
	      a run.

	      Do Not Use Wildcards Here. It will not work.

	      NotPageType is the direct and incompatible opposite of PageType.
	      You can use one set or the other, but not both. PageType
	      specifies what *is* a Page; NotPageType specifies what *isn't*,
	      and hence by implication everything else is a page. Neither
	      method is more or less correct than the other; it is a matter of
	      which is more accurate for *your* site. As a general rule, do
	      not add the "." or use any wildcards: there are some assumed
	      internal optimisations that may otherwise break. Those who
	      understand PCREs would do well to examine the source of parser.c
	      if they wish to extract greater flexibility from these options.

	      FoldSeqErr  forces  AWFFull  to  ignore sequence errors. This is
	      useful for Netscape and other web	servers	that cache the writing
	      of log records and do not	guarantee that they will be in chrono-
	      logical order. The use of	the FoldSeqErr option will  cause  out
	      of  sequence  log	 records to be treated as if they had the same
	      time stamp as the	last valid record. The default	action	is  to
	      ignore out of sequence log records.

	      The SearchEngine keywords allow specification of search engines
	      and their query strings on the URL. These are used to locate and
	      report what search strings are used to find your site. The first
	      word is a substring to match in the referrer field that
	      identifies the search engine, and the second is the URL variable
	      used by that search engine to define its search terms.
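
	      A sketch of the two-value form is shown below. The engine
	      substrings and variables are illustrative, and the trailing '='
	      on the variable is an assumption carried over from the Webalizer
	      sample configuration; confirm the exact form in sample.conf.

	      ```
	      # <referrer substring>  <query variable>
	      SearchEngine    google.         q=
	      SearchEngine    yahoo.          p=
	      ```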

	      VisitTimeout allows you to set the default timeout for  a	 visit
	      (sometimes called	a 'session'). The default is 30	minutes, which
	      should be	fine for most sites. Visits are	determined by  looking
	      at the time of the current request, and the time of the last re-
	      quest from the site. If the time difference is greater than  the
	      VisitTimeout  value, it is considered a new visit, and visit to-
	      tals are incremented. Value is the number	of seconds to  timeout
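
	      Because the value is given in seconds, the stated 30 minute
	      default corresponds to 30 x 60 = 1800. The second value below
	      is only an example of a shorter timeout.

	      ```
	      # 30 minutes, the documented default
	      VisitTimeout    1800
	      # A stricter 10 minute timeout (10 x 60)
	      #VisitTimeout   600
	      ```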

	      TrackPartialRequests is used to track 206 codes. This gives two
	      additional columns in the Top URLs tables. The first, next to
	      "Hits", counts the number of partial requests. The second, next
	      to "Volume", counts the volume in partial requests. This option
	      is of most use to those with lots of PDFs.

	      MangleAgents allows you to specify how much, if any, AWFFull
	      should mangle user agent names. This allows several levels of
	      detail to be produced when reporting user agent statistics.
	      There are six levels that can be specified, which define
	      different levels of detail suppression. Level 5 shows only the
	      browser name (MSIE or Mozilla) and the major version number.
	      Level 4 adds the minor version number (single decimal place).
	      Level 3 displays the minor version to two decimal places. Level
	      2 will add any sub-level designation (such as Mozilla/3.01Gold
	      or MSIE 3.0b). Level 1 will attempt to also add the system type
	      if it is specified. The default Level 0 displays the full user
	      agent field without modification and produces the greatest
	      amount of detail. User agent names that can't be mangled will be
	      left unmodified.
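
	      For instance, to collapse agents down to browser name, major
	      version and a single-decimal minor version, a line such as the
	      following could be used (the choice of level is illustrative):

	      ```
	      # Level 4: name, major version, minor version to one decimal
	      MangleAgents    4
	      ```

	      Higher levels give shorter, more aggregated agent tables; level
	      0 keeps every distinct user agent string.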

	      AssignToCountry allows a form of override	to force given domains
	      to a specified country. Use the standard 2 letter	country	codes.
	      Can also use org,	com, net and so	on, if more appropriate.  With
	      judicious	use of AllSites, GroupSite and 'whois',	this can cover
	      the majority of your users without too much effort.

	      AWFFull normally strips the string 'index.' off the end of URLs
	      in order to consolidate URL totals. For example, the URL
	      /somedir/index.html is turned into /somedir/ which is really the
	      same URL. The IndexAlias option allows you to specify additional
	      strings to treat in the same way. You don't need to specify
	      'index.' as it is always scanned for by AWFFull; this option is
	      just to specify _additional_ strings if needed. If you don't
	      need any, don't specify any, as each string will be scanned for
	      in EVERY log record and a bunch of them will degrade
	      performance. Also, the string is scanned for anywhere in the
	      URL, so a string of 'home' would turn the URL
	      /somedir/homepages/brad/home.html into just /somedir/ which is
	      probably not what was intended.

	      The opposite (in a way) of IndexAlias is IgnoreIndexAlias.  This
	      will STOP	any URL	variable stripping, as well  as	 ignoring  the
	      default "index." setting,	or any that you	set above.
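
	      The pitfall described above can be sketched as follows. The
	      alias strings are examples only, not recommended values.

	      ```
	      # Specific enough: /somedir/home.htm is counted as /somedir/
	      IndexAlias      home.htm
	      # Too broad: would also cut /somedir/homepages/... to /somedir/
	      #IndexAlias     home
	      ```

	      Keeping aliases as long and specific as possible limits both
	      accidental matches and the per-record scanning cost.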

       The  Ignore*  keywords  allow you to completely ignore, or filter away,
       log records based on hostname, URL, user	agent, referrer	or user	 name.
       Use  the	 same syntax as	the Hide* keywords, where the value can	have a
       leading or trailing wildcard '*'.

	      IgnoreURL filters out traffic accessing certain URLs. eg. You
	      may wish to avoid seeing traffic that accesses administration
	      functions, thus "IgnoreURL /admin*". URLs are case sensitive.

	      Ignore sites that visit this website. Ignore by what is
	      presented to awffull: name or IP Address. Sites are lowercased
	      prior to filtering, so if ignoring by name, use a lowercased
	      Value.

	      Ignore specified referrers. Very useful for filtering away SPAM
	      Referrers. Referrers are partially case sensitive: the host
	      portion is lowercased; the URI is case sensitive.

	      Ignore specified users. User names are lowercased prior to
	      filtering.

	      Ignore specified user agents. Agents are case sensitive.

       The Include* keywords allow you to force	the inclusion of  log  records
       based on	hostname, URL, user agent, referrer or user name. The Include*
       keywords	take precedence	over the Ignore* keywords.

       Note: Using Ignore/Include combinations to selectively process parts of
       a web site is _extremely inefficient_!!! Avoid doing so if possible,
       ie: grep or gawk the records to a separate file if you really want that
       kind of report.

       Segmenting is a bit like the Ignore* and Include* keywords. Where it
       differs is in "remembering": as a `session' (or `visit') moves away
       from the original entry condition, that session is still tracked. So
       if you segment on a referral from Google, only sessions that were
       referred to the analysed website from Google will be tracked, even as
       that same session accesses other pages within the website.

       eg. Google -> Site Page 1 -> Site Page 2	-> Site	Page 3

       Whereas Ignore/Include would only filter	 the  first  interaction.  eg.
       Google -> Site Page 1

       By "session" (or `visit') it is meant that the time limit of a session
       (typically a 30 minute timeout) applies. So in the above example from
       Google, if the last step (from Page 2 to Page 3) occurred 31+ minutes
       after the Page 1 to Page 2 transition, then this final step would NOT
       be included. The trail would be:

       Google -> Site Page 1 ->	Site Page 2

       Please be aware that AWFFull currently uses IP Addresses to determine
       the continuation of a given session. This will be most flawed if you
       have a user population that sits behind corporate firewalls or ISP
       proxies, to mention two major problem areas.

       Why do Segmenting?


       `Segment	 analysis  will	 tell you different things about your audience
       than you	will realize from studying overall population metrics.'

       `The goal of segmentation is to maximize	future value of	 that  segment
       by optimizing your marketing mix.'

       With apologies to Judah for mixing his phrase order around.  :-)

	      Segment  by Country: Only	track sessions that come from the fol-
	      lowing countries.	This will be determined	by:

	      1.  Use of AssignToCountry overrides

	      2.  GeoIP	lookups	if so configured and enabled

	      3.  Hostname TLD.	eg .au

       The third option	is generally going to be the worst for	accuracy.  eg.
       We  have	 plenty	 of  Australian	IP addresses that otherwise resolve to
       .com or .net etc.

       It is strongly advised to enable	GeoIP if you wish to use this option.

	      Segment by Referer: Only track sessions that originated from the
	      following	 referers.  NOTE!!!! SegReferer	only works against the
	      HOST name. Not the full URL.

       The Display Options modify the resulting	output that AWFFull  produces.
       Things  like  HTML  Headers and Footers to add on every page. These op-
       tions don't change the numbers that AWFFull  will  calculate,  but  may
       change which ones appear, giving	the illusion of	a numerical change.

	      ReportTitle  is  the  text to display as the title. The hostname
	      (unless blank) is	appended to the	end of this string  (separated
	      with  a  space) to generate the final full title string. Default
	      is (for English) `Usage Statistics for'.

	      HostName defines the hostname for	the report. This  is  used  in
	      the  title, and is prepended to the URL table items. This	allows
	      clicking on URL's	in the report to go to the proper location  in
	      the  event you are running the report on a 'virtual' web server,
	      or for a server different	than the one the report	resides	on. If
	      not  specified here, or on the command line, AWFFull will	try to
	      get the hostname via a uname system call.	If that	fails, it will
	      default to `localhost'.

	      This option controls how much history to display on the front
	      summary page. The value is in months. eg: To display the last 5
	      years: 5 x 12 = 60

	      DailyStats  allows  the  daily statistics	table to be disabled -
	      not displayed. Values may	be `yes' or `no'. Default is  `yes'  -
	      do display the Daily Statistics table.

	      HourlyGraph and HourlyStats allow the hourly statistics graph
	      and statistics table to be disabled (not displayed). Values may
	      be "yes" or "no". Default is "yes".

	      CSSFilename is used to set the name of the CSS file to use in
	      conjunction with the generated html. An existing file is not
	      overwritten, so feel free to make your own changes to the
	      default file. The default is awffull.css.

	      FlagsLocation will enable the display of country flag pictures
	      in the country table. The path is that for a webserver, not file
	      system. Can be relative or complete. The trailing slash is not
	      necessary. The default location is not set and hence flags will
	      not be displayed.

	      YearlySubtotals will display the subtotal	for a  given  year  in
	      the  main	 page.	This  is in addition to	the Grand Total	of all

	      GroupShading allows grouped rows to be shaded in the report.
	      Useful if you have lots of groups and individual records that
	      intermingle in the report, and you want to differentiate the
	      group records a little more. Value can be `yes' or `no', with
	      `yes' being the default.

	      GroupHighlight allows the	group record to	be displayed in	 BOLD.
	      Can be either `yes' or `no' with the default being `yes'.

	      HTMLExtension allows you to specify the filename extension to
	      use for generated HTML pages. Normally, this defaults to "html",
	      but can be changed for sites that need it (like for PHP embedded
	      pages).

	      UseHTTPS should be used if the analysis is being run on a	secure
	      server,  and  links to urls should use `https://'	instead	of the
	      default `http://'. If you	need this, set it to `yes'. Default is
	      `no'. This only changes the behaviour of the `Top	URLs' table.
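
	      Several of the display options above might be combined roughly
	      as follows. The host name is a placeholder and the other values
	      are illustrative choices, not recommendations.

	      ```
	      # The HostName is appended to this string to form the title
	      ReportTitle     Usage Statistics for
	      HostName        www.example.com
	      # Generated pages get a .php extension instead of .html
	      HTMLExtension   php
	      # Links in the Top URLs table use https://
	      UseHTTPS        yes
	      ```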

       Top*   The various `Top' options below define the number of entries for
	      each table. Tables may be disabled by using zero (0) for the
	      value.

	      The  most	 accessed  URLs	 or  Resources	by  number of requests
	      (hits). Includes both Pages and Images, for example. Defaults to
	      30 URLs.

	      The greatest volume generating URLs. Defaults to 10 URLs.

	      The most accessed initial URLs within a complete Visit. Will
	      also display Single Access counts, Stickiness ratio and
	      Popularity ratio. Defaults to 10 URLs.

	      The  most	 accessed  last	 URLs within a complete	Visit. ie: The
	      last page	recorded of a Visit. Also displays the Popularity  ra-
	      tio.  Defaults to	10 URLs.

	      The  most	seen error requests and	a corresponding	referring URL.
	      Defaults to 0, ie	not shown.

	      Those Sites that have accessed the most Pages. Default is 30
	      Sites.

	      Those Sites that have downloaded the greatest Volume. Default is
	      10 Sites.

	      Those local and remote URLs that refer the most  requests.   De-
	      fault is 30 Referrers.

	      Those  words and phrases used at remote Search Engines to	direct
	      traffic here. Default is 20 Phrases.

	      Those logged in users who most use the site. Default is 20
	      Users.

	      The  Browser  Agents that	are busiest against this site. Default
	      is 15 Agents.

	      A	view of	all traffic against this site via country.

       All*   The All* keywords	allow the display of all the  below  measures.
	      If  enabled,  a  separate	 HTML page will	be created, and	a link
	      will be added to the bottom  of  the  appropriate	 "Top"	table.
	      There are	a couple of conditions for this	to occur. First, there
	      must be more items than will fit in the "Top"  table  (otherwise
	      it would just be duplicating what	is already displayed). Second,
	      the listing will only show those items that are  normally	 visi-
	      ble,  which means	it will	not show any hidden items. Grouped en-
	      tries will be listed first, followed by  individual  items.  The
	      value  for  these	keywords can be	either 'yes' or	'no', with the
	      default being 'no'. Please be aware  that	 these	pages  can  be
	      quite  large  in size, particularly the sites page, and separate
	      pages are	generated for each month, which	can  consume  quite  a
	      lot of disk space	depending on the traffic to your site.

	      All accessed URLs

	      All Pages	that initialised a Visit

	      All the last or exit pages in all	Visits.

	      All ErrorRequests	and the	corresponding referral URLs.

	      All remote sites that accessed this website.

	      All local	and remote referring URLs

	      All Remote Search	Engine words and Phrases used to refer traffic

	      All users	who logged into	this website.

	      All Browser Agents used to access	this site. Useful for  identi-
	      fying robots.

	      GMTTime allows reports to show GMT (UTC) time instead of local
	      time. Default is to display the time the report was generated in
	      the timezone of the local machine, such as EDT or PST. This
	      keyword allows you to have times displayed in UTC instead. Use
	      only if you really have a good reason, since it will probably
	      skew the reporting periods by however many hours your local time
	      zone differs from GMT.

	      HTMLPre defines HTML code to insert at the very beginning of the
	      file. Default is the DOCTYPE line shown below. Max line length
	      is 80 characters, so use multiple HTMLPre lines if you need to.

	      HTMLHead defines HTML code to insert  within  the	 <HEAD></HEAD>
	      block,  immediately  after the <TITLE> line. Maximum line	length
	      is 80 characters,	so use multiple	lines if needed.

	      HTMLBody defines the HTML code to be inserted, starting with the
	      <BODY> tag. If not specified, the default is shown below. If
	      used, you MUST include your own <BODY> tag as the first line.
	      Maximum line length is 80 characters; use multiple lines if
	      needed.

	      HTMLPost	defines	the HTML code to insert	immediately before the
	      first <HR> on the	document, which	is just	after  the  title  and
	      "summary period"-"Generated on:" lines. If anything, this	should
	      be used to clean up in case an image was inserted	with HTMLBody.
	      As  with	HTMLHead,  you can define as many of these as you want
	      and they will be inserted	in the output stream in	order  of  ap-
	      pearance.	 Max  string size is 80	characters. Use	multiple lines
	      if you need to.

	      HTMLTail defines the HTML	code to	insert at the bottom  of  each
	      HTML  document, usually to include a link	back to	your home page
	      or insert	a small	graphic. It is inserted	as a table  data  ele-
	      ment  (ie:  <TD> your code here </TD>) and is right aligned with
	      the page.	The maximum string size	is 80 characters.

	      HTMLEnd defines the HTML code to add at the very end of the gen-
	      erated  files.  It defaults to what is shown below. If used, you
	      MUST specify the </BODY> and </HTML> closing tags	 as  the  last
	      lines. The maximum string	length is 80 characters.
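
	      As a hedged sketch of the multi-line convention, each HTMLHead
	      line below stays under the 80 character limit and the lines are
	      emitted in order of appearance. The tag contents are invented
	      for illustration.

	      ```
	      # Inserted inside <HEAD>, after the <TITLE> line
	      HTMLHead        <META NAME="author" CONTENT="Web Team">
	      HTMLHead        <META NAME="robots" CONTENT="noindex">
	      # Right aligned at the bottom of every page
	      HTMLTail        <A HREF="/">Back to the home page</A>
	      ```

	      Splitting long markup across several lines of the same keyword
	      is the way to stay within the per-line limit.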

       As  distinct from the general Display Options, the Graphing Options fo-
       cus on manipulating the various graphs produced.

	      CountryGraph allows the usage by country graph to	 be  disabled.
	      Values can be 'yes' or 'no', default is 'yes'.

	      DailyGraph determines if the daily statistics graph will be dis-
	      played or	not. Values may	be "yes" or "no". Default is  "yes"  -
	      do display the daily graph.

	      HourlyGraph determines if the hourly statistics graph will be
	      displayed or not. Values may be "yes" or "no". Default is "yes"
	      - do display the hourly graph.

	      Display a	pie chart of the top URLs by HITS

	      Display a	pie chart of the top URLs by HITS

	      Display  Top Exit	Pages Pie Chart. Values	may be `hits' or `vis-
	      its' or "no". Default is "no"

	      `hits' means order the graph by hits

	      `visits' means order the graph by	visits

	      Display Top Entry	Pages Pie Chart. Values	may be `hits' or `vis-
	      its' or "no". Default is "no"

	      `hits' means order the graph by hits

	      `visits' means order the graph by	visits

	      Display a	pie chart of the Top Sites by Page Impressions

	      Display a	pie chart of the Top Sites by Page Impressions

	      Display a	pie chart of the Top User Agents (by pages)

	      GraphLegend  allows  the	color coded legends to be turned on or
	      off in the graphs. The default is	for them to be displayed. This
	      only  toggles the	color coded legends, the other legends are not
	      changed. If you think they are hideous and ugly, say  'no'  here

	      GraphLines  allows  you  to  have	 index	lines drawn behind the
	      graphs. Anything other than "no" will enable the lines.

       Graph*X and Graph*Y
	      The following Graph*X and Graph*Y options are used to modify the
	      sizes of the created charts. The default settings are shown. The
	      defaults are also the minimum settings.

	      #define GRAPH_INDEX_X  512 /* px. Default X size (512) */
	      #define GRAPH_INDEX_Y  256 /* px. Default Y size (256) */
	      #define GRAPH_DAILY_X  512 /* px. Daily X size (512) */
	      #define GRAPH_DAILY_Y  400 /* px. Daily Y size (400) */
	      #define GRAPH_HOURLY_X 512 /* px. Daily X size (512) */
	      #define GRAPH_HOURLY_Y 400 /* px. Daily Y size (400) */
	      #define GRAPH_PIE_X    512 /* px. Pie X size (512) */
	      #define GRAPH_PIE_Y    300 /* px. Pie Y size (300) */

	      The  main	 chart	on the front page. Summary of all Months.  De-
	      fault is 512 pixels.

	      Default is 256 pixels.

	      The Day by Day Summary graph at the start	of  each  Months  Sum-
	      mary. Default is 512 pixels.

	      Default is 400 pixels.

	      The  Hourly Average graph	within each Months Summary. Default is
	      512 pixels.

	      Default is 400 pixels.

	      All pie charts are the same size.	Default	is 512 pixels.

	      Default is 300 pixels.

       Graph and Table Colours
	      The custom bar graph and pie  Colours  can  be  overridden  with
	      these options. Declare them in the standard hexadecimal way - as
	      per HTML but without the '#'. If none are	given,	you  will  get
	      the default AWFFull colors.

	      Default value is `00805C'	(dark green)

	      Default value is `0000FF'	(blue)

	      Default value is `FF8000'	(orange)

	      Default value is `FF0000'	(red)

	      Default value is `00E0FF'	(cyan)

	      Default value is `FFFF00'	(yellow)

	      Default value is `00805C'	(dark green)

	      Default value is `0000FF'	(blue)

	      Default value is `FF8000'	(orange)

	      Default value is `FF0000'	(red)

       The  Group*  keywords permit the	grouping of similar objects as if they
       were one. Grouped records are displayed in the `Top' tables and can op-
       tionally	 be  displayed in bold and/or shaded. Groups cannot be hidden,
       and are not counted in the main totals. The Group* options do not  hide
       the individual items that are members of	the Group. If you wish to hide
       the records that	match -	so just	the grouping  record  is  displayed  -
       follow  with an identical Hide* keyword with the	same value. Or use the
       single GroupAndHide* keyword that matches, instead of  the  Group*  and
       Hide* combination.

       Group* keywords may have an optional label which will be displayed
       instead of the keyword's value. The label should be separated from the
       value by at least one white-space character, such as a space or tab.

       The Hide*, Group*, Ignore* and Include* keywords allow you to change
       the way Sites, URLs, Referrers, User Agents and User names are
       manipulated. The Ignore* keywords will cause AWFFull to completely
       ignore records as if they didn't exist (and thus they are not counted
       in the main site totals). The Hide* keywords will prevent things from
       being displayed in the 'Top' tables, but they will still be counted in
       the main totals. The Group* keywords allow grouping similar objects as
       if they were one. Grouped records are displayed in the 'Top' tables
       and can optionally be displayed in BOLD and/or shaded. Groups cannot
       be hidden, and are not counted in the main totals. The Group* options
       do not, by default, hide the items that they match. If you want to
       hide the records that match (so just the grouping record is
       displayed), follow with an identical Hide* keyword with the same
       value. In addition, Group* keywords may have an optional label which
       will be displayed instead of the keyword's value. The label should be
       separated from the value by at least one 'white-space' character,
       such as a space or tab.

       The value can have either a leading or trailing '*' wildcard character.
       If no wildcard is found, a match can occur anywhere in the string. For
       example, against a hostname that begins with `www.your', the values
       `your' and `www.your*' will both match.

	      The GroupDomains keyword allows you to group individual host
	      names into their respective domains. The value specifies the
	      level of grouping to perform, and can be thought of as 'the
	      number of dots' that will be displayed. For example, for a
	      visiting host with several name components, a domain grouping of
	      1 will result in just the last two components (one dot) being
	      displayed, while a 2 will result in the last three (two dots).
	      The default value of zero disables this feature. Domains will
	      only be grouped if they do not match any existing "GroupSite"
	      records, which allows overriding this feature with your own if
	      desired.

       The Hide* keywords will prevent things from being displayed in the
       'Top' tables. The hidden items will still be counted in the main
       totals.

	      Hide URL matching	name.

	      Hide site	matching name.

	      Hide referrer matching name.


	      Hide user	agents matching	name.

	      HideAllSites allows forcing individual sites to be hidden in the
	      report. This is particularly useful when used in conjunction
	      with the "GroupDomains" feature, but could be useful in other
	      situations as well, such as when you only want to display
	      grouped sites (with the GroupSite keywords...). The value for
	      this keyword can be either 'yes' or 'no', with 'no' the default,
	      allowing individual sites to be displayed.

       All  the	Hide and Group "name" options can be combined in a single con-
       fig line. eg GroupAndHideURL. If	you start using	the Group* options you
       will find that you tend to match	every Group* option with a correspond-
       ing Hide* option. The GroupAndHide* options simply short	 circuit  this
       unnecessary duplication.
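
       The equivalence described above can be sketched as follows. Only
       GroupAndHideURL is named on this page; GroupURL and HideURL are
       assumed to follow the same Group*/Hide* naming, and the pattern and
       label are made up for illustration.

       ```
       # Show one "CGI Scripts" row and hide the URLs grouped under it
       GroupURL        /cgi-bin/*      CGI Scripts
       HideURL         /cgi-bin/*
       # The same effect on a single line
       #GroupAndHideURL /cgi-bin/*     CGI Scripts
       ```

       Use one form or the other for a given value; the combined keyword
       simply saves repeating the pattern.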

       The Dump* keywords allow the dumping of Sites, URLs, Referrers, User
       Agents, User names and Search strings to separate tab delimited text
       files, suitable for import into most database or spreadsheet programs.

	      DumpPath specifies the path to dump the files. If	not specified,
	      it will default to the current output directory. Do  not	use  a
	      trailing slash ('/').

	      The  DumpHeader  keyword	specifies if a header record should be
	      written to the file. A header record is the first	record of  the
	      file,  and contains the labels for each field written. Normally,
	      files that are intended to be imported into  a  database	system
	      will  not	 need  a header	record,	while spreadsheets usually do.
	      Value can	be either 'yes'	or 'no', with 'no' being the default.

	      DumpExtension allows you to specify the dump filename extension
	      to use. The default is "tab", but some programs are picky about
	      the filenames they use, so you may change it here (for example,
	      some people may prefer to use "csv").
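
	      A possible combination of the three options above, aimed at
	      spreadsheet import, is sketched below. The directory is
	      hypothetical.

	      ```
	      # No trailing slash on the path
	      DumpPath        /var/lib/awffull/dumps
	      # Spreadsheets usually want the column labels
	      DumpHeader      yes
	      DumpExtension   csv
	      ```

	      Note that changing the extension only renames the files; per
	      the description above the contents remain tab delimited.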

       Sample Extract of a configuration file:

       # The 'auto' value means	that AWFFull will try and work out what	log format
       # you are sending to it.	If no joy, AWFFull will	immediately exit.

       LogType	      auto

       # OutputDir is where you want to put the output files.  This should
       # be a full path name, however relative ones might work as well.
       # If no output directory is specified, the current directory will be used.

       OutputDir      .

       Minimal configuration file:

       # Sample	*MINIMAL* AWFFull configuration	file
       # The below settings are	the only ones you *really* need	to worry about
       # when configuring AWFFull. See the sample.conf file for	all options if
       # the below only	serves to whet your appetite.
       # See awffull(1) or sample.conf for full explanations.

       # We can	do a little bit	each day, or hour...
       Incremental	       yes

       # Your server name to display

       # Use PageType OR NotPageType
       # I personally prefer NotPageType - YMMV!
       PageType		       htm
       PageType		       html
       PageType		       php
       #PageType	       pl
       #PageType	       cfm
       #PageType	       pdf
       #PageType	       txt
       #PageType	       cgi
       ### OR! ---------------------
       #NotPageType	       gif
       #NotPageType	       css
       #NotPageType	       js
       #NotPageType	       jpg
       #NotPageType	       ico
       #NotPageType	       png

       # Should	always fold in Sequence	Errors.	Logs can be messy...
       FoldSeqErr	       yes

       # If you	want to	see the	country	flags, uncomment the following.
       # This is the, possibly relative, URL where the flag files are located.
       #FlagsLocation	       flags



       None currently known. YMMV....

       Report  bugs  to	<>, or use the email
       discussion list:	<>

       In case it is not obvious: AWFFull is a play/pun	on the	word  `awful',
       and is pronounced the same way. Yes it was deliberate.

       [1]  Web	 Site  Measurement  Hacks.  Eric  T.  Peterson	(and  others).
       O'Reilly. ISBN 0-596-00988-7.

				  2008-Dec-13		       awffull.conf(5)

