rwuniq(1)			SiLK Tool Suite			     rwuniq(1)

NAME
       rwuniq -	Bin SiLK Flow records by a key and print each bin's volume

SYNOPSIS
	 rwuniq	--fields=KEY [--values=VALUES]
	       [--all-counts] [{--bytes	| --bytes=MIN |	--bytes=MIN-MAX}]
	       [{--packets | --packets=MIN | --packets=MIN-MAX}]
	       [{--flows | --flows=MIN | --flows=MIN-MAX}]
	       [--stime] [--etime]
	       [{--sip-distinct	| --sip-distinct=MIN | --sip-distinct=MIN-MAX}]
	       [{--dip-distinct	| --dip-distinct=MIN | --dip-distinct=MIN-MAX}]
	       [--presorted-input] [--sort-output]
	       [{--bin-time | --bin-time=SECONDS}]
	       [--timestamp-format=FORMAT] [--epoch-time]
	       [--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
	       [--integer-sensors] [--integer-tcp-flags]
	       [--no-titles] [--no-columns] [--column-separator=CHAR]
	       [--no-final-delimiter] [{--delimited | --delimited=CHAR}]
	       [--print-filenames] [--copy-input=PATH] [--output-path=PATH]
	       [--pager=PAGER_PROG] [--temp-directory=DIR_PATH]
	       [{--legacy-timestamps | --legacy-timestamps={1,0}}]
	       [--ipv6-policy={ignore,asv4,mix,force,only}]
	       [--site-config-file=FILENAME]
	       [--plugin=PLUGIN	[--plugin=PLUGIN ...]]
	       [--python-file=PATH [--python-file=PATH ...]]
	       [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
	       [--pmap-column-width=NUM]
	       {[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}

	 rwuniq	[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
	       [--plugin=PLUGIN	...] [--python-file=PATH ...] --help

	 rwuniq	[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
	       [--plugin=PLUGIN	...] [--python-file=PATH ...] --help-fields

	 rwuniq	--version

DESCRIPTION
       rwuniq reads SiLK Flow records and groups them by a key composed	of
       user-specified attributes of the	flows.	For each group (or bin), a
       collection of user-specified aggregate values is	computed; these	values
       are typically related to	the volume of the bin, such as the sum of the
       bytes fields for	all records that match the key.	 Once all the SiLK
       Flow records are	read, the key fields and the aggregate values are
       printed.	 For some of the built-in aggregate values, it is possible to
       limit the output	to the bins where the aggregate	value meets a user-
       specified minimum and/or	maximum.
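
       The binning model described above can be illustrated with a short
       Python sketch.  It is not how rwuniq is implemented; the record
       dictionaries and the bin_flows() helper are hypothetical stand-ins
       for binary SiLK Flow records and the tool's internals.

```python
from collections import defaultdict

# Hypothetical flow records; real input is binary SiLK Flow data.
flows = [
    {"protocol": 6, "dPort": 80, "bytes": 1200, "packets": 10},
    {"protocol": 6, "dPort": 80, "bytes": 300, "packets": 4},
    {"protocol": 17, "dPort": 53, "bytes": 90, "packets": 1},
]


def bin_flows(records, key_fields):
    """Group records by key and compute Records, Packets and Bytes per bin."""
    bins = defaultdict(lambda: {"Records": 0, "Packets": 0, "Bytes": 0})
    for rec in records:
        # The key is the tuple of the selected attributes, in --fields order.
        key = tuple(rec[field] for field in key_fields)
        agg = bins[key]
        agg["Records"] += 1
        agg["Packets"] += rec["packets"]
        agg["Bytes"] += rec["bytes"]
    return dict(bins)


bins = bin_flows(flows, ["protocol", "dPort"])
for key, agg in sorted(bins.items()):
    print(key, agg)
```

       Sorting the bins before printing corresponds loosely to the
       --sort-output switch; without it, the order of the bins is not
       specified.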

       There is	no need	to sort	the input to rwuniq since rwuniq normally
       rearranges the records as they are read.	 To have rwuniq	sort its
       output, use the --sort-output switch.

       rwuniq reads SiLK Flow records from the files named on the command line
       or from the standard input when no file names are specified and --xargs
       is not present.	To read	the standard input in addition to the named
       files, use "-" or "stdin" as a file name.  If an	input file name	ends
       in ".gz", the file is uncompressed as it	is read.  When the --xargs
       switch is provided, rwuniq reads	the names of the files to process from
       the named text file or from the standard	input if no file name argument
       is provided to the switch.  The input to	--xargs	must contain one file
       name per	line.

       The user	must provide the --fields switch to select the flow
       attribute(s) (or	field(s)) that comprise	the key	for each bin.  The
       available fields	are similar to those supported by rwcut(1); see	the
       description of the --fields switch in the "OPTIONS" section below for
       the details.  The list of fields	can be extended	by loading PySiLK
       files (see silkpython(3)) or plug-ins (silk-plugin(3)).	The fields
       will be printed in the order in which they occur	in the --fields
       switch.  The size of the key is limited to 256 octets.  A larger key
       uses the available memory more quickly, which leads to slower
       performance.

       The aggregate value(s) to compute for each bin are also chosen by the
       user.  As with the key fields, the user can extend the list of
       aggregate fields	by using PySiLK	or plug-ins.  The preferred way	to
       specify the aggregate fields is to use the --values switch; the
       aggregate fields	will be	printed	in the order they occur	in the
       --values	switch.	 The thresholding switches (e.g., --bytes) can also be
       used to specify the aggregate values to compute.	 Aggregate values that
       are only	specified with thresholding switches will be printed after
       those that appear in --values, in the following order for backward
       compatibility: bytes, packets, flows, stime, etime, sip-distinct, dip-
       distinct.  If the user does not select any aggregate value(s), rwuniq
       defaults	to computing the number	of flow	records	for each bin and
       printing	all bins.  As with the key fields, requesting more aggregate
       values slows performance.
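
       The ordering rule for aggregate columns can be sketched in Python.
       This is an illustration only: output_columns() is a hypothetical
       helper, and the mapping from each thresholding switch to a --values
       name is an assumption inferred from the option descriptions in this
       page.

```python
# Order in which threshold-only aggregates are appended, per the text above.
LEGACY_ORDER = ["bytes", "packets", "flows", "stime", "etime",
                "sip-distinct", "dip-distinct"]

# Assumed mapping from thresholding switch to --values name.
SWITCH_TO_VALUE = {
    "bytes": "Bytes",
    "packets": "Packets",
    "flows": "Records",
    "stime": "sTime-Earliest",
    "etime": "eTime-Latest",
    "sip-distinct": "sIP-Distinct",
    "dip-distinct": "dIP-Distinct",
}


def output_columns(values, threshold_switches):
    """Return aggregate column names in the order they would be printed."""
    columns = list(values)
    seen = {name.lower() for name in columns}  # names are case insensitive
    for switch in LEGACY_ORDER:
        name = SWITCH_TO_VALUE[switch]
        if switch in threshold_switches and name.lower() not in seen:
            columns.append(name)
            seen.add(name.lower())
    # With no aggregate selected, only the record count is computed.
    return columns or ["Records"]


print(output_columns(["Packets"], {"flows", "bytes"}))
```

       In the example, Bytes precedes Records even though the set of
       switches carries no order, because threshold-only aggregates
       follow the fixed backward-compatible order.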

       The --presorted-input switch may	allow rwuniq to	process	data more
       efficiently by causing rwuniq to	assume the input has been previously
       sorted with the rwsort(1) command.  With	this switch, rwuniq typically
       does not	need large amounts of memory because it	does not bin each
       flow; instead, it keeps a running summation and outputs the bin
       whenever	the key	changes.  For the output to be meaningful, rwsort and
       rwuniq must be invoked with the same --fields value.  When multiple
       input files are specified and --presorted-input is given, rwuniq	will
       merge-sort the flow records from	the input files.  rwuniq will usually
       run faster if you do not	include	the --presorted-input switch when
       counting	distinct IP addresses, even when reading sorted	input.
       Finally,	you may	get unusual results with --presorted-input when	the
       --fields	switch contains	multiple time-related key fields ("sTime",
       "duration", "eTime"), or	when the time-related key is not the final key
       listed in --fields; see the "NOTES" section for details.
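
       The running summation used with --presorted-input can be sketched
       in Python as follows.  The uniq_presorted() helper and the record
       dictionaries are hypothetical, and only a record count and a byte
       sum are kept.  The sketch emits a bin each time the key changes, so
       it also shows why input that is not sorted on the same fields
       produces repeated rows.

```python
from itertools import groupby


def uniq_presorted(records, key_fields):
    """Yield (key, record_count, byte_sum), assuming records are sorted by key."""
    def key_of(rec):
        return tuple(rec[field] for field in key_fields)

    # groupby() starts a new group whenever the key changes, so only one
    # bin is held in memory at a time.
    for key, group in groupby(records, key=key_of):
        count = 0
        total = 0
        for rec in group:
            count += 1
            total += rec["bytes"]
        yield key, count, total


sorted_records = [
    {"sPort": 22, "bytes": 10},
    {"sPort": 22, "bytes": 5},
    {"sPort": 80, "bytes": 7},
]
unsorted_records = [
    {"sPort": 22, "bytes": 10},
    {"sPort": 80, "bytes": 7},
    {"sPort": 22, "bytes": 5},
]

sorted_rows = list(uniq_presorted(sorted_records, ["sPort"]))
unsorted_rows = list(uniq_presorted(unsorted_records, ["sPort"]))
print(sorted_rows)
print(unsorted_rows)
```

       With the unsorted list, port 22 appears in two separate rows
       because the key changed in between.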

       rwuniq attempts to keep all key and aggregate value data	in the
       computer's memory.  If rwuniq runs out of memory, the current key and
       aggregate value data is written to a temporary file.  Once all input
       has been	processed, the data from the temporary files is	merged to
       produce the final output.  By default, these temporary files are	stored
       in the /tmp directory.  Because these files can be large, it is
       strongly	recommended that /tmp not be used as the temporary directory.
       To modify the temporary directory used by rwuniq, provide the
       --temp-directory	switch,	set the	SILK_TMPDIR environment	variable, or
       set the TMPDIR environment variable.

OPTIONS
       Option names may	be abbreviated if the abbreviation is unique or	is an
       exact match for an option.  A parameter to an option may	be specified
       as --arg=param or --arg param, though the first form is required	for
       options that take optional parameters.

       The --fields switch is required.	 rwuniq	will fail when it is not
       provided.

       --fields=KEY
	   KEY contains	the list of flow attributes (a.k.a. fields or columns)
	   that	make up	the key	into which flows are binned.  The columns will
	   be displayed	in the order the fields	are specified.	Each field may
	   be specified	once only.  KEY	is a comma separated list of field-
	   names, field-integers, and ranges of	field-integers;	a range	is
	   specified by	separating the start and end of	the range with a
	   hyphen (-).	Field-names are	case insensitive.  Example:

	    --fields=stime,10,1-5

	   There is no default value for the --fields switch; the switch must
	   be specified.
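
	   How a KEY such as the example expands can be sketched in Python.
	   The expand_fields() helper is hypothetical and covers only
	   fields 1 through 12 from the list in this page; it is meant to
	   show the name, integer, and range forms, not to reproduce
	   rwuniq's own parser.

```python
# Subset of the built-in fields (numbers 1-12) listed in this manual page.
FIELD_BY_NUMBER = {
    1: "sIP", 2: "dIP", 3: "sPort", 4: "dPort", 5: "protocol",
    6: "packets", 7: "bytes", 8: "flags", 9: "sTime", 10: "duration",
    11: "eTime", 12: "sensor",
}
# Field names are case insensitive, so look them up in lower case.
NAME_LOOKUP = {name.lower(): name for name in FIELD_BY_NUMBER.values()}


def expand_fields(spec):
    """Expand a comma separated KEY into an ordered list of field names."""
    fields = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item and item.replace("-", "").isdigit():
            # A range of field-integers, such as 1-5.
            start, end = (int(part) for part in item.split("-"))
            names = [FIELD_BY_NUMBER[n] for n in range(start, end + 1)]
        elif item.isdigit():
            names = [FIELD_BY_NUMBER[int(item)]]
        else:
            names = [NAME_LOOKUP[item.lower()]]
        for name in names:
            if name in fields:
                # Each field may be specified once only.
                raise ValueError(f"field {name!r} specified more than once")
            fields.append(name)
    return fields


print(expand_fields("stime,10,1-5"))
```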

	   The complete	list of	built-in fields	that the SiLK tool suite
	   supports follows, though note that not all fields are present in
	   all SiLK file formats; when a field is not present, its value is 0.

	   sIP,1
	       source IP address

	   dIP,2
	       destination IP address

	   sPort,3
	       source port for TCP and UDP, or equivalent

	   dPort,4
	       destination port	for TCP	and UDP, or equivalent.	 See note at
	       "iType".

	   protocol,5
	       IP protocol

	   packets,pkts,6
	       packet count

	   bytes,7
	       byte count

	   flags,8
	       bit-wise	OR of TCP flags	over all packets

	   sTime,9
	       starting	time of	flow (seconds resolution). When	the time-
	       related fields "sTime","duration","eTime" are all in use,
	       rwuniq will ignore the final time field when binning the
	       records.

	   duration,10
	       duration	of flow	(seconds resolution).  See note	at "sTime,9".

	   eTime,11
	       end time	of flow	(seconds resolution).  See note	at "sTime,9".

	   sensor,12
	       name or ID of the sensor	where the flow was collected

	   class,20
	       class assigned to the flow by rwflowpack(8).  Binning by
	       "class" and/or "type" equates to	binning	by the integer value
	       used internally to represent the	class/type pair.  When
	       --fields	contains "class" but not "type", rwuniq's output will
	       have multiple rows with the same	value(s) for the key field(s).

	   type,21
	       type assigned to	the flow by rwflowpack(8).  See	note on
	       previous	entry.

	   iType
	       the ICMP	type value for ICMP or ICMPv6 flows and	empty
	       (numerically zero) for non-ICMP flows.  Internally, SiLK	stores
	       the ICMP	type and code in the "dPort" field.  To	avoid getting
	       very odd	results, either	do not use the "dPort" field when your
	       key includes ICMP field(s) or be	certain	to include the
	       "protocol" field	as part	of your	key.  This field was
	       introduced in SiLK 3.8.1.

	   iCode
	       the ICMP	code value for ICMP or ICMPv6 flows and	empty for non-
	       ICMP flows.  See	note at	"iType".

	   icmpTypeCode,25
	       equivalent to "iType","iCode" when used in --fields.  This
	       field may not be	mixed with "iType" or "iCode", and this	field
	       is deprecated as	of SiLK	3.8.1.	As of SiLK 3.8.1,
	       "icmpTypeCode" may no longer be used as the argument to the
	       "Distinct:" value field;	the "dPort" field will provide an
	       equivalent result as long as the	input is limited to ICMP flow
	       records.

	   Many	SiLK file formats do not store the following fields and	their
	   values will always be 0; they are listed here for completeness:

	   in,13
	       router SNMP input interface or vlanId if	packing	tools were
	       configured to capture it	(see sensor.conf(5))

	   out,14
	       router SNMP output interface or postVlanId

	   nhIP,15
	       router next hop IP

	   SiLK	can store flows	generated by enhanced collection software that
	   provides more information than NetFlow v5.  These flows may support
	   some	or all of these	additional fields; for flows without this
	   additional information, the field's value is	always 0.

	   initialFlags,26
	       TCP flags on first packet in the	flow

	   sessionFlags,27
	       bit-wise	OR of TCP flags	over all packets except	the first in
	       the flow

	   attributes,28
	       flow attributes set by the flow generator:

	       "S" all the packets in this flow	record are exactly the same
		   size

	       "F" flow	generator saw additional packets in this flow
		   following a packet with a FIN flag (excluding ACK packets)

	       "T" flow	generator prematurely created a	record for a long-
		   running connection due to a timeout.	 (When the flow
		   generator yaf(1) is run with	the --silk switch, it will
		   prematurely create a	flow and mark it with "T" if the byte
		   count of the flow cannot be stored in a 32-bit value.)

	       "C" flow generator created this flow as a continuation of a
		   long-running connection, where the previous flow for this
		   connection met a timeout (or a byte threshold in the case
		   of yaf).

	       Consider a long-running ssh session that exceeds the flow
	       generator's active timeout.  (It is called the active timeout
	       because the flow generator creates a flow record for a
	       connection that still has activity.)  The flow generator will
	       create multiple flow
	       records for this	ssh session, each spanning some	portion	of the
	       total session.  The first flow record will be marked with a "T"
	       indicating that it hit the timeout.  The	second through next-
	       to-last records will be marked with "TC"	indicating that	this
	       flow both timed out and is a continuation of a flow that	timed
	       out.  The final flow will be marked with	a "C", indicating that
	       it was created as a continuation	of an active flow.
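
	       The marking pattern in the ssh example can be written as a
	       small Python sketch.  The continuation_attributes() helper
	       is hypothetical, and returning an empty string for a session
	       that fits in a single record is an assumption.

```python
def continuation_attributes(num_records):
    """Return the timeout/continuation marks for each record of one session."""
    if num_records < 1:
        raise ValueError("a session has at least one flow record")
    if num_records == 1:
        # Assumption: a session that never hit the timeout has neither mark.
        return [""]
    # First record timed out, middle records timed out and continue an
    # earlier record, and the final record only continues an earlier one.
    return ["T"] + ["TC"] * (num_records - 2) + ["C"]


print(continuation_attributes(4))
```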

	   application,29
	       guess as	to the content of the flow.  Some software that
	       generates flow records from packet data,	such as	yaf, will
	       inspect the contents of the packets that	make up	a flow and use
	       traffic signatures to label the content of the flow.  SiLK
	       calls this label	the application; yaf refers to it as the
	       appLabel.  The application is the port number that is
	       traditionally used for that type	of traffic (see	the
	       /etc/services file on most UNIX systems).  For example, traffic
	       that the	flow generator recognizes as FTP will have a value of
	       21, even	if that	traffic	is being routed	through	the standard
	       HTTP/web	port (80).

	   The following fields	provide	a way to label the IPs or ports	on a
	   record.  These fields require external files	to provide the mapping
	   from	the IP or port to the label:

	   sType,16
	       for the source IP address, the value 0 if the address is	non-
	       routable, 1 if it is internal, or 2 if it is routable and
	       external.  Uses the mapping file	specified by the
	       SILK_ADDRESS_TYPES environment variable,	or the
	       address_types.pmap mapping file,	as described in	addrtype(3).

	   dType,17
	       as sType	for the	destination IP address

	   scc,18
	       for the source IP address, a two-letter country code
	       abbreviation denoting the country where that IP address is
	       located.	 Uses the mapping file specified by the
	       SILK_COUNTRY_CODES environment variable,	or the
	       country_codes.pmap mapping file,	as described in	ccfilter(3).
	       The abbreviations are those used	by the Root-Zone Whois Index
	       (see for	example	<http://www.iana.org/cctld/cctld-whois.htm>)
	       or the following	special	codes: -- N/A (e.g. private and
	       experimental reserved addresses); a1 anonymous proxy; a2
	       satellite provider; o1 other

	   dcc,19
	       as scc for the destination IP

	   src-MAPNAME
	       label determined	by passing the source IP or the
	       protocol/source-port to the user-defined	mapping	defined	in the
	       prefix map associated with MAPNAME.  See	the description	of the
	       --pmap-file switch below	and the	pmapfilter(3) manual page.

	   dst-MAPNAME
	       as src-MAPNAME for the destination IP or
	       protocol/destination-port.

	   sval
	   dval
	       These are deprecated field names	created	by pmapfilter that
	       correspond to src-MAPNAME and dst-MAPNAME, respectively.	 These
	       fields are available when a prefix map is used that is not
	       associated with a MAPNAME.

	   Finally, the	list of	built-in fields	may be augmented by the	run-
	   time	loading	of PySiLK code or plug-ins written in C	(also called
	   shared object files or dynamic libraries), as described by the
	   --python-file and --plugin switches.

       --values=VALUES
	   Specify the aggregate values	to compute for each bin	as a comma
	   separated list of names.  Names are case insensitive.  When a
	   thresholding switch specifies an aggregate value field that does
	   not appear in VALUES, that field is added to the end of VALUES.  When
	   neither the --values	switch nor any thresholding switch is
	   specified, rwuniq counts the	number of flow records for each	bin.
	   The aggregate fields	are printed in the order they occur in VALUES.
	   The names of	the built-in value fields follow.  This	list can be
	   augmented through the use of	PySiLK and plug-ins.

	   Records
	       Count the number	of flow	records	that mapped to each bin.

	   Packets
	       Sum the number of packets across	all records that mapped	to
	       each bin.

	   Bytes
	       Sum the number of bytes across all records that mapped to each
	       bin.

	   sTime-Earliest
	       Keep track of the earliest start	time (minimum time) seen
	       across all records that mapped to each bin.

	   eTime-Latest
	       Keep track of the latest	end time (maximum time)	seen across
	       all records that	mapped to each bin.

	   sIP-Distinct
	       Count the number	of distinct source IP addresses	that were seen
	       for each	bin.

	   dIP-Distinct
	       Count the number	of distinct destination	IP addresses that were
	       seen for	each bin.

	   Distinct:KEY_FIELD
	       Count the number	of distinct values for KEY_FIELD, where
	       KEY_FIELD is any	field that can be used as an argument to
	       --fields	except "icmpTypeCode".	For example, "Distinct:sPort"
	       will count the number of	distinct source	ports for each bin.
	       When this aggregate value field is used,	the specified
	       KEY_FIELD cannot	be present in the argument to --fields.
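
	   The Distinct: aggregate can be pictured as one set of values per
	   bin.  The Python sketch below is an illustration under that
	   assumption; distinct_per_bin() and the record dictionaries are
	   hypothetical, and rwuniq's actual data structures may differ.

```python
from collections import defaultdict


def distinct_per_bin(records, key_fields, distinct_field):
    """Count the distinct values of distinct_field seen in each bin."""
    if distinct_field in key_fields:
        # The counted field cannot also be part of the key.
        raise ValueError(f"{distinct_field!r} is already part of the key")
    seen = defaultdict(set)
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        seen[key].add(rec[distinct_field])
    return {key: len(values) for key, values in seen.items()}


records = [
    {"dIP": "10.0.0.1", "sPort": 40000},
    {"dIP": "10.0.0.1", "sPort": 40001},
    {"dIP": "10.0.0.1", "sPort": 40000},
    {"dIP": "10.0.0.2", "sPort": 40000},
]
counts = distinct_per_bin(records, ["dIP"], "sPort")
print(counts)
```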

       --plugin=PLUGIN
	   Augment the list of key fields and/or aggregate value fields	by
	   using run-time loading of the plug-in (shared object) whose path is
	   PLUGIN.  The	switch may be repeated to load multiple	plug-ins.  The
	   creation of plug-ins	is described in	the silk-plugin(3) manual
	   page.  When PLUGIN does not contain a slash ("/"), rwuniq will
	   attempt to find a file named	PLUGIN in the directories listed in
	   the "FILES" section.	 If rwuniq finds the file, it uses that	path.
	   If PLUGIN contains a	slash or if rwuniq does	not find the file,
	   rwuniq relies on your operating system's dlopen(3) call to find the
	   file.  When the SILK_PLUGIN_DEBUG environment variable is non-
	   empty, rwuniq prints	status messages	to the standard	error as it
	   attempts to find and	open each of its plug-ins.

       The next	eight options will add the appropriate aggregate field to
       --values	if the field is	not present.  The options are processed	in the
       order they appear here, regardless of the order they occur on the
       command line.  Use of these switches without a threshold	value is
       deprecated.

       --all-counts
	   Enable the next five	sets of	options	with their default thresholds;
	   i.e., all possible counts (except the distinct counts) are computed
	   and printed.	 This switch is	deprecated.

       --bytes
       --bytes=MIN
       --bytes=MIN-MAX
	   Cause rwuniq	to total, for each unique key, the number of bytes in
	   each	flow record.  When MIN is provided, bins are printed only when
	   they	had at least MIN total bytes.  When MAX	is also	provided, bins
	   are printed only when they had no more than MAX total bytes.	 A MIN
	   of 0	is treated as 1.  When MIN is not provided, a default of 1 is
	   used.

       --packets
       --packets=MIN
       --packets=MIN-MAX
	   Cause rwuniq	to sum,	for each unique	key, the number	of packets in
	   each	flow record.  When MIN is provided, bins are printed only when
	   they	had at least MIN sum of	packets.  When MAX is also provided,
	   bins	are printed only when they had no more than MAX	sum of
	   packets.  A MIN of 0	is treated as 1.  When MIN is not provided, a
	   default of 1	is used.

       --flows
       --flows=MIN
       --flows=MIN-MAX
	   Cause rwuniq	to sum the number of flow records in each uniquely
	   keyed bin.  When MIN	is provided, bins are printed only when	they
	   had at least	MIN number of flows.  When MAX is also provided, bins
	   are printed only when they had no more than MAX flows.  A MIN of 0
	   is treated as 1.  When MIN is not provided, a default of 1 is used.

       --stime
	   Cause rwuniq	to keep	track of the earliest time at which it saw a
	   flow	that matched each bin's	unique key.  This option does not
	   support thresholds, and it is deprecated.

       --etime
	   Cause rwuniq	to keep	track of the latest (most recent) time at
	   which it saw	a flow that matched each bin's unique key.  This
	   option does not support thresholds, and it is deprecated.

       --sip-distinct
       --sip-distinct=MIN
       --sip-distinct=MIN-MAX
	   Cause rwuniq	to count the number of distinct	source IP addresses
	   that	were seen for each uniquely keyed bin.	When MIN is provided,
	   bins	are printed only when they had at least	MIN distinct sources.
	   When	MAX is also provided, bins are printed only when they had no
	   more	than MAX distinct sources.  A MIN of 0 is treated as 1.	 When
	   MIN is not provided,	a default of 1 is used.	 When this switch is
	   provided, the sIP field cannot be part of the key.

       --dip-distinct
       --dip-distinct=MIN
       --dip-distinct=MIN-MAX
	   As --sip-distinct for destination IP	addresses.
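
       The MIN and MIN-MAX forms shared by the thresholding switches can
       be sketched in Python.  The parse_threshold() and passes() helpers
       are hypothetical; they follow the rules stated above (a default of
       1, a MIN of 0 treated as 1, and an optional inclusive MAX).

```python
def parse_threshold(arg=None):
    """Return (minimum, maximum) for '', 'MIN' or 'MIN-MAX'.

    maximum is None when no upper limit was given.
    """
    if arg is None or arg == "":
        return 1, None
    low, sep, high = arg.partition("-")
    minimum = max(int(low), 1)  # a MIN of 0 is treated as 1
    maximum = int(high) if sep else None
    return minimum, maximum


def passes(value, threshold):
    """Return True when a bin's aggregate value is within the threshold."""
    minimum, maximum = threshold
    return value >= minimum and (maximum is None or value <= maximum)


print(parse_threshold("1000"))
print(parse_threshold("0-50"))
```

       A bin with exactly MIN records passes, since the minimum is
       inclusive.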

       Miscellaneous options:

       --presorted-input
	   Cause rwuniq	to assume that it is reading sorted input; i.e., that
	   rwuniq's input file(s) were generated by rwsort(1) using the	exact
	   same	value for the --fields switch.	When no	distinct counts	are
	   being computed, rwuniq can process its input	without	needing	to
	   write temporary files.  When	multiple input files are specified,
	   rwuniq will merge-sort the flow records from	the input files.  See
	   the "NOTES" section for issues that may occur when using
	   --presorted-input.

       --sort-output
	   Cause rwuniq	to present the output in sorted	numerical order.  The
	   key rwuniq uses for sorting is the same key it uses to index	each
	   bin.

       --bin-time
       --bin-time=SECONDS
	   Adjust the key fields 'sTime' and 'eTime' to	appear on
	   SECONDS-second boundaries (the floor	of the time is used).  When no
	   value is provided to	the switch, 60-second time bins	are used.
	   (When the start-time	is the only key	field and time binning is
	   desired, consider using rwcount(1) instead.)
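
	   Taking the floor of a time on a SECONDS-second boundary amounts
	   to the arithmetic below.  This Python sketch assumes times
	   expressed as whole seconds since the UNIX epoch; bin_time() is a
	   hypothetical helper name.

```python
def bin_time(epoch_seconds, bin_size=60):
    """Round a time down to the start of its bin_size-second bin."""
    return epoch_seconds - (epoch_seconds % bin_size)


# 1234567890 is 2009-02-13 23:31:30 UTC.
print(bin_time(1234567890))        # start of the minute, 23:31:00
print(bin_time(1234567890, 3600))  # start of the hour, 23:00:00
```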

       --timestamp-format=FORMAT
	   Specify the format and/or timezone to use when printing timestamps.
	   When	this switch is not specified, the SILK_TIMESTAMP_FORMAT
	   environment variable	is checked for a default format	and/or
	   timezone.  If it is empty or	contains invalid values, timestamps
	   are printed in the default format, and the timezone is UTC unless
	   SiLK	was compiled with local	timezone support.  FORMAT is a comma-
	   separated list of a format and/or a timezone.  The format is	one
	   of:

	   default
	       Print the timestamps as "YYYY/MM/DDThh:mm:ss".

	   iso Print the timestamps as "YYYY-MM-DD hh:mm:ss".

	   m/d/y
	       Print the timestamps as "MM/DD/YYYY hh:mm:ss".

	   epoch
	       Print the timestamps as the number of seconds since 00:00:00
	       UTC on 1970-01-01.

	   When	a timezone is specified, it is used regardless of the default
	   timezone support compiled into SiLK.	 The timezone is one of:

	   utc Use Coordinated Universal Time to print timestamps.

	   local
	       Use the TZ environment variable or the local timezone.
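
	   The four formats can be compared with a short Python sketch.
	   It prints times in UTC only and works at seconds resolution;
	   format_timestamp() is a hypothetical helper that mirrors the
	   patterns listed above, not code from SiLK.

```python
from datetime import datetime, timezone

# strftime patterns corresponding to the formats described above.
FORMATS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}


def format_timestamp(epoch_seconds, fmt="default"):
    """Render a UTC timestamp in one of the formats named above."""
    if fmt == "epoch":
        return str(epoch_seconds)
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(FORMATS[fmt])


for name in ("default", "iso", "m/d/y", "epoch"):
    print(name, format_timestamp(1234567890, name))
```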

       --epoch-time
	   Print timestamps as epoch time (number of seconds since midnight
	   GMT on 1970-01-01).	This switch is equivalent to
	   --timestamp-format=epoch, it	is deprecated as of SiLK 3.0.0,	and it
	   will	be removed in the SiLK 4.0 release.

       --ip-format=FORMAT
	   Specify how IP addresses are	printed.  When this switch is not
	   specified, the SILK_IP_FORMAT environment variable is checked for a
	   format.  If it is empty or contains an invalid format, IPs are
	   printed in the canonical format.  The FORMAT	is one of:

	   canonical
	       Print IP	addresses in their canonical form: dotted quad for
	       IPv4 (127.0.0.1)	and hexadectet for IPv6	("2001:db8::1").  Note
	       that IPv6 addresses in ::ffff:0:0/96 and	some IPv6 addresses in
	       ::/96 will be printed as	a mixture of IPv6 and IPv4.

	   zero-padded
	       Print IP addresses in their canonical form, but add zeros to
	       the output so it fully fills the width of the column.  The
	       addresses 127.0.0.1 and "2001:db8::1" are printed as
	       127.000.000.001 and "2001:0db8:0000:0000:0000:0000:0000:0001",
	       respectively.  When the --ipv6-policy is	"force", the output
	       for 127.0.0.1 becomes
	       "0000:0000:0000:0000:0000:ffff:7f00:0001".

	   decimal
	       Print IP	addresses as integers in decimal format.  The
	       addresses 127.0.0.1 and "2001:db8::1" are printed as 2130706433
	       and 42540766411282592856903984951653826561, respectively.

	   hexadecimal
	       Print IP	addresses as integers in hexadecimal format.  The
	       addresses 127.0.0.1 and "2001:db8::1" are printed as "7f000001"
	       and "20010db8000000000000000000000001", respectively.

	   force-ipv6
	       Print all IP addresses in the canonical form for	IPv6 without
	       using any IPv4 notation.	 Any IPv4 address is mapped into the
	       ::ffff:0:0/96 netblock.	The addresses 127.0.0.1	and
	       "2001:db8::1" are printed as "::ffff:7f00:1" and	"2001:db8::1",
	       respectively.
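
	   The sample values above can be reproduced with Python's
	   ipaddress module.  The format_ip() helper is a hypothetical
	   sketch: column padding is ignored, and whether rwuniq pads
	   hexadecimal output with leading zeros is not stated here, so
	   none is added.

```python
import ipaddress


def format_ip(text, fmt="canonical"):
    """Render an IP address in one of the formats named above."""
    ip = ipaddress.ip_address(text)
    if fmt == "canonical":
        return str(ip)
    if fmt == "zero-padded":
        if ip.version == 4:
            return ".".join(f"{int(octet):03d}" for octet in str(ip).split("."))
        return ip.exploded
    if fmt == "decimal":
        return str(int(ip))
    if fmt == "hexadecimal":
        return format(int(ip), "x")
    if fmt == "force-ipv6":
        if ip.version == 4:
            # Map the IPv4 address into ::ffff:0:0/96, with no IPv4 notation.
            value = int(ip)
            high = value >> 16
            low = value & 0xFFFF
            return f"::ffff:{high:x}:{low:x}"
        return str(ip)
    raise ValueError(f"unknown format: {fmt!r}")


for name in ("canonical", "zero-padded", "decimal", "hexadecimal", "force-ipv6"):
    print(name, format_ip("127.0.0.1", name), format_ip("2001:db8::1", name))
```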

       --integer-ips
	   Print IP addresses as integers.  This switch	is equivalent to
	   --ip-format=decimal,	it is deprecated as of SiLK 3.7.0, and it will
	   be removed in the SiLK 4.0 release.

       --zero-pad-ips
	   Print IP addresses as fully-expanded, zero-padded values in their
	   canonical form.  This switch	is equivalent to
	   --ip-format=zero-padded, it is deprecated as	of SiLK	3.7.0, and it
	   will	be removed in the SiLK 4.0 release.

       --integer-sensors
	   Print the integer ID	of the sensor rather than its name.

       --integer-tcp-flags
	   Print the TCP flag fields (flags, initialFlags, sessionFlags) as an
	   integer value.  Typically, the characters "F,S,R,P,A,U,E,C" are
	   used	to represent the TCP flags.

       --no-titles
	   Turn	off column titles.  By default,	titles are printed.

       --no-columns
	   Disable fixed-width columnar	output.

       --column-separator=C
	   Use specified character between columns and after the final column.
	   When	this switch is not specified, the default of '|' is used.

       --no-final-delimiter
	   Do not print	the column separator after the final column.  Normally
	   a delimiter is printed.

       --delimited
       --delimited=C
	   Run as if --no-columns --no-final-delimiter --column-sep=C had been
	   specified.  That is,	disable	fixed-width columnar output; if
	   character C is provided, it is used as the delimiter	between
	   columns instead of the default '|'.

       --print-filenames
	   Print to the	standard error the names of input files	as they	are
	   opened.

       --copy-input=PATH
	   Copy	all binary SiLK	Flow records read as input to the specified
	   file	or named pipe.	PATH may be "stdout" or	"-" to write flows to
	   the standard	output as long as the --output-path switch is
	   specified to	redirect rwuniq's textual output to a different
	   location.

       --output-path=PATH
	   Write the textual output to PATH, where PATH	is a filename, a named
	   pipe, the keyword "stderr" to write the output to the standard
	   error, or the keyword "stdout" or "-" to write the output to	the
	   standard output (and	bypass the paging program).  If	PATH names an
	   existing file, rwuniq exits with an error unless the	SILK_CLOBBER
	   environment variable	is set,	in which case PATH is overwritten.  If
	   this	switch is not given, the output	is either sent to the pager or
	   written to the standard output.

       --pager=PAGER_PROG
	   When	output is to a terminal, invoke	the program PAGER_PROG to view
	   the output one screen full at a time.  This switch overrides	the
	   SILK_PAGER environment variable, which in turn overrides the	PAGER
	   variable.  If the --output-path switch is given or if the value of
	   the pager is	determined to be the empty string, no paging is
	   performed and all output is written to the terminal.

       --ipv6-policy=POLICY
	   Determine how IPv4 and IPv6 flows are handled when SiLK has been
	   compiled with IPv6 support.	When the switch	is not provided, the
	   SILK_IPV6_POLICY environment	variable is checked for	a policy.  If
	   it is also unset or contains	an invalid policy, the POLICY is mix.
	   When	SiLK has not been compiled with	IPv6 support, IPv6 flows are
	   always ignored, regardless of the value passed to this switch or in
	   the SILK_IPV6_POLICY	variable.  The supported values	for POLICY
	   are:

	   ignore
	       Ignore any flow record marked as	IPv6, regardless of the	IP
	       addresses it contains.

	   asv4
	       Convert IPv6 flow records that contain addresses	in the
	       ::ffff:0:0/96 prefix to IPv4 and	ignore all other IPv6 flow
	       records.

	   mix Process the input as a mixture of IPv4 and IPv6 flow records.
	       When an IP address is used as part of the key or	value, this
	       policy is equivalent to force.

	   force
	       Convert IPv4 flow records to IPv6, mapping the IPv4 addresses
	       into the	::ffff:0:0/96 prefix.

	   only
	       Process only flow records that are marked as IPv6 and ignore
	       IPv4 flow records in the	input.
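
	   The effect of each policy on a single address can be sketched in
	   Python.  This is a simplification: a real flow record carries
	   several addresses and is itself marked as IPv4 or IPv6, whereas
	   the hypothetical apply_policy() helper below uses the version of
	   one address as a stand-in for that marking.

```python
import ipaddress

# Prefix that holds IPv4 addresses mapped into IPv6.
V4_MAPPED = ipaddress.ip_network("::ffff:0:0/96")


def apply_policy(address, policy="mix"):
    """Return the address to process, or None when the record is ignored."""
    ip = ipaddress.ip_address(address)
    is_v6 = ip.version == 6
    if policy == "ignore":
        return None if is_v6 else ip
    if policy == "asv4":
        if not is_v6:
            return ip
        if ip in V4_MAPPED:
            # Keep the low 32 bits as the IPv4 address.
            return ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF)
        return None
    if policy == "mix":
        # The text notes that mix behaves like force when an IP is part
        # of the key or value; this sketch leaves the address unchanged.
        return ip
    if policy == "force":
        if is_v6:
            return ip
        return ipaddress.IPv6Address((0xFFFF << 32) | int(ip))
    if policy == "only":
        return ip if is_v6 else None
    raise ValueError(f"unknown policy: {policy!r}")


print(apply_policy("::ffff:10.0.0.1", "asv4"))
print(apply_policy("2001:db8::1", "asv4"))
```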

       --temp-directory=DIR_PATH
	   Specify the name of the directory in	which to store data files
	   temporarily when the	memory is not large enough to store all	the
	   bins	and their aggregate values.  This switch overrides the
	   directory specified in the SILK_TMPDIR environment variable,	which
	   overrides the directory specified in	the TMPDIR variable, which
	   overrides the default, /tmp.
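
	   The precedence chain can be expressed as a short Python sketch.
	   The resolve_temp_directory() helper is hypothetical, and
	   treating an empty environment variable as unset is an
	   assumption.

```python
import os


def resolve_temp_directory(switch_value=None, environ=None):
    """Pick the temporary directory using the precedence described above."""
    env = os.environ if environ is None else environ
    return (
        switch_value               # --temp-directory
        or env.get("SILK_TMPDIR")  # then SILK_TMPDIR
        or env.get("TMPDIR")       # then TMPDIR
        or "/tmp"                  # then the default
    )


print(resolve_temp_directory(environ={"TMPDIR": "/var/tmp"}))
```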

       --site-config-file=FILENAME
	   Read	the SiLK site configuration from the named file	FILENAME.
	   When	this switch is not provided, rwuniq searches for the site
	   configuration file in the locations specified in the	"FILES"
	   section.

       --legacy-timestamps
       --legacy-timestamps=NUM
	   When	NUM is not specified or	is 1, this switch is equivalent	to
	   --timestamp-format=m/d/y.  Otherwise, the switch has	no effect.
	   This	switch is deprecated as	of SiLK	3.0.0, and it will be removed
	   in the SiLK 4.0 release.

       --xargs
       --xargs=FILENAME
	   Read	the names of the input files from FILENAME or from the
	   standard input if FILENAME is not provided.	The input is expected
	   to have one filename	per line.  rwuniq opens	each named file	in
	   turn	and reads records from it as if	the filenames had been listed
	   on the command line.

       --help
	   Print the available options and exit.  Specifying switches that add
	   new fields, values, or additional switches before --help will allow
	   the output to include descriptions of those fields or switches.

       --help-fields
	   Print the description and alias(es) of each field and value and
	   exit.  Specifying switches that add new fields before --help-fields
	   will	allow the output to include descriptions of those fields.

       --version
	   Print the version number and	information about how SiLK was
	   configured, then exit the application.

       --pmap-file=MAPNAME:PATH
       --pmap-file=PATH
	   Instruct rwuniq to load the mapping file located at PATH and	create
	   the src-MAPNAME and dst-MAPNAME fields.  When MAPNAME is provided
	   explicitly, it will be used to refer	to the fields specific to that
	   prefix map.	If MAPNAME is not provided, rwuniq will	check the
	   prefix map file to see if a map-name	was specified when the file
	   was created.	 If no map-name	is available, rwuniq creates the
	   fields sval and dval.  Multiple --pmap-file switches	are supported
	   as long as each uses	a unique value for map-name.  The --pmap-file
	   switch(es) must precede the --fields	switch.	 For more information,
	   see pmapfilter(3).
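
	   The naming rule above can be modelled in a few lines of Python.
	   This is only a sketch of the documented behaviour, not rwuniq's
	   own code: the function pmap_field_names() and its arguments are
	   hypothetical, and it assumes PATH contains no colon.

```python
def pmap_field_names(switch_arg, name_in_file=None):
    """Return the (source, destination) field names for one --pmap-file.

    switch_arg   -- the switch's argument: "MAPNAME:PATH" or just "PATH"
    name_in_file -- the map-name stored in the prefix map file, if any
                    (a stand-in for actually reading the file)
    """
    mapname, sep, _path = switch_arg.partition(":")
    if not sep:
        # No explicit MAPNAME: fall back to the name stored in the file.
        mapname = name_in_file
    if mapname:
        return ("src-" + mapname, "dst-" + mapname)
    # No map-name anywhere: the generic field names are used.
    return ("sval", "dval")
```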

       --pmap-column-width=NUM
	   When	printing a label associated with a prefix map, this switch
	   gives the maximum number of characters to use when displaying the
	   textual value of the	field.

       --python-file=PATH
	   When	the SiLK Python	plug-in	is used, rwuniq	reads the Python code
	   from	the file PATH to define	additional fields that can be used as
	   part	of the key or as an aggregate value.  This file	should call
	   register_field() for	each field it wishes to	define.	 For details
	   and examples, see the silkpython(3) and pysilk(3) manual pages.

EXAMPLES
       In these	examples, the dollar sign ("$")	represents the shell prompt
       and a backslash ("\") is	used to	continue a line	for better
       readability.  Many examples assume previous rwfilter(1) commands	have
       written data files named	data.rw	and data-v6.rw.

       Print the byte-,	packet-, and record-counts for each protocol, sorting
       the results by protocol (to sort	by the volume, use rwstats(1)):

	$ rwuniq --fields=proto	--values=bytes,packets,records --sort data.rw
	pro|	     Bytes|	   Packets|   Records|
	  1|	   5344836|	     73473|	 7801|
	  6|   59945492930|	  72127917|    165363|
	 17|	  17553593|	     77764|	77764|
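
       As a rough illustration of what the command above computes, the
       following Python sketch bins a few made-up records by protocol and
       sums the same three values.  The record layout and the function
       bin_records() are assumptions made for this example; rwuniq itself
       reads SiLK Flow files and is not implemented this way.

```python
from collections import defaultdict

def bin_records(records, key_fields):
    """Group records by key_fields; sum bytes, packets, and record count."""
    bins = defaultdict(lambda: {"bytes": 0, "packets": 0, "records": 0})
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        bins[key]["bytes"] += rec["bytes"]
        bins[key]["packets"] += rec["packets"]
        bins[key]["records"] += 1
    # Ordering the rows by key mirrors the effect of --sort-output.
    return [(key, bins[key]) for key in sorted(bins)]

# Made-up flow records; real input would come from a SiLK Flow file.
flows = [
    {"proto": 6, "bytes": 1500, "packets": 3},
    {"proto": 17, "bytes": 80, "packets": 1},
    {"proto": 6, "bytes": 500, "packets": 2},
]
for key, values in bin_records(flows, ["proto"]):
    print(key, values)
```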

       Print the number	of records seen	for each source	port:

	$ rwuniq --fields=sport	data.rw	| head
	sPort|	 Records|
	29485|	      45|
	29055|	      31|
	26373|	       7|
	28149|	      17|
	28171|	      21|
	28413|	      39|
	25836|	       3|
	28376|	       7|
	23847|	       1|

       Print the number of records seen for each source port, for ports
       having at least 1000 records:

	$ rwuniq --fields=sport	--flows=1000 data.rw
	sPort|	 Records|
	   25|	   15568|
	   67|	    7807|
	   80|	   27044|
	   53|	   62216|
	   22|	   27994|
	 8080|	    3946|
	  443|	    7917|
	  123|	    7741|
	    0|	    7801|

       Print the source addresses that sent at least 10,000,000 bytes:

	$ rwuniq --fields=sip --bytes=10000000 data-v6.rw
			      sIP|		 Bytes|
	     2001:db8:a:fd::90:bd|	      14529210|

       For source addresses that sent at least 10,000,000 bytes, print the
       number of unique destination hosts each one contacted:

	$ rwuniq --fields=sip --bytes=10000000 --dip-distinct data-v6.rw
			      sIP|		 Bytes|dIP-Distin|
	     2001:db8:a:fd::90:bd|	      14529210|		2|

       Print the number	of bytes that host shared with each destination	(first
       use rwfilter to limit the input to that host):

	$ rwfilter --saddr=2001:db8:a:fd::90:bd	--pass=- data-v6.rw	   \
	  | rwuniq --fields=dip --values=bytes
			      dIP|		 Bytes|
	    2001:db8:c0:a8::fa:5d|	       7097847|
	     2001:db8:c0:a8::dd:6|	       7431363|

       Print the packet	and byte counts	for each source-destination IP pair,
       where the prefix	length is 16 (use rwnetmask(1) on the input to
       rwuniq):

	$ rwnetmask --4sip-prefix=16 --4dip-prefix=16 data.rw	   \
	  | rwuniq --fields=sip,dip --values=packet,byte | head
		   sIP|		   dIP|	 Packets|	 Bytes|
	    10.139.0.0|	   192.168.0.0|	   33490|     22950353|
	     10.40.0.0|	   192.168.0.0|	     258|	 18544|
	    10.204.0.0|	   192.168.0.0|	  353233|    288736424|
	    10.106.0.0|	   192.168.0.0|	   13051|      3843693|
	     10.71.0.0|	   192.168.0.0|	    4355|      1391194|
	     10.98.0.0|	   192.168.0.0|	    7312|      7328359|
	    10.114.0.0|	   192.168.0.0|	    2538|      4137927|
	    10.168.0.0|	   192.168.0.0|	   92094|     86883062|
	    10.176.0.0|	   192.168.0.0|	  122101|    116555051|

       Print the sources of TCP flows having no more than 3 packets, limited
       to sources that appear at least 4 times (use rwfilter on the input):

	$ rwfilter --proto=6 --packets=1-3 --pass=- data.rw	   \
	  | rwuniq --field=sip --flows=4 | head	-5
		    sIP|   Records|
	 10.147.252.145|       256|
	  10.103.144.78|       256|
	 10.117.142.175|       256|
	  10.41.221.170|       256|

       The silkpython(3) manual	page provides examples that use	PySiLK to
       create arbitrary	fields to use as part of the key for rwuniq.

ENVIRONMENT
       SILK_IPV6_POLICY
	   This	environment variable is	used as	the value for --ipv6-policy
	   when	that switch is not provided.

       SILK_IP_FORMAT
	   This	environment variable is	used as	the value for --ip-format when
	   that	switch is not provided.	 Since SiLK 3.11.0.

       SILK_TIMESTAMP_FORMAT
	   This	environment variable is	used as	the value for
	   --timestamp-format when that	switch is not provided.	 Since SiLK
	   3.11.0.

       SILK_PAGER
	   When	set to a non-empty string, rwuniq automatically	invokes	this
	   program to display its output a screen at a time.  If set to	an
	   empty string, rwuniq	does not automatically page its	output.

       PAGER
	   When	set and	SILK_PAGER is not set, rwuniq automatically invokes
	   this	program	to display its output a	screen at a time.
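
	   The interaction between SILK_PAGER and PAGER can be summarised by
	   the Python sketch below.  The function choose_pager() is
	   hypothetical, it ignores the --pager switch, and treating an empty
	   PAGER as "no pager" is an assumption not stated above.

```python
def choose_pager(env):
    """Return the pager program to run, or None for no automatic paging."""
    if "SILK_PAGER" in env:
        # An empty SILK_PAGER disables paging, even when PAGER is set.
        return env["SILK_PAGER"] or None
    # PAGER is consulted only when SILK_PAGER is not set at all.
    return env.get("PAGER") or None
```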

       SILK_TMPDIR
	   When	set and	--temp-directory is not	specified, rwuniq writes the
	   temporary files it creates to this directory.  SILK_TMPDIR
	   overrides the value of TMPDIR.

       TMPDIR
	   When	set and	SILK_TMPDIR is not set,	rwuniq writes the temporary
	   files it creates to this directory.
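
	   Taken together with the --temp-directory switch and the /tmp/
	   entry in the "FILES" section, the choice of temporary directory
	   might be modelled as follows.  The function choose_temp_dir() is
	   hypothetical, and how rwuniq treats a variable that is set but
	   empty is not covered here.

```python
def choose_temp_dir(env, temp_directory=None):
    """Pick the directory for temporary files.

    env            -- mapping of environment variables
    temp_directory -- value of --temp-directory, or None if not given
    """
    if temp_directory is not None:
        return temp_directory          # the switch wins over the environment
    for name in ("SILK_TMPDIR", "TMPDIR"):
        if name in env:                # SILK_TMPDIR overrides TMPDIR
            return env[name]
    return "/tmp"                      # last entry listed under FILES
```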

       PYTHONPATH
	   This	environment variable is	used by	Python to locate modules.
	   When	--python-file is specified, rwuniq must	load the Python	files
	   that	comprise the PySiLK package, such as silk/__init__.py.	If
	   this	silk/ directory	is located outside Python's normal search path
	   (for	example, in the	SiLK installation tree), it may	be necessary
	   to set or modify the	PYTHONPATH environment variable	to include the
	   parent directory of silk/ so	that Python can	find the PySiLK
	   module.

       SILK_PYTHON_TRACEBACK
	   When	set, Python plug-ins will output traceback information on
	   Python errors to the	standard error.

       SILK_COUNTRY_CODES
	   This	environment variable allows the	user to	specify	the country
	   code	mapping	file that rwuniq uses when computing the scc and dcc
	   fields.  The	value may be a complete	path or	a file relative	to the
	   SILK_PATH.  See the "FILES" section for standard locations of this
	   file.

       SILK_ADDRESS_TYPES
	   This	environment variable allows the	user to	specify	the address
	   type	mapping	file that rwuniq uses when computing the sType and
	   dType fields.  The value may	be a complete path or a	file relative
	   to the SILK_PATH.  See the "FILES" section for standard locations
	   of this file.

       SILK_CLOBBER
	   The SiLK tools normally refuse to overwrite existing	files.
	   Setting SILK_CLOBBER	to a non-empty value removes this restriction.

       SILK_CONFIG_FILE
	   This environment variable is used as the value for
	   --site-config-file when that switch is not provided.

       SILK_DATA_ROOTDIR
	   This environment variable specifies the root directory of the data
	   repository.  As described in the "FILES" section, rwuniq may use
	   this	environment variable when searching for	the SiLK site
	   configuration file.

       SILK_PATH
	   This	environment variable gives the root of the install tree.  When
	   searching for configuration files and plug-ins, rwuniq may use this
	   environment variable.  See the "FILES" section for details.

       TZ  When	the argument to	the --timestamp-format switch includes "local"
	   or when a SiLK installation is built	to use the local timezone, the
	   value of the	TZ environment variable	determines the timezone	in
	   which rwuniq	displays timestamps.  (If both of those	are false, the
	   TZ environment variable is ignored.)	 If the	TZ environment
	   variable is not set,	the machine's default timezone is used.
	   Setting TZ to the empty string or 0 causes timestamps to be
	   displayed in	UTC.  For system information on	the TZ variable, see
	   tzset(3) or environ(7).  (To	determine if SiLK was built with
	   support for the local timezone, check the "Timezone support"	value
	   in the output of rwuniq --version.)

       SILK_PLUGIN_DEBUG
	   When	set to 1, rwuniq prints	status messages	to the standard	error
	   as it attempts to find and open each	of its plug-ins.  In addition,
	   when	an attempt to register a field fails, rwuniq prints a message
	   specifying the additional function(s) that must be defined to
	   register the	field in rwuniq.  Be aware that	the output can be
	   rather verbose.

       SILK_TEMPFILE_DEBUG
	   When	set to 1, rwuniq prints	debugging messages to the standard
	   error as it creates,	re-opens, and removes temporary	files.

       SILK_UNIQUE_DEBUG
	   When	set to 1, the binning engine used by rwuniq prints debugging
	   messages to the standard error.

FILES
       ${SILK_ADDRESS_TYPES}
       ${SILK_PATH}/share/silk/address_types.pmap
       ${SILK_PATH}/share/address_types.pmap
       /usr/local/share/silk/address_types.pmap
       /usr/local/share/address_types.pmap
	   Possible locations for the address types mapping file required by
	   the sType and dType fields.

       ${SILK_CONFIG_FILE}
       ${SILK_DATA_ROOTDIR}/silk.conf
       /data/silk.conf
       ${SILK_PATH}/share/silk/silk.conf
       ${SILK_PATH}/share/silk.conf
       /usr/local/share/silk/silk.conf
       /usr/local/share/silk.conf
	   Possible locations for the SiLK site configuration file, which are
	   checked when the --site-config-file switch is not provided.
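
	   A hedged sketch of that search, written in Python: it builds the
	   candidate list in the order shown above.  Both the function
	   site_config_candidates() and the assumption that the list order is
	   the search order are illustrative only; entries that depend on an
	   unset variable are simply skipped.

```python
def site_config_candidates(env):
    """List the silk.conf paths to try, given the environment mapping env."""
    candidates = []
    if "SILK_CONFIG_FILE" in env:
        candidates.append(env["SILK_CONFIG_FILE"])
    if "SILK_DATA_ROOTDIR" in env:
        candidates.append(env["SILK_DATA_ROOTDIR"] + "/silk.conf")
    candidates.append("/data/silk.conf")
    if "SILK_PATH" in env:
        candidates.append(env["SILK_PATH"] + "/share/silk/silk.conf")
        candidates.append(env["SILK_PATH"] + "/share/silk.conf")
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates
```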

       ${SILK_COUNTRY_CODES}
       ${SILK_PATH}/share/silk/country_codes.pmap
       ${SILK_PATH}/share/country_codes.pmap
       /usr/local/share/silk/country_codes.pmap
       /usr/local/share/country_codes.pmap
	   Possible locations for the country code mapping file	required by
	   the scc and dcc fields.

       ${SILK_PATH}/lib64/silk/
       ${SILK_PATH}/lib64/
       ${SILK_PATH}/lib/silk/
       ${SILK_PATH}/lib/
       /usr/local/lib64/silk/
       /usr/local/lib64/
       /usr/local/lib/silk/
       /usr/local/lib/
	   Directories that rwuniq checks when attempting to load a plug-in.

       ${SILK_TMPDIR}/
       ${TMPDIR}/
       /tmp/
	   Directory in	which to create	temporary files.

NOTES
       If multiple thresholds are given (e.g., "--bytes=80 --flows=2"), a
       bin's values must meet all thresholds before the bin is printed.  For
       example, if a given key saw a single 100-byte flow, the entry would not
       be printed given the switches above.
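
       The "all thresholds" rule amounts to a logical AND over the limits.  A
       minimal Python sketch follows; the function meets_thresholds() and the
       dictionary layout are hypothetical, and treating MIN and MAX as
       inclusive bounds is an assumption.

```python
def meets_thresholds(bin_values, thresholds):
    """Return True only if every threshold is satisfied.

    thresholds maps a value name to (minimum, maximum); maximum may be
    None when only a minimum was given.
    """
    for name, (low, high) in thresholds.items():
        value = bin_values[name]
        if value < low or (high is not None and value > high):
            return False
    return True

# The case from the text: one 100-byte flow with --bytes=80 --flows=2.
single_flow = {"bytes": 100, "flows": 1}
limits = {"bytes": (80, None), "flows": (2, None)}
print(meets_thresholds(single_flow, limits))   # False: the flow count is too low
```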

       rwuniq functionally replaces the	combination of

	rwcut |	sort | uniq -c

       To get a	list of	unique IP addresses in a data set without the counting
       or threshold abilities of rwuniq, consider using	the IPset tools	for
       improved	performance:

	rwset --sip-set=stdout | rwsetcat --print-ips

       For situations where the	key and	value are each a single	field, the Bag
       tools usually provide better performance, especially when the key is
       one or two bytes:

	rwbag --bag-file=sport,bytes,stdout | rwbagcat

       rwgroup(1) works	similarly to rwuniq, except the	data remains in	the
       form of SiLK Flow records, and the next-hop-IP field is modified	to
       denote the records that form a bin.

       rwstats(1) can do the same binning as rwuniq, and then sort the data by
       an aggregate field.

       When the	--bin-time switch is given and the three time fields
       (starting-time ("sTime"), ending-time ("eTime"),	and duration
       ("duration")) are present in the	key, the duration field's value	will
       be modified to be the difference	between	the ending and starting	times.

       When the	three time-related key fields ("sTime","duration","eTime") are
       all in use, rwuniq will ignore the final	time field when	binning	the
       records,	but the	field will appear in the output.  Due to truncation of
       the milliseconds	values,	rwuniq will print a different number of	rows
       depending on the	order in which those three values appear in the
       --fields	switch.

       rwuniq supports counting distinct source and/or destination IPs.  To
       see the number of distinct sources for each 10-minute bin, run:

	rwuniq --fields=stime --values=sip-distinct --bin-time=600 --sort-output

       When computing distinct counts over a field, the	field may not be part
       of the key; that	is, you	cannot have "--fields=sip
       --values=sip-distinct".
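
       Conceptually, a distinct count keeps a set of addresses for each bin.
       The Python sketch below models the 10-minute example with made-up
       records; the function distinct_sources_per_bin() is hypothetical, and
       rounding start times down to the bin boundary is an assumption about
       how --bin-time behaves.

```python
from collections import defaultdict

def distinct_sources_per_bin(records, bin_seconds=600):
    """Map each time-bin start to the number of distinct source IPs in it."""
    seen = defaultdict(set)
    for stime, sip in records:
        bin_start = stime - (stime % bin_seconds)   # floor to the bin start
        seen[bin_start].add(sip)
    return {start: len(seen[start]) for start in sorted(seen)}

# (start time in epoch seconds, source address); made-up sample data
sample = [
    (1234396800, "10.0.0.1"),
    (1234396900, "10.0.0.1"),
    (1234397000, "10.0.0.2"),
    (1234397400, "10.0.0.3"),
]
counts = distinct_sources_per_bin(sample)
```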

       Using the --presorted-input switch sometimes introduces more issues
       than it solves, and --presorted-input is	less necessary now that	rwuniq
       can use temporary files while processing	input.

       When computing distinct IP counts, rwuniq will typically	run faster if
       you do not use the --presorted-input switch, even if the	data was
       previously sorted.

       When using the --presorted-input switch, it is highly recommended that
       you use no more than one time-related key field ("sTime", "duration",
       "eTime") in the --fields switch and that the time-related key appear
       last in --fields.  The reason is that rwsort considers the millisecond
       values of the times when sorting, while rwuniq truncates the
       millisecond values.  Otherwise, the result may be unsorted output and
       multiple rows in the output that have the same values for the key
       fields:

	$ rwsort --fields=stime,duration data.rw       \
	  | rwuniq --fields=stime,dur --presorted
		      sTime|durat|   Records|
	...
	2009/02/12T00:00:57|	0|	   2|
	2009/02/12T00:00:57|   29|	   2|
	2009/02/12T00:00:57|	0|	   2|
	2009/02/12T00:00:57|   13|	   2|
	...
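
       The effect can be reproduced with a small Python model.  It sorts on
       the millisecond values, truncates them to whole seconds, and then
       merges only adjacent equal keys, which is what a reader that trusts
       its input to be sorted would do.  The function presorted_rows() and
       the sample values are made up for illustration.

```python
from itertools import groupby

def presorted_rows(records_ms):
    """records_ms: (start time, duration) pairs, both in milliseconds."""
    ordered = sorted(records_ms)                      # millisecond-aware sort
    truncated = [(s // 1000, d // 1000) for s, d in ordered]
    # Only neighbouring rows with equal keys are merged into one bin.
    return [(key, len(list(group))) for key, group in groupby(truncated)]

sample = [
    (57100, 200),     # 57.100 s start, 0.200 s long  -> key (57, 0)
    (57300, 13400),   # 57.300 s start, 13.400 s long -> key (57, 13)
    (57900, 500),     # 57.900 s start, 0.500 s long  -> key (57, 0) again
]
rows = presorted_rows(sample)
```

       In this model the key (57, 0) appears in two separate rows because a
       record with a different truncated duration sorts between them.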

       rwuniq's	strength is its	ability	to build arbitrary keys	and aggregate
       fields.	For a key of a single IP address, see rwaddrcount(1) and
       rwbag(1); for a key made	up of a	single CIDR block (/8, /16, /24	only),
       a single	port, or a single protocol, use	rwtotal(1) or rwbag(1).

SEE ALSO
       rwfilter(1), rwbag(1), rwcut(1),	rwset(1), rwsetcat(1), rwaddrcount(1),
       rwgroup(1), rwstats(1), rwnetmask(1), rwsort(1),	rwtotal(1),
       rwcount(1), addrtype(3),	ccfilter(3), pmapfilter(3), pysilk(3),
       silkpython(3), silk-plugin(3), sensor.conf(5), rwflowpack(8), silk(7),
       yaf(1), dlopen(3), tzset(3), environ(7)

SiLK 3.15.0			  2017-07-02			     rwuniq(1)
