rwsort(1)			SiLK Tool Suite			     rwsort(1)

NAME
       rwsort -	Sort SiLK Flow records on one or more fields

SYNOPSIS
	 rwsort	--fields=KEY [--presorted-input] [--reverse]
	       [--temp-directory=DIR_PATH] [--sort-buffer-size=SIZE]
	       [--note-add=TEXT] [--note-file-add=FILE]
	       [--compression-method=COMP_METHOD] [--print-filenames]
	       [--output-path=PATH] [--site-config-file=FILENAME]
	       [--plugin=PLUGIN	[--plugin=PLUGIN ...]]
	       [--python-file=PATH [--python-file=PATH ...]]
	       [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
	       {[--input-pipe=PATH] | [--xargs]|[--xargs=FILE] | [FILES...]}

	 rwsort	[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
	       [--plugin=PLUGIN	...] [--python-file=PATH ...] --help

	 rwsort	[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
	       [--plugin=PLUGIN	...] [--python-file=PATH ...] --help-fields

	 rwsort	--version

DESCRIPTION
       rwsort reads SiLK Flow records, sorts the records by the	field(s)
       listed in the --fields switch, and writes the records to	the
       --output-path or	to the standard	output if it is	not connected to a
       terminal.  The output from rwsort is binary SiLK	Flow records; the
       output must be passed into another tool for human-readable output.

       Sorting records is an expensive operation, and it should	only be	used
       when necessary.  The tools that bin flow records (rwcount(1),
       rwuniq(1), rwstats(1), etc.) do not require sorted data.

       rwsort reads SiLK Flow records from the files named on the command line
       or from the standard input when no file names are specified and neither
       --xargs nor --input-pipe	is present.  To	read the standard input	in
       addition	to the named files, use	"-" or "stdin" as a file name.	If an
       input file name ends in ".gz", the file is uncompressed as it is	read.
       When the	--xargs	switch is provided, rwsort reads the names of the
       files to	process	from the named text file or from the standard input if
       no file name argument is	provided to the	switch.	 The input to --xargs
       must contain one file name per line.  The --input-pipe switch is
       deprecated and is provided only for legacy reasons; its use is not
       required since rwsort will automatically read from the standard input.
       The --input-pipe	switch will be removed in the SiLK 4.0 release.

       The amount of fast memory used by rwsort	will increase until it reaches
       a maximum near 2GB.  (Use the --sort-buffer-size	switch to change this
       upper limit on the buffer size.)	 If more records are read than will
       fit into	memory,	the in-core records are	sorted and temporarily stored
       on disk as described by the --temp-directory switch.  When all records
       have been read, the on-disk files are merged and	the sorted records
       written to the output.
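
       The spill-and-merge strategy described above is a form of external
       merge sort.  The sketch below illustrates the idea on plain text
       lines using standard utilities.  It is an analogy only and not
       rwsort's implementation; the function name external_sort, its
       chunk-size argument, and the temporary file naming are assumptions
       made for illustration.

```shell
# Illustrative only: external merge sort on text lines, mirroring the
# "sort in memory, spill to disk, merge" strategy described above.
# external_sort is a hypothetical helper, not part of SiLK.
external_sort() {
    chunk_lines=$1      # stand-in for the in-memory buffer limit
    workdir=$(mktemp -d "${TMPDIR:-/tmp}/extsort.XXXXXX") || return 1
    # Split the standard input into chunks that "fit in memory".
    split -l "$chunk_lines" - "$workdir/chunk."
    # Sort each chunk in place; each one becomes a sorted temporary run.
    for f in "$workdir"/chunk.*; do
        [ -e "$f" ] || continue
        sort -o "$f" "$f"
    done
    # Merge the sorted runs into a single sorted stream.
    set -- "$workdir"/chunk.*
    if [ -e "$1" ]; then
        sort -m "$@"
    fi
    rm -rf "$workdir"
}
```

       For example, "external_sort 1000 < lines.txt" sorts the
       (hypothetical) file lines.txt while holding at most 1000 lines per
       run.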

       By default, the temporary files are stored in the /tmp directory.
       Because these temporary files will be large, it is strongly recommended
       that /tmp not be	used as	the temporary directory.  To modify the
       temporary directory used	by rwsort, provide the --temp-directory
       switch, set the SILK_TMPDIR environment variable, or set	the TMPDIR
       environment variable.
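
       The precedence among these settings can be sketched as a small
       shell function.  This is a minimal illustration, not code taken
       from SiLK; the name resolve_tmpdir is hypothetical, and the sketch
       assumes that an empty variable is treated the same as an unset one.

```shell
# Hypothetical helper showing the lookup order described above:
# --temp-directory, then SILK_TMPDIR, then TMPDIR, then /tmp.
# Pass the --temp-directory value (or an empty string) as $1.
# Assumption: an empty variable counts as unset.
resolve_tmpdir() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "${SILK_TMPDIR:-${TMPDIR:-/tmp}}"
    fi
}
```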

       To merge	previously sorted SiLK data files into a sorted	stream,	run
       rwsort with the --presorted-input switch.  rwsort will merge-sort all
       the input files, reducing its memory requirements considerably.  It is
       the user's responsibility to ensure that	all the	input files have been
       sorted with the same --fields value (and	--reverse if applicable).
       rwsort may still	require	use of a temporary directory while merging the
       files (for example, if rwsort does not have enough available file
       handles to open all the input files at once).
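
       A typical workflow is to sort each file once and merge the results
       later.  The wrappers below are a sketch of that workflow; the
       function names and all file names are placeholders, and the key
       stime,protocol is an arbitrary choice.  Both steps must use the
       same --fields value.

```shell
# Hypothetical wrappers; file names are placeholders.
# Sort one file ($1) by start time and protocol, writing to $2.
sort_one() {
    rwsort --fields=stime,protocol --output-path="$2" "$1"
}

# Merge files that were all sorted with the same --fields value.
# $1 is the output path; the remaining arguments are the inputs.
merge_sorted() {
    out=$1
    shift
    rwsort --fields=stime,protocol --presorted-input \
        --output-path="$out" "$@"
}

# Possible usage:
#   sort_one monday.rw monday-sorted.rw
#   sort_one tuesday.rw tuesday-sorted.rw
#   merge_sorted both.rw monday-sorted.rw tuesday-sorted.rw
```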

OPTIONS
       Option names may	be abbreviated if the abbreviation is unique or	is an
       exact match for an option.  A parameter to an option may	be specified
       as --arg=param or --arg param, though the first form is required	for
       options that take optional parameters.

       The --fields switch is required.	 rwsort	will fail when it is not
       provided.

       --fields=KEY
	   KEY contains	the list of flow attributes (a.k.a. fields or columns)
           that make up the key by which flows are sorted.  The fields are
           listed in order: primary sort key, secondary key, and so on.  Each
           field may be specified once only.  KEY is a comma-separated list of
	   field-names,	field-integers,	and ranges of field-integers; a	range
	   is specified	by separating the start	and end	of the range with a
	   hyphen (-).	Field-names are	case insensitive.  Example:

	    --fields=stime,10,1-5

	   There is no default value for the --fields switch; the switch must
	   be specified.
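
           To make the range syntax concrete, the sketch below expands a
           KEY into one field per line, in sort-key order.  It is an
           illustration only and not rwsort's parser; the function name
           expand_key is hypothetical, and the sketch neither validates
           field names nor rejects duplicates.

```shell
# Illustrative only: expand a KEY such as "stime,10,1-5" into one
# field per line.  expand_key is a hypothetical helper, not rwsort's
# actual parser.  The body runs in a subshell so that the changes to
# IFS and globbing do not leak into the caller.
expand_key() (
    IFS=,
    set -f              # no globbing while splitting
    set -- $1           # split KEY on commas
    for item in "$@"; do
        case $item in
            [0-9]*-[0-9]*)
                # A range of field-integers: print each member.
                i=${item%-*}
                end=${item#*-}
                while [ "$i" -le "$end" ]; do
                    printf '%s\n' "$i"
                    i=$((i + 1))
                done
                ;;
            *)
                # A field-name or a single field-integer.
                printf '%s\n' "$item"
                ;;
        esac
    done
)
```

           With this sketch, "expand_key stime,10,1-5" prints stime, 10,
           1, 2, 3, 4, and 5 on separate lines: start time is the primary
           key, field 10 (duration) the secondary key, and fields 1
           through 5 follow.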

	   The complete	list of	built-in fields	that the SiLK tool suite
	   supports follows, though note that not all fields are present in
	   all SiLK file formats; when a field is not present, its value is 0.

	   sIP,1
	       source IP address

	   dIP,2
	       destination IP address

	   sPort,3
	       source port for TCP and UDP, or equivalent

	   dPort,4
	       destination port	for TCP	and UDP, or equivalent.	 See note at
	       "iType".

	   protocol,5
	       IP protocol

	   packets,pkts,6
	       packet count

	   bytes,7
	       byte count

	   flags,8
	       bit-wise	OR of TCP flags	over all packets

	   sTime,9,sTime+msec,22
	       starting	time of	flow (milliseconds resolution)

	   duration,10,dur+msec,24
	       duration	of flow	(milliseconds resolution)

	   eTime,11,eTime+msec,23
	       end time	of flow	(milliseconds resolution)

	   sensor,12
	       name or ID of sensor where flow was collected

	   class,20,type,21
	       integer value of	the class/type pair assigned to	the flow by
	       rwflowpack(8)

	   iType
	       the ICMP	type value for ICMP or ICMPv6 flows and	zero for non-
	       ICMP flows.  Internally,	SiLK stores the	ICMP type and code in
               the "dPort" field, so there is no need to have both "dPort" and
	       "iType" or "iCode" in the sort key.  This field was introduced
	       in SiLK 3.8.1.

	   iCode
	       the ICMP	code value for ICMP or ICMPv6 flows and	zero for non-
	       ICMP flows.  See	note at	"iType".

	   icmpTypeCode,25
	       equivalent to "iType","iCode".  This field may not be mixed
	       with "iType" or "iCode",	and this field is deprecated as	of
	       SiLK 3.8.1.  Prior to SiLK 3.8.1, specifying the	"icmpTypeCode"
	       field was equivalent to specifying the "dPort" field.

	   Many	SiLK file formats do not store the following fields and	their
	   values will always be 0; they are listed here for completeness:

	   in,13
	       router SNMP input interface or vlanId if	packing	tools were
	       configured to capture it	(see sensor.conf(5))

	   out,14
	       router SNMP output interface or postVlanId

	   nhIP,15
	       router next hop IP

	   SiLK	can store flows	generated by enhanced collection software that
	   provides more information than NetFlow v5.  These flows may support
	   some	or all of these	additional fields; for flows without this
	   additional information, the field's value is	always 0.

	   initialFlags,26
	       TCP flags on first packet in the	flow

	   sessionFlags,27
	       bit-wise	OR of TCP flags	over all packets except	the first in
	       the flow

	   attributes,28
	       flow attributes set by the flow generator:

	       "S" all the packets in this flow	record are exactly the same
		   size

	       "F" flow	generator saw additional packets in this flow
		   following a packet with a FIN flag (excluding ACK packets)

	       "T" flow	generator prematurely created a	record for a long-
		   running connection due to a timeout.	 (When the flow
		   generator yaf(1) is run with	the --silk switch, it will
		   prematurely create a	flow and mark it with "T" if the byte
		   count of the	flow cannot be stored in a 32-bit value.)

	       "C" flow	generator created this flow as a continuation of long-
		   running connection, where the previous flow for this
		   connection met a timeout (or	a byte threshold in the	case
		   of yaf).

	       Consider	a long-running ssh session that	exceeds	the flow
	       generator's active timeout.  (This is the active	timeout	since
	       the flow	generator creates a flow for a connection that still
	       has activity).  The flow	generator will create multiple flow
	       records for this	ssh session, each spanning some	portion	of the
	       total session.  The first flow record will be marked with a "T"
	       indicating that it hit the timeout.  The	second through next-
	       to-last records will be marked with "TC"	indicating that	this
	       flow both timed out and is a continuation of a flow that	timed
	       out.  The final flow will be marked with	a "C", indicating that
	       it was created as a continuation	of an active flow.
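
               The marking pattern for such a session can be sketched as
               follows.  The function name flow_marks is hypothetical, and
               the handling of a session that fits in a single record
               (no mark, shown as "-") is an assumption that the text
               above does not cover.

```shell
# Illustrative only: print the attribute letters that the N records of
# one long-running session would carry.  flow_marks is hypothetical.
flow_marks() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        if [ "$n" -eq 1 ]; then
            mark='-'    # assumption: a single record hit no timeout
        elif [ "$i" -eq 1 ]; then
            mark=T      # first record: timed out
        elif [ "$i" -eq "$n" ]; then
            mark=C      # last record: continuation only
        else
            mark=TC     # middle records: continuation that timed out
        fi
        printf 'record %d: %s\n' "$i" "$mark"
        i=$((i + 1))
    done
}
```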

	   application,29
	       guess as	to the content of the flow.  Some software that
	       generates flow records from packet data,	such as	yaf, will
	       inspect the contents of the packets that	make up	a flow and use
	       traffic signatures to label the content of the flow.  SiLK
	       calls this label	the application; yaf refers to it as the
	       appLabel.  The application is the port number that is
	       traditionally used for that type	of traffic (see	the
	       /etc/services file on most UNIX systems).  For example, traffic
	       that the	flow generator recognizes as FTP will have a value of
	       21, even	if that	traffic	is being routed	through	the standard
	       HTTP/web	port (80).

	   The following fields	provide	a way to label the IPs or ports	on a
	   record.  These fields require external files	to provide the mapping
	   from	the IP or port to the label:

	   sType,16
	       categorize the source IP	address	as "non-routable", "internal",
	       or "external" and sort based on the category.  Uses the mapping
	       file specified by the SILK_ADDRESS_TYPES	environment variable,
	       or the address_types.pmap mapping file, as described in
	       addrtype(3).

	   dType,17
	       as sType	for the	destination IP address

	   scc,18
	       the country code	of the source IP address.  Uses	the mapping
	       file specified by the SILK_COUNTRY_CODES	environment variable,
	       or the country_codes.pmap mapping file, as described in
	       ccfilter(3).

	   dcc,19
	       as scc for the destination IP

	   src-map-name
	       label contained in the prefix map file associated with map-
	       name.  If the prefix map	is for IP addresses, the label is that
	       associated with the source IP address.  If the prefix map is
	       for protocol/port pairs,	the label is that associated with the
	       protocol	and source port.  See also the description of the
	       --pmap-file switch below	and the	pmapfilter(3) manual page.

	   dst-map-name
	       as src-map-name for the destination IP address or the protocol
	       and destination port.

	   sval
	       as src-map-name when no map-name	is associated with the prefix
	       map file

	   dval
	       as dst-map-name when no map-name	is associated with the prefix
	       map file

	   Finally, the	list of	built-in fields	may be augmented by the	run-
	   time	loading	of PySiLK code or plug-ins written in C	(also called
	   shared object files or dynamic libraries), as described by the
	   --python-file and --plugin switches.

       --presorted-input
	   Instruct rwsort to merge-sort the input files; that is, rwsort
           assumes the input files have been previously sorted using the same
           values for the --fields and --reverse switches as were given for
	   this	invocation.  This switch can greatly reduce rwsort's memory
	   requirements	as a large buffer is not required for sorting the
	   records.  If	the input files	were created with rwsort, you can run
	   rwfileinfo(1) on the	files to see the rwsort	invocation that
	   created them.

       --reverse
	   Cause rwsort	to reverse the sort order, causing larger values to
	   occur in the	output before smaller values.  Normally	smaller	values
	   appear before larger	values.
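
           One possible use of --reverse is to list the individual records
           with the largest byte counts.  The wrapper below is a sketch;
           the name top_by_bytes and the choice of rwcut(1) columns are
           assumptions made for illustration.

```shell
# Hypothetical wrapper: show the records with the largest byte counts.
# $1 is the input file; $2 is how many records to print.
top_by_bytes() {
    rwsort --fields=bytes --reverse "$1" |
        rwcut --fields=sip,dip,bytes --num-recs="$2"
}

# Possible usage:
#   top_by_bytes infile.rw 10
```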

       --plugin=PLUGIN
	   Augment the list of fields by using run-time	loading	of the plug-in
	   (shared object) whose path is PLUGIN.  The switch may be repeated
	   to load multiple plug-ins.  The creation of plug-ins	is described
	   in the silk-plugin(3) manual	page.  When PLUGIN does	not contain a
	   slash ("/"),	rwsort will attempt to find a file named PLUGIN	in the
	   directories listed in the "FILES" section.  If rwsort finds the
	   file, it uses that path.  If	PLUGIN contains	a slash	or if rwsort
	   does	not find the file, rwsort relies on your operating system's
	   dlopen(3) call to find the file.  When the SILK_PLUGIN_DEBUG
	   environment variable	is non-empty, rwsort prints status messages to
	   the standard	error as it attempts to	find and open each of its
	   plug-ins.

       --temp-directory=DIR_PATH
	   Specify the name of the directory in	which to store data files
           temporarily when more records have been read than will fit into
	   RAM.	 This switch overrides the directory specified in the
	   SILK_TMPDIR environment variable, which overrides the directory
	   specified in	the TMPDIR variable, which overrides the default,
	   /tmp.

       --sort-buffer-size=SIZE
	   Set the maximum size	of the buffer used for sorting the records, in
	   bytes.  A larger buffer means fewer temporary files need to be
	   created, reducing the I/O wait times.  When this switch is not
	   specified, the default maximum for this buffer is near 2GB.	The
	   SIZE	may be given as	an ordinary integer, or	as a real number
	   followed by a suffix	"K", "M" or "G", which represents the
	   numerical value multiplied by 1,024 (kilo), 1,048,576 (mega), and
	   1,073,741,824 (giga), respectively.	For example, 1.5K represents
	   1,536 bytes,	or one and one-half kilobytes.	(This value does not
	   represent the absolute maximum amount of RAM	that rwsort will
	   allocate, since additional buffers will be allocated	for reading
	   the input and writing the output.)  The sort	buffer is not used
	   when	the --presorted-input switch is	specified.
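
           The suffix arithmetic can be checked with a short sketch.  The
           function name size_to_bytes is hypothetical, and the sketch
           handles only the upper-case suffixes named above; it is not the
           parser that rwsort uses.

```shell
# Illustrative only: convert a SIZE such as "1.5K" to bytes using the
# multipliers given above.  size_to_bytes is a hypothetical helper.
size_to_bytes() {
    printf '%s\n' "$1" | awk '
        {
            n = $0
            mult = 1
            suffix = substr(n, length(n), 1)
            if (suffix == "K") mult = 1024
            else if (suffix == "M") mult = 1048576
            else if (suffix == "G") mult = 1073741824
            # Drop the suffix before doing the multiplication.
            if (mult > 1) n = substr(n, 1, length(n) - 1)
            printf "%.0f\n", n * mult
        }'
}
```

           For example, "size_to_bytes 1.5K" prints 1536, matching the
           figure given above.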

       --note-add=TEXT
	   Add the specified TEXT to the header	of the output file as an
	   annotation.	This switch may	be repeated to add multiple
	   annotations to a file.  To view the annotations, use	the
	   rwfileinfo(1) tool.

       --note-file-add=FILENAME
	   Open	FILENAME and add the contents of that file to the header of
	   the output file as an annotation.	This switch may	be repeated to
	   add multiple	annotations.  Currently	the application	makes no
	   effort to ensure that FILENAME contains text; be careful that you
	   do not attempt to add a SiLK	data file as an	annotation.

       --compression-method=COMP_METHOD
	   Specify the compression library to use when writing output files.
	   If this switch is not given,	the value in the
	   SILK_COMPRESSION_METHOD environment variable	is used	if the value
	   names an available compression method.  When	no compression method
	   is specified, output	to the standard	output or to named pipes is
	   not compressed, and output to files is compressed using the default
	   chosen when SiLK was	compiled.  The valid values for	COMP_METHOD
	   are determined by which external libraries were found when SiLK was
	   compiled.  To see the available compression methods and the default
	   method, use the --help or --version switch.	SiLK can support the
	   following COMP_METHOD values	when the required libraries are
	   available.

	   none
	       Do not compress the output using	an external library.

	   zlib
	       Use the zlib(3) library for compressing the output, and always
	       compress	the output regardless of the destination.  Using zlib
	       produces	the smallest output files at the cost of speed.

	   lzo1x
	       Use the lzo1x algorithm from the	LZO real time compression
	       library for compression,	and always compress the	output
	       regardless of the destination.  This compression	provides good
	       compression with	less memory and	CPU overhead.

	   snappy
	       Use the snappy library for compression, and always compress the
	       output regardless of the	destination.  This compression
	       provides	good compression with less memory and CPU overhead.
	       Since SiLK 3.13.0.

	   best
	       Use lzo1x if available, otherwise use snappy if available,
	       otherwise use zlib if available.	 Only compress the output when
	       writing to a file.

       --print-filenames
	   Print to the	standard error the names of input files	as they	are
	   opened.

       --output-path=PATH
	   Write the binary SiLK Flow records to PATH, where PATH is a
	   filename, a named pipe, the keyword "stderr"	to write the output to
	   the standard	error, or the keyword "stdout" or "-" to write the
	   output to the standard output.  If PATH names an existing file,
	   rwsort exits	with an	error unless the SILK_CLOBBER environment
	   variable is set, in which case PATH is overwritten.	If this	switch
	   is not given, the output is written to the standard output.
	   Attempting to write the binary output to a terminal causes rwsort
	   to exit with	an error.

       --site-config-file=FILENAME
	   Read	the SiLK site configuration from the named file	FILENAME.
	   When	this switch is not provided, rwsort searches for the site
	   configuration file in the locations specified in the	"FILES"
	   section.

       --input-pipe=PATH
	   Read	the SiLK Flow records to be sorted from	the named pipe at
	   PATH.  If PATH is "stdin" or	"-", records are read from the
	   standard input.  Use	of this	switch is not required,	since rwsort
	   will	automatically read data	from the standard input	when no	file
	   names are specified on the command line.  This switch is deprecated
	   and will be removed in the SiLK 4.0 release.

       --xargs
       --xargs=FILENAME
	   Read	the names of the input files from FILENAME or from the
	   standard input if FILENAME is not provided.	The input is expected
	   to have one filename	per line.  rwsort opens	each named file	in
	   turn	and reads records from it as if	the filenames had been listed
	   on the command line.
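
           Both forms of the switch are sketched below.  The function names
           and the key stime are placeholders chosen for illustration.

```shell
# Hypothetical wrapper: sort every file named in a list file ($1),
# one file name per line, and write the result to $2.
sort_from_list() {
    rwsort --fields=stime --xargs="$1" --output-path="$2"
}

# Hypothetical wrapper: the same, with the file names arriving on the
# standard input, for example from find(1).  $1 is the output path.
sort_from_stdin_list() {
    rwsort --fields=stime --xargs --output-path="$1"
}

# Possible usage:
#   sort_from_list names.txt sorted.rw
#   find . -name '*.rw' | sort_from_stdin_list sorted.rw
```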

       --help
	   Print the available options and exit.  Specifying switches that add
	   new fields or additional switches before --help will	allow the
	   output to include descriptions of those fields or switches.

       --help-fields
	   Print the description and alias(es) of each field and exit.
	   Specifying switches that add	new fields before --help-fields	will
	   allow the output to include descriptions of those fields.

       --version
	   Print the version number and	information about how SiLK was
	   configured, then exit the application.

       --pmap-file=PATH
       --pmap-file=MAPNAME:PATH
	   Load	the prefix map file located at PATH and	create fields named
	   src-map-name	and dst-map-name where map-name	is either the MAPNAME
	   part	of the argument	or the map-name	specified when the file	was
	   created (see	rwpmapbuild(1)).  If no	map-name is available, rwsort
	   names the fields "sval" and "dval".	Specify	PATH as	"-" or "stdin"
	   to read from	the standard input.  The switch	may be repeated	to
	   load	multiple prefix	map files, but each prefix map must use	a
	   unique map-name.  The --pmap-file switch(es)	must precede the
	   --fields switch.  See also pmapfilter(3).
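
           The ordering requirement is easy to get wrong, so a sketch may
           help.  In the wrapper below the map-name "net", the function
           name, and the output file name are arbitrary placeholders; note
           that --pmap-file appears before --fields.

```shell
# Hypothetical wrapper: sort by the labels that a prefix map assigns
# to the source and destination.  $1 is the prefix map file and $2 is
# the input file.  The map-name "net" is an arbitrary placeholder; it
# yields the field names src-net and dst-net.
sort_by_label() {
    rwsort --pmap-file=net:"$1" --fields=src-net,dst-net \
        --output-path=labeled.rw "$2"
}
```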

       --python-file=PATH
	   When	the SiLK Python	plug-in	is used, rwsort	reads the Python code
	   from	the file PATH to define	additional fields that can be used as
	   part	of the sort key.  This file should call	register_field() for
	   each	field it wishes	to define.  For	details	and examples, see the
	   silkpython(3) and pysilk(3) manual pages.

LIMITATIONS
       When the	temporary files	and the	final output are stored	on the same
       file volume, rwsort will	require	approximately twice as much free disk
       space as	the size of data to be sorted.

       When the	temporary files	and the	final output are on different volumes,
       rwsort will require between 1 and 1.5 times as much free	space on the
       temporary volume	as the size of the data	to be sorted.
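
       These rules of thumb can be turned into a rough estimate.  The
       sketch below is an illustration only; the function name
       sort_space_needed is hypothetical, and for the different-volume
       case it uses the upper end of the range given above.

```shell
# Illustrative only: estimate the free space, in bytes, suggested by
# the rules above for the volume holding the temporary files.
# $1 = size of the data to sort, in bytes
# $2 = "same" if the temporary files and the output share a volume,
#      anything else if they are on different volumes
sort_space_needed() {
    if [ "$2" = same ]; then
        echo $(( $1 * 2 ))          # about twice the data size
    else
        echo $(( $1 * 3 / 2 ))      # upper end of the 1 to 1.5 range
    fi
}
```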

EXAMPLES
       In the following	examples, the dollar sign ("$")	represents the shell
       prompt.	The text after the dollar sign represents the command line.

       To sort the records in infile.rw	based primarily	on destination port
       and secondarily on source IP and	write the binary output	to outfile.rw,
       run:

	$ rwsort --fields=dport,sip --output-path=outfile.rw infile.rw

       The silkpython(3) manual	page provides examples that use	PySiLK to
       create arbitrary	fields to use as part of the key for rwsort.

ENVIRONMENT
       SILK_TMPDIR
	   When	set and	--temp-directory is not	specified, rwsort writes the
	   temporary files it creates to this directory.  SILK_TMPDIR
	   overrides the value of TMPDIR.

       TMPDIR
	   When	set and	SILK_TMPDIR is not set,	rwsort writes the temporary
	   files it creates to this directory.

       PYTHONPATH
	   This	environment variable is	used by	Python to locate modules.
	   When	--python-file is specified, rwsort must	load the Python	files
	   that	comprise the PySiLK package, such as silk/__init__.py.	If
	   this	silk/ directory	is located outside Python's normal search path
	   (for	example, in the	SiLK installation tree), it may	be necessary
	   to set or modify the	PYTHONPATH environment variable	to include the
	   parent directory of silk/ so	that Python can	find the PySiLK
	   module.

       SILK_PYTHON_TRACEBACK
	   When	set, Python plug-ins will output traceback information on
	   Python errors to the	standard error.

       SILK_COUNTRY_CODES
	   This	environment variable allows the	user to	specify	the country
	   code	mapping	file that rwsort uses when computing the scc and dcc
	   fields.  The	value may be a complete	path or	a file relative	to the
	   SILK_PATH.  See the "FILES" section for standard locations of this
	   file.

       SILK_ADDRESS_TYPES
	   This	environment variable allows the	user to	specify	the address
	   type	mapping	file that rwsort uses when computing the sType and
	   dType fields.  The value may	be a complete path or a	file relative
	   to the SILK_PATH.  See the "FILES" section for standard locations
	   of this file.

       SILK_CLOBBER
	   The SiLK tools normally refuse to overwrite existing	files.
	   Setting SILK_CLOBBER	to a non-empty value removes this restriction.

       SILK_COMPRESSION_METHOD
	   This	environment variable is	used as	the value for
	   --compression-method	when that switch is not	provided.  Since SiLK
	   3.13.0.

       SILK_CONFIG_FILE
	   This	environment variable is	used as	the value for the
	   --site-config-file when that	switch is not provided.

       SILK_DATA_ROOTDIR
           This environment variable specifies the root directory of the data
           repository.  As described in the "FILES" section, rwsort may use
	   this	environment variable when searching for	the SiLK site
	   configuration file.

       SILK_PATH
	   This	environment variable gives the root of the install tree.  When
	   searching for configuration files and plug-ins, rwsort may use this
	   environment variable.  See the "FILES" section for details.

       SILK_PLUGIN_DEBUG
	   When	set to 1, rwsort prints	status messages	to the standard	error
	   as it attempts to find and open each	of its plug-ins.  In addition,
	   when	an attempt to register a field fails, the application prints a
	   message specifying the additional function(s) that must be defined
	   to register the field in the	application.  Be aware that the	output
	   can be rather verbose.

       SILK_TEMPFILE_DEBUG
	   When	set to 1, rwsort prints	debugging messages to the standard
	   error as it creates,	re-opens, and removes temporary	files.

FILES
       ${SILK_ADDRESS_TYPES}
       ${SILK_PATH}/share/silk/address_types.pmap
       ${SILK_PATH}/share/address_types.pmap
       /usr/local/share/silk/address_types.pmap
       /usr/local/share/address_types.pmap
	   Possible locations for the address types mapping file required by
	   the sType and dType fields.

       ${SILK_CONFIG_FILE}
       ${SILK_DATA_ROOTDIR}/silk.conf
       /data/silk.conf
       ${SILK_PATH}/share/silk/silk.conf
       ${SILK_PATH}/share/silk.conf
       /usr/local/share/silk/silk.conf
       /usr/local/share/silk.conf
	   Possible locations for the SiLK site	configuration file which are
	   checked when	the --site-config-file switch is not provided.

       ${SILK_COUNTRY_CODES}
       ${SILK_PATH}/share/silk/country_codes.pmap
       ${SILK_PATH}/share/country_codes.pmap
       /usr/local/share/silk/country_codes.pmap
       /usr/local/share/country_codes.pmap
	   Possible locations for the country code mapping file	required by
	   the scc and dcc fields.

       ${SILK_PATH}/lib64/silk/
       ${SILK_PATH}/lib64/
       ${SILK_PATH}/lib/silk/
       ${SILK_PATH}/lib/
       /usr/local/lib64/silk/
       /usr/local/lib64/
       /usr/local/lib/silk/
       /usr/local/lib/
	   Directories that rwsort checks when attempting to load a plug-in.

       ${SILK_TMPDIR}/
       ${TMPDIR}/
       /tmp/
	   Directory in	which to create	temporary files.

SEE ALSO
       rwcount(1), rwcut(1), rwfileinfo(1), rwstats(1),	rwuniq(1),
       rwpmapbuild(1), addrtype(3), ccfilter(3), pmapfilter(3),	pysilk(3),
       silkpython(3), silk-plugin(3), sensor.conf(5), rwflowpack(8), silk(7),
       yaf(1), dlopen(3), zlib(3)

NOTES
       If an output path is not	specified, rwsort will write to	the standard
       output unless it	is connected to	a terminal, in which case an error is
       printed and rwsort exits.

       If an input pipe	or a set of input files	are not	specified, rwsort will
       read records from the standard input unless it is connected to a
       terminal, in which case an error	is printed and rwsort exits.

       Note that rwsort	produces binary	output.	 Use rwcut(1) to view the
       records.

       Do not spend the	resources to sort the data if you are going to be
       passing it to an	aggregation tool like rwtotal or rwaddrcount, which
       have their own internal data structures that will ignore	the sorted
       data.

       Both rwuniq(1) and rwstats(1) can take advantage	of previously sorted
       data, but you must explicitly inform them that the input	is sorted by
       providing the --presorted-input switch.
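
       A sketch of handing sorted data to rwuniq(1) follows.  The function
       name is hypothetical, and the key sip is an arbitrary choice; the
       same key is given to both tools.  Because sorting is expensive,
       this is most likely to be worthwhile when the data is already
       sorted for another reason.

```shell
# Hypothetical wrapper: count flows per source address, telling
# rwuniq that its input is already sorted on the same key.
# $1 is the input file.
count_by_sip_sorted() {
    rwsort --fields=sip "$1" |
        rwuniq --fields=sip --presorted-input
}
```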

SiLK 3.19.1			  2020-08-27			     rwsort(1)
