TCP(4)                 FreeBSD Kernel Interfaces Manual                 TCP(4)

NAME
     tcp - Internet Transmission Control Protocol

SYNOPSIS
     #include <sys/types.h>
     #include <sys/socket.h>
     #include <netinet/in.h>

     int
     socket(AF_INET, SOCK_STREAM, 0);

DESCRIPTION
     The TCP protocol provides reliable, flow-controlled, two-way transmission
     of data.  It is a byte-stream protocol used to support the SOCK_STREAM
     abstraction.  TCP uses the standard Internet address format and, in
     addition, provides a per-host collection of ``port addresses''.  Thus,
     each address is composed of an Internet address specifying the host and
     network, with a specific TCP port on the host identifying the peer
     entity.

     Sockets utilizing the TCP protocol are either ``active'' or ``passive''.
     Active sockets initiate connections to passive sockets.  By default, TCP
     sockets are created active; to create a passive socket, the listen(2)
     system call must be used after binding the socket with the bind(2) system
     call.  Only passive sockets may use the accept(2) call to accept incoming
     connections.  Only active sockets may use the connect(2) call to initiate
     connections.

     Passive sockets may ``underspecify'' their location to match incoming
     connection requests from multiple networks.  This technique, termed
     ``wildcard addressing'', allows a single server to provide service to
     clients on multiple networks.  To create a socket which listens on all
     networks, the Internet address INADDR_ANY must be bound.  The TCP port
     may still be specified at this time; if the port is not specified, the
     system will assign one.  Once a connection has been established, the
     socket's address is fixed by the peer entity's location.  The address
     assigned to the socket is the address associated with the network
     interface through which packets are being transmitted and received.
     Normally, this address corresponds to the peer entity's network.
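
     The following is a minimal sketch of the wildcard addressing described
     above: it binds a socket to INADDR_ANY, optionally lets the system
     assign the port, and makes the socket passive with listen(2).  The
     helper names open_wildcard_listener() and bound_port() are hypothetical
     and are not part of any system interface.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a passive TCP socket bound to INADDR_ANY.  A port of 0 asks the
 * system to assign one.  Returns the descriptor, or -1 on error.
 * (open_wildcard_listener is a hypothetical helper name.)
 */
int
open_wildcard_listener(in_port_t port, int backlog)
{
	struct sockaddr_in sin;
	int s;

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return (-1);

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = htons(port);

	/* bind(2) followed by listen(2) makes the socket passive. */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, backlog) == -1) {
		close(s);
		return (-1);
	}
	return (s);
}

/* Return the local port of a bound socket in host byte order, or -1. */
int
bound_port(int s)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);

	if (getsockname(s, (struct sockaddr *)&sin, &len) == -1)
		return (-1);
	return (ntohs(sin.sin_port));
}
```

     Calling open_wildcard_listener(0, 5) and then bound_port() on the result
     reports the port the system chose; the returned descriptor can then be
     passed to accept(2).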

     TCP supports a number of socket options which can be set with
     setsockopt(2) and tested with getsockopt(2):

     TCP_INFO            Information about a socket's underlying TCP session
                         may be retrieved by passing the read-only option
                         TCP_INFO to getsockopt(2).  It accepts a single
                         argument: a pointer to an instance of struct
                         tcp_info.

                         This API is subject to change; consult the source to
                         determine which fields are currently filled out by
                         this option.  FreeBSD specific additions include send
                         window size, receive window size, and bandwidth-
                         controlled window space.

     TCP_CONGESTION      Select or query the congestion control algorithm that
                         TCP will use for the connection.  See mod_cc(4) for
                         details.

     TCP_NODELAY         Under most circumstances, TCP sends data when it is
                         presented; when outstanding data has not yet been
                         acknowledged, it gathers small amounts of output to
                         be sent in a single packet once an acknowledgement is
                         received.  For a small number of clients, such as
                         window systems that send a stream of mouse events
                         which receive no replies, this packetization may
                         cause significant delays.  The boolean option
                         TCP_NODELAY defeats this algorithm.

     TCP_MAXSEG          By default, a sender- and receiver-TCP will negotiate
                         among themselves to determine the maximum segment
                         size to be used for each connection.  The TCP_MAXSEG
                         option allows the user to determine the result of
                         this negotiation, and to reduce it if desired.

     TCP_NOOPT           TCP usually sends a number of options in each packet,
                         corresponding to various TCP extensions which are
                         provided in this implementation.  The boolean option
                         TCP_NOOPT is provided to disable TCP option use on a
                         per-connection basis.

     TCP_NOPUSH          By convention, the sender-TCP will set the ``push''
                         bit, and begin transmission immediately (if
                         permitted) at the end of every user call to write(2)
                         or writev(2).  When this option is set to a non-zero
                         value, TCP will delay sending any data at all until
                         either the socket is closed, or the internal send
                         buffer is filled.

     TCP_MD5SIG          This option enables the use of MD5 digests (also
                         known as TCP-MD5) on writes to the specified socket.
                         In the current release, only outgoing traffic is
                         digested; digests on incoming traffic are not
                         verified.  The current default behavior for the
                         system is to respond to a system advertising this
                         option with TCP-MD5; this may change.

                         One common use for this in a FreeBSD router
                         deployment is to enable FreeBSD-based routers to
                         interwork with Cisco equipment at peering points.
                         Support for this feature conforms to RFC 2385.  Only
                         IPv4 (AF_INET) sessions are supported.

                         In order for this option to function correctly, it is
                         necessary for the administrator to add a tcp-md5 key
                         entry to the system's security associations database
                         (SADB) using the setkey(8) utility.  This entry must
                         have an SPI of 0x1000 and can therefore only be
                         specified on a per-host basis at this time.

                         If an SADB entry cannot be found for the destination,
                         the outgoing traffic will have an invalid digest
                         option prepended, and the following error message
                         will be visible on the system console:
                         tcp_signature_compute: SADB lookup failed for
                         %d.%d.%d.%d.

     The option level for the setsockopt(2) call is the protocol number for
     TCP, available from getprotobyname(3), or IPPROTO_TCP.  All options are
     declared in <netinet/tcp.h>.
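
     A short sketch of the calling convention, using IPPROTO_TCP as the
     option level and the boolean TCP_NODELAY option as the example.  The
     helper name set_nodelay() is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Enable or disable TCP_NODELAY on socket s, then read the value back.
 * Returns 1 if the option is set, 0 if it is clear, -1 on error.
 * (set_nodelay is a hypothetical helper name.)
 */
int
set_nodelay(int s, int on)
{
	int val = on ? 1 : 0;
	socklen_t len = sizeof(val);

	if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &val, sizeof(val)) == -1)
		return (-1);
	if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1)
		return (-1);
	/* Treat any non-zero value as "set". */
	return (val != 0);
}
```

     The other boolean options listed above, such as TCP_NOOPT and
     TCP_NOPUSH, take an int argument in the same way.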

     Options at the IP transport level may be used with TCP; see ip(4).
     Incoming connection requests that are source-routed are noted, and the
     reverse source route is used in responding.

     The default congestion control algorithm for TCP is cc_newreno(4).  Other
     congestion control algorithms can be made available using the mod_cc(4)
     framework.
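
     The algorithm in use on a socket can be queried with the TCP_CONGESTION
     option.  The sketch below is one way to do so; the helper name
     get_cc_name() is hypothetical, and the caller-supplied buffer size is
     an assumption rather than a documented limit.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>

/*
 * Copy the name of the congestion control algorithm in use on socket s
 * into buf.  Returns 0 on success, -1 on error.
 * (get_cc_name is a hypothetical helper name.)
 */
int
get_cc_name(int s, char *buf, size_t buflen)
{
	socklen_t len;

	if (buflen == 0)
		return (-1);
	memset(buf, 0, buflen);
	len = (socklen_t)(buflen - 1);	/* keep room for a terminating NUL */
	if (getsockopt(s, IPPROTO_TCP, TCP_CONGESTION, buf, &len) == -1)
		return (-1);
	return (0);
}
```

     Selecting a different algorithm would use setsockopt(2) with the same
     option and the algorithm name as the argument; see mod_cc(4) for the
     names that are available.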

   MIB Variables
     The TCP protocol implements a number of variables in the net.inet.tcp
     branch of the sysctl(3) MIB.

     TCPCTL_DO_RFC1323      (rfc1323) Implement the window scaling and
                            timestamp options of RFC 1323 (default is true).

     TCPCTL_MSSDFLT         (mssdflt) The default value used for the maximum
                            segment size (``MSS'') when no advice to the
                            contrary is received from MSS negotiation.

     TCPCTL_SENDSPACE       (sendspace) Maximum TCP send window.

     TCPCTL_RECVSPACE       (recvspace) Maximum TCP receive window.

     log_in_vain            Log any connection attempts to ports where there
                            is not a socket accepting connections.  The value
                            of 1 limits the logging to SYN (connection
                            establishment) packets only.  That of 2 results in
                            any TCP packets to closed ports being logged.  Any
                            value unlisted above disables the logging (default
                            is 0, i.e., the logging is disabled).

     slowstart_flightsize   The number of packets allowed to be in-flight
                            during the TCP slow-start phase on a non-local
                            network.

     local_slowstart_flightsize
                            The number of packets allowed to be in-flight
                            during the TCP slow-start phase to local machines
                            in the same subnet.

     msl                    The Maximum Segment Lifetime, in milliseconds, for
                            a packet.

     keepinit               Timeout, in milliseconds, for new, non-established
                            TCP connections.

     keepidle               Amount of time, in milliseconds, that the
                            connection must be idle before keepalive probes
                            (if enabled) are sent.

     keepintvl              The interval, in milliseconds, between keepalive
                            probes sent to remote machines, when no response
                            is received on a keepidle probe.  After
                            TCPTV_KEEPCNT (default 8) probes are sent, with no
                            response, the connection is dropped.

     always_keepalive       Assume that SO_KEEPALIVE is set on all TCP
                            connections; the kernel will then periodically
                            send a packet to the remote host to verify that
                            the connection is still up.

     icmp_may_rst           Certain ICMP unreachable messages may abort
                            connections in SYN-SENT state.

     do_tcpdrain            Flush packets in the TCP reassembly queue if the
                            system is low on mbufs.

     blackhole              If enabled, disable sending of RST when a
                            connection is attempted to a port where there is
                            not a socket accepting connections.  See
                            blackhole(4).

     delayed_ack            Delay ACKs in an attempt to piggyback them onto
                            data packets.

     delacktime             Maximum amount of time, in milliseconds, before a
                            delayed ACK is sent.

     path_mtu_discovery     Enable Path MTU Discovery.

     tcbhashsize            Size of the TCP control-block hash table (read-
                            only).  This may be tuned using the kernel option
                            TCBHASHSIZE or by setting net.inet.tcp.tcbhashsize
                            in the loader(8).

     pcbcount               Number of active protocol control blocks (read-
                            only).

     syncookies             Determines whether or not SYN cookies should be
                            generated for outbound SYN-ACK packets.  SYN
                            cookies are a great help during SYN flood attacks,
                            and are enabled by default.  (See syncookies(4).)

     isn_reseed_interval    The interval (in seconds) specifying how often the
                            secret data used in RFC 1948 initial sequence
                            number calculations should be reseeded.  By
                            default, this variable is set to zero, indicating
                            that no reseeding will occur.  Reseeding should
                            not be necessary, and will break TIME_WAIT
                            recycling for a few minutes.

     rexmit_min, rexmit_slop
                            Adjust the retransmit timer calculation for TCP.
                            The slop is typically added to the raw calculation
                            to take into account occasional variances that the
                            SRTT (smoothed round-trip time) is unable to
                            accommodate, while the minimum specifies an
                            absolute minimum.  While a number of TCP RFCs
                            suggest a 1 second minimum, these RFCs tend to
                            focus on streaming behavior, and fail to deal with
                            the fact that a 1 second minimum has severe
                            detrimental effects over lossy interactive
                            connections, such as an 802.11b wireless link, and
                            over very fast but lossy connections for those
                            cases not covered by the fast retransmit code.
                            For this reason, we use 200ms of slop and a near-0
                            minimum, which gives us an effective minimum of
                            200ms (similar to Linux).

     inflight.enable        Enable TCP bandwidth-delay product limiting.  An
                            attempt will be made to calculate the bandwidth-
                            delay product for each individual TCP connection,
                            and limit the amount of inflight data being
                            transmitted, to avoid building up unnecessary
                            packets in the network.  This option is
                            recommended if you are serving a lot of data over
                            connections with high bandwidth-delay products,
                            such as modems, GigE links, and fast long-haul
                            WANs, and/or you have configured your machine to
                            accommodate large TCP windows.  In such
                            situations, without this option, you may
                            experience high interactive latencies or packet
                            loss due to the overloading of intermediate
                            routers and switches.  Note that bandwidth-delay
                            product limiting only affects the transmit side of
                            a TCP connection.

     inflight.debug         Enable debugging for the bandwidth-delay product
                            algorithm.

     inflight.min           This puts a lower bound on the bandwidth-delay
                            product window, in bytes.  A value of 1024 is
                            typically used for debugging.  6000-16000 is more
                            typical in a production installation.  Setting
                            this value too low may result in slow ramp-up
                            times for bursty connections.  Setting this value
                            too high effectively disables the algorithm.

     inflight.max           This puts an upper bound on the bandwidth-delay
                            product window, in bytes.  This value should not
                            generally be modified, but may be used to set a
                            global per-connection limit on queued data,
                            potentially allowing you to intentionally set a
                            less than optimum limit, to smooth data flow over
                            a network while still being able to specify huge
                            internal TCP buffers.

     inflight.stab          The bandwidth-delay product algorithm requires a
                            slightly larger window than it otherwise
                            calculates for stability.  This parameter
                            sets the extra window in tenths of a maximal
                            packet.  The default of 20 represents 2 maximal
                            packets.  Reducing this value is not recommended,
                            but you may come across a situation with very slow
                            links where the ping(8) time reduction of the
                            default inflight code is not sufficient.  If this
                            case occurs, you should first try reducing
                            inflight.min and, if that does not work, reduce
                            both inflight.min and inflight.stab, trying values
                            of 15, 10, or 5 for the latter.  Never use a value
                            less than 5.  Reducing inflight.stab can lead to
                            upwards of a 20% underutilization of the link as
                            well as reducing the algorithm's ability to adapt
                            to changing situations and should only be done as
                            a last resort.

     rfc3042                Enable the Limited Transmit algorithm as described
                            in RFC 3042.  It helps avoid timeouts on lossy
                            links and also when the congestion window is
                            small, as happens on short transfers.

     rfc3390                Enable support for RFC 3390, which allows for a
                            variable-sized starting congestion window on new
                            connections, depending on the maximum segment
                            size.  This helps throughput in general, but
                            particularly affects short transfers and high-
                            bandwidth large propagation-delay connections.

                            When this feature is enabled, the
                            slowstart_flightsize and
                            local_slowstart_flightsize settings are not
                            observed for new connection slow starts, but they
                            are still used for slow starts that occur when the
                            connection has been idle and starts sending again.

     sack.enable            Enable support for RFC 2018, TCP Selective
                            Acknowledgment option, which allows the receiver
                            to inform the sender about all successfully
                            arrived segments, allowing the sender to
                            retransmit the missing segments only.

     sack.maxholes          Maximum number of SACK holes per connection.
                            Defaults to 128.

     sack.globalmaxholes    Maximum number of SACK holes per system, across
                            all connections.  Defaults to 65536.

     maxtcptw               When a TCP connection enters the TIME_WAIT state,
                            its associated socket structure is freed, since it
                            is of negligible size and use, and a new structure
                            is allocated to contain a minimal amount of
                            information necessary for sustaining a connection
                            in this state, called the compressed TCP TIME_WAIT
                            state.  Since this structure is smaller than a
                            socket structure, it can save a significant amount
                            of system memory.  The net.inet.tcp.maxtcptw MIB
                            variable controls the maximum number of these
                            structures allocated.  By default, it is
                            initialized to kern.ipc.maxsockets / 5.

     nolocaltimewait        Suppress creation of compressed TCP TIME_WAIT
                            states for connections in which both endpoints are
                            local.

     fast_finwait2_recycle  Recycle TCP FIN_WAIT_2 connections faster when the
                            socket is marked as SBS_CANTRCVMORE (no user
                            process has the socket open, data received on the
                            socket cannot be read).  The timeout used here is
                            finwait2_timeout.

     finwait2_timeout       Timeout to use for fast recycling of TCP
                            FIN_WAIT_2 connections.  Defaults to 60 seconds.

     ecn.enable             Enable support for TCP Explicit Congestion
                            Notification (ECN).  ECN allows a TCP sender to
                            reduce the transmission rate in order to avoid
                            packet drops.

     ecn.maxretries         Number of retries (SYN or SYN/ACK retransmits)
                            before disabling ECN on a specific connection.
                            This is needed to help with connection
                            establishment when a broken firewall is in the
                            network path.
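
     The variables above can be read from a program by name.  The sketch
     below assumes a FreeBSD system providing sysctlbyname(3) and an
     integer-valued variable; the helper name read_tcp_sysctl_int() is
     hypothetical.  On other systems the sketch simply fails with ENOSYS.

```c
#include <sys/types.h>
#include <errno.h>
#include <stddef.h>
#if defined(__FreeBSD__)
#include <sys/sysctl.h>
#endif

/*
 * Read an integer-valued variable such as "net.inet.tcp.mssdflt" by name.
 * Returns 0 and stores the value in *out on success; returns -1 with errno
 * set otherwise.  (read_tcp_sysctl_int is a hypothetical helper name.)
 */
int
read_tcp_sysctl_int(const char *name, int *out)
{
#if defined(__FreeBSD__)
	size_t len = sizeof(*out);

	return (sysctlbyname(name, out, &len, NULL, 0));
#else
	/* Assumption: no sysctlbyname(3) here, so report "not supported". */
	(void)name;
	(void)out;
	errno = ENOSYS;
	return (-1);
#endif
}
```

     For example, read_tcp_sysctl_int("net.inet.tcp.mssdflt", &mss) would
     fetch the default maximum segment size.  Variables of other types
     would need a buffer of the matching type.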

ERRORS
     A socket operation may fail with one of the following errors returned:

     [EISCONN]          when trying to establish a connection on a socket
                        which already has one;

     [ENOBUFS]          when the system runs out of memory for an internal
                        data structure;

     [ETIMEDOUT]        when a connection was dropped due to excessive
                        retransmissions;

     [ECONNRESET]       when the remote peer forces the connection to be
                        closed;

     [ECONNREFUSED]     when the remote peer actively refuses connection
                        establishment (usually because no process is listening
                        to the port);

     [EADDRINUSE]       when an attempt is made to create a socket with a port
                        which has already been allocated;

     [EADDRNOTAVAIL]    when an attempt is made to create a socket with a
                        network address for which no network interface exists;

     [EAFNOSUPPORT]     when an attempt is made to bind or connect a socket to
                        a multicast address.
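
     As an illustration of how these errors surface, the sketch below
     provokes [EADDRINUSE] by binding two sockets to the same loopback port
     and returning the errno value from the second bind(2).  The helper name
     provoke_eaddrinuse() is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Bind two sockets to the same loopback port and return the errno from the
 * second bind(2); EADDRINUSE is expected.  Returns -1 if setup fails and 0
 * if the second bind unexpectedly succeeds.
 * (provoke_eaddrinuse is a hypothetical helper name.)
 */
int
provoke_eaddrinuse(void)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int a, b, err = -1;

	a = socket(AF_INET, SOCK_STREAM, 0);
	b = socket(AF_INET, SOCK_STREAM, 0);
	if (a == -1 || b == -1)
		goto out;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;		/* let the system pick a free port */

	if (bind(a, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(a, 1) == -1 ||
	    getsockname(a, (struct sockaddr *)&sin, &len) == -1)
		goto out;

	/* sin now holds the port assigned to a; reuse it for b. */
	if (bind(b, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		err = errno;
	else
		err = 0;
out:
	if (a != -1)
		close(a);
	if (b != -1)
		close(b);
	return (err);
}
```

     Neither socket sets SO_REUSEADDR or SO_REUSEPORT, which is why the
     second bind(2) is expected to fail.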

SEE ALSO
     getsockopt(2), socket(2), sysctl(3), blackhole(4), inet(4), intro(4),
     ip(4), mod_cc(4), syncache(4), setkey(8)

     V. Jacobson, R. Braden, and D. Borman, TCP Extensions for High
     Performance, RFC 1323.

     A. Heffernan, Protection of BGP Sessions via the TCP MD5 Signature
     Option, RFC 2385.

     K. Ramakrishnan, S. Floyd, and D. Black, The Addition of Explicit
     Congestion Notification (ECN) to IP, RFC 3168.

HISTORY
     The TCP protocol appeared in 4.2BSD.  The RFC 1323 extensions for window
     scaling and timestamps were added in 4.4BSD.  The TCP_INFO option was
     introduced in Linux 2.6 and is subject to change.

FreeBSD 11.0-PRERELEASE       September 15, 2011       FreeBSD 11.0-PRERELEASE
