TCP(4)                 FreeBSD Kernel Interfaces Manual                 TCP(4)

     tcp - Internet Transmission Control Protocol

     #include <sys/types.h>
     #include <sys/socket.h>
     #include <netinet/in.h>

     socket(AF_INET, SOCK_STREAM, 0);

     The TCP protocol provides reliable, flow-controlled, two-way transmission
     of data.  It is a byte-stream protocol used to support the SOCK_STREAM
     abstraction.  TCP uses the standard Internet address format and, in
     addition, provides a per-host collection of ``port addresses''.  Thus,
     each address is composed of an Internet address specifying the host and
     network, with a specific TCP port on the host identifying the peer
     entity.

     Sockets utilizing the TCP protocol are either ``active'' or ``passive''.
     Active sockets initiate connections to passive sockets.  By default, TCP
     sockets are created active; to create a passive socket, the listen(2)
     system call must be used after binding the socket with the bind(2) system
     call.  Only passive sockets may use the accept(2) call to accept incoming
     connections.  Only active sockets may use the connect(2) call to initiate
     connections.
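
     The distinction between passive and active sockets can be sketched as
     follows.  This is a minimal illustration, not part of the TCP
     implementation: the helper names make_passive() and make_active() are
     hypothetical, and the loopback address with a system-assigned port is
     an assumption chosen so that the sketch needs no external peer.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a passive socket on the loopback address: bind(2), then
 * listen(2).  Binding port 0 lets the system pick a free port, which is
 * stored in *port in network byte order.  Returns the descriptor or -1.
 */
int
make_passive(in_port_t *port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(s, 5) < 0 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) < 0) {
		close(s);
		return (-1);
	}
	*port = sin.sin_port;
	return (s);
}

/*
 * Create an active socket and connect(2) it to the given loopback port
 * (network byte order).  Returns the descriptor or -1.
 */
int
make_active(in_port_t port)
{
	struct sockaddr_in sin;
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = port;
	if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
		close(s);
		return (-1);
	}
	return (s);
}
```

     A caller would follow make_passive() with accept(2) on the returned
     descriptor.  Calling accept(2) on the descriptor returned by
     make_active() fails, since that socket was never made passive.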

     Passive sockets may ``underspecify'' their location to match incoming
     connection requests from multiple networks.  This technique, termed
     ``wildcard addressing'', allows a single server to provide service to
     clients on multiple networks.  To create a socket which listens on all
     networks, the Internet address INADDR_ANY must be bound.  The TCP port
     may still be specified at this time; if the port is not specified, the
     system will assign one.  Once a connection has been established, the
     socket's address is fixed by the peer entity's location.  The address
     assigned to the socket is the address associated with the network
     interface through which packets are being transmitted and received.
     Normally, this address corresponds to the peer entity's network.
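
     Wildcard addressing can be observed with getsockname(2).  The sketch
     below is illustrative only: the helper names listen_wildcard(),
     local_address() and accept_via_loopback() are hypothetical, and the
     use of the loopback interface for the test connection is an
     assumption.  It binds INADDR_ANY with an unspecified port, then shows
     that the accepted socket's address has been fixed to the interface
     the connection arrived on.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Listen on all networks; port 0 lets the system assign a port. */
int
listen_wildcard(void)
{
	struct sockaddr_in sin;
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = htons(0);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(s, 5) < 0) {
		close(s);
		return (-1);
	}
	return (s);
}

/* Fetch the local address currently assigned to socket s. */
int
local_address(int s, struct sockaddr_in *sin)
{
	socklen_t len = sizeof(*sin);

	memset(sin, 0, sizeof(*sin));
	return (getsockname(s, (struct sockaddr *)sin, &len));
}

/*
 * Connect to the wildcard listener ls through the loopback interface
 * and accept that connection.  Returns the accepted descriptor, or -1;
 * the connecting descriptor is stored in *client.
 */
int
accept_via_loopback(int ls, int *client)
{
	struct sockaddr_in sin;
	int a, c;

	if (local_address(ls, &sin) < 0)
		return (-1);
	/* Keep the assigned port, but aim at the loopback address. */
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if ((c = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		return (-1);
	if (connect(c, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    (a = accept(ls, NULL, NULL)) < 0) {
		close(c);
		return (-1);
	}
	*client = c;
	return (a);
}
```

     Before the connection, local_address() on the listening socket
     reports INADDR_ANY with the assigned port.  On the accepted socket it
     reports the loopback address, because that is the interface the
     packets are exchanged through.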

     TCP supports a number of socket options which can be set with
     setsockopt(2) and tested with getsockopt(2):

     TCP_INFO         Information about a socket's underlying TCP session may
                      be retrieved by passing the read-only option TCP_INFO to
                      getsockopt(2).  It accepts a single argument: a pointer
                      to an instance of struct tcp_info.

                      This API is subject to change; consult the source to
                      determine which fields are currently filled out by this
                      option.  FreeBSD specific additions include send window
                      size, receive window size, and bandwidth-controlled
                      window space.

     TCP_NODELAY      Under most circumstances, TCP sends data when it is
                      presented; when outstanding data has not yet been
                      acknowledged, it gathers small amounts of output to be
                      sent in a single packet once an acknowledgement is
                      received.  For a small number of clients, such as window
                      systems that send a stream of mouse events which receive
                      no replies, this packetization may cause significant
                      delays.  The boolean option TCP_NODELAY defeats this
                      algorithm.

     TCP_MAXSEG       By default, a sender- and receiver-TCP will negotiate
                      among themselves to determine the maximum segment size
                      to be used for each connection.  The TCP_MAXSEG option
                      allows the user to determine the result of this
                      negotiation, and to reduce it if desired.

     TCP_NOOPT        TCP usually sends a number of options in each packet,
                      corresponding to various TCP extensions which are
                      provided in this implementation.  The boolean option
                      TCP_NOOPT is provided to disable TCP option use on a
                      per-connection basis.

     TCP_NOPUSH       By convention, the sender-TCP will set the ``push'' bit,
                      and begin transmission immediately (if permitted) at the
                      end of every user call to write(2) or writev(2).  When
                      this option is set to a non-zero value, TCP will delay
                      sending any data at all until either the socket is
                      closed, or the internal send buffer is filled.

     TCP_MD5SIG       This option enables the use of MD5 digests (also known
                      as TCP-MD5) on writes to the specified socket.  In the
                      current release, only outgoing traffic is digested;
                      digests on incoming traffic are not verified.  The
                      current default behavior for the system is to respond to
                      a system advertising this option with TCP-MD5; this may
                      change.

                      One common use for this in a FreeBSD router deployment
                      is to enable based routers to interwork with Cisco
                      equipment at peering points.  Support for this feature
                      conforms to RFC 2385.  Only IPv4 (AF_INET) sessions are
                      supported.

                      In order for this option to function correctly, it is
                      necessary for the administrator to add a tcp-md5 key
                      entry to the system's security associations database
                      (SADB) using the setkey(8) utility.  This entry must
                      have an SPI of 0x1000 and can therefore only be
                      specified on a per-host basis at this time.

                      If an SADB entry cannot be found for the destination,
                      the outgoing traffic will have an invalid digest option
                      prepended, and the following error message will be
                      visible on the system console: tcp_signature_compute:
                      SADB lookup failed for %d.%d.%d.%d.
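
     As an illustration of the options above, the following sketch sets
     the boolean TCP_NODELAY option and reads back an integer-valued
     option such as TCP_MAXSEG.  The helper names set_nodelay() and
     get_tcp_int() are hypothetical; only the two options that are widely
     portable are exercised, and IPPROTO_TCP is used as the option level.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Turn the boolean TCP_NODELAY option on or off.  Returns 0 or -1. */
int
set_nodelay(int s, int on)
{
	return (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)));
}

/*
 * Read an integer-valued TCP option, such as TCP_NODELAY or
 * TCP_MAXSEG, into *val.  Returns 0 or -1.
 */
int
get_tcp_int(int s, int opt, int *val)
{
	socklen_t len = sizeof(*val);

	return (getsockopt(s, IPPROTO_TCP, opt, val, &len));
}
```

     A boolean option read back with getsockopt(2) should be tested for
     zero or non-zero rather than compared with 1, since the value
     returned for an enabled option is not guaranteed to be exactly 1.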

     The option level for the setsockopt(2) call is the protocol number for
     TCP, available from getprotobyname(3), or IPPROTO_TCP.  All options are
     declared in <netinet/tcp.h>.
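
     The two ways of obtaining the option level can be combined.  In this
     sketch the hypothetical helper tcp_option_level() consults the
     protocol database through getprotobyname(3) and falls back to the
     constant IPPROTO_TCP if the lookup fails; the fallback is an
     assumption made so that the helper always returns a usable level.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>

/*
 * Return the level to pass to setsockopt(2) and getsockopt(2) for TCP
 * options.  The protocol database is consulted first; if the lookup
 * fails, the constant IPPROTO_TCP is used instead.
 */
int
tcp_option_level(void)
{
	struct protoent *pe = getprotobyname("tcp");

	return (pe != NULL ? pe->p_proto : IPPROTO_TCP);
}
```

     Most programs simply pass IPPROTO_TCP, which avoids the database
     lookup altogether.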

     Options at the IP transport level may be used with TCP; see ip(4).
     Incoming connection requests that are source-routed are noted, and the
     reverse source route is used in responding.

   MIB Variables
     The TCP protocol implements a number of variables in the net.inet.tcp
     branch of the sysctl(3) MIB.

     TCPCTL_DO_RFC1323      (rfc1323) Implement the window scaling and
                            timestamp options of RFC 1323 (default is true).

     TCPCTL_MSSDFLT         (mssdflt) The default value used for the maximum
                            segment size (``MSS'') when no advice to the
                            contrary is received from MSS negotiation.

     TCPCTL_SENDSPACE       (sendspace) Maximum TCP send window.

     TCPCTL_RECVSPACE       (recvspace) Maximum TCP receive window.

     log_in_vain            Log any connection attempts to ports where there
                            is not a socket accepting connections.  The value
                            of 1 limits the logging to SYN (connection
                            establishment) packets only.  That of 2 results in
                            any TCP packets to closed ports being logged.  Any
                            value unlisted above disables the logging (default
                            is 0, i.e., the logging is disabled).

     slowstart_flightsize   The number of packets allowed to be in-flight
                            during the TCP slow-start phase on a non-local
                            network.

     local_slowstart_flightsize
                            The number of packets allowed to be in-flight
                            during the TCP slow-start phase to local machines
                            in the same subnet.

     msl                    The Maximum Segment Lifetime, in milliseconds, for
                            a packet.

     keepinit               Timeout, in milliseconds, for new, non-established
                            TCP connections.

     keepidle               Amount of time, in milliseconds, that the
                            connection must be idle before keepalive probes
                            (if enabled) are sent.

     keepintvl              The interval, in milliseconds, between keepalive
                            probes sent to remote machines.  After
                            TCPTV_KEEPCNT (default 8) probes are sent, with no
                            response, the connection is dropped.

     always_keepalive       Assume that SO_KEEPALIVE is set on all TCP
                            connections, so that the kernel periodically
                            sends a packet to the remote host to verify that
                            the connection is still up.

     icmp_may_rst           Certain ICMP unreachable messages may abort
                            connections in SYN-SENT state.

     do_tcpdrain            Flush packets in the TCP reassembly queue if the
                            system is low on mbufs.

     blackhole              If enabled, disable sending of RST when a
                            connection is attempted to a port where there is
                            not a socket accepting connections.  See
                            blackhole(4).

     delayed_ack            Delay ACK to try and piggyback it onto a data
                            packet.

     delacktime             Maximum amount of time, in milliseconds, before a
                            delayed ACK is sent.

     newreno                Enable TCP NewReno Fast Recovery algorithm, as
                            described in RFC 2582.

     path_mtu_discovery     Enable Path MTU Discovery.

     tcbhashsize            Size of the TCP control-block hash table (read-
                            only).  This may be tuned using the kernel option
                            TCBHASHSIZE or by setting net.inet.tcp.tcbhashsize
                            in the loader(8).

     pcbcount               Number of active protocol control blocks (read-
                            only).

     syncookies             Determines whether or not SYN cookies should be
                            generated for outbound SYN-ACK packets.  SYN
                            cookies are a great help during SYN flood attacks,
                            and are enabled by default.  (See syncache(4).)

     isn_reseed_interval    The interval (in seconds) specifying how often the
                            secret data used in RFC 1948 initial sequence
                            number calculations should be reseeded.  By
                            default, this variable is set to zero, indicating
                            that no reseeding will occur.  Reseeding should
                            not be necessary, and will break TIME_WAIT
                            recycling for a few minutes.

     rexmit_min, rexmit_slop
                            Adjust the retransmit timer calculation for TCP.
                            The slop is typically added to the raw calculation
                            to take into account occasional variances that the
                            SRTT (smoothed round-trip time) is unable to
                            accommodate, while the minimum specifies an
                            absolute minimum.  While a number of TCP RFCs
                            suggest a 1 second minimum, these RFCs tend to
                            focus on streaming behavior, and fail to deal with
                            the fact that a 1 second minimum has severe
                            detrimental effects over lossy interactive
                            connections, such as an 802.11b wireless link, and
                            over very fast but lossy connections for those
                            cases not covered by the fast retransmit code.
                            For this reason, we use 200ms of slop and a near-0
                            minimum, which gives us an effective minimum of
                            200ms (similar to Linux).

     inflight.enable        Enable TCP bandwidth-delay product limiting.  An
                            attempt will be made to calculate the bandwidth-
                            delay product for each individual TCP connection,
                            and limit the amount of inflight data being
                            transmitted, to avoid building up unnecessary
                            packets in the network.  This option is
                            recommended if you are serving a lot of data over
                            connections with high bandwidth-delay products,
                            such as modems, GigE links, and fast long-haul
                            WANs, and/or you have configured your machine to
                            accommodate large TCP windows.  In such
                            situations, without this option, you may
                            experience high interactive latencies or packet
                            loss due to the overloading of intermediate
                            routers and switches.  Note that bandwidth-delay
                            product limiting only affects the transmit side of
                            a TCP connection.

     inflight.debug         Enable debugging for the bandwidth-delay product
                            algorithm.

     inflight.min           This puts a lower bound on the bandwidth-delay
                            product window, in bytes.  A value of 1024 is
                            typically used for debugging.  6000-16000 is more
                            typical in a production installation.  Setting
                            this value too low may result in slow ramp-up
                            times for bursty connections.  Setting this value
                            too high effectively disables the algorithm.

     inflight.max           This puts an upper bound on the bandwidth-delay
                            product window, in bytes.  This value should not
                            generally be modified, but may be used to set a
                            global per-connection limit on queued data,
                            potentially allowing you to intentionally set a
                            less than optimum limit, to smooth data flow over
                            a network while still being able to specify huge
                            internal TCP buffers.

     inflight.stab          The bandwidth-delay product algorithm requires a
                            slightly larger window than it otherwise
                            calculates for stability.  This parameter
                            determines the extra window in maximal packets /
                            10.  The default value of 20 represents 2 maximal
                            packets.  Reducing this value is not recommended,
                            but you may come across a situation with very slow
                            links where the ping(8) time reduction of the
                            default inflight code is not sufficient.  If this
                            case occurs, you should first try reducing
                            inflight.min and, if that does not work, reduce
                            both inflight.min and inflight.stab, trying values
                            of 15, 10, or 5 for the latter.  Never use a value
                            less than 5.  Reducing inflight.stab can lead to
                            upwards of a 20% underutilization of the link as
                            well as reducing the algorithm's ability to adapt
                            to changing situations and should only be done as
                            a last resort.

     rfc3042                Enable the Limited Transmit algorithm as described
                            in RFC 3042.  It helps avoid timeouts on lossy
                            links and also when the congestion window is
                            small, as happens on short transfers.

     rfc3390                Enable support for RFC 3390, which allows for a
                            variable-sized starting congestion window on new
                            connections, depending on the maximum segment
                            size.  This helps throughput in general, but
                            particularly affects short transfers and high-
                            bandwidth large propagation-delay connections.

                            When this feature is enabled, the
                            slowstart_flightsize and
                            local_slowstart_flightsize settings are not
                            observed for new connection slow starts, but they
                            are still used for slow starts that occur when the
                            connection has been idle and starts sending again.

     sack.enable            Enable support for RFC 2018, TCP Selective
                            Acknowledgment option, which allows the receiver
                            to inform the sender about all successfully
                            arrived segments, allowing the sender to
                            retransmit the missing segments only.

     sack.maxholes          Maximum number of SACK holes per connection.
                            Defaults to 128.

     sack.globalmaxholes    Maximum number of SACK holes per system, across
                            all connections.  Defaults to 65536.

     maxtcptw               When a TCP connection enters the TIME_WAIT state,
                            its associated socket structure is freed, since it
                            is of negligible size and use, and a new structure
                            is allocated to contain a minimal amount of
                            information necessary for sustaining a connection
                            in this state, called the compressed TCP TIME_WAIT
                            state.  Since this structure is smaller than a
                            socket structure, it can save a significant amount
                            of system memory.  The net.inet.tcp.maxtcptw MIB
                            variable controls the maximum number of these
                            structures allocated.  By default, it is
                            initialized to kern.ipc.maxsockets / 5.

     nolocaltimewait        Suppress creating of compressed TCP TIME_WAIT
                            states for connections in which both endpoints are
                            local.

     fast_finwait2_recycle  Recycle TCP FIN_WAIT_2 connections faster when the
                            socket is marked as SBS_CANTRCVMORE (no user
                            process has the socket open, data received on the
                            socket cannot be read).  The timeout used here is
                            finwait2_timeout.

     finwait2_timeout       Timeout to use for fast recycling of TCP
                            FIN_WAIT_2 connections.  Defaults to 60 seconds.
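
     The variables above can be read by name from a program.  The sketch
     below is an illustration under stated assumptions: the helper names
     tcp_mib_name() and tcp_mib_get_int() are hypothetical, the variable
     being read is assumed to be integer-valued, and sysctlbyname(3) is
     only called when building on FreeBSD.  Elsewhere the read fails with
     ENOSYS so that the sketch still compiles.

```c
#include <sys/types.h>
#include <errno.h>
#include <stdio.h>
#if defined(__FreeBSD__)
#include <sys/sysctl.h>
#endif

/*
 * Build the full MIB name for a variable in the net.inet.tcp branch;
 * for example, "keepidle" becomes "net.inet.tcp.keepidle".  Returns the
 * length of the name, or -1 if the buffer is too small.
 */
int
tcp_mib_name(char *buf, size_t size, const char *leaf)
{
	int n = snprintf(buf, size, "net.inet.tcp.%s", leaf);

	return ((n < 0 || (size_t)n >= size) ? -1 : n);
}

/*
 * Read an integer-valued TCP MIB variable into *val.  Returns 0 on
 * success, or -1 with errno set.
 */
int
tcp_mib_get_int(const char *leaf, int *val)
{
	char name[128];

	if (tcp_mib_name(name, sizeof(name), leaf) < 0) {
		errno = ENAMETOOLONG;
		return (-1);
	}
#if defined(__FreeBSD__)
	{
		size_t len = sizeof(*val);

		return (sysctlbyname(name, val, &len, NULL, 0));
	}
#else
	(void)val;		/* not read on other systems */
	errno = ENOSYS;
	return (-1);
#endif
}
```

     For example, tcp_mib_get_int("keepidle", &ms) would be expected to
     store the keepalive idle time in milliseconds on a FreeBSD system.
     Read-only variables such as tcbhashsize can be read the same way.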

     A socket operation may fail with one of the following errors returned:

     [EISCONN]          when trying to establish a connection on a socket
                        which already has one;

     [ENOBUFS]          when the system runs out of memory for an internal
                        data structure;

     [ETIMEDOUT]        when a connection was dropped due to excessive
                        retransmissions;

     [ECONNRESET]       when the remote peer forces the connection to be
                        closed;

     [ECONNREFUSED]     when the remote peer actively refuses connection
                        establishment (usually because no process is listening
                        to the port);

     [EADDRINUSE]       when an attempt is made to create a socket with a port
                        which has already been allocated;

     [EADDRNOTAVAIL]    when an attempt is made to create a socket with a
                        network address for which no network interface exists;

     [EAFNOSUPPORT]     when an attempt is made to bind or connect a socket to
                        a multicast address.
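
     Two of these errors are easy to provoke on the loopback interface.
     The sketch below is illustrative only: the helper names loopback(),
     listener(), errno_second_bind() and errno_second_connect() are
     hypothetical, and each function returns the errno value observed (or
     -1 if the setup itself failed) so that a caller can inspect it.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Fill in a loopback address with the given port (network order). */
static void
loopback(struct sockaddr_in *sin, in_port_t port)
{
	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin->sin_port = port;
}

/* Listen on a system-assigned loopback port, stored in *port. */
static int
listener(in_port_t *port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		return (-1);
	loopback(&sin, 0);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(s, 5) < 0 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) < 0) {
		close(s);
		return (-1);
	}
	*port = sin.sin_port;
	return (s);
}

/* Bind a second socket to a port already in use: expect EADDRINUSE. */
int
errno_second_bind(void)
{
	struct sockaddr_in sin;
	in_port_t port;
	int error = 0, ls, s;

	if ((ls = listener(&port)) < 0)
		return (-1);
	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
		close(ls);
		return (-1);
	}
	loopback(&sin, port);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		error = errno;
	close(s);
	close(ls);
	return (error);
}

/* Connect an already connected socket again: expect EISCONN. */
int
errno_second_connect(void)
{
	struct sockaddr_in sin;
	in_port_t port;
	int error = -1, ls, s;

	if ((ls = listener(&port)) < 0)
		return (-1);
	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
		close(ls);
		return (-1);
	}
	loopback(&sin, port);
	if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == 0)
		error = connect(s, (struct sockaddr *)&sin,
		    sizeof(sin)) < 0 ? errno : 0;
	close(s);
	close(ls);
	return (error);
}
```

     Neither helper sets SO_REUSEADDR or SO_REUSEPORT; with those socket
     options the second bind(2) may be permitted.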

     getsockopt(2), socket(2), sysctl(3), blackhole(4), inet(4), intro(4),
     ip(4), syncache(4), setkey(8)

     V. Jacobson, R. Braden, and D. Borman, TCP Extensions for High
     Performance, RFC 1323.

     A. Heffernan, Protection of BGP Sessions via the TCP MD5 Signature
     Option, RFC 2385.

     The TCP protocol appeared in 4.2BSD.  The RFC 1323 extensions for window
     scaling and timestamps were added in 4.4BSD.  The TCP_INFO option was
     introduced in Linux 2.6 and is subject to change.

FreeBSD 11.0-PRERELEASE        February 28, 2007       FreeBSD 11.0-PRERELEASE

