CVS log for src/sys/kern/uipc_socket.c

Keyword substitution: kv
Default branch: MAIN


Revision 1.360
Sat Feb 4 15:00:26 2012 UTC by hrs
Branches: MAIN
CVS tags: HEAD
Diff to: previous 1.359
Changes since revision 1.359: +1 -1 lines
SVN rev 230981 on 2012-02-04 15:00:26Z by hrs

Fix input validation in SO_SETFIB.

Reviewed by:	bz
MFC after:	1 day
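The log does not include the one-line change itself. As a rough illustration only, the kind of range check that validating an SO_SETFIB argument involves can be sketched as follows, assuming valid routing-table (FIB) numbers run from 0 to one less than the number of configured tables. `fib_arg_valid()` and `numfibs` are hypothetical names, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch, not the kernel's code: accept a FIB number only if
 * it lies in the half-open range [0, numfibs).  Table 0 is the default
 * table, so it must be accepted (compare revision 1.353 in this log).
 */
static bool
fib_arg_valid(int optval, int numfibs)
{
	return (optval >= 0 && optval < numfibs);
}
```

With four tables configured, 0 through 3 pass while 4 and -1 are rejected; an off-by-one in either bound would reject the default table or accept a nonexistent one.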

Revision 1.359
Mon Nov 14 18:21:27 2011 UTC by rmh
Branches: MAIN
Diff to: previous 1.358
Changes since revision 1.358: +0 -3 lines
SVN rev 227503 on 2011-11-14 18:21:27Z by rmh

Remove a few bits of FreeBSD 2.x compatibility code.

Approved by:	kib (mentor)

Revision 1.358.2.1.2.1
Fri Nov 11 04:20:22 2011 UTC by kensmith
Branches: RELENG_9_0
CVS tags: RELENG_9_0_0_RELEASE
Diff to: previous 1.358.2.1; next MAIN 1.359
Changes since revision 1.358.2.1: +0 -0 lines
SVN rev 227445 on 2011-11-11 04:20:22Z by kensmith

Copy stable/9 to releng/9.0 as part of the FreeBSD 9.0-RELEASE release
cycle.

Approved by:	re (implicit)

Revision 1.358.2.1
Fri Sep 23 00:51:37 2011 UTC by kensmith
Branches: RELENG_9
CVS tags: RELENG_9_0_BP
Branch point for: RELENG_9_0
Diff to: previous 1.358; next MAIN 1.359
Changes since revision 1.358: +0 -0 lines
SVN rev 225736 on 2011-09-23 00:51:37Z by kensmith

Copy head to stable/9 as part of 9.0-RELEASE release cycle.

Approved by:	re (implicit)

Revision 1.340.2.11
Thu Sep 15 12:27:26 2011 UTC by attilio
Branches: RELENG_8
Diff to: previous 1.340.2.10; branchpoint 1.340; next MAIN 1.341
Changes since revision 1.340.2.10: +2 -0 lines
SVN rev 225585 on 2011-09-15 12:27:26Z by attilio

MFC r225177,225181:
Introduce and use seldrain() function for dealing with fast
selrecord/selinfo destruction.

Sponsored by:	Sandvine Incorporated

Revision 1.358
Thu Aug 25 15:51:54 2011 UTC by attilio
Branches: MAIN
CVS tags: RELENG_9_BP
Branch point for: RELENG_9
Diff to: previous 1.357
Changes since revision 1.357: +2 -0 lines
SVN rev 225177 on 2011-08-25 15:51:54Z by attilio

Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then quickly
destroyed, so that the waiters miss the opportunity to wake up, on
their next iteration they will find the selinfo object destroyed,
causing a page fault.

That happens because the selinfo interface has no way to drain the
waiters before destroying the registered selinfo object. The race is
quite rare in practice, because it requires a selrecord(), a poll
request by another thread, and a quick destruction of the
selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine, which should be called
before destroying selinfo objects (to avoid this case), and fix the
existing cases where it should already be called.
Sometimes the context is safe enough to prevent this type of race,
as in device drivers that install selinfo objects in poll callbacks.
There, the selinfo object is destroyed at driver detach time, when all
file descriptors should already be closed, so there cannot be a race.
The mfi(4) device driver serves as an example for this case, as it
implements fully correct logic for preventing the race.

Sponsored by:	Sandvine Incorporated
Reported by:	rstone
Tested by:	pluknet
Reviewed by:	jhb, kib
Approved by:	re (bz)
MFC after:	3 weeks

Revision 1.302.2.20
Wed Aug 17 14:29:33 2011 UTC by deischen
Branches: RELENG_7
Diff to: previous 1.302.2.19; branchpoint 1.302; next MAIN 1.303
Changes since revision 1.302.2.19: +5 -4 lines
SVN rev 224941 on 2011-08-17 14:29:33Z by deischen

MFC r218627

Allow SO_SETFIB to select/set the default routing table.

Requested by: Andrew Boyer aboyer at averesystems dot com.

Revision 1.340.2.10
Sat Jul 23 12:55:01 2011 UTC by deischen
Branches: RELENG_8
Diff to: previous 1.340.2.9; branchpoint 1.340
Changes since revision 1.340.2.9: +5 -4 lines
SVN rev 224281 on 2011-07-23 12:55:01Z by deischen

MFC r218627

Allow SO_SETFIB to select/set the default routing table.

Requested by: Andrew Boyer aboyer at averesystems dot com.

Revision 1.357
Fri Jul 8 10:50:13 2011 UTC by andre
Branches: MAIN
Diff to: previous 1.356
Changes since revision 1.356: +9 -13 lines
SVN rev 223863 on 2011-07-08 10:50:13Z by andre

In the experimental soreceive_stream():

 o Move the non-blocking socket test below the SBS_CANTRCVMORE so that EOF
   is correctly returned on a remote connection close.
 o In the non-blocking socket test compare SS_NBIO against the so->so_state
   field instead of the incorrect sb->sb_state field.
 o Simplify the ENOTCONN test by removing cases that can't occur.

Submitted by:	trociny (with some further tweaks by committer)
Tested by:	trociny
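The first bullet concerns the order of the EOF and non-blocking checks. The user-visible behavior it restores can be sketched in portable C, using an AF_UNIX stream socket pair as a stand-in for a TCP connection (an assumption made for a self-contained test): on an empty non-blocking socket, recv() should fail with EAGAIN/EWOULDBLOCK while the peer is still connected, but return 0 (EOF) once the peer has shut down its sending side. The helper `empty_nonblocking_recv()` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a connected stream socket pair, make sv[0]
 * non-blocking, optionally shut down the peer's sending side, then read
 * from the (empty) sv[0].  Returns recv()'s result, or -2 if setup fails;
 * *err receives errno when recv() fails and 0 otherwise.
 */
static ssize_t
empty_nonblocking_recv(int peer_shutdown, int *err)
{
	int sv[2], flags;
	char buf[16];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-2);
	flags = fcntl(sv[0], F_GETFL);
	if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
		close(sv[0]);
		close(sv[1]);
		return (-2);
	}
	if (peer_shutdown)
		shutdown(sv[1], SHUT_WR);	/* peer will send no more data */
	n = recv(sv[0], buf, sizeof(buf), 0);
	*err = (n == -1) ? errno : 0;		/* capture before close() */
	close(sv[0]);
	close(sv[1]);
	return (n);
}
```

Checking the EOF condition first is what lets the second case report 0 instead of EWOULDBLOCK.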

Revision 1.356
Thu Jul 7 10:37:14 2011 UTC by andre
Branches: MAIN
Diff to: previous 1.355
Changes since revision 1.355: +0 -2 lines
SVN rev 223839 on 2011-07-07 10:37:14Z by andre

Remove the TCP_SORECEIVE_STREAM compile-time option.  The use of
soreceive_stream() for TCP still has to be enabled with the loader
tunable net.inet.tcp.soreceive_stream.

Suggested by:	trociny and others
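A tester could check the current state from userland with sysctlbyname(3). This sketch assumes the read-only sysctl added in revision 1.344 is an int named net.inet.tcp.soreceive_stream; on other systems, or if the OID is absent, the helper returns -1. The tunable itself would be set in loader.conf(5) before boot. `soreceive_stream_state()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Hypothetical helper: return the value of net.inet.tcp.soreceive_stream
 * (assumed to be an int), or -1 if it cannot be read or this is not FreeBSD.
 */
static int
soreceive_stream_state(void)
{
#ifdef __FreeBSD__
	int val;
	size_t len = sizeof(val);

	if (sysctlbyname("net.inet.tcp.soreceive_stream", &val, &len,
	    NULL, 0) == -1)
		return (-1);
	return (val);
#else
	return (-1);
#endif
}
```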

Revision 1.340.2.9
Wed Jun 15 20:34:40 2011 UTC by trociny
Branches: RELENG_8
Diff to: previous 1.340.2.8; branchpoint 1.340
Changes since revision 1.340.2.8: +10 -4 lines
SVN rev 223119 on 2011-06-15 20:34:40Z by trociny

MFC r222454:

In soreceive_generic(), if MSG_WAITALL is set but the request is
larger than the receive buffer, we have to receive in sections.
When notifying the protocol that some data has been drained, the
lock is released for a moment. On return, we block waiting for the
rest of the data. There is a race: data could arrive while the
lock was released, and then the connection stalls in sbwait.

Fix this by checking for data before blocking, and skip blocking
if any is present.

PR:		kern/154504
Reported by:	Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
Tested by:	Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
Reviewed by:	rwatson

Approved by:	pjd (mentor)

Revision 1.355
Sun May 29 18:00:50 2011 UTC by trociny
Branches: MAIN
Diff to: previous 1.354
Changes since revision 1.354: +10 -4 lines
SVN rev 222454 on 2011-05-29 18:00:50Z by trociny

In soreceive_generic(), if MSG_WAITALL is set but the request is
larger than the receive buffer, we have to receive in sections.
When notifying the protocol that some data has been drained, the
lock is released for a moment. On return, we block waiting for the
rest of the data. There is a race: data could arrive while the
lock was released, and then the connection stalls in sbwait.

Fix this by checking for data before blocking, and skip blocking
if any is present.

PR:		kern/154504
Reported by:	Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
Tested by:	Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
Reviewed by:	rwatson
Approved by:	kib (co-mentor)
MFC after:	2 weeks
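The user-visible contract protected here is that a recv() with MSG_WAITALL on a stream socket returns the full requested length (unless interrupted by a signal, an error, or EOF), even when that length typically exceeds the default socket buffer, so the data must arrive in several sections. A portable sketch of that scenario follows, using an AF_UNIX socket pair and a forked writer that sends 4 KiB pieces (both assumptions made for a self-contained test). It exercises the same semantics rather than reproducing kern/154504 itself; `waitall_transfer()` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: a forked child writes `total` bytes in 4 KiB pieces;
 * the parent asks for all of them with a single MSG_WAITALL recv().
 * Returns the byte count recv() reported, or -1 on setup failure.
 */
static ssize_t
waitall_transfer(size_t total)
{
	int sv[2];
	char *buf;
	pid_t pid;
	ssize_t got;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);
	if ((buf = malloc(total)) == NULL || (pid = fork()) == -1) {
		free(buf);
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Child: write() blocks whenever the socket buffer fills. */
		size_t off = 0;

		close(sv[0]);
		memset(buf, 'x', total);
		while (off < total) {
			size_t chunk = total - off < 4096 ? total - off : 4096;
			ssize_t n = write(sv[1], buf + off, chunk);

			if (n <= 0)
				_exit(1);
			off += (size_t)n;
		}
		_exit(0);
	}

	/* Parent: drop its writer end so that a dead child yields EOF. */
	close(sv[1]);
	got = recv(sv[0], buf, total, MSG_WAITALL);
	close(sv[0]);
	waitpid(pid, NULL, 0);
	free(buf);
	return (got);
}
```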

Revision 1.340.2.8
Sat Apr 16 23:30:53 2011 UTC by bz
Branches: RELENG_8
Diff to: previous 1.340.2.7; branchpoint 1.340
Changes since revision 1.340.2.7: +78 -20 lines
SVN rev 220733 on 2011-04-16 23:30:53Z by bz

MFC r218757:

  Mfp4 CH=177274,177280,177284-177285,177297,177324-177325

    VNET socket push back:
    try to minimize the number of places where we have to switch vnets
    and narrow down the time we stay switched.  Add assertions to the
    socket code to catch possibly unset vnets as seen in r204147.

    While this reduces the number of vnet recursions, in some places,
    like NFS, POSIX local sockets and some netgraph code, recursions
    are impossible to fix.

    The current expectations are documented at the beginning of
    uipc_socket.c along with the other information there.

    Sponsored by: The FreeBSD Foundation
    Sponsored by: CK Software GmbH
    Reviewed by:  jhb
    Tested by:    zec

  Tested by:    Mikolaj Golub (to.my.trociny gmail.com)

Revision 1.340.2.7
Sat Apr 9 13:45:13 2011 UTC by bz
Branches: RELENG_8
Diff to: previous 1.340.2.6; branchpoint 1.340
Changes since revision 1.340.2.6: +2 -1 lines
SVN rev 220495 on 2011-04-09 13:45:13Z by bz

MFC r218559:

  Mfp4 CH=177255:

    Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS.

    Change the syntax to match KASSERT() to allow more flexible panic
    messages rather than having a printf with hardcoded arguments
    before panic.

    Adjust the few assertions we have to the new format (and enhance
    the output).

    Sponsored by: The FreeBSD Foundation
    Sponsored by: CK Software GmbH
    Reviewed by:  jhb

Revision 1.354
Wed Feb 16 21:29:13 2011 UTC by bz
Branches: MAIN
Diff to: previous 1.353
Changes since revision 1.353: +78 -20 lines
SVN rev 218757 on 2011-02-16 21:29:13Z by bz

Mfp4 CH=177274,177280,177284-177285,177297,177324-177325

  VNET socket push back:
  try to minimize the number of places where we have to switch vnets
  and narrow down the time we stay switched.  Add assertions to the
  socket code to catch possibly unset vnets as seen in r204147.

  While this reduces the number of vnet recursions, in some places,
  like NFS, POSIX local sockets and some netgraph code, recursions
  are impossible to fix.

  The current expectations are documented at the beginning of
  uipc_socket.c along with the other information there.

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by:  jhb
  Tested by:    zec

Tested by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC after:	2 weeks

Revision 1.353
Sun Feb 13 00:14:13 2011 UTC by deischen
Branches: MAIN
Diff to: previous 1.352
Changes since revision 1.352: +5 -4 lines
SVN rev 218627 on 2011-02-13 00:14:13Z by deischen

Allow the SO_SETFIB socket option to select the default (0)
routing table.

Reviewed by:	julian
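For reference, a userland sketch of selecting the default table, assuming FreeBSD's SO_SETFIB takes an int FIB number at the SOL_SOCKET level as described in setsockopt(2). On systems that do not define SO_SETFIB the helper fails with ENOPROTOOPT. `set_socket_fib()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: attach socket `s` to routing table `fib`.
 * Where SO_SETFIB is not defined (i.e. not FreeBSD), fail with ENOPROTOOPT.
 */
static int
set_socket_fib(int s, int fib)
{
#ifdef SO_SETFIB
	return (setsockopt(s, SOL_SOCKET, SO_SETFIB, &fib, sizeof(fib)));
#else
	(void)s;
	(void)fib;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

With this change, passing 0 selects the default table rather than being rejected.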

Revision 1.352
Fri Feb 11 13:27:00 2011 UTC by bz
Branches: MAIN
Diff to: previous 1.351
Changes since revision 1.351: +2 -1 lines
SVN rev 218559 on 2011-02-11 13:27:00Z by bz

Mfp4 CH=177255:

  Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS.

  Change the syntax to match KASSERT() to allow more flexible panic
  messages rather than having a printf with hardcoded arguments
  before panic.

  Adjust the few assertions we have to the new format (and enhance
  the output).

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by:	jhb

MFC after:	2 weeks

Revision 1.302.2.19.4.1
Tue Dec 21 17:10:29 2010 UTC by kensmith
Branches: RELENG_7_4
CVS tags: RELENG_7_4_0_RELEASE
Diff to: previous 1.302.2.19; next MAIN 1.302.2.20
Changes since revision 1.302.2.19: +0 -0 lines
SVN rev 216618 on 2010-12-21 17:10:29Z by kensmith

Copy stable/7 to releng/7.4 in preparation for FreeBSD-7.4 release.

Approved by:	re (implicit)

Revision 1.340.2.6.2.1
Tue Dec 21 17:09:25 2010 UTC by kensmith
Branches: RELENG_8_2
CVS tags: RELENG_8_2_0_RELEASE
Diff to: previous 1.340.2.6; next MAIN 1.340.2.7
Changes since revision 1.340.2.6: +0 -0 lines
SVN rev 216617 on 2010-12-21 17:09:25Z by kensmith

Copy stable/8 to releng/8.2 in preparation for FreeBSD-8.2 release.

Approved by:	re (implicit)

Revision 1.351
Fri Nov 12 13:02:26 2010 UTC by luigi
Branches: MAIN
Diff to: previous 1.350
Changes since revision 1.350: +10 -0 lines
SVN rev 215178 on 2010-11-12 13:02:26Z by luigi

This commit implements the SO_USER_COOKIE socket option, which lets
you tag a socket with a uint32_t value. The cookie can then be
used by the kernel for various purposes, e.g. setting the skipto
rule or pipe number in ipfw (this is the reason SO_USER_COOKIE has
been implemented; however there is nothing ipfw-specific in its
implementation).

The ipfw-related code that uses the option will be committed separately.

This change adds a field to 'struct socket', but the struct is not
part of any driver or userland-visible ABI so the change should be
harmless.

See the discussion at
http://lists.freebsd.org/pipermail/freebsd-ipfw/2009-October/004001.html

Idea and code from Paul Joe, small modifications and manpage
changes by myself.

Submitted by:	Paul Joe
MFC after:	1 week
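A userland sketch of tagging a socket, assuming SO_USER_COOKIE takes a uint32_t at the SOL_SOCKET level as this entry describes. Where the option is not defined the helper fails with ENOPROTOOPT. `set_user_cookie()` is a hypothetical name, and no ipfw rule is involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: tag socket `s` with a 32-bit cookie.  Where
 * SO_USER_COOKIE is not defined (i.e. not FreeBSD), fail with ENOPROTOOPT.
 */
static int
set_user_cookie(int s, uint32_t cookie)
{
#ifdef SO_USER_COOKIE
	return (setsockopt(s, SOL_SOCKET, SO_USER_COOKIE, &cookie,
	    sizeof(cookie)));
#else
	(void)s;
	(void)cookie;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```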

Revision 1.340.2.6
Thu Oct 28 16:53:54 2010 UTC by tuexen
Branches: RELENG_8
CVS tags: RELENG_8_2_BP
Branch point for: RELENG_8_2
Diff to: previous 1.340.2.5; branchpoint 1.340
Changes since revision 1.340.2.5: +6 -1 lines
SVN rev 214461 on 2010-10-28 16:53:54Z by tuexen

MFC r211030:
Fix a bug where MSG_TRUNC was not returned in all necessary cases for
SOCK_DGRAM sockets. MSG_TRUNC was only returned when some mbufs could
not be copied to the application. If some data was left in the last
mbuf, it was correctly discarded, but MSG_TRUNC was not set.

Revision 1.350
Sat Sep 18 11:18:42 2010 UTC by rwatson
Branches: MAIN
Diff to: previous 1.349
Changes since revision 1.349: +5 -5 lines
SVN rev 212822 on 2010-09-18 11:18:42Z by rwatson

With reworking of the socket life cycle in 7.x, the need for a "sotryfree()"
was eliminated: all references to sockets are explicitly managed by sorele()
and the protocols.  As such, garbage collect sotryfree(), and update
sofree() comments to make the new world order more clear.

MFC after:	3 days
Reported by:	Anuranjan Shukla <anshukla at juniper dot net>

Revision 1.349
Sat Aug 7 17:57:58 2010 UTC by tuexen
Branches: MAIN
Diff to: previous 1.348
Changes since revision 1.348: +6 -1 lines
SVN rev 211030 on 2010-08-07 17:57:58Z by tuexen

Fix a bug where MSG_TRUNC was not returned in all necessary cases for
SOCK_DGRAM sockets. MSG_TRUNC was only returned when some mbufs could
not be copied to the application. If some data was left in the last
mbuf, it was correctly discarded, but MSG_TRUNC was not set.

Reviewed by: bz
MFC after: 3 weeks
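The corrected behavior can be checked from userland: when a datagram is longer than the receive buffer, recvmsg() returns only what fits and sets MSG_TRUNC in msg_flags. The sketch uses an AF_UNIX datagram socket pair (an assumption made for a self-contained test) and a hypothetical helper, `dgram_truncated()`.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: send a `len`-byte datagram and receive it into a
 * `bufsize`-byte buffer.  Returns 1 if MSG_TRUNC was reported, 0 if not,
 * and -1 on error.
 */
static int
dgram_truncated(size_t len, size_t bufsize)
{
	int sv[2], truncated = -1;
	char out[64], in[64];
	struct iovec iov;
	struct msghdr msg;

	if (len > sizeof(out) || bufsize > sizeof(in))
		return (-1);
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return (-1);
	memset(out, 'd', len);
	if (send(sv[1], out, len, 0) == (ssize_t)len) {
		iov.iov_base = in;
		iov.iov_len = bufsize;
		memset(&msg, 0, sizeof(msg));
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;
		if (recvmsg(sv[0], &msg, 0) != -1)
			truncated = (msg.msg_flags & MSG_TRUNC) != 0;
	}
	close(sv[0]);
	close(sv[1]);
	return (truncated);
}
```

A 10-byte datagram read into a 4-byte buffer should report truncation; a 4-byte datagram read into a 16-byte buffer should not.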

Revision 1.340.2.5.2.1
Mon Jun 14 02:09:06 2010 UTC by kensmith
Branches: RELENG_8_1
CVS tags: RELENG_8_1_0_RELEASE
Diff to: previous 1.340.2.5; next MAIN 1.340.2.6
Changes since revision 1.340.2.5: +0 -0 lines
SVN rev 209145 on 2010-06-14 02:09:06Z by kensmith

Copy stable/8 to releng/8.1 in preparation for 8.1-RC1.

Approved by:	re (implicit)

Revision 1.340.2.5
Tue Jun 1 13:59:48 2010 UTC by rwatson
Branches: RELENG_8
CVS tags: RELENG_8_1_BP
Branch point for: RELENG_8_1
Diff to: previous 1.340.2.4; branchpoint 1.340
Changes since revision 1.340.2.4: +4 -1 lines
SVN rev 208692 on 2010-06-01 13:59:48Z by rwatson

Merge r208601 from head to stable/8:

  When close() is called on a connected socket pair, SS_ISCONNECTED might be
  set but be cleared before the call to sodisconnect().  In this case,
  ENOTCONN is returned: suppress this error rather than returning it to
  userspace so that close() doesn't report an error improperly.

  PR:		kern/144061
  Reported by:	Matt Reimer <mreimer at vpop.net>,
		Nikolay Denev <ndenev at gmail.com>,
		Mikolaj Golub <to.my.trociny at gmail.com>

Approved by:	re (kib)

Revision 1.348
Thu May 27 15:27:31 2010 UTC by rwatson
Branches: MAIN
Diff to: previous 1.347
Changes since revision 1.347: +4 -1 lines
SVN rev 208601 on 2010-05-27 15:27:31Z by rwatson

When close() is called on a connected socket pair, SS_ISCONNECTED might be
set but be cleared before the call to sodisconnect().  In this case,
ENOTCONN is returned: suppress this error rather than returning it to
userspace so that close() doesn't report an error improperly.

PR:		kern/144061
Reported by:	Matt Reimer <mreimer at vpop.net>,
		Nikolay Denev <ndenev at gmail.com>,
		Mikolaj Golub <to.my.trociny at gmail.com>
MFC after:	3 days

Revision 1.340.2.4
Wed Apr 7 02:24:41 2010 UTC by nwhitehorn
Branches: RELENG_8
Diff to: previous 1.340.2.3; branchpoint 1.340
Changes since revision 1.340.2.3: +3 -3 lines
SVN rev 206336 on 2010-04-07 02:24:41Z by nwhitehorn

MFC r205014,205015:

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

This MFC is required for MFCs of later changes to the freebsd32
compatibility from HEAD.

Requested by:	kib

Revision 1.340.2.3
Sat Mar 27 17:42:04 2010 UTC by bz
Branches: RELENG_8
Diff to: previous 1.340.2.2; branchpoint 1.340
Changes since revision 1.340.2.2: +3 -2 lines
SVN rev 205758 on 2010-03-27 17:42:04Z by bz

MFC r204147:

  Set curvnet earlier so that it also covers calls to sodisconnect(), which
  previously could panic the system in ULP code in the VIMAGE case.

  Submitted by: Igor (igor ispsystem.com)

Revision 1.347
Thu Mar 11 14:49:06 2010 UTC by nwhitehorn
Branches: MAIN
Diff to: previous 1.346
Changes since revision 1.346: +3 -3 lines
SVN rev 205014 on 2010-03-11 14:49:06Z by nwhitehorn

Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by:	kib, jhb

Revision 1.346
Sat Feb 20 22:29:28 2010 UTC by bz
Branches: MAIN
Diff to: previous 1.345
Changes since revision 1.345: +3 -2 lines
SVN rev 204147 on 2010-02-20 22:29:28Z by bz

Set curvnet earlier so that it also covers calls to sodisconnect(), which
previously could panic the system in ULP code in the VIMAGE case.

Submitted by:	Igor (igor ispsystem.com)
MFC after:	5 days

Revision 1.302.2.19.2.1
Wed Feb 10 00:26:20 2010 UTC by kensmith
Branches: RELENG_7_3
CVS tags: RELENG_7_3_0_RELEASE
Diff to: previous 1.302.2.19; next MAIN 1.302.2.20
Changes since revision 1.302.2.19: +0 -0 lines
SVN rev 203736 on 2010-02-10 00:26:20Z by kensmith

Copy stable/7 to releng/7.3 as part of the 7.3-RELEASE process.

Approved by:	re (implicit)

Revision 1.302.2.19
Fri Jan 22 17:02:07 2010 UTC by jhb
Branches: RELENG_7
CVS tags: RELENG_7_4_BP, RELENG_7_3_BP
Branch point for: RELENG_7_4, RELENG_7_3
Diff to: previous 1.302.2.18; branchpoint 1.302
Changes since revision 1.302.2.18: +4 -8 lines
SVN rev 202814 on 2010-01-22 17:02:07Z by jhb

MFC 193951:
Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Unlike the change in HEAD, this does not remove kl_locked() and replace it
with kl_assert_locked() and kl_assert_unlocked().  Instead, the kl_locked
method can now be set to NULL, in which case no assertion checks are performed on
the lock.  The vfs kqfilter code uses this mode to disable assertion checks.
This preserves the existing ABI for knlist_init().

Add the convenience function knlist_init_mtx() to reduce the number of arguments
for typical knlist initialization.

Reviewed by:	kib

Revision 1.340.2.2
Mon Dec 14 10:48:19 2009 UTC by rwatson
Branches: RELENG_8
Diff to: previous 1.340.2.1; branchpoint 1.340
Changes since revision 1.340.2.1: +0 -3 lines
SVN rev 200504 on 2009-12-14 10:48:19Z by rwatson

Merge r197720 from head to stable/8:

  Don't comment on stream socket handling in sosend_dgram, since that's
  not handled.

Revision 1.340.2.1.2.1
Sun Oct 25 01:10:29 2009 UTC by kensmith
Branches: RELENG_8_0
CVS tags: RELENG_8_0_0_RELEASE
Diff to: previous 1.340.2.1; next MAIN 1.340.2.2
Changes since revision 1.340.2.1: +0 -0 lines
SVN rev 198460 on 2009-10-25 01:10:29Z by kensmith

Copy stable/8 to releng/8.0 as part of 8.0-RELEASE release procedure.

Approved by:	re (implicit)

Revision 1.345
Fri Oct 2 21:31:15 2009 UTC by rwatson
Branches: MAIN
Diff to: previous 1.344
Changes since revision 1.344: +0 -3 lines
SVN rev 197720 on 2009-10-02 21:31:15Z by rwatson

Don't comment on stream socket handling in sosend_dgram, since that's
not handled.

MFC after:	3 weeks

Revision 1.344
Tue Sep 15 22:23:45 2009 UTC by andre
Branches: MAIN
Diff to: previous 1.343
Changes since revision 1.343: +2 -0 lines
SVN rev 197236 on 2009-09-15 22:23:45Z by andre

Put the optimized soreceive_stream() under a compile-time option called
TCP_SORECEIVE_STREAM for the time being.

Requested by:	brooks

Once compiled in, make it easily switchable for testers via the loader
tunable net.inet.tcp.soreceive_stream and a corresponding read-only
sysctl that reports the current state.

Suggested by:	rwatson

MFC after:	2 days

M    sys/conf/options
M    sys/kern/uipc_socket.c
M    sys/netinet/tcp_subr.c
M    sys/netinet/tcp_usrreq.c

Revision 1.343
Sat Sep 12 20:03:45 2009 UTC by rwatson
Branches: MAIN
Diff to: previous 1.342
Changes since revision 1.342: +15 -6 lines
SVN rev 197134 on 2009-09-12 20:03:45Z by rwatson

Use C99 initialization for struct filterops.

Obtained from:	Mac OS X
Sponsored by:	Apple Inc.
MFC after:	3 weeks
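C99 designated initializers name each member explicitly, and any member left out is zero-initialized (a NULL function pointer here). The sketch uses a hypothetical stand-in, `struct demo_filterops`, whose member names loosely follow the kernel's struct filterops but whose argument type is simplified to void *, since struct knote is kernel-only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in loosely modeled on struct filterops. */
struct demo_filterops {
	int	f_isfd;			/* filter is keyed by a descriptor */
	int	(*f_attach)(void *kn);
	void	(*f_detach)(void *kn);
	int	(*f_event)(void *kn, long hint);
};

static void
demo_detach(void *kn)
{
	(void)kn;
}

static int
demo_event(void *kn, long hint)
{
	(void)kn;
	return (hint > 0);	/* hypothetical: "ready" when hint is positive */
}

/* C99 style: members are named; f_attach is omitted and so becomes NULL. */
static struct demo_filterops demo_read_filtops = {
	.f_isfd = 1,
	.f_detach = demo_detach,
	.f_event = demo_event,
};
```

Compared with positional initialization, this keeps each table correct even if members are added to or reordered in the structure.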

Revision 1.342
Tue Aug 25 21:44:14 2009 UTC by jilles
Branches: MAIN
Diff to: previous 1.341
Changes since revision 1.341: +7 -5 lines
SVN rev 196556 on 2009-08-25 21:44:14Z by jilles

Fix poll() on half-closed sockets, while retaining POLLHUP for fifos.

This reverts part of r196460, so that sockets only return POLLHUP if both
directions are closed/error. Fifos get POLLHUP by closing the unused
direction immediately after creating the sockets.

The tools/regression/poll/*poll.c tests now pass except for two other things:
- if POLLHUP is returned, POLLIN is always returned as well instead of only
  when there is data left in the buffer to be read
- fifo old/new reader distinction does not work the way POSIX specs it

Reviewed by:	kib, bde
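The intended semantics can be sketched from userland with an AF_UNIX stream socket pair standing in for a connected socket (an assumption made for a self-contained test): after the peer shuts down only its sending side, the socket should poll readable (POLLIN, for the pending EOF) without POLLHUP; once the peer closes completely, POLLHUP is reported. `poll_after_peer()` is a hypothetical helper.

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the poll() revents seen on sv[0] after the
 * peer either shuts down its sending side (full_close == 0) or closes
 * entirely (full_close != 0).  Returns -1 on error.
 */
static int
poll_after_peer(int full_close)
{
	int sv[2], rv;
	struct pollfd pfd;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);
	if (full_close) {
		close(sv[1]);
		sv[1] = -1;
	} else
		shutdown(sv[1], SHUT_WR);

	pfd.fd = sv[0];
	pfd.events = POLLIN;
	pfd.revents = 0;
	rv = poll(&pfd, 1, 0);		/* do not block */
	close(sv[0]);
	if (sv[1] != -1)
		close(sv[1]);
	return (rv == -1 ? -1 : pfd.revents);
}
```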

Revision 1.341
Sun Aug 23 12:44:15 2009 UTC by kib
Branches: MAIN
Diff to: previous 1.340
Changes since revision 1.340: +5 -7 lines
SVN rev 196460 on 2009-08-23 12:44:15Z by kib

Fix the conformance of poll(2) for sockets after r195423 by
returning POLLHUP instead of POLLIN for several cases. Now, the
tools/regression/poll results for FreeBSD are closer to those of
Solaris and Linux.

Also, improve the POSIX conformance by explicitly clearing POLLOUT
when POLLHUP is reported in pollscan(), making the fix global.

Submitted by:	bde
Reviewed by:	rwatson
MFC after:	1 week

Revision 1.340.2.1
Mon Aug 3 08:13:06 2009 UTC by kensmith
Branches: RELENG_8
CVS tags: RELENG_8_0_BP
Branch point for: RELENG_8_0
Diff to: previous 1.340
Changes since revision 1.340: +0 -0 lines
SVN rev 196045 on 2009-08-03 08:13:06Z by kensmith

Copy head to stable/8 as part of 8.0 Release cycle.

Approved by:	re (Implicit)

Revision 1.340
Sat Aug 1 19:26:27 2009 UTC by rwatson
Branches: MAIN
CVS tags: RELENG_8_BP
Branch point for: RELENG_8
Diff to: previous 1.339
Changes since revision 1.339: +2 -1 lines
SVN rev 196019 on 2009-08-01 19:26:27Z by rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h; we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)

Revision 1.302.2.18
Sat Aug 1 07:09:50 2009 UTC by julian
Branches: RELENG_7
Diff to: previous 1.302.2.17; branchpoint 1.302
Changes since revision 1.302.2.17: +1 -0 lines
SVN rev 196010 on 2009-08-01 07:09:50Z by julian

MFC #195922
Fix accept on sockets using multiple routing tables

Revision 1.339
Tue Jul 28 19:43:27 2009 UTC by julian
Branches: MAIN
Diff to: previous 1.338
Changes since revision 1.338: +1 -0 lines
SVN rev 195922 on 2009-07-28 19:43:27Z by julian

Somewhere along the line accept sockets stopped honoring the
FIB selected for them. Fix this.

Reviewed by:	ambrisko
Approved by:	re (kib)
MFC after:	3 days

Revision 1.338
Sun Jul 19 17:40:45 2009 UTC by rwatson
Branches: MAIN
Diff to: previous 1.337
Changes since revision 1.337: +2 -2 lines
SVN rev 195769 on 2009-07-19 17:40:45Z by rwatson

Normalize field naming for struct vnet, and fix two debugging printfs
that print these fields.

Reviewed by:	bz
Approved by:	re (kensmith, kib)

Revision 1.337
Tue Jul 7 09:43:44 2009 UTC by kib
Branches: MAIN
Diff to: previous 1.336
Changes since revision 1.336: +10 -9 lines
SVN rev 195423 on 2009-07-07 09:43:44Z by kib

Fix poll(2) and select(2) for named pipes to return "ready for read"
when all writers observed by the reader have exited. Use a writer
generation counter for fifos, and store a snapshot of the fifo
generation in the f_seqcount field of struct file, which is otherwise
unused for fifos. Set the FreeBSD-undocumented POLLINIGNEOF flag only
when the file's f_seqcount is equal to the fifo's fi_wgen, and revert
r89376.

Fix POLLINIGNEOF for sockets and pipes, and return POLLHUP for them.
Note that the patch does not fix the failure to return POLLHUP for fifos.

PR:	kern/94772
Submitted by:	bde (original version)
Reviewed by:	rwatson, jilles
Approved by:	re (kensmith)
MFC after:	6 weeks (might be)

Revision 1.336
Mon Jun 22 23:08:05 2009 UTC by andre
Branches: MAIN
Diff to: previous 1.335
Changes since revision 1.335: +196 -0 lines
SVN rev 194672 on 2009-06-22 23:08:05Z by andre

Add soreceive_stream(), an optimized version of soreceive() for
stream (TCP) sockets.

It is functionally identical to generic soreceive() but has a
number of stream-specific optimizations:
o does only one sockbuf unlock/lock per receive independent of
  the length of data to be moved into the uio compared to
  soreceive() which unlocks/locks per *mbuf*.
o uses m_mbuftouio() instead of its own copy(out) variant.
o much more compact code flow as a large number of special
  cases is removed.
o much improved readability.

It offers significantly reduced CPU usage and lock contention
when receiving fast TCP streams.  Additional gains are obtained
when the receiving application is using SO_RCVLOWAT to batch up
some data before a read (and wakeup) is done.

This function was written by "reverse engineering" and is not
just a stripped down variant of soreceive().

It is not yet enabled by default on TCP sockets.  Instead it is
commented out in the protocol initialization in tcp_usrreq.c
until more widespread testing has been done.

Testers, especially with 10GigE gear, are welcome.

MFP4:	r164817 //depot/user/andre/soreceive_stream/

Revision 1.335
Mon Jun 15 19:01:53 2009 UTC (2 years, 7 months ago) by jamie
Branches: MAIN
Diff to: previous 1.334
Changes since revision 1.334: +1 -1 lines
SVN rev 194252 on 2009-06-15 19:01:53Z by jamie

Get vnets from creds instead of threads where they're available, and from
passed threads instead of curthread.

Reviewed by:	zec, julian
Approved by:	bz (mentor)

Revision 1.334
Wed Jun 10 20:59:32 2009 UTC (2 years, 8 months ago) by kib
Branches: MAIN
Diff to: previous 1.333
Changes since revision 1.333: +4 -8 lines
SVN rev 193951 on 2009-06-10 20:59:32Z by kib

Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by:	jhb [1]
Noted by:	jhb [2]
Reviewed by:	jhb
Tested by:	rnoland

Revision 1.333
Fri Jun 5 14:55:22 2009 UTC (2 years, 8 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.332
Changes since revision 1.332: +0 -1 lines
SVN rev 193511 on 2009-06-05 14:55:22Z by rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd

Revision 1.332
Tue Jun 2 18:26:17 2009 UTC (2 years, 8 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.331
Changes since revision 1.331: +0 -2 lines
SVN rev 193332 on 2009-06-02 18:26:17Z by rwatson

Add internal 'mac_policy_count' counter to the MAC Framework, which is a
count of the number of registered policies.

Rather than unconditionally locking sockets before passing them into MAC,
lock them in the MAC entry points only if mac_policy_count is non-zero.

This avoids locking overhead for a number of socket system calls when no
policies are registered, eliminating measurable overhead for the MAC
Framework for the socket subsystem when there are no active policies.
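A minimal model of the fast path described above, assuming a pthread mutex stands in for the socket lock and a trivial label test stands in for a real policy; every model_* name is hypothetical and this is not the MAC Framework's actual code. The entry point skips the lock entirely while the policy count is zero:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Simplified stand-in for struct socket: a lock and a MAC label. */
struct model_socket {
	pthread_mutex_t	lock;
	int		label;		/* 0 = unrestricted, nonzero = deny */
};

/* Number of registered policies (stand-in for mac_policy_count). */
static int model_mac_policy_count;
/* Counts lock acquisitions so the lock-free fast path is observable. */
static int model_lock_acquisitions;

/* Hypothetical policy decision; expects the socket lock to be held. */
static int
model_policy_check_send(const struct model_socket *so)
{
	return (so->label == 0 ? 0 : EACCES);
}

/*
 * MAC entry point: take the socket lock only when at least one policy
 * is registered; with no policies there is nothing to consult.
 */
static int
model_mac_socket_check_send(struct model_socket *so)
{
	int error;

	assert(model_mac_policy_count >= 0);
	if (model_mac_policy_count == 0)
		return (0);
	pthread_mutex_lock(&so->lock);
	model_lock_acquisitions++;
	error = model_policy_check_send(so);
	pthread_mutex_unlock(&so->lock);
	return (error);
}
```

With no policies the call returns 0 without touching the lock; once a policy is registered the same call locks the socket and returns the policy's verdict.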

Possibly socket locks should be acquired by policies if they are required
for socket labels, which would further avoid locking overhead when there
are policies but they don't require labeling of sockets, or possibly
don't even implement socket controls.

Obtained from:	TrustedBSD Project

Revision 1.331
Mon Jun 1 21:17:03 2009 UTC (2 years, 8 months ago) by jhb
Branches: MAIN
Diff to: previous 1.330
Changes since revision 1.330: +63 -6 lines
SVN rev 193272 on 2009-06-01 21:17:03Z by jhb

Rework socket upcalls to close some races with setup/teardown of upcalls.
- Each socket upcall is now invoked with the appropriate socket buffer
  locked.  Since it is not permissible to call soisconnected() with this
  lock held, socket upcalls now return an integer value.  The two
  possible values are SU_OK and SU_ISCONNECTED.  If an upcall returns
  SU_ISCONNECTED, then soisconnected() will be invoked on the
  socket after the socket buffer lock is dropped.
- A new API is provided for setting and clearing socket upcalls.  The
  API consists of soupcall_set() and soupcall_clear().
- To simplify locking, each socket buffer now has a separate upcall.
- When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from
  the receive socket buffer automatically.  Note that a SO_SND upcall
  should never return SU_ISCONNECTED.
- All this means that accept filters should now return SU_ISCONNECTED
  instead of calling soisconnected() directly.  They also no longer need
  to explicitly clear the upcall on the new socket.
- The HTTP accept filter still uses soupcall_set() to manage its internal
  state machine, but other accept filters no longer have any explicit
  knowledge of socket upcall internals aside from their return value.
- The various RPC client upcalls currently drop the socket buffer lock
  while invoking soreceive() as a temporary band-aid.  The plan for
  the future is to add a new flag to allow soreceive() to be called with
  the socket buffer locked.
- The AIO callback for socket I/O is now also invoked with the socket
  buffer locked.  Previously sowakeup() would drop the socket buffer
  lock only to call aio_swake() which immediately re-acquired the socket
  buffer lock for the duration of the function call.
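The locking protocol in the list above can be sketched in userland. This is a simplified model, not the kernel implementation: a pthread mutex plays the role of the socket buffer lock, all model_* names and MODEL_SU_* constants are hypothetical stand-ins, and the upcall takes only its argument (the real upcall also receives the socket and a wait flag). It shows why the result is returned rather than acted on: the stand-in for soisconnected() takes the same non-recursive lock, so it can only run after the wakeup path drops it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SU_OK and SU_ISCONNECTED values. */
#define	MODEL_SU_OK		0
#define	MODEL_SU_ISCONNECTED	1

/* One socket buffer: its lock and its own upcall slot. */
struct model_sockbuf {
	pthread_mutex_t	lock;
	int		(*upcall)(void *arg);	/* called with lock held */
	void		*upcall_arg;
};

struct model_socket {
	struct model_sockbuf	rcv;
	bool			connected;
};

/* Install an upcall under the buffer lock (cf. soupcall_set()). */
static void
model_soupcall_set(struct model_sockbuf *sb, int (*fn)(void *), void *arg)
{
	assert(fn != NULL);
	pthread_mutex_lock(&sb->lock);
	sb->upcall = fn;
	sb->upcall_arg = arg;
	pthread_mutex_unlock(&sb->lock);
}

/*
 * Stand-in for soisconnected().  It takes the buffer lock itself, so it
 * must not be called while that (non-recursive) lock is held.
 */
static void
model_soisconnected(struct model_socket *so)
{
	pthread_mutex_lock(&so->rcv.lock);
	so->connected = true;
	pthread_mutex_unlock(&so->rcv.lock);
}

/* Wakeup path: run the upcall locked, act on its result after unlocking. */
static void
model_sorwakeup(struct model_socket *so)
{
	int rv = MODEL_SU_OK;

	pthread_mutex_lock(&so->rcv.lock);
	if (so->rcv.upcall != NULL) {
		rv = so->rcv.upcall(so->rcv.upcall_arg);
		if (rv == MODEL_SU_ISCONNECTED) {
			/* The receive upcall is cleared automatically. */
			so->rcv.upcall = NULL;
			so->rcv.upcall_arg = NULL;
		}
	}
	pthread_mutex_unlock(&so->rcv.lock);
	if (rv == MODEL_SU_ISCONNECTED)
		model_soisconnected(so);
}

/* Accept-filter-like upcall: report "connected" once 4+ bytes arrived. */
static int
model_filter(void *arg)
{
	const int *queued = arg;

	return (*queued >= 4 ? MODEL_SU_ISCONNECTED : MODEL_SU_OK);
}
```

Calling model_soisconnected() from inside model_filter() instead would try to re-acquire a default (non-recursive) mutex the thread already holds, which is the situation the SU_ISCONNECTED return value avoids.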

Discussed with:	rwatson, rmacklem

Revision 1.330
Fri May 8 14:34:25 2009 UTC (2 years, 9 months ago) by zec
Branches: MAIN
Diff to: previous 1.329
Changes since revision 1.329: +2 -2 lines
SVN rev 191917 on 2009-05-08 14:34:25Z by zec

A NOP change: style / whitespace cleanup of the noise that slipped
into r191816.

Spotted by:	bz
Approved by:	julian (mentor) (an earlier version of the diff)

Revision 1.329
Tue May 5 10:56:12 2009 UTC (2 years, 9 months ago) by zec
Branches: MAIN
Diff to: previous 1.328
Changes since revision 1.328: +35 -9 lines
SVN rev 191816 on 2009-05-05 10:56:12Z by zec

Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The curvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.
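A userland sketch of the save/restore discipline, under stated assumptions: a C11 _Thread_local pointer stands in for curthread->td_vnet, and the MODEL_CURVNET_* macros are hypothetical analogues rather than the real CURVNET_SET()/CURVNET_RESTORE() definitions. Because SET saves the previous context in a local that RESTORE reinstates, a nested SET in a callee unwinds cleanly:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct vnet. */
struct vnet {
	int	id;
};

/* Per-thread current context, modelling curthread->td_vnet (curvnet). */
static _Thread_local struct vnet *model_curvnet;

/*
 * Hypothetical analogues of CURVNET_SET()/CURVNET_RESTORE(): SET saves
 * the previous context in a block-scope local and installs the new one;
 * RESTORE reinstates the saved value, so nested sets unwind in order.
 */
#define	MODEL_CURVNET_SET(arg)					\
	struct vnet *saved_vnet = model_curvnet;		\
	model_curvnet = (arg)
#define	MODEL_CURVNET_RESTORE()					\
	model_curvnet = saved_vnet

/* A "networking operation" reporting which context it ran in. */
static struct vnet *
model_socket_op(struct vnet *so_vnet)
{
	struct vnet *seen;

	assert(so_vnet != NULL);
	MODEL_CURVNET_SET(so_vnet);
	seen = model_curvnet;
	MODEL_CURVNET_RESTORE();
	return (seen);
}

/* Outer operation in one vnet that calls into another (a recursion). */
static struct vnet *
model_nested_op(struct vnet *outer, struct vnet *inner)
{
	struct vnet *seen_inner;

	MODEL_CURVNET_SET(outer);
	seen_inner = model_socket_op(inner);
	assert(model_curvnet == outer);	/* inner RESTORE undid its SET */
	MODEL_CURVNET_RESTORE();
	return (seen_inner);
}
```

Since the SET macro declares a local, this model allows only one SET per block scope; each function that needs a context sets and restores it within its own body.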

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet, etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, with an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typical cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)

Revision 1.328
Thu Apr 30 13:36:26 2009 UTC (2 years, 9 months ago) by zec
Branches: MAIN
Diff to: previous 1.327
Changes since revision 1.327: +4 -0 lines
SVN rev 191688 on 2009-04-30 13:36:26Z by zec

Permit building kernels with options VIMAGE, restricted to only a single
active network stack instance.  Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables.  As an example, V_ifnet becomes:

    options VIMAGE:          ((struct vnet_net *) vnet_net)->_ifnet
    default build:           vnet_net_0._ifnet
    options VIMAGE_GLOBALS:  ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

    INIT_VNET_NET(ifp->if_vnet); becomes

    struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals.  If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet.  options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with two additional fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod.  SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by:	bz, rwatson
Approved by:	julian (mentor)

Revision 1.302.2.17.2.1
Wed Apr 15 03:14:26 2009 UTC (2 years, 9 months ago) by kensmith
Branches: RELENG_7_2
CVS tags: RELENG_7_2_0_RELEASE
Diff to: previous 1.302.2.17; next MAIN 1.302.2.18
Changes since revision 1.302.2.17: +0 -0 lines
SVN rev 191087 on 2009-04-15 03:14:26Z by kensmith

Create releng/7.2 from stable/7 in preparation for 7.2-RELEASE.

Approved by:	re (implicit)

Revision 1.302.2.17
Wed Mar 18 14:43:56 2009 UTC (2 years, 10 months ago) by bz
Branches: RELENG_7
CVS tags: RELENG_7_2_BP
Branch point for: RELENG_7_2
Diff to: previous 1.302.2.16; branchpoint 1.302
Changes since revision 1.302.2.16: +3 -1 lines
SVN rev 189965 on 2009-03-18 14:43:56Z by bz

MFC r185892:
  Style changes only. Put the return type on an extra line[1] and
  add an empty line at the beginning as we do not have any local
  variables.

Submitted by: rwatson [1]

Revision 1.302.2.16
Wed Mar 18 13:47:44 2009 UTC (2 years, 10 months ago) by bz
Branches: RELENG_7
Diff to: previous 1.302.2.15; branchpoint 1.302
Changes since revision 1.302.2.15: +2 -1 lines
SVN rev 189960 on 2009-03-18 13:47:44Z by bz

MFC r185893:
  Make sure nmbclusters are initialized before maxsockets
  by running the tunable_mbinit() SYSINIT at SI_ORDER_MIDDLE
  before the init_maxsockets() SYSINIT at SI_ORDER_ANY.

Revision 1.302.2.15
Wed Feb 25 14:26:16 2009 UTC (2 years, 11 months ago) by rwatson
Branches: RELENG_7
Diff to: previous 1.302.2.14; branchpoint 1.302
Changes since revision 1.302.2.14: +1 -2 lines
SVN rev 189041 on 2009-02-25 14:26:16Z by rwatson

Merge r188123 from head to stable/7:

  Remove written-to but never read local variable 'offset' from
  soreceive_dgram().

  Submitted by: Christoph Mallon <christoph dot mallon at gmx dot de>

Revision 1.302.2.14
Wed Feb 18 20:12:08 2009 UTC (2 years, 11 months ago) by jamie
Branches: RELENG_7
Diff to: previous 1.302.2.13; branchpoint 1.302
Changes since revision 1.302.2.13: +1 -8 lines
SVN rev 188761 on 2009-02-18 20:12:08Z by jamie

MFC:

 r188144:
   Standardize the various prison_foo_ip[46] functions and prison_if to
   return zero on success and an error code otherwise.  The possible errors
   are EADDRNOTAVAIL if an address being checked for doesn't match the
   prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
   that address family.  For most callers of these functions, use the
   returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
   EINVAL.

   Always include a jailed() check in these functions, where a non-jailed
   cred always returns success (and makes no changes).  Remove the explicit
   jailed() checks that preceded many of the function calls.

 r188146:
   Don't allow creating a socket with a protocol family that the current
   jail doesn't support.  This involves a new function prison_check_af,
   like prison_check_ip[46] but that checks only the family.

   With this change, most of the errors generated by jailed sockets
   shouldn't ever occur, at least until jails are changeable.

 r188148:
   Remove redundant calls of prison_local_ip4 in in_pcbbind_setup, and of
   prison_local_ip6 in in6_pcbbind.

 r188149:
   Call prison_if from rtm_get_jailed, instead of splitting it out into
   prison_check_ip4 and prison_check_ip6.  As prison_if includes a jailed()
   check, remove that check before calling rtm_get_jailed.

 r188151:
   Don't bother null-checking the thread pointer before the prison checks
   in udp6_connect (td is already dereferenced elsewhere without such a
   check).  This makes the conversion from a sockaddr to a sockaddr_in6
   always happen, so convert once at the beginning of the function rather
   than twice in the middle.

Approved by:	bz (mentor)

Revision 1.302.2.13
Sat Feb 7 13:19:08 2009 UTC (3 years ago) by bz
Branches: RELENG_7
Diff to: previous 1.302.2.12; branchpoint 1.302
Changes since revision 1.302.2.12: +4 -0 lines
SVN rev 188281 on 2009-02-07 13:19:08Z by bz

MFC:
 r185435:
  This enhances the current jail implementation to permit multiple
  addresses per jail. In addition to IPv4, IPv6 is supported as well.
  Due to updated checks it is even possible to have jails without
  an IP address at all, which basically gives one a chroot with a
  restricted process view and no networking.

  SCTP support was updated and supports IPv6 in jails as well.

  Cpuset support permits jails to be bound to specific processor
  sets after creation.

  Jails can have an unrestricted (no duplicate protection, etc.) name
  in addition to the hostname. The jail name cannot be changed from
  within a jail and is considered to be used for management purposes
  or as audit-token in the future.

  DDB 'show jails' command was added to aid debugging.

  Proper compat support permits 32bit jail binaries to be used on 64bit
  systems to manage jails. Also backward compatibility was preserved where
  possible: for jail v1 syscalls, as well as with user space management
  utilities.

  Both the jail and prison versions were updated for the new features.
  A gap was intentionally left, as the intermediate versions had been
  used by various patches floating around over the last years.

  Bump __FreeBSD_version for the aforementioned and in-kernel changes.

 r185441:
  Unbreak the no-networks (no INET/6) build.

 r185899:
  Correctly check the number of prison states to not access anything
  outside the prison_states array.
  When checking if there is a name configured for the prison, check the
  first character to not be '\0' instead of checking if the char array
  is present, which it always is. Note that this is different for the
  *jailname in the syscall.

  Found with:	Coverity Prevent(tm)
  CID:		4156, 4155

 r186085:
  Make sure that direct jls invocations print something reasonably
  close to, and in the same format as, what they have always printed.

 r186606:
  Make sure that unused j->ip[46] are cleared.

 r186834:
  Document the special loopback address behaviour of jails.

  PR:		kern/103464

 r186841:
  Put the devfs ruleset next to devfs enable, add a comment about
  the suggested ruleset[1].

  While here use an IP from the 'test-net' prefix for docs.

  PR:		kern/130102

 r187059:
  Add a short section talking about jails and file systems; mention the
  mount and jail-aware file systems as well as quota.

  PR:		kern/68192

 r187092:
  Sort .Xr.

 r187365:
  s,unmount 8,umount 8, it is unmount(2) which I did not mean.

 r187669:
  Update the description of the '-h' option with respect to primary addresses
  per address family and add a reference to the ip-addresses option.

 r187670:
  New sentence starts on a new line.

Revision 1.327
Thu Feb 5 14:15:18 2009 UTC (3 years ago) by jamie
Branches: MAIN
Diff to: previous 1.326
Changes since revision 1.326: +1 -8 lines
SVN rev 188146 on 2009-02-05 14:15:18Z by jamie

Don't allow creating a socket with a protocol family that the current
jail doesn't support.  This involves a new function prison_check_af,
like prison_check_ip[46] but that checks only the family.

With this change, most of the errors generated by jailed sockets
shouldn't ever occur, at least until jails are changeable.

Approved by:	bz (mentor)

Revision 1.326
Wed Feb 4 20:00:17 2009 UTC (3 years ago) by rwatson
Branches: MAIN
Diff to: previous 1.325
Changes since revision 1.325: +1 -2 lines
SVN rev 188123 on 2009-02-04 20:00:17Z by rwatson

Remove written-to but never read local variable 'offset' from
soreceive_dgram().

Submitted by:	Christoph Mallon <christoph dot mallon at gmx dot de>
MFC after:	1 week

Revision 1.325
Wed Dec 10 22:17:09 2008 UTC (3 years, 2 months ago) by bz
Branches: MAIN
Diff to: previous 1.324
Changes since revision 1.324: +2 -1 lines
SVN rev 185893 on 2008-12-10 22:17:09Z by bz

Make sure nmbclusters are initialized before maxsockets
by running the tunable_mbinit() SYSINIT at SI_ORDER_MIDDLE
before the init_maxsockets() SYSINIT at SI_ORDER_ANY.

Reviewed by:		rwatson, zec
Sponsored by:		The FreeBSD Foundation
MFC after:		4 weeks

Revision 1.324
Wed Dec 10 22:10:37 2008 UTC (3 years, 2 months ago) by bz
Branches: MAIN
Diff to: previous 1.323
Changes since revision 1.323: +3 -1 lines
SVN rev 185892 on 2008-12-10 22:10:37Z by bz

Style changes only. Put the return type on an extra line[1] and
add an empty line at the beginning as we do not have any local
variables.

Submitted by:	rwatson [1]
Reviewed by:	rwatson
MFC after:	4 weeks

Revision 1.323
Sat Nov 29 14:32:14 2008 UTC (3 years, 2 months ago) by bz
Branches: MAIN
Diff to: previous 1.322
Changes since revision 1.322: +4 -0 lines
SVN rev 185435 on 2008-11-29 14:32:14Z by bz

MFp4:
  Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addition to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with a
restricted process view and no networking.

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both the jail and prison versions were updated for the new features.
A gap was intentionally left, as the intermediate versions had been
used by various patches floating around over the last years.

Bump __FreeBSD_version for the aforementioned and in-kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
  and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
  help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
  suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as an early adopter testing changes
  on cluster machines, as well as all the testers and people
  who provided feedback over the last months on freebsd-jail and
  other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by:	(see above)
MFC after:	3 months (this is just so that I get the mail)
X-MFC Before:   7.2-RELEASE if possible

Revision 1.302.2.11.2.2
Tue Nov 25 20:02:47 2008 UTC (3 years, 2 months ago) by julian
Branches: RELENG_7_1
CVS tags: RELENG_7_1_0_RELEASE
Diff to: previous 1.302.2.11.2.1; branchpoint 1.302.2.11; next MAIN 1.302.2.12
Changes since revision 1.302.2.11.2.1: +3 -0 lines
SVN rev 185317 on 2008-11-25 20:02:47Z by julian

MFC @ 185101
Fix a scope problem in the multiple routing table code that
stopped the SO_SETFIB socket option from working correctly.

Approved by:     re (kensmith, kostik)
Obtained from:  Ironport

Revision 1.302.2.12
Tue Nov 25 19:26:36 2008 UTC (3 years, 2 months ago) by julian
Branches: RELENG_7
Diff to: previous 1.302.2.11; branchpoint 1.302
Changes since revision 1.302.2.11: +3 -0 lines
SVN rev 185311 on 2008-11-25 19:26:36Z by julian

MFC @ 185101
Fix a scope problem in the multiple routing table code that
stopped the SO_SETFIB socket option from working correctly.

Approved by:	 re (kensmith, kostik)
Obtained from:	Ironport

Revision 1.302.2.11.2.1
Tue Nov 25 02:59:29 2008 UTC (3 years, 2 months ago) by kensmith
Branches: RELENG_7_1
Diff to: previous 1.302.2.11
Changes since revision 1.302.2.11: +0 -0 lines
SVN rev 185281 on 2008-11-25 02:59:29Z by kensmith

Create releng/7.1 in preparation for moving into RC phase of 7.1 release
cycle.

Approved by:	re (implicit)

Revision 1.322
Sat Nov 22 12:36:15 2008 UTC (3 years, 2 months ago) by kib
Branches: MAIN
Diff to: previous 1.321
Changes since revision 1.321: +3 -4 lines
SVN rev 185169 on 2008-11-22 12:36:15Z by kib

Add sv_flags field to struct sysentvec with intention to provide description
of the ABI of the currently executing image. Change some places to test
the flags instead of explicit comparing with address of known sysentvec
structures to determine ABI features.

Discussed with:	dchagin, imp, jhb, peter

Revision 1.321
Wed Nov 19 19:19:30 2008 UTC (3 years, 2 months ago) by julian
Branches: MAIN
Diff to: previous 1.320
Changes since revision 1.320: +3 -0 lines
SVN rev 185101 on 2008-11-19 19:19:30Z by julian

Fix a scope problem in the multiple routing table code that stopped the
SO_SETFIB socket option from working correctly.

Obtained from:	Ironport
MFC after:	3 days

Revision 1.320
Fri Oct 17 01:25:45 2008 UTC (3 years, 3 months ago) by kmacy
Branches: MAIN
Diff to: previous 1.319
Changes since revision 1.319: +2 -0 lines
SVN rev 183963 on 2008-10-17 01:25:45Z by kmacy

Make sure that SO_NO_DDP and SO_NO_OFFLOAD get passed in correctly.

PR:		127360
MFC after:	3 days

Revision 1.302.2.11
Tue Oct 14 22:48:38 2008 UTC (3 years, 3 months ago) by rwatson
Branches: RELENG_7
CVS tags: RELENG_7_1_BP
Branch point for: RELENG_7_1
Diff to: previous 1.302.2.10; branchpoint 1.302
Changes since revision 1.302.2.10: +2 -1 lines
SVN rev 183898 on 2008-10-14 22:48:38Z by rwatson

Merge r183675 from head to stable/7:

  In soreceive_dgram, when a 0-length buffer is passed into recv(2) and
  no data is ready, return 0 rather than blocking or returning EAGAIN.
  This is consistent with the behavior of soreceive_generic (soreceive)
  in earlier versions of FreeBSD, and restores this behavior for UDP.

  Discussed with: jhb, sam

Approved by:	re (kib)

Revision 1.302.2.10
Tue Oct 14 08:44:27 2008 UTC (3 years, 3 months ago) by rwatson
Branches: RELENG_7
Diff to: previous 1.302.2.9; branchpoint 1.302
Changes since revision 1.302.2.9: +0 -12 lines
SVN rev 183875 on 2008-10-14 08:44:27Z by rwatson

Merge r183664 from head to stable/7:

  Remove temporary debugging KASSERT's introduced to detect protocols
  improperly invoking sosend(), soreceive(), and sopoll() instead of
  attaching either specialized or _generic() versions of those functions
  to their pru_sosend, pru_soreceive, and pru_sopoll protosw methods.

Approved by:	re (kib)

Revision 1.302.2.9
Thu Oct 9 17:52:47 2008 UTC (3 years, 4 months ago) by jhb
Branches: RELENG_7
Diff to: previous 1.302.2.8; branchpoint 1.302
Changes since revision 1.302.2.8: +13 -23 lines
SVN rev 183726 on 2008-10-09 17:52:47Z by jhb

MFC: Wait until after dropping the receive socket buffer lock to allocate
space to store the socket address stored in the first mbuf in a packet
chain in soreceive_dgram().

Approved by:	re (kensmith)

Revision 1.319
Tue Oct 7 20:57:55 2008 UTC (3 years, 4 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.318
Changes since revision 1.318: +2 -1 lines
SVN rev 183675 on 2008-10-07 20:57:55Z by rwatson

In soreceive_dgram, when a 0-length buffer is passed into recv(2) and
no data is ready, return 0 rather than blocking or returning EAGAIN.
This is consistent with the behavior of soreceive_generic (soreceive)
in earlier versions of FreeBSD, and restores this behavior for UDP.

Discussed with:	jhb, sam
MFC after:	3 days

Revision 1.318
Tue Oct 7 09:57:03 2008 UTC (3 years, 4 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.317
Changes since revision 1.317: +0 -12 lines
SVN rev 183664 on 2008-10-07 09:57:03Z by rwatson

Remove temporary debugging KASSERT's introduced to detect protocols
improperly invoking sosend(), soreceive(), and sopoll() instead of
attaching either specialized or _generic() versions of those functions
to their pru_sosend, pru_soreceive, and pru_sopoll protosw methods.

MFC after:	3 days

Revision 1.302.2.8
Sun Oct 5 21:05:20 2008 UTC (3 years, 4 months ago) by rwatson
Branches: RELENG_7
Diff to: previous 1.302.2.7; branchpoint 1.302
Changes since revision 1.302.2.7: +19 -58 lines
SVN rev 183633 on 2008-10-05 21:05:20Z by rwatson

Merge r183512 from head to stable/7:

  Various cleanups for soreceive_dgram():

  - Update or remove comments that were left over from the original
    soreceive_generic() implementation.  Quite a few were misleading in the
    context of the new code.
  - Since soreceive_dgram() has a simpler structure, replace several gotos
    with a while loop making the invariants more clear.
  - In the blocking while loop, don't try to handle cases incompatible with
    the loop invariant (since m is always NULL, don't check for and handle
    non-NULL).
  - Don't drop and re-acquire the socket buffer lock unnecessarily after
    sbwait() returns, which may help reduce lock contention (etc).
  - Assume PR_ATOMIC since we assert it at the top of the function.

Approved by:	re (gnn)

Revision 1.302.2.7
Fri Oct 3 21:33:58 2008 UTC (3 years, 4 months ago) by jhb
Branches: RELENG_7
Diff to: previous 1.302.2.6; branchpoint 1.302
Changes since revision 1.302.2.6: +4 -4 lines
SVN rev 183575 on 2008-10-03 21:33:58Z by jhb

MFC: Update the function name in several assertions in soreceive_dgram().

Approved by:	re (kib)

Revision 1.242.2.10.4.1
Thu Oct 2 02:57:24 2008 UTC (3 years, 4 months ago) by kensmith
Branches: RELENG_6_4
CVS tags: RELENG_6_4_0_RELEASE
Diff to: previous 1.242.2.10; next MAIN 1.243
Changes since revision 1.242.2.10: +0 -0 lines
SVN rev 183531 on 2008-10-02 02:57:24Z by kensmith

Create releng/6.4 from stable/6 in preparation for 6.4-RC1.

Approved by:	re (implicit)

Revision 1.317
Wed Oct 1 19:14:05 2008 UTC (3 years, 4 months ago) by jhb
Branches: MAIN
Diff to: previous 1.316
Changes since revision 1.316: +13 -23 lines
SVN rev 183518 on 2008-10-01 19:14:05Z by jhb

Wait until after dropping the receive socket buffer lock to allocate space
to store the socket address stored in the first mbuf in a packet chain.
This reduces contention on the lock and CPU system time in certain UDP
workloads.

Tested by:	ps
Reviewed by:	rwatson
MFC after:	1 week

Revision 1.316
Wed Oct 1 13:26:52 2008 UTC (3 years, 4 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.315
Changes since revision 1.315: +19 -58 lines
SVN rev 183512 on 2008-10-01 13:26:52Z by rwatson

Various cleanups for soreceive_dgram():

- Update or remove comments that were left over from the original
  soreceive_generic() implementation.  Quite a few were misleading in the
  context of the new code.
- Since soreceive_dgram() has a simpler structure, replace several gotos
  with a while loop making the invariants more clear.
- In the blocking while loop, don't try to handle cases incompatible with
  the loop invariant (since m is always NULL, don't check for and handle
  non-NULL).
- Don't drop and re-acquire the socket buffer lock unnecessarily after
  sbwait() returns, which may help reduce lock contention (etc).
- Assume PR_ATOMIC since we assert it at the top of the function.

MFC after:	3 days

Revision 1.315
Tue Sep 30 18:44:26 2008 UTC (3 years, 4 months ago) by jhb
Branches: MAIN
Diff to: previous 1.314: preferred, colored
Changes since revision 1.314: +4 -4 lines
SVN rev 183503 on 2008-09-30 18:44:26Z by jhb

Update the function name in several assertions in soreceive_dgram().

Approved by:	rwatson
MFC after:	3 days

Revision 1.302.2.6: download - view: text, markup, annotated - select for diffs
Mon Sep 15 20:46:32 2008 UTC (3 years, 4 months ago) by rwatson
Branches: RELENG_7
Diff to: previous 1.302.2.5: preferred, colored; branchpoint 1.302: preferred, colored
Changes since revision 1.302.2.5: +230 -1 lines
SVN rev 183051 on 2008-09-15 20:46:32Z by rwatson

Merge r180198, r180211, r180365, r182682 from head to stable/7:

  Add soreceive_dgram(9), an optimized socket receive function for use by
  datagram-only protocols, such as UDP.  This version removes use of
  sblock(), which is not required due to an inability to interlace data
  improperly with datagrams, as well as avoiding some of the larger loops
  and state management that don't apply on datagram sockets.

  This is experimental code, so hook it up only for UDPv4 for testing; if
  there are problems we may need to revise it or turn it off by default,
  but it offers *significant* performance improvements for threaded UDP
  applications such as BIND9, nsd, and memcached using UDP.

  Tested by:      kris, ps

  Update copyright date in light of soreceive_dgram(9).

  Use soreceive_dgram() and sosend_dgram() with UDPv6, as we do with UDPv4.

  Tested by:      ps

  Remove XXXRW in soreceive_dgram that proves unnecessary.

  Remove unused orig_resid variable in soreceive_dgram.

  Submitted by:   alfred

Note: in the MFC, we do enable sosend_dgram for UDPv6 by default (it was
already used for UDPv4), but use of soreceive_dgram for both UDPv4 and
UDPv6 is controlled by a new loader tunable,
net.inet.udp.soreceive_dgram_enabled, as soreceive_dgram has less testing
exposure than sosend_dgram.  We may wish to change the default (and
eliminate the tunable) in 7.2.

MFC requested by:	gnn, kris, ps
Approved by:		re (kib)

Revision 1.314: download - view: text, markup, annotated - select for diffs
Tue Sep 2 16:55:21 2008 UTC (3 years, 5 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.313: preferred, colored
Changes since revision 1.313: +0 -5 lines
SVN rev 182682 on 2008-09-02 16:55:21Z by rwatson

Remove XXXRW in soreceive_dgram that proves unnecessary.

Remove unused orig_resid variable in soreceive_dgram.

Submitted by:	alfred
X-MFC with:	soreceive_dgram (r180198, r180211)

Revision 1.302.2.5: download - view: text, markup, annotated - select for diffs
Thu Jul 31 05:48:51 2008 UTC (3 years, 6 months ago) by kmacy
Branches: RELENG_7
Diff to: previous 1.302.2.4: preferred, colored; branchpoint 1.302: preferred, colored
Changes since revision 1.302.2.4: +139 -0 lines
SVN rev 181046 on 2008-07-31 05:48:51Z by kmacy

MFC accessor functions for socket fields.

Revision 1.302.2.4: download - view: text, markup, annotated - select for diffs
Thu Jul 24 01:13:22 2008 UTC (3 years, 6 months ago) by julian
Branches: RELENG_7
Diff to: previous 1.302.2.3: preferred, colored; branchpoint 1.302: preferred, colored
Changes since revision 1.302.2.3: +20 -0 lines
SVN rev 180774 on 2008-07-24 01:13:22Z by julian

MFC an ABI compatible implementation of Multiple routing tables.
See the commit message for
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/net/route.c
version 1.129 (svn change # 178888) for more info.

Obtained from:	 Ironport (Cisco Systems)

Revision 1.313: download - view: text, markup, annotated - select for diffs
Mon Jul 21 00:49:34 2008 UTC (3 years, 6 months ago) by kmacy
Branches: MAIN
Diff to: previous 1.312: preferred, colored
Changes since revision 1.312: +139 -0 lines
SVN rev 180641 on 2008-07-21 00:49:34Z by kmacy

Add accessor functions for socket fields.

MFC after:	1 week

Revision 1.312: download - view: text, markup, annotated - select for diffs
Thu Jul 3 06:47:45 2008 UTC (3 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.311: preferred, colored
Changes since revision 1.311: +1 -1 lines
SVN rev 180211 on 2008-07-03 06:47:45Z by rwatson

Update copyright date in light of soreceive_dgram(9).

Revision 1.311: download - view: text, markup, annotated - select for diffs
Wed Jul 2 23:23:27 2008 UTC (3 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.310: preferred, colored
Changes since revision 1.310: +234 -0 lines
SVN rev 180198 on 2008-07-02 23:23:27Z by rwatson

Add soreceive_dgram(9), an optimized socket receive function for use by
datagram-only protocols, such as UDP.  This version removes use of
sblock(), which is not required due to an inability to interlace data
improperly with datagrams, as well as avoiding some of the larger loops
and state management that don't apply on datagram sockets.

This is experimental code, so hook it up only for UDPv4 for testing; if
there are problems we may need to revise it or turn it off by default,
but it offers *significant* performance improvements for threaded UDP
applications such as BIND9, nsd, and memcached using UDP.

Tested by:	kris, ps

Revision 1.310: download - view: text, markup, annotated - select for diffs
Fri May 9 23:02:55 2008 UTC (3 years, 9 months ago) by julian
Branches: MAIN
Diff to: previous 1.309: preferred, colored
Changes since revision 1.309: +20 -0 lines
Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x).

Currently the only protocol that can make use of the multiple tables is IPv4.
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

  One thing where FreeBSD has been falling behind, and which by chance I
  have some time to work on, is "policy based routing", which allows
  different packet streams to be routed by more than just the destination
  address.

  Constraints:
  ------------

  I want to make some form of this available in the 6.x tree
  (and by extension 7.x), but FreeBSD in general needs it, so I might as
  well do it in -current and back port the portions I need.

  One of the ways that this can be done is to have the ability to
  instantiate multiple kernel routing tables (which I will now
  refer to as "Forwarding Information Bases" or "FIBs" for political
  correctness reasons). Which FIB a particular packet uses to make
  the next hop decision can be decided by a number of mechanisms.
  The policies these mechanisms implement are the "Policies" referred
  to in "Policy based routing".

  One of the constraints I have if I try to back port this work to
  6.x is that it must be implemented as an EXTENSION to the existing
  ABIs in 6.x so that third party applications do not need to be
  recompiled within the lifetime of the branch.

  This first version will not have some of the bells and whistles that
  will come with later versions. It will, for example, be limited to 16
  tables in the first commit.

  Implementation method, Compatible version. (part 1)
  ---------------------------------------------------
  For this reason I have implemented a "sufficient subset" of a
  multiple routing table solution in Perforce, and back-ported it
  to 6.x (also in Perforce, though not always caught up with what I
  have done in -current/P4). The subset allows a number of FIBs
  to be defined at compile time (8 is sufficient for my purposes in 6.x)
  and implements the changes needed to allow IPV4 to use them. I have not
  done the changes for ipv6 simply because I do not need it, and I do not
  have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

  Other protocol families are left untouched and should there be
  users with proprietary protocol families, they should continue to work
  and be oblivious to the existence of the extra FIBs.

  To understand how this is done, one must know that the current FIB
  code starts everything off with a single dimensional array of
  pointers to FIB head structures (One per protocol family), each of
  which in turn points to the trie of routes available to that family.

  The basic change in the ABI compatible version of the change is to
  extend that array to be a 2 dimensional array, so that
  instead of protocol family X looking at rt_tables[X] for the
  table it needs, it looks at rt_tables[Y][X], where Y is always 0
  for all protocol families except IPv4.
  Code that is unaware of the change always just sees the first row
  of the table, which of course looks just like the one dimensional
  array that existed before.
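Why old code keeps working can be shown with a small C model. NAF, NFIBS and struct rib_head are hypothetical stand-ins (the kernel's per-family heads are radix trees). Row 0 of the two-dimensional array has the same layout as the old one-dimensional array, so anything that still indexes it as table[af] only ever sees FIB 0.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>		/* AF_INET, AF_INET6 */

#define NAF	32	/* hypothetical number of address families */
#define NFIBS	16	/* the first commit allows at most 16 FIBs */

/* Hypothetical per-(FIB, family) table head; the kernel uses radix trees. */
struct rib_head {
	int	fib;
	int	af;
};

static struct rib_head	 heads[NFIBS][NAF + 1];
/* Before: struct rib_head *rt_tables[NAF + 1];  After: */
static struct rib_head	*rt_tables[NFIBS][NAF + 1];

void
rt_tables_init(void)
{
	for (int f = 0; f < NFIBS; f++)
		for (int af = 0; af <= NAF; af++) {
			heads[f][af].fib = f;
			heads[f][af].af = af;
			rt_tables[f][af] = &heads[f][af];
		}
}

/* What unmodified code sees: row 0, laid out like the old 1-D array. */
struct rib_head **
legacy_view(void)
{
	return (rt_tables[0]);
}

/* FIB-aware lookup: every family except IPv4 is pinned to row 0. */
struct rib_head *
rt_table_get(int fib, int af)
{
	assert(fib >= 0 && fib < NFIBS && af >= 0 && af <= NAF);
	return (rt_tables[af == AF_INET ? fib : 0][af]);
}
```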

  The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
  are all maintained, but refer only to the first row of the array,
  so that existing callers in proprietary protocols can continue to
  do the "right thing".
  Some new entry points are added, for the exclusive use of ipv4 code
  called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
  which have an extra argument which refers the code to the correct row.

  In addition, there are some new entry points (currently called
  rtalloc_fib() and friends) that check the address family being
  looked up and either call rtalloc() (and friends), forcing the action
  to row 0, if the protocol is not IPv4, or use the appropriate row
  if it IS IPv4 (and that info is available). These are for calling
  from code that is not specific to any particular protocol. The way
  these are implemented would change in the non ABI preserving code
  to be added later.
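The dispatch performed by the protocol-independent wrappers can be sketched as follows. The functions are hypothetical and only return the row (FIB) a lookup would consult instead of a route; their names echo the entry points above, but the signatures are illustrative, not the kernel's.

```c
#include <assert.h>
#include <sys/socket.h>		/* AF_INET, AF_INET6, AF_UNIX */

/* Legacy-style lookup: always consults row 0 (hypothetical signature). */
static int
rtalloc_sketch(int af)
{
	(void)af;
	return (0);
}

/* IPv4-only lookup with an explicit row (hypothetical signature). */
static int
in_rtalloc_sketch(int fib)
{
	return (fib);
}

/* Protocol-independent wrapper in the spirit of rtalloc_fib(). */
int
rtalloc_fib_sketch(int af, int fib)
{
	if (af == AF_INET)
		return (in_rtalloc_sketch(fib));
	return (rtalloc_sketch(af));	/* non-IPv4: forced to row 0 */
}
```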

  One feature of the first version of the code is that for ipv4,
  the interface routes show up automatically on all the FIBs, so
  that no matter what FIB you select you always have the basic
  direct attached hosts available to you. (rtinit() does this
  automatically).

  You CAN delete an interface route from one FIB should you want
  to, but by default it's there. ARP information is also available
  in each FIB. It's assumed that the same machine would have the
  same MAC address, regardless of which FIB you are using to get
  to it.

  This brings us to how the correct FIB is selected for an outgoing
  IPv4 packet.

  Firstly, all packets have a FIB associated with them. If nothing
  has been done to change it, it will be FIB 0. The FIB is changed
  in the following ways.

  Packets fall into one of a number of classes.

  1/ locally generated packets, coming from a socket/PCB.
     Such packets select a FIB from a number associated with the
     socket/PCB. This in turn is inherited from the process,
     but can be changed by a socket option. The process in turn
     inherits it on fork. I have written a utility called setfib
     that acts a bit like nice:

         setfib -3 ping target.example.com # will use fib 3 for ping.

     It is an obvious extension to make it a property of a jail,
     but I have not done so. It can be achieved by combining the setfib and
     jail commands.

  2/ packets received on an interface for forwarding.
     By default these packets would use table 0
     (or possibly a number settable in a sysctl (not yet)),
     but prior to routing the firewall can inspect them (see below).
     (Possibly in the future you may be able to associate a FIB
     with packets received on an interface. An ifconfig arg, but not yet.)

  3/ packets inspected by a packet classifier, which can arbitrarily
     associate a fib with it on a packet by packet basis.
     A fib assigned to a packet by a packet classifier
     (such as ipfw) would override a fib associated by
     a more default source (such as cases 1 or 2).

  4/ a tcp listen socket associated with a fib will generate
     accept sockets that are associated with that same fib.

  5/ Packets generated in response to some other packet (e.g. reset
     or icmp packets). These should use the FIB associated with the
     packet being responded to.

  6/ Packets generated during encapsulation.
     gif, tun and other tunnel interfaces will encapsulate using the FIB
     that was in effect with the process that set up the tunnel.
     Thus setfib 1 ifconfig gif0 [tunnel instructions]
     will make the tunnel use fib 1.
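The selection rules listed above can be modelled as a small C function. The struct pkt_meta type and the FIB_UNSET sentinel are hypothetical. Three rules come from the notes: a classifier's choice overrides the socket or default FIB, forwarded packets default to FIB 0, and accepted sockets inherit the listener's FIB. The relative order of the reply case and the socket case is an assumption.

```c
#include <assert.h>

#define FIB_UNSET	(-1)	/* hypothetical "no FIB from this source" */

/* Hypothetical packet metadata; the kernel carries the FIB on the mbuf. */
struct pkt_meta {
	int	sock_fib;	/* case 1: from the socket/PCB */
	int	classifier_fib;	/* case 3: e.g. set by an ipfw "setfib" rule */
	int	orig_fib;	/* case 5: FIB of the packet being answered */
};

/*
 * A classifier decision wins.  Otherwise a reply inherits the FIB of
 * the packet it answers, a locally generated packet uses its socket's
 * FIB, and anything else (e.g. forwarded traffic, case 2) uses FIB 0.
 */
int
select_fib(const struct pkt_meta *m)
{
	if (m->classifier_fib != FIB_UNSET)
		return (m->classifier_fib);
	if (m->orig_fib != FIB_UNSET)
		return (m->orig_fib);
	if (m->sock_fib != FIB_UNSET)
		return (m->sock_fib);
	return (0);
}

/* Case 4: an accepted socket inherits the listening socket's FIB. */
int
accept_fib(int listen_fib)
{
	return (listen_fib);
}
```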

  Routing messages would be associated with their
  process, and thus select one FIB or another.
  Messages from the kernel would be associated with the fib they
  refer to and would only be received by a routing socket associated
  with that fib. (not yet implemented)

  In addition, netstat has been edited to be able to cope with the
  fact that the array is now 2 dimensional. (It looks in system
  memory using libkvm (!)). Old versions of netstat see only the first FIB.

  In addition two sysctls are added to give:
  a) the number of FIBs compiled in (active)
  b) the default FIB of the calling process.

  Early testing experience:
  -------------------------

  Basically our (IronPort's) appliance does this functionality already
  using ipfw fwd but that method has some drawbacks.

  For example, it can't fully simulate a routing table because it can't
  influence the socket's choice of local address when a connect() is done.

  Testing during the generating of these changes has been
  remarkably smooth so far. Multiple tables have co-existed
  with no notable side effects, and packets have been routed
  accordingly.

  ipfw has grown 2 new keywords:

  setfib N ip from any to any
  count ip from any to any fib N

  In pf there seems to be a requirement to be able to give symbolic names to the
  fibs but I do not have that capacity. I am not sure if it is required.

  Interestingly enough, SCTP has built-in support for this, called VRFs
  in Cisco parlance. It will be interesting to see how that handles it
  when it suddenly actually does something.

  Where to next:
  --------------------

  After committing the ABI compatible version and MFCing it, I'd
  like to proceed in a forward direction in -current. This will
  result in some roto-tilling in the routing code.

  Firstly: the current code's idea of having a separate tree per
  protocol family, all of the same format, and pointed to by the
  1 dimensional array is a bit silly. Especially when one considers that
  there is code that makes assumptions about every protocol having the
  same internal structures there. Some protocols don't WANT that
  sort of structure. (for example the whole idea of a netmask is foreign
  to appletalk). This needs to be made opaque to the external code.

  My suggested first change is to add routing method pointers to the
  'domain' structure, along with information pointing to the data.
  Instead of having an array of pointers to uniform structures,
  there would be an array pointing to the 'domain' structures
  for each protocol address domain (protocol family),
  and the methods thus reached would be called. The methods would have
  an argument that gives FIB number, but the protocol would be free
  to ignore it.

  When the ABI can be changed, it raises the possibility of the
  addition of a fib entry into the "struct route". Currently,
  the structure contains the sockaddr of the destination, and the resulting
  fib entry. To make this work fully, one could add a fib number
  so that given an address and a fib, one can find the third element, the
  fib entry.

  Interaction with the ARP layer/ LL layer would need to be
  revisited as well. Qing Li has been working on this already.

  This work was sponsored by Ironport Systems/Cisco

Reviewed by:    several including rwatson, bz and mlair (parts each)
Obtained from:  Ironport systems/Cisco

Revision 1.302.2.3: download - view: text, markup, annotated - select for diffs
Tue Apr 29 09:17:39 2008 UTC (3 years, 9 months ago) by rrs
Branches: RELENG_7
Diff to: previous 1.302.2.2: preferred, colored; branchpoint 1.302: preferred, colored
Changes since revision 1.302.2.2: +3 -1 lines
  Add pru_flush routine so a transport can
  flush itself during Shutdown

Revision 1.309: download - view: text, markup, annotated - select for diffs
Mon Apr 14 18:06:04 2008 UTC (3 years, 9 months ago) by rrs
Branches: MAIN
Diff to: previous 1.308: preferred, colored
Changes since revision 1.308: +3 -1 lines
Add pru_flush routine so a transport can
flush itself during Shutdown

MFC after:	1 week
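An optional method slot in a protocol switch can be sketched like this. Both structures are hypothetical and heavily trimmed; only the idea that pru_flush may be NULL and is invoked on the shutdown path reflects the commit. The SHUT_* constants are the standard shutdown(2) values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>		/* SHUT_RD, SHUT_WR, SHUT_RDWR */

struct sock_sketch;

/* Hypothetical, trimmed-down protocol user-request switch. */
struct usrreqs_sketch {
	/* Optional: protocols that leave this NULL behave as before. */
	int	(*pru_flush)(struct sock_sketch *so, int direction);
};

struct sock_sketch {
	const struct usrreqs_sketch	*proto;
	int				 flushed;	/* for the example */
};

static int
demo_flush(struct sock_sketch *so, int direction)
{
	(void)direction;
	so->flushed++;
	return (0);
}

/* Shutdown path: let the transport flush itself first, if it can. */
int
soshutdown_sketch(struct sock_sketch *so, int how)
{
	if (how != SHUT_RD && how != SHUT_WR && how != SHUT_RDWR)
		return (EINVAL);
	if (so->proto->pru_flush != NULL)
		(void)so->proto->pru_flush(so, how);
	/* ... then flush the receive buffer and/or notify the protocol ... */
	return (0);
}
```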

Revision 1.308: download - view: text, markup, annotated - select for diffs
Tue Mar 25 09:38:59 2008 UTC (3 years, 10 months ago) by ru
Branches: MAIN
Diff to: previous 1.307: preferred, colored
Changes since revision 1.307: +12 -14 lines
Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.

Reviewed by:	arch

There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.

Revision 1.307: download - view: text, markup, annotated - select for diffs
Wed Mar 19 09:58:25 2008 UTC (3 years, 10 months ago) by sobomax
Branches: MAIN
Diff to: previous 1.306: preferred, colored
Changes since revision 1.306: +0 -1 lines
Revert previous change - it appears that the limit I was hitting was a
maxsockets limit, not maxfiles limit. The question remains why those
limits are handled differently (with error code for maxfiles but with
sleep for maxsockets), but those would be addressed in a separate commit
if necessary.

Requested by:   rwatson, jeff

Revision 1.306: download - view: text, markup, annotated - select for diffs
Sun Mar 16 06:21:30 2008 UTC (3 years, 10 months ago) by sobomax
Branches: MAIN
Diff to: previous 1.305: preferred, colored
Changes since revision 1.305: +1 -0 lines
Properly set size of the file_zone to match kern.maxfiles parameter.
Otherwise the parameter is no-op, since zone by default limits number
of descriptors to some 12K entries. Attempts to allocate more end up
sleeping on zonelimit.

MFC after:	2 weeks

Revision 1.302.2.2: download - view: text, markup, annotated - select for diffs
Sat Mar 1 15:40:52 2008 UTC (3 years, 11 months ago) by rwatson
Branches: RELENG_7
Diff to: previous 1.302.2.1: preferred, colored; branchpoint 1.302: preferred, colored
Changes since revision 1.302.2.1: +12 -12 lines
Merge uipc_sockbuf.c:1.176, uipc_socket.c:1.305, socketvar.h:1.162 from
HEAD to RELENG_7:

  Further clean up sorflush:

  - Expose sbrelease_internal(), a variant of sbrelease() with no
    expectations about the validity of locks in the socket buffer.
  - Use sbrelease_internal() in sorflush(), and as a result avoid
    initializing and destroying a socket buffer lock for the temporary
    stack copy of the actual buffer, asb.
  - Add a comment indicating why we do what we do, and remove an XXX
    since things have gotten less ugly in sorflush() lately.

  This makes socket close cleaner, and possibly also marginally faster.

Revision 1.305: download - view: text, markup, annotated - select for diffs
Mon Feb 4 12:25:13 2008 UTC (4 years ago) by rwatson
Branches: MAIN
Diff to: previous 1.304: preferred, colored
Changes since revision 1.304: +12 -12 lines
Further clean up sorflush:

- Expose sbrelease_internal(), a variant of sbrelease() with no
  expectations about the validity of locks in the socket buffer.
- Use sbrelease_internal() in sorflush(), and as a result avoid initializing
  and destroying a socket buffer lock for the temporary stack copy of the
  actual buffer, asb.
- Add a comment indicating why we do what we do, and remove an XXX since
  things have gotten less ugly in sorflush() lately.

This makes socket close cleaner, and possibly also marginally faster.

MFC after:	3 weeks
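The sorflush() pattern described above can be modelled in userland C. The struct sb_data and struct sockbuf_sketch types and the helper names are hypothetical. The queued data is moved into a stack copy under the mutex, the live buffer is reset, and the copy is released after unlocking by a routine that makes no assumptions about locks. The copy never gets a lock of its own, so none has to be initialized or destroyed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical payload portion of a socket buffer (no lock inside). */
struct sb_data {
	char	*buf;
	size_t	 cc;		/* bytes currently queued */
};

struct sockbuf_sketch {
	pthread_mutex_t	mtx;
	struct sb_data	d;
};

/* Release that expects no locks: it only frees what it is handed. */
static void
sbrelease_internal_sketch(struct sb_data *d)
{
	free(d->buf);
	d->buf = NULL;
	d->cc = 0;
}

void
sorflush_sketch(struct sockbuf_sketch *sb)
{
	struct sb_data asb;	/* stack copy; needs no mutex */

	pthread_mutex_lock(&sb->mtx);
	asb = sb->d;		/* take ownership of the queued data */
	sb->d.buf = NULL;
	sb->d.cc = 0;
	pthread_mutex_unlock(&sb->mtx);

	sbrelease_internal_sketch(&asb);
	assert(asb.buf == NULL && asb.cc == 0);
}
```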

Revision 1.302.4.1: download - view: text, markup, annotated - select for diffs
Sat Feb 2 12:44:13 2008 UTC (4 years ago) by rwatson
Branches: RELENG_7_0
CVS tags: RELENG_7_0_0_RELEASE
Diff to: previous 1.302: preferred, colored; next MAIN 1.303: preferred, colored
Changes since revision 1.302: +11 -5 lines
Merge uipc_sockbuf.c:1.175, uipc_socket.c:1.304, uipc_syscalls.c:1.264,
sctp_input.c:1.67, sctp_peeloff.c:1.17, sctputil.c:1.73,
socketvar.h:1.161 from HEAD to RELENG_7_0:

  Correct two problems relating to sorflush(), which is called to flush
  read socket buffers in shutdown() and close():

  - Call socantrcvmore() before sblock() to dislodge any threads that
    might be sleeping (potentially indefinitely) while holding sblock(),
    such as a thread blocked in recv().

  - Flag the sblock() call as non-interruptible so that a signal
    delivered to the thread calling sorflush() doesn't cause sblock() to
    fail.  The sblock() is required to ensure that all other socket
    consumer threads have, in fact, left, and do not enter, the socket
    buffer until we're done flushing it.

  To implement the latter, change the 'flags' argument to sblock() to
  accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK
  flag.  When SBL_NOINTR is set, it forces a non-interruptible sx
  acquisition, regardless of the setting of the disposition of SB_NOINTR
  on the socket buffer; without this change it would be possible for
  another thread to clear SB_NOINTR between when the socket buffer mutex
  is released and sblock() is invoked.

  Reviewed by:    bz, kmacy, rrs
  Reported by:    Jos Backus <jos at catnook dot com>

Approved by:	re (kensmith)

Revision 1.302.2.1: download - view: text, markup, annotated - select for diffs
Fri Feb 1 22:51:39 2008 UTC (4 years ago) by rwatson
Branches: RELENG_7
Diff to: previous 1.302: preferred, colored
Changes since revision 1.302: +11 -5 lines
Merge uipc_sockbuf.c:1.175, uipc_socket.c:1.304, uipc_syscalls.c:1.264,
sctp_input.c:1.67, sctp_peeloff.c:1.17, sctputil.c:1.73,
socketvar.h:1.161 from HEAD to RELENG_7:

  Correct two problems relating to sorflush(), which is called to flush
  read socket buffers in shutdown() and close():

  - Call socantrcvmore() before sblock() to dislodge any threads that
    might be sleeping (potentially indefinitely) while holding sblock(),
    such as a thread blocked in recv().

  - Flag the sblock() call as non-interruptible so that a signal
    delivered to the thread calling sorflush() doesn't cause sblock() to
    fail.  The sblock() is required to ensure that all other socket
    consumer threads have, in fact, left, and do not enter, the socket
    buffer until we're done flushing it.

  To implement the latter, change the 'flags' argument to sblock() to
  accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK
  flag.  When SBL_NOINTR is set, it forces a non-interruptible sx
  acquisition, regardless of the setting of the disposition of SB_NOINTR
  on the socket buffer; without this change it would be possible for
  another thread to clear SB_NOINTR between when the socket buffer mutex
  is released and sblock() is invoked.

  Reviewed by:    bz, kmacy
  Reported by:    Jos Backus <jos at catnook dot com>

Revision 1.304: download - view: text, markup, annotated - select for diffs
Thu Jan 31 08:22:24 2008 UTC (4 years ago) by rwatson
Branches: MAIN
Diff to: previous 1.303: preferred, colored
Changes since revision 1.303: +11 -5 lines
Correct two problems relating to sorflush(), which is called to flush
read socket buffers in shutdown() and close():

- Call socantrcvmore() before sblock() to dislodge any threads that
  might be sleeping (potentially indefinitely) while holding sblock(),
  such as a thread blocked in recv().

- Flag the sblock() call as non-interruptible so that a signal
  delivered to the thread calling sorflush() doesn't cause sblock() to
  fail.  The sblock() is required to ensure that all other socket
  consumer threads have, in fact, left, and do not enter, the socket
  buffer until we're done flushing it.

To implement the latter, change the 'flags' argument to sblock() to
accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK
flag.  When SBL_NOINTR is set, it forces a non-interruptible sx
acquisition, regardless of the setting of the disposition of SB_NOINTR
on the socket buffer; without this change it would be possible for
another thread to clear SB_NOINTR between when the socket buffer mutex
is released and sblock() is invoked.

Reviewed by:	bz, kmacy
Reported by:	Jos Backus <jos at catnook dot com>
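The flag logic described above can be sketched as follows. The names SBL_WAIT, SBL_NOINTR and SB_NOINTR come from the commit message, but the numeric values here are assumptions. A pthread mutex stands in for the sx lock, so this userland model cannot tell an interruptible sleep from a non-interruptible one; sblock_choose() only shows which acquisition would be used. Because the caller passes SBL_NOINTR explicitly, another thread clearing SB_NOINTR on the buffer cannot turn the sleep into an interruptible one.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

#define SBL_WAIT	0x01	/* wait if not immediately available */
#define SBL_NOINTR	0x02	/* force a non-interruptible sleep */
#define SBL_VALID	(SBL_WAIT | SBL_NOINTR)
#define SB_NOINTR	0x40	/* per-buffer flag; value is an assumption */

enum sblock_how { SBLOCK_TRY, SBLOCK_SLEEP_INTR, SBLOCK_SLEEP_NOINTR };

/* Decide how the I/O serialization lock would be acquired. */
enum sblock_how
sblock_choose(int flags, int sb_flags)
{
	assert((flags & SBL_VALID) == flags);
	if ((flags & SBL_WAIT) == 0)
		return (SBLOCK_TRY);
	if ((flags & SBL_NOINTR) != 0 || (sb_flags & SB_NOINTR) != 0)
		return (SBLOCK_SLEEP_NOINTR);
	return (SBLOCK_SLEEP_INTR);
}

/* Userland model: a mutex stands in for the sx(9) lock. */
int
sblock_model(pthread_mutex_t *sx, int flags, int sb_flags)
{
	if (sblock_choose(flags, sb_flags) == SBLOCK_TRY)
		return (pthread_mutex_trylock(sx) == 0 ? 0 : EWOULDBLOCK);
	pthread_mutex_lock(sx);
	return (0);
}
```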

Revision 1.303: download - view: text, markup, annotated - select for diffs
Wed Oct 24 19:03:55 2007 UTC (4 years, 3 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.302: preferred, colored
Changes since revision 1.302: +4 -4 lines
Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updated as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer

Revision 1.242.2.10: download - view: text, markup, annotated - select for diffs
Thu Aug 23 18:17:08 2007 UTC (4 years, 5 months ago) by jinmei
Branches: RELENG_6
CVS tags: RELENG_6_4_BP, RELENG_6_3_BP, RELENG_6_3_0_RELEASE, RELENG_6_3
Branch point for: RELENG_6_4
Diff to: previous 1.242.2.9: preferred, colored; branchpoint 1.242: preferred, colored; next MAIN 1.243: preferred, colored
Changes since revision 1.242.2.9: +2 -2 lines
MFC:
  Fix a kernel panic based on receiving an ICMPv6 Packet too Big message.
  (MFC was planned but has been missed)

PR:		99779
Submitted by:	Jinmei Tatuya
Reviewed by:	clement, rwatson
Approved by:	gnn (mentor)

src/sys/kern/uipc_socket.c:	1.280

Revision 1.302: download - view: text, markup, annotated - select for diffs
Mon Jun 4 18:25:07 2007 UTC (4 years, 8 months ago) by dwmalone
Branches: MAIN
CVS tags: RELENG_7_BP, RELENG_7_0_BP
Branch point for: RELENG_7_0, RELENG_7
Diff to: previous 1.301: preferred, colored
Changes since revision 1.301: +2 -2 lines
Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidentally cutting and pasting these examples.

In a few places sysctl_handle_int was being used on 64-bit
types, which would truncate the value to be exported.  In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
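The truncation mentioned in the last paragraph can be modelled in plain C. This is not sysctl code: export_via_int() models an int-sized handler reading a 64-bit variable on a little-endian machine, where only the low 32 bits are seen. On big-endian hardware it would see the high 32 bits instead.

```c
#include <assert.h>
#include <stdint.h>

/* The 32 bits an int-sized handler would see (little-endian model). */
uint32_t
export_via_int(uint64_t counter)
{
	return ((uint32_t)counter);	/* keeps counter mod 2^32 */
}

/* A 64-bit ("Q" format) handler keeps the full value. */
int64_t
export_via_quad(uint64_t counter)
{
	return ((int64_t)counter);
}
```

For example, a counter of 5000000000 would be reported as 705032704 (5000000000 - 2^32) through the int path.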

Revision 1.301: download - view: text, markup, annotated - select for diffs
Fri Jun 1 01:12:43 2007 UTC (4 years, 8 months ago) by jeff
Branches: MAIN
Diff to: previous 1.300: preferred, colored
Changes since revision 1.300: +3 -3 lines
 - Move rusage from being per-process in struct pstats to per-thread in
   td_ru.  This removes the requirement for per-process synchronization in
   statclock() and mi_switch().  This was previously supported by
   sched_lock which is going away.  All modifications to rusage are now
   done in the context of the owning thread.  Reads proceed without locks.
 - Aggregate exiting threads rusage in thread_exit() such that the exiting
   thread's rusage is not lost.
 - Provide a new routine, rufetch() to fetch an aggregate of all rusage
   structures from all threads in a process.  This routine must be used
   in any place requiring a rusage from a process prior to its exit.  The
   exited process's rusage is still available via p_ru.
 - Aggregate tick statistics only on demand via rufetch() or when a thread
   exits.  Tick statistics are kept in the thread and protected by sched_lock
   until it exits.

Initial patch by:	attilio
Reviewed by:		attilio, bde (some objections), arch (mostly silent)
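The aggregation scheme above (per-thread counters, folded into the process on thread exit and summed on demand) can be sketched as follows. All types and function names are hypothetical, and the locking that the kernel needs around thread lists is omitted.

```c
#include <assert.h>

/* Hypothetical, trimmed-down resource-usage record. */
struct ru_sketch {
	long	ticks;
	long	nvcsw;		/* voluntary context switches */
};

struct thread_sketch {
	struct ru_sketch td_ru;	/* updated only by the owning thread */
	int		 alive;
};

#define MAXTHR	8

struct proc_sketch {
	struct thread_sketch	threads[MAXTHR];
	struct ru_sketch	p_ru;	/* totals from exited threads */
};

static void
ru_add(struct ru_sketch *dst, const struct ru_sketch *src)
{
	dst->ticks += src->ticks;
	dst->nvcsw += src->nvcsw;
}

/* On exit, fold the thread's usage into the process so it is not lost. */
void
thread_exit_sketch(struct proc_sketch *p, int tid)
{
	assert(tid >= 0 && tid < MAXTHR && p->threads[tid].alive);
	ru_add(&p->p_ru, &p->threads[tid].td_ru);
	p->threads[tid].alive = 0;
}

/* Aggregate on demand: exited totals plus every live thread. */
void
rufetch_sketch(const struct proc_sketch *p, struct ru_sketch *out)
{
	*out = p->p_ru;
	for (int i = 0; i < MAXTHR; i++)
		if (p->threads[i].alive)
			ru_add(out, &p->threads[i].td_ru);
}
```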

Revision 1.300: download - view: text, markup, annotated - select for diffs
Wed May 16 20:41:07 2007 UTC (4 years, 8 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.299: preferred, colored
Changes since revision 1.299: +61 -116 lines
Generally migrate to ANSI function headers, and remove 'register' use.

Revision 1.299: download - view: text, markup, annotated - select for diffs
Tue May 8 12:34:14 2007 UTC (4 years, 9 months ago) by yongari
Branches: MAIN
Diff to: previous 1.298: preferred, colored
Changes since revision 1.298: +1 -1 lines
Add missing socket buffer unlock before returning to userland.

Reviewed by:    rwatson

Revision 1.298: download - view: text, markup, annotated - select for diffs
Thu May 3 14:42:41 2007 UTC (4 years, 9 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.297: preferred, colored
Changes since revision 1.297: +68 -61 lines
sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags
on each socket buffer with the socket buffer's mutex.  This sleep lock is
used to serialize I/O on sockets in order to prevent I/O interlacing.

This change replaces the custom sleep lock with an sx(9) lock, which
results in marginally better performance, better handling of contention
during simultaneous socket I/O across multiple threads, and a cleaner
separation between the different layers of locking in socket buffers.
Specifically, the socket buffer mutex is now solely responsible for
serializing simultaneous operation on the socket buffer data structure,
and not for I/O serialization.

While here, fix two historic bugs:

(1) a bug allowing I/O to be occasionally interlaced during long I/O
    operations (discovered by Isilon).

(2) a bug in which failed non-blocking acquisition of the socket buffer
    I/O serialization lock might be ignored (discovered by sam).

SCTP portion of this patch submitted by rrs.
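The separation of lock roles described above can be sketched in userland C. A plain pthread mutex stands in for the exclusive sx(9) lock, and all names are hypothetical. sb_mtx protects the buffer data structure for short critical sections. io_lock is held across a whole multi-chunk send, so two senders' data cannot interlace.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define SBCAP	256

struct sndbuf_sketch {
	pthread_mutex_t	io_lock;	/* I/O serialization (sx stand-in) */
	pthread_mutex_t	sb_mtx;		/* protects data and cc only */
	char		data[SBCAP];
	size_t		cc;
};

/* Append one chunk; only the data-structure mutex is needed here. */
static void
sbappend_sketch(struct sndbuf_sketch *sb, const char *p, size_t len)
{
	pthread_mutex_lock(&sb->sb_mtx);
	assert(sb->cc + len <= SBCAP);
	memcpy(sb->data + sb->cc, p, len);
	sb->cc += len;
	pthread_mutex_unlock(&sb->sb_mtx);
}

/* Send a message in chunks while holding the I/O lock throughout. */
void
sosend_sketch(struct sndbuf_sketch *sb, const char *msg, size_t len,
    size_t chunk)
{
	assert(chunk > 0);
	pthread_mutex_lock(&sb->io_lock);
	for (size_t off = 0; off < len; off += chunk) {
		size_t n = (len - off < chunk) ? len - off : chunk;

		sbappend_sketch(sb, msg + off, n);
	}
	pthread_mutex_unlock(&sb->io_lock);
}
```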

Revision 1.297: download - view: text, markup, annotated - select for diffs
Mon Mar 26 17:05:09 2007 UTC (4 years, 10 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.296: preferred, colored
Changes since revision 1.296: +32 -34 lines
Following movement of functions from uipc_socket2.c to uipc_socket.c and
uipc_sockbuf.c, clean up and update comments.

Revision 1.296: download - view: text, markup, annotated - select for diffs
Mon Mar 26 08:59:03 2007 UTC (4 years, 10 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.295: preferred, colored
Changes since revision 1.295: +298 -0 lines
Complete removal of uipc_socket2.c by moving the last few functions to
other C files:

- Move sbcreatecontrol() and sbtoxsockbuf() to uipc_sockbuf.c.  While
  sbcreatecontrol() is really an mbuf allocation routine, it does its work
  with awareness of the layout of socket buffer memory.

- Move pru_*() protocol switch stubs to uipc_socket.c where the non-stub
  versions of several of these functions live.  Likewise, move socket state
  transition calls (soisconnecting(), etc) to uipc_socket.c.  Move
  sodupsockaddr() and sotoxsocket().

Revision 1.295: download - view: text, markup, annotated - select for diffs
Thu Mar 22 13:21:24 2007 UTC (4 years, 10 months ago) by glebius
Branches: MAIN
Diff to: previous 1.294: preferred, colored
Changes since revision 1.294: +5 -4 lines
Move the dom_dispose and pru_detach calls in sofree() earlier. Only after
calling pru_detach can we be absolutely sure that we don't have any
references to the socket in the stack.

This closes a race between lockless sbdestroy() and data arriving on the socket.

Reviewed by:	rwatson

Revision 1.294: download - view: text, markup, annotated - select for diffs
Mon Mar 12 19:27:36 2007 UTC (4 years, 11 months ago) by jhb
Branches: MAIN
Diff to: previous 1.293: preferred, colored
Changes since revision 1.293: +5 -20 lines
- Use m_gethdr(), m_get(), and m_clget() instead of the macros in
  sosend_copyin().
- Use M_WAITOK instead of M_TRYWAIT in sosend_copyin().
- Don't check for NULL from M_WAITOK and return ENOBUFS.
  M_WAITOK/M_TRYWAIT allocations don't fail with NULL.

Reviewed by:	andre
Requested by:	andre (2)

Revision 1.242.2.9
Mon Mar 12 12:13:52 2007 UTC (4 years, 11 months ago) by ru
Branches: RELENG_6
Diff to: previous 1.242.2.8; branchpoint 1.242
Changes since revision 1.242.2.8: +4 -4 lines
MFC: Don't block on the socket zone limit during the socket()
syscall which can lock up a system otherwise; instead, return
ENOBUFS as documented, which matches the FreeBSD 4.x behavior.

Revision 1.293
Mon Feb 26 10:45:21 2007 UTC (4 years, 11 months ago) by ru
Branches: MAIN
Diff to: previous 1.292
Changes since revision 1.292: +5 -5 lines
Don't block on the socket zone limit during the socket()
call which can easily lock up a system otherwise; instead,
return ENOBUFS as documented in a manpage, thus reverting
us to the FreeBSD 4.x behavior.

Reviewed by:	rwatson
MFC after:	2 weeks

Revision 1.292
Thu Feb 15 10:11:00 2007 UTC (4 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.291
Changes since revision 1.291: +3 -3 lines
Rename somaxconn_sysctl() to sysctl_somaxconn() so that I will be able to
claim that sofoo() functions all accept a socket as their first argument.

Revision 1.242.2.8
Sat Feb 3 04:01:22 2007 UTC (5 years ago) by bms
Branches: RELENG_6
Diff to: previous 1.242.2.7; branchpoint 1.242
Changes since revision 1.242.2.7: +13 -1 lines
MFC:
 Drop all received data mbufs from a socket's queue if the MT_SONAME
 mbuf is dropped, to preserve the invariant in the PR_ADDR case.

PR:		kern/38495
Submitted by:	James Juran
Reviewed by:	sam, rwatson
Obtained from:	NetBSD

Revision 1.291
Sat Feb 3 03:57:45 2007 UTC (5 years ago) by bms
Branches: MAIN
Diff to: previous 1.290
Changes since revision 1.290: +2 -3 lines
Diff reduction with RELENG_6, style(9):
Remove unnecessary brace; && should be on end of line.
No functional changes.

Revision 1.290
Thu Feb 1 17:53:40 2007 UTC (5 years ago) by andre
Branches: MAIN
Diff to: previous 1.289
Changes since revision 1.289: +8 -0 lines
Generic socket buffer auto sizing support, header defines, flag inheritance.

MFC after:	1 month

Revision 1.289
Mon Jan 22 14:50:28 2007 UTC (5 years ago) by andre
Branches: MAIN
Diff to: previous 1.288
Changes since revision 1.288: +10 -0 lines
Unbreak writes of 0 bytes.  Zero byte writes happen when only ancillary
control data but no payload data is passed.

Change m_uiotombuf() to return at least one empty mbuf if the requested
length was zero.  Add comment to sosend_dgram and sosend_generic().

Diagnosed by:		jhb
Regression test by:	rwatson
Pointy hat to:		andre
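
Revision 1.289 concerns sends that carry ancillary control data but no payload. The following is a minimal userland sketch, assuming a POSIX system with AF_UNIX datagram sockets. It passes a descriptor as SCM_RIGHTS control data in a zero-byte message and picks it up on the other side. The names send_fd_only() and recv_fd_only() are hypothetical. The sketch uses SOCK_DGRAM so that the empty message is delivered as a record of its own.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send fd_to_pass as SCM_RIGHTS control data in a
 * message with no payload bytes.  Returns sendmsg()'s result (0 for an
 * empty message on success, -1 on error). */
ssize_t send_fd_only(int sock, int fd_to_pass)
{
    union {
        struct cmsghdr hdr;                 /* forces cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&control, 0, sizeof(control));
    msg.msg_iov = NULL;                     /* zero-byte payload */
    msg.msg_iovlen = 0;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

    return sendmsg(sock, &msg, 0);
}

/* Hypothetical helper: receive one message and return the descriptor
 * carried in its SCM_RIGHTS control data, or -1 if there is none. */
int recv_fd_only(int sock)
{
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    char byte;                              /* room for an unexpected payload byte */
    struct iovec iov;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    iov.iov_base = &byte;
    iov.iov_len = sizeof(byte);
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    if (recvmsg(sock, &msg, 0) < 0)
        return -1;
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
        cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return fd;
}
```

In typical use, int sv[2] comes from socketpair(AF_UNIX, SOCK_DGRAM, 0, sv). Then send_fd_only(sv[0], fd) is expected to return 0, and recv_fd_only(sv[1]) should return a new descriptor that refers to the same open file.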

Revision 1.288
Mon Jan 8 17:49:59 2007 UTC (5 years, 1 month ago) by rwatson
Branches: MAIN
Diff to: previous 1.287
Changes since revision 1.287: +2 -1 lines
Canonicalize copyrights in some files I hold copyrights on:

- Sort by date in license blocks, oldest copyright first.
- All rights reserved after all copyrights, not just the first.
- Use (c) to be consistent with other entries.

MFC after:	3 days

Revision 1.287
Sat Dec 23 21:07:07 2006 UTC (5 years, 1 month ago) by bms
Branches: MAIN
Diff to: previous 1.286
Changes since revision 1.286: +9 -11 lines
Drop all received data mbufs from a socket's queue if the MT_SONAME
mbuf is dropped, to preserve the invariant in the PR_ADDR case.

Add a regression test to detect this condition, but do not hook it
up to the build for now.

PR:             kern/38495
Submitted by:   James Juran
Reviewed by:    sam, rwatson
Obtained from:  NetBSD
MFC after:      2 weeks

Revision 1.286
Wed Nov 22 23:54:29 2006 UTC (5 years, 2 months ago) by mohans
Branches: MAIN
Diff to: previous 1.285
Changes since revision 1.285: +26 -22 lines
Fix a race in soclose() where connections could be queued to the
listening socket after the pass that cleans those queues. This
results in these connections being orphaned (and leaked). The fix
is to clean up the so queues after detaching the socket from the
protocol. Thanks to ups and jhb for discussions and a thorough code
review.

Revision 1.242.2.6.2.1
Wed Nov 22 23:06:26 2006 UTC (5 years, 2 months ago) by mohans
Branches: RELENG_6_2
CVS tags: RELENG_6_2_0_RELEASE
Diff to: previous 1.242.2.6; next MAIN 1.242.2.7
Changes since revision 1.242.2.6: +23 -23 lines
Fix a race in soclose() where connections could be queued to the
listening socket after the pass that cleans those queues. This
results in these connections being orphaned (and leaked). The fix
is to clean up the so queues after detaching the socket from the
protocol. Thanks to ups and jhb for discussions and a thorough code
review.
Approved by: re

Revision 1.242.2.7
Wed Nov 22 22:21:57 2006 UTC (5 years, 2 months ago) by mohans
Branches: RELENG_6
Diff to: previous 1.242.2.6; branchpoint 1.242
Changes since revision 1.242.2.6: +23 -23 lines
Fix a race in soclose() where connections could be queued to the
listening socket after the pass that cleans those queues. This
results in these connections being orphaned (and leaked). The fix
is to clean up the so queues after detaching the socket from the
protocol. Thanks to ups and jhb for discussions and a thorough code
review.
Approved by: re

Revision 1.285
Thu Nov 2 17:45:28 2006 UTC (5 years, 3 months ago) by andre
Branches: MAIN
Diff to: previous 1.284
Changes since revision 1.284: +29 -1 lines
Use the improved m_uiotombuf() function instead of home grown sosend_copyin()
to do the userland to kernel copying in sosend_generic() and sosend_dgram().

sosend_copyin() is retained for ZERO_COPY_SOCKETS which are not yet supported
by m_uiotombuf().

Benchmarking shows significant improvements (95% confidence):
 66% less cpu (or 2.9 times better) with new sosend vs. old sosend (non-TSO)
 65% less cpu (or 2.8 times better) with new sosend vs. old sosend (TSO)

(Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver
DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back
at 1000Base-TX full duplex.)

Sponsored by:	TCP/IP Optimization Fundraise 2005
MFC after:	3 months
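
The log describes sosend_copyin() and m_uiotombuf() as copying user data into a chain of mbufs. Below is a simplified userspace model of that copy. It assumes fixed 256-byte segments in place of real mbufs and clusters. The names struct seg, chain_from_buffer() and chain_free() are hypothetical. A zero-length request returns one empty segment, following the m_uiotombuf() behavior the log gives for revision 1.289.

```c
#include <stdlib.h>
#include <string.h>

#define SEG_CAP 256   /* hypothetical segment size, not MLEN or MCLBYTES */

/* Hypothetical stand-in for an mbuf: one fixed-size segment in a chain. */
struct seg {
    struct seg *next;
    size_t      len;
    char        data[SEG_CAP];
};

/* Release every segment in the chain. */
void chain_free(struct seg *m)
{
    while (m != NULL) {
        struct seg *next = m->next;
        free(m);
        m = next;
    }
}

/* Copy len bytes from src into a newly allocated chain.  A zero-length
 * request still returns one empty segment.  Returns NULL if an
 * allocation fails, after freeing any partial chain. */
struct seg *chain_from_buffer(const void *src, size_t len)
{
    const char *p = src;
    struct seg *head = NULL;
    struct seg **tailp = &head;

    do {
        struct seg *m = malloc(sizeof(*m));

        if (m == NULL) {
            chain_free(head);
            return NULL;
        }
        m->next = NULL;
        m->len = len < SEG_CAP ? len : SEG_CAP;
        if (m->len > 0) {
            memcpy(m->data, p, m->len);
            p += m->len;
            len -= m->len;
        }
        *tailp = m;
        tailp = &m->next;
    } while (len > 0);

    return head;
}
```

A 600-byte input, for instance, yields segments of 256, 256 and 88 bytes. A real implementation would also pick the buffer type (plain mbuf or cluster) by size, which this model omits.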

Revision 1.284
Sun Oct 22 11:52:14 2006 UTC (5 years, 3 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.283
Changes since revision 1.283: +2 -0 lines
Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA

Revision 1.283
Fri Sep 22 15:34:16 2006 UTC (5 years, 4 months ago) by bms
Branches: MAIN
Diff to: previous 1.282
Changes since revision 1.282: +16 -1 lines
Fix a case where socket I/O atomicity is violated due to not dropping
the entire record when a non-data mbuf is removed in the soreceive() path.
This only triggers a panic directly when compiled with INVARIANTS.

PR:		38495
Submitted by:	James Juran
MFC after:	1 week

Revision 1.282
Wed Sep 13 06:58:40 2006 UTC (5 years, 4 months ago) by pjd
Branches: MAIN
Diff to: previous 1.281
Changes since revision 1.281: +1 -1 lines
Fix a lock leak in an error case.

Reported by:	netchild
Reviewed by:	rwatson

Revision 1.281
Sun Sep 10 17:08:06 2006 UTC (5 years, 5 months ago) by andre
Branches: MAIN
Diff to: previous 1.280
Changes since revision 1.280: +4 -1 lines
New sockets created by incoming connections into listen sockets should
inherit all settings and options except listen specific options.

Add the missing send/receive timeouts and low watermarks.
Remove inheritance of the field so_timeo which is unused.

Noticed by:	phk
Reviewed by:	rwatson
Sponsored by:	TCP/IP Optimization Fundraise 2005
MFC after:	3 days

Revision 1.280
Fri Aug 18 14:05:13 2006 UTC (5 years, 5 months ago) by gnn
Branches: MAIN
Diff to: previous 1.279
Changes since revision 1.279: +2 -2 lines
Fix a kernel panic based on receiving an ICMPv6 Packet too Big message.

PR:		99779
Submitted by:	Jinmei Tatuya
Reviewed by:	clement, rwatson
MFC after:	1 week

Revision 1.279
Fri Aug 11 23:03:10 2006 UTC (5 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.278
Changes since revision 1.278: +3 -0 lines
Before performing a sodealloc() when pru_attach() fails, assert that
the socket refcount remains 1, and then drop to 0 before freeing the
socket.

PR:		101763
Reported by:	Gleb Kozyrev <gkozyrev at ukr dot net>

Revision 1.278
Wed Aug 2 18:37:44 2006 UTC (5 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.277
Changes since revision 1.277: +2 -2 lines
Move destroying kqueue state from above pru_detach to below it in
sofree(), as a number of protocols expect to be able to call
soisdisconnected() during detach.  That may not be a good assumption,
but until I'm sure if it's a good assumption or not, allow it.

Revision 1.277
Wed Aug 2 00:45:27 2006 UTC (5 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.276
Changes since revision 1.276: +1 -3 lines
Move update of 'numopensockets' from bottom of sodealloc() to the top,
eliminating a second set of identical mutex operations at the bottom.
This allows brief exceeding of the max sockets limit, but only by
sockets in the last stages of being torn down.

Revision 1.276
Tue Aug 1 10:30:26 2006 UTC (5 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.275
Changes since revision 1.275: +22 -14 lines
Reimplement socket buffer tear-down in sofree(): as the socket is no
longer referenced by other threads (hence our freeing it), we don't need
to set the can't send and can't receive flags, wake up the consumers,
perform two levels of locking, etc.  Implement a fast-path teardown,
sbdestroy(), which flushes and releases each socket buffer.  A manual
dom_dispose of the receive buffer is still required explicitly to GC
any in-flight file descriptors, etc, before flushing the buffer.

This results in a 9% UP performance improvement and 16% SMP performance
improvement on a tight loop of socket();close(); in micro-benchmarking,
but will likely also affect CPU-bound macro-benchmark performance.

Revision 1.275
Mon Jul 24 15:20:07 2006 UTC (5 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.274
Changes since revision 1.274: +52 -2 lines
soreceive_generic(), and sopoll_generic().  Add new functions sosend(),
soreceive(), and sopoll(), which are wrappers for pru_sosend,
pru_soreceive, and pru_sopoll, and are now used universally by socket
consumers rather than either directly invoking the old so*() functions
or directly invoking the protocol switch method (about an even split
prior to this commit).

This completes an architectural change that was begun in 1996 to permit
protocols to provide substitute implementations, as now used by UDP.
Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to
perform these operations on sockets -- in particular, distributed file
systems and socket system calls.

Architectural head nod:	sam, gnn, wollman
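
The wrapper arrangement above adds one level of indirection through the protocol switch. Consumers always call the wrapper, and each protocol points the switch entry at either the generic routine or its own. Below is a reduced model under that reading. The model_* types and functions are hypothetical and far simpler than struct pr_usrreqs and the real send path.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced model of the send-path indirection. */
struct model_socket;

struct model_usrreqs {
    int (*pru_sosend)(struct model_socket *so, const void *buf, size_t len);
};

struct model_socket {
    const struct model_usrreqs *so_usrreqs;
    size_t bytes_sent;     /* total payload accepted */
    int    records;        /* number of atomic records (datagrams) */
};

/* Stand-in for the generic routine most protocols would point at. */
int model_sosend_generic(struct model_socket *so, const void *buf, size_t len)
{
    (void)buf;
    so->bytes_sent += len;
    return 0;
}

/* Stand-in for a tailored routine, such as a datagram-only path, that a
 * protocol may substitute; here it also counts each call as one record. */
int model_sosend_dgram(struct model_socket *so, const void *buf, size_t len)
{
    (void)buf;
    so->bytes_sent += len;
    so->records++;
    return 0;
}

/* The wrapper every consumer calls; it never inspects the protocol. */
int model_sosend(struct model_socket *so, const void *buf, size_t len)
{
    return so->so_usrreqs->pru_sosend(so, buf, len);
}
```

Because consumers only ever call model_sosend(), switching a protocol to a specialized routine means changing one table entry rather than every caller.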

Revision 1.274
Sun Jul 23 20:36:04 2006 UTC (5 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.273
Changes since revision 1.273: +153 -139 lines
Update various uipc_socket.c comments, and reformat others.

Revision 1.273
Fri Jul 21 17:11:11 2006 UTC (5 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.272
Changes since revision 1.272: +5 -6 lines
Change semantics of socket close and detach.  Add a new protocol switch
function, pru_close, to notify protocols that the file descriptor or
other consumer of a socket is closing the socket.  pru_abort is now a
notification of close also, and no longer detaches.  pru_detach is no
longer used to notify of close, and will be called during socket
tear-down by sofree() when all references to a socket evaporate after
an earlier call to abort or close the socket.  This means detach is now
an unconditional teardown of a socket, whereas previously sockets could
persist after detach if the protocol retained a reference.

This facilitates sharing mutexes between layers of the network stack as
the mutex is required during the checking and removal of references at
the head of sofree().  With this change, pru_detach can now assume that
the mutex will no longer be required by the socket layer after
completion, whereas before this was not necessarily true.

Reviewed by:	gnn

Revision 1.272
Sun Jul 16 23:09:39 2006 UTC (5 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.271
Changes since revision 1.271: +12 -12 lines
Change comment on soabort() to more accurately describe how/when
soabort() is used.  Remove trailing white space.

Revision 1.271
Tue Jul 11 23:18:28 2006 UTC (5 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.270
Changes since revision 1.270: +4 -2 lines
Several protocol switch functions (pru_abort, pru_detach, pru_sosetlabel)
return void, so don't implement no-op versions of these functions.
Instead, consistently check if those switch pointers are NULL before
invoking them.
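
The pattern in this revision treats void-returning switch methods as optional and tests the pointer before the call. It can be modeled as below. The structure and function names are hypothetical; only the NULL check reflects what the log describes.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced socket and protocol switch. */
struct model_socket {
    int detached;              /* set by the example protocol below */
};

struct model_usrreqs {
    void (*pru_abort)(struct model_socket *so);   /* optional, may be NULL */
    void (*pru_detach)(struct model_socket *so);  /* optional, may be NULL */
};

/* Example protocol method.  A protocol with nothing to do on detach would
 * leave pru_detach NULL instead of supplying a no-op stub. */
void example_detach(struct model_socket *so)
{
    so->detached = 1;
}

/* Socket-layer side: call the optional method only if it is present. */
void model_detach(const struct model_usrreqs *pr, struct model_socket *so)
{
    if (pr->pru_detach != NULL)
        pr->pru_detach(so);
}
```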

Revision 1.270
Tue Jul 11 21:56:58 2006 UTC (5 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.269
Changes since revision 1.269: +1 -4 lines
When pru_attach() fails, call sodealloc() on the socket rather than
using sorele() and the full tear-down path.  Since protocol state
allocation failed, this is not required (and is arguably undesirable).
This matches the behavior of sonewconn() under the same circumstances.

Revision 1.242.2.6
Wed Jun 28 15:01:08 2006 UTC (5 years, 7 months ago) by rwatson
Branches: RELENG_6
CVS tags: RELENG_6_2_BP
Branch point for: RELENG_6_2
Diff to: previous 1.242.2.5; branchpoint 1.242
Changes since revision 1.242.2.5: +13 -13 lines
Merge uipc_socket.c:1.267 from HEAD to RELENG_6:

  Rearrange code in soalloc() so that it's less indented by returning
  early if uma_zalloc() from the socket zone fails.  No functional
  change.

Revision 1.269
Sun Jun 18 19:02:49 2006 UTC (5 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.268
Changes since revision 1.268: +2 -0 lines
When retrieving SO_ERROR via getsockopt(), hold the socket lock around
the retrieval and replacement with 0.

MFC after:	1 week
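
From userland, SO_ERROR is read with getsockopt(2), and reading it also resets the pending error to 0. That is why this revision holds the socket lock across both steps. Below is a minimal sketch, assuming only the standard sockets API; take_socket_error() is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Fetch and clear the socket's pending error.  Returns the error value
 * (0 if none is pending), or -1 if getsockopt() itself failed. */
int take_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;
    return err;
}
```

A typical caller is a non-blocking connect(2). Once poll(2) reports the socket writable, a nonzero result from take_socket_error() gives the reason the connection failed.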

Revision 1.242.2.5
Sat Jun 17 17:47:05 2006 UTC (5 years, 7 months ago) by gnn
Branches: RELENG_6
Diff to: previous 1.242.2.4; branchpoint 1.242
Changes since revision 1.242.2.4: +4 -1 lines
MFC a forgotten fix

Make sure that there actually is a next packet before setting
nextrecord to that field.

PR: 83885
Submitted by: hirose@comm.yamaha.co.jp

Revision 1.268
Sat Jun 10 14:34:07 2006 UTC (5 years, 8 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.267
Changes since revision 1.267: +171 -36 lines
Move some functions and definitions from uipc_socket2.c to uipc_socket.c:

- Move sonewconn(), which creates new sockets for incoming connections on
  listen sockets, so that all socket allocate code is together in
  uipc_socket.c.

- Move 'maxsockets' and associated sysctls to uipc_socket.c with the
  socket allocation code.

- Move kern.ipc sysctl node to uipc_socket.c, add a SYSCTL_DECL() for it
  to sysctl.h and remove lots of scattered implementations in various
  IPC modules.

- Sort sodealloc() after soalloc() in uipc_socket.c for dependency order
  reasons.  Make soalloc() and sodealloc() static, as they are now
  required only in uipc_socket.c, and are internal to the socket
  implementation.

After this change, socket allocation and deallocation is entirely
centralized in one file, and uipc_socket2.c consists entirely of socket
buffer manipulation and default protocol switch functions.

MFC after:	1 month

Revision 1.267
Thu Jun 8 22:33:18 2006 UTC (5 years, 8 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.266
Changes since revision 1.266: +13 -13 lines
Rearrange code in soalloc() so that it's less indented by returning
early if uma_zalloc() from the socket zone fails.  No functional
change.

MFC after:	1 week

Revision 1.266
Sun Apr 23 18:15:54 2006 UTC (5 years, 9 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.265
Changes since revision 1.265: +3 -1 lines
Assert that sockets passed into soabort() not be SQ_COMP or SQ_INCOMP,
since that removal should have been done a layer up.

MFC after:	3 months

Revision 1.265
Sun Apr 23 15:37:23 2006 UTC (5 years, 9 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.264
Changes since revision 1.264: +1 -1 lines
Add missing 'not' to SQ_COMP comment.

MFC after:	3 months

Revision 1.264
Sun Apr 23 15:33:38 2006 UTC (5 years, 9 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.263
Changes since revision 1.263: +5 -17 lines
Move handling of SQ_COMP exception case in sofree() to the top of the
function along with the remainder of the reference checking code.  Move
comment from body to header with remainder of comments.  Inclusion of a
socket in a completed connection queue counts as a true reference, and
should not be handled as an under-documented edge case.

MFC after:	3 months
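
Taken together with revision 1.258, this gives three conditions under which sofree() returns without freeing the socket:

- consumer references remain (so_count);
- the protocol holds SS_PROTOREF;
- the socket sits on a completed-connection queue (SQ_COMP).

Below is a userspace model of just that decision. The MODEL_* flag values are hypothetical and do not match the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical flag values for this model only. */
#define MODEL_SS_PROTOREF 0x1u  /* protocol holds a strong reference */
#define MODEL_SQ_COMP     0x1u  /* on a listen socket's completed queue */

struct model_socket {
    int      so_count;   /* consumer references, e.g. file descriptors */
    unsigned so_state;   /* MODEL_SS_* flags */
    unsigned so_qstate;  /* MODEL_SQ_* flags */
};

/* Returns true only when no reference of any kind remains, mirroring the
 * early-return checks at the head of sofree() described in the log. */
bool model_sofree_may_free(const struct model_socket *so)
{
    if (so->so_count != 0)
        return false;
    if (so->so_state & MODEL_SS_PROTOREF)
        return false;
    if (so->so_qstate & MODEL_SQ_COMP)
        return false;
    return true;
}
```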

Revision 1.263
Sat Apr 1 15:41:58 2006 UTC (5 years, 10 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.262
Changes since revision 1.262: +19 -19 lines
Change protocol switch method pru_detach() so that it returns void
rather than an error.  Detaches do not "fail", they either occur or
the protocol flags SS_PROTOREF to take ownership of the socket.

soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF.  so_pcb is now entirely owned and
managed by the protocol code.  Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.

Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.

In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.

netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit.  In their current state they may leak
memory or panic.

MFC after:	3 months

Revision 1.262
Sat Apr 1 15:15:02 2006 UTC (5 years, 10 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.261
Changes since revision 1.261: +29 -11 lines
Change protocol switch pru_abort() API so that it returns void rather
than an int, as an error here is not meaningful.  Modify soabort() to
unconditionally free the socket on the return of pru_abort(), and
modify most protocols to no longer conditionally free the socket,
since the caller will do this.

This commit likely leaves parts of netinet and netinet6 in a situation
where they may panic or leak memory, as they are not fully
updated by this commit.  This will be corrected shortly in followup
commits to these components.

MFC after:      3 months

Revision 1.261
Sat Apr 1 10:45:52 2006 UTC (5 years, 10 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.260
Changes since revision 1.260: +2 -0 lines
Assert so->so_pcb is NULL in sodealloc() -- the protocol state should not
be present at this point.  We will eventually remove this assert because
the socket layer should never look at so_pcb, but for now it's a useful
debugging tool.

MFC after:	3 months

Revision 1.260
Sat Apr 1 10:43:02 2006 UTC (5 years, 10 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.259
Changes since revision 1.259: +57 -0 lines
Add a somewhat sizable comment documenting the semantics of various kernel
socket calls relating to the creation and destruction of sockets.  This
will eventually form the foundation of socket(9), but is currently in too
much flux to do so.

MFC after:	3 months

Revision 1.259
Thu Mar 16 07:03:14 2006 UTC (5 years, 10 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.258
Changes since revision 1.258: +3 -5 lines
Change soabort() from returning int to returning void, since all
consumers ignore the return value, soabort() is required to succeed,
and protocols produce errors here to report multiple freeing of the
pcb, which we hope to eliminate.

Revision 1.258
Wed Mar 15 12:45:35 2006 UTC (5 years, 10 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.257
Changes since revision 1.257: +3 -3 lines
As with socket consumer references (so_count), make sofree() return
without GC'ing the socket if a strong protocol reference to the socket
is present (SS_PROTOREF).

Revision 1.257
Sun Feb 12 15:00:27 2006 UTC (5 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.256
Changes since revision 1.256: +8 -8 lines
Improve consistency of return() style.

MFC after:	3 days

Revision 1.256
Fri Jan 13 10:22:01 2006 UTC (6 years ago) by rwatson
Branches: MAIN
Diff to: previous 1.255
Changes since revision 1.255: +155 -2 lines
Add sosend_dgram(), a greatly reduced and simplified version of sosend()
intended for use solely with atomic datagram socket types, which relies
on the previous break-out of sosend_copyin().  Changes to allow UDP to
optionally use this instead of sosend() will be committed as a
follow-up.

Revision 1.242.2.4
Wed Dec 28 18:05:13 2005 UTC (6 years, 1 month ago) by ps
Branches: RELENG_6
CVS tags: RELENG_6_1_BP, RELENG_6_1_0_RELEASE, RELENG_6_1
Diff to: previous 1.242.2.3; branchpoint 1.242
Changes since revision 1.242.2.3: +29 -3 lines
MFC: rev 1.250
Allow 32-bit get/setsockopt with SO_SNDTIMEO or SO_RCVTIMEO to work.

Revision 1.255
Tue Nov 29 23:07:14 2005 UTC (6 years, 2 months ago) by jhb
Branches: MAIN
Diff to: previous 1.254
Changes since revision 1.254: +1 -1 lines
Fix snderr() to not leak the socket buffer lock if an error occurs in
sosend().  Robert accidentally changed the snderr() macro to jump to the
out label which assumes the lock is already released rather than the
release label which drops the lock in his previous change to sosend().
This should fix the recent panics about returning from write(2) with the
socket lock held and the most recent LOR on current@.

Revision 1.254
Mon Nov 28 21:45:36 2005 UTC (6 years, 2 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.253
Changes since revision 1.253: +15 -15 lines
Move zero copy statistics structure before sosend_copyin().

MFC after:	1 month
Reported by:	tinderbox, sam

Revision 1.253
Mon Nov 28 18:09:03 2005 UTC (6 years, 2 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.252
Changes since revision 1.252: +170 -141 lines
Break out functionality in sosend() responsible for building mbuf
chains and copying in mbufs from the body of the send logic, creating
a new function sosend_copyin().  This change makes sosend() almost
readable, and will allow the same logic to be used by tailored socket
send routines.

MFC after:	1 month
Reviewed by:	andre, glebius

Revision 1.252
Wed Nov 2 13:46:31 2005 UTC (6 years, 3 months ago) by andre
Branches: MAIN
Diff to: previous 1.251
Changes since revision 1.251: +1 -1 lines
Retire MT_HEADER mbuf type and change its users to use MT_DATA.

Having an additional MT_HEADER mbuf type is superfluous and redundant
as nothing depends on it.  It only adds a layer of confusion.  The
distinction between header mbufs and data mbufs is made solely
through the m->m_flags M_PKTHDR flag.

Non-native code is not changed in this commit.  For compatibility
MT_HEADER is mapped to MT_DATA.

Sponsored by:	TCP/IP Optimization Fundraise 2005

Revision 1.251
Sun Oct 30 19:44:38 2005 UTC (6 years, 3 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.250
Changes since revision 1.250: +6 -15 lines
Push the assignment of a new or updated so_qlimit from solisten()
following the protocol pru_listen() call to solisten_proto(), so
that it occurs under the socket lock acquisition that also sets
SO_ACCEPTCONN.  This requires passing the new backlog parameter
to the protocol, which also allows the protocol to be aware of
changes in queue limit should it wish to do something about the
new queue limit.  This continues a move towards the socket layer
acting as a library for the protocol.

Bump __FreeBSD_version due to a change in the in-kernel protocol
interface.  This change has been tested with IPv4 and UNIX domain
sockets, but not other protocols.

Revision 1.250
Thu Oct 27 04:26:35 2005 UTC (6 years, 3 months ago) by ps
Branches: MAIN
Diff to: previous 1.249
Changes since revision 1.249: +29 -3 lines
Allow 32-bit get/setsockopt with SO_SNDTIMEO or SO_RCVTIMEO to work.
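
The log does not describe the 32-bit ABI details. For orientation only, the sketch below shows ordinary use of the receive timeout, assuming a POSIX system. The constant is spelled SO_RCVTIMEO in <sys/socket.h>. The names set_recv_timeout() and recv_timed_out() are hypothetical helpers.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Set a receive timeout of ms milliseconds on fd via struct timeval. */
int set_recv_timeout(int fd, long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Returns 1 if a blocking recv() gave up because the timeout expired,
 * 0 if data (or end of file) arrived, and -1 on any other error. */
int recv_timed_out(int fd, void *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);

    if (n >= 0)
        return 0;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 1;
    return -1;
}
```

With a timeout set, a blocking recv(2) on an idle socket is expected to fail with EAGAIN or EWOULDBLOCK once the interval elapses, instead of blocking indefinitely.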

Revision 1.242.2.3
Thu Oct 6 18:31:38 2005 UTC (6 years, 4 months ago) by delphij
Branches: RELENG_6
CVS tags: RELENG_6_0_BP, RELENG_6_0_0_RELEASE, RELENG_6_0
Diff to: previous 1.242.2.2; branchpoint 1.242
Changes since revision 1.242.2.2: +1 -0 lines
MFC 1.244 (by kbyanc)

| Make getsockopt(..., SOL_SOCKET, SO_ACCEPTCONN, ...) work per IEEE Std
| 1003.1 (POSIX).
|
| Revision  Changes    Path
| 1.244     +1 -0      src/sys/kern/uipc_socket.c

Approved by:	re (scottl)
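
With that change, a program can ask whether a descriptor is a listening socket. Below is a minimal sketch, assuming only the POSIX sockets API; is_listening() is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Returns 1 if fd is a listening socket, 0 if it is not, and -1 if the
 * query itself fails. */
int is_listening(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) == -1)
        return -1;
    return val != 0;
}
```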

Revision 1.208.2.24
Tue Sep 27 21:54:02 2005 UTC (6 years, 4 months ago) by rwatson
Branches: RELENG_5
CVS tags: RELENG_5_5_BP, RELENG_5_5_0_RELEASE, RELENG_5_5
Diff to: previous 1.208.2.23; branchpoint 1.208; next MAIN 1.209
Changes since revision 1.208.2.23: +17 -0 lines
Merge uipc_socket.c:1.249, socket.h:1.89 from HEAD to RELENG_5:

  Add three new read-only socket options, which allow regression tests
  and other applications to query the state of the stack regarding the
  accept queue on a listen socket:

  SO_LISTENQLIMIT    Return the value of so_qlimit (socket backlog)
  SO_LISTENQLEN      Return the value of so_qlen (complete sockets)
  SO_LISTENINCQLEN   Return the value of so_incqlen (incomplete sockets)

  Minor white space tweaks to existing socket options to make them
  consistent.

  Discussed with: andre

Revision 1.242.2.2
Tue Sep 27 21:14:10 2005 UTC (6 years, 4 months ago) by rwatson
Branches: RELENG_6
Diff to: previous 1.242.2.1; branchpoint 1.242
Changes since revision 1.242.2.1: +17 -0 lines
Merge uipc_socket.c:1.249, socket.h:1.89 from HEAD to RELENG_6:

  Add three new read-only socket options, which allow regression tests
  and other applications to query the state of the stack regarding the
  accept queue on a listen socket:

  SO_LISTENQLIMIT    Return the value of so_qlimit (socket backlog)
  SO_LISTENQLEN      Return the value of so_qlen (complete sockets)
  SO_LISTENINCQLEN   Return the value of so_incqlen (incomplete sockets)

  Minor white space tweaks to existing socket options to make them
  consistent.

  Discussed with: andre

Approved by:	re (scottl)

Revision 1.208.2.23
Wed Sep 21 15:36:04 2005 UTC (6 years, 4 months ago) by rwatson
Branches: RELENG_5
Diff to: previous 1.208.2.22; branchpoint 1.208
Changes since revision 1.208.2.22: +1 -1 lines
Merge uipc_socket.c:1.248 from HEAD to RELENG_5:

  Fix spelling in a comment.

Revision 1.242.2.1
Wed Sep 21 15:32:21 2005 UTC (6 years, 4 months ago) by rwatson
Branches: RELENG_6
Diff to: previous 1.242
Changes since revision 1.242: +1 -1 lines
Merge uipc_socket.c:1.248 from HEAD to RELENG_6:

  Fix spelling in a comment.

Approved by:	re (scottl)

Revision 1.249: download - view: text, markup, annotated - select for diffs
Sun Sep 18 21:08:03 2005 UTC (6 years, 4 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.248: preferred, colored
Changes since revision 1.248: +17 -0 lines
Add three new read-only socket options, which allow regression tests
and other applications to query the state of the stack regarding the
accept queue on a listen socket:

SO_LISTENQLIMIT    Return the value of so_qlimit (socket backlog)
SO_LISTENQLEN      Return the value of so_qlen (complete sockets)
SO_LISTENINCQLEN   Return the value of so_incqlen (incomplete sockets)

Minor white space tweaks to existing socket options to make them
consistent.

Discussed with:	andre
MFC after:	1 week
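A minimal userland sketch of how a regression test might read the three new options with getsockopt(2). It assumes the options are read as `int` values at the `SOL_SOCKET` level, matching the integer fields they report. The helper names `query_listen_queue()` and `open_listener()` and the struct are hypothetical. The `#ifdef` guard lets it compile on systems without these FreeBSD-specific options, where it reports `ENOPROTOOPT`.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical container for the three read-only values. */
struct listen_q_info {
	int	qlimit;		/* SO_LISTENQLIMIT: so_qlimit, the backlog */
	int	qlen;		/* SO_LISTENQLEN: so_qlen, completed connections */
	int	incqlen;	/* SO_LISTENINCQLEN: so_incqlen, incomplete connections */
};

/* Hypothetical helper: 0 on success, -1 with errno set on failure. */
static int
query_listen_queue(int fd, struct listen_q_info *info)
{
#if defined(SO_LISTENQLIMIT) && defined(SO_LISTENQLEN) && defined(SO_LISTENINCQLEN)
	socklen_t len;

	len = sizeof(info->qlimit);
	if (getsockopt(fd, SOL_SOCKET, SO_LISTENQLIMIT, &info->qlimit, &len) == -1)
		return (-1);
	len = sizeof(info->qlen);
	if (getsockopt(fd, SOL_SOCKET, SO_LISTENQLEN, &info->qlen, &len) == -1)
		return (-1);
	len = sizeof(info->incqlen);
	if (getsockopt(fd, SOL_SOCKET, SO_LISTENINCQLEN, &info->incqlen, &len) == -1)
		return (-1);
	return (0);
#else
	(void)fd;
	(void)info;
	errno = ENOPROTOOPT;	/* this system does not define the options */
	return (-1);
#endif
}

/* Hypothetical helper: a TCP listener; listen() binds an ephemeral port. */
static int
open_listener(int backlog)
{
	int fd;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd == -1)
		return (-1);
	if (listen(fd, backlog) == -1) {
		close(fd);
		return (-1);
	}
	return (fd);
}
```

On a fresh listener with no pending connections, a test would expect `qlen` and `incqlen` to be zero and `qlimit` to reflect the backlog passed to listen(2), clamped by somaxconn.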

Revision 1.248: download - view: text, markup, annotated - select for diffs
Sun Sep 18 10:46:34 2005 UTC (6 years, 4 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.247: preferred, colored
Changes since revision 1.247: +1 -1 lines
Fix spelling in a comment.

MFC after:	3 days

Revision 1.247: download - view: text, markup, annotated - select for diffs
Thu Sep 15 13:18:05 2005 UTC (6 years, 4 months ago) by maxim
Branches: MAIN
Diff to: previous 1.246: preferred, colored
Changes since revision 1.246: +0 -2 lines
Back out rev. 1.246; it breaks code that uses shutdown(2) on non-connected
sockets.

Pointed out by:	rwatson

Revision 1.246: download - view: text, markup, annotated - select for diffs
Thu Sep 15 11:45:36 2005 UTC (6 years, 4 months ago) by maxim
Branches: MAIN
Diff to: previous 1.245: preferred, colored
Changes since revision 1.245: +2 -0 lines
o Return ENOTCONN when shutdown(2) is called on a non-connected socket.

PR:		kern/84761
Submitted by:	James Juran
R-test:		tools/regression/sockets/shutdown
MFC after:	1 month
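Whether shutdown(2) on a never-connected socket fails with ENOTCONN has varied between systems and revisions (rev. 1.246 added the error and rev. 1.247 backed it out). Portable callers can treat ENOTCONN as benign. A minimal sketch; the helper name `shutdown_tolerant()` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: shut down both directions of fd.  ENOTCONN, which a
 * never-connected socket may report, is treated as success.
 * Returns 0, or -1 with errno set for any other failure.
 */
static int
shutdown_tolerant(int fd)
{
	if (shutdown(fd, SHUT_RDWR) == -1 && errno != ENOTCONN)
		return (-1);
	return (0);
}
```

Other errors, such as EBADF for an invalid descriptor, are still reported to the caller.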

Revision 1.245: download - view: text, markup, annotated - select for diffs
Tue Sep 6 17:05:11 2005 UTC (6 years, 5 months ago) by glebius
Branches: MAIN
Diff to: previous 1.244: preferred, colored
Changes since revision 1.244: +1 -8 lines
In soreceive(), when the first mbuf is removed from the socket buffer,
use sockbuf_pushsync(). The previous manipulation could lead to an
inconsistent mbuf chain.

Reviewed by:	rwatson
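The following is a simplified userland model of the pointer fix-up that sockbuf_pushsync() performs after soreceive() unlinks the first mbuf of the first record. The structures are hypothetical stand-ins that carry only the fields involved (`m_next`, `m_nextpkt`, `sb_mb`, `sb_mbtail`, `sb_lastrecord`). The real function also asserts the socket-buffer lock, and `sb_remove_first()` is an illustrative helper, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mbuf: m_next links within a record, m_nextpkt links records. */
struct mbuf {
	struct mbuf	*m_next;
	struct mbuf	*m_nextpkt;
};

/* Hypothetical socket buffer with only the pointers being maintained. */
struct sockbuf {
	struct mbuf	*sb_mb;		/* first mbuf of first record */
	struct mbuf	*sb_mbtail;	/* last mbuf in the buffer */
	struct mbuf	*sb_lastrecord;	/* first mbuf of last record */
};

/* After sb_mb has been advanced, relink remaining records and fix the tails. */
static void
sockbuf_pushsync_model(struct sockbuf *sb, struct mbuf *nextrecord)
{
	if (sb->sb_mb != NULL)
		sb->sb_mb->m_nextpkt = nextrecord;	/* rest of record leads */
	else
		sb->sb_mb = nextrecord;			/* record used up */

	if (sb->sb_mb == NULL) {
		sb->sb_mbtail = NULL;
		sb->sb_lastrecord = NULL;
	} else if (sb->sb_mb->m_nextpkt == NULL)
		sb->sb_lastrecord = sb->sb_mb;
}

/* Hypothetical helper: unlink the first mbuf as soreceive() would. */
static struct mbuf *
sb_remove_first(struct sockbuf *sb)
{
	struct mbuf *m = sb->sb_mb;
	struct mbuf *nextrecord;

	assert(m != NULL);
	nextrecord = m->m_nextpkt;
	sb->sb_mb = m->m_next;
	m->m_next = m->m_nextpkt = NULL;
	sockbuf_pushsync_model(sb, nextrecord);
	return (m);
}
```

Centralizing the fix-up in one routine means every removal path updates `sb_lastrecord` and `sb_mbtail` the same way.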

Revision 1.208.2.22: download - view: text, markup, annotated - select for diffs
Fri Aug 5 13:30:10 2005 UTC (6 years, 6 months ago) by gnn
Branches: RELENG_5
Diff to: previous 1.208.2.21: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.21: +4 -1 lines
MFC
Fix for PR 83885.

Make sure that there actually is a next packet before setting
nextrecord to that field.

PR: 83885
Submitted by: hirose@comm.yamaha.co.jp
Obtained from:  Patch suggested in the PR

Revision 1.244: download - view: text, markup, annotated - select for diffs
Mon Aug 1 21:15:09 2005 UTC (6 years, 6 months ago) by kbyanc
Branches: MAIN
Diff to: previous 1.243: preferred, colored
Changes since revision 1.243: +1 -0 lines
Make getsockopt(..., SOL_SOCKET, SO_ACCEPTCONN, ...) work per IEEE Std
1003.1 (POSIX).
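POSIX specifies that getsockopt(2) with `SO_ACCEPTCONN` at the `SOL_SOCKET` level reports whether the socket is accepting connections. A minimal userland sketch; the helper name `is_listening()` is hypothetical, and it treats any nonzero option value as "listening".

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: 1 if fd is listening, 0 if not, -1 on error. */
static int
is_listening(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) == -1)
		return (-1);
	return (val != 0);
}
```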

Revision 1.243: download - view: text, markup, annotated - select for diffs
Thu Jul 28 10:10:01 2005 UTC (6 years, 6 months ago) by gnn
Branches: MAIN
Diff to: previous 1.242: preferred, colored
Changes since revision 1.242: +4 -1 lines
Fix for PR 83885.

Make sure that there actually is a next packet before setting
nextrecord to that field.

PR: 83885
Submitted by: hirose@comm.yamaha.co.jp
Obtained from:	Patch suggested in the PR
MFC after: 1 week

Revision 1.242: download - view: text, markup, annotated - select for diffs
Fri Jul 1 16:28:30 2005 UTC (6 years, 7 months ago) by ssouhlal
Branches: MAIN
CVS tags: RELENG_6_BP
Branch point for: RELENG_6
Diff to: previous 1.241: preferred, colored
Changes since revision 1.241: +4 -2 lines
Fix the recent panics/LORs/hangs created by my kqueue commit by:

- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by:	jmg
Approved by:	re (scottl), scottl (mentor override)
Pointyhat to:	ssouhlal
Will be happy:	everyone
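The locking indirection described above can be sketched in userland with pthreads. In this sketch, the list stores lock, unlock, and "is owned" callbacks and falls back to defaults when NULL is passed, much as knlist_init() falls back to mtx_lock, mtx_unlock, and mtx_owned. All names are hypothetical. Because pthread mutexes have no portable ownership query, the default lock records its owner explicitly.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical default lock: a mutex that remembers its owner. */
struct owned_mtx {
	pthread_mutex_t	mtx;
	pthread_t	owner;
	int		held;
};

static void
default_lock(void *arg)
{
	struct owned_mtx *m = arg;

	pthread_mutex_lock(&m->mtx);
	m->owner = pthread_self();
	m->held = 1;
}

static void
default_unlock(void *arg)
{
	struct owned_mtx *m = arg;

	m->held = 0;
	pthread_mutex_unlock(&m->mtx);
}

/* Meant for assertions by the thread that may hold the lock. */
static int
default_owned(void *arg)
{
	struct owned_mtx *m = arg;

	return (m->held && pthread_equal(m->owner, pthread_self()));
}

/* Hypothetical knlist: lock callbacks plus the argument passed to them. */
struct knlist_model {
	void	*kl_lockarg;
	void	(*kl_lock)(void *);
	void	(*kl_unlock)(void *);
	int	(*kl_locked)(void *);
};

/* NULL callbacks select the mutex-based defaults. */
static void
knlist_init_model(struct knlist_model *knl, void *lock,
    void (*kl_lock)(void *), void (*kl_unlock)(void *), int (*kl_locked)(void *))
{
	knl->kl_lockarg = lock;
	knl->kl_lock = (kl_lock != NULL) ? kl_lock : default_lock;
	knl->kl_unlock = (kl_unlock != NULL) ? kl_unlock : default_unlock;
	knl->kl_locked = (kl_locked != NULL) ? kl_locked : default_owned;
}
```

A vnode-style caller would pass its own three callbacks instead of NULL, so the list is protected by that lock rather than by a mutex.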

Revision 1.68.2.28: download - view: text, markup, annotated - select for diffs
Sun Jun 26 07:04:23 2005 UTC (6 years, 7 months ago) by maxim
Branches: RELENG_4
Diff to: previous 1.68.2.27: preferred, colored; branchpoint 1.68: preferred, colored; next MAIN 1.69: preferred, colored
Changes since revision 1.68.2.27: +3 -1 lines
MFC rev. 1.19 sys/kern/uipc_accf.c to sys/kern/uipc_socket.c, bug fixes for:
setsockopt(2) cannot remove accept filter, getsockopt(SO_ACCEPTFILTER)
always returns success on listen socket.

Revision 1.208.2.21: download - view: text, markup, annotated - select for diffs
Sun Jun 26 06:59:49 2005 UTC (6 years, 7 months ago) by maxim
Branches: RELENG_5
Diff to: previous 1.208.2.20: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.20: +2 -0 lines
MFC rev. 1.19 sys/kern/uipc_accf.c to sys/kern/uipc_accf.c and
sys/kern/uipc_socket.c, bug fixes for: setsockopt(2) cannot remove accept
filter, getsockopt(SO_ACCEPTFILTER) always returns success on listen socket.

MFC rev. 1.4 tools/regression/sockets/accf_data_attach/accf_data_attach.c:
Add regression tests for these bugs.

Revision 1.241: download - view: text, markup, annotated - select for diffs
Fri Jun 10 16:49:18 2005 UTC (6 years, 8 months ago) by brooks
Branches: MAIN
Diff to: previous 1.240: preferred, colored
Changes since revision 1.240: +3 -3 lines
Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
 - Struct arpcom is no longer referenced in normal interface code.
   Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
   To enforce this ac_enaddr has been renamed to _ac_enaddr.
 - The second argument to ether_ifattach is now always the mac address
   from driver private storage rather than sometimes being ac_enaddr.

Reviewed by:	sobomax, sam

Revision 1.68.2.27: download - view: text, markup, annotated - select for diffs
Thu Jun 9 20:06:44 2005 UTC (6 years, 8 months ago) by scottl
Branches: RELENG_4
Diff to: previous 1.68.2.26: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.26: +1 -3 lines
Back out the previous revision to unbreak UDP and NFS.

Submitted by: rwatson

Revision 1.208.2.20: download - view: text, markup, annotated - select for diffs
Thu Jun 9 20:04:04 2005 UTC (6 years, 8 months ago) by jmg
Branches: RELENG_5
Diff to: previous 1.208.2.19: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.19: +1 -3 lines
Back out the previous commit, as it breaks NFS.

Revision 1.240: download - view: text, markup, annotated - select for diffs
Thu Jun 9 19:59:09 2005 UTC (6 years, 8 months ago) by scottl
Branches: MAIN
Diff to: previous 1.239: preferred, colored
Changes since revision 1.239: +966 -504 lines
Drat!  Committed from the wrong branch.  Restore HEAD to its previous goodness.

Revision 1.239: download - view: text, markup, annotated - select for diffs
Thu Jun 9 19:56:38 2005 UTC (6 years, 8 months ago) by scottl
Branches: MAIN
Diff to: previous 1.238: preferred, colored
Changes since revision 1.238: +504 -966 lines
Back out 1.68.2.26.  It was a mis-guided change that was already backed out
of HEAD and should not have been MFC'd.  This will restore UDP socket
functionality, which will correct the recent NFS problems.

Submitted by: rwatson

Revision 1.68.2.26: download - view: text, markup, annotated - select for diffs
Tue Jun 7 07:12:58 2005 UTC (6 years, 8 months ago) by jmg
Branches: RELENG_4
Diff to: previous 1.68.2.25: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.25: +3 -1 lines
MFC: v1.209
Make sure that the socket is either accepting connections or is connected
when attaching a knote to it; otherwise, return EINVAL.

Suggested by:	dclark at applmath.scu.edu

Revision 1.208.2.19: download - view: text, markup, annotated - select for diffs
Tue Jun 7 07:11:03 2005 UTC (6 years, 8 months ago) by jmg
Branches: RELENG_5
Diff to: previous 1.208.2.18: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.18: +3 -1 lines
MFC: v1.209
Make sure that the socket is either accepting connections or is connected
when attaching a knote to it; otherwise, return EINVAL.

Revision 1.238: download - view: text, markup, annotated - select for diffs
Sun Jun 5 17:13:23 2005 UTC (6 years, 8 months ago) by gallatin
Branches: MAIN
Diff to: previous 1.237: preferred, colored
Changes since revision 1.237: +4 -7 lines
Allow sends from userspace addresses that are not page-aligned to be
considered for zero-copy sends.

Reviewed by: alc
Submitted by: Romer Gil at Rice University

Revision 1.208.2.18: download - view: text, markup, annotated - select for diffs
Thu Mar 31 22:35:23 2005 UTC (6 years, 10 months ago) by sobomax
Branches: RELENG_5
CVS tags: RELENG_5_4_BP, RELENG_5_4_0_RELEASE, RELENG_5_4
Diff to: previous 1.208.2.17: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.17: +11 -2 lines
MFC: When re-connecting an already connected datagram socket, make sure to
     clear its pending error state.

Approved by:	re (kensmith)

Revision 1.237: download - view: text, markup, annotated - select for diffs
Sat Mar 12 12:57:17 2005 UTC (6 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.236: preferred, colored
Changes since revision 1.236: +1 -18 lines
Move the logic implementing retrieval of the SO_ACCEPTFILTER socket option
from uipc_socket.c to uipc_accf.c in do_getopt_accept_filter(), so that it
now matches do_setopt_accept_filter().  Slightly reformulate the logic to
match the optimistic allocation of storage for the argument in advance,
and slightly expand the coverage of the socket lock.

Revision 1.236: download - view: text, markup, annotated - select for diffs
Fri Mar 11 19:16:02 2005 UTC (6 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.235: preferred, colored
Changes since revision 1.235: +0 -1 lines
Remove an additional commented out reference to a possible future sx
lock.

Revision 1.235: download - view: text, markup, annotated - select for diffs
Fri Mar 11 16:30:02 2005 UTC (6 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.234: preferred, colored
Changes since revision 1.234: +1 -3 lines
When setting up a socket in socreate(), there's no need to lock the
socket lock around knlist_init(), so don't.

Hard code the setting of the socket reference count to 1 rather than
using soref() to avoid asserting the socket lock, since we've not yet
exposed the socket to other threads.

This removes two mutex operations from each socket allocation.

Revision 1.234: download - view: text, markup, annotated - select for diffs
Fri Mar 11 16:26:33 2005 UTC (6 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.233: preferred, colored
Changes since revision 1.233: +0 -1 lines
Remove suggestive sx_init() comment in soalloc().  We will have something
like this at some point, but for now it clutters the source.

Revision 1.208.2.17: download - view: text, markup, annotated - select for diffs
Mon Mar 7 13:08:03 2005 UTC (6 years, 11 months ago) by rwatson
Branches: RELENG_5
Diff to: previous 1.208.2.16: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.16: +42 -14 lines
Merge uipc_socket.c:1.233, uipc_usrreq.c:1.151, atm_cm.c:1.33,
atm_socket.c:1.23, atm_var.h:1.26, ipatm_load.c:1.l21,
ng_btsocket_l2cap.c:1.16, ng_btsocket_rfcomm.c:1.15, tcp_usrreq.c:1.15,
spx_usrreq.c:1.62, socketvar.h:1.139 from HEAD to RELENG_5:

  In the current world order, solisten() implements the state transition of
  a socket from a regular socket to a listening socket able to accept new
  connections.  As part of this state transition, solisten() calls into the
  protocol to update protocol-layer state.  There were several bugs in this
  implementation that could result in a race wherein a TCP SYN received
  in the interval between the protocol state transition and the shortly
  following socket layer transition would result in a panic in the TCP code,
  as the socket would be in the TCPS_LISTEN state, but the socket would not
  have the SO_ACCEPTCONN flag set.

  This change does the following:

  - Pushes the socket state transition from the socket layer solisten()
    to socket "library" routines called from the protocol.  This permits
    the socket routines to be called while holding the protocol mutexes,
    preventing a race exposing the incomplete socket state transition to TCP
    after the TCP state transition has completed.  The check for a socket
    layer state transition is performed by solisten_proto_check(), and the
    actual transition is performed by solisten_proto().

  - Holds the socket lock for the duration of the socket state test and set,
    and over the protocol layer state transition, which is now possible as
    the socket lock is acquired by the protocol layer, rather than vice
    versa.  This prevents additional state related races in the socket
    layer.

  This permits the dual transition of socket layer and protocol layer state
  to occur while holding locks for both layers, making the two changes
  atomic with respect to one another.  Similar changes are likely required
  elsewhere in the socket/protocol code.

  Reported by:            Peter Holm <peter@holm.cc>
  Review and fixes from:  emax, Antoine Brodin <antoine.brodin@laposte.net>
  Philosophical head nod: gnn

Note that this changes the behavior of the pru_listen() protocol entry point;
all protocols are updated to match the new behavior.  We do not know of any
third party protocol implementations that this might cause problems for.

Approved by:	re (kensmith)

Revision 1.208.2.16: download - view: text, markup, annotated - select for diffs
Fri Feb 25 17:57:30 2005 UTC (6 years, 11 months ago) by rwatson
Branches: RELENG_5
Diff to: previous 1.208.2.15: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.15: +2 -1 lines
Merge uipc_socket.c:1.232 from HEAD to RELENG_5:

  In soreceive(), when considering delivery to a socket in SS_ISCONFIRMING,
  only call the protocol's pru_rcvd() if the protocol has the flag
  PR_WANTRCVD set.  This brings that instance of pru_rcvd() into line with
  the rest, which do check the flag.

Revision 1.208.2.15: download - view: text, markup, annotated - select for diffs
Fri Feb 25 17:55:05 2005 UTC (6 years, 11 months ago) by rwatson
Branches: RELENG_5
Diff to: previous 1.208.2.14: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.14: +0 -126 lines
Merge uipc_accf.c:1.14, uipc_socket.c:1.229, and socketvar.h:1.138 from
HEAD to RELENG_5:

  Move do_setopt_accept_filter() from uipc_socket.c to uipc_accf.c, where
  the rest of the accept filter code currently lives.

Revision 1.208.2.14: download - view: text, markup, annotated - select for diffs
Fri Feb 25 17:52:12 2005 UTC (6 years, 11 months ago) by rwatson
Branches: RELENG_5
Diff to: previous 1.208.2.13: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.13: +4 -6 lines
Merge uipc_socket.c:1.227 from HEAD to RELENG_5:

  date: 2005/02/18 00:52:17;  author: rwatson;  state: Exp;  lines: +4 -6
  In solisten(), unconditionally set the SO_ACCEPTCONN option in
  so->so_options when solisten() will succeed, rather than setting it
  conditionally based on there not being queued sockets in the completed
  socket queue.  Otherwise, if the protocol exposes new sockets via the
  completed queue before solisten() completes, the listen() system call
  will succeed, but the socket and protocol state will be out of sync.
  For TCP, this didn't happen in practice, as the TCP code will panic if
  a new connection comes in after the tcpcb has been transitioned to a
  listening state but the socket doesn't have SO_ACCEPTCONN set.

  This is historical behavior resulting from bitrot since 4.3BSD, in which
  that line of code was associated with the conditional NULL'ing of the
  connection queue pointers (one-time initialization to be performed
  during the transition to a listening socket), which are now initialized
  separately.

  Discussed with: fenner, gnn

Revision 1.233: download - view: text, markup, annotated - select for diffs
Mon Feb 21 21:58:16 2005 UTC (6 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.232: preferred, colored
Changes since revision 1.232: +42 -14 lines
In the current world order, solisten() implements the state transition of
a socket from a regular socket to a listening socket able to accept new
connections.  As part of this state transition, solisten() calls into the
protocol to update protocol-layer state.  There were several bugs in this
implementation that could result in a race wherein a TCP SYN received
in the interval between the protocol state transition and the shortly
following socket layer transition would result in a panic in the TCP code,
as the socket would be in the TCPS_LISTEN state, but the socket would not
have the SO_ACCEPTCONN flag set.

This change does the following:

- Pushes the socket state transition from the socket layer solisten()
  to socket "library" routines called from the protocol.  This permits
  the socket routines to be called while holding the protocol mutexes,
  preventing a race exposing the incomplete socket state transition to TCP
  after the TCP state transition has completed.  The check for a socket
  layer state transition is performed by solisten_proto_check(), and the
  actual transition is performed by solisten_proto().

- Holds the socket lock for the duration of the socket state test and set,
  and over the protocol layer state transition, which is now possible as
  the socket lock is acquired by the protocol layer, rather than vice
  versa.  This prevents additional state related races in the socket
  layer.

This permits the dual transition of socket layer and protocol layer state
to occur while holding locks for both layers, making the two changes
atomic with respect to one another.  Similar changes are likely required
elsewhere in the socket/protocol code.

Reported by:		Peter Holm <peter@holm.cc>
Review and fixes from:	emax, Antoine Brodin <antoine.brodin@laposte.net>
Philosophical head nod:	gnn
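A simplified userland model of the new ordering follows. The protocol's listen routine takes its own lock and then the socket lock, and it performs the socket-layer check, the protocol-state change, and the socket-layer transition before releasing either lock. A receiver that takes the same locks therefore cannot see TCPS_LISTEN without SO_ACCEPTCONN. All `mdl_` names, flag values, and the single "already connected" check are hypothetical simplifications. The real routines operate on struct socket and the inpcb/tcpcb.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical model flags; the values are arbitrary. */
#define MDL_SO_ACCEPTCONN	0x1	/* socket layer: listening */
#define MDL_SS_ISCONNECTED	0x2	/* socket layer: connected */

enum mdl_tcp_state { MDL_TCPS_CLOSED, MDL_TCPS_LISTEN };

struct mdl_socket {
	pthread_mutex_t	so_lock;	/* stands in for the socket lock */
	int		so_options;
	int		so_state;
};

struct mdl_tcpcb {
	pthread_mutex_t		inp_lock;	/* stands in for the protocol lock */
	enum mdl_tcp_state	t_state;
	struct mdl_socket	*t_socket;
};

/* Socket-layer check, like solisten_proto_check(); caller holds so_lock. */
static int
mdl_solisten_proto_check(struct mdl_socket *so)
{
	if (so->so_state & MDL_SS_ISCONNECTED)
		return (EINVAL);
	return (0);
}

/* Socket-layer transition, like solisten_proto(); caller holds so_lock. */
static void
mdl_solisten_proto(struct mdl_socket *so)
{
	so->so_options |= MDL_SO_ACCEPTCONN;
}

/* Protocol listen routine: both state changes happen under both locks. */
static int
mdl_tcp_listen(struct mdl_tcpcb *tp)
{
	struct mdl_socket *so = tp->t_socket;
	int error;

	pthread_mutex_lock(&tp->inp_lock);
	pthread_mutex_lock(&so->so_lock);
	error = mdl_solisten_proto_check(so);
	if (error == 0) {
		tp->t_state = MDL_TCPS_LISTEN;
		mdl_solisten_proto(so);
	}
	pthread_mutex_unlock(&so->so_lock);
	pthread_mutex_unlock(&tp->inp_lock);
	return (error);
}
```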

Revision 1.208.2.13: download - view: text, markup, annotated - select for diffs
Mon Feb 21 12:34:32 2005 UTC (6 years, 11 months ago) by rwatson
Branches: RELENG_5
Diff to: previous 1.208.2.12: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.12: +1 -1 lines
Merge uipc_socket.c:1.231 from HEAD to RELENG_5:

  date: 2005/02/18 19:15:22;  author: rwatson;  state: Exp;  lines: +1 -1
  Correct a typo in the comment describing soreceive_rcvoob().

Revision 1.208.2.12: download - view: text, markup, annotated - select for diffs
Mon Feb 21 10:59:29 2005 UTC (6 years, 11 months ago) by rwatson
Branches: RELENG_5
Diff to: previous 1.208.2.11: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.11: +3 -3 lines
Merge uipc_socket.c:1.228 from HEAD to RELENG_5:

  date: 2005/02/18 18:43:33;  author: rwatson;  state: Exp;  lines: +3 -3
  Re-order checks in socheckuid() so that we check all deny cases before
  returning accept.

Revision 1.232: download - view: text, markup, annotated - select for diffs
Sun Feb 20 15:54:44 2005 UTC (6 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.231: preferred, colored
Changes since revision 1.231: +2 -1 lines
In soreceive(), when considering delivery to a socket in SS_ISCONFIRMING,
only call the protocol's pru_rcvd() if the protocol has the flag
PR_WANTRCVD set.  This brings that instance of pru_rcvd() into line with
the rest, which do check the flag.

MFC after:	3 days
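The guarded call can be sketched with a hypothetical, reduced protocol switch in which pru_rcvd() is invoked only when the protocol sets PR_WANTRCVD. The structure names, the flag value, and the call counter are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MDL_PR_WANTRCVD	0x08	/* illustrative flag value */

struct mdl_socket;

/* Hypothetical protocol switch with only the fields used here. */
struct mdl_protosw {
	int	pr_flags;
	int	(*pru_rcvd)(struct mdl_socket *so, int flags);
};

struct mdl_socket {
	const struct mdl_protosw	*so_proto;
	int				rcvd_calls;	/* model instrumentation */
};

/* Notify the protocol of consumed data only if it asked to be told. */
static void
mdl_notify_rcvd(struct mdl_socket *so, int flags)
{
	if (so->so_proto->pr_flags & MDL_PR_WANTRCVD)
		(void)so->so_proto->pru_rcvd(so, flags);
}

/* Example pru_rcvd() that just counts invocations. */
static int
mdl_count_rcvd(struct mdl_socket *so, int flags)
{
	(void)flags;
	so->rcvd_calls++;
	return (0);
}
```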

Revision 1.208.2.11: download - view: text, markup, annotated - select for diffs
Sat Feb 19 20:56:07 2005 UTC (6 years, 11 months ago) by glebius
Branches: RELENG_5
Diff to: previous 1.208.2.10: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.10: +8 -0 lines
Apply a workaround for the problem fixed in sys/socketvar.h rev. 1.137. We
can't MFC socketvar.h since it would break the ABI between the kernel and
several modules. This workaround saves sb_state in a local variable and
restores it after bzero().

Original commit log of socketvar.h and problem description:
  Move sb_state to the beginning of the structure, above the sb_startzero member.
  sb_state shouldn't be erased, when socket buffer is flushed by sorflush().

  When sb_state was bzero'ed, a recently set SBS_CANTRCVMORE flag was cleared.
  If a socket was shutdown(SHUT_RD), a subsequent read() would block on it.

  Reported by:    Ed Maste, Gerrit Nagelhout
  Reviewed by:    rwatson
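The workaround can be illustrated with a hypothetical, reduced structure. Everything from `sb_startzero` onward is cleared with one memset(), so a flag stored in that region, as `sb_state` was in RELENG_5, is lost unless it is saved and restored. The field layout and flag value are assumptions for illustration, not the real struct sockbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MDL_SBS_CANTRCVMORE	0x0020	/* illustrative flag value */

/* Hypothetical reduced socket buffer with sb_state inside the zeroed region. */
struct mdl_sockbuf {
	int		sb_flags;	/* before sb_startzero: survives */
#define	sb_startzero	sb_cc
	unsigned int	sb_cc;		/* cleared */
	unsigned int	sb_mbcnt;	/* cleared */
	short		sb_state;	/* cleared too, unless saved */
};

/* Zero everything from sb_startzero on; optionally apply the workaround. */
static void
mdl_sb_zero(struct mdl_sockbuf *sb, int workaround)
{
	short state = sb->sb_state;	/* saved before the bzero() */

	memset(&sb->sb_startzero, 0,
	    sizeof(*sb) - offsetof(struct mdl_sockbuf, sb_startzero));
	if (workaround)
		sb->sb_state = state;	/* keep SBS_CANTRCVMORE */
}
```

Without the restore, a read() following shutdown(SHUT_RD) would no longer see the "can't receive more" flag and could block, which is the symptom described above.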

Revision 1.208.2.10: download - view: text, markup, annotated - select for diffs
Sat Feb 19 14:17:12 2005 UTC (6 years, 11 months ago) by glebius
Branches: RELENG_5
Diff to: previous 1.208.2.9: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.9: +23 -2 lines
MFC 1.226, with one change: use SHRT_MAX instead of USHRT_MAX, since
sys/socketvar.h rev. 1.136 was not merged due to the API freeze.

  - Add SYSCTL_PROC() around somaxconn, not accepting values < 1 or > SHRT_MAX.

  Before this change, setting somaxconn to something above 32767 and calling
  listen(fd, -1) led to a socket that did not accept connections at all.

  Reviewed by:    rwatson
  Reported by:    Igor Sysoev

Revision 1.231: download - view: text, markup, annotated - select for diffs
Fri Feb 18 19:15:22 2005 UTC (6 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.230: preferred, colored
Changes since revision 1.230: +1 -1 lines
Correct a typo in the comment describing soreceive_rcvoob().

MFC after:	3 days

Revision 1.230: download - view: text, markup, annotated - select for diffs
Fri Feb 18 19:13:51 2005 UTC (6 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.229: preferred, colored
Changes since revision 1.229: +0 -2 lines
In soconnect(), when resetting so->so_error, the socket lock is not
required due to a straight integer write in which minor races are not
a problem.

Revision 1.229: download - view: text, markup, annotated - select for diffs
Fri Feb 18 18:54:42 2005 UTC (6 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.228: preferred, colored
Changes since revision 1.228: +0 -126 lines
Move do_setopt_accept_filter() from uipc_socket.c to uipc_accf.c, where
the rest of the accept filter code currently lives.

MFC after:	3 days

Revision 1.228: download - view: text, markup, annotated - select for diffs
Fri Feb 18 18:43:33 2005 UTC (6 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.227: preferred, colored
Changes since revision 1.227: +3 -3 lines
Re-order checks in socheckuid() so that we check all deny cases before
returning accept.

MFC after:	3 days

Revision 1.227: download - view: text, markup, annotated - select for diffs
Fri Feb 18 00:52:17 2005 UTC (6 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.226: preferred, colored
Changes since revision 1.226: +4 -6 lines
In solisten(), unconditionally set the SO_ACCEPTCONN option in
so->so_options when solisten() will succeed, rather than setting it
conditionally based on there not being queued sockets in the completed
socket queue.  Otherwise, if the protocol exposes new sockets via the
completed queue before solisten() completes, the listen() system call
will succeed, but the socket and protocol state will be out of sync.
For TCP, this didn't happen in practice, as the TCP code will panic if
a new connection comes in after the tcpcb has been transitioned to a
listening state but the socket doesn't have SO_ACCEPTCONN set.

This is historical behavior resulting from bitrot since 4.3BSD, in which
that line of code was associated with the conditional NULL'ing of the
connection queue pointers (one-time initialization to be performed
during the transition to a listening socket), which are now initialized
separately.

Discussed with:	fenner, gnn
MFC after:	3 days

Revision 1.208.2.9: download - view: text, markup, annotated - select for diffs
Mon Jan 31 23:26:18 2005 UTC (7 years ago) by imp
Branches: RELENG_5
Diff to: previous 1.208.2.8: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.8: +1 -1 lines
MFC: /*- and related license changes

Revision 1.226: download - view: text, markup, annotated - select for diffs
Mon Jan 24 12:20:20 2005 UTC (7 years ago) by glebius
Branches: MAIN
Diff to: previous 1.225: preferred, colored
Changes since revision 1.225: +23 -2 lines
- Convert so_qlen, so_incqlen, so_qlimit fields of struct socket from
  short to unsigned short.
- Add SYSCTL_PROC() around somaxconn, not accepting values < 1 or > USHRT_MAX.

Before this change, setting somaxconn to something above 32767 and calling
listen(fd, -1) led to a socket that did not accept connections at all.

Reviewed by:	rwatson
Reported by:	Igor Sysoev
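A small arithmetic sketch shows why the bound matters. With listen(fd, -1), the backlog becomes somaxconn. Stored in a 16-bit signed so_qlimit, a value such as 40000 wraps to 40000 - 65536 = -25536 on two's-complement machines. An overflow test of the classic BSD form `qlen > 3 * qlimit / 2` then treats the queue as full even when it is empty. The helper names are hypothetical, and the overflow test is an assumption modeled on sonewconn(), not code copied from this revision.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the range check in the new sysctl handler. */
static int
somaxconn_ok(int val, int maxval)
{
	return (val >= 1 && val <= maxval);
}

/* Hypothetical overflow test, modeled on the classic sonewconn() check. */
static int
accept_queue_full(int qlen, int qlimit)
{
	return (qlen > 3 * qlimit / 2);
}

/* Value a 16-bit two's-complement short would hold, for v in [0, 65535]. */
static int
as_int16(int v)
{
	return (v > SHRT_MAX ? v - 65536 : v);
}
```

With the bound enforced, a limit that does not fit the field is rejected when it is set, instead of silently producing a listener that refuses every connection.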

Revision 1.225: download - view: text, markup, annotated - select for diffs
Wed Jan 12 10:15:23 2005 UTC (7 years ago) by sobomax
Branches: MAIN
Diff to: previous 1.224: preferred, colored
Changes since revision 1.224: +11 -2 lines
When re-connecting an already connected datagram socket, make sure to clear
its pending error state. That state may be set under some rare conditions,
causing the connect() syscall to return a bogus error and making the
application believe that the attempt to change the association failed,
when in fact it succeeded.

The sockets/reconnect regression test exercises this bug.

MFC after:	2 weeks

Revision 1.224: download - view: text, markup, annotated - select for diffs
Thu Jan 6 23:35:40 2005 UTC (7 years, 1 month ago) by imp
Branches: MAIN
Diff to: previous 1.223: preferred, colored
Changes since revision 1.223: +1 -1 lines
/* -> /*- for copyright notices, minor format tweaks as necessary

Revision 1.208.2.8: download - view: text, markup, annotated - select for diffs
Thu Jan 6 20:24:58 2005 UTC (7 years, 1 month ago) by rwatson
Branches: RELENG_5
Diff to: previous 1.208.2.7: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.7: +4 -12 lines
Merge uipc_socket.c:1.223 from HEAD to RELENG_5:

  date: 2004/12/23 01:07:12;  author: rwatson;  state: Exp;  lines: +4 -12
  Remove an XXXRW indicating atomic operations might be used as a
  substitute for a global mutex protecting the socket count and
  generation number.

  The observation that soreceive_rcvoob() can't return an mbuf
  chain is a property, not a bug, so remove the XXXRW.

  In sorflush, s/existing/previous/ for code when describing prior
  behavior.

  For SO_LINGER socket option retrieval, remove an XXXRW about why
  we hold the mutex: this is correct and not dubious.

  MFC after:      2 weeks

Revision 1.208.2.7: download - view: text, markup, annotated - select for diffs
Thu Jan 6 20:23:57 2005 UTC (7 years, 1 month ago) by rwatson
Branches: RELENG_5
Diff to: previous 1.208.2.6: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.6: +3 -14 lines
Merge uipc_socket.c:1.222 from HEAD to RELENG_5:

  date: 2004/12/23 00:59:43;  author: rwatson;  state: Exp;  lines: +3 -14
  In soalloc(), simplify the mac_init_socket() handling to remove
  unnecessary use of a global variable and simplify the return case.
  While here, use ()'s around return values.

  In sodealloc(), remove a comment about why we bump the gencnt and
  decrement the socket count separately.  It doesn't add
  substantially to the reading, and clutters the function.

  MFC after:      2 weeks

Revision 1.223: download - view: text, markup, annotated - select for diffs
Thu Dec 23 01:07:12 2004 UTC (7 years, 1 month ago) by rwatson
Branches: MAIN
Diff to: previous 1.222: preferred, colored
Changes since revision 1.222: +4 -12 lines
Remove an XXXRW indicating atomic operations might be used as a
substitute for a global mutex protecting the socket count and
generation number.

The observation that soreceive_rcvoob() can't return an mbuf
chain is a property, not a bug, so remove the XXXRW.

In sorflush, s/existing/previous/ for code when describing prior
behavior.

For SO_LINGER socket option retrieval, remove an XXXRW about why
we hold the mutex: this is correct and not dubious.

MFC after:	2 weeks

Revision 1.222: download - view: text, markup, annotated - select for diffs
Thu Dec 23 00:59:43 2004 UTC (7 years, 1 month ago) by rwatson
Branches: MAIN
Diff to: previous 1.221: preferred, colored
Changes since revision 1.221: +3 -14 lines
In soalloc(), simplify the mac_init_socket() handling to remove
unnecessary use of a global variable and simplify the return case.
While here, use ()'s around return values.

In sodealloc(), remove a comment about why we bump the gencnt and
decrement the socket count separately.  It doesn't add
substantially to the reading, and clutters the function.

MFC after:	2 weeks

Revision 1.221: download - view: text, markup, annotated - select for diffs
Fri Dec 10 04:49:13 2004 UTC (7 years, 2 months ago) by alc
Branches: MAIN
Diff to: previous 1.220: preferred, colored
Changes since revision 1.220: +0 -12 lines
Remove unneeded code from the zero-copy receive path.

Discussed with: gallatin@
Tested by: ken@

Revision 1.220: download - view: text, markup, annotated - select for diffs
Wed Dec 8 05:25:08 2004 UTC (7 years, 2 months ago) by alc
Branches: MAIN
Diff to: previous 1.219: preferred, colored
Changes since revision 1.219: +2 -3 lines
Tidy up the zero-copy receive path: Remove an unneeded argument to
uiomoveco() and userspaceco().

Revision 1.219: download - view: text, markup, annotated - select for diffs
Mon Nov 29 23:10:59 2004 UTC (7 years, 2 months ago) by ps
Branches: MAIN
Diff to: previous 1.218: preferred, colored
Changes since revision 1.218: +7 -1 lines
If soreceive() is called from a socket callback, there's no reason
to do a window update to the peer (through an ACK) from soreceive()
itself. TCP will do that upon return from the socket callback.
Sending a window update from soreceive() results in a lock reversal.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by:	rwatson

Revision 1.218: download - view: text, markup, annotated - select for diffs
Mon Nov 29 23:09:07 2004 UTC (7 years, 2 months ago) by ps
Branches: MAIN
Diff to: previous 1.217: preferred, colored
Changes since revision 1.217: +21 -3 lines
Make soreceive(MSG_DONTWAIT) nonblocking. If MSG_DONTWAIT is passed into
soreceive(), then pass in M_DONTWAIT to m_copym(). Also fix up error
handling for the case where m_copym() returns failure.

Submitted by:	Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by:	rwatson

Revision 1.208.2.6: download - view: text, markup, annotated - select for diffs
Tue Nov 16 08:15:07 2004 UTC (7 years, 2 months ago) by glebius
Branches: RELENG_5
Diff to: previous 1.208.2.5: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.5: +3 -3 lines
MFC 1.217:
  Since sb_timeo type was increased to int, use INT_MAX instead of SHRT_MAX.

Revision 1.217: download - view: text, markup, annotated - select for diffs
Tue Nov 9 18:35:26 2004 UTC (7 years, 3 months ago) by glebius
Branches: MAIN
Diff to: previous 1.216: preferred, colored
Changes since revision 1.216: +3 -3 lines
Since sb_timeo type was increased to int, use INT_MAX instead of SHRT_MAX.
This also gives us the ability to close the PR.

PR:		kern/42352
Approved by:	julian (mentor)
MFC after:	1 week
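
This rough sketch shows why the wider type matters. It converts a seconds/microseconds timeout into ticks and bounds the result by INT_MAX. The function name, the hz/tick parameters and the EDOM return are assumptions for illustration. With SHRT_MAX as the ceiling, a 1000 Hz clock could hold only 32767 ticks (about 32.8 seconds).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Convert sec + usec into clock ticks, bounded by INT_MAX.
 * 'hz' is ticks per second, 'tick' is microseconds per tick.
 * Returns 0 and stores *ticks, or EDOM if the value is out of range.
 */
static int
timeout_to_ticks(long sec, long usec, int hz, int tick, int *ticks)
{
	long long val;

	assert(hz > 0 && tick > 0);
	if (sec < 0 || sec > INT_MAX / hz || usec < 0 || usec >= 1000000)
		return (EDOM);
	/* Widen before multiplying so the range check below is exact. */
	val = (long long)sec * hz + usec / tick;
	if (val > INT_MAX)
		return (EDOM);
	*ticks = (int)val;
	return (0);
}
```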

Revision 1.208.2.3.2.2: download - view: text, markup, annotated - select for diffs
Thu Nov 4 01:17:31 2004 UTC (7 years, 3 months ago) by rwatson
Branches: RELENG_5_3
CVS tags: RELENG_5_3_0_RELEASE
Diff to: previous 1.208.2.3.2.1: preferred, colored; branchpoint 1.208.2.3: preferred, colored; next MAIN 1.208.2.4: preferred, colored
Changes since revision 1.208.2.3.2.1: +1 -0 lines
Merged uipc_socket.c:1.216 from HEAD to RELENG_5_3:

  date: 2004/11/02 17:15:13;  author: rwatson;  state: Exp;  lines: +1 -0
  Acquire the accept mutex in soabort() before calling sotryfree(), as
  that is now required.

  RELENG_5_3 candidate.

  Foot provided by:       Dikshie <dikshie at ppk dot itb dot ac dot id>

Approved by:	re (kensmith)

Revision 1.208.2.5: download - view: text, markup, annotated - select for diffs
Wed Nov 3 21:11:34 2004 UTC (7 years, 3 months ago) by rwatson
Branches: RELENG_5
Diff to: previous 1.208.2.4: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.4: +1 -0 lines
Merge uipc_socket.c:1.216 from HEAD to RELENG_5:

  date: 2004/11/02 17:15:13;  author: rwatson;  state: Exp;  lines: +1 -0
  Acquire the accept mutex in soabort() before calling sotryfree(), as
  that is now required.

  RELENG_5_3 candidate.

  Foot provided by:       Dikshie <dikshie at ppk dot itb dot ac dot id>

Approved by:	re (kensmith)

Revision 1.216: download - view: text, markup, annotated - select for diffs
Tue Nov 2 17:15:13 2004 UTC (7 years, 3 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.215: preferred, colored
Changes since revision 1.215: +1 -0 lines
Acquire the accept mutex in soabort() before calling sotryfree(), as
that is now required.

RELENG_5_3 candidate.

Foot provided by:	Dikshie <dikshie at ppk dot itb dot ac dot id>

Revision 1.215: download - view: text, markup, annotated - select for diffs
Sat Oct 23 19:06:43 2004 UTC (7 years, 3 months ago) by andre
Branches: MAIN
Diff to: previous 1.214: preferred, colored
Changes since revision 1.214: +2 -1 lines
socreate() does an early abort if either the protocol cannot be found,
or pru_attach is NULL.  With loadable protocols the SPACER dummy protocols
have valid function pointers for all methods to functions returning just
EOPNOTSUPP.  Thus the early abort check would not detect immediately that
attach is not supported for this protocol.  Instead it would correctly
get the EOPNOTSUPP error later on when it calls the protocol specific
attach function.

Add testing against the pru_attach_notsupp() function pointer to the
early abort check as well.
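
The extended early-abort test might look roughly like this. The types and the hyp_attach_notsupp() placeholder are simplified, hypothetical stand-ins for the protocol switch and pru_attach_notsupp(). Returning EPROTONOSUPPORT is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the protocol switch types. */
typedef int (*attach_fn)(void *so, int proto);

struct hyp_usrreqs {
	attach_fn pru_attach;
};

struct hyp_protosw {
	const struct hyp_usrreqs *pr_usrreqs;
};

/* Dummy method installed in SPACER slots: always "not supported". */
static int
hyp_attach_notsupp(void *so, int proto)
{
	(void)so;
	(void)proto;
	return (EOPNOTSUPP);
}

/* A protocol that really implements attach. */
static int
hyp_attach_ok(void *so, int proto)
{
	(void)so;
	(void)proto;
	return (0);
}

/*
 * Early abort: reject a missing protocol, a NULL attach method, or the
 * placeholder method that would only fail later with EOPNOTSUPP.
 */
static int
socreate_precheck(const struct hyp_protosw *prp)
{
	if (prp == NULL || prp->pr_usrreqs->pru_attach == NULL ||
	    prp->pr_usrreqs->pru_attach == hyp_attach_notsupp)
		return (EPROTONOSUPPORT);
	return (0);
}
```

Comparing function pointers works here because every SPACER slot shares the same placeholder function.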

Revision 1.208.2.3.2.1: download - view: text, markup, annotated - select for diffs
Thu Oct 21 09:30:46 2004 UTC (7 years, 3 months ago) by rwatson
Branches: RELENG_5_3
Diff to: previous 1.208.2.3: preferred, colored
Changes since revision 1.208.2.3: +4 -3 lines
Merge kern_descrip.c:1.246, uipc_socket.c:1.214, uipc_usrreq.c:1.141,
raw_cb.c:1.30, raw_usrreq.c:1.35, ddp_pcb.c:1.45, atm_socket.c:1.21,
ng_btsocket_hci_raw.c:1.16, ng_btsocket_l2cap.c:1.14,
ng_btsocket_l2cap_raw.c:1.13, ng_btsocket_rfcomm.c:1.13, in_pcb.c:1.156,
tcp_subr.c:1.205, in6_pcb.c:1.61, ipx_pcb.c:1.29, ipx_usrreq.c:1.41,
natm.c:1.35, socketvar.h:1.135 from HEAD to RELENG_5_3:

  Push acquisition of the accept mutex out of sofree() into the caller
  (sorele()/sotryfree()):

  - This permits the caller to acquire the accept mutex before the socket
    mutex, avoiding sofree() having to drop the socket mutex and re-order,
    which could lead to races permitting more than one thread to enter
    sofree() after a socket is ready to be free'd.

  - This also covers clearing of the so_pcb weak socket reference from
    the protocol to the socket, preventing races in clearing and
    evaluation of the reference such that sofree() might be called more
    than once on the same socket.

  This appears to close a race I was able to easily trigger by repeatedly
  opening and resetting TCP connections to a host, in which the
  tcp_close() code called as a result of the RST raced with the close()
  of the accepted socket in the user process resulting in simultaneous
  attempts to de-allocate the same socket.  The new locking increases
  the overhead for operations that may potentially free the socket, so we
  will want to revise the synchronization strategy here as we normalize
  the reference counting model for sockets.  The use of the accept mutex
  in freeing of sockets that are not listen sockets is primarily
  motivated by the potential need to remove the socket from the
  incomplete connection queue on its parent (listen) socket, so cleaning
  up the reference model here may allow us to substantially weaken the
  synchronization requirements.

  RELENG_5_3 candidate.

  MFC after:      3 days
  Reviewed by:    dwhite
  Discussed with: gnn, dwhite, green
  Reported by:    Marc UBM Bocklet <ubm at u-boot-man dot de>
  Reported by:    Vlad <marchenko at gmail dot com>

Approved by:    re (scottl)

Revision 1.208.2.4: download - view: text, markup, annotated - select for diffs
Wed Oct 20 12:42:54 2004 UTC (7 years, 3 months ago) by rwatson
Branches: RELENG_5
Diff to: previous 1.208.2.3: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.3: +4 -3 lines
Merge kern_descrip.c:1.246, uipc_socket.c:1.214, uipc_usrreq.c:1.141,
raw_cb.c:1.30, raw_usrreq.c:1.35, ddp_pcb.c:1.45, atm_socket.c:1.21,
ng_btsocket_hci_raw.c:1.16, ng_btsocket_l2cap.c:1.14,
ng_btsocket_l2cap_raw.c:1.13, ng_btsocket_rfcomm.c:1.13, in_pcb.c:1.156,
tcp_subr.c:1.205, in6_pcb.c:1.61, ipx_pcb.c:1.29, ipx_usrreq.c:1.41,
natm.c:1.35, socketvar.h:1.135 from HEAD to RELENG_5:

  Push acquisition of the accept mutex out of sofree() into the caller
  (sorele()/sotryfree()):

  - This permits the caller to acquire the accept mutex before the socket
    mutex, avoiding sofree() having to drop the socket mutex and re-order,
    which could lead to races permitting more than one thread to enter
    sofree() after a socket is ready to be free'd.

  - This also covers clearing of the so_pcb weak socket reference from
    the protocol to the socket, preventing races in clearing and
    evaluation of the reference such that sofree() might be called more
    than once on the same socket.

  This appears to close a race I was able to easily trigger by repeatedly
  opening and resetting TCP connections to a host, in which the
  tcp_close() code called as a result of the RST raced with the close()
  of the accepted socket in the user process resulting in simultaneous
  attempts to de-allocate the same socket.  The new locking increases
  the overhead for operations that may potentially free the socket, so we
  will want to revise the synchronization strategy here as we normalize
  the reference counting model for sockets.  The use of the accept mutex
  in freeing of sockets that are not listen sockets is primarily
  motivated by the potential need to remove the socket from the
  incomplete connection queue on its parent (listen) socket, so cleaning
  up the reference model here may allow us to substantially weaken the
  synchronization requirements.

  RELENG_5_3 candidate.

  MFC after:      3 days
  Reviewed by:    dwhite
  Discussed with: gnn, dwhite, green
  Reported by:    Marc UBM Bocklet <ubm at u-boot-man dot de>
  Reported by:    Vlad <marchenko at gmail dot com>

MFC after:	1 day
Approved by:	re (scottl)

Revision 1.214: download - view: text, markup, annotated - select for diffs
Mon Oct 18 22:19:42 2004 UTC (7 years, 3 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.213: preferred, colored
Changes since revision 1.213: +4 -3 lines
Push acquisition of the accept mutex out of sofree() into the caller
(sorele()/sotryfree()):

- This permits the caller to acquire the accept mutex before the socket
  mutex, avoiding sofree() having to drop the socket mutex and re-order,
  which could lead to races permitting more than one thread to enter
  sofree() after a socket is ready to be free'd.

- This also covers clearing of the so_pcb weak socket reference from
  the protocol to the socket, preventing races in clearing and
  evaluation of the reference such that sofree() might be called more
  than once on the same socket.

This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket.  The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets.  The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.

RELENG_5_3 candidate.

MFC after:	3 days
Reviewed by:	dwhite
Discussed with:	gnn, dwhite, green
Reported by:	Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by:	Vlad <marchenko at gmail dot com>
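
This is a minimal user-space sketch of the lock order established here, using POSIX threads. The caller takes the accept mutex before the socket mutex. Both the close() path and the protocol path that clears the weak so_pcb reference run the free check under the same pair of locks, so exactly one of them frees the socket. All names are hypothetical and the free condition is heavily simplified.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct hyp_sock {
	pthread_mutex_t mtx;
	int   refcount;
	void *so_pcb;		/* weak reference from the protocol */
	bool  freed;
};

static pthread_mutex_t hyp_accept_mtx = PTHREAD_MUTEX_INITIALIZER;

static void
hyp_sock_init(struct hyp_sock *so, void *pcb)
{
	pthread_mutex_init(&so->mtx, NULL);
	so->refcount = 1;	/* the file descriptor's reference */
	so->so_pcb = pcb;
	so->freed = false;
}

/* Caller holds the accept mutex, then the socket mutex. */
static bool
hyp_sofree_locked(struct hyp_sock *so)
{
	if (so->refcount != 0 || so->so_pcb != NULL || so->freed)
		return (false);
	so->freed = true;
	return (true);
}

/* close() path: drop the reference, locks taken accept-first. */
static bool
hyp_sorele(struct hyp_sock *so)
{
	bool did_free;

	pthread_mutex_lock(&hyp_accept_mtx);
	pthread_mutex_lock(&so->mtx);
	assert(so->refcount > 0);
	so->refcount--;
	did_free = hyp_sofree_locked(so);
	pthread_mutex_unlock(&so->mtx);
	pthread_mutex_unlock(&hyp_accept_mtx);
	return (did_free);
}

/* Protocol path (e.g. after a RST): clear so_pcb under the same locks. */
static bool
hyp_detach_pcb(struct hyp_sock *so)
{
	bool did_free;

	pthread_mutex_lock(&hyp_accept_mtx);
	pthread_mutex_lock(&so->mtx);
	so->so_pcb = NULL;
	did_free = hyp_sofree_locked(so);
	pthread_mutex_unlock(&so->mtx);
	pthread_mutex_unlock(&hyp_accept_mtx);
	return (did_free);
}
```

Whichever path runs second sees both conditions satisfied and performs the single free. The path that runs first sees the other's reference still present and backs off.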

Revision 1.208.2.3: download - view: text, markup, annotated - select for diffs
Thu Oct 14 11:43:16 2004 UTC (7 years, 3 months ago) by rwatson
Branches: RELENG_5
CVS tags: RELENG_5_3_BP
Branch point for: RELENG_5_3
Diff to: previous 1.208.2.2: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.2: +19 -5 lines
Merge uipc_socket.c:1.213 from HEAD to RELENG_5:

  date: 2004/10/11 08:11:26;  author: rwatson;  state: Exp;  lines: +19 -5
  Rework sofree() logic to take into account a possible race with accept().
  Sockets in the listen queues have reference counts of 0, so if the
  protocol decides to disconnect the pcb and try to free the socket, this
  triggered a race with accept() wherein accept() would bump the reference
  count before sofree() had removed the socket from the listen queues,
  resulting in a panic in sofree() when it discovered it was freeing a
  referenced socket.  This might happen if a RST came in prior to accept()
  on a TCP connection.

  The fix is two-fold: to expand the coverage of the accept mutex earlier
  in sofree() to prevent accept() from grabbing the socket after the "is it
  really safe to free" tests, and to expand the logic of the "is it really
  safe to free" tests to check that the refcount is still 0 (i.e., we
  didn't race).

  RELENG_5 candidate.

  Much discussion with and work by:       green
  Reported by:    Marc UBM Bocklet <ubm at u-boot-man dot de>
  Reported by:    Vlad <marchenko at gmail dot com>

Note that this fix is not yet confirmed by submitters to correct the
specific symptom they are seeing, but does correct a known problem.

Approved by:	re (scottl)

Revision 1.213: download - view: text, markup, annotated - select for diffs
Mon Oct 11 08:11:26 2004 UTC (7 years, 4 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.212: preferred, colored
Changes since revision 1.212: +19 -5 lines
Rework sofree() logic to take into account a possible race with accept().
Sockets in the listen queues have reference counts of 0, so if the
protocol decides to disconnect the pcb and try to free the socket, this
triggered a race with accept() wherein accept() would bump the reference
count before sofree() had removed the socket from the listen queues,
resulting in a panic in sofree() when it discovered it was freeing a
referenced socket.  This might happen if a RST came in prior to accept()
on a TCP connection.

The fix is two-fold: to expand the coverage of the accept mutex earlier
in sofree() to prevent accept() from grabbing the socket after the "is it
really safe to free" tests, and to expand the logic of the "is it really
safe to free" tests to check that the refcount is still 0 (i.e., we
didn't race).

RELENG_5 candidate.

Much discussion with and work by:	green
Reported by:	Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by:	Vlad <marchenko at gmail dot com>
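
The two-fold fix can be illustrated with a hypothetical, single-lock sketch. accept() and the protocol-initiated free both test and update the socket under the accept mutex. The free path re-checks that the reference count is still 0 before proceeding. The structure and function names are assumptions; real sockets sit on actual queues rather than carrying a flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified socket: no real queue, just flags. */
struct hyp_socket {
	int  refcount;	/* 0 while waiting on the listen queue */
	bool on_queue;	/* still on the parent's queue */
	bool freed;
};

static pthread_mutex_t hyp_accept_mtx = PTHREAD_MUTEX_INITIALIZER;

/* accept(): take the socket off the queue and reference it. */
static bool
hyp_accept(struct hyp_socket *so)
{
	bool got = false;

	pthread_mutex_lock(&hyp_accept_mtx);
	if (so->on_queue && !so->freed) {
		so->on_queue = false;
		so->refcount++;
		got = true;
	}
	pthread_mutex_unlock(&hyp_accept_mtx);
	return (got);
}

/*
 * Protocol-initiated free: the "is it safe to free" test, including the
 * refcount re-check, and the dequeue all happen under the accept mutex,
 * so accept() cannot grab the socket in between.
 */
static bool
hyp_sofree(struct hyp_socket *so)
{
	bool did_free = false;

	pthread_mutex_lock(&hyp_accept_mtx);
	if (so->refcount == 0 && !so->freed) {
		so->on_queue = false;
		so->freed = true;
		did_free = true;
	}
	pthread_mutex_unlock(&hyp_accept_mtx);
	return (did_free);
}
```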

Revision 1.208.2.2: download - view: text, markup, annotated - select for diffs
Tue Sep 7 23:27:07 2004 UTC (7 years, 5 months ago) by rwatson
Branches: RELENG_5
Diff to: previous 1.208.2.1: preferred, colored; branchpoint 1.208: preferred, colored
Changes since revision 1.208.2.1: +4 -4 lines
Merge uipc_socket.c:1.212 to RELENG_5:

  date: 2004/09/05 14:33:21;  author: rwatson;  state: Exp;  lines: +4 -4
  Expand the scope of the socket buffer locks in sopoll() to include the
  state test as well as set, or we risk a race between a socket wakeup
  and registering for select() or poll() on the socket.  This does
  increase the cost of the poll operation, but can probably be optimized
  some in the future.

  This appears to correct poll() "wedges" experienced with X11 on SMP
  systems with highly interactive applications, and might affect a plethora
  of other select() driven applications.

  RELENG_5 candidate.

  Problem reported by:    Maxim Maximov <mcsi at mcsi dot pp dot ru>
  Debugged with help of:  dwhite

Approved by:	re (scottl)

Revision 1.212: download - view: text, markup, annotated - select for diffs
Sun Sep 5 14:33:21 2004 UTC (7 years, 5 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.211: preferred, colored
Changes since revision 1.211: +4 -4 lines
Expand the scope of the socket buffer locks in sopoll() to include the
state test as well as set, or we risk a race between a socket wakeup
and registering for select() or poll() on the socket.  This does
increase the cost of the poll operation, but can probably be optimized
some in the future.

This appears to correct poll() "wedges" experienced with X11 on SMP
systems with highly interactive applications, and might affect a plethora
of other select() driven applications.

RELENG_5 candidate.

Problem reported by:	Maxim Maximov <mcsi at mcsi dot pp dot ru>
Debugged with help of:	dwhite
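
The lost-wakeup race can be sketched with a user-space analogue. The sketch assumes a POSIX mutex in place of the socket buffer lock and a flag in place of selrecord()/SB_SEL. Because the readiness test and the registration share one lock hold, data arriving in between cannot be missed. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct hyp_sockbuf {
	pthread_mutex_t mtx;
	int  cc;	/* bytes available */
	bool sb_sel;	/* a poller registered interest (like SB_SEL) */
	int  wakeups;	/* selwakeup()-style notifications sent */
};

static void
hyp_sockbuf_init(struct hyp_sockbuf *sb)
{
	pthread_mutex_init(&sb->mtx, NULL);
	sb->cc = 0;
	sb->sb_sel = false;
	sb->wakeups = 0;
}

/* Poll: test readiness and register interest under one lock hold. */
static bool
hyp_poll_read(struct hyp_sockbuf *sb)
{
	bool ready;

	pthread_mutex_lock(&sb->mtx);
	ready = sb->cc > 0;
	if (!ready)
		sb->sb_sel = true;	/* stands in for selrecord() */
	pthread_mutex_unlock(&sb->mtx);
	return (ready);
}

/* Data arrival: update state and notify registered pollers, same lock. */
static void
hyp_append(struct hyp_sockbuf *sb, int n)
{
	pthread_mutex_lock(&sb->mtx);
	sb->cc += n;
	if (sb->sb_sel) {
		sb->sb_sel = false;
		sb->wakeups++;		/* stands in for selwakeup() */
	}
	pthread_mutex_unlock(&sb->mtx);
}
```

If the test were made before taking the lock, an append could land between "not ready" and "registered". The poller would then sleep with no wakeup pending, which matches the "wedge" symptom described above.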

Revision 1.208.2.1: download - view: text, markup, annotated - select for diffs
Mon Aug 30 17:54:57 2004 UTC (7 years, 5 months ago) by rwatson
Branches: RELENG_5
Diff to: previous 1.208: preferred, colored
Changes since revision 1.208: +16 -35 lines
Merge uipc_socket.c:1.211 to RELENG_5:

  date: 2004/08/24 05:28:18;  author: rwatson;  state: Exp;  lines: +16 -35
  Conditional acquisition of socket buffer mutexes when testing socket
  buffers with kqueue filters is no longer required: the kqueue framework
  will guarantee that the mutex is held on entering the filter, either
  due to a call from the socket code already holding the mutex, or by
  explicitly acquiring it.  This removes the last of the conditional
  socket locking.

Approved by:	re (scottl)

Revision 1.211: download - view: text, markup, annotated - select for diffs
Tue Aug 24 05:28:18 2004 UTC (7 years, 5 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.210: preferred, colored
Changes since revision 1.210: +16 -35 lines
Conditional acquisition of socket buffer mutexes when testing socket
buffers with kqueue filters is no longer required: the kqueue framework
will guarantee that the mutex is held on entering the filter, either
due to a call from the socket code already holding the mutex, or by
explicitly acquiring it.  This removes the last of the conditional
socket locking.

Revision 1.210: download - view: text, markup, annotated - select for diffs
Fri Aug 20 16:24:23 2004 UTC (7 years, 5 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.209: preferred, colored
Changes since revision 1.209: +1 -3 lines
Back out uipc_socket.c:1.209, as it incorrectly assumes that all
sockets are connection-oriented for the purposes of kqueue
registration.  Since UDP sockets aren't connection-oriented, this
appeared to break a great many things, such as RPC-based
applications and services (e.g., NFS).  Since jmg isn't around I'm
backing this out before too many more feet are shot, but intend to
investigate the right solution with him once he's available.

Apologies to:	jmg
Discussed with:	imp, scottl

Revision 1.209: download - view: text, markup, annotated - select for diffs
Fri Aug 20 04:15:30 2004 UTC (7 years, 5 months ago) by jmg
Branches: MAIN
Diff to: previous 1.208: preferred, colored
Changes since revision 1.208: +3 -1 lines
Make sure that the socket is either accepting connections or connected
when attaching a knote to it; otherwise, return EINVAL.

Pointed out by:	benno

Revision 1.208: download - view: text, markup, annotated - select for diffs
Sun Aug 15 06:24:41 2004 UTC (7 years, 5 months ago) by jmg
Branches: MAIN
CVS tags: RELENG_5_BP
Branch point for: RELENG_5
Diff to: previous 1.207: preferred, colored
Changes since revision 1.207: +10 -6 lines
Add locking to the kqueue subsystem.  This also makes the kqueue subsystem
a more complete subsystem, and removes the knowledge of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using its filter ops.

Currently, it uses MTX_DUPOK even though it is not always safe to
acquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)

Revision 1.207: download - view: text, markup, annotated - select for diffs
Wed Aug 11 03:43:10 2004 UTC (7 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.206: preferred, colored
Changes since revision 1.206: +1 -1 lines
Replace a reference to splnet() with a reference to locking in a comment.

Revision 1.206: download - view: text, markup, annotated - select for diffs
Sun Jul 25 23:29:47 2004 UTC (7 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.205: preferred, colored
Changes since revision 1.205: +76 -29 lines
Do some initial locking on accept filter registration and attach.  While
here, close some races that existed in the pre-locking world during low
memory conditions.  This locking isn't perfect, but it's closer than
before.

Revision 1.205: download - view: text, markup, annotated - select for diffs
Sun Jul 18 19:10:36 2004 UTC (7 years, 6 months ago) by dwmalone
Branches: MAIN
Diff to: previous 1.204: preferred, colored
Changes since revision 1.204: +16 -12 lines
The recent changes to control message passing broke some things
that get certain types of control messages (ping6 and rtsol are
examples). This gets the new code closer to working:

	1) Collect control mbufs for processing in the controlp ==
	NULL case, so that they can be freed by externalize.

	2) Loop over the list of control mbufs, as the externalize
	function may not know how to deal with chains.

	3) In the case where there is no externalize function,
	remember to add the control mbuf to the controlp list so
	that it will be returned.

	4) After adding stuff to the controlp list, walk to the
	end of the list of stuff that was added, in case we added
	a chain.

This code can be further improved, but this is enough to get most
things working again.

Reviewed by:	rwatson
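
Step 4 amounts to the usual tail-pointer walk. Here is a minimal sketch with a hypothetical mbuf type that models only the m_next link.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal mbuf: only the m_next link matters here. */
struct hyp_mbuf {
	struct hyp_mbuf *m_next;
	int id;
};

/*
 * Append 'chain' (possibly several linked mbufs) at *tailp and return
 * the address of the last m_next pointer, so the next append lands
 * after the whole chain rather than after its first element.
 */
static struct hyp_mbuf **
append_chain(struct hyp_mbuf **tailp, struct hyp_mbuf *chain)
{
	*tailp = chain;
	while (*tailp != NULL)
		tailp = &(*tailp)->m_next;
	return (tailp);
}
```

Without the walk, appending a two-element chain and then another mbuf would overwrite the second element's position in the list.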

Revision 1.204: download - view: text, markup, annotated - select for diffs
Fri Jul 16 00:37:34 2004 UTC (7 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.203: preferred, colored
Changes since revision 1.203: +2 -0 lines
When entering soclose(), assert that SS_NOFDREF is not already set.

Revision 1.203: download - view: text, markup, annotated - select for diffs
Mon Jul 12 21:42:33 2004 UTC (7 years, 7 months ago) by dwmalone
Branches: MAIN
Diff to: previous 1.202: preferred, colored
Changes since revision 1.202: +1 -1 lines
Rename Alfred's kern_setsockopt to so_setsockopt, as this seems a
better name. I have a kern_[sg]etsockopt which I plan to commit
shortly, but the arguments to these functions will be quite different
from so_setsockopt.

Approved by:	alfred

Revision 1.202: download - view: text, markup, annotated - select for diffs
Mon Jul 12 06:22:42 2004 UTC (7 years, 7 months ago) by alfred
Branches: MAIN
Diff to: previous 1.201: preferred, colored
Changes since revision 1.201: +19 -0 lines
Use SO_REUSEADDR and SO_REUSEPORT when reconnecting NFS mounts.
Tune the timeout from 5 seconds to 12 seconds.
Provide a sysctl to show how many reconnects the NFS client has done.

Seems to fix IPv6 from: kuriyama

Revision 1.201: download - view: text, markup, annotated - select for diffs
Sun Jul 11 23:13:14 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.200: preferred, colored
Changes since revision 1.200: +47 -34 lines
Use sockbuf_pushsync() to synchronize stack and socket buffer state
in soreceive() after removing an MT_SONAME mbuf from the head of the
socket buffer.

When processing MT_CONTROL mbufs in soreceive(), first remove all of
the MT_CONTROL mbufs from the head of the socket buffer to a local
mbuf chain, then feed them into dom_externalize() as a set, which
both avoids thrashing the socket buffer lock when handling multiple
control mbufs, and also avoids races with other threads acting on
the socket buffer when the socket buffer mutex is released to enter
the externalize code.  Existing races that might occur if the protocol
externalize method blocked during processing have also been closed.

Now that we synchronize socket buffer and stack state following
modifications to the socket buffer, replace the manual synchronization
that previously followed control mbuf processing with a set of
assertions.  These can eventually be removed.

The soreceive() code is now substantially more MPSAFE.

Revision 1.200: download - view: text, markup, annotated - select for diffs
Sun Jul 11 22:59:32 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.199: preferred, colored
Changes since revision 1.199: +38 -0 lines
Add sockbuf_pushsync(), an inline function that, following a change to
the head of the mbuf chains in a socket buffer, re-synchronizes the
cache pointers used to optimize socket buffer appends.  This will be
used by soreceive() before dropping socket buffer mutexes to make sure
a consistent version of the socket buffer is visible to other threads.

While here, update copyright to account for substantial rewrite of much
socket code required for fine-grained locking.
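
This is a simplified sketch of what such a re-synchronization helper might do. It uses hypothetical types that keep only the fields involved (m_next, m_nextpkt, sb_mb, sb_lastrecord, sb_mbtail). It is modeled on the description above and should not be read as the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record-oriented buffer: records linked by m_nextpkt. */
struct hyp_mbuf {
	struct hyp_mbuf *m_next;	/* next mbuf in this record */
	struct hyp_mbuf *m_nextpkt;	/* first mbuf of the next record */
};

struct hyp_sockbuf {
	struct hyp_mbuf *sb_mb;		/* head of the buffer */
	struct hyp_mbuf *sb_lastrecord;	/* cached: first mbuf of last record */
	struct hyp_mbuf *sb_mbtail;	/* cached: last mbuf overall */
};

/*
 * After mbufs were removed from the head, re-link the new head to
 * 'nextrecord' and repair the cached append pointers, so other threads
 * see a consistent buffer once the lock is dropped.
 */
static void
hyp_pushsync(struct hyp_sockbuf *sb, struct hyp_mbuf *nextrecord)
{
	if (sb->sb_mb != NULL)
		sb->sb_mb->m_nextpkt = nextrecord;
	else
		sb->sb_mb = nextrecord;

	if (sb->sb_mb == NULL) {
		sb->sb_mbtail = NULL;
		sb->sb_lastrecord = NULL;
	} else if (sb->sb_mb->m_nextpkt == NULL)
		sb->sb_lastrecord = sb->sb_mb;
}
```

In the usage below, popping an address mbuf off a single-record buffer would otherwise leave sb_lastrecord pointing at the removed mbuf.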

Revision 1.199: download - view: text, markup, annotated - select for diffs
Sun Jul 11 18:29:47 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.198: preferred, colored
Changes since revision 1.198: +35 -1 lines
Add additional annotations to soreceive(), documenting the effects of
locking on 'nextrecord' and concerns regarding potentially inconsistent
or stale use of socket buffer or stack fields if they aren't carefully
synchronized whenever the socket buffer mutex is released.  Document
that the high-level sblock() prevents races against other readers on
the socket.

Also document the 'type' logic as to how soreceive() guarantees that
it will only return one of normal data or inline out-of-band data.

Revision 1.198: download - view: text, markup, annotated - select for diffs
Sun Jul 11 01:44:12 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.197: preferred, colored
Changes since revision 1.197: +1 -0 lines
In the 'dontblock' section of soreceive(), assert that the mbuf on hand
('m') is in fact the first mbuf in the receive socket buffer.

Revision 1.197: download - view: text, markup, annotated - select for diffs
Sun Jul 11 01:34:34 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.196: preferred, colored
Changes since revision 1.196: +63 -38 lines
Break out non-inline out-of-band data receive code from soreceive()
and put it in its own helper function soreceive_rcvoob().

Revision 1.196: download - view: text, markup, annotated - select for diffs
Sun Jul 11 01:22:40 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.195: preferred, colored
Changes since revision 1.195: +2 -2 lines
Assign pointers values of NULL rather than 0 in soreceive().

Revision 1.195: download - view: text, markup, annotated - select for diffs
Sat Jul 10 21:43:35 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.194: preferred, colored
Changes since revision 1.194: +2 -0 lines
When the MT_SONAME mbuf is popped off of a receive socket buffer
associated with a PR_ADDR protocol, make sure to update the m_nextpkt
pointer of the new head mbuf on the chain to point to the next record.
Otherwise, when we release the socket buffer mutex, the socket buffer
mbuf chain may be in an inconsistent state.

Revision 1.194: download - view: text, markup, annotated - select for diffs
Sat Jul 10 04:38:06 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.193: preferred, colored
Changes since revision 1.193: +1 -4 lines
Now that socket buffer locks are asserted at higher-level code blocks
in soreceive(), remove some redundant leaf assertions.

Revision 1.193: download - view: text, markup, annotated - select for diffs
Sat Jul 10 03:47:15 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.192: preferred, colored
Changes since revision 1.192: +5 -0 lines
Assert socket buffer lock at strategic points between sections of code
in soreceive() to confirm we've moved from block to block properly
maintaining locking invariants.

Revision 1.192: download - view: text, markup, annotated - select for diffs
Mon Jul 5 19:29:33 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.191: preferred, colored
Changes since revision 1.191: +4 -1 lines
Drop the socket buffer lock around a call to m_copym() with M_TRYWAIT.
A subset of locking changes to soreceive() in the queue for merging.

Bumped into by:	Willem Jan Withagen <wjw@withagen.nl>

Revision 1.191: download - view: text, markup, annotated - select for diffs
Sun Jun 27 03:22:15 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.190: preferred, colored
Changes since revision 1.190: +26 -2 lines
Add a new global mutex, so_global_mtx, which protects the global variables
so_gencnt, numopensockets, and the per-socket field so_gencnt.  Annotate
that this might be better done with atomic operations.

Annotate what accept_mtx protects.
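
The annotation's alternative, atomic operations instead of a mutex, could be sketched with C11 <stdatomic.h>. The counter names are hypothetical stand-ins for so_gencnt and numopensockets. One trade-off applies: each counter stays consistent on its own, but the two are no longer updated as a single unit, which the mutex guarantees.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for so_gencnt and numopensockets. */
static atomic_ulong hyp_so_gencnt;
static atomic_int   hyp_numopensockets;

struct hyp_socket {
	unsigned long so_gencnt;	/* generation number of this socket */
};

static void
hyp_soalloc(struct hyp_socket *so)
{
	/* atomic_fetch_add returns the old value; +1 gives the new one. */
	so->so_gencnt = atomic_fetch_add(&hyp_so_gencnt, 1) + 1;
	atomic_fetch_add(&hyp_numopensockets, 1);
}

static void
hyp_sodealloc(struct hyp_socket *so)
{
	/* Bump the generation and drop the count as two separate steps. */
	so->so_gencnt = atomic_fetch_add(&hyp_so_gencnt, 1) + 1;
	atomic_fetch_sub(&hyp_numopensockets, 1);
}
```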

Revision 1.190: download - view: text, markup, annotated - select for diffs
Sat Jun 26 17:12:29 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.189: preferred, colored
Changes since revision 1.189: +4 -1 lines
Replace comment on spl state when calling soabort() with a comment on
locking state.  No socket locks should be held when calling soabort()
as it will call into protocol code that may acquire socket locks.

Revision 1.189: download - view: text, markup, annotated - select for diffs
Thu Jun 24 04:28:30 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.188: preferred, colored
Changes since revision 1.188: +4 -0 lines
Lock socket buffers when setting the socket options SO_SNDLOWAT
or SO_RCVLOWAT, as these perform a read-modify-write.
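
Here is a sketch of the read-modify-write in question. It assumes the new low-water mark is clamped to the high-water mark. The struct and function names are hypothetical, and a POSIX mutex stands in for the socket buffer lock.

```c
#include <assert.h>
#include <pthread.h>

struct hyp_sockbuf {
	pthread_mutex_t mtx;
	unsigned int sb_hiwat;	/* high-water mark */
	unsigned int sb_lowat;	/* low-water mark */
};

static void
hyp_sockbuf_init(struct hyp_sockbuf *sb, unsigned int hiwat)
{
	pthread_mutex_init(&sb->mtx, NULL);
	sb->sb_hiwat = hiwat;
	sb->sb_lowat = 1;
}

/*
 * The new low-water mark depends on sb_hiwat, so the read and the write
 * share one lock hold; otherwise a concurrent change to sb_hiwat could
 * leave sb_lowat above it.
 */
static void
hyp_set_lowat(struct hyp_sockbuf *sb, unsigned int optval)
{
	pthread_mutex_lock(&sb->mtx);
	sb->sb_lowat = (optval > sb->sb_hiwat) ? sb->sb_hiwat : optval;
	pthread_mutex_unlock(&sb->mtx);
}
```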

Revision 1.188: download - view: text, markup, annotated - select for diffs
Thu Jun 24 00:54:26 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.187: preferred, colored
Changes since revision 1.187: +2 -2 lines
Slide the socket buffer lock earlier in sopoll() to cover the call into
selrecord(), so that setting up select and flagging the socket buffers
as SB_SEL both happen under the lock.

Revision 1.187: download - view: text, markup, annotated - select for diffs
Tue Jun 22 03:49:22 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.186: preferred, colored
Changes since revision 1.186: +8 -40 lines
Remove spl's from uipc_socket to ease in merging.

Revision 1.186: download - view: text, markup, annotated - select for diffs
Mon Jun 21 00:20:42 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.185: preferred, colored
Changes since revision 1.185: +31 -7 lines
Merge next step in socket buffer locking:

- sowakeup() now asserts the socket buffer lock on entry.  Move
  the call to KNOTE higher in sowakeup() so that it is made with
  the socket buffer lock held for consistency with other calls.
  Release the socket buffer lock prior to calling into pgsigio(),
  so_upcall(), or aio_swake().  Locking for this event management
  will need revisiting in the future, but this model avoids lock
  order reversals when upcalls into other subsystems result in
  socket/socket buffer operations.  Assert that the socket buffer
  lock is not held at the end of the function.

- The sowakeup() wrapper macros, sorwakeup() and sowwakeup(), now
  have _locked versions which assert the socket buffer lock on
  entry.  If a wakeup is required by sb_notify(), invoke
  sowakeup(); otherwise, unconditionally release the socket buffer
  lock.  This results in the socket buffer lock being released
  whether a wakeup is required or not.

- Break out socantsendmore() into socantsendmore_locked() that
  asserts the socket buffer lock.  socantsendmore()
  unconditionally locks the socket buffer before calling
  socantsendmore_locked().  Note that both functions return with
  the socket buffer unlocked as socantsendmore_locked() calls
  sowwakeup_locked() which has the same properties.  Assert that
  the socket buffer is unlocked on return.

- Break out socantrcvmore() into socantrcvmore_locked() that
  asserts the socket buffer lock.  socantrcvmore() unconditionally
  locks the socket buffer before calling socantrcvmore_locked().
  Note that both functions return with the socket buffer unlocked
  as socantrcvmore_locked() calls sorwakeup_locked() which has
  similar properties.  Assert that the socket buffer is unlocked
  on return.

- Break out sbrelease() into a sbrelease_locked() that asserts the
  socket buffer lock.  sbrelease() unconditionally locks the
  socket buffer before calling sbrelease_locked().
  sbrelease_locked() now invokes sbflush_locked() instead of
  sbflush().

- Assert the socket buffer lock in socket buffer sanity check
  functions sblastrecordchk(), sblastmbufchk().

- Assert the socket buffer lock in SBLINKRECORD().

- Break out various sbappend() functions into sbappend_locked()
  (and variations on that name) that assert the socket buffer
  lock.  The !_locked() variations unconditionally lock the socket
  buffer before calling their _locked counterparts.  Internally,
  make sure to call _locked() support routines, etc, if already
  holding the socket buffer lock.

- Break out sbinsertoob() into sbinsertoob_locked() that asserts
  the socket buffer lock.  sbinsertoob() unconditionally locks the
  socket buffer before calling sbinsertoob_locked().

- Break out sbflush() into sbflush_locked() that asserts the
  socket buffer lock.  sbflush() unconditionally locks the socket
  buffer before calling sbflush_locked().  Update panic strings
  for new function names.

- Break out sbdrop() into sbdrop_locked() that asserts the socket
  buffer lock.  sbdrop() unconditionally locks the socket buffer
  before calling sbdrop_locked().

- Break out sbdroprecord() into sbdroprecord_locked() that asserts
  the socket buffer lock.  sbdroprecord() unconditionally locks
  the socket buffer before calling sbdroprecord_locked().

- sofree() now calls socantsendmore_locked() and re-acquires the
  socket buffer lock on return.  It also now calls
  sbrelease_locked().

- sorflush() now calls socantrcvmore_locked() and re-acquires the
  socket buffer lock on return.  Clean up/mess up other behavior
  in sorflush() relating to the temporary stack copy of the socket
  buffer used with dom_dispose by more properly initializing the
  temporary copy, and selectively bzeroing/copying more carefully
  to prevent WITNESS from getting confused by improperly
  initialized mutexes.  Annotate why that's necessary, or at
  least, needed.

- soisconnected() now calls sbdrop_locked() before unlocking the
  socket buffer to avoid locking overhead.

Some parts of this change were:

Submitted by:	sam
Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS
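The _locked/unlocked split applied throughout this change can be sketched in userland C. This is a minimal sketch under stated assumptions, not the kernel code. A pthread mutex stands in for the socket buffer mutex, and a hypothetical sb_owned flag stands in for SOCKBUF_LOCK_ASSERT(). That flag is only meaningful in single-threaded use, whereas the kernel asks the mutex itself whether the current thread owns it. All sk_ names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sockbuf. */
struct sk_sockbuf {
	pthread_mutex_t	sb_mtx;
	bool		sb_owned;	/* crude ownership flag, single-threaded use only */
	unsigned int	sb_cc;		/* bytes queued */
	bool		sb_cantsendmore;
};

static void
sk_sb_lock(struct sk_sockbuf *sb)
{
	pthread_mutex_lock(&sb->sb_mtx);
	sb->sb_owned = true;
}

static void
sk_sb_unlock(struct sk_sockbuf *sb)
{
	sb->sb_owned = false;
	pthread_mutex_unlock(&sb->sb_mtx);
}

/* _locked variant: the caller must already hold sb_mtx. */
static void
sk_sbflush_locked(struct sk_sockbuf *sb)
{
	assert(sb->sb_owned);		/* plays the role of SOCKBUF_LOCK_ASSERT() */
	sb->sb_cc = 0;
}

/* Plain variant: lock unconditionally, then call the _locked variant. */
static void
sk_sbflush(struct sk_sockbuf *sb)
{
	sk_sb_lock(sb);
	sk_sbflush_locked(sb);
	sk_sb_unlock(sb);
}

/*
 * Entered locked, returns unlocked, like socantsendmore_locked(),
 * whose final wakeup step drops the socket buffer lock.
 */
static void
sk_socantsendmore_locked(struct sk_sockbuf *sb)
{
	assert(sb->sb_owned);
	sb->sb_cantsendmore = true;
	sk_sb_unlock(sb);		/* stands in for the wakeup dropping the lock */
}

static void
sk_socantsendmore(struct sk_sockbuf *sb)
{
	sk_sb_lock(sb);
	sk_socantsendmore_locked(sb);
	assert(!sb->sb_owned);		/* unlocked on return */
}
```

Callers that already hold the lock use the _locked variant directly. This avoids the cost of a second lock acquisition, which is the reason soisconnected() now calls sbdrop_locked().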

Revision 1.185: download - view: text, markup, annotated - select for diffs
Sun Jun 20 17:50:42 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.184: preferred, colored
Changes since revision 1.184: +7 -0 lines
When retrieving the SO_LINGER socket option for user space, hold the
socket lock over pulling so_options and so_linger out of the socket
structure in order to retrieve a consistent snapshot.  This may be
overkill if user space doesn't require a consistent snapshot.
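The consistent-snapshot idea can be sketched in userland C. Both fields that make up the SO_LINGER answer are read during a single lock hold, so a concurrent setter cannot leave the reader with the on/off bit from one update and the timeout from another. This sketch assumes pthreads. SK_LINGER_BIT and the sk_ names are hypothetical, and only struct linger comes from <sys/socket.h>.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/socket.h>		/* struct linger */

#define SK_LINGER_BIT	0x0080	/* hypothetical option bit for this sketch */

struct sk_socket {
	pthread_mutex_t	so_mtx;
	int		so_options;	/* protected by so_mtx */
	int		so_linger;	/* seconds, protected by so_mtx */
};

/* Copy both fields under one lock hold so they agree with each other. */
static struct linger
sk_get_linger(struct sk_socket *so)
{
	struct linger l;

	pthread_mutex_lock(&so->so_mtx);
	l.l_onoff = (so->so_options & SK_LINGER_BIT) != 0;
	l.l_linger = so->so_linger;
	pthread_mutex_unlock(&so->so_mtx);
	return (l);
}

/* Update both fields under the same lock. */
static void
sk_set_linger(struct sk_socket *so, int onoff, int secs)
{
	pthread_mutex_lock(&so->so_mtx);
	if (onoff)
		so->so_options |= SK_LINGER_BIT;
	else
		so->so_options &= ~SK_LINGER_BIT;
	so->so_linger = secs;
	pthread_mutex_unlock(&so->so_mtx);
}
```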

Revision 1.184: download - view: text, markup, annotated - select for diffs
Sun Jun 20 17:47:51 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.183: preferred, colored
Changes since revision 1.183: +1 -2 lines
Convert an if->panic in soclose() into a call to KASSERT().
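The if->panic to KASSERT() conversion can be illustrated with a userland approximation of the kernel macro. The message argument is a parenthesized printf-style argument list, so `kpanic msg` expands to an ordinary call, and the check only exists when INVARIANTS is defined. The kpanic() helper, the forced INVARIANTS definition, and the so_count check are assumptions for illustration. The check is not the exact test replaced in soclose().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Userland stand-in for panic(9): print the message and abort. */
static void
kpanic(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	fputs("panic: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
	va_end(ap);
	abort();
}

/* Forced on for this sketch; a kernel gets it from "options INVARIANTS". */
#ifndef INVARIANTS
#define INVARIANTS 1
#endif

#ifdef INVARIANTS
#define KASSERT(exp, msg) do {						\
	if (!(exp))							\
		kpanic msg;						\
} while (0)
#else
#define KASSERT(exp, msg) do { } while (0)
#endif

struct sk_socket {
	int	so_count;
};

/* Illustrative before: if (so->so_count == 0) panic("soclose: so_count"); */
static void
sk_soclose_check(struct sk_socket *so)
{
	KASSERT(so->so_count != 0, ("soclose: so_count %d", so->so_count));
}
```

The practical difference from a bare if->panic is that kernels built without INVARIANTS compile the check away entirely.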

Revision 1.183: download - view: text, markup, annotated - select for diffs
Sun Jun 20 17:38:19 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.182: preferred, colored
Changes since revision 1.182: +5 -0 lines
Annotate some ordering-related issues in solisten() which are not yet
resolved by socket locking: in particular, that we test the connection
state at the socket layer without locking, request that the protocol
begin listening, and then set the listen state on the socket
non-atomically, resulting in a non-atomic cross-layer test-and-set.

Revision 1.182: download - view: text, markup, annotated - select for diffs
Sat Jun 19 03:23:14 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.181: preferred, colored
Changes since revision 1.181: +60 -25 lines
Assert socket buffer lock in sb_lock() to protect socket buffer sleep
lock state.  Convert tsleep() into msleep() with socket buffer mutex
as argument.  Hold socket buffer lock over sbunlock() to protect sleep
lock state.

Assert socket buffer lock in sbwait() to protect the socket buffer
wait state.  Convert tsleep() into msleep() with socket buffer mutex
as argument.
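The point of passing the socket buffer mutex to msleep() is that the lock is released and the sleep begins atomically, so a wakeup cannot slip in between the caller's test and the sleep. A userland analogue, assuming pthreads, is pthread_cond_wait(). That call has the same drop-sleep-reacquire contract. The sk_ names, the condition variable, and the sb_waiting flag (standing in for SB_WAIT) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct sk_sockbuf {
	pthread_mutex_t	sb_mtx;
	pthread_cond_t	sb_cv;		/* stands in for the sleep channel */
	unsigned int	sb_cc;		/* bytes queued, protected by sb_mtx */
	bool		sb_waiting;	/* stands in for SB_WAIT */
};

/*
 * Like sbwait(): called with sb_mtx held.  The mutex is dropped while
 * sleeping and re-acquired before returning.
 */
static int
sk_sbwait(struct sk_sockbuf *sb)
{
	sb->sb_waiting = true;
	return (pthread_cond_wait(&sb->sb_cv, &sb->sb_mtx));
}

/* Producer: append data and wake any sleeper, all under sb_mtx. */
static void
sk_sbappend_wakeup(struct sk_sockbuf *sb, unsigned int len)
{
	pthread_mutex_lock(&sb->sb_mtx);
	sb->sb_cc += len;
	if (sb->sb_waiting) {
		sb->sb_waiting = false;
		pthread_cond_broadcast(&sb->sb_cv);
	}
	pthread_mutex_unlock(&sb->sb_mtx);
}

/* Consumer: re-test after every wakeup, then take all queued bytes. */
static unsigned int
sk_receive(struct sk_sockbuf *sb)
{
	unsigned int n;

	pthread_mutex_lock(&sb->sb_mtx);
	while (sb->sb_cc == 0)
		(void)sk_sbwait(sb);
	n = sb->sb_cc;
	sb->sb_cc = 0;
	pthread_mutex_unlock(&sb->sb_mtx);
	return (n);
}

/* Thread entry point that queues 10 bytes. */
static void *
sk_producer(void *arg)
{
	sk_sbappend_wakeup(arg, 10);
	return (NULL);
}
```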

Modify sofree(), sosend(), and soreceive() to acquire SOCKBUF_LOCK()
in order to call into these functions with the lock, as well as to
start protecting other socket buffer use in their implementation.  Drop
the socket buffer mutexes around calls into the protocol layer, around
potentially blocking operations, for copying to/from user space, and
VM operations relating to zero-copy.  Assert the socket buffer mutex
strategically after code sections or at the beginning of loops.  In
some cases, modify return code to ensure locks are properly dropped.

Convert the potentially blocking allocation of storage for the remote
address in soreceive() into a non-blocking allocation; we may wish to
move the allocation earlier so that it can block prior to acquisition
of the socket buffer lock.

Drop some spl use.

NOTE: Some races exist in the current structuring of sosend() and
soreceive().  This commit only merges basic socket locking in this
code; follow-up commits will close additional races.  As merged,
these changes are not sufficient to run without Giant safely.

Reviewed by:	juli, tjr

Revision 1.181: download - view: text, markup, annotated - select for diffs
Fri Jun 18 04:02:56 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.180: preferred, colored
Changes since revision 1.180: +4 -1 lines
Hold SOCK_LOCK(so) while frobbing so_options.  Note that while the
local race is corrected, there's still a global race in sosend()
relating to so_options and the SO_DONTROUTE flag.

Revision 1.180: download - view: text, markup, annotated - select for diffs
Fri Jun 18 02:57:55 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.179: preferred, colored
Changes since revision 1.179: +26 -13 lines
Merge some additional leaf node socket buffer locking from
rwatson_netperf:

Introduce conditional locking of the socket buffer in fifofs kqueue
filters; KNOTE() will be called holding the socket buffer locks in
fifofs, but sometimes the kqueue() system call will poll using the
same entry point without holding the socket buffer lock.

Introduce conditional locking of the socket buffer in the socket
kqueue filters; KNOTE() will be called holding the socket buffer
locks in the socket code, but sometimes the kqueue() system call
will poll using the same entry points without holding the socket
buffer lock.

Simplify the logic in sodisconnect() since we no longer need spls.

NOTE: To remove conditional locking in the kqueue filters, it would
make sense to use a separate kqueue API entry into the socket/fifo
code when calling from the kqueue() system call.
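The conditional locking described for the kqueue filters amounts to letting the caller say whether it already holds the socket buffer lock. This is a minimal sketch assuming pthreads. The HINT_* values, the filter signature, and the sk_ names are hypothetical, and the real filters have a different signature.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct sk_sockbuf {
	pthread_mutex_t	sb_mtx;
	unsigned int	sb_cc;		/* bytes queued */
	unsigned int	sb_lowat;	/* low-water mark */
};

#define HINT_POLL	0	/* kqueue() polling path: lock not held */
#define HINT_LOCKED	1	/* KNOTE() path: caller holds sb_mtx */

/* Read filter: take the lock only if the caller does not hold it. */
static int
sk_filt_read(struct sk_sockbuf *sb, long hint)
{
	bool need_lock = (hint != HINT_LOCKED);
	int ready;

	if (need_lock)
		pthread_mutex_lock(&sb->sb_mtx);
	ready = (sb->sb_cc >= sb->sb_lowat);
	if (need_lock)
		pthread_mutex_unlock(&sb->sb_mtx);
	return (ready);
}
```

As the NOTE suggests, two separate entry points (one always locking, one asserting the lock) would remove the run-time branch and the risk of passing the wrong hint.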

Revision 1.179: download - view: text, markup, annotated - select for diffs
Thu Jun 17 22:48:09 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.178: preferred, colored
Changes since revision 1.178: +25 -14 lines
Merge additional socket buffer locking from rwatson_netperf:

- Lock down low-hanging fruit use of sb_flags with socket buffer
  lock.

- Lock down low-hanging fruit use of so_state with socket lock.

- Lock down low-hanging fruit use of so_options.

- Lock down low-hanging fruit use of sb_lowat and sb_hiwat with
  socket buffer lock.

- Annotate situations in which we unlock the socket lock and then
  grab the receive socket buffer lock, which are currently actually
  the same lock.  Depending on how we want to play our cards, we
  may want to coalesce these lock uses to reduce overhead.

- Convert an if()->panic() into a KASSERT relating to so_state in
  soaccept().

- Remove a number of splnet()/splx() references.

More complex merging of socket and socket buffer locking to
follow.

Revision 1.178: download - view: text, markup, annotated - select for diffs
Mon Jun 14 18:16:19 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.177: preferred, colored
Changes since revision 1.177: +10 -10 lines
The socket field so_state is used to hold a variety of socket related
flags relating to several aspects of socket functionality.  This change
breaks out several bits relating to send and receive operation into a
new per-socket buffer field, sb_state, in order to facilitate locking.
This is required because, in order to provide more granular locking of
sockets, different state fields have different locking properties.  The
following fields are moved to sb_state:

  SS_CANTRCVMORE            (so_state)
  SS_CANTSENDMORE           (so_state)
  SS_RCVATMARK              (so_state)

Rename respectively to:

  SBS_CANTRCVMORE           (so_rcv.sb_state)
  SBS_CANTSENDMORE          (so_snd.sb_state)
  SBS_RCVATMARK             (so_rcv.sb_state)

This facilitates locking by isolating fields to be located with other
identically locked fields, and permits greater granularity in socket
locking by avoiding storing fields with different locking semantics in
the same short (avoiding locking conflicts).  In the future, we may
wish to coalesce sb_state and sb_flags; for the time being I leave
them separate and there is no additional memory overhead due to the
packing/alignment of shorts in the socket buffer structure.
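The effect of moving these bits can be sketched with simplified structures. After the move, a receive-side shutdown touches only so_rcv.sb_state, which is covered by the receive buffer's lock, and never the socket-wide so_state. The bit values below are illustrative; consult sys/socketvar.h for the real definitions. The sk_ structures and functions are hypothetical.

```c
#include <assert.h>

/* Illustrative bit values for this sketch. */
#define SBS_CANTSENDMORE	0x0010
#define SBS_CANTRCVMORE		0x0020
#define SBS_RCVATMARK		0x0040

struct sk_sockbuf {
	short	sb_state;	/* protected by this buffer's lock */
	short	sb_flags;
};

struct sk_socket {
	short			so_state;	/* protected by the socket lock */
	struct sk_sockbuf	so_rcv;
	struct sk_sockbuf	so_snd;
};

/* Receive-side shutdown: only so_rcv.sb_state changes. */
static void
sk_socantrcvmore(struct sk_socket *so)
{
	so->so_rcv.sb_state |= SBS_CANTRCVMORE;
}

/* Send-side shutdown: only so_snd.sb_state changes. */
static void
sk_socantsendmore(struct sk_socket *so)
{
	so->so_snd.sb_state |= SBS_CANTSENDMORE;
}
```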

Revision 1.177: download - view: text, markup, annotated - select for diffs
Sat Jun 12 20:47:28 2004 UTC (7 years, 7 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.176: preferred, colored
Changes since revision 1.176: +10 -1 lines
Extend coverage of SOCK_LOCK(so) to include so_count, the socket
reference count:

- Assert SOCK_LOCK(so) in macros that directly manipulate so_count:
  soref(), sorele().

- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
  so_count: sofree(), sotryfree().

- Acquire SOCK_LOCK(so) before calling these functions or macros in
  various contexts in the stack, both at the socket and protocol
  layers.

- In some cases, perform soisdisconnected() before sotryfree(), as
  calling it afterwards could touch a socket that is no longer
  present if sotryfree() actually frees the socket.

- Note that sofree()/sotryfree() will release the socket lock even if
  they don't free the socket.

Submitted by:	sam
Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS
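The so_count rules above can be sketched with a mutex-protected reference count, assuming pthreads. As with sofree()/sotryfree(), the release routine is entered with the socket lock held and always returns with it released, whether or not the socket was freed. The sk_ names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct sk_socket {
	pthread_mutex_t	so_mtx;
	int		so_count;	/* protected by so_mtx */
};

#define SK_SOCK_LOCK(so)	pthread_mutex_lock(&(so)->so_mtx)
#define SK_SOCK_UNLOCK(so)	pthread_mutex_unlock(&(so)->so_mtx)

/* Like soref(): caller holds the socket lock. */
static void
sk_soref(struct sk_socket *so)
{
	so->so_count++;
}

/*
 * Like sorele(): caller holds the socket lock, which is released on
 * return in both cases.  Returns true if the socket was freed.
 */
static bool
sk_sorele(struct sk_socket *so)
{
	assert(so->so_count > 0);
	if (--so->so_count > 0) {
		SK_SOCK_UNLOCK(so);
		return (false);
	}
	SK_SOCK_UNLOCK(so);
	pthread_mutex_destroy(&so->so_mtx);
	free(so);
	return (true);
}
```

Because sk_sorele() may free the socket, a caller has to finish any other work on it (such as the soisdisconnected() call mentioned above) before releasing the last reference.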

Revision 1.176: download - view: text, markup, annotated - select for diffs
Sat Jun 12 16:08:41 2004 UTC (7 years, 8 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.175: preferred, colored
Changes since revision 1.175: +4 -0 lines
Introduce a mutex into struct sockbuf, sb_mtx, which will be used to
protect fields in the socket buffer.  Add accessor macros to use the
mutex (SOCKBUF_*()).  Initialize the mutex in soalloc(), and destroy
it in sodealloc().  In addition, add SOCK_*() access macros which
will protect most remaining fields in the socket; for the time being,
use the receive socket buffer mutex to implement socket level locking
to reduce memory overhead.

Submitted by:	sam
Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS
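The layout described here, with one mutex per socket buffer and the socket lock borrowing the receive buffer's mutex, can be sketched with pthreads in place of mtx(9). The macro names follow the SOCKBUF_*() and SOCK_*() pattern named above, but the bodies and the sk_ types are assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>

struct sk_sockbuf {
	pthread_mutex_t	sb_mtx;
	unsigned int	sb_cc;
};

struct sk_socket {
	struct sk_sockbuf	so_rcv;
	struct sk_sockbuf	so_snd;
	short			so_state;
};

#define SOCKBUF_MTX(sb)		(&(sb)->sb_mtx)
#define SOCKBUF_LOCK(sb)	pthread_mutex_lock(SOCKBUF_MTX(sb))
#define SOCKBUF_UNLOCK(sb)	pthread_mutex_unlock(SOCKBUF_MTX(sb))

/* Socket-level lock reuses the receive buffer mutex: no extra memory. */
#define SOCK_MTX(so)		SOCKBUF_MTX(&(so)->so_rcv)
#define SOCK_LOCK(so)		SOCKBUF_LOCK(&(so)->so_rcv)
#define SOCK_UNLOCK(so)		SOCKBUF_UNLOCK(&(so)->so_rcv)

/* Like soalloc(): initialize both buffer mutexes. */
static void
sk_soalloc(struct sk_socket *so)
{
	pthread_mutex_init(SOCKBUF_MTX(&so->so_rcv), NULL);
	pthread_mutex_init(SOCKBUF_MTX(&so->so_snd), NULL);
}

/* Like sodealloc(): destroy them again. */
static void
sk_sodealloc(struct sk_socket *so)
{
	pthread_mutex_destroy(SOCKBUF_MTX(&so->so_snd));
	pthread_mutex_destroy(SOCKBUF_MTX(&so->so_rcv));
}
```

One consequence, noted again in revision 1.179, is that taking SOCK_LOCK() and then the receive buffer lock would acquire the same mutex twice.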

Revision 1.175: download - view: text, markup, annotated - select for diffs
Tue Jun 8 13:08:17 2004 UTC (7 years, 8 months ago) by stefanf
Branches: MAIN
Diff to: previous 1.174: preferred, colored
Changes since revision 1.174: +2 -2 lines
Avoid assignments to cast expressions.

Reviewed by:	md5
Approved by:	das (mentor)

Revision 1.174: download - view: text, markup, annotated - select for diffs
Wed Jun 2 04:15:37 2004 UTC (7 years, 8 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.173: preferred, colored
Changes since revision 1.173: +49 -22 lines
Integrate accept locking from rwatson_netperf, introducing a new
global mutex, accept_mtx, which serializes access to the following
fields across all sockets:

          so_qlen          so_incqlen         so_qstate
          so_comp          so_incomp          so_list
          so_head

While providing only coarse granularity, this approach avoids lock
order issues between sockets by avoiding ownership of the fields
by a specific socket and its per-socket mutexes.

While here, rewrite soclose(), sofree(), soaccept(), and
sonewconn() to add assertions, close additional races, and address
lock order concerns.  In particular:

- Reorganize the optimistic concurrency behavior in accept1() to
  always allocate a file descriptor with falloc() so that if we do
  find a socket, we don't have to encounter the "Oh, there wasn't
  a socket" race that can occur if falloc() sleeps in the current
  code, which broke inbound accept() ordering, not to mention
  requiring backing out socket state changes in a way that raced
  with the protocol level.  We may want to add a lockless read of
  the queue state if polling of empty queues proves to be important
  to optimize.

- In accept1(), soref() the socket while holding the accept lock
  so that the socket cannot be freed in a race with the protocol
  layer.  Likewise in the netgraph equivalents of the accept1() code.

- In sonewconn(), loop waiting for the queue to be small enough to
  insert our new socket once we've committed to inserting it, or
  races can occur that cause the incomplete socket queue to
  overfill.  In the previous implementation, it was sufficient to
  test only once, since calling soabort() didn't release
  synchronization, so no other thread could insert a socket while
  we discarded a previous one.

- In soclose()/sofree()/et al, it is the responsibility of the
  caller to remove a socket from the incomplete connection queue
  before calling soabort(), which prevents soabort() from having
  to walk into the accept socket to release the socket from its
  queue, and avoids races when releasing the accept mutex to enter
  soabort(), permitting soabort() to avoid lock ordering issues
  with the caller.

- Generally cluster accept queue related operations together
  throughout these functions in order to facilitate locking.

Annotate new locking in socketvar.h.
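The global accept lock can be sketched with pthreads and the <sys/queue.h> TAILQ macros (also shipped with glibc). One mutex guards the queue fields of every socket, and the accept side takes its reference before dropping that mutex, so the dequeued socket cannot be freed in between. This sketch keeps only the completed-connection queue. The so_count increment stands in for soref(), which in the real code also needs the socket lock. All sk_ names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* One global mutex covers the queue fields of all sockets. */
static pthread_mutex_t sk_accept_mtx = PTHREAD_MUTEX_INITIALIZER;
#define SK_ACCEPT_LOCK()	pthread_mutex_lock(&sk_accept_mtx)
#define SK_ACCEPT_UNLOCK()	pthread_mutex_unlock(&sk_accept_mtx)

struct sk_socket {
	TAILQ_HEAD(, sk_socket)	so_comp;	/* completed connections */
	TAILQ_ENTRY(sk_socket)	so_list;	/* linkage on head's queue */
	struct sk_socket	*so_head;	/* listen socket, if queued */
	int			so_qlen;
	int			so_qlimit;
	int			so_count;	/* references (simplified) */
};

/* Like the tail of sonewconn(): queue a completed connection. */
static int
sk_enqueue(struct sk_socket *head, struct sk_socket *so)
{
	SK_ACCEPT_LOCK();
	if (head->so_qlen >= head->so_qlimit) {
		SK_ACCEPT_UNLOCK();
		return (-1);		/* queue full; caller discards so */
	}
	TAILQ_INSERT_TAIL(&head->so_comp, so, so_list);
	so->so_head = head;
	head->so_qlen++;
	SK_ACCEPT_UNLOCK();
	return (0);
}

/* Like accept1(): dequeue and take a reference under the accept lock. */
static struct sk_socket *
sk_dequeue(struct sk_socket *head)
{
	struct sk_socket *so;

	SK_ACCEPT_LOCK();
	so = TAILQ_FIRST(&head->so_comp);
	if (so != NULL) {
		TAILQ_REMOVE(&head->so_comp, so, so_list);
		head->so_qlen--;
		so->so_head = NULL;
		so->so_count++;		/* stands in for soref() */
	}
	SK_ACCEPT_UNLOCK();
	return (so);
}
```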

Revision 1.173: download - view: text, markup, annotated - select for diffs
Tue Jun 1 02:42:55 2004 UTC (7 years, 8 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.172: preferred, colored
Changes since revision 1.172: +4 -4 lines
The SS_COMP and SS_INCOMP flags in the so_state field indicate whether
the socket is on an accept queue of a listen socket.  This change
renames the flags to SQ_COMP and SQ_INCOMP, and moves them to a new
state field on the socket, so_qstate, as the locking for these flags
is substantially different from the locking for the remainder of the
flags in so_state.

Revision 1.172: download - view: text, markup, annotated - select for diffs
Tue Jun 1 01:18:51 2004 UTC (7 years, 8 months ago) by truckman
Branches: MAIN
Diff to: previous 1.171: preferred, colored
Changes since revision 1.171: +3 -2 lines
Add MSG_NBIO flag option to soreceive() and sosend() that causes
them to behave the same as if the SS_NBIO socket flag had been set
for this call.  The SS_NBIO flag for ordinary sockets is set by
fcntl(fd, F_SETFL, O_NONBLOCK).

Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag
on the underlying socket for each I/O operation.  The O_NONBLOCK
flag is a property of the descriptor, and unlike ordinary sockets,
fifos may be referenced by multiple descriptors.
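MSG_NBIO appears to be intended for in-kernel callers such as fifofs. As far as we know, the closest user-visible equivalent is MSG_DONTWAIT, which likewise makes a single recv(2) call non-blocking without touching the descriptor's O_NONBLOCK flag. This sketch assumes a POSIX system with socketpair(2), and the sk_ helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Non-blocking receive for this one call only.  Returns the byte
 * count, or -1 with errno set (EAGAIN/EWOULDBLOCK if nothing is queued).
 */
static ssize_t
sk_recv_nowait(int fd, void *buf, size_t len)
{
	return (recv(fd, buf, len, MSG_DONTWAIT));
}

/* Non-zero if the descriptor itself has O_NONBLOCK set. */
static int
sk_is_nonblocking(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	return (fl != -1 && (fl & O_NONBLOCK) != 0);
}
```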

Revision 1.171: download - view: text, markup, annotated - select for diffs
Mon May 31 21:46:04 2004 UTC (7 years, 8 months ago) by bmilekic
Branches: MAIN
Diff to: previous 1.170: preferred, colored
Changes since revision 1.170: +55 -38 lines
Bring in mbuma to replace mballoc.

mbuma is an Mbuf & Cluster allocator built on top of a number of
extensions to the UMA framework, all included herein.

Extensions to UMA worth noting:
  - Better layering between slab <-> zone caches; introduce
    Keg structure which splits off slab cache away from the
    zone structure and allows multiple zones to be stacked
    on top of a single Keg (single type of slab cache);
    perhaps we should look into defining a subset API on
    top of the Keg for special use by malloc(9),
    for example.
  - UMA_ZONE_REFCNT zones can now be added, and reference
    counters automagically allocated for them within the end
    of the associated slab structures.  uma_find_refcnt()
    does a kextract to fetch the slab struct reference from
    the underlying page, and lookup the corresponding refcnt.

mbuma things worth noting:
  - integrates mbuf & cluster allocations with extended UMA
    and provides caches for commonly-allocated items; defines
    several zones (two primary, one secondary) and two kegs.
  - change up certain code paths that always used to do:
    m_get() + m_clget() to instead just use m_getcl() and
    try to take advantage of the newly defined secondary
    Packet zone.
  - netstat(1) and systat(1) quickly hacked up to do basic
    stat reporting but additional stats work needs to be
    done once some other details within UMA have been taken
care of and it becomes clearer how stats will work
    within the modified framework.

From the user perspective, one implication is that the
NMBCLUSTERS compile-time option is no longer used.  The
maximum number of clusters is still capped off according
to maxusers, but it can be made unlimited by setting
the kern.ipc.nmbclusters boot-time tunable to zero.
Work should be done to write an appropriate sysctl
handler allowing dynamic tuning of kern.ipc.nmbclusters
at runtime.

Additional things worth noting/known issues (READ):
   - One report of 'ips' (ServeRAID) driver acting really
     slow in conjunction with mbuma.  Need more data.
     Latest report is that ips is equally sucking with
     and without mbuma.
   - Giant leak in NFS code sometimes occurs, can't
     reproduce but currently analyzing; brueffer is
     able to reproduce but THIS IS NOT an mbuma-specific
     problem and currently occurs even WITHOUT mbuma.
   - Issues in network locking: there is at least one
     code path in the rip code where one or more locks
     are acquired and we end up in m_prepend() with
     M_WAITOK, which causes WITNESS to whine from within
     UMA.  Current temporary solution: force all UMA
     allocations to be M_NOWAIT from within UMA for now
     to avoid deadlocks unless WITNESS is defined and we
     can determine with certainty that we're not holding
     any locks when we're M_WAITOK.
   - I've seen at least one weird socketbuffer empty-but-
     mbuf-still-attached panic.  I don't believe this
     to be related to mbuma but please keep your eyes
     open, turn on debugging, and capture crash dumps.

This change removes more code than it adds.

A paper detailing the change and considering various performance
issues is available; it was presented at BSDCan2004:
http://www.unixdaemons.com/~bmilekic/netbuf_bmilekic.pdf
Please read the paper for Future Work and implementation
details, as well as credits.

Testing and Debugging:
    rwatson,
    brueffer,
    Ketrien I. Saihr-Kesenchedra,
    ...
Reviewed by: Lots of people (for different parts)

Revision 1.170: download - view: text, markup, annotated - select for diffs
Fri Apr 9 13:23:51 2004 UTC (7 years, 10 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.169: preferred, colored
Changes since revision 1.169: +53 -51 lines
Compare pointers with NULL rather than using pointers as booleans in
if/for statements.  Assign pointers to NULL rather than typecast 0.
Compare pointers with NULL rather than 0.

Revision 1.169: download - view: text, markup, annotated - select for diffs
Mon Apr 5 21:03:36 2004 UTC (7 years, 10 months ago) by imp
Branches: MAIN
Diff to: previous 1.168: preferred, colored
Changes since revision 1.168: +0 -4 lines
Remove the advertising clause from the University of California Regents' license,
per letter dated July 22, 1999.

Approved by: core

Revision 1.168: download - view: text, markup, annotated - select for diffs
Wed Mar 31 03:48:35 2004 UTC (7 years, 10 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.167: preferred, colored
Changes since revision 1.167: +2 -1 lines
In sofree(), avoid nested declaration and initialization in
declaration.  Observe that initialization in declaration is
frequently incompatible with locking, not just a bad idea
due to style(9).

Submitted by:	bde
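Initialization in a declaration conflicts with locking because an initializer such as `struct socket *head = so->so_head;` runs before any lock can be taken. A minimal sketch of the preferred shape, assuming pthreads and hypothetical sk_ names, declares the variable first and assigns it only once the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct sk_socket {
	pthread_mutex_t		so_mtx;
	struct sk_socket	*so_head;	/* protected by so_mtx */
};

static struct sk_socket *
sk_get_head(struct sk_socket *so)
{
	struct sk_socket *head;		/* declared, not initialized */

	pthread_mutex_lock(&so->so_mtx);
	head = so->so_head;		/* read only while the lock is held */
	pthread_mutex_unlock(&so->so_mtx);
	return (head);
}
```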

Revision 1.167: download - view: text, markup, annotated - select for diffs
Mon Mar 29 18:06:15 2004 UTC (7 years, 10 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.166: preferred, colored
Changes since revision 1.166: +20 -16 lines
Use a common return path for filt_soread() and filt_sowrite() to
simplify the impact of locking on these functions.

Submitted by:	sam
Sponsored by:	FreeBSD Foundation

Revision 1.166: download - view: text, markup, annotated - select for diffs
Mon Mar 29 17:57:43 2004 UTC (7 years, 10 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.165: preferred, colored
Changes since revision 1.165: +2 -2 lines
In sofree(), move caching of 'head' from 'so->so_head' to later in
the function, once it has been determined to be non-NULL, to simplify
locking on an earlier return.

Revision 1.68.2.25: download - view: text, markup, annotated - select for diffs
Mon Mar 22 23:59:54 2004 UTC (7 years, 10 months ago) by ps
Branches: RELENG_4
CVS tags: RELENG_4_11_BP, RELENG_4_11_0_RELEASE, RELENG_4_11, RELENG_4_10_BP, RELENG_4_10_0_RELEASE, RELENG_4_10
Diff to: previous 1.68.2.24: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.24: +52 -3 lines
MFC:
speed up stream socket recv handling by tracking the tail of the
mbuf chain instead of walking the list for each append.  This has
been pretty well tested at Yahoo!

Obtained from:	netbsd (jason thorpe)
Reviewed by:	silby

Revision 1.165: download - view: text, markup, annotated - select for diffs
Mon Mar 1 03:14:21 2004 UTC (7 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.164: preferred, colored
Changes since revision 1.164: +3 -3 lines
Rename dup_sockaddr() to sodupsockaddr() for consistency with other
functions in kern_socket.c.

Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT
in from the caller context rather than "1" or "0".

Correct the mflags passed into mac_init_socket() by the previous
commit so that they do not include M_ZERO.

Submitted by:	sam

Revision 1.164: download - view: text, markup, annotated - select for diffs
Mon Mar 1 01:14:28 2004 UTC (7 years, 11 months ago) by scottl
Branches: MAIN
Diff to: previous 1.163: preferred, colored
Changes since revision 1.163: +1 -1 lines
Convert the other use of flags to mflags in soalloc().

Revision 1.163: download - view: text, markup, annotated - select for diffs
Sun Feb 29 17:54:05 2004 UTC (7 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.162: preferred, colored
Changes since revision 1.162: +3 -10 lines
Modify soalloc() API so that it accepts a malloc flags argument rather
than a "waitok" argument.  Callers now pass M_WAITOK or M_NOWAIT
rather than 1 or 0.  This simplifies the soalloc() logic, and also
makes the waiting behavior of soalloc() more clear in the calling
context.

Submitted by:	sam
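The soalloc() API change can be sketched as an allocator that takes the caller's malloc-style flags directly, so the wait behavior is visible at each call site. The sketch also forwards only the wait bits to a secondary allocation, in the spirit of the M_ZERO fix in revision 1.165. The SK_* flag values and all sk_ names are hypothetical, not the kernel's M_* definitions. In userland nothing actually sleeps.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SK_NOWAIT	0x0001	/* may fail rather than sleep */
#define SK_WAITOK	0x0002	/* caller can tolerate sleeping */
#define SK_ZERO		0x0100	/* zero the returned memory */

struct sk_socket {
	int	so_count;
	void	*so_label;
};

/* Stand-in for malloc(9). */
static void *
sk_malloc(size_t size, int mflags)
{
	void *p;

	/* Exactly one of SK_NOWAIT and SK_WAITOK must be given. */
	assert(((mflags & SK_NOWAIT) != 0) != ((mflags & SK_WAITOK) != 0));
	p = malloc(size);
	if (p != NULL && (mflags & SK_ZERO) != 0)
		memset(p, 0, size);
	return (p);
}

/* Takes the caller's wait flags rather than an int "waitok". */
static struct sk_socket *
sk_soalloc(int mflags)
{
	struct sk_socket *so;

	so = sk_malloc(sizeof(*so), mflags | SK_ZERO);
	if (so == NULL)
		return (NULL);
	/* Pass on the wait behavior only, never SK_ZERO. */
	so->so_label = sk_malloc(32, mflags & ~SK_ZERO);
	if (so->so_label == NULL) {
		free(so);
		return (NULL);
	}
	return (so);
}
```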

Revision 1.162: download - view: text, markup, annotated - select for diffs
Thu Feb 12 01:48:40 2004 UTC (8 years ago) by green
Branches: MAIN
Diff to: previous 1.161: preferred, colored
Changes since revision 1.161: +7 -0 lines
Always socantsendmore() before deallocating a socket.  This, in turn,
calls selwakeup() if necessary (which it is, if you don't want freed
memory hanging around on your td->td_selq).

Props to:	alfred

Revision 1.161: download - view: text, markup, annotated - select for diffs
Sat Jan 31 10:40:23 2004 UTC (8 years ago) by phk
Branches: MAIN
Diff to: previous 1.160: preferred, colored
Changes since revision 1.160: +2 -0 lines
Introduce the SO_BINTIME option which takes a high-resolution timestamp
at packet arrival.

For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL
since it has higher resolution and lower overhead.  Simultaneous
use of the two options is possible and they will return consistent
timestamps.

This introduces an extra test and a function call for SO_TIMEVAL, but I have
not been able to measure the overhead.

Revision 1.160: download - view: text, markup, annotated - select for diffs
Sun Jan 18 14:02:53 2004 UTC (8 years ago) by ru
Branches: MAIN
Diff to: previous 1.159: preferred, colored
Changes since revision 1.159: +1 -0 lines
Since "m" is not part of the "mp" chain, we need to free() it.

Reported by:	Stanford Metacompilation research group

Revision 1.159: download - view: text, markup, annotated - select for diffs
Sun Nov 16 18:25:20 2003 UTC (8 years, 2 months ago) by rwatson
Branches: MAIN
CVS tags: RELENG_5_2_BP, RELENG_5_2_1_RELEASE, RELENG_5_2_0_RELEASE, RELENG_5_2
Diff to: previous 1.158: preferred, colored
Changes since revision 1.158: +5 -7 lines
Reduce gratuitous redundancy and length in function names:

  mac_setsockopt_label_set() -> mac_setsockopt_label()
  mac_getsockopt_label_get() -> mac_getsockopt_label()
  mac_getsockopt_peerlabel_get() -> mac_getsockopt_peerlabel()

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories

Revision 1.158: download - view: text, markup, annotated - select for diffs
Sun Nov 16 03:53:36 2003 UTC (8 years, 2 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.157: preferred, colored
Changes since revision 1.157: +8 -0 lines
When implementing getsockopt() for SO_LABEL and SO_PEERLABEL, make
sure to sooptcopyin() the (struct mac) so that the MAC Framework
knows which label types are being requested.  This fixes process
queries of socket labels.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories

Revision 1.68.2.24: download - view: text, markup, annotated - select for diffs
Tue Nov 11 17:18:18 2003 UTC (8 years, 3 months ago) by silby
Branches: RELENG_4
Diff to: previous 1.68.2.23: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.23: +4 -0 lines
MFC rev 1.143; don't allow a listen on already connected sockets

Revision 1.157: download - view: text, markup, annotated - select for diffs
Sun Nov 9 09:17:24 2003 UTC (8 years, 3 months ago) by tanimura
Branches: MAIN
Diff to: previous 1.156: preferred, colored
Changes since revision 1.156: +1 -1 lines
- Implement selwakeuppri() which allows raising the priority of a
  thread being woken up.  The woken thread can run at a priority as
  high as it would after tsleep().

- Replace selwakeup()s with selwakeuppri()s and pass appropriate
  priorities.

- Add cv_broadcastpri() which raises the priority of the threads
  woken by the broadcast.  Used by selwakeuppri() if a collision occurs.

Not objected in:	-arch, -current

Revision 1.156: download - view: text, markup, annotated - select for diffs
Tue Oct 28 05:47:39 2003 UTC (8 years, 3 months ago) by sam
Branches: MAIN
Diff to: previous 1.155: preferred, colored
Changes since revision 1.155: +52 -3 lines
speed up stream socket recv handling by tracking the tail of
the mbuf chain instead of walking the list for each append

Submitted by:	ps/jayanth
Obtained from:	netbsd (jason thorpe)

Revision 1.155: download - view: text, markup, annotated - select for diffs
Tue Oct 21 18:28:35 2003 UTC (8 years, 3 months ago) by silby
Branches: MAIN
Diff to: previous 1.154: preferred, colored
Changes since revision 1.154: +2 -1 lines
Change all SYSCTLS which are readonly and have a related TUNABLE
from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide
more useful error messages.

Revision 1.68.2.23: download - view: text, markup, annotated - select for diffs
Sun Aug 24 08:24:38 2003 UTC (8 years, 5 months ago) by hsu
Branches: RELENG_4
CVS tags: RELENG_4_9_BP, RELENG_4_9_0_RELEASE, RELENG_4_9
Diff to: previous 1.68.2.22: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.22: +1 -4 lines
Merge from -current support for Protocol Independent Multicast.

Submitted by:   Pavlin Radoslavov <pavlin@icir.org>

Revision 1.154: download - view: text, markup, annotated - select for diffs
Tue Aug 5 00:27:54 2003 UTC (8 years, 6 months ago) by hsu
Branches: MAIN
Diff to: previous 1.153: preferred, colored
Changes since revision 1.153: +1 -4 lines
Make the second argument to sooptcopyout() constant in order to
simplify the upcoming PIM patches.

Submitted by:   Pavlin Radoslavov <pavlin@icir.org>

Revision 1.153: download - view: text, markup, annotated - select for diffs
Thu Jul 17 23:49:10 2003 UTC (8 years, 6 months ago) by robert
Branches: MAIN
Diff to: previous 1.152: preferred, colored
Changes since revision 1.152: +7 -1 lines
To avoid a kernel panic provoked by a NULL pointer dereference,
do not clear the `sb_sel' member of the sockbuf structure
while invalidating the receive sockbuf in sorflush(), called
from soshutdown().

The panic was reproducible from user land by attaching a knote
with an EVFILT_READ filter to a socket, disabling further reads
from it using shutdown(2), and then closing it.  knote_remove()
was called to remove all knotes from the socket file descriptor
by detaching each using its associated filterops' detach
callback function, sordetach() in this case, which tried to remove
itself from the invalidated sockbuf's klist (sb_sel.si_note).

PR:	kern/54331

Revision 1.152: download - view: text, markup, annotated - select for diffs
Mon Jul 14 20:39:22 2003 UTC (8 years, 6 months ago) by hsu
Branches: MAIN
Diff to: previous 1.151: preferred, colored
Changes since revision 1.151: +1 -1 lines
Rev 1.121 meant to pass the value 1 to soalloc() to indicate waitok.

Reported by:	arr

Revision 1.151: download - view: text, markup, annotated - select for diffs
Wed Jun 11 00:56:58 2003 UTC (8 years, 8 months ago) by obrien
Branches: MAIN
Diff to: previous 1.150: preferred, colored
Changes since revision 1.150: +3 -1 lines
Use __FBSDID().

Revision 1.150: download - view: text, markup, annotated - select for diffs
Tue Apr 29 13:36:03 2003 UTC (8 years, 9 months ago) by kan
Branches: MAIN
CVS tags: RELENG_5_1_BP, RELENG_5_1_0_RELEASE, RELENG_5_1
Diff to: previous 1.149: preferred, colored
Changes since revision 1.149: +1 -1 lines
Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on:	standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>

Revision 1.149: download - view: text, markup, annotated - select for diffs
Mon Apr 14 14:44:36 2003 UTC (8 years, 9 months ago) by cognet
Branches: MAIN
Diff to: previous 1.148: preferred, colored
Changes since revision 1.148: +1 -2 lines
Use while (*controlp != NULL) instead of do ... while (*controlp != NULL).
There are valid cases where *controlp will be NULL at this point.

Discussed with:	dwmalone
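The difference between the two loop forms matters because a do ... while body runs once before the test, so it dereferences *controlp even when the list is empty. This is a minimal sketch of the pattern with a hypothetical list type, not the soreceive() code itself. The while form tests first and simply returns the starting slot for an empty list.

```c
#include <assert.h>
#include <stddef.h>

struct sk_ctl {
	struct sk_ctl	*next;
	int		value;
};

/*
 * Return the address of the NULL next-pointer that ends the list.
 * With "do { ... } while (*controlp != NULL)" the body would evaluate
 * (*controlp)->next with *controlp == NULL on an empty list.
 */
static struct sk_ctl **
sk_find_tail(struct sk_ctl **controlp)
{
	while (*controlp != NULL)
		controlp = &(*controlp)->next;
	return (controlp);
}
```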

Revision 1.148: download - view: text, markup, annotated - select for diffs
Sun Mar 2 15:56:49 2003 UTC (8 years, 11 months ago) by des
Branches: MAIN
Diff to: previous 1.147: preferred, colored
Changes since revision 1.147: +33 -33 lines
Clean up whitespace, s/register //, refrain from strong urge to ANSIfy.

Revision 1.147: download - view: text, markup, annotated - select for diffs
Sun Mar 2 15:50:23 2003 UTC (8 years, 11 months ago) by des
Branches: MAIN
Diff to: previous 1.146: preferred, colored
Changes since revision 1.146: +5 -5 lines
uiomove-related caddr_t -> void * (just the low-hanging fruit)

Revision 1.146: download - view: text, markup, annotated - select for diffs
Thu Feb 20 03:26:11 2003 UTC (8 years, 11 months ago) by cognet
Branches: MAIN
Diff to: previous 1.145: preferred, colored
Changes since revision 1.145: +0 -1 lines
Remove duplicate includes.

Submitted by:	Cyril Nguyen-Huu <cyril@ci0.org>

Revision 1.145: download - view: text, markup, annotated - select for diffs
Wed Feb 19 05:47:26 2003 UTC (8 years, 11 months ago) by imp
Branches: MAIN
Diff to: previous 1.144: preferred, colored
Changes since revision 1.144: +16 -16 lines
Back out M_* changes, per decision of the TRB.

Approved by: trb

Revision 1.144: download - view: text, markup, annotated - select for diffs
Tue Jan 21 08:55:55 2003 UTC (9 years ago) by alfred
Branches: MAIN
Diff to: previous 1.143: preferred, colored
Changes since revision 1.143: +16 -16 lines
Remove M_TRYWAIT/M_WAITOK/M_WAIT.  Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.

Revision 1.143: download - view: text, markup, annotated - select for diffs
Fri Jan 17 19:20:00 2003 UTC (9 years ago) by tmm
Branches: MAIN
Diff to: previous 1.142: preferred, colored
Changes since revision 1.142: +4 -0 lines
Disallow listen() on sockets which are in the SS_ISCONNECTED or
SS_ISCONNECTING state, returning EINVAL (which is what POSIX mandates
in this case).
listen() on connected or connecting sockets would cause them to enter
a bad state; in the TCP case, this could cause sockets to go
catatonic or panics, depending on how the socket was connected.

Reviewed by:	-net
MFC after:	2 weeks
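A minimal userland sketch of the behavior this revision enforces, assuming a POSIX system: both ends of a socketpair(2) are connected from the moment they are created, so listen(2) on either end is expected to fail with EINVAL. The helper name listen_errno() is hypothetical. (On some systems the EINVAL may also be reported for another reason, such as the socket being unbound, but callers see the same result.)

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: attempt listen(2) and return 0 on success or the
 * errno value on failure, so callers can test for EINVAL explicitly.
 */
static int
listen_errno(int fd, int backlog)
{
	if (listen(fd, backlog) == -1)
		return (errno);
	return (0);
}
```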

Revision 1.142: download - view: text, markup, annotated - select for diffs
Mon Jan 13 00:28:55 2003 UTC (9 years ago) by dillon
Branches: MAIN
Diff to: previous 1.141: preferred, colored
Changes since revision 1.141: +6 -6 lines
Bow to the whining masses and change a union back into void *.  Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.

Revision 1.141: download - view: text, markup, annotated - select for diffs
Sun Jan 12 01:37:08 2003 UTC (9 years, 1 month ago) by dillon
Branches: MAIN
Diff to: previous 1.140: preferred, colored
Changes since revision 1.140: +6 -6 lines
Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.

Revision 1.140: download - view: text, markup, annotated - select for diffs
Sun Jan 5 11:14:04 2003 UTC (9 years, 1 month ago) by alfred
Branches: MAIN
Diff to: previous 1.139: preferred, colored
Changes since revision 1.139: +3 -9 lines
In sodealloc(), if there is an accept filter present on the socket
then call do_setopt_accept_filter(so, NULL) which will free the filter
instead of duplicating the code in do_setopt_accept_filter().

Pointed out by: Hiten Pandya <hiten@angelica.unixdaemons.com>

Revision 1.139: download - view: text, markup, annotated - select for diffs
Mon Dec 23 21:37:28 2002 UTC (9 years, 1 month ago) by phk
Branches: MAIN
Diff to: previous 1.138: preferred, colored
Changes since revision 1.138: +1 -1 lines
s/sokqfilter/soo_kqfilter/ for consistency with the naming of all
other socket/file operations.

Revision 1.68.2.22: download - view: text, markup, annotated - select for diffs
Sun Dec 15 09:24:23 2002 UTC (9 years, 1 month ago) by maxim
Branches: RELENG_4
CVS tags: RELENG_4_8_BP, RELENG_4_8_0_RELEASE, RELENG_4_8
Diff to: previous 1.68.2.21: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.21: +2 -0 lines
MFC rev. 1.138: small SO_RCVTIMEO and SO_SNDTIMEO values are mistakenly
taken to be zero.

Revision 1.138: download - view: text, markup, annotated - select for diffs
Wed Nov 27 13:34:04 2002 UTC (9 years, 2 months ago) by maxim
Branches: MAIN
CVS tags: RELENG_5_0_BP, RELENG_5_0_0_RELEASE, RELENG_5_0
Diff to: previous 1.137: preferred, colored
Changes since revision 1.137: +2 -0 lines
Small SO_RCVTIMEO and SO_SNDTIMEO values are mistakenly taken to be zero.

PR:		kern/32827
Submitted by:	Hartmut Brandt <brandt@fokus.gmd.de>
Approved by:	re (jhb)
MFC after:	2 weeks
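The underlying problem is integer truncation: a timeout given as a struct timeval is converted to clock ticks, any value shorter than one tick (1/hz seconds) becomes 0, and a tick count of 0 means "wait forever". The sketch below illustrates the idea behind the fix under that assumption; tv_to_ticks() is a hypothetical name, not the kernel's own code.

```c
#include <sys/time.h>

/*
 * Convert a socket timeout to clock ticks, given hz ticks per second.
 * A result of 0 means "no timeout", so a nonzero timeout that truncates
 * to 0 ticks is rounded up to 1 tick instead of becoming infinite.
 */
static long
tv_to_ticks(const struct timeval *tv, int hz)
{
	long usec_per_tick = 1000000L / hz;
	long ticks = (long)tv->tv_sec * hz + (long)tv->tv_usec / usec_per_tick;

	if (ticks == 0 && (tv->tv_sec != 0 || tv->tv_usec != 0))
		ticks = 1;
	return (ticks);
}
```

With hz = 100, a 5 ms SO_RCVTIMEO now yields 1 tick rather than 0, while a 1.5 s timeout yields 150 ticks.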

Revision 1.137: download - view: text, markup, annotated - select for diffs
Sat Nov 9 12:55:06 2002 UTC (9 years, 3 months ago) by alfred
Branches: MAIN
Diff to: previous 1.136: preferred, colored
Changes since revision 1.136: +1 -1 lines
Fix instances of macros with improperly parenthesized arguments.

Verified by: md5

Revision 1.136: download - view: text, markup, annotated - select for diffs
Tue Nov 5 18:48:46 2002 UTC (9 years, 3 months ago) by kbyanc
Branches: MAIN
Diff to: previous 1.135: preferred, colored
Changes since revision 1.135: +1 -1 lines
Fix filt_soread() to properly flag a kevent when a 0-byte datagram is
received.

Verified by:	dougb, Manfred Antar <null@pozo.com>
Sponsored by:	NTT Multimedia Communications Labs
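The case fixed here can be reproduced from userland: a 0-byte datagram is still a message, so a readiness check has to report it even though there are no data bytes to read. This sketch uses poll(2) instead of kevent(2) so that it also builds on systems without kqueue; zero_dgram_readable() is a hypothetical helper.

```c
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send an empty datagram from sv[1] to sv[0] and report whether sv[0]
 * then polls as readable.  Returns 1 if readable, 0 if not, -1 on error.
 */
static int
zero_dgram_readable(void)
{
	int sv[2], ready;
	struct pollfd pfd;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return (-1);
	if (send(sv[1], "", 0, 0) == -1) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	pfd.fd = sv[0];
	pfd.events = POLLIN;
	pfd.revents = 0;
	ready = poll(&pfd, 1, 0);	/* timeout 0: do not block */
	close(sv[0]);
	close(sv[1]);
	if (ready == -1)
		return (-1);
	return ((pfd.revents & POLLIN) != 0);
}
```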

Revision 1.135: download - view: text, markup, annotated - select for diffs
Sat Nov 2 05:14:30 2002 UTC (9 years, 3 months ago) by alc
Branches: MAIN
Diff to: previous 1.134: preferred, colored
Changes since revision 1.134: +1 -1 lines
Revert the change in revision 1.77 of kern/uipc_socket2.c.  It is causing
a panic because the socket's state isn't as expected by sofree().

Discussed with: dillon, fenner

Revision 1.134: download - view: text, markup, annotated - select for diffs
Fri Nov 1 21:27:59 2002 UTC (9 years, 3 months ago) by kbyanc
Branches: MAIN
Diff to: previous 1.133: preferred, colored
Changes since revision 1.133: +1 -1 lines
Track the number of non-data characters stored in socket buffers so that
the data value returned by kevent()'s EVFILT_READ filter on non-TCP
sockets accurately reflects the amount of data that can be read from the
sockets by applications.

PR:		30634
Reviewed by:	-net, -arch
Sponsored by:	NTT Multimedia Communications Labs
MFC after:	2 weeks

Revision 1.133: download - view: text, markup, annotated - select for diffs
Mon Oct 28 21:17:53 2002 UTC (9 years, 3 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.132: preferred, colored
Changes since revision 1.132: +8 -8 lines
Trim extraneous #else and #endif MAC comments per style(9).

Revision 1.132: download - view: text, markup, annotated - select for diffs
Sat Oct 5 21:23:46 2002 UTC (9 years, 4 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.131: preferred, colored
Changes since revision 1.131: +11 -3 lines
Modify label allocation semantics for sockets: pass in soalloc's malloc
flags so that we can call malloc with M_NOWAIT if necessary, avoiding
potential sleeps while holding mutexes in the TCP syncache code.
Similar to the existing support for mbuf label allocation: if we can't
allocate all the necessary label store in each policy, we back out
the label allocation and fail the socket creation.  Sync from MAC tree.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories

Revision 1.131: download - view: text, markup, annotated - select for diffs
Fri Aug 16 12:51:58 2002 UTC (9 years, 5 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.130: preferred, colored
Changes since revision 1.130: +2 -1 lines
Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential.  Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential.  Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument.  This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
  MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
  than cred to pru_sopoll() to maintain current semantics.
- sopoll(): modify arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
  to vn_stat().  Pass active_cred to MAC and fp->f_cred to VOP_POLL()
  to maintain current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
  and consumers so that this distinction is maintained at the VFS
  as well as 'struct file' layer.  Pass active_cred instead of
  td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
  credential is properly initialized and can be used in the socket
  code if desired.  Pass ap->a_td->td_ucred as the active
  credential to soo_poll().  If we teach the vnop interface about
  the distinction between file and active credentials, we would use
  the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained.  It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs

Revision 1.130: download - view: text, markup, annotated - select for diffs
Mon Aug 12 16:49:03 2002 UTC (9 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.129: preferred, colored
Changes since revision 1.129: +2 -2 lines
Use the credential authorizing the socket creation operation to perform
the jail check and the MAC socket labeling in socreate().  This handles
socket creation using a cached credential better (such as in the NFS
client code when rebuilding a socket following a disconnect: the new
socket should be created using the nfsmount cached cred, not the cred
of the thread causing the socket to be rebuilt).

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs

Revision 1.129: download - view: text, markup, annotated - select for diffs
Thu Aug 1 17:47:56 2002 UTC (9 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.128: preferred, colored
Changes since revision 1.128: +1 -1 lines
Include file cleanup; mac.h and malloc.h at one point had ordering
relationship requirements, and no longer do.

Reminded by:	bde

Revision 1.128: download - view: text, markup, annotated - select for diffs
Thu Aug 1 03:45:40 2002 UTC (9 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.127: preferred, colored
Changes since revision 1.127: +42 -1 lines
Introduce support for Mandatory Access Control and extensible
kernel access control.

Implement two IOCTLs at the socket level to retrieve the primary
and peer labels from a socket.  Note that this user process interface
will be changing to improve multi-policy support.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs

Revision 1.127: download - view: text, markup, annotated - select for diffs
Wed Jul 31 03:03:22 2002 UTC (9 years, 6 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.126: preferred, colored
Changes since revision 1.126: +11 -0 lines
Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke the necessary MAC entry points to maintain labels on sockets.
In particular, invoke entry points during socket allocation and
destruction, as well as creation by a process or during an
accept-scenario (sonewconn).  For UNIX domain sockets, also assign
a peer label.  As the socket code isn't locked down yet, locking
interactions are not yet clear.  Various protocol stack socket
operations (such as peer label assignment for IPv4) will follow.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs

Revision 1.126: download - view: text, markup, annotated - select for diffs
Wed Jul 24 14:21:41 2002 UTC (9 years, 6 months ago) by mike
Branches: MAIN
Diff to: previous 1.125: preferred, colored
Changes since revision 1.125: +1 -1 lines
Catch up to rev 1.87 of sys/sys/socketvar.h (sb_cc changed from u_long
to u_int).

Noticed by:	sparc64 tinderbox

Revision 1.125: download - view: text, markup, annotated - select for diffs
Sat Jun 29 00:29:12 2002 UTC (9 years, 7 months ago) by alfred
Branches: MAIN
Diff to: previous 1.124: preferred, colored
Changes since revision 1.124: +2 -2 lines
More caddr_t removal.
Change struct knote's kn_hook from caddr_t to void *.

Revision 1.124: download - view: text, markup, annotated - select for diffs
Wed Jun 26 03:34:48 2002 UTC (9 years, 7 months ago) by ken
Branches: MAIN
Diff to: previous 1.123: preferred, colored
Changes since revision 1.123: +102 -0 lines
At long last, commit the zero copy sockets code.

MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object_allocate() does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structure.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
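Two of the rules above can be modeled as small pure functions: sosend() maps user pages instead of copying them only when a buffer meets size and alignment restrictions, and ip_output.c sizes fragments so each carries a whole number of pages. The exact restrictions are not spelled out in this log, so this sketch assumes "page-aligned and at least one page long", and falls back to ordinary 8-byte fragment sizing (IP fragment offsets are counted in 8-byte units) when not even one page fits. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed eligibility rule: page-aligned buffer at least one page long. */
static int
zc_send_eligible(uintptr_t addr, size_t len, size_t pagesz)
{
	return (addr % pagesz == 0 && len >= pagesz);
}

/*
 * Payload bytes per fragment: the largest multiple of the page size that
 * fits in the MTU after headers or, if not even one page fits, the
 * largest multiple of 8 (IP fragment offsets are in 8-byte units).
 */
static size_t
frag_payload(size_t mtu, size_t hdrlen, size_t pagesz)
{
	size_t room;

	if (mtu <= hdrlen)
		return (0);
	room = mtu - hdrlen;
	if (room >= pagesz)
		return (room - room % pagesz);
	return (room & ~(size_t)7);
}
```

For example, with a 9000-byte jumbo MTU, a 20-byte IP header and 4096-byte pages, each fragment carries 8192 bytes of data, which a receiver could page-flip.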

Revision 1.123: download - view: text, markup, annotated - select for diffs
Thu Jun 20 18:52:54 2002 UTC (9 years, 7 months ago) by alfred
Branches: MAIN
Diff to: previous 1.122: preferred, colored
Changes since revision 1.122: +2 -0 lines
Implement SO_NOSIGPIPE option for sockets.  This allows one to request that
an EPIPE error return not generate SIGPIPE on sockets.

Submitted by: lioux
Inspired by: Darwin

Revision 1.122: download - view: text, markup, annotated - select for diffs
Fri May 31 11:52:30 2002 UTC (9 years, 8 months ago) by tanimura
Branches: MAIN
Diff to: previous 1.121: preferred, colored
Changes since revision 1.121: +31 -177 lines
Back out my last commit (locking down a socket); it conflicts with hsu's work.

Requested by:	hsu

Revision 1.68.2.16.2.2: download - view: text, markup, annotated - select for diffs
Tue May 28 19:52:45 2002 UTC (9 years, 8 months ago) by nectar
Branches: RELENG_4_4
Diff to: previous 1.68.2.16.2.1: preferred, colored; branchpoint 1.68.2.16: preferred, colored; next MAIN 1.68.2.17: preferred, colored
Changes since revision 1.68.2.16.2.1: +2 -1 lines
Back out previous commit.  The bug which it was intended to address
is a result of interaction with the syncache, but the latter does
not exist on this branch.

Reported by:	silby

Revision 1.68.2.16.2.1: download - view: text, markup, annotated - select for diffs
Tue May 28 18:28:31 2002 UTC (9 years, 8 months ago) by nectar
Branches: RELENG_4_4
Diff to: previous 1.68.2.16: preferred, colored
Changes since revision 1.68.2.16: +2 -3 lines
MFC src/sys/kern/uipc_socket.c rev 1.116
    src/sys/kern/uipc_socket2.c rev 1.87, 1.94

Make sure that sockets undergoing accept filtering are aborted in a
LRU fashion when the listen queue fills up.

Revision 1.68.2.17.2.1: download - view: text, markup, annotated - select for diffs
Tue May 28 18:27:55 2002 UTC (9 years, 8 months ago) by nectar
Branches: RELENG_4_5
Diff to: previous 1.68.2.17: preferred, colored; next MAIN 1.68.2.18: preferred, colored
Changes since revision 1.68.2.17: +1 -2 lines
MFC src/sys/kern/uipc_socket.c rev 1.116
    src/sys/kern/uipc_socket2.c rev 1.87, 1.94

Make sure that sockets undergoing accept filtering are aborted in a
LRU fashion when the listen queue fills up.

Revision 1.121: download - view: text, markup, annotated - select for diffs
Tue May 21 21:30:44 2002 UTC (9 years, 8 months ago) by arr
Branches: MAIN
Diff to: previous 1.120: preferred, colored
Changes since revision 1.120: +2 -2 lines
- td will never be NULL, so the call to soalloc() in socreate() will always
  be passed a 1; we can, however, use M_NOWAIT to indicate this.
- Check so against NULL since it's a pointer to a structure.

Revision 1.120: download - view: text, markup, annotated - select for diffs
Tue May 21 21:18:41 2002 UTC (9 years, 8 months ago) by arr
Branches: MAIN
Diff to: previous 1.119: preferred, colored
Changes since revision 1.119: +1 -2 lines
- OR the flag variable with M_ZERO so that uma_zalloc() handles the
  zeroing of the allocated memory.  Also remove the now-redundant bzero
  that followed.

Revision 1.119: download - view: text, markup, annotated - select for diffs
Mon May 20 05:41:03 2002 UTC (9 years, 8 months ago) by tanimura
Branches: MAIN
Diff to: previous 1.118: preferred, colored
Changes since revision 1.118: +177 -31 lines
Lock down a socket, milestone 1.

o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
  socket buffer. The mutex in the receive buffer also protects the data
  in struct socket.

o Determine the lock strategy for each member of struct socket.

o Lock down the following members:

  - so_count
  - so_options
  - so_linger
  - so_state

o Remove *_locked() socket APIs.  Make the following socket APIs
  touching the members above now require a locked socket:

 - sodisconnect()
 - soisconnected()
 - soisconnecting()
 - soisdisconnected()
 - soisdisconnecting()
 - sofree()
 - soref()
 - sorele()
 - sorwakeup()
 - sotryfree()
 - sowakeup()
 - sowwakeup()

Reviewed by:	alfred

Revision 1.118: download - view: text, markup, annotated - select for diffs
Mon May 6 19:31:28 2002 UTC (9 years, 9 months ago) by alfred
Branches: MAIN
Diff to: previous 1.117: preferred, colored
Changes since revision 1.117: +1 -1 lines
Make funsetown() take a 'struct sigio **' so that the locking can
be done internally.

Ensure that no one can fsetown() to a dying process/pgrp.  We need
to check the process for P_WEXIT to see if it's exiting.  Process
groups are already safe because there is no such thing as a pgrp
zombie, therefore the proctree lock completely protects the pgrp
from having sigio structures associated with it after it runs
funsetownlst.

Add sigio lock to witness list under proctree and allproc, but over
proc and pgrp.

Seigo Tanimura helped with this.

Revision 1.117: download - view: text, markup, annotated - select for diffs
Wed May 1 20:44:44 2002 UTC (9 years, 9 months ago) by alfred
Branches: MAIN
Diff to: previous 1.116: preferred, colored
Changes since revision 1.116: +1 -1 lines
Redo the sigio locking.

Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the least intrusive manner, change pgsigio's
sigio * argument into a **; that way we can lock internally to the
function.

Revision 1.68.2.21: download - view: text, markup, annotated - select for diffs
Wed May 1 03:27:35 2002 UTC (9 years, 9 months ago) by silby
Branches: RELENG_4
CVS tags: RELENG_4_7_BP, RELENG_4_7_0_RELEASE, RELENG_4_7, RELENG_4_6_BP, RELENG_4_6_2_RELEASE, RELENG_4_6_1_RELEASE, RELENG_4_6_0_RELEASE, RELENG_4_6
Diff to: previous 1.68.2.20: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.20: +1 -2 lines
MFC:

  Make sure that sockets undergoing accept filtering are aborted in a
  LRU fashion when the listen queue fills up.  Previously, there was
  no mechanism to kick out old sockets, leading to an easy DoS of
  daemons using accept filtering.

  Revision  Changes    Path
  1.116     +1 -2      src/sys/kern/uipc_socket.c
  1.87      +7 -1      src/sys/kern/uipc_socket2.c

Revision 1.68.2.20: download - view: text, markup, annotated - select for diffs
Sun Apr 28 21:38:13 2002 UTC (9 years, 9 months ago) by hsu
Branches: RELENG_4
Diff to: previous 1.68.2.19: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.19: +1 -2 lines
MFC: shave 4 bytes off struct socket

Revision 1.116: download - view: text, markup, annotated - select for diffs
Fri Apr 26 02:07:46 2002 UTC (9 years, 9 months ago) by silby
Branches: MAIN
Diff to: previous 1.115: preferred, colored
Changes since revision 1.115: +1 -2 lines
Make sure that sockets undergoing accept filtering are aborted in a
LRU fashion when the listen queue fills up.  Previously, there was
no mechanism to kick out old sockets, leading to an easy DoS of
daemons using accept filtering.

Reviewed by:	alfred
MFC after:	3 days
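The eviction policy described above can be illustrated with a toy queue: when the queue of pending connections is full, the oldest entry is aborted to make room for the new one, so a flood of idle connections can no longer lock out fresh ones. This is a userland model with hypothetical names (struct pending_q, pending_enqueue(), QLIMIT), not the kernel's listen-queue code.

```c
#include <stddef.h>

#define QLIMIT	4		/* hypothetical listen queue limit */

struct pending_q {
	int	ids[QLIMIT];	/* connection ids, oldest first */
	size_t	len;
};

/*
 * Queue a new pending connection.  If the queue is full, abort the
 * oldest entry to make room and return its id; otherwise return -1.
 */
static int
pending_enqueue(struct pending_q *q, int id)
{
	int dropped = -1;
	size_t i;

	if (q->len == QLIMIT) {
		dropped = q->ids[0];
		for (i = 1; i < q->len; i++)
			q->ids[i - 1] = q->ids[i];
		q->len--;
	}
	q->ids[q->len++] = id;
	return (dropped);
}
```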

Revision 1.115: download - view: text, markup, annotated - select for diffs
Mon Apr 8 03:04:22 2002 UTC (9 years, 10 months ago) by hsu
Branches: MAIN
Diff to: previous 1.114: preferred, colored
Changes since revision 1.114: +1 -2 lines
There's only one socket zone so we don't need to remember it
in every socket structure.

Revision 1.114: download - view: text, markup, annotated - select for diffs
Wed Mar 20 21:23:26 2002 UTC (9 years, 10 months ago) by jeff
Branches: MAIN
Diff to: previous 1.113: preferred, colored
Changes since revision 1.113: +7 -2 lines
UMA permitted us to make use of the 'waitok' flag to soalloc().

Revision 1.113: download - view: text, markup, annotated - select for diffs
Wed Mar 20 04:09:58 2002 UTC (9 years, 10 months ago) by jeff
Branches: MAIN
Diff to: previous 1.112: preferred, colored
Changes since revision 1.112: +4 -4 lines
Remove references to vm_zone.h and switch over to the new uma API.

Also, remove maxsockets.  If you look carefully you'll notice that the old
zone allocator never honored this anyway.

Revision 1.112: download - view: text, markup, annotated - select for diffs
Tue Mar 19 09:11:47 2002 UTC (9 years, 10 months ago) by jeff
Branches: MAIN
Diff to: previous 1.111: preferred, colored
Changes since revision 1.111: +1 -1 lines
This is the first part of the new kernel memory allocator.  This replaces
malloc(9) and vm_zone with a slab like allocator.

Reviewed by:	arch@

Revision 1.68.2.19: download - view: text, markup, annotated - select for diffs
Wed Mar 6 00:37:08 2002 UTC (9 years, 11 months ago) by dillon
Branches: RELENG_4
Diff to: previous 1.68.2.18: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.18: +1 -1 lines
MFC 1.111 - enforce the same socket buffer limits for kernel tasks as for
user tasks to avoid mbuf exhaustion.

Revision 1.111: download - view: text, markup, annotated - select for diffs
Thu Feb 28 11:22:40 2002 UTC (9 years, 11 months ago) by iedowse
Branches: MAIN
Diff to: previous 1.110: preferred, colored
Changes since revision 1.110: +1 -1 lines
In sosend(), enforce the socket buffer limits regardless of whether
the data was supplied as a uio or an mbuf. Previously the limit was
ignored for mbuf data, and NFS could run the kernel out of mbufs
when an ipfw rule blocked retransmissions.

Revision 1.110: download - view: text, markup, annotated - select for diffs
Wed Feb 27 18:32:12 2002 UTC (9 years, 11 months ago) by jhb
Branches: MAIN
Diff to: previous 1.109: preferred, colored
Changes since revision 1.109: +1 -1 lines
Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.

Revision 1.68.2.18: download - view: text, markup, annotated - select for diffs
Wed Feb 13 00:43:10 2002 UTC (9 years, 11 months ago) by dillon
Branches: RELENG_4
Diff to: previous 1.68.2.17: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.17: +3 -4 lines
MFC remove the MFREE() mbuf macro and cleanup twists in related code.

Revision 1.109: download - view: text, markup, annotated - select for diffs
Tue Feb 5 02:00:53 2002 UTC (10 years ago) by dillon
Branches: MAIN
Diff to: previous 1.108: preferred, colored
Changes since revision 1.108: +2 -2 lines
Get rid of the twisted MFREE() macro entirely.

Reviewed by:	dg, bmilekic
MFC after:	3 days

Revision 1.108: download - view: text, markup, annotated - select for diffs
Mon Jan 14 22:03:48 2002 UTC (10 years ago) by alfred
Branches: MAIN
Diff to: previous 1.107: preferred, colored
Changes since revision 1.107: +8 -1 lines
Fix select on fifos.

Backout revision 1.56 and 1.57 of fifo_vnops.c.

Introduce a new poll op "POLLINIGNEOF" that can be used to ignore
EOF on a fifo; POLLIN/POLLRDNORM is converted to POLLINIGNEOF within
the FIFO implementation to effect the correct behavior.

This should allow one to view a fifo pretty much as a data source
rather than worry about connections coming and going.

Reviewed by: bde

Revision 1.107: download - view: text, markup, annotated - select for diffs
Mon Dec 31 17:45:15 2001 UTC (10 years, 1 month ago) by rwatson
Branches: MAIN
Diff to: previous 1.106: preferred, colored
Changes since revision 1.106: +3 -2 lines
o Make the credential used by socreate() an explicit argument to
  socreate(), rather than getting it implicitly from the thread
  argument.

o Make NFS cache the credential provided at mount-time, and use
  the cached credential (nfsmount->nm_cred) when making calls to
  socreate() on initially connecting, or reconnecting the socket.

This fixes bugs involving NFS over TCP and ipfw uid/gid rules, as well
as bugs involving NFS and mandatory access control implementations.

Reviewed by:	freebsd-arch

Revision 1.68.2.17: download - view: text, markup, annotated - select for diffs
Sat Dec 1 21:32:42 2001 UTC (10 years, 2 months ago) by dillon
Branches: RELENG_4
CVS tags: RELENG_4_5_BP, RELENG_4_5_0_RELEASE
Branch point for: RELENG_4_5
Diff to: previous 1.68.2.16: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.16: +9 -1 lines
Fix a bug in the MSG_WAITALL code that could cause long 5-second stalls
during heavy data transfers (only affects a small number of programs,
but samba is one of them).

This is an MFC of 1.95 which for some reason was not MFC'd back in March.
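For context, MSG_WAITALL asks recv(2) to block until the whole request is satisfied; a careful caller still loops, because a signal, an error or EOF can end the call early. A hedged sketch of that pattern; recv_all() is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Read exactly len bytes unless EOF or an error intervenes.  Returns the
 * number of bytes read (short only on EOF) or -1 on error.
 */
static ssize_t
recv_all(int fd, void *buf, size_t len)
{
	size_t done = 0;
	ssize_t n;

	while (done < len) {
		n = recv(fd, (char *)buf + done, len - done, MSG_WAITALL);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted: retry */
			return (-1);
		}
		if (n == 0)
			break;			/* EOF */
		done += (size_t)n;
	}
	return ((ssize_t)done);
}
```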

Revision 1.106: download - view: text, markup, annotated - select for diffs
Sat Nov 17 03:07:07 2001 UTC (10 years, 2 months ago) by dillon
Branches: MAIN
Diff to: previous 1.105: preferred, colored
Changes since revision 1.105: +27 -6 lines
Give struct socket structures a ref counting interface similar to
vnodes.  This will hopefully serve as a base from which we can
expand the MP code.  We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.

Revision 1.105: download - view: text, markup, annotated - select for diffs
Mon Nov 12 20:51:40 2001 UTC (10 years, 2 months ago) by keramida
Branches: MAIN
Diff to: previous 1.104: preferred, colored
Changes since revision 1.104: +8 -8 lines
Remove EOL whitespace.

Reviewed by:	alfred

Revision 1.104: download - view: text, markup, annotated - select for diffs
Mon Nov 12 20:50:06 2001 UTC (10 years, 2 months ago) by keramida
Branches: MAIN
Diff to: previous 1.103: preferred, colored
Changes since revision 1.103: +4 -3 lines
Make KASSERTs print the values that triggered a panic.

Reviewed by:	alfred

Revision 1.103: download - view: text, markup, annotated - select for diffs
Thu Oct 11 23:38:15 2001 UTC (10 years, 4 months ago) by jhb
Branches: MAIN
Diff to: previous 1.102: preferred, colored
Changes since revision 1.102: +2 -3 lines
Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
  another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
  refcount is > 1 and false (0) otherwise.
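The new contract can be illustrated with a toy reference-counted credential. This is a userland model with hypothetical names (struct toy_cred, toy_crhold() and friends), not the kernel's struct ucred: the hold operation returns the pointer whose count it bumped, so it can be used directly in an assignment; the copy operation copies contents but not the reference count and returns nothing; and the shared test reports whether more than one reference exists. toy_crfree() is modeled on the kernel's crfree().

```c
#include <stdlib.h>

struct toy_cred {
	int	cr_ref;		/* reference count */
	int	cr_uid;		/* example payload */
};

/* Like crhold(): bump the refcount and return the same credential. */
static struct toy_cred *
toy_crhold(struct toy_cred *cr)
{
	cr->cr_ref++;
	return (cr);
}

/* Like crfree(): drop a reference, freeing on the last one. */
static void
toy_crfree(struct toy_cred *cr)
{
	if (--cr->cr_ref == 0)
		free(cr);
}

/* Like crcopy(): copy the contents, not the refcount; no return value. */
static void
toy_crcopy(struct toy_cred *dest, const struct toy_cred *src)
{
	dest->cr_uid = src->cr_uid;
}

/* Like crshared(): true if more than one reference is outstanding. */
static int
toy_crshared(const struct toy_cred *cr)
{
	return (cr->cr_ref > 1);
}
```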

Revision 1.102: download - view: text, markup, annotated - select for diffs
Tue Oct 9 21:40:30 2001 UTC (10 years, 4 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.101: preferred, colored
Changes since revision 1.101: +1 -20 lines
- Combine kern.ps_showallprocs and kern.ipc.showallsockets into
  a single kern.security.seeotheruids_permitted, described as:
  "Unprivileged processes may see subjects/objects with different real uid"
  NOTE: kern.ps_showallprocs exists in -STABLE, and therefore there is
  an API change.  kern.ipc.showallsockets does not.
- Check kern.security.seeotheruids_permitted in cr_cansee().
- Replace visibility calls to socheckuid() with cr_cansee() (retain
  the change to socheckuid() in ipfw, where it is used for rule-matching).
- Remove prison_unpcb() and make use of cr_cansee() against the UNIX
  domain socket credential instead of comparing root vnodes for the
  UDS and the process.  This allows multiple jails to share the same
  chroot() and not see each others UNIX domain sockets.
- Remove unused socheckproc().

Now that cr_cansee() is used universally for socket visibility, a variety
of policies are more consistently enforced, including uid-based
restrictions and jail-based restrictions.  This also better-supports
the introduction of additional MAC models.

Reviewed by:	ps, billf
Obtained from:	TrustedBSD Project

Revision 1.101: download - view: text, markup, annotated - select for diffs
Fri Oct 5 07:06:21 2001 UTC (10 years, 4 months ago) by ps
Branches: MAIN
Diff to: previous 1.100: preferred, colored
Changes since revision 1.100: +31 -1 lines
Only allow users to see their own socket connections if
kern.ipc.showallsockets is set to 0.

Submitted by:	billf (with modifications by me)
Inspired by:	Dave McKay (aka pm aka Packet Magnet)
Reviewed by:	peter
MFC after:	2 weeks

Revision 1.100: download - view: text, markup, annotated - select for diffs
Thu Oct 4 13:11:47 2001 UTC (10 years, 4 months ago) by dwmalone
Branches: MAIN
Diff to: previous 1.99: preferred, colored
Changes since revision 1.99: +14 -15 lines
Hopefully improve control message passing over Unix domain sockets.

1) Allow the sending of more than one control message at a time
over a unix domain socket. This should cover the PR 29499.

2) This requires that unp_{ex,in}ternalize and unp_scan understand
mbufs with more than one control message at a time.

3) Internalize and externalize used to work on the mbuf in place.
This made life quite complicated, and the code for sizeof(int) <
sizeof(file *) could end up doing the wrong thing. The patch now
always creates a new mbuf/cluster, which required changing the
prototype of the domain externalise function.

4) You can now send SCM_TIMESTAMP messages.

5) Always use CMSG_DATA(cm) to determine where the data starts
in unp_{ex,in}ternalize. It was using ((struct cmsghdr *)cm + 1)
in some places, which gives the wrong alignment on the alpha.
(NetBSD made this fix some time ago.)

This results in an ABI change for descriptor passing and creds
passing on the alpha (probably on the IA64 and Sparc ports too).

6) Fix userland programs to use CMSG_* macros too.

7) Be more careful about freeing mbufs containing (file *)s.
This is made possible by the prototype change of externalise.

PR:		29499
MFC after:	6 weeks
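
Point 5 concerns how the payload of a control message is located. The sketch below, which assumes a POSIX system with UNIX-domain sockets, passes a descriptor with SCM_RIGHTS and locates the payload with CMSG_FIRSTHDR/CMSG_NXTHDR/CMSG_DATA rather than by pointer arithmetic on the header. send_fd() and recv_fd() are illustrative names, not FreeBSD functions. The union keeps the control buffer aligned for struct cmsghdr.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor fd over the UNIX-domain socket sock as SCM_RIGHTS. */
static int send_fd(int sock, int fd)
{
    char byte = 0;                      /* at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                             /* guarantees cmsghdr alignment */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *cm;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    /* CMSG_DATA, not (cm + 1): the macro applies the platform's alignment. */
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent by send_fd(); returns it, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    /* Walk every control message; more than one may be present. */
    for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm)) {
        if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS &&
            cm->cmsg_len == CMSG_LEN(sizeof(int)))
            memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    }
    return fd;
}
```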

Revision 1.99: download - view: text, markup, annotated - select for diffs
Fri Sep 21 22:46:53 2001 UTC (10 years, 4 months ago) by jhb
Branches: MAIN
Diff to: previous 1.98: preferred, colored
Changes since revision 1.98: +3 -3 lines
Use the passed in thread to selrecord() instead of curthread.

Revision 1.98: download - view: text, markup, annotated - select for diffs
Wed Sep 12 08:37:46 2001 UTC (10 years, 5 months ago) by julian
Branches: MAIN
CVS tags: KSE_MILESTONE_2
Diff to: previous 1.97: preferred, colored
Changes since revision 1.97: +35 -35 lines
KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry John! (Your next MFC will be a doozy!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after:    ha ha ha ha

Revision 1.68.2.16: download - view: text, markup, annotated - select for diffs
Thu Jun 14 20:46:06 2001 UTC (10 years, 7 months ago) by ume
Branches: RELENG_4
CVS tags: RELENG_4_4_BP, RELENG_4_4_0_RELEASE
Branch point for: RELENG_4_4
Diff to: previous 1.68.2.15: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.15: +2 -2 lines
Unbreak kernels built without option INET.
This was probably an MFC failure in 1.68.2.11.

Revision 1.97: download - view: text, markup, annotated - select for diffs
Tue May 1 08:12:58 2001 UTC (10 years, 9 months ago) by markm
Branches: MAIN
CVS tags: KSE_PRE_MILESTONE_2
Diff to: previous 1.96: preferred, colored
Changes since revision 1.96: +4 -1 lines
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h from kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by:	bde (with reservations)

Revision 1.96: download - view: text, markup, annotated - select for diffs
Fri Apr 27 13:42:50 2001 UTC (10 years, 9 months ago) by alfred
Branches: MAIN
Diff to: previous 1.95: preferred, colored
Changes since revision 1.95: +4 -2 lines
Actually show the values that tripped the assertion "receive 1"

Revision 1.95: download - view: text, markup, annotated - select for diffs
Fri Mar 16 22:37:06 2001 UTC (10 years, 10 months ago) by jlemon
Branches: MAIN
Diff to: previous 1.94: preferred, colored
Changes since revision 1.94: +7 -1 lines
When doing a recv(.. MSG_WAITALL) for a message which is larger than
the socket buffer size, the receive is done in sections.  After completing
a read, call pru_rcvd on the underlying protocol before blocking again.

This allows the protocol to take appropriate action, such as
sending a TCP window update to the peer, if the window happened to
close because the socket buffer was filled.  If the protocol is not
notified, a TCP transfer may stall until the remote end sends a window
probe.
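
From userland, the situation described above is a single recv() with MSG_WAITALL for more bytes than the socket buffer can hold, which the kernel satisfies in several sections. A minimal sketch, assuming a POSIX system: it uses a UNIX-domain socketpair rather than TCP, so it exercises MSG_WAITALL but not the TCP window-update path this change fixes. waitall_demo() is an illustrative name, and the 1 MiB XFER_LEN is an assumption chosen to exceed typical default buffer sizes.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Transfer size; assumed to exceed the default socket buffer sizes. */
#define XFER_LEN (1024 * 1024)

/*
 * A child process writes XFER_LEN bytes into a UNIX-domain stream socket;
 * the parent asks for all of them with one recv(..., MSG_WAITALL).
 * Returns what that single recv() returned, or -1 on setup failure.
 */
static ssize_t waitall_demo(void)
{
    int sv[2];
    pid_t pid;
    char *buf;
    ssize_t got = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if ((pid = fork()) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: write the whole payload, looping over short writes. */
        size_t off = 0;

        close(sv[0]);
        if ((buf = malloc(XFER_LEN)) == NULL)
            _exit(1);
        memset(buf, 'x', XFER_LEN);
        while (off < XFER_LEN) {
            ssize_t n = write(sv[1], buf + off, XFER_LEN - off);

            if (n <= 0)
                _exit(1);
            off += (size_t)n;
        }
        _exit(0);
    }
    /* Parent: one receive call that should not return until all bytes arrive. */
    close(sv[1]);
    if ((buf = malloc(XFER_LEN)) != NULL)
        got = recv(sv[0], buf, XFER_LEN, MSG_WAITALL);
    free(buf);
    close(sv[0]);           /* close first so a blocked writer cannot hang waitpid() */
    waitpid(pid, NULL, 0);
    return got;
}
```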

Revision 1.68.2.15: download - view: text, markup, annotated - select for diffs
Fri Mar 9 16:41:20 2001 UTC (10 years, 11 months ago) by jlemon
Branches: RELENG_4
CVS tags: RELENG_4_3_BP, RELENG_4_3_0_RELEASE, RELENG_4_3
Diff to: previous 1.68.2.14: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.14: +2 -5 lines
MFC: let protocol layer decide how to handle accept() on a disconnected socket.

Revision 1.94: download - view: text, markup, annotated - select for diffs
Fri Mar 9 08:16:40 2001 UTC (10 years, 11 months ago) by jlemon
Branches: MAIN
Diff to: previous 1.93: preferred, colored
Changes since revision 1.93: +2 -5 lines
Push the test for a disconnected socket when accept()ing down to the
protocol layer.  Not all protocols behave identically.  This fixes the
brokenness observed with unix-domain sockets (and postfix)

Revision 1.68.2.14: download - view: text, markup, annotated - select for diffs
Wed Mar 7 08:43:45 2001 UTC (10 years, 11 months ago) by ru
Branches: RELENG_4
Diff to: previous 1.68.2.13: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.13: +6 -4 lines
MFC: In soshutdown(), use SHUT_* constants instead of FREAD and FWRITE.
     Also, return EINVAL if `how' is invalid, as required by POSIX.

Approved by:	jkh

Revision 1.93: download - view: text, markup, annotated - select for diffs
Tue Feb 27 13:48:07 2001 UTC (10 years, 11 months ago) by ru
Branches: MAIN
Diff to: previous 1.92: preferred, colored
Changes since revision 1.92: +6 -4 lines
In soshutdown(), use SHUT_{RD,WR,RDWR} instead of FREAD and FWRITE.
Also, return EINVAL if `how' is invalid, as required by POSIX spec.
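
The POSIX behaviour adopted here can be checked from userland. A minimal sketch, assuming a POSIX system; shutdown_demo() is a hypothetical helper, and 42 is simply a value that is none of SHUT_RD, SHUT_WR or SHUT_RDWR.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 0 if shutdown() rejects an invalid `how' with EINVAL and if
 * SHUT_WR on one end of a socketpair makes the peer read end-of-file.
 */
static int shutdown_demo(void)
{
    int sv[2];
    char c;
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    /* 42 is not one of SHUT_RD, SHUT_WR, SHUT_RDWR. */
    if (shutdown(sv[0], 42) != -1 || errno != EINVAL)
        goto out;
    /* Stop sending on sv[0]; the peer should now see EOF (read returns 0). */
    if (shutdown(sv[0], SHUT_WR) == -1)
        goto out;
    if (read(sv[1], &c, 1) == 0)
        rc = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```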

Revision 1.68.2.13: download - view: text, markup, annotated - select for diffs
Mon Feb 26 04:23:16 2001 UTC (10 years, 11 months ago) by jlemon
Branches: RELENG_4
Diff to: previous 1.68.2.12: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.12: +35 -28 lines
MFC: sync kq up to current (extend to device layer, plus other fixes)

Revision 1.68.2.12: download - view: text, markup, annotated - select for diffs
Sat Feb 24 18:37:25 2001 UTC (10 years, 11 months ago) by jlemon
Branches: RELENG_4
Diff to: previous 1.68.2.11: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.11: +3 -6 lines
MFC: have accept return ECONNABORTED for sockets which are now gone.

Revision 1.92: download - view: text, markup, annotated - select for diffs
Sat Feb 24 01:41:31 2001 UTC (10 years, 11 months ago) by jlemon
Branches: MAIN
Diff to: previous 1.91: preferred, colored
Changes since revision 1.91: +5 -1 lines
Introduce a NOTE_LOWAT flag for use with the read/write filters, which
allows the watermark to be passed in via the data field during the EV_ADD
operation.

Hook this up to the socket read/write filters; if specified, it overrides
the so_{rcv|snd}.sb_lowat values in the filter.

Inspired by: "Ronald F. Guilmette" <rfg@monkeys.com>

Revision 1.91: download - view: text, markup, annotated - select for diffs
Sat Feb 24 01:33:12 2001 UTC (10 years, 11 months ago) by jlemon
Branches: MAIN
Diff to: previous 1.90: preferred, colored
Changes since revision 1.90: +3 -1 lines
When returning EV_EOF for the socket read/write filters, also return
the current socket error in fflags.  This may be useful for determining
why a connect() request fails.

Inspired by:  "Jonathan Graehl" <jonathan@graehl.org>

Revision 1.90: download - view: text, markup, annotated - select for diffs
Wed Feb 21 06:39:55 2001 UTC (10 years, 11 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.89: preferred, colored
Changes since revision 1.89: +2 -2 lines
o Move per-process jail pointer (p->pr_prison) to inside of the subject
  credential structure, ucred (cr->cr_prison).
o Allow jail inheritance to be a function of credential inheritance.
o Abstract prison structure reference counting behind pr_hold() and
  pr_free(), invoked by the similarly named credential reference
  management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
  of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
  rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
  flag in the process flags field itself.
o Eliminate the "const" qualifier from suser/p_can/etc to reflect
  mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
  credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
  required to protect the reference count plus some fields in the
  structure.

Reviewed by:	freebsd-arch
Obtained from:	TrustedBSD Project

Revision 1.89: download - view: text, markup, annotated - select for diffs
Thu Feb 15 16:34:07 2001 UTC (10 years, 11 months ago) by jlemon
Branches: MAIN
Diff to: previous 1.88: preferred, colored
Changes since revision 1.88: +29 -28 lines
Extend kqueue down to the device layer.

Backwards compatible approach suggested by: peter

Revision 1.88: download - view: text, markup, annotated - select for diffs
Wed Feb 14 02:09:11 2001 UTC (10 years, 11 months ago) by jlemon
Branches: MAIN
Diff to: previous 1.87: preferred, colored
Changes since revision 1.87: +3 -6 lines
Return ECONNABORTED from accept if connection is closed while on the
listen queue, as well as the current behavior of a zero-length sockaddr.

Obtained from: KAME
Reviewed by: -net
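
Since accept() may now fail with ECONNABORTED for a connection that was closed while still on the listen queue, a server that should keep running typically retries in that case. A minimal sketch, assuming a POSIX system with loopback networking; accept_retry() and the listen_loopback() helper are illustrative names. Only the ordinary success path is easy to exercise, because producing ECONNABORTED on demand depends on timing.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * accept() wrapper that retries when the connection was torn down while it
 * sat on the listen queue (ECONNABORTED) or the call was interrupted (EINTR).
 */
static int accept_retry(int lfd)
{
    for (;;) {
        int fd = accept(lfd, NULL, NULL);

        if (fd >= 0)
            return fd;
        if (errno != ECONNABORTED && errno != EINTR)
            return -1;
    }
}

/* Listening TCP socket on 127.0.0.1 with a kernel-chosen port stored in *sin. */
static int listen_loopback(struct sockaddr_in *sin)
{
    socklen_t len = sizeof(*sin);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd == -1)
        return -1;
    memset(sin, 0, sizeof(*sin));
    sin->sin_family = AF_INET;
    sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin->sin_port = 0;
    if (bind(lfd, (struct sockaddr *)sin, sizeof(*sin)) == -1 ||
        listen(lfd, 8) == -1 ||
        getsockname(lfd, (struct sockaddr *)sin, &len) == -1) {
        close(lfd);
        return -1;
    }
    return lfd;
}
```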

Revision 1.87: download - view: text, markup, annotated - select for diffs
Sun Jan 21 22:23:10 2001 UTC (11 years ago) by des
Branches: MAIN
Diff to: previous 1.86: preferred, colored
Changes since revision 1.86: +3 -3 lines
First step towards an MP-safe zone allocator:
 - have zalloc() and zfree() always lock the vm_zone.
 - remove zalloci() and zfreei(), which are now redundant.

Reviewed by:	bmilekic, jasone

Revision 1.68.2.11: download - view: text, markup, annotated - select for diffs
Fri Dec 22 10:25:21 2000 UTC (11 years, 1 month ago) by alfred
Branches: RELENG_4
Diff to: previous 1.68.2.10: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.10: +18 -7 lines
MFC: unbreak kernel without option INET by making acceptfilter support optional

Revision 1.86: download - view: text, markup, annotated - select for diffs
Thu Dec 21 21:43:24 2000 UTC (11 years, 1 month ago) by bmilekic
Branches: MAIN
Diff to: previous 1.85: preferred, colored
Changes since revision 1.85: +10 -10 lines
* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
  This is because calls with M_WAIT (now M_TRYWAIT) may not wait
  forever when nothing is available for allocation, and may end up
  returning NULL. Hopefully we now communicate more of the right thing
  to developers and make it very clear that it's necessary to check whether
  calls with M_(TRY)WAIT also resulted in a failed allocation.
  M_TRYWAIT basically means "try harder, block if necessary, but don't
  necessarily wait forever." The time spent blocking is tunable with
  the kern.ipc.mbuf_wait sysctl.
  M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
  malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
  value of the M_WAIT flag, this could have become a big problem.

Revision 1.85: download - view: text, markup, annotated - select for diffs
Fri Dec 8 21:50:37 2000 UTC (11 years, 2 months ago) by dwmalone
Branches: MAIN
Diff to: previous 1.84: preferred, colored
Changes since revision 1.84: +3 -5 lines
Convert more malloc+bzero to malloc+M_ZERO.

Submitted by:	josh@zipperup.org
Submitted by:	Robert Drehmel <robd@gmx.net>

Revision 1.84: download - view: text, markup, annotated - select for diffs
Mon Nov 20 01:35:25 2000 UTC (11 years, 2 months ago) by alfred
Branches: MAIN
Diff to: previous 1.83: preferred, colored
Changes since revision 1.83: +20 -7 lines
Accept filters broke kernels compiled without options INET.
Make accept filters conditional on INET support to fix.

Pointed out by: bde
Tested and assisted by: Stephen J. Kiernan <sab@vegamuse.org>

Revision 1.68.2.10: download - view: text, markup, annotated - select for diffs
Fri Nov 17 19:47:27 2000 UTC (11 years, 2 months ago) by jkh
Branches: RELENG_4
CVS tags: RELENG_4_2_0_RELEASE
Diff to: previous 1.68.2.9: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.9: +4 -3 lines
MFC: check to see if prp is null before dereferencing it.

Revision 1.68.2.9: download - view: text, markup, annotated - select for diffs
Sun Oct 29 19:25:38 2000 UTC (11 years, 3 months ago) by rwatson
Branches: RELENG_4
Diff to: previous 1.68.2.8: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.8: +10 -1 lines
MFC of jail fixups:

  1.7       +9 -2      src/sys/kern/kern_jail.c
  1.73      +10 -1     src/sys/kern/uipc_socket.c
  1.9       +2 -1      src/sys/sys/jail.h

For reference:

  o Modify jail to limit creation of sockets to UNIX domain sockets,
    TCP/IP (v4) sockets, and routing sockets.  Previously, interaction
    with IPv6 was not well-defined, and might be inappropriate for some
    environments.  Similarly, sysctl MIB entries providing interface
    information also give out only addresses from those protocol domains.

    For the time being, this functionality is enabled by default, and
    toggleable using the sysctl variable jail.socket_unixiproute_only.
    In the future, protocol domains will be able to determine whether or
    not they are ``jail aware''.

Revision 1.68.2.8: download - view: text, markup, annotated - select for diffs
Thu Sep 28 04:51:58 2000 UTC (11 years, 4 months ago) by jlemon
Branches: RELENG_4
Diff to: previous 1.68.2.7: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.7: +6 -2 lines
MFC: kqueue fixes - 1.80 (honor lowat value), 1.83 (return on udp error)

Revision 1.83: download - view: text, markup, annotated - select for diffs
Thu Sep 28 04:41:22 2000 UTC (11 years, 4 months ago) by jlemon
Branches: MAIN
Diff to: previous 1.82: preferred, colored
Changes since revision 1.82: +5 -1 lines
Check so_error in filt_so{read|write} in order to detect UDP errors.

PR: 21601

Revision 1.68.2.7: download - view: text, markup, annotated - select for diffs
Thu Sep 7 19:13:37 2000 UTC (11 years, 5 months ago) by truckman
Branches: RELENG_4
CVS tags: RELENG_4_1_1_RELEASE
Diff to: previous 1.68.2.6: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.6: +3 -3 lines
MFC:

Remove hash table lookups and maintenance from chgproccnt() and chgsbsize()
and chase pointers stored in pcred and ucred instead for better performance
and to avoid these operations in interrupt context which could possibly
cause panics.

Because the pcred and ucred structures changed size, libkvm and friends
will need to be rebuilt.

Revision 1.82: download - view: text, markup, annotated - select for diffs
Tue Sep 5 22:10:24 2000 UTC (11 years, 5 months ago) by truckman
Branches: MAIN
CVS tags: PRE_SMPNG
Diff to: previous 1.81: preferred, colored
Changes since revision 1.81: +3 -3 lines
Move uidinfo hash table lookup and maintenance out of chgproccnt() and
chgsbsize(), which are called rather frequently and may be called from an
interrupt context in the case of chgsbsize().  Instead, do the hash table
lookup and maintenance when credentials are changed, which is a lot less
frequent.  Add pointers to the uidinfo structures to the ucred and pcred
structures for fast access.  Pass a pointer to the credential to chgproccnt()
and chgsbsize() instead of passing the uid.  Add a reference count to the
uidinfo structure and use it to decide when to free the structure rather
than freeing the structure when the resource consumption drops to zero.
Move the resource tracking code from kern_proc.c to kern_resource.c.  Move
some duplicate code sequences in kern_prot.c to separate helper functions.
Change KASSERTs in this code to unconditional tests and calls to panic().

Revision 1.68.2.6: download - view: text, markup, annotated - select for diffs
Wed Aug 30 00:20:47 2000 UTC (11 years, 5 months ago) by green
Branches: RELENG_4
Diff to: previous 1.68.2.5: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.5: +3 -3 lines
MFC: chgsbsize() API change (it modifies the passed &sb_hiwat itself under
     an splnet() 'lock')

Alfred initially came to the conclusion this was another race condition.

Revision 1.68.2.5: download - view: text, markup, annotated - select for diffs
Wed Aug 30 00:01:44 2000 UTC (11 years, 5 months ago) by green
Branches: RELENG_4
Diff to: previous 1.68.2.4: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.4: +3 -3 lines
MFC: uidinfo race fixes

Revision 1.81: download - view: text, markup, annotated - select for diffs
Tue Aug 29 11:28:02 2000 UTC (11 years, 5 months ago) by green
Branches: MAIN
Diff to: previous 1.80: preferred, colored
Changes since revision 1.80: +3 -3 lines
Remove any possibility of hiwat-related race conditions by changing
the chgsbsize() call to use a "subject" pointer (&sb.sb_hiwat) and
a u_long target to set it to.  The whole thing is splnet().

This fixes a problem that jdp has been able to provoke.

Revision 1.80: download - view: text, markup, annotated - select for diffs
Mon Aug 7 17:52:08 2000 UTC (11 years, 6 months ago) by jlemon
Branches: MAIN
Diff to: previous 1.79: preferred, colored
Changes since revision 1.79: +2 -2 lines
Make the kqueue socket read filter honor the SO_RCVLOWAT value.

Spotted by:  "Steve M." <stevem@redlinenetworks.com>

Revision 1.68.2.4: download - view: text, markup, annotated - select for diffs
Fri Jul 28 04:03:46 2000 UTC (11 years, 6 months ago) by alfred
Branches: RELENG_4
Diff to: previous 1.68.2.3: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.3: +110 -1 lines
MFC: accept_filters

Revision 1.68.2.3: download - view: text, markup, annotated - select for diffs
Mon Jul 24 05:15:22 2000 UTC (11 years, 6 months ago) by jlemon
Branches: RELENG_4
CVS tags: RELENG_4_1_0_RELEASE
Diff to: previous 1.68.2.2: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.2: +5 -1 lines
When shutdown(s, SHUT_RD) is called, it calls sorflush(), which uses
bzero to wipe out the socket buffer.  However, this is not the right
thing to do, as some state may need to be kept (SB_SEL, SB_AIO, SB_KNOTE).
In particular, erasing the si_note entry will cause a panic in
filt_sordetach() later, when the system attempts to remove the knote
from the SLIST.

Add a workaround which leaves the si_note field alone.  The real fix
is to only clear out those fields which need to be reset, rather than
using bzero() to swat everything.  However, this close to -release, a
minimal fix was chosen in order to keep changes to a minimum.

Reviewed by:	dillon, wollman (in concept)
Approved by:	jkh

Revision 1.79: download - view: text, markup, annotated - select for diffs
Thu Jul 20 12:17:17 2000 UTC (11 years, 6 months ago) by alfred
Branches: MAIN
Diff to: previous 1.78: preferred, colored
Changes since revision 1.78: +9 -1 lines
only allow accept filter modifications on listening sockets

Submitted by: ps

Revision 1.78: download - view: text, markup, annotated - select for diffs
Thu Jun 22 22:27:15 2000 UTC (11 years, 7 months ago) by alfred
Branches: MAIN
Diff to: previous 1.77: preferred, colored
Changes since revision 1.77: +3 -3 lines
fix races in the uidinfo subsystem, several problems existed:

1) while allocating a uidinfo struct malloc is called with M_WAITOK,
   it's possible that while asleep another process by the same user
   could have woken up earlier and inserted an entry into the uid
   hash table.  Having redundant entries causes inconsistencies that
   we can't handle.

   fix: do a non-waiting malloc, and if that fails, do a blocking
   malloc; after waking up, check that no one else has already inserted
   an entry for us.

2) Because many checks for sbsize were done as "test then set" in a
   non-atomic manner, races made it possible to exceed the limits.

   fix: instead of querying the count then setting, we just attempt to
   set the count and leave it up to the function to return success or
   failure.

3) The uidinfo code inlined and repeated lookups, insertions and
   deletions; these needed to be in their own functions for clarity.

Reviewed by: green
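
Fix 2 replaces "check the count, then set it" with a single operation that either applies the change within the limit or reports failure. The kernel did this under splnet(). The userland sketch below, assuming C11 <stdatomic.h>, shows the same idea with a compare-and-swap loop. struct usage, usage_charge() and usage_release() are hypothetical names, not the kernel's uidinfo or chgsbsize().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-user accounting record (not the kernel's struct uidinfo). */
struct usage {
    atomic_ulong sbsize;        /* bytes of socket buffer currently charged */
};

/*
 * Charge delta bytes against u if the total stays within max. The limit
 * check and the update form one atomic step: if another thread changes
 * sbsize in between, the compare-and-swap fails, cur is reloaded and the
 * check is repeated, so concurrent callers cannot jointly exceed max.
 */
static bool usage_charge(struct usage *u, unsigned long delta, unsigned long max)
{
    unsigned long cur = atomic_load(&u->sbsize);

    do {
        if (delta > max || cur > max - delta)
            return false;       /* over the limit: nothing is changed */
    } while (!atomic_compare_exchange_weak(&u->sbsize, &cur, cur + delta));
    return true;
}

/* Release a previous successful charge. */
static void usage_release(struct usage *u, unsigned long delta)
{
    atomic_fetch_sub(&u->sbsize, delta);
}
```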

Revision 1.77: download - view: text, markup, annotated - select for diffs
Tue Jun 20 01:09:21 2000 UTC (11 years, 7 months ago) by alfred
Branches: MAIN
Diff to: previous 1.76: preferred, colored
Changes since revision 1.76: +102 -1 lines
return of the accept filter part II

accept filters are now loadable as well as able to be compiled into
the kernel.

two accept filters are provided: one that returns sockets when data
arrives, the other when an HTTP request is complete (doesn't work
with HTTP/0.9 requests).

Reviewed by: jmg

Revision 1.76: download - view: text, markup, annotated - select for diffs
Sun Jun 18 08:49:13 2000 UTC (11 years, 7 months ago) by alfred
Branches: MAIN
Diff to: previous 1.75: preferred, colored
Changes since revision 1.75: +1 -5 lines
backout accept optimizations.

Requested by: jmg, dcs, jdp, nate

Revision 1.75: download - view: text, markup, annotated - select for diffs
Thu Jun 15 18:18:42 2000 UTC (11 years, 7 months ago) by alfred
Branches: MAIN
Diff to: previous 1.74: preferred, colored
Changes since revision 1.74: +5 -1 lines
add socket options DELAYACCEPT and HTTPACCEPT, which will not allow an accept()
until the incoming connection has either data waiting or what looks like an
HTTP request header already in the socket buffer.  This ought to reduce
the context switch time and overhead for processing requests.

The initial idea and code for HTTPACCEPT came from Yahoo engineers; it has
been cleaned up, and a more lightweight DELAYACCEPT for non-HTTP servers
has been added.

Reviewed by: silence on hackers.

Revision 1.74: download - view: text, markup, annotated - select for diffs
Tue Jun 13 15:44:04 2000 UTC (11 years, 8 months ago) by asmodai
Branches: MAIN
Diff to: previous 1.73: preferred, colored
Changes since revision 1.73: +4 -3 lines
Fix panic by moving the prp == 0 check up the order of sanity checks.

Submitted by:	Bart Thate <freebsd@1st.dudi.org> on -current
Approved by:	rwatson

Revision 1.51.2.6: download - view: text, markup, annotated - select for diffs
Sat Jun 10 17:44:56 2000 UTC (11 years, 8 months ago) by jlemon
Branches: RELENG_3
CVS tags: RELENG_3_5_0_RELEASE
Diff to: previous 1.51.2.5: preferred, colored; branchpoint 1.51: preferred, colored; next MAIN 1.52: preferred, colored
Changes since revision 1.51.2.5: +19 -3 lines
MFC: mbuf wait code, as well as some other mbuf related bugfixes.
Submitted by: Mike Silbersack <silby@silby.com>

Revision 1.73: download - view: text, markup, annotated - select for diffs
Sun Jun 4 04:28:31 2000 UTC (11 years, 8 months ago) by rwatson
Branches: MAIN
Diff to: previous 1.72: preferred, colored
Changes since revision 1.72: +10 -1 lines
o Modify jail to limit creation of sockets to UNIX domain sockets,
  TCP/IP (v4) sockets, and routing sockets.  Previously, interaction
  with IPv6 was not well-defined, and might be inappropriate for some
  environments.  Similarly, sysctl MIB entries providing interface
  information also give out only addresses from those protocol domains.

  For the time being, this functionality is enabled by default, and
  toggleable using the sysctl variable jail.socket_unixiproute_only.
  In the future, protocol domains will be able to determine whether or
  not they are ``jail aware''.

o Further limitations on process use of getpriority() and setpriority()
  by jailed processes.  Addresses problem described in kern/17878.

Reviewed by:	phk, jmg

Revision 1.72: download - view: text, markup, annotated - select for diffs
Fri May 26 02:04:39 2000 UTC (11 years, 8 months ago) by jake
Branches: MAIN
Diff to: previous 1.71: preferred, colored
Changes since revision 1.71: +3 -3 lines
Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others

Revision 1.71: download - view: text, markup, annotated - select for diffs
Tue May 23 20:37:17 2000 UTC (11 years, 8 months ago) by jake
Branches: MAIN
Diff to: previous 1.70: preferred, colored
Changes since revision 1.70: +3 -3 lines
Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd

Revision 1.68.2.2: download - view: text, markup, annotated - select for diffs
Fri May 5 03:49:57 2000 UTC (11 years, 9 months ago) by jlemon
Branches: RELENG_4
Diff to: previous 1.68.2.1: preferred, colored; branchpoint 1.68: preferred, colored
Changes since revision 1.68.2.1: +110 -1 lines
MFC: kqueue() and kevent()

Revision 1.70: download - view: text, markup, annotated - select for diffs
Sun Apr 16 18:53:13 2000 UTC (11 years, 9 months ago) by jlemon
Branches: MAIN
Diff to: previous 1.69: preferred, colored
Changes since revision 1.69: +110 -1 lines
Introduce kqueue() and kevent(), a kernel event notification facility.

Revision 1.68.2.1: download - view: text, markup, annotated - select for diffs
Sat Mar 18 08:58:13 2000 UTC (11 years, 10 months ago) by fenner
Branches: RELENG_4
CVS tags: RELENG_4_0_0_RELEASE
Diff to: previous 1.68: preferred, colored
Changes since revision 1.68: +8 -2 lines
MFC 1.59: free the socket in soabort() if the protocol couldn't.

Revision 1.69: download - view: text, markup, annotated - select for diffs
Sat Mar 18 08:56:56 2000 UTC (11 years, 10 months ago) by fenner
Branches: MAIN
Diff to: previous 1.68: preferred, colored
Changes since revision 1.68: +8 -2 lines
Make sure to free the socket in soabort() if the protocol couldn't
 free it (this could happen if the protocol already freed its part
 and we just kept the socket around to make sure accept(2) didn't block)

Revision 1.68: download - view: text, markup, annotated - select for diffs
Fri Jan 14 02:53:26 2000 UTC (12 years ago) by jasone
Branches: MAIN
CVS tags: RELENG_4_BP
Branch point for: RELENG_4
Diff to: previous 1.67: preferred, colored
Changes since revision 1.67: +2 -1 lines
Add aio_waitcomplete().  Make aio work correctly for socket descriptors.
Make gratuitous style(9) fixes (me, not the submitter) to make the aio
code more readable.

PR:		kern/12053
Submitted by:	Chris Sedore <cmsedore@maxwell.syr.edu>

Revision 1.67: download - view: text, markup, annotated - select for diffs
Mon Dec 27 06:31:53 1999 UTC (12 years, 1 month ago) by green
Branches: MAIN
Diff to: previous 1.66: preferred, colored
Changes since revision 1.66: +3 -5 lines
Correct an uninitialized variable use, which, unlike most times, is
actually a bug this time.

Submitted by:	bde
Reviewed by:	bde

Revision 1.66: download - view: text, markup, annotated - select for diffs
Sun Dec 12 05:52:49 1999 UTC (12 years, 2 months ago) by green
Branches: MAIN
Diff to: previous 1.65: preferred, colored
Changes since revision 1.65: +13 -1 lines
This is Bosko Milekic's mbuf allocation waiting code.  Basically, this
means that running out of mbuf space isn't a panic anymore, and code
which runs out of network memory will sleep to wait for it.

Submitted by:	Bosko Milekic <bmilekic@dsuper.net>
Reviewed by:	green, wollman

Revision 1.65: download - view: text, markup, annotated - select for diffs
Mon Nov 22 02:44:50 1999 UTC (12 years, 2 months ago) by shin
Branches: MAIN
Diff to: previous 1.64: preferred, colored
Changes since revision 1.64: +113 -2 lines
KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assign IPv6 addresses automatically and reply to
IPv6 pings.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project

Revision 1.64: download - view: text, markup, annotated - select for diffs
Tue Nov 16 10:56:05 1999 UTC (12 years, 2 months ago) by phk
Branches: MAIN
Diff to: previous 1.63: preferred, colored
Changes since revision 1.63: +7 -6 lines
This is a partial commit of the patch from PR 14914:

   A lot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
   structures for list operations.  This patch makes all list operations
   in sys/kern use the queue(3) macros, rather than directly accessing the
   *Q_{HEAD,ENTRY} structures.

This batch of changes compiles to the same object files.

Reviewed by:    phk
Submitted by:   Jake Burkholder <jake@checker.org>
PR:     14914

Revision 1.63: download - view: text, markup, annotated - select for diffs
Sat Oct 9 20:42:10 1999 UTC (12 years, 4 months ago) by green
Branches: MAIN
Diff to: previous 1.62: preferred, colored
Changes since revision 1.62: +11 -5 lines
Implement RLIMIT_SBSIZE in the kernel.  This is a per-uid sockbuf total
usage limit.

Revision 1.62: download - view: text, markup, annotated - select for diffs
Sun Sep 19 02:16:19 1999 UTC (12 years, 4 months ago) by green
Branches: MAIN
Diff to: previous 1.61: preferred, colored
Changes since revision 1.61: +5 -9 lines
Change so_cred's type to a ucred, not a pcred.  This makes more sense, actually.
Make a sonewconn3() which takes an extra argument (proc) so new sockets created
with sonewconn() from a user's system call get the correct credentials, not
just the parent's credentials.

Revision 1.10.4.5: download - view: text, markup, annotated - select for diffs
Sun Sep 5 08:32:37 1999 UTC (12 years, 5 months ago) by peter
Branches: RELENG_2_1_0
Diff to: previous 1.10.4.4: preferred, colored; branchpoint 1.10: preferred, colored; next MAIN 1.11: preferred, colored
Changes since revision 1.10.4.4: +1 -1 lines
$Id$ -> $FreeBSD$

Revision 1.20.2.6: download - view: text, markup, annotated - select for diffs
Sun Sep 5 08:15:33 1999 UTC (12 years, 5 months ago) by peter
Branches: RELENG_2_2
Diff to: previous 1.20.2.5: preferred, colored; branchpoint 1.20: preferred, colored; next MAIN 1.21: preferred, colored
Changes since revision 1.20.2.5: +1 -1 lines
$Id$ -> $FreeBSD$

Revision 1.51.2.5: download - view: text, markup, annotated - select for diffs
Sun Aug 29 16:26:11 1999 UTC (12 years, 5 months ago) by peter
Branches: RELENG_3
CVS tags: RELENG_3_4_0_RELEASE, RELENG_3_3_0_RELEASE
Diff to: previous 1.51.2.4: preferred, colored; branchpoint 1.51: preferred, colored
Changes since revision 1.51.2.4: +1 -1 lines
$Id$ -> $FreeBSD$

Revision 1.51.2.4: download - view: text, markup, annotated - select for diffs
Sun Aug 29 13:09:12 1999 UTC (12 years, 5 months ago) by green
Branches: RELENG_3
Diff to: previous 1.51.2.3: preferred, colored; branchpoint 1.51: preferred, colored
Changes since revision 1.51.2.3: +12 -4 lines
MFC:
   This is the pre-3.3 IPFW megamerge. This brings IPFW almost completely
up to 4.0's. __FreeBSD_version is bumped by this commit. Changes include:
	- per-socket credentials stored
	- ability to get those credentials with sysctl
	- uid- and gid- based filtering in IPFW
	- dynamic logging in IPFW (rules can be set as logging for any number
	  of packets, not just the default, and logging can be reset)

Following this is a commit to pidentd to use 1 and 2.

Revision 1.61: download - view: text, markup, annotated - select for diffs
Sat Aug 28 00:46:22 1999 UTC (12 years, 5 months ago) by peter
Branches: MAIN
Diff to: previous 1.60: preferred, colored
Changes since revision 1.60: +1 -1 lines
$Id$ -> $FreeBSD$

Revision 1.60: download - view: text, markup, annotated - select for diffs
Thu Jun 17 23:54:47 1999 UTC (12 years, 7 months ago) by green
Branches: MAIN
Diff to: previous 1.59: preferred, colored
Changes since revision 1.59: +11 -4 lines
Reviewed by: the cast of thousands

This is the change to struct socket that gets rid of so_uid and replaces
it with a much more useful struct pcred *so_cred. This is here to be able
to do socket-level credential checks (i.e. IPFW uid/gid support, to be added
to HEAD soon). Along with this comes an update to pidentd which greatly
simplifies the code necessary to get a uid from a socket. Soon to come:
a sysctl() interface to finding individual sockets' credentials.

Revision 1.59: download - view: text, markup, annotated - select for diffs
Fri Jun 4 02:27:02 1999 UTC (12 years, 8 months ago) by peter
Branches: MAIN
Diff to: previous 1.58: preferred, colored
Changes since revision 1.58: +10 -1 lines
Plug a mbuf leak in tcp_usr_send().  pru_send() routines are expected
to either enqueue or free their mbuf chains, but tcp_usr_send() was
dropping them on the floor if the tcpcb/inpcb had been torn down in the
middle of a send/write attempt.  This has been responsible for a wide
variety of mbuf leak patterns, ranging from slow gradual leakage to rather
rapid exhaustion.  This has been a problem since before 2.2 was branched
and appears to have been fixed in rev 1.16 and lost in 1.23/1.28.

Thanks to Jayanth Vijayaraghavan <jayanth@yahoo-inc.com> for checking
(extensively) into this on a live production 2.2.x system, confirming
that it was the actual cause of the leak and that the fix appears to work.
The machine in question was losing (from memory) about 150 mbufs per hour
under load and a change similar to this stopped it.  (Don't blame Jayanth
for this patch though)

An alternative approach to this would be to recheck SS_CANTSENDMORE etc
inside the splnet() right before calling pru_send() after all the potential
sleeps, interrupts and delays have happened.  However, this would mean
exposing knowledge of the tcp stack's reset handling and removal of the
pcb to the generic code.  There are other things that call pru_send()
directly though.

Problem originally noted by:  John Plevyak <jplevyak@inktomi.com>
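
A minimal user-space sketch of the ownership contract described above: the send routine either enqueues the chain or frees it on every path. The `struct buf` chain stands in for an mbuf chain, and all names plus the ECONNRESET return value are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical buffer chain standing in for an mbuf chain. */
struct buf {
	struct buf *next;
};

/* Free an entire chain, in the manner of m_freem(). */
static void
buf_freem(struct buf *m)
{
	while (m != NULL) {
		struct buf *n = m->next;

		free(m);
		m = n;
	}
}

/* Hypothetical connection; pcb == NULL means it was torn down mid-send. */
struct conn {
	void *pcb;
	struct buf *sendq;
};

/*
 * Send-routine contract: the callee takes ownership of 'm' on every path.
 * Returning early without enqueueing or freeing it is the leak above.
 */
static int
conn_send(struct conn *c, struct buf *m)
{
	struct buf **tail;

	if (c->pcb == NULL) {
		buf_freem(m);		/* connection gone: drop the data, not the memory */
		return (ECONNRESET);
	}
	for (tail = &c->sendq; *tail != NULL; tail = &(*tail)->next)
		;			/* find the end of the send queue */
	*tail = m;
	return (0);
}
```

The caller never touches `m` after the call, whatever the return value.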

Revision 1.58: download - view: text, markup, annotated - select for diffs
Fri May 21 15:54:40 1999 UTC (12 years, 8 months ago) by ache
Branches: MAIN
Diff to: previous 1.57: preferred, colored
Changes since revision 1.57: +12 -4 lines
Really fix overflow on SO_*TIMEO

Submitted by: bde
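
A minimal sketch of range-checking a socket timeout before converting it to clock ticks. The bound (a short-sized timeout field), the EDOM return value, and the round-up of sub-tick timeouts are assumptions for illustration, not taken from this commit.

```c
#include <errno.h>
#include <limits.h>
#include <sys/time.h>

/* Assumed bound: the timeout field is taken to be a short. */
#define EX_MAXTICKS SHRT_MAX

/*
 * Convert a socket timeout to clock ticks at 'hz' ticks per second.
 * Returns EDOM if the value is malformed or exceeds 'maxticks';
 * otherwise stores the tick count in *ticks and returns 0.
 */
static int
timeo_to_ticks(const struct timeval *tv, int hz, long maxticks, long *ticks)
{
	long usec_per_tick = 1000000L / hz;
	long t;

	if (tv->tv_sec < 0 || tv->tv_usec < 0 || tv->tv_usec >= 1000000)
		return (EDOM);
	/* Bound the seconds first so that tv_sec * hz itself cannot overflow. */
	if (tv->tv_sec > maxticks / hz)
		return (EDOM);
	t = (long)tv->tv_sec * hz + (long)tv->tv_usec / usec_per_tick;
	if (t > maxticks)
		return (EDOM);
	/* A nonzero timeout shorter than one tick must not become 0 ("no timeout"). */
	if (t == 0 && tv->tv_usec != 0)
		t = 1;
	*ticks = t;
	return (0);
}
```

Checking the seconds against `maxticks / hz` before multiplying is what keeps the multiplication itself from overflowing.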

Revision 1.51.2.3: download - view: text, markup, annotated - select for diffs
Mon May 10 08:30:34 1999 UTC (12 years, 9 months ago) by dg
Branches: RELENG_3
CVS tags: RELENG_3_2_PAO_BP, RELENG_3_2_PAO, RELENG_3_2_0_RELEASE
Diff to: previous 1.51.2.2: preferred, colored; branchpoint 1.51: preferred, colored
Changes since revision 1.51.2.2: +4 -21 lines
Backed out changes in rev 1.51.2.1 since it causes a socket leak and there
is insufficient time to troubleshoot this prior to the 3.2 release.
Not backed out from current. Some collateral changes in uipc_socket2.c and
sys/socketvar.h were also not backed out as they are harmless.

Revision 1.57: download - view: text, markup, annotated - select for diffs
Mon May 3 23:57:23 1999 UTC (12 years, 9 months ago) by billf
Branches: MAIN
Diff to: previous 1.56: preferred, colored
Changes since revision 1.56: +3 -3 lines
Add sysctl descriptions to many SYSCTL_XXXs

PR:		kern/11197
Submitted by:	Adrian Chadd <adrian@FreeBSD.org>
Reviewed by:	billf(spelling/style/minor nits)
Looked at by:	bde(style)

Revision 1.51.2.2: download - view: text, markup, annotated - select for diffs
Fri Apr 30 19:52:50 1999 UTC (12 years, 9 months ago) by ache
Branches: RELENG_3
Diff to: previous 1.51.2.1: preferred, colored; branchpoint 1.51: preferred, colored
Changes since revision 1.51.2.1: +2 -2 lines
MFC: linger time in seconds

Revision 1.56: download - view: text, markup, annotated - select for diffs
Sat Apr 24 18:22:34 1999 UTC (12 years, 9 months ago) by ache
Branches: MAIN
CVS tags: PRE_VFS_BIO_NFS_PATCH, PRE_SMP_VMSHARE, POST_VFS_BIO_NFS_PATCH, POST_SMP_VMSHARE
Diff to: previous 1.55: preferred, colored
Changes since revision 1.55: +3 -3 lines
Lite2 bugfixes merge:
so_linger is in seconds, not in 1/HZ
range checking in SO_*TIMEO was wrong

PR: 11252

Revision 1.51.2.1: download - view: text, markup, annotated - select for diffs
Fri Feb 26 17:32:49 1999 UTC (12 years, 11 months ago) by peter
Branches: RELENG_3
Diff to: previous 1.51: preferred, colored
Changes since revision 1.51: +21 -4 lines
Backport NetBSD's 19990120-accept fix from -current to 3.x-stable.

This includes:  uipc_socket.c rev 1.52,1.54; uipc_socket2.c rev 1.44,
socketvar.h rev 1.33, uipc_syscall.c rev 1.54 (indirectly related)

Revision 1.55: download - view: text, markup, annotated - select for diffs
Tue Feb 16 10:49:49 1999 UTC (12 years, 11 months ago) by dfr
Branches: MAIN
CVS tags: PRE_NEWBUS, POST_NEWBUS
Diff to: previous 1.54: preferred, colored
Changes since revision 1.54: +3 -1 lines
* Change sysctl from using linker_set to construct its tree using SLISTs.
  This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
  file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
	Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)

Revision 1.54: download - view: text, markup, annotated - select for diffs
Tue Feb 2 07:23:28 1999 UTC (13 years ago) by fenner
Branches: MAIN
Diff to: previous 1.53: preferred, colored
Changes since revision 1.53: +7 -2 lines
Fix the port of the NetBSD 19990120-accept fix.  I misread a piece of
code when examining their fix, which caused my code (in rev 1.52) to:
- panic("soaccept: !NOFDREF")
- fatal trap 12, with tracebacks going thru soclose and soaccept

Revision 1.53: download - view: text, markup, annotated - select for diffs
Wed Jan 27 21:49:57 1999 UTC (13 years ago) by dillon
Branches: MAIN
Diff to: previous 1.52: preferred, colored
Changes since revision 1.52: +2 -2 lines
        Fix warnings in preparation for adding -Wall -Wcast-qual to the
        kernel compile

Revision 1.52: download - view: text, markup, annotated - select for diffs
Mon Jan 25 16:58:52 1999 UTC (13 years ago) by fenner
Branches: MAIN
Diff to: previous 1.51: preferred, colored
Changes since revision 1.51: +15 -3 lines
Port NetBSD's 19990120-accept bug fix.  This works around the race condition
where select(2) can return that a listening socket has a connected socket
queued, the connection is broken, and the user calls accept(2), which then
blocks because there are no connections queued.

Reviewed by:	wollman
Obtained from:	NetBSD
(ftp://ftp.NetBSD.ORG/pub/NetBSD/misc/security/patches/19990120-accept)

Revision 1.51: download - view: text, markup, annotated - select for diffs
Wed Jan 20 17:45:22 1999 UTC (13 years ago) by fenner
Branches: MAIN
CVS tags: RELENG_3_BP, RELENG_3_1_0_RELEASE
Branch point for: RELENG_3
Diff to: previous 1.50: preferred, colored
Changes since revision 1.50: +2 -2 lines
Also consider the space left in the socket buffer when deciding whether
to set PRUS_MORETOCOME.

Revision 1.50: download - view: text, markup, annotated - select for diffs
Wed Jan 20 17:31:54 1999 UTC (13 years ago) by fenner
Branches: MAIN
Diff to: previous 1.49: preferred, colored
Changes since revision 1.49: +4 -2 lines
Add a flag, passed to pru_send routines, PRUS_MORETOCOME.  This
flag means that there is more data to be put into the socket buffer.
Use it in TCP to reduce the interaction between mbuf sizes and the
Nagle algorithm.

Based on:	"Justin C. Walker" <justin@apple.com>'s description of Apple's
		fix for this problem.
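
A minimal sketch of the flag decision, including the refinement from rev 1.51: the protocol is told more data is coming only if data remains to be copied and the socket buffer still has room for it. The flag value and function name are hypothetical.

```c
/* Hypothetical value; the real PRUS_MORETOCOME lives in sys/protosw.h. */
#define EX_PRUS_MORETOCOME 0x4

/*
 * resid: bytes of the user write not yet copied into mbufs.
 * space: bytes still free in the send socket buffer.
 */
static int
send_flags(long resid, long space)
{
	/* With no space left, sosend would block, so no more data is imminent. */
	return ((resid > 0 && space > 0) ? EX_PRUS_MORETOCOME : 0);
}
```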

Revision 1.49: download - view: text, markup, annotated - select for diffs
Sun Jan 10 01:58:25 1999 UTC (13 years, 1 month ago) by eivind
Branches: MAIN
Diff to: previous 1.48: preferred, colored
Changes since revision 1.48: +2 -2 lines
KNFize, by bde.

Revision 1.48: download - view: text, markup, annotated - select for diffs
Fri Jan 8 17:31:13 1999 UTC (13 years, 1 month ago) by eivind
Branches: MAIN
Diff to: previous 1.47: preferred, colored
Changes since revision 1.47: +6 -13 lines
Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by:	msmith
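
A user-space sketch of how a `KASSERT(assertion, ("panic message", args))` macro can work. The message is a parenthesized argument list, so `panic msg` expands into an ordinary variadic call. The `panic()` stand-in (print and abort) and defining INVARIANTS here are assumptions for illustration.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define INVARIANTS 1	/* assumed: build with assertions enabled */

/* User-space stand-in for the kernel's panic(): print the message and abort. */
static void
panic(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	abort();
}

#ifdef INVARIANTS
#define KASSERT(exp, msg) do {		\
	if (!(exp))			\
		panic msg;		\
} while (0)
#else
#define KASSERT(exp, msg) do { } while (0)	/* check compiles away */
#endif
```

The message arguments are evaluated only when the assertion fails, so the check costs only the comparison when it holds.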

Revision 1.47: download - view: text, markup, annotated - select for diffs
Mon Dec 7 21:58:29 1998 UTC (13 years, 2 months ago) by archie
Branches: MAIN
Diff to: previous 1.46: preferred, colored
Changes since revision 1.46: +1 -3 lines
The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.

Revision 1.46: download - view: text, markup, annotated - select for diffs
Wed Nov 11 10:03:56 1998 UTC (13 years, 3 months ago) by truckman
Branches: MAIN
Diff to: previous 1.45: preferred, colored
Changes since revision 1.45: +4 -5 lines
Installed the second patch attached to kern/7899 with some changes suggested
by bde, a few other tweaks to get the patch to apply cleanly again and
some improvements to the comments.

This change closes some fairly minor security holes associated with
F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN
had on tty devices.  For more details, see the description on the PR.

Because this patch increases the size of the proc and pgrp structures,
it is necessary to re-install the includes and recompile libkvm,
the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.

PR:		kern/7899
Reviewed by:	bde, elvind

Revision 1.45: download - view: text, markup, annotated - select for diffs
Mon Aug 31 18:07:23 1998 UTC (13 years, 5 months ago) by wollman
Branches: MAIN
CVS tags: RELENG_3_0_0_RELEASE
Diff to: previous 1.44: preferred, colored
Changes since revision 1.44: +6 -5 lines
Bow to tradition and correctly implement the bogus-but-hallowed semantics
of getsockopt never telling how much it might have copied if only the
buffer were big enough.

Revision 1.44: download - view: text, markup, annotated - select for diffs
Mon Aug 31 15:34:55 1998 UTC (13 years, 5 months ago) by wollman
Branches: MAIN
Diff to: previous 1.43: preferred, colored
Changes since revision 1.43: +3 -6 lines
Correctly set the return length regardless of the relative size of the
user's buffer.  Simplify the logic a bit.  (Can we have a version of
min() for size_t?)
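
A minimal sketch of the traditional copy-out semantics described in revs 1.44 and 1.45. At most the caller's buffer size is copied, and the reported length is what was actually copied, never the size the full value would need. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy an option value of 'valsize' bytes into 'buf', whose size is
 * passed in *bufsize. On return, *bufsize holds the bytes copied.
 */
static void
sockopt_copyout(const void *val, size_t valsize, void *buf, size_t *bufsize)
{
	size_t n = (valsize < *bufsize) ? valsize : *bufsize;	/* min() for size_t */

	memcpy(buf, val, n);
	*bufsize = n;
}
```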

Revision 1.43: download - view: text, markup, annotated - select for diffs
Sun Aug 23 03:06:59 1998 UTC (13 years, 5 months ago) by wollman
Branches: MAIN
Diff to: previous 1.42: preferred, colored
Changes since revision 1.42: +148 -97 lines
Yow!  Completely change the way socket options are handled, eliminating
another specialized mbuf type in the process.  Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.

Revision 1.42: download - view: text, markup, annotated - select for diffs
Sat Jul 18 18:48:45 1998 UTC (13 years, 6 months ago) by fenner
Branches: MAIN
Diff to: previous 1.41: preferred, colored
Changes since revision 1.41: +1 -2 lines
Undo rev 1.41 until we get more details about why it makes some systems
 fail.

Revision 1.41: download - view: text, markup, annotated - select for diffs
Mon Jul 6 19:27:14 1998 UTC (13 years, 7 months ago) by fenner
Branches: MAIN
Diff to: previous 1.40: preferred, colored
Changes since revision 1.40: +2 -1 lines
Introduce (fairly hacky) workaround for odd TCP behavior with application
 writes of size (100,208]+N*MCLBYTES.

The bug:
 sosend() hands each mbuf off to the protocol output routine as soon as it
 has copied it, in the hopes of increasing parallelism (see
  http://www.kohala.com/~rstevens/vanj.88jul20.txt ). This works well for
 TCP as long as the first mbuf handed off is at least the MSS.  However,
 when doing small writes (between MHLEN and MINCLSIZE), the transaction is
 split into two small mbufs and each is individually handed off to TCP.
 TCP assumes that the first small mbuf is the whole transaction, so it sends
 a small packet.  When the second small mbuf arrives, Nagle prevents TCP
 from sending it so it must wait for a (potentially delayed) ACK.  This
 sends throughput down the toilet.

The workaround:
 Set the "atomic" flag when we're doing small writes.  The "atomic" flag
 has two meanings:
 1. Copy all of the data into a chain of mbufs before handing off to the
    protocol.
 2. Leave room for a datagram header in said mbuf chain.
 TCP wants the first but doesn't want the second.  However, the second
 simply results in some memory wastage (but is why the workaround is a
 hack and not a fix).

The real fix:
 The real fix for this problem is to introduce something like a "requested
 transfer size" variable in the socket->protocol interface.  sosend()
 would then accumulate an mbuf chain until it exceeded the "requested
 transfer size".  TCP could set it to the TCP MSS (note that the
 current interface causes strange TCP behaviors when the MSS > MCLBYTES;
 nobody notices because MCLBYTES > ethernet's MTU).
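
A small sketch of the write sizes affected by the bug. The range (100,208]+N*MCLBYTES comes from the message above, with 100 and 208 corresponding to MHLEN and MINCLSIZE. The MCLBYTES value of 2048 and all names are assumptions.

```c
#include <stdbool.h>

#define EX_MHLEN	100	/* from the range quoted in the commit */
#define EX_MINCLSIZE	208	/* from the range quoted in the commit */
#define EX_MCLBYTES	2048	/* assumed cluster size */

/*
 * True if a write of 'len' bytes ends in a tail too big for one header
 * mbuf but too small for a cluster, so that sosend() would copy it into
 * two small mbufs and hand them to TCP separately.
 */
static bool
write_tail_splits(long len)
{
	long tail = len % EX_MCLBYTES;

	return (tail > EX_MHLEN && tail <= EX_MINCLSIZE);
}
```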

Revision 1.40: download - view: text, markup, annotated - select for diffs
Fri May 15 20:11:30 1998 UTC (13 years, 8 months ago) by wollman
Branches: MAIN
CVS tags: PRE_NOBDEV
Diff to: previous 1.39: preferred, colored
Changes since revision 1.39: +46 -8 lines
Convert socket structures to be type-stable and add a version number.

Define a parameter which indicates the maximum number of sockets in a
system, and use this to size the zone allocators used for sockets and
for certain PCBs.

Convert PF_LOCAL PCB structures to be type-stable and add a version number.

Define an external format for information about socket structures and use
it in several places.

Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through
sysctl(3) without blocking network interrupts for an unreasonable
length of time.  This probably still has some bugs and/or race
conditions, but it seems to work well enough on my machines.

It is now possible for `netstat' to get almost all of its information
via the sysctl(3) interface rather than reading kmem (changes to follow).

Revision 1.39: download - view: text, markup, annotated - select for diffs
Sat Mar 28 10:33:08 1998 UTC (13 years, 10 months ago) by bde
Branches: MAIN
CVS tags: PRE_DEVFS_SLICE, POST_DEVFS_SLICE
Diff to: previous 1.38: preferred, colored
Changes since revision 1.38: +2 -1 lines
Moved some #includes from <sys/param.h> nearer to where they are actually
used.

Revision 1.20.2.5: download - view: text, markup, annotated - select for diffs
Mon Mar 2 07:58:12 1998 UTC (13 years, 11 months ago) by guido
Branches: RELENG_2_2
CVS tags: RELENG_2_2_8_RELEASE, RELENG_2_2_7_RELEASE, RELENG_2_2_6_RELEASE
Diff to: previous 1.20.2.4: preferred, colored; branchpoint 1.20: preferred, colored
Changes since revision 1.20.2.4: +2 -1 lines
MFC: add uid's to struct socket.

Revision 1.38: download - view: text, markup, annotated - select for diffs
Sun Mar 1 19:39:17 1998 UTC (13 years, 11 months ago) by guido
Branches: MAIN
CVS tags: PRE_SOFTUPDATE, POST_SOFTUPDATE
Diff to: previous 1.37: preferred, colored
Changes since revision 1.37: +2 -1 lines
Make sure that you can only bind a more specific address when it is
done by the same uid.
Obtained from: OpenBSD

Revision 1.20.2.4: download - view: text, markup, annotated - select for diffs
Thu Feb 19 20:20:27 1998 UTC (13 years, 11 months ago) by fenner
Branches: RELENG_2_2
Diff to: previous 1.20.2.3: preferred, colored; branchpoint 1.20: preferred, colored
Changes since revision 1.20.2.3: +7 -3 lines
Merge rev 1.37: clear so_error after reporting it in sosend().

Revision 1.37: download - view: text, markup, annotated - select for diffs
Thu Feb 19 19:38:20 1998 UTC (13 years, 11 months ago) by fenner
Branches: MAIN
Diff to: previous 1.36: preferred, colored
Changes since revision 1.36: +7 -3 lines
Revert sosend() to its behavior from 4.3-Tahoe and before: if
so_error is set, clear it before returning it.  The behavior
introduced in 4.3-Reno (to not clear so_error) causes potentially
transient errors (e.g.  ECONNREFUSED if the other end hasn't opened
its socket yet) to be permanent on connected datagram sockets that
are only used for writing.

(soreceive() clears so_error before returning it, as does
getsockopt(...,SO_ERROR,...).)

Submitted by:	Van Jacobson <van@ee.lbl.gov>, via a comment in the vat sources.
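
A minimal sketch of the report-once semantics restored here. The pending error is fetched and cleared in one step, so a transient error such as ECONNREFUSED fails one send instead of every later one. The struct and function names are hypothetical.

```c
#include <errno.h>

/* Hypothetical minimal socket holding a pending asynchronous error. */
struct sock {
	int so_error;	/* e.g. ECONNREFUSED from an ICMP report; 0 if none */
};

/* Return the pending error, if any, and clear it so it is reported once. */
static int
take_so_error(struct sock *so)
{
	int error = so->so_error;

	so->so_error = 0;
	return (error);
}
```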

Revision 1.36: download - view: text, markup, annotated - select for diffs
Fri Feb 6 12:13:28 1998 UTC (14 years ago) by eivind
Branches: MAIN
Diff to: previous 1.35: preferred, colored
Changes since revision 1.35: +1 -3 lines
Back out DIAGNOSTIC changes.

Revision 1.35: download - view: text, markup, annotated - select for diffs
Wed Feb 4 22:32:37 1998 UTC (14 years ago) by eivind
Branches: MAIN
Diff to: previous 1.34: preferred, colored
Changes since revision 1.34: +3 -1 lines
Turn DIAGNOSTIC into a new-style option.

Revision 1.20.2.3: download - view: text, markup, annotated - select for diffs
Wed Jan 28 23:32:26 1998 UTC (14 years ago) by jkh
Branches: RELENG_2_2
Diff to: previous 1.20.2.2: preferred, colored; branchpoint 1.20: preferred, colored
Changes since revision 1.20.2.2: +27 -4 lines
MFC: better range checking to prevent attack.

Revision 1.34: download - view: text, markup, annotated - select for diffs
Sun Nov 9 05:07:40 1997 UTC (14 years, 3 months ago) by jkh
Branches: MAIN
Diff to: previous 1.33: preferred, colored
Changes since revision 1.33: +9 -3 lines
MF22: MSG_EOR bug fix.
Submitted by:	wollman

Revision 1.20.2.2: download - view: text, markup, annotated - select for diffs
Sun Nov 9 05:06:12 1997 UTC (14 years, 3 months ago) by jkh
Branches: RELENG_2_2
Diff to: previous 1.20.2.1: preferred, colored; branchpoint 1.20: preferred, colored
Changes since revision 1.20.2.1: +9 -3 lines
Prevent bogus use of MSG_EOR with SOCK_STREAM sockets from panicking
the system.

Submitted by: wollman
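
A minimal sketch of the kind of argument check this fix implies. MSG_EOR, which marks a record boundary, is rejected on a SOCK_STREAM socket with EINVAL instead of being passed down to the protocol. The function name is hypothetical, and the additional negative-length check is an assumption.

```c
#include <errno.h>
#include <sys/socket.h>

/* Validate sosend()-style arguments; returns 0 or EINVAL. */
static int
sosend_check(int so_type, long resid, int flags)
{
	if (resid < 0 || (so_type == SOCK_STREAM && (flags & MSG_EOR)))
		return (EINVAL);
	return (0);
}
```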

Revision 1.33: download - view: text, markup, annotated - select for diffs
Sun Oct 12 20:24:12 1997 UTC (14 years, 4 months ago) by phk
Branches: MAIN
Diff to: previous 1.32: preferred, colored
Changes since revision 1.32: +5 -1 lines
Last major round (Unless Bruce thinks of something :-) of malloc changes.

Distribute all but the most fundamental malloc types.  This time I also
remembered the trick to making things static:  Put "static" in front of
them.

A couple of finer points by:	bde

Revision 1.32: download - view: text, markup, annotated - select for diffs
Sat Oct 4 18:21:15 1997 UTC (14 years, 4 months ago) by phk
Branches: MAIN
Diff to: previous 1.31: preferred, colored
Changes since revision 1.31: +3 -2 lines
While booting diskless we have no proc pointer.

Revision 1.31: download - view: text, markup, annotated - select for diffs
Sun Sep 14 02:34:14 1997 UTC (14 years, 4 months ago) by peter
Branches: MAIN
Diff to: previous 1.30: preferred, colored
Changes since revision 1.30: +27 -29 lines
Extend select backend for sockets to work with a poll interface (more
detail is passed back and forth).  This mostly came from NetBSD, except
that our interfaces have changed a lot and this function is in a different
part of the kernel.

Obtained from: NetBSD

Revision 1.30: download - view: text, markup, annotated - select for diffs
Tue Sep 2 20:05:57 1997 UTC (14 years, 5 months ago) by bde
Branches: MAIN
Diff to: previous 1.29: preferred, colored
Changes since revision 1.29: +1 -2 lines
Removed unused #includes.

Revision 1.29: download - view: text, markup, annotated - select for diffs
Thu Aug 21 20:33:39 1997 UTC (14 years, 5 months ago) by bde
Branches: MAIN
Diff to: previous 1.28: preferred, colored
Changes since revision 1.28: +3 -1 lines
#include <machine/limits.h> explicitly in the few places that it is required.

Revision 1.28: download - view: text, markup, annotated - select for diffs
Sat Aug 16 19:15:04 1997 UTC (14 years, 5 months ago) by wollman
Branches: MAIN
Diff to: previous 1.27: preferred, colored
Changes since revision 1.27: +16 -22 lines
Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs.  (Socket buffers are the one exception.)  A number
of kernel APIs needed to get fixed in order to make this happen.  Also,
fix three protocol families that kept PCBs in mbufs to malloc them
instead.  Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.

Revision 1.27.2.1: download - view: text, markup, annotated - select for diffs
Wed Jul 2 19:53:44 1997 UTC (14 years, 7 months ago) by wollman
Branches: WOLLMAN_MBUF
Diff to: previous 1.27: preferred, colored; next MAIN 1.28: preferred, colored
Changes since revision 1.27: +16 -22 lines
Check in my big get-rid-of-sockaddrs-in-mbufs patch, on a private branch.

Requested by: julian

Revision 1.27: download - view: text, markup, annotated - select for diffs
Fri Jun 27 15:28:54 1997 UTC (14 years, 7 months ago) by peter
Branches: MAIN
CVS tags: BP_WOLLMAN_MBUF
Branch point for: WOLLMAN_MBUF
Diff to: previous 1.26: preferred, colored
Changes since revision 1.26: +27 -4 lines
Don't accept insane values for SO_(SND|RCV)BUF, and the low water marks.
Specifically, don't allow a value < 1 for any of them (it doesn't make
sense), and don't let the low water mark be greater than the corresponding
high water mark.

Pre-Approved by: wollman
Obtained from: NetBSD
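
A minimal sketch of the validation described above. Values below 1 are rejected, and a requested low water mark is kept at or below the high water mark. The struct and names are hypothetical, and clamping rather than rejecting an oversized low water mark is an assumed policy.

```c
#include <errno.h>

/* Hypothetical, simplified socket-buffer limits. */
struct sbuf_limits {
	int hiwat;	/* high water mark (SO_SNDBUF / SO_RCVBUF) */
	int lowat;	/* low water mark (SO_SNDLOWAT / SO_RCVLOWAT) */
};

/* Set the high water mark; values below 1 make no sense. */
static int
set_hiwat(struct sbuf_limits *sb, int val)
{
	if (val < 1)
		return (EINVAL);
	sb->hiwat = val;
	return (0);
}

/* Set the low water mark; reject values below 1, clamp to hiwat. */
static int
set_lowat(struct sbuf_limits *sb, int val)
{
	if (val < 1)
		return (EINVAL);
	sb->lowat = (val > sb->hiwat) ? sb->hiwat : val;
	return (0);
}
```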

Revision 1.26: download - view: text, markup, annotated - select for diffs
Sun Apr 27 20:00:44 1997 UTC (14 years, 9 months ago) by wollman
Branches: MAIN
Diff to: previous 1.25: preferred, colored
Changes since revision 1.25: +57 -26 lines
The long-awaited mega-massive-network-code-cleanup.  Part I.

This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.

2) Certain protocol entry points are modified to take a process structure,
so that they can easily tell whether or not it is possible to sleep, and
also to access credentials.

3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call.  Protocols should use the process pointer they are now passed.

4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.

5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.

As a result, LINT is now broken.  I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.

Revision 1.25: download - view: text, markup, annotated - select for diffs
Sun Mar 23 03:36:31 1997 UTC (14 years, 10 months ago) by bde
Branches: MAIN
CVS tags: pre_smp_merge, post_smp_merge
Diff to: previous 1.24: preferred, colored
Changes since revision 1.24: +2 -2 lines
Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place.  Most things don't depend on file.h stuff at all.

Revision 1.24: download - view: text, markup, annotated - select for diffs
Mon Feb 24 20:30:56 1997 UTC (14 years, 11 months ago) by wollman
Branches: MAIN
Diff to: previous 1.23: preferred, colored
Changes since revision 1.23: +3 -2 lines
Create a new branch of the kernel MIB, kern.ipc, to store
all of the configurables and instrumentation related to
inter-process communication mechanisms.  Some variables,
like mbuf statistics, are instrumented here for the first
time.

For mbuf statistics: also keep track of m_copym() and
m_pullup() failures, and provide for the user's inspection
the compiled-in values of MSIZE, MHLEN, MCLBYTES, and MINCLSIZE.

Revision 1.23: download - view: text, markup, annotated - select for diffs
Sat Feb 22 09:39:28 1997 UTC (14 years, 11 months ago) by peter
Branches: MAIN
Diff to: previous 1.22: preferred, colored
Changes since revision 1.22: +1 -1 lines
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$.  We are not
ready for it yet.

Revision 1.22: download - view: text, markup, annotated - select for diffs
Tue Jan 14 06:44:19 1997 UTC (15 years ago) by jkh
Branches: MAIN
Diff to: previous 1.21: preferred, colored
Changes since revision 1.21: +1 -1 lines
Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.

Revision 1.20.2.1: download - view: text, markup, annotated - select for diffs
Tue Dec 3 10:48:58 1996 UTC (15 years, 2 months ago) by phk
Branches: RELENG_2_2
CVS tags: RELENG_2_2_5_RELEASE, RELENG_2_2_2_RELEASE, RELENG_2_2_1_RELEASE, RELENG_2_2_0_RELEASE
Diff to: previous 1.20: preferred, colored
Changes since revision 1.20: +3 -1 lines
YAMFC

Revision 1.21: download - view: text, markup, annotated - select for diffs
Fri Nov 29 19:03:42 1996 UTC (15 years, 2 months ago) by davidg
Branches: MAIN
Diff to: previous 1.20: preferred, colored
Changes since revision 1.20: +3 -1 lines
Check for error return from uiomove to prevent looping endlessly in
soreceive(). Closes PR#2114.

Submitted by:	wpaul

Revision 1.20: download - view: text, markup, annotated - select for diffs
Mon Oct 7 04:32:26 1996 UTC (15 years, 4 months ago) by pst
Branches: MAIN
CVS tags: RELENG_2_2_BP
Branch point for: RELENG_2_2
Diff to: previous 1.19: preferred, colored
Changes since revision 1.19: +2 -1 lines
Increase robustness of FreeBSD against high-rate connection attempt
denial of service attacks.

Reviewed by:	bde,wollman,olah
Inspired by:	vjs@sgi.com

Revision 1.19: download - view: text, markup, annotated - select for diffs
Thu Jul 11 16:31:56 1996 UTC (15 years, 7 months ago) by wollman
Branches: MAIN
Diff to: previous 1.18: preferred, colored
Changes since revision 1.18: +18 -36 lines
Modify the kernel to use the new pr_usrreqs interface rather than the old
pr_usrreq mechanism which was poorly designed and error-prone.  This
commit renames pr_usrreq to pr_ousrreq so that old code which depended on it
would break in an obvious manner.  This commit also implements the new
interface for TCP, although the old function is left as an example
(#ifdef'ed out).  This commit ALSO fixes a longstanding bug in the
TCP timer processing (introduced by davidg on 1995/04/12) which caused
timer processing on a TCB to always stop after a single timer had
expired (because it misinterpreted the return value from tcp_usrreq()
to indicate that the TCB had been deleted).  Finally, some code
related to polling has been deleted from if.c because it is not
relevant to -current and doesn't look at all like my current code.

Revision 1.10.4.4: download - view: text, markup, annotated - select for diffs
Wed Jun 5 19:49:12 1996 UTC (15 years, 8 months ago) by nate
Branches: RELENG_2_1_0
CVS tags: RELENG_2_1_7_RELEASE, RELENG_2_1_6_RELEASE, RELENG_2_1_6_1_RELEASE, RELENG_2_1_5_RELEASE
Diff to: previous 1.10.4.3: preferred, colored; branchpoint 1.10: preferred, colored
Changes since revision 1.10.4.3: +9 -4 lines
Fixed bogus changes from mega-commit 3.  This reverts the files to their
revisions *before* the mega-commit but makes sure any subsequent fixes
are brought in.

TODO - netiso

Revision 1.10.4.3: download - view: text, markup, annotated - select for diffs
Wed Jun 5 02:54:30 1996 UTC (15 years, 8 months ago) by jkh
Branches: RELENG_2_1_0
Diff to: previous 1.10.4.2: preferred, colored; branchpoint 1.10: preferred, colored
Changes since revision 1.10.4.2: +3 -8 lines
This 3rd mega-commit should hopefully bring us back to where we were.
I can get it to `make world' successfully, anyway!

Revision 1.10.4.2: download - view: text, markup, annotated - select for diffs
Fri May 31 08:04:11 1996 UTC (15 years, 8 months ago) by peter
Branches: RELENG_2_1_0
Diff to: previous 1.10.4.1: preferred, colored; branchpoint 1.10: preferred, colored
Changes since revision 1.10.4.1: +9 -4 lines
Add sysctl hooks for user-mode setproctitle() and libkvm to see,
taken from -current, but implemented in old-style sysctl.

Revision 1.18: download - view: text, markup, annotated - select for diffs
Thu May 9 20:14:57 1996 UTC (15 years, 9 months ago) by wollman
Branches: MAIN
Diff to: previous 1.17: preferred, colored
Changes since revision 1.17: +3 -1 lines
Make it possible to return more than one piece of control information
(PR #1178).
Define a new SO_TIMESTAMP socket option for datagram sockets to return
packet-arrival timestamps as control information (PR #1179).

Submitted by:	Louis Mamakos <loiue@TransSys.com>

Revision 1.17: download - view: text, markup, annotated - select for diffs
Tue Apr 16 03:50:08 1996 UTC (15 years, 9 months ago) by davidg
Branches: MAIN
Diff to: previous 1.16: preferred, colored
Changes since revision 1.16: +8 -4 lines
Fix for PR #1146: the "next" pointer must be cached before calling soabort
since the struct containing it may be freed.
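
A minimal sketch of the pattern behind this fix. When walking a list and possibly freeing the current element, its successor is saved first, because the `next` field is invalid once the element is freed. The list type and names are hypothetical stand-ins for the socket queue and soabort().

```c
#include <stdlib.h>

/* Hypothetical list node standing in for a queued socket. */
struct node {
	int doomed;		/* nonzero: this entry should be aborted */
	struct node *next;
};

/*
 * Free every doomed node and return the new head; *nfreed gets the count.
 * linkp points at whichever link currently references cur.
 */
static struct node *
abort_doomed(struct node *head, int *nfreed)
{
	struct node **linkp = &head;
	struct node *cur, *next;

	*nfreed = 0;
	for (cur = head; cur != NULL; cur = next) {
		next = cur->next;	/* cache before cur may be freed */
		if (cur->doomed) {
			*linkp = next;	/* unlink */
			free(cur);
			(*nfreed)++;
		} else
			linkp = &cur->next;
	}
	return (head);
}
```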

Revision 1.1.1.2 (vendor branch): download - view: text, markup, annotated - select for diffs
Mon Mar 11 20:01:52 1996 UTC (15 years, 11 months ago) by peter
Branches: CSRG
CVS tags: bsd_44_lite_2
Diff to: previous 1.1.1.1: preferred, colored
Changes since revision 1.1.1.1: +28 -12 lines
Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all
files are off the vendor branch, so this should not change anything.

A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge.  "C" generally
means that there was a change.
[note new unused (in this form) syscalls.conf, to be 'cvs rm'ed]

Revision 1.16: download - view: text, markup, annotated - select for diffs
Mon Mar 11 15:37:31 1996 UTC (15 years, 11 months ago) by davidg
Branches: MAIN
CVS tags: wollman_polling
Diff to: previous 1.15: preferred, colored
Changes since revision 1.15: +23 -10 lines
Changed socket code to use 4.4BSD queue macros. This includes removing
the obsolete soqinsque and soqremque functions as well as collapsing
so_q0len and so_qlen into a single queue length of unaccepted connections.
Now the queue of unaccepted & complete connections is checked directly
for queued sockets. The new code should be functionally equivalent to
the old while being substantially faster - especially in cases where
large numbers of connections are often queued for accept (e.g. http).
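
A minimal sketch of a queue of completed connections built with the 4.4BSD TAILQ macros from <sys/queue.h>, which both the BSDs and glibc provide. "Is anything queued?" is answered by looking at the queue itself rather than a separate length counter. The struct and function names are hypothetical.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a completed, not-yet-accepted connection. */
struct conn {
	int id;
	TAILQ_ENTRY(conn) link;		/* linkage in the completed queue */
};

TAILQ_HEAD(connq, conn);

/* Append a completed connection; returns 0, or -1 if allocation fails. */
static int
conn_enqueue(struct connq *q, int id)
{
	struct conn *c = malloc(sizeof(*c));

	if (c == NULL)
		return (-1);
	c->id = id;
	TAILQ_INSERT_TAIL(q, c, link);
	return (0);
}

/* Remove the oldest connection and return its id, or -1 if none. */
static int
conn_accept(struct connq *q)
{
	struct conn *c = TAILQ_FIRST(q);
	int id;

	if (c == NULL)			/* check the queue directly */
		return (-1);
	TAILQ_REMOVE(q, c, link);
	id = c->id;
	free(c);
	return (id);
}
```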

Revision 1.15
Tue Feb 13 18:16:20 1996 UTC (16 years ago) by wollman
Branches: MAIN
Diff to: previous 1.14
Changes since revision 1.14: +2 -2 lines
Kill XNS.
While we're at it, fix socreate() to take a process argument.  (This
was supposed to get committed days ago...)

Revision 1.14
Wed Feb 7 16:19:19 1996 UTC (16 years ago) by wollman
Branches: MAIN
Diff to: previous 1.13
Changes since revision 1.13: +10 -1 lines
Define a new socket option, SO_PRIVSTATE.  Getting it returns the state
of the SS_PRIV flag in so_state; setting it always clears same.

Revision 1.13
Thu Dec 14 22:51:01 1995 UTC (16 years, 2 months ago) by bde
Branches: MAIN
Diff to: previous 1.12
Changes since revision 1.12: +2 -2 lines
Nuked ambiguous sleep message strings:
	old:				new:
	netcls[] = "netcls"		"soclos"
	netcon[] = "netcon"		"accept", "connec"
	netio[] = "netio"		"sblock", "sbwait"

Revision 1.12
Fri Nov 3 18:33:43 1995 UTC (16 years, 3 months ago) by wollman
Branches: MAIN
Diff to: previous 1.11
Changes since revision 1.11: +8 -4 lines
Make somaxconn (maximum backlog in a listen(2) request) and sb_max
(maximum size of a socket buffer) tunable.

Permit callers of listen(2) to specify a negative backlog, which
is translated into somaxconn.  Previously, a negative backlog was
silently translated into 0.
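
The backlog rule fits in a single function. In this sketch, `effective_backlog()` is a hypothetical name, and `somaxconn` is passed in rather than read from the tunable. Clamping a backlog larger than `somaxconn` down to `somaxconn` is an assumption; the log entry itself only describes the negative case.

```c
#include <assert.h>

/*
 * Sketch of the listen(2) backlog rule: a negative backlog now means
 * "use somaxconn".  Capping oversized requests at somaxconn is an
 * assumption, not something this log entry states.
 */
int
effective_backlog(int backlog, int somaxconn)
{
	assert(somaxconn >= 0);
	if (backlog < 0 || backlog > somaxconn)
		backlog = somaxconn;
	return (backlog);
}
```

Under the previous behavior, a backlog of -1 became 0. With this rule, `effective_backlog(-1, 128)` returns 128.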

Revision 1.10.4.1
Tue Sep 12 08:29:56 1995 UTC (16 years, 5 months ago) by davidg
Branches: RELENG_2_1_0
CVS tags: RELENG_2_1_0_RELEASE
Diff to: previous 1.10
Changes since revision 1.10: +2 -3 lines
Brought in change from rev 1.11: kill extra arg to a pr_usrreq call.

Revision 1.11
Fri Aug 25 20:27:46 1995 UTC (16 years, 5 months ago) by bde
Branches: MAIN
Diff to: previous 1.10
Changes since revision 1.10: +2 -3 lines
Remove extra arg from one of the calls to (*pr_usrreq)().

Revision 1.10
Tue May 30 08:06:21 1995 UTC (16 years, 8 months ago) by rgrimes
Branches: MAIN
CVS tags: RELENG_2_1_0_BP, RELENG_2_0_5_RELEASE, RELENG_2_0_5_BP, RELENG_2_0_5
Branch point for: RELENG_2_1_0
Diff to: previous 1.9
Changes since revision 1.9: +2 -2 lines
Remove trailing whitespace.

Revision 1.9
Thu Feb 16 01:07:43 1995 UTC (16 years, 11 months ago) by wollman
Branches: MAIN
CVS tags: RELENG_2_0_5_ALPHA
Diff to: previous 1.8
Changes since revision 1.8: +2 -2 lines
getsockopt(s, SOL_SOCKET, SO_SNDTIMEO, ...) would construct the returned
timeval incorrectly, truncating the usec part.

Obtained from: Stevens vol. 2 p. 548
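
Socket timeouts are kept internally as a count of clock ticks. Returning SO_SNDTIMEO therefore means splitting that count back into whole seconds and leftover microseconds, and getting the microsecond step wrong produces the truncated tv_usec described above. The sketch below shows both conversions. It assumes a tick rate `hz` that divides 1,000,000 evenly; `timeval_to_ticks()` and `ticks_to_timeval()` are hypothetical names, not the kernel's own code.

```c
#include <assert.h>
#include <sys/time.h>

/* Timeval -> ticks, as setsockopt(SO_SNDTIMEO) would store it. */
long
timeval_to_ticks(const struct timeval *tv, int hz)
{
	long usec_per_tick;

	assert(hz > 0 && hz <= 1000000);
	usec_per_tick = 1000000L / hz;
	return ((long)tv->tv_sec * hz + tv->tv_usec / usec_per_tick);
}

/* Ticks -> timeval, as getsockopt(SO_SNDTIMEO) should return it. */
void
ticks_to_timeval(long ticks, int hz, struct timeval *tv)
{
	long usec_per_tick;

	assert(hz > 0 && hz <= 1000000);
	usec_per_tick = 1000000L / hz;
	tv->tv_sec = ticks / hz;
	/* Keep the fractional second: leftover ticks times usec per tick. */
	tv->tv_usec = (ticks % hz) * usec_per_tick;
}
```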

Revision 1.8
Tue Feb 7 02:01:14 1995 UTC (17 years ago) by wollman
Branches: MAIN
Diff to: previous 1.7
Changes since revision 1.7: +21 -4 lines
Merge in the socket-level support for Transaction TCP.

Revision 1.5.4.1
Tue Feb 7 01:33:12 1995 UTC (17 years ago) by wollman
Branches: OLAH_TTCP
Diff to: previous 1.5; next MAIN 1.6
Changes since revision 1.5: +31 -1 lines
Andras Olah's port of Bob Braden's Transaction TCP code.  This should
be on a private branch until it gets merged back into the main
line (which may take some time).

Revision 1.7
Mon Feb 6 02:22:12 1995 UTC (17 years ago) by davidg
Branches: MAIN
Diff to: previous 1.6
Changes since revision 1.6: +2 -2 lines
Use M_NOWAIT instead of M_KERNEL for socket allocations; it is apparently
possible for certain socket operations to occur during interrupt context.

Submitted by:	John Dyson

Revision 1.6
Thu Feb 2 08:49:08 1995 UTC (17 years ago) by davidg
Branches: MAIN
Diff to: previous 1.5
Changes since revision 1.5: +2 -2 lines
Calling semantics for kmem_malloc() have been changed, and the third
argument is now more than just a single flag (kern_malloc.c).
Used new M_KERNEL value for socket allocations that previously were
"M_NOWAIT". Note that this will change when we clean up the M_ namespace
mess.

Submitted by:	John Dyson

Revision 1.5
Sun Oct 2 17:35:32 1994 UTC (17 years, 4 months ago) by phk
Branches: MAIN
CVS tags: RELEASE_2_0, BETA_2_0, ALPHA_2_0
Branch point for: OLAH_TTCP
Diff to: previous 1.4
Changes since revision 1.4: +16 -13 lines
All of this is cosmetic.  prototypes, #includes, printfs and so on.  Makes
GCC a lot more silent.

Revision 1.4
Tue Aug 2 07:43:06 1994 UTC (17 years, 6 months ago) by davidg
Branches: MAIN
Diff to: previous 1.3
Changes since revision 1.3: +1 -0 lines
Added $Id$

Revision 1.3
Sun May 29 07:48:17 1994 UTC (17 years, 8 months ago) by davidg
Branches: MAIN
Diff to: previous 1.2
Changes since revision 1.2: +3 -12 lines
Changed mbuf allocation policy to get a cluster if size > MINCLSIZE. Makes
a BIG difference in socket performance.
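
The policy change amounts to a size test while sosend() fills its chain. If more than MINCLSIZE bytes remain, a large cluster is used; otherwise a small plain mbuf is used. The sketch below counts the buffers needed for a payload under each policy. The constants are illustrative 4.4BSD-era i386 values (assumed, not taken from this revision). The model ignores the smaller packet-header mbuf, and `buffers_needed()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sizes only (assumed); real values are per-platform. */
#define MLEN_SKETCH		108	/* data bytes in a plain mbuf */
#define MINCLSIZE_SKETCH	208	/* above this, use a cluster */
#define MCLBYTES_SKETCH		2048	/* data bytes in a cluster */

/*
 * Count the buffers needed to hold resid bytes.  With use_clusters set,
 * any remainder larger than MINCLSIZE_SKETCH goes into a cluster;
 * otherwise (and for small tails) plain mbufs are used.
 */
size_t
buffers_needed(size_t resid, int use_clusters)
{
	size_t n = 0, len;

	while (resid > 0) {
		if (use_clusters && resid > MINCLSIZE_SKETCH)
			len = resid < MCLBYTES_SKETCH ? resid : MCLBYTES_SKETCH;
		else
			len = resid < MLEN_SKETCH ? resid : MLEN_SKETCH;
		assert(len > 0);	/* guarantees the loop makes progress */
		resid -= len;
		n++;
	}
	return (n);
}
```

For a 5000-byte write, this model gives 3 buffers with clusters (2048 + 2048 + 904) and 47 plain mbufs without them.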

Revision 1.2
Wed May 25 09:05:40 1994 UTC (17 years, 8 months ago) by rgrimes
Branches: MAIN
Diff to: previous 1.1
Changes since revision 1.1: +21 -1 lines
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by:	Rodney W. Grimes
Submitted by:	John Dyson and David Greenman

Revision 1.1.1.1 (vendor branch)
Tue May 24 10:04:24 1994 UTC (17 years, 8 months ago) by rgrimes
Branches: CSRG
CVS tags: bsd_44_lite, REL_before_johndavid_2_0_0
Diff to: previous 1.1
Changes since revision 1.1: +0 -0 lines
BSD 4.4 Lite Kernel Sources

Revision 1.1
Tue May 24 10:04:23 1994 UTC (17 years, 8 months ago) by rgrimes
Branches: MAIN
Initial revision
