- Copyright (c) 2006 Robert N. M. Watson Copyright (c) 2014 Benjamin J. Kaduk All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the f...
NAME
socket - kernel socket interface
SYNOPSIS
#include <sys/socket.h>
#include <sys/socketvar.h>
void soabort(struct socket *so);
int soaccept(struct socket *so, struct sockaddr **nam);
int socheckuid(struct socket *so, uid_t uid);
int sobind(struct socket *so, struct sockaddr *nam, struct thread *td);
void soclose(struct socket *so);
int soconnect(struct socket *so, struct sockaddr *nam, struct thread *td);
int socreate(int dom, struct socket **aso, int type, int proto, struct ucred *cred, struct thread *td);
int sodisconnect(struct socket *so);
struct sockaddr *sodupsockaddr(const struct sockaddr *sa, int mflags);
void sofree(struct socket *so);
void sohasoutofband(struct socket *so);
int solisten(struct socket *so, int backlog, struct thread *td);
void solisten_proto(struct socket *so, int backlog);
int solisten_proto_check(struct socket *so);
struct socket *sonewconn(struct socket *head, int connstatus);
int sopoll(struct socket *so, int events, struct ucred *active_cred, struct thread *td);
int sopoll_generic(struct socket *so, int events, struct ucred *active_cred, struct thread *td);
int soreceive(struct socket *so, struct sockaddr **psa, struct uio *uio, struct mbuf **mp0, struct mbuf **controlp, int *flagsp);
int soreceive_stream(struct socket *so, struct sockaddr **paddr, struct uio *uio, struct mbuf **mp0, struct mbuf **controlp, int *flagsp);
int soreceive_dgram(struct socket *so, struct sockaddr **paddr, struct uio *uio, struct mbuf **mp0, struct mbuf **controlp, int *flagsp);
int soreceive_generic(struct socket *so, struct sockaddr **paddr, struct uio *uio, struct mbuf **mp0, struct mbuf **controlp, int *flagsp);
int soreserve(struct socket *so, u_long sndcc, u_long rcvcc);
void sorflush(struct socket *so);
int sosend(struct socket *so, struct sockaddr *addr, struct uio *uio, struct mbuf *top, struct mbuf *control, int flags, struct thread *td);
int sosend_dgram(struct socket *so, struct sockaddr *addr, struct uio *uio, struct mbuf *top, struct mbuf *control, int flags, struct thread *td);
int sosend_generic(struct socket *so, struct sockaddr *addr, struct uio *uio, struct mbuf *top, struct mbuf *control, int flags, struct thread *td);
int soshutdown(struct socket *so, int how);
void sotoxsocket(struct socket *so, struct xsocket *xso);
void soupcall_clear(struct socket *so, int which);
void soupcall_set(struct socket *so, int which, int (*func)(struct socket *, void *, int), void *arg);
void sowakeup(struct socket *so, struct sockbuf *sb);
#include <sys/sockopt.h>
int sosetopt(struct socket *so, struct sockopt *sopt);
int sogetopt(struct socket *so, struct sockopt *sopt);
int sooptcopyin(struct sockopt *sopt, void *buf, size_t len, size_t minlen);
int sooptcopyout(struct sockopt *sopt, const void *buf, size_t len);
DESCRIPTION
The kernel programming interface permits in-kernel consumers to interact with local and network socket objects in a manner similar to that permitted using the socket(2) user API. These interfaces are appropriate for use by distributed file systems and other network-aware kernel services. While the user API operates on file descriptors, the kernel interfaces operate directly on struct socket pointers. Some portions of the kernel API exist only to implement the user API, and are not expected to be used by kernel code. The portions of the socket API used by socket consumers and implementations of network protocols will differ; some routines are only useful for protocol implementors.
Except where otherwise indicated, functions may sleep, and are not appropriate for use in an ithread(9) context or while holding non-sleepable kernel locks.
Creating and Destroying Sockets
A new socket may be created using socreate(). As with socket(2), arguments specify the requested domain, type, and protocol via dom, type, and proto. The socket is returned via aso on success. In addition, the credential used to authorize operations associated with the socket will be passed via cred (and will be cached for the lifetime of the socket), and the thread performing the operation via td. Warning: authorization of the socket creation operation will be performed using the thread credential for some protocols (such as raw sockets).
Sockets may be closed and freed using soclose(), which has similar semantics to close(2).
In certain circumstances, it is appropriate to destroy a socket without waiting for it to disconnect, for which soabort() is used. This is only appropriate for incoming connections which are in a partially connected state. It must be called on an unreferenced socket, by the thread which removed the socket from its listen queue, to prevent races. It will call into protocol code, so no socket locks may be held over the call. The caller of soabort() is responsible for setting the VNET context.
The normal path to freeing a socket is sofree(), which handles reference counting on the socket. It should be called whenever a reference is released, and also whenever reference flags are cleared in socket or protocol code. Calls to sofree() should not be made from outside the socket layer; outside callers should use soclose() instead.
Connections and Addresses
The sobind() function is equivalent to the bind(2) system call, and binds the socket so to the address nam. The operation will be authorized using the credential on thread td.
The soconnect() function is equivalent to the connect(2) system call, and initiates a connection on the socket so to the address nam. The operation will be authorized using the credential on thread td. Unlike the user system call, soconnect() returns immediately; the caller may msleep(9) on so->so_timeo while holding the socket mutex and waiting for the SS_ISCONNECTING flag to clear or so->so_error to become non-zero. If soconnect() fails, the caller must manually clear the SS_ISCONNECTING flag.
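The "start the connection, then wait for completion or an error" pattern has a familiar user-space counterpart, sketched below for comparison only. This is not kernel code: it uses a non-blocking connect(2), poll(2), and the SO_ERROR socket option where a kernel caller would sleep on the socket and inspect its pending error. The function name connect_and_wait and the five-second timeout are illustrative assumptions.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * User-space analogue (hypothetical helper, not part of the kernel API):
 * start a non-blocking connect to a loopback listener created here, wait
 * for the attempt to finish, then read the pending socket error.
 * Returns 0 on success or an errno value on failure.
 */
int
connect_and_wait(void)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	struct pollfd pfd;
	int lfd = -1, cfd = -1, soerr = 0, error = 0;
	socklen_t elen = sizeof(soerr);

	if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
		error = errno;
		goto out;
	}
	if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
		error = errno;
		goto out;
	}

	/* Listen on an ephemeral loopback port and learn which one we got. */
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;
	if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) != 0 ||
	    listen(lfd, 1) != 0 ||
	    getsockname(lfd, (struct sockaddr *)&sin, &len) != 0 ||
	    fcntl(cfd, F_SETFL, O_NONBLOCK) != 0) {
		error = errno;
		goto out;
	}

	/* The connect call returns at once; completion is reported later. */
	if (connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) != 0) {
		if (errno != EINPROGRESS) {
			error = errno;
			goto out;
		}
		pfd.fd = cfd;
		pfd.events = POLLOUT;
		pfd.revents = 0;
		/* Treat a poll failure or timeout as a timed-out connect. */
		if (poll(&pfd, 1, 5000) != 1) {
			error = ETIMEDOUT;
			goto out;
		}
	}

	/* A non-zero pending error means the connection attempt failed. */
	if (getsockopt(cfd, SOL_SOCKET, SO_ERROR, &soerr, &elen) != 0)
		error = errno;
	else
		error = soerr;
out:
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return (error);
}
```

A caller would treat a return value of 0 as an established connection and any other value as the errno describing why the attempt failed.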
A call to sodisconnect() disconnects the socket without closing it.
The soshutdown() function is equivalent to the shutdown(2) system call, and causes part or all of a connection on a socket to be closed down.
Sockets are transitioned from non-listening status to listening with solisten().
Socket Options
The sogetopt() function is equivalent to the getsockopt(2) system call, and retrieves a socket option on socket so. The sosetopt() function is equivalent to the setsockopt(2) system call, and sets a socket option on socket so.
The second argument in both sogetopt() and sosetopt() is the sopt pointer to a struct sockopt describing the socket option operation. The caller-allocated structure must be zeroed, and then have its fields initialized to specify socket option operation arguments:
- sopt_dir: Set to SOPT_SET or SOPT_GET depending on whether this is a get or set operation.
- sopt_level: Specify the level in the network stack the operation is targeted at; for example, SOL_SOCKET.
- sopt_name: Specify the name of the socket option to set.
- sopt_val: Kernel space pointer to the argument value for the socket option.
- sopt_valsize: Size of the argument value in bytes.
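Because these routines mirror setsockopt(2) and getsockopt(2), the same pieces of information (direction, level, option name, value pointer, and value size) appear as plain arguments in the user API. The sketch below is a user-space illustration of that correspondence, not kernel code; the function name keepalive_roundtrip and the choice of SO_KEEPALIVE are assumptions made for the example.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * User-space analogue (hypothetical helper): set SO_KEEPALIVE and read it
 * back.  Returns 1 if the option reads back as enabled, 0 if it reads back
 * as disabled, and -1 if a system call fails.
 */
int
keepalive_roundtrip(void)
{
	int fd, on = 1, val = 0, ret = -1;
	socklen_t len = sizeof(val);

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return (-1);

	/*
	 * "Set" direction: level SOL_SOCKET, name SO_KEEPALIVE, then the
	 * value pointer and its size.  The "get" direction takes the same
	 * level and name and fills in the value and its length.
	 */
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0 &&
	    getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) == 0)
		ret = (val != 0);	/* some systems report a flag value, not 1 */

	close(fd);
	return (ret);
}
```

In the kernel interface the direction is a field of the option structure rather than a choice between two system calls, and the value pointer refers to kernel memory.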
Socket Upcalls
In order for the owner of a socket to be notified when the socket is ready to send or receive data, an upcall may be registered on the socket. The upcall is a function that will be called by the socket framework when a socket buffer associated with the given socket is ready for reading or writing. soupcall_set() is used to register a socket upcall. The function func is registered, and the pointer arg will be passed as its second argument when it is called by the framework. The possible values for which are SO_RCV and SO_SND, which register upcalls for receive and send events, respectively. The upcall function func() must return either SU_OK or SU_ISCONNECTED, depending on whether or not a call to soisconnected() should be made by the socket framework after the upcall returns. The upcall func cannot call soisconnected() itself due to lock ordering with the socket buffer lock. Only SO_RCV upcalls should return SU_ISCONNECTED. When a SO_RCV upcall returns SU_ISCONNECTED, the upcall will be removed from the socket.
Upcalls are removed from their socket by soupcall_clear(). The which argument again specifies whether the sending or receiving upcall is to be cleared, with SO_RCV or SO_SND.
Socket I/O
The soreceive() function is equivalent to the recvmsg(2) system call, and attempts to receive bytes of data from the socket so, optionally blocking to wait for data if none is ready to read. Data may be retrieved directly to kernel or user memory via the uio argument, or as an mbuf chain returned to the caller via mp0, avoiding a data copy. The uio must always be non-NULL. If mp0 is non-NULL, only the uio_resid of uio is used. The caller may optionally retrieve a socket address on a protocol with the PR_ADDR capability by providing storage via a non-NULL psa argument. The caller may optionally retrieve control data mbufs via a non-NULL controlp argument. Optional flags may be passed to soreceive() via a non-NULL flagsp argument, and use the same flag name space as the recvmsg(2) system call.
The sosend() function is equivalent to the sendmsg(2) system call, and attempts to send bytes of data via the socket so, optionally blocking if data cannot be immediately sent. Data may be sent directly from kernel or user memory via the uio argument, or as an mbuf chain via top, avoiding a data copy. Only one of the uio or top pointers may be non-NULL. An optional destination address may be specified via a non-NULL addr argument, which may result in an implicit connect if supported by the protocol. The caller may optionally send control data mbufs via a non-NULL control argument. Flags may be passed to sosend() using the flags argument, and use the same flag name space as the sendmsg(2) system call.
Kernel callers running in ithread(9) context, or with a mutex held, will wish to use non-blocking sockets and pass the MSG_DONTWAIT flag in order to prevent these functions from sleeping.
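Since the flag name space is shared with sendmsg(2) and recvmsg(2), the effect of MSG_DONTWAIT can be observed from user space. The following is a minimal sketch using those system calls over a socketpair(2), not the kernel routines themselves; the function name dontwait_roundtrip is an assumption made for the example.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * User-space analogue (hypothetical helper): show that MSG_DONTWAIT turns
 * a receive that would sleep into an immediate error, then send and
 * receive one datagram.  Returns 0 if everything behaved as expected,
 * -1 otherwise.
 */
int
dontwait_roundtrip(void)
{
	char out[] = "ping", in[sizeof(out)];
	struct iovec iov;
	struct msghdr msg;
	ssize_t n;
	int sv[2], result = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
		return (-1);

	/* Nothing is queued yet, so a non-blocking receive must fail. */
	memset(&msg, 0, sizeof(msg));
	iov.iov_base = in;
	iov.iov_len = sizeof(in);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	n = recvmsg(sv[1], &msg, MSG_DONTWAIT);
	if (n != -1 || (errno != EAGAIN && errno != EWOULDBLOCK))
		goto out;

	/* Send one datagram described by an I/O vector. */
	memset(&msg, 0, sizeof(msg));
	iov.iov_base = out;
	iov.iov_len = sizeof(out);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	n = sendmsg(sv[0], &msg, MSG_DONTWAIT);
	if (n != (ssize_t)sizeof(out))
		goto out;

	/* The datagram is now queued, so the same receive succeeds. */
	memset(&msg, 0, sizeof(msg));
	memset(in, 0, sizeof(in));
	iov.iov_base = in;
	iov.iov_len = sizeof(in);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	n = recvmsg(sv[1], &msg, MSG_DONTWAIT);
	if (n == (ssize_t)sizeof(out) && memcmp(in, out, sizeof(out)) == 0)
		result = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return (result);
}
```

The I/O vector here plays the role that the uio argument plays for the kernel routines: it describes where the data lives and how much of it there is.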
A socket can be queried for readability, writability, out-of-band data, or end-of-file using sopoll(). The possible values for events are as for poll(2), with symbolic values POLLIN, POLLPRI, POLLOUT, POLLRDNORM, POLLWRNORM, POLLRDBAND, and POLLINIGNEOF taken from <sys/poll.h>.
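The event values can be tried out with poll(2) itself. The sketch below is a user-space illustration only, with a hypothetical function name poll_readiness: it checks that a freshly created stream socket is writable but not readable, and becomes readable once its peer has written a byte.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * User-space analogue (hypothetical helper): query readiness with a zero
 * timeout so the call never sleeps.  Returns 0 if the observed events
 * match expectations, -1 otherwise.
 */
int
poll_readiness(void)
{
	struct pollfd pfd;
	int sv[2], result = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return (-1);

	pfd.fd = sv[0];
	pfd.events = POLLIN | POLLOUT;
	pfd.revents = 0;

	/* Empty receive buffer, free send buffer: writable only. */
	if (poll(&pfd, 1, 0) != 1 ||
	    (pfd.revents & POLLIN) != 0 || (pfd.revents & POLLOUT) == 0)
		goto out;

	/* Once the peer writes, the socket also reports POLLIN. */
	if (write(sv[1], "x", 1) != 1)
		goto out;
	pfd.revents = 0;
	if (poll(&pfd, 1, 0) != 1 || (pfd.revents & POLLIN) == 0)
		goto out;

	result = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return (result);
}
```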
Calls to soaccept() pass through to the protocol's accept routine to accept an incoming connection.
Socket Utility Functions
The uid of a socket's credential may be compared against a uid with socheckuid().
A copy of an existing struct sockaddr may be made using sodupsockaddr().
Protocol implementations notify the socket layer of the arrival of out-of-band data using sohasoutofband(), so that the socket layer can notify socket consumers of the available data.
An "external-format" version of a struct socket can be created using sotoxsocket(), suitable for isolating user code from changes in the kernel structure.
Protocol Implementations
Protocols must supply an implementation for solisten(); such protocol implementations can call back into the socket layer using solisten_proto_check() and solisten_proto() to check and set the socket-layer listen state. These callbacks are provided so that the protocol implementation can order the socket layer and protocol locks as necessary. Protocols must supply an implementation of soreceive(); the functions soreceive_stream(), soreceive_dgram(), and soreceive_generic() are supplied for use by such implementations.
Protocol implementations can use sonewconn() to create a socket and attach protocol state to that socket. This can be used to create new sockets available for soaccept() on a listen socket. The returned socket has a reference count of zero.
Protocols must supply an implementation for sopoll(); sopoll_generic() is provided for use by protocol implementations.
The functions sosend_dgram() and sosend_generic() are supplied to assist in protocol implementations of sosend().
When a protocol creates a new socket structure, it is necessary to reserve socket buffer space for that socket, by calling soreserve(). The rough inverse of this reservation is performed by sorflush(), which is called automatically by the socket framework.
When a protocol needs to wake up threads waiting for the socket to become ready to read or write, variants of sowakeup() are used. The sowakeup() function should not be called directly by protocol code; instead, use the wrappers sorwakeup(), sorwakeup_locked(), sowwakeup(), and sowwakeup_locked() for readers and writers, with the corresponding socket buffer lock not already locked, or already held, respectively.
The functions sooptcopyin() and sooptcopyout() are useful for transferring struct sockopt data between user and kernel code.
SEE ALSO
bind(2), close(2), connect(2), getsockopt(2), recv(2), send(2), setsockopt(2), shutdown(2), socket(2), ng_ksocket(4), ithread(9), msleep(9), ucred(9)
HISTORY
The socket(2) system call appeared in 4.2BSD. This manual page was introduced in FreeBSD 7.0.
AUTHORS
This manual page was written by Robert Watson and Benjamin Kaduk.
BUGS
The use of explicitly passed credentials, credentials hung from explicitly passed threads, the credential on curthread, and the cached credential from socket creation time is inconsistent, and may lead to unexpected behaviour. It is possible that several of the td arguments should be cred arguments, or simply not be present at all.
The caller may need to manually clear SS_ISCONNECTING if soconnect() returns an error.
The MSG_DONTWAIT flag is not implemented for sosend(), and may not always work with soreceive() when zero copy sockets are enabled.
This manual page does not describe how to register socket upcalls or monitor a socket for readability/writability without using blocking I/O.
The soref() and sorele() functions are not described, and in most cases should not be used, due to confusing and potentially incorrect interactions when sorele() is last called after soclose().