Copyright (c) 1999 Kenneth D. Merry. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS ...
NAME
pci - generic PCI bus driver
SYNOPSIS
To compile the PCI bus driver into the kernel, place the following line in your kernel configuration file:

    device pci

To compile in support for Single Root I/O Virtualization (SR-IOV):

    options PCI_IOV

To compile in support for native PCI-express HotPlug:

    options PCI_HP
DESCRIPTION
The pci driver provides support for PCI devices in the kernel and limited access to PCI devices for userland.
The pci driver provides a /dev/pci character device that can be used by userland programs to read and write PCI configuration registers. Programs can also use this device to get a list of all PCI devices, or all PCI devices that match various patterns.
Since the pci driver provides a write interface for PCI configuration registers, system administrators should exercise caution when granting access to the /dev/pci device. If used improperly, this driver can allow userland applications to crash a machine or cause data loss.
The pci driver implements the PCI bus in the kernel. It enumerates any devices on the PCI bus and gives PCI client drivers the chance to attach to them. It assigns resources to children when the BIOS does not. It takes care of routing interrupts when necessary. It reprobes the unattached PCI children when PCI client drivers are dynamically loaded at runtime. The pci driver also includes support for PCI-PCI bridges, various platform-specific Host-PCI bridges, and basic support for PCI VGA adapters.
IOCTLS
The following ioctl(2) calls are supported by the pci driver. They are defined in the header file <sys/pciio.h>.
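- PCIOCGETCONF: This ioctl(2) takes a pci_conf_io structure. It allows the user to retrieve information on all PCI devices in the system, or on PCI devices matching patterns supplied by the user. The call may set errno to any value specified in either copyin(9) or copyout(9). The pci_conf_io structure consists of a number of fields: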
- pat_buf_len: The length, in bytes, of the buffer filled with user-supplied patterns.
- num_patterns: The number of user-supplied patterns.
- patterns: Pointer to a buffer filled with user-supplied patterns. patterns is a pointer to num_patterns pci_match_conf structures. The pci_match_conf structure consists of the following elements:
- pc_sel: PCI domain, bus, slot and function.
- pd_name: PCI device driver name.
- pd_unit: PCI device driver unit number.
- pc_vendor: PCI vendor ID.
- pc_device: PCI device ID.
- pc_class: PCI device class.
- flags: The flags describe which of the fields the kernel should match against. A device must match all specified fields in order to be returned. The match flags are enumerated in the pci_getconf_flags enumeration. The flag names are largely self-explanatory and are not described in detail here.
- match_buf_len: Length of the matches buffer allocated by the user to hold the results of the PCIOCGETCONF query.
- num_matches: Number of matches returned by the kernel.
- matches: Buffer containing matching devices returned by the kernel. The items in this buffer are of type pci_conf, which consists of the following items:
- pc_sel: PCI domain, bus, slot and function.
- pc_hdr: PCI header type.
- pc_subvendor: PCI subvendor ID.
- pc_subdevice: PCI subdevice ID.
- pc_vendor: PCI vendor ID.
- pc_device: PCI device ID.
- pc_class: PCI device class.
- pc_subclass: PCI device subclass.
- pc_progif: PCI device programming interface.
- pc_revid: PCI revision ID.
- pd_name: Driver name.
- pd_unit: Driver unit number.
- offset: The offset is passed in by the user to tell the kernel where it should start traversing the device list. The value passed out by the kernel points to the record immediately after the last one returned. The user may pass the value returned by the kernel in subsequent calls to the PCIOCGETCONF ioctl. If the user does not intend to use the offset, it must be set to zero.
- generation: PCI configuration generation. This value only needs to be set if the offset is set. The kernel will compare the current generation number of its internal device list to the generation passed in by the user to determine whether its device list has changed since the user last called the PCIOCGETCONF ioctl. If the device list has changed, a status of PCI_GETCONF_LIST_CHANGED will be passed back.
- status: The status tells the user the disposition of the request for a device list. The possible status values are:
- PCI_GETCONF_LAST_DEVICE: There are no more devices in the PCI device list after the ones returned in the matches buffer.
- PCI_GETCONF_LIST_CHANGED: The PCI device list has changed since the user's last call to the PCIOCGETCONF ioctl. The user must reset the offset and generation to zero to start over at the beginning of the list.
- PCI_GETCONF_MORE_DEVS: The user's buffer was not large enough to hold all of the remaining devices in the device list that possibly match the criteria. This status can be returned even when none of the remaining devices in the list would match the user's criteria.
- PCI_GETCONF_ERROR: A general error occurred while servicing the user's request. If pat_buf_len is not equal to num_patterns times sizeof(struct pci_match_conf), errno will be set to EINVAL.
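The PCIOCGETCONF calling sequence described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: list_pci_devices() is a hypothetical helper name, the buffer size of 32 entries is arbitrary, and the mapping of PCI_GETCONF_LIST_CHANGED and PCI_GETCONF_ERROR to EAGAIN and EIO is a choice made for this sketch only. The non-FreeBSD branch exists solely so the sketch builds on other systems.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/pciio.h>
#include <fcntl.h>
#include <unistd.h>
#endif

/*
 * Hypothetical helper: print every PCI device, or only devices whose vendor
 * ID equals `vendor` when `match_vendor` is non-zero.
 * Returns the number of devices printed, or -1 with errno set.
 */
int
list_pci_devices(FILE *out, int match_vendor, uint16_t vendor)
{
    assert(out != NULL);
#ifdef __FreeBSD__
    struct pci_conf matches[32];        /* arbitrary result buffer size */
    struct pci_match_conf pattern;
    struct pci_conf_io pc;
    int count = 0, fd, saved;

    fd = open("/dev/pci", O_RDONLY);
    if (fd < 0)
        return (-1);

    memset(&pc, 0, sizeof(pc));         /* offset and generation start at 0 */
    pc.match_buf_len = sizeof(matches);
    pc.matches = matches;
    if (match_vendor) {
        memset(&pattern, 0, sizeof(pattern));
        pattern.pc_vendor = vendor;
        pattern.flags = PCI_GETCONF_MATCH_VENDOR;
        pc.num_patterns = 1;
        pc.pat_buf_len = sizeof(pattern);   /* num_patterns * sizeof */
        pc.patterns = &pattern;
    }

    for (;;) {
        if (ioctl(fd, PCIOCGETCONF, &pc) == -1)
            goto fail;
        if (pc.status == PCI_GETCONF_LIST_CHANGED) {
            errno = EAGAIN;             /* sketch: let the caller retry */
            goto fail;
        }
        if (pc.status == PCI_GETCONF_ERROR) {
            errno = EIO;                /* sketch: generic failure code */
            goto fail;
        }
        for (unsigned i = 0; i < pc.num_matches; i++, count++) {
            struct pci_conf *p = &matches[i];

            fprintf(out, "%s%lu@pci%u:%u:%u:%u vendor=0x%04x device=0x%04x\n",
                p->pd_name, (unsigned long)p->pd_unit,
                (unsigned)p->pc_sel.pc_domain, (unsigned)p->pc_sel.pc_bus,
                (unsigned)p->pc_sel.pc_dev, (unsigned)p->pc_sel.pc_func,
                (unsigned)p->pc_vendor, (unsigned)p->pc_device);
        }
        /* The kernel updated pc.offset and pc.generation for the next call. */
        if (pc.status != PCI_GETCONF_MORE_DEVS)
            break;                      /* PCI_GETCONF_LAST_DEVICE */
    }
    close(fd);
    return (count);

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return (-1);
#else
    (void)out; (void)match_vendor; (void)vendor;
    errno = ENOSYS;                     /* no /dev/pci interface here */
    return (-1);
#endif
}
```

Calling list_pci_devices(stdout, 0, 0) would list every device. Passing a non-zero match_vendor restricts the list to one vendor ID and exercises the pattern buffer fields.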
- PCIOCREAD: This ioctl(2) reads the PCI configuration registers specified by the passed-in pci_io structure. The pci_io structure consists of the following fields:
- pi_sel: A pcisel structure that specifies the domain, bus, slot and function the user would like to query. If the specified bus is not found, errno will be set to ENODEV and -1 returned from the ioctl.
- pi_reg: The PCI configuration register the user would like to access.
- pi_width: The width, in bytes, of the data the user would like to read. This value may be 1, 2, or 4. 3-byte reads and reads larger than 4 bytes are not supported. If an invalid width is passed, errno will be set to EINVAL.
- pi_data: The data returned by the kernel.
- PCIOCWRITE: This ioctl(2) allows users to write to the PCI configuration registers specified in the passed-in pci_io structure. The pci_io structure is described above. The limitations on data width described for reading registers, above, also apply to writing PCI configuration registers.
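A configuration register read might look roughly like the sketch below. The helper names pci_width_is_valid() and pci_read_config() are hypothetical. Opening /dev/pci with O_RDWR for a read is an assumption about current FreeBSD kernels, which appear to refuse PCIOCREAD on a read-only descriptor; the text above does not state this. The width check mirrors the 1, 2, or 4 byte rule given for pi_width, and the non-FreeBSD branch exists only so the sketch builds elsewhere.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/pciio.h>
#include <fcntl.h>
#include <unistd.h>
#endif

/* Only 1, 2, and 4 byte accesses are accepted for configuration registers. */
int
pci_width_is_valid(int width)
{
    return (width == 1 || width == 2 || width == 4);
}

/*
 * Hypothetical helper: read `width` bytes from configuration register `reg`
 * of the device at domain:bus:slot:func. Returns 0 and stores the data in
 * *value, or returns -1 with errno set.
 */
int
pci_read_config(unsigned domain, unsigned bus, unsigned slot, unsigned func,
    int reg, int width, uint32_t *value)
{
    assert(value != NULL);
    if (!pci_width_is_valid(width)) {
        errno = EINVAL;                 /* same error the driver reports */
        return (-1);
    }
#ifdef __FreeBSD__
    struct pci_io io;
    int fd, rc, saved;

    /* Assumption: read-write access is required even for PCIOCREAD. */
    fd = open("/dev/pci", O_RDWR);
    if (fd < 0)
        return (-1);

    memset(&io, 0, sizeof(io));
    io.pi_sel.pc_domain = domain;
    io.pi_sel.pc_bus = (uint8_t)bus;
    io.pi_sel.pc_dev = (uint8_t)slot;
    io.pi_sel.pc_func = (uint8_t)func;
    io.pi_reg = reg;
    io.pi_width = width;

    rc = ioctl(fd, PCIOCREAD, &io);
    saved = errno;
    close(fd);
    errno = saved;
    if (rc == -1)
        return (-1);
    *value = io.pi_data;
    return (0);
#else
    (void)domain; (void)bus; (void)slot; (void)func; (void)reg;
    errno = ENOSYS;                     /* no /dev/pci interface here */
    return (-1);
#endif
}
```

For instance, pci_read_config(0, 0, 0, 0, 0x00, 2, &v) would request the 16-bit vendor ID, which lives at offset 0 of PCI configuration space. A write would fill in pi_data and issue PCIOCWRITE in the same way, under the same width limits.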
LOADER TUNABLES
Tunables can be set at the loader(8) prompt before booting the kernel, or stored in loader.conf(5). The current value of these tunables can be examined at runtime via sysctl(8) nodes of the same name. Unless otherwise specified, each of these tunables is a boolean that can be enabled by setting the tunable to a non-zero value.
- hw.pci.clear_bars (Defaults to 0)
- Ignore any firmware-assigned memory and I/O port resources. This forces the PCI bus driver to allocate resource ranges for memory and I/O port resources from scratch.
- hw.pci.clear_buses (Defaults to 0)
- Ignore any firmware-assigned bus number registers in PCI-PCI bridges. This forces the PCI bus driver and PCI-PCI bridge driver to allocate bus numbers for secondary buses behind PCI-PCI bridges.
- hw.pci.clear_pcib (Defaults to 0)
- Ignore any firmware-assigned memory and I/O port resource windows in PCI-PCI bridges. This forces the PCI-PCI bridge driver to allocate memory and I/O port resources for resource windows from scratch.
By default the PCI-PCI bridge driver will allocate windows that contain the firmware-assigned resources of devices behind the bridge. In addition, the PCI-PCI bridge driver will suballocate from existing window regions when possible to satisfy a resource request. As a result, both hw.pci.clear_bars and hw.pci.clear_pcib must be enabled to fully ignore firmware-supplied resource assignments.
- hw.pci.default_vgapci_unit (Defaults to -1)
- By default, the first PCI VGA adapter encountered by the system is assumed to be the boot display device. This tunable can be set to choose a specific VGA adapter by specifying the unit number of the associated vgapciX device.
- hw.pci.do_power_nodriver (Defaults to 0)
- Place devices into a low power state when a suitable device driver is not found. Can be set to one of the following values:
- 3: Powers down all PCI devices without a device driver.
- 2: Powers down most devices without a device driver. PCI devices with the display, memory, and base peripheral device classes are not powered down.
- 1: Similar to a setting of 2 except that storage controllers are also not powered down.
- 0: All devices are left fully powered.
A PCI device must support power management to be powered down. Placing a device into a low power state may not reduce power consumption.
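The hw.pci.do_power_nodriver levels can be read as a small decision function. The sketch below is an illustration of the policy as described in the text, not the driver's code: should_power_down() and the CLASS_* names are hypothetical, the numeric base class codes (0x01 storage, 0x03 display, 0x05 memory, 0x08 base peripheral) are taken from the PCI class code assignments, and the mapping of levels 3 through 0 to the four descriptions assumes the values are listed in descending order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PCI base class codes relevant to the policy (names are illustrative). */
enum {
    CLASS_STORAGE = 0x01,
    CLASS_DISPLAY = 0x03,
    CLASS_MEMORY = 0x05,
    CLASS_BASE_PERIPHERAL = 0x08
};

/*
 * Decide whether a driverless device of the given base class would be
 * powered down at the given tunable level (0 through 3). The device is
 * assumed to support power management.
 */
bool
should_power_down(int level, uint8_t base_class)
{
    bool spared_at_2 = base_class == CLASS_DISPLAY ||
        base_class == CLASS_MEMORY ||
        base_class == CLASS_BASE_PERIPHERAL;

    assert(level >= 0 && level <= 3);   /* other values are out of scope */
    switch (level) {
    case 3:                             /* power down everything */
        return (true);
    case 2:                             /* spare display, memory, base periph */
        return (!spared_at_2);
    case 1:                             /* as 2, and also spare storage */
        return (!spared_at_2 && base_class != CLASS_STORAGE);
    default:                            /* 0: leave everything powered */
        return (false);
    }
}
```

Under these assumptions, a driverless storage controller is powered down at levels 2 and 3 but left alone at levels 0 and 1.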
- hw.pci.do_power_resume (Defaults to 1)
- Place PCI devices into the fully powered state when resuming either the system or an individual device. Setting this to zero is discouraged as the system will not attempt to power up non-powered PCI devices after a suspend.
- hw.pci.do_power_suspend (Defaults to 1)
- Place PCI devices into a low power state when suspending either the system or individual devices. Normally the D3 state is used as the low power state, but firmware may override the desired power state during a system suspend.
- hw.pci.enable_ari (Defaults to 1)
- Enable support for PCI-express Alternative RID Interpretation. This is often used in conjunction with SR-IOV.
- hw.pci.enable_io_modes (Defaults to 1)
- Enable memory or I/O port decoding in a PCI device's command register if it has firmware-assigned memory or I/O port resources. The firmware (BIOS) in some systems does not enable memory or I/O port decoding for some devices even when it has assigned resources to the device. This enables decoding for such resources during bus probe.
- hw.pci.enable_msi (Defaults to 1)
- Enable support for Message Signalled Interrupts (MSI). MSI interrupts can be disabled by setting this tunable to 0.
- hw.pci.enable_msix (Defaults to 1)
- Enable support for extended Message Signalled Interrupts (MSI-X). MSI-X interrupts can be disabled by setting this tunable to 0.
- hw.pci.enable_pcie_hp (Defaults to 1)
- Enable support for native PCI-express HotPlug.
- hw.pci.honor_msi_blacklist (Defaults to 1)
- MSI and MSI-X interrupts are disabled for certain chipsets known to have broken MSI and MSI-X implementations when this tunable is set. It can be set to zero to permit use of MSI and MSI-X interrupts if the chipset match is a false positive.
- hw.pci.iov_max_config (Defaults to 1MB)
- The maximum amount of memory permitted for the configuration parameters used when creating Virtual Functions via SR-IOV. This tunable can also be changed at runtime via sysctl(8).
- hw.pci.realloc_bars (Defaults to 0)
- Attempt to allocate a new resource range during the initial device scan for any memory or I/O port resources with firmware-assigned ranges that conflict with another active resource.
- hw.pci.usb_early_takeover (Defaults to 1 on amd64 and i386)
- Disable legacy device emulation of USB devices during the initial device scan. Set this tunable to zero to use USB devices via legacy emulation when using a custom kernel without USB controller drivers.
- hw.pci%d.%d.%d.INT%c.irq
- These tunables can be used to override the interrupt routing for legacy PCI INTx interrupts. Unlike other tunables in this list, these do not have corresponding sysctl nodes. The tunable name includes the address of the PCI device as well as the pin of the desired INTx IRQ to override. Its components, in order, are:
- The domain (or segment) of the PCI device in decimal.
- The bus address of the PCI device in decimal.
- The slot of the PCI device in decimal.
- The interrupt pin of the PCI slot to override. One of `A', `B', `C', or `D'.
The value of the tunable is the raw IRQ value to use for the INTx interrupt pin identified by the tunable name. Mapping of IRQ values to platform interrupt sources is machine dependent.
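To make the naming scheme for the INTx override tunables concrete, the following sketch builds such a name from its parts. It assumes the name takes the form hw.pci<domain>.<bus>.<slot>.INT<pin>.irq, which is an assumption based on the FreeBSD kernel sources as I understand them. pci_intx_tunable_name() is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the name of the INTx override tunable for the
 * device at domain:bus:slot and interrupt pin 'A' through 'D'.
 * Assumed name format: hw.pci<domain>.<bus>.<slot>.INT<pin>.irq
 * Returns 0 on success, -1 if the pin is invalid or the buffer is too small.
 */
int
pci_intx_tunable_name(char *buf, size_t len, unsigned domain, unsigned bus,
    unsigned slot, char pin)
{
    int n;

    assert(buf != NULL);
    /* strchr() also matches the terminating NUL, so reject it explicitly. */
    if (pin == '\0' || strchr("ABCD", pin) == NULL)
        return (-1);

    /* Domain, bus, and slot are written in decimal, as the text specifies. */
    n = snprintf(buf, len, "hw.pci%u.%u.%u.INT%c.irq", domain, bus, slot, pin);
    if (n < 0 || (size_t)n >= len)
        return (-1);                    /* encoding error or truncation */
    return (0);
}
```

With domain 0, bus 2, slot 31, and pin `A', the helper produces hw.pci0.2.31.INTA.irq. A loader.conf(5) entry would then assign the raw IRQ number to that name; the IRQ value itself is machine dependent and is not chosen here.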
FILES
- /dev/pci: Character device for the pci driver.