PCI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The world of PCI is vast and full of (mostly unpleasant) surprises.
Since each CPU architecture implements different chip-sets and PCI devices
have different requirements (erm, "features"), the result is that PCI support
in the Linux kernel is not as trivial as one would wish. This short paper
tries to introduce all potential driver authors to Linux APIs for
PCI device drivers.
A more complete resource is the third edition of "Linux Device Drivers"
(LDD3), available online at: http://lwn.net/Kernel/LDD3/
However, keep in mind that all documents are subject to "bit rot".
Refer to the source code if things are not working as described here.
Once the driver knows about a PCI device and takes ownership, the
driver generally needs to perform the following initialization:

Enable the device
Request MMIO/IOP resources
Set the DMA mask size (for both coherent and streaming DMA)
Allocate and initialize shared control data (e.g. dma_alloc_coherent())
Access device configuration space (if needed)
Register IRQ handler (request_irq())
Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
Enable DMA/processing engines
When done using the device, and perhaps the module needs to be unloaded,
the driver needs to take the following steps (a remove() sketch follows the list):
Disable the device from generating IRQs
Release the IRQ (free_irq())
Stop all DMA activity
Release DMA buffers (both streaming and coherent)
Unregister from other subsystems (e.g. scsi or netdev)
Release MMIO/IOP resources
Disable the device
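
A minimal remove() sketch following that order, where my_dev, my_disable_irqs(),
my_stop_dma(), and the priv fields are hypothetical placeholders:

        static void my_remove(struct pci_dev *pdev)
        {
                struct my_dev *priv = pci_get_drvdata(pdev);  /* hypothetical driver data */

                my_disable_irqs(priv);          /* placeholder: tell the chip to stop
                                                   generating interrupts */
                free_irq(pdev->irq, priv);      /* release the IRQ */
                my_stop_dma(priv);              /* placeholder: stop all DMA engines */
                dma_free_coherent(&pdev->dev, priv->ring_size, priv->ring,
                                  priv->ring_dma);      /* release DMA buffers */
                unregister_netdev(priv->netdev);        /* e.g. netdev unregistration */
                pci_iounmap(pdev, priv->regs);          /* release MMIO/IOP resources */
                pci_release_regions(pdev);
                pci_disable_device(pdev);
        }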
1. pci_register_driver() call
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
New PCI IDs may be added to a device driver's pci_ids table at runtime
through the driver's new_id sysfs attribute, as shown below:

echo "vendor device subvendor subdevice class class_mask driver_data" > \
/sys/bus/pci/drivers/{driver}/new_id

All fields are passed in as hexadecimal values (no leading 0x); the vendor
and device fields are mandatory, the others are optional.
Once added, the driver probe routine will be invoked for any unclaimed
PCI devices listed in its (newly updated) pci_ids list.
When the driver exits, it just calls pci_unregister_driver() and the PCI layer
automatically calls the remove hook for all devices handled by the driver.
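
A minimal registration sketch, where MY_VENDOR_ID, MY_DEVICE_ID, my_probe(),
and my_remove() are placeholders:

        static const struct pci_device_id my_pci_ids[] = {
                { PCI_DEVICE(MY_VENDOR_ID, MY_DEVICE_ID) },     /* placeholder IDs */
                { 0, }
        };
        MODULE_DEVICE_TABLE(pci, my_pci_ids);

        static struct pci_driver my_pci_driver = {
                .name     = "my_driver",
                .id_table = my_pci_ids,
                .probe    = my_probe,           /* called for each matching device */
                .remove   = my_remove,          /* called on unplug or driver unload */
        };

        module_pci_driver(my_pci_driver);

On kernels that predate the module_pci_driver() helper, the module_init()/
module_exit() pair simply calls pci_register_driver()/pci_unregister_driver().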
o Do NOT mark a function with an init/exit attribute (such as __init or
  __exit) if you are not sure which mark to use. It is better to leave the
  function unmarked than to mark it incorrectly.
PCI drivers should have a really good reason for not using the
pci_register_driver() interface to search for PCI devices.
The main reason PCI devices are controlled by multiple drivers
is that one PCI device implements several different HW services,
e.g. a combined serial/parallel port/floppy controller.
Searching by class ID is done by iterating with pci_get_class(CLASS_ID, dev);
searching by vendor and device ID works the same way with pci_get_device(),
as sketched below.
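
A sketch of the vendor/device walk (VENDOR_ID, DEVICE_ID, and
configure_device() are placeholders); pci_get_device() drops the reference
on the device it was handed and takes one on the device it returns, so no
explicit pci_dev_put() is needed unless the loop is exited early:

        struct pci_dev *dev = NULL;

        while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)))
                configure_device(dev);          /* hypothetical per-device setup */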
As noted in the introduction, most PCI drivers need to perform the device
initialization steps listed there.
The driver can access PCI config space registers at any time.
(Well, almost. When running BIST, config space can go away...but
that will just result in a PCI Bus Master Abort and config reads
will return garbage).
[ This is an OS BUG: currently (as of 2.6.19), the driver can only
determine MMIO and IO Port resource availability _after_ calling
pci_enable_device(). ]
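
A minimal probe() sketch reflecting this ordering (BAR 0 and the "my_driver"
resource name are assumptions, and error handling is kept to the essentials):

        static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        {
                void __iomem *regs;
                int err;

                err = pci_enable_device(pdev);          /* wake up the device */
                if (err)
                        return err;

                err = pci_request_regions(pdev, "my_driver");  /* claim MMIO/IOP */
                if (err)
                        goto err_disable;

                regs = pci_iomap(pdev, 0, 0);           /* map BAR 0, full length */
                if (!regs) {
                        err = -ENOMEM;
                        goto err_release;
                }

                /* ... DMA mask, IRQ registration, device-specific init ... */
                return 0;

        err_release:
                pci_release_regions(pdev);
        err_disable:
                pci_disable_device(pdev);
                return err;
        }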
Drivers for all PCI-X and PCIe compliant devices must call
pci_set_dma_mask() as they are 64-bit DMA devices.
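
A sketch of the usual mask negotiation, trying 64-bit first and falling back
to 32-bit (DMA_BIT_MASK() comes from <linux/dma-mapping.h>; recent kernels
express the same thing with dma_set_mask_and_coherent(&pdev->dev, ...)):

        if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
                pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
        } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) {
                pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
        } else {
                dev_err(&pdev->dev, "no usable DMA configuration\n");
                return -EIO;            /* assumes this runs inside probe() */
        }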
All interrupt handlers for IRQ lines should be registered with IRQF_SHARED
and use the devid to map IRQs to devices (remember that all PCI IRQ lines
can be shared).
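
For example, a shared handler keyed to a hypothetical per-device structure
(my_dev, my_irq_pending(), and priv are placeholders):

        static irqreturn_t my_irq_handler(int irq, void *dev_id)
        {
                struct my_dev *priv = dev_id;   /* the dev_id passed to request_irq() */

                if (!my_irq_pending(priv))      /* placeholder: check the chip's
                                                   interrupt status register */
                        return IRQ_NONE;        /* shared line, not our device */

                /* ... acknowledge and handle the interrupt ... */
                return IRQ_HANDLED;
        }

        /* in probe(), once the device can safely generate interrupts: */
        err = request_irq(pdev->irq, my_irq_handler, IRQF_SHARED,
                          "my_driver", priv);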
MSI and MSI-X are PCI capabilities. Both are "Message Signaled Interrupts"
which deliver interrupts to the CPU via a DMA write to a Local APIC.
The fundamental difference between MSI and MSI-X is how multiple
"vectors" get allocated. MSI requires contiguous blocks of vectors
while MSI-X can allocate several individual ones.
There are (at least) two really good reasons for using MSI:
1) MSI is an exclusive interrupt vector by definition.
   This means the interrupt handler doesn't have to verify
   its device caused the interrupt.
2) MSI avoids DMA/IRQ race conditions. DMA to host memory is guaranteed
   to be visible to the host CPU(s) when the MSI is delivered, which
   matters for both data coherency and avoiding stale control data.
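
A sketch of opting into MSI in probe(), falling back silently to legacy INTx
if MSI is unavailable (priv->using_msi is a hypothetical flag; newer kernels
typically use pci_alloc_irq_vectors() instead):

        if (pci_enable_msi(pdev) == 0)
                priv->using_msi = 1;    /* pdev->irq now refers to the MSI vector */

        /* Either way, pdev->irq is what gets passed to request_irq().
         * On teardown, call pci_disable_msi() after free_irq(). */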
Stopping DMA after stopping the IRQs can avoid races where the
IRQ handler might restart DMA engines.
While this step sounds obvious and trivial, several "mature" drivers
didn't get this step right in the past.
If you access fields in the standard portion of the config header, please
use symbolic names of locations and bits declared in <linux/pci.h>.
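
For example, turning on bus mastering by hand would read and update the
command register via the symbolic offset and bit names (in practice
pci_set_master() already does this for you):

        u16 cmd;

        pci_read_config_word(pdev, PCI_COMMAND, &cmd);
        cmd |= PCI_COMMAND_MASTER;              /* enable bus mastering */
        pci_write_config_word(pdev, PCI_COMMAND, cmd);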
7. Miscellaneous hints
~~~~~~~~~~~~~~~~~~~~~~
When displaying PCI device names to the user (for example when a driver wants
to tell the user what card it has found), please use pci_name(pci_dev).
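
For example, a probe-time message might look like the line below (the driver
prefix and wording are placeholders):

        printk(KERN_INFO "my_driver: found PCI device %s\n", pci_name(pdev));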
Don't try to turn on Fast Back to Back writes in your driver. All devices
on the bus need to be capable of doing it, so this is something which needs
to be handled by platform and generic code, not individual drivers.
9. Obsolete functions
~~~~~~~~~~~~~~~~~~~~~
There are several functions which you might come across when trying to
port an old driver to the new PCI interface. They are no longer present
in the kernel as they aren't compatible with hotplug or PCI domains or
having sane locking.
The alternative to pci_register_driver() is the traditional PCI device driver
that walks PCI device lists. This is still possible but discouraged.
Converting a driver from using I/O Port space to using MMIO space
often requires some additional changes. Specifically, "write posting"
needs to be handled. Many drivers (e.g. tg3, acenic, sym53c8xx_2)
already do this. I/O Port space guarantees write transactions reach the PCI
device before the CPU can continue. Writes to MMIO space allow the CPU
to continue before the transaction reaches the PCI device. HW weenies
call this "Write Posting" because the write completion is "posted" to
the CPU before the transaction has reached its destination.
Thus, timing-sensitive code should add readl() where the CPU is
expected to wait before doing other work. The classic "bit banging"
sequence that works fine for I/O Port space needs an extra read-back
when converted to MMIO, as sketched below.
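
A sketch of both variants, with ioport_reg, mmio_reg, safe_mmio_reg, and
data_byte as placeholder names; the MMIO loop adds a read-back to flush each
posted write:

        unsigned int i;
        u8 val = data_byte;                     /* placeholder data byte */

        /* I/O Port space: writes are not posted, outb() alone suffices */
        for (i = 0; i < 8; i++, val >>= 1) {
                outb(val & 1, ioport_reg);      /* write one bit */
                udelay(10);
        }

        /* MMIO space: flush each posted write with a harmless read */
        for (i = 0; i < 8; i++, val >>= 1) {
                writeb(val & 1, mmio_reg);      /* write one bit */
                readb(safe_mmio_reg);           /* force the write to the device */
                udelay(10);
        }

It is important that the register read back (safe_mmio_reg here) has no side
effects that interfere with the correct operation of the device.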
Another case to watch out for is when resetting a PCI device. Use PCI
Configuration space reads to flush the writel(). This will gracefully
handle the PCI master abort on all platforms if the PCI device is
expected to not respond to a readl(). Most x86 platforms will allow
MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
(e.g. ~0). But many RISC platforms will crash (a.k.a. "Hard Fail").
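
A sketch of that pattern, where RESET_REG, RESET_CMD, and regs are
hypothetical device-specific names:

        u32 tmp;

        writel(RESET_CMD, regs + RESET_REG);    /* hypothetical reset register */
        pci_read_config_dword(pdev, PCI_VENDOR_ID, &tmp);
                                                /* flush the posted write; safe even
                                                   if the device stops responding
                                                   to MMIO reads during the reset */
        msleep(10);                             /* give the device time to reset */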