
Pcie Aer

This document discusses enabling PCI Express advanced error reporting in the Linux kernel. It provides details on the PCI Express error reporting topology and architecture of the PCI Express advanced error reporting driver. The driver gathers comprehensive error information, performs error recovery actions, and reports errors to users for any platform that supports PCI Express. It serves as a root port advanced error reporting service driver to handle interrupts from errors detected on PCI Express links or transactions.

Uploaded by

黃是豪

Enable PCI Express Advanced Error Reporting in the Kernel

Yanmin Zhang and T. Long Nguyen


Intel Corporation
[email protected], [email protected]

Abstract

PCI Express is a high-performance, general-purpose I/O interconnect. It introduces AER (Advanced Error Reporting) concepts, which provide significantly higher reliability at a lower cost than the previous PCI and PCI-X standards. The AER driver of the Linux kernel provides a clean, generic, and architecture-independent solution. As long as a platform supports PCI Express, the AER driver gathers and manages all PCI Express errors that occur and cooperates with PCI Express device drivers to perform error-recovery actions.

This paper is targeted toward kernel developers interested in the details of enabling PCI Express device drivers, and it provides insight into the scope of implementing the PCI Express AER driver and the AER conformance usage model.

1 Introduction

Current machines need higher reliability than before and need to recover from failure quickly. Peripheral devices are one cause of failure: they might run into errors or malfunction completely. If a device malfunctions, its device driver might get bad information and cause a kernel panic, and the system might crash unexpectedly.

As a matter of fact, IBM engineers (Linas Vepstas and others) created a framework to support PCI error recovery procedures in-kernel because IBM Power4 and Power5-based pSeries provide specific PCI device error recovery functions in platforms [4]. However, this model is not platform independent, and it is not easy for individual developers to get a Power machine for testing these functions. PCI Express introduces AER, which is a worldwide standard. The PCI Express AER driver is developed to support PCI Express AER. First, any platform which supports PCI Express can use the PCI Express AER driver to process device errors and handle error recovery accordingly. Second, as lots of platforms support PCI Express, it is far easier for individual developers to get such a machine and add error recovery code into specific device drivers.

2 PCI Express Advanced Error Reporting Driver

2.1 PCI Express Advanced Error Reporting Topology

To understand the PCI Express Advanced Error Reporting Driver architecture, it helps to begin with the basics of PCI Express Port topology. Figure 1 illustrates two types of PCI Express Port devices: the Root Port and the Switch Port. The Root Port originates a PCI Express Link from a PCI Express Root Complex. The Switch Port whose secondary bus represents the switch internal routing logic is called the Switch Upstream Port. The Switch Port which bridges from the switch internal routing buses to the bus representing the downstream PCI Express Link is called the Switch Downstream Port. Each PCI Express Port device can be implemented to support up to four distinct services: native hot plug (HP), power management event (PME), advanced error reporting (AER), and virtual channels (VC).

[Figure 1: PCI Express Port Topology]

The AER driver development is based on the service driver framework of the PCI Express Port Bus Driver design model [3]. As illustrated in Figure 2, the PCI Express AER driver serves as a Root Port AER service driver attached to the PCI Express Port Bus driver.

[Figure 2: AER Root Port Service Driver. The Port Bus Driver (PBD) exposes PMErs, AERrs, HPrs, and VCrs service devices for each root port; the AER port service driver claims the AERrs devices.]

2.2 PCI Express Advanced Error Reporting Driver Architecture

PCI Express error signaling can occur on the PCI Express link itself or on behalf of transactions initiated on the link. PCI Express defines the AER capability, which is implemented with the PCI Express AER Extended Capability Structure, to allow a PCI Express component (agent) to send an error reporting message to the Root Port. The Root Port, a host receiver of all error messages associated with its hierarchy, decodes an error message into an error type and an agent ID and then logs these into its PCI Express AER Extended Capability Structure. Depending on whether an error reporting message is enabled in the Root Error Command Register, the Root Port device generates an interrupt if an error is detected. The PCI Express AER service driver is implemented to service AER interrupts generated by the Root Ports. Figure 3 illustrates the error reporting procedure.

[Figure 3: PCI Express Error Reporting procedures. An end point sends an error message up through the switch to the root port, which raises an interrupt.]

Once the PCI Express AER service driver is loaded, it claims all AERrs service devices in a system device hierarchy, as shown in Figure 2. For each AERrs service device, the advanced error reporting service driver configures its service device to generate an interrupt when an error is detected [3].

When errors happen, the PCI Express AER driver provides an infrastructure with three basic functions:

• Gathers comprehensive error information when errors occur.

• Performs error recovery actions.

• Reports errors to the users.

2.2.1 PCI Express Error Introduction

Traditional PCI devices provide simple error reporting approaches, PERR# and SERR#. PERR# is parity error, while SERR# is system error. All non-PERR# errors are SERR#. PCI uses two independent signal lines to represent PERR# and SERR#, which are platform chipset-specific. How software is notified about the errors depends entirely on the specific platform.

To support traditional error handling, PCI Express provides baseline error reporting, which defines the basic error reporting mechanism. All PCI Express devices have to implement this baseline capability and must map required PCI Express error support to the PCI-related error registers, which include enabling error reporting and setting status bits that can be read by PCI-compliant software. But the baseline error reporting doesn’t define how platforms notify system software about the errors.

PCI Express errors consist of two types, correctable errors and uncorrectable errors. Correctable errors include those error conditions where the PCI Express protocol can recover without any loss of information. A correctable error, if one occurs, can be corrected by the hardware without requiring any software intervention. Although the hardware is able to correct and reduce the correctable errors, correctable errors may still affect system performance.

Uncorrectable errors are those error conditions that impact functionality of the interface. To provide more robust error handling to system software, PCI Express further classifies uncorrectable errors as fatal and non-fatal. Fatal errors might cause the corresponding PCI Express links and hardware to become unreliable. System software needs to reset the links and corresponding devices in a hierarchy where a fatal error occurred. Non-fatal errors don’t cause the PCI Express link to become unreliable, but might cause transaction failure. System software needs to coordinate with a device agent, which generates a non-fatal error, to retry any failed transactions.

PCI Express AER provides a more reliable error reporting infrastructure. Besides the baseline error reporting, PCI Express AER defines more fine-grained error types and provides log capability. Devices have a header log register to capture the header for the TLP corresponding to a detected error.

Correctable errors consist of receiver errors, bad TLP, bad DLLP, REPLAY_NUM rollover, and replay timer time-out. When a correctable error occurs, the corresponding bit within the Advanced Correctable Error Status register is set. These bits are automatically set by hardware and are cleared by software when writing a “1” to the bit position. In addition, through the Advanced Correctable Error Mask register (which has the same bit layout as the Advanced Correctable Error Status register), a specific correctable error could be masked and not be reported to the root port. Although masked errors are not reported, the corresponding bit in the Advanced Correctable Error Status register will still be set.

Uncorrectable errors consist of Training Errors, Data Link Protocol Errors, Poisoned TLP Errors, Flow Control Protocol Errors, Completion Time-out Errors, Completer Abort Errors, Unexpected Completion Errors, Receiver Overflow Errors, Malformed TLPs, ECRC Errors, and Unsupported Request Errors. When an uncorrectable error occurs, the corresponding bit within the Advanced Uncorrectable Error Status register is set automatically by hardware and is cleared by software when writing a “1” to the bit position. Advanced error handling permits software to select the severity of each error within the Advanced Uncorrectable Error Severity register. This gives software the opportunity to treat errors as fatal or non-fatal, according to the severity associated with a given application. Software could use the Advanced Uncorrectable Mask register to mask specific errors.

2.2.2 PCI Express AER Driver Designed To Handle PCI Express Errors

Before kernel 2.6.18, the Linux kernel had no root port AER service driver. Usually, the BIOS provides a basic error mechanism, but it cannot coordinate the corresponding devices to get more detailed error information and perform recovery actions. As a result, the AER driver has been developed to support PCI Express AER enabling for the Linux kernel.

2.2.2.1 AER Initialization Procedures

When a machine is booting, the system allocates interrupt vector(s) for every PCI Express root port. To service the PCI Express AER interrupt at a PCI Express root port, the PCI Express AER driver registers its interrupt service handler with the Linux kernel. Once a PCI Express root port receives an error reported from a downstream device, that PCI Express root port sends an interrupt to the CPU, from which the Linux kernel will call the PCI Express AER interrupt service handler.

Most of the AER processing work should be done in process context. The PCI Express AER driver creates one worker per PCI Express AER root port virtual device. Depending on where an AER interrupt occurs in a system hierarchy, the corresponding worker will be scheduled.
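
The split between interrupt context and process context described above can be illustrated with a small user-space model. This is only a sketch: the names aer_model_port, aer_model_irq, aer_model_work, and aer_model_record, as well as the queue depth, are hypothetical and do not come from the kernel sources. The "interrupt" half merely records the error source ID and flags the worker; the "worker" half later drains the queue in arrival order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AER_MODEL_QUEUE_LEN 8 /* hypothetical depth, not taken from the kernel */

/* One instance per root port, mirroring "one worker per root port". */
struct aer_model_port {
    uint16_t queue[AER_MODEL_QUEUE_LEN]; /* pending error source IDs */
    size_t head;                         /* index of the oldest entry */
    size_t count;                        /* entries waiting for the worker */
    int work_scheduled;                  /* set by the "interrupt", cleared by the worker */
};

/* Collects the IDs the worker hands out; a stand-in for real error handling. */
struct aer_model_log {
    uint16_t ids[AER_MODEL_QUEUE_LEN];
    size_t n;
};

/* "Interrupt" half: record the source ID and request the worker, nothing more.
 * Returns 0 on success, -1 if the queue is full and the event is dropped. */
int aer_model_irq(struct aer_model_port *port, uint16_t source_id)
{
    assert(port != NULL);
    if (port->count == AER_MODEL_QUEUE_LEN)
        return -1;
    port->queue[(port->head + port->count) % AER_MODEL_QUEUE_LEN] = source_id;
    port->count++;
    port->work_scheduled = 1;
    return 0;
}

/* "Worker" half: runs later (in process context in a real driver) and
 * drains every queued event in arrival order. Returns the number handled. */
size_t aer_model_work(struct aer_model_port *port,
                      void (*handle)(uint16_t source_id, void *ctx),
                      void *ctx)
{
    size_t handled = 0;

    assert(port != NULL && handle != NULL);
    while (port->count > 0) {
        uint16_t id = port->queue[port->head];

        port->head = (port->head + 1) % AER_MODEL_QUEUE_LEN;
        port->count--;
        handle(id, ctx);
        handled++;
    }
    port->work_scheduled = 0;
    return handled;
}

/* Sample handler: append the source ID to an aer_model_log. */
void aer_model_record(uint16_t source_id, void *ctx)
{
    struct aer_model_log *log = ctx;

    if (log->n < AER_MODEL_QUEUE_LEN)
        log->ids[log->n++] = source_id;
}
```

Keeping the interrupt half this small is the point of the design: everything that may sleep or take long, such as reading configuration space of the agent and running recovery, belongs in the worker.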

Most BIOS vendors provide a non-standard error processing mechanism. To avoid conflict with BIOS while handling PCI Express errors, the PCI Express AER driver must request the BIOS for ownership of the PCI Express AER via the ACPI _OSC method, as specified in the PCI Express Specification and the ACPI Specification. If the BIOS doesn’t support the ACPI _OSC method, or the ACPI _OSC method returns errors, the PCI Express AER driver’s probe function will fail (refer to Section 3 for a workaround if the BIOS vendor does not support the ACPI _OSC method).

Once the PCI Express AER driver takes over, the BIOS must stop its activities on PCI Express error processing. The PCI Express AER driver then configures the PCI Express AER capability registers of the PCI Express root port and specific devices to support PCI Express native AER.

2.2.2.2 Handle PCI Express Correctable Errors

Because a correctable error can be corrected by the hardware without requiring any software intervention, if one occurs, the PCI Express AER driver first decodes an error message received at the PCI Express root port into an error type and an agent ID. Second, the PCI Express AER driver uses the decoded error information to read the PCI Express AER capability of the agent device to obtain more details about the error. Third, the PCI Express AER driver clears the corresponding bit in the correctable error status register of both the PCI Express root port and the agent device. Figure 4 illustrates the procedure to process correctable errors. Last but not least, the details about an error will be formatted and output to the system console as shown below:

    +------ PCI-Express Device Error ------+
    Error Severity : Corrected
    PCIE Bus Error type : Physical Layer
    Receiver Error : Multiple
    Receiver ID : 0020
    VendorID=8086h, DeviceID=3597h, Bus=00h, Device=04h, Function=00h

The ID field (labelled Receiver ID in this example) is the ID of the device which reports the error. Based on such information, an administrator could find the bad device easily.

[Figure 4: Procedure to Process Correctable Errors. The AER driver (1) gets the source ID and error type from the root port and clears the root status, then (2) gets the detailed error type from end point E1 and clears the correctable error status.]

2.2.2.3 Handle PCI Express Non-Fatal Errors

If an agent device reports non-fatal errors, the PCI Express AER driver uses the same mechanism as described in Section 2.2.2.2 to obtain more details about an error from an agent device and output error information to the system console. Figure 5 illustrates the procedure to process non-fatal errors.

[Figure 5: Procedures to Process Non-Fatal Errors. The AER driver (1) gets the source ID and error type from the root port and clears the root status, (2) gets the detailed error type and log from end point E1, and (3) performs error recovery together with the end point driver.]

The first two steps are like the ones to process correctable errors. During Step 2, the AER driver needs to retrieve the packet header log from the agent if the error is TLP-related.

Below is an example of non-fatal error output to the system console.

    +------ PCI-Express Device Error ------+
    Error Severity : Uncorrected (Non-Fatal)
    PCIE Bus Error type : Transaction Layer
    Completion Timeout : Multiple
    Requester ID : 0018
    VendorID=8086h, DeviceID=3596h, Bus=00h, Device=03h, Function=00h
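
The 16-bit ID printed in these reports packs the bus, device, and function numbers of the agent, which is why ID 0020 appears with Device=04h and ID 0018 with Device=03h. The following sketch decodes it, assuming the standard PCI layout of 8 bits of bus number, 5 bits of device number, and 3 bits of function number; the type and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bus/device/function triple as printed in the
 * "Bus=..h, Device=..h, Function=..h" line of a report. */
struct pcie_bdf {
    uint8_t bus;
    uint8_t device;
    uint8_t function;
};

/* Split a 16-bit ID into bus (bits 15:8), device (bits 7:3),
 * and function (bits 2:0). */
struct pcie_bdf pcie_decode_id(uint16_t id)
{
    struct pcie_bdf bdf;

    bdf.bus = (uint8_t)(id >> 8);
    bdf.device = (uint8_t)((id >> 3) & 0x1f);
    bdf.function = (uint8_t)(id & 0x07);
    return bdf;
}

/* Inverse operation, handy for checking a decoded value. */
uint16_t pcie_encode_id(struct pcie_bdf bdf)
{
    assert(bdf.device < 32 && bdf.function < 8);
    return (uint16_t)(((unsigned)bdf.bus << 8) |
                      ((unsigned)bdf.device << 3) |
                      (unsigned)bdf.function);
}
```

Decoding 0020 this way yields bus 00h, device 04h, function 00h, and decoding 0018 yields bus 00h, device 03h, function 00h, matching the two reports shown in Sections 2.2.2.2 and 2.2.2.3.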

Unlike correctable errors, non-fatal errors might cause some transaction failures. To help an agent device driver to retry any failed transactions, the PCI Express AER driver must perform a non-fatal error recovery procedure, which depends on where a non-fatal error occurs in a system hierarchy. Figure 6 illustrates an example with two PCI Express switches. If end-point device E2 reports a non-fatal error, the PCI Express AER driver will try to perform an error recovery procedure only on this device. Other devices won’t take part in this error recovery procedure. If downstream port P1 of switch 1 reports a non-fatal error, the PCI Express AER driver will perform the error recovery procedure on all devices under port P1, including all ports of switch 2 and end points E1 and E2.

[Figure 6: Non-Fatal Error Recovery Example. A root port connects to the upstream port of switch 1; downstream port P1 of switch 1 connects to the upstream port of switch 2, whose two downstream ports lead to end points E1 and E2.]

To take part in the error recovery procedure, specific device drivers need to implement error callbacks as described in Section 4.1.

When an uncorrectable non-fatal error happens, the AER error recovery procedure first calls the error_detected routine of all relevant drivers, in depth-first order, to notify them that their devices have run into errors. In the error_detected callback, the driver shouldn’t operate the devices, i.e., it must not perform any I/O on the devices. Typically, error_detected cancels all pending requests or puts the requests into a queue.

If the return values from all relevant error_detected routines are PCI_ERS_RESULT_CAN_RECOVER, the AER recovery procedure calls all resume callbacks of the relevant drivers. In the resume functions, drivers could resume operations to the devices.

If an error_detected callback returns PCI_ERS_RESULT_NEED_RESET, the recovery procedure will call all slot_reset callbacks of relevant drivers. If all slot_reset functions return PCI_ERS_RESULT_CAN_RECOVER, the resume callback will be called to finish the recovery. Currently, some device drivers provide err_handler callbacks, for example, Intel’s E100 and E1000 network card drivers and IBM’s POWER RAID driver.

The PCI Express AER driver outputs some information about non-fatal error recovery steps and results. Below is an example.

    +------ PCI-Express Device Error ------+
    Error Severity : Uncorrected (Non-Fatal)
    PCIE Bus Error type : Transaction Layer
    Unsupported Request : First
    Requester ID : 0500
    VendorID=14e4h, DeviceID=1659h, Bus=05h, Device=00h, Function=00h
    TLB Header:
    04000001 0020060f 05010008 00000000
    Broadcast error_detected message
    Broadcast slot_reset message
    Broadcast resume message
    tg3: eth3: Link is down.
    AER driver successfully recovered

2.2.2.4 Handle PCI Express Fatal Errors

When processing fatal errors, the PCI Express AER driver also collects detailed error information from the reporter in the same manner as described in Sections 2.2.2.2 and 2.2.2.3. Below is an example of fatal error output to the system console:

    +------ PCI-Express Device Error ------+
    Error Severity : Uncorrected (Fatal)
    PCIE Bus Error type : Transaction Layer
    Unsupported Request : First
    Requester ID : 0200
    VendorID=8086h, DeviceID=0329h, Bus=02h, Device=00h, Function=00h
    TLB Header:
    04000001 00180003 02040000 00020400
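
The callback sequence of Section 2.2.2.3 can be summarized as a small state machine. The sketch below is a user-space model, not the driver's code: the model_* names are hypothetical, the driver array is assumed to be already in depth-first order, and MODEL_FAILED stands in for any result other than the two named in the text. It follows the wording of that section, in which slot_reset reports success with the same "can recover" result.

```c
#include <assert.h>
#include <stddef.h>

/* Ordered so that a "worse" result has a larger value. */
enum model_result {
    MODEL_CAN_RECOVER, /* stands for PCI_ERS_RESULT_CAN_RECOVER */
    MODEL_NEED_RESET,  /* stands for PCI_ERS_RESULT_NEED_RESET */
    MODEL_FAILED       /* assumption: any other result aborts recovery */
};

/* Per-device state used by the sample callbacks below. */
struct model_dev {
    enum model_result on_error; /* what error_detected will return */
    enum model_result on_reset; /* what slot_reset will return */
    int reset_calls;            /* how often slot_reset ran */
    int resumed;                /* set once resume ran */
};

/* The three callbacks a participating driver supplies in this model. */
struct model_driver {
    enum model_result (*error_detected)(struct model_dev *dev);
    enum model_result (*slot_reset)(struct model_dev *dev);
    void (*resume)(struct model_dev *dev);
    struct model_dev *dev;
};

/* Sample callbacks that simply replay the values stored in model_dev. */
enum model_result model_error_detected(struct model_dev *dev)
{
    return dev->on_error;
}

enum model_result model_slot_reset(struct model_dev *dev)
{
    dev->reset_calls++;
    return dev->on_reset;
}

void model_resume(struct model_dev *dev)
{
    dev->resumed = 1;
}

/* Combine two results, keeping the worse one. */
static enum model_result model_worst(enum model_result a, enum model_result b)
{
    return a > b ? a : b;
}

/* Run the recovery sequence over n drivers.
 * Returns 0 if every driver was resumed, -1 if recovery failed. */
int model_recover(struct model_driver *drv, size_t n)
{
    enum model_result status = MODEL_CAN_RECOVER;
    size_t i;

    assert(drv != NULL || n == 0);

    /* Step 1: tell every driver that its device ran into an error. */
    for (i = 0; i < n; i++)
        status = model_worst(status, drv[i].error_detected(drv[i].dev));

    /* Step 2: if any driver asked for a reset, call every slot_reset. */
    if (status == MODEL_NEED_RESET) {
        status = MODEL_CAN_RECOVER;
        for (i = 0; i < n; i++)
            status = model_worst(status, drv[i].slot_reset(drv[i].dev));
    }

    /* Step 3: resume only if everybody reported "can recover". */
    if (status != MODEL_CAN_RECOVER)
        return -1;
    for (i = 0; i < n; i++)
        drv[i].resume(drv[i].dev);
    return 0;
}
```

Taking the worst result across all drivers reflects the "all relevant routines" wording: a single driver that requests a reset, or fails, decides the path for the whole group.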

When performing the error recovery procedure, the major difference between non-fatal and fatal is whether the PCI Express link will be reset. If the return values from all relevant error_detected routines are PCI_ERS_RESULT_CAN_RECOVER, the AER recovery procedure resets the PCI Express link based on whether the agent is a bridge. Figure 7 illustrates an example.

[Figure 7: Reset PCI Express Link Example. Root port P0 connects to upstream port P1 of a switch; the switch has downstream ports P2 and P3, and end point E1 sits below P2.]

In Figure 7, if root port P0 (a kind of bridge) itself reports a fatal error, the PCI Express AER driver chooses to reset the link between root port P0 and upstream port P1. If end-point device E1 reports a fatal error, the PCI Express AER driver chooses to reset the upstream link of E1, i.e., the link between P2 and E1.

The reset is executed by the port. If the agent is a port, the port will execute the reset. If the agent is an end-point device, for example, E1 in Figure 7, the port of the upstream link of E1, i.e., port P2, will execute the reset.

The reset method depends on the port type. As for root port and downstream port, the PCI Express Specification defines an approach to reset their downstream link. In Figure 7, if port P0, P2, P3, or end point E1 reports a fatal error, the method defined in the PCI Express Specification will be used. The PCI Express AER driver implements the standard method as the default reset function.

There is no standard way to reset the downstream link under the upstream port because different switches might implement different reset approaches. To facilitate the link reset approach, the PCI Express AER driver adds reset_link, a new function pointer, in the data structure pcie_port_service_driver.

    struct pcie_port_service_driver {
        ...
        /* Link Reset Capability - AER service driver specific */
        pci_ers_result_t (*reset_link) (struct pci_dev *dev);
        ...
    };

If a port uses a vendor-specific approach to reset link, its AER port service driver has to provide a reset_link function. If a root port driver or downstream port service driver doesn’t provide a reset_link function, the default reset_link function will be called. If an upstream port service driver doesn’t implement a reset_link function, the error recovery will fail.

Below is the system console output example printed by the PCI Express AER driver when doing fatal error recovery.

    +------ PCI-Express Device Error ------+
    Error Severity : Uncorrected (Fatal)
    PCIE Bus Error type : (Unaccessible)
    Unaccessible Received : First
    Unregistered Agent ID : 0500
    Broadcast error_detected message
    Complete link reset at Root[0000:00:04.0]
    Broadcast slot_reset message
    Broadcast resume message
    tg3: eth3: Link is down.
    AER driver successfully recovered

2.3 Including PCI Express Advanced Error Reporting Driver Into the Kernel

The PCI Express AER Root driver is a Root Port service driver attached to the PCI Express Port Bus driver. Its service must be registered with the PCI Express Port Bus driver, and users are required to include the PCI Express Port Bus driver in the kernel [5]. Once the kernel configuration option CONFIG_PCIEPORTBUS is included, the PCI Express AER Root driver is automatically included as a kernel driver by default (CONFIG_PCIEAER = Y).
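
Put concretely, a kernel configured as described in Section 2.3 would be expected to carry the two options below in its .config file. This is a minimal fragment limited to the option names given in the text; any other PCI Express options a particular kernel needs are omitted here.

```
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
```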

3 Impact to PCI Express BIOS Vendor

Currently, most BIOSes don’t follow PCI FW 3.0 to support the ACPI _OSC handler. As a result, the PCI Express AER driver will fail when calling the ACPI control method _OSC. The PCI Express AER driver provides a current workaround for the lack of ACPI BIOS _OSC support by implementing a boot parameter, forceload=y/n. When the kernel boots with parameter aerdriver.forceload=y, the PCI Express AER driver still binds to all root ports which implement the AER capability.

4 Impact to PCI Express Device Driver

4.1 Device driver requirements

To conform to the AER driver infrastructure, PCI Express device drivers need to support the AER capability.

First, when a driver initializes a device, it needs to enable the device’s error reporting capability. By default, device error reporting is turned off, so the device won’t send error messages to the root port when it captures an error.

Secondly, to take part in the error recovery procedure, a device driver needs to implement error callbacks as described in the pci_error_handlers data structure as shown below.

    struct pci_error_handlers {
        /* PCI bus error detected on this device */
        pci_ers_result_t (*error_detected)(struct pci_dev *dev,
                enum pci_channel_state error);
        /* MMIO has been re-enabled, but not DMA */
        pci_ers_result_t (*mmio_enabled)(struct pci_dev *dev);
        /* PCI slot has been reset */
        pci_ers_result_t (*slot_reset)(struct pci_dev *dev);
        /* Device driver may resume normal operations */
        void (*resume)(struct pci_dev *dev);
    };

In data structure pci_driver, add err_handler as a new pointer to point to the pci_error_handlers. In kernel 2.6.14, the definition of pci_error_handlers had already been added to support PCI device error recovery [4]. To be compatible with PCI device error recovery, PCI Express device error recovery also uses the same definition and follows a similar rule. One of our starting points is that we try to keep the recovery callback interfaces as simple as we can. If the interfaces are complicated, driver developers will not be willing to add error recovery callbacks into device drivers.

4.2 Device driver helper functions

To communicate with device AER capabilities, drivers need to access AER registers in configuration space. It’s easy to write incorrect code because drivers must access and change individual register bits. To facilitate driver programming and reduce coding errors, the AER driver provides a couple of helper functions which could be used by device drivers.

4.2.1 int pci_find_aer_capability (struct pci_dev *dev);

pci_find_aer_capability locates the PCI Express AER capability in the device configuration space. Starting at offset 0x100 in configuration space, PCI Express devices could provide a couple of optional capabilities, which are linked to each other in a chain. AER is one of them. To locate AER registers, software needs to go through the chain. This function returns the AER offset in the device configuration space.

4.2.2 int pci_enable_pcie_error_reporting (struct pci_dev *dev);

pci_enable_pcie_error_reporting enables the device to send error messages to the root port when an error is detected. If the device doesn’t support PCI-Express capability, the function returns 0. When a device driver initializes a device (mostly, in its probe function), it should call pci_enable_pcie_error_reporting.

4.2.3 int pci_disable_pcie_error_reporting (struct pci_dev *dev);

pci_disable_pcie_error_reporting disables the device from sending error messages to the root port. Sometimes, device drivers want to process errors by themselves instead of using the AER driver. It’s not encouraged, but we provide this capability.
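
The chain walk that Section 4.2.1 attributes to pci_find_aer_capability can be sketched in user space over a raw 4 KB image of configuration space. The sketch assumes the PCI Express extended capability header layout (capability ID in bits 15:0, next-capability offset in bits 31:20) and the AER extended capability ID of 0001h; cfg_read32, cfg_write32, and find_ext_capability are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE 4096u   /* PCI Express extended configuration space */
#define EXT_CAP_START  0x100u  /* offset of the first extended capability */
#define EXT_CAP_ID_AER 0x0001u /* extended capability ID of AER */

/* Read a little-endian 32-bit register from a configuration-space image. */
uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
    assert(cfg != NULL && off + 4 <= CFG_SPACE_SIZE);
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

/* Write a little-endian 32-bit register; used to build example images. */
void cfg_write32(uint8_t *cfg, size_t off, uint32_t value)
{
    assert(cfg != NULL && off + 4 <= CFG_SPACE_SIZE);
    cfg[off] = (uint8_t)value;
    cfg[off + 1] = (uint8_t)(value >> 8);
    cfg[off + 2] = (uint8_t)(value >> 16);
    cfg[off + 3] = (uint8_t)(value >> 24);
}

/* Walk the extended capability chain and return the offset of the
 * capability with the given ID, or 0 if it is not present. */
size_t find_ext_capability(const uint8_t *cfg, uint16_t cap_id)
{
    size_t off = EXT_CAP_START;
    /* Each capability occupies at least 8 bytes, which bounds the walk
     * even if a broken image contains a loop. */
    int budget = (int)((CFG_SPACE_SIZE - EXT_CAP_START) / 8);

    while (budget-- > 0) {
        uint32_t header;

        if (off < EXT_CAP_START || off > CFG_SPACE_SIZE - 4)
            break; /* a next pointer of 0 ends the chain */
        header = cfg_read32(cfg, off);
        if (header == 0 || header == 0xffffffffu)
            break; /* no capability implemented at this offset */
        if ((header & 0xffffu) == cap_id)
            return off;
        off = (header >> 20) & 0xffcu; /* next pointer, dword aligned */
    }
    return 0;
}
```

A caller looking for AER would pass EXT_CAP_ID_AER and treat a return value of 0 as "capability not present", which mirrors how the helper in Section 4.2.1 is described as returning the AER offset.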

4.2.4 int pci_cleanup_aer_uncorrect_error_status (struct pci_dev *dev);

pci_cleanup_aer_uncorrect_error_status cleans up the uncorrectable error status register. The AER driver only clears the correctable error status register when processing errors. As for uncorrectable errors, specific device drivers should do so because they might do more specific processing. Usually, a driver should call this function in its slot_reset or resume callbacks.

4.3 Testing PCI Express AER On Device Driver

It’s hard to test device driver AER capabilities. Through many experiments, we have found that UR (Unsupported Request) can be used to test device drivers. We triggered UR error messages by probing a non-existent device function. For example, if a PCI Express device only has one function, when the kernel reads the ClassID from the configuration space of the second function of the device, the device might send an Unsupported Request error message to the root port and set the bit in the uncorrectable error status register. By setting different values in the corresponding bit of the uncorrectable error severity register, we could test both non-fatal and fatal errors.

5 Conclusion

The PCI Express AER driver creates a generic infrastructure to support PCI Express AER. This infrastructure provides the Linux kernel with an ability to capture PCI Express device errors and perform error recovery at the point in the hierarchy where an agent device reports an error. Last but not least, system administrators could get formatted, useful error information to debug device errors.

Linux kernel 2.6.19 has accepted the PCI Express AER patches. Future work includes enabling PCI Express AER for every PCI Express device by default, blocking I/O when an error happens, and so on.

6 Acknowledgement

Special thanks to Steven Carbonari for his contributions to the architecture design of the PCI Express AER driver, Rajesh Shah for his contributions to code review, and the Linux community for providing great input.

Legal Statement

This paper is copyright © 2007 by Intel Corporation. Permission to redistribute in accordance with Linux Symposium submission guidelines is granted; all other rights are reserved.

References

[1] PCI Express Base Specification Revision 1.1. March 28, 2005. http://www.pcisig.com

[2] PCI Firmware Specification Revision 3.0. http://www.pcisig.com

[3] Tom Long Nguyen, Dely L. Sy, & Steven Carbonari. “PCI Express Port Bus Driver Support for Linux.” Proceedings of the Linux Symposium, Vol. 2, Ottawa, Ontario, 2005. http://www.linuxsymposium.org/2005/linuxsymposium_procv2.pdf

[4] pci-error-recovery.txt. Available from: 2.6.20/Documentation.

[5] PCIEBUS-HOWTO.txt. Available from: 2.6.20/Documentation.

[6] pcieaer-howto.txt. Available from: 2.6.20/Documentation.
Proceedings of the
Linux Symposium

Volume Two

June 27th–30th, 2007


Ottawa, Ontario
Canada
Conference Organizers
Andrew J. Hutton, Steamballoon, Inc., Linux Symposium,
Thin Lines Mountaineering
C. Craig Ross, Linux Symposium

Review Committee
Andrew J. Hutton, Steamballoon, Inc., Linux Symposium,
Thin Lines Mountaineering
Dirk Hohndel, Intel
Martin Bligh, Google
Gerrit Huizenga, IBM
Dave Jones, Red Hat, Inc.
C. Craig Ross, Linux Symposium

Proceedings Formatting Team


John W. Lockhart, Red Hat, Inc.
Gurhan Ozen, Red Hat, Inc.
John Feeney, Red Hat, Inc.
Len DiMaggio, Red Hat, Inc.
John Poelstra, Red Hat, Inc.
