Hyper Transport Technology
INTRODUCTION
The demand for faster processors, memory and I/O is a familiar refrain in market
applications ranging from personal computers and servers to networking systems and
from video games to office automation equipment. Once information is digitized, the
speed at which it is processed becomes the foremost determinant of product success.
Faster system speed leads to faster processing. Faster processing leads to faster system
performance. Faster system performance results in greater success in the marketplace.
This obvious logic has led a generation of processor and memory designers to focus on
one overriding objective – squeezing more speed from processors and memory devices.
Processor designers have responded with faster clock rates and superpipelined
architectures that use level 1 and level 2 caches to keep ever-faster execution units fed.
Memory designers have responded with double data rate (DDR) memories that allow data
access on both the leading and trailing clock edges, doubling effective data throughput.
I/O developers have
responded by designing faster and wider I/O channels and introducing new protocols to
meet anticipated I/O needs. Today, processors hit the market with 2+ GHz clock rates,
memory devices provide sub-5 ns access times, and standard I/O buses are 32 and 64 bits
wide, with new, higher-speed protocols on the horizon.
Increased processor speeds, faster memories, and wider I/O channels are not
always practical answers to the need for speed. The main problem is integration of more
and faster system elements. Faster execution units, faster memories and wider, faster I/O
buses lead to crowding of more high-speed signal lines onto the physical printed circuit
board. One aspect of the integration problem is the set of physical problems posed by speed.
Faster signal speeds lead to manufacturing problems due to loss of signal integrity and
greater susceptibility to noise. Very high-speed digital signals tend to behave like
high-frequency radio waves, exhibiting the same problematic characteristics as
high-frequency analog signals. This wreaks havoc on printed circuit boards manufactured
using standard, low-cost materials and technologies.
Signal integrity problems caused by signal crosstalk, signal and clock skew and
signal reflections increase dramatically as clock speed increases. The other aspect of the
integration problem is the I/O bottleneck that develops when multiple high-speed
execution units are combined for greater performance. While faster execution units
relieve processor performance bottlenecks, the bottleneck moves to the I/O links. More
data sits idle, waiting for the processor and I/O buses to clear, and moving large
amounts of data from one subsystem to another drags down overall system performance.
Systems have come to rely on a patchwork of buses: legacy buses such as PCI, graphics
buses such as AGP Pro, and SNA buses like InfiniBand. This hodge-podge of buses increases system
complexity, adds many transistors devoted to bus arbitration and bridge logic, while
delivering less than optimal performance. A number of new technologies are responsible
for the increasing demand for additional bandwidth. High-resolution, texture-mapped 3D
graphics and high-definition streaming video are escalating bandwidth needs between
CPUs and graphics processors. Technologies like high-speed networking (Gigabit
Ethernet, InfiniBand, etc.) and wireless communications (Bluetooth) are allowing more
devices to exchange growing amounts of data at rapidly increasing speeds. Software
technologies are evolving, resulting in breakthrough methods of utilizing multiple system
processors. As processor speeds rise, so will the need for very fast, high-volume inter-
processor data traffic. While these new technologies quickly exceed the capabilities of
today’s PCI bus, existing interface functions like MP3 audio, V.90 modems, USB, 1394,
and 10/100 Ethernet are left to compete for the remaining bandwidth. These functions are
now commonly integrated into core logic products. Higher integration is increasing the
number of pins needed to bring these multiple buses into and out of the chip packages.
Nearly all of these existing buses are single-ended, requiring additional power and ground
pins to provide sufficient current return paths. High pin counts increase RF radiation,
which makes it difficult for system designers to meet FCC and VDE requirements.
Reducing pin count helps system designers to reduce power consumption and meet
thermal requirements. In response to these problems, AMD began developing the Hyper
Transport™ I/O link architecture in 1997. Hyper Transport technology has been designed
to provide system architects with significantly more bandwidth, low-latency responses,
lower pin counts, compatibility with legacy PC buses, extensibility to new SNA buses,
and transparency to operating system software, with little impact on peripheral drivers.
As CPUs advanced in terms of clock speed and processing power, the I/O
subsystem that supports the processor could not keep up. In fact, different links
developed at different rates within the subsystem. The basic elements found on a
motherboard include the CPU, Northbridge, Southbridge, PCI bus, and system memory.
Other components, such as network controllers and USB ports, are also found on the
motherboard, but these generally communicate with the rest of the system through the Southbridge.
Many of the links above have advanced over the years. Each began with standard PCI-like
performance (33 MHz, 32 bits wide, for just over 1 Gbps of throughput), but each has
developed differently over time. The link between the CPU and Northbridge has progressed
to a 133 MHz (effectively 266 MHz, as it is sampled twice per clock cycle) 64-bit wide
bus, providing a throughput of close to 17 Gbps. The Northbridge to system memory link
has advanced to support PC2100 memory: it is a 64-bit wide, 133 MHz (also sampled twice
per clock cycle) bus, and it also has a bandwidth of almost 17 Gbps. The Northbridge to
graphics controller connection has stayed 32 bits wide and grown to a 66 MHz bus, but
with 4x AGP it is sampled four times per clock. 8x AGP (sampling the data eight times
per clock) will pull the throughput of this link even with the other two, at nearly 17 Gbps.
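These peak-throughput figures follow directly from bus width times transfer rate. The
following minimal C sketch reproduces the arithmetic; the values come from the text, and
the pumping factor (samples per clock cycle) per bus is as described above:

    #include <stdio.h>

    /* Peak throughput in Gbit/s = clock rate x bus width x samples per
       clock cycle (1 = classic PCI, 2 = double data rate, 4/8 = AGP). */
    static double peak_gbps(double clock_mhz, int width_bits, int pumping) {
        return clock_mhz * 1e6 * width_bits * pumping / 1e9;
    }

    int main(void) {
        printf("PCI 32/33:       %.1f Gbps\n", peak_gbps(33.0, 32, 1));  /* ~1.1  */
        printf("CPU-Northbridge: %.1f Gbps\n", peak_gbps(133.0, 64, 2)); /* ~17.0 */
        printf("4x AGP:          %.1f Gbps\n", peak_gbps(66.0, 32, 4));  /* ~8.4  */
        printf("8x AGP:          %.1f Gbps\n", peak_gbps(66.0, 32, 8));  /* ~16.9 */
        return 0;
    }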
Until recently, however, the Northbridge-Southbridge link has remained the same
standard PCI bus. Although most devices connected to the Southbridge do not demand
high bandwidth, their demands are growing as they evolve, and the aggregate bandwidth
they could require easily exceeds the bandwidth of the Northbridge-Southbridge link.
Many server applications, such as database functions and data mining, require access to a
large amount of data. This requires as much throughput from the disk and network as
possible, which is gated by the Northbridge-Southbridge link.
DESIGN GOALS
DATA LINK LAYER
The data link layer includes the initialization and configuration sequence, periodic
cyclic redundancy checks (CRC), the disconnect/reconnect sequence, information packets
for flow control and error management, and doubleword framing for other packets.
PROTOCOL AND TRANSACTION LAYERS
The protocol layer includes the commands, the virtual channels in which they run,
and the ordering rules that govern their flow. The transaction layer uses the elements
provided by the protocol layer to perform actions, such as read requests and responses.
COMMANDS
All Hyper Transport technology commands are either four or eight bytes long and
begin with a 6-bit command type field. The most commonly used commands are Read
Request, Read Response, and Write. Commands travel in virtual channels; a virtual
channel contains requests or responses with the same ordering priority.
When the command requires an address, the last byte of the command is concatenated
with an additional four bytes to create a 40-bit address.
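As a rough illustration, such a request could be viewed in C as follows. The accessor
names and exact byte positions are assumptions for the sketch, not the specification's
defined layout:

    #include <stdint.h>

    /* Illustrative view of an 8-byte Hyper Transport request packet.
       The 6-bit command type occupies the low bits of the first byte;
       the 40-bit address is the last byte of the command concatenated
       with four additional bytes. Byte positions here are assumed for
       illustration only. */
    typedef struct { uint8_t bytes[8]; } ht_request_t;

    static uint8_t ht_command_type(const ht_request_t *p) {
        return (uint8_t)(p->bytes[0] & 0x3F);    /* 6-bit command type field */
    }

    static uint64_t ht_address(const ht_request_t *p) {
        uint64_t addr = 0;
        for (int i = 0; i < 5; i++)              /* 5 bytes -> 40-bit address */
            addr |= (uint64_t)p->bytes[3 + i] << (8 * i);
        return addr;
    }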
Hyper Transport commands and data are separated into one of three types of
virtual channels: non-posted requests, posted requests, and responses. Non-posted
requests require a response from the receiver; all read requests and some write requests
are non-posted. Posted requests do not require a response from the receiver; most write
requests are posted. Responses are replies to non-posted requests; read responses and
target-done responses to non-posted writes are types of response messages.
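A compact way to summarize this classification, with illustrative names:

    /* The three Hyper Transport virtual channels, with the request kinds
       the text assigns to each (identifier names are illustrative). */
    typedef enum {
        HT_VC_NONPOSTED, /* reads and non-posted writes: response required */
        HT_VC_POSTED,    /* posted writes: no response from the receiver   */
        HT_VC_RESPONSE   /* read responses and target-done messages        */
    } ht_vchannel_t;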
Command packets are 4 or 8 bytes and include all of the information needed for
inter-device or system-wide communication, except in the case of reads and writes, where
a data packet is required for the data payload. Hyper Transport writes require an 8-byte
Write Request control packet followed by the data packet. Hyper Transport reads require
an 8-byte Read Request control packet (issued from the host or other device), followed by
a 4-byte Read Response control packet (issued by the peripheral or responding device),
followed by the data packet.
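A small sketch of the per-transaction control overhead implied by these packet sizes
(helper names are hypothetical):

    /* Control-packet overhead for the transaction flows described above;
       sizes are taken from the text. */
    enum { HT_WRITE_REQ = 8, HT_READ_REQ = 8, HT_READ_RESP = 4 };

    static unsigned ht_write_link_bytes(unsigned payload) {
        return HT_WRITE_REQ + payload;               /* request + data packet */
    }

    static unsigned ht_read_link_bytes(unsigned payload) {
        return HT_READ_REQ + HT_READ_RESP + payload; /* request + response + data */
    }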
SESSION LAYER
The session layer includes link width optimization and link frequency
optimization along with interrupt and power state capabilities.
The initial link-width negotiation sequence may result in links that do not operate
at their maximum width potential. All 16-bit, 32-bit, and asymmetrically sized
configurations must be enabled by a software initialization step. At cold reset, all links
power up and synchronize according to the protocol. Firmware (or BIOS) then
interrogates all the links in the system, reprograms them to the desired width, and takes
the system through a warm reset to change the link widths.
At cold reset, all links power up with 200-MHz clocks. For each link, firmware
reads a specific register of each device to determine the supported clock frequencies. The
reported frequency capability, combined with system-specific information about the
board layout and power requirements, is used to determine the frequency to be used for
each link. Firmware then writes the two frequency registers to set the frequency for each
link. Once all devices have been configured, firmware initiates an LDTSTOP# disconnect
or RESET# of the affected chain to cause the new frequency to take effect.
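Both the width and the frequency sequences amount to a capability-read, register-write,
reset cycle. The following is a hedged C sketch of that flow; the ht_link type, register
accessors, and chooser helpers are all hypothetical stand-ins for the device's Hyper
Transport configuration registers:

    #include <stddef.h>

    struct ht_link { int id; };                       /* placeholder type */

    extern unsigned read_width_caps(struct ht_link *);
    extern unsigned read_freq_caps(struct ht_link *);
    extern unsigned choose_width(unsigned caps);      /* plus board data  */
    extern unsigned choose_freq(unsigned caps);       /* layout, power    */
    extern void     write_width_reg(struct ht_link *, unsigned);
    extern void     write_freq_regs(struct ht_link *, unsigned);
    extern void     ldtstop_disconnect_or_warm_reset(void);

    void ht_init_links(struct ht_link *links, size_t n) {
        for (size_t i = 0; i < n; i++) {
            /* After cold reset each link runs at 200 MHz with the width
               negotiated by the low-level protocol; firmware reads the
               supported capabilities and programs the desired values. */
            write_width_reg(&links[i], choose_width(read_width_caps(&links[i])));
            write_freq_regs(&links[i], choose_freq(read_freq_caps(&links[i])));
        }
        /* New widths and frequencies take effect on an LDTSTOP# disconnect
           or a warm RESET# of the affected chain. */
        ldtstop_disconnect_or_warm_reset();
    }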
PHYSICAL LAYER
The designers of Hyper Transport technology wanted to use as few pins as possible to
enable smaller packages, reduced power consumption, and better thermal characteristics,
while reducing total system cost. This goal is accomplished by using separate
unidirectional data paths and very low-voltage differential signaling.
The signals used in Hyper Transport technology are summarized in Table 2.
Commands, addresses, and data (CAD) all share the same bits.
Each data path includes a Control (CTL) signal and one or more Clock (CLK)
signals.
- The CTL signal differentiates commands and addresses from data packets.
- For every grouping of eight bits or less within the data path, there is a forwarded
CLK signal (see the sketch after this list). Clock forwarding reduces clock skew
between the reference clock signal and the signals traveling on the link. Multiple
forwarded clocks limit the number of signals that must be routed closely in wider
Hyper Transport links.
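The forwarded-clock rule works out to one CLK group per eight CAD bits; a minimal
sketch (the helper name is illustrative):

    /* One forwarded CLK per group of up to eight CAD bits, per the rule
       above; e.g. a 32-bit link carries four forwarded clocks in each
       direction. */
    static int ht_forwarded_clocks(int cad_width_bits) {
        return (cad_width_bits + 7) / 8;   /* ceil(width / 8) */
    }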
In addition to CAD, Clock, Control, VLDT power, and ground pins, each
Hyper Transport device has Power OK (PWROK) and Reset (RESET#) pins. These
pins are single-ended because of their low-frequency use.
Devices that implement Hyper Transport technology for use in lower power
applications such as notebook computers should also implement Stop (LDTSTOP#)
and Request (LDTREQ#). These power management signals are used to enter and
exit low-power states.
At first glance, the signaling used to implement a Hyper Transport I/O link would
seem to increase pin counts because it requires two pins per bit and uses separate
upstream and downstream data paths. However, the increase in signal pins is offset by
two factors:
- By using separate data paths, Hyper Transport I/O links are designed to operate at
much higher frequencies than existing bus architectures, so buses delivering
equivalent or better bandwidth can be implemented using fewer signals.
- Differential signaling provides a return current path for each signal, greatly
reducing the number of power and ground pins required in each package.
Commands, addresses, and data traveling on a Hyper Transport link are double
pumped: transfers take place on both the rising and falling edges of the clock
signal. For example, if the link clock is 800 MHz, the data rate is 1600 MT/s. An
implementation of Hyper Transport links with 16 CAD bits in each direction and
a 1.6-GT/s data rate provides bandwidth of 3.2 Gbytes/s in each direction, for an
aggregate peak bandwidth of 6.4 Gbytes/s, or 48 times the peak bandwidth of a 33-MHz
PCI bus. A low-cost, low-power Hyper Transport link using two CAD bits in each
direction and clocked at 400 MHz provides 200 Mbytes/s of bandwidth in each direction,
for an aggregate of 400 Mbytes/s, about three times the peak bandwidth of PCI 32/33.
Such a link can be implemented with just 24 pins, including power and ground pins.
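The double-pumped bandwidth arithmetic above reduces to a one-line helper; a minimal
C sketch (the function name is illustrative, not a Hyper Transport API):

    /* Peak per-direction bandwidth of a double-pumped link: CAD width
       (bits) x clock (MHz) x 2 transfers per cycle / 8 bits per byte. */
    static double ht_mbytes_per_sec(double clock_mhz, int cad_bits) {
        return clock_mhz * 2.0 * cad_bits / 8.0;
    }
    /* ht_mbytes_per_sec(800.0, 16) -> 3200 MB/s (3.2 Gbytes/s per direction)
       ht_mbytes_per_sec(400.0,  2) ->  200 MB/s                             */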
Hyper Transport technology provides the bandwidth support necessary to take full
advantage of InfiniBand Architecture throughput. A 1X InfiniBand link can demand as
much as 5 Gbps of bandwidth. While PCI 64/66, at over 4 Gbps, may be nearly adequate,
it can hold back network performance. Even PCI-X, at just over 8.5 Gbps, cannot handle
a single 4X InfiniBand Architecture channel. These numbers illustrate how InfiniBand
Architecture's bandwidth needs compare to the support internal I/O technologies can provide.
IMPLEMENTATION
Daisy Chain Topology
A host with a single end slave is the smallest possible chain, and a host with 31 tunnel devices is the
largest possible daisy chain. Hyper Transport technology can route the data of up to 31
attached devices at an aggregate transfer rate of 3.2 gigabytes per second over an 8-bit
Hyper Transport technology I/O link, and up to 12.8 gigabytes per second over a 32-bit
link. This gives the designer a significantly larger and faster fabric while still using
existing PCI I/O drivers. In fact, the total end-to-end length for a Hyper Transport
technology chain can be several meters, providing for great flexibility in system
configuration.
Switch Topology
In a switch topology, each attached device can benefit from the full bandwidth of the
Hyper Transport technology I/O link, because the switch directs the flow of electrical
signals between the slave devices connected to it.
Star Topology
Whereas daisy chain configurations offer linear bus topologies much like a
network “backbone,” and switch topologies expand these into parallel chains, a star
topology that distributes Hyper Transport technology links in a spoke fashion
around a central host or switch offers a great deal of flexibility. With Hyper Transport
technology tunnels and switches, Hyper Transport technology can support any type of
topology, including star topologies and redundant configurations, where dual star
configurations create redundant links.
CONCLUSION