LPDDR4 Multi-Channel Architectures WP
Author: Marc Greenberg, Director of Product Marketing for DDR IP, Synopsys

LPDDR4, the latest double data rate synchronous DRAM for mobile applications, is a DRAM specification found in today's high-end portable products such as the Samsung Galaxy S6 smartphone, the Apple iPhone 6S [1], and several other recently announced devices. In addition to mobile use, we predict that LPDDR4 will follow its predecessor LPDDR3 into tablets and thin and light laptops in a "memory down" configuration, i.e. when the DRAM is physically soldered onto the board.
LPDDR4 offers huge bandwidth in a physically small PCB area and volume; up to 25.6 GByte/s of
bandwidth at a 3,200 Mbps data rate from a single 15mmx15mm LPDDR4 package when two dies are
packaged together. LPDDR4 builds on the success of LPDDR2 and LPDDR3 by adding new features and
introducing a major architectural change.
This white paper explains how LPDDR4 is different from all previous JEDEC DRAM specifications.
It discusses:
- Why designers are selecting LPDDR4
- Highlights of the LPDDR4 architecture
- How to best configure LPDDR4 channels
- How to handle 2-die and 4-die packages with multi-channel connections
- The advantages of sharing channels through system-on-chip (SoC) partitioning
- How to optimize channels for the lowest power consumption
Why LPDDR4?
LPDDR4 includes a number of features that enable SoC design teams to reduce power consumption of
discrete DRAM. PCs and servers commonly use DDR devices mounted on dual inline memory modules (DIMMs) hosted on 64-bit wide buses. This board-level solution allows field-
upgradeable DRAM capacity expansion, but requires long and more heavily loaded interconnects which
consume more power than short traces. Systems using LPDDR2, LPDDR3 and LPDDR4 tend to have
fewer memory devices on each bus and shorter interconnects, and thus consume less power than DDR2,
DDR3 and DDR4 devices.
Design teams can call on power-saving options within the LPDDR4 DRAM. These features include
reduced voltage and I/O capacitance; a reduced width, multiplexed command and address bus;
eliminating the on-DRAM DLL; providing lower power standby modes with faster entry and exit; and
enabling faster, less complex frequency changes.
For part of the time, the memory drops to the LPDDR3 speed grade. This level of performance is sufficient
to support texts, calls, web browsing, photography, simple gaming: all features that don’t place too many
demands on the CPU or GPU.
For the majority of the time, when the mobile device is not in use and in a pocket or at a bedside, the DRAM is
switched off or in low speed mode. It will have one channel of the memory active just to perform ‘always-on,
always-connected’ tasks. In this mode, the device is performing background activities such as maintaining
cell contact, receiving messages, receiving / displaying push notifications, synchronizing mail, and displaying
the time.
However, it is the performance of the device during the highest use time that drives many mobile users to
upgrade their devices, which is why it is so important to provide an outstanding user experience in this use
mode (Figure 1).
Figure 1. Highest use times drive the upgrade cycle for mobile users. (The figure contrasts best-performance use with the 200-1600 Mbps, LPDDR3-range, demands of text, phone calls, browsing, reading, photography, puzzles and simple games, and with low-speed operation under standby power limits.)
DDR2, DDR3, and DDR4 devices offer one command address bus input and one data bus per package,
and most commonly one die per package. LPDDR2 and LPDDR3 may offer one to four dies per package. In
the case of two-die and four-die packages for LPDDR4, LPDDR3 and LPDDR2, generally two independent
command address input and data busses (channels) are provided. In other words, LPDDR2 and LPDDR3 partially enable multi-channel operation by offering two independent channels per package. LPDDR4 brings the issue to the forefront: there are two independent channels per die and four channels in most packages.
Connecting Multiple Channels
The LPDDR4 architecture is natively two-channel (Figure 2): each die has two command address inputs and two data buses. Four independent channels are available on an LPDDR4 2-die package. To deploy LPDDR4 effectively, designers must understand how this architectural change affects the system architecture.
Figure 2 diagram: a single LPDDR4 die provides two independent channels, Channel A and Channel B, each with its own DQ bus (two x8 groups) and a 2KB page.
A single DRAM device with one channel (for example, a single-die package of LPDDR3) can only be connected
one way — with the command/address bus on the SoC to the command/address bus on the DRAM and the
SoC data bus to the DRAM data bus (Figure 3). A chip select enables the DRAM when it is required.
Figure 3 diagram: the SoC's command/address bus, data bus, and chip select connect directly to a single-channel DRAM device (example: LPDDR3).
Having two DRAM devices, or one DRAM device with two independent interfaces like LPDDR4, supports four
possible configurations:
- Parallel (lockstep)
- Series (multi-rank)
- Multi-channel
- Shared command/address
Parallel (lockstep) connection
In the parallel (lockstep) connection (Figure 4), both DRAM devices share the same command/address bus and chip select, so they receive every command simultaneously and both of the DRAM devices are always in the same state. They always have the same page of memory open and access the same column, although the data stored in each DRAM is different.
Figure 4 diagram: parallel (lockstep) connection. One command/address bus and one chip select from the SoC drive both DRAM devices; each device has its own data bus.
Series (multi-rank) connection
In the series (multi-rank) connection (Figure 5), the two DRAM devices share both the command/address bus and the data bus, and individual chip selects determine which device responds to a given command.
Figure 5 diagram: series (multi-rank) connection. The SoC's command/address and data buses are shared by both DRAM devices, each of which has its own chip select.
Multi-channel connection
The multi-channel connection (Figure 6) provides each channel of DRAM or each DRAM device with an
independent connection to the SoC, where each device or channel has its own command/address bus, data
bus and chip select. This flexible configuration enables each DRAM device (or group of devices) to operate
completely independently of the other. They may be in different states, receiving different commands and
different addresses, and one may be reading while the other is writing.
A multi-channel connection also allows for the DRAMs to operate in different power states. For example, one
memory might be in a standby self-refresh mode, while the other is fully active.
Figure 6 diagram: multi-channel connection. Each DRAM device (or channel) has its own command/address bus, data bus, and chip select to the SoC.
Shared command/address (CA) connection
The final configuration option, which is used more commonly in non-low-power DDR installations, is multi-
channel with shared command/address (CA) or shared AC (Figure 7). In this configuration, both of the
DRAM devices receive the same command and address, but like the serial implementation, the chip selects
determine which DRAM device is listening on any particular clock cycle, so each device may be in a different
state. The DRAM commands are arbitrated between the two channels at the SoC, but each DRAM can
transmit data independently.
Figure 7 diagram: shared command/address (CA) connection. Both DRAM devices share the SoC's command/address bus, and each has its own data bus and chip select.
Figure 8 diagram: comparison of the four connection options.

              Parallel   Series   Multi-channel   Shared CA
CA pins       6          6        12              6
DQ pins       32         16       32              32
CS pins       1          1        2               2
Banks         8          8        16              16
Fetch (bytes) 64         32       32              32/64
The series connection is also less suited for PoP implementation. It does save some DQ pins, but because the
DRAM devices share a data bus it offers half the bandwidth of the other solutions, which makes this approach
less attractive.
While a shared CA implementation is better suited to DDR systems, a multi-channel connection can help
design teams to get the best out of LPDDR4.
Design teams that want to get the most bandwidth out of their LPDDR4 device, especially if using small
data transfers, may consider a true four-channel implementation (Figure 9). Compared to the other
implementations, it has the highest number of banks and the smallest fetch size. It requires 24 CA pins on the
SoC and may be implemented with four separate memory controllers and PHYs on the SoC.
Figure 9 diagram: 4-channel implementation. The SoC connects to four channels of LPDDR4, each with its own CA bus, DQ (data) bus, and chip select. CA pins: 24, DQ pins: 64, CS pins: 4, Banks: 32, Fetch: 32 bytes.
The two-channel and parallel implementation offers a good compromise between a fully parallel and a four-
channel implementation. It is especially useful for LPDDR3-LPDDR4 combinations (Figure 10). Most early
examples of commercial SoCs using LPDDR4 have used this configuration.
Figure 10 diagram: 2-channel and parallel implementation. Each of two SoC channels drives a pair of LPDDR4 channels in parallel. CA pins: 12, DQ pins: 64, CS pins: 2, Banks: 16, Fetch: 64 bytes.
The fully parallel implementation uses only six CA pins and has the maximum number of DQs (64). However, there are only eight banks available in this system, and the minimum fetch size is 128 bytes, which can limit its usefulness for some applications. It may also be necessary to duplicate the pins of the CA bus for bus loading or chip-level timing closure reasons.
Figure 11 shows an example of a 2-die, 4-channel LPDDR4 multi-channel implementation (left) and a 4-die implementation (right). In the 4-die case, the LPDDR4 package contains four dies and each physical channel has two ranks of memory connected to it. This configuration requires the design team to extend the connection in a serial direction on each of the four channels on the package. Unfortunately, a 4-die package doesn't provide 8-channel connectivity; there are only four channels of package balls on the 4-die package.
Figure 11. Two-die and four-die implementations. Four-die LPDDR4 multichannel and serial implementation adds
DRAM capacity. This solution is compatible with two-die packages.
Accessing each channel independently of the others means that every bank on every channel can have
a different row activated. For small transfers like video and network packets that are spread randomly
throughout the memory, having more banks available will avoid some of the inherent memory timing
parameters that could limit performance. Spreading transactions across as many banks as possible will
improve the performance because it reduces the probability of hitting some of the memory timing parameters.
Having more banks in the system, and therefore more time between successive commands to any one bank, can improve performance by reducing the probability of delays due to the tRRD, tFAW, and tRC memory timing parameters:
- tRC, the row cycle time of the memory: the minimum time between activate commands to different rows in the same bank.
- tRRD, the row-to-row delay: the minimum time between activate commands to rows in different banks.
- tFAW, the four activate window: no more than four activate commands can be issued within a rolling tFAW window. The LPDDR4 standard sets tFAW to four times tRRD, so for LPDDR4 these are effectively the same constraint, although other memory types may use a different relationship between tRRD and tFAW.
The tRC timing causes problems particularly in faster devices. At the highest LPDDR4 speeds, tRC is over 100 clock cycles: once a row in a bank has been activated, no other row in that bank can be accessed for at least 100 clock cycles, which is a long time to lock that bank out from being used again. Having more banks available lowers the probability of having to access a new row in a bank that is currently locked out because of the tRC time.
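As a rough illustration of how these parameters translate into clock cycles at LPDDR4-3200, the short Python sketch below converts nanosecond timings to cycles at the 1600 MHz clock. The tRC and tRRD values used (65 ns and 10 ns) are assumptions chosen only to match the ranges quoted in this paper; the real values come from the DRAM datasheet.

import math

def cycles(t_ns, clk_mhz):
    """Convert a timing parameter in nanoseconds to DRAM clock cycles (rounded up)."""
    return math.ceil(t_ns * clk_mhz / 1000.0)

CLK_MHZ = 1600.0   # LPDDR4-3200: 3200 MT/s data rate, 1600 MHz clock

T_RC_NS = 65.0     # assumed row cycle time (illustrative only)
T_RRD_NS = 10.0    # assumed row-to-row activate delay (illustrative only)

t_rc = cycles(T_RC_NS, CLK_MHZ)     # ~104 cycles: how long the bank is locked out
t_rrd = cycles(T_RRD_NS, CLK_MHZ)   # ~16 cycles between activates to different banks
t_faw = 4 * t_rrd                   # LPDDR4 sets tFAW to four times tRRD

print(f"tRC = {t_rc} cycles, tRRD = {t_rrd} cycles, tFAW = {t_faw} cycles")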
The tRRD and tFAW limit the ability to change banks frequently, something that the design team may want to
do to avoid the tRC timing parameter.
Figure 12 shows an example device with a four activate window tFAW of four times the row-to-row delay tRRD.
The tRRD time may be up to 16 clock cycles at LPDDR4-3200.
Figure 13 shows a continuous sequence of transactions executing on the parallel implementation. The annotation AC/BA0 is shorthand for an activate command to bank 0. The command next to it, RD/BA4, shows a read command to bank 4 (assume that bank 4 was activated some time earlier). Each command bubble represents four clock cycles because of the four-phase addressing of the LPDDR4 device. In practice, the sequence would be extended as activate, read, activate, read, activate, read, activate, read. The returning data completely occupies the DQ bus. The parallel access pattern achieves 100% memory bandwidth utilization, but only when accessing the device at 800MHz (DDR1600).
Figure 13. Parallel implementation using continuous 64-byte reads to rotating addresses at BL 16 and 800MHz/DDR1600. The SoC drives the DRAM CA bus with the sequence AC/BA0, RD/BA4, AC/BA1, RD/BA5, AC/BA2, RD/BA6, AC/BA3, RD/BA7. One bubble represents multiple clock cycles. AC = Activate Command, RD = Read Command.
Figure 14 shows the two-channel implementation executing the same sequence using each of the command
address channels independently. Each command address bus has a slightly different pattern on it: activate,
read, no-op, read, activate, read, no-op, read. The space in the command channel could be used for
something else like a commanded pre-charge or a per-bank refresh, or simply left as an idle clock cycle. The
data bus is fully occupied.
Figure 14. Two-channel implementation using the command address channels independently, with continuous 64-byte reads to rotating addresses at BL 16 and 800MHz/DDR1600. CA_a carries AC/BA0, RD/BA4, RD/BA4, AC/BA2, RD/BA6, RD/BA6, AC/BA4 while CA_b carries AC/BA1, RD/BA5, RD/BA5, AC/BA3, RD/BA7, RD/BA7, AC/BA5. One bubble represents multiple clock cycles. AC = Activate Command, RD = Read Command.
When the frequency is doubled to 1600 MHz (DDR 3200 operation, Figure 15), the tRRD time limits the SoC's ability to send activate commands to the LPDDR4 device in the upper example of a parallel implementation. The sequence becomes: activate, read, no-op, no-op, activate, read, no-op, no-op. The no-op cycles could be used for pre-charges or refreshes, but the memory cannot be activated fast enough to issue sequential 64-byte transactions to a new bank with each transaction.
Figure 15 diagram: at DDR 3200, the parallel implementation (top) can no longer keep the data bus fully occupied because tRRD(min) limits activates and creates data gaps, while the two-channel implementation (bottom) still works, with CA_a and CA_b each carrying their own activate/read sequences. One bubble represents multiple clock cycles. AC = Activate Command, RD = Read Command.
Without another 64-byte transaction to the same page of memory, the SoC must wait until tRRD has elapsed before it can activate a new page in memory. This mode of operation limits the maximum performance of the device to 50% of the available bandwidth if the transactions are not long enough to allow two reads to each bank before moving to a new bank.
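A back-of-the-envelope way to see this effect: on the parallel configuration's 32-bit lockstep data bus, a 64-byte BL16 read occupies the DQ bus for 8 clock cycles, so when every access needs a fresh activate, the achievable utilization is roughly those 8 data cycles divided by the tRRD spacing between activates. The sketch below uses that simplification (ignoring other timing parameters) with an assumed tRRD of about 10 ns, i.e. roughly 8 cycles at 800 MHz and 16 cycles at 1600 MHz; it is a model of the argument above, not a datasheet calculation.

def utilization(data_cycles_per_read, trrd_cycles, reads_per_activate=1):
    """Approximate DQ-bus utilization when activates are spaced at least tRRD apart."""
    data_cycles = data_cycles_per_read * reads_per_activate
    return min(1.0, data_cycles / max(data_cycles, trrd_cycles))

# 64-byte read on a 32-bit bus at BL16 = 16 beats = 8 clock cycles of data.
print(utilization(8, trrd_cycles=8))                         # 1.0 -> 100% at 800 MHz (DDR 1600)
print(utilization(8, trrd_cycles=16))                        # 0.5 -> 50% at 1600 MHz (DDR 3200)
print(utilization(8, trrd_cycles=16, reads_per_activate=2))  # 1.0 -> two reads per row restores full bandwidth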
By contrast, the two-channel implementation at the bottom of Figure 15 allows each channel to satisfy tRRD
because of the “activate, read, no-op, read” pattern, even with shorter accesses. The bus bandwidth can run
at full capacity, even at the DDR 3200 data rate.
The best approach is to match the fetch size to the SoC, both in terms of the size of transfers to be transmitted
over the bus and the total bandwidth targeted from the device.
A preferred size for the cache lines of many SoCs and CPUs is 32 bytes, while some larger 64-bit CPUs use 64-byte cache lines. Video and networking traffic often requires short transactions of 32 bytes or less. Ideally, the multichannel architecture should match the system fetch size, so the memory delivers data in units the system can actually use.
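The minimum fetch size follows directly from the data bus width and the burst length: fetch bytes = DQ pins x burst length / 8. Below is a minimal sketch of that arithmetic for the configurations discussed in this paper, assuming LPDDR4's minimum burst length of 16 throughout.

def min_fetch_bytes(dq_pins, burst_length=16):
    """Minimum fetch in bytes: each burst beat transfers dq_pins bits."""
    return dq_pins * burst_length // 8

print(min_fetch_bytes(16))   # 32 bytes  -- one x16 LPDDR4 channel (multi-channel configuration)
print(min_fetch_bytes(32))   # 64 bytes  -- two channels in lockstep (parallel pair)
print(min_fetch_bytes(64))   # 128 bytes -- fully parallel 64-bit implementation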
The parallel implementation shown in Figure 16, with a minimum burst length of 16 for LPDDR4, and 64 DQ
pins in parallel, produces a 128-byte fetch, which is really only suitable for long data transfers to contiguous
addresses. The parallel implementation can work for accesses in units of 128 bytes at a time, but if the
accesses are smaller than 128 bytes and to random addresses, the parallel implementation will be inefficient.
Figure 16 diagram: fully parallel implementation. A single CA bus and chip select from the SoC drive all four LPDDR4 channels in lockstep. CA pins: 6, DQ pins: 64, CS pins: 1, Banks: 8, Fetch: 128 bytes.
Another issue in creating a 64-bit parallel implementation is the physical connection between the SoC and the DRAM dies. The ball-out of the LPDDR4 PoP package is arranged with a channel in each corner, so there are four channels on the package to accommodate two or four dies. Ideally, the SoC memory controller and PHY placement should match that LPDDR4 ballout, allowing channel A to map to channel A, channel B to B, C to C, and D to D, and keeping the routes within the LPDDR4 PoP package as short as possible without crossovers. This package layout makes a parallel 4-channel LPDDR4 interface challenging to implement physically.
The user should also take care that, if the transactions are to different pages in memory, tRRD may limit the effective bandwidth at higher frequencies, as explained in the previous section.
For these reasons, the multichannel implementations of LPDDR4 are often preferred over the four-channel
parallel implementation.
Command/address bus
LPDDR4 has a very narrow command/address bus (only six bits wide per channel compared to 20 or
more bits for DDR4) so the overhead of using multiple command/address channels is less than with other
technologies. Using all four of the command/address buses independently on the LPDDR4 package offers the
most flexibility and potentially the highest performance for the overall system.
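To put a number on that overhead, the snippet below compares the command/address pin cost of running multiple independent channels, using only the widths quoted above: 6 CA bits per LPDDR4 channel versus a 20-bit lower bound for DDR4 (an illustrative figure, not a specific device's pinout).

LPDDR4_CA_WIDTH = 6    # CA bits per LPDDR4 channel (from the text above)
DDR4_CA_WIDTH = 20     # "20 or more" bits for DDR4; lower bound, for illustration only

for channels in (1, 2, 4):
    print(f"{channels} channel(s): LPDDR4 uses {channels * LPDDR4_CA_WIDTH} CA pins, "
          f"DDR4 would use at least {channels * DDR4_CA_WIDTH}")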
One way to partition the SoC (Figure 17) is to give each CPU (or group of CPUs) access to its own independent channel. This architecture has some advantages: the CPUs don't block each other and the SoC buses are shorter. Channels that are not being used can be powered down.
Figure 17 diagram: dedicated channels. Each CPU connects to its own channel of the LPDDR4 package ballout.
However, this architecture is also inflexible. If channel A needs to use some of the data that is in channel C, it cannot use the memory as a mailbox; it must transfer the data through the SoC. It also makes it harder for the CPUs to work on shared tasks for load balancing.
Another approach is to have every CPU share every memory (Figure 18). This allows more flexible partitioning. It tends to work better for heterogeneous processing and lets the CPUs work on shared data, but it requires much more wiring, longer wires on the chip, and possibly a sophisticated on-chip interconnect. This more accurately represents how real chips work, especially in a heterogeneous processing architecture with different sizes of CPUs, GPUs, and other processing elements.
Figure 18. Share the channels — all CPUs share all memory
One option is a separate (non-interleaved) memory map, in which each channel occupies its own contiguous region of the logical address space (Figure 19).
Figure 19. Logical to physical address mapping using a separate memory map: Channel A occupies logical addresses 0 to X MByte and Channel B occupies X MByte to Y MByte, each as its own contiguous region.
For example, Channel A might hold the operating system and always-on, always-connected functions.
Channel B may contain application data, a video buffer, and similar data. These two different address spaces
are independent and separate. This helps power control because, for example, channel B can be powered
down when not in use.
Another approach is to interleave the memory map by having small consecutive logical address regions access different channels of the memory (Figure 20): for example, bytes 0 to 63 in channel A, bytes 64 to 127 in channel B, and so on back and forth up through the memory, so the logical space is interleaved across the whole memory. This approach helps load balancing across the two channels and can enable good performance. However, because both channels are always required, it is not possible to shut down either channel to save power.
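As a minimal sketch of this kind of fine-grained interleaving, the function below maps a logical address to a channel and a channel-local offset, assuming two channels and the 64-byte interleave granularity used in the example above; real SoCs choose the interleave boundary (and often hash several address bits) to suit their traffic.

INTERLEAVE_BYTES = 64   # bytes 0-63 -> channel A, 64-127 -> channel B, and so on
NUM_CHANNELS = 2

def map_interleaved(logical_addr):
    """Return (channel, offset within that channel) for a logical address."""
    block = logical_addr // INTERLEAVE_BYTES
    channel = "AB"[block % NUM_CHANNELS]
    # Each channel receives every NUM_CHANNELS-th block, packed contiguously.
    offset = (block // NUM_CHANNELS) * INTERLEAVE_BYTES + logical_addr % INTERLEAVE_BYTES
    return channel, offset

print(map_interleaved(0))     # ('A', 0)
print(map_interleaved(64))    # ('B', 0)
print(map_interleaved(130))   # ('A', 66)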
Figure 20. Logical to physical address mapping using an interleaved memory map: consecutive logical address blocks from 0 to Y MByte alternate between Channel A and Channel B.
A further implementation option is to use a hybrid memory map (Figure 21), where different regions of the logical address space provide either non-interleaved or interleaved access. This hybrid approach could include a region of memory that is always on and always connected, a region of memory that is interleaved between the two channels to get the highest performance, and an upper area of memory for programs associated with the high-bandwidth data.
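A minimal sketch of how such a hybrid map might be decoded, assuming (purely for illustration) a 256 MByte always-on region that lives entirely in channel A, with a 64-byte interleaved region above it; the region boundary and granularity are assumptions for this example, not values from the LPDDR4 specification.

MB = 1 << 20
ALWAYS_ON_TOP = 256 * MB    # assumed: addresses below this live only in channel A
INTERLEAVE_BYTES = 64

def map_hybrid(logical_addr):
    """Pick the channel for a logical address in a hybrid (part linear, part interleaved) map."""
    if logical_addr < ALWAYS_ON_TOP:
        # Non-interleaved region: channel B can stay powered down while only this region is in use.
        return "A"
    # Interleaved region: consecutive 64-byte blocks alternate between the two channels.
    block = (logical_addr - ALWAYS_ON_TOP) // INTERLEAVE_BYTES
    return "AB"[block % 2]

print(map_hybrid(4096))                 # 'A' -- always-on region
print(map_hybrid(ALWAYS_ON_TOP))        # 'A' -- first interleaved block
print(map_hybrid(ALWAYS_ON_TOP + 64))   # 'B' -- next interleaved block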
Figure 21 diagram: hybrid memory map. Logical addresses from 0 to X MByte map to one channel and hold the operating system and "always on, always connected" functions; addresses from X MByte to Y MByte are interleaved across Channel A and Channel B; the region above Y MByte holds memory for programs associated with high bandwidth.
The Synopsys DDR memory controllers, including the uMCTL2 memory controller, offer a multiport or single
port connection into the SoC. The buses available include AXI3, AXI4, or AHB from 1-16 ports. A single-port
protocol controller, uPCTL2, is available for systems that schedule memory traffic outside the controller.
uMCTL2 offers low latency, high bandwidth, and strong QoS, including QoS-driven arbitration and a high-performance scheduling algorithm. The low-power functions within the memory controller are automated, allowing the design team to focus on the system design. It supports multiple memory types: DDR2, DDR3, and DDR4, as well as LPDDR2, LPDDR3, and LPDDR4. For automotive and other high-reliability systems,
the IP offers a range of Reliability, Availability, Serviceability (RAS) features.
The uMCTL2 memory controller for LPDDR4 offers a CAM-based scheduling architecture optimized for 2667-4266 Mbps data rates, and multiple address maps to allow flexibility in systems supporting
different use modes and multiple memory types. It has automatic power-down and self-refresh with fast
frequency switching, and supports automatic temperature monitoring and refresh rate adjustment.
Conclusion
The LPDDR4 multichannel specification provides new opportunities for novel system designs, especially
within multichannel architectures that can improve system performance. Design teams need to weigh performance, power, and complexity when deciding how to deploy the LPDDR4 architecture.
Synopsys, Inc. • 690 East Middlefield Road • Mountain View, CA 94043 • www.synopsys.com
©2016 Synopsys, Inc. All rights reserved. Synopsys is a trademark of Synopsys, Inc. in the United States and other countries. A list of Synopsys trademarks is
available at http://www.synopsys.com/copyright.html . All other names mentioned herein are trademarks or registered trademarks of their respective owners.
01/27/16.CS6789_Optimizing LPDDR4_WP_kw.