Intel® Processor Architecture: January 2013
• New micro-architecture with 8 cores
• 54 MB on-die cache
• Improved RAS and power management capabilities
• Doubles execution width from 6 to 12 instructions/cycle
• 32nm process technology
• Launched in November 2012
[Figure: processor block diagram. Labels: 8 cores; 32MB Shared Last Level Cache; Memory Controllers; Link Controllers; 4 Intel Scalable Memory Interface (SMI); 4 Full + 2 Half Width Intel QuickPath Interconnect.]
Processor:
Intel® ATOM™
Z2460
Intel® Core™ Micro Architecture
MicroArchitecture Codename “Nehalem”
2nd Generation Intel® Core™ Micro Architecture
3rd Generation Intel® Core™ Micro Architecture
4th Generation Intel® Core™ Micro Architecture: TBD
Intel® SSE: 4x single precision FP
Intel® SSE2: 2x double precision FP, 8x 16 bit integer, 4x 32 bit integer, 2x 64 bit integer
Intel® AVX: 8x single precision FP, 4x double precision FP
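The element counts listed above follow from dividing the vector register width by the element width. The sketch below reproduces that arithmetic. The register widths (128 bits for SSE and SSE2, 256 bits for AVX) are not stated on the slide and are filled in here as well-known values; the function name `lanes` is purely illustrative.

```python
# Lane count = vector register width / element width.
# Register widths are an addition to the slide content:
# 128 bits for SSE/SSE2 (XMM registers), 256 bits for AVX (YMM registers).
REGISTER_BITS = {"SSE": 128, "SSE2": 128, "AVX": 256}


def lanes(isa: str, element_bits: int) -> int:
    """Number of elements of the given width that fit in one register."""
    return REGISTER_BITS[isa] // element_bits


if __name__ == "__main__":
    rows = [
        ("SSE", 32, "single precision FP"),
        ("SSE2", 64, "double precision FP"),
        ("SSE2", 16, "16 bit integer"),
        ("SSE2", 32, "32 bit integer"),
        ("SSE2", 64, "64 bit integer"),
        ("AVX", 32, "single precision FP"),
        ("AVX", 64, "double precision FP"),
    ]
    for isa, bits, kind in rows:
        print(f"{isa}: {lanes(isa, bits)}x {kind}")
```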
Gather: load elements from a vector of indices (vectorization enabler)
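As a rough illustration of what a gather does, the following sketch emulates the operation on plain Python lists: each output lane is loaded from the position named by the matching entry of an index vector. This is a semantic model only, not the hardware instruction; the names `gather`, `table` and `indices` are hypothetical.

```python
def gather(base, indices):
    """Emulate a gather: output lane i is base[indices[i]]."""
    return [base[i] for i in indices]


if __name__ == "__main__":
    table = [10, 20, 30, 40, 50]
    indices = [4, 0, 2, 2]  # one index per output lane; repeats are allowed
    print(gather(table, indices))
```

A loop of the form `out[i] = table[idx[i]]` has this shape, which is why a gather helps the compiler vectorize code with indirect accesses.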
Copyright© 2012, Intel Corporation. All rights reserved. Partially Intel Confidential Information.
*Other brands and names are the property of their respective owners.
Agenda
•Overview Intel® processor architecture
•Intel x86 ISA (instruction set architecture)
•Micro-architecture of processor core
•Uncore structure
•Additional processor features
– Hyper-threading
– Turbo mode
•Summary
[Figure: core pipeline block diagram. Labels: Instruction Queue; Decode; Rename/Allocate; Retirement Unit (ReOrder Buffer); Reservation Station; Execution Units; Out-Of-Order Execution Engine; DTLB; 32kB Data Cache; 2/4/6 MB 2nd Level Cache; Front-Side Bus; Memory; width annotations 4, 4, 6.]
[Figure: core pipeline block diagram with mid level cache. Labels: Instruction Queue; Decode; Rename/Allocate; Retirement Unit (ReOrder Buffer); Reservation Station; Execution Units; Execution Engine; DTLB; 2nd Level TLB; 32kB Data Cache; 256kB 2nd Level Cache (MLC - Mid Level Cache); L3 and beyond: Uncore; Memory; width annotations 4, 4, 6.]
[Figure: system and memory layout. Labels: CPU; memory; IOH; QPI; CPU0 and CPU1, each next to DRAM; Mem Control with DDR3 channels; Graphics.]
[Figure: power over time relative to “TDP”. Annotations: sleep or low power; buildup thermal budget during idle periods; use accumulated energy budget to enhance user experience. x-axis: Time.]
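The idea in the figure (bank headroom while running below TDP, spend it later to run briefly above TDP) can be sketched as a toy budget model. Everything in this sketch is an assumption made for illustration: the function name `simulate`, the unit time steps, the linear accounting, and all numeric values. It does not describe how the actual power control unit works.

```python
def simulate(demand, tdp, boost_limit, budget_cap):
    """Toy energy-budget model; one list entry per unit time step.

    demand      -- power the workload would like to draw in each step
    tdp         -- sustained power limit
    boost_limit -- highest power allowed while budget remains
    budget_cap  -- maximum budget that can be accumulated
    Returns (power delivered per step, budget left at the end).
    """
    budget = 0
    delivered = []
    for want in demand:
        if want <= tdp:
            # Idle or light load: unused headroom builds up the budget.
            budget = min(budget_cap, budget + (tdp - want))
            power = want
        else:
            # Heavy load: exceed TDP only as far as the budget allows.
            extra = min(min(want, boost_limit) - tdp, budget)
            budget -= extra
            power = tdp + extra
        delivered.append(power)
    return delivered, budget


if __name__ == "__main__":
    # Two idle steps followed by four steps of heavy demand (made-up numbers).
    print(simulate([2, 2, 20, 20, 20, 20], tdp=10, boost_limit=15, budget_cap=12))
```

In this made-up run the model boosts for a few steps and then falls back to TDP once the accumulated budget is spent.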
Software & Services Group, Developer Products Division
Copyright © 2013, Intel Corporation. All rights reserved.
Simultaneous Multi-Threading (SMT)
“Intel Hyper-Threading – HT”
• Run 2 threads at the very same time per core
• Available on Nehalem (and successors) as well as
Intel® ATOM Architecture
• Take advantage of 4-wide execution engine
– Keep it fed with multiple threads
– Hide latency of a single thread
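A minimal sketch of why a second thread helps keep a 4-wide engine fed: if each thread alone has only a few instructions ready in a given cycle, two threads together fill more issue slots. The per-cycle "ready" counts below are invented, and the model deliberately ignores carry-over of unissued instructions, resource sharing and cache effects, so it is a toy and not a performance prediction.

```python
ISSUE_WIDTH = 4  # 4-wide execution engine, as stated on the slide


def issued(ready):
    """Instructions issued by one thread running alone, cycle by cycle."""
    return sum(min(ISSUE_WIDTH, r) for r in ready)


def utilization_without_smt(a, b):
    """Threads run one after the other, each alone on the core."""
    slots = ISSUE_WIDTH * (len(a) + len(b))
    return (issued(a) + issued(b)) / slots


def utilization_with_smt(a, b):
    """Both threads share the issue slots in every cycle.

    Assumes a and b cover the same number of cycles.
    """
    total = sum(min(ISSUE_WIDTH, x + y) for x, y in zip(a, b))
    return total / (ISSUE_WIDTH * len(a))


if __name__ == "__main__":
    # Hypothetical count of ready instructions per cycle; 0 models a stall.
    thread_a = [2, 0, 3, 0, 1, 4]
    thread_b = [1, 3, 0, 2, 2, 0]
    print(utilization_without_smt(thread_a, thread_b))
    print(utilization_with_smt(thread_a, thread_b))
```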
[Figure: bar chart "Performance Gain SMT enabled vs disabled", Intel® Core™ i7. Benchmarks on the x-axis: Floating Point, 3dsMax*, Integer, Cinebench* 10, POV-Ray* 3.7 beta 25, 3DMark* Vantage* CPU. Bar labels: 7%, 10%, 13%, 16%, 29%, 34%. y-axis: 0% to 40%.]
Floating Point is based on SPECfp_rate_base2006* estimate
Integer is based on SPECint_rate_base2006* estimate
SPEC, SPECint, SPECfp, and SPECrate are trademarks of the Standard Performance Evaluation Corporation.
For more information on SPEC benchmarks, see: http://www.spec.org
Source: Intel. Configuration: pre-production Intel® Core™ i7 processor with 3 channel DDR3 memory. Performance tests and ratings are measured
using specific computer systems and / or components and reflect the approximate performance of Intel products as measured by those tests. Any
difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information
to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the
performance of Intel products, visit http://www.intel.com/performance/
Performance tests and ratings are measured using specific computer systems and/or components and reflect the
approximate performance of Intel products as measured by those tests. Any difference in system hardware or
software design or configuration may affect actual performance. Buyers should consult other sources of information
to evaluate the performance of systems or components they are considering purchasing. For more information on
performance tests and on the performance of Intel products, reference www.intel.com/software/products.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2012. Intel Corporation.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are
not unique to Intel microprocessors. These optimizations include SSE2®, SSE3, and SSSE3 instruction sets and other
optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on
microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use
with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice.
[Figure: core block diagram. Labels: Per Thread Prefetch Buffers; Instruction Cache; XLAT / FL; 2-wide ILD; Per-thread Instruction Queues; Inst. TLB; AGU; DL1 prefetcher.]
[Figure: coprocessor layout. Labels: Processor; GDDR Memory; Multi-Threaded Wide SIMD Cores, each with I$ and D$; L2; TD; GDDR MC; PCIe Client Logic.]
[Figure: core block diagram. Labels: T0 to T3 IP; 4 Threads In-Order; Decode; uCode; 16B/Cycle (2 IPC); Pipe 0; Pipe 1; X87; ALU 0; ALU 1; VPU 512b SIMD; TLB Miss Handler; HWP; L2 TLB; L2 Control; 512KB L2 Cache; To On-Die Interconnect.]
[Figure: vector unit data path. Labels: Memory; L2 (512K L2 Cache Local Subset); L1 Data; Data Convert / Broadcast; Swizzle; Numeric Convert; 32x 512b Vreg; 8x 16b Vmask; 512b data paths; 512b / 4 cycles; Scalar Part with Scalar Register and Scalar Units.]
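The figure labels pair 512b vector registers with 16b mask registers. One bit per 32-bit element gives exactly 16 lanes, so a mask can select which lanes an operation updates. The sketch below emulates that idea on Python lists. The write-mask behaviour shown (unselected lanes keep the destination value) and the name `masked_add` are assumptions made for illustration, not taken from the slides.

```python
VECTOR_BITS = 512   # from the "512b SIMD" label
ELEMENT_BITS = 32   # assumed element width
LANES = VECTOR_BITS // ELEMENT_BITS  # 16 lanes, one per bit of a 16b mask


def masked_add(dst, a, b, mask):
    """Lane i becomes a[i] + b[i] if bit i of mask is set, else keeps dst[i]."""
    assert len(dst) == len(a) == len(b) == LANES
    return [a[i] + b[i] if (mask >> i) & 1 else dst[i] for i in range(LANES)]


if __name__ == "__main__":
    a = list(range(LANES))
    b = [100] * LANES
    dst = [0] * LANES
    # Only lanes 0 and 2 are selected by the mask.
    print(masked_add(dst, a, b, 0b0000000000000101))
```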
Intel® Xeon Phi™ Coprocessor: Programming Model
• Operate as a compute node
• Run a full OS
• Run MPI
• Run OpenMP*
• Run x86 code
• Run offloaded code
• Run restricted code
Restrictive architectures (GPU, ASIC, FPGA) limit the ability of applications to use arbitrary nested parallelism, function calls, and threading models
Well, it is an SMP-on-a-chip running Linux*
Intel® Xeon Phi™ Environment
[Figure: physical and logical views of the system. Labels: Physical View; Logical Views; NATIVE (Autonomous): NETWORK, Linux, IP, SSH, FTP, NFS, ...; OFFLOAD (Heterogeneous): Xeon, MIC, NETWORK; MEM.]
Flexible Execution Models
Optimized Performance for different Workloads
SINGLE SOURCE CODE
Flexible Execution Models
Optimized Performance for different Usage Models
[Figure: usage models combining XEON® and XEON PHI™. Labels: MPI; DIRECTIVES.]