Assignment 7: Ee15B124: H.Sahithya Kavya Ee15B127: Shyam Shankar H R November 6, 2017
Processor : Jaguar
Introduction :
AMD Jaguar (Family 16h) is a low-power microarchitecture designed by AMD and used in its APUs. It is a two-way superscalar,
64-bit, out-of-order core that decodes and issues 2 instructions and dispatches 6 operations per cycle.
Architectural features :
L1 Instruction Cache : 32 KB, 2-way set-associative design with 64-byte cache lines. The instructions are parity protected, and the cache
uses a pseudo-Least Recently Used (LRU) replacement algorithm. The L1 Instruction Translation Look-aside Buffer (ITLB) is fully
associative, with 32 entries for 4 KB pages and 8 entries for 2 MB pages. The larger 1 GB pages are fragmented into multiple 2 MB
pages, since the target markets are relatively unlikely to use huge pages.
L1 Data Cache : 32 KB per core. The basic load-to-use latency for the L1D cache is 3 cycles, comprising one cycle for address generation
and two cycles for cache access. Jaguar's bandwidth to the L1 data cache is double that of Bobcat, matching the increase
in performance of the FP/SIMD cluster. Jaguar also includes a number of enhancements to store forwarding.
L2 Cache : 1 MB to 2 MB unified, shared cache. A group of four cores shares a single L2 cache control block that controls 4 banks of L2
data arrays and acts as the central point of coherency and the interface to the rest of the system. For systems with more than four cores,
each local cluster must be connected through a fabric.
Instruction set : AMD64 (x86-64).
General purpose registers : 16, each 64 bits wide.
Virtual address format : 64 bits virtual address, of which the low-order 48 bits are used in current implementations. This allows up to
256 TB of virtual address space. The architecture definition allows this limit to be raised in future implementations to the full 64 bits.
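The cache and address-space figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not taken from AMD documentation; the function name cache_geometry and its parameters are illustrative assumptions. It derives the number of sets in the L1 instruction cache from its size, associativity, and line size, and confirms that 48 implemented address bits give 256 TB.

```python
# Sketch: derive cache geometry from the figures quoted in the text.
# Names are illustrative assumptions, not AMD terminology.

def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, associativity, and line size are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 of the line size
    index_bits = sets.bit_length() - 1         # log2 of the set count
    return sets, offset_bits, index_bits

# L1 instruction cache: 32 KB, 2-way, 64-byte lines
l1i = cache_geometry(32 * 1024, 2, 64)

# Virtual address space reachable with 48 implemented bits, in TB (2**40 bytes)
va_space_tb = (1 << 48) // (1 << 40)

print(l1i)          # (sets, offset bits, index bits)
print(va_space_tb)  # size of the 48-bit virtual address space in TB
```

Under these assumptions the L1 instruction cache has 256 sets, so a line is selected by 6 offset bits and 8 index bits of the address.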
Virtual Memory :
Logical address :
A logical address is a reference into a segmented address space. It comprises a segment selector and an effective address. Notationally,
a logical address is represented as Logical Address = Segment Selector : Offset. The segment base, taken from the real-mode segment
value or from the segment descriptor that the selector references, is combined with the offset to form a real or virtual address.
Effective address :
The offset into a memory segment is referred to as an effective address. Effective addresses are formed by adding together elements comprising
a base value, a scaled index value, and a displacement value. The effective-address computation is represented by :
Effective Address = Base + (Scale × Index) + Displacement
where,
Base : A value stored in any general-purpose register.
Scale : A positive value of 1, 2, 4, or 8.
Index : A two's-complement value stored in any general-purpose register.
Displacement : An 8-bit, 16-bit, or 32-bit two's-complement value encoded as part of the instruction.
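The computation described by these four terms can be sketched as follows. This is an illustrative model only: the function name effective_address and the keyword defaults are assumptions, and the result is truncated to 64 bits to mimic the long-mode effective-address length.

```python
# Sketch of the base + scale*index + displacement computation.
# Function and parameter names are illustrative assumptions.

MASK64 = (1 << 64) - 1  # long mode uses a 64-bit effective address

def effective_address(base=0, index=0, scale=1, displacement=0):
    """Combine the four addressing components into a 64-bit offset."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    # Negative index/displacement values model two's-complement operands;
    # masking wraps the sum into the 64-bit range.
    return (base + scale * index + displacement) & MASK64

# Example: element 3 of an array of 8-byte items at 0x1000, minus 16 bytes
ea = effective_address(base=0x1000, index=3, scale=8, displacement=-16)
print(hex(ea))
```

The example models an access such as [base + index*8 - 16], where the index register selects an 8-byte array element.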
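Long mode defines a 64-bit effective-address length. If a processor implementation does not support the full 64-bit virtual-address space,
the effective address must be in canonical form: all bits from the most-significant implemented bit (bit 47 in current 48-bit
implementations) through bit 63 must be identical, either all zeros or all ones.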
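The canonical-form requirement can be checked with a short test. This is a minimal sketch assuming 48 implemented virtual-address bits, as stated earlier for current implementations; the function name is_canonical and its implemented_bits parameter are hypothetical.

```python
# Sketch: canonical-address check, assuming 48 implemented bits by default.
# The function name and parameter are illustrative assumptions.

def is_canonical(addr, implemented_bits=48):
    """True if bits 63 down to (implemented_bits - 1) are all 0 or all 1."""
    addr &= (1 << 64) - 1
    upper = addr >> (implemented_bits - 1)              # bits 63..47 for 48-bit
    all_ones = (1 << (64 - implemented_bits + 1)) - 1   # same field, all set
    return upper == 0 or upper == all_ones

print(is_canonical(0x00007FFFFFFFFFFF))  # top of the lower canonical half
print(is_canonical(0xFFFF800000000000))  # bottom of the upper canonical half
print(is_canonical(0x0000800000000000))  # bit 47 set but bits 63:48 clear
```

The first two addresses lie at the edges of the two canonical halves of the address space; the third falls in the gap between them.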
The AMD64 architecture defines an expanded page-translation mechanism supporting translation of a 64-bit virtual address to a 52-bit
physical address. The enhancements are summarized below.
Physical-Address Extensions (PAE) : The AMD64 architecture requires physical-address extensions to be enabled (CR4.PAE=1) before
long mode is entered. When PAE is enabled, all paging data-structures are 64 bits, allowing references into the full 52-bit
physical-address space supported by the architecture.
Page-Size Extensions (PSE) : Page-size extensions (CR4.PSE) are ignored in long mode. Long mode does not support the 4-Mbyte
page size enabled by page-size extensions. Long mode does, however, support 4-Kbyte and 2-Mbyte page sizes.
Paging Data Structures : The AMD64 architecture extends the page-translation data structures in support of long mode.
64 bit mode :
The 64-bit mode disables segmentation and uses a flat, paged-memory model for memory management. The 4-Gbyte segment limit is ignored in
64-bit mode. The figure below is an example of this model :
Page Translation
The paging mechanism enables the system software to create separate address spaces for each process or application. These address spaces are
known as virtual address spaces. The AMD64 architecture used in Jaguar enhances this support to allow translation of 64-bit virtual addresses
into 52-bit physical addresses, although processor implementations can support smaller virtual-address and physical-address spaces.
Virtual addresses are translated to physical addresses through hierarchical translation tables created and managed by system software. The
figure above shows an overview of the page-translation hierarchy used in long mode.
Long-mode page translation requires the use of physical-address extensions (PAE). Because PAE is always enabled in long mode, the PS
bit in the page directory entry (PDE.PS) selects between 4-Kbyte and 2-Mbyte page sizes, and the CR4.PSE bit is ignored. When 1-Gbyte
pages are supported, the PDPE.PS bit selects the 1-Gbyte page size.
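How the page size changes the way a virtual address is consumed can be sketched as below. The bit positions follow the standard long-mode layout (9-bit indices starting at bits 39, 30, 21, and 12), which is assumed here rather than quoted from the text; the function name split_virtual_address and the field names are illustrative.

```python
# Sketch: split a long-mode virtual address into table indices and a page
# offset. Bit positions assume the standard 4-level long-mode layout;
# names are illustrative assumptions.

def split_virtual_address(va, page_size="4K"):
    """Return the table indices and page offset for the given page size."""
    offset_bits = {"4K": 12, "2M": 21, "1G": 30}[page_size]
    fields = {
        "pml4": (va >> 39) & 0x1FF,  # index into the PML4 table
        "pdp": (va >> 30) & 0x1FF,   # index into the page-directory-pointer table
    }
    if offset_bits <= 21:
        fields["pd"] = (va >> 21) & 0x1FF  # page-directory index (4K and 2M)
    if offset_bits == 12:
        fields["pt"] = (va >> 12) & 0x1FF  # page-table index (4K only)
    fields["offset"] = va & ((1 << offset_bits) - 1)
    return fields

va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
print(split_virtual_address(va, "4K"))
print(split_virtual_address(va, "2M"))
print(split_virtual_address(va, "1G"))
```

With a larger page size, the walk stops one or two levels earlier and the unused index bits become part of the page offset.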
CR3:
In long mode, the CR3 register (a control register) points to the PML4 base address. CR3 is expanded to 64 bits in long mode, allowing
the PML4 table to be located anywhere in the 52-bit physical-address space. The figure below shows the long-mode CR3 format.
Table Base Address Field: Bits 51:12. This 40-bit field points to the PML4 base address. The PML4 table is aligned on a 4-Kbyte
boundary with the low-order 12 address bits (11:0) assumed to be 0. This yields a total base-address size of 52 bits.
Page-Level Writethrough (PWT) Bit: Bit 3. Page-level writethrough indicates whether the highest-level page-translation table has a
writeback or writethrough caching policy. When PWT=0, the table has a writeback caching policy. When PWT=1, the table has a
writethrough caching policy.
Page-Level Cache Disable (PCD) Bit: Bit 4. Page-level cache disable indicates whether the highest-level page-translation table is
cacheable. When PCD=0, the table is cacheable. When PCD=1, the table is not cacheable.
Reserved Bits. Reserved fields should be cleared to 0 by software when writing CR3.
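The CR3 fields listed above can be extracted with simple masks. The following is a sketch only: decode_cr3 is a hypothetical helper, and the sample register value is made up for illustration.

```python
# Sketch: decode the long-mode CR3 fields described in the text.
# decode_cr3 and the sample value are illustrative assumptions.

TABLE_BASE_MASK = ((1 << 52) - 1) & ~0xFFF  # bits 51:12

def decode_cr3(cr3):
    """Return the PML4 base address and the PWT/PCD caching bits."""
    return {
        "pml4_base": cr3 & TABLE_BASE_MASK,  # low 12 bits assumed zero (4-Kbyte aligned)
        "pwt": bool(cr3 & (1 << 3)),         # True means writethrough policy
        "pcd": bool(cr3 & (1 << 4)),         # True means table is not cacheable
    }

sample = decode_cr3(0x12345000 | (1 << 3))
print(hex(sample["pml4_base"]), sample["pwt"], sample["pcd"])
```

Masking with bits 51:12 recovers the full 52-bit base address because the PML4 table is aligned on a 4-Kbyte boundary.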
The figure below shows 4-Kbyte page translation, performed by dividing the virtual address into six fields: a sign extension (bits 63:48),
four 9-bit indices into the four-level page-translation hierarchy (bits 47:12), and a 12-bit physical-page offset (bits 11:0). The fields
of the translation-table entries are described below:
Translation-Table Base Address Field: The translation-table base-address field points to the physical base address of the next-lower-
level table in the page-translation hierarchy.
Physical-Page Base Address Field: The physical-page base-address field points to the base address of the translated physical page.
This field is found only in the lowest level of the page translation hierarchy.
Present (P) Bit: Bit 0. This bit indicates whether the page-translation table or physical page is loaded in physical memory.
Read/Write (R/W) Bit: Bit 1. This bit controls read/write access to all physical pages mapped by the table entry.
User/Supervisor (U/S) Bit: Bit 2. This bit controls user (CPL 3) access to all physical pages mapped by the table entry.
Page-Level Writethrough (PWT) Bit: Bit 3. This bit indicates whether the page-translation table or physical page to which this entry
points has a writeback or writethrough caching policy.
Page-Level Cache Disable (PCD) Bit: Bit 4. This bit indicates whether the page-translation table or physical page to which this entry
points is cacheable.
Accessed (A) Bit: Bit 5. This bit indicates whether the page-translation table or physical page to which this entry points has been
accessed.
Dirty (D) Bit: Bit 6. This bit is only present in the lowest level of the page-translation hierarchy. It indicates whether the physical
page to which this entry points has been written.
Page Size (PS) Bit: Bit 7. When the PS bit is set in the page-directory-pointer entry or page-directory entry, that entry is the lowest
level of the page-translation hierarchy.
Global Page (G) Bit: Bit 8. This bit is only present in the lowest level of the page-translation hierarchy. It indicates the physical page
is a global page.
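The entry fields listed above can be decoded from a 64-bit table entry as sketched below. The helper decode_entry and the sample entry value are hypothetical. The sketch treats every entry uniformly, so the caller must keep in mind that D and G are only meaningful at the lowest level and PS only in page-directory-pointer and page-directory entries.

```python
# Sketch: decode the flag bits and base address of a page-translation entry.
# decode_entry and the sample value are illustrative assumptions.

ENTRY_FLAGS = {
    "P": 0, "R/W": 1, "U/S": 2, "PWT": 3, "PCD": 4,
    "A": 5, "D": 6, "PS": 7, "G": 8,
}
ADDR_MASK = ((1 << 52) - 1) & ~0xFFF  # bits 51:12 hold the base address

def decode_entry(entry):
    """Return (base_address, flags) for a 64-bit table entry."""
    flags = {name: bool((entry >> bit) & 1) for name, bit in ENTRY_FLAGS.items()}
    return entry & ADDR_MASK, flags

# Present, writable, accessed, and dirty 4-Kbyte page at 0xABCDE000
base, flags = decode_entry(0xABCDE000 | 0b0110_0011)
print(hex(base), [name for name, is_set in flags.items() if is_set])
```

In this sample the U/S bit is clear, so the page would be accessible to supervisor code only.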
TLB Management
In general, software changes made to paging-data structures are not automatically reflected in the TLB. In these situations, it is necessary for
software to invalidate TLB entries so that these changes are immediately propagated to the page-translation mechanism.
Speculative Caching of Address Translations: The processor may create a TLB entry for any linear address for which valid entries exist
in the page table structure currently pointed to by CR3. Such entries remain cached in the TLBs and may be used in subsequent translations.
Loading a translation speculatively will set the Accessed bit, if not already set. A translation will not be loaded speculatively if the Dirty bit
needs to be set.
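The need for explicit invalidation can be illustrated with a toy model. This sketch is not a description of Jaguar's TLB hardware: the ToyTLB class, the dictionary-based page table, and the page numbers are assumptions made for illustration, and invalidate stands in for an instruction such as INVLPG.

```python
# Toy model: a cached translation stays stale until software invalidates it.
# ToyTLB and the dict-based page table are illustrative assumptions.

class ToyTLB:
    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.entries = {}             # cached translations

    def translate(self, vpn):
        """Return the frame for vpn, walking the page table only on a miss."""
        if vpn not in self.entries:
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn]

    def invalidate(self, vpn):
        """Drop one cached translation (plays the role of INVLPG)."""
        self.entries.pop(vpn, None)

page_table = {0x10: 0xA0}
tlb = ToyTLB(page_table)
first = tlb.translate(0x10)   # miss: walks the table and caches 0xA0
page_table[0x10] = 0xB0       # software remaps the page
stale = tlb.translate(0x10)   # hit: still returns the old frame
tlb.invalidate(0x10)
fresh = tlb.translate(0x10)   # miss again: picks up the new mapping
print(hex(first), hex(stale), hex(fresh))
```

Between the remap and the invalidation, the model keeps returning the old frame, which is the situation the text says software must prevent.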
Improvements on Jaguar
The Jaguar processor was succeeded by Puma (also Family 16h), a second-generation design targeting the same market. Compared to Jaguar,
Puma offers a 19% reduction in core leakage, a 38% reduction in GPU leakage, a 500 mW reduction in memory controller power, a 200 mW
reduction in display interface power, chassis-temperature-aware turbo boost, selective boosting according to application needs (intelligent
boost), support for ARM TrustZone via an integrated Cortex-A5 processor, and support for DDR3L-1866 memory.