Computer Architecture
Computer organisation
Memory components
▪ D latch
o Stores the state value while the clock input C is not asserted
o When C is asserted, the value of input D replaces the value of Q
▪ Flip-flop
o The output is equal to the value of the stored state
o The internal state changes only on a clock edge
▪ In a shift register, the output of flip-flop i is connected to the input of flip-flop i+1
▪ A register file is an array of registers
o Each register can be read by supplying its register number
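The storage elements above can be sketched as a small behavioural simulation. This is only an illustration under simplifying assumptions (single-bit values, no propagation delays); the class and method names are hypothetical and not taken from the slides.

```python
class DLatch:
    """Level-sensitive: Q follows D while C is asserted, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, c, d):
        if c:  # transparent while the clock input is asserted
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-triggered: the stored state changes only on a rising clock edge."""

    def __init__(self):
        self.q = 0
        self._prev_c = 0

    def update(self, c, d):
        if c and not self._prev_c:  # rising edge detected
            self.q = d
        self._prev_c = c
        return self.q


class ShiftRegister:
    """Chain of flip-flops: the output of stage i feeds the input of stage i+1."""

    def __init__(self, n):
        self.stages = [0] * n

    def clock(self, serial_in):
        # On one clock edge every stage takes the previous stage's old value.
        self.stages = [serial_in] + self.stages[:-1]
        return list(self.stages)


class RegisterFile:
    """Array of registers addressed by register number."""

    def __init__(self, count):
        self.regs = [0] * count

    def read(self, number):
        return self.regs[number]

    def write(self, number, value):
        self.regs[number] = value


sr = ShiftRegister(4)
for bit in (1, 0, 1):
    print(sr.clock(bit))
# prints [1, 0, 0, 0], then [0, 1, 0, 0], then [1, 0, 1, 0]
```

Note the difference between the first two classes: the latch changes its output for as long as C is high, while the flip-flop samples D only at the moment C goes from 0 to 1.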
▪ Endianness
o The order in which the bytes of a multi-byte value are stored in memory
▪ Big-Endian
o Most significant byte is stored first (at the lowest memory address)
o Used in data networking and mainframes
o Motorola 68000 and PowerPC G5 are big-endian
▪ Little-Endian
o Least significant byte is stored first (at the lowest memory address)
o Intel x86 and AMD64 processor families, and most microprocessors
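The two byte orders can be made concrete with a short sketch. The 32-bit value 0x12345678 is an arbitrary example chosen for illustration, and `layout` is a hypothetical helper; `int.to_bytes`, `struct.pack` and `sys.byteorder` are standard Python.

```python
import struct
import sys

value = 0x12345678  # arbitrary 32-bit example value

big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")


def layout(data):
    """Return the bytes as hex strings, lowest memory address first."""
    return [f"{b:02x}" for b in data]


print("big-endian   :", layout(big))     # ['12', '34', '56', '78']
print("little-endian:", layout(little))  # ['78', '56', '34', '12']
print("host order   :", sys.byteorder)   # depends on the machine running this

# struct uses ">" for big-endian and "<" for little-endian layouts.
assert struct.pack(">I", value) == big
assert struct.pack("<I", value) == little
```

Reading the bytes back with the wrong order gives a different number, which is why data exchanged between machines needs an agreed byte order.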
Little Endian
Data operations
Processors
▪ M Chips
o N cores/chip
o T threads/core
▪ What do we need?
o A program: a sequence of instructions
▪ Or multiple sequences, if concurrent/parallel
*When a single thread executes per core, that thread has all of the core's resources available:
- Memory bandwidth
- Functional units
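The chips/cores/threads hierarchy above multiplies out to the number of hardware threads a machine offers. A minimal sketch, assuming a hypothetical machine with 2 chips, 24 cores per chip and 4 threads per core (these numbers are illustrative, not from the slides); `hardware_threads` is a made-up helper name.

```python
import os


def hardware_threads(chips, cores_per_chip, threads_per_core):
    """Total hardware threads = M chips x N cores/chip x T threads/core."""
    return chips * cores_per_chip * threads_per_core


# Hypothetical machine: M = 2, N = 24, T = 4.
print(hardware_threads(2, 24, 4))  # 192

# Logical CPUs reported by the operating system on the current machine
# (may be None if the count cannot be determined).
print(os.cpu_count())
```

With T = 1 each thread has a core to itself; with T > 1 the threads on one core share that core's functional units and memory bandwidth.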
▪ Multithreading
o Execute multiple threads in parallel
Software Thread
▪ The instruction flow of a running program. Every program has at least one thread.
o A program with exactly one thread is single-threaded
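As a sketch of software threads, the example below shows that a running program always has a main thread and can start additional ones. `run_workers` and the thread names are hypothetical. One caveat: in standard CPython builds the global interpreter lock means CPU-bound threads take turns rather than executing simultaneously, so this illustrates multiple instruction flows, not a parallel speed-up.

```python
import threading


def run_workers(n):
    """Start n software threads; each stores (i*i, its own thread name)."""
    results = [None] * n

    def work(i):
        results[i] = (i * i, threading.current_thread().name)

    threads = [
        threading.Thread(target=work, args=(i,), name=f"worker-{i}")
        for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has finished
    return results


# Every program has at least one thread: the main thread.
print(threading.main_thread().name)
print(threading.active_count())

print(run_workers(3))
# [(0, 'worker-0'), (1, 'worker-1'), (4, 'worker-2')]
```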
Hardware multithreading
Detailed memory access
Sample code
▪ IBM Power 9
o 14 nm technology
o 24 cores / SMT (8), 3.0 – 4.0 GHz
o L1 caches 32+32 KB
o L2 cache 512 KB
o L3 cache 120 MB
o Max sockets supported: 4-8 and more
o Max RAM: 2 TB DDR4
o PCIe v4 x4, x8, x16
▪ Intel KNL – Xeon Phi 72x5
o 14 nm technology
o 72 cores, 1.5 – 1.6 GHz
o L2 cache 36 MB
o Max sockets supported: 1?
o Max RAM: 384 GB DDR4
o PCIe v3 x4, x8, x16
▪ ARM Cortex-A77
o 7 nm technology
o aarch64 – ARMv8-A
o 4-8 cores
o DynamIQ technology (big.LITTLE)
▪ Apple M3
o 3 nm technology
o 4.05 GHz (performance cores), 2.76 GHz (efficiency cores)
o aarch64 – ARMv8.6-A
o 4 performance cores + 4 efficiency cores
o L1 cache 192+128 KiB per performance core
o L1 cache 128+64 KiB per efficiency core
o L2 cache 16 MiB
o RAM 8-24 GB
o GPU 8-10 cores
Input/Output components
▪ The I/O bus extends access to
o Accelerators (GPUs, FPGAs)
o Disks
o Network
o Human-Machine Interface Peripherals
Accelerators
Access to accelerators/devices/peripherals
▪ Send/receive information to
o Servers
o Network-attached disks
▪ Protocols
o Low-level – Ethernet packet
o High-level – TCP/IP
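Sending and receiving information over a high-level protocol can be sketched with a TCP round trip on the local machine. This is a minimal illustration, assuming the loopback address 127.0.0.1 is available; `echo_once` and `round_trip` are hypothetical names. The program only sees the TCP/IP byte stream; the low-level packets are built by the operating system, and on loopback no physical Ethernet link is involved.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one connection and send the received bytes back upper-cased."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.settimeout(5)
        data = conn.recv(1024)
        conn.sendall(data.upper())


def round_trip(message):
    """Send message to a local TCP server and return its reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        server.settimeout(5)
        port = server.getsockname()[1]

        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()

        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(message)
            reply = client.recv(1024)

        worker.join()
    return reply


print(round_trip(b"hello"))  # b'HELLO'
```

The same client code would reach a remote server by replacing the loopback address with that server's address and port.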