MANNAM MEMORIAL NSS COLLEGE
KONNI, PATHANAMTHITTA
(Affiliated to Mahatma Gandhi University, Kottayam)
SEMINAR REPORT
ON
GRAPHICS PROCESSING UNIT
Submitted by:
SREELEKSHMI PV
REG NO:220021026527
MANNAM MEMORIAL NSS COLLEGE
KONNI, PATHANAMTHITTA
(Affiliated to Mahatma Gandhi University)
CERTIFICATE
This is to certify that the seminar report entitled “GRAPHICS PROCESSING
UNIT” submitted by SREELEKSHMI PV, Reg No: 220021026527, in partial
fulfilment of the requirement for the degree of B.Sc. Computer Science of
Mahatma Gandhi University, Kottayam, is a bonafide work done by her during the
year 2022-2025.
…...………… ….…………………..
Prof. JYOTHI. R Prof. RADHIKA. R
Principal Head of the Department
………………… ……………………..
Mrs. SMITHA. RAJAN
SEMINAR COORDINATOR EXTERNAL
ACKNOWLEDGEMENT
At the outset, I thank God Almighty for making my endeavour a success. I also express my sincere
gratitude to Prof. RADHIKA. R, Head of the Department of Computer Science, for providing me with
adequate facilities, ways and means by which I was able to complete this seminar.
I express my sincere gratitude to our seminar guide Mrs. SMITHA. RAJAN for her constant
support and valuable suggestions for the successful completion of the seminar.
I express my immense pleasure and thankfulness to all the teachers and staff of the Department
of Computer Science, Mannam Memorial N.S.S College, for their cooperation and support.
Last but not least, I thank all the others, especially my classmates and my family members, who
in one way or another helped me in the successful completion of this seminar.
SREELEKSHMI PV
ABSTRACT
A Graphics Processing Unit (GPU) is a microprocessor that has been designed specifically
for the processing of 3D graphics. The processor is built with integrated transform, lighting,
triangle setup/clipping, and rendering engines, capable of handling millions of math-intensive
processes per second. GPUs allow products such as desktop PCs, portable computers, and game
consoles to process real-time 3D graphics that only a few years ago were available only on high-
end workstations. Used primarily for 3D applications, a graphics processing unit is a single-chip
processor that creates lighting effects and transforms objects every time a 3D scene is redrawn.
These are mathematically intensive tasks that would otherwise put quite a strain on the CPU.
TABLE OF CONTENTS
S.NO TOPICS
1. INTRODUCTION
2. WHAT’S A GPU
3. HISTORY AND STANDARDS
4. INTERFACING PORTS
5. COMPONENTS OF A GPU
6. WORKING
7. GPU COMPUTING
8. MODERN GPU ARCHITECTURE
9. GPU IN MOBILES
10. ADVANTAGES
11. APPLICATIONS
12. CHALLENGES FACED
13. CONCLUSION
14. REFERENCES
INTRODUCTION
There are various applications that require a 3D world to be simulated as realistically as possible
on a computer screen. These include 3D animations in games, movies and other real-world
simulations. It takes a lot of computing power to represent a 3D world due to the great amount of
information that must be used to generate a realistic 3D world and the complex mathematical
operations that must be used to project this 3D world onto a computer screen. In this situation, the
processing time and bandwidth are at a premium due to large amounts of both computation and
data.
The functional purpose of a GPU then, is to provide a separate dedicated graphics resource,
including a graphics processor and memory, to relieve some of the burden on the main system
resources, namely the Central Processing Unit, Main Memory, and the System Bus, which would
otherwise get saturated with graphical operations and I/O requests. The abstract goal of a GPU,
however, is to enable a representation of a 3D world as realistically as possible. So these GPUs are
designed to provide additional computational power that is customized specifically to perform
these 3D tasks.
WHAT’S A GPU?
A Graphics Processing Unit (GPU) is a microprocessor that has been designed specifically for the
processing of 3D graphics. The processor is built with integrated transform, lighting, triangle
setup/clipping, and rendering engines, capable of handling millions of math-intensive processes
per second. GPUs form the heart of modern graphics cards, relieving the CPU (central processing
unit) of much of the graphics processing load. GPUs allow products such as desktop PCs, portable
computers, and game consoles to process real-time 3D graphics that only a few years ago were
available only on high-end workstations.
Used primarily for 3-D applications, a graphics processing unit is a single-chip processor that
creates lighting effects and transforms objects every time a 3D scene is redrawn. These are
mathematically intensive tasks that would otherwise put quite a strain on the CPU. Lifting this
burden from the CPU frees up cycles that can be used for other jobs.
However, the GPU is not just for playing 3D-intensive video games or for those who create graphics
(work sometimes referred to as graphics rendering or content creation); it is a component critical
to the PC's overall system speed. To fully appreciate the graphics card's role, its place in the
overall system must first be understood.
Many synonyms exist for the Graphics Processing Unit, the most popular being the graphics card.
It is also known as a video card, video accelerator, video adapter, video board, graphics
accelerator, or graphics adapter.
HISTORY AND STANDARDS
The first graphics cards, introduced in August of 1981 by IBM, were monochrome cards designated
as Monochrome Display Adapters (MDAs). The displays that used these cards were typically text-
only, with green or white text on a black background. The Hercules Graphics Card (HGC) later
added high-resolution monochrome graphics to such systems. Colour for IBM-compatible computers
appeared on the scene with the Colour Graphics Adapter (CGA), which could display 4 colours at a
time, followed by the 16-colour Enhanced Graphics Adapter (EGA). During the
same time, other computer manufacturers, such as Commodore, were introducing computers with
built-in graphics adapters that could handle a varying number of colours.
When IBM introduced the Video Graphics Array (VGA) in 1987, a new graphics standard came
into being. A VGA display could support up to 256 colours (out of a possible 262,144-color palette)
at resolutions up to 720x400. Perhaps the most interesting difference between VGA and the
preceding formats is that VGA was analog, whereas displays had been digital up to that point.
Going from digital to analog may seem like a step backward, but it actually provided the ability to
vary the signal for more possible combinations than the strict on/off nature of digital.
Over the years, VGA gave way to Super Video Graphics Array (SVGA). SVGA cards were based
on VGA, but each card manufacturer added resolutions and increased colour depth in different
ways. Eventually, the Video Electronics Standards Association (VESA) agreed on a standard
implementation of SVGA that provided up to 16.8 million colours and 1280x1024 resolution. Most
graphics cards available today support Ultra
Extended Graphics Array (UXGA). UXGA can support a palette of up to 16.8 million colours and
resolutions up to 1600x1200 pixels.
Even though any card you can buy today will offer higher colours and resolution than the basic
VGA specification, VGA mode is the de facto standard for graphics and is the minimum on all
cards. In addition to including VGA, a graphics card must be able to connect to your computer.
While there are still a number of graphics cards that plug into an Industry Standard Architecture
(ISA) or Peripheral Component Interconnect (PCI) slot, most current graphics cards use the
Accelerated Graphics Port (AGP).
INTERFACING PORTS
There are a lot of incredibly complex components in a computer. And all of these parts need to
communicate with each other in a fast and efficient manner. Essentially, a bus is the channel or
path between the components in a computer. During the early 1990s, Intel introduced a new bus
standard for consideration, the Peripheral Component Interconnect (PCI). It provides direct access
to system memory for connected devices, but uses a bridge to connect to the front side bus and
therefore to the CPU.
The illustration below shows how the various buses connect to the CPU.
PCI can connect up to five external components. Each of the five connectors for an external
component can be replaced with two fixed devices on the motherboard. The PCI bridge chip
regulates the speed of the PCI bus independently of the CPU's speed. This provides a higher degree
of reliability and ensures that PCI-hardware manufacturers know exactly what to design for.
PCI originally operated at 33 MHz using a 32-bit-wide path. Revisions to the standard include
increasing the speed from 33 MHz to 66 MHz and doubling the bit count to 64. Currently, PCI-X
provides for 64-bit transfers at a speed of 133 MHz for an amazing 1-GBps (gigabyte per second)
transfer rate!
PCI cards use 47 pins to connect (49 pins for a mastering card, which can control the PCI bus
without CPU intervention). The PCI bus is able to work with so few pins because of hardware
multiplexing, which means that the device sends more than one signal over a single pin. Also, PCI
supports devices that use either 5 volts or 3.3 volts. PCI slots are the best choice for network
interface cards (NIC), 2-D video cards, and other high-bandwidth devices. On some PCs, PCI has
completely superseded the old ISA expansion slots.
Although Intel proposed the PCI standard in 1991, it did not achieve popularity until the arrival of
Windows 95 (in 1995). This sudden interest in PCI was due to the fact that Windows 95 supported
a feature called Plug and Play (PnP). PnP means that you can connect a device or insert a card into
your computer and it is automatically recognized and configured to work in your system. Intel
created the PnP standard and incorporated it into the design for PCI. But it wasn't until several
years later that a mainstream operating system, Windows 95, provided system-level support for
PnP. The introduction of PnP accelerated the demand for computers with PCI.
The need for streaming video and real-time-rendered 3-D games requires an even faster
throughput than that provided by PCI. In 1996, Intel debuted the Accelerated Graphics Port
(AGP), a modification of the PCI bus designed specifically to facilitate the use of streaming
video and high-performance graphics.
AGP is a high-performance interconnect between the core-logic chipset and the graphics controller
for enhanced graphics performance for 3D applications. AGP relieves the graphics bottleneck by
adding a dedicated high-speed interface directly between the chipset and the graphics controller as
shown below.
Segments of system memory can be dynamically reserved by the OS for use by the graphics
controller. This memory is termed AGP memory or nonlocal video memory. The net result is
that the graphics controller is required to keep fewer texture maps in local memory.
AGP has 32 lines for multiplexed address and data. There are an additional 8 lines for sideband
addressing. Local video memory can be expensive, and it cannot be used for other purposes by the
OS when it is not needed by the graphics of the running applications. The graphics controller needs
fast access to local video memory for screen refreshes and for various pixel elements including
Z-buffers, double buffering, overlay planes, and textures.
For these reasons, programmers can always expect to have more texture memory available via
AGP system memory. Keeping textures out of the frame buffer allows larger screen resolution, or
permits Z-buffering for a given large screen size. As the need for more graphics intensive
applications continues to scale upward, the number of textures stored in system memory will
increase. AGP delivers these textures from system memory to the graphics controller at speeds
sufficient to make system memory usable as a secondary texture store.
PCI EXPRESS
PCI Express (PCIe) is a high-speed serial expansion bus standard that has replaced AGP and
conventional PCI as the interface of choice for graphics cards. The PCIe electrical interface is
also used in a variety of other standards, most notably ExpressCard, a laptop expansion card
interface.
Format specifications are maintained and developed by the PCI-SIG (PCI Special Interest Group),
a group of more than 900 companies that also maintain the conventional PCI specifications. PCIe
3.0 and its successors (PCIe 4.0 and 5.0) are the standards for expansion cards in production and
available on mainstream personal computers.
COMPONENTS OF GPU
There are several components on a typical graphics card:
Graphics Processor
The graphics processor is the brains of the card, and is typically one of three
configurations:
Graphics co-processor: A card with this type of processor can handle all of the graphics chores
without any assistance from the computer's CPU. Graphics co-processors are typically found on
high-end video cards.
Graphics accelerator: In this configuration, the chip on the graphics card renders graphics based
on commands from the computer's CPU. This is the most common configuration used today.
Frame buffer: This chip simply controls the memory on the card and sends information to the
digital-to-analog converter (DAC). It does no processing of the image data and is rarely used
anymore.
Memory – The type of RAM used on graphics cards varies widely, but the most popular types use
a dual-ported configuration. Dual-ported cards can write to one section of memory while reading
from another section, decreasing the time it takes to refresh an image.
Graphics BIOS – Graphics cards have a small ROM chip containing basic information that tells
the other components of the card how to function in relation to each other. The BIOS also performs
diagnostic tests on the card's memory and input/output (I/O) to ensure that everything is
functioning correctly.
Display Connector – Graphics cards use standard connectors. Most cards use the 15-pin connector
that was introduced with Video Graphics Array (VGA).
Computer (Bus) Connector – This is usually Accelerated Graphics Port (AGP). This port enables
the video card to directly access system memory. Direct memory access helps to make the peak
bandwidth four times higher than the Peripheral Component Interconnect (PCI) bus adapter card
slots. This allows the central processor to do other tasks while the graphics chip on the video card
accesses system memory.
WORKING
The working of a GPU can be explained by considering the graphics pipeline. There are different
steps involved in creating a complete 3D scene, and each step is handled by a part of the GPU
that is assigned that particular job. During 3D rendering, different types of data travel across
the bus. The two most common types are texture and geometry data. The geometry data is the
"infrastructure" that the rendered scene is built on. It is made up of polygons (usually triangles)
that are represented by vertices, the end-points that define each polygon.
Texture data provides much of the detail in a scene, and textures can be used to simulate more
complex geometry, add lighting, and give an object a simulated surface.
Many new graphics chips now have an accelerated Transform and Lighting (T&L) unit, which takes
a 3D scene's geometry and transforms it into different coordinate spaces. It also performs lighting
calculations, again relieving the CPU from these math-intensive tasks.
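As a rough illustration of the transform stage, the following sketch (a hypothetical CUDA example, not code from any actual driver or engine; the kernel name and parameters are assumed) applies a 4x4 transformation matrix to an array of vertices, one GPU thread per vertex, which is essentially the kind of work the T&L unit performs in hardware.

    // Hypothetical sketch: move each vertex into another coordinate space
    // by multiplying it with a 4x4 row-major transformation matrix.
    __global__ void transformVertices(const float4 *in, float4 *out,
                                      const float *m /* 4x4, row-major */, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float4 v = in[i];
        out[i].x = m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w;
        out[i].y = m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w;
        out[i].z = m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w;
        out[i].w = m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w;
    }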
Following the T&L unit on the chip is the triangle setup engine. It takes a scene's transformed
geometry and prepares it for the next stages of rendering by converting the scene into a form that
the pixel engine can then process. The pixel engine applies assigned texture values to each pixel.
This gives each pixel the correct colour value so that it appears to have surface texture and does
not look like a flat, smooth object. After a pixel has been rendered it must be checked to see whether
it is visible by checking the depth value, or Z value.
A Z check unit performs this process by reading from the Z-buffer to see if there are any other
pixels rendered to the same location where the new pixel will be rendered. If another pixel is at
that location, it compares the Z value of the existing pixel to that of the new pixel. If the new pixel
is closer to the view camera, it gets written to the frame buffer. If it's not, it gets discarded.
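In code, this depth comparison can be sketched as follows (a simplified, hypothetical CUDA illustration; real GPUs perform the Z check in fixed-function hardware, and all names here are assumed).

    // Simplified Z (depth) check: keep the new pixel only if it is closer to
    // the camera than the pixel already stored at that location. Here a
    // smaller z value means closer to the viewer.
    __global__ void depthTest(const float *newZ, const unsigned int *newColor,
                              float *zbuffer, unsigned int *framebuffer, int numPixels)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= numPixels) return;

        if (newZ[i] < zbuffer[i]) {        // new pixel is nearer the view camera
            zbuffer[i]     = newZ[i];      // update the Z-buffer
            framebuffer[i] = newColor[i];  // write the colour to the frame buffer
        }                                  // otherwise the new pixel is discarded
    }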
After the complete scene is drawn into the frame buffer, the RAMDAC converts this digital data
into an analog signal that can be sent to the monitor for display.
BLOCK DIAGRAM
GPU COMPUTING
GPU computing is the use of a GPU (graphics processing unit) together with a CPU to accelerate
general-purpose scientific and engineering applications. Pioneered by NVIDIA in the mid-2000s, GPU
computing has quickly become an industry standard, enjoyed by millions of users worldwide and
adopted by virtually all computing vendors. GPU computing offers unprecedented application
performance by offloading compute-intensive portions of the application to the GPU, while the
remainder of the code still runs on the CPU. From a user's perspective, applications simply run
significantly faster. CPU + GPU is a powerful combination because CPUs consist of a few cores
optimized for serial processing, while GPUs consist of thousands of smaller, more efficient cores
designed for parallel performance. Serial portions of the code run on the CPU while parallel portions
run on the GPU.
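A minimal sketch of this division of labour is shown below (a generic CUDA example written for illustration; the file name, kernel name, and sizes are assumptions). The host CPU runs the serial setup and control flow, while the element-wise arithmetic is offloaded to thousands of GPU threads.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Parallel portion: each element of y is updated by its own GPU thread.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main()
    {
        const int n = 1 << 20;
        float *x, *y;
        // Managed (unified) memory keeps the sketch short; explicit
        // cudaMalloc/cudaMemcpy calls would work equally well.
        cudaMallocManaged(&x, n * sizeof(float));
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }   // serial work on the CPU

        saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);             // parallel work on the GPU
        cudaDeviceSynchronize();

        printf("y[0] = %f\n", y[0]);                                // expect 5.0
        cudaFree(x);
        cudaFree(y);
        return 0;
    }

The same source file holds both the serial CPU code and the parallel GPU kernel; it would be compiled with NVIDIA's nvcc compiler, for example as nvcc saxpy.cu -o saxpy.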
MODERN GPU ARCHITECTURE
NVIDIA’s GeForce 8800 was the product that gave birth to the new GPU Computing model.
Introduced in November 2006, the G80 based GeForce 8800 brought several key innovations to
GPU Computing:
• G80 was the first GPU to support C, allowing programmers to use the power of the GPU without
having to learn a new programming language.
• G80 was the first GPU to replace the separate vertex and pixel pipelines with a single, unified
processor that executed vertex, geometry, pixel, and computing programs.
• G80 was the first GPU to utilize a scalar thread processor, eliminating the need for programmers
to manually manage vector registers.
• G80 introduced the single-instruction multiple-thread (SIMT) execution model where multiple
independent threads execute concurrently using a single instruction.
• G80 introduced shared memory and barrier synchronization for inter-thread communication (a usage sketch follows this list).
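To show what shared memory and barrier synchronization look like from the programmer's side, here is a minimal block-level sum reduction (a generic, illustrative CUDA kernel, not code from the G80 documentation; the kernel name and the block size of 256 threads are assumptions). Threads of one block cooperate through on-chip shared memory and wait for each other at __syncthreads() barriers.

    // Each block of 256 threads sums 256 input elements in shared memory.
    __global__ void blockSum(const float *in, float *blockResults, int n)
    {
        __shared__ float partial[256];          // visible to all threads of this block

        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid;
        partial[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();                        // barrier: all loads done before reducing

        // Tree reduction within the block.
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (tid < stride)
                partial[tid] += partial[tid + stride];
            __syncthreads();                    // barrier after every reduction step
        }

        if (tid == 0)
            blockResults[blockIdx.x] = partial[0];   // one partial sum per block
    }

The kernel would be launched with 256 threads per block, e.g. blockSum<<<numBlocks, 256>>>(in, out, n), and the per-block partial sums combined afterwards.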
In June 2008, NVIDIA introduced a major revision to the G80 architecture. The second generation
unified architecture—GT200 (first introduced in the GeForce GTX 280, Quadro FX 5800, and
Tesla T10 GPUs)—increased the number of streaming processor cores (subsequently referred to
as CUDA cores) from 128 to 240. Each processor register file was doubled in size, allowing a
greater number of threads to execute on-chip at any given time.
Hardware memory access coalescing was added to improve memory access efficiency. Double
precision floating point support was also added to address the needs of scientific and high-
performance computing (HPC) applications. When designing each new generation GPU, it has
always been the philosophy at NVIDIA to improve both existing application performance and GPU
programmability; while faster application performance brings immediate benefits, it is the
GPU’s relentless advancement in programmability that has allowed it to evolve into the most
versatile parallel processor of our time. It was with this mindset that NVIDIA set out to develop the
successor to the GT200 architecture.
CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming
model created by NVIDIA and implemented by the graphics processing units (GPUs) that they
produce. CUDA gives program developers direct access to the virtual instruction set and memory
of the parallel computational elements in CUDA GPUs.
Using CUDA, the GPUs can be used for general purpose processing (i.e., not exclusively graphics);
this approach is known as GPGPU. Unlike CPUs, however, GPUs have a parallel throughput
architecture that emphasizes executing many concurrent threads slowly, rather than executing a
single thread very quickly.
The CUDA platform is accessible to software developers through CUDA-accelerated libraries,
compiler directives (such as OpenACC), and extensions to industry-standard programming
languages, including C, C++ and Fortran. C/C++ programmers use 'CUDA C/C++', compiled with
"nvcc", NVIDIA's LLVM-based C/C++ compiler, and Fortran programmers can use 'CUDA
Fortran', compiled with the PGI CUDA Fortran compiler from The Portland Group.
In addition to libraries, compiler directives, CUDA C/C++ and CUDA Fortran, the CUDA platform
supports other computational interfaces, including the Khronos Group's OpenCL, Microsoft's
DirectCompute, and C++ AMP. In the computer game industry, GPUs are used not only for
graphics rendering but also in game physics calculations (physical effects like debris, smoke, fire,
fluids); examples include PhysX and Bullet. CUDA has also been used to accelerate non-graphical
applications in computational biology, cryptography and other fields by an order of magnitude or
more.
CUDA provides both a low-level API and a higher-level API. The initial CUDA SDK was made
public on 15 February 2007, for Microsoft Windows and Linux. Mac OS X support was later added
in version 2.0, which supersedes the beta released February 14, 2008. CUDA works with all Nvidia
GPUs from the G8x series onwards, including GeForce, Quadro and the Tesla line. CUDA is
compatible with most standard operating systems. Nvidia states that programs developed for the
G8x series will also work without modification on all future Nvidia video cards, due to binary
compatibility.
Advantages of CUDA
CUDA has several advantages over traditional general-purpose computation on GPUs (GPGPU)
using graphics APIs, chief among them being that programmers can express computations in a
familiar C-like language instead of recasting them as graphics operations.
FERMI ARCHITECTURE
The first Fermi based GPU, implemented with 3.0 billion transistors, features up to 512 CUDA
cores. A CUDA core executes a floating point or integer instruction per clock for a thread. The 512
CUDA cores are organized in 16 SMs of 32 cores each. The GPU has six 64-bit memory partitions,
for a 384-bit memory interface, supporting up to a total of 6 GB of GDDR5 DRAM memory. A
host interface connects the GPU to the CPU via PCI-Express. The Giga Thread global scheduler
distributes thread blocks to SM thread schedulers.
Third Generation Streaming Multiprocessor
The third generation SM introduces several architectural innovations that make it not only the most
powerful SM yet built, but also the most programmable and efficient.
Each SM features 32 CUDA processors—a fourfold increase over prior SM designs. Each
CUDA processor has a fully pipelined integer arithmetic logic unit (ALU) and floating-point unit
(FPU). Prior GPUs used IEEE 754-1985 floating point arithmetic. The Fermi architecture
implements the new IEEE 754-2008 floating-point standard, providing the fused multiply-add
(FMA) instruction for both single and double precision arithmetic. FMA improves over a multiply-
add (MAD) instruction by doing the multiplication and addition with a single final rounding step,
with no loss of precision in the addition. FMA is more accurate than performing the operations
separately. GT200 implemented double precision FMA.
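The difference can be made concrete with a small sketch (a hypothetical CUDA fragment for illustration only; the kernel name and parameters are assumed). It contrasts the fused operation, which rounds once, with an explicitly unfused multiply followed by an add, each of which rounds.

    // fmaf(a, b, c) computes a*b + c with a single rounding at the end.
    // __fmul_rn(a, b) forces a separately rounded multiply, so the second
    // expression rounds twice, as a MAD-style sequence would.
    __global__ void fmaDemo(const float *a, const float *b, const float *c,
                            float *fused, float *unfused, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            fused[i]   = fmaf(a[i], b[i], c[i]);
            unfused[i] = __fmul_rn(a[i], b[i]) + c[i];
        }
    }

For some inputs the two results differ in the last bit, which is exactly the extra precision that the single final rounding of FMA preserves.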
In GT200, the integer ALU was limited to 24-bit precision for multiply operations; as a result,
multi-instruction emulation sequences were required for integer arithmetic. In Fermi, the newly
designed integer ALU supports full 32-bit precision for all instructions, consistent with standard
programming language requirements. The integer ALU is also optimized to efficiently support
64-bit and extended precision operations. Various instructions are supported, including Boolean,
shift, move, compare, convert, bit-field extract, bit-reverse insert, and population count.
KEPLER ARCHITECTURE
With the launch of the Fermi GPU in 2009, NVIDIA ushered in a new era in the high performance
computing (HPC) industry based on a hybrid computing model where CPUs and GPUs work
together on computationally intensive workloads. In just a couple of years, NVIDIA Fermi GPUs
came to power some of the fastest supercomputers in the world as well as tens of thousands of
research clusters globally. Now, with
the new Kepler GK110 GPU, NVIDIA raises the bar for the HPC industry, yet again. Comprised
of 7.1 billion transistors, the Kepler GK110 GPU is an engineering marvel created to address the
most daunting challenges in HPC. Kepler is designed from the ground up to maximize
computational performance with superior power efficiency. The architecture has innovations that
make hybrid computing dramatically easier, applicable to a broader set of applications, and more
accessible. Kepler GK110 GPU is a computational workhorse with teraflops of integer, single
precision, and double precision performance and the highest memory bandwidth. The first GK110
based product will be the Tesla K20 GPU computing accelerator.
At the heart of the Kepler GK110 GPU is the new SMX unit, which comprises several
architectural innovations that make it not only the most powerful Streaming Multiprocessor (SM)
NVIDIA has ever built but also the most programmable and power-efficient. It delivers more processing
performance and efficiency through this new, innovative streaming multiprocessor design that
allows a greater percentage of space to be applied to processing cores versus control logic.
Dynamic Parallelism
Any kernel can launch another kernel and can create the necessary streams, events, and
dependencies needed to process additional work without the need for host CPU interaction. This
simplified programming model is easier to create, optimize, and maintain. It also creates a
programmer friendly environment by maintaining the same syntax for GPU launched workloads
as traditional CPU kernel launches. Dynamic Parallelism broadens what applications can now
accomplish with GPUs in various disciplines. Applications can launch small and medium sized
parallel workloads dynamically where it was too expensive to do so previously.
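A minimal sketch of dynamic parallelism is shown below (an illustrative CUDA fragment, not from any NVIDIA sample; it assumes a GPU of compute capability 3.5 or later and compilation with relocatable device code, e.g. nvcc -arch=sm_35 -rdc=true). A parent kernel launches a child kernel directly from the GPU, without returning to the host CPU.

    // Child kernel: ordinary per-element work.
    __global__ void child(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    // Parent kernel: one thread decides how much additional work is needed
    // and launches it from device code, with no host interaction.
    __global__ void parent(float *data, int n)
    {
        if (threadIdx.x == 0 && blockIdx.x == 0) {
            int threads = 256;
            int blocks  = (n + threads - 1) / threads;
            child<<<blocks, threads>>>(data, n);
            // Child grids launched here are guaranteed to complete before
            // the parent grid itself is considered finished.
        }
    }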
Hyper-Q
Hyper-Q enables multiple CPU cores to launch work on a single GPU simultaneously, thereby
dramatically increasing GPU utilization and slashing CPU idle times. This feature increases
the total number of connections between the host and the Kepler GK110 GPU by allowing 32
simultaneous, hardware managed connections, compared to the single connection available with
Fermi.
Hyper-Q is a flexible solution that allows connections for both CUDA streams and Message
Passing Interface (MPI) processes, or even threads from within a process. Existing applications
that were previously limited by false dependencies can see up to a 32x performance increase
without changing any existing code.
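The sketch below illustrates the software side of this (generic CUDA streams code, not specific to Hyper-Q hardware; the kernel name, buffer sizes, and stream count are assumptions). Work queued in separate streams carries no false dependency between launches, so a Hyper-Q capable GPU can keep the queues independent all the way down to the hardware.

    #include <cuda_runtime.h>

    __global__ void scale(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main()
    {
        const int n = 1 << 20;
        const int numStreams = 4;
        cudaStream_t streams[numStreams];
        float *buffers[numStreams];

        for (int s = 0; s < numStreams; ++s) {
            cudaStreamCreate(&streams[s]);
            cudaMalloc(&buffers[s], n * sizeof(float));
            cudaMemset(buffers[s], 0, n * sizeof(float));
            // Each launch goes into its own stream, so the kernels may run
            // concurrently when the GPU has free resources and connections.
            scale<<<(n + 255) / 256, 256, 0, streams[s]>>>(buffers[s], 2.0f, n);
        }

        cudaDeviceSynchronize();            // wait for all streams to finish

        for (int s = 0; s < numStreams; ++s) {
            cudaFree(buffers[s]);
            cudaStreamDestroy(streams[s]);
        }
        return 0;
    }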
GPU IN MOBILES
Mobile devices are quickly becoming our most valuable personal computers. Whether we’re
reading email, surfing the Web, interacting on social networks, taking pictures, playing games, or
using countless apps, our smartphones and tablets are becoming indispensable. Many people are
also using mobile devices such as Microsoft’s Surface RT and Lenovo’s Yoga 11 because of their
versatile form-factors, ability to run business apps, physical keyboards, and outstanding battery
life. Amazing new visual computing experiences are possible on mobile devices thanks to ever
more powerful GPU subsystems. A fast GPU allows rich and fluid 2D or 3D user interfaces, high
resolution display output, speedy Web page rendering, accelerated photo and video editing, and
more realistic 3D gaming. Powerful GPUs are also becoming essential for auto infotainment
applications, such as highly detailed and easy to read 3D navigation systems and digital instrument
clusters, or rear-seat entertainment and driver assistance systems.
Each new generation of NVIDIA® Tegra® mobile processors has delivered significantly higher
CPU and GPU performance while improving its architectural and power efficiency. Tegra
processors have enabled amazing new mobile computing experiences in smartphones and tablets,
such as full-featured Web browsing, console class gaming, fast UI and multitasking
responsiveness, and Blu-ray quality video playback.
At CES 2013, NVIDIA announced Tegra 4 processor, the world’s first quad-core SoC using four
ARM Cortex-A15 CPUs, a Battery Saver Cortex A15 core, and a 72-core NVIDIA GPU. With its
increased number of GPU cores, faster clocks, and architectural efficiency improvements, the
Tegra 4 processor’s GPU delivers approximately 20x the GPU horsepower of Tegra 2 processor.
The Tegra 4 processor also combines its CPUs, GPU, and ISP to create a Computational
Photography Engine for near-real-time HDR still frame photo and 1080p 30 video capture.
One of the most popular application categories that demands fast GPU processing is 3D games.
Mobile 3D games have evolved from simple 2D visuals to now rival console gaming experiences
and graphics quality. In fact, some games that take full advantage of Tegra 4’s GPU and CPU cores
are hard to distinguish graphically from PC games! Not only have the mobile games evolved, but
mobile gaming as an application segment is one of the fastest growing in the industry today.
Visually rich PC and console games such as Max Payne and Grand Theft Auto III are now available
on mobile devices.
High-quality, high-resolution retina displays (with resolutions high enough that the human eye
cannot discern individual pixels at normal viewing distances) are now being used in various mobile
devices. Such high-resolution displays require fast GPUs to deliver smooth UI interactions, fast
Web page rendering, snappy high-res photo manipulation, and of course high-quality 3D gaming.
Similarly, connecting a smartphone or tablet to an external high-resolution 4K screen absolutely
requires a powerful GPU. With two decades of GPU industry leadership, the mobile GPU in the
NVIDIA® Tegra® 4 Family of mobile processors is architected to deliver the performance
demanded by console-class mobile games, modern user interfaces, and high-resolution displays,
while reducing power consumption to fall within mobile power budgets. You will see that the Tegra
4 processor is also the most architecturally efficient GPU subsystem in any mobile SoC today.
Tegra 4 Family GPU Features and Architecture
The Tegra 4 processor’s GPU accelerates both 2D and 3D rendering. Although 2D rendering is
often considered a “given” nowadays, it’s critically important to the user experience. The 2D
engine of the Tegra 4 processor’s GPU provides all the relevant low-level 2D composition
functionality, including alpha-blending, line drawing, video scaling, BitBLT, colour space
conversion, and screen rotations. Working in concert with the display subsystem and video decoder
units, the GPU also helps support 4K video output to high-end 4K video display. The 3D engine
is fully programmable, and includes high-performance geometry and pixel processing capability
enabling advanced 3D user interfaces and console-quality gaming experiences. The GPU also
accelerates Flash processing in Web pages and GPGPU (General Purpose GPU) computing, as
used in NVIDIA’s new Computational Photography Engine, NVIDIA Chimera™ architecture, that
implements near-real-time HDR photo and video photography, HDR panoramic image processing,
and "Tap-to-Track" object tracking. The Tegra 4 processor includes a 72-core GPU subsystem. The
Tegra 4 processor's GPU has 6x the number of shader processing cores of the Tegra 3 processor,
which translates to roughly 3-4x delivered game performance and sometimes even higher. The
NVIDIA Tegra 4i processor uses the same GPU architecture as the Tegra 4 processor, but a 60-
core variant instead of 72 cores. Even at 60 cores it delivers an astounding amount of graphics
performance for mainstream smartphone devices.
ADVANTAGES
GPUs offer exceptionally high computational power, as they are designed to handle millions of
calculations simultaneously, making them ideal for graphics rendering and complex computational
tasks. Equipped with specialized cores optimized for parallel operations, GPUs process data
significantly faster than CPUs for specific workloads. This efficiency makes them indispensable
for applications such as gaming, 3D rendering, machine learning, and scientific simulations.
Modern GPUs are capable of handling advanced tasks like ray tracing, complex lighting effects,
and real-time physics simulations, showcasing their ability to manage highly demanding processes
with remarkable speed and accuracy.
GPUs offload graphics-related tasks from the CPU, allowing it to focus on other critical system
operations and improving overall multitasking. This enables users to run demanding applications,
such as video editing, gaming, and rendering, alongside other processes without performance
degradation. By reducing the CPU's workload, tasks become smoother, eliminating bottlenecks. In
gaming, for instance, the CPU handles AI and game logic while the GPU takes care of rendering,
ensuring optimal performance.
GPUs are highly efficient at executing thousands of threads simultaneously, making them ideal for
workloads requiring massive parallelism. Applications such as artificial intelligence, deep learning,
and scientific research benefit greatly from their parallel processing capabilities. Unlike CPUs,
which focus on sequential execution, GPUs break tasks into smaller chunks and process them
concurrently. This architecture enables GPUs to handle complex problems like matrix
multiplication, fluid dynamics, and weather modelling with greater efficiency.
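For example, the matrix multiplication mentioned above maps naturally onto this model: in the naive CUDA sketch below (purely illustrative, with all names assumed), every output element is computed by its own thread, so thousands of elements are produced concurrently.

    // Naive matrix multiply, C = A * B, for square n x n matrices stored
    // in row-major order. Each thread computes exactly one element of C.
    __global__ void matMul(const float *A, const float *B, float *C, int n)
    {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n || col >= n) return;

        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }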
• Antialiasing
GPUs play a crucial role in antialiasing by reducing jagged edges (aliasing) in 3D graphics,
creating smoother and more realistic images. Techniques like Full-Scene Anti-Aliasing (FSAA)
and Multisample Anti-Aliasing (MSAA) significantly enhance image quality. Gamers especially
benefit from GPUs with advanced antialiasing features, as they deliver immersive visuals without
requiring higher resolutions. This approach saves processing power while maintaining excellent
visual fidelity.
• Anisotropic Filtering
GPUs use anisotropic filtering to resolve the issue of blurry textures on angled or distant objects,
ensuring textures remain sharp regardless of their orientation or distance from the viewer. This
technique enhances the realism of 3D environments by improving the clarity of textures on surfaces
like roads, buildings, and other receding objects. Advanced filtering methods further optimize
performance by balancing high texture quality with efficient rendering speed.
APPLICATIONS
GPUs play a vital role in gaming and entertainment by powering high-resolution 3D gaming
with realistic visuals, ray tracing, and immersive environments. In virtual reality (VR) and
augmented reality (AR) applications, GPUs enable real-time rendering, delivering seamless user
experiences. They are also crucial in movie production for rendering 3D animations, realistic
simulations, and CGI effects. Additionally, GPUs accelerate high-definition video decoding,
ensuring smooth playback for streaming platforms like YouTube and Netflix, enhancing the overall
viewing experience.
GPUs are essential in artificial intelligence (AI) and machine learning, particularly for training
neural networks in applications like natural language processing, image recognition, and speech
synthesis. They are also crucial in autonomous systems, such as self-driving cars and drones,
where they process sensor data, identify objects, and make real-time decisions. In the realm of
generative AI, GPUs power tools like ChatGPT and DALL·E, enabling the rapid generation of
text, images, and videos. Additionally, GPUs accelerate healthcare AI, speeding up drug discovery
and medical diagnostics using advanced machine learning algorithms.
GPUs are transforming medical imaging and healthcare by enabling real-time motion
compensation, which allows surgeons to virtually "stop" a beating heart for precise operations.
They accelerate image reconstruction for CT, MRI, and PET scans, speeding up diagnostics. In
medical AI, GPUs are used for tasks such as cancer detection, diabetic retinopathy screening, and
identifying anomalies in imaging data. Additionally, GPUs analyze genetic data to support
personalized medicine, helping create custom treatment plans tailored to individual patients'
needs.
• Defense and Intelligence
GPUs play a vital role in defense and intelligence, accelerating image processing tasks like geo-
rectification, 3D reconstruction, and satellite imagery analysis. They are essential for real-time
video surveillance, enabling efficient security monitoring of critical locations. In cybersecurity,
GPUs enhance encryption and decryption processes, helping secure sensitive data. Additionally,
GPUs power military simulations, creating realistic battlefield scenarios for training and strategy
planning, improving preparedness and decision-making in defense operations.
• Cryptocurrency Mining
GPUs are widely used in cryptocurrency mining, particularly for coins like Bitcoin and Ethereum,
due to their ability to perform parallel hashing computations. Their efficiency in handling
repetitive tasks makes them ideal for solving complex cryptographic puzzles, allowing miners to
process transactions and secure blockchain networks more effectively and quickly.
GPUs are integral to robotics and automation, processing real-time data from sensors to control
robotic arms, drones, and autonomous vehicles. They enable advanced vision processing,
allowing robots to interpret their surroundings and make decisions based on visual analysis. In
industrial automation, GPUs are used for quality control, defect detection, and optimizing
assembly line operations, improving efficiency and precision in manufacturing processes.
In film and media production, GPUs play a crucial role by accelerating rendering and video
effects in software like Adobe Premiere Pro and DaVinci Resolve. Their real-time processing
capabilities ensure high-quality colour correction and grading, delivering cinematic visuals.
Additionally, GPUs are essential in live broadcasting, enabling real-time encoding and decoding
for smooth live streaming of events, enhancing the overall viewing experience.
CHALLENGES FACED
Power and energy constraints pose significant challenges for GPUs. With power supply voltage
scaling diminishing, energy per operation now scales only linearly with process feature size. This
results in a growing gap between the increasing number of processors that can be integrated into a
chip and those that can be effectively powered and cooled. Additionally, the heat generated by
high-performance GPUs necessitates advanced cooling systems, which are both expensive and
susceptible to failure. Furthermore, GPUs are constrained by specific power consumption limits—
such as 3 W for mobile devices and 150 W for desktops and servers—restricting their full
potential in energy-intensive applications.
Memory Bandwidth Bottlenecks arise when the development of memory bandwidth fails to match
the computational power of modern GPUs, leading to significant delays in data processing for tasks
that require heavy data manipulation. This bandwidth lag creates a bottleneck that limits the full
potential of the GPU, especially in data-intensive applications. Furthermore, as memory hierarchies
become more complex, managing the flow of data across various levels of memory storage becomes
increasingly difficult. Inefficient control over data movement can slow down overall performance,
exacerbating the challenges posed by memory bandwidth limitations.
Programming and software challenges hinder effective GPU utilization. Parallel programming is
complex due to the need for explicit control over massive concurrency. Debugging and profiling
GPU applications is also difficult, with limited tools available for managing memory hierarchies.
Compatibility issues arise when code optimized for one GPU architecture doesn't work well on
others, requiring extra development effort for cross-platform compatibility. These factors slow
down the development and optimization of GPU-based applications.
• Environmental and Economic Challenges
Environmental and economic challenges are key concerns in the widespread use of GPUs. One of
the major issues is high power consumption, as GPUs significantly contribute to energy usage,
especially in data centers and resource-intensive applications such as cryptocurrency mining. The
high cost of manufacturing and purchasing high-end GPUs also creates economic barriers, limiting
access for smaller organizations or individuals. Additionally, the rapid obsolescence of GPUs leads
to increasing electronic waste, raising sustainability concerns as the environmental impact of
discarded technology grows. These factors highlight the need for more energy-efficient and cost-
effective solutions in GPU technology.
• Performance Limitations
Performance limitations are a notable challenge for GPUs, especially when handling specific types
of workloads. While GPUs excel at parallel tasks, they face difficulties with workloads that require
sequential processing or low-latency responses, which can lead to higher latency and reduced
efficiency. Additionally, in real-time applications such as ray tracing, the demand for extreme
performance often pushes GPUs to their limits, creating bottlenecks that hinder the ability to achieve
smooth, high-quality rendering. These challenges highlight the need for ongoing advancements in
GPU technology to meet the growing demands of diverse applications.
CONCLUSION
From the introduction of the first 3D accelerator in 1996, GPUs have evolved dramatically, earning
their status as critical components of modern computing. No longer limited to graphics rendering,
GPUs have transformed into powerful co-processors capable of handling complex computations
in fields such as artificial intelligence, deep learning, and high-performance scientific simulations.
As the pace of GPU development accelerates, we can anticipate even faster, more efficient units
with expanded applications across industries.
Their increasing role in general-purpose computing solidifies their importance beyond traditional
graphics tasks, making them indispensable for cutting-edge technologies like autonomous
vehicles, real-time data analytics, and blockchain processing. With advancements in architecture
and integration, GPUs are poised to shape the future of computing, pushing the boundaries of what
machines can achieve. This trajectory underscores the GPU’s transition from a specialized graphics
unit to a central player in the broader computing landscape.
REFERENCES
• David Luebke, "GPU Computing - Past, Present and Future", GPU Technology Conference,
San Francisco, October 11-14, 2011.
• Zhang K., Kang J. U., "Graphics Processing Unit Based Ultra High-Speed Techniques",
Volume 18, Issue 4, 2012.
• Ajit Datar, "Graphics Processing Unit Architecture (GPU Arch) with Focus on NVIDIA
GeForce 6800 GPU", 14 April 2008.