SPPU - BE - HPC - Unit 1 Notes

High Performance Computing Unit 1 Notes - Slides with Explanation

High Performance Computing (410250)
Prof. Vaishali Jorwekar

What comes to your mind when you see these three pictures of computers? (Pictures: personal laptop, gaming laptop, supercomputer.)

The first laptop on the left is a personal laptop, which we use for our day-to-day work. A gaming laptop has a higher configuration and a graphics card for high-definition games. Supercomputers are used by scientists and large companies for complex mathematical modelling. So the main differentiating factor is computing power. Computing power refers to how fast and capable a computer is at performing tasks and calculations.

Sample specifications, to highlight the key computing differences:

Personal Laptop
+ Processor: AMD Ryzen 5 7530U, 12 threads, 16 MB L3 cache
+ Memory: dual-channel capable, upgradable up to 40 GB
+ Storage: 512 GB SSD M.2

Gaming Laptop
+ Processor: 13th Gen Intel Core i9-13980HX, 2.2 GHz base, 24 cores
+ Memory: 16 GB (8 GB SO-DIMM x2) DDR5 4800 MHz, up to 32 GB over 2 SO-DIMM slots
+ Storage: 1 TB PCIe 4.0 NVMe M.2 SSD

Super Computer
+ Peak performance: 200 PFlops
+ Number of nodes: 4,608
+ Memory per node: 512 GB DDR4 + 96 GB HBM2
+ Storage: 250 PB IBM Spectrum Scale (GPFS), 2.5 TB/s
+ Power consumption: 13 MW
+ Operating system: Red Hat Enterprise Linux (RHEL) version 7.4

As the computational need increases, the processor requirements also increase; this is met by increasing the number of processors and cores and the cycle frequency. A typical personal computer has around 6 cores and up to 16 GB of RAM, whereas a gaming laptop has a higher number of cores and more RAM. But if you compare a supercomputer, the number of processors runs into the thousands and memory runs into petabytes.

High Performance Computing

High Performance Computing (HPC) refers to the use of powerful computers and parallel processing techniques to solve complex problems or perform tasks at a much faster rate than traditional computers.

We saw in the last slide what makes computers powerful: the number of processors, their cores, frequency, and RAM. In this first chapter we will look at parallel processing techniques in detail.

Applications of High Performance Computing
+ Financial institutions - transactions and card fraud detection
+ Bio-sciences and the human genome - drug discovery, disease detection/prevention
+ Computer-aided engineering - automotive design and testing, transportation, commerce, structural outlook, mechanical design
+ Chemical engineering - process and molecular design
+ Digital content creation and distribution - computer-aided graphics in film and media
+ Economics/financial - Wall Street risk analysis, portfolio management, automated trading
+ Electronic design and automation - electronic component design
+ Geo-sciences and geo-engineering - oil and gas exploration and reservoir modelling
+ Mechanical design and drafting - 2D and 3D design and verification, mechanical modelling
+ Defense and energy - nuclear stewardship, basic and applied research
+ Government labs, universities/academia - basic and applied research
+ Meteorological departments - weather forecasting

These are some of the applications of high performance computing.
Parallel Processing

A parallel computer is a set of processors that are able to work cooperatively to solve a computational problem. Parallel computing is a form of computation in which many instructions are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently (in parallel).

Here is a simplified example to help illustrate the concept of parallel processing. Imagine you have a really challenging puzzle to solve, and you want to do it as quickly as possible. If you try to solve it alone, it might take a long time. However, if you have a group of friends working together simultaneously, each focusing on a different part of the puzzle, you can finish much faster. In the context of computing, traditional computers are like individuals trying to solve the puzzle on their own. High Performance Computing, on the other hand, is like having a team of super-fast computers working together to tackle a complex problem.

Serial Processing vs Parallel Processing

Serial processing (to be run on a single computer having a single CPU):
+ A problem is broken into a discrete series of instructions.
+ Instructions are executed one after another.
+ Only one instruction may execute at any moment in time.

Parallel processing (to be run using multiple CPUs):
+ A problem is broken into discrete parts that can be solved concurrently.
+ Each part is further broken down into a series of instructions.
+ Instructions from each part execute simultaneously on different CPUs.

In serial programming, the problem is broken into a series of instructions. Just recall your C programs: each line contains some instructions, these instructions run one after another, and only one instruction is executed at any moment. That is serial programming. For parallel processing, we first need multiple CPUs. The problem is broken into parts that can be solved in parallel, each part is further broken down into a series of instructions, and the instructions from each part execute simultaneously on different CPUs.

Example: count from 1 to 1000.

Serial: using a regular computer, you would start at 1 and incrementally count each number one by one until you reach 1000. This process might take some time, but it is manageable for a personal computer.

Parallel: with HPC, you could divide the task among multiple processors. For instance, if you have 10 processors, each processor could be responsible for counting a range of 100 numbers. So Processor 1 counts from 1 to 100, Processor 2 from 101 to 200, and so on. All processors work simultaneously, and the entire task of counting from 1 to 1000 is completed much faster than on a personal computer. A small code sketch of this range-splitting idea follows.
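To make the range-splitting idea concrete, here is a minimal sketch in C using OpenMP. OpenMP is my choice of illustration here, not something prescribed by the slides, and summing the numbers is used as the "counting" work so the combined result can be checked.

/* Minimal illustrative sketch: splitting "count from 1 to 1000" across threads.
 * Assumes a compiler with OpenMP support, e.g.  gcc -fopenmp count.c  */
#include <stdio.h>
#include <omp.h>

int main(void) {
    long total = 0;

    /* Each thread processes its own chunk of the range (Processor 1 gets
     * roughly 1-100, Processor 2 gets 101-200, and so on); the reduction
     * clause combines the per-thread partial results at the end. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= 1000; i++) {
        total += i;
    }

    printf("Sum of 1..1000 = %ld (max threads: %d)\n",
           total, omp_get_max_threads());
    return 0;
}

With 10 threads, the loop iterations are divided into chunks of roughly 100 each, mirroring the 10-processor example on the slide.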
Motivating Parallelism

Reasons for growth:
+ Advancements in specifying and coordinating complex concurrent tasks.
+ Portable algorithms facilitating parallel processing.
+ Specialized execution environments and software development toolkits.

Reasons:
+ Increased computational power
+ Enhanced memory/disk speed
+ Improved data communication

In recent years there has been a big improvement in how computers handle multiple tasks at once, that is, parallel processing. This is because we have become better at organizing and managing complex tasks happening at once, creating portable algorithms (sets of instructions), using special environments for executing tasks, and developing toolkits for building software. This progress rests on three main reasons:

1. Increased computational power: modern computers, equipped with CMOS chip-based processors and advanced networking, have become significantly more powerful. This has driven the development of applications capable of handling multiple tasks simultaneously.
2. Enhanced memory/disk speed: progress in hardware interfaces has expedited the transition from microprocessor creation to the development of entire machines that efficiently execute parallel tasks.
3. Improved data communication: standardization of programming environments has seen notable advancements. This ensures that applications designed for parallel processing remain relevant and useful for an extended period.

Modern Processor

Stored-program computer architecture

Stored-program computer architecture is a design where instructions and data are stored in the same memory, allowing a central processing unit to sequentially fetch, decode, and execute instructions, enabling versatile programmability. (Block diagram: Central Processing Unit, Memory, Input Device, Output Device.)

Stored-program computer architecture is like having a recipe book for your computer. In this analogy, the recipe book is the computer's memory, and the chef is the central processing unit (CPU). Let's break it down:
+ Memory unit: just like a recipe book contains both instructions and a list of ingredients, the computer's memory stores both program instructions and data.
+ Chef (CPU): the CPU acts like a chef following the instructions in the recipe book. It fetches each step, processes it, and moves on to the next one.
+ Fetching and execution: imagine the CPU as a chef turning the pages of the recipe book (fetching), reading the instructions (decoding), and then cooking accordingly (execution).
An example is the personal computer.

General-purpose cache-based microprocessor architecture

General-purpose cache-based microprocessor architecture is a design incorporating a cache memory hierarchy to enhance data access speed and overall performance in executing a wide range of computational tasks. (Memory hierarchy: CPU, cache, main memory, from fast to slow, with word and block transfers between levels.)

Again, let's understand this with the same analogy of chef and kitchen:
+ Microprocessor (chef): the microprocessor is like the chef, responsible for executing instructions and processing data.
+ Cache (countertop): think of the cache as the countertop near the chef. This is where the chef keeps ingredients that are used frequently.
+ Main memory (pantry): the main memory is like the pantry, storing a larger quantity of ingredients. However, it takes more time for the chef to go to the pantry to get less frequently used ingredients.
+ Fetching ingredients (data): when the chef needs an ingredient (data), the chef first checks the countertop (cache) for commonly used ingredients. If the ingredient is on the countertop (in the cache), it is quickly accessed. If not, the chef goes to the pantry (main memory) to retrieve it.

Everyday products:
+ Smartphones and laptops: just like a chef needs quick access to ingredients, your smartphone and laptop use cache memory to store frequently accessed data and instructions for faster processing.
+ Web browsing: when you load a webpage, the browser uses a cache to store elements of the page for quicker retrieval. It's like having the ingredients ready for the chef without going to the pantry every time.
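Beyond the analogy, the effect of the cache can be seen directly in code. The following is my own illustrative sketch, not part of the notes; the array size and the use of clock() for timing are arbitrary choices. The same additions are done twice: once walking the array in row-major order (cache-friendly, mostly "countertop" hits) and once in column-major order (mostly trips to the "pantry").

/* Illustrative sketch: cache-friendly vs cache-unfriendly access in C. */
#include <stdio.h>
#include <time.h>

#define N 2048
static double a[N][N];

int main(void) {
    double sum = 0.0;
    clock_t t0, t1;

    /* Fill the array so the traversal loops cannot be optimised away. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 1.0;

    /* Row-major walk: consecutive elements, so each cache line fetched
     * from main memory is fully reused before the next one is needed. */
    t0 = clock();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    t1 = clock();
    printf("row-major:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    /* Column-major walk: each access jumps N*8 bytes ahead, so most
     * accesses miss the cache and go all the way to main memory. */
    t0 = clock();
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    t1 = clock();
    printf("column-major: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    printf("sum = %.0f\n", sum);
    return 0;
}

On most machines the column-major version is noticeably slower, even though both loops perform exactly the same additions; the only difference is how well they use the cache.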
Parallel Programming Platforms

Explicit Parallelism
+ The programmer specifically defines and instructs the system on parallel tasks.
+ The programmer actively incorporates parallel constructs or directives into the code.
+ The system follows the programmer's explicit instructions for parallel execution.

Implicit Parallelism
+ The system automatically identifies and executes tasks concurrently without explicit instructions from the programmer.
+ The programmer writes regular, step-by-step code without specific parallel constructs.
+ The compiler, runtime system, and hardware work together to find and exploit parallel opportunities.

Implicit parallelism is a kind of parallelism in which the computer automatically handles multiple tasks at the same time without you needing to ask for it explicitly. It means you can write your programs in a regular, step-by-step way, and behind the scenes the computer's compiler and hardware work together to find opportunities to speed things up by doing tasks simultaneously. So, as an engineer, you focus on your code's logic, and the system takes care of making it run faster using parallel processing, all without you having to add any special parallel instructions. A small sketch contrasting the two styles is shown below.
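The sketch below is my own illustration of the contrast, not taken from the slides: the same loop written once in plain step-by-step style (the compiler and hardware are free to overlap or vectorise the independent iterations on their own, i.e. implicit parallelism) and once with an explicit parallel directive added by the programmer (here an OpenMP pragma, which is only one possible choice of explicit construct).

/* Illustrative sketch: implicit vs explicit parallelism on the same loop.
 * Compile e.g. with  gcc -O3 -fopenmp vecadd.c  */
#include <stdio.h>

#define N 100000
float a[N], b[N], c[N];

/* Implicit parallelism: ordinary sequential code.  An optimising compiler
 * may vectorise or otherwise overlap these independent iterations without
 * the programmer adding anything special. */
void add_implicit(void) {
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

/* Explicit parallelism: the programmer inserts a parallel construct
 * telling the system to split the loop across threads. */
void add_explicit(void) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

int main(void) {
    add_implicit();
    add_explicit();
    printf("c[0] = %f\n", c[0]);
    return 0;
}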
Implicit Parallelism - Pipelining Execution

Pipelining in high-performance computing:
+ Maximizing processor utilization: keep the ALU, buses, registers, etc. busy continuously.
+ Pipelining concept: instructions flow through the processor like a pipe, moving through stages to accomplish operations.
+ Continuous processor usage: each unit handles an instruction, keeping the processor busy.

Imagine your processor is like a well-designed assembly line. Each part of the processor, such as the ALU, buses, and registers, has a specific job. The goal? Keep all these parts busy all the time. So, what is pipelining? It's like turning your processor into a pipe. Instructions flow through it, moving from one stage to the next to get the job done. This way, each part of the processor is always working on something, with no downtime. In simpler terms, it's like a well-oiled machine where instructions smoothly move through different stages, making sure your processor is always doing something useful.

Overlapping Execution with Pipelining
+ Non-pipelined approach: fetch, decode, read, execute, and write happen sequentially; hardware is idle during waiting periods.
+ Pipelining technique: overlap the execution of several instructions; a two-stage pipelining example is fetch and execute.
+ Benefits: faster execution by fetching the next instruction during the current one's execution; all units stay busy, preventing idle time.

Now, let's explore why pipelining is good and how it improves the efficiency of our processors. In the past, processors followed a step-by-step approach: fetch, decode, read, execute, and write, one after another. The drawback? Many components of the hardware would remain inactive, patiently waiting for others to complete their tasks. With the pipelining approach, it's like managing multiple instructions simultaneously. Picture this: accomplishing two tasks in just two stages, fetching the next instruction while executing the current one. It's an intelligent way to overlap tasks and maintain a smooth workflow. What's the result? Quicker execution. Every part of the processor remains engaged, avoiding any downtime. Think of it as orchestrating a production line where everyone has a role and the line keeps moving without interruptions.

Implicit Parallelism - Superscalar Execution
+ From scalar to superscalar: scalar processors had one pipelined unit for integer and one for floating-point operations.
+ Need for parallelism: a single pipeline isn't enough; pipelines enable parallelism by having multiple instructions at different stages.
+ Superscalar processors execute more than one instruction per clock cycle; they fetch and decode multiple instructions simultaneously.

So, back in the day, processors were scalar, meaning they had one pipeline for integer operations and one for floating-point operations. But designers realized that having just one pipeline wasn't enough to get things done faster; we needed more parallelism. That's where superscalar came into the picture. It's like having a processor that can do more than one thing at a time during a single clock cycle. Imagine fetching and decoding multiple instructions simultaneously. That's the essence of superscalar: making our processors more efficient by doing multiple tasks at once.

Implicit Parallelism - Superscalar Execution (continued)
+ Instruction Level Parallelism (ILP): superscalar architecture exploits ILP through multiple pipelines for various instructions (e.g., integer and floating-point). (Diagram: integer and floating-point register files feeding pipelined integer and floating-point functional units.)
+ Complexity considerations: superscalar scheduler complexity and hardware cost are crucial in processor design.
+ VLIW solution: Very Long Instruction Word (VLIW) processors use compile-time analysis, bundling instructions for concurrent execution and addressing the complexity.

Let's start with Instruction Level Parallelism (ILP). We have multiple pipelines for different instructions, such as arithmetic, load, and store; it's about taking advantage of parallelism to speed things up. Now, here's the catch: making a superscalar processor is not easy. It is complex, and the hardware cost is something we really need to think about in processor design. To tackle this, we have something called VLIW, or Very Long Instruction Word, processors. They use a clever trick at compile time to identify and bundle together instructions that can be done at the same time. It's like putting a bunch of instructions into one very long instruction word to simplify the process.
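To make the idea of instruction-level parallelism concrete, here is a small sketch of my own (not from the slides). The first group of statements has no data dependences between its lines, so a superscalar processor is free to issue them to different functional units in the same clock cycle; the second group is a dependent chain where each line needs the previous result, leaving no ILP to exploit.

/* Illustrative sketch: independent operations vs a dependent chain. */
#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c = 3, d = 4;

    /* Independent operations: no statement uses another's result, so
     * multiple pipelines/functional units can work on them in parallel. */
    int e = a + b;
    int f = c * d;
    int g = a - d;

    /* Dependent chain: each statement consumes the previous result,
     * so the operations must execute one after another. */
    int h = e + 1;
    int i = h * 2;
    int j = i - f;

    printf("%d %d %d %d %d %d\n", e, f, g, h, i, j);
    return 0;
}

A superscalar scheduler discovers this difference in hardware at run time; a VLIW compiler would instead detect it at compile time and bundle the independent operations into one long instruction word.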
Implicit Parallelism - VLIW Processor Structure
+ Need for separate units: to perform multiple operations in one execution stage, separate units for each operation are essential.
+ VLIW architecture: separate units for operations such as floating-point add, multiply, branching, and integer ALU. VLIW (Very Long Instruction Word) executes more than one basic instruction at a time, with multiple operations stored in a single instruction word.

When we want to do multiple things at once in a single execution stage, we need separate units for each operation. Picture this: for floating-point addition, multiplication, branching, and integer ALU operations, we have dedicated units (see Fig. 1.4.3 for a visual). Now, VLIW stands for Very Long Instruction Word. It is a way for our processors to handle more than one basic instruction at a time. How? By storing multiple operations in a single instruction word. So, when we issue one instruction, multiple operations kick off simultaneously during the execution cycle of the pipelining process. Simple, right?

Implicit Parallelism - VLIW Processor Structure: Execution and Compiler Role
+ Simultaneous operations: VLIW executes multiple operations simultaneously with one instruction.
+ Compiler's role: the compiler identifies parallelism and schedules dependency-free code, resolving dependencies among instructions at compile time.
+ Characteristics: multiple independent operations in a VLIW instruction, with no flow dependences.

So, VLIW does multiple operations all at once with one instruction, no waiting around. But here's the trick: the compiler is crucial. It spots where we can run things in parallel and arranges the code to avoid any dependencies. The compiler keeps everything in harmony; it identifies and schedules operations that can run side by side, resolving any issues before the program even runs. One more thing: in a VLIW instruction, all these operations are independent; they don't rely on each other. It's like having a set of tasks that can be done simultaneously without any fuss.

Dichotomy of Parallel Computing Platforms

The division is based on the logical and physical organization of parallel platforms. Physical organization is the actual hardware organization of a platform; logical organization refers to a programmer's view of the platform.
+ Control structure: the various ways of expressing parallel tasks.
+ Communication model: the mechanisms for specifying interaction between the parallel tasks.

There are several platforms which facilitate parallel computing. In this section, the division based on the logical and physical organization of parallel platforms is discussed. From the programmer's perspective, the two important components of parallel computing are the control structure and the communication model.

Physical Organization of Parallel Platforms - Evolution

Let's start with the conventional architecture, representing the traditional uni-processor system. While some parallel features can improve a single processor's speed, there are limitations. The foundation of processor architecture traces back to the Von Neumann computer, characterized by its CPU, memory, and I/O devices. This system follows the Von Neumann architecture, where the CPU consists of arithmetic and control units, operating on the stored-program concept. Both program and data share the same memory unit, with each location having a unique address. Execution proceeds sequentially unless the program explicitly alters this flow.

Fig. 1.8.2 marks the initial steps toward parallelism, introducing lookahead, overlapping fetch and execute, and parallelism in functions. This last concept involves two mechanisms: pipelining and multiple functional units. In the second mechanism, various functional units operate simultaneously, enhancing processing speed.
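The stored-program, sequential-execution behaviour described above can be sketched in a few lines of C. This is my own toy illustration with an invented three-opcode instruction set: program and data live in the same memory array, each location has a unique address, and a loop fetches, decodes, and executes instructions one after another.

/* Illustrative sketch of the stored-program (Von Neumann) idea. */
#include <stdio.h>

enum { LOAD = 1, ADD = 2, HALT = 3 };   /* hypothetical opcodes */

int main(void) {
    /* One shared memory: program at addresses 0..5, data at 8..9. */
    int mem[16] = {
        LOAD, 8,      /* acc = mem[8]        */
        ADD,  9,      /* acc = acc + mem[9]  */
        HALT, 0,
        0, 0,
        40, 2         /* data operands       */
    };

    int pc = 0, acc = 0, running = 1;
    while (running) {
        int op  = mem[pc];            /* fetch  */
        int arg = mem[pc + 1];
        pc += 2;
        switch (op) {                 /* decode and execute */
            case LOAD: acc = mem[arg];       break;
            case ADD:  acc = acc + mem[arg]; break;
            case HALT: running = 0;          break;
        }
    }
    printf("Result in accumulator: %d\n", acc);  /* prints 42 */
    return 0;
}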
Vector instructions, which apply a common operation to large arrays of data, were initially handled by pipeline processors controlled by software looping. Subsequently, processors explicitly tailored for vector instructions emerged. Two variations in vector processing are memory-to-memory and register-to-register, with the former using memory for operand storage and the latter using registers. The evolution of the register-to-register architecture led to the creation of two processor types: Single Instruction Multiple Data (SIMD) and Multiple Instruction Multiple Data (MIMD). These developments signify the gradual integration of parallelism into processors, contributing to enhanced processing capabilities.

Physical Organization of Parallel Platforms - Parallel Random Access Machine (PRAM)

The various PRAM models differ in how they handle read or write conflicts:
+ EREW (Exclusive Read Exclusive Write): p processors can simultaneously read and write the contents of p distinct memory locations.
+ CREW (Concurrent Read Exclusive Write): p processors can simultaneously read the contents of p' memory locations, where p' <= p, while writes remain exclusive to distinct locations.
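Coming back to the vector-instruction and SIMD idea discussed above, the following is a small sketch of my own (it assumes the GCC/Clang vector extensions, which are a compiler feature rather than anything from the notes): one source-level operation is applied to several data elements at once, which the compiler lowers to SIMD hardware instructions.

/* Illustrative sketch of the SIMD / vector-instruction idea. */
#include <stdio.h>

typedef float v4sf __attribute__((vector_size(16)));  /* 4 packed floats */

int main(void) {
    v4sf a = {1.0f, 2.0f, 3.0f, 4.0f};
    v4sf b = {10.0f, 20.0f, 30.0f, 40.0f};

    /* One operation in the source; all four lanes are added in parallel
     * by a single vector instruction (Single Instruction, Multiple Data). */
    v4sf c = a + b;

    for (int i = 0; i < 4; i++)
        printf("%.1f ", c[i]);
    printf("\n");
    return 0;
}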
