
Programming Assignment 1 - Chak De Microarchitecture!

CS 683: Advanced Computer Architecture, Autumn 2024


Computer Science and Engineering
Indian Institute of Technology, Bombay
CASPER group: https://casper-iitb.github.io/

* Disclaimer: The memes included in this document are intended solely to make learning fun. They are not meant to offend, mislead, or be taken as factual information. Please enjoy them in that spirit.

Invite to the assignment: https://classroom.github.com/a/4u66Tjkg

We write a lot of programs with the algorithm as our primary concern, but while doing so, we often forget about the underlying hardware on which our programs will run. A wise man once said,

“Theory will take you only so far”
In this programming assignment, we’ll explore how we can exploit the architecture to
gain an advantage beyond theory. We’ll explore how we can better utilize the cache by
improving the locality of programs. We’ll see how we can use special hardware
capabilities to speed up vector operations.

Just a friendly reminder: if you think copying code is a clever shortcut, think again. It’s
not only easily spotted but also a great way to miss out on the chance to actually learn
something. Why not impress us with your own work?

NOTE:
- You need an Intel-based x86 machine to implement software prefetching.
- Make changes to the base code only. Do not substitute a different implementation for the base code.
- Do not enable compiler optimizations or any other optimizations. Points will be deducted for using any optimization techniques other than the ones mentioned in the assignment.
The assignment is divided into two tasks, each having its own subparts. The task structure
and their respective points are shown below:

Task 1 (Matrix transpose)

1A Tile it to see it 2.5 points

1B Fetch it but with soft corner (software prefetching) 2.5 points

1C Tiling + Prefetching 2 points

Task 2 (2D convolution)

2A Shhh SIMD in action 2 points

2B Tile it again 2 points

2C Software Prefetching 2 points

2D Hum saath saath hain 2 points

Bonus Points: A bonus of 5 points will be given to the top 10 teams getting the best
speedups.
Task 1A: Tile it to See it

“Babumoshai, MATRIX choti honi chahiye, BADI nahi” (Babumoshai, the MATRIX should be small, not big)

One of the most common matrix operations is the transpose. This operation is particularly brutal on the cache when the matrix is huge, since we access one of the matrices in column-major order. What if we divide the matrix into tiles and transpose it tile by tile? Maybe we'll be gentler on the cache then…
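To make the idea concrete, here is a minimal sketch of a tiled transpose, assuming a square N x N matrix of doubles stored in row-major order; mat, trans, N, and TILE are illustrative names, not necessarily those used in the template:

    // Walk the matrix one TILE x TILE block at a time; within a block,
    // both the reads and the writes stay inside a small cache footprint.
    for (int ii = 0; ii < N; ii += TILE) {
        for (int jj = 0; jj < N; jj += TILE) {
            for (int i = ii; i < ii + TILE && i < N; i++) {
                for (int j = jj; j < jj + TILE && j < N; j++) {
                    trans[j * N + i] = mat[i * N + j];
                }
            }
        }
    }

Choosing TILE so that two TILE x TILE blocks of doubles fit comfortably in the L1-D cache is a reasonable starting point.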

TODOs
1. Report the L1-D cache MPKI when executing only the naive matrix transpose (a sample measurement recipe is sketched after this list).
2. Implement the tiled matrix transpose.
3. Report the L1-D cache MPKI when executing only the tiled matrix transpose.
4. Compare the performance by calculating the speedup.
5. Do this for multiple matrix sizes and tile sizes and analyze the results.
6. Plot the MPKI and speedup for different matrix sizes and tile sizes of your choice. Select
the sizes such that you can draw clear conclusions from the results.
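One way to collect these numbers, assuming a Linux machine with perf available (event names can vary across CPUs, so check perf list):

    perf stat -e instructions,L1-dcache-loads,L1-dcache-load-misses ./build/naive <matrix-size> <tile-size>

MPKI is then (L1-dcache-load-misses / instructions) x 1000.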

Answer the following:


1. Report the changes in L1-D MPKI that you observed while moving from the naive to the
tiled matrix transpose, and argue.
2. How much did the L1-D MPKI change for different matrix sizes and tile sizes? Explain
the findings.
3. Did you achieve any speedup? If so, how much and what contributed to it? If not, what
were the reasons?
Task 1B: Using Software Prefetching

“Don’t let PROCRASTINATION take over your life. Be brave and take risks. Start PREFETCHING”

Software prefetching is a technique that aims to reduce cache misses by fetching data into the cache before it is needed. In this section, you will optimize the matrix transpose code using software prefetching. Explain the concept of software prefetching and the different strategies that can be employed; strategies such as exploiting the temporal locality of the fetched data, fetching a variable number of addresses at a time, etc., can be considered.

Note:
1. You will have to turn off hardware prefetching to see the effects of software prefetching.
How?

TODOs
1. Report the number of instructions and L1-D cache MPKI when executing only the
naive matrix transpose.
2. Implement the software-prefetched matrix transpose.
3. Report the number of instructions and L1-D cache MPKI when executing only the
software-prefetched matrix transpose.
4. Compare the performance by calculating the speedup.
5. Do this for multiple matrix sizes and analyze the results.
6. Plot the MPKI and speedup for different matrix sizes of your choice. Select matrix sizes
such that you can draw clear conclusions from the results.

Answer the following:


1. Report the change in the number of instructions that you observed while moving from
naive to the software-prefetched matrix transpose, and argue.
2. Report the change in L1-D MPKI that you observed while moving from the naive to the
software-prefetched matrix transpose, and argue.
3. Did you achieve any speedup? If so, how much and what contributed to it? If not, what
were the reasons?

Resources:
1. To implement software prefetching, you will use _mm_prefetch (a usage sketch follows this list).
2. _mm_prefetch is an intrinsic function provided by Intel that prefetches data into the
cache to enhance memory access efficiency. It enables programmers to give the processor
advance notice about which memory locations will be accessed soon, reducing cache
misses and improving performance.
3. The function is part of Intel's SSE (Streaming SIMD Extensions) and is highly optimized
for Intel architectures. It is especially effective when used with SIMD (Single Instruction,
Multiple Data) operations.
4. Following are the links where you can find details about _mm_prefetch:
- https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#ig_expand=5152&text=prefetch
- https://stackoverflow.com/questions/46521694/what-are-mm-prefetch-locality-hints
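As a starting point, here is a minimal sketch of _mm_prefetch in the column-major read loop of the transpose; PREFETCH_DIST is an illustrative lookahead you would tune, not a template constant:

    #include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            // Request the element a few strided accesses ahead so it is
            // (hopefully) already in L1 when the copy reaches it.
            if (j + PREFETCH_DIST < N)
                _mm_prefetch((const char *)&mat[(j + PREFETCH_DIST) * N + i],
                             _MM_HINT_T0);
            trans[i * N + j] = mat[j * N + i];
        }
    }

The locality hint (_MM_HINT_T0 through _MM_HINT_NTA) and the prefetch distance are exactly the kind of strategy choices described above.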

Task 1C: Tiling + Prefetching


You need to optimize the matrix transpose further using a combination of tiling and software
prefetching.
Let T(x) be the time taken while executing matrix transpose with technique x.
So the goal of this task is as follows:
T(tiling + prefetching) < min(T(tiling), T(prefetching))
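One possible way to combine the two, sketched with the same illustrative names as before (and assuming N is a multiple of TILE for brevity):

    // Transpose the current tile while prefetching the same rows of the
    // next tile to the right, so its lines arrive before they are needed.
    for (int ii = 0; ii < N; ii += TILE) {
        for (int jj = 0; jj < N; jj += TILE) {
            for (int i = ii; i < ii + TILE; i++) {
                if (jj + TILE < N)
                    _mm_prefetch((const char *)&mat[i * N + jj + TILE],
                                 _MM_HINT_T0);
                for (int j = jj; j < jj + TILE; j++)
                    trans[j * N + i] = mat[i * N + j];
            }
        }
    }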

TODOs
1. Report the number of instructions and L1-D cache MPKI when executing only the
naive matrix transpose.
2. Implement the tiled + software-prefetched matrix transpose.
3. Report the number of instructions and L1-D cache MPKI when executing only the
tiled + software-prefetched matrix transpose.
4. Compare the performance by calculating the speedup.
5. Do this for multiple matrix sizes and analyze the results.
6. Plot the MPKI and speedup for different matrix sizes and tile sizes of your choice. Select
sizes such that you can draw clear conclusions from the results.

Answer the following:


1. Report the change in the number of instructions that you observed while moving from the naive to the tiled + software-prefetched matrix transpose, and argue.
2. Report the change in L1-D MPKI that you observed while moving from the naive to the tiled + software-prefetched matrix transpose, and argue.
3. How much did the L1-D MPKI change for different matrix sizes and tile sizes? Explain
the findings.
4. Did you achieve any speedup? If so, how much and what contributed to it? If not, what
were the reasons?
Task 2A: Shhh, SIMD in action 🏹

Life with SIMD (ek bow, anek arrow: one bow, many arrows)
Let's move on to another common operation used in image processing: 2D convolution. Here,
we have vector operations that can be optimized using special registers that modern processors
have.
SIMD (Single Instruction, Multiple Data) instructions are specialized instructions that
simultaneously perform operations on multiple data elements. These instructions can
significantly speed up 2D convolution. In this section, you will explore SIMD instructions, such
as SSE (Streaming SIMD Extensions) or AVX (Advanced Vector Extensions), depending on the
available hardware.

TODOs
1. Report the number of instructions when executing only the naive convolution
algorithm.
2. Implement the SIMD 2D convolution algorithm.
3. Report the number of instructions when executing only the SIMD 2D convolution
algorithm.
4. Compare the performance by calculating the speedup.
5. Do this using SIMD registers of width 128 bits and 256 bits (and 512 bits if available)
and compare the speedups.
6. Do this for multiple matrix sizes and kernel sizes and analyze the results.
7. Plot the speedup for the different matrix sizes and kernel sizes of your choice. Select the
sizes such that you can draw clear conclusions from the results.
Answer the following:
1. Report the change in the number of instructions that you observed while moving from the
naive to the SIMD 2D convolution algorithm, and argue.
2. Did you achieve any speedup? If so, how much and what contributed to it? If not, what
were the reasons?

Resources:

Here are a few basic points to get started with using SIMD (Single Instruction, Multiple Data)
instructions.

1. Understanding SIMD Registers:


○ SIMD (Single Instruction, Multiple Data) allows the same operation to be
performed on multiple data points simultaneously, which can significantly speed
up computations.
○ __m128d and __m256d represent 128-bit and 256-bit SIMD registers,
respectively. These registers can hold multiple double-precision (64-bit) floating-
point numbers.
○ __m128d can hold two double-precision floating-point numbers.
○ __m256d can hold four double-precision floating-point numbers.
○ There is also __m512d, which can hold eight double-precision floating-point
numbers. You can check if your system supports this by doing:

lscpu | grep avx512

If this yields some flags like avx512*, there you go!

2. Loading Data into SIMD Registers:


○ Before performing any SIMD operations, data must be loaded into these registers.
○ One of the ways to load data is using functions like _mm256_loadu_pd (for
__m256d registers), which loads four double-precision floating-point values from
memory into a SIMD register. The "u" in loadu stands for "unaligned," meaning
the data does not need to be aligned to a specific boundary in memory.
3. Performing SIMD Operations:
● Once the data is loaded into SIMD registers, you can perform arithmetic
operations on these registers.
● Functions like _mm256_add_pd and _mm256_mul_pd allow you to perform
addition and multiplication, respectively, on the elements in the registers.

You can refer to all these functions here: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html. A sketch combining these steps follows.
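Here is a sketch of the AVX inner loop for a 2D convolution on doubles, producing four adjacent outputs per iteration; input, kernel, output, N (input width), and K (kernel width) are illustrative names, and edge handling plus the scalar remainder loop are omitted for brevity:

    #include <immintrin.h>  // __m256d and the _mm256_* intrinsics

    for (int i = 0; i + K <= N; i++) {
        for (int j = 0; j + 4 + K - 1 <= N; j += 4) {
            __m256d acc = _mm256_setzero_pd();
            for (int ki = 0; ki < K; ki++) {
                for (int kj = 0; kj < K; kj++) {
                    // Broadcast one kernel weight, load four neighbouring
                    // inputs (unaligned), then multiply and accumulate.
                    __m256d w  = _mm256_set1_pd(kernel[ki * K + kj]);
                    __m256d in = _mm256_loadu_pd(&input[(i + ki) * N + j + kj]);
                    acc = _mm256_add_pd(acc, _mm256_mul_pd(w, in));
                }
            }
            _mm256_storeu_pd(&output[i * (N - K + 1) + j], acc);
        }
    }

The 128-bit version is the same loop with __m128d, _mm_loadu_pd, and a step of 2; the AVX-512 version uses __m512d and a step of 8.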
Task 2B: Tile it again

TODOs
1. Report the L1-D cache MPKI when executing only the naive 2D convolution.
2. Implement the tiled 2D convolution.
3. Report the L1-D cache MPKI when executing only the tiled 2D convolution.
4. Compare the performance by calculating the speedup.
5. Do this for multiple matrix sizes and kernel sizes and analyze the results.
6. Plot the MPKI and speedup for different matrix sizes, kernel sizes, and tile sizes of your
choice. Select the sizes such that you can draw clear conclusions from the results.

Answer the following:


1. Report the change in L1-D MPKI that you observed while moving from the naive to the
tiled 2D convolution, and argue.
2. How much did the L1-D MPKI change for different matrix sizes, kernel sizes, and tile
sizes? Explain the findings.
3. Did you achieve any speedup? If so, how much and what contributed to it? If not, what
were the reasons?

Task 2C: Software prefetching

Note:
1. You will have to turn off hardware prefetching to see the effects of software prefetching.

TODOs
1. Report the number of instructions and L1-D cache MPKI when executing only the
naive 2D convolution.
2. Implement the software-prefetched 2D convolution.
3. Report the number of instructions and L1-D cache MPKI when executing only the
software-prefetched 2D convolution.
4. Compare the performance by calculating the speedup.
5. Do this for multiple matrix sizes and kernel sizes and analyze the results.
6. Plot the MPKI and speedup for different matrix sizes and kernel sizes of your choice.
Select the sizes such that you can draw clear conclusions from the results.

Answer the following:


1. Report the change in the number of instructions that you observed while moving from the
naive to the software-prefetched 2D convolution, and argue.
2. Report the change in L1-D MPKI that you observed while moving from the naive to the
software-prefetched 2D convolution, and argue.
3. Did you achieve any speedup? If so, how much and what contributed to it? If not, what
were the reasons?

Task 2D: Hum Saath Saath Hain


You need to optimize the 2D convolution further using a combination of all the techniques in
place. You need to understand how each technique optimizes the 2D convolution and how they
can interact synergistically to improve performance further. Complete the following functions in the provided template code to calculate the speedup against the baseline naive implementation (a sketch of how the techniques compose is given after the list below).
Let T(x) be the time taken while executing matrix convolution with technique x.
So the goal of this task is as follows:
T(technique 1 + technique 2 + ...) < min(T(technique 1), T(technique 2), ...)

Techniques                      Concerned function
Tiling + SIMD                   tiled_simd_convolution
Tiling + prefetching            tiled_prefetch_convolution
SIMD + prefetching              simd_prefetch_convolution
Tiling + SIMD + prefetching     simd_tiled_prefetch_convolution

1. Measure and report the execution time of all the combinations.


2. Compare the performance improvement achieved by each technique against the baseline
implementation.
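To illustrate how the techniques compose, here is a hedged sketch of a SIMD + prefetching inner loop in the style of simd_prefetch_convolution, reusing the illustrative names and the immintrin.h include from the Task 2A sketch; the row-ahead prefetch and hint choice are one possible scheme, not the required one:

    for (int i = 0; i + K <= N; i++) {
        // Warm up the start of the input row that the next iteration of i
        // will need, while the SIMD loop below does useful work.
        if (i + K < N)
            _mm_prefetch((const char *)&input[(i + K) * N], _MM_HINT_T1);
        for (int j = 0; j + 4 + K - 1 <= N; j += 4) {
            __m256d acc = _mm256_setzero_pd();
            for (int ki = 0; ki < K; ki++)
                for (int kj = 0; kj < K; kj++)
                    acc = _mm256_add_pd(acc, _mm256_mul_pd(
                              _mm256_set1_pd(kernel[ki * K + kj]),
                              _mm256_loadu_pd(&input[(i + ki) * N + j + kj])));
            _mm256_storeu_pd(&output[i * (N - K + 1) + j], acc);
        }
    }

Adding tiling on top means running this loop nest once per tile, in the spirit of Task 1C.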

Deliverables
1. Source code for all the tasks in transpose.c and convolution.c files.
2. README.md summarizing all the tasks and their respective todos. Describe what you
did, why you did it, and how much it benefited you. Compare and analyze the
performance improvements achieved by each of the tasks. Discuss any trade-offs or
limitations associated with each optimization technique. Reflect on the importance of
understanding hardware architecture and the impact it has on performance.
3. All the plots should be included in the README.md file along with their summary.
4. Include a plot showing the comparison of the performance of each technique, along with the combinations of the techniques, against the different matrix sizes in the README.md file. There will be two different plots for the transpose and convolution operations, respectively. This is what your plot should look like: [example comparison plot]

Submission
● You should submit a single tar.gz file with the name roll_number_pa1.tar.gz on
Moodle.
● The folder structure within tar.gz should be in the below format. Place the files in the
appropriate folders for all the tasks.

---- pa1-chak-de-microarchitecture-template
|----- part1
|---- Makefile
|---- transpose.c
|----- part2
|---- Makefile
|---- convolution.c
|----- README.md

● Kindly read the document at this link to create a private repository for the assignment. Do
not push everything at the last moment. Maintain a proper commit history.
Appendix
Instructions to build and run the project:
1. For part1 (cd to the part1 directory)
There are various sections to run for part 1:
1. naive
2. tiling
3. prefetch
4. tiling-prefetch
5. all

To execute a section in part1, you can run:


make <section-name>
This will create an executable file in the build directory; to run the executable, just do the
following:
./build/<section-name> <matrix-size> <tile-size>
The <tile-size> argument can be any number for the sections where tiling is not
performed.

2. For part2 (cd to the part2 directory)


There are various sections to run for part 2:
1. naive
2. tiling
3. prefetch
4. simd
5. tiling-prefetch
6. tiling-simd
7. simd-prefetch
8. tiling-simd-prefetch
9. all
To execute a section in part2, you can run:
make <section-name>
This will create an executable file in the build directory; to run the executable, just do the
following:
./build/<section-name> <matrix-size> <kernel-size>

The block size for the tiling tasks can be defined in the program itself.
Deadline

Chak De Microarchitecture

Best wishes
See you in Vivas 🙃
