My Work

The document details an assignment on parallel distributed computing, focusing on word analysis and term frequency analysis in large text files using multi-threading. It explains the implementation of thread management, including chunk division, thread safety with mutexes, and performance optimization through thread affinity. The document also presents execution time comparisons for different threading configurations and discusses challenges faced during the implementation.

Uploaded by

i220875
Assignment 01 Parallel Distributed Computing

Muhammad Daud Cheema


I220875
Youtube video link: https://youtu.be/CKMOJGAiAM8?si=zxIs3JCLUwmgWSOD

TASK 2: Word Analysis in a Large Text File


Solution Explanation:
The program works as follows:
1. It obtains the file size, divides the file into equal-sized chunks, and assigns each chunk to
a separate thread using pthread_create.
2. Each thread processes its chunk, counts words, checks for vowels, and updates the global
statistics (vowel word count and word frequencies).
3. To ensure thread safety, shared resources are protected using mutexes
(pthread_mutex_lock and pthread_mutex_unlock).
4. The program uses pthread_setaffinity_np to bind threads to specific CPU cores. This can
improve performance by keeping each thread on one core instead of letting the scheduler
migrate it, and it spreads the chunks evenly across the cores.
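The steps above can be sketched roughly as follows. This is a minimal illustration, not the assignment's actual code: it works on an in-memory buffer rather than a file, the names count_words_parallel, count_chunk, and chunk_t are hypothetical, and "vowel word" is assumed here to mean a word that starts with a vowel. Each thread tallies locally and takes the mutex only once, when merging into the shared totals.

```c
#include <ctype.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64   /* hypothetical upper bound for this sketch */

/* Shared totals, protected by stats_lock. */
static size_t total_words = 0;
static size_t vowel_words = 0;   /* assumption: words that START with a vowel */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
    const char *buf;   /* whole text (shared, read-only) */
    size_t begin;      /* first byte of this thread's chunk */
    size_t end;        /* one past the last byte of the chunk */
} chunk_t;

static int is_vowel(char c) {
    c = (char)tolower((unsigned char)c);
    return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
}

/* Thread body: counts the words that BEGIN inside [begin, end).
 * It peeks one byte to the left of each position, so a word split across
 * a chunk boundary is counted only by the chunk in which it starts. */
static void *count_chunk(void *arg) {
    const chunk_t *ck = arg;
    size_t words = 0, vowels = 0;
    for (size_t i = ck->begin; i < ck->end; i++) {
        int here = !isspace((unsigned char)ck->buf[i]);
        int prev = i > 0 && !isspace((unsigned char)ck->buf[i - 1]);
        if (here && !prev) {
            words++;
            if (is_vowel(ck->buf[i])) vowels++;
        }
    }
    /* Merge the local tallies into the shared totals under the mutex. */
    pthread_mutex_lock(&stats_lock);
    total_words += words;
    vowel_words += vowels;
    pthread_mutex_unlock(&stats_lock);
    return NULL;
}

/* Splits buf into nthreads equal chunks (the last one takes the remainder)
 * and counts words in parallel. Returns 0 on success, -1 on failure. */
int count_words_parallel(const char *buf, size_t len, int nthreads) {
    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
    pthread_t tids[MAX_THREADS];
    chunk_t chunks[MAX_THREADS];
    size_t per = len / (size_t)nthreads;
    int started = 0;
    total_words = vowel_words = 0;
    for (int t = 0; t < nthreads; t++) {
        chunks[t].buf = buf;
        chunks[t].begin = (size_t)t * per;
        chunks[t].end = (t == nthreads - 1) ? len : (size_t)(t + 1) * per;
        if (pthread_create(&tids[t], NULL, count_chunk, &chunks[t]) != 0) break;
        started++;
    }
    for (int t = 0; t < started; t++) pthread_join(tids[t], NULL);
    return started == nthreads ? 0 : -1;
}
```

After a successful call, total_words and vowel_words hold the merged results. Merging once per thread rather than locking on every word keeps contention on the mutex low.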
Execution Time and Speedup Analysis:
1. Without Thread Affinity: In this configuration, the operating system decides how to
distribute threads across the available CPU cores.
2. With Thread Affinity: Threads are explicitly bound to specific CPU cores using
pthread_setaffinity_np. This configuration limits thread migration between cores and can
improve performance on multi-core systems.
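As an illustration of the second configuration, binding a thread to one core with pthread_setaffinity_np might look like the sketch below. The helper names (first_allowed_core, pin_thread_to_core, is_pinned_to) are hypothetical and not taken from the assignment; the call itself is a Linux/glibc extension, which is why _GNU_SOURCE is defined before any include.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for cpu_set_t and pthread_setaffinity_np */
#endif
#include <pthread.h>
#include <sched.h>

/* Lowest-numbered core the calling thread is currently allowed to run on,
 * or -1 if the affinity mask cannot be read. */
int first_allowed_core(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) return -1;
    for (int c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &set)) return c;
    return -1;
}

/* Restricts `thread` to a single core.
 * Returns 0 on success or an error number (e.g. EINVAL for an unusable core). */
int pin_thread_to_core(pthread_t thread, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}

/* Returns 1 if the calling thread's affinity mask contains exactly `core`. */
int is_pinned_to(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0) return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(core, &set);
}
```

A common pattern, assumed here rather than confirmed by the report, is to pin worker t to core t modulo the number of available cores right after pthread_create returns.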
Challenges Faced and Solutions:
1. Handling large files: Processing a file larger than the system's memory requires careful
handling of chunks. The solution divides the file into smaller, manageable parts for each
thread.
2. Thread Synchronization: Ensuring thread safety when updating shared resources (like
word frequencies and counts) was challenging. This was addressed by using mutexes to lock
shared variables and prevent race conditions.
3. Data Extraction: I first had to save the entire dataset to a single .txt file so that it
could be divided into chunks.
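For the large-file challenge, one way to keep memory use bounded is to read the file in fixed-size blocks instead of loading it whole. The sketch below shows that idea in a single thread; it is a swapped-in illustration, not necessarily how the assignment reads its chunks, and count_words_stream and BLOCK_SIZE are hypothetical names.

```c
#include <ctype.h>
#include <stdio.h>

#define BLOCK_SIZE 4096   /* hypothetical block size; tune for the machine */

/* Counts whitespace-separated words in fp, holding only one block in memory
 * at a time. Returns the word count, or -1 on a read error. */
long count_words_stream(FILE *fp) {
    char block[BLOCK_SIZE];
    long words = 0;
    int in_word = 0;   /* carried across blocks so a split word counts once */
    size_t n;
    while ((n = fread(block, 1, sizeof block, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (isspace((unsigned char)block[i])) {
                in_word = 0;
            } else if (!in_word) {
                in_word = 1;
                words++;
            }
        }
    }
    return ferror(fp) ? -1 : words;
}
```

The in_word flag plays the same role as boundary handling between thread chunks: a word that straddles two blocks must be counted by only one of them.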

TASK 3: Term Frequency Analysis


Solution Explanation:
The problem involves performing term frequency analysis on a large text file using
multi-threading. The goal is to count the frequency of each word in the file and to calculate
the total number of unique words. The analysis is run both with and without thread affinity,
which binds threads to specific CPU cores to optimize performance.
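A shared word-frequency table of the kind described here could be sketched as below. This is an assumption-laden illustration rather than the assignment's implementation: the fixed-size open-addressing table, the djb2 hash, the case-insensitive alphabetic tokenizer, and the names tf_add, tf_count, tf_worker, TABLE_SIZE, and MAX_WORD are all hypothetical choices.

```c
#include <ctype.h>
#include <pthread.h>
#include <string.h>

#define TABLE_SIZE 4096   /* hypothetical capacity; must exceed distinct words */
#define MAX_WORD   64     /* longer words are truncated by the tokenizer */

typedef struct { char word[MAX_WORD]; unsigned long count; } entry_t;

static entry_t table[TABLE_SIZE];          /* count == 0 marks an empty slot */
static size_t unique_words = 0;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned long hash_word(const char *s) {   /* djb2 string hash */
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Records one occurrence of `word` (already lowercased, shorter than MAX_WORD).
 * Returns 0 on success, -1 if the table is full. */
static int tf_add(const char *word) {
    int rc = -1;
    pthread_mutex_lock(&table_lock);
    size_t i = hash_word(word) % TABLE_SIZE;
    for (size_t probes = 0; probes < TABLE_SIZE; probes++, i = (i + 1) % TABLE_SIZE) {
        if (table[i].count == 0) {              /* empty slot: first sighting */
            strncpy(table[i].word, word, MAX_WORD - 1);
            table[i].word[MAX_WORD - 1] = '\0';
            table[i].count = 1;
            unique_words++;
            rc = 0;
            break;
        }
        if (strcmp(table[i].word, word) == 0) { /* seen before: bump the count */
            table[i].count++;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return rc;
}

/* Returns how many times `word` has been recorded (0 if never). */
unsigned long tf_count(const char *word) {
    unsigned long n = 0;
    pthread_mutex_lock(&table_lock);
    size_t i = hash_word(word) % TABLE_SIZE;
    for (size_t probes = 0; probes < TABLE_SIZE && table[i].count != 0;
         probes++, i = (i + 1) % TABLE_SIZE) {
        if (strcmp(table[i].word, word) == 0) { n = table[i].count; break; }
    }
    pthread_mutex_unlock(&table_lock);
    return n;
}

/* Thread entry point: arg is a NUL-terminated chunk of text. Splits it into
 * lowercase alphabetic words and records each one in the shared table. */
void *tf_worker(void *arg) {
    const char *p = arg;
    char word[MAX_WORD];
    while (*p) {
        while (*p && !isalpha((unsigned char)*p)) p++;   /* skip separators */
        size_t n = 0;
        while (*p && isalpha((unsigned char)*p)) {
            if (n < MAX_WORD - 1) word[n++] = (char)tolower((unsigned char)*p);
            p++;
        }
        if (n > 0) { word[n] = '\0'; tf_add(word); }
    }
    return NULL;
}
```

Locking once per word is simple but contended. Giving each thread a private table and merging the tables after pthread_join would likely scale better, at the cost of extra memory.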
Execution Time and Speedup Analysis:
1. Without Thread Affinity: In this setup, the OS decides how to distribute threads across
CPU cores.
2. With Thread Affinity: Threads are explicitly bound to cores, potentially improving
performance by reducing context switching.
Challenges Faced and Solutions:
1. Handling large files: Processing a file larger than the system's memory requires careful
handling of chunks. The solution divides the file into smaller, manageable parts for each
thread.
2. Thread Synchronization: Ensuring thread safety when updating shared resources (like
word frequencies and counts) was challenging. This was addressed by using mutexes to lock
shared variables and prevent race conditions.
3. Data Extraction: I first had to save the entire dataset to a single .txt file so that it
could be divided into chunks.
TABLE FOR BOTH PROBLEMS
Configuration         Time for 1 thread   Time for 2 threads   Time for 4 threads
T2 without Affinity   97.8823             80.6181              74.0218
T2 with Affinity      80.71               64.72                72.57
T3 without Affinity   128.762             67.77                34.534
T3 with Affinity      75.6515             45.776               40.5807
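The speedup implied by these timings can be computed with the usual definitions, S = T1 / Tn for speedup and E = S / n for parallel efficiency. The helpers below are a small sketch with hypothetical names (speedup, efficiency, print_speedups); they are not part of the assignment code.

```c
#include <stdio.h>

/* Speedup of an n-thread run relative to the 1-thread run: S = T1 / Tn. */
double speedup(double t1, double tn) {
    return t1 / tn;
}

/* Parallel efficiency: E = S / n, where n is the number of threads. */
double efficiency(double t1, double tn, int nthreads) {
    return speedup(t1, tn) / nthreads;
}

/* Prints speedup and efficiency for one table row, given its 1-, 2-,
 * and 4-thread times. */
void print_speedups(const char *label, double t1, double t2, double t4) {
    printf("%s: S2=%.2f E2=%.2f  S4=%.2f E4=%.2f\n", label,
           speedup(t1, t2), efficiency(t1, t2, 2),
           speedup(t1, t4), efficiency(t1, t4, 4));
}
```

For example, the T3 without Affinity row gives 128.762 / 34.534, a speedup of about 3.73 on 4 threads and an efficiency of about 0.93.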
