Computing Department Task

The document outlines a task to implement a parallelized machine learning algorithm, such as k-means clustering or logistic regression, using a large dataset. It emphasizes the use of data parallelism with frameworks like MPI/OpenMP to optimize computation time and hardware resource utilization. Deliverables include source code and a report detailing the parallelization strategy and performance evaluation results.

Uploaded by temasgen201

Task: Data Parallelism in Machine Learning

Objective: Implement a parallelized version of a machine learning algorithm (e.g., k-means clustering, logistic regression) to process a large dataset.

Instructions:
1. Dataset:
o Use a large publicly available dataset (e.g., from Kaggle) or generate synthetic data.
2. Algorithm:
o Choose a machine learning algorithm that can be parallelized, such as k-means clustering or logistic regression.
3. Implementation:
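o Implement the algorithm using data parallelism: distribute the dataset across multiple cores or nodes so that each worker processes its own portion.
o Use a framework such as OpenMP (shared memory, multiple cores) or MPI (distributed memory, multiple nodes) for CPU parallelism.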
4. Optimization:
o Focus on minimizing computation time and maximizing the utilization of hardware resources.
o Consider load balancing, data communication, and convergence criteria in your optimization.
5. Performance Evaluation:
o Measure the time taken to train the model and assess its accuracy.
o Compare the performance and scalability of the parallel implementation with those of a sequential one.
6. Deliverables:
o Submit the source code and a report.
o The report should include an explanation of the parallelization strategy, the performance results, and any insights gained during implementation.
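
For step 1, one possible way to generate artificial data is sketched below. It is only an illustration: the function names (make_blobs, next_u32, next_offset), the choice of a xorshift generator, and the placement of cluster centers at multiples of 10 are all assumptions, not requirements of the task.

```c
#include <stddef.h>
#include <stdint.h>

/* Small deterministic generator (xorshift32) so that runs are reproducible.
   Hypothetical helper, not part of the task statement. */
static uint32_t next_u32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Uniform value in [-1, 1). */
static double next_offset(uint32_t *state) {
    return (double)next_u32(state) / 2147483648.0 - 1.0;
}

/* Fill `points` (n rows, dim columns, row-major) with n points grouped
   around k centers. Center c sits at coordinate 10*c on every axis
   (an arbitrary choice) and each coordinate is offset by less than
   `spread`. Point i is drawn around center i % k. */
void make_blobs(double *points, size_t n, size_t dim, size_t k,
                double spread, uint32_t seed) {
    uint32_t state = seed ? seed : 1u;  /* xorshift state must be nonzero */
    for (size_t i = 0; i < n; i++) {
        double center = 10.0 * (double)(i % k);
        for (size_t d = 0; d < dim; d++) {
            points[i * dim + d] = center + spread * next_offset(&state);
        }
    }
}
```

A call such as make_blobs(points, n, 2, 3, 0.5, 42u) would fill a caller-allocated buffer of n * 2 doubles. Because the true cluster of every point is known (i % k), the same data can later be used to assess accuracy.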
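
For step 3, a minimal sketch of one data-parallel k-means iteration with OpenMP might look like the following. It assumes k-means is the chosen algorithm, a row-major array of points, and a compiler supporting OpenMP 4.5 or later (needed for array-section reductions). The function name kmeans_step and its signature are hypothetical. Compiled without OpenMP, the pragma is ignored and the same code runs sequentially, which gives a baseline for comparison.

```c
#include <float.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One k-means iteration over n points of dimension dim (row-major).
   `centroids` (k x dim) is updated in place; `labels` receives the index
   of the nearest centroid for each point. Returns the number of points
   whose label changed, or SIZE_MAX if memory allocation fails. */
size_t kmeans_step(const double *points, size_t n, size_t dim,
                   double *centroids, size_t k, int *labels) {
    double *sums = calloc(k * dim, sizeof *sums);
    size_t *counts = calloc(k, sizeof *counts);
    if (!sums || !counts) { free(sums); free(counts); return SIZE_MAX; }
    size_t changed = 0;

    /* Data parallelism: each thread handles a slice of the points and keeps
       private partial sums, which OpenMP combines when the loop ends. */
    #pragma omp parallel for reduction(+ : changed, sums[:k * dim], counts[:k])
    for (size_t i = 0; i < n; i++) {
        int best = 0;
        double best_dist = DBL_MAX;
        for (size_t c = 0; c < k; c++) {
            double dist = 0.0;  /* squared Euclidean distance */
            for (size_t d = 0; d < dim; d++) {
                double diff = points[i * dim + d] - centroids[c * dim + d];
                dist += diff * diff;
            }
            if (dist < best_dist) { best_dist = dist; best = (int)c; }
        }
        if (labels[i] != best) { labels[i] = best; changed++; }
        counts[best]++;
        for (size_t d = 0; d < dim; d++) {
            sums[(size_t)best * dim + d] += points[i * dim + d];
        }
    }

    /* Sequential update: each centroid becomes the mean of its points. */
    for (size_t c = 0; c < k; c++) {
        if (counts[c] == 0) continue;  /* leave an empty cluster unchanged */
        for (size_t d = 0; d < dim; d++) {
            centroids[c * dim + d] = sums[c * dim + d] / (double)counts[c];
        }
    }
    free(sums);
    free(counts);
    return changed;
}
```

Calling kmeans_step repeatedly until it returns 0 (or until an iteration limit is reached) gives a complete training loop. With GCC or Clang, OpenMP is typically enabled with the -fopenmp flag. An MPI version would follow the same pattern, with each process computing partial sums over its own rows and a collective such as MPI_Allreduce combining them.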
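
For step 4, load balancing and a convergence criterion could be sketched as below. This is one possible approach under stated assumptions: rows are split into contiguous blocks of nearly equal size, and convergence means that no centroid coordinate moved by more than a tolerance. The names block_range, block_partition, and has_converged are hypothetical.

```c
#include <stddef.h>

/* Half-open range [begin, end) of rows owned by one worker. */
typedef struct { size_t begin; size_t end; } block_range;

/* Split n rows over `workers` workers (workers > 0) so that block sizes
   differ by at most one: the first n % workers workers get one extra row. */
block_range block_partition(size_t n, size_t workers, size_t rank) {
    size_t base = n / workers;
    size_t extra = n % workers;
    size_t begin = rank * base + (rank < extra ? rank : extra);
    size_t size = base + (rank < extra ? 1 : 0);
    block_range r = { begin, begin + size };
    return r;
}

/* Convergence test: returns 1 when no coordinate of the centroids
   (len = k * dim values) moved by more than tol, and 0 otherwise. */
int has_converged(const double *old_c, const double *new_c,
                  size_t len, double tol) {
    for (size_t i = 0; i < len; i++) {
        double delta = new_c[i] - old_c[i];
        if (delta < 0.0) delta = -delta;
        if (delta > tol) return 0;
    }
    return 1;
}
```

For example, 10 rows over 3 workers yields the ranges [0, 4), [4, 7), and [7, 10). In an MPI program, the block sizes and starting offsets computed this way are the kind of per-rank counts and displacements that MPI_Scatterv takes. A looser tolerance ends training sooner at some cost in accuracy, which is worth reporting alongside the timings.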
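
For step 5, the comparison with a sequential run is commonly summarized as speedup (sequential time divided by parallel time) and efficiency (speedup divided by the number of workers). The helpers below are a sketch with hypothetical names. They assume a C11 library providing timespec_get; wall-clock time is used because clock() reports processor time, which can include the time of all threads and would hide any gain from parallelism.

```c
#include <time.h>

/* Current wall-clock time in seconds (C11 timespec_get). */
double wall_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Speedup: how many times faster the parallel run is. */
double speedup(double t_seq, double t_par) {
    return t_seq / t_par;
}

/* Efficiency: speedup per worker; 1.0 means ideal scaling. */
double efficiency(double t_seq, double t_par, int workers) {
    return speedup(t_seq, t_par) / (double)workers;
}
```

A measurement would record wall_seconds() before and after training and take the difference. For instance, a sequential time of 8 s and a parallel time of 2 s on 8 workers corresponds to a speedup of 4 and an efficiency of 0.5. Repeating each measurement several times and for several worker counts gives a clearer picture of scalability.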
