The document outlines a task to implement a parallelized machine learning algorithm, such as k-means clustering or logistic regression, using a large dataset. It emphasizes the use of data parallelism with frameworks like MPI/OpenMP to optimize computation time and hardware resource utilization. Deliverables include source code and a report detailing the parallelization strategy and performance evaluation results.
Computing Department Task
Task: Data Parallelism in Machine Learning
Objective: Implement a parallelized version of a machine learning algorithm (e.g., k-means clustering, logistic regression) to process a large dataset.
Instructions:
1. Dataset:
   o Use a large publicly available dataset (e.g., from Kaggle) or generate synthetic data.
2. Algorithm:
   o Choose a machine learning algorithm that can be parallelized, such as k-means clustering or logistic regression.
3. Implementation:
   o Implement the algorithm using data parallelism: distribute the dataset across multiple cores or nodes.
   o Use frameworks like MPI/OpenMP for CPU parallelism.
4. Optimization:
   o Focus on minimizing computation time and maximizing the utilization of hardware resources.
   o Consider load balancing, data communication, and convergence criteria in your optimization.
5. Performance Evaluation:
   o Measure the time taken to train the model and assess its accuracy.
   o Compare the performance and scalability of the parallel implementation with a sequential one.
6. Deliverables:
   o Submit the source code and a report.
   o The report should include an explanation of the parallelization strategy, performance results, and any insights gained during implementation.
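As an illustration of the data-parallel pattern the instructions describe, the following is a minimal sketch of k-means in Python, not a reference solution. It assumes 2-D points stored as tuples, and the names split_chunks, partial_stats, and kmeans are hypothetical helpers introduced here. Each worker computes per-cluster sums and counts on its own chunk, and the partial results are then combined, which mirrors the scatter and reduce steps one would write with MPI or an OpenMP reduction. A thread pool is used only to keep the sketch self-contained; under CPython's global interpreter lock it is not expected to speed up pure-Python arithmetic, so a real submission would likely use processes, MPI, or OpenMP instead.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def split_chunks(points, n_chunks):
    """Split points into n_chunks nearly equal contiguous slices (load balancing)."""
    size, extra = divmod(len(points), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(points[start:end])
        start = end
    return chunks


def partial_stats(chunk, centroids):
    """Local step for one worker: per-cluster coordinate sums and point counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in chunk:
        # Assign the point to its nearest centroid (squared Euclidean distance).
        best = min(range(len(centroids)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]
    return sums, counts


def kmeans(points, centroids, executor=None, n_chunks=1, max_iter=100, tol=1e-9):
    """Lloyd's k-means; with an executor, the assignment step runs once per chunk."""
    chunks = split_chunks(points, n_chunks)
    for _ in range(max_iter):
        if executor is None:
            results = [partial_stats(c, centroids) for c in chunks]
        else:
            results = list(executor.map(partial_stats, chunks,
                                        [centroids] * len(chunks)))
        # Reduction step: combine partial sums and counts from all chunks.
        new_centroids = []
        for c in range(len(centroids)):
            count = sum(r[1][c] for r in results)
            if count == 0:
                new_centroids.append(centroids[c])  # leave an empty cluster as is
                continue
            dim = len(centroids[c])
            new_centroids.append(
                tuple(sum(r[0][c][d] for r in results) / count for d in range(dim)))
        # Convergence criterion: largest squared centroid movement.
        shift = max(sum((a - b) ** 2 for a, b in zip(old, new))
                    for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    random.seed(0)
    # Synthetic data: three Gaussian blobs (an assumption made for this demo).
    data = [(random.gauss(cx, 1.0), random.gauss(cy, 1.0))
            for cx, cy in [(0, 0), (8, 8), (0, 8)] for _ in range(700)]
    init = random.sample(data, 3)
    workers = 4

    t0 = time.perf_counter()
    kmeans(data, init, max_iter=20)
    t_seq = time.perf_counter() - t0

    with ThreadPoolExecutor(max_workers=workers) as pool:
        t0 = time.perf_counter()
        kmeans(data, init, executor=pool, n_chunks=workers, max_iter=20)
        t_par = time.perf_counter() - t0

    speedup = t_seq / t_par          # S = T_sequential / T_parallel
    efficiency = speedup / workers   # E = S / number of workers
    print(f"sequential {t_seq:.3f}s, parallel {t_par:.3f}s, "
          f"speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

Because the parallel and sequential paths share the same update rule, they can be checked against each other for accuracy before timing them. The speedup and efficiency figures printed at the end are the usual quantities to report when comparing scalability across different worker counts.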