The document discusses a general parallel programming framework for machine learning algorithms on multicore architectures, addressing the limitations of traditional methods that lack scalability. It integrates the Map-Reduce paradigm to facilitate efficient parallelization of various algorithms, including Logistic Regression, Neural Networks, and k-means. The proposed approach reformulates computations as summations, allowing for independent processing across multiple cores, thus achieving linear speedups in performance.


Map-Reduce for Machine Learning on Multicore

Abebe Zerihun
Introduction
• Multicore Era: Increasing number of processing cores
per chip, but lack of effective programming
frameworks for multicore machine learning.
• Goal: Develop a general, parallel programming
method to leverage multicore architectures for a
wide range of machine learning algorithms.
• Challenge: Traditional approaches focus on specific
optimizations for individual algorithms, lacking
scalability and generalizability.
Map-Reduce
Contribution
• General Framework:
– Algorithms fitting the Statistical Query Model are
expressed in a summation form, enabling easy
parallelization.
– This method ensures exact, non-approximated
implementations of machine learning algorithms.
• Example: instead of directly accessing the dataset {(xi, yi)}, i = 1, …, n, an
algorithm may only query sums of the form

      Σ_{i=1..n} f(xi, yi)

where f(x, y) is a function of the data.
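The following is a minimal Python sketch (mine, not from the slides; f, the data sizes, and the chunk count are all illustrative) of why this summation form matters: a sum of per-example terms can be computed over disjoint chunks independently, and adding the chunk totals gives exactly the same result as a serial pass.

```python
import numpy as np

# Hypothetical per-example function f(x, y); any such function works,
# as long as the quantity we need is the sum of f over all examples.
def f(x, y):
    return y * x  # e.g. a per-example contribution to a sufficient statistic

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))        # 1000 examples, 3 features
y = rng.integers(0, 2, size=1000)

# Serial computation of the query: sum_i f(x_i, y_i)
serial = sum(f(X[i], y[i]) for i in range(len(y)))

# "Parallel" computation: split the data into chunks, sum each chunk
# independently (map), then add the partial sums (reduce).
chunks = np.array_split(np.arange(len(y)), 4)
partials = [sum(f(X[i], y[i]) for i in idx) for idx in chunks]
parallel = sum(partials)

assert np.allclose(serial, parallel)  # exact, not approximated
```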


Cont..
• Integration with Map-Reduce:
– Adapts Google's Map-Reduce paradigm for multicore
systems, simplifying programming and achieving linear
speedups.
• Broad Applicability:
– Demonstrates parallelization for diverse algorithms:
Logistic Regression, SVM, Neural Networks, PCA, ICA, k-
means, EM, Naive Bayes, etc.
Workflow
• Summation Form:
– Reformulates computations as summations over data
subsets, ideal for parallel execution.
• Map-Reduce Framework:
– Map Phase: Data is split among cores, and partial
computations are performed.
– Reduce Phase: Partial results are aggregated, and the final
computations are completed.
• Parallel Gradient Descent:
– Key algorithms such as Logistic Regression and Neural
Networks use batch gradient descent, whose gradient is a sum
over examples and therefore parallelizes efficiently (see the
sketch below).
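Below is a sketch of this map/reduce pattern on a multicore machine using Python's multiprocessing module. It is an assumed illustration rather than the authors' implementation; partial_sum, map_reduce, n_cores, and the choice of statistic are all hypothetical.

```python
import numpy as np
from multiprocessing import Pool

def partial_sum(chunk):
    """Map phase: compute a partial summation-form statistic on one chunk."""
    X, y = chunk
    return (y[:, None] * X).sum(axis=0)

def map_reduce(X, y, n_cores=4):
    # Split the data into one chunk per core.
    X_chunks = np.array_split(X, n_cores)
    y_chunks = np.array_split(y, n_cores)
    with Pool(n_cores) as pool:
        partials = pool.map(partial_sum, list(zip(X_chunks, y_chunks)))
    # Reduce phase: aggregate the partial results.
    return np.sum(partials, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 5))
    y = rng.integers(0, 2, size=10_000).astype(float)
    print(map_reduce(X, y))
```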
Algorithms Demonstrated
• Locally Weighted Linear Regression (LWLR): Parallelizes matrix
multiplications and statistics aggregation.
• Logistic Regression: Computes gradients and the Hessian in
parallel for optimization.
• Neural Networks (Backpropagation): Parallelizes error
backpropagation and gradient computations for weight updates.
• Principal Component Analysis (PCA): Parallelizes covariance-matrix
and mean calculations.
• k-means Clustering: Parallelizes distance computations and the
per-cluster sums used to recompute centroids (sketched below).
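As an example of how one of the listed algorithms fits the pattern, here is a rough sketch (my own, under the summation-form idea, not the authors' code) of a single parallel k-means iteration: each chunk reports per-cluster sums and counts (map), which are then added and divided to obtain the new centroids (reduce). kmeans_map, kmeans_reduce, and the data are illustrative.

```python
import numpy as np

def kmeans_map(X_chunk, centroids):
    """Map: assign points to the nearest centroid; return per-cluster sums and counts."""
    k, d = centroids.shape
    dists = ((X_chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    sums = np.zeros((k, d))
    counts = np.zeros(k)
    for j in range(k):
        mask = labels == j
        sums[j] = X_chunk[mask].sum(axis=0)
        counts[j] = mask.sum()
    return sums, counts

def kmeans_reduce(partials, old_centroids):
    """Reduce: add the partial sums/counts and recompute the centroids."""
    total_sums = sum(p[0] for p in partials)
    total_counts = sum(p[1] for p in partials)
    new = old_centroids.copy()
    nonempty = total_counts > 0
    new[nonempty] = total_sums[nonempty] / total_counts[nonempty, None]
    return new

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 2))
centroids = X[:3].copy()                  # naive initialisation
chunks = np.array_split(X, 4)             # one chunk per core (here run serially)
partials = [kmeans_map(c, centroids) for c in chunks]
centroids = kmeans_reduce(partials, centroids)
```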
Logistic Regression
• In logistic regression, we aim to optimize the parameters θ of the
hypothesis

      hθ(x) = 1 / (1 + exp(−θᵀx))

– to predict probabilities for binary classification.
• The negative log-likelihood gives the log-loss (cost function):

      J(θ) = − Σ_{i=1..m} [ yi · log(hθ(xi)) + (1 − yi) · log(1 − hθ(xi)) ]
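A small numpy sketch (not from the slides) of the hypothesis and cost above; X is an m-by-n data matrix, y a vector of 0/1 labels, and theta the parameter vector, all hypothetical names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = 1 / (1 + exp(-theta^T x)), evaluated for every row of X."""
    return sigmoid(X @ theta)

def log_loss(theta, X, y):
    """J(theta): negative log-likelihood summed over the m examples."""
    h = hypothesis(theta, X)
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
```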
Gradient Computation
• The gradient of J(θ) with respect to θ is

      ∇J(θ) = Σ_{i=1..m} (hθ(xi) − yi) · xi

• This is already in summation form, since we compute a
summation over all data points.
• Each term can be computed independently for each data
point (xi, yi).
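The same gradient in vectorized numpy form (a sketch; names as in the previous block): the matrix expression Xᵀ(h − y) is exactly the summation Σ (hθ(xi) − yi) · xi.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y):
    """grad J(theta) = sum_i (h_theta(x_i) - y_i) * x_i, written as X^T (h - y)."""
    h = sigmoid(X @ theta)
    return X.T @ (h - y)
```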
Parallelizing Logistic Regression
• Step 1: Partition the data
– Divide the dataset into P disjoint subsets Dp​(one for each processor).
– Each subset contains mp​examples:

• Step 2: Compute Partial Gradients Locally


– On each processor p, compute the partial gradient over its assigned
subset:
Cont..

• Aggregate gradients
– The partial gradients are summed to form the total gradient:

      ∇J(θ) = ∇J1(θ) + ∇J2(θ) + … + ∇JP(θ)

• Update parameters
– Once the total gradient is computed, update the parameters θ
using gradient descent:

      θ := θ − α · ∇J(θ)

– where α is the learning rate.
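Putting the map and reduce phases together, here is a sketch (assumed, not the authors' code) of one batch gradient-descent step; the Python loop over chunks stands in for the P cores, each of which would compute its partial gradient independently. gradient_step, alpha, and the synthetic data are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def partial_gradient(theta, X_p, y_p):
    """Map phase: gradient contribution of one processor's subset D_p."""
    h = sigmoid(X_p @ theta)
    return X_p.T @ (h - y_p)

def gradient_step(theta, X, y, n_cores=4, alpha=0.001):
    """One iteration of parallel batch gradient descent."""
    X_chunks = np.array_split(X, n_cores)
    y_chunks = np.array_split(y, n_cores)
    # Map: each chunk's partial gradient (one chunk per core in a real run).
    partials = [partial_gradient(theta, Xp, yp)
                for Xp, yp in zip(X_chunks, y_chunks)]
    # Reduce: aggregate the partial gradients into the total gradient.
    grad = np.sum(partials, axis=0)
    # Gradient-descent update: theta := theta - alpha * grad J(theta).
    return theta - alpha * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = (rng.uniform(size=500) < sigmoid(X @ theta_true)).astype(float)

theta = np.zeros(3)
for _ in range(200):
    theta = gradient_step(theta, X, y)
```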


Example
• Suppose there are 500 training samples, partitioned across 5 cores.
• Let D1, D2, …, D5 be the data subsets, one per core, where each subset
contains 100 samples:
– Core 1: D1 = {(x1, y1), …, (x100, y100)}
– Core 2: D2 = {(x101, y101), …, (x200, y200)}
– Core 3: D3 = {(x201, y201), …, (x300, y300)}
– Core 4: D4 = {(x301, y301), …, (x400, y400)}
– Core 5: D5 = {(x401, y401), …, (x500, y500)}
• Each core p computes the partial gradient ∇Jp(θ) for its assigned
subset Dp (see the sketch below).
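A concrete version of this example as a sketch with random data (all names hypothetical): 500 samples are split into 5 chunks of 100, each chunk's partial gradient is computed as if on its own core, and the summed result matches the single-core batch gradient exactly, illustrating that the parallelization is exact rather than approximate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y):
    """grad J(theta) = sum_i (h_theta(x_i) - y_i) * x_i."""
    return X.T @ (sigmoid(X @ theta) - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                 # 500 training samples, 4 features
y = rng.integers(0, 2, size=500).astype(float)
theta = rng.normal(size=4)

# Partition into D_1 ... D_5, one subset of 100 samples per core.
X_parts = np.array_split(X, 5)
y_parts = np.array_split(y, 5)
assert all(len(p) == 100 for p in X_parts)

# Map: each "core" computes its partial gradient.  Reduce: sum them.
partials = [gradient(theta, Xp, yp) for Xp, yp in zip(X_parts, y_parts)]
total = np.sum(partials, axis=0)

# The aggregated gradient equals the single-core batch gradient.
assert np.allclose(total, gradient(theta, X, y))
```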
Aggregating Gradients
• After all cores compute their partial gradients, they are
aggregated to form the total gradient:

      ∇J(θ) = ∇J1(θ) + ∇J2(θ) + ∇J3(θ) + ∇J4(θ) + ∇J5(θ)

• Update the parameters with a gradient-descent step:

      θ := θ − α · ∇J(θ)
Time complexity
• Logistic Regression (m examples, n features, P cores)
• Single machine / single core
– O(mn² + n³)
• Multi-core
– O(mn²/P + n² log(P)): the data-dependent mn² work is divided
across the P cores, and aggregating the n×n partial results
costs n² log(P).
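Some illustrative arithmetic for these bounds (my numbers, using the multi-core term as reconstructed above): with a large m and moderate n, the mn²/P term dominates and shrinks roughly linearly with the number of cores, while the n² log(P) aggregation overhead is comparatively tiny.

```python
import math

m, n, P = 1_000_000, 100, 16          # illustrative sizes: examples, features, cores

single_core = m * n**2 + n**3                     # O(mn^2 + n^3)
multi_core = m * n**2 / P + n**2 * math.log2(P)   # O(mn^2/P + n^2 log P)

print(f"single-core work    ~ {single_core:.2e}")
print(f"multi-core work     ~ {multi_core:.2e}")
print(f"approximate speedup ~ {single_core / multi_core:.1f}x")  # close to P = 16
```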
Q and A?
