
UNIVERSITY INSTITUTE OF ENGINEERING
COMPUTER SCIENCE ENGINEERING
Bachelor of Engineering (Computer Science & Engineering)
Subject Name: Big Data Analytics
Subject Code: 20CST-471
Distributed machine learning
Mapped with CO5
Prepared By: Er. Ankita Sharma
DISCOVER . LEARN . EMPOWER
Distributed machine learning
• Distributed machine learning refers to the use of multiple computing resources, typically organized in a cluster or other distributed computing environment, to perform machine learning tasks. This approach is essential for handling large-scale datasets, complex models, and computationally intensive operations that cannot be processed efficiently on a single machine. The primary goals of distributed machine learning are to improve scalability, reduce processing time, and make big data workloads tractable.
• Here are the key concepts and aspects of distributed machine learning:
• 1. Distributed Computing Frameworks:
• Apache Spark: A popular open-source distributed computing framework that provides a unified analytics engine for large-scale data processing. Spark includes MLlib, a library for distributed machine learning (see the sketch after this list).
• TensorFlow and PyTorch: Popular deep learning frameworks that can be configured to run in a distributed manner, allowing deep neural networks to be trained across multiple GPUs or servers.
• Dask: A parallel computing library in Python that enables distributed computing for machine learning and other data-intensive tasks.
• Hadoop MapReduce: While not as commonly used for machine learning as Spark, Hadoop MapReduce can also be adapted for distributed machine learning tasks.
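• As a concrete illustration, below is a minimal sketch of distributed training with Spark MLlib; the data path, feature column names, and session setup are illustrative assumptions.

    # Minimal sketch: distributed logistic regression with Spark MLlib.
    # The data path and column names are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("DistributedML").getOrCreate()

    # Spark automatically partitions the DataFrame across the cluster.
    df = spark.read.csv("hdfs:///data/train.csv", header=True, inferSchema=True)

    # MLlib expects the features assembled into a single vector column.
    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    train = assembler.transform(df)

    # Training runs as distributed Spark jobs over the partitioned data.
    model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
    print(model.coefficients)

    spark.stop()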
• 2. Parallelism and Data Distribution:
• Data Parallelism: The dataset is split across multiple nodes or machines, and each machine performs the same computation on its own subset of the data independently (see the sketch after this list).
• Model Parallelism: The components of a single model are distributed across different machines, with each machine responsible for computing specific parts of the model.
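• A minimal sketch of data parallelism with PyTorch's DistributedDataParallel is shown below; the toy model, random data, and launch method are illustrative assumptions.

    # Minimal data-parallelism sketch with PyTorch DistributedDataParallel (DDP).
    # Assumes launch via `torchrun --nproc_per_node=N train.py`, which sets the
    # RANK/WORLD_SIZE environment variables; model and data are toy examples.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group(backend="gloo")  # use "nccl" for multi-GPU training

    model = torch.nn.Linear(10, 1)           # every process holds a full replica
    ddp_model = DDP(model)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for step in range(100):
        # Each rank trains on its own shard of the data (random here for brevity);
        # DDP all-reduces the gradients so every replica stays in sync.
        x, y = torch.randn(32, 10), torch.randn(32, 1)
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()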
• 3. Communication and Synchronization:
• Parameter Server Architectures: In distributed machine learning, parameter servers manage and distribute the model parameters across the cluster. Workers perform computations and update the parameters by communicating with the parameter server (see the sketch after this list).
• Synchronization Strategies: Keeping the different components of the distributed system synchronized is crucial. Strategies include synchronous updates, asynchronous updates, and hybrids of the two.
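• The toy sketch below shows the pull/push cycle of a parameter server in plain Python; real systems run the server and workers on separate machines, and the least-squares gradient on random shards is an illustrative assumption.

    # Toy parameter-server sketch (single process, synchronous updates).
    import numpy as np

    class ParameterServer:
        """Holds the global parameters and applies worker gradients."""
        def __init__(self, dim, lr=0.1):
            self.params = np.zeros(dim)
            self.lr = lr

        def pull(self):
            return self.params.copy()          # workers fetch current parameters

        def push(self, grad):
            self.params -= self.lr * grad      # server applies a worker's update

    def worker_gradient(params, shard):
        # Least-squares gradient on this worker's shard of the data.
        X, y = shard
        return X.T @ (X @ params - y) / len(y)

    server = ParameterServer(dim=3)
    shards = [(np.random.randn(50, 3), np.random.randn(50)) for _ in range(4)]

    # Synchronous rounds: each worker pulls, computes, and pushes in turn.
    for epoch in range(10):
        for shard in shards:
            server.push(worker_gradient(server.pull(), shard))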
• 4. Scaling Algorithms:
• Horizontal Scaling: Increasing the number of machines or nodes in the cluster to handle larger datasets and more complex models (see the sketch after this list).
• Vertical Scaling: Using more powerful hardware or increasing the computational capacity of individual machines.
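• Below is a minimal sketch of horizontal scaling with Dask; LocalCluster is used for illustration, while real deployments attach to a multi-machine scheduler.

    # Horizontal scaling sketch with Dask: adding workers to a running cluster.
    from dask.distributed import Client, LocalCluster
    import dask.array as da

    cluster = LocalCluster(n_workers=2)   # start with two workers
    client = Client(cluster)

    # The array is split into chunks that workers process in parallel.
    x = da.random.random((100_000, 100), chunks=(10_000, 100))
    print(x.mean().compute())

    cluster.scale(4)                      # horizontal scaling: grow to four workers
    print(x.std().compute())              # same code, now spread over more workers

    client.close()
    cluster.close()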
• 5. Fault Tolerance:
• Distributed machine learning systems must be resilient to failures in the cluster. Techniques such as data replication, checkpointing, and fault-tolerant algorithms are employed to handle failures gracefully (see the sketch below).
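• The sketch below shows checkpointing with PyTorch so that training can resume after a failure; the file path and save frequency are illustrative assumptions.

    # Minimal checkpointing sketch for fault tolerance.
    import os
    import torch

    model = torch.nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    ckpt_path = "checkpoint.pt"           # illustrative path
    start_epoch = 0

    # On restart, resume from the last checkpoint instead of starting over.
    if os.path.exists(ckpt_path):
        ckpt = torch.load(ckpt_path)
        model.load_state_dict(ckpt["model"])
        optimizer.load_state_dict(ckpt["optimizer"])
        start_epoch = ckpt["epoch"] + 1

    for epoch in range(start_epoch, 100):
        # ... one epoch of (distributed) training would run here ...
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "epoch": epoch}, ckpt_path)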
• Thank you
