Deep Learning Over Big Data Stacks: Challenges and A Comprehensive Study On HPC Clusters

This document discusses the challenges of deep learning over big data. It notes that deep learning is seeing a resurgence due to improved model accuracy, larger datasets, and increased computing power. Deep learning over big data provides benefits like building powerful analytics pipelines that leverage existing big data deployments. However, there are challenges managing the 3 V's of big data - volume, variety, and velocity - when using deep learning. Techniques are needed to efficiently handle large and diverse datasets as well as data streaming in at high speeds.

Uploaded by

dipishankar
Deep Learning over Big Data Stacks: Challenges and A Comprehensive Study on HPC Clusters

Big Data and Deep Learning
• Deep Learning is a subset of Machine Learning
– But it is perhaps the most radical and revolutionary subset
• Deep Learning is going through a resurgence
– Model: Excellent accuracy for deep/convolutional neural networks
– Data: Public availability of versatile datasets like MNIST, CIFAR, and ImageNet
– Capability: Unprecedented computing and communication capabilities: Multi-/Many-Core, GPGPUs, Xeon Phi, InfiniBand, RoCE, etc.
• Big Data has become one of the most important elements in business analytics
– Increasing demand for getting Big Value out of Big Data to keep revenue growing continuously
[Figures: MNIST handwritten digits; Deep Neural Network. Courtesy: http://www.zdnet.com/article/caffe2-deep-learning-wide-ambitions-flexibility-scalability-and-advocacy/]

Deep Learning over Big Data (DLoBD)
• Deep Learning over Big Data (DLoBD) is one of the most efficient data analysis paradigms
• Benefits of the DLoBD approach
– Easily build a powerful data analytics pipeline, or enhance existing workflows
• E.g., Flickr DL/ML Pipeline, “How Deep Learning Powers Flickr”, http://bit.ly/1KIDfof
[Figure: pipeline stages: (1) Prepare Datasets @Scale -> (2) Deep Learning @Scale -> (3) Non-deep learning analytics @Scale -> (4) Apply ML model @Scale]
– Better data locality
– Efficient resource sharing and cost effectiveness => Leverage existing Big Data deployments

The 3 V’s of Big Data: Deep Learning Challenges
[Figure: the 3 V’s of Big Data (Volume, Variety, Velocity)]
• Managing Big Data Volume: Large number of examples (inputs), large varieties of class types (outputs), and very high dimensionality (attributes)
– Distributed frameworks over parallelized machines
– CPU/GPU SIMD for training speed with accuracy; support both model and data parallelism
• Data incompleteness, with unlabeled data or noisy labels
– E.g., the 80 Million Tiny Images database: low-resolution color images over 79,000 search terms
– Need for more efficient cost functions or semi-supervised strategies
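
The data-parallelism bullet above can be illustrated with a minimal sketch. It is not taken from the slides: the "workers" are simulated in a single process, the model is a toy linear regression, and the names `gradient` and `data_parallel_sgd` are hypothetical. Each worker computes a gradient on its own shard of the data, the gradients are averaged, and one shared update is applied.

```python
import random

def gradient(w, b, shard):
    """Mean-squared-error gradient of y ~ w*x + b over one data shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb

def data_parallel_sgd(data, num_workers=4, lr=0.05, steps=500):
    # Split the examples into equally sized shards, one per simulated worker
    # (any remainder that does not fill a shard is dropped in this sketch).
    shard_size = len(data) // num_workers
    shards = [data[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Every worker uses the same weights but only its own shard.
        grads = [gradient(w, b, s) for s in shards]
        # Stand-in for an all-reduce: average the per-worker gradients.
        gw = sum(g[0] for g in grads) / num_workers
        gb = sum(g[1] for g in grads) / num_workers
        w -= lr * gw
        b -= lr * gb
    return w, b

# Assumed toy data: noiseless samples of y = 3x + 1.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(400)]
data = [(x, 3.0 * x + 1.0) for x in xs]
w, b = data_parallel_sgd(data)
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so the result matches single-machine training; in a real cluster the averaging step is where the communication cost appears. Model parallelism, which splits the model itself across machines, is not shown here.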
The 3 V’s of Big Data: Deep Learning Challenges
[Figure: the 3 V’s of Big Data (Volume, Variety, Velocity)]
• Managing Big Data Variety: Multiple modalities
– Data source variety: audio streams, graphics and animations, unstructured text, etc.
– Step 1: Learn data representations from each individual data source using DL
– Step 2: Learn shared representations capable of capturing correlations across multiple modalities
– E.g., multimodal Deep Boltzmann Machine (DBM) [Srivastava and Salakhutdinov]
• multiple stacked RBMs for each modality
• an additional layer of binary hidden units on top for joint representation
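
The two steps above can be sketched as a bottom-up pass through a two-modality network. This shows the wiring only and is not the DBM of Srivastava and Salakhutdinov: the weights are random and untrained, there is no RBM training or DBM inference, and the layer sizes and the names `image_stack`, `text_stack`, and `joint_layer` are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def random_layer(n_in, n_out, rng):
    """Weights for one layer; random here, learned in a real model."""
    return [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def forward(layer, v):
    """Activation probabilities of the hidden units given input v."""
    return [sigmoid(sum(w * x for w, x in zip(row, v))) for row in layer]

def encode(stack, v):
    """Step 1: pass one modality through its own stack of layers."""
    for layer in stack:
        v = forward(layer, v)
    return v

rng = random.Random(0)
# Hypothetical sizes: an 8-dimensional "image" input and a 5-dimensional "text" input.
image_stack = [random_layer(8, 6, rng), random_layer(6, 4, rng)]
text_stack = [random_layer(5, 4, rng), random_layer(4, 3, rng)]
joint_layer = random_layer(4 + 3, 5, rng)

image = [rng.random() for _ in range(8)]
text = [rng.random() for _ in range(5)]

# Step 2: the joint layer sees both modality codes at once.
joint_input = encode(image_stack, image) + encode(text_stack, text)
joint_probs = forward(joint_layer, joint_input)
# Binary hidden units: sample each unit on/off from its activation probability.
joint_units = [1 if rng.random() < p else 0 for p in joint_probs]
```

Because the joint layer takes the concatenated per-modality codes as input, each joint unit can respond to combinations of features from both modalities, which is what lets a trained model capture cross-modal correlations.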
The 3 V’s of Big Data: Deep Learning Challenges
[Figure: the 3 V’s of Big Data (Volume, Variety, Velocity)]
• Managing Big Data Velocity: Data is being generated at extremely high speed and needs to be processed in a timely manner
– Need for Online Learning approaches
– Limited progress with Online DL over conventional neural networks
– Mini-batches with SGD for a good balance between computer memory and running time
• A further challenge of high velocity is that data are often non-stationary
– Temporal locality of data => significant degree of correlation
– Ability to learn the data as a stream
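
The mini-batch and streaming bullets above can be illustrated with a minimal sketch, assuming a toy one-parameter model and a synthetic stream whose target changes halfway through. The names `stream` and `online_minibatch_sgd` and all constants are hypothetical. Only one mini-batch is held in memory at a time, and the model keeps adapting after the data distribution shifts.

```python
import random

def stream(n, seed=0):
    """Hypothetical data stream: y = w_true * x, where w_true changes halfway."""
    rng = random.Random(seed)
    for t in range(n):
        w_true = 2.0 if t < n // 2 else -1.0  # non-stationary target
        x = rng.uniform(-1, 1)
        yield x, w_true * x

def online_minibatch_sgd(examples, batch_size=16, lr=0.5):
    w = 0.0
    batch = []
    history = []  # weight after each mini-batch update
    for x, y in examples:
        batch.append((x, y))
        if len(batch) == batch_size:
            # Squared-error gradient averaged over the current mini-batch only.
            grad = sum(2 * (w * xi - yi) * xi for xi, yi in batch) / batch_size
            w -= lr * grad
            history.append(w)
            batch = []  # discard the batch; the stream is never stored
    return w, history

w_final, history = online_minibatch_sgd(stream(8000))
```

Here `history` tracks the weight moving toward 2.0 during the first half of the stream and then toward -1.0 after the shift. A larger `batch_size` gives smoother gradient estimates at the cost of more memory per update and a slower reaction to change.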