Big Data Processing Concepts

S.Kavitha
Head & Assistant Professor
Department of Computer Science
Sri Sarada Niketan College of Science for Women, Karur.
Parallel Data Processing
• Parallel data processing involves the simultaneous execution of multiple sub-tasks that collectively comprise a larger task. The goal is to reduce execution time by dividing a single larger task into multiple smaller tasks that run concurrently.
• Although parallel data processing can be achieved through multiple networked machines, it is more typically achieved within the confines of a single machine with multiple processors.
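
As a minimal sketch of this idea (the prime-counting task, the range bounds and the function names are illustrative assumptions, not from the slides), the following Python program divides one larger task into sub-tasks that run concurrently on the processors of a single machine:

# Minimal sketch of parallel data processing on a single multi-processor
# machine, using only the Python standard library.
from multiprocessing import Pool

def is_prime(n):
    """Illustrative CPU-bound sub-task: primality test by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def count_primes(bounds):
    """One sub-task: count primes in a half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))

if __name__ == "__main__":
    # Divide the single larger task (0..200000) into 4 smaller sub-tasks.
    chunks = [(i * 50_000, (i + 1) * 50_000) for i in range(4)]
    with Pool(processes=4) as pool:
        # The sub-tasks execute concurrently; their results are combined.
        total = sum(pool.map(count_primes, chunks))
    print("primes found:", total)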
Distributed Data Processing
• Distributed data processing is closely related to parallel data processing in that the same principle of “divide-and-conquer” is applied. However, distributed data processing is always achieved through physically separate machines that are networked together as a cluster.
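
A genuinely distributed job requires networked machines, so the sketch below only simulates the divide-and-conquer principle on one machine: each worker process stands in for a separate cluster node (the node count and the data are assumptions made for the example):

# Hedged sketch of the distributed "divide-and-conquer" principle.
# Each worker process stands in for a physically separate cluster node;
# a real deployment would ship partitions to networked machines instead.
from concurrent.futures import ProcessPoolExecutor

def node_sum(partition):
    """Work performed on one simulated node: sum its share of the data."""
    return sum(partition)

if __name__ == "__main__":
    data = list(range(1_000_000))          # the larger task
    n_nodes = 4
    # Divide: one partition per simulated node.
    partitions = [data[i::n_nodes] for i in range(n_nodes)]
    with ProcessPoolExecutor(max_workers=n_nodes) as executor:
        partials = list(executor.map(node_sum, partitions))
    # Conquer: merge the partial results into the final answer.
    print("total:", sum(partials))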
Hadoop
• Hadoop is an open-source framework for large-scale data storage and data processing that is compatible with commodity hardware. The Hadoop framework has established itself as a de facto industry platform for contemporary Big Data solutions. It can be used as an ETL engine or as an analytics engine for processing large amounts of structured, semi-structured and unstructured data. From an analysis perspective, Hadoop implements the MapReduce processing framework.
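
Hadoop jobs are usually written against its Java API; the plain-Python sketch below is not that API but a hedged, single-machine illustration of the map, shuffle and reduce phases of the MapReduce model, applied to an illustrative word-count task:

# Single-machine sketch of the MapReduce processing model (illustration
# only; it mimics the map -> shuffle -> reduce phases, not Hadoop's API).
from collections import defaultdict

def map_phase(record):
    """Map: emit (key, value) pairs; here, (word, 1) for each word."""
    return [(word, 1) for word in record.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle: group all values by key across every mapper's output."""
    groups = defaultdict(list)
    for pairs in mapped_pairs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key."""
    return key, sum(values)

if __name__ == "__main__":
    records = ["hadoop stores big data", "mapreduce processes big data"]
    mapped = [map_phase(r) for r in records]                       # map
    grouped = shuffle_phase(mapped)                                # shuffle
    counts = dict(reduce_phase(k, v) for k, v in grouped.items())  # reduce
    print(counts)   # e.g. {'hadoop': 1, 'stores': 1, 'big': 2, 'data': 2, ...}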
Processing Workloads
• A processing workload in Big Data is defined as the amount and nature of data that is processed within a certain amount of time.
• Workloads are usually divided into two types: batch and transactional.
Batch
• Batch processing, also known as offline processing, involves processing data in batches and usually imposes delays, which in turn results in high-latency responses.
• Batch workloads typically involve large quantities of data with sequential reads/writes and comprise groups of read or write queries.
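
A hedged sketch of such a workload follows: a batch of records is first written to disk, then aggregated in one sequential pass (the file name and record layout are assumptions made for the example):

# Hedged sketch of a batch workload: one sequential pass over a file of
# records (file name and schema are illustrative, not from the slides).
import csv, tempfile, os

# Prepare an on-disk batch of records: (customer_id, amount).
path = os.path.join(tempfile.gettempdir(), "sales_batch.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows([("c1", 10.0), ("c2", 5.5), ("c1", 2.5)])

# Batch job: sequential reads over the whole dataset, one aggregate answer.
totals = {}
with open(path, newline="") as f:
    for customer, amount in csv.reader(f):
        totals[customer] = totals.get(customer, 0.0) + float(amount)

print(totals)   # e.g. {'c1': 12.5, 'c2': 5.5}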
Transactional
• Transactional processing is also known as online processing. Transactional workload processing follows an approach whereby data is processed interactively without delay, resulting in low-latency responses. Transactional workloads involve small amounts of data with random reads and writes.
• OLTP and operational systems, which are generally write-intensive, fall within this category. Although these workloads contain a mix of read/write queries, they are generally more write-intensive than read-intensive.
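
The sketch below hedges an OLTP-style workload using SQLite from the standard library (standing in for an operational database; the accounts table and the amounts are illustrative): many small transactions perform random single-row writes, each followed by an immediate, low-latency read:

# Hedged sketch of a transactional (OLTP-style) workload: many small,
# interactive, write-intensive operations touching single rows at random.
import sqlite3, random

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, 100.0) for i in range(10)])

for _ in range(20):
    acct = random.randrange(10)            # random row, small amount of data
    with conn:                             # each iteration is one transaction
        conn.execute("UPDATE accounts SET balance = balance + 1 WHERE id = ?",
                     (acct,))
    # Interactive, low-latency read of the row just written.
    row = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                       (acct,)).fetchone()
    print(f"account {acct}: {row[0]}")

conn.close()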
Cluster
• In the same manner that clusters provide the necessary support to create horizontally scalable storage solutions, clusters also provide the mechanism to enable distributed data processing with linear scalability. Since clusters are highly scalable, they provide an ideal environment for Big Data processing, as large datasets can be divided into smaller datasets and then processed in parallel in a distributed manner.
Processing in Batch Mode
• In batch mode, data is processed offline in batches, and the response time can vary from minutes to hours. In addition, data must be persisted to disk before it can be processed.
• Batch mode generally involves processing a range of large datasets, either on their own or joined together, essentially addressing the volume and variety characteristics of Big Data datasets.
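
To illustrate both points on this slide, persistence before processing and datasets joined together, here is a hedged batch-mode sketch (the file names, the JSON-lines layout and the schemas are assumptions made for the example):

# Hedged sketch of batch-mode processing: both datasets are persisted to
# disk first, then joined offline in a single pass.
import json, tempfile, os

tmp = tempfile.gettempdir()
orders_path = os.path.join(tmp, "orders.jsonl")
customers_path = os.path.join(tmp, "customers.jsonl")

# Persist the datasets to disk before any processing takes place.
with open(customers_path, "w") as f:
    for c in [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi"}]:
        f.write(json.dumps(c) + "\n")
with open(orders_path, "w") as f:
    for o in [{"customer_id": 1, "total": 20.0},
              {"customer_id": 1, "total": 5.0},
              {"customer_id": 2, "total": 12.5}]:
        f.write(json.dumps(o) + "\n")

# Offline batch job: load one side, then join the other against it.
customers = {}
with open(customers_path) as f:
    for line in f:
        c = json.loads(line)
        customers[c["id"]] = c["name"]

spend = {}
with open(orders_path) as f:
    for line in f:
        o = json.loads(line)
        name = customers[o["customer_id"]]       # the join step
        spend[name] = spend.get(name, 0.0) + o["total"]

print(spend)   # e.g. {'Asha': 25.0, 'Ravi': 12.5}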
