Big Data Lecture

Uploaded by

sonia
Copyright
© All Rights Reserved

Big Data Query Language

20CS2005
The student will be able to:
1. understand the importance and challenges of big data
2. design applications using Hadoop and RHadoop
3. identify the appropriate function of the Pig data model to be used in development
4. model big data application schemas and use HiveQL
5. develop applications with Cassandra
6. build applications with HDFS and MapReduce
Module 2: Data Analysis using R and Hadoop
Features of R language - Hadoop features - HDFS and MapReduce architecture - R and Hadoop Integrated Programming Environment (RHIPE) introduction - Architecture of RHIPE - RHIPE function reference - RHadoop introduction - Architecture of RHadoop - RHadoop function reference - SQL on Hadoop.
Big Data Hadoop and Spark
History of Spark

2009: Started at UC Berkeley AMPLab by Matei Zaharia.
2010: Open sourced under a BSD license.
2013: Given to the Apache Software Foundation, and the license was changed to Apache 2.0.
2014: Became an Apache top-level project. Used by Databricks to set a world record in large-scale sorting.
Present: Exists as a next-generation real-time and batch processing framework.

Limitations of MapReduce in Hadoop

Unsuitable for real-time processing
Because MapReduce is batch oriented, jobs take minutes to execute, depending on the amount of data and the number of nodes in the cluster.

Unsuitable for trivial operations
Operations such as filters and joins often mean rewriting the job, and expressing them in the key-value pattern makes them complex.

Unsuitable for large data on network
MapReduce works on the data locality principle, so it is a poor fit for jobs that must shuffle a lot of data over the network.
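
To make the key-value point concrete, here is a minimal sketch of a filter plus a join written in the map, shuffle, reduce style. It is plain Python that imitates the pattern, not the Hadoop API. The helper names (map_phase, shuffle, reduce_phase) and the users and orders data are made up for illustration.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record and collect the (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Call the reducer once per key, in sorted key order."""
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out

# Hypothetical inputs: (user_id, name) and (user_id, order_amount).
users = [(1, "asha"), (2, "ben"), (3, "chen")]
orders = [(1, 250), (1, 40), (3, 75), (2, 5)]

def user_mapper(record):
    user_id, name = record
    return [(user_id, ("U", name))]        # tag the value with its source

def order_mapper(record):
    user_id, amount = record
    if amount >= 50:                       # the filter lives inside the mapper
        return [(user_id, ("O", amount))]
    return []

def join_reducer(user_id, tagged_values):
    # Separate the two sources again, then pair them up (reduce-side join).
    names = [v for tag, v in tagged_values if tag == "U"]
    amounts = [v for tag, v in tagged_values if tag == "O"]
    return [(user_id, name, amount) for name in names for amount in amounts]

pairs = map_phase(users, user_mapper) + map_phase(orders, order_mapper)
joined = reduce_phase(shuffle(pairs), join_reducer)
print(joined)
```

Even for this small task, every record has to be re-keyed and tagged by hand so that the reducer can tell the two inputs apart. In SQL the same thing is a one-line join with a WHERE clause.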
Unsuitable for OLTP

OLTP requires a large number of short transactions, which a batch-oriented framework like MapReduce cannot serve.

Unsuitable for processing graphs

Graph processing on Hadoop relies on a separate library (Apache Giraph), which adds complexity on top of MapReduce.

Unsuitable for iterative execution

Because each MapReduce job is stateless, it does not fit use cases like k-means that need iterative execution.
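
The iterative case can be sketched as follows, assuming one-dimensional points and a plain Python imitation of the map, shuffle, reduce steps rather than real Hadoop jobs. The function names kmeans_pass and kmeans, the sample points, and the starting centroids are all made up for illustration.

```python
from collections import defaultdict

def kmeans_pass(points, centroids):
    """One stateless pass: assign each point, group by cluster, average."""
    # Map: emit (index of nearest centroid, point).
    pairs = [
        (min(range(len(centroids)), key=lambda i: abs(p - centroids[i])), p)
        for p in points
    ]
    # Shuffle: group the points by cluster index.
    groups = defaultdict(list)
    for index, point in pairs:
        groups[index].append(point)
    # Reduce: the new centroid is the mean; keep the old one if a cluster is empty.
    return [
        sum(groups[i]) / len(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]

def kmeans(points, centroids, max_iters=20, tol=1e-9):
    """Driver loop: the only state carried between passes is the centroid list."""
    passes = 0
    for _ in range(max_iters):
        updated = kmeans_pass(points, centroids)
        passes += 1
        converged = all(abs(a - b) <= tol for a, b in zip(updated, centroids))
        centroids = updated
        if converged:
            break
    return centroids, passes

# Hypothetical data and starting centroids.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centroids, passes = kmeans(points, [0.0, 5.0])
print(final_centroids, passes)
```

Each call to kmeans_pass stands in for a full job. On a MapReduce cluster, every pass would typically re-read the whole input from storage and write its result back before the next pass can start, and that cost is paid once per iteration.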
THANK YOU