
Big Data Technology

(NoSQL and Hadoop)

NoSQL (Not Only SQL)

It is a lightweight, open-source, non-relational database that does not expose the
standard SQL interface.

NoSQL databases are widely used in big data and other real-time web applications.

Features of NoSQL:

1. NoSQL databases are non-relational

2. Distributed

3. No full support for ACID properties (consistency is typically relaxed)

4. No fixed table schema

Types of NoSQL Databases

1. Key-value

2. Document

3. Column

4. Graph
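
To make these four models concrete, here is a minimal sketch in plain Python (no real database client; all names and records are illustrative) of how the same user record might be shaped under each model:

```python
# Illustrative only: one "user" record modeled in each of the four
# NoSQL styles, using plain Python structures.

# 1. Key-value: an opaque value addressed by a single key.
kv_store = {
    "user:1001": '{"name": "Asha", "city": "Pune"}',  # value is just a blob
}

# 2. Document: a self-describing, schema-free document per record.
doc_store = {
    "users": [
        {"_id": 1001, "name": "Asha", "city": "Pune",
         "orders": [{"item": "book", "qty": 2}]},  # nesting is allowed
    ]
}

# 3. Column (wide-column): row keys map to columns; rows may carry
# different columns, so there is no fixed table schema.
column_store = {
    "users": {
        1001: {"name": "Asha", "city": "Pune"},
        1002: {"name": "Ravi"},  # missing columns are simply absent
    }
}

# 4. Graph: nodes and relationships, both of which can carry properties.
graph_store = {
    "nodes": {1001: {"label": "User", "name": "Asha"},
              2001: {"label": "City", "name": "Pune"}},
    "edges": [(1001, "LIVES_IN", 2001)],
}
```

Each shape favors a different access pattern: key-value for fast lookups, document for nested records, column for sparse wide rows, and graph for traversing relationships.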
Hadoop

Hadoop is an open-source platform for storing and processing diverse data types that
enables data-driven enterprises to rapidly derive full value from all their data.

History of Hadoop

The original creators of Hadoop are Doug Cutting (formerly at Yahoo!, now at
Cloudera) and Mike Cafarella (now teaching at the University of Michigan in
Ann Arbor). Doug and Mike were building a project called “Nutch” with the
goal of creating a large Web index. They saw the MapReduce and GFS papers
from Google, which were highly relevant to the problem Nutch was trying to
solve. They integrated the concepts from MapReduce and GFS into Nutch;
later, these two components were pulled out to form the genesis of the
Hadoop project.

The name “Hadoop” itself comes from a yellow plush elephant toy belonging to
Doug’s son.

The scalability and elasticity of free, open-source Hadoop running on
standard hardware allow organizations to hold onto more data than ever
before.

Hadoop handles a variety of workloads, including search, log processing,
recommendation systems, data warehousing, and video/image analysis.

Apache Hadoop is an open-source project. Hadoop is able to store any
kind of data in its native format and to perform a wide variety of analyses and
transformations on that data. Hadoop stores terabytes, and even petabytes, of
data inexpensively. It is robust and reliable, and it handles hardware and system
failures automatically, without losing data or interrupting data analyses.

Hadoop runs on clusters of commodity servers and each of those servers
has local CPUs and disk storage that can be leveraged by the system.
The two critical components of Hadoop are:
1. The Hadoop Distributed File System (HDFS). HDFS is the storage
system for a Hadoop cluster. When data lands in the cluster, HDFS breaks
it into pieces and distributes those pieces among the different servers
participating in the cluster. Each server stores just a small fragment of the
complete data set, and each piece of data is replicated on more than one
server.

2. MapReduce. Because Hadoop stores the entire dataset in small pieces across a
collection of servers, analytical jobs can be distributed, in parallel, to each of the
servers storing part of the data. Each server evaluates the question against its
local fragment simultaneously and reports its results back for collation into a
comprehensive answer. MapReduce is the agent that distributes the work and
collects the results. (Both components are sketched in toy form below.)
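
First, a toy model of the HDFS block-splitting and replication idea from point 1, in Python. This models only the placement logic, not real HDFS; the 128 MB block size and replication factor of 3 are HDFS defaults, and the server names are invented:

```python
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size (128 MB)
REPLICATION = 3                 # HDFS default replication factor

def place_blocks(file_size, servers):
    """Toy model of HDFS placement: split a file into fixed-size blocks
    and assign each block to REPLICATION distinct servers."""
    num_blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    placement = {}
    for block_id in range(num_blocks):
        # Rotate through the cluster so replicas land on distinct servers.
        placement[block_id] = [servers[(block_id + r) % len(servers)]
                               for r in range(REPLICATION)]
    return placement

# A 300 MB file on a 5-node cluster -> 3 blocks, each stored on 3 servers.
print(place_blocks(300 * 1024 * 1024, ["s1", "s2", "s3", "s4", "s5"]))
```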
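
Second, the classic word-count illustration of the MapReduce pattern from point 2, again in plain Python. No Hadoop APIs are used here; a real job would be written against Hadoop's Mapper/Reducer interfaces or Hadoop Streaming:

```python
from collections import defaultdict

def map_phase(fragment):
    """Map: emit (word, 1) for every word in this server's local fragment."""
    for line in fragment:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: collate the partial results into a comprehensive answer."""
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

# Two "servers", each holding one fragment of the data set.
fragments = [["big data big ideas"], ["data moves fast"]]
all_pairs = [pair for frag in fragments for pair in map_phase(frag)]
print(reduce_phase(all_pairs))
# -> {'big': 2, 'data': 2, 'ideas': 1, 'moves': 1, 'fast': 1}
```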


Both HDFS and MapReduce are designed to continue to work in
the face of system failures.

Because of the way that HDFS and MapReduce work, Hadoop
provides scalable, reliable, and fault-tolerant services for data storage and
analysis at very low cost.

Compute Cluster

[Figure: a compute cluster. The input data set is split into DFS blocks (Block 1, Block 2, Block 3), each replicated on more than one server; a Map task runs on each server against its local blocks, and a Reduce task collates the Map outputs into the final result.]

Old vs. New Approaches



The old way was a data and analytics technology stack with different layers
“cross-communicating data” and running on expensive “scale-up” hardware.

The new way is a data and analytics platform that does all the data
processing and analytics in one “layer,” without moving data back and
forth.

Summary

1. The technology stack has changed. New proprietary technologies and
open-source inventions enable different approaches that make it easier and
more affordable to store, manage, and analyze data.

2. Hardware and storage are affordable and continue to get cheaper, enabling
massive parallel processing.

3. The variety of data is on the rise, as is the ability to handle unstructured
data.

Data Discovery: Work the Way People’s Minds Work


Data discovery tools let users explore and visualize data interactively; leading vendors include Tableau Software and QlikTech International (QlikView).

Open-Source Technology for Big Data Analytics

 Open-source software is computer software that is available in
source code form under an open-source license that permits users to
study, change, and improve it, and at times also to distribute the
software.

 Although the source code is released, there are still governing bodies
and agreements in place. The most prominent and popular example is
the GNU General Public License (GPL), which “allows free
distribution under the condition that further developments and
applications are put under the same license.” This ensures that the
products keep improving over time for the greater population of users.

 Some other open-source projects are managed and supported by
commercial companies, such as Cloudera, that provide extra
capabilities, training, and professional services for open-source
projects such as Hadoop.

 “You can make it into what you want and what you need. If you
come up with an idea, you can put it to work immediately. That’s the
advantage of the open-source stack: flexibility, extensibility, and
lower cost.”

 “One of the great benefits of open source lies in the flexibility of the
adoption model: you download and deploy it when you need it.”

 The pace of software development has accelerated dramatically because of
open-source software.

 The old model was top-down, slow, inflexible, and expensive.
The new software development model is bottom-up, fast,
flexible, and considerably less costly.

 A traditional proprietary stack is defined and controlled by a
single vendor, or by a small group of vendors. It reflects the old
command-and-control mentality of the traditional corporate
world and the old economic order.

 An open-source stack is defined by its community of users and
contributors. No one “controls” an open-source stack, and no one
can predict exactly how it will evolve. The open-source stack
reflects the new realities of the networked global economy, which is
increasingly dependent on big data.

The Cloud and Big Data

With a cloud model, you pay on a subscription basis with no upfront capital
expense. You don’t incur the typical 30 percent annual maintenance fees, and all
updates to the platform are automatically available.
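
The arithmetic behind that claim can be sketched as follows. Every figure here is hypothetical; only the 30 percent maintenance rate comes from the text:

```python
# Hypothetical figures, for illustration only: cumulative cost of an
# upfront license plus ~30% annual maintenance vs. a subscription.
LICENSE = 100_000          # one-time on-premises license fee (assumed)
MAINTENANCE_RATE = 0.30    # the "typical 30 percent" annual maintenance fee
SUBSCRIPTION = 40_000      # annual cloud subscription (assumed)

for years in (1, 3, 5):
    on_prem = LICENSE + LICENSE * MAINTENANCE_RATE * years
    cloud = SUBSCRIPTION * years
    print(f"{years} yr: on-premises ${on_prem:,.0f} vs. cloud ${cloud:,.0f}")
```

Whether the subscription wins depends entirely on the assumed prices and time horizon; the point of the model is only that the cloud shifts cost from upfront capital to pay-as-you-go expense.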

The ability to build massively scalable platforms, where you have the option to
keep adding new products and services at zero additional cost, is giving rise to
business models that weren’t possible before.
