
Unit 4

Big data technology landscape


Two important technologies:
NoSQL and Hadoop
Course objective
• To study two technologies: 1. NoSQL and 2. Hadoop.
• Note that processing does not mean analytics.
Storing deals with how to divide big data into chunks (blocks)
of data, distribute the blocks, and store them, while
processing integrates these distributed blocks again so that
the data can be properly presented to data analytic tools.
Analysis, again, is different from analytics: analysis converts
data into information in the form of reports and visualizations,
while analytics converts information into knowledge using
statistical and mathematical techniques.
Syllabus
• Course objective: study NoSQL and Hadoop technologies
• 1. Distributed computing challenges
• 2. NoSQL
• 3. Hadoop: consisting of HDFS and MapReduce
3.1. History of Hadoop
3.2. Hadoop overview
3.3. Use cases of Hadoop
3.4. Hadoop distributors
• 4. HDFS:
4.1. HDFS daemons: NameNode, DataNode, Secondary
NameNode
4.2. File read, file write, replica processing of data with
Hadoop
4.3. Managing resources and applications with Hadoop YARN
4.1. Distributed computing challenges
1. In a distributed system, since several servers are
networked together, there can be failures of
hardware.
Example: a hard disk failure creates a data retrieval
problem.
2. In a distributed system the data is spread across several machines.
How can it be integrated prior to processing?
Solution: two important technologies, NoSQL and
Hadoop, which we study in this Unit 4.
2. NoSQL
• MySQL is the world's most widely used RDBMS, and runs as a
server providing multi-user access to a number of
databases.
• Oracle Database is an object-relational database
management system (ORDBMS).
• The main difference between Oracle and MySQL is that
MySQL is open source, while Oracle is not.
• SQL stands for Structured Query Language. It is a standard
language for accessing and manipulating databases.
• SQL Server, Oracle, Informix, Postgres, etc. are RDBMSs.
2.1. Introduction to NoSQL
• NoSQL is a distributed database model, while Hadoop is not a
database (Hadoop is a framework).
• NoSQL is open source, non-relational, and scalable.
• There are several databases which follow the NoSQL model.
• NoSQL databases are used in big data and real-time web
applications, such as social media.
• They do not restrict the data to adhere to any schema at the
time of storage.
• They structure the unstructured input data into different
formats, viz. key-value pairs, document oriented, column
oriented, and graph based data, besides structured data.
• They adhere to the CAP theorem and compromise on C (consistency)
in favor of A (availability) and P (partition tolerance).
• They do not support the ACID properties of transactions
(Atomicity, Consistency, Isolation, and Durability).
2.2. Types of NoSQL databases
They can be broadly classified into:
1. Key-value (big hash table) type: they maintain a big hash table
of keys and values (see the sketch below).
Sample key-value pairs:
key           value
first name    Robert
last name     Williams
2. Document type: maintain data as a collection of documents.
A document is the equivalent of a record in an RDBMS, and a collection is
the equivalent of a table in an RDBMS. Sample document:
{"Book Name": "Fundamentals .. ",
"Publisher": "Wiley India",
"year": "2011"
}
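The two shapes above can be illustrated in a few lines of plain Java. This is only a minimal sketch of the idea, not any particular product's API: the in-memory HashMap stands in for a key-value store, and the JSON string stands in for a stored document; the class and field names are purely illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class NoSqlShapes {
    public static void main(String[] args) {
        // Key-value ("big hash table") style: the store only understands
        // opaque keys and values, much like a distributed HashMap.
        Map<String, String> keyValueStore = new HashMap<>();
        keyValueStore.put("first name", "Robert");
        keyValueStore.put("last name", "Williams");

        // Document style: the value is a self-describing document (JSON here);
        // a collection of such documents plays the role of an RDBMS table.
        String bookDocument = "{\"Book Name\": \"Fundamentals .. \", "
                + "\"Publisher\": \"Wiley India\", \"year\": \"2011\"}";

        System.out.println(keyValueStore.get("first name"));
        System.out.println(bookDocument);
    }
}
```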
2.2. Types of NoSQL databases (continued)
3. Column type: each storage block has data
from only one column.
4. Graph type: also called a network database. A graph stores
data in nodes.
Sample graph: ID, name, age stored in each node;
the arrows (edges) carry labels like "member", "member since
2002", "knows since 2002", etc.
2.3. Popular NoSQL databases
They fall into two broad groups:
1. Key-value (big hash table)
2. Schema-less
1. Key-value (big hash table) type NoSQL databases (some schema is
followed):
Amazon S3 (Dynamo), Scalaris, Redis, Riak
2. Schema-less (no schema, not even key-value):
2.1 Column based: Cassandra, HBase
2.2 Document based: Apache CouchDB, MongoDB,
MarkLogic
2.3 Graph based: Neo4j, HyperGraphDB
2.4. Advantages of NoSQL
• Dynamic schema: since it allows insertion of data without a
predefined schema, it facilitates application changes in real
time, i.e. faster code development and integration and less
database administration.
• Auto sharding: it automatically spreads data across an
arbitrary number of servers while balancing the load and
queries on the servers. If a server fails, it is replaced
without disruption (see the sketch after this slide).
• Replication: multiple copies of data are stored across the
cluster and even across data centers. This promises high
availability and fault tolerance.
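The auto-sharding idea above can be made concrete with a tiny routing sketch. This is not how any particular NoSQL product is implemented; it only shows the core mechanism of hashing a key to pick one of N servers (the server names and the CRC32 hash are illustrative assumptions). Real stores add rebalancing when nodes join or fail.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

public class ShardRouter {
    private final List<String> servers;

    public ShardRouter(List<String> servers) {
        this.servers = servers;
    }

    // Hash the key and map it onto one of the servers; this is the essence of
    // spreading data across an arbitrary number of nodes.
    public String serverFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return servers.get((int) (crc.getValue() % servers.size()));
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("node-a", "node-b", "node-c"));
        System.out.println(router.serverFor("cart:user-42"));
    }
}
```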
2.4. Advantages of NoSQL (continued)
• Rapid and elastic scalability: allows scaling to the cloud
with the following capacities:
Cluster scale: allows distribution of a database across more than 100
nodes among multiple data centers.
Performance scale: supports over 100,000 database read
and write operations per second.
Data scale: supports storing more than 1 billion documents in the
database.
• Cheap and easy to implement.
• Adheres to CAP; relaxes the consistency requirement.
2.5. Disadvantages of NoSQL
• Does not support joins
• No support for ACID
• No standard query language interface, except
in the case of MongoDB and Cassandra (CQL)
• No easy integration with other applications
that support SQL
2.6. NoSQL applications in industry
• Key-value type databases: used for shopping carts and web user
data analysis (Amazon, LinkedIn)
• Column type databases: used by Facebook, Twitter, eBay, Netflix
• Document type databases: used for logging and archive management
• Graph type databases: used in network modeling (e.g. Walmart)
• NoSQL vendors:
1. Amazon (Dynamo): used by LinkedIn, Mozilla
2. Facebook (Cassandra): used by Netflix, Twitter, eBay, i.e. a column type
database
3. Google (BigTable): used by Adobe Photoshop
2.7. NewSQL
• A database that has the same scalable
performance as NoSQL, supports OLTP, and
maintains the ACID guarantees of a traditional
database.
• It is a new class of RDBMS supporting the relational data
model and using SQL as its interface.
2.8. Comparison

Property                 SQL                            NoSQL                    NewSQL
Transactions             ACID                           CAP                      ACID
Data model               Relational (RDB)               Non-relational           Relational (RDB)
OLTP/OLAP                Supported                      Not supported            Supported
Schema                   Predefined schema              No schema rigidity       May have schema rigidity
Scalability              Vertical (scale up by          Scale-out (horizontal)   Scale-out (horizontal)
                         increasing system resources)
Distributed computing    Fully supported                Increasing support       Growing support
ACID
• In databases, a transaction is a small unit of a program that
may contain several low-level tasks.
• A transaction in a database system must maintain Atomicity, Consistency,
Isolation, and Durability − commonly known as the ACID properties − in order to
ensure accuracy, completeness, and data integrity.
• For example, a transfer of funds from one bank account to another, even involving
multiple changes such as debiting one account and crediting another, is a single
transaction (see the sketch below).
• These four properties describe the major guarantees of the transaction paradigm,
which has influenced many aspects of development in database systems.
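The funds-transfer example can be sketched with plain JDBC. This is a minimal illustration, assuming a relational database reachable at the given JDBC URL and a hypothetical accounts(id, balance) table. The point is that commit applies the debit and the credit together, and rollback discards both.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TransferFunds {
    // Debit one account and credit another inside a single transaction:
    // either both updates are applied (commit) or neither is (rollback).
    public static void transfer(String jdbcUrl, int fromId, int toId, double amount)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            conn.setAutoCommit(false);                       // start an explicit transaction
            try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                debit.setDouble(1, amount);
                debit.setInt(2, fromId);
                debit.executeUpdate();
                credit.setDouble(1, amount);
                credit.setInt(2, toId);
                credit.executeUpdate();
                conn.commit();                               // both changes become visible together
            } catch (SQLException e) {
                conn.rollback();                             // any failure undoes the partial work
                throw e;
            }
        }
    }
}
```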
Atomicity
• An atomic transaction is an indivisible and irreducible series
of database operations such that either all occur, or nothing
occurs.
• Transactions are often composed of multiple statements.
• A guarantee of atomicity prevents updates to
the database occurring only partially, which can cause greater
problems than rejecting the whole series outright.
• Atomicity guarantees that each transaction is treated as a
single "unit", which either succeeds completely or fails
completely:
• if any of the statements in a transaction fails to complete, the
entire transaction fails and the database is left unchanged.
• An atomic system must guarantee atomicity in each and every
situation, including power failures, errors, and crashes.
Consistency
• Consistency ensures that a transaction can only bring the
database from one valid state to another valid state,
maintaining database invariants:
• any data written to the database must be valid according
to all defined rules, including constraints, cascades,
triggers, and any combination thereof.
• This prevents database corruption by an illegal transaction,
but does not guarantee that a transaction is correct.
Isolation
• Transactions are often executed concurrently (e.g., reading
and writing to multiple tables at the same time)
• Isolation ensures that concurrent execution of transactions
leaves the database in the same state that would have been
obtained if the transactions were executed sequentially.
• Isolation is the main goal of concurrency control;
• depending on the method used, the effects of an
incomplete transaction might not even be visible to other
transactions.
Durability
• Durability guarantees that once a transaction
has been committed, it will remain committed
even in the case of a system failure (e.g.,
power outage or crash).
• This usually means that completed
transactions (or their effects) are recorded
in non-volatile memory.
HADOOP
Hadoop
• 3. Hadoop:
3.1. History of Hadoop
3.2. Hadoop overview
3.3. Use cases of Hadoop
3.4. Hadoop distributors
• 4. HDFS:
4.1. HDFS daemons: NameNode, DataNode, Secondary
NameNode
4.2. File read, file write, replica processing of data with Hadoop
4.3. Managing resources and applications with Hadoop YARN
1. Hadoop overview
• Hadoop is for: 1. massive data storage
                 2. faster data processing
Key aspects:
1. Open-source software (OSS)
2. Framework: programs, tools, etc. are provided to
develop and execute applications. It is not a database
like NoSQL.
3. Distributed: data is distributed across multiple
computers and processed in parallel.
4. Massive data storage and faster processing
Hadoop distributors
• The following companies supply Hadoop
products and distributions:
• Apache Hadoop (open source)
• Cloudera
• Hortonworks
• Amazon Web Services Elastic MapReduce Hadoop Distribution
• Microsoft
• MapR
• IBM InfoSphere BigInsights
4. HDFS:
• HDFS is one of the two core components of
Hadoop, the second being MapReduce.
4.1. HDFS daemons: NameNode, DataNode,
Secondary NameNode
4.2. File read, file write, replica processing of
data with Hadoop
4.3. Managing resources and applications with
Hadoop YARN
4.1. HDFS daemons
1. NameNode:
• There is a single NameNode per cluster.
• It manages file-related operations like read, write, create, and
delete.
• The NameNode stores the HDFS namespace.
• It manages the file system namespace, which is the collection of files in
the cluster.
• The file system namespace includes the mapping of blocks to files and file
properties, and is stored in a file called FsImage.
• It uses an EditLog to record every transaction.
• A rack is a collection of DataNodes within a cluster.
• The NameNode uses a rackID to identify the DataNodes in a rack.
HDFS daemons
• When the NameNode starts, it reads the FsImage and
EditLog from disk and applies all transactions
from the EditLog to its in-memory representation of the FsImage.
• Then it flushes out a new version of the FsImage to
disk and truncates the old EditLog, because its
changes have now been applied to the FsImage.
HDFS daemons
2. DataNode
• There are multiple DataNodes per cluster.
• During pipeline reads and writes, DataNodes communicate
with each other.
• A DataNode also sends heartbeat messages to the
NameNode to ensure connectivity between the NameNode
and the DataNodes.
• If no heartbeat is received, the NameNode re-replicates that
DataNode's blocks on other DataNodes within the cluster, and the
cluster keeps running.
HDFS daemons

3. Secondary NameNode
• It takes a snapshot of the HDFS metadata at
intervals specified in the configuration.
• It occupies the same amount of memory as the NameNode.
• Therefore the two are run on different machines.
• In case of NameNode failure, the Secondary NameNode can be
configured to take over.
4.2. File read, file write, replica processing of data
with Hadoop
• File read:
• 1. The client opens the file it wants to read by calling open()
on the DFS (DistributedFileSystem).
• 2. The DFS communicates with the NameNode to get the locations
of the data blocks.
• 3. The NameNode returns the addresses of the DataNodes
containing the data blocks.
• 4. The DFS returns an FSDataInputStream to the client.
• 5. The client calls read() on the FSDataInputStream, which
holds the addresses of the DataNodes for the first few
blocks of the file, and connects to the nearest DataNode for the
first block of the file.
• 6. The client calls read() repeatedly to stream the data
from the DataNode.
• 7. When the end of a block is reached, the FSDataInputStream closes
the connection with that DataNode.
• 8. It repeats these steps to find the best DataNode for
the next block.
• 9. The client calls close() (see the sketch below).
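From the client's point of view, all of the steps above are hidden behind open(), read(), and close(). A minimal sketch using the Hadoop Java FileSystem API follows; the NameNode address and file path are illustrative assumptions, and the hadoop-client libraries are assumed to be on the classpath.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");      // assumed NameNode address
        FileSystem fs = FileSystem.get(conf);                  // step 1: the client-side DFS

        // open() triggers steps 2-4; read() calls stream data block by block (steps 5-8).
        try (FSDataInputStream in = fs.open(new Path("/data/sample.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }                                                      // step 9: close()
        fs.close();
    }
}
```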
File write
• 1. The client calls create() to create the file.
• 2. An RPC call is initiated to the NameNode.
• 3. The NameNode creates the file after a few checks.
• 4. An FSDataOutputStream is returned for the client to write to.
• 5. As the client writes data, the data is split into packets, which are then
written to a data queue.
• 6. The DataStreamer asks the NameNode to allocate blocks by selecting a
list of suitable DataNodes for storing the replicas (by default 3).
• 7. This list of DataNodes forms a pipeline, with 3 nodes in the pipeline
for the first block.
File write (continued)
• 8. The DataStreamer streams the packets to the first DataNode in the
pipeline, which stores them and then forwards them to the other DataNodes in the
pipeline.
• 9. The DFSOutputStream manages an "ack queue" of packets that are
waiting for acknowledgement; a packet is removed from the queue only when
it has been acknowledged by all the DataNodes in the pipeline.
• 10. When the client finishes writing the file, it calls close() on the
stream.
• 11. This flushes all the remaining packets to the DataNode pipeline
and waits for acknowledgements before contacting the
NameNode to signal that the creation of the file is
complete (see the sketch below).
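Again, the packet queue, DataStreamer, and ack queue are internal to the client library; user code only sees create(), write(), and close(). A minimal sketch with an assumed NameNode address and output path:

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");      // assumed NameNode address
        FileSystem fs = FileSystem.get(conf);

        // create() performs the NameNode RPC (steps 1-4); writes are split into
        // packets and pushed down the DataNode pipeline internally (steps 5-9).
        try (FSDataOutputStream out = fs.create(new Path("/data/output.txt"))) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }                                                      // close() flushes and waits for acks (steps 10-11)
        fs.close();
    }
}
```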
Replica processing of data with Hadoop
• Replica placement strategy:
• By default, 3 replicas are created for each data block:
the 1st replica is placed on the same node as the client;
the 2nd replica is placed on a node in a different rack;
the 3rd replica is placed on the same rack as the second, but on a different node in that rack.
• Then a data pipeline is built. The client application writes a block to the 1st
DataNode in the pipeline;
next, this DataNode takes over and forwards the data to the next node in the pipeline.
• This process repeats for all the data blocks.
• Subsequently all the data blocks are written to disk.
• The client application need not track all blocks of data; HDFS directs the
client to the nearest replica (a configuration sketch follows).
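The replication factor behind this placement strategy is a per-file setting. The short sketch below shows two common ways to influence it from client code (the NameNode address and path are assumptions); cluster-wide defaults normally live in hdfs-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed NameNode address
        conf.set("dfs.replication", "3");                 // default replication for files this client creates
        FileSystem fs = FileSystem.get(conf);

        // Raise the replication factor of an existing file; the NameNode then
        // schedules the extra replica using the rack-aware placement described above.
        fs.setReplication(new Path("/data/sample.txt"), (short) 4);
        fs.close();
    }
}
```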
Why Hadoop 2.x?
• Because of the following limitations of Hadoop 1.0:
• In Hadoop 1.0, HDFS and MapReduce are the core components, while other
components are built around them.
• 1. A single NameNode holds the entire namespace of a cluster. It keeps all its
file metadata in main memory, which puts a limit on the number of
objects the NameNode can store.
• 2. Restricted to processing batch-oriented MapReduce jobs.
• 3. MapReduce handles both cluster resource management and data processing; it is not
suitable for interactive analysis.
• 4. Hadoop 1.0 is not suitable for machine learning, graphs, and other
memory-intensive algorithms.
• 5. Map slots may become full while reduce slots are empty, and vice
versa, leading to inefficient resource utilization.
How Hadoop 2.x helps
• HDFS 2, used in Hadoop 2.0, consists of 2 major components:
• 1. Namespace service: takes care of file-related operations (create,
read, write).
• 2. Block storage service: handles DataNode cluster
management and replication.
• HDFS 2 uses:
• 1. Multiple independent NameNodes: the DataNodes act as
common block storage shared by all NameNodes. All
DataNodes register with every NameNode in the cluster.
• 2. A passive standby NameNode.
Managing resources and applications
with Hadoop YARN
• YARN is a sub-project of Hadoop 2.x.
• It is a general processing platform.
• YARN is not constrained to MapReduce alone.
• Multiple applications can be run in Hadoop 2.x,
provided all the applications share the same resource
(memory, CPU, network, etc.) management.
• With YARN, Hadoop can do not only batch processing but
also interactive, online, streaming, graph, and other types of
processing.
Daemons of YARN
1. Global ResourceManager: distributes resources among the various
applications. It has 2 components:
1.1. Scheduler: decides the allocation of resources to running applications.
It does no monitoring.
1.2. ApplicationsManager: accepts jobs and negotiates resources for
executing the ApplicationMaster, which is specific to an application.
• 2. NodeManager: monitors the usage of resources and reports that usage
to the Global ResourceManager. It launches 'application containers' for
the execution of applications.
• Every machine has one NodeManager.
• 3. Per-application ApplicationMaster: every application has one. It
negotiates the required resources for execution from the Resource
Manager. It works along with the NodeManagers to execute and
monitor the component tasks.
• An application is a job submitted to the
framework, e.g. a MapReduce job.
• Container:
a basic unit of allocation across multiple
resource types, e.g.:
container_0 = 2 GB, 1 CPU
container_1 = 1 GB, 6 CPUs
Containers replace the fixed map/reduce slots (see the sketch below).
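In the YARN Java API, such a container specification is simply memory plus virtual cores. A tiny illustration of the two example containers above (the values are the ones from the slide; org.apache.hadoop.yarn.api.records.Resource is the real record type, the class name here is made up):

```java
import org.apache.hadoop.yarn.api.records.Resource;

public class ContainerSpecs {
    public static void main(String[] args) {
        // A container is described by memory (in MB) and virtual cores; YARN
        // allocates whatever mix is requested instead of fixed map/reduce slots.
        Resource container0 = Resource.newInstance(2048, 1); // roughly 2 GB, 1 vCPU
        Resource container1 = Resource.newInstance(1024, 6); // roughly 1 GB, 6 vCPUs
        System.out.println(container0 + " and " + container1);
    }
}
```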
YARN architecture: steps
• 1. The client program submits the application, which contains the specifications to
launch the application-specific 'ApplicationMaster'.
• 2. The ResourceManager launches the 'ApplicationMaster' by assigning it a
container.
• 3. The 'ApplicationMaster' registers with the ResourceManager so that the client
can query the ResourceManager for details.
• 4. The ApplicationMaster negotiates appropriate resource containers via the
resource-request protocol.
• 5. After container allocation, the ApplicationMaster launches each container
by providing its specification to the NodeManager.
• 6. The NodeManager executes the application code and provides status to the
ApplicationMaster via an application-specific protocol.
• 7. On completion of the application, the 'ApplicationMaster' deregisters with the
ResourceManager and shuts down. Its containers can then be reused
(a client-side submission sketch follows).
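Steps 1-2 of this flow correspond to the client-side YarnClient API. The sketch below covers only the submission half, under stated assumptions: a reachable ResourceManager, default YARN configuration on the classpath, and a placeholder launch command standing in for a real ApplicationMaster; a real application would also package the AM code and poll for its status.

```java
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnSubmitSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Step 1: ask the ResourceManager for a new application and describe the
        // container that should launch the ApplicationMaster.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext context = app.getApplicationSubmissionContext();
        context.setApplicationName("demo-app");
        context.setResource(Resource.newInstance(1024, 1));    // 1 GB, 1 vCore for the AM container
        context.setAMContainerSpec(ContainerLaunchContext.newInstance(
                Collections.emptyMap(),                        // local resources (none in this sketch)
                Collections.emptyMap(),                        // environment variables
                Collections.singletonList("echo ApplicationMaster placeholder"),
                Collections.emptyMap(),                        // service data
                null,                                          // security tokens
                Collections.emptyMap()));                      // ACLs

        // Step 2: the ResourceManager allocates a container and launches the AM in it.
        yarnClient.submitApplication(context);
        yarnClient.stop();
    }
}
```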
