
Bigdata Fundamentals

Big data refers to extremely large datasets that are too large to be processed with traditional data processing tools and methods. Some key characteristics of big data include volume, velocity, and variety. Volume refers to the enormous amount of data being generated every day from sources like social media, sensors, and business transactions. Velocity refers to the speed at which this data is being created and needs to be processed. Variety means the diverse types and formats of data, including structured, unstructured, audio, video, etc. Proper management and analysis of big data can provide valuable insights and benefits to organizations across various applications like customer analytics, product development, and more.

Uploaded by

klogeswaran.it


BIG DATA

K.LOGESWARAN AP(Sr.G) | AI | KEC

Can you think of...?

• Can you think of running a query on a 20,980,000 GB file?
• What if we get a new data set like this every day?
• What if we need to execute complex queries on this data set every day?
• Does anybody really deal with this type of data set?
• Is it possible to store and analyze this data?
• Yes. Google deals with more than 20 PB of data every day.
In fact, in a minute:
• Email users send more than 204 million messages;
• Mobile Web receives 217 new users;
• Google receives over 2 million search queries;
• YouTube users upload 48 hours of new video;
• Facebook users share 684,000 bits of content;
• Twitter users send more than 100,000 tweets;
• Consumers spend $272,000 on Web shopping;
• Apple receives around 47,000 application downloads;
• Brands receive more than 34,000 Facebook 'likes';
• Tumblr blog owners publish 27,000 new posts;
• Instagram users share 3,600 new photos;
• Flickr users, on the other hand, add 3,125 new photos;
• Foursquare users perform 2,000 check-ins;
• WordPress users publish close to 350 new blog posts.
And these figures are from a year back!
BIG DATA
• Data that is very large in size is called Big Data.
• Big data is data that contains greater variety, arriving in increasing volumes and with more velocity.
• Big Data refers to extremely large, very fast, highly diverse and complex data that cannot be managed by traditional data management tools.
or
• Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software.
Big data Applications
Data sources
• Social networking sites: Facebook, Google and LinkedIn all generate huge amounts of data on a day-to-day basis, as they have billions of users worldwide.

• The New York Stock Exchange is an example of Big Data: it generates about one terabyte of new trade data per day.

• A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With many thousands of flights per day, data generation reaches many petabytes.

• E-commerce sites: Sites like Amazon, Flipkart and Alibaba generate huge amounts of logs from which users' buying trends can be traced.

• Weather stations: Weather stations and satellites produce very large volumes of data, which are stored and processed to forecast the weather.

• Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish their plans accordingly; for this they store the data of their millions of users.

• Volume: data quantity
• Velocity: data speed
• Variety: data types
Volume
• A typical PC might have had 10 gigabytes of storage in 2000.
• Today, Facebook ingests 500 terabytes of new data every day.
• A Boeing 737 will generate 240 terabytes of flight data during a single flight across the US.
• Smartphones create and consume large amounts of data.
• Sensors embedded into everyday objects will soon result in billions of new, constantly updated data feeds containing environmental, location and other information, including video.

Velocity
• Clickstreams and ad impressions capture user behavior at millions of events per second.
• High-frequency stock trading algorithms reflect market changes within microseconds.
• Machine-to-machine processes exchange data between billions of devices.
• Infrastructure and sensors generate massive log data in real time.
• Online gaming systems support millions of concurrent users, each producing multiple inputs per second.
Variety
• Big Data isn't just numbers, dates and strings. Big Data is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.
• Traditional database systems were designed to address smaller volumes of structured data, fewer updates, or a predictable, consistent data structure.
• Big Data analysis includes different types of data.
Wholeness of BIG DATA
1.1 UNDERSTANDING BIG DATA
• Big Data can be examined at two levels:
  • As a collection of data that can be analyzed and utilized for the benefit of the business; its insights help to make better decisions.
  • As a special kind of data that poses unique challenges in storage and processing, and also offers unique benefits.
• Space, time and function
• Huge opportunities exist for technology providers to innovate and manage the entire life cycle of data: to generate, store, organize, analyze and visualize this data.
1.2 Capturing BIG DATA
• The four V's (volume, velocity, variety and veracity) all arrive together with the data, at the same time.
1.2 Capturing BIG DATA-Volume
• Volume is the amount of data generated by organizations or individuals.
• The volume of data generated is doubling every year.
• The data is so huge that it is hard to extract specific, meaningful information from it in a reasonable period of time.
1.2 Capturing BIG DATA-Volume
• One reason for data growth is the reduction in the cost of storing data, which decreases by 30-40 percent per year.
• The forms and functions of data are increasing.
• The cost of computation and communication of data is also coming down.
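The compounding behind these two trends can be checked with a little arithmetic. This is only an illustrative sketch: the starting cost of 100 units per TB and the five-year horizon are assumptions, while the 30-40% yearly cost decline and the yearly doubling of data come from this section.

```python
def project(initial: float, annual_change: float, years: int) -> float:
    """Compound a value by a fixed yearly rate (e.g. -0.30 for a 30% yearly fall)."""
    return initial * (1 + annual_change) ** years


# Hypothetical starting point: a storage cost of 100 units per TB today.
for rate in (-0.30, -0.40):
    print(f"{rate:+.0%} per year -> cost after 5 years: {project(100, rate, 5):.2f}")

# Data volume doubling every year (+100%): 1 unit of data becomes 32 units in 5 years.
print(project(1, 1.0, 5))
```

Under these assumptions a 30% yearly decline leaves about 17% of today's cost after five years and a 40% decline about 8%, while the data to be stored grows 32-fold.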
1.2 Capturing BIG DATA-Velocity
• Big data is generated by billions of devices and communicated at high speed via the internet.
• The increased velocity of data is due to the increase in internet speed.
• Internet speeds at homes and offices have become about 100 times faster.
• An increased variety of sources, such as mobile devices and sensors, can generate data from anywhere, at any time.
1.2 Capturing BIG DATA-Variety
• Three kinds of variety in data
• Form of data
  • Data types range from numbers to text, graphs, maps, audio, video, etc.
  • Data can be a composite of types in a single file:
    • Text documents have graphs and pictures inside them
    • Video songs have audio embedded in them
  • Audio and video have different, complex storage formats
• Function of data
  • Data comes from human conversations, songs and movies, new product designs, old archived data, etc.
  • The processing of each kind of data is different
  • Examples: recognizing people's faces in pictures, comparing voices to identify the speaker, comparing handwriting to identify the writer
1.2 Capturing BIG DATA-Variety
• Three kinds of variety in data (continued)
• Source of data
  • Mobile phones and tablets allow data to be accessed and generated any time and anywhere
  • Web access and search logs are other sources of data
  • Business systems generate structured business transactional information
  • Sensors, such as temperature and pressure sensors on machines and RFID tags on assets, generate data
• Three broad types of data sources:
  • Human-to-human communication
  • Human-to-machine communication
  • Machine-to-machine communication
1.2 Capturing BIG DATA-Veracity
• Veracity relates to the truthfulness, believability and quality of data.
• The source of information may not be authoritative.
• Data may not be communicated and received correctly due to human or technical failures.
• Data provided and received may be intentionally wrong for competitive or security reasons.
1.3 Benefitting from Big data
• Monitoring and tracking applications
• Analysis and insight
• New product development
1.4 Management of big data
• Some emerging insights into making better use of Big Data:
• Focus on protecting and enhancing customer relationships and the customer experience.
• Solve a real pain point. Big Data should be deployed for specific business objectives.
• Organizations are beginning their pilot implementations by using existing and newly accessible internal sources of data.
• Combining data-based analysis with human intuition and perspectives is better than going just one way.
1.4 Management of big data
• The faster you analyze the data, the greater its predictive value. The value of data depreciates with time.
• Don't throw away data even if no immediate use can be seen for it. Data has value beyond what you initially anticipate. Maintain one copy of your data, not multiple.
• Data is expected to continue to grow at exponential rates. Storage costs continue to fall, data generation continues to grow, and data-based applications continue to grow in capability and functionality.
• Big Data is transforming business, just as IT did. Big Data is a new phase representing a digital world.
1.5 Organizing big data
• Given the huge quantities, the cost of storing and processing the data would be a major driver for the choice of an organizing pattern.
• Given the fast speed of data, it is also desirable to create control over the data by maintaining counts and averages over time, unique values received, etc.
• Given the variety in form factors, data needs to be stored and analyzed differently.
• Given the different quality levels of data, various data sources may need to be ranked and prioritized before serving them to the audience.
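Maintaining counts, averages and unique values over fast-moving data can be done without storing every record. The sketch below is a minimal illustration under stated assumptions: the class name `RunningStats` and the sample readings are hypothetical, and the mean is updated incrementally as each value arrives.

```python
class RunningStats:
    """Maintains a count, a running mean and the set of unique values seen in a stream."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.unique: set = set()

    def update(self, value: float) -> None:
        self.count += 1
        # Incremental mean: no need to keep every past value in memory.
        self.mean += (value - self.mean) / self.count
        # Exact distinct tracking; this set grows with the number of distinct values.
        self.unique.add(value)


stats = RunningStats()
for reading in [3, 5, 5, 7]:   # hypothetical sensor readings arriving one at a time
    stats.update(reading)
print(stats.count, round(stats.mean, 2), len(stats.unique))
```

The exact set of unique values becomes expensive for very high-cardinality streams; approximate structures such as HyperLogLog are a common substitute in that case.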
1.6 Analyzing big data
• Big Data can be utilized to visualize a flowing or a static situation.
• It can be analyzed in two ways:
  • Big Data in motion: process the incoming stream of data in real time for quick and effective statistics about the data.
  • Big Data at rest: store and structure the data, and apply standard analytical techniques on batches of data to generate insights.
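The difference between the two modes can be sketched on the same numbers. In this hypothetical example, the "in motion" function emits an updated statistic (a sliding-window mean) as each event arrives, while the "at rest" function waits until the whole batch has been stored and then analyzes it at once. The window size and sample events are assumptions for illustration.

```python
from collections import deque
from statistics import mean


def stream_window_means(events, window: int):
    """Data in motion: as each event arrives, yield the mean of the latest `window` events."""
    recent = deque(maxlen=window)   # old events fall out automatically
    for value in events:
        recent.append(value)
        yield sum(recent) / len(recent)


def batch_mean(stored_events) -> float:
    """Data at rest: analyze the whole stored batch in one pass."""
    return mean(stored_events)


events = [10, 20, 30, 40]          # hypothetical measurements
print(list(stream_window_means(events, window=2)))
print(batch_mean(events))
```

The streaming version gives an answer after every event but only "sees" recent data; the batch version sees everything but only after the data is at rest.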
1.6 Analyzing big data
• In the dashboard, the bar shows the number of page views, and the inner, darker bar shows the number of unique visitors.
• The dashboard could also show the view by days, weeks or years.
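The two numbers on such a dashboard can be computed from a page-view log. This is a sketch under assumptions: the log format of `(visit_date, visitor_id)` pairs and the function name are hypothetical. Page views are a plain count per period, while unique visitors require de-duplicating visitor IDs within each period.

```python
from collections import defaultdict
from datetime import date


def views_and_visitors(log, period=lambda d: d):
    """Aggregate (visit_date, visitor_id) records into {period: (page_views, unique_visitors)}.

    `period` maps a date to a reporting bucket: the date itself for a daily view,
    or an ISO (year, week) pair for a weekly view.
    """
    views = defaultdict(int)
    visitors = defaultdict(set)
    for day, visitor in log:
        bucket = period(day)
        views[bucket] += 1
        visitors[bucket].add(visitor)
    return {b: (views[b], len(visitors[b])) for b in sorted(views)}


log = [(date(2024, 1, 1), "u1"), (date(2024, 1, 1), "u1"),
       (date(2024, 1, 1), "u2"), (date(2024, 1, 2), "u3")]
daily = views_and_visitors(log)
weekly = views_and_visitors(log, period=lambda d: tuple(d.isocalendar()[:2]))
```

Switching between a daily and a weekly view only changes the bucketing function, which mirrors how the dashboard can show the same data by days, weeks or years.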
1.6 Analyzing big data
• Text data could be combined, filtered, cleaned, thematically analyzed, and visualized in a word cloud.
• The word cloud is built from a recent stream of tweets (i.e., Twitter messages) from US Presidential candidates Hillary Clinton and Donald Trump.
• Larger words imply a greater frequency of occurrence in the tweets.
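The sizes in a word cloud come from word frequencies. The sketch below shows the clean, filter and count steps on a few made-up texts. The stop-word list and the sample tweets are hypothetical, and real pipelines use much larger stop-word lists and better tokenizers.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "a", "and", "to", "of", "is", "in"}


def word_frequencies(texts, top_n=5):
    """Clean, filter and count words across short texts (e.g. tweets)."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())    # clean: lowercase, keep letters
        counts.update(w for w in words if w not in STOP_WORDS)  # filter stop words
    return counts.most_common(top_n)


tweets = ["Jobs and the economy", "The economy is strong", "Jobs jobs jobs"]
top = word_frequencies(tweets, top_n=2)
print(top)
```

A word-cloud renderer would then scale each word's font size by its count, so the most frequent words appear largest.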
1.7 Technology challenges of Big Data
• Storing huge volumes
  • The solution is to distribute data across a large cluster of inexpensive commodity machines, and to ensure that every piece of data is stored on multiple machines to guarantee that at least one copy is always available.
  • Hadoop is the most well-known clustering technology for Big Data. Its data storage pattern is called the Hadoop Distributed File System (HDFS).
• Ingesting streams at an extremely fast pace
  • The solution is to create special ingesting systems that can open an unlimited number of channels for receiving data.
  • These queuing systems can hold data, from which consumer applications can request and process data at their own pace.
  • Apache Spark is the most popular system for streaming applications.
• Handling a variety of forms and functions of data
  • This requires structuring and accessing all varieties of data.
  • HBase, for example, stores each data element separately along with its key identifying information. This is called a key-value pair format.
  • Cassandra stores data in a wide-column (column-family) format.
  • Languages such as Pig Latin and HiveQL (Hive's SQL-like query language) are used to access this data.
• Processing data at huge speeds
  • The naive approach is to move large amounts of data from storage to the processor.
  • This would consume enormous network capacity and choke the network.
  • The alternative is to "move the processing to where the data is stored."
  • The task logic is distributed throughout the cluster of machines where the data is stored.
  • Those machines work in parallel on the data assigned to them.
  • A follow-up process consolidates the outputs of all the small tasks and delivers the final results.
  • MapReduce, originally developed at Google, is the best-known technology for parallel processing of distributed Big Data.
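The "move the processing to the data" idea can be simulated in plain Python with the classic word-count example. This is a single-process sketch, not Hadoop's Java MapReduce API: each inner list stands in for the data held by one hypothetical DataNode, the map step runs on each partition separately, and a shuffle plus reduce step consolidates the small results.

```python
from collections import defaultdict
from itertools import chain


def map_phase(partition):
    """Runs where the data lives: emit (word, 1) for every word in one node's partition."""
    return [(word, 1) for line in partition for word in line.lower().split()]


def shuffle(mapped_outputs):
    """Group the intermediate (key, value) pairs from all nodes by key."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_outputs):
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Consolidate the small partial results into the final output."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical text already split across three "DataNodes".
partitions = [["big data is big"], ["data moves less"], ["big clusters"]]
mapped = [map_phase(p) for p in partitions]  # on a real cluster these run in parallel
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Only the small (word, count) pairs travel between steps, rather than the raw data, which is the network saving the slide describes.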
Assignment 1
Liberty Stores Case Exercise:
Liberty Stores Inc. is a specialized global retail chain that sells organic food, organic clothing, wellness products, and education products to enlightened LOHAS (Lifestyles of Health and Sustainability) citizens worldwide. The company is 20 years old and is growing rapidly. It now operates in 5 continents, 50 countries and 150 cities, and has 500 stores. It sells 20,000 products and has 10,000 employees. The company has revenues of over $5 billion and a profit of about 5% of its revenue. The company pays special attention to the conditions under which its products are grown and produced. It donates about one-fifth (20%) of its pre-tax profits to global and local charitable causes.
• Q1: Create a comprehensive Big Data strategy for the CEO of the company.
• Q2: How can Big Data systems such as IBM Watson help this company?
Big Data Architecture
CASELET: Google Query Architecture
Big Data Architecture
• There are many sources of data. All data is funneled in through an ingest system.
• The data is forked into two sides:
  • A stream processing system: streaming data processing happens as the data flows through the system. This results in analysis and reporting of events as they happen. Examples are fraud detection and intrusion detection.
  • A batch processing system: batch processing is when the processing and analysis happen on a set of data that has already been stored over a period of time.
• The outcomes of this processing can be sent into NoSQL databases for later retrieval, or sent directly for consumption by many applications and devices.
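The ingest-then-fork pattern can be sketched in a few lines. Here Python's in-process `queue.Queue` stands in for a real ingest/messaging system such as Kafka, and the fraud rule (`FRAUD_THRESHOLD`) and transaction records are hypothetical. Each event is checked immediately on the stream side and also stored for a later batch report.

```python
import queue

FRAUD_THRESHOLD = 10_000  # hypothetical rule: flag any single transaction above this amount


def run_pipeline(events):
    """Funnel events through one ingest queue, then fork each to a stream and a batch path."""
    ingest = queue.Queue()
    for event in events:          # producers push data at their own pace
        ingest.put(event)

    alerts, batch_store = [], []
    while not ingest.empty():     # the consumer pulls data at its own pace
        event = ingest.get()
        # Stream side: react to each event as it flows through.
        if event["amount"] > FRAUD_THRESHOLD:
            alerts.append(event["id"])
        # Batch side: persist now, analyze the stored set later.
        batch_store.append(event)

    batch_report = {"count": len(batch_store),
                    "total": sum(e["amount"] for e in batch_store)}
    return alerts, batch_report


events = [{"id": 1, "amount": 50}, {"id": 2, "amount": 25_000}, {"id": 3, "amount": 300}]
alerts, report = run_pipeline(events)
print(alerts, report)
```

In a production system the two sides would be separate services reading from the same ingest topic; keeping them in one loop here just makes the fork visible.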
Big Data Architecture
Big data sources
• The sources of data for an application depend upon what data is needed to perform the analysis.
• The data will vary in origin, size, speed, form and function, as described by the 4 Vs.
• Data sources can be internal or external to the organization.
Big Data Architecture
• A big data solution typically comprises these logical layers:
  • Data ingest layer
  • Batch processing layer
  • Stream processing layer
  • Data organizing layer
  • Data consumption layer
  • Infrastructure layer
  • Distributed file system layer
• Each layer can be represented by one or more available technologies.
Big Data Architecture
Data ingest layer
• Used for acquiring data from the data sources.
• It can acquire data at various speeds and in various quantities.
• The data is sent to a batch processing system, a stream processing system, or directly to HDFS.
Batch Processing layer and Streaming Processing layer
• The analysis layer reads data from the file system or from the NoSQL databases.
• Data is processed using parallel programming to produce the desired results.
• This layer needs to understand the data sources and data types, the algorithms that would work on that data, and the format of the desired outcomes.
• The output of this layer could be sent for instant reporting, or stored in NoSQL databases for on-demand reports for the client.
Big Data Architecture
• Data Organizing Layer
  • This layer receives data from both the batch and stream processing layers.
  • NoSQL databases are used to organize the data for easy access.
  • Languages such as HiveQL (Hive's SQL-like language) and Pig Latin can be used to easily access data and generate reports.
Big Data Architecture
• Data Consumption layer
  • This layer consumes the output provided by the analysis layers, directly or through the organizing layer.
  • The outcome could be standard reports, data analytics, dashboards and other visualization applications, or recommendation engines, on mobile and other devices.
• Infrastructure Layer
  • Used to manage the raw resources of storage, compute and communication through a cloud computing paradigm.
• Distributed File System Layer
  • Includes the Hadoop Distributed File System (HDFS).
  • Also includes supporting applications, such as YARN (Yet Another Resource Negotiator), that enable efficient access to data storage and its transfer.
Big Data Architecture examples - IBM WATSON
Netflix
• Netflix is one of the largest providers of online video entertainment. It handles 400 billion online events per day.
• As a cutting-edge user of big data technologies, it is constantly innovating its mix of technologies to deliver the best performance.
• Kafka is the common messaging system for all incoming requests.
• The entire infrastructure is hosted on Amazon Web Services (AWS).
• Data is stored in AWS S3 as well as in Cassandra and HBase.
• Spark is used for stream processing.
EBAY
• eBay is the second-largest e-commerce company in the world.
• It delivers 800 million listings from 25 million sellers to 160 million buyers.
• To manage this huge stream of activity, eBay uses a stack of Hadoop, Spark, Kafka and other elements.
PayPal
• This payments-facilitation company needs to understand and acquire customers, and to process a large number of payment transactions.
Apache Hadoop - Distributed Computing
• A distributed file storage system is a clever way of storing huge quantities of data in a networked collection of commodity machines that is:
  • secure and cost-effective
  • fast and easy for retrieval and processing
Apache Hadoop - HADOOP FRAMEWORK
• The Apache Hadoop distributed computing framework is composed of the following modules:
  • Hadoop Common: contains libraries and utilities needed by other Hadoop modules.
  • Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines.
  • YARN: a resource-management platform responsible for managing computing resources in clusters and using them for scheduling users' applications.
  • MapReduce: an implementation of the MapReduce programming model for large-scale data processing.
    • It facilitates concurrent processing by splitting petabytes of data into smaller chunks and processing them in parallel on Hadoop commodity servers.
    • In the end, it aggregates all the data from multiple servers to return a consolidated output back to the application.
Apache Hadoop - HDFS DESIGN GOALS
• The Hadoop Distributed File System (HDFS) is a distributed and scalable file system.
• It is designed for applications that deal with very large data sizes.
• It is also designed to deal with mostly immutable files, i.e. write data once but read it many times.
• Major design goals of HDFS:
  • Hardware failure management: one must plan for hardware failure.
  • Huge volume: capacity to store large files with fast throughput.
  • High speed: a mechanism to provide fast access to data for streaming applications. (Latency is the time it takes for a data packet to travel from one designated point to another; HDFS favours high throughput of data access over low latency.)
  • High variety: maintain simple data coherence (uniformity of shared data across resources) by writing data once but reading it many times.
  • Plug-and-play: maintain easy accessibility of data using any hardware, software and database platform.
  • Network efficiency: minimize network bandwidth requirements by minimizing data movement.
Apache Hadoop - MASTER-SLAVE ARCHITECTURE
• Hadoop is an architecture for organizing computers in a master-slave relationship.
• A Hadoop cluster has two types of nodes:
  • Master: a single master node called the NameNode
  • Slave: a large number of slave worker nodes, called DataNodes
• A small Hadoop cluster includes a single master and multiple worker nodes.
• A large Hadoop cluster would consist of a master and thousands of small, ordinary machines as worker nodes.
Apache Hadoop - MASTER-SLAVE ARCHITECTURE
• MASTER NODE (NameNode)
  • The master node manages the overall file system and its namespace, and controls access to files by clients.
  • The master node is aware of the DataNodes, i.e. which blocks of which file are stored on which DataNode.
  • It also controls the processing plan for all applications running on the data in the cluster.
  • Only one master node is available, which makes it a single point of failure.
  • To overcome failure, the master node has a hot backup always ready to take over, in case the master node dies unexpectedly.
  • The master node uses a transaction log to persistently record every change that occurs to the file system.
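The NameNode's two responsibilities above (knowing which blocks live on which DataNodes, and logging every change) can be illustrated with a toy model. Everything here is hypothetical and simplified: `ToyNameNode` is not Hadoop's real API. The point is that each change is appended to the log before it is applied, so the metadata can be rebuilt by replaying the log.

```python
import json


class ToyNameNode:
    """Hypothetical, in-memory stand-in for NameNode metadata; not Hadoop's real API."""

    def __init__(self):
        self.block_map = {}   # file path -> list of (block_id, [DataNode names])
        self.edit_log = []    # append-only record of every namespace change

    def _log(self, op, **details):
        # Record the change before applying it, so state can be rebuilt after a crash.
        self.edit_log.append(json.dumps({"op": op, **details}))

    def add_block(self, path, block_id, datanodes):
        self._log("add_block", path=path, block_id=block_id, datanodes=list(datanodes))
        self.block_map.setdefault(path, []).append((block_id, list(datanodes)))

    def delete(self, path):
        self._log("delete", path=path)
        self.block_map.pop(path, None)

    @classmethod
    def replay(cls, edit_log):
        """Rebuild the namespace by re-applying the log, as a recovering master would."""
        node = cls()
        for entry in edit_log:
            change = json.loads(entry)
            if change["op"] == "add_block":
                node.add_block(change["path"], change["block_id"], change["datanodes"])
            elif change["op"] == "delete":
                node.delete(change["path"])
        return node


nn = ToyNameNode()
nn.add_block("/logs/a.txt", "blk_1", ["dn1", "dn2", "dn3"])
nn.add_block("/logs/b.txt", "blk_2", ["dn2"])
nn.delete("/logs/b.txt")
restored = ToyNameNode.replay(nn.edit_log)
print(restored.block_map == nn.block_map)
```

A standby that replays the same log stays in step with the active master, which is the idea behind keeping a hot backup ready to take over.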
Apache Hadoop - MASTER-SLAVE ARCHITECTURE
• WORKER NODES (DataNodes)
  • DataNodes store the data blocks in their storage space, as directed by the master node.
  • Each DataNode contains many disks to maximize storage capacity and access speed.
  • DataNodes are not aware of the overall distributed file structure.
Apache Hadoop - Architecture
• The NameNode stores all relevant information about all the DataNodes and the files stored in them.
• This information includes:
  • For every DataNode: its name, rack, capacity and health
  • For every file: its name, replicas, type, size, timestamp, location, health, etc.
• DataNode failure:
  • Data on the failed DataNode will be accessed from its replicas on other DataNodes.
  • The failed DataNode can be automatically recreated on another machine by writing all its file blocks from the other healthy replicas.
  • Each DataNode sends a heartbeat message to the NameNode periodically. Without this message, the DataNode is assumed to be dead.
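Heartbeat-based failure detection and the follow-up re-replication can be sketched as below. Assumptions: the 30-second timeout and all node and block names are hypothetical (real HDFS intervals are configurable), and the target of 3 copies reflects HDFS's default replication factor. An injectable clock lets the example run without waiting.

```python
import time


class HeartbeatMonitor:
    """Marks a DataNode dead when no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout: float = 30.0, clock=time.monotonic):
        self.timeout = timeout      # hypothetical value, not an HDFS default
        self.clock = clock          # injectable so the logic can be tested without waiting
        self.last_seen = {}

    def heartbeat(self, datanode: str) -> None:
        self.last_seen[datanode] = self.clock()

    def dead_nodes(self) -> list:
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def rereplication_needed(block_locations: dict, dead: set, target: int = 3) -> dict:
    """For each block, count the extra copies to write from its surviving healthy replicas."""
    needed = {}
    for block, nodes in block_locations.items():
        alive = [n for n in nodes if n not in dead]
        # A block with no surviving replica cannot be recreated, so it is skipped here.
        if alive and len(alive) < target:
            needed[block] = target - len(alive)
    return needed


# Usage with a fake clock (in seconds) instead of real time.
now = [0.0]
monitor = HeartbeatMonitor(timeout=30.0, clock=lambda: now[0])
monitor.heartbeat("dn1")
monitor.heartbeat("dn2")
now[0] = 20.0
monitor.heartbeat("dn1")          # dn2 stays silent from here on
now[0] = 40.0
dead = set(monitor.dead_nodes())
plan = rereplication_needed({"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn2", "dn3"]}, dead)
print(dead, plan)
```

Here dn2 misses its heartbeat window, so each block it held needs enough new copies, written from the surviving replicas, to get back to three.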
Apache Hadoop - Architecture
• Role of the NameNode:
  • It tries to ensure that files are evenly spread across the DataNodes in the cluster.
  • It tries to optimize the networking load.
  • It tries to store fragments of files on the same node for speed of reading and writing.
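Spreading data evenly can be illustrated with a greedy placement rule: put each block's replicas on the currently least-loaded distinct DataNodes. This is a deliberate simplification and not HDFS's actual rack-aware placement policy; the function name, node names and replica count are assumptions.

```python
import heapq


def place_replicas(block_ids, datanodes, replicas=3):
    """Greedy placement: put each block's replicas on the least-loaded distinct DataNodes."""
    load = {dn: 0 for dn in datanodes}
    placement = {}
    for block in block_ids:
        # Ties on load are broken by node name so the result is deterministic.
        chosen = heapq.nsmallest(replicas, load, key=lambda dn: (load[dn], dn))
        for dn in chosen:
            load[dn] += 1
        placement[block] = chosen
    return placement, load


placement, load = place_replicas(["blk_0", "blk_1", "blk_2", "blk_3"],
                                 ["dn1", "dn2", "dn3", "dn4"])
print(placement)
print(load)
```

With 4 blocks, 3 replicas each and 4 nodes, every node ends up holding exactly 3 block replicas, and no block has two replicas on the same node.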
Apache Hadoop - BLOCK SYSTEM
• A block of data is the fundamental storage unit in HDFS.
• HDFS stores large files (typically gigabytes to terabytes) by storing segments of the file, called blocks, across multiple machines.
• All storage capacity and file sizes are measured in blocks.
• The block size is configurable; the default was 64 MB in Hadoop 1.x and is 128 MB from Hadoop 2.x onward.
• Thus, with a 64 MB block size, an HDFS file is chopped up into 64 MB chunks, and if possible, each chunk will reside on a different DataNode.
• Every data file takes up a number of blocks depending upon its size.
• E.g., 1 terabyte of storage holds about 16,000 blocks of 64 MB (1 TB / 64 MB = 16,384 using binary units).
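The block arithmetic above is a ceiling division: a file that does not fill its last block still needs one more block. A minimal sketch, assuming binary units (1 MB = 2^20 bytes) and using a hypothetical helper name:

```python
MB = 1024 ** 2
TB = 1024 ** 4


def blocks_needed(file_size_bytes: int, block_size_bytes: int = 64 * MB) -> int:
    """Number of blocks a file occupies; a partial last block still counts as one block."""
    # Integer ceiling division avoids floating-point rounding on very large sizes.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


print(blocks_needed(1 * TB))            # 64 MB blocks
print(blocks_needed(1 * TB, 128 * MB))  # 128 MB blocks (Hadoop 2.x+ default)
print(blocks_needed(200 * MB))          # 3 full blocks plus a partial one
```

Doubling the block size halves the number of blocks the NameNode has to track for the same data, which is one reason larger defaults were adopted.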
Apache Hadoop - ENSURING DATA INTEGRITY
• Hadoop ensures that no data is lost or corrupted during storage or processing.
• Only one client can write or append to a file at a time.
• If some data on a DataNode is indeed lost or corrupted, a new healthy replica of the lost block is used to recreate the data.
• To ensure integrity, a checksum algorithm is applied to all data written to HDFS.
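Checksum-based integrity checking works by storing a small checksum per chunk of data at write time and recomputing it at read time. The sketch below uses the standard library's `zlib.crc32` (plain CRC-32) as a stand-in; HDFS itself uses the CRC32C variant by default. The 512-byte chunk size and function names are illustrative assumptions.

```python
import zlib

CHUNK = 512  # bytes covered by each checksum in this sketch


def checksums(data: bytes, chunk: int = CHUNK) -> list:
    """At write time: compute one CRC-32 per fixed-size chunk of the data."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def corrupt_chunks(data: bytes, stored: list, chunk: int = CHUNK) -> list:
    """At read time: recompute checksums and return the indexes of chunks that no longer match."""
    current = checksums(data, chunk)
    return [i for i, (now, then) in enumerate(zip(current, stored)) if now != then]


data = bytes(range(256)) * 8            # 2,048 bytes -> 4 chunks
stored = checksums(data)
damaged = bytearray(data)
damaged[600] ^= 0xFF                    # flip the bits of one byte inside chunk 1
bad = corrupt_chunks(bytes(damaged), stored)
print(bad)
```

Once a mismatch is detected, the affected block would be re-read from one of its healthy replicas, as described above.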
