
CS5412 / Lecture 22: Apache Architecture

Ken Birman, Kishore Pusukuri

CS5412, Fall 2022 1


Apache: A Big Data Architecture
[Layered diagram] Applications layer: Batch Processing | Analytical SQL | Stream Processing | Machine Learning | Other Applications
Resource Manager (Workload Manager, Task Scheduler, etc.)
Data Storage (File Systems, Database, etc.)
Data Ingestion Systems feed the storage layer from the side.

Popular Big Data systems: Apache Hadoop, Apache Spark

CS5412, Fall 2022 2


Actual Apache Tool Names

[Layered diagram] Applications layer: Hadoop MapReduce | Hive | Pig | Other Applications
Resource manager: Yet Another Resource Negotiator (YARN)
Storage: Hadoop NoSQL Database (HBase) over the Hadoop Distributed File System (HDFS)
Data ingest systems (e.g., Apache Kafka, Flume, etc.) feed the cluster.
CS5412, Fall 2022 3
Apache Zookeeper
ZooKeeper manages small files holding configuration information for your application.
It automatically tracks IP addresses of application components, and their health status.
It can also hold small shared values, such as the step count for an iterative calculation.
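
For example, a component might store and read such a value as a znode using the standard ZooKeeper Java client. A minimal sketch, with the ensemble address, znode path, and payload all invented:

import org.apache.zookeeper.*;

public class ZkConfigSketch {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble (address is invented).
        ZooKeeper zk = new ZooKeeper("zk1:2181", 3000, event -> {});

        String path = "/app-step-count";   // invented znode name (a "small file")
        byte[] value = "42".getBytes();

        // Create the znode if absent, otherwise overwrite its contents.
        if (zk.exists(path, false) == null) {
            zk.create(path, value, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } else {
            zk.setData(path, value, -1);   // -1 skips the version check
        }

        // Any other component can read the shared value back.
        System.out.println(new String(zk.getData(path, false, null)));
        zk.close();
    }
}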

CS5412, Fall 2022 4


The ZooKeeper Service
Each µ-service has a leader that talks to ZooKeeper. Its other nodes have connections too, but just passive ones (for fault detection). ZooKeeper is itself an interesting distributed system:

The ZooKeeper service is replicated over a set of machines, usually 5 or 7.
All machines store a copy of the data in memory (!). It is checkpointed to disk if you wish, but at a limited pace (once per 5 seconds).
A leader is elected on service startup.
Clients connect to a single ZooKeeper server and maintain a TCP connection.
A client can read from any ZooKeeper server.
Writes go through the leader and employ virtual synchrony atomic multicast with majority consensus on membership.

https://cwiki.apache.org/confluence/display/ZOOKEEPER/ProjectDescription CS5412, Fall 2022 5


Hadoop Distributed File System (HDFS)

HDFS is similar to Ceph, so we won’t do a deep dive on it.


Supports append-only updates, whole-file delete/create/replace.

Offers a form of “checkpoint”


 It records file versions and lengths.
 Rollback is done by truncating files to earlier lengths.
 HDFS doesn’t retain old versions, so it can’t undo a file delete/replace.
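
A sketch of these operations through Hadoop's FileSystem Java API (the path, contents, and checkpointed length are illustrative; truncate exists in Hadoop 2.7+, and append must be enabled on the cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAppendTruncateSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/logs/events.log");        // invented path

        // Whole-file create, replacing any previous version.
        try (FSDataOutputStream out = fs.create(p, true)) {
            out.writeBytes("record 1\n");
        }
        // Append-only update: existing bytes never change, we only add more.
        try (FSDataOutputStream out = fs.append(p)) {
            out.writeBytes("record 2\n");
        }
        // "Rollback" to a checkpoint = truncate back to a length recorded earlier.
        long checkpointedLength = 9;                  // length of "record 1\n"
        fs.truncate(p, checkpointedLength);
    }
}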
CS5412, Fall 2022 6
Hadoop Database (HBase) is a thin layer directly over HDFS
HBase is used like a NoSQL database.

It maps directly to HDFS.

It holds relations (tables).

It supports large amounts of data and high throughput.


CS5412, Fall 2022 7
HBase: Data Model (1)

CS5412, Fall 2022 8


HBase: Data Model (2)

• Sorted rows: supports billions of rows
• Columns: supports millions of columns
• Cell: intersection of row and column
 Can have multiple values (which are time-stamped)
 Can be empty; empty cells incur no storage or processing overhead

CS5412, Fall 2022 9
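
A hedged sketch of the data model just described, through the HBase Java client -- one timestamped cell written and read back; the table, column family, and qualifier names are invented:

import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCellSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {

            // Write one cell: (row key, column family, qualifier) -> timestamped value.
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
                          Bytes.toBytes("roy@example.com"));
            table.put(put);

            // Read it back; empty cells simply don't appear in the Result.
            Result r = table.get(new Get(Bytes.toBytes("row-001")));
            byte[] email = r.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
            System.out.println(Bytes.toString(email));
        }
    }
}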


HBase: Table

CS5412, Fall 2022 10


HBase: Horizontal Splits (Regions)

CS5412, Fall 2022 11


HBase Architecture

CS5412, Fall 2022 12


HBase Architecture: Column Family (1)

CS5412, Fall 2022 13


HBase Architecture: Column Family (2)

CS5412, Fall 2022 14


HBase Architecture (1)
HBase is composed of three types of servers in a leader/worker type of architecture: Region Server, HBase Master, ZooKeeper.
Region Server:
• Clients communicate with Region Servers (the workers) directly for accessing data
• Serves data for reads and writes
• Region servers are assigned to the HDFS data nodes to preserve data locality
[Diagram: leader servers coordinating worker servers]
CS5412, Fall 2022 15
HBase Architecture (2)

HBase Leader (HMaster): coordinates region servers and handles DDL operations (create table, delete table, etc.).
ZooKeeper: HBase uses ZooKeeper as a distributed coordination service to maintain server state in the cluster.

CS5412, Fall 2022 16


HBase uses ZooKeeper as its coordinator
Maintains region server state in the cluster
Provides server failure notification
Uses consensus to guarantee common shared state

CS5412, Fall 2022 17


HBase vs HDFS
HBase is a way of “talking to” HDFS. We use it for massive tables that wouldn’t fit into a single HDFS file.

HBase:
• Stores data as key-value objects in column families. Records in HBase are stored according to the rowkey, and sequential search is common.
• Provides low-latency access to small amounts of data from within a large data set.
• Provides a flexible data model.

HDFS:
• Stores data as flat files.
• Optimized for streaming access of large files -- doesn’t support random read/write.
• Follows a write-once read-many model.
• Supports log-style files (append-only).

CS5412, Fall 2022 18


Hadoop Resource Management

Yet Another Resource Negotiator (YARN)


➢ YARN is a core component of Hadoop; it manages all the resources of a Hadoop cluster (CPUs, memory, GPUs, network connections, etc.).
➢ Using selectable criteria such as fairness, it effectively allocates the resources of the Hadoop cluster to multiple data processing jobs:
○ Batch jobs (e.g., MapReduce, Spark)
○ Streaming jobs (e.g., Spark streaming)
○ Analytics jobs (e.g., Impala, Spark)

CS5412, Fall 2022 19


Hadoop Ecosystem (Resource Manager)

[Layered diagram] Applications layer: Hadoop MapReduce | Hive | Pig | Spark Stream | Other Applications
Resource manager: Yet Another Resource Negotiator (YARN) -- YARN decides where the steps in your job should run
Storage: Hadoop Distributed File System (HDFS) and Hadoop NoSQL Database (HBase)
CS5412, Fall 2022 20


YARN Concepts (1)
➢ YARN focuses on a generalized concept based on a virtual machine container. In YARN, a container is an abstraction for managing resources -- a unit of computation on a resource node, i.e., a certain amount of CPU, memory, disk, and even ASIC resources. It is tied to the Mesos container model.
➢ A single job may run in one or more containers -- a set of containers would be used to encapsulate highly parallel Hadoop jobs.
➢ The main goal of YARN is effectively allocating containers to multiple data processing jobs.
➢ YARN competes with Kubernetes (the Docker container manager) but covers cases that don’t involve executable VMs. Kubernetes is focused on Docker.

CS5412, Fall 2022 21


YARN Concepts (2)
Three main components of YARN: Application Master, Node Manager, and Resource Manager (a.k.a. the YARN daemon processes).
➢ Application Master:
○ Single instance per job.
○ Spawned within a container when a new job is submitted by a client.
○ Requests additional containers for handling any sub-tasks.
➢ Node Manager: Single instance per worker node. Responsible for monitoring and reporting on local container status (all containers on the worker node).

CS5412, Fall 2022 22


YARN Concepts (3)
Three main components of YARN: Application Master, Node Manager, and Resource Manager (a.k.a. the YARN daemon processes).
➢ Resource Manager: arbitrates system resources between competing jobs. It has two main components:
○ Scheduler (global scheduler): responsible for allocating resources to the jobs subject to familiar constraints of capacities, queues, etc.
○ Application Manager: responsible for accepting job submissions and for restarting the ApplicationMaster container on failure.
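
For a flavor of the client side of this flow, here is a hedged sketch using the YarnClient API; the application name, queue, and resource sizes are invented, and a real submission would also have to set the ApplicationMaster's launch context (command, jars, environment):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;

public class YarnSubmitSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new Configuration());
        yarn.start();

        // Ask the Resource Manager for a new application id.
        YarnClientApplication app = yarn.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setApplicationName("demo-job");             // invented name
        ctx.setQueue("default");                        // scheduler queue
        ctx.setResource(Resource.newInstance(1024, 1)); // AM container: 1 GB, 1 vcore
        // A real client would also set the AM's ContainerLaunchContext here.

        yarn.submitApplication(ctx);   // the Application Manager accepts the job
        yarn.stop();
    }
}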

CS5412, Fall 2022 23


YARN Concepts (4)

How do the
components of YARN
work together?

Image source: http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/YARN.html


CS5412, Fall 2022 24
Hadoop Ecosystem (Processing Layer)

[Layered diagram] Processing layer: MapReduce | Hive | Pig | Spark Stream | Other Applications
Resource manager: Yet Another Resource Negotiator (YARN)
Storage: Hadoop Distributed File System (HDFS) and Hadoop NoSQL Database (HBase)
Totally unrelated tools run on the same machines. YARN needs to decide where to schedule each task.
CS5412, Fall 2022 25


Recall: MapReduce generates lots of tasks as it runs!

[Diagram: intermediate data fanned out across parallel Reduce tasks]
Intermediate data → Reducer output → Result:
aardvark 1 → aardvark 1
cat 1 → cat 1
mat 1 → mat 1
on 1,1 → on 2
sat 1,1 → sat 2
sofa 1 → sofa 1
the 1,1,1,1 → the 4
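
Each Reduce task above sums the per-word counts it receives. In the standard Hadoop MapReduce Java API, the reducer for this word-count example looks roughly like this (job wiring omitted):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context ctx)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {          // e.g., "the" -> [1,1,1,1]
            sum += c.get();
        }
        ctx.write(word, new IntWritable(sum));  // e.g., ("the", 4)
    }
}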
CS5412, Fall 2022 26
Example YARN challenge

YARN has to decide which machines to schedule each map and reduce step on. Some have RDDs cached for reuse.

Those same machines are in demand by other Apache tools as well, such as Pig and Hive and even the HDFS meta-data service.

Some tasks might have special needs, like “a node with 8 GPUs” or “at least 20 GB of RAM.”
CS5412, Fall 2022 27
How does YARN do this?

YARN loops, collecting a batch of tasks that need to be scheduled.

For each batch it runs a form of constrained optimization. First, it identifies tasks with special needs (for example, a task that needs two GPUs can only run on a node with two GPUs available).

Next it runs a form of min-cost max-flow algorithm to assign tasks to available nodes in a best-fit manner, like bin packing.
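
To make the best-fit step concrete, here is a toy allocator in the same spirit -- an illustration of the bin-packing idea, not YARN's actual code; all task and node names are invented:

import java.util.*;

public class BestFitSketch {
    // Assign each task to the feasible node with the least leftover capacity.
    static Map<String, String> assign(Map<String, Integer> taskNeeds,
                                      Map<String, Integer> nodeFree) {
        Map<String, String> placement = new HashMap<>();
        for (var task : taskNeeds.entrySet()) {
            String best = null;
            for (var node : nodeFree.entrySet()) {
                if (node.getValue() >= task.getValue()                  // feasible
                        && (best == null || node.getValue() < nodeFree.get(best))) {
                    best = node.getKey();                               // tighter fit
                }
            }
            if (best != null) {
                placement.put(task.getKey(), best);
                nodeFree.merge(best, -task.getValue(), Integer::sum);   // consume capacity
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        Map<String, Integer> tasks = new LinkedHashMap<>();
        tasks.put("map1", 4);
        tasks.put("reduce1", 8);
        Map<String, Integer> nodes = new HashMap<>();
        nodes.put("nodeA", 10);
        nodes.put("nodeB", 8);
        System.out.println(assign(tasks, nodes));  // {map1=nodeB, reduce1=nodeA}
    }
}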
CS5412, Fall 2022 28
Slight topic shift…
Away from YARN and focusing now on what the other tools do

CS5412, Fall 2022 29


Apache Hive: SQL on MapReduce
Hive is an abstraction layer on top of Hadoop (MapReduce/Spark)

Use Cases:

 Data Preparation
 Extraction-Transformation-Loading Jobs (Data Warehousing)
 Data Mining

CS5412, Fall 2022 30


Apache Hive: SQL on MapReduce
Hive is an abstraction layer on top of Hadoop (MapReduce/Spark)
➢ Hive uses a SQL-like language called HiveQL
➢ Facilitates reading, writing, and managing large datasets residing in distributed storage using SQL-like queries
➢ Hive executes queries using MapReduce (and also using Spark)
○ HiveQL queries → Hive → MapReduce Jobs
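
One common way to issue HiveQL from an application is JDBC against HiveServer2. A minimal sketch, assuming the hive-jdbc driver is on the classpath (host, table, and column names are invented):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuerySketch {
    public static void main(String[] args) throws Exception {
        // HiveServer2's default JDBC endpoint; the host is invented.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hiveserver:10000/default");
             Statement stmt = conn.createStatement();
             // Hive compiles this query into MapReduce (or Spark) jobs.
             ResultSet rs = stmt.executeQuery(
                 "SELECT page, COUNT(*) AS hits FROM clicks GROUP BY page")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}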

CS5412, Fall 2022 31


Apache Hive
➢ Structure is applied to data at read time → no need to worry about formatting the data when it is stored in the Hadoop cluster
➢ Data can be read using any of a variety of formats:
○ Unstructured flat files with comma or space-separated text
○ Semi-structured JSON files (a web standard for event-oriented data such
as news feeds, stock quotes, weather warnings, etc)
○ Structured HBase tables
➢ Hive is not designed for online transaction processing. Hive should be
used for “data warehousing” tasks, not arbitrary transactions.

CS5412, Fall 2022 32


Apache Pig: Scripting on MapReduce
Pig is an abstraction layer on top of Hadoop (MapReduce/Spark)

➢ Use Cases:
○ Data Preparation
○ ETL Jobs (Data Warehousing)
○ Data Mining

CS5412, Fall 2022 33


Apache Pig: Scripting on MapReduce
Pig is an abstraction layer on top of Hadoop (MapReduce/Spark)
➢ Code is written in Pig Latin “script” language (a data flow language)
➢ Facilitates reading, writing, and managing large datasets residing in
distributed storage
➢ Pig executes queries using MapReduce (and also using Spark)
○ Pig Latin scripts → Pig → MapReduce Jobs
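
Pig can also be driven from Java through its embedded PigServer API. A minimal sketch (file and field names invented; "local" mode runs on one machine, "mapreduce" would target the cluster):

import org.apache.pig.PigServer;

public class PigEmbedSketch {
    public static void main(String[] args) throws Exception {
        // "local" runs on this machine; "mapreduce" would target the cluster.
        PigServer pig = new PigServer("local");
        pig.registerQuery("clicks = LOAD 'clicks' AS (user, url, value:int);");
        pig.registerQuery("good = FILTER clicks BY value > 0;");
        pig.store("good", "valuable-clicks");  // storing triggers job execution
    }
}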

CS5412, Fall 2022 34


Apache Hive & Apache Pig
➢ Instead of writing Java code to implement MapReduce, one can opt between Pig Latin and Hive SQL to construct MapReduce programs
➢ Far fewer lines of code compared to MapReduce, which reduces the overall development and testing time

CS5412, Fall 2022 35


Apache Hive vs Apache Pig
Hive:
➢ Declarative SQL-like language (HiveQL)
➢ Operates on the server side of any cluster
➢ Better for structured data
➢ Easy to use, specifically for generating reports
➢ Data warehousing tasks
➢ Used at Facebook

Pig:
➢ Procedural data flow language (Pig Latin)
➢ Runs on the client side of any cluster
➢ Best for semi-structured data
➢ Better for creating data pipelines
○ allows developers to decide where to checkpoint data in the pipeline
➢ Incremental changes to large data sets, and also better for streaming
➢ Used at Yahoo

CS5412, Fall 2022 36


Apache Hive vs Apache Pig: example
Job: data from sources users and clicks is to be joined and filtered, then joined to data from a third source geoinfo, aggregated, and finally stored into a table ValuableClicksPerDMA.

Hive (HiveQL):
insert into ValuableClicksPerDMA
select dma, count(*)
from geoinfo join (
  select name, ipaddr
  from users join clicks on (users.name = clicks.user)
  where value > 0
) using ipaddr
group by dma;

Pig (Pig Latin):
Users = load 'users' as (name, age, ipaddr);
Clicks = load 'clicks' as (user, url, value);
ValuableClicks = filter Clicks by value > 0;
UserClicks = join Users by name, ValuableClicks by user;
Geoinfo = load 'geoinfo' as (ipaddr, dma);
UserGeo = join UserClicks by ipaddr, Geoinfo by ipaddr;
ByDMA = group UserGeo by dma;
ValuableClicksPerDMA = foreach ByDMA generate group, COUNT(UserGeo);
store ValuableClicksPerDMA into 'ValuableClicksPerDMA';

CS5412, Fall 2022 37


Data Ingestion Systems/Tools
➢ Apache Sqoop
○ High speed import to HDFS from Relational Database (and vice versa)
○ Supports many database systems,
e.g. Mongo, MySQL, Teradata, Oracle

➢ Apache Flume
○ Distributed service for ingesting streaming data
○ Ideally suited for event data from multiple systems, for example, log files

CS5412, Fall 2022 38


Apache Kafka
➢ Functions like a distributed publish-subscribe messaging system (or a distributed streaming platform)
○ A high-throughput, scalable messaging system
○ Distributed, reliable publish-subscribe system
○ Designed as a message queue and implemented as a distributed log service

➢ Originally developed by LinkedIn, now widely popular
➢ Features: durability, scalability, high availability, high throughput
➢ Check out the awesome Kafka “intro” video here.

CS5412, Fall 2022 39


What is Apache Kafka used for? (1)
➢ The original use case (@LinkedIn):
○ To track user behavior on websites.
○ Site activity (page views, searches, or other actions users might take) is
published to central topics, with one topic per activity type.

➢ Effective for two broad classes of applications:


○ Building real-time streaming data pipelines that reliably get data between
systems or applications
○ Building real-time streaming applications that transform or react to the
streams of data
CS5412, Fall 2022 40
What is Apache Kafka used for? (2)
➢ Lets you publish and subscribe to streams of records, similar to a
message queue or enterprise messaging system
➢ Lets you store streams of records in a fault-tolerant way
➢ Lets you process streams of records as they occur
➢ Lets you have both offline and online message consumption

CS5412, Fall 2022 41


Apache Kafka: Fundamentals
➢ Kafka is run as a cluster on one or more servers
➢ The Kafka cluster stores streams of records in categories called topics
➢ Each record (or message) consists of a key, a value, and a timestamp

➢ Point-to-Point: Messages persisted in a queue, a particular message is


consumed by a maximum of one consumer only
➢ Publish-Subscribe: Messages are persisted in a topic, consumers can
subscribe to one or more topics and consume all the messages in that topic

CS5412, Fall 2022 42


Apache Kafka: Components
Logical components:
➢ Topic: the named destination to which records are published
➢ Partition: one topic can have multiple partitions; a partition is the unit of parallelism
➢ Record or Message: key/value pair (+ timestamp)

Physical components:
➢ Producer: sends messages to brokers
➢ Consumer: receives messages from brokers
➢ Broker: one node of the Kafka cluster
➢ ZooKeeper: coordinates the Kafka cluster and consumer groups
CS5412, Fall 2022 43
Apache Kafka: Topics & Partitions (1)
➢ A stream of messages belonging to a particular category is called a
topic (or a feed name to which records are published)
➢ Data is stored in topics.
➢ Topics in Kafka are always multi-subscriber -- a topic can have zero, one, or many consumers that subscribe to the data written to it
➢ Topics are split into partitions. A topic may have many partitions, so it can handle an arbitrary amount of data

CS5412, Fall 2022 44


Apache Kafka: Topics & Partitions (2)
➢ For each topic, the Kafka cluster maintains a partitioned log.
➢ Each partition is an ordered, immutable sequence of records that is continually appended to -- a structured commit log.
➢ Partition offset: the records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition.

CS5412, Fall 2022 45


Apache Kafka: Topics & Partitions (3)
➢ The only metadata retained on a per-
consumer basis is the offset or
position of that consumer in the log.
➢ This offset is controlled by the
consumer -- normally a consumer will
advance its offset linearly as it reads
records (but it can also consume
records in any order it likes)

CS5412, Fall 2022 46


Apache Kafka: Topics & Partitions (4)
The partitions in the log serve several purposes:
➢ They allow the log to scale beyond a size that will fit on a single server.
➢ They let a topic handle an arbitrary amount of data -- a topic may have many partitions.
➢ They act as the unit of parallelism.

CS5412, Fall 2022 47


Apache Kafka: Distribution of Partitions (2)

Here, a topic is configured into three partitions.
Partition 1 has two offsets, 0 and 1. Partition 2 has four offsets: 0, 1, 2, and 3. Partition 3 has one offset, 0.
The id of the replica is the same as the id of the server that hosts it.
CS5412, Fall 2022 48
Apache Kafka: Producers
➢ Producers publish data to the topics of their choice.
➢ The producer is responsible for choosing which record to assign to which partition within the topic.
➢ Records can be assigned to partitions in a round-robin fashion simply to balance load, or according to some semantic partition function.
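
A minimal producer sketch with the Kafka Java client; the broker address and topic are invented. Giving a record a key routes it through the default hash partitioner instead of round-robin:

import java.util.Properties;
import org.apache.kafka.clients.producer.*;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // invented broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Keyed record: the default partitioner hashes the key to pick a partition.
            producer.send(new ProducerRecord<>("page-views", "user42", "/index.html"));
            // Unkeyed record: the partitioner balances load across partitions.
            producer.send(new ProducerRecord<>("page-views", "/about.html"));
        }
    }
}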

CS5412, Fall 2022 50


Apache Kafka: Consumers
➢ Consumer group: balances consumers across partitions
➢ Consumers label themselves with a consumer group name
➢ Each record published to a topic is delivered to one consumer instance within each subscribing consumer group
➢ If all the consumer instances have the same consumer group, then the records will effectively be load-balanced over the consumer instances
➢ If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes
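
A minimal consumer sketch; every consumer that shares the group.id below divides the topic's partitions among itself and its peers (broker, group, and topic names are invented):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;

public class ConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // invented broker
        props.put("group.id", "analytics");               // consumer group name
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("page-views"));
            while (true) {
                // Each record goes to exactly one consumer in the "analytics" group.
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("p%d@%d: %s=%s%n",
                            rec.partition(), rec.offset(), rec.key(), rec.value());
                }
            }
        }
    }
}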
CS5412, Fall 2022 51
Apache Kafka: Producers & Consumers

Example:
A two server Kafka cluster hosting four
partitions (P0 to P3) with two consumer
groups (A & B). Consumer group A has
two consumer instances (C1 & C2) and
group B has four (C3 to C6).

CS5412, Fall 2022 52


Apache Kafka: Design Guarantees (1)

➢ Records (or messages) sent by a producer to a particular topic partition will be appended in the order they are sent.
➢ A consumer instance sees records in the order they are stored in the log.
➢ For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log.

CS5412, Fall 2022 53


Apache Kafka: Design Guarantees (2)
Message Delivery Semantics:
➢ At most once: Messages may be lost but are never redelivered.
➢ At least once: Messages are never lost but may be redelivered.
➢ Exactly once: Each message is delivered once and only once
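
Which of these a consumer gets depends largely on when it commits its offsets. A sketch, reusing the consumer from the earlier sketch with enable.auto.commit=false:

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DeliverySemanticsSketch {
    // At-least-once: handle first, commit after. A crash between the two
    // replays the batch, so records may be redelivered but are never lost.
    static void atLeastOnceLoop(KafkaConsumer<String, String> consumer) {
        // assumes enable.auto.commit=false in the consumer's Properties
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> rec : records) {
                handle(rec);               // hypothetical application logic
            }
            consumer.commitSync();         // commit only after handling succeeded
        }
    }

    // At-most-once would call commitSync() before handling, so a crash can
    // drop the batch but never redelivers it. Exactly-once needs more
    // machinery (Kafka transactions / idempotent producers).
    static void handle(ConsumerRecord<String, String> rec) { /* ... */ }
}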

CS5412, Fall 2022 54


Apache Kafka: Four Core APIs (1)
Producer API: Allows an application to publish a
stream of records to one or more Kafka topics
Consumer API: Allows an application to
subscribe to one or more topics and process the
stream of records produced to them
Streams API: Allows an application to act as a
stream processor -- consuming an input stream
from one or more topics and producing an output
stream to one or more output topics
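
A tiny Streams topology sketch -- consume one topic, transform each record, produce to another; the application id and topic names are invented:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");   // invented id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // invented broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("raw-events")      // consume input topic
               .mapValues(v -> v.toUpperCase())           // transform each record
               .to("clean-events");                       // produce to output topic
        new KafkaStreams(builder.build(), props).start();
    }
}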

CS5412, Fall 2022 55


Apache Kafka: Four Core APIs (2)
Connector API:
Allows building and running producers or
consumers that connect Kafka topics to existing
applications or data systems.
For example, a connector to a relational
database might capture every change to a table.

CS5412, Fall 2022 56


Summary

Apache ecosystem: A comprehensive big-data framework

All open source, very standard architecture at all levels

Widely popular, but not always blindingly fast. Understanding the intended styles of use is important for good performance.

CS5412, Fall 2022 57
