BDA Unit 1

Unit 1

INTRODUCTION
Introduction to Big Data
• Big data refers to collections of data sets so massive and
complex that they demand specialized data management
capabilities.
• Large amounts of heterogeneous digital data exist today.
• With the growth of technology, ever larger volumes of data
are produced.
• This data can be structured, semi-structured or
unstructured, and comes from different sources.
Introduction to Big Data
• Big data is about data volumes and large data sets
measured in terabytes or petabytes.
• Data analytics technologies and techniques analyze these
data sets and gather new information about the data.
• Big data analytics is a form of advanced analytics, which
involves complex applications with elements such as
predictive models and statistical algorithms.
• Traditional SQL queries and RDBMS systems cannot
work with big data, so a wide variety of scalable tools
and techniques has evolved to work with big data.
Introduction to Big Data
• Big Data also names the techniques used to store, process,
manage, analyze, and report on a huge amount of varied
data, at the required speed and within the required time to
allow real-time analysis.
• The need for big data first came from big companies like
Google and Facebook.
Evolution of Big Data

[Figure: Evolution of Big Data]
Types of Big Data
• Big data is classified into three types:
1. Structured Data
2. Unstructured Data
3. Semi-Structured Data
Types of Big Data
• The structure of the data determines how to work with
the data and also indicates what insights it can produce.
• All data goes through a process called extract,
transform, load (ETL) before it can be analyzed (a
minimal sketch of the idea follows below).
• The data is collected, formatted, converted to be
readable by an application, and then stored for use.
• The ETL process for each structure of data varies.
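To make the ETL idea concrete, here is a minimal, illustrative Java sketch; the records, field names, and the print-based "load" step are invented for the example, and a real pipeline would write to a warehouse or lake instead:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class SimpleEtl {
        public static void main(String[] args) {
            // Extract: raw lines as they might arrive from a source system
            List<String> raw = List.of("alice,34", "bob,29");

            // Transform: parse each line into a structured record
            List<Map<String, String>> records = new ArrayList<>();
            for (String line : raw) {
                String[] parts = line.split(",");
                Map<String, String> rec = new HashMap<>();
                rec.put("name", parts[0]);
                rec.put("age", parts[1]);
                records.add(rec);
            }

            // Load: printed here; a real pipeline would persist the records
            records.forEach(System.out::println);
        }
    }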
Types of Big Data
1. Structured Data
• Any data that can be processed, accessed, and stored in a
fixed format is called structured data.
• Advances in software engineering have produced many
techniques for working with structured data.
• Structured data is the easiest to work with.
• It is highly organized, with dimensions defined by a set of
parameters.
• It is quantitative data such as age, billing details, contact
information, addresses, expenses, debit/credit card numbers, etc.
Types of Big Data
• Structured data is the easiest type of data to analyze
because it requires little preparation before processing;
a user only needs to cleanse the data and fit it to the relevant points.
• One of the major benefits of using structured data is the
streamlined process of merging enterprise data within a
relational database.
• The ETL process for structured data stores the finished
product in a data warehouse.
• These databases are highly structured and filtered for a
specific analytics purpose.
Types of Big Data
2. Unstructured Data
• This type of big data covers the whole multitude of
unstructured file formats,
 for example, image files, audio files, log files, and video files.
• Any data with an unfamiliar structure or model is
classed as unstructured data.
• Because the size of this data is huge, it poses distinct
difficulties for processing and for determining its value.
• The analytical processes needed for unstructured data take
time and effort to convert it into some level of readability.
• For unstructured data, the second phase of the ETL process
(transform) is the complicated one.
Types of Big Data
• It is difficult to analyze unstructured data and feed the
information extracted from it into an application.
• Doing so means translating it into some form of structured data.
• Methods like text parsing, natural language processing, and
developing content hierarchies are needed to do so.
• Unstructured data is placed in data lakes, which preserve the raw
format of the data and all of the information it holds.
• In warehouses, the data is limited to its defined schema; in
data lakes, the data is more flexible and comes in a variety of formats.
Types of Big Data
3. Semi-Structured Data
• Semi-structured data sits on the line between structured and
unstructured data.
• Most of the time, this translates to unstructured data with
metadata attached to it.
• The metadata can be inherent to the collected data, such as a
time, location, or device ID stamp or an email address, or it
can be a semantic tag attached to the data later.
• Semi-structured data bridges the gap between structured and
unstructured data.
• It can inform AI training and machine learning by
associating patterns with metadata.
• Semi-structured data has no set schema.
Characteristics of Big Data
• Since 1997, many attributes have been added to Big
Data.
• Among these attributes, three are the most popular and
have been widely cited and adopted.
• In 2001, analyst Doug Laney (then at META Group, which
Gartner later acquired) listed the 3 Vs of Big Data:
• Variety, Velocity, and Volume
Characteristics of Big Data
1) Variety
• Variety refers to the structured, unstructured, and
semi-structured data that is gathered from multiple sources.
• Today data comes in an array of forms such as emails, PDFs,
photos, videos, audio, social media posts, and much more.
• It signifies the variety of incompatible and inconsistent data
formats and data structures.
2) Velocity
• Velocity refers to the speed at which data is being
created in real time.
• It represents the pace at which data is generated by, and
used to support, interactions.
Characteristics of Big Data
3) Volume
• Volume refers to the incoming data stream and the
cumulative volume of data.
• Big Data implies huge volumes of data being
generated on a daily basis from various sources like
social media platforms, business processes, machines,
networks, human interactions, etc.
• Such large amounts of data are stored in data
warehouses.
Characteristics of Big Data
• IBM — 4Vs definition
• IBM added another attribute, or "V", for "Veracity" on
top of Doug Laney's 3Vs notation.
• This is known as the 4Vs of Big Data.
• It defines each "V" as follows:
• 1. Volume stands for the scale of data
• 2. Velocity denotes the analysis of streaming data
• 3. Variety indicates different forms of data
• 4. Veracity implies the uncertainty of data
Characteristics of Big Data
• Microsoft — 6Vs definition
• To maximize business value, Microsoft extended Doug
Laney's 3Vs attributes to 6 Vs by adding variability,
veracity, and visibility:
• 1. Volume stands for scale of data
• 2. Velocity denotes the analysis of streaming data
• 3. Variety indicates different forms of data
• 4. Veracity focuses on trustworthiness of data sources
Characteristics of Big Data
• Microsoft — 6Vs definition
• 5. Variability refers to the complexity of a data set. In
comparison with "Variety" (different data formats), it
means the number of variables in the data set.
• 6. Visibility emphasizes having a clear and full picture of the
data in order to make informed decisions.
Challenges of Big Data
1. Lack of proper understanding of Big Data
2. Storing huge sets of data properly is difficult
3. Need for synchronization across data sources
4. Constantly changing and updating data
5. Lack of data professionals
6. Data searching, sharing, and transferring is complicated
7. Securing huge sets of data is one of the most
overwhelming challenges
8. Finding and fixing data quality issues
9. Scaling big data systems efficiently and cost-effectively
Applications of Big Data
• Education Industry:- helps to analyze and study huge
amounts of data, which can be used to improve the
operational effectiveness and working of educational institutes.
• Healthcare Industry:- reduces the cost of treatment,
helps avoid preventable diseases by detecting them in the
early stages, and helps in deciding preventive measures.
• Government Sector:- helps in making faster and better-informed
decisions, identifying areas that need attention, and
overcoming national challenges such as unemployment,
terrorism, energy resources, etc.
• Media and Entertainment:- helps in predicting the interests
of audiences and in optimized or on-demand scheduling of
media streams.
Applications of Big Data
• Weather Patterns:- used in weather forecasting, in studying global
warming, in understanding the patterns of natural disasters,
and in making necessary preparations in case of crises.
• Transportation Industries:- helps in route planning,
congestion management and traffic control, and in increasing
the safety level of traffic.
• Banking Sector:- helps in detecting misuse of credit/debit
cards, managing credit risk, improving business clarity,
tracking changes in customer statistics, and detecting
money laundering.
Applications of Big Data
• Marketing:- helps to collect huge amounts of data and learn
the choices of millions of customers in a few seconds.
Analyzing the data helps marketers run campaigns, increase
click-through rates, place relevant advertisements, and improve
the product.
• Business Insights:- helps to solve many problems related
to profits, customer satisfaction, and product development.
• Space Sector:- helps to manage the data received from
satellites orbiting the earth, probes studying outer space,
and rovers on other planets, and analyzes it to run
simulations.
Enabling Technologies for Big Data
• The term big data is used for collections of data sets.
• These data sets are so large and complex that they are
difficult to process using traditional tools.
• A recent survey says that 80% of the data created in the
world is unstructured.
• Traditional tools are not able to handle data at such
a scale.
• One challenge, then, is simply to store and process this big data.
• To do so, enabling technologies and frameworks for
processing big data are needed.
Enabling Technologies for Big Data
• 1. Operational Big Data Technologies:
• These deal with the data generated on a daily basis,
such as online transactions, social media activity, or any sort
of data from a specific firm, used for analysis by software
based on big data technologies.
• This acts as raw data to feed the Analytical Big Data
Technologies.
• For example: Operational Big Data Technologies cover
executives' particulars in an MNC; online trading and
purchasing on Amazon, Flipkart, Walmart, etc.; online ticket
booking for movies, flights, railways, and many more.
Enabling Technologies for Big Data
• 2. Analytical Big Data Technologies:
• These refer to the more advanced adaptation of Big Data Technologies.
• The real investigation of massive data that is crucial for business
decisions comes under this part.
• For example: stock marketing, weather forecasting, time-series
analysis, and medical health records.
Enabling Technologies for Big Data
• In both types of big data technologies, the most commonly used
techniques that facilitate the practical use of big data are as
follows:
• 1) Predictive Analytics
• One of the prime tools for businesses to avoid risks in decision
making.
• Predictive analytics hardware and software solutions can be utilised for
the discovery, evaluation, and deployment of predictive scenarios by
processing big data.
• This data can help companies solve problems by analyzing and
understanding them.
• 2) NoSQL Databases
• These databases are utilised for reliable and efficient data management
across a scalable number of storage nodes.
• Rather than relational database tables, NoSQL databases store data as
JSON documents, key-value pairs, wide columns, or graphs.
Enabling Technologies for Big Data
• 3) Knowledge Discovery Tools
• These are tools that allow businesses to mine big data (structured
and unstructured) stored across multiple sources.
• These sources can be different file systems, APIs, DBMS or similar
platforms.
• With search and knowledge discovery tools, businesses can isolate
and utilise the information to their benefit.
• 4) Stream Analytics
• Organizations need to process data that may be stored on
multiple platforms and in multiple formats.
• Stream analytics software is useful for the filtering, aggregation, and
analysis of such big data.
• This software also allows connection to external data sources and
their integration into the application flow.
Enabling Technologies for Big Data
• 5) In-memory Data Fabric
• This technology helps distribute large quantities of data
across system resources such as dynamic RAM, flash storage, or
solid-state drives.
• This enables low-latency access and processing of big data on the
connected nodes.
• 6) Distributed Storage
• This technology provides a way to tolerate independent node
failures and the loss or corruption of big data sources.
• Distributed file stores contain replicated data; sometimes the data
is also replicated for low-latency, quick access on large computer
networks.
• These are generally non-relational databases.
Enabling Technologies for Big Data
• 7) Data Virtualization
• It enables applications to retrieve data without being constrained
by technical restrictions such as data formats, the physical location of
the data, etc.
• This technology is used by Apache Hadoop and other distributed
data stores for real-time or near real-time access to data stored on
various platforms.
• 8) Data Integration
• A key operational challenge for most organizations handling big
data is to process terabytes (or petabytes) of data in a way that can
be useful for customer deliverables.
• Data integration tools allow businesses to streamline data across a
number of big data solutions such as Amazon EMR, Apache Hive,
Apache Pig, Apache Spark, Hadoop, MapReduce, MongoDB and
Couchbase.
Enabling Technologies for Big Data
• 9) Data Preprocessing
• These software solutions are used to manipulate data into a
format that is consistent and can be used for further analysis.
• Data preparation tools accelerate the data sharing process by
formatting and cleansing unstructured data sets.
• A limitation of data preprocessing is that not all of its tasks can be
automated; they require human oversight, which can be tedious and
time-consuming.
• 10) Data Quality
• An important parameter for big data processing is data quality.
Data quality software can cleanse and enrich large data sets by
utilising parallel processing.
• These tools are widely used for getting consistent and reliable
outputs from big data processing.
Enabling Technologies for Big Data
Apart from the above-mentioned techniques, some other big data
enabling technologies are as follows:

• Apache Hadoop
• Hadoop Ecosystem
• HDFS Architecture
• YARN
• NoSQL
• Hive
• MapReduce
• Apache Spark
• ZooKeeper
• Cassandra
• HBase
• Spark Streaming
• Kafka
• Spark MLlib
• GraphX
Enabling Technologies for Big Data
• Apache Hadoop is the tool most commonly used for big
data computation.
• It is an open-source software framework for big data.
• The Hadoop framework was designed to store and process data
in a distributed data processing environment, on
commodity hardware, with a simple programming model.
• It can store and analyse data held on different
machines at high speed and low cost.
• In particular, Hadoop can process extremely large volumes of
data with varying structures (or no structure at all).
• Developed by: the Apache Software Foundation (Hadoop 1.0
was released in December 2011).
• Written in: Java
Enabling Technologies for Big Data
• Hadoop Ecosystem is used for big data computation.
• The Hadoop architecture comprises three layers.
1. Storage layer (HDFS)
2. Resource Management layer (YARN)
3. Processing layer (MapReduce)
Enabling Technologies for Big Data

[Figure: Hadoop Ecosystem]
Enabling Technologies for Big Data
• HDFS:-
• HDFS is one of the major components of Apache Hadoop.
• It is a distributed file system that handles large data sets
running on commodity hardware.
• It is used to scale a single Apache Hadoop cluster to
hundreds (and even thousands) of nodes.
• HDFS has been built to detect faults and automatically
recover quickly.
• It is designed for high data throughput rates.
• HDFS manages all the nodes and their corresponding
storage.
Enabling Technologies for Big Data
• HDFS:-
• HDFS is designed to be portable across multiple hardware
platforms and to be compatible with a variety of underlying
operating systems.
• HDFS supports applications that have data sets typically
ranging from gigabytes to terabytes in size.
• It provides high aggregate data bandwidth and can scale to
hundreds of nodes in a single cluster.
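As a rough illustration of how applications talk to HDFS, the sketch below uses the Hadoop FileSystem Java API; the NameNode address and file paths are assumptions for the example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Normally read from core-site.xml; hard-coded here for the example
            conf.set("fs.defaultFS", "hdfs://namenode:9000");
            FileSystem fs = FileSystem.get(conf);

            // Copy a local file into HDFS; its blocks are replicated across DataNodes
            fs.copyFromLocalFile(new Path("/tmp/input.txt"),
                                 new Path("/user/demo/input.txt"));

            // List the target directory to confirm the upload
            for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
                System.out.println(status.getPath());
            }
            fs.close();
        }
    }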
Enabling Technologies for Big Data
• YARN:-(Yet Another Resource Negotiator)
• It is one of Apache Hadoop's core components.
• Apache Hadoop YARN is the resource management and job
scheduling technology in the open
source Hadoop distributed processing framework.
• YARN is responsible for allocating system resources to the
various applications running in a Hadoop cluster and
scheduling tasks to be executed on different cluster nodes.
• YARN can dynamically allocate resources to applications as
needed.
Enabling Technologies for Big Data
• MapReduce:-
• MapReduce is a programming model for distributed computing,
based on Java.
• The MapReduce algorithm contains two important tasks:
Map and Reduce.
• Map takes a set of data, converts it into another set of
data, and breaks the individual elements into tuples
(key/value pairs).
• Reduce takes the output of a map as its input and
combines those data tuples into a smaller set of tuples.
• As the name MapReduce implies, the reduce task is
always performed after the map task.
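The canonical word-count example shows the two tasks in code. This sketch uses the standard Hadoop MapReduce Java API; the job driver that wires the classes together and sets input/output paths is omitted for brevity:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {
        // Map: emit a (word, 1) pair for every word in the input line
        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }

        // Reduce: sum the counts emitted for each word
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values,
                                  Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }
    }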
Enabling Technologies for Big Data
• Apache Hive is an open-source data warehousing tool for
performing distributed processing and data analysis.
• It was developed by Facebook to reduce the work of writing
Java MapReduce programs.
• Apache Hive uses the Hive Query Language (HiveQL), a
declarative language similar to SQL.
• Hive translates HiveQL queries into MapReduce programs.
• Hive supports applications written in languages such as
Python, Java, C++, and Ruby, which use JDBC, ODBC, and Thrift
drivers to run queries against Hive.
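As a hedged sketch of the JDBC route mentioned above, the Java snippet below runs a HiveQL query against HiveServer2; the host, credentials, and sales table are assumptions, and the hive-jdbc driver must be on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            // HiveServer2 conventionally listens on port 10000
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://hive-host:10000/default", "user", "");
            Statement stmt = conn.createStatement();
            // HiveQL looks like SQL; Hive compiles it into MapReduce jobs
            ResultSet rs = stmt.executeQuery(
                    "SELECT category, COUNT(*) FROM sales GROUP BY category");
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
            }
            conn.close();
        }
    }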
Enabling Technologies for Big Data
• Apache Pig is a data flow language.
• It is an abstraction over MapReduce.
• It is a tool used to analyze large data sets by
representing them as data flows.
• All data manipulation operations in Hadoop can be
performed using Apache Pig.
• To write data analysis programs, Pig provides a high-level
language known as Pig Latin.
• To analyze data using Apache Pig, programmers need to
write scripts using Pig Latin language.
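A hedged sketch of a Pig Latin word count follows, embedded in Java via Pig's PigServer API so the script can be driven programmatically; the file paths are assumptions, and the registerQuery strings are the actual Pig Latin:

    import org.apache.pig.PigServer;

    public class PigWordCount {
        public static void main(String[] args) throws Exception {
            PigServer pig = new PigServer("local"); // use "mapreduce" on a cluster
            // Each registerQuery call adds one Pig Latin statement to the data flow
            pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
            pig.registerQuery(
                "words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
            pig.registerQuery("grouped = GROUP words BY word;");
            pig.registerQuery(
                "counts = FOREACH grouped GENERATE group, COUNT(words);");
            pig.store("counts", "word_counts"); // triggers execution of the flow
            pig.shutdown();
        }
    }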
Enabling Technologies for Big Data
• HBase stands for Hadoop Database.
• HBase is a Java-based NoSQL (Not Only SQL) database that
runs on top of Hadoop.
• Data is stored in table format in HDFS.
• HBase stores data as key/value pairs.
• HBase is flexible and convenient for multiple reads and
writes of data stored in HDFS.
• HBase is a data model similar to Google's Bigtable,
designed to provide quick random access to huge amounts
of structured data.
• It provides random real-time read/write access to data in
the Hadoop File System.
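The sketch below shows this key/value style of access through the HBase Java client; the table name, column family, and row key are assumptions, and the table is presumed to already exist:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseKeyValue {
        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(
                         HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("users"))) {
                // Write: row key "user1", column family "info", qualifier "city"
                Put put = new Put(Bytes.toBytes("user1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("city"),
                              Bytes.toBytes("Pune"));
                table.put(put);

                // Random real-time read by row key, the pattern HBase targets
                Result result = table.get(new Get(Bytes.toBytes("user1")));
                System.out.println(Bytes.toString(result.getValue(
                        Bytes.toBytes("info"), Bytes.toBytes("city"))));
            }
        }
    }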
Enabling Technologies for Big Data
• A NoSQL database (sometimes called Not Only SQL) is a
database that provides a mechanism to store and retrieve data
modeled in ways other than the tabular relations used in
relational databases.
• These databases are schema-free, support easy replication,
have a simple API, and are typically eventually consistent
rather than strictly consistent.
• They can handle huge amounts of data.
• They are mainly used for:-
• simplicity of design
• horizontal scaling
• easy availability.
Enabling Technologies for Big Data
• Cassandra is a distributed database from Apache that is
highly scalable and designed to manage very large amounts
of structured data.
• It is a type of NoSQL, column-oriented database.
• It is highly scalable, with tunable consistency.
• It is used to provide high availability with no single
point of failure.
• Cassandra accommodates all possible data formats,
including structured, semi-structured, and unstructured.
• It can dynamically accommodate changes to data
structures according to the needs of the application.
• Its writes are atomic, isolated, and durable at the row
level, but consistency is tunable per operation rather than
offering the full ACID transactions of an RDBMS.
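A minimal sketch using the DataStax Java driver for Cassandra; the keyspace and table are assumptions and are presumed to already exist, and the driver's contact points come from its default configuration (localhost:9042):

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.ResultSet;
    import com.datastax.oss.driver.api.core.cql.Row;

    public class CassandraDemo {
        public static void main(String[] args) {
            try (CqlSession session = CqlSession.builder().build()) {
                // CQL resembles SQL, but data is distributed by partition key
                session.execute(
                    "INSERT INTO shop.users (id, name) VALUES (1, 'alice')");
                ResultSet rs = session.execute(
                    "SELECT name FROM shop.users WHERE id = 1");
                Row row = rs.one();
                System.out.println(row.getString("name"));
            }
        }
    }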
Enabling Technologies for Big Data
• MongoDB is an open-source document database and a leading
NoSQL database, written in C++.
• MongoDB is a cross-platform, document-oriented database.
• It provides high performance, high availability, and easy
scalability.
• MongoDB works on the concepts of collections and documents.
• A single MongoDB server typically has multiple databases.
• Collection is a group of MongoDB documents. It is the equivalent
of an RDBMS table.
• A document is a set of key-value pairs. Documents have dynamic
schema.
• MongoDB handles a wide variety of data types flexibly, at
high volumes, and across distributed architectures.
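The sketch below, using the MongoDB Java sync driver, shows the collection/document model in practice; the database, collection, and field names are assumptions:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class MongoDemo {
        public static void main(String[] args) {
            try (MongoClient client =
                         MongoClients.create("mongodb://localhost:27017")) {
                // A collection is the MongoDB equivalent of an RDBMS table
                MongoCollection<Document> users =
                        client.getDatabase("shop").getCollection("users");

                // Documents are sets of key-value pairs with a dynamic schema
                users.insertOne(new Document("name", "alice").append("age", 34));

                Document found = users.find(new Document("name", "alice")).first();
                System.out.println(found.toJson());
            }
        }
    }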
Enabling Technologies for Big Data
• Apache ZooKeeper is a service used by a cluster (group of nodes)
to coordinate between themselves and maintain shared data with
robust synchronization techniques.
• ZooKeeper is itself a distributed application and also
provides services for writing distributed applications.
• It has simple architecture and API.
• ZooKeeper allows developers to focus on core application logic
without thinking about the distributed nature of the application.
• The ZooKeeper framework was originally built at Yahoo for
accessing their applications in an easy and robust way.
• ZooKeeper resolves race conditions and deadlocks using a
fail-safe synchronization approach, and handles data
inconsistency with atomicity.
• It provides mutual exclusion and cooperation between server
processes, which helps HBase with configuration management.
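As a hedged sketch of the coordination idea, the snippet below uses the ZooKeeper Java client to publish and read back a small piece of shared configuration; the znode path and ensemble address are assumptions, and a production client would wait for the connection event before issuing requests:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkConfigDemo {
        public static void main(String[] args) throws Exception {
            // Connect to the ensemble; the watcher ignores events for brevity
            ZooKeeper zk = new ZooKeeper("localhost:2181", 5000, event -> {});

            // Create a znode holding a shared configuration value
            zk.create("/demo-config", "v1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Any node in the cluster can now read the same coordinated value
            byte[] data = zk.getData("/demo-config", false, null);
            System.out.println(new String(data));
            zk.close();
        }
    }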
Big Data Stack

[Figure: Big Data Stack]
• The Data Layer:-
• At the bottom of the stack are technologies that store
masses of raw data.
• The data comes from traditional sources like OLTP
databases, and less structured sources like log files,
sensors, web analytics, document and media archives.
Big Data Stack
• Data Storage Systems:-
• Following are some examples of data storage systems:
• Hadoop HDFS—the classic big data file system. It became
popular due to its robustness and limitless scale on
commodity hardware.
• Amazon S3—create buckets and load data using a variety of
integrations.
• MongoDB—a mature open-source document database,
built to handle data at scale with proven
performance.
• Cassandra—a distributed database from Apache that is
highly scalable and designed to manage very large amounts
of structured data.
Big Data Stack
• The Data Integration Layer:-
• To create a big data store, the data must be imported from
its original sources into the data layer.
• In many applications, data needs to be ingested into
specialized tools, such as data warehouses.
• This needs a data pipeline.
• To do this, a rich ecosystem of big data integration tools,
including powerful open source integration tools, is needed.
• These tools need to pull the data from sources, transform it,
and load it to a target system.
Big Data Stack
• Big Data Ingestion Tools
• Stitch—a lightweight ETL (Extract, Transform, Load) tool
which pulls data from multiple pre-integrated data sources,
transforms and cleans it as necessary.
• Stitch is easy to set up, seamless, and integrates multiple
sources of data.
• Blendo—a cloud data integration tool that lets you connect
data sources with a few clicks and pipe them to a destination server.
• Blendo provides schemas and optimization for email
marketing, eCommerce, and other big data use cases.
Big Data Stack
• Big Data Ingestion Tools
• Apache Kafka—an open-source streaming
messaging bus that can create a feed from your data
sources, partition the data, and stream it to passive
listeners.
• Apache Kafka is a powerful solution used in
production on data at huge scale.
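A minimal producer sketch with Kafka's Java client; the broker address and topic name are assumptions:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KafkaFeed {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            // Records with the same key land in the same partition, preserving order
            try (KafkaProducer<String, String> producer =
                         new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("clicks", "user1", "page=/home"));
            } // close() flushes any buffered records
        }
    }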
Big Data Stack
• The Data Processing Layer:-
• At this layer, data arrives at its destination.
• Now a technology is needed that can crunch the data to help
data analysis.
• The data processing layer should optimize the data to
facilitate more efficient analysis, and provide a compute
engine to run the queries.
• Data warehouse tools are optimal for processing data at
large scale, while a data lake is more appropriate for storage;
a lake relies on other technologies when its data needs to be
processed and analyzed.
Big Data Stack
• Data Processing Tools
• Following are examples of data processing tools (see the
sketch after this list).
• Apache Spark—similar to MapReduce but faster.
• Runs parallelized queries on unstructured, distributed data
in Hadoop. Spark also provides a SQL interface, but is not a
SQL engine.
• PostgreSQL—used to pipeline the data to facilitate
queries. PostgreSQL can be scaled by partitioning the data,
and it is very reliable.
• Amazon Redshift—a cloud-based data warehouse that
offers huge query processing speeds and can also be used as
a relational database.
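As promised above, here is a hedged sketch of Spark's Java and SQL interfaces; the input file and its fields are assumptions, and local[*] runs Spark on all local cores for testing:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SparkQuery {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("demo").master("local[*]").getOrCreate();

            // Load a dataset and query it through Spark's SQL interface
            Dataset<Row> sales = spark.read().json("sales.json");
            sales.createOrReplaceTempView("sales");
            spark.sql("SELECT category, SUM(amount) FROM sales GROUP BY category")
                 .show();
            spark.stop();
        }
    }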
Big Data Stack
• Data Analytics & BI Layer :-
• The data layer collected the raw materials for the analysis,
the integration layer mixed them all together, and the data
processing layer optimized and organized the data and executed
the queries.
• The analytics & BI layer is the application layer which, with
the help of that data, enables data-driven decisions.
• The technology in this layer helps run queries to
answer questions, slice and dice the data, build dashboards,
create visualizations, etc.
Big Data Stack
• Data Analytics Tools
• Tableau—a powerful BI and data visualization tool that
connects to the data and allows users to perform complex
analysis and build charts and dashboards.
• Chartio—a cloud BI service that allows you to connect data
sources, explore data, build SQL queries, transform the data as
needed, and create live, auto-refreshing dashboards.
• Looker—a cloud-based BI platform that allows you to query and
analyze large data sets via SQL, set up visualizations, and
define metrics that elaborate on the data.
Hadoop Distributions

Apache Hadoop (hadoop.apache.org): completely free and open source
• The Hadoop source; the oldest distro
• No packaging except TAR balls; no extra tools

Cloudera (www.cloudera.com): free / premium model (depending on cluster size)
• Very polished
• Comes with good tools to install and manage a Hadoop cluster

HortonWorks (www.hortonworks.com): completely open source
• Newer distro; tracks Apache Hadoop closely
• Comes with tools to manage and administer a cluster

MapR (www.mapr.com): free / premium model
• Has its own file system (an alternative to HDFS) and boasts higher performance
• Nice set of tools to manage and administer a cluster
• Does not suffer from a single point of failure
• Offers features such as mirroring, snapshots, etc.

Intel (hadoop.intel.com): premium
• Encryption support
• Hardware acceleration added to some layers of the stack to boost performance
• Admin tools to deploy and manage Hadoop

Pivotal HD (gopivotal.com): premium
• Fast SQL on Hadoop
• Software only or appliance
