UNIT-1
INTRODUCTION TO BIG DATA
A. BHANU PRASAD
Associate Professor, Dept. of CSE
Theory Contents Contd..
THE BIG DATA TECHNOLOGY LANDSCAPE: NoSQL (Not Only SQL),
Hadoop, Introduction to Hadoop, Introducing Hadoop, Why Hadoop?, Why
not RDBMS?, RDBMS versus Hadoop, Distributed Computing Challenges,
History of Hadoop, Hadoop Overview, Use Case of Hadoop, Hadoop
Distributors, HDFS (Hadoop Distributed File System), Processing Data with
Hadoop, Managing Resources and Applications with Hadoop YARN (Yet
Another Resource Negotiator), Interacting with Hadoop Ecosystem.
BOOKS
TEXT BOOKS:
1. Big Data and Analytics, Seema Acharya and Subhashini Chellappan, 2nd Edition, Wiley India.
REFERENCE BOOKS:
1. Big Data Now, O'Reilly Media, 2nd Edition, 2012.
Sources of Structured Data
• Relational databases: Oracle, IBM DB2, Microsoft SQL Server, MySQL (open source).
• Online transaction processing (OLTP): transactional/operational data from day-to-day
business activities, e.g., online banking and online shopping.
• Uses simple queries.
• Requires read/write operations.
• Size is smaller: 100 MB to 10 GB.
Ease of working with Structured data
1) Insert/delete/update: The Data Manipulation Language (DML)
operations provide the required ease with data input, storage, access,
process, analysis, etc.
2) Security: Encryption and tokenization solutions are available for the
security of information throughout its lifecycle. Only authorized
individuals are able to decrypt and view sensitive information.
3) Indexing: An index is a data structure that speeds up the data
retrieval operations.
4) Scalability: The storage and processing capabilities of the traditional
RDBMS can be easily scaled up by increasing the horsepower of the
database server.
5) Transactional processing: RDBMS supports the Atomicity,
Consistency, Isolation, and Durability (ACID) properties of
transactions. Atomicity: either a transaction happens in its entirety or
none of it at all. Consistency: before and after the execution of a
transaction, the database must be in a consistent state. Isolation:
concurrent transactions execute without interfering with one another.
Durability: all changes made to the database during a transaction are
permanent.
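To make points 1), 3), and 5) concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table, names, and amounts are hypothetical, and a production system would use a server RDBMS rather than SQLite.

```python
import sqlite3

# In-memory database for illustration; a real deployment would use a server RDBMS.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")

# 1) Insert/delete/update: DML makes data input and access straightforward.
cur.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("Alice", 500.0))
cur.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("Bob", 300.0))

# 3) Indexing: an index speeds up retrieval on the indexed column.
cur.execute("CREATE INDEX idx_owner ON accounts (owner)")

# 5) Transactional processing: the transfer either commits in its entirety or not at all.
try:
    cur.execute("UPDATE accounts SET balance = balance - 100 WHERE owner = 'Alice'")
    cur.execute("UPDATE accounts SET balance = balance + 100 WHERE owner = 'Bob'")
    conn.commit()    # Durability: committed changes are permanent.
except sqlite3.Error:
    conn.rollback()  # Atomicity: on failure, none of the changes apply.

print(cur.execute("SELECT owner, balance FROM accounts").fetchall())
```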
1.1.2 Semi-Structured Data
Semi-structured data is also referred to as self-describing structure. It
has the following features:
1) It does not conform to the data models that one typically associates
with relational databases or any other form of data tables.
2) It uses tags to segregate semantic elements.
3) Tags are also used to enforce hierarchies of records and fields within
data.
4) There is no separation between the data and the schema. The amount
of structure used is dictated by the purpose at hand.
5) In semi-structured data, entities belonging to the same class and
grouped together need not necessarily have the same set of attributes.
Even if they do have the same set of attributes, the order of the
attributes may not be the same; for all practical purposes, the order is
not important either.
Sources of Semi-Structured Data
Amongst the sources for semi-structured data, the front runners are
“XML” and “JSON” as depicted in Fig.
1) XML: eXtensible Markup Language (XML) is hugely popularized by
web services developed utilizing the Simple Object Access Protocol
(SOAP) principles.
2) JSON: JavaScript Object Notation (JSON) is used to transmit data
between a server and a web application. JSON is popularized by web
services developed utilizing Representational State Transfer
(REST), an architectural style for creating scalable web services.
MongoDB (an open-source, distributed, NoSQL, document-oriented
database) and Couchbase (originally known as Membase; also an
open-source, distributed, NoSQL, document-oriented database) store
data natively in JSON format.
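A small sketch of how self-describing JSON behaves, using Python's standard json module; the records below are hypothetical.

```python
import json

# Self-describing records: the tags (field names) travel with the data, and two
# records of the same class need not share the same set of attributes.
records = '''
[
  {"name": "Asha", "email": "asha@example.com", "phone": "555-0100"},
  {"name": "Ravi", "dept": "CSE"}
]
'''

for person in json.loads(records):
    # Missing attributes are simply absent; no fixed schema is enforced.
    print(person.get("name"), "-", person.get("email", "no email on record"))
```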
1.1.3 Unstructured Data
Unstructured data does not conform to any pre-defined data model.
The structure of unstructured data is quite unpredictable. Various
sources of unstructured data are depicted in Figure 1.8.
Issues with “Unstructured” Data
Although unstructured data is known NOT to conform to a pre-defined
data model or be organized in a pre-defined manner, there are
instances wherein the structure of the data (placed in the unstructured
category) can still be implied.
As the figure indicates, there could be a few other reasons for placing
data in the unstructured category despite it having some structure or
even being highly structured.
How to Deal with Unstructured Data?
The following techniques are used to find patterns in or interpret
unstructured data:
1) Data mining: We use methods at the intersection of artificial
intelligence, machine learning, statistics, and database systems to
unearth consistent patterns in large data sets and/or systematic
relationships between variables. It is the analysis step of the
“knowledge discovery in databases” process. A few popular data
mining algorithms are as follows:
Association rule mining: It is also called “market basket analysis”
or “affinity analysis”. It asks: when you buy a product, what other
product are you likely to purchase with it?
Regression analysis: It helps to predict the relationship between
two variables (a minimal sketch follows this list). The variable whose
value needs to be predicted is called the dependent variable, and the
variables used to predict it are referred to as the independent variables.
Collaborative filtering: It is about predicting a user’s preference or
preferences based on the preferences of a group of users.
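As promised above, a minimal sketch of regression analysis in pure Python; the data points are hypothetical, and real analyses would use a statistics library.

```python
# Ordinary least squares for simple linear regression: predict the dependent
# variable y from the independent variable x by fitting y = a + b*x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]    # independent variable (hypothetical)
ys = [2.1, 4.0, 6.2, 7.9, 10.1]   # dependent variable (hypothetical)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope b = cov(x, y) / var(x); intercept a = mean_y - b * mean_x.
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

print(f"fitted model: y = {a:.2f} + {b:.2f} * x")
print("prediction for x = 6:", round(a + b * 6, 2))
```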
Deal with Unstructured Data Contd..
2) Text analytics or text mining: Compared to the structured data stored
in relational databases, text is largely unstructured, amorphous, and
difficult to deal with algorithmically. Text mining is the process of
gleaning high quality and meaningful information (through devising
of patterns and trends by means of statistical pattern learning) from
text. It includes tasks such as text categorization, text clustering,
sentiment analysis, concept/entity extraction, etc.
3) Natural language processing (NLP): It is related to the area of human
computer interaction. It is about enabling computers to understand
human or natural language input.
4) Noisy text analytics: It is the process of extracting structured or
semi-structured information from noisy unstructured data such as
chats, blogs, wikis, emails, message boards, text messages, etc. Noisy
unstructured data usually comprises one or more of the following:
spelling mistakes, abbreviations, acronyms, non-standard words,
missing punctuation, missing letter case, filler words such as “uh”,
“um”, etc.
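A hedged sketch of one small step of noisy text analytics: regex-based normalization in Python. The abbreviation map and the sample message are hypothetical; real systems use far richer dictionaries and language models.

```python
import re

# Hypothetical expansion map for common chat abbreviations.
ABBREVIATIONS = {"pls": "please", "msg": "message", "u": "you", "r": "are", "ur": "your"}

def clean_noisy_text(text: str) -> str:
    text = text.lower()                               # normalize letter case
    text = re.sub(r"\bu+h+m*\b", "", text)            # drop fillers like "uh", "uhm"
    words = re.findall(r"[a-z']+", text)              # crude tokenization
    words = [ABBREVIATIONS.get(w, w) for w in words]  # expand abbreviations
    return " ".join(words)

print(clean_noisy_text("Uh pls send me ur msg, u r late!!"))
# -> please send me your message you are late
```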
Deal with Unstructured Data Contd..
5) Manual tagging with metadata: This is about tagging data manually
with adequate metadata to provide the requisite semantics to
understand unstructured data.
6) Part-of-speech tagging: It is also called POS or POST or grammatical
tagging. It is the process of reading text and tagging each word in a
sentence as belonging to a particular part of speech such as “noun”,
“verb”, “adjective”, etc.
7) Unstructured Information Management Architecture (UIMA): It is an
open-source platform from IBM. It is used for real-time content
analytics: processing text and other unstructured data to find latent
meaning and relevant relationships buried therein.
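For experimentation, POS tagging is available in the NLTK library for Python; the snippet below assumes NLTK is installed and that the tokenizer/tagger resource names match your NLTK version (they have changed across releases).

```python
import nltk

# One-time model downloads (resource names may differ in newer NLTK releases).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Big data demands new tools for storage and analysis."
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# Each word is tagged with its part of speech, e.g. ('demands', 'VBZ') for a verb.
```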
1.2 Characteristics of Data
Data has three key characteristics:
1) Composition: The composition of data deals with the structure of data,
that is, the sources of data, the granularity, the types, and the nature of
data as to whether it is static or real-time streaming.
2) Condition: The condition of data deals with the state of data, that is,
“Can one use this data as is for analysis?” or “Does it require cleansing
for further enhancement and enrichment?”
3) Context: The context of data deals with “Where has this data been
generated?” “Why was this data generated?” “How sensitive is this data?”
“What are the events associated with this data?” and so on.
Small data (data as it existed prior to the big data revolution) is about
certainty. It is about fairly known data sources; it is about no major
changes to the composition or context of data.
Big data is about complexity: complexity in terms of multiple and
unknown datasets, exploding volume, the speed at which the data is
being generated and needs to be processed, and the variety of data
(internal or external, behavioral or social) that is being generated.
1.3 Evolution of BIG DATA
1970s and before was the era of mainframes.
The data was essentially primitive and structured. Relational
databases evolved in the 1980s and 1990s. That era was one of
data-intensive applications.
The World Wide Web (WWW) and the Internet of Things (IoT) have
led to an onslaught of structured, unstructured, and multimedia data
as shown in Table 2.1.
1.4 Definition of BIG DATA
Different sources defined Big data in different ways:
Big data is high-volume, high-velocity, and high-variety information
assets that demand cost-effective, innovative forms of information
processing for enhanced insight and decision making. (or)
Big data is anything beyond the human and technical infrastructure
needed to support storage, processing, and analysis. (or)
Big data is the term for a collection of datasets so large and complex
that it becomes difficult to process them using database system tools
and traditional processing applications.
Today’s BIG may be tomorrow’s NORMAL.
The 3Vs (Volume, Velocity, Variety) concept was proposed by Gartner
analyst Doug Laney.
There is no explicit definition of how big a dataset should be for it to
be considered “big data.” Big data is data that is just too big, moves too
fast, or does not fit the structures of typical database systems. The data
changes are highly dynamic.
1.5 Challenges with Big Data
Following are a few challenges with big data:
1) Data Generation: Data today is growing at an exponential rate. The
key questions here are: “Will all this data be useful for analysis?”, “Do
we work with all this data or a subset of it?”, “How will we separate
the knowledge from the noise?”, etc.
2) Cloud computing and virtualization: Cloud computing is the answer
to managing infrastructure for big data as far as cost-efficiency,
elasticity, and easy upgrading/downgrading is concerned. This
further complicates the decision to host big data solutions outside the
enterprise.
3) Retention: How long should one retain this data? Some data is
useful for making long-term decisions, whereas in a few cases the data
may become irrelevant and obsolete just a few hours after having
been generated.
4) Lack of talent: There are a lot of big data projects in major
organizations, but there is a shortage of skilled professionals with the
high level of proficiency in data science that is vital to
implementing big data solutions.
Challenges with Big Data Contd..
5) Data visualization: Data visualization is becoming popular as a
separate discipline, and we are short of business visualization experts
by quite a number.
6) Data quality: The problem here is the veracity of the data: the data
can be messy, inconsistent, and incomplete.
7) Discovery: Analyzing petabytes of data using extremely powerful
algorithms to find patterns and insights is very difficult.
8) Storage: The more data an organization has, the more complex the
problem of managing it can become. The question that arises here is
“Where to store it?” We need a storage system that can easily scale
up or down on demand.
9) Analytics: In the case of Big Data, most of the time we are unaware of
the kind of data we are dealing with, so analyzing that data is even
more difficult.
10) Security: Since the data is huge in size, keeping it secure is another
challenge. It includes user authentication, restricting access on a
per-user basis, recording data access histories, proper use of data
encryption, etc.
1.6 What is Big Data?
Big data is data that is big in volume, velocity, and variety. Refer
Figure 2.5.
1) Volume
Volume refers to the ‘amount of data’, which is growing day by day at a
very fast pace; whether data can actually be considered big data or not
depends on its volume.
Data is rapidly increasing: GB, TB, PB, and beyond.
Sources of big data
1) Typical internal data sources: Data present within an organization’s
firewall. It is as follows:
• Data storage: File systems, SQL (RDBMSs – Oracle, MS SQL
Server, DB2, MySQL, PostgreSQL, etc.), NoSQL (MongoDB,
Cassandra, etc.), and so on.
• Archives: Archives of scanned documents, paper archives, customer
correspondence records, patients’ health records, students’
admission records, students’ assessment records, and so on.
Sources of big data Contd..
2) External data sources: Data residing outside an organization’s firewall.
It is as follows:
• Public Web: Wikipedia, weather, regulatory, compliance, census,
etc.
3) Both (internal + external data sources)
• Sensor data: Car sensors, smart electric meters, office buildings, air
conditioning units, refrigerators, and so on.
• Machine log data: Event logs, application logs, Business process
logs, audit logs, clickstream data, etc.
• Social media: Twitter, blogs, Facebook, LinkedIn, YouTube,
Instagram, etc.
• Business apps: ERP, CRM, HR, Google Docs, and so on.
• Media: Audio, video, image, podcast, etc.
• Docs: Comma separated value (CSV), Word Documents, PDF, XLS,
PPT, and so on.
What is Big Data? Contd..
2) Velocity: Refers to the speed of generation of data: how fast the data
is generated and processed to meet demands.
We have moved from the days of batch processing (remember our payroll
applications) to real-time processing.
Batch → Periodic → Near real time → Real-time processing
In 1990: hard disk: 1 GB-20 GB; RAM: 28 MB; reading capacity: 10 kbps.
3) Variety: Variety deals with a wide range of data types and sources of
data. There are three categories:
1) Structured data: From traditional transaction processing systems and
RDBMS, etc.
2) Semi-structured data: For example Hyper Text Markup Language
(HTML), eXtensible Markup Language (XML).
3) Unstructured data: For example unstructured text documents, audios,
videos, emails, photos, PDFs, social media, etc.
1.7 Other Characteristics of Data Which are
not Definitional Traits of Big Data
There are yet other characteristics of data which are not necessarily the
definitional traits of big data. Few of these are listed as follows:
1) Veracity and validity: Veracity refers to biases, noise, and
abnormality in data. The key question here is: “Is all the data that is
being stored, mined, and analyzed meaningful and pertinent to the
problem under consideration?” Validity refers to the accuracy and
correctness of the data. Any data picked up for analysis needs to be
accurate; this is not true of big data alone.
2) Volatility: Volatility of data deals with how long the data is valid
and how long it should be stored. There is some data that is required
for long-term decisions and remains valid for longer periods of time.
However, there are also pieces of data that become obsolete
minutes after their generation.
3) Variability: Data flows can be highly inconsistent, with periodic
peaks. The challenge is to handle and manage such data effectively.
4) Value: Is analyzing big data adding to the benefits of the
organizations doing it?
1.8 Why Big Data?
The more data we have for analysis, the greater will be the analytical
accuracy and also the greater would be the confidence in our decisions
based on these analytical findings.
This will entail a greater positive impact in terms of enhancing
operational efficiencies, reducing cost and time, innovating on new
products and services, and optimizing existing services. Refer Figure
2.8.
1.9 Are We Just an Information Consumer or
Do we also Produce Information?
There are several instances every day where you generate data:
1) A text message sent to confirm attendance at the promotion bash.
2) Use of a credit card to pay for gas/fuel at the gas station.
3) The Point of Sale system at Archie’s where your transaction gets
recorded.
4) Photographs and posts on social networking sites.
5) Likes and comments on your posts.
1.10 Traditional Business Intelligence (BI) versus
Big Data
Some of the differences between traditional BI and big data:
1) In a traditional BI environment, all the enterprise’s data is housed
in a central database server that scales vertically; in a big data
environment, data resides in a distributed file system that scales
horizontally.
2) In traditional BI, data is generally analyzed in an offline mode; in
big data, it is analyzed in both real-time and offline modes.
3) Traditional BI is about structured data, and the data is taken to the
processing functions (move data to code); big data is about variety
(structured, semi-structured, and unstructured data), and the
processing functions are taken to the data (move code to data).
1.11 A Typical Data Warehouse Environment
In a typical Data Warehouse (DW) environment, operational or
transactional or day-to-day business data is gathered from Enterprise
Resource Planning (ERP) systems, Customer Relationship
Management (CRM), legacy systems, and several third party
applications.
The data from these sources may differ in format [the data could have
been housed in any RDBMS such as Oracle, MS SQL Server, DB2, MySQL,
Teradata, and so on, or in spreadsheets (.xls, .xlsx, etc.), .csv, or .txt
files].
Data may come from data sources located in the same geography or
different geographies. This data is then integrated, cleaned up,
transformed, and standardized through the process of Extraction,
Transformation, and Loading (ETL). The transformed data is then
loaded into the enterprise data warehouse (available at the enterprise
level) or data marts (available at the business unit/ functional unit or
business process level).
A host of market-leading business intelligence and analytics tools are
then used to enable decision making from the use of ad-hoc queries,
SQL, enterprise dashboards, data mining, etc. Refer Figure 2.9.
[Figure 2.9: Data is gathered from different sources and in different
formats; it passes through Extraction, Transformation, and Loading
(ETL); business intelligence and analytics tools then consume it.]
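A minimal ETL sketch in Python, with sqlite3 standing in for the enterprise data warehouse; the file name, column names, and cleanup rules are hypothetical.

```python
import csv
import sqlite3

# Extract: read rows from a hypothetical CSV export of a source system
# (assumed columns: customer, region, amount).
with open("crm_export.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: clean up and standardize formats that differ across sources.
cleaned = []
for r in rows:
    cleaned.append((
        r["customer"].strip().title(),   # normalize name casing
        r["region"].strip().upper(),     # standardize region codes
        round(float(r["amount"]), 2),    # enforce a numeric amount
    ))

# Load: write the transformed rows into a warehouse table.
dw = sqlite3.connect("warehouse.db")
dw.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, region TEXT, amount REAL)")
dw.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned)
dw.commit()
```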
1.12 A Typical Hadoop Environment
The Hadoop environment is very different from the data warehouse environment.
As is fairly obvious from Figure 2.10, the data sources are quite disparate: from
web logs to images, audio, and video, to social media data, to various docs,
PDFs, etc.
Here the data in focus is not just the data within the company’s firewall but
also data residing outside the company’s firewall.
This data is placed in Hadoop Distributed File System (HDFS). If need be, this
can be repopulated back to operational systems or fed to the enterprise data
warehouse or data marts or Operational Data Store (ODS) to be picked for
further processing and analysis.
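As a sketch of processing data that has been placed in HDFS, here is the classic word-count pair of scripts for Hadoop Streaming, written in Python; the input/output paths and job wiring are assumptions, though Hadoop Streaming itself is a standard Hadoop facility.

```python
#!/usr/bin/env python3
# mapper.py -- Hadoop Streaming pipes each input line to stdin and expects
# tab-separated key/value pairs on stdout.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word.lower()}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- Streaming delivers the mapper output sorted by key, so all
# counts for one word arrive together and can be summed in a single pass.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word == current_word:
        count += int(value)
    else:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

The two scripts would typically be submitted with the hadoop-streaming jar shipped with the distribution, along the lines of `hadoop jar hadoop-streaming-*.jar -input /data/in -output /data/out -mapper mapper.py -reducer reducer.py` (the exact jar path varies by distributor).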
1.13 What is New Today?
Coexistence of Big Data and Data Warehouse
A few companies are comfortable working with their incumbent data
warehouse for standard BI and analytics reporting, for example the quarterly
sales report, customer dashboard, etc.
Hadoop, however, brings to the table the power to perform different types of
analysis on different types of data.
The same operational systems that were engaged in powering the data
warehouse can also populate the big data environment when they are needed
for computation-rich processing or for raw data exploration.
We cannot ignore the powerful analytics capability of Hadoop or the
revolutionary developments in RDBMS. So, the need of the hour is to have
both the data warehouse and Hadoop co-exist in today’s environment.
1.14 What is changing in the Realms of Big Data?
Three very important reasons why companies should compulsorily
consider leveraging big data:
1) Competitive advantage: The most important resource with any
organization today is their data. What they do with it will determine
their fate in the market.
2) Decision making: Decision making has shifted from the hands of the
elite few to the empowered many. Good decisions play a significant
role in furthering customer engagement, reducing operating margins
in retail, cutting cost and other expenditures in the health sector.
3) Value of data: The value of data continues to see a steep rise. As data
is the all-important resource, it is time to look at newer architectures,
tools, and practices to leverage it.