BIG Data Analytics

Uploaded by

Pawan

Big Data Analytics using R, B.Com (VI Sem)

UNIT I

INTRODUCTION TO BIGDATA

Data; classification of digital data (structured, semi-structured, and unstructured data);
characteristics of data; evolution of big data; definition and challenges of big data;
what big data is and why to use it; business intelligence vs. big data.

1) What is data? Explain the classification of digital data.

Irrespective of the size of the enterprise, whether big or small, data continues to
be a precious and irreplaceable asset.
Data:
Data is present in homogeneous sources as well as in heterogeneous sources. The
need of the hour is to understand, manage, and process the data, and to analyze it to draw
valuable insights. Data generates information, and from information we can draw valuable insight.
Digital data can be broadly classified into three types:
I. structured data,
II. semi-structured data, and
III. unstructured data.

Fig. Classification of digital data: structured data, semi-structured data, and unstructured data

1. Structured data:
• Data that follows a pre-defined schema/structure is called structured data.
• It is in an organized form (e.g., in rows and columns) and can be easily
used by a computer program.
• Relationships exist between entities of data, such as classes and their objects.
• About 10% of an organization's data is in this format.
• Data stored in databases is an example of structured data.

2. Semi-structured data:
• Semi-structured data is also referred to as having a self-describing structure.
• It does not conform to a data model but has some structure.
• However, it is not in a form that can be used easily by a computer program.
• About 10% of an organization's data is in this format; for example, HTML, XML, JSON, and email data.
3. Unstructured data:
• It does not conform to a data model or is not in a form that can
be used easily by a computer program.
• About 80% of an organization's data is in this format; for example, images, videos,
audio, chat room conversations, memos, PowerPoint presentations, and the body of an email.
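
The difference between the three classes shows up in how a program has to read them. The following is a minimal sketch in Python, assuming small made-up samples (the CSV text, the JSON records, and the memo are all hypothetical). Structured data can be addressed by column name, semi-structured data carries its own keys, and unstructured text offers nothing to address until some structure is extracted from it.

```python
import csv
import io
import json

# Structured: rows and columns that follow a fixed schema (hypothetical sample).
csv_text = "customer_id,name,city\n12365,Smith,Graz\n23658,Jack,Wolfsberg\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, but records need not share one schema.
json_text = (
    '[{"name": "Smith", "email": "smith@example.com"},'
    ' {"name": "Jack", "tags": ["vip"]}]'
)
records = json.loads(json_text)

# Unstructured: free text with no data model; only generic operations
# (such as counting words) apply until structure is extracted.
memo = "Meeting moved to Friday. Please send the slides before noon."
word_count = len(memo.split())

print(rows[0]["city"])            # field access by column name
print(sorted(records[1].keys()))  # keys differ from record to record
print(word_count)
```

Note that the second JSON record has a `tags` key and no `email` key, which a fixed table schema would not allow without adding empty columns.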

2) What are the characteristics of data?

Data has three key characteristics:
1. Composition
2. Condition
3. Context

1. Composition: The composition of data deals with the structure of data, that is, the sources
of data, the granularity, the types, and the nature of data as to whether it is static or
real-time streaming.
2. Condition: The condition of data deals with the state of data, that is, "Can one use this
data as is for analysis?" or "Does it require cleaning for further enhancement and
enrichment?"
3. Context: The context of data deals with questions such as "Where has this data been generated?",
"Why was this data generated?", "How sensitive is this data?",
"What are the events associated with this data?", and so on.

• Small data (data as it existed prior to the big data revolution) is about certainty. It is
about known data sources, with no major changes to the composition or context of data.

Fig. Characteristics of data: composition, condition, and context (Big Data and Analytics)

• Big data is about complexity: complexity in terms of multiple and unknown datasets, in terms of exploding
volume, in terms of the speed at which data is being generated and the speed at which
it needs to be processed, and in terms of the variety of data (internal or external,
behavioural or social) that is being generated.

3) What are the characteristics of big data?

Big data is a collection of data so large that traditional data storage and
processing systems cannot handle it. Many multinational companies use it to process data and
run their business. By one estimate, the data flow would exceed 150 exabytes per day before
replication.

The five V's of big data describe its characteristics:

1. Volume
2. Variety
3. Veracity
4. Value
5. Velocity

1. Volume

The name big data itself refers to enormous size. Big data is a vast volume of data
generated daily from many sources, such as business processes, machines, social media
platforms, networks, and human interactions.

For example, Facebook reportedly generates approximately a billion messages, records about
4.5 billion clicks of the "Like" button, and receives more than 350 million new posts each day. Big data
technologies are designed to handle such large amounts of data.
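
To put units such as exabytes in perspective, the sketch below converts the figure of 150 exabytes per day quoted earlier into terabytes per second. It assumes decimal (SI) units, where each step is a factor of 1,000; the `convert` helper is a hypothetical name introduced only for this illustration.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; binary
# units such as TiB would use powers of 1024 instead).
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}
SECONDS_PER_DAY = 24 * 60 * 60


def convert(value, from_unit, to_unit):
    """Convert a storage size between the units listed in UNITS."""
    return value * UNITS[from_unit] / UNITS[to_unit]


# 150 exabytes per day, expressed as terabytes per second.
tb_per_second = convert(150, "EB", "TB") / SECONDS_PER_DAY
print(round(tb_per_second, 1))
```

Under these assumptions the flow works out to roughly 1,736 terabytes every second, which is why a single machine or a traditional database cannot absorb it.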

2. Variety

Big data can be structured, unstructured, or semi-structured, and it is collected from
different sources. In the past, data was collected only from databases and spreadsheets, but these
days it arrives in an array of forms, such as PDFs, emails, audio, social media posts, photos,
and videos.

The data is categorized as below:

a. Structured data: Structured data follows a defined schema with all the required columns. It is in a
tabular form and is stored in a relational database management system. OLTP (Online Transaction
Processing) systems are built to work with structured data stored in relations, i.e., tables.
b. Semi-structured data: In semi-structured data, the schema is not strictly defined,
e.g., JSON, XML, CSV, TSV, and email.
c. Unstructured data: Unstructured files such as log files, audio files, and image files
are included in unstructured data. Some organizations have a great deal of data available
but do not know how to derive value from it because the data is raw.
d. Quasi-structured data: Textual data with inconsistent
formats that can be formatted with effort, time, and some tools.

Example: Web server logs, i.e., log files created and maintained by a server that
contain a list of activities.
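
As an illustration of how quasi-structured data is "formatted with effort and some tools", the sketch below turns web server log lines into records with a regular expression. The log lines are hypothetical and are assumed to follow the widely used Common Log Format; real logs vary, which is why lines that do not match are set aside rather than discarded silently.

```python
import re

# Hypothetical log lines; the last one deliberately does not fit the format.
log_lines = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.11 - - [10/Oct/2023:13:55:40 +0000] "POST /login HTTP/1.1" 302 512',
    "malformed entry without the expected fields",
]

# One named group per field we want to extract.
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

parsed, rejected = [], []
for line in log_lines:
    match = pattern.match(line)
    if match:
        parsed.append(match.groupdict())   # now a record with named fields
    else:
        rejected.append(line)              # keep for inspection

print(len(parsed), "parsed,", len(rejected), "rejected")
print(parsed[0]["path"], parsed[0]["status"])
```

Once parsed, each line behaves like a row of structured data and can be filtered or counted by field.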

3. Veracity

Veracity means how reliable the data is. There are many ways to filter or translate the data, and
veracity also covers the ability to handle and manage data efficiently. Reliable big data is
essential in business development.

Facebook posts with hashtags are an example of data whose reliability has to be assessed.

4. Value

Value is an essential characteristic of big data. What matters is not simply the data that we process or store, but
the valuable and reliable data that we store, process, and analyze.

5. Velocity

Velocity plays an important role compared to the other characteristics. Velocity is the speed at which
data is created in real time. It covers the speed of incoming data sets, the rate of
change, and activity bursts. A primary aim of big data is to provide demanded data
rapidly.

Big data velocity deals with the speed at which data flows from sources such as application logs,
business processes, networks, social media sites, sensors, and mobile devices.
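
Velocity can be made concrete by counting how many events arrive in each time window. The sketch below is a minimal illustration in Python, assuming a hypothetical list of arrival times in seconds; it reports the overall rate and the busiest one-second window, which corresponds to an activity burst.

```python
from collections import Counter

# Hypothetical arrival times of events, in seconds since capture started.
arrival_times = [0.1, 0.4, 0.9, 1.2, 1.3, 1.5, 1.8, 3.6]

# Bucket the events into one-second windows and count each bucket.
events_per_second = Counter(int(t) for t in arrival_times)

# Overall rate across the observed time span.
duration = max(arrival_times) - min(arrival_times)
average_rate = len(arrival_times) / duration

# The busiest window shows the burst.
peak_second, peak_count = events_per_second.most_common(1)[0]

print(dict(events_per_second))
print(round(average_rate, 2), peak_second, peak_count)
```

A window with no events (second 2 in this sample) simply does not appear in the counter, so the gap between bursts is visible as well.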

4) What is big data? Explain the evolution of big data.

Big Data:
Big data is a large set of structured, semi-structured, and unstructured data. It is data that
arrives at a much higher volume, at a much faster rate, in a wider variety of file formats, and
from a wider variety of sources than structured data alone.

The term "big data" has been around since the late 1990s. Its first published use is often credited to
NASA researchers Michael Cox and David Ellsworth in their 1997 paper "Application-Controlled
Demand Paging for Out-of-Core Visualization."
They used the term to describe the challenge of processing and visualizing vast amounts of
data from supercomputers.

Three primary components are still in use today to describe big data:

Volume (size of data),
Velocity (speed at which data grows), and
Variety (number of data types and sources from which the data comes).

The History and Evolution of Big Data:

• The 1970s and before was the era of mainframes. The data was essentially primitive and structured.
• Relational databases evolved in the 1980s and 1990s. This was the era of data-intensive
applications.
• The World Wide Web (WWW) and the Internet of Things (IoT) have led to an
onslaught of structured, unstructured, and multimedia data.
Refer to Table 1.1.

Table 1.1 The evolution of big data (Big Data and Analytics)

S.No.  Period           Technology
1      1940s to 1989    Data Warehousing and Personal Desktop Computers
2      1989 to 1999     Emergence of the World Wide Web
3      2000s to 2010s   Controlling Data Volume, Social Media and Cloud Computing
4      2010s to now     Optimization Techniques, Mobile Devices and IoT

1940s to 1989 – Data Warehousing and Personal Desktop Computers

• The origins of electronic storage can be traced back to the development of one of the world's
first programmable electronic computers, the Electronic Numerical Integrator and
Computer (ENIAC).
• In 1954, Bell Labs completed TRADIC, one of the first fully transistorized computers. International
Business Machines (IBM) and others released commercial transistorized computers in the years that followed.
• One of the first personal desktop computers to feature a Graphical User Interface (GUI) was
Lisa, released by Apple Computer in 1983.
• Throughout the 1980s, companies like Apple, Microsoft, and IBM released a
wide range of personal desktop computers.

1989 to 1999 – Emergence of the World Wide Web

• Between 1989 and 1993, British computer scientist Sir Tim Berners-Lee created
the fundamental technologies required to power what we now know as the World
Wide Web.
• These web technologies were HyperText Markup Language (HTML), Uniform
Resource Identifier (URI), and Hypertext Transfer Protocol (HTTP).
• As more devices gained access to the internet, there was a massive explosion in the
amount of information that people could access and share at any one time.

2000s to 2010s – Controlling Data Volume, Social Media and Cloud Computing

• During the early 2000s, companies such as Amazon, eBay, and Google helped
generate large amounts of web traffic, as well as a combination of structured and
unstructured data.
• Amazon also launched a beta version of AWS (Amazon Web Services) in 2002,
which opened the Amazon.com platform to all developers. By 2004, over 100
applications had been built for it.
• AWS then relaunched in 2006, offering a wide range of cloud infrastructure services,
including Simple Storage Service (S3) and Elastic Compute Cloud (EC2).
• The public launch of AWS attracted a wide range of customers, such as Dropbox,
Netflix, and Reddit, who were eager to become cloud-enabled and partnered with AWS before 2010.

2010s to now – Optimization Techniques, Mobile Devices and IoT

In the 2010s, the biggest challenges facing big data were the advent of mobile devices and the
IoT (Internet of Things).

The rise of mobile devices and IoT devices also led to new types of data being collected,
organized, and analyzed. Some examples include:

• Sensor data (data collected by internet-enabled sensors to provide valuable, real-time
insight into the inner workings of a piece of machinery)
• Social data (publicly available social media data from platforms like Facebook and
Twitter)
• Transactional data (data from online web stores including receipts, storage records,
and repeat purchases)
• Health-related data (heart rate monitors, patient records, medical history)

The Future of Big Data Solutions

A major direction for big data technology is AI (Artificial Intelligence) and automation, both of which are
streamlining the process of database management and big data analysis, making it easier to
convert raw data into meaningful insights that make sense to key decision makers.

Another massive hurdle for big data is ethical concerns.

5) What is the definition of big data? Explain.

• Big data is high-volume, high-velocity, and high-variety information assets that demand cost-effective,
innovative forms of information processing for enhanced insight and decision making.
• Big data refers to datasets whose size is typically beyond the storage capacity of, and too
complex for, traditional database software tools.
• Big data is anything beyond the human and technical infrastructure needed to support
storage, processing, and analysis.
• It is data that is big in volume, velocity, and variety. Refer to Figure 1.3.

Figure 1.3 Data: Big in volume, variety, and Velocity (Big Data and Analytics)

Variety: Data can be structured, semi-structured, or unstructured. Data stored
in a database is an example of structured data. HTML data, XML data, email data, and
CSV files are examples of semi-structured data. PowerPoint presentations, images,
videos, research reports, white papers, and the body of an email are examples of unstructured data.

Velocity: Velocity essentially refers to the speed at which data is being created in real time.
We have moved from simple desktop applications, such as a payroll application, to real-time
processing applications.

Volume: Volume can be measured in terabytes, petabytes, or zettabytes.

Gartner Glossary definition: Big data
is high-volume, high-velocity and/or high-variety information assets that demand cost-effective,
innovative forms of information processing that enable enhanced insight and
decision making.


For the sake of easy comprehension, we will look at the definition in three parts.

Part I of the definition: "Big data is high-volume, high-velocity, and high-variety
information assets" talks about voluminous data (humongous data) that may have great
variety (a good mix of structured, semi-structured, and unstructured data) and will require a
good speed/pace for storage, preparation, processing, and analysis.

Part II of the definition: "cost-effective, innovative forms of information processing" talks
about embracing new techniques and technologies to capture (ingest), store, process, persist,
integrate, and visualize the high-volume, high-velocity, and high-variety data.

Part III of the definition: "enhanced insight and decision making" talks about deriving
deeper, richer, and meaningful insights and then using these insights to make faster and better
decisions to gain business value and thus a competitive edge.

Data -> Information -> Actionable intelligence -> Better decisions -> Enhanced
business value

Figure 1.4 Definition of big data – Gartner (Big Data and Analytics)

6) What are the challenges of big data?

The definition "Big data is high-volume, high-velocity, and high-variety information assets" talks about
voluminous data (humongous data) that may have great variety (a good mix of structured,
semi-structured, and unstructured data) and will require a good speed/pace for storage,
preparation, processing, and analysis.

Following are a few challenges with big data:

Figure 1.5 Challenges with big data (Big Data and Analytics)

Data volume: Data today is growing at an exponential rate, and this high tide of data will
continue to rise. The key questions are:
"Will all this data be useful for analysis?",
"Do we work with all this data or a subset of it?",
"How will we separate the knowledge from the noise?", etc.

Storage: Cloud computing is the answer to managing infrastructure for big data as far as
cost-efficiency, elasticity, and easy upgrading/downgrading are concerned. However, this further
complicates the decision to host big data solutions outside the enterprise.

Data retention: How long should one retain this data? Some data may be required for long-term
decisions, but some data may quickly become irrelevant and obsolete.

Skilled professionals: In order to develop, manage, and run the applications that generate
insights, organizations need professionals who possess a high level of proficiency in data
science.

Other challenges: Other challenges of big data relate to the capture, storage, search,
analysis, transfer, and security of big data.


Visualization: Big data refers to datasets whose size is typically beyond the storage capacity
of traditional database software tools. There is no explicit definition of how big a data set
should be for it to be considered big data. Data visualization (computer graphics) is becoming
popular as a separate discipline, but there are very few data visualization experts.

7) Explain the comparison between Business Intelligence and Big Data.

Business Intelligence vs Big Data Comparison
The comparison is given below:

Comparison of          Business Intelligence                    Big Data
objectives

1. Purpose             To help the business make better         To capture, process, and analyze
                       decisions.                               both structured and unstructured
                                                                data to improve customer outcomes.

2. Ecosystem /         Operational systems, ERP databases,      Hadoop, Spark, R Server, Hive,
   Components          data warehouse, dashboards, etc.         HDFS, etc.

3. Tools               • Tableau                                • Hadoop
                       • Online analytical processing (OLAP)    • Spark
                       • Data warehousing                       • Hive
                       • Microsoft Power BI                     • Cloudera, etc.
                       • Google Analytics, etc.

4. Characteristics /   Six features: location intelligence,     Volume, Variety, Velocity,
   Properties          executive dashboards, "what if"          Veracity, and Value.
                       analysis, interactive reports,
                       metadata layer, and ranking reports.

5. Benefits            • Helps in making better business        • Better decision making
                         decisions                              • Fraud detection
                       • Faster and more accurate reporting     • Storage, mining, and analysis
                         and analysis                             of data
                       • Increased revenues                     • Cost savings

6. Applied fields      Social media, healthcare, gaming         Banking, entertainment, social
                       industry, food industry, etc.            media, healthcare, retail and
                                                                wholesale, etc.

Both BI and big data help to analyse data to gain insights and to view the relevant
data.
Business intelligence and big data need to be synchronized and used together. They
are not the same thing, but they share many common goals. Many of the
distinctions between business intelligence and big data tend to be arbitrary.

Short answer questions

1) What is big data? Why use big data?

The definition "Big data is high-volume, high-velocity, and high-variety information assets" talks about
voluminous data (humongous data) that may have great variety (a good mix of structured,
semi-structured, and unstructured data) and will require a good speed/pace for storage,
preparation, processing, and analysis.
Some of the benefits of using big data for businesses are:

• Cost savings: Big data tools like Apache Hadoop, Spark, etc. bring cost-saving
benefits to businesses when they have to store large amounts of data.
• Time savings: Real-time in-memory analytics helps companies to collect data from
various sources and process it faster.
• Market understanding: Big data helps companies to understand market
conditions, customer preferences, trends, and opportunities.
• Customer acquisition and retention: Big data helps companies to refine their
marketing campaigns and techniques, provide targeted promotional information, and
improve customer loyalty programs.
• Innovation and product development: Big data helps companies to discover new
sources of revenue, solve problems, and create new products and services based on
customer feedback and demand.
• Competitive advantage: Big data helps companies to gain an edge over their
competitors by leveraging data-driven insights and strategies.

Big data is important for companies because it can help them achieve growth, efficiency,
profitability, and customer satisfaction.

2) What is data? Explain structured data.

Big data:
Big data is high-volume, high-velocity, and high-variety information assets that demand
cost-effective, innovative forms of information processing for enhanced insight and decision
making.
Data:
Data is present in homogeneous sources as well as in heterogeneous sources. The
need of the hour is to understand, manage, and process the data, and to analyze it to draw
valuable insights. Digital data can be structured, semi-structured, or unstructured.
Classification of Digital Data:
On the basis of the data received from the sources, big data covers
• Structured Data
• Semi-Structured Data
• Unstructured Data
In a real-world scenario, unstructured data is typically larger in volume than
structured and semi-structured data. Approximately 70% to 80% of data is in unstructured
form.


Structured Data:
• It is organized data in a predefined format.
• It is stored in tabular form.
• The number of rows/records/tuples in a relation is called the cardinality of the
relation.
• The number of columns in a relation is called the degree of the relation.
• It is data that resides in fixed fields within a record or file.
• It is formatted data that has entities and their attributes mapped.
• It is used to query and report against predetermined data types.
Sources of Structured Data:
• Relational databases (RDBMS) such as Oracle Database (Oracle Corporation), Db2 (IBM),
SQL Server (Microsoft), Greenplum (EMC), Teradata, MySQL (open source), and PostgreSQL (advanced
open source).
• Flat files in the form of records (such as comma-separated values (.csv) and tab-separated
files).
• Multidimensional databases (mainly used in data warehouse technology).
• Legacy databases.
Sample of Structured Data:

Customer ID   Name    Product ID   City        State
12365         Smith   241          Graz        Styria
23658         Jack    365          Wolfsberg   Carinthia
32456         Kady    421          Enns        Upper Austria
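
The cardinality and degree defined above can be read directly off the sample relation. The following minimal sketch in Python holds the same sample as a tuple of column names plus a list of row tuples; the variable names are chosen only for this illustration.

```python
# The sample relation from the notes: a header plus a list of tuples.
columns = ("Customer ID", "Name", "Product ID", "City", "State")
relation = [
    (12365, "Smith", 241, "Graz", "Styria"),
    (23658, "Jack", 365, "Wolfsberg", "Carinthia"),
    (32456, "Kady", 421, "Enns", "Upper Austria"),
]

cardinality = len(relation)   # number of rows (tuples)
degree = len(columns)         # number of columns (attributes)

# Every row must supply a value for every column; this fixed shape is
# what makes the data structured.
rows_match_schema = all(len(row) == degree for row in relation)

print(cardinality, degree, rows_match_schema)
```

Adding a fourth customer would raise the cardinality to 4, while adding a new column such as a postal code would raise the degree to 6.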

3) What is data? Explain semi-structured data.

Classification of Digital Data:
On the basis of the data received from the sources, big data covers
• Structured Data
• Semi-Structured Data
• Unstructured Data
In a real-world scenario, unstructured data is typically larger in volume than
structured and semi-structured data. Approximately 70% to 80% of data is in unstructured
form.
Semi-Structured Data:
• Semi-structured data is also known as schema-less or self-describing data.
• It does not have a fixed data model.
• It refers to a form of structured data that contains tags or mark-up elements in order to
separate elements and generate hierarchies of records and fields in the given data.
• Such data does not follow the formal structure of data models as in relational
databases.
Sources of Semi-Structured Data:
• File systems, such as web data in the form of cookies.
• Data exchange formats, such as JavaScript Object Notation (JSON) data.
• XML stands for eXtensible Markup Language. It was popularized by web
services developed using the Simple Object Access Protocol (SOAP).
• JSON stands for JavaScript Object Notation. It is used to transmit data between a
server and a web application. JSON was popularized by web services developed
using REpresentational State Transfer (REST).
• MongoDB (open source, distributed, NoSQL, document-oriented database) and
Couchbase (originally known as Membase; open source, distributed, NoSQL, document-oriented
database) store data as JSON-like documents.
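
To show what "self-describing" means in practice, the sketch below reads one hypothetical customer record written first as XML and then as JSON, using only the Python standard library. In both cases the field names travel with the data, so the nested city value can be reached without a separate schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record in XML: the tags name each field and nest to
# form a hierarchy.
xml_text = """
<customer id="12365">
  <name>Smith</name>
  <address><city>Graz</city><state>Styria</state></address>
</customer>
""".strip()
root = ET.fromstring(xml_text)
city_from_xml = root.find("address/city").text

# The same record in JSON: keys play the role that tags play in XML.
json_text = (
    '{"id": 12365, "name": "Smith",'
    ' "address": {"city": "Graz", "state": "Styria"}}'
)
record = json.loads(json_text)
city_from_json = record["address"]["city"]

print(city_from_xml, city_from_json)
```

One visible difference is typing: the XML attribute `id` comes back as the string "12365", while the JSON value is the number 12365.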

4) What is data? Explain unstructured data.

Big data:
Big data is high-volume, high-velocity, and high-variety information assets that demand
cost-effective, innovative forms of information processing for enhanced insight and decision
making.
Data:
Data is present in homogeneous sources as well as in heterogeneous sources. The
need of the hour is to understand, manage, and process the data, and to analyze it to draw
valuable insights. Digital data can be structured, semi-structured, or unstructured.
Classification of Digital Data:
On the basis of the data received from the sources, big data covers
• Structured Data
• Semi-Structured Data
• Unstructured Data
In a real-world scenario, unstructured data is typically larger in volume than
structured and semi-structured data. Approximately 70% to 80% of data is in unstructured
form.


Unstructured Data:
• Unstructured data does not have any logical structure or pre-defined data model.
• It may contain metadata (additional information related to the data).
• It contains inconsistent data, such as data obtained from files, social media websites,
satellites, etc.
• It contains data in different formats, such as emails, text, audio, video, or images.
Sources of Unstructured Data:
• Social media: data obtained from social networking platforms, including YouTube,
Facebook, Twitter, LinkedIn, and Flickr.
• Mobile data: data such as text messages and location information.
Note: About 80% of enterprise data consists of unstructured data.

Emergency Light
No ratings yet
Emergency Light
2 pages
Dbmsmcqs
No ratings yet
Dbmsmcqs
13 pages
EC2303 Computer Architecture and Organization QUESTION PAPER
No ratings yet
EC2303 Computer Architecture and Organization QUESTION PAPER
4 pages
Assignment - I
No ratings yet
Assignment - I
5 pages

Fig. Classification of digital data (structured, semi-structured, unstructured)

• Data stored in databases is an example of structured data.

2) What are the characteristics of data?

Data has three key characteristics:

Fig. Characteristics of data (Big Data and Analytics)

• Big data is about complexity.

3) What are the characteristics of Big Data?

5 V's of Big Data

The data is categorized as below:

For example, Facebook posts with hashtags.

4) What is Big Data? Explain the evolution of Big Data.

three primary components still in use today to describe big data:

• The 1970s and before was the era of mainframes.

S.No.  Period           Technology
1      1940s to 1989    Data Warehousing and Personal Desktop Computers
2      1989 to 1999     Emergence of the World Wide Web
3      2000s to 2010s   Controlling Data Volume, Social Media and Cloud
4      2010s to now     Optimization Techniques, Mobile Devices and IoT

1940s to 1989 – Data Warehousing and Personal Desktop Computers

1989 to 1999 – Emergence of the World Wide Web

2010s to now – Optimization Techniques, Mobile Devices and IoT

• Sensor Data (data collected by internet-enabled sensors to provide valuable, real-time

Another massive hurdle for big data is ethical concerns.

5) What is the definition of Big Data? Explain.

Part I of the definition: "Big data is high-volume, high-velocity, and high-variety

6) What are the Challenges of Big Data?

Following are a few challenges with big data:

7) Explain the comparison between Business Intelligence and Big Data.

Comparison of Business Intelligence Big Data

• Helps in making better  • Better decision making

• Increase revenues  • Cost savings

6. Applied Fields: Social media, Healthcare, The banking sector,

distinctions between Business Intelligence and Big Data tend to be arbitrary.

Short answer questions

1) What is Big Data? Why use Big Data?

2) What is data? Explain structured data.

Customer ID   Name    Product ID   City        State
12365         Smith   241          Graz        Styria
23658         Jack    365          Wolfsberg   Carinthia
32456         Kady    421          Enns        Upper Austria
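
The customer table above is structured data because every row follows the same fixed set of typed columns. A minimal sketch of that idea in Python follows. The names CustomerRecord, ROWS, find_by_city and column are hypothetical and chosen only for illustration; they are not part of the original notes, and Python is used here only as a convenient illustration language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    """One row of the customer table; the fields act as the fixed schema."""
    customer_id: int
    name: str
    product_id: int
    city: str
    state: str


# The three rows from the table in the notes.
ROWS = [
    CustomerRecord(12365, "Smith", 241, "Graz", "Styria"),
    CustomerRecord(23658, "Jack", 365, "Wolfsberg", "Carinthia"),
    CustomerRecord(32456, "Kady", 421, "Enns", "Upper Austria"),
]


def find_by_city(rows, city):
    """Return every record whose city column equals the given city."""
    return [row for row in rows if row.city == city]


def column(rows, field_name):
    """Return the values of one named column, in row order."""
    return [getattr(row, field_name) for row in rows]


if __name__ == "__main__":
    print(find_by_city(ROWS, "Graz"))
    print(column(ROWS, "product_id"))
```

Because the schema is known in advance, a lookup by column name (such as city) works the same way for every row, which is the property that makes structured data easy to store and query in a database.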

3) What is data? Explain semi-structured data.
