Big Data
A) Big Data is a term used for collections of data sets so large and complex that they are
difficult to store and process using available database management tools or traditional data
processing applications. The challenge includes capturing, curating, storing, searching, sharing,
transferring, analyzing and visualizing this data.
It refers to a massive amount of data that keeps on growing exponentially with time.
It includes data mining, data storage, data analysis, data sharing, and data visualization.
The term is an all-comprehensive one, covering the data and data frameworks, along with the
tools and techniques used to process and analyze the data.
The five characteristics that define Big Data are: Volume, Velocity, Variety, Veracity and Value.
1. VOLUME
Volume refers to the amount of data, which is growing at a very fast pace day by day. The size
of data generated by humans, machines and their interactions on social media alone is massive.
Researchers predicted that 40 zettabytes (40,000 exabytes) would be generated by 2020,
an increase of 300 times over 2005.
2. VELOCITY
Velocity is defined as the pace at which different sources generate data every day. This flow
of data is massive and continuous. Facebook, for example, reports 1.03 billion Daily Active Users
(DAU) on mobile, an increase of 22% year-over-year. This shows how fast the number of
users is growing on social media and how fast data is generated daily. If you are
able to handle the velocity, you will be able to generate insights and take decisions based on
real-time data.
3. VARIETY
As there are many sources contributing to Big Data, the types of data they generate differ.
Data can be structured, semi-structured or unstructured; hence, a variety of data is getting
generated every day. Earlier, data came from spreadsheets and databases; now it arrives in the
form of images, audio, video, sensor data and so on. This variety of largely unstructured data
creates problems in capturing, storing, mining and analyzing the data.
4. VERACITY
Veracity refers to data in doubt, i.e. the uncertainty of available data due to data inconsistency
and incompleteness. For example, a table may have missing values, and some values may be
hard to accept, such as a minimum value of 15,000 in a row where that is not possible. This
inconsistency and incompleteness is veracity.
Available data can sometimes get messy and may be difficult to trust. With many forms of big
data, quality and accuracy are difficult to control, as with Twitter posts full of hashtags,
abbreviations, typos and colloquial speech. Volume is often the reason behind the lack
of quality and accuracy in the data.
It was found in a survey that 27% of respondents were unsure of how much of
their data was inaccurate.
Poor data quality costs the US economy around $3.1 trillion a year.
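As a hedged sketch of how such veracity issues might be caught programmatically (the records, field name and threshold below are invented for illustration), a minimal Python check for missing and implausible values:

```python
# Toy records with the kinds of problems described above (values are invented).
records = [
    {"id": 1, "min_value": 12},
    {"id": 2, "min_value": None},   # incompleteness: missing value
    {"id": 3, "min_value": 15000},  # inconsistency: implausibly large minimum
]

PLAUSIBLE_MAX = 100  # assumed domain limit, purely illustrative

for rec in records:
    value = rec["min_value"]
    if value is None:
        print(f"row {rec['id']}: missing value")
    elif value > PLAUSIBLE_MAX:
        print(f"row {rec['id']}: implausible value {value}")
```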
5. VALUE
After discussing Volume, Velocity, Variety and Veracity, there is another V that should be taken
into account when looking at Big Data: Value. It is all well and good to have access to
big data, but unless we can turn it into value it is useless. Turning it into value means asking:
is it adding to the benefits of the organizations analyzing the data? Is
the organization working on Big Data achieving a high ROI (Return on Investment)? Unless
working on Big Data adds to their profits, it is useless.
3) Types of Big Data
Structured
Semi-Structured
Unstructured
1. Structured
Data that can be stored and processed in a fixed format is called Structured Data. Data
stored in a relational database management system (RDBMS) is one example of structured
data. Structured data is easy to process because it has a fixed schema. Structured Query
Language (SQL) is often used to manage such data.
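As a minimal sketch of working with structured data (the table and column names are invented for illustration), SQL against a fixed schema, here using Python's built-in sqlite3:

```python
import sqlite3

# Structured data: every row follows the same fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Alice', 'Pune')")
conn.execute("INSERT INTO customers VALUES (2, 'Bob', 'Mumbai')")

# The fixed schema makes querying with SQL straightforward.
for row in conn.execute("SELECT name, city FROM customers WHERE city = 'Pune'"):
    print(row)  # ('Alice', 'Pune')
```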
2. Semi-Structured
Semi-structured data is a type of data that does not have the formal structure of a data model
(i.e. a table definition in a relational DBMS), but nevertheless has some organizational
properties, such as tags and other markers that separate semantic elements and make it easier
to analyze. XML files and JSON documents are examples of semi-structured data.
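For instance, a JSON document carries its own markers (keys), so it can be navigated without a predefined table schema. A minimal sketch, with invented field names:

```python
import json

# Semi-structured: no fixed table schema, but keys mark the semantic elements.
doc = '{"user": "alice", "posts": [{"text": "hello", "likes": 3}]}'
record = json.loads(doc)

# The markers let us pull out values without a table definition.
print(record["user"])               # alice
print(record["posts"][0]["likes"])  # 3
```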
3. Unstructured
Data whose form is unknown, which cannot be stored in an RDBMS and cannot be analyzed
unless transformed into a structured format, is called unstructured data. Text files and
multimedia content such as images, audio and video are examples of unstructured data.
Unstructured data is growing quicker than the other types; experts say that 80 percent of the
data in an organization is unstructured.
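As a toy illustration of the "transform into a structured format" step (a sketch, not a production pipeline), the snippet below turns free text into structured (word, count) pairs:

```python
from collections import Counter

# Unstructured input: free-form text with no schema.
text = "big data is big and data keeps growing"

# Transform into a structured form: (word, count) pairs that could be
# loaded into a table and analyzed.
counts = Counter(text.split())
for word, count in sorted(counts.items()):
    print(word, count)
```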
Technology today allows us to collect data at an astounding rate, both in terms of volume and
variety. There are various sources that generate data, but in the context of big data, the
primary sources are as follows:
Social networks: Arguably, the primary source of all big data that we know of today is
the social networks that have proliferated over the past 5-10 years. This is by and large
unstructured data that is represented by millions of social media postings and other
data that is generated on a second-by-second basis through user interactions on the
web across the world. Increasing internet access across the world has, in turn, fueled the
growth of data in social networks.
Media: Largely a result of the growth of social networks, media represents the millions,
if not billions, of audio and visual uploads that take place on a daily basis. Videos
uploaded on YouTube, music recordings on SoundCloud, and pictures posted on
Instagram are prime examples of media, whose volume continues to grow in an
unrestrained manner.
Data warehouses: Companies have long invested in specialized data storage facilities
commonly known as data warehouses. A DW is essentially a collection of historical data
that companies wish to maintain and catalog for easy retrieval, whether for internal use
or regulatory purposes. As industries gradually shift toward the practice of storing data
in platforms such as Hadoop and NoSQL, more and more companies are moving data
from their pre-existing data warehouses to some of the newer technologies. Company
emails, accounting records, databases, and internal documents are some examples of
DW data that is now being offloaded onto Hadoop or Hadoop-like platforms that
leverage multiple nodes to provide a highly-available and fault-tolerant platform.
Sensors: A more recent phenomenon in the space of big data has been the collection of
data from sensor devices. While sensors have always existed and industries such as oil
and gas have been using drilling sensors for measurements at oil rigs for many decades,
the advent of wearable devices such as Fitbit and the Apple Watch, part of what is known as
the Internet of Things, means that each individual can now stream data at the same rate at
which a few oil rigs did just 10 years back.
We cannot talk about data without talking about the people who benefit from Big Data
applications. Almost all industries today leverage Big Data applications in one way or another.
Smarter Healthcare: Making use of petabytes of patient data, an organization can
extract meaningful information and then build applications that can predict a
patient's deteriorating condition in advance.
Retail: Retail has some of the tightest margins, and is one of the greatest beneficiaries
of big data. The beauty of using big data in retail is to understand consumer behavior.
Amazon's recommendation engine provides suggestions based on the browsing history
of the consumer.
Traffic control: Traffic congestion is a major challenge for many cities globally. Effective
use of data and sensors will be key to managing traffic better as cities become
increasingly densely populated.
Manufacturing: Analyzing big data in the manufacturing industry can reduce component
defects, improve product quality, increase efficiency, and save time and money.
Search Quality: Every time we extract information from Google, we simultaneously
generate data for it. Google stores this data and uses it to improve its
search quality.
6) Traditional Approach:
In the past, big data was dealt with using this approach: a single computer stores
and processes the data. Data is stored in an RDBMS like
Oracle Database, MS SQL Server or DB2, and sophisticated software is written to
interact with the database, process the required data and present it to the users for
analysis.
Limitation
This approach works well where the volume of data can be
accommodated by standard database servers, or up to the limit of the processor that is
processing the data. But when it comes to dealing with huge amounts of data, it is
a tedious task to process such data through a traditional database server.
Google’s Solution
Google solved this problem using an algorithm called MapReduce. This algorithm divides
the task into small parts and assigns those parts to many computers connected over the
network, and collects the results to form the final result dataset.
The machines in such a cluster are typically commodity hardware, which could be single-CPU
machines or servers with higher capacity.
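A minimal single-machine sketch of the idea (a word count, not Google's actual implementation) shows the three phases, where the map step is the part a cluster would run in parallel on many machines:

```python
from collections import defaultdict

# The input is split into parts, as a cluster would split a large file.
chunks = ["big data is big", "data keeps growing", "big data everywhere"]

# Map: each part is turned into (key, value) pairs independently,
# so this step could run on many computers in parallel.
mapped = [(word, 1) for chunk in chunks for word in chunk.split()]

# Shuffle: group the values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: combine the values for each key into the final result dataset.
result = {word: sum(counts) for word, counts in groups.items()}
print(result)  # {'big': 3, 'data': 3, 'is': 1, 'keeps': 1, 'growing': 1, 'everywhere': 1}
```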
Hadoop:
Doug Cutting, Mike Cafarella and team took the solution provided by Google and
started an Open Source Project called HADOOP in 2005 and Doug named it after his
son’s toy elephant. Now Apache Hadoop is a registered trademark of the Apache
Software Foundation.
Hadoop runs applications using the MapReduce algorithm, where the data is processed
in parallel on different CPU nodes. In short, the Hadoop framework makes it possible to
develop applications that run on clusters of computers and perform complete statistical
analysis of huge amounts of data.
7) Explain Core Hadoop Architecture.
Hadoop is an open source framework from Apache used to store, process and analyze
data that is very huge in volume. Hadoop is written in Java and is not an OLAP (online analytical
processing) system; it is used for batch/offline processing. It is used by Facebook, Yahoo, Google,
Twitter, LinkedIn and many more. Moreover, it can be scaled up just by adding nodes to the
cluster.
Modules of Hadoop
1. HDFS: Hadoop Distributed File System. Google published its GFS (Google File System)
paper, and HDFS was developed on the basis of it. Files are broken into blocks and stored
on nodes over the distributed architecture.
2. Yarn: Yet Another Resource Negotiator, used for job scheduling and managing the
cluster.
3. Map Reduce: This is a framework that helps Java programs perform parallel
computation on data using key-value pairs. The Map task takes input data and converts it
into a data set that can be computed as key-value pairs. The output of the Map task is
consumed by the Reduce task, and the output of the reducer gives the desired result
(see the sketch after this list).
4. Hadoop Common: These Java libraries are used to start Hadoop and are used by other
Hadoop modules.
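To make the key-value flow of the Map Reduce module concrete, here is a sketch in the style of a Hadoop Streaming job, where a mapper and a reducer exchange tab-separated key-value lines (the word-count task and function names are illustrative; a real job would run each function as a separate script over HDFS data):

```python
def mapper(lines):
    """Map task: convert input lines into (word, 1) key-value pairs."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(pairs):
    """Reduce task: sum the counts for each word (input must be sorted by key)."""
    current, total = None, 0
    for pair in pairs:
        word, count = pair.split("\t")
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield f"{current}\t{total}"

# Local demo of the map -> sort (shuffle) -> reduce pipeline.
mapped = sorted(mapper(["big data is big", "data keeps growing"]))
for line in reducer(mapped):
    print(line)
```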
Big Data Vs Cloud Computing (Major Differences)
Let’s see 8 major differences between Big Data and Cloud Computing:
1) Concept
In cloud computing, we can store and retrieve data from anywhere at any time. Whereas,
big data is a large set of data that is processed to extract the necessary information.
2) Characteristics
Cloud Computing provides services over the internet, which can be:
Infrastructure as a Service (IaaS)
Platform as a Service (PaaS)
Software as a Service (SaaS)
Whereas, there are some important characteristics of Big Data that can lead to strategic
business moves, and they are Velocity, Variety, and Volume.
3) Accessibility
Cloud Computing provides universal access to the services. Whereas, Big data solves technical
problems and provides better results.
4) When to use
A customer can shift to Cloud Computing when they need rapid deployment and scaling of
applications. If the application deals with highly sensitive data and requires strict compliance,
the move to the cloud should be considered carefully.
Whereas, we can use Big Data where traditional methods and frameworks are ineffective. Big
data is not a replacement for relational database systems; it solves specific problems
related to large data sets, and most big data solutions do not deal well with small data.
5) Cost
Cloud Computing is economical, as it has low maintenance costs, a centralized platform, no
upfront cost and disaster-safe implementation. Whereas, Big Data is highly scalable, provides a
robust ecosystem, and is cost-effective.
6) Job roles and responsibility
The users of the cloud are developers or office workers in an organization. Whereas, in big
data there are big data analysts, who are responsible for analyzing the data and finding
interesting insights and possible future trends.
7) Trends
Some of the important trends in Cloud Computing are:
Public Cloud
Private Cloud
Hybrid Cloud
Community Cloud
Whereas, some important trends in Big Data technology are Hadoop, MapReduce, and HDFS.
8) Vendors
Some of the vendors and solution providers of Cloud Computing are:
Microsoft
Dell
Apple
IBM
Whereas, some of the vendors and solution providers of Big Data are:
Cloudera
Hortonworks
Apache
MapR