
Big Data Analysis

Dr. Maryam Hazman


Content

 Course Description
 Introduction to Big Data
Course Description

The aim of this course is to provide students with the theoretical and practical skills related to the big data analysis process.
Course Description
Course Contents:

 Basic concepts in big data
 Cloud computing
 Introduction to big data analytics
 Introduction to Hadoop technology
 MapReduce
 Revision
 Final exam
Course Description

Grading (100%):
 Final exam: 70%
 Mid-term exam: 10%
 Practice exam: 10%
 Course work: 10%

Timing:
 Lecture: 3
 Practice: 3
Introduction

Buying Online
Introduction

What about offline buying?


Introduction
A personalized experience requires a lot of data to be collected:

 Shopping cart
 Wish list and Previous purchases
 Items rated and reviewed
 Geo-location
 Time-on-site and Duration of views
 Links clicked & Text Searched
 Telephone inquiries
 Responses to marketing materials
 Social media posting
Introduction

Customer Data
Introduction

What has changed to make digital technology so useful today?


What is big data?

Big Data

 is a volume of both structured and unstructured data.

 is so large that it is difficult to process using traditional database and software techniques.

 has a volume of data that is too big.

 moves too fast and exceeds current processing capacity.


What is big data?
 Big data is a term for a collection of data sets so large and complex that it often becomes difficult to process using traditional data processing applications.

 Large amounts of different types of data produced from various types of sources, such as

  People,
  Machines or
  Sensors.
What is big data?
The Big Data Framework organization attempts to categorize the development of Big data into three main phases:

 Phase 1.0 (1970-2000): Big data was mainly described by data storage and analytics, and it was an extension of modern database management systems and data warehousing technologies;

 Phase 2.0 (2000-2010): with the rise of Web 2.0 and the propagation of semi-structured and unstructured content, the notion of Big data changed to embody advanced technical solutions for extracting meaningful information from dissimilar and heterogeneous data formats;

 Phase 3.0 (2010-now): with the emergence of smartphones and mobile devices, sensor data, wearable devices, the Internet of Things (IoT), and many more data generators, Big data has entered a new era and has drawn a new horizon with a new range of opportunities.
Big Data Characteristics

 The following is a brief discussion of the 10 Vs of Big data.


1. Volume
2. Velocity
3. Variety
4. Veracity
5. Variability
6. Validity
7. Vulnerability
8. Volatility
9. Visualization
10. Value
Volume

 Amount of data.

 Refers to the vast growth in the amount of data.

 The size of data plays a very critical role in determining the value that can be extracted from it.

 This is evident as more than 90% of the data was produced recently.

 In fact, more than 2.5 exabytes (1 exabyte = 10^18 bytes) of data have been created daily since as early as 2013, from every post, share, search, click, stream, and many more data producers. This is expected to reach 463 exabytes per day by 2025.

 People share 500 terabytes of data per day on Facebook, and over 300 hours of video are uploaded to YouTube every minute.
Velocity

 The speed of generation of data.

 Data flow is often vast and continuous, so it requires platforms and capacities that can not only handle significant volumes but also deal with this stream in real time.

 Represents the accumulation of data at high speed, in near real time and real time, from dissimilar data sources.
Variety (Format)

 Different types of data


 Involves collecting data from various sources and in fuzzy and heterogeneous types.
 This includes importing data in dissimilar formats (a minimal sketch follows this list), namely
  Structured (tables residing in relational databases – RDBMS, etc.),
  Semi-structured (email, XML, JSON, and other markup languages, etc.) and
  Unstructured (text, pictures, audio files, video, sensor data, etc.).
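
A minimal Python sketch of the three flavors of data, assuming hypothetical local files orders.csv, events.json, and reviews.txt:

```python
import csv
import json

# Structured: fixed schema, e.g. a CSV export of an RDBMS table.
with open("orders.csv", newline="") as f:
    orders = list(csv.DictReader(f))    # each row -> dict keyed by column name

# Semi-structured: self-describing but flexible schema, e.g. JSON events.
with open("events.json") as f:
    events = json.load(f)               # nested fields may differ per record

# Unstructured: no schema at all, e.g. raw review text.
with open("reviews.txt") as f:
    reviews = f.read().splitlines()     # meaning must be extracted later (NLP, etc.)

print(len(orders), "orders,", len(events), "events,", len(reviews), "reviews")
```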
Veracity
 Refers to the source, accuracy, and correctness of data. Is it information or
misinformation?

 Being able to identify the relevance and accuracy of data and apply it to the
appropriate purpose.

 There are multiple factors to ensure the veracity of Big data:

 Trustworthiness of data source

 Reliability and security of data store

 Data availability

 Correctness and

 Consistency
Variability

 Refers to variance in meaning, the number of inconsistencies, the multitude of data dimensions, and inconsistent data-receiving speeds.
Validity
 Refers to the “data are shown (or known) to be an accurate indicator of the claim being made”.

 It differs from veracity in that validity does “mean the correctness and accuracy of data with regard to the intended usage”.

 In other words, data can be trustworthy and thus satisfy the veracity aspect, but poor interpretation of the data might lead to unintended use. Moreover, the same truthful data can be valid for use in one application and invalid for a different one.
Vulnerability

 Refers to the security of the collected datasets that will be used for later analysis.

 It also denotes errors in the system which permit harmful activities to be conducted on the collected datasets.

 Hence, the acquisition of datasets should ensure the capacity to provide safe systems able to protect the collected data from breaches.
Volatility

 Refers to the time for which data is valid to be stored/used before it becomes outdated or no longer relevant.

 It is a crucial dimension, since the cost of storage and maintenance grows the longer Big data is stored.
Visualization
 Refers to the ability to present Big data in a visual context, such as diagrams, graphs, maps, etc., toward better understanding and interpretation of data.

 It also assists people and organizations in discovering patterns, correlations, trends, relationships and dependencies.

 Big data visualization is a powerful tool for decision makers to access, evaluate and interpret massive data, even in real time, and act upon it.
Value

 Represents the outcome product of Big data analysis (i.e. new ideas, insights).

 Understanding the potential to create revenue or unlock opportunities through your data. This reflects the outcomes of using your data analysis results. If the data is not valuable, questions should be raised about why and for how long you store it.
Big data Challenges

 Storing and processing issues
 Privacy and Security
 Data access and sharing
 Analytical challenges
 Skills requirements
 Technical Issues
Storing and processing issues

 The rate of increase in data is much faster than existing processing systems can handle.

 Current storage systems are not capable enough to store all of this data.

 There is a need to develop a processing system that satisfies not only today's needs but also future needs.
Privacy and Security

 New devices and technologies like cloud computing provide a gateway to access and to store information for data analysis.

 This integration of IT architectures will lead to greater risks to data security and intellectual property.
Data access and sharing

Generally, data is used for making accurate decisions.

The data should be available in an accurate, complete and timely manner.
Analytical challenges

Traditional RDBMSs are suitable only for structured data.

What if the data volume gets so large that we do not know how to deal with it?

Does all data need to be stored?

Does all data need to be analyzed?

Which data points are important?

How can data be used to best advantage?
Skills requirements

 With the increase in the amount of (structured, semi-structured, and unstructured) data generated, there is a need for talent.

 The demand for people with good analytical skills in big data is increasing.
Technical Issues

 Fault Tolerance

 Scalability

 Quality of Data

 Heterogeneous Data
Technical Issues: Fault Tolerance

A system's ability to continue operating uninterrupted despite the failure of one or more of its components.

Fault-tolerant systems use backup components that automatically take the place of failed components, ensuring no loss of service.
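
A minimal Python sketch of this backup-component idea, where fetch_from() and the node names are hypothetical stand-ins for real remote reads:

```python
def fetch_from(node: str, key: str) -> str:
    # Stand-in for a real remote read; here the "primary" node is down.
    if node == "primary":
        raise ConnectionError(f"{node} is unreachable")
    return f"value-of-{key}@{node}"

def fault_tolerant_read(key: str, nodes=("primary", "replica-1", "replica-2")) -> str:
    for node in nodes:
        try:
            return fetch_from(node, key)   # first healthy node serves the request
        except ConnectionError:
            continue                       # failed component: fall back to the next backup
    raise RuntimeError("all replicas failed")

print(fault_tolerant_read("user:42"))      # served by replica-1 despite the primary failure
```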
Technical Issues: Scalability

 The property of a system to handle a growing amount of work by adding resources to the system.

 Vertical Scalability (Scale-up)

 Horizontal Scalability (Scale-out)
Technical Issues: Scalability
Vertical Scalability (Scale-up)

 In this type of scalability, we increase the power of the existing resources in the working environment (e.g. a more powerful CPU, more RAM), scaling in an upward direction.
Technical Issues: Scalability

Horizontal Scalability (Scale-out)

 In this kind of scaling, resources are added in a horizontal row, i.e. more machines (nodes) are added to the system.
Technical Issues: Quality of Data

 Data Quality:
  Completeness
  Validity
  Accuracy
  Consistency
  Integrity
  Timeliness
Technical Issues: Heterogeneous Data
 Data is collected from different sources with different formats.

 Data sources include:
  Databases
  Websites
  Social Networks
  Files
  Ontologies
  APIs
  ….
Big Data Analytics
 A set of fundamental concepts/principles that underlie techniques for
extracting useful knowledge from large datasets containing a variety of data
types.

 Big data analytics is a term that describes the process of using data to
discover trends, patterns, and other correlations, as well as using them to
make data-driven decisions.
Types of Big Data Analytics
There are four main types of big data analytics: descriptive,
diagnostic, predictive, and prescriptive analytics.

They use various tools for processes such as cleaning, integration, visualization, data mining, and many others, to improve the process of analyzing data and ensure that the company benefits from the data it gathers.
Descriptive Analytics
 Answers the question, “What happened?”

 It is one of the first steps in analyzing raw data: performing simple mathematical operations and producing statements about samples and measurements.

 It allows you to know the trends from raw data and describe what is currently happening.

 Data visualization is a natural fit for descriptive analysis, since charts, graphs, and maps can show trends in data in a clear, easily understandable way.
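
A minimal sketch of descriptive analytics in Python with pandas, using invented monthly revenue figures purely for illustration:

```python
import pandas as pd

# Hypothetical monthly sales; descriptive analytics summarizes "what happened".
sales = pd.DataFrame({
    "month":   ["Jan", "Feb", "Mar", "Apr"],
    "revenue": [120_000, 135_000, 128_000, 150_000],
})

print(sales["revenue"].describe())     # count, mean, std, min/max, quartiles
print("total:", sales["revenue"].sum())
print(sales["revenue"].pct_change())   # month-over-month trend statement
```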
Diagnostic Analytics
 Answers the question, “Why did it happen?”

 Used to investigate data and content to answer “Why did it happen?”. By analyzing the data, we understand the reasons for certain behaviors and events related to a specific situation.

 It includes comparing coexisting trends or movements, uncovering correlations between variables, and determining causal relationships where possible.

 Some tools and techniques used for such a task include searching for patterns in the data sets, filtering the data, probability theory, regression analysis, and more.
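
As a sketch of the correlation-and-regression step, with invented numbers (ad spend, discounts, revenue are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: was the revenue jump driven by ad spend or by discounts?
df = pd.DataFrame({
    "ad_spend": [10, 12, 9, 15, 18, 20],
    "discount": [5, 5, 10, 0, 0, 5],
    "revenue":  [100, 110, 95, 130, 145, 160],
})

# Uncover correlations between variables (correlation is not causation,
# but it narrows down which factors deserve a causal investigation).
print(df.corr(numeric_only=True)["revenue"])

# A simple regression quantifies the relationship found above.
slope, intercept = np.polyfit(df["ad_spend"], df["revenue"], 1)
print(f"revenue ~= {slope:.1f} * ad_spend + {intercept:.1f}")
```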
Predictive Analytics
 Answers the question, “What might happen in the future?”

 Used to make predictions about future outcomes based on analyzing historical data.

 In order to get the best results, it uses many sophisticated predictive tools and models, such as machine learning and statistical modeling.

 Making predictions for the future can help your organization formulate strategies based on likely scenarios.
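
A minimal statistical-modeling sketch using scikit-learn, with a synthetic six-month revenue history (all figures invented):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical history: revenue for months 1..6.
months = np.array([[1], [2], [3], [4], [5], [6]])   # feature matrix, one column
revenue = np.array([100, 110, 95, 130, 145, 160])

# Fit a model to historical data, then predict future outcomes.
model = LinearRegression().fit(months, revenue)
print("forecast for months 7-8:", model.predict([[7], [8]]))
```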
Prescriptive Analytics
 Answers the question, “What should we do next?”

 It takes into account all possible factors in a scenario and suggests actionable takeaways.

 It takes the results from descriptive and predictive analysis and finds solutions for optimizing decisions through various simulations and techniques.
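
A toy simulation-based sketch of the idea: evaluate candidate actions under simulated scenarios and recommend the best one. The demand model, unit cost, and prices below are all assumptions made up for illustration:

```python
import random

random.seed(0)

def simulate_profit(price: float, n_scenarios: int = 1000) -> float:
    """Average profit for a price over randomly simulated demand scenarios."""
    total = 0.0
    for _ in range(n_scenarios):
        demand = max(0.0, random.gauss(1000 - 8 * price, 50))  # assumed demand model
        total += (price - 20) * demand                         # unit cost assumed = 20
    return total / n_scenarios

candidate_prices = [40, 50, 60, 70]
best = max(candidate_prices, key=simulate_profit)
print("recommended price:", best)   # the actionable takeaway
```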
Assignment

Discuss the difference between a data warehouse and a data lake.


Thanks
