
Chapter 1

BIG DATA ANALYSIS POWER POINT SLIDE CHAPTER 1

Uploaded by

Shams AlHadi
Copyright © All Rights Reserved

Big Data Analytics

Dr. Mukhtaj Khan


[email protected]

Fundamentals of Big Data

• What is a Dataset?
• What is Big Data?
• Sources of Big Data
• Data Analysis
• Data Analytics
– Descriptive analytics
– Diagnostic analytics
– Predictive analytics
– Prescriptive analytics
• Types of Data
• Features of Big Data

What is a Dataset?

• Collections or groups of related data are generally referred to as datasets.

• Each member of a dataset (a datum) shares the same set of attributes or properties as the others in the same dataset.

• Some examples of datasets are:
  • tweets stored in a flat file
  • a collection of image files in a directory
  • an extract of rows from a database table, stored in a CSV-formatted file
  • historical weather observations stored as XML files
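
As a minimal sketch of the idea that every datum in a dataset shares the same attributes, the Python snippet below parses a small CSV extract held in memory and checks that each row exposes the same attribute names. The column names and rows are invented for illustration and do not come from the slides.

```python
import csv
import io

# Hypothetical CSV extract of a database table (illustrative data only).
CSV_TEXT = """id,city,temp_c
1,Peshawar,31.5
2,Lahore,34.0
3,Karachi,29.8
"""

def load_dataset(text):
    """Parse CSV text into a list of dicts, one dict per datum (row)."""
    return list(csv.DictReader(io.StringIO(text)))

def shared_attributes(dataset):
    """Return the attribute names if every datum has the same ones, else None."""
    if not dataset:
        return None
    first = set(dataset[0].keys())
    if all(set(row.keys()) == first for row in dataset):
        return sorted(first)
    return None

dataset = load_dataset(CSV_TEXT)
print(len(dataset), "data items with attributes", shared_attributes(dataset))
```

The same check would apply to the other examples on the slide (tweets, image files, XML observations) once each item is described by a common set of attributes.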

What is a Dataset? cont…

The figure shows three datasets based on three different data formats.

What is Big Data?

• Big data is a collection of datasets so large and complex that it becomes difficult to process using traditional database management tools.

• The challenges include capture, storage, search, sharing, analysis, and visualization.

Big Data

• “Big Data” is data whose scale, diversity, and complexity require new architectures, techniques, algorithms, and analytics to manage it and to extract value and hidden knowledge from it.

What is Big Data? cont…

• Put another way, big data is the realization of greater business intelligence by storing, processing, and analyzing data that was previously ignored due to the limitations of traditional data management technologies.

• Big Data is also defined as a term that encompasses the use of techniques to capture, process, analyze and visualize potentially large datasets within a reasonable timeframe, something that is not possible with standard IT technologies.

What is Big Data? cont…

• 2.5 quintillion bytes of data are generated every day!
– A quintillion is 10^18.
• Data come from many sources:
– Social media sites such as Facebook, Twitter and YouTube
– Sensors
– IoT devices
– Digital photos
– Business transactions
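
To put the daily figure in more familiar units, the short calculation below converts 2.5 quintillion bytes per day into exabytes per day and terabytes per second. It assumes decimal (SI) units, where one exabyte is 10^18 bytes and one terabyte is 10^12 bytes; the slide does not say which convention it uses.

```python
# Figures from the slide: 2.5 quintillion bytes per day, 1 quintillion = 10**18.
QUINTILLION = 10**18
bytes_per_day = 25 * QUINTILLION // 10   # 2.5 * 10**18, kept as an exact integer

EXABYTE = 10**18     # decimal (SI) exabyte, an assumption about the units
TERABYTE = 10**12    # decimal (SI) terabyte
SECONDS_PER_DAY = 24 * 60 * 60

exabytes_per_day = bytes_per_day / EXABYTE
terabytes_per_second = bytes_per_day / SECONDS_PER_DAY / TERABYTE

print(f"{exabytes_per_day} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

Under these assumptions the daily volume is 2.5 exabytes, or roughly 29 terabytes every second.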

Data Analysis

• Data analysis is the process of examining data to find facts, relationships, patterns, insights and/or trends.

• The overall goal of data analysis is to support better decision making.

• A simple data analysis example is the analysis of ice cream sales data in order to determine how the number of ice cream cones sold is related to the daily temperature.

• The results of such an analysis would support decisions related to how much ice cream a store …
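
The ice cream example can be sketched in a few lines of Python. The snippet below computes the Pearson correlation coefficient between daily temperature and cones sold; the numbers are made up for illustration, and Pearson correlation is only one possible way to measure how the two quantities are related.

```python
from math import sqrt

# Hypothetical daily observations (illustrative numbers, not real sales data).
temperature_c = [18, 21, 24, 27, 30, 33]
cones_sold = [120, 135, 160, 180, 210, 230]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

r = pearson(temperature_c, cones_sold)
print(f"correlation between temperature and cones sold: {r:.3f}")
```

A coefficient close to 1 indicates that sales rise with temperature in this sample; it does not by itself show that temperature causes the sales.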

Big Data Analytics
• Big Data analytics is a field dedicated to the analysis, processing, cleansing, organization and storage of large collections of data that frequently originate from disparate sources.

• Big Data solutions and practices are typically required when traditional data analysis, processing and storage technologies are insufficient.

• In Big Data environments, data analytics has developed methods that allow data analysis to occur through the use of highly scalable distributed technologies and frameworks that are capable of analyzing large volumes of data from different …

Big Data Analytics cont…

• The Big Data analytics lifecycle generally involves:
  • identifying, obtaining, preparing and analyzing large amounts of raw, unstructured data.

• Data analysis is employed to extract meaningful information, which can be used as an input for:
  • identifying patterns,
  • enriching existing enterprise data, and
  • performing large-scale searches.

Traditional vs Big Data Analytics Approach

• Traditional data analytics approaches, such as statistical approaches, use sampling to approximate measurements.
• The Big Data analytics approach, by contrast, considers the whole dataset during prediction.
• The Big Data analytics approach leverages computational resources to execute analytics algorithms.
• Data processed by a Big Data analytics solution can be used by enterprise applications directly or can be fed into a data warehouse to enrich existing data.
• The results obtained through the processing of Big Data can lead to a wide range of insights and benefits, such as operational optimization, prediction of faults, identification of new …
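
The contrast between sampling and whole-dataset processing can be illustrated with a small experiment. The sketch below builds a synthetic population, estimates its mean from a 1% random sample, and compares that with the mean computed over every record. The data, the sample size and the seed are all assumptions chosen for illustration.

```python
import random

# Hypothetical population: 100,000 synthetic measurements (illustrative only).
rng = random.Random(42)          # fixed seed so the run is repeatable
population = [rng.uniform(0, 100) for _ in range(100_000)]

# Sampling approach: estimate the mean from a 1% random sample.
sample = rng.sample(population, 1_000)
sample_mean = sum(sample) / len(sample)

# Whole-dataset approach: compute the mean over every record.
full_mean = sum(population) / len(population)

print(f"sample estimate: {sample_mean:.2f}")
print(f"full dataset:    {full_mean:.2f}")
print(f"difference:      {abs(sample_mean - full_mean):.2f}")
```

The sample gives an approximation at a fraction of the cost, while the full pass removes sampling error but needs enough computational resources to touch every record.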

Big Data Analytics Types

• There are four general categories of analytics, distinguished by the results they produce:
  • descriptive analytics
  • diagnostic analytics
  • predictive analytics
  • prescriptive analytics

Big Data Analytics Types cont…

Descriptive Analytics:
• Descriptive analytics are carried out to answer questions about events that have already occurred.
• Sample questions include:
  • What was the sales volume over the past 12 months?
  • What is the number of support calls received, categorized by severity and geographic location?
  • What is the monthly commission earned by each sales agent?
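
A descriptive question such as the monthly commission per sales agent amounts to summarizing records of past events. The sketch below is one minimal way to do that in plain Python; the agent names, months and amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (agent, month, commission). Illustrative only.
sales = [
    ("Amina", "2024-01", 120.0),
    ("Amina", "2024-01", 80.0),
    ("Bilal", "2024-01", 150.0),
    ("Amina", "2024-02", 60.0),
    ("Bilal", "2024-02", 90.0),
    ("Bilal", "2024-02", 30.0),
]

def monthly_commission(records):
    """Sum commission per (agent, month): a summary of what already happened."""
    totals = defaultdict(float)
    for agent, month, commission in records:
        totals[(agent, month)] += commission
    return dict(totals)

totals = monthly_commission(sales)
for (agent, month), amount in sorted(totals.items()):
    print(agent, month, amount)
```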

Big Data Analytics Types cont…

Diagnostic Analytics:
• Diagnostic analytics aim to determine the cause of a phenomenon that occurred in the past, using questions that focus on the reason behind the event. Such questions include:
  • Why were Q2 sales less than Q1 sales?
  • Why have there been more support calls originating from the Eastern region than from the Western region?
  • Why was there an increase in patient re-admission rates over the past three months?
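
A common first step toward answering a question like "Why were Q2 sales less than Q1 sales?" is to break the overall change down by a dimension such as region. The sketch below does this with hypothetical figures. It only locates where the drop happened; finding the actual cause would need further investigation of the region it points to.

```python
# Hypothetical quarterly sales by region (illustrative numbers only).
q1 = {"East": 500, "West": 480, "North": 300}
q2 = {"East": 490, "West": 310, "North": 305}

def change_by_region(before, after):
    """Return each region's change in sales, most negative first."""
    deltas = {region: after[region] - before[region] for region in before}
    return sorted(deltas.items(), key=lambda item: item[1])

total_change = sum(q2.values()) - sum(q1.values())
breakdown = change_by_region(q1, q2)

print("total change Q1 -> Q2:", total_change)
for region, delta in breakdown:
    print(region, delta)
```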

Big Data Analytics Types cont…

Predictive Analytics:
• Predictive analytics are carried out in an attempt to determine the outcome of an event that might occur in the future.
• Predictions are based on patterns and trends found in historical and current data.
• Questions are usually formulated using a what-if rationale, such as the following:
  • What are the chances that a customer will default on a loan if they have missed three monthly payments?
  • What will be the patient survival rate if Drug B is administered instead of Drug A?
  • If a customer has purchased Products A and …
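
The loan question can be illustrated with a deliberately naive estimate: the share of past customers in the same situation who went on to default. The history below is invented, and a real predictive model would use far more data and features; this is only a sketch of the "predict from historical patterns" idea.

```python
# Hypothetical loan history: (missed_payments, defaulted). Illustrative only.
history = [
    (0, False), (0, False), (1, False), (1, False), (2, False),
    (2, True), (3, True), (3, True), (3, False), (3, True),
]

def default_rate(records, missed):
    """Share of past customers with `missed` missed payments who defaulted."""
    outcomes = [defaulted for m, defaulted in records if m == missed]
    if not outcomes:
        return None  # no historical evidence for this case
    return sum(outcomes) / len(outcomes)

estimate = default_rate(history, missed=3)
print(f"estimated chance of default after 3 missed payments: {estimate:.0%}")
```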

Big Data Analytics Types cont…

Prescriptive Analytics:
• Prescriptive analytics build upon the results of predictive analytics by prescribing actions that should be taken.
• The focus is not only on which prescribed option is best to follow, but why. Sample questions may include:
  • Among three drugs, which one provides the best results?
  • When is the best time to trade a particular stock?
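
One way to picture prescriptive analytics is as a scoring step applied to predicted outcomes, returning both the recommended option and the numbers behind it. In the sketch below the predicted rates, the scoring rule and the side-effect weight are all assumptions made for illustration, not a method taken from the slides.

```python
# Hypothetical predicted outcomes per option (illustrative numbers only).
predicted = {
    "Drug A": {"recovery_rate": 0.72, "side_effect_rate": 0.10},
    "Drug B": {"recovery_rate": 0.80, "side_effect_rate": 0.25},
    "Drug C": {"recovery_rate": 0.78, "side_effect_rate": 0.05},
}

def score(outcome, side_effect_weight=0.5):
    """Assumed scoring rule: reward recovery, penalize side effects."""
    return outcome["recovery_rate"] - side_effect_weight * outcome["side_effect_rate"]

def recommend(options):
    """Return the best option and the scores that justify the choice."""
    scores = {name: round(score(o), 3) for name, o in options.items()}
    best = max(scores, key=scores.get)
    return best, scores

best, scores = recommend(predicted)
print("recommended:", best)
print("scores:", scores)
```

Returning the scores alongside the recommendation reflects the point that the focus is on why an option is best, not only on which one it is.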

Big Data Characteristics

The value characteristic is intuitively related to the veracity characteristic in that the higher the data fidelity, the more value it holds for the business.

Big Data Characteristics
Volume

• Data storage is growing exponentially because data is now much more than text.

• Data can be found in the form of videos, music and large images on our social media channels.

• It is now very common for enterprises to have storage systems holding terabytes or petabytes of data.

• The big volume indeed represents Big …


Big Data Characteristics
Velocity

• Data growth and the social media explosion have changed how we look at data.

• Today, people rely on social media to keep them updated with the latest happenings. On social media, a message that is only a few seconds old (a tweet, a status update, etc.) may no longer interest users.

• They often discard old messages and pay attention to recent updates. Data movement is now almost real time, and the update window has shrunk to fractions of a second.

Big Data Characteristics
Variety

• Data can be stored in multiple formats, for example in a database, an Excel workbook, a CSV file, an Access database or, for that matter, a simple text file.

• Sometimes the data is not even in a traditional format: it may be in the form of video, SMS, PDF or something we might not have thought about. The organization needs to arrange it and make it meaningful.

• This would be easy if all the data were in the same format; however, that is not the case most of the time.

Big Data Characteristics
Veracity

• Veracity refers to how accurate the data is.

• To extract value from the data, the data needs to be cleaned to remove noise.

• Data-driven applications can reap the benefits of big data only when the data is meaningful and accurate.

• Therefore, cleansing of data is important so that incorrect and faulty data can be filtered out.
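
Cleansing can be as simple as filtering out values that are missing or physically implausible. The sketch below assumes a list of temperature readings and a plausible range of -50 to 60 degrees Celsius; both the data and the thresholds are illustrative assumptions, and real pipelines usually need domain-specific rules.

```python
# Hypothetical sensor readings in degrees Celsius; None and out-of-range
# values stand in for noise (illustrative data and thresholds only).
raw_readings = [21.4, None, 22.0, -999.0, 21.8, 250.0, 22.3]

def clean(readings, low=-50.0, high=60.0):
    """Keep only readings that are present and inside a plausible range."""
    return [r for r in readings if r is not None and low <= r <= high]

cleaned = clean(raw_readings)
print("kept", len(cleaned), "of", len(raw_readings), "readings:", cleaned)
```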

Big Data Characteristics
Value

• Value and time are inversely related. The longer it takes for data to be turned into meaningful information, the less value it has for a business.

• Stale results inhibit the quality and speed of informed decision-making.

Types of Data

• The data processed by Big Data solutions can be human-generated or machine-generated.

• Human-generated data is the result of human interaction with systems, such as online services and digital devices.

• Machine-generated data is generated by software programs and hardware devices in response to real-world events. For example, a log file captures an authorization decision made by a security service, and a point-of-sale system generates a transaction against inventory to reflect items purchased by a customer.

• Structured Data

• Unstructured Data

Types of Data
Structured Data

• Structured data conforms to a data model or schema and is often stored in tabular form.
• It is used to capture relationships between different entities and is therefore most often stored in a relational database.
• Structured data is frequently generated by enterprise applications and information systems like ERP and CRM systems.
• Examples of this type of data include banking transactions, invoices, and customer records.
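
To make the link between schema, tables and relationships concrete, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The two-table schema (customers and invoices) and its rows are hypothetical and chosen only to mirror the examples named on the slide.

```python
import sqlite3

# In-memory relational database with a hypothetical two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Amina'), (2, 'Bilal');
    INSERT INTO invoice VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.5);
""")

# The schema lets SQL follow the relationship between the two entities.
rows = conn.execute("""
    SELECT c.name, SUM(i.amount)
    FROM customer AS c JOIN invoice AS i ON i.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
conn.close()

print(rows)
```

Because every row conforms to the declared columns, the join and the aggregation can be expressed directly in SQL.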

Types of Data
Unstructured Data

• Data that does not conform to a data model or data schema is known as unstructured data.
• It is estimated that unstructured data makes up 80% of the data within any given enterprise.
• Unstructured data has a faster growth rate than structured data.
• This form of data is either textual or binary and is often conveyed via files that are self-contained and non-relational. A text file may contain the contents of various tweets or blog postings. Binary files are often media files that contain image, audio or video data.
• Unstructured data cannot be directly processed or queried using SQL.
• Alternatively, a Not-only SQL (NoSQL) database is a non-relational database that can be used to store …

Types of Data
Semi-Structured Data

• Semi-structured data has a defined level of structure and consistency but is not relational in nature. Instead, semi-structured data is hierarchical or graph-based.

• “Graph” in this case refers to mathematical graph theory. In graph theory, a graph is a mathematical structure used to model pairwise relationships between objects. Graph or network data is, in short, data that focuses on the relationship or adjacency of objects.
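
The two shapes mentioned here, hierarchical and graph-based, can be sketched briefly in Python. The snippet parses a nested JSON record and then represents a small graph as an adjacency list. JSON is used only as a familiar example of a hierarchical format; the field names and the "follows" relationships are hypothetical.

```python
import json

# Hypothetical semi-structured record: hierarchical, with nested fields.
doc = json.loads("""
{
  "user": "amina",
  "profile": {"city": "Peshawar", "interests": ["data", "hiking"]},
  "follows": ["bilal", "chen"]
}
""")

# Hypothetical graph data: who follows whom, stored as an adjacency list.
follows = {
    "amina": ["bilal", "chen"],
    "bilal": ["chen"],
    "chen": [],
}

def followers_of(graph, target):
    """Return the nodes that have an edge pointing at `target`."""
    return sorted(node for node, outs in graph.items() if target in outs)

print(doc["profile"]["interests"])
print(followers_of(follows, "chen"))
```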

Types of Data
Metadata

• Metadata provides information about a dataset’s characteristics and structure.
• The tracking of metadata is crucial to Big Data processing, storage and analysis because it provides information about the pedigree of the data and its provenance during processing.
• Examples of metadata include:
  • XML tags providing the author and creation date of a document
  • attributes providing the file size and resolution of a digital photograph
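
As a small illustration of the first example, the sketch below reads the author and creation date from XML tags using Python's standard xml.etree.ElementTree module. The document layout and tag names (meta, author, created) are assumptions made for the example, not a standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the tag names are assumptions for illustration.
XML_TEXT = """
<document>
  <meta>
    <author>A. Writer</author>
    <created>2024-03-01</created>
  </meta>
  <body>Quarterly sales summary.</body>
</document>
"""

def read_metadata(xml_text):
    """Extract author and creation date from the <meta> element."""
    root = ET.fromstring(xml_text.strip())
    return {
        "author": root.findtext("meta/author"),
        "created": root.findtext("meta/created"),
    }

metadata = read_metadata(XML_TEXT)
print(metadata)
```

The metadata is read without touching the document body, which is what makes it useful for tracking where data came from during processing.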
