Chapter 02 Understanding of Data

The document discusses various topics related to data including what data is, characteristics of big data, different data sources like structured, unstructured and semi-structured data, data storage methods, different types of data analysis including descriptive, diagnostic, predictive and prescriptive analytics, and data preprocessing techniques.


Machine Learning
S. Sridhar and M. Vijayalakshmi

© Oxford University Press 2021. All rights reserved


Chapter 2

Understanding of Data


What is Data?
• DATA ARE FACTS
• FACTS ARE IN THE FORM OF NUMBERS, AUDIO, VIDEO, IMAGE
• NEED TO ANALYZE DATA FOR TAKING DECISIONS


Characteristics of Big Data


Characteristics of Data


Data Sources
A DATA SOURCE CAN BE ANYTHING –

• STRUCTURED DATA
• SEMI-STRUCTURED DATA
• UNSTRUCTURED DATA


Structured Data
STRUCTURED DATA CAN BE ANY ONE OF THE FOLLOWING –

• RECORD DATA
• GRAPH DATA
• DATA MATRIX
• ORDERED DATA – SEQUENCE DATA, TIME SERIES DATA, TEMPORAL DATA


Unstructured Data
UNSTRUCTURED DATA CAN BE ANY ONE OF THE FOLLOWING –

• VIDEO, IMAGE, PROGRAMS
• BLOG DATA
• ACCOUNTS FOR ABOUT 80% OF ORGANIZATION DATA


Semi-Structured Data
SEMI-STRUCTURED DATA CAN BE ANY ONE OF THE FOLLOWING –

• XML/JSON OBJECTS
• RSS FEEDS
• HIERARCHICAL RECORDS


Data Storage


Data Storage
• DATABASE SYSTEMS
• TYPES ARE
1. TRANSACTIONAL DATABASE
2. TIME SERIES DATABASE
3. TEMPORAL DATABASE


Data Storage
• OTHER TYPES


Descriptive Analytics


Diagnostic Analytics


Predictive Analytics


Prescriptive Analytics


Data Analysis Framework
• FRAMEWORK


Types of Processing
• CLOUD COMPUTING
• GRID COMPUTING
• H-COMPUTING


Good Data Characteristics
• GOOD DATA SHOULD HAVE THESE CHARACTERISTICS


Open-Source Data
1. DIGITAL LIBRARIES
2. EXPERIMENTAL DATA LIKE GENOMIC AND BIOLOGICAL DATA
3. HEALTHCARE SYSTEMS LIKE PATIENT INSURANCE DATA


Social-Media Data
1. TWITTER DATA
2. FACEBOOK DATA
3. YOUTUBE VIDEOS
4. INSTAGRAM DATA


Multimodal Data
• IMAGE ARCHIVES WITH TEXT AND NUMERIC DATA
• WWW


Data Preprocessing
DATA THAT CAN CAUSE PROBLEMS
• INCOMPLETE DATA
• OUTLIER DATA
• INCONSISTENT DATA
• INACCURATE DATA
• MISSING VALUES
• DUPLICATE DATA


Missing Data
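
One common way to handle missing values is to replace them with the mean of the observed values. The sketch below assumes missing entries are marked with `None`; the function name `impute_with_mean` and the sample ages are illustrative and do not come from the slides.

```python
from statistics import mean


def impute_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical attribute with two missing entries.
ages = [20, None, 30, 40, None]
print(impute_with_mean(ages))
```

Mean imputation is only one option; dropping incomplete records or filling with the median or mode may suit other data better.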


Noisy Data
BINNING TECHNIQUE
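
As a minimal sketch of binning, the code below sorts the values, splits them into equal-size bins, and replaces every value with the mean of its bin (smoothing by bin means). The bin size and the sample values are assumptions chosen for illustration; the slide does not state which binning variant it uses.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort values, split into consecutive bins, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Hypothetical noisy attribute, smoothed with bins of three values.
data = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(data, 3))
```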


Data Normalization
MIN-MAX PROCEDURE TRANSFORMS DATA TO THE RANGE 0-1
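
A minimal sketch of the min-max procedure: each value v is mapped to (v - min) / (max - min), optionally rescaled to a new range. The function name and the `new_min` / `new_max` parameters are illustrative assumptions.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are equal; min-max scaling is undefined")
    span = hi - lo
    return [(v - lo) / span * (new_max - new_min) + new_min for v in values]


# Hypothetical marks rescaled to the range 0-1.
print(min_max_scale([10, 20, 30]))
```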


Data Normalization
Z-SCORE
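
A minimal sketch of z-score normalization: each value is shifted by the mean and divided by the standard deviation. This version assumes the population standard deviation (`pstdev`); a text that uses the sample standard deviation would give slightly different numbers.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (v - mean) / standard deviation for every value."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical data with mean 5 and population standard deviation 2.
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
```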


Types of Data


Nominal Data


Ordinal Data


Numerical Data


Types of Data
BASED ON VARIABLES


Data Visualization


Data Visualization


Data Visualization


Data Visualization


Data Visualization


Central Tendency
MEAN OF DATA


Central Tendency
MEDIAN OF DATA


Central Tendency
MODE OF DATA
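
The three measures of central tendency can be computed with Python's standard `statistics` module. The marks below are made-up illustrative values.

```python
from statistics import mean, median, multimode

# Hypothetical marks of five students.
marks = [20, 30, 30, 40, 80]

print(mean(marks))       # arithmetic mean: sum of values divided by their count
print(median(marks))     # middle value of the sorted data
print(multimode(marks))  # list of the most frequent value(s)
```

`multimode` returns a list because a data set can have more than one mode.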


Dispersion
RANGE AND STANDARD DEVIATION
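
A short sketch of the two dispersion measures named above. The range is the largest value minus the smallest. The standard deviation is shown in both its population form (`pstdev`, divides by n) and its sample form (`stdev`, divides by n - 1), since the slide does not say which one it uses. The data are illustrative.

```python
from statistics import pstdev, stdev


def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(value_range(data))  # spread between the extremes
print(pstdev(data))       # population standard deviation
print(stdev(data))        # sample standard deviation
```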


Dispersion
QUARTILES AND IQR


Five-point summary
5-POINT SUMMARY
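
The five-point summary consists of the minimum, the quartiles Q1, Q2 (median) and Q3, and the maximum; the interquartile range is IQR = Q3 - Q1. The sketch below uses `statistics.quantiles` with `method="inclusive"`, which is an assumption: several quartile conventions exist, and another convention may give slightly different Q1 and Q3 for the same data.

```python
from statistics import quantiles


def five_point_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def interquartile_range(values):
    """IQR = Q3 - Q1."""
    _, q1, _, q3, _ = five_point_summary(values)
    return q3 - q1


# Hypothetical data.
data = [13, 11, 2, 3, 4, 8, 9]
print(five_point_summary(data))
print(interquartile_range(data))
```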


Shape of Data
SKEWNESS AND KURTOSIS


Shape of Data
KURTOSIS


Shape of Data
MEAN ABSOLUTE DEVIATION AND COEFFICIENT OF VARIATION
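
The shape measures named on these slides can be sketched with central moments. The definitions here are assumptions, because the slide formulas were not recovered: skewness is taken as m3 / m2^1.5, kurtosis as m4 / m2^2 (about 3 for a normal distribution, not the "excess" form), mean absolute deviation as the average distance from the mean, and the coefficient of variation as the population standard deviation divided by the mean.

```python
from math import sqrt


def central_moment(values, k):
    """k-th central moment: average of (v - mean) ** k."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** k for v in values) / len(values)


def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def mean_absolute_deviation(values):
    mu = sum(values) / len(values)
    return sum(abs(v - mu) for v in values) / len(values)


def coefficient_of_variation(values):
    mu = sum(values) / len(values)
    return sqrt(central_moment(values, 2)) / mu


# Hypothetical symmetric data, so the skewness is zero.
data = [1, 2, 3, 4, 5]
print(skewness(data), kurtosis(data))
print(mean_absolute_deviation(data), coefficient_of_variation(data))
```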


Stem-Leaf Plot


Q-Q Plot
A Q-Q PLOT IS A GRAPHICAL CHECK FOR NORMALITY. IF THE POINTS LIE CLOSE TO A STRAIGHT LINE, THEN THE DISTRIBUTION IS APPROXIMATELY NORMAL.
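
The points of a normal Q-Q plot can be computed without any plotting library: sort the sample and pair each value with the matching quantile of the standard normal distribution. The plotting position (i + 0.5) / n used below is one common convention and is an assumption here. The correlation of the pairs gives a rough numeric sense of how straight the line is.

```python
from math import sqrt
from statistics import NormalDist


def normal_qq_points(sample):
    """Pair theoretical standard-normal quantiles with the sorted sample values."""
    ordered = sorted(sample)
    n = len(ordered)
    standard_normal = NormalDist()
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 suggest a straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical sample.
points = normal_qq_points([5, 1, 3, 2, 4])
print(points)
print(straightness(points))
```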


Bivariate Data
INVOLVES TWO VARIABLES


Bivariate Data Visualization


Bivariate Data – Covariance


Bivariate Data – Correlation


Bivariate Data – Correlation
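
Covariance and Pearson correlation for two variables can be sketched as follows. The covariance here divides by n - 1 (sample form), which is an assumption; dividing by n gives the population form. The correlation is the covariance scaled by both standard deviations, so it always lies between -1 and 1.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance: average product of deviations, divided by n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def pearson_correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical perfectly linear data: y = 2x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(covariance(x, y))
print(pearson_correlation(x, y))
```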


Multivariate Data Visualization


Multivariate Data Visualization


Multivariate Essential Mathematics
1. GAUSSIAN ELIMINATION


Multivariate Essential Mathematics
1. GAUSSIAN ELIMINATION
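
A minimal sketch of Gaussian elimination for solving a linear system A x = b: forward elimination reduces the augmented matrix to upper-triangular form, then back substitution recovers the unknowns. Partial pivoting (swapping in the row with the largest pivot) is added for numerical stability and is a choice of this sketch, not something the slide specifies.

```python
def gaussian_elimination(a, b):
    """Solve A x = b for a square matrix A given as a list of rows."""
    n = len(b)
    # Build the augmented matrix [A | b] without modifying the inputs.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]

    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


# Hypothetical system: 2x + y = 5 and x + 3y = 10.
print(gaussian_elimination([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0]))
```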


Multivariate Essential Mathematics
1. MATRIX DECOMPOSITION


Multivariate Essential Mathematics
1. MATRIX DECOMPOSITION


Multivariate Essential Mathematics
1. DISTRIBUTIONS


Multivariate Essential Mathematics
EXPONENTIAL DISTRIBUTION


Multivariate Essential Mathematics
BINOMIAL DISTRIBUTION


Multivariate Essential Mathematics
POISSON AND BERNOULLI DISTRIBUTIONS
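
The probability mass functions of the discrete distributions named on these slides can be written directly from their standard definitions. The function names and the parameter names `p`, `n` and `lam` are illustrative choices.

```python
from math import comb, exp, factorial


def bernoulli_pmf(k, p):
    """P(X = k) for a single trial with success probability p; k is 0 or 1."""
    if k not in (0, 1):
        return 0.0
    return p if k == 1 else 1 - p


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(exactly k events) when events occur at average rate lam."""
    return lam ** k * exp(-lam) / factorial(k)


print(bernoulli_pmf(1, 0.3))
print(binomial_pmf(2, 4, 0.5))
print(poisson_pmf(0, 2.0))
```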


Density Estimation


Hypothesis Testing
Z-TEST
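
A minimal sketch of a one-sample z-test, assuming the population standard deviation is known: z = (sample mean - hypothesized mean) / (sigma / sqrt(n)). The two-sided p-value comes from the standard normal distribution. The numbers in the example are made up.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, population_mean, population_sd, n):
    """Return the z statistic and the two-sided p-value."""
    z = (sample_mean - population_mean) / (population_sd / sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical case: 100 observations with mean 52, tested against a mean of 50 with sigma 10.
z, p = one_sample_z_test(52, 50, 10, 100)
print(z, p)
```

At a 5% significance level, a p-value below 0.05 would lead to rejecting the null hypothesis that the population mean equals the hypothesized value.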


Hypothesis Testing
PAIRED T-TEST


Hypothesis Testing
PAIRED T-TEST


Hypothesis Testing
PAIRED T-TEST
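
The paired t-test works on the differences between matched observations: t = mean(d) / (s_d / sqrt(n)), with n - 1 degrees of freedom, where s_d is the sample standard deviation of the differences. The sketch below computes only the statistic and the degrees of freedom; looking up the p-value needs a t-distribution table or a statistics library. The before/after values are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1


# Hypothetical scores of four subjects before and after a treatment.
t, dof = paired_t_statistic([10, 12, 14, 16], [12, 15, 15, 20])
print(t, dof)
```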


Hypothesis Testing
CHI-SQUARE TEST


Hypothesis Testing
CHI-SQUARE TEST


Hypothesis Testing
CHI-SQUARE TEST
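
A sketch of the chi-square test of independence on a contingency table, assuming that is the variant the slides cover. Each expected count is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells. The table values are hypothetical; comparing the statistic with a critical value requires a chi-square table or a statistics library.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical 2 x 2 table of observed counts.
stat, dof = chi_square_independence([[10, 20], [30, 40]])
print(stat, dof)
```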


Feature Engineering


Feature Engineering
• FEATURE TRANSFORMATION
• FEATURE SELECTION


Characteristics of Good Features
• FEATURES ARE REMOVED BASED ON RELEVANCY
• FEATURES ARE REMOVED BASED ON REDUNDANCY


Feature Selection
FORWARD SELECTION


Feature Selection
BACKWARD SELECTION
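
Forward and backward selection are both greedy searches over feature subsets. The sketch below is a generic version under stated assumptions: `score` is a hypothetical caller-supplied function that rates a subset (higher is better), forward selection adds the best feature while the score improves, and backward selection drops a feature while the score does not get worse. The toy scoring function and feature names are invented for illustration; in practice the score would come from a model evaluated on validation data.

```python
def forward_selection(features, score):
    """Start empty; repeatedly add the feature that most improves the score."""
    selected, remaining = [], list(features)
    best_score = score(selected)
    while remaining:
        top_score, top_feature = max(
            ((score(selected + [f]), f) for f in remaining), key=lambda pair: pair[0]
        )
        if top_score <= best_score:
            break  # no candidate improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected


def backward_selection(features, score):
    """Start with all features; repeatedly drop one while the score does not get worse."""
    selected = list(features)
    best_score = score(selected)
    while len(selected) > 1:
        top_score, drop = max(
            ((score([g for g in selected if g != f]), f) for f in selected),
            key=lambda pair: pair[0],
        )
        if top_score < best_score:
            break  # every removal hurts the score
        selected.remove(drop)
        best_score = top_score
    return selected


# Hypothetical score: each feature has a fixed gain, minus a small cost per feature used.
GAINS = {"age": 0.5, "income": 0.3, "noise": 0.0}


def toy_score(subset):
    return sum(GAINS[f] for f in subset) - 0.05 * len(subset)


print(forward_selection(["age", "income", "noise"], toy_score))
print(backward_selection(["age", "income", "noise"], toy_score))
```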


Principal Component Analysis


Principal Component Analysis
Compute the covariance matrix.

Compute the eigenvalues and eigenvectors, and form matrix A as a set of eigenvectors.


Principal Component Analysis
Compute the PCA transform.

The original data can be recovered from the PCA transform.
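
The PCA steps listed on these slides can be sketched for two-dimensional data in plain Python. The slide formulas were not recovered, so this follows the standard textbook procedure under stated assumptions: the data are mean-centered, the covariance matrix uses the sample form (division by n - 1), and the eigenvectors of the symmetric 2 x 2 matrix are obtained in closed form from the rotation angle theta = 0.5 * atan2(2 * s_xy, s_xx - s_yy). All names are illustrative.

```python
import math


def pca_2d(points):
    """Return (mean, eigenvalues, components, scores) for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(cx * cx for cx, _ in centered) / (n - 1)
    syy = sum(cy * cy for _, cy in centered) / (n - 1)
    sxy = sum(cx * cy for cx, cy in centered) / (n - 1)

    # Eigenvalues of a symmetric 2 x 2 matrix, largest first.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = (half_trace + radius, half_trace - radius)

    # Unit eigenvectors: the first points along the direction of largest variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    first = (math.cos(theta), math.sin(theta))
    second = (-math.sin(theta), math.cos(theta))

    # Project the centered data onto the eigenvectors.
    scores = [
        (cx * first[0] + cy * first[1], cx * second[0] + cy * second[1])
        for cx, cy in centered
    ]
    return (mean_x, mean_y), eigenvalues, (first, second), scores


def reconstruct(mean, components, scores):
    """Recover the original points from the scores, components and mean."""
    first, second = components
    return [
        (mean[0] + s1 * first[0] + s2 * second[0],
         mean[1] + s1 * first[1] + s2 * second[1])
        for s1, s2 in scores
    ]


# Hypothetical points lying exactly on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, eigenvalues, components, scores = pca_2d(data)
print(eigenvalues)
print(reconstruct(mean, components, scores))
```

Keeping both components reproduces the data exactly; dropping the second score (setting it to zero) gives the one-dimensional approximation, which loses nothing here because the second eigenvalue is zero.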


PCA Algorithm


PCA Example


PCA Example


PCA Example


PCA Example


PCA Example


Verification


LDA Algorithm


LDA Algorithm


SVD Algorithm


SVD Algorithm


SVD Example


SVD Example


SVD Example


SVD Example


Summary


Summary
