Module-2

SRI RAMAKRISHNA ENGINEERING COLLEGE
[Educational Service : SNR Sons Charitable Trust]
[Autonomous Institution, Reaccredited by NAAC with ‘A+’ Grade]
[Approved by AICTE and Permanently Affiliated to Anna University, Chennai]
[ISO 9001:2015 Certified and all Eligible Programmes Accredited by NBA]
VATTAMALAIPALAYAM, N.G.G.O. COLONY POST, COIMBATORE – 641 022.

Department of Information Technology

20IT211- Data Science

Presentation by
Mrs.S.Jansi Rani, AP(Sr.Gr)/IT
COURSE OUTCOMES
CO1 Understand the basic concepts of data science and data mining (PO1, PO2, PO12)

CO2 Identify the techniques to explore and evaluate data (PO3, PO5, PO12)

CO3 Apply various data mining algorithms for real-time applications (PO2, PO3, PO5, PO12)

CO4 Implement the concepts of clustering and model evaluation (PO3, PO5, PO12)

19/01/25 20IT211- DATA SCIENCE - MRS.S.JANSI RANI, AP(SR.GR)/IT 2


Module I : INTRODUCTION 9 hours

What is data science – Case for data science – Data science classification – Data science algorithms – Data science process – Prior knowledge – Data preprocessing – Data cleaning – Data integration – Data reduction – Data transformation and data discretization – Feature selection – Data sampling – Modeling – Application.


Module II : DATA EXPLORATION AND VISUALIZATION 9 hours

Objectives of Data exploration – Datasets – Descriptive statistics – Data Visualization – Univariate visualization – Multivariate visualization – Visualizing high dimensional data – Roadmap for data exploration.


Module III : CLASSIFICATION AND ASSOCIATION ANALYSIS 18 hours

Basic concepts of Classification – Decision tree induction – Bayes classification methods – Rule based classification – Techniques to improve classification accuracy – Support vector machines – Regression methods: Linear regression – Logistic regression – Association analysis: Frequent itemset mining methods – Pattern evaluation methods.

Module IV : CLUSTERING AND MODEL EVALUATION 9 hours

Basic concepts and methods in cluster analysis – Partitioning methods – Density based methods – Model evaluation: Confusion matrix – Receiver Operating Characteristic (ROC) and Area Under the Curve (AUC) – Lift curves – Evaluating the Predictions – Implementation


TEXTBOOKS
1. Vijay Kotu and Bala Deshpande, “Data Science: Concepts and Practice”, 2nd Edition, Morgan Kaufmann Publishers, 2019.

2. Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, 3rd Edition, Morgan Kaufmann Publishers, 2012.

3. Cathy O’Neil and Rachel Schutt, “Doing Data Science: Straight Talk from the Frontline”, O’Reilly, 2016.


Reference(s)
1. Mohammed J. Zaki and Wagner Meira Jr., “Data Mining and Analysis: Fundamental Concepts and Algorithms”, Cambridge University Press, 2014.

2. Matt Harrison, “Learning the Pandas Library: Python Tools for Data Munging, Analysis and Visualization”, O’Reilly, 2016.

3. Joel Grus, “Data Science from Scratch: First Principles with Python”, O’Reilly Media, 2015.

4. Wes McKinney, “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”, O’Reilly Media, 2012.


WEB REFERENCES
1. https://nptel.ac.in/courses/106/106/106106179/


Acknowledgement
Resources are taken from the internet and textbooks


Introduction to Data Exploration
Data science helps decipher the hidden, useful relationships within data.

Data exploration helps to:

● understand the data better,

● prepare the data in a way that makes advanced analysis possible, and

● get the necessary insights from the data faster than advanced analytical techniques would.


Introduction to Data Exploration
Example:

Simple pivot table functions, computing statistics such as the mean and standard deviation, and plotting data as line, bar, and scatter charts are all part of data exploration.


Introduction to Data Exploration


Data Exploration
Data exploration can be broadly divided into two types:
Descriptive Statistics
Data Visualization


Data Exploration
Visualization is the process of projecting the data, or parts of it, into multi-dimensional space or abstract images. All the useful (and adorable) charts fall under this category.

Descriptive statistics is the process of condensing key characteristics of the dataset into simple numeric metrics. Some of the common quantitative metrics used are mean, standard deviation, and correlation.


OBJECTIVES OF DATA EXPLORATION
Data understanding

Data preparation

Data science tasks

Interpreting the results


Dataset
Iris:
*The Iris dataset contains 150 observations of three different species, Iris setosa, Iris virginica, and Iris versicolor, with 50 observations each.

*Each observation consists of four attributes: sepal length, sepal width, petal length, and petal width.

Dataset: Types of Data
The same attribute can be represented with different data types. For example, the temperature in weather data can be expressed in any of the following formats:

● Numeric, in centigrade (31C, 33.3C), Fahrenheit (100F, 101.45F), or on the Kelvin scale

● Ordered labels, as in hot, mild, or cold

● Number of days within a year below 0C (for example, 10 days in a year below freezing)


Dataset: Types of Data
Numeric or Continuous:

Temperature expressed in centigrade or Fahrenheit is numeric and continuous: it is denoted by numbers and can take an infinite number of values between any two values.

The values are ordered, and the difference between two values is meaningful.

Addition and subtraction, as well as comparison operators such as greater than, less than, and equal to, can be applied.

An integer is a special form of the numeric data type that has no decimals in the value; more precisely, it does not take infinitely many values between consecutive numbers.
Dataset: Types of Data
Categorical or Nominal

Nominal attributes are also referred to as categorical attributes. The values of nominal attributes do not have any meaningful order.

The color of the iris of the human eye is a categorical data type because it takes a value such as black, green, blue, or gray.

There is no direct relationship among the data values; hence, no mathematical operators can be applied except the logical “is equal” comparison.

Categorical data are also called the nominal or polynominal data type.


An ordered nominal (ordinal) data type is a special case of a categorical data type where there is some kind of order among the values.

An example of an ordered data type is temperature expressed as hot, mild, or cold.
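A minimal Python sketch (standard library only) contrasting the three data types above. The eye colours come from the categorical example and the Celsius values from the numeric example; encoding the ordinal labels with `IntEnum`, the order cold < mild < hot, and the helper name `same_category` are choices made for this illustration only.

```python
from enum import IntEnum

# Nominal (categorical): no order, so only the equality test is meaningful.
eye_colors = ["black", "green", "blue", "gray", "blue"]

def same_category(a: str, b: str) -> bool:
    """The only comparison that makes sense for nominal values."""
    return a == b

blue_count = sum(same_category(c, "blue") for c in eye_colors)

# Ordinal (ordered nominal): the integer codes encode rank only (assumed order).
class Temperature(IntEnum):
    COLD = 1
    MILD = 2
    HOT = 3

readings = [Temperature.HOT, Temperature.COLD, Temperature.MILD]
ordered = sorted(readings)                 # ordering is meaningful
warmer = Temperature.HOT > Temperature.MILD

# Numeric (continuous): ordering and differences are both meaningful.
celsius = [31.0, 33.3]
difference = celsius[1] - celsius[0]       # about 2.3 degrees

print(blue_count, [t.name for t in ordered], warmer, round(difference, 1))
```

Note that `Temperature.HOT - Temperature.MILD` would run, but the result has no physical meaning, unlike the Celsius difference.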


DESCRIPTIVE STATISTICS
Descriptive statistics refers to a branch of statistics that involves summarizing, organizing, and presenting data meaningfully and concisely.

Some examples of descriptive statistics include average annual income, median home price in a neighborhood, and range of credit scores of a population.


DESCRIPTIVE STATISTICS


DESCRIPTIVE STATISTICS
Descriptive statistics can be broadly classified into univariate and multivariate exploration, depending on the number of attributes under analysis.


Univariate Exploration
Univariate data exploration denotes analysis of one attribute at a time.
Measures of Central Tendency

Mean: The mean is the arithmetic average of all observations in the dataset. It is calculated by summing all the data points and dividing by the number of data points.


Univariate Exploration
Median: The median is the value of the central point in the distribution. It is calculated by sorting all the observations from small to large and selecting the mid-point observation in the sorted list. When the number of observations is even, the median is the average of the two middle observations.

Mode: The mode is the most frequently occurring observation. Data points in a dataset may repeat, and the most repeated data point is the mode of the dataset.
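A short sketch of the three measures of central tendency, using the 15 math test scores from the box-plot example later in this module. The mean and median are computed by hand to mirror the definitions; the mode uses the standard library's `statistics.mode`.

```python
import statistics

# Math test scores of 15 students (from the box-plot example in this module).
scores = [91, 95, 54, 69, 80, 85, 88, 73, 71, 70, 66, 90, 86, 84, 73]

# Mean: sum of all observations divided by their count.
mean = sum(scores) / len(scores)

# Median: middle value of the sorted list; average of the two middle
# values when the count is even.
ordered = sorted(scores)
n = len(ordered)
mid = n // 2
median = ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2

# Mode: the most frequently occurring value (73 appears twice here).
mode = statistics.mode(scores)

print(f"mean={mean:.2f} median={median} mode={mode}")  # mean=78.33 median=80 mode=73
```

If several values tie for most frequent, `statistics.mode` returns the first one encountered; `statistics.multimode` returns all of them.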


Measures of Spread
There are two common metrics to quantify spread:

Range: The range is the difference between the maximum value and the minimum value of the attribute.

Deviation: The variance and standard deviation measure the spread by considering all the values of the attribute.

Deviation is simply measured as the difference between any given value (xi) and the mean of the sample (μ).


The variance is the sum of the squared deviations of all data points divided by the number of data points. The standard deviation is the square root of the variance, so it is expressed in the same units as the data.

A high standard deviation means the data points are spread widely around the central point.

A low standard deviation means the data points are closer to the central point.
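A minimal sketch of the spread measures as defined above, again on the 15 math scores. The variance divides by the number of data points (population variance), matching the slide; the result is cross-checked against `statistics.pvariance` and `statistics.pstdev`. Note that `statistics.variance` and `statistics.stdev` divide by N − 1 (sample variance) and give slightly larger values.

```python
import math
import statistics

scores = [91, 95, 54, 69, 80, 85, 88, 73, 71, 70, 66, 90, 86, 84, 73]

# Range: maximum value minus minimum value.
value_range = max(scores) - min(scores)

# Deviation of each value x_i from the mean mu.
mu = sum(scores) / len(scores)
deviations = [x - mu for x in scores]

# Variance: sum of squared deviations divided by the number of data points.
variance = sum(d * d for d in deviations) / len(scores)

# Standard deviation: square root of the variance (same units as the data).
std_dev = math.sqrt(variance)

# Cross-check against the standard library's population versions.
assert math.isclose(variance, statistics.pvariance(scores))
assert math.isclose(std_dev, statistics.pstdev(scores))

print(value_range, round(variance, 2), round(std_dev, 2))
```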


Percentile
Quartiles
Interquartile range (IQR) = Q3 - Q1 = 85 - 41 = 44


Percentile & Quartiles

The pth percentile is the value below which p percent of the ordered observations fall. The first quartile, Q1, is the same as the 25th percentile, and the third quartile, Q3, is the same as the 75th percentile. The median, M, is called both the second quartile and the 50th percentile.

To calculate quartiles and percentiles, the data must be ordered from smallest to largest.


IQR


Example


A Formula for Finding the kth Percentile
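The formula on this slide did not survive extraction. As a hedged sketch, the code below uses one common textbook rule, assumed here rather than taken from the slide: in data sorted from smallest to largest, the kth percentile sits at the 1-based position i = (k/100)(n + 1); if i is a whole number the percentile is the value at that position, otherwise it is interpolated between the two neighbouring values. Other tools use other conventions and can give slightly different answers. The function name `kth_percentile` is hypothetical.

```python
def kth_percentile(data, k):
    """kth percentile using the (n + 1) position rule (one common convention)."""
    if not 0 <= k <= 100:
        raise ValueError("k must be between 0 and 100")
    x = sorted(data)            # data must be ordered smallest to largest
    n = len(x)
    i = (k / 100) * (n + 1)     # 1-based position in the sorted list
    if i <= 1:
        return x[0]             # position falls before the first value
    if i >= n:
        return x[-1]            # position falls after the last value
    lower = int(i)              # whole-number part of the position
    fraction = i - lower        # how far to move toward the next value
    return x[lower - 1] + fraction * (x[lower] - x[lower - 1])

# The 15 math scores from the box-plot example in this module.
scores = [91, 95, 54, 69, 80, 85, 88, 73, 71, 70, 66, 90, 86, 84, 73]
print(kth_percentile(scores, 25), kth_percentile(scores, 50), kth_percentile(scores, 75))
```

For these scores the rule gives 70.0, 80.0 and 88.0 for the 25th, 50th and 75th percentiles.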


Example


Skewness
Normal distribution:

In a normal distribution, the peak of the curve is at the mean and the data is distributed symmetrically on either side of it. The mean, median, and mode are equal; in sample data that is approximately normal, they lie close to each other.


Skewness
Skewness measures the level of asymmetry in a distribution, that is, how far the data departs from a symmetric shape such as the normal curve.

Real data sometimes does not follow the symmetric normal shape: it tilts to one side and has a longer tail on the other. Values are then more likely to fall on one side of the mean than the other, so the distribution is asymmetric and the data is not equally distributed around the mean.


Skewness
Two Types:

1. Positively Skewed: In a positively skewed (right-skewed) distribution, the values are concentrated towards the left side, and the right tail is long and spread out.

The long right tail pulls the mean to the right of the median, and the skewness value is positive. Typically:

Mean > Median > Mode


Skewness
2. Negatively Skewed: In a negatively skewed (left-skewed) distribution, the data points are concentrated towards the right-hand side, and the left tail is long and spread out. The long left tail pulls the mean to the left of the median, and the skewness value is negative. Typically:

Mode > Median > Mean

Skewness
Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, negative, or undefined.

A normal distribution has zero skewness, so skewness indicates how far a distribution departs from the symmetry of the normal curve.


If the skewness value is:

1. Between -0.5 and 0.5, the distribution is approximately symmetric.

2. Between -1 and -0.5, the data is moderately negatively skewed; between 0.5 and 1, it is moderately positively skewed.

3. Lower than -1 (negatively skewed) or greater than 1 (positively skewed), the data is highly skewed.
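A minimal sketch that computes a skewness value and applies the thresholds listed above. It assumes the moment coefficient of skewness g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments; spreadsheets and statistics packages often apply a small-sample adjustment, so their values can differ slightly. The function names are hypothetical.

```python
import math

def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    if m2 == 0:
        return float("nan")     # all values equal: skewness is undefined
    return m3 / m2 ** 1.5

def describe_skew(g):
    """Classify a skewness value using the thresholds above."""
    if math.isnan(g):
        return "undefined"
    if -0.5 <= g <= 0.5:
        return "approximately symmetric"
    direction = "positively" if g > 0 else "negatively"
    strength = "moderately" if -1 <= g <= 1 else "highly"
    return f"{strength} {direction} skewed"

print(describe_skew(skewness([1, 2, 3, 4, 5])))    # approximately symmetric
print(describe_skew(skewness([1, 1, 1, 1, 10])))   # long right tail
```

The second list has one large value in its right tail, so its skewness is positive (1.5) and it is classified as highly positively skewed.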


Example of Skewness
Cricket scores are a familiar example of a skewed distribution.

Suppose that during a match most of the players of a particular team scored above 50 runs, and only a few scored below 10. The scores are bunched at the high end with a long tail towards low scores, so they follow a negatively skewed distribution.

Similarly, if most of the players of a team score poorly during a match and only a few perform well, the scores follow a positively skewed distribution.

Kurtosis

Multivariate Exploration
Multivariate exploration is the study of more than one attribute in the dataset simultaneously.

Central Data Point

Correlation


Correlation


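The Pearson correlation coefficient between two attributes x and y is calculated with the formula:

$$r_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N \times s_x \times s_y}$$

where N is the number of data points, $\bar{x}$ and $\bar{y}$ are the means of x and y, and $s_x$ and $s_y$ are the standard deviations of x and y (computed with N in the denominator, as in the variance definition).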
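A minimal sketch of Pearson's correlation computed directly from its definition: the sum of the products of deviations divided by N times the two standard deviations, with the standard deviations taken with N in the denominator. It also computes a central data point, taken here (an assumption for this illustration) as the vector of attribute means. The function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: sum of co-deviations / (N * s_x * s_y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Standard deviations with N in the denominator (population form).
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    co_dev = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return co_dev / (n * s_x * s_y)

def central_point(rows):
    """Central data point: the mean of each attribute (column)."""
    return [sum(col) / len(col) for col in zip(*rows)]

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfect positive relation, about 1.0
print(central_point([[1, 2], [3, 4]]))         # [2.0, 3.0]
```

The function raises `ZeroDivisionError` if either attribute is constant, since its standard deviation is zero and r is undefined.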


DATA VISUALIZATION
Data visualization encompasses the methods of expressing data in an abstract visual form.

The visual representation of data makes complex data with multiple attributes, and the relationships underlying them, easier to comprehend.


Motivation
The motivation for using data visualization includes:

Comprehension of dense information:

◦ A simple visual chart can easily include thousands of data points.

◦ By using visuals, the user can understand the big picture, as well as longer-term trends that are extremely difficult to interpret when data is expressed purely in numbers.

Relationships:

◦ Visualizing data in Cartesian coordinates enables exploration of the relationships between the attributes.


Univariate Visualization
Visual exploration starts with investigating one attribute at a
time using univariate charts.
Histogram

Quartile

Distribution Chart


Histogram
$Histograms (or frequency histograms): “Histos” means pole or mast, and “gram”
means chart

$Plotting histograms is a graphical method for summarizing the distribution of a given

attribute, X

$Techniques to understand the frequency of the occurrence of values.

$ It shows the distribution of the data by plotting the frequency of occurrence in a

range.

$The height of the bar indicates the frequency (i.e., count) of that X value. The
resulting graph is more commonly known as a bar chart.


If X is numeric, the term histogram is preferred.
◦ The range of values for X is partitioned into disjoint consecutive subranges. The
subranges, referred to as buckets or bins, are disjoint subsets of the data distribution
for X.
◦ The range of a bucket is known as the width. Typically, the buckets are of equal width.

◦ For example, a price attribute with a value range of Rs.1 to Rs.200 (rounded up to the nearest rupee) can be partitioned into subranges 1 to 20, 21 to 40, 41 to 60, and so on.
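A minimal sketch of equal-width binning for the price example: integer prices from Rs.1 to Rs.200 are counted in buckets 1-20, 21-40, 41-60, and so on. The price list is made up for illustration, and the function name `equal_width_histogram` is hypothetical; a plotting library would then draw one bar per bucket with height equal to its count.

```python
def equal_width_histogram(values, low, high, width):
    """Count integer values in equal-width buckets low..low+width-1, ..."""
    n_buckets = -(-(high - low + 1) // width)      # ceiling division
    counts = [0] * n_buckets
    for v in values:
        if not low <= v <= high:
            raise ValueError(f"{v} is outside {low}..{high}")
        counts[(v - low) // width] += 1            # index of v's bucket
    labels = [f"{low + i * width}-{min(low + (i + 1) * width - 1, high)}"
              for i in range(n_buckets)]
    return dict(zip(labels, counts))

# Hypothetical prices in Rs., rounded to the nearest rupee.
prices = [5, 18, 20, 21, 40, 41, 150, 199, 200]
hist = equal_width_histogram(prices, low=1, high=200, width=20)
for bucket, count in hist.items():
    print(f"{bucket:>8} {'#' * count}")
```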

Quartile
A box-and-whisker plot is a simple visual way of showing the distribution of a continuous variable with information such as quartiles, median, and outliers, overlaid by the mean and standard deviation.

The main attraction of box-and-whisker (quartile) charts is that the distributions of multiple attributes can be compared side by side and the overlap between them can be deduced.


Quartile
The quartiles are denoted by the points Q1, Q2, and Q3, which split the sorted data into four bins of 25% each.

In a distribution, 25% of the data points will be below Q1, 50% will be below Q2, and 75% will be below Q3.

In a box-and-whisker plot, the Q1 and Q3 points are the edges of the box.

The Q2 point, the median of the distribution, is indicated by a cross line within the box. Outliers are denoted by circles beyond the ends of the whisker lines.

Example
Suppose you have the math test results for a class of 15 students. Here are the
results:
91 95 54 69 80 85 88 73 71 70 66 90 86 84 73
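The worked solution for this example is in slides whose figures did not survive extraction. As a sketch, the code below computes the quartiles with the "median of each half" rule (the overall median is left out of both halves when the count is odd) and flags outliers with the widely used 1.5 × IQR fences; both conventions are assumptions here. Other conventions differ: NumPy's default linear interpolation, for example, gives 70.5 and 87 for Q1 and Q3 of this data.

```python
import statistics

scores = [91, 95, 54, 69, 80, 85, 88, 73, 71, 70, 66, 90, 86, 84, 73]

def quartiles(data):
    """Q1, Q2, Q3 by the median-of-each-half rule."""
    x = sorted(data)
    n = len(x)
    half = n // 2
    lower = x[:half]                              # values below the median
    upper = x[half + 1:] if n % 2 else x[half:]   # values above the median
    return statistics.median(lower), statistics.median(x), statistics.median(upper)

q1, q2, q3 = quartiles(scores)
iqr = q3 - q1

# Box-plot fences: points farther than 1.5 * IQR from the box are outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [s for s in scores if s < low_fence or s > high_fence]

print(q1, q2, q3, iqr, outliers)   # 70 80 88 18 []
```

With fences at 43 and 115, the lowest score (54) stays inside the whiskers, so this class has no outliers under this rule.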


Example


Example


Quartile

Outlier


Distribution Chart


The normal distribution is also called the Gaussian distribution or “bell curve” due to its bell shape.

Example
These types of charts aim to answer the question “what is the distribution of my data?” For example, suppose a survey asked every respondent their age. A distribution chart would be useful to visualize the distribution of ages among respondents.
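One way to build such a chart, assumed here, is to fit a normal (bell) curve using the attribute's mean and standard deviation and plot its density against the attribute. The sketch uses the standard library's `statistics.NormalDist`; the survey ages are made up for illustration, and the actual drawing is left to any plotting library.

```python
from statistics import NormalDist

# Hypothetical survey ages (made up for this illustration).
ages = [22, 25, 27, 29, 30, 31, 33, 34, 36, 41]

# Fit a normal curve with the sample mean and sample standard deviation.
curve = NormalDist.from_samples(ages)

# Density values a distribution chart would plot against age.
for age in range(20, 45, 5):
    print(age, round(curve.pdf(age), 4))

print("mean:", curve.mean, "std dev:", round(curve.stdev, 2))
```

The density peaks at the mean (30.8 here) and falls off symmetrically on both sides, which gives the bell shape.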

Multivariate Visualization
Multivariate visual exploration considers more than one attribute in the same visual.

These visualizations examine two to four attributes simultaneously:

Scatterplot

Scatter Multiple

Scatter Matrix

Bubble Chart


Scatter Plot
One of the key conclusions that can be drawn from a scatterplot is whether a relationship exists between the two attributes under inquiry.

If the attributes are linearly correlated, the data points align closely to an imaginary straight line; if they are not correlated, the data points are scattered.

Apart from basic correlation, scatterplots can also indicate the existence of patterns or clusters in the data and help identify outliers.

Scatter Multiple:

A scatter multiple is an enhanced form of a simple scatterplot in which more than two dimensions can be included in the chart and studied simultaneously.

The primary attribute is used for the x-axis coordinate. The y-axis is shared by several other attributes or dimensions.

Scatter Matrix

If the dataset has more than two attributes, it is important to look at combinations of all the attributes through scatterplots. A scatter matrix solves this need by comparing all combinations of attributes with individual scatterplots and arranging these plots in a matrix.

Bubble Chart


Bubble Chart

A bubble chart is a variation of a simple scatterplot with the addition of one more attribute, which determines the size of each data point.

In the Iris dataset example, petal length and petal width are used for the x-axis and y-axis respectively, and sepal width is used for the size of the data point. The color of the data point represents the species class label.

Density Chart

Density charts are similar to scatterplots, with one more dimension included as a background color.

The data points can also be colored to visualize one more dimension; hence, a total of four dimensions can be visualized in a density chart.

Example:

In the Iris dataset, petal length is used for the x-axis, sepal length for the y-axis, sepal width for the background color, and the class label for the data point color.

High Dimensional Data
High-dimensional data are defined as data in which the number of features (variables observed), p, is close to or larger than the number of observations (or data points), n.

The opposite is low-dimensional data, in which the number of observations, n, far outnumbers the number of features, p.

Informally, high-dimensional data means data with many dimensions (variables, features, or columns).


Visualizing High-Dimensional Data
Visualizing more than three attributes on a two-dimensional medium (like paper or a screen) is challenging.

This limitation can be overcome by using transformation techniques to project the high-dimensional data points into parallel axis space.

In this approach, a Cartesian axis is shared by more than one attribute.


Parallel Chart
A parallel chart visualizes a data point quite innovatively by transforming or projecting multi-dimensional data into a two-dimensional chart medium.

Every attribute or dimension is linearly arranged in one coordinate (x-axis), and all the measures are arranged in the other coordinate (y-axis).

Since the x-axis is multivariate, each data point is represented as a line in a parallel space.


Parallel Chart


Parallel Chart
This visualization is called a parallel axis chart because all four attributes are represented on four axes, each parallel to the y-axis.


Deviation Chart
A deviation chart is very similar to a parallel chart, as it has parallel axes for all the attributes on the x-axis.

Data points are extended across the dimensions as lines, and there is one common y-axis.

Instead of plotting all data lines, deviation charts show only the mean and standard deviation statistics.

For each class, a deviation chart shows the mean line connecting the mean of each attribute; the standard deviation is shown as a band above and below the mean line.

The mean line does not have to correspond to an actual data point (line).
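A minimal sketch of the statistics a deviation chart draws: for each class, the mean of every attribute (the mean line) and a band one standard deviation above and below it. The rows, class labels "A" and "B", and the function name `deviation_bands` are made up for illustration, and the use of the population standard deviation is an assumption; a plotting library would draw one line and one shaded band per class.

```python
import statistics
from collections import defaultdict

# Hypothetical rows: (class label, attribute values); made up for illustration.
rows = [
    ("A", [5.0, 3.4, 1.5]),
    ("A", [4.8, 3.0, 1.4]),
    ("A", [5.2, 3.6, 1.6]),
    ("B", [6.4, 2.9, 4.5]),
    ("B", [6.0, 2.7, 4.1]),
    ("B", [6.8, 3.1, 4.9]),
]

def deviation_bands(rows):
    """Per class: mean line and +/- 1 standard deviation band per attribute."""
    by_class = defaultdict(list)
    for label, values in rows:
        by_class[label].append(values)
    bands = {}
    for label, members in by_class.items():
        columns = list(zip(*members))          # one tuple per attribute
        means = [statistics.mean(c) for c in columns]
        sds = [statistics.pstdev(c) for c in columns]
        bands[label] = {
            "mean": means,
            "lower": [m - s for m, s in zip(means, sds)],
            "upper": [m + s for m, s in zip(means, sds)],
        }
    return bands

bands = deviation_bands(rows)
for label, band in bands.items():
    print(label, [round(m, 2) for m in band["mean"]])
```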


Deviation Chart


Andrews Curve
An Andrews plot belongs to a family of visualization techniques where the high-dimensional data are projected into a vector space so that each data point takes the form of a line or curve.
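The slide does not give the projection itself. A minimal sketch, assuming the standard Andrews function in which a data point x = (x1, x2, x3, ...) becomes the curve f(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..., evaluated for t between −π and π. Each data point therefore becomes one curve; in practice attributes are often scaled first, and the attribute order changes the curve shape. The function names are hypothetical.

```python
import math

def andrews_curve(x, t):
    """f(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..."""
    total = x[0] / math.sqrt(2)
    for j, value in enumerate(x[1:], start=1):
        k = (j + 1) // 2                       # harmonic: 1, 1, 2, 2, 3, ...
        if j % 2 == 1:
            total += value * math.sin(k * t)   # x2, x4, ... use sine terms
        else:
            total += value * math.cos(k * t)   # x3, x5, ... use cosine terms
    return total

def andrews_points(x, steps=100):
    """Sample the curve at evenly spaced t in [-pi, pi] for plotting."""
    ts = [-math.pi + 2 * math.pi * i / (steps - 1) for i in range(steps)]
    return [(t, andrews_curve(x, t)) for t in ts]

# One four-attribute data point (values made up for illustration).
points = andrews_points([5.1, 3.5, 1.4, 0.2])
print(len(points), round(points[0][1], 3))
```

Plotting the points of every row, coloured by class, gives the Andrews plot; rows that are close in the original space produce curves that stay close together.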


Thank You!!!