Presentation Session 1 - Practical Data Science Final

Uploaded by

SHANMUGAM


Session 1

03rd Aug 2024 [5 PM – 8 PM]

Practical Data Science

Topics:
1. Introduction to Data Science
2. Data characteristics
3. Descriptive Statistics
4. Inferential Statistics
Data Science – Why it is needed?
Data Growth – IDC-Seagate Study

“Data science is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, either structured or unstructured”

Dr Vengadeswaran S, IIIT Kottayam Practical Data Science and Advanced ML 2

Data Science – Life Cycle
[Diagram: life-cycle stages Data Acquisition, Data Preparations, Data Mining, Modelling Building and Testing, Visualisation]
1. Data Acquisition: collect data from all its raw sources, DBs and flat files; integrate and transform it into a homogeneous format, collecting it into a “data warehouse” (ETL tools).
2. Data Preparations:
• Data Cleaning (removing bad data and null values, handling missing values)
• Data Transformation: takes raw data and turns it into desired outputs by normalizing (min-max, z-score)
• Handling Outliers
• Data Reduction
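
The two normalizations named under Data Transformation (min-max and z-score) can be sketched as follows. This is a minimal illustration, not the course's own code: the function names and the `weights` column are hypothetical, and the z-score here assumes the population standard deviation.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # identical values: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]

def z_score_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

weights = [60, 55, 85, 90, 70, 65, 70, 45]  # hypothetical raw column
scaled = min_max_scale(weights)             # values now lie in [0, 1]
standardized = z_score_scale(weights)       # values now have mean 0 and SD 1
```

Min-max keeps the original shape of the data inside a fixed range, while the z-score expresses each value as a number of standard deviations from the mean.
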
Data Science – Life Cycle
3. Data Mining: uncover the data patterns and relationships to take better business decisions. It is a discovery process to get hidden and useful knowledge, commonly known as exploratory data analysis.
4. Modelling Building and Testing:
• Modelling is the heart of data analysis. It takes organized data as input and gives an output.
• Suitable ML/DL models are built for the data and the problem, to gain deeper insights and predict outcomes, using the training data set
• Tested against predetermined test data to assess result accuracy
• Fine-tuned to improve the result

Data Science – Life Cycle
5. Visualisation:
• Communicate insights from data through visual representation
• Explain the process of operationalisation
• Communicate results
• Highlight the findings, correlations, etc.

Data Science – Benefits
• Making faster and better business decisions
• Developing insights that are beyond human capabilities
• Acting at the right time and taking advantage of opportunities
• Innovating new products and solutions
• Risk analysis practices to make informed business decisions
• Measuring performance, etc.

Data Science – Applications

Recommendation Systems

Data Science – What we cover?

Session 1: Introduction to Data Science: data characteristics, descriptive and inferential statistical analysis

Session 2: Types of data and dataset; different pre-processing techniques: finding missing data and handling them, encoding categorical data, data transformation and normalization, feature scaling, indexing and slicing, filtering data, outlier identification and removal. Hands-on in Python

Session 3: Complex merging and concatenating, reshaping data, grouping and aggregation, advanced grouping techniques, training and test split, cross-validation techniques. Practical exercise focused on pre-processing a complex real-world dataset

Data Visualization – Why it is needed?
Example 1
Sales of jackets and sales of socks over the course of the previous year

Visualization:
• Improves insights
• Enables faster decision making

Data Visualization – Why it is needed?
Example 2

Data Visualization – Why it is needed?

Data visualization representation:

• Charts
• Tables
• Graphs
• Maps
• Infographics
• Dashboards

• Graphical representation of information and data.
• Communicates insights from data through visual representation.
• Goal → turn large datasets into visual graphics → complex relationships within the data become easy to understand.

Data Visualization – What we cover?

Session 1: Overview of data visualisation: data abstraction, scalar and vector visualisation, data visualisation for numerical and categorical data; histogram and bar chart.

Session 2: Boxplot, line plots, pie charts, scatter plots, heatmaps for correlation analysis, text visualisation. Hands-on: Matplotlib for creating multiple plots

Session 3: Dashboard creation using visualization tools for one of the use cases: finance, marketing, healthcare, etc.

2. Data Characteristics

Outline:
• Data types

• Measurements of Data

• Dataset types

• Semantics


Data Types
• Different types of data → different statistical techniques and visualizations
• Classification is essential

Qualitative/Categorical data

• Categories or groups
• Answers to yes/no questions
• Qualitative data can be separated into different categories that are distinguished by some nonnumeric characteristic.
• E.g.: Genders (male/female) of professional athletes.
• Expressed in terms of natural language descriptions
• Sometimes categorical data can take numerical values, but those numbers do not have mathematical meaning.
• E.g.: Birthdate (calculating the average is not meaningful)

Quantitative data

• Information that can be measured and written down with numbers, not in any language or descriptive form
• Quantitative data: your height, your shoe size, and other numbers representing counts or measurements.
• Example: total count of your employees

Classify as qualitative or quantitative
• Colors of cars in a dealer’s showroom.
• Number of seats in movie theaters.
• Classification of patients based on nursing care needed (complete, partial, or self care)
• Lengths of newborn cats of a certain species.
• Number of complaint letters received by an airline per month.

Working with Quantitative Data

Quantitative data can further be divided into Discrete and Continuous types.

Discrete
• Discrete data result when the number of possible values is either finite or ‘countable’.
• Discrete data consist of distinct and separate values: 0, 1, 2, 3
Example:
• The number of eggs that a hen lays
• Marks obtained by a student

Continuous
• A continuous variable is one that can take an uncountably infinite set of values.
• Values can be integers or decimals
Examples:
• The amount of milk that a cow produces, e.g. 2.3431 gallons per day.
• Height of individuals: can take any value within a specific range and have fractional components (e.g., 5.7 feet).
• Temperature: can be measured with decimal values, taking any value within a range.

Classify as discrete or continuous.
• Number of cartons of milk manufactured each day.
• Temperatures of airplane wing
• Incomes of college students on work study programs.


Levels of Measurement

Another way to classify data is to use levels of measurement.
Four levels of measurement:
• Nominal
• Ordinal
• Interval
• Ratio

Why Is Level of Measurement Important?

• Helps you decide what statistical analysis is appropriate on the values that were assigned
• Helps you decide how to interpret the data from that variable

1. Nominal level of measurement

• Characterized by data that consist of names, labels, or categories only.
• The data cannot be arranged in an ordering scheme (such as low to high)
• Example: survey responses yes, no, undecided

2. Ordinal level of measurement

• Involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless
• Example: Course grades A, B, C, D, or F

3. Interval level of measurement
• Like the ordinal level, with the additional property that the difference between any two data values is meaningful. However, there is no natural zero starting point (where none of the quantity is present).
• Examples: years 1000, 2000, 1776, and 1492; temperature (in °C or °F)

4. Ratio level of measurement
• Interval level modified to include the natural zero starting point (where zero indicates that none of the quantity is present).
• For values at this level, differences and ratios are meaningful.
• Examples: prices of college textbooks ($0 represents no cost); weight of a person.

Summary - Levels of Measurement
• Nominal - categories only
• Ordinal - categories with some order
• Interval - differences but no natural starting point
• Ratio - differences and a natural starting point
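
One way to read this summary is that each level keeps every operation of the levels below it and adds new ones. The mapping below is a conventional illustration and an assumption, not taken from the slides; `LEVELS`, `NEW_OPERATIONS` and `meaningful_operations` are hypothetical names.

```python
# Levels ordered from weakest to strongest.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Operations that first become meaningful at each level (illustrative choice).
NEW_OPERATIONS = {
    "nominal": ["count", "mode"],
    "ordinal": ["median", "percentile"],
    "interval": ["mean", "difference"],
    "ratio": ["ratio"],
}

def meaningful_operations(level):
    """Return the operations available at `level`, including inherited ones."""
    idx = LEVELS.index(level)
    ops = []
    for name in LEVELS[: idx + 1]:
        ops.extend(NEW_OPERATIONS[name])
    return ops

grade_ops = meaningful_operations("ordinal")   # e.g. course grades A to F
price_ops = meaningful_operations("ratio")     # e.g. textbook prices
```

Under this reading, averaging course grades is not listed for the ordinal level, while every operation is available for ratio data such as prices.
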

Classify each as nominal, ordinal, interval, or ratio

• Horsepower of motorcycle engines.
• Ratings of corporations in Houston (poor, fair, good, excellent)
• Salaries of the top 5 CEOs in the United States
• Marital status of respondents to a survey of savings accounts.

The Hierarchy of Levels

Ratio: absolute zero
Interval: distance is meaningful
Ordinal: attributes can be ordered
Nominal: attributes are only named; weakest

Nominal Scale & Ordinal Scale

Questionnaire

Reflection Spot
Understand the data

Dataset types

Dataset types
Table
• Data represented in rows and columns.
• Cell: the combination of a row and a column (an item and an attribute); it contains a value for that pair
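
A table in this sense can be sketched as a list of rows, where each row is an item and each key is an attribute. The rows and the helper `cell` below are hypothetical and only illustrate the item/attribute lookup.

```python
# Each dict is one row (an item); each key is one column (an attribute).
table = [
    {"name": "Lakshan", "regno": 1111, "mark": 89},
    {"name": "Ruhan", "regno": 2222, "mark": 78},
    {"name": "Thejas", "regno": 3333, "mark": 82},
]

def cell(rows, row_index, attribute):
    """Return the value stored for one (item, attribute) pair."""
    return rows[row_index][attribute]

second_regno = cell(table, 1, "regno")
column_of_marks = [row["mark"] for row in table]  # one attribute across all items
```
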

Dataset types
Networks
• Used when there is some kind of relationship between two or more items.
• An item in a network is often called a node.
• A link is a relation between two items.
• Tables can represent networks
  - Many-to-many relationships
  - Can also be stored in dedicated graph databases or files

Dataset types
Trees
• Networks with hierarchical structure are more specifically called trees.
• In contrast to a general network, trees do not have cycles: each child node has only one parent node pointing to it
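
The tree condition above (no cycles, one parent per child) can be checked on a list of links. This is a sketch under stated assumptions: links are given as hypothetical `(parent, child)` pairs, and `is_tree` is an illustrative name.

```python
from collections import defaultdict

def is_tree(nodes, links):
    """Return True if the (parent, child) links form a single rooted tree."""
    parents = defaultdict(list)
    children = defaultdict(list)
    for parent, child in links:
        parents[child].append(parent)
        children[parent].append(child)

    roots = [n for n in nodes if not parents[n]]
    if len(roots) != 1:
        return False                      # a tree has exactly one root
    if any(len(parents[n]) > 1 for n in nodes):
        return False                      # a child may have only one parent

    # Walk down from the root; every node must be reached exactly once.
    seen = set()
    stack = [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            return False
        seen.add(node)
        stack.extend(children[node])
    return len(seen) == len(nodes)

hierarchy = is_tree(["a", "b", "c"], [("a", "b"), ("a", "c")])
two_parents = is_tree(["a", "b", "c"], [("a", "b"), ("a", "c"), ("b", "c")])
```
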

Dataset types
Field
• The field dataset type also contains attribute values
associated with cells.
• Each cell in a field contains measurements or calculations
from a continuous domain
• There are infinitely many values that you might measure, so
you could always take a new measurement between any
two existing ones.
• Examples: temperature, pressure, speed, force, and density; mathematical functions can also be continuous.

Dataset types
Fields

Scalar Fields, Vector Fields, Tensor Fields

Each point in space has an associated...

Dataset types
Spatial field
• When continuous data is in the form of a spatial field, the cell structure of the field comes from sampling at spatial positions.
• For example, in medical imaging, suspected tumours show up as distinctive shapes or densities.

Dataset types
Spatial Data Example: MRI
• Medical scan of a human body containing measurements indicating the density of tissue at many sample points

Dataset types
Grid Field
• When a field contains data created by sampling at
completely regular intervals, the cells form a uniform grid.
• No need to explicitly store the grid geometry in terms of its
location in space, or the grid topology


Dataset types
Grid Fields
• Grids are necessary to sample continuous data (grid types shown: uniform, rectilinear).
• A rectilinear grid supports non-uniform sampling, allowing efficient storage of information that has high complexity in some areas and low complexity in others, at the cost of storing some information about the geometric location of each row.
• Interpolation: “how to show values between the sampled points in ways that do not mislead”
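
As one simple answer to the interpolation question, values between sampled points can be estimated linearly. The sketch below assumes a one-dimensional, non-uniformly sampled field; `interpolate`, `positions` and `temperatures` are hypothetical names and data, and linear interpolation is only one of several possible choices.

```python
from bisect import bisect_right

def interpolate(positions, values, x):
    """Linearly interpolate a sampled field at x.

    `positions` must be sorted in ascending order and pair up with `values`.
    """
    if not positions[0] <= x <= positions[-1]:
        raise ValueError("x lies outside the sampled range")
    i = bisect_right(positions, x) - 1    # index of the sample at or left of x
    if i == len(positions) - 1:
        return values[-1]                 # x is exactly the last sample
    x0, x1 = positions[i], positions[i + 1]
    t = (x - x0) / (x1 - x0)              # fraction of the way from x0 to x1
    return values[i] + t * (values[i + 1] - values[i])

positions = [0.0, 1.0, 1.5, 4.0]          # non-uniform sample locations
temperatures = [10.0, 20.0, 30.0, 5.0]    # measured value at each location
midway = interpolate(positions, temperatures, 1.25)
```

Because the sample locations are stored explicitly, the same function works for uniform and non-uniform spacing.
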

Dataset types
Geometry
• The geometry dataset type specifies information about the
shape of items with explicit spatial positions.
• The items could be points, or one-dimensional lines or curves,
or 2D surfaces or regions, or 3D volumes.


Dataset types
Dataset Availability
• The default approach to visualization assumes that the entire dataset is available all at once, as a static file.
• One kind of dynamic change is to add new items or delete previous items. Another is to change the values of existing items.

Semantics matter
Refer to the following data:

Task 1:
Basil, 7, S, Pear
What do you infer? Any guess?

Task 2:
Lakshan  1111  89  90  92
Ruhan    2222  78  67  90
Thejas   3333  82  98  88
What do you infer? Any guess?

The same data with column headers:
Student Name | Regno | Data Visualization | Computer Graphics | Human Computer Interaction
Lakshan      | 1111  | 89                 | 90                | 92
Ruhan        | 2222  | 78                 | 67                | 90
Thejas       | 3333  | 82                 | 98                | 88

Semantics matter
• Semantics: real-world meaning of the data
  • e.g. a word may name a city or a fruit; a number may represent a day of the month, an age, a measurement of height, a unique code for a specific person, a postal code for a neighbourhood, or a position in space, etc.
• Type: structural or mathematical interpretation
  • data level: item, link, attribute?
  • dataset level: table, tree, field of sampled values? etc.
• Both often require metadata
  - Sometimes we can infer some of this information
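
The role of metadata can be sketched with the Task 2 row: the bare values only become interpretable once attribute names (semantics) and attribute types are attached. The `attribute_types` labels and the helper `average_mark` are assumptions made for illustration.

```python
from statistics import mean

header = ["Student Name", "Regno", "Data Visualization",
          "Computer Graphics", "Human Computer Interaction"]
raw_row = ["Lakshan", 1111, 89, 90, 92]

# Semantics: pair every value with the name of its attribute.
record = dict(zip(header, raw_row))

# Type metadata (assumed): Regno is numeric but only identifies a student.
attribute_types = {
    "Student Name": "categorical",
    "Regno": "categorical",
    "Data Visualization": "quantitative",
    "Computer Graphics": "quantitative",
    "Human Computer Interaction": "quantitative",
}

def average_mark(rec, types):
    """Average only the attributes whose type is quantitative."""
    marks = [value for name, value in rec.items() if types[name] == "quantitative"]
    return mean(marks)

lakshan_average = average_mark(record, attribute_types)
```

Without the type metadata, nothing would stop the registration number 1111 from being averaged together with the marks.
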

3. Descriptive Statistics
Outline:
• Mean
• Median
• Mode
• Range
• Variance
• Standard Deviation
• Percentile
• Interquartile Range


Central Tendency of Data
Single value that attempts to describe the whole data
using a central point or central location of the data.
• Mean
• Median
• Mode


Mean
• Arithmetic average
• Sum of the values divided by the number of values in the data
Example: Observations for weight of students in a class
(60, 55, 85, 90, 70, 65, 70, 45)
Average weight of students = 540/8 = 67.5
Trim parameter
• The values in the vector get sorted and
• the required number of observations is dropped from each end before calculating the mean.
NA parameter (NA = Not Available)
• If there are missing values, the mean function returns NA unless they are removed first.
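
The plain mean, the trimmed mean and the handling of missing values can be sketched with the weights from the example. The helpers `trimmed_mean` and `mean_skip_missing` are hypothetical names, and the sketch assumes missing values are marked with `None` or NaN.

```python
import math
from statistics import mean

weights = [60, 55, 85, 90, 70, 65, 70, 45]
plain_mean = mean(weights)  # 540 / 8

def trimmed_mean(values, trim):
    """Sort the values, drop a fraction `trim` from each end, average the rest."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must satisfy 0 <= trim < 0.5")
    ordered = sorted(values)
    k = int(len(ordered) * trim)          # observations dropped at each end
    return mean(ordered[k: len(ordered) - k])

def mean_skip_missing(values):
    """Average the values that are present, ignoring None and NaN entries."""
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    return mean(present)

quarter_trimmed = trimmed_mean(weights, 0.25)      # keeps 60, 65, 70, 70
without_gaps = mean_skip_missing([60, None, 70])
```
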

Median
• Middle value of the sorted list (ascending)

Example: Observations for weight of students in a class
Sample 1: (60, 55, 85, 90, 70, 65, 70) → total number is odd
Ascending order (55, 60, 65, 70, 70, 85, 90) → Median = 70
Sample 2: (60, 55, 85, 65, 70, 45) → total number is even
Ascending order (45, 55, 60, 65, 70, 85) → Median = Mean (60, 65) = 62.5
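
The odd and even cases above can be reproduced with a short sketch. `median_by_hand` is a hypothetical helper written to mirror the steps; the standard library's `statistics.median` gives the same results.

```python
from statistics import median

def median_by_hand(values):
    """Sort the values and return the middle one (or the mean of the middle two)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: mean of the middle pair

sample1 = [60, 55, 85, 90, 70, 65, 70]
sample2 = [60, 55, 85, 65, 70, 45]
median1 = median_by_hand(sample1)
median2 = median_by_hand(sample2)
```
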

Mode
• Value that has the highest number of occurrences in a set of data.
• Sample 1: (2,1,2,3,1,2,3,4,1,5,5,3,2)
• Mode = 2
• Applies to both numeric and character data.
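
A minimal sketch of the mode for numeric and character data, using the sample above. The `colours` list is hypothetical, added only to show the character case.

```python
from collections import Counter
from statistics import mode, multimode

sample = [2, 1, 2, 3, 1, 2, 3, 4, 1, 5, 5, 3, 2]
counts = Counter(sample)            # number of occurrences of each value
numeric_mode = mode(sample)

colours = ["red", "blue", "red", "green"]   # hypothetical character data
character_mode = mode(colours)

# When several values tie for the highest count, multimode returns all of them.
tied_modes = multimode([1, 1, 2, 2, 3])
```
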

Impact of Outlier in Central tendency

• An outlier is a data point that differs significantly from other observations.
• It may be due to variability in the measurement, or it may be the result of experimental error.
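
The effect of a single outlier on the mean and the median can be sketched with the class weights used earlier. The extra value 450 is a hypothetical data-entry error, not a value from the slides.

```python
from statistics import mean, median

weights = [60, 55, 85, 90, 70, 65, 70, 45]
with_outlier = weights + [450]      # hypothetical erroneous observation

mean_before, mean_after = mean(weights), mean(with_outlier)
median_before, median_after = median(weights), median(with_outlier)

mean_shift = mean_after - mean_before        # the mean is pulled strongly upward
median_shift = median_after - median_before  # the median barely moves
```

In this sketch the mean moves from 67.5 to 110, while the median only moves from 67.5 to 70.
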
Summary of Central tendency

Measures of dispersion
How widely is the set of data spread out?
• Range
• Variance
• Standard Deviation
• Percentile
• Interquartile range

Range
Let us consider two sets of observations:
Sample 1: (-10, 0, 10, 20, 30)    Sample 2: (8, 9, 10, 11, 12)
Mean = 10                          Mean = 10
Median = 10                        Median = 10
To differentiate → Range = Max – Min
Range = 40                         Range = 4
Range considers only the extreme values.
Hence there is a need for other metrics:
• Variance
• Standard deviation

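
The comparison of the two samples can be reproduced in a few lines; `value_range` is a hypothetical helper name.

```python
from statistics import mean, median

sample1 = [-10, 0, 10, 20, 30]
sample2 = [8, 9, 10, 11, 12]

def value_range(values):
    """Range = Max - Min; only the two extreme values are used."""
    return max(values) - min(values)

# Both samples share the same centre but differ in spread.
same_centre = mean(sample1) == mean(sample2) and median(sample1) == median(sample2)
range1 = value_range(sample1)
range2 = value_range(sample2)
```
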
Variance
• The average of the squared differences from the mean
• Sample 1: (-10, 0, 10, 20, 30) → Mean = 10
$$\sigma^2 = \frac{(-10-10)^2 + (0-10)^2 + (10-10)^2 + (20-10)^2 + (30-10)^2}{5} = \frac{1000}{5} = 200$$
Steps to find the variance:
Step 1: Find the mean.
Step 2: For each data point, find the square of its distance to the mean.
Step 3: Sum the values from Step 2.
Step 4: Divide by the number of data points.
Sample 1: (-10, 0, 10, 20, 30)    Sample 2: (8, 9, 10, 11, 12)
Variance = 200                     Variance = 2
(Dividing by n − 1 = 4 instead, which gives the sample variance, yields 250 and 2.5.)
Linear measurements → squared measurements: variance is expressed in squared units,
hence the value should be normalized back to the original units.

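
The four steps can be sketched directly, alongside the two standard-library variants. `variance_by_steps` is a hypothetical helper; it divides by the number of data points (population variance), whereas `statistics.variance` divides by n − 1 (sample variance), which is why both 200 and 250 can appear for Sample 1.

```python
from statistics import pvariance, variance

sample1 = [-10, 0, 10, 20, 30]
sample2 = [8, 9, 10, 11, 12]

def variance_by_steps(values):
    mu = sum(values) / len(values)               # Step 1: find the mean
    squared = [(v - mu) ** 2 for v in values]    # Step 2: squared distance to the mean
    total = sum(squared)                         # Step 3: sum the squares
    return total / len(values)                   # Step 4: divide by the number of points

population1 = variance_by_steps(sample1)   # divides by n = 5
sample_var1 = variance(sample1)            # divides by n - 1 = 4
population2 = variance_by_steps(sample2)
sample_var2 = variance(sample2)
```
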
Standard Deviation
• Standard deviation (σ) measures the spread of a data distribution.
• SD is the square root of the variance.
• Taking the square root gives normalized values, in the same units as the data.
• The more spread out a data distribution is, the greater its standard deviation.
• SD close to 0 indicates that the data points tend to be close to the mean.
• SD cannot be negative.
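
A short sketch of the relation between variance and standard deviation, reusing the two samples from the Range example and assuming the population form (division by the number of data points).

```python
from math import sqrt
from statistics import pstdev, pvariance

sample1 = [-10, 0, 10, 20, 30]
sample2 = [8, 9, 10, 11, 12]

sd1 = sqrt(pvariance(sample1))   # square root of 200, about 14.14
sd2 = sqrt(pvariance(sample2))   # square root of 2, about 1.41

# Identical values have no spread at all, so the SD is exactly 0.
sd_constant = pstdev([5, 5, 5, 5])
```

The wider Sample 1 has the larger standard deviation, and both results are in the same units as the observations.
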

Percentile
• A percentile describes how a score compares to other scores from the same set.
• It is the percentage of values in a set of data scores that fall below a given value.

Step 1: Arrange the scores in ascending order
Step 2: Position = Percentile × (Total observations + 1) / 100

Percentile
Let us find the 80th percentile (Total observations + 1 = 16, i.e. 15 observations):

Position = 80 × 16 / 100 = 0.8 × 16 = 12.8 ≈ 13
The 13th observation is the 80th percentile.
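
The two steps can be sketched as below. The 15 scores are hypothetical, since the original data set is not reproduced in the text, and rounding the position to the nearest observation is one convention among several.

```python
import math

def percentile_position(percentile, n):
    """Step 2: position = percentile x (n + 1) / 100 in the sorted data."""
    return percentile * (n + 1) / 100

def percentile_value(values, percentile):
    """Return the observation nearest to the computed position."""
    ordered = sorted(values)                         # Step 1: ascending order
    position = percentile_position(percentile, len(ordered))
    rank = math.floor(position + 0.5)                # round to the nearest rank
    rank = min(max(rank, 1), len(ordered))           # stay inside the data
    return ordered[rank - 1]                         # ranks start at 1

scores = [10 * i for i in range(1, 16)]   # hypothetical: 10, 20, ..., 150
position_80 = percentile_position(80, len(scores))
value_80 = percentile_value(scores, 80)
```
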

Interquartile Range
• Interquartile range represents the difference between the 1st Quartile (25th percentile) and the 3rd Quartile (75th percentile)
• Spread of the middle 50% of the data
• Obtaining Quartiles
1. Order the data
2. Find the median
3. Look at the lower half of the data set. Find the “median” of this lower half: Q1
4. Look at the upper half of the data set. Find the “median” of this upper half: Q3
5. Inter-Quartile Range (IQR) = Q3 - Q1

Interquartile Range
Consider these 10 ages:
05 11 21 24 27 28 30 42 50 52

Median = (27 + 28) / 2 = 27.5
Bottom half: 05 11 21 24 27 → the median of the bottom half (Q1) = 21
Top half: 28 30 42 50 52 → the median of the top half (Q3) = 42
Inter-Quartile Range (IQR) = Q3 - Q1 = 42 - 21 = 21
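
The median-of-halves procedure can be sketched for the ten ages. `quartiles` is a hypothetical helper; other quartile conventions (for example, interpolating between observations) can give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median of each half of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, the overall median is left out of both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
q1, q2, q3 = quartiles(ages)
iqr = q3 - q1
```
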

Interquartile Range
Example 2: Quartiles, n = 53 (sorted values, read down each column)
100 124 148 170 185 215
101 125 150 170 185 220
106 127 150 172 186 260
106 128 152 175 187
110 130 155 175 192
110 130 157 180 194
119 133 165 180 195
120 135 165 180 203
120 139 165 180 210
123 140 170 185 212

L(M) = (53 + 1) / 2 = 27 → Median = 165

Interquartile Range
Example 2: Quartiles, n = 53
Bottom half has n* = 26 → L(Q1) = (26 + 1) / 2 = 13.5 from the bottom
Q1 = avg(127, 128) = 127.5

Interquartile Range
Example 2: Quartiles, n = 53
Top half has n* = 26 → L(Q3) = 13.5 from the top
Q3 = avg(185, 185) = 185

Interquartile Range
Example 2: Quartiles, n = 53
Q1 = 127.5   Q2 = 165   Q3 = 185
“5 point summary” = {Min, Q1, Median, Q3, Max} = {100, 127.5, 165, 185, 260}
Inter-Quartile Range (IQR) = Q3 - Q1 = 185 – 127.5 = 57.5 (“spread of middle 50%”)
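
The five-point summary of Example 2 can be checked with a short sketch. The 53 values are copied from the table (read column by column); `five_point_summary` is a hypothetical helper that follows the median-of-halves procedure, leaving the overall median out of both halves when the count is odd.

```python
from statistics import median

data = [
    100, 101, 106, 106, 110, 110, 119, 120, 120, 123,
    124, 125, 127, 128, 130, 130, 133, 135, 139, 140,
    148, 150, 150, 152, 155, 157, 165, 165, 165, 170,
    170, 170, 172, 175, 175, 180, 180, 180, 180, 185,
    185, 185, 186, 187, 192, 194, 195, 203, 210, 212,
    215, 220, 260,
]

def five_point_summary(values):
    """Return (Min, Q1, Median, Q3, Max) for the given observations."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

summary = five_point_summary(data)
iqr = summary[3] - summary[1]   # Q3 - Q1
```
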

4. Inferential Statistics

Outline:
• Normal Distribution
• Correlation
• Covariance
• Central Limit Theorem
• Hypothesis testing
