Lecture 1.4

The document provides an introduction to statistics for data science, covering key concepts such as descriptive and inferential statistics, types of data, and scales of measurement. It distinguishes between nominal, ordinal, interval, and ratio scales, explaining their properties and examples. The learning objectives include understanding data collection, classification, and framing questions that can be answered through data analysis.

Statistics for Data Science -1


Introduction and types of data

Usha Mohan

Indian Institute of Technology Madras


Learning objectives
1. What is statistics?
   - Descriptive statistics, inferential statistics.
   - Distinguish between a sample and a population.
2. Understand how data are collected.
   - Identify variables and cases (observations) in a data set.
3. Types of data
   - Classify data as categorical (qualitative) or numerical (quantitative).
   - Understand cross-sectional versus time-series data.
   - Measurement scales.
4. Creating data sets; downloading and manipulating data sets; working on subsets of data.
5. Framing questions that can be answered from data.

Introduction
Basic definitions
Population and sample

Understanding data

Classification of data
Categorical and numerical
Cross-sectional versus time-series data
Scales of measurement

Scales of measurement

- Every variable in a data set is measured on one of four scales of measurement: nominal, ordinal, interval, or ratio.

Nominal scale of measurement

- When the data for a variable consist of labels or names used to identify a characteristic of an observation, the scale of measurement is nominal.
- Examples: Name, Board, Gender, Blood group, etc.
- Nominal variables are sometimes coded numerically.
- For example, we might code Men as 1 and Women as 2, or Men as 3 and Women as 1. Both codings are valid because the numbers serve only as labels.
- There is no ordering in the variable.
- Nominal: named categories without any implied order.
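
The point about numeric coding can be checked with a small Python sketch. The blood-group responses and both codings below are made up for illustration; the helper `decoded_counts` is likewise hypothetical. The sketch suggests that category counts and the mode survive any recoding, while the mean of the codes does not.

```python
from collections import Counter

# Hypothetical responses for a nominal variable (blood group).
blood_groups = ["A", "O", "B", "O", "AB", "A", "O"]

# Two arbitrary numeric codings of the same labels; both are equally valid.
coding_a = {"A": 1, "B": 2, "AB": 3, "O": 4}
coding_b = {"A": 7, "B": 1, "AB": 4, "O": 2}


def decoded_counts(labels, coding):
    """Encode the labels, count the codes, and map the counts back to labels."""
    inverse = {code: label for label, code in coding.items()}
    code_counts = Counter(coding[label] for label in labels)
    return {inverse[code]: n for code, n in code_counts.items()}


counts_a = decoded_counts(blood_groups, coding_a)
counts_b = decoded_counts(blood_groups, coding_b)

# Category frequencies and the mode do not depend on the coding.
mode_a = max(counts_a, key=counts_a.get)

# The mean of the codes changes with the coding, so it carries no meaning.
mean_a = sum(coding_a[g] for g in blood_groups) / len(blood_groups)
mean_b = sum(coding_b[g] for g in blood_groups) / len(blood_groups)

print(counts_a == counts_b, mode_a, mean_a != mean_b)
```

Counting and finding the most frequent category are the operations that remain valid here; arithmetic on the codes is not.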

Ordinal scale of measurement

- If the data exhibit the properties of nominal data and the order or rank of the data is meaningful, the scale of measurement is considered an ordinal scale.
- Example: each customer who visits a restaurant provides a service rating of excellent, good, or poor.
- The data obtained are the labels excellent, good, or poor, so the data have the properties of nominal data.
- In addition, the data can be ranked, or ordered, with respect to service quality.
- Ordinal: named categories that can be ordered.
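
The restaurant example can be sketched in Python as follows. The list of ratings and the `rank` mapping are assumptions made for illustration; the ranks encode order only and say nothing about how far apart the categories are.

```python
# Hypothetical service ratings collected from restaurant customers.
ratings = ["good", "excellent", "poor", "good", "excellent", "good", "poor"]

# The rank encodes order only; the gaps between ranks are not assumed equal.
rank = {"poor": 0, "good": 1, "excellent": 2}

# Sorting by rank is meaningful for ordinal data.
ordered = sorted(ratings, key=rank.__getitem__)

# With an odd number of observations, the median is the middle rating.
median_rating = ordered[len(ordered) // 2]

# Order comparisons are valid; "excellent minus good" is not defined.
best = max(ratings, key=rank.__getitem__)

print(ordered, median_rating, best)
```

Sorting, the median, and comparisons such as "better than" use only the order, which is why they are appropriate for ordinal data.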

Interval scale of measurement

- If the data have all the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure, the scale of measurement is interval.
- Interval data are always numeric, and the difference between any two values is meaningful.
- Ratios of values have no meaning here because the zero point is arbitrary.
- Interval: numerical values that can be added or subtracted (no absolute zero).

Example: temperature
- Suppose the response to a question on how hot the day is can be either comfortable or uncomfortable; then temperature as a variable is nominal.
- Suppose the temperature of a liquid is recorded as cold, warm, or hot; then the variable is ordinal.
- Example: consider an air-conditioned room where the temperature is set at 20°C and the temperature outside the room is 40°C. It is correct to say that the difference in temperature is 20°C, but it is incorrect to say that the outdoors is twice as hot as indoors.
- Temperature in degrees Fahrenheit or degrees Celsius (centigrade) is an interval variable. Neither scale has an absolute zero: a reading of 0 does not mean the absence of temperature.

                          Celsius   Fahrenheit
  Freezing point (water)  0         32
  Boiling point (water)   100       212
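
The room example can be worked through numerically. The sketch below uses the standard Celsius-to-Fahrenheit conversion; the function name `c_to_f` and the variable names are chosen for illustration. It shows that the temperature difference is preserved up to the unit, while the ratio depends on which scale is used.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


indoor_c, outdoor_c = 20.0, 40.0
indoor_f, outdoor_f = c_to_f(indoor_c), c_to_f(outdoor_c)

# Differences are meaningful: the gap only changes by the unit factor 9/5.
diff_c = outdoor_c - indoor_c
diff_f = outdoor_f - indoor_f

# Ratios are not meaningful: they change with the choice of zero point.
ratio_c = outdoor_c / indoor_c
ratio_f = outdoor_f / indoor_f

print(diff_c, diff_f, ratio_c, ratio_f)
```

In Celsius the outdoor reading is 2 times the indoor reading, but in Fahrenheit the same two temperatures give a ratio of about 1.53, so "twice as hot" describes the scale rather than the temperatures.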

Ratio scale of measurement

- If the data have all the properties of interval data and the ratio of two values is meaningful, the scale of measurement is ratio.
- Examples: height, weight, age, marks, etc.
- Ratio: numerical values that can be added, subtracted, multiplied, or divided (makes ratio comparisons possible).
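
As a contrast with temperature, here is a minimal Python sketch for a ratio-scale variable. The two weights are made-up values and `KG_TO_LB` is an approximate conversion factor; the point is that a change of unit rescales the values without moving the zero, so the ratio is unchanged.

```python
import math

# Approximate conversion factor from kilograms to pounds.
KG_TO_LB = 2.20462

# Hypothetical weights of two people, in kilograms.
weight_a_kg, weight_b_kg = 80.0, 40.0

# The ratio in the original unit.
ratio_kg = weight_a_kg / weight_b_kg

# The same ratio after converting both weights to pounds.
ratio_lb = (weight_a_kg * KG_TO_LB) / (weight_b_kg * KG_TO_LB)

# A weight of zero means "no weight" in either unit, so ratios agree.
print(ratio_kg, math.isclose(ratio_kg, ratio_lb))
```

Saying that one person is twice as heavy as another is therefore a statement about the weights themselves, not about the unit chosen.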

Summary
