LU1 Lecture Notes

This document serves as an introduction to statistics, covering fundamental concepts such as population, sample, parameter, and statistic, as well as the distinction between descriptive and inferential statistics. It also discusses data types, measurement scales, and the importance of understanding statistical terminology for effective data analysis. Key components include definitions of random variables, data formats, and sigma notation.

Uploaded by

sadpost787

LEARNING UNIT 1: Introduction to Statistics

Learning objectives
• Understand the concepts of a population, sample, parameter, statistic, random variable
and data
• Distinguish between descriptive and inferential statistics
• Identify data types and measurement scales
• Know the difference between raw data and frequency data
• Understand sigma notation

Textbook reference
• Chapter 1
o §1.1 – §1.3, §1.7
o Exclude §1.4 – §1.6, §1.8 – §1.11
ATE01A1 – LU 1 1
INTRODUCTION

An essential part of the scientific research process is gathering, ordering, and analysing
information from which conclusions can be drawn and interpretations can be made. The
study of statistical methods focuses on how the data should be analysed so that meaningful
conclusions can be drawn.

THE LANGUAGE AND COMPONENTS OF STATISTICS

To perform statistical analyses, one must first understand the language of statistics. In this
section, we will define basic statistical terminology and concepts.

Population

Population refers to the entire collection of individuals, objects, or items under
consideration. A population may be finite or infinite. For example, the shoes manufactured
on any given day in a factory are a finite population. However, all the outcomes when
flipping a coin repeatedly (and indefinitely) would be considered an infinite population. The
total number of elements in a population is denoted by N.

Parameter

A population parameter is a constant value (usually unknown) that describes some
measurable aspect of a population. Population parameters are generally denoted using
Greek letters.

Sample
A sample is a subset of the population of interest. Samples are generally used to collect
information since considering the entire population is not always possible or feasible. The
total number of elements in a sample is denoted by n.

Statistic
A number calculated from sample data, which describes a measurable aspect of a sample,
is called a statistic. Sample statistics are generally denoted using Roman letters.
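
The relationship between population, parameter, sample and statistic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of shoe masses is simulated (hypothetical values, not data from the notes), and the mean is used as the measurable aspect of interest.

```python
import random
import statistics

# Hypothetical finite population: masses (in grams) of N = 1000 shoes
# manufactured on one day. The values are simulated, not real data.
random.seed(1)  # fixed seed so that the sketch is reproducible
population = [random.gauss(400, 15) for _ in range(1000)]
N = len(population)

# Parameter: a constant that describes the population (here the mean, mu).
# In practice it is usually unknown because the whole population is not observed.
mu = statistics.mean(population)

# Sample: a subset of n elements drawn at random without replacement.
n = 50
sample = random.sample(population, n)

# Statistic: a number calculated from the sample data (here the sample mean).
x_bar = statistics.mean(sample)

print(f"Population: N = {N}, mu = {mu:.1f}")
print(f"Sample:     n = {n}, x-bar = {x_bar:.1f}")
```

Drawing a different sample would give a different value of the statistic, whereas the parameter stays fixed.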

Sampling unit
A sampling unit is an object being measured, counted, or observed.

Random variable
A variable is a characteristic of the elements of a population (or sample) for which the
observed values differ from element to element.
• In probability theory, where a variable assumes specific values with certain associated
probabilities, the variable is called a random variable.
• Variables are denoted by capital letters, e.g. X, Y, Z, and the values assumed by the
random variables are denoted by lowercase letters, e.g. x, y, z.
For example, let X = the height of boys in metres. Here, X is a random variable that
measures the variable “height”. If three boys are selected at random, i.e. n = 3, and their
respective heights are 1.40 m, 1.37 m and 1.41 m, the realisations of the random variable X
are given by $x_i$ for i = 1, 2, 3:
$x_1 = 1.40$, $x_2 = 1.37$, $x_3 = 1.41$

Data
Data are the actual values (numbers) or outcomes recorded for all variables measured on the
sampling units.

Descriptive statistics
Descriptive statistics comprise those methods used to organise and describe information
that has been collected in a sample.

Inferential statistics
Inferential statistics comprise those methods and techniques used for making
generalisations, predictions or estimates about the population using sampled data.

Notation

Measure              Sample statistic         Population parameter

Mean                 $\bar{x}$ (x-bar)        $\mu$ (mu)
Variance             $s^2$ (s-squared)        $\sigma^2$ (sigma-squared)
Standard deviation   $s$                      $\sigma$ (sigma)
Proportion           $p = x/n$                $\pi$ (pi) [this is not the constant 3.1416]
Size                 n                        N
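
As a rough illustration of the sample-statistic notation, the sketch below computes these quantities for the three boys' heights from the random variable example, using Python's standard `statistics` module. Two points are assumptions rather than part of the notes: `statistics.variance` divides by n − 1 (the usual convention for a sample variance, which these notes do not spell out here), and the 1.38 m threshold used for the proportion is hypothetical.

```python
import statistics

# Realisations x_1, x_2, x_3 of X = height of boys (metres), from the example above.
heights = [1.40, 1.37, 1.41]

n = len(heights)                    # sample size, n
x_bar = statistics.mean(heights)    # sample mean, x-bar
s2 = statistics.variance(heights)   # sample variance, s^2 (divides by n - 1)
s = statistics.stdev(heights)       # sample standard deviation, s

# Sample proportion p = x / n, where x counts the elements with some property.
# Hypothetical property for illustration: the boy is taller than 1.38 m.
x = sum(1 for h in heights if h > 1.38)
p = x / n

print(f"n = {n}, x-bar = {x_bar:.4f}, s^2 = {s2:.6f}, s = {s:.4f}, p = {p:.3f}")
```

The corresponding population parameters cannot be computed from the sample. They can only be estimated from it.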

Exercise 1.1
Consider the results of the three semester tests for ATE A:

1) All the ATE A students form the:

2) A selection of 50 ATE A students is a:
3) Each test is a:
4) The sampling unit is:
5) The results from all three tests form the:
6) The average mark for Test 1 is a:

7) Testing whether the current group of ATE students performs better than groups from
previous years is the process of:

UNDERSTANDING DATA

Data Types

Determining the most appropriate statistical method depends firstly on the problem
statement to be addressed and secondly on the type of data available. Specific statistical
methods are valid for certain data types only. Data types are identified by the nature of their
random variables. A random variable is categorical (qualitative) or numeric (quantitative).

1. Categorical random variables

Categorical variables are also known as qualitative variables. Such variables allow for
classification based on some characteristic. The variable ‘Eye colour’ can be classified as
brown, blue, green or grey.

The values of categorical variables are often recorded as numerical values. For example,
gender might be coded in the dataset as 1 = Male and 2 = Female, but these values have
no numerical meaning as they denote labels or categories of the variable. Such categorical
data can, therefore, only be counted to determine how many responses belong to each
category.
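
A short sketch of this point, assuming a hypothetical list of coded responses (the coding 1 = Male, 2 = Female is the one used above). Counting per category is meaningful, while arithmetic on the codes is not.

```python
from collections import Counter

# Hypothetical coded responses for the variable "gender": 1 = Male, 2 = Female.
gender_codes = [1, 1, 2, 2, 1, 1, 1, 1, 2, 2]
labels = {1: "Male", 2: "Female"}

# Meaningful: count how many responses belong to each category.
counts = Counter(labels[code] for code in gender_codes)
print(counts)

# Not meaningful: the codes are only labels, so their "average" can be
# computed but has no interpretation.
meaningless_mean = sum(gender_codes) / len(gender_codes)
print(meaningless_mean)
```

Recoding the categories (e.g. 1 = Female, 2 = Male) would change the computed average but not the counts, which shows that only the counts carry information.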

2. Numerical random variables

Numerical variables are also known as quantitative variables. Such variables are naturally
measured as numbers, for example, a person’s height in centimetres. Arithmetic
operations can be performed on the values of such variables, as the values have numerical
meaning.

Numerical variables are further classified as either discrete or continuous:

• Discrete variables assume values obtained by counting (whole numbers or integers)
and consist of a countable (often finite) number of values. For example, the number of
students in a class (75).

• Continuous variables assume values obtained by measuring and can take any of the
infinitely many values in an interval of the real line. For example, the travel time to
work (28.4 min).

Measurement Scales

Data can also be classified in terms of its scale of measurement, i.e., the procedure used to
measure or obtain the data. There are four types of measurement scales: nominal, ordinal,
interval, and ratio.

1. Nominal

A categorical variable is measured on a nominal scale if the variable consists of two or
more categories with no intrinsic order, i.e., the categories are of equal importance.

For example, a person’s eye colour could be brown, blue, green or grey. There is no logical
way in which these four categories can be ordered. Nominal data is, therefore, usually
ordered alphabetically and then assigned a numeric value.

2. Ordinal

A categorical variable is measured on an ordinal scale if the variable consists of two or
more categories that can be ordered or ranked.

For example, a person’s age is classified as 1 = young, 2 = middle-aged, 3 = old. The three
possible values of this variable are ordered logically.

Note: in this example, numbers are used to reflect the measurement in order from low to
high, without any numeric meaning attached to the values.

3. Interval

A numerical variable (discrete or continuous) is measured on an interval scale if the values
of the variable can be arranged in order. Furthermore:

• there is no true or absolute zero, i.e., the value of zero is an arbitrary reference point
• differences between data values are meaningful
• ratios between values are not meaningful.

For example, temperature in degrees Celsius. The values are numerical and ordered. A
temperature of 0°C does not mean an absence of temperature, i.e., the scale has an
arbitrary zero value. The difference between 10°C and 20°C is the same as the difference
between 30°C and 40°C, namely a 10-degree difference. However, 20°C is not twice as hot
as 10°C, i.e., ratios are not meaningful.
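
The temperature example can be checked numerically. The sketch below is an addition to the notes: it converts the Celsius values to kelvin (a temperature scale that does have an absolute zero, obtained by adding 273.15) and compares differences and ratios on the two scales.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15

t1, t2 = 10.0, 20.0  # the two temperatures from the example, in degrees Celsius

# Differences are meaningful: the same 10-degree gap appears on both scales.
diff_celsius = t2 - t1
diff_kelvin = celsius_to_kelvin(t2) - celsius_to_kelvin(t1)

# Ratios are not meaningful on the Celsius scale: the apparent factor of 2
# disappears once the arbitrary zero point is removed.
ratio_celsius = t2 / t1                                        # 2.0
ratio_kelvin = celsius_to_kelvin(t2) / celsius_to_kelvin(t1)   # about 1.035

print(diff_celsius, round(diff_kelvin, 2))
print(ratio_celsius, round(ratio_kelvin, 3))
```

Because the ratio depends on where zero is placed, a statement such as "twice as hot" has no meaning on an interval scale.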

4. Ratio

A numerical variable (discrete or continuous) is measured on a ratio scale if the values of
the variable can be arranged in order. Furthermore:

• there is a true or absolute zero
• differences between data values are meaningful
• ratios between values are meaningful.

For example, the amount of money in a bank account in Rand. The values are numerical
and ordered. An amount of R0 implies an absence of money, i.e., the scale has an absolute
zero value. The difference between R10 and R20 is the same as that between R30 and
R40, namely a R10 difference. R20 is twice as much money as R10, i.e., ratios are
meaningful.

Exercise 1.2
Data were collected from a random sample of 20 coffee consumers. The survey yielded the
following variables and data.
Consumer   Gender  Age  Highest qualification  Household  Daily coffee  Coffee type  Choice of     Coffee affinity
ID number                                      size       consumption   preference   brand rating  score
1          Male    24   Tertiary certificate   4          3             Instant      2             2.3
2          Male    26   Degree/Diploma         2          1             Instant      1             1.9
3          Female  25   Degree/Diploma         3          2             Filter       1             0.8
4          Female  30   Less than matric       5          7             Instant      5             4.4
5          Male    35   Tertiary certificate   1          4             Instant      3             3.1
6          Male    21   Tertiary certificate   1          1             Filter       3             0.4
7          Male    24   Degree/Diploma         4          2             Instant      4             1.8
8          Male    19   Matric                 1          1             Filter       4             0.4
9          Female  28   Postgraduate degree    2          3             Instant      2             3.1
10         Female  34   Matric                 3          2             Instant      1             1.9
11         Male    37   Tertiary certificate   2          5             Instant      1             4.9
12         Female  40   Postgraduate degree    5          2             Filter       3             0.6
13         Male    29   Degree/Diploma         4          1             Instant      1             0.1
14         Male    35   Degree/Diploma         2          4             Filter       5             3.6
15         Female  29   Matric                 3          1             Filter       4             1.0
16         Male    19   Matric                 6          2             Instant      4             1.4
17         Female  32   Degree/Diploma         1          3             Filter       3             2.4
18         Male    19   Less than matric       2          5             Instant      2             3.4
19         Female  26   Tertiary certificate   5          2             Instant      3             0.2
20         Female  36   Postgraduate degree    3          8             Instant      2             4.6
Daily coffee consumption = Number of cups
Choice of brand rating: 1 = Not important, 2 = Somewhat important, 3 = Important, 4 = Relatively important, 5 = Very important.
Coffee affinity score: derived from other information (one or more variables were combined) to calculate this new variable.

For each variable, identify the type and the scale of measure.

Variable Type Scale of measure

Consumer ID number
Gender
Age
Highest qualification
Household size
Daily coffee consumption
Coffee type preference
Choice of brand rating
Coffee affinity score

Data Formats

1. Raw data

Raw data refers to unprocessed information, also known as source data or primary data. All
information collected is first represented in raw data format, i.e., the dataset. As in the
previous example, the dataset is shown in a matrix format with rows and columns.
Variables are given in the columns, and observations are presented in the rows. A sample
of n observations and p variables will yield a dataset with n rows and p columns.
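
The row-and-column layout can be sketched in Python using the first five consumers from the coffee survey in Exercise 1.2. The short column names are hypothetical labels chosen for this illustration, and plain tuples stand in for whatever software would actually hold the dataset.

```python
# Column names (the p variables); the short names are hypothetical labels.
columns = [
    "id", "gender", "age", "qualification", "household_size",
    "cups_per_day", "coffee_type", "brand_rating", "affinity_score",
]

# Each row is one observation (one consumer); first five rows of Exercise 1.2.
rows = [
    (1, "Male", 24, "Tertiary certificate", 4, 3, "Instant", 2, 2.3),
    (2, "Male", 26, "Degree/Diploma", 2, 1, "Instant", 1, 1.9),
    (3, "Female", 25, "Degree/Diploma", 3, 2, "Filter", 1, 0.8),
    (4, "Female", 30, "Less than matric", 5, 7, "Instant", 5, 4.4),
    (5, "Male", 35, "Tertiary certificate", 1, 4, "Instant", 3, 3.1),
]

n = len(rows)      # number of observations (rows)
p = len(columns)   # number of variables (columns)
print(f"The dataset has n = {n} rows and p = {p} columns")

# Reading down one column gives all observed values of a single variable.
age_index = columns.index("age")
ages = [row[age_index] for row in rows]
print(ages)
```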

The steps to enter raw data into the calculator are as follows:

1) SETUP → down arrow → 3:STAT → 2:OFF

2) MODE → 2:STAT → 1:1–VAR

3) Enter variable values in the column labelled X

4) AC

2. Frequency data

Frequency data are raw data in an aggregated format, where individual data values (or
ranges of values) are listed together with a count of the number of times each value or
range appears in the dataset. This count is the frequency of occurrence or simply the
frequency. It shows how the data are distributed across the scale. Frequency data provide
an overview of the sampled information. Univariate frequency data represent counts of a
single variable, and bivariate frequency data represent counts of the combination of two
variables. Steps to enter frequency data into the calculator are given in Learning Unit 2.
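
As an illustration, the raw gender and coffee type preference values of the 20 consumers in Exercise 1.2 can be aggregated into univariate and bivariate frequency data. The sketch uses Python's `collections.Counter`, which is only one possible way of counting.

```python
from collections import Counter

# Raw data for two variables, in consumer order 1 to 20 (from Exercise 1.2).
gender = [
    "Male", "Male", "Female", "Female", "Male", "Male", "Male", "Male",
    "Female", "Female", "Male", "Female", "Male", "Male", "Female", "Male",
    "Female", "Male", "Female", "Female",
]
coffee_type = [
    "Instant", "Instant", "Filter", "Instant", "Instant", "Filter", "Instant",
    "Filter", "Instant", "Instant", "Instant", "Filter", "Instant", "Filter",
    "Filter", "Instant", "Filter", "Instant", "Instant", "Instant",
]

# Univariate frequency data: counts of a single variable.
univariate = Counter(coffee_type)

# Bivariate frequency data: counts of each combination of two variables.
bivariate = Counter(zip(gender, coffee_type))

for value, frequency in univariate.items():
    print(value, frequency)
for combination, frequency in bivariate.items():
    print(combination, frequency)
```

In both cases the frequencies add up to the sample size, n = 20.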

Sigma Notation (self-read)

In mathematics, sigma notation is the standard notation used to represent summation. It is
a convenient and simple way to write long sums in a compact form. It is denoted by the
Greek capital letter sigma ($\Sigma$). If a random variable X consists of n observations
$x_1, x_2, \ldots, x_n$, the sum of all n values is represented in sigma notation as
$\sum_{i=1}^{n} x_i$, or simply as $\sum x$.

For example, if X = the number of children in a household where $x_1 = 2$, $x_2 = 3$ and
$x_3 = 5$, then the total number of children in all three households in the sample is:

$\sum_{i=1}^{3} x_i = \sum x = x_1 + x_2 + x_3 = 2 + 3 + 5 = 10$

Note: the square of the sum is not equal to the sum of squares, i.e. $(\sum x)^2 \neq \sum (x^2)$

Example: $(\sum x)^2 = (x_1 + x_2 + x_3)^2 = (2 + 3 + 5)^2 = (10)^2 = 100$

$\sum (x^2) = x_1^2 + x_2^2 + x_3^2 = 2^2 + 3^2 + 5^2 = 38$
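
The same calculations can be reproduced in Python as a quick check. This is a minimal sketch using the built-in `sum` function. The variable names are chosen for this illustration only.

```python
# Number of children in each of the three sampled households.
x = [2, 3, 5]

total = sum(x)                            # sum of x:            2 + 3 + 5
square_of_sum = sum(x) ** 2               # (sum of x) squared:  (2 + 3 + 5)^2
sum_of_squares = sum(v ** 2 for v in x)   # sum of x squared:    2^2 + 3^2 + 5^2

print(total)            # 10
print(square_of_sum)    # 100
print(sum_of_squares)   # 38
```

The order of operations matters: squaring after summing gives 100, while squaring each value before summing gives 38.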
