Faculty of Information Science & Technology (FIST) : PSM 0325 Introduction To Probability and Statistics
ONLINE NOTES
Topic 1
Descriptive Statistics
References :
Introduction to Probability and Statistics, Assliza Salim et al., Pearson, 2011.
Objectives:
1. Understand what is meant by statistics, population, sample, quantitative and qualitative
data, and discrete and continuous variables.
2. Be able to present a set of data using a frequency distribution table, bar chart, pie
chart, histogram, polygon and ogive.
3. Be able to find the mean, median and mode, as well as the range, variance and standard deviation.
Contents:
1. Introduction
2. Organizing data
3. Measurement of central tendency and dispersion
INTRODUCTION
Process of statistics:
1. Identify the research objective
- Identify the purpose of the study, determine the questions to be asked, set a
target group.
2. Collect the information needed
- Collect data from a population or sample.
3. Organize, summarize and analyze the information
- Descriptive statistics: organize the collected data using numerical or graphical
methods.
4. Make a decision or draw a conclusion
- Inferential statistics: the results obtained from the sample are generalized to the
population.
PSM0325 Introduction to Probability and Statistics Topic 1
Definition 1.2:
Descriptive Statistics is a field of study which involves organizing, displaying and
describing data by using tables, graphs and summary measures.
Definition 1.3:
Inferential Statistics is a field of study that uses sample results to make decisions about a
population.
Definition 1.5:
A sample is a certain number of elements that have been chosen from a population
for observation. A sample is a subset of the population.
For example, choose any 100 students in UNITELE for interviews. The sample size is
100.
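The example above can be sketched in Python. This is only an illustration: the population of 5,000 student ID numbers and the fixed random seed are assumptions made for the sketch, not values from these notes.

```python
import random

# Hypothetical population: ID numbers of 5,000 enrolled students (assumed size).
population = list(range(1, 5001))

# A fixed seed is used only so that the sketch gives the same sample every run.
random.seed(1)

# Choose 100 distinct students; the sample is a subset of the population.
sample = random.sample(population, k=100)
sample_size = len(sample)

print("Sample size:", sample_size)
```

Because random.sample draws without replacement, no student is chosen twice.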
Definition 1.7:
A variable is a characteristic under study.
Definition 1.8:
The value of the variable for an element is called an observation or measurement.
Definition 1.9:
A data set is a collection of observations on one or more variables.
ORGANIZING DATA
Definition 2.1:
Data that have been collected but not yet processed or ranked are called raw data.
Raw data are also called individual data.
Definition 2.2:
A frequency distribution lists all categories or classes and the number of
elements or values that belong to each of the categories or classes.
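A minimal sketch of building a frequency distribution from raw data is given below. The list of majors is hypothetical and is used only to show the counting step; the relative frequency is the category frequency divided by the total number of observations.

```python
from collections import Counter

# Hypothetical raw data: the major of each of 10 surveyed students.
majors = ["MIS", "Eco", "Bus", "MIS", "BS", "Bus", "MIS", "Ot", "Eco", "MIS"]

# Frequency: number of observations that belong to each category.
frequency = Counter(majors)

# Relative frequency: category frequency divided by the total number of observations.
total = sum(frequency.values())
relative_frequency = {major: count / total for major, count in frequency.items()}

for major, count in frequency.items():
    print(f"{major:<5}{count:>3}{relative_frequency[major]:>8.2f}")
```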
BAR GRAPHS
A bar graph displays the frequencies of ungrouped data contained in a frequency
distribution.
- Vertical axis (y-axis): the frequency or the relative frequency, shown by the height or length of each bar.
- Horizontal axis (x-axis): the different categories.
- The bars are separated, the gap between bars is uniform, and all bars have the same
width.
[Figure: bar graph of Frequency (vertical axis, 0 to 10) by Major (horizontal axis: MIS, Eco, Bus, BS, Ot)]
PIE CHARTS
A circle divided into portions that represent the relative frequencies or percentages of a
population or a sample belonging to different categories is called a pie chart.
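As a rough sketch, the size of each portion can be worked out from the relative frequency: the percentage is 100 times the relative frequency and the angle at the center is 360 degrees times the relative frequency. The frequencies below are hypothetical.

```python
# Hypothetical frequency distribution of students by major.
frequency = {"MIS": 4, "Eco": 2, "Bus": 2, "BS": 1, "Ot": 1}
total = sum(frequency.values())

# Each portion takes the share of the circle equal to its relative frequency.
percentages = {major: 100 * count / total for major, count in frequency.items()}
angles = {major: 360 * count / total for major, count in frequency.items()}

for major in frequency:
    print(f"{major:<5}{percentages[major]:>6.1f}%{angles[major]:>8.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic.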
[Figure: pie chart with one portion for each major (Bus, Eco, MIS, BS, Ot)]
Definition 2.3:
A class interval is a range of values defined by the lower class limit and upper class limit.
Definition 2.4:
A class boundary is the midpoint between the upper limit of one class and the lower limit of the
next class.
Definition 2.5:
The class midpoint, or class mark, is the average of the lower class limit and the upper class limit.
Formula:
$$\text{Class midpoint} = \frac{\text{lower class limit} + \text{upper class limit}}{2}$$
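The class boundaries and class midpoints can be checked with a short sketch. The class limits are the height classes used in the figures of these notes (72-74, 75-77, ...); treating the outer boundaries of the first and last classes as half a gap beyond their limits is the usual convention and is assumed here.

```python
# Class limits from the height example in these notes (in inches).
class_limits = [(72, 74), (75, 77), (78, 80), (81, 83), (84, 86)]

# Gap between the upper limit of one class and the lower limit of the next class.
gap = class_limits[1][0] - class_limits[0][1]

# A boundary lies halfway across the gap, so it is half a gap outside each limit.
boundaries = [(lower - gap / 2, upper + gap / 2) for lower, upper in class_limits]

# Class midpoint = (lower class limit + upper class limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

for limits, bounds, mid in zip(class_limits, boundaries, midpoints):
    print(limits, bounds, mid)
```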
Definition 2.6:
The range is equal to the highest value minus the lowest value.
Definition 2.7:
The number of classes can be obtained by using Sturges' formula:
$$c = 1 + 3.3 \log n$$
where c = number of classes, n = number of observations, and log is the logarithm to base 10.
Definition 2.8:
$$\text{Class size} = \frac{\text{Range}}{\text{number of classes}}$$
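The range, the number of classes from Sturges' formula, and the class size can be put together as in the sketch below. The 30 heights are hypothetical, and rounding the number of classes and the class size up to whole numbers is an assumed convention that these notes do not state.

```python
import math

# Hypothetical raw data: heights (in inches) of 30 people.
heights = [72, 73, 73, 74, 75, 75, 76, 76, 76, 77,
           77, 78, 78, 78, 79, 79, 79, 80, 80, 80,
           81, 81, 82, 82, 83, 83, 84, 85, 85, 86]

n = len(heights)

# Sturges' formula: c = 1 + 3.3 log n (logarithm to base 10).
c = 1 + 3.3 * math.log10(n)
number_of_classes = math.ceil(c)  # assumed convention: round up

# Range = highest value - lowest value.
data_range = max(heights) - min(heights)

# Class size = range / number of classes, rounded up (assumed convention).
class_size = math.ceil(data_range / number_of_classes)

print(f"n = {n}, c = {c:.2f}, classes = {number_of_classes}")
print(f"range = {data_range}, class size = {class_size}")
```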
Definition 2.9:
Tally marks are used to count class frequencies by marking a stroke against a class for each
data value that falls in that class.
2.3.3 CUMULATIVE FREQUENCY DISTRIBUTION
The cumulative frequency of a class is the total number of values
that fall below the upper class boundary of that class.
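A minimal sketch of the calculation: each cumulative frequency is the running total of the class frequencies up to and including that class. The upper class boundaries are those of the height example in these notes; the class frequencies are hypothetical.

```python
from itertools import accumulate

# Upper class boundaries from the height example (in inches).
upper_boundaries = [74.5, 77.5, 80.5, 83.5, 86.5]

# Hypothetical class frequencies for the five classes.
frequencies = [4, 7, 9, 6, 4]

# Running total: number of values that fall below each upper class boundary.
cumulative = list(accumulate(frequencies))

for boundary, total_below in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {total_below}")
```

The last cumulative frequency equals the total number of observations.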
HISTOGRAMS
- A histogram is a graphical representation of a grouped frequency distribution, with the class
intervals or class boundaries on the horizontal axis and the frequencies on the vertical axis.
- It is drawn as adjoining rectangles. The width of each rectangle is the class size
and the height of each rectangle is the class frequency, so when all classes have the
same size the area of each rectangle is proportional to the class frequency.
[Figure: histogram of Frequency (vertical axis, 0 to 12) against Height in inches (classes 72-74, 75-77, 78-80, 81-83, 84-86)]
POLYGON
[Figure: frequency polygon of Frequency (vertical axis, 0 to 12) against Height in inches (classes 72-74, 75-77, 78-80, 81-83, 84-86)]
OGIVES
An ogive is the graphical representation of a cumulative frequency distribution. It is
drawn by joining with straight lines the dots marked above the upper class boundaries
at heights equal to the corresponding cumulative frequencies.
[Figure: ogive of Cumulative Frequency (vertical axis, 0 to 35) against Height in inches (class boundaries 71.5, 74.5, 77.5, 80.5, 83.5, 86.5)]
DESCRIPTIVE MEASURES
3.1.1 MEAN
(i) Ungrouped Data
Population mean
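For ungrouped data, the mean is the sum of all observations divided by the number of observations: the sum of x divided by N for a population of N values, or divided by n for a sample of n values. The sketch below uses five hypothetical heights and compares the hand calculation with the mean function of Python's statistics module.

```python
from statistics import mean

# Hypothetical ungrouped data: heights (in inches) of 5 people.
heights = [72, 75, 78, 81, 84]

# Mean = sum of the observations / number of observations.
mean_by_formula = sum(heights) / len(heights)

# The standard library gives the same result.
mean_by_library = mean(heights)

print(mean_by_formula, mean_by_library)
```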
3.1.2 MEDIAN
The median is the value of the item located at the center of the ranked
distribution.
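One way to find the median is sketched below: rank the data, then take the middle item. For an even number of items there is no single middle item; averaging the two middle items is the usual convention and is assumed here. The function name median_of and the data are illustrative only.

```python
from statistics import median

def median_of(values):
    """Return the value at the center of the ranked data."""
    ranked = sorted(values)          # rank the data from lowest to highest
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]        # odd n: the single middle item
    # even n: average of the two middle items (assumed convention)
    return (ranked[middle - 1] + ranked[middle]) / 2

odd_data = [7, 3, 9, 1, 5]    # ranked: 1, 3, 5, 7, 9
even_data = [7, 3, 9, 1]      # ranked: 1, 3, 7, 9

print(median_of(odd_data), median(odd_data))
print(median_of(even_data), median(even_data))
```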
3.1.3 MODE
The mode is the value that occurs most frequently in a distribution.
Note: A set of data may have no mode, one mode, or more than one mode.
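The three cases in the note can be illustrated with a short sketch. The function name modes is hypothetical, and treating a data set in which no value repeats as having no mode is an assumed convention.

```python
from collections import Counter

def modes(values):
    """Return a list of the modes; the list is empty when there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value repeats: treated here as "no mode"
    # Every value that reaches the highest frequency is a mode.
    return [value for value, count in counts.items() if count == highest]

print(modes([1, 2, 3, 4]))       # no mode
print(modes([1, 2, 2, 3]))       # one mode
print(modes([1, 1, 2, 2, 3]))    # more than one mode
```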
1. Symmetry
In a symmetric distribution with a single mode, the mean, median and mode all have the same value.
3.2.2 VARIANCE
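The standard deviation measures the spread of the data about the mean. It is the square root of the variance.
Population variance:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Sample variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$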
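As a sketch of the calculation for ungrouped data, the sum of squared deviations from the mean is divided by N for a population variance and by n - 1 for a sample variance, and the standard deviation is the square root of the variance. The five heights are hypothetical; pvariance and variance from Python's statistics module are used only as a cross-check.

```python
from math import sqrt
from statistics import pvariance, variance

# Hypothetical ungrouped data: heights (in inches) of 5 people.
data = [72, 75, 78, 81, 84]
n = len(data)
data_mean = sum(data) / n

# Sum of the squared deviations of each value from the mean.
squared_deviations = sum((x - data_mean) ** 2 for x in data)

# Treating the data as a whole population: divide by N.
population_variance = squared_deviations / n

# Treating the data as a sample: divide by n - 1.
sample_variance = squared_deviations / (n - 1)

# Standard deviation = square root of the variance.
population_sd = sqrt(population_variance)
sample_sd = sqrt(sample_variance)

print(population_variance, pvariance(data))
print(sample_variance, variance(data))
```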
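For grouped data, where m is the class midpoint and f is the class frequency:
$$\sigma^2 = \frac{\sum (m - \mu)^2 f}{N}, \qquad s^2 = \frac{\sum (m - \bar{x})^2 f}{n - 1}, \qquad n = \sum f$$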
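For grouped data the same idea applies with each class midpoint m weighted by its class frequency f. The sketch below uses the midpoints of the height classes in these notes (72-74, 75-77, ...) with hypothetical frequencies; the mean is taken as the sum of m times f divided by the sum of f.

```python
# Class midpoints of the height classes 72-74, 75-77, 78-80, 81-83, 84-86.
midpoints = [73, 76, 79, 82, 85]

# Hypothetical class frequencies.
frequencies = [4, 7, 9, 6, 4]

# Number of observations: n = sum of f.
n = sum(frequencies)

# Grouped mean = sum of (m * f) / n.
grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n

# Sum of squared deviations of the midpoints from the mean, weighted by f.
weighted_squares = sum((m - grouped_mean) ** 2 * f
                       for m, f in zip(midpoints, frequencies))

# Divide by N for a population, by n - 1 for a sample.
population_variance = weighted_squares / n
sample_variance = weighted_squares / (n - 1)

print(f"n = {n}, mean = {grouped_mean:.2f}")
print(f"population variance = {population_variance:.2f}")
print(f"sample variance = {sample_variance:.2f}")
```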