02 Data and Preliminary Data Analysis - Print

The document provides information on preliminary data analysis techniques, including: 1) Definitions of key terms like raw data, frequency distribution, class intervals, and measures of central tendency. 2) Procedures for forming frequency distributions from raw data by determining class intervals and frequencies. 3) Descriptions of common measures of central tendency - the arithmetic mean, median, and mode - and how to calculate them from grouped or ungrouped data.

Data

DR. FLORENTINA PUNGKY PRAMESTI, ST., MT.


Preliminary Data Analysis

 Raw data: collected data that have not been organized numerically,
e.g., the set of heights of 100 male students obtained from
an alphabetical listing of university records
 Array: an arrangement of raw numerical data in ascending or
descending order of magnitude
 Frequency distribution (frequency table): a tabular
arrangement of data by classes together with the corresponding
class frequencies
 Class Intervals And Class Limits
 Class boundaries: can also serve as the symbol of a class. They should
not coincide with actual observations ➔ to avoid ambiguity about which class an observation belongs to
 The size, or width, of a class interval is the difference between the lower and
upper class boundaries
 The class mark is the midpoint of the class interval

Forming frequency distributions


 Determine the largest and smallest numbers in the raw data and thus find the
range.
 Divide the range into a convenient number of class intervals having the same
size. If this is not feasible, use class intervals of different sizes or open class
intervals. The number of class intervals is usually between 5 and 20, depending
on the data. Class intervals are also chosen so that the class marks (or midpoints)
coincide with the actually observed data. This tends to lessen the so-called
grouping error involved in further mathematical analysis. However, the class
boundaries should not coincide with the actually observed data.
 Determine the number of observations falling into each class interval; that is, find
the class frequencies. This is best done by using a tally, or score sheet.
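
The steps above can be sketched in Python. This is only a minimal sketch under stated assumptions: the helper name `frequency_distribution`, the sample heights, the starting boundary 59.5, and the class width 3 are all illustrative choices, not values taken from the slides.

```python
from collections import Counter
import math

def frequency_distribution(data, lower_boundary, width):
    """Tally observations into equal-width classes (hypothetical helper).

    lower_boundary is the lower boundary of the first class. It must not
    exceed the smallest observation, and it should be chosen so that no
    observation falls exactly on a class boundary.
    """
    # Index of the class each observation falls into (the "tally").
    counts = Counter(math.floor((x - lower_boundary) / width) for x in data)
    table = []
    for k in range(max(counts) + 1):
        lo = lower_boundary + k * width
        hi = lo + width
        # (lower boundary, upper boundary, class mark, class frequency)
        table.append((lo, hi, (lo + hi) / 2, counts.get(k, 0)))
    return table

# Illustrative raw data: heights in whole inches (made-up values).
heights = [61, 64, 67, 70, 73, 66, 68, 67, 65, 69, 71, 66, 67, 63, 68]

# Step 1: largest and smallest values give the range.
data_range = max(heights) - min(heights)

# Step 2: equal classes of width 3 whose boundaries (59.5, 62.5, ...) cannot
# coincide with whole-inch observations; the class marks are 61, 64, ...
table = frequency_distribution(heights, lower_boundary=59.5, width=3)

# Step 3: the class frequencies.
for lo, hi, mark, f in table:
    print(f"{lo:5.1f} - {hi:5.1f}   class mark {mark:4.1f}   frequency {f}")
```

Choosing boundaries that end in .5 for whole-number data is one simple way to keep observations off the boundaries while letting the class marks coincide with observable values.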
 Table 1 shows a frequency
distribution of the weekly wages
of 65 employees at the P&R
Company.
 Five new employees were hired
at weekly wages of $285.34,
$316.83, $335.78, $356.21, and
$374.50. Construct a frequency
distribution of wages for the 70
employees.
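Graphic of the frequency distribution

 Histogram (frequency histogram): a set of rectangles with bases on the horizontal axis, centered on the class marks, with widths equal to the class-interval sizes and areas proportional to the class frequencies
 Frequency polygon: a line graph of the class frequencies plotted against the class marks
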

Cumulative-frequency distributions and ogives

 Ogive: cumulative-frequency polygon


 Can you make it?
Case

 The final grades in mathematics of 80 students at State University are recorded in the accompanying table
Measuring the central tendency

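 Arithmetic mean: $\bar{X} = \dfrac{\sum_{j=1}^{N} X_j}{N}$

 Weighted arithmetic mean: $\bar{X} = \dfrac{\sum w_j X_j}{\sum w_j}$, where $w_j$ is the weight attached to $X_j$

 Arithmetic mean from grouped data (coding method): $\bar{X} = A + c\,\dfrac{\sum f_j u_j}{N}$

c : size of the class intervals (all of equal size)
A : any guessed or assumed arithmetic mean (which may be any number)
$d_j = X_j - A$ : deviations of the class marks $X_j$ from A, expressed as $c\,u_j$,
where $u_j$ can be positive or negative integers or zero
$f_j$ : frequency of the class with class mark $X_j$, with $N = \sum f_j$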
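
As a rough illustration of the coding method, the sketch below computes the grouped mean as A + c(Σ f u)/N and compares it with the direct calculation Σ f X / N. The function name `coded_mean` and the frequency table are assumptions made for this example, not data from the slides.

```python
def coded_mean(class_marks, frequencies, assumed_mean, width):
    """Arithmetic mean of grouped data via the coding method (a sketch)."""
    n = sum(frequencies)
    # u_j = (X_j - A) / c: deviations from the assumed mean in class widths.
    u = [(x - assumed_mean) / width for x in class_marks]
    return assumed_mean + width * sum(f * uj for f, uj in zip(frequencies, u)) / n

# Illustrative grouped data (made-up table): class marks and frequencies.
marks = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]

mean_coded = coded_mean(marks, freqs, assumed_mean=67, width=3)
mean_direct = sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)
print(mean_coded, mean_direct)
```

Both routes give the same mean whatever value is guessed for A; picking A equal to a class mark simply keeps the u values small integers, which is what makes the method convenient by hand.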
Median: either the middle value or the arithmetic mean of the two middle values of a set of numbers arranged in order of magnitude.

The set of numbers 3, 4, 4, 5, 6, 8, 8, 8, and 10 has median 6

The set of numbers 5, 5, 7, 9, 11, 12, 15, and 18 has median ½(9 + 11) = 10

For grouped data

$\text{Median} = L_1 + \left(\dfrac{N/2 - (\sum f)_1}{f_{\text{median}}}\right)c$

$L_1$ : lower class boundary of the median class (i.e., the class containing the median)
$N$ : number of items in the data (i.e., total frequency)
$(\sum f)_1$ : sum of frequencies of all classes lower than the median class
$f_{\text{median}}$ : frequency of the median class
$c$ : size of the median class interval
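
The symbols listed above belong to the usual interpolation formula for the grouped median, L1 + ((N/2 - (Σf)1) / f_median) c. A minimal sketch follows, assuming equal class widths. The function name `grouped_median` and the frequency table are illustrative, not taken from the slides.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data by interpolation inside the median class."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the table contains no observations")
    cumulative = 0  # (sum f)_1: total frequency of all classes below
    for l1, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= n / 2:
            # This is the median class: interpolate within it.
            return l1 + (n / 2 - cumulative) / f * width
        cumulative += f

# Illustrative table (made-up): lower class boundaries and frequencies.
boundaries = [59.5, 62.5, 65.5, 68.5, 71.5]
freqs = [5, 18, 42, 27, 8]

median = grouped_median(boundaries, freqs, width=3)
print(round(median, 2))
```

Here N/2 = 50 and the first two classes hold 23 items, so the median class is the third one and the median lies 27/42 of the way through it.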
Mode: the value which occurs with the greatest frequency
The set 2, 2, 5, 7, 9, 9, 9, 10, 10, 11, 12, and 18 has mode 9.

The set 3, 5, 8, 10, 12, 15, and 16 has no mode

The set 2, 3, 4, 4, 4, 5, 5, 7, 7, 7, and 9 has two modes, 4 and 7, and is called bimodal

A distribution having only one mode is called unimodal

For grouped data

$\text{Mode} = L_1 + \left(\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right)c$

where $L_1$ : lower class boundary of the modal class (i.e., the class containing the mode)
$\Delta_1$ : excess of modal frequency over frequency of next-lower class
$\Delta_2$ : excess of modal frequency over frequency of next-higher class
$c$ : size of the modal class interval
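
The quantities defined above combine in the usual grouped-data estimate of the mode, L1 + (Δ1 / (Δ1 + Δ2)) c. The sketch below is one possible implementation. The function name `grouped_mode` and the table are illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch.

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Mode of grouped data, assuming a single clear modal class."""
    # Index of the modal class (the class with the greatest frequency).
    k = max(range(len(frequencies)), key=frequencies.__getitem__)
    # Assumption: a class with no lower or higher neighbour has neighbour frequency 0.
    f_prev = frequencies[k - 1] if k > 0 else 0
    f_next = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    d1 = frequencies[k] - f_prev  # excess over the next-lower class
    d2 = frequencies[k] - f_next  # excess over the next-higher class
    return lower_boundaries[k] + d1 / (d1 + d2) * width

# Illustrative table (made-up): lower class boundaries and frequencies.
boundaries = [59.5, 62.5, 65.5, 68.5, 71.5]
freqs = [5, 18, 42, 27, 8]

mode = grouped_mode(boundaries, freqs, width=3)
print(round(mode, 2))
```

The estimate is pulled toward whichever neighbouring class has the larger frequency, which is why it sits slightly below the class mark 67 in this table.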
EMPIRICAL RELATION BETWEEN THE MEAN, MEDIAN, AND MODE

For unimodal frequency curves that are moderately skewed (asymmetrical), we have the empirical relation

Mean - mode = 3(mean - median)


THE GEOMETRIC MEAN G

$G = \sqrt[N]{X_1 X_2 \cdots X_N}$

THE HARMONIC MEAN H

$H = \dfrac{N}{\sum_{j=1}^{N} 1/X_j}$

The geometric mean of the numbers 2, 4, and 8 is ….
The harmonic mean of the same numbers is ….

RELATION BETWEEN THE ARITHMETIC, GEOMETRIC, AND HARMONIC MEANS

$H \le G \le \bar{X}$
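
The ordering of the three means can be checked numerically for the numbers 2, 4, and 8 used in the exercise above. This is a small sketch using Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). The variable names are illustrative.

```python
from statistics import geometric_mean, harmonic_mean, mean

values = [2, 4, 8]

g = geometric_mean(values)   # cube root of 2 * 4 * 8
h = harmonic_mean(values)    # 3 / (1/2 + 1/4 + 1/8)
x_bar = mean(values)         # (2 + 4 + 8) / 3

print(f"H = {h:.4f}, G = {g:.4f}, arithmetic mean = {x_bar:.4f}")
print("H <= G <= mean:", h <= g <= x_bar)
```

The relation holds for any set of positive numbers, and the three means are equal only when all the numbers are identical.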
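THE ROOT MEAN SQUARE

$\text{RMS} = \sqrt{\dfrac{\sum_{j=1}^{N} X_j^2}{N}}$

The RMS of the set 1, 3, 4, 5, and 7 is $\sqrt{\dfrac{1^2 + 3^2 + 4^2 + 5^2 + 7^2}{5}} = \sqrt{20} \approx 4.47$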

QUARTILES, DECILES, AND PERCENTILES


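Standard deviation for grouped data

$s = w\sqrt{\dfrac{\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2 / n}{n - 1}}$

w : class width
$f_i$ : frequency of class i
$X_i$ : deviation of the class midpoint from an arbitrary origin, in units of the class width (if the class midpoints themselves are used, set w = 1)
n : total number of observations, $n = \sum f_i$
n - 1 : degrees of freedom (page 39)
➔ when discussing a sample, use the degrees of freedom (n - m)
➔ when discussing a population, use n

Standard deviation for ungrouped data

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$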
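
To make the two standard-deviation formulas concrete, the sketch below codes the grouped version (class width w, divisor n - 1) and the ungrouped population version (divisor n), then cross-checks both against Python's `statistics.stdev` and `statistics.pstdev`. The function names and the frequency table are illustrative assumptions. The coded values are deviations from an arbitrary origin measured in class widths.

```python
import math
from statistics import pstdev, stdev

def grouped_sample_sd(coded_values, frequencies, width):
    """Sample s.d. of grouped data: w * sqrt((S2 - S1^2 / n) / (n - 1))."""
    n = sum(frequencies)
    s1 = sum(f * x for f, x in zip(frequencies, coded_values))      # sum f X
    s2 = sum(f * x * x for f, x in zip(frequencies, coded_values))  # sum f X^2
    return width * math.sqrt((s2 - s1 ** 2 / n) / (n - 1))

def population_sd(values):
    """Population s.d. of ungrouped data: sqrt(sum (x - mu)^2 / n)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

# Illustrative grouped data (made-up table).
marks = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]
coded = [-2, -1, 0, 1, 2]  # (class mark - 67) / 3

s = grouped_sample_sd(coded, freqs, width=3)

# Expand the table into individual values to compare with the library.
expanded = [m for m, f in zip(marks, freqs) for _ in range(f)]
sigma = population_sd(expanded)

print(round(s, 4), round(stdev(expanded), 4))
print(round(sigma, 4), round(pstdev(expanded), 4))
```

The sample value is slightly larger than the population value because it divides by n - 1 rather than n.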
Notes
These notes cover the construction of the ogive (cumulative frequency curve) and the cumulative
frequency polygon. There are two methods of constructing them, but the technique of drawing
is the same for both.

1) Less than method


2) More than method

Less than method:

First prepare a less than type cumulative frequency table.
1) On the x-axis, use the upper limits of the classes.
2) Mark the less than type cumulative frequencies on the y-axis.
3) Plot the points using the upper limits and the corresponding cumulative frequencies.
4) Join the points by a freehand curve to get the ogive, or by line segments to get the
cumulative frequency polygon.

More than method:

First prepare a more than type cumulative frequency table.
1) On the x-axis, use the lower limits of the classes.
2) Mark the more than type cumulative frequencies on the y-axis.
3) Plot the points using the lower limits and the corresponding cumulative frequencies.
4) Join the points by a freehand curve to get the ogive, or by line segments to get the
cumulative frequency polygon.
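
The two cumulative frequency tables that the methods start from can be built as in the sketch below. The class limits and frequencies are made-up illustrative values, and the variable names are assumptions of this example. Each resulting pair is one point to plot.

```python
from itertools import accumulate

# Illustrative table (made-up): class limits and frequencies.
lower_limits = [59.5, 62.5, 65.5, 68.5, 71.5]
upper_limits = [62.5, 65.5, 68.5, 71.5, 74.5]
freqs = [5, 18, 42, 27, 8]

# Less than type: running total of frequencies, paired with upper limits.
less_than = list(zip(upper_limits, accumulate(freqs)))

# More than type: total of this class and all higher classes,
# paired with lower limits.
running_from_top = list(accumulate(reversed(freqs)))
more_than = list(zip(lower_limits, reversed(running_from_top)))

print("less than:", less_than)
print("more than:", more_than)
```

The less than points rise from the first class frequency to the total, while the more than points fall from the total to the last class frequency. If both curves are drawn on the same axes, they cross at the height N/2, so the x-value of the crossing estimates the median.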
