Assignment one
1. Qualitative data are nonnumeric variables. They describe characteristics or traits with
words or labels rather than numerical values, so they can be categorized but not measured
or counted. You would turn to qualitative data to answer the "why?" or "how?" questions.
It is often used in open-ended studies, allowing participants (or customers) to show their
true feelings and actions without guidance. Examples include gender, religious affiliation,
and state of birth.
2. Quantitative data are numerical variables and can be measured. If something can be
counted or measured, and given a numerical value, it is quantitative in nature. Think of it
as a measuring stick: quantitative variables can tell you "how many," "how much," or "how often."
Quantitative data are easily amenable to statistical manipulation and can be represented by a
wide variety of graphs and charts, such as line graphs, bar graphs, and scatter plots.
Examples of quantitative data:
Scores on tests and exams, e.g. 85, 67, 90.
The weight of a person or a subject.
Your shoe size.
The temperature in a room.
The balance in a checking account.
The number of children in a family.
There are two general types of quantitative data: discrete and continuous. Discrete variables
can assume only certain values, and there are usually "gaps" between the values (such as the
number of bedrooms in your house). Continuous variables can assume any value within a specific
range (such as the air pressure in a tire).
Qualitative vs Quantitative Data
4. Ordinal data
Ordinal data shows where a value falls in an order. This is the
crucial difference from nominal data, which has no order.
Nominal Data
Nominal data is a type of qualitative information that labels variables without providing a
numerical value. It is also called the nominal scale. It cannot be ordered or measured. But
sometimes, the data can be qualitative and quantitative. Examples of nominal data are letters,
symbols, words, gender etc.
Nominal data are examined using the grouping method. In this method, the data are grouped
into categories, and then the frequency or the percentage of each category is calculated. These
data can be represented visually using pie charts.
Ordinal Data
Ordinal data is a type of data that follows a natural order. The significant feature of
ordinal data is that the difference between the data values is not determined. This kind of
variable is mostly found in surveys, finance, economics, questionnaires, and so on.
Ordinal data is commonly represented using a bar chart. These data are investigated and
interpreted through many visualisation tools. The information may also be expressed using
tables in which each row shows a distinct category.
Discrete Data
Discrete data can take only separate, distinct values. The set of possible values is countable
(often finite), and those values cannot be subdivided meaningfully. Here, things can be counted
in whole numbers.
Example: Number of students in the class
Continuous Data
Continuous data is data that can be measured. It has an infinite number of possible values
within a given range.
Example: Temperature range
1. Primary Data
Data measured or collected by the investigator or the user directly from the source.
Two activities are involved: planning and measuring.
a) Planning:
Identify the source and elements of the data.
Decide whether to consider a sample or a census.
If sampling is preferred, decide on sample size, selection method, … etc.
Decide the measurement procedure.
Set up the necessary organizational structure.
b) Measuring: there are different options. Some of the sources for collecting primary data are:
Focus Group
Telephone Interview
Mail Questionnaires
Door-to-Door Survey
Mall Intercept
New Product Registration
Personal Interview
Experiments
2. Secondary Data
Data gathered or compiled from published and unpublished sources or files.
When our source is secondary data, check:
The type and objective of the situations.
That the purpose for which the data were collected is compatible with the present problem.
That the nature and classification of the data are appropriate to our problem.
That there are no biases or misreporting in the published data.
Note: Data which are primary for one may be secondary for the other.
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentage of values in each class using
$\% = \frac{f}{n} \times 100$,
where f = frequency of the class and n = total number of values.
Percentages are not normally part of a frequency distribution, but they can be added since they
are used in certain types of diagrams, such as pie charts.
Step 5: Find the totals for columns (3) and (4).
Combining all the steps, one can construct the following frequency distribution.
Class   Tally    Frequency   Percent
S       //////   7           28
D       //////   7           28
W       ////     6           24
Example:
The following data represent the marks of 20 students.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Construct a frequency distribution, which is ungrouped.
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Make a table as shown.
Step 3: Tally the data.
Step 4: Compute the frequency.
Mark   Tally   Frequency
60     //      2
62     /       1
63     /       1
65     /       1
70     ////    4
74     /       1
75     /       1
76     //      2
80     ///     3
85     ///     3
90     /       1
Each individual value is presented separately, that is why it is named ungrouped frequency
distribution.
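The tally-and-count steps above can be sketched in a few lines of Python. This is only an illustration using the standard-library `collections.Counter`; the variable names are chosen for this sketch and are not part of the original notes.

```python
from collections import Counter

# Marks of the 20 students from the example above.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

# Count how often each distinct mark occurs (the "Frequency" column).
freq = Counter(marks)

print("Mark  Tally  Frequency")
for mark in sorted(freq):
    # One "/" per occurrence stands in for the hand-drawn tally.
    print(f"{mark:<5} {'/' * freq[mark]:<6} {freq[mark]}")

# The frequencies must add up to the number of observations.
total = sum(freq.values())
```

Because every distinct value gets its own row, this is exactly the ungrouped form of the frequency distribution.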
3) Grouped Frequency Distribution:
- When the range of the data is large, the data must be grouped into classes that are more than
one unit in width.
Definitions:
Grouped frequency distribution: a frequency distribution in which several numbers are
grouped in one class.
Class limits: separate one class in a grouped frequency distribution from another. The
limits could actually appear in the data, and there are gaps between the upper limit of one
class and the lower limit of the next.
Units of measurement (U): the distance between two possible consecutive measures. It is
usually taken as 1, 0.1, 0.01, 0.001, ...
Class boundaries: separate one class in a grouped frequency distribution from another. The
boundaries have one more decimal place than the raw data and therefore do not appear in
the data. There is no gap between the upper boundary of one class and the lower boundary of
the next class. The lower class boundary is found by subtracting U/2 from the corresponding
lower class limit, and the upper class boundary is found by adding U/2 to the corresponding
upper class limit.
Class width: the difference between the upper and lower class boundaries of any class. It is
also the difference between the lower limits of any two consecutive classes or the difference
between any two consecutive class marks.
Class mark (midpoint): the average of the lower and upper class limits, or the average
of the upper and lower class boundaries.
Cumulative frequency: the number of observations less than/more than or equal to a
specific value.
Cumulative frequency above: the total frequency of all values greater than or equal to
the lower class boundary of a given class.
Cumulative frequency below: the total frequency of all values less than or equal to the
upper class boundary of a given class.
Cumulative Frequency Distribution (CFD): the tabular arrangement of class intervals
together with their corresponding cumulative frequencies. It can be of the "more than" or
"less than" type, depending on the type of cumulative frequency used.
Relative frequency (rf): the frequency divided by the total frequency.
Relative cumulative frequency (rcf): the cumulative frequency divided by the total
frequency.
Guidelines for classes
1. There should be between 5 and 20 classes.
2. The classes must be mutually exclusive. This means that no data value can fall into two
different classes
3. The classes must be all inclusive or exhaustive. This means that all data values must be
included.
4. The classes must be continuous. There are no gaps in a frequency distribution.
5. The classes must be equal in width. The exception here is the first or last class. It is
possible to have a "below ..." or "... and above" class. This is often used with ages.
Steps for constructing Grouped frequency Distribution
1. Find the largest and smallest values.
2. Compute the Range (R) = Maximum - Minimum.
3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule
$k = 1 + 3.32 \log n$, where k is the number of classes desired and n is the total number of
observations.
4. Find the class width by dividing the range by the number of classes and rounding up, not
off: $w = \frac{R}{k}$.
5. Pick a suitable starting point less than or equal to the minimum value. The starting point
is called the lower limit of the first class. Continue to add the class width to this lower
limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of the second
class. Then continue to add the class width to this upper limit to find the rest of the upper
limits.
7. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2 units
to the upper limits. The boundaries are also half-way between the upper limit of one
class and the lower limit of the next class. It may not be necessary to find the boundaries.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may
not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies.
Example*:
Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest value: H=39, L=6.
Step 2: Find the range: R=H-L=39-6=33.
Step 3: Select the number of classes desired using Sturges' formula:
$k = 1 + 3.32 \log n = 1 + 3.32 \log(20) = 5.32 \approx 6$ (rounding up)
Step 4: Find the class width: w=R/k=33/6=5.5, which becomes 6 after rounding up.
Step 5: Select the starting point; let it be the minimum observation.
6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limits; e.g. the first upper class limit = 12-U = 12-1 = 11.
11, 17, 23, 29, 35, 41 are the upper class limits.
So combining step 5 and step 6, one can construct the following classes.
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
Step 7: Find the class boundaries.
E.g. for class 1: Lower class boundary = 6-U/2 = 5.5
Upper class boundary = 11+U/2 = 11.5
Then continue adding w to both boundaries to obtain the rest of the boundaries. By doing so
one can obtain the following classes.
Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
Step 8: Tally the data.
Step 9: Write the numeric values for the tallies in the frequency column.
Step 10: Find the cumulative frequency.
Step 11: Find the relative frequency and/or relative cumulative frequency.
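The eleven steps can be carried out mechanically. The following Python sketch follows them for the data of this example; the tuple layout and variable names are assumptions made for illustration, and the logarithm in Sturges' rule is taken as base 10, which matches the value 5.32 computed above.

```python
import math

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
U = 1  # unit of measurement: the data are whole numbers

n = len(data)
low, high = min(data), max(data)          # steps 1-2
R = high - low                            # range
k = math.ceil(1 + 3.32 * math.log10(n))   # step 3: Sturges' rule, rounded up
w = math.ceil(R / k)                      # step 4: class width, rounded up

rows = []
cum = 0
for i in range(k):
    lower = low + i * w                   # step 5: lower class limit
    upper = lower + w - U                 # step 6: upper class limit
    f = sum(lower <= x <= upper for x in data)   # steps 8-9
    cum += f                              # step 10: "less than" cumulative frequency
    # step 7 boundaries and step 11 relative frequency
    rows.append((lower, upper, lower - U / 2, upper + U / 2, f, cum, f / n))

for lower, upper, lb, ub, f, cf, rf in rows:
    print(f"{lower:>2}-{upper:<2}  {lb:>4}-{ub:<4}  f={f}  cf={cf:>2}  rf={rf:.2f}")

freqs = [row[4] for row in rows]
```

Note that `math.ceil` leaves a quotient unchanged when R/k is already a whole number, so in that case the width may need to be increased by hand to cover the maximum value.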
Pictogram
- In this diagram, we represent data by means of picture symbols. We decide on a
suitable picture to represent a definite number of units in which the variable is measured.
Year         1989   1990   1991   1992
Population   2000   3000   5000   7000
Bar Charts:
- A set of bars (thick lines or narrow rectangles) representing some magnitude over time or space.
- They are useful for comparing aggregates over time or space.
- Bars can be drawn either vertically or horizontally.
- There are different types of bar charts. The most common are:
Simple bar chart
Deviation or two-way bar chart
Broken bar chart
Component or subdivided bar chart
Multiple bar chart
Solutions:
[Figure: bar chart of sales in $ (vertical axis 0 to 30) by product (A, B, C)]
[Figure: bar chart of sales in $ (vertical axis 0 to 100) by year of production (1957, 1958, 1959), with legend Product A, Product B, Product C]
[Figure: bar chart of sales in $ (vertical axis 0 to 60) by year of production (1957, 1958, 1959), with legend Product A, Product B, Product C]
[Figure: value frequency plotted against class midpoints 2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5]
The symbol $\sum_{i=1}^{N} X_i$ is a mathematical shorthand for X1+X2+X3+...+XN.
The expression is read, "the sum of X sub i from i equals 1 to N." It means "add up all the
numbers."
Example: Suppose the following were scores made on the first homework assignment for five
students in the class: 5, 7, 7, 6, and 8. In this example set of five numbers, where N=5, the
summation could be written:
$\sum_{i=1}^{5} X_i = 5 + 7 + 7 + 6 + 8 = 33$
The "i=1" at the bottom of the summation notation tells where to begin the sequence of
summation. If the expression were written with "i=3", the summation would start with the third
number in the set. For example:
$\sum_{i=3}^{N} X_i$
In the example set of numbers, this would give the following result:
$\sum_{i=3}^{5} X_i = 7 + 6 + 8 = 21$
The "N" in the upper part of the summation notation tells where to end the sequence of
summation. If there were only three scores then the summation and example would be:
$\sum_{i=1}^{3} X_i = 5 + 7 + 7 = 19$
Sometimes the summation notation is used in an expression that must be written
a number of times, as in a proof, and then a shorthand for the shorthand notation is
employed. When the summation sign "$\sum$" is used without additional notation, then "i=1" and "N"
are assumed.
For example:
$\sum X = \sum_{i=1}^{N} X_i$
PROPERTIES OF SUMMATION
1. $\sum_{i=1}^{n} k = nk$, where k is any constant
2. $\sum_{i=1}^{n} k X_i = k \sum_{i=1}^{n} X_i$, where k is any constant
3. $\sum_{i=1}^{n} (a + b X_i) = na + b \sum_{i=1}^{n} X_i$, where a and b are any constants
4. $\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$
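a) $\sum_{i=1}^{5} X_i$
b) $\sum_{i=1}^{5} Y_i$
c) $\sum_{i=1}^{5} 10$
d) $\sum_{i=1}^{5} (X_i + Y_i)$
e) $\sum_{i=1}^{5} (X_i - Y_i)$
f) $\sum_{i=1}^{5} X_i Y_i$
g) $\sum_{i=1}^{5} X_i^2$
h) $(\sum_{i=1}^{5} X_i)(\sum_{i=1}^{5} Y_i)$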
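Solutions:
a) $\sum_{i=1}^{5} X_i = 5 + 7 + 7 + 6 + 8 = 33$
b) $\sum_{i=1}^{5} Y_i = 6 + 7 + 8 + 7 + 8 = 36$
c) $\sum_{i=1}^{5} 10 = 5 * 10 = 50$
d) $\sum_{i=1}^{5} (X_i + Y_i) = 33 + 36 = 69$
e) $\sum_{i=1}^{5} (X_i - Y_i) = (5 - 6) + (7 - 7) + (7 - 8) + (6 - 7) + (8 - 8) = -3 = 33 - 36$
f) $\sum_{i=1}^{5} X_i Y_i = 5 * 6 + 7 * 7 + 7 * 8 + 6 * 7 + 8 * 8 = 241$
g) $\sum_{i=1}^{5} X_i^2 = 5^2 + 7^2 + 7^2 + 6^2 + 8^2 = 223$
h) $(\sum_{i=1}^{5} X_i)(\sum_{i=1}^{5} Y_i) = 33 * 36 = 1188$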
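The sums in this exercise are easy to check numerically. The sketch below uses the X and Y values that appear in the worked solutions; the variable names are illustrative only.

```python
# Values of X and Y as they appear in the worked solutions.
X = [5, 7, 7, 6, 8]
Y = [6, 7, 8, 7, 8]
n = len(X)

sum_x = sum(X)                                   # a)
sum_y = sum(Y)                                   # b)
sum_const = sum(10 for _ in range(n))            # c) constant summed n times
sum_plus = sum(x + y for x, y in zip(X, Y))      # d)
sum_minus = sum(x - y for x, y in zip(X, Y))     # e)
sum_prod = sum(x * y for x, y in zip(X, Y))      # f)
sum_sq = sum(x ** 2 for x in X)                  # g)
prod_sums = sum_x * sum_y                        # h)

# Properties of summation: a sum of sums (or differences) splits term by term,
# but the sum of products is NOT the product of the sums.
splits = (sum_plus == sum_x + sum_y) and (sum_minus == sum_x - sum_y)
product_differs = sum_prod != prod_sums
```

Comparing f) with h) is a useful reminder that $\sum X_i Y_i$ and $(\sum X_i)(\sum Y_i)$ are different quantities.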
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
If X1 occurs f1 times,
X2 occurs f2 times,
.
.
Xk occurs fk times,
then the mean will be $\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$, where k is the number of classes and $\sum_{i=1}^{k} f_i = n$.
$\bar{X} = \frac{\sum_{i=1}^{4} f_i X_i}{\sum_{i=1}^{4} f_i} = \frac{36}{7} \approx 5.14$
$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$,
where Xi = the class mark of the ith class and fi = the frequency of the ith class.
Example: Calculate the mean for the following age distribution.
Class    Frequency
6-10     35
11-15    23
16-20    15
21-25    12
26-30    9
31-35    6
Solutions:
First find the class marks.
Find the product of frequency and class marks. Find the mean using the formula.
Class    fi     Xi    Xifi
6-10     35     8     280
11-15    23     13    299
16-20    15     18    270
21-25    12     23    276
26-30    9      28    252
31-35    6      33    198
Total    100          1575
$\bar{X} = \frac{\sum_{i=1}^{6} f_i X_i}{\sum_{i=1}^{6} f_i} = \frac{1575}{100} = 15.75$
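As a check on the table, the same calculation can be sketched in Python. The list-of-tuples layout below is an assumption made for this illustration; the class limits and frequencies are those of the example.

```python
# (lower limit, upper limit, frequency) for each class in the example.
classes = [(6, 10, 35), (11, 15, 23), (16, 20, 15),
           (21, 25, 12), (26, 30, 9), (31, 35, 6)]

# Class mark = average of the lower and upper class limits.
class_marks = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

n = sum(freqs)                                           # total frequency
total = sum(f * x for f, x in zip(freqs, class_marks))   # sum of f_i * X_i
mean = total / n

print(f"n = {n}, sum f*X = {total}, mean = {mean}")
```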
Special properties of Arithmetic mean
1. The sum of the deviations of a set of items from their mean is always zero, i.e.
$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$.
2. The sum of the squared deviations of a set of items from their mean is the minimum, i.e.
$\sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - A)^2, \quad A \neq \bar{X}$
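$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2 + \dots + \bar{X}_k n_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{i=1}^{k} \bar{X}_i n_i}{\sum_{i=1}^{k} n_i}$
$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2}{n_1 + n_2} = \frac{\sum_{i=1}^{2} \bar{X}_i n_i}{\sum_{i=1}^{2} n_i}$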
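$\bar{X}_w = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i}$
$\bar{X}_w = \frac{\sum_{i=1}^{5} X_i W_i}{\sum_{i=1}^{5} W_i} = \frac{60 * 1 + 75 * 2 + 63 * 1 + 59 * 3 + 55 * 3}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5$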
Xi
i 1 , i 1
If observations X1, X2, …Xn have weights W1, W2, …Wn respectively, then their harmonic mean
is given by
$H.M = \frac{\sum_{i=1}^{n} W_i}{\sum_{i=1}^{n} (W_i / X_i)}$
This is called the Weighted Harmonic Mean.
Remark: The Harmonic Mean is useful and appropriate in finding average speeds and average
rates.
Example: A cyclist pedals from his house to his college at a speed of 10 km/hr and back from the
college to his house at 15 km/hr. Find the average speed.
Solution: Here the distance is constant, so the simple H.M is appropriate for this problem.
X1 = 10 km/hr, X2 = 15 km/hr
$H.M = \frac{2}{\frac{1}{10} + \frac{1}{15}} = 12$ km/hr
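The cyclist example can be checked with Python's standard `statistics.harmonic_mean`. The sketch also computes the arithmetic mean for comparison, which is an addition made here to show why the harmonic mean is the appropriate average when the distance is fixed.

```python
import statistics

speeds = [10, 15]  # km/hr, out and back over the same distance

# Harmonic mean: n divided by the sum of reciprocals.
hm_manual = len(speeds) / sum(1 / s for s in speeds)
hm_library = statistics.harmonic_mean(speeds)

# Arithmetic mean, for comparison only.
am = statistics.mean(speeds)

# Cross-check with total distance / total time for an assumed 30 km each way
# (the distance is hypothetical; any positive value gives the same speed).
d = 30
avg_speed = (2 * d) / (d / 10 + d / 15)

print(hm_manual, hm_library, am, avg_speed)
```

The arithmetic mean (12.5 km/hr) overstates the average speed because more time is spent riding at the slower speed.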
The Mode
- Mode is a value which occurs most frequently in a set of values
- The mode may not exist and even if it does exist, it may not be unique.
- In the case of a discrete distribution, the value having the maximum frequency is the modal value.
Examples:
1. Find the mode of 5, 3, 5, 8, 9
Mode =5
2. Find the mode of 8, 9, 9, 7, 8, 2, and 5.
It is a bimodal Data: 8 and 9
3. Find the mode of 4, 12, 3, 6, and 7.
No mode for this data.
- The mode of a set of numbers X1, X2, …Xn is usually denoted by X̂ .
Mode for Grouped data
If data are given in the shape of a continuous frequency distribution, the mode is defined as:
$\hat{X} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2} w$
Where:
$\hat{X}$ = the mode of the distribution
$L_{mo}$ = the lower class boundary of the modal class
$w$ = the size of the modal class
$\Delta_1 = f_{mo} - f_1$
$\Delta_2 = f_{mo} - f_2$
$f_{mo}$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class following the modal class
Note: The modal class is the class with the highest frequency.
Example: Following is the distribution of the size of certain farms selected at random from a
district. Calculate the mode of the distribution.
Size of farms   No. of farms
5-15            8
15-25           12
25-35           17
35-45           29
45-55           31
55-65           5
65-75           3
Solutions:
45-55 is the modal class, since it is the class with the highest frequency.
$L_{mo} = 45$
$w = 10$
$f_{mo} = 31$, $f_1 = 29$, $f_2 = 5$
$\Delta_1 = f_{mo} - f_1 = 2$
$\Delta_2 = f_{mo} - f_2 = 26$
$\hat{X} = 45 + \frac{2}{2 + 26} \times 10 = 45.71$
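The grouped-mode formula can be written as a small function. This is a sketch: the function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption of this illustration, not something stated in the notes.

```python
def grouped_mode(boundaries, freqs):
    """Mode of a grouped distribution.

    boundaries: list of (lower, upper) class boundaries
    freqs: class frequencies, in the same order
    """
    # Index of the modal class (first class with the highest frequency).
    m = max(range(len(freqs)), key=freqs.__getitem__)
    f_mo = freqs[m]
    # Assumption: a missing neighbour counts as frequency 0.
    f1 = freqs[m - 1] if m > 0 else 0
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1, d2 = f_mo - f1, f_mo - f2
    if d1 + d2 == 0:
        raise ValueError("formula undefined: neighbours tie with the modal class")
    lower, upper = boundaries[m]
    return lower + d1 / (d1 + d2) * (upper - lower)


farm_classes = [(5, 15), (15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
farm_freqs = [8, 12, 17, 29, 31, 5, 3]
mode = grouped_mode(farm_classes, farm_freqs)
print(round(mode, 2))
```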
Merits and Demerits of Mode
Merits:
It is not affected by extreme observations.
Easy to calculate and simple to understand.
It can be calculated for distribution with open end class
Demerits:
It is not rigidly defined.
It is not based on all observations
It is not suitable for further mathematical treatment.
It is not stable average, i.e. it is affected by fluctuations of sampling to some
extent.
Often its value is not unique.
Note: being the point of maximum density, mode is especially useful in finding the most popular
size in studies relating to marketing, trade, business, and industry. It is the appropriate average to
be used to find the ideal size.
The Median
- In a distribution, median is the value of the variable which divides it in to two equal halves.
- In an ordered series of data median is an observation lying exactly in the middle of the series. It is the
middle most value in the sense that the number of values less than the median is equal to the number
of values greater than it.
- If X1, X2, …Xn are the observations, then the numbers arranged in ascending order will be
X[1], X[2], …X[n], where X[i] is the ith smallest value:
X[1] ≤ X[2] ≤ … ≤ X[n]
- The median is denoted by $\tilde{X}$.
Median for ungrouped data
$\tilde{X} = X_{[(n+1)/2]}$, if n is odd;
$\tilde{X} = \frac{1}{2}(X_{[n/2]} + X_{[(n/2)+1]})$, if n is even.
$\tilde{X} = \frac{1}{2}(X_{[3]} + X_{[4]}) = \frac{1}{2}(5 + 6) = 5.5$
b) Order the data: 1, 2, 3, 5, 8
Here n=5
$\tilde{X} = X_{[(n+1)/2]} = X_{[3]} = 3$
Median for grouped data
If data are given in the shape of a continuous frequency distribution, the median is defined as:
$\tilde{X} = L_{med} + \frac{w}{f_{med}} (\frac{n}{2} - c)$
Where:
$L_{med}$ = lower class boundary of the median class
$w$ = the size of the median class
$n$ = total number of observations
$c$ = the cumulative frequency (less than type) preceding the median class
$f_{med}$ = the frequency of the median class
Remark:
The median class is the class with the smallest cumulative frequency (less than type) greater than or
equal to $\frac{n}{2}$.
Example: Find the median of the following distribution.
Class    Frequency
40-44    7
45-49    10
50-54    22
55-59    15
60-64    12
65-69    6
70-74    3
Solutions:
First find the less than cumulative frequency.
Identify the median class.
Find the median using the formula.
Class    Frequency   Cumu. Freq (less than type)
40-44    7           7
45-49    10          17
50-54    22          39
55-59    15          54
60-64    12          66
65-69    6           72
70-74    3           75
$\frac{n}{2} = \frac{75}{2} = 37.5$
39 is the first cumulative frequency to be greater than or equal to 37.5, so 50-54 is the median class.
$L_{med} = 49.5$, $w = 5$, $n = 75$, $c = 17$, $f_{med} = 22$
$\tilde{X} = L_{med} + \frac{w}{f_{med}} (\frac{n}{2} - c) = 49.5 + \frac{5}{22}(37.5 - 17) = 54.16$
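The three solution steps (cumulate, locate the median class, apply the formula) can be sketched as follows. The function name `grouped_median` and its `unit` parameter are assumptions of this illustration; `unit` plays the role of U, the unit of measurement used to turn class limits into boundaries.

```python
from itertools import accumulate


def grouped_median(limits, freqs, unit=1):
    """Median of a grouped distribution given class limits and frequencies."""
    n = sum(freqs)
    cum = list(accumulate(freqs))            # "less than" cumulative frequencies
    # Median class: first class whose cumulative frequency reaches n/2.
    m = next(i for i, c in enumerate(cum) if c >= n / 2)
    lower, upper = limits[m]
    l_med = lower - unit / 2                 # lower class boundary
    w = (upper + unit / 2) - l_med           # class width from the boundaries
    c = cum[m - 1] if m > 0 else 0           # cumulative frequency before the class
    return l_med + w / freqs[m] * (n / 2 - c)


limits = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69), (70, 74)]
freqs = [7, 10, 22, 15, 12, 6, 3]
median = grouped_median(limits, freqs)
print(round(median, 2))
```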
Q5.Measure of variation
Measures of Dispersion (Variation)
For this reason, among others, the range is not the most important measure of variability.
Solutions: (2)
$R = 4 \Rightarrow L - S = 4$ .......... (1)
$RR = 0.25 \Rightarrow L + S = 16$ .......... (2)
Solving (1) and (2) at the same time, one can obtain the following values:
L = 10 and S = 6
The Variance
Population Variance
If we divide the variation by the number of values in the population, we get something called the
population variance. This variance is the "average squared deviation from the mean".
Population Variance $= \sigma^2 = \frac{1}{N} \sum (X_i - \mu)^2, \quad i = 1, 2, \dots, N$
For the case of a frequency distribution it is expressed as:
Population Variance $= \sigma^2 = \frac{1}{N} \sum f_i (X_i - \mu)^2, \quad i = 1, 2, \dots, k$
Sample Variance
One would expect the sample variance to simply be the population variance with the population
mean replaced by the sample mean. However, one of the major uses of a statistic is to estimate
the corresponding parameter, and that formula tends to underestimate the population variance
(it is a biased estimator). To counteract this, the sum of the squares of the deviations is
divided by one less than the sample size.
Sample Variance $= S^2 = \frac{1}{n - 1} \sum (X_i - \bar{X})^2, \quad i = 1, 2, \dots, n$
For the case of a frequency distribution it is expressed as:
Sample Variance $= S^2 = \frac{1}{n - 1} \sum f_i (X_i - \bar{X})^2, \quad i = 1, 2, \dots, k$
We usually use the following shortcut formulas.
$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1}$, for raw data.
$S^2 = \frac{\sum_{i=1}^{k} f_i X_i^2 - n\bar{X}^2}{n - 1}$, for a frequency distribution.
Standard Deviation
Solutions:
1. $\bar{X} = 11$
Xi              5    10   12   17   Total
(Xi - X̄)^2     36   1    1    36   74
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{74}{3} = 24.67$
$S = \sqrt{S^2} = \sqrt{24.67} = 4.97$
2. $\bar{X} = 55$
Xi (C.M)         42     47    52    57   62    67    72    Total
fi(Xi - X̄)^2    1183   640   198   60   588   864   867   4400
$S^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n - 1} = \frac{4400}{74} = 59.46$
$S = \sqrt{S^2} = \sqrt{59.46} = 7.71$
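Both solutions can be reproduced in Python. The first part uses the standard-library `statistics.variance` and `statistics.stdev`, which divide by n - 1. For the second part, the frequencies 7, 10, 22, 15, 12, 6, 3 are an assumption: they are borrowed from the grouped-median example because they reproduce the mean of 55 and the total of 4400 shown above, but the notes do not state them here.

```python
import math
import statistics

# Part 1: raw data.
raw = [5, 10, 12, 17]
s2_raw = statistics.variance(raw)      # sample variance, divisor n - 1
s_raw = statistics.stdev(raw)          # sample standard deviation

# Part 2: grouped data (class marks with ASSUMED frequencies, see lead-in).
class_marks = [42, 47, 52, 57, 62, 67, 72]
freqs = [7, 10, 22, 15, 12, 6, 3]
n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, class_marks)) / n

# Definition formula.
ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, class_marks))
s2_grouped = ss / (n - 1)
s_grouped = math.sqrt(s2_grouped)

# Shortcut formula: (sum f*X^2 - n * mean^2) / (n - 1).
s2_shortcut = (sum(f * x * x for f, x in zip(freqs, class_marks)) - n * mean ** 2) / (n - 1)

print(round(s2_raw, 2), round(s_raw, 2), round(s2_grouped, 2), round(s_grouped, 2))
```

The definition and shortcut formulas agree, which is a quick way to catch arithmetic slips when working by hand.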
Chebyshev's Theorem
For any data set, no matter what the pattern of variation, the proportion of the values that fall
within k standard deviations of the mean, i.e. in $(\bar{X} - kS, \bar{X} + kS)$, will be at least
$1 - \frac{1}{k^2}$, where k is a number greater than 1; i.e. the proportion of items falling
beyond k standard deviations of the mean is at most $\frac{1}{k^2}$.
Example: Suppose a distribution has mean 50 and standard deviation 6. What percent of the
numbers are:
a) Between 38 and 62
b) Between 32 and 68
c) Less than 38 or more than 62.
d) Less than 32 or more than 68.
Solutions:
a) 38 and 62 are at equal distance from the mean, 50, and this distance is 12.
$kS = 12 \Rightarrow k = \frac{12}{S} = \frac{12}{6} = 2$
Applying the above theorem, at least $(1 - \frac{1}{k^2}) * 100\% = 75\%$ of the numbers lie
between 38 and 62.
b) Similarly done.
c) It is just the complement of a), i.e. at most $\frac{1}{k^2} * 100\% = 25\%$ of the numbers
are less than 38 or more than 62.
d) Similarly done.
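Parts b) and d) follow the same pattern as a) and c) with a different k. A short Python sketch makes this explicit; the helper name `chebyshev_bounds` is hypothetical and it assumes the interval is symmetric about the mean, as in all four parts of the example.

```python
import math


def chebyshev_bounds(mean, sd, low, high):
    """Return (k, at-least proportion inside, at-most proportion outside)."""
    if not math.isclose(mean - low, high - mean):
        raise ValueError("interval must be symmetric about the mean")
    k = (high - mean) / sd
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    inside = 1 - 1 / k ** 2
    return k, inside, 1 - inside


k_a, inside_a, outside_a = chebyshev_bounds(50, 6, 38, 62)   # parts a) and c)
k_b, inside_b, outside_b = chebyshev_bounds(50, 6, 32, 68)   # parts b) and d)

print(f"k={k_a}: at least {inside_a:.1%} inside, at most {outside_a:.1%} outside")
print(f"k={k_b}: at least {inside_b:.1%} inside, at most {outside_b:.1%} outside")
```

These are bounds, not exact percentages: the theorem guarantees "at least" and "at most" for any distribution, whatever its shape.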
Example 2: The average score of a special test of knowledge of wood refinishing has a mean of
53 and standard deviation of 6. Find the range of values in which at least 75% the scores will lie.
(Exercise)
3. If the standard deviation of $X_1, X_2, \dots, X_n$ is S, then the standard deviation of
a) $X_1 + k, X_2 + k, \dots, X_n + k$ will also be S
b) $kX_1, kX_2, \dots, kX_n$ would be $|k| S$
c) $a + kX_1, a + kX_2, \dots, a + kX_n$ would be $|k| S$
Examples:
1. The mean and standard deviation of n Tetracycline Capsules X 1 , X 2 , ..... X n are known to be
12 gm and 3 gm respectively. New set of capsules of another drug are obtained by the linear
transformation Yi = 2Xi – 0.5 ( i = 1, 2, …, n ) then what will be the standard deviation of the
new set of capsules
2. The mean and the standard deviation of a set of numbers are respectively 500 and 10.
a. If 10 is added to each of the numbers in the set, then what will be the variance
and standard deviation of the new set?
b. If each of the numbers in the set are multiplied by -5, then what will be the
variance and standard deviation of the new set?
Solutions:
1. Using c) above, the new standard deviation = $|k| S = 2 * 3 = 6$.
2. a. They will remain the same.
b. New standard deviation = $|k| S = 5 * 10 = 50$, so the new variance is $50^2 = 2500$.
Examples:
1. Two sections were given introduction to statistics examinations. The following information
was given.
Value             Section 1   Section 2
Mean              78          90
Stan. deviation   6           5
Student A from section 1 scored 90 and student B from section 2 scored 95. Relatively speaking,
who performed better?
Solutions: Calculate the standard score of both students.
$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{90 - 78}{6} = 2$
$Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{95 - 90}{5} = 1$
Student A performed better relative to his section, because the score of student A is two
standard deviations above the mean score of his section, while the score of student B is only one
standard deviation above the mean score of his section.
2. Two groups of people were trained to perform a certain task and tested to find out which
group is faster to learn the task. For the two groups the following information was given:
Value       Group one   Group two
Mean        10.4 min    11.9 min
Stan. dev.  1.2 min     1.3 min
Relatively speaking:
a) Which group is more consistent in its performance?
b) Suppose person A from group one takes 9.2 minutes while person B
from group two takes 9.3 minutes. Who was faster in performing the
task? Why?
Solutions:
a) Use the coefficient of variation.
$C.V_1 = \frac{S_1}{\bar{X}_1} * 100 = \frac{1.2}{10.4} * 100 = 11.54\%$
$C.V_2 = \frac{S_2}{\bar{X}_2} * 100 = \frac{1.3}{11.9} * 100 = 10.92\%$
Since C.V2 < C.V1, group 2 is more consistent.
b) Calculate the standard scores of A and B.
$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{9.2 - 10.4}{1.2} = -1$
$Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{9.3 - 11.9}{1.3} = -2$
Person B is faster, because the time taken by person B is two standard deviations shorter than the
average time taken by group 2, while the time taken by person A is only one standard deviation
shorter than the average time taken by group 1.
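Both worked examples rest on two small formulas, the standard score and the coefficient of variation. They can be sketched as Python functions; the function names are illustrative and not taken from the notes.

```python
def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


def coeff_of_variation(mean, sd):
    """Coefficient of variation, in percent."""
    return sd / mean * 100


# Example 1: exam scores in two sections.
z_a_exam = z_score(90, 78, 6)
z_b_exam = z_score(95, 90, 5)

# Example 2: task times in two groups (a smaller time is faster).
cv1 = coeff_of_variation(10.4, 1.2)
cv2 = coeff_of_variation(11.9, 1.3)
z_a_time = z_score(9.2, 10.4, 1.2)
z_b_time = z_score(9.3, 11.9, 1.3)

print(f"exam: Z_A={z_a_exam:.2f}, Z_B={z_b_exam:.2f}")
print(f"CV1={cv1:.2f}%, CV2={cv2:.2f}%")
print(f"time: Z_A={z_a_time:.2f}, Z_B={z_b_time:.2f}")
```

For the exam a larger standard score is better, while for the timed task the more negative standard score marks the faster person.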
Assignment two
• Write the formula with its description for the variance of
discrete and continuous random variable.
• Give one example for each of them
• For a discrete random variable X, the variance of X is obtained as
follows: $\mathrm{var}(X) = \sum (x - \mu)^2 p_X(x)$, where the sum is taken over all values of x for which
$p_X(x) > 0$. So the variance of X is the weighted average of the squared deviations from the
mean μ, where the weights are given by the probability function $p_X(x)$ of X.
Discrete and Continuous Random Variables:
X      2  3  4  5  6  7  8  9  10  11  12
P(X)
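As one possible discrete example, the sketch below applies the variance formula to the sum of two fair six-sided dice, whose values run from 2 to 12 like the X row above. That the table refers to two fair dice is an assumption of this illustration; the notes do not give the probabilities. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Probability function of the sum of two fair dice (ASSUMED experiment).
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, 0) + Fraction(1, 36)

# Mean: each value weighted by its probability.
mu = sum(x * p for x, p in pmf.items())

# Variance: weighted average of squared deviations from the mean.
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Equivalent shortcut: E(X^2) - [E(X)]^2.
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2

print(f"mean = {mu}, variance = {var}")
```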
The mean of a discrete random variable, X, is its weighted average. Each value of X is weighted
by its probability.
To find the mean of X, multiply each value of X by its probability, then add all the products.
As the number of observations increases, the mean of the observed values, $\bar{x}$, approaches μ.
The more variation in the outcomes, the more trials are needed to ensure that $\bar{x}$ is close
to μ.
Rules for Means:
Example:
Suppose the equation Y = 20 + 100X converts a PSAT math score, X, into an SAT math score,
Y. Suppose the average PSAT math score is 48. What is the average SAT math score?
Example:
Let represent the average SAT math score.
The mathematical formula for the variance of a continuous random variable? The formula is the
same whether it is a discrete random variable or a continuous random variable.
Explanation:
Irrespective of the type of random variable, the formula for variance is $\sigma^2 = E(X^2) - [E(X)]^2$.
However, if the random variable is discrete, we use the process of summation.
In the case of a continuous random variable, we use the integral.
$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$.
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$.
From this, we get $\sigma^2$ by substitution.
For a discrete random variable X, $\Omega_X \subseteq \{0, 1, 2, \dots\}$, we can write
$E[X] = \sum_{x=0}^{\infty} (1 - F(x))$
This formula is proving convenient to me on the current problem I'm working on where the
cumulative probability of being in a "sink" state naturally comes out of formulating the problem
in terms of a transition matrix.
However, I was wondering whether there is an analogous formula for variance in terms of the
CDF, or whether if I want the variance I'm going to have to change tack?
I'm thinking there isn't such a formula, because variance is defined as $E[(X - \mu)^2]$ and
although $(X - \mu)^2$ is positive, it isn't an integer and so a similar approach won't work.
There is such a thing as $\mathrm{Var}(X) = E[X^2] - (E[X])^2$, and I suspect you can
reuse your trick to find $E[X^2]$.
Suppose the standard deviation for the SAT math score is 150 points, and the standard deviation
for the SAT verbal score is 165 points. What is the standard deviation for the combined SAT
score?
Because the SAT math score and SAT verbal score are not independent, the rule for adding
variances does not apply!
Continuous random variable is a random variable that can take on a continuum of values. In
other words, a random variable is said to be continuous if it assumes a value that falls between a
particular interval.
Continuous random variables are used to denote measurements such as height, weight, time, etc.
The area under a density curve is used to represent a continuous random variable. In this article,
we will learn about the definition of a continuous random variable, its mean, variance, types, and
associated examples.
A continuous random variable and a discrete random variable are the two types of random
variables. A random variable is a variable whose value depends on all the possible outcomes of
an experiment. A continuous random variable is defined over a range of values while a discrete
random variable is defined at an exact value.
A continuous random variable can be defined as a random variable that can take on an infinite
number of possible values. Due to this, the probability that a continuous random variable will
take on an exact value is 0. The cumulative distribution function and the probability density
function are used to describe the characteristics of a continuous random variable.
Continuous Random Variable Example
Suppose the probability density function of a continuous random variable, X, is given by
$f(x) = 4x^3$, where x ∈ [0, 1]. The probability that X takes on a value between 1/2 and 1 needs
to be determined. This can be done by integrating $4x^3$ between 1/2 and 1:
$\int_{1/2}^{1} 4x^3\,dx = 1 - \frac{1}{16} = \frac{15}{16}$. Thus, the required probability is 15/16.
The variance of a continuous random variable can be defined as the expectation of the squared
differences from the mean. It helps to determine the dispersion in the distribution of the
continuous random variable with respect to the mean. The formula is given as follows:
$\mathrm{Var}(X) = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$
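As a continuous example, the density $f(x) = 4x^3$ on [0, 1] from the example above can be integrated exactly, because every integrand is a single power of x. The helper `integrate_monomial` below is a hypothetical name introduced for this sketch; it is not a library function.

```python
from fractions import Fraction


def integrate_monomial(coeff, power, a, b):
    """Exact integral of coeff * x**power over [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return Fraction(coeff) * (b ** (power + 1) - a ** (power + 1)) / (power + 1)


# Density f(x) = 4x^3 on [0, 1]; it should integrate to 1.
total = integrate_monomial(4, 3, 0, 1)

# P(1/2 <= X <= 1): integrate f between 1/2 and 1.
prob = integrate_monomial(4, 3, Fraction(1, 2), 1)

# E(X) = integral of x * f(x) = 4x^4; E(X^2) = integral of x^2 * f(x) = 4x^5.
mean = integrate_monomial(4, 4, 0, 1)
second_moment = integrate_monomial(4, 5, 0, 1)

# Var(X) = E(X^2) - [E(X)]^2.
variance = second_moment - mean ** 2

print(f"P = {prob}, E(X) = {mean}, E(X^2) = {second_moment}, Var(X) = {variance}")
```

For densities that are not simple polynomials the same integrals would normally be evaluated numerically rather than in closed form.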
A continuous random variable is usually used to model situations that involve measurements. For
example, the possible values of the temperature on any given day. As the temperature could be
any real number in a given interval thus, a continuous random variable is required to describe it.
Some important continuous random variables associated with certain probability distributions are
given below.
Continuous Random Variable vs Discrete Random Variable. Both discrete and continuous
random variables are used to model a random phenomenon. The differences between a
continuous random variable and discrete random variable are given in the table below:
Continuous Random Variable                                    Discrete Random Variable
The value of a continuous random variable falls between a     The value of a discrete random
range of values.                                              variable is an exact value.
A continuous random variable can take on an infinite number   Such a variable can take on a finite
A continuous random variable is a variable that is used to model continuous data and its
value falls within an interval of values.
The probability density function of a continuous random variable is given as
$f(x) = \frac{dF(x)}{dx} = F'(x)$.
The cumulative distribution function is given by
$P(a < X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx$.
The mean of a continuous random variable is $E[X] = \mu = \int_{-\infty}^{\infty} x f(x)\,dx$ and
the variance is $\mathrm{Var}(X) = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$.
Uniform random variable, exponential random variable, normal random variable, and
standard normal random variable are examples of continuous random variables.