Ii Sem Ba Notes
Types of variables
A variable is a characteristic that can be measured and that can
assume different values. Height, age, income, province or country of
birth, grades obtained at school and type of housing are all examples
of variables. Variables may be classified into two main categories:
categorical and numeric. Each category is then classified in two
subcategories: nominal or ordinal for categorical variables, discrete or
continuous for numeric variables. These types are briefly outlined in
this section.
Categorical variables
A categorical variable (also called qualitative variable) refers to a
characteristic that cannot be quantified. Categorical variables can be
either nominal or ordinal.
Nominal variables
A nominal variable is one that describes a name, label or category
without natural order. Sex and type of dwelling are examples of
nominal variables. In Table the variable “mode of transportation for
travel to work” is also nominal.
Ordinal variables
An ordinal variable is a variable whose values are defined by an order
relation between the different categories. In Table the variable
“behaviour” is ordinal because the category “Excellent” is better than
the category “Very good,” which is better than the category “Good,”
etc. There is some natural ordering, but it is limited since we do not
know by how much “Excellent” behaviour is better than “Very good”
behaviour.
Numeric variables
A numeric variable (also called quantitative variable) is a quantifiable
characteristic whose values are numbers (except numbers which are
codes standing for categories). Numeric variables may be either
continuous or discrete.
Continuous variables
A variable is said to be continuous if it can assume an infinite number
of real values within a given interval. For instance, consider the height
of a student. The height cannot take just any value: it cannot be negative
and it cannot be higher than three metres. But between 0 and 3, the number
of possible values is theoretically infinite. A student may be
1.6321748755 … metres tall. In practice, the methods used and the
accuracy of the measurement instrument will restrict the precision of
the variable. The reported height would be rounded to the nearest
centimetre, so it would be 1.63 metres. Age is another example of
a continuous variable that is typically rounded down.
Discrete variables
As opposed to a continuous variable, a discrete variable can assume
only a finite number of real values within a given interval. An example
of a discrete variable would be the score given by a judge to a
gymnast in competition: the range is 0 to 10 and the score is always
given to one decimal (e.g. a score of 8.5). You can enumerate all
possible values (0, 0.1, 0.2, …, 10) and see that the number of possible
values is finite: it is 101. Another example of a discrete variable is the
number of people in a household, for households of size 20 or less.
The number of possible values is 20 (1, 2, …, 20), because it is not
possible for a household to include a non-integer number of people,
such as 2.27.
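The counting argument above can be checked with a short Python sketch. The variable names are illustrative only, and household sizes are assumed to run from 1 to 20.

```python
# Enumerate every score a judge can give: 0.0, 0.1, ..., 10.0.
# Counting in tenths (integers) avoids floating-point rounding problems.
scores = [tenths / 10 for tenths in range(0, 101)]

print(len(scores))             # how many distinct scores exist
print(scores[:3], scores[-1])  # first few values and the last one

# Household sizes from 1 to 20 people: whole numbers only.
household_sizes = list(range(1, 21))
print(len(household_sizes))
```

Both lists are finite and can be written out in full, which is what makes these variables discrete.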
There are two subtypes of categorical data: nominal data and
ordinal data.
• Nominal data – this is also called naming data. It names or labels
the data, and its characteristics are similar to those of a noun.
Example: a person’s name, gender, school name.
Numerical data also has two subtypes, known as discrete data and continuous data.
Continuous data is further divided into two categories: interval and ratio.
• Ratio data – similar to interval data, except that ratio data has a
true zero point; the zero point is the only difference between them.
Example: temperature measured in Kelvin, which has a true zero
point.
Comparison of categorical data and numerical data:
Characteristics
• Categorical data: no ordered scale; natural language description; can take numerical values but with qualitative properties; can be visualized using bar charts and pie charts.
• Numerical data: has an ordered scale; does not use natural language description; takes numeric values with numeric qualities; can be visualized using bar charts and pie charts.
Data collection tools
• Categorical data: questionnaires, surveys, interviews, focus groups and observations.
• Numerical data: questionnaires, surveys and interviews.
Structure
• Categorical data: known as unstructured or semi-structured data. It can use indexing methods to structure data, as Google, Bing, etc. do.
• Numerical data: structured data that can be quickly organized and made sense of.
Measures of central tendency:
(i) Arithmetic mean
(ii) Median
(iii) Mode
Arithmetic Mean :
The arithmetic mean of a given set of observations is their sum divided by the number of
observations. For n observations x1, x2, …, xn it is denoted by $\bar{x}$ and is given by :
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Grouped data:
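Discrete data: Let x1, x2, …, xn be the n given observations with corresponding frequencies f1, f2, …, fn.
Then the arithmetic mean is
$\bar{x} = \frac{1}{N} \sum_{i=1}^{n} f_i x_i$, where $N = \sum_{i=1}^{n} f_i$ is the total frequency.
Continuous data:
In continuous data the frequencies are given along with the values of the variable in the
form of class intervals. Then the arithmetic mean is
$\bar{x} = \frac{1}{N} \sum_{i=1}^{n} f_i x_i$, where $x_i$ is the mid-value of the i-th class interval.
Step deviation method : $\bar{x} = A + \frac{h}{N} \sum_{i=1}^{n} f_i d_i$, where $d_i = \frac{x_i - A}{h}$,
A is an assumed mean and h is the class width.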
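As a rough illustration, the direct and step-deviation formulas can be compared in Python. The class intervals and frequencies are the ones used in the grouped mean deviation example of these notes; the assumed mean A = 25 and the variable names are illustrative choices, not part of the original.

```python
# Class intervals (lower, upper) and their frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [2, 3, 5, 6, 4, 1]

mids = [(lo + hi) / 2 for lo, hi in classes]  # mid-values x_i
N = sum(freqs)                                # total frequency

# Direct method: mean = (1/N) * sum(f_i * x_i)
mean_direct = sum(f * x for f, x in zip(freqs, mids)) / N

# Step-deviation method: mean = A + (h/N) * sum(f_i * d_i),
# with d_i = (x_i - A) / h.
A = 25  # assumed mean (illustrative; any convenient mid-value works)
h = 10  # common class width
d = [(x - A) / h for x in mids]
mean_step = A + h * sum(f * di for f, di in zip(freqs, d)) / N

print(round(mean_direct, 4), round(mean_step, 4))
```

Both methods give the same value; the step-deviation form only makes the hand arithmetic smaller.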
Median :
A second measure of location that may be used to describe the “center” or “middle” of a set of data is
called the median. It is defined as the value of the middle item when the items are arranged in increasing
or decreasing order of their magnitude.
Ungrouped data : If the data set contains an odd number of items, the middle item of the array is the median.
That is, if the number of observations n is odd, then the value of the $\left(\frac{n+1}{2}\right)$th observation gives the median.
If there is an even number of observations, then the median is the average of the two middle items.
Discrete data :
Consider a series where the data are arranged in the form of a frequency distribution.
Suppose the ordered values X1, X2, …, Xn have corresponding frequencies f1, f2, …, fn. Then
the median is found by the following steps :
Step 1 : Arrange the given data in ascending or descending order of magnitude.
Step 2 : Find the cumulative frequencies.
Step 3 : Find the value of (N+1)/2, where N is the total frequency.
Step 4 : Now look at the cumulative frequency column and find the total which is either equal to (N+1)/2
or next higher to it, and determine the value of the variable corresponding to it. That gives the value
of the median.
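A minimal Python sketch of the cumulative-frequency lookup is given below. The data are the x and f columns of the discrete quartile deviation problem in these notes; the helper name value_at_position is hypothetical.

```python
from itertools import accumulate

def value_at_position(values, freqs, position):
    """Return the first value whose cumulative frequency is >= position."""
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= position:
            return x
    raise ValueError("position exceeds the total frequency")

values = [800, 1000, 1500, 1800, 2000, 2500]  # already in ascending order
freqs = [16, 24, 26, 30, 20, 6]

N = sum(freqs)                     # total frequency
median_position = (N + 1) / 2      # position of the middle item
median = value_at_position(values, freqs, median_position)

print(N, median_position, median)
```

The cumulative frequencies are 16, 40, 66, 96, 116, 122, so the first total that reaches the middle position is 66.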
Continuous data :
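If the data are given with class intervals, then the procedure is as follows :
Step 3 : Locate the middle item, N/2, in the cumulative frequency column and thus find the median class.
The median is then obtained by interpolation within the median class:
$\text{Median} = l + \frac{\frac{N}{2} - c}{f} \times h$
Where
N = Total frequency ,
l = lower limit of the median class ,
c = cumulative frequency of the class preceding the median class ,
f = frequency of the median class ,
h = width of the median class.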
Mode :
A third measure of central tendency is the mode. It is defined simply as the value which occurs
most often, that is, with the highest frequency.
Ungrouped data :
The calculation of the mode is very easy, since it depends only upon the frequencies. The data should
therefore be grouped into a discrete or continuous frequency distribution, and the item value with the
highest frequency is the mode.
Grouped data :
Discrete data :
In case of frequency distribution, one can find mode by inspection. The variate value having the
maximum frequency is the modal value.
Continuous data :
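If the data are given with class intervals, then the following formula is used for the calculation
of the mode :
$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$
Where
l = lower limit of the modal class ,
f1 = frequency of the modal class ,
f0 = frequency of the class preceding the modal class ,
f2 = frequency of the class succeeding the modal class ,
h = width of the modal class.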
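A small Python sketch of the grouped-mode formula follows, assuming the usual reading of the symbols (l is the lower limit of the modal class, f1 its frequency, f0 and f2 the frequencies of the neighbouring classes, h the class width). The function name is hypothetical and the data are the class intervals from the grouped mean deviation example of these notes.

```python
def grouped_mode(classes, freqs):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class.

    Assumes a single modal class; the denominator is zero when
    f0 = f1 = f2, a case this sketch does not handle.
    """
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # modal class index
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                   # preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0      # succeeding class
    lower, upper = classes[i]
    h = upper - lower                                   # class width
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * h

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [2, 3, 5, 6, 4, 1]

mode = grouped_mode(classes, freqs)
print(round(mode, 4))
```

Here the modal class is 30 to 40 (frequency 6), so the mode works out to 30 + (1/3) × 10.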
Measure of Dispersion :
Consider the series (i) 7, 8, 10, 11, 9 (ii) 3, 6, 9, 12, 15 (iii) 1, 5, 9, 13, 17. In all these cases the number of
observations is 5 and the mean is 9. If we are told only that the mean of a series of 5 observations is 9
(that is, their sum is 45), we cannot form an idea as to whether it is the average of series (i), (ii)
or (iii).
Thus we see that the measures of central tendency are inadequate to give us a complete idea
of the distribution.
The literal meaning of dispersion is ‘scatteredness’. We study dispersion to have an idea about the
variability or spread, that is, the homogeneity or heterogeneity, of the distribution.
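The point can be seen numerically with a short Python sketch. It computes the mean of each of the three series together with the difference between the largest and smallest value, used here as a simple measure of spread.

```python
series = {
    "(i)": [7, 8, 10, 11, 9],
    "(ii)": [3, 6, 9, 12, 15],
    "(iii)": [1, 5, 9, 13, 17],
}

summary = {}
for name, xs in series.items():
    mean = sum(xs) / len(xs)
    spread = max(xs) - min(xs)  # largest value minus smallest value
    summary[name] = (mean, spread)
    print(name, "mean =", mean, "spread =", spread)
```

All three series share the same mean, but their spreads differ, which is exactly the information an average alone does not carry.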
Range :
Range is the difference between the maximum (highest) and minimum (lowest) values in the data. It
is easy to calculate and understand.
Problem 1:
Range = maximum − minimum
= 14 − 5
Range = 9
Coefficient of range is
= (maximum − minimum)/(maximum + minimum)
= (14 − 5)/(14 + 5)
= 0.4737
Quartile Deviation :
It is defined as half the difference between the upper and lower quartiles. It is also called the
semi-interquartile range.
The quartiles divide the whole set of observations into four equal parts. The quartile deviation
is then QD = (Q3 – Q1)/2
Ungrouped data:
QD = (Q3 – Q1)/2, where Q1 is the (n+1)/4 th observation and Q3 is the 3(n+1)/4 th observation
of the data arranged in ascending order.
Sol :
With the given data arranged in ascending order,
QD = (Q3 – Q1)/2
Q1 = (n+1)/4 th observation
= 2nd observation
Q1 = 4
Q3 = 3(n+1)/4 th observation
= 3 × 2nd observation
= 6th observation
= 11
QD = (11 – 4)/2 = 3.5
Grouped data :
QD = (Q3 – Q1)/2
Where
Q1 : compute (N+1)/4, then look in the cumulative frequency column for the total which is either equal
to it or next higher; Q1 is the value of the variable corresponding to it.
Q3 : compute 3(N+1)/4, then look in the cumulative frequency column for the total which is either equal
to it or next higher; Q3 is the value of the variable corresponding to it.
Problem : From the following data find the value of the quartile deviation.
Sol :
QD = (Q3 – Q1)/2
Where Q1 = (N+1)/4 th observation
Q3 = 3(N+1)/4 th observation
x       f     cf
800     16    16
1000    24    40
1500    26    66
1800    30    96
2000    20    116
2500    6     122
Total   122
Q1 = (122+1)/4 = 30.75 th observation. The next higher cumulative frequency is 40, and the corresponding value is 1000.
Q1 = 1000
Q3 = 3(122+1)/4 = 92.25 th observation. The next higher cumulative frequency is 96, and the corresponding value is 1800.
Q3 = 1800
QD = (1800 – 1000)/2
= 400
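The worked solution above can be reproduced with a few lines of Python. This is only a sketch of the cumulative-frequency lookup; the helper name value_at_position is hypothetical.

```python
from itertools import accumulate

def value_at_position(values, freqs, position):
    """Return the first value whose cumulative frequency is >= position."""
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= position:
            return x
    raise ValueError("position exceeds the total frequency")

values = [800, 1000, 1500, 1800, 2000, 2500]
freqs = [16, 24, 26, 30, 20, 6]

N = sum(freqs)                                           # total frequency
q1 = value_at_position(values, freqs, (N + 1) / 4)       # position 30.75
q3 = value_at_position(values, freqs, 3 * (N + 1) / 4)   # position 92.25
qd = (q3 - q1) / 2

print(q1, q3, qd)
```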
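Problem : From the following data find the value of the quartile deviation.
Sol : QD = (Q3 – Q1)/2
Where Q1 = (N/4) th observation
Q3 = 3(N/4) th observation
Q1 = (100/4) th observation = 25 th observation
= 35 + [(25 – 15)/20] × 15
= 42.5
Q3 = 3(100/4) th observation = 75 th observation
= 60 + [(75 – 65)/30] × 15
= 67.5
Note: with the figures as printed, 60 + [(75 – 65)/30] × 15 evaluates to 65; the value 67.5 carried
forward below corresponds to a class frequency of 20 rather than 30.
QD = (67.5 – 42.5)/2
= 12.5
Coefficient of quartile deviation = (Q3 – Q1)/(Q3 + Q1)
= (67.5 – 42.5)/(67.5 + 42.5)
= 0.2273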
Mean Deviation :
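It is the arithmetic mean of the absolute deviations from the mean, median or mode. It is denoted by MD.
Ungrouped data:
$MD = \frac{1}{n} \sum_{i=1}^{n} |x_i - A|$
Grouped data :
$MD = \frac{1}{N} \sum_{i=1}^{n} f_i |x_i - A|$
where A is the mean, median or mode and N is the total frequency.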
Standard Deviation :
Standard deviation, usually denoted by σ, was first suggested by Karl Pearson. It is defined as the
positive square root of the arithmetic mean of the squares of the deviations of the given observations from
their arithmetic mean,
i.e. $\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
Coefficient of variation :
$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100$
Problem : From the following data regarding the number of members in 7 families, calculate the mean
deviation and its coefficient: 2, 5, 3, 6, 3, 4, 4.
Sol :
$MD = \frac{1}{n} \sum_{i=1}^{n} |x_i - A|$
where A is the median.
To calculate the median, arrange the given data in ascending order: 2, 3, 3, 4, 4, 5, 6.
The middle (4th) observation is 4, i.e. median = 4
X              2  5  3  6  3  4  4  Total
|X – Median|   2  1  1  2  1  0  0  7
MD = 7/7 = 1
Coefficient of MD = MD/median = 1/4 = 0.25
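The same calculation can be sketched in Python. The coefficient is taken here as the mean deviation divided by the median, which is an assumption about the definition intended in the problem.

```python
import statistics

data = [2, 5, 3, 6, 3, 4, 4]  # members in each of the 7 families

median = statistics.median(data)            # middle value of the sorted data
abs_dev = [abs(x - median) for x in data]   # |X - median| for each family
md = sum(abs_dev) / len(data)               # mean deviation about the median
coeff = md / median                         # assumed definition: MD / median

print(sorted(data), median)
print(abs_dev, sum(abs_dev))
print(md, coeff)
```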
Sol : $MD = \frac{1}{N} \sum f_i |x_i - A|$, where A is the arithmetic mean
$\bar{x} = \frac{1}{N} \sum_{i=1}^{n} f_i x_i$, where x_i = mid-value of the class interval
= 625/21 = 29.7619
Class      f    X    fX    X – x̄       |X – x̄|    f |X – x̄|
0 to 10    2    5    10    -24.7619    24.7619    49.5238
10 to 20   3    15   45    -14.7619    14.7619    44.2857
20 to 30   5    25   125   -4.7619     4.7619     23.8095
30 to 40   6    35   210   5.2381      5.2381     31.4286
40 to 50   4    45   180   15.2381     15.2381    60.9524
50 to 60   1    55   55    25.2381     25.2381    25.2381
Total      21        625                          235.2381
MD = 235.2381/21 = 11.2018
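The table above can be rebuilt with a short Python sketch, which is a convenient way to check each column. The variable names are illustrative.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [2, 3, 5, 6, 4, 1]

mids = [(lo + hi) / 2 for lo, hi in classes]       # X: class mid-values
N = sum(freqs)                                     # total frequency
total_fx = sum(f * x for f, x in zip(freqs, mids)) # sum of f * X
mean = total_fx / N

# f * |X - mean| for each class, then the mean deviation about the mean.
weighted_abs_dev = [f * abs(x - mean) for f, x in zip(freqs, mids)]
md = sum(weighted_abs_dev) / N

print(N, total_fx, round(mean, 4))
print([round(v, 4) for v in weighted_abs_dev])
print(round(sum(weighted_abs_dev), 4), round(md, 4))
```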
UNIT III
Theory Of Probability :
Exhaustive :
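Mutually Exclusive :
Events are said to be mutually exclusive or incompatible if the happening of any one
of them precludes the happening of all the others.
Equally Likely :
Outcomes of a trial are said to be equally likely if, taking into consideration all
the relevant evidence, there is no reason to expect one in preference to the others.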
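If a trial results in n exhaustive, mutually exclusive and equally likely cases and m of
them are favourable to the happening of an event E, then the probability p of happening
of E is given by
p = m/n
The probability that E does not happen is q = 1 – p,
i.e., 0 ≤ p ≤ 1 and 0 ≤ q ≤ 1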
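If a trial is repeated a number of times under essentially homogeneous and identical
conditions, then the limiting value of the ratio of the number of times the event
happens to the number of trials, as the number of trials becomes indefinitely large,
is called the probability of happening of the event. If the event happens m times in n trials,
$P(E) = \lim_{n \to \infty} \frac{m}{n}$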
P(A ∩ B) = P(A) × P(B | A)
= P(B) × P(A | B)
If A and B are independent events, then P(A | B) = P(A) and P(B | A) = P(B)
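The multiplication rule can be checked by brute-force enumeration. The following Python sketch uses a hypothetical example that is not part of the notes: two fair dice, with A = "first die is even", B = "the sum is 8" and C = "second die is even".

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable cases / total cases."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob(event, given):
    """Conditional probability of event given `given`."""
    return prob(lambda w: event(w) and given(w)) / prob(given)

def A(w): return w[0] % 2 == 0      # first die is even
def B(w): return w[0] + w[1] == 8   # sum of the dice is 8
def C(w): return w[1] % 2 == 0      # second die is even

p_ab = prob(lambda w: A(w) and B(w))
print(p_ab, prob(A) * cond_prob(B, A), prob(B) * cond_prob(A, B))

# A and C are independent: conditioning on C does not change P(A).
p_ac = prob(lambda w: A(w) and C(w))
print(cond_prob(A, C) == prob(A), p_ac == prob(A) * prob(C))
```

Exact fractions are used so that the two sides of each identity can be compared without rounding error.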
If A and B are independent event with P(A) >0 and P(B)>0 , then
Random Variable :
A random variable is a real number X associated with the outcomes of a random
experiment.
Distribution function :
$F_X(x) = P(X \le x)$
Properties :
(i) If F is the distribution function of the random variable X and if a < b , then
$P(a < X \le b) = F(b) - F(a)$
(ii) $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$
$F(\infty) = \lim_{x \to \infty} F(x) = 1$
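A small Python sketch can make the distribution function concrete. It uses a hypothetical example, the score X on one throw of a fair die, and checks the limiting values together with the standard interval identity P(a < X ≤ b) = F(b) − F(a).

```python
from fractions import Fraction

# Probability of each face of a fair die (illustrative example).
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def F(x):
    """Distribution function F(x) = P(X <= x)."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

print(F(0), F(3), F(6))   # below the range, in the middle, at the top

# P(2 < X <= 4) computed directly and as F(4) - F(2).
direct = sum(p for k, p in pmf.items() if 2 < k <= 4)
print(direct, F(4) - F(2))
```

For this variable F is 0 below the smallest value and 1 from the largest value onward, matching the limits stated above.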
If a random variable takes at most a countable number of values, it is called a discrete random
variable.
Suppose X is a one-dimensional discrete random variable taking at most a countable number
of values x1, x2, … with respective probabilities p1, p2, …. The function P(X = xi) = pi is called the
probability mass function (pmf) of X.
A random variable X is said to be continuous if it can take all possible values between certain
limits.
The probability density function of a random variable X, usually denoted by f(x), has the following
properties:
(i) $f(x) \ge 0$
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
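The two properties can be checked numerically for a simple density. The Python sketch below assumes the hypothetical density f(x) = 2x on [0, 1] and 0 elsewhere, and approximates the integral with the midpoint rule; neither the density nor the helper integrate comes from the notes.

```python
def f(x):
    """Illustrative density: 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(func, a, b, steps=1000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / steps
    return sum(func(a + (k + 0.5) * width) for k in range(steps)) * width

# Property (i): f(x) >= 0, checked on a grid of points from -1 to 2.
nonnegative = all(f(k / 100) >= 0 for k in range(-100, 201))

# Property (ii): total area under f is 1 (f vanishes outside [0, 1]).
total = integrate(f, 0.0, 1.0)

print(nonnegative, round(total, 6))
```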