Ii Sem Ba Notes

The document discusses variables for data analytics, categorizing them into categorical (nominal and ordinal) and numeric (discrete and continuous) types. It explains the characteristics and examples of each variable type, including their applications in data collection and analysis. Additionally, it outlines measures of central tendency, such as arithmetic mean, median, and mode, as essential statistical tools in data analytics.

Uploaded by

thanmaix

UNIT –I

VARIABLES FOR DATA ANALYTICS:

Types of variables
A variable is a characteristic that can be measured and that can
assume different values. Height, age, income, province or country of
birth, grades obtained at school and type of housing are all examples
of variables. Variables may be classified into two main categories:
categorical and numeric. Each category is then classified in two
subcategories: nominal or ordinal for categorical variables, discrete or
continuous for numeric variables. These types are briefly outlined in
this section.

Categorical variables
A categorical variable (also called a qualitative variable) refers to a
characteristic that cannot be quantified. Categorical variables can be
either nominal or ordinal.

Nominal variables
A nominal variable is one that describes a name, label or category
without a natural order. Sex and type of dwelling are examples of
nominal variables. In the table below, the variable “mode of
transportation for travel to work” is also nominal.

Method of travel to work for Canadians

Mode of transportation for travel to work    Number of people
Car, truck, van as driver                    9,929,470
Car, truck, van as passenger                   923,975
Public transit                               1,406,585
Walked                                         881,085
Bicycle                                        162,910
Other methods                                  146,835

Ordinal variables
An ordinal variable is a variable whose values are defined by an order
relation between the different categories. In the table below, the
variable “behaviour” is ordinal because the category “Excellent” is
better than the category “Very good,” which is better than the category
“Good,” and so on. There is a natural ordering, but it is limited, since
we do not know by how much “Excellent” behaviour is better than
“Very good” behaviour.

Student behaviour ranking

Behaviour    Number of students
Excellent     5
Very good    12
Good         10
Bad           2
Very bad      1

It is important to note that even if categorical variables are not
quantifiable, they can appear as numbers in a data set.
Correspondence between these numbers and the categories is
established during data coding. To be able to identify the type of
variable, it is important to have access to the metadata (the data about
the data) that should include the code set used for each categorical
variable. For instance, categories used in Table 4.2.2 could appear as
a number from 1 to 5: 1 for “very bad,” 2 for “bad,” 3 for “good,” 4 for
“very good” and 5 for “excellent.”
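
As a minimal sketch of data coding, the snippet below stores the code set as a dictionary and converts between labels and codes. The names `BEHAVIOUR_CODES`, `encode` and `decode` are hypothetical and chosen only for illustration; the 1 to 5 mapping is the one described above.

```python
# Hypothetical code set for the "behaviour" variable (metadata for the data set).
BEHAVIOUR_CODES = {
    "very bad": 1,
    "bad": 2,
    "good": 3,
    "very good": 4,
    "excellent": 5,
}


def encode(labels):
    """Map category labels to their integer codes (case-insensitive)."""
    return [BEHAVIOUR_CODES[label.lower()] for label in labels]


def decode(codes):
    """Map integer codes back to category labels using the same code set."""
    reverse = {code: label for label, code in BEHAVIOUR_CODES.items()}
    return [reverse[code] for code in codes]


observed = ["Excellent", "Good", "Very good", "Bad"]
coded = encode(observed)
print(coded)          # [5, 3, 4, 2]
print(decode(coded))  # ['excellent', 'good', 'very good', 'bad']
```

The codes preserve the ordering of the categories, but the distance between two codes carries no meaning, so averaging them would not be a valid operation on an ordinal variable.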

Numeric variables
A numeric variable (also called a quantitative variable) is a quantifiable
characteristic whose values are numbers (except numbers which are
codes standing in for categories). Numeric variables may be either
continuous or discrete.

Continuous variables
A variable is said to be continuous if it can assume an infinite number
of real values within a given interval. For instance, consider the height
of a student. The height cannot take just any value: it cannot be
negative and it cannot be higher than three metres. But between 0 and
3 metres, the number of possible values is theoretically infinite. A
student may be 1.6321748755… metres tall. In practice, the methods
used and the accuracy of the measurement instrument restrict the
precision of the variable. The reported height would be rounded to the
nearest centimetre, so it would be 1.63 metres. Age is another example
of a continuous variable, one that is typically rounded down.

Discrete variables
As opposed to a continuous variable, a discrete variable can assume
only a finite number of real values within a given interval. An example
of a discrete variable would be the score given by a judge to a
gymnast in competition: the range is 0 to 10 and the score is always
given to one decimal (e.g. a score of 8.5). You can enumerate all
possible values (0, 0.1, 0.2, …, 10) and see that the number of possible
values is finite: there are 101 of them. Another example of a discrete
variable is the number of people in a household, for households of size
20 or less. The number of possible values is 20, because a household
cannot include a fractional number of people such as 2.27.

Categorical data vs Numerical data

What is categorical data?
Categorical data refers to a data type that can be stored and identified
based on the names or labels given to them. A process called matching
is done, to draw out the similarities or relations between the data and
then they are grouped accordingly.

Data collected in categorical form is also known as qualitative data.
Each observation is grouped and labelled according to its matching
qualities, under only one category. This makes the categories mutually
exclusive.
Example: sexual orientation is categorical data, as a person can be
heterosexual, homosexual, bisexual, etc., and people are grouped
together according to the characteristic they share.

There are two subtypes of categorical data namely: Nominal data and
Ordinal data.

• Nominal data – this is also called naming data. It names or labels
the data, and its characteristics are similar to those of a noun.
Example: person’s name, gender, school name.
Questions to gather nominal data look like:

• What is your name?
• What is your pet’s name?
• What is your gender?

• Ordinal data – this includes data that is ranked, ordered or placed
on a rating scale. You can count and order ordinal data, but you
cannot measure the distance between its values.
Example: seminar attendees are asked to rate their seminar experience
on a scale of 1-5. Each number corresponds to a satisfaction level such
as “very good, good, average, bad, and very bad”.

What is numerical data?
Numerical data refers to data that is in the form of numbers, and not
in any language or descriptive form. Often referred to as quantitative
data, numerical data is collected in number form and differs from
numbers used merely as labels in that statistical and arithmetic
operations can meaningfully be performed on it.
It doesn’t involve any natural language description and is quantitative in
nature and it is used to measure quantities like a person’s height, age,
IQ, etc.

It also has two subtypes known as Discrete data and Continuous data.

• Discrete data – Discrete data is used to represent countable items.
Its possible values can be written out as a list, and this list can be
finite or infinite.

Discrete data takes countable values such as 1, 2, 3, 4, 5, and so on.
In the infinite case, these numbers keep going without end.

Example: the number of sugar cubes in a jar has a finite list of possible
values. A count with no fixed upper limit (0, 1, 2, 3, …) is countably
infinite.

• Continuous data – As the name says, this form takes values in
intervals, or simply ranges. Continuous numerical data represent
measurements, and their values can fall anywhere on a number line.
Hence, it does not involve counting items.

Example: in a school exam, the percentage scored is a measurement.
Students who score 80%-100% are awarded distinction, 60%-80% first
class, and below 60% second class.

Continuous data is further divided into two categories: Interval and Ratio.

• Interval data – interval data is measured along a scale whose points
are at equal distances from each other, but whose zero point is
arbitrary. The numerical values in this data type can only
meaningfully undergo addition and subtraction. Example: body
temperature can be measured in degrees Celsius or degrees
Fahrenheit, and on neither scale does 0 mean the absence of
temperature.

• Ratio data – unlike interval data, ratio data has a true zero point.
This zero point is the only difference between the two types.
Example: temperature measured in kelvin has a true zero point
(absolute zero).

Differences between Categorical data and Numerical data

Definition
Categorical data: Categorical data refers to a data type that can be stored and identified based on the names or labels given to them.
Numerical data: Numerical data refers to the data that is in the form of numbers, and not in any language or descriptive form.

Alias
Categorical data: Also known as qualitative data as it qualifies data before classifying it.
Numerical data: Also known as quantitative data as it represents quantitative values to perform arithmetic operations on them.

Examples
Categorical data: What is your gender?
• Male
• Female
• Other
Numerical data: What is your test score out of 20?
• Below 5
• 5-10
• 10-15
• 15-20
• 20

Types
Categorical data: Nominal data and Ordinal data.
Numerical data: Discrete data and Continuous data.

Characteristics
Categorical data:
• No order scale
• Natural language description
• Can take numerical values but with qualitative properties
• Can be visualized using bar charts and pie charts
Numerical data:
• Has an ordered scale
• No use of natural language description
• Takes numeric values with numeric qualities
• Can be visualized using bar charts and pie charts

User-friendly design
Categorical data: Can include long surveys and has a chance of pushing respondents away.
Numerical data: Survey interaction is easy and short, hence fewer survey abandonment issues.

Data collection method
Categorical data: Nominal data: open-ended questions. Ordinal data: multiple-choice questions.
Numerical data: Mostly collected through multiple-choice questions and sometimes through open-ended questions.

Data collection tools
Categorical data: Questionnaires, surveys, and interviews.
Numerical data: Questionnaires, surveys, interviews, focus groups and observations.

Analysis and interpretation
Categorical data: Median and mode. E.g. univariate statistics, bivariate statistics, regression analysis.
Numerical data: Descriptive and inferential statistics. E.g. measures of central tendency, turf analysis, text analysis, conjoint analysis, trend analysis.

Uses
Categorical data: Used when a study requires respondents’ personal information, opinions and experiences. Commonly used in business research.
Numerical data: Used for statistical calculations as a result of the potential performance of arithmetic operations.

Compatibility
Categorical data: It is not compatible with most statistical analysis methods, hence researchers avoid using it most of the time.
Numerical data: It is compatible with most statistical calculation methods.

Visualization
Categorical data: Can be visualized using only bar graphs and pie charts.
Numerical data: Can be visualized using bar graphs, pie charts as well as scatter plots.

Structure
Categorical data: Is known as unstructured or semi-structured data. It can use indexing methods to structure data like Google, Bing, etc.
Numerical data: It is structured data and can be quickly organized and made sense of.

Nominal vs Ordinal Scales: Points of Difference

In any business, knowledge of the different measurement scales is a
prerequisite, as it allows owners to make well-informed, statistically
grounded decisions. Every measurement scale has a unique degree of detail to
offer: the nominal scale offers the most basic detail and the ratio scale
offers the maximum detail.

Factors: Nominal Scale vs Ordinal Scale

Description
Nominal Scale: The variables of this scale are differentiated by their nomenclature and no other factors. There is no implied sequence in which variables exist in nominal scale.
Ordinal Scale: These variables have a naturally o… order present between them yet th… difference between variables is un… The value of difference betwe… variables on this scale cannot … calculated. For instance, the o… size is small, medium, large, e… large. But Small – Medium ≠ L… Extra Large.

Degree of Quantitative Value
Nominal Scale: There is no quantitative value associated with variables on this scale. Instead, it is a qualitative measurement scale.
Ordinal Scale: Quantitative values are linked to o… variables but arithmetic evaluation… be conducted on these variables.

Key Differentiators
Nominal Scale:
• These variables cannot be ordered.
• The variables of this scale are distinct.
• Nominal data is not quantifiable.
Ordinal Scale:
• Numbers are assigned to th… variables of this scale.
• No arithmetic calculation ca… done on these variables.
• The difference between vari… cannot be calculated.

Examples
Nominal Scale:
• Sex (Male, Female)
• Marital Status (Married, Divorced, Unmarried, Widowed etc.)
• Religion (Christian, Jew, Muslim)
• Race (Native American, South-east Asian etc.)
Ordinal Scale:
• Rank in a class test (first, se… or third)
• Customer satisfaction rating… scale of 0-10)
• Socio-economic status
• Customer satisfaction degre… (Very satisfied, satisfied, ne… dissatisfied, very dissatisfied…
• Education qualification

Interval scale Vs Ratio scale: Points of difference

Variable property
Interval scale: All variables measured in an interval scale can be added, subtracted, and multiplied. You cannot calculate a ratio between them.
Ratio scale: Ratio scale has all the characte… interval scale, in addition, to be… calculate ratios. That is, you ca… numbers on the scale against 0…

Absolute Point Zero
Interval scale: Zero-point in an interval scale is arbitrary. For example, the temperature can be below 0 degrees Celsius and into negative temperatures.
Ratio scale: The ratio scale has an absolute… character of origin. Height and… cannot be zero or below zero.

Calculation
Interval scale: Statistically, in an interval scale, the arithmetic mean is calculated.
Ratio scale: Statistically, in a ratio scale, the… or harmonic mean is calculated.

Measurement
Interval scale: Interval scale can measure size and magnitude as multiple factors of a defined unit.
Ratio scale: Ratio scale can measure size a… magnitude as a factor of one d… terms of another.

Example
Interval scale: A classic example of an interval scale is the temperature in Celsius. The difference in temperature between 50 degrees and 60 degrees is 10 degrees; this is the same difference as between 70 degrees and 80 degrees.
Ratio scale: Classic examples of a ratio sca… variable that possesses an abs… characteristic, like age, weight, … sales figures.

Discrete Data Vs Continuous Data

BASIS FOR COMPARISON        DISCRETE DATA                          CONTINUOUS DATA
Meaning                     Discrete data is one that has clear    Continuous data is one that falls on
                            spaces between values.                 a continuous sequence.
Nature                      Countable                              Measurable
Values                      It can take only distinct or           It can take any value in some
                            separate values.                       interval.
Graphical representation    Bar graph                              Histogram
Tabulation is known as      Ungrouped frequency distribution.      Grouped frequency distribution.
Classification              Mutually inclusive                     Mutually exclusive
Function graph              Shows isolated points                  Shows connected points
Example                     Days of the week                       Market price of a product

UNIT II: ESSENTIAL STATISTICS FOR DATA ANALYTICS:

Measure of Central Tendency :

A measure of central tendency gives an idea about the central part of the distribution. The
following five measures of central tendency, or measures of location, are commonly used:

(i) Arithmetic Mean (AM)

(ii) Median

(iii) Mode

(iv) Geometric Mean (GM)

(v) Harmonic Mean (HM)

Arithmetic Mean :
Arithmetic mean of a given set of observations is their sum divided by the number of
observations. It is denoted by $\bar{x}$ and is given as follows.

Ungrouped data (or) individual data :

Let $x_1, x_2, \ldots, x_n$ be the given n observations. Then the arithmetic mean is

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Step deviation method:

Arithmetic mean is

$$\bar{x} = A + \frac{1}{n}\sum_{i=1}^{n} d_i$$

where $d_i = x_i - A$ and A is any conveniently chosen constant.

Grouped data:

Discrete data: Let $x_1, x_2, \ldots, x_n$ be the given n observations with corresponding
frequencies $f_1, f_2, \ldots, f_n$. Then the arithmetic mean is

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$$

where $N = \sum_{i=1}^{n} f_i$ is the total frequency.

Step deviation method :

$$\bar{x} = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i$$

where $d_i = x_i - A$ and A is any constant.

Continuous data:

In continuous data, the frequencies are given along with the values of the variable in the
form of class intervals. Then the arithmetic mean is

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$$

where $x_i$ is the mid-value of the i-th class interval.

Step deviation method :

$$\bar{x} = A + \frac{h}{N}\sum_{i=1}^{n} f_i d_i, \qquad d_i = \frac{x_i - A}{h}$$

where A is any constant and h is the width of the class interval.
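
The direct and step deviation formulas for continuous data can be checked against each other with a short sketch. The class intervals and frequencies below are illustrative, and the choices A = 25 and h = 10 are assumptions made for this example; any constant A gives the same mean.

```python
# Illustrative class intervals (lower, upper) and their frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [2, 3, 5, 6, 4, 1]

mids = [(lo + hi) / 2 for lo, hi in classes]  # x_i: mid-value of each class
N = sum(freqs)                                # total frequency

# Direct method: mean = (1/N) * sum(f_i * x_i)
mean_direct = sum(f * x for f, x in zip(freqs, mids)) / N

# Step deviation method: d_i = (x_i - A) / h, mean = A + (h/N) * sum(f_i * d_i)
A = 25.0  # assumed constant
h = 10.0  # common class width
d = [(x - A) / h for x in mids]
mean_step = A + h * sum(f * di for f, di in zip(freqs, d)) / N

print(round(mean_direct, 4))  # 29.7619
print(round(mean_step, 4))    # 29.7619
```

Both methods agree because the step deviation formula is only an algebraic rearrangement of the direct one; it was designed to keep hand calculations small.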

Median :
A second measure of location that may be used to describe the “center” or “middle” of a set of data is
called the median. It is defined as the value of the middle item when the items are arranged in increasing
or decreasing order of magnitude.

Ungrouped data : If the data set contains an odd number of items, the middle item of the array is the median.
If the number of observations n is odd, then the value of the $\left(\frac{n+1}{2}\right)$th observation gives the median.

If there is an even number of observations, then the median is the average of the two middle items.

If the number of observations n is even, then

$$\text{Median} = \frac{\left(\frac{n}{2}\right)\text{th observation} + \left(\frac{n}{2}+1\right)\text{th observation}}{2}$$
Grouped data :

Discrete data :

Consider a series where the data are arranged in the form of a frequency distribution.

Suppose the ordered values $x_1, x_2, \ldots, x_n$ have corresponding frequencies $f_1, f_2, \ldots, f_n$. The
median is then found with the following steps:

Step 1 : Arrange the given data in ascending or descending order of magnitude.

Step 2 : Find the cumulative frequencies.

Step 3 : Apply the formula: Median = size of the $\left(\frac{N+1}{2}\right)$th item, where N is the total frequency.

Step 4 : Look at the cumulative frequency column and find the total that is either equal to $\frac{N+1}{2}$
or the next higher value, and determine the value of the variable corresponding to it. That gives the value
of the median.

Continuous data :

If the data are given with class intervals, then the following procedure is used:

Step 1 : Convert the frequencies into cumulative frequencies (Cf).

Step 2 : Find the middle item by using the formula N/2.

Step 3 : Locate the middle item in the cumulative frequency column and thus find the median class.

Step 4 : Use the following formula to locate the median:

$$\text{Median} = l + \frac{\frac{N}{2} - C_f}{f} \times C$$

Where

l = lower limit of the median class,

N = total frequency,

Cf = cumulative frequency of the class preceding the median class,

f = frequency of the median class,

C = width of the class interval of the median class.
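
The four steps above can be sketched as a small function. The name `grouped_median` and the sample distribution are hypothetical, chosen only to illustrate the formula.

```python
def grouped_median(classes, freqs):
    """Median of a continuous frequency distribution.

    classes: list of (lower, upper) class limits in ascending order.
    freqs:   list of class frequencies, one per class.
    """
    N = sum(freqs)
    half = N / 2          # Step 2: position of the middle item
    cum = 0               # cumulative frequency of the preceding classes (Cf)
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cum + f >= half:
            # Step 3 found the median class; Step 4 applies l + ((N/2 - Cf) / f) * C
            return lower + (half - cum) / f * (upper - lower)
        cum += f
    raise ValueError("distribution has no observations")


# Illustrative data: N = 21, so N/2 = 10.5 falls in the class 30-40.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [2, 3, 5, 6, 4, 1]
median = grouped_median(classes, freqs)
print(round(median, 4))  # 30.8333
```

Here Cf = 10 and f = 6 for the median class, so the median is 30 + (0.5 / 6) × 10.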

Mode :
A third measure of central tendency is the mode. It is defined simply as the value which occurs
most often, that is, the value with the highest frequency.

Ungrouped data :

The calculation of the mode is very easy. It depends upon the frequencies. The data, therefore,
should be grouped in discrete or continuous form, and the item value with the highest frequency is
the mode.

Grouped data :

Discrete data :

In the case of a frequency distribution, one can find the mode by inspection. The variate value having the
maximum frequency is the modal value.

Continuous data :

If the data are given with class intervals, then the following formula is used for the calculation
of the mode:

$$\text{Mode} = l + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h$$

Where

l is the lower limit of the modal class,

f1 is the frequency of the modal class,

f0 is the frequency of the class preceding the modal class,

f2 is the frequency of the class succeeding the modal class,

h is the width of the modal class.
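
A minimal sketch of the grouped mode formula follows. The function name `grouped_mode` and the data are hypothetical; treating a missing neighbouring class as frequency 0 and using the first class when several tie for the highest frequency are assumptions of this sketch, not rules stated in the notes.

```python
def grouped_mode(classes, freqs):
    """Mode of a continuous frequency distribution via the interpolation formula."""
    # Index of the modal class (first class with the highest frequency).
    i = max(range(len(freqs)), key=freqs.__getitem__)
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # class preceding the modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # class succeeding the modal class
    lower, upper = classes[i]
    h = upper - lower                               # width of the modal class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula is undefined when f1 = f0 = f2")
    return lower + (f1 - f0) / denominator * h


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [2, 3, 5, 6, 4, 1]
mode = grouped_mode(classes, freqs)
print(round(mode, 4))  # 33.3333
```

The modal class is 30-40 with f1 = 6, f0 = 5 and f2 = 4, giving 30 + (1 / 3) × 10.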

Measure of Dispersion :

A measure of central tendency gives us an idea about the concentration of the observations
about the central part of the distribution. If we know the average alone, we cannot form a complete idea
about the distribution.

Consider the series (i) 7, 8, 10, 11, 9 (ii) 3, 6, 9, 12, 15 (iii) 1, 5, 9, 13, 17. In all these cases the number of
observations is 5, the sum is 45 and the mean is 9. Given only the mean, we cannot tell whether it is the
average of series I, series II or series III.

Thus we see that measures of central tendency are inadequate to give us a complete idea
of the distribution.
The literal meaning of dispersion is “scatteredness”. We study dispersion to get an idea of the
variability or spread, that is, the homogeneity or heterogeneity, of the distribution.

Characteristics of measure of dispersion :

1) It should be rigidly defined.

2) It should be easy to calculate and easy to understand.

3) It should be based on all the observations.

4) It should be amenable to further mathematical treatment.

5) It should be affected as little as possible by sampling fluctuations.

The main measures of dispersion are:

1) Range 2) Quartile deviation 3) Mean deviation 4) Standard deviation 5) Coefficient of variation

Other measures of dispersion are:

6) Coefficient of quartile deviation 7) Coefficient of mean deviation

Range :

Range is the difference between the maximum (highest) and minimum (lowest) values in the data. It
is easy to calculate and understand.

Range = Maximum value – Minimum value.

Problem 1:

Find the range for the observations (in Rs) 10, 8, 5, 10, 9, 14, 7.

Sol : Given data: 10, 8, 5, 10, 9, 14, 7

Here the maximum value is 14 and the minimum value is 5.

Range = Maximum value – Minimum value

= 14 – 5

Range = 9

Coefficient of range is

C.R = (L – S)/(L + S), where L is the largest value and S is the smallest value

= (14 – 5)/(14 + 5)
= 0.4737

Quartile Deviation (QD):

It is defined as half the difference between the upper and lower quartiles. It is also called the
semi-interquartile range.

Quartiles divide the whole set of observations into four equal parts. The quartile deviation
is then QD = (Q3 – Q1)/2.

Ungrouped data:

QD = (Q3 – Q1)/2

Where Q1 = value of the (N+1)/4 th observation in the arranged data,

Q3 = value of the 3(N+1)/4 th observation in the arranged data.

Problem 1 : Find the quartile deviation for the observations 8, 2, 4, 15, 11, 10, 9.

Sol :

Arranging the given data in ascending order, we get

2, 4, 8, 9, 10, 11, 15 and N = 7

QD = (Q3 – Q1)/2

Where Q1 = value of the (N+1)/4 th observation in the arranged data

= value of the (7+1)/4 th observation in the arranged data

= 2nd observation

Q1 = 4

Q3 = value of the 3(N+1)/4 th observation in the arranged data

= value of the 3 × 2 th observation

= 6th observation

Q3 = 11

QD = (11 – 4)/2 = 3.5
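
The (N+1)/4 position rule can be sketched in code as follows. The function name `quartile_deviation` is hypothetical, and the linear interpolation used when a position is fractional is an assumption of this sketch, since the notes only show a case where the positions are whole numbers.

```python
def quartile_deviation(values):
    """Return (Q1, Q3, QD) for ungrouped data using the (N+1)/4 position rule."""
    data = sorted(values)
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")

    def value_at(position):
        """Value at a 1-based position; interpolate linearly if it is fractional."""
        k = int(position)
        fraction = position - k
        if fraction == 0:
            return data[k - 1]
        return data[k - 1] + fraction * (data[k] - data[k - 1])

    q1 = value_at((n + 1) / 4)
    q3 = value_at(3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2


q1, q3, qd = quartile_deviation([8, 2, 4, 15, 11, 10, 9])
print(q1, q3, qd)  # 4 11 3.5
```

For the seven observations of the problem, the positions are 2 and 6, so Q1 = 4, Q3 = 11 and QD = (11 – 4)/2 = 3.5.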

Grouped data :

If $x_i / f_i$, i = 1, 2, …, n, is the grouped frequency distribution, then the quartile deviation is

QD = (Q3 – Q1)/2

Where

Q1 = value of the (N+1)/4 th item: look in the cumulative frequency column for the total that is either
equal to (N+1)/4 or the next higher value, and take the value of the variable corresponding to it.

Q3 = value of the 3(N+1)/4 th item: look in the cumulative frequency column for the total that is either
equal to 3(N+1)/4 or the next higher value, and take the value of the variable corresponding to it.

Problem : From the following data find the value of the quartile deviation.

Income (in Rs)    1000   1500   800   2000   2500   1800
No. of persons      24     26    16     20      6     30

Sol :

QD = (Q3 – Q1)/2

Where Q1 = value of the (N+1)/4 th observation

Q3 = value of the 3(N+1)/4 th observation

First arrange the given data in ascending order. We get

x        f     Cf
800      16    16
1000     24    40
1500     26    66
1800     30    96
2000     20    116
2500     6     122
Total    122

Q1 position = (122+1)/4 = 30.75. The next higher cumulative frequency is 40, and the corresponding observation is 1000.

Q1 = 1000

Q3 position = 3(122+1)/4 = 92.25. The next higher cumulative frequency is 96, and the corresponding observation is 1800.

Q3 = 1800

QD = (1800 – 1000)/2
= 400
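
The cumulative frequency lookup in this solution can be sketched as below. The helper name `value_at_position` is hypothetical; the income data is the one given in the problem.

```python
from itertools import accumulate


def value_at_position(values, freqs, position):
    """Return the first value whose cumulative frequency is >= position.

    values must be sorted in ascending order, with freqs aligned to them.
    """
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= position:
            return x
    raise ValueError("position exceeds total frequency")


incomes = [1000, 1500, 800, 2000, 2500, 1800]
persons = [24, 26, 16, 20, 6, 30]

# Arrange the data in ascending order of income, keeping frequencies aligned.
pairs = sorted(zip(incomes, persons))
values = [x for x, _ in pairs]
freqs = [f for _, f in pairs]

N = sum(freqs)                                       # 122
q1 = value_at_position(values, freqs, (N + 1) / 4)   # position 30.75
q3 = value_at_position(values, freqs, 3 * (N + 1) / 4)  # position 92.25
qd = (q3 - q1) / 2
print(q1, q3, qd)  # 1000 1800 400.0
```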

Problem : From the following data find the value of the quartile deviation.

Marks              Less than 35   35-50   50-60   60-75   75 and above
No. of students    15             20      30      30      5

Sol : QD = (Q3 – Q1)/2

Where Q1 = value of the (N/4) th observation

Q3 = value of the 3(N/4) th observation

Marks           No. of students    Cf
Less than 35    15                 15
35-50           20                 35
50-60           30                 65
60-75           30                 95
75 and above    5                  100
Total           100

Q1 = value of the (N/4) th observation

= value of the (100/4) th observation = 25th observation

The 25th observation lies in the class 35-50, therefore this is taken as the Q1 class.

$$Q_1 = l_1 + \frac{\frac{N}{4} - C_{f1}}{f_1} \times c_1$$

= 35 + [(25 – 15)/20] × 15

= 42.5

Q3 = value of the 3(N/4) th observation

= value of the 3(100/4) th observation = 75th observation

The 75th observation lies in the class 60-75, therefore this is taken as the Q3 class.

$$Q_3 = l_3 + \frac{\frac{3N}{4} - C_{f3}}{f_3} \times c_3$$

= 60 + [(75 – 65)/30] × 15
= 65

QD = (65 – 42.5)/2

= 11.25

Coefficient of quartile deviation is :

C.Q.D = (Q3 – Q1)/(Q3 + Q1)

= (65 – 42.5)/(65 + 42.5)

= 0.2093
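
Evaluating the interpolation formula on the marks table can be sketched as follows. The function name `grouped_quantile` is hypothetical, and the outer limits 0 and 100 for the two open-ended classes are assumptions made only so the table can be stored; they do not affect the result because neither quartile falls in those classes.

```python
def grouped_quantile(classes, freqs, p):
    """Quantile of a continuous frequency distribution: l + ((p*N - Cf) / f) * c."""
    N = sum(freqs)
    target = p * N        # N/4 for Q1, 3N/4 for Q3
    cum = 0               # cumulative frequency of the preceding classes (Cf)
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cum + f >= target:
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("distribution has no observations")


# Assumed limits 0 and 100 stand in for "less than 35" and "75 and above".
classes = [(0, 35), (35, 50), (50, 60), (60, 75), (75, 100)]
freqs = [15, 20, 30, 30, 5]

q1 = grouped_quantile(classes, freqs, 0.25)  # 35 + (25 - 15)/20 * 15
q3 = grouped_quantile(classes, freqs, 0.75)  # 60 + (75 - 65)/30 * 15
qd = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2), round(qd, 2), round(coefficient, 4))
# 42.5 65.0 11.25 0.2093
```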

Mean Deviation :

It is the arithmetic mean of the absolute deviations from the mean, median or mode. It is denoted by MD.

Ungrouped data:

If $x_i$, i = 1, 2, …, n, are the n observations, then the mean deviation is :

$$MD = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|$$

Where A is either the mean, the median or the mode.

Grouped data :

If $x_i / f_i$, i = 1, 2, …, n, is the grouped frequency distribution, then the mean deviation is :

$$MD = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - A|$$

Where A is either the mean, the median or the mode, and N is the total frequency.

Coefficient of mean deviation is :

C.M.D = Mean deviation / (mean or median)

Standard Deviation :

Standard deviation, usually denoted by σ, was first suggested by Karl Pearson. It is defined as the
positive square root of the arithmetic mean of the squares of the deviations of the given observations from
their arithmetic mean.

That is, if $x_i$, i = 1, 2, …, n, are the n observations, then the SD is given by

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \sigma = +\sqrt{\sigma^2}$$

For grouped data :

If $x_i / f_i$, i = 1, 2, …, n, is the grouped frequency distribution, then the SD is given by

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2$$

Coefficient of variation :

$$C_v = \frac{\text{Standard deviation}}{\text{Mean}} \times 100$$
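
As a short illustration, the sketch below computes the standard deviation and coefficient of variation for the three series used earlier to motivate dispersion (each has mean 9). It uses `statistics.pstdev` from the Python standard library, which divides by n as in the formula above; the dictionary names are chosen only for this example.

```python
from statistics import mean, pstdev

series = {
    "I": [7, 8, 10, 11, 9],
    "II": [3, 6, 9, 12, 15],
    "III": [1, 5, 9, 13, 17],
}

results = {}
for name, xs in series.items():
    m = mean(xs)
    sd = pstdev(xs)      # population SD: sqrt((1/n) * sum((x - mean)^2))
    cv = sd / m * 100    # coefficient of variation, in percent
    results[name] = (m, sd, cv)
    print(name, m, round(sd, 4), round(cv, 2))
# I 9 1.4142 15.71
# II 9 4.2426 47.14
# III 9 5.6569 62.85
```

All three series share the same mean, yet the standard deviation grows from series I to series III, which is exactly the information the average alone cannot convey.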

Problem : From the following data regarding the number of members in 7 families, calculate the mean
deviation and its coefficient: 2, 5, 3, 6, 3, 4, 4.

Sol :

$$MD = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|$$

Where A is the median.

To calculate the median, arrange the given data in ascending order. We get

2, 3, 3, 4, 4, 5, 6 and n = 7, i.e. an odd number

Median = size of the (n+1)/2 th observation = (7+1)/2 = 4th observation

i.e. median = 4

X               2   5   3   6   3   4   4   Total
|X – Median|    2   1   1   2   1   0   0   7

Mean Deviation (MD) = 7/7 = 1

Coefficient of Mean Deviation (CMD) = (1/4) × 100 = 25

Problem : Calculate the mean deviation for the following data.

Class    0-10   10-20   20-30   30-40   40-50   50-60
f        2      3       5       6       4       1

Sol :

$$MD = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - A|$$

where A is the arithmetic mean, and

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$$

where $x_i$ is the mid-value of the class interval.

Mean = 625/21 = 29.7619

Class       f     X     fX     X – X̄       |X – X̄|    f |X – X̄|
0 to 10     2     5     10     –24.7619    24.7619    49.5238
10 to 20    3     15    45     –14.7619    14.7619    44.2857
20 to 30    5     25    125    –4.7619     4.7619     23.8095
30 to 40    6     35    210    5.2381      5.2381     31.4286
40 to 50    4     45    180    15.2381     15.2381    60.9524
50 to 60    1     55    55     25.2381     25.2381    25.2381
Total       21          625                           235.2381

MD = 235.2381/21 = 11.2018

Coefficient of mean deviation is

CMD = (MD/Mean) × 100 = (11.2018/29.7619) × 100 = 37.6381
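
The table calculation can be reproduced with a few lines of code. This is a minimal sketch using the class data of the problem; the variable names are chosen only for illustration.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [2, 3, 5, 6, 4, 1]

mids = [(lo + hi) / 2 for lo, hi in classes]  # X: mid-value of each class
N = sum(freqs)                                # total frequency

# Arithmetic mean of the grouped data.
x_bar = sum(f * x for f, x in zip(freqs, mids)) / N

# Mean deviation about the mean: (1/N) * sum(f * |X - mean|)
md = sum(f * abs(x - x_bar) for f, x in zip(freqs, mids)) / N

# Coefficient of mean deviation, in percent.
cmd = md / x_bar * 100

print(round(x_bar, 4), round(md, 4), round(cmd, 4))  # 29.7619 11.2018 37.6381
```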

UNIT III

PROBABILITY FOR DATA ANALYTICS

Theory Of Probability :

Exhaustive events :

The total number of possible outcomes in any trial is known as the exhaustive events (or exhaustive cases).

Mutually Exclusive :

Events are said to be mutually exclusive or incompatible if the happening of any one
of them precludes the happening of all the others.

Equally Likely :

Outcomes of a trial are said to be equally likely if, taking into consideration all
relevant evidence, there is no reason to expect one in preference to the others.

Mathematical or Classical or A Priori Probability :
If a trial results in ‘n’ exhaustive, mutually exclusive and equally likely cases and ‘m’ of
them are favourable to the happening of an event E, then the probability p of the happening
of E is given by

p = P(E) = m/n = Favourable number of cases / Exhaustive number of cases

The probability q that E does not happen is q = 1 – p. Obviously p and q are non-negative and cannot exceed unity,

i.e., 0 ≤ p ≤ 1 and 0 ≤ q ≤ 1

If P(E) = 1, E is called a certain event,

and if P(E) = 0, E is called an impossible event.

Statistical or Empirical Probability :

According to Richard von Mises:

If a trial is repeated a number of times under essentially homogeneous and identical
conditions, then the limiting value of the ratio of the number of times the event
happens to the number of trials, as the number of trials becomes indefinitely large,
is called the probability of happening of the event,

i.e., $P(E) = \lim_{n \to \infty} \frac{m}{n}$

S.No.   Statement                                    Meaning in terms of set theory
1       At least one of the events A or B occurs     A ∪ B
2       Both the events A and B occur                A ∩ B
3       Neither A nor B occurs                       Ā ∩ B̄
4       Event A occurs and B does not occur          A ∩ B̄
5       Events A and B are mutually exclusive        A ∩ B = ∅

The probability of the impossible event is zero.

The probability of the complementary event Ā of A is P(Ā) = 1 – P(A).

For two events A and B, P(Ā ∩ B) = P(B) – P(A ∩ B).

For two events A and B, P(A ∩ B̄) = P(A) – P(A ∩ B).

The probability of the union of any two events A and B is given by

P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

Multiplication Law of probability :

For any two events A and B,

P(A ∩ B) = P(A) × P(B/A), provided P(A) > 0

= P(B) × P(A/B), provided P(B) > 0

If A and B are independent events, then P(A/B) = P(A) and P(B/A) = P(B).

If A and B are independent events with P(A) > 0 and P(B) > 0, then

P(A ∩ B) = P(A) P(B) ≠ 0

so A and B cannot be mutually exclusive, since mutually exclusive events have P(A ∩ B) = 0.

Hence two independent events with non-zero probabilities cannot be mutually disjoint.
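
These laws can be verified by enumeration on a small sample space. The sketch below uses two fair dice and two events picked for illustration (A: the first die is even, B: the sum is 7); the sample space, the events and the helper `P` are all assumptions of this example, with classical probability computed as favourable cases over exhaustive cases.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of throwing two fair dice.
S = list(product(range(1, 7), repeat=2))


def P(event):
    """Classical probability: favourable cases / exhaustive cases."""
    return Fraction(sum(1 for s in S if event(s)), len(S))


def A(s):
    return s[0] % 2 == 0      # first die shows an even number


def B(s):
    return s[0] + s[1] == 7   # the two dice sum to 7


p_a, p_b = P(A), P(B)
p_a_and_b = P(lambda s: A(s) and B(s))
p_a_or_b = P(lambda s: A(s) or B(s))
p_b_given_a = p_a_and_b / p_a  # conditional probability P(B/A)

print(p_a, p_b, p_a_and_b, p_a_or_b)  # 1/2 1/6 1/12 7/12

# Addition law: P(A U B) = P(A) + P(B) - P(A n B)
print(p_a_or_b == p_a + p_b - p_a_and_b)  # True
# Multiplication law: P(A n B) = P(A) * P(B/A)
print(p_a_and_b == p_a * p_b_given_a)     # True
# Independence: P(B/A) = P(B), so P(A n B) = P(A) * P(B), which is non-zero
print(p_b_given_a == p_b)                 # True
```

Because A and B are independent with non-zero probabilities, their intersection has probability 1/12, so they are not mutually exclusive.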

Random Variable :

A random variable is a real number X connected with the outcome of a random
experiment.

Distribution function :

Let X be a random variable on (S, B, P). Then the function

$F_X(x) = P(X \le x)$

is called the distribution function of X.

Properties :

(i) If F is the distribution function of the random variable X and if a < b, then

P(a < X ≤ b) = F(b) – F(a)

(ii) If F is the distribution function of a one-dimensional random variable X, then 0 ≤ F(x) ≤ 1

and F(x) ≤ F(y) if x < y.

(iii) If F is the distribution function of a one-dimensional random variable X, then

$F(-\infty) = \lim_{x \to -\infty} F(x) = 0$

$F(\infty) = \lim_{x \to \infty} F(x) = 1$
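
A minimal sketch of these properties, assuming X is the score on one throw of a fair die (an example chosen for illustration, not taken from the notes):

```python
from fractions import Fraction

# Probability of each face of a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def F(x):
    """Distribution function F(x) = P(X <= x)."""
    return sum(p for value, p in pmf.items() if value <= x)


# Property (i): P(a < X <= b) = F(b) - F(a); here P(2 < X <= 4) = P(X = 3 or 4)
print(F(4) - F(2))  # 1/3

# Property (ii): 0 <= F(x) <= 1 and F is non-decreasing
points = [-1, 0, 1, 2.5, 4, 6, 10]
print(all(0 <= F(x) <= 1 for x in points))                     # True
print(all(F(a) <= F(b) for a, b in zip(points, points[1:])))   # True

# Property (iii): F is 0 below the smallest value and 1 at or above the largest
print(F(0), F(6))  # 0 1
```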

Discrete Random variable :

If a random variable takes at most a countable number of values, it is called a discrete random
variable.

Probability Mass Function (PMF) :

Suppose X is a one-dimensional discrete random variable taking at most a countable number
of values $x_1, x_2, \ldots$ with probabilities $p_1, p_2, \ldots$. Then $P(X = x_i) = p_i$ is called the pmf of X if

(i) $P(X = x_i) \ge 0$ for all i, and (ii) $\sum_{i=1}^{\infty} P(X = x_i) = 1$

Continuous random variable :

A random variable X is said to be continuous if it can take all possible values between certain
limits.

Probability Density Function (PDF) :

The probability density function of a random variable X, usually denoted by f(x), has the following
properties:

(i) $f(x) \ge 0$

(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
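
Both pairs of conditions can be checked numerically. In the sketch below, the pmf is that of the number of heads in three tosses of a fair coin, and the density is f(x) = 2x on [0, 1] (zero elsewhere); both are examples assumed for illustration, and the midpoint-rule helper `integrate` is a simple stand-in for exact integration.

```python
from fractions import Fraction
from math import comb

# PMF of the number of heads in 3 tosses of a fair coin: P(X = k) = C(3, k) / 8.
pmf = {k: Fraction(comb(3, k), 2 ** 3) for k in range(4)}
pmf_total = sum(pmf.values())
print(all(p >= 0 for p in pmf.values()), pmf_total)  # True 1


def f(x):
    """Assumed density: f(x) = 2x on [0, 1] and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(func, a, b, steps=10_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width


# f is zero outside [0, 1], so integrating over [0, 1] covers the whole real line.
pdf_total = integrate(f, 0.0, 1.0)
print(round(pdf_total, 6))  # 1.0
```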
