Maths and Statistical Analysis
Module-3
x: 20 25 30 35 40
y: 30 37.5 45 52.5 60
-In this example there is a linear relationship, as x and y are in the ratio 2:3 at every point.
- If we plot these points, they will lie on a straight line.
-Correlation is said to be non-linear (or curvilinear) when the amount of change in
one variable is not in constant ratio to the change in the other variable.
-In the case of curvilinear correlation, the ratio of change fluctuates and is never
constant.
● Simple, Multiple and Partial correlation:-The correlation is said to be simple when
only two variables are studied.
-The correlation is either multiple or partial when three or more variables are
studied.
-The correlation is said to be multiple when three or more variables are studied
simultaneously.
-answer
-From the three scatter diagrams we can say that there is positive correlation
between x and y in the set 1,
-there is negative correlation between x and y in the set 2 and there is no correlation
between x and y in the set 3.
x: 2 3 4 5 6 7 8
y: 4 5 6 12 9 5 4
-answer
->example 2
-ans:
→example 3
→Properties of Correlation Coefficient
● Correlation coefficient has a well defined formula.
● Correlation coefficient is a pure number and is independent of the units of
measurement.
● It lies between -1 and +1.
● Correlation coefficient does not change with reference to change of origin or
change of scale.
● Coefficient of correlation between x and y is same as that between y and x.
→example 1
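-If r = 0.6 and n = 64, find the probable error and standard error.
ans:
Standard error, SE = (1 - r²)/√n = (1 - 0.36)/8 = 0.08
Probable error, PE = 0.6745 × SE = 0.6745 × 0.08 = 0.054
-Since the probable error is very small (r = 0.6 is more than 6 × PE = 0.324), the correlation is significant.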
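The arithmetic in this example can be checked with a short sketch. It assumes the usual textbook formulas SE = (1 - r²)/√n and PE = 0.6745 × SE, and the common rule of thumb that r is significant when it exceeds six times the probable error. The function names are illustrative only.

```python
import math


def standard_error(r: float, n: int) -> float:
    """Standard error of the correlation coefficient: (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)


def probable_error(r: float, n: int) -> float:
    """Probable error, taken as 0.6745 times the standard error."""
    return 0.6745 * standard_error(r, n)


# Values from the example: r = 0.6, n = 64
r, n = 0.6, 64
se = standard_error(r, n)
pe = probable_error(r, n)

# Rule of thumb (assumed here): significant if |r| > 6 * PE
significant = abs(r) > 6 * pe

print(f"SE = {se:.4f}, PE = {pe:.4f}, significant = {significant}")
```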
->example 1
= 0.545 (pg:H 17)
*Uses of Correlation
● It helps to study the association between two variables.
- For example, we can examine whether there is any relation between sale and profit,
with the help of correlation.
● Correlation measures degree of relation between two variables.
-Karl Pearson's coefficient of correlation provides a formula for finding the degree of
relation between two variables.
● From the correlation coefficient, we can develop a measure called probable error.
-Probable error indicates whether the correlation is significant or not.
● Correlation analysis helps to estimate the future values.
-For example, from the correlation coefficient between income and investment one
can predict the possible quantum of investment for a particular amount of income.
#Regression Analysis:-Regression, in the general sense, means the estimation or prediction
of the unknown value of one variable from the known value of the other variable.
- It is a statistical device used to study the relationship between two or more variables that
are related.
→Dependent and Independent Variables
-In regression analysis there are two types of variables.
-The variable whose value is influenced or is to be predicted is called the dependent variable,
-and the variable which influences the values or is used for prediction is called the independent
variable.
-If the regression curve is a straight line, we say that there is linear regression between the
variables.
- If the curve of regression is not a straight line, then the regression is termed as curved or
non-linear regression.
-The regression equation in such cases is not of first degree.
-In this case the dependent variable does not change by a constant amount for a given change in the
independent variable.
→Freehand Curve method:-A freehand curve method is an easy method for obtaining a
regression line.
-According to this method original data are plotted on a graph paper.
-Original data, when plotted on a graph, typically give a wave-like curve, but this curve still
depicts the general tendency of the data.
-Independent variable is taken along the horizontal axis and dependent variable along the
vertical axis.
-We draw a smooth freehand line in such a way that it clearly indicates the tendency of the
original data.
->Example: The following are the prices (in thousands of rupees) and the corresponding supply.
dependent variable and ye stands for the corresponding value of the dependent variable
obtained from the line.
-The differences between the given y values and the y values obtained from the line of best fit are
d₁, d₂, ... respectively.
-Therefore the principle of least squares states that the line of best fit should be drawn such that
d₁² + d₂² + ... is minimum.
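The principle of least squares can be illustrated with a small sketch. It fits a line y = a + bx using the standard closed-form estimates (b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², a = ȳ - b·x̄) and then checks that nudging the line increases d₁² + d₂² + .... The data and function names are hypothetical and are not taken from the examples in these notes.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimises the sum of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_deviations(xs, ys, a, b):
    """d1^2 + d2^2 + ... where d is the gap between a given y and the y on the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_deviations(xs, ys, a, b)

# Any other line gives a larger sum of squared deviations
shifted = sum_squared_deviations(xs, ys, a + 0.1, b)
tilted = sum_squared_deviations(xs, ys, a, b + 0.1)

print(f"y = {a:.2f} + {b:.2f}x, sum of squares = {best:.2f}")
```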
->example 2 pg: 6 to 15
->example 3 pg :16 to 18
#Probability Theory:-The probability of a given event may be defined as the
numerical value given to the chance of the occurrence of that event.
-It is a number lying between 0 and 1.
-Zero is for an event which cannot occur and 1 for an event certain to occur.
-When the occurrence of an event is uncertain, probability is a number between 0 and 1.
-For example, when we toss a coin, the event of getting Head is uncertain.
- So its probability is neither 0 nor 1, but between the two.
-Since the chance of Head occurring is as much as the chance of its not occurring,
-we can predict the occurrence of Head with 50% confidence only.
-Therefore the probability for Head = ½.
*Random experiment:-An experiment that has two or more outcomes which vary in an
unpredictable manner from trial to trial when conducted under uniform conditions, is called
a random experiment.
-In a random experiment all the possible outcomes are known in advance but none of the
outcomes can be predicted with certainty.
-The tossing of a coin, for example, is a random experiment since it has two specified
outcomes - Head and Tail.
-But we are uncertain whether Head or Tail will turn up when the coin is tossed.
-Throwing a die is another example of a random experiment.
-When we throw a die, the possible results are [1, 2, 3, 4, 5, 6].
-It is not possible to predict which of these will occur.
*Sample Space:-The Sample space of a random experiment is the set containing all the
sample points of that random experiment.
-Therefore the sample space of a random experiment is the totality of all the elementary
outcomes of that random experiment.
-Eg: 1: When a coin is tossed the sample space is [Head, Tail].
-Eg: 2. When two coins are tossed the sample space is [HH, HT, TH, TT].
-A sample space may be discrete or continuous.
-When a sample space has a finite number of points, it is said to be a finite sample space;
otherwise it is infinite.
->Ex. 1: A box contains 10 tickets each numbered 1 to 10. A ticket is drawn. What is the
sample space?
Ans: The sample space is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
->Ex. 2. From a lot containing good and bad items, 3 items are chosen. Prepare the sample
space.
Ans: Let B stand for bad and G stand for good. Then the sample space is
{GGG, GGB, GBG, BGG, GBB, BGB, BBG, BBB}
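Sample spaces like the ones above can be listed mechanically. The following is a minimal sketch using `itertools.product`; the helper name `sample_space` is an illustrative choice, and it assumes each trial has the same set of outcomes.

```python
from itertools import product


def sample_space(outcomes, trials):
    """List every sequence of `trials` outcomes, each drawn from `outcomes`."""
    return ["".join(point) for point in product(outcomes, repeat=trials)]


# Two coins tossed: H = Head, T = Tail
two_coins = sample_space("HT", 2)

# Three items chosen from a lot: G = good, B = bad
three_items = sample_space("GB", 3)

print(two_coins)
print(three_items)
```

The number of sample points is (number of outcomes per trial) raised to the number of trials, so three items give 2³ = 8 points.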
● Impossible event (Empty set):-If an event cannot occur, when the random
experiment is conducted, then that event is an impossible event.
-For example, Getting a white ball from a bag containing all black balls is an
impossible event.
-An impossible event is an empty set, as it contains no sample point of the random
experiment.
-An empty set can be denoted by ∅
● Uncertain events:-An event is said to be uncertain if its happening is neither sure nor
impossible.
-That is, the happening of an uncertain event cannot be predicted.
-For example getting a white ball from a bag containing white and black balls is
uncertain.
● Equally likely events:-Two events are said to be equally likely if any one of them
cannot be expected to occur in preference to the other.
-Eg: 1. Getting Head and Getting Tail when a coin is tossed.
- 2. Getting 1, 2, 3, 4, 5 or 6 when a die is thrown are equally likely events.
● Mutually exclusive events:-A set of events are said to be mutually exclusive if the
occurrence of one of them excludes (or prevents) the possibility of the occurrence of
the others.
-Two mutually exclusive events cannot occur simultaneously in the same trial.
-Eg: 1. Getting an Ace and Getting a King when a card is drawn from a pack of cards
are mutually exclusive.
-Eg: 2. Getting Head and Getting Tail when a coin is tossed are mutually exclusive.
-Note: If A and B are mutually exclusive, then A ∩ B = ∅ (null set) ie the two sets are
disjoint.
● Mutually Exclusive and Exhaustive events:- A set of events are said to be mutually
exclusive and exhaustive if one of them must and only one can occur.
-That is, A, B and C are said to be mutually exclusive and exhaustive
-if they are (i) mutually exclusive and (ii) exhaustive.
- Example: In tossing three coins, getting no head, getting one head, getting two
heads and getting 3 heads are mutually exclusive and exhaustive.
● Complement of event A (Event 'not A'):-The event 'A' and the event 'not A' are
called complementary events.
-A’=U-A (where 'U' stands for sample space).
● Union of two events (At least one) (A or B):-The union of two events A and B
denoted by A ∪ B is the set of sample points in A or in B or in both.
-Eg: A = getting a multiple of 5, B = getting a multiple of 3,
then A ∪ B = getting a multiple of 5 or 3.
-In fig. I, A and B are intersecting.
-In fig. II, A and B are not intersecting.
-In both the cases, A ∪ B is shaded.
● Intersection of two events (Both A and B):- The intersection of two events A and B
denoted by A ∩ B is the set of sample points common to both A and B.
-Eg: A = getting a multiple of 5, B = getting a multiple of 3
-Then A ∩ B = getting a multiple of 15.
-In fig. I, A and B are intersecting and A ∩ B is shaded; in fig. II, A and B are not intersecting, so A ∩ B = ∅.
● Difference of two events (A and not B) (only A) (exactly A):- The event 'A not B' is
the event whose outcomes are those belonging to A, but not B.
-Therefore 'A not B' excludes from A outcomes common to A and B.
-A not B = A ∩ B’ = A - B = A - (A ∩ B)
-A-B is shaded.
● Exactly one (Symmetric difference) of two events.:-If A and B are two events exactly
one of them is the event whose outcomes are in A only or in B only.
-That is, the outcomes common to A and B are excluded.
-Exactly one of A and B = (A ∩ B’) ∪ (A’ ∩ B)
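The event operations described above map directly onto Python's built-in set operators. The sketch below assumes, purely for illustration, a sample space of tickets numbered 1 to 30, with A = getting a multiple of 5 and B = getting a multiple of 3.

```python
# Assumed sample space for illustration: tickets numbered 1 to 30
U = set(range(1, 31))

A = {x for x in U if x % 5 == 0}   # getting a multiple of 5
B = {x for x in U if x % 3 == 0}   # getting a multiple of 3

complement_a = U - A               # not A
union = A | B                      # A or B (at least one)
intersection = A & B               # both A and B: multiples of 15
only_a = A - B                     # A and not B
exactly_one = A ^ B                # symmetric difference

print(sorted(intersection))
print(sorted(only_a))
print(sorted(exactly_one))
```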
01. Addition Rule for mutually exclusive events:- If A and B are two mutually
exclusive events, then the probability for A or B to happen is the sum of their
probabilities, i.e. P(A ∪ B) = P(A) + P(B) if A and B are disjoint.
Proof: Let n(A) be the number of elementary outcomes in A and n(B) be
the number of elementary outcomes in B. Let n(A ∪ B) be the number of
outcomes in A ∪ B.
-Let n(S) be the number of elementary outcomes in the sample space.
-Since A and B are disjoint, n(A ∪ B) = n(A) + n(B). Dividing by n(S) gives
P(A ∪ B) = n(A)/n(S) + n(B)/n(S) = P(A) + P(B).
Or Example note pg;23
02. Addition Rule for any two events (not mutually exclusive):- If A and B are
any two events, then the probability for A or B to happen is the sum of their
probabilities minus the probability for both to happen.
Theorem: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
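Both addition rules can be verified by counting outcomes. The sketch below assumes a single throw of a fair die as the random experiment; the events chosen are illustrative only. `Fraction` is used so the probabilities stay exact.

```python
from fractions import Fraction

# Assumed experiment: one throw of a fair die
S = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & S), len(S))


# Any two events (not mutually exclusive)
A = {2, 4, 6}      # getting an even number
B = {4, 5, 6}      # getting a number greater than 3
general_rule = prob(A) + prob(B) - prob(A & B)

# Mutually exclusive events
C = {1, 2}
D = {5, 6}
exclusive_rule = prob(C) + prob(D)

print(prob(A | B), general_rule)
print(prob(C | D), exclusive_rule)
```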
→Independence of two events/statistical independence
Or
___________________________________________________________________________
Module -4
#Random variables:-A real valued function, defined over the sample space of a
random experiment is called the random variable associated to that random experiment.
-That is, the values of a random variable correspond to the outcomes of a random
experiment.
-P(a < X < b) = ∫ₐᵇ f(x) dx, where f(x) is the probability density function of X.
-Let X and Y be two random variables. Their joint probability (density) function is
denoted by f(x, y), and their joint distribution function is denoted by
F(x, y) = P(X ≤ x, Y ≤ y)
● n - number of trials.
● x - number of successes in n trials.
● p - probability of success.
● q = 1 - p - probability of failure.
=> mean μ = np
=> variance σ² = npq
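The mean and variance quoted above can be checked numerically. The sketch below assumes the standard binomial probability function P(X = x) = nCx · p^x · q^(n - x), which is not written out in these notes, and uses hypothetical values n = 10 and p = 0.5.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


# Hypothetical parameters, for illustration only
n, p = 10, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed directly from the distribution
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(f"mean = {mean}, np = {n * p}")
print(f"variance = {variance}, npq = {n * p * (1 - p)}")
```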
→Importance of Binomial distribution:-The Binomial distribution is often very useful in
decision making situations in business.
-In quality control it is very widely applied.
-In acceptance sampling plans, inspection is carried out on the articles drawn in a sample.
-The Binomial Distribution is used in such a sampling.
-The Binomial Distribution describes an enormous variety of real life events.
-The distribution can be used to judge whether a coin or a die is unbiased or not by
comparing the observed frequencies and expected frequencies.
7. If two independent random variables follow Binomial Distribution with the same
probability of success p, their sum also follows Binomial Distribution.
-If λ is an integer then the distribution has two modes, λ-1 and λ (bimodal).
-If λ is not an integer then it is unimodal and the mode is the integral part of λ.
p(x) = e^(-λ) λ^x / x!, x = 0, 1, 2, ...
Or
->problems Example note pg:- 44 to 46
-coefficient of skewness is 0
-The normal curve is unimodal, i.e. it has only one mode.
-The points of inflexion occur at μ ± σ; at these points the curve changes
from concavity to convexity.
-Q1 and Q3 are equidistant from the median.
-The range is -∞ to +∞.
-If X and Y are two independent normal variates then their sum is also a normal variate. This
is called the additive property.
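-Area under the normal curve is distributed as follows:
μ ± σ covers about 68.27% of the area,
μ ± 2σ covers about 95.45% of the area,
μ ± 3σ covers about 99.73% of the area.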
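The areas within one, two and three standard deviations of the mean can be checked numerically. This sketch relies on the identity P(|Z| < k) = erf(k/√2) for a standard normal variate Z; the function name is illustrative.

```python
import math


def area_within(k: float) -> float:
    """Area under the normal curve between mean - k*sigma and mean + k*sigma."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {area_within(k) * 100:.2f}%")
```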
→large and small:-When the sample size is more than 30 the sample is known as a large sample,
otherwise it is a small sample.
Sample > 30 = large
Sample ≤ 30 = small
*Point estimation & Interval estimation:-A point estimate is a single value estimate
of a parameter. For instance, a sample mean is a point estimate of a population mean.
-An interval estimate gives you a range of values where the parameter is expected to lie.
-The types of estimation are:
1. Point:- Any statistic suggested as an estimate of an unknown parameter is called a point estimate.
2. Interval:- The estimate takes its value from an interval.
->statistical inference:-The primary objective of a sample study is to draw inferences about the
population by examining only a part of the population.
-Such inferences are called statistical inferences.
-There are two main branches:
● Test of hypothesis
● estimation
___________________________________________________________________________
Module -5
#Testing of Hypothesis
-The procedure for testing a hypothesis is:
1. Set up the null hypothesis (H0) and the alternative hypothesis (H1).
2. Decide the test (such as z-test, t-test, χ²-test or F-test).
3. Specify the level of significance.
-The level of significance is usually taken as 5% or 1%
-in the absence of any specific instruction.
4. Calculate the value of the test statistic using the appropriate formula.
5. Obtain the table value of the test using the level of significance and the degrees of
freedom.
6. Make a decision about accepting or rejecting the null hypothesis on the basis of
the computed value.
-If the computed value is less than the table value, it falls in the acceptance region and we accept the null hypothesis.
-Otherwise we reject the null hypothesis and accept the alternative hypothesis.
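The steps above can be sketched for one concrete case. The example assumes a two-tailed z-test of a population mean with known population SD, using the standard statistic z = (x̄ - μ₀)/(σ/√n) and the 5% table value 1.96. All the numbers and function names are hypothetical.

```python
import math


def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Step 4: test statistic for a population mean when the population SD is known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


def decide(computed: float, table_value: float) -> str:
    """Step 6: compare the absolute computed value with the table value."""
    return "accept H0" if abs(computed) < table_value else "reject H0"


# Steps 1 to 3 (hypothetical): H0: mean = 50, H1: mean != 50, z-test at the 5% level
mu0, sigma, n, sample_mean = 50, 10, 64, 53
table_value = 1.96   # step 5: two-tailed table value of z at the 5% level

z = z_statistic(sample_mean, mu0, sigma, n)
decision = decide(z, table_value)

print(f"z = {z:.2f}, decision: {decision}")
```

Taking the absolute value in `decide` means the sign of the computed value does not matter for a two-tailed test.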
*Null & Alternative Hypothesis:-The null hypothesis is the statistical hypothesis that is set up
and tested for possible rejection.
-The null hypothesis is denoted by H0.
-H0 is the original hypothesis.
-Any hypothesis other than H0 is called an alternative hypothesis.
-When H0 is rejected we accept the alternative hypothesis.
-The alternative hypothesis is denoted by H1.
→one tail and two tail test:-A two tailed test is one in which we reject the null hypothesis if the
computed value is greater than the upper critical value or lower than the lower critical value.
-In a two tailed test the critical region is represented by both tails.
-If we are testing a hypothesis at the 5% level of significance, the size of the acceptance region is
0.95 and the size of the rejection region is 0.05, taking both sides together.
-fig
-In a one tailed test, the rejection region is located in only one tail, which may be either left
or right, depending on the alternative hypothesis.
-In a one tailed test the critical region is represented by only one tail.
-fig
→critical value:-The value of the test statistic which separates the rejection region from the
acceptance region is called the critical value.
→type 1 and type 2 error:-In any test of hypothesis the decision is to accept or to reject a
null hypothesis .
-There are 4 possible outcomes of the decision:
1. Accept H0 when it is true.
2. Reject H0 when it is false.
3. Reject H0 when it is true.
4. Accept H0 when it is false.
-Here (1) and (2) are correct, while (3) and (4) are errors.
-The last two cases are respectively known as type 1 and type 2 errors.
-type 1:-Reject H0 when it is true.
-type 2:-Accept H0 when it is false.
→critical region:-The region is divided into 2 parts: the acceptance region and the rejection region.
-If the computed value of the test statistic falls in the rejection region we reject H0.
-This rejection region is also known as the critical region.
-The size of the critical region is also known as the level of significance.
->best critical region:-Our aim is to minimise both type 1 and type 2 errors.
-But decreasing one type of error causes an increase in the other type of error.
-This is undesirable.
-So we keep the size of the critical region (type 1 error) as low as possible without allowing
the type 2 error to go up very much.
-The critical region with the least type 2 error, for a given level of significance, is called the best critical region.
→test statistic:-The decision to accept or to reject H0 is made on the basis of a statistic
computed from the sample.
-Such a statistic is called a test statistic.
-The commonly used test statistics are t, z, F, χ², etc.
*Large sample test and Small Sample test:-For large samples, we usually use the z-test.
-A sample of size greater than 30 is called a large sample.
-For small samples, any of the following tests can be used: z-test, χ²-test, t-test and F-test.
-A sample of size 30 or less is called a small sample.
*z - test
-Z-test is applied when the test statistic follows normal distribution.
-Uses of Z-test are:
1. To test the given population mean when the sample is large or when the population
SD is known.
2. To test the equality of two sample means when the samples are large or when the
population SD is known.
3. To test the population proportion.
4. To test the equality of two sample proportions.
5. To test the population SD when the sample is large.
6. To test the equality of two sample standard deviations when the samples are large or
when population standard deviations are known.
7. To test the equality of correlation coefficients
->example
ans:-
Note: The negative sign is ignored; only the absolute value is compared with the table value.
Eg: if the calculated value = -2.5, we take it as 2.5.
If the table value = 1.9,
then calculated value > table value, so we reject H0.
-where n1 and n2 are sample size and s1 and s2 are sample S.D
-where o stands for observed frequencies and E stands for expected frequencies.
a b
c d
-degree of freedom = (R-1)(C-1), where R = no. of rows and C = no. of columns
= (2-1)(2-1)
= 1 × 1
= 1
-where 'O' refers to the observed frequencies and 'E' refers to the expected
frequencies.
3. Degree of freedom= (r-1) x (c-1) where 'r' is the number of rows and 'c' is the number
of columns.
4. Obtain the table value for the degree of freedom and the level of significance.
5. If the calculated value of χ² is less than the table value, accept the null hypothesis
(H0). Otherwise reject it.
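The χ² steps can be sketched for a 2 × 2 table. The code assumes the standard statistic χ² = Σ (O - E)²/E, with each expected frequency taken as (row total × column total) / grand total, and the 5% table value 3.841 for 1 degree of freedom. The observed frequencies are hypothetical.

```python
def chi_square(observed):
    """Return (chi-square value, degrees of freedom) for a table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency under the null hypothesis of independence
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table with cells a, b / c, d
observed = [[20, 30],
            [30, 20]]

chi2, df = chi_square(observed)
table_value = 3.841   # chi-square table value at the 5% level for 1 degree of freedom
decision = "accept H0" if chi2 < table_value else "reject H0"

print(f"chi-square = {chi2}, df = {df}, decision: {decision}")
```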
___________________________________________________________________________
Module-2
#Mathematical Logic:-One of logic's key goals is to establish guidelines for judging
the validity of any particular argument or reasoning.
-The rules of logic give precise meaning to mathematical statements.
-These rules are used to distinguish between valid and invalid mathematical arguments.
-Apart from its importance in understanding mathematical reasoning, logic has numerous
applications in Computer Science, varying from design of digital circuits, to the construction
of computer programs and verification of correctness of programs.
*Truth tables
*Tautology and contradiction:-A compound proposition that is always true, no matter
what the truth values of the propositional variables that occur in it, is called a tautology.
-A compound proposition that is always false is called a contradiction.
-A compound proposition that is neither a tautology nor a contradiction is called a
contingency.
-We can construct examples of tautologies and contradictions using just one propositional
variable (p).
-Consider the truth tables of p ∨ ¬p and p ∧ ¬p:
p    ¬p    p ∨ ¬p    p ∧ ¬p
T    F     T         F
F    T     T         F
-Because p ∨ ¬p is always true, it is a tautology.
-Because p ∧ ¬p is always false, it is a contradiction.
->Example :Show that ¬(p ∨ (¬p ∧ q)) and ¬p ∧ ¬q are logically equivalent .
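One way to check this equivalence is to build the truth table for both sides and compare them row by row. The sketch below does that in Python; the function names `lhs` and `rhs` are illustrative.

```python
from itertools import product


def lhs(p: bool, q: bool) -> bool:
    """¬(p ∨ (¬p ∧ q))"""
    return not (p or ((not p) and q))


def rhs(p: bool, q: bool) -> bool:
    """¬p ∧ ¬q"""
    return (not p) and (not q)


# One row per combination of truth values of p and q
rows = [(p, q, lhs(p, q), rhs(p, q)) for p, q in product([True, False], repeat=2)]

# Logically equivalent means the two columns agree in every row
equivalent = all(left == right for _, _, left, right in rows)

for row in rows:
    print(row)
print("equivalent:", equivalent)
```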
*Inference theory:-The main function of logic is to provide rules of inference, or
principles of reasoning.
-The theory associated with such rules is known as inference theory.
-Inference theory can be described as the analysis of the validity of a formula from the
given set of premises.
-An argument can be defined as a sequence of statements.
- The argument is a collection of premises and a conclusion.
-The conclusion is used to indicate the last statement, and premises are used to indicate all
the remaining statements.
- Before the conclusion, the symbol ∴ will be placed.
-The following syntax is used to show the premises and conclusion:
● Premises: p1, p2, p3, p4, ….., pn
● Conclusion: q
*Validity by truth table:-Let A and B be two premises and C the conclusion.
-If the conclusion is true (T) in every row of the truth table where the premises A and B are both true (T),
-then we say the given premises produce a valid conclusion.
-If there is a row where the premises A and B are true (T) but the conclusion is false (F),
-it means that the given premises do not produce a valid conclusion.
->Example 1: Determine whether the conclusion C follows logically from the premises H1 and
H2.
-here H1 and H2 are two premises and C is the conclusion
a) H1:P→Q , H2 :P , C:Q
ans):- Here the premises are H1 [P→Q] and H2 [P].
-Whenever both premises are true, the conclusion [C] is also
true; therefore it is a valid conclusion.
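The truth-table test for validity can be automated: an argument is valid if no row makes every premise true and the conclusion false. The following is a minimal sketch; `is_valid` and `implies` are illustrative helper names. The second argument checked (H1: P→Q, H2: Q, C: P) is an added contrast case, not one of the problems in these notes.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Truth value of a → b."""
    return (not a) or b


def is_valid(premises, conclusion, num_vars: int) -> bool:
    """Valid if the conclusion is true in every row where all premises are true."""
    for values in product([True, False], repeat=num_vars):
        if all(premise(*values) for premise in premises) and not conclusion(*values):
            return False
    return True


# H1: P→Q, H2: P, C: Q
valid_argument = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: p],
    lambda p, q: q,
    2,
)

# Contrast case: H1: P→Q, H2: Q, C: P
invalid_argument = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: q],
    lambda p, q: p,
    2,
)

print(valid_argument, invalid_argument)
```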
-Refer to the worked problems.
#Predicate calculus:-Consider the following example. We need to convert the
following sentence into a mathematical statement using propositional logic only.
"Every person who is 18 years or older, is eligible to vote."
-The above statement cannot be properly expressed using only propositional logic.
- It would have been easier if the statement were referring to a specific person.
-But since it is not the case and the statement applies to all people who are 18 years or
older, we are stuck.
->Eg: Is "x > 1" true or false?
: Is "x is a great tennis player" true or false?
-We cannot say whether the above examples are true or false until a value is given for x.
-Therefore we need a more powerful type of logic. (Predicate logic)
→Predicate logic:-Predicate logic is an extension of Propositional logic.
-It adds the concept of predicates and quantifiers to better capture the meaning of
statements that cannot be properly expressed by propositional logic.
*Predicates:- A predicate P(x) is a sentence that contains a finite number of variables and
becomes a proposition when specific values are substituted for the variables,
-where P(x) is a propositional function.
-and x is a predicate variable.
->domain:- The domain of a predicate variable is the set of all possible values that may be
substituted in place of the variable.
-Eg: in "x is a great tennis player", the domain of x is the set of all human names.
*Quantifier:- Quantifiers are words that refer to quantities such as “some” or “all” and
indicate how frequently a certain statement is true.
-There are two types of quantifiers:
● Universal Quantifier:-The phrase “for all” denoted by ∀ is called the Universal
Quantifier
-eg: Consider "all students are smart".
Let P(x) denote "x is smart".
Then the above sentence can be written as ∀xP(x)
● Existential Quantifier:-The phrase “there exists “ denoted by ∃ is called the
Existential Quantifier
-eg: Consider "there exists x such that x² = 5".
Let P(x) denote "x² = 5".
Then the above sentence can be written as ∃xP(x)
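Over a finite domain, the two quantifiers behave like Python's `all` and `any`. The sketch below assumes a small domain of integers from -5 to 5, chosen only for illustration; note that the truth of a quantified statement depends on the domain.

```python
def for_all(domain, predicate) -> bool:
    """Universal quantifier: ∀x P(x) over a finite domain."""
    return all(predicate(x) for x in domain)


def there_exists(domain, predicate) -> bool:
    """Existential quantifier: ∃x P(x) over a finite domain."""
    return any(predicate(x) for x in domain)


# Assumed domain: the integers from -5 to 5
domain = range(-5, 6)

squares_non_negative = for_all(domain, lambda x: x * x >= 0)
some_square_is_four = there_exists(domain, lambda x: x * x == 4)

# x² = 5 has no integer solution, so ∃xP(x) is false over this domain
some_square_is_five = there_exists(domain, lambda x: x * x == 5)

print(squares_non_negative, some_square_is_four, some_square_is_five)
```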
→Negating Quantified Expressions:-Consider the following example
“Every student in the xyz university has studied discrete mathematics."
-here the Domain is All students of xyz university.
-P(x): x has studied discrete mathematics.
-the above statement is equivalent to ∀xP(x).
-Then what is the negation of the above statement?
"It is not the case that every student in the xyz university has studied discrete
mathematics."
-This can be written more simply as
"There is some student in the xyz university who has not studied discrete
mathematics."
-Here P(x): x has studied discrete mathematics,
-and its negation is ¬P(x): x has not studied discrete mathematics.
-So the negation ¬∀xP(x) is equivalent to ∃x¬P(x).
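The equivalence between ¬∀xP(x) and ∃x¬P(x) can be checked on a small finite domain. The student names and records below are hypothetical and serve only to illustrate the rule.

```python
# Hypothetical records: has each student studied discrete mathematics?
studied = {"Asha": True, "Bala": False, "Chitra": True}
students = list(studied)   # the domain


def P(x: str) -> bool:
    """P(x): x has studied discrete mathematics."""
    return studied[x]


# "It is not the case that every student has studied discrete mathematics."
not_all = not all(P(x) for x in students)

# "There is some student who has not studied discrete mathematics."
some_not = any(not P(x) for x in students)

print(not_all, some_not)
```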
___________________________________________________________________________