
Chapter 1:

Prepared by: Spandan Joshi(210280723001)


Topic 1: (PMF, CDF and PDF)
■ Random Variable: The term random variable is actually a misnomer, since a random variable
X is really a function whose domain is the sample space and whose range is the set of all real
numbers.
■ So basically, the random variable X : S → R is a function that assigns a real number X(s)
to each sample point s ∈ S.
■ Example:
We toss a fair coin twice, this is a random experiment and the sample space can be
written as
S = {HH, HT, TH, TT}
Suppose we define a Random Variable X whose value is the number of observed heads. Then
the value of X will be one of 0,1,2 depending upon the outcome of the random experiment.

Discrete Random Variable:


■ Example for countable/finite:
Tossing a coin 3 times and counting the number of heads/tails.
S={HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

■ Example for countably infinite:

Tossing a coin till the first heads appear.


S={H, TH, TTH, TTTH, TTTTH, TTTTTH, TTTTTTH, …}

Probability Mass Function (PMF):

■ Let X be a discrete random variable with range Rx = {x1,x2,x3,...} (finite or countably


infinite), then the function

Px(xk) = P(X = xk), for k = 1,2,3,…

is called the “Probability Mass Function” of X.

■ Properties of PMF:
– 0 ≤ Px(xk) ≤ 1 must be true, since each value is a probability
– Since the random variable assigns a value x ∈ R to each sample point s ∈ S, the
probabilities must sum to 1: Σk Px(xk) = 1.
Example:
If we toss a fair coin twice and let X be defined as the number of Tails we observe, Find
the range of X (Rx) and Probability Mass Function Px.
Solution:
Here, our sample space is given by
S = {HH, HT, TH, TT}
The number of Tails will be 0, 1 or 2. Hence,
Rx = {0, 1, 2}
Since the Rx = {0, 1, 2} is a countable set, random variable X is a discrete
random variable, and so the PMF will be defined as
Px(k) = P(X=k) for k = 0, 1, 2.
And so we have,
Px(0) = P(X=0) = P(HH)= 0.25
Px(1) = P(X=1) = P({HT,TH}) = 0.25 + 0.25 = 0.5
Px(2) = P(X=2) = P(TT)= 0.25

Although PMF is generally defined in Rx, it is sometimes convenient to extend it to all real
numbers. So in general, we can write

Px(x) = P(X = x) if x ∈ Rx, and Px(x) = 0 otherwise


If we try to plot it, the PMF appears as isolated spikes of height Px(x) at each x in Rx (figure omitted).
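As an illustration of how a PMF can be computed directly from the sample space, here is a minimal Python sketch (written for these notes; the variable names are only for this example) that recovers Px for the two-coin-toss experiment above:

from itertools import product
from collections import Counter

# Sample space of two fair coin tosses; each outcome has probability 1/4
sample_space = [''.join(s) for s in product('HT', repeat=2)]

# Random variable X = number of heads in each outcome
counts = Counter(outcome.count('H') for outcome in sample_space)
pmf = {x: c / len(sample_space) for x, c in counts.items()}

print(pmf)   # {2: 0.25, 1: 0.5, 0: 0.25}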

Cumulative Distribution Function (CDF):

● The PMF is one way to describe the distribution of a discrete random variable.
● But PMF cannot be defined for continuous random variables.
● The cumulative distribution function (CDF) of a random variable is another method to
describe the distribution of random variables.
● The advantage of the CDF is that it can be defined for any kind of random variable
(discrete, continuous, and mixed).
The cumulative distribution function (CDF) of random variable X is defined as

FX(x)=P(X≤x), for all x∈R

Example:
Let X be a discrete random variable with range RX={1,2,3,...}. Suppose the PMF of X is
given by
PX(k) = 1/2^k for k = 1, 2, 3, ...
a) Find and plot the CDF of X, FX(x)
b) Find P(2<X≤5)
c) Find P(X>4)

Solution:
a. To find the CDF, note that
For x < 1, FX(x) = 0
For 1 ≤ x < 2, FX(x) = PX(1) = 1/2
For 2 ≤ x < 3, FX(x) = PX(1) + PX(2) = 1/2 + 1/4 = 3/4

In general we have
For 0 < k ≤ x < k+1, FX(x) = PX(1) + PX(2) + ... + PX(k)
= (1/2) + (1/4) + ... + (1/2^k) = (2^k − 1)/2^k

Figure shows the CDF of X:

b. To find P(2<X≤5), we can write


P(2<X≤5) = FX(5) − FX(2) = (31/32) − (3/4) = (7/32)
Or equivalently, we can write
P(2<X≤5) = PX(3) + PX(4) + PX(5) = (1/8) + (1/16) + (1/32) = (7/32),
which gives the same answer.

c. To find P(X>4), we can write


P(X>4) = 1−P(X≤4) = 1−FX(4) = 1−(15/16) = (1/16)
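The calculations in this example can be checked numerically; the following short Python sketch (written for these notes, assuming the PMF PX(k) = 1/2^k) evaluates the CDF and both probabilities:

def pmf(k):
    # PX(k) = 1/2^k for k = 1, 2, 3, ...
    return 0.5 ** k

def cdf(x):
    # FX(x) = sum of PX(k) over all integers k <= x
    return sum(pmf(k) for k in range(1, int(x) + 1)) if x >= 1 else 0.0

print(cdf(5) - cdf(2))   # P(2 < X <= 5) = 7/32 = 0.21875
print(1 - cdf(4))        # P(X > 4) = 1/16 = 0.0625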
Continuous Random Variable:
■ Examples:
– The Random Variable T is defined as time in hours from now for a huge 16k
model to be rendered.
– The depth of drilling required in order to find oil
– The height of a person
– The amount of time taken to travel from Gandhinagar to Mumbai in hours.

Probability Density Function (PDF):

■ Consider a continuous random variable X with an absolutely continuous CDF FX(x). The
function fX(x) defined by
fX(x) = dFx(x)/dx = F′x(x), if Fx(x) is differentiable at x
is called the probability density function (PDF) of X.
■ Properties of PDF:
● For every x ∈ R, fX(x) ≥ 0.

● ∫ from −∞ to ∞ fX(x) dx = 1
Example:
Let X be a continuous random variable with the following PDF:
fX(x) = c·e^(−x) for x ≥ 0, and fX(x) = 0 otherwise

Where c is a positive constant


A. Find c.
B. Find the CDF of X, Fx(x).
C. Find P(1<X<3).
Hints:
A. To find c, we can use the property of the PDF:
∫ from −∞ to ∞ fX(x) dx = ∫ from 0 to ∞ c·e^(−x) dx = c, and setting this equal to 1 gives c = 1.
B. To find the CDF of X, we use FX(x) = ∫ from −∞ to x fX(t) dt, so for x < 0 we get FX(x) = 0. And for x ≥ 0, we
have
FX(x) = ∫ from 0 to x e^(−t) dt = 1 − e^(−x)

C. We can find P(1 < X < 3) using either the CDF or the PDF. Since we have found the CDF already,
P(1 < X < 3) = FX(3) − FX(1) = [1 − e^(−3)] − [1 − e^(−1)] = e^(−1) − e^(−3)
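A quick numerical check of parts B and C (a small Python sketch written for these notes, using c = 1 from part A):

import math

def cdf(x):
    # FX(x) = 1 - e^(-x) for x >= 0, and 0 otherwise
    return 1 - math.exp(-x) if x >= 0 else 0.0

p = cdf(3) - cdf(1)                       # P(1 < X < 3) = e^-1 - e^-3
print(p, math.exp(-1) - math.exp(-3))     # both ~0.3181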
Prepared by: Kalpit Patel(210280723001)
Chp 1 topic no (2) : Parametric families of distributions
{ binomial distribution, poisson distribution , geometric distribution }

Binomial Distribution
In probability theory and statistics, the binomial distribution is the discrete
probability distribution that gives only two possible results in an experiment,
either Success or Failure. For example, if we toss a coin, there could be only
two possible outcomes: heads or tails, and if any test is taken, then there
could be only two results: pass or fail. This distribution is also called a
binomial probability distribution.

Why do we need this…………………………………?

Ex: A coin is tossed 2 times. Find the probability that heads occurs exactly 2 times.

HH HT TH TT

Prob(Head=2) = 1 / 4

But when the number of outcomes is large, it becomes difficult to find the probability by
listing the sample space this way.

In such cases we use the binomial distribution.

Binomial Distribution Formula

The binomial distribution formula is for any random variable X, given by;

P(X = x) = nCx p^x (1 − p)^(n−x)

Where,

n = the number of experiments


x = 0, 1, 2, 3, 4, …

p = Probability of Success in a single experiment

q = Probability of Failure in a single experiment = 1 – p


Example 1: If a coin is tossed 5 times, find the probability of:
(a) Exactly 2 heads
(b) At least 4 heads.
Solution:
(a) The repeated tossing of the coin is an example of a Bernoulli trial.
According to the problem:
Number of trials: n=5
Probability of head: p= 1/2 and hence the probability of tail, q =1/2
For exactly two heads:
x=2
P(x = 2) = 5C2 p^2 q^(5−2) = 5!/(2! 3!) × (½)^2 × (½)^3
P(x = 2) = 5/16
(b) For at least four heads,
x ≥ 4, P(x ≥ 4) = P(x = 4) + P(x=5)
Hence,
P(x = 4) = 5C4 p^4 q^(5−4) = 5!/(4! 1!) × (½)^4 × (½)^1 = 5/32
P(x = 5) = 5C5 p^5 q^(5−5) = (½)^5 = 1/32
Therefore,
P(x ≥ 4) = 5/32 + 1/32 = 6/32 = 3/16
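The same probabilities can be computed directly from the binomial formula; below is a minimal Python sketch (names chosen for this note) using math.comb:

import math

def binom_pmf(x, n, p):
    # P(X = x) = nCx * p^x * (1-p)^(n-x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.5
print(binom_pmf(2, n, p))                        # exactly 2 heads = 5/16 = 0.3125
print(binom_pmf(4, n, p) + binom_pmf(5, n, p))   # at least 4 heads = 3/16 = 0.1875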
Binomial Distribution Mean and Variance
For a binomial distribution, the mean, variance and standard deviation for
the given number of success are represented using the formulas
Mean, μ = np
Variance, σ2 = npq
Standard Deviation σ= √(npq)
Where p is the probability of success
q is the probability of failure, where q = 1-p

Example 2
A fair coin is tossed 10 times, what are the probability of getting exactly 6
heads and at least six heads.
Solution:
Let x denote the number of heads in an experiment.
Here, the number of times the coin tossed is 10. Hence, n=10.
The probability of getting a head, p = ½
The probability of getting a tail, q = 1-p = 1-(½) = ½.
The binomial distribution is given by the formula:
P(X = x) = nCx p^x q^(n−x), where x = 0, 1, 2, 3, …
Therefore, P(X = x) = 10Cx (½)^x (½)^(10−x)
(i) The probability of getting exactly 6 heads is:
P(X = 6) = 10C6 (½)^6 (½)^(10−6)
P(X = 6) = 10C6 (½)^10
P(X = 6) = 105/512.

Hence, the probability of getting exactly 6 heads is 105/512.


(ii) The probability of getting at least 6 heads is P(X ≥ 6)
P(X ≥ 6) = P(X=6) + P(X=7) + P(X= 8) + P(X = 9) + P(X=10)
P(X ≥ 6) = 10C6(½)^10 + 10C7(½)^10 + 10C8(½)^10 + 10C9(½)^10 + 10C10(½)^10
P(X ≥ 6) = 193/512.

Example 3: Your basketball team is playing a series of 5 games against your


opponent. The winner is the team that wins more games (out of 5).

Let us assume that your team is much more skilled and has a 75% chance of
winning each game. This means there is a 25% chance of losing.

What is the probability of your team getting 3 wins?

We need to find out.

In this example:

n = 5, p=0.75, q=0.25, x=3


Let’s replace in the formula to get the answer:
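Working this out (the arithmetic is filled in here for completeness):

P(X = 3) = 5C3 (0.75)^3 (0.25)^2 = 10 × 0.421875 × 0.0625 ≈ 0.264

So the probability of exactly 3 wins is about 26.4%.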
Poisson Distribution
The Poisson distribution is a discrete probability function that means
the variable can only take specific values in a given list of numbers,
probably infinite. A Poisson distribution measures how many times an
event is likely to occur within the “x” period of time. In other words, we
can define it as the probability distribution that results from the Poisson
experiment. A Poisson experiment is a statistical experiment that
classifies the experiment into two categories, such as success or failure.
Poisson distribution is a limiting process of the binomial distribution.

When can we use it?

1. When n tends to infinity.

2. When the probability of success in each trial tends to zero.

Poisson Distribution Formula


The formula for the Poisson distribution function is given by:
f(x) = (e^(−λ) λ^x)/x!
Where,
e is the base of the natural logarithm
x is a Poisson random variable
λ is the average rate of occurrence
mean = variance = λ = np
Example 1:
A random variable X has a Poisson distribution with parameter λ such
that P (X = 1) = (0.2) P (X = 2). Find P (X = 0).
Solution:
For the Poisson distribution, the probability function is defined as:
P(X = x) = (e^(−λ) λ^x)/x!, where λ is a parameter.
Given that P(X = 1) = (0.2) P(X = 2):
(e^(−λ) λ^1)/1! = (0.2)(e^(−λ) λ^2)/2!
⇒ λ = λ^2/10
⇒ λ = 10
Now, substituting λ = 10 in the formula, we get:
P(X = 0) = (e^(−λ) λ^0)/0!
P(X = 0) = e^(−10) = 0.0000454
Thus, P(X = 0) = 0.0000454

Example 2:
Telephone calls arrive at an exchange according to the Poisson process
at a rate λ= 2/min. Calculate the probability that exactly two calls will be
received during each of the first 5 minutes of the hour.
Solution:
Assume that “N” is the number of calls received during a 1 minute
period.
Therefore,
P(N = 2) = (e^(−2) · 2^2)/2!
P(N = 2) = 2e^(−2)
Now, “M” is the number of minutes among the 5 minutes considered,
during which exactly 2 calls will be received. Thus “M” follows a
binomial distribution with parameters n = 5 and p = 2e^(−2).
P(M = 5) = (2e^(−2))^5 = 32 e^(−10)
P(M = 5) = 0.00145, where “e” is a constant, which is approximately
equal to 2.718.
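Both Poisson examples can be verified with a few lines of Python (a sketch written for these notes, using only the math module):

import math

def poisson_pmf(x, lam):
    # f(x) = e^(-lambda) * lambda^x / x!
    return math.exp(-lam) * lam**x / math.factorial(x)

print(poisson_pmf(0, 10))   # Example 1: P(X = 0) with lambda = 10, ~0.0000454
p = poisson_pmf(2, 2)       # Example 2: P(N = 2) with lambda = 2, equals 2e^-2
print(p**5)                 # exactly 2 calls in each of the 5 minutes, ~0.00145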
Geometric Distribution
A geometric distribution is defined as a discrete probability distribution
of a random variable “x” which satisfies some of the conditions. The
geometric distribution conditions are

● A phenomenon that has a series of trials


● Each trial has only two possible outcomes – either success or
failure
● The probability of success is the same for each trial

When to use this?

It gives the probability of getting the first success on a given trial in a sequence of independent trials.

Geometric Distribution Formula

P(x) = q^(x−1) p for x = 1, 2, ….

P(x) = 0 otherwise

That is, over repeated attempts, it gives the probability that the first success occurs on the x-th attempt.

Example 1: Suppose a soldier shoots at a target, and the probability of hitting the target in any


shot is 0.7.

Find the probability that the target is hit for the first time on the 10th attempt.

Solution:

p = 0.7, so q = 0.3

x = 10

P(x = 10) = q^(10−1) p

Ans: 0.3^9 × 0.7 ≈ 0.0000138
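A one-line check of this example in Python (a sketch written for these notes):

def geometric_pmf(x, p):
    # P(X = x) = q^(x-1) * p: first success on the x-th trial
    return (1 - p) ** (x - 1) * p

print(geometric_pmf(10, 0.7))   # 0.3^9 * 0.7 ~ 0.0000138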


Prepared by: Krishna Brahmbhatt(210280723001)

Uniform Distribution

Uniform Distribution is a probability distribution and is concerned with


events that are equally likely to occur. When working out problems that
have a uniform distribution, be careful to note if the data is inclusive or
exclusive.

Example 1

The data in the table below are 55 smiling times, in seconds, of an


eight-week-old baby.

10.4 19.6 18.8 13.9 17.8 16.8 21.6 17.9 12.5 11.1 4.9

12.8 14.8 22.8 20.0 15.9 16.3 13.4 17.1 14.5 19.0 22.8

1.3 0.7 8.9 11.9 10.9 7.3 5.9 3.7 17.9 19.2 9.8

5.8 6.9 2.6 5.8 21.7 11.8 3.4 2.1 4.5 6.3 10.7

8.9 9.4 9.4 7.6 10.0 3.3 6.7 7.8 11.6 13.8 18.6

The sample mean = 11.49 and the sample standard deviation = 6.23.

We will assume that the smiling times, in seconds, follow a uniform distribution
between zero and 23 seconds, inclusive. This means that any smiling time from
zero to and including 23 seconds is equally likely. The histogram that could be
constructed from the sample is an empirical distribution that closely matches the
theoretical uniform distribution.
The notation for the uniform distribution is X ~ U(a, b) where a = the lowest value
of x and b = the highest value of x.

The probability density function is


f(x)=1/(b−a)
for a ≤ x ≤ b.
For this example, X ~ U(0, 23) and
f(x)=1/(23−0)
for 0 ≤ X ≤ 23.

Formulas for the theoretical mean and standard deviation are


μ = (a + b)/2
and
σ = √((b − a)^2/12)
For this problem, the theoretical mean and standard deviation are
μ = (0 + 23)/2
= 11.50 seconds
and
σ = √((23 − 0)^2/12)
= 6.64 seconds

Example 2

The current (in mA) measured in a piece of copper wire is known to follow a
uniform distribution over the interval [0, 25]. Write down the formula for the
probability density function f(x) of the random variable X representing the current.
Calculate the mean and variance of the distribution and find the cumulative
distribution
function F(x).
Solution
Over the interval [0, 25] the probability density function f(x) is given by the
formula
f(x) = 1/(25 − 0) = 0.04 for 0 ≤ x ≤ 25
f(x) = 0 otherwise
Using the formulae developed for the mean and variance gives
E(X) = (25 + 0)/2 = 12.5 mA
and V(X) = (25 − 0)^2/12 = 52.08 mA^2
The cumulative distribution function is obtained by integrating the probability
density function as shown below.
F(x) = ∫ from −∞ to x f(t) dt
Hence, choosing the three distinct regions x < 0, 0 ≤ x ≤ 25 and x > 25 in turn gives:
F(x) = 0 for x < 0
F(x) = x/25 for 0 ≤ x ≤ 25
F(x) = 1 for x > 25
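The pieces of this example (mean, variance, CDF) can be reproduced with a small Python sketch (written for these notes, for a general interval [a, b]):

def uniform_stats(a, b):
    # mean = (a + b)/2, variance = (b - a)^2 / 12
    return (a + b) / 2, (b - a) ** 2 / 12

def uniform_cdf(x, a, b):
    # F(x) = 0 for x < a, (x - a)/(b - a) on [a, b], and 1 for x > b
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

print(uniform_stats(0, 25))     # (12.5, 52.08...)
print(uniform_cdf(10, 0, 25))   # 10/25 = 0.4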

Example 3
In the manufacture of petroleum the distilling temperature (T◦C) is crucial in
determining the quality of the final product. T can be considered as a random
variable uniformly distributed over 150◦C to 300◦C. It costs £C1 to produce 1
gallon of petroleum. If the oil distills at temperatures
less than 200◦C the product sells for £C2 per gallon. If it distills at a temperature
greater than 200◦C it sells for £C3 per gallon. Find the expected net profit per
gallon.

P(X < 200) = 50 ×(1/150) =1/3


P(X > 200) =2/3
Let F be a random variable defining profit.
F can take two values £(C2 − C1) or £(C3− C1)
x C2 − C1 C3 − C1
P(F = x) ⅓ ⅔
E(F) =⅓ ( C2 −C1)+ ⅔(C3 − C1)
=(C2 −3C1 + 2C3)/3

Example 4
Packages have a nominal net weight of 1 kg. However their actual net weights
have a uniform distribution over the interval 980 g to 1030 g.
(a) Find the probability that the net weight of a package is less than 1 kg.
(b) Find the probability that the net weight of a package is less than w g, where
980 < w <1030.
(c) If the net weights of packages are independent, find the probability that, in a
sample of five packages, all five net weights are less than wg and hence find the
probability density function of the weight of the heaviest of the packages. (Hint:
all five packages weigh less than w g if and only if the heaviest weighs less that w
g).

(a) The required probability is P(W < 1000)


= (1000 − 980)/(1030 − 980)
= 20/50 = 0.4
(b) The required probability is P(W < w) = (w − 980)/(1030 − 980)
= (w − 980)/50
(c) The probability that all five weigh less than w g is ((w − 980)/50)^5,
so the pdf of the heaviest is
d/dw ((w − 980)/50)^5
= (5/50)((w − 980)/50)^4
= 0.1 ((w − 980)/50)^4
for 980 < w < 1030.
Prepared by:Chirag Vankar (210280723006)
Expectation and Variance

If you have a collection of numbers a1,a2,...,aN, their average is a single number that describes
the whole collection. Now, consider a random variable X. We would like to define its average, or
as it is called in probability, its expected value or mean. The expected value is defined as the
weighted average of the values in the range.

Let X be a discrete random variable with range RX={x1,x2,x3,...} (finite or countably infinite). The
expected value of X, denoted by EX is defined as:

E(X) = Σ x P(x), where the sum is over all x in the range of X

Variance is defined as:
Var(X) = E[(X − µ)^2] = E[X^2] − (E[X])^2

Q. A man draws 3 balls from an urn containing 5 white and 7 black balls. He gets Rs.10 for
each white ball and Rs.5 for each black ball. Find his expectation.

Soln.
Probability of drawing 3 white balls = 5𝐶3/12𝐶3 = 1/22
Probability of drawing 2 white and 1 black = (5𝐶2 * 7𝐶1)/(12𝐶3) = 7/22
Probability of drawing 1 white and 2 black = (5C1 * 7C2)/(12C3) = 21/44
Probability of drawing 3 blacks = 7𝐶3/12𝐶3 = 7/44

E(X) = (10 * 3) * (1/22) + (10 * 2 + 5) * (7/22) + (10 + 5 * 2) * (21/44) + (5 * 3) * (7/44)
= Rs. 21.25
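The expectation in this example can be checked with math.comb (a small Python sketch for these notes):

import math

total = math.comb(12, 3)
# (payoff in Rs, probability) for 3, 2, 1, 0 white balls drawn
cases = [(30, math.comb(5, 3) / total),
         (25, math.comb(5, 2) * math.comb(7, 1) / total),
         (20, math.comb(5, 1) * math.comb(7, 2) / total),
         (15, math.comb(7, 3) / total)]

expectation = sum(payoff * prob for payoff, prob in cases)
print(expectation)   # 21.25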

Central Limit Theorem


If X1, X2, X3, … are independently distributed random variables such that E(Xi) = µi and
Var(Xi) = σi^2, then as n → ∞, the distribution of the sum of these random variables,
X1 + X2 + X3 + ......, tends to a normal distribution with mean Σ µi and variance Σ σi^2.

Q. A coin is tossed 200 times. Find the probability that number of heads obtained is between 80
and 120. Given 𝑃(𝑍 <− 2. 82) = 0. 0024
Let X = no. of heads, n = 200, p = q = 1/2

Since n is large, we can apply the central limit theorem.

By the binomial distribution,

µ = np

= 200 * (1/2)

= 100

σ^2 = npq

= 200 * (1/2) * (1/2)

= 50

Z = (X − 100)/√50

→ P(80 < X < 120)

→ P((80 − 100)/√50 < (X − 100)/√50 < (120 − 100)/√50)

→ P(−2.82 < Z < 2.82)

→ 0.4976 + 0.4976

→ 0.9952
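The normal-approximation step can be checked numerically using the standard normal CDF, which can be written in terms of the error function (a sketch for these notes; no table look-up is needed):

import math

def phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 100, math.sqrt(50)
p = phi((120 - mu) / sigma) - phi((80 - mu) / sigma)
print(p)   # ~0.9953, matching P(-2.82 < Z < 2.82) ~ 0.9952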

Q. A random sample of size 100 is taken from a population whose mean is 60 and variance is
400. Using central limit theorem, with what probability we can assert that mean of sample will
not differ from µ = 60 by more than 4? Given 𝑃(𝑍 <− 2) = 0. 0228

Soln. Required probability: P(|X̄ − 60| ≤ 4)

n = 100

µi = 60

σi^2 = 400

Sample mean is X̄ = (X1 + X2 + ..... + X100)/100

E(X̄) = (1/100) Σ E(Xi), i = 1 to 100

= (1/100) [60 + 60 + ....]

= (1/100) * 60 * 100 = 60

Var(X̄) = (1/100^2) Σ Var(Xi), i = 1 to 100

= (1/100^2) * 100 * 400 = 4, so the standard deviation of X̄ is 2

→ P(|X̄ − 60| ≤ 4)

→ P(−4 ≤ X̄ − 60 ≤ 4)

→ P(56 ≤ X̄ ≤ 64)

→ P((56 − 60)/2 ≤ (X̄ − 60)/2 ≤ (64 − 60)/2)

→ P(−2 ≤ Z ≤ 2)

→ 0.4772 + 0.4772 = 0.9544

Q. 20 dice are thrown. Find the approx. probability that the sum obtained is between 65 and 75
using the central limit theorem.

Soln.

Required probability 𝑃(65 ≤ 𝑆𝑛 ≤ 75)

𝑥 1 2 3 4 5 6

𝑃(𝑥) 1/6 1/6 1/6 1/6 1/6 1/6

Since the distribution is uniform, we have E(X) = (n + 1)/2 = (6 + 1)/2 = 7/2

Var(X) = (n^2 − 1)/12 = 35/12

n = 20

Sn = X1 + X2 + .... follows a normal distribution

E(Sn) = Σ E(Xi), i = 1 to 20

= (7/2) * 20 = 70

Var(Sn) = Σ Var(Xi), i = 1 to 20

= 20 * (35/12) = 175/3

→ P(65 ≤ Sn ≤ 75)

→ P((65 − 70)/√(175/3) ≤ (Sn − 70)/√(175/3) ≤ (75 − 70)/√(175/3))

→ P(−0.6547 ≤ Z ≤ 0.6547)

→ 0.2422 + 0.2422

→ 0.4844

Q. An electric firm manufactures light bulbs that are normally distributed with mean = 800
hours and standard deviation = 40 hours. Find the probability that the bulb burns between 778
and 834 hours.

Soln. Given µ = 800 ℎ𝑜𝑢𝑟𝑠, σ = 40 ℎ𝑜𝑢𝑟𝑠

𝐿𝑒𝑡 𝑋 𝑏𝑒 𝑡ℎ𝑒 𝑟𝑎𝑛𝑑𝑜𝑚 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒 𝑓𝑜𝑟 𝑡ℎ𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 ℎ𝑜𝑢𝑟𝑠.

Required probability = 𝑃(778 < 𝑋 < 834)

→ P(778 < X < 834)

→ P((778 − 800)/40 < (X − 800)/40 < (834 − 800)/40)

→ P(−0.55 < Z < 0.85)

→ P(Z < 0.85) − P(Z < −0.55)

→ 0.8023 − 0.2912 → 0.5111


Chapter 2
Prepared by: Prashant Bhavsar(210280723014),
Topic 1: Random samples
Random Sampling Definition
Random sampling is a method of choosing a sample of observations from a population to make
assumptions about the population. It is also called probability sampling. The counterpart of this
sampling is Non-probability sampling or Non-random sampling. The primary types of this
sampling are simple random sampling, stratified sampling, cluster sampling, and multistage
sampling. In the sampling methods, samples which are not arbitrary are typically called
convenience samples.
The primary feature of probability sampling is that the choice of observations must occur in a
‘random’ way such that they do not differ in any significant way from observations, which are
not sampled. We assume here that statistical experiments contain data that is gathered through
random sampling.
Type of Random Sampling
The random sampling method uses some manner of a random choice. In this method, all the
suitable individuals have the possibility of choosing the sample from the whole sample space. It
is a time consuming and expensive method. The advantage of using probability sampling is that
it ensures the sample that should represent the population. There are four major types of this
sampling method, they are;
● Simple Random Sampling
● Systematic Sampling
● Stratified Sampling
● Clustered Sampling
Now let us discuss its types one by one here.
● Simple random sampling
In this sampling method, each item in the population has an equal and likely possibility
of getting selected in the sample (for example, each member in a group is marked with a
specific number). Since the selection of item completely depends on the possibility,
therefore this method is called “Method of chance Selection”. Also, the sample size is
large, and the item is selected randomly. Thus it is known as “Representative Sampling”.
● Systematic Random Sampling
In this method, the items are chosen from the destination population by choosing the
random selecting point and picking the other methods after a fixed sample period. It is
equal to the ratio of the total population size and the required population size.
● Stratified Random Sampling
In this sampling method, a population is divided into subgroups to obtain a simple
random sample from each group and complete the sampling process (for example,
number of girls in a class of 50 strength). These small groups are called strata. The small
group is created based on a few features in the population. After dividing the population
into smaller groups, the researcher randomly selects the sample.
● Clustered Sampling
Cluster sampling is similar to stratified sampling, besides the population is divided into a
large number of subgroups (for example, hundreds of thousands of strata or subgroups).
After that, some of these subgroups are chosen at random and simple random samples
are then gathered within these subgroups. These subgroups are known as clusters. It is
basically utilized to lessen the cost of data compilation.
● Random Sampling Formula
If P is the probability, n is the sample size, and N is the population. Then;
The chance of getting a sample selected only once is given by;
P = 1 − [((N−1)/N) · ((N−2)/(N−1)) · … · ((N−n)/(N−(n−1)))]
Cancelling, P = 1 − (N−n)/N
P = n/N
The chance of getting a sample selected more than once is given by;

P = 1 − (1 − (1/N))^n
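As a concrete illustration, a simple random sample can be drawn in Python with the standard library (a minimal sketch; the population here is made up for this example):

import random

population = list(range(1, 101))       # a made-up population of N = 100 units
n = 10

sample = random.sample(population, n)  # simple random sampling without replacement
print(sample)
print(n / len(population))             # P = n/N = 0.1, chance any given unit is selected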
Topic 2: Sampling distributions of estimators
● Since our estimators are statistics (particular functions of random variables), their
distribution can be derived from the joint distribution of X 1 . . . X n .
● It is called the sampling distribution because it is based on the joint distribution of the
random sample.
● Given a sampling distribution, we can
○ Calculate the probability that an estimator will not differ from the parameter θ
by more than a specified amount.
○ Obtain interval estimates rather than point estimates after we have a sample; an
interval estimate is a random interval such that the true parameter lies within
this interval with a given probability (say 95%).
○ Choose between two estimators we can, for instance, calculate the
mean-squared error of the estimator, E θ [( θ̂ − θ) 2 ] using the distribution of θ.
● Sampling distributions of estimators depend on sample size, and we want to know
exactly how the distribution changes as we change this size so that we can make the
right trade-offs between cost and accuracy
Why is this important?
To estimate population parameters, we can use samples!
For a better understanding, see:
https://onlinestatbook.com/2/sampling_distributions/samp_dist_mean.html
Sampling Distribution of the Mean
Learning Objectives
1. State the mean and variance of the sampling distribution of the mean
2. Compute the standard error of the mean
3. State the central limit theorem

The sampling distribution of the mean was defined in the section introducing sampling
distributions. This section reviews some important properties of the sampling distribution of the
mean introduced in the demonstrations in this chapter.
Mean
The mean of the sampling distribution of the mean is the mean of the population from which
the scores were sampled. Therefore, if a population has a mean μ, then the mean of the
sampling distribution of the mean is also μ. The symbol μM is used to refer to the mean of the
sampling distribution of the mean. Therefore, the formula for the mean of the sampling
distribution of the mean can be written as:
μM = μ
Variance
The variance of the sampling distribution of the mean is computed as follows:

σ^2_M = σ^2/N
That is, the variance of the sampling distribution of the mean is the population variance divided
by N, the sample size (the number of scores used to compute a mean). Thus, the larger the
sample size, the smaller the variance of the sampling distribution of the mean.
(optional) This expression can be derived very easily from the variance sum law. Let's begin by
computing the variance of the sampling distribution of the sum of three numbers sampled from
a population with variance σ2. The variance of the sum would be σ2 + σ2 + σ2. For N numbers,
the variance would be Nσ2. Since the mean is 1/N times the sum, the variance of the sampling
distribution of the mean would be 1/N2 times the variance of the sum, which equals σ2/N.
The standard error of the mean is the standard deviation of the sampling distribution of the
mean. It is therefore the square root of the variance of the sampling distribution of the mean
and can be written as:

σ_M = σ/√N

The standard error is represented by a σ because it is a standard deviation. The subscript (M)
indicates that the standard error in question is the standard error of the mean.
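The relations μM = μ and σM = σ/√N can be seen in a small simulation (a sketch written for these notes; the population of integers 0..100 is an arbitrary choice):

import random
import statistics

N = 25                                              # sample size
population = range(0, 101)
population_sd = statistics.pstdev(population)

# Draw many samples of size N and record each sample mean
means = [statistics.mean(random.choices(population, k=N)) for _ in range(10000)]

print(statistics.mean(means))        # close to the population mean 50
print(statistics.stdev(means))       # close to sigma / sqrt(N)
print(population_sd / N ** 0.5)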
Central Limit Theorem
The central limit theorem states that:
Given a population with a finite mean μ and a finite non-zero variance σ2, the
sampling distribution of the mean approaches a normal distribution with a mean of
μ and a variance of σ2/N as N, the sample size, increases.
The expressions for the mean and variance of the sampling distribution of the mean are not
new or remarkable. What is remarkable is that regardless of the shape of the parent population,
the sampling distribution of the mean approaches a normal distribution as N increases. If you
have used the "Central Limit Theorem Demo," you have already seen this for yourself. As a
reminder, Figure 1 shows the results of the simulation for N = 2 and N = 10. The parent
population was a uniform distribution. You can see that the distribution for N = 2 is far from a
normal distribution. Nonetheless, it does show that the scores are denser in the middle than in
the tails. For N = 10 the distribution is quite close to a normal distribution. Notice that the
means of the two distributions are the same, but that the spread of the distribution for N = 10 is
smaller.

Figure 1. A simulation of a sampling distribution. The parent population is uniform. The blue line
under "16" indicates that 16 is the mean. The red line extends from the mean plus and minus
one standard deviation.
Figure 2 shows how closely the sampling distribution of the mean approximates a normal
distribution even when the parent population is very non-normal. If you look closely you can
see that the sampling distributions do have a slight positive skew. The larger the sample size,
the closer the sampling distribution of the mean would be to a normal distribution.
Figure 2. A simulation of a sampling distribution. The parent population is very non-normal.
Topic 3: Methods of Moments

Example
Topic 4: Maximum Likelihood
Chapter 3
Presented by : Parth Nirmal (210280723008)

Topic : Statistical Inference


Statistical Inference :

Statistical inference is the process of using data analysis to deduce properties of an underlying probability
distribution.

Inference means the process of drawing conclusions about population parameters based on a sample taken
from the population.

Population and Samples :

Population means a set of all units and a sample is the data we collect from the population.

For example, Population is all the apples in orchard and

Sample is 100 apples we picked from the orchard.

Confidence Interval :
A confidence interval is how much uncertainty there is with any particular statistic. Confidence intervals are
often used with a margin of error. It tells you how confident you can be that the results from a poll or
survey reflect what you would expect to find if it were possible to survey the entire population. Confidence
intervals are intrinsically connected to confidence levels.

Confidence Interval (for µ): CI = X̄ ± Z_(α/2) (σ/√n)

Where, X is sample mean,

Z is critical value,

n is sample size,

α is level of significance,

σ is standard deviation

Level of significance: α=1–C


Where, C is confidence

Note: the whole term Z_(α/2)(σ/√n) is known as the Margin of Error (E)

Example :
To infer the average strength of some product A, a sample of size of 80 is taken from the entire lot of that
product. The sample mean is 18.85 with sample variance 30.77. Construct a 99% confidence interval for the
product’s true average strength.
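One way to carry out this calculation (a sketch written for these notes; 2.576 is the usual z-value for 99% confidence, and the sample variance is used in place of σ² since n = 80 is large):

import math

n, xbar, s2 = 80, 18.85, 30.77
z = 2.576                              # z_(alpha/2) for 99% confidence
margin = z * math.sqrt(s2 / n)         # margin of error E
print(xbar - margin, xbar + margin)    # approximately (17.25, 20.45)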

Hypothesis testing :
A hypothesis is an educated guess about something in the world around you. It should be testable, either
by experiment or observation.

Example :

The specification for a certain kind of ribbon calls for a mean breaking strength of 180 pounds. If five pieces
of the ribbon have a mean breaking strength of 169.5 pounds with a standard deviation of 5.7 pounds, test
the null hypothesis µ = 180 pounds against the alternative hypothesis µ < 180 pounds at the 0.01 level of
significance. Assume that the population distribution is normal.
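One way to carry out this test (a sketch for these notes, using a one-sample t-test since σ is estimated from only five pieces; scipy is assumed to be available):

import math
from scipy import stats        # assumed available; only t.ppf is used here

n, xbar, s, mu0 = 5, 169.5, 5.7, 180
t_stat = (xbar - mu0) / (s / math.sqrt(n))   # ~ -4.12
t_crit = stats.t.ppf(0.01, df=n - 1)         # ~ -3.747 for a left-tailed test

print(t_stat, t_crit)
print("reject H0" if t_stat < t_crit else "fail to reject H0")   # reject H0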

Presented by : Ram Rabari (210280723005)


Topic : Introduction to multivariate analysis

Multivariate Analysis :

Multivariate analysis is a set of statistical techniques that analyze the relationships among more
than two variables; it shows the effect of more than one variable on a dependent (outcome) variable.

Objectives of Multivariate Analysis :

•Data reduction
•Grouping
•Investigate relationship among variables
•Prediction
•Hypothesis construction and testing
Multivariate analysis techniques :

•Principal Components and Common Factor Analysis


•Multiple Regression and Multiple Correlation
•Multiple Discriminant Analysis
•Multivariate Analysis of Variance and Covariance
•Conjoint Analysis
•Canonical Correlation
•Cluster Analysis
•Multidimensional Scaling
•Correspondence Analysis
•Linear Probability Models such as Logit and Probit
•Simultaneous/Structural Equation Modeling

Presented by : Monali Patel (210280723003)

Topic : Statistical Inference and Introduction to multivariate statistical models: regression and
classification problems

Multivariate Statistics :

Multivariate Statistics is a subdivision of statistics encompassing the simultaneous observation and analysis
of more than two outcome variables.
Which simple means it looks at more than two variables.

Multivariate Classification problem :

A class or cluster is a grouping of points in this multidimensional attribute space. Two locations belong to
the same class or cluster if their attributes (vector of band values) are similar. A multiband raster and
individual single band rasters can be used as the input into a multivariate statistical analysis.

In a normal (binary) classification problem we give the answer as yes/no, right/wrong, possible/not possible.

Here we can instead answer which class the test tuple actually belongs to.

Multivariate Classification is also called multi-class classification.

Example :

Here, many classes are available like class1,class2,class3,class4 so, it’s multiclass classification problem.
1. Part of speech tagging (verb,noun,adjective,etc.)

2. Different topics

For the multiclass classification there are two methods:

1. One-vs-all

2. One-vs-one
Example :

(1) One – vs – all

Here three classes are available.

To solve this multiclass classification problem we have to generate classifiers (models).

In the one-vs-all method, number of classifiers = number of classes

Number of classes = 3

Number of classifier we have to generate = 3


Now we get 3 classification models.

M1,M2,M3
M1 and M3 give output +1.

And M2 gives output −1.

So we can neglect M2 (the test tuple does not belong to the class of M2).

Here M1 and M3 give the same output +1.

But their probabilities are different: the test tuple belongs to M1 with 90% probability and to M3 with 70%. So here we can say
that the test tuple belongs to M1, because its probability is greater than that of M3.

(2) One – vs – one

Here the whole procedure is the same,

but the number of classifiers is different:

Number of classifiers = n(n−1)/2

And here one classifier does not correspond to a single class.


Here each pair of classes corresponds to one model:

C1,C2 -> M12

C2,C3 -> M23

C3,C1 -> M31
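Libraries such as scikit-learn provide both strategies directly; the sketch below (assuming scikit-learn is installed, with a toy dataset made up for this note) shows how many models each strategy trains for 3 classes:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

# Toy data with 3 classes
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_classes=3, random_state=0)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))   # 3 models: one per class
print(len(ovo.estimators_))   # 3 * (3 - 1) / 2 = 3 models: one per pair of classes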


Multivariate Regression problem :

Multivariate Regression helps us measure the degree of relationship between more than one independent variable and more
than one dependent variable. It finds the relation between the variables (assumed to be linearly related).

It is used to predict the behavior of the outcome variables from the predictor variables, and how
the predictor variables are changing.

It can be applied to many practical fields like politics, economics, medical, research works and many
different kinds of businesses.

Examples :

(1) If E-commerce Company has collected the data of its customers such as Age, purchased history of a
customer, gender and company want to find the relationship between these different dependents and
independent variables.

(2) A gym trainer has collected the data of his client that are coming to his gym and want to observe some
things of client that are health, eating habits (which kind of product client is consuming every week), the
weight of the client. This wants to find a relation between these variables.

Example :

In a simple dataset there are only x and y, so we can easily find the regression.
Multiple features (variables):

When there are many variables in the dataset, it is called multivariate statistics.
Presented By: Zalak Mistry ( 210280723011 )

Topic: The problem of overfitting model assessment


What is Overfitting?

A statistical model is said to be overfitted when we feed it a lot more data than necessary. When a model
fits more data than it actually needs, it starts catching the noisy data and inaccurate values in the data.
As a result, the efficiency and accuracy of the model decrease.

To make it relatable, imagine trying to fit into oversized apparel.

Let us take a look at a few examples of overfitting in order to understand how it actually happens.

Examples Of Overfitting

Example 1
If we take an example of simple linear regression, training the data is all about finding out the
minimum cost between the best fit line and the data points. It goes through a number of
iterations to find out the optimum best fit, minimizing the cost. This is where overfitting comes
into the picture.

The line seen in the image above can give a very efficient outcome for a new data point. In the
case of overfitting, when we run the training algorithm on the data set, we allow the cost to
reduce with each number of iteration.

Running this algorithm for too long will mean a reduced cost but it will also fit the noisy data
from the data set. The result would look something like in the graph below.

This might look efficient but isn’t really. The main goal of an algorithm such as linear regression
is to find a dominant trend and fit the data points accordingly. But in this case, the line fits all
data points, which is irrelevant to the efficiency of the model in predicting optimum outcomes
for new entry data points.
Now let us consider a more descriptive example with the help of a problem statement.

Example 2

Problem Statement: Let us consider we want to predict if a soccer player will land a slot in a tier
1 football club based on his/her current performance in the tier 2 league.

Now imagine, we train and fit the model with 10,000 such players with outcomes. When we try
to predict the outcome on the original data set, let us say we got a 99% accuracy. But the
accuracy on a different data set comes around 50 percent. This means the model does not
generalize well from our training data and unseen data.

This is what overfitting looks like. It is a very common problem in Machine Learning and even
data science. Now let us understand the signal and noise.

Signal vs Noise
In predictive modeling, signal refers to the true underlying pattern that helps the model to learn
the data. On the other hand, noise is irrelevant and random data in the data set. To understand
the concept of noise and signal, let us take a real-life example.

Let us suppose we want to model age vs literacy among adults. If we sample a very large part of
the population, we will find a clear relationship. This is the signal, whereas noise interferes with
the signal. If we do the same on a local population, the relationship will become muddy. It
would be affected by outliers and randomness, for e.g, one adult went to school early or some
adult couldn’t afford education, etc.

Talking about noise and signal in terms of Machine Learning, a good Machine Learning
algorithm will automatically separate signals from the noise. If the algorithm is too complex or
inefficient, it may learn the noise too. Hence, overfitting the model.

How To Detect Overfitting?


The main challenge with overfitting is to estimate the accuracy of the performance of our model
with new data. We would not be able to estimate the accuracy until we actually test it.

To address this problem, we can split the initial data set into separate training and test data
sets. With this technique, we can actually approximate how well our model will perform with
the new data.

Let us understand this with an example, imagine we get a 90+ percent accuracy on the training
set and a 50 percent accuracy on the test set. Then, automatically it would be a red flag for the
model.
Another way to detect overfitting is to start with a simplistic model that will serve as a
benchmark.

With this approach, if you try more complex algorithms you will be able to understand whether the
additional complexity is even worthwhile for the model or not. This is also known as the Occam's razor
test: it basically chooses the simpler model when two models show comparable performance.
Although detecting overfitting is a good practice, there are also several techniques to prevent
overfitting. Let us take a look at how we can prevent overfitting in Machine
Learning.
How to Avoid Overfitting In Machine Learning?


There are several techniques to avoid overfitting in Machine Learning altogether listed below.

1. Cross-Validation
2. Training with more data
3. Removing Features
4. Early Stopping
5. Regularization
6. Ensembling
1. Cross-Validation

One of the most powerful features to avoid/prevent overfitting is cross-validation. The idea
behind this is to use the initial training data to generate mini train-test-splits, and then use
these splits to tune your model.

In a standard k-fold validation, the data is partitioned into k-subsets also known as folds. After
this, the algorithm is trained iteratively on k-1 folds while using the remaining folds as the test
set, also known as holdout fold.
The cross-validation helps us to tune the hyperparameters with only the original training set. It
basically keeps the test set separately as a true unseen data set for selecting the final model.
Hence, avoiding overfitting altogether.
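As a concrete illustration of k-fold cross-validation (a sketch assuming scikit-learn is available; the dataset and model are placeholders chosen for this note, not part of the original text):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5-fold CV: train on 4 folds, validate on the held-out fold, repeat 5 times
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores)          # one accuracy score per fold
print(scores.mean())   # averaged estimate of generalization accuracy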

2. Training With More Data

This technique might not work every time, as we have also discussed in the example above,
where training with a significant amount of population helps the model. It basically helps the
model in identifying the signal better.

But in some cases, the increased data can also mean feeding more noise to the model. When
we are training the model with more data, we have to make sure the data is clean and free from
randomness and inconsistencies.

3. Removing Features

Although some algorithms have an automatic selection of features. For a significant number of
those who do not have a built-in feature selection, we can manually remove a few irrelevant
features from the input features to improve the generalization.

One way to do it is by deriving a conclusion as to how a feature fits into the model. It is quite
similar to debugging the code line-by-line.

In case if a feature is unable to explain the relevancy in the model, we can simply identify those
features. We can even use a few feature selection heuristics for a good starting point.

4. Early Stopping

When the model is training, you can actually measure how well the model performs based on
each iteration. We can do this until a point when the iterations improve the model’s
performance. After this, the model overfits the training data as the generalization weakens after
each iteration.
So basically, early stopping means stopping the training process before the model passes the
point where the model begins to overfit the training data. This technique is mostly used in deep
learning.

5. Regularization

It basically means, artificially forcing your model to be simpler by using a broader range of
techniques. It totally depends on the type of learner that we are using. For example, we can
prune a decision tree, use a dropout on a neural network or add a penalty parameter to the
cost function in regression.

Quite often, regularization is a hyperparameter as well. It means it can also be tuned through
cross-validation.

6. Ensembling

This technique basically combines predictions from different Machine Learning models. Two of
the most common methods for ensembling are listed below:

· Bagging attempts to reduce the chance of overfitting complex models


· Boosting attempts to improve the predictive flexibility of simpler models
Even though they are both ensemble methods, the approach totally starts from opposite
directions. Bagging uses complex base models and tries to smooth out their predictions while
boosting uses simple base models and tries to boost its aggregate complexity.
Chapter 4
Presented by : Parth Nirmal (210280723008)

Topic : Graph Theory


Powered By : Nikita Nagdevani (210280723007)
1. Explain the OSI model in detail

o Ans: OSI stands for Open System Interconnection. It is a reference model that describes how
information from a software application in one computer moves through a physical
medium to the software application in another computer.
o OSI consists of seven layers, and each layer performs a particular network function.
o The OSI model was developed by the International Organization for Standardization (ISO) in 1984,
and it is now considered an architectural model for inter-computer communications.
o The OSI model divides the whole task into seven smaller and manageable tasks. Each layer is
assigned a particular task.
o Each layer is self-contained, so that the task assigned to each layer can be performed independently.

Characteristics of OSI Model:


● The OSI model is divided into two parts: upper layers and lower layers.
● The upper layers of the OSI model mainly deal with application-related issues, and they are
implemented only in software. The application layer is closest to the end user. Both
the end user and the application layer interact with the software applications. An upper
layer refers to the layer just above another layer.
● The lower layer of the OSI model deals with the data transport issues. The data link
layer and the physical layer are implemented in hardware and software. The physical
layer is the lowest layer of the OSI model and is closest to the physical medium. The
physical layer is mainly responsible for placing the information on the physical medium.
Functions of the OSI Layers

There are the seven OSI layers. Each layer has different functions. A list of seven layers are given
below:
1. Physical Layer
2. Data-Link Layer
3. Network Layer
4. Transport Layer
5. Session Layer
6. Presentation Layer
7. Application Layer

Physical layer

o The main functionality of the physical layer is to transmit the individual bits from one
node to another node.
o It is the lowest layer of the OSI model.
o It establishes, maintains and deactivates the physical connection.
o It specifies the mechanical, electrical and procedural network interface specifications.

Functions of a Physical layer:

o Line Configuration: It defines the way how two or more devices can be connected
physically.
o Data Transmission: It defines the transmission mode whether it is simplex, half duplex or
full-duplex mode between the two devices on the network.
o Topology: It defines the way how network devices are arranged.
o Signals: It determines the type of the signal used for transmitting the information.

Data-Link Layer

o This layer is responsible for the error-free transfer of data frames.


o It defines the format of the data on the network.
o It provides reliable and efficient communication between two or more devices.
o It is mainly responsible for the unique identification of each device that resides on a local
network.
o It contains two sub-layers:
o Logical Link Control Layer
o It is responsible for transferring the packets to the Network layer of the
receiver.
o It identifies the address of the network layer protocol from the header.
o It also provides flow control.
o Media Access Control Layer
o The Media Access Control layer is a link between the Logical Link Control layer
and the network's physical layer.
o It is used for transferring the packets over the network.

Functions of the Data-link layer


o Framing: The data link layer translates the physical layer's raw bit stream into packets known as
frames. The Data link layer adds a header and a trailer to the frame. The header which
is added to the frame contains the hardware destination and source address.

o Physical Addressing: The Data link layer adds a header to the frame that contains a
destination address. The frame is transmitted to the destination address mentioned in
the header.
o Flow Control: Flow control is the main functionality of the Data-link layer. It is the
technique through which the constant data rate is maintained on both the sides so that
no data get corrupted. It ensures that the transmitting station such as a server with
higher processing speed does not exceed the receiving station, with lower processing
speed.
o Error Control: Error control is achieved by adding a calculated value, the CRC (Cyclic
Redundancy Check), which is placed in the Data link layer's trailer that is added to the
message frame before it is sent to the physical layer. If any error occurs, then
the receiver sends an acknowledgment requesting retransmission of the corrupted frames.
o Access Control: When two or more devices are connected to the same communication
channel, then the data link layer protocols are used to determine which device has
control over the link at a given time.

Network Layer
o It is a layer 3 that manages device addressing, tracks the location of devices on the
network.
o It determines the best path to move data from source to the destination based on the
network conditions, the priority of service, and other factors.
o The network layer is responsible for routing and forwarding the packets.
o Routers are layer 3 devices; they are specified in this layer and used to provide
routing services within an internetwork.
o The protocols used to route the network traffic are known as Network layer protocols.
Examples of such protocols are IP and IPv6.

Functions of Network Layer:

o Internetworking: An internetworking is the main responsibility of the network layer. It


provides a logical connection between different devices.
o Addressing: A Network layer adds the source and destination address to the header of
the frame. Addressing is used to identify the device on the internet.
o Routing: Routing is the major component of the network layer, and it determines the best
optimal path out of the multiple paths from source to the destination.
o Packetizing: The Network Layer receives the data from the upper layer and converts it into packets.
This process is known as Packetizing. It is achieved by the internet protocol (IP).

Transport Layer

o The Transport layer is Layer 4; it ensures that messages are transmitted in the order in
which they are sent and there is no duplication of data.
o The main responsibility of the transport layer is to transfer the data completely.
o It receives the data from the upper layer and converts it into smaller units known as
segments.
o This layer can be termed an end-to-end layer as it provides a point-to-point connection
between source and destination to deliver the data reliably.
The two protocols used in this layer are:

o Transmission Control Protocol


o It is a standard protocol that allows the systems to communicate over the internet.
o It establishes and maintains a connection between hosts.
o When data is sent over the TCP connection, then the TCP protocol divides the data
into smaller units known as segments. Each segment travels over the internet
using multiple routes, and they arrive in different orders at the destination. The
transmission control protocol reorders the packets in the correct order at the
receiving end.
o User Datagram Protocol
o User Datagram Protocol is a transport layer protocol.
o It is an unreliable transport protocol as in this case receiver does not send any
acknowledgment when the packet is received, the sender does not wait for any
acknowledgment. Therefore, this makes a protocol unreliable.

Functions of Transport Layer:

o Service-point addressing: Computers run several programs simultaneously; for this


reason, data must be transmitted from source to destination not only from one
computer to another computer but also from one process to another process. The
transport layer adds a header that contains the address known as a service-point
address or port address. The responsibility of the network layer is to transmit the data
from one computer to another computer, and the responsibility of the transport layer is
to transmit the message to the correct process.
o Segmentation and reassembly: When the transport layer receives the message from the
upper layer, it divides the message into multiple segments, and each segment is
assigned with a sequence number that uniquely identifies each segment. When the
message has arrived at the destination, then the transport layer reassembles the
message based on their sequence numbers.
o Connection control: Transport layer provides two services Connection-oriented service
and connectionless service. A connectionless service treats each segment as an
individual packet, and they all travel in different routes to reach the destination. A
connection-oriented service makes a connection with the transport layer at the
destination machine before delivering the packets. In connection-oriented service, all
the packets travel in the single route.
o Flow control: The transport layer also responsible for flow control but it is performed
end-to-end rather than across a single link.
o Error control: The transport layer is also responsible for error control. Error control is
performed end-to-end rather than across a single link. The sender's transport layer
ensures that the message reaches the destination without any error.

Session Layer

o It is layer 5 in the OSI model.


o The Session layer is used to establish, maintain and synchronize the interaction between
communicating devices.

Functions of Session layer:

o Dialog control: Session layer acts as a dialog controller that creates a dialog between two
processes or we can say that it allows the communication between two processes which
can be either half-duplex or full-duplex.
o Synchronization: Session layer adds some checkpoints when transmitting the data in a
sequence. If some error occurs in the middle of the transmission of data, then the
transmission will take place again from the checkpoint. This process is known as
Synchronization and recovery.

Presentation Layer
o A Presentation layer is mainly concerned with the syntax and semantics of the information
exchanged between the two systems.
o It acts as a data translator for a network.
o This layer is a part of the operating system that converts the data from one presentation
format to another format.
o The Presentation layer is also known as the syntax layer.

Functions of Presentation layer:

o Translation: The processes in two systems exchange the information in the form of
character strings, numbers and so on. Different computers use different encoding
methods, the presentation layer handles the interoperability between the different
encoding methods. It converts the data from sender-dependent format into a common
format and changes the common format into receiver-dependent format at the receiving
end.
o Encryption: Encryption is needed to maintain privacy. Encryption is a process of
converting the sender-transmitted information into another form and sends the
resulting message over the network.
o Compression: Data compression is a process of compressing the data, i.e., it reduces the
number of bits to be transmitted. Data compression is very important in multimedia
such as text, audio, video.
Application Layer

● An application layer serves as a window for users and application processes to access
network service.
● It handles issues such as network transparency, resource allocation, etc.
● An application layer is not an application, but it performs the application layer functions.
● This layer provides the network services to the end-users.

Functions of Application layer:


1. File transfer, access, and management (FTAM): An application layer allows a user to
access the files in a remote computer, to retrieve the files from a computer and to
manage the files in a remote computer.
2. Mail services: An application layer provides the facility for email forwarding and storage.

3. Directory services: An application layer provides distributed database sources and is used to
provide global information about various objects.

2. Explain various application of data mining.

Ans:

• Data mining is the process by which patterns in large data sets are viewed and discovered using
intersecting techniques such as statistics, machine learning, and database
systems.
• It involves data extraction from a group of raw and unidentified data sets to
provide some meaningful results through mining.
• The extracted data is then further used by using transformation and ensuring that it is
being arranged to best service as per business requirements and needs.
List of Data Mining Applications
• Here is the list of various Data Mining Applications, which are given below:

1. Financial firms, banks, and their analysis


• Many data mining techniques are involved in critical banking and financial firms that provide
and keep data of utmost importance.
• One such method is distributed data mining, which is researched, modelled, crafted, and
developed to help track suspicious activities or any mischievous or fraudulent
transactions related to credit cards, net banking, or any other banking service.

2. Health care domain and insurance domain
• The data mining-related applications can efficiently track and monitor a
patient’s health condition and help in efficient diagnosis based on the past
sickness record.
• Similarly, the insurance industry’s growth depends on the ability to convert the
data into knowledge form or by providing various details about the
customers, markets, and prospective competitors.
• Therefore all those companies who have applied the data mining techniques
efficiently have reaped the benefits.
• This is used for claims and their analysis, i.e., identifying the medical
procedures claimed together.
• It enables the forecasting of new policies, helps detect risky customer
behaviour patterns, and helps detect fraudulent behaviour.
3. Application in the domain of transportation
• The historic or batch form of data helps identify the mode of transport a
particular customer generally opts for when going to a particular place, say his home
town, so that alluring offers and heavy discounts on newly launched products and
services can be provided to him.
• This feeds into targeted and organic advertisements, where a prospective lead
can be converted into a customer.
• It is also helpful in determining the distribution of schedules among various
warehouses and outlets and for analyzing load-based patterns.
4. Education
• In education, the application of data mining has been prevalent, where the
emerging field of educational data mining focuses mainly on the ways and
methods by which the data can be extracted from age-old processes and
systems of educational institutions.
• The goal is to help a student grow and learn in various
facets using advanced scientific knowledge.
• Here data mining comes majorly into play by ensuring that the right quality of
knowledge and decision-making content is provided to the education
departments.
3. Explain Scheduling Algorithms of an Operating System.
A Process Scheduler schedules different processes to be assigned to the CPU based on
particular scheduling algorithms. There are six popular process scheduling algorithms which
we are going to discuss in this chapter −

• First-Come, First-Served (FCFS) Scheduling


• Shortest-Job-Next (SJN) Scheduling
• Priority Scheduling
• Shortest Remaining Time
• Round Robin(RR) Scheduling
• Multiple-Level Queues Scheduling

These algorithms are either non-preemptive or preemptive. Non-preemptive algorithms are
designed so that once a process enters the running state, it cannot be preempted until it
completes its allotted time, whereas preemptive scheduling is based on priority, where a
scheduler may preempt a low-priority running process anytime a high-priority process
enters the ready state.
First Come First Serve (FCFS)

• Jobs are executed on a first come, first served basis.

• It is a non-preemptive scheduling algorithm.
• Easy to understand and implement.
• Its implementation is based on a FIFO queue.
• Poor in performance, as the average wait time is high.

For the same process table used in the SJN example below (arrival times 0, 1, 2, 3 and
execution times 5, 3, 8, 6), the wait time of each process is as follows (a small calculation
sketch is given after the table) −

Process   Wait Time : Service Time - Arrival Time

P0 0-0=0

P1 5-1=4

P2 8-2=6

P3 16 - 3 = 13

Average Wait Time: (0+4+6+13) / 4 = 5.75
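
A minimal FCFS sketch, assuming the process table above (the variable and function names
are illustrative, not part of any standard API):

# Processes as (name, arrival time, execution/burst time).
processes = [("P0", 0, 5), ("P1", 1, 3), ("P2", 2, 8), ("P3", 3, 6)]

def fcfs_wait_times(procs):
    """Return {name: wait time}, where wait = service (start) time - arrival time."""
    clock = 0
    waits = {}
    for name, arrival, burst in sorted(procs, key=lambda p: p[1]):  # arrival order
        start = max(clock, arrival)          # CPU may sit idle until the job arrives
        waits[name] = start - arrival
        clock = start + burst                # runs to completion (non-preemptive)
    return waits

waits = fcfs_wait_times(processes)
print(waits)                                 # {'P0': 0, 'P1': 4, 'P2': 6, 'P3': 13}
print(sum(waits.values()) / len(waits))      # 5.75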

Shortest Job Next (SJN)

• This is also known as shortest job first, or SJF.

• This is a non-preemptive scheduling algorithm.

• Best approach to minimize waiting time.

• Easy to implement in batch systems where the required CPU time is known in advance.
• Impossible to implement in interactive systems where the required CPU time is not known.
• The processor should know in advance how much time a process will take.

Given: Table of processes, and their Arrival time and Execution time. (A small calculation
sketch follows the waiting-time table below.)

Process   Arrival Time   Execution Time   Service Time

P0        0              5                0

P1        1              3                5

P2        2              8                14

P3        3              6                8

Waiting time of each process is as follows −
Process Waiting Time

P0 0-0=0

P1 5-1=4

P2 14 - 2 = 12

P3 8-3=5

Average Wait Time: (0 + 4 + 12 + 5)/4 = 21 / 4 = 5.25
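
A minimal non-preemptive SJN/SJF sketch for the same table (helper names are illustrative):
at every scheduling point it picks the shortest job among those that have already arrived,
reproducing the waiting times above.

processes = [("P0", 0, 5), ("P1", 1, 3), ("P2", 2, 8), ("P3", 3, 6)]

def sjn_wait_times(procs):
    """Shortest job next among the processes that have arrived by the current time."""
    remaining = list(procs)
    clock, waits = 0, {}
    while remaining:
        ready = [p for p in remaining if p[1] <= clock]
        if not ready:                        # CPU idle: jump to the next arrival
            clock = min(p[1] for p in remaining)
            continue
        name, arrival, burst = min(ready, key=lambda p: p[2])   # shortest job next
        waits[name] = clock - arrival
        clock += burst
        remaining.remove((name, arrival, burst))
    return waits

waits = sjn_wait_times(processes)
print(waits)                                 # {'P0': 0, 'P1': 4, 'P3': 5, 'P2': 12}
print(sum(waits.values()) / len(waits))      # 5.25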

Priority Based Scheduling

• Priority scheduling is a non-preemptive algorithm and one of the most common
scheduling algorithms in batch systems.
• Each process is assigned a priority. The process with the highest priority is executed first,
and so on.
• Processes with the same priority are executed on a first come, first served basis.
• Priority can be decided based on memory requirements, time requirements or any other
resource requirement.

Given: Table of processes, and their Arrival time, Execution time, and Priority. Here, 1 is
considered the lowest priority. (A small calculation sketch follows the waiting-time table.)
Process   Arrival Time   Execution Time   Priority   Service Time

P0        0              5                1          0

P1        1              3                2          11

P2        2              8                1          14

P3        3              6                3          5

Waiting time of each process is as follows −


Process Waiting Time

P0 0-0=0

P1 11 - 1 = 10

P2 14 - 2 = 12

P3 5-3=2

Average Wait Time: (0 + 10 + 12 + 2)/4 = 24 / 4 = 6
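
A similar hedged sketch for non-preemptive priority scheduling. It assumes the table above,
with a larger number meaning higher priority (1 is the lowest) and ties broken first come,
first served; the names used are illustrative only.

# Processes as (name, arrival, burst, priority).
processes = [("P0", 0, 5, 1), ("P1", 1, 3, 2), ("P2", 2, 8, 1), ("P3", 3, 6, 3)]

def priority_wait_times(procs):
    remaining = list(procs)
    clock, waits = 0, {}
    while remaining:
        ready = [p for p in remaining if p[1] <= clock]
        if not ready:                        # CPU idle until the next arrival
            clock = min(p[1] for p in remaining)
            continue
        # Highest priority first; ties broken by earlier arrival (FCFS).
        job = max(ready, key=lambda p: (p[3], -p[1]))
        name, arrival, burst, _ = job
        waits[name] = clock - arrival
        clock += burst
        remaining.remove(job)
    return waits

waits = priority_wait_times(processes)
print(waits)                                 # {'P0': 0, 'P3': 2, 'P1': 10, 'P2': 12}
print(sum(waits.values()) / len(waits))      # 6.0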

Shortest Remaining Time

• Shortest remaining time (SRT) is the preemptive version of the SJN algorithm.

• The processor is allocated to the job closest to completion, but it can be preempted by a
newly ready job with a shorter time to completion.
• Impossible to implement in interactive systems where the required CPU time is not known.
• It is often used in batch environments where short jobs need to be given preference.
(A small simulation sketch follows.)
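
For illustration only, a one-time-unit-at-a-time simulation of SRT, reusing the same
hypothetical process table as above: at every tick the job with the least remaining time
runs, so a newly arrived shorter job preempts the current one.

processes = [("P0", 0, 5), ("P1", 1, 3), ("P2", 2, 8), ("P3", 3, 6)]

def srt_wait_times(procs):
    remaining = {name: burst for name, _, burst in procs}
    arrival = {name: arr for name, arr, _ in procs}
    finish, clock = {}, 0
    while remaining:
        ready = [n for n in remaining if arrival[n] <= clock]
        if not ready:                        # nothing has arrived yet
            clock += 1
            continue
        job = min(ready, key=lambda n: remaining[n])   # least remaining time wins
        remaining[job] -= 1
        clock += 1
        if remaining[job] == 0:
            finish[job] = clock
            del remaining[job]
    # wait = turnaround - burst = (finish - arrival) - burst
    return {n: finish[n] - arr - burst for n, arr, burst in procs}

print(srt_wait_times(processes))             # {'P0': 3, 'P1': 0, 'P2': 12, 'P3': 5}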
Round Robin Scheduling

• Round Robin is a preemptive process scheduling algorithm.

• Each process is given a fixed time to execute, called a quantum.

• Once a process has executed for the given time period, it is preempted and another process
executes for a given time period.
• Context switching is used to save the states of preempted processes.

Assuming the same process table and a time quantum of 3 (the value implied by the wait
times), the wait time of each process is as follows (a small simulation sketch is given after
the table) −

Process   Wait Time : Service Time - Arrival Time

P0 (0 - 0) + (12 - 3) = 9

P1 (3 - 1) = 2

P2 (6 - 2) + (14 - 9) + (20 - 17) = 12

P3 (9 - 3) + (17 - 12) = 11

Average Wait Time: (9+2+12+11) / 4 = 8.5
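
A minimal Round Robin sketch that reproduces the wait times above. It assumes the same
hypothetical process table and a quantum of 3, and re-queues a preempted process behind
any jobs that arrived while it was running; the helper names are illustrative.

from collections import deque

processes = [("P0", 0, 5), ("P1", 1, 3), ("P2", 2, 8), ("P3", 3, 6)]
QUANTUM = 3

def rr_wait_times(procs, quantum):
    arrival = {n: a for n, a, _ in procs}
    burst = {n: b for n, _, b in procs}
    remaining = dict(burst)
    pending = sorted(procs, key=lambda p: p[1])    # not yet arrived, by arrival time
    ready, clock, finish = deque(), 0, {}

    def admit(t):
        while pending and pending[0][1] <= t:
            ready.append(pending.pop(0)[0])

    admit(0)
    while ready or pending:
        if not ready:                              # idle until the next arrival
            clock = pending[0][1]
            admit(clock)
        job = ready.popleft()
        run = min(quantum, remaining[job])
        clock += run
        remaining[job] -= run
        admit(clock)                               # newly arrived jobs join the queue first
        if remaining[job] > 0:
            ready.append(job)                      # preempted: back of the queue
        else:
            finish[job] = clock
    # wait = turnaround - burst
    return {n: finish[n] - arrival[n] - burst[n] for n in burst}

waits = rr_wait_times(processes, QUANTUM)
print(waits)                                       # {'P0': 9, 'P1': 2, 'P2': 12, 'P3': 11}
print(sum(waits.values()) / len(waits))            # 8.5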

Multiple-Level Queues Scheduling

Multiple-level queues are not an independent scheduling algorithm. They make use of other
existing algorithms to group and schedule jobs with common characteristics.

• Multiple queues are maintained for processes with common characteristics.
• Each queue can have its own scheduling algorithm.
• Priorities are assigned to each queue.

For example, CPU-bound jobs can be scheduled in one queue and all I/O-bound jobs in
another queue. The Process Scheduler then alternately selects jobs from each queue and
assigns them to the CPU based on the algorithm assigned to the queue.
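
A toy sketch of this idea (the queue contents and job names are made up): CPU-bound and
I/O-bound jobs live in separate queues, each queue keeps its own internal order (FCFS here),
and the dispatcher alternately services the queues.

from collections import deque

queues = {
    "cpu_bound": deque(["compile", "render", "encode"]),
    "io_bound":  deque(["editor", "shell"]),
}

def dispatch(queues):
    """Yield jobs by alternately taking the head of each non-empty queue."""
    while any(queues.values()):
        for q in queues.values():
            if q:
                yield q.popleft()

print(list(dispatch(queues)))
# ['compile', 'editor', 'render', 'shell', 'encode']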

4. Explain Waterfall Model and Incremental Model.


Ans: This model is named the "Waterfall Model" because its diagrammatic representation
resembles a cascade of waterfalls.
1. Requirements analysis and specification phase:
● The aim of this phase is to understand the exact requirements of the customer
and to document them properly.
● Both the customer and the software developer work together to document
all the functional, performance, and interfacing requirements of the software.
● It describes the "what" of the system to be produced and not the "how". In this phase,
a large document called the Software Requirement Specification (SRS) document is
created, which contains a detailed description of what the system will do in
common language.

2. Design Phase: This phase aims to transform the requirements gathered in the SRS into a
suitable form which permits further coding in a programming language. It defines the overall
software architecture together with high level and detailed design. All this work is documented
as a Software Design Document (SDD).

3. Implementation and unit testing: During this phase, design is implemented. If the SDD is
complete, the implementation or coding phase proceeds smoothly, because all the information
needed by software developers is contained in the SDD.

During testing, the code is thoroughly examined and modified. Small modules are tested in
isolation initially. After that these modules are tested by writing some overhead code to check
the interaction between these modules and the flow of intermediate output.

4. Integration and System Testing: This phase is highly crucial as the quality of the end product
is determined by the effectiveness of the testing carried out. The better output will lead to
satisfied customers, lower maintenance costs, and accurate results. Unit testing determines the
efficiency of individual modules. However, in this phase, the modules are tested for their
interactions with each other and with the system.

5. Operation and maintenance phase: Maintenance covers the work performed once
the software has been delivered to the customer, installed, and made operational.

When to use SDLC Waterfall Model?

Some circumstances where the use of the Waterfall model is most suited are:

● When the requirements are constant and not changed regularly.
● A project is short.
● The situation is calm.
● Where the tools and technology used are consistent and not changing.
● When resources are well prepared and are available to use.

Advantages of Waterfall model

● This model is simple to implement, and the number of resources required for it
is minimal.
● The requirements are simple and explicitly declared; they remain unchanged during the
entire project development.
● The start and end points for each phase are fixed, which makes it easy to track progress.
● The release date for the complete product, as well as its final cost, can be determined
before development.
● It gives easy control and clarity for the customer due to a strict reporting system.

Disadvantages of Waterfall model

● In this model, the risk factor is higher, so this model is not suitable for larger
and more complex projects.
● This model cannot accommodate changes in requirements during development.
● It becomes tough to go back to an earlier phase. For example, if the application has
shifted to the coding phase and there is a change in requirement, it becomes tough to go
back and change it.
● Since the testing is done at a later stage, it does not allow identifying the challenges and
risks in the earlier phases, so a risk reduction strategy is difficult to prepare.

INCREMENTAL MODEL

• The Incremental Model is a process of software development where requirements are
divided into multiple standalone modules of the software development cycle.
• In this model, each module goes through the requirements, design, implementation and
testing phases.
• Every subsequent release of a module adds functionality to the previous release. The
process continues until the complete system is achieved.

The various phases of incremental model are as follows:


1. Requirement analysis: In the first phase of the incremental model, product analysis
experts identify the requirements, and the system's functional requirements are understood
by the requirement analysis team. This phase plays a crucial role in developing the software
under the incremental model.

2. Design & Development: In this phase of the Incremental model of SDLC, the design of the
system functionality and the development method are completed. Whenever the software
requires new functionality, the incremental model repeats the design and development phase.

3. Testing: In the incremental model, the testing phase checks the performance of each existing
function as well as additional functionality. In the testing phase, the various methods are used
to test the behavior of each task.
4. Implementation: The implementation phase covers the final coding of the system. It
implements the design produced in the design and development phase and verifies the
functionality checked in the testing phase. After completion of this phase, the working
functionality of the product is enhanced and upgraded up to the final system product.

When we use the Incremental Model ?

o When the major requirements are clearly understood up front.


o A project has a lengthy development schedule.
o When the software team is not very well skilled or trained.
o When the customer demands a quick release of the product.
o You can develop prioritized requirements first.

Advantage of Incremental Model

o Errors are easy to be recognized.


o Easier to test and debug
o More flexible.
o Simple to manage risk because it is handled during each iteration.
o The Client gets important functionality early.

Disadvantage of Incremental Model

o Need for good planning


o Total Cost is high.
o Well defined module interfaces are needed.

5. What is Web Analytics? Explain outcomes of web analytics.

• Web Analytics is the process of collecting, processing, and analyzing website data.
• With Web analytics, we can truly see how effective our marketing campaigns have been,
find problems in our online services and make them better, and create customer profiles to
boost the profitability of advertisement and sales efforts.
• Every successful business is based on its ability to understand and utilize the data
provided by its customers, competitors, and partners.

Benefits of Web Analytics

1. Analyze online traffic

Web analytics helps you analyze the online traffic that comes to your website: how
many potential customers and users visit and browse through your webpage. It also tells
you where most of the potential customers come from, what they were doing on your
webpage and the amount of time they spent on your webpage. Because of this, and the
way the data is presented, you can easily identify which activities produce
better results and profits.

2. Track the bounce rate

The bounce rate of your website is essentially the proportion of visits in which a user lands
on your website and leaves without interacting further with your webpage. If you have a high
bounce rate, it means that, overall, your website offers a weak user experience and users did
not feel the content was suited to their purpose or what they were searching for. When you
have a high bounce rate, it is really hard for your business to perform well in sales, quality
leads, or any sort of conversions. (A small sketch of how a bounce rate can be computed
follows.)
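
A hedged illustration of how a bounce rate might be computed from raw session records;
the session data below is invented for the example, and real analytics tools may define a
bounce differently.

# Sessions with a single page view and no recorded interaction count as bounces.
sessions = [
    {"pages_viewed": 1, "interactions": 0},   # bounce
    {"pages_viewed": 3, "interactions": 2},
    {"pages_viewed": 1, "interactions": 0},   # bounce
    {"pages_viewed": 2, "interactions": 1},
]

bounces = sum(1 for s in sessions if s["pages_viewed"] == 1 and s["interactions"] == 0)
bounce_rate = bounces / len(sessions)
print(f"Bounce rate: {bounce_rate:.0%}")      # Bounce rate: 50%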

3. Target the ideal and right audience

For every business, it is very important to find the right audience and ensure that
your content reaches them in order to capitalize on your efforts. Web analytics helps
companies by giving them information on how and where to find the right audience,
as well as how to create content for that audience. With the right audience, you can
run better marketing campaigns that increase and promote sales, which in turn
increases conversions and improves your website tenfold.

4. Track and analyze your marketing campaigns

Tracking the success rate of your marketing campaigns will tell you how the campaigns
have been perceived by the users, whether they have been a success or not, and whether the
whole campaign was profitable to you and your website. By tracking with the help of
unique links for each campaign, you also ensure that if a campaign performs
terribly, you can always cancel it.
