Unit 3 KMS
µ₂ = µ′₂ − (µ′₁)²
µ₃ = µ′₃ − 3µ′₁µ′₂ + 2(µ′₁)³
µ₄ = µ′₄ − 4µ′₁µ′₃ + 6(µ′₁)²µ′₂ − 3(µ′₁)⁴
Example: Let the heights of 5 students in a class be 150 cm, 155 cm, 160 cm, 165 cm, and
180 cm. Find the mean height of the class.
Solution:-
Sum of heights of all students = 150 + 155 + 160 + 165 + 180 = 810
Total number of students = 5
Mean = 810 / 5 = 162
Hence, the mean height of the class is 162 cm.
µ₃ = ∑(xᵢ − µ)³ / n
Skewness (γ) = ∑(xᵢ − µ)³ / (n · µ₂^(3/2)) = µ₃ / µ₂^(3/2)   (so that γ² = µ₃² / µ₂³)
Where, ∑(𝐱ᵢ – µ)𝟑 : Sum of the cubed difference between each data point (xᵢ) and the mean
(µ).
n: Number of data points.
Negative Skewness: In negative skewness, the extreme data values are smaller, which in turn
decreases the mean value of the dataset.
Leptokurtic: This distribution indicates that a more significant percentage of data is present
near the tails, which implies longer tails. A leptokurtic distribution has a greater value of
kurtosis than a mesokurtic one.
Platykurtic: This distribution indicates that there is less data in the tail portion, which
implies shorter tails. A platykurtic distribution has a lesser value of kurtosis than a mesokurtic one.
µ3 = 0 → symmetrical
µ3 > 0 → positively skewed
µ3 < 0 → negatively skewed
For µ3 between ± 0.2, the distribution can be assumed to be normal with respect to skewness.
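As a quick numeric illustration (a minimal Python sketch; the dataset is made up), the central moments and the moment-based skewness can be computed directly from the definitions above:

    # Hypothetical data; compute central moments and moment-based skewness.
    data = [2, 4, 4, 4, 5, 5, 7, 9]

    n = len(data)
    mean = sum(data) / n

    mu2 = sum((x - mean) ** 2 for x in data) / n   # second central moment (variance)
    mu3 = sum((x - mean) ** 3 for x in data) / n   # third central moment

    skewness = mu3 / mu2 ** 1.5                    # gamma = mu3 / mu2^(3/2)
    print(mean, mu2, mu3, skewness)                # 5.0 4.0 5.25 0.65625

Here µ₃ > 0, so by the rules above the (invented) data are positively skewed.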
Mathematical Expectation
Mathematical expectation, also known as the expected value, is the probability-weighted sum
of all possible values of a random variable.
It can also be viewed as the sum, over all outcomes, of the product of the probability of an event
occurring, denoted by P(x), and the value observed when the event occurs.
For a random variable X, the expected value is a useful property. E(X) denotes the expected value
and is computed by summing over all distinct values that the random variable can take. The
mathematical expectation is given by the formula:
E(X) = x₁p₁ + x₂p₂ + … + xₙpₙ = ∑ xᵢpᵢ
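For instance (a minimal sketch; the distribution is made up for illustration), E(X) for a discrete random variable is the probability-weighted sum:

    # Hypothetical discrete distribution: values with their probabilities.
    values = [1, 2, 3, 4]
    probs  = [0.1, 0.2, 0.3, 0.4]   # must sum to 1

    expectation = sum(x * p for x, p in zip(values, probs))
    print(expectation)  # 1*0.1 + 2*0.2 + 3*0.3 + 4*0.4 = 3.0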
Properties of Expectation
1. If X and Y are two random variables, then the mathematical expectation of the sum of the two
variables is equal to the sum of the mathematical expectation of X and the mathematical
expectation of Y.
Or
E(X + Y) = E(X) + E(Y)
2. The mathematical expectation of the product of the two random variables will be the
product of the mathematical expectation of those two variables, but the condition is that the
two variables are independent in nature. In other words, the mathematical expectation of
the product of the n number of independent random variables is equal to the product of the
mathematical expectation of the n independent random variables
Or
E(XY)=E(X)E(Y)
3. The mathematical expectation of the sum of a constant and the function of a random
variable is equal to the sum of the constant and the mathematical expectation of the function
of that random variable.
Or
E (a + f(X)) = a + E (f(X)),
Where, a is a constant and f(X) is the function.
4. The mathematical expectation of the product of a constant and a random variable, plus
another constant, is equal to the product of the constant and the mathematical expectation
of that random variable, plus the other constant (this is verified numerically in the sketch
after this list).
Or
E (a X + b) = a E(X) + b
5. The mathematical expectation of a linear combination of random variables is equal to the
same linear combination of the mathematical expectations of those variables.
Or
E(∑ai Xi)=∑ ai E(Xi)
Where, ai, (i=1…n) are constants.
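These properties can be checked numerically. Below is a small sketch (the distributions are made up) verifying E(X + Y) = E(X) + E(Y) and E(aX + b) = aE(X) + b for discrete random variables:

    # Two hypothetical independent discrete random variables.
    x_vals, x_probs = [0, 1, 2], [0.2, 0.5, 0.3]
    y_vals, y_probs = [10, 20], [0.4, 0.6]

    def E(vals, probs):
        return sum(v * p for v, p in zip(vals, probs))

    EX, EY = E(x_vals, x_probs), E(y_vals, y_probs)

    # Property 1: E(X + Y) = E(X) + E(Y), summing over the joint distribution
    # (independence gives each joint probability as the product px * py).
    EXplusY = sum((x + y) * px * py
                  for x, px in zip(x_vals, x_probs)
                  for y, py in zip(y_vals, y_probs))
    print(EXplusY, EX + EY)        # both 17.1

    # Property 4: E(aX + b) = a E(X) + b
    a, b = 3, 5
    EaXb = sum((a * x + b) * px for x, px in zip(x_vals, x_probs))
    print(EaXb, a * EX + b)        # both 8.3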
Dispersion
The field of statistics is used across every sector and industry to help people better understand,
and predict, potential outcomes. In finance, investors often turn to statistics to gain a sense of
how returns on certain assets, or groups of assets, could be distributed. This range of possible
investment returns is called dispersion. In other words, dispersion refers to the range of
potential outcomes of investments based on historical volatility or returns.
There are two important ways to measure dispersion—alpha and beta—which calculate
risk-adjusted returns and returns relative to a benchmark, respectively. By considering the
dispersion of possible investment returns and values such as alpha and beta, investors can gain
a sense of the risk inherent in a particular security or investment portfolio.
Measures of Dispersion
In statistics, the measures of dispersion help to interpret the variability of data i.e. to know how
much homogenous or heterogeneous the data is. In simple terms, it shows how squeezed or
scattered the variable is.
Types of Measures of Dispersion
There are two main types of dispersion methods in statistics which are:
1. Range: It is simply the difference between the maximum value and the minimum value
given in a data set. Example: 1, 3, 5, 6, 7 → Range = 7 − 1 = 6
2. Variance: Subtract the mean from each value in the set, square each difference, add the
squares, and finally divide by the total number of values in the data set to get the variance.
Variance (σ²) = ∑(X − μ)² / N
3. Standard Deviation: The square root of the variance is known as the standard
deviation, i.e. S.D. = √σ².
4. Quartiles and Quartile Deviation: The quartiles are values that divide a list of
numbers into quarters. The quartile deviation is half of the distance between the third
and the first quartile.
5. Mean and Mean Deviation: The average of numbers is known as the mean and the
arithmetic mean of the absolute deviations of the observations from a measure of central
tendency is known as the mean deviation (also called mean absolute deviation).
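A short Python sketch (sample data invented for illustration) computing each of these measures with the standard library's statistics module:

    import statistics

    data = [1, 3, 5, 6, 7]

    range_ = max(data) - min(data)                              # 1. Range = 6
    mean = statistics.fmean(data)                               # 5. mean
    variance = sum((x - mean) ** 2 for x in data) / len(data)   # 2. population variance
    std_dev = variance ** 0.5                                   # 3. standard deviation
    q1, q2, q3 = statistics.quantiles(data, n=4)                # 4. quartiles
    quartile_dev = (q3 - q1) / 2                                # quartile deviation
    mean_dev = sum(abs(x - mean) for x in data) / len(data)     # 5. mean deviation

    print(range_, variance, std_dev, quartile_dev, mean_dev)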
Co-efficient of Dispersion
The relative measures of dispersion, expressed as pure numbers, are the coefficients of dispersion:
1. Co-efficient of Range
2. Co-efficient of Variation
3. Co-efficient of Standard Deviation
4. Co-efficient of Quartile Deviation
5. Co-efficient of Mean Deviation
The coefficients of dispersion are calculated (along with the measure of dispersion) when two
series are compared, that differ widely in their averages. The dispersion coefficient is also used
when two series with different measurement units are compared. It is denoted as C.D.
The common coefficients of dispersion are:
Range: C.D. = (Xmax − Xmin) / (Xmax + Xmin)
Quartile Deviation: C.D. = (Q3 − Q1) / (Q3 + Q1)
Standard Deviation: C.D. = S.D. / Mean
Mean Deviation: C.D. = Mean Deviation / Average
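Continuing the sketch above (same invented data and variables), the coefficients follow directly from these definitions:

    # Coefficients of dispersion for the same illustrative data.
    cd_range = (max(data) - min(data)) / (max(data) + min(data))  # (7-1)/(7+1) = 0.75
    cd_quartile = (q3 - q1) / (q3 + q1)
    cd_sd = std_dev / mean            # also called the coefficient of variation
    cd_mean_dev = mean_dev / mean     # mean deviation measured about the mean
    print(cd_range, cd_quartile, cd_sd, cd_mean_dev)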
In the previous chapter we studied various aspects of the theory of a single R.V. In this chapter
we extend the theory to two R.V.s, one for each coordinate axis X and Y of the XY
plane.
Definition: Let S be the sample space. Let X = X(S) and Y = Y(S) be two functions,
each assigning a real number to each outcome s ∈ S. Then (X, Y) is a two-dimensional random
variable.
f(y) = fY(y) = ∫₋∞^∞ fX,Y(x, y) dx   (marginal pdf of Y)
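As an illustration (the joint density below is hypothetical, not from the text), the marginal pdf of Y is obtained by integrating the joint pdf over x, here done numerically with scipy:

    from scipy import integrate

    # Hypothetical joint pdf on the unit square: f(x, y) = x + y for 0 <= x, y <= 1.
    def joint_pdf(x, y):
        return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

    def marginal_y(y):
        # f_Y(y) = integral over x of f(x, y) dx
        val, _err = integrate.quad(lambda x: joint_pdf(x, y), 0, 1)
        return val

    print(marginal_y(0.5))   # analytically: 1/2 + 0.5 = 1.0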
REGRESSION
Line of regression
The line of regression of X on Y is given by
x − x̄ = r (σx / σy) (y − ȳ)
Regression coefficient
Regression coefficient of Y on X: bYX = r σy / σx
Regression coefficient of X on Y: bXY = r σx / σy
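A short numeric sketch (made-up data) computing the two regression coefficients from their definitions; note the identity bYX · bXY = r²:

    import statistics

    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]

    r = statistics.correlation(x, y)   # Pearson's r (Python 3.10+)
    sx = statistics.pstdev(x)          # population standard deviations
    sy = statistics.pstdev(y)

    b_yx = r * sy / sx   # regression coefficient of Y on X
    b_xy = r * sx / sy   # regression coefficient of X on Y
    print(b_yx, b_xy, r ** 2, b_yx * b_xy)   # b_yx * b_xy equals r^2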
Linear correlation
Linear correlation is a measure of dependence between two random variables.
It has the following characteristics:
it ranges between -1 and 1;
it is proportional to covariance;
its interpretation is very similar to that of covariance.
Correlation
Let X and Y be two random variables.
The linear correlation coefficient between X and Y is
Cor[X, Y] = Cov[X, Y] / (σX σY)
Where Cov[X, Y] is the covariance between X and Y.
Interpretation
The interpretation is similar to the interpretation of covariance: the correlation between X and
Y provides a measure of how similar their deviations from the respective means are.
Thanks to this property, correlation allows us to easily understand the intensity of the linear
dependence between two random variables.
The closer correlation is to 1, the stronger the positive linear dependence between X
and Y is.
The closer it is to -1, the stronger the negative linear dependence between X and Y is.
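A minimal sketch (invented samples) computing the linear correlation coefficient from the covariance, matching the formula above:

    import statistics

    x = [10, 20, 30, 40]
    y = [12, 24, 33, 44]

    cov = statistics.covariance(x, y)   # sample covariance (Python 3.10+)
    cor = cov / (statistics.stdev(x) * statistics.stdev(y))
    print(cor)                          # close to +1: strong positive linear dependence
    print(statistics.correlation(x, y)) # same value via the built-in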
Terminology
The following terminology is often used
1. If Cov[X,Y] > 0, then X and Y are said to be Positively linearly correlated.
2. If Cov[X,Y] < 0, then X and Y are said to be Negatively linearly correlated.
3. If Cov[X,Y] = 0, then X and Y are said to be uncorrelated.
4. If Cov[X,Y] ≠ 0, then X and Y are said to be linearly correlated.
Sample in Statistics: An Introduction to Sampling
Sampling is very often used in our daily life. For example, while purchasing fruits from a shop,
we usually examine a few to assess the quality. A doctor examines a few drops of blood as a
sample and draws a conclusion about the blood constitution of the whole body.
T Tests
A t test is a statistical test that is used to compare the means of two groups. It is often used
in hypothesis testing to determine whether a process or treatment actually has an effect on the
population of interest, or whether two groups are different from one another.
When to use a t test
A t test can only be used when comparing the means of two groups.
The t test is a parametric test of difference, meaning that it makes the same assumptions about
your data as other parametric tests.
T test formula
One-Sample T-Test
While performing this test, the mean or average of one group is compared against a set
average, which is either a theoretical value or the mean of the population. For example, a teacher
wishes to figure out the average weight of the students of class 5 and compare it against
a set value of 45 kg.
The teacher first randomly selects a group of students and records individual weights to achieve
this. Next, she finds out the mean weight for that group and checks if it differs from the standard
set value of 45 kg. The formula used to obtain one-sample t-test results is:
t = (m − μ) / (s / √n)
Where,
t = t-statistic
m = mean of the group
𝜇 = theoretical mean value of the population
s = standard deviation of the group
n = sample size
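A hedged sketch of the one-sample case using scipy's ttest_1samp (the weights are invented to mirror the teacher example):

    from scipy import stats

    # Hypothetical sample of student weights (kg); test against mu = 45.
    weights = [44.1, 46.3, 45.8, 47.0, 44.9, 46.5, 45.2, 46.8]

    t_stat, p_value = stats.ttest_1samp(weights, popmean=45)
    print(t_stat, p_value)
    # If p_value < 0.05, the mean weight differs significantly from 45 kg.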
Independent Two-Sample T-Test
This is the test conducted when samples from two different groups, species, or populations are
studied and compared. It is also known as an independent T-test. For example, if a teacher
wants to compare the height of male students and female students in class 5, she would use the
independent two-sample test.
The t-test formula used to calculate this is:
t = (m₁ − m₂) / √( s² (1/n₁ + 1/n₂) )
Where,
t = t-statistic
m₁, m₂ = means of the two groups
s² = pooled variance of the two samples, s² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)
n₁, n₂ = sample sizes
Comparison of the three t-tests:

Purpose of test
  One-sample: decide if the population mean is equal to a specific value or not.
  Two-sample: decide if the population means for two different groups are equal or not.
  Paired: decide if the difference between paired measurements for a population is zero or not.

Example: test if...
  One-sample: the mean heart rate of a group of people is equal to 65 or not.
  Two-sample: the mean heart rates for two groups of people are the same or not.
  Paired: the mean difference in heart rate for a group of people before and after exercise is zero or not.

Estimate of population mean
  One-sample: sample average.
  Two-sample: sample average for each group.
  Paired: sample average of the differences in paired measurements.

Population standard deviation
  One-sample: unknown, use sample standard deviation.
  Two-sample: unknown, use sample standard deviations for each group.
  Paired: unknown, use sample standard deviation of differences in paired measurements.

Degrees of freedom
  One-sample: number of observations in sample minus 1, or n − 1.
  Two-sample: sum of observations in each sample minus 2, or n₁ + n₂ − 2.
  Paired: number of paired observations in sample minus 1, or n − 1.
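The two-sample and paired variants follow the same pattern in scipy as the one-sample sketch above (illustrative data, not from the text):

    from scipy import stats

    group_a = [152, 158, 149, 160, 155]   # e.g., heart rates of group A
    group_b = [148, 151, 147, 153, 150]   # e.g., heart rates of group B

    # Independent two-sample t-test (equal_var=True uses the pooled variance).
    t_ind, p_ind = stats.ttest_ind(group_a, group_b, equal_var=True)

    before = [70, 75, 80, 72, 68]
    after  = [74, 79, 85, 75, 71]

    # Paired t-test on before/after measurements of the same subjects.
    t_rel, p_rel = stats.ttest_rel(before, after)

    print(t_ind, p_ind)
    print(t_rel, p_rel)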
F-Test Definition
F-Test in statistics is a hypothesis-testing procedure that compares two variances from two
samples. The F-Test is used when the difference between two variances needs to be
assessed for significance, i.e., when determining whether or not two samples can be taken as
representative of normal populations with the same variance.
The F-Test helps to determine the overall significance of the regression. It is useful in various
situations, such as when a quality controller wants to determine whether the product’s quality
is deteriorating over time. In addition, it might be useful for an economist to determine whether
income variability varies between two populations.
Key Takeaways
The F-test is a statistical test that evaluates if the variances of the two normal populations
are equal.
One can deem the variance ratio of the test insignificant if F ≤ F0.05 (the table value at the
5% level), and one can assume that the values are from the same group or from groups with
similar variances.
The null hypothesis is rejected, and the variance ratio is considered significant, if F ≥ F0.05.
The F-test vs. t-test: The t-test and the F-test are two separate tests. The t-test compares two
populations’ means, whereas the F-test compares two populations’ variances.
2. Null hypothesis: After the formation of the test, the null hypothesis is either
(a) Two samples were from the same group or
(b) The population’s variances concerning both samples are equal.
3. To compute the variance ratio, use the formula F = larger estimate of variance / smaller
estimate of variance. Whether it is S₁² or S₂², the numerator is always the larger of the
two estimates.
4. When calculating degrees of freedom, V₁ corresponds to the sample with the larger
variance and V₂ to the sample with the smaller variance.
5. Table value of F: the critical value of F is available from the “F-Table” (F-test table) at
the determined significance level.
6. Analysis: This involves the comparison of the computed value and the tabulated value. For
various levels of significance, there are several F Tables (F-test tables).
(a) The variance ratio is insignificant if F ≤ F0.05 (the table value). We can assume that the
values are from the same group or groups with similar variances.
(b) The null hypothesis is rejected, and the variance ratio is considered significant, if F ≥ F0.05.
Consider the example of the population in a village:
Village:               A     B
Sample size:           10    12
Mean monthly income:   150   140
Sample variance:       92    110
Testing the equality of sample variances with a significance level of 5% with the above-given
data.
Null Hypothesis: H₀: σ₁² = σ₂²
Sample variance S₁² (sample 1) = ∑(x₁ − x̄₁)² / (n₁ − 1) = 92 / (10 − 1) = 10.22
And
Sample variance S₂² (sample 2) = ∑(x₂ − x̄₂)² / (n₂ − 1) = 110 / (12 − 1) = 10
F = S₁² / S₂² = 10.22 / 10 = 1.022
The degrees of freedom are v₁ = 10 − 1 = 9 and v₂ = 12 − 1 = 11, and the table value of F at the
5% significance level is 2.90. An online F-test calculator can make the calculations
easier.
Interpretation
The F statistic helps to decide whether to accept or reject the null hypothesis. The test results
must include an F value and an F critical value. The F value is compared to a particular value
known as the F critical value. The value derived from the data is the F statistic, or F value
(without the “critical” part). In general, one can reject the null hypothesis if the computed F
value for a test is higher than the F critical value.
In the example above, the computed value of F (1.022) is less than the table value obtained from
the F table (2.90). As a result, the null hypothesis is not rejected, and the variances of the two
samples can be taken as equal.
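The village example can be reproduced with scipy (a sketch; the critical value is read from the F distribution rather than a printed table):

    from scipy import stats

    s1_sq, n1 = 92 / (10 - 1), 10    # larger variance estimate, sample 1
    s2_sq, n2 = 110 / (12 - 1), 12   # smaller variance estimate, sample 2

    F = s1_sq / s2_sq                                    # 10.22 / 10 = 1.022
    F_crit = stats.f.ppf(0.95, dfn=n1 - 1, dfd=n2 - 1)   # ~2.90 at the 5% level

    print(F, F_crit)
    # F < F_crit, so the null hypothesis of equal variances is not rejected.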
F-Test for Equality of Two Variances
F-Test: An F-test is a test statistic used to check the equality of variances of two populations.
The F-test is used for variances.
T-Test: The t-test is used when the sample size is small (n < 30) and the population standard
deviation is not known. It is used for testing means.
Chi-Square Goodness-of-Fit Test
In this type of hypothesis test, you determine whether the data “fit” a particular distribution or
not. For example, you may suspect your unknown data fit a binomial distribution. You use a
chi-square test (meaning the distribution for the hypothesis test is chi-square) to determine if
there is a fit or not. The null and the alternative hypotheses for this test may be written in
sentences or may be stated as equations or inequalities.
The test statistic for the goodness-of-fit test is: χ² = ∑ (O − E)² / E
Where:
O = observed value
E = expected value
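A minimal sketch of the goodness-of-fit statistic with scipy (observed and expected counts invented):

    from scipy import stats

    observed = [18, 22, 20, 25, 15]   # hypothetical observed frequencies
    expected = [20, 20, 20, 20, 20]   # frequencies expected under the null

    chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
    print(chi2, p_value)
    # Equivalently: chi2 = sum((o - e)**2 / e for o, e in zip(observed, expected))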
Definition of Attributes:
The dictionary meaning of attribute is quality or characteristic.
Examples: Attributes are beauty, honesty, gender (male & female), health, employment,
smoking, drinking, blindness, etc.
The theory of attributes deals with qualitative characteristics which cannot be measured
quantitatively. In the study of attributes, the objects (units of the population/sample) are
classified according to the presence or absence of the attribute (quality) in them.
Example: A person may be classified as smoker or non-smoker, blind or not blind, male or
female, employed or unemployed, healthy or sick, literate or illiterate, and so on.
Types of Attributes
Categorical Attributes
These attributes represent qualitative or nominal data that cannot be measured or ordered, and
they can be further classified into:
Binary attributes: These attributes can take only two values, such as yes or no, true or
false, male or female.
Nominal attributes: These attributes have multiple categories that do not have any
natural order, such as colors, shapes, or names of people.
Numerical Attributes
These attributes represent quantitative data that can be measured or ordered, and they can be
further classified into:
Discrete attributes: These attributes take on integer values, such as the number of
siblings, or the number of cars in a parking lot.
Continuous attributes: These attributes can take on any real value within a range, such
as weight, height, or temperature.
Ordinal Attributes
These attributes represent data that can be ordered or ranked, but the differences between the
values may not be uniform, and they can be further classified into:
Interval attributes: These attributes have a fixed scale with uniform intervals between
values, such as dates, time, or temperature measured in Celsius or Fahrenheit.
Ratio attributes: These attributes have a fixed scale with a true zero point, such as weight,
height, or distance.
Textual Attributes
These attributes represent data in natural language format, such as text documents, emails, or
social media posts.
Geographic Attributes
These attributes represent data related to geographic locations, such as latitude, longitude, or
postal codes.
Temporal Attributes
These attributes represent data related to time, such as dates, times, or durations.
Date attributes: These attributes represent a specific date, such as birthdate or expiration
date.
Time attributes: These attributes represent a specific time of day, such as the time an
event starts or ends.
Duration attributes: These attributes represent a length of time, such as the duration of a
movie or the time it takes to complete a task.
Boolean Attributes
These attributes represent data that can take only two possible values, such as true or false, yes
or no, or on or off.
Image Attributes
These attributes represent data related to images, such as size, resolution, or color depth.
Audio Attributes
These attributes represent data related to audio signals, such as sampling rate, bit depth, or
duration.
Video Attributes
These attributes represent data related to video signals, such as frame rate, resolution, or
codec.
Structural Attributes
These attributes represent data related to the structure or hierarchy of objects, such as
parent-child relationships or the depth of a node in a tree.
Behavioral Attributes
These attributes represent data related to the behavior or actions of objects, such as frequency
of use or the number of clicks on a website.
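As a loose illustration (the mapping is my own, not from the text), several of these attribute types can be represented as typed fields of a record in code:

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Person:
        name: str            # nominal (categorical) attribute
        is_smoker: bool      # binary / Boolean attribute
        num_siblings: int    # discrete numerical attribute
        height_cm: float     # continuous, ratio-scale attribute
        birthdate: date      # temporal (date) attribute

    p = Person("Asha", False, 2, 162.0, date(2001, 5, 17))
    print(p)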
Examples of Attribute
Here are some examples of attributes:
Color: The color of an object is an attribute. For example, a red apple or a blue car.
Size: The size of an object is an attribute. For example, a large house or a small dog.
Shape: The shape of an object is an attribute. For example, a round ball or a square box.
Texture: The texture of an object is an attribute. For example, a smooth surface or a rough
texture.
Material: The material of an object is an attribute. For example, a wooden chair or a metal
spoon.
Age: The age of a person or an object is an attribute. For example, a young child or an
antique vase.
Gender: The gender of a person or an animal is an attribute. For example, a male cat or a
female bird.
Height: The height of a person or an object is an attribute. For example, a tall building or
a short person.
Weight: The weight of a person or an object is an attribute. For example, a heavy rock or
a lightweight backpack.
Personality: The personality of a person is an attribute. For example, a kind-hearted
person or a funny friend.
Applications of Attribute
Here are some common applications of attributes in various fields:
Descriptive statistics: Attributes can be used to describe the characteristics of a sample
or population in a research study. For example, attributes such as age, gender, income, and
education level can be used to describe the demographic characteristics of participants in
a study.
Sampling: Attributes can be used to define the population of interest and to select a sample
from that population. For example, attributes such as geographic location, age, or income
can be used to define a target population for a study, and then used to select a sample that
represents that population.
Programming: In programming, attributes are used to define properties of objects or
classes. Attributes can be used to specify metadata, such as the author or version of code,
or to provide additional information about the behavior of a method or class.
Data analysis: Attributes are used to describe the characteristics of data sets, such as their
size, shape, and distribution. Attributes can also be used to indicate missing or incomplete
data points.
Web development: HTML attributes are used to specify the properties of elements on a
web page, such as the color of text, the size of an image, or the target of a link.
Machine learning: In machine learning, attributes are used to describe the features of
input data, such as the height and weight of a person in a dataset about human bodies.
Database management: Attributes are used to define the fields in a database table, such
as the name and age of a person in a database about customers.
Graphic design: Attributes such as color, shape, and texture are used to create visual
elements in graphic design, such as logos, icons, and user interface elements.
Purpose of Attribute
The purpose of an attribute is to provide information about the characteristics or qualities of an
object, person, or concept. Attributes can help us to identify and distinguish things from one
another, and they can also help us to make decisions and draw conclusions based on the
information provided.
Attributes are important in many areas of life, including:
Research: Attributes can be used to collect data and identify patterns, which can help
researchers to draw conclusions and make predictions.
Product design: The attributes of a product, such as its size, color, and material, can help
designers to create products that meet the needs and preferences of their target audience.
Marketing: Attributes can be used to promote and differentiate products from their
competitors, by highlighting the unique features and benefits.
Personal development: Understanding our own attributes can help us to identify our
strengths and weaknesses, and work on improving ourselves.
Characteristics of Attribute
Some of the key characteristics of attributes are:
Identifiability: An attribute must be distinguishable and identifiable from other
characteristics or qualities of an object or person.
Descriptiveness: Attributes should be descriptive, meaning they provide information
about the quality or characteristic of an object or person.
Measurability: Attributes can be measured or quantified in some way. For example, the
weight of an object or the age of a person.
Relevance: Attributes should be relevant to the context or situation in which they are being
used. For example, the color of a car is not relevant when discussing the car’s fuel
efficiency.
Consistency: Attributes should be consistent over time and across different contexts. For
example, if an object is red, it should be consistently red in all lighting conditions.
Importance: Attributes should be important or meaningful in the context they are being
used. For example, the material of a product may be more important than its color when
making a purchasing decision.
Objectivity: Attributes should be objective, meaning they are based on facts and not
influenced by personal bias or opinion.
Advantages of Attribute
Attributes have several advantages, including:
Clarity and Precision: Attributes help to provide a clear and precise description of an
object, person, or concept. They allow us to break down complex ideas or characteristics
into smaller, more manageable parts, making it easier to understand and communicate
them.
Comparability: Attributes allow for easy comparison between different objects, people,
or concepts. By identifying and measuring similar attributes, we can compare and evaluate
them based on their relative strengths and weaknesses.
Objectivity: Attributes are objective, meaning they are based on facts and not influenced
by personal bias or opinion. This makes them a useful tool for analyzing and describing
objects or concepts in a consistent and unbiased manner.
Efficiency: Attributes provide a quick and efficient way to collect and organize
information about an object or person. This can save time and resources when making
decisions or conducting research.
Customization: Attributes can be customized to fit specific needs or contexts. For
example, attributes can be tailored to meet the unique requirements of a particular product
or market segment.
Predictive Power: Attributes can help to predict the behavior or performance of an object
or person. By identifying key attributes that are associated with success or failure, we can
make more informed decisions and take actions to improve outcomes.
Limitations of Attribute
Some Limitations of Attribute are as follows:
Limited Scope: Attributes can only provide a limited description of an object, person, or
concept. They may not capture all of the nuances and complexities that make up the whole.
Context Dependency: Attributes may be dependent on the context or situation in which
they are being used. For example, the color of a product may be less important than its
durability in certain contexts.
Subjectivity: Despite efforts to make attributes objective, there is always the potential for
personal bias or subjectivity to influence the selection and interpretation of attributes.
Over-Simplification: Attributes can sometimes over-simplify the characteristics of an
object, person, or concept, leading to an incomplete or inaccurate understanding.
Incomplete Coverage: Attributes may not cover all of the important characteristics or
qualities of an object, person, or concept. This can lead to an incomplete or inaccurate
understanding of the whole.
Lack of Flexibility: Attributes can be inflexible and may not adapt well to changing
circumstances or new information.
Difficulty of Measurement: Some attributes may be difficult to measure or quantify,
making them less useful for analysis or comparison.