PSAI Unit 1
Decision Making: Data permeates every aspect of modern organizations,
serving as a cornerstone for success by facilitating evidence-based
decision-making rooted in factual data, statistical analysis, and emerging
trends. With the expanding volume and significance of data, the emergence of
data science has become inevitable. This interdisciplinary field of IT has
propelled data scientist roles to the forefront, making them highly sought-after
in the 21st century job market.
The history of data science can be traced back to the emergence of statistics
and computer science, with roots extending into various fields such as
mathematics, economics, and information theory.
The foundations of data analysis can be found in the development of
statistical methods by pioneers like Francis Galton, Karl Pearson, and Ronald
Fisher in the late 19th and early 20th centuries. These statisticians laid the
groundwork for analyzing and interpreting data through techniques such as
regression analysis and hypothesis testing.
The use of statistics is deeply rooted in data science: data science began with statistics and has since grown to incorporate major domains such as machine learning (ML), artificial intelligence (AI), and the Internet of Things (IoT).
While the term "data science" was coined earlier, it gained prominence in the
early 21st century as organizations began to recognize the value of
data-driven decision-making. Companies and organizations popularized data
science by leveraging data analytics to improve their products and services.
Descriptive Analysis
Descriptive Analysis examines data to gain insights into what happened in the
past. It is characterized by visualizations such as pie charts, bar charts, and graphs.
Diagnostic Analysis
Diagnostic Analysis is a deeper, more detailed examination of data to
understand why something happened. It is characterized by techniques such as
data mining and correlation analysis.
Predictive Analysis
Predictive analytics uses historical data to make accurate predictions about
data patterns that may emerge in the future. It is characterized by techniques
such as machine learning, forecasting, pattern matching and predictive
modeling.
Prescriptive Analysis
Prescriptive Analysis takes predictive data to another level. It not only
provides what is likely to happen in the future, but also proposes the most
effective course of action in response to that result.
Social Media Analysis: Organizations analyze social media data using data
science techniques to understand customer sentiment, track trends, and
inform marketing strategies. Sentiment analysis algorithms can gauge public
opinion about products or brands based on social media posts and comments.
INTRODUCTION TO STATISTICS
Limitations of Statistics
Statistics possesses significant strengths, yet it's imperative to
acknowledge its limitations. Key constraints encompass:
Assumptions: Statistical methods operate based on certain
assumptions about data, such as normal distribution. Failure to meet
these assumptions can skew analysis results.
Data Collection
Data collection is the very first step in statistical analysis. It is the
process of gathering data or information from multiple sources to answer
research questions and problem statements. Data collection involves gathering
and analyzing information from various sources to address research inquiries,
assess outcomes, and predict trends and probabilities. This crucial phase is
integral to research, analysis, and decision-making across diverse fields such
as social sciences, business, and healthcare. The type of data you collect
should be whatever is relevant to the problem at hand.
Organization of data
Once data is collected, organization is the next step: how the data is
arranged determines how you can proceed with it. Organization of data refers
to the process of arranging data in a systematic and structured manner to
enhance analysis and interpretation.
The systematic arrangement of gathered or raw data to enhance
comprehensibility is termed data organization. By organizing data,
researchers facilitate subsequent statistical analyses, enabling comparisons
among similar datasets.
Analysis of data
Analysis is the process of taking the collected data, often in large volumes,
and using statistics and other data analysis techniques to identify trends,
patterns, and new insights in it. Data analysis involves using mathematical
techniques to extract useful information from a dataset. There are various
methods of data analysis in statistics, including descriptive statistics,
inferential statistics, regression analysis, and hypothesis testing.
Interpretation of data
Interpretation is about developing a better understanding of the data and
becoming familiar with it. Data interpretation involves analyzing data and
deriving significant insights through various analytical methods. It assists
researchers in categorizing, manipulating, and summarizing data to inform
sound business decisions. The ultimate objective of many data interpretation
projects is to formulate effective marketing strategies or broaden the client
user base.
Visualization of data
Data visualization/presentation is the art and science of transforming raw
data into a visual format that's easy to understand. It is like turning
numbers and statistics into a compelling story so that your audience quickly
grasps what the data is all about.
Data visualization serves as a crucial statistical instrument for visually
representing data through means like charts, graphs, and maps. Its
primary function is to simplify the comprehension, analysis, and
interpretation of intricate data sets.
Functions of Statistics
Definiteness
In statistics, definiteness entails presenting facts and figures in a
precise manner, which enhances the logical coherence and
persuasiveness of a statement compared to mere description.
Reduces the Complexity of data
Statistics simplifies the complexity inherent in raw data, which can
initially be difficult to comprehend. Through the application of various
statistical measures such as graphs, averages, dispersions, skewness,
kurtosis, correlation, and regression, we transform the data into a more
understandable and intelligible form. These measures facilitate
interpretation and inference drawing. Consequently, statistics plays a
pivotal role in expanding one's knowledge and understanding.
Facilitates comparison
Comparing different sets of observations is a fundamental aspect of
statistics, essential for drawing meaningful conclusions. The primary
objective of statistics is to facilitate comparisons between past and
present results, thereby discerning the causes of changes that have
occurred and predicting the impact of such changes in the future.
Testing Hypotheses
Formulating and testing hypotheses is an important function of statistics.
This helps in developing new theories: statistics examines the truth of claims
and thereby helps in generating new ideas.
Statistics
Types of statistics:
● Descriptive statistics
● Inferential statistics
Descriptive statistics
Descriptive statistics is a branch of statistics that involves summarizing,
organizing, and presenting data in a meaningful and concise manner.
Descriptive statistics describes and analyzes a dataset's main
characteristics and features without drawing conclusions beyond that dataset.
The primary purpose of descriptive statistics is to provide a clear and
precise description and summary of the data, enabling the researcher to gain
insights and identify hidden patterns, trends, and distributions within the
dataset.
Descriptive statistics often involves graphical representation of data
through charts, graphs, maps, and plots.
Descriptive statistics can be defined as a branch of statistics used to
summarize the characteristics of a sample using certain quantitative
techniques. It helps to provide simple and accurate summaries of
samples and observations using measures like mean, median,
variance, graphs and charts. Univariate descriptive statistics are used
to describe data that contains only one variable. On the other hand,
bivariate and multivariate descriptive statistics are used to describe
multivariate data.
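As an illustration, the following Python sketch uses pandas to produce a descriptive summary of a sample; the values are hypothetical and only serve to show the kind of output descriptive statistics provides.

import pandas as pd

# Hypothetical sample of exam scores (illustrative values only)
scores = pd.Series([45, 95, 12, 52, 47, 35, 65, 88, 22])

# Descriptive statistics: summarize the sample without drawing wider conclusions
print(scores.describe())           # count, mean, std, min, quartiles, max
print("Median:", scores.median())
print("Variance:", scores.var())   # sample variance (ddof=1 by default)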
Inferential Statistics
While descriptive statistics provides tools to summarize and describe data,
inferential statistics allows us to draw inferences and make conclusions
about populations based on sample data. Inferential statistics helps develop
a good understanding of the population by analyzing samples obtained from it.
It helps in making generalizations about the population by using various
analytical tests and tools.
The procedure involves choosing a sample and then applying tools such as
regression analysis and hypothesis testing.
Statistical inference is the branch of statistics concerned with drawing
conclusions and/or making decisions about a population based only on sample
data.
Key concepts and tools of inferential statistics include:
● Population
● Sample
● Hypothesis testing
● Z-test
● F-test
● T-test
● ANOVA
● Chi-square test
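As a hedged sketch of the idea, one of the tests listed above (the one-sample t-test) can be run with scipy to draw a conclusion about a population mean from a sample; the data and the hypothesized mean below are hypothetical.

import numpy as np
from scipy import stats

# Hypothetical sample drawn from a much larger population
sample = np.array([52, 48, 50, 55, 47, 53, 49, 51, 54, 46])

# One-sample t-test: is a population mean of 50 consistent with this sample?
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

# A small p-value (e.g. < 0.05) would be evidence against the hypothesized
# population mean; a large p-value means the sample is consistent with it.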
Example of Inferential Statistics
Suppose you are cooking a dish and want to taste it before serving it to your
guests, to get an idea of how it turned out. You would never eat the full dish
to form that judgment; instead, you taste a small portion with a spoon. The
spoonful is the sample, and the whole dish is the population.
Population & Sample
Population
Population refers to the entire set of observations, events, objects,
individuals, items, or data points about which you want to gather
information; it forms the data pool for the study.
It represents the entire group that you are interested in studying and
includes every possible unit (or element) that falls within the scope of your
study. The population is the study's relevant data.
Types of Populations:
Finite Population: A population is considered finite if it consists of a
distinct and countable number of elements. For example, the population of
students in this class would be a finite population.
Key points
Inferential Statistics:
Inferential statistics can be defined as a field of statistics that uses
analytical tools for inferring conclusions about a population by
examining random samples. The goal of inferential statistics is to make
conclusions about a population. In inferential statistics, a statistic is
taken from the sample data (e.g., the sample mean) and used to make
inferences about the population parameter (e.g., the population mean).
Example of Population: all the students enrolled in a university, or all the
transactions processed by a bank in a year.
Sample
A sample is a subset of the population. The subset is chosen in such a way
that the sample represents all the characteristics of the population; in
other words, it should not be biased.
Resource efficiency: Conducting research on, or collecting data from, an
entire population can be time-consuming and expensive, which is why we
usually work with samples.
Representative sample
Representative sample is the subset of a population that accurately
reflects the characteristics of the entire population.
The selection process for a representative sample aims to include
individuals, objects, or data points from the various subgroups of the
population in proportion to their presence in the population.
Random Sample
A random sample is also a subset of the population, chosen through a random
selection process in which each member of the population has an equal chance
of being selected into the sample.
Random sampling aims to eliminate bias in the selection process, ensuring
that every individual element in the population has an equal opportunity to
be selected into the sample.
Probability sampling
Probability sampling is a technique in which a group is randomly selected
from a larger population and every member of the population has an equal
chance of being selected. It helps ensure that the sample is unbiased and
representative.
Non-Probability sampling
The non-probability sampling method is a method in which the
researcher selects the sample based on subjective judgment rather
than random selection.
Sampling with Replacement (SWR): selected items are returned to the
population and may be selected again for the sample. In Sampling without
Replacement (SWOR), selected items are not returned, so each item can appear
in the sample at most once.
Systematic Sampling:
Systematic sampling is essentially the same as random sampling,
except that it’s usually a little easier to carry out. Everyone in the
population is numbered, but instead of random numbers being
generated, people are randomly selected at regular intervals.
Ex: All the company's employees are listed alphabetically and numbered. A
random starting point among the first 10 numbers is chosen; suppose it is
number 6. Starting from number 6, you select every 10th employee on the list
(6, 16, 26, 36, 46, 56, and so on) until you have a sample size of 100
people.
Stratified Sampling
Stratified sampling involves dividing the population into subpopulations
(strata) that may differ in important ways. It allows you to draw more
precise conclusions by ensuring that every subgroup is properly represented
in the sample.
EX: The number of female employees is 800, and the number of male employees
is 200. You want the sample to be representative of the company's gender
balance, so you divide the population into two groups based on gender. Using
random sampling within each group, you select 80 female employees and 20 male
employees, giving a sample of 100 people.
Clustered Sampling
Cluster sampling is a method of probability sampling in which a large
population is divided into smaller groups called clusters, and clusters are
then randomly selected to make up the sample. This method is most commonly
used when the sample size and population size are very large.
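A minimal sketch of how these sampling schemes might look in Python with pandas and NumPy; the employee table below is hypothetical and simply mirrors the gender-balance example above.

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical employee list: 800 female and 200 male employees
employees = pd.DataFrame({
    "id": range(1, 1001),
    "gender": ["F"] * 800 + ["M"] * 200,
})

# Simple random sampling: every employee has an equal chance of selection
simple = employees.sample(n=100, random_state=42)

# Systematic sampling: random start, then every 10th employee on the list
start = int(rng.integers(0, 10))
systematic = employees.iloc[start::10]

# Stratified sampling: sample each gender group in proportion (80 F, 20 M)
stratified = employees.groupby("gender").sample(frac=0.1, random_state=42)

print(len(simple), len(systematic), len(stratified))   # 100 100 100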
Population Parameter:
● Population parameter in statistics refers to the measure or
characteristics of an entire population being studied.
● For a given population it is a fixed, constant value.
● Generally, population parameters are unknown and are inferred from
sample data.
● The population size is denoted by "N".
● Population Mean: μ
● Population Standard Deviation: σ
● Population Variance: σ²
Sample Statistic:
● Sample Statistic in statistics refers to the measure or
characteristic that is calculated from data collected from a
sample of individuals within a population.
● For a given sample it is a fixed value, but it varies from sample to
sample.
● Sample statistics are calculated to infer the population parameters.
● The sample size is denoted by "n".
● Sample Mean: x̄
● Sample Standard Deviation: s
● Sample Variance: s²
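The contrast between parameters and statistics can be sketched in Python; the population below is simulated, so both the (normally unknown) parameters and the sample statistics that estimate them can be printed side by side.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population of N = 1000 values
population = rng.normal(loc=50, scale=10, size=1000)

# Population parameters: computed over all N values (ddof=0, divide by N)
mu, sigma = population.mean(), population.std(ddof=0)

# Sample statistics: computed from a random sample of n = 30 (ddof=1 for s)
sample = rng.choice(population, size=30, replace=False)
x_bar, s = sample.mean(), sample.std(ddof=1)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")     # parameters
print(f"x-bar = {x_bar:.2f}, s = {s:.2f}")       # statistics estimating them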
Collection of Data
Machine Learning and AI: In the field of machine learning, datasets are used
to train and evaluate models. The quality and relevance of the dataset
significantly impact the performance of these models.
Quantitative data
Quantitative data refers to data that contains any information that
can be quantified — that is, numbers. If it can be counted or
measured, and given a numerical value, it's quantitative in nature.
Quantitative data can tell you "how many", "how much", or "how
often".
Examples:
How many people attended last week's webinar? How much revenue
did the company generate in 2019? How often do certain customer
groups use online banking?
Quantitative data is more objective in nature and is expressed in numerical
values.
Characteristics of Quantitative Data
Precision
Arithmetic Operations
Quantifiability
Qualitative data
Qualitative data is descriptive, non-numerical information that captures
qualities, opinions, and characteristics rather than counts or measurements.
Examples:
Interview Transcripts: Verbatim transcripts of interviews with people,
capturing their spoken words and reactions to open-ended questions.
Observation Notes: Detailed notes taken while observing a particular
phenomenon or event, describing what was seen, heard, or experienced.
Photographs and Videos: Visual information that can be analyzed for content,
context, and the emotions conveyed within the images or recordings.
Characteristics:
Subjective: Qualitative data is subjective and open to interpretation, since
it reflects opinions, experiences, and observations.
There are several methods you can use to collect data, including:
● Experiments.
● Controlled observations.
● Surveys: paper, kiosk, mobile, questionnaires.
● Longitudinal studies.
● Polls.
● Telephone interviews.
● Face-to-face interviews.
● Exploratory Research:
● Content Analysis:
What is Data
Data is a collection of facts or statistics about an individual or entity.
Data can be written or observed, a number, an image, a graph, or a symbol.
For instance, an individual price, weight, address, age, name, temperature,
date, or distance can be data.
Data is just a piece of information. It doesn't mean anything on its own. You
need to analyze, organize, and interpret it to make sense of it. Data can be
simple—and may even seem useless until it is analyzed, organized, and
interpreted.
What is Information
Information can be defined as knowledge acquired through observation,
communication, investigation, or education. In other words, information can
be defined as the outcome of the analysis and interpretation of data. While
data refers to the individual facts, figures, or charts, information refers to
the perception of these facts.
Data can broadly be classified into:
● Quantitative data
● Qualitative data
Structured data:
Tabular data:
Fixed schema:
Structured data usually has a built-in, predefined schema: a set of rules
that defines how data is stored and organized in a database or dataset. This
means the structure of the data is defined up front, and each column has a
specific data type and meaning.
Consistency:
Structured data plays a vital role in maintaining consistency in data
entry, which in turn makes performing operations like searching and
sorting much easier. For example, if a dataset of products is structured
with consistent fields like "product name," "price," and "category," it
becomes more efficient to search for a specific product or sort the
data based on price or category.
Relational Nature:
Structured data is relational in nature. It refers to data that is organized
and represented in a fixed format, typically using tables and relationships
between those tables. Relational databases are commonly used to store
and manage structured data, where data is structured into tables with
rows and columns. The relationships between the tables are defined using
keys, such as primary keys and foreign keys, to establish links.
Examples
Relational Databases: Relational databases organize data into tables
with predefined columns and rows. They use a structured query language
(SQL) to retrieve and manipulate data.
Excel Spreadsheets: Excel spreadsheets organize data into rows and
columns, allowing users to input, sort, and analyze data in a structured
format.
Importance of Structured Data
Data Integration:
Examples of structured data: customer information, transaction records,
inventory lists, financial data.
Examples of unstructured data: emails, social media posts, multimedia files,
sensor data.
Numerical Data
Numerical data refers to data that is in the form of numbers, and not in any
language or descriptive form. Often referred to as quantitative data,
numerical data is collected in number form and stands apart from other data
types because it can be subjected to statistical and arithmetic calculations.
Examples: age, height, temperature, number of items sold.
Categorical Data
Categorical data refers to a data type that can be stored, identified, and
classified based on the names or labels given to it. Data collected in
categorical form is also known as qualitative data.
Examples: gender, blood group, colour, brand names.
Numerical data is further divided into two types, namely:
● Discrete Data
● Continuous Data
Discrete Data:
Data that can only take on certain values are discrete data. Discrete
data, also called discrete variables, are sets of data that accept only
specific values. It is typically represented as whole numbers or
integers.
Examples: the number of students in a class, the number of cars in a parking
lot.
Continuous Data:
Continuous data can take any value within a range and is usually obtained
from precise measurements, so the values are not as neat and clean as those
in discrete data. Measuring a particular subject over time allows us to
define a range within which we can reasonably expect to collect more data.
Continuous data changes over time and can have different values at different
time intervals.
Examples: freezer temperature, height, weight, time taken to complete a task.
Ordinal Data
Ordinal data is categorical data whose categories have a natural order or
ranking, although the differences between categories are not measurable.
Examples (education level):
● Elementary
● High School
● Intermediate
● Graduation
● Post Graduation
● Doctorate
Statistical techniques commonly applied to ordinal data include:
● Frequency distribution
● Mode
● Chi-square test
● Descriptive statistics
● Rank correlations
Data Visualization
BOX PLOT:
● Minimum Value
● First Quartile (Q1 or 25th Percentile)
● Second Quartile (Q2 or 50th Percentile)
● Third Quartile (Q3 or 75th Percentile)
● Maximum Value
Q1 - the median of the values between the minimum and the overall median (Q2).
Q2 - the median of the entire dataset.
Q3 - the median of the values between the overall median (Q2) and the maximum.
The horizontal lines extending from the box are the whiskers.
Q1 is also called the 25th percentile.
Q2 is also called the 50th percentile.
Q3 is also called the 75th percentile.
Outliers: Points lying beyond the whiskers (i.e., beyond the computed minimum
and maximum values) are outliers.
Interquartile range: IQR = Q3 − Q1; it is the spread or range of the middle
50% of the data.
Whiskers: From Minimum Value to Q1 is the first 25% of data
From Q3 to Maximum value is the last 25% of the data
Quartiles (Q1 and Q3): The quartiles divide the distribution into four
equal parts, with approximately 25% of the data falling between each
quartile. In a standard normal distribution (mean = 0, standard
deviation = 1), Q1 is approximately -0.675 and Q3 is approximately
0.675.
For a normal distribution, the box plot is symmetric: the median lies in the
centre of the box and the two whiskers are of roughly equal length.
Quartiles (Q1 and Q3): The quartiles divide the dataset into four equal parts,
with approximately 25% of the data falling between each quartile. In a skewed
distribution, Q1 and Q3 might not be equidistant from the median due to the
skewness.
Interquartile Range (IQR): The box in a box plot represents the interquartile
range, which spans from Q1 to Q3. The length of the box indicates the spread of
the middle 50% of the data.
Whiskers: The whiskers extend from the quartiles to the smallest and largest
values within 1.5 times the IQR from Q1 and Q3, respectively. However, in
skewed distributions, the whisker lengths may vary due to the asymmetric
nature of the data.
Outliers: Outliers are data points that fall beyond the whiskers. In skewed
distributions, outliers tend to occur more frequently on the side opposite to the
longer tail of the distribution.
Example:
17, 17, 18, 19, 20, 22, 23, 25, 33, 64, 10, 5
Sorted: 5 10 17 17 18 19 20 22 23 25 33 64
Q1 = 17
Q2 = 19.5
Q3 = 24
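A small Python sketch that computes the quartiles of this dataset and draws the box plot with matplotlib; note that NumPy's default percentile interpolation can give slightly different quartile values than the median-of-halves method used above.

import numpy as np
import matplotlib.pyplot as plt

data = [5, 10, 17, 17, 18, 19, 20, 22, 23, 25, 33, 64]

q1, q2, q3 = np.percentile(data, [25, 50, 75])
print(q1, q2, q3)                # default interpolation: 17.0 19.5 23.5

plt.boxplot(data, vert=False)    # the extreme value 64 shows up as an outlier
plt.title("Box plot of the example dataset")
plt.show()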
Bar Graph
Grouped Data
Grouped data is data that has been organized into classes or intervals (for
example, ages grouped into 0-9, 10-19, and so on), usually together with the
frequency of observations in each class.
Ungrouped data
Ungrouped data is essentially the raw data you collect in its original
form without being organized into classes or categories. It's a
collection of individual data points that have not yet been subjected to
any processing or summarization. Working with ungrouped data
means dealing with each observation individually.
Ex: A list of exact ages of students in a class (e.g., 21, 23, 22, 20).
Bar Graphs with Grouped and Ungrouped Data
For ungrouped data, each bar typically represents an individual observation
or a distinct value and its frequency.
For grouped data, bar graphs can show the distribution of data across
different intervals or groups. In this context, each bar represents a
group rather than an individual observation. For example, if you have
data on the ages of individuals in a population, you might group these
ages into 10-year intervals (0-9, 10-19, etc.) and use a bar graph to
show the number of individuals within each age group. This helps in
understanding the distribution of data across different ranges, making
it easier to identify patterns like skewness or bimodality.
The above two graphs are bar graphs for grouped data and ungrouped
data.
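As an illustration, the following matplotlib sketch draws a bar graph for grouped data; the age groups and counts are hypothetical.

import matplotlib.pyplot as plt

# Hypothetical counts of individuals per 10-year age group
age_groups = ["0-9", "10-19", "20-29", "30-39", "40-49"]
counts = [12, 25, 40, 30, 18]

plt.bar(age_groups, counts)      # one bar per group, not per individual
plt.xlabel("Age group")
plt.ylabel("Number of individuals")
plt.title("Bar graph of grouped data")
plt.show()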
Histogram
● Bins: The range of values is divided into intervals called bins. Bins
are usually of equal width, though the number of bins and the ranges they
cover can be chosen based on the data distribution.
● Frequency: The height of each bar represents the frequency or
count of data points within each bin. Alternatively, histograms
can represent relative frequency, showing the proportion of data
points in each bin relative to the total dataset.
● No Gaps: In histograms, bars are placed adjacent to each other
without gaps to emphasize the continuous nature of the data.
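A minimal histogram sketch with matplotlib; the data below is simulated only to show equal-width bins and adjacent bars.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
ages = rng.normal(loc=35, scale=12, size=500)    # simulated continuous data

plt.hist(ages, bins=10, edgecolor="black")       # 10 equal-width bins, no gaps
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.title("Histogram with 10 equal-width bins")
plt.show()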
Uses of Histograms
Histograms are used to visualize the shape of a distribution, spot skewness
and modality, compare datasets, and identify outliers or unusual gaps in the
data.
Estimates of Central Tendency
Descriptive statistics
Mean
Formula: Mean (μ) = (Σ xᵢ) / n, where the sum runs over i = 1 to n
xᵢ = individual data points
n = number of data points
Example: x = 45, 95, 12, 52, 47, 35, 65, 88, 22
Σxᵢ = 461, n = 9
μ = 461 / 9 ≈ 51.22
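The same calculation can be checked with NumPy (a quick sketch using the example values above):

import numpy as np

x = [45, 95, 12, 52, 47, 35, 65, 88, 22]
mean = np.mean(x)            # sum(x) / n = 461 / 9
print(round(mean, 2))        # 51.22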
Outliers
An outlier is an extreme value which is significantly different from the
other values in a dataset. When calculating the mean or average, these
extreme values can heavily impact the result.
Due to its calculation utilizing the sum of all the values, the mean takes
into consideration the value of each data point. As a result, if there
are outliers within the dataset, they can excessively influence the
average. Outliers can either increase or decrease the mean,
depending on their position relative to the other values.
Example:
x = 15, 85, 95, 75, 42, 12, 1, 93, 501
Sorted: x = 1, 12, 15, 42, 75, 85, 93, 95, 501 (1 and 501 are outliers)
Trimmed Mean
In statistics, the trimmed mean is a measure of central tendency that is
used when the dataset has outliers.
Trimmed Mean x̄ = ( Σ x(i), summed from i = p + 1 to n − p ) / (n − 2p),
where the x(i) are the values sorted in ascending order and p is the number
of values trimmed from each end.
Example
Given Dataset=25000,23000,22720,18000,7202,39009,32007,21003,
1002,990
Step 1) sort the values
990, 1002,7202,18000,21003,22720,23000,25000,32007,39009
Step 2) Cut the extreme values at both ends by 10%: remove 990 and 39009.
Remaining values: 1002, 7202, 18000, 21003, 22720, 23000, 25000, 32007
(n = 8).
Step 3) Trimmed mean = (1002 + 7202 + 18000 + 21003 + 22720 + 23000 + 25000 +
32007) / 8 = 149934 / 8 = 18741.75
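scipy provides a ready-made trimmed mean; the sketch below reproduces the 10% trim from the example (here one value is removed from each end).

import numpy as np
from scipy import stats

data = [25000, 23000, 22720, 18000, 7202, 39009, 32007, 21003, 1002, 990]

# Trim 10% of the sorted values from each end before averaging
trimmed = stats.trim_mean(data, proportiontocut=0.1)
print(trimmed)            # 18741.75
print(np.mean(data))      # the ordinary mean, pulled toward the extreme values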
Weighted Mean
The weighted mean is a measure of central tendency for a set of data
where each observation is given a weight.
A weighted average is a mean that assigns different levels of
importance to the values within a dataset. The regular mean, also
known as the arithmetic mean, assigns equal importance to all
observations. In writing, the weighted average is often referred to as
the weighted mean.
When you need to take into account the relative importance of values
in a dataset, utilize a weighted mean. Simply put, you are assigning
varying degrees of importance to the values during the calculations.
if we are taking the average from multiple sensors and one of the
sensors is less accurate, then we might down weight the data from that
sensor.
Calculating the weighted average involves multiplying each data point by its
weight and summing those products. Then sum the weights for all data points.
Finally, divide the sum of the weight-value products by the sum of the
weights.
Weighted Mean = X̄w = (Σ wᵢ xᵢ) / (Σ wᵢ), where the sums run over i = 1 to n
Example
Category      Weight (wᵢ)   Score (xᵢ)
Homework      25            88
Quiz          30            71
Test          10            97
Final Exam    35            90
Σwᵢxᵢ = 25×88 + 30×71 + 10×97 + 35×90 = 2200 + 2130 + 970 + 3150 = 8450
Σwᵢ = 25 + 30 + 10 + 35 = 100
Weighted mean = 8450 / 100 = 84.5
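The same grade calculation with NumPy's weighted average (a sketch of the worked example above):

import numpy as np

weights = [25, 30, 10, 35]      # homework, quiz, test, final exam
scores = [88, 71, 97, 90]

weighted_mean = np.average(scores, weights=weights)   # sum(w*x) / sum(w)
print(weighted_mean)                                  # 84.5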
Median
The median is the value in the middle of a group. The point where half
the data is greater and half the data is lesser. The median is a way to
condense multiple data points into a single representative value.
Calculating the median is easy when it comes to statistical measures.
To calculate the median, arrange the data in ascending order and
identify the middlemost data point as the median.
Example (even number of values): the two middle values are 17 and 21, so
Median = (17 + 21) / 2 = 38 / 2 = 19
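A quick check with Python's standard library; the list below is hypothetical, chosen so that the two middle values are 17 and 21 as in the example.

import statistics

data = [11, 13, 17, 21, 24, 30]     # already sorted, even number of values
print(statistics.median(data))      # (17 + 21) / 2 = 19.0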
Robust Estimate
The median is not the only robust estimate of location. In fact, a
trimmed mean is widely used to avoid the influence of outliers.
Robust estimates are statistical measures that are not significantly
influenced by outliers or extreme values in the data. These estimates
are designed to be more resistant to the impact of outliers and can
provide a more accurate reflection of the central tendency.
Range
The range is the difference between the largest and smallest values in a
dataset. Consider the two datasets below:
Dataset 1: 20, 21, 22, 25, 26, 29, 33, 34, 38, 43
Dataset 2: 11, 16, 19, 23, 29, 32, 45, 47, 53, 67
Range of Dataset 1: 43 − 20 = 23
Range of Dataset 2: 67 − 11 = 56
Dataset 2 has a broader range and, hence, more variability than Dataset 1.
The Range is highly sensitive to outliers. If there is an
exceptionally high or low number among the values, it has an
impact on the entire range of data.
Mean Absolute Deviation (MAD)
The mean absolute deviation of a dataset is calculated by finding the
average distance between each data point and the mean. It provides
insight into the range of values within a dataset.
Calculate the mean, Calculate how far away each data point is from
the mean using positive distances, these are called absolute
deviations, Add those deviations together, Divide the sum by the
number of data points.
Example
Data: 25, 15, 20, 17, 22, 28, 27
Mean = 154 / 7 = 22
Deviations from the mean: 25 − 22 = 3, 15 − 22 = −7, 20 − 22 = −2,
17 − 22 = −5, 22 − 22 = 0, 28 − 22 = 6, 27 − 22 = 5
Apply the modulus to the negative values, then average the absolute
deviations:
MAD = (3 + 7 + 2 + 5 + 0 + 6 + 5) / 7 = 28 / 7 = 4
A large MAD indicates that the data are more spread out relative to the mean;
a small MAD indicates that the data are less spread out relative to the mean.
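A short NumPy sketch of the MAD calculation for the example data:

import numpy as np

data = np.array([25, 15, 20, 17, 22, 28, 27])

mean = data.mean()                      # 154 / 7 = 22
mad = np.mean(np.abs(data - mean))      # mean of the absolute deviations
print(mean, mad)                        # 22.0 4.0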
Variance
Variance is a measure of how far an individual value or data point falls from
the mean. To find the variance, calculate the average of the squared
differences between each data point and the mean.
The population variance is denoted by the symbol σ².
Step 1: Calculate the mean.
Step 2: Subtract the mean from each data point.
Step 3: This gives the deviations from the mean, some of which may be
negative.
Step 4: Square the deviations.
Step 5: Find the mean of the squared values.
Example: 8, 11, 15, 18, 20, 22, 32, 44, 55
Mean = 225 / 9 = 25
Deviations: 8 − 25 = −17, 11 − 25 = −14, 15 − 25 = −10, 18 − 25 = −7,
20 − 25 = −5, 22 − 25 = −3, 32 − 25 = 7, 44 − 25 = 19, 55 − 25 = 30
Squared deviations: 289, 196, 100, 49, 25, 9, 49, 361, 900; their sum is 1978
Population variance σ² = 1978 / 9 ≈ 219.78
σ² = Σ(xᵢ − μ)² / n    [population variance]
s² = Σ(xᵢ − x̄)² / (n − 1)    [sample variance]
Standard Deviation
The standard deviation measures the average deviation of each data
point from the mean. A dataset with values grouped closely together
will result in a smaller standard deviation. Conversely, if the values are
more dispersed, the standard deviation will be larger because the average
distance from the mean increases.
Continuing the variance example above: σ = √219.78 ≈ 14.82
σ = √( Σ(xᵢ − μ)² / n )    [population standard deviation]
s = √( Σ(xᵢ − x̄)² / (n − 1) )    [sample standard deviation]
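The worked example can be reproduced with NumPy, where the ddof argument switches between the population (divide by n) and sample (divide by n − 1) formulas:

import numpy as np

x = np.array([8, 11, 15, 18, 20, 22, 32, 44, 55])

pop_var = np.var(x, ddof=0)      # population variance: divide by n
pop_sd = np.std(x, ddof=0)       # population standard deviation
samp_var = np.var(x, ddof=1)     # sample variance: divide by n - 1
samp_sd = np.std(x, ddof=1)      # sample standard deviation

print(round(pop_var, 2), round(pop_sd, 2))     # 219.78 14.82
print(round(samp_var, 2), round(samp_sd, 2))   # 247.25 15.72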
Interquartile Range
The interquartile range represents the central 50% of the data.
Consider the median value that separates the dataset into two equal
halves. You can also split the data into quarters. These quarters are
known as quartiles among statisticians and are represented in
ascending order as Q1, Q2, and Q3. Q1 contains the lowest 25% of the
dataset's values. The upper quartile, denoted as Q3, consists of the top
25% of the dataset with the highest values. The interquartile range
represents the middle 50% of the data, lying between the upper and
lower quartiles. Put simply, the interquartile range contains the middle
50% of the data points between Q1 and Q3.
Example
11,18,22,40,41,62,70
Step 1: ¼(n+1) = ¼(7+1)= 8/4= 2 —> Q1(25%)
Step 2 : ½(n+1)= ½(7+1) = 8/2 =4 —>Q2(50%)
Step 3: ¾(n+1) = ¾(7+1) = 24/4 = 6 —> Q3(75%)
Q1(25%)= 18
Q2(50%)= 40
Q3(75%)= 62
IQR = Q3 − Q1, i.e., the 75th percentile minus the 25th percentile.
IQR = 62-18 = 44.
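A NumPy sketch of the same IQR calculation; method="weibull" uses the p·(n+1) positions, matching the hand calculation above (NumPy's default interpolation would give slightly different quartiles). This assumes a reasonably recent NumPy that supports the method keyword.

import numpy as np

data = [11, 18, 22, 40, 41, 62, 70]

q1, q2, q3 = np.percentile(data, [25, 50, 75], method="weibull")
iqr = q3 - q1
print(q1, q2, q3, iqr)      # 18.0 40.0 62.0 44.0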
Hence, the most appropriate measure to use relies on the type of data
and the particular analytical goals. One must carefully take into
account the assumptions and constraints of each measurement,
making sure that the selected measurement is compatible with the
data's traits and the analysis's objectives.
Probability Axioms
Consider statements such as: there is an 80% chance that India will win the
match, and a 10% chance that the price of gold will rise tomorrow. Here,
probability expresses a chance, which means that the event is uncertain. What
we have done is assign a numerical value to that uncertainty, and this
numerical value is the probability.
Probability Formula:
Probability can be defined as the ratio of the number of favorable outcomes
to the total number of outcomes of an event. For an experiment having 'n'
total outcomes, the number of favorable outcomes can be denoted by x. The
formula to calculate the probability of an event is as follows:
Probability(Event) = Favorable Outcomes (x) / Total Outcomes (n)
P(A) = N(A) / N(S), where N(A) is the number of outcomes favourable to event
A and N(S) is the total number of outcomes in the sample space S.
Probability of Event:
The probability of an event is a measure of the likelihood that the event
will occur, expressed as a number between 0 and 1.
An event with a probability of 1 is considered certain to happen, while
an event with a probability of 0 is certain not to happen.
Terminology:
Experiment: An activity whose outcomes are not known is an
experiment. Every experiment has a few favorable outcomes and a
few unfavorable outcomes. The historic experiments of Thomas Alva
Edison had more than a thousand unsuccessful attempts before he
could make a successful attempt to invent the light bulb.
Types of Probability
These are 3 major types of probability :
1. Theoretical Probability
2. Experimental Probability
3. Axiomatic Probability
Theoretical Probability:
Theoretical or classical probability is based on the assumption that all
outcomes in a sample space are equally likely. It doesn't require
experimental data or subjective judgment but instead uses a priori
reasoning to calculate probabilities.
Experimental probability:
Experimental or empirical probability is based on actual experiments
and observations. Instead of assuming all outcomes are equally likely,
it calculates the probability of an event based on how often the event
occurs relative to the total number of trials.
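The difference between theoretical and experimental probability can be sketched with a quick simulation of a fair six-sided die in Python:

import random

# Theoretical probability of rolling a 6 with a fair die
theoretical = 1 / 6

# Experimental probability: roll the die many times and count the sixes
trials = 100_000
sixes = sum(1 for _ in range(trials) if random.randint(1, 6) == 6)
experimental = sixes / trials

print(round(theoretical, 4), round(experimental, 4))   # the two should be close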
Axiomatic Probability: (Probability Axioms)
Axiomatic probability, developed by Russian mathematician Andrey
Kolmogorov in the 1930s, is a more formal and rigorous approach to
probability that is based on set theory. It establishes probability on a
firm theoretical foundation through a set of axioms (basic rules) that
all probability measures must follow.
Axioms of Probability
There are three axioms of probability that make the foundation of
probability theory-
Axiom 1: Probability of Event
The first axiom is that the probability of an event is always between 0 and
1: a probability of 1 indicates the event is certain to occur, and a
probability of 0 indicates the event cannot occur.
Axiom 1: For any given event X, the probability of that event must be
greater than or equal to 0. Thus,
0 ≤ P(X) ≤ 1
Axiom 2: We know that the sample space S of the experiment is the set of all
possible outcomes. The probability that some outcome in S occurs is 100
percent, i.e. P(S) = 1. Intuitively, this means that whenever the experiment
is performed, the probability of getting some outcome is 100 percent.
P(S) = 1
Axiom 3: For experiments where we have two events A and B, if A and B are
mutually exclusive (they cannot occur together), then
P(A ∪ B) = P(A) + P(B)
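As a small illustrative check, the three axioms can be verified numerically in Python for a simple probability model such as a fair die:

# Probability model for one roll of a fair die
sample_space = {1, 2, 3, 4, 5, 6}
P = {outcome: 1 / 6 for outcome in sample_space}

# Axiom 1: every probability lies between 0 and 1
assert all(0 <= p <= 1 for p in P.values())

# Axiom 2: the probability of the whole sample space S is 1
assert abs(sum(P.values()) - 1) < 1e-9

# Axiom 3: for mutually exclusive events A and B, P(A ∪ B) = P(A) + P(B)
A, B = {1, 2}, {5, 6}                                  # disjoint events
p_union = sum(P[o] for o in A | B)
assert abs(p_union - (sum(P[o] for o in A) + sum(P[o] for o in B))) < 1e-9

print("All three axioms hold for this model.")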