Statistics
Measure of Central Tendency
Central tendency is the middle point of a data set distribution. Measures of central tendency, also called
measures of location, provide a quick snapshot of the data and help in understanding the distribution
and central values of a dataset.
- Mean: The mean, often referred to as the average, is calculated by adding all the values in the dataset
and dividing by the number of values. There are different types of means, such as the arithmetic mean
and geometric mean.
- Median: The median is the middle element of an ordered data set. It is largely unaffected by outliers,
which makes it a better indicator of the center than the mean when the distribution is skewed.
- Mode: The mode is the value that occurs most frequently in a data set. A data set can be unimodal,
bimodal, trimodal, or multimodal depending on how many modes it has.
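As a rough illustration of the three measures above, the following Python sketch uses the standard library statistics module. The sample values are made up for illustration, and 95 is included as a deliberate outlier to show that the mean reacts to it while the median does not.

```python
import statistics

# Hypothetical sample; 95 is a deliberate outlier.
values = [2, 3, 3, 5, 7, 95]

mean_value = statistics.mean(values)      # sum of values / number of values
median_value = statistics.median(values)  # middle of the sorted values
mode_value = statistics.mode(values)      # most frequent value

# multimode returns every value tied for the highest frequency,
# so a bimodal data set yields two modes.
all_modes = statistics.multimode([1, 1, 2, 2, 3])

print(mean_value, median_value, mode_value, all_modes)
```

The mean is pulled far above most of the observations by the single outlier, while the median stays between the two middle values.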
Measure of Spread
Understanding measures of spread, like range, interquartile range (IQR), standard deviation, and
variance, helps in understanding how spread out the data points are from one another.
Mean
- Arithmetic Mean: Calculated by summing all values and dividing by the number of observations.
- Geometric Mean: Used when dealing with quantities that change over time, such as average growth
rates over several years.
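The difference between the two means can be sketched in Python for a series of yearly growth rates. The rates below are hypothetical, and converting each rate r to a growth factor 1 + r before taking the geometric mean is the usual convention, assumed here.

```python
import math

# Hypothetical yearly growth rates: +10%, +20%, -5%.
growth_rates = [0.10, 0.20, -0.05]

# Convert rates to growth factors so they can be multiplied.
factors = [1 + r for r in growth_rates]

# Geometric mean: n-th root of the product of the n factors.
geometric_mean_factor = math.prod(factors) ** (1 / len(factors))
average_growth = geometric_mean_factor - 1

# Arithmetic mean of the same rates, for comparison.
arithmetic_growth = sum(growth_rates) / len(growth_rates)

print(average_growth, arithmetic_growth)
```

Compounding the geometric average over the three years reproduces the actual total growth, which the arithmetic average slightly overstates.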
Median
- The median is the middle value in an ordered data set. For an odd number of elements, it is the middle
element; for an even number, it is the mean of the two middle elements.
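The odd and even cases can be written out directly. This is a minimal sketch; the function name median is chosen here for illustration.

```python
def median(values):
    """Return the median of a non-empty collection of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle element.
        return ordered[mid]
    # Even count: the mean of the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))     # odd number of elements
print(median([4, 1, 3, 2]))  # even number of elements
```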
Mode
- The mode is the value that occurs most frequently in a dataset. It can be easily identified in ungrouped
data. For grouped data, the modal class is the class with the highest frequency, and the mode is then
estimated with a formula involving the class boundaries and the neighboring frequencies.
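The notes do not spell out the grouped-data formula. A commonly used version, assumed here, is Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h, where L is the lower boundary of the modal class, h is the class width, f1 is the modal class frequency, and f0 and f2 are the frequencies of the classes before and after it. The sketch below applies it to hypothetical classes of equal width.

```python
def grouped_mode(lower_bounds, width, frequencies):
    """Estimate the mode of grouped data with equal-width classes."""
    # Index of the modal class (highest frequency; first one on ties).
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])
    f1 = frequencies[i]
    f0 = frequencies[i - 1] if i > 0 else 0
    f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: neighbors equal the modal frequency")
    return lower_bounds[i] + (f1 - f0) / denominator * width


# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
estimate = grouped_mode([0, 10, 20, 30], 10, [5, 8, 12, 7])
print(estimate)
```

In this example the modal class is 20-30, and the estimate lands a little below its midpoint because the class before it is more heavily populated than the class after it.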
Standard Deviation
Standard deviation measures the amount of variation or dispersion of a set of values. It is a key measure
of spread.
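A short Python sketch using the standard library statistics module. The data are hypothetical. Note that the library distinguishes the population standard deviation (divide by n) from the sample standard deviation (divide by n - 1); which one applies depends on whether the data are a whole population or a sample.

```python
import statistics

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: square root of the mean squared deviation.
population_sd = statistics.pstdev(data)

# Sample standard deviation: divides by n - 1 instead of n.
sample_sd = statistics.stdev(data)

print(population_sd, sample_sd)
```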
Measure of Dispersion
Measures of dispersion describe how scattered the data are. Each one captures a different aspect of
the spread, and they include:
Range
The range is the difference between the highest and lowest values in a dataset. It is a good indicator
of variability when the distribution has no extreme values, but a single outlier can make it misleading.
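A minimal sketch of the range and its sensitivity to extreme values. The data and the function name value_range are illustrative assumptions.

```python
def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)


typical = [48, 50, 51, 52, 49]   # hypothetical data without extremes
with_outlier = typical + [120]   # same data plus one extreme value

print(value_range(typical))
print(value_range(with_outlier))
```

Only the two most extreme observations enter the calculation, so one unusual value changes the result completely.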
Variance
Variance is the average of the squared differences from the mean, providing a measure of the spread of
data points.
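The definition can be followed step by step in Python. This sketch computes the population variance (dividing by n), which matches the "average of the squared differences" wording; the function name is illustrative.

```python
import statistics


def population_variance(values):
    """Average of the squared differences from the mean."""
    n = len(values)
    mean = sum(values) / n
    squared_differences = [(x - mean) ** 2 for x in values]
    return sum(squared_differences) / n


data = [1, 2, 3, 4, 5]  # hypothetical data with mean 3
variance = population_variance(data)

# Cross-check against the standard library implementation.
library_variance = statistics.pvariance(data)
print(variance, library_variance)
```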
Skewness
Skewness measures the asymmetry of the probability distribution of a real-valued random variable
about its mean.
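The notes do not give a formula for skewness. One common choice, assumed here, is the moment coefficient: the third central moment divided by the second central moment raised to the power 3/2. A sketch on hypothetical data:

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]             # long tail to the right
left_skewed = [-x for x in right_skewed]    # mirror image, long tail to the left

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
```

Symmetric data give a value of zero, a long right tail gives a positive value, and a long left tail gives a negative value.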
Kurtosis
Kurtosis measures the "tailedness" of the probability distribution of a real-valued random variable.
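As with skewness, the notes give no formula. A common definition, assumed here, is the fourth central moment divided by the squared second central moment. A normal distribution has a kurtosis of 3 under this definition, so subtracting 3 gives the "excess" kurtosis.

```python
def kurtosis(values):
    """Moment-based kurtosis: m4 / m2**2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / m2 ** 2


def excess_kurtosis(values):
    """Kurtosis relative to the normal distribution (which has kurtosis 3)."""
    return kurtosis(values) - 3


flat = [1, 2, 3, 4, 5]                   # evenly spread, light tails
heavy_tailed = [0] * 8 + [-10, 10]       # mostly central with two extremes

print(kurtosis(flat), kurtosis(heavy_tailed))
```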
Descriptive Statistics
Descriptive statistics summarize and describe the features of a dataset. They provide simple summaries
of the sample and its measures.
Inferential Statistics
Inferential statistics make inferences and predictions about a population based on a sample of data
taken from that population.
Parameters are numerical characteristics of a population, while statistics are numerical characteristics of
a sample.
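The distinction between a parameter and a statistic can be sketched with a toy example. The population (the integers 1 to 100) and the sampling rule (every tenth value) are assumptions made purely for illustration; real samples are usually drawn at random.

```python
# Hypothetical population: the integers 1 through 100.
population = list(range(1, 101))

# Parameter: a numerical characteristic of the whole population.
population_mean = sum(population) / len(population)

# A sample taken from the population (every tenth value, for reproducibility).
sample = population[::10]

# Statistic: the same characteristic computed on the sample only.
sample_mean = sum(sample) / len(sample)

print(population_mean, sample_mean)
```

The sample mean is used as an estimate of the population mean; the two are close but generally not equal.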
Importance of Statistics
Statistics are crucial for making informed decisions based on data analysis. They help in understanding
data distributions, central values, and variability, which are essential for effective decision-making.
• Importance of Mode:
• Simple to understand and calculate, reflects the most common occurrence, helps identify
trends and patterns, and gives quick insights (Regular Session 1_a4d2d…).
• Importance of Range:
• Indicates variability, works well when there are no extreme values, and can be misleading with outliers.
PDF 2
Descriptive Statistics
- Measuring essential characteristics of the data, such as the central value of the distribution, also
known as the overall tendency.
Inferential Statistics
- Measuring data characteristics: Used to make inferences about the population from a sample.
- Mean: Sum of values divided by the number of values. It is highly sensitive to extreme observations
and is the most representative value for metric data.
- Median: Middlemost observation for ordered data, dividing it into two equal parts. It is insensitive to
extreme observations and meaningful for ordinal/rank data.
- Mode: Most common value, that is, the value with the highest frequency. It is not
affected by extreme observations and is applicable for nominal data.
- Range: Difference between maximum and minimum values. It is highly influenced by extreme
observations and is not based on all observations.
- Interquartile Range (IQR): Difference between the upper quartile and lower quartile, based on the
middle 50% of observations.
- Mean Absolute Deviation: Mean of the absolute deviations from the central value. It does not impose
high penalties for large deviations.
- Standard Deviation: Square root of the variance, most representative measure of dispersion, based on
all observations, and imposes higher penalties for large deviations.
- Coefficient of Variation (CV): Used to compare variability of two or more data distributions. It is
independent of scale and represents relative consistency.
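The different penalties that the mean absolute deviation and the standard deviation place on large deviations can be seen in a small Python sketch. The two data sets are hypothetical, and deviations are taken from the mean, which is one choice of "central value".

```python
import statistics


def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


# Both data sets have mean 5 and the same total absolute deviation.
many_small = [4, 6, 4, 6]   # four deviations of size 1
few_large = [3, 5, 5, 7]    # two deviations of size 2

mad_small = mean_absolute_deviation(many_small)
mad_large = mean_absolute_deviation(few_large)
sd_small = statistics.pstdev(many_small)
sd_large = statistics.pstdev(few_large)

print(mad_small, mad_large, sd_small, sd_large)
```

The mean absolute deviation is the same for both sets, while the standard deviation is larger for the set with the bigger individual deviations, because squaring weights them more heavily.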
Measure of Shape
Other Concepts
- Percentile and Decile: Used to describe the position of a particular value in the data set relative to the
other values.
- Parameter and Statistic: Parameter refers to a characteristic of a population, while statistic refers to a
characteristic of a sample.
- Types of Data: Refers to different categories such as nominal, ordinal, interval, and ratio data.
- Importance of Statistics: Critical for analyzing data, making informed decisions, and understanding
trends and patterns in various fields.
Weighted Mean
Geometric Mean
• Definition: Applicable to quantities that change over time, providing the average rate of
change.
Partition Value
Variability Quartile
• Interquartile Range (IQR): The difference between the upper quartile and lower
quartile, based on the middle 50% of observations. It's not affected by extreme
observations.
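A sketch of the IQR using statistics.quantiles from the Python standard library. Several conventions exist for computing quartiles; the "inclusive" method is used here as an assumption, and other conventions can give slightly different values. The data are hypothetical.

```python
import statistics


def interquartile_range(values):
    """Upper quartile minus lower quartile (inclusive quantile method)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # largest value replaced by an extreme

iqr_clean = interquartile_range(clean)
iqr_outlier = interquartile_range(with_outlier)
range_clean = max(clean) - min(clean)
range_outlier = max(with_outlier) - min(with_outlier)

print(iqr_clean, iqr_outlier, range_clean, range_outlier)
```

Replacing the largest value with an extreme one changes the range drastically but leaves the IQR unchanged, since only the middle 50% of the observations are involved.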
Trimmed Mean
• Definition: The mean calculated after removing a certain percentage of the largest and
smallest values.
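A minimal trimmed-mean sketch. The function name, the proportion parameter (the fraction removed from each end), and the data are illustrative assumptions; the notes do not fix a particular percentage.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)        # number of values cut per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one very small and one very large value.
data = [1, 48, 50, 51, 52, 49, 50, 51, 49, 500]

plain_mean = sum(data) / len(data)
trimmed = trimmed_mean(data, 0.10)  # drop the lowest and highest 10%

print(plain_mean, trimmed)
```

With ten values and a 10% trim, one value is removed from each end, so the two extremes no longer distort the average.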
Quantitative Data
Qualitative Data
• Definition: Data that is descriptive and conceptual, often categorized based on traits and
characteristics.
Progression
• Demand Forecasting: Useful for identifying the most common customer preferences and
demands.
• Range: Difference between the maximum and minimum values. It is highly influenced
by extreme observations.
• Interquartile Range (IQR): Measures the spread of the middle 50% of the data.
• Mean Absolute Deviation (MAD): Measures the average absolute deviation from the
mean.
• Variance: The mean of squared deviations from the mean.
• Standard Deviation: The square root of variance, representing the dispersion of a
dataset.
• Coefficient of Variation (CV): Represents the ratio of the standard deviation to the
mean.
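The coefficient of variation from the list above can be sketched as follows. The bulb lifetimes are hypothetical figures invented for illustration, and the population standard deviation is used as an assumption; the sample version works the same way.

```python
import statistics


def coefficient_of_variation(values):
    """Ratio of the (population) standard deviation to the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


# Hypothetical lifetimes in hours for two brands of bulbs.
brand_a = [980, 1000, 1020]
brand_b = [480, 500, 520]

sd_a = statistics.pstdev(brand_a)
sd_b = statistics.pstdev(brand_b)
cv_a = coefficient_of_variation(brand_a)
cv_b = coefficient_of_variation(brand_b)

# Changing the unit of measurement does not change the CV.
cv_a_rescaled = coefficient_of_variation([x * 60 for x in brand_a])

print(sd_a, sd_b, cv_a, cv_b, cv_a_rescaled)
```

Both brands have the same standard deviation, but brand B varies twice as much relative to its mean, so brand A is the more consistent of the two.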
Importance of Range
PDF 3
Measure of Central Tendency
- Definition: Central value of the distribution, also known as the overall tendency.
Measure of Spread
- Types: Range, Interquartile Range, Mean Absolute Deviation, Variance, Standard Deviation,
Coefficient of Variation.
Mean
- Characteristics:
Median
- Definition: The middlemost observation for ordered data, dividing it into two equal parts.
- Characteristics:
Mode
Standard Deviation
- Characteristics:
Measure of Dispersion
- Types: Range, Interquartile Range, Mean Absolute Deviation, Variance, Standard Deviation.
Range
- Characteristics:
Variance
Skewness
Kurtosis
Descriptive Statistics
Inferential Statistics
- Definition: Makes inferences and predictions about a population based on a sample of data.
Types of Data
- Ratio Data: Data measured on a scale with meaningful intervals and a true zero point.
Importance of Statistics
- Definition: Statistics is crucial for collecting, analyzing, interpreting, presenting, and organizing data. It
helps in making informed decisions based on data analysis.
• Importance of Mode:
• Simple to understand and calculate, reflects the most common occurrence, helps identify
trends and patterns, and gives quick insights.
• Importance of Range:
• Indicates variability, works well when there are no extreme values, and can be misleading with outliers.
• Coefficient of Variation:
• Used when comparing variability across datasets with different means. Example given:
comparison of performance among different bulbs.
• Variability Quartile:
• Quartiles divide ordered data into four equal parts; the second quartile (the median) marks the
central point of the distribution.
• Trimmed Mean:
• Method of averaging that removes a small percentage of the largest and smallest values
before calculating the mean.