Statistics
1. Descriptive Statistics
o Measures of Central Tendency (Mean, Median, Mode)
o Measures of Dispersion (Range, Variance, Standard Deviation, IQR)
o Data Visualization (Histogram, Boxplot, Scatterplot)
o Skewness and Kurtosis
2. Probability
o Basics of Probability
o Types of Events (Independent, Dependent, Mutually Exclusive)
o Conditional Probability
o Bayes' Theorem
3. Probability Distributions
o Discrete Distributions (Binomial, Poisson)
o Continuous Distributions (Normal, Uniform, Exponential)
o Properties of Normal Distribution (Z-Scores, Empirical Rule)
4. Sampling and Sampling Techniques
o Types of Sampling (Random, Stratified, Cluster, Systematic)
o Importance of Sample Size
o Sampling Error
5. Inferential Statistics
o Confidence Intervals
o Margin of Error
o Central Limit Theorem (CLT)
o Bootstrapping
6. Hypothesis Testing
o Null and Alternative Hypotheses
o Type I and Type II Errors
o Steps in Hypothesis Testing
o One-Tailed vs. Two-Tailed Tests
o Test Statistics (Z-Test, T-Test, Chi-Square Test, ANOVA)
7. Correlation and Covariance
o Definition and Differences
o Pearson’s Correlation Coefficient
o Spearman’s Rank Correlation
o Covariance Formula and Interpretation
8. Regression Analysis
o Linear Regression (Simple and Multiple)
o Assumptions of Linear Regression
o Coefficients (Intercept and Slope)
o R-Squared and Adjusted R-Squared
o Multicollinearity and Variance Inflation Factor (VIF)
o Logistic Regression
9. Analysis of Variance (ANOVA)
o One-Way ANOVA
o Assumptions of ANOVA
o F-Test
o Post Hoc Tests (Tukey’s Test)
10. Chi-Square Tests
o Chi-Square Goodness-of-Fit Test
o Chi-Square Test for Independence
11. Data Transformation and Standardization
o Log Transformation
o Min-Max Scaling
o Z-Score Standardization
12. Outliers and Missing Data
o Detection of Outliers (IQR, Z-Scores)
o Handling Outliers (Winsorizing, Capping, Removing)
o Missing Data Imputation (Mean, Median, Mode, Regression Imputation)
13. Measures of Association
o Odds Ratio
o Relative Risk
o Contingency Tables
Descriptive Statistics
1. Basics of Data
• Data Types:
o Quantitative (Numerical):
▪ Discrete: Countable values (e.g., number of students in a class)
▪ Continuous: Infinite possible values within a range (e.g., height,
weight)
o Qualitative (Categorical):
▪ Nominal: Categories without a specific order (e.g., gender, colors)
▪ Ordinal: Categories with a specific order (e.g., rankings, satisfaction
levels)
• Levels of Measurement:
o Nominal: Labels with no inherent order (e.g., male/female)
o Ordinal: Rank-ordered data with unequal intervals (e.g., class grades: A, B,
C)
o Interval: Equal intervals, but no true zero (e.g., temperature in Celsius)
o Ratio: Equal intervals and a true zero (e.g., weight, income)
2. Measures of Central Tendency
1. Mean (Average):
o Formula: $\text{Mean} = \frac{\Sigma X}{n}$
o Sensitive to outliers.
o Example: The average of 2, 3, 5 is $(2+3+5)/3 = 3.33$.
2. Median:
o The middle value when data is sorted.
o If $n$ is odd: Middle value.
o If $n$ is even: Average of the two middle values.
o Not affected by outliers.
3. Mode:
o The most frequently occurring value.
o Can have multiple modes (bimodal, multimodal).
o Example: In 1, 2, 2, 3, the mode is 2.
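As a quick check of these definitions, here is a minimal Python sketch (made-up numbers) that computes all three measures with the standard library:

```python
# Minimal sketch: central tendency on a small, made-up dataset.
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7]
print(mean(data))    # 4.0 -> pulled toward large values; sensitive to outliers
print(median(data))  # 3   -> middle of the sorted data; robust to outliers
print(mode(data))    # 3   -> most frequent value
```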
3. Measures of Dispersion
1. Range:
o Formula: $\text{Range} = \text{Max} - \text{Min}$
o Example: For 2, 4, 6, 8, Range = $8 - 2 = 6$.
2. Variance:
o Measures the average squared deviation from the mean.
o Formula:
▪ Population: $\sigma^2 = \frac{\Sigma (X - \mu)^2}{N}$
▪ Sample: $s^2 = \frac{\Sigma (X - \bar{X})^2}{n - 1}$
o Larger variance = greater spread.
3. Standard Deviation:
o Square root of variance.
o Represents data spread in the same units as the data.
o Formula: $\sigma = \sqrt{\sigma^2}$.
4. Interquartile Range (IQR):
o Measures the middle 50% of the data.
o Formula: $\text{IQR} = Q3 - Q1$
▪ $Q1$: 25th percentile
▪ $Q3$: 75th percentile
o Helps detect outliers.
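The sketch below (assumed sample data) computes these dispersion measures with NumPy; ddof=1 gives the sample (n-1) versions:

```python
# Minimal sketch: dispersion measures on a small, made-up sample.
import numpy as np

data = np.array([2, 4, 6, 8])
print(data.max() - data.min())         # Range = 6
print(data.var(ddof=1))                # sample variance (divides by n-1)
print(data.std(ddof=1))                # sample standard deviation
q1, q3 = np.percentile(data, [25, 75])
print(q3 - q1)                         # IQR
```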
4. Skewness and Kurtosis
1. Skewness:
o Describes asymmetry in data distribution.
o Positive skew: Tail on the right (e.g., income data).
o Negative skew: Tail on the left.
2. Kurtosis:
o Measures the "tailedness" of the distribution.
o Types:
▪ Mesokurtic: Normal distribution
▪ Leptokurtic: Heavy tails
▪ Platykurtic: Light tails
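A short SciPy sketch makes these shape measures concrete; the data here is deliberately drawn from a right-skewed (exponential) distribution:

```python
# Minimal sketch: quantifying distribution shape with SciPy.
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=1000)  # right-skewed by construction
print(skew(data))      # > 0 indicates positive (right) skew
print(kurtosis(data))  # excess kurtosis: 0 ~ mesokurtic, > 0 leptokurtic
```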
5. Data Visualization
1. Histograms:
o Show frequency distribution.
o Useful for understanding data shape (normal, skewed).
2. Bar Charts:
o Represent categorical data.
3. Box Plots:
o Show data spread, median, and outliers.
4. Pie Charts:
o Represent proportions of a whole.
5. Scatter Plots:
o Display relationships between two variables.
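For illustration, the matplotlib sketch below (random, made-up data) draws three of the plots above: a histogram, a box plot, and a scatter plot:

```python
# Minimal sketch: three core exploratory plots with matplotlib.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2 * x + rng.normal(size=200)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(x, bins=20)        # frequency distribution / shape
axes[0].set_title("Histogram")
axes[1].boxplot(x)              # median, spread, outliers
axes[1].set_title("Box plot")
axes[2].scatter(x, y, s=10)     # relationship between two variables
axes[2].set_title("Scatter plot")
plt.tight_layout()
plt.show()
```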
Probability
1. Basics of Probability
• Probability measures how likely an event is, on a scale from 0 (impossible) to 1 (certain). For equally likely outcomes: $P(E) = \frac{\text{favorable outcomes}}{\text{total outcomes}}$.
3. Types of Probability
1. Classical Probability:
o Based on equally likely outcomes.
o Example: Probability of rolling a 4 on a die → $P(4) = \frac{1}{6}$.
2. Empirical Probability:
o Based on observed data.
o Example: Probability of rain based on historical weather data.
3. Subjective Probability:
o Based on intuition or experience.
o Example: The probability of a startup succeeding.
4. Rules of Probability
1. Addition Rule:
o For mutually exclusive events: $P(A \cup B) = P(A) + P(B)$
o For non-mutually exclusive events: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
2. Multiplication Rule:
o For independent events: $P(A \cap B) = P(A) \times P(B)$
3. Complement Rule:
o Probability of an event not occurring: $P(E^c) = 1 - P(E)$
5. Conditional Probability
• The probability of event $A$ occurring given that $B$ has occurred: $P(A|B) = \frac{P(A \cap B)}{P(B)}$ (if $P(B) > 0$)
6. Bayes’ Theorem
• A formula to calculate conditional probabilities: $P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$
o Applications: Spam detection, medical testing, machine learning.
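As an illustration of the medical-testing application, here is a sketch with made-up rates (the prevalence, sensitivity, and false-positive rate are all assumptions):

```python
# Minimal sketch: Bayes' theorem for a diagnostic test (all rates made up).
p_disease = 0.01            # P(A): prevalence
p_pos_given_disease = 0.99  # P(B|A): test sensitivity
p_pos_given_healthy = 0.05  # false-positive rate

# Law of total probability: P(B)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # ~0.167 despite a 99%-sensitive test
```

The low posterior shows why base rates matter: most positives come from the much larger healthy group.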
7. Probability Distributions
8. Examples
1. Example 1: A coin is flipped twice. What is the probability of getting at least one
head?
o Sample space: $S = \{HH, HT, TH, TT\}$
o Event $A$: At least one head → $A = \{HH, HT, TH\}$
o Probability: $P(A) = \frac{3}{4}$
2. Example 2: A bag contains 4 red balls and 6 blue balls. What is the probability of
drawing a red ball?
o Total balls = 10, Red balls = 4.
o Probability: $P(\text{Red}) = \frac{4}{10} = 0.4$.
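Both answers are easy to sanity-check by simulation; the sketch below approximates them with random draws:

```python
# Minimal sketch: Monte Carlo check of the two worked examples.
import random

random.seed(42)
N = 100_000

# Example 1: at least one head in two coin flips (exact: 3/4)
hits = sum(1 for _ in range(N)
           if "H" in (random.choice("HT"), random.choice("HT")))
print(hits / N)  # ~0.75

# Example 2: one draw from a bag of 4 red + 6 blue balls (exact: 0.4)
bag = ["red"] * 4 + ["blue"] * 6
reds = sum(1 for _ in range(N) if random.choice(bag) == "red")
print(reds / N)  # ~0.4
```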
Sampling and Sampling Techniques
Sampling is the process of selecting a subset (a sample) from a larger population in order to draw conclusions about the entire population. Sampling makes data analysis practical and efficient when studying large populations.
1. Types of Sampling
Sampling methods are broadly divided into probability sampling (where every member of
the population has a known chance of being selected) and non-probability sampling (where
selection is not random). Below are the key types:
A. Random Sampling
• Definition: Every member of the population has an equal chance of being selected.
• Method: Selection is done using random number generators or lottery methods.
• Example: Drawing names out of a hat to select participants for a survey.
• Advantages:
o Reduces selection bias.
o High likelihood of representing the population.
• Disadvantages:
o May be impractical for very large populations.
B. Stratified Sampling
• Definition: The population is divided into homogeneous subgroups (strata), and a random sample is drawn from each stratum, often in proportion to its size.
• Example: Grouping students by grade level and sampling randomly within each grade.
C. Cluster Sampling
• Definition: The population is divided into clusters, and entire clusters are randomly
selected.
• Example: Selecting a few schools randomly and surveying all students in those
schools.
• Advantages:
o Cost-efficient for large populations.
o Useful when population data is geographically spread out.
• Disadvantages:
o Less accurate if clusters are not homogeneous.
o May increase sampling error.
D. Systematic Sampling
• Definition: Every nth member of the population is selected after randomly selecting
the starting point.
• Example: Selecting every 10th person in a list of 1,000 names.
• Advantages:
o Simple and quick to implement.
o Ensures uniform coverage of the population.
• Disadvantages:
o Patterns in the population may bias the sample (e.g., cyclical data).
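The sketch below (an assumed population of 1,000 numbered members) shows how three of these probability-sampling schemes look in NumPy:

```python
# Minimal sketch: random, systematic, and cluster sampling (made-up population).
import numpy as np

rng = np.random.default_rng(0)
population = np.arange(1000)

# Simple random sampling: each member has an equal chance
random_sample = rng.choice(population, size=50, replace=False)

# Systematic sampling: every k-th member after a random start
k = len(population) // 50
start = rng.integers(k)
systematic_sample = population[start::k]

# Cluster sampling: randomly pick 2 of 10 clusters, keep every member
clusters = np.array_split(population, 10)
chosen = rng.choice(len(clusters), size=2, replace=False)
cluster_sample = np.concatenate([clusters[i] for i in chosen])
```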
2. Importance of Sample Size
The sample size plays a crucial role in ensuring the reliability and validity of results.
• Key Considerations:
1. Population Size: A smaller sample may suffice for small populations, but
larger populations require a larger sample.
2. Margin of Error: A smaller margin of error requires a larger sample.
3. Confidence Level: A higher confidence level (e.g., 95% vs. 99%) requires a
larger sample.
• Impact of Sample Size:
o Small Sample: May lead to underrepresentation and unreliable results.
o Large Sample: Reduces sampling error but can be costly and time-consuming.
3. Sampling Error
• Definition: Sampling error is the difference between the sample statistic (e.g., sample
mean) and the true population parameter (e.g., population mean).
It arises because a sample is only a subset of the population.
• Causes:
o Variability in the population.
o Sample size being too small.
o Non-representative sampling techniques.
• Reduction Methods:
o Use random sampling to minimize bias.
o Increase the sample size.
o Use stratified sampling to ensure all groups are proportionally represented.
Inferential Statistics
1. Population and Sample
• Population: The entire group of individuals or items under study (e.g., all voters in a
country).
• Sample: A subset of the population used for analysis (e.g., 1,000 voters surveyed).
3. Sampling Distribution
• The probability distribution of a statistic (e.g., sample mean) when drawn from a
population repeatedly.
• Central Limit Theorem (CLT):
o For large sample sizes ($n > 30$), the sampling distribution of the sample mean approximates a normal distribution, regardless of the population's distribution.
4. Estimation
1. Point Estimation:
o Provides a single value as an estimate of a parameter (e.g., sample mean $\bar{x}$ for population mean $\mu$).
2. Interval Estimation:
o Provides a range of values (confidence interval) likely to contain the
population parameter.
o Example: $\text{CI} = \bar{x} \pm z \cdot \frac{s}{\sqrt{n}}$
o $z$: Z-score for the confidence level (e.g., 1.96 for a 95% CI).
o $s$: Sample standard deviation.
o $n$: Sample size.
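Here is a sketch of that interval formula on an assumed sample; for a sample this small, swapping z for a t critical value (scipy.stats.t.ppf) would be the safer choice, matching the t-test guidance below:

```python
# Minimal sketch: 95% confidence interval for a mean (made-up sample).
import numpy as np

sample = np.array([52, 48, 50, 55, 47, 53, 49, 51, 50, 54])
x_bar = sample.mean()
s = sample.std(ddof=1)   # sample standard deviation
n = len(sample)
z = 1.96                 # z-score for 95% confidence

margin = z * s / np.sqrt(n)
print(f"95% CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```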
5. Hypothesis Testing
1. Z-Test:
o Used when population variance is known and sample size is large ($n > 30$).
o Example: Testing the mean of a large dataset.
2. T-Test:
o Used when population variance is unknown or sample size is small ($n < 30$).
o Types:
▪ One-sample t-test.
▪ Independent (two-sample) t-test.
▪ Paired t-test.
3. Chi-Square Test:
o Used to test the association between categorical variables or goodness-of-fit.
o Example: Testing independence between gender and product preference.
4. ANOVA (Analysis of Variance):
o Used to compare means of more than two groups.
o Example: Comparing exam scores across three teaching methods.
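As one worked sketch of these tests, the code below runs an independent two-sample t-test on made-up scores for two teaching methods:

```python
# Minimal sketch: independent two-sample t-test with SciPy (made-up data).
from scipy import stats

method_a = [78, 82, 88, 75, 90, 84]
method_b = [71, 69, 80, 74, 68, 73]

t_stat, p_value = stats.ttest_ind(method_a, method_b)
alpha = 0.05
print(t_stat, p_value)
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

The same module exposes stats.chisquare, stats.chi2_contingency, and stats.f_oneway for the chi-square and ANOVA cases covered later.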
Hypothesis Testing
Key Concepts
1. Hypotheses
o Null Hypothesis ($H_0$): The default claim of no effect or no difference.
o Alternative Hypothesis ($H_1$): The claim of a significant effect or difference.
2. One-Tailed vs. Two-Tailed Tests
1. One-Tailed Test:
o Tests if the sample mean is significantly greater or smaller than the population
mean.
o Example: Testing if a new drug increases recovery rate.
2. Two-Tailed Test:
o Tests if the sample mean is significantly different (either higher or lower) from
the population mean.
o Example: Testing if a new teaching method affects scores (positively or
negatively).
5. Calculate the P-Value:
o The p-value is the probability of observing data at least this extreme if $H_0$ is true.
6. Make a Decision:
o If $p \leq \alpha$: Reject $H_0$ (there is enough evidence to support $H_1$).
o If $p > \alpha$: Fail to reject $H_0$ (not enough evidence to support $H_1$).
4. Types of Errors
o Type I Error ($\alpha$): Rejecting $H_0$ when it is actually true (a false positive).
o Type II Error ($\beta$): Failing to reject $H_0$ when it is actually false (a false negative).
6. Interpreting Results
7. Practical Applications
Regression Analysis
Key Concepts
2. Intercept:
o The value of the dependent variable when all predictors are zero.
3. Residuals:
o The difference between observed and predicted values:
$\text{Residual} = Y_{\text{observed}} - Y_{\text{predicted}}$
4. R-Squared ($R^2$):
o Measures the proportion of variance in the dependent variable explained by
the independent variables.
o Value ranges from 0 to 1. Higher values indicate a better fit.
5. Adjusted R-Squared:
o Adjusted for the number of predictors. Used when comparing models with
different numbers of predictors.
6. P-Value:
o Tests whether the coefficients are statistically significant.
o $p < 0.05$: The predictor significantly impacts the dependent variable.
Common Applications
1. Business:
o Forecasting sales, revenue, or market trends.
2. Healthcare:
o Predicting patient outcomes based on clinical variables.
3. Economics:
o Studying relationships between economic indicators (e.g., inflation vs. GDP
growth).
4. Education:
o Evaluating the impact of study hours on exam scores.
Practical Example
Scenario:
You want to predict monthly sales ($Y$) based on advertising spend ($X$).
Steps:
1. Fit a simple linear regression $Y = \beta_0 + \beta_1 X$ to the data.
2. Check the slope's p-value and the model's $R^2$.
3. Use the fitted equation to forecast sales for a planned spend, as in the sketch below.
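A sketch of those steps with scikit-learn, on made-up spend/sales pairs (statsmodels would additionally report p-values for the coefficients):

```python
# Minimal sketch: simple linear regression for the sales scenario (made-up data).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[10], [15], [20], [25], [30]])  # advertising spend
y = np.array([100, 130, 150, 180, 200])       # monthly sales

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_[0])  # estimated beta_0 and beta_1
print(model.score(X, y))                 # R-squared
print(model.predict([[22]]))             # forecast sales for spend = 22
```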
Advanced Topics
1. Multicollinearity:
o Occurs when independent variables are highly correlated.
o Solution: Remove redundant predictors or use Ridge/Lasso regression.
2. Interaction Effects:
o Examines if the effect of one predictor depends on the value of another.
o Example: The impact of advertising may differ based on the season.
3. Model Diagnostics:
o Residual plots to check assumptions.
o Use AIC/BIC for model comparison.
Correlation and Covariance
Correlation and covariance measure the relationship between two variables, but they differ in
how they express that relationship. Let’s dive into each aspect.
Correlation Analysis
Correlation Analysis Overview
3. Direction of Correlation:
o Positive Correlation: Both variables increase or decrease together (e.g., height and weight).
o Negative Correlation: One variable increases while the other decreases (e.g., speed and time taken to travel a fixed distance).
o No Correlation: No apparent relationship between variables (e.g., shoe size
and IQ).
4. Strength of Correlation:
o $0.0 \leq |r| < 0.3$: Weak correlation.
o $0.3 \leq |r| < 0.7$: Moderate correlation.
o $0.7 \leq |r| \leq 1.0$: Strong correlation.
Types of Correlation
1. Pearson Correlation:
o Measures linear relationships between two continuous variables.
o Assumes normal distribution and no significant outliers.
2. Spearman Rank Correlation:
o Measures monotonic relationships (increasing or decreasing, not necessarily
linear).
o Uses ranks of data instead of actual values.
3. Kendall’s Tau:
o Measures ordinal association between two variables.
o Used when datasets have tied ranks or smaller sample sizes.
4. Partial Correlation:
o Measures the relationship between two variables while controlling for the
effect of one or more additional variables.
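The first three coefficients are one-liners in SciPy; the sketch below uses assumed study-hours/exam-score data:

```python
# Minimal sketch: Pearson, Spearman, and Kendall coefficients (made-up data).
from scipy import stats

study_hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
exam_scores = [52, 55, 61, 58, 65, 70, 72, 75, 80, 85]

r, p_r = stats.pearsonr(study_hours, exam_scores)       # linear
rho, p_rho = stats.spearmanr(study_hours, exam_scores)  # monotonic, rank-based
tau, p_tau = stats.kendalltau(study_hours, exam_scores)
print(r, rho, tau)
```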
Correlation vs. Causation
• Correlation:
o Shows a relationship between two variables but does not imply causation.
o Example: Ice cream sales and drowning incidents may be correlated due to a
third factor (hot weather).
• Causation:
o One variable directly affects another.
o Requires experimental evidence to confirm.
Applications of Correlation
1. Business:
o Analyze the relationship between marketing spend and sales.
2. Healthcare:
o Study the relationship between physical activity and heart disease risk.
3. Education:
o Investigate the correlation between study hours and exam performance.
4. Finance:
o Examine the relationship between stock prices and market indices.
Steps in Correlation Analysis
1. Data Collection:
o Gather data on the two variables of interest.
2. Visualize the Data:
o Use scatterplots to identify patterns or trends.
3. Calculate the Correlation Coefficient:
o Use statistical software or formulas.
4. Interpret Results:
o Determine the strength and direction of the relationship.
5. Validate Assumptions:
o Ensure data meets the assumptions of the chosen correlation method.
Practical Example
Scenario:
• You have data on 10 students' study hours and their corresponding exam scores.
• Hypothesis: Students who study more tend to score higher.
Steps:
1. Visualize: Plot study hours vs. exam scores to check for a linear relationship.
2. Calculate $r$:
o Use Pearson correlation formula.
o Let’s assume $r = 0.85$.
3. Interpretation:
o $r = 0.85$: Strong positive correlation. Students who study more tend to score higher.
Correlation Matrix
Variable      Study Hours   Exam Scores   Attendance
Study Hours   1             0.85          0.6
Exam Scores   0.85          1             0.7
Attendance    0.6           0.7           1
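A matrix like this is one call in pandas; the sketch below uses made-up values, so its numbers will not match the table exactly:

```python
# Minimal sketch: building a correlation matrix with pandas (made-up data).
import pandas as pd

df = pd.DataFrame({
    "study_hours": [2, 4, 5, 7, 8],
    "exam_scores": [55, 62, 70, 78, 85],
    "attendance":  [60, 70, 75, 85, 90],
})
print(df.corr())  # pairwise Pearson correlations; diagonal is always 1
```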
Practical Notes
Analysis of Variance (ANOVA)
Types of ANOVA
1. One-Way ANOVA:
o Used when there is one independent variable (factor) with multiple levels.
o Example: Comparing test scores of students across three teaching methods.
2. Two-Way ANOVA:
o Used when there are two independent variables (factors).
o Example: Studying the effect of teaching method (factor 1) and study
environment (factor 2) on student performance.
3. Repeated Measures ANOVA:
o Used when the same subjects are measured multiple times under different
conditions.
o Example: Testing the effect of a drug on the same patients at different time
intervals.
4. MANOVA (Multivariate Analysis of Variance):
o Extends ANOVA to multiple dependent variables.
o Example: Studying the effect of a training program on both performance and
motivation.
Assumptions of ANOVA
1. Normality:
o The dependent variable is normally distributed within each group.
2. Homogeneity of Variance:
o Variances of the groups being compared are approximately equal.
3. Independence:
o Observations in each group are independent of each other.
Terms in ANOVA
1. Factors:
o Independent variables being studied.
2. Levels:
o Categories or groups within a factor.
3. F-Ratio:
o The test statistic in ANOVA.
o $F = \frac{MS_B}{MS_W}$, the ratio of between-group to within-group variance (see the worked example below).
Example: One-Way ANOVA
Scenario:
A researcher wants to test whether three diets (A, B, and C) lead to different average weight
losses.
1. Hypotheses:
o $H_0$: $\mu_A = \mu_B = \mu_C$.
o $H_a$: At least one mean is different.
2. Data:
o Diet A: [5, 6, 7]
o Diet B: [8, 9, 10]
o Diet C: [3, 4, 5]
3. Calculate Sum of Squares:
o $SS_B$: Variability between group means.
o $SS_W$: Variability within each group.
4. Degrees of Freedom:
o $df_B = k - 1 = 3 - 1 = 2$.
o $df_W = n - k = 9 - 3 = 6$.
5. Calculate F-Ratio:
o Use $F = \frac{MS_B}{MS_W}$, where $MS = \frac{SS}{df}$.
6. Compare P-Value and $\alpha$:
o If $p \leq 0.05$: Reject $H_0$.
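SciPy collapses steps 3 to 6 into a single call; the sketch below uses the diet data from this example:

```python
# Minimal sketch: one-way ANOVA on the diet example with SciPy.
from scipy import stats

diet_a = [5, 6, 7]
diet_b = [8, 9, 10]
diet_c = [3, 4, 5]

f_stat, p_value = stats.f_oneway(diet_a, diet_b, diet_c)
print(f_stat, p_value)  # F = 19.0 for this data
print("Reject H0" if p_value <= 0.05 else "Fail to reject H0")
```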
Example: Two-Way ANOVA
Scenario:
A researcher studies the effect of gender (male, female) and exercise type (yoga, cardio) on
stress levels.
1. Factors:
o Factor 1: Gender (2 levels).
o Factor 2: Exercise Type (2 levels).
2. Hypotheses:
o Main Effects:
▪ $H_0$: No difference in stress levels by gender.
▪ $H_0$: No difference in stress levels by exercise type.
o Interaction Effect:
▪ $H_0$: No interaction between gender and exercise type.
3. Steps:
o Compute main effects and interaction effect.
o Compare F-ratio for each.
Post-Hoc Tests
If ANOVA shows significant results, post-hoc tests determine which groups differ:
1. Tukey's HSD:
o Compares all possible pairs of group means.
2. Bonferroni Correction:
o Adjusts $\alpha$ to reduce Type I error.
3. Dunnett’s Test:
o Compares all groups to a control group.
Applications of ANOVA
1. Business:
o Testing the effectiveness of different advertising campaigns.
2. Education:
o Comparing average scores across multiple teaching methods.
3. Healthcare:
o Evaluating the effectiveness of various treatments.
Chi-Square Test
Formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$
Where:
• $O$: Observed frequency
• $E$: Expected frequency
Chi-Square Test for Independence
Steps:
1. State the Hypotheses:
o Null Hypothesis (H0H_0H0): The two variables are independent.
o Alternative Hypothesis (HaH_aHa): The two variables are not independent.
2. Set the Significance Level ($\alpha$):
o Commonly $\alpha = 0.05$.
3. Create a Contingency Table:
o Summarize the observed frequencies of the variables in a matrix format.
4. Calculate Expected Frequencies:
o Use the formula: $E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$
5. Compute the Chi-Square Statistic:
o Plug observed ($O$) and expected ($E$) frequencies into the formula.
6. Determine Degrees of Freedom:
o $df = (r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns.
7. Compare $\chi^2$ Value to Critical Value:
o Use a Chi-Square distribution table or statistical software.
8. Make a Decision:
o If $\chi^2 \geq$ critical value or $p \leq \alpha$: Reject $H_0$.
o Otherwise, fail to reject $H_0$.
Example: Test for Independence
Observed Data: a 3×3 contingency table of age group by product preference, with a grand total of 240.
1. Expected Frequencies:
o For Youth & Product A:
$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}} = \frac{60 \times 90}{240} = 22.5$
2. Chi-Square Statistic:
o Calculate $\chi^2 = \sum \frac{(O - E)^2}{E}$.
3. Degrees of Freedom:
o $df = (3 - 1)(3 - 1) = 4$.
4. Decision:
o Compare $\chi^2$ to the critical value at $df = 4$ and $\alpha = 0.05$.
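In SciPy the whole procedure is one call; since the original observed table is not reproduced above, the 3×3 counts below are stand-ins chosen to match the stated margins (Youth row total 60, Product A column total 90, grand total 240):

```python
# Minimal sketch: chi-square test for independence (stand-in 3x3 counts).
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [25, 20, 15],   # e.g., Youth   (row total 60)
    [30, 40, 20],   # e.g., Adults
    [35, 30, 25],   # e.g., Seniors
])
chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, p, dof)    # dof = (3-1)*(3-1) = 4
print(expected[0, 0])  # 22.5, matching the hand calculation above
print("Reject H0" if p <= 0.05 else "Fail to reject H0")
```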
Example: Goodness-of-Fit
Scenario: A die is rolled 60 times, and the observed frequencies of each side are:
Hypotheses:
• $H_0$: The die is fair ($E = \frac{60}{6} = 10$ for each side).
• $H_a$: The die is not fair.
1. Expected Frequencies:
o $E = 10$ for all sides.
2. Chi-Square Statistic:
o Calculate $\chi^2 = \sum \frac{(O - E)^2}{E}$.
3. Degrees of Freedom:
o $df = 6 - 1 = 5$.
4. Decision:
o Compare $\chi^2$ to the critical value at $df = 5$ and $\alpha = 0.05$.
Applications of Chi-Square Tests
1. Business:
o Analyzing customer preferences across regions.
2. Healthcare:
o Studying the association between a disease and a risk factor.
3. Education:
o Determining if student performance differs by teaching method.
Data Transformation and Standardization
Log Transformation
Definition:
Log transformation is the process of applying the logarithm function to transform a dataset. It
is used to reduce skewness and normalize data.
Formula:
For a value $x$: $x' = \log(x)$ (natural log or log base 10; requires $x > 0$).
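A quick sketch on made-up, strongly right-skewed values:

```python
# Minimal sketch: log transform compresses large values (made-up data).
import numpy as np

data = np.array([1, 10, 100, 1000])
print(np.log10(data))  # [0. 1. 2. 3.]: multiplicative gaps become additive
print(np.log(data))    # natural log; use np.log1p when zeros are present
```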
Min-Max Scaling
Definition:
Min-Max scaling transforms data to a fixed range, typically between 0 and 1.
Formula:
For a value $x$: $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$
Example:
Original data: [10, 20, 30]
Scaled data: [0, 0.5, 1]
Advantages:
• Preserves the relationships between data points.
• Easy to implement.
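In code, the scaling is one line (scikit-learn's MinMaxScaler does the same on whole feature matrices):

```python
# Minimal sketch: min-max scaling of the example data.
import numpy as np

x = np.array([10, 20, 30])
scaled = (x - x.min()) / (x.max() - x.min())
print(scaled)  # [0.  0.5 1. ]
```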
Z-Score Standardization
Definition:
Z-Score standardization transforms data to have a mean of 0 and a standard deviation of 1. It is generally preferred over min-max scaling when the dataset contains outliers, because the result is not squeezed into a fixed range by the extremes.
Formula:
For a value $x$: $z = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean and $\sigma$ the standard deviation.
Example:
Original data: [10, 20, 30]
Mean: 20, sample standard deviation: 10
Z-Scores: [-1, 0, 1]
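The same example in code; ddof=1 reproduces the sample standard deviation of 10 used above (scikit-learn's StandardScaler uses the population version instead):

```python
# Minimal sketch: z-score standardization of the example data.
import numpy as np

x = np.array([10, 20, 30])
z = (x - x.mean()) / x.std(ddof=1)  # sample std = 10
print(z)  # [-1.  0.  1.]
```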
Key Points:
Outliers and Missing Data
Detection of Outliers
1. Using the IQR Method:
• Steps:
1. Calculate Q1 (25th percentile) and Q3 (75th percentile).
2. Compute IQR: $\text{IQR} = Q3 - Q1$.
3. Define outlier boundaries:
▪ Lower boundary: $Q1 - 1.5 \times \text{IQR}$
▪ Upper boundary: $Q3 + 1.5 \times \text{IQR}$
4. Values outside these boundaries are considered outliers.
Example:
Data: [1, 2, 3, 4, 5, 6, 100]
Q1 = 2.5, Q3 = 5.5, IQR = 3
Lower boundary: $2.5 - 1.5(3) = -2$
Upper boundary: $5.5 + 1.5(3) = 10$
Outlier: 100
2. Using Z-Scores:
Z-Scores help detect outliers by measuring how far a data point is from the mean in terms of
standard deviations.
Example:
Data: [10, 20, 30, 1000], Mean = 265, sample Std Dev ≈ 490.07
Z-Score for 1000: $\frac{1000 - 265}{490.07} \approx 1.5$, below the usual $|z| > 3$ cutoff, so the rule does not flag it. (A single extreme value inflates the standard deviation and can mask itself.)
Handling Outliers
1. Winsorizing:
• Replacing extreme values with a less extreme value (e.g., replacing them with the
nearest boundary).
2. Capping:
• Setting a maximum and minimum threshold. Values beyond these thresholds are
capped.
3. Removing:
• Deleting outlier rows entirely, typically when they stem from data-entry errors.
Missing Data Imputation
1. Mean/Median/Mode Imputation:
• Replace missing values with the mean, median, or mode of the column.
• Works well for small datasets with low missingness.
2. Regression Imputation:
• Predict missing values from the other variables using a fitted regression model.
3. Advanced Techniques:
• For example, k-nearest-neighbors (KNN) imputation or multiple imputation.
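A pandas sketch of the simple imputation strategies on a made-up table with gaps:

```python
# Minimal sketch: median/mode imputation with pandas (made-up data).
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":  [25, 30, np.nan, 40, np.nan],
    "city": ["NY", None, "NY", "LA", "NY"],
})
df["age"] = df["age"].fillna(df["age"].median())      # numeric: median
df["city"] = df["city"].fillna(df["city"].mode()[0])  # categorical: mode
print(df)
```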
Measures of Association
Measures of association describe the relationship or dependency between two variables.
These are crucial in determining how one variable changes concerning another. Below are
key topics under this category:
Odds Ratio
• Definition: Odds ratio measures the strength of association between two binary
variables.
It's often used in case-control studies to determine how strongly an exposure is
associated with an outcome.
• Formula: $\text{OR} = \frac{a/b}{c/d} = \frac{a \times d}{b \times c}$
Where:
o $a$: exposed with the outcome, $b$: exposed without it
o $c$: unexposed with the outcome, $d$: unexposed without it
• Example:
In a study of 100 people:
o 40 people drink coffee, and 30 of them report improved focus (10 do not).
o 60 people don't drink coffee, and 20 of them report improved focus (40 do not).
o Odds ratio: $\text{OR} = \frac{30/10}{20/40} = \frac{3}{0.5} = 6$, so the odds of improved focus are 6 times higher among coffee drinkers.
Relative Risk
• Definition: The ratio of the probability of the outcome in the exposed group to the probability in the unexposed group: $\text{RR} = \frac{P(\text{outcome} \mid \text{exposed})}{P(\text{outcome} \mid \text{unexposed})}$
• Example:
Using the same data as above:
o Probability of improved focus for coffee drinkers: $\frac{30}{40} = 0.75$
o Probability of improved focus for non-coffee drinkers: $\frac{20}{60} \approx 0.33$
o Relative risk: $\frac{0.75}{0.33} \approx 2.25$, so coffee drinkers are about 2.25 times as likely to report improved focus.
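Both measures fall out of the 2x2 counts directly; the sketch below recomputes them for the coffee example:

```python
# Minimal sketch: odds ratio and relative risk from the coffee example.
a, b = 30, 10  # coffee drinkers: improved focus / no improvement
c, d = 20, 40  # non-drinkers:    improved focus / no improvement

odds_ratio = (a / b) / (c / d)
print(odds_ratio)  # 6.0

relative_risk = (a / (a + b)) / (c / (c + d))
print(round(relative_risk, 2))  # 2.25
```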