Statistics
Curated by
SAURABH G
Founder at DataNiti
6+ Years of Experience | Senior Data Engineer
LinkedIn: www.linkedin.com/in/saurabhgghatnekar
BHAVESH ARORA
Senior Data Analyst at Delight Learning Services
M.Tech – IIT Jodhpur | 3+ Years of Experience
LinkedIn: www.linkedin.com/in/bhavesh-arora-11b0a319b
✅ 1. Mean (Average)
Python Code:
import numpy as np
data = [10, 20, 30, 40, 50]
print("Mean:", np.mean(data))
✅ 2. Median
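The median is the middle value of the sorted data. When the count is even, it is the average of the two middle values. The sketch below uses NumPy's `np.median`. The list containing 5000 is a hypothetical example: one outlier shifts the mean a lot but leaves the median unchanged.

```python
import numpy as np

odd = [10, 20, 30, 40, 50]
even = [10, 20, 30, 40]
with_outlier = [10, 20, 30, 40, 5000]  # hypothetical extreme value

print("Median (odd count):", np.median(odd))    # middle value: 30.0
print("Median (even count):", np.median(even))  # (20 + 30) / 2 = 25.0
print("With outlier -> mean:", np.mean(with_outlier), "median:", np.median(with_outlier))
```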
✅ 3. Mode
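The mode is the most frequent value, and a dataset can have more than one. The sketch below uses Python's standard-library `statistics.mode` and `statistics.multimode` (the latter needs Python 3.8+). Both lists are hypothetical examples.

```python
from statistics import mode, multimode

values = [1, 2, 2, 3, 3, 3, 4]
tied = [1, 1, 2, 2, 3]

print("Mode:", mode(values))          # 3 occurs most often
print("All modes:", multimode(tied))  # tie between 1 and 2 -> [1, 2]
```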
Python Code:
print("Range:", max(data) - min(data))
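Variance measures how spread out the values in a dataset are around the mean. It is built from the squared deviation of each value from the mean.
Formulas:
● Population Variance (σ²): $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
● Sample Variance (s²): $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$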
Python Code:
import numpy as np
data = [10, 20, 30, 40, 50]
print("Population Variance:", np.var(data))  # ddof=0 (default): divide by N
print("Sample Variance:", np.var(data, ddof=1))  # ddof=1: divide by n - 1
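To see what `ddof` changes, the variance can be computed by hand. Sum the squared deviations from the mean, then divide by N for a population or by n − 1 for a sample (Bessel's correction). The short sketch below checks the manual results against `np.var`.

```python
import numpy as np

data = [10, 20, 30, 40, 50]
n = len(data)
mean = sum(data) / n                              # 30.0
squared_devs = [(x - mean) ** 2 for x in data]    # 400, 100, 0, 100, 400

population_var = sum(squared_devs) / n            # divide by N
sample_var = sum(squared_devs) / (n - 1)          # divide by n - 1 (Bessel's correction)

print("Population variance:", population_var)     # 200.0
print("Sample variance:", sample_var)             # 250.0
print("Matches np.var:", np.isclose(population_var, np.var(data)),
      np.isclose(sample_var, np.var(data, ddof=1)))
```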
✅ 2. Standard Deviation (σ or s)
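The square root of the variance. Unlike the variance, it is expressed in the same units as the data.
Formula: $\sigma = \sqrt{\sigma^2}$ (population), $s = \sqrt{s^2}$ (sample)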
Python Code:
print("Standard Deviation:", np.std(data, ddof=1))
Interquartile Range (IQR) Formula: $\text{IQR} = Q_3 - Q_1$ (the spread of the middle 50% of the data)
Python Code:
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
print("IQR:", Q3 - Q1)
✅ 1. Types of Probability
Python Code:
favorable = 3
total = 10
print("Probability:", favorable / total)
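The snippet above is classical probability: favorable outcomes divided by the total number of equally likely outcomes. A common second type is empirical probability, which is the relative frequency observed over many trials. The sketch below compares the two using a hypothetical urn with 3 favorable balls out of 10, simulated with Python's `random` module. The seed and trial count are arbitrary.

```python
import random

# Classical probability: favorable outcomes / total equally likely outcomes
favorable, total = 3, 10
classical = favorable / total

# Empirical probability: relative frequency over many simulated draws
# (hypothetical urn: balls 0-9, where balls 0, 1 and 2 count as "favorable")
random.seed(42)
trials = 100_000
hits = sum(random.randrange(total) < favorable for _ in range(trials))
empirical = hits / trials

print("Classical:", classical)  # 0.3
print("Empirical:", empirical)  # close to 0.3
```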
✅ 5. Real-World Use Cases
✅ 1. Discrete Distributions
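import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom, norm
# Binomial (discrete): number of successes in n=10 trials with p=0.5
k = np.arange(0, 11)
plt.bar(k, binom.pmf(k, 10, 0.5))
plt.title('Binomial Distribution (n=10, p=0.5)')
plt.show()
# Normal (continuous), for comparison
x = np.linspace(-3, 3, 100)
plt.plot(x, norm.pdf(x, 0, 1))
plt.title('Normal Distribution (μ=0, σ=1)')
plt.show()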
✅ 4. When to Use Which
Scenario | Distribution to Use
Count of events (e.g., calls per hour) | Poisson
Repeated binary trials | Binomial
Heights, weights | Normal
Waiting times | Exponential
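Each row of the table maps to a `scipy.stats` distribution. The sketch below uses assumed parameters: an average of 4 calls per hour, a fair coin, and heights with mean 170 cm and standard deviation 10 cm. Note that SciPy's `expon` takes `scale = 1 / rate`.

```python
from scipy.stats import binom, expon, norm, poisson

# Poisson: counts of events; assumed average of 4 calls per hour
p_calls = poisson.pmf(2, mu=4)               # P(exactly 2 calls in an hour)
# Binomial: repeated binary trials; 10 flips of a fair coin
p_heads = binom.pmf(3, n=10, p=0.5)          # P(exactly 3 heads)
# Normal: continuous measurements; assumed heights ~ N(170 cm, 10 cm)
p_height = norm.cdf(180, loc=170, scale=10)  # P(height < 180 cm)
# Exponential: waiting time between events at 4 per hour (scale = 1 / rate)
p_wait = expon.cdf(0.25, scale=1 / 4)        # P(wait < 15 minutes)

for label, value in [("Poisson", p_calls), ("Binomial", p_heads),
                     ("Normal", p_height), ("Exponential", p_wait)]:
    print(f"{label}: {value:.4f}")
```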
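When you take many random samples of size n from any population with finite variance
(regardless of its shape), the distribution of sample means will, as n grows:
● Tend toward a Normal Distribution
● Be centered on the population mean μ
● Have a standard deviation of σ/√n (the standard error)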
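import numpy as np
import matplotlib.pyplot as plt
population = np.random.exponential(scale=2.0, size=100_000)  # example skewed population (assumed)
sample_means = []
for _ in range(1000):
    sample = np.random.choice(population, size=50)
    sample_means.append(np.mean(sample))
plt.hist(sample_means, bins=30)
plt.title('Distribution of Sample Means')
plt.show()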
1. Define H₀ and H₁
2. Select significance level (α) – typically 0.05
3. Choose test type (z-test, t-test, etc.)
4. Calculate test statistic
5. Compare the p-value with α (or the test statistic with the critical value)
6. Make decision: Reject or fail to reject H₀
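The six steps can be traced in code. This is a minimal sketch that assumes a hypothetical sample of delivery times and a claimed mean of 30 minutes. It uses a one-sample t-test because σ is unknown and the sample is small, and it cross-checks the manual statistic against SciPy's `ttest_1samp`.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: delivery times in minutes; the claim under test is "mean = 30"
sample = np.array([31.2, 29.8, 33.5, 32.1, 30.9, 34.0, 31.7, 32.8])

# 1. H0: mu = 30, H1: mu != 30 (two-sided)
mu0 = 30.0
# 2. Significance level
alpha = 0.05
# 3. Test type: one-sample t-test (population sd unknown, small sample)
n = len(sample)
df = n - 1
# 4. Test statistic: t = (x̄ - mu0) / (s / sqrt(n))
t_stat = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
# 5. Two-sided p-value and the matching critical value
p_value = 2 * stats.t.sf(abs(t_stat), df)
critical = stats.t.ppf(1 - alpha / 2, df)
# 6. Decision
decision = "Reject H0" if p_value < alpha else "Fail to reject H0"

print(f"t = {t_stat:.3f}, critical = ±{critical:.3f}, p = {p_value:.4f} -> {decision}")
```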
✅ 3. Types of Errors
Type | Meaning
Type I Error | Rejecting H₀ when it's actually true (False Positive)
Type II Error | Failing to reject H₀ when it's false (False Negative)
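The Type I error rate is controlled by α by design. The simulation sketch below draws many samples from a population where H₀ (mean = 0) is true, runs a one-sample t-test on each, and counts how often H₀ is wrongly rejected. The rate should come out close to 0.05. The sample sizes and seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments, n = 2000, 30

# H0 is true by construction: every sample comes from a population with mean 0
samples = rng.normal(loc=0, scale=1, size=(n_experiments, n))
_, p_values = stats.ttest_1samp(samples, popmean=0, axis=1)

# Each rejection here is a Type I error (false positive)
type_i_rate = np.mean(p_values < alpha)
print("Observed Type I error rate:", type_i_rate)  # should be near alpha = 0.05
```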
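from scipy.stats import ttest_ind
t_stat, p_val = ttest_ind([85, 90, 78, 92, 88], [80, 75, 83, 79, 77])  # hypothetical scores for two groups
print("t-statistic:", t_stat)
print("p-value:", p_val)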
✅ 6. Common Tests
✅ 2. Paired T-Test
Used when same subjects are tested before and after a treatment.
Use case: Pre-test vs Post-test scores
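A minimal sketch of the pre-test vs post-test case using SciPy's `ttest_rel`. The scores are hypothetical. The last lines show that a paired t-test is equivalent to a one-sample t-test on the per-subject differences against 0.

```python
from scipy.stats import ttest_1samp, ttest_rel

# Hypothetical scores for the same six students before and after a course
pre = [62, 70, 58, 75, 66, 71]
post = [68, 74, 63, 78, 70, 77]

t_stat, p_val = ttest_rel(post, pre)
print("Paired t-statistic:", t_stat)
print("p-value:", p_val)

# Equivalent view: one-sample t-test on the differences (post - pre) against 0
diffs = [after - before for before, after in zip(pre, post)]
t_diff, _ = ttest_1samp(diffs, popmean=0)
```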
✅ 5. Assumptions of T-Test
Confused between Z-Test, T-Test, and ANOVA? You're not alone. These
statistical tests help determine if group differences are real or due to
chance—but each has its own use-case and assumptions.
✅ 1. Z-Test
Used when:
● Population standard deviation is known
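With σ known, the z statistic is $z = (\bar{x} - \mu_0) / (\sigma / \sqrt{n})$. Its p-value comes from the standard normal distribution. The sketch below computes the statistic by hand under a hypothetical setup: σ = 15, H₀: μ = 100, a sample mean of 105, and n = 36.

```python
import math
from scipy.stats import norm

# Hypothetical setup: population sd is known to be 15; H0: mu = 100
sigma = 15
mu0 = 100
sample_mean = 105
n = 36

# z = (x̄ - mu0) / (sigma / sqrt(n)) = 5 / 2.5 = 2.0
z = (sample_mean - mu0) / (sigma / math.sqrt(n))
p_value = 2 * norm.sf(abs(z))  # two-sided

print(f"z = {z:.2f}, p = {p_value:.4f}")  # z = 2.00, p ≈ 0.0455
```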
✅ 2. T-Test
Used when:
● Population standard deviation is unknown
● Sample size is small (n < 30)
● Data is approximately normal
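The t-test is preferred for small samples because the t distribution has heavier tails than the normal. As a result, its critical values are larger when n is small and shrink toward the z value as n grows. The sketch below compares two-sided 95% critical values for a few arbitrary sample sizes.

```python
from scipy.stats import norm, t

# Two-sided 95% critical values
z_crit = norm.ppf(0.975)
t_crits = {n: t.ppf(0.975, df=n - 1) for n in (5, 10, 30, 100)}

for n, t_crit in t_crits.items():
    print(f"n = {n:>3}: t critical = {t_crit:.3f}  vs  z critical = {z_crit:.3f}")
```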
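Covariance tells you how two variables vary together, but not how
strong the relationship is, because its magnitude depends on the variables' units.
● Positive covariance → variables move in the same direction
● Negative covariance → variables move in opposite directions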
import numpy as np
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
cov_matrix = np.cov(x, y)
print("Covariance Matrix:\n", cov_matrix)
correlation = np.corrcoef(x, y)
print("Correlation Matrix:\n", correlation)
✅ 3. Key Differences Table
🔸 Q1: Can two variables have high covariance but low correlation?
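✅ Yes. Covariance scales with the units of the variables, so variables measured on large scales can have a large covariance even when their correlation (unit-free, always between −1 and 1) is weak.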
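A sketch of the contrast. It uses a hypothetical large-scale pair with a weak linear link and the small, perfectly related pair from the covariance example above. The large-scale pair has a much larger covariance but a far lower correlation. The seed and scales are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical large-scale, weakly related pair (x has an sd of about 1000)
x = rng.normal(0, 1000, size=10_000)
y = 0.1 * x + rng.normal(0, 1000, size=10_000)

# Small-scale, perfectly related pair
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]

big_cov, big_corr = np.cov(x, y)[0, 1], np.corrcoef(x, y)[0, 1]
small_cov, small_corr = np.cov(a, b)[0, 1], np.corrcoef(a, b)[0, 1]

print(f"Large-scale pair: covariance = {big_cov:,.0f}, correlation = {big_corr:.2f}")
print(f"Small-scale pair: covariance = {small_cov:.1f}, correlation = {small_corr:.2f}")
```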
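from scipy.stats import chi2_contingency
# Sample contingency table (rows and columns are two categorical variables)
data = [[30, 10],
        [20, 40]]
chi2, p, dof, expected = chi2_contingency(data)
print("Chi-square statistic:", chi2)
print("p-value:", p)
print("Degrees of freedom:", dof)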
✅ 5. Interpretation
📌 Applications
The Central Limit Theorem (CLT) is one of the most fundamental ideas
in statistics and data science. It explains why the normal distribution
appears so often, even when the data itself isn't normal!
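import numpy as np
import matplotlib.pyplot as plt
plt.hist([np.mean(np.random.exponential(2.0, 50)) for _ in range(1000)], bins=30)  # means of skewed samples (example)
plt.show()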
✅ 4. Key Terms
Term | Meaning
Population | Entire group
Sample | Subset of the population
Sampling Distribution | Distribution of sample statistics
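The sampling distribution of the mean can be checked numerically. Its center should match the population mean μ, and its spread should match σ/√n. The sketch below assumes a hypothetical skewed (exponential) population. Resampling is done with replacement via index arrays, and the sample counts and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical skewed population (exponential), clearly not normal
population = rng.exponential(scale=2.0, size=200_000)
mu, sigma = population.mean(), population.std()

# Sampling distribution of the mean: 5000 samples of size n, drawn with replacement
n = 40
idx = rng.integers(0, population.size, size=(5000, n))
sample_means = population[idx].mean(axis=1)

print("Population mean:     ", round(mu, 3))
print("Mean of sample means:", round(sample_means.mean(), 3))
print("sigma / sqrt(n):     ", round(sigma / np.sqrt(n), 3))
print("SD of sample means:  ", round(sample_means.std(), 3))
```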
✅ 5. CLT Assumptions
📌 Applications of CLT
● A/B testing
● Quality control in manufacturing
● Estimating population mean from sample
● Predictive analytics in ML workflows
💬 INTERVIEW QUESTIONS (Medium-High)
✅ 1. What is a t-Test?
✅ 3. Assumptions of t-Test
✅ 4. Python Examples
📌 One-Sample t-Test
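A minimal one-sample t-test sketch with SciPy's `ttest_1samp`. It assumes hypothetical battery lifetimes tested against an advertised mean of 10 hours.

```python
from scipy.stats import ttest_1samp

# Hypothetical battery lifetimes (hours); H0: the true mean equals the advertised 10 hours
lifetimes = [9.6, 9.8, 10.1, 9.4, 9.7, 9.9, 9.5, 9.8]

result = ttest_1samp(lifetimes, popmean=10)
print("t-statistic:", result.statistic)
print("p-value:", result.pvalue)
```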
✅ 5. Interpreting p-Value
p-value | Interpretation (at α = 0.05)
p < 0.05 | Reject the null hypothesis (statistically significant)
p ≥ 0.05 | Fail to reject the null hypothesis (not significant)
📌 Real-World Applications
✅ 1. What is ANOVA?
✅ 2. Types of ANOVA
✅ 3. Assumptions of ANOVA
✅ 5. Interpreting Results
If the ANOVA result is significant, use a post-hoc test such as Tukey’s HSD or
Bonferroni-corrected pairwise comparisons to find which pairs of groups differ significantly.
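A sketch of one-way ANOVA followed by a post-hoc step. It uses SciPy's `f_oneway`, then pairwise t-tests judged against a Bonferroni-adjusted threshold (α divided by the number of pairs). The three groups are hypothetical. SciPy 1.8+ also provides `scipy.stats.tukey_hsd` for the Tukey approach.

```python
from itertools import combinations
from scipy.stats import f_oneway, ttest_ind

# Hypothetical measurements for three groups (e.g., three teaching methods)
groups = {
    "A": [20, 22, 19, 21, 23],
    "B": [30, 31, 29, 32, 28],
    "C": [21, 20, 22, 19, 23],
}

# One-way ANOVA: H0 = all group means are equal
f_stat, p_anova = f_oneway(*groups.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4g}")

alpha = 0.05
significant_pairs = []
if p_anova < alpha:
    # Post-hoc: pairwise t-tests with a Bonferroni-adjusted threshold
    pairs = list(combinations(groups, 2))
    adjusted_alpha = alpha / len(pairs)
    for g1, g2 in pairs:
        _, p = ttest_ind(groups[g1], groups[g2])
        differs = p < adjusted_alpha
        print(f"{g1} vs {g2}: p = {p:.4g} -> {'differ' if differs else 'no significant difference'}")
        if differs:
            significant_pairs.append((g1, g2))
```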
📈 Real-World Applications
📌 Example:
Does gender influence product preference?
✅ 3. Assumptions of Chi-Square
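import pandas as pd
from scipy.stats import chi2_contingency
# Contingency table: rows = gender, columns = preferred product
data = pd.DataFrame({
    'Product A': [30, 10],
    'Product B': [20, 40]
}, index=['Male', 'Female'])
chi2, p, dof, expected = chi2_contingency(data)
print("Chi-square statistic:", chi2)
print("p-value:", p)
print("Degrees of freedom:", dof)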
✅ 5. Interpreting Results
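Interpreting the result involves two checks. First, verify the common rule of thumb that every expected count is at least 5. Then compare the p-value with α. The sketch below reuses the hypothetical gender × product table. For a 2×2 table, SciPy applies Yates' continuity correction by default (`correction=True`).

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical gender x product table
data = pd.DataFrame({"Product A": [30, 10], "Product B": [20, 40]},
                    index=["Male", "Female"])
chi2, p, dof, expected = chi2_contingency(data)

# Rule-of-thumb assumption check: every expected count should be at least 5
expected_ok = bool((expected >= 5).all())
print(pd.DataFrame(expected, index=data.index, columns=data.columns))

alpha = 0.05
if p < alpha:
    conclusion = "Reject H0: gender and product preference appear to be associated"
else:
    conclusion = "Fail to reject H0: no evidence of an association"
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4g}")
print(conclusion)
```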
📈 Real-World Applications
● Is gender associated with product preference?
● Does education level impact voting behavior?
● Do different ads get clicked by different age groups?
The law of large numbers states that as the size of a sample increases,
the sample mean converges to the population mean (assuming that mean exists).
This principle underpins many statistical practices, because it guarantees
that larger samples tend to give more accurate estimates.
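A quick simulation of the law of large numbers. It rolls a simulated fair six-sided die (true mean 3.5) and tracks the running sample mean. The seed and roll counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
rolls = rng.integers(1, 7, size=100_000)  # fair six-sided die, true mean 3.5
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

for n in (10, 100, 1_000, 100_000):
    print(f"after {n:>7,} rolls: sample mean = {running_mean[n - 1]:.4f}")
```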
To assess normality:
● Visual Methods:
o Histogram: Should resemble a bell-shaped curve.
o Q-Q Plot: Data points should lie approximately along the
reference line.
● Statistical Tests:
o Shapiro-Wilk Test: Tests the null hypothesis that the data is
normally distributed.
o Kolmogorov-Smirnov Test: Compares the sample
distribution with a normal distribution.
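A sketch of both statistical tests with `scipy.stats.shapiro` and `scipy.stats.kstest`. It runs them on hypothetical bell-shaped data and hypothetical right-skewed data. The KS test compares the data against a standard normal after standardizing with the sample's own mean and standard deviation. Because those parameters are estimated from the same data, the plain KS p-value is only approximate (too conservative).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
samples = {
    "normal": rng.normal(loc=50, scale=5, size=200),  # hypothetical bell-shaped data
    "skewed": rng.exponential(scale=5, size=200),     # hypothetical right-skewed data
}

results = {}
for name, x in samples.items():
    _, p_shapiro = stats.shapiro(x)
    # KS against a standard normal after standardizing with the sample's mean and sd
    z = (x - x.mean()) / x.std(ddof=1)
    _, p_ks = stats.kstest(z, "norm")
    results[name] = (p_shapiro, p_ks)
    print(f"{name}: Shapiro-Wilk p = {p_shapiro:.4f}, KS p = {p_ks:.4f}")
```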
4. How do you interpret the Area Under the ROC Curve (AUC-ROC)?
AUC-ROC measures a classifier's ability to distinguish between classes:
● AUC = 1: Perfect classification.
● AUC = 0.5: No discriminative ability (equivalent to random
guessing).
A higher AUC indicates better discrimination between the classes. An AUC below 0.5 means the model ranks worse than random guessing.
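AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counting half. It can therefore be computed by comparing every positive/negative pair. The NumPy sketch below uses toy labels and scores; `auc_by_pairs` is a hypothetical helper name. In practice, scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
import numpy as np

def auc_by_pairs(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive vs every negative
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

y = [0, 0, 1, 1]
print("Perfect ranking:", auc_by_pairs(y, [0.1, 0.2, 0.8, 0.9]))   # 1.0
print("Constant scores:", auc_by_pairs(y, [0.5, 0.5, 0.5, 0.5]))   # 0.5, like random guessing
print("Mixed ranking:  ", auc_by_pairs(y, [0.1, 0.4, 0.35, 0.8]))  # 0.75
```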