Day 3 Statistics Interview QnA

Uploaded by

spandushetty28

### Descriptive Statistics

What is the mean of the dataset: 3, 7, 8, 10, 12?


The mean is (3 + 7 + 8 + 10 + 12) / 5 = 8.

How do you calculate the median of a dataset?


Sort the data, and the median is the middle value (or the average of the two middle values if
the dataset has an even number of observations).

What is the mode of the dataset: 4, 1, 2, 4, 3, 4, 5?


The mode is 4, as it appears most frequently.
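
The three answers above can be checked with a short sketch that uses Python's standard `statistics` module. The variable names are illustrative, and the four-value list used for the even-length median is an assumption added only to show that case.

```python
import statistics

mean_data = [3, 7, 8, 10, 12]
mode_data = [4, 1, 2, 4, 3, 4, 5]

# Mean: sum of the values divided by how many there are.
mean_value = statistics.mean(mean_data)

# Median with an odd number of observations: the middle value after sorting.
median_odd = statistics.median(mean_data)

# Median with an even number of observations (illustrative data):
# the average of the two middle values, here 7 and 8.
median_even = statistics.median([3, 7, 8, 10])

# Mode: the most frequent value.
mode_value = statistics.mode(mode_data)

print(mean_value, median_odd, median_even, mode_value)
```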

What is standard deviation?


Standard deviation measures the amount of variation or dispersion in a set of values; it is
the square root of the variance.

Define variance.
Variance is the average of the squared differences from the mean. For a sample, the sum of
squared differences is usually divided by n - 1 rather than n.
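
As a minimal sketch of these two definitions, the functions below compute the variance by hand and take its square root to get the standard deviation. The function names and the dataset (reused from the mean question) are illustrative choices, not part of the original answers.

```python
import math

def population_variance(values):
    """Average of the squared differences from the mean (divide by n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Same sum of squares, divided by n - 1 (the usual sample estimate)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

data = [3, 7, 8, 10, 12]
pop_var = population_variance(data)   # squared deviations sum to 46, so 46 / 5
samp_var = sample_variance(data)      # 46 / 4
pop_sd = math.sqrt(pop_var)           # standard deviation is the square root

print(pop_var, samp_var, pop_sd)
```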

What is a percentile?
A percentile indicates the relative standing of a value within a dataset, showing the
percentage of observations below it.

How is the interquartile range (IQR) calculated?


IQR is the third quartile (Q3) minus the first quartile (Q1): IQR = Q3 - Q1.

What does a box plot represent?


A box plot shows the distribution of data based on five summary statistics: minimum, first
quartile, median, third quartile, and maximum.

What is a skewed distribution?


A skewed distribution is one where values are not symmetrically distributed around the mean,
often with a tail on one side.

How do you identify outliers in a dataset?


Outliers can be identified using IQR: values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
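
The IQR rule can be sketched as follows. Quartile conventions differ between tools; this example assumes the "inclusive" method of `statistics.quantiles`, and the dataset is made up for illustration.

```python
import statistics

def iqr_outliers(values):
    """Return (q1, q3, outliers) using the 1.5 * IQR rule."""
    # Assumption: the "inclusive" quartile method; other methods can give
    # slightly different Q1 and Q3 for the same data.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return q1, q3, outliers

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # illustrative data
q1, q3, outliers = iqr_outliers(data)
print(q1, q3, outliers)
```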

### Inferential Statistics

What is a hypothesis test?


A hypothesis test is a method to determine if there is enough evidence to reject a null
hypothesis.

What is a p-value?
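The p-value is the probability of observing results at least as extreme as those actually
observed, assuming the null hypothesis is true.

What does a confidence interval represent?
A confidence interval is a range of values, computed from sample data, that is likely to
contain the population parameter at a stated confidence level (for example, 95%).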

What is the difference between Type I and Type II errors?


Type I error is rejecting a true null hypothesis; Type II error is failing to reject a false null
hypothesis.

Explain what a t-test is.


A t-test compares the means of two groups to determine whether the difference between them
is statistically significant.
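
To make the comparison concrete, here is a sketch of the test statistic only. It assumes the Welch (unequal-variance) form of the two-sample t-test, which is one of several variants, and it uses made-up data. Turning the statistic into a p-value needs the t distribution, which the standard library does not provide, so that step is left out.

```python
import math
import statistics

def welch_t_statistic(group_a, group_b):
    """t = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b)."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error

group_a = [1, 2, 3, 4, 5]  # illustrative data
group_b = [2, 3, 4, 5, 6]  # illustrative data
t_stat = welch_t_statistic(group_a, group_b)
print(t_stat)
```

A statistic near zero suggests the group means are close relative to the variation within the groups; a larger magnitude is stronger evidence of a difference.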

What is ANOVA used for?


ANOVA (Analysis of Variance) is used to compare means among three or more groups.

What is the central limit theorem?


The central limit theorem states that the sampling distribution of the sample mean
approaches a normal distribution as sample size increases, regardless of the shape of the
population distribution (provided its variance is finite).

Define correlation.
Correlation measures the strength and direction of a linear relationship between two
variables.
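
A minimal sketch of the Pearson correlation coefficient, the most common measure of linear correlation, is shown below. The function name and the data are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both spreads, in [-1, 1]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])   # perfectly increasing line
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # perfectly decreasing line
print(r_positive, r_negative)
```

The sign gives the direction of the relationship and the magnitude gives its strength.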

What is a chi-square test?


A chi-square test compares observed frequencies with the frequencies expected under a
hypothesis (such as independence) for categorical variables.

What is a Z-score?
A Z-score indicates how many standard deviations a value is from the mean.
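
As a small sketch, a Z-score can be computed by subtracting the mean and dividing by the standard deviation. This example assumes the population standard deviation (`statistics.pstdev`) and reuses an illustrative dataset.

```python
import statistics

def z_score(value, data):
    """Number of standard deviations that value lies from the mean of data."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # assumption: population standard deviation
    return (value - mean) / sd

data = [3, 7, 8, 10, 12]
z_at_mean = z_score(8, data)    # the mean itself has a Z-score of 0
z_at_max = z_score(12, data)    # 4 units above the mean
print(z_at_mean, z_at_max)
```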

### Probability

What is the difference between independent and dependent events?


Independent events do not affect each other's probabilities; dependent events do.

Define conditional probability.


Conditional probability is the probability of an event occurring given that another event has
already occurred.
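
A worked example may help here. The sketch below uses one roll of a fair die, with event A "the roll is even" and event B "the roll is greater than 3" (both chosen for illustration), and applies P(A | B) = P(A and B) / P(B).

```python
from fractions import Fraction

outcomes = set(range(1, 7))                      # one roll of a fair die
event_a = {n for n in outcomes if n % 2 == 0}    # A: roll is even
event_b = {n for n in outcomes if n > 3}         # B: roll is greater than 3

def probability(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_a = probability(event_a)
p_b = probability(event_b)
p_a_and_b = probability(event_a & event_b)       # both A and B occur
p_a_given_b = p_a_and_b / p_b                    # P(A | B)

print(p_a, p_b, p_a_and_b, p_a_given_b)
```

Because P(A | B) differs from P(A) in this example, knowing that B occurred changes the probability of A, so the two events are dependent.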

What is a probability distribution?


A probability distribution describes how probabilities are distributed over the values of a
random variable.

What is a normal distribution?
A normal distribution is a continuous probability distribution characterized by a symmetric bell
shape and fully described by its mean and standard deviation.

What is the law of large numbers?


The law of large numbers states that as a sample size increases, the sample mean will
converge to the population mean.
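
A quick simulation can illustrate this. The sketch below rolls a simulated fair die, whose population mean is 3.5, and compares a small sample with a large one. The function name, sample sizes, and seed are arbitrary choices for illustration.

```python
import random

def mean_of_die_rolls(n_rolls, seed=0):
    """Average of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    for _ in range(n_rolls):
        total += rng.randint(1, 6)
    return total / n_rolls

small_sample_mean = mean_of_die_rolls(10)
large_sample_mean = mean_of_die_rolls(100_000)
print(small_sample_mean, large_sample_mean)
```

The large-sample mean should land very close to 3.5, while the small-sample mean can be noticeably off.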

What is a Bernoulli trial?


A Bernoulli trial is an experiment or process that results in a binary outcome: success or
failure.

Define joint probability.


Joint probability is the probability of two events happening at the same time.

What is the difference between discrete and continuous random variables?


Discrete random variables take on countable values; continuous random variables can take
on any value within a range.

Explain Bayes’ theorem.


Bayes’ theorem describes how to update the probability of a hypothesis based on new
evidence.
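
In formula form, P(H | E) = P(E | H) * P(H) / P(E). The sketch below applies it to a diagnostic-test scenario; the prevalence, sensitivity, and false positive rate are hypothetical numbers chosen only to show the arithmetic.

```python
def posterior_probability(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' theorem for a binary hypothesis H and evidence E."""
    # Total probability of the evidence: true positives plus false positives.
    p_evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / p_evidence

# Hypothetical inputs: 1% prevalence, 95% sensitivity, 5% false positive rate.
posterior = posterior_probability(prior=0.01,
                                  likelihood=0.95,
                                  false_positive_rate=0.05)
print(round(posterior, 3))
```

Under these assumed numbers, a positive result raises the probability of the hypothesis from 1% to roughly 16%, which shows how strongly the prior affects the updated probability.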

What is the expected value?


The expected value is the probability-weighted average of all possible values of a random
variable; it is the long-run average outcome when an experiment is repeated many times.
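
For a discrete random variable, the expected value is the sum of each value multiplied by its probability. A minimal sketch for a fair six-sided die (an illustrative choice):

```python
from fractions import Fraction

# Each face of a fair die has probability 1/6.
distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# Expected value: sum of value * probability over all possible values.
expected_value = sum(value * prob for value, prob in distribution.items())

print(expected_value, float(expected_value))
```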

### Regression Analysis

What is linear regression?


Linear regression models the relationship between a dependent variable and one or more
independent variables by fitting a linear equation.

What does R-squared represent?


R-squared indicates the proportion of variance in the dependent variable that can be
explained by the independent variables.
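
The two answers above can be tied together with a sketch of simple linear regression fitted by ordinary least squares, followed by an R-squared calculation. Least squares is assumed here as the fitting method, and the function names and data are illustrative.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """1 - (unexplained variation / total variation)."""
    mean_y = sum(y) / len(y)
    predicted = [slope * a + intercept for a in x]
    # Differences between observed and predicted values, squared and summed.
    ss_res = sum((b - p) ** 2 for b, p in zip(y, predicted))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]  # illustrative data
y = [2, 4, 5, 4, 5]  # illustrative data
slope, intercept = fit_simple_linear_regression(x, y)
r2 = r_squared(x, y, slope, intercept)
print(slope, intercept, r2)
```

For this data the fitted line explains about 60% of the variance in y.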

What is multicollinearity?
Multicollinearity occurs when two or more independent variables in a regression model are
highly correlated.

How do you interpret coefficients in a regression model?


Coefficients indicate the change in the dependent variable for a one-unit change in the
independent variable, holding other variables constant.

What is logistic regression used for?
Logistic regression is used to model the probability of a binary outcome variable.

What is overfitting in a model?


Overfitting occurs when a model learns the noise in the training data rather than the
underlying pattern.

Explain the concept of residuals.


Residuals are the differences between observed values and predicted values from a
regression model.

What is a regression assumption?


Regression assumptions are the conditions that must be met for regression results to be
valid (e.g., linearity, independence, homoscedasticity).

What is the difference between simple and multiple regression?


Simple regression involves one independent variable, while multiple regression involves two
or more independent variables.

What is the purpose of a scatter plot in regression analysis?


A scatter plot visually shows the relationship between two variables, helping to identify
potential correlations.

### Advanced Topics

What is time series analysis?


Time series analysis involves statistical techniques to analyze time-ordered data points.

Explain what a confounding variable is.


A confounding variable is an outside influence that affects both the independent and
dependent variables, which can create a misleading apparent relationship between them.

What is cross-validation?
Cross-validation is a technique for assessing how the results of a statistical analysis will
generalize to an independent dataset.
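
One common form is k-fold cross-validation: the data is split into k parts, and each part takes a turn as the held-out test set. The sketch below only generates the index splits; the function name is hypothetical, and no shuffling is done, which a real analysis would often add.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs for k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds

folds = k_fold_indices(n_samples=10, k=3)
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A model would be fitted on each training split and scored on the matching test split, and the scores averaged.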

Define non-parametric tests.


Non-parametric tests are statistical tests that do not assume a specific distribution for the
data.

What is the difference between a sample and a population?


A population includes all members of a defined group, while a sample is a subset of the
population.

What is sampling bias?
Sampling bias occurs when the sample is not representative of the population, leading to
incorrect conclusions.

Explain the concept of power in hypothesis testing.


Power is the probability that a test correctly rejects a false null hypothesis.

What is bootstrapping?
Bootstrapping is a resampling technique used to estimate the distribution of a statistic by
repeatedly sampling with replacement.
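
As a sketch, the code below bootstraps the sample mean and reads off a rough 95% percentile interval. The dataset, the number of resamples, the seed, and the percentile-interval approach are all illustrative assumptions.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Means of n_resamples resamples drawn with replacement from data."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_resamples):
        # choices() samples with replacement; each resample has len(data) items.
        resample = rng.choices(data, k=len(data))
        means.append(statistics.mean(resample))
    return means

data = [3, 7, 8, 10, 12]  # illustrative data
means = sorted(bootstrap_means(data))

# Rough 95% percentile interval: cut off 2.5% of resampled means on each side.
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means)) - 1]
print(lower, upper)
```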

What is a survival analysis?


Survival analysis is used to analyze the time until an event occurs, often used in medical
research.

What is a control chart?


A control chart is a statistical tool used to monitor and control a process over time.

### Application and Interpretation

How can you visualize data distribution?


Data distribution can be visualized using histograms, box plots, or density plots.

What is the importance of data cleaning?


Data cleaning ensures accuracy and consistency, leading to valid analysis results.

What is A/B testing?


A/B testing is a randomized experiment that compares two versions (A and B) of something,
such as a web page, to determine which one performs better on a chosen metric.
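
One way to judge whether B really outperforms A is a two-proportion z-test on the conversion rates. That specific test is an assumption of this sketch (other tests are also used in practice), and the visitor and conversion counts are made up.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 100 of 1000 visitors convert on A, 150 of 1000 on B.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(z, p_value)
```

A small p-value suggests the observed difference between versions is unlikely to be due to chance alone.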
