Interview Questions
SCIENTIST INTERVIEW
1. Statistical Analysis
1. What is hypothesis testing?
○ Answer: Hypothesis testing is a method used to decide whether there is enough
evidence to reject a null hypothesis. It involves setting up two hypotheses
(null and alternative), calculating a test statistic, and comparing the
resulting p-value against a chosen significance level (e.g., α = 0.05) to
determine the result.
3. What is a z-test?
○ Answer: A z-test is used to determine if there is a significant difference
between sample and population means. It is often used when the sample size
is large (n > 30) and the population variance is known.
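As a rough illustration of the z-test described above, here is a minimal sketch of a two-sided one-sample test using only the Python standard library. The function name `one_sample_z_test`, the default significance level of 0.05, and the numbers in the usage line are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_std, n, alpha=0.05):
    """Two-sided one-sample z-test with a known population standard deviation."""
    # Standard error of the sample mean.
    standard_error = pop_std / sqrt(n)
    # Test statistic: distance of the sample mean from pop_mean in standard errors.
    z = (sample_mean - pop_mean) / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Reject the null hypothesis when the p-value falls below alpha.
    return z, p_value, p_value < alpha


# Hypothetical numbers: 64 observations, sample mean 103, null mean 100, sigma 10.
z, p_value, reject = one_sample_z_test(103, 100, 10, 64)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
# z = 2.40, p = 0.0164, reject H0: True
```

The p-value is compared with alpha, which is the decision step of a hypothesis test.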
14. How does the Naive Bayes classifier handle continuous data?
○ Answer: For continuous data, Naive Bayes typically assumes a Gaussian
distribution and uses the probability density function to estimate the likelihood
of the data given a class.
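The Gaussian assumption can be sketched as follows. This is a simplified illustration, not a reference implementation: the helper names (`fit`, `predict`, `gaussian_log_pdf`), the small variance floor, and the toy data are all assumptions. Log densities are used instead of raw densities to avoid numerical underflow.

```python
from math import log, pi
from statistics import mean, pvariance


def gaussian_log_pdf(x, mu, var):
    """Log of the normal density with mean mu and variance var, evaluated at x."""
    return -0.5 * log(2 * pi * var) - (x - mu) ** 2 / (2 * var)


def fit(samples_by_class):
    """Estimate a prior plus a per-feature mean and variance for each class."""
    total = sum(len(rows) for rows in samples_by_class.values())
    model = {}
    for label, rows in samples_by_class.items():
        columns = list(zip(*rows))
        # The small constant keeps every variance strictly positive (assumption).
        stats = [(mean(col), pvariance(col) + 1e-9) for col in columns]
        model[label] = (len(rows) / total, stats)
    return model


def predict(model, x):
    """Return the class with the highest log prior plus summed log likelihoods."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = log(prior)
        for value, (mu, var) in zip(x, stats):
            score += gaussian_log_pdf(value, mu, var)
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up data: two continuous features for two hypothetical classes.
data = {
    "a": [(150.0, 50.0), (155.0, 54.0), (160.0, 58.0)],
    "b": [(175.0, 75.0), (180.0, 82.0), (185.0, 88.0)],
}
model = fit(data)
print(predict(model, (158.0, 55.0)))  # prints "a"
```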
15. What is the law of total probability?
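○ Answer: The law of total probability states that the total probability of an
outcome can be found by considering all possible ways in which the outcome
can occur. If the events B_1, ..., B_n are mutually exclusive and exhaustive,
it is expressed as:
P(A) = \sum_{i=1}^{n} P(A \mid B_i) P(B_i)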
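A small numeric illustration of the law of total probability may help. The scenario (three machines producing parts, each with its own defect rate) and all the probabilities are invented for this example. The machines form the partition, so each part comes from exactly one of them.

```python
# Hypothetical partition: P(machine), which must sum to 1.
p_machine = {"A": 0.5, "B": 0.3, "C": 0.2}
# Hypothetical conditional probabilities: P(defect | machine).
p_defect_given_machine = {"A": 0.01, "B": 0.02, "C": 0.05}

# Sum P(defect | machine) * P(machine) over every machine in the partition.
p_defect = sum(p_defect_given_machine[m] * p_machine[m] for m in p_machine)
print(f"P(defect) = {p_defect:.3f}")  # P(defect) = 0.021
```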
16. Explain the difference between discrete and continuous random variables.
○ Answer: Discrete random variables take on a finite or countably infinite
number of values (e.g., number of heads in coin tosses), while continuous
random variables can take on any value within a range, so their possible
values are uncountably many (e.g., heights of people).
17. What is the difference between joint probability and marginal probability?
○ Answer: Joint probability is the probability of two events occurring together,
while marginal probability is the probability of a single event occurring,
irrespective of other events.
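To make the distinction concrete, the sketch below stores a joint distribution over two variables and derives each marginal by summing over the other variable. The variables (weather and traffic), the probabilities, and the helper name `marginal` are illustrative assumptions.

```python
# Hypothetical joint distribution P(weather, traffic); the entries sum to 1.
joint = {
    ("rain", "heavy"): 0.20,
    ("rain", "light"): 0.10,
    ("sun", "heavy"): 0.15,
    ("sun", "light"): 0.55,
}


def marginal(joint, index):
    """Sum the joint distribution over every variable except the one at index."""
    totals = {}
    for outcome, prob in joint.items():
        key = outcome[index]
        totals[key] = totals.get(key, 0.0) + prob
    return totals


p_weather = marginal(joint, 0)  # marginal P(weather)
p_traffic = marginal(joint, 1)  # marginal P(traffic)

print("joint P(rain, heavy):", joint[("rain", "heavy")])
print("marginal P(rain):", round(p_weather["rain"], 2))
print("marginal P(heavy):", round(p_traffic["heavy"], 2))
```

Here the joint entry answers "rain and heavy traffic together", while each marginal ignores the other variable entirely.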
3. Supervised Learning
21. What is supervised learning?
○ Answer: Supervised learning is a type of machine learning where the model
is trained on labeled data and learns to map input features to output
labels.
38. What is the difference between batch and stochastic gradient descent?
○ Answer: Batch gradient descent updates the model parameters using the
entire training dataset, while stochastic gradient descent updates the
parameters using one sample at a time, providing faster but noisier updates.
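The difference in update schedule can be sketched on a one-parameter model y = w * x with squared-error loss. The function names, learning rate, epoch count, and toy data below are assumptions chosen for illustration, not tuned values.

```python
import random


def gradient(w, x, y):
    """Gradient of the squared error (w * x - y) ** 2 with respect to w."""
    return 2 * (w * x - y) * x


def batch_gd(data, lr=0.01, epochs=200):
    w = 0.0
    for _ in range(epochs):
        # One update per pass, using the gradient averaged over all samples.
        g = sum(gradient(w, x, y) for x, y in data) / len(data)
        w -= lr * g
    return w


def stochastic_gd(data, lr=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    samples = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(samples)
        # One update per sample, so each pass makes len(data) separate steps.
        for x, y in samples:
            w -= lr * gradient(w, x, y)
    return w


# Toy data generated from y = 3x, noise-free to keep the example simple.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
print(batch_gd(data), stochastic_gd(data))
```

Both runs should approach w = 3 on this data. With noisy real-world data, the per-sample steps of the stochastic version would fluctuate more from update to update than the averaged batch step.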
71. What is the difference between rule-based and machine learning-based NLP
methods?
○ Answer: Rule-based methods use handcrafted linguistic rules to process text,
while machine learning-based methods use statistical models to learn
patterns from labeled data.
75. What is the difference between generative and discriminative models in NLP?
○ Answer: Generative models learn the joint probability distribution of input and
output and can generate new data, while discriminative models learn the
conditional probability distribution of the output given the input and are used
for classification tasks.