Ace The Data Science Interview
I. Four Resume Principles to Live By for Data Scientists
v. In an interview, if you are asked to calculate a probability and additional information is given,
you should always consider using Bayes’ rule (a common cue is a phrase like “given that”).
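A minimal sketch of a “given that” question solved with Bayes’ rule. The numbers (1% prior, 95% true positive rate, 5% false positive rate) and the function name `posterior` are made up for illustration, not taken from the source.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) for a binary hypothesis H and observed evidence E."""
    # P(E) = P(E | H) P(H) + P(E | not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    # Bayes' rule: P(H | E) = P(E | H) P(H) / P(E)
    return likelihood * prior / evidence

# Hypothetical inputs: P(H) = 0.01, P(E | H) = 0.95, P(E | not H) = 0.05.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 4))
```

Even with a fairly accurate test, the posterior stays well below the likelihood because the prior is small.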
V. Probability
2. Law of total probability
i. Assume B is split into several disjoint events B_1, ..., B_n that together cover all outcomes; then the probability of
event A happening is:
$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)$$
ii. P(A) can be decomposed into a weighted sum of conditional probabilities based on each possible
scenario having occurred.
iii. When asked to assess a probability involving a “tree of outcomes” upon which the probability
depends, be sure to remember this law.
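A short sketch of the law applied to a one-level “tree of outcomes”: P(A) is the sum over scenarios of P(A | B_i) P(B_i). The scenario names and probabilities below are hypothetical.

```python
# Hypothetical scenarios B_i mapped to (P(B_i), P(A | B_i)).
scenarios = {
    "mobile": (0.5, 0.02),
    "desktop": (0.3, 0.05),
    "tablet": (0.2, 0.03),
}

# The scenarios must be disjoint and exhaustive, so their probabilities sum to 1.
assert abs(sum(p_b for p_b, _ in scenarios.values()) - 1.0) < 1e-12

# Weighted sum of the conditional probabilities.
p_a = sum(p_b * p_a_given_b for p_b, p_a_given_b in scenarios.values())
print(round(p_a, 3))
```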
3. Counting
i. “How many ways can five people sit around a lunch table?”
ii. “What is the likelihood that I draw four cards of the same suit?”
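iii. Remember whether the order of selection matters.
iv. If the order of selection of the n items being counted k at a time matters, then use permutations:
$$P(n, k) = \frac{n!}{(n-k)!}$$
v. If the order of selection doesn’t matter, then use combinations:
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
vi. Real-life applications of both: making up passwords (order matters) or choosing restaurants
nearby on a map (order doesn’t matter).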
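The two counting rules and the two sample questions can be checked with Python's standard `math` module (`math.perm` and `math.comb` need Python 3.8+). This sketch assumes the lunch table is round, so rotations count as the same seating, and that the cards come from a standard 52-card deck.

```python
import math

n, k = 5, 3
ordered = math.perm(n, k)    # order matters: n! / (n - k)! = 60
unordered = math.comb(n, k)  # order ignored: n! / (k! (n - k)!) = 10

# Five people around a round table: fix one person's seat, arrange the other four.
round_table = math.factorial(5 - 1)  # 24

# Four cards of the same suit: pick the suit, then 4 of its 13 cards,
# out of all possible 4-card hands.
same_suit = 4 * math.comb(13, 4) / math.comb(52, 4)

print(ordered, unordered, round_table, round(same_suit, 4))
```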
4. Random Variables (RV)
i. A random variable is a quantity with an associated probability distribution. It can be discrete or continuous.
ii. If discrete, it has a PMF and takes particular values with particular probabilities. If continuous, it has a PDF,
and the probability of any single exact value is zero. In both cases the total probability must equal one:
Discrete: $\sum_{x} p_X(x) = 1$; Continuous: $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
iii. The CDF is often used instead of the PMF/PDF. It’s non-negative and monotonically non-decreasing, and it’s defined
as:
$$F_X(x) = P(X \le x)$$
iv. Whenever asked to evaluate an RV, identify both its PDF (or PMF, if discrete) and its CDF.
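As a small illustration of a PMF and its CDF, here is a sketch for a fair six-sided die (an assumed example, not one from the source). Exact fractions are used so the sum-to-one check has no rounding error.

```python
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6]

# PMF of a fair die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in values}
assert sum(pmf.values()) == 1  # a valid PMF sums to one

# CDF: F(x) = P(X <= x), the running total of the PMF.
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

print(cdf[3], cdf[6])
```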
ii. Examples: coin flips, user signups, any situation involving counting successful events of
binary outcomes
7. Poisson Distribution (Discrete)
i. Gives the probability of a given number of events occurring within a particular fixed interval, where
events occur at a known, constant rate λ. Its PMF is:
$$P(X = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}, \quad k = 0, 1, 2, \ldots$$
ii. Examples: number of visits to a website in a certain period of time, number of defects in a
square foot of fabric.
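A minimal sketch of the Poisson PMF, P(X = k) = e^(-λ) λ^k / k!, using only the standard library. The rate of 3 events per interval is a hypothetical value.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events in an interval with average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0  # hypothetical: 3 visits per minute on average
p_two = poisson_pmf(2, lam)

# The PMF should sum to (approximately) one; terms beyond k = 100 are negligible here.
total = sum(poisson_pmf(k, lam) for k in range(100))

print(round(p_two, 4), round(total, 6))
```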
8. Uniform Distribution (Continuous)
i. Assumes a constant probability density for X over the interval from a to b. Its PDF
is:
$$f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
ii. Examples: sampling (e.g., random number generation), hypothesis testing cases.
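A sketch of the uniform density 1 / (b - a) on [a, b] and the matching CDF, with hypothetical endpoints a = 2 and b = 10. The helper names are illustrative.

```python
def uniform_pdf(x, a, b):
    """Density of Uniform(a, b): constant inside [a, b], zero outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x) for X ~ Uniform(a, b)."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

a, b = 2.0, 10.0
# P(4 <= X <= 6) is the width of the sub-interval divided by (b - a).
p_between = uniform_cdf(6.0, a, b) - uniform_cdf(4.0, a, b)  # 0.25

print(uniform_pdf(5.0, a, b), p_between)
```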
9. Exponential Distribution (Continuous)
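i. Gives the distribution of the interval length between consecutive events of a Poisson process having a set
rate parameter of λ. Since it is continuous, it has a PDF rather than a PMF:
$$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$$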
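A minimal sketch of the exponential density λ e^(-λx) and its CDF 1 - e^(-λx), assuming a hypothetical rate of 2 events per minute. The helper names are illustrative.

```python
import math

def exponential_pdf(x, lam):
    """Density of the waiting time between events occurring at rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exponential_cdf(x, lam):
    """P(waiting time <= x)."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

lam = 2.0  # hypothetical: 2 events per minute on average
# Probability that the next event arrives within one minute.
p_within_one = exponential_cdf(1.0, lam)

print(round(p_within_one, 4))
```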
ii. Examples: many applications involve the normal distribution, largely due to (a) its natural fit
to many real-life occurrences, and (b) the Central Limit Theorem (CLT).
11. Markov Chains
i. A Markov chain is a process in which there is a finite set of states, and the probability of moving to a particular
state depends only on the current state. In other words, given the current state, the past
and future states it will occupy are conditionally independent.
ii. The probability of transitioning from state i to state j at any given time is given by a transition
matrix, denoted by P:
$$P_{ij} = P(X_{n+1} = j \mid X_n = i), \qquad \sum_{j} P_{ij} = 1 \text{ for every } i$$
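A small sketch of how a transition matrix is used: multiplying the current state distribution by P gives the distribution one step later. The two-state matrix and the `step` helper are hypothetical.

```python
def step(dist, P):
    """Advance a state distribution by one step: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical two-state chain; row i holds the probabilities of leaving state i.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]
# Each row of a transition matrix must sum to one.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)

dist = [1.0, 0.0]  # start in state 0 with certainty
for _ in range(2):
    dist = step(dist, P)

print([round(p, 2) for p in dist])
```

Repeating the step many times approximates the chain's long-run distribution when one exists.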
VI. Statistics
1. A