Google Data Science Interview Questions
DATA SCIENCE INTERVIEW QUESTIONS
WHAT ARE THE ASSUMPTIONS OF ERROR IN LINEAR REGRESSION?
Independence of Errors - The error terms should be
independent of each other. In particular, there should be
no correlation between consecutive errors (no
autocorrelation). In time series data, this assumption is
often checked with the Durbin-Watson test.
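
The check described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the function name durbin_watson and the simulated residuals are hypothetical and only serve the example. The statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, while values well below 2 suggest positive autocorrelation.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

rng = np.random.default_rng(0)

# Simulated residuals with no autocorrelation (illustrative data only).
independent = rng.normal(size=500)

# Simulated AR(1) residuals: each error depends on the previous one.
correlated = np.zeros(500)
for t in range(1, 500):
    correlated[t] = 0.8 * correlated[t - 1] + rng.normal()

dw_independent = durbin_watson(independent)  # expected to be close to 2
dw_correlated = durbin_watson(correlated)    # expected to be well below 2
print(dw_independent, dw_correlated)
```

In practice the residuals would come from a fitted regression model rather than a simulation.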
@karunt
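WHAT IS THE FUNCTION OF P-VALUES IN HIGH DIMENSIONAL LINEAR REGRESSION?
P-values are used to test the null hypothesis that a
specific regression coefficient (for a predictor) is
zero. A low p-value suggests that the predictor is
statistically significant, meaning the data give evidence
that it is associated with the response variable once the
other predictors are accounted for. With many predictors,
some coefficients will look significant by chance alone,
so the p-values are usually adjusted for multiple testing.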
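
As a rough illustration of the test described above, the sketch below fits ordinary least squares with NumPy and computes a two-sided p-value for each coefficient. It makes two loud assumptions: the helper name ols_p_values and the simulated data are hypothetical, and the p-values use a large-sample normal approximation instead of the exact t-distribution, which is only reasonable when there are many more rows than predictors.

```python
import math
import numpy as np

def ols_p_values(X, y):
    """Fit OLS with an intercept; return coefficients and approximate p-values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])       # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = n - design.shape[1]                       # residual degrees of freedom
    sigma2 = resid @ resid / dof                    # estimated error variance
    cov = sigma2 * np.linalg.inv(design.T @ design) # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    t_stats = beta / se
    # Assumption: normal approximation to the t-distribution (large n).
    p = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    return beta, p

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))
# Only the first predictor affects y; the second has a true coefficient of zero.
y = 1.0 + 3.0 * X[:, 0] + rng.normal(size=n)

beta, p = ols_p_values(X, y)
print(beta, p)
```

The first predictor should receive a p-value that is effectively zero. The second predictor's p-value can fall anywhere between 0 and 1, which is why a single small p-value among many tested predictors is weak evidence on its own.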
LET’S SAY YOU HAVE A CATEGORICAL VARIABLE WITH THOUSANDS OF
DISTINCT VALUES, HOW WOULD YOU ENCODE IT?
Leave-One-Out Encoding - A variation of target
encoding, leave-one-out encoding computes the
target mean for each category but excludes the
current observation to avoid target leakage.
Pros: Reduces target leakage and works well with
high-cardinality features.
Cons: Computationally more expensive than simple
target encoding.
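
The encoding described above can be sketched with the standard library alone. The function name leave_one_out_encode and the toy data are hypothetical. One choice here is an assumption that the description does not specify: a category that appears only once has no other rows to average, so this sketch falls back to the global target mean.

```python
from collections import defaultdict

def leave_one_out_encode(categories, targets):
    """Encode each row as the mean target of the other rows in its category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, target in zip(categories, targets):
        sums[cat] += target
        counts[cat] += 1

    global_mean = sum(targets) / len(targets)

    encoded = []
    for cat, target in zip(categories, targets):
        if counts[cat] > 1:
            # Remove the current row's target before averaging.
            encoded.append((sums[cat] - target) / (counts[cat] - 1))
        else:
            # Assumption: singleton categories fall back to the global mean.
            encoded.append(global_mean)
    return encoded

encoded = leave_one_out_encode(["a", "a", "b", "a"], [1, 0, 1, 1])
print(encoded)  # [0.5, 1.0, 0.75, 0.5]
```

At prediction time there is no target to exclude, so new rows would typically be encoded with the plain per-category mean learned from the training data.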
DESCRIBE TO ME HOW PCA WORKS
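PCA is a dimensionality reduction technique. It projects
the data onto the orthogonal directions of greatest
variance (the principal components) and keeps only the
first few. It is useful when you have correlated features
or noisy data, or when you want to visualize data in
fewer dimensions.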
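
One common way to carry out PCA is through the eigendecomposition of the covariance matrix, sketched below with NumPy. The function name pca and the simulated two-feature dataset are hypothetical, and libraries often use a singular value decomposition instead. The steps are: center the data, compute the covariance matrix, take its eigenvectors, sort them by eigenvalue, and project onto the top ones.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via the covariance matrix."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)                  # step 1: center each feature
    cov = np.cov(centered, rowvar=False)           # step 2: feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # step 3: eigendecomposition
    order = np.argsort(eigvals)[::-1]              # step 4: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    explained = eigvals[:n_components] / eigvals.sum()
    return centered @ components, components, explained  # step 5: project

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# Two strongly correlated features: the second is roughly twice the first.
X = np.column_stack([x, 2 * x + 0.05 * rng.normal(size=200)])

projected, components, explained = pca(X, n_components=1)
print(projected.shape, explained)
```

Because the two features are almost perfectly correlated, a single component should capture nearly all of the variance, which is the situation where reducing dimensions loses little information.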