Data Index
Table of Contents
STATISTICS
Q1. WHAT IS THE CENTRAL LIMIT THEOREM AND WHY IS IT IMPORTANT?
Q2. WHAT IS SAMPLING? HOW MANY SAMPLING METHODS DO YOU KNOW?
Q3. WHAT IS THE DIFFERENCE BETWEEN TYPE I VS TYPE II ERROR?
Q4. WHAT IS LINEAR REGRESSION? WHAT DO THE TERMS P-VALUE, COEFFICIENT, AND R-SQUARED VALUE MEAN? WHAT IS THE SIGNIFICANCE OF EACH OF THESE COMPONENTS?
Q5. WHAT ARE THE ASSUMPTIONS REQUIRED FOR LINEAR REGRESSION?
Q6. WHAT IS A STATISTICAL INTERACTION?
Q7. WHAT IS SELECTION BIAS?
Q8. WHAT IS AN EXAMPLE OF A DATA SET WITH A NON-GAUSSIAN DISTRIBUTION?
DATA SCIENCE
Q1. WHAT IS DATA SCIENCE? LIST THE DIFFERENCES BETWEEN SUPERVISED AND UNSUPERVISED LEARNING.
Q2. WHAT IS SELECTION BIAS?
Q3. WHAT IS BIAS-VARIANCE TRADE-OFF?
Q4. WHAT IS A CONFUSION MATRIX?
Q5. WHAT IS THE DIFFERENCE BETWEEN “LONG” AND “WIDE” FORMAT DATA?
Q6. WHAT DO YOU UNDERSTAND BY THE TERM NORMAL DISTRIBUTION?
Q7. WHAT IS CORRELATION AND COVARIANCE IN STATISTICS?
Q8. WHAT IS THE DIFFERENCE BETWEEN POINT ESTIMATES AND CONFIDENCE INTERVAL?
Q9. WHAT IS THE GOAL OF A/B TESTING?
Q10. WHAT IS P-VALUE?
Q11. IN ANY 15-MINUTE INTERVAL, THERE IS A 20% PROBABILITY THAT YOU WILL SEE AT LEAST ONE SHOOTING STAR. WHAT IS THE PROBABILITY THAT YOU SEE AT LEAST ONE SHOOTING STAR IN THE PERIOD OF AN HOUR?
Q12. HOW CAN YOU GENERATE A RANDOM NUMBER BETWEEN 1 – 7 WITH ONLY A DIE?
Q13. A CERTAIN COUPLE TELLS YOU THAT THEY HAVE TWO CHILDREN, AT LEAST ONE OF WHICH IS A GIRL. WHAT IS THE PROBABILITY THAT THEY HAVE TWO GIRLS?
Q14. A JAR HAS 1000 COINS, OF WHICH 999 ARE FAIR AND 1 IS DOUBLE HEADED. PICK A COIN AT RANDOM AND TOSS IT 10 TIMES. GIVEN THAT YOU SEE 10 HEADS, WHAT IS THE PROBABILITY THAT THE NEXT TOSS OF THAT COIN IS ALSO A HEAD?
Q15. WHAT DO YOU UNDERSTAND BY STATISTICAL POWER OF SENSITIVITY AND HOW DO YOU CALCULATE IT?
Q16. WHY IS RE-SAMPLING DONE?
Q17. WHAT ARE THE DIFFERENCES BETWEEN OVER-FITTING AND UNDER-FITTING?
Q18. HOW TO COMBAT OVERFITTING AND UNDERFITTING?
Q19. WHAT IS REGULARIZATION? WHY IS IT USEFUL?
Q20. WHAT IS THE LAW OF LARGE NUMBERS?
Q21. WHAT ARE CONFOUNDING VARIABLES?
Q22. WHAT ARE THE TYPES OF BIASES THAT CAN OCCUR DURING SAMPLING?
Q23. WHAT IS SURVIVORSHIP BIAS?
Q24. WHAT IS SELECTION BIAS? WHAT IS UNDER COVERAGE BIAS?
Q25. EXPLAIN HOW A ROC CURVE WORKS?
Q26. WHAT IS TF/IDF VECTORIZATION?
Q27. WHY WE GENERALLY USE SOFT-MAX (OR SIGMOID) NON-LINEARITY FUNCTION AS LAST OPERATION IN-NETWORK? WHY RELU IN AN INNER LAYER?
DATA ANALYSIS
Q1. PYTHON OR R – WHICH ONE WOULD YOU PREFER FOR TEXT ANALYTICS?
Q2. HOW DOES DATA CLEANING PLAY A VITAL ROLE IN THE ANALYSIS?
Q3. DIFFERENTIATE BETWEEN UNIVARIATE, BIVARIATE AND MULTIVARIATE ANALYSIS.
Q4. EXPLAIN STAR SCHEMA.
Q5. WHAT IS CLUSTER SAMPLING?
Steve Nouri