Interview Questions

A set of data science interview questions and answers covering Naive Bayes, SVM, decision trees, random forests, neural networks, KNN, clustering, logistic regression, and commonly used R functions and packages such as kernlab.

Uploaded by

Raja Ram Chaudhary


SN Questions (DATASCIENCE WDM19022020) Rate

1 Data Science Interview Question

What is the assumption of the Naive Bayes Classifier?

Ans:-The fundamental assumption is that the independent variables are conditionally
independent of one another given the class, and that each contributes equally to the outcome.
2 What is SVM?

Ans:-SVM (Support Vector Machine) is a supervised learning algorithm. Here we plot each data
point in n-dimensional space (n being the number of features), with the value of each feature
being the value of a particular coordinate. Then, we perform classification by finding the
hyper-plane that separates the classes with the largest margin.
3 What are the tuning parameters of SVM?

Ans:-Kernel, Regularization, Gamma and Margin are the tuning parameters of SVM
4 Explain Kernel in SVM?

Ans:-Kernels are transformations applied to the input variables that map data which is not
linearly separable into a space where it becomes separable. There are 9 different kernels.
Examples are Linear, RBF, Polynomial, etc.
5 Is there a need to convert categorical variables into numeric in SVM? If yes, explain.
Ans:-Yes. All the categorical variables have to be converted to numeric by creating dummy
variables, because all the data points have to be plotted in n-dimensional space. In addition,
the tuning parameters Kernel, Regularization, Gamma & Margin involve mathematical
computations that require numeric variables. This is a requirement of SVM.
6 What is Regularization in SVM?
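Ans:-The value of the Regularization parameter tells the model how strongly it should avoid
misclassifying each training observation. A large value penalises misclassification heavily
(narrower margin); a small value tolerates more misclassification (wider margin).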
7 What is the Gamma parameter in SVM?
Ans:-Gamma is the kernel coefficient in the RBF, Polynomial, & Sigmoid kernels. Higher
values of Gamma make the model more complex and tend to overfit it.
8 Which package is used for SVM in R?
Ans:-kernlab is a package used for implementing SVM in R
9 What is the function name to implement SVM in R?
Ans:- ksvm is the function in the kernlab package that implements SVM
10 What is a decision tree?
Ans:-Decision Tree is a supervised machine learning algorithm used for classification and
regression analysis. It is a tree-like structure in which an internal node represents a test on
an attribute, each branch represents an outcome of the test and each leaf node represents
a class label.
11 What are the rules in a decision tree?
Ans: A path from the root node to a leaf node represents classification rules
12 Explain the different types of nodes in the decision tree and how they are selected.
Ans:-We have Root Node, Internal Node, Leaf Node in a decision tree. Decision Tree starts
at the Root Node, this is the first node of the decision tree. Data set is split based on Root
Node, again nodes are selected to further split the already split data. This process of
splitting the data goes on till we get leaf nodes, which are nothing but the classification
labels. The process of selecting Root Nodes and Internal Nodes is done using the statistical
measure called Gain
13 What do you mean by impurity in Decision Tree?
Ans:-We say a data set is pure or homogeneous if all of its class labels are the same and
impure or heterogeneous if the class labels are different. Entropy or Gini Index or
Classification Error can be used to measure the impurity of the data set.
14 What is Pruning in Decision Tree?
Ans:-The process of removing sub-nodes which contribute little predictive power to the
decision tree model is called Pruning.
15 What is the advantage of Pruning?
Ans:-Pruning reduces the complexity of the model which in turn reduces the overfitting
problem of Decision Tree. There are two strategies in Pruning. Postpruning - discard
unreliable parts from the fully grown tree, Prepruning - stop growing a branch when the
information becomes unreliable. Postpruning is the preferred one.
16 What is the difference between Entropy and Information Gain?
Ans:-Entropy is a probabilistic measure of uncertainty or impurity whereas Information Gain
is the reduction of this uncertainty measure.
17 Explain the expression of Gain (of any column)?
Ans:-Gain for any column is calculated by subtracting the expected information (entropy) of
the dataset after splitting on that variable from the expected information (entropy) of the
entire dataset i.e., Gain(Age) = Info(D) - Info(D wrt Age)
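The Gain expression can be sketched in a few lines of Python, taking Info(.) to be entropy measured in bits. The function names `info` and `gain` and the toy rows are assumptions made for illustration only; they are not taken from C50 or any other package.

```python
import math
from collections import Counter

def info(labels):
    """Entropy (in bits) of a list of class labels, i.e. Info(D)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain(rows, column, target):
    """Gain(column) = Info(D) - Info(D wrt column)."""
    labels = [row[target] for row in rows]
    # Group the target labels by the value of the splitting column.
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[target])
    # Weighted average entropy of the groups produced by the split.
    info_after = sum(len(g) / len(rows) * info(g) for g in groups.values())
    return info(labels) - info_after

# Hypothetical toy data: "age" separates the two classes perfectly.
rows = [
    {"age": "young", "buys": "no"},
    {"age": "young", "buys": "no"},
    {"age": "old", "buys": "yes"},
    {"age": "old", "buys": "yes"},
]
print(gain(rows, "age", "buys"))
```

The column with the highest gain would be the candidate for the next split.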
18 What is the package required to implement Decision Tree in R?
Ans:-C50 and tree packages can be used to implement a decision tree algorithm in R.
19 What is a Random Forest?
Answer
Random Forest is an Ensemble Classifier. As opposed to building a single decision tree,
random forest builds many decision trees and combines the output of all the decision trees
to give a stable output.
How does Random Forest add randomness and build a better model?
Answer
Instead of searching for the most important feature while splitting a node, it searches for
the best feature among a random subset of features. This results in a wide diversity that
generally results in a better model. Additional randomness can be added by using random
thresholds for each feature rather than searching for the best possible thresholds (like a
normal decision tree does).
What are the pros of using Random Forest?
Answer
Random Forest is far less prone to overfitting than a single decision tree, is typically very
accurate, works very well on large data sets, can handle thousands of input variables without
variable deletion, outputs the importance of input variables, and is fairly robust to outliers
and missing values
What is the limitation of Random Forest?
Answer
The main limitation of Random Forest is that a large number of trees can make the
algorithm too slow and ineffective for real-time predictions. In most real-world applications
the random forest algorithm is fast enough, but there can certainly be situations where
run-time performance is important and other approaches would be preferred.
What is a Neural Network?
Answer
Neural Network is a supervised machine learning algorithm that is inspired by the
human nervous system and loosely mimics the way the human brain learns. It
consists of an Input Layer, Hidden Layers, & an Output Layer.
What are the various types of Neural Networks?
Answer
Artificial Neural Network, Recurrent Neural Networks, Convolutional Neural Networks,
Boltzmann Machine Networks, Hopfield Networks are examples of the Neural Networks.
There are a few other types as well.
What is the use of activation functions in neural network?
Answer
The activation function is used to convert the weighted input signal of a node in an ANN
to an output signal. That output signal is then used as an input to the next layer in the stack.
What are the different types of activation functions in neural network?
Answer
Sigmoid or Logistic, Tanh or Hyperbolic tangent, ReLU or Rectified Linear Unit are
examples of activation functions in neural network
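A rough Python sketch of these three activation functions, and of how a single node might apply one to its weighted input, is given here. The helper `node_output` and the numbers passed to it are hypothetical; this is not how the neuralnet package is implemented.

```python
import math

def sigmoid(x):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: squashes any real number into (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Rectified Linear Unit: passes positives through, zeroes out negatives."""
    return max(0.0, x)

def node_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    signal = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(signal)

# Hypothetical node with two inputs.
print(node_output([1.0, 2.0], [0.5, -0.25], 0.1, sigmoid))
```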
What is the package name to implement a neural network in R?
Answer
neuralnet package can be used to implement a neural network in R
Which among the following prevents overfitting when we perform bagging?
A: The use of sampling with replacement as the sampling technique
B: The use of weak classifiers
C: The use of classification algorithms which are not prone to overfitting
D: The practice of validation performed on every classifier trained
Answer: B
Explanation: The presence of over-training (which leads to overfitting) is not generally a
problem with weak classifiers. For example, in decision trees with only one node (the root
node), there is no real scope for overfitting. This helps the classifier which combines the
outputs of weak classifiers in avoiding overfitting.
Sum of weights of the principal component in PCA analysis is
A) <1
B) 1
C) >1
D) None of the above
Answer: B
Which of the following testing is concerned with making decisions using data?
a)Probability
b)Hypothesis
c)Causal
d)None of the above
Answer: B
Explanation: The null hypothesis is assumed true and statistical evidence is required to
reject it in favor of a research or alternative hypothesis.
Which of the following combination is correct?
A: Continuous – euclidean distance
B: Continuous – correlation similarity
C: Binary – Jaccard’s coefficient
D: All the above
Answer: D
Explanation: You should choose a distance/similarity that makes sense for your problem.
Which of the following is the correct use of cross-validation?
A: Selecting variables to include in a model
B: Comparing predictors
C: Selecting parameters in the prediction function
D: All of the Mentioned
Answer: D
Explanation: Cross-validation is also used to pick type of prediction function to be used.
Why does data cleaning play a vital role in analysis?
Answer
Cleaning data from multiple sources to transform it into a format that data analysts or
data scientists can work with is a cumbersome process because, as the number of data
sources increases, the time taken to clean the data increases exponentially due to the
number of sources and the volume of data generated in these sources. Cleaning alone might
take up to 80% of the time, making it a critical part of the analysis task.
What are Recommender Systems?
Answer
A subclass of information filtering systems that are meant to predict the preferences or
ratings that a user would give to a product. Recommender systems are widely used in
movies, news, research articles, products, social tags, music, etc.
What is logistic regression? Or State an example when you have used logistic regression
recently?
Answer
Logistic Regression, often referred to as the logit model, is a technique to predict a binary
outcome from a linear combination of predictor variables. For example, suppose you want to
predict whether a particular political leader will win the election or not. In this case, the
outcome of prediction is binary i.e. 0 or 1 (Lose/Win). The predictor variables here would be
the amount of money spent on election campaigning of a particular candidate, the amount
of time spent in campaigning, etc.
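As a minimal sketch of the idea, the logit model passes a linear combination of the predictors through the logistic function to obtain the probability of the outcome being 1. The function name and the coefficient values here are invented for illustration; in practice the coefficients are estimated from data.

```python
import math

def predict_probability(features, coefficients, intercept):
    """P(outcome = 1) from a linear combination of predictor variables."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for (money spent, time spent campaigning).
coefficients = [0.8, 0.5]
intercept = -2.0

probability = predict_probability([2.0, 1.0], coefficients, intercept)
predicted_class = 1 if probability >= 0.5 else 0  # 1 = Win, 0 = Lose
print(probability, predicted_class)
```

The 0.5 cut-off is itself a choice; a different threshold trades off the two kinds of misclassification.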

What is the function to compute accuracy of a classifier?

Answer :-
mean() function can be used to compute the accuracy. Within the parentheses the actual labels
have to be compared with the predicted labels, e.g. mean(predicted == actual)
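The same computation, the mean of an element-wise comparison of actual and predicted labels, can be sketched in Python. The function name `accuracy` and the label vectors are illustrative assumptions.

```python
def accuracy(actual, predicted):
    """Share of observations whose predicted label equals the actual label."""
    matches = [a == p for a, p in zip(actual, predicted)]
    return sum(matches) / len(matches)

# Hypothetical labels: three of the four predictions are correct.
actual = ["spam", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "ham"]
print(accuracy(actual, predicted))
```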
Functions to find row & column count in R?

Answer :-
dim() function returns both counts; nrow() & ncol() return the row & column count respectively
What is the function to perform simple random sampling?

Answer :-
sample() is the function in R to employ Simple Random Sampling
What is Joint probability?

Answer :-
It is the probability of two events occurring at the same time. A classical example is the
probability of an email being spam with the word lottery in it. Here the events are the email
being spam and the email having the word lottery
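A small Python sketch of estimating this joint probability from counts; the email records and field names are made up for illustration.

```python
def joint_probability(emails):
    """Share of emails that are spam AND contain the word lottery."""
    both = sum(1 for e in emails if e["spam"] and e["has_lottery"])
    return both / len(emails)

# Hypothetical sample of four emails.
emails = [
    {"spam": True, "has_lottery": True},
    {"spam": True, "has_lottery": False},
    {"spam": False, "has_lottery": True},
    {"spam": False, "has_lottery": False},
]
print(joint_probability(emails))
```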
What is Probability?

Answer :-
Probability is given by Number of favourable outcomes/Total number of possible outcomes, when all outcomes are equally likely
Can we represent the output of a classifier having more than two levels using a confusion
matrix?

Answer :-
Yes. A confusion matrix is not limited to two levels: for an output variable with k levels it is
a k x k table of actual versus predicted labels. In R it can be built with table() or with the
CrossTable() function from the gmodels package
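For reference, a cross-tabulation of actual versus predicted labels for any number of levels can be sketched in Python as follows. `cross_table` is a hypothetical helper written for this note, not the gmodels function, and the labels are invented.

```python
from collections import Counter

def cross_table(actual, predicted):
    """Nested dict: table[actual_level][predicted_level] -> count."""
    counts = Counter(zip(actual, predicted))
    levels = sorted(set(actual) | set(predicted))
    return {a: {p: counts[(a, p)] for p in levels} for a in levels}

# Hypothetical three-level output variable.
actual = ["Bus", "Rail", "Car", "Car", "Bus", "Rail"]
predicted = ["Bus", "Car", "Car", "Car", "Rail", "Rail"]

table = cross_table(actual, predicted)
for level, row in table.items():
    print(level, row)
```

The diagonal cells hold the correctly classified observations; everything off the diagonal is a misclassification.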
Difference between lapply & sapply function?

Answer :-
lapply always returns the output as a list, whereas sapply simplifies the output to a vector,
matrix or array where possible.
What is the R function to know the percentage of observations for the levels of a variable?

Answer :-
prop.table() employed on top of the table() function, i.e., prop.table(table()), is the R function.
It can also be employed on any variable, but it makes sense to employ it on a factor variable.
What is the R function to know the number of observations for the levels of a variable?

Answer :-
table() is the R function. It can also be employed on any variable, but it makes sense to
employ it on a factor variable.
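What these two functions compute, counts per level and the share of observations per level, can be sketched in Python. The helper names and the sample values are assumptions for illustration; they only mimic the idea behind table() and prop.table().

```python
from collections import Counter

def level_counts(values):
    """Number of observations for each level."""
    return dict(Counter(values))

def level_proportions(values):
    """Share of observations for each level."""
    total = len(values)
    return {level: n / total for level, n in Counter(values).items()}

# Hypothetical factor variable with two levels.
tumour = ["benign", "malignant", "benign", "benign"]
print(level_counts(tumour))
print(level_proportions(tumour))
```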
Function in R to employ KNN?
Answer :-
knn() can be used from the class package
How do we choose the value of K in KNN algorithm?

Answer :-
K value can be selected using the rule of thumb sqrt(no. of obs/2), the kselection package,
a scree plot, or k-fold cross-validation
Why is KNN called a Lazy Algorithm?

Answer :-
There is no or minimal training phase, which is why the training phase is pretty fast. The
training data is simply stored and is only used during the testing phase.
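A minimal Python sketch of this behaviour: nothing is fitted in advance, and all the work of measuring distances to the stored training points happens when a prediction is requested. `knn_predict` and the points are hypothetical and are not the knn() function from the class package.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Majority label among the k training points closest to the query."""
    # All of the computation happens here, at prediction time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical training data: two well separated groups.
train_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_labels = ["a", "a", "b", "b"]
print(knn_predict(train_points, train_labels, (0.2, 0.1), k=3))
```

An odd k is the usual choice for two classes because it avoids tied votes.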
Why is KNN called a non-parametric algorithm?

Answer :-
KNN makes no assumptions about the distribution of the underlying data (unlike other
algorithms, e.g. Linear Regression)
What is the use of set.seed() function ?

Answer :-
set.seed() function is used to reproduce the same results when the code is re-run. Any number
can be given within the parentheses
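The same idea exists outside R. As a rough Python analogue (not R code), seeding a random number generator makes a random sample repeatable; `draw_sample` is a hypothetical helper written for this note.

```python
import random

def draw_sample(seed, population, size):
    """Simple random sample without replacement, repeatable for a given seed."""
    rng = random.Random(seed)  # generator seeded explicitly
    return rng.sample(population, size)

population = list(range(100))
first_run = draw_sample(123, population, 5)
second_run = draw_sample(123, population, 5)
print(first_run == second_run)
```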
How to interpret clustering output?

Answer :-
After computing the optimal clusters, an aggregate measure like the mean has to be computed
on all variables within each cluster, and the resulting values then have to be compared across
the clusters
How do we decide upon the number of clusters in hierarchical clustering?

Answer :-
In Hierarchical Clustering the number of clusters is decided only after looking at the
dendrogram.
What are linkages in hierarchical clustering?

Answer :-
Linkage is the criterion based on which the distance between two clusters is computed. Single,
Complete, Average are a few examples of linkages
Single - The distance between two clusters is defined as the shortest distance between a
point in one cluster and a point in the other cluster.
Complete - The distance between two clusters is defined as the longest distance between a
point in one cluster and a point in the other cluster.
Average - The distance between two clusters is defined as the average distance between
each point in one cluster and every point in the other cluster.
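The three linkages can be sketched in Python over the pairwise Euclidean distances between two clusters. The function names and the example points are assumptions made for illustration.

```python
import math

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance from every point in one cluster to every point in the other."""
    return [math.dist(p, q) for p in cluster_a for q in cluster_b]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Hypothetical clusters lying on a line, so the distances are easy to check by hand.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]
print(single_linkage(cluster_a, cluster_b))
print(complete_linkage(cluster_a, cluster_b))
print(average_linkage(cluster_a, cluster_b))
```

Here the four pairwise distances are 3, 5, 2 and 4, so the three linkages give different answers for the same pair of clusters.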
Packages to read excel files in R?

Answer :-
readxl or xlsx packages can be used to read excel files in R
What is the str() command and why is it required to run it?

Answer :-
str() command gives the dimensions of your data frame. In addition to this, it gives the class
of the dataset, the class of every variable & the first few values of each variable
What does the summary() command give?
Answer :-
summary() command gives the distribution (minimum, quartiles, median, mean & maximum)
for numerical variables and the count of observations per level for factor variables
What is the range of a variable when the (x - min(X))/(max(X) - min(X)) normalization
technique is employed?

Answer :-
0 to 1 is the range for this normalization technique
What is the range of a Z transformed variable?

Answer :-
Theoretically it will be between -infinity and +infinity, but normally most values lie between
-3 and +3
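Both transformations can be sketched in Python using only the standard library. The function names and sample values are illustrative; `min_max` assumes the values are not all identical, and `z_scores` uses the sample standard deviation.

```python
import statistics

def min_max(values):
    """(x - min(X)) / (max(X) - min(X)): rescales the values to the range 0 to 1."""
    low, high = min(values), max(values)
    return [(x - low) / (high - low) for x in values]

def z_scores(values):
    """(x - mean(X)) / sd(X): centres on 0 with unit standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(x - mean) / sd for x in values]

# Hypothetical variable with three observations.
income = [2.0, 4.0, 6.0]
print(min_max(income))
print(z_scores(income))
```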
Is normalization of data required before applying clustering?

Answer :-
It is better to employ clustering on normalized data. Clustering relies on distances, so
without normalization the variables measured on larger scales dominate, and the results
with and without normalization will differ
Example of clustering?

Answer :-
Using variables like income, education, profession, age, number of children, etc. you come up
with different clusters, and each cluster has people with similar socio-economic criteria
In which domains can we employ clustering?

Answer :-
None of your data science topics are domain specific. They can be employed in any domain,
provided data is available.
When can you say that resultant clusters are good?

Answer :-
When the clusters are as heterogeneous from one another as possible and when the
observations within each cluster are as homogeneous as possible.
Why is hierarchical clustering called Agglomerative clustering?

Answer :-
It is because of the bottom-up approach, where initially each observation is considered to be
a single cluster and gradually, based on the distance measure, individual clusters are paired
and finally merged into one
Examples of Supervised Machine Learning

Answer :-
KNN, Naive Bayes, SVM, Decision Tree, Random Forest, Neural Network
Examples of Unsupervised Machine Learning

Answer :-
Segmentation, PCA, SVD, Market Basket Analysis, Recommender Systems
What is Classification Modeling?

Answer :-
Classification Models are employed when the output is a category into which the observations
have to be classified, rather than a numeric value to be predicted.
Examples being Cancerous and Non-cancerous tumor (2 categories), Bus, Rail, Car, Carpool
(>2 categories)
What is Unsupervised Machine Learning?
Answer :-
In this category of Machine Learning, there won’t be any output variable to be either
predicted or classified. Instead the algorithm understands the patterns in the data.
Examples: Segmentation, PCA, SVD, Market Basket Analysis, Recommender Systems.
What is Supervised Machine Learning?

Answer :-
Supervised Machine Learning is employed for problem statements in which there is an output
variable of interest that is either classified (nominal output) or predicted (numeric output).
Examples: KNN, Naive Bayes, SVM, Decision Tree, Random Forest, Neural Network
What is Machine Learning?

Answer :-
Machine learning is the science of getting computers to act without being explicitly
programmed. Machine learning has given us self-driving cars, practical speech recognition,
effective web search, and a vastly improved understanding of the human genome. It is so
widespread that we use it many times in our daily life without knowing it.
Differentiate between univariate, bivariate and multivariate analysis.

Answer :-
These are descriptive statistical analysis techniques which can be differentiated based on
the number of variables involved at a given point of time. For example, the pie charts of
sales based on territory involve only one variable and can be referred to as univariate
analysis.
If the analysis attempts to understand the relationship between 2 variables at a time, as in a
scatterplot, then it is referred to as bivariate analysis. For example, analysing the volume of
sales against spending can be considered an example of bivariate analysis.
Analysis that deals with the study of more than two variables to understand the effect of
variables on the responses is referred to as multivariate analysis.
