Course – 2
Section – Logistic Regression
Q.1
Ans - B
Q.2
Ans – 0.5
Q.3
Ans - ?
Q.4
Ans – C
Q.5
Ans – B
Q.6
Ans – D
Q.7
Ans - C
Q.8
Ans - ?
Q.9
Ans – D
Q.1
Ans - C
Q.2
Ans - ?
Q.3
Ans - ?
Q.6
Ans – B
Q.7
Ans – A
Q.8
Ans – A
Q.9
Ans - C
Section – Clustering
Q.1
Ans - A
Q.2
Ans - ?
Q.3
Ans – D
Q.4
Ans – A
Q.5
Ans – B
Q.6
Ans – D
Q.7
Ans - D
Q.8
Ans - B
Q.9
Ans – B
Section – Business Problem Solving
Q.1
Ans – B
Q.2
Ans – C
Q.3
Ans – B
Section – Multiple Choice Questions
Q.1
Ans – C,D
Q.2
Ans – B,D
Q.3
Ans – C,D
Q.4
Ans – A,D
Q.5
Ans – A,B
Section – Python Coding
Ans –
def kth(s, k):
    # Work with distinct values only
    s = list(set(s))
    if len(s) < k:
        return -1
    # Sort in descending order; the k-th largest sits at index k-1
    return sorted(s, reverse=True)[k - 1]

# input1 (the list) and input2 (k) are supplied by the grading platform
print(kth(input1, input2))
Section – Coding - SQL
Q.1
Ans -
select count(order_id) as OC, Order_city
from orders
where order_state = 'Gujarat' and order_status = 'Pending'
group by Order_city
order by OC, order_city;
Course – 3
Section – Data Modelling
Q.1
Ans - B
Q.2
Ans – B
Q.3
Ans - C
Q.4
Ans – A
Section – Advanced SQL
Q.1
Ans – B
Q.2
Ans – A
Q.3
Ans – A
Section – Advanced Excel
Q.1
Ans – B
Q.2
Ans – C
Q.3
Ans - B
Q.4
Ans – D
Section – NoSQL and MongoDB
Q.1
Ans - C
Q.2
Ans – A
Q.3
Ans – A
Section – Cloud and Hive
Q.1
Ans – A

select * from employee cluster by id;

1      55000   HR
2      60000   HR
10001  50000   BigData
10002  58000   BigData
10003  70000   BigData
3      25000   HR

(CLUSTER BY is DISTRIBUTE BY + SORT BY on the same column: rows are hashed to reducers by id and sorted within each reducer, so the output is sorted within each reducer's block but not globally.)

Q.2
Ans - A
Q.3
Ans - B
Q.4
Ans - D
Q.5
Ans - B
Q.6
Ans - A
Section – Multiple Choice Questions
Q.1
Ans – ?
Q.2
Ans – A,C,D
Q.3
Ans – A,B,C,D
Q.4
Ans – B,C
Q.5
Ans – B
Section - Coding - SQL
Q.1
select o.*
from orders o
inner join customer_info c on o.customer_id = c.id
where c.state = 'AZ' and c.street like '%Silver%'
order by o.order_id;
Q.2
Select round(sum(product_price),0) as TOTAL
from product_info
where product_name like '%Adidas%';
Q.3
select count(order_id) as oc, Type
from orders
where Order_State = 'Maharashtra'
group by Type
order by oc;
Answer:- All of the above
Answer:- C (Both statements are correct)
Answer:- C
Answer:- C (Mean of residuals of the old model = mean of residuals of the new model)
Write a Python function to check whether a number is perfect or not
A perfect number is a positive integer that is equal to the sum of its proper positive divisors, that is, the sum of its positive divisors excluding
the number itself.
Example: The first perfect number is 6, because 1, 2, and 3 are its proper positive divisors, and 1 + 2 + 3 = 6. The next perfect number is 28 =
1 + 2 + 4 + 7 + 14, and so on.
Output 1 if the number is a perfect number else output 0.
Input 1: 6
Output 1: 1
Input 2: 10
Output 2: 0
Answer:-
n = int(input())

# Sum all divisors of n in pairs (i, n // i) up to sqrt(n);
# n is perfect exactly when its full divisor sum equals 2 * n.
ans = 0
for i in range(1, int(n ** 0.5) + 1):
    if n % i == 0:
        ans += i + n // i if i != n // i else i
print(int(n * 2 == ans))
Answer : B Complete Linkage and Single Linkage
Answer B
Answer : 4
Answer: - Problem Statement 3 : Accuracy
Answer : B
Answer:- B
Answer : C -> 0.5
Answer:- 0.45
Answer:- C
Answer:- A (the VIF has a lower bound of 0)
Ans:- B, C, D
Ans:- A, B, D
Answer:- A, D
Answer: B, C
Ans: B (Write)
Ans:- A
Answer:- B
Ans:- C - 5.8
Ans:- C
Ans : B
Answer D
Answer:- C
Answer:- B
Answer: B
Answer:- 5
Answer:- 25
Answer: - 3
Answer: B
Ans :- B
Answer:- D
Ans:- A,C,D
Ans:- B, D
Answer: B (1, 2, 4)
Answer:- D
Answer:- B
Answer:- C
Given a single positive integer 'n' greater than 2, create a NumPy array of size (n x n) with all zeros and ones such that the ones make a shape like 'Z'.
Answer:-
inp = [2, 3, 1, 5, 6, 2, 1]   # inp = list(map(int, input().split()))
inp1 = 4                      # inp1 = int(input())

def kthlargest(inp, inp1):
    # Distinct values, sorted ascending
    x = sorted(set(inp))
    if inp1 > len(x):
        return -1
    # The k-th largest is the k-th element from the end
    return x[-inp1]

print(kthlargest(inp, inp1))

Output: 2
Ans- 11
Ans: The odds are 1, so the probability p is 0.5.
Ans: ?
The model is good to predict when a False Positive is costly but not when a False Negative is costly.
Ans: 0.55
Ans: 0.5
Ans: Sensitivity
Ans: Both statements are true
Ans: b
the dependent variable. Actually it does more than this - to secure monotonic relationship it
Ans: Classification is the task of predicting a discrete class label. Regression is the task of predicting a continuous quantity.
Ans: c - both have same sum of residuals
Ans: Error terms have constant variance (homoscedasticity)
It brings the data into a standard normal distribution, with mean = 0 and standard deviation = 1.
Ans: ?
Ans:
F-statistics and F-tests are used to test the overall significance of a regression model, to compare the fits of different models, to test specific regression terms, and to test the equality of means.
a(i) is the avg distance between the data point and all others within the cluster (as small as possible)
b(i) is the avg distance between the point and the nearest neighbouring cluster (as large as possible)
Ans: It ranges between 0 and 1; if it is close to 1, the data can form meaningful clusters, otherwise not.
It checks whether the data is clusterable or uniformly distributed (uniform data will not create meaningful clusters).
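A minimal scikit-learn sketch of the silhouette coefficient described in the two answers above; the blob dataset and the KMeans settings are illustrative assumptions:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Illustrative data: three well-separated blobs
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Mean silhouette over all points: (b - a) / max(a, b) per point,
# where a = mean intra-cluster distance and b = mean distance to the
# nearest neighbouring cluster
print(silhouette_score(X, labels))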
Ans: All of the above
Ans: Not suitable for clustering
Ans: Option 4
Ans: Both statements are correct
Ans: 3>2>1>5>4>7>6>8
Ans: Logistic Regression
Ans: 3 & 4 - Actually the Dendrogram Inspection method and the Elbow Method are used to find the optimal number of clusters.
Ans: 1 & 2
Ans 1 & 3
Ans:2 & 4
Ans: 3 & 4
Ans: 1
Ans: char
Ans: 5.9
Ans: 41.59
Ans: Both statements are true
Ans: Option 3 (unwind outputs one document per element of the array field being unwound)
Ans: 2,3 and 4
Ans: Option 1
Ans: Option 2
Ans: Option 1
l=[1,4,5,2,9,8,7]
print(get_largest(l,2))
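get_largest is not defined in this snippet; a minimal sketch, assuming it returns the k-th largest distinct element as in the kthlargest solution above (so the call above would print 8):

def get_largest(nums, k):
    # Distinct values in ascending order
    x = sorted(set(nums))
    if k > len(x):
        return -1
    return x[-k]  # k-th largest, counted from the end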
n = int(input())
import numpy as np

z = np.zeros((n, n), dtype=int)
z[0] = 1        # top bar of the Z
z[n-1] = 1      # bottom bar of the Z
for i in range(1, n):
    z[i][n-i-1] = 1   # diagonal stroke
x = np.hstack(z)
print(list(x))
n = int(input())
import numpy as np

z = np.zeros((n, n), dtype=int)
z[0] = 1
z[n-1] = 1
# Run a loop from the second row (index 1) till the second last row.
# Fill the 1s in appropriate indices. Notice that for every row index i,
# the 1 goes at column index n-i-1, tracing the diagonal stroke.
for i in range(1, n-1):
    z[i][n-i-1] = 1
print(z)
import numpy as np

def z_display(n):
    z = np.zeros((n, n), dtype=int)
    z[0] = np.ones(n, dtype=int)
    z[n-1] = np.ones(n, dtype=int)
    for i in range(1, n):
        z[i][n-i-1] = 1
    return list(np.hstack(z))

print(z_display(5))
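For n = 5 the grid built inside z_display (before flattening) looks like:

1 1 1 1 1
0 0 0 1 0
0 0 1 0 0
0 1 0 0 0
1 1 1 1 1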
Adjusted R-squared
For a multiple regression model, R-squared increases or remains the same as we add new predictors to the model, even if the
newly added predictors are independent of the target variable and don’t add any value to the predicting power of the model.
Adjusted R-squared eliminates this drawback of R-squared. It only increases if the newly added predictor improves the model’s
predicting power. Adding independent and irrelevant predictors to a regression model results in a decrease in the adjusted R-
squared.
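For reference, the standard formula, with n observations and p predictors:

Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)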
Recursive Feature Elimination, or RFE for short, is a popular feature selection algorithm.
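A minimal scikit-learn sketch of RFE; the estimator and the n_features_to_select value are illustrative assumptions:

from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Illustrative data: 10 features, 5 of them informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=5, random_state=0)

# Repeatedly fit the model and prune the weakest feature until 5 remain
rfe = RFE(estimator=LinearRegression(), n_features_to_select=5).fit(X, y)
print(rfe.support_)   # boolean mask of the selected features
print(rfe.ranking_)   # rank 1 = selected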
R-squared or coefficient of determination is a goodness-of-fit measure for linear regression models. This statistic
indicates the percentage of the variance in the dependent variable that the independent variables explain collectively.
The Four Assumptions of Linear Regression
1. Linear relationship: There exists a linear relationship between the independent variable, x, and the dependent variable, y.
2. Independence: The residuals are independent. In particular, there is no correlation between consecutive residuals in time series data.
3. Homoscedasticity: The residuals have constant variance at every level of x.
4. Normality: The residuals of the model are normally distributed.
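A small sketch of checking two of these assumptions (independence and normality of residuals) with statsmodels and scipy; the generated x and y are illustrative:

import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + 1 + rng.normal(size=100)   # linear signal plus noise

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
print(durbin_watson(resid))            # close to 2 suggests independent residuals
print(stats.shapiro(resid).pvalue)     # > 0.05 suggests normally distributed residuals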
y = e^(b0 + b1*X) / (1 + e^(b0 + b1*X))
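For example, when b0 + b1*X = 0 the output is e^0 / (1 + e^0) = 1 / 2 = 0.5, the usual classification threshold.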
The output of logistic regression is a probability P(y=1|x), ranging from 0 to 1.
Regression metrics:
• Mean Absolute Error (MAE)
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
• R² (R-squared)

Logistic regression metrics:
• Accuracy
• Confusion Matrix
• Precision and Recall
• F1-score
• AUC-ROC
https://neptune.ai/blog/performance-metrics-in-machine-learning-complete-guide
TP = 196   FP = 20
FN = 28    TN = 256

Sensitivity / Recall = 196 / (196 + 28) = 0.88
Specificity = 256 / (256 + 20) = 0.93
Precision / +ve Pred Rate = 196 / (196 + 20) = 0.91
Accuracy = (196 + 256) / 500 = 0.90
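A quick Python sketch reproducing these metrics from the four counts:

TP, FP, FN, TN = 196, 20, 28, 256

sensitivity = TP / (TP + FN)                   # 0.88
specificity = TN / (TN + FP)                   # 0.93
precision = TP / (TP + FP)                     # 0.91
accuracy = (TP + TN) / (TP + FP + FN + TN)     # 0.90

print(round(sensitivity, 2), round(specificity, 2),
      round(precision, 2), round(accuracy, 2))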
https://www.analyticsvidhya.com/blog/2021/06/understand-weight-of-evidence-and-information-value/
              Diff
0.48   0.32   0.16
0.42   0.25   0.17
0.44   0.13   0.31
0.45   0.41   0.04
Mean Diff = 0.17
https://www.youtube.com/watch?v=BqzgUnrNhFM
Original   New    % Error (Difference / Original)
12         14      16.7%
15         13     -13.3%
13         13       0.0%
16         14     -12.5%
Average of all errors = -2.3%
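A short sketch of the arithmetic behind this table, assuming the first column is the original value and the second the new one:

originals = [12, 15, 13, 16]
new_vals = [14, 13, 13, 14]

# % error = (new - original) / original
errors = [(n - o) / o for o, n in zip(originals, new_vals)]
print([round(100 * e, 1) for e in errors])        # [16.7, -13.3, 0.0, -12.5]
print(round(100 * sum(errors) / len(errors), 1))  # -2.3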
https://www.youtube.com/watch?v=_qv_7lEuiZg
Mean imputation
Last observation carried forward
Linear interpolation
Seasonal + linear interpolation
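A pandas sketch of the first three techniques above; the series s with gaps is illustrative, and the seasonal variant would additionally need a seasonal decomposition:

import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 14.0, np.nan, 18.0])

print(s.fillna(s.mean()))              # mean imputation
print(s.ffill())                       # last observation carried forward
print(s.interpolate(method='linear'))  # linear interpolation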
The series is stationary if the ADF p-value is less than 0.05 and the KPSS p-value is greater than or equal to 0.05.
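A hedged statsmodels sketch of reading both p-values; the white-noise series is an illustrative stationary example:

import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

series = np.random.default_rng(0).normal(size=200)  # illustrative stationary series

adf_p = adfuller(series)[1]               # ADF null hypothesis: non-stationary (unit root)
kpss_p = kpss(series, regression='c')[1]  # KPSS null hypothesis: stationary

print(adf_p < 0.05 and kpss_p >= 0.05)    # True suggests the series is stationary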
16
16
select o.*
from orders o
inner join customer_info c on o.customer_id = c.id
where c.state = 'AZ'
and c.street like '%Silver%'
order by o.order_id asc;
select Type, count(order_id) as oc
from orders
where Order_State = 'Maharashtra'
group by Type
order by oc asc;
1. Ans => A
2. Ans => B
3. Ans => D
4. Ans => A
5. Ans => D
6. Ans => B
7. Ans => C
8. Ans => A
9. Ans => C
10. Ans => B
20. Ans => D
38. Ans => B
78. Ans
79. Ans
80. Ans
81. Ans