ML Using Scikit
iris = datasets.load_iris()
A: sklearn.datasets.base.Bunch
2)Which of the following expressions can access the features of the iris dataset, shown in the
below expression?
iris = datasets.load_iris()
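As a rough illustration of what the expression above returns, the sketch below loads the dataset and inspects it. It assumes scikit-learn is installed; the variable names `features` and `labels` are only illustrative.

```python
from sklearn import datasets

iris = datasets.load_iris()

# The returned object is a dictionary-like Bunch
print(type(iris).__name__)

# Feature matrix: one row per flower, one column per measurement
features = iris.data
print(features.shape)

# Class labels, plus the column names for the feature matrix
labels = iris.target
print(iris.feature_names)
```

The features live in `iris.data` and the class labels in `iris.target`.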
3) Which of the following module of sklearn contains popular datasets, which are processed?
A: datasets
4)Which of the following utility of Pandas can be used to read from Oracle database?
A:read_sql
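A minimal sketch of `pandas.read_sql` follows. Connecting to Oracle needs a database driver and credentials that are not part of these notes, so the example substitutes an in-memory SQLite database; the table name and rows are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for a real Oracle connection
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flowers (name TEXT, petal_length REAL)")
conn.executemany(
    "INSERT INTO flowers VALUES (?, ?)",
    [("setosa", 1.4), ("virginica", 5.5)],
)
conn.commit()

# read_sql runs the query and returns the result as a DataFrame
df = pd.read_sql("SELECT name, petal_length FROM flowers", conn)
print(df)
conn.close()
```

With a real Oracle database, only the connection object would change; the `read_sql` call keeps the same shape.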
A: It fetches all popular datasets. (Wrong answer; try a different option.)
………………………………………………………………………………………………………
………………………………………………………………………………………………………
Quiz 2: Preprocessing
enc = preprocessing.OneHotEncoder()
print(enc.fit(x).transform([[1, 1]]).toarray())
[[ 0. 0. 0. 1.]]
print(preprocessing.LabelEncoder().fit(regions).transform(regions))
[1 0 3 1 2 0]
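The `regions` list is not shown in the question, so the sketch below uses a hypothetical list chosen to give the same codes. It illustrates that `LabelEncoder` sorts the distinct labels and maps each one to its position in that sorted order.

```python
from sklearn import preprocessing

# Hypothetical input; the original quiz data is not shown in these notes
regions = ["north", "east", "west", "north", "south", "east"]

encoder = preprocessing.LabelEncoder().fit(regions)

# classes_ holds the distinct labels in sorted order
print(encoder.classes_)

# Each label is replaced by its index in classes_
codes = encoder.transform(regions)
print(codes)
```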
3)Which of the following API is used to normalize a sample to the unit norm?
Normalizer
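A small sketch of what "unit norm" means here, using a made-up array: after the transform, every row (sample) has an L2 norm of 1.

```python
import numpy as np
from sklearn import preprocessing

# Made-up samples, one per row
x = np.array([[3.0, 4.0], [1.0, 0.0], [2.0, 2.0]])

# Normalizer rescales each row independently to unit L2 norm
normalized = preprocessing.Normalizer(norm="l2").fit(x).transform(x)
print(normalized)

# Check the length of every row after normalization
row_norms = np.linalg.norm(normalized, axis=1)
print(row_norms)
```

Note that `Normalizer` works per sample (row), unlike scalers such as `StandardScaler`, which work per feature (column).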
4) What is the output of the following code?
print(preprocessing.Binarizer().fit(x).transform(x))
[[ 1.]
[ 1.]
[ 1.]
[ 1.]]
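The input `x` for this question is not shown, so the sketch below uses a hypothetical column of values. With the default threshold of 0.0, `Binarizer` maps values greater than the threshold to 1 and everything else to 0.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical single-column input; not the quiz's original data
x = np.array([[-1.5], [0.0], [0.2], [3.0]])

# Default threshold is 0.0: values > 0 become 1, the rest become 0
binarized = preprocessing.Binarizer().fit(x).transform(x)
print(binarized)
print(binarized.shape)
```

An all-ones output like the one in the quiz would result from an input whose values are all positive.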
……………………………………………………………………………………………………
1) Which of the following module of sklearn is used to deal with Nearest Neighbors?
A: radius
3)Which of the following class is used to implement the K-Nearest Neighbors classification in
scikit-learn?
A: KNeighborsClassifier
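A minimal usage sketch of `KNeighborsClassifier` on the iris data. The split settings and `n_neighbors=5` are illustrative choices, not values taken from the quiz.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Classify each query point by majority vote among its 5 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
test_accuracy = knn.score(X_test, y_test)
print(test_accuracy)

# Predicted classes for the first three test samples
first_predictions = knn.predict(X_test[:3])
print(first_predictions)
```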
4)Which of the following algorithms can be used with any nearest neighbors utility in scikit-
learn?
A: all
A: It looks in the vicinity of the area, covered by a fixed radius, of each training point.
………………………………………………………………………………………………………
1)Which of the following module of sklearn is used for dealing with Decision Trees?
A: tree
A: False
3)A small change in data features may change a Decision Tree completely.
A: True
4)A feature can be reused to split a tree during Decision tree creation.
A: True
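To illustrate feature reuse, here is a small sketch with a made-up one-feature dataset. Because the class pattern is 0, 1, 0 along that single feature, the tree has to split on it more than once to separate the classes.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Made-up data: one feature, labels form a 0 / 1 / 0 band pattern
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 1, 1, 0, 0])

tree_clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# tree_.feature holds the feature index used at each node (-2 marks a leaf)
split_features = tree_clf.tree_.feature
n_splits_on_feature_0 = int(np.sum(split_features == 0))
print(split_features)
print(n_splits_on_feature_0)
```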
5)Which of the following utility is used for regression using decision trees?
A: True
2) More improvement is found in an ensemble when base estimators are highly correlated?
A: False
A: True
4)Which of the following utility of sklearn.ensemble is used for classification with extra
randomness?
5)Which of the following module of sklearn is used for dealing with ensemble methods?
A:ensemble
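As a sketch of the `sklearn.ensemble` module, the example below fits two tree-based ensembles on iris: a random forest and extremely randomized trees (`ExtraTreesClassifier`, which adds extra randomness when choosing split thresholds). The split and `n_estimators=50` are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

models = (
    RandomForestClassifier(n_estimators=50, random_state=0),
    ExtraTreesClassifier(n_estimators=50, random_state=0),
)

# Fit each ensemble and record its accuracy on the held-out data
scores = {}
for model in models:
    model.fit(X_train, y_train)
    scores[type(model).__name__] = model.score(X_test, y_test)
print(scores)
```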
………………………………………………………………………………………………………
………………………………………………………………………………………………………
Quiz 6: SVM
A:True
2)What happens when very small value is used for parameter C in support vector machines?
A:False
4)Which attribute provides details of obtained support vectors, after classifying data using SVC?
A:support_vectors_
5)Which of the following parameter of SVC method is used for fine-tuning the model?
A: C
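The sketch below fits `SVC` on iris with a few values of `C` and inspects `support_vectors_` each time. The chosen `C` values are arbitrary. A smaller `C` generally means stronger regularization (a softer margin), which often leaves more points as support vectors, but the exact counts depend on the data.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Record how many support vectors each setting of C produces
support_counts = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    # support_vectors_ has one row per support vector, one column per feature
    support_counts[C] = clf.support_vectors_.shape[0]
    print(C, clf.support_vectors_.shape, clf.score(X, y))
```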
………………………………………………………………………………………………………
Quiz 7: CLUSTERING
A: False
2) Which of the following utility of sklearn.cluster is used for performing k-means clustering
Density based Clustering ?
A: DBSCAN (Wrong answer; try a different option.)
5) Which of the following clustering technique is used to group data points into user given k
clusters?
A: K-means clustering
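A minimal sketch of k-means with a user-given number of clusters, using the `KMeans` class from `sklearn.cluster`. The points are made up and form two well-separated groups.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two made-up, well-separated groups of points
points = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
)

# n_clusters is the k supplied by the user
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_
print(labels)
print(kmeans.cluster_centers_)
```

The label numbers themselves are arbitrary; only the grouping of points is meaningful.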
………………………………………………………………………………………………………
FINAL ASSESSMENT
A: neighbors_num
3) Which of the following parameters are used to control Affinity Propagation clustering?
A: Preference,damping
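A rough sketch of how these two parameters are passed to `AffinityPropagation`. The data and the specific values are assumptions for illustration: `preference` influences how many exemplars (clusters) emerge, and `damping` (between 0.5 and 1) slows the message updates to avoid oscillation. The `random_state` argument assumes a reasonably recent scikit-learn release.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Made-up points; the number of clusters is not specified up front
points = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
)

# Illustrative values only; tune preference and damping for real data
ap = AffinityPropagation(preference=-50, damping=0.9, random_state=0)
ap_labels = ap.fit(points).labels_
print(ap_labels)
print(ap.cluster_centers_indices_)
```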
4) Which attribute provides details of obtained support vectors, after classifying data using SVC?
A: support_vectors_
5) Spectral Clustering is best suited for identifying dense clusters.
A: True
A: n_clusters, min_samples
A: Ward
A: True
enc = preprocessing.OneHotEncoder()
print(enc.fit(x).transform([[1, 1]]).toarray())
[[ 0. 0. 0. 1.]]
10) Which of the following module of sklearn is used for dealing with ensemble methods?
A:ensemble
11) Which of the following Python library is used for Machine Learning?
A: Scikit-learn
12) Which of the following expressions can access the features of the iris dataset, shown in the
below expression?
iris = datasets.load_iris()
A: iris.data
13) Which of the following API is used to normalize a sample to the unit norm?
A: Normalizer
14) Which regressor utility of sklearn.neighbors is used to learn from k nearest neighbors of each
query point?
A: KNeighborsRegressor
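A small sketch of `KNeighborsRegressor` on made-up data where the target is twice the feature. With the default uniform weights, the prediction for a query point is the mean target of its k nearest training points.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up training data: y = 2 * x for x = 0..9
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()

reg = KNeighborsRegressor(n_neighbors=2).fit(X, y)

# Nearest neighbors of 4.5 are x=4 and x=5, so the prediction is mean(8, 10)
pred = reg.predict([[4.5]])
print(pred)
```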
15) What happens when very small value is used for parameter C in support vector machines?
A: True
print(preprocessing.Binarizer().fit(x).transform(x).shape)
A: 4,1
A: max_depth
A: preprocessing
21) Which of the following module of sklearn is used to deal with Nearest Neighbors?
A: neighbors
22) Which of the following class is used to implement the K-Nearest Neighbors classification in
scikit-learn?
A: KNeighborsClassifier
print(preprocessing.Binarizer().fit(x).transform(x))
[[ 1.]
[ 1.]
[ 1.]
[ 1.]]
24) The preprocessing technique in which categorical values are transformed into categorical
integers is known as __________.
A: Labeling
25) Which of the following utility of sklearn.cluster is used for performing k-means clustering?
A: KMeans
…………………………………………………………………………………………………….
HANDS ON 1: PREPROCESSING
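Prerequisites:
Type the code below in the prog.py file only (don't type it in the terminal; use the normal workspace).
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer

iris = load_iris()
# Scale every sample (row) to unit L2 norm
normalizer = preprocessing.Normalizer(norm='l2').fit(iris.data)
iris_normalized = normalizer.transform(iris.data)
print(iris_normalized.mean(axis=0))

# One-hot encode the class labels (this fit step is missing in the original notes)
enc = preprocessing.OneHotEncoder()
iris_target_onehot = enc.fit_transform(iris.target.reshape(-1, 1))
print(iris_target_onehot.toarray()[[0, 50, 100]])

# Imputer setup is missing in the original; mean strategy is an assumption
iris.data[:50, :] = np.nan
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(iris.data)
iris_imputed = imputer.transform(iris.data)
print(iris_imputed.mean(axis=0))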
OUTPUT:
………………………………………………………………………………………………………………..
…………………………………………………………………………………………………………….
2) NEAREST NEIGHBOURS
Prerequisites:
Note:
If you need to install the sklearn module, use the code given in the 3rd point above.
Solution:
Type the code below in the prog.py file only (don't type it in the terminal; use the normal workspace).
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
# The split call is missing in the original notes; these arguments are an assumption
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, stratify=iris.target, random_state=30)
print(X_train.shape)
print(X_test.shape)

knn_clf = KNeighborsClassifier()
knn_clf = knn_clf.fit(X_train, Y_train)
print(knn_clf.score(X_train, Y_train))
print(knn_clf.score(X_test, Y_test))

# Try n_neighbors from 3 to 10 and keep the value with the best test score
cluster = 3
max_score = 0
best_n_neighbour = 0
while cluster <= 10:
    knn_clf = KNeighborsClassifier(n_neighbors=cluster)
    knn_clf = knn_clf.fit(X_train, Y_train)
    score = knn_clf.score(X_test, Y_test)
    if score > max_score:
        max_score = score
        best_n_neighbour = cluster
    print(str(cluster), score)
    cluster = cluster + 1
print(best_n_neighbour)
………………………………………………………………………………………………………………..
………………………………………………………………………………………………………………..
3) DECISION TREE
import numpy as np
from sklearn import datasets, model_selection
from sklearn.tree import DecisionTreeRegressor

np.random.seed(100)
# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release
boston = datasets.load_boston()
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    boston.data, boston.target, random_state=30)
print(X_train.shape)
print(X_test.shape)

dt_regressor = DecisionTreeRegressor()
dt_reg = dt_regressor.fit(X_train, Y_train)
print(dt_reg.score(X_train, Y_train))
print(dt_reg.score(X_test, Y_test))
y_pred = dt_reg.predict(X_test[:2])
print(y_pred)

# Search max_depth from 2 to 5 and keep the depth with the best test score
maxdepth = 2
maxscore = 0
for i in range(2, 6):
    dt_regressor = DecisionTreeRegressor(max_depth=i)
    dt_reg = dt_regressor.fit(X_train, Y_train)
    score = dt_reg.score(X_test, Y_test)
    if score > maxscore:
        maxdepth = i
        maxscore = score
print(maxdepth)
OUTPUT:
After writing the code, click the Run button under the Run option.
………………………………………………………………………………………………………
………………………………………………………………………………………………………
4) ENSEMBLE METHODS
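import sklearn.model_selection as ms
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = datasets.load_digits()
X = digits.data
y = digits.target
# The first split is missing in the original; it is assumed to mirror the second one
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, random_state=30,
stratify=y)
print(X_train.shape)
print(X_test.shape)

# Classifier on the raw pixel values (the fit step is missing in the original)
svm_clf = SVC().fit(X_train, y_train)
print(svm_clf.score(X_test, y_test))

# Standardize the features, split again, and refit
scaler = StandardScaler()
scaler.fit(X)
digits_standardized = scaler.transform(X)
X_train, X_test, y_train, y_test = ms.train_test_split(digits_standardized, y, random_state=30,
stratify=y)
svm_clf2 = SVC().fit(X_train, y_train)
print(svm_clf2.score(X_test, y_test))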
OUTPUT:
After writing the code, click the Run button under the Run option.
……………………………………………………………………………………………………
………………………………………………………………………………………………………
5) SVM
import numpy as np
from sklearn import datasets, model_selection, preprocessing
from sklearn.svm import SVC

digits = datasets.load_digits()
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    digits.data, digits.target, random_state=30)
print(X_train.shape)
print(X_test.shape)

# Baseline SVC on the raw pixel values
classifier = SVC()
svm_clf = classifier.fit(X_train, Y_train)
print(svm_clf.score(X_test, Y_test))

# Standardize the features, split again with the same seed, and refit
standardizer = preprocessing.StandardScaler()
standardizer = standardizer.fit(digits.data)
digits_standardized = standardizer.transform(digits.data)
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    digits_standardized, digits.target, random_state=30)
classifier = SVC()
svm_clf2 = classifier.fit(X_train, Y_train)
print(svm_clf2.score(X_test, Y_test))
OUTPUT:
After writing the code, click the Run button under the Run option.
………………………………………………………………………………………………………
6) CLUSTERING
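import numpy as np
from sklearn import datasets, model_selection
from sklearn.ensemble import RandomForestRegressor

np.random.seed(100)
# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release
boston = datasets.load_boston()
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    boston.data, boston.target, random_state=30)
print(X_train.shape)
print(X_test.shape)

rf_Regressor = RandomForestRegressor()
rf_reg = rf_Regressor.fit(X_train, Y_train)
print(rf_reg.score(X_train, Y_train))
print(rf_reg.score(X_test, Y_test))
predicted = rf_reg.predict(X_test[:2])
print(predicted)

depths = []
scores = []
c_estimators = 100
# The loop header is missing in the original; the depth range 3 to 5 is an assumption
for x in range(3, 6):
    rf_Regressor = RandomForestRegressor(n_estimators=c_estimators, max_depth=x)
    rf_reg = rf_Regressor.fit(X_train, Y_train)
    depths.append(x)
    scores.append(rf_reg.score(X_test, Y_test))
print((depths[np.argmax(scores)], c_estimators))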
OUTPUT:
After writing the code, click the Run button under the Run option.
………………………………………………………………………………………………………