ML Using Scikit

Uploaded by

manikanta tarun

………………………………………………………………………………………………………

QUIZ 1: DATA READING

Obtained score : 3 Cut off: 3 Max.Score:5

1) What is the type of the iris variable in the expression below?

from sklearn import datasets

iris = datasets.load_iris()

A: sklearn.datasets.base.Bunch (in recent scikit-learn versions the class is available as sklearn.utils.Bunch)

2) Which of the following expressions can access the features of the iris dataset loaded below?

from sklearn import datasets

iris = datasets.load_iris()

A: iris.data (iris.get_features is wrong; a Bunch has no such method)
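Here is a quick check of the Bunch type and the iris feature matrix. It assumes a recent scikit-learn, where Bunch can be imported from sklearn.utils. A Bunch behaves like a dict whose keys can also be read as attributes, so the features are available as iris.data.

```python
from sklearn.datasets import load_iris
from sklearn.utils import Bunch

iris = load_iris()

# load_iris returns a Bunch: a dict subclass whose keys are also attributes
print(type(iris))
print(isinstance(iris, Bunch))

# The features live in iris.data (150 samples x 4 features)
print(iris.data.shape)
print(iris.feature_names)

# Attribute access and key access return the same array object
print(iris["data"] is iris.data)
```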

3) Which of the following modules of sklearn contains popular, preprocessed datasets?

A: datasets

4) Which of the following Pandas utilities can be used to read from an Oracle database?

A: read_sql
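The sketch below shows how read_sql is called. No Oracle server is available here, so an in-memory SQLite database stands in for it (an assumption). To read from Oracle, you would pass a SQLAlchemy engine or a DBAPI connection for Oracle instead. The table name and the values are toy data.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for Oracle in this sketch (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE regions (code TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO regions VALUES (?, ?)",
    [("HYD", 10), ("CHN", 11), ("MUM", 20)],  # toy values
)
conn.commit()

# read_sql runs the query and returns the result as a DataFrame
df = pd.read_sql("SELECT code, population FROM regions", conn)
print(df)
conn.close()
```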

5) What do the methods starting with fetch in the sklearn.datasets module do?

A: They download a dataset from the internet and cache it locally. ("It fetches all popular datasets" is wrong.)

………………………………………………………………………………………………………
………………………………………………………………………………………………………

Quiz 2: Preprocessing

Obtained score : 4 Cut off: 3 Max.Score:5

1) What is the output of the following code?

import sklearn.preprocessing as preprocessing

x = [[0, 0], [0, 1], [2,0]]

enc = preprocessing.OneHotEncoder()

print(enc.fit(x).transform([[1, 1]]).toarray())

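[[ 0. 0. 0. 1.]]

(This is the output of older scikit-learn versions. In recent versions the default handle_unknown='error' makes transform raise a ValueError, because the value 1 never appeared in the first column during fit.)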
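Here is a minimal check of this question, assuming a recent scikit-learn release. The default encoder rejects the unseen value 1 in the first column. With handle_unknown="ignore", that value is encoded as all zeros, which reproduces the answer above.

```python
import sklearn.preprocessing as preprocessing

x = [[0, 0], [0, 1], [2, 0]]

# Default encoder: unseen categories raise a ValueError at transform time
raised = False
try:
    preprocessing.OneHotEncoder().fit(x).transform([[1, 1]])
except ValueError as err:
    raised = True
    print("default encoder:", err)

# handle_unknown="ignore": unseen categories become all-zero columns
enc = preprocessing.OneHotEncoder(handle_unknown="ignore").fit(x)
print(enc.categories_)  # learned categories per column: {0, 2} and {0, 1}
result = enc.transform([[1, 1]]).toarray()
print(result)
```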

2) What is the output of the following code?

import sklearn.preprocessing as preprocessing

regions = ['HYD', 'CHN', 'MUM', 'HYD', 'KOL', 'CHN']

print(preprocessing.LabelEncoder().fit(regions).transform(regions))

[1 0 3 1 2 0]
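The integer codes come from the sorted class labels: CHN=0, HYD=1, KOL=2, MUM=3. The sketch below prints the learned classes_ and maps the codes back with inverse_transform.

```python
import sklearn.preprocessing as preprocessing

regions = ['HYD', 'CHN', 'MUM', 'HYD', 'KOL', 'CHN']

le = preprocessing.LabelEncoder().fit(regions)
print(le.classes_)  # labels in sorted order; position = integer code

codes = le.transform(regions)
print(codes)

# inverse_transform maps the integer codes back to the original labels
decoded = le.inverse_transform(codes)
print(decoded)
```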

3) Which of the following APIs is used to normalize a sample to unit norm?

A: Normalizer

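Normalizer works per sample (row), not per feature: it divides each row by its own length. The toy rows below are chosen so the arithmetic is easy to check by hand. For example, [3, 4] has L2 length 5, so it becomes [0.6, 0.8].

```python
import numpy as np
import sklearn.preprocessing as preprocessing

# Toy rows chosen so the result is easy to verify by hand
x = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])

# Each row is divided by its L2 norm
x_norm = preprocessing.Normalizer(norm="l2").fit_transform(x)
print(x_norm)

# Every row of the result has length 1
row_lengths = np.linalg.norm(x_norm, axis=1)
print(row_lengths)
```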
4) What is the output of the following code?

import sklearn.preprocessing as preprocessing

x = [[7.8], [1.3], [4.5], [0.9]]

print(preprocessing.Binarizer().fit(x).transform(x))

[[ 1.]
 [ 1.]
 [ 1.]
 [ 1.]]
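Binarizer maps values strictly greater than its threshold to 1 and all other values to 0. The default threshold is 0.0, so every value in this question becomes 1. The sketch below also uses threshold=2.0, an illustrative value, to show the threshold at work. The result always keeps the input shape (4, 1).

```python
import sklearn.preprocessing as preprocessing

x = [[7.8], [1.3], [4.5], [0.9]]

# Default threshold 0.0: every positive value becomes 1
default_out = preprocessing.Binarizer().fit_transform(x)
print(default_out.ravel())

# threshold=2.0: only 7.8 and 4.5 are strictly greater, so they become 1
out = preprocessing.Binarizer(threshold=2.0).fit_transform(x)
print(out.ravel())
print(out.shape)
```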

……………………………………………………………………………………………………

QUIZ 3: Nearest Neighbour

Obtained score : 4 Cut off: 3 Max.Score:5

1) Which of the following modules of sklearn is used to deal with Nearest Neighbors?

A: neighbors (n_neighbors is wrong; it is a parameter, not a module)

2) Which of the following is an essential parameter of RadiusNeighborsClassifier?

A: radius

3) Which of the following classes is used to implement K-Nearest Neighbors classification in scikit-learn?

A: KNeighborsClassifier

4) Which of the following algorithms can be used with any nearest neighbors utility in scikit-learn?

A: all (the algorithm parameter accepts 'ball_tree', 'kd_tree', 'brute' and 'auto')

5) What is the strategy followed by the Radius Neighbors method?

A: It looks in the vicinity of the area covered by a fixed radius around each training point.
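Here is a minimal RadiusNeighborsClassifier sketch on iris. The radius value 1.0 is an illustrative choice, not a tuned value. Setting outlier_label="most_frequent" gives a label to test points that have no training neighbour within the radius; without it, predict would raise an error for such points.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import RadiusNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, stratify=iris.target, random_state=30)

# radius: every training point within this distance of a query point votes
clf = RadiusNeighborsClassifier(radius=1.0, outlier_label="most_frequent")
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)
print(acc)
```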

………………………………………………………………………………………………………

QUIZ 4: DECISION TREE

Obtained score : 4 Cut off: 3 Max.Score:5

1) Which of the following modules of sklearn is used for dealing with Decision Trees?

A: tree

2) Data used for Decision Trees has to be preprocessed compulsorily.

A: False

3) A small change in data features may change a Decision Tree completely.

A: True

4) A feature can be reused to split a tree during Decision Tree creation.

A: True

5) Which of the following utilities is used for regression using decision trees?

A: DecisionTreeRegressor (DecisionTreeRegression is wrong)
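Here is a minimal DecisionTreeRegressor sketch. It uses the bundled diabetes dataset as an illustrative assumption, since load_boston was removed in scikit-learn 1.2. It also shows max_depth, the usual parameter for limiting tree growth.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=30)

# max_depth=3 is an illustrative value that caps the depth of the tree
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

preds = reg.predict(X_test[:2])    # continuous target values
print(preds)
print(reg.score(X_test, y_test))   # R^2 on the held-out split
print(reg.get_depth())             # never more than max_depth
```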


………………………………………………………………………………………………………

Quiz 5:Ensemble methods

Obtained score : 4 Cut off: 3 Max.Score:5

1) Ensemble methods are better than Decision Trees.

A: True

2) More improvement is found in an ensemble when base estimators are highly correlated.

A: False

3) Ensemble methods are better than Decision Trees.

A: True

4) Which of the following utilities of sklearn.ensemble is used for classification with extra randomness?

A: ExtraTreesClassifier (marked as a wrong answer; try the other options)

5) Which of the following modules of sklearn is used for dealing with ensemble methods?

A: ensemble
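Here is a sketch comparing two sklearn.ensemble classifiers on iris. RandomForestClassifier chooses the best split threshold among a random subset of features. ExtraTreesClassifier (extremely randomized trees) adds extra randomness by also drawing the candidate thresholds at random. The scores come from 5-fold cross-validation; n_estimators=100 is an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

results = {}
for model in (RandomForestClassifier(n_estimators=100, random_state=0),
              ExtraTreesClassifier(n_estimators=100, random_state=0)):
    # Mean accuracy over 5 stratified folds
    scores = cross_val_score(model, X, y, cv=5)
    results[type(model).__name__] = scores.mean()
    print(type(model).__name__, scores.mean())
```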

………………………………………………………………………………………………………
………………………………………………………………………………………………………

Quiz 6: SVM

Obtained score : 5 Cut off: 3 Max.Score:5

1) SVM algorithms are memory efficient.

A: True

2) What happens when a very small value is used for the parameter C in support vector machines?

A: Misclassification can happen (a small C gives a wider, softer margin that tolerates more training errors)

3) The LinearSVC class accepts a kernel parameter value.

A: False

4) Which attribute provides details of the obtained support vectors after classifying data using SVC?

A: support_vectors_

5) Which of the following parameters of the SVC method is used for fine-tuning the model?

A: C
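The sketch below shows the effect of C and the support_vectors_ attribute. It fits SVC on iris with three illustrative C values. A small C produces a softer margin, so many more training points end up as support vectors and more training points may be misclassified.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

results = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    # support_vectors_ holds the training samples that define the margin
    n_sv = clf.support_vectors_.shape[0]
    results[C] = (n_sv, clf.score(X, y))
    print(C, results[C])
```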

………………………………………………………………………………………………………

Quiz 7: CLUSTERING

Obtained score : 3 Cut off: 3 Max.Score:5

1) Agglomerative Clustering follows a top-down approach.

A: False (it is bottom-up: each point starts as its own cluster and clusters are merged step by step)

2) Which of the following utilities of sklearn.cluster is used for performing k-means clustering?

A: KMeans (DBSCAN is wrong; it performs density-based clustering)

3) Which of the following parameters are used to control Density-based clustering?

A: eps, min_samples (eps, n_clusters is wrong; DBSCAN has no n_clusters parameter)

4) What does the Homogeneity score of a clustering algorithm indicate?

A: It verifies whether each cluster contains only members of a single class.

5) Which of the following clustering techniques is used to group data points into a user-given number k of clusters?

A: K-means clustering
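Here is a sketch covering this quiz. DBSCAN is controlled by eps and min_samples, KMeans takes the number of clusters k from the user, and homogeneity_score measures how pure each cluster is. The synthetic blobs and the parameter values are illustrative choices.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import homogeneity_score

# Synthetic data: 300 points around 3 centres (illustrative choice)
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

# DBSCAN: eps is the neighbourhood radius, min_samples the density threshold;
# it finds the number of clusters itself and labels noise points -1
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# KMeans: the user supplies k through n_clusters
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Homogeneity is 1.0 when every cluster holds members of a single true class
h_db = homogeneity_score(y_true, db_labels)
h_km = homogeneity_score(y_true, km_labels)
print(h_db, h_km)
```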

………………………………………………………………………………………………………

FINAL ASSESSMENT

Obtained score: 16 (in both attempts) Cut off: 21 Max.Score: 25

1) The parameter used to control the number of neighbors of KNeighborsClassifier is ______________.

A: n_neighbors (neighbors_num is wrong)

2) What do the methods starting with fetch in the sklearn.datasets module do?

A: They download a dataset from the internet (for example, fetch_openml downloads from openml.org) and cache it locally.

3) Which of the following parameters are used to control Affinity Propagation clustering?

A: preference, damping
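Here is a minimal AffinityPropagation sketch showing both parameters. Lower (more negative) preference values lead to fewer exemplars and therefore fewer clusters. damping, a factor in [0.5, 1.0), smooths the message updates between iterations. The toy blobs and the values preference=-50 and damping=0.5 are illustrative choices.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Toy data: three blobs (illustrative choice)
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

# preference controls how many exemplars emerge; damping stabilises the updates
ap = AffinityPropagation(preference=-50, damping=0.5, random_state=0).fit(X)

print(len(ap.cluster_centers_indices_))  # number of clusters (exemplars) found
print(ap.labels_[:10])
```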

4) Which attribute provides details of the obtained support vectors after classifying data using SVC?

A: support_vectors_

5) Spectral Clustering is best suited for identifying dense clusters.

A: True

6) Which of the following parameters are used to control Density-based clustering?

A: eps, min_samples (n_clusters, min_samples is wrong; DBSCAN has no n_clusters parameter)

7) What values can be used for the linkage parameter in AgglomerativeClustering?

A: ward (the full set of accepted values is 'ward', 'complete', 'average' and 'single')
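The sketch below runs AgglomerativeClustering once with each accepted linkage value on illustrative blob data. Each run is asked for three clusters.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

labels = {}
for linkage in ("ward", "complete", "average", "single"):
    # linkage decides which distance between clusters is minimised when merging
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels[linkage] = model.fit_predict(X)
    print(linkage, sorted(set(labels[linkage])))
```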

8) A small change in data features may change a Decision Tree completely.

A: True

9) What is the output of the following code?

import sklearn.preprocessing as preprocessing

x = [[0, 0], [0, 1], [2,0]]

enc = preprocessing.OneHotEncoder()

print(enc.fit(x).transform([[1, 1]]).toarray())

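A: [[ 0. 0. 0. 1.]] (older scikit-learn; recent versions raise a ValueError here unless handle_unknown='ignore' is set)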

10) Which of the following modules of sklearn is used for dealing with ensemble methods?

A: ensemble

11) Which of the following Python libraries is used for Machine Learning?

A: scikit-learn

12) Which of the following expressions can access the features of the iris dataset loaded below?

from sklearn import datasets

iris = datasets.load_iris()

A: iris.data (iris.features is wrong)

13) Which of the following APIs is used to normalize a sample to unit norm?

A: Normalizer

14) Which regressor utility of sklearn.neighbors is used to learn from k nearest neighbors of each
query point?

A: KNeighborsRegressor

15) What happens when a very small value is used for the parameter C in support vector machines?

A: Misclassification can happen

16) Ensemble methods are better than Decision Trees.

A: True

17) What does the Homogeneity score of a clustering algorithm indicate?

A: It verifies whether each cluster contains only members of a single class.


18) What is the output of the following code?

import sklearn.preprocessing as preprocessing

x = [[7.8], [1.3], [4.5], [0.9]]

print(preprocessing.Binarizer().fit(x).transform(x).shape)

A: (4, 1)

19) Which of the following parameter is used to tune a Decision Tree?

A: max_depth

20) Which of the following module of sklearn contains preprocessing utilities?

A: preprocessing

21) Which of the following modules of sklearn is used to deal with Nearest Neighbors?

A: neighbors (k_neighbours is wrong)

22) Which of the following class is used to implement the K-Nearest Neighbors classification in
scikit-learn?

A: KNeighborsClassifier

23) What is the output of the following code?

import sklearn.preprocessing as preprocessing

x = [[7.8], [1.3], [4.5], [0.9]]

print(preprocessing.Binarizer().fit(x).transform(x))

[[ 1.]
 [ 1.]
 [ 1.]
 [ 1.]]

24) The preprocessing technique in which categorical values are transformed into categorical
integers is known as __________.

A: Label encoding

25) Which of the following utilities of sklearn.cluster is used for performing k-means clustering?

A: KMeans (k-means() is wrong; the class is sklearn.cluster.KMeans)

…………………………………………………………………………………………………….

HOW TO EXECUTE THE MACHINE LEARNING USING SCIKIT HANDS-ON EXERCISES

HANDS ON 1: PREPROCESSING

Pre-requirements:

1. Open the IDE and select the Install button under the Run option.

2. After a few minutes, type python3 in the terminal.


Solution:

Inside the Challenge folder there is a prog.py file; open it.

Type the code below in prog.py only (don't type it in the terminal; use the normal workspace).

#Write your code here
from sklearn.datasets import load_iris
import sklearn.preprocessing as preprocessing
from sklearn.impute import SimpleImputer
import numpy as np

iris = load_iris()

# Scale each sample (row) to unit L2 norm
normalizer = preprocessing.Normalizer(norm='l2').fit(iris.data)
iris_normalized = normalizer.transform(iris.data)
print(iris_normalized.mean(axis=0))

# One-hot encode the target labels (reshape them into a single column first)
enc = preprocessing.OneHotEncoder()
iris_target_onehot = enc.fit_transform(iris.target.reshape(-1, 1))
print(iris_target_onehot.toarray()[[0, 50, 100]])

# Blank out the first 50 rows, then fill the gaps with the column means
iris.data[:50, :] = np.nan
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
# Older scikit-learn versions: imputer = preprocessing.Imputer(missing_values='NaN', strategy='mean')
imputer = imputer.fit(iris.data)
iris_imputed = imputer.transform(iris.data)
print(iris_imputed.mean(axis=0))

OUTPUT:

After writing the code, click the Run button under the Run option.

If you get the expected output, your code is correct.

………………………………………………………………………………………………………………..
…………………………………………………………………………………………………………….

2) NEAREST NEIGHBOURS

Pre-requirements:

1. Open the IDE and select the Install button under the Run option.

2. After a few minutes, type python3 in the terminal.

3. In the same terminal, type the code below:

import os (press Enter after this line)

os.system('sudo pip install scikit-learn')

Note:

Use the code from step 3 only if you need to install the sklearn module.

Solution:

Inside the Challenge folder there is a prog.py file; open it.

Type the code below in prog.py only (don't type it in the terminal; use the normal workspace).

import sklearn.datasets as dataset
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

iris = dataset.load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(iris.data, iris.target, stratify=iris.target, random_state=30)
print(X_train.shape)
print(X_test.shape)

# Baseline k-NN with the default n_neighbors=5
knn_clf = KNeighborsClassifier()
knn_clf = knn_clf.fit(X_train, Y_train)
print(knn_clf.score(X_train, Y_train))
print(knn_clf.score(X_test, Y_test))

# Try n_neighbors = 3..10 and keep the value with the highest test accuracy so far
max_score = 0
best_n_neighbour = 0
for n_neighbors in range(3, 11):
    knn_clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn_clf = knn_clf.fit(X_train, Y_train)
    score = knn_clf.score(X_test, Y_test)
    print(n_neighbors, score)
    if score > max_score:
        max_score = score
        best_n_neighbour = n_neighbors
print(best_n_neighbour)
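The loop above picks n_neighbors by scoring on the test split. A common alternative is cross-validation on the training split only, sketched here with GridSearchCV. This is a swapped-in technique, not part of the original exercise, so the k it selects may differ from the expected output of the hands-on.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, stratify=iris.target, random_state=30)

# 5-fold cross-validated search over n_neighbors = 3..10 (training data only)
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": list(range(3, 11))}, cv=5)
search.fit(X_train, Y_train)

best_k = search.best_params_["n_neighbors"]
test_acc = search.score(X_test, Y_test)  # refit best model scored once on test
print(best_k, test_acc)
```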

………………………………………………………………………………………………………………..
………………………………………………………………………………………………………………..

3) DECISION TREE

import sklearn.datasets as datasets
import sklearn.model_selection as model_selection
import numpy as np
from sklearn.tree import DecisionTreeRegressor

np.random.seed(100)
# load_boston was removed in scikit-learn 1.2; this exercise assumes an older version
boston = datasets.load_boston()
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(boston.data, boston.target, random_state=30)
print(X_train.shape)
print(X_test.shape)

dt_regressor = DecisionTreeRegressor()
dt_reg = dt_regressor.fit(X_train, Y_train)
print(dt_reg.score(X_train, Y_train))
print(dt_reg.score(X_test, Y_test))

y_pred = dt_reg.predict(X_test[:2])
print(y_pred)

# Try max_depth = 2..5 and keep the depth with the best test R^2 score
maxdepth = 2
maxscore = 0
for i in range(2, 6):
    dt_regressor = DecisionTreeRegressor(max_depth=i)
    dt_reg = dt_regressor.fit(X_train, Y_train)
    score = dt_reg.score(X_test, Y_test)
    if maxscore < score:
        maxdepth = i
        maxscore = score
print(maxdepth)

OUTPUT:

After writing the code, click the Run button under the Run option to get the output.

………………………………………………………………………………………………………
………………………………………………………………………………………………………

4) ENSEMBLE METHODS

import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = datasets.load_digits()
X = digits.data
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=30, stratify=y)
print(X_train.shape)
print(X_test.shape)

# SVC on the raw pixel values
svm_clf = SVC().fit(X_train, y_train)
print(svm_clf.score(X_test, y_test))

# Standardize every feature (fit on the full dataset, as in the exercise), split again and retrain
scaler = StandardScaler()
scaler.fit(X)
digits_standardized = scaler.transform(X)
X_train, X_test, y_train, y_test = train_test_split(digits_standardized, y, random_state=30, stratify=y)

svm_clf2 = SVC().fit(X_train, y_train)
print(svm_clf2.score(X_test, y_test))

OUTPUT:

After writing the code, click the Run button under the Run option to get the output.

……………………………………………………………………………………………………
………………………………………………………………………………………………………

5) SVM

import sklearn.datasets as datasets
import sklearn.model_selection as model_selection
import sklearn.preprocessing as preprocessing
from sklearn.svm import SVC

digits = datasets.load_digits()
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(digits.data, digits.target, random_state=30)
print(X_train.shape)
print(X_test.shape)

# SVC on the raw pixel values
classifier = SVC()
svm_clf = classifier.fit(X_train, Y_train)
print(svm_clf.score(X_test, Y_test))

# Standardize the features, split again with the same random_state, and retrain
standardizer = preprocessing.StandardScaler()
standardizer = standardizer.fit(digits.data)
digits_standardized = standardizer.transform(digits.data)
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(digits_standardized, digits.target, random_state=30)

classifier = SVC()
svm_clf2 = classifier.fit(X_train, Y_train)
print(svm_clf2.score(X_test, Y_test))

OUTPUT:

After writing the code, click the Run button under the Run option to get the output.
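The SVM exercise fits StandardScaler on the whole dataset before splitting, so statistics from the test rows leak into training. A common alternative fits the scaler on the training split only, here wrapped in a Pipeline. This is a swapped-in technique, not part of the original exercise, so its score may differ slightly from the expected output.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, Y_train, Y_test = train_test_split(
    digits.data, digits.target, random_state=30)

# fit() learns the scaling from X_train only; score() reuses it on X_test
model = make_pipeline(StandardScaler(), SVC()).fit(X_train, Y_train)
score = model.score(X_test, Y_test)
print(score)
```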

………………………………………………………………………………………………………

6) CLUSTERING

from sklearn.ensemble import RandomForestRegressor
import sklearn.datasets as datasets
import sklearn.model_selection as model_selection
import numpy as np

np.random.seed(100)
# load_boston was removed in scikit-learn 1.2; this exercise assumes an older version
boston = datasets.load_boston()
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(boston.data, boston.target, random_state=30)
print(X_train.shape)
print(X_test.shape)

rf_Regressor = RandomForestRegressor()
rf_reg = rf_Regressor.fit(X_train, Y_train)
print(rf_reg.score(X_train, Y_train))
print(rf_reg.score(X_test, Y_test))

predicted = rf_reg.predict(X_test[:2])
print(predicted)

# Try max_depth = 3..5 with 100 trees and report the best (max_depth, n_estimators) pair
depths = []
scores = []
c_estimators = 100
for x in range(3, 6):
    rf_Regressor = RandomForestRegressor(n_estimators=c_estimators, max_depth=x)
    rf_reg = rf_Regressor.fit(X_train, Y_train)
    depths.append(x)
    scores.append(rf_reg.score(X_test, Y_test))
print((depths[np.argmax(scores)], c_estimators))

OUTPUT:

After writing the code, click the Run button under the Run option to get the output.
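The random forest exercise above calls load_boston, which no longer exists in scikit-learn 1.2 and later. On a newer version, the same depth search can be sketched with the bundled diabetes regression dataset instead. This is a substitution, so the numbers will not match the Boston-based expected output.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, random_state=30)

depths = [3, 4, 5]
scores = []
for depth in depths:
    reg = RandomForestRegressor(n_estimators=100, max_depth=depth, random_state=0)
    reg.fit(X_train, Y_train)
    scores.append(reg.score(X_test, Y_test))  # R^2 on the test split

# (best max_depth, n_estimators), mirroring the exercise's printed tuple
best = (depths[int(np.argmax(scores))], 100)
print(best)
```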

………………………………………………………………………………………………………
