Abhiml ML File

Uploaded by Bhawna Chandla

Output:-

FIG. 1.1 READ DATA FROM THE FILE “DATA CSV”


1. Using the pandas module, read a CSV file and
perform basic operations on a dataset
(head, tail, info, drop, rename).
1. Head
import pandas as pd
df = pd.read_csv('diabetes_012_health_indicators_BRFSS2015.csv')
df.head()

2. Tail
import pandas as pd
df = pd.read_csv('diabetes_012_health_indicators_BRFSS2015.csv')
df.tail()
Output:-

FIG. 2.1 DISPLAY ALL THE DATA OF THE FILE “DATA CSV”

FIG.2.2 Rename Column Name 'FName':'First Name','LName':'Last Name'


2. Write a program to create a dictionary and perform
basic functions on that dataset.

CREATING A DATASET (DICTIONARY)

3. Info
import pandas as pd
df = pd.read_csv('diabetes_012_health_indicators_BRFSS2015.csv')
df.info()

import pandas as pd

data = {
    'EmpID':['1021','1057','1147','1272','1523','1445','1663','1747'],
    'FName':['Ravi','Aman','Krishan','Priya','Navjot','Abhishek','Jay','Ritik'],
    'LName':['Kumar','Kumr','Yadav','Sharma','Singh','Gupta','Malhotra','Patel'],
    'Salary':['22000','25000','28000','20000','35000','18000','32000','40000']
}
df = pd.DataFrame(data)
print(df)

Rename Column

df.rename(columns = {'FName':'First Name','LName':'Last Name'},inplace = True)

print(df)
FIG.2.3 Drop Column Name 'Salary'

FIG.2.4 ISNA FUNCTION


Drop Column

df = df.drop(columns=['Salary'])
print(df)

ISNA FUNCTION

Pandas dataframe.isna() is used to detect missing values.

It returns a boolean same-sized object indicating whether the values
are NA. NA values, such as None or numpy.NaN, get mapped to True;
everything else gets mapped to False.

CODE:-
df.isna()
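As a minimal illustration (toy values, not from the diabetes file), isna() marks each missing cell and the per-column counts follow from summing the mask:

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values; None and np.nan both count as NA
toy = pd.DataFrame({'a': [1.0, None, 3.0], 'b': [np.nan, 'x', 'y']})
mask = toy.isna()
print(mask)        # True where the value is missing, False elsewhere
print(mask.sum())  # per-column count of missing values
```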

Output: FIG. 3.1 DISPLAY ALL THE DATA USING VARIOUS ENCODING METHOD
FIG. 3.2 Display The Label Encoder 'Gender_LabelEncoded'

FIG. 3.3 Display The Label Encoder 'Gender_LabelEncoded' 'Color_LabelEncoded'

3. Program to encode the data using various encoding methods.
import pandas as pd
data = {
'Gender':['Male', 'Female', 'Non-Binary', 'Male', 'Female'],
'Color':['Red', 'Blue', 'Green', 'Red', 'Blue'] }
df=pd.DataFrame(data)
df.head()

1. Label Encoder
from sklearn.preprocessing import LabelEncoder
lblenc = LabelEncoder()
df['Gender_LabelEncoded'] = lblenc.fit_transform(df['Gender'])
print("Label Encoded DataFrame: ")
print(df)

from sklearn.preprocessing import LabelEncoder
lblenc = LabelEncoder()
df['Color_LabelEncoded'] = lblenc.fit_transform(df['Color'])
print("Label Encoded DataFrame: ")
print(df)

FIG. 3.4 Display The One-Hot Encoding 'Gender_LabelEncoded' 'Color_LabelEncoded'


FIG. 3.5 Display The Data Ordinal Encoding
2. One-Hot Encoding
one_hot_encoded = pd.get_dummies(df['Color'], prefix='color')
df = pd.concat([df, one_hot_encoded], axis=1)
print("\nOne-Hot Encoded DataFrame: ")
print(df)

3. Ordinal Encoding

color_mapping = {'Red':10, 'Blue':15, 'Green':25}

df['Color_OrdinalEncoded'] = df['Color'].map(color_mapping)

print("\nOrdinal Encoded DataFrame: ")
print(df)
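As an alternative sketch to the hand-written mapping dictionary, sklearn's OrdinalEncoder can assign integer codes from an explicit category order (the 0/1/2 codes below are illustrative, not the 10/15/25 mapping above):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df2 = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Red', 'Blue']})
# Explicit order: Red -> 0, Blue -> 1, Green -> 2
enc = OrdinalEncoder(categories=[['Red', 'Blue', 'Green']])
df2['Color_OrdinalEncoded'] = enc.fit_transform(df2[['Color']]).ravel()
print(df2)
```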

OUTPUT:-
FIG. 3.6 Display The Data Binary Encoding

4. Binary Encoding

import category_encoders as ce
binary_encoder = ce.BinaryEncoder(cols=['Color'])
binary_encoded = binary_encoder.fit_transform(df['Color'])
df = pd.concat([df, binary_encoded], axis=1)
print("\nBinary Encoded DataFrame: ")
print(df)
4. Write a program to scale features using various
scalers: Min-Max, Z-score, Robust, Max Absolute,
and Min-Max within a range.

import pandas as pd
import numpy as np

data = {
    'Feature1':np.random.randint(1,100,10),
    'Feature2':np.random.randint(100,1000,10),
    'Feature3':np.random.randint(1000,10000,10)}
df = pd.DataFrame(data)
print("Original DataFrame: ")
print(df)
1. Min-Max Scaling
from sklearn.preprocessing import MinMaxScaler
minmax_scaler = MinMaxScaler()
minmax_scaled_data = minmax_scaler.fit_transform(df)
minmax_df = pd.DataFrame(minmax_scaled_data, columns=df.columns)
print("\n Min-Max Scaled DataFrame: ")
print(minmax_df)


2. Standardization or Z- score Scaling

from sklearn.preprocessing import StandardScaler
standard_scaler = StandardScaler()
standard_scaled = standard_scaler.fit_transform(df)
std_df = pd.DataFrame(standard_scaled, columns=df.columns)
print("\n Standard Scaled DataFrame: ")
print(std_df)


3. Robust Scaling

from sklearn.preprocessing import RobustScaler
robust_scaler = RobustScaler()
robust_scaled = robust_scaler.fit_transform(df)
rb_df = pd.DataFrame(robust_scaled, columns=df.columns)
print("\n Robust Scaled DataFrame: ")
print(rb_df)


4. Max Absolute Scaling

from sklearn.preprocessing import MaxAbsScaler
maxabs_scaler = MaxAbsScaler()
maxabs_scaled = maxabs_scaler.fit_transform(df)
maxabs_df = pd.DataFrame(maxabs_scaled, columns=df.columns)
print("\nMax Absolute Scaled DataFrame:")
print(maxabs_df)

5. Min-Max Scaling within a range

minmax_range = (0, 10)
minmax_range_scaler = MinMaxScaler(feature_range=minmax_range)
minmax_range_scaled = minmax_range_scaler.fit_transform(df)
minmax_range_df = pd.DataFrame(minmax_range_scaled, columns=df.columns)
print("\n Min-Max Scaled DataFrame within Range: ")
print(minmax_range_df)
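The Min-Max formula x' = (x - min) / (max - min) can be verified by hand on a tiny column (illustrative values, not the random data above):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-feature column
x = np.array([[10.0], [20.0], [40.0]])

# Manual formula: x' = (x - min) / (max - min)
manual = (x - x.min()) / (x.max() - x.min())

# sklearn's MinMaxScaler gives the same result
scaled = MinMaxScaler().fit_transform(x)
print(manual.ravel())  # 10 -> 0, 20 -> 1/3, 40 -> 1
```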


FIG:5.1. Display The Data Original Dataset:-

FIG:5.2. Display The Data Records with Missing Values:-


5. Write a program to find the missing values in a randomly
generated dataset using numpy and then replace the
missing values with the mean of the respective features.
1.Original Dataset:
import numpy as np
import pandas as pd
np.random.seed(2)
num_samples = 100
num_features = 5
data = np.random.rand(num_samples, num_features)
random_row = np.random.choice(num_samples, size = 20, replace= False)
random_cols = np.random.choice(num_features, size = 20, replace= True)
data[random_row, random_cols]=np.nan
labels = np.random.randint(0,2, size=num_samples)
columns = [f"Feature_{i}"for i in range(1, num_features +1)]
df = pd.DataFrame(data, columns=columns)
df["Label"] = labels
print("Original Dataset:")
print(df.head())

2.Missing Values
missing_records =df[df.isnull().any(axis=1)]
print("\nRecords with Missing Values:")
print(missing_records)
FIG:5.3. Display The Data Count of Missing Values:-

FIG:5.4.Display The Data Filled mean:-

FIG:5.5.Display The Data Dataset after Filling Missing Values:-


3.Count of Missing Values:-
print("\nCount of Missing Values:")
print(df.isnull().sum())

4. Filled mean:-
df_filled_mean = df.fillna(df.mean())
df_filled_mean

5. Dataset after Filling Missing Values:-


print("\nDataset after Filling Missing Values:")
print(df_filled_mean.head())
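An equivalent sketch using sklearn's SimpleImputer, which performs the same column-mean imputation as df.fillna(df.mean()) (toy array for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy array with one NaN per column (illustrative values)
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])
imputer = SimpleImputer(strategy='mean')
X_filled = imputer.fit_transform(X)
print(X_filled)  # NaNs replaced by the column means 2.0 and 3.0
```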
FIG:6.1.Display The Data Generate Random Dataset

FIG:6.2.Display The Data Generate Random Dataset Company Name

FIG:6.3.Display The Data Generate Random Dataset Experience


6. Write a program to generate a random dataset containing
the fields experience, age, salary, and company name for
interview candidates.
Sol:-
import numpy as np
import pandas as pd
np.random.seed(42)
num_samples = 100
num_features = 4
data = np.random.rand(num_samples, num_features)
random_row = np.random.choice(num_samples, size=20, replace=False)
random_cols = np.random.choice(num_features, size=20, replace=True)
random_cols
# Generate Random Dataset Company Name
data[random_row, random_cols]= np.nan
labels=
np.random.choice(["Google","TCS","Wipro","Facebook"],size=num_samples,repl
ace=True)
labels
#Experience, Age, salary
age = np.random.choice([30,35,24,21],size=100,replace=True)
salary = np.random.choice([30000,40000,50000,60000],size=100, replace=True)
exp = np.random.choice([5,10,15],size=100,replace=True)
exp
FIG:6.4.Display The Data Generate Random DataFrame

FIG:6.5.Display The Data DataFrame


# Generate DataFrame
data ={
"age":age,
"salary":salary,
"Company": labels
}
columns =["Age","Salary","Exp","Company"]
df = pd.DataFrame(data,columns =columns)
df

df["Age"]= age
df["Salary"]= salary
df["Exp"]= exp
df
FIG:7.1.Display The Data

FIG:7.2.Display The Data


7. Write a program to print the Gini impurity in case of a
decision tree classification problem using the iris dataset.
Sol:-
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
iris =load_iris()
x=iris.data
y=iris.target
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.2,
random_state=42)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(x_train, y_train)
gini_impurities = clf.feature_importances_
for feature_index, gini_impurity in enumerate(gini_impurities):
    print(f"Feature {feature_index}: Gini Impurity = {gini_impurity:.4f}")

best_feature_index = np.argmin(gini_impurities)
print("\nBest Feature (lowest Gini Impurity): Feature", best_feature_index)
y_pred = clf.predict(x_test)
accuracy =accuracy_score(y_test, y_pred)
print("\nAccuracy:",accuracy)
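For reference, the Gini impurity of a label array can also be computed directly as G = 1 - Σ p_i², where p_i is the fraction of samples in class i (a sketch, separate from the per-feature importances printed above):

```python
import numpy as np
from sklearn.datasets import load_iris

# Gini impurity: G = 1 - sum(p_i ** 2)
def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

y = load_iris().target  # 50 samples in each of 3 classes
print(gini(y))          # 1 - 3*(1/3)**2 = 2/3
```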
FIG:8.1.Display The Data
8. Write a program to print the information gain of the same
dataset using a decision tree classifier.
Sol:-
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
iris =load_iris()
x=iris.data
y=iris.target
information_gains = mutual_info_classif(x, y)
for feature_index, information_gain in enumerate(information_gains):
    print(f"Feature {feature_index}: Information Gain = {information_gain:.4f}")
FIG:8.2.Display The Data
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.2,
random_state=42)
num_selected_features =2
selected_feature_indices = np.argsort(information_gains)[-num_selected_features:]
clf = DecisionTreeClassifier(random_state=42)
clf.fit(x_train[:,selected_feature_indices], y_train)
y_pred = clf.predict(x_test[:,selected_feature_indices])
accuracy =accuracy_score(y_test, y_pred)
print("\nAccuracy:",accuracy)

print("\nOriginal Predicted")
for i in range(len(y_test)):
    print(f"{y_test[i]} {y_pred[i]}")
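Information gain can likewise be computed by hand as H(parent) minus the weighted entropy of the children, with entropy H = -Σ p_i log2 p_i (the toy labels below are illustrative):

```python
import numpy as np

# Entropy: H = -sum(p_i * log2(p_i))
def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Hypothetical parent split perfectly into two pure children
parent = np.array([0, 0, 1, 1])
left, right = np.array([0, 0]), np.array([1, 1])
gain = entropy(parent) - (0.5 * entropy(left) + 0.5 * entropy(right))
print(gain)  # 1.0 bit: a perfect split removes all uncertainty
```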
FIG:9.1.Display The Data
9. Write a program to predict whether a person survived
the Titanic accident using a decision tree classifier.
Sol:-
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
titanic_data = pd.read_csv('D:\\titanic.csv')
features =['Pclass','Sex','Age','SibSp','Parch','Fare','Embarked']
target ='Survived'
titanic_data = titanic_data[features +[target]].dropna()
titanic_data
titanic_data['Sex'] = titanic_data['Sex'].map({'male': 0, 'female': 1})
titanic_data['Embarked']= titanic_data['Embarked'].map({'S':0,'C':1,'Q':2})
x = titanic_data[features]
y = titanic_data[target]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2,
random_state=42)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(x_train, y_train)
FIG:9.2.Display The Data

first_10_records = x_test.head(10)
predictions = clf.predict(first_10_records)
for i, prediction in enumerate(predictions):
    passenger_data = first_10_records.iloc[i]
    survived = "Survived" if prediction == 1 else "Not Survived"
    print(f"Passenger {i + 1}: {survived} - {passenger_data}")

FIG:10.Display The Data


Q10:- Write a program to predict the diabetes levels of
patients using the linear regression algorithm.
ANSWER: - CODE: -
import pandas as pd
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
diabetes = load_diabetes()
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
df['target'] = diabetes.target
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=0)
model = LinearRegression()
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print('Score:', score)
print("The diabetes level of the first patient is: ")
print(model.predict([X_test[0]]))
FIG:11.Display The Data
Q11: -Write a program to predict the category of iris flower
using logistic regression.
CODE: -
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
flowers = ["Setosa", "Versicolor", "Virginica"]
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
test_size=0.2,random_state=23)
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
print("Accuracy:", logreg.score(X_test, y_test))
print("The original values are: ")
for i in range(5):
    print(flowers[y_test[i]], end=",")
print()
print("The predicted values are: ")
for i in range(5):
    print(flowers[y_pred[i]], end=",")
FIG:12.Display The Data
Q12. Write a program to perform K-Means clustering.

ANSWER: -
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Define the data points


X = np.array([[2, 3], [3, 2], [3, 3], [4, 3], [6, 5], [7, 5], [8, 5], [7, 6], [8, 6]])
# Define the number of clusters
k = 2

# Run the KMeans algorithm


kmeans = KMeans(n_clusters=k, random_state=0).fit(X)
# Get the cluster labels and centroids
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Plot the scatter plot with the points and their assigned clusters
plt.scatter(X[:,0], X[:,1], c=labels)
plt.scatter(centroids[:,0], centroids[:,1], marker='*', s=300, c='red')
plt.title('Partitioned Clustering with Two Clusters')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
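Once fitted, the same KMeans model can assign new points to their nearest centroid with predict(); the array below mirrors the data above (a sketch, with n_init set explicitly for consistent behavior across sklearn versions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Same points as the clustering example above
X = np.array([[2, 3], [3, 2], [3, 3], [4, 3], [6, 5],
              [7, 5], [8, 5], [7, 6], [8, 6]])
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

# Assign two new (hypothetical) points to their nearest centroid
new_points = np.array([[3, 3], [7, 5]])
new_labels = kmeans.predict(new_points)
print(new_labels)  # the two points land in different clusters
```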
FIG:13.Display The Data
Q13.Write a program to predict the category of iris flower
using SVC with accuracy.
ANSWER: -
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
test_size=0.3, random_state=42)
lst = ["Setosa", "Versicolor", "Virginica"]
clf = SVC(kernel='linear')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
for i in range(5):
    print("Predicted value:")
    print(lst[y_pred[i]])
    print("Actual value:")
    print(lst[y_test[i]])
    print()
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
FIG:14.Display The Data
Q14.Write a program to predict the category of iris flower
using Decision Tree Classifier with accuracy.
ANSWER: -
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
test_size=0.3, random_state=42)
lst = ["Setosa", "Versicolor", "Virginica"]
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
for i in range(5):
    print("Predicted value: ", end="")
    print(lst[y_pred[i]])
    print("Actual value: ", end="")
    print(lst[y_test[i]])
    print()
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
print(y_pred)
FIG:15.Display The Data
Q15: - Write a program to predict the category of iris flower
using Random Forest Classifier with accuracy.
ANSWER: -
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
test_size=0.3, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

lst = ["Setosa", "Versicolor", "Virginica"]
y_pred = clf.predict(X_test)
for i in range(5):
    print("Actual value: ", end="")
    print(lst[y_test[i]])
    print("Predicted value: ", end="")
    print(lst[y_pred[i]])
    print()

acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
print(y_pred)
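To break the accuracy down per class, a confusion matrix can be added on the same split (rows are actual classes, columns are predicted classes; same settings as above):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Same split and model settings as the program above
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Rows: actual classes; columns: predicted classes
cm = confusion_matrix(y_test, clf.predict(X_test))
print(cm)  # diagonal entries are correctly classified samples
```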
FIG:16.Display The Data
Q16:- Write a program to plot a sine curve.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
x= np.linspace(0,2 *np.pi, 100)
y =np.sin(x)

plt.plot(x,y)
plt.xlabel('x values between 0 and 2pi',fontsize=18)
plt.ylabel('sin(x) values',fontsize=18)
plt.title('sine curve plot', fontweight="bold", color="r", fontsize=20)
plt.show()
FIG:17.Display The Data
Q17:- Write a program to create a scatter plot.

x = np.random.rand(50)
y = np.random.rand(50)
plt.scatter(x,y,marker="D")
plt.xlabel('x values',fontsize=18)
plt.ylabel('y values',fontsize=18)
plt.title('scatter plot',fontsize=18)
plt.show()
FIG:18.Display The Data
Q18: - Write a program to Create a bar graph

x = ['A','B','C','D']
y = [10,20,15,25]
plt.bar(x,y)
plt.xlabel('Categories',fontsize=18)
plt.ylabel('Values',fontsize=18)
plt.title('Bar plot', fontsize=20, fontweight="bold", color="g")
plt.show()
FIG:19.Display The Data
Q19: - Write a program to create a histogram.

data = np.random.randn(1000)
plt.hist(data, bins=30)
plt.xlabel('Values',fontsize=18,fontweight="bold",color="r")
plt.ylabel('Frequency',fontsize=18,fontweight="bold",color="r")
plt.title('Histogram',fontsize=20,fontweight="bold",color="g")
plt.grid(True,color="r",alpha=0.2)
plt.savefig(r'K:\example_plot.png')
plt.show()
FIG:20.Display The Data
Q20:- Write a program to create subplots.

import matplotlib.pyplot as plt


x =[1,2,3,4,5]
y1 =[2,4,6,8,10]
y2 =[3,6,9,12,15]

fig, axes = plt.subplots(2)

axes[0].plot(x,y1)
axes[0].set_title('Plot 1')
axes[0].set_xlabel('x-axis')
axes[0].set_ylabel('y-axis')

axes[1].plot(x,y2)
axes[1].set_title('Plot 2')
axes[1].set_xlabel('x-axis')
axes[1].set_ylabel('y-axis')

plt.tight_layout()
plt.show()
FIG:21.1Display The Data

FIG:21.2Display The Data


Q.21 Create a dataset of students having fields age, name,
maths marks, english marks and gender. Using Python's
matplotlib module, plot a scatter plot of the ages, a bar
graph of names and ages, a plot between maths marks and
english marks, a bar graph of genders, and subplots of all
the plots in a single figure.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
num_students = 50
ages = np.random.randint(18, 25, size=num_students)
names = [f"Student {i}" for i in range(1, num_students + 1)]
maths_marks = np.random.randint(50, 100, size=num_students)
english_marks = np.random.randint(50, 100, size=num_students)
genders = np.random.choice(['Male', 'Female'], size=num_students)

21.2:- PERFORMING THE SCATTER PLOT OF AGES OF STUDENTS USING MATPLOTLIB MODULE OF PYTHON:-

x = ['A','B','C','D','E','F','G','H','I','J']
y = [15, 20, 16, 18, 17, 25, 22, 14, 23, 27]
plt.scatter(x, y,marker="d")
plt.xlabel('Students',fontsize=18)
plt.ylabel('Ages of students',fontsize=18)
plt.title('Scatter Plot',fontsize=18)
plt.show()
FIG:21.3Display The Data

FIG:21.4Display The Data

21.3:- PERFORMING THE BAR-GRAPH OF AGES AND NAMES OF STUDENTS USING MATPLOTLIB MODULE OF PYTHON:-

x = ['A','B','C','D','E','F','G','H','I','J']
y = [15, 20, 16, 18, 17, 25, 22, 14, 23, 27]
plt.bar(x, y)
plt.xlabel('Name',fontsize=18)
plt.ylabel('Ages',fontsize=18)
plt.title('Bar graph of names with ages', fontsize=20, fontweight="bold", color="r")
plt.show()

21.4:- PERFORMING THE BAR-GRAPH OF GENDERS AND AGES OF STUDENTS BY USING MATPLOTLIB MODULE OF PYTHON:-

import numpy as np
import matplotlib.pyplot as plt

X = ['Group A','Group B','Group C','Group D']


Ygirls = [10,20,20,40]
Zboys = [20,30,25,30]

X_axis = np.arange(len(X))

plt.bar(X_axis - 0.2, Ygirls, 0.4, label = 'Girls')


plt.bar(X_axis + 0.2, Zboys, 0.4, label = 'Boys')

plt.xticks(X_axis, X)
plt.xlabel("Groups")
plt.ylabel("Number of Students")
plt.title("Number of Students in each group")
plt.legend()
plt.show()
FIG:21.5Display The Data
21.5:-PERFORMING THE BAR-GRAPH OF MARKS OF MATHS,
ENGLISH OF STUDENTS BY USING MATPLOTLIB MODULE OF
PYTHON:-
import numpy as np
import matplotlib.pyplot as plt

Maths = [15, 25, 20, 40]
English = [14, 23, 51, 37]
n = 4
r = np.arange(n)
width = 0.25

plt.bar(r, Maths, color='b',
        width=width, edgecolor='black',
        label='Maths')
plt.bar(r + width, English, color='g',
        width=width, edgecolor='black',
        label='English')

plt.xlabel("Students")
plt.ylabel("Marks")
plt.title("Maths and English Marks of Students")
plt.grid(linestyle='--')
plt.xticks(r + width/2, ['A', 'B', 'C', 'D'])
plt.legend()
plt.show()
FIG:22Display The Data

Q.22 Create a dataset of crop diseases from Kaggle by
using the SVM algorithm.
ANSWER:-
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

root = "/kaggle/input/top-agriculture-crop-disease/Crop Diseases"
X, y = datasets.make_classification(n_samples=1000, n_features=20, n_classes=2,
random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

lst = ["crop", "disease", "wheat"]
clf = SVC(kernel='linear')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
for i in range(5):
    print("Predicted value:")
    print(lst[y_pred[i]])
    print("Actual value:")
    print(lst[y_test[i]])
    print()
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
FIG:23Display The Data

Q.23 Create a dataset of crop diseases from Kaggle by
using a Random Forest Classifier.
ANSWER:-
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

root = "/kaggle/input/top-agriculture-crop-disease/Crop Diseases"
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
test_size=0.3, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

lst = ["crop", "disease", "non-disease"]

y_pred = clf.predict(X_test)

for i in range(5):
    print("Actual value: ", end="")
    print(lst[y_test[i]])
    print("Predicted value: ", end="")
    print(lst[y_pred[i]])
    print()

FIG:24Display The Data


Q.24 Create a dataset of crop diseases from Kaggle by
using the Naive Bayes method.
ANSWER:-
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Dataset root
root = "/kaggle/input/top-agriculture-crop-disease/Crop Diseases"

# Generate a synthetic classification dataset and split it into training and testing sets
X, y = datasets.make_classification(n_samples=1000, n_features=20, n_classes=2,
random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Train a naive Bayes classifier
clf = GaussianNB()
clf.fit(X_train, y_train)

# Use the trained classifier to make predictions on the test set
y_pred = clf.predict(X_test)

# Evaluate the accuracy of the classifier
acc = accuracy_score(y_test, y_pred)
print("Accuracy : {:.3f}".format(acc))
print(y_pred)
FIG:25Display The Data
Q.25 Create a dataset of crop diseases from Kaggle by
using a Decision Tree Classifier.
ANSWER:-
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

root = "/kaggle/input/top-agriculture-crop-disease/Crop Diseases"
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
test_size=0.3, random_state=42)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
lst = ["crop", "cropdisease", "non-cropdisease"]

y_pred = clf.predict(X_test)
for i in range(5):
    print("Actual value: ", end="")
    print(lst[y_test[i]])
    print("Predicted value: ", end="")
    print(lst[y_pred[i]])
    print()
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
print(y_pred)
