Lab Manual-MLT
LABORATORY MANUAL
Sl. No.  Name of the Exercises
1. Implementation of decision trees for real world problem
Ex.No:1 IMPLEMENTATION OF DECISION TREES FOR A REAL-WORLD PROBLEM
Aim:
To implement the concept of decision trees on a suitable real-world data set and use the
trained tree to classify new samples.
Algorithm:
Step 1: Start the program
Step 2: Import the modules which are needed
Step 3: Load the dataset
Step 4: Perform various basic operations with the loaded dataset to avoid missing values
Step 5: Implement the decision tree algorithm to solve the real-world problem
Program
import pandas
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt

# load the data set
df = pandas.read_csv("C:\\Users\\IT\\Downloads\\data - Sheet1.csv")
# map the text columns to numerical values
d = {'UK': 0, 'USA': 1, 'N': 2}
df['Nationality'] = df['Nationality'].map(d)
d = {'YES': 1, 'NO': 0}
df['Go'] = df['Go'].map(d)
# separate the feature columns from the target column
features = ['Age', 'Experience', 'Rank', 'Nationality']
X = df[features]
y = df['Go']
# fit the decision tree and visualise it
dtree = DecisionTreeClassifier()
dtree = dtree.fit(X, y)
tree.plot_tree(dtree, feature_names=features)
plt.show()
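The aim also calls for classifying a new sample with the fitted tree; a minimal sketch (the feature values below are hypothetical and follow the same column order as features):

# predict whether to 'Go' for a hypothetical candidate: age 40, 10 years of
# experience, rank 7, nationality USA (encoded as 1)
print(dtree.predict([[40, 10, 7, 1]]))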
Output:
Result:
Thus, the implementation of decision trees for a suitable real-world problem has been
completed successfully.
Ex.No:2 DETECTION OF SPAM MAILS USING SUPPORT VECTOR MACHINE
Aim:
To implement the detection of spam mails using Support Vector Machine
Algorithm:
Step 1: Start the program
Step 2: Import the modules which are needed
Step 3: Load the dataset
Step 4: Perform various basic operations with the loaded dataset to avoid missing values
Step 5: Implement the Support Vector Machine algorithm to detect the spam mails
Program
import numpy as np
import pandas as pd

# load the spam data set and inspect it
data = pd.read_csv("C:\\Users\\IT\\Downloads\\spam.csv")
print(data)
print(data.info())
# separate the mail text (features) from the spam/ham label (target)
X = data['EmailText'].values
print(X)
Y = data['Label'].values
print(Y)
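The listing above only loads the data and separates the text from the labels; a minimal sketch of the remaining Support Vector Machine steps from the algorithm (the vectorizer and kernel choices here are assumptions, not taken from the original listing):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# turn the raw mail text into a bag-of-words feature matrix
cv = CountVectorizer()
X_features = cv.fit_transform(X)

# hold out 30% of the mails for testing
x_train, x_test, y_train, y_test = train_test_split(X_features, Y, test_size=0.3, random_state=42)

# train a support vector classifier and report its accuracy on the test mails
model = SVC(kernel='linear')
model.fit(x_train, y_train)
print("Test accuracy:", model.score(x_test, y_test))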
Output
Result:
Thus, the detection of spam mails using Support Vector Machine has been completed
successfully.
Ex.No:3 IMPLEMENTATION OF FACIAL RECOGNITION USING ARTIFICIAL NEURAL NETWORK
Aim:
To implement a facial recognition application with an Artificial Neural Network.
Algorithm:
Step 1: Start the program
Step 2: Import the modules which are needed
Step 3: Load the dataset
Step 4: Perform various basic operations with the loaded dataset to avoid missing values
Step 5: Implement the artificial neural network for facial recognition
Program
import numpy as np
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout, BatchNormalization
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
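# (The data-loading step is not shown in the original listing; a minimal sketch,
#  assuming the face images are arranged one sub-folder per person under the
#  hypothetical directories 'faces/train' and 'faces/val'.)
datagen = ImageDataGenerator()
train_generator = datagen.flow_from_directory('faces/train', target_size=(224, 224), batch_size=32, class_mode='categorical')
val_generator = datagen.flow_from_directory('faces/val', target_size=(224, 224), batch_size=32, class_mode='categorical')
# class names in the order used by the generators, needed for the output layer and predictions
classes = list(train_generator.class_indices.keys())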
model = Sequential()
model.add(Conv2D(32, kernel_size = (3, 3), activation='relu',
input_shape=(224,224,3)))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(BatchNormalization())
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
#model.add(Dropout(0.3))
model.add(Dense(len(classes),activation='softmax'))
model.summary()
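# (The compile/fit step that produces 'history' is also not shown in the listing;
#  a minimal sketch, assuming the generators defined above and a small number of epochs.)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(train_generator, epochs=10, validation_data=val_generator)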
plt.plot(history.history['accuracy'])
plt.plot(history.history['loss'])
plt.xlabel('Time')
plt.legend(['accuracy', 'loss'])
plt.show()
def predict_image(image_path):
    # load the image at the size expected by the network and display it
    img = image.load_img(image_path, target_size=(224, 224))
    plt.imshow(img)
    plt.show()
    # convert to a batch of one and run it through the model
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    images = np.vstack([x])
    pred = model.predict(images)
    print("Actual: " + (image_path.split("/")[-1]).split("_")[0])
    print("Predicted: " + classes[np.argmax(pred)])

predict_image("../input/face-recognition-dataset/Original Images/Original Images/Brad Pitt/Brad Pitt_102.jpg")
Output:
Result:
Thus, the artificial neural network for facial recognition has been implemented
successfully.
Ex.No:4 IMPLEMENTATION OF AMAZON TOOLKIT: SAGEMAKER
Aim:
To implement the Amazon toolkit: SageMaker.
Procedure:
1. Setting up SageMaker
11. Python code for importing the data set from S3 storage and using it in the machine
learning pipeline in an Amazon SageMaker notebook (a sketch is shown below)
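The manual does not reproduce the code for step 11; a minimal sketch of reading a CSV data set from S3 inside a SageMaker notebook, assuming a hypothetical bucket name and object key, might look like this:

import boto3
import pandas as pd

# hypothetical bucket and key; replace with the S3 location used in the lab
bucket = "my-sagemaker-bucket"
key = "datasets/train.csv"

# read the object straight from S3 into a pandas DataFrame
s3 = boto3.client("s3")
obj = s3.get_object(Bucket=bucket, Key=key)
df = pd.read_csv(obj["Body"])
print(df.head())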
Ex.No:5 CHARACTER RECOGNITION USING MULTILAYER PERCEPTRON
Aim:
To implement character recognition using a Multilayer Perceptron.
Algorithm:
Step 1: Start the program
Step 2: Import the modules which are needed
Step 3: Load the dataset
Step 4: Perform various basic operations with the loaded dataset to avoid missing values
Step 5: Implement the multilayer perceptron for character recognition
Program
# importing modules
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Activation
import matplotlib.pyplot as plt
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Cast the records into float values
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
# normalize image pixel values by dividing
# by 255
gray_scale = 255
x_train /= gray_scale
x_test /= gray_scale
print("Feature matrix:", x_train.shape)
print("Target matrix:", x_test.shape)
print("Feature matrix:", y_train.shape)
print("Target matrix:", y_test.shape)
fig, ax = plt.subplots(10, 10)
k = 0
for i in range(10):
    for j in range(10):
        ax[i][j].imshow(x_train[k].reshape(28, 28), aspect='auto')
        k += 1
plt.show()
model = Sequential([
    # reshape 28 row * 28 column data to 28*28 rows
    Flatten(input_shape=(28, 28)),
    # dense layer 1
    Dense(256, activation='sigmoid'),
    # dense layer 2
    Dense(128, activation='sigmoid'),
    # output layer
    Dense(10, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10,batch_size=2000,validation_split=0.2)
results = model.evaluate(x_test, y_test, verbose = 0)
print('test loss, test acc:', results)
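Once evaluated, the trained network can also be used to recognise an individual character; a short sketch using the first test image from the variables defined above:

# predict the digit in the first test image and compare it with the true label
pred = model.predict(x_test[:1])
print("Predicted digit:", np.argmax(pred), "Actual digit:", y_test[0])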
Output:
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-
datasets/mnist.npz
11493376/11490434 [==============================] – 2s 0us/step
Feature matrix: (60000, 28, 28)
Target matrix: (10000, 28, 28)
Feature matrix: (60000,)
Target matrix: (10000,)
test loss, test acc: [0.27210235595703125, 0.9223999977111816]
Result:
Thus the character recognition using multilayer perceptron was implemented
successfully.
Ex.No:6 IMPLEMENTATION OF THE NON-PARAMETRIC LOCALLY WEIGHTED REGRESSION ALGORITHM
Aim:
To implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points, selecting an appropriate data set and drawing graphs.
Algorithm:
Step 1: Start the program
Step 2: Import the modules which are needed
Step 3: Generate or load the dataset
Step 4: Perform various basic operations with the loaded dataset to avoid missing values
Step 5: Implement the locally weighted regression algorithm and plot the fitted curves
Program
import numpy as np
from bokeh.plotting import figure, show
from bokeh.layouts import gridplot

def radial_kernel(x0, X, tau):
    # Gaussian weights centred on the query point x0; tau controls the bandwidth
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))

def local_regression(x0, X, Y, tau):
    # add the bias term and solve the locally weighted least-squares problem
    x0 = np.r_[1, x0]
    X = np.c_[np.ones(len(X)), X]
    xw = X.T * radial_kernel(x0, X, tau)
    # predict value at x0
    return x0 @ np.linalg.pinv(xw @ X) @ xw @ Y

# generate dataset
n = 1000
X = np.linspace(-3, 3, num=n)
Y = np.log(np.abs(X ** 2 - 1) + .5)
# jitter X
X += np.random.normal(scale=.1, size=n)

def plot_lwr(tau):
    # predict over a smooth domain and plot against the noisy data
    domain = np.linspace(-3, 3, num=300)
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    plot = figure(title='tau=%g' % tau)
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')
    return plot

show(gridplot([[plot_lwr(10.), plot_lwr(1.)], [plot_lwr(0.1), plot_lwr(0.01)]]))
Output
Result:
Thus, the implementation of the non-parametric locally weighted regression algorithm has
been completed successfully.
Ex.No:7 SENTIMENT ANALYSIS USING RANDOM FOREST OPTIMIZATION ALGORITHM
Aim:
To implement sentiment analysis using the Random Forest optimization algorithm.
Algorithm:
Step 1: Start the program
Step 2: Import the modules which are needed
Step 3: Load the dataset
Step 4: Perform various basic operations with the loaded dataset to avoid missing values
Step 5: Implement the Random Forest algorithm for sentiment analysis
Program:
import numpy as np  # linear algebra
import pandas as pd
import re
import nltk
import os
import matplotlib.pyplot as plt

# For example, running this (by clicking run or pressing Shift+Enter) will list
# all files under the input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

imdb = pd.read_csv("C:\\Users\\produ\\Downloads\\new\\Train.csv")
print(imdb.info())
print(imdb.shape)
print(imdb.head(10))

# pie chart of how many reviews carry each sentiment label
imdb['label'].value_counts().plot.pie(figsize=(6,6), title="Distribution of reviews per sentiment", labels=['',''], autopct='%1.1f%%')
labels = ["Positive", "Negative"]
plt.legend(labels, loc=3)
plt.gca().set_aspect('equal')

# separate the review text (features) from the sentiment labels
features = imdb.drop("label", axis=1)
labels = imdb["label"]
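The listing stops after separating the features and labels; a minimal sketch of the remaining Random Forest steps, assuming the review text is held in a column named text, might look like this:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# turn the raw review text into TF-IDF features (the column name 'text' is an assumption)
tfidf = TfidfVectorizer(max_features=5000, stop_words='english')
X = tfidf.fit_transform(features['text'])

x_train, x_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)

# fit the random forest and report its accuracy on the held-out reviews
rf = RandomForestClassifier(n_estimators=100)
rf.fit(x_train, y_train)
print("Accuracy:", accuracy_score(y_test, rf.predict(x_test)))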
Output:
Result:
Thus, the implementation of sentiment analysis using the Random Forest optimization
algorithm was performed successfully using a Python program.
Ex.No:8 CONSTRUCTION OF A BAYESIAN NETWORK USING MEDICAL DATA
Aim:
To construct a Bayesian network from medical data and use it to infer the probability of
heart disease.
Algorithm:
Step 1: Start the program
Step 2: Import the modules which are needed
Step 3: Load the dataset
Step 4: Perform various basic operations with the loaded dataset to avoid missing values
Step 5: Construct the Bayesian network, fit it to the data and query it with user-supplied evidence
Program:
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination
data = pd.read_csv("C:\\Users\\produ\\Desktop\\ds4.csv")
heart_disease = pd.DataFrame(data)
print(heart_disease)
model = BayesianNetwork([
    ('age', 'Lifestyle'),
    ('Gender', 'Lifestyle'),
    ('Family', 'heartdisease'),
    ('diet', 'cholestrol'),
    ('Lifestyle', 'diet'),
    ('cholestrol', 'heartdisease')])
model.fit(heart_disease, estimator=MaximumLikelihoodEstimator)
HeartDisease_infer = VariableElimination(model)
print('For Age enter SuperSeniorCitizen:0, SeniorCitizen:1, MiddleAged:2, Youth:3,
Teen:4')
print('For Gender enter Male:0, Female:1')
print('For Family History enter Yes:1, No:0')
print('For Diet enter High:0, Medium:1')
print('for LifeStyle enter Athlete:0, Active:1, Moderate:2, Sedentary:3')
print('for Cholesterol enter High:0, BorderLine:1, Normal:2')
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={
    'age': int(input('Enter Age: ')),
    'Gender': int(input('Enter Gender: ')),
    'Family': int(input('Enter Family History: ')),
    'diet': int(input('Enter Diet: ')),
    'Lifestyle': int(input('Enter Lifestyle: ')),
    'cholestrol': int(input('Enter Cholestrol: '))})
print(q)
Output:
Result:
Thus, the construction of a Bayesian network using medical data was implemented
successfully in Python.
Ex.No:9 ONLINE FRAUD DETECTION USING MACHINE LEARNING
Aim:
To implement online fraud detection using a suitable machine learning algorithm
(Random Forest).
Algorithm:
Step 1: Start the program
Step 2: Import the modules which are needed
Step 3: Load the dataset
Step 4: Perform various basic operations with the loaded dataset to avoid missing values
Step 5: Implement the Random Forest algorithm for online fraud detection
Program:
import pandas as pd

# load the credit card transactions and drop rows with missing values
data = pd.read_csv("credit.csv", encoding='windows-1252').dropna()
print(data)

# 'Class' is the target (fraud / not fraud); everything else is a feature
X = data.drop(['Class'], axis=1)
print(X)
Y = data['Class']
print(Y)

# hold out 30% of the transactions for testing
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.30)
print(x_train)
print(x_test)
print(y_train)
print(y_test)

# train a random forest on the training split and predict on the test split
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=100)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print(y_pred)

# measure the accuracy of the predictions
from sklearn.metrics import accuracy_score
acc1 = accuracy_score(y_test, y_pred)
print(acc1)
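Credit card fraud data sets are highly imbalanced, so accuracy alone can look deceptively high; a short sketch of additional metrics on the same split (an addition, not part of the original listing):

from sklearn.metrics import confusion_matrix, classification_report

# per-class view of how many fraudulent transactions were actually caught
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))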
Output:
Result:
Thus, online fraud detection using the Random Forest algorithm was implemented
successfully.
Ex.No:10 IMPLEMENTATION OF A SOCIALLY RELEVANT PROBLEM USING MACHINE LEARNING
Aim:
To write a Python program to implement a socially relevant problem (stock price
prediction) which needs a machine learning solution.
Algorithm:
Step 1: Start the program
Step 2: Import the modules which are needed
Step 3: Load the dataset
Step 4: Perform various basic operations with the loaded dataset to avoid missing values
Step 5: Implement an LSTM model to predict the closing stock price
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib.pylab import rcParams
rcParams['figure.figsize']=20,10
from keras.models import Sequential
from keras.layers import LSTM,Dropout,Dense
from sklearn.preprocessing import MinMaxScaler
df=pd.read_csv("NSE-TATA.csv")
df.head()
df["Date"]=pd.to_datetime(df.Date,format="%Y-%m-%d")
df.index=df['Date']
plt.figure(figsize=(16,8))
plt.plot(df["Close"],label='Close Price history')
data=df.sort_index(ascending=True,axis=0)
new_dataset=pd.DataFrame(index=range(0,len(df)),columns=['Date','Close'])
for i in range(0, len(data)):
    new_dataset["Date"][i] = data['Date'][i]
    new_dataset["Close"][i] = data['Close'][i]
new_dataset.index = new_dataset.Date
new_dataset.drop("Date", axis=1, inplace=True)
final_dataset = new_dataset.values
train_data = final_dataset[0:987,:]
valid_data = final_dataset[987:,:]
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(final_dataset)
x_train_data,y_train_data=[],[]
for i in range(60, len(train_data)):
    x_train_data.append(scaled_data[i-60:i,0])
    y_train_data.append(scaled_data[i,0])
x_train_data,y_train_data=np.array(x_train_data),np.array(y_train_data)
x_train_data=np.reshape(x_train_data,(x_train_data.shape[0],x_train_data.shape[1],1))
lstm_model=Sequential()
lstm_model.add(LSTM(units=50, return_sequences=True, input_shape=(x_train_data.shape[1], 1)))
lstm_model.add(LSTM(units=50))
lstm_model.add(Dense(1))
inputs_data=new_dataset[len(new_dataset)-len(valid_data)-60:].values
inputs_data=inputs_data.reshape(-1,1)
inputs_data=scaler.transform(inputs_data)
lstm_model.compile(loss='mean_squared_error',optimizer='adam')
lstm_model.fit(x_train_data,y_train_data,epochs=1,batch_size=1,verbose=2)
X_test=[]
for i in range(60, inputs_data.shape[0]):
    X_test.append(inputs_data[i-60:i,0])
X_test=np.array(X_test)
X_test=np.reshape(X_test,(X_test.shape[0],X_test.shape[1],1))
predicted_closing_price=lstm_model.predict(X_test)
predicted_closing_price = scaler.inverse_transform(predicted_closing_price)
lstm_model.save("saved_model.h5")
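To see how well the model tracks the held-out prices, the predicted closing values can be plotted against the actual ones; a minimal sketch using the variables defined above:

# align the predictions with the validation portion of the data and plot both
train = new_dataset[:987]
valid = new_dataset[987:].copy()
valid['Predictions'] = predicted_closing_price.ravel()
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])
plt.legend(['Train', 'Actual', 'Predicted'])
plt.show()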
Output:
Result:
Thus, a socially relevant problem requiring a machine learning solution (stock price
prediction) was implemented successfully using a Python program.
Ex.No:11 SENTIMENT ANALYSIS USING LEXICON CLASSIFICATION ALGORITHM
Aim:
To write a Python program to implement sentiment analysis using a Lexicon
classification algorithm.
Algorithm:
Step 1: Start the program
Step 2: Import the modules which are needed
Step 3: Load the dataset
Step 4: Perform various basic operations with the loaded dataset to avoid missing values
Step 5: Implement the Lexicon classification algorithm for sentiment analysis
Program
import numpy as np
import pandas as pd
import re
import os
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
from nltk.stem import WordNetLemmatizer
from nltk.sentiment.vader import SentimentIntensityAnalyzer

print(os.listdir("../input"))
pd.set_option('display.max_columns', None)
US_comments = pd.read_csv('../input/youtube/UScomments.csv', error_bad_lines=False)
US_videos = pd.read_csv('../input/youtube/USvideos.csv', error_bad_lines=False)
US_videos.head()
US_videos.shape
US_videos.nunique()
US_videos.info()
US_videos.head()
US_comments.head()
US_comments.shape
US_comments.isnull().sum()
US_comments.dropna(inplace=True)
US_comments.isnull().sum()
US_comments.shape
US_comments.nunique()
US_comments.info()
US_comments.drop(41587, inplace=True)
US_comments = US_comments.reset_index().drop('index', axis=1)
US_comments.likes = US_comments.likes.astype(int)
US_comments.replies = US_comments.replies.astype(int)
US_comments.head()
# keep only alphabetic characters, drop very short words and lower-case the text
US_comments['comment_text'] = US_comments['comment_text'].str.replace("[^a-zA-Z#]", " ", regex=True)
US_comments['comment_text'] = US_comments['comment_text'].apply(lambda x: ' '.join([w for w in x.split() if len(w) > 3]))
US_comments['comment_text'] = US_comments['comment_text'].apply(lambda x: x.lower())
tokenized_tweet = US_comments['comment_text'].apply(lambda x: x.split())
tokenized_tweet.head()
wnl = WordNetLemmatizer()
tokenized_tweet.head()
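# (The lemmatisation step itself is not shown in the listing; a minimal sketch using
#  the WordNetLemmatizer created above, before the tokens are joined back into text.
#  nltk.download('wordnet') may be required the first time this is run.)
tokenized_tweet = tokenized_tweet.apply(lambda tokens: [wnl.lemmatize(w) for w in tokens])
tokenized_tweet = tokenized_tweet.apply(lambda tokens: ' '.join(tokens))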
US_comments['comment_text'] = tokenized_tweet
import nltk
nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()
US_comments.head()
US_comments.head()
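# (The step that assigns each comment a Sentiment label is not shown in the listing;
#  a minimal sketch using the VADER compound score from the analyser sia above.)
def vader_label(text):
    score = sia.polarity_scores(text)['compound']
    if score > 0:
        return 'Positive'
    elif score < 0:
        return 'Negative'
    return 'Neutral'

US_comments['Sentiment'] = US_comments['comment_text'].apply(vader_label)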
US_comments.Sentiment.value_counts()
videos = []
# percentage of positive comments for each video
for i in range(0, US_comments.video_id.nunique()):
    a = US_comments[(US_comments.video_id == US_comments.video_id.unique()[i]) & (US_comments.Sentiment == 'Positive')].count()[0]
    b = US_comments[US_comments.video_id == US_comments.video_id.unique()[i]]['Sentiment'].value_counts().sum()
    Percentage = (a / b) * 100
    videos.append(round(Percentage, 2))
Positivity = pd.DataFrame(videos, US_comments.video_id.unique()).reset_index()
# name the columns so they can be referenced below (the second column name is assumed)
Positivity.columns = ['video_id', 'Positive Percentage']
Positivity.head()
channels = []
for i in range(0, Positivity.video_id.nunique()):
    channels.append(US_videos[US_videos.video_id == Positivity.video_id.unique()[i]]['channel_title'].unique()[0])
Positivity['Channel'] = channels
Positivity.head()
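# (The word-cloud objects plotted below are not constructed in the listing; a minimal
#  sketch using the wordcloud package, which is an assumed dependency.)
from wordcloud import WordCloud

all_words = ' '.join([text for text in US_comments['comment_text']])
wordcloud = WordCloud(width=800, height=500, random_state=21, max_font_size=110).generate(all_words)

all_words_posi = ' '.join([text for text in US_comments['comment_text'][US_comments.Sentiment == 'Positive']])
wordcloud_posi = WordCloud(width=800, height=500, random_state=21, max_font_size=110).generate(all_words_posi)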
plt.figure(figsize=(10, 7))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis('off')
plt.show()
plt.figure(figsize=(10, 7))
plt.imshow(wordcloud_posi, interpolation="bilinear")
plt.axis('off')
plt.show()
all_words_nega = ' '.join([text for text in US_comments['comment_text'][US_comments.Sentiment == 'Negative']])
wordcloud_nega = WordCloud(width=800, height=500, random_state=21, max_font_size=110).generate(all_words_nega)
plt.figure(figsize=(10, 7))
plt.imshow(wordcloud_nega, interpolation="bilinear")
plt.axis('off')
plt.show()
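# (The neutral word cloud is likewise not constructed above; a minimal sketch.)
all_words_neu = ' '.join([text for text in US_comments['comment_text'][US_comments.Sentiment == 'Neutral']])
wordcloud_neu = WordCloud(width=800, height=500, random_state=21, max_font_size=110).generate(all_words_neu)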
plt.figure(figsize=(10, 7))
plt.imshow(wordcloud_neu, interpolation="bilinear")
plt.axis('off')
plt.show()
Output
['youtube']
(7992, 11)
video_id 2364
title 2398
channel_title 1230
category_id 16
tags 2204
views 7939
likes 6624
dislikes 2531
comment_total 4152
thumbnail_link 2364
date 40
dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7992 entries, 0 to 7991
Data columns (total 11 columns):
video_id 7992 non-null object
title 7992 non-null object
channel_title 7992 non-null object
category_id 7992 non-null int64
tags 7992 non-null object
views 7992 non-null int64
likes 7992 non-null int64
dislikes 7992 non-null int64
comment_total 7992 non-null int64
thumbnail_link 7992 non-null object
0 [logan, paul]
1 [been, following, from, start, your, vine, cha...
2 [kong, maverick]
3 [attendance]
4 [trending]
Name: comment_text, dtype: object
0 [logan, paul]
1 [been, following, from, start, your, vine, cha...
2 [kong, maverick]
3 [attendance]
4 [trending]
Name: comment_text, dtype: object
Positive 305358
Neutral 260986
Negative 125030
Name: Sentiment, dtype: int64
POSITIVE COMMENTS
NEGATIVE COMMENTS
NEUTRAL COMMENTS