0% found this document useful (0 votes)

15 views39 pages

Data Mining Using Python

The document outlines the vision and mission of the Computer Science and Engineering Department, emphasizing the development of theoretical and practical skills in students. It details the Program Educational Objectives, Programme Outcomes, and Programme Specific Outcomes aimed at enhancing employability and addressing societal needs. Additionally, it includes practical coding examples for data preprocessing and similarity measures using Python libraries.

Uploaded by

Renuka Gabbeta

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views39 pages

Data Mining Using Python

Uploaded by

Renuka Gabbeta

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 39

Vision of the Department

TheComputerScience&Engineeringaimsatprovidingcontinuouslystimulatingeducationalenvironmenttoits students for

attaining their professional goals and meet the global challenges.
Missionofthe Department
 DM1: To develop a strong theoretical and practical background across the computer science discipline with
an emphasis on problem solving.
 DM2:Toinculcateprofessionalbehaviourwithstrongethicalvalues,leadershipqualities,innovative thinking and
analytical abilities into the student.
 DM3:Exposethestudents tocuttingedgetechnologies whichenhancetheiremployabilityand knowledge.
 DM4:Facilitatethefaculty tokeeptrackoflatestdevelopmentsintheirresearchareasandencouragethe faculty to
foster the healthy interaction with industry.
ProgramEducationalObjectives (PEOs)
 PEO1:Pursuehighereducation,entrepreneurshipandresearchtocompeteatgloballevel.
 PEO2:Designanddevelop productsinnovativelyincomputerscienceandengineeringand inother allied fields.
 PEO3: Function effectivelyas individuals and as membersof ateam in the conduct ofinterdisciplinaryprojects;
and even at all the levels with ethics and necessary attitude.
 PEO4:Serveever-changingneeds ofsocietywithapragmatic perception.

PROGRAMMEOUTCOMES(POs):

Engineering knowledge: Apply the knowledge of mathematics, science, engineering

PO1 fundamentals,andanengineeringspecializationtothesolutionofcomplex
engineeringproblems.
Problemanalysis:Identify,formulate,reviewresearchliterature,andanalyze complex
PO2 engineeringproblemsreachingsubstantiatedconclusions usingfirst
principlesofmathematics,naturalsciences,andengineeringsciences.
Design/development of solutions: Design solutions for complex engineeringproblems
and design system components or processes that meet the specified needs
PO3
withappropriateconsiderationforthepublichealthandsafety,andthecultural,
societal,and environmental considerations.
Conductinvestigationsofcomplexproblems:Useresearch-basedknowledgeand
PO4 researchmethodsincludingdesignofexperiments,analysisandinterpretationofdata,
andsynthesisof theinformationto providevalid conclusions.
Modern tool usage: Create, select, and apply appropriate techniques, resources,
PO5 andmodernengineeringandITtoolsincludingpredictionandmodelingtocomplex
engineeringactivitieswithan understandingofthe limitations.
Theengineerandsociety:Applyreasoninginformedbythecontextualknowledgeto
PO6 assesssocietal,health,safety,legalandculturalissuesandtheconsequent responsibilities
relevant to the professional engineering practice.
Environmentandsustainability:Understandtheimpactoftheprofessional
PO7 engineeringsolutionsinsocietalandenvironmentalcontexts,anddemonstratethe
knowledgeof, andneedforsustainable development.
DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
Ethics:Applyethicalprinciplesandcommittoprofessionalethicsandresponsibilities
PO8
andnormsof theengineeringpractice.
Individualandteam work:Functioneffectivelyasanindividual,andasamemberor
PO9 leaderindiverseteams,andinmultidisciplinarysettings.
Communication:Communicateeffectivelyoncomplexengineeringactivitieswiththe
engineering community and with society at large, such as, being able to comprehend
PO10
andwriteeffectivereportsanddesigndocumentation,makeeffectivepresentations,
andgiveandreceiveclear instructions.
Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a
PO11
memberandleaderinateam,tomanageprojectsandinmultidisciplinary
environments.
Life-long learning: Recognize the need for, and have the preparation and ability to
PO12 engageinindependentandlife-longlearninginthebroadestcontextoftechnological
change

PROGRAMMESPECIFICOUTCOMES (PSOs):
The ability to apply Software Engineering practices and strategies in software
PSO
projectdevelopmentusingopen-sourceprogrammingenvironmentforthesuccessof
1
organization.
PSO Theabilitytodesignanddevelopcomputerprogramsinnetworking,webapplications
2 andIoTasperthesocietyneeds.
PSO
Toinculcateanabilitytoanalyze,designandimplementdatabaseapplications.
3

AIM: Demonstratingthefollowingdatapreprocessingtasksusingpythonlibraries

Description/Theory:
Pre-processing refers to the transformations applied to our data before feeding it to the algorithm.Data
Preprocessing is a technique that is used to convert the raw data into a clean data set. In other words,
whenever the data is gathered fromdifferent sources it is collected in raw format which is not feasible
for the analysis.

PROCEDURE/CODE:
a) Loadingthedataset
import numpy as np
import pandas as pd
dataset=pd.read_csv('D:/DM/file1.csv')
print(dataset)
DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
x=dataset.iloc[:,:-1].values

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
print(x)

b) Identifyingthedependentandindependentvariables.
dataset=pd.read_csv('D:/DM/file2.csv')
print(dataset)
x=dataset.iloc[:,:-1].values
y=dataset.iloc[:,-1].values
print(x)
print(y)
x=dataset.iloc[1:,1:-1].values
y=dataset.iloc[:,[0,4]].values
z=dataset.iloc[:0].values
print(x)
print(y)
print(z)

c) Dealingwithmissingdata
from sklearn.impute import SimpleImputer
imputer=SimpleImputer(missing_values=np.nan,strategy='mean')
imputer=imputer.fit(x[:,1:3])x[:,1:3]=imputer.transform(x[:,1:3])

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
print(x)2 Page|2
imputer=imputer.fit(x[:,1:])
x[:,1:]=imputer.transform(x[:,1:])
print(x)

importnumpyasnp
from sklearn.impute import SimpleImputer
imp=SimpleImputer(missing_values=np.nan,strategy='mean') imp.fit([[1,2],[np.nan,3],
[7,6]])
SimpleImputer() x=[[np.nan,2],[6,np.nan],
[7,6]] print(imp.transform(x))

OUTPUT:
a) Roll NameAgeBranchMarks1Marks2Marks3
0 1Ramesh 19 CSE 28 27 25
1 2Suresh 19 IT 24 29 26
2 3Geetha 18 EEE 19 28 27
3 4Seetha 19 ECE 26 19 28
4 5Sheela19 MECH 27 18 29
5 6 Ram 20 CSE 20 27 20
6 7 Ravi 19 IT 21 26 30
7 8 Mani 19 ECE 30 25 19
8 9 Srinu 20 EEE 28 24 18
9 10 Devi 19 CSE 18 23 23
[[1'Ramesh'19'CSE'2827]
[2'Suresh'19'IT'2429]
[3'Geetha'18'EEE'1928]
[4'Seetha'19'ECE'2619]
[5'Sheela'19'MECH'2718]
[6 'Ram'20 'CSE'2027]
[7'Ravi'19'IT'21 26]
[8'Mani'19 'ECE'3025]
[9'Srinu'20'EEE'2824]
[10'Devi'19'CSE'18 23]]

b) CountryGender Age SalaryPurchased

0 France Male44.072000.0 No
1 SpainFemale27.048000.0 Yes
2 Germany Male30.054000.0 No
3 Spain Male38.061000.0 No

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
4 GermanyFemale40.0 NaN Yes P age|3
5 FranceFemale35.058000.0 Yes
6 SpainFemale NaN52000.0 No

7 France Male48.079000.0 Yes

8 Germany Male50.083000.0 No
9 France Male37.069000.0 Yes
[['France''Male'44.072000.0]
['Spain''Female'27.048000.0]
['Germany''Male'30.054000.0]

c)[['Female'27.048000.0]
['Male'30.0 54000.0]
['Male'38.0 61000.0]
['Female'40.063000.0]
['Female'35.058000.0]
['Female'38.12552000.0]
['Male'48.0 79000.0]
['Male'50.0 83000.0]
['Male'37.069000.0]]

[[4. 2. ]
[6. 3.66666667]
[7. 6. ]]

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
DATE: Page|4

AIM: Demonstratethefollowingdatapreprocessingtasksusingpythonlibraries

Description/Theory:
Categoricaldataisatypeofdatathatisusedtogroupinformationwithsimilarcharacteristics, while
numerical data is a type of data that expresses information inthe form of numbers.
Exampleofcategoricaldata:gender
Categoricalvariablescanbedividedintotwocategories:
Nominal:noparticularorder
Ordinal:thereissomeorderbetweenvalues
PROCEDURE/CODE:

a) Dealingwithcategoricaldata.
importnumpyasnp
from sklearn.compose import ColumnTransformer
fromsklearn.preprocessingimportOneHotEncoder
ct=ColumnTransformer(transformers=[('encoder',OneHotEncoder(),[0])],remainder='passthroug
h')
x=np.array(ct.fit_transform(x))
print(x)
fromsklearn.preprocessingimportLabelEncoder le
= LabelEncoder()
y=le.fit_transform(y)
print(y)

b) Scalingthefeatures.
fromsklearn.preprocessingimportStandardScaler
sc=StandardScaler()
x_train[:,6:]=sc.fit_transform(x_train[:,6:])
x_test[:,6:]=sc.transform(x_test[:,6:])
print(x_train)
print(x_test)

c) SplittingdatasetintoTrainingandTestingSets
fromsklearn.model_selectionimporttrain_test_split

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=1) Page|5
print(x_train)
print(x_test)
print(y_train)
print(y_test)

OUTPUT
a)[[0.01.00.00.0'Male'44.072000.0]
[1.00.00.01.0'Female'27.048000.0]
[1.00.01.0 0.0 'Male'30.054000.0]
[1.00.00.0 1.0 'Male'38.061000.0]
[1.00.01.00.0'Female'40.0nan]
[0.01.00.00.0'Female'35.058000.0]
[1.00.00.01.0'Female'nan52000.0]
[0.01.00.0 0.0 'Male'48.079000.0]
[1.00.01.0 0.0 'Male'50.083000.0]
[0.01.00.00.0'Male'37.069000.0]]

[0100110101]

b)[[1.00.00.01.0'Female'nan-1.0182239953527132]
[1.00.01.00.0'Female'40.0nan]
[0.01.00.00.0'Male'44.00.5834766714942513]
[1.00.00.01.0'Male'38.0-0.2974586952715791]
[1.00.00.01.0'Female'27.0 -1.3385641287221062]
[0.01.00.00.0'Male'48.01.1440719048906889]
[1.00.01.00.0'Male'50.0 1.4644120382600818]
[0.01.00.00.0'Female'35.0 -0.5377137952986238]]
[[1.00.01.00.0'Male'30.0-0.8580539286680167]
[0.01.00.00.0'Male'37.00.3432215714672067]]

c) [[1.00.00.01.0'Female'nan 52000.0]
[1.00.01.00.0'Female'40.0nan]
[0.01.00.0 0.0 'Male'44.072000.0]
[1.00.00.0 1.0 'Male'38.061000.0]
[1.00.00.01.0'Female'27.048000.0]
[0.01.00.0 0.0 'Male'48.079000.0]
[1.00.01.0 0.0 'Male'50.0 83000.0]
[0.01.00.00.0'Female'35.058000.0]]
[[1.00.01.00.0'Male'30.054000.0]
[0.01.00.00.0'Male'37.069000.0]]
[01001101]
[01]

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
DATE: Page|6

AIM:Demonstratethefollowing SimilarityandDissimilarityMeasuresusing python

a) Pearson’s Correlation
b) CosineSimilarity
c) JaccardSimilarity
d) EuclideanDistance
e) ManhattanDistance
Description/Theory:
The Pearson correlation coefficient, often referred to as Pearson’s r, is a measure of linear
correlationbetweentwovariables.This means that the Pearson correlation coefficient measures a
normalized measurement of covariance (i.e., a value between -1 and 1 that shows how much
variables vary together).

Cosinesimilarityisametricusedtomeasurethesimilarityoftwovectors.Specifically,it
measures the similarity in the direction or orientation of the vectors ignoring differen

TheJaccardsimilaritymeasuresthesimilaritybetweentwo setsofdatatoseewhichmembersare
sharedanddistinct.TheJaccardsimilarityiscalculatedbydividingthenumberofobservationsin both
sets by the number of observations in either set ces in their magnitude or scale.

Euclidean distance is considered the traditional metric for problems with geometry. It can be
simply explained as the ordinary distance between two points. It is one of the most used
algorithmsintheclusteranalysis.OneofthealgorithmsthatusethisformulawouldbeK-mean.
Mathematically it computes the root of squared differences between the coordinates between
two objects.

Thisdeterminestheabsolutedifferenceamongthepairofthecoordinates.
SupposewehavetwopointsPandQtodeterminethedistancebetweenthese pointswesimply have to
calculate the perpendicular distance of the points from X-Axis and Y-Axis.
In a plane with P at coordinate (x1, y1) and Q at (x2, y2).
ManhattandistancebetweenPandQ=|x1 –x2|+|y1–y2|
PROCEDURE/CODE:

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
Page|7
a) Pearson’sCorrelation
#Pearson'sCorrelation
fromscipy.statsimportpearsonr
X=[-2,-1,0,1,2]
Y=[4,1,3,2,0]
corr=pearsonr(X,Y)
print('PearsonCorrelation:',corr)

b)CosineSimilarity
importnumpyasnp
fromnumpy.linalgimportnorm

#definetwolistsorarray
a=np.array([2,1,2,3,2,9])
b=np.array([3,4,2,4,5,5])
print("a : ",a)
print("b:",b)

#Compute cosine Similarity

cosine=np.dot(a,b)/(norm(a)*norm(b))
print("Cosine Similarity : ",cosine)

c)JaccardSimilarity
importnumpyasnp
fromscipy.spatial.distanceimportjaccard
a=np.array([1,0,0,1,1,1])
b=np.array([0,0,1,1,1,1])
d=jaccard(a,b)
print("Distance : ",d)

d)EuclideanDistance
fromsklearn.metrics.pairwiseimporteuclidean_distances X=[[0,1],
[1,1]]
euclidean_distances(X,X)
#Calculatingeuclideandistancebetweenvectors from
scipy.spatial.distance import euclidean
row1=[10,20,15,10,5]
row2=[12,24,18,8,7]
dist=euclidean(row1,row2)
print(dist)

minkowskiDistance
fromscipy.spatialimportminkowski_distance

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
row1=[10,20,15,10,5] Page|8
row2=[12,24,18,8,7]
#Calculate distance (p=1)
dist=minkowski_distance(row1,row2,1)
print(dist)
#Calculate distance (p=2)
dist=minkowski_distance(row1,row2,2)
print(dist)

#pearsonwithoutusingmethods
importmath
n=int(input('Enternumberofrows:')) x,y
= [],[]
foriinrange(n):
print('Enter'+str(i+1)+'rowxvalue:',end="")
x.append(int(input()))
print('Enter'+str(i+1)+'rowyvalue:',end="")
y.append(int(input()))
print("Enteredvalues:")
print(x)
print(y)
x2=[] #List for storing x^
y2=[] #Listforstoringy^2
xy=[] #List for storing xy
for i in range(n):
x2.append(math.pow(x[i],2))
y2.append(math.pow(y[i],2))
xy.append(x[i]*y[i])
print('x square : ',x2)
print('y square : ',y2)
print('xy value : ',xy)
rxy=sum(xy) // n-((sum(x)//n) * (sum(y) // n)) #rhoxy
rx =math.sqrt(sum(x2) // n-math.pow((sum(x)//n),2))ry
=math.sqrt(sum(y2) // n-math.pow((sum(y)//n),2))
result = rxy//(rx*ry)
print("Pearson'scorrelationcoefficient:",result)

OUTPUT:
a) PearsonCorrelation:PearsonRResult(statistic=-0.7000000000000001,
pvalue=0.18812040437418737)
Page|9
b) a :[212329]
b:[342455]

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
CosineSimilarity:0.8188504723485274 Page|9
c) Distance: 0.4

d) array([[0.,1.],
[1.,0.]])
6.082762530298219

e) 10

Minkowskidistance
13.0
6.082762530298219

Pearson
Enternumberofrows:5
Enter 1 row x value : -2
Enter 1 row y value : 4
Enter 2 row x value : -1
Enter 2 row y value : 1
Enter 3 row x value : 0
Enter 3 row y value : 3
Enter 4 row x value : 1
Enter 4 row y value : 2
Enter 5 row x value : 2
Enter 5 row y value : 0
Entered values :
[-2,-1,0,1,2]
[4,1,3,2, 0]
xsquare:[4.0,1.0,0.0,1.0,4.0]
ysquare:[16.0,1.0,9.0,4.0,0.0]
xyvalue:[-8,-1,0,2,0]
Pearson'scorrelationcoefficient:-1.0

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
DATE: Page|10

AIM:Buildamodelusinglinearregressionalgorithmonanydataset.
Description/Theory:

PROCEDURE/CODE:
Linearregressionisthetypeofregressionthatformsarelationshipbetweenthetargetvariable and one
or more independent variables utilizing a straight line. The given equation represents the
equation of linear regression
Y= a +b*X+ e.
#Buildingalinearregressionmodelfordiabetesdataset
importmatplotlib.pyplotasplt
import numpy as np
fromsklearnimportdatasets,linear_model
from sklearn.metrics import mean_squared_error,r2_score
diabetes_X,diabetes_Y=datasets.load_diabetes(return_X_y=True) diabetes_X
= diabetes_X[:,np.newaxis,2]

#Split data into training/testing sets

diabetes_X_trian=diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

#Split the dependent variables

diabetes_y_train=diabetes_Y[:-20]
diabetes_y_test = diabetes_Y[-20:]

#CreateanobjectforLinearRegression
regr=linear_model.LinearRegression()

#Train the model using the training sets

regr.fit(diabetes_X_trian,diabetes_y_train)

#Make predictions using training sets

diabetes_y_pred=regr.predict(diabetes_X_test)

print(diabetes_y_pred)
Plotting:
plt.scatter(diabetes_X_test,diabetes_y_test,color='black')

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
plt.plot(diabetes_X_test,diabetes_y_pred,color='blue',linewidth=2) Page|11
plt.show()
Grid:

plt.scatter(diabetes_X_test,diabetes_y_test,color='black')
plt.plot(diabetes_X_test,diabetes_y_pred,color='blue',linewidth=2,marker='*',markerfacecolor='r
ed')
plt.grid()
plt.show()
LinearRegression:
importmatplotlib.pyplotasplt
import numpy as np
fromsklearn.linear_modelimportLinearRegression
x=np.array([5,15,45,25,35,45]).reshape((-1,1))
y=np.array([5,21,14,22,32,28])
print(x,y)

x=np.array([5,15,45,25,35,45]).reshape((-1,1))
y=np.array([5,21,14,22,32,28])
print(x,y)
model=LinearRegression()
model.fit(x,y)
result=model.score(x,y)

print("Score :",result)model.fit(x,y)
result=model.score(x,y)print("Score
: ",result)
print("Intercept:",model.intercept_)
print("Slope : ",model.coef_)
y_pred = model.predict(x) #predictedbyyvalue
print("Actual values of y : ",y);
print("Predictedvaluesofy:",y_pred)

plt.show()
plt.plot(x,y_pred,color='blue',linewidth=2,marker='o',markerfacecolor='red')

[<matplotlib.lines.Line2Dat0x23b679e6850>]

plt.plot(x,y_pred,color='blue',linewidth=2,marker='o',markerfacecolor='red')
plt.grid()
plt.show()

MultipleLinearRegression

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
#Importinglibraries Page|12
importnumpyasnp
importmatplotlib.pyplotasplt
import pandas as pd

#Importing the dataset

dataset=pd.read_csv('D:/DM/dataset.csv')
dataset.describe()
shape=dataset.shape
print(shape)dataset.columns
X=dataset.iloc[:,:-1].values
y=dataset.iloc[:, -1].values
print(X)

#EncodingCategoricaldata
from sklearn.compose import ColumnTransformer
fromsklearn.preprocessingimportOneHotEncoder
ct=ColumnTransformer(transformers=[('encoder',OneHotEncoder(),[3])],remainder='passthroug
h')
X=np.array(ct.fit_transform(X))
print(X)

#SplitingthedatasetintotheTrainingsetandTestset from
sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=0)

#TrainingtheMultipleLinearRegressionmodelontheTrainingset from
sklearn.linear_model import LinearRegression
regressor=LinearRegression()
regressor.fit(X_train,y_train)

#Predicting the Test set results

y_pred=regressor.predict(X_test)
np.set_printoptions(precision=2)
print(np.concatenate((y_pred.reshape(len(y_pred),1),y_test.reshape(len(y_test),1)),1))

plt.scatter(y_test,y_pred);
plt.xlabel('Actual')
plt.ylabel('Predicted')
pred_df=pd.DataFrame({'ActualValue':y_test,'PredictedValue':y_pred,'Difference':y_test- y_pred})
pred_df

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
Page| 13
OUTPUT:
[225.9732401115.74763374163.27610621114.73638965120.80385422
158.21988574236.08568105121.8150983299.56772822123.83758651
204.7371141196.53399594154.17490936130.9162951783.3878227
171.36605897137.99500384137.99500384189.5684526884.3990668]
Plotting:

GRID:

[[5]
[15]
[45]
[25]
[35]
[45]][52114 223228]

[[5]
[15]

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
[45]
[25] Page|14
[35]
[45]][52114 223228]
Score:0.3114260563380282
Score:0.3114260563380282
Intercept:10.912499999999996
Slope:[0.3325]
Actual valuesofy:[521142232 28]
Predicted values of y :[12.575 15.9 25.87519.22522.5525.875]

plt.scatter(x,y,color='black')

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
Page|15

MultipleLinearRegression:
(50,5)
[[165349.2136897.8471784.1'NewYork']
[162597.7151377.59443898.53'California']
[153441.51101145.55407934.54'Florida']
[144372.41118671.85383199.62'NewYork']
[142107.3491391.77366168.42'Florida']
[131876.999814.71362861.36'NewYork']
[134615.46147198.87127716.82'California']
[130298.13145530.06323876.68'Florida']
[120542.52148718.95311613.29'NewYork']
[123334.88108679.17304981.62'California']
[101913.08110594.11229160.95'Florida']
[100671.9691790.61249744.55'California']
[93863.75127320.38249839.44'Florida']
[91992.39135495.07252664.93'California']
[119943.24156547.42256512.92'Florida']

Actual Value PredictedValue Difference

103282.38 103015.201598 267.178402

144259.40 132582.277608 11677.122392

146121.95 132447.738452 13674.211548

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
Page|16

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
DATE: Page|17

AIM:BuildaclassificationmodelusingDecisionTreealgorithmonirisdataset Description/Theory:
Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems, but mostlyit is preferred for solving Classification problems. It is atree-
structuredclassifier,whereinternalnodesrepresentthefeaturesofadataset,branchesrepresent the
decision rules and each leaf node represents the outcome.

PROCEDURE/CODE:
#importlibraries
fromsklearn.datasetsimportload_iris
fromsklearn.treeimportDecisionTreeClassifier,plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
importmatplotlib.pyplotasplt
#Load iris dataset
iris=load_iris()
X=iris.data
y=iris.target
#Split the data into training and testing sets
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=None)
#Create an instance of the DecisionTreeClassifier class
tree_clf=DecisionTreeClassifier(max_depth=3)
#Fit the model on the training data
tree_clf.fit(X_train,y_train)
DecisionTreeClassifier(max_depth=3)
#predict on the testing data
y_pred=tree_clf.predict(X_test)
#Calculate accuracy of the model
accuracy=accuracy_score(y_test,y_pred)
print('Accuracy : ',accuracy)
#Visualizethedecisiontreeusingtheplot_treefunction
plt.figure(figsize=(15,10))
plot_tree(tree_clf,filled=True,feature_names=iris.feature_names,class_names=iris.target_names
)
plt.show()

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
Page|18
OUTPUT:
Accuracy:0.9555555555555556

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
DATE: Page|19

AIM:ApplyNaiveBayesClassificationalgorithmonanydataset. Description/Theory:
NaïveBayesalgorithmisasupervised learningalgorithm,whichisbasedonBayestheorem and used
for solving classification problems.
It is mainly used in text classification that includes a high-dimensional training dataset.
SomepopularexamplesofNaïveBayesAlgorithmarespamfiltration,Sentimentalanalysis, and
classifying articles.

PROCEDURE/CODE:
fromsklearnimportmetrics
fromsklearn.naive_bayesimportGaussianNB import
pandas as pd
importnumpyasnp
fromsklearn.model_selectionimporttrain_test_split
from sklearn.metrics import confusion_matrix
fromsklearn.metricsimportaccuracy_score
dataset=pd.read_csv('D:\DM\iris.csv')
print(dataset)

X=dataset.iloc[:,:4].values
Y=dataset['Species'].values
print(Y)
print(X)

#split the dataset into training and test datasets

X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3)
#create an object for Bayes Classifier GaussianNB
classifier=GaussianNB()
classifier.fit(X_train,Y_train)
#predict the values
print(X_test[0])
y_pred=classifier.predict(X_test)
print(y_pred)
accuracy=accuracy_score(Y_test,y_pred)
print("Accuracy: ",accuracy)

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
Page| 20
#buildconfusionmatrix
fromsklearn.metricsimportconfusion_matrix
cm=confusion_matrix(Y_test,y_pred)
from sklearn.metrics import accuracy_score
print("Accuracy: ",accuracy_score(Y_test,y_pred))
cm

df=pd.DataFrame({'RealValues':Y_test,'PredictedValues':y_pred}) print(df)

NaiveBayes:
#predictclasstablefor
fromsklearn.datasetsimportload_iris
fromsklearn.model_selectionimporttrain_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
fromsklearn.preprocessingimportLabelEncoder
import numpy as np
#import iris dataset
iris=load_iris()
X=iris.data
Y=iris.target
le=LabelEncoder()
Y=le.fit_transform(Y)
#Splitthedatasetintotrainingandtestingsets
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=42)
#Train a Naive bayes model on the training data
nb_model = GaussianNB()
nb_model.fit(X_train, Y_train)
#Make predictions on the test data
y_pred=nb_model.predict(X_test)
y_pred = le.inverse_transform(y_pred)
accuracy=accuracy_score(Y_test,y_pred)
print("Accuracy: ",accuracy)
new_observation = np.array([[5.8, 3.0, 4.5, 1.5]])
predicted_class = nb_model.predict(new_observation)
predicted_class=le.inverse_transform(predicted_class)
print("Predicted class: ", predicted_class)

OUTPUT:
IdSepalLengthCmSepalWidthCmPetalLengthCmPetalWidthCm\
0 1 5.1 3.5 1.4 0.2
1 2 4.9 3.0 1.4 0.2
2 3 4.7 3.2 1.3 0.2 Page |21

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
3 4 4.6 3.1 1.5 0.2
4 5 5.0 3.6 1.4 0.2
.. ... ... ... ... ...
145 146 6.7 3.0 5.2 2.
146 147 6.3 2.5 5.0 1.9
147 148 6.5 3.0 5.2 2.0
148 149 6.2 3.4 5.4 2.3
149 150 5.9 3.0 5.1 1.8

Species
0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
.. ...
145 Iris-virginica
146 Iris-virginica
147 Iris-virginica
148 Iris-virginica
149 Iris-virginica

[150rowsx6columns]
['Iris-setosa''Iris-setosa''Iris-setosa''Iris-setosa''Iris-
setosa''Iris-setosa''Iris-setosa''Iris-setosa''Iris-setosa''Iris-
setosa''Iris-setosa''Iris-setosa''Iris-setosa''Iris-setosa''Iris-
setosa''Iris-setosa''Iris-setosa''Iris-setosa''Iris-setosa''Iris-
setosa''Iris-setosa''Iris-setosa''Iris-setosa''Iris-setosa''Iris-
setosa'
[[ 1. 5.1 3.5 1.4]
[ 2. 4.9 3. 1.4]
[ 3. 4.7 3.2 1.3]
[ 4. 4.6 3.1 1.5]
[ 5. 5. 3.6 1.4]
[ 6. 5.4 3.9 1.7]
[ 7. 4.6 3.4 1.4]
[ 8. 5. 3.4 1.5]
[ 9. 4.4 2.9 1.4]
[ 10. 4.9 3.1 1.5]
[11. 5.4 3.7 1.5]
['Iris-setosa''Iris-virginica''Iris-versicolor''Iris-
setosa''Iris-versicolor''Iris-virginica''Iris-virginica''Iris-
setosa''Iris-virginica''Iris-setosa''Iris-virginica''Iris-
virginica''Iris-setosa''Iris-versicolor''Iris-virginica''Iris-

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
virginica'

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
Accuracy:1.0 Page|22
Accuracy:1.0
RealValuesPredictedValues
0 Iris-setosa Iris-setosa
1 Iris-virginica Iris-virginica
2 Iris-versicolorIris-versicolor
3 Iris-setosa Iris-setosa
4 Iris-versicolorIris-versicolor
5 Iris-virginica Iris-virginica

Accuracy:0.9777777777777777
Predictedclass: [1]

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
DATE: Page|23

AIM:GeneratefrequentitemsetsusingAprioriAlgorithminpythonandalsogenerate association
rules for any market basket data.

Description/Theory:
ApriorialgorithmisgivenbyR.AgrawalandR.Srikantin1994forfindingfrequentitemsetsin a dataset
for boolean association rule. Name of the algorithm is Apriori because it uses prior knowledge
of frequent itemset properties. We apply an iterative approach or level-wise search where k-
frequent itemsets are used to find k+1 itemsets.

PROCEDURE/CODE:
pipinstallmlxtend

importpandasaspd
frommlxtend.preprocessingimportTransactionEncoder
frommlxtend.frequent_patternsimportapriori,association_rules data
=[['Bread', 'Milk', 'Eggs'],
['Bread','Diapers','Beer','Eggs'],
['Milk','Diapers','Beer','Cola'],
['Bread''Milk','Diapers','Beer','Cola','Eggs']]
te=TransactionEncoder()
te_ary=te.fit(data).transform(data)
df=pd.DataFrame(te_ary,columns=te.columns_) df
frequent_itemsets=apriori(df,min_support=0.75,use_colnames=True)
print(frequent_itemsets)

rules=association_rules(frequent_itemsets,metric='confidence',min_threshold=0.7)
print(rules)
selected_columns=['antecedents','consequents','antecedentsupport','consequentsupport',
'support', 'confidence']
print(rules[selected_columns])
Apriori:
pip install apyori
importnumpyasnp
importpandasaspd

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
importmatplotlib.pyplotasplt Page|24
fromapyoriimportapriori
store_data=pd.read_csv("D:/DM/store_data.csv",header=None)
display(store_data.head())
store_data.shape

records= []
foriinrange(1,7501):
records.append([str(store_data.values[i,j])forjinrange(0,20)])
print(type(records))

association_rules=apriori(records,min_support=0.0045,min_confidence=0.2,min_lift=3,
min_length=2)
association_results=list(association_rules)
print("Thereare{}Relationderived.".format(len(association_results))) for i
in range(0, len(association_results)):
print(association_results[i][0])

foriteminassociation_results:
pair = item[0]
items=[xforxin pair]
print("Rule:"+items[0]+"->"+items[1])
print("Support: " +str(item[1]))
print("Confidence: " +str(item[2][0][2]))
print("Lift: " +str(item[2][0][3]))
print("==========================================")

OUTPUT:
support itemsets
0 0.75 (Beer)
1 0.75 (Diapers)
2 0.75 (Eggs)
3 0.75(Beer, Diapers)
antecedentsconsequentsantecedentsupportconsequentsupportsupport\
0 (Beer) (Diapers) 0.75 0.75 0.75
1 (Diapers) (Beer) 0.75 0.75 0.75

confidence lift leverageconvictionzhangs_metric

0 1.0 1.333333 0.1875 inf 1.0
1 1.0 1.333333 0.1875 inf 1.0
antecedentsconsequentsantecedentsupportconsequentsupportsupport\
0 (Beer) (Diapers) 0.75 0.75 0.
75
1 (Diapers) (Beer) 0.75 0.75 0.
DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
75

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
Page| 25
confidence
0 1.0
1 1.0
2
0 1 2 3 4 5 6 7 8 9 10 1 1 1 14 15 16 17 18 1
1 2 3 9
0 shr alm av veg gr w y co en to lo gr h s mi sal anti fro spi ol
im ond oc etab ee h a tta er m w e o al ne m oxy zen na iv
p s ad les n ol m ge gy at fa e ne a ral on dant sm ch e
o mix gr e s ch dr o t n y d wa juic oot oi
ap w ee in jui yo te ter e hie l
es ea se k ce gu a
tfl rt
o
ur

1 bu me eg Na N N N N N N N N N N Na N Na Na Na N
rg atb gs N a a a a a a a a a a N a N N N a
ers alls N N N N N N N N N N N N
2 ch Na Na Na N N N N N N N N N N Na N Na Na Na N
ut N N N a a a a a a a a a a N a N N N a
ne N N N N N N N N N N N N
y
3 tur avo Na Na N N N N N N N N N N Na N Na Na Na N
ke cad N N a a a a a a a a a a N a N N N a
y o N N N N N N N N N N N N
4 mi mil en wh gr N N N N N N N N N Na N Na Na Na N
ne k erg ole ee a a a a a a a a a N a N N N a
ral y whe n N N N N N N N N N N N
wa bar at te
ter rice a
(7501,20)

<class'list'>
There are 48 Relation derived.
frozenset({'chicken', 'light cream'})
frozenset({'escalope','mushroomcreamsauce'})
frozenset({'escalope', 'pasta'})
frozenset({'ground beef', 'herb & pepper'})

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
frozenset({'tomatosauce','groundbeef'}) Page|26
frozenset({'wholewheatpasta','oliveoil'})
frozenset({'shrimp', 'pasta'})
frozenset({'nan', 'chicken', 'light cream'})
frozenset({'shrimp','frozenvegetables','chocolate'})
frozenset({'cooking oil', 'ground beef', 'spaghetti'})

Rule: chicken->light cream

Support: 0.004533333333333334
Confidence:0.2905982905982906
Lift:4.843304843304844
==========================================
Rule:escalope->mushroomcreamsauce
Support: 0.005733333333333333
Confidence:0.30069930069930073
Lift: 3.7903273197390845

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
DATE: Page|27

AIM:ApplyK-Meansclusteringalgorithmonanydataset

Description/Theory:
K-MeansClusteringisanUnsupervisedLearningalgorithm,whichgroupstheunlabeleddataset into
different clusters.
Thek-meansclusteringalgorithmmainlyperformstwotasks:
DeterminesthebestvalueforKcenterpointsorcentroidsbyaniterativeprocess.
Assigns each datapointtoits closestk-center.Thosedatapointswhich areneartotheparticular k-
center, create a cluster.
Henceeachclusterhasdatapointswithsomecommonalities,anditisawayfromotherclusters.

PROCEDURE/CODE:
importmatplotlib.pyplotasplt
import numpy as np
from sklearn.cluster import KMeans X=np.array([[1,1],[1.5,2],[3,4],[5,7],[3.5,5],[4.5,5],
[3.5,4.5]])
print(X)
kmeans=KMeans(n_clusters=2)
kmeans.fit(X)
print("\nClusters:",kmeans.cluster_centers_)print("\nLabels:
",kmeans.labels_)
plt.scatter(X[:,0],X[:,1],c=kmeans.labels_,cmap="rainbow")

OUTPUT:
[[1.1.]
[1.52.]
[3.4.]
[5.7.]

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
[3.55.] Page |28
[4.55.]
[3.54.5]]

plt.scatter(X[:,0],X[:,1])

KMeans(n_clusters=2)
Clusters: [[1.25 1.5 ]
[3.9 5.1]]
Labels:[001 11 11]

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
DATE: Page|29

AIM:ApplyHierarchicalClusteringalgorithmonanydataset

Description/Theory:
HierarchicalClusteringinMachineLearning
Hierarchicalclusteringisanotherunsupervisedmachinelearningalgorithm,whichisusedto group
theunlabeleddatasets into a clusterand alsoknown ashierarchical cluster analysis or HCA.
Inthisalgorithm,wedevelopthehierarchyofclustersintheformofatree,andthistree-shaped structure
is known as the dendrogram.
SometimestheresultsofK-meansclusteringandhierarchicalclusteringmaylooksimilar,but they
both differ depending on how they work. As there is no requirement to predetermine the
number of clusters as we did in the K-Means algorithm.

PROCEDURE/CODE:
importpandasaspd
importnumpyas np
import matplotlib.pyplot as plt
importscipy.cluster.hierarchyasshc
fromscipy.spatial.distanceimportsquareform,pdist
a=np.random.random_sample(size=5)
b=np.random.random_sample(size=5)
point=['p1','p2','p3','p4','p5']
data=pd.DataFrame({'Point':point,'a':np.round(a,2),'b':np.round(b,2)})
data=data.set_index('Point')
data
plt.figure(figsize=(8,5))
plt.xlabel('Column a')
plt.ylabel('Column b')

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
plt.title('Scatterplotofxand y') Page|30
plt.scatter(data['a'],data['b'],c='r',marker='*')
for j in data.itertuples():plt.annotate
(j.index,(j.a,j.b),fontsize=15)
dist=pd.DataFrame(squareform(pdist(data[['a','b']]),'euclidean'),
columns=data.index.values,
index=data.index.values)
dist
DANDROGRAM
plt.figure(figsize=(12,5))
plt.title('DendrogramwithSinglelinkage')
dend=shc.dendrogram(shc.linkage(data[['a','b']],method='single'),labels=data.index)

OUTPUT:
a b
P
oi
nt

p1 0.30 0.96
p2 0.31 0.54
p3 0.64 0.88
p4 0.67 0.15
p5 0.25 0.70
<Figuresize800x500with0 Axes>
<Figuresize800x500with0 Axes>

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
Page|31

p1 p2 p3 p4 p5
p1 0.000000 0.420119 0.349285 0.890505 0.264764
p2 0.420119 0.000000 0.473814 0.530754 0.170880
p3 0.349285 0.473814 0.000000 0.730616 0.429535
p4 0.890505 0.530754 0.730616 0.000000 0.692026
p5 0.264764 0.170880 0.429535 0.692026 0.000000
Dandrogram

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
DATE: Page|32

AIM:ApplyDBSCANclusteringalgorithmonanydataset

Description/Theory:
StepsUsedInDBSCANAlgorithm
Findalltheneighborpointswithinepsandidentifythecorepointsorvisitedwithmorethan MinPts
neighbors.
Foreachcorepointifitisnotalreadyassignedtoacluster,createanewcluster.
Findrecursivelyallitsdensity-connectedpointsandassignthemtothesameclusterasthecore point.
A point a and b are said to be density connected if there exists a point c which has a sufficient
number of points in its neighbors and both points a and b are within the eps distance. This is a
chainingprocess.So,ifbis aneighborofc,cisaneighborofd,anddis aneighborofe,which in turn
isneighbor of a implying that b is a neighbor of a.
Iteratethroughtheremainingunvisitedpointsinthedataset.Thosepointsthatdonotbelongto any
cluster are noise.

PROCEDURE/CODE:
importnumpyasnp
fromsklearn.datasetsimportmake_blobs
fromsklearn.preprocessingimportStandardScaler
import matplotlib.pyplot as plt
fromsklearn.clusterimportDBSCAN
centers=[[0.5,2],[-1,-1],[1.5,-1]]
X,y=make_blobs(n_samples=100,centers=centers,cluster_std=0.5,random_state=0)
print(X,y)
db=DBSCAN(eps=0.4,min_samples=5)
db.fit(X)
n_clusters_=len(set(labels))-(1 if -1 in labels else 0)
print("Estimatednumberofclusters:%d"%n_clusters_)
y_pred=db.fit_predict(X)

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
plt.figure(figsize=(6,4)) Page|33
plt.scatter(X[:,0],X[:,1],c=y_pred,cmap='Paired')
plt.title("Clusters determined by DBSCAN")
plt.savefig("DBSCAN.jpg")
KmeansforDBSCAN:
importmatplotlib.pyplotasplt
import numpy as np
fromsklearn.clusterimportKMeans
import matplotlib.pyplot as plt
import numpy as np
fromsklearn.clusterimportKMeans
plt.scatter(X[:,0],X[:,1])
kmeans=KMeans(n_clusters=2)
kmeans.fit(X)
print("\nClusters:",kmeans.cluster_centers_) print("\
n Labels:",kmeans.labels_)

OUTPUT:
[[0.57747371 2.18908126]
[1.1781908-2.11170158]
[0.533258612.15123595]
[0.94780833-0.97391746]
[1.27223375-0.99126042]
[1.266389612.73467938]
[0.588713071.79910953]
DBSCAN(eps=0.4)
labels=db.labels_

n_clusters_=len(set(labels))-(1 if -1 in labels else 0)

print("Estimatednumberofclusters:%d"%n_clusters_)
y_pred=db.fit_predict(X)
[0 -1011-1012-11221 -1-1-1212202

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE
2 0 2-1 2-1-1-12 1 1 1 1-11-1-11112-1 -10 Page | 34
21 0-1 0-101-1 1 1 12 - 2 2 0 0 0 12 -0 -1
1 1
01 0-1 02-1-12 1 1 01 1 0-1 0 1 0 22 1 0 2
12 22]

KMeans(n_clusters=2)

Clusters:[[0.25333922-0.89314775]
[0.492109632.00254955]]

Labels:[10100110010000 0100000 1 000101001100000

1001000010100 10111000000000111001101 0
1110100001001 0101000100000]

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING-LBRCE

CS3352 Foundations of Data Science
No ratings yet
CS3352 Foundations of Data Science
27 pages
CST 322 Data Analytics (Elective)
No ratings yet
CST 322 Data Analytics (Elective)
244 pages
DWDM Lab Manual Final Updated New Finalll
100% (1)
DWDM Lab Manual Final Updated New Finalll
60 pages
DSBDA Lab Manual 2022-23
100% (2)
DSBDA Lab Manual 2022-23
148 pages
Ml-Lab-Manual Cse
No ratings yet
Ml-Lab-Manual Cse
69 pages
BCS 452 1
No ratings yet
BCS 452 1
10 pages
ML Labmanual R22
No ratings yet
ML Labmanual R22
52 pages
BDA Final Lab Manual
100% (1)
BDA Final Lab Manual
56 pages
DDCO Lab Manual
No ratings yet
DDCO Lab Manual
56 pages
DL Lab Manual Student
No ratings yet
DL Lab Manual Student
6 pages
Rishabh ML File
No ratings yet
Rishabh ML File
102 pages
Deep Learning Lab
No ratings yet
Deep Learning Lab
66 pages
DSBDA Lab Manual 2022-23 Final-1
No ratings yet
DSBDA Lab Manual 2022-23 Final-1
148 pages
AIML Merged
No ratings yet
AIML Merged
275 pages
Oose Rahul Merged
No ratings yet
Oose Rahul Merged
63 pages
Ayush Machine Learning Lab
No ratings yet
Ayush Machine Learning Lab
38 pages
Kamal ML
No ratings yet
Kamal ML
38 pages
LecturePlan CS201 21CSH-422
No ratings yet
LecturePlan CS201 21CSH-422
7 pages
Record NNDL - Merged - Organized
No ratings yet
Record NNDL - Merged - Organized
60 pages
COMSATS Institute of Information Technology: Department of Electrical Engineering
No ratings yet
COMSATS Institute of Information Technology: Department of Electrical Engineering
33 pages
B.Tech - AIDS - 2024 - 2024 7 8 20 14 5
No ratings yet
B.Tech - AIDS - 2024 - 2024 7 8 20 14 5
153 pages
ML Lab Manual
No ratings yet
ML Lab Manual
55 pages
DATA MINING Using PYTHON
No ratings yet
DATA MINING Using PYTHON
37 pages
Python Manual
No ratings yet
Python Manual
10 pages
CS Lab Manual New Format
No ratings yet
CS Lab Manual New Format
57 pages
ML Lab Manual Session
No ratings yet
ML Lab Manual Session
50 pages
Dev Ops Lab Manual
No ratings yet
Dev Ops Lab Manual
25 pages
2nd Year
No ratings yet
2nd Year
137 pages
BI REPORT 8 To Rem Pages Final
No ratings yet
BI REPORT 8 To Rem Pages Final
40 pages
Final File
No ratings yet
Final File
38 pages
R22 B.tech DS Course Structure and Contents
No ratings yet
R22 B.tech DS Course Structure and Contents
135 pages
Ada Final
No ratings yet
Ada Final
37 pages
DMML Lab
No ratings yet
DMML Lab
35 pages
Lab Manual - 672 - CE
No ratings yet
Lab Manual - 672 - CE
38 pages
CCS356 OOSE - LAB - FRONT v6
No ratings yet
CCS356 OOSE - LAB - FRONT v6
10 pages
DS Mannual Updated Shanti
No ratings yet
DS Mannual Updated Shanti
82 pages
3rd Year
No ratings yet
3rd Year
104 pages
Bcs 452 Amit Singh 69
No ratings yet
Bcs 452 Amit Singh 69
20 pages
Bda Lab Manual
No ratings yet
Bda Lab Manual
62 pages
Lab Record Front Page DS CS
No ratings yet
Lab Record Front Page DS CS
7 pages
DATA STRUCTURE LAB MANUAL 2020-2021 Final Edited
100% (2)
DATA STRUCTURE LAB MANUAL 2020-2021 Final Edited
117 pages
DS Lab Manual
No ratings yet
DS Lab Manual
70 pages
ML Mod-2
No ratings yet
ML Mod-2
33 pages
Python Datastructures Course File
No ratings yet
Python Datastructures Course File
35 pages
LecturePlan BI519 22CSH-311
No ratings yet
LecturePlan BI519 22CSH-311
11 pages
Annexure I - B.Tech CSE (AI-DS) Syllabus 2023-24 - New
No ratings yet
Annexure I - B.Tech CSE (AI-DS) Syllabus 2023-24 - New
32 pages
BCS 452 1
No ratings yet
BCS 452 1
10 pages
Final B.Tech - CSE IBM AI ML Syllabus
No ratings yet
Final B.Tech - CSE IBM AI ML Syllabus
82 pages
Elements of Railway Tracks
No ratings yet
Elements of Railway Tracks
27 pages
Bda Vision Mission New
No ratings yet
Bda Vision Mission New
4 pages
Internship
No ratings yet
Internship
22 pages
Chapter 2 Components of Food
No ratings yet
Chapter 2 Components of Food
12 pages
Geethanjali College of Engineering and Technology (Ugc Autonomous Institution)
No ratings yet
Geethanjali College of Engineering and Technology (Ugc Autonomous Institution)
34 pages
Curriculum and Syllabi (2020-2021) : School of Computer Science and Engineering
No ratings yet
Curriculum and Syllabi (2020-2021) : School of Computer Science and Engineering
26 pages
2024-25 AI & DS III Sem-A Sec IDS 8
No ratings yet
2024-25 AI & DS III Sem-A Sec IDS 8
4 pages
Cse All
No ratings yet
Cse All
8 pages
LecturePlan - CS201 - 21CSH 311 DAA LAB
No ratings yet
LecturePlan - CS201 - 21CSH 311 DAA LAB
7 pages
Machine Learning Syllabus
No ratings yet
Machine Learning Syllabus
10 pages
LecturePlan CS201 20SMP-460
No ratings yet
LecturePlan CS201 20SMP-460
5 pages
Programming-IN-C-LAB-MANUAL FRONT
No ratings yet
Programming-IN-C-LAB-MANUAL FRONT
3 pages
Shostakovich - 5 Info From Nelson
100% (2)
Shostakovich - 5 Info From Nelson
22 pages
Section 13F - Engine Electrical System PDF
No ratings yet
Section 13F - Engine Electrical System PDF
14 pages
Recirculation Pump Sizing
No ratings yet
Recirculation Pump Sizing
3 pages
Lab Manual For: Advanced Python Programming Lab (CS311PC)
No ratings yet
Lab Manual For: Advanced Python Programming Lab (CS311PC)
27 pages
Cable Laying Specification
No ratings yet
Cable Laying Specification
16 pages
Experiment Name: Creep Analysis and Case Study
100% (2)
Experiment Name: Creep Analysis and Case Study
5 pages
Measurement of High Voltage
100% (1)
Measurement of High Voltage
30 pages
Westock - Ultra Slim Floor Beam (USFB) Design
100% (1)
Westock - Ultra Slim Floor Beam (USFB) Design
20 pages
Stock Market Prediction Using MLP and Random Forest
No ratings yet
Stock Market Prediction Using MLP and Random Forest
18 pages
Analysis of Temperature and Pressure Changes in Liquefied Natural
No ratings yet
Analysis of Temperature and Pressure Changes in Liquefied Natural
9 pages
Subtraction Strategies That Lead To Regrouping
100% (1)
Subtraction Strategies That Lead To Regrouping
6 pages
7SR10 Argus Complete Technical Manual
0% (1)
7SR10 Argus Complete Technical Manual
205 pages
WH Questions
100% (1)
WH Questions
13 pages
2 UG Crystal Note
No ratings yet
2 UG Crystal Note
97 pages
SSC CGL Physics in English d241009b
No ratings yet
SSC CGL Physics in English d241009b
13 pages
PX-760/PX-860/AP-260/AP-460/PX-160 MIDI Implementation: Casio Computer Co., LTD
No ratings yet
PX-760/PX-860/AP-260/AP-460/PX-160 MIDI Implementation: Casio Computer Co., LTD
51 pages
Greenhouse Effect Atmosphere Carbon Dioxide Nitrous Oxide Methane
No ratings yet
Greenhouse Effect Atmosphere Carbon Dioxide Nitrous Oxide Methane
11 pages
Es 13 - Module 7 - Flanged Bolt Coupling
No ratings yet
Es 13 - Module 7 - Flanged Bolt Coupling
9 pages
Oxymag English
No ratings yet
Oxymag English
40 pages
Krage Et Al, 2020
No ratings yet
Krage Et Al, 2020
17 pages
Imd 95-280y Im 0322
No ratings yet
Imd 95-280y Im 0322
6 pages
23 July 2024 - Comprehensive Review of Depression Detection Techniques Based On Machine Learning Approach
No ratings yet
23 July 2024 - Comprehensive Review of Depression Detection Techniques Based On Machine Learning Approach
25 pages
Normalization 1
No ratings yet
Normalization 1
14 pages
MEC132 F3 1819 Solutions
No ratings yet
MEC132 F3 1819 Solutions
7 pages
m5 Datasheet
No ratings yet
m5 Datasheet
1 page
Ferro Electric
No ratings yet
Ferro Electric
33 pages
ECF1100 Individual Assignment 2 Sem 1 2022
No ratings yet
ECF1100 Individual Assignment 2 Sem 1 2022
4 pages
Hooded Dino Blanket
No ratings yet
Hooded Dino Blanket
2 pages
BANFLEX
No ratings yet
BANFLEX
1 page
Delivering Digital Solutions: Software engineering, testing and deployment
From Everand
Delivering Digital Solutions: Software engineering, testing and deployment
Peter Thompson
No ratings yet

Data Mining Using Python

Uploaded by

Data Mining Using Python

Uploaded by

Vision of the Department

TheComputerScience&Engineeringaimsatprovidingcontinuouslystimulatingeducationalenvironmenttoits students for

Engineering knowledge: Apply the knowledge of mathematics, science, engineering

b) CountryGender Age SalaryPurchased

7 France Male48.079000.0 Yes

AIM:Demonstratethefollowing SimilarityandDissimilarityMeasuresusing python

#Compute cosine Similarity

#Split data into training/testing sets

#Split the dependent variables

#Train the model using the training sets

#Make predictions using training sets

#Importing the dataset

#Predicting the Test set results

Actual Value PredictedValue Difference

103282.38 103015.201598 267.178402

144259.40 132582.277608 11677.122392

146121.95 132447.738452 13674.211548

#split the dataset into training and test datasets

confidence lift leverageconvictionzhangs_metric

Rule: chicken->light cream

n_clusters_=len(set(labels))-(1 if -1 in labels else 0)

Labels:[10100110010000 0100000 1 000101001100000

You might also like