Assign - 10903 - DMCT 1
The WEKA GUI Chooser application will start, and we would see the following screen. When we click on the Explorer button in the Applications selector, it opens the following screen.
Conclusion -
WEKA is a powerful tool for developing machine learning models. It provides implementations of several of the most widely used ML algorithms, and it also allows you to preprocess the data before these algorithms are applied to your dataset. The supported algorithms are organized under Classify, Cluster, Associate, and Select attributes. The results at various stages of processing can be visualized with a powerful visual representation. This makes it easier for a data scientist to quickly apply various machine learning techniques to a dataset, compare the results, and create the best model for final use.
Assignment 2:
Q1. Given a dataset, reduce the dimension using PCA. Write a program in Python for that purpose.
Answer:-
To reduce the dimensionality of a dataset using Principal Component Analysis (PCA), we follow these steps:
Steps to Perform PCA
1. Standardize the Data:
o Ensure that the data is standardized (mean = 0 and variance = 1). This is crucial because PCA is affected by the scale of the data.
2. Compute the Covariance Matrix:
o Calculate the covariance matrix of the standardized data. The covariance matrix represents the relationships (correlations) between the features in the dataset.
3. Calculate Eigenvalues and Eigenvectors:
o Compute the eigenvalues and eigenvectors of the covariance matrix. The eigenvectors represent the directions of the new feature space (principal components), and the eigenvalues indicate the magnitude (variance) of these components.
4. Sort Eigenvectors by Eigenvalues:
o Rank the eigenvectors by their corresponding eigenvalues in descending order. The eigenvector with the highest eigenvalue is the principal component that explains the most variance in the data.
5. Select Principal Components:
o Choose the top k eigenvectors that correspond to the largest eigenvalues. These eigenvectors form the new, reduced feature space.
6. Transform the Data:
o Project the original data onto the new feature space using the selected eigenvectors. This step reduces the dimensionality of the dataset while retaining as much variance as possible.
7. Analyze the Reduced Dataset:
o The resulting dataset now has reduced dimensions and can be used for further analysis, visualization, or as input to machine learning models.
Hence, PCA reduces the dimensionality of a dataset by transforming it into a new set of orthogonal features (principal components) while retaining most of the variance present in the original data, as the sketch below illustrates.
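Before the full program, here is a minimal from-scratch sketch of steps 1-6 in NumPy; the toy array X and the choice k = 2 are assumptions made purely for illustration.
import numpy as np

# Toy data: 5 samples, 3 features (made-up values for illustration)
X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.9],
              [2.2, 2.9, 0.8],
              [1.9, 2.2, 1.1],
              [3.1, 3.0, 0.4]])

# Step 1: standardize (mean = 0, variance = 1)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized data
cov = np.cov(X_std, rowvar=False)

# Step 3: eigenvalues and eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 4: sort by eigenvalue in descending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Steps 5 and 6: keep the top k eigenvectors and project the data
k = 2
X_reduced = X_std @ eigvecs[:, :k]
print(X_reduced.shape)  # (5, 2)
scikit-learn's PCA, used in the program below, performs the same decomposition internally (via singular value decomposition), so these steps do not need to be coded by hand in practice.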
Python Code -
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Load your dataset
df = pd.read_csv(r'C:\Users\BIT.L1-0052\Desktop\birdsongdataset\test.csv')

# Encode categorical data if present
if 'species' in df.columns:  # Adjust the column name as needed
    le = LabelEncoder()
    df['species'] = le.fit_transform(df['species'])

# Convert remaining categorical columns to numerical (one-hot encoding example);
# dtype=float keeps the dummy columns numeric so they survive the filter below
df = pd.get_dummies(df, drop_first=True, dtype=float)

# Save the target column for plotting and drop it from the features,
# so the target itself is not fed into PCA
target = None
if 'species' in df.columns:
    target = df['species']
    df = df.drop(columns=['species'])

# Drop non-numeric columns if any remain (optional)
df = df.select_dtypes(include=[float, int])

# Standardize the dataset
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)

# Apply PCA, keeping the first two principal components
pca = PCA(n_components=2)
pca_result = pca.fit_transform(scaled_data)

# Create a DataFrame with the PCA result
pca_df = pd.DataFrame(data=pca_result, columns=['PC1', 'PC2'])

# Add the target column to the PCA DataFrame for plotting
if target is not None:
    pca_df['species'] = target

# Inspect the results
print("Explained Variance Ratio:", pca.explained_variance_ratio_)
print("PCA Result (First 5 rows):")
print(pca_df.head())

# Plot the PCA results
plt.figure(figsize=(10, 7))
sns.scatterplot(x='PC1', y='PC2', hue='species', data=pca_df,
                palette='viridis', s=100, edgecolor='w')
plt.title('PCA of Bird Song Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend(title='Species')
plt.grid(True)
plt.show()
print(pca_df)
Output:
Assignment 3:
Q1. Explore and examine different types of data visualization tools/libraries using the given dataset.
Python Code -
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the data
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# Check the data
print("Training Data Head:\n", train_data.head())
print("Testing Data Head:\n", test_data.head())

# Keep only the numeric columns
numeric_cols = train_data.select_dtypes(include=[np.number]).columns
train_numeric = train_data[numeric_cols]
test_numeric = test_data[numeric_cols]

# Standardization
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_numeric)
test_scaled = scaler.transform(test_numeric)

# Covariance matrix
cov_matrix = np.cov(train_scaled, rowvar=False)

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
print("Eigenvalues:\n", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

# Principal Component Analysis (PCA)
pca = PCA(n_components=5)
train_pca = pca.fit_transform(train_scaled)
print("Explained Variance Ratio:", pca.explained_variance_ratio_)

pca_df = pd.DataFrame(train_pca, columns=['PC1', 'PC2', 'PC3', 'PC4', 'PC5'])

# Box plot
plt.figure(figsize=(10, 5))
plt.boxplot(pca_df.values)
plt.title('Box Plot of Principal Components')
plt.xlabel('Principal Components')
plt.ylabel('Values')
plt.show()

# Scatter plot
plt.figure(figsize=(10, 5))
plt.scatter(pca_df['PC1'], pca_df['PC2'], alpha=0.5)
plt.title('Scatter Plot of Principal Components')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.grid(True)
plt.show()

# Violin plot
plt.figure(figsize=(10, 5))
plt.violinplot(pca_df.values, showmedians=True)
plt.title('Violin Plot of Principal Components')
plt.xlabel('Principal Components')
plt.ylabel('Values')
plt.xticks(ticks=[1, 2, 3, 4, 5], labels=['PC1', 'PC2', 'PC3', 'PC4', 'PC5'])
plt.show()

# Line graph of cumulative explained variance
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)
plt.figure(figsize=(10, 5))
plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance,
         marker='o', linestyle='--')
plt.title('Cumulative Explained Variance Ratio')
plt.xlabel('Number of Principal Components')
plt.ylabel('Cumulative Explained Variance Ratio')
plt.grid(True)
plt.show()

# Heat map of the PCA loadings (pca.components_): rows are the principal
# components, columns are the original features
plt.figure(figsize=(12, 8))
sns.heatmap(pca.components_, annot=True, cmap='coolwarm')
plt.title('Heat Map of PCA')
plt.xlabel('Original Features')
plt.ylabel('Principal Components')
plt.show()
Output:
Assignment 4: Implement the Apriori Algorithm using a suitable dataset
Python Code -
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
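The listing above ends with the imports. A minimal sketch of the rest of the pipeline, continuing from those imports, could look like the following; the file name groceries.csv, the transaction file layout (one comma-separated transaction per line), and the support and confidence thresholds are assumptions chosen to be consistent with the groceries-style output shown below.
from mlxtend.preprocessing import TransactionEncoder

# Read the transactions: one transaction per line, items comma-separated
# ('groceries.csv' is an assumed file name; adjust to the dataset used)
with open('groceries.csv') as f:
    transactions = [line.strip().split(',') for line in f if line.strip()]

# One-hot encode the transactions into a boolean item matrix
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Mine frequent itemsets (min_support=0.05 is an assumed threshold)
frequent_itemsets = apriori(df, min_support=0.05, use_colnames=True)
print(frequent_itemsets)

# Derive association rules; a strict confidence threshold can leave the
# rule set empty, as in the output below
rules = association_rules(frequent_itemsets, metric='confidence',
                          min_threshold=0.5)
print(rules)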
Output:
    support      itemsets
0  0.064543  (whole milk)

Empty DataFrame
Columns: [antecedents, consequents, antecedent support, consequent support, support, confidence, lift, leverage, conviction, zhangs_metric]
Index: []