Madhav Institute of Technology and Science, Gwalior

(Deemed to be University)

NAAC accredited with A++ Grade

Centre for Artificial Intelligence

A
Practical File
On
“Data Mining And Warehousing”
(270601)

Session: Jan-May 2024

SUBMITTED BY :
PRINCE SINGH
0901AM211041

SUBMITTED TO:
Prof. Shubha Mishra
INDEX

S.No.  Name of Experiment                                                        Date of Experiment | Submitted On | Sign

1.  To perform basic operations for mining data (Preprocessing, Regression, Classification, Association, Clustering and Visualization) using WEKA simulator/Python.
2.  Setting up a flow to load an ARFF file (batch mode) and perform a cross-validation using J48 (WEKA's C4.5 implementation).
3.  Draw multiple ROC curves in the same plot window for J48 and RandomForest as classifiers using Knowledge Flow in WEKA.
4.  Training and testing of naive Bayes classifiers incrementally using Knowledge Flow in WEKA.
5.  Write a program to count the occurrence frequency of items in the given data set.
6.  Write a program to generate frequent itemsets from a given data set.
7.  Write a program to generate association rules from the generated frequent itemsets.
8.  Write a program to implement various Association Rule Mining algorithms such as Apriori, Eclat, FP-Growth and FP-Tree.
9.  Write a program to implement different types of classification algorithms such as SVM, Decision Tree, Random Forest and KNN.
10. Write a program to implement different types of clustering algorithms such as K-Means, Hierarchical, DBSCAN and EM Clustering.
PROGRAM - 1
AIM : To perform basic operations for mining data (Preprocessing, Regression, Classification, Association, Clustering and Visualization) using WEKA simulator/Python.

THEORY :

Weka contains a collection of visualization tools and algorithms for data analysis and
predictive modelling, together with graphical user interfaces for easy access to these
functions. Weka supports several standard data mining tasks, specifically data preprocessing,
clustering, classification, regression, visualization, and feature selection. Input to Weka is
expected to be formatted according to the Attribute-Relation File Format (ARFF) and stored
in a file with the .arff extension.

 Preprocessing:

The preprocessing of data is a crucial task in data mining. Most raw data contains empty or
duplicate values, garbage values, outliers, extra columns, or inconsistent naming conventions,
all of which degrade the results.

To make the data cleaner and more consistent, WEKA provides a comprehensive set of
options under the Filter category.
 Classification :

Classification is one of the essential functions in machine learning, where we assign classes
or categories to items. The classic examples of classification are: declaring a brain tumour as
"malignant" or "benign" or assigning an email to a "spam" or "not_spam" class.
 Clustering :

In clustering, a dataset is arranged into different groups/clusters based on some similarities.
In this case, the items within the same cluster are similar to one another but different from
the items in other clusters. Examples of clustering include identifying customers with similar
behaviours and organizing regions according to homogeneous land use.

 Association :

Association rules highlight all the associations and correlations between items of a dataset. In
short, an association rule is an if-then statement that depicts the probability of relationships
between data items. A classic example of association is the connection between the sale of
milk and bread. In this category, the tool provides the Apriori, FilteredAssociator, and
FPGrowth algorithms for association rule mining.
 Visualisation :

In the visualize tab, different plot matrices and graphs are available to show the trends and
errors identified by the model.
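
The same basic operations can also be sketched directly in Python with scikit-learn. The snippet below is only a minimal, illustrative example on the built-in Iris dataset (it is not tied to any particular WEKA workflow; association rule mining is covered separately in Programs 6-8):

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

# Load the Iris dataset (stands in for an ARFF file loaded into WEKA)
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Preprocessing: standardize the features (analogous to a WEKA filter)
X_scaled = StandardScaler().fit_transform(X)

# Classification: decision tree on a train/test split
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("Classification accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Regression: predict petal width from the other three attributes
reg = LinearRegression().fit(X_scaled[:, :3], X_scaled[:, 3])
print("Regression R^2:", reg.score(X_scaled[:, :3], X_scaled[:, 3]))

# Clustering: group the samples into three clusters
labels = KMeans(n_clusters=3, random_state=42, n_init=10).fit_predict(X_scaled)

# Visualization: scatter plot of the first two attributes coloured by cluster
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=labels, cmap='viridis')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title('K-Means clusters on the Iris data')
plt.show()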
PROGRAM - 2
AIM : Setting up a flow to load an ARFF file (batch mode) and perform a cross-validation
using J48 (WEKA's C4.5 implementation).

THEORY :

 ARFF (Attribute-Relation File Format):

ARFF is a file format commonly used to describe datasets for WEKA. It includes information
about the dataset's attributes, their types, and the data values.

 Cross-validation:

A technique used to assess the performance and generalizability of a machine learning model.
It involves partitioning the dataset into subsets, training the model on some subsets, and
evaluating it on others.

 J48 (C4.5):

J48 is an implementation of the C4.5 algorithm in WEKA. It's a decision tree algorithm used
for classification.
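
Outside the Knowledge Flow, an equivalent batch experiment can be sketched in Python: scipy reads the ARFF file and a CART decision tree from scikit-learn stands in for J48. The file path, the class attribute name and the fold count below are assumptions, not part of the WEKA flow itself:

import pandas as pd
from scipy.io import arff
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Load the ARFF file in batch mode (path is illustrative)
data, meta = arff.loadarff("iris.arff")
df = pd.DataFrame(data)

# Nominal attributes come back as bytes; decode the class column
df["class"] = df["class"].str.decode("utf-8")

X = df.drop(columns=["class"]).values
y = df["class"].values

# 10-fold cross-validation with a decision tree (a CART stand-in for J48/C4.5)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=10)
print("Per-fold accuracy:", scores)
print("Mean accuracy: {:.2f}%".format(scores.mean() * 100))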

OUTPUT –
PROGRAM - 3
AIM - Draw multiple ROC curves in the same plot window for J48 and Random Forest as
classifiers using Knowledge Flow in Weka.

THEORY –

 ROC Curves:

Receiver Operating Characteristic (ROC) curves are graphical representations commonly
used to evaluate the performance of binary classification algorithms. They illustrate the
trade-off between the true positive rate (sensitivity) and the false positive rate
(1 - specificity) across various decision thresholds.

 Key Concepts:

 True Positive Rate (TPR):

Also known as sensitivity, TPR measures the proportion of actual positive instances that
are correctly identified by the classifier.

TPR = TP / (TP + FN), where TP denotes true positives and FN denotes false negatives.

 False Positive Rate (FPR):

FPR measures the proportion of actual negative instances that are incorrectly classified
as positive by the classifier.

FPR = FP / (FP + TN), where FP denotes false positives and TN denotes true negatives.

The ROC curve is created by plotting the TPR against the FPR for different threshold values.
Each point on the curve represents a sensitivity-specificity pair corresponding to a particular
decision threshold. A diagonal line (the line of no-discrimination) represents the
performance of a random classifier.

 Interpretation of ROC Curves:

 Area Under the Curve (AUC):


• AUC quantifies the overall performance of the classifier.
• AUC ranges from 0 to 1, where 1 indicates a perfect classifier, and 0.5 indicates
a random classifier.
• Higher AUC values indicate better classifier performance.

 Shape of the ROC Curve:


 ROC curves with higher elevations and closer to the upper-left corner indicate
superior classifier performance.
 The closer the ROC curve to the upper-left corner, the better the classifier
discriminates between positive and negative instances.
 Use Cases and Significance:

 Comparative Analysis:
ROC curves enable the comparison of multiple classifiers to determine which one
performs better across various decision thresholds.
It helps in selecting the most suitable classifier for a given task based on its AUC value.

 Model Selection and Tuning:


ROC curves aid in tuning classifier parameters to optimize performance.
They provide insights into the sensitivity-specificity trade-offs, helping to select an
appropriate operating point for the classifier.
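
For comparison, the same kind of experiment can be approximated outside WEKA. The Python sketch below is only an illustrative analogue of the Knowledge Flow setup: it overlays ROC curves for a decision tree (a J48 analogue) and a random forest on a binary dataset.

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Binary classification data (stands in for an ARFF file with a two-class target)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

classifiers = {
    "Decision Tree (J48 analogue)": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

plt.figure()
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    # Probability of the positive class drives the threshold sweep
    scores = clf.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    plt.plot(fpr, tpr, label="{} (AUC = {:.3f})".format(name, auc(fpr, tpr)))

# Diagonal line of no-discrimination (random classifier)
plt.plot([0, 1], [0, 1], linestyle="--", color="grey")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC curves: Decision Tree vs Random Forest")
plt.legend()
plt.show()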

WEKA KNOWLEDGE FLOW ENVIRONMENT VISUALIZATION :


RESULT :
PROGRAM - 4
AIM : Training and testing of naive Bayes classifiers incrementally using Knowledge Flow in
Weka.

THEORY :

Naive Bayes classifiers are simple probabilistic classifiers based on applying Bayes' theorem
with strong (naive) independence assumptions between the features. They are often used in text
classification, spam filtering, and other applications where the assumption of independence
between features holds reasonably well.

In WEKA, the Knowledge Flow interface allows for the creation of workflows for data mining
tasks, including incremental learning. The IncrementalClassifierUpdate operator in WEKA's
Knowledge Flow allows us to train and test Naive Bayes classifiers incrementally, updating the
model as new data arrives.

Components :

 ARFF (Attribute-Relation File Format) : ARFF is a file format commonly used to


describe datasets for WEKA. It includes information about the dataset's attributes, their
types, and the data values.

 ClassAssigner : Sets the chosen column (first or last) of the incoming data as the class attribute.

 NaiveBayesUpdateable : This is a class that implements the Naive Bayes algorithm for
classification and is designed to handle data streams or situations where data arrives
sequentially and cannot be stored in memory all at once. It allows for incremental updating of
the model as new data arrives, which is useful for scenarios where training data is constantly
changing or evolving.

 IncrementalClassifierEvaluator : This is a class that allows you to evaluate the
performance of a classifier on a data stream or in a situation where data arrives sequentially
and cannot be stored in memory all at once. It is useful for assessing the performance of
classifiers in online learning scenarios or when dealing with continuously evolving data.

 TextViewer : Used to show the results of the model in text format.
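
A rough Python analogue of this incremental flow is sketched below: scikit-learn's GaussianNB with partial_fit plays the role of NaiveBayesUpdateable, and the running accuracy printout mimics the IncrementalClassifierEvaluator. The dataset and batch size are purely illustrative:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.utils import shuffle

# Load and shuffle the data, then hold out a test set for evaluation
X, y = load_iris(return_X_y=True)
X, y = shuffle(X, y, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

nb = GaussianNB()
classes = np.unique(y)

# Feed the training data in small chunks, updating the model after each chunk
batch_size = 15
for start in range(0, len(X_train), batch_size):
    X_batch = X_train[start:start + batch_size]
    y_batch = y_train[start:start + batch_size]
    nb.partial_fit(X_batch, y_batch, classes=classes)  # incremental update
    acc = accuracy_score(y_test, nb.predict(X_test))
    print("Seen {:3d} instances -> test accuracy {:.2f}%".format(start + len(X_batch), acc * 100))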


OUTPUT :
PROGRAM - 5
AIM : Write a program to count the occurrence frequency of items in the given data set.

THEORY :

Counting the occurrence frequency of items in a dataset is a fundamental task in data analysis. It
provides valuable insights into the distribution of data and helps in understanding the importance
or prevalence of different categories or classes within the dataset. This information can be useful
in various applications such as classification, anomaly detection, and clustering.

CODE :

import pandas as pd

iris_df = pd.read_csv("/content/iris_data.csv", header=None,
                      names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])

# Count the occurrence frequency of items in the 'class' column
frequency = iris_df['class'].value_counts()

print("Occurrence frequency of items in the IRIS dataset:")
print(frequency)

OUTPUT :
PROGRAM - 6
AIM : Write a program to generate frequent itemsets from a given data set.

THEORY :

In this experiment, we applied the Apriori algorithm to a given dataset to generate frequent
itemsets. The algorithm identified sets of items that frequently appear together in transactions,
with the minimum support threshold set to 0.2.

The generated frequent itemsets can be used to derive association rules, which can provide
valuable insights into the relationships between different items in the dataset. These rules can be
used for various purposes, such as market basket analysis, where they can help identify patterns
in customer purchasing behavior.

CODE :

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

df = pd.read_csv("/new_dataset.csv")

te = TransactionEncoder()
te_ary = te.fit(df.values).transform(df.values)
df = pd.DataFrame(te_ary, columns=te.columns_)

frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)


print("Frequent Itemsets:")
print(frequent_itemsets)

OUTPUT :
PROGRAM - 7
AIM : Write a program to generate Association rules from the generated frequent itemsets.

THEORY :

Association rule mining is a technique used to discover interesting relationships, or associations,
between items in large datasets. It is often used in market basket analysis to uncover patterns in
consumer behavior. The process involves finding frequent itemsets, which are sets of items that
frequently occur together in transactions, and then deriving association rules from these itemsets.

In this experiment, we first generate frequent itemsets from the dataset using the Apriori
algorithm. Frequent itemsets are sets of items that have a support value greater than a specified
threshold. We then use these frequent itemsets to generate association rules. Association rules
are rules that indicate a strong relationship between the presence of certain items (antecedent)
and the presence of another item (consequent) in a transaction, based on the support and
confidence values of the rule.

CODE :

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

df = pd.read_csv("/new_dataset.csv")

te = TransactionEncoder()
te_ary = te.fit(df.values).transform(df.values)
df = pd.DataFrame(te_ary, columns=te.columns_)

frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)

association_rules_df = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)

print("Association Rules:")
print(association_rules_df)

OUTPUT :
Program – 8
Aim: Write a program to implement various Association Rule Mining algorithms such as
Apriori, Eclat, FP-Growth and FP-Tree.

Theory:
1. Apriori Algorithm
Theory:

The Apriori algorithm is based on the principle of "apriori property," which states that any subset
of a frequent itemset must also be frequent. The algorithm employs a level-wise approach to
discover frequent itemsets. It starts by identifying frequent individual items, then iteratively
generates larger itemsets by joining frequent itemsets found in the previous step.

Implementation:

The Python implementation of the Apriori algorithm involves generating candidate itemsets,
pruning infrequent itemsets, and iterating until no new frequent itemsets are found. It then
derives association rules based on the discovered frequent itemsets and evaluates them using
support and confidence measures.
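
As an illustration, a minimal from-scratch sketch of this join-and-prune loop on a small hand-made transaction list is shown below (the items and the 0.4 support threshold are only examples):

from itertools import combinations

# A small transaction database (illustrative)
transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "jam"},
]
min_support = 0.4
n = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item of the itemset
    return sum(itemset <= t for t in transactions) / n

# Level 1: frequent individual items
current = [frozenset([i]) for i in set().union(*transactions)
           if support(frozenset([i])) >= min_support]
frequent = {c: support(c) for c in current}

k = 2
while current:
    # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets
    candidates = {a | b for a in current for b in current if len(a | b) == k}
    # Prune step: keep candidates whose every (k-1)-subset is frequent
    candidates = [c for c in candidates
                  if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
    current = [c for c in candidates if support(c) >= min_support]
    frequent.update({c: support(c) for c in current})
    k += 1

for itemset, sup in sorted(frequent.items(), key=lambda kv: (-kv[1], len(kv[0]))):
    print(set(itemset), "support = {:.2f}".format(sup))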

2. Eclat Algorithm

Theory:

Eclat (Equivalence Class Clustering and bottom-up Lattice Traversal) is a depth-first search
algorithm that avoids candidate generation. It uses a vertical database representation to
efficiently mine frequent itemsets by intersecting transactions containing each item.

Implementation:

The Python implementation of the Eclat algorithm involves constructing a vertical database
representation, recursively exploring itemsets, and counting their support. It efficiently generates
frequent itemsets without the need for candidate generation, making it suitable for memory-
constrained environments.
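
A compact, illustrative sketch of the tid-list intersection idea, reusing the same style of toy transactions (the absolute minimum count of 2 is an assumption):

from collections import defaultdict

transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "jam"},
]
min_count = 2  # absolute minimum support count

# Vertical representation: each item maps to the set of transaction ids containing it
tidsets = defaultdict(set)
for tid, t in enumerate(transactions):
    for item in t:
        tidsets[item].add(tid)

frequent = {}

def eclat(prefix, items):
    # Depth-first search: extend 'prefix' with each item whose tidset stays frequent
    while items:
        item, tids = items.pop()
        if len(tids) >= min_count:
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Intersect tid-lists to form the candidate extensions for deeper recursion
            suffix = [(other, tids & other_tids)
                      for other, other_tids in items
                      if len(tids & other_tids) >= min_count]
            eclat(itemset, suffix)

eclat(set(), sorted(tidsets.items()))
for itemset, count in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(set(itemset), "count =", count)
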
3. FP-Growth Algorithm

Theory:

The FP-Growth (Frequent Pattern Growth) algorithm is a tree-based method that constructs a
compact data structure called FP-tree to represent the dataset. It then recursively mines frequent
itemsets from the FP-tree by exploiting the properties of prefix paths.

Implementation:

The Python implementation of the FP-Growth algorithm involves constructing the FP-tree,
mining frequent itemsets using the FP-tree structure, and deriving association rules. FP-Growth
eliminates the need for candidate generation and multiple scans of the dataset, making it highly
efficient for large datasets.
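
Implementing FP-Growth from scratch is considerably longer, so the illustrative sketch below leans on mlxtend's ready-made fpgrowth function (the transactions and the 0.4 threshold are again only examples):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# Toy transactions, one-hot encoded as mlxtend expects
transactions = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk", "butter"],
    ["bread", "jam"],
]
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Mine frequent itemsets directly via the FP-tree, with no candidate generation
frequent_itemsets = fpgrowth(onehot, min_support=0.4, use_colnames=True)
print(frequent_itemsets)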

4. FP-Tree Algorithm

Theory:

FP-Tree is a variation of the FP-Growth algorithm that focuses on constructing the FP-tree data
structure efficiently. It uses a frequent itemset ordering technique to optimize the construction
process and reduce memory consumption.

Implementation:

The Python implementation of the FP-Tree algorithm involves constructing the FP-tree data
structure, mining frequent itemsets, and deriving association rules. It shares similarities with FP-
Growth but may offer better performance in certain scenarios due to its optimized tree
construction.
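
To make the tree structure itself concrete, the following is a bare-bones, illustrative FP-tree construction (the header table and node links are omitted for brevity; items are ordered by descending global frequency as described above):

from collections import Counter

class FPNode:
    # One node of an FP-tree: an item, its count, and links to children
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count=2):
    # 1. Count items and keep only the frequent ones
    counts = Counter(item for t in transactions for item in t)
    frequent = {i for i, c in counts.items() if c >= min_count}

    # 2. Insert each transaction, items sorted by descending global frequency
    root = FPNode(None)
    for t in transactions:
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, parent=node)
            node = node.children[item]
            node.count += 1  # shared prefixes just increment the count
    return root

def show(node, depth=0):
    label = node.item if node.item is not None else "(root)"
    print("  " * depth + "{} ({})".format(label, node.count))
    for child in node.children.values():
        show(child, depth + 1)

transactions = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk", "butter"],
    ["bread", "jam"],
]
show(build_fp_tree(transactions))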

Code:
Output:
Program – 9
Aim: Write a program to implement different types of classification algorithms such as SVM, Decision Tree, Random Forest and KNN.

Theory:

Classification algorithms are essential tools in machine learning and data mining, facilitating the
categorization of data into distinct classes or categories based on input features. Various
classification algorithms employ different approaches to learn patterns and make predictions.

Decision Trees: Decision trees partition the feature space into regions, creating a tree-like
structure where each internal node represents a decision based on a feature value, and each leaf
node represents a class label. Decision trees are interpretable and can handle both numerical and
categorical data, making them suitable for understanding complex decision-making processes.

Random Forest: Random Forest is an ensemble learning method that constructs multiple decision
trees during training. It aggregates the predictions of individual trees to determine the final class
label. By reducing overfitting and improving generalization, Random Forest achieves higher
accuracy than individual decision trees. It also provides estimates of feature importance, aiding
in feature selection.

Support Vector Machines (SVM): SVM aims to find the optimal hyperplane that separates data
points of different classes with the maximum margin. SVM can handle high-dimensional data
efficiently and is effective in cases where the data is not linearly separable by transforming the
feature space using kernel functions. However, SVM's performance may degrade with large
datasets.

K-Nearest Neighbors (KNN): KNN is a non-parametric algorithm that classifies data points
based on the majority class among their 'k' nearest neighbors in the feature space. KNN is simple
and intuitive, making no assumptions about the underlying data distribution. However, its
performance can be sensitive to the choice of the distance metric and the value of 'k'.

Code:

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

def train_and_evaluate(classifier, X_train, X_test, y_train, y_test):
    # Train the classifier
    classifier.fit(X_train, y_train)
    # Make predictions
    y_pred = classifier.predict(X_test)
    # Calculate accuracy
    accuracy = accuracy_score(y_test, y_pred)
    return accuracy

# Load sample dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Feature scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Define classifiers
svm_classifier = SVC(kernel='linear', random_state=42)
dt_classifier = DecisionTreeClassifier(random_state=42)
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
knn_classifier = KNeighborsClassifier(n_neighbors=3)

# Train and evaluate classifiers
classifiers = {
    "SVM": svm_classifier,
    "Decision Tree": dt_classifier,
    "Random Forest": rf_classifier,
    "K-Nearest Neighbors": knn_classifier
}
for name, classifier in classifiers.items():
    accuracy = train_and_evaluate(classifier, X_train, X_test, y_train, y_test)
    print("Accuracy of {} classifier: {:.2f}%".format(name, accuracy * 100))

Output:
Program - 10
Aim: Write a program to implement different types of clustering algorithms such as K-Means,
Hierarchical, DBSCAN and EM clustering.

Theory:
Clustering algorithms are unsupervised learning techniques used to group similar data points
together. One commonly used algorithm is K-Means, which partitions the data into 'k' clusters by
iteratively updating cluster centroids and assigning data points to the nearest centroid. K-Means
is efficient and easy to implement but requires the number of clusters to be specified beforehand.

Hierarchical clustering builds a hierarchy of clusters by merging or splitting them based on the
similarity between data points. It can be agglomerative (starting with individual data points and
merging them into clusters) or divisive (starting with all data points in one cluster and
recursively splitting them).

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) identifies clusters
based on the density of data points. It groups together points in high-density regions while
classifying points in low-density regions as outliers. DBSCAN is effective in discovering
clusters of arbitrary shapes and sizes and does not require the number of clusters as input.

EM (Expectation-Maximization) clustering models the data as a mixture of Gaussian
distributions. It iteratively maximizes the likelihood of the data under the Gaussian mixture
model, estimating the parameters of each component distribution. EM clustering is flexible and
can capture complex cluster structures, making it suitable for datasets with overlapping clusters
or non-spherical shapes.

Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.mixture import GaussianMixture

# Generate sample data
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

# K-Means clustering
kmeans = KMeans(n_clusters=4)
kmeans_labels = kmeans.fit_predict(X)

# Hierarchical clustering
hierarchical = AgglomerativeClustering(n_clusters=4)
hierarchical_labels = hierarchical.fit_predict(X)

# DBSCAN clustering
dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan_labels = dbscan.fit_predict(X)

# EM (Expectation-Maximization) clustering
em = GaussianMixture(n_components=4)
em_labels = em.fit_predict(X)

# Plotting
plt.figure(figsize=(12, 10))

plt.subplot(2, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c=kmeans_labels, cmap='viridis')
plt.title('K-Means Clustering')

plt.subplot(2, 2, 2)
plt.scatter(X[:, 0], X[:, 1], c=hierarchical_labels, cmap='viridis')
plt.title('Hierarchical Clustering')

plt.subplot(2, 2, 3)
plt.scatter(X[:, 0], X[:, 1], c=dbscan_labels, cmap='viridis')
plt.title('DBSCAN Clustering')

plt.subplot(2, 2, 4)
plt.scatter(X[:, 0], X[:, 1], c=em_labels, cmap='viridis')
plt.title('EM (Expectation-Maximization) Clustering')

plt.tight_layout()
plt.show()
Output:
