DATA MINING

(DISCIPLINE SPECIFIC ELECTIVE)

PRACTICAL FILE

Submitted to:
MR. RAJEEV RAI (Teacher In-charge)
As part of the academic curriculum (3RD YEAR, 6TH SEMESTER)
B.SC. (HONS) COMPUTER SCIENCE

NAME: KASHISH MADAN

COLLEGE ROLL NO: 21570006
EXAMINATION ROLL NO: 21033570029
COLLEGE NAME: KALINDI COLLEGE, DU

INDEX
S. No.  Program Name
1.  Datasets Used
2.  Practical Q1
3.  Practical Q2
4.  Practical Q3
5.  Practical Q4
6.  Practical Q5
7.  Practical Q6

DATASETS
1. People1.csv
2. Dirty_Iris.csv
3. Wine.csv
4. Market_basket_optimization
5. Social_network.csv
6. Mall_customers.csv
7. Wholesale_customers data.csv

QUESTION 1
Create a file “people.txt” with the following data:

Age   agegroup   height   status    yearsmarried
21    adult      6.0      single    -1
2     child      3        married   0
18    adult      5.7      married   20
221   elderly    5        widowed   2
34    child      -7       married   3

i. Read the data from the file “people.txt”.

ii. Create a ruleset E that contains rules to check for the
following conditions:
1. The age should be in the range 0-150.
2. The age should be greater than yearsmarried
3. The status should be married or single or widowed.
4. If the age is less than 18 the agegroup should be child, if age
is between 18 and 65 the agegroup should be adult, if the
age is more than 65 the agegroup should be elderly.

iii. Check whether ruleset E is violated by the data in the file people.txt.
iv. Summarize the rules obtained in part (iii).
v. Visualize the results obtained in (iii).
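
The steps in parts (i) to (v) can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the code from the original file: the table is embedded as a string instead of being read from disk, the names `RULES`, `check` and `summarize` are hypothetical, and a text bar chart stands in for the plot. Rule 4 is read here as "18 to 65 inclusive is adult", which is an assumption about the boundary cases.

```python
# Contents of people.txt as given in the question (whitespace separated).
PEOPLE_TXT = """\
Age agegroup height status yearsmarried
21 adult 6.0 single -1
2 child 3 married 0
18 adult 5.7 married 20
221 elderly 5 widowed 2
34 child -7 married 3
"""

def read_people(text):
    """Parse the whitespace-separated table into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        rec = dict(zip(header, line.split()))
        rec["Age"] = int(rec["Age"])
        rec["height"] = float(rec["height"])
        rec["yearsmarried"] = int(rec["yearsmarried"])
        rows.append(rec)
    return rows

def expected_agegroup(age):
    """Age group implied by rule 4 (boundaries are an assumption)."""
    if age < 18:
        return "child"
    if age <= 65:
        return "adult"
    return "elderly"

# Ruleset E: rule name -> predicate that is True when the record PASSES.
RULES = {
    "age_in_range": lambda r: 0 <= r["Age"] <= 150,
    "age_gt_yearsmarried": lambda r: r["Age"] > r["yearsmarried"],
    "status_valid": lambda r: r["status"] in {"married", "single", "widowed"},
    "agegroup_matches_age": lambda r: r["agegroup"] == expected_agegroup(r["Age"]),
}

def check(rows, rules=RULES):
    """One dict per record: rule name -> True if that rule is violated."""
    return [{name: not rule(r) for name, rule in rules.items()} for r in rows]

def summarize(violations):
    """Number of records violating each rule."""
    return {name: sum(v[name] for v in violations) for name in violations[0]}

rows = read_people(PEOPLE_TXT)      # part (i)
violations = check(rows)            # part (iii)
summary = summarize(violations)     # part (iv)

# Part (v): a text bar chart in place of a graphical plot.
for name, count in summary.items():
    print(f"{name:22s} {count} {'#' * count}")
```

Note that the ruleset as stated does not flag the negative height or the negative yearsmarried value, because no rule in E covers them.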

PLOT IS AS FOLLOWS: [figure not available]

QUESTION 2

Perform the following preprocessing tasks on the dirty_iris dataset:

i. Calculate the number and percentage of observations that are
complete.
ii. Replace all the special values in data with NA.
iii. Define these rules in a separate text file and read them.
(Use editfile function in R (package editrules). Use similar
function in Python).
Print the resulting constraint object.
- Species should be one of the following values: setosa, versicolor
or virginica.
- All measured numerical properties of an iris should be positive.
- The petal length of an iris is at least 2 times its petal width.
- The sepal length of an iris cannot exceed 30 cm.
- The sepals of an iris are longer than its petals.

iv. Determine how often each rule is broken (violatedEdits). Also summarize and plot the result.
v. Find outliers in sepal length using boxplot and boxplot.stats.
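
A rough Python outline of tasks (i) to (v) is given below. It is a sketch under several assumptions: the rows are hypothetical and are NOT taken from Dirty_Iris.csv, the rules are written inline as lambdas instead of being read from a separate text file, and outliers are found with the usual 1.5 x IQR boxplot rule computed from `statistics.quantiles`, which need not match R's `boxplot.stats` exactly on every input.

```python
import math
import statistics

COLS = ("sepal_length", "sepal_width", "petal_length", "petal_width", "species")

# Hypothetical rows for illustration only (not the real dataset).
# None plays the role of NA; inf is a special value still to be replaced.
RAW = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (6.4, 2.8, 5.6, 2.2, "virginica"),
    (None, 3.0, 4.2, 1.5, "versicolor"),
    (5.8, 2.7, float("inf"), 1.9, "virginica"),
    (4.9, 3.1, 1.5, -0.1, "setosa"),
    (73.0, 2.9, 6.3, 1.8, "virginica"),
    (5.0, 3.4, 6.0, 0.4, "setosa"),
    (6.0, 2.2, 4.0, 1.0, "versicolour"),
]

# (i) Number and percentage of complete observations (no NA field).
n_complete = sum(all(v is not None for v in row) for row in RAW)
pct_complete = 100 * n_complete / len(RAW)

# (ii) Replace special values (NaN, +Inf, -Inf) with NA (None).
def to_na(value):
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value

data = [dict(zip(COLS, map(to_na, row))) for row in RAW]

# (iii) The five rules; each predicate is True when the record passes.
NUMERIC = COLS[:4]
RULES = {
    "species_valid": lambda r: r["species"] in {"setosa", "versicolor", "virginica"},
    "all_positive": lambda r: all(r[c] > 0 for c in NUMERIC),
    "petal_len_ge_2x_width": lambda r: r["petal_length"] >= 2 * r["petal_width"],
    "sepal_len_le_30": lambda r: r["sepal_length"] <= 30,
    "sepal_longer_than_petal": lambda r: r["sepal_length"] > r["petal_length"],
}

def violated(rule, record):
    """True if broken, False if satisfied, None if an NA blocks evaluation."""
    try:
        return not rule(record)
    except TypeError:  # comparison or arithmetic involving None
        return None

# (iv) How often each rule is broken, with a text bar chart as the plot.
violation_counts = {
    name: sum(violated(rule, r) is True for r in data)
    for name, rule in RULES.items()
}
for name, count in violation_counts.items():
    print(f"{name:24s} {count} {'#' * count}")

# (v) Outliers in sepal length using the 1.5 * IQR boxplot rule.
sepal = sorted(r["sepal_length"] for r in data if r["sepal_length"] is not None)
q1, _, q3 = statistics.quantiles(sepal, n=4, method="inclusive")
iqr = q3 - q1
outliers = [x for x in sepal if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(f"complete: {n_complete}/{len(RAW)} ({pct_complete:.1f}%)")
print("sepal length outliers:", outliers)
```

Records where a rule cannot be evaluated because of an NA are counted as neither satisfied nor broken, which mirrors how NA results are usually reported separately from violations.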

BOXPLOT IS AS FOLLOWS: [figure not available]

QUESTION 3

Load the data from the wine dataset. Check whether all attributes are
standardized or not (mean is 0 and standard deviation is 1). If not,
standardize the attributes. Do the same with the iris dataset.
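
The check and the z-score transformation can be illustrated with a short standard-library sketch. The column names and values below are made up for the example and are not read from Wine.csv; the population standard deviation (`statistics.pstdev`) is used, which is a design choice, and a sample standard deviation would give slightly different scaled values.

```python
import math
import statistics

def is_standardized(column, tol=1e-9):
    """True if the column has mean 0 and standard deviation 1 (within tol)."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return (math.isclose(mean, 0.0, abs_tol=tol)
            and math.isclose(std, 1.0, abs_tol=tol))

def standardize(column):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column has no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]

# Hypothetical attribute columns (illustrative values only).
table = {
    "attribute_a": [12.0, 13.0, 14.0, 15.0],
    "attribute_b": [1.0, 2.5, 2.0, 4.5],
}

before = {name: is_standardized(col) for name, col in table.items()}
scaled = {name: standardize(col) for name, col in table.items()}
after = {name: is_standardized(col) for name, col in scaled.items()}

print("standardized before:", before)
print("standardized after: ", after)
```

The same two functions apply unchanged to the numeric columns of the iris data.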

QUESTION 4

Run the Apriori algorithm to find frequent itemsets and association rules.
1.1 Use minimum support as 50% and minimum confidence as 75%
1.2 Use minimum support as 60% and minimum confidence as 60%
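
A compact, standard-library sketch of Apriori under both parameter settings is shown below. The six transactions are hypothetical and are not taken from the Market_basket_optimization data; the function names `apriori` and `association_rules` are illustrative and do not refer to any particular library.

```python
from itertools import combinations

# Hypothetical transactions for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for all frequent itemsets."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for qualifying rules."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), confidence))
    return rules

# Setting 1.1: support 50%, confidence 75%.
frequent_a = apriori(TRANSACTIONS, 0.50)
rules_a = association_rules(frequent_a, 0.75)

# Setting 1.2: support 60%, confidence 60%.
frequent_b = apriori(TRANSACTIONS, 0.60)
rules_b = association_rules(frequent_b, 0.60)

for label, rules in (("1.1", rules_a), ("1.2", rules_b)):
    print(f"Setting {label}:")
    for antecedent, consequent, confidence in rules:
        print(f"  {sorted(antecedent)} -> {sorted(consequent)}  conf={confidence:.2f}")
```

On this toy data the higher support threshold removes the pairs involving eggs, so fewer rules survive even though the confidence threshold is lower.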

QUESTION 5
Use Naïve Bayes, K-nearest neighbour (KNN) and Decision tree classification
algorithms to build classifiers.
Divide the dataset into training and test set. Compare the accuracy
of the different classifiers under the following situations:
5.1 a) Training set = 75% Test set = 25%
b) Training set = 66.6% Test set = 33.3%
5.2 Training set is chosen by
i) Hold out method
ii) Random Subsampling
iii) Cross validation Method
Compare the accuracy of the classifiers obtained
5.3 Dataset is scaled to standard format.
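
The three ways of choosing the training set in 5.2 differ only in how the data is partitioned, which the following sketch tries to make concrete. It assumes a synthetic two-class dataset and uses a small hand-written k-nearest-neighbour classifier as a stand-in for all three classifiers; none of it is the original practical code, and feature scaling (5.3) is not shown.

```python
import math
import random
from collections import Counter

def make_data(n_per_class=60, seed=0):
    """Synthetic 2-D data: two Gaussian blobs labelled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (6.0, 6.0)]):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return data

def knn_predict(train, point, k=3):
    """Majority label among the k training points nearest to `point`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k=3):
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def holdout(data, train_fraction, rng):
    """Hold-out: one random split into a training part and a test part."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return accuracy(shuffled[:cut], shuffled[cut:])

def random_subsampling(data, train_fraction, repeats, rng):
    """Random subsampling: average accuracy over repeated hold-out splits."""
    return sum(holdout(data, train_fraction, rng) for _ in range(repeats)) / repeats

def cross_validation(data, folds, rng):
    """k-fold cross-validation: each fold serves as the test set once."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    scores = []
    for i in range(folds):
        test = shuffled[i::folds]
        train = [item for j, item in enumerate(shuffled) if j % folds != i]
        scores.append(accuracy(train, test))
    return sum(scores) / folds

data = make_data()
rng = random.Random(42)
results = {
    "holdout 75/25": holdout(data, 0.75, rng),
    "holdout 66.6/33.3": holdout(data, 2 / 3, rng),
    "random subsampling (5 x 75/25)": random_subsampling(data, 0.75, 5, rng),
    "5-fold cross-validation": cross_validation(data, 5, rng),
}
for name, score in results.items():
    print(f"{name:32s} {score:.3f}")
```

To compare classifiers fairly, the same splits should be reused for each classifier, for example by re-seeding the random generator before each run.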

1. Naïve Bayes
(When training set = 75% and test set = 25%)
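
For reference, a Gaussian Naïve Bayes classifier with a 75%/25% hold-out split can be written in a few lines. The sketch below is an assumption-laden illustration: the data is synthetic, the helper names `fit`, `log_posterior` and `predict` are hypothetical, and each feature is modelled as an independent normal distribution per class.

```python
import math
import random
import statistics
from collections import defaultdict

def make_data(n_per_class=60, seed=0):
    """Synthetic 2-D data: two Gaussian blobs labelled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (6.0, 6.0)]):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return data

def fit(train):
    """Estimate the prior, per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for features, label in train:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(train),
            "means": [statistics.fmean(c) for c in columns],
            # Small constant avoids division by zero for constant features.
            "vars": [statistics.pvariance(c) + 1e-9 for c in columns],
        }
    return model

def log_posterior(params, features):
    """log P(class) + sum of log normal densities (up to a shared constant)."""
    total = math.log(params["prior"])
    for x, mu, var in zip(features, params["means"], params["vars"]):
        total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return total

def predict(model, features):
    """Pick the class with the highest log posterior."""
    return max(model, key=lambda label: log_posterior(model[label], features))

data = make_data()
random.Random(1).shuffle(data)
cut = round(0.75 * len(data))          # 75% training, 25% test
train, test = data[:cut], data[cut:]

model = fit(train)
nb_accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
print(f"Naive Bayes accuracy (75/25 split): {nb_accuracy:.3f}")
```

Changing `0.75` to `2 / 3` gives the second split asked for in 5.1 (b).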

2. KNN

3. Decision Tree

QUESTION 6

Use simple K-means, DBSCAN and hierarchical clustering algorithms for
clustering. Compare the performance of the clusters by changing the
parameters involved in the algorithms.

K-means
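
The effect of the main K-means parameter, the number of clusters k, can be seen with a plain implementation of Lloyd's algorithm. This is a sketch on synthetic points (not the Mall_customers or Wholesale_customers data); the within-cluster sum of squares, called `inertia` here, is used as the comparison measure, and the random initialisation is an assumption of this example.

```python
import math
import random

def make_blobs(seed=1, n_per_blob=40):
    """Synthetic 2-D points around three hypothetical centres."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (8.0, 0.0), (4.0, 7.0)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in centres for _ in range(n_per_blob)]

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # random initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:           # converged: no assignment changed
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    inertia = sum(math.dist(p, centroids[lab]) ** 2
                  for p, lab in zip(points, labels))
    return labels, centroids, inertia

points = make_blobs()
inertia_by_k = {k: kmeans(points, k)[2] for k in (1, 2, 3, 4)}
for k, value in inertia_by_k.items():
    print(f"k={k}  inertia={value:.1f}")
```

Plotting inertia against k and looking for the point where the curve flattens (the elbow) is a common way to choose k.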

DBSCAN
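
DBSCAN has two parameters, the neighbourhood radius `eps` and the minimum number of neighbours `min_pts`, and the sketch below shows how strongly the result depends on them. The points are a hand-made toy set (two 3 x 3 grids plus one isolated point), not data from the practical, and the implementation is a simple quadratic-time version written for illustration.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)          # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE              # may later become a border point
            continue
        cluster += 1                       # i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point joins the cluster
            if labels[j] is not None:
                continue                   # already processed
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours) # j is also core: keep expanding
    return labels

def describe(labels):
    """(number of clusters, number of noise points)."""
    return len(set(labels) - {NOISE}), labels.count(NOISE)

# Toy data: two 3 x 3 grids and one isolated point.
grid_a = [(float(x), float(y)) for x in range(3) for y in range(3)]
grid_b = [(x + 10.0, y + 10.0) for x, y in grid_a]
points = grid_a + grid_b + [(5.0, 20.0)]

for eps in (0.5, 1.5, 20.0):
    n_clusters, n_noise = describe(dbscan(points, eps=eps, min_pts=4))
    print(f"eps={eps:<5} clusters={n_clusters} noise={n_noise}")
```

A radius that is too small marks every point as noise, while one that is too large merges everything into a single cluster.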

Hierarchical Clustering
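
For hierarchical clustering the parameters usually varied are the linkage criterion and the number of clusters at which merging stops. The following bottom-up (agglomerative) sketch uses a tiny hand-made point set and a naive search for the closest pair of clusters; it is meant to show the idea, not to be efficient, and is not the original practical code.

```python
import math
from itertools import combinations

def linkage_distance(a, b, points, linkage):
    """Distance between two clusters (lists of point indices)."""
    dists = [math.dist(points[i], points[j]) for i in a for j in b]
    if linkage == "single":
        return min(dists)                  # closest pair
    if linkage == "complete":
        return max(dists)                  # farthest pair
    if linkage == "average":
        return sum(dists) / len(dists)     # mean over all pairs
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerative(points, n_clusters, linkage="single"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage_distance(
                clusters[pair[0]], clusters[pair[1]], points, linkage),
        )
        clusters[a].extend(clusters[b])    # a < b, so deleting b is safe
        del clusters[b]
    labels = [None] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Toy data: two tight groups of three points and one isolated point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (5.0, 20.0),
]

results = {
    linkage: agglomerative(points, n_clusters=3, linkage=linkage)
    for linkage in ("single", "complete", "average")
}
for linkage, labels in results.items():
    print(f"{linkage:9s} {labels}")
```

On well-separated data like this all three linkages agree; on elongated or noisy clusters single linkage tends to chain points together while complete linkage favours compact groups.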

