21033570029_dm file kashish


DATA MINING

(DISCIPLINE SPECIFIC ELECTIVE)

PRACTICAL FILE

Submitted to:
MR. RAJEEV RAI (Teacher In-charge)
As part of academic curriculum (3RD YEAR, 6TH SEMESTER)
B.SC. (HONS) COMPUTER SCIENCE

NAME: KASHISH MADAN


COLLEGE ROLL NO: 21570006
EXAMINATION ROLL NO: 21033570029
COLLEGE NAME: KALINDI COLLEGE, DU

INDEX
S. No. Program Name

1. Datasets Used
2. Practical Q1
3. Practical Q2
4. Practical Q3
5. Practical Q4
6. Practical Q5
7. Practical Q6

DATASETS
1. People1.csv

2. Dirty_Iris.csv

3. Wine.csv

4. Market_basket_optimization

5. Social_network.csv

6. Mall_customers.csv

7. Wholesale_customers data.csv

QUESTION 1
Create a file “people.txt” with the following data:

Age   agegroup   height   status    yearsmarried
21    adult      6.0      single    -1
2     child      3        married   0
18    adult      5.7      married   20
221   elderly    5        widowed   2
34    child      -7       married   3

i. Read the data from the file “people.txt”.
ii. Create a ruleset E that contains rules to check for the following conditions:
    1. The age should be in the range 0-150.
    2. The age should be greater than yearsmarried.
    3. The status should be married or single or widowed.
    4. If the age is less than 18, the agegroup should be child; if the age is between 18 and 65, the agegroup should be adult; if the age is more than 65, the agegroup should be elderly.
iii. Check whether ruleset E is violated by the data in the file people.txt.
iv. Summarize the results obtained in part (iii).
v. Visualize the results obtained in (iii).
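
The steps above can be sketched in plain Python. This is only an illustrative sketch, not the code submitted in the file: the table is embedded as a string instead of being read from people.txt, the names `read_people`, `RULES`, `violations` and `summarize` are made up for the example, and a text bar stands in for the plot.

```python
# Rows copied from the table in the question (whitespace-separated).
# Assumption: embedded here as a string instead of being read from people.txt.
PEOPLE_TXT = """\
Age agegroup height status yearsmarried
21 adult 6.0 single -1
2 child 3 married 0
18 adult 5.7 married 20
221 elderly 5 widowed 2
34 child -7 married 3
"""


def read_people(text):
    """Parse the whitespace-separated table into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        rec = dict(zip(header, line.split()))
        rec["Age"] = int(rec["Age"])
        rec["height"] = float(rec["height"])
        rec["yearsmarried"] = int(rec["yearsmarried"])
        rows.append(rec)
    return rows


def expected_agegroup(age):
    """Condition 4: the agegroup that a given age should map to."""
    if age < 18:
        return "child"
    return "adult" if age <= 65 else "elderly"


# Ruleset E: each rule returns True when the row satisfies it.
RULES = {
    "age in 0-150": lambda r: 0 <= r["Age"] <= 150,
    "age > yearsmarried": lambda r: r["Age"] > r["yearsmarried"],
    "status valid": lambda r: r["status"] in {"married", "single", "widowed"},
    "agegroup matches age": lambda r: r["agegroup"] == expected_agegroup(r["Age"]),
}


def violations(rows):
    """One dict per row: rule name -> True if the row violates that rule."""
    return [{name: not rule(r) for name, rule in RULES.items()} for r in rows]


def summarize(violated):
    """Number of rows violating each rule."""
    return {name: sum(v[name] for v in violated) for name in RULES}


rows = read_people(PEOPLE_TXT)
summary = summarize(violations(rows))
for name, count in summary.items():
    # A text bar stands in for the plot asked for in part (v).
    print(f"{name:<22} {'#' * count} {count}")
```

With the five rows given, the range rule is broken by the age 221, the yearsmarried rule by the 18-year-old married for 20 years, and the agegroup rule by the 34-year-old labelled child.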

PLOT IS AS FOLLOWS:

QUESTION 2

Perform the following preprocessing tasks on the dirty_iris dataset:

i. Calculate the number and percentage of observations that are complete.
ii. Replace all the special values in data with NA.
iii. Define these rules in a separate text file and read them (use the editfile function in R, package editrules; use a similar function in Python). Print the resulting constraint object.
    - Species should be one of the following values: setosa, versicolor or virginica.
    - All measured numerical properties of an iris should be positive.
    - The petal length of an iris is at least 2 times its petal width.
    - The sepal length of an iris cannot exceed 30 cm.
    - The sepals of an iris are longer than its petals.
iv. Determine how often each rule is broken (violatedEdits). Also summarize and plot the result.
v. Find outliers in sepal length using boxplot and boxplot.stats.
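
A rough Python sketch of these tasks follows. It makes several assumptions: the rows are hypothetical stand-ins for Dirty_Iris.csv rather than the real data, `None` plays the role of NA, the rules are written inline instead of being read from a text file, and the outlier step uses the common 1.5 × IQR fences, which may differ slightly from the hinges used by R's boxplot.stats.

```python
import math
import statistics

# Hypothetical rows in the layout of dirty_iris (NOT the real data):
# (sepal_length, sepal_width, petal_length, petal_width, species)
RAW = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (None, 3.0, 5.1, 1.8, "virginica"),      # missing value
    (5.8, 2.7, math.inf, 1.9, "virginica"),  # special value
    (49.0, 3.0, 1.4, 0.2, "setosa"),         # implausible sepal length
    (5.0, -3.0, 3.5, 1.0, "versicolor"),     # negative width
    (6.0, 2.2, 4.0, 1.0, "versicolour"),     # misspelled species
]

# (i) A row is complete when no field is missing (None stands in for NA).
complete = sum(all(v is not None for v in row) for row in RAW)
percent_complete = 100 * complete / len(RAW)


def clean(value):
    """(ii) Map the special values inf, -inf and NaN to None."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


rows = [tuple(clean(v) for v in row) for row in RAW]

# (iii) The five rules; each returns True when the row satisfies it.
RULES = {
    "species is valid": lambda r: r[4] in {"setosa", "versicolor", "virginica"},
    "measurements positive": lambda r: all(v > 0 for v in r[:4]),
    "petal length >= 2 * petal width": lambda r: r[2] >= 2 * r[3],
    "sepal length <= 30": lambda r: r[0] <= 30,
    "sepal longer than petal": lambda r: r[0] > r[2],
}


def violated(rule, row):
    """True if broken, False if satisfied, None if a needed value is missing."""
    try:
        return not rule(row)
    except TypeError:  # raised when a comparison involves None
        return None


# (iv) How often each rule is broken (rows that cannot be checked are skipped).
broken = {name: sum(violated(rule, r) is True for r in rows)
          for name, rule in RULES.items()}

# (v) Outliers in sepal length by the 1.5 * IQR boxplot rule.
sepal = sorted(r[0] for r in rows if r[0] is not None)
q1, _, q3 = statistics.quantiles(sepal, n=4, method="inclusive")
fence = 1.5 * (q3 - q1)
outliers = [x for x in sepal if x < q1 - fence or x > q3 + fence]

print(f"complete rows: {complete} of {len(RAW)} ({percent_complete:.1f}%)")
print(broken)
print("sepal length outliers:", outliers)
```

Returning `None` for rows that cannot be evaluated keeps missing values from being counted as violations, similar in spirit to the NA entries that violatedEdits reports.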

Page 10
Page 11
Page 12
Page 13
BOXPLOT IS AS FOLLOWS:

QUESTION 3

Load the data from the wine dataset. Check whether all attributes are
standardized (mean is 0 and standard deviation is 1). If not,
standardize the attributes. Do the same with the iris dataset.
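
The check and the z-score transformation can be sketched as follows. The column names and values are hypothetical placeholders rather than data from Wine.csv, and the population standard deviation is assumed here; a sample standard deviation (as R's scale uses) would give slightly different values.

```python
import statistics

# Hypothetical attribute values (NOT taken from Wine.csv or the iris data).
columns = {
    "alcohol": [12.0, 13.0, 14.0, 15.0],
    "ash": [2.1, 2.4, 2.3, 2.6],
}


def is_standardized(values, tol=1e-9):
    """True when the mean is 0 and the population standard deviation is 1."""
    return (abs(statistics.fmean(values)) <= tol
            and abs(statistics.pstdev(values) - 1.0) <= tol)


def standardize(values):
    """Z-score: subtract the mean, then divide by the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("a constant column cannot be standardized")
    return [(x - mean) / sd for x in values]


# Standardize only the columns that fail the check.
scaled = {name: values if is_standardized(values) else standardize(values)
          for name, values in columns.items()}

for name, values in scaled.items():
    print(name, [round(v, 3) for v in values])
```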

QUESTION 4

Run the Apriori algorithm to find frequent itemsets and association rules.
1.1 Use minimum support as 50% and minimum confidence as 75%.
1.2 Use minimum support as 60% and minimum confidence as 60%.
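
A compact, unoptimized sketch of Apriori in plain Python is given below to show how the two threshold settings are applied. The transactions are hypothetical and are not taken from Market_basket_optimization; `apriori` and `association_rules` are illustrative names, not a library API.

```python
from itertools import combinations

# Hypothetical transactions (NOT taken from Market_basket_optimization).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "butter"},
    {"bread", "milk"},
]


def apriori(transactions, min_support):
    """Return {frequent itemset: number of transactions containing it}."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count / n >= min_support:
                level[c] = count
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-itemsets, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for each rule kept."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), size):
                ante = frozenset(ante)
                confidence = count / frequent[ante]
                if confidence >= min_confidence:
                    rules.append((ante, itemset - ante, confidence))
    return rules


for support, confidence in ((0.5, 0.75), (0.6, 0.6)):
    frequent = apriori(TRANSACTIONS, support)
    print(f"support >= {support}, confidence >= {confidence}")
    print("  itemsets:", sorted(sorted(s) for s in frequent))
    for ante, cons, conf in association_rules(frequent, confidence):
        print(f"  {sorted(ante)} -> {sorted(cons)} (confidence {conf:.2f})")
```

On this toy data, raising the minimum support from 50% to 60% drops the single item butter (support 3/6) from the frequent itemsets, while the rules between bread and milk survive under both settings.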

QUESTION 5
Use Naïve Bayes, K-nearest neighbours (KNN) and Decision tree classification algorithms to
build classifiers.
Divide the dataset into training and test sets. Compare the accuracy
of the different classifiers under the following situations:
5.1 a) Training set = 75%, Test set = 25%
    b) Training set = 66.6%, Test set = 33.3%
5.2 Training set is chosen by
    i) Hold-out method
    ii) Random subsampling
    iii) Cross-validation method
    Compare the accuracy of the classifiers obtained.
5.3 Dataset is scaled to standard format.
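
The three ways of choosing the training set can be sketched in plain Python as below. To keep the example short, only a small k-nearest-neighbours classifier is included (Naïve Bayes and the decision tree are omitted), the data is synthetic rather than one of the listed datasets, and all function names are illustrative assumptions.

```python
import random


def make_data(n_per_class=30, seed=0):
    """Synthetic two-class data: Gaussian blobs around (0, 0) and (3, 3)."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return data


def knn_predict(train, point, k=3):
    """Majority label among the k training points nearest to `point`."""
    nearest = sorted(
        train,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train, test, k=3):
    """Fraction of test points classified correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def holdout(data, train_fraction, rng):
    """One random split into training and test sets."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return accuracy(shuffled[:cut], shuffled[cut:])


def random_subsampling(data, train_fraction, repeats, rng):
    """Average accuracy over several independent hold-out splits."""
    return sum(holdout(data, train_fraction, rng) for _ in range(repeats)) / repeats


def cross_validation(data, folds, rng):
    """k-fold cross-validation: every point is tested exactly once."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    scores = []
    for i in range(folds):
        test = shuffled[i::folds]
        train = [row for j, row in enumerate(shuffled) if j % folds != i]
        scores.append(accuracy(train, test))
    return sum(scores) / folds


data = make_data()
rng = random.Random(42)
for fraction in (0.75, 0.666):
    print(f"hold-out, train={fraction:.1%}:", holdout(data, fraction, rng))
    print(f"random subsampling, train={fraction:.1%}:",
          random_subsampling(data, fraction, repeats=10, rng=rng))
print("5-fold cross-validation:", cross_validation(data, folds=5, rng=rng))
```

Hold-out gives a single estimate that depends on one split; random subsampling and cross-validation average over several splits, so their estimates are usually more stable.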

1. Naïve Bayes
(When training set = 75% and test set = 25%)

KNN

DECISION TREE

QUESTION 6

Use simple K-means, DBSCAN and Hierarchical clustering algorithms for
clustering. Compare the performance of clusters by changing the
parameters involved in the algorithms.

K-means
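
As an illustration of how the parameter k affects the result, here is a minimal K-means sketch (Lloyd's algorithm) in plain Python. The points are hypothetical and not taken from Mall_customers.csv, and the within-cluster sum of squares (`inertia`) is assumed as the measure for comparing runs.

```python
import random

# Hypothetical 2-D points (NOT taken from Mall_customers.csv).
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Group every point with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, max_iter=100, seed=0):
    """Alternate assignment and centroid update until nothing changes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k random points
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if empty
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def inertia(centroids, clusters):
    """Within-cluster sum of squared distances (lower is tighter)."""
    return sum(dist2(p, c) for c, cluster in zip(centroids, clusters) for p in cluster)


for k in (1, 2, 3):
    centroids, clusters = kmeans(POINTS, k)
    print(f"k={k}: inertia={inertia(centroids, clusters):.3f}, "
          f"sizes={[len(c) for c in clusters]}")
```

Inertia typically falls as k grows, so a sharp drop followed by a plateau (the "elbow") is one common way to pick k.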

DBSCAN

Hierarchical Clustering
