
Harshal ET 3 Lab Manual New

This document provides information about the Emerging Technology III Lab course for final year Computer Science and Engineering students at P.R. Pote (Patil) College of Engineering & Management, Amravati. It includes the vision, mission, and program educational objectives of the institute and department. It also contains a list of 10 planned experiments in the lab related to data mining techniques like association rule mining, decision trees, clustering, and includes spaces to record performance details and teacher assessments.

Uploaded by

Harshal Thakare

P.R. POTE (PATIL) EDUCATION & WELFARE TRUST’S GROUP OF INSTITUTIONS,

COLLEGE OF ENGINEERING & MANAGEMENT, AMRAVATI.

COMPUTER SCIENCE AND ENGINEERING

Year : Final Year Semester: VIIth Sem

SUBJECT: 7KS07 Emerging Technology III Lab

NAME of LABORATORY : Emerging Technology III Lab


Emerging Technology-III Lab

Institute Vision & Mission

Vision:
To flourish as a centre of excellence for producing the skilled technocrats and
committed human beings.

Mission:
To create a conducive environment for teaching & learning.
To impart quality education through demanding academic programs.
To enhance career opportunities by exposure to Industries & recent technologies.
To develop professionals with strong ethics and human values for the betterment
of society.

Department Vision & Mission

Vision
● To achieve Excellence in Computer Science and Engineering for serving the
growing needs of Software industry and Society

Mission

● To create an ambiance that shall foster the growth for developing innovative and
entrepreneurial skills.
● To identify areas of specialization upon which the department can concentrate,
thus promote academic growth and career opportunities.
● To develop human resource with ethical and moral values for overall personality
development to serve the society.

Final Year_CSE_Semester-VII_PRPCEM-2023-2024

PEOs
● Preparation: To prepare graduates to have knowledge of computer science and
competency for careers in industry.

● Core Competence: To develop in the graduates the problem-solving skills that
are required to analyze, design and implement solutions with computer
knowledge.

● Breadth: To nurture the graduates to be effective team members, build
proficiency in soft skills, inculcate a multidisciplinary approach and the ability
to relate engineering with social context.

● Professional and lifelong learning: To inculcate ethical practices,
professionalism and environmental awareness for sustainable development
among students with an attitude towards lifelong learning.


Department of Computer Science & Engineering

Certificate

This is to certify that Mr……………….. Harshal Dilip Thakare……………………..

of ….VII… Semester of Bachelor of Engineering in Computer Science and Engineering

of P. R. Pote (Patil) College of Engineering & Management, Amravati, has

completed the term work satisfactorily in subject ….DWDM(ET LabIII)……. for the

academic year 2022- 2023… as prescribed in the curriculum.

Place…Amravati……… PRN No:……………………

Date……-10-2023.…….. Roll No………C-441………

Practical Incharge Head of the Department


LIST OF PRACTICALS & PROGRESSIVE ASSESSMENT FOR TERM WORK

Course : Emerging Technology Lab III


Academic Year 2023-2024

Subject & Code: Emerging Technology Lab III (7KS07) Semester : VIIth Sem

Name of Faculty :

Name of Student: Harshal D. Thakare

PRN No. Roll No. C-441

SN. | Title of the Practical / Experiment | Page No. | Date of Performance | Date of Submission | Assessment Marks (15) | Sign of Teacher and Remarks
1. | To Install and Explore WEKA Workbench | | | | |
2. | To create a relation (Table) & study the .arff file format in WEKA. | | | | |
3. | Apply Pre-Processing techniques on data set. | | | | |
4. | Normalizing Relation (Table) data using Knowledge Flow. | | | | |
5. | Finding Association Rules for relation (Table) data. | | | | |
6. | To Construct Decision tree for relation data and classify it. | | | | |
7. | To understand & apply the procedure for Visualizing relation (Table) data. | | | | |
8. | To study & understand the procedure for cross-validation using J48 Algorithm for given relation. | | | | |
9. | To Study & Understand the procedure for Clustering Buying data using Cobweb Algorithm. | | | | |
10. | To Study & Understand the procedure for Clustering Customer data using Simple KMeans Algorithm. | | | | |

Signature of Faculty


GUIDELINES FOR TEACHERS

Teachers shall discuss the following points with students before start of practical of the
subject.
1 Learning Overview: To develop better understanding of importance of the subject.
To know related skills to be developed such as intellectual and motor skills
2 Know your Laboratory Work: To understand the layout of laboratory, specifications
of equipment / instruments /materials, procedure, working in groups, planning time
etc. also to know total amount of work to be done in the laboratory.
3 Teacher shall ensure that required equipment is in working condition before start of
each experiment, also keep operating instruction manual available.
4 Explain prior concepts to the students before starting of each experiment.
5 Involve the students actively at the time of conduct of each experiment.
6 While taking reading / observation each student (from batch of 20 students) shall be
given a chance to perform / observe the experiment.
7 Teacher shall assess the performance of students continuously.
8 Teacher is expected to share the skills to be developed in the students.
9 Teacher should ensure that the respective skills are developed in the students after the
completion of the practical exercise.
10 Teacher may provide additional knowledge and skills to the students even though not
covered in the manual but are expected from students by the industries.
11 Teacher may suggest the students to refer additional related literature of the technical
papers / reference books / Seminar Proceedings, etc.
12 Focus should be given on development of enlisted skills rather than theoretical /
codified knowledge.
13 During assessment teacher is expected to ask questions to the students to tap their
achievements regarding related knowledge and skills.
14 Teacher should give more focus on hands on skills.


INSTRUCTIONS FOR STUDENTS

Students shall read the points given below for understanding the theoretical concepts
and practical applications.
1 Listen carefully to the lecture given by teacher about importance of subject, curriculum
philosophy, learning structure, skills to be developed, information about equipment,
instruments, procedure, method of continuous assessment, tentative plan of work in
laboratory and total amount of works to be done in a semester.
2 Student shall undergo study visit of the laboratory for types of equipment, and material
to be used, before performing experiments.
3 Read the write up of each experiment to be performed, a day in advance.
4 Organize the work in the group and make a record of all observations.
5 Understand the purpose of the experiment and its practical implications.
6 Student should not hesitate to ask about any difficulty faced during conduct of the practical
/ exercise.
7 Write the answers to the questions allotted by the teacher during practical hours if
possible, or immediately afterwards.
8 The student shall study all the questions given in the laboratory manual and practice
writing the answers to these questions.
9 Student should develop the habit of peer discussion / group discussion related to
experiments / exercises so that exchange of knowledge / skills can take place.
10 Students shall attempt to develop related hands-on skills and gain confidence.
11 Student shall focus on development of skills rather than theoretical or codified
knowledge.
12 Student shall insist on the completion of the recommended laboratory work, answers to
the given questions, etc.
13 Student shall develop the habit of evolving more ideas, innovations, skills etc. than
those included in the scope of the manual.
14 Student shall refer to technical magazines, proceedings of seminars and websites
related to the scope of the subject, and update their knowledge and skills.
15 Student should develop the habit of not depending totally on teachers and should develop
self-learning techniques.
16 Student should develop the habit of interacting with the teacher without hesitation on
academic matters.
17 Student should develop the habit of submitting the practical exercises continuously and
progressively on the scheduled dates and should get the assessment done.
18 Student should be well prepared while submitting the write-up of the exercise. This will
develop continuity in the studies, and the student will not be overloaded at the end of the
term.


LABORATORY INSTRUCTIONS

● BE RESPECTFUL! Always treat the computer lab equipment AND your teacher and
classmates the way that you would want your belongings and yourself to be treated.

● No food or drinks near the computers. NO EXCEPTIONS.

● Enter the computer lab quietly and work quietly. There are other groups and
individuals who may be using the computer lab. Please be respectful.

● Surf safely! Only visit assigned websites. Some web links can contain viruses or
malware. Others may contain inappropriate content. If you are not certain that a website
is SAFE, please ask a teacher or other adult.

● Clean up your work area before you leave. All cords should be placed on the tables (not
hanging off the sides). Headphones should be placed on the CPU/tower or monitor.
Chairs should be pushed under the tables. All trash, papers, and pencils should be picked
up.

● Do not change computer settings or backgrounds.

● Ask permission before you print.

● SAVE all unfinished work to a cloud drive or jump drive. Any work that is saved to the
computer will be deleted when the computer is powered off or updated at the end of the
day.

● If you are the last class of the day, please POWER DOWN all computers and
monitors.


P. R. Pote (Patil) College of Engineering & Management, Amravati

Department of Computer Science & Engineering

Name of the Program: B. E. CSE Academic Year: 2023-2024
Class: Final year Semester: 7th
Section: A / B / C Course Code: 7KS07
Course/Subject: Emerging Technology Lab III
Course Owner: Prof. A. M. Bhoyar / Prof. R. S. Lande / Prof. S. R. Sontakke

Sr. No. List of Practicals

1 To Install and Explore WEKA Workbench

2 To create a relation (Table) & study the .arff file format in WEKA.

3 Apply Pre-Processing techniques on data set.

4 Normalizing Relation (Table) data using Knowledge Flow.

5 Finding Association Rules for relation (Table) data.

6 To Construct Decision tree for relation data and classify it.

7 To understand & apply the procedure for Visualizing relation (Table) data.

8 To study & understand the procedure for cross-validation using J48 Algorithm for given
relation.

9 To Study & Understand the procedure for Clustering Buying data using Cobweb
Algorithm.

10 To Study & Understand the procedure for Clustering Customer data using Simple
KMeans Algorithm.


Lab Course Outcomes

After successful completion of the laboratory course, the students will be able to:

SN Outcomes
1. Provide efficient distribution of information and easy access to data.
2. Create a user-friendly reporting environment.
3. Find the unseen patterns in large volumes of historical data that help to manage an
organization efficiently.
4. Understand the concepts of various data mining techniques.
5. Understand the concepts of preprocessing.

● Assessment Strategy: Rubrics for continuous evaluation in lab session

Parameters | Allocated Marks | High | Medium | Low
Prelab test | 2 | Student answered all the prelab questions and knows the objective of the experiment. (2) | Student answered only a few prelab questions and partially knows the objective of the experiment. (1) | Student did not answer any prelab question and is not aware of the objective of the experiment. (0)
In-Lab performance | 5 | Student performed or executed the experiment, obtained results, and drew conclusions fully as per expectation. (5) | Student performed or executed the experiment, obtained results, and drew conclusions partially as per expectation. (4-3) | Student performed or executed the experiment, obtained results, and drew conclusions below the expectation. (2-1)
Post lab test | 3 | Student answered the post lab viva voce questions and fully confirms the understanding of the experiments. (3) | Student partially answered the post lab viva voce questions and partially confirms the understanding of the experiments. (2-1) | Student did not answer the post lab viva voce questions and does not confirm the understanding of the experiments. (0)
Lab Record | 5 | Records submitted by the student found highly satisfactory after evaluation. (5) | Records submitted by the student found moderately satisfactory after evaluation. (4-3) | Records submitted by the student found highly dissatisfactory after evaluation. (2-1)
Total Marks | 15 Marks (Continuous Assessment)

● Assessment Strategy: Rubrics used for Internal Examination.

Parameters | Allocated Marks | High | Medium | Low
Performance | 5 | Student able to conduct the given experiment with desired output. (5-4) | Student partially able to conduct the given experiment with desired output. (3-1) | Student not able to conduct the given experiment with desired output. (0)
Viva Voce | 5 | Student answered the questions satisfactorily. (5-4) | Student answered the questions moderately satisfactorily. (3-1) | Student did not answer the questions. (0)
Total Marks | 10 Marks (Internal Examination)

Grand Total Marks | Continuous Assessment (15) + Internal Examination (10) = 25 Marks


Practical No. 1

Aim: To Install and Explore WEKA Workbench.

Tasks:
1. Download and install the WEKA toolkit.
2. Understand the features of WEKA such as Explorer, Knowledge Flow interface,
Experimenter, Command-Line Interface (CLI).
3. Navigate the options available in WEKA (e.g. Select attributes panel, Preprocess
panel, Classify panel, Cluster panel, Associate panel and Visualize panel).
4. Explore the available data sets in WEKA.
5. Load data sets (e.g. weather dataset, iris dataset, etc.)
6. Load each dataset and observe the following:
i. List the attribute names and their types
ii. Number of records in each dataset
iii. Identify the class attribute (if any)
iv. Plot histogram
v. Determine the number of records for each class.
vi. Visualize the data in various dimensions.

Introduction
Weka (Waikato Environment for Knowledge Analysis) is a workbench that contains a collection
of visualization tools and algorithms for data analysis and predictive modeling, together with
graphical user interfaces for easy access to these functions. The original non-Java version of
Weka was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other
programming languages, plus data preprocessing utilities in C, and a Makefile-based system for
running machine learning experiments. This original version was primarily designed as a tool
for analyzing data from agricultural domains, but the more recent fully Java-based version
(Weka 3), for which development started in 1997, is now used in many different application
areas, in particular for educational purposes and research. Advantages of Weka include:
● Free availability under the GNU General Public License.
● Portability, since it is fully implemented in the Java programming language and thus
runs on almost any modern computing platform.
● A comprehensive collection of data preprocessing and modeling techniques.
● Ease of use due to its graphical user interfaces.

● Installation Procedure:

Download the WEKA tool from the following link and install it:
http://www.cs.waikato.ac.nz/ml/weka/downloading.html

● Understanding features of the WEKA toolkit such as Explorer, Knowledge Flow interface,
Experimenter, command-line interface.

● WEKA: Waikato Environment for Knowledge Analysis. The WEKA GUI
Chooser provides a starting point for launching WEKA’s main GUI applications
and supporting tools. The GUI Chooser consists of four buttons, one for each of
the four major WEKA applications, and four menus.

1. Explorer: The graphical interface used to conduct experimentation on raw data. After
clicking the Explorer button, the WEKA Explorer interface appears. Inside the WEKA Explorer
window there are six tabs:

Preprocess - used to choose the data file to be used by the application.

● Open File - allows the user to select files residing on the local machine or
recorded medium
● Open URL - provides a mechanism to locate a file or data source from a different
location specified by the user
● Open Database - allows the user to retrieve files or data from a database source
provided by the user

Classify - used to train and test different learning schemes on the preprocessed data file under
experimentation. There are several options to be selected inside the Classify tab. Test
options give the user the choice of four different test modes on the data set:
● Use training set
● Supplied test set
● Cross-validation
● Percentage split

Cluster - used to apply different tools that identify clusters within the data file. The Cluster tab
opens the process that is used to identify commonalities or clusters of occurrences within the
data set and produce information for the user to analyze.

Associate - used to apply different rules to the data file that identify associations within the
data. The Associate tab opens a window to select the options for associations within the dataset.

Select attributes - used to apply different rules to reveal changes based on the inclusion or
exclusion of selected attributes from the experiment.

Visualize - used to see what the various manipulations produced on the data set in a 2D format,
in scatter plot and bar graph output.
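
The percentage split and cross-validation test options listed under the Classify tab can be illustrated outside WEKA. The following Python sketch is not WEKA's implementation; it only shows one way instance indices might be divided into training and test parts. The function names, the seed, and the 66% / 10-fold default values are assumptions chosen for illustration, and the sketch does not stratify folds by class.

```python
import random


def percentage_split(n_instances, train_percent=66, seed=1):
    """Shuffle instance indices and split them into (train, test) lists."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    cut = round(n_instances * train_percent / 100)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_instances, k=10, seed=1):
    """Yield (train, test) index lists, one pair per fold."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    train, test = percentage_split(14)
    print("percentage split sizes:", len(train), len(test))
    for fold_no, (train, test) in enumerate(k_fold_splits(14, k=7), start=1):
        print("fold", fold_no, "test instances:", sorted(test))
```

Each instance appears in exactly one test fold, so every record is used for testing once and for training in the remaining folds.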


2. Experimenter - this option allows users to conduct different experimental variations on
data sets and perform statistical manipulation. The Weka Experiment Environment enables
the user to create, run, modify, and analyze experiments in a more convenient manner than
is possible when processing the schemes individually. For example, the user can create an
experiment that runs several schemes against a series of datasets and then analyze the
results to determine if one of the schemes is (statistically) better than the other schemes.
● Results destination: ARFF file, CSV file, JDBC database.
● Experiment type: Cross-validation (default), Train/Test Percentage Split (data
randomized).
● Iteration control: Number of repetitions, Data sets first/Algorithms first.
● Algorithms: filters

3. Knowledge Flow: The Knowledge Flow provides an alternative to the Explorer as a
graphical front end to WEKA’s core algorithms. The Knowledge Flow presents a data-flow
inspired interface to WEKA. The user can select WEKA components from a palette, place
them on a layout canvas and connect them together in order to form a knowledge flow for
processing and analysing data. At present, all of WEKA’s classifiers, filters, clusterers,
associators, loaders and savers are available in the Knowledge Flow along with some extra
tools.

4. Simple CLI: The Simple CLI provides full access to all Weka classes, i.e., classifiers,
filters, clusterers, etc., but without the hassle of the CLASSPATH (it uses the one with
which WEKA was started). It offers a simple Weka shell with separated command line and
output.

● Navigate the options available in WEKA (e.g. Select attributes panel, Preprocess
panel, Classify panel, Cluster panel, Associate panel and Visualize panel), and explore
the available data sets in WEKA. Load each dataset and observe the
following:
o List the attribute names and their types
o Number of records in each dataset
o Identify the class attribute (if any)
o Plot histogram
o Determine the number of records for each class.
o Visualize the data in various dimensions

Click the “Open file...” button to open a data set and double click on the “data” directory. Weka
provides a number of small common machine learning datasets that you can use to practice on.
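
The observations asked for above (attribute names and types, number of records, records per class) can also be cross-checked with a few lines of Python. This is a minimal sketch, not a WEKA API: `read_arff` is a hypothetical helper that handles only simple unquoted ARFF files, `SAMPLE_ARFF` is a made-up four-row sample, and the last attribute is assumed to be the class.

```python
from collections import Counter

# Hypothetical four-row sample in ARFF syntax, used only for illustration.
SAMPLE_ARFF = """\
@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temperature numeric
@attribute play {yes,no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
rainy,65,no
"""


def read_arff(text):
    """Minimal ARFF reader returning (relation, attributes, rows).

    Handles only unquoted names and comma-separated values.
    """
    relation, attributes, rows, in_data = None, [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            attributes.append((name, spec))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = read_arff(SAMPLE_ARFF)
print("relation:", relation)
for name, spec in attributes:
    print("attribute:", name, "type:", spec)
print("number of records:", len(rows))

# Assumption: the last attribute is the class attribute.
class_counts = Counter(row[-1] for row in rows)
for label, count in class_counts.items():
    print(label, "#" * count)  # crude text histogram of the class
```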

● Output/Result:

● List of Pre-lab Questions:

1. What is Weka Workbench and its primary purpose?


Ans :

2. What are some common file formats for data that Weka supports?
Ans :

3. What is the main interface of Weka, and how can you load a dataset?

Ans :

4. How do you preprocess data in Weka, and why is data preprocessing important?

Ans :

5. What are the basic steps in building and evaluating a machine learning model in
Weka?
Ans :

● List of Post-lab Questions:

1. After using Weka, what did you discover about the dataset you worked with?

Ans :

2. Which machine learning algorithm did you find most suitable for the dataset, and
why?
Ans :

3. What evaluation metrics did you use to assess the model's performance, and what
were the results?
Ans :

4. Did you encounter any challenges during the data preprocessing or model building
process, and how did you overcome them?

Ans :

5. How would you apply the knowledge gained from this Weka exploration to a real-
world data analysis or machine learning project?
Ans :

● Conclusion:

● Assessment Scheme:

Pre lab Test In Lab Performance Post Lab Test Record Total
(2) (5) (3) (5) (15)


Practical No. 2

Aim: To create a relation (Table) & study the .arff file format in WEKA.

Task:
● Create an Employee relation (Table) with attributes name, id, salary, experience,
gender, phone number.

● Theory:
An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of
instances sharing a set of attributes. ARFF files were developed by the Machine Learning
Project at the Department of Computer Science of The University of Waikato for use with the
Weka machine learning software. In WEKA, each data entry is an instance of the Java class
weka.core.Instance, and each instance consists of a number of attributes. For loading datasets,
WEKA can load ARFF files. The Attribute-Relation File Format has two sections:
1. The Header section defines the relation (dataset) name and the attribute names and types.
2. The Data section lists the data instances.
WEKA supports a large number of file formats for the data. The file formats supported by
WEKA are arff, arff.gz, bsi, csv, dat, data, json, json.gz, libsvm, m, names, xrff, xrff.gz.

● arff file format: An arff file contains two sections - header and data.
o The header describes the attribute types.
o The data section contains a comma-separated list of data.
As an example of the arff file format, consider the Weather data file loaded from the WEKA sample
datasets:

● The @relation tag defines the name of the relation (dataset).

● The @attribute tag defines the attributes.

● The @data tag starts the list of data rows, each containing comma-separated fields.

o The attributes can take nominal values, as in the case of outlook shown here:
@attribute outlook {sunny, overcast, rainy}

o The attributes can take real values, as in this case:
@attribute temperature real

o You can also set a Target or Class variable called play, as shown here:
@attribute play {yes, no}
The Target takes one of two nominal values, yes or no.

● Steps to create relation:

1) Open Notepad
2) Type the following lines in Notepad.

@relation employee
@attribute name {x, y, z, a, b}
@attribute id numeric
@attribute salary {low, medium, high}
@attribute exp numeric
@attribute gender {male, female}
@attribute phone numeric
@data
x,101,low,2,male,250311
y,102,high,3,female,251665
z,103,medium,1,male,240238
a,104,low,5,female,200200
b,105,high,2,male,240240

3) Save the file with the .arff file extension (for example, employee.arff).
4) Open WEKA and click on Explorer.
5) Click on ‘Open file’ and select the employee.arff file.
6) Click on the Edit button, which shows the employee table in WEKA.
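
As an alternative to typing the file in Notepad, the same employee relation could be generated by a short script. The Python sketch below is only an illustration: `to_arff` is a hypothetical helper, not part of WEKA, and the file is written to a temporary directory (an assumption made so that no existing file is overwritten).

```python
import tempfile
from pathlib import Path

# Attribute names and types copied from the employee relation above.
ATTRIBUTES = [
    ("name", "{x, y, z, a, b}"),
    ("id", "numeric"),
    ("salary", "{low, medium, high}"),
    ("exp", "numeric"),
    ("gender", "{male, female}"),
    ("phone", "numeric"),
]
ROWS = [
    ("x", 101, "low", 2, "male", 250311),
    ("y", 102, "high", 3, "female", 251665),
    ("z", 103, "medium", 1, "male", 240238),
    ("a", 104, "low", 5, "female", 200200),
    ("b", 105, "high", 2, "male", 240240),
]


def to_arff(relation, attributes, rows):
    """Build ARFF text: header (@relation, @attribute) followed by @data rows."""
    lines = [f"@relation {relation}"]
    lines += [f"@attribute {name} {spec}" for name, spec in attributes]
    lines.append("@data")
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


arff_text = to_arff("employee", ATTRIBUTES, ROWS)

# Write to a temporary directory; in the lab any folder of choice can be used.
out_path = Path(tempfile.mkdtemp()) / "employee.arff"
out_path.write_text(arff_text, encoding="ascii")
print("saved to", out_path)
```

The resulting file can then be opened through ‘Open file’ in the Explorer in the same way as the hand-typed one.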

● Output/Result:

● List of Pre-Lab questions:

1. What is the purpose of creating a relation (table) in WEKA?


Ans :
2. Which file format is commonly used in WEKA to represent datasets?
Ans :
3. In WEKA, what is the significance of an attribute's data type in a .arff file?
Ans :

● List of Post-lab questions:

1. In a .arff file, what does the "@" symbol indicate?

Ans :
2. Which WEKA tool is commonly used for visualizing data and creating relations
(tables)?
Ans :
3. Which step typically involves specifying attributes and their data types when creating
a relation in WEKA?
Ans :

4. What is the purpose of a .arff file header section?


Ans :

Conclusion:

● Assessment Scheme:

Pre lab Test In Lab Performance Post Lab Test Record Total
(2) (5) (3) (5) (15)


Practical No. 3

Aim: Apply Pre-Processing techniques on data set.

Task: Create Weather relation and apply pre-processing techniques on it.

● Description:
Real-world databases are highly susceptible to noisy, missing and inconsistent data because of
their huge size, so the data is pre-processed to improve its quality and, in turn, the quality of the
mining results and the efficiency of mining. Three pre-processing techniques are applied here:
1. Add
2. Remove
3. Normalization

● Creation of Weather relation (Table):

@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temparature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play {yes,no}
@data
sunny,85.0,85.0,false,no
overcast,80.0,90.0,true,no
sunny,83.0,86.0,false,yes
rainy,70.0,86.0,false,yes
rainy,68.0,80.0,false,yes
rainy,65.0,70.0,true,no
overcast,64.0,65.0,false,yes
sunny,72.0,95.0,true,no
sunny,69.0,70.0,false,yes
rainy,75.0,80.0,false,yes

● Pre-Processing Technique:

1. Addition of Climate column to the existing weather relation

Steps:
1. Open Weka.
2. Click on Explorer.
3. Click on Open file.
4. Select the Weather.arff file and click on Open.
5. Click on the Choose button and select the Filters option.
6. Under Filters, there are Supervised and Unsupervised filters.
7. Click on Unsupervised.
8. Under attribute, select Add.
9. A new window is opened.
10. In it, enter the attribute index, type, date format and nominal label values for Climate.
11. Click on OK.
12. Press the Apply button; a new attribute is added to the Weather table.
13. Save the file.
14. Click on the Edit button; it shows the new Weather table in Weka.
Weather table after adding the new attribute Climate:

2. Removing windy & play attributes from the existing weather relation:

Steps:
1. Open Weka.
2. Click on Explorer.
3. Click on Open file.
4. Select the Weather.arff file and click on Open.
5. Click on the Choose button and select the Filters option.
6. Under Filters, there are Supervised and Unsupervised filters.
7. Click on Unsupervised.
8. Under attribute, select Remove.
9. Select the attributes windy and play to remove.
10. Click the Remove button and then save.
11. Click on the Edit button; it shows the new Weather table in Weka.
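
The effect of the Add and Remove steps on the table can be pictured with a small Python sketch. It is not how WEKA implements its filters; `add_attribute` and `remove_attributes` are hypothetical helpers, and only the first two weather rows are used. The new column is filled with "?", the ARFF marker for a missing value, as an assumption about how an empty new attribute would look.

```python
# First two rows of the weather relation, kept as plain Python lists.
attributes = ["outlook", "temparature", "humidity", "windy", "play"]
rows = [
    ["sunny", 85.0, 85.0, "false", "no"],
    ["overcast", 80.0, 90.0, "true", "no"],
]


def add_attribute(attributes, rows, name, fill="?"):
    """Return new (attributes, rows) with one extra column appended."""
    return attributes + [name], [row + [fill] for row in rows]


def remove_attributes(attributes, rows, names):
    """Return new (attributes, rows) without the named columns."""
    keep = [i for i, attr in enumerate(attributes) if attr not in names]
    new_attributes = [attributes[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_attributes, new_rows


with_climate, rows_with_climate = add_attribute(attributes, rows, "climate")
print(with_climate)
print(rows_with_climate)

without_class, rows_without_class = remove_attributes(
    attributes, rows, {"windy", "play"}
)
print(without_class)
print(rows_without_class)
```

Both helpers return new lists, so the original table stays unchanged, much as the data on disk stays unchanged in the Explorer until it is saved.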

3. Applying normalization on Temperature and Humidity

Steps:
1. Open Weka.
2. Click on Explorer.
3. Click on Open file.
4. Select the Weather.arff file and click on Open.
5. Click on the Choose button and select the Filters option.
6. Under Filters, there are Supervised and Unsupervised filters.
7. Click on Unsupervised.
8. Under attribute, select Normalize.
9. Select the attributes temparature and humidity to normalize.
10. Click on the Apply button and then Save.
11. Click on the Edit button; it shows the new Weather table with normalized values in
Weka.
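
To check the normalized values by hand, min-max scaling to the range [0, 1] can be computed as x' = (x - min) / (max - min). The Python sketch below applies this to the temparature and humidity columns of the weather relation given earlier. It assumes the filter is used with a [0, 1] target range; `min_max_normalize` is a hypothetical helper, and mapping a constant column to 0.0 is a choice made for this sketch.

```python
# Numeric columns taken from the ten rows of the weather relation.
temparature = [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0]
humidity = [85.0, 90.0, 86.0, 86.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0]


def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to scale, so use 0.0 (sketch assumption).
        return [0.0 for _ in values]
    return [(value - lo) / (hi - lo) for value in values]


norm_temparature = min_max_normalize(temparature)
norm_humidity = min_max_normalize(humidity)

for original, scaled in zip(temparature, norm_temparature):
    print(f"{original:5.1f} -> {scaled:.4f}")
```

For temparature the minimum is 64 and the maximum is 85, so 85 maps to 1, 64 maps to 0, and 80 maps to 16/21.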

● Output/Result:
o Weather Table

o Weather Table after adding new attribute CLIMATE:

o Weather Table after removing attributes WINDY, PLAY:

o Weather Table after normalizing TEMPARATURE, HUMIDITY:

● List of Pre-Lab questions:

1. What is the primary goal of data pre-processing in machine learning?


Ans :

2. Why is it important to handle missing data during pre-processing?


Ans :

3. What is the purpose of data normalization in pre-processing?

Ans :

4. Why do we identify and address outliers in the dataset as part of pre-processing?


Ans :

5. Name one technique used for feature selection during data pre-processing.
Ans :

● List of Post-Lab questions:

1. Name three common pre-processing techniques used to clean and prepare a dataset.

Ans :

2. Why is handling missing data an important step in data pre-processing?

Ans :

3. What does data normalization achieve in pre-processing?


Ans :

4. Why do we detect and treat outliers during pre-processing?


Ans :

Conclusion:

● Assessment Scheme:

Pre lab Test In Lab Performance Post Lab Test Record Total
(2) (5) (3) (5) (15)


Practical No. 4

Aim: Normalizing Relation (Table) data using Knowledge Flow.

Task: Normalize Weather relation data using Knowledge Flow.

● Description:
The Knowledge Flow provides an alternative to the Explorer as a graphical front end to
WEKA’s algorithms. Knowledge Flow is a work in progress, so some of the functionality of the
Explorer is not yet available in it. On the other hand, there are things that can be done in
Knowledge Flow but not in the Explorer. Knowledge Flow presents a data-flow interface to WEKA.
The user can select WEKA components from a toolbar, place them on a layout canvas and
connect them together in order to form a knowledge flow for processing and analysing the data.

● Creation of Weather Table:

@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temparature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play {yes,no}
@data
sunny,85.0,85.0,false,no
overcast,80.0,90.0,true,no
sunny,83.0,86.0,false,yes
rainy,70.0,86.0,false,yes
rainy,68.0,80.0,false,yes
rainy,65.0,70.0,true,no
overcast,64.0,65.0,false,yes
sunny,72.0,95.0,true,no
sunny,69.0,70.0,false,yes
rainy,75.0,80.0,false,yes

1. Save the file in .arff format.
2. Minimize the .arff file and then open weka-3-4.
3. Click on weka-3-4; the WEKA dialog box is displayed on the screen.
4. In that dialog box there are four modes; click on Explorer.
5. Explorer shows many options. Click on ‘Open file’ and select the .arff file.
6. Click on the Edit button, which shows the Weather table in WEKA.

• Procedure for Knowledge Flow:

1. Open the Knowledge Flow.
2. Select the Data Source component and add Arff Loader to the knowledge layout
canvas.
3. Select the Filters component and add Attribute Selection and Normalize to the
knowledge layout canvas.
4. Select the Data Sinks component and add Arff Saver to the knowledge layout
canvas.


5. Right click on Arff Loader and select the Configure option; a new window opens, in
which you select Weather.arff.
6. Right click on Arff Loader, select the Dataset option, and establish a link between
Arff Loader and Attribute Selection.
7. Right click on Attribute Selection, select the Dataset option, and establish a link
between Attribute Selection and Normalize.
8. Right click on Attribute Selection, select the Configure option, and choose the best
attribute for the Weather data.
9. Right click on Normalize, select the Dataset option, and establish a link between
Normalize and Arff Saver.
10. Right click on Arff Saver and select the Configure option; a new window opens. Set
the path and enter .arff in the Look In dialog box to save the normalized data.
11. Right click on Arff Loader and click on the Start Loading option; the components are
then executed one by one.
12. Check whether the output has been created by browsing to the chosen path.
13. Rename the output file to a.arff.
14. Double click on a.arff; the output then opens automatically in MS-Excel.
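
The arithmetic behind the Normalize step can be checked by hand. The Python sketch below is not part of the WEKA flow; it assumes min-max scaling to the range [0, 1], which is the default behaviour of WEKA's Normalize filter, and applies it to the temperature and humidity columns of the weather table above. The function name min_max_normalize is illustrative only, and the file saved by the flow may contain fewer attributes because Attribute Selection runs first.

```python
def min_max_normalize(values):
    """Rescale numbers linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; mapping it to 0.0 is a choice made for this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Numeric columns copied from the weather table of this experiment.
temperature = [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0]
humidity = [85.0, 90.0, 86.0, 86.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0]

norm_temperature = min_max_normalize(temperature)
norm_humidity = min_max_normalize(humidity)

for t, h in zip(norm_temperature, norm_humidity):
    print(f"{t:.4f},{h:.4f}")
```

Each normalized value can be compared with the corresponding cell of the saved .arff file.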

• Output/Result:


• List of Pre-Lab Questions:

1. What is the main purpose of normalizing relation (table) data using Knowledge
Flow?
Ans :

2. Why is data normalization important in data analysis and machine learning?

Ans :

3. What are some common methods for normalizing data in Knowledge Flow?

Ans :

• List of Post-Lab Questions:

1. Explain the Min-Max scaling method briefly.

Ans :

2. What can happen if you don't normalize data before analysis or modeling?
Ans :

3. Name one other data pre-processing step that might be necessary before
normalization.
Ans:

4. When might you choose not to normalize data during data analysis?
Ans :


Conclusion:

• Assessment Scheme :

Pre lab Test    In Lab Performance    Post Lab Test    Record    Total
    (2)                (5)                 (3)          (5)      (15)


Experiment No: 5

Aim: Finding Association Rules for relation (Table) data.

Task: Find association rules for banking data using WEKA toolkit

Description: In data mining, association rule learning is a popular and well-researched method
for discovering interesting relations between variables in large databases. It identifies and
presents strong rules discovered in databases using different measures of interestingness.
Association rules are used in market basket analysis and are also employed in many other
application areas, including Web usage mining, intrusion detection and bioinformatics.

• Creation of Banking Table:

@relation bank
@attribute cust {male,female}
@attribute accno {0101,0102,0103,0104,0105,0106,0107,0108,0109,0110,0111,0112,0113,0114,0115}
@attribute bankname {sbi,hdfc,sbh,ab,rbi}
@attribute location {hyd,jmd,antp,pdtr,kdp}
@attribute deposit {yes,no}
@data
male,0101,sbi,hyd,yes
female,0102,hdfc,jmd,no
male,0103,sbh,antp,yes
male,0104,ab,pdtr,yes
female,0105,sbi,jmd,no
male,0106,ab,hyd,yes
female,0107,rbi,jmd,yes
female,0108,hdfc,kdp,no
male,0109,sbh,kdp,yes
male,0110,ab,jmd,no
female,0111,rbi,kdp,yes
male,0112,sbi,jmd,yes
female,0113,rbi,antp,no
male,0114,hdfc,pdtr,yes
female,0115,sbh,pdtr,no

1. Save the file in .arff format.
2. Minimize the .arff file and then open weka-3-4.
3. Click on weka-3-4; the WEKA dialog box is displayed on the screen.
4. In that dialog box there are four modes; click on Explorer.
5. Explorer shows many options. Click on ‘Open file’ and select the .arff file.
6. Click on the Edit button, which shows the banking table in WEKA.

• Procedure for Association Rules:

1. Open Weka.
2. Open Explorer.
3. Click on Open file and select bank.arff.


4. Select the Associate tab at the top of the window.
5. Click the Choose button and then select the Apriori algorithm.
6. Click on the Start button; the output is displayed on the right side of the window.
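
Two of the usual measures of interestingness are support and confidence. The Python sketch below is independent of WEKA; it computes both for one hand-picked rule, cust=male => deposit=yes, on the banking table above so that the figures reported by Apriori can be sanity-checked. The helper names support and confidence are illustrative, and the rule is chosen only as an example.

```python
ATTRS = ("cust", "accno", "bankname", "location", "deposit")

# Data section of the banking table in this experiment.
RAW = """male,0101,sbi,hyd,yes
female,0102,hdfc,jmd,no
male,0103,sbh,antp,yes
male,0104,ab,pdtr,yes
female,0105,sbi,jmd,no
male,0106,ab,hyd,yes
female,0107,rbi,jmd,yes
female,0108,hdfc,kdp,no
male,0109,sbh,kdp,yes
male,0110,ab,jmd,no
female,0111,rbi,kdp,yes
male,0112,sbi,jmd,yes
female,0113,rbi,antp,no
male,0114,hdfc,pdtr,yes
female,0115,sbh,pdtr,no"""

records = [dict(zip(ATTRS, line.split(","))) for line in RAW.splitlines()]


def support(items):
    """Fraction of records that contain every attribute=value pair in items."""
    hits = sum(all(r[a] == v for a, v in items.items()) for r in records)
    return hits / len(records)


def confidence(antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent).

    Assumes the antecedent occurs at least once in the data.
    """
    return support({**antecedent, **consequent}) / support(antecedent)


rule_support = support({"cust": "male", "deposit": "yes"})
rule_confidence = confidence({"cust": "male"}, {"deposit": "yes"})
print(f"support={rule_support:.3f} confidence={rule_confidence:.3f}")
```

Seven of the fifteen customers are male depositors and eight are male, so the rule has support 7/15 and confidence 7/8.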

• Output/Result:


• List of Pre-Lab Questions:

1. What is the primary aim of finding association rules for relation (table) data?
Ans :

2. Why are association rules important in data mining and analysis?


Ans :

3. What are the key components of an association rule (e.g., "antecedent" and
"consequent")?

Ans :

• List of Post-Lab Questions:

1. Explain the concept of support and confidence in association rule mining.

Ans :

2. What is a common application of association rule mining in the business or retail
sector?
Ans :

3. Name a well-known algorithm for finding association rules in data.

Ans :

4. Why might it be essential to prune or filter association rules during the post-
processing stage?
Ans :


Conclusion:

• Assessment Scheme :

Pre lab Test    In Lab Performance    Post Lab Test    Record    Total
    (2)                (5)                 (3)          (5)      (15)


Experiment No: 6

Aim: To Construct Decision tree for relation data and classify it.

Task: Construct decision tree for weather data and classify it using WEKA toolkit

• Description:
Classification & Prediction:
Classification is the process of finding a model that describes data classes and concepts, so
that the model can be used to predict the class of data whose label is unknown.

• Decision Tree:
A decision tree is a classification scheme that generates a tree consisting of a root node,
internal nodes and external (leaf) nodes.

The root node and the internal nodes each test an attribute. The external nodes hold the
classes, and each branch represents a value of the attribute tested at its parent node.

A decision tree can also be read as a set of rules for a given data set. Two data sets are
used with a decision tree: a training data set and a testing data set. The training data set
is previously classified data used to build the tree. The testing data set is new data used
to check how well the tree classifies.

• Creation of Weather Table:


@relation weather
@attribute outlook {sunny, rainy, overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no

1. Save the file in .arff format.
2. Minimize the .arff file and then open weka-3-4.
3. Click on weka-3-4; the WEKA dialog box is displayed on the screen.
4. In that dialog box there are four modes; click on Explorer.
5. Explorer shows many options. Click on ‘Open file’ and select the .arff file.


6. Click on the Edit button, which shows the weather table in WEKA.

• Procedure for Decision Trees:

1. Open Weka.
2. Open Explorer.
3. Click on Open file and select weather.arff.
4. Select the Classify tab at the top of the window.
5. Click the Choose button and open the trees group.
6. Click on J48.
7. Click on the Start button; the output is displayed on the right side of the window.
8. Right click on the entry in the result list and select the Visualize Tree option.
9. The decision tree is then displayed in a new window.
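
To see why a tree built on this data is likely to test outlook first, the split can be scored by hand. The Python sketch below computes the entropy of the play class and the information gain of splitting on outlook for the weather table above. It is a simplified illustration: J48 is WEKA's implementation of C4.5, which ranks splits by gain ratio and also handles numeric attributes and pruning, none of which is reproduced here. The function names are illustrative.

```python
from collections import Counter
from math import log2

# (outlook, play) pairs copied from the weather table of this experiment.
DATA = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(pairs):
    """Entropy of the class minus the weighted entropy after splitting on the attribute."""
    labels = [label for _, label in pairs]
    groups = {}
    for value, label in pairs:
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(pairs) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


class_entropy = entropy([label for _, label in DATA])
outlook_gain = information_gain(DATA)
print(f"entropy(play)={class_entropy:.3f} gain(outlook)={outlook_gain:.3f}")
```

The overcast branch contains only yes instances, so its entropy is zero, which is what makes outlook an attractive first split.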

Output

Decision Tree:

• Output/Result:


Decision Tree:

• List of Pre-Lab Questions:

1. What is the primary aim of constructing a decision tree for relation (table) data?
Ans :

2. Why are decision trees a popular choice for classification tasks in data mining and
machine learning?

Ans :

3. What are the key components of a decision tree (e.g., "root," "branches," and
"leaves")?

Ans :


• List of Post-Lab Questions:

1. Explain the concept of "entropy" or "Gini impurity" in the context of decision tree
construction.
Ans :

2. What is "pruning" in the context of decision trees, and why might it be necessary?

Ans :

3. Name a well-known algorithm or method for constructing decision trees.


Ans :

4. How does a decision tree classify new data points once it has been constructed?
Ans :

5. What are some advantages and limitations of using decision trees for classification
tasks?
Ans :

Conclusion:

• Assessment Scheme :

Pre lab Test    In Lab Performance    Post Lab Test    Record    Total
    (2)                (5)                 (3)          (5)      (15)


Experiment No: 7

Aim: To understand & apply the procedure for Visualizing relation (Table) data.

Task: Apply the procedure for Visualization on Banking relation Table.

Description: Visualization compares the selected attributes of the data set with one another.
The result is shown as a 2-D representation of the information.

• Creation of Banking Table:


@relation bank
@attribute cust {male,female}
@attribute accno {0101,0102,0103,0104,0105,0106,0107,0108,0109,0110,0111,0112,0113,0114,0115}
@attribute bankname {sbi,hdfc,sbh,ab,rbi}
@attribute location {hyd,jmd,antp,pdtr,kdp}
@attribute deposit {yes,no}
@data
male,0101,sbi,hyd,yes
female,0102,hdfc,jmd,no
male,0103,sbh,antp,yes
male,0104,ab,pdtr,yes
female,0105,sbi,jmd,no
male,0106,ab,hyd,yes
female,0107,rbi,jmd,yes
female,0108,hdfc,kdp,no
male,0109,sbh,kdp,yes
male,0110,ab,jmd,no
female,0111,rbi,kdp,yes
male,0112,sbi,jmd,yes
female,0113,rbi,antp,no
male,0114,hdfc,pdtr,yes
female,0115,sbh,pdtr,no

1) Save the file in .arff format.
2) Minimize the .arff file and then open weka-3-4.
3) Click on weka-3-4; the WEKA dialog box is displayed on the screen.
4) In that dialog box there are four modes; click on Explorer.
5) Explorer shows many options. Click on ‘Open file’ and select the .arff file.
6) Click on the Edit button, which shows the Banking table in WEKA.

• Procedure:
1. Open Weka.
2. Open the Explorer and click on Preprocess; a new window appears. In that
window select the bank.arff file, and the data is displayed.
3. Then click on the Visualize tab at the top of the window.
4. When the Visualize tab is selected, the Plot Matrix is displayed on the screen.
5. Click the Select Attribute button, select the cust attribute and click
OK.


6. Click on the Update button to display the output.
7. Click the Select Attribute button, select the accno attribute and click
OK.
8. Increase the Plot Size and Point Size.
9. Click on the Update button to display the output.
10. Click the Select Attribute button, select the bankname attribute and
click OK.
11. Click on the Update button to display the output.
12. Click the Select Attribute button, select the location attribute and click
OK.
13. Increase the Jitter Size.
14. Click on the Update button to display the output.
15. Click the Select Attribute button, select the deposit attribute and click
OK.
16. Click on the Update button to display the output.
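
Because every attribute in the banking table is nominal, many instances fall on exactly the same point of a 2-D plot, which is the reason for increasing the jitter in step 13. The Python sketch below is separate from WEKA and only counts how many instances share each (location, deposit) pair in the table above; the variable names are illustrative.

```python
from collections import Counter

# (location, deposit) pairs copied from the banking table of this experiment.
PAIRS = [
    ("hyd", "yes"), ("jmd", "no"), ("antp", "yes"), ("pdtr", "yes"), ("jmd", "no"),
    ("hyd", "yes"), ("jmd", "yes"), ("kdp", "no"), ("kdp", "yes"), ("jmd", "no"),
    ("kdp", "yes"), ("jmd", "yes"), ("antp", "no"), ("pdtr", "yes"), ("pdtr", "no"),
]

# Number of instances that would be drawn on top of each other without jitter.
overlap = Counter(PAIRS)

for (location, deposit), count in sorted(overlap.items()):
    print(f"{location:5s} {deposit:3s} {count}")
```

The fifteen instances occupy only nine distinct positions, so without jitter several points in the location/deposit cell hide one another.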

• Output/Result:


• List of Pre-Lab Questions:

1. What is the primary goal of visualizing relation (table) data in data analysis?
Ans :
2. Why is data visualization an important step in the data analysis process?
Ans :

3. Name two common types of charts or graphs used for visualizing relation data.
Ans :

• List of Post-Lab Questions:

1. Explain the purpose of using a bar chart when visualizing relation data.
Ans :

2. How does a scatter plot help in visualizing relationships between two variables in a
relation dataset?
Ans :

3. What are some common libraries or tools for creating data visualizations in data
analysis?
Ans :

4. Why might you choose a heatmap as a visualization technique for relation data?
Ans :

5. How can data visualization aid in the decision-making process in data analysis?
Ans :


• Conclusion:

• Assessment Scheme :

Pre lab Test    In Lab Performance    Post Lab Test    Record    Total
    (2)                (5)                 (3)          (5)      (15)


Experiment No: 8

Aim: To study & understand the procedure for cross-validation using J48 Algorithm for
given relation.

Task: Apply Cross validation using J48 algorithm for weather relation using WEKA

• Description:
Cross-validation, sometimes called rotation estimation, is a technique for assessing how the
results of a statistical analysis will generalize to an independent data set. It is mainly used in
settings where the goal is prediction, and one wants to estimate how accurately a predictive
model will perform in practice. One round of cross-validation involves partitioning a sample
of data into complementary subsets, performing the analysis on one subset (called the training
set), and validating the analysis on the other subset (called the validation set or testing set).
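
The partitioning described above can be sketched in a few lines of Python. The function below is a hypothetical helper, not WEKA code: it splits instance indices into k folds and, for each round, returns the indices used for training and for testing. WEKA's own fold maker may also shuffle and stratify the data, which this sketch does not do, and the choice of k = 7 for the 14 weather instances is only for illustration.

```python
def k_fold_splits(n_instances, k):
    """Yield (train_indices, test_indices) for each of the k rounds."""
    if not 2 <= k <= n_instances:
        raise ValueError("k must be between 2 and the number of instances")
    indices = list(range(n_instances))
    for fold in range(k):
        test = indices[fold::k]  # every k-th instance, starting at position `fold`
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# 14 instances, as in the weather relation; 7 folds of 2 test instances each.
splits = list(k_fold_splits(14, 7))
for round_no, (train, test) in enumerate(splits, start=1):
    print(f"round {round_no}: test={test} train_size={len(train)}")
```

Every instance is used for testing exactly once, and the model in each round is trained only on the instances outside that round's test fold.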

• Creation of Weather Table:


@relation weather
@attribute outlook {sunny, rainy, overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no

1. Save the file in .arff format.
2. Minimize the .arff file and then open weka-3-4.
3. Click on weka-3-4; the WEKA dialog box is displayed on the screen.
4. In that dialog box there are four modes; click on Explorer.
5. Explorer shows many options. Click on ‘Open file’ and select the .arff file.
6. Click on the Edit button, which shows the weather table in WEKA.

• Procedure:
1. Start Weka
2. Open Knowledge Flow.
3. Select Data Source tab & choose Arff Loader.
4. Place Arff Loader component on the layout area by clicking on that component.


5. Specify an Arff file to load by right clicking on Arff Loader icon, and then a pop-up menu
will appear. In that select Configure & browse to the location of weather.arff
6. Click on the Evaluation tab & choose Class Assigner & place it on the layout.
7. Now connect the Arff Loader to the Class Assigner by right clicking on Arff Loader, and
then select Data Set option, now a link will be established.
8. Right click on Class Assigner & choose Configure option, and then a new window will
appear & specify a class to our data.
9. Select Evaluation tab & select Cross-Validation Fold Maker & place it on the layout.
10. Now connect the Class Assigner to the Cross-Validation Fold Maker.
11. Select Classifiers tab & select J48 component & place it on the layout.
12. Now connect Cross-Validation Fold Maker to J48 twice; first choose Training Data Set
option and then Test Data Set option.
13. Select Evaluation Tab & select Classifier Performance Evaluator component & place it on
the layout.
14. Connect J48 to Classifier Performance Evaluator component by right clicking on J48 &
selecting Batch Classifier.
15. Select Visualization tab & select Text Viewer component & place it on the layout.
16. Connect Classifier Performance Evaluator to Text Viewer by right clicking on Classifier
Performance Evaluator & selecting the Text option.
17. Start the flow of execution by selecting Start Loading from Arff Loader.
18. For viewing result, right click on Text Viewer & select the Show Results, and then the
result will be displayed on the new window.

• Output/Result:


• List of Pre-Lab Questions:

1. What is cross-validation?

Ans :
2. Define J48 Algorithm.

Ans :
3. Why preprocess data before cross-validation?
Ans :
4. What is overfitting?

Ans :
5. What does 'k' represent in k-fold cross-validation?
Ans :

• List of Post-Lab Questions:

1. Why cross-validate?

Ans :
2. Name common performance metrics.

Ans :
3. What does the decision tree reveal?

Ans :
4. How to mitigate underfitting?

Ans :
5. Where is J48 Algorithm useful?

Ans :


Conclusion:

• Assessment Scheme :

Pre lab Test    In Lab Performance    Post Lab Test    Record    Total
    (2)                (5)                 (3)          (5)      (15)


Experiment No: 9

Aim: To Study & Understand the procedure for Clustering Buying data using Cobweb
Algorithm.

Task: Apply procedure for Clustering Buying relation using Cobweb Algorithm.

• Description:
Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters)
so that the objects in the same cluster are more similar (in some sense or another) to each other
than to those in other clusters. Clustering is a main task of explorative data mining, and a
common technique for statistical data analysis used in many fields, including machine learning,
pattern recognition, image analysis, information retrieval, and bioinformatics.

• Creation of Buying Table:
@relation buying
@attribute age {L20,20-40,G40}
@attribute income {high,medium,low}
@attribute stud {yes,no}
@attribute creditrate {fair,excellent}
@attribute buyscomp {yes,no}
@data
L20,high,no,fair,yes
20-40,low,yes,fair,yes
G40,medium,yes,fair,yes
L20,low,no,fair,no
G40,high,no,excellent,yes
L20,low,yes,fair,yes
20-40,high,yes,excellent,no
G40,low,no,fair,yes
L20,high,yes,excellent,yes
G40,high,no,fair,yes
L20,low,yes,excellent,no
G40,high,yes,excellent,no
20-40,medium,yes,excellent,yes
L20,medium,yes,fair,yes
G40,high,yes,excellent,yes

1. Save the file in .arff format.
2. Minimize the .arff file and then open weka-3-4.
3. Click on weka-3-4; the WEKA dialog box is displayed on the screen.
4. In that dialog box there are four modes; click on Explorer.
5. Explorer shows many options. Click on ‘Open file’ and select the .arff file.
6. Click on the Edit button, which shows the buying table in WEKA.

• Procedure:
1. Click Weka.
2. Click on Explorer.
3. Click on Open file and then select the Buying.arff file.
4. Click on the Cluster tab, which offers several clustering algorithms.


5. Click on the Choose button and then select the Cobweb algorithm.
6. Click on the Start button; the output is then displayed on the screen.
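
Cobweb builds its cluster hierarchy incrementally and scores candidate groupings with a measure called category utility. The Python sketch below evaluates that measure for a given flat partition of the buying table above; it does not implement Cobweb's incremental tree construction or WEKA's acuity and cutoff parameters. The function names are illustrative, and partitioning by the buyscomp attribute is chosen only as an example.

```python
from collections import Counter

# Data section of the buying table: age, income, stud, creditrate, buyscomp.
RAW = """L20,high,no,fair,yes
20-40,low,yes,fair,yes
G40,medium,yes,fair,yes
L20,low,no,fair,no
G40,high,no,excellent,yes
L20,low,yes,fair,yes
20-40,high,yes,excellent,no
G40,low,no,fair,yes
L20,high,yes,excellent,yes
G40,high,no,fair,yes
L20,low,yes,excellent,no
G40,high,yes,excellent,no
20-40,medium,yes,excellent,yes
L20,medium,yes,fair,yes
G40,high,yes,excellent,yes"""

ROWS = [tuple(line.split(",")) for line in RAW.splitlines()]


def sum_squared_probs(rows):
    """Sum over all attributes and values of P(attribute = value)^2 within rows."""
    n = len(rows)
    return sum((c / n) ** 2 for column in zip(*rows) for c in Counter(column).values())


def category_utility(clusters):
    """Average, over clusters, of the weighted gain in predictability of attribute values."""
    all_rows = [row for cluster in clusters for row in cluster]
    baseline = sum_squared_probs(all_rows)
    gain = sum(
        len(c) / len(all_rows) * (sum_squared_probs(c) - baseline) for c in clusters
    )
    return gain / len(clusters)


buyers = [r for r in ROWS if r[4] == "yes"]
non_buyers = [r for r in ROWS if r[4] == "no"]

cu_single = category_utility([ROWS])               # everything in one cluster
cu_by_buys = category_utility([buyers, non_buyers])
print(f"one cluster: {cu_single:.4f}  split by buyscomp: {cu_by_buys:.4f}")
```

A single cluster scores zero because it predicts attribute values no better than the whole data set, while a partition whose clusters differ in their value distributions scores above zero.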

Output/Result:


List of Pre-Lab Questions:



1. Why is data preprocessing crucial before applying the Cobweb Algorithm for
clustering buying data?

Ans :

2. Explain the primary goal of clustering buying data using the Cobweb Algorithm.

Ans :

3. What distinguishes the Cobweb Algorithm from other clustering methods, and why is
it suitable for buying data?

Ans :

4. What are the potential advantages of understanding buying behavior through
clustering?
Ans :

5. How can cluster evaluation metrics assist in assessing the quality of the Cobweb
Algorithm's results for buying data?
Ans :



• List of Post-Lab Questions:

1. What actionable insights can be derived from the clusters generated using the Cobweb
Algorithm for buying data?

Ans :

2. How does the Cobweb Algorithm accommodate evolving buying data, and what
implications does this have for businesses?
Ans :

3. Discuss the role of visualization in presenting and interpreting the clusters formed by
the Cobweb Algorithm.
Ans :

4. What challenges might arise when applying the Cobweb Algorithm to buying data,
and how can they be addressed?
Ans :

• Conclusion:

• Assessment Scheme :

Pre lab Test    In Lab Performance    Post Lab Test    Record    Total
    (2)                (5)                 (3)          (5)      (15)


Experiment No: 10

Aim: To Study & Understand the procedure for Clustering Customer data using Simple
KMeans Algorithm.

Task: Apply KMeans algorithm on customer relation (Table)

• Description:
Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters)
so that the objects in the same cluster are more similar (in some sense or another) to each other
than to those in other clusters. Clustering is a main task of explorative data mining, and a
common technique for statistical data analysis used in many fields, including machine learning,
pattern recognition, image analysis, information retrieval, and bioinformatics.

• Creation of Customer Table:


@relation customer
@attribute name {x,y,z,u,v,l,w,q,r,n}
@attribute age {youth,middle,senior}
@attribute income {high,medium,low}
@attribute class {A,B}
@data
x,youth,high,A
y,youth,low,B
z,middle,high,A
u,middle,low,B
v,senior,high,A
l,senior,low,B
w,youth,high,A
q,youth,low,B
r,middle,high,A
n,senior,high,A

1. Save the file in .arff format.
2. Minimize the .arff file and then open Weka.
3. Click on Weka; the WEKA dialog box is displayed on the screen.
4. In that dialog box there are four modes; click on Explorer.
5. Explorer shows many options. Click on ‘Open file’ and select the .arff file.
6. Click on the Edit button, which shows the customer table in WEKA.

• Procedure:
1. Click Weka.
2. Click on Explorer.
3. Click on Open file and then select the Customer.arff file.
4. Click on the Cluster tab, which offers several clustering algorithms.
5. Click on the Choose button and then select the SimpleKMeans algorithm.
6. Click on the Start button; the output is then displayed on the screen.
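
The assign-then-update loop behind k-means can be followed by hand on this small table. Because all attributes of the customer relation are nominal, the Python sketch below uses a k-modes style variant: the distance is the number of attributes that differ, and each centroid is the most frequent value per attribute. These choices, the seeding with the first k distinct rows, and leaving out the name attribute are assumptions of this sketch, not a description of WEKA's SimpleKMeans internals.

```python
from collections import Counter

# (age, income, class) for customers x, y, z, u, v, l, w, q, r, n.
# The name attribute is left out because it is unique per row (assumption of this sketch).
ROWS = [
    ("youth", "high", "A"), ("youth", "low", "B"),
    ("middle", "high", "A"), ("middle", "low", "B"),
    ("senior", "high", "A"), ("senior", "low", "B"),
    ("youth", "high", "A"), ("youth", "low", "B"),
    ("middle", "high", "A"), ("senior", "high", "A"),
]


def mismatches(a, b):
    """Distance between two nominal rows: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_row(rows):
    """Centroid of a cluster: the most frequent value of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*rows))


def cluster_nominal(rows, k, max_iter=20):
    centroids = list(dict.fromkeys(rows))[:k]  # first k distinct rows as seeds
    if len(centroids) < k:
        raise ValueError("fewer distinct rows than clusters")
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: mismatches(row, centroids[c]))
            for row in rows
        ]
        if new_assignment == assignment:
            break  # no row changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: recompute each centroid from its members.
        for c in range(k):
            members = [row for row, a in zip(rows, assignment) if a == c]
            if members:
                centroids[c] = mode_row(members)
    return assignment, centroids


labels, centroids = cluster_nominal(ROWS, k=2)
print(labels)
print(centroids)
```

With k = 2 the two clusters coincide with the high-income class A customers and the low-income class B customers, which can be compared with the cluster assignments in the WEKA output.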


• Output/Result:


List of Pre-Lab Questions:

1. Why is data preprocessing essential before applying the KMeans Algorithm to
cluster customer data?

Ans :

2. How do you determine the optimal number of clusters (k) for customer data
segmentation with the KMeans Algorithm, and why is this crucial?

Ans :

3. What role does centroid initialization play in the KMeans Algorithm, and what
methods can be used for this initialization?

Ans :

4. Why is it necessary to interpret and evaluate the results of customer data clustering,
and what metrics can be used for evaluation?

Ans :

List of Post-Lab Questions:

1. How can the insights derived from customer data clustering using the KMeans
Algorithm benefit marketing strategies for a business?
Ans :

2. Discuss the challenges that may arise when applying the KMeans Algorithm to
customer data and how they can be addressed.
Ans :

3. What are some practical methods for visualizing and presenting the results of customer data
clustering with the KMeans Algorithm?

Ans :

4. How does the KMeans Algorithm influence decision-making in industries such as
e-commerce, retail, and customer relationship management (CRM)?

Ans :


5. Why is it important to document the findings and insights obtained from customer
data clustering with the KMeans Algorithm, and how can businesses use this
documentation effectively?
Ans :

Conclusion:



• Assessment Scheme :

Pre lab Test    In Lab Performance    Post Lab Test    Record    Total
    (2)                (5)                 (3)          (5)      (15)
