Data Clustering Solution

The document discusses data clustering by preprocessing three documents, extracting keywords, and creating a term-document matrix. It calculates Euclidean distances between documents, identifying two clusters: one containing Document 1 and Document 3, and another containing Document 2. Additionally, it provides an .ARFF file format representation of the data for further analysis.

Uploaded by

willingtonkavaska

DATA CLUSTERING

Answers to the Questions:

(i) Preprocessing the Documents:

Retained content for each document after removing stop words, punctuation, and irrelevant terms:

Document 1: information, data, meaning, objective, database, administration, storing, facts, computer-based, databases, presenting, carry, processing, computer.

Document 2: structure, carbon-containing, chemical, compounds, include, hydrocarbons, compounds, elements, hydrogen, carbon-hydrogen, bond, nitrogen, oxygen, chemical, diversity.

Document 3: computer, program, sequence, instructions, computer, process, data, information, area, application, example, sum, values.
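The mechanical part of this preprocessing (lowercasing, stripping punctuation, dropping stop words) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the original document texts are not reproduced here, the stop-word list `STOP_WORDS` and the function `preprocess` are hypothetical, and the removal of "irrelevant terms" appears to be a manual judgement that this code does not attempt. Hyphenated words such as "carbon-containing" are kept as single tokens, matching the retained lists above.

```python
import re

# Hypothetical stop-word list: the assignment does not say which list was used.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "by", "for", "in", "is", "it",
    "of", "on", "or", "such", "that", "the", "to", "with",
}

def preprocess(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens (keeping internal
    hyphens), and drop stop words. Punctuation never matches the token
    pattern, so it is discarded automatically."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

# Hypothetical input sentence, not one of the assignment's documents.
print(preprocess("A computer program is a sequence of instructions."))
# -> ['computer', 'program', 'sequence', 'instructions']
```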

(ii) Keywords and Term-Dictionary:

Extracted keywords forming the term-dictionary:

[information, data, meaning, objective, database, administration, storing, facts, computer, carbon, chemical, compounds, program, instructions, application, example, values].
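One way to assemble a term dictionary programmatically is to give each distinct keyword an index in order of first appearance; that index then fixes the term's row in the term-document matrix and its attribute position in the ARFF file. This is a sketch, not the assignment's stated method: `build_term_dictionary` is a hypothetical name, and the per-document keyword lists are read off the non-zero entries of the term-document matrix rather than extracted from the original texts.

```python
def build_term_dictionary(keyword_lists: list[list[str]]) -> dict[str, int]:
    """Give each distinct keyword an index, in order of first appearance."""
    index: dict[str, int] = {}
    for keywords in keyword_lists:
        for kw in keywords:
            # setdefault only inserts unseen terms, so repeats keep their first index
            index.setdefault(kw, len(index))
    return index

# Keywords per document, read off the non-zero entries of the term-document matrix.
doc1 = ["information", "data", "meaning", "objective", "database",
        "administration", "storing", "facts", "computer"]
doc2 = ["carbon", "chemical", "compounds"]
doc3 = ["computer", "program", "instructions", "data", "information",
        "application", "example", "values"]

dictionary = build_term_dictionary([doc1, doc2, doc3])
print(list(dictionary))        # the 17 terms, in the dictionary order given above
print(dictionary["computer"])  # -> 8
```

Because "computer", "data", and "information" already appear in Doc 1, they keep their early indices when Doc 3 repeats them, which reproduces the dictionary order listed above.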

(iii) Term-Document Matrix:

Term-document matrix representing the frequency of terms in each document:

| Term | Doc 1 | Doc 2 | Doc 3 |
|-------------------|-------|-------|-------|
| information | 3 | 0 | 2 |
| data | 2 | 0 | 1 |
| meaning | 1 | 0 | 0 |
| objective | 1 | 0 | 0 |
| database | 2 | 0 | 0 |
| administration | 1 | 0 | 0 |
| storing | 1 | 0 | 0 |
| facts | 1 | 0 | 0 |
| computer | 1 | 0 | 2 |
| carbon | 0 | 2 | 0 |
| chemical | 0 | 2 | 0 |
| compounds | 0 | 2 | 0 |
| program | 0 | 0 | 1 |
| instructions | 0 | 0 | 1 |
| application | 0 | 0 | 1 |
| example | 0 | 0 | 1 |
| values | 0 | 0 | 1 |
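Counting term frequencies against a fixed dictionary can be sketched with `collections.Counter`, as below. The function name and the sample token lists are hypothetical. Note that the table's counts appear to include some manual normalisation (for example, "carbon" is counted twice for Doc 2, whose retained list contains "carbon-containing" and "carbon-hydrogen" rather than a bare "carbon"), which this sketch does not attempt.

```python
from collections import Counter

def term_document_matrix(docs: list[list[str]], terms: list[str]) -> list[list[int]]:
    """One row per dictionary term, one column per document;
    entry [t][d] is the number of times terms[t] occurs in docs[d]."""
    counts = [Counter(tokens) for tokens in docs]
    # A Counter returns 0 for a term that a document never mentions.
    return [[c[term] for c in counts] for term in terms]

# Hypothetical token lists, not the assignment's documents.
docs = [["data", "computer", "data"], ["carbon", "chemical"]]
terms = ["data", "computer", "carbon"]
for term, row in zip(terms, term_document_matrix(docs, terms)):
    print(term, row)
# prints: data [2, 0] / computer [1, 0] / carbon [0, 1]
```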

(iv) Euclidean Distance and Clustering:

Euclidean distances between the document vectors (the columns of the term-document matrix):

- Distance (Doc 1, Doc 2): sqrt(35) ≈ 5.92
- Distance (Doc 1, Doc 3): sqrt(17) ≈ 4.12
- Distance (Doc 2, Doc 3): sqrt(26) ≈ 5.10
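Written out, the Euclidean distance sums the squared differences of the two documents' counts over the 17 dictionary terms; here $x_{t,i}$ (notation introduced for illustration) is the count of term $t$ in document $i$, taken from the term-document matrix. Because Doc 2 shares no terms with the other two documents, its distances are simply the sums of both documents' squared counts. For Doc 1 and Doc 3, the six $(-1)^2$ terms come from computer, program, instructions, application, example, and values.

$$
d(D_i, D_j) = \sqrt{\sum_{t=1}^{17} \left(x_{t,i} - x_{t,j}\right)^2}
$$

$$
\begin{aligned}
d(D_1, D_2) &= \sqrt{(3^2 + 2^2 + 1^2 + 1^2 + 2^2 + 1^2 + 1^2 + 1^2 + 1^2) + (2^2 + 2^2 + 2^2)} = \sqrt{23 + 12} = \sqrt{35} \\
d(D_1, D_3) &= \sqrt{(1^2 + 1^2 + 1^2 + 1^2 + 2^2 + 1^2 + 1^2 + 1^2) + 6 \cdot (-1)^2} = \sqrt{11 + 6} = \sqrt{17} \\
d(D_2, D_3) &= \sqrt{(2^2 + 2^2 + 2^2) + (2^2 + 1^2 + 2^2 + 1^2 + 1^2 + 1^2 + 1^2 + 1^2)} = \sqrt{12 + 14} = \sqrt{26}
\end{aligned}
$$

Since $\sqrt{17}$ is the smallest of the three, Doc 1 and Doc 3 are the nearest pair.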

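Clusters (Doc 1 and Doc 3 are the closest pair, so when the documents are split into two clusters they are grouped together and Doc 2 forms a cluster of its own):

Cluster 1: {Doc 1, Doc 3}

Cluster 2: {Doc 2}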
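The distances and the grouping can be checked with a short Python sketch. The document does not name a clustering algorithm; this sketch assumes agglomerative single-link clustering stopped at k = 2 clusters, and `euclidean` and `single_link_clusters` are hypothetical helper names. With three documents only one merge happens, so any linkage choice merges the closest pair.

```python
import math
from itertools import combinations

# Document vectors: columns of the term-document matrix, in dictionary order
# (information, data, meaning, ..., example, values).
DOCS = {
    "Doc 1": [3, 2, 1, 1, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "Doc 2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 0],
    "Doc 3": [2, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 1, 1, 1, 1],
}

def euclidean(u: list[int], v: list[int]) -> float:
    """Square root of the summed squared differences between two count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def single_link_clusters(vectors: dict[str, list[int]], k: int) -> list[set[str]]:
    """Start with one cluster per document and repeatedly merge the two
    clusters whose closest members are nearest, until k clusters remain."""
    clusters = [{name} for name in vectors]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),  # pairs (i, j) with i < j
            key=lambda p: min(
                euclidean(vectors[a], vectors[b])
                for a in clusters[p[0]] for b in clusters[p[1]]
            ),
        )
        merged = clusters.pop(j)  # j > i, so index i still points at the same cluster
        clusters[i] |= merged
    return clusters

for x, y in combinations(DOCS, 2):
    print(f"d({x}, {y}) = {euclidean(DOCS[x], DOCS[y]):.2f}")
print(single_link_clusters(DOCS, k=2))
```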

(v) .ARFF File:

@relation documents

@attribute term_information numeric
@attribute term_data numeric
@attribute term_meaning numeric
@attribute term_objective numeric
@attribute term_database numeric
@attribute term_administration numeric
@attribute term_storing numeric
@attribute term_facts numeric
@attribute term_computer numeric
@attribute term_carbon numeric
@attribute term_chemical numeric
@attribute term_compounds numeric
@attribute term_program numeric
@attribute term_instructions numeric
@attribute term_application numeric
@attribute term_example numeric
@attribute term_values numeric

@data
3, 2, 1, 1, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 0
2, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 1, 1, 1, 1
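Rather than typing the header by hand, the ARFF text can be generated from the term dictionary and the matrix, which guarantees that each data row has exactly one value per declared attribute. A minimal Python sketch; `to_arff`, `TERMS`, and `ROWS` are illustrative names, and the output mirrors the file above. Writing the returned string to a file (for example a hypothetical `documents.arff`) gives a file that Weka can open.

```python
# Term dictionary and per-document counts from the term-document matrix.
TERMS = [
    "information", "data", "meaning", "objective", "database",
    "administration", "storing", "facts", "computer", "carbon",
    "chemical", "compounds", "program", "instructions", "application",
    "example", "values",
]
ROWS = [
    [3, 2, 1, 1, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # Doc 1
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 0],  # Doc 2
    [2, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 1, 1, 1, 1],  # Doc 3
]

def to_arff(relation: str, terms: list[str], rows: list[list[int]]) -> str:
    """Render one numeric attribute per term and one data row per document."""
    if any(len(row) != len(terms) for row in rows):
        raise ValueError("each row needs exactly one value per term")
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute term_{t} numeric" for t in terms]
    lines += ["", "@data"]
    lines += [", ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"

print(to_arff("documents", TERMS, ROWS))
```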
