
Document Clustering: Comparison of Similarity Measures

Shouvik Sachdeva Bhupendra Kastore

Indian Institute of Technology, Kanpur

CS365 Project, 2014



Outline

1 Introduction
The Problem and the Motivation
Approach
2 Methodology
Document Representation
Similarity Measures
Clustering Algorithms
Evaluation
3 Related Work
Past Results
References
4 The End

The Problem and the Motivation

What is document clustering and why is it important?

Document clustering is a method of classifying documents into a small number of coherent groups or clusters using appropriate similarity measures.
Document clustering plays a vital role in document organization, topic extraction and information retrieval.
With the ever-increasing number of high-dimensional datasets on the internet, the need for efficient clustering algorithms has grown.

The Problem and the Motivation

How can we solve this problem?

Many of these documents share a large proportion of lexically equivalent terms.
We will exploit this by using a “bag of words” model to represent the content of each document.
We will group “similar” documents together to form coherent clusters.
This “similarity” can be defined in various ways. In the vector space, it is closely related to the notion of distance, which can itself be defined in several ways.
We will test which similarity measure performs best across various domains of text articles in English and Hindi.

Approach

How will we compare these similarity measures?

We will first represent our documents using the bag-of-words and vector space models.
Then we will cluster the documents (now high-dimensional vectors) with k-means and hierarchical clustering techniques, using different similarity measures.
The documents come from varied domains in English and Hindi.
We will then compare the performance of each similarity measure across the different kinds of documents.
Entropy and purity will be used for evaluation.

Document Representation

Bag of Words: Model

Each word is assumed to be independent, and the order in which words occur is immaterial.
Each word corresponds to a dimension in the resulting data space.
Each document then becomes a vector consisting of non-negative values on each dimension.
Widely used in information retrieval and text mining.

Document Representation

Bag of Words: Example

Here are two simple text documents:


Document 1
I don’t know what I am saying.

Document 2
I can’t wait for this to get over.

Document Representation

Bag of Words: Example

Now, based on these two documents, a dictionary is constructed:
“I”: 1
“don’t”: 2
“know”: 3
“what”: 4
“am”: 5
“saying”: 6
“can’t”: 7
“wait”: 8
“for”: 9
“this”: 10
“to”: 11
“get”: 12
“over”: 13

Document Representation

Bag of Words: Example

The dictionary has 13 distinct words. Using the indices of the dictionary, each document is represented by a 13-entry vector.

Document 1
[2,1,1,1,1,1,0,0,0,0,0,0,0]

Document 2
[1,0,0,0,0,0,1,1,1,1,1,1,1]

Each entry of a vector is the count of the corresponding dictionary word in that document.
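A minimal Python sketch of this construction (function and variable names are illustrative, not from the slides):

from collections import Counter

def bag_of_words(docs):
    """Build a shared vocabulary and count vectors for a list of tokenised documents."""
    vocab = []
    for doc in docs:
        for token in doc:
            if token not in vocab:
                vocab.append(token)  # preserve first-seen order, as in the example above
    vectors = [[Counter(doc)[term] for term in vocab] for doc in docs]
    return vocab, vectors

doc1 = ["I", "don't", "know", "what", "I", "am", "saying"]
doc2 = ["I", "can't", "wait", "for", "this", "to", "get", "over"]
vocab, vectors = bag_of_words([doc1, doc2])
print(vectors)  # [[2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]]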

Document Representation

Representing the document formally

Let D = {d_1, ..., d_n} be a set of documents and T = {t_1, ..., t_m} the set of distinct terms occurring in D. A document's representation in the vector space is the m-dimensional vector

t_d = (tf(d, t_1), ..., tf(d, t_m))

where tf(d, t) denotes the frequency of term t ∈ T in document d ∈ D.

Document Representation

Pre-processing

First, we will remove stop words (non-descriptive words such as a, and, are and do). We will use the stop word list implemented in the Weka machine learning workbench, which contains 527 stop words.
Second, words will be stemmed using Porter's suffix-stripping algorithm, so that words with different endings are mapped to a single form. For example, production, produce, produces and product are mapped to the stem produc.
Third, we consider the effect of including infrequent terms in the document representation on the overall clustering performance, and discard words that appear less often than a given threshold frequency.
Finally, we select the top 2000 words ranked by their weights and use them in our experiments.
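A minimal sketch of this pipeline, assuming NLTK's Porter stemmer is available; the stop word list and thresholds below are illustrative placeholders (not Weka's 527-word list), and document frequency stands in for the weight-based ranking described above:

from collections import Counter
from nltk.stem import PorterStemmer

STOP_WORDS = {"a", "and", "are", "do", "the", "is", "to"}  # placeholder, not Weka's list
MIN_DOC_FREQ = 3     # illustrative threshold for discarding infrequent terms
VOCAB_SIZE = 2000    # keep the top-ranked terms

stemmer = PorterStemmer()

def preprocess(docs):
    """docs: list of token lists. Returns filtered token lists and the kept vocabulary."""
    stemmed = [[stemmer.stem(t.lower()) for t in doc if t.lower() not in STOP_WORDS]
               for doc in docs]
    doc_freq = Counter(t for doc in stemmed for t in set(doc))
    kept = [t for t, df in doc_freq.most_common(VOCAB_SIZE) if df >= MIN_DOC_FREQ]
    vocab = set(kept)
    return [[t for t in doc if t in vocab] for doc in stemmed], kept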

Document Representation

TFIDF

Terms that appear frequently in a small number of documents but rarely in the rest tend to be more relevant and specific to that particular group of documents, and are therefore more useful for finding similar documents.
To capture these terms, we transform the basic term frequencies tf(d, t) into the tfidf (term frequency–inverse document frequency) weighting scheme.
Tfidf weighs the frequency of a term t in a document d by a factor that discounts its importance according to its appearances in the whole document collection:

tfidf(d, t) = tf(d, t) × log(|D| / df(t))

Here df(t) is the number of documents in which term t appears.
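A minimal Python sketch of this weighting, taking the count vectors produced by the bag-of-words step (names are illustrative):

import math

def tfidf_vectors(count_vectors):
    """count_vectors: list of term-count lists over a shared vocabulary, one per document."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0])
    df = [sum(1 for vec in count_vectors if vec[t] > 0) for t in range(n_terms)]
    return [[tf * math.log(n_docs / df[t]) if df[t] else 0.0
             for t, tf in enumerate(vec)]
            for vec in count_vectors]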

Similarity Measures

Metric

A metric space (X, d) consists of a set X on which a distance function is defined, assigning to each pair of points of X a distance between them and satisfying the following four axioms:
1 d(x, y) ≥ 0 for all points x and y of X;
2 d(x, y) = d(y, x) for all points x and y of X;
3 d(x, z) ≤ d(x, y) + d(y, z) for all points x, y and z of X;
4 d(x, y) = 0 if and only if the points x and y coincide.

Similarity Measures

Euclidean Distance

Standard metric for geometric problems.
Given two documents d_a and d_b represented by their term vectors t_a and t_b respectively, the Euclidean distance is defined as

D_E(t_a, t_b) = ( Σ_{t=1}^{m} |w_{t,a} − w_{t,b}|² )^{1/2}

where T = {t_1, ..., t_m} is the term set and the weights are w_{t,a} = tfidf(d_a, t).
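A minimal Python sketch of this formula (t_a and t_b as equal-length lists of tfidf weights; names are illustrative):

import math

def euclidean_distance(t_a, t_b):
    # square root of the summed squared differences over all terms
    return math.sqrt(sum((wa - wb) ** 2 for wa, wb in zip(t_a, t_b)))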

Similarity Measures

Cosine Similarity

Cosine similarity measures the cosine of the angle between two vectors of an inner product space.
Given two documents with term vectors t_a and t_b, the cosine similarity is defined as

SIM_C(t_a, t_b) = (t_a · t_b) / (|t_a| × |t_b|)

where t_a and t_b are m-dimensional vectors over the term set T.
Non-negative and bounded in [0, 1].
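A minimal Python sketch of this formula (t_a and t_b as lists of tfidf weights; names are illustrative):

import math

def cosine_similarity(t_a, t_b):
    dot = sum(wa * wb for wa, wb in zip(t_a, t_b))
    norm_a = math.sqrt(sum(wa * wa for wa in t_a))
    norm_b = math.sqrt(sum(wb * wb for wb in t_b))
    return dot / (norm_a * norm_b)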

Similarity Measures

Jaccard Coefficient

The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets.
Given two documents with term vectors t_a and t_b, the Jaccard coefficient is defined as

SIM_J(t_a, t_b) = (t_a · t_b) / (|t_a|² + |t_b|² − t_a · t_b)

where t_a and t_b are m-dimensional vectors over the term set T.
Non-negative and bounded in [0, 1].
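A minimal Python sketch of this (extended) Jaccard coefficient on tfidf weight vectors (names are illustrative):

def jaccard_similarity(t_a, t_b):
    dot = sum(wa * wb for wa, wb in zip(t_a, t_b))
    sq_a = sum(wa * wa for wa in t_a)
    sq_b = sum(wb * wb for wb in t_b)
    return dot / (sq_a + sq_b - dot)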

Similarity Measures

Pearson Correlation Coefficient

The Pearson correlation coefficient is a measure of the linear correlation (dependence) between two variables X and Y, giving a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.
Given two documents with term vectors t_a and t_b, the Pearson correlation coefficient is defined as

SIM_P(t_a, t_b) = ( m Σ_{t=1}^{m} w_{t,a} × w_{t,b} − TF_a × TF_b ) / sqrt( [m Σ_{t=1}^{m} w_{t,a}² − TF_a²] × [m Σ_{t=1}^{m} w_{t,b}² − TF_b²] )

where t_a and t_b are m-dimensional vectors over the term set T, TF_a = Σ_{t=1}^{m} w_{t,a}, and w_{t,a} = tfidf(d_a, t).
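A minimal Python sketch following the formula above (t_a and t_b as lists of tfidf weights; names are illustrative):

import math

def pearson_similarity(t_a, t_b):
    m = len(t_a)
    tf_a, tf_b = sum(t_a), sum(t_b)
    num = m * sum(wa * wb for wa, wb in zip(t_a, t_b)) - tf_a * tf_b
    den = math.sqrt((m * sum(wa * wa for wa in t_a) - tf_a ** 2)
                    * (m * sum(wb * wb for wb in t_b) - tf_b ** 2))
    return num / den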

Similarity Measures

Manhattan Distance

The Manhattan distance is the distance that would be traveled to get from one data point to the other if a grid-like path is followed. The Manhattan distance between two items is the sum of the absolute differences of their corresponding components.
Given two documents with term vectors t_a and t_b, the Manhattan distance between them is defined as

SIM_M(t_a, t_b) = Σ_{t=1}^{m} |w_{t,a} − w_{t,b}|

where t_a and t_b are m-dimensional vectors over the term set T and w_{t,a} = tfidf(d_a, t).
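A minimal Python sketch of this formula (t_a and t_b as lists of tfidf weights; names are illustrative):

def manhattan_distance(t_a, t_b):
    return sum(abs(wa - wb) for wa, wb in zip(t_a, t_b))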

Similarity Measures

Chebychev Distance

The Chebychev distance between two points is the maximum distance between the points in any single dimension.
Given two documents with term vectors t_a and t_b, the Chebychev distance is defined as

SIM_Ch(t_a, t_b) = max_t |w_{t,a} − w_{t,b}|

where t_a and t_b are m-dimensional vectors over the term set T and w_{t,a} = tfidf(d_a, t).
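A minimal Python sketch of this formula (t_a and t_b as lists of tfidf weights; names are illustrative):

def chebychev_distance(t_a, t_b):
    return max(abs(wa - wb) for wa, wb in zip(t_a, t_b))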

Clustering Algorithms

Hierarchical Algorithms

Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters.
Strategies for hierarchical clustering generally fall into two types:
Agglomerative (bottom up):
This method starts with every object (here, each document) in its own cluster. Then, in each successive iteration, it merges the closest pair of clusters according to some similarity criterion, until all of the data is in one cluster. O(n³)
Divisive (top down):
This method starts with a single cluster containing all objects, and then successively splits the resulting clusters until only clusters of individual objects remain. O(n²)
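A minimal sketch of the agglomerative variant using SciPy (assuming SciPy is available; the data, linkage method and cluster count below are illustrative placeholders):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

tfidf_matrix = np.random.rand(10, 50)  # placeholder: rows are document tfidf vectors
Z = linkage(tfidf_matrix, method="average", metric="cosine")  # iteratively merge closest clusters
labels = fcluster(Z, t=3, criterion="maxclust")               # cut the hierarchy into 3 clusters
print(labels)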

Clustering Algorithms

Hierarchical Clustering: In action

http://www.cs.utexas.edu/~mooney/cs391L/
slides/clustering.ppt

Figure: Single Link



Clustering Algorithms

Hierarchical Clustering: End Result

Figure: Hierarchical Clustering



Clustering Algorithms

k -means Algorithm

Partitions observations into k clusters, resulting in a partitioning of the data space into Voronoi cells.
First pick k, the number of clusters.
Initialize clusters by picking one point per cluster. For instance, pick one point at random, then k − 1 other points, each as far away as possible from the previously chosen points.
Try different values of k and choose the best one based on the average distance of points to their centroid.

Clustering Algorithms

k -means: Populating Clusters

For each point, place it in the cluster whose current centroid is nearest.
After all points are assigned, recompute the centroids of the k clusters.
Reassign all points to their closest centroid, and repeat until the assignments no longer change.
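A minimal NumPy sketch of this loop (illustrative; Euclidean distance is used here, though any of the measures above could be substituted):

import numpy as np

def k_means(points, k, n_iters=100, seed=0):
    """points: an (n, m) NumPy array of document tfidf vectors."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]  # pick k initial points
    for _ in range(n_iters):
        # assign each point to the nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster lost all its points
        new_centroids = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids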

Clustering Algorithms

k -means: In action

http://www.codeproject.com/Articles/439890/
Text-Documents-Clustering-using-K-Means-Algorithm

Figure: k -means process



Clustering Algorithms

Effective choice of seeds

Effective heuristics for seed selection include:


Excluding outliers from the seed set
Trying out multiple starting points and choosing the
clustering with the lowest cost; and
Obtaining seeds from another method such as hierarchical
clustering.

Evaluation

Entropy

Entropy measures the distribution of categories in a given cluster. The entropy of a cluster C_i with size n_i is defined as

E(C_i) = − (1 / log c) Σ_h (n_i^h / n_i) log(n_i^h / n_i)

where c is the total number of categories in the data set, the sum runs over the categories h, and n_i^h is the number of documents from the h-th class that were assigned to cluster C_i.
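A minimal Python sketch of this formula (labels_in_cluster holds the true category labels of one cluster's documents; names are illustrative):

import math
from collections import Counter

def cluster_entropy(labels_in_cluster, n_categories):
    n = len(labels_in_cluster)
    counts = Counter(labels_in_cluster)
    return -sum((c / n) * math.log(c / n) for c in counts.values()) / math.log(n_categories)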

Evaluation

Purity

Purity provides insight into the coherence of a cluster, i.e. the degree to which a cluster contains documents from a single category. Purity for a given cluster C_i of size n_i is given by:

P(C_i) = (1 / n_i) max_h(n_i^h)

where max_h(n_i^h) is the number of documents from the dominant category in cluster C_i and n_i^h is the number of documents in C_i that belong to category h.
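A minimal Python sketch of this formula (labels_in_cluster holds the true category labels of one cluster's documents; names are illustrative):

from collections import Counter

def cluster_purity(labels_in_cluster):
    counts = Counter(labels_in_cluster)
    return max(counts.values()) / len(labels_in_cluster)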

Evaluation

Datasets

We will use the following datasets for our project:


20news
BBC/BBC Sport
Wikipedia
FIRE
Classic
re0

Past Results

Anna Huang

Table 1: Purity Results


Data Euclidean Cosine Jaccard Pearson KLD
20news 0.1 0.5 0.5 0.5 0.38
classic 0.56 0.85 0.98 0.85 0.84
hitech 0.29 0.54 0.51 0.56 0.53
re0 0.53 0.78 0.75 0.78 0.77
tr41 0.71 0.71 0.72 0.78 0.64
wap 0.32 0.62 0.63 0.61 0.61
webkb 0.42 0.68 0.57 0.67 0.75

Past Results

Anna Huang

Table 2: Entropy Results


Data Euclidean Cosine Jaccard Pearson KLD
20news 0.95 0.49 0.51 0.49 0.54
classic 0.78 0.29 0.06 0.27 0.3
hitech 0.92 0.64 0.68 0.65 0.63
re0 0.6 0.27 0.33 0.26 0.25
tr41 0.62 0.33 0.34 0.3 0.38
wap 0.75 0.39 0.4 0.39 0.4
webkb 0.93 0.6 0.74 0.61 0.51

References

References

Anna Huang.
Similarity measures for document clustering.
In Proceedings of the Sixth New Zealand Computer
Science Research Student Conference (NZCSRSC2008),
Christchurch, New Zealand, pages 49–56, 2008.
D. Arthur and S. Vassilvitskii.
k-means++: The advantages of careful seeding.
In Symposium on Discrete Algorithms, 2007.
Y. Zhao and G. Karypis.
Empirical and theoretical comparisons of selected criterion
functions for document clustering.
Machine Learning, 55(3), 2004.

That’s all folks!

Thank You!
Questions?
Suggestions?
