Plot Centroids by Clustering Things

Uploaded by

mixal75579
Copyright: © All Rights Reserved
Plot centroids in K-Means using TF-IDF

Asked 2 years, 8 months ago. Modified 2 years, 8 months ago. Viewed 1k times.

I'm writing code to group texts using KMeans and everything is working well, but I'm not able to plot the centroids together with the clustered points. I don't know how to use matplotlib, only seaborn, along with the vectors created by TF-IDF.

MiniBatchKMeans has the attribute cluster_centers_, but I'm not able to use it in the plot.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.feature_extraction.text import TfidfVectorizer

df_abstracts = df_cleared['abstract'].tolist() # list with 33,000 lines of strings

tfidf = TfidfVectorizer(max_features=2**12, ngram_range=(1,4), stop_words = 'english')
vextorized = tfidf.fit_transform(df_abstracts)

# For the plot generation, I do this dimensionality reduction from 33,000 to 2.
from sklearn.decomposition import PCA

pca = PCA(n_components = 9)
X_pca = pca.fit_transform(vextorized.toarray())

from sklearn.cluster import MiniBatchKMeans

kmeans = MiniBatchKMeans(init='k-means++', n_clusters=4, max_iter=500, n_init=10,
                         random_state=9)
y_pred = kmeans.fit_predict(vextorized)

np.unique(y_pred)

palette = sns.color_palette('bright', len(set(y_pred)))

# x and y are passed by keyword; recent seaborn versions reject positional data arguments
sns.scatterplot(x=X_pca[:,0], y=X_pca[:, 1], hue=y_pred, legend='full', palette=palette)
plt.title('Clustered')
python matplotlib seaborn k-means tf-idf

asked May 22, 2020 at 3:24 by Sergio Pantano; edited May 23, 2020 at 11:56 by StupidWolf

Comment: you're using K-means (unsupervised method), not K-NN (supervised learning model) – JodeCharger100, May 22, 2020 at 3:37


1 Answer
You ran k-means on the raw TF-IDF data, so to get the centers projected onto the PCA space you need to transform them as well.

I use an example dataset:


from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.cluster import MiniBatchKMeans

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

categories = ['rec.sport.baseball', 'sci.electronics',
              'comp.os.ms-windows.misc', 'talk.politics.misc']

newsgroups = fetch_20newsgroups(subset='train',
                                categories=categories)

X_train = newsgroups.data
y_train = newsgroups.target

tfidf = TfidfVectorizer(max_features=2**12, ngram_range=(1,4), stop_words = 'english')
vextorized = tfidf.fit_transform(X_train)
When you perform the PCA, you need to retain the fitted object so that it can be used to project the k-means centers:

pca = PCA(n_components = 9).fit(vextorized.toarray())
X_pca = pca.transform(vextorized.toarray())
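To illustrate why the fitted object matters, here is a minimal sketch on random data (a hypothetical stand-in for the TF-IDF matrix, not the newsgroups example). With the default whiten=False, pca.transform subtracts the mean learned during fit and projects onto the learned components, so the centers only land in the same coordinate system as the points if the PCA fitted on the data is reused. The use of KMeans and the array sizes are assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-in for the dense TF-IDF matrix: 200 samples, 20 features.
X = rng.random((200, 20))

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
centers = kmeans.cluster_centers_      # shape (4, 20), original feature space

pca = PCA(n_components=2).fit(X)       # fitted once, on the data

# transform() subtracts the mean learned from X and projects onto the
# learned components (this identity assumes the default whiten=False).
manual = (centers - pca.mean_) @ pca.components_.T
projected = pca.transform(centers)
same_as_manual = np.allclose(manual, projected)

# A second PCA fitted on the centers alone learns its own mean and axes,
# so its coordinates need not line up with the projected data points.
refit = PCA(n_components=2).fit_transform(centers)
matches_refit = np.allclose(refit, projected)

print(same_as_manual, matches_refit)
```

If the two printed flags differ, that is the mismatch the answer is warning about: only the coordinates from the PCA fitted on the data can be overlaid on the scatter plot of the data.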


This is what the data look like with the actual labels:

labels = [newsgroups.target_names[i] for i in y_train]
# x and y are passed by keyword; recent seaborn versions reject positional data arguments
sns.scatterplot(x=X_pca[:,0], y=X_pca[:, 1], hue=labels, legend='full', palette="Set2")

Now kmeans:

kmeans = MiniBatchKMeans(init='k-means++', n_clusters=4, max_iter=500, n_init=10,
                         random_state=777)
y_pred = kmeans.fit_predict(vextorized)

palette = sns.color_palette('bright', len(set(y_pred)))

# x and y are passed by keyword; recent seaborn versions reject positional data arguments
sns.scatterplot(x=X_pca[:,0], y=X_pca[:, 1], hue=y_pred, legend='full', palette=palette)
plt.title('Clustered')

We project the centers on the first 2 components and plot them:

centers_on_PCs = pca.transform(kmeans.cluster_centers_)

plt.scatter(x=centers_on_PCs[:,0],y=centers_on_PCs[:,1],s=200,c="k",marker="X")
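
For reference, the whole sequence can be sketched end to end without downloading a dataset. The toy corpus, the parameter values and the variable names docs and X_dense are assumptions made for illustration. The points are drawn with plain matplotlib here to keep dependencies small; sns.scatterplot also accepts an ax argument, so it could draw the points on the same axes instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the abstracts / newsgroup posts.
docs = [
    "the pitcher threw a fast ball in the ninth inning",
    "the batter hit a home run and the team won the game",
    "the resistor and capacitor form a simple low pass filter",
    "measure the voltage across the circuit with a multimeter",
    "windows crashed after the driver update and a reboot",
    "install the printer driver again on windows to fix it",
    "the senate voted on the new tax bill this week",
    "the president vetoed the bill passed by congress",
] * 5

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)           # sparse matrix, one row per document

# Cluster in the full TF-IDF space, as in the question.
kmeans = MiniBatchKMeans(n_clusters=4, n_init=10, random_state=0)
y_pred = kmeans.fit_predict(X)

# Fit the PCA once on the data and keep the fitted object.
X_dense = X.toarray()
pca = PCA(n_components=2).fit(X_dense)
X_pca = pca.transform(X_dense)

# Reuse the same fitted PCA to project the centers.
centers_on_PCs = pca.transform(kmeans.cluster_centers_)

# Draw points and centers on the same axes.
fig, ax = plt.subplots()
ax.scatter(X_pca[:, 0], X_pca[:, 1], c=y_pred, cmap="tab10", s=30)
ax.scatter(centers_on_PCs[:, 0], centers_on_PCs[:, 1], s=200, c="k", marker="X")
ax.set_title("Clustered")
# fig.savefig("clusters.png")  # uncomment to write the figure to disk
```

Because both scatter calls target the same ax, the centers are overlaid on the clustered points rather than drawn in a separate figure.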

answered May 23, 2020 at 11:54 by StupidWolf
