Non-Negative Matrix Factorization (NMF): Benjamin Wilson

Non-negative matrix factorization (NMF) is an unsupervised learning technique for dimension reduction that models data as combinations of interpretable parts. NMF expresses documents as combinations of topics and images as combinations of patterns. It works by fitting a model to non-negative sample features and extracting non-negative components and features. The features can then be used to reconstruct the original samples.


Non-negative matrix factorization (NMF)
UNSUPERVISED LEARNING IN PYTHON

Benjamin Wilson
Director of Research at lateral.io
Non-negative matrix factorization
NMF = "non-negative matrix factorization"

Dimension reduction technique

NMF models are interpretable (unlike PCA)

Easy to interpret means easy to explain!

However, all sample features must be non-negative (>= 0)

Interpretable parts
NMF expresses documents as combinations of topics (or "themes")

Interpretable parts
NMF expresses images as combinations of patterns

Using scikit-learn NMF
Follows the fit() / transform() pattern

Must specify the number of components, e.g.


NMF(n_components=2)

Works with NumPy arrays and with csr_matrix
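
For instance, here is a minimal sketch (with made-up values) of fitting NMF to a sparse csr_matrix; the same fit() / transform() pattern works on a dense NumPy array:

from scipy.sparse import csr_matrix
from sklearn.decomposition import NMF
import numpy as np

# a tiny non-negative array with made-up values, stored as a sparse csr_matrix
dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 3.0, 1.0]])
sparse = csr_matrix(dense)

model = NMF(n_components=2)                  # number of components must be specified
nmf_features = model.fit_transform(sparse)   # fit() and transform() in one step
print(nmf_features.shape)                    # (2, 2): one row of feature values per sample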

Example word-frequency array
Word frequency array, 4 words, many documents

Measure presence of words in each document using "tf-idf"


"tf" = frequency of word in document

"idf" reduces in uence of frequent words

Example usage of NMF
samples is the word-frequency array

from sklearn.decomposition import NMF


model = NMF(n_components=2)
model.fit(samples)

NMF(alpha=0.0, ... )

nmf_features = model.transform(samples)

NMF components
NMF has components

... just like PCA has principal components

Dimension of components = dimension of samples (see the check below)

Entries are non-negative

print(model.components_)

[[ 0.01  0.    2.13  0.54]
 [ 0.99  1.47  0.    0.5 ]]
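
A quick check of the dimension claim above, continuing the same example (the output shape is shown as a comment):

print(model.components_.shape)   # (2, 4): one row per component, one column per word
# each component has the same dimension (4) as each sample in samples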

NMF features
NMF feature values are non-negative

Can be used to reconstruct the samples

... combine feature values with components

print(nmf_features)

[[ 0. 0.2 ]
[ 0.19 0. ]
...
[ 0.15 0.12]]

Reconstruction of a sample
print(samples[i,:])

[ 0.12 0.18 0.32 0.14]

print(nmf_features[i,:])

[ 0.15 0.12]

Sample reconstruction
Multiply components by feature values, and add up

Can also be expressed as a product of matrices

This is the "Matrix Factorization" in "NMF"
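
Continuing the example above, a minimal sketch of this reconstruction using the arrays already defined:

# reconstruct sample i from its feature values and the components
reconstruction = nmf_features[i, :].dot(model.components_)
print(reconstruction)   # approximately equal to samples[i, :]

# equivalently, reconstruct all samples at once as a matrix product
approx_samples = nmf_features.dot(model.components_)   # same shape as samples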

NMF fits to non-negative data only
Word frequencies in each document

Images encoded as arrays

Audio spectrograms

Purchase histories on e-commerce sites

... and many more!

Let's practice!
NMF learns interpretable parts
UNSUPERVISED LEARNING IN PYTHON

Benjamin Wilson
Director of Research at lateral.io
Example: NMF learns interpretable parts
Word-frequency array articles (tf-idf)

20,000 scientific articles (rows)

800 words (columns)

Applying NMF to the articles
print(articles.shape)

(20000, 800)

from sklearn.decomposition import NMF


nmf = NMF(n_components=10)
nmf.fit(articles)

NMF(alpha=0.0, ... )

print(nmf.components_.shape)

(10, 800)

NMF components are topics


NMF components
For documents:
NMF components represent topics (see the sketch below)

NMF features combine topics into documents

For images, NMF components are parts of images
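
For example, a hedged sketch of inspecting one topic in the articles example: assuming words is the list of 800 column labels used to build articles, a DataFrame over nmf.components_ shows which words carry the most weight in a component:

import pandas as pd

# words: assumed list of the 800 words labelling the columns of articles
components_df = pd.DataFrame(nmf.components_, columns=words)
print(components_df.shape)          # (10, 800)

component = components_df.iloc[3]   # pick one component (topic)
print(component.nlargest())         # its highest-weighted words hint at the topic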

Grayscale images
"Grayscale" image = no colors, only shades of gray

Measure pixel brightness

Represent with value between 0 and 1 (0 is black)

Convert to 2D array

Grayscale image example
An 8x8 grayscale image of the moon, written as an array

Grayscale images as flat arrays
Enumerate the entries

Row-by-row

From left to right, top to bottom


Encoding a collection of images
Collection of images of the same size

Encode as 2D array

Each row corresponds to an image

Each column corresponds to a pixel

... can apply NMF!
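
A minimal sketch of this encoding, using random placeholder pixel values in place of real images:

import numpy as np
from sklearn.decomposition import NMF

images = np.random.rand(100, 13, 8)      # placeholder: 100 grayscale images, 13x8 pixels each
flat = images.reshape(len(images), -1)   # one row per image, one column per pixel
print(flat.shape)                        # (100, 104)

model = NMF(n_components=7)
features = model.fit_transform(flat)     # NMF applies directly to the flattened collection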

Visualizing samples
print(sample)

[ 0. 1. 0.5 1. 0. 1. ]

bitmap = sample.reshape((2, 3))   # recover the 2-row, 3-column image


print(bitmap)

[[ 0. 1. 0.5]
[ 1. 0. 1. ]]

from matplotlib import pyplot as plt


plt.imshow(bitmap, cmap='gray', interpolation='nearest')
plt.show()

Let's practice!
Building recommender systems using NMF
UNSUPERVISED LEARNING IN PYTHON

Benjamin Wilson
Director of Research at lateral.io
Finding similar articles
Engineer at a large online newspaper

Task: recommend articles similar to the article being read by a customer

Similar articles should have similar topics

Strategy
Apply NMF to the word-frequency array

NMF feature values describe the topics

... so similar documents have similar NMF feature values

Compare NMF feature values?

Apply NMF to the word-frequency array
articles is a word frequency array

from sklearn.decomposition import NMF


nmf = NMF(n_components=6)
nmf_features = nmf.fit_transform(articles)


Versions of articles
Different versions of the same document have the same topic proportions

... exact feature values may be different!

E.g. because one version uses many meaningless words

But all versions lie on the same line through the origin


Cosine similarity
Uses the angle between the lines

Higher values mean more similar

Maximum value is 1, when angle is 0 degrees
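
A minimal numeric check (with made-up feature values) that two vectors pointing in the same direction have cosine similarity 1:

import numpy as np

a = np.array([0.15, 0.12])
b = np.array([0.30, 0.24])     # a scaled version of a: same direction, angle 0
cosine = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cosine)                  # 1.0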

Calculating the cosine similarities
from sklearn.preprocessing import normalize
norm_features = normalize(nmf_features)
# if the current article has index 23
current_article = norm_features[23,:]
similarities = norm_features.dot(current_article)
print(similarities)

[ 0.7150569 0.26349967 ..., 0.20323616 0.05047817]

DataFrames and labels
Label similarities with the article titles, using a DataFrame

Titles given as a list: titles

import pandas as pd
from sklearn.preprocessing import normalize
norm_features = normalize(nmf_features)
df = pd.DataFrame(norm_features, index=titles)
current_article = df.loc['Dog bites man']
similarities = df.dot(current_article)

DataFrames and labels
print(similarities.nlargest())

Dog bites man                     1.000000
Hound mauls cat                   0.979946
Pets go wild!                     0.979708
Dachshunds are dangerous          0.949641
Our streets are no longer safe    0.900474
dtype: float64

Let's practice!
Final thoughts
UNSUPERVISED LEARNING IN PYTHON

Benjamin Wilson
Director of Research at lateral.io
Congratulations!