Non-Negative Matrix Factorization (NMF): Benjamin Wilson

Non-negative matrix factorization (NMF) is an unsupervised learning technique for dimension reduction that models data as combinations of interpretable parts. NMF expresses documents as combinations of topics and images as combinations of patterns. It works by fitting a model to non-negative sample features and extracting non-negative components and features. The features can then be used to reconstruct the original samples.


Non-negative matrix factorization (NMF)
UNSUPERVISED LEARNING IN PYTHON

Benjamin Wilson
Director of Research at lateral.io
Non-negative matrix factorization
NMF = "non-negative matrix factorization"

Dimension reduction technique

NMF models are interpretable (unlike PCA)

Easy to interpret means easy to explain!

However, all sample features must be non-negative (>= 0)

Interpretable parts
NMF expresses documents as combinations of topics (or "themes")

Interpretable parts
NMF expresses images as combinations of patterns

Using scikit-learn NMF
Follows the fit() / transform() pattern

Must specify the number of components, e.g.


NMF(n_components=2)

Works with NumPy arrays and with csr_matrix
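
For instance, here is a minimal sketch (with made-up values) of fitting NMF to a sparse csr_matrix; the same fit() / transform() pattern works on a dense NumPy array:

from scipy.sparse import csr_matrix
from sklearn.decomposition import NMF
import numpy as np

# a tiny non-negative array with made-up values, stored as a sparse csr_matrix
dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 3.0, 1.0]])
sparse = csr_matrix(dense)

model = NMF(n_components=2)                  # number of components must be specified
nmf_features = model.fit_transform(sparse)   # fit() and transform() in one step
print(nmf_features.shape)                    # (2, 2): one row of feature values per sample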

Example word-frequency array
Word frequency array, 4 words, many documents

Measure presence of words in each document using "tf-idf"


"tf" = frequency of word in document

"idf" reduces in uence of frequent words

Example usage of NMF
samples is the word-frequency array

from sklearn.decomposition import NMF


model = NMF(n_components=2)
model.fit(samples)

NMF(alpha=0.0, ... )

nmf_features = model.transform(samples)

NMF components
NMF has components

... just like PCA has principal components

Dimension of components = dimension of samples (see the check below)

Entries are non-negative

print(model.components_)

[[ 0.01  0.    2.13  0.54]
 [ 0.99  1.47  0.    0.5 ]]
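
A quick check of the dimension claim above, continuing the same example (the output shape is shown as a comment):

print(model.components_.shape)   # (2, 4): one row per component, one column per word
# each component has the same dimension (4) as each sample in samples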

NMF features
NMF feature values are non-negative

Can be used to reconstruct the samples

... combine feature values with components

print(nmf_features)

[[ 0. 0.2 ]
[ 0.19 0. ]
...
[ 0.15 0.12]]

Reconstruction of a sample
print(samples[i,:])

[ 0.12 0.18 0.32 0.14]

print(nmf_features[i,:])

[ 0.15 0.12]

Sample reconstruction
Multiply components by feature values, and add up

Can also be expressed as a product of matrices

This is the "Matrix Factorization" in "NMF"
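
Continuing the example above, a minimal sketch of this reconstruction using the arrays already defined:

# reconstruct sample i from its feature values and the components
reconstruction = nmf_features[i, :].dot(model.components_)
print(reconstruction)   # approximately equal to samples[i, :]

# equivalently, reconstruct all samples at once as a matrix product
approx_samples = nmf_features.dot(model.components_)   # same shape as samples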

NMF fits to non-negative data only
Word frequencies in each document

Images encoded as arrays

Audio spectrograms

Purchase histories on e-commerce sites

... and many more!

Let's practice!
NMF learns interpretable parts
UNSUPERVISED LEARNING IN PYTHON

Benjamin Wilson
Director of Research at lateral.io
Example: NMF learns interpretable parts
Word-frequency array articles (tf-idf)

20,000 scientific articles (rows)

800 words (columns)

Applying NMF to the articles
print(articles.shape)

(20000, 800)

from sklearn.decomposition import NMF


nmf = NMF(n_components=10)
nmf.fit(articles)

NMF(alpha=0.0, ... )

print(nmf.components_.shape)

(10, 800)

NMF components are topics


NMF components
For documents:
NMF components represent topics (see the sketch below)

NMF features combine topics into documents

For images, NMF components are parts of images
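
For example, a hedged sketch of inspecting one topic in the articles example: assuming words is the list of 800 column labels used to build articles, a DataFrame over nmf.components_ shows which words carry the most weight in a component:

import pandas as pd

# words: assumed list of the 800 words labelling the columns of articles
components_df = pd.DataFrame(nmf.components_, columns=words)
print(components_df.shape)          # (10, 800)

component = components_df.iloc[3]   # pick one component (topic)
print(component.nlargest())         # its highest-weighted words hint at the topic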

Grayscale images
"Grayscale" image = no colors, only shades of gray

Measure pixel brightness

Represent with value between 0 and 1 (0 is black)

Convert to 2D array

Grayscale image example
An 8x8 grayscale image of the moon, written as an array

Grayscale images as flat arrays
Enumerate the entries

Row-by-row

From left to right, top to bottom


Encoding a collection of images
Collection of images of the same size

Encode as 2D array

Each row corresponds to an image

Each column corresponds to a pixel

... can apply NMF!
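
A minimal sketch of this encoding, using random placeholder pixel values in place of real images:

import numpy as np
from sklearn.decomposition import NMF

images = np.random.rand(100, 13, 8)      # placeholder: 100 grayscale images, 13x8 pixels each
flat = images.reshape(len(images), -1)   # one row per image, one column per pixel
print(flat.shape)                        # (100, 104)

model = NMF(n_components=7)
features = model.fit_transform(flat)     # NMF applies directly to the flattened collection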

Visualizing samples
print(sample)

[ 0. 1. 0.5 1. 0. 1. ]

bitmap = sample.reshape((2, 3))   # recover the 2-row, 3-column image


print(bitmap)

[[ 0. 1. 0.5]
[ 1. 0. 1. ]]

from matplotlib import pyplot as plt


plt.imshow(bitmap, cmap='gray', interpolation='nearest')
plt.show()

Let's practice!
Building recommender systems using NMF
UNSUPERVISED LEARNING IN PYTHON

Benjamin Wilson
Director of Research at lateral.io
Finding similar articles
Engineer at a large online newspaper

Task: recommend articles similar to the article being read by a customer

Similar articles should have similar topics

Strategy
Apply NMF to the word-frequency array

NMF feature values describe the topics

... so similar documents have similar NMF feature values

Compare NMF feature values?

Apply NMF to the word-frequency array
articles is a word frequency array

from sklearn.decomposition import NMF


nmf = NMF(n_components=6)
nmf_features = nmf.fit_transform(articles)


Versions of articles
Different versions of the same document have the same topic proportions

... exact feature values may be different!

E.g. because one version uses many meaningless words

But all versions lie on the same line through the origin


Cosine similarity
Uses the angle between the lines

Higher values mean more similar

Maximum value is 1, when angle is 0 degrees
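
A minimal numeric check (with made-up feature values) that two vectors pointing in the same direction have cosine similarity 1:

import numpy as np

a = np.array([0.15, 0.12])
b = np.array([0.30, 0.24])     # a scaled version of a: same direction, angle 0
cosine = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cosine)                  # 1.0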

Calculating the cosine similarities
from sklearn.preprocessing import normalize
norm_features = normalize(nmf_features)
# if the current article has index 23
current_article = norm_features[23,:]
similarities = norm_features.dot(current_article)
print(similarities)

[ 0.7150569 0.26349967 ..., 0.20323616 0.05047817]

DataFrames and labels
Label similarities with the article titles, using a DataFrame

Titles given as a list: titles

import pandas as pd
from sklearn.preprocessing import normalize
norm_features = normalize(nmf_features)
df = pd.DataFrame(norm_features, index=titles)
current_article = df.loc['Dog bites man']
similarities = df.dot(current_article)

DataFrames and labels
print(similarities.nlargest())

Dog bites man                     1.000000
Hound mauls cat                   0.979946
Pets go wild!                     0.979708
Dachshunds are dangerous          0.949641
Our streets are no longer safe    0.900474
dtype: float64

Let's practice!
Final thoughts
UNSUPERVISED LEARNING IN PYTHON

Benjamin Wilson
Director of Research at lateral.io
Congratulations!