Unit - IV
Recommendation Systems
OUTLINE
• Recommendation Engine
• Dimensionality Reduction
• Exercise
Introduction
• Recommendation engines, also called recommendation systems, are the typical
data product and a good starting point when you’re explaining to non–data
scientists what you do or what data science really is.
• Example—What movie would you like, knowing other movies you liked?
What book would you like, keeping in mind past purchases? What kind
of vacation are you likely to embark on, considering past trips?
• You can represent the above scenario as a bipartite graph (shown in the figure below): each
user and each item has a node to represent it, and there is a line from a user to an item if
that user has expressed an opinion about that item.
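As an illustration, here is a minimal sketch of such a bipartite opinion graph, assuming Python with the networkx library; the users, items, and ratings are made up:

```python
# Minimal sketch of the user-item bipartite graph (networkx).
# Users, items, and ratings below are made up for illustration.
import networkx as nx

G = nx.Graph()
users = ["alice", "bob"]
items = ["Mad Men", "Bob Dylan"]
G.add_nodes_from(users, bipartite=0)   # user-side nodes
G.add_nodes_from(items, bipartite=1)   # item-side nodes

# An edge means the user expressed an opinion about the item;
# the weight is the rating, which can be positive or negative.
G.add_edge("alice", "Mad Men", weight=5)
G.add_edge("alice", "Bob Dylan", weight=-1)
G.add_edge("bob", "Mad Men", weight=3)

print(G["alice"]["Mad Men"]["weight"])   # -> 5
```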
Contd…
• Note they might not always love that item, so the edges could have
weights: they could be positive, negative, or on a continuous scale (or
discontinuous, but many-valued like a star system).
• Example :
Contd…
• Next, you have training data in the form of some preferences, i.e., you
know some of the opinions of some of the users on some of the items.
• From that training data, you want to predict other preferences for
your users. That’s essentially the output of a recommendation engine.
• You may also have metadata on users (i.e., they are male or female, etc.) or
on items (the color of the product).
• For example, users come to your website and set up accounts, so you may know each
user’s age, gender, and other attributes.
• The curse of dimensionality
There are too many dimensions, so the closest neighbors are too far
away from each other to realistically be considered “close.”
• Overfitting
One user is closest, but that could be pure noise. How do you adjust for
that? One idea is to use kNN with, say, k = 5 rather than k = 1, which
averages over more neighbors and reduces the effect of noise. (For an optimal
solution, choose a proper value for k; see the sketch below.)
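A minimal sketch of that tuning step, assuming Python with scikit-learn and purely synthetic data: k = 1 reacts to a single (possibly noisy) nearest neighbor, larger k averages over several neighbors, and a “proper” k is picked by checking held-out accuracy.

```python
# Minimal sketch: choosing k for k-NN on held-out data (scikit-learn, synthetic data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                  # made-up user attributes
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)     # noisy "likes the item" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for k in (1, 5, 15):
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"k={k}: held-out accuracy {acc:.2f}")
```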
Contd…
• Correlated features
Moreover, there are many features that are highly correlated (inter-linked) with
each other.
For example, you might imagine that as you get older you become more conservative. But
then counting both age and politics would mean you’re double counting a single feature in
some sense.
This leads to bad performance, because you’re using redundant information and
essentially placing double the weight on some variables. It’s preferable to build in an
understanding of the correlation and project onto a smaller-dimensional space.
Contd…
• Relative importance of features
Some features are more informative than others. Weighting features may therefore
be helpful: maybe your age has nothing to do with your preference for item 1. You’d
probably use something like covariances to choose your weights.
• Sparseness
If your vector (or matrix, if you put together the vectors) is too sparse (e.g., many
entries in the vector or matrix are 0s), or you have lots of missing data, then most
things are unknown, and the Jaccard distance means nothing because there’s no
overlap (see the sketch below).
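A small sketch in plain Python (made-up binary preference vectors) of why sparsity hurts the Jaccard distance: when two users’ rated items barely overlap, every pair looks maximally distant.

```python
# Jaccard distance on sparse binary preference vectors (made-up data).
def jaccard_distance(a, b):
    a_ones = {i for i, v in enumerate(a) if v}
    b_ones = {i for i, v in enumerate(b) if v}
    union = a_ones | b_ones
    if not union:                      # neither user rated anything
        return 1.0
    return 1.0 - len(a_ones & b_ones) / len(union)

user1 = [1, 0, 0, 0, 0, 0, 0, 0]       # rated only item 0
user2 = [0, 0, 0, 0, 0, 1, 0, 0]       # rated only item 5
print(jaccard_distance(user1, user2))  # -> 1.0: no overlap, so the distance says nothing useful
```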
Contd…
• Measurement errors
There’s measurement error (also called reporting error): people may lie (e.g., when providing the data).
• Computational complexity
There’s a calculation cost—computational complexity
• Cost to update
It’s also expensive to update the model as you add more data.
• The biggest issues are the first two on the list, namely overfitting and the
curse of dimensionality problem.
Beyond Nearest Neighbor: Machine Learning
Classification
• To deal with overfitting and the curse of dimensionality problem, we’ll build a separate
linear regression model for each item.
• With each model, we could then predict for a given user, knowing their attributes,
whether they would like the item corresponding to that model.
• So one model might be for predicting whether you like Mad Men and another model
might be for predicting whether you would like Bob Dylan.
• Denote by f_{i,j} user i’s stated preference for item j if you have it (or user i’s attribute, if item
j is a metadata item like age or is_logged_in).
Contd…
The good news: You know how to estimate the coefficients by linear algebra, optimization, and
statistical inference: specifically, linear regression.
The bad news: This model only works for one item, and to be complete, you’d need to build as many
models as you have items. Moreover, you’re not using other items’ information at all to create the
model for a given item, so you’re not leveraging (using) other pieces of information.
Contd…
• But wait, there’s more good news: This solves the “weighting of the features”
problem we discussed earlier, because linear regression coefficients are
weights. (So that you can know which are more important and which are less
important)
• Crap, more bad news: overfitting is still a problem, and it comes in the form of
having huge coefficients when you don’t have enough data (i.e., not enough
opinions on given items).
Contd…
• To solve the overfitting problem, you impose a Bayesian prior that
these weights shouldn’t be too far out of whack; this is done by adding a
penalty term, with strength λ, for large coefficients.
• But that begs the question: how do you choose λ? You could do it
experimentally: use some data as your training set, evaluate how well
you did using particular values of λ, and adjust (see the sketch below).
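A minimal sketch of that experiment, assuming Python with scikit-learn (where ridge regression’s alpha plays the role of λ) and synthetic data with few opinions and many features:

```python
# Minimal sketch: penalizing large coefficients and choosing lambda on held-out data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 20))                  # few opinions, many features
y = X[:, 0] + 0.1 * rng.normal(size=60)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
for lam in (0.01, 1.0, 100.0, 1e6):
    model = Ridge(alpha=lam).fit(X_tr, y_tr)
    print(f"lambda={lam:>8}: validation R^2 = {model.score(X_val, y_val):.2f}, "
          f"largest |coefficient| = {np.abs(model.coef_).max():.3f}")
# A huge lambda shrinks every coefficient toward zero -- effectively "no model at all".
```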
Contd…
• A final problem with this prior stuff: although the problem will have a
unique solution (i.e., the penalized objective will have a unique minimum) if you
make λ large enough, by that time you may not be solving the problem
you care about.
• i.e., if you make λ absolutely huge, then the coefficients will all go to
zero and you’ll have no model at all.
The Dimensionality Problem
• We’ve tackled the overfitting problem (previous slides), so now let’s
think about overdimensionality, i.e., the idea that you might have tens
of thousands of items.
• For example, people invent concepts like “coolness,” but we can’t directly measure how cool
someone is. Other people exhibit different patterns of behavior, which we internally map or reduce
to our one dimension of “coolness.”
• So coolness is an example of a latent feature in that it’s unobserved and not measurable directly,
and we could think of it as reducing dimensions because perhaps it’s a combination of many
“features” we’ve observed about the person and implicitly weighted in our mind.
• Two things are happening here: the dimensionality is reduced to a single feature, and that
feature is latent.
Contd…
• But in this algorithm, we don’t decide which latent factors to care about. Instead we let
the machines do the work of figuring out what the important latent features are.
• “Important” in this context means they explain the variance in the answers to the
various questions—in other words, they model the answers efficiently
• Our goal is to build a model that has a representation in a low dimensional subspace
that gathers “taste information” to generate recommendations.
Given an m×n matrix X of rank k, it is a theorem from linear algebra that we can always
decompose it into the product of three matrices as follows:
X = U S V
where U is m×k, S is k×k, and V is k×n, the columns of U and V are pairwise orthogonal, and
S is diagonal. Note that the standard statement of SVD is slightly more involved: it has U and V
both square unitary matrices and the middle “diagonal” matrix rectangular. We’ll be
using this form, because we’re going to be taking approximations to X of increasingly smaller
rank.
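A minimal numerical sketch with numpy (a small random ratings matrix, made up for illustration); numpy’s Vt plays the role of the k×n matrix V above:

```python
# Minimal sketch of the factorization X = U S V with numpy.
import numpy as np

rng = np.random.default_rng(3)
X = rng.integers(0, 6, size=(6, 4)).astype(float)    # 6 users x 4 items, made-up ratings

U, s, Vt = np.linalg.svd(X, full_matrices=False)     # s holds the singular values, largest first
S = np.diag(s)
print(U.shape, S.shape, Vt.shape)                    # (6, 4) (4, 4) (4, 4)
print(np.allclose(X, U @ S @ Vt))                    # True: the product reconstructs X exactly
```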
Contd…
• Let’s apply the preceding matrix decomposition to our situation. X is our original dataset,
which has users’ ratings of items. We have m users, n items, and k would be the rank of X,
and consequently would also be an upper bound on the number d of latent variables we
decide to care about—note we choose d whereas m, n, and k are defined through our training
dataset. So just like in k-NN, where k is a tuning parameter (different k entirely—not trying to
confuse you!), in this case, d is the tuning parameter.
• Each row of U corresponds to a user, whereas each column of V corresponds to an item. The values along the
diagonal of the square matrix S are called the “singular values.” They measure the importance
of each latent variable: the most important latent variable has the biggest singular value.
YouTube URLs for SVD
• https://youtu.be/EokL7E6o1AE
• https://youtu.be/P5mlg91as1c
Important Properties of SVD
• Because the columns of U and V are orthogonal to each other, you can order the columns by singular
values via a base change operation. That way, if you put the columns in decreasing order of their
corresponding singular values (which you do), then the dimensions are ordered by importance from
highest to lowest. You can take lower rank approximation of X by throwing away part of S. In other
words, replace S by a submatrix taken from the upper-left corner of S.
• Of course, if you cut off part of S you’d have to simultaneously cut off part of U and part of V, but this
is OK because you’re cutting off the least important vectors. This is essentially how you choose the
number of latent variables d—you no longer have the original matrix X anymore, only an
approximation of it, because d is typically much smaller than k, but it’s still pretty close to X .
• https://www.youtube.com/watch?v=yLdOS6xyM_Q
• https://youtu.be/FgakZw6K1QQ
• https://www.youtube.com/watch?v=0Jp4gsfOLMs
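Continuing the numpy sketch from before: throwing away all but the d largest singular values (the upper-left corner of S), together with the matching columns of U and rows of Vt, gives the rank-d approximation; d = 2 here is an illustrative choice.

```python
# Minimal sketch: rank-d approximation by truncating the SVD (numpy, made-up data).
import numpy as np

rng = np.random.default_rng(3)
X = rng.integers(0, 6, size=(6, 4)).astype(float)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

d = 2                                            # tuning parameter: number of latent variables kept
X_d = U[:, :d] @ np.diag(s[:d]) @ Vt[:d, :]      # keep only the d most important latent variables
print(np.round(np.linalg.norm(X - X_d), 3))      # small residual: X_d is still close to X
```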
Theorem: The resulting latent features will be uncorrelated
• A nice aspect of these latent features is that they’re uncorrelated.
Here’s a sketch of the proof:
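A minimal version of the argument, written in the notation of the previous slides (X = USV with U of size m×k, S a k×k diagonal matrix, V of size k×n) and assuming the data have been mean-centered:

```latex
% Sketch, assuming mean-centered data and X = U S V with
% U^T U = I_k (orthonormal columns of U), V V^T = I_k, and S diagonal.
\[
X = U S V, \qquad U^{\top} U = I_k, \qquad V V^{\top} = I_k, \qquad S \ \text{diagonal}.
\]
% The user-side latent features are the columns of US (one per latent variable):
\[
(US)^{\top}(US) = S\, U^{\top} U\, S = S^{2},
\]
% which is diagonal, so distinct latent features have zero inner product; for
% mean-centered data this is zero covariance, i.e. the latent features are uncorrelated.
```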
Alternating Least Squares
Exercise: Build Your Own Recommendation System
Social Network
A collection of entities
– Typically people, but could be something else too
At least one relationship between entities of the network
– For example: friends
– Sometimes boolean: two people are either friends or they are not
– May have a degree
– Discrete degree: friends, family, acquaintances, or none
– Degree – real number: the fraction of the average day that two people spend talking to each
other
An assumption of nonrandomness or locality
– Hard to formalize
– Intuition: that relationships tend to cluster
– If entity A is related to both B and C, then the probability that B and C are related is higher than
average (random)
Social Network as a Graph
[Figure: a graph with nodes A–G and a boolean (friends) relationship as edges.]
Edges could be weighted by the number of times phone calls were made, or total time of
conversation
Types of Social (or Professional) Networks
AB is an edge if A and B sent mails to each other within the last week, or month, or ever
– One-directional edges would allow spammers to have edges; requiring mail in both directions avoids this
Edges could be weighted
Other networks: collaboration network – authors of papers, with an edge if they have jointly written a paper
These networks also exhibit the locality property
Clustering of Social Network Graphs
Locality property ⇒ there are clusters
Clusters are communities
– People of the same institute, or company
– People in a photography club
– Set of people with “Something in common” between them
Need to define a distance between points (nodes)
In graphs with weighted edges, different distances exist
For graphs with “friends” or “not friends” relationship
– Distance is 0 (friends) or 1 (not friends)
– Or 1 (friends) and infinity (not friends)
– Both of these violate the triangle inequality
– Fix triangle inequality: distance = 1 (friends) and 1.5 or 2 (not friends) or length of
shortest path
Traditional Clustering
[Figure: a traditional clustering of the example graph with nodes A–G.]
Betweenness
Betweenness of an edge AB: the number of pairs of nodes (X, Y) such that AB lies on the shortest
path between X and Y
– There can be more than one shortest path between X and Y
– In that case, credit AB with the fraction of those paths that include the edge AB
High score of betweenness means?
– The edge runs “between” two communities
Betweenness gives a better measure
– Edges such as BD get a higher score than edges such as AB
Not a distance measure, may not satisfy triangle inequality. Doesn’t matter!
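A minimal sketch of the betweenness computation, assuming Python with networkx and a made-up two-community friendship graph; networkx’s routine implements the same “how many shortest paths use this edge” idea:

```python
# Minimal sketch: edge betweenness on a made-up graph with two communities (networkx).
import networkx as nx

G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"),          # one community
              ("B", "D"),                                  # bridge between communities
              ("D", "E"), ("D", "F"), ("D", "G"),
              ("E", "F"), ("E", "G"), ("F", "G")])         # another community

eb = nx.edge_betweenness_centrality(G, normalized=False)
for edge, score in sorted(eb.items(), key=lambda kv: -kv[1]):
    print(edge, round(score, 2))
# The bridge (B, D) gets the highest score: it runs "between" the two communities.
```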
The Girvan – Newman Algorithm
Calculate betweenness of edges:
Step 1 – BFS: Start at a node X and perform a BFS with X as root
Observe: level of node Y = length of shortest path from X to Y
Edges between levels are called “DAG” edges
– Each DAG edge is part of at least one shortest path from X
Step 2 – Labeling: Label each node Y by the number of shortest paths from X to Y
[Figure: BFS from root E; D and F at Level 1, B and G at Level 2, A and C at Level 3; each node is labeled with its number of shortest paths from E (G is labeled 2, the other nodes 1).]
The Girvan – Newman Algorithm
Step 3 – Credit sharing:
Each leaf node gets credit 1
Each non-leaf node gets credit 1 + sum(credits of the DAG edges to the level below)
Credit of DAG edges: let Yi (i = 1, …, k) be the parents of Z and pi = label(Yi); then
credit(Yi, Z) = credit(Z) · pi / (p1 + … + pk)
Intuition: a DAG edge (Yi, Z) gets a share of the credit of Z proportional to the number of shortest paths from X to Z going through Yi
Finally: Repeat Steps 1, 2 and 3 with each node as root. For each edge, betweenness = (sum of credits obtained in all iterations) / 2
[Figure: credit values on the example BFS tree rooted at E, with leaf credits of 1 and edge credits such as 4.5, 3, 1.5, 1, and 0.5.]
Computation in practice
Complexity: n nodes, e edges
– BFS starting at each node: O(e)
– Do it for n nodes: O(ne) in total
– Very expensive
Method in practice
– Choose a random subset W of
the nodes
– Compute credit of each edge
starting at each node in W
– Sum and compute betweenness
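A sketch of the same sampling idea with networkx, which can estimate betweenness from a random subset of source nodes via its k argument (shown here on a small built-in example graph, with the exact call for comparison):

```python
# Minimal sketch: exact vs. sampled edge betweenness (networkx).
import networkx as nx

G = nx.karate_club_graph()                                   # small built-in example graph

exact = nx.edge_betweenness_centrality(G)                    # BFS from every node: O(n*e)
approx = nx.edge_betweenness_centrality(G, k=10, seed=42)    # only 10 randomly sampled roots

edge = max(exact, key=exact.get)
print(edge, round(exact[edge], 3), round(approx[edge], 3))   # exact vs. sampled score for the top edge
```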
Finding Communities using Betweenness
Method 1:
Keep adding edges (among existing ones) starting from lowest
betweenness
Gradually join small components to build large connected components
Finding Communities using Betweenness
Method 2:
Start from all existing edges. The graph may look like one big component.
Keep removing edges starting from highest betweenness
Gradually split large components to arrive at communities
At some point, removing the edge with highest betweenness would split
the graph into separate components
Finding Communities using Betweenness
For a fixed threshold of betweenness, both methods would
ultimately produce the same clustering
However, a suitable threshold is not known beforehand
Method 1 vs Method 2
– Method 2 is likely to take fewer operations. Why?
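A minimal sketch of Method 2 using networkx’s built-in Girvan–Newman routine (on the small built-in karate-club graph); it repeatedly removes the highest-betweenness edge and yields the resulting splits from coarsest to finest:

```python
# Minimal sketch: community detection by removing high-betweenness edges (networkx).
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()                    # standard small example graph
splits = girvan_newman(G)                     # generator of successive splits
first_split = next(splits)                    # communities after the first split
print([sorted(c) for c in first_split])
```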
• In this section, we shall see a technique for discovering communities directly by looking
for subsets of the nodes that have a relatively large number of edges among them.
Finding Cliques
• Our first thought about how we could find sets of nodes with many
edges between them is to start by finding a large clique (a set of nodes
with edges between any two of them).
Partitioning of Graphs
• Given a graph, we would like to divide the nodes into two sets so that
the cut, or set of edges that connect nodes in different sets, is
minimized.
• Suppose we partition the nodes of a graph into two disjoint sets S and T.
Let Cut(S, T) be the number of edges that connect a node in S to a node in T,
and let Vol(S) be the number of edges with at least one end in S.
• The normalized cut for the partition (S, T) is Cut(S, T)/Vol(S) + Cut(S, T)/Vol(T).
• Now, consider the preferred cut for the example graph on nodes A–H, consisting of the edges (B,D) and (C,G).
• Then S = {A,B,C,H} and T = {D,E, F,G}. Cut(S, T) = 2, Vol(S) = 6, and Vol(T) = 7.
• The normalized cut for this partition is thus only 2/6 + 2/7 ≈ 0.62 (a small computational sketch follows).
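A small sketch of these quantities in Python with networkx; the graph below is an illustrative stand-in (not necessarily the figure from the slides) chosen so that the partition reproduces the numbers above:

```python
# Minimal sketch: cut, volume, and normalized cut (networkx, illustrative graph).
import networkx as nx

G = nx.Graph([("A", "B"), ("B", "C"), ("A", "C"), ("A", "H"),               # edges inside S
              ("D", "E"), ("D", "F"), ("E", "F"), ("F", "G"), ("E", "G"),   # edges inside T
              ("B", "D"), ("C", "G")])                                      # the two cut edges

S, T = {"A", "B", "C", "H"}, {"D", "E", "F", "G"}

def cut(G, S, T):
    """Number of edges with one end in S and the other in T."""
    return sum(1 for u, v in G.edges() if (u in S and v in T) or (u in T and v in S))

def vol(G, S):
    """Number of edges with at least one end in S."""
    return sum(1 for u, v in G.edges() if u in S or v in S)

c = cut(G, S, T)
print(c, vol(G, S), vol(G, T))                    # 2 6 7
print(round(c / vol(G, S) + c / vol(G, T), 2))    # 0.62
```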
Some Matrices that describe Graphs
• To develop the theory of how matrix algebra can help us find good
graph partitions, we first need to learn about three different matrices
that describe aspects of a graph: the adjacency matrix A, the degree matrix D,
and the Laplacian matrix L.
• L = D − A (the Laplacian matrix is the degree matrix minus the adjacency matrix)
Eigenvalues of the Laplacian Matrix
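A minimal sketch, assuming Python with numpy and networkx and a small made-up graph, of forming L = D − A and looking at its eigenvalues (the smallest eigenvalue of a graph Laplacian is always 0, and the eigenvector of the second-smallest eigenvalue is the one commonly used to split the nodes into two sets):

```python
# Minimal sketch: the Laplacian L = D - A and its eigenvalues (numpy + networkx).
import numpy as np
import networkx as nx

# Two triangles joined by a single bridge edge (made-up example).
G = nx.Graph([("A", "B"), ("B", "C"), ("A", "C"),
              ("C", "D"),
              ("D", "E"), ("E", "F"), ("D", "F")])

A = nx.to_numpy_array(G)            # adjacency matrix
D = np.diag(A.sum(axis=1))          # degree matrix
L = D - A                           # Laplacian matrix

eigvals, eigvecs = np.linalg.eigh(L)   # L is symmetric, so eigh is appropriate
print(np.round(eigvals, 3))            # the smallest eigenvalue is 0
```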