Narang 2013
demonstrate increasing oscillatory behavior as the magnitude of the graph frequency increases (see [13] for details). The graph Fourier transform (GFT) of a signal $f$ is defined as its projection onto the eigenvectors of the graph, i.e., $\tilde{f}(\lambda_i) = \langle f, u_i \rangle$, or in matrix form $\tilde{f} = U^t f$.
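As a minimal illustration of this definition, the sketch below builds the normalized Laplacian $L = I - D^{-1/2} W D^{-1/2}$ of a small weighted graph, takes its eigendecomposition and computes $\tilde{f} = U^t f$. It assumes NumPy, a symmetric weight matrix `W` with no isolated nodes, and the normalized Laplacian (the form used in Proposition 2); the function names `normalized_laplacian` and `gft` are hypothetical.

```python
import numpy as np

def normalized_laplacian(W):
    """Normalized Laplacian L = I - D^{-1/2} W D^{-1/2} (assumes no isolated nodes)."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    return np.eye(len(d)) - W * np.outer(s, s)

def gft(W, f):
    """Return graph frequencies, eigenvectors (as columns) and GFT coefficients of f."""
    L = normalized_laplacian(W)
    lam, U = np.linalg.eigh(L)  # ascending eigenvalues, orthonormal eigenvectors
    f_tilde = U.T @ f           # f_tilde[i] = <f, u_i>
    return lam, U, f_tilde

# Example: a 4-node path graph 0-1-2-3 with unit weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
f = np.array([1.0, 2.0, 3.0, 4.0])
lam, U, f_tilde = gft(W, f)
f_back = U @ f_tilde            # inverse GFT recovers f since U is orthonormal
```

For this path graph the graph frequencies are 0, 0.5, 1.5 and 2, and the eigenvector for frequency 0 is proportional to $D^{1/2}\mathbf{1}$ rather than constant, because the end nodes have degree 1 and the inner nodes degree 2.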
Fig. 1: An instance of predicting ratings of an unknown movie node (in red) using ratings of a known set of movie nodes (in blue) in the MovieLens 100k dataset: (a) star graph commonly used in kNN [8] prediction methods, which ignores all the links between known movie nodes; (b) alternative graph that contains the star graph and all the links between movies in the known set.

2.2. Sampling Theory for Graph Signals

We start by revisiting the theory of downsampling graph signals in the recent work by Pesenson [11], as well as its links to our prior work in [12]. A signal is said to be bandlimited to the graph frequency band $[0, \omega)$ on a graph $G$ if its GFT has support only at frequencies $[0, \omega)$. The space of $\omega$-bandlimited signals is called the Paley-Wiener space and is denoted by $PW_\omega(G)$. It is easy to prove that if $f \in PW_\omega(G)$, then:
This result is also consistent with the result in our prior work on bipartite graphs in [12, 14]. Thus, interpolation on a graph can be posed as the problem of first defining the set of nodes with known sample values as a uniqueness set $S$, then identifying the maximum $\omega$ such that $S$ is a uniqueness set for signals in $PW_\omega(G)$, and then reconstructing the signal values on the complement set $S^c$ using (3).

While Pesenson's work proves the existence of such an $\omega$, it does not provide a method to compute it, and only considers how to interpolate unknown values (in $S^c$) when the signal is $\omega$-bandlimited. In the next section, we provide a result to compute the maximum $\omega$ given the known set $S$, as well as a method for interpolating non-bandlimited signals.

3. PROPOSED INTERPOLATION

To make practical use of Pesenson's result in Theorem 1, we first present a result to compute the maximum $\omega$ such that any signal in $PW_\omega(G)$ can be reconstructed from a subset of known samples in $S$ on an arbitrary graph $G$.

Proposition 2 Given a graph $G$ with normalized Laplacian matrix $L$, known set $S$ and unknown set $S^c$, let $(L^2)_{S^c}$ be the submatrix of $L^2$ containing only the rows and columns corresponding to the unknown set $S^c$. Then the known set $S$ is a uniqueness set for all signals $f \in PW_{\omega_S^*}(G)$ with $\omega_S^* = \sigma_{\min}$, where $\sigma_{\min}^2$ is the smallest singular value of $(L^2)_{S^c}$.

Proof Referring to Definition 2, let $\phi$ be a signal in $L_2(S^c)$, i.e., $\phi = [0(S)^t \; \phi(S^c)^t]^t$. We have

$$\frac{\|L\phi\|^2}{\|\phi\|^2} = \frac{\langle \phi(S^c), (L^2)_{S^c}\, \phi(S^c) \rangle}{\|\phi(S^c)\|^2} = R\big((L^2)_{S^c}\big), \tag{4}$$

where $R(\cdot)$ is the Rayleigh quotient of a matrix. The Rayleigh quotient $R((L^2)_{S^c})$ is always greater than or equal to the minimum eigenvalue $\sigma_{\min}^2$ of the matrix $(L^2)_{S^c}$ (since this matrix is symmetric positive semidefinite, its eigenvalues and singular values coincide). Thus,

$$\frac{\|L\phi\|^2}{\|\phi\|^2} \ge \sigma_{\min}^2 \;\Rightarrow\; \|\phi\| \le \frac{1}{\sigma_{\min}}\, \|L\phi\|. \tag{5}$$

Thus, $S^c$ is a $\Lambda$-set with $\Lambda = 1/\sigma_{\min}$. By Theorem 1, $S$ is a uniqueness set for all signals $f \in PW_{\omega_S^*}(G)$ with $\omega_S^* = \sigma_{\min}$.

Note that $\omega_S^*$ computed above is the maximum possible value that satisfies the sufficient conditions in Theorem 1. We term $\omega_S^*$ the cut-off frequency for reconstruction, since any graph signal below this frequency can be perfectly reconstructed from its DU signal on $S$.

3.1. Interpolation method

Given the cut-off frequency $\omega_S^*$ computed from Proposition 2, any graph signal is reconstructed by least-squares projection of the corresponding DU signal onto the $PW_{\omega_S^*}(G)$ space. Let $K^*$ be the number of eigenvalues of the Laplacian matrix $L$ that are less than $\omega_S^*$. Define $U_{K^*}$ as the matrix containing the first $K^*$ eigenvectors, and $(U_{K^*})_S$ as the submatrix of $U_{K^*}$ containing the rows corresponding to set $S$. The first eigenvector of a connected graph is $u_1 = D^{1/2}\mathbf{1}$ (up to normalization), which is not constant for graphs with irregular degrees. Therefore, for piecewise constant signals, a common practice is to find an LS approximation of the function $g = D^{1/2} f$. Thus, we want to compute an interpolated signal $\hat{g}$ such that $(\hat{g})_S = (g)_S$ and

$$\hat{g} = \sum_{k=1}^{K^*} x(k)\, u_{\lambda_k} = U_{K^*}\, x, \tag{6}$$

where $x(k) = \tilde{g}(\lambda_k)$ is the $k$th GFT coefficient of $g$, and $x = [x_1, x_2, \ldots, x_{K^*}]^t$. Comparing only the known set $S$ on both sides of (6), we obtain a linear system of equations: $g(S) = (U_{K^*})_S\, x$. Theorem 1 ensures that $(U_{K^*})_S$ is a stable frame operator, and the solution can be found by computing the pseudo-inverse of $(U_{K^*})_S$. Thus, the interpolated graph signal on the unknown set $S^c$ is given by

$$\hat{g}(S^c) = (U_{K^*})_{S^c} \left( (U_{K^*})_S^t\, (U_{K^*})_S \right)^{-1} (U_{K^*})_S^t\, g(S). \tag{7}$$

Finally, the interpolated signal is computed as $\hat{f} = D^{-1/2}\hat{g}$.

Note that while other global methods such as [9, 10] also propose similar least-squares reconstruction solutions, the choice of the number of eigenvectors $K$ in these methods is heuristic. In our proposed method, $K^*$ is chosen specifically to be the number of eigenvalues below the cut-off frequency $\omega_S^*$ given in Proposition 2, which depends on the known set of nodes and on the topology of the graph. The optimality of choosing $\omega_S^*$ as the cut-off frequency can be justified as follows.

Let $\omega_S$ be the chosen cut-off frequency. If the signal $f \in PW_{\omega_S}(G)$, then the proposed reconstruction is perfect (lossless), hence optimal. For $f \notin PW_{\omega_S}(G)$, our proposed method still provides a stable least-squares solution. Solutions with $\omega_S < \omega_S^*$ are clearly suboptimal in this case, since the solution space for $\omega_S$ is contained in the solution space for $\omega_S^*$. For $\omega_S > \omega_S^*$, the reconstruction may sometimes produce less error, but it is not guaranteed to be stable (i.e., the corresponding matrix $(U_K)_S$ with $K > K^*$ might not be a frame). In Section 4, we show that choosing $\omega_S > \omega_S^*$ leads to poorer results.

3.2. Bilateral Link-weight Adjustment

Unlike regular signals, the smoothness of a graph signal depends both on the signal values and on the underlying graph. This raises the question of whether we can modify the graph to adapt to the given signal so that the signal is more bandlimited on the simplified graph, thus leading to less interpolation error. The simplification makes sense in many cases, such as recommendation systems, where the underlying graph is the result of observing average correlation over a set of training users (multiple instances), and the signal corresponds to a single test user. We take inspiration from image processing, where this kind of signal-adaptive filtering is achieved by bilateral filters [15].

In our proposed method we use $g = D^{1/2} f$ as the signal to be interpolated. From (1), we observe that $g$ can be made more bandlimited by minimizing $\|Lg\|$. Define an error function:

$$\zeta = Lg = \left(I - D^{-1/2} W D^{-1/2}\right) g = D^{1/2}\left(I - D^{-1} W\right) f. \tag{8}$$

Clearly, minimizing $\zeta$ at each node minimizes $\|Lg\|$. The value of $\zeta$ at node $i$ can be written as

$$\zeta(i) = \sqrt{d_i}\left( f(i) - \frac{1}{d_i} \sum_j w_{ij}\, f(j) \right), \tag{9}$$

which is proportional to the difference between $f(i)$ and the weighted average (the weights being the link weights) of the values at the nodes directly connected to node $i$.
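The equivalence between the matrix form $\zeta = Lg$ in (8) and the node-wise form (9) can be checked numerically. The sketch below assumes NumPy, the normalized Laplacian $L = I - D^{-1/2} W D^{-1/2}$, and a symmetric weight matrix with positive degrees; the helper names `zeta_matrix` and `zeta_nodewise` are hypothetical. It also shows that a constant $f$ gives $\zeta = 0$, consistent with $D^{1/2}\mathbf{1}$ being the lowest-frequency eigenvector.

```python
import numpy as np

def zeta_matrix(W, f):
    """zeta = L g with g = D^{1/2} f and L = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    s = np.sqrt(d)
    L = np.eye(len(d)) - W / np.outer(s, s)
    return L @ (s * f)

def zeta_nodewise(W, f):
    """zeta(i) = sqrt(d_i) * (f(i) - (1/d_i) * sum_j w_ij f(j))."""
    d = W.sum(axis=1)
    neighbor_avg = (W @ f) / d   # link-weighted average of neighboring values
    return np.sqrt(d) * (f - neighbor_avg)

rng = np.random.default_rng(0)
A = rng.random((6, 6))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)         # symmetric, positive weights, no self-loops
f = rng.random(6)
print(np.allclose(zeta_matrix(W, f), zeta_nodewise(W, f)))  # True
```

Reading (9) this way makes the next step concrete: $\zeta(i)$ is large exactly when $f(i)$ differs from its weighted neighbors, so down-weighting links between dissimilar values reduces it.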
Thus, by adapting the weights $w_{ij}$ to be inversely proportional to the absolute difference $|f(i) - f(j)|$ at every node, we can minimize the error $\zeta$, and hence $\|Lg\|$. Following an intuition similar to that of bilateral filters, we modify the weights between nodes in $S$ as

$$\hat{w}_{ij} = w_{ij} \cdot \exp\left( -\frac{|f(i) - f(j)|^2}{\theta^2} \right) \quad \forall\, i, j \in S, \tag{10}$$

where the parameter $\theta$ is chosen to be the mean rating over all training samples. Note that in (10) we can only change the weights between two known nodes. Our proposed interpolation algorithm is given in Algorithm 1. The complexity of our method is $O(K^*|V|^2)$, primarily due to the least-squares projection steps (5–7) in Algorithm 1.

Algorithm 1 Proposed Graph Interpolation Method
Require: $G_0 = (V, E)$: initial graph, $f$: DU signal
1: Compute the normalized Laplacian matrix $L$.
2: Compute $\omega_S^*$ as the square root of the smallest eigenvalue of $(L^2)_{S^c}$.
3: Modify the known nodes' link weights as bilateral weights using (10).
4: Recompute the normalized Laplacian matrix $L$.
5: Compute the $K^*$ eigenvectors of $L$ corresponding to eigenvalues $\lambda < \omega_S^*$.
6: Compute $g = D^{1/2} f$.
7: Compute $\hat{g}$: $(\hat{g})_S = (g)_S$ and $(\hat{g})_{S^c}$ using (7).
8: Interpolated signal $\hat{f} = D^{-1/2}\hat{g}$.

4. EXPERIMENTS

We apply the proposed interpolation method to collaborative filtering in recommendation systems. The input in this problem is a partially observed user-item rating matrix $R$, such that $R(u, m)$ is the rating given by user $u$ to movie $m$. Based on this information, the system predicts new user-movie ratings. For empirical evaluation, we choose the MovieLens 100k [16] dataset containing 100k user-movie-rating triplets from $N = 943$ users and $M = 1682$ movies. The ratings are integer values between 1 and 5. We use the 5-fold

[Figure: cumulative RMSE vs. number of training samples (0 to 700) for LS projection (K = K* + 10), kNN (k = 30), PMF, Proposed method (K = K*), and Proposed method with bilateral weights.]

with or without bilateral weighting are very close to the PMF method, with the bilateral-weighted interpolation performing slightly better. We also observed that choosing $K = K^* + 10$ in the LS method leads to poorer results. Figure 2(b) shows another RMSE plot where users are grouped by the number of training samples, with the x-axis showing those groups. We observe that both the LS method and the kNN method perform significantly worse when the number of available training samples is small. The effect of applying bilateral weighting in our proposed methods is also most visible here.

The PMF method predicts the ratings of all movies for all users simultaneously by factorizing the whole $N \times M$ rating matrix. It is based on an iterative update rule and requires $O(NMP)$ operations per iteration, where $P$ is the size of the latent space. In principle, any change to the rating matrix would require the PMF system to be retrained on all users. In our method, however, the process of computing the movie graph is decoupled from the process of predicting ratings for a given user. Accommodating a few new ratings into the system is fast, as it only affects a local portion of the movie graph. Once the movie graph is fixed, the proposed method predicts the ratings of movies for each user separately in $O(K^*M^2)$ operations. Thus, assuming $K^* \approx P$ and $M^2 < NM$ (i.e., fewer items than users), the proposed method is faster than PMF when ratings of items change frequently and recommendations need to be calculated only when a user requests them. Further, it may be possible to reduce the complexity of our method by using simplified filtering operations.
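The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal dense-matrix sketch under stated assumptions, not the authors' implementation: it assumes a symmetric weight matrix `W` with positive degrees, the normalized Laplacian $L = I - D^{-1/2} W D^{-1/2}$, and an index array `known` for $S$; all function names are hypothetical. The pseudo-inverse used for the least-squares step equals $((U_{K^*})_S^t (U_{K^*})_S)^{-1}(U_{K^*})_S^t$ whenever $(U_{K^*})_S$ has full column rank, which the uniqueness-set result of Proposition 2 provides. Passing `theta=None` skips the bilateral re-weighting of steps 3–4.

```python
import numpy as np

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2}; also returns the degree vector d."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    return np.eye(len(d)) - W * np.outer(s, s), d

def cutoff_frequency(L, known):
    """Proposition 2 / step 2: sqrt of the smallest eigenvalue of (L^2)_{S^c}."""
    unknown = np.setdiff1d(np.arange(L.shape[0]), known)
    L2_sc = (L @ L)[np.ix_(unknown, unknown)]
    # (L^2)_{S^c} is symmetric PSD, so eigenvalues and singular values coincide.
    return float(np.sqrt(max(np.linalg.eigvalsh(L2_sc)[0], 0.0)))

def bilateral_weights(W, known, f_known, theta):
    """Eq. (10): damp links between two known nodes with dissimilar values."""
    W_hat = np.array(W, dtype=float)               # copy; links touching S^c unchanged
    diff = f_known[:, None] - f_known[None, :]     # f(i) - f(j) for i, j in S
    W_hat[np.ix_(known, known)] *= np.exp(-diff**2 / theta**2)
    return W_hat

def interpolate(W, known, f_known, theta=None):
    """Sketch of Algorithm 1; theta=None skips the bilateral steps 3-4."""
    known = np.asarray(known)
    f_known = np.asarray(f_known, dtype=float)
    n = W.shape[0]
    unknown = np.setdiff1d(np.arange(n), known)
    L, d = normalized_laplacian(W)                       # step 1
    omega = cutoff_frequency(L, known)                   # step 2
    if theta is not None:
        W = bilateral_weights(W, known, f_known, theta)  # step 3
        L, d = normalized_laplacian(W)                   # step 4
    lam, U = np.linalg.eigh(L)                           # ascending eigenvalues
    K = max(int(np.sum(lam < omega)), 1)                 # step 5: K* eigenvectors
    UK = U[:, :K]
    g_S = np.sqrt(d[known]) * f_known                    # step 6: g = D^{1/2} f on S
    x = np.linalg.pinv(UK[known, :]) @ g_S               # LS estimate of GFT coefficients
    g_hat = np.empty(n)
    g_hat[known] = g_S                                   # step 7: (g_hat)_S = (g)_S
    g_hat[unknown] = UK[unknown, :] @ x                  #         (g_hat)_{S^c} as in (7)
    return g_hat / np.sqrt(d), omega, K                  # step 8: f_hat = D^{-1/2} g_hat
```

With `theta=None`, a signal whose $g = D^{1/2} f$ lies in the span of the first $K^*$ eigenvectors is recovered exactly on $S^c$. In the recommendation setting described above, `theta` would be set to the mean rating over all training samples, as stated for (10).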
6. REFERENCES
[1] M. Weber and S. Kube, "Robust Perron cluster analysis for various applications in computational life science," in CompLife, 2005, pp. 57–66.
[2] Z. Huang, W. Chung, T. Ong, and H. Chen, "A graph-based recommender system for digital library," pp. 65–73, 2002.
[3] M. Girvan and M. E. Newman, "Community structure in social and biological networks," Proc. Natl. Acad. Sci. U.S.A., June 2002.
[4] D. I Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "Signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular data domains," to appear in SPM, 2012; also arXiv:1211.0053, Dec. 2012.
[5] R. I. Kondor and J. Lafferty, "Diffusion kernels on graphs and other discrete structures," in Proc. ICML, 2002, pp. 315–322.
[6] S. Hoche, P. Flach, and D. Hardcastle, "A fast method for property prediction in graph-structured data from positive and unlabelled examples," in ECAI 2008, 2008, pp. 162–166.
[7] J. Bennett, S. Lanning, and N. Netflix, "The Netflix prize," in KDD Cup and Workshop in conjunction with KDD, 2007.
[8] J. Chen, H. Fang, and Y. Saad, "Fast approximate kNN graph construction for high dimensional data via recursive Lanczos bisection," J. Mach. Learn. Res., vol. 10, pp. 1989–2012, Dec. 2009.
[9] M. Belkin and P. Niyogi, "Semi-supervised learning on Riemannian manifolds," pp. 209–239, 2004.
[10] L. Grady and E. L. Schwartz, "Anisotropic interpolation on graphs: The combinatorial Dirichlet problem," Tech. Rep., Boston University, 2003.
[11] I. Pesenson, "Sampling in Paley-Wiener spaces on combinatorial graphs," Trans. Amer. Math. Soc., vol. 360, no. 10, pp. 5603–5627, 2008.
[12] S. K. Narang and A. Ortega, "Downsampling graphs using spectral theory," in ICASSP '11, May 2011.
[13] E. B. Davies, G. M. L. Gladwell, J. Leydold, and P. F. Stadler, "Discrete nodal domain theorems," Linear Algebra and its Applications, vol. 336, no. 1–3, pp. 51–60, 2001.
[14] S. K. Narang and A. Ortega, "Perfect reconstruction two-channel wavelet filter-banks for graph structured data," IEEE Trans. on Sig. Proc., vol. 60, no. 6, June 2012.
[15] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Sixth International Conference on Computer Vision, Jan. 1998, pp. 839–846.
[16] "MovieLens dataset," as of 2003, http://www.grouplens.org/.
[17] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-based collaborative filtering recommendation algorithms," in WWW '01, 2001, pp. 285–295.
[18] R. Salakhutdinov and A. Mnih, "Probabilistic matrix factorization," in Advances in Neural Information Processing Systems, 2008, vol. 20.