0% found this document useful (0 votes)
54 views44 pages

Matrix and Graph

The document discusses matrix and graph data structures. It describes how a matrix can be represented as a 2-dimensional array or sparsely using lists. For sparse matrices, it is more efficient to store only the nonzero elements and their indices. Graphs can be represented using an adjacency matrix where cells indicate connections between nodes. The document also covers binary matrices, common matrix operations like addition, and using endmarks to simplify sparse vector algorithms.

Uploaded by

Abhik Dasgupta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
54 views44 pages

Matrix and Graph

The document discusses matrix and graph data structures. It describes how a matrix can be represented as a 2-dimensional array or sparsely using lists. For sparse matrices, it is more efficient to store only the nonzero elements and their indices. Graphs can be represented using an adjacency matrix where cells indicate connections between nodes. The document also covers binary matrices, common matrix operations like addition, and using endmarks to simplify sparse vector algorithms.

Uploaded by

Abhik Dasgupta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 44

Matrix and Graph

• Matrix
• Binary Matrix
• Sparse Matrix
• Operations for Vectors/Matrices
• Graph and Adjacent Matrix
• Adjacent List
Matrix and Graph

• Matrix is a 2-dimensional structure


• Used in wide areas from physical simulations to customer
management

• Graphs are also used in many areas, to represent the


relations and flows between data

• Some data structures have been considered to handle matrix


and graph; update, preserve, search, and operate
2-Dimensional Structure of Matrix

• An n×m matrix has n×m numbers

  can be stored in an array of size n×m


  [i,j] element corresponds to the i*m+j th cell of the array
  
• A naïve design is done, but there are something more
2-Diemnsaional Array

• There is a way to make 2-dimensional array, instead of


usual 1-dimensional array

• Prepare an array of pointers of size n


• Prepare n arrays of size m, and write the place of the first
cell of i-th array to the i-th cell of the pointer array
• [i,j] element of matrix a is accessed by a[i][j] (in C)
  

Simple structure

O(nm) memory space


Allocate a 2-Dimensional Array
int *MATRIX_alloc ( int n, int m ){
int i, **a, flag =0;
a = malloc ( sizeof(int *)*n );
if ( a=NULL ) return (NULL);
for ( i=0 ; i<n ; i++ ){
a[i] = malloc (sizeof(int)*m);
if ( a[i] = NULL ) flag = 1;
}
if ( flag == 1 ) return (NULL); else return (a);
}

int *MATRIX_free ( int **a ){


int i;
for ( i=0 ; i<n ; i++ ) free ( a[i] );
}
Binary Matrix

• A binary matrix is a matrix all whose cells are either 0 or 1


  + each cell is either ○ or ×
  + adjacency matrix of a graph, shown later

• Space consuming if use one integer for one 01 value (1 bit)


  motivated to compress the matrix

01010
10001
11110
11000
Representation by Bits
• A row composed of 01 values can be considered as a big integer

 by chopping into some integers of 32 bits (or 64 bits), the


integer becomes tractable

 └m/32┘ integers are sufficient to store a row


   (space efficiency also increases, and also cache efficiency)

• [i,j] element can be accessed by looking at the (j%32)th bit of


the j/32 th integer in the i-th row
Handling Bit Access
• [i,j] element can be accessed by looking at the (j%32)th bit of
the j/32 th integer in the i-th row
 … writing a code is bothering

• Prepare an array
BIT_MASK[]= {1,2,4,8,16,…}
BIT_MASK_[]= {0xfffffffe, 0xfffffffd, 0xfffffffb, …}

+ read value: a[i][j/32] & BITMASK[j%32]


+ set to 1 : a[i][j/32] = a[i][j/32] | BITMASK[j%32]
+ set to 0 : a[i][j/32] = a[i][j/32] & BITMASK_[j%32]
Sparse Matrix

• That’s all, for structures for simple matrices


• Space efficiency is in some sense optimal

• But, in application, it is often not sufficient/efficient


  for example, if matrix is sparse, many parts are redundant

• Sparse matrix has the same value in many cells (usually 0)

• Sparse matrix should be stored by memorizing the places with


non-zero values
Storing Sparse Matrix

• Let’s begin from binary matrix, for simplicity


  almost cells are 0, and few 1’s

• A simple idea is to make a list of the places of the cells being 1


• That is, memorize (x1,y1),(x2,y2),(x3,y3),… , store the row ID
and column ID of the cells being 1

• The memory requirement is “twice the number of 1’s”


this is very efficient if there are few 1’s (sparse)

• But, bad accessibility; to read a cell, we have to scan all


(binary tree / hash can be used)
Store Row-wise

• Let’s have a structure to improve the accessibility

• Classify the places of 1’s according to their row ID


  prepare n arrays, and store the column ID of 1’s in i-th
row, in the ith array

• We need to have n pointers to n arrays, but we don’t have to


store the row ID’s, thus memory efficiency increases

• The memory requirement is “# 1’s + #rows×2”


(can be “# 1’s + #rows”)
• Accessibility is good; sorting ID’s in a row array, binary search
works (linear scan is enough, if few column ID’s)
Structure in Each Row

• In sparse cases, the efficiency is increased,


However, the update concerned with insertion/deletions is not
efficient

• They are the same, in the situation of stacks and queues

• So, according to the purpose, we use lists bucket/hash/binary


tree for structures in a row
  (having n arrays is equivalent to having buckets)
Real World Data

• The characteristics of sparse matrices in practice are;

+ Matrix representing mesh network (structural calculation)


few meshes are adjacent to one, in geometrical sense, thus not
so many non-zeros per row

 array is sufficient for structures of rows

+ Road network data (adjacency of cross points + distance)


  almost the same, but update comes sometimes
(would be sufficient if (re-)allocate bit larger memory)
Real World Data (2)
Ex) A matrix representing, row  text, column  word, a cell is
one if the word is included in the text, is sparse, usually
  (POS data, Web links, Web surfing, etc.)

+ on average, #1’s in a row/column is constant, but some have so


many   (texts having many words, words included in many
texts)
+ distribution of 1’s is that so called power (zip) law, scale free;
#of items of size D is proportional to 1 / ΔD
can be often seen in real world data (≠ geometric distribution)

+ Such data needs algorithms designed so that the dense part will
not affect badly; will be the bottle neck of the computation
Non-binary Sparse Matrix

• Usual matrix are of course non-binary, it is not sufficient to


remember the places having non-zero value
 remember (place, value)

• In the case of using array,


(place, value), (place, value), (place, value),…, or
place1, plcae2,…, value1, value2,…

• In the case of lists of binary tree, assign (place, value) to each


cell/node
or, simple prepare two of them
Exercise

• Make data representing the following matrix in a sparse way

0,0,1,4,0
0,1,0,0,5
2,0,0,0,0
1,2,5,0,2
0,0,0,0,0
Column: Memory Saving for Matrix

• Buckets, or a row of a sparse matrix needs two data


(pointer to the first cell, and the size kii)
• We decrease these from two to one

• First, prepare an array of size equal to # non-zero cells. Then,


+ 0th row uses the cells of the array ranging from 0 to k00-1
+ 1stst uses from k00 to k00+k11-1 …
+ i-th row uses from k00+…+ki-1 i-1
to k00+…+kii-1,
and we remember only the start positions of the rows

• The size of i-th row can be obtained by


(start position of i+1) - (start position of i)
Matrix Operation

• Basic matrix operations are addition and multiplication


(inner product of vectors is a special case)
Further, AND and OR for binary matrix

• Algorithms for the operations are trivial if the matrices are in


the form of 2-dimensional array

However, not clear if they are in sparse forms

• Further, there are several structures that have advances for


matrix operations
Addition of Matrix
• For the addition, it is sufficient to have algorithms for additions
of each row
(so, operations of vectors are sufficient)

• First, we see the case of inner product of sparse vectors


Inner Product
• For computing inner product of two sparse vectors, the difficulty
is that we have to find the cell corresponding to each

• Sort the cells in each vector according to their column ID


• Scan two vectors simultaneously, from smaller indices
 “ simultaneously” means that iteratively pick up the smallest
column ID among the two vectors

• When we find a column ID at which both vector have non-zero


values, accumulate the product of the cells

1 5 5 1 7 3
1 1 3 3 5 4
A Code for Sparse Inner Product
int SVECTOR_innerpro (int *va, int ta, int *vb, int tb){
int ia=0, ib=0, c=0;
while ( ia<ta && ib<tb){
if (va[ia*2] < vb[ib*2] ) ia++;
else if (va[ia*2] > vb[ib*2] ) ib++;
else {
c = c + va[ia*2+1]*vb[ib*2+1];
ia++; ib++;
}
}
return ( c );
}

1 5 5 1 7 3
2 1 3 3 5 4
Addition of Two Vectors
• The addition can be done in a similar way

• Sort the cells in each vector according to their column ID


• Scan two vectors simultaneously, from smaller indices

• The positions of non-zero values in the resulted vectors are those


having non-zero values in one of two vectors, thus can be easily
identified by the scan

1 5 5 1 7 3
2 1 3 3 5 4
A Code for Addition
int SVECTOR_add (int *vc, int *va, int ta, int *vb, int tb){
int ia=0, ib=0, ic=0, c, cc;
while ( ia<ta || ib<tb){
if (ia == ta ){ c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
else if ( ib == tb ){c = va[ia*2+1]; cc = va[ia*2]; ia++; }
else if (va[ia*2] > vb[ib*2] ) { c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
else if (va[ia*2] < vb[ib*2] ) { c = va[ia*2+1]; cc = va[ia*2]; ia++; }
else {
c = va[ia*2+1] + vb[ib*2+1]; cc = vb[ib*2];
ia++; ib++;
}
vc[ic*2] = cc; vc[ic*2+1] = c; ic++;
}
return ( ic );
}

1 5 5 1 7 3
2 1 3 3 5 4
Column: Endmarks do a Good Job!

• Compared to inner product, code for addition is relatively long


 we have exceptions at the end of the array

• So, we are motivated to simplify the code by using “endmark”


(endmark is a symbol that represent the end of the array, or
something else representing the end)

• 0, -1 or a very large value is used as an endmark


• We prepare an additional cell next to the end of each array, and
put an endmark at the cell
Column: Endmarks do a Good Job! (2)

int SVECTOR_innerpro (int *vc, *vc, int *va,


*va, int
int ta,
ta, int *vb,
*vb, int
int tb){
int
int ia=0,
ia=0, ib=0,
ib=0, ic=0,
ic=0, c,
c, cc;
cc;
while
while (( va[ia*2]
va[ia*2] !=
!= ENDMARK && vb[ib*2] != ENDMARK){ ENDMARK){
if (va[ia*2]
(va[ia*2] > vb[ib*2] ) { c = vb[ib*2+1]; cc = vb[ib*2]; ib++; }
else if (va[ia*2] < vb[ib*2]
vb[ib*2] )) {{ c = va[ia*2+1]; cc = va[ia*2]; ia++; }
else {{
c = va[ia*2+1] + vb[ib*2+1]; cc = vb[ib*2];
ia++; ib++;
}
vc[ic*2] = cc; vc[ic*2+1] = c; ic++;
}}
vc[ic*2]
vc[ic*2] == ENDMARK;
ENDMARK;
return
return (( ic );
);
} 1 5 5 1 7 3 ■
2 1 3 3 5 4 ■
Matrix Multiplication
• For sparse matrix multiplication, compute the inner products of
all the pairs of a row and a column

• However, a sparse matrix has row representations but not


column representations, getting column vectors is hard

• A simple solution is to use transposing algorithm that is


explained in the section of bucket; we will have column
representation

• On the other hand, some data structures are designed to be


enabled to trace also columns
Four-Direction List
• Lists are good at storing sparse vectors, for tracing
• However, collection of lists isn’t good at tracing column vectors,
because the cells are not connected vertically
• …so, let’s have a list connected in both row direction and
column direction

• Each cell has four arms, that point the neighboring cells in
directions of (←, →, ↑, ↓)

2
7
Pointing the Neighbors
• Links to four directions seems to form a mesh network, but not
• …since, the links can cross

• In the other words, this structure can be seen as a superimpose of


two kinds of lists; horizontal direction and vertical direction,
and the identical cells are unified into one

4 4

4 7 2

4
Having Lists of 2-Directions
• If we have lists of row vectors and column vectors both, we can
have the same accessibility, but insertions/deletions are not same

• For example, when we want to delete a cell in a row vector, we


would take long time to find the corresponding cell in column lists
In four-direction lists, they are already unified

4 7
Graph Structure

• A graph is a structure composed of a set of vertices and a set of


edges (an edge is a pair of vertices)

• Formed by sets, so the information such as positions, shapes, and


crossing edges do not matter, when it is drawn as a picture
 
(a graph with shape/position information
is called “graph visualization” or
“embedded graph”)

• When edges have directions (from one


vertex to another), it is called directed

very popular structure


Examples of Graph Data
• Adjacency relation
Hierarchy in an organization
Similarity relation

• Web network, human network, SNS friend network,…


Graph Terminology
• Edge e is said to be incident to u, v, and vice versa, if e = (u,v)
also u and v are said to be adjacent

• The #edges incident to v is the degree of v

• A graph having edges for any two vertices is a complete graph

• When there are two or more edges connecting two vertices, the
edges are called multiple edges

• If there is a partition of vertices so that any edge connects a vertex


in a group and one in the other, the graph is called bipartite graph
Storing a Graph
• n vertices can be seen as numbers 0,…,n-1
• Then, an edge is a pair of numbers

 can be stored by writing the pairs in array, lists, etc.

• Further, we need something for the accessibility

for example, we often visit a vertex, and go to the


neighboring vertex, and so we need to scan
all edges incident to the vertex
Using Matrix
• The set of edges can be represented by a matrix as follows

① j-th row/column corresponds to vertex j, and ij-cell is 1 if there


is edge (i, j) (called adjacency matrix)
  + efficient for dense graph having many edges
+ multiplicity of edges can be represented by the value of a cell

② j-th row corresponds to vertex j, and each column corresponds to


an edge; when edge e is incident to vertex i, ij cell is 1 (called
incidence matrix)
+ multiple edges represented easily

• Sparse matrix representation has advantage for


incidence matrix and sparse graph
In Practice
• 2-dimensional array is sufficient when the matrix size is small
the cost is small, redundancy is small

• Sparse matrix such as 100 by 100 with 10 non-zero elements in a


row, sparse representation will be efficient
(approximately, when density is less than 10%)

+ When we often want to scan non-zero elements, such as tracing


all vertices adjacent to a vertex, sparse representation is useful

+ If we want to check whether there is an edge between two


specified vertices, 2-dimensional array has advantage
Incidence Matrix
• An incidence matrix represents the incidence relation between
vertices and edges

• Put indices from 0,…,n-1 to vertices, and 0,…,m-1 to edges


+ store edges incident to a vertex to the corresponding row
= storing vertices incident to an edge in
0: 0,1
the corresponding column
1: 1,5
2: 0,3
0 0 0: 1,3 0: 0,2 3: 1,4
1 1: 0,2,4,5 1: 0,1,3,4 4: 1,2
5 2
1 2: 1,3,4,5 2: 4,6,7,8 5: 4,5
5
6 3 4 +
3: 0,2 3: 2,7 6: 2,5
4 8 2 4: 1,2,5 4: 3,5,8 7: 2,3
3 7 5: 1,2,4 5: 1,5,6 8: 2,4
Advantage of Incidence Matrix
• In the case of incidence matrix, each edge has ID
 so, easy to handle the attached information to each edge
 just allocate an array of size m, and it is sufficient
• In the case of adjacency matrix, edge doesn’t have ID, thus not
easy to manage correspondence of edge and its data

0: 0,1
• Multiple edges are also easy to handle
1: 1,5
2: 0,3
0 0
0: 1,3 0: 0,2 3: 1,4
1 1: 0,2,4,5 1: 0,1,3,4 4: 1,2
5 2
1 2: 1,3,4,5 2: 4,6,7,8 5: 4,5
5
6 3 4 +
3: 0,2 3: 2,7 6: 2,5
4 8 2 4: 1,2,5 4: 3,5,8 7: 2,3
7 5: 1,2,4 5: 1,5,6
3 8: 2,4
Allocate Memory for Cells
• Incidence matrix can be realized by cells of lists having four links
like sparse matrix
(two for vertices of the edges, and two for the edges in the vertex)
 disadvantages of arrays are eliminated
• Also can be of two array lists
• or, prepare an array and edge i corresponds to
0: 0,1
cells 2i and 2i+1, to represent four links
1: 1,5
2: 0,3
0 0 0: 1,3 0: 0,2 3: 1,4
1 1: 0,2,4,5 1: 0,1,3,4 4: 1,2
5 2
1 2: 1,3,4,5 2: 4,6,7,8 5: 4,5
5
6 3 4 +
3: 0,2 3: 2,7 6: 2,5
4 8 2 4: 1,2,5 4: 3,5,8 7: 2,3
3 7 5: 1,2,4 5: 1,5,6 8: 2,4
Exercise
• Make an adjacency matrix of the following graph, and that in
A sparse incidence matrix
6 0
5 1

4 2
3
Bipartite Graph
• A bipartite graph is often seen as a representation of a (binary)
(sparse) matrix

 associate nodes of one group to rows, and the others to


columns
connect by edges between vertices corresponding a cell with
non-zero value
0 4
0: 4,6
• A representation of different style
1 5 1: 4,5
2: 5,6
3: 5,6
2 6

3
Column: Store Huge Graph
• A graph needs two pointer (or integer) per edge
weight, and etc. need more

• 64 bits are required in 32 bit CPU

• However, Web graphs have billion of vertices, and 20 billions of


edges
 160GB is necessary in this way

• This is too much. Can we reduce the storage size?


Column: Store Huge Graph (2)
① Only few edges have large degrees
Vertices are mainly adjacent to these few vertices

 Put indices so that large degree vertices have small indices,


and represent small indices by small number of bits, and large
indices by many bits

Ex.)
• If the bit sequence representing a number begins with “0”, the
following 7 bits represent [0-127]
• If “10”, the following 14 bits represent 128+[0-16383]
• If “11”, the following 30 bits represent 16384+128,…
Column: Store Huge Graph (3)
② Sort the sites in dictionary order of their URLs
 links are usually to near, thus difference of ID’s becomes small

• They can be recorded in the same way, to reduce the space

• Using these, one edge needs just 10 bits


Further, we can reduce it to 5 bits

 The storage will be 20GB, thus can fit recent computers


Summary

• Data structures for matrix

• Structures for sparse matrix, and four directed lists

• Structures for graphs:


adjacency matrix and incidence matrix
adjacency list

You might also like