
PARALLEL WAVELET TREE
GROUP MEMBERS: SHUKRANT TYAGI, SUMIT BAUDH, VIKRANT TYAGI, VAIBHAV KAUSHIK
MENTOR: DR. ARUN KUMAR YADAV
CONTENTS
 Introduction to the project
 Inverted index
 Comparison of inverted index construction approaches
 Wavelet tree
 Parallel wavelet tree
 Comparison of WT and parallel WT
 Detailed description of the project
 How the project will work
Introduction to the project

 We are going to develop a search engine, with the main objective of constructing the inverted index in an efficient way.

 Efficiency in constructing the inverted index comes from converting the existing sequential algorithm into its parallel version.

 The parallel algorithm will be written and executed using CUDA, which will help reduce the construction time.
Inverted Index

 In computer science, an inverted index (also referred to as a postings file or inverted file) is an index data structure storing a mapping from content, such as words or numbers, to its locations in a database file, or in a document or a set of documents (named in contrast to a forward index, which maps from documents to content).

 The purpose of an inverted index is to allow fast full-text searches, at the cost of increased processing when a document is added to the database.

 In simple words, it is a hash-map-like data structure that directs you from a word to a document or a web page, as sketched below.
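
As a minimal sketch of this idea (illustrative only, not the project code; it assumes whitespace tokenization and no normalization), the inverted index can be represented as a hash map from each word to the list of document IDs that contain it:

// A minimal, illustrative inverted index: a hash map from each word to the
// postings list of document IDs that contain it.
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

std::unordered_map<std::string, std::vector<int>>
build_inverted_index(const std::vector<std::string>& docs) {
    std::unordered_map<std::string, std::vector<int>> index;
    for (int id = 0; id < (int)docs.size(); ++id) {
        std::istringstream in(docs[id]);
        std::string word;
        while (in >> word) {
            std::vector<int>& postings = index[word];
            if (postings.empty() || postings.back() != id)
                postings.push_back(id);      // record this document only once
        }
    }
    return index;
}

int main() {
    auto index = build_inverted_index({"wavelet tree index", "parallel wavelet tree"});
    for (int id : index["wavelet"]) std::cout << id << ' ';   // prints: 0 1
    std::cout << '\n';
}

A query then looks up each query word and intersects the postings lists of the matching documents.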
Inverted Index: Construction Approaches

 Hybrid+R* tree
 Hybrid+B tree
 Wavelet tree
Comparison of construction approaches
Wavelet tree

 Definition: The wavelet tree is a succinct data structure to store strings in compressed space.
 A wavelet tree contains a bitmap representation of a string.
 Different operations performed on a wavelet tree (see the sketch after the example figures below):
 rank_c(S, i) – the number of occurrences of character c at or before position i in S.
 select_c(S, j) – the position of the j-th occurrence of c in S.
Example of a wavelet tree (figure)
Another example (figure)
Rank operation (figure)
Select operation (figure)
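
As a plain-text illustration of what the two queries return, here is a naive reference implementation over a raw string (an illustrative sketch, not the wavelet-tree algorithm itself, which answers both queries in O(log σ) time using the bitmaps stored at its nodes):

// Naive reference semantics for rank and select on a plain string.
#include <cstddef>
#include <iostream>
#include <string>

// rank_c(S, i): number of occurrences of c in S at or before position i (0-based).
std::size_t rank_c(const std::string& S, char c, std::size_t i) {
    std::size_t count = 0;
    for (std::size_t k = 0; k <= i && k < S.size(); ++k)
        if (S[k] == c) ++count;
    return count;
}

// select_c(S, j): position of the j-th occurrence of c in S (j >= 1),
// or S.size() if c occurs fewer than j times.
std::size_t select_c(const std::string& S, char c, std::size_t j) {
    std::size_t seen = 0;
    for (std::size_t k = 0; k < S.size(); ++k)
        if (S[k] == c && ++seen == j) return k;
    return S.size();
}

int main() {
    std::string S = "abracadabra";
    std::cout << rank_c(S, 'a', 7) << '\n';    // 4: 'a' occurs at positions 0, 3, 5, 7
    std::cout << select_c(S, 'a', 3) << '\n';  // 5: the third 'a' is at position 5
}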
Parallel Wavelet Tree

 We reduce the memory requirement by substituting the prefix sums with a parallel split operation, reducing the additional space down to n log σ bits, excluding the input and output.

 σ is the alphabet size.

 A further optimization is to implement the algorithm recursively instead of strictly level by level. In particular, the two children of a node are constructed via two recursive calls that run in parallel (see the sketch after this slide).

 This approach is more cache-friendly and avoids explicitly computing the node boundaries per level, which requires O(σ log n) bits of space; instead, each processor computes the boundaries when it launches the two recursive calls, requiring O(log n log σ) bits of stack space per processor (one pointer for each level of the tree).
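
A minimal host-side sketch of the recursive strategy, under simplifying assumptions: symbols are split by one bit per level, the split itself is done sequentially (the parallel split operation mentioned above is not shown), memory is not reclaimed, and std::thread stands in for the runtime's parallel calls. It only illustrates how the two children of a node are built by parallel recursive calls:

// Sketch: recursive wavelet-tree construction where the two children of a
// node are built by parallel recursive calls.
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

struct Node {
    std::vector<uint8_t> bits;   // 0 = symbol goes left, 1 = symbol goes right
    Node* left = nullptr;
    Node* right = nullptr;
};

Node* build(const std::vector<uint8_t>& seq, int level) {
    if (level < 0 || seq.empty()) return nullptr;
    Node* node = new Node;
    std::vector<uint8_t> lo, hi;
    for (uint8_t s : seq) {
        uint8_t b = (s >> level) & 1;        // split by bit 'level' of the symbol
        node->bits.push_back(b);
        (b ? hi : lo).push_back(s);
    }
    // Two recursive calls in parallel: left child on a new thread,
    // right child on the current one.
    std::thread t([&] { node->left = build(lo, level - 1); });
    node->right = build(hi, level - 1);
    t.join();
    return node;
}

int main() {
    std::vector<uint8_t> seq = {3, 1, 0, 2, 1, 3};    // alphabet size sigma = 4
    Node* root = build(seq, 1);                        // top level = log2(sigma) - 1
    for (uint8_t b : root->bits) std::cout << int(b);  // prints the root bitmap: 100101
    std::cout << '\n';
}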
CUDA C/C++
NVIDIA CORPORATION
INTRODUCTION

 CUDA is a set of development tools for creating applications that execute on the GPU (Graphics Processing Unit).

 The CUDA compiler uses a variation of C, with support for C++.

 CUDA was developed by NVIDIA and as such can only run on NVIDIA GPUs of the G8x series and later.
WHY CUDA?

 CUDA provides the ability to use a high-level language such as C to develop applications that take advantage of the performance and scalability that the GPU architecture offers.

 GPUs allow the creation of a very large number of concurrently executing threads at very low system resource cost (see the kernel sketch after this list).

 CUDA also exposes fast shared memory (16 KB) that can be shared between threads.

 Full support for integer and bitwise operations.

 Compiled code runs directly on the GPU.
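
A minimal kernel sketch (illustrative, not project code) showing how cheaply a large number of threads can be launched; each thread handles one array element:

// Each GPU thread scales one element; launching ~1M threads is inexpensive.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n)
        data[i] *= factor;
}

int main() {
    const int n = 1 << 20;                  // one million elements
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(d_data, 2.0f, n);   // ~1M lightweight threads
    cudaDeviceSynchronize();

    cudaFree(d_data);
    printf("launched %d blocks of %d threads\n", blocks, threadsPerBlock);
    return 0;
}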
CUDA LIMITATIONS

 No support for recursive functions; any recursive function must be converted into loops.

 Many deviations from the IEEE 754 floating-point standard.

 No texture rendering.

 Bus bandwidth and latency between the GPU and CPU are a bottleneck for many applications.

 Threads should be run in groups of 32 or more for best performance.

 Only supported on NVIDIA GPUs.
GPU V/S CPU

 GPUs contain a much larger number of dedicated ALUs than CPUs.

 GPUs also provide extensive support for the stream-processing paradigm, which is related to SIMD (Single Instruction, Multiple Data) processing.

 Each processing unit on the GPU contains local memory that improves data manipulation and reduces fetch time.
CUDA – EXECUTION FLOW

 At the start of execution, CUDA-compiled code runs like any other application; its primary execution happens on the CPU.

 When a kernel call is made, the application continues executing non-kernel functions on the CPU while, at the same time, the kernel executes on the GPU. This way we get parallel processing between the CPU and GPU (see the sketch after this list).

 Memory transfers between host and device are the primary bottleneck in application execution; execution on both sides is halted until the operation completes.
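
A small sketch (illustrative only) of this overlap: the kernel launch returns immediately, the CPU keeps working, and the host blocks only when it explicitly synchronizes:

// The kernel launch is asynchronous: the CPU continues past it immediately
// and only waits at the explicit synchronization point.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void busy_kernel(int* out) {
    int s = 0;
    for (int i = 0; i < 1000000; ++i)   // some GPU-side work
        s += i;
    *out = s;
}

int main() {
    int* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));

    busy_kernel<<<1, 1>>>(d_out);       // returns at once; the kernel runs on the GPU

    printf("CPU keeps working while the kernel runs on the GPU\n");

    cudaDeviceSynchronize();            // halt the host until the GPU has finished
    cudaFree(d_out);
    return 0;
}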
Simple Processing Flow

1. Copy input data from CPU memory to GPU memory (over the PCI bus).
2. Load the GPU program and execute it, caching data on chip for performance.
3. Copy results from GPU memory back to CPU memory.
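
A minimal end-to-end sketch of these three steps (illustrative only), using the memory-management API described on the next slide:

// Steps 1-3 of the simple processing flow on a small array.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add_one(int* a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] += 1;
}

int main() {
    const int n = 8;
    int host[n] = {0, 1, 2, 3, 4, 5, 6, 7};
    int* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));

    // 1. Copy input data from CPU memory to GPU memory.
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    // 2. Load the GPU program and execute it.
    add_one<<<1, n>>>(dev, n);

    // 3. Copy results from GPU memory back to CPU memory.
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dev);
    for (int i = 0; i < n; ++i)
        printf("%d ", host[i]);         // prints: 1 2 3 4 5 6 7 8
    printf("\n");
    return 0;
}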
Memory Management

 Host and device memory are separate entities
 Device pointers point to GPU memory
   May be passed to/from host code
   May not be dereferenced in host code
 Host pointers point to CPU memory
   May be passed to/from device code
   May not be dereferenced in device code

 Simple CUDA API for handling device memory
 cudaMalloc(), cudaFree(), cudaMemcpy()
 Similar to the C equivalents malloc(), free(), memcpy() (see the sketch below)
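
A short sketch (illustrative) of the API parallel and of the host/device pointer rules; note the commented-out line that would wrongly dereference a device pointer in host code:

// cudaMalloc/cudaFree/cudaMemcpy mirror malloc/free/memcpy, but device
// pointers refer to GPU memory and must not be dereferenced on the host.
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256 * sizeof(float);

    float* h = (float*)malloc(bytes);     // host pointer: CPU memory
    memset(h, 0, bytes);

    float* d = nullptr;
    cudaMalloc(&d, bytes);                // device pointer: GPU memory
    // d[0] = 1.0f;                       // invalid: dereferencing a device pointer on the host

    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   // like memcpy(), host -> device
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   // like memcpy(), device -> host

    cudaFree(d);                          // like free(), but for device memory
    free(h);
    return 0;
}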
