5 Indexing and Searching Big Data

This document discusses Lucene, an open-source information retrieval library written in Java that allows applications to add indexing and search capabilities. It describes how Lucene works by converting text into an inverted index format that enables fast searching. The key components for indexing include IndexWriter, Document, Analyzer, Field, and Directory classes. For searching, important classes are IndexSearcher, Term, Query, TermQuery, and TopDocs. Various analyzers like WhitespaceAnalyzer and StandardAnalyzer are available in Lucene to preprocess text for indexing.

Uploaded by Vijay


Searching and Indexing Big Data

By Dinesh Amatya

Lucene

• Lucene is a high-performance, scalable Information Retrieval (IR) library.
• Lets you add indexing and searching capabilities to your application.
• Can index and make searchable any data that can be converted to a textual format.
• Mature, free, open-source project implemented in Java.

Basic Concept: Indexing

• To search large amounts of text quickly, one must first index that text and convert it into a format that will let one search it rapidly, eliminating the slow sequential scanning process. This conversion process is called indexing, and its output is called an index.

Basic Concept: Inverted Index
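
An inverted index maps each term to the documents that contain it, so a lookup never has to scan the document text. The following is a minimal sketch in plain Java, not Lucene's actual index format. The class name InvertedIndexSketch, its methods, and the sample sentences are made up for illustration, and tokenization is simplified to lowercasing and splitting on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: each term maps to the sorted set of IDs of the documents containing it.
// Illustration only; this is not how Lucene stores its index.
class InvertedIndexSketch {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();
    private final List<String> documents = new ArrayList<>();

    // Adds a document and returns its ID (its position in insertion order).
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        // Simplified tokenization: lowercase, then split on runs of non-alphanumeric characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Looks the term up in the index; no document text is scanned here.
    SortedSet<Integer> search(String term) {
        SortedSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new TreeSet<>() : hits;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Lucene is an information retrieval library");
        index.add("An index lets you search text quickly");
        index.add("Lucene builds an inverted index");
        System.out.println(index.search("lucene")); // prints [0, 2]
        System.out.println(index.search("index"));  // prints [1, 2]
    }
}
```

The cost of tokenizing is paid once, at indexing time. Each search is then a single map lookup, which is the point of indexing before searching.
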
Basic Concept: Searching

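• Searching is the process of looking up words in an index to find the documents where they appear.
• Quality of search is described by:
– Recall: the fraction of all relevant documents that the search returns
– Precision: the fraction of returned documents that are relevant
• A search consults the index instead of the original text.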
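
Recall and precision can be computed from two sets of document IDs: those a search returned and those that are actually relevant. The sketch below assumes both sets are known, which in practice requires hand-labeled relevance judgments. The class name SearchQualitySketch and the sample IDs are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Computes precision and recall for one query from sets of document IDs.
class SearchQualitySketch {
    // Precision: fraction of retrieved documents that are relevant.
    static double precision(Set<Integer> retrieved, Set<Integer> relevant) {
        if (retrieved.isEmpty()) {
            return 0.0; // convention chosen for this sketch when nothing is retrieved
        }
        return (double) overlap(retrieved, relevant) / retrieved.size();
    }

    // Recall: fraction of relevant documents that were retrieved.
    static double recall(Set<Integer> retrieved, Set<Integer> relevant) {
        if (relevant.isEmpty()) {
            return 0.0; // convention chosen for this sketch when nothing is relevant
        }
        return (double) overlap(retrieved, relevant) / relevant.size();
    }

    // Number of documents that are both retrieved and relevant.
    private static int overlap(Set<Integer> a, Set<Integer> b) {
        Set<Integer> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        Set<Integer> retrieved = Set.of(1, 2, 3, 4);     // what the search returned
        Set<Integer> relevant = Set.of(2, 4, 6, 8, 10);  // what it should have returned
        System.out.println(precision(retrieved, relevant)); // 2 of 4 retrieved are relevant: 0.5
        System.out.println(recall(retrieved, relevant));    // 2 of 5 relevant were retrieved: 0.4
    }
}
```

The two measures usually pull against each other: returning more documents tends to raise recall and lower precision.
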


Typical Components of Search Application
Core Indexing Classes

IndexWriter
Document
Analyzer
Field
Directory
Primary Analyzers Available in Lucene

WhitespaceAnalyzer
SimpleAnalyzer
StopAnalyzer
KeywordAnalyzer
StandardAnalyzer
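
The analyzers differ in how they break text into tokens before indexing. The plain-Java sketch below approximates the documented behavior of four of them. It does not call Lucene, and the class AnalyzerSketch, its method names, and its stop-word list are assumptions made for illustration. StandardAnalyzer is left out because its grammar-based tokenization is harder to imitate in a few lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Approximates what several Lucene analyzers do to input text. Not Lucene code.
class AnalyzerSketch {
    // Hypothetical stop-word list for this sketch; not Lucene's own list.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "is", "of", "and");

    // WhitespaceAnalyzer-like: split on whitespace only; case and punctuation are kept.
    static List<String> whitespace(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    // SimpleAnalyzer-like: split at non-letter characters and lowercase the tokens.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\P{L}+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // StopAnalyzer-like: same as simple(), then drop stop words.
    static List<String> stop(String text) {
        List<String> tokens = simple(text);
        tokens.removeIf(STOP_WORDS::contains);
        return tokens;
    }

    // KeywordAnalyzer-like: the whole input becomes a single token.
    static List<String> keyword(String text) {
        return List.of(text);
    }

    public static void main(String[] args) {
        String text = "The quick-brown Fox is No.1";
        System.out.println(whitespace(text)); // [The, quick-brown, Fox, is, No.1]
        System.out.println(simple(text));     // [the, quick, brown, fox, is, no]
        System.out.println(stop(text));       // [quick, brown, fox, no]
        System.out.println(keyword(text));    // [The quick-brown Fox is No.1]
    }
}
```

The same analyzer should normally be applied at indexing time and at query time. Otherwise a query term such as "Fox" may not match the indexed token "fox".
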
Core Searching Classes

IndexSearcher
Term
Query
TermQuery
TopDocs
