Elasticsearch Optimization


Elasticsearch: Store, Search, and Analyze

By Ketan Bansal
What is Elasticsearch?

● Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack.

● It provides near real-time search and analytics for all types of data (structured, unstructured, numerical, or geospatial).
● It efficiently stores and indexes data in a way that supports fast searches.
● You can go far beyond simple data retrieval and aggregate information to discover trends and patterns in your data.
What is Elasticsearch?

● Elasticsearch offers the speed and flexibility to handle data in a wide variety of use cases:
* Add a search box to an app or website
* Store and analyze logs, metrics, and security event data
* Use machine learning to automatically model the behaviour of your data in real time, and more
A. Create and Delete an Index (Elasticsearch using Python)
B. Insert and Get Query (Elasticsearch using Python)
C. Search Query (Elasticsearch using Python)
D. Mapping (Elasticsearch using Python)
D.1. Mapping (Elasticsearch using Python)
D.2. Custom-Mapping (Elasticsearch using Python)
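
A minimal sketch of steps A to D, assuming the official elasticsearch Python client with 8.x-style keyword arguments (mappings=, document=, query=). The index name, the mapping, the sample document, and the run_demo helper are hypothetical and chosen only for illustration; they are not part of the client.

```python
# Hypothetical index name and custom mapping, used only for illustration.
INDEX_NAME = "books"
CUSTOM_MAPPING = {
    "properties": {
        "title": {"type": "text"},      # analyzed for full-text search
        "author": {"type": "keyword"},  # exact-match filtering and aggregations
        "year": {"type": "integer"},
    }
}


def run_demo(client, index_name=INDEX_NAME):
    """Create an index, insert/get/search one document, then delete the index.

    `client` is expected to behave like elasticsearch.Elasticsearch (8.x).
    """
    # A / D.2: create the index with an explicit (custom) mapping.
    client.indices.create(index=index_name, mappings=CUSTOM_MAPPING)

    # B: insert one document, then fetch it back by id.
    doc = {"title": "Elasticsearch basics", "author": "Jane Doe", "year": 2021}
    client.index(index=index_name, id="1", document=doc)
    fetched = client.get(index=index_name, id="1")

    # Refresh so the new document is visible to search straight away.
    client.indices.refresh(index=index_name)

    # C: full-text search on the title field.
    hits = client.search(index=index_name, query={"match": {"title": "basics"}})

    # D.1: read back the mapping Elasticsearch holds for the index.
    mapping = client.indices.get_mapping(index=index_name)

    # A: delete the index again.
    client.indices.delete(index=index_name)
    return fetched, hits, mapping
```

To try it against a real cluster, one option is to import Elasticsearch from the elasticsearch package and call run_demo(Elasticsearch("http://localhost:9200")); the URL assumes a local, unsecured node. Older 7.x clients may expect a body= argument instead of the keyword arguments used here.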
Kibana: Explore, Visualize, and Share

By Your Name
What is Kibana?

● Kibana enables you to interactively explore, visualize, and share insights into your data, and to manage and monitor the Elastic Stack.

● With Kibana, we can:

* Search, observe, and protect the data - from discovering documents to analyzing logs to finding security vulnerabilities
* Analyze the data - search for hidden insights, visualize what we’ve found in charts, maps, and more, and combine them in a dashboard
* Manage, monitor, and secure the Elastic Stack - manage your data, monitor the health of the Elasticsearch cluster, and control access to features
Add Data

● The best way to add data to the Elastic Stack is to use one of the integrations available from the Kibana dashboard, such as:

1. Add data with Elastic solutions - website search crawler, Elastic APM, Endpoint Security

2. Add data with programming languages - add any data to Elasticsearch using a programming language such as JavaScript, Java, Python, or Ruby

3. Add sample data - sample data sets come with sample visualizations, dashboards, and more to help you explore the data before you add your own

4. Upload a file - if you have a CSV, TSV, or JSON file, you can upload it and optionally import it into Elasticsearch
Kibana Query Language (KQL)

● KQL is a simple syntax for filtering Elasticsearch data using free-text search or field-based search

● It is used only for filtering data and has no role in sorting or aggregating data
● It can query nested fields and scripted fields, but it does not support regular expressions or fuzzy-term searches
Logstash: Collect, Enrich, and Transport

By Your Name
What is Logstash?

● Logstash is an open-source data collection engine with real-time pipelining capabilities
* The Logstash event processing pipeline has 3 stages: inputs → filters → outputs
* Inputs generate events, filters modify them, and outputs ship them elsewhere

● It can dynamically unify data from disparate sources and normalize the data into the destinations of our choice
● It cleanses and democratizes all the data for diverse advanced downstream analytics and visualization use cases
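
The three-stage idea (inputs → filters → outputs) can be illustrated with a small Python analogy. This is not Logstash configuration, which uses its own pipeline syntax; the function names, the sample log lines, and the "level"/"text" fields are all hypothetical.

```python
import json


def read_input(lines):
    """Input stage: turn each raw line into an event (a dict)."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def parse_filter(events):
    """Filter stage: enrich each event with fields parsed from the message."""
    for event in events:
        level, _, text = event["message"].partition(" ")
        yield {**event, "level": level.lower(), "text": text}


def write_output(events, sink):
    """Output stage: ship each event elsewhere (here, a list of JSON strings)."""
    for event in events:
        sink.append(json.dumps(event, sort_keys=True))


raw_lines = ["ERROR disk full", "INFO service started"]
shipped = []
write_output(parse_filter(read_input(raw_lines)), shipped)
print(shipped)
```

Each stage only sees the events handed to it by the previous one, which is what lets a pipeline mix and match different inputs, filters, and outputs.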
Natural Language Toolkit (NLTK)

By Your Name
What is NLTK?

● The Natural Language Toolkit (NLTK) is a suite of open-source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing

● A variety of text processing tasks can be performed using NLTK, such as tokenizing, stemming, lemmatizing, and tagging parts of speech
Tokenizing

● By tokenizing, you can easily split up text by word or by sentence

● This converts a whole text into smaller pieces that are still relatively meaningful outside the main text (a first step in turning unstructured data into structured data)

* Tokenizing by word: tokenizing by word allows you to identify words that come up particularly often

word_tokenize(your_text) is the function used to tokenize your text into words
Tokenizing

* Tokenizing by sentence: when we tokenize by sentence, we can analyze how the words relate to one another and see more context

sent_tokenize(your_text) is the function used to tokenize your text into sentences

NOTE: Before using these functions, you first need to import them from the relevant part of NLTK (nltk.tokenize)
Stemming

● Stemming is a text processing task in which you reduce words to their root, which is the core part of a word

● For example, “helping” and “helper” share the same root, i.e. “help”

● NLTK has more than one stemmer, but we’ll use the Porter stemmer
Stemming

Where “words” is a list of tokenized words
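
A minimal sketch of stemming with the Porter stemmer. Here "words" is a hand-written list standing in for tokenizer output, so no tokenizer data is needed; the sample words are chosen only for illustration.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# "words" is a list of tokenized words (written by hand for this sketch).
words = ["helping", "helped", "helps", "discovery"]

# Stem each token; the result keeps the same order as the input list.
stemmed_words = [stemmer.stem(word) for word in words]
print(stemmed_words)
```

A stem is not always a dictionary word: "discovery" comes out as the fragment "discoveri".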


Tagging Parts of Speech

● Tagging parts of speech, or POS tagging, is the task of labelling the words in our text according to their part of speech

● NLTK uses the word “determiner” to refer to articles (like “a” or “the”)

● nltk.pos_tag() is the function used for tagging; it returns a list of (word, tag) tuples
Lemmatizing: Like stemming, lemmatizing reduces words to their core meaning, but it gives you a complete English word that makes sense on its own instead of just a fragment of a word like “discoveri”
Elasticsearch practice:
https://github.com/S19CRXPP0098/Practice/blob/main/Elasticsearch_Practice.ipynb

NLTK practice:
https://github.com/S19CRXPP0098/Practice/blob/main/NLTK_Practice.ipynb
THANK YOU
