OpenNLP - Quick Guide PDF
OpenNLP - Quick Guide PDF
OpenNLP - Overview
NLP is a set of tools used to derive meaningful and useful information from natural language sources such
as web pages and text documents.
Apache OpenNLP is an open-source Java library which is used to process natural language text. You can
build an efficient text processing service using this library.
OpenNLP provides services such as tokenization, sentence segmentation, part-of-speech tagging, named
entity extraction, chunking, parsing, and co-reference resolution, etc.
Features of OpenNLP
Following are the notable features of OpenNLP −
Named Entity Recognition (NER) − Open NLP supports NER, using which you can extract
names of locations, people and things even while processing queries.
Summarize − Using the summarize feature, you can summarize Paragraphs, articles,
documents or their collection in NLP.
Searching − In OpenNLP, a given search string or its synonyms can be identified in given text,
even though the given word is altered or misspelled.
Tagging (POS) − Tagging in NLP is used to divide the text into various grammatical elements for
further analysis.
Translation − In NLP, Translation helps in translating one language into another.
Information grouping − This option in NLP groups the textual information in the content of the
document, just like Parts of speech.
Natural Language Generation − It is used for generating information from a database and
automating the information reports such as weather analysis or medical reports.
Feedback Analysis − As the name implies, various types of feedbacks from people are
collected, regarding the products, by NLP to analyze how well the product is successful in
winning their hearts.
Speech recognition − Though it is difficult to analyze human speech, NLP has some builtin
features for this requirement.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 1/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
The Apache OpenNLP library provides classes and interfaces to perform various tasks of natural
language processing such as sentence detection, tokenization, finding a name, tagging the parts of
speech, chunking a sentence, parsing, co-reference resolution, and document categorization.
In addition to these tasks, we can also train and evaluate our own models for any of these tasks.
OpenNLP CLI
In addition to the library, OpenNLP also provides a Command Line Interface (CLI), where we can train and
evaluate models. We will discuss this topic in detail in the last chapter of this tutorial.
To perform various NLP tasks, OpenNLP provides a set of predefined models. This set includes models
for different languages.
You can follow the steps given below to download the predefined models provided by OpenNLP.
Step 1 − Open the index page of OpenNLP models by clicking the following link −
http://opennlp.sourceforge.net/models-1.5/ .
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 2/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Step 2 − On visiting the given link, you will get to see a list of components of various languages and the
links to download them. Here, you can get the list of all the predefined models provided by OpenNLP.
Download all these models to the folder C:/OpenNLP_models/>, by clicking on their respective links. All
these models are language dependent and while using these, you have to make sure that the model
language matches with the language of the input text.
History of OpenNLP
In 2011, Apache OpenNLP 1.5.2 Incubating was released, and in the same year, it graduated as
a top-level Apache project.
OpenNLP - Environment
In this chapter, we will discuss how you can setup OpenNLP environment in your system. Let’s start with
the installation process.
Installing OpenNLP
Following are the steps to download Apache OpenNLP library in your system.
Step 1 − Open the homepage of Apache OpenNLP by clicking the following link −
https://opennlp.apache.org/ .
Step 2 − Now, click on the Downloads link. On clicking, you will be directed to a page where you can find
various mirrors which will redirect you to the Apache Software Foundation Distribution directory.
Step 3 − In this page you can find links to download various Apache distributions. Browse through them
and find the OpenNLP distribution and click it.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 4/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Step 4 − On clicking, you will be redirected to the directory where you can see the index of the OpenNLP
distribution, as shown below.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 5/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Step 5 − Each distribution provides Source and Binary files of OpenNLP library in various formats.
Download the source and binary files, apache-opennlp-1.6.0-bin.zip and apache-opennlp1.6.0-src.zip
(for Windows).
Step 2 − Click on the 'Environment Variables' button under the 'Advanced' tab.
Step 3 − Select the path variable and click the Edit button, as shown in the following screenshot.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 6/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Step 4 − In the Edit Environment Variable window, click the New button and add the path for OpenNLP
directory E:\apache-opennlp-1.6.0\bin and click the OK button, as shown in the following screenshot.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 7/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Eclipse Installation
You can set the Eclipse environment for OpenNLP library, either by setting the Build path to the JAR files
or by using pom.xml.
Step 1 − Make sure that you have Eclipse environment installed in your system.
Step 2 − Open Eclipse. Click File → New → Open a new project, as shown below.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 8/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Step 3 − You will get the New Project wizard. In this wizard, select Java project and proceed by clicking
the Next button.
Step 4 − Next, you will get the New Java Project wizard. Here, you need to create a new project and
click the Next button, as shown below.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 9/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Step 5 − After creating a new project, right-click on it, select Build Path and click Configure Build Path.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 10/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Step 6 − Next, you will get the Java Build Path wizard. Here, click the Add External JARs button, as
shown below.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 11/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Step 7 − Select the jar files opennlp-tools-1.6.0.jar and opennlp-uima-1.6.0.jar located in the lib folder
of apache-opennlp-1.6.0 folder.
On clicking the Open button in the above screen, the selected files will be added to your library.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 12/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
On clicking OK, you will successfully add the required JAR files to the current project and you can verify
these added libraries by expanding the Referenced Libraries, as shown below.
Using pom.xml
Convert the project into a Maven project and add the following code to its pom.xml.
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>myproject</groupId>
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 13/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
<artifactId>myproject</artifactId>
<version>0.0.1-SNAPSHOT</version>
<build>
<sourceDirectory>src</sourceDirectory>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.5.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.apache.opennlp</groupId>
<artifactId>opennlp-tools</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.opennlp</groupId>
<artifactId>opennlp-uima</artifactId>
<version>1.6.0</version>
</dependency>
</dependencies>
</project>
In this chapter, we will discuss about the classes and methods that we will be using in the subsequent
chapters of this tutorial.
Sentence Detection
SentenceModel class
This class represents the predefined model which is used to detect the sentences in the given raw text.
This class belongs to the package opennlp.tools.sentdetect.
The constructor of this class accepts an InputStream object of the sentence detector model file (en-
sent.bin).
SentenceDetectorME class
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 14/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
This class belongs to the package opennlp.tools.sentdetect and it contains methods to split the raw text
into sentences. This class uses a maximum entropy model to evaluate end-ofsentence characters in a
string to determine if they signify the end of a sentence.
sentDetect()
1 This method is used to detect the sentences in the raw text passed to it. It accepts a String
variable as a parameter and returns a String array which holds the sentences from the given
raw text.
sentPosDetect()
This method is used to detect the positions of the sentences in the given text. This method
accepts a string variable, representing the sentence and returns an array of objects of the type
2
Span.
The class named Span of the opennlp.tools.util package is used to store the start and end
integer of sets.
getSentenceProbabilities()
3 This method returns the probabilities associated with the most recent calls to sentDetect()
method.
Tokenization
TokenizerModel class
This class represents the predefined model which is used to tokenize the given sentence. This class
belongs to the package opennlp.tools.tokenizer.
The constructor of this class accepts a InputStream object of the tokenizer model file (entoken.bin).
Classes
To perform tokenization, the OpenNLP library provides three main classes. All the three classes
implement the interface called Tokenizer.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 15/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
SimpleTokenizer
1
This class tokenizes the given raw text using character classes.
WhitespaceTokenizer
2
This class uses whitespaces to tokenize the given text.
TokenizerME
3 This class converts raw text in to separate tokens. It uses Maximum Entropy to make its
decisions.
tokenize()
1 This method is used to tokenize the raw text. This method accepts a String variable as a
parameter, and returns an array of Strings (tokens).
sentPosDetect()
2 This method is used to get the positions or spans of the tokens. It accepts the sentence (or) raw
text in the form of the string and returns an array of objects of the type Span.
In addition to the above two methods, the TokenizerME class has the getTokenProbabilities() method.
getTokenProbabilities()
1 This method is used to get the probabilities associated with the most recent calls to the
tokenizePos() method.
NameEntityRecognition
TokenNameFinderModel class
This class represents the predefined model which is used to find the named entities in the given sentence.
This class belongs to the package opennlp.tools.namefind.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 16/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
The constructor of this class accepts a InputStream object of the name finder model file (enner-
person.bin).
NameFinderME class
The class belongs to the package opennlp.tools.namefind and it contains methods to perform the NER
tasks. This class uses a maximum entropy model to find the named entities in the given raw text.
find()
1 This method is used to detect the names in the raw text. It accepts a String variable
representing the raw text as a parameter and, returns an array of objects of the type Span.
probs()
2
This method is used to get the probabilities of the last decoded sequence.
POSModel class
This class represents the predefined model which is used to tag the parts of speech of the given
sentence. This class belongs to the package opennlp.tools.postag.
The constructor of this class accepts a InputStream object of the pos-tagger model file (enpos-
maxent.bin).
POSTaggerME class
This class belongs to the package opennlp.tools.postag and it is used to predict the parts of speech of
the given raw text. It uses Maximum Entropy to make its decisions.
tag()
1 This method is used to assign the sentence of tokens POS tags. This method accepts an array
of tokens (String) as a parameter, and returns a tags (array).
getSentenceProbabilities()
2
This method is used to get the probabilities for each tag of the recently tagged sentence.
ParserModel class
This class represents the predefined model which is used to parse the given sentence. This class belongs
to the package opennlp.tools.parser.
The constructor of this class accepts a InputStream object of the parser model file (en-
parserchunking.bin).
This class belongs to the package opennlp.tools.parser and it is used to create parsers.
create()
1 This is a static method and it is used to create a parser object. This method accepts the
Filestream object of the parser model file.
ParserTool class
This class belongs to the opennlp.tools.cmdline.parser package and, it is used to parse the content.
parseLine()
This method of the ParserTool class is used to parse the raw text in OpenNLP. This method
accepts −
1
A String variable representing the text to be parsed.
A parser object.
An integer representing the no.of parses to be carried out.
Chunking
ChunkerModel class
This class represents the predefined model which is used to divide a sentence into smaller chunks. This
class belongs to the package opennlp.tools.chunker.
The constructor of this class accepts a InputStream object of the chunker model file (enchunker.bin).
ChunkerME class
This class belongs to the package named opennlp.tools.chunker and it is used to divide the given
sentence in to smaller chunks.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 18/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
chunk()
1 This method is used to divide the given sentence in to smaller chunks. It accepts tokens of a
sentence and Parts Of Speech tags as parameters.
probs()
2
This method returns the probabilities of the last decoded sequence.
While processing a natural language, deciding the beginning and end of the sentences is one of the
problems to be addressed. This process is known as Sentence Boundary Disambiguation (SBD) or simply
sentence breaking.
The techniques we use to detect the sentences in the given text, depends on the language of the text.
We can detect the sentences in the given text in Java using, Regular Expressions, and a set of simple
rules.
For example, let us assume a period, a question mark, or an exclamation mark ends a sentence in the
given text, then we can split the sentence using the split() method of the String class. Here, we have to
pass a regular expression in String format.
Following is the program which determines the sentences in a given text using Java regular expressions
(split method). Save this program in a file with the name SentenceDetection_RE.java.
String sentence = " Hi. How are you? Welcome to Tutorialspoint. "
+ "We provide free tutorials on various technologies";
Compile and execute the saved java file from the command prompt using the following commands.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 19/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
javac SentenceDetection_RE.java
java SentenceDetection_RE
On executing, the above program creates a PDF document displaying the following message.
Hi
How are you
Welcome to Tutorialspoint
We provide free tutorials on various technologies
To detect sentences, OpenNLP uses a predefined model, a file named en-sent.bin. This predefined
model is trained to detect sentences in a given raw text.
The opennlp.tools.sentdetect package contains the classes and interfaces that are used to perform the
sentence detection task.
Following are the steps to be followed to write a program which detects the sentences from the given raw
text.
The model for sentence detection is represented by the class named SentenceModel, which belongs to
the package opennlp.tools.sentdetect.
Create an InputStream object of the model (Instantiate the FileInputStream and pass the path of
the model in String format to its constructor).
Instantiate the SentenceModel class and pass the InputStream (object) of the model as a
parameter to its constructor as shown in the following code block −
The SentenceDetectorME class of the package opennlp.tools.sentdetect contains methods to split the
raw text into sentences. This class uses the Maximum Entropy model to evaluate end-of-sentence
characters in a string to determine if they signify the end of a sentence.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 20/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Instantiate this class and pass the model object created in the previous step, as shown below.
The sentDetect() method of the SentenceDetectorME class is used to detect the sentences in the raw
text passed to it. This method accepts a String variable as a parameter.
Invoke this method by passing the String format of the sentence to this method.
Example
Following is the program which detects the sentences in a given raw text. Save this program in a file with
named SentenceDetectionME.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
Compile and execute the saved Java file from the Command prompt using the following commands −
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 21/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
javac SentenceDetectorME.java
java SentenceDetectorME
On executing, the above program reads the given String and detects the sentences in it and displays the
following output.
We can also detect the positions of the sentences using the sentPosDetect() method of the
SentenceDetectorME class.
Following are the steps to be followed to write a program which detects the positions of the sentences
from the given raw text.
The model for sentence detection is represented by the class named SentenceModel, which belongs to
the package opennlp.tools.sentdetect.
Create an InputStream object of the model (Instantiate the FileInputStream and pass the path of
the model in String format to its constructor).
Instantiate the SentenceModel class and pass the InputStream (object) of the model as a
parameter to its constructor, as shown in the following code block.
The SentenceDetectorME class of the package opennlp.tools.sentdetect contains methods to split the
raw text into sentences. This class uses the Maximum Entropy model to evaluate end-of-sentence
characters in a string to determine if they signify the end of a sentence.
Instantiate this class and pass the model object created in the previous step.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 22/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
The sentPosDetect() method of the SentenceDetectorME class is used to detect the positions of the
sentences in the raw text passed to it. This method accepts a String variable as a parameter.
Invoke this method by passing the String format of the sentence as a parameter to this method.
The sentPosDetect() method of the SentenceDetectorME class returns an array of objects of the type
Span. The class named Span of the opennlp.tools.util package is used to store the start and end integer
of sets.
You can store the spans returned by the sentPosDetect() method in the Span array and print them, as
shown in the following code block.
Example
Following is the program which detects the sentences in the given raw text. Save this program in a file
with named SentenceDetectionME.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.util.Span;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 23/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Compile and execute the saved Java file from the Command prompt using the following commands −
javac SentencePosDetection.java
java SentencePosDetection
On executing, the above program reads the given String and detects the sentences in it and displays the
following output.
[0..16)
[17..43)
[44..93)
Following is the program to detect the sentences from the given raw text and display them along with their
positions. Save this program in a file with name SentencesAndPosDetection.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.util.Span;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 24/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Compile and execute the saved Java file from the Command prompt using the following commands −
javac SentencesAndPosDetection.java
java SentencesAndPosDetection
On executing, the above program reads the given String and detects the sentences along with their
positions and displays the following output.
Following is the program to print the probabilities associated with the calls to the sentDetect() method.
Save this program in a file with the name SentenceDetectionMEProbs.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 25/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
System.out.println(" ");
Compile and execute the saved Java file from the Command prompt using the following commands −
javac SentenceDetectionMEProbs.java
java SentenceDetectionMEProbs
On executing, the above program reads the given String and detects the sentences and prints them. In
addition, it also returns the probabilities associated with the most recent calls to the sentDetect() method,
as shown below.
0.9240246995179983
0.9957680129995953
1.0
OpenNLP - Tokenization
The process of chopping the given sentence into smaller parts (tokens) is known as tokenization. In
general, the given raw text is tokenized based on a set of delimiters (mostly whitespaces).
Tokenization is used in tasks such as spell-checking, processing searches, identifying parts of speech,
sentence detection, document classification of documents, etc.
The opennlp.tools.tokenize package contains the classes and interfaces that are used to perform
tokenization.
To tokenize the given sentences into simpler fragments, the OpenNLP library provides three different
classes −
SimpleTokenizer − This class tokenizes the given raw text using character classes.
TokenizerME − This class converts raw text into separate tokens. It uses Maximum Entropy to
make its decisions.
SimpleTokenizer
Following are the steps to be followed to write a program which tokenizes the given raw text.
In both the classes, there are no constructors available to instantiate them. Therefore, we need to create
objects of these classes using the static variable INSTANCE.
After tokenizing the sentence, you can print the tokens using for loop, as shown below.
Example
Following is the program which tokenizes the given sentence using the SimpleTokenizer class. Save this
program in a file with the name SimpleTokenizerExample.java.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 27/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
import opennlp.tools.tokenize.SimpleTokenizer;
public class SimpleTokenizerExample {
public static void main(String args[]){
Compile and execute the saved Java file from the Command prompt using the following commands −
javac SimpleTokenizerExample.java
java SimpleTokenizerExample
On executing, the above program reads the given String (raw text), tokenizes it, and displays the following
output −
Hi
.
How
are
you
?
Welcome
to
Tutorialspoint
.
We
provide
free
tutorials
on
various
technologies
WhitespaceTokenizer
Following are the steps to be followed to write a program which tokenizes the given raw text.
In both the classes, there are no constructors available to instantiate them. Therefore, we need to create
objects of these classes using the static variable INSTANCE.
Both these classes contain a method called tokenize(). This method accepts a raw text in String format.
On invoking, it tokenizes the given String and returns an array of Strings (tokens).
After tokenizing the sentence, you can print the tokens using for loop, as shown below.
Example
Following is the program which tokenizes the given sentence using the WhitespaceTokenizer class.
Save this program in a file with the name WhitespaceTokenizerExample.java.
import opennlp.tools.tokenize.WhitespaceTokenizer;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 29/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Compile and execute the saved Java file from the Command prompt using the following commands −
javac WhitespaceTokenizerExample.java
java WhitespaceTokenizerExample
On executing, the above program reads the given String (raw text), tokenizes it, and displays the following
output.
Hi.
How
are
you?
Welcome
to
Tutorialspoint.
We
provide
free
tutorials
on
various
technologies
TokenizerME class
OpenNLP also uses a predefined model, a file named de-token.bin, to tokenize the sentences. It is trained
to tokenize the sentences in a given raw text.
The TokenizerME class of the opennlp.tools.tokenizer package is used to load this model, and tokenize
the given raw text using OpenNLP library. To do so, you need to −
Following are the steps to be followed to write a program which tokenizes the sentences from the given
raw text using the TokenizerME class.
The model for tokenization is represented by the class named TokenizerModel, which belongs to the
package opennlp.tools.tokenize.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 30/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Create an InputStream object of the model (Instantiate the FileInputStream and pass the path of
the model in String format to its constructor).
Instantiate the TokenizerModel class and pass the InputStream (object) of the model as a
parameter to its constructor, as shown in the following code block.
The TokenizerME class of the package opennlp.tools.tokenize contains methods to chop the raw text
into smaller parts (tokens). It uses Maximum Entropy to make its decisions.
Instantiate this class and pass the model object created in the previous step as shown below.
The tokenize() method of the TokenizerME class is used to tokenize the raw text passed to it. This
method accepts a String variable as a parameter, and returns an array of Strings (tokens).
Invoke this method by passing the String format of the sentence to this method, as follows.
Example
Following is the program which tokenizes the given raw text. Save this program in a file with the name
TokenizerMEExample.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
Compile and execute the saved Java file from the Command prompt using the following commands −
javac TokenizerMEExample.java
java TokenizerMEExample
On executing, the above program reads the given String and detects the sentences in it and displays the
following output −
Hi
.
How
are
you
?
Welcome
to
Tutorialspoint
.
We
provide
free
tutorials
on
various
technologie
We can also get the positions or spans of the tokens using the tokenizePos() method. This is the method
of the Tokenizer interface of the package opennlp.tools.tokenize. Since all the (three) Tokenizer classes
implement this interface, you can find this method in all of them.
This method accepts the sentence or raw text in the form of a string and returns an array of objects of the
type Span.
You can get the positions of the tokens using the tokenizePos() method, as follows −
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 32/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
The class named Span of the opennlp.tools.util package is used to store the start and end integer of
sets.
You can store the spans returned by the tokenizePos() method in the Span array and print them, as
shown in the following code block.
The substring() method of the String class accepts the begin and the end offsets and returns the
respective string. We can use this method to print the tokens and their spans (positions) together, as
shown in the following code block.
Example(SimpleTokenizer)
Following is the program which retrieves the token spans of the raw text using the SimpleTokenizer
class. It also prints the tokens along with their positions. Save this program in a file with named
SimpleTokenizerSpans.java.
import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.util.Span;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 33/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Compile and execute the saved Java file from the Command prompt using the following commands −
javac SimpleTokenizerSpans.java
java SimpleTokenizerSpans
On executing, the above program reads the given String (raw text), tokenizes it, and displays the following
output −
[0..2) Hi
[2..3) .
[4..7) How
[8..11) are
[12..15) you
[15..16) ?
[17..24) Welcome
[25..27) to
[28..42) Tutorialspoint
[42..43) .
[44..46) We
[47..54) provide
[55..59) free
[60..69) tutorials
[70..72) on
[73..80) various
[81..93) technologies
Example (WhitespaceTokenizer)
Following is the program which retrieves the token spans of the raw text using the WhitespaceTokenizer
class. It also prints the tokens along with their positions. Save this program in a file with the name
WhitespaceTokenizerSpans.java.
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.Span;
public class WhitespaceTokenizerSpans {
public static void main(String args[]){
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 34/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
"+sent.substring(token.getStart(), token.getEnd()));
}
}
Compile and execute the saved java file from the command prompt using the following commands
javac WhitespaceTokenizerSpans.java
java WhitespaceTokenizerSpans
On executing, the above program reads the given String (raw text), tokenizes it, and displays the following
output.
[0..3) Hi.
[4..7) How
[8..11) are
[12..16) you?
[17..24) Welcome
[25..27) to
[28..43) Tutorialspoint.
[44..46) We
[47..54) provide
[55..59) free
[60..69) tutorials
[70..72) on
[73..80) various
[81..93) technologies
Example (TokenizerME)
Following is the program which retrieves the token spans of the raw text using the TokenizerME class. It
also prints the tokens along with their positions. Save this program in a file with the name
TokenizerMESpans.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 35/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Compile and execute the saved Java file from the Command prompt using the following commands −
javac TokenizerMESpans.java
java TokenizerMESpans
On executing, the above program reads the given String (raw text), tokenizes it, and displays the following
output −
[0..5) Hello
[6..10) John
[11..14) how
[15..18) are
[19..22) you
[23..30) welcome
[31..33) to
[34..48) Tutorialspoint
Tokenizer Probability
The getTokenProbabilities() method of the TokenizerME class is used to get the probabilities associated
with the most recent calls to the tokenizePos() method.
Following is the program to print the probabilities associated with the calls to tokenizePos() method. Save
this program in a file with the name TokenizerMEProbs.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 36/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Compile and execute the saved Java file from the Command prompt using the following commands −
javac TokenizerMEProbs.java
java TokenizerMEProbs
On executing, the above program reads the given String and tokenizes the sentences and prints them. In
addition, it also returns the probabilities associated with the most recent calls to the tokenizerPos()
method.
[0..5) Hello
[6..10) John
[11..14) how
[15..18) are
[19..22) you
[23..30) welcome
[31..33) to
[34..48) Tutorialspoint
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 37/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
The process of finding names, people, places, and other entities, from a given text is known as Named
Entity Recognition (NER). In this chapter, we will discuss how to carry out NER through Java program
using OpenNLP library.
To perform various NER tasks, OpenNLP uses different predefined models namely, en-nerdate.bn, en-
ner-location.bin, en-ner-organization.bin, en-ner-person.bin, and en-ner-time.bin. All these files are
predefined models which are trained to detect the respective entities in a given raw text.
The opennlp.tools.namefind package contains the classes and interfaces that are used to perform the
NER task. To perform NER task using OpenNLP library, you need to −
Following are the steps to be followed to write a program which detects the name entities from a given
raw text.
The model for sentence detection is represented by the class named TokenNameFinderModel, which
belongs to the package opennlp.tools.namefind.
Create an InputStream object of the model (Instantiate the FileInputStream and pass the path of
the appropriate NER model in String format to its constructor).
Instantiate the TokenNameFinderModel class and pass the InputStream (object) of the model
as a parameter to its constructor, as shown in the following code block.
The NameFinderME class of the package opennlp.tools.namefind contains methods to perform the
NER tasks. This class uses the Maximum Entropy model to find the named entities in the given raw text.
Instantiate this class and pass the model object created in the previous step as shown below −
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 38/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
The find() method of the NameFinderME class is used to detect the names in the raw text passed to it.
This method accepts a String variable as a parameter.
Invoke this method by passing the String format of the sentence to this method.
The find() method of the NameFinderME class returns an array of objects of the type Span. The class
named Span of the opennlp.tools.util package is used to store the start and end integer of sets.
You can store the spans returned by the find() method in the Span array and print them, as shown in the
following code block.
NER Example
Following is the program which reads the given sentence and recognizes the spans of the names of the
persons in it. Save this program in a file with the name NameFinderME_Example.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 39/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
};
Compile and execute the saved Java file from the Command prompt using the following commands −
javac NameFinderME_Example.java
java NameFinderME_Example
On executing, the above program reads the given String (raw text), detects the names of the persons in it,
and displays their positions (spans), as shown below.
[0..1) person
[2..3) person
for(Span s: nameSpans)
System.out.println(s.toString()+" "+tokens[s.getStart()]);
Following is the program to detect the names from the given raw text and display them along with their
positions. Save this program in a file with the name NameFinderSentences.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 40/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Compile and execute the saved Java file from the Command prompt using the following commands −
javac NameFinderSentences.java
java NameFinderSentences
On executing, the above program reads the given String (raw text), detects the names of the persons in it,
and displays their positions (spans) as shown below.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 41/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;
Compile and execute the saved Java file from the Command prompt using the following commands −
javac LocationFinder.java
java LocationFinder
On executing, the above program reads the given String (raw text), detects the names of the persons in it,
and displays their positions (spans), as shown below.
NameFinder Probability
The probs()method of the NameFinderME class is used to get the probabilities of the last decoded
sequence.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 42/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Following is the program to print the probabilities. Save this program in a file with the name
TokenizerMEProbs.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;
public class TokenizerMEProbs {
public static void main(String args[]) throws Exception{
String sent = "Hello John how are you welcome to Tutorialspoint";
Compile and execute the saved Java file from the Command prompt using the following commands −
javac TokenizerMEProbs.java
java TokenizerMEProbs
On executing, the above program reads the given String, tokenizes the sentences, and prints them. In
addition, it also returns the probabilities of the last decoded sequence, as shown below.
[0..5) Hello
[6..10) John
[11..14) how
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 43/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
[15..18) are
[19..22) you
[23..30) welcome
[31..33) to
[34..48) Tutorialspoint
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
Using OpenNLP, you can also detect the Parts of Speech of a given sentence and print them. Instead of
full name of the parts of speech, OpenNLP uses short forms of each parts of speech. The following table
indicates the various parts of speeches detected by OpenNLP and their meanings.
DT Determiner
TO to
JJ Adjective
The POSTaggerME class of the opennlp.tools.postag package is used to load this model, and tag the
parts of speech of the given raw text using OpenNLP library. To do so, you need to −
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 44/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Following are the steps to be followed to write a program which tags the parts of the speech in the given
raw text using the POSTaggerME class.
The model for POS tagging is represented by the class named POSModel, which belongs to the package
opennlp.tools.postag.
Create an InputStream object of the model (Instantiate the FileInputStream and pass the path of
the model in String format to its constructor).
Instantiate the POSModel class and pass the InputStream (object) of the model as a parameter
to its constructor, as shown in the following code block −
The POSTaggerME class of the package opennlp.tools.postag is used to predict the parts of speech of
the given raw text. It uses Maximum Entropy to make its decisions.
Instantiate this class and pass the model object created in the previous step, as shown below −
The tokenize() method of the whitespaceTokenizer class is used to tokenize the raw text passed to it.
This method accepts a String variable as a parameter, and returns an array of Strings (tokens).
Instantiate the whitespaceTokenizer class and the invoke this method by passing the String format of the
sentence to this method.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 45/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
The tag() method of the whitespaceTokenizer class assigns POS tags to the sentence of tokens. This
method accepts an array of tokens (String) as a parameter and returns tag (array).
Invoke the tag() method by passing the tokens generated in the previous step to it.
//Generating tags
String[] tags = tagger.tag(tokens);
The POSSample class represents the POS-tagged sentence. To instantiate this class, we would require
an array of tokens (of the text) and an array of tags.
The toString() method of this class returns the tagged sentence. Instantiate this class by passing the
token and the tag arrays created in the previous steps and invoke its toString() method, as shown in the
following code block.
Example
Following is the program which tags the parts of speech in a given raw text. Save this program in a file
with the name PosTaggerExample.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 46/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
//Generating tags
String[] tags = tagger.tag(tokens);
}
}
Compile and execute the saved Java file from the Command prompt using the following commands −
javac PosTaggerExample.java
java PosTaggerExample
On executing, the above program reads the given text and detects the parts of speech of these sentences
and displays them, as shown below.
Following is the program which tags the parts of speech of a given raw text. It also monitors the
performance and displays the performance of the tagger. Save this program in a file with the name
PosTagger_Performance.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.cmdline.PerformanceMonitor;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 47/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
//Generating tags
String[] tags = tagger.tag(tokens);
Compile and execute the saved Java file from the Command prompt using the following commands −
javac PosTaggerExample.java
java PosTaggerExample
On executing, the above program reads the given text and tags the parts of speech of these sentences
and displays them. In addition, it also monitors the performance of the POS tagger and displays it.
The probs() method of the POSTaggerME class is used to find the probabilities for each tag of the
recently tagged sentence.
Following is the program which displays the probabilities for each tag of the last tagged sentence. Save
this program in a file with the name PosTaggerProbs.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 48/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
//Generating tags
String[] tags = tagger.tag(tokens);
Compile and execute the saved Java file from the Command prompt using the following commands −
javac TokenizerMEProbs.java
java TokenizerMEProbs
On executing, the above program reads the given raw text, tags the parts of speech of each token in it,
and displays them. In addition, it also displays the probabilities for each parts of speech in the given
sentence, as shown below.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 49/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
0.42983612874819177
0.8584513635863117
0.4394784478206072
Using OpenNLP API, you can parse the given sentences. In this chapter, we will discuss how to parse raw
text using OpenNLP API.
The Parser class of the opennlp.tools.Parser package is used to hold the parse constituents and the
ParserTool class of the opennlp.tools.cmdline.parser package is used to parse the content.
Following are the steps to be followed to write a program which parses the given raw text using the
ParserTool class.
The model for parsing text is represented by the class named ParserModel, which belongs to the
package opennlp.tools.parser.
Create an InputStream object of the model (Instantiate the FileInputStream and pass the path of
the model in String format to its constructor).
Instantiate the ParserModel class and pass the InputStream (object) of the model as a
parameter to its constructor, as shown in the following code block.
The Parser class of the package opennlp.tools.parser represents a data structure for holding parse
constituents. You can create an object of this class using the static create() method of the ParserFactory
class.
Invoke the create() method of the ParserFactory by passing the model object created in the previous
step, as shown below −
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 50/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
The parseLine() method of the ParserTool class is used to parse the raw text in OpenNLP. This method
accepts −
a parser object.
Invoke this method by passing the sentence the following parameters: the parse object created in the
previous steps, and an integer representing the required number of parses to be carried out.
Example
Following is the program which parses the given raw text. Save this program in a file with the name
ParserExample.java.
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;
//Creating a parser
Parser parser = ParserFactory.create(model);
Compile and execute the saved Java file from the Command prompt using the following commands −
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 51/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
javac ParserExample.java
java ParserExample
On executing, the above program reads the given raw text, parses it, and displays the following output −
(TOP (S (NP (NN Tutorialspoint)) (VP (VBZ is) (NP (DT the) (JJS largest) (NN
tutorial) (NN library.)))))
Chunking a sentences refers to breaking/dividing a sentence into parts of words such as word groups and
verb groups.
To detect the sentences, OpenNLP uses a model, a file named en-chunker.bin. This is a predefined
model which is trained to chunk the sentences in the given raw text.
The opennlp.tools.chunker package contains the classes and interfaces that are used to find non-
recursive syntactic annotation such as noun phrase chunks.
You can chunk a sentence using the method chunk() of the ChunkerME class. This method accepts
tokens of a sentence and POS tags as parameters. Therefore, before starting the process of chunking,
first of all you need to Tokenize the sentence and generate the parts POS tags of it.
Following are the steps to be followed to write a program to chunk sentences from the given raw text.
Tokenize the sentences using the tokenize() method of the whitespaceTokenizer class, as shown in the
following code block.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 52/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Generate the POS tags of the sentence using the tag() method of the POSTaggerME class, as shown in
the following code block.
The model for chunking a sentence is represented by the class named ChunkerModel, which belongs to
the package opennlp.tools.chunker.
Create an InputStream object of the model (Instantiate the FileInputStream and pass the path of
the model in String format to its constructor).
Instantiate the ChunkerModel class and pass the InputStream (object) of the model as a
parameter to its constructor, as shown in the following code block −
The chunkerME class of the package opennlp.tools.chunker contains methods to chunk the sentences.
This is a maximum-entropy-based chunker.
Instantiate this class and pass the model object created in the previous step.
The chunk() method of the ChunkerME class is used to chunk the sentences in the raw text passed to it.
This method accepts two String arrays representing tokens and tags, as parameters.
Invoke this method by passing the token array and tag array created in the previous steps as parameters.
Example
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 53/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Following is the program to chunk the sentences in the given raw text. Save this program in a file with the
name ChunkerExample.java.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.cmdline.postag.POSModelLoader;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
Compile and execute the saved Java file from the Command prompt using the following command −
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 54/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
javac ChunkerExample.java
java ChunkerExample
On executing, the above program reads the given String and chunks the sentences in it, and displays
them as shown below.
You can store the spans returned by the chunkAsSpans() method in the Span array and print them, as
shown in the following code block.
Example
Following is the program which detects the sentences in the given raw text. Save this program in a file
with the name ChunkerSpansEample.java.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.cmdline.postag.POSModelLoader;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.Span;
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 55/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Compile and execute the saved Java file from the Command prompt using the following commands −
javac ChunkerSpansEample.java
java ChunkerSpansEample
On executing, the above program reads the given String and spans of the chunks in it, and displays the
following output −
Following is the program to print the probabilities of the last decoded sequence by the chunker. Save this
program in a file with the name ChunkerProbsExample.java.
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 56/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.cmdline.postag.POSModelLoader;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
Compile and execute the saved Java file from the Command prompt using the following commands −
javac ChunkerProbsExample.java
java ChunkerProbsExample
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 57/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
On executing, the above program reads the given String, chunks it, and prints the probabilities of the last
decoded sequence.
0.9592746040797778
0.6883933131241501
0.8830563473996004
0.8951150529746051
OpenNLP provides a Command Line Interface (CLI) to carry out different operations through the
command line. In this chapter, we will take some examples to show how we can use the OpenNLP
Command Line Interface.
Tokenizing
input.txt
Hi. How are you? Welcome to Tutorialspoint. We provide free tutorials on various technologies
Syntax
command
output
output.txt
Hi . How are you ? Welcome to Tutorialspoint . We provide free tutorials on various technologies
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 58/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Sentence Detection
input.txt
Hi. How are you? Welcome to Tutorialspoint. We provide free tutorials on various technologies
Syntax
command
Output
Output_sendet.txt
input.txt
Syntax
Command
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 59/60
5/22/2020 OpenNLP - Quick Guide - Tutorialspoint
Output
Input.txt
Hi. How are you? Welcome to Tutorialspoint. We provide free tutorials on various technologies
Syntax
Command
Output
https://www.tutorialspoint.com/opennlp/opennlp_quick_guide.htm 60/60