Chapter 1. Introduction: List of Tables
Postag
POSTagger
POSTaggerTrainer
POSTaggerEvaluator
POSTaggerCrossValidator
POSTaggerConverter
Lemmatizer
LemmatizerME
LemmatizerTrainerME
LemmatizerEvaluator
Chunker
ChunkerME
ChunkerTrainerME
ChunkerEvaluator
ChunkerCrossValidator
ChunkerConverter
Parser
Parser
ParserTrainer
ParserEvaluator
ParserConverter
BuildModelUpdater
CheckModelUpdater
TaggerModelReplacer
Entitylinker
EntityLinker
Languagemodel
NGramLanguageModel
List of Tables
2.1. Normalizers
5.1. Feature Generators
Chapter 1. Introduction
Table of Contents
Description
General Library Structure
Application Program Interface (API). Generic Example
Command line interface (CLI)
Description
List of tools
Setting up
Generic Example
Description
The Apache OpenNLP library is a machine-learning-based toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron-based machine learning.
The goal of the OpenNLP project is to create a mature toolkit for the above-mentioned tasks. An additional goal is to provide a large number of pre-built models for a variety of languages, as well as the annotated text resources that those models are derived from.
https://opennlp.apache.org/docs/1.9.3/manual/opennlp.html 4/64
8/10/2021 Apache OpenNLP Developer Documentation
SomeModel model = new SomeModel(modelIn);
After the tool is instantiated, the processing task can be executed. The input and output formats are specific to the tool, but often the output is an array of String, and the input is a String or an array of String.
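The pattern described here (load a model from an input stream, instantiate a tool with that model, then pass a String in and get an array of String back) can be sketched in a self-contained way. The sketch below is only an illustration under stated assumptions: SomeModel is the placeholder name used above, SomeTool and its process method are hypothetical names, and neither class is part of the OpenNLP API. The "model" here is nothing more than a delimiter pattern read from the stream, and an in-memory stream stands in for a model file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class GenericApiSketch {

    // Hypothetical stand-in for a model class; NOT an OpenNLP class.
    // Assumption: the "model" is just a delimiter regex stored as UTF-8 text.
    public static final class SomeModel {
        private final String delimiterRegex;

        public SomeModel(InputStream modelIn) throws IOException {
            // Read the whole stream and treat its content as the model data.
            this.delimiterRegex =
                new String(modelIn.readAllBytes(), StandardCharsets.UTF_8).trim();
        }

        String delimiterRegex() {
            return delimiterRegex;
        }
    }

    // Hypothetical stand-in for a tool that is instantiated with a loaded model.
    public static final class SomeTool {
        private final SomeModel model;

        public SomeTool(SomeModel model) {
            this.model = model;
        }

        // Input is a String, output is an array of String.
        public String[] process(String input) {
            return input.trim().split(model.delimiterRegex());
        }
    }

    // Load the model, instantiate the tool, execute the processing task.
    public static String[] run(String modelContent, String input) throws IOException {
        byte[] modelBytes = modelContent.getBytes(StandardCharsets.UTF_8);
        // A real application would typically open a model file here instead.
        try (InputStream modelIn = new ByteArrayInputStream(modelBytes)) {
            SomeModel model = new SomeModel(modelIn);
            SomeTool tool = new SomeTool(model);
            return tool.process(input);
        }
    }

    public static void main(String[] args) throws IOException {
        String[] out = run("\\s+", "OpenNLP processes natural language text");
        // prints: OpenNLP|processes|natural|language|text
        System.out.println(String.join("|", out));
    }
}
```

The try-with-resources block closes the stream once the model has been constructed, so the tool never depends on an open stream after loading.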
OpenNLP provides a command line script that serves as a single entry point to all included tools. The script is located in the bin directory of the OpenNLP binary distribution. Versions are included for Windows (opennlp.bat) and for Linux or compatible systems (opennlp).
List of tools
Setting up
The OpenNLP script uses the JAVA_CMD and JAVA_HOME variables to determine which command to use to execute the Java virtual machine.
The OpenNLP script uses the OPENNLP_HOME variable to determine the location of the OpenNLP binary distribution. It is recommended to point this variable to the binary distribution of the current OpenNLP version and to update the PATH variable to include $OPENNLP_HOME/bin or %OPENNLP_HOME%\bin.
Generic Example
Apache OpenNLP provides a common command line script to access all its tools:
$ opennlp
This script prints the current version of the library and lists all available tools:
DictionaryDetokenizer
TokenNameFinderEvaluator  Measures the performance of the NameFinder model with the reference data
POSTaggerEvaluator        Measures the performance of the POS tagger model with the reference data
ChunkerEvaluator          Measures the performance of the Chunker model with the reference data
ParserEvaluator           Measures the performance of the Parser model with the referen
BuildModelUpdater         trains and updates the build model in a parser model