
Chapter 1. Introduction: List of Tables

The document summarizes the Apache OpenNLP library, which is a machine learning toolkit for natural language processing tasks like tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. It describes the main components and APIs for common NLP tasks as well as the command line interface for experiments and training.


8/10/2021 Apache OpenNLP Developer Documentation

Postag

POSTagger
POSTaggerTrainer
POSTaggerEvaluator
POSTaggerCrossValidator
POSTaggerConverter

Lemmatizer

LemmatizerME
LemmatizerTrainerME
LemmatizerEvaluator

Chunker

ChunkerME
ChunkerTrainerME
ChunkerEvaluator
ChunkerCrossValidator
ChunkerConverter

Parser

Parser
ParserTrainer
ParserEvaluator
ParserConverter
BuildModelUpdater
CheckModelUpdater
TaggerModelReplacer

Entitylinker

EntityLinker

Languagemodel

NGramLanguageModel

List of Tables

2.1. Normalizers
5.1. Feature Generators

Chapter 1. Introduction
Table of Contents

Description
General Library Structure
Application Program Interface (API). Generic Example
Command line interface (CLI)

Description
List of tools
Setting up
Generic Example

Description
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.

The goal of the OpenNLP project is to create a mature toolkit for the above-mentioned tasks. An additional goal is to provide a large number of pre-built models for a variety of languages, as well as the annotated text resources that those models are derived from.

General Library Structure


The Apache OpenNLP library contains several components that together enable one to build a full natural language processing pipeline. These components include the sentence detector, tokenizer, name finder, document categorizer, part-of-speech tagger, chunker, parser, and coreference resolution. Each component contains parts that execute the respective natural language processing task, train a model, and often also evaluate a model. Each of these facilities is accessible via its application program interface (API). In addition, a command line interface (CLI) is provided for convenient experimentation and training.
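The pipeline idea can be sketched in plain Java: each stage consumes the output of the previous one, so sentences feed the tokenizer and tokens feed the tagger. The interfaces SentenceStage, TokenStage and TagStage, the class ToyPipeline, its regular-expression rules and its tags (CAP, WORD, PUNCT) are all hypothetical stand-ins; no OpenNLP classes are used, and real OpenNLP components are backed by trained models. Only the String and String[] hand-offs between stages reflect the API shape described in this chapter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stage interfaces; they only mirror the String / String[] hand-offs.
interface SentenceStage { String[] sentences(String text); }
interface TokenStage { String[] tokens(String sentence); }
interface TagStage { String[] tags(String[] tokens); }

class ToyPipeline {
    private final SentenceStage sentenceStage;
    private final TokenStage tokenStage;
    private final TagStage tagStage;

    ToyPipeline(SentenceStage sentenceStage, TokenStage tokenStage, TagStage tagStage) {
        this.sentenceStage = sentenceStage;
        this.tokenStage = tokenStage;
        this.tagStage = tagStage;
    }

    // Runs the stages in order and returns "token/TAG" strings, one array per sentence.
    List<String[]> run(String text) {
        List<String[]> result = new ArrayList<>();
        for (String sentence : sentenceStage.sentences(text)) {
            if (sentence.trim().isEmpty()) {
                continue; // nothing to tokenize
            }
            String[] tokens = tokenStage.tokens(sentence);
            String[] tags = tagStage.tags(tokens);
            String[] tagged = new String[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                tagged[i] = tokens[i] + "/" + tags[i];
            }
            result.add(tagged);
        }
        return result;
    }

    // Naive rule-based stages with made-up tags (CAP, WORD, PUNCT).
    static ToyPipeline rulesOnly() {
        // Split after a period that is followed by whitespace.
        SentenceStage sentences = text -> text.trim().split("(?<=\\.)\\s+");
        // Detach a sentence-final period, then split on whitespace.
        TokenStage tokens = sentence ->
                sentence.trim().replaceAll("\\.$", " .").trim().split("\\s+");
        // Tag by character class only.
        TagStage tags = toks -> Arrays.stream(toks)
                .map(t -> t.equals(".") ? "PUNCT"
                        : Character.isUpperCase(t.charAt(0)) ? "CAP" : "WORD")
                .toArray(String[]::new);
        return new ToyPipeline(sentences, tokens, tags);
    }
}
```

For example, ToyPipeline.rulesOnly().run("Alice runs. Bob sleeps.") returns two arrays, the first being Alice/CAP, runs/WORD, ./PUNCT. Replacing a rule-based stage with a model-backed one only changes the object passed to the constructor; the hand-offs stay the same.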

Application Program Interface (API). Generic Example


OpenNLP components have similar APIs. Normally, to execute a task,
one should provide a model and an input.

A model is usually loaded by passing a FileInputStream that reads the model file to the constructor of the model class:

try (InputStream modelIn = new FileInputStream("lang-model-name.bin")) {
  SomeModel model = new SomeModel(modelIn);
}

After the model is loaded, the tool itself can be instantiated.

ToolName toolName = new ToolName(model);

After the tool is instantiated, the processing task can be executed. The input and output formats are specific to the tool, but often the output is an array of String and the input is a String or an array of String.

String[] output = toolName.executeTask("This is a sample text.");
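The three generic steps can be exercised without the OpenNLP library or a trained model file. In the sketch below, ToyModel and ToyTokenizer are hypothetical stand-ins for SomeModel and ToolName: the "model" is just a plain-text list of abbreviations read from an InputStream (a ByteArrayInputStream instead of a FileInputStream, so no file is needed), the tool is constructed from the model, and the task takes a String and returns an array of String. In OpenNLP itself, pairs such as TokenizerModel and TokenizerME follow the same load, construct, execute shape, but they require the library and a trained model.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for SomeModel: a plain-text list of abbreviations,
// one per line. Real OpenNLP models are binary files produced by training.
class ToyModel {
    final Set<String> abbreviations = new HashSet<>();

    ToyModel(InputStream in) throws IOException {
        // The reader is not closed here: the caller owns the stream.
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                abbreviations.add(line.trim());
            }
        }
    }
}

// Hypothetical stand-in for ToolName: constructed from a model, runs one task.
class ToyTokenizer {
    private final ToyModel model;

    ToyTokenizer(ToyModel model) {
        this.model = model;
    }

    // Splits on whitespace and detaches a final . , ! or ?
    // unless the whole token is an abbreviation listed in the model.
    String[] tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue;
            }
            char last = tok.charAt(tok.length() - 1);
            boolean endsWithPunct = ".,!?".indexOf(last) >= 0;
            if (tok.length() > 1 && endsWithPunct && !model.abbreviations.contains(tok)) {
                out.add(tok.substring(0, tok.length() - 1));
                out.add(String.valueOf(last));
            } else {
                out.add(tok);
            }
        }
        return out.toArray(new String[0]);
    }

    // The generic steps: load the model, create the tool, execute the task.
    static String[] demo(String modelText, String input) {
        try (InputStream modelIn =
                new ByteArrayInputStream(modelText.getBytes(StandardCharsets.UTF_8))) {
            ToyModel model = new ToyModel(modelIn);
            ToyTokenizer tokenizer = new ToyTokenizer(model);
            return tokenizer.tokenize(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ToyTokenizer.demo("Dr.\nMr.\n", "Dr. Smith arrived.") keeps Dr. intact because the model lists it, and detaches the final period from arrived. The try-with-resources block closes the model stream once the model has been built, which matches the loading idiom above.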

Command line interface (CLI)


Description

OpenNLP provides a command line script that serves as a single entry point to all included tools. The script is located in the bin directory of the OpenNLP binary distribution. Two versions are included: opennlp.bat for Windows and opennlp for Linux and compatible systems.

List of tools

The list of command line tools for Apache OpenNLP 1.9.3, together with a description of their arguments, is available in Chapter 17, The Command Line Interface.

Setting up
The OpenNLP script uses the JAVA_CMD and JAVA_HOME variables to determine which command to use to run the Java virtual machine.
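The authoritative lookup is the code in the bin/opennlp and bin/opennlp.bat scripts themselves. As an illustration only, the sketch below assumes a precedence that is common for Java launcher scripts: a non-empty JAVA_CMD is used as-is, otherwise $JAVA_HOME/bin/java, otherwise plain java found through the PATH. Both the class name JavaCommandResolver and this precedence are assumptions, not taken from the OpenNLP distribution.

```java
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical resolver; the precedence below is an assumption, not read from bin/opennlp.
class JavaCommandResolver {

    // 1. a non-empty JAVA_CMD is used as-is;
    // 2. otherwise JAVA_HOME/bin/java, if JAVA_HOME is non-empty;
    // 3. otherwise plain "java", left for the shell to find on the PATH.
    static String resolve(Map<String, String> env) {
        String javaCmd = env.get("JAVA_CMD");
        if (javaCmd != null && !javaCmd.isEmpty()) {
            return javaCmd;
        }
        String javaHome = env.get("JAVA_HOME");
        if (javaHome != null && !javaHome.isEmpty()) {
            return Paths.get(javaHome, "bin", "java").toString();
        }
        return "java";
    }

    public static void main(String[] args) {
        System.out.println(resolve(System.getenv()));
    }
}
```

Passing the environment in as a Map keeps the decision testable; main simply applies it to System.getenv().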

The OpenNLP script uses the OPENNLP_HOME variable to determine the location of the OpenNLP binary distribution. It is recommended to point this variable to the binary distribution of the current OpenNLP version and to update the PATH variable to include $OPENNLP_HOME/bin or %OPENNLP_HOME%\bin.

This configuration makes OpenNLP convenient to call. The examples below assume that it has been done.
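The configuration can be checked programmatically. The hypothetical OpenNlpSetupCheck class below (not part of OpenNLP) picks opennlp.bat on Windows and opennlp elsewhere, reports whether that launcher exists under OPENNLP_HOME/bin, and tests whether that directory is one of the PATH entries by comparing normalized absolute paths rather than raw strings, so a trailing separator or a redundant "." segment does not cause a false negative.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper for checking the OPENNLP_HOME / PATH configuration.
class OpenNlpSetupCheck {

    // opennlp.bat on Windows, the opennlp shell script on other systems.
    static String launcherName(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows") ? "opennlp.bat" : "opennlp";
    }

    // True if OPENNLP_HOME/bin is one of the PATH entries (compared as normalized paths).
    static boolean binOnPath(String opennlpHome, String path, String separator) {
        if (opennlpHome == null || path == null) {
            return false;
        }
        Path bin = Paths.get(opennlpHome, "bin").toAbsolutePath().normalize();
        for (String entry : path.split(Pattern.quote(separator))) {
            if (entry.isEmpty()) {
                continue;
            }
            try {
                if (Paths.get(entry).toAbsolutePath().normalize().equals(bin)) {
                    return true;
                }
            } catch (InvalidPathException e) {
                // Skip PATH entries that are not valid paths on this platform.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String home = System.getenv("OPENNLP_HOME");
        if (home == null) {
            System.out.println("OPENNLP_HOME is not set");
            return;
        }
        Path launcher = Paths.get(home, "bin", launcherName(System.getProperty("os.name")));
        System.out.println("launcher found: " + Files.isRegularFile(launcher));
        System.out.println("bin on PATH: "
                + binOnPath(home, System.getenv("PATH"), File.pathSeparator));
    }
}
```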

Generic Example

Apache OpenNLP provides a common command line script to access all its tools:

$ opennlp

This script prints the current version of the library and lists all available tools:

OpenNLP <VERSION>. Usage: opennlp TOOL
where TOOL is one of:
  Doccat                          learnable document categorizer
  DoccatTrainer                   trainer for the learnable document categorizer
  DoccatConverter                 converts leipzig data format to native OpenNLP format
  DictionaryBuilder               builds a new dictionary
  SimpleTokenizer                 character class tokenizer
  TokenizerME                     learnable tokenizer
  TokenizerTrainer                trainer for the learnable tokenizer
  TokenizerMEEvaluator            evaluator for the learnable tokenizer
  TokenizerCrossValidator         K-fold cross validator for the learnable tokenizer
  TokenizerConverter              converts foreign data formats (namefinder,conllx,pos) to native OpenNLP format
  DictionaryDetokenizer
  SentenceDetector                learnable sentence detector
  SentenceDetectorTrainer         trainer for the learnable sentence detector
  SentenceDetectorEvaluator       evaluator for the learnable sentence detector
  SentenceDetectorCrossValidator  K-fold cross validator for the learnable sentence detector
  SentenceDetectorConverter       converts foreign data formats (namefinder,conllx,pos) to native OpenNLP format
  TokenNameFinder                 learnable name finder
  TokenNameFinderTrainer          trainer for the learnable name finder
  TokenNameFinderEvaluator        Measures the performance of the NameFinder model with the reference data
  TokenNameFinderCrossValidator   K-fold cross validator for the learnable Name Finder
  TokenNameFinderConverter        converts foreign data formats (bionlp2004,conll03,conll02,ad) to native OpenNLP format
  CensusDictionaryCreator         Converts 1990 US Census names into a dictionary
  POSTagger                       learnable part of speech tagger
  POSTaggerTrainer                trains a model for the part-of-speech tagger
  POSTaggerEvaluator              Measures the performance of the POS tagger model with the reference data
  POSTaggerCrossValidator         K-fold cross validator for the learnable POS tagger
  POSTaggerConverter              converts conllx data format to native OpenNLP format
  ChunkerME                       learnable chunker
  ChunkerTrainerME                trainer for the learnable chunker
  ChunkerEvaluator                Measures the performance of the Chunker model with the reference data
  ChunkerCrossValidator           K-fold cross validator for the chunker
  ChunkerConverter                converts ad data format to native OpenNLP format
  Parser                          performs full syntactic parsing
  ParserTrainer                   trains the learnable parser
  ParserEvaluator                 Measures the performance of the Parser model with the reference data
  BuildModelUpdater               trains and updates the build model in a parser model
  CheckModelUpdater               trains and updates the check model in a parser model
  TaggerModelReplacer             replaces the tagger model in a parser model
All tools print help when invoked with help parameter
Example: opennlp SimpleTokenizer help
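Conceptually, a single entry point of this kind is a dispatch table from tool name to tool: no arguments print the list, a known name receives the remaining arguments, and help as the only remaining argument prints that tool's help. The sketch below models this with a hypothetical CliTool interface, a ToyCli class, and a made-up Echo tool; it is not the OpenNLP CLI implementation, and its messages only imitate the output above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical tool contract for this sketch; not an OpenNLP type.
interface CliTool {
    String description();
    String help();
    String run(String[] args);
}

class ToyCli {
    // LinkedHashMap keeps registration order, so the listing is stable.
    private final Map<String, CliTool> tools = new LinkedHashMap<>();

    void register(String name, CliTool tool) {
        tools.put(name, tool);
    }

    // Builds a listing in the style of the usage text above.
    String usage() {
        StringBuilder sb = new StringBuilder("Usage: opennlp TOOL\nwhere TOOL is one of:\n");
        for (Map.Entry<String, CliTool> e : tools.entrySet()) {
            sb.append(String.format("  %-32s%s\n", e.getKey(), e.getValue().description()));
        }
        return sb.toString();
    }

    // The first argument selects the tool; a lone "help" after it asks for the tool's help.
    String dispatch(String[] args) {
        if (args.length == 0) {
            return usage();
        }
        CliTool tool = tools.get(args[0]);
        if (tool == null) {
            return "Unknown tool: " + args[0] + "\n" + usage();
        }
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        if (rest.length == 1 && rest[0].equals("help")) {
            return tool.help();
        }
        return tool.run(rest);
    }

    // A registry with one made-up tool, used in the example below.
    static ToyCli demo() {
        ToyCli cli = new ToyCli();
        cli.register("Echo", new CliTool() {
            public String description() { return "prints its arguments (made-up example tool)"; }
            public String help() { return "Usage: opennlp Echo WORD..."; }
            public String run(String[] args) { return String.join(" ", args); }
        });
        return cli;
    }
}
```

For instance, ToyCli.demo().dispatch(new String[] {"Echo", "help"}) returns the Echo help text, mirroring the opennlp SimpleTokenizer help call above, while an empty argument array returns the listing.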

https://opennlp.apache.org/docs/1.9.3/manual/opennlp.html