Java-ML: A Machine Learning Library
Thomas Abeel, Yves Van de Peer, and Yvan Saeys
Abstract
Java-ML is a collection of machine learning and data mining algorithms, which aims to be a readily
usable and easily extensible API for both software developers and research scientists. The inter-
faces for each type of algorithm are kept simple and algorithms strictly follow their respective
interface. Comparing different classifiers or clustering algorithms is therefore straightforward, and
implementing new algorithms is also easy. The implementations of the algorithms are clearly writ-
ten, properly documented and can thus be used as a reference. The library is written in Java and is
available from http://java-ml.sourceforge.net/ under the GNU GPL license.
Keywords: open source, machine learning, data mining, java library, clustering, feature selection,
classification
1. Introduction
Machine learning techniques are increasingly popular in research fields like bio- and chemo-
informatics, text and web mining, as well as many other areas of research and industry. In this
paper we present Java-ML: a cross-platform, open source machine learning library written in Java.
Several well-known data mining libraries already exist, including for example, Weka (Witten
and Frank, 2005) and Yale/RapidMiner (Mierswa et al., 2006). These programs provide a user-
friendly interface and are geared towards interactive use. In contrast to these programs,
Java-ML is oriented towards developers who want to use machine learning in their own programs.
To this end, Java-ML interfaces are restricted to the essentials, and are very easy to understand. As
a result, Java-ML facilitates a broad exploration of different models, is straightforward to integrate
into your own source code, and can be easily extended.
Regarding the content of the library, Java-ML also has a different focus than the other libraries.
Java-ML contains an extensive set of similarity-based techniques and offers state-of-the-art feature
selection techniques. The large number of similarity functions allows for a broad set of clustering
and instance-based learning techniques, while the feature selection techniques are well suited to
deal with high-dimensional domains, such as the ones often encountered in bioinformatics and
biomedical applications.
    Clustering                            Classification
    ------------------------------------  ---------------------------
    K-means-like (7)                      SVM (2)
    Self-organizing maps                  Instance-based learning (4)
    Density-based clustering (3)          Tree-based methods (2)
    Markov chain clustering               Random Forests
    Cobweb                                Bagging
    Cluster evaluation measures (15)

Table 1: Overview of the main algorithms included in Java-ML. The number of algorithms for each
category is shown in parentheses.
Table 1 gives an overview of the algorithms currently included in Java-ML. The library includes a
number of well-known clustering algorithms, together with a large number of distance, similarity,
and correlation measures. Feature selection methods include traditional algorithms like symmetrical
uncertainty, gain ratio, RELIEF, and stepwise addition/removal, as well as a number of more recent
methods (SVM-RFE and random forest attribute evaluation). The recently introduced concept of
ensemble feature selection (Saeys et al., 2008) is also incorporated in the library. We have also
implemented a fast and simple random tree algorithm to cope with high-dimensional, sparse, and
ambiguous data. Finally, we provide bridges to the classification and clustering algorithms of Weka
and libsvm (Fan et al., 2005).
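As an illustration of the feature selection interface, the sketch below scores every attribute of a
data set with gain ratio. This is a minimal sketch: the GainRatio class and its build, score, and
noAttributes methods are assumed from the Java-ML API documentation and may differ between versions.

    // Given a Dataset 'data' (loaded as in the example below), score
    // each attribute with gain ratio and print the scores.
    GainRatio gr = new GainRatio();
    gr.build(data);
    for (int i = 0; i < gr.noAttributes(); i++)
        System.out.println("attribute " + i + ": " + gr.score(i));

Clustering a data set is similarly compact. The listing below sketches the example discussed next;
it assumes the FileHandler.loadDataset signature and the KMeans defaults from the Java-ML API, with
imports from net.sf.javaml.core, net.sf.javaml.clustering and net.sf.javaml.tools.data omitted.

    Dataset data = FileHandler.loadDataset(new File("iris.data"), 4, ",");
    Clusterer km = new KMeans();
    Dataset[] clusters = km.cluster(data);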
The first line uses the FileHandler utility to load data from the iris.data file. In this file, the
class label is in the fourth position and the fields are separated by commas. The second line
constructs a new instance of the KMeans clustering algorithm with default parameter values, in this
case k = 4. The third line uses the KMeans instance to cluster the data we loaded in the first line.
The resulting clusters are returned as an array of data sets.
The following example illustrates how to perform a cross-validation experiment for a specific
data set and classifier.
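In code, such an experiment looks roughly as follows. This is a sketch: the KNearestNeighbors and
CrossValidation classes and the crossValidation method are assumed from the Java-ML API documentation.

    // Load the data, build a 5-nearest-neighbors classifier,
    // and cross-validate it on the loaded data set.
    Dataset data = FileHandler.loadDataset(new File("iris.data"), 4, ",");
    Classifier knn = new KNearestNeighbors(5);
    CrossValidation cv = new CrossValidation(knn);
    Map<Object, PerformanceMeasure> p = cv.crossValidation(data);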
First we load the iris data set, and construct a K-nearest neighbors classifier, which uses 5 neigh-
bors to classify instances. In the next line, we initialize the cross-validation with our classifier. The
last line runs the cross-validation on the loaded data. By default, a 10-fold cross validation will
be performed. The result is returned in a map, which maps each class label to its corresponding
PerformanceMeasure (Map<Object,PerformanceMeasure>). For classification problems, a per-
formance measure is a wrapper around four values: (i) true positives, (ii) true negatives, (iii) false
positives and (iv) false negatives. This class also provides a number of derivative measures such as
accuracy, error rate, precision, recall, and others, as the short example below illustrates. More
advanced code samples are available from the
documentation pages on the Java-ML website.
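The loop below prints the measures for each class label. It is a sketch: the public tp and fp fields
and the getAccuracy() accessor are assumed from the PerformanceMeasure API and may differ.

    // Print per-class performance from the map returned by crossValidation().
    for (Map.Entry<Object, PerformanceMeasure> e : p.entrySet()) {
        PerformanceMeasure pm = e.getValue();
        System.out.println(e.getKey() + ": TP=" + pm.tp + ", FP=" + pm.fp
                + ", accuracy=" + pm.getAccuracy());
    }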
2.3 Documentation
There are a number of resources for documentation about Java-ML. The source code itself is
thoroughly documented, kept up to date, and accessible from the website through the API
documentation. The website additionally provides a number of tutorials with illustrated code samples for
the most common tasks in Java-ML, covering the following topics: installing the library, introduc-
ing basic concepts, creating and loading data, creating algorithms and applying them to your data,
and more advanced topics for people who would like to contribute to the library. Finally, all code
samples as well as the PDF versions of the tutorials are also included in the Java-ML distribution
itself.
3. Case Studies
The library described in this manuscript has been used in several studies. Here we highlight two
recently published applications.
Initially, the project focused on clustering algorithms and measures to evaluate the quality of
a clustering. Our goal was to separate DNA sequences that are likely to contain a promoter (the
controlling element of a gene) from other sequences, a well-known task in bioinformatics. The
best results were obtained using a clustering algorithm based on self-organizing maps (Abeel et al.,
2008).
More recently, the focus has shifted toward feature selection. More specifically, we are investigating
whether ensemble feature selection (combining different feature selectors) can improve the stability
of feature selection on high-dimensional data sets with few samples. The improvements in
stability were shown not to affect the prediction accuracy. This is ongoing research, but the first
results are promising (Saeys et al., 2008).
Acknowledgments
We thank A. De Rijcke for his early contributions to Java-ML, as well as the anonymous reviewers
for their valuable comments. TA is funded by IWT-Vlaanderen. YS would like to thank the Research
Foundation-Flanders (FWO-Vlaanderen) for funding his research.
References
Thomas Abeel, Yvan Saeys, Pierre Rouzé, and Yves Van de Peer. ProSOM: Core promoter predic-
tion based on unsupervised clustering of DNA physical profiles. Bioinformatics, 24(13):i24–i31,
July 2008.
Rong-En Fan, Pai-Hsuen Chen, and Chih-Jen Lin. Working set selection using second order
information for training support vector machines. Journal of Machine Learning Research, 6:1889–1918, 2005.
Ingo Mierswa, Michael Wurst, Ralf Klinkenberg, Martin Scholz, and Timm Euler. Yale: Rapid pro-
totyping for complex data mining tasks. In Proceedings of the 12th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD-06), 2006.
Yvan Saeys, Thomas Abeel, and Yves Van de Peer. Robust feature selection using ensemble feature
selection techniques. In Proceedings of ECML PKDD 2008, 2008.
Ian H. Witten and Eibe Frank. Data Mining: Practical machine learning tools and techniques.
Morgan Kaufmann, San Francisco, 2nd edition, 2005.