Sergei V. Chekanov
Numeric Computation and Statistical Data Analysis on the Java Platform
Advanced Information and Knowledge Processing
Series editors
Lakhmi C. Jain
Bournemouth University, Poole, UK and
University of South Australia, Adelaide, Australia
Xindong Wu
University of Vermont
Information systems and intelligent knowledge processing are playing an increasing
role in business, science and technology. Recently, advanced information systems
have evolved to facilitate the co-evolution of human and information networks
within communities. These advanced information systems use various paradigms
including artificial intelligence, knowledge management, and neural science as well
as conventional information processing paradigms. The aim of this series is to
publish books on new designs and applications of advanced information and
knowledge processing paradigms in areas including but not limited to aviation,
business, security, education, engineering, health, management, and science. Books
in the series should have a strong focus on information processing—preferably
combined with, or extended by, new results from adjacent sciences. Proposals for
research monographs, reference books, coherently integrated multi-author edited
books, and handbooks will be considered for the series and each proposal will be
reviewed by the Series Editors, with additional reviews from the editorial board and
independent reviewers where appropriate. Titles published within the Advanced
Information and Knowledge Processing series are included in Thomson Reuters’
Book Citation Index.
Numeric Computation
and Statistical Data Analysis
on the Java Platform
Sergei V. Chekanov
HEP Division
Argonne National Laboratory
Lemont, IL
USA
Preface
Numerical and statistical algorithms are typically confined within a specific
programming language. For example, the R open-source data-analysis software uses a
specialized scripting language, which is an implementation of the “S” programming
language. Many commercial mathematical programs follow this trend. This book is
about a platform for statistical calculations using algorithms that are not confined
to a chosen language. For example, this platform allows mixing Python and Java
numerical libraries, or using them on their own. One can also use this book to
write statistical code in other languages, such as Groovy, Ruby, and
BeanShell. This book is about an approach to scientific programming and
visualization that does not set strict requirements on specific programming languages, nor
on the operating systems where such calculations are performed.
There are many books written about Java, one of the most popular programming
languages. There are many books written about Python, which is another very
popular programming language. This book explains how to mix them, bringing
incredible algorithmic power and cutting-edge numeric libraries to scientific
computations and data visualization.
In this book I did not go deep into any particular area of scientific research, since the
aim was to give concrete examples that illustrate which Java libraries should be
used to perform computations. Where I could not cover a subject in
detail, I have given a sufficient number of relevant references, so the reader can easily
find the necessary information for each chapter in external sources.
Thus this book presents practical approaches to numerical computations, data
analysis, and knowledge discovery, focusing on programming techniques. Each
chapter describes the conceptual underpinning for numerical and statistical
calculations using Java libraries, covering many aspects from simple multidimensional
arrays and histograms to clustering analysis, curve fitting, neural networks, and
symbolic calculations. To make the examples as simple as possible from the
computational point of view, I fully embrace the scripting approach in the course of
this book. This leads to short and clear analysis code, so you can concentrate on
the logic of the analysis flow rather than on language-specific details.
This book uses Python as the main programming language, since it is elegant and
easy to learn. It is a great language for teaching scientific computation. For
developers, it is an ideal language for fast prototyping and debugging. The book
discusses how to design code snippets for numeric computation and statistics on the
Java platform. To be more exact, we will use Jython (Python implemented in Java), a
language that can use not only native Python modules but also the very
comprehensive Java classes. The reader will learn how to write analysis code, while
numerous code snippets will give some ideas on numeric algorithms that can
easily be incorporated into realistic research applications. The book includes more
than 300 code snippets to produce data-visualization plots in 2D and 3D.
I am almost convinced that this book is self-contained and does not
depend on detailed knowledge of any computing language, although knowledge of
Python and Java is desirable. However, the reader may still need some programming
background in order to use this book with other languages, such as Groovy,
BeanShell, and Ruby, since I did not give very detailed coverage of these languages.
This book is intended for a general audience: those who use computing to make
sense of the data surrounding us. It can be used as a source of knowledge on data
analysis and statistical calculations for students and professionals of all disciplines.
This book was written for undergraduate and graduate students, academics,
professors, and professionals of any field and any age. The book could be used as a
textbook for students.
We also hope that this book will be useful for those who study financial markets,
since the numeric algorithms discussed in this book are undoubtedly common to any
knowledge discovery research. This book equips readers with the description of a
computational platform for statistical calculations which can be viewed as an
inexpensive alternative to costly commercial products used by financial-market analysts.
I assume the readers are not familiar with Python/Jython, the main programming
language used for code snippets in this book. But some basic understanding of
statistics and mathematics would be very helpful to understand the material of this
book.
All example codes of this book can easily be transformed to Java, Groovy,
Ruby/JRuby, or BeanShell code. You are presumed to have knowledge of
programming in Java if you choose the path of moving the examples to Java, or if
you decide to create Java libraries to be deployed as jar files for a new project.
The book will discuss how to do this, and a few Java examples will be provided.
Transformations of the example snippets to scripting languages, such as Groovy,
Ruby/JRuby, or BeanShell, may require some knowledge of these scripting
languages. The good thing is that the analysis algorithms and numerical libraries
will be exactly the same, so little effort is required to move to other languages.
Again, we will show you how to convert Jython code to these languages. In most
cases, our examples should be sufficient to get started with a new language. The
more knowledge about Groovy and Ruby/JRuby you can bring, the more you will
get out of this book.
Acknowledgments
This book describes software that is the collective work of many developers who
have dedicated themselves to scientific computing. The author is grateful to all
the people who contributed to scientific software, for their inspiration and
dedication to science and knowledge-discovery software.
Many numeric and graphic libraries discussed in this book were released as
open-source projects. I am grateful to the authors of such open-source programs for
their enthusiasm to share their work, and for making their software publicly
available.
You can find a list of contributions to the software packages described in this
book on the jWork.ORG web page (http://jwork.org/dmelt/). A special note of
thanks to those of you who reported bugs in a constructive way, helped with
solutions, and shared your knowledge and experience with others.
Much of this project grew out of fruitful collaboration with many of my
colleagues who devoted themselves to high energy physics. Over the course of the past
twenty-five years I have learned a lot about the programming aspects of scientific
research. I would like to thank my colleagues for checking and debugging the
examples shown in this book; the list of names would be endless.
I would like to thank everyone at Springer for their help with the production
process, in particular the managing editors H. Desmond and J. Robinson, who helped
start this book in its present form.
Not least, personal thanks go to my dear wife, Tania, and my sons, Alexey
(Alosha) and Roman, for their love and patience with a husband and father who was
only half (mentally) present after coming home from work. Without their patience and
understanding, this book would not have been possible. Finally, I also thank my
parents and sister for their support of my interests in all aspects of science.
Contents
2 Introduction to Jython
  2.1 Code Structure and Jython Objects
    2.1.1 Numbers as Objects
    2.1.2 Formatted Output
    2.1.3 Mathematical Functions
  2.2 Complex Numbers
  2.3 Strings as Objects
  2.4 Import Statements
    2.4.1 Executing Native Applications
  2.5 Comparison Tests and Loops
    2.5.1 The “if-else” Statement
    2.5.2 Loops. The “for” Statement
    2.5.3 The “continue” and “break” Statements
    2.5.4 Loops. The “while” Statement
  2.6 Collections
    2.6.1 Lists
    2.6.2 Tuples
    2.6.3 Dictionaries
    2.6.4 Functional Programming
  2.7 Java Collections in Jython
    2.7.1 List. An Ordered Collection
    2.7.2 Set. A Collection Without Duplicate Elements
    2.7.3 SortedSet. Sorted Unique Elements
    2.7.4 Map. Mapping Keys to Values
    2.7.5 Java Map with Sorted Elements
    2.7.6 Real-Life Example: Sorting and Removing Duplicates
  2.8 Random Numbers
  2.9 Time Module
    2.9.1 Benchmarking
  2.10 Python Functions and Modules
  2.11 Python Classes
    2.11.1 Initializing a Class
    2.11.2 Classes Inherited from Other Classes
    2.11.3 Java Classes in Jython
    2.11.4 Not Covered Topics
  2.12 Parallel Computing and Threads
  2.13 Arrays in Jython
    2.13.1 Array Conversion and Transformations
    2.13.2 Performance Issues
    2.13.3 Used Memory
  2.14 Exceptions in Python
Index
Conventions and Acronyms
This book uses the following typographical convention: A box with code inside
usually means interactive Python/Jython commands typed in the “Jython Shell.” All
such commands start with the symbol >>>, which is the usual Python prompt
for typing a command. This is shown in the example below:
Working interactively with the Jython prompt has the drawback that it is
impossible to save typed commands. In most cases, the code snippets are not so
short, although they are still much shorter than in any other programming language.
Therefore, it is desirable to save the typed code in a file for further modification and
execution. In this case, we use Jython macro files, i.e., we write the code using the
DMelt (or any other) editor [15], save it in a file with the extension “.py”, and run it
using the keyboard shortcut [F8] or the “run” button from the DMelt tool bar
menu. Such code examples are also shown inside a box, but code lines do not
start with the Python prompt symbol >>>. In such situations, the example
code will be shown as:
For examples written in the Python language, double quotes and single quotes
(apostrophes) are interchangeable. For Java and other languages, this is not the case.
So, to make our code easily convertible to Java or Groovy, we will use double quotes around
strings. As in the above example, we will try to comment code lines as much as we
can. For Python, comments are preceded by the hash character (#).
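A minimal sketch of these two conventions, written so that it should run under both Jython and standard Python. The variable names and the string are illustrative assumptions, not taken from the book:

```python
# Strings in Python may be written with double or single quotes.
greeting_double = "Hello, Jython"   # double quotes: the form preferred in this book
greeting_single = 'Hello, Jython'   # single quotes: equivalent in Python

# Both literals describe the same string, so the comparison holds.
same = (greeting_double == greeting_single)
print(same)
```

Everything after a hash character on a line is ignored by the interpreter, which is why the trailing remarks above do not affect the result.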
If a code snippet is used as a Python/Jython module by other programs, then we
should write our code inside a file. Python code always imports an external
module using its file name. Since the file names are important, we will indicate
exactly which file name should be used under the box with the code. For example, if a
program is considered a module that has to be imported by another code
example, we will show it as:
imports the file “hello.py” and executes it, printing the string. In other cases, we
will use arbitrary file names for the code snippets.
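The link between a file name and a module name can be sketched as follows. This is only an illustration under stated assumptions: the file name "hello_demo.py", its contents, and the use of a temporary directory are hypothetical and chosen so that the script is self-contained.

```python
import os
import sys
import tempfile

# Create a throwaway directory that holds a tiny module file.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "hello_demo.py")
with open(module_path, "w") as module_file:
    module_file.write('message = "Hello from a module"\n')
    module_file.write('print(message)\n')

# Make the directory importable, then import the module by its file name
# without the ".py" extension. Importing runs the module's top-level code.
sys.path.insert(0, module_dir)
import hello_demo

print(hello_demo.message)
```

Because the import statement uses the file name, renaming the file means changing every import that refers to it.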
We use typewriter font for Jython and Java classes and methods. For file
names and directories, we also use the same font style with additional parentheses.
Recall that the directory name separators are backslashes on
Windows, and forward slashes on Linux and Mac computers. For example, the directory
with examples will be shown as:
macro/examples/
For Windows computers, the same directory should be shown as:
macro\examples\
The dots in this example are used to indicate the upper-level directory.
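In a script, the separator does not have to be typed by hand. The sketch below uses the standard Python modules posixpath and ntpath to show both conventions side by side, and os.path for the machine running the script; the directory names simply repeat the example above.

```python
import ntpath      # path rules used on Windows
import os
import posixpath   # path rules used on Linux and Mac

parts = ["macro", "examples"]

linux_path = posixpath.join(*parts)    # joined with forward slashes
windows_path = ntpath.join(*parts)     # joined with backslashes

print(linux_path)
print(windows_path)

# os.path applies the rules of the current operating system,
# so portable code can call os.path.join and let it choose the separator.
local_path = os.path.join(*parts)
print(local_path)
```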
We will try to avoid using abbreviations. When we use abbreviations, we will
explain their meaning directly in the text. When space allows, we will use
meaningful names for variables. This is all.
Chapter 1
Java Computational Platform
1.1 Introduction
Java is both a programming language and a computing platform which runs Java
code. This book uses both. But the Java programming language is not necessary
for the approach adopted in this book, since the Java platform allows the usage of
scripting languages, such as Jython/Python, Groovy, Ruby/JRuby, BeanShell, and
others.
The heart of the Java platform is the Java Virtual Machine (JVM), which runs
programs that have been converted to Java bytecode. The conversion to bytecode is done
by the Java compiler. Bytecode is the optimized, efficient machine language of the
JVM. The JVM reads this bytecode, interprets it, and executes the program.
In fact, even if you write your code using other programming languages, such as
Python and Groovy, which are simpler than the Java language, your code will still
be converted to Java bytecode.
The JVM is ported to different platforms and insulates the program from the
underlying hardware and operating system. Thus it provides hardware- and
operating-system independence. The Java application programming interface (API)
is also a part of the Java platform. Java API classes are used for building software
applications.
First, let us discuss the Java programming language, one of the most popular
object-oriented programming languages in use. SourceForge statistics report
that the number of open-source applications written in Java is close to the number
written in C++. According to the TIOBE software index (http://www.tiobe.com/), a
© Springer International Publishing Switzerland 2016
S.V. Chekanov, Numeric Computation and Statistical Data Analysis
on the Java Platform, Advanced Information and Knowledge Processing,
DOI 10.1007/978-3-319-28531-3_1
is as fast as for any other program. The JIT compilation converts Java bytecode into
native machine code at runtime. The conversion step itself can be slow; however, this
matters little for numerical calculations involving large loops, since the cost of JIT
compilation is paid once while the compiled code is reused on every iteration.
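One rough way to look for such a warm-up effect is to time the same loop several times in a row. The sketch below is only an illustration: the function name, the loop size, and the number of passes are arbitrary assumptions. Under Jython on a JVM with JIT compilation the later passes may run faster than the first one, while under standard Python no such trend is expected.

```python
import time

def sum_of_squares(n):
    """Return 0^2 + 1^2 + ... + (n-1)^2 using an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# Time several identical passes over the same loop.
timings = []
for attempt in range(5):
    start = time.time()
    result = sum_of_squares(200000)
    timings.append(time.time() - start)

for attempt, seconds in enumerate(timings):
    print("pass %d: %.4f s" % (attempt, seconds))
```

The measured times depend on the machine and the interpreter, so they should be read as a qualitative indication only.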
One should, however, mention that Java uses more memory than C or FORTRAN.
The main reason is that the JVM does a lot of internal bookkeeping for garbage collection,
program optimization at runtime, and safeguarding the Java program.
Still, it is better to assign such tasks to the JVM: people who use Java
will have more time to think about numeric algorithms and how to advance their
respective applied disciplines.
Numerical and statistical calculations explained in this book use the DataMelt
(DMelt for short) software platform [1] that runs on the Java platform. It is a
collection of libraries integrated with different programming languages. Unlike other
statistical programs, it is not limited to a single programming language: DMelt can be
used with several scripting languages such as Python/Jython, Groovy, Ruby/JRuby,
as well as with Java. Generally, the DMelt computational platform extends the
standard Java software platform in several areas:
• Adds support for the Jython, Groovy, JRuby, BeanShell, and GNU Octave
high-level scripting languages.
• Adds an IDE and interactive shells to work with these scripting languages and
with Java. It also adds support for processing programs on the command line (i.e., in
batch mode).
• Adds comprehensive Java libraries for numeric computation and visualization,
incorporating free scientific packages from more than a hundred Java developers
around the world. At the time of writing, DMelt includes more
than 30,000 Java classes from more than a hundred open-source Java libraries.
• Includes online resources for library updates, class documentation, and
example databases. The Web-based package descriptions are directly accessible
from the DMelt IDE. We will discuss this topic later.
Figure 1.1 illustrates the DMelt program structure. DMelt includes support for
several scripting languages that can be run on the JVM, third-party numerical libraries
integrated with the IDE, and online services for updates and documentation.
DMelt was designed to enable researchers to spend their time thinking about
problems and their solutions, rather than diving into low-level coding using
programming languages. DMelt analysis macros for data manipulations are based on
Jython, an implementation of the high-level language Python. Thus, one can fully
benefit from a variety of programming possibilities offered by Python, including its
syntax clarity and high-level libraries. But Jython is not a prerequisite for this
framework: Java and other languages supported by DMelt can also be used to access the
mathematical and graphical libraries of DMelt.
We should immediately warn you: the DMelt numerical and graphical libraries can
be considered neither the most efficient nor error-free. The code of DMelt does not
always follow the coding recommendations for Java developers, including naming
conventions and code layout. We even admit that some parts were not designed
with the highest possible performance for code execution in mind. The reason is
simple: it was not written by professional programmers. The numerical libraries
were written by many people at different times. Most of them were students and
scientists who had to develop numerical and data visualization algorithms for their
own research programs, since commercial software companies either could not offer
similar programs or their products were too expensive. Many contributed packages
were discontinued many years ago, but were brought back to life after their
inclusion into DMelt. In addition, some packages were written using Java 1.1, and
this also had some impact on the coding style of certain libraries.
Thus, a professional programmer may immediately find some parts of the code
that look unprofessionally written. This is true even for some examples shown in
this book. This is not because we were unaware of such coding
issues. In some cases, we did not find compelling reasons to keep very strict coding
standards at the expense of simplicity. For example, in most cases, we import all
classes inside a package using the statement:
We did not enforce the latter case to keep the examples of this book short and
concise, so we could fit the code snippets into the pages of this book. Also, it is possible
that you may not like to type long lists of imported classes during code
prototyping (personally, I do not like this style), since this can be done later during code
deployment.
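The trade-off between importing everything from a package and listing each imported name can be sketched with the standard math module. The choice of math as a stand-in for a numerical package is an assumption made here so that the example runs anywhere; the variable names are likewise illustrative.

```python
# Style 1: import every public name from a package (short, handy for prototyping).
from math import *

value_wildcard = pi * sqrt(16.0)   # pi and sqrt arrive through the wildcard

# Style 2: import only the names that are used (longer, but explicit).
from math import pi, sqrt

value_explicit = pi * sqrt(16.0)

# Both styles give access to the same objects.
print(value_wildcard == value_explicit)
```

The explicit form documents where each name comes from, which matters more once a script grows into a deployed library.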
A professional programmer might find some other oddities, such as why some object
containers are designed to store only double values (like the P1D class to be
discussed below), while it would be more practical to store integer values when necessary.
Again, this was not an omission. The reason was that the reader
may not want to dive into the extra complexity of dealing with different types, since
integers are only a subset of float values. There are plenty of other classes which are
well suited for storing integer values (we will discuss them in this book).
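The statement that integers fit inside double values can be checked directly, along with its limit. The sketch below uses the standard Python array module as a generic double-only container; it is an assumption-laden stand-in and does not use or imitate the P1D class itself.

```python
from array import array

# A container of C doubles: integers are converted to 1.0, 2.0, 3.0 on entry.
values = array("d", [1, 2, 3])

# Each stored value still equals the integer it came from.
all_exact = all(v == int(v) for v in values)
print(all_exact)

# The conversion is exact only up to 2**53. Beyond that, neighbouring
# integers are rounded onto the same double value.
limit = 2 ** 53
exact_at_limit = (float(limit) == limit)
exact_past_limit = (float(limit + 1) == limit + 1)
print(exact_at_limit)
print(exact_past_limit)
```

For typical counts and indices this limit is far away, which supports the design choice described above; very large integer identifiers are the case where a dedicated integer container is safer.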
The main motivation for the DMelt project was to develop an accessible and
friendly tool to be used in scientific research, with a syntax oriented toward scientists
rather than programmers. The design of this project was mainly motivated by
simplicity: there are many programming languages that take many years to learn
before one can start writing useful scientific and engineering projects. The
approach discussed in this book is very different: generally, the reader does not need
to know any programming language to start writing analysis code using DMelt
libraries. However, if it happens that the reader knows either Java or Python (or
both) already, he or she will also find this book interesting, since DMelt is not
just a simplified entry to the world of Java and Python computer programming. It
shows how to use programming for practical purposes such as numeric calculations,
statistics, and data analysis.
The reader may also notice that little attention has been paid to how to write and
use Java or Jython classes. Of course, classes are necessary for any object-oriented
language. The reason for this is the following: for the majority of scientific data
analysis programs, the logic of scripting programs is linear, i.e., an analysis code
typically consists of a well-defined sequence of statements to be evaluated one by
one, from the top to the bottom of the code. It is very unlikely that data analysis logic
will contain highly parallel algorithmic branches like those in the usual graphical
user interface (GUI) development.1 Certainly, classes are necessary when one
develops Java libraries to be used by a scripting language. But, in this book, we
mainly concentrate on scripting examples based on the existing Java libraries of
DMelt, rather than discussing how to write classes for numerical computation to be
deployed as external libraries.
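A linear analysis script of the kind described here might look like the following sketch. It uses only the Python standard library, and the data values are made up for illustration; it is not an example from the book or from the DMelt libraries.

```python
import math

# Step 1: the input data (made-up measurements).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Step 2: the mean.
mean = sum(data) / len(data)

# Step 3: the population standard deviation.
variance = sum((x - mean) ** 2 for x in data) / len(data)
std_dev = math.sqrt(variance)

# Step 4: report the result.
print("mean = %.2f, std = %.2f" % (mean, std_dev))
```

Each statement depends only on the ones above it, so the script reads from top to bottom without any class definitions.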
1.1.4 Errors
This book may contain typos, omissions, or even errors. DMelt can also contain
bugs. If you notice any errors or if you have suggestions regarding the book and
code examples, I would be happy to hear from you. You can send your comments
to:
One can also post bug reports to the DMelt forum accessible from the main Web
page:
http://jwork.org/dmelt/
DMelt is not software that stands still. This book represents a snapshot
of the time when DMelt version 1.4 was in use, so some examples
may fall out of date. The reader is therefore encouraged to look at the Web page
given above to find corrected examples.
1 We should probably say that this may not be totally true in the future, when multi-core machines will
be rather common and one will face the question of how to parallelize analysis code to gain
high performance. We briefly discuss this topic in this book.