Practical Java Machine Learning: Projects with Google Cloud Platform and Amazon Web Services
By Mark Wickham
About this ebook
Practical Java Machine Learning includes multiple projects, with particular focus on the Android mobile platform and features such as sensors, camera, and connectivity, each of which produces data that can power unique machine learning solutions. You will learn to build a variety of applications that demonstrate the capabilities of the Google Cloud Platform machine learning API, including data visualization for Java; document classification using the Weka ML environment; audio file classification for Android using ML with spectrogram voice data; and machine learning using device sensor data.
After reading this book, you will come away with case study examples and projects that you can use as templates for re-use and exploration in your own machine learning programming projects with Java.
What You Will Learn
- Identify, organize, and architect the data required for ML projects
- Deploy ML solutions in conjunction with cloud providers such as Google and Amazon
- Determine which algorithm is the most appropriate for a specific ML problem
- Implement Java ML solutions on Android mobile devices
- Create Java ML solutions to work with sensor data
- Build Java streaming-based solutions
Who This Book Is For
Experienced Java developers who have not implemented machine learning techniques before.
© Mark Wickham 2018
Mark Wickham, Practical Java Machine Learning, https://doi.org/10.1007/978-1-4842-3951-3_1
1. Introduction
Mark Wickham, Irving, TX, USA
Chapter 1 establishes the foundation for the book.
It describes what the book will achieve, who the book is intended for, why machine learning (ML) is important, why Java makes sense, and how you can deploy Java ML solutions.
The chapter includes the following:
A review of all the terminology of AI and its sub-fields, including machine learning
Why ML is important and why Java is a good choice for implementation
Setup instructions for the most popular development environments
An introduction to ML-Gates, a development methodology for ML
The business case for ML and monetization strategies
Why this book does not cover deep learning, and why that is a good thing
When and why you may need deep learning
How to think creatively when exploring ML solutions
An overview of key ML findings
1.1 Terminology
As artificial intelligence and machine learning have surged in popularity, a lot of confusion has arisen around the associated terminology. It seems that everyone uses the terms differently and inconsistently.
Here are quick definitions for some of the abbreviations used in the book:
Artificial intelligence (AI): Anything that pretends to be smart.
Machine learning (ML): A generic term that includes the subfields of deep learning (DL) and classic machine learning (CML).
Deep learning (DL): A class of machine learning algorithms that utilize neural networks.
Reinforcement learning (RL): A learning style in which the learner receives feedback, but not necessarily for each input.
Neural networks (NN): A computer system modeled on the human brain and nervous system.
Classic machine learning (CML): A term that more narrowly defines the set of ML algorithms that excludes the deep learning algorithms.
Data mining (DM): Finding hidden patterns in data, a task typically performed by people.
Machine learning gate (MLG): The book will present a development methodology called ML-Gates. The gate numbers start at ML-Gate 5 and conclude at ML-Gate 0. MLG3, for example, is the abbreviation for ML-Gate 3 of the methodology.
Random Forest (RF) algorithm: An ensemble learning method for classification, regression, and other tasks that operates by constructing multiple decision trees at training time.
Naive Bayes (NB) algorithm: A family of probabilistic classifiers based on applying Bayes’ theorem with strong (naive) independence assumptions between the features.
K-nearest neighbor (KNN) algorithm: A non-parametric method used for classification and regression where the input consists of the k closest training examples in the feature space.
Support vector machine (SVM) algorithm: A supervised learning model, with associated learning algorithms, that analyzes data for classification and regression.
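To make one of these definitions concrete, the following is a minimal sketch of the k-nearest neighbor idea in plain Java. It is not the implementation used later in the book (which relies on Weka); the class name `KnnSketch`, the `Example` type, the Euclidean distance measure, and the toy data are all assumptions chosen for illustration. It assumes Java 9 or later for `List.of`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal k-nearest-neighbor classifier; all names here are illustrative. */
class KnnSketch {

    /** One labeled training example: a feature vector plus its class label. */
    static final class Example {
        final double[] features;
        final String label;

        Example(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Euclidean distance between two feature vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Predicts a label by majority vote among the k closest training examples. */
    static String classify(List<Example> training, double[] query, int k) {
        // Sort a copy of the training set by distance to the query point.
        List<Example> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble((Example e) -> distance(e.features, query)));

        // Count votes among the k nearest; the first label to reach the top count wins ties.
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (Example e : sorted.subList(0, Math.min(k, sorted.size()))) {
            int count = votes.merge(e.label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = e.label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data: two clusters in a two-dimensional feature space.
        List<Example> training = List.of(
            new Example(new double[] {1.0, 1.0}, "low"),
            new Example(new double[] {1.2, 0.8}, "low"),
            new Example(new double[] {5.0, 5.1}, "high"),
            new Example(new double[] {4.8, 5.3}, "high"));
        System.out.println(classify(training, new double[] {1.1, 0.9}, 3));
    }
}
```

The method is called non-parametric because there is no training step that fits parameters: the "model" is simply the stored training examples, and all the work happens at prediction time.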
Much of the confusion stems from the various factions or domains that use these terms. In many cases, they created the terms and have been using them for decades within their domain.
Table 1-1 shows the domains that have historically claimed ownership of each of the terms. The terms are not new. Artificial intelligence is a general term that dates back to the 1950s.
Table 1-1
AI Definitions and Domains
The definitions in Table 1-1 represent my consolidated understanding after reading a vast amount of research and speaking with industry experts. You can find huge philosophical debates online supporting or refuting these definitions.
Do not get hung up on the terminology. Usage of the terms often comes down to the domain perspective of the entity involved. A mathematics major who is doing research on DL algorithms will describe things differently than a developer who is trying to solve a problem by writing application software. The following is a key distinction from the definitions:
Data mining is all about humans discovering the hidden patterns in data, while machine learning automates the process and allows the computer to perform the work through the use of algorithms.
It is helpful to think about each of these terms in the context of infrastructure and algorithms.
Figure 1-1 shows a graphical representation of these relationships. Notice that statistics are the underlying foundation, while artificial intelligence on the right-hand side includes everything within each of the additional subfields of DM, ML, and DL.
Machine learning is all about the practice of selecting and applying algorithms to our data.
I will discuss algorithms in detail in Chapter 3. The algorithms are the secret sauce that enables the machine to find the hidden patterns in our data.
Figure 1-1
Artificial intelligence subfield relationships
1.2 Historical
The term artificial intelligence is hardly new. It has actually been in use since the 1950s. A quick scan of reference books will provide a variety of definitions that have in fact changed over the decades. Figure 1-2 shows a representation of 1970s AI, a robot named Shakey, alongside a representation of what it might look like today.
Figure 1-2
AI, past and present
Most historians agree that there have been a couple of AI winters.
They represent periods of time when AI fell out of favor for various reasons, something akin to a technological ice age.
They are characterized by a trend that begins with pessimism in the research community, followed by pessimism in the media, and finally by severe cutbacks in funding. These periods, along with some historical context, are summarized in Table 1-2.
Table 1-2
History of AI and Winter Periods
It is important to understand why these AI winters happened. If we are going to make an investment to learn and deploy ML solutions, we want to be certain another AI winter is not imminent.
Is another AI winter on the horizon? Some people believe so, and they raise three possibilities:
Blame it on statistics: AI is headed in the wrong direction because of its heavy reliance on statistical techniques. Recall from Figure 1-1 that statistics are the foundation of AI and ML.
Machines run amok: Top researchers suggest another AI winter could happen because misuse of the technology will lead to its demise. In 2015, an open letter to ban development and use of autonomous weapons was signed by Elon Musk, Stephen Hawking, Steve Wozniak, and 3,000 AI and robotics researchers.
Fake data: Data is the fuel for machine learning (more about this in Chapter 2). Proponents of this argument suggest that ever increasing entropy will continue to degrade global data integrity to a point where ML algorithms will become invalid and worthless. This is a relevant argument in 2018. I will discuss the many types of data in Chapter 2.
It seems that another AI winter is not likely in the near future because ML is so promising and because of the availability of high-quality data with which we can fuel it.
Much of our existing data today is not high quality, but we can mitigate this risk by retaining control of the source data our models will rely upon.
Cutbacks in government funding caused the previous AI winters. Today, private sector funding is enormous. Just look at some of the VC funding being raised by AI startups. Similar future cutbacks in government support would no longer have a significant impact. For ML, it seems the horse is out of the barn for good this time around.
1.3 Machine Learning Business Case
Whether you are a freelance developer or you work for a large organization with vast resources available, you must consider the business case before you start to apply valuable resources to ML deployments.
Machine Learning Hype
ML is certainly not immune from hype. The book preface listed some of the recent hype in the media. The goal of this book is to help you overcome the hype and implement real solutions for problems.
ML and DL are not the only recent technology developments that suffer from excessive hype. Each of the following technologies has seen some recent degree of hype:
Virtual reality (VR)
Augmented reality (AR)
Bitcoin
Blockchain
Connected home
Virtual assistants
Internet of Things (IoT)
3D movies
4K television
Machine learning (ML)
Deep learning (DL)
Some technologies become widespread and commonly used, while others simply fade away. Recall that just a few short years ago 3D movies were expected to totally overtake traditional films for cinematic release. It did not happen.
It is important for us to continue to monitor the ML and DL technologies closely. It remains to be seen how things will play out, but ultimately, we can convince ourselves about the viability of these technologies by experimenting with them, building, and deploying our own applications.
Challenges and Concerns
Table 1-3 lists some of the top challenges and concerns highlighted by IT executives when asked what worries them the most when considering ML and DL initiatives. As with any IT initiative, there is an opportunity cost associated with implementing it, and the benefit derived from the initiative must outweigh the opportunity cost, that is, the cost of forgoing another potential opportunity by proceeding with AI/ML.
Fortunately, there are mitigation strategies available for each of the concerns. These strategies, summarized below, are available even to small organizations and individual freelance developers.
Table 1-3
Machine Learning Concerns and Mitigation Strategies
Using the above mitigation strategies, developers can produce some potentially groundbreaking ML software solutions with a minimal learning curve investment. It is a great time to be a software developer.
Next, I will take a closer look at ML data science platforms. Such platforms can help us with the goal of monetizing our machine learning investments. The monetization strategies can further alleviate some of these challenges and concerns.
Data Science Platforms
If you ask business leaders about their top ML objectives, you will hear variations of the following:
Improve organizational efficiency
Make predictive insights into future scenarios or outcomes
Gain a competitive advantage by using AI/ML
Monetize AI/ML
Whether you work for an organization or as an individual freelance developer, monetization is one of the most important objectives.
Regardless of organizational size, monetizing ML solutions requires two building blocks: deploying a data science platform, and following an ML development methodology.
When it comes to data science platforms, there are myriad options. It is helpful to think about them by considering a build vs. buy decision process. Table 1-4 shows some of the typical questions you should ask when making the decision. The decisions shown are merely guidelines.
Table 1-4
Data Science Platform: Build vs. Buy Decision
So what does it actually mean to buy a data science platform? Let’s consider an example.
You wish to create a recommendation engine for visitors to your website. You would like to use machine learning to build and train a model using historical product description data and customer purchase activity on your website. You would then like to use the model to make real-time recommendations for your site visitors. This is a common ML use case. You can find offerings from all of the major vendors to help you implement this solution. Even though you will be building your own model using the chosen vendor’s product, you are actually buying the solution from the provider. Table 1-5 shows how the pricing might break down for this project for several of the cloud ML providers.
Table 1-5
Example ML Cloud Provider Pricing (sources: https://cloud.google.com/ml-engine/docs/pricing, https://aws.amazon.com/aml/pricing/, https://azure.microsoft.com/en-us/pricing/details/machine-learning-studio/)
In this example, you accrue costs because of the compute time required to build your model. With very large data sets and construction of deep learning models, these costs become significant.
Another common example of buying an ML solution is accessing a prebuilt model using a published API. You can use this method for image detection or natural language processing where huge models exist which you can leverage simply by calling the API with your input details, typically using JSON. You will see how to implement this trivial case later in the book. In this case, most of the service providers charge by the number of API calls over a given time period.
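As a rough illustration of calling a prebuilt model with JSON input, the sketch below builds an HTTP POST request using the standard `java.net.http` client. It is not any specific vendor's API: the endpoint URL, the `image` and `maxResults` field names, and the bearer-token header are hypothetical placeholders, and real providers define their own request formats and authentication. It also assumes a desktop JDK 11 or later; the `java.net.http` package is not part of the Android platform, so an Android app would use a different HTTP client.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch of a JSON call to a prebuilt model; endpoint and fields are hypothetical. */
class PrebuiltModelCallSketch {

    // Hypothetical endpoint, used only for illustration.
    static final String ENDPOINT = "https://ml.example.com/v1/classify";

    /**
     * Builds the JSON request body by hand. Base64 text contains no characters
     * that need JSON escaping, so simple concatenation is safe here; arbitrary
     * strings would need a proper JSON library.
     */
    static String buildJsonBody(String imageBase64, int maxResults) {
        return "{\"image\":\"" + imageBase64 + "\",\"maxResults\":" + maxResults + "}";
    }

    /** Wraps the JSON body in a POST request with hypothetical auth headers. */
    static HttpRequest buildRequest(String apiKey, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
            .header("Content-Type", "application/json")
            .header("Authorization", "Bearer " + apiKey)
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }

    public static void main(String[] args) throws Exception {
        String body = buildJsonBody("aGVsbG8=", 5);
        HttpRequest request = buildRequest("YOUR_API_KEY", body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);

        // Only attempt the network call when explicitly requested, because the
        // endpoint above is a placeholder and will not return a real prediction.
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
        }
    }
}
```

Because providers typically bill per API call, each request sent this way would count toward the usage total, which is worth keeping in mind when calling the API in a loop.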
So what does it mean to build a data science platform? Building in this case refers to acquiring a software package that will provide the building blocks needed to implement your own AI or ML solution.
The following list shows some of the popular data science platforms:
MathWorks: Creators of the legendary MATLAB package, MathWorks is a long-time player in the industry.
SAP: The large database player has a complete big data services and consulting business.
IBM: IBM offers Watson Studio and the IBM Data Science Platform products.
Microsoft: Microsoft Azure provides a full spectrum of data and analytics services and resources.
KNIME: KNIME analytics is a Java-based, open, intuitive, integrative data science platform.
RapidMiner: A commercial Java-based solution.
H2O.ai: A popular open source data science and ML platform.
Dataiku: A collaborative data science platform that allows users to prototype, deploy, and run at scale.
Weka: The Java-based solution you will explore extensively in this book.
The list includes many of the popular data science platforms, and most of them are commercial. The keyword is commercial. You will take a closer look at RapidMiner later in the book because it is Java based. The other commercial solutions are full-featured and have a range of pricing options from license-based to subscription-based pricing.
The good news is you do not have to make a capital expenditure in order to build a data science platform because there are some open source alternatives available. You will take a close look at the Weka package in Chapter 3. Whether you decide to build or buy, open source alternatives like Weka are a very useful way to get started because they allow you to build your solution while you are learning, without locking you into an expensive technology solution.
ML Monetization
One of the best reasons to add ML into your projects is increased potential to monetize. You can monetize ML in two ways: directly and indirectly.
Indirect monetization: Making ML a part of your product or service.
Direct monetization: Selling ML capabilities to customers who in turn apply them to solve particular problems or create their own products or services.
Table 1-6 highlights some of the ways you can monetize ML.
Table 1-6
ML Monetization Approaches
Many of the direct strategies employ DL approaches. In this book, the focus is mainly on the indirect ML strategies. You will implement several integrated ML apps later in the book. This strategy is indirect because the ML functionality is not visible to your end user.
Customers are not going to pay more just because you include ML in your application. However, if you can solve a new problem or provide them with a capability that was not previously available, you greatly improve your chances to monetize.
There is not much debate about the rapid growth of AI and ML. Table 1-7 shows estimates from Bank of America Merrill Lynch and Transparency Market Research. Both firms show a double-digit compound annual growth rate, or CAGR. This impressive CAGR is consistent with all the hype previously discussed.
Table 1-7
AI and ML Explosive Growth
These CAGRs represent impressive growth. Some of the growth is attributed to DL; however, you should not discount the possible opportunities available to you with CML, especially for mobile devices.
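For readers unfamiliar with the metric, CAGR is the constant yearly growth rate that would take a starting value to an ending value over a given number of years: CAGR = (end / start)^(1 / years) - 1. The short sketch below computes it in Java. The class name `CagrSketch` and the input figures are made up for illustration and are not taken from the market estimates in Table 1-7.

```java
/** Minimal compound annual growth rate calculator; input figures are illustrative. */
class CagrSketch {

    /**
     * Returns the compound annual growth rate as a fraction (0.10 means 10%).
     * CAGR = (endValue / startValue)^(1 / years) - 1
     */
    static double cagr(double startValue, double endValue, double years) {
        if (startValue <= 0 || years <= 0) {
            throw new IllegalArgumentException("startValue and years must be positive");
        }
        return Math.pow(endValue / startValue, 1.0 / years) - 1.0;
    }

    public static void main(String[] args) {
        // Hypothetical example: a market that doubles from 100 to 200 units in 5 years
        // grows at roughly 14.9% per year.
        double rate = cagr(100.0, 200.0, 5.0);
        System.out.println("CAGR: " + Math.round(rate * 1000.0) / 10.0 + "%");
    }
}
```

A double-digit CAGR compounds quickly: at the rate in this made-up example, the market doubles in five years.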
The Case for Classic Machine Learning on Mobile
Classic machine learning is not a very commonly used term. I will use the term to indicate that we are excluding deep learning. Figure 1-3 shows the relationship. These two approaches employ different algorithms, and I will discuss them in Chapter 4.
This book is about implementing CML for widely available computing devices using Java. In a sense, we are going after the low-hanging fruit.
CML is much easier to implement than DL, but many of the functions we can achieve are no less astounding.
Figure 1-3
Classic machine learning relationship diagram
There is a case for mastering the tools of CML before attempting to create DL solutions. Table 1-8 highlights some of the key differences between development and deployment of CML and DL solutions.
Table 1-8
Comparison of Classic Machine Learning and Deep Learning
For mobile devices and embedded devices, CML makes a lot of sense. CML outperforms DL for smaller data sets, as shown on the left side of the chart in Figure 1-7.
It is possible to create CML models with a single modern CPU