Hopkins B. ChatGPT For Java. A Hands-On Developer's Guide... 2024
ChatGPT
for Java
A Hands-on Developer’s Guide
to ChatGPT and Open AI APIs
Bruce Hopkins
Foreword by Patricia Thaine,
Co-Founder & CEO, Private AI
Beaverton, OR, USA
Table of Contents
Foreword
6. Model (Optional)
7. Temperature (Optional)
8. Maximum Length (Optional)
Try It Now! Experimenting with the “System” Role
Conclusion
Analyzing ChatGPTClientForQAandModeration.java
Using JSONPath in Order to Extract Content Quickly in JSON Files
Running Our Intelligent Q&A Bot: TechSupportBot.java
We Have a Monumental Achievement… With One Slight Flaw
Update the System Message to ChatGPT and Let’s Try Again
Conclusion
Index
About the Author
Bruce Hopkins is a technical writer and
world-renowned expert. He is both an Oracle
Java Champion as well as an Intel Software
Innovator. Bruce is also the author of the
Apress book Bluetooth for Java.
About the Technical Reviewer
Van VanArsdale is a technology leader with
over 30 years of experience in the software
industry. He holds a B.S. in Computer
Information Systems from the University of
Massachusetts Lowell and an M.S. in CIS from
Missouri State University. He has worked as
a software engineer, architect, manager, and
teacher. Van currently leads a team at a top
financial services company and is an adjunct
instructor at Missouri State University.
Foreword
Having had the pleasure of knowing Bruce for a few years now, I’ve
witnessed his ability to take cutting-edge concepts and problems and
make them accessible to programmers from a myriad of different
backgrounds and abilities. Bruce’s experience is particularly suitable for
the task of breaking down many of the latest AI technologies, like ChatGPT,
into core components that developers can use in their day-to-day
problem-solving. He is the co-author of Bluetooth for Java, which
came out just four years after Bluetooth made its way into the market,
and he is the author of numerous technical guides written for major
technology companies who needed their newest inventions to be adopted
by developers.
As the co-founder and CEO of a Microsoft-backed AI company, called
Private AI, I have the privilege of speaking with developers, managers, and
C-level executives from organizations around the world not only about the
implementation of responsible use of data but about the questions they
have around generative AI, what kinds of problems to tackle, and where
to start. Technology like ChatGPT is so new that the underlying model
architecture allowing it to perform as well as it does – the Transformer
architecture, that is – only came out in 2017 in the research paper titled
“Attention Is All You Need.” As a result, a lot of the key questions that
managers are struggling with are also questions developers are avidly
diving into, including “What can I do with this technology?”, “How and
where should I implement it?”, “Where should I start?”, “What are the
pitfalls?”, and the ever-gnawing question when a lot of attention is placed
on new inventions: “Is this all just hype?”
CHAPTER 1
Introducing ChatGPT
for Java Developers
Who Is This Book For?
First of all, this book is for Java developers who don’t have any training or
experience in artificial intelligence, natural language processing, machine
learning, or deep learning. You may have heard of the term “language
model,” but I’m going to assume that it’s NOT a term that you use
every day.
Secondly, you might be familiar with (or have tried) ChatGPT, but
you don’t quite understand how everything works “under the hood”
and you’re not sure how to get started in order to use Java and ChatGPT
programmatically together to “AI enable” your own applications and
services.
Chapter Overview
In this chapter, we’re going to dive right in and explain a few terms that
you might not be familiar with, and then go straight to the ChatGPT
Playground. The Playground is essentially a web interface for developers
that’s built on top of the ChatGPT REST API. So, once you’re familiar with the
Playground, the calls and responses from the ChatGPT API will feel like
second nature to you.
This book is filled with tons of examples and example code, so why bother
typing everything yourself? Below is the URL to the GitHub repo to access
the source code.
https://github.com/Apress/ChatGPT-for-Java
1. Singleton Pattern:
- Purpose: This pattern ensures that a class has only
one instance and provides a global point of access to that
instance. It's particularly useful when you want to control
access to shared resources, such as configuration managers or
connection pools.
- Importance: The Singleton pattern is important because it
provides a controlled mechanism to prevent creating multiple
instances of a class, which can save memory and synchronize
access to resources. Moreover, it ensures that operations using
the singleton are using the same instance and data, which is
critical in scenarios where consistent state or coordinated
actions are required.
2. Observer Pattern:
- Purpose: This pattern defines a one-to-many dependency
between objects. When one object (known as the subject
or observable) changes its state, all of its dependents
(observers) are notified and updated automatically. It's the
underlying principle behind the Model-View-Controller (MVC)
architecture, where the model is the subject and the views are
observers.
- Importance: The Observer pattern is vital for building
systems that are modular and decoupled. Changes in one part of
the system don't directly affect others, allowing for easier
maintenance and scalability. This pattern also makes it easier
to add new observers without modifying existing code, enhancing
the system's extensibility.
3. Factory Pattern:
- Purpose: The Factory pattern provides an interface for
creating instances of a class, but it's up to the derived
subclasses to implement the method and determine which class
to instantiate. It promotes the loose coupling by eliminating
the need to bind application-specific classes into the code.
Instead, the code interacts only with the resulting interface
or abstract class, so the system can evolve without altering
the calling code.
- Importance: Using the Factory pattern, a system can be
more flexible and scalable. If there's a need to introduce a
new product or modify an existing one, it can be done without
disturbing the existing code. It encapsulates the object
creation process and separates it from the main logic of
Not bad, huh? ChatGPT was not only able to provide a list of common
design patterns used by Java developers, but also explain why they are
important – all done in Natural Language.
So, at the end of the day, what does this mean for Java developers?
Well, it’s actually a big deal. That’s because developers are now able to
create applications that can have some type of “understanding” of text in
Natural Language. Before the advent of AI tools like ChatGPT, developers
typically had to create regular expressions (also called “regex”) in order to
perform rudimentary character and String pattern matching within a body
of text. However, pattern matching and Natural Language Understanding
are two totally different things.
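To make the contrast concrete, here is a minimal sketch of the regex approach. The class name, the sample sentence, and the pattern are all made up for illustration. The pattern can locate the literal words "ice cream," but it has no notion of who can or cannot eat it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example: the names and sample text are illustrative only.
final class RegexVersusUnderstanding {

    // Counts case-insensitive occurrences of the literal phrase "ice cream".
    static int countIceCreamMentions(String text) {
        Pattern pattern = Pattern.compile("ice\\s+cream", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Anna loves ice cream. Ben cannot have Ice Cream, so he had fruit.";
        // The regex finds the phrase, but it cannot tell us which child can eat it.
        System.out.println(countIceCreamMentions(text));
    }
}
```

Answering "which child can eat it?" requires understanding the sentence, which is exactly the gap that a language model fills.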
Listing 1-3. Sadstory.txt - A Sad Story About a Kid Who Didn’t Eat
Ice Cream
As you can see, ChatGPT can utilize NLP and NLU, and therefore it can
understand (artificially intelligently, that is) the context of the scenario. It
can interpret the relationships between the kids, their preferences, and
Steve’s lactose intolerance. It is able to understand that the last names of
the children, the street name, and the city name are the names of dairy
products, but are obviously irrelevant to the question at hand.
GPT-4: GPT-4 is the latest generation of OpenAI’s GPT set of models. GPT
stands for Generative Pre-trained Transformer, and these models
have been trained to understand natural language as well as multiple
programming languages. The GPT-4 models accept text and images as
prompt input and provide text as the output.
Later in this book in Chapter 4, we’re going to have some fun with
the DALL·E model to visualize the conversation of what is being
discussed in your favorite podcast episode.
Whisper: Simply stated, the Whisper model converts audio into text.
In this book, we’re going to use the Whisper model to search for text
in a podcast episode.
Embeddings: The Embeddings model can convert large amounts of text into a
numerical representation of how the Strings in the text are related.
So how is that useful? Embeddings allow developers to do specific
tasks using custom datasets. Yes, this means that you can train
the embeddings model on specific data that is relevant to your
application. This allows you to do operations such as
Moderation: The moderation models are fine-tuned models that can detect
whether text may be sensitive or unsafe. These models can analyze
text content and classify it according to the following categories:
• Hate
• Hate/threatening
• Harassment
• Harassment/threatening
• Self-harm
• Self-harm/intent
• Self-harm/instructions
• Sexual
• Sexual/minors
• Violence
• Violence/graphic
Table 1-1. List of Models with Their Token Limitations and the Cost
Per Token
Model    Max Tokens    Cost of Token (Input)    Cost of Token (Output)
[Figure: temperature scale with tick marks at 1 and 0; 0 is labeled "Not creative, very predictable"]
Figure 1-2. Modify the Temperature in Order to Get More (or Less)
Creative Responses
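One common way to picture temperature, offered here as a general illustration and not as a description of OpenAI's internals, is that the model's raw scores for candidate tokens are divided by the temperature before being converted into probabilities. The scores below are invented, and this sketch requires a temperature greater than 0.

```java
// Illustrative sketch only: the scores are made up, and this is the textbook
// "softmax with temperature" formula, not OpenAI's actual implementation.
final class TemperatureSketch {

    // Converts raw scores into probabilities; a lower temperature sharpens the distribution.
    static double[] softmax(double[] scores, double temperature) {
        if (temperature <= 0) {
            throw new IllegalArgumentException("temperature must be > 0 in this sketch");
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        double[] result = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            // Subtracting the maximum keeps Math.exp() numerically stable.
            result[i] = Math.exp((scores[i] - max) / temperature);
            sum += result[i];
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= sum;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] scores = {2.0, 1.0, 0.5};
        System.out.println(java.util.Arrays.toString(softmax(scores, 0.2)));
        System.out.println(java.util.Arrays.toString(softmax(scores, 1.0)));
    }
}
```

At a low temperature nearly all of the probability lands on the highest-scoring candidate (predictable output); at a higher temperature the alternatives get a real chance of being picked (more varied output).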
https://platform.openai.com/account/api-keys
As you can see from the image in Figure 1-3, you can name your API
key anything that you want.
Figure 1-3. Before You Can Access the Playground or Make API Calls,
You Need to Have an API Key
https://platform.openai.com/playground
Upon entering the Playground, click the combobox at the top and
select the Chat option to enter the Chat Playground, as shown in Figure 1-4.
Figure 1-4. After Entering the Playground, Select the Chat Option
Figure 1-5 depicts the Chat Playground, with certain parts numbered
so that they can be easily identified.
1. System
As you can see, the user interface for the Chat Playground is vastly more
complex than the ChatGPT website that everyone else uses. So let’s talk
about the System field (see Figure 1-5, item 1).
In my opinion, ChatGPT can be described as “a vastly powerful form
of artificial intelligence… with amnesia.” Therefore, when you’re using
ChatGPT programmatically, you need to inform the system who it is in the
conversation!
Figure 1-6, shown as follows, gives you a glimpse of the thousands of
different roles that ChatGPT can play in a conversation.
Figure 1-6. The System Field in the Chat Playground Allows You to
Set the Role That ChatGPT Will Play in the Conversation
2. User
The User field (Figure 1-5, item 2) in the ChatGPT Playground is where you
type your prompt to ChatGPT, which can be anything that you want, for
example, “describe how telemedicine will affect the medical industry.”
You may also see that some models have a month and day
associated with their name, which simply denotes a snapshot of that model.
Programmatically selecting a snapshot gives developers some
predictability in the responses that they will receive from ChatGPT,
because the models without a date are continually updated.
Try It Now! Experimenting
with the “System” Role
Now that we’re familiar with several of the features of the Chat Playground,
let’s send our first prompt using the settings discussed above. Listings 1-7
and 1-8 use the same prompt asking ChatGPT to give a few paragraphs on
telemedicine, but the system roles differ vastly between the two.
You are encouraged to try these two prompts yourself and see what the
responses are. Adjust the settings for the temperature and token length to
get familiar with how those parameters affect the outcome.
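The idea of pairing one user prompt with two different system roles can be sketched in Java as follows. The two role descriptions are hypothetical placeholders, and the JSON is assembled by hand only to keep the example free of dependencies; it assumes the inputs contain no characters that need JSON escaping.

```java
// Sketch: the same user prompt paired with two different system roles.
// The role descriptions used in main() are hypothetical examples.
final class SystemRoleSketch {

    // Builds the "messages" array for a chat request.
    // Assumes the inputs contain no quotes, backslashes, or control characters.
    static String messages(String systemRole, String userPrompt) {
        return "[{\"role\": \"system\", \"content\": \"" + systemRole + "\"}, "
             + "{\"role\": \"user\", \"content\": \"" + userPrompt + "\"}]";
    }

    public static void main(String[] args) {
        String prompt = "Describe how telemedicine will affect the medical industry.";
        System.out.println(messages("You are a cautious family doctor", prompt));
        System.out.println(messages("You are an enthusiastic startup founder", prompt));
    }
}
```

Only the system message changes between the two requests, which makes it easy to compare how the role alone shapes the answer.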
Conclusion
You just learned more about how ChatGPT can be used by developers.
We covered some of the basics of the Chat Playground, which is a web
interface for developers to interact with the ChatGPT API.
We talked about how to set the system, user, and assistant roles in the
Chat Playground and how to adjust settings such as the temperature and
maximum length of output.
You learned about some of the parameters and terminology necessary
to use the Chat Playground, such as the model, the temperature, and
tokens. Getting familiar with the parameters of the Chat Playground is
essential to knowing how to use the REST API since the Playground exposes a
subset of the capabilities offered by the REST API.
In the next chapter, we’ll see how to use ChatGPT as your “pair
programmer” and port the officially supported ChatGPT REST interface
to Java.
CHAPTER 2
Using ChatGPT
As Your Java
Pair-Programmer
I’m a big fan of some of the practices of XP (eXtreme Programming), and
especially pair-programming. No matter what flavor of pair-programming
that you prefer, it involves two engineers sitting down at the same screen
and solving the same problem together. One of the biggest benefits that
you get is a fresh set of eyes on a problem, and of course, you now have two
engineers who have “touched” the codebase instead of one. Sometimes
you can have one engineer write the code and the other write the tests and
the comments. No matter how you slice it, it’s all good stuff.
Now, the OpenAI REST APIs for ChatGPT and their other models are
officially supported in Python, TypeScript, and of course, cURL (which is
the de facto standard for REST APIs).
There are a few Java APIs that exist that were created by third-party
developers, but the biggest problem (in my opinion) is that this space is
rapidly changing. OpenAI is constantly updating both their models and
their HTTP interfaces, and as a result, they are adding or deprecating
features and functionality frequently. If you choose to use a third-party
Java API for your project, you risk the problem of using an API that is
outdated or out of sync with the OpenAI REST API.
Table 2-1. The HTTP Parameters Necessary to Call the List Models
Endpoint
HTTP Param Description
Model (JSON)
Table 2-2. The Structure of the Model JSON Object
Field Type Description
Now that we have the details of the HTTP request and response,
we can now create the ListModel.java app using the prompt shown in
Listing 2-1.
So, after sending the prompt, ChatGPT will provide you an answer as
shown in Listing 2-2.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
String inputLine;
StringBuilder response = new StringBuilder();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
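For readers who want something they can run end to end, here is a compact sketch of a List Models call. It swaps in java.net.http.HttpClient (Java 11+) in place of the HttpURLConnection approach used in the listing, and the class name, method name, and endpoint URL constant are assumptions for illustration. The API key is read from the OPENAI_API_KEY environment variable.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch only: uses java.net.http (Java 11+) in place of HttpURLConnection.
// The class and method names are illustrative.
final class ListModelsSketch {

    // Assumed URL of the List Models endpoint.
    static final String MODELS_URL = "https://api.openai.com/v1/models";

    // Builds the GET request; no network traffic happens here.
    static HttpRequest buildRequest(String apiKey) {
        return HttpRequest.newBuilder(URI.create(MODELS_URL))
                .header("Authorization", "Bearer " + apiKey)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        String apiKey = System.getenv("OPENAI_API_KEY");
        if (apiKey == null || apiKey.isBlank()) {
            System.err.println("Set the OPENAI_API_KEY environment variable first.");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(apiKey), HttpResponse.BodyHandlers.ofString());
        // Prints the raw JSON list of models.
        System.out.println(response.body());
    }
}
```

Separating request construction from sending keeps the part that needs no network easy to inspect and test.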
"object": "list",
"data": [
{
"id": "text-search-babbage-doc-001",
"object": "model",
"created": 1651172509,
"owned_by": "openai-dev"
},
{
"id": "gpt-4",
"object": "model",
"created": 1687882411,
"owned_by": "openai"
},
{
"id": "gpt-3.5-turbo-16k",
"object": "model",
"created": 1683758102,
"owned_by": "openai-internal"
},
{
"id": "curie-search-query",
"object": "model",
"created": 1651172509,
"owned_by": "openai-dev"
},
{
"id": "text-davinci-003",
"object": "model",
"created": 1669599635,
"owned_by": "openai-internal"
},
{
"id": "text-search-babbage-query-001",
"object": "model",
"created": 1651172509,
"owned_by": "openai-dev"
},
{
"id": "babbage",
"object": "model",
"created": 1649358449,
"owned_by": "openai"
},
...
Listing 2-3 is a partial list due to the sheer number of models
available for developers to use! The good news, however, is that the full
response is provided as a table in the Appendix.
Now that we can programmatically get a list of models available to
use, it’s time to send prompts to ChatGPT using Java. This is accomplished
using the Chat Endpoint.
Chat Endpoint
The Chat Endpoint (formerly called “Chat Completion”) is a REST service
that’s basically a 1-to-1 representation of what you can do in the Chat
Playground; therefore, this service should feel like second nature to you.
Table 2-4 describes the format of the JSON object necessary for the
request body for the Chat Endpoint. After a quick glance, you can see that only
a few fields are actually required in order to successfully invoke the service.
Chat (JSON)
Table 2-4. The Structure of the Chat JSON Object
Field Type Required? Description
Listing 2-4 is an example of what the JSON object would look like in
order to properly invoke the Chat Endpoint.
{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "You are a product marketer"
},
{
"role": "user",
"content": "Explain why Java is so widely used in the enterprise"
}
],
"temperature": 1,
"max_tokens": 256,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 0
}
Listing 2-5 is an example of the JSON response after invoking the Chat
Endpoint.
{
"id": "chatcmpl-7wUOFQ3S34scDLmrLdWTTqvHmXztQ",
"object": "chat.completion",
"created": 1694174199,
"model": "gpt-3.5-turbo-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Java is widely used in the enterprise
because it is platform-independent, allowing applications to
run on any system. Additionally, Java has a large and mature
Figure 2-1. The ChatGPT Tokenizer Can Give You a Quick Count of
the Tokens in Your Prompt
User:
curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "You are a product marketer"
},
{
"role": "user",
"content": "Explain why Java is so widely used in the enterprise"
}
],
"temperature": 1,
"max_tokens": 256,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 0
}'
The response from ChatGPT was a good start, but didn’t have
everything exactly as I wanted it. Now, since I didn’t specify exactly how
the HTTP calls should have been made, ChatGPT decided to use the
Apache HTTPClient library. Therefore, I added the following User message
(Listing 2-7).
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.annotation.JsonProperty;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
try {
URL url = new URL(endpoint);
HttpURLConnection connection = (HttpURLConnection)
url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type",
"application/json");
connection.setRequestProperty("Authorization",
"Bearer " + openAIKey);
connection.setDoOutput(true);
@JsonProperty("model")
private String model;
@JsonProperty("messages")
private List<Message> messages;
@JsonProperty("temperature")
private float temperature;
@JsonProperty("max_tokens")
private int max_tokens;
@JsonProperty("top_p")
private float top_p;
@JsonProperty("frequency_penalty")
private int frequency_penalty;
@JsonProperty("presence_penalty")
private int presence_penalty;
}
@JsonProperty("content")
private String content;
As you examine the preceding code listing, you see that the HTTP calls
are done using pure Java APIs and not using any external libraries – just
like I requested in the prompt. Note, however, that creating and parsing
JSON objects in pure Java can be a pain, so I personally specified that the
Jackson API should be used, which (as you can see) is reflected in the
import statements and the code itself.
The resulting code includes two inner classes, Chat and Message,
which can easily be separated into separate Java files. I can either do that
manually myself, or add a new “User” message to ChatGPT to request the
inner classes to be separated into distinct Java files.
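As an alternative to HttpURLConnection, the same call can be sketched with the JDK's built-in java.net.http.HttpClient (Java 11+). This is a minimal sketch under stated assumptions: the class and method names are illustrative, the request body is hand-written and carries only the two required fields (model and messages), and the key is read from the OPENAI_API_KEY environment variable as in the cURL example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch only: class and method names are illustrative.
final class ChatRequestSketch {

    static final String CHAT_URL = "https://api.openai.com/v1/chat/completions";

    // Minimal request body: only "model" and "messages" are supplied.
    static String minimalBody() {
        return "{\"model\": \"gpt-3.5-turbo\", \"messages\": ["
             + "{\"role\": \"system\", \"content\": \"You are a product marketer\"}, "
             + "{\"role\": \"user\", \"content\": "
             + "\"Explain why Java is so widely used in the enterprise\"}]}";
    }

    // Builds the POST request; nothing is sent until HttpClient.send() is called.
    static HttpRequest buildRequest(String apiKey, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(CHAT_URL))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String apiKey = System.getenv("OPENAI_API_KEY");
        if (apiKey == null || apiKey.isBlank()) {
            System.err.println("Set the OPENAI_API_KEY environment variable first.");
            return;
        }
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(apiKey, minimalBody()), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

In a real project the body would normally be produced by serializing the Chat and Message classes with Jackson, which also takes care of escaping special characters.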
Listing 2-9 shows the response after executing ChatGPTClient.java.
{
"id": "chatcmpl-7xIRvjByGobmWH9Vo7OObHCSSwzgI",
"object": "chat.completion",
"created": 1694366627,
"model": "gpt-3.5-turbo-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Java is widely used in the enterprise
primarily due to its numerous benefits and features that make
it a popular choice among large organizations. Here are some
key reasons why Java is so widely adopted in the enterprise:
\n\n1. Platform Independence: One of the biggest advantages of
Java is its platform independence. Java programs can run on
any operating system, making it highly adaptable across a
wide range of devices and platforms. This makes it easier
for enterprises to develop applications that can be deployed
on different systems without any major modifications.\n\n2.
Robustness and Stability: Java is known for its strong emphasis
on reliability, stability, and error handling. It has a built-in
memory management system that prevents memory leaks and
ensures robust performance. This stability is highly valued in
enterprise environments where systems need to run consistently
without disruptions.\n\n3. Scalability: Java offers excellent
scalability, making it suitable for large-scale enterprise
applications. It provides robust support for multi-threading,
allowing applications to handle a large number of concurrent
users smoothly. Java's ability to handle high traffic loads and
distribute processing across multiple servers makes it ideal
for enterprise-level systems.\n\n4. Rich Standard Library and
Frameworks: Java comes with a comprehensive standard library,
offering a wide range of pre-built functions and classes that
simplify development. Additionally, Java has a"
},
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 28,
"completion_tokens": 256,
"total_tokens": 284
}
}
Did you notice in the preceding listing that the answer to my prompt
was truncated? “Additionally, Java has a” is not a full sentence, but since I
requested that no more than 256 tokens be used in the response, the model
stopped at that limit, which is why finish_reason is "length".
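A program can check for this situation instead of relying on a human to spot the cut-off sentence. The following is a minimal sketch that pulls finish_reason out of the raw response with a regular expression; the class and method names are illustrative, and the regex is a shortcut that a proper JSON parser such as Jackson would replace in production code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: a regex shortcut for illustration, not a full JSON parser.
final class FinishReasonSketch {

    private static final Pattern FINISH_REASON =
            Pattern.compile("\"finish_reason\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the first finish_reason value found in the JSON text, or null if absent.
    static String finishReason(String json) {
        Matcher matcher = FINISH_REASON.matcher(json);
        return matcher.find() ? matcher.group(1) : null;
    }

    // True when the response stopped because it hit the max_tokens limit.
    static boolean wasTruncated(String json) {
        return "length".equals(finishReason(json));
    }

    public static void main(String[] args) {
        String sample = "{\"choices\": [{\"index\": 0, \"finish_reason\": \"length\"}]}";
        if (wasTruncated(sample)) {
            System.out.println("Response was cut off; consider raising max_tokens.");
        }
    }
}
```

When a truncated answer is detected, the caller can retry with a larger max_tokens value or ask for a shorter answer in the prompt.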
Conclusion
Contrary to popular belief, ChatGPT is not a mind-reader! It doesn’t
have the ability to replace developers and architects because it’s (gasp!)
artificially intelligent. It’s very useful to ask a single question and receive an
immediate, straightforward response. It definitely can be used to convert a
natural language prompt (or request) into code, but you definitely need a
developer to make the judgment call on whether the resulting code should be
used, refined, or completely disregarded.
CHAPTER 3
Using AI in the
Enterprise! Creating
a Text Summarizer
for Slack Messages
In today’s corporate world, it’s extremely common for companies to have
an instance of Slack (or Microsoft Teams) to organize themselves, and use
it as a central place of communication for everyone in the company. Now,
if you’ve ever used Slack before, I think you know how easily a channel can
become flooded with a ton of messages because SOME important thing
happened SOMEWHERE in the company or the world.
Of course, the more responsibility that you have within the company
(i.e., manager, team leader, architect, etc.), the more channels you’re
expected to participate in. In my opinion, Slack is a double-edged sword.
You need to use it to do your job, but as a developer, you definitely can’t
attend a daily standup meeting and say, “Yesterday, uh, I spent all day
reading Slack. No roadblocks.”
Additionally, if you work for a company with clients in various time
zones (which is quite common nowadays) it’s pretty daunting to open
Slack in the morning and see a ton of messages that were posted while you
were away from the keyboard.
this.model = model;
this.messages = messages;
this.temperature = temperature;
this.max_tokens = max_tokens;
this.top_p = top_p;
this.frequency_penalty = frequency_penalty;
this.presence_penalty = presence_penalty;
}
So, if you refer back to Table 2-4 in Chapter 2, you see that only the
model and messages parameters are actually required to successfully
invoke the Chat Endpoint. All the other parameters are optional, and some
of them have their own built-in defaults if you don’t specify anything.
Those are the reasons why we didn’t need to “model out” the entire Chat
JSON object.
So, this constructor is basically begging to be refactored using the
Builder Pattern. The Builder Pattern allows us to get an instance of the
object that we want, while ONLY specifying the parameters that we
care about.
Additionally, it makes sense that the Chat and Message objects are no
longer inner classes, and exist in their own .java files. Listing 3-2 shows
how we can get an instance of the Chat object which has been modified
using the Builder Pattern.
.presencePenalty(presence_penalty)
.build();
Since the request to the Chat Endpoint MUST have a model and
messages specified in the Chat JSON object, some defaults have been added
to the Chat.java class in order to make it safer to use (safer in the sense of
being less error prone for users of the class). Listing 3-3 is the new
Chat.java file.
import java.util.List;
import java.util.ArrayList;
import com.fasterxml.jackson.annotation.JsonProperty;
@JsonProperty("messages")
private List<Message> messages;
@JsonProperty("temperature")
private float temperature;
@JsonProperty("max_tokens")
private int max_tokens;
@JsonProperty("top_p")
private float top_p;
@JsonProperty("frequency_penalty")
private int frequency_penalty;
@JsonProperty("presence_penalty")
private int presence_penalty;
private Chat(ChatBuilder builder) {
this.model = builder.model;
this.messages = builder.messages;
this.temperature = builder.temperature;
this.max_tokens = builder.max_tokens;
this.top_p = builder.top_p;
this.frequency_penalty = builder.frequency_penalty;
this.presence_penalty = builder.presence_penalty;
}
private ChatBuilder() {
}
public ChatBuilder model(String model) {
this.model = model;
return this;
}
return this;
}
}
}
This class (with this design pattern) is flexible enough for you to add
or remove parameters that you need to invoke the Chat Endpoint. If at any
time OpenAI adds new parameters and features to the Chat Endpoint, you
can modify this class to support the new requirements.
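The core of the pattern fits in a small, dependency-free sketch. This is not the book's Chat.java: the class name, the default values, and the use of plain strings for messages are all assumptions made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Condensed sketch of a builder with defaults.
// Defaults and the String-based messages are hypothetical simplifications.
final class ChatSketch {
    private final String model;
    private final List<String> messages;
    private final int maxTokens;

    private ChatSketch(Builder builder) {
        this.model = builder.model;
        this.messages = List.copyOf(builder.messages);
        this.maxTokens = builder.maxTokens;
    }

    String model() { return model; }
    List<String> messages() { return messages; }
    int maxTokens() { return maxTokens; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private String model = "gpt-3.5-turbo";   // hypothetical default
        private final List<String> messages = new ArrayList<>();
        private int maxTokens = 256;              // hypothetical default

        private Builder() { }

        Builder model(String model) { this.model = model; return this; }
        Builder addMessage(String message) { this.messages.add(message); return this; }
        Builder maxTokens(int maxTokens) { this.maxTokens = maxTokens; return this; }

        // Refuses to build a request that the endpoint would reject anyway.
        ChatSketch build() {
            if (messages.isEmpty()) {
                throw new IllegalStateException("at least one message is required");
            }
            return new ChatSketch(this);
        }
    }

    public static void main(String[] args) {
        ChatSketch chat = ChatSketch.builder().addMessage("tl;dr").build();
        System.out.println(chat.model() + " / " + chat.maxTokens());
    }
}
```

Callers set only what they care about, and supporting a new parameter means adding one field and one builder method without touching existing call sites.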
For completeness, Listing 3-4 shows the Message.java class.
import com.fasterxml.jackson.annotation.JsonProperty;
@JsonProperty("content")
private String content;
ChatGPT Is Here to Take Away Everyone’s
Jobs (Not Really)
It is my humble opinion that every company in the world is sitting on a
gold mine of untapped information. If you are using any system that keeps
a log of exchanges between employees, a database of support requests
from your customers, or any large repository of text (yes, this includes your
email, Microsoft Exchange, and corporate Gmail), then you have a large
repository of unstructured text that is waiting to be utilized.
Therefore, the best use of ChatGPT is not to eliminate anyone’s job.
It should be used in order to augment and extend what team members in
your company are already doing. As we saw in the previous chapter, as a
programmer, ChatGPT can work as a very effective Pair-Programmer. It
is also very good at performing certain difficult tasks very efficiently and
quickly. So, let’s see a practical example of what can be done in order to
make a large source of unstructured text useful.
• Dave (PM)
• Keith (CTO)
Keith [16:01 | 02/08/2019]: Thanks for bringing this to our
attention, Fatima. Let's jump on this right away. @John,
can you take the lead in investigating the issue since our
architect is out sick today?
Keith [16:22 | 02/08/2019]: I'll pull up the monitoring logs,
John, and check if there have been any memory-related anomalies
in recent releases. Let me get back to you on that.
Dave [18:06 | 02/08/2019]: Great job, team! John, please
keep the client informed about the progress and let them know
we have a fix ready for them on the next app update. Can
someone make sure the release notes reflect this?
Prompt #1: “tl;dr”
This prompt is short, sweet, and to the point. In case you’re not familiar
with it, it’s a term used online frequently to briefly summarize a long
article. It literally stands for “Too Long; Didn’t Read,” which is how I feel
about a lot of conversations on Slack when I’m swamped with work.
Conveniently, ChatGPT can understand this simple term (I wouldn’t
necessarily call it a word) as a prompt, and the results are pretty
impressive.
Note You have two options now on how to send the preceding
prompt to ChatGPT. The first option is to use the
ChatGPTClient.java app and its supporting classes. The second
option is to use the Chat Playground. In either case, be sure that
the first User message contains all of the text that is in
Listing 3-5. It’s truncated in Listing 3-6 to make it more readable.
Notice how the number of tokens is kept small, at 360, which will
ultimately help ChatGPT to create a short response. Also see that the
temperature is left at 1 (the default value), but the Top P is increased all the
way to its maximum (which is 1). Therefore, it’s important to play around
with temperature and other settings to control the tone and variety of
ChatGPT’s responses.
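To make those settings concrete, here is a minimal sketch of how they might appear in the JSON body of a Chat Completions request. The field names (model, messages, temperature, max_tokens, top_p) are the ones the endpoint uses; the class name, the "gpt-4" model value, and the choice to put everything in a single user message are assumptions made for brevity. The string formatting is for illustration only, and real code would normally use a JSON library.

```java
import java.util.Locale;

// A sketch only: hand-builds the JSON body for a Chat Completions request so
// the settings discussed above are visible in one place. TldrRequestSketch is
// a hypothetical name; production code would use a JSON library instead.
class TldrRequestSketch {

    // Escapes the most common characters that would break a JSON string literal.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    static String body(String model, String userText, double temperature,
                       int maxTokens, double topP) {
        return String.format(Locale.ROOT,
            "{\"model\":\"%s\",\"messages\":[{\"role\":\"user\",\"content\":\"%s\"}],"
            + "\"temperature\":%.2f,\"max_tokens\":%d,\"top_p\":%.2f}",
            escape(model), escape(userText), temperature, maxTokens, topP);
    }

    public static void main(String[] args) {
        // The conversation text is a placeholder; in practice it is all of Listing 3-5.
        String prompt = "tl;dr\n<Slack conversation goes here>";
        // 360 tokens, temperature 1, and Top P 1 are the values described in the text.
        System.out.println(body("gpt-4", prompt, 1.0, 360, 1.0));
    }
}
```

Keeping the parameters as explicit method arguments makes it easy to rerun the same prompt with different settings and compare the summaries.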
Listing 3-7 shows ChatGPT’s summarization of the conversation from
Listing 3-5.
Prompt #2: “Explain This in 3 Sentences or Less”
So, if “tl;dr” doesn’t provide a brief enough explanation for you, then we
can tell ChatGPT to make things even simpler. ChatGPT does well with
parameters, so let’s specify that the explanation shouldn’t be any longer
than 3 sentences as shown in Listing 3-8.
As you may have noticed, the temperature is set to 0.82 in this example.
We really want ChatGPT to give us a concise answer here, so the “drier” the
better.
issue is related to a compatibility problem with iOS 15.1. John
fixes the problem and Keith approves the deployment of the fix
to the client's production environment.
Again, the temperature is set below 1.0 so that ChatGPT can keep the
answer brief and easy to understand. We increased the token length so that
ChatGPT can take the time to explain some potentially technical concepts.
Listing 3-11. RESPONSE. Getting a Simplified Version of a More
Complex Conversation
John checks the codebase but doesn't find any obvious issues,
so he suspects it could be a memory-related problem. Keith
agrees to check for memory-related anomalies in recent
releases. Fatima provides an update that the client is using an
iPhone X running iOS 15.1 and mentioned that the issue started
after updating the app. Keith suggests testing the latest app
update on an iPhone X simulator with iOS 15.1 to replicate
the issue.
John uploads the patch and informs the client about the fix.
Keith reviews and approves the patch, and John proceeds with
updating the app in the store. Fatima informs the client about
the fix, and they express relief and gratitude for the prompt
response. Dave commends the team for their collaboration
and quick action, emphasizing the importance of maintaining
good work.
Temperature: 1
Maximum length: 1200
Top P: 0.75
Frequency penalty: 0
Presence penalty: 0.5
1. Test the patch: Once the patch has been deployed to the app
store, thoroughly test it on various devices and operating
systems to ensure that the issue has been resolved and there
are no new bugs or regressions introduced.
5. Document the issue and resolution: Create a detailed report
documenting the issue, the steps taken to reproduce and resolve
it, and any other relevant information. This will serve as
a reference for future troubleshooting and help improve the
development process.
Let’s Talk About Real Prompt Engineering
If you Google the term “prompt engineering,” you’ll find a ton of
examples, blogs, and even full websites with subscription plans that will
try to convince you that the perfect prompts can be created using text
alone. As you have seen from the examples above, prompt engineering
cannot be accomplished solely by crafting textual inputs.
In reality, the process is quite similar to cooking a sophisticated meal.
Imagine trying to cook, for example, beef bourguignon using only salt as
the seasoning, and neglecting all other ingredients and spices! Honestly,
the result would pale in comparison to the actual dish.
Similarly, try assembling an entire orchestra but only use one
instrument and one musician. That’s an embarrassing “one man band.”
Therefore, simply adjusting the text of the prompt isn’t enough to
truly perform prompt engineering. Parameters such as the model’s
temperature, which controls randomness; the top-p, which restricts
sampling to the most probable tokens; the specific model used; the
number of tokens; and the other parameters to the endpoint all play
pivotal roles in getting a great response.
This book is not about prompt engineering, since (as you can see
from the explanation above) it truly involves several factors that don’t
have anything to do with Java. However, you are highly encouraged
to experiment with ALL the parameters to the models and endpoints
provided by OpenAI to find what works best for your use case.
But one step at a time. First, we’re going to make our Slack bot app, so
head over to the Slack API website (Figure 3-1).
https://api.slack.com/
Of course, you’ll need to have a Slack account in order for this to work,
so if you don’t have one, then you need to create one first.
After you have logged in, go to the top-right of the page and navigate
to “Your apps ➤ Create your first app”, as shown in Figure 3-1. In Slack
terminology, a “bot” is an “app,” and bots are not allowed to run on a Slack
instance unless they have been registered with Slack first.
As shown in Figure 3-2, you’ll be taken to the Your Apps page where
you can manage your Slack apps. Immediately, you’ll see a popup with a
Create an App button in the middle of the screen.
Select the option to create your app from scratch. This is because we
want to be able to manipulate all of the details of the application ourselves
without overcomplicating things with a bunch of default settings.
Afterward, you’ll be prompted to specify a name for your bot and to
select the workspace that you want your bot to have access to, as shown in
Figure 3-3.
Click the Create App button to proceed.
names of the people in the Slack workspace; otherwise, we’ll get a
UUID representation of the person instead of their name, which is
meaningless to us.
So, scroll down and be sure to add the following OAuth Scopes to your
Slack Bot, as shown in Figure 3-4.
• channels:history
• channels:read
• users:read
Confirming Your Settings
After you’ve added the appropriate scopes for your bot, scroll back up and
click Basic Information from the left side bar.
On the page that follows, you’ll see that there’s now a green checkmark
beside “Add features and functionality,” which confirms that you’ve added
your scopes correctly, as shown in Figure 3-5.
Click the Allow button to authorize the bot and allow the permissions
you allotted in the previous step.
Getting Your Slack Bot (Access) Token
This time, “token,” actually means access token! In order to connect to the
Slack API and access messages and user information programmatically,
you need a specific OAuth token generated for your Slack bot.
Figure 3-8. Copy Your OAuth Token for Your Slack Bot App
Back on the OAuth & Permissions page, be sure to copy the bot
token (which usually starts with “xoxb-”) from the page here, as shown in
Figure 3-8.
Inviting Your Bot to Your Channel
Next, you’re going to go to the channel you’d like to use to test your bot and
type in the following command in the channel itself.
/invite
Select the option “Add apps to this channel,” and then select the
name of the Slack Bot that you specified earlier when you registered the
bot with Slack.
Finding the Channel ID of Your Channel
OK, this is an easy step. In Slack, right-click the name of your channel
and select the option “View Channel details.” At the bottom of the popup
window is the ID of your channel. Copy that ID and save it for later.
Your Java app will need this in order to join the right channel in your Slack
workspace.
• GSON
• gson-<VERSION>.jar
• Kotlin
• kotlin-stdlib-<VERSION>.jar
• kotlin-stdlib-jdk8-<VERSION>.jar
• OkHttp and Okio
• okhttp-<VERSION>.jar
• okio-<VERSION>.jar
• okio-jvm-<VERSION>.jar
• SLF4J
• slf4j-api-<VERSION>.jar
Therefore, Listings 3-14 and 3-15 are snippets of the Maven pom.xml
and Gradle build.gradle files necessary (with the versions that I tested
with) in order to get everything to build.
<dependencies>
<!-- Gson library -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.10.1</version>
</dependency>
<!-- OkHttp library -->
<dependency>
<groupId>com.squareup.okhttp3</groupId>
<artifactId>okhttp</artifactId>
<version>4.11.0</version>
</dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>2.0.7</version>
</dependency>
</dependencies>
dependencies {
// Gson library
implementation 'com.google.code.gson:gson:2.10.1'
compileKotlin {
kotlinOptions {
jvmTarget = "1.8"
}
}
compileTestKotlin {
kotlinOptions {
jvmTarget = "1.8"
}
}
Now that we have our access token, as well as all of our necessary
dependencies, let’s look at the code necessary in order to access a channel
and grab all the chat history within a specified time range. For each
posting in the channel, we want the user name, timestamp, and message
content.
import com.slack.api.Slack;
import com.slack.api.methods.MethodsClient;
import com.slack.api.methods.request.conversations.ConversationsHistoryRequest;
import com.slack.api.methods.response.conversations.ConversationsHistoryResponse;
import com.slack.api.methods.request.users.UsersInfoRequest;
import com.slack.api.methods.response.users.UsersInfoResponse;
import com.slack.api.model.Message;
import com.slack.api.model.User;
import com.slack.api.model.block.LayoutBlock;
import java.time.*;
import java.util.Collections;
import java.util.List;
ConversationsHistoryRequest request =
ConversationsHistoryRequest.builder()
.channel(channelId)
.oldest(String.valueOf(startTime))
.latest(String.valueOf(endTime))
.build();
try {
ConversationsHistoryResponse response =
methods.conversationsHistory(request);
UsersInfoRequest userInfoRequest =
UsersInfoRequest.builder()
.user(userId)
.build();
UsersInfoResponse userInfoResponse =
methods.usersInfo(userInfoRequest);
if (userInfoResponse != null &&
userInfoResponse.isOk()) {
User user = userInfoResponse.getUser();
System.out.println("User: " + user.getName());
System.out.println("Timestamp: " + timestamp);
System.out.println("Message: " + message.getText());
System.out.println();
}
}
} else {
System.out.println("Failed to fetch messages: "
+ response.getError());
}
} catch (Exception e) {
e.printStackTrace();
}
}
In order to retrieve the chat history, we use the
ConversationsHistoryRequest class which is another class provided by the
Slack API. Here, all you need to do is specify the desired channel ID, the
oldest timestamp, and the latest timestamp to define the time range for the
chat history. In this example, we retrieve messages from August 3, 2023, at
10:00 to August 12, 2023, at 15:00. Easy peasy.
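The oldest and latest values are passed as strings holding Unix timestamps, so a date range like the one above has to be converted first. The sketch below shows one way to do that with java.time. The class and method names are hypothetical, and the use of UTC is an assumption; substitute the time zone your workspace actually uses.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

// Sketch: turn a human-readable date range into the epoch-second strings that
// the request builder's oldest(...) and latest(...) calls expect.
// SlackTimeRangeSketch is a hypothetical helper name.
class SlackTimeRangeSketch {

    // The time zone is an assumption; pick the zone your workspace actually uses.
    static String toEpochSeconds(LocalDateTime dateTime, ZoneId zone) {
        return String.valueOf(dateTime.atZone(zone).toEpochSecond());
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("UTC");
        String oldest = toEpochSeconds(LocalDateTime.of(2023, 8, 3, 10, 0), zone);
        String latest = toEpochSeconds(LocalDateTime.of(2023, 8, 12, 15, 0), zone);
        System.out.println("oldest=" + oldest + " latest=" + latest);
    }
}
```

Converting through a ZoneId rather than hard-coding numbers keeps the range readable and avoids off-by-hours mistakes when the bot runs on a server in a different time zone.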
Listing 3-17 shows the output after executing
ChannelReaderSlackBot.java, which is truncated here since you already
have the full text in Listing 3-5 earlier in this chapter.
...
Exercises Left for the Reader
So, there are obviously a few additional things we can do here, and these
steps will be left for you (the reader) to accomplish, for example:
• Making ChatGPTClient.java (and the related classes)
even safer for novice users. For example,
for ChatGPT, the valid value for the top P parameter is
only between 0 and 1. The constructor for the Chat.java
class should throw an Exception if the user specifies
a value outside the valid range of any parameter.
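As a starting point for that exercise, the range check itself could look something like the following sketch. TopPValidator is a hypothetical helper used here so the example stands alone; in the book's code the same check would sit inside the Chat constructor, and the choice of IllegalArgumentException is an assumption.

```java
// Sketch of the range check described above. TopPValidator is a hypothetical
// helper; in the book's code the check would live in the Chat constructor.
class TopPValidator {

    static double requireValidTopP(double topP) {
        // NaN makes both range comparisons false, so it is rejected explicitly.
        if (Double.isNaN(topP) || topP < 0.0 || topP > 1.0) {
            throw new IllegalArgumentException(
                "top_p must be between 0 and 1, but was " + topP);
        }
        return topP;
    }

    public static void main(String[] args) {
        System.out.println(requireValidTopP(0.75)); // accepted
        try {
            requireValidTopP(1.5);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing fast in the constructor means a bad value is reported where it was written, instead of surfacing later as an HTTP error from the endpoint.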
Conclusion
In this chapter, we talked about one of the various ways artificial
intelligence can be put to practical use within the enterprise today. We
showed you how to improve upon our ChatGPTClient.java application by
utilizing the builder pattern, which makes constructing your class
a lot more flexible than what was in the previous chapter.
Most notably, however, we discussed what “prompt engineering”
truly is: it cannot be accomplished by textual input to ChatGPT
alone. You definitely need to understand the ramifications of all
the input parameters to the ChatGPT API in order to properly and
effectively perform prompt engineering.
Using what we learned about prompt engineering, we were able to
successfully obtain summarizations of any large body of text provided to
us. Finally, we saw the code necessary in order to run an automated bot
that will grab messages from any Slack channel programmatically, if we
specify a valid date range.
In this chapter (as well as the previous chapter), we were working
exclusively with the Chat Completions Endpoint of the OpenAI APIs. In
the next chapter, we’re going to push the boundaries of what’s possible by
experimenting with the Whisper and DALL·E Endpoints.
CHAPTER 4
Multimodal AI: Creating a Podcast Visualizer with Whisper and DALL·E 3
Now let’s introduce a new term: multimodal AI. In the simplest of
terms, generative AI models can create content in one of four formats:
• Text
• Audio
• Images
• Video
In order to make the code for the Podcast Visualizer easy to follow
along, we’ll do things separately in the following three steps:
The code presented here in this chapter has tons of practical uses, for
example:
• Would / Wood
• Flour / Flower
• Two / Too / To
• Break / Brake
• Allowed / Aloud
As you can see from Listing 4-1, Whisper was able to understand all
the punctuation in the audio, identify all the foreign words (of which
there were several), and understand the names as well as the company
name (“duolingo”) within a URL! Of course, if you noticed, it could also
understand the difference between “wood” and “would.”
• Afrikaans
• Arabic
• Armenian
• Azerbaijani
• Belarusian
• Bosnian
• Bulgarian
• Catalan
• Chinese
• Croatian
• Czech
• Danish
• Dutch
• Estonian
• Finnish
• French
• Galician
• German
• Greek
• Hebrew
• Hindi
• Hungarian
• Icelandic
• Indonesian
• Italian
• Japanese
• Kannada
• Kazakh
• Korean
• Latvian
• Lithuanian
• Macedonian
• Malay
• Marathi
• Maori
• Nepali
• Norwegian
• Persian
• Polish
• Portuguese
• Romanian
• Russian
• Serbian
• Slovak
• Slovenian
• Spanish
• Swahili
• Swedish
• Tagalog
• Tamil
• Thai
• Turkish
• Ukrainian
• Urdu
• Vietnamese
• Welsh
So, at the end of the day, it will be able to understand audio spoken by
you and probably any language spoken by your friends and colleagues.
Developers are limited to sending no more than 50 requests per minute to
the endpoint, so this constraint needs to be taken into consideration if you
want to transcribe vast amounts of audio.
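One simple way to respect a 50-requests-per-minute limit is to space requests evenly, which works out to at least 1,200 milliseconds between calls. The sketch below shows that arithmetic. RequestPacer is a hypothetical helper, not part of any OpenAI library, and even spacing is a conservative assumption: it is stricter than the limit requires, since short bursts would also be allowed.

```java
// Sketch: space requests evenly so that no more than 50 are sent per minute.
// RequestPacer is a hypothetical helper, not part of any OpenAI library.
class RequestPacer {
    private final long minIntervalMillis;
    private boolean sentBefore = false;
    private long lastRequestMillis;

    RequestPacer(int maxRequestsPerMinute) {
        this.minIntervalMillis = 60_000L / maxRequestsPerMinute; // 50/min gives 1200 ms
    }

    // Returns how long the caller should wait before sending at time nowMillis.
    long millisToWait(long nowMillis) {
        if (!sentBefore) return 0;
        long elapsed = nowMillis - lastRequestMillis;
        return Math.max(0, minIntervalMillis - elapsed);
    }

    // Records that a request was sent at time nowMillis.
    void markSent(long nowMillis) {
        sentBefore = true;
        lastRequestMillis = nowMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        RequestPacer pacer = new RequestPacer(50);
        for (int i = 1; i <= 3; i++) {
            Thread.sleep(pacer.millisToWait(System.currentTimeMillis()));
            pacer.markSent(System.currentTimeMillis());
            System.out.println("Would send transcription request " + i);
        }
    }
}
```

Passing the clock value in as an argument, rather than reading it inside the class, keeps the pacing logic easy to test without real waiting.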
Whisper supports audio in flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav,
or webm formats. Regardless of the format that you use, the maximum file
size to send to the endpoint is 25MB.
Now, if you haven’t worked extensively with audio files, please be
aware that some formats create REALLY HUGE files (e.g., wav format), and
others create really small files (e.g., m4a format). So, converting your file to
a different format can help you with the 25MB limitation. However, later in
this chapter, we’ll see the code for a tool that takes a single large audio file
and splits it into multiple, smaller files.
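Before sending a file, it can be worth checking it against these constraints locally. The following is a minimal sketch of such a pre-flight check. AudioFileCheck is a hypothetical name, the format list is the one given in the text, and treating "25MB" as 25 * 1024 * 1024 bytes is an assumption. Checking the extension only is a rough heuristic, since it says nothing about the actual encoding.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Set;

// Sketch: pre-flight check for a file before sending it for transcription.
// AudioFileCheck is a hypothetical helper name.
class AudioFileCheck {
    // Formats and size limit as stated in the text. Treating "25MB" as
    // 25 * 1024 * 1024 bytes is an assumption.
    static final Set<String> SUPPORTED = Set.of(
        "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm");
    static final long MAX_BYTES = 25L * 1024 * 1024;

    static boolean hasSupportedExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) return false;
        return SUPPORTED.contains(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    static boolean isSendable(Path file) throws IOException {
        return hasSupportedExtension(file.getFileName().toString())
            && Files.size(file) <= MAX_BYTES;
    }

    public static void main(String[] args) throws IOException {
        // "episode.mp3" is a placeholder file name.
        Path file = Path.of(args.length > 0 ? args[0] : "episode.mp3");
        if (!Files.exists(file)) {
            System.out.println("No such file: " + file);
        } else if (isSendable(file)) {
            System.out.println(file + " can be sent as-is");
        } else {
            System.out.println(file + " needs converting or splitting");
        }
    }
}
```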
Transcriptions Endpoint
The Transcriptions Endpoint is a REST service that converts audio into
text, and is only compatible with the Whisper model.
Compatible models include
• whisper-1
Additionally, this field can be used to change the spelling or
capitalization of any words that Whisper is not familiar with.
response_format    String    Optional (default: json)    This is the format of the output of transcription.
However, that’s not the case for other audio programs which tend to last for
up to an hour (or even more). In cases like that, you’re going to definitely
exceed the 25MB file limitation.
Therefore, let’s pair-program with ChatGPT and use our human
intelligence to create our own utility that will take a single audio file and
split it into multiple smaller files.
Listing 4-2 is the prompt that I sent to ChatGPT in order to get a basic
app that splits audio files.
After a little back and forth, I was able to create (as shown in Listing 4-3)
the AudioSplitter.java application that includes my edits to what was
generated by ChatGPT.
import org.bytedeco.javacv.FFmpegFrameGrabber;
import org.bytedeco.javacv.FFmpegFrameRecorder;
import java.io.IOException;
long segmentStartTime = 0;
long segmentEndTime;
int segmentNumber = 1;
segmentEndTime = Math.min(segmentStartTime
+ segmentDurationInSeconds,
totalDurationInSeconds);
// Set the grabber's timestamp to the start time in microseconds
grabber.setTimestamp(segmentStartTime * 1000000);
segmentStartTime = segmentEndTime;
segmentNumber++;
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
Since the standard Java libraries don’t have wide support for various
media formats, we’re using a combination of the FFmpeg library and
JavaCV (both are free and open source).
The goal is simple: split an MP3 file into contiguous segments no
longer than 10 minutes using the Java language. In this simple app, we do
the following steps:
In order for this to work, you need to have the JavaCV and FFmpeg
libraries properly installed and configured in your project.
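The segment arithmetic can be tried out on its own, without FFmpeg or JavaCV installed. The sketch below mirrors the Math.min logic from the listing using plain numbers. SegmentPlanner and its method are hypothetical names introduced for illustration, and no audio is read or written.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the start/end arithmetic for contiguous segments, with no audio
// library involved. SegmentPlanner is a hypothetical name.
class SegmentPlanner {

    // Returns {startSeconds, endSeconds} pairs covering the whole duration.
    static List<long[]> plan(long totalSeconds, long segmentSeconds) {
        if (segmentSeconds <= 0) {
            throw new IllegalArgumentException("segmentSeconds must be positive");
        }
        List<long[]> segments = new ArrayList<>();
        long start = 0;
        while (start < totalSeconds) {
            // The last segment is clipped to the end of the file.
            long end = Math.min(start + segmentSeconds, totalSeconds);
            segments.add(new long[] {start, end});
            start = end;
        }
        return segments;
    }

    public static void main(String[] args) {
        // A 25-minute file split into segments of at most 10 minutes.
        for (long[] s : plan(25 * 60, 10 * 60)) {
            System.out.println(s[0] + "s to " + s[1] + "s");
        }
    }
}
```

Each segment starts exactly where the previous one ended, so the pieces are contiguous and the final one is simply shorter when the duration is not a multiple of the segment length.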
Note FFmpeg is an open source binary that you will need to install
on your machine and either place in your PATH or make accessible
within your project. JavaCV uses FFmpeg via JNI (the Java Native
Interface).
FFmpeg is an extremely versatile media converter which not
only handles MP3 audio files, but various other audio file formats
(including M4A, OGG, and WAV). It is able to convert video formats as
well as static images like PNG, JPEG, and GIF.
Listing 4-4 is the prompt to put in the Chat Playground to get things
started. Be sure to note that I’m asking for a 60-sec HTTP request timeout
since Whisper may take a little while to generate the transcript.
Model: gpt-4
Temperature: 1
After some back and forth, here’s the response ChatGPT gave us that
worked, as shown in Listing 4-5.
import java.io.*;
import java.nio.file.*;
import okhttp3.*;
import java.util.*;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;
/**
 * Client class to transcribe MP3 files using the OpenAI Whisper model.
 */
public class WhisperClient {
// Iterate over each MP3 file, transcribe it, and print the response
for (File mp3File : mp3Files) {
    // Construct the request body for transcription
    RequestBody requestBody = new MultipartBody.Builder()
        .setType(MultipartBody.FORM)
        .addFormDataPart(
            "file",
            mp3File.getName(),
            RequestBody.create(mp3File, MEDIA_TYPE_MP3))
        .addFormDataPart("model", model)
        .addFormDataPart("response_format", responseFormat)
        .build();
As you can see, the OkHttpClient class allows us to use the Builder
pattern to create an instance of the class with the timeouts we specified.
Now, here’s the reason why it’s a good idea when using the
AudioSplitter.java utility to make the output folder separate from the
input folder. We’re creating a Collection of all the MP3 files from within
a directory. More specifically, it’s a “List<File>” that’s named “mp3Files”
to hold all the files that we want transcribed. Therefore, if the file that
we want to be split into smaller segments is in the same folder with the
segments themselves, then we’ll be sending the large file (the one that’s
larger than 25MB) along with the smaller files to the Transcriptions Endpoint,
which defeats the whole purpose of using the AudioSplitter.java app.
Therefore, the “Files.walk()” method allows us to recursively traverse
the “mp3FolderPath” directory and collect all MP3 files and filter out those
that do not end with the “.mp3” extension (for safety purposes, and to
prevent any errors with the web service). Then, we map each “Path” to its
corresponding “File” object and sort the files alphabetically based on their
names. Finally, we use the “Collectors.toList()” method to collect all the
sorted files into the “mp3Files” list.
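The directory scan described above can be sketched as a small stand-alone method. The class and method names here are illustrative and may differ from the book's listing; the extra isRegularFile filter and the try-with-resources block are additions made so the sketch is safe to run on any folder.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of the directory scan described above; the class and method names
// are illustrative and may differ from the book's listing.
class Mp3Finder {

    static List<File> findMp3Files(Path folder) throws IOException {
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(folder)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".mp3"))
                .map(Path::toFile)
                .sorted(Comparator.comparing(File::getName))
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no folder is given.
        Path folder = Path.of(args.length > 0 ? args[0] : ".");
        findMp3Files(folder).forEach(f -> System.out.println(f.getPath()));
    }
}
```

Sorting by file name matters here because the segments need to be transcribed in order; this assumes the splitter names its output files so that alphabetical order matches playback order.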
With a Collection of MP3 files in hand, it’s now time to send them
to the Transcriptions Endpoint. As we build the RequestBody, the most
important lines you should pay attention to are
.addFormDataPart("model", model)
.addFormDataPart("response_format", responseFormat)
That’s because if you want to add any optional parameters to the
HTTP request (refer to Table 4-2 for all the parameters), such as the
prompt or temperature, then you need to add them here in the same way
that we specified the model and the response format.
The way Camille remembers Foxy, it's almost like a movie. Her
memories feel like endless summer, hazy and perfect, like a
scene shot on crackly film. I just remembered like the feeling
For brevity, we’re only showing an excerpt of the transcript. The full
transcript itself is over 8000 words because the episode is nearly
one hour long.
Listing 4-8. The Prompt for GPT-4 to Create a Prompt for DALL⋅E
As you can see in the prompt, the model used is the 32k token version
of GPT-4 in order to allow us to process REALLY LONG text transcripts.
DALL⋅E needs to know the type of image to generate so that’s why we need
to specify that the image should be a photo, painting, digital art, etc. We
need to ensure that the resulting text generated by GPT-4 is short, so we
want to have a maximum length of 150 tokens. Also, in order to prevent
GPT-4 from repeating some phrases multiple times, we introduced a
frequency penalty of 0.33.
Listing 4-9 shows the results from GPT-4 after reading the transcript of
Episode 811 of This American Life.
Table 4-3. The HTTP Parameters Necessary to Call the Create Image
Endpoint
HTTP Param Description
Table 4-4 describes the format of the JSON object necessary for the
request body for the Create Image Endpoint. For obvious reasons, the
prompt is the only required parameter in order to successfully invoke the
service.
Image (JSON)
Table 4-5. The Structure of the Image JSON Object
Field Type Description
(or)
import java.io.IOException;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import okhttp3.*;
try {
Response response = client.newCall(request).
execute();
if (!response.isSuccessful()) throw new
IOException("Unexpected code " + response);
System.out.println(response.body().string());
} catch (Exception e) {
e.printStackTrace();
}
}
@JsonProperty("prompt")
private String prompt;
@JsonProperty("n")
private int n;
@JsonProperty("size")
private String size;
}
Now, since we’re already using the OkHttp library to make our HTTP
Requests with the Transcriptions Endpoint for the Whisper model, we’ll
continue to use it for the Create Image Endpoint for the DALL⋅E model.
The most important thing to understand here is the CreateImage
inner class. It has the @JsonProperty annotations, and it encapsulates the
important parameters necessary to create an image:
• The text prompt describing the details of the image
Figures 4-3 and 4-4 show the image generated from the text prompt in
Listing 4-9.
Figure 4-3. The DALL⋅E Generated Image of a Girl and Her Dog from
Episode 811 of “This American Life” Podcast
Figure 4-4. The DALL·E Generated Image of a Girl and Her Dog from
Episode 811 of “This American Life” Podcast
• 3-D render
• Painting
• Abstract painting
• Oil pastel
• Digital art
• Photo
• Photorealistic
• Hyperrealistic
• Neon photo
• High-quality photo
• Silhouette
• Vaporwave
• Cartoon
• Plush object
• Marble sculpture
• Hand sketch
• Poster
• Synth wave
• Hand drawn
If you can describe a dream to another person, then you should have
no problem describing what you want to DALL⋅E.
Conclusion
In this chapter we accomplished a lot! With a few classes we created a
Podcast Visualizer.
CHAPTER 5
Creating an Automated Community Manager Bot with Discord and Java
When you’re launching an app or a service, it’s important to build and
maintain your own community. Below are the telltale signs of a healthy
user community:
No matter what type of app or service I create, I would love for my
user community to exemplify the items listed above!
control growth and prevents spam. This model not only ensures a tailored
experience for members but also enhances security, since community
managers have the discretion to grant or deny access.
Discord not only supports text messaging, but also supports
voice chats and streaming video. Very similar to Slack, Discord allows
community managers to separate content into channels to organize
discussions, streamline information flow, and to help users see the content
that they’re interested in.
is launching a new mobile app for customers of their bank. They want to
have a channel that will be monitored by a bot to answer questions from
users of the app, and they also want to ensure that no one is soliciting users
of their app, or posting hurtful or harmful content in their Discord servers.
Figure 5-1. This Fake App from a Fake Bank Has Real Problems
Specify the name of your server and provide an optional server icon (if
you have one).
https://discord.com/developers
• Bot
After we select the bot’s scope, we get to see all the permissions that
are only applicable to bots. Bot permissions fall into three categories:
general, text, and voice.
In case you’re curious about the categories, the general permissions
allow the bot to act in the capacity of a normal human moderator, such as
managing the server, roles, and channels. Bots with these permissions can
also kick and ban members.
Text permissions allow the bot to send and receive messages in text
channels, and voice permissions allow the bot to participate in voice
channels. Simple enough, right?
Select the following permissions for the bot:
• Bot Permissions
  • Text Permissions
    • Send Messages
    • Read Message History
Although you haven’t written any Java code yet, now it’s time to invite
your bot to your server.
Figure 5-11. If You Read the Screen Carefully Here, You Can See That
Discord Has a Sense of Humor
Click the “Authorize” button to give the bot the permission to run on
your server.
If everything went smoothly, then you should see an automated
message in the General channel of your server that indicates that the
process has been successful.
Note Using the word “token” here makes me nervous because it has
two distinct meanings in this book, depending on the context.
Here’s a quick refresher:
• When using Discord and Slack APIs, a “token” is an
authentication token.
• When using OpenAI APIs, a “token” is a part
of a word.
Figure 5-13. Click the “Reset Token” Button to See Your ID Token
Be sure to copy and save the ID token to someplace safe. You will need
this token in the Java code that’s presented later in this chapter.
Scroll down the page to the section named “Privileged Gateway
Intents,” and enable the option named “MESSAGE CONTENT INTENT.”
Note So let’s slow things down a bit and talk about intents. What
exactly is an “intent” and why is it needed? In the Discord API, you
must explicitly specify every type of event that you want Discord to
notify you about programmatically. Without this filter, Discord would
constantly bombard you with events that are not relevant
to you or your bot. For example, for our purposes, we don’t care
when people join or leave the server. However, if you want to send a
list of server rules to anyone who joins your server for the first time,
then you definitely would want to enable the “SERVER MEMBERS
INTENT.” When we deep dive into the code, you’ll see more
information about intents.
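To make the idea of intents concrete before we reach the real code, here is a minimal, self-contained sketch. It does not use the JDA library: the nested Intent enum and the IntentFilter class are hypothetical stand-ins for JDA's GatewayIntent and for the filtering that Discord applies on its side, so treat it as an illustration of the concept rather than how Discord is implemented.

```java
import java.util.EnumSet;
import java.util.Set;

class IntentFilter {
    // Hypothetical stand-in for JDA's GatewayIntent; not the real library type.
    enum Intent { GUILD_MEMBERS, GUILD_MESSAGES, DIRECT_MESSAGES, MESSAGE_CONTENT }

    private final Set<Intent> subscribed;

    IntentFilter(EnumSet<Intent> subscribed) {
        this.subscribed = EnumSet.copyOf(subscribed);
    }

    // An event reaches the bot only if the bot subscribed to its intent.
    boolean shouldDeliver(Intent eventIntent) {
        return subscribed.contains(eventIntent);
    }

    public static void main(String[] args) {
        IntentFilter filter = new IntentFilter(EnumSet.of(
                Intent.GUILD_MESSAGES, Intent.DIRECT_MESSAGES, Intent.MESSAGE_CONTENT));
        // Message events are delivered; member join/leave events are not.
        System.out.println("message event delivered: " + filter.shouldDeliver(Intent.GUILD_MESSAGES));
        System.out.println("member event delivered: " + filter.shouldDeliver(Intent.GUILD_MEMBERS));
    }
}
```

If you later want to greet new members with the server rules, the equivalent change in this sketch is adding GUILD_MEMBERS to the subscribed set.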
• javax.annotation-api-<VERSION>.jar
• opus-java-<VERSION>.jar
• nv-websocket-client-<VERSION>.jar
• commons-collections4-<VERSION>.jar
• slf4j-api-<VERSION>.jar
import java.io.IOException;
import java.util.EnumSet;
import net.dv8tion.jda.api.JDA;
import net.dv8tion.jda.api.JDABuilder;
import net.dv8tion.jda.api.entities.Activity;
import net.dv8tion.jda.api.entities.User;
import net.dv8tion.jda.api.entities.channel.ChannelType;
import net.dv8tion.jda.api.events.message.MessageReceivedEvent;
import net.dv8tion.jda.api.hooks.ListenerAdapter;
import net.dv8tion.jda.api.requests.GatewayIntent;
// Set the bot's custom status.
.setActivity(Activity.customStatus("Ready to answer questions"))
.build();
if (senderDiscordID.equals(messageEvent.getJDA().getSelfUser())) {
    return;
} else if (messageEvent.getChannelType() == ChannelType.TEXT) {
    // Ignore messages not in the specified "q-and-a" channel.
    if (!messageEvent.getChannel().getName().equalsIgnoreCase(channelToWatch)) {
        return;
    }
}
// Send a greeting response to the user who sent the message.
String reply = "hi <@" + senderDiscordID.getId() + ">, I can help you with that!";
messageEvent.getChannel().sendMessage(reply).queue();
}
}
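The checks at the end of the listing can be hard to follow in fragments, so here is a small sketch that restates them as plain functions with no Discord dependency. The class name MessageFilter and its parameters are illustrative assumptions; only the logic (skip our own messages, skip text channels other than the watched one, greet with a mention) comes from the listing.

```java
class MessageFilter {
    // Mirrors the checks in onMessageReceived(): never answer ourselves, and in
    // text channels answer only in the watched channel (compared case-insensitively).
    static boolean shouldReply(boolean sentBySelf, boolean isTextChannel,
                               String channelName, String channelToWatch) {
        if (sentBySelf) {
            return false;
        }
        if (isTextChannel && !channelName.equalsIgnoreCase(channelToWatch)) {
            return false;
        }
        return true;
    }

    // Discord renders "<@USER_ID>" as a clickable mention of that user.
    static String greeting(String senderId) {
        return "hi <@" + senderId + ">, I can help you with that!";
    }

    public static void main(String[] args) {
        System.out.println(shouldReply(false, true, "Q-and-A", "q-and-a"));
        System.out.println(shouldReply(false, true, "general", "q-and-a"));
        System.out.println(greeting("1234"));
    }
}
```

Note that, just as in the listing, a message that does not arrive in a text channel (a direct message, for example) is not filtered by channel name and still gets a reply.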
Afterward, we again use the Builder Pattern with the JDABuilder class
with our discordToken and the intents that we’re interested in with the
following call:
jda.getRestPing().queue(ping -> System.out.println("Logged in with ping: " + ping));
Now let’s run our Java Discord bot. After executing the app, be sure to
return to your Discord server and type a question in the
channel that you set up for Q&A. Figure 5-15 shows the response to my
question, “Is this bot going to answer my questions about the app?”
• On the right side, you’ll see that the bot is online with a
green status indicator.
• The bot also has a custom status to let you know what it
will do in the channel.
Figure 5-16. Providing a Name and Icon for the Second Bot
• Scopes
• Bot
• Bot Permissions
• General Permissions
• Kick Members
• Ban Members
• Text Permissions
• Send Messages
• Manage Messages
It’s not because puppies are inherently evil. However, they do have
a tendency to destroy your favorite pair of shoes when left alone. In all
honesty, we simply need something to test our code in Discord when we
run our bot.
Listing 5-2 is the code for ContentModeratorBotDumb.java.
import java.io.IOException;
import java.util.EnumSet;
import net.dv8tion.jda.api.JDA;
import net.dv8tion.jda.api.JDABuilder;
import net.dv8tion.jda.api.entities.Activity;
import net.dv8tion.jda.api.entities.Message;
import net.dv8tion.jda.api.entities.User;
import net.dv8tion.jda.api.entities.channel.unions.MessageChannelUnion;
import net.dv8tion.jda.api.events.message.MessageReceivedEvent;
import net.dv8tion.jda.api.hooks.ListenerAdapter;
import net.dv8tion.jda.api.requests.GatewayIntent;
@Override
public void onMessageReceived(MessageReceivedEvent
messageEvent){
}
}
}
Figure 5-17. This Bot Has a Strict Rule About Discussing “Puppies”;
However, Discussing “Kittens” Is Perfectly Fine
Conclusion
We just went through all the steps necessary to create two functioning
Discord bots in Java. For those who are unfamiliar with creating a
Discord server, we also walked through how to set up a server
to manage our community.
As you can see, we took a much different approach compared to the
Slack bot that we built in Chapter 3! The Slack bot that we created was pretty
much focused on user productivity within the workplace. The two Discord
bots, on the other hand, are truly focused on community management. We
have everything in place for these bots to be artificially intelligent with the
help of OpenAI’s APIs. This is all accomplished in the final two chapters.
CHAPTER 6
Adding Intelligence to Our Discord Bots, Part 1: Using the Chat Endpoint for Q&A
At this point, we have all the structure in place to make both of the
Discord bots that we created in the previous chapter fully functional and
artificially intelligent. In the last two chapters of this book, we’re going to
follow all the steps necessary to do exactly that.
In this chapter, we’ll get started with our Tech Support Bot,
which was called TechSupportBotDumb.java. Below are the two major
changes that we’re going to make:
Making TechSupportBot.java More Intelligent
Listing 6-1 contains the full contents of the frequently asked questions
that the fictional customer support team has created based upon support
tickets from users of the newly launched mobile banking application.
Chapter 6 ADDING INTELLIGENCE TO OUR DISCORD BOTS, PART 1: USING THE CHAT
ENDPOINT FOR Q&A
3. Is the Crooks Bank Mobile App safe and secure?
Yes, the Crooks Bank Mobile App prioritizes your security. We use
state-of-the-art encryption and security protocols to protect
your data and transactions. Your information is safe with us.
8. Can I get customer support through the Crooks Bank
Mobile App?
Absolutely! We offer customer support through our in-app
messaging feature. You can also find our customer service
contact information on our website.
As you can see in the Frequently Asked Questions text file in Listing 6-1,
there’s no magic involved here. It’s simply a list of questions and the
answers. Now, let’s see the newly modified TechSupportBot.java class. This
is represented in Listing 6-2.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.EnumSet;
import net.dv8tion.jda.api.JDA;
import net.dv8tion.jda.api.JDABuilder;
import net.dv8tion.jda.api.entities.Activity;
import net.dv8tion.jda.api.entities.User;
import net.dv8tion.jda.api.entities.channel.ChannelType;
import net.dv8tion.jda.api.entities.channel.unions.MessageChannelUnion;
import net.dv8tion.jda.api.events.message.MessageReceivedEvent;
import net.dv8tion.jda.api.hooks.ListenerAdapter;
import net.dv8tion.jda.api.requests.GatewayIntent;
// For private direct messages.
GatewayIntent.DIRECT_MESSAGES,
// To allow access to message content.
GatewayIntent.MESSAGE_CONTENT
);
// Print the number of guilds the bot is connected to.
System.out.println("Guilds: " + jda.getGuildCache().size());
System.out.println("Self user: " + jda.getSelfUser());
} catch (InterruptedException e) {
    // Handle exceptions if the thread is interrupted
    // during the awaitReady process.
    e.printStackTrace();
}
}
} else if (messageEvent.getChannelType() == ChannelType.TEXT) {
    // Ignore messages not in the specified "q-and-a" channel.
    if (!channel.getName().equalsIgnoreCase(channelToWatch)) {
        return;
    }
}
// This line takes the question from the Discord user and asks ChatGPT.
reply = chatGPTClient.sendMessageFromDiscordUser(message.getContentDisplay());
channel.sendMessage(reply).queue();
}
}
}
}
As you can see, we’re defining some Strings that provide a reference to
the file path location where the frequently asked questions file is stored.
We also have a String that will be used to contain the contents of the
file itself.
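As a rough illustration of loading a knowledge-base file into a single String, the sketch below uses java.nio.file.Files.readString (available since Java 11) instead of the BufferedReader and FileReader classes that the listing imports. The class name FaqLoader and the temporary file are assumptions made so that the example runs on its own; in the bot you would pass the path to your own FAQ.txt.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FaqLoader {
    // Read the entire knowledge-base file into one String.
    static String load(Path faqPath) throws IOException {
        return Files.readString(faqPath, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny stand-in FAQ to a temp file so the sketch runs anywhere.
        Path tmp = Files.createTempFile("FAQ", ".txt");
        Files.writeString(tmp, "1. Is the app free?\nYes.\n");
        String contents = load(tmp);
        System.out.println(contents.length() + " characters loaded");
        Files.delete(tmp);
    }
}
```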
Now, as we have learned from the previous chapters in the book, you
can dramatically set the tone of the conversation by providing a specific
message to the system itself in your prompt. Therefore, we have a String
here as well containing the system message. Finally, we have a reference to
a class, ChatGPTClientForQAandModeration, which is quite similar
to the other ChatGPTClient classes that we used previously in the book.
Updates to the onMessageReceived() Method
Now, when a message is received, be sure to notice the following line:
Here, we need to give the full package and class name of the Message
class used by the JDA library because we already created and use a
Message class for encapsulating and representing the JSON objects when
we send HTTP requests to the Chat Endpoint.
Now let’s further examine the following three lines of code:
channel.sendTyping().queue();
reply = chatGPTClient.sendMessageFromDiscordUser(message.getContentDisplay());
channel.sendMessage(reply).queue();
Here, we provide a nice user experience and show the user that the bot
is “typing,” while the user’s question is being sent to ChatGPT. When the
response comes back, we provide the reply back to the user.
Analyzing
ChatGPTClientForQAandModeration.java
In Listing 6-2, TechSupportBot.java instantiates
ChatGPTClientForQAandModeration.java which (as we stated previously)
is very similar to the ChatGPTClient classes we have used before. The
complete source for ChatGPTClientForQAandModeration.java is shown in
Listing 6-3.
Listing 6-3. ChatGPTClientForQAandModeration.java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
//
// OpenAI parameters that we already know how to use
//
String openAIKey = "";
String endpoint = "https://api.openai.com/v1/chat/completions";
String model = "gpt-4";
float temperature = 1.0f;
int max_tokens = 256;
float top_p = 1.0f;
int frequency_penalty = 0;
int presence_penalty = 0;
//
// The constructor needs to be passed the contents from the
// FAQ.txt file
// and the system message
//
public ChatGPTClientForQAandModeration(String
systemMessage, String initialInstructionsToChatGPT) {
this.systemMessage = systemMessage;
this.initialInstructionsToChatGPT =
initialInstructionsToChatGPT;
}
.frequencyPenalty(frequency_penalty)
.presencePenalty(presence_penalty)
.build();
jsonInput = mapper.writeValueAsString(chat);
System.out.println(jsonInput);
} catch (JsonProcessingException e) {
e.printStackTrace();
}
try {
URL url = new URL(endpoint);
HttpURLConnection connection = (HttpURLConnection)
url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type",
"application/json");
connection.setRequestProperty("Authorization",
"Bearer " + openAIKey);
connection.setDoOutput(true);
String line;
while ((line = reader.readLine()) != null) {
response.append(line);
}
reader.close();
//
// We are only interested in the "message.content" in the
// JSON response
// So here's the easy way to extract that
//
public String extractAnswerFromJSON(String jsonResponse) {
String chatGPTAnswer = "";
try {
// Create an ObjectMapper instance
ObjectMapper objectMapper = new ObjectMapper();
JsonNode rootNode = objectMapper.
readTree(jsonResponse);
return chatGPTAnswer;
}
}
public ChatGPTClientForQAandModeration(String
knowledgeBaseFileContents, String systemMessage) {
this.knowledgeBaseFileContents =
knowledgeBaseFileContents;
this.systemMessage = systemMessage;
}
Every time a question has been posted on our Discord channel,
TechSupportBot.java will call sendMessageFromDiscordUser() from
ChatGPTClientForQAandModeration.java. In this method, we will perform
all of the usual things necessary in order to invoke the Chat Endpoint.
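To picture what “the usual things” produce, here is a hedged sketch of a request body for the Chat Endpoint. The arrangement of the messages (system message first, then the FAQ contents, then the user's question) is an assumption, and the book's own client builds its JSON with Jackson rather than by hand. The class and method names below are hypothetical.

```java
class ChatRequestSketch {
    // Minimal JSON string escaping for characters likely to appear in chat text.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // One entry of the "messages" array: a role plus its content.
    static String message(String role, String content) {
        return "{\"role\":\"" + role + "\",\"content\":\"" + escape(content) + "\"}";
    }

    // Assumed ordering: system message, knowledge base, then the question.
    static String buildBody(String model, String systemMessage,
                            String knowledgeBase, String question) {
        return "{\"model\":\"" + model + "\",\"messages\":["
                + message("system", systemMessage) + ","
                + message("user", knowledgeBase) + ","
                + message("user", question) + "]}";
    }

    public static void main(String[] args) {
        System.out.println(buildBody("gpt-4", "You are a support bot.",
                "1. Is the app free?\nYes.", "Is the app free?"));
    }
}
```

Because the whole FAQ travels with every question, the size of the knowledge-base file directly affects how many OpenAI tokens each request consumes.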
Listing 6-4 shows the conversation between us and the Discord Bot.
Me: I have an old Blackberry phone, is the app supported on
my device?
Q and A Bot: Sure, here's how you can set up the Crooks Bank
Mobile App on your iPhone:
Please make sure you update the app regularly to access new
features and improvements. If you have any challenges during
setup, please reach out to us through the app's support feature
or call our customer service.
We Have a Monumental Achievement… With One Slight Flaw
OK, if you step back and examine what we have achieved so far, you
should realize that we’re looking at what is nothing short of a monumental
achievement. We have the following:
So, with all this that we have accomplished, there’s one GIGANTIC flaw
that we can’t ignore. In Listing 6-4, the bot said to the user:
Once you're logged in, you can set up Touch ID or Face ID for
quicker, secure login in the future.
No, no, no! Bad bot! In case you didn’t read the frequently asked
questions file completely, please allow me to explain what went
wrong here:
1. The FAQ.txt file in Listing 6-1 clearly states that the
Crooks Bank mobile app is a mobile app. Currently,
Touch ID can only be used on Apple desktop and
laptop computers. So this doesn’t even make sense.
answer questions explicitly from the training information. If
you were not given an answer to a question, then provide an
apologetic message to the user explaining that you do not know
the answer.";
And that’s it. The FAQ.txt file remains exactly the same as before, and
all we needed to do was update the System Message to let ChatGPT know
its role in the conversation. Listing 6-5 has the new conversation between
us and the Discord bot.
Q and A Bot: I'm sorry, but from the information provided, the
Crooks Bank Mobile App can only be downloaded from the App
Store for iOS devices and Google Play for Android devices.
BlackBerry devices are not mentioned, therefore I cannot
confirm that it would be compatible.
Q and A Bot: To set up the Crooks Bank Mobile App on your iPhone,
you can download it from the App Store. Simply search for "Crooks
Bank Mobile App" and click the "Install" button. After the
installation is finished, open the app and follow the prompts to
set up your account. If you already have an account, use your
login information to access your account through the app.
Now that’s a lot better! Our Tech Support bot is “sticking to the script”
and is not allowed to make up or assume anything.
Conclusion
In this chapter, we’ve accomplished a lot! We have a fully functioning
Discord bot that can be trained by anyone without any experience in
AI, NLP, or Machine Learning using a simple text file. We learned that a
“helpful assistant” may at times become TOO HELPFUL and assume facts
that are not true. However, we also reiterated the concept and the value of
the System Message, which is an important part of Prompt Engineering.
Now that we have made our Q&A Discord bot intelligent,
let’s find out how to make our Content Moderator bot intelligent
as well!
CHAPTER 7
Adding Intelligence to Our Discord Bots, Part 2: Using the Chat and Moderation Endpoints for Moderation
In this chapter, we’re going to take the steps necessary in order to make our
Content Moderator Discord bot artificially intelligent. Let’s overview the
changes that we’re going to make:
• Hate
• Hate/threatening
• Harassment/threatening
• Self-harm
• Self-harm/intent
• Self-harm/instructions
• Sexual
• Sexual/minors
• Violence
• Violence/graphic
Chapter 7 ADDING INTELLIGENCE TO OUR DISCORD BOTS, PART 2: USING THE CHAT AND
MODERATION ENDPOINTS FOR MODERATION
Moderations Endpoint
The Moderations Endpoint allows developers to submit a String of text,
and to subsequently know if it’s violent, hateful, threatening, or contains
any form of harassment.
Creating the Request
Table 7-1 lists all the HTTP parameters necessary to call the Moderations
Endpoint.
Table 7-2 describes the format of the JSON object necessary for the
request body for the Moderations Endpoint. The service is very simple to
use, since only one parameter is required to properly invoke the service.
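As a quick illustration of how small the request is, the following sketch builds (but does not send) a call to the Moderations Endpoint with only the "input" field in the body. It uses java.net.http.HttpRequest (Java 11+) rather than the HttpURLConnection approach used in the book's listings, and the class and method names are hypothetical. The hand-rolled escaping covers only backslashes and double quotes; a JSON library such as Jackson is the safer choice for real messages.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class ModerationRequestSketch {
    static final String ENDPOINT = "https://api.openai.com/v1/moderations";

    // The request body needs just one field: the text to be checked.
    static String body(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"input\":\"" + escaped + "\"}";
    }

    // Build the POST request with the same two headers used for the Chat Endpoint.
    static HttpRequest build(String apiKey, String text) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(body(text)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("YOUR_OPENAI_KEY", "I love puppies");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body("I love puppies"));
    }
}
```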
Create Moderation (JSON)
Table 7-2. The Request Body for the Moderation Endpoint
Field Type Required? Description
Moderation (JSON)
Table 7-3. The Structure of the Moderation JSON Object
Field Type Description
↳↳ self-harm/intent   Boolean   This indicates whether or not the text given contains content in which the speaker expresses that they are engaging or intend to engage in acts of self-harm, such as suicide, cutting, and eating disorders.
↳↳ self-harm/instructions   Boolean   This indicates whether or not the text given contains content that encourages performing acts of self-harm, such as suicide, cutting, and eating disorders. This includes content that gives instructions or advice on how to commit such acts.
↳↳ sexual   Boolean   This indicates whether or not the text given contains content meant to arouse sexual excitement, such as the description of sexual activity. This includes content that promotes sexual services; however, this excludes topics such as sex education and wellness.
↳↳ sexual/minors   Boolean   This indicates whether or not the text given contains sexual content that includes an individual under the age of 18.
↳↳ violence   Boolean   This indicates whether or not the text given contains content depicting death, violence, or physical injury.
↳↳ violence/graphic   Boolean   This indicates whether or not the text given contains content depicting death, violence, or physical injury in graphic detail.
↳ category_scores   Array   A list of the categories along with the scores given by the model.
↳↳ hate   Number   Score for the category “hate.”
Listing 7-1. The Moderation JSON Object
{
"id": "modr-XXXXX",
"model": "text-moderation-005",
"results": [
{
"flagged": true,
"categories": {
"sexual": false,
"hate": false,
"harassment": false,
"self-harm": false,
"sexual/minors": false,
"hate/threatening": false,
"violence/graphic": false,
"self-harm/intent": false,
"self-harm/instructions": false,
"harassment/threatening": true,
"violence": true
},
"category_scores": {
"sexual": 1.2282071e-06,
"hate": 0.010696256,
"harassment": 0.29842457,
"self-harm": 1.5236925e-08,
"sexual/minors": 5.7246268e-08,
"hate/threatening": 0.0060676364,
"violence/graphic": 4.435014e-06,
"self-harm/intent": 8.098441e-10,
"self-harm/instructions": 2.8498655e-11,
"harassment/threatening": 0.63055265,
"violence": 0.99011886
}
}
]
}
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
// OpenAI parameters that we already know how to use
String openAIKey = "";
String endpoint = "https://api.openai.com/v1/moderations";
String model = "text-moderation-latest";
jsonInput = mapper.writeValueAsString(modRequest);
System.out.println(jsonInput);
} catch (JsonProcessingException e) {
e.printStackTrace();
}
try {
URL url = new URL(endpoint);
HttpURLConnection connection = (HttpURLConnection)
url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Content-Type",
"application/json");
connection.setRequestProperty("Authorization",
"Bearer " + openAIKey);
connection.setDoOutput(true);
connection.disconnect();
} catch (IOException e) {
e.printStackTrace();
}
return moderationResponse;
}
} catch (JsonProcessingException e) {
e.printStackTrace();
}
return response;
}
class ModerationResponse {
boolean isFlagged = false;
ArrayList<String> offendingCategories = new
ArrayList<>();
@Override
public String toString() {
return "ModerationResponse{" +
"isFlagged=" + isFlagged +
", offendingCategories=" +
offendingCategories +
'}';
}
}
class ModerationResponse {
boolean isFlagged = false;
ArrayList<String> offendingCategories = new
ArrayList<>();
This class encapsulates the valuable information from the Moderation
JSON object which is returned from the Moderations Endpoint. Namely, if
the original Discord message that we want evaluated violates the content
rules, we have a boolean, isFlagged, to let us know. If isFlagged is true, then
offendingCategories is populated with the categories that the content has
been flagged for.
Therefore, the method getModerationResponsefromJSON() does
exactly what the name says. We pass the Moderation JSON object
returned by the Moderations Endpoint, and we get a fully instantiated
ModerationResponse object.
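The mapping described above can be sketched without any JSON library. In the hedged example below, a Map stands in for the already-parsed "categories" object from Listing 7-1, and ModerationResponseSketch is a hypothetical twin of the book's ModerationResponse class. The real method parses the JSON text itself, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

class ModerationResponseSketch {
    boolean isFlagged = false;
    ArrayList<String> offendingCategories = new ArrayList<>();

    // Copy the flag, then collect the name of every category whose value is true.
    static ModerationResponseSketch from(boolean flagged, Map<String, Boolean> categories) {
        ModerationResponseSketch response = new ModerationResponseSketch();
        response.isFlagged = flagged;
        if (flagged) {
            for (Map.Entry<String, Boolean> entry : categories.entrySet()) {
                if (Boolean.TRUE.equals(entry.getValue())) {
                    response.offendingCategories.add(entry.getKey());
                }
            }
        }
        return response;
    }

    public static void main(String[] args) {
        // A few of the category values from Listing 7-1.
        Map<String, Boolean> categories = new LinkedHashMap<>();
        categories.put("hate", false);
        categories.put("harassment/threatening", true);
        categories.put("violence", true);
        ModerationResponseSketch response = from(true, categories);
        System.out.println(response.isFlagged + " " + response.offendingCategories);
    }
}
```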
Making ContentModeratorBot.java More Intelligent
Now that we have ModerationClient.java to invoke the Moderations
Endpoint, let’s take a look at the updated ContentModeratorBot.
java (formerly named ContentModeratorBotDumb.java) that will
use the ModerationClient.java to check for harmful content and the
ChatGPTClientForQAandModeration.java (unmodified from the previous
chapter) to check for potential scams.
Listing 7-3 is the full source code for our intelligent Discord Moderator
Bot, ContentModeratorBot.java.
import java.io.IOException;
import java.util.EnumSet;
import net.dv8tion.jda.api.JDA;
import net.dv8tion.jda.api.JDABuilder;
import net.dv8tion.jda.api.entities.Activity;
import net.dv8tion.jda.api.entities.User;
import net.dv8tion.jda.api.entities.channel.unions.MessageChannelUnion;
import net.dv8tion.jda.api.events.message.MessageReceivedEvent;
import net.dv8tion.jda.api.hooks.ListenerAdapter;
import net.dv8tion.jda.api.requests.GatewayIntent;
If any of these violations are detected, respond with
"FLAG" (in uppercase without quotation marks). If the
message adheres to the rules, respond with "SAFE" (in
uppercase without quotation marks).
""";
// create a new ModerationClient
moderationClient = new ModerationClient();
// Handle exceptions if the thread is interrupted
// during the awaitReady process.
e.printStackTrace();
}
}
@Override
public void onMessageReceived(MessageReceivedEvent
messageEvent){
// This line takes the message from the Discord user
// and invokes the Moderation Endpoint.
moderationResponse = moderationClient.checkForObjectionalContent(message.getContentDisplay());
// This line takes the message from the Discord user
// and invokes the Chat Endpoint.
chatGPTResponse = chatGPTClient.sendMessageFromDiscordUser(message.getContentDisplay());
if (chatGPTResponse.equals("FLAG") || moderationResponse.isFlagged) {
}
}
Important Changes to Note from the Previous Version of the Content Moderator Bot
Let’s briefly take a look at ContentModeratorBot.java from Listing 7-3 and
discuss the changes that were made. The following code snippet contains a
portion of the class definition section.
// this is our Chat Endpoint client
static ChatGPTClientForQAandModeration
chatGPTClient = null;
// this is our Moderation Endpoint client
static ModerationClient moderationClient = null;
If you’re using Java 15 or later (text blocks were only a preview feature
in Java 13 and 14), then you can define a whole block of text using
“triple quote” notation. This is how we defined the System Message that
will be used by the ChatGPTClientForQAandModeration class.
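For readers who have not used the “triple quote” notation before, here is a minimal sketch of a text block. Text blocks are a standard language feature as of Java 15. The message wording below is a shortened, hypothetical stand-in and not the bot's actual System Message.

```java
class TextBlockSketch {
    // The indentation shared by every line (including the closing delimiter)
    // is stripped, so the resulting String starts at "You are".
    static final String SYSTEM_MESSAGE = """
            You are a content moderator.
            Respond with "FLAG" or "SAFE".
            """;

    public static void main(String[] args) {
        System.out.print(SYSTEM_MESSAGE);
    }
}
```

Double quotes inside a text block need no escaping, which keeps long prompts readable.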
moderationResponse = moderationClient.checkForObjectionalContent(message.getContentDisplay());
chatGPTResponse = chatGPTClient.sendMessageFromDiscordUser(message.getContentDisplay());
if (chatGPTResponse.equals("FLAG") || moderationResponse.isFlagged) {
// Mention the user who sent the inappropriate message
String authorMention = senderDiscordID.getAsMention();
Here, we take each message that was posted in the Discord server and
check it with both the Moderations Endpoint and the Chat Endpoint. If
either endpoint reports that the message is flagged, then we delete the
message from the channel and inform the user that their message
violated the rules.
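The decision described above can be sketched on its own, without any Discord or OpenAI dependencies. The names FlagDecision, shouldDelete, and buildNotice are hypothetical and do not appear in the book's listing. The null check and the call to trim() are also assumptions, added so that a missing reply or stray whitespace from the model does not break the comparison. The notice wording is taken from the bot's reply shown in the sample run.

```java
final class FlagDecision {

    // Returns true when either endpoint says the message should be removed.
    static boolean shouldDelete(String chatReply, boolean moderationFlagged) {
        // Putting the literal first keeps the comparison null-safe.
        boolean chatFlagged = chatReply != null && "FLAG".equals(chatReply.trim());
        return moderationFlagged || chatFlagged;
    }

    // Builds the notice posted to the channel after the message is deleted.
    static String buildNotice(String authorMention) {
        return authorMention
                + " This comment was deemed inappropriate for this channel."
                + " If you believe this to be in error, please contact one of the"
                + " human server moderators.";
    }

    public static void main(String[] args) {
        // The Chat Endpoint replied FLAG, the Moderations Endpoint did not flag.
        if (shouldDelete("FLAG", false)) {
            System.out.println(buildNotice("@JavaChatGPT"));
        }
    }
}
```

Keeping the decision in a small pure method like this makes it easy to unit test separately from the event handler, which would only need to call it and then delete the message and post the notice.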
Now that our Content Moderator Discord bot is intelligent, let’s give
it a try!
Content Mod Bot: @JavaChatGPT This comment was deemed
inappropriate for this channel. If you believe this to be in
error, please contact one of the human server moderators.
Conclusion
In this chapter, we created a fully functioning content moderator for
our entire Discord server! We leveraged both the Moderations and Chat
Endpoints from OpenAI to create a custom content moderator that not
only flags unsafe content such as hateful and threatening messages, but also
protects the users of the Discord server from unwanted
solicitations.
the message and inform everyone that the message was
deleted. However, since the Tech Support Bot doesn’t
know that it shouldn’t respond to other bots, it will try
to create a response. Of course, bots should not talk to
other bots.
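One possible way to keep a bot from answering other bots is to check the author of each message before responding. The sketch below uses a hypothetical Incoming class as a stand-in for a Discord message so that it runs without any libraries; the names BotFilterSketch, Incoming, and shouldRespond are assumptions, not part of the book's code. In a JDA listener, the equivalent check would likely be messageEvent.getAuthor().isBot() at the top of onMessageReceived().

```java
final class BotFilterSketch {

    // Hypothetical stand-in for the parts of a Discord message we care about.
    static final class Incoming {
        final String authorName;
        final boolean authorIsBot;
        final String text;

        Incoming(String authorName, boolean authorIsBot, String text) {
            this.authorName = authorName;
            this.authorIsBot = authorIsBot;
            this.text = text;
        }
    }

    // A bot should only answer messages written by human users.
    static boolean shouldRespond(Incoming msg) {
        return !msg.authorIsBot;
    }

    public static void main(String[] args) {
        Incoming fromUser = new Incoming("JavaChatGPT", false,
                "How do I reset my password?");
        Incoming fromBot = new Incoming("Content Mod Bot", true,
                "This comment was deemed inappropriate for this channel.");

        System.out.println(fromUser.authorName + ": respond = " + shouldRespond(fromUser));
        System.out.println(fromBot.authorName + ": respond = " + shouldRespond(fromBot));
    }
}
```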
APPENDIX 1
Appendix 1 List of OpenAI Models