APIs For AI and Data Science (For DUC PHAM) (Ryan Day)
Using Python to Build and Use APIs for Machine Learning and
Data Analytics
With Early Release ebooks, you get books in their earliest form—the
author’s raw and unedited content as they write—so you can take advantage
of these technologies long before the official release of these titles.
Ryan Day
Hands-On APIs for AI and Data
Science
by Ryan Day
See http://oreilly.com/catalog/errata.csp?isbn=9781098164416
for release details.
The views expressed in this work are those of the author and do
not represent the publisher’s views. While the publisher and
the author have used good faith efforts to ensure that the
information and instructions contained in this work are
accurate, the publisher and the author disclaim all
responsibility for errors or omissions, including without
limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and
instructions contained in this work is at your own risk. If any
code samples or other technology this work contains or
describes is subject to open source licenses or the intellectual
property rights of others, it is your responsibility to ensure that
your use thereof complies with such licenses and/or rights.
978-1-098-16435-5
Chapter 1. Becoming An API Provider
Before you start building your first API, let me ask a question:
why would you want to build one in the first place? Being an
API provider is going to take money, time, and effort that you
could spend on other parts of your business or project. So you
should be able to state the reason you would take on this effort.
Partner API
Private API
Public API
The fantasy manager expects all of the league hosts and advice
websites to connect to each other so that managers don’t have
to manually enter their team roster in multiple places for
advice. They don’t care how the websites accomplish this; they
expect it to just work.
While most league hosts are fairly similar, advice websites vary
widely. On one end of the spectrum are sites with basic advice
articles and projected rankings. On the other end are full-
featured management platforms supporting multiple teams
from multiple league hosts. Somewhere in the middle are sites
with some automated features such as “rate my team” or
weekly projections tailored to a specific league host. The advice
websites consume data from sports data sources and league
hosts to create models and analytics to help the user analyze
their team and make decisions throughout the season. Some
advice websites are ad-supported, and others charge a
subscription fee.
Figure 1-3 shows an example of the analytics products that an
advice website provides, in this case FantasyPros.
Sports data sources are companies that collect and sell sports
data to a variety of different subscribers, including broadcast
and media sources, league web hosts, advice websites, and
individuals. The data may include traditional statistics, sensor-
driven telemetry data, labeled training data for models, or
machine learning models.
Throughout the book, you will be exploring how APIs are used
in fantasy sports, which is a large and growing entertainment
business. However, the business and technical concepts you
learn apply to many other businesses.
Software name   Version   Purpose
GitHub
You will use GitHub in several ways in this book. You will store
all of your program code in repositories while you develop it.
You will use GitHub Codespaces as your Python development
environment. You will use GitHub Pages to publish your
developer portal.
The reason the book uses so many of GitHub’s tools is that I find
they simplify environment management and work together
well. The end result will be a very professional API and data
science portfolio that demonstrates what you have
accomplished.
Python
SQLite
APIs often serve data to their users, and that data is typically
stored in a relational database. For this book, you’ll be using
SQLite for the database storage of your APIs. SQLite is well
suited for learning projects like the ones you will be developing
for several reasons:
SQLAlchemy
SQL injection is a serious vulnerability in any software that accepts input from users
and queries a database with it, including web applications and APIs. It occurs when
bad actors insert malicious code into inputs that are intended for data values.
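To make the risk concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table layout and the malicious input string are assumptions chosen for illustration; they are not taken from the book's scripts. The sketch compares a query built by pasting user input into the SQL text with a parameterized query that passes the input as a data value.

```python
import sqlite3

# In-memory database with a simplified, assumed player table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player ("
    "player_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO player (first_name, last_name) VALUES (?, ?)",
    [("Justin", "Tucker"), ("Harrison", "Butker")],
)

# Input that a bad actor might submit instead of a last name.
malicious_input = "x' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so the OR clause runs as code.
unsafe_sql = f"SELECT * FROM player WHERE last_name = '{malicious_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the ? placeholder sends the input strictly as a data value.
safe_rows = conn.execute(
    "SELECT * FROM player WHERE last_name = ?", (malicious_input,)
).fetchall()

print(len(unsafe_rows))  # every row leaks
print(len(safe_rows))    # no player has that literal last name
```

The unsafe query returns every row in the table, while the parameterized query returns none, because no player has that literal string as a last name.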
# macOS
.DS_Store
# google cloud
**/gcloud/
**/google-cloud-sdk/
*.gz
# AWS
**/aws
*.zip
$ python3 --version
Python 3.10.13
The next tool you will be using for the job is SQLite.
Conveniently, SQLite comes installed with Python.
mkdir chapter1_project
cd chapter1_project
Which produces:
SQLite version 3.41.2 2023-03-22 11:56:21
Enter ".help" for usage hints.
sqlite>
Within SQL, two subsets of commands are used for setting up and maintaining
databases:
Data Definition Language (DDL): The SQL commands that create database
structures.
Data Manipulation Language (DML): The SQL statements that change the data,
through inserts, updates, deletes, and similar statements.
This book does not teach the syntax of SQL, but the scripts used
are fairly basic. To learn more about SQL, I recommend
Learning SQL: Generate, Manipulate, and Retrieve Data, 3rd
Edition by Alan Beaulieu (O’Reilly, 2020).
sqlite> .tables
performance player
Now that the tables are created, you will insert some sample
data into them. There are several ways to do this, but for
simplicity, you will use a direct SQL statement.
You’ll notice that the script included values for three of the
columns in the table, but not the primary key player_id. Because
this is the primary key, SQLite auto-increments the value for
this field when new records are inserted.
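The same behavior can be sketched from Python with the built-in sqlite3 module. The column definitions below are assumptions for illustration (the book's own DDL script is not reproduced here). The sketch runs one DDL statement, two DML statements that omit the primary key, and then reads back the keys that SQLite assigned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: create the table structure. An INTEGER PRIMARY KEY column is
# assigned automatically by SQLite when the insert omits it.
conn.execute(
    """
    CREATE TABLE player (
        player_id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name TEXT NOT NULL
    )
    """
)

# DML: insert rows without supplying player_id.
cur = conn.execute(
    "INSERT INTO player (first_name, last_name) VALUES ('Justin', 'Tucker')"
)
first_id = cur.lastrowid
cur = conn.execute(
    "INSERT INTO player (first_name, last_name) VALUES ('Harrison', 'Butker')"
)
second_id = cur.lastrowid

rows = conn.execute(
    "SELECT player_id, first_name, last_name FROM player ORDER BY player_id"
).fetchall()
print(first_id, second_id)
print(rows)
```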
sqlite> .headers on
sqlite> select * from player;
player_id|first_name|last_name
1|Justin|Tucker
2|Harrison|Butker
3|Wil|Lutz
4|Matt|Prater
5|Mason|Crosby
6|Daniel|Carlson
7|Graham|Gano
8|Younghoe|Koo
9|Greg|Joseph
10|Eddie|Pineiro
Verify the data loaded into the performance table. Type select
* from performance; resulting in:
5|2023_1|7.5|5
6|2023_1|7.5|6
7|2023_1|7.5|7
8|2023_1|7.5|8
9|2023_1|7.5|9
10|2023_1|7.5|10
Name: SQLAlchemy
Version: 1.4.49
Summary: Database Abstraction Library
Home-page: https://www.sqlalchemy.org
Author: Mike Bayer
When you are finished, the directory listing will look like the
following:
.
└── chapter1_project
    ├── database.py
    ├── fantasy_data.db
    ├── main_cli.py
    ├── models.py
    └── requirements.txt
The tasks that you need to accomplish in this file are the
following:
SQLALCHEMY_DATABASE_URL = "sqlite:///./fantasy_da
engine = create_engine(
    SQLALCHEMY_DATABASE_URL, connect_args={"check
)
SessionLocal = sessionmaker(autocommit=False, aut
Base = declarative_base()
Take a look at this file piece by piece. At the top of most Python
files, you will import the external libraries that you will use. In
this case, three specific SQLAlchemy libraries are imported.
SQLALCHEMY_DATABASE_URL = "sqlite:///./fantasy_da
engine = create_engine(
    SQLALCHEMY_DATABASE_URL, connect_args={"check
)
Base = declarative_base()
Here are the two tasks that you need to perform in this file:
    performances = relationship("Performance", ba
class Performance(Base):
    __tablename__ = "performance"
NOTE
When you import a class from another Python file in the same directory, you can
reference the filename without the .py extension.
Now it’s time to begin the definition of the Player class, which
is the Python class you’ll use to store data from the SQLite
player table. You do this using the class statement, stating the
name of the class and specifying that it will be a subclass of the
Base class imported from the database.py file. Set the special
__tablename__ attribute to tell SQLAlchemy to reference the
player table. Because of this statement, when you ask
SQLAlchemy to query Player, it will know behind the scenes
to access the player table in the database. This is one of the key
benefits of an ORM: it maps the Python code automatically to
the underlying database:
class Player(Base):
    __tablename__ = "player"
The rest of the Player class definition maps additional details
about that table. Each statement defines one class attribute
using the Column construct provided by SQLAlchemy.
Along with the definition of the tables, you define the foreign-
key relationship between the tables using the
relationship() function. This results in a
Player.performances attribute that will return all the
related rows from the performance table for each row in the
player table:
    performances = relationship("Performance", ba
The file also contains the definition for the Performance class.
The definition is similar to the Player definition. One thing to
notice is that the relationship() function results in a
Performance.player attribute, which you can use to retrieve
the player related to each performance.
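The two-way relationship described above can be sketched as follows. This is an illustration under stated assumptions, not the book's exact models.py: the listing in the text is cut off after relationship("Performance", ba, so the back_populates keyword, the column types, and the in-memory SQLite engine are all assumptions.

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Player(Base):
    __tablename__ = "player"
    player_id = Column(Integer, primary_key=True)
    first_name = Column(String, nullable=False)
    last_name = Column(String, nullable=False)
    # Assumption: back_populates links this attribute to Performance.player.
    performances = relationship("Performance", back_populates="player")


class Performance(Base):
    __tablename__ = "performance"
    performance_id = Column(Integer, primary_key=True)
    week_number = Column(String, nullable=False)
    fantasy_points = Column(Float, nullable=False)
    player_id = Column(Integer, ForeignKey("player.player_id"))
    player = relationship("Player", back_populates="performances")


# In-memory database so the sketch does not touch fantasy_data.db.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    tucker = Player(first_name="Justin", last_name="Tucker")
    tucker.performances.append(
        Performance(week_number="2023_1", fantasy_points=7.5)
    )
    session.add(tucker)
    session.commit()

    # Navigate from a performance to its player, then back again.
    perf = session.query(Performance).first()
    player_name = perf.player.last_name
    points = [p.fantasy_points for p in perf.player.performances]

print(player_name, points)
```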
main_cli.py is the file you will execute with Python to query the
database and print the results. It references the models.py and
database.py files.
def main():
    with SessionLocal() as session:
        players = session.query(models.Player).al
        for player in players:
            print(f'Player ID: {player.player_id}
            print(f'Player ID: {player.first_name
            print(f'Player ID: {player.last_name}
            for performance in player.performance
                print(f'Performance ID: {performa
                print(f'Week Number: {performance
                print(f'Fantasy Points: {performa
Breaking down the main() function, you see a few key items.
You are using the SessionLocal object that was created in the
database.py file with information about the SQLite database
and the settings selected. The with…as code creates a Python
context manager, which is Python code that performs enter
logic before a block of statements runs and exit logic after it
runs. By using the context manager, the session will be opened
for the database operations and then closed when the
operations finish.
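The enter and exit behavior of a context manager can be seen in a small standalone sketch. It uses only the standard library, and fake_session is a hypothetical stand-in for a database session, not part of SQLAlchemy.

```python
from contextlib import contextmanager

events = []


@contextmanager
def fake_session():
    # Enter logic: runs before the body of the with block.
    events.append("open")
    try:
        yield "session"
    finally:
        # Exit logic: runs after the body, even if it raised an error.
        events.append("close")


with fake_session() as session:
    events.append(f"query using {session}")

print(events)
```

The recorded events show the open step, then the work inside the with block, then the close step, which is the same ordering the session follows in main().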
session.query(Player).all()                                  Retriev… Player v…
session.query(Player).filter(Player.last_name == "Tucker")   Filter a… a data e… value
session.query(Player).order_by(Player.last_name).all()       Return… Player v… sorted b… Player.l…
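The three query patterns can be tried in a self-contained sketch. The simplified Player model and the in-memory engine are assumptions for illustration. Note that order_by() is applied before all(): all() executes the query and returns a plain Python list, which has no order_by() method.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Player(Base):
    __tablename__ = "player"
    player_id = Column(Integer, primary_key=True)
    first_name = Column(String)
    last_name = Column(String)


engine = create_engine("sqlite://")  # in-memory database for the sketch
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all(
        [
            Player(first_name="Justin", last_name="Tucker"),
            Player(first_name="Harrison", last_name="Butker"),
            Player(first_name="Wil", last_name="Lutz"),
        ]
    )
    session.commit()

    # Retrieve every row as a list of Player objects.
    all_names = [p.last_name for p in session.query(Player).all()]

    # Filter on a column value; first() returns one object or None.
    tucker = session.query(Player).filter(Player.last_name == "Tucker").first()
    tucker_first_name = tucker.first_name

    # Sort in the database, then execute the query with all().
    sorted_names = [
        p.last_name
        for p in session.query(Player).order_by(Player.last_name).all()
    ]

print(sorted_names)
```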
Now that all the code is written, you are ready to execute it and
see how Python handles the SQLite database. From the
command line enter the command python3 main_cli.py
resulting in:
Player ID: 1
Player ID: Justin
Player ID: Tucker
Performance ID: 1
Week Number: 2023_1
Fantasy Points: 7.5
Player ID: 2
Player ID: Harrison
Player ID: Butker
Performance ID: 2
Week Number: 2023_1
Fantasy Points: 7.5
Player ID: 3
Player ID: Wil
Player ID: Lutz
Performance ID: 3
Week Number: 2023_1
Fantasy Points: 7.5
...[listing continues]
By creating these three Python files, you have connected to a
SQLite database, defined the database structure, and queried
the database.
EXTENDING YOUR PORTFOLIO PROJECT
Here is how you can extend your project based on this chapter:
Summary
You’ve made a good start in your journey as an API provider.
Let’s review what you have accomplished so far:
Usability testing
User story
Structured template used in agile development to capture
user needs. Helps designers and developers to focus on
user outcomes instead of technical objectives.
—Steve Martin
To further empathize with users, you can create a user persona which defines key
users with additional details such as age, education, and even a fictional name. Read
Personas: learn how to discover your audience, understand them, and pivot to
address their needs for more information about this technique.
After reviewing the data that could fulfill each of the needs, you
find that user needs can be divided into roughly four
quadrants, based on how frequently updates are needed, and
whether read-write access is needed as shown in Figure 2-1.
Figure 2-1. Four quadrants of requirements.
Using this general approach will help select the first API
products that SWC will develop, and decide between the
Quadrant 1 and Quadrant 3 users. Both quadrants appear to
have roughly the same number of users that desire them, so
user desirability is fairly even. Both have a potential economic
benefit because they provide access to value-added services
that will benefit existing SWC customers and generate new
customers. From the technical perspective, there is a clear
difference: providing read-only, daily data (Quadrant 3) will be
significantly simpler than providing real-time, read-write
access (Quadrant 1). Simpler generally means less expensive to
develop and host, which is the other half of the economic
viability.
Using these three principles leads to a clear candidate for the
first API products: Quadrant 3. To further refine these products,
you will create user stories, which are structured descriptions
of the user type, goals, and motivations that can be fulfilled by a
product.
As a (user type)
I want to (goal or intent)
So that (motivation or benefits)
You create several user stories for the Quadrant 3 needs so that
you can decide which API products can fulfill them.
Leagues
Teams
Players
This relies on information that SWC stores today, and it can be
made available through read-only APIs. In contrast, user stories
three and five require historical (prior-season) data about
teams and leagues. SWC does not retain that information, so
supporting these user stories is not viable today. If you want to
support them in the future, you could plan for it, and you would
need to weigh its economic feasibility. You have a target, then:
you will implement APIs that fulfill user stories one, two, and
four.
Summary of Progress
API
A set of endpoints related to a single data source or
business domain.
API endpoint
API version
Breaking changes
De-serialization
HTTP verb
Specific type of action for HTTP traffic. Examples include
GET, which reads data, and DELETE, which deletes data.
Path parameter
Query parameter
Serialization
Web framework
A set of libraries that simplify common tasks for web
applications.
REST: 86%
WebHooks: 36%
GraphQL: 29%
SOAP: 26%
WebSockets: 25%
gRPC: 11%
gRPC
gRPC is not a likely candidate for the APIs that you will be
creating in your portfolio project. However, it’s worth
mentioning in this discussion of API architectural styles related
to data science for one big reason: large language models
(LLMs). These machine learning models are the engines behind
generative AI services such as Bard and ChatGPT. These are
very big models that need all the performance they can get, and
are using gRPC in some cases to achieve this.
Before diving into the Python coding for your API, let’s discuss
how this book will use a couple of key terms. For this book, we
will consider a RESTful API to be a set of endpoints that are all
related to the same data source. From this perspective, your
SWC website will start with a single API: the SWC Fantasy
Football API.
.
└── api
    └── version
        └── endpoint
Endpoint description   HTTP verb   URL
You can see that the URL is reused for several of the endpoints.
Combining the HTTP verb with the URL determines the specific
action that is taken when the resource is called, so this HTTP
verb plus URL combination must be unique. For your portfolio
project, you will develop a set of endpoints to fulfill the user
stories you selected.
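One way to picture the uniqueness rule is a routing table keyed on the verb and URL pair. This is a conceptual sketch using only the standard library. The register() helper and the POST endpoint are hypothetical and are not part of the portfolio project or of any web framework.

```python
# Routing table: each (HTTP verb, URL) pair maps to exactly one handler.
routes = {}


def register(verb, url, handler):
    """Add a handler, refusing a verb + URL combination that already exists."""
    key = (verb.upper(), url)
    if key in routes:
        raise ValueError(f"{key[0]} {url} is already registered")
    routes[key] = handler


# The same URL can be reused as long as the verb differs.
register("GET", "/v0/players/", lambda: "list players")
register("POST", "/v0/players/", lambda: "create a player")

result_get = routes[("GET", "/v0/players/")]()
result_post = routes[("POST", "/v0/players/")]()

# Registering the same verb and URL a second time is rejected.
try:
    register("GET", "/v0/players/", lambda: "duplicate")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(result_get, result_post, duplicate_rejected)
```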
In this chapter, you will begin building the APIs that provide
your consumers access to all the valuable fantasy football data
that they’ve asked for. This will provide them direct access to
data for their own data-centric work, as well as allow third-
party websites and apps to provide them with services.
Table 2-3 displays the new tools you will use in this chapter:
Table 2-3. New tools used in this chapter
Software name   Version   Purpose
FastAPI
As you will see as you work through the portfolio project, all of
these capabilities provide benefits to the users of your APIs.
Pydantic
Uvicorn
You will also add the team_player table, which handles the
many-to-many relationship between players and teams. This is
necessary because each fantasy team is made up of 12-16 NFL
players. Across the thousands of leagues that SWC hosts, each
NFL player will appear on many different teams.
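A junction table like this can be sketched with the built-in sqlite3 module. The column names of team_player and the sample team names are assumptions for illustration, since the book's DDL script is not shown here. Each team_player row pairs one team with one player, so a join in either direction answers "which players are on this team" or "which teams have this player".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE player (player_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE team (team_id INTEGER PRIMARY KEY, team_name TEXT);
    -- Assumed layout: one row per (team, player) pairing.
    CREATE TABLE team_player (
        team_id INTEGER REFERENCES team (team_id),
        player_id INTEGER REFERENCES player (player_id),
        PRIMARY KEY (team_id, player_id)
    );
    INSERT INTO player (last_name) VALUES ('Tucker'), ('Butker');
    INSERT INTO team (team_name) VALUES ('Team A'), ('Team B');
    INSERT INTO team_player (team_id, player_id) VALUES (1, 1), (1, 2), (2, 1);
    """
)

# One player can appear on many teams.
teams_with_tucker = [
    row[0]
    for row in conn.execute(
        """
        SELECT team.team_name
        FROM team
        JOIN team_player ON team_player.team_id = team.team_id
        JOIN player ON player.player_id = team_player.player_id
        WHERE player.last_name = 'Tucker'
        ORDER BY team.team_name
        """
    )
]

# One team can have many players.
players_on_team_a = [
    row[0]
    for row in conn.execute(
        """
        SELECT player.last_name
        FROM player
        JOIN team_player ON team_player.player_id = player.player_id
        JOIN team ON team.team_id = team_player.team_id
        WHERE team.team_name = 'Team A'
        ORDER BY player.last_name
        """
    )
]

print(teams_with_tucker, players_on_team_a)
```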
In your development environment, create a directory for your
Chapter 2 code with the following commands:
mkdir chapter2_project
cd chapter2_project
You will now create several files for the FastAPI API code and
Pydantic validation schemas, continuing to follow the FastAPI
Tutorial databases template.
When you are finished, the directory listing will look like the
following:
.
└── chapter2_project
    ├── crud.py
    ├── database.py
    ├── fantasy_data.db
    ├── main.py
    ├── models.py
    ├── schemas.py
    └── requirements.txt
cp ../chapter1_project/fantasy_data.db .
sqlite3 fantasy_data.db
Which produces:
SQLite version 3.41.2 2023-03-22 11:56:21
Enter ".help" for usage hints.
sqlite>
sqlite> .tables
league performance player team
To populate the new tables, you will execute DML scripts. For
the sake of space in this chapter, the number of league, team,
and team_player records is limited.
/* Team records */
INSERT INTO team (league_id, team_name) VALUES (1
INSERT INTO team (league_id, team_name) VALUES (1
INSERT INTO team (league_id, team_name) VALUES (1
INSERT INTO team (league_id, team_name) VALUES (1
INSERT INTO team (league_id, team_name) VALUES (1
INSERT INTO team (league_id, team_name) VALUES (1
INSERT INTO team (league_id, team_name) VALUES (1
INSERT INTO team (league_id, team_name) VALUES (1
INSERT INTO team (league_id, team_name) VALUES (1
INSERT INTO team (league_id, team_name) VALUES (1
INSERT INTO team (league_id, team_name) VALUES (1
INSERT INTO team (league_id, team_name) VALUES (1
You have now created three new tables in your database and
populated them with a small amount of example data. You are
now ready to create the API code using several new tools.
Endpoint description   HTTP verb   URL
pip will download and install these libraries, along with others
that are required by the libraries themselves. You should see a
message that states that these libraries were successfully
installed, such as the following:
Name: SQLAlchemy
Version: 1.4.49
Summary: Database Abstraction Library
Home-page: https://www.sqlalchemy.org
Author: Mike Bayer
Author-email: [email protected]
License: MIT
Location: /usr/local/python/3.10.13/lib/python3.1
Requires: greenlet
Required-by:
---
Name: pydantic
Version: 2.3.0
Summary: Data validation using Python type hints
Home-page:
Author:
Author-email: Samuel Colvin <[email protected]>, E
License:
Location: /usr/local/python/3.10.13/lib/python3.1
Requires: annotated-types, pydantic-core, typing-
Required-by: fastapi
---
Name: fastapi
Version: 0.103.2
Summary: FastAPI framework, high performance, eas
Home-page:
Author:
Author-email: Sebastián Ramírez <[email protected]
License:
Location: /usr/local/python/3.10.13/lib/python3.1
Requires: anyio, pydantic, starlette, typing-exte
Required-by:
---
Name: uvicorn
Version: 0.23.2
Summary: The lightning-fast ASGI server.
Home-page:
Author:
Author-email: Tom Christie <[email protected]>
License:
Location: /usr/local/python/3.10.13/lib/python3.1
Requires: click, h11, typing-extensions
Required-by:
class Player(Base):
    __tablename__ = "player"
    performances = relationship("Performance", ba

class Performance(Base):
    __tablename__ = "performance"

class League(Base):
    __tablename__ = "league"

class Team(Base):
    __tablename__ = "team"

class TeamPlayer(Base):
    __tablename__ = "team_player"
Let’s take a look at the updated models.py file. The Player and
Performance classes have not changed. The first new code is
the definition of the League class.
class League(Base):
    __tablename__ = "league"
Look at the next block of code, which defines the Team class.
class Team(Base):
    __tablename__ = "team"

class TeamPlayer(Base):
    __tablename__ = "team_player"
You have now defined all of the SQLAlchemy models needed for
the new database tables. Next, you will define the SQLAlchemy
query functions for them.
import models
def get_player(db: Session, player_id: int):
    return db.query(models.Player).filter(models
import models
Notice the import models statement: it is a reminder that these
functions act on the SQLAlchemy models you defined in the
models.py file. Instead of issuing SQL commands, you will call
query methods that reference your model classes, and
SQLAlchemy will create prepared SQL statements to retrieve
the data.
By using filter(models.Player.player_id == player_id).first(),
this function receives a specific Player.player_id value and
returns the first matching instance. Because you have defined
player_id as a primary key in the models.py file and the SQLite
database, this query will return at most one result.
Although you define the classes using Python code and your
code interacts with them as fully formed Python objects, the
consumer will receive them in an HTTP response as a JSON
object. Pydantic automatically performs the serialization
process, which is converting the Python objects into JSON, the
text format that is used to transmit them to the consumer. This
means you do not need to manage serialization in your
Python code, which simplifies your program. It is worth
mentioning again that the core of Pydantic 2 is written in Rust,
which makes this process much faster than similar code you
could write in Python.
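To keep the two directions straight: serialization turns a Python object into JSON text, and de-serialization turns JSON text back into a Python object. The sketch below shows the round trip by hand with the standard library's dataclasses and json modules. It is a stand-in for the work Pydantic does automatically, not a depiction of how Pydantic is implemented.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Performance:
    performance_id: int
    player_id: int
    week_number: str
    fantasy_points: float


perf = Performance(
    performance_id=1, player_id=1, week_number="2023_1", fantasy_points=7.5
)

# Serialization: Python object -> JSON text sent to the consumer.
payload = json.dumps(asdict(perf))

# De-serialization: JSON text -> Python object on the receiving side.
restored = Performance(**json.loads(payload))

print(payload)
print(restored == perf)
```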
from typing import List

from pydantic import BaseModel

class Performance(BaseModel):
    performance_id: int
    player_id: int
    week_number: str
    fantasy_points: float

    class Config:
        from_attributes = True

class PlayerBase(BaseModel):
    player_id: int
    first_name: str
    last_name: str

    class Config:
        from_attributes = True

class Player(PlayerBase):
    performances: List[Performance] = []

    class Config:
        from_attributes = True

class TeamBase(BaseModel):
    league_id: int
    team_id: int
    team_name: str

    class Config:
        from_attributes = True

class Team(TeamBase):
    players: List[PlayerBase] = []

    class Config:
        from_attributes = True

class League(BaseModel):
    league_id: int
    league_name: str
    scoring_type: str
    teams: List[TeamBase] = []

    class Config:
        from_attributes = True
Let’s dive into the Pydantic schemas to see how they work. The
first class is the simplest, the Performance class.
class Performance(BaseModel):
    performance_id: int
    player_id: int
    week_number: str
    fantasy_points: float

    class Config:
        from_attributes = True

class PlayerBase(BaseModel):
    player_id: int
    first_name: str
    last_name: str

    class Config:
        from_attributes = True

class Player(PlayerBase):
    performances: List[Performance] = []

    class Config:
        from_attributes = True
The next two classes are used to define the team data.
class TeamBase(BaseModel):
    league_id: int
    team_id: int
    team_name: str

    class Config:
        from_attributes = True

class Team(TeamBase):
    players: List[PlayerBase] = []

    class Config:
        from_attributes = True

class League(BaseModel):
    league_id: int
    league_name: str
    scoring_type: str
    teams: List[TeamBase] = []

    class Config:
        from_attributes = True
The League class looks simple, but don’t miss one detail:
League.teams is a List of TeamBase objects, which do not
contain a list of players. This means that the consumer
receiving a League does not recursively receive a list of all
players on each team.
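The effect of typing League.teams as a list of TeamBase can be sketched with Pydantic 2. The sample league, team, and player values are made up for illustration, and the Config classes are left out to keep the sketch short. Even when the incoming team data carries a players list, the TeamBase schema has no such field, so it is dropped from the output (Pydantic ignores extra input fields by default).

```python
from typing import List

from pydantic import BaseModel


class TeamBase(BaseModel):
    league_id: int
    team_id: int
    team_name: str


class League(BaseModel):
    league_id: int
    league_name: str
    scoring_type: str
    teams: List[TeamBase] = []


# Hypothetical input in which each team also carries its players.
league = League(
    league_id=1,
    league_name="Demo League",
    scoring_type="PPR",
    teams=[
        {
            "league_id": 1,
            "team_id": 1,
            "team_name": "Team A",
            "players": [
                {"player_id": 1, "first_name": "Justin", "last_name": "Tucker"}
            ],
        }
    ],
)

# Convert to plain Python data, as would happen before sending JSON.
league_data = league.model_dump()
first_team = league_data["teams"][0]
print(first_team)
```

Consumers who need the roster would instead request a team through a schema that includes players, such as the Team class.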
At this point, you have designed the DTOs that will be used to
send data to the API consumer, which are defined in Pydantic.
You are ready to bring FastAPI into the mix.
Now that all of the pieces are in place in the other Python files,
you can tie them together with the FastAPI functionality in
main.py. As you will see in the chapters in Part 1, an impressive
amount of functionality is provided for your API in just a few
lines of code.
app = FastAPI()

# Dependency
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

@app.get("/")
async def root():
    return {"message": "API health check successf

@app.get("/v0/players/", response_model=list[sche
def read_players(skip: int = 0, limit: int = 100,
    players = crud.get_players(db, skip=skip, lim
    return players

@app.get("/v0/players/{player_id}", response_mode
def read_player(player_id: int, db: Session = Dep
    player = crud.get_player(db, player_id=player
    if player is None:
        raise HTTPException(status_code=404, deta
    return player

@app.get("/v0/performances/", response_model=list
def read_performances(skip: int = 0, limit: int =
    performances = crud.get_performances(db, skip
    return performances

@app.get("/v0/leagues/", response_model=list[sche
def read_leagues(skip: int = 0, limit: int = 100,
    leagues = crud.get_leagues(db, skip=skip, lim
    return leagues

@app.get("/v0/teams/", response_model=list[schema
def read_teams(skip: int = 0, limit: int = 100, d
    teams = crud.get_teams(db, skip=skip, limit=l
    return teams
app = FastAPI()

# Dependency
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

@app.get("/")
async def root():
    return {"message": "API health check successf
The following explains how the HTTP verb and URL are
specified in the decorator:
HTTP verb: All of these endpoints use the GET verb, which
is defined by the @app.get() decorator function.
URL: The first parameter of the get() function is the
relative URL. For this first endpoint, the URL is /v0/players/.
@app.get("/v0/players/{player_id}", response_mode
def read_player(player_id: int, db: Session = Dep
    player = crud.get_player(db, player_id=player
    if player is None:
        raise HTTPException(status_code=404, deta
    return player
The final three endpoints do not use any new features. But
together they complete all of the user stories that we have
included for our first API.
@app.get("/v0/performances/", response_model=list
def read_performances(skip: int = 0, limit: int =
    performances = crud.get_performances(db, skip
    return performances

@app.get("/v0/leagues/", response_model=list[sche
def read_leagues(skip: int = 0, limit: int = 100,
    leagues = crud.get_leagues(db, skip=skip, lim
    return leagues

@app.get("/v0/teams/", response_model=list[schema
def read_teams(skip: int = 0, limit: int = 100, d
    teams = crud.get_teams(db, skip=skip, limit=l
    return teams
In either case, if your API is working, you should see the health
check message in your web browser:
The real test is when you call the first endpoint that looks up
data. Give that a try by copying the following URL (or its
equivalent if you are on GitHub Codespaces) into your browser's
address bar: http://127.0.0.1:8000/v0/players/?skip=0&limit=3.
If everything is working correctly, you should see the following
data in your browser:
[{"player_id":1,"first_name":"Justin","last_name"
TIP
This chapter covered a lot, so it’s possible that an error occurred or you are not
getting a successful result. Don’t worry, this happens to all of us. Here are a few
suggestions for how to troubleshoot any problems you are running into:
If this first API endpoint is working for you, try out some more
of the URLs from Table 2-4 to verify that you have completed all
of your user stories. Congratulations, you are an API developer!
EXTENDING YOUR PORTFOLIO PROJECT
For some tips about RESTful API design, read Ten REST
Commandments by Steve McDougall.
Summary
In this chapter, you built on the foundation of the database
created in Chapter 1. Here is what you have accomplished so
far: