Clean Code Principles and Patterns: Python Edition
Petri Silen, 2023
This book is for sale at http://leanpub.com/cleancodeprinciplesandpatternspythonedition
This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing
process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and
many iterations to get reader feedback, pivot until you have the right book and build traction once
you do.
2: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
3: Architectural Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.1: Software Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.2: Single Responsibility Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.3: Uniform Naming Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4: Encapsulation Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.5: Service Aggregation Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.6: High Cohesion, Low Coupling Principle . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.7: Library Composition Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.8: Avoid Duplication Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.9: Externalized Service Configuration Principle . . . . . . . . . . . . . . . . . . . . . . . . 28
3.9.1: Environment Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.9.2: Kubernetes ConfigMaps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.9.3: Kubernetes Secrets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.10: Service Substitution Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.11: Inter-Service Communication Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.11.1: Synchronous Communication Method . . . . . . . . . . . . . . . . . . . . . . 35
3.11.2: Asynchronous Communication Method . . . . . . . . . . . . . . . . . . . . . 36
3.11.3: Shared Data Communication Method . . . . . . . . . . . . . . . . . . . . . . . 38
3.12: Domain-Driven Architectural Design Principle . . . . . . . . . . . . . . . . . . . . . . 38
3.12.1: Design Example 1: Mobile Telecom Network Analytics Software System . . . . 39
3.12.2: Design Example 2: Banking Software System . . . . . . . . . . . . . . . . . . . 43
3.13: Autopilot Microservices Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.13.1: Stateless Microservices Principle . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.13.2: Resilient Microservices Principle . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.13.3: Horizontally Autoscaling Microservices Principle . . . . . . . . . . . . . . . . 48
3.13.4: Highly-Available Microservices Principle . . . . . . . . . . . . . . . . . . . . . 49
3.13.5: Observable Microservices Principle . . . . . . . . . . . . . . . . . . . . . . . . 50
3.14: Software Versioning Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.14.1: Use Semantic Versioning Principle . . . . . . . . . . . . . . . . . . . . . . . . 51
principle, and dependency inversion principle. Each SOLID principle is presented with realistic but
simple examples. The uniform naming principle defines a uniform way to name interfaces, classes,
functions, function pairs, boolean functions (predicates), and builder, factory, conversion, and lifecycle
methods. The encapsulation principle states that a class should encapsulate its internal state and
describes how immutability helps ensure state encapsulation. The encapsulation principle also discusses
the importance of not leaking an object's internal state. The object composition principle defines
that composition should be preferred over inheritance. Domain-driven design (DDD) is presented
with two real-world examples. All the design patterns from the GoF's Design Patterns book are
presented with realistic yet simple examples. The don't ask, tell principle is presented as a way to
avoid the feature envy design smell. The chapter also discusses avoiding primitive-type obsession
and the benefits of using semantically validated function arguments. The chapter ends by presenting
the dependency injection principle and the avoid code duplication principle, also known as the don't
repeat yourself (DRY) principle.
The fourth chapter is about coding principles. The chapter starts with a principle for uniformly
naming variables in code. A uniform naming convention is presented for integer, floating-point,
boolean, string, enum, and collection variables. Also, a naming convention is defined for maps,
pairs, tuples, objects, optionals, and callback functions. The uniform source code repository structure
principle is presented with examples. Next, the avoid comments principle defines concrete ways
to remove unnecessary comments from the code. The following concrete actions are presented:
naming things correctly, returning a named value, return-type aliasing, extracting a constant for
a boolean expression, extracting a constant for a complex expression, extracting enumerated values,
and extracting a function. The chapter discusses the benefits of using type hints. We discuss the
most common refactoring techniques: renaming, extracting a method, extracting a variable, replacing
conditionals with polymorphism, and introducing a parameter object. The importance of static code
analysis is described, and the most popular static code analysis tools are listed. The most common
static code analysis issues are listed with the preferred way to correct them. Handling errors and
exceptions correctly in code is fundamental and can be easily forgotten or done wrong. This chapter
explains how to handle errors and exceptions and how to return errors using a boolean failure
indicator, an optional value, or an error object. It also shows how to adapt code to a desired
error-handling mechanism and how to handle errors functionally. Ways to avoid off-by-one
errors are presented. Readers are instructed on handling situations where some code is copied from
a web page found by googling or generated by AI. The chapter ends with a discussion about code
optimization: when and how to optimize.
The fifth chapter is dedicated to testing principles. The chapter starts with the introduction of the
functional testing pyramid. Then we present unit testing and instruct how to use test-driven devel-
opment (TDD). We give unit test examples with mocking. When introducing software component
integration testing, we discuss behavior-driven development (BDD) and the Gherkin language to
describe features. Integration test examples are given using Behave and Postman API development
platform. The chapter also discusses the integration testing of UI software components. We end the
integration testing section with an example of setting up an integration testing environment using
Docker Compose. Lastly, the purpose of end-to-end (E2E) testing is discussed with some examples.
The chapter ends with a discussion about non-functional testing. The following categories of
non-functional testing are covered in more detail: performance testing, stability testing, reliability
testing, security testing, stress testing, and scalability testing.
The sixth chapter handles security principles. The threat modeling process is introduced, and there
is an example of how to conduct threat modeling for a simple API microservice. A full-blown frontend
OpenID Connect/OAuth 2.0 authentication and authorization example with TypeScript, Vue.js, and
Keycloak is implemented. Then we discuss how authorization by validating a JWT should be handled
in the backend. The chapter ends with a discussion of the most important security features: password
policy, cryptography, denial-of-service prevention, SQL injection prevention, security configuration,
automatic vulnerability scanning, integrity, error handling, audit logging, and input validation.
The seventh chapter is about API design principles. First, we tackle design principles for frontend-
facing APIs. We discuss how to design JSON-RPC, REST, and GraphQL APIs. Also, subscription-
based and real-time APIs are presented with realistic examples using Server-Sent Events (SSE) and
the WebSocket protocol. The last part of the chapter discusses inter-microservice API design and
event-driven architecture. gRPC is introduced as a synchronous inter-microservice communication
method, and examples of request-only and request-response asynchronous APIs are presented.
The 8th chapter discusses databases and related principles. We cover the following types of
databases: relational databases, document databases (MongoDB), key-value databases (Redis), wide-
column databases (Cassandra), and search engines. For relational databases, we present how to use
object-relational mapping (ORM), one-to-one, one-to-many and many-to-many relationships, and
parameterized SQL queries. Finally, we present three normalization rules for relational databases.
The 9th chapter presents concurrent programming principles regarding threading and thread safety.
For thread safety, we present several ways to achieve thread synchronization: locks, atomic variables
and thread-safe collections. We also discuss how to publish and subscribe to changes from two
different threads to a shared state.
The 10th chapter discusses teamwork principles. We explain the importance of using an agile
framework and discuss the fact that a developer usually never works alone and what that entails.
We discuss how to document a software component so that onboarding new developers is easy and
quick. Technical debt in software is something that each team should avoid. Some concrete actions to
prevent technical debt are presented. Code reviews are something teams should do, and this chapter
gives guidance on what to focus on in code reviews. The chapter ends with a discussion of developer
roles each team should have and provides hints on enabling a team to develop software as concurrently
as possible.
The 11th chapter is dedicated to DevSecOps. DevOps describes practices that integrate software
development (Dev) and software operations (Ops). It aims to shorten the software development life
cycle through parallelization and automation and provides continuous delivery with high software
quality. DevSecOps is a DevOps augmentation where security practices are integrated into the
DevOps practices. This chapter presents the phases of the DevOps lifecycle: plan, code, build and
test, release, deploy, operate and monitor. The chapter gives an example of creating a microservice
container image and how to specify the deployment of a microservice to a Kubernetes cluster. Also,
a complete example of a CI/CD pipeline using GitHub Actions is provided.
3: Architectural Principles
This chapter describes architectural principles for designing clean, modern cloud-native software
systems and applications. By architectural design, I mean the design of a software system consisting
of multiple software components. This chapter focuses on modern cloud-native microservices, but
some of the principles can be used with a monolithic software architecture. In this book, we don't
handle monolithic software architecture design, but if you design a monolithic software system,
you should consider implementing a so-called modular monolith, which is a monolith in which different
functionalities are clearly separated. This kind of modular architecture makes it possible to later
dismantle the monolith into microservices if needed, or to extract part(s) of the monolith into their
own microservice(s).
Cloud-native software is built of loosely coupled, scalable, resilient, and observable services that can
run in public, private, or hybrid clouds. Cloud-native software utilizes technologies like containers
(e.g., Docker), microservices, serverless functions, and container orchestration (e.g., Kubernetes),
and it can be automatically deployed using declarative code. Examples in this chapter assume
microservices deployed in a Kubernetes environment. Kubernetes is a cloud provider agnostic way
of running containerized microservices and has gained huge popularity in recent years. If you
are new to Kubernetes, you can find an overview of the main concepts at
https://kubernetes.io/docs/concepts/.
This chapter discusses the following architectural principles and patterns:
system. A generic data ingester is not an application by itself; only together with a configuration that
makes it a specific service can it be called an application. For example, the generic data ingester can
have a configuration to ingest raw data from the radio network part of the mobile network. The generic
data ingester and that configuration together form an application: a radio network data ingester. Then
there could be another configuration for ingesting raw data from the core network part of the mobile
network. That configuration together with the generic data ingester makes another application: a core
network data ingester.
Computer programs and libraries are software components. A software component is something that
can be individually packaged, tested, and delivered. It consists of one or more classes, and a class
consists of one or more functions (class methods). (There are no traditional classes in purely functional
languages; there, software components consist only of functions.) A computer program can also be
composed of one or more libraries, and a library can be composed of other libraries.
A software system is at the highest level in the software hierarchy and should have a single dedicated
purpose. For example, there can be an e-commerce or payroll software system. But there should
not be a software system that handles both e-commerce and payroll-related activities. If you were
a software vendor and had made an e-commerce software system, selling that to clients wanting an
e-commerce solution would be easy. But if you had made a software system that encompasses both
e-commerce and payroll functionality, it would be hard to sell that to customers wanting only an e-
commerce solution because they might already have a payroll software system and, of course, don’t
want another one.
Let’s consider the application level in the software hierarchy. Suppose we have designed a software
system for telecom network analytics. This software system is divided into four different applications:
Radio network data ingestion, core network data ingestion, data aggregation, and data visualization.
Each of these applications has a single dedicated purpose. Suppose we had coupled the data
aggregation and visualization applications into a single application. In that case, replacing the
data visualization part with a 3rd party application could be difficult. But when they are separate
applications with a well-defined interface, it would be much easier to replace the data visualization
application with a 3rd party application, if needed.
A software component should also have a single dedicated purpose. A service type of software
component with a single responsibility is called a microservice. For example, in an e-commerce
software system, one microservice could be responsible for handling orders and another for handling
sales items. Both of those microservices are responsible for one thing only. By default, we should
not have a microservice responsible for both orders and sales items. That would be against the single
responsibility principle because order and sales item handling are two different functionalities at the
same level of abstraction. But sometimes it can make sense to combine two or more functionalities
into a single microservice. The reason could be that the functionalities strongly belong together, and
putting them in a single microservice diminishes the drawbacks of microservices, like the need to use
distributed transactions. Thus, the size of a microservice can vary and depends on the abstraction
level of the microservice. Some microservices can be small, and some can be larger if they are at a
higher level of abstraction. A microservice is always smaller than a monolith and larger than a single
function. Depending on the software system and its design, the number of microservices in it can
vary from a handful to tens or even hundreds.
Let’s have an example with an e-commerce software system which consists of following functionality:
• sales items
• shopping cart
• orders
Let’s design how to split the above described functionality into microservices. When deciding which
functionality to put in the same microservice, we consider that the requirement of single responsibility
is met and high functional and non-functional cohesion is achieved. High functional cohesion
means that two functionalities depend on each other and tend to change together. An example of
low functional cohesion would email sending functionality and shopping cart functionality. Those
two functionalities don’t depend each other and they don’t change together. Thus, we should
always implement email sending and shopping cart functionalities as two separate microservices.
Non-functional cohesion is related to all non-functional aspects like architecture, technology stack,
deployment, scalability, resiliency, availability, observability, etc.
We should not put all the e-commerce software system functionality in a single microservice, because
there is not high non-functional cohesion between the sales-item-related functionality and the other
functionality. The functionality related to sales items should be put into a separate microservice that
can scale separately, because the sales item microservice receives much more traffic than the shopping
cart and order services. Also, we should be able to choose an appropriate database technology for the
sales item microservice. The database engine used should be optimized for a high number of reads
and a low number of writes. Later, we might realize that the pictures of the sales items should not be
stored in the same database as the other sales-item-related information. We could then introduce a new
microservice solely dedicated to storing/retrieving sales item images.
Instead of implementing the shopping cart and order related functionality as two separate microservices,
we could implement them as a single microservice. This is because the shopping cart and order
functionalities have high functional cohesion. For example, whenever a new order is placed, the items
in the shopping cart should be read and then removed. Also, the non-functional cohesion is high: both
functionalities can use the same technology stack and scale together. By putting the two functionalities
in a single microservice, we get rid of distributed transactions and are able to use standard database
transactions. That simplifies the codebase and the testing of the microservice. We should not name
the microservice shopping-cart-and-order-service, because that name does not denote a single
responsibility. What we should do is name the microservice using a term on a higher level of
abstraction. We could name it purchase-service, for example. In the future, if we notice that
the requirement of high functional and non-functional cohesion is no longer met, we can split the
purchase-service into two separate microservices: shopping-cart-service and order-service.
The initial division of a software system into microservices should not be set in stone. You can
make changes to it in the future if deemed appropriate. You might realize that a certain microservice
should be divided into two separate microservices due to different scaling needs, for example. Or you
might realize that it is better to combine two or more microservices into a single microservice to avoid
complex distributed transactions, for instance.
There are many advantages to microservices:
• Improved productivity
– You can choose the best-suited programming language and technology stack
– Microservices are easy to develop in parallel because there will be fewer merge conflicts
– Developing a monolith can result in more frequent merge conflicts
• Better scalability
– Each microservice encapsulates its data, which can be accessed via a public API only
– Upgrading only the changed microservice(s) is enough. No need to update the whole
monolith every time
– Build the changed microservice only. No need to build the whole monolith when
something changes
• Fewer dependencies
• Enables open-closed architecture, meaning architecture that is open for extension and closed
for modification
– New functionality not related to any existing microservice can be put into a new
microservice instead of modifying the current codebase.
The main drawback of microservices is the complexity that a distributed architecture brings.
Implementing transactions between microservices requires implementing distributed transactions
which are more complex than normal database transactions. Distributed transactions require more
code and testing. You can avoid distributed transactions by placing closely related services in a
single microservice if that is possible. Operating and monitoring a microservice-based software
system is complicated. Also, testing a distributed system is more challenging than testing a monolith.
Development teams should put focus on these areas by hiring DevOps and test automation specialists.
A library type of software component should also have a single responsibility. Like calling single-
responsibility services microservices, we can call a single-responsibility library a microlibrary. For
example, there could be a library for handling YAML-format content and another for handling XML-
format content. We shouldn’t try to bundle the handling of both formats into a single library. If
we did and needed only the YAML-related functionality, we would also always get the XML-related
functionality. Our code would always ship with the XML-related code, even if it is never used. This
can introduce unnecessary code bloat. We would also have to take any security patch for the library
into use, even if the patch was only for the XML-related functionality we don’t use.
When developing software, you should establish a naming convention for different kinds of software
components: microservices, clients, jobs, operators, command line interfaces (CLIs) and libraries.
Next I present my suggested way of naming different software components.
Microservices should define a public API that other microservices use for interfacing. Anything
behind the public API is private and inaccessible from other microservices.
While microservices should be made stateless (the stateless services principle is discussed later in this
chapter), a stateless microservice needs a place to store its state outside the microservice. Typically,
the state is stored in a database. The database is the microservice’s internal dependency and should be
made private to the microservice, meaning that no other microservice can directly access the database.
Access to the database happens indirectly using the microservice’s public API.
It is discouraged to allow multiple microservices to share a single database, because then there is no
control over how each microservice uses the database and what requirements each microservice has
for it.
Sometimes it is possible to share a physical database among several microservices if each microservice
uses its own logical database. This requires that a specific database user is created for each
microservice. Each database user can access only the one logical database dedicated to a particular
microservice. In this way, no microservice can directly access another microservice's database. This
approach can still pose some problems because the dimensioning requirements of all microservices for
the shared physical database must be considered. Also, the deployment responsibility of the shared
database must be decided. The shared database could be deployed as part of the platform or common
services deployment, for example.
Service aggregation happens when one service on a higher level of abstraction aggregates services on
a lower level of abstraction.
Let’s have a service aggregation example with an e-commerce software system that allows people to
sell second-hand products online.
The problem domain of the e-commerce service consists of the following subdomains:
– Add new sales items, modify, view, and delete sales items
• Order domain
– Placing orders
* Ensure payment
* Create order
* Remove ordered items from the shopping cart
* Mark ordered sales items sold
* Send order confirmation by email
– View orders with sales item details
– Update and delete orders
We should not implement all the subdomains in a single ecommerce-service microservice because then
we would not be following the single responsibility principle. We should use service aggregation.
We create a separate lower-level microservice for each subdomain. Then we create a higher-level
ecommerce-service microservice that aggregates those lower-level microservices.
We can define that our ecommerce-service aggregates the following lower-level microservices:
• user-account-service
• sales-item-service
• shopping-cart-service
– View a shopping cart, add/remove sales items from a shopping cart or empty a shopping
cart
• order-service
– Create/Read/Update/Delete orders
• email-notification-service
Most of the microservices described above can be implemented as REST APIs because they mainly
contain basic CRUD (create, read, update and delete) operations for which a REST API is a good match.
We will handle API design in more detail in a later chapter. Let’s implement the sales-item-service as
a REST API using Django and Django REST framework.
Create a directory for the Django project, and in that directory create and activate a virtual environment:

python -m venv venv

# Activate on Windows:
venv\Scripts\activate

# Activate on Linux/macOS:
source venv/bin/activate
Install dependencies:
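The exact install command is not shown here. Assuming only Django and the Django REST framework are needed for this example, it would be along the lines of:

pip install django djangorestframework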
We will first implement the SalesItem model class, which contains properties like name and price.
Figure 3.5. models.py
from django.core.validators import MaxValueValidator, MinValueValidator
from django.db import models


class SalesItem(models.Model):
user_account_id = models.BigIntegerField()
name = models.CharField(max_length=512)
price = models.IntegerField(
validators=[MinValueValidator(1), MaxValueValidator(2147483647)]
)
Next, we will implement a serializer for the SalesItem model. In the serializer class, we list the model
fields to be serialized by their names. This is good from a security point of view. We should not
use fields = '__all__', because if we add some internal fields to the model, they would be
automatically serialized and sent to clients, exposing internal information. It is safer to list
the serialized fields explicitly.
Figure 3.6. serializers.py
from rest_framework import serializers

from .models import SalesItem


class SalesItemSerializer(serializers.ModelSerializer):
class Meta:
model = SalesItem
fields = ['id', 'user_account_id', 'name', 'price']
Finally, we implement the SalesItemViewSet class, which defines API endpoints for creating, getting,
updating, and deleting sales items:
Figure 3.7. views.py
from typing import Any

from rest_framework import viewsets
from rest_framework.request import Request
from rest_framework.response import Response

from .models import SalesItem
from .serializers import SalesItemSerializer


class SalesItemViewSet(viewsets.ModelViewSet):
queryset = SalesItem.objects.all()
serializer_class = SalesItemSerializer
def list(
self, request: Request, *args: tuple[Any], **kwargs: dict[str, Any]
) -> Response:
user_account_id = request.query_params.get('userAccountId')
queryset = (
SalesItem.objects.all()
if user_account_id is None
else SalesItem.objects.filter(user_account_id=user_account_id)
)
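        # Assumed completion (not shown above): serialize the possibly
        # filtered queryset and return it in the response
        serializer = self.get_serializer(queryset, many=True)
        return Response(serializer.data)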
We also need to update the urls.py file in the Django project to contain the following:
Figure 3.8. urls.py
from django.urls import include, path
from rest_framework import routers

# Adjust the import path to match your app layout
from .views import SalesItemViewSet

router = routers.DefaultRouter(trailing_slash=False)
router.register('sales-items', SalesItemViewSet)
urlpatterns = [
path('', include(router.urls)),
]
In the above examples, I used idiomatic Django by defining models in the models.py file, serializers
in the serializers.py file, and views in the views.py file. Instead of that, you could define each class in its
own file and name the file according to the class name. In my opinion, that is the best approach
to ensure a single responsibility for each module. For module names containing a class definition,
I use CapWords (or PascalCase). This is against the PEP 8 style guide, and it is the only deviation
from PEP 8 that I am making in this book. You can, of course, follow PEP 8, but there are two
reasons for the approach I am using (see the sketch after this list):
• The module name tells you that it contains a single public class definition, and it tells you the
name of that class. For example, if you have a module named OrderService.py,
you can expect that a class named OrderService can be imported from it.
• If you export an instance of the class from a module, that kind of module should be named
in snake case. For example, if you have a module with a private __OrderService class and
export an order_service variable (a singleton) that is an instance of the __OrderService
class, you should name that module order_service.py. Now the order_service.py module
name tells everyone that a variable named order_service should be importable from that
module.
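As a concrete, illustrative sketch of this convention, the two kinds of modules could look like the following; the class contents are placeholders:

# OrderService.py: importers use 'from OrderService import OrderService'
class OrderService:
    def create_order(self, order) -> None:
        ...


# order_service.py: importers use 'from order_service import order_service'
class __OrderService:
    def create_order(self, order) -> None:
        ...


# Export a singleton instance of the private class
order_service = __OrderService()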
Let’s get to the Django example. If you have several models and you put them all into the models.py
file, the size of file will grow and it is no more easy to find the wanted class. A better option
is to create a models directory (a package) and put individual model classes to separate modules
in that directory. Finding the wanted model is easy, because the models are automatically listed
in alphabetical order in the file browser of your IDE. You cannot guarantee alphabetical order of
multiple class definitions if they are defined in a single file.
The same approach applies to modules that contain multiple functions. Suppose we have a utils.py
module containing various utility functions. Once again, a better option is to create a directory
named utils and put the individual functions into their own files. You can then easily locate the wanted
utility function by looking at the utils directory contents. You can even create subdirectories to
make the structure hierarchical, like a string directory under the utils directory for string-related
utility functions. A single file should contain a single public function, but it can additionally contain
multiple private functions that the public function utilizes. There will be more discussion about the
single responsibility principle in the next chapter.
Below, we define how the ecommerce-service orchestrates the use of the aggregated lower-level
microservices:
• Order domain
The ecommerce-service is meant to be used by frontend clients, like a web client, for example. The term
Backend for Frontend (BFF) is often used to describe a microservice designed to provide an API for
frontend clients. Compared to the BFF term, service aggregation is a more generic term, and there need
not be a frontend involved. You can use service aggregation to create an aggregated microservice used
by one or more other microservices. There can even be multiple levels of service aggregation if
you have a large and complex software system.
Clients can have different needs regarding what information they want from an API. For example,
a mobile client might be limited to showing only a subset of all the information available from an API.
In contrast, a web client can fetch all the information, or the information a client retrieves from the
API can be customized.
All of the above requirements are something that a GraphQL-based API can fulfill. For that reason, it
would be wise to implement the ecommerce-service using GraphQL. I have chosen the Ariadne library
to implement a single GraphQL query in the ecommerce-service. Below is the implementation of a
user query, which fetches data from three microservices. It fetches user account information from the
user-account-service, the user’s sales items from the sales-item-service, and finally, the user’s orders
from the order-service.
Let’s create a new Python project and install the following dependencies:
import os
from asyncio import gather

from ariadne import QueryType, gql
from httpx import AsyncClient

query = QueryType()
type_defs = gql(
"""
type UserAccount {
id: ID!,
userName: String!
# Define additional properties...
}
type SalesItem {
id: ID!,
name: String!
# Define additional properties...
}
type Order {
id: ID!,
userId: ID!
# Define additional properties...
}
type User {
userAccount: UserAccount!
salesItems: [SalesItem!]!
orders: [Order!]!
}
type Query {
user(id: ID!): User!
}
"""
)
USER_ACCOUNT_SERVICE_URL = os.environ.get('USER_ACCOUNT_SERVICE_URL')
SALES_ITEM_SERVICE_URL = os.environ.get('SALES_ITEM_SERVICE_URL')
ORDER_SERVICE_URL = os.environ.get('ORDER_SERVICE_URL')
@query.field('user')
async def resolve_user(_, info, id):
async with AsyncClient() as client:
[
user_account_service_response,
sales_item_service_response,
order_service_response,
] = await gather(
client.get(f'{USER_ACCOUNT_SERVICE_URL}/user-accounts/{id}'),
client.get(
f'{SALES_ITEM_SERVICE_URL}/sales-items?userAccountId={id}'
),
client.get(f'{ORDER_SERVICE_URL}/orders?userAccountId={id}'),
)
user_account_service_response.raise_for_status()
sales_item_service_response.raise_for_status()
order_service_response.raise_for_status()
return {
'userAccount': user_account_service_response.json(),
'salesItems': sales_item_service_response.json(),
'orders': order_service_response.json(),
}
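The creation of the executable schema and the ASGI application object that Hypercorn serves is omitted above. A minimal sketch of that part, using Ariadne's standard API, could be:

from ariadne import make_executable_schema
from ariadne.asgi import GraphQL

# Build the executable schema from the type definitions and the query
# resolvers, and expose it as an ASGI application named 'app'
schema = make_executable_schema(type_defs, query)
app = GraphQL(schema, debug=True)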
To start the GraphQL server, we need an ASGI web server (e.g., Hypercorn). You can run the
GraphQL server as follows:
export SALES_ITEM_SERVICE_URL=http://127.0.0.1:8000
export USER_ACCOUNT_SERVICE_URL=...
export ORDER_SERVICE_URL=...
hypercorn app:app -b 127.0.0.1:5000
You can access the GraphiQL UI at http://127.0.0.1:5000/graphql. In the left-hand side pane, you can
specify a GraphQL query. For example, to query the user identified with id 2:
{
user(id: 2) {
userAccount {
id
userName
}
salesItems {
id
name
}
orders {
id
userId
}
}
}
Because we have only implemented the sales-item-service lower-level microservice and haven't
implemented the other lower-level microservices, let's modify app.py to return dummy static results
instead of accessing the non-existent lower-level microservices:
query = QueryType()
type_defs = gql(
"""
type UserAccount {
id: ID!,
userName: String!
# Define additional properties...
}
type SalesItem {
id: ID!,
name: String!
# Define additional properties...
}
type Order {
id: ID!,
userId: ID!
# Define additional properties...
}
type User {
userAccount: UserAccount!
salesItems: [SalesItem!]!
orders: [Order!]!
}
type Query {
user(id: ID!): User!
}
"""
)
SALES_ITEM_SERVICE_URL = os.environ.get('SALES_ITEM_SERVICE_URL')
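# NOTE: The dummy helpers used by the resolver below are not shown in the
# original figure. A minimal, illustrative sketch follows; the helper names
# come from the resolver, and the returned data matches the sample query
# result shown later.
class StaticResponse:
    # Mimics the small part of the httpx response interface used by the resolver
    def __init__(self, data):
        self.__data = data

    def json(self):
        return self.__data


async def getUserAccount(id):
    # Return dummy static user account data
    return StaticResponse({'id': id, 'userName': 'Petri'})


async def getOrders(id):
    # Return dummy static order data
    return StaticResponse([{'id': '1', 'userId': id}])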
@query.field('user')
async def resolve_user(_, info, id):
async with AsyncClient() as client:
[
user_account_service_response,
sales_item_service_response,
order_service_response,
] = await gather(
getUserAccount(id),
client.get(
f'{SALES_ITEM_SERVICE_URL}/sales-items?userAccountId={id}'
),
getOrders(id),
)
return {
'userAccount': user_account_service_response.json(),
'salesItems': sales_item_service_response.json(),
'orders': order_service_response.json(),
}
If we now execute the previously specified query, we should see the below query result. We assume
that sales-item-service returns a single sales item with id 1.
{
"data": {
"user": {
"userAccount": {
"id": "2",
"userName": "Petri"
},
"salesItems": [
{
"id": "1",
"name": "Sales item 1"
}
],
"orders": [
{
"id": "1",
"userId": "2"
}
]
}
}
}
We can simulate a failure by starting the app with a wrong sales-item-service URL (port 8000 is
changed to 8001):
export SALES_ITEM_SERVICE_URL=http://127.0.0.1:8001
hypercorn app:app -b 127.0.0.1:5000
Now, if we execute the query again, we will get the below error response, because the server cannot
connect to the sales-item-service: there is no service running at localhost:8001.
{
"data": null,
"errors": [
{
"message": "All connection attempts failed",
"locations": [
{
"line": 2,
"column": 3
}
],
"path": [
"user"
],
"extensions": {
"exception": {
"stacktrace": [ ...
],
"context": {
"mapped_exc": "<class 'httpx.ConnectError'>",
"from_exc": "<class 'httpc...rotocolError'>",
"to_exc": "<class 'httpx...rotocolError'>",
"message": "'All connecti...tempts failed'"
}
}
}
}
]
}
You can also query a user and specify the query to return only a subset of fields. The below query does
not return ids and does not return orders. The server-side GraphQL library automatically includes
only requested fields in the response. You, as a developer, do not have to do anything. You can, of
course, optimize your microservice to fetch only the requested fields from the database if you desire.
{
user(id: 2) {
userAccount {
userName
}
salesItems {
name
}
}
}
{
"data": {
"user": {
"userAccount": {
"userName": "pksilen"
},
"salesItems": [
{
"name": "sales item 1"
}
]
}
}
}
The above example lacks some features, like authorization, that are needed for production. Authorization
should check that a user can only execute the user query to fetch their own resources. The authorization
should fail if a user tries to execute the user query using someone else's id. Security is discussed more
in the coming security principles chapter.
The user query in the previous example spanned over multiple lower-level microservices: user-
account-service, sales-item-service, and order-service. Because the query is not mutating anything, it
can be executed without a distributed transaction. A distributed transaction is similar to a regular
(database) transaction, with the difference that it spans multiple remote services.
The API endpoint for placing an order in the ecommerce-service needs to create a new order using the
order-service, mark the purchased sales items as bought using the sales-item-service, empty the shopping
cart using the shopping-cart-service, and finally send an order confirmation email using the email-
notification-service. These actions need to be wrapped inside a distributed transaction because we
want to be able to roll back the transaction if any of these operations fail. Guidance on how to
implement a distributed transaction is given later in this chapter.
Service aggregation utilizes the facade pattern. The facade pattern allows hiding individual lower-
level microservices behind a facade (the higher-level microservice). The clients of the software
system access the system through the facade. They don't directly contact the individual lower-level
microservices behind the facade, because that would break the encapsulation of the lower-level
microservices inside the higher-level microservice. A client accessing the lower-level microservices
directly creates unwanted coupling between the client and the lower-level microservices, which makes
changing the lower-level microservices hard without affecting the client.
Think about a post office counter as an example of a real-world facade. It serves as a facade for the
post office, and when you need to receive a package, you communicate with that facade (the post office
clerk at the counter). You have a simple interface of just telling the package code, and the clerk will
find the package on the correct shelf and bring it to you. If you didn't have that facade, you would
have to do the lower-level work yourself. Instead of just telling the package code, you would have to
walk to the shelves, try to find the proper shelf where your package is located, make sure that you
pick the correct package, and then carry the package by yourself. In addition to requiring more work,
this approach is more error-prone. You can accidentally pick someone else's package if you are not
pedantic enough. And think about the case when you go to the post office the next time and find out
that all the shelves have been rearranged. This wouldn't be a problem if you used the facade.
Service aggregation, where a higher-level microservice delegates to lower-level microservices, also
implements the bridge pattern. A higher-level microservice provides only some high-level control
and relies on the lower-level microservices to do the actual work.
Service aggregation allows using more design patterns from the object-oriented design world. The
most useful design patterns in the context of service aggregation are:
• Decorator pattern
• Proxy pattern
• Adapter pattern
The decorator pattern can be used to add functionality in a higher-level microservice on top of the
lower-level microservices. One example is adding audit logging in the higher-level microservice. For
example, you can add audit logging to be performed for requests in the ecommerce-service; you don't
need to implement the audit logging separately in all the lower-level microservices. A sketch of this
is given below.
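As an illustrative sketch (not from the book), audit logging could be added for requests handled by the ecommerce-service by wrapping the user resolver from the earlier Ariadne example with a Python decorator; the audit_logged helper and the logger name are assumptions:

import functools
import logging

audit_logger = logging.getLogger('audit')


def audit_logged(resolver):
    # Decorator that adds audit logging around a GraphQL resolver
    @functools.wraps(resolver)
    async def resolve_with_audit_logging(obj, info, **kwargs):
        audit_logger.info(
            "Query '%s' executed with arguments %s", info.field_name, kwargs
        )
        return await resolver(obj, info, **kwargs)

    return resolve_with_audit_logging


@query.field('user')
@audit_logged
async def resolve_user(_, info, id):
    ...  # Fetch and return the user as in the earlier example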
Proxy pattern can be used to control the access from a higher-level microservice to lower-level
microservices. Typical examples of the proxy pattern are authorization and caching. For example,
you can add authorization and caching to be performed for requests in the ecommerce-service. Only
after successful authorization will the requests be delivered to the lower-level microservices. And
if a request’s response is not found in the cache, the request will be forwarded to the appropriate
lower-level microservice. You don’t need to implement authorization and caching separately in all
the lower-level microservices.
Adapter pattern allows a higher-level microservice to adapt to different versions of the lower-level
microservices while maintaining the API towards clients unchanged.
Cohesion refers to the degree to which classes inside a service belong together. Coupling refers
to how many other services a service is interacting with. When following the single responsibility
principle, it is possible to implement services as microservices with high cohesion. Service aggregation
adds low coupling. Microservices and service aggregation together enable high cohesion and low
coupling, which is the target of good architecture. If there were no service aggregation, lower-level
microservices would need to communicate with each other, creating high coupling in the architecture.
Also, clients would be coupled with the lower-level microservices. For example, in the e-commerce
example, the order-service would be coupled with almost all the other microservices. And if the
sales-item-service API changed, in the worst case, there would be a change needed in three other
microservices. When using service aggregation, lower-level microservices are coupled only to the
higher-level microservice.
High cohesion and low coupling mean that the development of services can be highly parallelized. In
the e-commerce example, the five lower-level microservices don’t have coupling with each other. The
development of each of those microservices can be isolated and assigned to a single team member or
a group of team members. The development of the lower-level microservices can proceed in parallel,
and the development of the higher-level microservice can start when the APIs of the lower-level
microservices become stable enough. The target is to design the lower-level microservices APIs early
on to enable the development of the higher-level microservice.
Suppose you need a library for parsing configuration files (in a particular syntax) in YAML or JSON
format. In that case, you can first create the needed YAML and JSON parsing libraries (or use
existing ones). Then you can create the configuration file parsing library, composed of the YAML
and JSON parsing libraries. You would then have three different libraries: one higher-level library
and two lower-level libraries. Each library has a single responsibility: one for parsing JSON, one for
parsing YAML, and one for parsing configuration files with a specific syntax, either in JSON or YAML.
Software components can now use the higher-level library for parsing configuration files, and they
need not be aware of the JSON/YAML parsing libraries at all.
Duplication at the software system level happens when two or more software systems use the same
services. For example, two different software systems can both have a message broker, API gateway,
identity and access management (IAM) application, and log and metrics collection services. You
could continue this list even further. The goal of duplication-free architecture is to have only one
deployment of these services. Public cloud providers offer these services for your use. If you have a
Kubernetes cluster, an alternative solution is to deploy your software systems in different Kubernetes
namespaces and deploy the common services to a shared Kubernetes namespace, which can be called
the platform or common-services, for example.
Duplication at the service level happens when two or more services have common functionality that
could be extracted to a separate new microservice. For example, consider a case where both a user-
account-service and order-service have the functionality to send notification messages by email to a
user. This email-sending functionality is duplicated in both microservices. Duplication can be avoided
by extracting the email-sending functionality to a separate new microservice. The single responsibility
of the microservices becomes more evident when the email-sending functionality is extracted to its
own microservice. One might think another alternative is extracting the common functionality to a
library. This is not as good a solution, because the microservices become dependent on the library.
When changes to the library are needed (e.g., security updates), you must change the library version
in all the microservices using the library and then test all the affected microservices.
When a company develops multiple software systems in several departments, the software devel-
opment typically happens in silos. The departments are not necessarily aware of what the other
departments are doing. For example, it might be possible that two departments have both developed
a microservice for sending emails. There is now software duplication that no one is aware of. This is not
an optimal situation. A software development company should do something to enable collaboration
between the departments and break the silos. One good way to share software is to establish
shared folders or organizations in the source code repository hosting service that the company uses.
For example, in GitHub, you could create an organization for sharing source code repositories for
common libraries and another for sharing common services. Each software development department
has access to those common organizations and can still develop its software inside its own GitHub
organization. In this way, the company can enforce proper access control for the source code of
different departments, if needed. When a team needs to develop something new, it can first consult
the common source code repositories to find out if something is already available that can be reused
as such or extended.
Service configuration means any data that varies between service deployments (different environ-
ments, different customers, etc.). The following are typical places where externalized configuration
can be stored when software is running in a Kubernetes cluster:
• Environment variables
• Kubernetes ConfigMaps
• Kubernetes Secrets
In the following sections, let’s discuss these three configuration storage options.
You should not hardcode the default values for environment variables in the source code. This is
because the default values are typically not for a production environment but for a development
environment. Suppose you deploy a service to a production environment and forget to set all the
needed environment variables. In that case, your service will have some environment variables with
default values unsuitable for a production environment.
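For example, a microservice can fail fast at startup when a required environment variable is missing, instead of silently falling back to a development-time default. A minimal sketch (the variable names follow the examples below):

import os

try:
    # Require the variables to be set; do not fall back to hardcoded defaults
    mongodb_host = os.environ['MONGODB_HOST']
    mongodb_port = int(os.environ['MONGODB_PORT'])
except KeyError as missing_variable:
    raise RuntimeError(
        f'Required environment variable {missing_variable} is not set'
    )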
You can supply environment variables for a microservice in environment-specific .env files. For
example, you can have an .env.dev file for storing environment variable values for a development
environment and an .env.ci file for storing environment variable values used in the microservice’s
continuous integration (CI) pipeline. The syntax of .env files is straightforward. There is one
environment variable defined per line:
Figure 3.13. .env.dev
NODE_ENV=development
HTTP_SERVER_PORT=3001
LOG_LEVEL=INFO
MONGODB_HOST=localhost
MONGODB_PORT=27017
MONGODB_USER=
MONGODB_PASSWORD=
Figure 3.14. .env.ci
NODE_ENV=integration
HTTP_SERVER_PORT=3001
LOG_LEVEL=INFO
MONGODB_HOST=localhost
MONGODB_PORT=27017
MONGODB_USER=
MONGODB_PASSWORD=
When a software component is deployed to a Kubernetes cluster using Helm, environment variable
values should be defined in the Helm chart’s values.yaml file:
Figure 3.15. values.yaml
nodeEnv: production
httpServer:
port: 8080
database:
mongoDb:
host: my-service-mongodb
port: 27017
The values in the above values.yaml file can be used to define environment variables in a Kubernetes
Deployment using the following Helm chart template:
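A minimal sketch of the env section of such a Deployment template, matching the values.yaml above and the environment variables listed below (the exact template in the book may differ):

env:
  - name: NODE_ENV
    value: {{ .Values.nodeEnv }}
  - name: HTTP_SERVER_PORT
    value: "{{ .Values.httpServer.port }}"
  - name: MONGODB_HOST
    value: {{ .Values.database.mongoDb.host }}
  - name: MONGODB_PORT
    value: "{{ .Values.database.mongoDb.port }}"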
When Kubernetes starts a microservice pod, the following environment variables will be made
available for the running container:
NODE_ENV=production
HTTP_SERVER_PORT=8080
MONGODB_HOST=my-service-mongodb
MONGODB_PORT=27017
The below Kubernetes Deployment descriptor defines that the content of the my-service ConfigMap’s
key LOG_LEVEL will be stored in a volume named config-volume, and the value of the LOG_LEVEL key will
be stored in a file named LOG_LEVEL. After mounting the config-volume to the /etc/config directory
in a my-service container, it is possible to read the contents of the /etc/config/LOG_LEVEL file, which
contains the text: INFO.
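A minimal sketch of the relevant parts of such a Deployment descriptor (the volume and mount names follow the description above; the other details are assumptions):

containers:
  - name: my-service
    # ...
    volumeMounts:
      - name: config-volume
        mountPath: /etc/config
volumes:
  - name: config-volume
    configMap:
      name: my-service
      items:
        - key: LOG_LEVEL
          path: LOG_LEVEL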
In Kubernetes, editing of a ConfigMap is reflected in the respective mounted file. This means that
you can listen to changes in the /etc/config/LOG_LEVEL file. Below is shown how to do it using the
watchdog library:
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class UpdateLogLevelFsEventHandler(FileSystemEventHandler):
    def on_modified(self, event):
        try:
            with open('/etc/config/LOG_LEVEL', 'r') as file:
                new_log_level = file.read()
                # Check here that 'new_log_level'
                # contains a valid log level
                update_log_level(new_log_level)
        except Exception:
            # Handle errors
            pass
update_log_level_fs_event_handler = UpdateLogLevelFsEventHandler()
observer = Observer()
observer.schedule(
update_log_level_fs_event_handler,
path='/etc/config/LOG_LEVEL',
recursive=False
)
observer.start()
# ...
# observer.stop()
# observer.join()
database:
mongoDb:
host: my-service-mongodb
port: 27017
user: my-service-user
password: Ak9(lKt41uF==%lLO&21mA#gL0!"Dps2
apiVersion: v1
kind: Secret
metadata:
name: my-service
type: Opaque
data:
mongoDbUser: {{ .Values.database.mongoDb.user | b64enc }}
mongoDbPassword: {{ .Values.database.mongoDb.password | b64enc }}
After being created, secrets can be mapped to environment variables in a Deployment descriptor for
a microservice. In the below example, we map the value of the secret key mongoDbUser from the
my-service secret to an environment variable named MONGODB_USER and the value of the secret key
mongoDbPassword to an environment variable named MONGODB_PASSWORD.
- name: MONGODB_USER
  valueFrom:
    secretKeyRef:
      name: my-service
      key: mongoDbUser
- name: MONGODB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: my-service
      key: mongoDbPassword
When a my-service pod is started, the following environment variables are made available for the
running container:
MONGODB_USER=my-service-user
MONGODB_PASSWORD=Ak9(lKt41uF==%lLO&21mA#gL0!"Dps2
Let’s have an example where a microservice depends on a MongoDB service. The MongoDB service
should expose itself by defining a host and port combination. For the microservice, you can specify
the following environment variables for connecting to a localhost MongoDB service:
MONGODB_HOST=localhost
MONGODB_PORT=27017
Suppose that in a Kubernetes-based production environment, you have a MongoDB service in the
cluster accessible via a Kubernetes Service named my-service-mongodb. In that case, you should have
the environment variables for the MongoDB service defined as follows:
MONGODB_HOST=my-service-mongodb.default.svc.cluster.local
MONGODB_PORT=8080
Alternatively, a MongoDB service can run in the MongoDB Atlas cloud. Then the MongoDB service
could be connected to using the following kind of environment variable values:
MONGODB_HOST=my-service.tjdze.mongodb.net
MONGODB_PORT=27017
As shown with the above examples, you can easily substitute a different MongoDB service depending
on your microservice’s environment. If you want to use a different MongoDB service, you don’t need
to modify the microservice’s source code but only change the configuration.
In case of a failure when processing a request, the request processing microservice sends an error
response to the requestor microservice. The requestor microservice can cascade the error up in the
synchronous request stack until the initial request maker is reached. Often, that initial request maker
is a client, like a web or mobile client. The initial request maker can then decide what to do. Usually, it
will attempt to send the request again after a while (we are assuming here that the error is a transient
server error, not a client error like a bad request).
Asynchronous communication can be implemented using a message broker. Services can produce
messages to the message broker and consume messages from the message broker. There are
several message broker implementations available like Apache Kafka, RabbitMQ, Apache ActiveMQ
and Redis. When a microservice produces a request to a message broker’s topic, the producing
microservice must wait for an acknowledgment from the message broker indicating that the request
was successfully stored to multiple, or preferably all, replicas of the topic. Otherwise, there is no 100%
guarantee that the request was successfully delivered in some message broker failure scenarios.
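As an illustrative sketch (not from the book), waiting for such an acknowledgment could look like the following with the kafka-python client, where acks='all' requires the message to be stored in all in-sync replicas before the send is acknowledged; the topic name and broker address are assumptions:

import json

from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(
    bootstrap_servers='message-broker:9092',
    # Require an acknowledgment from all in-sync replicas of the topic
    acks='all',
    value_serializer=lambda value: json.dumps(value).encode('utf-8'),
)

try:
    # Block until the broker acknowledges that the request is stored
    producer.send('order-requests', {'orderId': 123}).get(timeout=10)
except KafkaError:
    # The request was not acknowledged; retry or report the error
    ...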
When an asynchronous request is of type fire-and-forget (i.e., no response is expected), the request
processing microservice must ensure that the request will eventually get processed. If the request
processing fails, the request processing microservice must reattempt the processing after a while. If a
termination signal is received, the request processing microservice instance must produce the request
back to the message broker and allow some other instance of the microservice to fulfill the request.
The rare possibility exists that producing the request back to the message broker fails. You could
then try to save the request to a persistent volume, for instance, but that can also fail. The
likelihood of such a situation is very low.
Designing APIs for inter-service communication is described in more detail in the API design
principles chapter.
I often compare software system architectural design to the architectural design of a house. The
house represents a software system. The facade of the house represents the external interfaces of
the software system. The rooms in the house are the microservices of the software system. Like a
microservice, a single room usually has a dedicated purpose. The architectural design of a software
system encompasses the definition of external interfaces, microservices, and their interfaces to other
microservices.
The result of the architectural design phase is a ground plan for the software system. After the
architectural design, you have the facade designed, and all the rooms are specified: the purpose of
each room and how rooms interface with other rooms.
Designing an individual microservice is no longer architectural design; it is like the interior design of a
single room. The design of individual microservices is handled using object-oriented design principles,
presented in the next chapter.
Domain-driven design (DDD) is a software design approach where software is modeled to match
a problem/business domain according to input from the domain experts. Usually, these experts
come from the business and specifically from product management. The idea of DDD is to transfer
the domain knowledge from the domain experts to individual software developers so that everyone
participating in software development can share a common language that describes the domain. The
idea of the common language is that people understand each other and that multiple terms are not
used to describe a single concept. This common language is also called the ubiquitous language.
The domain knowledge is transferred from product managers and architects to lead developers and
product owners (POs) in development teams. The team’s lead developer and PO share the domain
knowledge with the rest of the team. This usually happens when the team processes epics and features
and splits them into user stories in planning sessions. A software development team can also have a
dedicated domain expert or experts.
DDD starts from the top business/problem domain. The top domain is split into multiple subdomains
on the same abstraction level: one level lower than the top domain. A domain should be divided into
subdomains so that there is minimal overlap between subdomains. Subdomains will be interfacing
with other subdomains using well-defined interfaces. Subdomains are also called bounded contexts,
and technically they represent an application or a microservice. For example, a banking software
system can have a subdomain or bounded context for loan applications and another for making
payments.
1) Ingesting raw data from various sources of the mobile telecom network
Let’s pick up some keywords from the above definitions and formulate short names for the subdo-
mains:
The Presenting insights domain should contain a web application that can present insights in various
ways, like using dashboards containing charts presenting aggregated counters and calculated KPIs.
We can call this application Insights visualizer.
Now we have the following applications for the software system defined:
Next, we continue the architectural design by splitting each application into one or more software
components (services, clients, and libraries). When defining the software components, we must
remember to follow the single responsibility principle, the avoid duplication principle, and the
externalized service configuration principle.
When considering the Radio network data ingester and Core network data ingester applications, we
can notice that we can implement them both using a single microservice, data-ingester-service, with
different configurations for radio and core network. This is because the protocol for ingesting the
data is the same for radio and core networks. The two networks differ in the schema of the ingested
data. Using a single configurable microservice, we can avoid code duplication by using externalized
configuration.
The Data aggregator application can be implemented using a single data-aggregator-service microser-
vice. We can use externalized configuration to define what counters and KPIs the microservice should
aggregate and calculate.
The Insights visualizer application consists of three different software components:
• A web client
• A service for fetching aggregated and calculated data (counters and KPIs)
• A service for storing the dynamic configuration of the web client
The dynamic configuration service stores information about what insights to visualize and how in
the web client.
Microservices in the Insights visualizer application are:
• insights-visualizer-web-client
• insights-visualizer-data-service
• insights-visualizer-configuration-service
Now we are ready with the microservice-level architectural design for the software system.
The last part of architectural design is to define the inter-service communication methods. The
data-ingester-service needs to send raw data to data-aggregator-service. The sending of data is
done using asynchronous fire-and-forget requests and is implemented using a message broker. The
communication between the data-aggregator-service and the insights-visualizer-data-service should
use the shared data communication method because the data-aggregator-service generates aggregated
data that the insights-visualizer-data-service uses. The communication between the insights-
visualizer-web-client in the frontend and the insights-visualizer-data-service and insights-visualizer-
configuration-service in the backend is synchronous communication that can be implemented using
an HTTP-based JSON-RPC, REST, or GraphQL API.
Next, design continues in development teams. Teams will specify the APIs between the microservices
and conduct further domain-driven design and object-oriented design for the microservices. API
design is covered in a later chapter, and object-oriented design is covered in the next chapter.
1) Loan applications
2) Making payments
In the loan applications domain, a customer can submit a loan application. The eligibility for the loan
will be assessed, and the bank can either accept the loan application and pay the loan or reject the loan
application. In the making payments domain, a customer can make payments. Making a payment
will withdraw money from the customer’s account. It is also a transaction that should be recorded.
Let’s add a feature that a payment can be made to a recipient in another bank:
Let’s add another feature: money can be transferred from external banks to a customer’s account.
As can be seen from the pictures above, the architecture of the banking software system evolved
when new features were introduced. For example, two new subdomains (or bounded contexts)
were created: money transfer and external money transfer. There was not much change in the
microservices themselves, but the way they are logically grouped into bounded contexts changed.
A microservice can be made stateless by storing its state outside itself. The state can be stored in a
data store that microservice instances share. Typically, the data store is a database or an in-memory
cache (like Redis, for example).
In a Kubernetes cluster, the resiliency of a microservice is handled by the Kubernetes control plane. If
a computing node where a microservice instance is located needs to be decommissioned, Kubernetes
will create a new instance of the microservice on another computing node and then evict the
microservice from the node to be decommissioned.
What needs to be done in a microservice is to make it listen to Linux termination signals, especially
the SIGTERM signal, which is sent to a microservice instance to indicate that it should terminate.
Upon receiving a SIGTERM signal, the microservice instance should initiate a graceful shutdown. If
the microservice instance does not shut down gracefully, Kubernetes will eventually issue a SIGKILL
signal to terminate the microservice instance forcefully. The SIGKILL signal is sent after a termination
grace period has elapsed. This period is, by default, 30 seconds, but it is configurable.
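Below is a minimal sketch in Python of registering a SIGTERM handler (the shutdown steps are assumptions for illustration):

import signal
import sys

def handle_sigterm(signum, frame):
    # Stop accepting new work, finish or hand over in-flight work,
    # close connections, and then exit
    print("SIGTERM received, shutting down gracefully...")
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)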
There are other reasons a microservice instance might be evicted from a computing node. One such
reason is that Kubernetes must schedule another microservice onto that particular computing node
(for example, because of its CPU/memory requests), and your microservice no longer fits on that node
and must be moved to another computing node.
If a microservice pod crashes, Kubernetes will notice that and start a new pod so that the desired
number of microservice replicas (pods/instances) is always running. The replica count can be defined
in the Kubernetes Deployment for the microservice.
But what if a microservice pod enters a deadlock and cannot serve requests? This situation can be
remediated with the help of a liveness probe. You should always specify a liveness probe for each
microservice Deployment. Below is an example of a microservice Deployment where an HTTP GET
type liveness probe is defined:
Figure 3.33. deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "microservice.fullname" . }}
spec:
replicas: 1
selector:
matchLabels:
{{- include "microservice.selectorLabels" . | nindent 6 }}
template:
spec:
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.imageRegistry }}/{{ .Values.imageRepository }}:{{ .Values.im\
ageTag }}"
livenessProbe:
httpGet:
path: /isMicroserviceAlive
port: 8080
initialDelaySeconds: 30
failureThreshold: 3
periodSeconds: 3
Kubernetes will poll the /isMicroserviceAlive HTTP endpoints of the microservice instances every
three seconds (after the initial delay of 30 seconds reserved for the microservice instance startup).
The HTTP endpoint should return the HTTP status code 200 OK. Suppose requests to that endpoint
fail (e.g., due to a deadlock) three times in a row (defined by the failureThreshold property) for
a particular microservice instance. In that case, the microservice instance is considered dead, and
Kubernetes will terminate the pod and launch a new pod automatically.
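The liveness endpoint itself can be very simple. Below is a minimal sketch using Flask (the framework choice is an assumption; only the route path and port match the probe definition above):

from flask import Flask

app = Flask(__name__)

@app.route("/isMicroserviceAlive")
def is_microservice_alive():
    # Return 200 OK as long as the process is able to serve requests
    return "", 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)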
When upgrading a microservice to a newer version, the Kubernetes Deployment should be modified.
A new container image tag should be specified in the image property of the Deployment. This change
will trigger an update procedure for the Deployment. By default, Kubernetes performs a rolling
update, which means your microservice can serve requests during the update procedure without
downtime.
Suppose you had defined one replica in the microservice Deployment (as above, replicas: 1) and
performed a Deployment upgrade (changed the image to a newer version). In that case, Kubernetes
would create a new pod using the new image tag, and only after the new pod is ready to serve requests
will Kubernetes delete the pod running the old version. So there is no downtime, and the microservice
can serve requests during the upgrade procedure.
If your microservice Deployment had more replicas, e.g., 10, Kubernetes would by default terminate
at most 25% of the running pods and start new pods up to 25% of the replica count. The rolling update
means that updating the pods happens in chunks, 25% of the pods at a time. The percentage values
are configurable.
Horizontal scaling means adding new instances or removing instances of a microservice. Horizontal
scaling of a microservice requires statelessness. Stateful services are usually implemented using sticky
sessions so that requests from a particular client go to the same service instance. The horizontal scaling
of stateful services is complicated because a client’s state is stored on a single service instance. In the
cloud-native world, we want to ensure even load distribution between microservice instances and
target a request to any available microservice instance for processing.
Initially, a microservice can have one instance only. When the microservice gets more load, one
instance cannot necessarily handle all the work. In that case, the microservice must be scaled
horizontally (scaled out) by adding one or more new instances. When several microservice instances
are running, the state cannot be stored inside the instances anymore because different client requests
can be directed to different microservice instances. A stateless microservice must store its state outside
the microservice in an in-memory cache or a database shared by all the microservice instances.
Microservices can be scaled manually, but that is rarely desired. Manual scaling requires someone to
constantly monitor the software system and make the needed scaling actions manually. Microservices
should scale horizontally automatically. There are two requirements for a microservice to be
horizontally auto-scalable: the microservice must be stateless, and there must be one or more metrics
on which the scaling decisions can be based.
Typical metrics for horizontal autoscaling are CPU utilization and memory consumption. In many
cases, using the CPU utilization metric alone can be enough. It is also possible to use a custom or
external metric. For example, the Kafka consumer lag metric can indicate if the consumer lag is
increasing and if a new microservice instance should be spawned to reduce the consumer lag.
In Kubernetes, you can specify horizontal autoscaling using the HorizontalPodAutoscaler (HPA):
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: my-service
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-service
minReplicas: 1
maxReplicas: 99
metrics:
- type: Resource
resource:
name: cpu
targetAverageUtilization: 75
- type: Resource
resource:
name: memory
targetAverageUtilization: 75
In the above example, the my-service microservice is horizontally auto-scaled so that there is always
at least one instance of the microservice running. There can be a maximum of 99 instances of the
microservice running. The microservice is scaled out if CPU or memory utilization is over 75%, and it
is scaled in (the number of microservice instances is reduced) when both CPU and memory utilization
fall below 75%.
Running only one microservice instance in an environment does not make the microservice
highly available. If something happens to that one instance, the microservice becomes temporarily
unavailable until a new instance has been started and is ready to serve requests. For this reason,
you should run at least two instances of every business-critical microservice. You should also
ensure that these instances don't run on the same computing node. Preferably, the instances should
run in different availability zones of the cloud provider. Then a catastrophe in availability zone 1 won't
necessarily affect microservices running in availability zone 2.
You can ensure that no two microservice instances run on the same computing node by defining an
anti-affinity rule in the microservice Deployment:
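A minimal sketch of such an anti-affinity rule could look like the following (here a plain app label is assumed; in the earlier Helm-templated Deployment you would use the same selector labels as in its matchLabels section):

spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-service
              topologyKey: kubernetes.io/hostname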
For a business-critical microservice, we need to modify the horizontal autoscaling example from the
previous section: The minReplicas property should be increased to 2:
Figure 3.36. hpa.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: my-service
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-service
minReplicas: 2
maxReplicas: 99
.
.
.
In addition to metrics, distributed tracing should be implemented to log the communication between
different microservices, so that inter-service communication problems can be troubleshot and a
problem's root cause can be drilled down to. Each microservice must also log at least all errors and
warnings. These logs should be fed to a centralized log collection system where querying the logs is
made easy.
Semantic versioning means that given a version number in the format <MAJOR>.<MINOR>.<PATCH>,
increment the:

• MAJOR version when you make backward-incompatible API changes
• MINOR version when you add functionality in a backward-compatible manner
• PATCH version when you make backward-compatible bug fixes
In semantic versioning, major version zero (0.x.y) is for initial development. Anything can change at
any time, and the public API should not be considered stable. Typically, software components with a
major version of zero are still in a proof-of-concept phase. If you want or need to take a newer version
into use, you must be prepared for changes, and sometimes these changes can be considerable,
resulting in a lot of refactoring.
And when you are ready to migrate to the new major version of the library, you can uninstall the old
version and install the new major version in the following way:
pip uninstall common-ui-lib
pip install common-ui-lib-2
Consider when to create a new major version of a library. When you created the first library version,
you probably did not get everything right in the public API. That is normal. It is almost impossible to
create a perfect API the first time. Before releasing the second major version of the library, I suggest
reviewing the new API with a team, collecting user feedback, and waiting long enough to get the API
“close to perfect” the second time. No one wants to use a library with frequent backward-incompatible
major version changes.
Trunk-based development is suitable for modern software that has an extensive set of automated
functional and non-functional tests and can use feature toggles. There is also an older branching
model called GitFlow, which can be used instead of trunk-based development to gain better control
over releasing software. You can find more information about GitFlow at https://www.atlassian.com/
git/tutorials/comparing-workflows/gitflow-workflow.
When you need to develop a new feature, it can be done in either of the following ways:
# First commit
git commit -a -m "Commit message"
When the feature is ready, you can create a pull or merge request from the feature branch to the main
branch. You can create the pull/merge request on your Git hosting service's web page or using the
link in the output of the git push command. After the pull/merge request is created, a build pipeline
should be started, and colleagues can review the code. The build started after creating the pull/merge
request builds candidate artifacts, which are stored in the artifact repository but deleted after a certain
period of time. If you need to change the code after making the pull/merge request, just modify the
code, then add, commit, and push it to the repository as shown earlier. After the code is reviewed and
the build pipeline succeeds, the merge can be completed. After the merge, a build pipeline should be
run from the main branch. This pipeline run should push the final release artifacts to the artifact
repository.
Teams merge their part of the feature to the main branch. When all feature branches are merged into
the main branch, the feature toggle can be switched on to activate the feature.
People who haven't used feature toggles may have some prejudices and misconceptions:

• "Every feature needs a feature toggle"
  – Not all features need a toggle; only those features that need one should have a toggle. An
    example of a case when a feature toggle is needed is when the feature is implemented but
    not yet 100% tested.
• "Implementing a feature toggle requires changes all over the codebase"
  – This can be true if the codebase contains technical debt and is not properly designed (=
    applying shotgun surgery to spaghetti code)
  – Usually, implementing a feature toggle does not require changes in many places but just a
    single place or a few places
• "Feature toggles degrade performance"
  – Almost always, feature toggles can be implemented with negligible performance
    degradation, e.g., one or a few if-statements
• "Feature toggles are hard to remove from the codebase"
  – First of all, do you really need to remove them? Many times, feature toggles can be left in
    the codebase if they don't degrade the readability of the code or the code's performance
  – When the codebase has the correct design (e.g., the open-closed principle is used), removing
    a feature toggle is a lot easier compared to a situation where shotgun surgery needs to be
    applied to spaghetti code.
  – Comprehensive automated testing should make it relatively safe to remove feature toggles
Event sourcing ensures that all changes to the state of a service are stored as an ordered sequence
of events. Event sourcing makes it possible to query state changes. Also, the state change events act
as an audit log. It is possible to reconstruct past states and rewind the current state to some earlier
state. Unlike CRUD actions on resources, event sourcing utilizes only CR (create and read) actions. It
is only possible to create new events and read events. It is not possible to update an existing event or
delete an event.
Let’s have an example of using event sourcing to store orders in an e-commerce software system. The
order-service should be able to store the following events:
• AbstractOrderEvent
– Abstract base event for other concrete events containing timestamp and order id proper-
ties
• OrderCreatedEvent
• OrderPaymentEvent
• OrderModificationEvent
– Contains information about modifications made by the customer to the order before
packaging
• OrderPackagedEvent
• OrderCanceledEvent
– Describes that the customer has canceled the order and the order should not be shipped
• OrderShippedEvent
– Contains information about the logistics partner and the tracking id of the order shipment
• OrderDeliveredEvent
• OrderShipmentReceivedEvent
• OrderReturnedEvent
• OrderReturnShippedEvent
– Contains information about the logistics partner and the tracking id of the return shipment
• OrderReturnReceivedEvent
– Contains information about who handled the order return and the status of returned items
• OrderReimbursedEvent
– Contains information about the reimbursement for the returned order item(s) to the
customer
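To make the idea concrete, below is a minimal sketch of event sourcing in Python. The event classes and their fields are simplified assumptions based on the list above:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class AbstractOrderEvent:
    order_id: str
    timestamp: datetime

@dataclass
class OrderCreatedEvent(AbstractOrderEvent):
    customer_id: str

@dataclass
class OrderCanceledEvent(AbstractOrderEvent):
    reason: str

class OrderEventStore:
    def __init__(self):
        self.__events: list[AbstractOrderEvent] = []

    def append(self, event: AbstractOrderEvent) -> None:
        # Events are only created and read, never updated or deleted
        self.__events.append(event)

    def get_events_for_order(self, order_id: str) -> list[AbstractOrderEvent]:
        return [event for event in self.__events if event.order_id == order_id]

The current state of an order can then be reconstructed by replaying its events in chronological order.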
Let’s consider the previous order-service example that used event sourcing. In the order-service, all
the commands are events. We want users to be able to query orders efficiently. We should have an
additional representation of an order in addition to events because it is inefficient to always generate
the current state of an order by replaying all the related events. For this reason, our architecture should
utilize the CQRS pattern and divide the order-service into two different services: order-command-
service and order-query-service.
The order-command-service is the same as the original order-service that uses event sourcing, and
the order-query-service is a new service. The order-query-service has a database where it holds a
materialized view of orders. The two services are connected with a message broker. The order-
command-service sends events to a topic in the message broker. The order-query-service reads events
from the topic and applies changes to the materialized view. The materialized view is optimized
to contain basic information about each order, including its current state, to be consumed by the
e-commerce company staff and customers. Because customers query orders, the materialized view
should be indexed by the customer id column to enable fast retrieval. Suppose that, in some special
case, a customer needs more details about an order than are available in the materialized view. In
that case, the order-command-service can be used to query the events of the order for additional
information.
Let’s have an example of a distributed transaction using the saga orchestration pattern with an
online banking system where users can transfer money from their accounts. We have a higher-level
microservice called account-money-transfer-service, which is used to make money transfers. The
banking system has also two lower-level microservices called account-balance-service and account-
transaction-service. The account-balance-service holds accounts’ balance information while the
account-transaction-service keeps track of all transactions on the accounts. The account-money-
transfer-service acts as a saga orchestrator and utilizes both of the lower-level microservices to make
a money transfer to happen.
Let's consider a distributed transaction executed by the account-money-transfer-service when a user
makes a withdrawal of $25.10:
1) The account-money-transfer-service tries to withdraw the amount from the user’s account by
sending the following request to the account-balance-service:
{
"sagaUuid": "e8ab60b5-3053-46e7-b8da-87b1f46edf34",
"amountInCents": 2510
}
The sagaUuid is a universally unique identifier (UUID) generated by the saga orchestrator before the
saga begins. If there are not enough funds to withdraw the given amount, the request fails with the
HTTP status code 400 Bad Request. If the request is successfully executed, the account-balance-service
should store the saga UUID to a database table temporarily. This table should be cleaned up regularly
by deleting old enough saga UUIDs.
2) The account-money-transfer-service will create a new account transaction for the user’s account
by sending the following request to the account-transaction-service:
{
"sagaUuid": "e8ab60b5-3053-46e7-b8da-87b1f46edf34",
// Additional transaction information here...
}
The above-described distributed transaction has two requests, each of which can fail. Let’s consider
the scenario where the first request to the account-balance-service fails. If the first request fails
due to a request timeout, we don’t know if the request was successfully processed by the recipient
microservice. We don’t know because we did not get the response and status code. For that reason,
we need to perform a compensating action by issuing the following compensating request:
{
"sagaUuid": "e8ab60b5-3053-46e7-b8da-87b1f46edf34",
"amountInCents": 2510
}
The account-balance-service will perform the undo-withdraw action only if a withdrawal with the
given saga UUID was earlier made and that withdrawal has not been undone yet. Upon successful
undoing, the account-balance-service will delete the row for the given saga UUID from the database
table where the saga UUID was earlier temporarily stored. Further undo-withdraw actions with the
same saga UUID will be no-op actions making the undo-withdraw action idempotent.
Next, let’s consider the scenario where the first request succeeds and the second request fails. Now we
have to compensate for both requests. First, we compensate for the first request as described earlier.
Then we will compensate for the second request by deleting the account transaction identified with
the sagaUuid:
DELETE /account-transaction-service/accounts/123456789012/transactions?sagaUuid=e8ab60b5-\
3053-46e7-b8da-87b1f46edf34 HTTP/1.1
If a compensating request fails, it must be repeated until it succeeds. Notice that the above
compensating requests are both idempotent, i.e., they can be executed multiple times with the same
result. Idempotency is a requirement for a compensating request because it can be possible that
a compensating request fails after the compensation was already performed. That compensation
request failure will cause the compensating request to be attempted again. The distributed transaction
manager in the account-money-transfer-service should ensure that a distributed transaction is either
successfully completed or rolled back by the instances of the account-money-transfer-service. You
should implement a single distributed transaction manager library per programming language or
technology stack and use that in all microservices that need to orchestrate distributed transactions.
Alternatively, use a 3rd party library.
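Below is a minimal sketch of such a saga orchestrator in Python. The service URLs, endpoint paths, and the undo_withdraw and delete_transaction helper functions are assumptions for illustration:

import uuid
import requests

BALANCE_SERVICE_URL = "http://account-balance-service"
TRANSACTION_SERVICE_URL = "http://account-transaction-service"

def transfer_money(account_id: str, amount_in_cents: int) -> None:
    saga_uuid = str(uuid.uuid4())
    try:
        # Step 1: withdraw the amount from the user's account
        response = requests.post(
            f"{BALANCE_SERVICE_URL}/accounts/{account_id}/withdraw",
            json={"sagaUuid": saga_uuid, "amountInCents": amount_in_cents},
            timeout=10,
        )
        response.raise_for_status()

        # Step 2: create an account transaction for the withdrawal
        response = requests.post(
            f"{TRANSACTION_SERVICE_URL}/accounts/{account_id}/transactions",
            json={"sagaUuid": saga_uuid, "amountInCents": amount_in_cents},
            timeout=10,
        )
        response.raise_for_status()
    except requests.RequestException:
        # Compensate with idempotent requests; these hypothetical helpers
        # must retry until the compensating requests succeed
        undo_withdraw(account_id, saga_uuid, amount_in_cents)
        delete_transaction(account_id, saga_uuid)
        raise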
Let’s have another short example with the ecommerce-service presented earlier in this chapter. The
order-placing endpoint of the ecommerce-service should make the following requests in a distributed
transaction:
1) Ensure payment
2) Create an order
3) Remove the ordered sales items from the shopping cart
4) Mark the ordered sales items sold
5) Enqueue an order confirmation email for sending
The saga choreography pattern utilizes asynchronous communication between microservices. In-
volved microservices send messages to each other in a choreography to achieve saga completion.
The saga choreography pattern has a couple of drawbacks:
• The execution of a distributed transaction is not centralized like in the saga orchestration
pattern, and it can be hard to figure out how a distributed transaction is actually performed.
• It creates coupling between microservices, while microservices should be as loosely coupled as
possible.
The saga choreography pattern works best in cases where the number of participating microservices is
low. Then the coupling between services is low, and it is easier to reason how a distributed transaction
is performed.
Let’s have the same money transfer example as earlier, but now using the saga choreography pattern
instead of the saga orchestration pattern.
1) The account-money-transfer-service initiates the saga by sending the following event to the
message broker’s account-balance-service topic:
{
"event": "Withdraw",
"data": {
"sagaUuid": "e8ab60b5-3053-46e7-b8da-87b1f46edf34",
"amountInCents": 2510
}
}
2) The account-balance-service will consume the Withdraw event from the message broker, perform
a withdrawal, and if successful, send the same event to the message broker’s account-
transaction-service topic.
3) The account-transaction-service will consume the Withdraw event from the message broker,
persist an account transaction, and if successful, send the following event to the message
broker’s account-money-transfer-service topic:
{
"event": "WithdrawComplete",
"data": {
"sagaUuid": "e8ab60b5-3053-46e7-b8da-87b1f46edf34"
}
}
If the account-transaction-service fails to persist the account transaction, it sends the following event
to the message broker's account-money-transfer-service topic instead:
{
"event": "WithdrawFailure",
"data": {
"sagaUuid": "e8ab60b5-3053-46e7-b8da-87b1f46edf34"
}
}
When the account-money-transfer-service consumes the WithdrawFailure event, it starts a rollback
by sending the following event to the message broker's account-balance-service topic:
{
"event": "WithdrawRollback",
"data": {
"sagaUuid": "e8ab60b5-3053-46e7-b8da-87b1f46edf34",
"amountInCents": 2510,
// Additional transaction information here...
}
}
Once the rollback in the account-balance-service is done, the rollback event will be produced to
the account-transaction-service topic in the message broker. After the account-transaction-service
has successfully performed the rollback, it sends a WithdrawRollbackComplete event to the account-
money-transfer-service topic. Once the account-money-transfer-service consumes that message, the
withdrawal event is successfully rolled back. Suppose the account-money-transfer-service does not
receive the WithdrawRollbackComplete event during some timeout period. In that case, it will restart
the rollback choreography by resending the WithdrawRollback event to the account-balance-service.
The microservice architecture enables using the most suitable technology stack to develop each
microservice. For example, some microservices require high performance and controlled memory
allocation, and other microservices don’t need such things. You can choose the used technology stack
based on the needs of a microservice. For a real-time data processing microservice, you might pick
C++ or Rust, and for a simple REST API, you might choose Node.js and Express, Java and Spring
Boot, or Python and Django.
Even if the microservice architecture allows different teams and developers to decide what pro-
gramming languages and technologies to use when implementing a microservice, defining preferred
technology stacks for different purposes is still a good practice. Otherwise, you might find yourself in
a situation where numerous programming languages and technologies are used in a software system.
Some programming languages and technologies like Clojure, Scala, or Haskell can be relatively niche.
When software developers in the organization come and go, you might end up in situations where you
don’t have anyone who knows about some specific niche programming language or technology. In
the worst case, a microservice needs to be reimplemented from scratch using some more mainstream
technologies. For this reason, you should specify the technology stacks that teams should use. These
technology stacks should consist of mainstream programming languages and technologies as much
as possible.
For example, an architecture team might decide the following:
The above technology stacks are mainstream. Recruiting talent with needed knowledge and
competencies should be effortless.
After you have defined the preferred technology stacks, you should create a utility or utilities that can
be used to kick-start a new project using a particular technology stack quickly. This utility or utilities
should generate the initial source code repository content for a new microservice, client, or library.
The initial source code repository should contain at least the following items for a new microservice:
• .env file(s) to store environment variables for different environments (dev, CI)
• .gitignore
• README.MD template
• Linting rules (e.g., .eslintrc.json)
• Code formatting rules (e.g., .prettier.rc)
• Initial code for integration tests, e.g., docker-compose.yml file for spinning up an integration
testing environment
• Infrastructure code for the chosen cloud provider, e.g., code to deploy a managed SQL database
in the cloud
The utility should ask the developer the following questions before creating the initial source code
repository content for a microservice:
Of course, decisions about the preferred technology stacks are not carved in stone. They are not
static. As time passes, new technologies arise, and new programming languages gain popularity. At
some point, a decision could be made that a new technology stack should replace an existing preferred
technology stack. Then all new projects should use the new stack, and old software components will
be gradually migrated to use the new technology stack.
Many developers are keen on learning new things on a regular basis. They should be encouraged
to work on hobby projects with technologies of their choice, and they should be able to utilize new
programming languages and frameworks in selected new projects.
4: Object-Oriented Design Principles
This chapter describes principles related to object-oriented design. The following principles are
discussed:
We start the chapter with the definition of object-oriented programming (OOP) concepts and discuss
different programming paradigms: OOP, imperative and functional programming. We also analyze
why OOP is hard even though the OOP concepts and basic principles are not that difficult to grasp.
• Classes/Objects
• Encapsulation
• Abstraction
• Inheritance
• Interfaces
– Interface evolution
• Polymorphism
– Dynamic dispatch (late binding)
4.1.1: Classes/Objects
A class is a user-defined data type that acts as the blueprint for individual objects (instances of the
class). An object is created using the class's __init__ method, which sets the initial state of the object.
A class consists of attributes and methods, which can be either class or instance attributes/methods.
Instance attributes define the state of an object. Instance methods act on instance attributes, i.e., they
are used to query and modify the state of an object. Class attributes belong to the class, and class
methods act on class attributes.
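Below is a simple sketch of these concepts (the class and its attributes are illustrative assumptions):

class Circle:
    # Class attribute shared by all Circle objects
    instance_count = 0

    def __init__(self, radius: float):
        # Instance attribute that defines the state of the object
        self.radius = radius
        Circle.instance_count += 1

    # Instance method that acts on the instance attributes
    def area(self) -> float:
        return 3.14159 * self.radius**2

    # Class method that acts on the class attributes
    @classmethod
    def get_instance_count(cls) -> int:
        return cls.instance_count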
An object can represent either a concrete or an abstract entity in the real world. For example, circle
and employee objects represent concrete real-world entities, while an object representing an open file
(a file handle) represents an abstract entity.
Attributes of an object can contain other objects to create object hierarchies. This is called object
composition, which is covered in more detail in the composition principle section.
In pure object-oriented languages like Java, you always need to create a class to hold functions.
Even if you have only class methods and no attributes, you must create a class in Java to
host the class methods (static methods). In Python, you don't need to create classes in those cases;
just put the functions in a single module, or create a package (directory) and put each function in a
separate module.
4.1.2: Encapsulation
Encapsulation prevents the internal state of an object from being changed directly from outside the
object. The idea of encapsulation is that the state of the object is internal to the object and can be
changed externally only through the public methods of the object. Encapsulation contributes to better
security and the avoidance of data corruption. Unfortunately, encapsulation is not enforced by the
Python language, but there are conventions that can be used to simulate encapsulation. More about
that in the Encapsulation principle section.
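For example, the following sketch uses the leading double underscore naming convention to mark an attribute as private and exposes it only through public methods:

class Account:
    def __init__(self, balance_in_cents: int):
        # The double underscore prefix triggers name mangling, which
        # discourages direct access from outside the class
        self.__balance_in_cents = balance_in_cents

    def deposit(self, amount_in_cents: int) -> None:
        self.__balance_in_cents += amount_in_cents

    def get_balance_in_cents(self) -> int:
        return self.__balance_in_cents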
4.1.3: Abstraction
Objects only reveal the internal mechanisms that are relevant for the use of other objects, hiding any
unnecessary implementation code. Callers of an object's methods don't need to know the internal
workings of the object; they adhere only to the public API of the object. This makes it possible to
change the implementation details without affecting any external code.
4.1.4: Inheritance
Inheritance allows classes to be arranged in a hierarchy that represents is-a relationships. For example,
the class Employee might inherit from the class Person. All the attributes and methods in the parent
(super) class also appear in the child (sub) class with the same names. For example, the class Person
might define the attributes name and birth_date. These will also be available in the Employee class. A
child class can add methods and attributes compared to the parent class. A child class can also override
a method of the parent class. For example, the Employee class might add the attributes employer and
salary. This technique allows easy re-use of the same functionality and data definitions and also
mirrors real-world relationships in an intuitive way.
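A minimal sketch of this Person/Employee example (the example values are made up):

class Person:
    def __init__(self, name: str, birth_date: str):
        self.name = name
        self.birth_date = birth_date

class Employee(Person):
    def __init__(self, name: str, birth_date: str, employer: str, salary: int):
        super().__init__(name, birth_date)
        # Attributes added by the child class
        self.employer = employer
        self.salary = salary

employee = Employee("Jane Doe", "1990-01-01", "Acme Corp", 5000)
print(employee.name)      # Inherited from Person
print(employee.employer)  # Defined in Employee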
Python also supports multiple inheritance, where a child class can have multiple parent classes. The
problem with multiple inheritance is that the child class can inherit different versions of a method
with the same name. By default, multiple inheritance should be avoided whenever possible. Some
languages, like Java, don't support multiple inheritance of classes at all. In Python, inheriting from
multiple so-called mixin classes can also be problematic, because two mixin classes can have clashing
method names. Inheritance also crams additional functionality into a single child class, making the
class large and possibly not having a single responsibility. A better way to add functionality to a class
is to compose the class of multiple other classes (the mixins). In that way, there is no need to worry
about possible clashing of method names.
Multiple inheritance is acceptable for interfaces. Because Python does not have a dedicated interface
construct, you can use multiple inheritance if you use abstract base classes (ABCs) or protocols. More
about interfaces in the next section.
4.1.5: Interface
An interface specifies a contract that classes implementing the interface must obey. Interfaces are
useful for polymorphic behaviour, which is described in the next section. An interface consists of one
or more methods that classes implementing it must implement. Python does not have interfaces, but
it has abstract base classes (ABCs) and protocols. Both of these can be used to implement an interface.
The ABC syntax is more verbose compared to protocols because you must always denote a method in
an ABC with the @abstractmethod decorator.
Below are two interfaces implemented by inheriting from ABC and one class that implements both
interfaces:
from abc import ABC, abstractmethod

class Drawable(ABC):
    @abstractmethod
    def draw(self) -> None:
        pass

class Clickable(ABC):
    @abstractmethod
    def click(self) -> None:
        pass

class Button(Drawable, Clickable):
    def draw(self) -> None:
        print("Button drawn")

    def click(self) -> None:
        print("Button clicked")

button = Button()
button.draw()
button.click()

# Output:
# Button drawn
# Button clicked
You can also combine the usage of ABCs and protocols. However, it is good practice to stick to only
one way of defining interfaces. Below is an example where the Window class implements two interfaces,
one defined by extending ABC (in the above code listing) and one defined by extending Protocol:
from typing import Protocol

class Draggable(Protocol):
    def dragTo(self, x: int, y: int) -> None:
        pass

class Window(Drawable, Draggable):
    def draw(self) -> None:
        print("Window drawn")

    def dragTo(self, x: int, y: int) -> None:
        print(f"Window dragged to ({x}, {y})")

window = Window()
window.draw()
window.dragTo(200, 300)

# Output:
# Window drawn
# Window dragged to (200, 300)
For the rest of the book, I will use the terms interface and protocol interchangeably.
After an interface has been defined and is used by implementing classes, and you would like to
add a method or methods to the interface, you have to provide a default implementation in the
interface, because the classes that currently implement the interface don't implement the methods
you want to add. This is especially true in cases where the implementing classes are something you
cannot or don't want to modify.
Let's imagine you have a Message interface with get_data and get_length_in_bytes methods, and
you want to add set_queued_at_instant and get_queued_at_instant methods to the interface. You
can add the methods to the interface, but you must provide a default implementation, for example,
raising an error indicating that the method is not implemented.
class Message(Protocol):
    def get_data(self):
        ...

    def get_length_in_bytes(self):
        ...
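Adding the new methods with default implementations could then look like the following sketch:

class Message(Protocol):
    def get_data(self):
        ...

    def get_length_in_bytes(self):
        ...

    # New methods with default implementations so that existing
    # implementing classes don't break
    def get_queued_at_instant(self):
        raise NotImplementedError

    def set_queued_at_instant(self, instant):
        raise NotImplementedError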
4.1.6: Polymorphism
Polymorphism means that the actual method to be called is decided at runtime. For this reason,
polymorphism is also called late binding (to a particular method) or dynamic dispatch. Polymorphic
behaviour is easily implemented using an interface variable. You can assign any object that implements
the particular interface to the interface variable. When you call a method on the interface variable, the
actual method called is decided based on what type of object is currently assigned to the interface
variable. Below is an example of polymorphic behaviour:
drawable: Drawable = Button()
drawable.draw()

# Output:
# Button drawn

drawable = Window()
drawable.draw()

# Output:
# Window drawn
Polymorphic behaviour is also exhibited when you have a variable of the parent class type and assign
a child class object to the variable, like in the below example:
class IconButton(Button):
    def draw(self) -> None:
        print("Button with icon drawn")

button: Button = Button()
button.draw()

# Output:
# Button drawn

button = IconButton()
button.draw()

# Output:
# Button with icon drawn
• Imperative programming
• Object-oriented programming
• Functional programming
numbers = [1, 2, 3, 4, 5]
doubled_even_numbers = []

for number in numbers:
    if number % 2 == 0:
        doubled_even_numbers.append(number**2)

print(doubled_even_numbers)
# Output:
# [4, 16]
In mathematics and computer science, a higher-order function (HOF) is a function that does at least
one of the following:

1. Takes one or more functions as arguments
2. Returns a function as its result
numbers = [1, 2, 3, 4, 5]

print([number**2 for number in numbers if number % 2 == 0])
# Output:
# [4, 16]
As you can see, the above example is much safer, shorter and simpler. For example, there are no
variable assignments or state modifications.
Let’s implement the above code using map and filter functions:
numbers = [1, 2, 3, 4, 5]
is_even = lambda number: number % 2 == 0
doubled = lambda number: number**2
print(list(map(doubled, filter(is_even, numbers))))
# Output:
# [4, 16]
In the above example, we assigned lambdas to variables. This practice is discouraged by PEP 8. We
should use def to define the functions instead:
numbers = [1, 2, 3, 4, 5]

def is_even(number):
    return number % 2 == 0

def doubled(number):
    return number**2

print(list(map(doubled, filter(is_even, numbers))))
# Output:
# [4, 16]
The above expression is hard to read. Let’s use a variable to store an intermediate value:
even_numbers = filter(is_even, numbers)
print(list(map(doubled, even_numbers)))
# Output:
# [4, 16]
There is another way to implement the above code: using composition of functions. We can define re-
usable functions and compose more specific functions from more general-purpose functions. Below
is an example of function composition using the compose function from the toolz library. The example
also uses the partial function from the functools module to create partially applied functions. For
example, the filter_even function is a partially applied filter function whose first parameter is
bound to the is_even function, and similarly, the map_doubled function is a partially applied map
function whose first parameter is bound to the doubled function. The compose function composes two
or more functions in the following way: compose(f, g)(x) is the same as f(g(x)), and compose(f, g, h)(x)
is the same as f(g(h(x))), and so on. You can compose as many functions as you need.
from functools import partial
from toolz import compose

numbers = [1, 2, 3, 4, 5]

def is_even(number):
    return number % 2 == 0

def doubled(number):
    return number**2

filter_even = partial(filter, is_even)
map_doubled = partial(map, doubled)
print(list(compose(map_doubled, filter_even)(numbers)))
# Output:
# [4, 16]
In the above example all the following functions can be made re-usable and put into a library:
• is_even
• doubled
• filter_even
• map_doubled
Modern code should favor functional programming over imperative programming when possible.
Compared to functional programming, imperative programming comes with the following
disadvantages:
1. Mutable State: Imperative programming relies heavily on mutable state, where variables can
be modified throughout the program’s execution. This can lead to subtle bugs and make
the program harder to reason about, as the state can change unpredictably. In functional
programming, immutability is emphasized, reducing the complexity of state management and
making programs more reliable.
2. Side Effects: Imperative programming often involves side effects, where functions or operations
modify state or interact with the external world. Side effects make the code harder to test,
reason about, and debug. Functional programming, on the other hand, encourages pure
functions that have no side effects, making the code more modular, reusable, and testable.
3. Concurrency and Parallelism: Imperative programming can be challenging to parallelize and
reason about in concurrent scenarios. Since mutable state can be modified by multiple threads
or processes, race conditions and synchronization issues can occur. Functional programming,
with its emphasis on immutability and pure functions, simplifies concurrency and parallelism
by eliminating shared mutable state.
4. Lack of Referential Transparency: Imperative programming tends to rely on assignments and
statements that modify variables in-place. This can lead to code that is difficult to reason about
due to implicit dependencies and hidden interactions between different parts of the code. In
functional programming, referential transparency means that an expression can be replaced with
its value without changing the program's behavior, which makes the code easier to reason about.
Pure imperative programming also easily leads to code duplication, lack of modularity and abstraction
issues. These are issues that can be solved using object-oriented programming.
You should not use a single programming paradigm only. To best utilize both object-oriented
programming (OOP) and functional programming (FP) when developing software, you can leverage
the strengths of each paradigm in different parts of your codebase. Use domain-driven design
(DDD) and object-oriented design to design the application: interfaces and classes. Implement
classes by encapsulating related behavior and (possibly mutable) state in the classes. Apply OOP
principles like the SOLID principles and OOP design patterns. These principles and patterns allow
you to make code modular and easily extensible without accidentally breaking existing code.

Use FP as much as possible when implementing class and instance methods. Embrace functional
composition by creating pure functions that take immutable data as input and always produce the
same output for the same input without side effects. Use higher-order functions to compose functions
and build complex operations from simpler ones. For example, utilize higher-order functions in OOP
by passing functions as arguments to methods or using them as callbacks. This allows for greater
flexibility and modularity, enabling functional-style operations within an OOP framework. Also
remember to use functional programming libraries, either the standard library or 3rd-party libraries.

Consider using functional techniques for error handling, such as Either or Maybe/Optional types.
This helps you manage errors without exceptions, promoting more predictable and robust code. This
is useful because function signatures don't tell whether a function can throw; you must always
remember to consult the documentation to check if a function can throw.
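As a minimal sketch of the Maybe/Optional idea in Python (the function and its validation rules are illustrative assumptions), a function can return None instead of raising an exception, and the type annotation makes the possible absence of a result explicit:

from typing import Optional

def parse_port_number(value: str) -> Optional[int]:
    # Return None instead of raising when the value is not a valid port
    if not value.isdigit():
        return None
    port = int(value)
    return port if 0 < port <= 65535 else None

port = parse_port_number("8080")
if port is None:
    print("Invalid port number")
else:
    print(f"Using port {port}")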
Aim for immutability within your codebase, regardless of the paradigm. Immutable data reduces
complexity, avoids shared mutable state, and facilitates reasoning about your code. Favor creating
new objects or data structures instead of modifying existing ones.
• You cannot rush into coding, but you need to have patience and should perform object-oriented
design (OOD) first
• You cannot get the OOD right on the first try. You need to have discipline and time reserved
for refactoring.
• The difference between object composition and inheritance is not properly understood, and
inheritance is used in place of object composition, making the OOD flawed
• SOLID principles are not understood or followed
– It can be difficult to create optimally sized classes and functions with a single responsibility
* For example, you might have a single-responsibility class, but the class is too big.
You must realize that you need to split the class into smaller classes that the original
class is composed of. Each of these smaller classes has a single responsibility on a
lower level of abstraction compared to the original class
– Understanding and following the open-closed principle can be challenging
* The idea of the open-closed principle is to avoid modifying existing code and thus
avoid breaking any existing working code. For example, if you have a collection
class and also need a thread-safe collection class, don't modify the existing collection
class, e.g., by adding a constructor flag to tell whether a collection should be thread-safe
or not. Instead, create a totally new class for thread-safe collections.
– The Liskov substitution principle is not as simple as it looks
* For example, suppose you have a base class Circle that has a draw method. If you derive
a FilledCircle class from the Circle class, you must implement the draw method so
that it first calls the base class method. But sometimes it is possible to completely
override the base class method with the derived class method
– Interface segregation is usually left undone if it is not immediately needed. This might
hinder the extensibility of the codebase in the future
– In many texts, the dependency inversion principle is explained in complicated terms. In
general, the dependency inversion principle means programming against interfaces instead
of concrete class types.
• You don’t understand the value of dependency injection and are not using it
– Dependency injection is a requirement for effectively utilizing some other principles, like
the open-closed principle.
– Dependency injection makes unit testing a breeze because you can create mock
implementations and inject them into the tested code
• You don’t know/understand design patterns and don’t know when and how to use them
– Familiarize yourself with the design patterns
– Some design patterns are more useful than others. You use some patterns basically in
every codebase and some patterns you almost never use
– Many design patterns help to make code more modular, extensible and help to avoid
needing to modify existing code. Modifying existing code is always a risk. You can
introduce bugs, sometimes very subtle and hard to discover, in already working code.
– Learning the design patterns takes time. It can take years to master them, and mastery is
only achieved by repeatedly using them in real-life codebases.
Mastering OOD and OOP is a life-long process. You are never 100% ready. The best way to become
better at OOD and OOP, as with any other skill, is practice. I have been practising OOD and OOP
for 29 years, and I am still improving and learning something new on a regular basis. Start a non-
trivial (hobby or work) project and work on it, trying to make the code 100% clean. Whenever you
think you are ready with it, leave the project for some time, then come back to it, and you might be
surprised to notice that there are several things needing improvement!
All five SOLID principles1 are covered in this section. The dependency inversion principle is
generalized as a program-against-interfaces principle. The five SOLID principles are the following:

• Single responsibility principle
• Open-closed principle
• Liskov substitution principle
• Interface segregation principle
• Dependency inversion principle
Each class should have a single dedicated purpose. A class can represent a single thing, like a bank
account (Account class) or an employee (Employee class), or provide a single functionality like parsing
a configuration file (ConfigFileParser class) or calculating tax (TaxCalculator class).
We should not create a class representing a bank account and an employee. It is simply wrong.
Of course, an employee can have a bank account. But that is a different thing. It is called object
composition. In object composition, an Employee class object contains an Account class object. The
Employee class still represents one thing: An employee (who can have a bank account). Object
composition is covered later in this chapter in more detail.
At the function level, each function should perform a single task. The function name should describe
what task the function performs, meaning each function name should contain a verb. The function
name should not contain the word "and" because it can mean that the function is doing more than
one thing or that you haven't named the function at the correct abstraction level. You should not
name a function according to the steps it performs (e.g., do_this_and_that_and_then_some_third_thing);
instead, use wording on a higher level of abstraction.
When a class represents something, it can contain multiple methods. For example, in the Account class,
there can be methods like deposit and withdraw. It is still a single responsibility if these methods are
simple enough and if there are not too many methods in the class.
Below is a real-life code example where the and word is used in the function name:
1 https://en.wikipedia.org/wiki/SOLID
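A sketch of such a function (the helper names are illustrative assumptions based on the description below) could look like this:

def delete_page_and_all_references(page) -> None:
    # Deletes a page...
    delete_page(page)
    # ...removes all references to it...
    registry.delete_all_references(page.name)
    # ...and also deletes the page's key from the configuration keys
    config_keys.delete_key(page.name.make_key())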
In the above example, the function seems to do two things: deleting a page and removing all the
references to that page. But if we look at the code inside the function, we can realize that it is doing
a third thing also: deleting a page key from configuration keys. So should the function be named
delete_page_and_all_references_and_config_key? It does not sound reasonable. The problem with
the function name is that it is at the same level of abstraction as the function statements. The function
name should be at a higher level of abstraction than the statements inside the function.
How should we then name the function? I cannot say for sure because I don’t know the context of
the function. We could name the function just delete. This would tell the function caller that a page
will be deleted. The caller does not need to know all the actions related to deleting a page. The caller
just wants a page to be deleted. The function implementation should fulfill that request and do the
needed housekeeping actions, like removing all the references to the page being deleted and so on.
Let's consider another example with React Hooks. React Hooks have a function named useEffect,
which can be used to enqueue functions to be run after component rendering. The useEffect function
can be used to run some code after the initial render (after the component mounts), after every render,
or conditionally. This is quite a lot of responsibility for a single function. Also, the function's rather
strange name does not reveal its purpose. The word effect comes from the fact that this function is
used to enqueue other functions with side effects to be run. The term side effect might be familiar to
functional language programmers. It indicates that a function is not pure (has side effects).
Below is an example React functional component:
Figure 4.1. MyComponent.jsx
import { useEffect } from "react";

export default function MyComponent() {
  function startFetchData() {
    // ...
  }

  function subscribeToDataUpdates() {
    // ...
  }

  function unsubscribeFromDataUpdates() {
    // ...
  }

  useEffect(() => {
    startFetchData();
    subscribeToDataUpdates();
    return function cleanup() { unsubscribeFromDataUpdates() };
  }, []);

  // JSX to render
  return ...;
}
In the above example, the useEffect call causes the startFetchData and subscribeToDataUpdates
functions to be called after the initial render because of the supplied empty array of dependencies
(the second parameter to the useEffect function). The cleanup function returned from the function
supplied to useEffect will be called before the effect is run again or when the component is unmounted;
in this case, only on unmount, because the effect runs only once after the initial render.
Let’s imagine how we could improve the useEffect function. We could separate the functionality
related to mounting and unmounting into two different functions: afterMount and beforeUnmount.
Then we could change the above example to the following piece of code:
export default function MyComponent() {
  function startFetchData() {
    // ...
  }

  function subscribeToDataUpdates() {
    // ...
  }

  function unsubscribeFromDataUpdates() {
    // ...
  }

  afterMount(startFetchData, subscribeToDataUpdates);
  beforeUnmount(unsubscribeFromDataUpdates);

  // JSX to render
  return ...;
}
The above example is cleaner and much easier for a reader to understand than the original example.
There are no multiple levels of nested functions. You don’t have to return a function to be executed
on component unmount, and you don’t have to supply an array of dependencies.
Let’s have another example of a React functional component:
export default function ButtonClickCounter() {
  const [clickCount, setClickCount] = useState(0);

  useEffect(() => {
    function updateClickCountInDocumentTitle() {
      document.title = `Click count: ${clickCount}`;
    }

    updateClickCountInDocumentTitle();
  });
}
In the above example, the effect is called after every render (because no dependencies array is supplied
for the useEffect function). Nothing in the above code clearly states what will be executed and when.
We still use the same useEffect function, but now it behaves differently compared to the previous
example. It seems like the useEffect function is doing multiple things. How to solve this? Let’s think
hypothetically again. We could introduce yet another new function that can be called when we want
something to happen after every render:
export default function ButtonClickCounter() {
const [clickCount, setClickCount] = useState(0);
afterEveryRender(function updateClickCountInDocumentTitle() {
document.title = `Click count: ${clickCount}`;
});
}
The intentions of the above React functional component are pretty clear: It will update the click count
in the document title after every render.
Let’s optimize our example so that the click count update happens only if the click count has changed:
import { useEffect, useState } from "react";

export default function ButtonClickCounter() {
  const [clickCount, setClickCount] = useState(0);

  useEffect(() => {
    function updateClickCountInDocumentTitle() {
      document.title = `Click count: ${clickCount}`;
    }

    updateClickCountInDocumentTitle();
  }, [clickCount]);
}
Notice how clickCount is now added to the dependencies array of the useEffect function. This means
the effect is not executed after every render but only when the click count is changed.
Let’s imagine how we could improve the above example. We could introduce a new function that
handles dependencies: afterEveryRenderIfChanged. Our hypothetical example would now look like
this:
export default function ButtonClickCounter() {
  const [clickCount, setClickCount] = useState(0);

  afterEveryRenderIfChanged(
    [clickCount],
    function updateClickCountInDocumentTitle() {
      document.title = `Click count: ${clickCount}`;
    }
  );
}
Making functions do a single thing also helps make the code more readable. In the original examples, a reader must look at the end of the useEffect call to figure out under what circumstances the effect function will be called. And it is cognitively challenging to understand and remember the difference between a missing and an empty dependencies array. Good code does not make the reader think. At best, code should read like prose: after every render, if “clickCount” changed, update the click count in the document title.
One idea behind the single responsibility principle is that it enables software development using
the open-closed principle described in the next section. When you follow the single responsibility
principle and need to add functionality, you add it to a new class, which means you don’t need to
modify an existing class. You should avoid modifying existing code but extend it by adding new
classes, each with a single responsibility.
Any time you find yourself modifying some method in an existing class, you should first consider if
this principle could be followed and if the modification could be avoided. Every time you modify an
existing class, you can introduce a bug in the working code. The idea of this principle is to leave the
working code untouched, so it does not get accidentally broken.
Let’s have an example where this principle is not followed. We have the following existing and
working code:
class Shape(Protocol):
# ...
class RectangleShape(Shape):
def __init__(self, width: int, height: int):
self.__width = width
self.__height = height
@property
def width(self) -> int:
return self.__width
@property
def height(self) -> int:
return self.__height
@width.setter
def width(self, width: int):
self.__width = width
@height.setter
def height(self, height: int):
self.__height = height
Suppose we get an assignment to introduce support for square shapes. Let’s try to modify the existing
RectangleShape class, because a square is also a rectangle:
class RectangleShape(Shape):
    # Constructor for creating rectangles
    def __init__(self, width: int, height: int):
        self.__width = width
        self.__height = height

    # Factory method for creating squares
    # (the method name create_square is illustrative)
    @classmethod
    def create_square(cls, side_length: int) -> 'RectangleShape':
        return cls(side_length, side_length)

    @property
    def width(self) -> int:
        return self.__width

    @property
    def height(self) -> int:
        return self.__height

    @width.setter
    def width(self, width: int):
        if self.__height == self.__width:
            self.__height = width
        self.__width = width

    @height.setter
    def height(self, height: int):
        if self.__height == self.__width:
            self.__width = height
        self.__height = height
We needed to add a factory method for creating squares and modify two methods in the class.
Everything works okay when we run tests. But we have introduced a subtle bug in the code: If we
create a rectangle with an equal height and width, the rectangle becomes a square, which is probably
not what is wanted. This is a bug that can be hard to find in unit tests. This example showed that
modifying an existing class can be problematic. We modified an existing class and accidentally broke
it.
A better solution to introduce support for square shapes is to use the open-closed principle and create
a new class that implements the Shape protocol. Then we don’t have to modify any existing class, and
there is no risk of accidentally breaking something in the existing code. Below is the new SquareShape
class:
class SquareShape(Shape):
def __init__(self, side_length: int):
self.__side_length = side_length
@property
def side_length(self) -> int:
return self.__side_length
@side_length.setter
def side_length(self, side_length: int):
self.__side_length = side_length
An existing class can be safely modified by adding a new method in the following cases:
1) The added method is a pure function, i.e., it always returns the same value for the same
arguments and does not have side effects, i.e., it does not modify the object’s state.
2) The added method is read-only and thread-safe, i.e., it does not modify the object's state and
accesses the object’s state in a thread-safe manner in the case of multithreaded code. An
example of a read-only method in a shape class would be a method that calculates the shape’s
area.
3) The class is immutable, i.e., the added method (or any other method) cannot modify the object's state.
There are a couple of cases where the modification of existing code is needed. One example is factories.
When you introduce a new class, you need to modify the related factory to be able to create an instance
of that new class. For example, if we had a ShapeFactory class, we would need to modify it to support
the creation of SquareShape objects. Factories are discussed later in this chapter.
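For illustration, below is a hypothetical ShapeFactory sketch; the ShapeType enum, the create_shape method name, and the placeholder constructor arguments are assumptions made for this example only. Introducing SquareShape means adding one new case to this existing factory method:
class ShapeType(Enum):
    RECTANGLE = 1
    SQUARE = 2

class ShapeFactory:
    def create_shape(self, shape_type: ShapeType) -> Shape:
        match shape_type:
            case ShapeType.RECTANGLE:
                return RectangleShape(10, 20)
            case ShapeType.SQUARE:
                # The modification needed when SquareShape is introduced
                return SquareShape(10)
            case _:
                raise ValueError('Unsupported shape type')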
Another case is adding a new enum constant. You typically need to modify existing code to handle
the new enum constant. If you forget to add the handling of the new enum constant somewhere in
the existing code, typically, a bug will arise. For this reason, you should always safeguard switch-case statements with a default case that raises an error and if/else-if structures with an else branch that raises an error.
You can also enable your static code analysis tool to report an issue if a switch statement’s default case
is missing or an else-branch is missing from an if/else-if structure. Also, some static code analysis
tools can report an issue if you miss handling an enum constant in a switch-case statement.
Here is an example of safeguarding an if/else-if structure:
class FilterType(Enum):
INCLUDE = 1
EXCLUDE = 2
class Filter(Protocol):
def is_filtered_out(self) -> bool:
pass
class FilterImpl(Filter):
    def __init__(self, filter_type: FilterType):
        self.__filter_type = filter_type

        if filter_type == FilterType.INCLUDE:
            # ...
        elif filter_type == FilterType.EXCLUDE:
            # ...
        else:
            # Safeguarding
            raise ValueError('Invalid filter type')
In the future, if a new constant is added to the FilterType enum and you forget to update the if-statement, an error is raised instead of the if-statement silently passing through without any action being taken.
We can notice from the above examples that if/else-if structures could be avoided with a better
object-oriented design. For instance, we could create a Filter protocol and two separate classes,
IncludeFilter and ExcludeFilter, that implement the Filter protocol. Using object-oriented design
allows us to eliminate the FilterType enum and the if/else-if structure. This is known as the replace
conditionals with polymorphism refactoring technique. Refactoring is discussed more in the next
chapter. Below is the above example refactored to be more object-oriented:
class Filter(Protocol):
def is_filtered_out(self) -> bool:
pass
class IncludeFilter(Filter):
# ...
class ExcludeFilter(Filter):
# ...
class Shape(Protocol):
def draw(self) -> None:
pass
class RectangleShape(Shape):
def __init__(self, width: int, height: int):
self.__width = width
self.__height = height
def draw(self):
# ...
@property
def width(self) -> int:
return self.__width
@property
def height(self) -> int:
return self.__height
@width.setter
def width(self, width: int):
self.__width = width
@height.setter
def height(self, height: int):
self.__height = height
class SquareShape(RectangleShape):
def __init__(self, side_length: int):
super().__init__(side_length, side_length)
@RectangleShape.width.setter
def width(self, width: int):
RectangleShape.width.fset(self, width)
RectangleShape.height.fset(self, width)
@RectangleShape.height.setter
def height(self, height: int):
RectangleShape.width.fset(self, height)
RectangleShape.height.fset(self, height)
The above example does not follow Liskov’s substitution principle because you cannot set a square’s
width and height separately. This means that a square is not a rectangle from an object-oriented point
of view. Of course, mathematically, a square is a rectangle. But when considering the above public
API of the RectangleShape class, we can conclude that a square is not a rectangle because a square
cannot fully implement the API of the RectangleShape class. We cannot substitute a square object for
a rectangle object. What we need to do is to implement the SquareShape class without deriving from
the RectangleShape class:
class SquareShape(Shape):
def __init__(self, side_length: int):
self.__side_length = side_length
def draw(self):
# ...
@property
def side_length(self) -> int:
return self.__side_length
@side_length.setter
def side_length(self, side_length: int):
self.__side_length = side_length
To follow the Liskov substitution principle, keep the following in mind:
• A subclass must implement the superclass API and retain (or, in some cases, replace) the functionality of the superclass.
• A superclass should not have protected fields, because they allow subclasses to modify the state of the superclass, which can lead to incorrect behavior in the superclass.
Below is an example where a subclass extends the behavior of a superclass in the do_something method.
The functionality of the superclass is retained in the subclass making a subclass object substitutable
for a superclass object.
class SuperClass:
# ...
def do_something(self):
# ...
class SubClass(SuperClass):
# ...
def do_something(self):
super().do_something()
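# Perform additional subclass-specific actions here...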
Let’s have a concrete example of using the above strategy. We have the following CircleShape class
defined:
class Shape(Protocol):
def draw(self) -> None:
pass
class CircleShape(Shape):
def draw(self):
# Draw the circle stroke here
class FilledCircleShape(CircleShape):
def draw(self):
super().draw() # Draws the circle stroke
# Fill the circle
The FilledCircleShape class fulfills the requirements of Liskov’s substitution principle. We can use
an instance of the FilledCircleShape class everywhere where an instance of the CircleShape class
is wanted. The FilledCircleShape class does all that the CircleShape class does, plus adds some
behavior (= filling the circle).
You can also completely replace the superclass functionality in a subclass:
class ReverseList(list):
def __iter__(self):
return ReverseListIterator(self)
The above subclass implements the superclass API and retains its behavior: The iterator method still
returns an iterator. It just returns a different iterator compared to the superclass.
We will use the Python-specific term protocol instead of interface for the rest of this section. Let’s
have an example with several automobile classes:
class Automobile(Protocol):
def drive(self, start: Location, destination: Location) -> None:
pass
def carry_cargo(
self,
volume_in_cubic_meters: float,
weight_in_kgs: float
) -> None:
pass
class PassengerCar(Automobile):
# Implement drive and carry_cargo
class Van(Automobile):
# Implement drive and carry_cargo
class Truck(Automobile):
# Implement drive and carry_cargo
class ExcavatingAutomobile(Automobile):
def excavate(self) -> None:
pass
class Excavator(ExcavatingAutomobile):
# Implement drive, carry_cargo and excavate
Notice how the Automobile protocol has two methods declared. This can limit our software if we later
want to introduce other vehicles that could be just driven but unable to carry cargo. In an early phase,
we should segregate two micro protocols from the Automobile protocol. A micro protocol defines a
single capability or behavior. After segregation, we will have the following two micro protocols:
class Drivable(Protocol):
def drive(self, start: Location, destination: Location) -> None:
pass
class CargoCarriable(Protocol):
def carry_cargo(
self,
volume_in_cubic_meters: float,
weight_in_kgs: float
) -> None:
pass
Now that we have two protocols, we can also use them separately in our codebase. For example, we can have a list of drivable objects or a list of objects that can carry cargo. We still want to have a protocol for automobiles, though. We can use protocol multiple inheritance to redefine the Automobile protocol to extend the two micro protocols:
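The redefined protocol could look like the following (a minimal sketch):
class Automobile(Drivable, CargoCarriable, Protocol):
    pass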
If we look at the ExcavatingAutomobile protocol, we can notice that it extends the Automobile protocol
and adds excavating behavior. Once again, we have a problem if we want to have an excavating
machine that is not auto-mobile. The excavating behavior should be segregated into its own micro
protocol:
class Excavating(Protocol):
def excavate(self) -> None:
pass
We can once again use protocol multiple inheritance to redefine the ExcavatingAutomobile protocol as follows:
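A minimal sketch of the redefined protocol:
class ExcavatingAutomobile(Excavating, Drivable, CargoCarriable, Protocol):
    pass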
The ExcavatingAutomobile protocol now extends three micro protocols: Excavating, Drivable, and CargoCarriable. Wherever you need an excavating, drivable, or cargo-carriable object in your codebase, you can use an instance of the Excavator class there.
Let’s have another example with a generic collection protocol. We should be able to traverse a
collection and also be able to compare two collections for equality. First, we define a generic Iterator
protocol for iterators. It has two methods, as described below:
T = TypeVar('T')

class Iterator(Protocol[T]):
    def has_next_elem(self) -> bool:
        pass
    def get_next_elem(self) -> T:
        pass

class Collection(Protocol[T]):
    def create_iterator(self) -> Iterator[T]:
        pass
    def equals(self, another_collection: 'Collection[T]') -> bool:
        pass
Collection is a protocol with two unrelated methods. Let's segregate those methods into two micro protocols: Iterable and Equatable. The Iterable protocol is for objects that you can iterate over. It
has one method for creating new iterators. The Equatable protocol’s equals method is more generic
than the equals method in the Collection protocol. You can equate an Equatable object with another
object of type T:
class Iterable(Protocol[T]):
def create_iterator(self) -> Iterator[T]:
pass
class Equatable(Protocol[T]):
def equals(self, another_object: T) -> bool:
pass
We can use protocol multiple inheritance to redefine the Collection protocol as follows:
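A sketch of how the redefinition might look (the Equatable type parameter is an assumption based on the description above):
class Collection(Iterable[T], Equatable['Collection[T]'], Protocol[T]):
    pass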
We can implement the equals method by iterating elements in two collections and checking if the
elements are equal:
class AbstractCollection(Collection[T]):
    @abstractmethod
    def create_iterator(self) -> Iterator[T]:
        pass

    @staticmethod
    def __are_equal(iterator: Iterator[T], another_iterator: Iterator[T]):
        while iterator.has_next_elem():
            if another_iterator.has_next_elem():
                if (
                    iterator.get_next_elem()
                    != another_iterator.get_next_elem()
                ):
                    return False
            else:
                return False
        return True

    def equals(self, another_collection: 'Collection[T]') -> bool:
        iterator = self.create_iterator()
        another_iterator = another_collection.create_iterator()
        collections_are_equal = self.__are_equal(
            iterator, another_iterator
        )
        return (
            False
            if another_iterator.has_next_elem()
            else collections_are_equal
        )
Collections can also be compared. Let’s introduce support for such collections. First, we define a
generic Comparable protocol for comparing an object with another object:
class Comparable(Protocol[T]):
def compare_to(self, another_object: T) -> ComparisonResult:
pass
Now we can introduce a comparable collection protocol that allows comparing two collections of the
same type:
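A sketch of such a protocol (the exact name and type parameters are inferred from the TypeVar bound used below):
class ComparableCollection(
    Collection[T], Comparable['ComparableCollection[T]'], Protocol[T]
):
    pass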
Let’s define a generic sorting algorithm for collections whose elements are comparable:
U = TypeVar('U', bound=ComparableCollection)
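The body of the sorting function itself is not shown here; below is a stub of what its signature could look like, following the placeholder convention used elsewhere in this chapter (the name sort is an assumption):
def sort(collection: U) -> None:
    # Sort the collection using the elements' compare_to method
    # ...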
Let's create two protocols, Inserting and InsertingIterable, for objects that elements can be inserted into:
class Inserting(Protocol[T]):
def insert(self, element: T) -> None:
pass
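The InsertingIterable protocol can be sketched by combining the two micro protocols:
class InsertingIterable(Inserting[T], Iterable[T], Protocol[T]):
    pass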
Let’s redefine the Collection protocol to extend the InsertingIterable protocol because a collection
is iterable, and you can insert elements into a collection.
class Collection(InsertingIterable[T]):
pass
Next, we introduce two generic algorithms for collections: map and filter. We can realize that those
algorithms work with more abstract objects than collections. We benefit from protocol segregation
because instead of the Collection protocol, we can use the Iterable and InsertingIterable protocols
to create generic map and filter algorithms. Later it is possible to introduce some additional non-
collection iterable objects that can utilize the algorithms as well. Below is the implementation of the
map and filter functions:
T = TypeVar('T')
U = TypeVar('U')
def map(
source: Iterable[T],
mapped: Callable[[T], U],
destination: InsertingIterable[U],
) -> InsertingIterable[U]:
source_iterator = source.create_iterator()
while source_iterator.has_next_elem():
source_element = source_iterator.get_next_elem()
destination.insert(mapped(source_element))
return destination
def filter(
source: Iterable[T],
is_included: Callable[[T], bool],
destination: InsertingIterable[T],
) -> InsertingIterable[T]:
source_iterator = source.create_iterator()
while source_iterator.has_next_elem():
source_element = source_iterator.get_next_elem()
if is_included(source_element):
destination.insert(source_element)
return destination
class List(Collection[T]):
def __init__(self, *args: T):
# ...
# ...
class Stack(Collection[T]):
# ...
class MySet(Collection[T]):
# ...
Now we can use the map and filter algorithms with the above-defined collection classes:
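For example (a sketch; it assumes the List class implements create_iterator and insert as required by the Collection protocol):
numbers = List(1, 2, 3)
doubled_numbers = map(numbers, lambda number: 2 * number, List())
even_numbers = filter(numbers, lambda number: number % 2 == 0, List())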
class Closeable(Protocol):
def close(self) -> None:
pass
class MaybeInserting(Protocol[T]):
async def try_insert(self, value: T) -> None:
pass
class MapError(Exception):
pass
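The CloseableMaybeInserting protocol used below can be defined by combining the above micro protocols (a minimal sketch):
class CloseableMaybeInserting(Closeable, MaybeInserting[T], Protocol[T]):
    pass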
class FileLineInserter(CloseableMaybeInserting[T]):
def __init__(self, file_path_name: str):
self.__file = None
self.__file_path_name = file_path_name
def close(self):
self.__file.close()
Let’s use the above-defined try_map algorithm and the FileLineInserter class to write doubled
numbers (one number per line) to a file named file.txt:
run(my_func())
Python's standard library utilizes interface segregation and multiple interface inheritance in an exemplary way. For example, the collections.abc module defines abstract base classes (or interfaces) that declare a single method only, i.e., they are micro interfaces: Hashable (__hash__), Iterable (__iter__), Sized (__len__), Container (__contains__), and Callable (__call__).
The Python standard library also contains abstract base classes that inherit from multiple (micro)interfaces. For example, collections.abc.Collection inherits from Sized, Iterable, and Container, and collections.abc.Sequence inherits from Reversible and Collection.
An interface is used to define an abstract base type. Various implementations can be introduced that
implement the interface. When you want to change the behavior of a program, you create a new
class that implements an interface and then use an instance of that class. In this way, you can practice
the open-closed principle. You can think of this principle as a prerequisite for using the open-closed
principle effectively. The program against interfaces principle is a generalization of the dependency
inversion principle from the SOLID principles:
class Shape(Protocol):
def draw(self) -> None:
pass
The name of an interface describes something abstract, which you cannot create an object of. In the
above example, Shape is clearly something abstract. You cannot create an instance of Shape and then
draw it or calculate its area because you don’t know what shape it is. But when a class implements an
interface, a concrete object of the class representing the interface can be created. Below is an example
of three different classes that implement the Shape interface:
class CircleShape(Shape):
def __init__(self, radius: int):
self.__radius = radius
def draw(self):
# ...
class RectangleShape(Shape):
def __init__(self, width: int, height: int):
self.__width = width
self.__height = height
def draw(self):
# ...
class SquareShape(RectangleShape):
def __init__(self, side_length: int):
super().__init__(side_length, side_length)
When using shapes in code, we should program against the Shape interface. In the below example,
we make a high-level class Canvas dependent on the Shape interface, not on any of the low-level
classes (CircleShape, RectangleShape or SquareShape). Now both the high-level Canvas class and all
the low-level shape classes depend on abstraction only, the Shape interface. We can also notice that
the high-level class Canvas does not import anything from the low-level classes. Also, the abstraction
Shape does not depend on concrete implementations (classes).
class Canvas:
def __init__(self):
self.__shapes: Final[list[Shape]] = []
def draw_shapes(self):
for shape in self.__shapes:
shape.draw()
A Canvas object can contain any shape and draw any shape. It can handle any of the currently defined
concrete shapes and any new shape defined in the future.
If you did not program against an interface and did not use the dependency inversion principle, your
Canvas class would look like the following:
class Circle:
def draw(self):
# ...
class Rectangle:
def draw(self):
# ...
class Square:
def draw(self):
# ...
class Canvas:
def __init__(self):
self.__shapes: Final[list[Circle | Rectangle | Square]] = []
def draw_shapes(self):
for shape in self.__shapes:
shape.draw()
The above high-level Canvas class is coupled with all the low-level classes (Circle, Rectangle, and
Square). The type annotations in the Canvas class must be modified if a new shape type is needed.
If something changes in the public API of any low-level class, the Canvas class needs to be modified
accordingly. In the above example we are implicitly specifying the protocol for the draw method: it
does not take arguments and returns None.
Let’s have another example. If you have read books or articles about object-oriented design, you may
have encountered something similar as is presented in the below example:
class Dog:
def walk(self):
# ...
def bark(self):
# ...
class Fish:
def swim(self):
# ...
class Bird:
def fly(self):
# ...
def sing(self):
# ...
Three concrete implementations are defined above, but no interface is defined. Let’s say we are
making a game that has different animals. The first thing to do when coding the game is to remember
to program against interfaces and thus introduce an Animal protocol that we can use as an abstract
base type. Let’s try to create the Animal protocol based on the above concrete implementations:
class Animal(Protocol):
    def walk(self) -> None:
        pass
    def bark(self) -> None:
        pass
    def swim(self) -> None:
        pass
    def fly(self) -> None:
        pass
    def sing(self) -> None:
        pass

class Dog(Animal):
    def walk(self):
        # ...
    def bark(self):
        # ...
    def swim(self):
        raise NotImplementedError()
    def fly(self):
        raise NotImplementedError()
    def sing(self):
        raise NotImplementedError()
The above approach is wrong. We declare that the Dog class implements the Animal protocol, but it
does not do that. It implements only methods walk and bark while other methods throw an exception.
We should be able to substitute any concrete animal implementation where an animal is required.
But it is impossible because if we have a Dog object, we cannot safely call swim, fly, or sing methods
because they will always throw.
The problem is that we defined the concrete classes before defining the interface. That approach is
wrong. We should specify the interface first and then the concrete implementations. What we did
above was the other way around.
When defining an interface, we should remember that we are defining an abstract base type, so we
must think in abstract terms. We must consider what we want the animals to do in the game. If we
look at the methods walk, fly, and swim, they are all concrete actions. But what is the abstract action
common to these three concrete actions? It is move. And walking, flying, and swimming are all ways
of moving. Similarly, if we look at the bark and sing methods, they are also concrete actions. What is
the abstract action common to these two concrete actions? It is make sound. And barking and singing
are both ways to make a sound. If we use these abstract actions, our Animal protocol looks like the
following:
class Animal(Protocol):
    def move(self) -> None:
        pass
    def make_sound(self) -> None:
        pass
We can now redefine the animal classes to implement the new Animal protocol:
class Dog(Animal):
def move(self):
# walk
def make_sound(self):
# bark
class Fish(Animal):
def move(self):
# swim
def make_sound(self):
# Intentionally no operation
# (Fishes typically don't make sounds)
pass
class Bird(Animal):
def move(self):
# fly
def make_sound(self):
# sing
Now we have a correct object-oriented design and can program against the Animal interface. We can
call the move method when we want an animal to move and the make_sound method when we want
an animal to make a sound.
After realizing that some birds don’t fly at all, we can easily enhance our design. We can introduce
two different implementations:
class AbstractBird(Animal):
@abstractmethod
def move(self):
pass
def make_sound(self):
# sing
class FlyingBird(AbstractBird):
def move(self):
# fly
class NonFlyingBird(AbstractBird):
def move(self):
# walk
We might also later realize that not all birds sing but make different sounds. Ducks quack, for example.
Instead of using inheritance as was done above, an even better alternative is to use object composition.
We compose the Bird class of behavioral classes for moving and making sounds:
class Mover(Protocol):
def move(self) -> None:
pass
class SoundMaker(Protocol):
def make_sound(self) -> None:
pass
class Bird(Animal):
def __init__(self, mover: Mover, sound_maker: SoundMaker):
self.__mover = mover
self.__sound_maker = sound_maker
def move(self):
self.__mover.move()
def make_sound(self):
self.__sound_maker.make_sound()
Now we can create birds with various behaviors for moving and making sounds. We can use the
factory pattern to create different birds. The factory pattern is described in more detail later in this
chapter. Let’s introduce three different moving and sound-making behaviors and a factory to make
three kinds of birds: goldfinches, ostriches, and domestic ducks.
class Flyer(Mover):
def move(self):
# fly
class Runner(Mover):
def move(self):
# run
class Walker(Mover):
def move(self):
# walk
class GoldfinchSoundMaker(SoundMaker):
def make_sound(self):
# Sing goldfinch specific songs
class OstrichSoundMaker(SoundMaker):
def make_sound(self):
# Make ostrich specific sounds like whistles,
# hoots, hisses, growls, and deep booming growls
# that sound like the roar of a lion
class Quacker(SoundMaker):
def make_sound(self):
# quack
class BirdType(Enum):
GOLDFINCH = 1
OSTRICH = 2
DOMESTIC_DUCK = 3
class BirdFactory:
    def create_bird(self, bird_type: BirdType):
        match bird_type:
            case BirdType.GOLDFINCH:
                return Bird(Flyer(), GoldfinchSoundMaker())
            case BirdType.OSTRICH:
                return Bird(Runner(), OstrichSoundMaker())
            case BirdType.DOMESTIC_DUCK:
                return Bird(Walker(), Quacker())
            case _:
                raise ValueError('Unsupported bird type')
Uncle Bob uses the term clean architecture in his book Clean Architecture for this same principle. I do
not use the term architecture because I have reserved that term to designate the design of something
larger than a single service (i.e. a software system). Here we are focusing on designing a single
(micro)service conducting OOD in a particular fashion.
Clean microservice design comes with several benefits, as we will see below. Use cases and entities together form the model of the service, also called the business logic.
The direction of dependencies in the above diagrams is shown with arrows. We can see that the microservice API depends on the controller we create. The controller depends on the use cases. The use case layer depends on the (business) entities. The purpose of the use case layer is to orchestrate operations on the (business) entities. In the above figure, the parts of the software that tend to change most often are located at the outer layers (e.g., controller technology like REST or GraphQL, and the database), and the most stable part of the software is located at the center (the entities). Let's have an example of an entity, a bank account. We know it is something that doesn't change often. It has a couple of key attributes: owner, account number, interest rate, and balance (and probably some other attributes), but what a bank account is or does has remained the same for decades. We cannot say the same for API technologies or database technologies. Those change at a much faster pace than bank accounts. Because of the direction of dependencies in the above figure, changes in the outer layers do not affect the inner layers. Using the clean microservice design allows for easy change of the used API technology and database, e.g., from REST to some other technology and from an SQL database to a NoSQL database. All these changes can be made without affecting the business logic (the use case and entity layers).
Let’s have a real-life example of creating an API microservice called order-service, which handles
orders in an e-commerce software system. First, we define a REST API controller using FastAPI:
class RestOrderController:
__order_service: OrderService = Provide['order_service']
def __init__(self):
self.__router = APIRouter()
self.__router.add_api_route(
'/orders/',
self.create_order,
methods=['POST'],
status_code=201,
response_model=OutputOrder,
)
self.__router.add_api_route(
'/orders/{id_}',
self.get_order,
methods=['GET'],
response_model=OutputOrder,
)
@property
def router(self):
return self.__router
The API offered by the microservice depends on the controller, as seen in the above diagram. The
API is currently a REST API, but we could create and use a GraphQL controller. Then our API,
which depends on the controller, is a GraphQL API. Below is a partial implementation of a GraphQL
controller using FastAPI and Strawberry library:
import strawberry
from dependency_injector.wiring import Provide
from strawberry.fastapi import GraphQLRouter
class GraphQlOrderController:
@strawberry.type
class Query:
@strawberry.field
def order(self, id: int) -> OutputOrder:
output_order = order_service.get_order(id)
return OutputOrder.from_pydantic(output_order)
@strawberry.type
class Mutation:
@strawberry.mutation
def create_order(self, input_order: InputOrder) -> OutputOrder:
output_order = order_service.create_order(
input_order.to_pydantic()
)
return OutputOrder.from_pydantic(output_order)
@property
def router(self):
return self.__router
class OrderService(Protocol):
    def create_order(self, input_order: InputOrder) -> OutputOrder:
        pass

    def get_order(self, id_: int) -> OutputOrder:
        pass

class OrderServiceImpl(OrderService):
    __order_repository: OrderRepository = Provide['order_repository']

    def get_order(self, id_: int) -> OutputOrder:
        order = ...  # Fetch the order entity via self.__order_repository
        if order is None:
            raise EntityNotFoundError('Order', id_)
        return OutputOrder.from_orm(order)
The OrderServiceImpl class has a dependency on an order repository. This dependency is also
inverted. The OrderServiceImpl class depends only on the OrderRepository interface. The order
repository is used to orchestrate the persistence of order entities. Note that there is not any direct
dependency on a database.
Below is the OrderRepository protocol:
Figure 4.8. repositories/OrderRepository.py
class OrderRepository(Protocol):
def initialize(self) -> None:
pass
# Rest of methods...
The OrderRepository interface depends only on the Order entity class. You can introduce a class called
an interface adapter that implements the OrderRepository interface. A database interface adapter
adapts a particular concrete database to the OrderRepository interface. Entity classes do not depend
on anything except other entities to create hierarchical entities. For example, the Order entity consists
of OrderItem entities. Let’s introduce an OrderRepository interface adapter class for an SQL database:
Figure 4.9. repositories/SqlOrderRepository.py
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

class SqlOrderRepository(OrderRepository):
    __engine = create_engine(os.environ.get('DATABASE_URL'))
    __SessionLocal = sessionmaker(
        autocommit=False, autoflush=False, bind=__engine
    )

    def __init__(self):
        try:
            Base.metadata.create_all(bind=self.__engine)
        except Exception:
            # ...

    # Rest of methods...
The above class requires that the database URL is configured in the environment variable named DATABASE_URL. For a local MySQL database, the value could look like the following (the driver, credentials, and database name are placeholders):
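DATABASE_URL=mysql+pymysql://user:password@localhost:3306/orders
The exact URL format depends on the SQLAlchemy dialect and driver you use.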
If you are interested in the implementation of the to_entity_dict method, please check Appendix A.
If you want to change the database to MongoDB, it can be done by implementing a new interface
adapter which implements the OrderRepository interface. In the coming database principles chapter,
we will implement a MongoDB repository.
When implementing a clean microservice design, everything is wired together using configuration
and dependency injection. Dependency injection is configured using the dependency-injector library
and defining a DiContainer class:
class DiContainer(containers.DeclarativeContainer):
wiring_config = containers.WiringConfiguration(
modules=[
'.services.OrderServiceImpl',
'.controllers.RestOrderController',
'.controllers.FlaskRestOrderController',
'.controllers.GraphQlOrderController',
'.repositories.SqlOrderRepository',
]
)
order_service = providers.Singleton(OrderServiceImpl)
order_repository = providers.Singleton(SqlOrderRepository)
# order_controller = providers.Singleton(RestOrderController)
order_controller = providers.Singleton(GraphQlOrderController)
If we want to change something in our microservice, we can create a new class and use that new class in the DiContainer. We could create a new repository class for a different type of database, we could create a new service class that implements some of the services locally and some remotely, or we could introduce a new controller using gRPC, for example. All these changes would be in accordance with the open-closed principle, because we are not modifying any existing classes (except for the DiContainer, of course) but extending our application by introducing new classes that implement existing interfaces.
In the app.py file, we create an instance of the DI container, create the FastAPI app, define error handlers for mapping business errors and other exceptions to HTTP responses, and finally wire the wanted controller to the FastAPI app:
Figure 4.12. app.py
di_container = DiContainer()
app = FastAPI()
@app.exception_handler(OrderServiceError)
def handle_order_service_error(request: Request, error: OrderServiceError):
# Log error.cause
return JSONResponse(
status_code=error.status_code,
content={
'errorMessage': error.message,
'stackTrace': get_stack_trace(error.cause),
},
)
@app.exception_handler(RequestValidationError)
def handle_request_validation_error(
request: Request, error: RequestValidationError
):
# Audit log
return JSONResponse(
status_code=400,
content={'errorMessage': str(error)},
)
@app.exception_handler(Exception)
def handle_unspecified_error(request: Request, error: Exception):
# Increment 'request_failures' counter by one
# with labels:
# api_endpoint=f'{request.method} {request.url}'
# status_code=500
# error_code='UnspecifiedError'
return JSONResponse(
status_code=500,
content={
'errorMessage': str(error),
'stackTrace': get_stack_trace(error),
},
)
order_controller = di_container.order_controller()
app.include_router(order_controller.router)
If you are interested in the rest of the code (i.e., DTOs, entities, GraphQL schema (= types), and errors), please check Appendix A.
The dependency injection container is the only place in a microservice that contains references to
concrete implementations. The dependency injection principle is discussed more in a later section
of this chapter. The dependency inversion principle and dependency injection principle usually go
hand in hand. Dependency injection is used for wiring interface dependencies so that those become
dependencies on concrete implementations, as seen in the figure below.
Let’s add a feature where the shopping cart is emptied when an order is created:
Figure 4.14. services/ShoppingCartOrderService.py
from dependency_injector.wiring import Provide

class ShoppingCartOrderService(OrderService):
    __order_repository: OrderRepository = Provide['order_repository']
    __shopping_cart_service: ShoppingCartService = Provide[
        'shopping_cart_service'
    ]

    def create_order(self, input_order: InputOrder) -> OutputOrder:
        # Create the order entity from the input DTO (details omitted)
        order = ...
        self.__shopping_cart_service.empty_cart(order.user_id)
        return self.__order_repository.save(order)
As you can see from the above code, the ShoppingCartOrderService class does not depend on any concrete implementation of the shopping cart service. We can create an interface adapter class that is a concrete implementation of the ShoppingCartService interface. That interface adapter class connects to a particular external shopping cart service, for example, via a REST API. Once again, the dependency injector will inject a concrete ShoppingCartService implementation into an instance of the ShoppingCartOrderService class.
Note that the above create_order method is not production quality because it lacks a transaction. We have now seen examples of several benefits of clean microservice design.
Let’s change the used web framework from FastAPI to Flask. What we need to do is to create a Flask
specific version of the RestOrderController:
Figure 4.15. controllers/FlaskRestOrderController.py
from dependency_injector.wiring import Provide
from flask import Response, jsonify, request
from flask_classful import FlaskView, route
class FlaskRestOrderController(FlaskView):
__order_service: OrderService = Provide['order_service']
    @route('/orders', methods=['POST'])
    def create_order(self) -> Response:
        output_order = self.__order_service.create_order(
            InputOrder(**request.json)
        )
        # Return the created order as JSON
        # (returning the 201 status here is an assumption)
        response = jsonify(output_order.dict())
        response.status_code = 201
        return response

    @route('/orders/<id_>')
    def get_order(self, id_: int) -> Response:
        output_order = self.__order_service.get_order(id_)
        return jsonify(output_order.dict())
di_container = DiContainer()
app = Flask(__name__)
@app.errorhandler(OrderServiceError)
def handle_order_service_error(error: OrderServiceError):
return Response(
json.dumps(
{
'errorMessage': error.message,
'stackTrace': get_stack_trace(error.cause),
}
),
status=error.status_code,
mimetype='application/json',
)
@app.errorhandler(Exception)
def handle_unspecified_error(error: Exception):
return Response(
json.dumps(
{
'errorMessage': str(error),
'stackTrace': get_stack_trace(error),
}
),
status=500,
mimetype='application/json',
)
FlaskRestOrderController.register(app, route_base='/')
if __name__ == '__main__':
app.run()
We were able to change the used web framework by introducing two new small modules. We did not
touch any existing module, thus we can be certain that we did not break any existing functionality.
We were once again successfully applying the open-closed principle to our codebase.
This section presents conventions for uniformly naming interfaces, classes, and functions.
When an interface represents an abstract thing, name it according to that abstract thing. For
example, if you have a drawing application with various geometrical objects, name the geometrical
object interface Shape. It is a simple abstract noun. Names should always be the shortest, most
descriptive ones. There is no reason to name the geometrical object interface as GeometricalObject
or GeometricalShape, if we can use simply Shape.
When an interface represents an abstract actor, name it according to that abstract actor. The name of
an interface should be derived from the functionality it provides. For example, if there is a parseConfig
method in the interface, the interface should be named ConfigParser, and if an interface has a
validateObject method, the interface should be named ObjectValidator. Don’t use mismatching
name combinations like a ConfigReader interface with a parseConfig method or an ObjectValidator
interface with a validateData method.
When an interface represents a capability, name it according to that capability. Capability is
something that a concrete class is capable of doing. For example, a class could be sortable, iterable,
comparable, equatable, etc. Name the respective interfaces according to the capability: Sortable, Iterable, Comparable, and Equatable. The name of an interface representing a capability usually ends
with able or ing.
Don’t name interfaces starting with the I prefix (or any other prefix or postfix). Instead, use an
Impl postfix for class names to distinguish a class from an interface when needed. You should be
programming against interfaces, and if every interface has its name prefixed with I, it just adds
unnecessary noise to the code.
Some examples of class names representing a thing are: Account, Order, RectangleShape, and
CircleShape. In a class inheritance hierarchy, the names of classes usually refine the interface
name or the base class name. For example, if there is an InputMessage interface, then there
can be different concrete implementations (= classes) of the InputMessage interface. They can
represent an input message from different sources, like KafkaInputMessage and HttpInputMessage.
And there could be different subclasses for different data formats: AvroBinaryKafkaInputMessage or
JsonHttpInputMessage.
The interface or base class name should be retained in the class or subclass name. Class
names should follow the pattern: <class-purpose> + <interface-name> or <sub-class-purpose>
+ <super-class-name>, e.g., Kafka + InputMessage = KafkaInputMessage and AvroBinary +
KafkaInputMessage = AvroBinaryKafkaInputMessage. Name abstract classes with the prefix Abstract.
If an interface or class name is 20 or more characters long, consider abbreviating one or more words in the name. The reason for this is to keep the code readable. Very long words are harder to read and slow a developer down. (Remember that code is read more often than it is written.) But only use abbreviations that are commonly used and understandable to other developers. If a word does not have a good abbreviation, then don't abbreviate. For example, in the class name AvroBinaryKafkaInputMessage, we can only abbreviate Message to Msg. For the other words in the class name, there are no established abbreviations available. Abbreviating Binary to Bin is questionable, because Bin could also mean a bin. Don't abbreviate a word if you save only one or two characters. For example, there is no reason to abbreviate Account to Accnt.
Instead of abbreviating, you can shorten a name by dropping one or more words from it, provided that the name remains easily understandable by any developer. For example, if you have the classes InternalMessage, InternalMessageSchema, and InternalMessageField, you could shorten the last two class names to InternalSchema and InternalField. This is because these two classes are mainly used in conjunction with the InternalMessage class: an InternalMessage object has a schema and one or more fields. You can also use nested classes: InternalMessage.Schema and InternalMessage.Field.
If you have related classes and one or more class names requires shortening, you should shorten
all related class names to keep the naming uniform. For example, if you have two classes
ConfigurationParser and JsonConfigurationParser, you should shorten the names of both classes,
not only the one longer than 19 characters. The new class names would be ConfigParser and
JsonConfigParser.
If an interface or class name is less than 20 characters long, you should not usually try to make it
shorter.
Don’t add a design pattern name to a class name if it does not bring any real benefit. For example,
suppose we have a DataStore interface, a DataStoreImpl class, and a class that is wrapping a DataStore
instance and uses the proxy pattern to add caching functionality to the wrapped data store. We should
not name the caching class CachingProxyDataStore or CachingDataStoreProxy. The word proxy does not add significant value, so the class should be named simply CachingDataStore. That name tells clearly that it is a data store with caching functionality. A seasoned developer notices from the CachingDataStore name that the class uses the proxy pattern. And if not, looking at the class implementation will reveal it.
The general rule is to name a function so that the purpose of the function is clear. A good function
name should not make you think. If a function name is 20 or more characters long, consider
abbreviating one or more words in the name. The reason for this is to keep the code readable. Very long words are harder to read and slow a developer down. (Remember that code is read more often than it is written.) But only use abbreviations that are widely used and understandable to other
developers. If a word does not have a good abbreviation, then don’t abbreviate.
Below is an example of a protocol containing two methods named with simple verbs only. It is not necessary to name the methods start_thread and stop_thread, because the methods are already part of the Thread protocol, and it is self-evident what the start method starts and what the stop method stops.
class Thread(Protocol):
    def start(self) -> None:
        pass

    def stop(self) -> None:
        pass
Many languages offer streams that can be written to, like the standard output stream. Streams are
usually buffered, and the actual writing to the stream does not happen immediately. For example, the
below statement does not necessarily write to the standard output stream immediately. It buffers the
text to be written later when the buffer is flushed to the stream. This can happen when the buffer is
full, when some time has elapsed since the last flush or when the stream is closed.
stream.write(...)
The above statement is misleading and could be corrected by renaming the function to describe what
it actually does:
stream.write_on_flush(...)
The above function name immediately tells a developer that writing happens only on flush, and the
developer can consult the function documentation to determine when the flushing happens.
You can introduce a convenience method to perform a write with an immediate flush:
# Instead of this:
stream.write_on_flush(...)
stream.flush()
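# You could instead offer a convenience method that writes and flushes
# immediately (the method name write_and_flush is an assumption):
stream.write_and_flush(...)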
class ConfigParser(Protocol):
def try_parse_config(self, config_json: str) -> Config:
pass
When a function’s action has a target, it is useful to name the function using the following pattern:
<action-verb> + <action-target>, for example, parse + config = parse_config.
We can drop the action target from the function name if the function’s first parameter describes the
action target. It is not wrong to keep the action target in the function name, though. But if it can be
dropped, it usually makes the function call statements read better. In the below example, the word “config” is repeated: try_parse_config(config_json), which makes the function call statement read a bit clumsily.
config = config_parser.try_parse_config(config_json)
class ConfigParser(Protocol):
def try_parse(self, config_json: str) -> Config:
pass
As shown below, this change makes the code read better, presuming we use a descriptive variable
name. And we should, of course, always use a descriptive variable name.
config = config_parser.try_parse(config_json)
T = TypeVar('T')
class Vector(Protocol[T]):
# OK
def push_back(self, value: T) -> None:
pass
# Not ideal,
# word "value" repeated
def push_back_value(self, value: T) -> None:
pass
class KafkaAdminClient:
    @staticmethod
    def create(topic: str) -> None:
        # ...
The above function name should be used only when a topic is the only thing a Kafka admin client
can create, because Python does not support method overloading.
There are two ways to call the method:
KafkaAdminClient.create(topic='xyz')
or
topic = "xyz"
KafkaAdminClient.create(topic)
You don’t need to add a preposition to a function name if the preposition can be assumed (i.e., the
preposition is implicit). In many cases, only one preposition can be assumed. If you have a function
named wait, the preposition for can be assumed, and if you have a function named subscribe, the
preposition to can be assumed. We don’t need to use function names wait_for and subscribe_to.
Suppose a function is named laugh(person: Person). Now we have to add a preposition because
none can be assumed. We should name the function either laugh_with(person: Person) or laugh_at(person: Person).
Let's analyze Python's list methods and see how well they are named:
list.append(item)
This tells clearly where the value is put in the list.
list.insert(index, item)
This reads well. We can easily assume an at preposition after the insert word.
list.remove(item)
This method removes only the first occurrence of the item from the list. For that reason, it should be named remove_first(item). This method can also raise an exception. We should communicate that in the method signature. Let's use a try prefix: list.try_remove_first(item). We will discuss exception handling and the try prefix more in the next chapter.
list.pop(index)
This reads well. We can easily assume an at preposition after the pop word.
list.index(item)
The word index is not a verb here. We should add a proper verb and inform the user that the first index of the found item is returned. This method can also raise an exception. We should communicate that in the method signature. This method should be renamed to list.try_find_first_index_of(item).
list.count(item)
This reads well. We can easily assume an of preposition after the count word.
list.sort()
This is perfect. It informs that the list is sorted in place. If the method returned a new sorted list, it should be called list.sorted().
list.reverse()
This is perfect. It informs that the list is reversed in place. If the method returned a new reversed list, it should be called list.reversed().
list.copy()
The word copy strongly associates with copying from one place to another. I would rename
this method to list.clone()
Methods in a class can come in pairs. A typical example is a pair of getter and setter methods. When
you define a method pair in a class, name the methods logically. The methods in a method pair often
do two opposite things, like getting or setting a value. If you are unsure how to name one of the
methods, try to find an antonym for a word. For example, if you have a method whose name starts
with “create” and are unsure how to name the method for the opposite action, try a Google search:
“create antonym”.
Here is a non-comprehensive list of some method names that come in pairs:
• open/close
• load/save
• initialize/destroy
• create/destroy
• insert/delete
• start/stop
• pause/resume
• start/finish
• increase/decrease
• increment/decrement
• construct/destruct
• encrypt/decrypt
• encode/decode
• obtain/relinquish
• acquire/release
• reserve/release
• startup/shutdown
• login/logout
• begin/end
• launch/terminate
• publish/subscribe
• join/detach
• <something>/un<something>, e.g. assign/unassign, install/uninstall, subscribe/unsubscribe,
follow/unfollow
• <something>/de<something>, e.g. serialize/deserialize, allocate/deallocate
• <something>/dis<something>, e.g. connect/disconnect
Let’s have a couple of examples from real-life. The apt tool in Debian/Ubuntu-based Linux has an
install command to install a package, but the command for uninstalling a package is remove. It should
be uninstall. The Kubernetes package manager Helm gets this right. It has an install command to install a Helm release and an uninstall command to uninstall it.
The naming of boolean functions (predicates) should be such that when reading
the function call statement, it reads as a boolean statement that can be true or
false.
In this section, we consider naming functions that are predicates and return a boolean value. Here
I don’t mean functions that return true or false based on the success of the executed action, but
cases where the function call is used to evaluate a statement as true or false. The naming of boolean
functions should be such that when reading the function call statement, it makes a statement that can
be true or false. Below are some examples:
class Response:
def has_error(self) -> bool:
# ...
class String:
def is_empty(self) -> bool:
# ...
class Thread:
def should_terminate(self) -> bool:
# ...
# ...
A boolean returning function is correctly named when you call the function in code and can read that
function call statement in plain English. Below is an example of incorrect and correct naming:
class Thread:
# Incorrect naming
def stopped(self) -> bool:
# ...
# Correct naming
def is_stopped(self) -> bool:
# ...
if thread.stopped():
# Here we have: if thread stopped
# This is not a statement with a true or false answer.
# It is a second conditional form,
# asking what would happen if thread stopped.
# ...
From the above examples, we can notice that many names of boolean-returning functions start with
either is or has, or follow one of the below patterns:
• should + <verb>
• can + <verb>
But as we saw with the starts_with, ends_with, and contains functions, a boolean returning function
name can start with any verb in third-person singular form (i.e., ending with an s). If you have
a collection class, its boolean method names should have a verb in the plural form, for example:
numbers.include(...) instead of numbers.includes(...). Name your collection variables always in
plural form (e.g., numbers instead of number_list). We will discuss the uniform naming principles for
variables in the next chapter.
Do not include the does word in a function name, like does_start_with, does_end_with, or does_-
contain. Adding the does word doesn’t add any real value to the name, and such function names are
awkward to read when used in code, for example:
line = text_file_reader.read_line()
When you want to use the past tense in a function name, use a did prefix in the function name, for
example:
class DatabaseOperation:
def execute(self) -> None:
# ...
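# A did-prefixed method in the above class could, for example, look like this
# (the method name did_succeed is an illustrative assumption):
def did_succeed(self) -> bool:
# ...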
A builder class is used to create builder objects that build a new object of a particular type. If you
wanted to construct a URL, a UrlBuilder class could be used for that purpose. Builder class methods
add properties to the built object. For this reason, it is recommended to name builder class methods
starting with the verb add. The method that finally builds the wanted object should be named simply
build or build + <build-target>, for example, build_url. I prefer the longer form to remind the reader
what is being built. Below is an example of naming the methods in a builder class:
class UrlBuilder:
    def add_scheme(self, scheme: str) -> 'UrlBuilder':
        # ...
        return self

    # add_host and other add methods are defined similarly...

    def build_url(self) -> str:
        # ...

url = (
    UrlBuilder().add_scheme('https://').add_host('google.com').build_url()
)
Factory method names usually start with the verb create. Factory methods can be named so that the
create verb is implicit, for example:
Optional.of(value)
Optional.empty() # Not optimal, 'empty' can be confused as a verb
Either.with_left(value)
Either.with_right(value)
SalesItem.from_dto(input_sales_item)
Similarly, conversion methods can be named so that the convert verb is implicit. Conversion methods
without a verb usually start with the to preposition, for example:
numeric_value.to_string()
dict_value.to_json()
I recommend using method names with implicit verbs sparingly and only in circumstances where the
implicit verb is self-evident and does not force a developer to think.
Lifecycle methods are called on certain occasions only. Lifecycle method names should answer the
question: When or “on what occasion” will this method be called? Examples of good names for
lifecycle methods are: on_init, on_error, on_success, after_mount, before_unmount. For example, in
React, there are lifecycle methods in class components called componentDidMount, componentDidUpdate
and componentWillUnmount. There is no reason to repeat the class name in the lifecycle method names.
Better names would have been: afterMount, afterUpdate, and beforeUnmount.
Naming rules for function parameters are mostly the same as for variables. The uniform naming
principle for variables is described in more detail in the next chapter.
There are some exceptions, like naming object parameters. When a function parameter is an object,
the name of the object class can be left out from the parameter name when the parameter name and the
function name implicitly describe the class of the parameter. This exception is acceptable because the
function parameter type can always be easily checked by looking at the function signature. And this
should be easily done with a glance because a function should be short (a maximum of 5-7 statements).
Below is an example of naming object type parameters:
# Worse way: repeating the parameter type in the parameter names
# def drive(start_location: Location, destination_location: Location) -> None:

# Better way
# When we think about 'drive' and 'start' or 'destination',
# we can assume that 'start' and 'destination' mean locations
def drive(start: Location, destination: Location) -> None:
    # ...
Some programming languages like Swift allow adding so-called external names to function parameters.
Using external names can make a function call statement read better, as shown below:

func send(
    message: String,
    from sender: Person,
    to recipient: Person
) {
    // ...
}

// A call site then reads almost like a sentence
// (the variable names are illustrative):
// send(message: greeting, from: alice, to: bob)
Encapsulation is achieved by declaring class attributes private. Python does not have attribute access
modifiers, so you should use a naming convention: prefix an attribute name with __ to denote a private
attribute. You can create getter and setter methods (or properties) if you need the state to be modifiable
outside the class. However, encapsulation is best ensured if you don't need to create getter and setter
methods for the class attributes at all. Do not automatically implement getter and setter methods for
every class. Only create those accessor methods if needed, like when the class represents a modifiable
data structure. And only create setter methods for attributes that need to be modified outside the class.
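Below is a minimal sketch of this convention (the class and attribute names are illustrative):

class Temperature:
    def __init__(self, celsius: float):
        # The double underscore prefix marks the attribute as private
        self.__celsius = celsius

    # A read-only property; no setter is provided, because the value
    # should only change via the class's own methods
    @property
    def celsius(self) -> float:
        return self.__celsius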
When a getter must return an object that is part of the class's internal state, there are two approaches:
return a copy of the object or return an unmodifiable version of it. Regarding the first approach, when
a copy is returned, the caller can use it as they like. Changes made to the copied object don't affect
the original object. I am primarily talking about making a shallow copy. In many cases, a shallow copy
is enough. For example, a list of primitive values, immutable strings, or immutable objects does not
require a deep copy of the list. But you should make a deep copy when needed.
The copying approach can cause a performance penalty, but in many cases, that penalty is insignificant.
You can easily create a copy of a list:
values = [1, 2, 3, 4, 5]
values2 = values.copy()
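If the list contains mutable objects and the caller must not be able to modify them through the copy,
a deep copy can be made using Python's copy module (a minimal sketch):

import copy

values = [[1, 2], [3, 4]]
values2 = copy.deepcopy(values)
values2[0].append(5)
# 'values' is unaffected and is still [[1, 2], [3, 4]]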
The second approach requires you to create an unmodifiable version of a modifiable object and return
that unmodifiable object. You can create an unmodifiable version of a class by yourself. Below is an
example:
T = TypeVar('T')

class MyList(Protocol[T]):
    def append(self, item: T) -> None:
        pass
    # ... other methods, e.g. get_item ...

class UnmodifiableMyList(MyList[T]):
    def __init__(self, list_: MyList[T]):
        self.__list = list_

    def get_item(self, index: int) -> T:
        return self.__list.get_item(index)
In the above example, the unmodifiable list class takes another list (a modifiable list) as a constructor
argument. It only implements the MyList protocol methods that don’t attempt to modify the wrapped
list. In this case, it implements only the get_item method that delegates to the respective method in
the MyList class. The methods of the UnmodifiableMyList class that attempt to modify the wrapped
list should raise an error. The UnmodifiableMyList class utilizes the proxy pattern by wrapping an
object of the MyList class and partially allowing access to the MyList class methods.
Unmodifiable and immutable objects are slightly different. No one can modify an immutable object,
but when you return an unmodifiable object from a method, that object can still be modified by the
owning class, and modifications are visible to everyone that has received an unmodifiable version of
the object. If this is something undesirable, you should use a copy instead.
Similarly, when a modifiable object is given as an argument to a class (e.g., in the constructor) and it
becomes part of the class's internal state, you have two options:
1) Store a copy of the modifiable argument object to the class's internal state
2) Store an unmodifiable version of the modifiable argument object to the class’s internal state
class MyClass:
    def __init__(self, values: MyList[int]):
        # Option 2: store an unmodifiable version of the argument
        self.__values = UnmodifiableMyList(values)
Let's then look at state encapsulation in React class components. A class component, like the
ButtonClickCounter below, initializes its state in the constructor by assigning to the state property
inherited from the Component base class:

class ButtonClickCounter extends Component {
  constructor(props) {
    super(props);

    this.state = {
      clickCount: 0
    };
  }
}
It is not good object-oriented design that the state property is public or protected in the Component
class. You should not modify the base class's state property in the ButtonClickCounter subclass. The
proper way to initialize the state in an object-oriented manner would be to give the initial state as a
parameter to the Component class constructor using super. However, that is not supported by React.
Setting the state is done with the setState method defined in the Component class, but accessing the
state happens directly through the state property. This leads to a problem where you cannot use
this.state when calling the setState method because that can lead to erroneous behavior, according
to the React documentation. So the following is not allowed:
incrementClickCount = () =>
this.setState({
clickCount: this.state.clickCount + 1
});
Below is an example of using the setState method correctly in a React class component:
class ButtonClickCounter extends Component {
  constructor(props) {
    super(props);

    this.state = {
      clickCount: 0
    };
  }

  incrementClickCount = () =>
    this.setState(({ clickCount }) => ({
      clickCount: clickCount + 1
    }));

  render() {
    return (
      <>
        Click count: {this.state.clickCount}
        <button onClick={this.incrementClickCount} />
      </>
    );
  }
}
Accessing the state in the Component subclasses should be done using a getter getState, not directly
accessing the state property. Below is the above example modified to use the imaginary getState
method:
incrementClickCount = () =>
this.setState({
clickCount: this.getState().clickCount + 1
});
render() {
return (
<>
Click count: {this.getState().clickCount}
<button onClick={this.incrementClickCount} />
</>
);
}
}
For example, a car object can be composed of an engine and transmission object (to name a few).
Objects are rarely “composed” by deriving from another object, i.e., using inheritance. But first, let’s
try to specify classes that implement the below Car protocol using inheritance:
class Car(Protocol):
def drive(self, start: Location, destination: Location) -> None:
pass
class CombustionEngineCar(Car):
def drive(self, start: Location, destination: Location) -> None:
# ...
class ElectricEngineCar(Car):
def drive(self, start: Location, destination: Location) -> None:
# ...
class ManualTransmissionCombustionEngineCar(CombustionEngineCar):
def drive(self, start: Location, destination: Location) -> None:
# ...
class AutomaticTransmissionCombustionEngineCar(CombustionEngineCar):
def drive(self, start: Location, destination: Location) -> None:
# ...
If we wanted to add other components to a car, like a two or four-wheel drive, the number of classes
needed would increase by three. If we wanted to add a design property (sedan, hatchback, wagon,
or SUV) to a car, the number of needed classes would explode, and the class names would become
ridiculously long. We can notice that inheritance is not the correct way to build more complex classes.
Class inheritance creates an is-a relationship between a superclass and its subclasses. Object
composition creates a has-a relationship. We can claim that ManualTransmissionCombustionEngineCar
is a kind of CombustionEngineCar, so basically, we are not doing anything wrong here, one might think.
But when designing classes, you should first determine if object composition could be used: is there
a has-a relationship? Can you declare a class as an attribute of another class? If the answer is yes,
then composition should be used instead of inheritance.
All the above things related to a car are actually properties of a car. A car has an engine. A car
has a transmission. It has a drive type (two- or four-wheel) and a design. We can turn the inheritance-based
solution into a composition-based solution:
class Drivable(Protocol):
def drive(self, start: Location, destination: Location) -> None:
pass
class Engine(Protocol):
# Methods like start, stop ...
class CombustionEngine(Engine):
# Methods like start, stop ...
class ElectricEngine(Engine):
# Methods like start, stop ...
class Transmission(Protocol):
# Methods like shift_gear ...
class AutomaticTransmission(Transmission):
# Methods like shift_gear ...
class ManualTransmission(Transmission):
# Methods like shift_gear ...
class Car(Drivable):
    def __init__(
        self,
        engine: Engine,
        transmission: Transmission,
        drive_type: DriveType,
        design: Design
    ):
        self.__engine = engine
        self.__transmission = transmission
        self.__drive_type = drive_type
        self.__design = design

    def drive(self, start: Location, destination: Location) -> None:
        # self.__engine.start()
        # self.__transmission.shift_gear(...)
        # ...
        # self.__engine.stop()
Let’s have a more realistic example with different chart types. At first, this sounds like a case where
inheritance could be used: We have some abstract base charts that different concrete charts extend,
for example:
class Chart(Protocol):
def render_view(self) -> None:
pass
class AbstractChart(Chart):
@abstractmethod
def render_view(self) -> None:
pass
@abstractmethod
def update_data(self, ...) -> None:
pass
class XAxisChart(AbstractChart):
@abstractmethod
def render_view(self) -> None:
pass
class ColumnChart(XAxisChart):
def render_view(self) -> None:
# Render column chart using library XYZ
class NonAxisChart(AbstractChart):
@abstractmethod
def render_view(self) -> None:
pass
class PieChart(NonAxisChart):
def render_view(self) -> None:
# Render pie chart using library XYZ
class DonutChart(PieChart):
def render_view(self) -> None:
# Render donut chart using library XYZ
The above class hierarchy looks manageable: there should not be too many subclasses that need to be
defined. We can, of course, think of new chart types, like a geographical map or data table for which
we could add subclasses. One problem with a deep class hierarchy arises when you need to change
or correct something related to a particular chart type. Let’s say you want to change or correct some
behavior related to a pie chart. You will first check the PieChart class if the behavior is defined there.
If you can’t find what you are looking for, you need to navigate to the base class of the PieChart
class (NonAxisChart) and look there. And you might need to continue this navigation until you reach
the base class where the behavior you want to change or correct is located. Of course, if you are
incredibly familiar with the codebase, you might be able to locate the correct subclass on the first try.
But in general, this is not a straightforward task.
Using class inheritance can introduce class hierarchies where some classes have significantly more
methods than other classes. For example, in the chart inheritance chain, the AbstractChart class
probably has significantly more methods than classes at the end of the inheritance chain. This class
size difference creates an imbalance between classes, making it hard to reason about what functionality
belongs to which class. The situation gets worse if the charts must also be renderable with different
charting libraries (here called XYZ and ABC): with inheritance, every chart type needs its own subclass
per library, for example:
class XyzPieChart(XyzNonAxisChart):
def render_view(self) -> None:
# Render pie chart using XYZ library
class AbcPieChart(AbcNonAxisChart):
def render_view(self) -> None:
# Render pie chart using ABC library
Implementing the above functionality using composition instead of inheritance has several benefits,
as the below example shows. We have split some chart behavior into two types of classes: chart view
renderers and chart data factories:
class Chart(Protocol):
def render_view(self) -> None:
pass
class ChartViewRenderer(Protocol):
def render_view(self, data: ChartData, options: ChartOptions) -> None:
pass
class ChartDataFactory(Protocol):
def create_data(self, ...) -> ChartData:
pass
class ChartImpl(Chart):
def __init__ (
self,
view_renderer: ChartViewRenderer,
data_factory: ChartDataFactory,
options: ChartOptions
):
self.__view_renderer = view_renderer
self.__data_factory = data_factory
self.__options = options
self.__data = None
class XyzPieChartViewRenderer(ChartViewRenderer):
def render_view(self, data: ChartData, options: ChartOptions) -> None:
# Render pie chart with XYZ library
class AbcPieChartViewRenderer(ChartViewRenderer):
def render_view(self, data: ChartData, options: ChartOptions) -> None:
# Render pie chart with ABC library
class ChartType(Enum):
COLUMN = 1
PIE = 2
class ChartFactory(Protocol):
def create_chart(self, chart_type: ChartType) -> Chart:
pass
class AbcChartFactory(ChartFactory):
def create_chart(self, chart_type: ChartType) -> Chart:
match chart_type:
case ChartType.COLUMN:
return ChartImpl(AbcColumnChartViewRenderer(),
XAxisChartDataFactory())
case ChartType.PIE:
return ChartImpl(AbcPieChartViewRenderer(),
NonAxisChartDataFactory())
case _:
raise ValueError('Invalid chart type')
class XyzChartFactory(ChartFactory):
def create_chart(self, chart_type: ChartType) -> Chart:
match chart_type:
case ChartType.COLUMN:
return ChartImpl(XyzColumnChartViewRenderer(),
XAxisChartDataFactory())
case ChartType.PIE:
return ChartImpl(XyzPieChartViewRenderer(),
NonAxisChartDataFactory())
case _:
raise ValueError('Invalid chart type')
The XyzPieChartViewRenderer and AbcPieChartViewRenderer classes use the adapter pattern as they
convert the supplied data and options to an implementation (ABC or XYZ chart library) specific
interface.
We can easily add more functionality by composing the ChartImpl class of more classes. There could
be, for example, a title formatter, tooltip formatter class, y/x-axis label formatter, and event handler
classes.
class ChartImpl(Chart):
def __init__ (
self,
view_renderer: ChartViewRenderer,
data_factory: ChartDataFactory,
title_formatter: ChartTitleFormatter,
tooltip_formatter: ChartTooltipFormatter,
x_axis_label_formatter: ChartXAxisLabelFormatter,
event_handler: ChartEventHandler,
options: ChartOptions
):
# ...
class AbcChartFactory(ChartFactory):
def create_chart(self, chart_type: ChartType) -> Chart:
match chart_type:
case ChartType.COLUMN:
return ChartImpl(AbcColumnChartViewRenderer(),
XAxisChartDataFactory(),
ChartTitleFormatterImpl(),
XAxisChartTooltipFormatter(),
ChartXAxisLabelFormatterImpl(),
ColumnChartEventHandler())
case ChartType.PIE:
return ChartImpl(AbcPieChartViewRenderer(),
NonAxisChartDataFactory(),
ChartTitleFormatterImpl(),
NonAxisChartTooltipFormatter(),
NullXAxisLabelFormatter(),
NonAxisChartEventHandler())
case _:
raise ValueError('Invalid chart type')
DDD means that the structure of the software and the names appearing in the code (interface, class,
function, and variable names) should match the domain. For example, in a banking software system,
names like Account, withdraw, deposit, make_payment and LoanApplication should be used. The top-
level domain of a software system should be divided into smaller subdomains. And each subdomain
should be implemented as a separate application or software component. The top-level domain
contains all features of the software system and each subdomain is a subset of those features. For
example, a development team can be dedicated to the loan applications subdomain and another team
to payments. Developers in a team need to know about their team’s subdomain. And when interfacing
with other domains, they need to know enough about the other domains to understand the interfaces.
In this way, a single team will have a smaller set of concepts to comprehend and remember. Product
managers and the chief architect should have a good grasp of the top-level domain, i.e., they should
understand the big picture.
DDD uses, among others, the following concepts (building blocks) for modeling a domain:
• Entities
• Value Objects
• Aggregates
• Aggregate Roots
• Factories
• Repositories
• Services
• Events
4.9.1.1: Entities
An entity is a domain object that has an identity. Usually, this is indicated by the fact that the entity
class has some kind of id attribute. Examples of entities are an employee and a bank account. An
employee object has an employee id, and a bank account has a number that identifies the bank account.
Entities can contain methods that operate on the attributes of the entity. For example, a bank account
entity can have withdraw and deposit methods that operate on the balance attribute of the entity.
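A minimal sketch of such an entity (the class and attribute names are illustrative):

class BankAccount:
    def __init__(self, account_number: str, balance: int = 0):
        # The account number gives the entity its identity
        self.__account_number = account_number
        self.__balance = balance

    def deposit(self, amount: int) -> None:
        self.__balance += amount

    def withdraw(self, amount: int) -> None:
        if amount > self.__balance:
            raise ValueError('Insufficient balance')
        self.__balance -= amount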
4.9.1.2: Value Objects
Value objects are domain objects that don't have an identity. Examples of value objects are an address
or a price object. The price object can have two attributes: amount and currency, but it does not have
an identity. Similarly an address object can have the following attributes: street address, postal code,
city and country.
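A value object could be sketched, for example, as a frozen dataclass (the names are illustrative):

from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Price:
    # A value object has no identity; two prices with the same
    # amount and currency are considered equal
    amount: Decimal
    currency: str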
4.9.1.3: Aggregates
Aggregates are entities composed of other entities. For example an order entity can have one or more
order item entities. In regard to object-oriented design, this is the same as object composition.
4.9.1.4: Aggregate Roots
Aggregate roots are domain objects that don't have any parent objects. An order entity is an aggregate
root if it does not have a parent entity. But an order item entity is not an aggregate root when it
belongs to an order. Aggregate roots serve as facade objects, and operations should be performed
on the aggregate root objects rather than directly on the objects behind the facade (e.g., you should
not access the individual order items directly, but perform operations on the order object). Or, if you
have an aggregate car object containing wheels, you don't operate the wheels outside the car object;
the car object provides a facade, like a turn method, and internally operates the wheels, making the
car object an aggregate root. More about the facade design pattern in a later section of this chapter.
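A minimal sketch of the car example (the Wheel class and the method names are illustrative):

class Wheel:
    def turn_by(self, degrees: int) -> None:
        # ...
        pass

class Car:
    def __init__(self):
        self.__front_wheels = [Wheel(), Wheel()]

    def turn(self, degrees: int) -> None:
        # The car is the aggregate root: it operates its wheels internally,
        # and no code outside the car operates the wheels directly
        for wheel in self.__front_wheels:
            wheel.turn_by(degrees)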
Aggregate roots also exist in a microservice architecture. Let's say we have a bank account that is an
aggregate root, and it contains transaction entities. The bank account and transaction entities can be
handled in different microservices (bank-account-service and account-transaction-service), but only
the bank-account-service can directly access and modify the transaction entities using the
account-transaction-service. The role and benefits of an aggregate root are the following:
• The aggregate root protects against invariant violation. For example, no other service should
directly remove or add transactions using the account-transaction-service. That would break
the invariant that the sum of transactions should be the same as the balance of the account
maintained by the bank-account-service.
• The aggregate root simplifies (database/distributed) transactions. Your microservice can call the
bank-account-service and let it manage the distributed transactions between the bank-account-service
and the account-transaction-service; it's not something your microservice needs to do.
You can easily split the aggregate root into more entities. For example, we could have the bank
account aggregate root contain a balance entity and transaction entities. The balance entity
could be handled by a separate account-balance-service. Still, all bank account operations must be
made through the bank-account-service, which will orchestrate, e.g., withdraw and deposit operations
using the account-balance-service and account-transaction-service. We can even split the
bank-account-service into two separate microservices: a bank-account-service for account CRUD
operations (excluding updates related to balance) and an account-money-transfer-service that will
handle withdraw and deposit operations using the two lower-level microservices: account-balance-service
and account-transaction-service. We had an example of the latter case in the previous chapter when
we discussed distributed transactions.
4.9.2: Actors
Actors perform commands. End-users are actors, but services can also be actors. For example, in a
data exporter microservice, there can be an input message consumer actor/service that has a command
to consume a message from a data source.
4.9.2.1: Factories
In domain-driven design, the creation of domain objects can be separated from the object classes
themselves into factories. Factories are objects that are dedicated to creating objects of a certain type.
More about the factory design pattern in a later section of this chapter.
4.9.2.2: Repositories
A repository is an object with methods for persisting domain objects and retrieving them from a data
store (e.g. a database). Typically, there is one repository for each aggregate root, e.g. an order
repository for order entities.
4.9.2.3: Services
Services are used to implement business use cases and contain functionality that is not directly part
of any specific object. Services orchestrate operations on aggregate roots; for example, an order service
orchestrates operations on order entities. A service typically uses a related repository to perform
persistence-related operations. A service can also be seen as an actor with specific command(s). For
example, in a data exporter microservice, there can be an input message consumer actor/service that
has a command to consume a message from a data source.
4.9.2.4: Events
Events are operations on entities and form the business use cases. Events are usually handled by
services. For example, related to order entities, there could be the following events: create an order,
update an order, and cancel an order. These events can be implemented by having an order service with
the following methods: create_order, update_order and cancel_order.
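A minimal sketch of such a service (the repository interface and method bodies are illustrative):

class OrderService:
    def __init__(self, order_repository: 'OrderRepository'):
        self.__order_repository = order_repository

    def create_order(self, order: 'Order') -> None:
        # Persist the new order using the related repository
        self.__order_repository.save(order)

    def update_order(self, order: 'Order') -> None:
        self.__order_repository.save(order)

    def cancel_order(self, order_id: str) -> None:
        self.__order_repository.delete(order_id)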
Event storming is a lightweight method that a team can conduct to discover DDD-related concepts
in a software component. The event storming process typically follows the below steps:
1) Figure out domain events (events are usually written in past tense)
2) Figure out commands that caused the domain events
3) Add actors/services that execute the commands
4) Figure out related entities (including aggregates and root aggregates) and value objects
In event storming, the different DDD concepts like events, commands, actors and entities are
represented with sticky notes of different colors on a wall. Related sticky notes are grouped together,
like the actor(s), command(s) and entity/entities for a specific domain event.
Data exporter handles data that consists of messages that contain multiple fields. Data
exporting should happen from an input system to an output system. During the export,
various transformations to the data can be made, and the data formats in the input and
output systems can differ.
Let’s start the event storming process by figuring out the domain events:
Let's take the first domain event "Messages are consumed from the input system" and figure out what
caused the event and what the actor was. Because there is no end-user involved, we can conclude
that the event was caused by an "input message consumer" service executing a "consume message"
command. This operation results in the creation of an "input message" entity. Below is a picture of how
this would look as sticky notes on the wall.
When continuing the event storming process further for the Input domain, we can figure out that it
consists of the following additional DDD concepts:
• Commands
• Actors/Services
• Entities
– Input message
• Value Objects
– Input configuration
Below is the list of sub-domains, interfaces and classes in the Input domain:
• Input message
– Consumes messages from the input data source and creates InputMessage instances
– InputMessageConsumer is a protocol that can have several concrete implementations, like
KafkaInputMessageConsumer for consuming messages from a Kafka data source
• Input configuration
– InputConfig instance contains parsed configuration for the domain, like the input data
source type, host, port, and input data format.
When considering the Internal Message domain in more detail, we can figure out that it consists of
the following additional DDD concepts when using the event storming process:
• Entities
– Internal message
– Internal field
• Aggregate
– Internal message
• Internal Message
When considering the Transformer domain in more detail, we can figure out that it consists of the
following additional DDD concepts when using the event storming process:
• Commands
• Actors/Services
• Value objects
– Transformer configuration
• Field transformer
• Message Transformer
• Transformer configuration
When considering the Output domain in more detail, we can figure out that it consists of the following
additional DDD concepts when using the event storming process:
• Commands
• Actors/Services
• Entities
– Output message
• Value objects
– Output configuration
• Output message
• Output configuration
The above design also follows the clean microservice design principle. Note that this principle is
applicable to other kinds of microservices, too, not just API microservices. From the above design, we
can identify the interface adapters that are not part of the business logic of the microservice.
We should be able to modify the above-mentioned implementations or add a new implementation
without any modification to other parts of the code (the business logic). What this all means is that we
can easily adapt our microservice to consume data from different data sources in different data formats
and output the transformed data to different data sources in various data formats. Additionally, the
configuration of our microservice can be read from various sources and in various formats. For example,
if we now read the microservice configuration from a local file in JSON format, in the future we could
introduce two new classes and read the microservice configuration from an API using some new data
format.
After defining the interfaces between the above-defined subdomains, the four subdomains can be
developed very much in parallel. This can speed up the microservice development significantly. The
code of each subdomain should be put into a separate source code folder. We will discuss source
code organization more in the next chapter.
If you combine the above design diagrams, they form a data processing pipeline that can be
implemented in the following way:
class DataExporterApp:
def run(self) -> None:
while self.__is_running:
input_msg = self.__input_msg_consumer.consume_input_msg()
internal_msg = self.__input_msg_decoder.decode(input_msg)
transformed_msg = self.__msg_transformer.transform(internal_msg)
output_msg = self.__output_msg_encoder.encode(transformed_msg)
self.__output_msg_producer.produce(output_msg)
And the transform method of the MessageTransformer class can be implemented in the following way:
class MessageTransformer:
def transform(self, internal_msg: InternalMessage) -> InternalMessage:
transformed_msg = InternalMessage()
for field_transformer in self.__field_transformers:
field_transformer.transform_field(
internal_msg, transformed_msg
)
return transformed_msg
• Anomaly
• Measurement
Let’s first analyze the Measurement subdomain in more detail and define domain events for it:
Let’s continue using the event storming and define the additional DDD concepts:
• Commands
• Entities
• Aggregates
– Measurement
• Value Objects
– Measurement data
– Measurement query
• Anomalies are detected in a measurement according to anomaly detection rule using a trained
anomaly model
• Anomaly detection is triggered at regular intervals
• Anomaly model is trained for a measurement
• Anomaly model is created
• Anomaly model training is triggered at regular intervals
• A detected anomaly (i.e. an anomaly indicator) is created
• A detected anomaly (i.e. an anomaly indicator) is serialized to a wanted format, e.g. JSON
• The detected anomaly (i.e. an anomaly indicator) is published to a specific destination using a
specific protocol
Let’s continue with the event storming and define the additional DDD concepts:
• Commands
• Actors/Services
• Factories
• Entities
The two domains, anomaly and measurement, can be developed in parallel. The anomaly domain
interfaces with the measurement domain to fetch data for a particular measurement from a particular
data source. The development effort of both the anomaly and measurement domains can be further
split to achieve even more development parallelization. For example, one developer could work with
anomaly detection, another with anomaly model training, and the third with anomaly indicators.
• Factory pattern
• Abstract factory pattern
• Factory method pattern
• Builder pattern
• Singleton pattern
• Prototype pattern
• Object pool pattern
The factory pattern allows deferring the decision of what kind of object will be created to
the point where the factory's create method is called.
A factory typically consists of one or several methods for creating objects of a particular base
type. A factory separates the logic of creating objects from the objects themselves, which is in line
with the single responsibility principle.
Below is an example ConfigParserFactory that has a single create method for creating different kinds
of ConfigParser objects. The return type of the factory’s create-method is usually an interface. This
allows different kinds of objects in a specific class hierarchy to be created. In the case of a factory with
a single create-method, the method usually contains a match-case statement or an if/elif structure.
Factories are the only place where extensive match-case statements or if/elif structures are allowed
in object-oriented programming. If you have a lengthy match-case statement or long if/elif structure
somewhere else in code, that is typically a sign of a non-object-oriented design.
class ConfigParser(Protocol):
# ...
class JsonConfigParser(ConfigParser):
# ...
class YamlConfigParser(ConfigParser):
# ...
class ConfigFormat(Enum):
JSON = 1
YAML = 2
class ConfigParserFactory:
@staticmethod
def create_config_parser(config_format: ConfigFormat) -> ConfigParser:
match config_format:
case ConfigFormat.JSON:
return JsonConfigParser()
case ConfigFormat.YAML:
return YamlConfigParser()
case _:
raise ValueError('Unsupported config format')
A factory can also have a separate create method for each kind of object it creates, as in the below
ShapeFactory:
class ShapeFactory:
@staticmethod
def create_circle_shape(radius: int) -> Shape:
return CircleShape(radius)
@staticmethod
def create_rectangle_shape(width: int, height: int) -> Shape:
return RectangleShape(width, height)
@staticmethod
def create_square_shape(side_length: int) -> Shape:
return SquareShape(side_length)
The abstract factory pattern is an extension of the earlier described factory pattern. Usually, the
abstract factory pattern should be used instead of the plain factory pattern. Below is an example of
an abstract ConfigParserFactory with one concrete implementation:
class ConfigParserFactory(Protocol):
def create_config_parser(self, config_format: ConfigFormat) -> ConfigParser:
pass
class ConfigParserFactoryImpl(ConfigParserFactory):
def create_config_parser(self, config_format: ConfigFormat) -> ConfigParser:
match config_format:
case ConfigFormat.JSON:
return JsonConfigParser()
case ConfigFormat.YAML:
return YamlConfigParser()
case _:
raise ValueError('Unsupported config format')
You should follow the program against interfaces principle and use the abstract ConfigParserFactory
in your code instead of a concrete factory. Then using the dependency injection principle, you can
inject the wanted factory implementation, like ConfigParserFactoryImpl.
When unit testing code, you want a factory to create mock objects instead of real objects.
The abstract factory pattern comes to your help because you can supply a mock instance of the
ConfigParserFactory in the tested code. Then you can expect the mocked create_config_parser
method to be called and return a mock instance conforming to the ConfigParser protocol. And
then, you can expect the parse method to be called on the ConfigParser mock and return a mocked
configuration. Below is an example unit test. We test the initialize method in the Application class
containing a ConfigParserFactory type attribute. The Application class uses the ConfigParserFactory
instance to create a ConfigParser object to parse the application configuration.
class Config(Protocol):
# ...
class ConfigParser(Protocol):
def parse(self) -> Config:
pass
# ...
class ConfigParserFactory(Protocol):
def create_config_parser(self) -> ConfigParser:
pass
# ...
class Application:
    def __init__(self, config_parser_factory: ConfigParserFactory):
        self.__config_parser_factory = config_parser_factory
        self.__config: Config | None = None

    def initialize(self) -> None:
        # Create a config parser using the factory and parse the configuration
        config_parser = self.__config_parser_factory.create_config_parser()
        self.__config = config_parser.parse()

    @property
    def config(self):
        return self.__config
class ConfigParserFactoryMock(ConfigParserFactory):
pass
class ConfigParserMock(ConfigParser):
pass
class ConfigMock(Config):
pass
class ApplicationTests(TestCase):
def test_initialize(self):
# GIVEN
config_parser_factory_mock = ConfigParserFactoryMock()
config_parser_mock = ConfigParserMock()
config_parser_factory_mock.create_config_parser = Mock(
return_value=config_parser_mock
)
config_mock = ConfigMock()
config_parser_mock.parse = Mock(return_value=config_mock)
application = Application(config_parser_factory_mock)
# WHEN
application.initialize()
# THEN
self.assertEqual(application.config, config_mock)
if __name__ == '__main__':
main()
In the above example, we created mocks manually. The Mock constructor creates a mock object that is
also callable (i.e. it is also a mock function). In the Mock constructor, you can supply the return value
the mock should return when it is called. It is also possible to create mocks automatically using the
@patch decorator, which can make the code less verbose, as shown below:
class ApplicationTests(TestCase):
@patch.object(ConfigParserFactory, '__new__')
@patch.object(ConfigParser, '__new__')
@patch.object(Config, '__new__')
def test_initialize(
self,
config_mock: Mock,
config_parser_mock: Mock,
config_parser_factory_mock: Mock,
):
# GIVEN
config_parser_factory_mock.create_config_parser.return_value = (
config_parser_mock
)
config_parser_mock.parse.return_value = config_mock
application = Application(config_parser_factory_mock)
# WHEN
application.initialize()
# THEN
self.assertEqual(application.config, config_mock)
if __name__ == '__main__':
main()
Unit testing and mocking are better described later in the testing principles chapter.
In the factory method pattern, objects are created using one or more factory
methods in a class, and the class constructor is made private. The factory
methods are usually class methods.
If you want to validate parameters in a constructor, the constructor may raise an error. You cannot
return an error value from a constructor. Creating constructors that cannot raise is recommended
because it is relatively easy to forget to catch errors raised in a constructor if nothing in the constructor
signature tells that it can raise an error. See the next chapter for a discussion about the error/exception
handling principle.
class Url:
def __init__(
self,
scheme: str,
port: int,
host: str,
path: str,
query: str
):
# Validate the arguments and raise an error if invalid
You can use the factory method pattern to overcome the problem of raising an error in a constructor.
You can make a factory method to return an optional value (if you don’t need to return an error
cause) or make the factory method raise an error. We can add a try prefix to the factory method name
to signify that it can raise an error. Then, the function signature (function name) communicates to
readers that the function may raise an error.
Below is an example class with two factory methods. The constructor of the class is made private
using a PrivateConstructor metaclass. Users of the class can only create instances of the class by
using a factory method.
T = TypeVar("T")
class PrivateConstructor(type):
def __call__(
cls: type[T],
*args: tuple[Any, ...],
**kwargs: dict[str, Any]
):
raise TypeError('Constructor is private')
def _create(
cls: type[T],
*args: tuple[Any, ...],
**kwargs: dict[str, Any]
) -> T:
return super().__call__(*args, **kwargs)
class Url(metaclass=PrivateConstructor):
def __init__(
self,
scheme: str,
port: int,
host: str,
path: str,
query: str
):
# ...
@classmethod
def create_url(
cls,
scheme: str,
port: int,
host: str,
path: str,
query: str
) -> 'Url | None':
# Validate the arguments and return 'None' if invalid
# If valid return a 'Url' instance:
# return cls._create(scheme, port, host, path, query)
@classmethod
def try_create_url(
cls,
scheme: str,
port: int,
host: str,
path: str,
query: str
) -> 'Url':
# Validate the arguments and raise an error if invalid
# If valid return a 'Url' instance:
# return cls._create(scheme, port, host, path, query)
Returning an optional value from a factory method allows utilizing functional programming techniques.
Python does not have an optional class, but let's first define an Optional class:
T = TypeVar('T')
U = TypeVar('U')

class Optional(Generic[T], metaclass=PrivateConstructor):
    def __init__(self, value: T | None):
        self.__value = value

    @classmethod
    def of(cls, value: T) -> 'Optional[T]':
        return cls._create(value)

    @classmethod
    def of_nullable(cls, value: T | None) -> 'Optional[T]':
        return cls._create(value)

    @classmethod
    def empty(cls) -> 'Optional[T]':
        return cls._create(None)
NOTE! When I use the Optional class in this book, it is always the above-defined class, not the Optional
from Python's typing module.
Notice how the above Optional class utilizes the factory method pattern. It has a private constructor
and three factory methods to create different kinds of Optional objects. The benefit is that you can
name the factory methods descriptively, which you cannot do with a single constructor. The name
of the factory method tells what kind of object will be created.
class Url(metaclass=PrivateConstructor):
def __init__(
self,
scheme: str,
port: int,
host: str,
path: str,
query: str
):
# ...
@classmethod
def create_url(
cls,
scheme: str,
host: str,
port: int,
path: str,
query: str
) -> Optional['Url']:
# ...
maybe_url = Url.create_url(...)
In the builder pattern, you add properties to the built object with the add_xxx methods of the builder
class. After adding all the needed properties, you build the final object using the build or build_xxx
method of the builder class.
For example, you can construct a URL from the parts of the URL. Below is an example of using a
UrlBuilder class:
url = UrlBuilder().add_scheme('https').add_host('www.google.com').build_url()
The builder pattern has the benefit that the properties given to the builder can be validated in the build
method. You can make the builder's build method return an optional indicating whether the building
was successful. Or, you can make the build method raise an error if you need to communicate an error
cause. In that case, you should name the build method using a try prefix, for example, try_build_url.
The builder pattern also has the benefit that properties with default values need not be given to the
builder. For example, https could be the default scheme, and if you are building an HTTPS URL,
add_scheme does not need to be called. The only problem is that you must consult the builder
documentation to determine the default values.
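Below is a minimal sketch of what the two build method variants could look like (the validation helper
and the URL construction details are illustrative):

class UrlBuilder:
    # ... add_xxx methods ...

    def build_url(self) -> 'Url | None':
        # Return None if the added properties do not form a valid URL
        if not self.__is_valid():
            return None
        return Url(...)

    def try_build_url(self) -> 'Url':
        # Raise an error if the added properties do not form a valid URL
        if not self.__is_valid():
            raise ValueError('Invalid URL')
        return Url(...)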
One drawback with the builder pattern is that you can give the parameters logically in the wrong
order like this:
url = UrlBuilder().add_host('www.google.com').add_scheme('https').build_url()
It works but does not look so nice. So if you are using a builder, always try to give the parameters
for the builder in a logically correct order if such order exists. The builder pattern works well when
there isn’t any inherent order among the parameters. Below is an example of such a case: A house
built with a HouseBuilder class.
house = HouseBuilder()\
.add_kitchen()\
.add_living_room()\
.add_bedrooms(3)\
.add_bath_rooms(2)\
.add_garage()\
.build_house()
You can achieve functionality similar to a builder with a factory method with parameters with default
values:
class Url(metaclass=PrivateConstructor):
def __init__(
self,
host: str,
path: str,
query: str,
scheme: str = 'https',
port: int = 443,
):
# ...
@classmethod
def create_url(
cls,
host: str,
path: str,
query: str,
scheme: str = 'https',
port: int = 443,
) -> 'Url | None':
# ...
In the factory method above, there is clear visibility of what the default values are. Of course,
you cannot now give the parameters in a logical order. There is also a greater possibility that you
accidentally provide some parameters in the wrong order because many of them are of the same type
(string). This won’t be a potential issue with a builder where you use a method with a specific name
to give a specific parameter. In modern development environments, giving parameters in the wrong
order is less probable because IDEs offer inlay parameter hints. It is easy to see if you provide a
particular parameter in the wrong position. As shown below, giving parameters in the wrong order
can also be avoided using semantically validated function parameter types. Semantically validated
function parameters will be discussed later in this chapter.
class Url(metaclass=PrivateConstructor):
# ...
@classmethod
def create_url(
cls,
host: str,
path: str,
query: str,
scheme: Scheme = Scheme.create('https'),
port: Port = Port.create(443),
) -> 'Url | None':
# ...
Yet another option is to group the parameters into a parameter object (a dataclass), which removes
the long list of same-typed parameters altogether:
@dataclass
class UrlParams:
host: str
scheme: str = 'https'
port: int = 443
path: str = ""
query: str = ""
class Url(metaclass=PrivateConstructor):
def __init__(self, url_params: UrlParams):
# ...
@classmethod
def create_url(cls, url_params: UrlParams) -> 'Optional[Url]':
# ...
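Assuming the above classes, creating a URL could then look like this (the parameter values are
illustrative):

url_params = UrlParams(host='www.google.com', path='/search', query='q=clean+code')
maybe_url = Url.create_url(url_params)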
Singleton pattern defines that a class can have only one instance.
Singletons are very common in pure object-oriented languages like Java. In many cases, a singleton
class can be identified as not having any state. And this is why only one instance of the class is needed.
There is no point in creating multiple instances that are the same. In some non-pure object-oriented
languages, singletons are not necessarily as common as in pure object-oriented languages and can
often be replaced by just defining functions.
In Python, a singleton instance can be created in a module and exported. When you import the
instance from the module in other modules, the other modules will always get the same exported
instance, not a new instance every time. Below is an example of such a singleton. First we define a
singleton in a module named my_class_singleton.py:
class MyClass:
# ...
my_class_singleton = MyClass()
And in other_module_1.py:

from my_class_singleton import my_class_singleton

print(my_class_singleton)

And in other_module_2.py:

import other_module_1
from my_class_singleton import my_class_singleton

print(my_class_singleton)

When you run other_module_2, you should get output where the two printed object addresses are the
same, meaning that my_class_singleton really is a singleton.
The singleton pattern can be implemented using a class with static methods only. The problem
with a static class is that the singleton class is then hardcoded, and static classes can be hard
or impossible to mock in unit testing. We should remember to program against interfaces. The
best way to implement the singleton pattern is by using the dependency inversion principle and
the dependency injection principle. Below is an example using the dependency-injector library for
handling dependency injection. The constructor of the FileConfigReader class expects a ConfigParser
instance. We decorate the constructor with the @inject decorator and provide a ConfigParser
instance with the name config_parser from the DI container (defined later):
class ConfigReader(Protocol):
def try_read(self, config_location: str) -> Config:
pass
class FileConfigReader(ConfigReader):
@inject
def __init__(
self,
config_parser: ConfigParser = Provide['config_parser']
):
self.__config_parser = config_parser
In the below DiContainer class, we first configure wiring and then the name config_parser is bound
to a singleton instance of ConfigParserImpl class. (The ConfigParserImpl class code is not shown
here). The wiring_config expects that the FileConfigReader class is defined in a module named
FileConfigReader.py.
class DiContainer(containers.DeclarativeContainer):
wiring_config = containers.WiringConfiguration(
modules=['FileConfigReader']
)
config_parser = providers.Singleton(ConfigParserImpl)
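A sketch of using the container (assuming the dependency-injector library, which applies the wiring
configuration when the container is instantiated):

di_container = DiContainer()
config_reader_1 = FileConfigReader()
config_reader_2 = FileConfigReader()
# Both FileConfigReader instances received the same ConfigParserImpl
# instance, because 'config_parser' is registered as a singleton provider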
The prototype pattern lets you create a new object using an existing object as a
prototype.
class Shape(Protocol):
# ...
class Position(Protocol):
def get_x(self) -> int:
pass
class DrawnShape:
    def __init__(self, position: Position, shape: Shape):
        self.__position = position
        self.__shape = shape

    def clone_to(self, position: Position) -> 'DrawnShape':
        # Create a new shape using this object as the prototype
        return DrawnShape(position, self.__shape)
To use the prototype pattern, you call the clone_to method on a prototype object and give a position
parameter to specify where the new shape should be positioned.
The prototype pattern is also used in JavaScript to implement prototypal inheritance. Since
ECMAScript version 6, class-based inheritance has been available, and prototypal inheritance no
longer needs to be used directly.
The idea of prototypal inheritance is that the parts common to objects of the same class are stored in
a prototype instance. These common parts are typically the shared methods. There is no sense
in storing the methods multiple times in each object; that would be a waste of resources because
JavaScript functions are objects themselves.
When you create a new object with the Object.create method, you give the prototype as a parameter.
After that, you can set properties for the newly created object. When you call a method on the
created object, and that method is not found in the object's own properties, the method is looked up
in the prototype object. Prototypes can be chained so that a prototype object contains another
prototype object. This chaining is used to implement an inheritance chain. Below is a simple example
of prototypal inheritance:
const pet = {
  name: '',
  getName: function() { return this.name; }
};

const petNamedBella = Object.create(pet);
petNamedBella.name = 'Bella';
console.log(petNamedBella.getName()); // Prints 'Bella'

const dog = Object.create(pet);
dog.bark = function() { console.log('bark'); };

const dogNamedLuna = Object.create(dog);
dogNamedLuna.name = 'Luna';
console.log(dogNamedLuna.getName()); // Prints 'Luna'
dogNamedLuna.bark(); // Prints 'bark'
In the object pool pattern, created objects are stored in a pool where objects
can be acquired from and returned for reuse. The object pool pattern is an
optimization pattern because it allows the reuse of created objects.
If you need to create many short-lived objects, you should utilize an object pool and reduce the need
for memory allocation and de-allocation, which takes time. In garbage-collected languages, frequent
object creation and deletion cause extra work for the garbage collector, which consumes CPU time.
Below is an example object pool protocol.
T = TypeVar('T')

class ObjectPool(Protocol[T]):
    def acquire_object(self, cls: type[T]) -> T:
        pass

    def return_object(self, object_: T) -> None:
        pass

class LimitedSizeObjPool(ObjectPool[T]):
    def __init__(self, max_pool_size: int):
        self.__max_pool_size = max_pool_size
        self.__pooled_objects = []

    def acquire_object(self, cls: type[T]) -> T:
        # Reuse a pooled object if available, otherwise create a new one
        if self.__pooled_objects:
            return self.__pooled_objects.pop()
        return cls()

    def return_object(self, object_: T) -> None:
        pool_is_not_full = len(self.__pooled_objects) < self.__max_pool_size
        if pool_is_not_full:
            self.__pooled_objects.append(object_)

class MyObject:
    # ...
Below is a slightly different implementation of an object pool. The below implementation accepts
clearable objects, meaning objects returned to the pool are cleared before reusing. You can also supply
parameters used when constructing an object.
class Clearable(Protocol):
    def clear(self) -> None:
        pass

T = TypeVar('T', bound=Clearable)

class LimitedSizeObjPool(ObjectPool[T]):
    def __init__(self, max_pool_size: int, *args, **kwargs):
        self.__max_pool_size = max_pool_size
        self.__args = args
        self.__kwargs = kwargs
        self.__pooled_objects = []

    def acquire_object(self, cls: type[T]) -> T:
        if self.__pooled_objects:
            return self.__pooled_objects.pop()
        # Construct a new object using the supplied parameters
        return cls(*self.__args, **self.__kwargs)

    def return_object(self, object_: T) -> None:
        pool_is_not_full = len(self.__pooled_objects) < self.__max_pool_size
        if pool_is_not_full:
            # Clear the returned object before pooling it for reuse
            object_.clear()
            self.__pooled_objects.append(object_)

class MyObject(Clearable):
    def __init__(self, param1: int, param2: str, **kwargs):
        print(param1, param2, kwargs)

    def clear(self) -> None:
        print('Cleared')

# Example usage (the constructor parameter values are illustrative)
my_object_pool = LimitedSizeObjPool(10, 1, 'value', param3=True)
my_object_1 = my_object_pool.acquire_object(MyObject)
# Prints: 1 value {'param3': True}
my_object_pool.return_object(my_object_1)
# Prints: Cleared
• Composite pattern
• Facade pattern
• Bridge pattern
• Strategy pattern
• Adapter pattern
• Proxy pattern
• Decorator pattern
• Flyweight pattern
In the composite pattern, a class can be composed of itself, i.e., the composition
is recursive.
Recursive object composition can be depicted by how a user interface can be composed of different
widgets. In the example below, we have a Pane class that is a Widget. A Pane object can contain several
other Widget objects, meaning a Pane object can contain other Pane objects.
class Widget(Protocol):
def render(self) -> None:
pass
class Pane(Widget):
    def __init__(self, widgets: list[Widget]):
        self.__widgets = widgets

    def render(self) -> None:
        # Render the pane by rendering its child widgets
        for widget in self.__widgets:
            widget.render()
class StaticText(Widget):
def render(self) -> None:
# Render static text widget
class TextInput(Widget):
def render(self) -> None:
# Render text input widget
class Button(Widget):
def render(self) -> None:
# Render button widget
class UiWindow:
def __init__(self, widgets: list[Widget]):
self.__widgets = widgets
Objects that form a tree structure are composed of themselves recursively. Below is an Avro record
field schema with a nested record field:
{
"type": "record",
"name": "sampleMessage",
"fields": [
{
"name": "field1",
"type": "string"
},
{
"name": "nestedRecordField",
"namespace": "nestedRecordField",
"type": "record",
"fields": [
{
"name": "nestedField1",
"type": "int",
"signed": "false"
}
]
}
]
}
For parsing an Avro schema, we could define classes for different sub-schemas by the field type. When
analyzing the below example, we can notice that the RecordAvroFieldSchema class can contain any
AvroFieldSchema object, also other RecordAvroFieldSchema objects, making a RecordAvroFieldSchema
object a composite object.
class AvroFieldSchema(Protocol):
# ...
class RecordAvroFieldSchema(AvroFieldSchema):
def __init__(self, sub_field_schemas: list[AvroFieldSchema]):
self.__sub_field_schemas = sub_field_schemas
class StringAvroFieldSchema(AvroFieldSchema):
# ...
class IntAvroFieldSchema(AvroFieldSchema):
# ...
Let’s use the data exporter microservice as an example. For that microservice, we could create a
Config interface that can be used to obtain configuration for the different parts (input, transformer,
and output) of the data exporter microservice. The Config interface acts as a facade. Users of the
facade need not see behind the facade. They don’t know what happens behind the facade. And they
shouldn’t care because they are just using the interface provided by the facade.
There can be various classes doing the actual work behind the facade. In the below example, there
is a ConfigReader that reads configuration from possibly different sources (from a local file or a
remote service, for example) and there are configuration parsers that can parse a specific part of the
configuration, possibly in different data formats like JSON or YAML. None of these implementations
and details are visible to the user of the facade. Any of these implementations behind the facade can
change at any time without affecting the users of the facade because facade users are not coupled to
the lower-level implementations.
Below is the implementation of the Config facade:
class Config(Protocol):
def try_get_input_config(self) -> InputConfig:
pass
class ConfigImpl(Config):
@inject
def __init__(
self,
config_reader: ConfigReader = Provide['config_reader'],
input_config_parser: InputConfigParser = Provide[
'input_config_parser'
],
transformer_config_parser: TransformerConfigParser = Provide[
'transformer_config_parser'
],
output_config_parser: OutputConfigParser = Provide[
'output_config_parser'
]
):
self.__config_reader = config_reader
self.__input_config_parser = input_config_parser
self.__transformer_config_parser = transformer_config_parser
self.__output_config_parser = output_config_parser
self.__config_string = ""
self.__input_config = None
self.__output_config = None
self.__transformer_config = None
There is an additional option available if the above facade is implemented in Java: only the Config
interface and the ConfigImpl class could be made public, and all the configuration reading and parsing
related interfaces and classes could be package-private. This would make the usage of the facade
mandatory. No one else except the ConfigImpl class could use the lower-level implementation
classes related to configuration reading and parsing.
In the bridge pattern, an abstraction class delegates its functionality to a separate implementation
class so that the two can vary independently. Don't confuse the word "abstract" here with an abstract
class. In an abstract class, some behavior is not implemented at all, but the implementation is deferred
to subclasses of the abstract class. Here, instead of the term "abstraction class", we could use the term
delegating class.
Let’s have an example with shapes and drawings capable of drawing different shapes:
class Shape(Protocol):
def render(self, renderer: ShapeRenderer) -> None:
pass
class RectangleShape(Shape):
    def __init__(
        self,
        upper_left_corner: Point,
        width: int,
        height: int
    ):
        self.__upper_left_corner = upper_left_corner
        self.__width = width
        self.__height = height

    def render(self, renderer: ShapeRenderer) -> None:
        # Delegate the rendering to the given renderer
        renderer.render_rectangle(
            self.__upper_left_corner, self.__width, self.__height
        )

class CircleShape(Shape):
    def __init__(self, center: Point, radius: int):
        self.__center = center
        self.__radius = radius

    def render(self, renderer: ShapeRenderer) -> None:
        renderer.render_circle(self.__center, self.__radius)
The above RectangleShape and CircleShape classes are abstractions because they delegate their
functionality (rendering) to an external class (implementation class) of the ShapeRenderer type.
We can provide different rendering implementations for the shape classes. Let’s define two shape
renderers, one for rendering raster shapes and another for rendering vector shapes:
class ShapeRenderer(Protocol):
def render_circle(self, center: Point, radius: int) -> None:
pass
def render_rectangle(
self,
upper_left_corner: Point,
width: int,
height: int
) -> None:
pass
class RasterShapeRenderer(ShapeRenderer):
def __init__(self, canvas: Canvas):
self.__canvas = canvas
def render_rectangle(
self,
upper_left_corner: Point,
width: int,
height: int
):
# Render rectangle to canvas
class VectorShapeRenderer(ShapeRenderer):
def __init__(self, svg_root: SvgElement):
self.__svg_root = svg_root
def render_rectangle(
self,
upper_left_corner: Point,
width: int,
height: int
):
# Render rectangle as SVG element
# and attach as child to SVG root
class Drawing(Protocol):
def get_shape_renderer(self) -> ShapeRenderer:
pass
class AbstractDrawing(Drawing):
def __init__(self, name: str):
self.__name = name
@abstractmethod
def get_shape_renderer(self) -> ShapeRenderer:
pass
@abstractmethod
def get_file_extension(self) -> str:
pass
@abstractmethod
def get_data(self) -> bytearray:
pass
class RasterDrawing(AbstractDrawing):
def __init__(self, name: str):
super().__init__(name)
self.__canvas = Canvas()
self.__shape_renderer = RasterShapeRenderer(self.__canvas)
class VectorDrawing(AbstractDrawing):
def __init__(self, name: str):
super().__init__(name)
self.__svg_root = SvgElement()
self.__shape_renderer = VectorShapeRenderer(self.__svg_root)
In the above example, we have delegated the rendering behavior of the shape classes to concrete
classes implementing the ShapeRenderer protocol. The Shape classes only represent a shape but don’t
render the shape. They have a single responsibility of representing a shape. Regarding rendering, the
shape classes are “abstractions” because they delegate the rendering to other classes responsible for
rendering different shapes.
Now we can have a list of shapes and render them differently. We can do this as shown below because
we did not couple the shape classes with any specific rendering behavior.
shapes = [RectangleShape(Point(), 2, 3), CircleShape(Point(), 4)]
raster_drawing = RasterDrawing('raster-drawing')
raster_drawing.draw(shapes)
raster_drawing.save()
vector_drawing = VectorDrawing('vector-drawing')
vector_drawing.draw(shapes)
vector_drawing.save()
In the strategy pattern, the behavior of a class is changed by supplying it with a different strategy
object. Below is an example where the behavior of a ConfigReader class can be changed by changing
the value of the config_parser field to an instance of a different class. The default behavior is to parse
the configuration in JSON format, which can be achieved by calling the constructor without a parameter.
class ConfigParser(Protocol):
def try_parse(self, config_str: str) -> Config:
pass
class ConfigReader:
    def __init__(self, config_parser: ConfigParser = JsonConfigParser()):
        self.__config_parser = config_parser

    def try_read(self, config_location: str) -> Config:
        # Read the raw configuration string from config_location (details elided)
        config_str = ...
        config = self.__config_parser.try_parse(config_str)
        return config
Using the strategy pattern, we can change the functionality of a ConfigReader instance by changing the
config_parser field value. For example, there could be the following classes available that implement
the ConfigParser protocol:
• JsonConfigParser
• YamlConfigParser
• TomlConfigParser
We can dynamically change the behavior of a ConfigReader instance to use the YAML parsing strategy
by giving an instance of the YamlConfigParser class as a parameter for the ConfigReader constructor.
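For example (a minimal sketch):

json_config_reader = ConfigReader()                    # JSON parsing strategy (the default)
yaml_config_reader = ConfigReader(YamlConfigParser())  # YAML parsing strategy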
The adapter pattern converts one interface into another. It allows you to
adapt different interfaces to a single common interface.
In the below example, we have defined a Message protocol for messages that can be consumed from a
data source using a MessageConsumer.
class Message(Protocol):
def get_data(self) -> bytearray:
pass
class MessageConsumer(Protocol):
def consume_message(self) -> Message:
pass
Next, we can define the message and message consumer adapter classes for Apache Kafka and Apache
Pulsar:
Figure 4.26. KafkaMsgConsumer.py
from MessageConsumer import MessageConsumer
class KafkaMsgConsumer(MessageConsumer):
def consume_message(self) -> Message:
# Consume a message from Kafka using a 3rd party
# Kafka library
# Wrap the consumed message inside an instance
# of KafkaMessage class
# Return the KafkaMessage instance
class KafkaMessage(Message):
def __init__(self, kafka_lib_msg):
self.__kafka_lib_msg = kafka_lib_msg
class PulsarMsgConsumer(MessageConsumer):
def consume_message(self) -> Message:
# Consume a message from Pulsar using the Pulsar client
# Wrap the consumed Pulsar message inside an instance
# of PulsarMessage
# Return the PulsarMessage instance
class PulsarMessage(Message):
# ...
Now we can use Kafka or Pulsar data sources with identical consumer and message interfaces. In the
future, it will be easy to integrate a new data source into the system. We only need to implement
appropriate adapter classes (message and consumer classes) for the new data source. No other code
changes are required. Thus, we would be following the open-closed principle correctly.
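For illustration, a hedged sketch of how the consumers can be used interchangeably (the process_data function is a hypothetical placeholder for the actual processing logic):
def process_next_message(message_consumer: MessageConsumer) -> None:
    message = message_consumer.consume_message()
    # process_data is a hypothetical placeholder, not part of the original example
    process_data(message.get_data())

process_next_message(KafkaMsgConsumer())
process_next_message(PulsarMsgConsumer())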
Let’s imagine that the API of the used Kafka library changes. We don’t need to make changes in
many places in the code: we only need to create new adapter classes (message and consumer classes) for
the new API and use them in place of the old adapter classes. All this work again follows the open-closed
principle.
Consider using the adapter pattern even if there is nothing to adapt yet, especially when working with
3rd party libraries, because then you are prepared for future changes. A 3rd party library’s interface
may change, or you may need to take a different library into use. If you have not used the adapter
pattern, taking a new library or library version into use could mean making many small changes in
several places in the codebase, which is error-prone and against the open-closed principle.
Let’s have an example of using a 3rd party logging library. Initially, our adapter for the abc-logging-
library is just a wrapper around the abc_logger instance from the library; no actual adapting is done
yet.
class Logger(Protocol):
def log(self, log_level: LogLevel, message: str) -> None:
pass
class AbcLogger(Logger):
    # abc_logger has been imported from the abc-logging-library
    def log(self, log_level: LogLevel, message: str) -> None:
        abc_logger.log(log_level, message)
Suppose that in the future, a better logging library called xyz-logging-library becomes available, and we
would like to take it into use, but it has a slightly different interface. Its logging instance is called
xyz_log_writer, the logging method is named differently, and the parameters are given in a different
order than in the abc-logging-library. We can create an adapter for the new logging library, and
no other code changes are required elsewhere in the codebase:
Figure 4.32. XyzLogger.py
from xyz_logging_library import xyz_log_writer
from Logger import Logger
from LogLevel import LogLevel
class XyzLogger(Logger):
def log(self, log_level: LogLevel, message: str) -> None:
xyz_log_writer.write_log_entry(message, log_level)
We don’t have to modify all the places in the code where logging is used, and logging is typically used
in many places. We have saved ourselves from a lot of error-prone and unnecessary work, and once
again, we have followed the open-closed principle.
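To illustrate, a small sketch (the logger variable name is an assumption): call sites depend only on the Logger protocol, so only the place where the logger is created changes.
# Only this assignment changes when switching logging libraries
logger: Logger = XyzLogger()
logger.log(LogLevel.INFO, 'Application started')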
When using the proxy pattern, you define a proxy class that wraps another class (the proxied class).
The proxy class conditionally delegates to the wrapped class. The proxy class implements the interface
of the wrapped class and is used in place of the wrapped class in the code.
Below is an example of a proxy class, CachingEntityStore, that caches the results of entity store
operations:
TKey = TypeVar('TKey')
TValue = TypeVar('TValue')

class Cache(Protocol[TKey, TValue]):
    def store(
        self,
        key: TKey,
        value: TValue,
        time_to_live_in_secs: int = 0
    ) -> None:
        pass

class MemoryCache(Cache[TKey, TValue]):
    def store(
        self,
        key: TKey,
        value: TValue,
        time_to_live_in_secs: int = 0
    ) -> None:
        # ...
T = TypeVar('T')
class EntityStore(Protocol[T]):
async def try_get_entity(self, id_: int) -> Awaitable[T]:
pass
T = TypeVar('T')
class DbEntityStore(EntityStore[T]):
async def try_get_entity(self, id_: int) -> Awaitable[T]:
# Try get entity from database ...
T = TypeVar('T')
class CachingEntityStore(EntityStore[T]):
    __entity_cache: MemoryCache[int, T]

    async def try_get_entity(self, id_: int) -> Awaitable[T]:
        # Try to get the entity from the cache first ...
        if entity is None:
            entity = await self.__entity_store.try_get_entity(id_)
            time_to_live_in_secs = 60
            self.__entity_cache.store(id_, entity, time_to_live_in_secs)
        return entity
In the above example, the CachingEntityStore class is the proxy class wrapping an EntityStore. The
proxy class is modifying the wrapped class behavior by conditionally delegating to the wrapped class.
It delegates to the wrapped class only if an entity is not found in the cache.
Below is another example of a proxy class that authorizes a user before performing a service operation:
class UserService(Protocol):
class Error(Exception):
pass
class UserServiceImpl(UserService):
    async def try_get_user(self, id_: int) -> Awaitable[User]:
# Try get user by id ...
class AuthorizingUserService(UserService):
def __init__(
self,
user_service: UserService,
user_authorizer: UserAuthorizer
):
self.__user_service = user_service
self.__user_authorizer = user_authorizer
In the above example, the AuthorizingUserService class is a proxy class that wraps a UserService.
The proxy class is modifying the wrapped class behavior by conditionally delegating to the wrapped
class. It will delegate to the wrapped class only if authorization is successful.
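Below is a minimal sketch of the delegating method; the authorize method of the UserAuthorizer is an assumption, not part of the original example:
class AuthorizingUserService(UserService):
    # __init__ as above ...
    async def try_get_user(self, id_: int) -> Awaitable[User]:
        # Assumed API: authorize raises an error if authorization fails,
        # so delegation happens only for authorized requests
        self.__user_authorizer.authorize()
        return await self.__user_service.try_get_user(id_)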
A decorator class wraps another class whose functionality will be augmented. The decorator class
implements the interface of the wrapped class and is used in place of the wrapped class in the code.
The decorator pattern is useful when you cannot modify an existing class, e.g., the existing class is
in a 3rd party library. The decorator pattern also helps to follow the open-closed principle because
you don’t have to modify an existing method to augment its functionality. You can create a decorator
class that contains the new functionality.
Below is an example of the decorator pattern. There is a standard SQL statement executor
implementation and two decorated SQL statement executor implementations: one that adds logging
functionality and one that adds SQL statement execution timing functionality. Finally, a double-
decorated SQL statement executor is created that logs an SQL statement and times its execution.
import time
from collections.abc import Awaitable
from typing import Protocol, Any
class SqlStatementExecutor(Protocol):
async def try_execute(
self,
sql_statement: str,
parameter_values: list[Any] | None = None
) -> Awaitable[Any]:
pass
class SqlStatementExecutorImpl(SqlStatementExecutor):
# Implement __get_connection() ...
class LoggingSqlStatementExecutor(SqlStatementExecutor):
    def __init__(self, sql_statement_executor: SqlStatementExecutor):
        self.__sql_statement_executor = sql_statement_executor

    async def try_execute(
        self,
        sql_statement: str,
        parameter_values: list[Any] | None = None
    ) -> Awaitable[Any]:
        logger.log(
            LogLevel.DEBUG,
            f'Executing SQL statement: {sql_statement}'
        )
        return await self.__sql_statement_executor.try_execute(
            sql_statement, parameter_values
        )

class TimingSqlStatementExecutor(SqlStatementExecutor):
    def __init__(self, sql_statement_executor: SqlStatementExecutor):
        self.__sql_statement_executor = sql_statement_executor

    async def try_execute(
        self,
        sql_statement: str,
        parameter_values: list[Any] | None = None
    ) -> Awaitable[Any]:
        start_time_in_ns = time.time_ns()
        result = await self.__sql_statement_executor.try_execute(
            sql_statement, parameter_values
        )
        end_time_in_ns = time.time_ns()
        duration_in_ns = end_time_in_ns - start_time_in_ns
        duration_in_ms = duration_in_ns / 1_000_000
        logger.log(
            LogLevel.DEBUG,
            f'SQL statement execution duration: {duration_in_ms} ms'
        )
        return result
timing_and_logging_sql_statement_executor = LoggingSqlStatementExecutor(
TimingSqlStatementExecutor(SqlStatementExecutorImpl())
)
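A hedged usage sketch (the SQL statement and parameter value are placeholders, and the call is assumed to run inside an async function):
# Logs the statement, times its execution, and returns the query result
result = await timing_and_logging_sql_statement_executor.try_execute(
    'SELECT * FROM orders WHERE id = %s', [123]
)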
In Python, you can also use the decorator pattern with functions and methods. Python decorators allow
us to wrap a function in order to extend the behavior of the wrapped function without permanently
modifying it. Python decorators are functions that take a function as a parameter and return another
function that is used in place of the decorated function. Let’s have a very simple example of a function
decorator in Python:
# Decorator
def print_hello(func):
def wrapped_func(*args, **kwargs):
print('Hello')
return func(*args, **kwargs)
return wrapped_func
@print_hello
def add(a: int, b: int) -> int:
return a + b
result = add(1, 2)
print(result) # Prints: Hello 3
Let’s have another example with a decorator that times the execution time of a function and prints it
to the console:
import time
from functools import wraps
# Decorator
def timed(func):
@wraps(func)
def wrapped_func(*args, **kwargs):
start_time_in_ns = time.perf_counter_ns()
result = func(*args, **kwargs)
end_time_in_ns = time.perf_counter_ns()
duration_in_ns = end_time_in_ns - start_time_in_ns
print(
f'Exec of func "{func.__name__}" took {duration_in_ns} ns'
)
return result
return wrapped_func
@timed
def add(a: int, b: int) -> int:
return a + b
result = add(1, 2)
print(result)
# Prints, for example:
# Exec of func "add" took 625 ns
# 3
# Decorator
def logged(func):
@wraps(func)
def wrapped_func(*args, **kwargs):
result = func(*args, **kwargs)
# In real-life, you use a logger here instead of print
print(f'Func "{func.__name__}" executed')
return result
return wrapped_func
@logged
@timed
def add(a: int, b: int) -> int:
return a + b
result = add(1, 2)
print(result)
# Prints, for example:
# Exec of func "add" took 583 ns
# Func "add" executed
# 3
If you change the order of the decorators, the output appears in a different order. The reported
execution time of the function is also longer, because the time spent in logging is included in the total
execution time:
@timed
@logged
def add(a: int, b: int) -> int:
return a + b
result = add(1, 2)
print(result)
# Prints, for example:
# Func "add" executed
# Exec of func "add" took 9708 ns
# 3
You can also use decorator functions without the @-syntax to create new functions:
logged_add = logged(add)
timed_add = timed(add)
logged_timed_add = logged(timed(add))
timed_logged_add = timed(logged(add))
print(logged_add(1,2))
# Prints
# Func "add" executed
# 3
print(timed_add(1,2))
# Prints, for example
# Exec of func "add" took 209 ns
# 3
print(logged_timed_add(1, 2))
# Prints, for example
# Exec of func "add" took 208 ns
# Func "add" executed
# 3
print(timed_logged_add(1, 2))
# Prints, for example
# Func "add" executed
# Exec of func "add" took 1250 ns
# 3
Let’s have a simple example with a game where different shapes are drawn at different positions.
Let’s assume that the game draws a lot of similar shapes in different positions so that we can
notice the difference in memory consumption after applying the flyweight pattern.
Shapes that the game draws have the following properties: size, form, fill color, stroke color, stroke
width, and stroke style.
class Shape(Protocol):
# ...
# Define Color...
# Define StrokeStyle...
class AbstractShape(Shape):
def __init__(
self,
fill_color: Color,
stroke_color: Color,
stroke_width: int,
stroke_style: StrokeStyle
):
self.__fill_color = fill_color
self.__stroke_color = stroke_color
self.__stroke_width = stroke_width
self.__stroke_style = stroke_style
class CircleShape(AbstractShape):
def __init__(
self,
fill_color: Color,
stroke_color: Color,
stroke_width: int,
stroke_style: StrokeStyle,
radius: int
):
super().__init__(
fill_color,
stroke_color,
stroke_width,
stroke_style
)
self.__radius = radius
class PolygonShape(AbstractShape):
def __init__(
self,
fill_color: Color,
stroke_color: Color,
stroke_width: int,
stroke_style: StrokeStyle,
line_segments: list[LineSegment]
):
super().__init__(
fill_color,
stroke_color,
stroke_width,
stroke_style
)
self.__line_segments = line_segments
When analyzing the PolygonShape class, we can notice that it contains many properties that consume
memory. Especially a polygon that has many line segments can consume a noticeable amount of
memory. If the game draws many identical polygons in different screen positions and always creates
a new PolygonShape object, there would be a lot of identical PolygonShape objects in the memory. To
remediate this, we can introduce a flyweight class, DrawnShapeImpl, which contains the position of a
shape and a reference to the actual shape. In this way, we can draw a lot of DrawnShapeImpl objects
that all contain a reference to the same PolygonShape object:
class DrawnShape(Protocol):
# ...
class DrawnShapeImpl(DrawnShape):
def __init__(self, shape: Shape, screen_position: Position):
self.__shape = shape
self.__screen_position = screen_position
# ...
polygon = PolygonShape(...)
positions = generate_lots_of_positions()
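# One shared (flyweight) PolygonShape instance is referenced by many
# lightweight DrawnShapeImpl objects, one per screen position
# (generate_lots_of_positions is a placeholder for position generation)
drawn_shapes = [
    DrawnShapeImpl(polygon, position) for position in positions
]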
The following behavioral design patterns are presented next:
• Chain of responsibility pattern
• Observer pattern
• Command pattern
• Iterator pattern
• State pattern
• Mediator pattern
• Template method pattern
• Memento pattern
• Visitor pattern
• Null object pattern
The chain of responsibility pattern lets you pass requests along a chain of
handlers.
Upon receiving a request, each handler can do one of the following:
• Process the request and then pass it to the next handler in the chain
• Process the request without passing it to the subsequent handlers (terminating the chain)
• Leave the request unprocessed and pass it to the next handler
The FastAPI web framework utilizes the chain of responsibility pattern for handling requests. In the
FastAPI framework, you can write pluggable behavior using middlewares, a concept similar to servlet
filters in Java. Below is an example of a middleware that adds HTTP request processing time to the
response in a custom HTTP header:
import time

from fastapi import FastAPI, Request

app = FastAPI()
@app.middleware('http')
async def add_request_processing_time_header(request: Request, call_next):
start_time_in_ns = time.time_ns()
response = await call_next(request)
end_time_in_ns = time.time_ns()
processing_time_in_ns = end_time_in_ns - start_time_in_ns
processing_time_in_ms = processing_time_in_ns / 1_000_000
response.headers["X-Processing-Time-Millis"] = str(processing_time_in_ms)
return response
from fastapi import FastAPI, Request, Response

app = FastAPI()
# Authorization middleware
@app.middleware('http')
async def authorize(request: Request, call_next):
# From request's 'Authorization' header,
# extract the bearer JWT, if present
# Set 'token_is_present' variable value
# Verify the validity of JWT and assign result
# to 'token_is_valid' variable
if token_is_valid:
response = await call_next(request)
elif token_is_present:
# NOTE! call_next is not invoked,
# this will terminate the request
response = Response('Unauthorized', 403)
else:
# NOTE! call_next is not invoked,
# this will terminate the request
response = Response('Unauthenticated', 401)
return response
# Logging middleware
@app.middleware('http')
async def log(request: Request, call_next):
print(f'GET {str(request.url)}')
return await call_next(request)
@app.get('/hello')
def hello():
return 'Hello!'
One typical example of using the observer pattern is a UI view observing a model. The UI view will
be notified whenever the model changes and can redraw itself. Let’s have an example:
class Observer(Protocol):
def notify_about_change(self) -> None:
pass
class Observable(Protocol):
def observe_by(self, observer: Observer) -> None:
pass
class ObservableImpl(Observable):
__observers: list[Observer]
def __init__(self):
self.__observers = []
class TodosModel(ObservableImpl):
__todos: list[Todo]
def __init__(self):
super().__init__()
self.__todos = []
class TodosView(Observer):
def __init__(self, todos_model: TodosModel):
self.__todos_model = todos_model
todos_model.observe_by(self)
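The observer registration and notification logic is not shown above. A minimal sketch of what it could look like (the notify_observers and add_todo methods are assumptions):
class ObservableImpl(Observable):
    # __init__ as above ...
    def observe_by(self, observer: Observer) -> None:
        self.__observers.append(observer)

    def notify_observers(self) -> None:
        for observer in self.__observers:
            observer.notify_about_change()

class TodosModel(ObservableImpl):
    # __init__ as above ...
    def add_todo(self, todo: Todo) -> None:
        # Any change to the model notifies the observing views
        self.__todos.append(todo)
        self.notify_observers()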
Let’s have another example that utilizes the publish-subscribe pattern. Below we define a
MessageBroker class that contains the following methods: publish, subscribe, and unsubscribe.
T = TypeVar('T')
class MessagePublisher(Protocol[T]):
def publish(self, topic: str, message: T) -> None:
pass
class MessageSubscriber(Protocol[T]):
def subscribe(
self,
topic: str,
handle_message: Callable[[T], None]
) -> None:
pass
class MessageBroker:
    def __init__(self):
        self.__topic_to_handle_msgs_map = {}
def subscribe(
self,
topic: str,
handle_message: Callable[[T], None]
) -> None:
handle_messages = self.__topic_to_handle_msgs_map.get(topic)
if handle_messages is None:
self.__topic_to_handle_msgs_map[topic] = [handle_message]
else:
handle_messages.append(handle_message)
def unsubscribe(
self,
topic: str,
handle_message: Callable[[T], None]
) -> None:
        handle_messages = self.__topic_to_handle_msgs_map.get(topic)
        if handle_messages is not None:
            handle_messages.remove(handle_message)
topic = 'test'
message_broker.subscribe(topic, print_message)
message_broker.publish(topic, 'Hi!')
message_broker.unsubscribe(topic, print_message)
message_broker.publish('test', 'Hi!')
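The publish method of the MessageBroker is not shown above. A minimal sketch (assuming synchronous delivery to all handlers subscribed to the topic; the message_broker instance and the print_message handler used above are assumed to have been created earlier) could look like this:
class MessageBroker:
    # __init__, subscribe and unsubscribe as above ...
    def publish(self, topic: str, message: T) -> None:
        # Deliver the message to every handler subscribed to the topic
        for handle_message in self.__topic_to_handle_msgs_map.get(topic, []):
            handle_message(message)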
class Action(Protocol):
def perform(self) -> None:
pass
class Command(Protocol):
def execute(self) -> None:
pass
class PrintAction(Action):
def __init__(self, message: str):
self.__message = message
class PrintCommand(Command):
def __init__(self, message: str):
self.__message = message
As can be seen, the above PrintAction and PrintCommand instances encapsulate state that is used when
the action/command is performed (usually at a later stage compared to action/command instance
creation).
Now we can use our print action/command:
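For example (a sketch; the perform and execute methods are assumed to simply print the stored message):
print_action = PrintAction('Hello')
print_command = PrintCommand('Hello')
print_action.perform()    # Assumed to print: Hello
print_command.execute()   # Assumed to print: Hello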
Actions and commands can be made undoable, provided that the performed operation can be reversed. The
above print action/command is not undoable, because you cannot undo printing to the console. Let’s
introduce an undoable action: add an item to a list. It can be undone by removing the
item from the list.
T = TypeVar('T')
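# A minimal sketch (assumed implementation) of the undoable action used below
class AddToListAction(Action):
    def __init__(self, value: T, values: list[T]):
        self.__value = value
        self.__values = values

    def perform(self) -> None:
        self.__values.append(self.__value)

    def undo(self) -> None:
        self.__values.remove(self.__value)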
values = [1, 2]
add3ToValuesAction = AddToListAction(3, values)
add3ToValuesAction.perform()
print(values) # Prints [1, 2, 3]
add3ToValuesAction.undo()
print(values) # Prints [1, 2]
The iterator pattern can be used to add iteration capabilities to a sequence class.
Let’s create a reverse iterator for Python’s list class. We implement the Iterator abstract base
class by supplying an implementation for the __next__ method:
T = TypeVar('T')
class ReverseListIterator(Iterator[T]):
def __init__(self, values: list[T]):
self.__values = values.copy()
self.__position = len(values) - 1
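    # A sketch (assumed) of the __next__ implementation: iterate from the
    # last position towards the first one
    def __next__(self) -> T:
        if self.__position < 0:
            raise StopIteration()
        value = self.__values[self.__position]
        self.__position -= 1
        return value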
We can put the ReverseListIterator class into use in a ReverseList class defined below:
class ReverseList(list[T]):
def __iter__(self) -> Iterator[T]:
return ReverseListIterator(self)
Now we can use the new iterator to iterate over a list in reverse order:
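# Usage sketch (the concrete list values are assumed from the output below)
for value in ReverseList([1, 2, 3, 4, 5]):
    print(value)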
# Prints:
# 5
# 4
# 3
# 2
# 1
The state pattern lets an object change its behavior depending on its current
state.
Developers often treat an object’s state not as an object but as an enumerated value (enum), for
example. Below is an example where we have defined a UserStory class representing a user story that
can be rendered on screen. An enum value represents the state of a UserStory object.
class UserStoryState(Enum):
TODO = 1
IN_DEVELOPMENT = 2
IN_VERIFICATION = 3
READY_FOR_REVIEW = 4
DONE = 5
class Icon(Protocol):
# ...
class TodoIcon(Icon):
# ...
class UserStory:
def __init__(self, name: str):
self.__name = name
self.__state = UserStoryState.TODO
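    # A sketch (assumed) of the kind of conditional logic the text below
    # refers to: the icon is chosen by branching on the enum value
    @property
    def icon(self) -> Icon:
        if self.__state == UserStoryState.TODO:
            return TodoIcon()
        elif self.__state == UserStoryState.IN_DEVELOPMENT:
            return InDevelopmentIcon()
        # ... and so on for the other states ...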
The above solution is not an object-oriented one. We should replace the conditionals (switch-case
statement) with a polymorphic design. This can be done by introducing state objects. In the state
pattern, the state of an object is represented with an object instead of an enum value. Below is the
above code modified to use the state pattern:
class UserStoryState(Protocol):
@property
def icon(self) -> Icon:
pass
class TodoUserStoryState(UserStoryState):
@property
def icon(self) -> Icon:
return TodoIcon()
class InDevelopmentUserStoryState(UserStoryState):
@property
def icon(self) -> Icon:
return InDevelopmentIcon()
class InVerificationUserStoryState(UserStoryState):
@property
def icon(self) -> Icon:
return InVerificationIcon()
class ReadyForReviewUserStoryState(UserStoryState):
@property
def icon(self) -> Icon:
return ReadyForReviewIcon()
class DoneUserStoryState(UserStoryState):
@property
def icon(self) -> Icon:
return DoneIcon()
class UserStory:
def __init__(self, name: str):
self.__name = name
self.__state = TodoUserStoryState()
Let’s have another example with an Order class. An order can have a state, like paid, packaged,
delivered, etc. Below we implement the order states as classes:
class OrderState(Protocol):
def create_message(self, order_id: str) -> str:
pass
class PaidOrderState(OrderState):
def create_message(self, order_id: str) -> str:
return 'Order ' + order_id + ' is successfully paid'
class DeliveredOrderState(OrderState):
def create_message(self, order_id: str) -> str:
return 'Order ' + order_id + ' is delivered'
class Order:
def __init__(self, id: str, state: OrderState, customer: Customer):
self.__id = id
self.__state = state
self.__customer = customer
@property
def customer_email_address(self) -> str:
return self.__customer.email_address
@property
def state_message(self) -> str:
return self.__state.create_message(self.__id)
email_service = EmailService(...)
order = Order(...)
email_service.send_email(
order.customer_email_address,
order.state_message
)
The mediator pattern lets you reduce dependencies between objects. It restricts
direct communication between two different layers of objects and forces them to
collaborate only via a mediator object or objects.
The mediator pattern eliminates the coupling between two different layers of objects, so changes to one
layer of objects can be made without the need to change the objects in the other layer.
A typical example of the mediator pattern is Model-View-Controller (MVC) pattern. In the MVC
pattern, model and view objects do not communicate directly but only via mediator objects
(controllers). Next, several different ways to use the MVC pattern in frontend clients are presented.
Traditionally, the MVC pattern was used in the backend when the backend also generated the view to
be shown on the client device (web browser). With the advent of single-page web clients, a modern
backend is a simple API containing only a model and a controller (MC).
In the below picture, you can see how dependency inversion is used, and none of the implementation
classes depend on concrete implementations. You can easily change any implementation class
to a different one without the need to modify any other implementation class. Notice how the
ControllerImpl class uses the bridge pattern and implements two bridges, one towards the model
and the other towards the view.
As shown in the below picture, the controller can also be used as a bridge-adapter: The controller can
be modified to adapt to changes in the view layer (View2 instead of View) without needing to change
the model layer. The modified modules are shown with a gray background in the picture. Similarly,
the controller can be modified to adapt to changes in the model layer without needing to change the
view layer (not shown in the picture).
The following examples use a specialization of the MVC pattern called Model-View-Presenter (MVP).
In the MVP pattern, the controller is called the presenter. I use the more generic term controller in all
examples, though. A presenter acts as a middleman between a view and a model. A presenter-type
controller object has a reference to a view object and a model object. A view object commands the
presenter to perform actions on the model. And the model object asks the presenter to update the
view object.
Let’s have a simple todo application as an example. First, we implement the Todo class, which is part
of the model.
Figure 4.41. Todo.py
class Todo:
    def __init__(self, id_: int, name: str, is_done: bool):
        self.__id = id_
        self.__name = name
        self.__is_done = is_done

    @property
    def id(self) -> int:
        return self.__id

    @id.setter
    def id(self, id_: int) -> None:
        self.__id = id_

    @property
    def name(self) -> str:
        return self.__name

    @name.setter
def name(self, name: str) -> None:
self.__name = name
@property
def is_done(self) -> bool:
return self.__is_done
@is_done.setter
def is_done(self, is_done: bool) -> None:
self.__is_done = is_done
class TodoView(Protocol):
def show_todos(self, todos: list[Todo]) -> None:
pass
class TodoViewImpl(TodoView):
def __init__(self, controller: TodoController):
self.__controller = controller
controller.view = self
controller.start_fetch_todos()
Then we implement a generic Controller class that acts as a base class for concrete controllers:
TModel = TypeVar('TModel')
TView = TypeVar('TView')

class Controller(Generic[TModel, TView]):
    def __init__(self):
        self.__model: TModel | None = None
        self.__view: TView | None = None

    @property
    def model(self) -> TModel | None:
        return self.__model
@model.setter
def model(self, model: TModel) -> None:
self.__model = model
@property
def view(self) -> TView | None:
return self.__view
@view.setter
def view(self, view: TView) -> None:
self.__view = view
The below TodoControllerImpl class implements two actions, start_fetch_todos and toggle_todo_-
done, which delegate to the model layer. It also implements two actions, update_view_with_todos and
update_view_with_error_message, that delegate to the view layer.
class TodoController(Protocol):
async def start_fetch_todos(self) -> None:
pass
The below TodoModelImpl class implements the fetching of todos (fetch_todos) using the supplied
todo_service. The todo_service accesses the backend to read todos from a database, for example.
When todos are successfully fetched, the controller is told to update the view. If fetching of the
todos fails, the view is updated to show an error. Toggling a todo done is implemented using the
todo_service and its try_update_todo method.
class TodoService(Protocol):
class Error(Exception):
# ...
class TodoModel(Protocol):
async def fetch_todos(self) -> None:
pass
class TodoModelImpl(TodoModel):
__todos: list[Todo]
def __init__(
self,
controller: TodoController,
todo_service: TodoService
):
self.__controller = controller
controller.model = self
self.__todo_service = todo_service
self.__todos = []
    async def toggle_todo_done(self, id_: int) -> None:
        # Find the todo with the given id from self.__todos ...
        if todo:
todo.is_done = not todo.is_done
try:
await self.__todo_service.try_update_todo(todo)
except TodoService.Error as error:
self.__controller.update_view_with_error_message(error.message)
Let’s make an exception to having all examples in Python and implement the above example using Web
Components. If you are not a full-stack Python developer, you can skip the rest of the section because
it is frontend-related TypeScript code. The web component view should extend the HTMLElement class.
The connectedCallback method of the view will be called on the component mount. It starts fetching
todos. The showTodos method renders the given todos as HTML elements. It also adds event listeners
for the Mark done buttons. The showError method updates the inner HTML of the view to show an
error message.
Figure 4.50. Todo.ts
interface TodoView {
showTodos(todos: Todo[]): void;
showError(errorMessage: string): void;
}
connectedCallback() {
controller.startFetchTodos();
this.innerHTML = '<div>Loading todos...</div>';
}
showTodos(todos: Todo[]) {
const todoElements = todos.map(({ id, name, isDone }) => `
<li id="todo-${id}">
${id} ${name}
${isDone ? '' : '<button>Mark done</button>'}
</li>
`);
this.innerHTML = `<ul>${todoElements}</ul>`;
showError(message: string) {
this.innerHTML = `
<div>
Failure: ${message}
</div>
`;
}
}
We can use the same controller and model APIs for this web component example as in the Python
example. We just need to convert the Python code to respective TypeScript code:
Figure 4.53. Controller.ts
class TodoControllerImpl
extends Controller<TodoModel, TodoView>
implements TodoController {
startFetchTodos(): void {
this.getModel()?.fetchTodos();
}
constructor(
private readonly controller: TodoController,
private readonly todoService: TodoService
) {
controller.setModel(this);
}
fetchTodos(): void {
this.todoService.getTodos()
.then((todos) => {
this.todos = todos;
controller.updateViewWithTodos(todos);
})
.catch((error) =>
controller.updateViewWithError(error.message));
}
if (foundTodo) {
foundTodo.isDone = !foundTodo.isDone;
this.todoService
.updateTodo(foundTodo)
.catch((error: any) =>
controller.updateViewWithError(error.message));
}
}
}
We could use the above-defined controller and model as such with a React view component:
Figure 4.59. ReactTodoView.tsx
// ...
import controller from './todoController';
// ...
constructor(props: Props) {
super(props);
controller.setView(this);
this.state = {
todos: []
}
}
componentDidMount() {
controller.startFetchTodos();
}
showTodos(todos: Todo[]) {
this.setState({ ...this.state, todos });
}
showError(errorMessage: string) {
this.setState({ ...this.state, errorMessage });
}
render() {
// Render todos from 'this.state.todos' here
// Or show 'this.state.errorMessage' here
}
}
If you have multiple views using the same controller, you can derive your controller from the below-
defined MultiViewController class:
Figure 4.60. MultiViewController.ts
getViews(): TView[] {
return this.views;
}
Let’s say we want to have two views for todos, one for the actual todos and one viewing the todo
count. We need to modify the controller slightly to support multiple views:
class TodoControllerImpl
extends MultiViewController<TodoModel, TodoView>
implements TodoController {
startFetchTodos(): void {
this.getModel()?.fetchTodos();
}
Many modern UI frameworks and state management libraries implement a specialization of the MVC
pattern called Model-View-ViewModel (MVVM). In the MVVM pattern, the controller is called the
view model. I use the more generic term controller in the below example, though. The main difference
between the view model and the presenter in the MVP pattern is that in the MVP pattern, the presenter
has a reference to the view, but the view model does not. The view model provides bindings between
the view’s events and actions in the model. This can happen so that the view model adds action
dispatcher functions as properties of the view. And in the other direction, the view model maps
the model’s state to the properties of the view. When using React and Redux, for example, you can
connect the view to the model using the mapDispatchToProps function and connect the model to the
view using the mapStateToProps function. These two mapping functions form the view model (or the
controller) that binds the view and model together.
Let’s first implement the todo example with React and Redux and later show how the React view can
be replaced with an Angular view without any modification to the controller or the model layer.
Let’s implement a list view for todos:
function TodosListView({
toggleTodoDone,
startFetchTodos,
todos
}: Props) {
useEffect(() => {
startFetchTodos();
}, [startFetchTodos]);
return <ul>{todoElements}</ul>;
}
constructor(reduxDispatch: ReduxDispatch) {
this.dispatch = (action: AbstractAction<any>) =>
reduxDispatch({ type: action });
}
}
startFetchTodos: () =>
this.dispatch(new StartFetchTodosAction())
}
getState(appState: AppState) {
return {
todos: appState.todosState.todos,
}
}
}
In the development phase, we can use the following temporary implementation of the
StartFetchTodosAction class:
const initialTodosState = {
todos: []
} as TodoState
Now we can introduce a new view for todos, a TodosTableView which can utilize the same controller
as the TodosListView.
Figure 4.70. TodosTableView.tsx
function TodosTableView({
toggleTodoDone,
startFetchTodos,
todos
}: Props) {
useEffect(() => {
startFetchTodos();
}, [startFetchTodos]);
return <table><tbody>{todoElements}</tbody></table>;
}
We can notice some duplication in the TodosListView and TodosTableView components. For example,
both are using the same effect. We can create a TodosView for which we can give as parameter the
type of a single todo view, either a list item or a table row view:
function TodosView({
toggleTodoDone,
startFetchTodos,
todos,
TodoView
}: Props) {
useEffect(() => {
startFetchTodos()
}, [startFetchTodos]);
In most cases, you should not store state in a view even if the state is for that particular view only.
Instead, when you store it in the model, it brings the following benefits:
We can also change the view implementation from React to Angular without modifying the controller
or model layer. This can be done, for example, using the @angular-redux2/store library. Below is a
todos table view implemented as an Angular component:
Figure 4.77. todos-table-view.component.ts
const { startFetchTodos,
toggleTodoDone } = controller.actionDispatchers;
@Component({
selector: 'todos-table-view',
template: `
<table>
<tr *ngFor="let todo of (todoState | async)?.todos">
<td>{{ todo.id }}</td>
<td>{{ todo.name }}</td>
<td>
<input
type="checkbox"
[checked]="todo.isDone"
(change)="toggleTodoDone(todo.id)"
/>
</td>
</tr>
</table>
`
})
export class TodosTableView implements OnInit {
@Select(controller.getState) todoState: Observable<TodoState>;
ngOnInit(): void {
startFetchTodos();
}
toggleTodoDone(id: number) {
toggleTodoDone(id);
}
}
@Component({
selector: 'app-root',
template: `
<div>
<todos-table-view></todos-table-view>
</div>`,
styleUrls: ['./app.component.css']
})
export class AppComponent {
title = 'angular-test';
}
@NgModule({
declarations: [
AppComponent, TodosTableView
],
imports: [
BrowserModule,
NgReduxModule
],
providers: [],
bootstrap: [AppComponent]
})
export class AppModule {
constructor(ngRedux: NgRedux<AppState>) {
ngRedux.provideStore(store);
}
}
Template method pattern allows you to define a template method in a base class,
and subclasses define the final implementation of that method. The template
method contains one or more calls to abstract methods implemented in the
subclasses.
In the below example, the AbstractDrawing class contains a template method, draw. This method
includes a call to the get_shape_renderer method, an abstract method implemented in the subclasses
of the AbstractDrawing class. The draw method is the template method, and a subclass defines how
a single shape is drawn.
class Drawing(Protocol):
def get_shape_renderer(self) -> ShapeRenderer:
pass
class AbstractDrawing(Drawing):
def __init__(self, shapes: list[Shape]):
self.__shapes = shapes
@abstractmethod
def get_shape_renderer(self) -> ShapeRenderer:
pass
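    # A sketch (assumed implementation) of the draw template method: it calls
    # the abstract get_shape_renderer method implemented by the subclasses
    def draw(self) -> None:
        shape_renderer = self.get_shape_renderer()
        for shape in self.__shapes:
            # Shapes are assumed to render themselves with the given renderer
            shape.render(shape_renderer)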
We can now implement two subclasses of the AbstractDrawing class, which define the final behavior
of the templated draw method.
class RasterDrawing(AbstractDrawing):
def __init__(self, shapes: list[Shape]):
super().__init__(shapes)
canvas = Canvas()
self.__shape_renderer = RasterShapeRenderer(canvas)
class VectorDrawing(AbstractDrawing):
def __init__(self, shapes: list[Shape]):
super().__init__(shapes)
svg_root = SvgElement()
self.__shape_renderer = VectorShapeRenderer(svg_root)
The memento pattern can be used to save the internal state of an object to another
object called the memento object.
Let’s have an example with a TextEditor class. First, we define a TextEditorState protocol and its
implementation. Then we define a TextEditorStateMemento class for storing a memento of the text
editor’s state.
class TextEditorState(Protocol):
def clone(self) -> 'TextEditorState':
pass
class TextEditorStateImpl(TextEditorState):
# Implement text editor state here ...
class TextEditorStateMemento:
def __init__(self, state: TextEditorState):
self.__state = state.clone()
@property
def state(self):
return self.__state
The TextEditor class stores mementos of the text editor’s state. It provides methods to save a state,
restore a state, or restore the previous state:
class TextEditor:
__state_mementos: list[TextEditorStateMemento]
def __init__(self):
self.__current_state = TextEditorStateImpl(...)
self.__state_mementos = []
        self.__current_version = 1
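    # Sketch (assumed implementation) of the memento-handling methods
    # referenced in the text below
    def save_state(self) -> None:
        self.__state_mementos.append(
            TextEditorStateMemento(self.__current_state)
        )
        self.__current_version += 1

    def restore_previous_state(self) -> None:
        if self.__state_mementos:
            self.__current_state = self.__state_mementos.pop().state
            self.__current_version -= 1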
In the above example, we can add a memento for the text editor’s state by calling the save_state
method. We can recall the previous version of the text editor’s state with the restore_previous_state
method, and we can recall any version of the text editor’s state using the restore_state method.
Visitor pattern allows adding functionality to a class (like adding new methods)
without modifying the class. This is useful, for example, with library classes that
you cannot modify.
class Shape(Protocol):
def draw(self) -> None:
pass
class CircleShape(Shape):
def __init__(self, radius: int):
self.__radius = radius
@property
def radius(self) -> int:
return self.__radius
class RectangleShape(Shape):
def __init__(self, width: int, height: int):
self.__width = width
self.__height = height
@property
def width(self) -> int:
return self.__width
@property
def height(self) -> int:
return self.__height
Let’s assume we need to calculate the total area of shapes in a drawing. Currently, we are in a situation
where we can modify the shape classes, so let’s add calculate_area methods to the classes:
import math
from typing import Protocol
class Shape(Protocol):
# ...
class CircleShape(Shape):
# ...
class RectangleShape(Shape):
# ...
Adding a new method to an existing class may be against the open-closed principle. In the above case,
adding the calculate_area methods is safe because the shape classes are immutable. And even if they
were not, adding the calculate_area methods would be safe because they are read-only methods, i.e.,
they don’t modify the object’s state, and we don’t have to worry about thread safety because we can
agree that our example application is not multithreaded.
Now we have the area calculation methods added, and we can use a common algorithm to calculate
the total area of shapes in a drawing:
But what if the shape classes, without the area calculation capability, were in a 3rd party library that
we cannot modify? We would have to do something like this:
total_shapes_area = reduce(
shapes_area,
shapes,
0.0
)
The above solution is complicated and needs updating every time a new type of shape is introduced.
The above example does not follow object-oriented design principles: it contains an if/elif structure
with isinstance checks.
We can use the visitor pattern to replace the above conditionals with polymorphism. First, we
introduce a visitor protocol that can be used to provide additional behavior to the shape classes.
Then we introduce an execute method in the Shape protocol. And in the shape classes, we implement
the execute methods so that additional behavior provided by a concrete visitor can be executed:
class Shape(Protocol):
# ...
class CircleShape(Shape):
def __init__(self, radius: int):
self.__radius = radius
@property
def radius(self) -> int:
return self.__radius
class RectangleShape(Shape):
def __init__(self, width: int, height: int):
self.__width = width
self.__height = height
@property
def width(self) -> int:
return self.__width
@property
def height(self) -> int:
return self.__height
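The ShapeBehavior protocol and the execute methods are not shown above. A minimal sketch of what they could look like (execute_for_circle appears in the code below; execute_for_rectangle is an assumed name for its rectangle counterpart):
class ShapeBehavior(Protocol):
    def execute_for_circle(self, circle: 'CircleShape') -> Any:
        pass

    def execute_for_rectangle(self, rectangle: 'RectangleShape') -> Any:
        pass

class CircleShape(Shape):
    # ... properties as above ...
    def execute(self, behavior: ShapeBehavior) -> Any:
        # Double dispatch: the shape forwards itself to the visitor
        return behavior.execute_for_circle(self)

class RectangleShape(Shape):
    # ... properties as above ...
    def execute(self, behavior: ShapeBehavior) -> Any:
        return behavior.execute_for_rectangle(self)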
Suppose that the shape classes were mutable and made thread-safe. We would have to define the
execute methods with appropriate synchronization to make them also thread-safe:
class CircleShape(Shape):
# Constructor that initializes
# self.__lock = Lock()
class RectangleShape(Shape):
# Constructor that initializes
# self.__lock = Lock()
class AreaCalculationShapeBehavior(ShapeBehavior):
def execute_for_circle(self, circle: CircleShape) -> Any:
return math.pi * circle.radius**2
Now we can implement the calculation of the shapes’ total area using a common algorithm, and we get
rid of the conditionals. We execute the below area_calculation behavior for each shape:
total_shapes_area = reduce(
lambda accum_shapes_area, shape:
accum_shapes_area + shape.execute(area_calculation),
shapes,
0.0
)
You can add more behavior to the shape classes by defining a new visitor. Let’s define a
PerimeterCalculationShapeBehavior class:
class PerimeterCalculationShapeBehavior(ShapeBehavior):
def execute_for_circle(self, circle: CircleShape) -> Any:
return 2 * math.pi * circle.radius
Notice that we did not need to use the visitor term in our code examples. Adding the design pattern
name to the names of software entities (class/function names, etc.) often does not bring any real
benefit but makes the names longer. However, there are some design patterns, like the factory and
builder patterns, where you always use the design pattern name in a class name.
If you develop a third-party library and would like the behavior of its classes to be extended by its
users, you should make your library classes accept visitors that can perform additional behavior.
Use the null object pattern to implement a class for null objects that don’t do anything. A null object
can be used in place of a real object that does something.
Let’s have an example with a Shape protocol:
class Shape(Protocol):
def draw(self) -> None:
pass
class NullShape(Shape):
def draw(self) -> None:
        # Intentionally no operation
        pass
A null shape does not draw anything. We can use an instance of the NullShape class everywhere
where a concrete implementation of the Shape protocol is wanted.
If your object asks many things from another object using, e.g., multiple getters, you might be guilty
of the feature envy design smell. Your object is envious of a feature that the other object should have.
Let’s have an example and define a cube shape class:
class ThreeDShape(Protocol):
# ...
class Cube3DShape(ThreeDShape):
def __init__(self, width: int, height: int, depth: int):
self.__width: Final = width
self.__height: Final = height
self.__depth: Final = depth
@property
def width(self) -> int:
return self.__width
@property
def height(self) -> int:
return self.__height
@property
def depth(self) -> int:
return self.__depth
Next, we define another class, CubeUtils, that contains a method for calculating the total volume of
cubes:
@final
class CubeUtils:
@staticmethod
def calculate_total_volume(cubes: list[Cube3DShape]) -> int:
total_volume = 0
for cube in cubes:
total_volume += cube.width * cube.height * cube.depth
return total_volume
In the calculate_total_volume method, we ask three times about a cube object’s state. This is against
the don’t ask, tell principle. Our method is envious of the volume calculation feature and wants to do
it by itself rather than telling a Cube3DShape object to calculate its volume.
Let’s correct the above code so that it follows the don’t ask, tell principle:
class ThreeDShape(Protocol):
def calculate_volume(self) -> int:
pass
class Cube3DShape(ThreeDShape):
def __init__(self, width: int, height: int, depth: int):
self.__width: Final = width
self.__height: Final = height
self.__depth: Final = depth
@final
class ThreeDShapeUtils:
@staticmethod
def calculate_total_volume(three_d_shapes: list[ThreeDShape]) -> int:
total_volume = 0
for three_d_shape in three_d_shapes:
total_volume += three_d_shape.calculate_volume()
return total_volume
Now our calculate_total_volume method is not asking anything about a cube object. It just tells a
cube object to calculate its volume. We also removed the asking methods (getters/properties) from
the Cube3DShape class because they are no longer needed.
Below is another example of asking instead of telling:
import time
class AnomalyDetectionEngine:
def run(self) -> None:
while self.__is_running:
now = time.time()
if self.__anomaly_detector.anomalies_should_be_detected(now):
anomalies = self.__anomaly_detector.detect_anomalies()
# Do something with the detected anomalies ...
time.sleep(1)
In the above example, we ask the anomaly detector if we should detect anomalies now. Then,
depending on the result, we call another method on the anomaly detector to detect anomalies. This
could be simplified by making the detect_anomalies method itself check whether anomalies should be
detected, using the anomalies_should_be_detected method. Then the anomalies_should_be_detected
method can be made private, and we can simplify the above code as follows:
class AnomalyDetectionEngine:
def run(self) -> None:
while self.__is_running:
anomalies = self.__anomaly_detector.detect_anomalies()
# Do something with the detected anomalies ...
time.sleep(1)
user.get_account().get_balance()
user.get_account().withdraw(...)
The above statements violate the law of Demeter: our object reaches through a second object to a third
object. They can be corrected either by moving functionality to a different class or by making
the second object act as a facade between the first and the third object.
Below is an example of the latter solution, where we introduce two new methods in the User class
and remove the get_account method:
user.get_account_balance()
user.withdraw_from_account(...)
In the above example, the User class is a facade in front of the Account class that we should not access
directly from our object.
However, you should always check if the first solution alternative could be used instead. It makes the
code more object-oriented and does not require creating additional methods.
Below is an example that uses User and SalesItem entities and does not obey the law of Demeter:
from SalesItem import SalesItem
from User import User
sales_item_price = sales_item.get_price()
# ...
We can resolve the problem in the above example by moving the purchase method to the correct class,
in this case, the User class:
from Account import Account
from SalesItem import SalesItem
class User:
def __init__(self, account: Account):
self.__account = account
# ...
Many of us have experienced situations where we have supplied arguments to a function in the wrong
order. This happens easily when a function takes, for example, two integer parameters and you
accidentally give them in the wrong order. You don’t get a compilation error.
Another problem with primitive types as function arguments is that the argument values are not
necessarily validated. You have to implement the validation logic in your function.
Suppose you accept an integer parameter for a port number in a function. In that case, you might get
any integer value as the parameter value, even though the valid port numbers are from 1 to 65535.
Suppose you also had other functions in the same codebase accepting a port number as a parameter. In
that case, you could end up doing the same validation logic in multiple places and have thus duplicate
code in your codebase.
Let’s have a simple example of using this principle:
class RectangleShape(Shape):
def __init__(self, width: int, height: int):
self.__width = width
self.__height = height
In the above example, the constructor has two parameters with the same primitive type (int). It is
possible to give width and height in the wrong order. But if we refactor the code to use objects
instead of primitive values, we can make the likelihood of giving the arguments in the wrong order
much smaller:
T = TypeVar('T')
class Value(Generic[T]):
def __init__(self, value: T):
self.__value: Final = value
@property
def value(self) -> T:
return self.__value
class Width(Value[int]):
pass
class Height(Value[int]):
pass
class RectangleShape(Shape):
def __init__(self, width: Width, height: Height):
self.__width = width.value
self.__height = height.value
width = Width(20)
height = Height(50)
# OK
rectangle = RectangleShape(width, height)
In the above example, Width and Height are simple data classes. They don’t contain any behavior.
You can use concrete data classes as function parameter types. There is no need to create an interface
for a data class. So, the program against interfaces principle does not apply here.
In Python, we have another way to safeguard against giving same-type parameters in the wrong order:
named parameters. Without named parameters, we would create a new rectangle like this:
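# Positional arguments: easy to mix up width and height
# (the original int-based constructor from the earlier example is assumed)
rectangle = RectangleShape(20, 50)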
In the above example, we must remember that the first argument is the width and the second is the
height. When using named parameters, we don’t have to remember the correct order of the
parameters:
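# With named (keyword) arguments the order no longer matters
rectangle = RectangleShape(width=20, height=50)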
Let’s have another simple example where we have the following function signature:
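# A hypothetical example of such a signature: the name is expected to be
# namespaced (e.g. 'namespace.name'), but any plain string is accepted
def find_by_namespaced_name(namespaced_name: str):
    ...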
The above function signature allows function callers to accidentally supply a non-namespaced name.
By using a custom type for the namespaced name, we can reformulate the above function signature
as follows:
class NamespacedName:
def __init__(self, namespace: str, name: str):
self.__namespaced_name: Final = (
name if not namespace else (namespace + '.' + name)
)
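# The same hypothetical signature using the custom type; callers must now
# construct a NamespacedName explicitly
def find_by_namespaced_name(namespaced_name: NamespacedName):
    ...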
Let’s have a more comprehensive example with an HttpUrl class. The class constructor has several
parameters that should be validated upon creating an HTTP URL:
class HttpUrl:
def __init__(
self,
scheme: str,
host: str,
port: int,
path: str,
query: str
):
self.__http_url: Final = (
scheme
+ "://"
+ host
+ ":"
+ str(port)
+ path
+ "?"
+ query
)
T = TypeVar('T')
class AbstractValidatedValue(Generic[T]):
def __init__(self, value: T):
self._value: Final = value
@abstractmethod
def is_valid(self) -> bool:
pass
class GetError(Exception):
pass
class HttpScheme(AbstractValidatedValue[str]):
# Because instances are immutable, we can cache the validation result
@cache
def is_valid(self) -> bool:
lowercase_value = self._value.lower()
return lowercase_value == 'https' or lowercase_value == 'http'
Let’s create a Port class (and similar classes for the host, path, and query should be created):
class Port(AbstractValidatedValue[int]):
# Because instances are immutable, we can cache the validation result
@cache
def is_valid(self) -> bool:
return 1 <= self._value <= 65535
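For illustration, a hedged sketch of how the validated value objects could be used:
assert HttpScheme('https').is_valid()
assert Port(443).is_valid()
assert not Port(70000).is_valid()  # Outside the valid 1..65535 range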
Let’s create a utility class, OptionalUtils, with a method for mapping a result for five optional values:
T = TypeVar('T')
U = TypeVar('U')
V = TypeVar('V')
X = TypeVar('X')
Y = TypeVar('Y')
R = TypeVar('R')
@final
class OptionalUtils:
@staticmethod
def map_all(
opt1: Optional[T],
opt2: Optional[U],
opt3: Optional[V],
opt4: Optional[X],
opt5: Optional[Y],
mapper: Callable[[T, U, V, X, Y], R]
) -> Optional[R]:
if (
opt1.is_present()
and opt2.is_present()
and opt3.is_present()
and opt4.is_present()
and opt5.is_present()
):
return Optional.of(
mapper(
opt1.try_get(),
opt2.try_get(),
opt3.try_get(),
opt4.try_get(),
opt5.try_get(),
)
)
else:
return Optional.empty()
Next, we can reimplement the HttpUrl class to contain two alternative factory methods for creating
an HTTP URL:
# Imports ...
class PrivateConstructor(type):
def __call__(
cls: type[T], *args: tuple[Any, ...], **kwargs: dict[str, Any]
):
raise TypeError('Constructor is private')
def _create(
cls: type[T], *args: tuple[Any, ...], **kwargs: dict[str, Any]
) -> T:
return super().__call__(*args, **kwargs)
class HttpUrl(metaclass=PrivateConstructor):
def __init__(self, http_url: str):
self.__http_url = http_url
class CreateError(Exception):
pass
@classmethod
def try_create(
cls,
scheme: HttpScheme,
host: Host,
port: Port,
path: Path,
query: Query
) -> 'HttpUrl':
try:
return cls._create(
scheme.try_get()
+ '://'
+ host.try_get()
+ ':'
+ str(port.try_get())
+ path.try_get()
+ '?'
+ query.try_get()
)
except AbstractValidatedValue.GetError as error:
raise cls.CreateError(error)
maybe_http_url = HttpUrl.create(
HttpScheme('https'),
Host('www.google.com'),
Port(443),
Path('/query'),
Query('search=jee')
)
# Prints https://www.google.com:443/query?search=jee
print(maybe_http_url.try_get().url_string)
Notice how we did not hardcode the URL validation inside the HttpUrl class, but we created small
validated value classes: HttpScheme, Host, Port, Path, and Query. These classes can be further utilized
in other parts of the codebase if needed and can even be put into a common validation library for
broader usage.
An application typically receives unvalidated input data from various external sources. Make sure
that you validate any data received from external sources. Preferably, use a ready-made validation
library or, if needed, create your own validation logic.
When using dependency injection, the dependencies are injected only upon the application startup.
The application can first read its configuration and then decide what objects are created for the
application. In many languages, dependency injection is crucial for unit tests also. When executing a
unit test using DI, you can inject mock dependencies into the tested code instead of using the standard
dependencies of the application.
Below is an example of using the singleton pattern without dependency injection:
class LogLevel(Enum):
ERROR = 1
WARN = 2
INFO = 3
DEBUG = 4
TRACE = 5
class StdOutLogger(Logger):
@staticmethod
def log(log_level: LogLevel, message: str):
# Log to standard output
class Application:
def run(self):
StdOutLogger.log(LogLevel.INFO, 'Starting application')
# ...
In the above example, we are using a static method of the hard-coded StdOutLogger class. It is difficult
to change the logger later and difficult to unit test a static method.
We should refactor the above code not to use static methods and to use dependency injection:
class LogLevel(Enum):
ERROR = 1
WARN = 2
INFO = 3
DEBUG = 4
TRACE = 5
class Logger(Protocol):
def log(self, log_level: LogLevel, message: str):
pass
class StdOutLogger(Logger):
def log(self, log_level: LogLevel, message: str):
# Log to standard output
class Application:
@inject
def __init__(self, logger: Logger = Provide['logger']):
self.__logger = logger
def run(self):
self.__logger.log(LogLevel.INFO, 'Starting application')
# ...
class DiContainer(containers.DeclarativeContainer):
wiring_config = containers.WiringConfiguration(
modules=['Application']
)
logger = providers.Singleton(StdOutLogger)
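A minimal sketch of the application startup (assuming the dependency-injector library applies the wiring_config when the container is instantiated):
di_container = DiContainer()  # Wires the 'Application' module (assumed)
application = Application()   # The logger is injected by the @inject decorator
application.run()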
Now it is easy to switch to a different logger. Let’s say that we want to log to a file instead of standard
output. We can introduce a new class for the file-based logger (following the open-closed principle):
Figure 4.84. FileLogger.py
class FileLogger(Logger):
def __init__(self, log_file_directory: str):
self.__log_file_directory = log_file_directory
Then we can change the DI container to use the file-based logger instead of the stdout logger:
Figure 4.85. DiContainer.py
import os
class DiContainer(containers.DeclarativeContainer):
wiring_config = containers.WiringConfiguration(
modules=['Application']
)
We can also change the logging behavior dynamically based on the environment where the
application is running:
import os
class DiContainer(containers.DeclarativeContainer):
wiring_config = containers.WiringConfiguration(
modules=['Application']
)
if os.environ.get('LOG_DESTINATION') == 'file':
logger = providers.Singleton(FileLogger, os.environ.get('LOG_DIRECTORY'))
else:
logger = providers.Singleton(StdOutLogger)
For all you full-stack Python developers, below is a TypeScript example of a data-visualization-web-
client where the noicejs NPM library is used for dependency injection. This library is similar to the
Google Guice library. Below is a FakeServicesModule class that configures dependencies for different
backend services that the web client uses. As you can notice, all the services are configured to use fake
implementations because this DI module is used when the backend services are not yet available. A
RealServicesModule class can be implemented and used when the backend services become available.
In the RealServicesModule class, the services are bound to their actual implementation classes instead
of fake implementations.
this.bind('measureService')
.toInstance(new FakeMeasureService());
this.bind('dimensionService')
.toInstance(new FakeDimensionService());
this.bind('chartDataService')
.toInstance(new FakeChartDataService());
);
}
}
With the noicejs library, you can configure several DI modules and create a DI container from the
wanted modules. The module approach lets you divide dependencies into multiple modules, so you
don’t have a single big module. It also lets you instantiate a different module or modules based on
the application configuration.
In the below example, the DI container is created from a single module, an instance of the
FakeServicesModule class:
In the development phase, we could create two separate modules, one for fake services and another one
for real services, and control the application behavior based on the web page’s URL query parameter:
Figure 4.88. diContainer.ts
import { Container } from 'noicejs';
import FakeServicesModule from './FakeServicesModule';
import RealServicesModule from './RealServicesModule';
Then you must configure the diContainer before dependency injection can be used. In the below
example, the diContainer is configured before a React application is rendered:
Figure 4.89. app.ts
import React from 'react';
import ReactDOM from 'react-dom';
import diContainer from './diContainer';
import AppView from './app/view/AppView';
diContainer.configure().then(() => {
ReactDOM.render(<AppView />, document.getElementById('root'));
});
Then, in Redux actions, where you need a service, you can inject the required service with the @Inject
decorator. You specify the name of the service you want to inject. The service will be injected as the
class constructor argument’s property (with the same name).
// Imports ...
type ConstructorArgs = {
chartDataService: ChartDataService,
chart: Chart,
dispatch: Dispatch;
};
export default
@Inject('chartDataService')
class StartFetchChartDataAction extends AbstractChartAreaAction {
private readonly chartDataService: ChartDataService;
private readonly chart: Chart;
constructor({ chart,
chartDataService,
dispatch }: ConstructorArgs) {
super(dispatch);
this.chartDataService = chartDataService;
this.chart = chart;
}
this.chart.isFetchingChartData = true;
return ChartAreaStateUpdater
.getNewStateForChangedChart(currentState, this.chart);
}
}
// Imports...
constructor(reduxDispatch: ReduxDispatch) {
this.dispatch = (action: AbstractAction<any>) =>
reduxDispatch({ type: action });
}
dispatchWithDi(
diContainer: { create: (...args: any[]) => Promise<any> },
ActionClass:
abstract new (...args: any[]) => AbstractAction<any>,
otherArgs: {}
) {
// diContainer.create will create a new object of
// class ActionClass.
// The second parameter of the create function defines
// additional properties supplied to ActionClass constructor.
// The create method is asynchronous. When it succeeds,
// the created action object is available in the 'then'
// function and it can be now dispatched
diContainer
.create(ActionClass, {
dispatch: this.dispatch,
...otherArgs
})
.then((action: any) => this.dispatch(action));
}
}
class InputMessage(Protocol):
def try_decode_schema_id(self) -> int:
pass
class AvroBinaryKafkaInputMessage(InputMessage):
def __init__(self, kafka_message: KafkaMessage):
self.__kafka_message = kafka_message
If we wanted to introduce a new Kafka input message class for JSON, CSV, or XML format, we could
create a class like the AvroBinaryKafkaInputMessage class. But then we can notice the duplication of
code in the try_decode method. We can notice that the try_decode method is the same regardless of
the input message source and format. According to this principle, we should move the duplicate code
to a common base class, AbstractInputMessage. We could make the try_decode method a template
method according to the template method pattern and create abstract methods for getting the message
data and its length:
class AbstractInputMessage(InputMessage):
@abstractmethod
def try_decode_schema_id(self) -> int:
pass
@abstractmethod
def _get_data(self) -> bytearray:
pass
@abstractmethod
def _get_length(self) -> int:
pass
# Template method
@final
def try_decode(self, schema: Schema) -> DecodedMessage:
return schema.try_decode_message(
self._get_data(),
self._get_length()
)
class AbstractKafkaInputMessage(AbstractInputMessage):
    def __init__(self, kafka_message: KafkaMessage):
        self.__kafka_message = kafka_message

    @abstractmethod
    def try_decode_schema_id(self) -> int:
        pass

    def _get_data(self) -> bytearray:
        return self.__kafka_message.payload

    def _get_length(self) -> int:
        # Assumed completion: the length of the message payload
        return len(self.__kafka_message.payload)
class AvroBinaryKafkaInputMessage(AbstractKafkaInputMessage):
def try_decode_schema_id(self) -> int:
# Try decode the schema id from the beginning of
# the Avro binary Kafka message
# Use base class _get_data() and _get_length()
# methods to achieve that
In a CSS file, you define CSS properties for CSS classes, for example:
.icon {
background-repeat: no-repeat;
background-size: 1.9rem 1.9rem;
display: inline-block;
height: 2rem;
margin-bottom: 0.2rem;
margin-right: 0.2rem;
width: 2rem;
}
.pie-chart-icon {
background-image: url('pie_chart_icon.svg');
}
The problem with the above approach is that it is not correctly object-oriented. In the HTML code,
you must list all the class names to achieve a mixin of all the needed CSS properties. It is easy to
forget to add a class name. For example, you could specify pie-chart-icon only and forget to specify
the icon.
It is also difficult to change the inheritance hierarchy afterward. Suppose you wanted to add a new
class chart-icon for all the chart icons:
.chart-icon {
// Define properties here...
}
You would have to remember to add the chart-icon class name to all places in the HTML code where
you are rendering chart icons.
The above-described approach is very error-prone. What you should do is introduce proper object-
oriented design. You need a CSS preprocessor that makes extending CSS classes possible. In the below
example, I am using SCSS:
<span class="pieChartIcon">...</span>
.icon {
background-repeat: no-repeat;
background-size: 1.9rem 1.9rem;
display: inline-block;
height: 2rem;
margin-bottom: 0.2rem;
margin-right: 0.2rem;
width: 2rem;
}
.chartIcon {
  @extend .icon;
}

.pieChartIcon {
  @extend .chartIcon;
  background-image: url('../../../../../assets/images/icons/chart/pie_chart_icon.svg');
}
In the above example, we define only one class for the HTML element. The inheritance hierarchy
is defined in the SCSS file using the @extend directive. We are now free to change the inheritance
hierarchy in the future without any modification needed in the HTML code.
5: Coding Principles
This chapter presents principles for coding. The following principles are presented:
At best, having your code written with great names makes it read like prose. And remember that
code is more often read than written, so code must be easy to read and understand.
Naming variables with names that also convey information about the variable’s type is crucial in
untyped languages and beneficial in typed languages, too, because modern typed languages use
automatic type deduction, and you won’t always see the actual type of a variable. But when the
variable’s name tells its type, it does not matter if the type name is not visible.
As a rule of thumb, if a variable name is 20 or more characters long, you should consider making
it shorter. Try to abbreviate one or more words in the variable name, but only use meaningful and
well-known abbreviations. If such abbreviations don't exist, then don't abbreviate at all. For
example, if you have a variable named environment_variable_name, you should try to shorten it,
because it is over 20 characters long. You can abbreviate environment to environ and variable to var
resulting in a variable name environ_var_name which is short enough. Both abbreviations environ
and var are commonly used and well understood. Let’s have another example of a variable named
loyalty_bonus_percentage. You cannot abbreviate loyalty. You cannot abbreviate bonus. But you can
abbreviate percentage to percent or even pct. I would rather use percent instead of pct. Using percent
makes the variable name shorter than 20 characters (underscores are not counted as variable name
characters).
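For illustration, a minimal sketch of the shortenings described above (the assigned values are placeholders):

# Too long (26 characters): shorten using common, well-known abbreviations
environment_variable_name = 'LOG_LEVEL'
environ_var_name = 'LOG_LEVEL'

# 'loyalty' and 'bonus' cannot be abbreviated, but 'percentage' can
loyalty_bonus_percentage = 2.5
loyalty_bonus_percent = 2.5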
In the following sections, naming conventions for different types of variables are proposed.
We can change the variable name to pool_is_full to make the if-statement read more fluently. In the
below example, the if-statement reads "if pool_is_full" instead of "if is_pool_full".
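A minimal sketch of the renaming described above (the pool variable and its size limit are assumptions):

max_pool_size = 10
pool: list[object] = []

# Reads fluently: "if pool is full" (instead of "if is pool full")
pool_is_full = len(pool) >= max_pool_size
if pool_is_full:
    pass  # handle the full pool here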
Don’t use boolean variable names in the form of <passive-verb>_something, like inserted_field,
because this can confuse the reader. It is unclear if the variable name is a noun that names an object
or a boolean statement. Instead, use either did_insert_field or field_was_inserted.
Below is an example of the incorrect naming of a variable used to store a function return value. The
drop_redundant_tables function returns a boolean. Someone might think that tables_dropped means
a list of dropped table names. So, the name of the variable is obscure and should be changed.
tables_dropped = drop_redundant_tables(
prefix,
vms_data,
config.database,
hive_client,
logger
)
if tables_dropped:
# ...
Below is the above example modified so that the variable name is changed to indicate a boolean
statement:
tables_were_dropped = drop_redundant_tables(
prefix,
vms_data,
config.database,
hive_client,
logger
)
if tables_were_dropped:
# ...
You could have used a variable named did_drop_tables, but the tables_were_dropped makes the if-
statement more readable. If the return value of the drop_redundant_tables function was a list of
dropped table names, I would name the return value receiving variable as dropped_table_names.
When you read code containing a negated boolean variable, it usually reads bad, for example:
app_was_started = app.start()
if not app_was_started:
# ...
What you can do is to mentally move the not word to the correct place in the sentence to make the
sentence read like proper English. For example: if app was not started
The other option is to negate the variable. That is done by negating both sides of the assignment by
adding not on both sides of the assignment operator. Here is an example:
app_was_not_started = not app.start()
if app_was_not_started:
# ...
try:
year = int(year_as_string)
except ValueError:
# ...
If you have a variable that could be confused with an object variable, like schema, but it is a string,
add string to the end of the variable name, i.e. schema_string. Here is an example:
schema = schema_parser.parse(schema_string)
if result == Result.Ok:
# ...
Let’s add some detail and context to the result variable name:
producer_create_result = pulsar_client.create_producer(...)
if producer_create_result == Result.Ok:
# ...
customers = [...]
processed_customers = process(customers)
integers = [1, 2, 3, 4, 5]
even_integers = filter(even, integers)
In most cases, this is enough because you don’t necessarily need to know the underlying collection
implementation. Using this naming convention allows you to change the type of a collection variable
without needing to change the variable name. If you are iterating over a collection, it does not matter
if it is an array, list, or set. Thus, it does not bring any benefit if you add the collection type name to
the variable name, for example, customer_list or task_set. Those names are just longer. You might
want to specify the collection type in some special cases. Then, you can use the following kind of
variable names: queue_of_tasks, stack_of_cards, or set_of_timestamps.
Below is an example, where the function is named correctly to return a collection (of categories),
but the variable receiving the return value is not named according to the collection variable naming
convention:
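A minimal sketch of such a case (the get_categories function is an assumption, not from the original):

def get_categories() -> list[str]:
    return ['electronics', 'books', 'clothing']

# Incorrect: the singular name suggests a single object
category = get_categories()

# Correct: the plural name follows the collection naming convention
categories = get_categories()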
Name a dictionary (map) variable using the pattern <key>_to_<value>. This makes code that accesses the dictionary read naturally, for example:
order_count = customer_name_to_order_count.get(customer_name)
suppliers = product_name_to_suppliers.get(product_name)
customer_name_to_order_count = {
'John': 10,
'Peter': 5
}
for (
customer_name,
order_count,
) in customer_name_to_order_count.items():
print(customer_name, order_count)
Name an object variable after its class: a person object of the Person class, an account object of the Account class, etc. You can freely decorate
the object’s name, for example, with an adjective: completed_task. It is important to include the class
name or at least some significant part of it at the end of the variable name. Then looking at the end
of the variable name tells what kind of object is in question.
Sometimes you might want to name an object variable so that the name of its class is implicit, for
example:
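For illustration, a sketch of such naming (the Address class and the example values are assumptions):

from dataclasses import dataclass

@dataclass
class Address:
    street_address: str
    city: str

home = Address('Mannerheimintie 1', 'Helsinki')
destination = Address('Aleksanterinkatu 52', 'Helsinki')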
In the above example, the classes of home and destination objects are not explicit. In most cases,
it is preferable to make the class name explicit in the variable name when it does not make the
variable name too long. This is because of the variable type deduction. The types of variables are
not necessarily visible in the code, so the type of a variable should be communicated by the variable
name. Below is an example where the types of function parameters are explicit.
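Continuing the sketch above (Address as defined there; the Route class and plan_route function are assumptions):

class Route:
    ...

def plan_route(home_address: Address, destination_address: Address) -> Route:
    # The class of each parameter is explicit in the parameter name itself
    return Route()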
When you create optional types using a type union, you don’t need any prefixes in optional variable
names. In the below example, the discount parameter is optional:
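A minimal sketch of such an optional parameter (the calculate_price function is an assumption):

def calculate_price(base_price: float, discount: float | None = None) -> float:
    # No 'optional_' or 'maybe_' prefix is needed; the type union communicates optionality
    if discount is None:
        return base_price
    return base_price - discount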
price_with_tax = add_tax(price_without_tax)
values = [1, 2, 3, 4, 5]
doubled_values = [doubled(value) for value in values]
doubled_values2 = list(map(doubled, values))
squared_values = [squared(value) for value in values]
squared_values2 = list(map(squared, values))
even_values = [value for value in values if even(value)]
even_values2 = list(filter(even, values))
def sum(accum_sum: int | float, number: int | float) -> int | float:
return accum_sum + number
If the callback function is very simple and short like the doubled and squared functions are, we can
inline them in Python list comprehensions making them a bit shorter:
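A minimal sketch of inlining the simple doubled and squared callbacks from the example above:

values = [1, 2, 3, 4, 5]

# The doubled and squared helper functions are inlined as expressions
doubled_values = [2 * value for value in values]
squared_values = [value ** 2 for value in values]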
To understand what happens in the above code, you should start reading from the innermost function
call and proceed toward the outermost function call. When traversing the function call hierarchy, the
difficulty lies in storing and retaining information about all the nested function calls in short-term
memory.
We could simplify reading the above example by giving a name to the anonymous function and
introducing variables (constants) for intermediate function call results. Of course, our code becomes
more prolonged, but coding is not a competition to write the shortest possible code but to write
the shortest, most readable, and understandable code for other people and your future self. It is a
compiler’s job to compile the below longer code into as efficient code as the above shorter code.
Below is the above code refactored:
Let’s think hypothetically: if Clojure’s map function took parameters in a different order and the range
function was named integers and the take function was named take-first (like take-last), we would
have an even more explicit version of the original code:
class Order:
def __init__(self, order_id: int, order_state: OrderState):
self.__order_id = order_id
self.__order_state = order_state
class Order:
def __init__(self, id_: int, state: OrderState):
self.__id = id_
self.__state = state
If you have a class property to store a callback function (e.g., event handler or lifecycle callback),
you should name it so that it tells on what occasion the stored callback function is called. Name
properties storing event handlers using the following pattern: on + <event-type>, e.g., on_click or
on_submit. Name properties storing lifecycle callbacks in a similar way you would name a lifecycle
method, for example: on_init, after_mount, or before_mount.
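A minimal sketch of the naming pattern described above (the Button class is an assumption):

from typing import Callable

class Button:
    def __init__(self, on_click: Callable[[], None]):
        # The stored callback is named after the event that triggers it
        self.on_click = on_click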
When picking a name for something, use the most common shortest name. If you have a function
named relinquish_something, consider a shorter and more common name for the function. You could
rename the function to release_something, for example. The word “release” is shorter and more
common than the “relinquish” word. Use Google to search for word synonyms, e.g., “relinquish
synonym”, to find the shortest and most common similar term.
Let’s assume that you are building a data exporter microservice and you are currently using the
following terms in the code: message, report, record and data. Instead of using four different terms
to describe the same thing, you should pick just one term, like message, for example, and use it
consistently throughout the microservice code.
Many abbreviations are commonly used, like str for a string, num/nbr for a number, prop for a
property, or val for a value. Most programmers use these, and I use them to make long names
shorter. If a variable name is short enough without abbreviation, the full name should be used, like
number_of_items instead of nbr_of_items. Use abbreviations in cases where the variable name otherwise becomes too
long (20 or more characters). What I especially try to avoid is using uncommon abbreviations. For
example, I would never abbreviate amount to amnt or discount to dscnt because I haven’t seen those
abbreviations used much in real life.
Names that are too short do not communicate what the variable is about. Avoid using a single-character
variable name, such as a plain 'i' as the counter of a loop that starts five threads. Instead, use a proper
variable name that indicates what the loop counter is for. Both versions are sketched below:
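A minimal sketch of both versions (the threads list and the printing thread targets are assumptions):

from threading import Thread

threads = []

# Unclear: what does the counter 'i' refer to?
for i in range(5):
    threads.append(Thread(target=print, args=(i,)))

# Clear: the counter name tells what is being counted
for thread_index in range(5):
    threads.append(Thread(target=print, args=(thread_index,)))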
If you don’t need to use the loop counter value inside the loop, you can use an underscore as the loop
variable name to indicate that it is not used. The below loop loops object_count times:
for _ in range(object_count):
objects.append(acquire_object())
You can create source code repositories containing starter projects per each technology stack to
ensure the uniformity of the repository structures. Below is an example how to structure source code
repositories for a Python microservice. In the below example, a containerized (Docker) microservice
deployed to a Kubernetes cluster is assumed. Your CI tool might require that the CI pipeline code
must reside in a specific directory. But if not, place the CI pipeline code in a ci directory.
my-python-service
├── ci
│ └── Jenkinsfile
├── docker
│ ├── Dockerfile
│ └── docker-compose.yml
├── docs
├── env
│ ├── .env.dev
│ └── .env.ci
├── helm
│ └── my-python-service
│ ├── templates
│ ├── .helmignore
│ ├── Chart.yaml
│ ├── values.schema.json
│ └── values.yaml
├── integration-tests
│ ├── features
│ │ └── feature1.feature
│ └── steps
├── scripts
│ └── // Bash scripts here...
├── src
├── venv
├── .gitignore
├── .pylintrc
└── README.MD
Usually unit tests should be located in the same directory as source code modules, but you can also
put them in a specific test directory.
Below is an example of a microservice’s src directory that is not organized by domains but is
incorrectly organized according to technical details:
example-service/
└── src/
├── controllers/
│ ├── AController.py
│ └── BController.py
├── entities/
│ ├── AEntity.py
│ └── BEntity.py
├── errors/
│ ├── AError.py
│ └── BError.py
├── dtos/
│ ├── ADto.py
│ └── BDto.py
├── repositories/
│ ├── ARepository.py
│ └── BRepository.py
└── services/
├── AService.py
└── BService.py
Below is the above example modified so that directories are organized by domains:
example-service/
└── src/
├── domainA/
│ ├── AController.py
│ ├── ADto.py
│ ├── AEntity.py
│ ├── AError.py
│ ├── ARepository.py
│ └── AService.py
└── domainB/
├── BController.py
├── BDto.py
├── BEntity.py
├── BError.py
├── BRepository.py
└── BService.py
example-service/
└── src/
├── domainA/
│ ├── domainA-1/
│ │ ├── A1Controller.py
│ │ └── ...
│ └── domainA-2/
│ ├── A2Controller.py
│ └── ...
└── domainB/
├── BController.py
└── ...
If you want, you can create subdirectories for technical details inside a domain directory. This is
the recommended approach if, otherwise, the domain directory would contain more than 5 to 7 files.
Below is an example of the salesitem domain:
sales-item-service
└── src
└── salesitem
├── dtos
│ ├── InputSalesItem.py
│ └── OutputSalesItem.py
├── entities
│ └── SalesItem.py
├── errors
│ ├── SalesItemServiceError.py
│ ├── Error1.py
│ └── Error2.py
├── repository
│ ├── SalesItemRepository.py
│ └── SalesItemRepositoryImpl.py
├── service
│ ├── SalesItemService.py
│ └── SalesItemServiceImpl.py
└── SalesItemController.py
To highlight the clean microservice design principle, we could also use the following kind of directory
layout:
sales-item-service
└── src
└── salesitem
├── businesslogic
│ ├── dtos
│ │ ├── InputSalesItem.py
│ │ └── OutputSalesItem.py
│ ├── entities
│ │ └── SalesItem.py
│ ├── errors
│ │ ├── SalesItemServiceError.py
│ │ ├── Error1.py
│ │ └── Error2.py
│ ├── repository
│ │ └── SalesItemRepository.py
│ └── service
│ ├── SalesItemService.py
│ └── SalesItemServiceImpl.py
├── SalesItemController.py
└── SalesItemRepositoryImpl.py
sales-item-service
└── src
└── salesitem
├── businesslogic
│ ├── dtos
│ │ ├── InputSalesItem.py
│ │ └── OutputSalesItem.py
│ ├── entities
│ │ └── SalesItem.py
│ ├── errors
│ │ ├── SalesItemServiceError.py
│ │ ├── Error1.py
│ │ └── Error2.py
│ ├── repository
│ │ └── SalesItemRepository.py
│ └── service
│ ├── SalesItemService.py
│ └── SalesItemServiceImpl.py
├── controllers
│ ├── FlaskRestSalesItemController.py
│ └── AriadneGraphQlSalesItemController.py
└── ifadapters
├── SqlSalesItemRepository.py
└── MongoDbSalesItemRepository.py
In the above example, when following the clean microservice design principle, if you add or change
a controller or an interface adapter, you should not need to make any changes to the business logic
part of the service.
Below is the source code directory structure for the data exporter microservice designed in the
previous chapter. There are subdirectories for the four subdomains: input, internal message,
transformer, and output. There is a subdirectory created for each common nominator in the class
names. It is effortless to navigate the directory tree when locating a particular file. Also, the number
of source code files in each directory is low. You can grasp the contents of a directory with a glance.
The problem with directories containing many files is that it is not easy to find the wanted file. For
this reason, a single directory should ideally have 2-4 files. The absolute maximum is 5-7 files.
Note that below, a couple of directories are left unexpanded to shorten the example. It should be easy
for the reader to infer the contents of the unexpanded directories.
src
├── common
├── input
│ ├── config
│ │ ├── parser
│ │ │ ├── InputConfigParser.py
│ │ │ └── JsonInputConfigParser.py
│ │ ├── reader
│ │ │ ├── InputConfigReader.py
│ │ │ └── LocalFileSystemInputConfigReader.py
│ │ ├── InputConfig.py
│ │ └── InputConfigImpl.py
│ └── message
│ ├── consumer
│ │ ├── InputMsgConsumer.py
│ │ └── KafkaInputMsgConsumer.py
│ ├── decoder
│ │ ├── InputMsgDecoder.py
│ │ └── AvroBinaryInputMsgDecoder.py
│ ├── InputMessage.py
│ └── KafkaInputMessage.py
├── internalmessage
│ ├── field
│ ├── InternalMessage.py
│ └── InternalMessageImpl.py
├── transformer
│ ├── config
│ ├── field
│ │ ├── impl
│ │ │ ├── CopyFieldTransformer.py
│ │ │ ├── ExprFieldTransformer.py
│ │ │ ├── FilterFieldTransformer.py
│ │ │ └── TypeConvFieldTransformer.py
│ │ ├── FieldTransformer.py
│ │ ├── FieldTransformers.py
│ │ └── FieldTransformersImpl.py
│ └── message
│ ├── MsgTransformer.py
│ └── MsgTransformerImpl.py
└── output
├── config
└── message
├── encoder
└── producer
We could also structure the code according to the clean microservice design in the following way:
src
├── common
├── businesslogic
│ ├── input
│ │ ├── config
│ │ │ ├── InputConfig.py
│ │ │ ├── InputConfigImpl.py
│ │ │ ├── InputConfigParser.py
│ │ │ └── InputConfigReader.py
│ │ └── message
│ │ ├── InputMessage.py
│ │ ├── InputMsgConsumer.py
│ │ └── InputMsgDecoder.py
│ ├── internalmessage
│ │ ├── field
│ │ ├── InternalMessage.py
│ │ └── InternalMessageImpl.py
│ ├── transformer
│ │ ├── config
│ │ ├── field
│ │ │ ├── impl
│ │ │ │ ├── CopyFieldTransformer.py
│ │ │ │ ├── ExprFieldTransformer.py
│ │ │ │ ├── FilterFieldTransformer.py
│ │ │ │ └── TypeConvFieldTransformer.py
│ │ │ ├── FieldTransformer.py
│ │ │ ├── FieldTransformers.py
│ │ │ └── FieldTransformersImpl.py
│ │ └── message
│ │ ├── MsgTransformer.py
│ │ └── MsgTransformerImpl.py
│ └── output
│ ├── config
│ └── message
└── ifadapters
├── config
│ ├── parser
│ │ └── json
│ │ ├── JsonInputConfigParser.py
│ │ ├── JsonTransformerConfigParser.py
│ │ └── JsonOutputConfigParser.py
│ └── reader
│ └── localfilesystem
│ ├── LocalFileSystemInputConfigReader.py
│ ├── LocalFileSystemTransformerConfigReader.py
│ └── LocalFileSystemOutputConfigReader.py
├── input
│ ├── kafka
│ │ ├── KafkaInputMsgConsumer.py
│ │ └── KafkaInputMessage.py
│ └── AvroBinaryInputMsgDecoder.py
└── output
├── CsvOutputMsgEncoder.py
└── PulsarOutputMsgProducer.py
From the above directory structure we can easily see the following:
• Configurations are in JSON format and read from the local file system
• Input messages are consumed from Kafka and decoded from the Avro binary format
• Output messages are encoded as CSV and produced to Pulsar
Any change we want/need to do in the ifadapters directory should not affect the business logic part
in the businesslogic directory.
Below is the source code directory structure for the anomaly detection microservice designed in
the previous chapter. The anomaly directory is expanded. We can see that our implementation
is using JSON for various parsing activities and self-organizing maps (SOM) is used for anomaly
detection. JSON and Kafka are used to publish anomaly indicators outside the microservice. Adding
new concrete implementations to the below directory structure is straightforward. For example, if
we wanted to add YAML support for configuration files, we could create yaml subdirectories where
we could place YAML-specific implementation classes.
src
├── anomaly
│ ├── detection
│ │ ├── configuration
│ │ │ ├── factory
│ │ │ │ ├── AnomalyDetectionConfigFactory.py
│ │ │ │ └── AnomalyDetectionConfigFactoryImpl.py
│ │ │ ├── parser
│ │ │ │ ├── AnomalyDetectionConfigParser.py
│ │ │ │ └── JsonAnomalyDetectionConfigParser.py
│ │ │ ├── AnomalyDetectionConfig.py
│ │ │ └── AnomalyDetectionConfigImpl.py
│ │ ├── engine
│ │ │ ├── AnomalyDetectionEngine.py
│ │ │ └── AnomalyDetectionEngineImpl.py
│ │ ├── rule
│ │ │ ├── factory
│ │ │ │ ├── AnomalyDetectionRuleFactory.py
│ │ │ │ └── AnomalyDetectionRuleFactoryImpl.py
│ │ │ ├── parser
│ │ │ │ ├── AnomalyDetectionRuleParser.py
│ │ │ │ └── AnomalyDetectionRuleParserImpl.py
│ │ │ ├── AnomalyDetectionRule.py
│ │ │ └── AnomalyDetectionRuleImpl.py
│ │ ├── AnomalyDetector.py
│ │ └── AnomalyDetectorImpl.py
│ ├── indicator
│ │ ├── factory
│ │ │ ├── AnomalyIndicatorFactory.py
│ │ │ └── AnomalyIndicatorFactoryImpl.py
│ │ ├── publisher
│ │ │ ├── AnomalyIndicatorPublisher.py
│ │ │ └── KafkaAnomalyIndicatorPublisher.py
│ │ ├── serializer
│ │ │ ├── AnomalyIndicatorSerializer.py
│ │ │ └── JsonAnomalyIndicatorSerializer.py
│ │ ├── AnomalyIndicator.py
│ │ └── AnomalyIndicatorImpl.py
│ └── model
│ ├── factory
│ │ ├── AnomalyModelFactory.py
│ │ └── AnomalyModelFactoryImpl.py
│ ├── training
│ │ ├── engine
│ │ │ ├── AnomalyModelTrainingEngine.py
│ │ │ └── AnomalyModelTrainingEngineImpl.py
│ │ ├── AnomalyModelTrainer.py
│ │ └── SomAnomalyModelTrainer.py
│ ├── AnomalyModel.py
│ └── SomAnomalyModel.py
├── common
├── measurement
└── app.py
For full-stack Python developers, let’s have one more example with a data-visualization-web-client.
This web client’s UI consists of the following pages, which all include a common header:
• Dashboards
• Data Explorer
• Alerts
The Dashboards page contains a dashboard group selector, dashboard selector, and chart area to
display the selected dashboard’s charts. You can select the shown dashboard by first selecting a
dashboard group and then a dashboard from that group.
The Data Explorer page contains selectors for choosing a data source, measure(s), and dimension(s).
The page also contains a chart area to display charts. Using the selectors, a user can change the shown
measure(s) and dimension(s) for the currently selected chart in the chart area.
Based on the above design, the web client can be divided into the following subdomains:
• Common UI components
– Chart Area
* Chart
• Header
• Pages
– Alerts
– Dashboards
– Data Explorer
src
├── app
│ ├── common
│ │ └── chartarea
│ │ └── chart
│ ├── header
│ └── pages
│ ├── alerts
│ ├── dashboards
│ │ └── selectors
│ │ ├── dashboardgroup
│ │ └── dashboard
│ └── dataexplorer
│ └── selectors
│ ├── datasource
│ ├── dimension
│ └── measure
├── index.ts
└── store.ts
Below is an example of what a single subdomain directory can look like when using React, Redux
and SCSS modules:
src
├── app
│ └── header
│ ├── model
│ │ ├── actions
│ │ │ ├── AbstractHeaderAction.ts
│ │ │ └── NavigateToPageAction.ts
│ │ ├── services
│ │ └── state
│ │ ├── types
│ │ ├── HeaderState.ts
│ │ └── initialHeaderState.ts
│ ├── view
│ │ ├── navigation
│ │ │ ├── NavigationView.module.scss
│ │ │ └── NavigationView.tsx
│ │ ├── HeaderView.module.scss
│ │ └── HeaderView.tsx
│ └── headerController.ts
├── index.ts
└── store.ts
In the above example, we have created two directories for the technical details of the header domain:
model and view directories. The model directory contains actions, services, and the state, and the
view directory contains the view component, its possible subcomponents and CSS definitions. The
model’s state directory can contain a subdirectory for types used in the subdomain state. The state
directory should always contain the type definition for the subdomain’s state and the initial state.
The services directory contains a service or services that use backend services to control the backend
model.
Comments can be problematic. You cannot trust them 100% because they can be misleading, outdated,
or downright wrong. You can only trust the source code itself. Comments are often entirely
unnecessary and only make the code more verbose. Allowing comments can also produce code with bad
names that are explained by attached comments, and overly long functions whose functionality blocks are
described with comments instead of being refactored into well-named functions. The following sections
describe several ways to avoid writing comments and still keep your code understandable. The following
things can be done to avoid writing comments:
• Name classes and functions so descriptively that comments become unnecessary.
– For example, if you are using a certain algorithm, don't document that algorithm in a
comment, but name the respective class/function so that it contains the algorithm name.
Readers can then google the algorithm by name if they are not familiar with it.
• You should not add comments about variable/function types. Use type annotations everywhere.
• You don’t need to comment that a function can raise an error. Use the function name try prefix
convention described later in this chapter.
• Don't add a comment to a piece of code; instead, extract a new well-named function.
• Keep your functions small. That way, they are easier to understand because they cannot
contain overly complex logic that would justify comments.
• Don’t add as a comment information that can be obtained from the version control system.
• Don’t comment out code. Just remove the unused code. The removed code will be available in
the version control system forever.
• You don’t have to comment the logic that a function uses. Code readers should be able to infer
that information from the code itself, and additionally also from the related unit tests. Complex
code logic and behaviour does not usually need comments if you have practised TDD and there
is a complete set of well-named unit tests available for the function in question.
Comments for a public API in a library are needed, because the library needs API documentation
that can be automatically generated from the comments to avoid situations where API comments
and docs are out of sync. In non-library software components, API documentation is usually not
needed, because you have access to the API interface, implementation and unit tests. The unit tests,
for example, specify what the function does in different scenarios. The unit test name tells the scenario
and the expectations and assertions in the unit test code tell the expected behaviour in the particular
situation. API implementation and unit tests are not typically available for library users, and even if
they are, a user should not rely on them, because they are internal details that are subject to change.
class MessageBuffer:
# Return False if buffer full,
# True if message written to buffer
def write(self, message: Message) -> bool:
# ...
class MessageBuffer:
def write(self, message: Message) -> bool:
# ...
Dropping the comment alone is not the best solution because some crucial information is now missing.
What does that boolean return value mean? It is not 100% clear. We can assume that returning True
means that the message was successfully written, but nothing is communicated about returning False.
We can only assume it indicates some error, but we cannot be sure what error.
In addition to removing the comment, we should give a better name for the function and rename it
as follows:
class MessageBuffer:
def write_if_buf_not_full(self, message: Message) -> bool:
pass
Now the purpose of the function is clear, and we can be sure what the boolean return value means.
It means whether the message was written to the buffer. Now we also know why the writing of a
message can fail: the buffer is full. This will give the function caller sufficient information about
what to do next. It should probably wait a while so that the buffer reader has enough time to read
messages from the buffer and free up some space.
Below is a real-life example of C++ code where the comment and the function name does not match:
/**
* @brief Add new counter or get existing, if same labels used already.
* @param counterName Name of the counter
* @param help Help text added for counter, if new countername
* @param labels Specific labels for counter.
* @return counter pointer used when increasing counter, or nullptr
* if metrics not initialized or invalid name or labels
*/
static prometheus::Counter* addCounter(
std::string counterName,
std::string help,
const std::map<std::string, std::string>& labels);
In the above example, the function name tells that it adds a counter, but the comment says it adds or gets an
existing counter. The real problem is that once someone reads the function name 'addCounter',
she/he does not necessarily read the 'brief' in the comment, because she/he immediately understands
what the function does from its name: it should add a counter. As a solution, we could improve
the name of the function to be addOrGetExistingCounter.
Below is a real-life example from a book that I once read:
class Mediator(Protocol):
# To register an employee
def register(self, person: Person) -> None:
pass
There are three functions in the above example, each of which has a problem. The first function is
registering a person, but the comment says it is registering an employee. So, there is a mismatch
between the comment and the code. In this case, I trust the code over the comment. The correction
is to remove the comment because it does not bring any value. It only causes confusion.
The second function says in the comment that it sends a message from one employee to another. The
function name tells about connecting employees, but the parameters are persons. I assume that a part
of the comment is correct: to send a message from someone to someone else. But once again, I trust
the code more over the comment and assume the message is sent from one person to another. We
should remove the comment and rename the function.
In the third function, the comment adds information missing from the function name. The comment
also discusses members, as other parts of the code speak about employees and persons. There are
three different terms used: employee, person, and member. Just one term should be picked. Let’s
choose the term person and use it systematically.
Below is the refactored version without the comments:
class Mediator(Protocol):
def register(self, person: Person) -> None:
pass
class Metrics(Protocol):
# ...
def add_counter(
self,
counter_family: CounterFamily,
labels: dict[str, str]
) -> int:
pass
def increment_counter(
self,
counter_index: int,
increment_amount: int
) -> None:
pass
What is the return value of the add_counter function? Someone might think a comment is needed to
describe the return value because it is unclear what int means. Instead of writing a comment, we can
introduce a named value (= variable/constant) to be returned from the function. The idea behind the
named return value is that it communicates the semantics of the return value without the need for a
comment. Below is the implementation for the add_counter function:
class Metrics(Protocol):
def add_counter(
self,
counter_family: CounterFamily,
labels: dict[str, str]
) -> int:
# Perform adding a counter here and
# set value for the 'counter_index' variable
return counter_index
In the above implementation, we have a single return of a named value at the end of the function. All
we have to do is to look at the end of the function and spot the return statement, which should tell
us the meaning of the mysterious int typed return value: It is a counter index. And we can spot that
the increment_counter function requires a counter_index argument, and this establishes a connection
between calling the add_counter function first, storing the returned counter index, and later using
that stored counter index in calls to the increment_counter function.
Alternatively, we can introduce a type alias so that the return type itself communicates the meaning of the return value:
CounterIndex = int
class Metrics(Protocol):
# ...
def add_counter(
self,
counter_family: CounterFamily,
labels: dict[str, str]
) -> CounterIndex:
pass
def increment_counter(
self,
counter_index: CounterIndex,
increment_amount: int
) -> None:
pass
We can improve the above metrics example. First of all, we should avoid the primitive type obsession.
We should not be returning an index from the add_counter method, but we should rename the method
as create_counter and return an instance of a Counter class from the method. Then we should make
the example more object-oriented by moving the increment_counter method to the Counter class
and renaming it just increment. Also, the name of the Metrics class itself should be changed to
MetricFactory.
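A minimal sketch of that more object-oriented design (the method bodies and the CounterFamily type are assumptions):

from typing import Protocol

class Counter(Protocol):
    def increment(self, amount: int = 1) -> None:
        ...

class MetricFactory(Protocol):
    def create_counter(
        self,
        counter_family: 'CounterFamily',
        labels: dict[str, str]
    ) -> Counter:
        ...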
class MessageBuffer:
def write_if_buf_not_full(self, message: Message) -> bool:
message_was_written = False
if len(self.__messages) < self.__max_length:
# Buffer is not full
self.__messages.append(message)
message_was_written = True
return message_was_written
By introducing a constant to be used in the “buffer is full” check, we can get rid of the “Buffer is not
full” comment:
class MessageBuffer:
def write_if_buf_not_full(self, message: Message) -> bool:
message_was_written = False
buffer_is_not_full = len(self.__messages) < self.__max_length
if buffer_is_not_full:
self.__messages.append(message)
message_was_written = True
return message_was_written
import sys
application = Application()
if application.run():
    # Application was run successfully
    sys.exit(0)
else:
    sys.exit(1)
Let’s introduce an enumerated type, ExitCode, and use it instead of magic numbers:
import sys
from enum import IntEnum
class ExitCode(IntEnum):
Success = 0
Failure = 1
application = Application()
app_was_successfully_run = application.run()
exit_code = (
ExitCode.Success if app_was_successfully_run else ExitCode.Failure
)
sys.exit(exit_code)
It is now easy to add more exit codes with descriptive names later if needed.
class MessageBuffer:
def write_fitting(self, messages: list[Message]) -> None:
if len(self.__messages) + len(messages) <= self.__max_length:
# All messages fit in buffer
self.__messages.extend(messages)
messages.clear()
else:
# All messages do not fit, write only messages that fit
nbr_of_msgs_that_fit = self.__max_length - len(self.__messages)
self.__messages.extend(messages[:nbr_of_msgs_that_fit])
del messages[:nbr_of_msgs_that_fit]
Here is the same code with comments refactored out by extracting two new methods:
class MessageBuffer:
def write_fitting(self, messages: list[Message]) -> None:
all_messages_fit = len(self.__messages) + len(messages) <= self.__max_length
if all_messages_fit:
self.__write_all(messages)
else:
self.__write_only_fitting(messages)
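The two extracted methods are not shown above; here is a minimal sketch of what they might look like, assuming the same attributes and Message type as in the earlier version:

class MessageBuffer:
    # ...

    def __write_all(self, messages: list[Message]) -> None:
        self.__messages.extend(messages)
        messages.clear()

    def __write_only_fitting(self, messages: list[Message]) -> None:
        nbr_of_msgs_that_fit = self.__max_length - len(self.__messages)
        self.__messages.extend(messages[:nbr_of_msgs_that_fit])
        del messages[:nbr_of_msgs_that_fit]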
Performing some actions using a shell script is sometimes easier. Because the syntax and commands in
shell scripts can be hard to understand, many developers tend to solve the problem by adding comments to scripts.
Next, alternative ways to make scripts more understandable without comments are presented. Let’s
consider the below example from one real-life script I have bumped into:
create_network() {
#create only if not existing yet
if [[ -z "$(docker network ls | grep $DOCKER_NETWORK_NAME )" ]];
then
echo Creating $DOCKER_NETWORK_NAME
docker network create $DOCKER_NETWORK_NAME
else
echo Network $DOCKER_NETWORK_NAME already exists
fi
}
• The comment was removed, and the earlier commented expression was moved to a well-named
function
• The negation in the expression was removed, and the contents of the then and else branches
were swapped
• Variable names were made camel case to enhance readability
dockerNetworkExists() {
  # Succeeds when the network name is found in the 'docker network ls' output
  [[ -n "$(docker network ls | grep $1)" ]]
}

createDockerNetwork() {
  if dockerNetworkExists $networkName; then
    echo Docker network $networkName already exists
  else
    echo Creating Docker network $networkName
    docker network create $networkName
  fi
}
If your script accepts arguments, give the arguments proper names, for example:
dataFilePathName=$1
schemaFilePathName=$2
The script reader does not have to remember what $1 or $2 means, and you don't have to insert any
comments to clarify the meaning of the arguments.
If you have a complex command in a Bash shell script, you should not attach a comment to it but
extract a function with a proper name to describe the command.
In the below example, a complex sed command is extracted into a well-named function instead of attaching a comment to it:
updateHelmChartVersionInChartYamlFile() {
sed -i "s/^version:.*/version: $1/g" helm/service/Chart.yaml
}
updateHelmChartVersionInChartYamlFile $version
Here is another example where a complex awk command is wrapped in a well-named function:
getFileLongestLineLength() {
echo $(awk '{ if (length($0) > max) max = length($0) } END { print max }' $1)
}
A single return statement with a named value at the end of a function clearly communicates the
return value semantics if the return value type does not directly communicate it. For example, if you
return a value of a primitive type like an integer or boolean from a function, it is not necessarily 100%
clear what the return value means. But when you return a named value at the end of the function,
the name of the returned variable communicates the semantics.
You might think that being unable to return a value in the middle of a function would make the
function less readable because of lots of nested if-statements. This is possible, but one should
remember that a function should be small. Aim to have a maximum of 5-9 lines of statements in
a single function. Following that rule, you never have a hell of nested if-statements inside a single
function.
Having a single return statement at the end of a function makes refactoring the function easier.
You can use automated refactoring tools provided by your IDE. It is always harder to extract a new
function from code containing a return statement. The same is true for loops with a break or continue
statement. It is easier to refactor code inside a loop that does not contain a break or continue statement.
In some cases, returning a single value at the end of a function makes the code more straightforward
and requires fewer lines of code.
Below is an example of a function with two return locations:
class TransformThread(Thread):
# ...
return True
When analyzing the above function, we notice that it transforms an input message into an output
message. We can conclude that the function returns True on successful message transformation. We
can shorten the function by refactoring it to contain only one return statement. After refactoring, it
is 100% clear what the function return value means.
from threading import Thread
class TransformThread(Thread):
# ...
return msg_was_transformed
As an exception to this rule, you can have multiple return statements in a function when the function
has optimal length and would become too long if it is refactored to contain a single return statement.
Additionally, it is required that the semantic meaning of the return value is clear from the function
name or the return type of the function. Below is an example of a function with multiple return
statements. It is also clear from the function name what the return value means. Also, the length of
the function is optimal: seven statements.
T = TypeVar('T')
class MyIterator(Protocol[T]):
    def has_next_item(self) -> bool:
        pass

    def get_next_item(self) -> T:
        pass
def are_equal(
iterator: MyIterator[T],
another_iterator: MyIterator[T]
) -> bool:
while iterator.has_next_item():
if another_iterator.has_next_item():
if (
iterator.get_next_item()
!= another_iterator.get_next_item()
):
return False
else:
return False
return True
If we refactored the above code to contain a single return statement, the code would become too long
(10 statements) to fit in one function, as shown below. In this case, we should prefer the above code
over the below code.
def are_equal(
iterator: MyIterator[T],
another_iterator: MyIterator[T]
) -> bool:
iters_are_equal = True
while iterator.has_next_item():
if another_iterator.has_next_item():
if (
iterator.get_next_item()
!= another_iterator.get_next_item()
):
iters_are_equal = False
break
else:
iters_are_equal = False
break
return iters_are_equal
As the second exception to this rule, you can use multiple return locations in a factory because you
know from the factory name what type of objects it creates. Below is an example factory with multiple
return statements:
from enum import Enum
class CarType(Enum):
AUDI = 1
BMW = 2
MERCEDES_BENZ = 3
class Car:
# ...
class Audi(Car):
#...
class Bmw(Car):
# ...
class MercedesBenz(Car):
# ...
class CarFactory:
def create_car(self, car_type: CarType) -> Car:
match car_type:
case CarType.AUDI:
return Audi()
case CarType.BMW:
return Bmw()
case CarType.MERCEDES_BENZ:
return MercedesBenz()
case _:
raise ValueError('Invalid car type')
You can manage with a trivial software component without types, but when it grows bigger and more
people are working with it, the benefits of static typing become evident.
Let’s analyze what potential problems using an untyped language might incur:
You need to refactor even if you are writing code for a new software component. Refactoring is not
related to legacy codebases only. If you don’t refactor, you let technical debt grow in the software. The
main idea behind refactoring is that no one can write the perfect code on the first try. Refactoring
means that you change code without changing the actual functionality. After refactoring, most of
the tests should still pass, the code is organized differently, and you have a better object-oriented
design and improved naming of things. Refactoring does not usually affect integration tests but can
affect unit tests depending on the type and scale of refactoring. Keep this in mind when estimating
refactoring effort.
We don’t necessarily reserve any or enough time for refactoring when we plan things. When we
provide work estimates for epics, features, and user stories, we should be conscious of the need to
refactor and add some extra time to our initial work estimates (which don’t include refactoring).
Refactoring is work that is not necessarily understood clearly by the management. The management
should support the need to refactor even if it does not bring clear added value to an end user. But it
brings value by not letting the codebase rot and removing technical debt. If you have software with
lots of accumulated technical debt, it is costly to develop new features and maintain the software.
Also, the quality of the software is lower, which can manifest in many bugs and lowered customer
satisfaction.
Below is a list of the most common code smells and refactoring techniques to solve them:
5.7.1: Rename
This is probably the single most used refactoring technique. You often don’t get the names right
on the first try and need to do renaming. Modern IDEs offer tools that help rename things in the
code: interfaces, classes, functions, and variables. The IDE’s renaming functionality is always better
than the plain old search-and-replace method. If using the search-and-replace method, you can
accidentally rename something that is not wanted to be renamed or don’t rename something that
should have been renamed.
# ...
if (
data_source_selector_is_open
and measure_selector_is_open
and dimension_selector_is_open
):
data_source_selector.style.height = f'{0.2 * available_height}px'
measure_selector.style.height = f'{0.4 * available_height}px'
dimension_selector.style.height = f'{0.4 * available_height}px'
elif (
not data_source_selector_is_open
and not measure_selector_is_open
and dimension_selector_is_open
):
dimension_selector.style.height = f'{available_height}px'
# ...
all_selectors_are_open = (
data_source_selector_is_open
and measure_selector_is_open
and dimension_selector_is_open
)
only_dimension_selector_is_open = (
not data_source_selector_is_open
and not measure_selector_is_open
and dimension_selector_is_open
)
if all_selectors_are_open:
data_source_selector.style.height = f'{0.2 * available_height}px'
measure_selector.style.height = f'{0.4 * available_height}px'
dimension_selector.style.height = f'{0.4 * available_height}px'
elif only_dimension_selector_is_open:
dimension_selector.style.height = f'{available_height}px'
class AvroFieldSchema:
# ...
It can be challenging to understand what the boolean expression means. We could improve the
function by adding a comment: (We assume that each field name has a root namespace that cannot
contain a dot character)
class AvroFieldSchema:
# ...
But we should not write comments because comments are never 100% trustworthy. It is possible that
a comment and the related code are not in synchrony: someone has changed the function without
updating the comment or modified only the comment but did not change the function. Let’s refactor
the above example by removing the comment and extracting multiple constants. The below function
is longer than the original, but it is, of course, more readable. If you look at the last two statements of
the method, you can understand in what case two field schemas are equal. It should be the compiler’s
job to make the below longer version of the function as performant as the original function.
class AvroFieldSchema:
# ...
other_field_name_without_root_ns = other_field_schema.name[
other_field_schema.name.find('.') + 1 :
]
types_and_names_without_root_ns_are_equal = (
self.__type == other_field_schema.type
and name_without_root_ns == other_field_name_without_root_ns
)
return types_and_names_without_root_ns_are_equal
class Chart(Protocol):
def do_something(self) -> None:
pass
class ColumnChart(Chart):
def do_something(self) -> None:
# do this ...
class PieChart(Chart):
def do_something(self) -> None:
# do that ...
class GeographicMapChart(Chart):
def do_something(self) -> None:
# do a third thing
Suppose you are implementing a data visualization application and have many places in your code
where you check the chart type and need to introduce a new chart type. It could mean you must add
a new case or elif statement in many places in the code. This approach is very error-prone and is
called shotgun surgery because you need to find all the places in the codebase where code needs to be
modified. What you should do is conduct proper object-oriented design and introduce a new chart
class containing the new functionality instead of introducing that new functionality by modifying
code in multiple places.
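For illustration, a minimal sketch contrasting the two approaches using the classes above (the isinstance chain is a hypothetical "before" version, not from the original):

chart: Chart = PieChart()

# Shotgun surgery: isinstance checks like this get scattered around the codebase,
# and every new chart type requires modifying each of them
if isinstance(chart, ColumnChart):
    chart.do_something()
elif isinstance(chart, PieChart):
    chart.do_something()
elif isinstance(chart, GeographicMapChart):
    chart.do_something()

# Object-oriented: a single polymorphic call; adding a new chart class changes nothing here
chart.do_something()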
class KafkaConsumer:
def __init__(
self,
brokers: list[str],
topics: list[str],
extra_config_entries: list[str],
tls_is_used: bool,
cert_should_be_verified: bool,
ca_file_path_name: str,
cert_file_path_name: str,
key_file_path_name: str
):
# ...
Let’s group the Transport Layer Security (TLS) related parameters to a parameter class named
TlsOptions:
class TlsOptions:
def __init__(
self,
tls_is_used: bool,
cert_should_be_verified: bool,
ca_file_path_name: str,
cert_file_path_name: str,
key_file_path_name: str
):
# ...
Now we can modify the KafkaConsumer constructor to utilize the TlsOptions parameter class:
class KafkaConsumer:
def __init__(
self,
brokers: list[str],
topics: list[str],
extra_config_entries: list[str],
tls_options: TlsOptions
):
# ...
import os
return behave_test_folder
Let’s refactor the above code so that the if and else statements are inverted:
if host_mount_folder is None:
behave_test_folder = os.getcwd()
else:
final_host_mount_folder = host_mount_folder
if host_mount_folder.startswith('/mnt/c/'):
final_host_mount_folder = host_mount_folder.replace(
'/mnt/c/', '/c/', 1
)
behave_test_folder = (
final_host_mount_folder + '/' + relative_test_folder
)
return behave_test_folder
We should not have a negation in the if-statement’s condition. Let’s refactor the above example:
Static code analysis tools find bugs and design-related issues on your behalf. Use multiple static code
analysis tools to get the full benefit. Different tools might detect different issues. Using static code
analysis tools frees people’s time in code reviews to focus on things that automation cannot tackle.
Below is a list of some common static code analysis tools for Python:
• PyLint
• Ruff
• Sonarlint
• SonarQube/SonarCloud
• Black (Code formatter)
Infrastructure and deployment code should be treated the same way as source code. Remember to run
static code analysis tools on your infrastructure and deployment code, too. Several tools are available
for analyzing infrastructure and deployment code, like Checkov, which can be used for analyzing
Terraform, Kubernetes, and Helm code. The Helm tool contains a linting command to analyze Helm chart
files, and Hadolint is a tool for analyzing Dockerfiles statically.
Below is a list of common issues reported by static code analysis tools, with a description and solution for each:

• Chain of isinstance checks: Indicates that a chain of conditionals is used instead of proper object-oriented design. Use the replace conditionals with polymorphism refactoring technique to solve this issue.
• Feature envy: Use the don't ask, tell principle from the previous chapter to solve this issue.
• Use of concrete classes: Use the program against interfaces principle from the previous chapter to solve this issue.
• Assignment to a function argument: Don't modify function arguments; introduce a new variable instead.
• Commented-out code: Remove the commented-out code. If you need that piece of code in the future, it is available in the version control system forever.
• Const correctness: Make attributes and variables @final whenever possible to achieve immutability and avoid accidental modifications.
• Nested match statement: Use match statements mainly in factories, and do not nest them.
• Nested conditional expression (ternary operator): A conditional expression should not be nested because nesting greatly hinders code readability.
• Overly complex boolean expression: Split the boolean expression into parts and introduce constants to store the parts and the final expression.
• Expression can be simplified: This can be refactored automatically by the IDE.
• Match statement without a default branch: Always introduce a default branch and raise an exception there. Otherwise, when you are using a match statement with an enum, you might encounter strange problems after adding a new enum value that is not handled by the match statement.
• Law of Demeter violation: The object knows too much. It is coupled to the dependencies of another object, which creates additional coupling and makes the code harder to change.
• Reuse of a local variable: Instead of reusing a variable for a different purpose, introduce a new variable. That new variable can be named appropriately to describe its purpose.
• Scope of a variable is too broad: Introduce a variable only just before it is needed.
• Protected field: Subclasses can modify the protected state of the superclass without the superclass being able to control that. This is an indication of breaking the encapsulation and should be avoided.
• Breaking the encapsulation: return of a modifiable/mutable field: Use the don't leak modifiable internal state outside an object principle from the previous chapter to solve this issue.
• Breaking the encapsulation: assignment from a method parameter to a modifiable/mutable field: Use the don't assign from a method parameter to a modifiable field principle from the previous chapter to solve this issue.
• Non-constant public field: Anyone can modify a public field. This breaks the encapsulation and should be avoided.
• Overly broad except block: This can indicate a wrong design. Don't catch the language's base exception class if you should only catch your application's base error class, for example. Read more about handling exceptions in the next section.
An error is something that can happen, and one should be prepared for it. An
exception is something that should never happen.
You define errors in your code and raise them in your functions. For example, if you try to write to a
file, you must be prepared for the error that the disk is full, or if you are reading a file, you must be
prepared for the error that the file does not exist (anymore).
Many errors are recoverable. You can delete files from the disk to free up some space to write to a file.
Or, in case a file is not found, you can give a “file not found” error to the user, who can then retry the
operation using a different file name, for example. Exceptions are something you don’t usually define
in your application, but the system raises them in exceptional situations, like when a programming
error is encountered.
An exception can be raised, for example, when memory is low, and memory allocation cannot be
performed, or when a programming error results in an array index out of bounds or a dict not
containing a specific key. When an exception is thrown, the program cannot continue executing
normally and might need to terminate. This is why many exceptions can be categorized as
unrecoverable errors. In some cases, it is possible to recover from exceptions. Suppose a web service
encounters a null pointer exception while handling an HTTP request. In that case, you can terminate
the handling of the current request, return an error response to the client, and continue handling
further requests normally. It depends on the software component how it should handle exceptional
situations.
Errors define situations where the execution of a function fails for some reason. Typical examples of
errors are a file not found error, an error in sending an HTTP request to a remote service, or failing
to parse a configuration file. Suppose a function can raise an error. Depending on the error, the
function caller can decide how to handle the error. In case of transient errors, like a failing network
request, the function caller can wait a while and call the function again. Or, the function caller
can use a default value. For example, if a function tries to load a configuration file that does not
exist, it can use some default configuration instead. And in some cases, the function caller cannot do
anything but leave the error unhandled or expect the error but raise another error at a higher level of
abstraction. Suppose a function tries to load a configuration file, but the loading fails, and no default
configuration exists. In that case, the function cannot do anything but pass the error to its caller.
Eventually, this error bubbles up in the call stack, and the whole process is terminated due to the
inability to load the configuration. This is because the configuration is needed to run the application.
Without configuration, the application cannot do anything but exit.
When defining error classes, define a base error class for your software component. You can name the
base error class according to the name of the software component. For example, for the data exporter
microservice, you can define a DataExporterError (or DataExporterServiceError) base error class or
for common-utils-lib you can define CommonUtilsError (or CommonUtilsLibError) and for sales-item-
service you can define SalesItemServiceError. Depending on the case, you either remove or
keep the software component type name in the base error class name. The popular requests Python
package implements this convention. It defines a requests.RequestException that is the base class
for all other errors the library methods can raise. For each function that can raise an error, define
a base error class at the same abstraction level as the function. That error class should extend the
software component’s base error class. For example, if you have a parse(config_str) function in the
ConfigParser class, define a base error class for the function inside the class with the name ParseError,
i.e. ConfigParser.ParseError. If you have a read_file function in the FileReader class, define a base
error class in the FileReader class with the name ReadFileError, i.e. FileReader.ReadFileError. If all
the methods in a class can raise the same error, it is enough to define only one error at the class level.
For example, if you have an HttpClient class where all methods like get, post, put, etc. can raise an
error, you can define a single Error class in the HttpClient class.
Below is an example of errors defined for the data exporter microservice:
class DataExporterError(Exception):
    pass


class FileReader:
    class ReadFileError(DataExporterError):
        pass


class ConfigParser:
    class ParseError(DataExporterError):
        pass
Following the previous rules makes it easy to catch errors in the code because you can infer the
error class name from the called method (and class) name. In the below example, we can infer the
ReadFileError error class name from the read_file method name:
try:
    file_contents = file_reader.read_file(...)
except FileReader.ReadFileError as error:
    # Handle error ...
You can also catch all user-defined errors using the software component's base error class in the except
clause.
try:
    config_string = file_reader.read_file(...)
    return config_parser.parse(config_string)
except DataExporterError as error:
    # Handle error situation
Don’t catch the language’s base exception class or some other too-generic exception class because that
will catch, in addition to all user-defined errors, exceptions, like MemoryError or ZeroDivisionError,
which is probably not what you want. So, do not catch a too-generic exception class like this:
try:
    config_string = file_reader.read_file(...)
    return config_parser.parse(config_string)
except BaseException as error:
    # Don't do this!
Catch all exceptions only in special places in your code, like in the main function or the main loop,
like the loop in a web service processing HTTP requests or the main loop of a thread. Below is an
example of correctly catching the language’s base exception class in the main function. When you
catch an unrecoverable exception in the main function, log it and exit the process with an appropriate
error code. When you catch an unrecoverable error in a main loop, log it and continue the loop if
possible.
try:
    application.run(...)
except BaseException as exception:
    logger.log(exception)
    sys.exit(1)
else:
    sys.exit(0)
Using the above-described rules, you can make your code future-proof or forward-compatible so that
adding new errors to be thrown from a function in the future is possible. Let’s say that you are using
a fetch_config function like this:
try:
    configuration = config_fetcher.fetch_config(url)
except ConfigFetcher.FetchConfigError as error:
    # Handle error ...
Your code should still work if a new type of error is thrown from the fetch_config function. Let's say
that the following new errors could be thrown from the fetch_config function: a malformed URL error,
a server not found error, and a timeout error.
When classes for these new errors are implemented, they must extend the function’s base error class,
in this case, the FetchConfigError class. Below are the new error classes defined:
class ConfigFetcher:
    class FetchConfigError(DataExporterError):
        pass

    class MalformedUrlError(FetchConfigError):
        pass

    class ServerNotFoundError(FetchConfigError):
        pass

    class TimeoutError(FetchConfigError):
        pass
You can later enhance your code to handle different errors raised from the fetch_config method
differently. For example, you might want to handle a TimeoutError so that the function will wait a
while and then retry the operation because the error can be transient:
try:
    configuration = config_fetcher.fetch_config(url)
except ConfigFetcher.TimeoutError as error:
    # Retry after a while
except ConfigFetcher.MalformedUrlError as error:
    # Inform caller that URL should be checked
except ConfigFetcher.ServerNotFoundError as error:
    # Inform caller that URL host/port cannot be reached
except ConfigFetcher.FetchConfigError as error:
    # Handle possible other error situations
    # This will catch any new exception that could be thrown
    # from the 'fetch_config' function in the future
In the above examples, we handled raised errors correctly, but you can easily forget to handle a raised
error. This is because nothing in the function signature tells you whether the function can throw or
not. The only way to find out is to check the documentation (if available) or investigate the source
code (if available). This is one of the biggest problems regarding error handling because you must
know and remember that a function can raise an error, and you must remember to catch and handle
errors. You don’t always want to handle an error immediately, but still, you must be aware that the
error will bubble up in the call stack and should be dealt with eventually somewhere in the code.
Below is an example extracted from the documentation of the popular Python requests package:
import requests
r = requests.get('https://api.github.com/events')
r.json()
# [{'repository': {'open_issues': 0, 'url': 'https://github.com/...
Did you know that both requests.get and r.json can raise an error? This example unfortunately does
not include error handling at all. If you copy-paste the above code sample directly to your production
code, it is possible that you forget to handle errors. If you go to the API reference documentation of
the requests package, you can find the documentation for the get method. That documentation (at
the time of writing this book) does not tell that the method can raise an error. The documentation
only speaks about the method parameters and the return value and its type. Only if you scroll down the
documentation page do you find a section about exceptions. But what if you don't scroll down? You
might end up thinking that the method does not raise an error. The get method documentation should
be corrected so that it tells that the method can raise an error and contains a link to the section
where the possible errors are described.
The above-described problem can be mitigated at least to some extent by practising test-driven
development (TDD). TDD will be described in the next chapter, which covers testing-related principles.
In TDD, you define the tests before the implementation, which forces you to also think about error
scenarios and write tests for them. When you have tests for error scenarios, it is not possible to leave
those scenarios unhandled in the actual implementation code.
One of the best solutions to the problem that error handling might be forgotten is to make raising
errors more explicit:
Use a ‘try’ prefix in a function name if the function can raise an error.
This is a straightforward rule. If a function can raise an error, name the function so that its name
starts with try. This makes it clear to every caller that the function can raise an error, and the caller
should be prepared for that. For the caller of the function, there are three alternatives to deal with a
thrown error:
1) Catch the base error class of the called function (or software component) and handle the error,
e.g., catch DataFetcher.FetchDataError if you are calling a method named try_fetch_data in a
class named DataFetcher.
2) Catch the base error class of the called function (or software component) and raise a new error
on a higher level of abstraction. Now you also have to name the calling function with a try
prefix.
3) Don’t catch errors. Let them propagate upwards in the call stack. Now you also have to name
the calling function with a try prefix.
class ConfigFetcher:
    def fetch_config(self, url: str) -> Config:
        try:
            config_str = self.__data_fetcher.try_fetch_data(url)
            return self.__config_parser.try_parse(config_str)
        except (
            DataFetcher.FetchDataError,
            ConfigParser.ParseError
        ) as error:
            # You could also catch errors in two different
            # except blocks
            # You could also catch the base error class 'DataExporterError'
            # of the software component


class ConfigFetcher:
    class FetchConfigError(DataExporterError):
        pass


class ConfigFetcher:
    def try_fetch_config(self, url: str) -> Config:
        # No try-except, all raised errors from both try_fetch_data
        # and try_parse method calls propagate
        # to the caller and
        # this function must be named with the 'try' prefix
        # to indicate that it can raise an error
        config_str = self.__data_fetcher.try_fetch_data(url)
        return self.__config_parser.try_parse(config_str)
class DataExporter:
    def initialize(self) -> None:
        try:
            config = self.__config_fetcher.try_fetch_config(url)
        except DataExporterError as error:
            # In this case you must catch the base error class of
            # the software component (DataExporterError), because
            # you don't know what errors try_fetch_config can
            # raise, because no FetchConfigError class
            # has been defined in the ConfigFetcher class
If we go back to the requests package usage example, the error-raising methods requests.get and
Response.json could be renamed to requests.try_get and Response.try_parse_json. That would
make the earlier example look like the following:
import requests
r = requests.try_get('https://api.github.com/events')
r.try_parse_json()
# [{'repository': {'open_issues': 0, 'url': 'https://github.com/...
Now we can see that the two methods can raise an error, so we can put them inside a try-block:
import requests

try:
    r = requests.try_get('https://api.github.com/events')
    r.try_parse_json()
    # [{'repository': {'open_issues': 0, 'url': 'https://github.com/...
except ...
    # ...
To make the try-prefix convention even better, a linting rule that enforces the correct naming of error-
raising functions could be developed. The rule should force the function name to have a try prefix if
the function raises or propagates errors. A function propagates errors when it calls an error-raising
(try-prefixed) method outside a try-except block.
You can also create a library that has try-prefixed functions that wrap error-raising functions that
don’t follow the try-prefix rule:
import json


class JsonParser:
    class ParseError(Exception):
        pass

    @staticmethod
    def try_parse(
        s,
        *,
        cls=None,
        object_hook=None,
        parse_float=None,
        parse_int=None,
        parse_constant=None,
        object_pairs_hook=None,
        **kwargs
    ):
        try:
            # Forward all keyword arguments to json.loads
            return json.loads(
                s,
                cls=cls,
                object_hook=object_hook,
                parse_float=parse_float,
                parse_int=parse_int,
                parse_constant=parse_constant,
                object_pairs_hook=object_pairs_hook,
                **kwargs
            )
        except json.JSONDecodeError as error:
            raise JsonParser.ParseError(error)
Now if you use the JsonParser’s try_parse method, you can easily infer the class name of the possibly
raised errors without the need to consult any documentation.
When using a web framework, the framework usually provides an error-handling mechanism. The
framework catches all possible errors when processing a request and maps them to HTTP responses
with HTTP status codes indicating a failure. Typically, the default status code is 500 Internal Server
Error. When you utilize the web framework's error-handling mechanism, there is no big benefit in
naming error-raising functions with the try prefix, because forgetting to catch an error is not
problematic; often, passing the error to the web framework's error handler is exactly what you want
to do. And usually you provide your own error handler instead of using the default one, so you get
responses in the format you want. So, if you want, you can opt out of the try-prefix rule, but of course
you can use it for the sake of consistency. You can also put error classes in their own modules and
put them in a specific package (directory).
It is usually a good practice to document the used error handling mechanism in the software
component documentation.
The best way to avoid forgetting to handle errors is to practice rigorous test-driven development (TDD),
which is described in the next chapter. Another great way to not forget error handling is to walk
through the code line by line and check whether a particular line can produce an error, and if it can,
what kind of error and whether there are multiple different errors the line can possibly produce. Let's
have an example with the following code (we focus only on possible errors, not on what the function
does):
import requests
from jwt import PyJWKClient, decode


class JwtAuthorizer:
    # ...

    def __try_get_jwt_claims(
        self, auth_header: str | None
    ) -> dict[str, Any]:
        if not self.__jwks_client:
            oidc_config_response = requests.get(self.__oidc_config_url)
            oidc_config = oidc_config_response.json()
            self.__jwks_client = PyJWKClient(oidc_config['jwks_uri'])
The code on the first line cannot produce an error. On the second line, the requests.get method can
raise an error on connection failure or timeout, for example. It can also produce an error response,
e.g. an internal server error. Our code does not currently handle that, which is why we should add the
following line after the requests.get method call: oidc_config_response.raise_for_status(), which
raises an HTTPError if the response status code is >= 400. The third line can raise a JSONDecodeError
if the response is not valid JSON. The fourth line can raise a KeyError, because it is possible that the
key jwks_uri does not exist in the response JSON. The fifth line can raise an IndexError, because the
list returned by the split does not necessarily have an element at index one. Also the sixth line can
raise an error when the JWKS client cannot connect to the IAM system or the JWT is invalid. And the
second last line can raise a PyJWKClientError when the JWT is invalid. As a summary, all the lines in
the above code can produce at least one kind of error, except the first and last lines.
Let’s refactor the code to implement error handling instead of passing all possible errors and
exceptions to the caller:
import requests
from jwt import PyJWKClient, PyJWKClientError, decode
from jwt.exceptions import InvalidTokenError


class JwtAuthorizer:
    class GetJwtClaimsError(Exception):
        pass

    def __try_get_jwt_claims(
        self, auth_header: str | None
    ) -> dict[str, Any]:
        try:
            if not self.__jwks_client:
                oidc_config_response = requests.get(self.__oidc_config_url)
                oidc_config_response.raise_for_status()
                oidc_config = oidc_config_response.json()
                self.__jwks_client = PyJWKClient(oidc_config['jwks_uri'])
            # ... rest of the JWT decoding logic ...
        except (
            requests.RequestException,
            KeyError,
            IndexError,
            PyJWKClientError,
            InvalidTokenError
        ) as error:
            # Reconstructed error handling: convert all expected errors
            # to the method's own error class
            raise self.GetJwtClaimsError(error)
I suggest that you make it a habit to walk through the code of a function line by line once you think
it is ready, to find out if you have accidentally missed handling some error.
You can return a failure indicator from a failable function when the function does not need to return
any additional value. It is enough to return a failure indicator from the function when there is no
need to return any specific error code or message. This can be because there is only one reason the
function can fail, or function callers are not interested in error details. To return a failure indicator,
return a boolean value from the function: True means a successful operation, and False indicates a
failure:
return task_was_performed
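For illustration, a minimal sketch of a function that follows this convention could look like the following (the function name and behavior are made up for the example, not taken from the book's code):

import os


def remove_temp_file(path: str) -> bool:
    # Returns True if the file was removed, False if it did not exist.
    # Callers only need a success/failure indicator, not error details.
    try:
        os.remove(path)
    except FileNotFoundError:
        return False
    return True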
Suppose a function should return a value, but the function call can fail, and there is precisely one
cause why the function call can fail. In this case, return an optional value from the function. In the
below example, getting a value from the cache can only fail when no value for a specific key is stored
in the cache. We don’t need to return any error code or message.
TKey = TypeVar('TKey')
TValue = TypeVar('TValue')
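Below is a minimal sketch of what the rest of such a cache class could look like, reusing the TKey and TValue type variables above (the class is illustrative, not the book's actual implementation):

from typing import Generic


class Cache(Generic[TKey, TValue]):
    def __init__(self):
        self.__key_to_value_dict: dict[TKey, TValue] = {}

    def put(self, key: TKey, value: TValue) -> None:
        self.__key_to_value_dict[key] = value

    def get(self, key: TKey) -> TValue | None:
        # Returns None when no value is stored for the key;
        # no error code or message is needed
        return self.__key_to_value_dict.get(key)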
Or if you want to use a more functional approach, return an Optional object. (The Optional class was
defined in the previous chapter.)
TKey = TypeVar('TKey')
TValue = TypeVar('TValue')
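A sketch of the Optional-returning get method, assuming the Optional class from the previous chapter provides the of and empty factory methods (as the Either class later in this section uses); the rest of the class is as in the previous sketch:

from typing import Generic


class Cache(Generic[TKey, TValue]):
    # ... __init__ and put as in the previous sketch ...

    def get(self, key: TKey) -> 'Optional[TValue]':
        value = self.__key_to_value_dict.get(key)
        # Wrap the result in the book's Optional type instead of
        # returning a bare None
        return Optional.empty() if value is None else Optional.of(value)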
When you need to provide details about an error to a function caller, you can return an error object
from the function:
@dataclass
class BackendError:
    http_status_code: int
    error_code: int
    message: str
If a function does not return any value but can produce an error, you can return either an error object
or None:
T = TypeVar('T', bound=Entity)


class DataStore:
    async def update_entity(
        self,
        id_: int,
        entity: T
    ) -> Awaitable[BackendError | None]:
        # ...
Alternatively, return an optional error as shown below. (The Optional class used in these examples
was defined in the previous chapter; note that it is not the Optional from the typing module.)
T = TypeVar('T', bound=Entity)


class DataStore:
    async def update_entity(
        self,
        id_: int,
        entity: T
    ) -> Awaitable[Optional[BackendError]]:
        # ...
Suppose a function needs to return a value or an error. In that case, you can use a 2-tuple (i.e., a
pair) type, where the first value in the tuple is the actual value or None in case of an error and the
second value in the tuple is an error object or None value in case of a successful operation. Below is
an example.
T = TypeVar('T', bound=Entity)


class DataStore:
    async def create_entity(
        self,
        entity: T
    ) -> Awaitable[tuple[T, None] | tuple[None, BackendError]]:
        # ...
If we want to make our method more functional, we should return an Either type from it, but Python
does not have one built in. An Either type contains one of two values, either a left value or a right value.
The Either type can be defined as follows. (The Optional class used in the below example is the same as
defined in the previous chapter, not the Optional from the typing module.)
from typing import Any, Callable, Generic, TypeVar

TLeft = TypeVar('TLeft')
TRight = TypeVar('TRight')
T = TypeVar('T')
U = TypeVar('U')


class PrivateConstructor(type):
    def __call__(
        cls: type[T],
        *args: tuple[Any, ...],
        **kwargs: dict[str, Any]
    ):
        raise TypeError('Constructor is private')

    def _create(
        cls: type[T],
        *args: tuple[Any, ...],
        **kwargs: dict[str, Any]
    ) -> T:
        return super().__call__(*args, **kwargs)


class Either(Generic[TLeft, TRight], metaclass=PrivateConstructor):
    # The class declaration, __init__, and has_left_value below are
    # reconstructed; they store and expose the two Optional values
    # used by the other methods
    def __init__(
        self,
        maybe_left_value: 'Optional[TLeft]',
        maybe_right_value: 'Optional[TRight]'
    ):
        self.__maybe_left_value = maybe_left_value
        self.__maybe_right_value = maybe_right_value

    @classmethod
    def with_left(cls, value: TLeft) -> 'Either[TLeft, TRight]':
        return cls._create(Optional.of(value), Optional.empty())

    @classmethod
    def with_right(cls, value: TRight) -> 'Either[TLeft, TRight]':
        return cls._create(Optional.empty(), Optional.of(value))

    def has_left_value(self) -> bool:
        # Needed by the Failable class later in this section;
        # assumes the Optional class provides an is_present method
        return self.__maybe_left_value.is_present()

    def map_left(
        self,
        to_value: Callable[[TLeft], U]
    ) -> 'Either[U, TRight]':
        return Either._create(
            self.__maybe_left_value.map(to_value),
            self.__maybe_right_value
        )

    def map_right(
        self,
        to_value: Callable[[TRight], U]
    ) -> 'Either[TLeft, U]':
        return Either._create(
            self.__maybe_left_value,
            self.__maybe_right_value.map(to_value)
        )

    def map(
        self,
        left_to_value: Callable[[TLeft], U],
        right_to_value: Callable[[TRight], U]
    ) -> U:
        return self.__maybe_left_value.map(left_to_value).or_else_get(
            lambda: self.__maybe_right_value.map(right_to_value).try_get()
        )

    def apply(
        self,
        consume_left_value: Callable[[TLeft], None],
        consume_right_value: Callable[[TRight], None]
    ) -> None:
        self.__maybe_left_value.if_present(consume_left_value)
        self.__maybe_right_value.if_present(consume_right_value)


class Error(Exception):
    pass
Now we can use the new Either type and rewrite the example as follows:
T = TypeVar('T', bound=Entity)


class DataStore:
    async def create_entity(
        self,
        entity: T
    ) -> Awaitable[Either[T, BackendError]]:
        # ...
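For illustration, a caller could consume the returned Either value using the map or apply methods defined above. The sketch below is illustrative and assumes the DataStore class and an Entity instance from the surrounding example:

async def store_entity(data_store: DataStore, entity: Entity) -> None:
    entity_or_error = await data_store.create_entity(entity)

    # Exactly one of the two consumers runs, depending on whether the
    # Either holds a created entity (left) or an error (right)
    entity_or_error.apply(
        lambda created_entity: print('created:', created_entity),
        lambda error: print('failed with status:', error.http_status_code)
    )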
You can adapt to a desired error-handling mechanism by creating an adapter class. For example, if
a library has an error-raising method, you can create an adapter class with a method returning an
optional value. The below Url class has a try_create_url factory method that can raise an error:
class Url:
    # ...

    class CreateUrlError(Exception):
        pass

    @classmethod
    def try_create_url(
        cls,
        scheme: str,
        host: str,
        port: int,
        path: str,
        query: str
    ) -> 'Url':
        # ...
        # Potentially raise a CreateUrlError here ...


class UrlFactory:
    def create_url(
        self,
        scheme: str,
        host: str,
        port: int,
        path: str,
        query: str
    ) -> Url | None:
        try:
            # The body below is reconstructed: return the created Url,
            # or None if the creation raises a CreateUrlError
            return Url.try_create_url(scheme, host, port, path, query)
        except Url.CreateUrlError:
            return None
If the code using the UrlFactory is interested in the error details, we can also create a method that
does not raise an error but returns either a value or an error:
class UrlFactory:
    def create_url_or_error(
        self,
        scheme: str,
        host: str,
        port: int,
        path: str,
        query: str
    ) -> tuple[Url, None] | tuple[None, Url.CreateUrlError]:
        try:
            return (
                Url.try_create_url(scheme, host, port, path, query),
                None,
            )
        except Url.CreateUrlError as error:
            return None, error
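A possible way for calling code to use the above method with tuple unpacking (the argument values are illustrative):

url_factory = UrlFactory()

url, error = url_factory.create_url_or_error(
    'https', 'example.com', 443, '/sales-items', 'page=1'
)

if error:
    # Handle the error, e.g. report the invalid URL to the user
    print(error)
else:
    print(url)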
The below Failable class can be used in functional error handling. A Failable object represents
either a value of type T or an instance of the Exception class, i.e. Failable[T] is the same as
Either[T, Exception]:
from typing import Any, Callable, Generic, TypeVar

T = TypeVar('T')
U = TypeVar('U')


class PrivateConstructor(type):
    def __call__(
        cls: type[T],
        *args: tuple[Any, ...],
        **kwargs: dict[str, Any]
    ):
        raise TypeError('Constructor is private')

    def _create(
        cls: type[T],
        *args: tuple[Any, ...],
        **kwargs: dict[str, Any]
    ) -> T:
        return super().__call__(*args, **kwargs)


class Failable(Generic[T], metaclass=PrivateConstructor):
    # The class declaration and __init__ below are reconstructed;
    # the value or error is stored in an Either instance
    def __init__(self, value_or_error: 'Either[T, Exception]'):
        self.__value_or_error = value_or_error

    @classmethod
    def with_value(cls, value: T) -> 'Failable[T]':
        return cls._create(Either.with_left(value))

    @classmethod
    def with_error(cls, error: Exception) -> 'Failable[T]':
        return cls._create(Either.with_right(error))

    def map_value(
        self,
        to_value: Callable[[T], U]
    ) -> 'Failable[U]':
        return Failable._create(self.__value_or_error.map_left(to_value))

    def map_error(
        self,
        to_error: Callable[[Exception], Exception]
    ) -> 'Failable[T]':
        if self.__value_or_error.has_left_value():
            error = to_error(Exception())
            return Failable.with_error(error)
        else:
            return Failable._create(
                self.__value_or_error.map_right(to_error)
            )

    # Other methods, such as the or_raise and or_else methods
    # mentioned below, are omitted here
Below is an example of using the Failable class:

class Application:
    # ...

    class InitializeError(DataExporterError):
        pass
The benefit of the above functional approach is that it is shorter than an entire try-except block. The
above functional approach is also as understandable as a try-except block. Remember that you should
write the shortest, most understandable code. When a method returns a Failable instance, you don't
have to name the method with the try prefix because the method does not raise. The call to the
or_raise method on Failable is used to convert the functional code back to imperative code.
You can also use other methods of the Failable class. For example, a default value can be returned
with the or_else method:
class Application:
    # ...
You can also transform multiple imperative error-raising statements into functional failable state-
ments. For example, instead of writing:
class Application:
    # ...

    class InitializeError(DataExporterError):
        pass

you can write the following:

class Application:
    # ...

    class InitializeError(DataExporterError):
        pass
The above functional code is shorter than the same imperative code, but it is less readable, for which
reason you might want to use the imperative approach instead of the functional approach.
It can be error-prone to use error-raising imperative code together with functional programming
constructs. Let's assume we have the below code that reads and parses multiple configuration files
into a single configuration object using the functional programming construct reduce. We have named
the config reading function try_read_config with the try prefix, because it can raise an error. When
we use the reduce function, we must remember to surround it with a try-except block, because the
reduce function will call the try_read_config function that can raise.
import json
from functools import reduce
from typing import Any


def try_read_config(
    accumulated_config: dict[str, Any],
    config_file_path_name: str
):
    with open(config_file_path_name) as config_file:
        config_json = config_file.read()
        config = json.loads(config_json)
        return accumulated_config | config


def get_config(
    config_file_path_names: list[str]
) -> dict[str, Any]:
    try:
        return reduce(try_read_config, config_file_path_names, {})
    except (OSError, json.JSONDecodeError):
        # ...
We could turn the above example into a more functional one by making the get_config function
return a Failable instance:
import json
from functools import reduce
from typing import Any


def to_config_or_error(
    accum_config_or_error: Failable[dict[str, Any]],
    config_file_path_name: str
) -> Failable[dict[str, Any]]:
    try:
        with open(config_file_path_name) as config_file:
            config_json = config_file.read()
            config = json.loads(config_json)
            return accum_config_or_error.map_value(
                lambda accum_config: accum_config | config
            )
    except (OSError, json.JSONDecodeError) as error:
        return accum_config_or_error.map_error(
            lambda accum_error: RuntimeError(
                f'{str(accum_error)}\n{config_file_path_name}: {str(error)}'
            )
        )


def get_config(
    config_file_path_names: list[str]
) -> Failable[dict[str, Any]]:
    return reduce(
        to_config_or_error,
        config_file_path_names,
        Failable.with_value({})
    )
Suppose we have the following two configuration files, config1.json and config2.json:

{
  "foo": 1,
  "bar": 2
}

{
  "xyz": 3
}

Let's introduce an error (a missing comma after the first property) in the config1.json file:

{
  "foo": 1
  "bar": 2
}

Now the get_config function returns a Failable containing an error that names the config1.json
file and describes the JSON parsing failure.
Off-by-one errors are common when loops or slices are written with manual index arithmetic, for
example using '<' where '<=' is needed in a loop condition, or vice versa. Fortunately, such mistakes
can largely be avoided in Python:
With Python's range function, you must remember that it starts from zero and that the end of the
range is exclusive. Both of these can create an off-by-one error if you forget them and assume that
the start is one or that the end of the range is inclusive. Off-by-one errors are caused by the fact that,
given a range, people by default assume it is inclusive at both ends. So range(6) gives values
from 0 to 5, not from 1 to 6. And range(1, 6) gives values from 1 to 5, not from 1 to 6. The same
applies to slices, e.g. values[:6] starts from index 0 and ends at index 5. If you want a slice that
is all but the last item, you can use negative indexing: values[:-1] gives all values except the last
one. Using -1 is much more error-safe than using values[:len(values) - 1], which might produce
an off-by-one error if you forget the - 1. Similarly, using values[:-2] is less error-prone than
using values[:len(values) - 2]. You can also use negative indexing to get the last value with
values[-1] instead of values[len(values) - 1]. You can think of a negative index as a one-based
index starting from the end of a list.
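The following small examples illustrate these range and slice semantics:

values = [10, 20, 30, 40, 50, 60]

print(list(range(6)))     # [0, 1, 2, 3, 4, 5], not 1..6
print(list(range(1, 6)))  # [1, 2, 3, 4, 5], the end is exclusive
print(values[:6])         # all six values, indices 0..5
print(values[:-1])        # [10, 20, 30, 40, 50], all but the last value
print(values[-1])         # 60, the last value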
Additionally, unit tests are your friend when trying to spot off-by-one errors. So remember to write
unit tests for the edge cases, too.
We all have done it, and we have done it hundreds of times: googled for answers. Usually, you find
good resources by googling, but the problem often is that examples in the googled results are not
necessarily production quality. One specific thing missing in them is error handling. If you copy and
paste code from a website, it is possible that errors are not handled appropriately. You should always
analyze the copy-pasted code to see if error handling needs to be added.
When you provide answers for other people, try to make the code as production-like as possible. In
Stack Overflow, you find the most up-voted answer right below the question. If the answer is missing
error handling, you can comment on that and let the author improve their answer. You can also up-
vote an answer that seems the most production ready. Usually, the most up-voted answers are pretty
old. For this reason, it is useful to scroll down to see if a more modern solution fits your need better.
And you can also up-vote that more modern solution so it will become ranked higher in the list of
answers.
Regarding open source libraries, the first examples in their documentation can describe only the
“happy path” usage scenario, and error handling is described only in later parts of the documentation.
This can cause problems if you copy-paste code from the “happy path” example and forget to add error
Coding Principles 327
handling. For this reason, open-source library authors should give production-quality examples early
in the documentation.
Regarding generative AI and ChatGPT, I have a couple of experiences. I asked ChatGPT to generate
simple Django code. The generated code was about 95% correct, but it did not work. The problem was
that ChatGPT forgot to include the steps for creating the database tables (makemigrations, migrate).
If you were inexperienced with the Django framework, that kind of bug might be difficult to solve. In
a scenario like that, I would advise you to discover the problem first and then ask ChatGPT to
solve the problem for you.
My other experiment with ChatGPT was to generate GraphQL server code using the Ariadne library.
The code ChatGPT generated was for an old version of Ariadne and did not work correctly with a
newer version of the library. (Notice that the data used to train ChatGPT contains more old data than
new data. ChatGPT did not know to prioritize the newer but scarcer data over the older and more
plentiful data.) It also generated some lines of code in the wrong order, which made the GraphQL
API not work at all. It took quite a lot of debugging for such a small program to finally find what was
wrong: the executable schema was created before the query resolver. It should have been created
only after defining the resolver.
When using ChatGPT or another generative AI tool, you should familiarize yourself with the generated
code. Otherwise, you don't know what your program is doing, and if the AI-generated code contains
bugs, those will be hard to find, because you don't have a clear understanding of what the code is
actually doing. Don't let the AI be the master, but an apprentice.
The best way to prevent bugs related to code taken from the web is to practice test-driven development
(TDD). TDD is described in the next chapter, but the idea behind TDD is to specify the function first
and write unit test cases for the different scenarios there are: edge/corner cases, error scenarios, and
security scenarios. For example, let's say that you are new to Python and google for a code snippet to
perform an HTTP request to an API endpoint. Once you have googled the code, you can copy-paste it
into your function. Most probably, error scenarios are now not handled. What you should also do is
practice TDD and write unit test cases for different scenarios, like: what if the remote server cannot
be contacted, or the contact results in a timeout, or what if the remote server responds with an error
(an HTTP response with a status code greater than or equal to 400)? What if you need to parse the
result from the API (e.g. parse JSON) and it fails? Once you have written a unit test case for all those
scenarios, you can be sure that error handling in the actual function implementation is not forgotten.
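For example, following the TDD style described in the next chapter, those scenarios could first be captured as failing test stubs like in the sketch below (the class and method names are made up for this example):

import unittest


class EventApiClientTests(unittest.TestCase):
    def test_try_fetch_events(self):
        # Happy path scenario
        self.fail()

    def test_try_fetch_events__when_server_cannot_be_contacted(self):
        # Failure scenario, should produce an error
        self.fail()

    def test_try_fetch_events__when_request_times_out(self):
        # Failure scenario, should produce an error
        self.fail()

    def test_try_fetch_events__when_response_status_is_400_or_greater(self):
        # Failure scenario, should produce an error
        self.fail()

    def test_try_fetch_events__when_response_json_parsing_fails(self):
        # Failure scenario, should produce an error
        self.fail()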
Optimizations should primarily target only the busy loop or loops in a software component. Busy
loops are the loops in threads that execute over and over again, possibly thousands of iterations or
more per second. Performance optimization should not target functionality that executes only once
or a couple of times during the software component's lifetime and whose execution does not take a
long time. For example, an application can have configuration reading and parsing functionality
when it starts. This functionality takes a short time to execute. It is not reasonable to optimize that
functionality because it runs only once. It does not matter if you can read and parse the configuration
in 200 or 300 milliseconds, even if there is a 50% difference in performance.
Let’s use the data exporter microservice as an example. Our data exporter microservice consists of
input, transformer, and output parts. The input part reads messages from a data source. We cannot
affect the message reading part if we use a 3rd party library for that purpose. Of course, if multiple
3rd party libraries are available, it is possible to craft performance tests and evaluate which 3rd party
library offers the best performance. If there are several 3rd party libraries available for the same
functionality, we tend to use the most popular library or a library we know beforehand. If performance
is an issue, we should evaluate different libraries and compare their performances.
The data exporter microservice has the following functionality in its busy loop: decode an input
message to an internal message, perform transformations, and encode an output message. Decoding
an input message requires decoding each field in the message. Let's say there are 5000 messages
handled per second, and each message has 100 fields. During one second, 500000 fields must be
decoded. This reveals that the optimization of the decoding functionality is crucial. The same applies
to output message encoding. We at Nokia have implemented the decoding and encoding of Avro binary
fields ourselves. We were able to make them faster than what a 3rd party library provided.
Removing unnecessary functionality is something that will boost performance. You should stop to
think critically about your software component: Is my software component doing only the necessary
things considering all circumstances?
Let's consider the data exporter's functionality. It is currently decoding an input message to an
internal message. This internal message is used when making various transformations to the data.
Transformed data is encoded into the wanted output format. The contents of the final output message
can be a small subset of the original input message. This means that only a tiny part of the decoded
message is used. In that case, it is unnecessary to decode all the fields of an input message if, for
example, only 10% of the fields are used in the transformations and output messages. By removing
unnecessary decoding, we can improve the performance of the data exporter microservice.
In garbage-collected languages like Python, the benefit of using an object pool is clear from the
garbage-collection point of view. In the object pool pattern, objects are created only once and then
reused. This will take pressure away from garbage collection. If we didn’t use an object pool, new
objects could be created in a busy loop repeatedly, and soon after they were created, they could be
discarded. This would cause many objects to be made available for garbage collection in a short period
of time. Garbage collection takes processor time, and if the garbage collector has a lot of garbage to
collect, it can slow the application down for an unknown duration at unknown intervals.
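A minimal sketch of the object pool idea (the names and the pool policy are illustrative, not the book's actual implementation):

class ObjectPool:
    def __init__(self, create_object, initial_count: int):
        # Create the pooled objects once up front so that the busy loop
        # can reuse them instead of creating garbage on every iteration
        self.__create_object = create_object
        self.__free_objects = [create_object() for _ in range(initial_count)]

    def acquire(self):
        # Reuse a pooled object if one is free, otherwise create a new one
        return (
            self.__free_objects.pop()
            if self.__free_objects
            else self.__create_object()
        )

    def release(self, obj) -> None:
        # Return the object to the pool for reuse
        self.__free_objects.append(obj)

In the busy loop, you would acquire an object at the start of an iteration and release it back to the pool at the end, so the same objects are reused over and over.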
If you are performing number crunching in your application, do not use the regular Python data
structures; instead, find a suitable library, like numpy, that contains data structures optimized for the
purpose.
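For example, here is an illustrative sketch of summing squares of a large range of numbers with a numpy array instead of a plain Python list:

import numpy as np

# Plain Python: a list of int objects and a Python-level loop
values = list(range(1_000_000))
sum_of_squares = sum(value * value for value in values)

# numpy: a compact array and vectorized operations implemented in C
np_values = np.arange(1_000_000, dtype=np.int64)
np_sum_of_squares = int(np.sum(np_values * np_values))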
Choose an algorithm with reduced complexity as measured using the Big-O notation. This can reduce
the CPU time and/or memory used. In the below example, we use a find algorithm with a list:
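A minimal sketch of such a lookup, here written as a membership test (the values are illustrative):

values = list(range(20_000))

# The membership test must traverse the list: O(n)
is_found = 19_999 in values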
The above algorithm must traverse the list which makes it slower compared to a find algorithm with
a set:
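A corresponding sketch with a set (again illustrative):

values = set(range(20_000))

# The membership test is a hash lookup: O(1) on average
is_found = 19_999 in values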
The below algorithm (list comprehension) will generate a list of 20000 values:
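For example, such a list comprehension could look like this (the computation is illustrative):

values = [value * 2 for value in range(20_000)]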
If we don't need all the 20000 values in memory at the same time, we can use a different algorithm
(a generator expression), which consumes much less memory because the values are produced one
at a time instead of all being held in memory:
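The corresponding generator expression sketch:

values = (value * 2 for value in range(20_000))

# The values are produced lazily, one at a time, when iterated
for value in values:
    ...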
The type of the values object in the above example is Generator which inherits from Iterator. You
can use the values anywhere an iterator is expected.
If you have an expensive pure function that always returns the same result for the same input without
any side effects, you can benefit from caching the function results. You can cache function results
using either the @cache or lru_cache decorator, for example:
from functools import cache

@cache
def make_expensive_calc(value: int) -> int:
    # The function definition is reconstructed for the example;
    # imagine an expensive, side-effect-free computation here
    return value * value

print(make_expensive_calc(1))
# After the first call,
# the function result for the input value 1
# will be cached

print(make_expensive_calc(1))
# The result of the function call is fetched from the cache
@cache is the same as @lru_cache(maxsize=None), i.e. the cache does not have a maximum size limit.
If you are reading or writing very large files, you can benefit from setting custom buffer sizes. The
below examples set the buffer size to 1 MB:
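A sketch of what this could look like with the built-in open function (the file names are illustrative):

ONE_MEGABYTE = 1024 * 1024

# Read a very large file using a 1 MB buffer
with open('large_input.bin', 'rb', buffering=ONE_MEGABYTE) as input_file:
    data = input_file.read()

# Write a very large file using a 1 MB buffer
with open('large_output.bin', 'wb', buffering=ONE_MEGABYTE) as output_file:
    output_file.write(data)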
If your application has many objects with some identical properties, those parts of the objects with
identical properties are wasting memory. You should extract the common properties to a new class
and make the original objects reference a shared object of that new class. Now your objects share
a single common object, and possibly significantly less memory is consumed. This design pattern is
called the flyweight pattern and was described in more detail in the earlier chapter.
6: Testing Principles
Testing is traditionally divided into two categories: functional and non-functional testing. This
chapter will first describe the functional testing principles and then the non-functional testing
principles. Functional testing is typically divided into the following phases:
• Unit testing
• Integration testing
• End-to-end (E2E) testing
The testing pyramid depicts the relative number of tests in each phase. Most tests are unit tests, the
second most are integration tests, and the fewest are E2E tests. Unit tests should cover the whole
codebase of a software component. Unit testing focuses on testing individual public functions as units
(of code). Software component integration tests cover the integration of the unit-tested functions to a
complete working software component, including testing the interfaces to external services. Examples
of external services are a database, a message broker, and other microservices. E2E testing focuses on
testing the end-to-end functionality of a complete software system.
There are various terms used for the different testing levels and phases.
The term component testing is sometimes used to denote only the integration of the unit-tested modules
in a software component without testing the external interfaces, and in connection with that use of
the component testing term, the term integration testing is used to denote the testing of the external
interfaces of a software component. Here I use the term integration testing to denote both the
integration of unit-tested modules and the external interfaces. Typically there is no reason to separate
these tests into separate testing phases.
Unit tests should be written for public functions only. Do not try to test private functions separately.
They should be tested indirectly when testing public functions. Unit tests should test the function
specification, i.e. what the function does in different scenarios, not how the function is implemented.
When you unit test only public functions, you can more easily refactor the function implementation,
e.g. rewrite the private functions that the public function uses, without any modifications to the
related unit tests.
Below is an example of a public function using a private function:
def __read_file(...):
    # ...


def parse_config(...):
    # ...
    # __read_file(...)
    # do_something(...)
    # ...
In the above parse_config.py module, there is one public function, parse_config, and one private
function, __read_file. In unit testing, you should test the public parse_config function in isolation
and mock the do_something function, which is imported from another module. And you indirectly
test the private __read_file function when testing the public parse_config function.
Below is the above example written using classes. You test the class-based version in a similar way
as the above version. You write unit tests for the public parse_config method only. Those tests will
test the private __read_file method indirectly. You must supply a mock instance of the OtherClass
class for the ConfigParser constructor.
class OtherClass:
    # ...


class ConfigParser:
    def __init__(self, other_class: OtherClass):
        self.__other_class = other_class

    # ...
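A sketch of what such a unit test could look like using unittest.mock; the test input and the assertions are illustrative and depend on what parse_config actually does with its collaborator:

import unittest
from unittest.mock import Mock


class ConfigParserTests(unittest.TestCase):
    def test_parse_config(self):
        # GIVEN a mock collaborator instead of a real OtherClass instance
        other_class_mock = Mock(spec=OtherClass)
        config_parser = ConfigParser(other_class_mock)

        # WHEN
        config_parser.parse_config('propName1=value1')

        # THEN assert on the result and on how the mock was used, e.g.
        # other_class_mock.do_something.assert_called_once()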
Unit tests should test all the functionality of a public function: happy path(s), possible failure
situations, security issues, and edge cases so that each code line of the function is covered by at
least one unit test. Security issues in functions are mostly related to the input the function gets. Is
that input secure? If your function receives unvalidated input data from an end user, that data must be
validated to guard against a possible attack by a malicious end user.
Below are some examples of edge cases:
• Is the last loop counter value correct? This test should detect possible off-by-one errors
• Test with an empty array
• Test with the smallest allowed value
• Test with the biggest allowed value
• Test with a negative value
• Test with a zero value
• Test with a very long string
• Test with an empty string
• Test with floating-point values having different precisions
• Test with floating-point values that are rounded differently
• Test with a very small floating-point value
• Test with a very large floating-point value
Unit tests should not test the functionality of dependencies. That is something to be tested with
integration tests. A unit test should test a function in isolation. If a function has one or more
dependencies on other functions defined in different classes (or modules), those dependencies should
be mocked. A mock is something that mimics the behavior of a real object or function. Mocking will
be described in more detail later in this section.
Testing functions in isolation has two benefits. It makes tests faster. This is a real benefit because you
can have a lot of unit tests, and you run them often, so it is crucial that the execution time of the unit
tests is as short as possible. Another benefit is that you don’t need to set up external dependencies,
like a database, a message broker, and other microservices, because you are mocking the functionality
of the dependencies.
Unit tests give you protection against introducing accidental bugs when refactoring code. Unit
tests ensure that the function specification is met by the implementation code. And it should be
remembered that it is hard to write perfect code on the first try. You are bound to practice
refactoring if you want to keep your code base clean and free of technical debt. And when you
refactor, the unit tests are on your side to prevent accidentally introducing bugs.
There are plenty of code samples in the book, but I don't present them using TDD, because it would
make everything more complicated and verbose.
I suggest that you start small with TDD. The best way to start using TDD is when you are implementing
a brand new software component. You have to keep on practising TDD systematically, even if it feels
unnatural at first. Only that way can you build yourself a habit of always using TDD.
The pure TDD cycle consists of the following steps:
1) Write a failing test for a small piece of the wanted functionality
2) Write the simplest possible code that makes the test pass
3) Refactor the code while keeping all the tests passing
4) Repeat from step 1 until the functionality is complete
Let's continue with an example. Suppose there is the following user story in the backlog waiting to
be implemented: the configuration parser should parse a configuration string consisting of 'name=value'
lines into a configuration object from which property values can be queried by name.
Let's first write a test for the 'happy path' of the specified functionality:
import unittest


class ConfigParserTests(unittest.TestCase):
    config_parser = ConfigParserImpl()

    def test_parse(self):
        # GIVEN
        config_str = 'propName1=value1\npropName2=value2'

        # WHEN
        config = self.config_parser.parse(config_str)

        # THEN
        self.assertEqual(config.get_property_value('propName1'), 'value1')
        self.assertEqual(config.get_property_value('propName2'), 'value2')
Now, if we run all the tests, we get an error because the ConfigParserImpl class does not exist yet,
which means that the test case we wrote won't pass. Next, we shall write the simplest possible code
to make the test case pass:
class Configuration(Protocol):
    def get_property_value(self, property_name: str) -> str:
        pass


class ConfigurationImpl(Configuration):
    def __init__(self, prop_name_to_value_dict: dict[str, str]):
        self.__prop_name_to_value_dict: Final = prop_name_to_value_dict

    def get_property_value(self, property_name: str) -> str:
        return self.__prop_name_to_value_dict[property_name]


class ConfigParser(Protocol):
    def parse(self, config_str: str) -> Configuration:
        pass


class ConfigParserImpl(ConfigParser):
    def parse(self, config_str: str) -> Configuration:
        # Parse config_str and assign properties to
        # 'prop_name_to_value_dict' variable
        return ConfigurationImpl(prop_name_to_value_dict)
Now the test passes and we can add new functionality. Let’s add a test for the case when parsing fails.
We can now repeat the TDD cycle from the beginning by creating a failing test first:
import unittest


class ConfigParserTests(unittest.TestCase):
    # ...

    def test_try_parse_when_parsing_fails(self):
        # GIVEN
        config_str = 'invalid'

        try:
            # WHEN
            self.config_parser.try_parse(config_str)

            # THEN
            self.fail('ConfigParser.ParseError should have been raised')
        except ConfigParser.ParseError:
            # THEN error was successfully raised
            pass
Next, we should refactor the implementation to make the second test pass:
class ConfigParser(Protocol):
    class ParseError(DataExporterError):
        pass


class ConfigParserImpl(ConfigParser):
    def try_parse(self, config_str: str) -> Configuration:
        # Try parse config_str and if successful
        # assign config properties to 'prop_name_to_value_dict'
        # variable
        if prop_name_to_value_dict is None:
            raise self.ParseError()
        else:
            return ConfigurationImpl(prop_name_to_value_dict)
We also need to refactor the first unit test to call try_parse instead of parse. We can continue adding
test cases for additional functionality.
For me, the above-described TDD cycle sounds a bit cumbersome. But, there are clear benefits in
creating tests beforehand. When tests are defined first, it is usually less likely that one forgets to
test or implement something. This is because TDD better forces you to think about the function
specification: happy path(s), possible security issues, edge and failure cases.
If you don't practice TDD and always do the implementation first, it is more likely that you forget
an edge case or a particular failure/security scenario. When you don't practice TDD, you go straight
to the implementation, and you tend to think about the happy path(s) only and strive to get them
working. When you are focusing on getting the happy path(s) working, you don't think about the
edge cases and failure/security scenarios much, because you are mentally so strongly focused on the
happy path(s). And if you forget to implement an edge case or failure scenario, you don't test it either.
You can have 100% unit test coverage for a function, but a particular edge case or failure/security
scenario is left unimplemented and untested. This has happened to me, too, and more than once. Only
after realizing that TDD can save me from those kinds of bugs did I start to take TDD seriously. Before
that, I did not realize the actual value of TDD and thought it a bit too cumbersome a process. If there
is only one takeaway for yourself from this book, it should be TDD. Practising TDD will make you
write fewer bugs, and it makes writing code less stressful (this is important!), because you have
tackled the error situations and edge cases before starting to write any code.
As an alternative to the above-described TDD cycle, you can conduct a simplified version of TDD.
In the simplified version of TDD, you first specify the function like in the full-blown TDD. From the
function specification, you extract all the needed tests, including the “happy path”, edge cases and
failure/security scenarios. Then you put a fail call in all the tests so that you don't forget to implement
them later. Additionally, you can put a comment in each test that tells what the expected result is for
a certain given input. For example, in failure scenarios, you can put a comment that tells what kind of
error is expected to be raised, and in an edge case, you can put a comment that tells that with an input
of x, an output of y is expected. After you have implemented a test, the comment can be removed.
Let's say that we have the following function specification: the try_parse method of the configuration
parser should parse a JSON configuration string into a configuration object. It should produce an error
if the JSON parsing fails, if a mandatory property is missing, if a property has an invalid type, or if a
property value fails input validation (an integer out of range or a too long string). A missing optional
property should get a default value, and extra properties should be ignored.
Let’s first write a failing test case for the “happy path” scenario:
import unittest


class ConfigParserTests(unittest.TestCase):
    def test_try_parse(self):
        # Happy path scenario
        self.fail()
import unittest


class ConfigParserTests(unittest.TestCase):
    # ...

    def test_try_parse__when_json_parsing_fails(self):
        # Failure scenario, should produce an error
        self.fail()

    def test_try_parse__when_mandatory_prop_is_missing(self):
        # Failure scenario, should produce an error
        self.fail()

    def test_try_parse__when_optional_prop_is_missing(self):
        # Should use default value
        self.fail()

    def test_try_parse__with_extra_props(self):
        # Extra properties should be ignored
        self.fail()

    def test_try_parse__when_prop_has_invalid_type(self):
        # Failure scenario, should produce an error
        self.fail()

    def test_try_parse__when_integer_prop_out_of_range(self):
        # Input validation security scenario, should produce an error
        self.fail()

    def test_try_parse__when_string_prop_too_long(self):
        # Input validation security scenario, should produce an error
        self.fail()
Now you have a high-level specification of the function in the form of scenarios. Next, you can
continue with the function implementation. After you have completed the function implementation,
implement the tests one by one, and remove the fail calls.
The benefit of this approach is that you don’t have to switch continuously between the implementation
source code file and the test source code file. In each phase, you can focus on one thing:
1) Function specification (including the failing test stubs derived from it)
2) Function implementation
3) Test implementation
In real-life the initial function specification is not always 100% correct or complete. During the
function implementation, you might discover e.g. a new error scenario that was not in the initial
function specification. You should then immediately add a new failing unit test for that new scenario
not to forget to implement it later. Once you think your function implementation is complete, go
through the function code line-by-line and check if any line can produce an error that is not taken
into account yet. Having this habit will reduce the possibility that you accidentally leave some error
unhandled in the function code.
Sometimes you need to modify an existing function, because you are not always able to follow the
open-closed principle for various reasons (for example, it is not possible or feasible). When you need
to modify an existing function, follow the below steps:
1) Specify the change
2) Add/Remove/Modify tests (initially as failing tests)
3) Implement the change
4) Implement the added/modified tests
Let's have an example where we change the configuration parser so that it should produce an error if
the configuration contains extra properties. Now we have the specification of the change defined. Next,
we need to modify the tests. We need to modify the test_try_parse__with_extra_props method as
follows:
import unittest


class ConfigParserTests(unittest.TestCase):
    # ...

    def test_try_parse__with_extra_props(self):
        self.fail()
Next, we implement the wanted change and then implement the above unit test.
Let’s have another example where we change the configuration parser so that the configuration can
be given in YAML in addition to JSON. We need to add the following failing unit tests:
import unittest


class ConfigParserTests(unittest.TestCase):
    # ...

    def test_try_parse__when_config_in_yaml_format(self):
        self.fail()

    def test_try_parse__when_yaml_parsing_fails(self):
        # Should produce an error
        self.fail()
We should also rename the following test methods: test_try_parse and test_try_parse__when_parsing_fails
to test_try_parse__when_config_in_json_format and test_try_parse__when_json_parsing_fails. Next,
we implement the changes to the function, and lastly we implement the two new tests. (Depending on
the actual test implementation, you may or may not need to make a small change to the JSON parsing
related tests to make them pass.)
As a final example, let's do the following change: the configuration does not have any optional properties,
but all properties are mandatory. This means that we can remove the following test:
test_try_parse__when_optional_prop_is_missing. We also need to change the
test_try_parse__when_mandatory_prop_is_missing test. In order to remember to modify the test, we
can initially turn it into a failing test:
import unittest


class ConfigParserTests(unittest.TestCase):
    # ...

    def test_try_parse__when_mandatory_prop_is_missing(self):
        self.fail()
        # Existing implementation here ...
Once we have implemented the change, we can complete the implementation of the test and remove
the fail call.
In the above examples, we had function specifications with happy path and failure/security scenarios.
Let's have an example of a function specification that has edge cases. We should implement a contains
method for a string class. The method should do the following:
The method takes a string argument, and if that string is found in the string that the string
object represents, then True is returned; otherwise False is returned.
We can immediately notice that there are two happy paths and we can create the following failing
tests:
import unittest
class StringTests(unittest.TestCase):
def test_contains__when_arg_string_is_found(self):
# Should return True
self.fail()
def test_contains__when_arg_string_is_not_found(self):
# Should return False
self.fail()
There are several edge cases we might want to test also to make 100% sure that the function works
correctly in every case:
import unittest
class StringTests(unittest.TestCase):
# ...
    def test_contains__strings_are_equal(self):
        # Should return True
        self.fail()

    def test_contains__when_both_strings_are_empty(self):
        # Should return True
        self.fail()

    def test_contains__when_arg_string_is_empty(self):
        # Should return False
        self.fail()

    def test_contains__when_this_string_is_empty(self):
        # Should return False
        self.fail()

    def test_contains__when_arg_string_is_found_at_begin(self):
        # Should return True
        self.fail()

    def test_contains__when_arg_string_is_found_at_end(self):
        # Should return True
        self.fail()

    def test_contains__when_arg_string_is_longer_than_this_string(self):
        # Should return False
        self.fail()
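A minimal sketch of a String class that would satisfy the above happy-path and edge-case tests could look like the following (the String wrapper class and its constructor are assumptions; only the contains method is specified above):

class String:
    def __init__(self, value: str):
        self.__value = value

    def contains(self, other: str) -> bool:
        # An empty argument string is considered to be found only
        # when this string is also empty (per the edge-case tests above)
        if other == '':
            return self.__value == ''

        return other in self.__value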
When functions to be tested are in a class, a respectively named class for unit tests should be
created. For example, if there is a ConfigParser class, the respective class for unit tests should
be ConfigParserTests. This way, it is easy to locate the file containing unit tests for a particular
implementation class.
A test method name should start with a test prefix, after which the name of the tested method should
come. For example, if the tested method is try_parse, the test method name should be test_try_parse. There are usually several tests for a single function. All test method names should begin with
test_<function-name>, but the test method name should also contain a description of the specific
scenario the test method tests, for example: test_try_parse__when_parsing_fails. The name of the
tested scenario is separated from the tested function name by two underscores.
6.1.1.3: Mocking
Python has the unittest.mock library for mocking in unit tests. It allows you to replace parts of your system under test with mock objects and make assertions about how they have been used. The mocking library provides the following ways to mock:
• Creating Mock (or MagicMock) instances directly and injecting them into the code under test
• Patching a class or function in the module where it is used with the patch decorator or context manager
• Patching an attribute or method of an object or class with patch.object
• Patching a dictionary, such as os.environ, with patch.dict
Let's have examples that cover all four different ways of mocking. First, we will have a Kafka client that allows creating a Kafka topic on a Kafka broker. We want the topic creation to be idempotent, i.e., it does not do anything if the topic already exists. We will use the simplified version of TDD in this exercise by first specifying the functionality of the Kafka client as failing unit tests:
class KafkaClientTests(TestCase):
def test_try_create_topic__when_create_succeeds(self):
self.fail()
def test_try_create_topic__when_create_fails(self):
# Raise an error
self.fail()
def test_try_create_topic__when_topic_exists(self):
self.fail()
from confluent_kafka import KafkaError, KafkaException
from confluent_kafka.admin import AdminClient
from confluent_kafka.cimpl import NewTopic


class KafkaClient:
    def __init__(self, kafka_host: str):
        self.__admin_client = AdminClient(
            {'bootstrap.servers': kafka_host}
        )
class CreateTopicError(DataExporterError):
pass
def try_create_topic(
self,
name: str,
num_partitions: int,
replication_factor: int,
retention_in_secs: int,
retention_in_gb: int
):
topic = NewTopic(
name,
num_partitions,
replication_factor,
config={
'retention.ms': str(retention_in_secs * 1000),
'retention.bytes': str(retention_in_gb * pow(10, 9))
}
)
try:
topic_name_to_creation_dict = (
self.__admin_client.create_topics([topic])
)
topic_name_to_creation_dict[name].result()
except KafkaException as error:
if error.args[0].code() != KafkaError.TOPIC_ALREADY_EXISTS:
raise self.CreateTopicError(error)
Let’s implement the first test method to test the successful execution of the try_create_topic method:
class KafkaClientTests(TestCase):
@patch('asyncio.Future')
@patch('KafkaClient.NewTopic')
@patch('KafkaClient.AdminClient')
def test_try_create_topic__when_create_succeeds(
self,
admin_client_class_mock: Mock,
new_topic_class_mock: Mock,
future_class_mock: Mock,
):
# GIVEN
admin_client_mock = admin_client_class_mock.return_value
future_mock = future_class_mock.return_value
admin_client_mock.create_topics.return_value = {
'test': future_mock
}
kafka_client = KafkaClient('localhost:9092')
# WHEN
kafka_client.try_create_topic(
'test',
num_partitions=3,
replication_factor=2,
retention_in_secs=5 * 60,
retention_in_gb=100,
)
# THEN
admin_client_class_mock.assert_called_once_with(
{'bootstrap.servers': 'localhost:9092'}
)
new_topic_class_mock.assert_called_once_with(
'test',
3,
2,
config={
'retention.ms': str(5 * 60 * 1000),
'retention.bytes': str(100 * pow(10, 9)),
},
)
admin_client_mock.create_topics.assert_called_once_with(
[new_topic_class_mock.return_value]
)
future_mock.result.assert_called_once()
In the above example, we use two classes, AdminClient and NewTopic, from the Confluent Kafka library. We cannot use these real dependencies directly in our unit tests, but must mock them. This means we patch both the NewTopic and AdminClient classes in the KafkaClient module, which imports them from confluent_kafka.cimpl and confluent_kafka.admin, respectively. The mocks are created using the @patch decorators. We also mock the asyncio.Future class, because AdminClient.create_topics returns a dict containing a Future instance. The mocked versions of the classes are supplied as arguments to the test_try_create_topic__when_create_succeeds method. We can access the mocked AdminClient and Future instances from the mocked classes using the return_value property. After executing the tested code, we verify the calls made to the mocks.
Let’s add another test for the case when the topic creation fails:
class KafkaClientTests(TestCase):
@patch('asyncio.Future')
@patch('KafkaClient.NewTopic')
@patch('KafkaClient.AdminClient')
def test_try_create_topic__when_create_fails(
self,
admin_client_class_mock: Mock,
new_topic_class_mock: Mock,
future_class_mock: Mock,
):
# GIVEN
kafka_client = KafkaClient('localhost:9092')
admin_client_mock = admin_client_class_mock.return_value
future_mock = future_class_mock.return_value
admin_client_mock.create_topics.return_value = {
'test': future_mock
}
future_mock.result.side_effect = KafkaException(KafkaError(1))
# WHEN
try:
kafka_client.try_create_topic(
'test',
num_partitions=3,
replication_factor=2,
retention_in_secs=5 * 60,
retention_in_gb=100,
)
self.fail('KafkaException should have been raised')
except KafkaClient.CreateTopicError:
pass
# THEN
admin_client_class_mock.assert_called_once_with(
{'bootstrap.servers': 'localhost:9092'}
)
new_topic_class_mock.assert_called_once_with(
'test',
3,
2,
config={
'retention.ms': str(5 * 60 * 1000),
'retention.bytes': str(100 * pow(10, 9)),
},
)
admin_client_mock.create_topics.assert_called_once_with(
[new_topic_class_mock.return_value]
)
The key in the above test is to make the Future mock instance's result method raise a KafkaException as a side effect. Then, in the actual test code, we ensure that a KafkaClient.CreateTopicError is raised; if it is not, we fail the test with a message telling that a KafkaException should have been raised.
The above two test methods contain duplicated code. We should keep the test code clean, too. Let's refactor the test case to remove the duplicated code. We introduce a set_up method that performs the setup of the mocks and the creation of the KafkaClient instance. We refactor the common mock call assertions into a separate private method used by both tests. The patch decorators are set for the whole class, which means that unittest.mock will patch each method whose name starts with the test prefix.
@patch('asyncio.Future')
@patch('KafkaClient.NewTopic')
@patch('KafkaClient.AdminClient')
class KafkaClientTests(TestCase):
def set_up(
self,
admin_client_class_mock: Mock,
future_class_mock: Mock,
) -> None:
# GIVEN
self.admin_client_mock = admin_client_class_mock.return_value
self.future_mock = future_class_mock.return_value
self.admin_client_mock.create_topics.return_value = {
'test': self.future_mock
}
self.topic_params = {
'num_partitions': 3,
'replication_factor': 2,
'retention_in_secs': 5 * 60,
'retention_in_gb': 100,
}
self.kafka_client = KafkaClient('localhost:9092')
def test_try_create_topic__when_create_succeeds(
self,
admin_client_class_mock: Mock,
new_topic_class_mock: Mock,
future_class_mock: Mock,
):
# GIVEN
self.set_up(admin_client_class_mock, future_class_mock)
# WHEN
self.kafka_client.try_create_topic('test', **self.topic_params)
# THEN
self.__assert_mock_calls(
admin_client_class_mock, new_topic_class_mock
)
self.future_mock.result.assert_called_once()
def test_try_create_topic__when_create_fails(
self,
admin_client_class_mock: Mock,
new_topic_class_mock: Mock,
future_class_mock: Mock,
):
# GIVEN
self.set_up(admin_client_class_mock, future_class_mock)
self.future_mock.result.side_effect = KafkaException(KafkaError(1))
# WHEN
try:
self.kafka_client.try_create_topic('test', **self.topic_params)
self.fail('KafkaException should have been raised')
except KafkaClient.CreateTopicError:
pass
# THEN
self.__assert_mock_calls(
admin_client_class_mock, new_topic_class_mock
)
def __assert_mock_calls(
self, admin_client_class_mock: Mock, new_topic_class_mock: Mock
):
admin_client_class_mock.assert_called_once_with(
{'bootstrap.servers': 'localhost:9092'}
)
new_topic_class_mock.assert_called_once_with(
'test',
3,
2,
config={
'retention.ms': str(5 * 60 * 1000),
'retention.bytes': str(100 * pow(10, 9)),
},
)
self.admin_client_mock.create_topics.assert_called_once_with(
[new_topic_class_mock.return_value]
)
class KafkaClientTests(TestCase):
    # ...

    def test_try_create_topic__when_topic_exists(
        self,
        admin_client_class_mock: Mock,
        new_topic_class_mock: Mock,
        future_class_mock: Mock,
    ):
        # GIVEN
        self.set_up(admin_client_class_mock, future_class_mock)
        self.future_mock.result.side_effect = KafkaException(
            KafkaError(KafkaError.TOPIC_ALREADY_EXISTS)
        )
        # WHEN
        self.kafka_client.try_create_topic('test', **self.topic_params)
        # THEN
        self.__assert_mock_calls(
            admin_client_class_mock, new_topic_class_mock
        )
In the above examples, we used patch to create mocks for classes. Let's have another example where we patch library functions directly. We should implement an HTTP client that allows fetching JSON data parsed into a dict from a URL. Let's utilize the simplified TDD and list all possible scenarios for the HTTP client:
• Fetching JSON data from the URL succeeds
• Parsing the fetched JSON data fails. Should raise an error
• Fetching JSON data from the URL fails with an HTTP status code >= 400. Should raise an error
• Not being able to connect to the URL successfully (e.g. malformed URL, connection refused, connection timeout, …). Should raise an error
class HttpClientTests(TestCase):
def test_try_fetch_dict__when_fetch_succeeds(self):
self.fail()
def test_try_fetch_dict__when_json_parse_fails(self):
# Should raise an error
self.fail()
def test_try_fetch_dict__when_response_has_error(self):
# Should raise an error
self.fail()
def test_try_fetch_dict__when_remote_connection_fails(self):
# Should raise an error
self.fail()
Now we can implement the HttpClient class so that it provides the functionality specified by the
above test methods.
import requests
class HttpClient:
# Replace the 'Exception' below with the base error
# class of the software component
class Error(Exception):
pass
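A complete implementation that would make all four of the above tests pass might look like the following sketch. Wrapping every requests.RequestException into HttpClient.Error is an assumption about how the error handling is implemented:

from typing import Any

import requests


class HttpClient:
    # Replace the 'Exception' below with the base error
    # class of the software component
    class Error(Exception):
        pass

    def try_fetch_dict(self, url: str) -> dict[str, Any]:
        try:
            response = requests.get(url, timeout=60)
            response.raise_for_status()
            return response.json()
        except requests.RequestException as error:
            # Connection errors, HTTP errors and JSON decoding
            # errors are all subclasses of requests.RequestException
            raise self.Error(error) from error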
If we didn't use the simplified TDD, we could easily have ended up with the following implementation focusing on the happy path:
import requests
class HttpClient:
def fetch_dict(self, url: str) -> dict[str, Any]:
response = requests.get(url, timeout=60)
response.raise_for_status()
return response.json()
The problem is that it is easy to forget to handle the errors possibly raised by the requests.get and Response.json methods. Using TDD forces us to stop before implementing anything and think about the possible error scenarios and edge cases in addition to the happy path scenario.
Let’s implement the first test method:
URL = 'https://localhost:8080/'
DICT = {'test': 'test'}
class HttpClientTests(TestCase):
@patch('requests.Response.__new__')
@patch('requests.get')
def test_try_fetch_dict__when_fetch_succeeds(
self, requests_get_mock: Mock, response_mock: Mock
):
# GIVEN
requests_get_mock.return_value = response_mock
response_mock.status_code = 200
response_mock.raise_for_status.return_value = None
response_mock.json.return_value = DICT
# WHEN
response_dict = HttpClient().try_fetch_dict(URL)
# THEN
requests_get_mock.assert_called_once_with(URL, timeout=60)
self.assertDictEqual(response_dict, DICT)
import json
from unittest import TestCase
from unittest.mock import Mock, patch
import requests
from HttpClient import HttpClient
URL = 'https://localhost:8080/'
DICT = {'test': 'test'}
class HttpClientTests(TestCase):
@patch('requests.Response.__new__')
@patch('requests.get')
def test_try_fetch_dict__when_json_parse_fails(
self, requests_get_mock: Mock, response_mock: Mock
):
# GIVEN
requests_get_mock.return_value = response_mock
response_mock.status_code = 200
response_mock.raise_for_status.return_value = None
response_mock.json.side_effect = requests.JSONDecodeError(
'JSON decode error', json.dumps(DICT), 1
)
# WHEN
try:
HttpClient().try_fetch_dict(URL)
self.fail('HttpClient.Error should have been raised')
except HttpClient.Error as error:
# THEN
self.assertIn('JSON decode error', str(error))
# THEN
requests_get_mock.assert_called_once_with(URL, timeout=60)
Now we once again have duplicated test code and we must refactor the tests:
import json
from unittest import TestCase
from unittest.mock import Mock, patch
import requests
from HttpClient import HttpClient
URL = 'https://localhost:8080/'
DICT = {'test': 'test'}
@patch('requests.Response.__new__')
@patch('requests.get')
class HttpClientTests(TestCase):
def test_try_fetch_dict__when_fetch_succeeds(
self, requests_get_mock: Mock, response_mock: Mock
):
# GIVEN
requests_get_mock.return_value = response_mock
response_mock.status_code = 200
response_mock.raise_for_status.return_value = None
        response_mock.json.return_value = DICT
# WHEN
        dict_ = HttpClient().try_fetch_dict(URL)
# THEN
requests_get_mock.assert_called_once_with(URL, timeout=60)
self.assertDictEqual(dict_, DICT)
def test_try_fetch_dict__when_json_parse_fails(
self, requests_get_mock: Mock, response_mock: Mock
):
# GIVEN
requests_get_mock.return_value = response_mock
response_mock.status_code = 200
response_mock.raise_for_status.return_value = None
response_mock.json.side_effect = requests.JSONDecodeError(
'JSON decode error', json.dumps(DICT), 1
)
# WHEN
self.assertRaises(
HttpClient.Error, HttpClient().try_fetch_dict, URL
)
# THEN
requests_get_mock.assert_called_once_with(URL, timeout=60)
Let's add the final two test methods to complete the test case. I also changed the try-except blocks to use the assertRaises method to showcase an alternative way to verify that a function call raises an error.
import json
from unittest import TestCase
from unittest.mock import Mock, patch
import requests
from HttpClient import HttpClient
URL = 'https://localhost:8080/'
DICT = {'test': 'test'}
@patch('requests.Response.__new__')
@patch('requests.get')
class HttpClientTests(TestCase):
# ...
def test_try_fetch_dict__when_response_has_error(
self, requests_get_mock: Mock, response_mock: Mock
):
# GIVEN
requests_get_mock.return_value = response_mock
response_mock.status_code = 500
response_mock.raise_for_status.side_effect = requests.HTTPError()
# WHEN
        self.assertRaises(
            HttpClient.Error, HttpClient().try_fetch_dict, URL
        )

        # THEN
requests_get_mock.assert_called_once_with(URL, timeout=60)
def test_try_fetch_dict__when_remote_connection_fails(
self, requests_get_mock: Mock, response_mock: Mock
):
# GIVEN
requests_get_mock.side_effect = requests.ConnectionError()
# WHEN
self.assertRaises(
HttpClient.Error, HttpClient().try_fetch_dict, URL
)
# THEN
requests_get_mock.assert_called_once_with(URL, timeout=60)
Let’s have an example where we use @patch.dict. Let’s assume that we have the following code
without unit tests:
import os
import sys
def main():
kafka_client = KafkaClient(get_environ_var('KAFKA_HOST'))
try:
kafka_client.try_create_topic(
get_environ_var('KAFKA_TOPIC'),
num_partitions=3,
replication_factor=2,
retention_in_secs=5 * 60,
retention_in_gb=100,
)
except KafkaClient.CreateTopicError:
sys.exit(1)
if __name__ == '__main__':
main()
In the unit test case, we use @patch.dict to patch the os.environ dict. In the second test method, we also use the @patch.object decorator instead of the plain @patch decorator. The @patch.object decorator patches a method or attribute of the KafkaClient class with a mock.
import os
from unittest import TestCase
from unittest.mock import Mock, patch
KAFKA_HOST = 'localhost:9092'
KAFKA_TOPIC = 'test'
# WHEN
main()
# THEN
kafka_client_class_mock.assert_called_once_with(KAFKA_HOST)
kafka_client_mock.try_create_topic.assert_called_once_with(
KAFKA_TOPIC,
num_partitions=3,
replication_factor=2,
retention_in_secs=5 * 60,
retention_in_gb=100,
)
@patch.object(KafkaClient, '__init__')
@patch.object(KafkaClient, 'try_create_topic')
@patch('sys.exit')
def test_main__when_exec_failed(
self,
sys_exit_mock: Mock,
try_create_topic_mock: Mock,
kafka_client_init_mock: Mock,
):
# GIVEN
kafka_client_init_mock.return_value = None
try_create_topic_mock.side_effect = KafkaClient.CreateTopicError()
# WHEN
main()
# THEN
kafka_client_init_mock.assert_called_once_with(KAFKA_HOST)
sys_exit_mock.assert_called_once_with(1)
Let's create a unit test for code that uses dependency injection. We have the following code from an earlier chapter, and we would like to create a unit test for the Application class's run method. In the below example, we assume that each class is in its own module named according to the class name and that the di_container = DiContainer() definition is in a module named di_container.
class LogLevel(Enum):
ERROR = 1
WARN = 2
INFO = 3
# ...
class Logger(Protocol):
def log(self, log_level: LogLevel, message: str):
pass
class StdOutLogger(Logger):
    def log(self, log_level: LogLevel, message: str):
        # Log to standard output
        print(f'{log_level.name}: {message}')
class DiContainer(containers.DeclarativeContainer):
wiring_config = containers.WiringConfiguration(
modules=['Application']
)
logger = providers.Singleton(StdOutLogger)
di_container = DiContainer()
class Application:
@inject
def __init__(self, logger: Logger = Provide['logger']):
self.__logger = logger
def run(self):
self.__logger.log(LogLevel.INFO, 'Starting application')
# ...
In the below unit test, we first create a mock instance of Logger class and then override the logger
provider in the DI container with that mock. We use the override context manager to define the scope
of the override.
class ApplicationTests(TestCase):
def test_run__when_execution_succeeds(self):
logger_mock = Mock(Logger)
with di_container.logger.override(logger_mock):
# GIVEN
application = Application()
# WHEN
application.run()
# THEN
logger_mock.log.assert_called_once_with(
LogLevel.INFO, 'Starting application'
)
UI component unit testing differs from regular unit testing because you cannot necessarily test the
functions of a UI component in isolation if you have, for example, a React functional component.
You must conduct UI component unit testing by mounting the component to DOM and then perform
tests by triggering events, for example. This way, you can test the event handler functions of a UI
component. The rendering part should also be tested. It can be tested by producing a snapshot of the
rendered component and storing that in version control. Further rendering tests should compare the
rendered result to the snapshot stored in the version control.
Below is an example of testing the rendering of a React component, NumberInput:
Figure 6.3. NumberInput.test.jsx
import renderer from 'react-test-renderer';
// ...
describe('NumberInput', () => {
  // ...

  describe('render', () => {
    it('renders with buttons on left and right', () => {
      const numberInputAsJson = renderer
        .create(<NumberInput buttonPlacement="leftAndRight" />)
        .toJSON();

      expect(numberInputAsJson).toMatchSnapshot();
    });

    it('renders with buttons on right', () => {
      const numberInputAsJson = renderer
        .create(<NumberInput buttonPlacement="right" />)
        .toJSON();

      expect(numberInputAsJson).toMatchSnapshot();
    });
  });
});
Below is an example unit test for the number input’s decrement button’s click event handler function,
decrementValue:
describe('NumberInput', () => {
// ...
describe('decrementValue', () => {
it('should decrement value by given step amount', () => {
render(<NumberInput value="3" stepAmount={2} />);
fireEvent.click(screen.getByText('-'));
const numberInputElement = screen.getByDisplayValue('1');
expect(numberInputElement).toBeTruthy();
});
});
});
In the above example, we used the testing-library, which has implementations for all the common UI
frameworks: React, Vue and Angular. It means you can use mostly the same testing API regardless
of your UI framework. There are tiny differences, basically only in the syntax of the render method.
If you had implemented some UI components and unit tests for them with React, and you would
like to reimplement them with Vue, you don’t need to reimplement all the unit tests. You only need
to modify them slightly (e.g., make changes to the render function calls). Otherwise, the existing
unit tests should work because the behavior of the UI component did not change, only its internal
implementation from React to Vue.
In the software component integration testing, all public functions of a software component should
be touched by at least one integration test. Not all functionality of the public functions should be
tested because that has already been done in the unit testing phase. This is why there are fewer
integration tests than unit tests. The term integration testing sometimes refers to the integration of a
complete software system or a product. However, it should be used to describe software component
integration only. When testing a product or a software system, the term E2E testing should be used
to avoid confusion and misunderstandings.
The best way to define integration tests is by using behavior-driven development (BDD). BDD encour-
ages teams to use domain-driven design and concrete examples to formalize a shared understanding
of how a software component should behave. In BDD, behavioral specifications are the root of the
integration tests. A team can create behavioral specifications during the initial domain-driven design
phase. This practice will shift the integration testing to the left, meaning that writing the integration
tests starts early and can proceed in parallel with the actual implementation. One widely used and
recommended way to write behavioral specifications is the Gherkin language.
When using the Gherkin language, the behavior of a software component is described as features.
There should be a separate file for each feature. These files have the .feature extension. Each feature
file describes one feature and one or more scenarios for that feature. The first scenario should be the
so-called “happy path” scenario, and other possible scenarios should handle additional happy paths,
failures, and edge cases that need to be tested. Remember that you don’t have to test every failure
and edge case because those were already tested in the unit testing phase.
Below is a simplified example of one feature in a data-visualization-configuration-service. We assume
that the service is a REST API. The feature is for creating a new chart. (In a real-life scenario, a chart
contains more properties like the chart’s data source and what measure(s) and dimension(s) are shown
in the chart, for example). In our simplified example, a chart contains the following properties: layout
id, type, number of x-axis categories shown and how many rows of chart data should be fetched from
the database that acts as a data source for the chart.
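A sketch of what the create_chart.feature file might look like is given below. The wording of the layout-id step and the Then step matches the step implementations shown later in this section; the other step wordings and the property values are illustrative:

Feature: Create chart
  Create a new chart for a layout

  Scenario: Create chart successfully
    Given chart layout id is 1
    And chart type is "line"
    And number of shown x-axis categories is 10
    And fetched row count is 1000
    When I create a new chart
    Then I should get the chart given above with status code 201 "Created"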
The above example shows how the feature’s name is given after the Feature keyword. You can add
free-form text below the feature’s name to describe the feature in more detail. Next, a scenario is
defined after the Scenario keyword. First, the name of the scenario is given. Then comes the steps
of the scenario. Each step is defined using one of the following keywords: Given, When, Then, And, and
But. A scenario should follow the Given-When-Then pattern: the Given steps set up the preconditions, the When step performs the action under test, and the Then steps verify the outcome. The feature's second scenario tests chart creation with missing mandatory properties and ends with the following steps:
Then I should get a response with status code 400 "Bad Request"
And response body should contain error object with
"is mandatory field" entry for following fields
| layout_id |
| fetched_row_count |
| x_axis_categ_shown_count |
| type |
Now we have one feature with two scenarios specified. Next, we shall implement the scenarios. We
want to implement the integration tests in Python, so we will be using the Behave BDD tool that
supports the Gherkin language.
We place integration test code into the source code repository’s integration-tests directory. The feature
files are put in the integration-tests/features directory. Feature directories should be organized into
subdirectories in the same way source code is organized into subdirectories: using domain-driven
design and creating subdirectories for subdomains. We can put the above create_chart.feature file to
the integration-tests/features/chart directory.
Let’s first create an environment.py file in the integration-tests/features to store things common to all
step implementations:
BASE_URL = 'http://localhost:8080/data-visualization-configuration-service/'
Next, we need to provide an implementation for each step in the scenarios. Let's start with the first scenario. We shall create a create_chart_steps.py file in the src/integration-tests/features/chart/steps directory for the implementation of the steps:
import requests
from behave import given, then, when
from behave.runner import Context
from environment import BASE_URL
input_chart = {}
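
# The below @given and @when step implementations are a sketch: the step
# wordings match the feature file shown earlier, and the 'charts' API path
# used in the POST request URL is an assumption
@given('chart layout id is {layout_id:d}')
def step_impl1(context: Context, layout_id: int):
    input_chart['layout_id'] = layout_id


@given('chart type is "{type_}"')
def step_impl2(context: Context, type_: str):
    input_chart['type'] = type_


@given('number of shown x-axis categories is {count:d}')
def step_impl3(context: Context, count: int):
    input_chart['x_axis_categ_shown_count'] = count


@given('fetched row count is {count:d}')
def step_impl4(context: Context, count: int):
    input_chart['fetched_row_count'] = count


@when('I create a new chart')
def step_impl5(context: Context):
    context.response = requests.post(
        f'{BASE_URL}charts', json=input_chart, timeout=60
    )
    context.response_dict = context.response.json()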
@then(
'I should get the chart given above with status code {status_code:d} "{reason}"'
)
def step_impl6(context: Context, status_code: int, reason: str):
assert context.response.status_code == status_code
assert context.response.reason == reason
output_chart = context.response_dict
assert output_chart['id'] > 0
assert output_chart['layout_id'] == input_chart['layout_id']
assert output_chart['type'] == input_chart['type']
assert output_chart['x_axis_categ_shown_count'] == (
input_chart['x_axis_categ_shown_count']
)
assert output_chart['fetched_row_count'] == (
input_chart['fetched_row_count']
)
The above implementation contains a function for each step. Each function is annotated with an annotation for a specific Gherkin keyword: @given, @when, and @then. Note that a step in a scenario can be templated. For example, the step Given chart layout id is 1 is templated and defined in the function @given('chart layout id is {layout_id:d}') def step_impl(context: Context, layout_id: int), where the actual layout id is given as a parameter to the function. You can use this templated step in different scenarios that can give a different value for the layout id, for example: Given chart layout id is 8. The :d modifier after the layout_id tells Behave that this variable should be converted to an integer.
The @when('I create a new chart') step implementation uses the requests package for submitting an HTTP POST request to the data-visualization-configuration-service. And the @then('I should get the chart given above with status code {status_code:d} "{reason}"') step implementation takes the HTTP POST response stored in the context and validates the status code and the properties in the response body.
The second scenario is a common failure scenario where you create something with missing
parameters. Because this scenario is common (i.e., we can use the same steps in other features), we put
the step definitions in a file named common_steps.py in the common subdirectory of the integration-
tests/features/steps directory.
Here are the step implementations:
@then(
'I should get a response with status code {status_code:d} "{reason}"'
)
def step_impl1(context: Context, status_code: int, reason: str):
assert context.response.status_code == status_code
assert context.response.reason == reason
@then(
'response body should contain error object with {error} entry for following fields'
)
def step_impl2(context: Context, error: str):
    error_description = context.response_dict['error_description']

    for row in context.table:
        assert f'{row[0]} {error}' in error_description
To execute the integration tests with Behave, run the behave command in the integration-tests directory. Behave offers command-line parameters for controlling the test run; for example, you can tag features and scenarios with @<tag> in the feature files and then run only the tests with a certain tag using the --tags command-line parameter.
Some frameworks offer their own way of creating integration tests. For example, the Django web framework offers its own way of doing integration tests. There are two reasons why I don't recommend using framework-specific testing tools. The first reason is that your integration tests become coupled to the framework, and if you decide to reimplement your microservice using a different language or a different framework, you need to reimplement the integration tests as well. When you use a generic BDD integration testing tool like Behave, your integration tests are not coupled to any microservice implementation programming language or framework. The second reason is that there is less learning and information burden for QA/test engineers when they don't have to master multiple framework-specific integration testing tools. If you use a single BDD integration testing tool like Behave in all the microservices of a software system, it will be easier for QA/test engineers to work with different microservices.
For API microservices, one more alternative for implementing integration tests is an API development platform like Postman1. Postman can be used to write integration tests using JavaScript.
Suppose we have an API microservice named sales-item-service which offers CRUD operations on sales items. Below is an example API request body for creating a new sales item. You can define this in Postman as a new request:
{
  "name": "Test sales item",
  "price": 10
}
Here is a Postman test case to validate the response to the above request:
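A sketch of what such a test script might look like is shown below; the expected 201 status code and the use of collection variables for storing the sales item id are assumptions:

pm.test('sales item is created successfully', () => {
  // Verify the response status code
  pm.response.to.have.status(201);

  // Parse the salesItem object from the response body
  const salesItem = pm.response.json();

  // Store the id of the created sales item for subsequent test cases
  pm.collectionVariables.set('salesItemId', salesItem.id);

  // Verify the name and price properties
  pm.expect(salesItem.name).to.eql('Test sales item');
  pm.expect(salesItem.price).to.eql(10);
});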
In the above test case, the response status code is verified first, and then the salesItem object is
parsed from the response body. Value for the variable salesItemId is set. This variable will be used
in subsequent test cases. Finally, the values of the name and price properties are checked.
Next, a new API request could be created in Postman to retrieve the just created sales item:
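Assuming the service exposes a sales-items resource and runs locally on port 3000, the request might look like this:

GET http://localhost:3000/sales-items/{{salesItemId}}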
We used the value stored in the salesItemId variable in the request URL. Variables can be used in the
URL and request body using the following notation: {{<variable-name>}}. Let’s create a test case for
the above request:
1 https://www.postman.com/
API integration tests written in Postman can be utilized in a CI pipeline. An easy way to do that is to
export a Postman collection to a file that contains all the API requests and related tests. A Postman
collection file is a JSON file. Postman offers a Node.js command-line utility called Newman2 . It can
be used to run API requests and related tests in an exported Postman collection file.
You can run integration tests in an exported Postman collection file with the below command in a CI
pipeline:
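newman run integration-tests/integrationTestsPostmanCollection.json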
In the above example, we assume that a file named integrationTestsPostmanCollection.json has been
exported to the integration-tests directory in the source code repository.
You can also use the Gherkin language when specifying UI features. For example, the TestCafe UI
testing tool can be used with the gherkin-testcafe tool to make TestCafe support the Gherkin syntax.
Let’s create a simple UI feature:
Next, we can implement the above steps in JavaScript using the TestCafe testing API:
2 https://learning.postman.com/docs/running-collections/using-newman-cli/installing-running-newman/
// Imports...
There is another similar tool to TestCafe, namely Cypress. You can also use Gherkin with Cypress
with the cypress-cucumber-preprocessor package. Then you can write your UI integration tests like
this:
Before integration tests can be run, an integration testing environment must be set up. An integration
testing environment is where the tested microservice and all its dependencies are running. The easiest
way to set up an integration testing environment for a containerized microservice is to use Docker
Compose, a simple container orchestration tool for a single host.
Let’s create a docker-compose.yml file for the sales-item-service microservice, which has a MySQL
database as a dependency. The database is used by the microservice to store sales items.
Figure 6.5. docker-compose.yaml
version: "3.8"
services:
wait-for-services-ready:
image: dokku/wait
sales-item-service:
restart: always
build:
context: .
env_file: .env.ci
ports:
- "3000:3000"
depends_on:
- mysql
mysql:
image: mysql:8.0.22
command: --default-authentication-plugin=mysql_native_password
restart: always
cap_add:
- SYS_NICE
environment:
MYSQL_ROOT_PASSWORD: ${MYSQL_PASSWORD}
ports:
- "3306:3306"
In the above example, we first define a service wait-for-services-ready which we will use later. Next,
we define our microservice, sales-item-service. We ask Docker Compose to build a container image for
the sales-item-service using the Dockerfile in the current directory. Then we define the environment
for the microservice to be read from an .env.ci file. We expose port 3000 and tell that our microservice
depends on the mysql service.
Next, we define the mysql service. We tell what image to use, give a command-line parameter and
define the environment and expose a port.
Before we can run the integration tests, we must spin the integration testing environment up using
the docker-compose up command:
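docker-compose --env-file .env.ci up --build -d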
We tell the docker-compose command to read environment variables from an .env.ci file, which should
contain an environment variable named MYSQL_PASSWORD. We ask docker-compose to always build
the sales-item-service by specifying the --build flag. The -d flag tells docker-compose to run in the
background.
Before we can run the integration tests, we must wait until all services defined in the docker-
compose.yml are up and running. We use the wait-for-services-ready service provided by the
dokku/wait3 image. We can wait for the services to be ready by issuing the following command:
docker-compose \
  --env-file .env.ci \
  run wait-for-services-ready \
  -c mysql:3306,sales-item-service:3000 \
  -t 600
The above command will finish after mysql service’s port 3306 and sales-item-service’s port 3000 can
be connected. After the above command is finished, you can run the integration tests. In the below
example, we run the integration tests using the newman CLI tool:
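newman run integration-tests/integrationTestsPostmanCollection.json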
If your integration tests are implemented using Behave, you can run them by going to the integration-
tests directory and running the behave command there.
After integration tests are completed, you can shut down the integration testing environment:
docker-compose down
If you need other dependencies in your integration testing environment, you can add them to the
docker-compose.yml file. If you need to add other microservices with dependencies, you must also
add transitive dependencies. For example, if you needed to add another microservice that uses a
PostgreSQL database, you would need to add both the other microservice and PostgreSQL database
to the docker-compose.yml file.
Let’s say the sales-item-service depends on Apache Kafka 2.x that depends on a Zookeeper service.
The sales-item-service’s docker-compose.yml looks like the below after adding Kafka and Zookeeper:
3 https://hub.docker.com/r/dokku/wait
version: "3.8"
services:
wait-for-services-ready:
image: dokku/wait
sales-item-service:
restart: always
build:
context: .
env_file: .env.ci
ports:
- 3000:3000
depends_on:
- mysql
- kafka
mysql:
image: mysql:8.0.22
command: --default-authentication-plugin=mysql_native_password
restart: always
cap_add:
- SYS_NICE
environment:
MYSQL_ROOT_PASSWORD: ${MYSQL_PASSWORD}
ports:
- "3306:3306"
zookeeper:
image: bitnami/zookeeper:3.7
volumes:
- "zookeeper_data:/bitnami"
ports:
- 2181:2181"
environment:
- ALLOW_ANONYMOUS_LOGIN=yes
kafka:
image: bitnami/kafka:2.8.1
volumes:
- "kafka_data:/bitnami"
ports:
- "9092:9092"
environment:
- KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
- ALLOW_PLAINTEXT_LISTENER=yes
depends_on:
- zookeeper
volumes:
zookeeper_data:
driver: local
kafka_data:
driver: local
As the name says, in E2E testing, test cases should be end-to-end. They should test that each
microservice is deployed correctly to the test environment and connected to its dependent services.
The idea of E2E test cases is not to test details of microservices’ functionality because that has already
been tested as part of unit and software component integration testing.
Let’s consider a telecom network analytics software system that consists of the following applications:
• Data ingestion
• Data correlation
• Data aggregation
• Data exporter
• Data visualization
The southbound interface of the software system is the data ingestion application. The data
visualization application provides a web client as a northbound interface. Additionally, the data
exporter application provides another northbound interface for the software system.
E2E tests are designed and implemented similarly to software component integration tests. We are
just integrating different things (microservices instead of functions). E2E testing should start with the
specification of E2E features. These features can be specified using the Gherkin language and put in
.feature files.
You can start specifying and implementing E2E tests right after the architectural design for the software system is completed. This way, you can shift the implementation of the E2E tests to the left and speed up the development phase. You should not start specifying and implementing E2E tests only when the whole software system is already implemented.
Our example software system should have at least two happy-path E2E features. One for testing the
data flow from data ingestion to data visualization and another feature to test the data flow from data
ingestion to data export. Below is the specification of the first E2E feature:
And then, we can create the other feature that tests the E2E path from data ingestion to data export:
Next, E2E tests can be implemented. Any programming language and tool compatible with the
Gherkin syntax, like Behave with Python, can be used. If the QA/Test engineers in the development
teams already use Behave for integration tests, it would be natural to use Behave also for the E2E
tests.
The software system we want to E2E test must reside in a production-like test environment. Usually,
E2E testing is done in both the CI and the staging environment(s). Before running the E2E tests,
software needs to be deployed to the test environment.
If we consider the first feature above, implementing the E2E test steps can be done so that the steps in
the Given part of the scenario are implemented using externalized configuration. If our software
system runs in a Kubernetes cluster, we can configure the microservices by creating the needed
ConfigMaps. The southbound interface simulator can be controlled by launching a Kubernetes Job
or, if the southbound interface simulator is a microservice with an API, commanding it via its API.
After waiting for all the ingested data to be aggregated and visualized, the E2E test can launch a test
tool suited for web UI testing (like TestCafe) to export chart data from the web UI to downloaded
files. Then the E2E test compares the content of those files with expected values.
You can run E2E tests in a CI environment after each commit to the main branch (i.e., after a
microservice CI/CD pipeline run is finished) to test that a new commit did not break any E2E tests.
Alternatively, if the E2E tests are complex and take a long time to execute, you can run the E2E tests
in the CI environment on a schedule, like hourly.
You can run E2E tests in a staging environment using a separate pipeline in your CI/CD tool.
• Performance testing
• Data volume testing
• Stability testing
• Reliability testing
• Stress and scalability testing
• Security testing
Many parts of the reliability testing can be automated. You can use ready-made chaos engineering
tools or create your own tools. Use a tool to induce failures in the environment. Then verify that
services remain either highly available or at least swiftly recover from failures.
Considering the telecom network analytics software system, we could introduce a test case where
the message broker (e.g., Kafka) is shut down. Then we expect alerts triggered after a while by the
microservices that try to use the unavailable message broker. After the message broker is started, the
alerts should cancel automatically, and the microservices should continue normal operation.
It is also possible to specify the autoscaling to use an external metric. An external metric could be
Kafka consumer lag, for instance. If Kafka consumer lag grows too high, the HPA can scale the
microservice out to have more processing power for the Kafka consumer group and when the Kafka
consumer lag decreases below a defined threshold, HPA can scale the microservice in to reduce the
number of pods.
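As an illustration, a HorizontalPodAutoscaler manifest using such an external metric might look like the following sketch (the metric name, the selector, and the target value depend on the used metrics adapter and are assumptions):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-microservice
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-microservice
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          # The metric name and labels depend on the used metrics adapter
          name: kafka_consumergroup_lag
          selector:
            matchLabels:
              consumergroup: my-consumer-group
        target:
          type: AverageValue
          averageValue: "1000"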
• Cross-site scripting
• SQL injection
• Path disclosure
• Denial of service
• Code execution
• Memory corruption
• Cross-site request forgery (CSRF)
• Information disclosure
• Local/remote file inclusion
A complete list of possible security vulnerabilities found by the OWASP ZAP tool can be found at
https://www.zaproxy.org/docs/alerts/.
I want to bring up visual testing here because it is important. Backstop.js and cypress-plugin-snapshots
test web UI’s HTML and CSS using snapshot testing. Snapshots are screenshots taken of the web UI.
Snapshots are compared to ensure that the visual look of the application stays the same and there are
no bugs introduced with HTML or CSS changes.
7: Security Principles
This chapter describes principles related to security and addresses the main security features related
to software developers.
Security is an integral part of production-quality software, just like the source code itself and all the tests. Suppose that security-related features are implemented only in a very late phase of a project. In that case, there is a greater possibility of not finding time to implement them or forgetting to implement them altogether. For that reason, security-related features should be implemented first rather than last. The threat modeling process described in the next section should be used to identify potential threats and provide a list of security features that need to be implemented as threat countermeasures.
The security lead works closely with the development teams. He/she educates the teams on security-related processes and security features. The security lead facilitates the below-described threat modeling process for the teams, but following the process is the responsibility of the team, as is the actual implementation of the security features.
• Decompose application
• Determine and rank threats
• Determine countermeasures and mitigation
The STRIDE threat categories and their descriptions are:
• Spoofing: Attacker acting as another user without real authentication or using stolen credentials
• Tampering: Attacker maliciously changing data
• Repudiation: Attacker being able to perform prohibited operations
• Information disclosure: Attacker gaining access to sensitive data
• Denial of service: Attacker trying to make the service unusable
• Elevation of privilege: Attacker gaining unwanted access rights
• Spoofing
– Attacker is able to read another user's data using that user's id when proper authorization is missing
– Attacker is able to steal user credentials on the network because an insecure protocol, like HTTP instead of HTTPS, is used
– Attacker creates a fake website login page to steal user credentials
– Attacker is able to intercept network traffic and replay some user's requests as such or modified
• Tampering
– Attacker gains access to the database using SQL injection and is able to change existing data
– Attacker is able to modify another user's data using that user's id when proper authorization is missing
• Repudiation
– Attacker is able to perform malicious actions without notice when audit logging is missing
• Information disclosure
– Sensitive information is accidentally sent in request responses (like error stack traces or
business-critical data)
– Sensitive information is not properly encrypted
– Sensitive information is accessible without proper authorization (e.g. role-based)
• Denial of service
– Attacker can create an unlimited number of requests when proper request rate limiting is missing
– Attacker can send requests with a large amount of data when the data size is not limited at all
– Attacker can try to make regular expression DoS (ReDoS) attacks by sending strings that can cause regular expression evaluation to take a lot of CPU time
– Attacker can send invalid values in requests in order to try to crash the service or cause an infinite loop if no proper input validation is in place
• Elevation of privilege
– Attacker who does not have user account can access the service because of missing
authentication/authorization
– Attacker is able to act as an administrator because the service does not check that the user
has a proper role
– Attacker is able to access operating system root rights, because the process is running
with root user rights.
The Application Security Frame (ASF) categorizes application security features into the following
categories:
• Audit & Logging: Logging user actions to detect, e.g., repudiation attacks
• Authentication: Prohibit identity spoofing attacks
• Authorization: Prohibit elevation of privilege attacks
• Configuration Management: Proper storage of secrets and configuring the system with the least privileges
• Data Protection in Transit and Rest: Using secure protocols like TLS, encrypting sensitive information like PII in databases
• Data Validation: Validate input data from users to prevent, e.g., injection and ReDoS attacks
• Exception Management: Do not reveal implementation details in error messages to end-users
When using either of the above-described threat categorization methodologies, threats in each
category should be listed based on the information about the decomposed application: what are the
application entry points and assets that need to be secured? After listing potential threats in each
category, the threats should be ranked. There are several ways to rank threats. The simplest way to
rank threats is to put them in one of the three categories based on the risk: high, medium, and low.
As a basis for the ranking, you can use information about the threat’s probability and how big an
adverse effect (impact) it has. The idea of ranking is to prioritize security features. Security features
for high-risk threats should be implemented first.
In this phase, we will decompose the order-service to see what parts it is composed of and what are
its dependencies.
Based on the above decomposed view of the order-service, we shall next identify the following:
As drawn in the above picture, the attacker's entry points are from the internet (the order-service is exposed to the public internet via the API Gateway), and an internal attacker could also be able to sniff the network traffic between services.
Assets under threat are the API Gateway, the order-service, its database, and unencrypted network traffic.
The order-service has the following trust levels:
• Users can place orders for themselves (not for other users)
• Users can view their orders (not other users)
• Users can update their own order only before it is packaged and shipped
• Administrator can create/read/update/delete any order
Next, we should list possible threats in each category of the STRIDE method. We also define the risk
for each possible threat.
1. Spoofing
2. Tampering
1. Attacker trying to tamper with database using SQL injection (Risk: High)
2. Attacker able to capture and modify unencrypted internet traffic (Risk: High)
3. Attacker able to capture and modify unencrypted internal network traffic (Risk: Low)
3. Repudiation
1. Attacker being able to conduct malicious operations without getting caught (Risk: High)
4. Information disclosure
1. Attacker able to access sensitive information because that is not properly encrypted (Risk:
Medium)
2. Attacker receives sensitive information like detailed stack traces in request responses.
(Risk: Medium) Attacker can use that information and exploit possible security holes in
the implementation
3. Information is disclosed to attacker because internet traffic is plain-text, i.e. not secured
(Risk: High)
4. Information is disclosed to attacker because internal network traffic is plain-text, i.e. not
secured (Risk: Low)
5. Denial of service
6. Elevation of Privilege
1. Attacker who does not have user account can access the service because of missing
authentication/authorization (Risk: High)
2. Attacker is able to act as an administrator because the service does not check that the user
has a proper role (Risk: High)
3. Attacker is able to access operating system root rights, because the process is running
with root user rights (Risk: Medium)
1. Allow only the user that owns a certain resource to access it (1.1, 1.2)
2. Implement audit logging for operations that create/modify/delete orders (1.3, 3.1)
3. Use parameterized statements for SQL or use an ORM, and also configure the least permissions for the database user (2.1) The normal database user should not be able to do anything that is administrator related, like deleting or creating/dropping tables etc.
4. Only allow secure internet traffic to API gateway (TLS is terminated at the API gateway) (1.3,
2.2)
5. Implement mTLS between services using a service mesh like Istio (2.3, 4.4)
6. Encrypt all sensitive information like Personally Identifiable Information (PII) and critical
business data in the database (4.1)
7. Do not return error stack traces when the microservice is running in production (4.2)
8. Implement request rate-limiting, e.g. in the API gateway (5.1.)
9. Validate input data to the microservice and define maximum allowed string, array and request lengths (5.2) Additionally, consider audit logging input validation failures
10. Do not use regular expression in validation or use regexps that cannot cause ReDoS (5.3.)
11. Validate input data to the microservice, e.g. correct types, min/max of numeric values, list of
allowed values (5.4) Additionally consider audit logging input validation failures
12. Implement user authentication and authorization using JWTs (1.1, 1.2, 6.1) Consider audit
logging authentication/authorization failures to detect possible attacks
13. For administrator only operations, verify that the JWT contains admin role, before allowing
the operation (1.1, 1.2, 6.2) Additionally configure the system so that admin operations are not
accessible from the internet unless absolutely needed.
14. For the containerized microservice, define the following:
Next we should prioritize the above user stories according to related threat risk levels. Let’s calculate
a priority index for each user story using the following values for threat risk levels:
• High = 3
• Medium = 2
• Low = 1
Here are the prioritized user stories from the highest priority index (PI) to the lowest. The PI of a user story is the sum of the risk-level values of the threats it mitigates:
8. Use parameterized statements for SQL or use an ORM, and also configure the least permissions for the database user (PI: 3) The normal database user should not be able to do anything that is administrator related, like deleting or creating/dropping tables etc.
9. Do not use regular expression in validation or use regexps that cannot cause ReDoS (PI: 3)
10. Validate input data to the microservice, e.g. min/max numeric values, list of allowed values
(PI: 3)
11. Encrypt all sensitive information like Personally Identifiable Information (PII) and critical
business data in the database (PI: 2)
12. Implement mTLS between services using a service mesh like Istio (PI: 2)
13. Do not return error stack traces when the microservice is running in production (PI: 2)
14. For the containerized microservice, define the following: … (PI: 2)
The team should review the prioritized list of security user stories with the product security lead. Because security is an integral part of a software system, at least all the above user stories having a PI greater than 2 should be implemented before delivering the first production version. The user stories with PI <= 2 could be delivered in the first feature package after the initial delivery. This was just an example. Everything depends on what level of security is wanted and/or required, and the relevant stakeholders should be involved in making decisions about the level of security.
In the above example, we did not list threats related to missing API-security-related HTTP response headers. This is because they are the same for any REST API. These API-security-related HTTP response headers are discussed in a later section of this chapter. The sending of these headers should be consolidated in the API gateway so that individual API microservices don't have to implement them themselves.
– Attacker being able to conduct malicious operations without getting caught (Risk: High)
– Attacker acting as someone else using stolen credentials (Risk: Medium)
• Authentication
– Attacker who does not have user account can access the service because of missing
authentication/authorization (Risk: High)
• Authorization
– Attacker is able to act as an administrator because the service does not check that the user
has a proper role (Risk: High)
– Attacker trying to create an order for someone else (Risk: High)
– Attacker trying read/update someone else’s order (Risk: High)
• Configuration Management
– Attacker is able to access operating system root rights, because the process is running
with root user rights (Risk: Medium)
– Attacker trying to tamper with database using SQL injection (Risk: High)
– Attacker able to capture and modify unencrypted internet traffic (Risk: High)
– Attacker able to capture and modify unencrypted internal network traffic (Risk: Low)
– Attacker able to access sensitive information because that is not properly encrypted (Risk:
Medium)
– Information is disclosed to attacker because internet traffic is plain-text, i.e. not secured
(Risk: High)
– Information is disclosed to attacker because internal network traffic is plain-text, i.e. not
secured (Risk: Low)
• Data Validation
• Exception Management
– Attacker receives sensitive information like detailed stack traces in request responses.
(Risk: Medium)
You can even use two different threat categorization methods, like STRIDE and ASF, together, because when using multiple methods, you are more likely to discover all the possible threats. Now, considering the ASF categorization, we can see that the Configuration Management category speaks about the storage of secrets. When we used STRIDE, we did not discover any threat related to secrets. But if we think about it, our order-service should have at least three secrets: the database username, the database user's password, and the encryption key used to encrypt sensitive data in the database. We must store these secrets in a safe place, such as a Secret in a Kubernetes environment. None of these secrets should be hard-coded in the source code.
Regarding frontend authorization, attention must be paid to the secure storage of authorization-
related secrets like code verifier and tokens. Those must be stored in a secure location in the browser.
Below is a list of some insecure storing mechanisms:
• Cookies
• Session/Local Storage
– Easily stolen by malicious code because the encryption key is in plain text
• Global variable
Storing secrets in closure variables is not inherently insecure, but secrets are lost on page refresh or
new page.
Below is an example that uses a service worker as a secure storage of secrets. The additional benefit
of a service worker is that it does not allow malicious 3rd party code to modify the service worker’s
fetch method so that it can, for example, steal access tokens.
const originalFetch = fetch;

fetch = (url, options) => {
  // Implement malicious attack here
  // For example: change some data in the request body
  return originalFetch(url, options);
};
Of course, one can ask: why is it possible to modify the built-in method on the global object like that?
Of course, it should not be possible, but unfortunately, it is.
Let’s create a Vue.js application that performs authentication and authorization using the OpenID
Connect protocol, an extension of the OAuth2 protocol.
In the main module below, we set up the global fetch to always return an error and only allow our
tryMakeHttpRequest function to use the original global fetch method. Then we register a service
worker. If the service worker has already been registered, it is not registered again. Finally, we
create the application (App component), activate the router, activate the Pinia middleware for state
management, and mount the application to a DOM node:
Figure 7.2. main.ts
import { setupFetch } from "@/tryMakeHttpRequest";
setupFetch();
import { createApp } from "vue";
import { createPinia } from "pinia";
import App from "@/App.vue";
import router from "@/router";
if ("serviceWorker" in navigator) {
await navigator.serviceWorker.register("/serviceWorker.js");
}
Below is the definition of the App component. After mounting, it will check whether the user is already authorized.
If the user is authorized, their authorization information will be fetched from the service worker, the user's first name will be updated in the authorization information store, and the user will be forwarded to the Home page.
If the user is not authorized, authorization will be performed.
Figure 7.3. App.vue
<template>
<HeaderView />
<router-view></router-view>
</template>
<script setup>
import { onMounted } from "vue";
import { useRouter } from "vue-router";
import authorizationService from "@/authService";
import { useAuthInfoStore } from "@/stores/authInfoStore";
import HeaderView from "@/HeaderView.vue";
import tryMakeHttpRequest from "@/tryMakeHttpRequest";
onMounted(async () => {
const response = await tryMakeHttpRequest("/authorizedUserInfo");
const responseBody = await response.text();
if (responseBody !== "") {
const authorizedUserInfo = JSON.parse(responseBody);
const { setFirstName } = useAuthInfoStore();
setFirstName(authorizedUserInfo.firstName);
router.push({ name: "home" });
} else if (route.path !== '/auth') {
authorizationService
.tryAuthorize()
.catch(() => router.push({ name: "auth-error" }));
}
});
</script>
The header of the application displays the first name of the logged-in user and a button for logging
the user out:
Figure 7.5. HeaderView.vue
<template>
<span>{{authInfoStore.firstName}}</span>
<button @click="logout">Logout</button>
</template>
<script setup>
import { useRouter } from "vue-router";
import authorizationService from "@/authService";
import { useAuthInfoStore } from "@/stores/authInfoStore";
function logout() {
authorizationService
.tryLogout()
.catch(() => router.push({ name: "auth-error" }));
}
</script>
The tryMakeHttpRequest function is a wrapper around the browser’s global fetch method. It will start
an authorization procedure if an HTTP request returns the HTTP status code 403 Forbidden.
Figure 7.6. tryMakeHttpRequest.ts
return response;
});
}
const allowedOrigins = [
"http://localhost:8080", // IAM in dev environment
"http://localhost:3000", // API in dev environment
"https://software-system-x.domain.com" // prod environment
];
function respondWithUserInfo(event) {
const response =
new Response(data.authorizedUserInfo
? JSON.stringify(data.authorizedUserInfo)
: '');
event.respondWith(response);
}
function respondWithIdToken(event) {
const response = new Response(data.idToken
? data.idToken
: '');
event.respondWith(response);
}
function respondWithTokenRequest(event) {
let body = "grant_type=authorization_code";
body += `&code=${data.code}`;
body += `&client_id=app-x`;
body += `&redirect_uri=${data.redirectUri}`;
body += `&code_verifier=${data.codeVerifier}`;
const tokenRequest = new Request(event.request, { body });
event.respondWith(fetch(tokenRequest));
}
function respondWithApiRequest(event) {
event.respondWith(fetch(authorizedRequest));
}
function fetchHandler(event) {
const requestUrl = new URL(event.request.url);
if (event.request.url.endsWith('/authorizedUserInfo') &&
!apiEndpointRegex.test(requestUrl.pathname)) {
respondWithUserInfo(event);
} else if (event.request.url.endsWith('/idToken') &&
!apiEndpointRegex.test(requestUrl.pathname)) {
respondWithIdToken(event);
} else if (allowedOrigins.includes(requestUrl.origin)) {
if (tokenEndpointRegex.test(requestUrl.pathname)) {
respondWithTokenRequest(event);
} else if (apiEndpointRegex.test(requestUrl.pathname)) {
respondWithApiRequest(event);
}
} else {
event.respondWith(fetch(event.request));
}
}
Authorization using the OAuth2 Authorization Code Flow is started with a browser redirect to a URL
of the following kind:
https://authorization-server.com/auth?response_type=code&client_id=CLIENT_ID&redirect_uri\
=https://example-app.com/cb&scope=photos&state=1234zyx...ghvx3&code_challenge=CODE_CHALLENGE\
&code_challenge_method=S256
• state - A random string generated by your application, which you’ll verify later
• code_challenge - PKCE extension: URL-safe base64-encoded SHA256 hash of the code verifier.
A code verifier is a random string secret you generate
• code_challenge_method=S256 - PKCE extension: indicates which hashing method is used (S256
means SHA256)
We need to use the PKCE extension as an additional security measure because we perform the
Authorization Code Flow in the frontend instead of the backend.
If authorization is successful, the authorization server will redirect the browser to the above-given redirect_uri with code and state given as URL query parameters, for example:
https://example-app.com/cb?code=AUTH_CODE_HERE&state=1234zyx...ghvx3
After the application is successfully authorized, tokens can be requested with the following kind of
HTTP POST request:
grant_type=authorization_code&
code=AUTH_CODE_HERE&
redirect_uri=REDIRECT_URI&
client_id=CLIENT_ID&
code_verifier=CODE_VERIFIER
interface AuthorizedUserInfo {
readonly userName: string;
readonly firstName: string;
readonly lastName: string;
readonly email: string;
}
const response =
await tryMakeHttpRequest(oidcConfiguration.token_endpoint, {
method: "post",
mode: "cors",
headers: {
"Content-Type": "application/x-www-form-urlencoded",
},
});
return response.json();
}
authUrl += "?response_type=code";
authUrl += "&scope=openid+profile+email";
authUrl += `&client_id=${this.clientId}`;
authUrl += `&redirect_uri=${this.authRedirectUrl}`;
authUrl += `&state=${state}`;
authUrl += `&code_challenge=${challenge.code_challenge}`;
authUrl += "&code_challenge_method=S256";
return authUrl;
}
navigator.serviceWorker?.controller?.postMessage({
key: "refreshToken",
value: tokens.refresh_token,
});
navigator.serviceWorker?.controller?.postMessage({
key: "idToken",
value: tokens.id_token,
});
}
private storeAuthorizedUserInfo(
idToken: any,
authInfoStore: ReturnType<typeof useAuthInfoStore>
) {
const idTokenClaims: any = jwt_decode(idToken);
const authorizedUserInfo = {
userName: idTokenClaims.preferred_username,
firstName: idTokenClaims.given_name,
lastName: idTokenClaims.family_name,
email: idTokenClaims.email,
};
navigator.serviceWorker?.controller?.postMessage({
key: "authorizedUserInfo",
value: authorizedUserInfo
});
authInfoStore.setFirstName(idTokenClaims.given_name);
}
}
Below is an example response you get when you execute the tryMakeHttpRequest function in the
tryGetTokens method:
{
"access_token": "eyJz93a...k4laUWw",
"id_token": "UFn43f...c5vvfGF",
"refresh_token": "GEbRxBN...edjnXbL",
"token_type": "Bearer",
"expires_in": 3600
}
The AuthorizationCallback component is the component that will be rendered when the autho-
rization server redirects the browser back to the application after successful authorization. This
component stores the authorization code and the received state in the service worker and initiates
a request for tokens. After receiving tokens, it will route the application to the home page. As an
additional security measure, the token request will only be performed if the original state and received
state are equal. This check is done in the service worker code.
Figure 7.9. AuthorizationCallback.vue
<template>
<div></div>
</template>
<script setup>
import { onMounted } from "vue";
import { useRouter, useRoute } from "vue-router";
import authorizationService from "@/authService";
import { useAuthInfoStore } from "@/stores/authInfoStore";
const { query } = useRoute();
onMounted(async () => {
// Store authorization code in service worker
navigator.serviceWorker?.controller?.postMessage({
key: "code",
value: query.code,
});
The routes of the application are defined as follows:
const routes = [
{
path: "/",
name: "login",
component: LoginView,
},
{
path: "/auth",
name: "auth",
component: AuthorizationCallback,
},
{
path: "/auth-error",
name: "auth-error",
component: AuthorizationError,
},
{
path: "/home",
name: "home",
component: HomeView,
},
];
The authService module below contains definitions of the needed constants and creates an instance of the AuthorizationService class. The code contains values for a local development environment. In real life, these values should be taken from environment variables. The values work if you have a Keycloak service running at localhost:8080 and the Vue app running at localhost:5173. You must create a client named 'app-x' in Keycloak. Additionally, you must define a valid redirect URI and add an allowed web origin. Lastly, you must configure a valid post-logout redirect URI. The default access token lifetime in Keycloak is just one minute; you can increase it for testing purposes in the realm settings (the Tokens tab).
const oidcConfigurationEndpoint =
"http://localhost:8080/realms/master/.well-known/openid-configuration";
Only let authorized users access resources. The best way not to forget to implement authorization
is to deny access to resources by default. You can require that an authorization annotation must
be present in all controller methods. If an API endpoint does not require authorization, a special
annotation like @allow_any_user could be used. If a controller method is missing an authorization
annotation, an exception can be thrown, for example. This way, you can never forget to add an
authorization annotation to a controller method.
Broken access control is number one in the OWASP Top 10 for 2021. Especially remember to disallow users from creating resources for other users. Also disallow users from viewing, editing, or deleting resources belonging to someone else (this is known as Insecure Direct Object Reference (IDOR) prevention). It is not enough to use universally unique ids (UUIDs) as resource ids instead of basic integers, because if an attacker can obtain a URL for an object with a UUID, they can access the object behind the URL when no access control is in place.
Below is a JWT-based authorizer class that can be used in a FastAPI backend API service. In the example, we are using the following additional Python libraries: python-benedict and pyjwt. The example utilizes role-based access control (RBAC), but there are more modern alternatives, including attribute-based access control (ABAC) and relationship-based access control (ReBAC). More information about those is available in the OWASP Authorization Cheat Sheet1.
1 https://cheatsheetseries.owasp.org/cheatsheets/Authorization_Cheat_Sheet.html
class Authorizer(Protocol):
pass
import requests
from Authorizer import Authorizer
from benedict import benedict
from fastapi import HTTPException, Request
from jwt import PyJWKClient, PyJWKClientError, decode
from jwt.exceptions import InvalidTokenError
class __JwtAuthorizer(Authorizer):
IAM_ERROR: Final = 'IAM error'
def __init__(self):
# OpenId Connect configuration endpoint in the IAM system
self.__oidc_config_url = os.environ['OIDC_CONFIG_URL']
self.__jwks_client = None
# This is the URL where you can fetch the user id for a
# specific 'sub' claim value in the access token
# For example: http://localhost:8082/user-service/users
self.__get_users_url = os.environ['GET_USERS_URL']
request: Request
) -> None:
jwt_user_id = self.__get_jwt_user_id(request)
try:
get_entity_by_id_and_user_id(id, jwt_user_id)
except HTTPException as error:
if error.status_code == 404:
raise HTTPException(status_code=403, detail='Unauthorized')
# Log error details
raise HTTPException(status_code=500, detail=self.IAM_ERROR)
def authorize_if_user_has_one_of_roles(
self, allowed_roles: list[str], request: Request
) -> None:
claims = self.__decode_jwt_claims(
request.headers.get('Authorization')
)
try:
roles = benedict(claims)[self.__roles_claim_path]
except KeyError as error:
# Log error details
raise HTTPException(status_code=500, detail=self.IAM_ERROR)
user_is_authorized = any(
[True for role in roles if role in allowed_roles]
)
if not user_is_authorized:
raise HTTPException(status_code=403, detail='Unauthorized')
def __decode_jwt_claims(
self, auth_header: str | None
) -> dict[str, Any]:
if not auth_header:
raise HTTPException(status_code=401, detail='Unauthenticated')
try:
if not self.__jwks_client:
oidc_config_response = requests.get(self.__oidc_config_url)
oidc_config_response.raise_for_status()
oidc_config = oidc_config_response.json()
self.__jwks_client = PyJWKClient(oidc_config['jwks_uri'])
token = auth_header.replace('Bearer ', '')
signing_key = self.__jwks_client.get_signing_key_from_jwt(token)
# Verify the token signature and standard claims; adjust the
# audience verification to your IAM configuration
jwt_claims = decode(
token, signing_key.key, algorithms=['RS256'],
options={'verify_aud': False}
)
return jwt_claims
except (PyJWKClientError, requests.RequestException):
# Log error details
raise HTTPException(status_code=500, detail=self.IAM_ERROR)
except InvalidTokenError:
raise HTTPException(status_code=401, detail='Unauthenticated')
try:
sub_claim = claims['sub']
users_response = requests.get(
f'{self.__get_users_url}?sub={sub_claim}&fields=id'
)
users_response.raise_for_status()
# Response JSON is expected in the form [{ "id": 12345 }]
users = users_response.json()
except (KeyError, requests.RequestException) as error:
# Log error details
raise HTTPException(status_code=500, detail=self.IAM_ERROR)
try:
return users[0].id
except (IndexError, AttributeError):
raise HTTPException(status_code=403, detail='Unauthorized')
authorizer = __JwtAuthorizer()
app = FastAPI()
@app.get('/sales-item-service/sales-items')
async def get_sales_items():
# No authentication/authorization required
# Send sales items
@app.post('/messaging-service/messages')
@app.get('/order-service/orders/{id}')
async def get_order(id: int, request: Request):
authorizer.authorize_for_user_own_resources_only(
id,
order_service.get_order_by_id_and_user_id,
request
)
# Get order identified with 'id'
# and having user id of JWT's owner
@app.post('/order-service/orders')
async def create_order(order: InputOrder, request: Request):
authorizer.authorize_for_self(
order.user_id,
request
)
# Create an order for the user
# User cannot create orders for other users
@app.put('/order-service/orders/{id}')
async def update_order(id: int, order: OrderUpdate, request: Request):
authorizer.authorize_for_user_own_resources_only(
id,
order_service.get_order_by_id_and_user_id,
request
)
# Update an order identified with 'id'
# and user id of JWT's owner
@app.delete('/order-service/orders/{id}')
async def delete_order(id: int, request: Request):
authorizer.authorize_if_user_has_one_of_roles(
['admin'], request
)
# Only admin user can delete an order
In the above example, the authorization is separately coded inside each request handler. We could extract the authorization from the request handler methods into decorators that are used in conjunction with the methods. The decorators could be implemented in a separate library and can accept any authorizer that implements the Authorizer protocol, not just the JwtAuthorizer:
class AuthDecorException(Exception):
pass
def allow_any_user(handle_request):
return handle_request
def allow_for_user_own_resources_only(
authorizer: Authorizer,
get_entity_by_id_and_user_id: Callable[[int, int], Any]
):
def decorate(handle_request):
@wraps(handle_request)
async def wrapped_handle_request(*args, **kwargs):
try:
authorizer.authorize_for_user_own_resources_only(
kwargs['id'],
get_entity_by_id_and_user_id,
kwargs['request']
)
except (KeyError):
raise AuthDecorException(
"Request handler must accept 'id' and 'request' parameters"
)
return await handle_request(*args, **kwargs)
return wrapped_handle_request
return decorate
In the above example, we implemented decorators that take parameters. Such decorators have three levels of nested functions, compared to non-parameterized decorators, which have only two levels of nested functions. Normal decorators without parameters are defined and used in the following way:
def my_decorator(func):
# ...
@my_decorator
def func():
# ...
my_decorated_func = my_decorator(func)
Decorators that take parameters are defined and used as follows:
def my_decorator(arg):
def decorate(func):
# ...
return decorate
@my_decorator(some_arg)
def func():
# ...
my_decorated_func = my_decorator(some_arg)(func)
In the above authorization decorator examples, we needed to use the wraps decorator from the functools module. This is because of how FastAPI inspects request handler signatures. If we did not use the wraps decorator, we would get an error stating that FastAPI expects args and kwargs parameters for a request handler. All the above authorization decorators, except allow_any_user, require that FastAPI request handlers accept a request argument. The allow_for_self decorator also requires that a FastAPI request handler accepts either a user_id argument or a DTO argument whose name ends with the word 'dto'. This DTO argument must be an object having a user_id attribute.
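The allow_for_self decorator itself is not shown above. Below is a minimal sketch of how it could look, following the argument conventions just described; it reuses wraps, the AuthDecorException class, and the Authorizer protocol's authorize_for_self method from the earlier examples, and the exact argument lookup logic is an assumption, not the book's implementation:

def allow_for_self(authorizer: Authorizer):
    def decorate(handle_request):
        @wraps(handle_request)
        async def wrapped_handle_request(*args, **kwargs):
            # Find the user id either directly or from a '...dto' argument
            user_id = kwargs.get('user_id')
            if user_id is None:
                dto = next(
                    (
                        value
                        for name, value in kwargs.items()
                        if name.endswith('dto')
                    ),
                    None,
                )
                user_id = getattr(dto, 'user_id', None)
            if user_id is None or 'request' not in kwargs:
                raise AuthDecorException(
                    "Request handler must accept a 'request' parameter and "
                    "either a 'user_id' or a '...dto' parameter"
                )
            authorizer.authorize_for_self(user_id, kwargs['request'])
            return await handle_request(*args, **kwargs)
        return wrapped_handle_request
    return decorate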
Now we can use the above defined authorization decorators when defining the request handlers as
follows:
# Imports ...
@app.get('/sales-item-service/sales-items')
@allow_any_user
async def get_sales_items():
# ...
@app.post('/messaging-service/messages')
@allow_authorized_user(authorizer)
async def create_message(request: Request):
# ...
@app.get('/order-service/orders/{id}')
@allow_for_user_own_resources_only(
authorizer,
order_service.get_order_by_id_and_user_id
)
async def get_order(id: int, request: Request):
# ...
@app.post('/order-service/orders')
@allow_for_self(authorizer)
async def create_order(
order_dto: InputOrder, request: Request
):
# ...
@app.put('/order-service/orders/{id}')
@allow_for_user_own_resources_only(
authorizer,
order_service.get_order_by_id_and_user_id
)
async def update_order(
id: int, order: OrderUpdate, request: Request
):
# ...
@app.delete('/order-service/orders/{id}')
@allow_for_user_roles(['admin'], authorizer)
async def delete_order(id: int, request: Request):
# ...
Next, we create a function that can be used to check that all request handlers in a microservice project contain an authorization decorator. This function should be called before starting the microservice. You can put this function and its call into a microservice starter project so that all new microservices created from the starter project automatically check for the presence of an authorization decorator in all request handlers.
import os
class AuthDecorNotSpecifiedException(Exception):
def __init__(self, file_name: str, line_number: int):
self.__file_name = file_name
self.__line_number = line_number
def __str__(self):
return f'Auth decorator not specified in file {self.__file_name} line {self.__line_number}'
def ensure_request_handlers_have_auth_decor():
    for path, _, file_names in os.walk('./'):
for file_name in file_names:
if file_name.endswith('.py'):
file_path_name = os.path.join(path, file_name)
with open(file_path_name) as file:
lines = file.readlines()
prev_line = ''
for line_index, line in enumerate(lines):
line = line.strip()
if any(
[
prev_line.startswith(decorator)
for decorator in [
'@app.get',
'@app.put',
'@app.patch',
'@app.post',
'@app.delete',
]
]
):
if not line.startswith('@allow_'):
line_number = line_index + 1
raise AuthDecorNotSpecifiedException(
file_name, line_number
)
prev_line = line
if os.environ.get('ENV') == 'DEVELOPMENT':
ensure_request_handlers_have_auth_decor()
The above code will walk all Python files in the current directory and its subdirectories, read their
contents and check for @allow_xxx decorator to be placed after any of the following decorators:
@app.get/put/patch/post/delete.
7.4.3: Cryptography
The following are the key security features to implement related to cryptography:
– You don’t need to implement HTTPS in all the microservices because you can set up a
service mesh and configure it to implement mTLS between services
• Do not store sensitive information like personally identifiable information (PII) in clear text
– Encrypt sensitive data before storing it in a database and decrypt it upon fetching from
the database
– Remember to identify which data is classified as sensitive according to privacy laws,
regulatory requirements, or business needs
– Do not use legacy protocols such as FTP and SMTP for transporting sensitive data
– Discard sensitive data as soon as possible or use tokenization (e.g., PCI DSS compliant) or
even truncation
– Do not cache sensitive data
• Do not use old/weak cryptographic algorithms. Use robust algorithms like SHA-256 or AES-256
• Do not allow using default/weak passwords or default encryption keys in a production
environment
– You can implement validation logic for passwords/encryption keys in microservices when
the microservices run in production. If a microservice’s used passwords/encryption keys
are not strong enough, the microservice should not run but exit with an error
Encryption keys should be rotated (i.e., changed) when certain criteria are met, for example, when the key has reached the end of its planned lifetime or is suspected to be compromised. Encryption key rotation should happen so that all existing data is decrypted with the old key and re-encrypted with the new key. This happens gradually, of course, and for that reason each encrypted database table row must contain the id of the encryption key used to encrypt it. When all existing data has been re-encrypted with the new key, meaning that no references to the old key remain, the old key can be destroyed.
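Below is a minimal sketch of such gradual re-encryption. The encrypted_data table, the encryption_key_id column, the key_store object, and the encrypt/decrypt helpers are all assumptions made for the example, so treat this as an illustration of the idea rather than a ready implementation:

def rotate_encryption_key(connection, key_store, old_key_id, new_key_id):
    old_key = key_store.get_key(old_key_id)  # Hypothetical key store API
    new_key = key_store.get_key(new_key_id)
    cursor = connection.cursor()
    # Re-encrypt only the rows that still reference the old key
    cursor.execute(
        'SELECT id, encrypted_value FROM encrypted_data '
        'WHERE encryption_key_id = %s',
        (old_key_id,),
    )
    for row_id, encrypted_value in cursor.fetchall():
        plain_value = decrypt(encrypted_value, old_key)     # Assumed helper
        re_encrypted_value = encrypt(plain_value, new_key)  # Assumed helper
        cursor.execute(
            'UPDATE encrypted_data '
            'SET encrypted_value = %s, encryption_key_id = %s '
            'WHERE id = %s',
            (re_encrypted_value, new_key_id, row_id),
        )
    connection.commit()
    # When no rows reference old_key_id anymore, the old key can be destroyed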
• Establish request rate limiting for microservices. It can be done at the API gateway level or by
the cloud provider
• Use Captcha to prevent non-human (robotic) users from performing potentially expensive
operations like creating new resources or fetching large resources, like large files, for example
• Use parameterized SQL statements. Do not concatenate user-supplied data directly to an SQL statement string (a sketch is shown after this list)
• Remember that you cannot use parameterization in all parts of an SQL statement. If you must
put user-supplied data into an SQL statement without parameterization, sanitize/validate it
first. For example, for LIMIT, you must validate that the user-supplied value is an integer and
in a given range
• Migrate to use ORM (Object Relational Mapping)
• Use proper limiting on the number of fetched records within queries to prevent mass disclosure
of records
• Verify the correct shape of the first query result row. Do not send the query result to the client
if the shape of the data in the first row is wrong, e.g., it contains the wrong fields.
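To illustrate the first two bullets above, here is a minimal sketch using a DB-API style cursor; the table and column names are made up for the example. The user-supplied search string is parameterized, while the LIMIT value, which cannot be parameterized here, is validated first:

def get_sales_items(cursor, name_part: str, limit: int):
    # Validate the part that cannot be parameterized
    if not isinstance(limit, int) or not 1 <= limit <= 100:
        raise ValueError('limit must be an integer between 1 and 100')
    # User-supplied data is passed as a parameter, never concatenated
    cursor.execute(
        'SELECT id, name, price FROM salesitems '
        f'WHERE name LIKE %s LIMIT {limit}',
        ('%' + name_part + '%',),
    )
    return cursor.fetchall()

The same principle applies to operating system commands. For example, the following code is vulnerable to command injection: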
import os
user_supplied_dir = ...
os.system(f'mkdir {user_supplied_dir}')
A malicious user can supply for example the following kind of directory: some_dir && rm -rf /.
Instead, use a specific function provided by the os module:
import os
user_supplied_dir = ...
os.mkdir(user_supplied_dir)
An example of the above Docker container security configuration is given in the DevSecOps Principles
chapter later in the book.
Implement the sending of security-related HTTP response headers in the API gateway:
• X-Content-Type-Options: nosniff
• Strict-Transport-Security: max-age=<max-age-in-seconds>; includeSubDomains
• X-Frame-Options: DENY
• Content-Security-Policy: frame-ancestors 'none'
• Content-Type: application/json
• If caching is not specifically enabled and configured, the following header should be set:
Cache-Control: no-store
• Access-Control-Allow-Origin: https://your_domain_here
If you are returning HTML instead of JSON, you should also replace/add the following response headers:
Disable browser features that are not needed/wanted using the Permissions-Policy response header.
The below example disables all the listed features:
7.4.10: Integrity
Use only container images with tags that have an SHA digest. If an attacker succeeds in publishing a malicious container image with the same tag, the SHA digest prevents the malicious image from being taken into use. Ensure you use libraries and dependencies from trusted sources, like NPM or Maven.
You can also host internal mirrors of repositories to avoid accidentally using any untrusted repository.
Ensure a review process exists for all code (source, deployment, infrastructure) and configuration
changes so that no malicious code can be introduced into your software system.
7.4.12: Logging
When writing log entries, never write any of the below to the log:
• Session ids
• Access tokens
• Personally identifiable information (PII)
• Passwords
• Database connection strings
• Encryption keys
• Information that is not legal to collect
• Information that the end-user has opted out of collection
When validating numeric values, always validate that a value is in a specified range. For example, if
you use an unvalidated number to check if a loop should end and that number is very large, it can
cause a denial of service (DoS). If a number should be an integer, don’t allow floating-point values.
When validating a string, always validate the maximum length of the string first. Only after that
perform additional validation. Validating a long string using a regular expression can cause a regular
expression denial of service (ReDoS). You should avoid crafting your own regular expressions for validation purposes; instead, use a ready-made library that contains battle-tested code. Consider also using the Google RE2 library3. It is safer than the regular expression functionality provided by many language runtimes, and your code will be less susceptible to ReDoS attacks.
Timestamps (or times or dates) are usually given as an integer or a string. Apply the needed validation to a timestamp/time/date value. For example, you can validate that a timestamp is in the future or in the past, or that it is earlier or later than a specific timestamp.
When validating an array, you should validate that the size of the array is not too small or too large, and you can validate the uniqueness of the values if needed. Also, after validating the maximum size of the array, remember to validate each value in the array separately.
Validate an object by validating each property of the object separately. Remember to validate nested
objects also.
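The above points can be expressed declaratively with a validation library. Below is a minimal Pydantic sketch; the field names and limits are made up for the example. It bounds numbers to a range, limits string length, and constrains array size:

from pydantic import BaseModel, conint, conlist, constr

class InputMeasurement(BaseModel):
    # Strict integer in a defined range; floating-point values are rejected
    sample_count: conint(strict=True, ge=1, le=10_000)
    # String with a bounded maximum length
    description: constr(max_length=256)
    # Timestamp given as epoch milliseconds within a sane range
    measured_at_in_ms: conint(strict=True, ge=0, le=4_102_444_800_000)
    # Bounded array size; each value is also validated separately
    values: conlist(conint(ge=-1_000, le=1_000), min_items=1, max_items=100)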
• Ensure the file name extension for the uploaded file is one of the allowed extensions
• Ensure the file is not larger than a defined maximum size
3 https://github.com/google/re2/tree/abseil/python
• When storing an uploaded file on the server side, pay attention to the following:
– Do not use a file name supplied by the user, but use a new filename to store the file on
the server
– Do not let the user choose the path where the uploaded file is stored on the server
8: API Design Principles
This chapter presents design principles for both frontend-facing and backend APIs. First, frontend-facing API design is discussed, and then inter-microservice API design is covered.
As the name says, JSON-RPC APIs are for executing remote procedure calls. The remote procedure's argument is given as a JSON object in the HTTP request body, and the remote procedure's return value is a JSON object in the HTTP response body. A client calls a remote procedure by issuing an HTTP POST request where it specifies the name of the procedure in the URL path and gives the argument for the remote procedure call in the request body in JSON.
Below is an example request for a translation service’s translate procedure:
{
"text": "Ich liebe dich"
"fromLanguage": "German",
"toLanguage": "English"
}
The API server shall respond with an HTTP status code and include the procedure’s response in the
HTTP response body in JSON.
For the above request, you get the following response:
HTTP/1.1 200 OK
Content-Type: application/json
{
"translatedText": "I love you"
}
{
"containingText": "Software design patterns"
}
HTTP/1.1 200 OK
Content-Type: application/json
[
{
"url": "https://...",
"title": "...",
"date": "...",
"contentExcerpt": "..."
},
More results here ...
]
You can create a complete service using JSON-RPC instead of REST or GraphQL. Below are five remote procedures defined for a sales-item-service. The procedures are for basic CRUD operations. The benefit of using JSON-RPC instead of REST, GraphQL, or gRPC is that you don't have to learn any specific technology.
{
"name": "Sample sales item",
"price": 20
}
{
"id": 1
}
{
"id": 1,
"name": "Sample sales item name modified",
"price": 30
}
{
"id": 1
}
You can easily create API endpoints for the above service. Below is an example implemented with
FastAPI.
app = FastAPI()
class InputSalesItem(BaseModel):
name: str
price: int
class OutputSalesItem(BaseModel):
id: int
name: str
price: int
class SalesItemUpdate(InputSalesItem):
id: int
class Id(BaseModel):
id: int
@app.post("/sales-item-service/create-sales-item", response_model=OutputSalesItem)
async def create_sales_item(sales_item: InputSalesItem) -> Any:
# ...
@app.post("/sales-item-service/get-sales-items", response_model=list[OutputSalesItem])
async def get_sales_items() -> Any:
# ...
@app.post("/sales-item-service/get-sales-item-by-id", response_model=OutputSalesItem)
async def get_sales_item_by_id(id: Id) -> Any:
# ...
@app.post("/sales-item-service/update-sales-item", response_model=OutputSalesItem)
async def update_sales_item(sales_item_update: SalesItemUpdate) -> Any:
# ...
@app.post("/sales-item-service/delete-sales-item-by-id")
async def delete_sales_item_by_id(id: Id) -> None:
# ...
You can version your API by adding a version number to the URL path. In the below example, the new
API version 2 allows a new procedure argument someNewParam to be supplied for the search-web-pages
procedure.
{
"containingText": "Software design patterns"
"someNewParam": "..."
}
Many APIs fall into the category of performing CRUD operations on resources. Let’s create an
example REST API called sales-item-service for performing CRUD operations on sales items.
Creating a new resource using a REST API is done by sending an HTTP POST request to the API’s
resource endpoint. The API's resource endpoint should be named according to the resources it handles.
The resource endpoint name should be a noun and always given in the plural form, for example, for
the sales-item-service, the resource endpoint should be sales-items, and for an order-service handling
orders, the resource endpoint should be called orders.
You give the resource to be created in the HTTP request body in JSON. To create a new sales item,
you can issue the following request:
{
"name": "Sample sales item",
"price": 20
}
The server will respond with the HTTP status code 201 Created. The server can add fields to the
resource upon creation. Typically, the server will add an id property to the created resource, but
it can add other properties also. The server will respond with the created resource in the HTTP
response body in JSON. Below is a response to a sales item creation request. You can notice that the
server added the id property to the resource. Other properties that are usually added are the creation
timestamp and the version of the resource (the version of a newly created resource should be one).
{
"id": 1,
"name": "Sample sales item",
"price": 20
}
If the supplied resource to be created is somehow invalid, the server should respond with the HTTP
status code 400 Bad Request and explain the error in the response body. The response body should be
in JSON format containing information about the error, like the error code and message. To make API
error responses consistent, if possible, use the same error response format throughout all the APIs in
a software system. Below is an example of an error response:
{
"statusCode": 500,
"statusText": "Internal Server Error",
"errorCode": "IAMError",
"errorMessage": "Unable to connect to the Identity and Access Management service"
"errorDescription": "Describe the error in more detail here, if relevant/needed..."
"stackTrace": "Call stack trace here..."
}
NOTE! In the above example, the stackTrace property should NOT be included in the production
environment by default, because it can reveal internal implementation details to possible attackers.
Use it only in development and other internal environments, and if absolutely needed, enable it in
the production environment only for a short period of time to conduct debugging. The errorCode
property is useful for updating error counter metric(s). Use it as a label for the error counter(s). There
will be more discussion about metrics in the coming DevSecOps principles chapter.
If the created resource is huge, there is no need to return the resource to the caller and waste network
bandwidth. You can return the added properties only. For example, if the server only adds the id
property, it is possible to return only the id in the response body:
{
"id": 1
}
The request sender can construct the created resource by merging the sent resource object with the
received resource object.
When a client tries to create a new resource, the resource creation request may fail so that the resource was successfully created on the server, but the client did not receive a response in time, and the request failed due to a timeout. From the server's point of view, the request was successful, but from the client's point of view, the status of the request was indeterminate. The client, of course, needs to re-issue the timed-out request, and if it succeeds, the same resource is created twice on the server side, which is probably always unwanted.
Suppose a resource contains a unique property, like a user’s email. In that case, it is impossible to
create a duplicate resource if the server is correctly implemented (= the unique property is marked as
a unique column in the database table definition). In many cases, such a unique field does not exist
in the resource. In those cases, the client can supply a universally unique identifier (UUID), named
creationUuid, for example. The role of the server is to check if a resource with the same creationUuid
was already created and to fail the creation of a duplicate resource. As an alternative to the UUID
approach, the server can ask for verification from the client if the creation of two identical resources
is intended in case the server receives two identical resources from the same client during a short
period of time.
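Below is a minimal FastAPI-style sketch of the creationUuid approach described above; the order_repository object and its method names are made up for the example, and app and InputOrder are assumed to be defined as in the earlier examples:

from fastapi import HTTPException

@app.post('/order-service/orders', status_code=201)
async def create_order(input_order: InputOrder):
    # The client generates creationUuid; a retried request carries the
    # same value, so an accidental duplicate can be detected
    if order_repository.find_by_creation_uuid(input_order.creationUuid):
        raise HTTPException(status_code=409, detail='Order already created')
    return order_repository.save(input_order)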
Reading resources with a REST API is done by sending an HTTP GET request to the API’s resource
endpoint. To read all sales items, you can issue the following request:
The server will respond with the HTTP status code 200 OK. The server will respond with a JSON
array of resources in the response body or an empty array in case none is found. Below is an example
response to a request to get the sales items:
HTTP/1.1 200 OK
Content-Type: application/json
[
{
"id": 1,
"name": "Sample sales item",
"price": 20
}
]
To read a single resource by its id, add the resource id to the request URL path as follows:
The following request can be issued to read the sales item identified with id 1:
HTTP/1.1 200 OK
Content-Type: application/json
{
"id": 1,
"name": "Sample sales item",
"price": 20
}
The server responds with the HTTP status code 404 Not Found if the requested resource is not found.
You can define parameters in the URL query string to filter what resources to read. A query string is
the last part of the URL and is separated from the URL path by a question mark (?) character. A query
string can contain one or more parameters separated by ampersand (&) characters. Each query string
parameter has the following format: <query-parameter-name>=<query-parameter-value>. Below is an
example request with two query parameters: name-contains and price-greater-than.
The above request gets sales items whose name contains the string Sample and whose price is greater
than 10.
To define a filter, you can specify a query parameter in the following format:
<fieldName>[-<condition>]=<value>, for example:
• price=10
• price-not-equal=10
• price-less-than=10
• price-less-than-equal=10
• price-greater-than=10
• price-greater-than-equal=10
• name-starts-with=Sample
• name-ends-with=item
• name-contains=Sample
• createdTimestamp-before=2022-08-02T05:18:00Z
• createdTimestamp-after=2022-08-02T05:18:00Z
• images.url-starts-with=https
Remember that when implementing the server side and adding the above-given parameters to an SQL
query, you must use a parameterized SQL query to prevent SQL injection attacks because an attacker
can send malicious data in the query parameters.
Other actions like projection, sorting, and pagination for the queried resources can also be defined
with query parameters in the URL:
GET /sales-item-service/sales-items?fields=id,name&sort-by=price:asc&offset=0&limit=100 H\
TTP/1.1
The above request gets sales items sorted by price (ascending). The number of fetched sales items is
limited to 100. Sales items are fetched beginning from the offset 0, and the response only contains
fields id and name for each sales item.
The fields parameter defines what resource fields (properties) are returned in the response. The
wanted fields are defined as a comma-separated list of field names. If you want to define sub-resource
fields, those can be defined with the dot notation, for example:
fields=id,name,images.url
The sort-by query parameter defines sorting using the following format:
sort-by=<fieldName>:asc|desc,[<fieldName>:asc|desc]
For example:
sort-by=price:asc,images.rank:asc
In the above example, the resources are returned sorted first by ascending price and secondarily by
image’s rank.
The limit and offset parameters are used for pagination. The limit query parameter defines the
maximum number of resources that can be returned. The offset query parameter specifies the offset
from which resources are returned. You can also paginate sub-resources by giving the offset and limit
in the form of <sub-resource>:<number>. Below is an example of using pagination query parameters:
offset=0&limit=50,images:5
The above query parameters define that the first page of 50 sales items is fetched, and each sales
item contains the first five images of the sales item. Instead of offset and limit parameters, you can
use page and pageSize parameters. The page parameter defines the page number, and the pageSize
defines how many resources a page should contain.
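If page and pageSize are used, the server can translate them into an offset and a limit for the underlying query, for example:

def to_offset_and_limit(page: int, page_size: int) -> tuple[int, int]:
    # Page numbering starts from 1
    offset = (page - 1) * page_size
    return offset, page_size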
Remember to validate user-supplied data to prevent SQL injection attacks when implementing the
server side and adding data from query parameters to an SQL query. For example, field names in the
fields query parameter should only contain characters allowed in an SQL column name. Similarly, the
value of the sort-by parameter should only contain characters allowed in an SQL column name and
words asc and desc. And finally, the values of the offset and limit (or page and pageSize) parameters
must be integers. You should also validate the limit/pageSize parameter against the maximum allowed
value because you should not allow end-users to fetch too many resources at a time.
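Below is a minimal sketch of such validation; the allowed character pattern and the maximum limit value are assumptions made for the example:

import re

FIELD_NAME_REGEX = re.compile(r'^[A-Za-z_][A-Za-z0-9_.]*$')
SORT_DIRECTIONS = {'asc', 'desc'}
MAX_LIMIT = 100

def validate_query_params(fields: str, sort_by: str, limit: str) -> None:
    for field_name in fields.split(','):
        if not FIELD_NAME_REGEX.fullmatch(field_name):
            raise ValueError(f'Invalid field name: {field_name}')
    for sort_spec in sort_by.split(','):
        field_name, _, direction = sort_spec.partition(':')
        if (
            not FIELD_NAME_REGEX.fullmatch(field_name)
            or direction not in SORT_DIRECTIONS
        ):
            raise ValueError(f'Invalid sort specification: {sort_spec}')
    if not limit.isdigit() or not 1 <= int(limit) <= MAX_LIMIT:
        raise ValueError(f'limit must be an integer between 1 and {MAX_LIMIT}')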
Some HTTP servers log the URL of an HTTP GET request. For this reason, it is not recommended to put sensitive information in the URL; sensitive information should be put into the request body. Also, browsers can have a limit on the maximum length of a URL. If you have a query string that is thousands of characters long, you should give the parameters in the request body instead. You should not put a request body in an HTTP GET request; instead, issue the request using the HTTP POST method, for example:
{
"fields": ["name"],
"sortBy": "price:asc",
"limit": 100
}
The server can confuse the above request with a sales item creation request because the URL and the
HTTP method are identical to a resource creation request. For this reason, a custom HTTP request
header X-HTTP-Method-Override has been added to the request. The server should read the custom
header and treat the above request as a GET request. The X-HTTP-Method-Override header tells the
server to override the request method with the method supplied in the header.
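On the server side, the override can be handled before routing, for example in an ASGI middleware. Below is a minimal FastAPI/Starlette-style sketch that handles only the GET override used above; it is an illustration, not a complete implementation, and app is assumed to be the FastAPI instance:

class MethodOverrideMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope['type'] == 'http' and scope['method'] == 'POST':
            headers = dict(scope['headers'])
            if headers.get(b'x-http-method-override') == b'GET':
                # Route the request as a GET request
                scope = dict(scope, method='GET')
        await self.app(scope, receive, send)

app.add_middleware(MethodOverrideMiddleware)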
Updating a resource with a REST API is done by sending an HTTP PUT or PATCH request to the
API’s resource endpoint. To update the sales item identified with id 1, you can issue the following
request:
{
"name": "Sample sales item name modified",
"price": 30
}
The server can also send the updated resource back in the response; this is especially needed if the server modifies the resource somehow. The server will respond with the HTTP status code 404 Not Found if the requested resource is not found.
If the supplied resource in the request is invalid, the server should respond with the HTTP status code
400 Bad Request. The response body should contain an error object in JSON.
An HTTP PUT request will replace the existing resource with the supplied resource. You can also modify an existing resource partially using the HTTP PATCH method:
{
"price": 30
}
The above request only modifies the price property of the sales item identified with id 1.
You can do bulk updates by specifying a filter in the URL, for example:
{
"price": 10
}
The above example will update the price property of each resource where the price is less than ten
currently. On the server side, the API endpoint could use the following parameterized SQL statement
to implement the update functionality:
The above SQL statement will only modify the price column, and other columns will remain intact.
When you get a resource from the server and then try to update it, it is possible that someone else has updated it after you got it and before you try to update it. Sometimes this can be acceptable if you don't care about other clients' updates. But sometimes, you want to ensure no one else has updated the resource before you update it. In that case, you should use resource versioning. In resource versioning, there is a version field in the resource, which is incremented by one during each update. If you get a resource with version x and then try to update the resource giving back the same version x to the server, but someone else has already updated the resource to version x + 1, your update will fail because of the version mismatch (x != x + 1). The server should respond with the HTTP status code 409 Conflict. After receiving the conflict response, you can fetch the latest version of the resource from the server and, based on the resource's new state, decide whether your update is still relevant.
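Below is a minimal sketch of the version check on the server side, using a parameterized SQL statement; the table and column names are made up for the example:

from fastapi import HTTPException

def update_sales_item_price(
    cursor, id_: int, version: int, price: int
) -> None:
    cursor.execute(
        'UPDATE salesitems SET price = %s, version = version + 1 '
        'WHERE id = %s AND version = %s',
        (price, id_, version),
    )
    if cursor.rowcount == 0:
        # Someone else updated (or deleted) the resource in the meantime
        raise HTTPException(status_code=409, detail='Version conflict')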
The server should assign the resource version value to the HTTP response header ETag. A client can
use the received ETag value in a conditional HTTP GET request by assigning the received ETag value
to the request header If-None-Match. Now the server will return the requested resource only if it has
a newer version. Otherwise, the server returns nothing with the HTTP status code 304 Not Modified.
This has the advantage of not needing to transfer an unmodified resource from the server to the client.
This can be especially beneficial when the resource is large or the connection between the server and
the client is slow.
Deleting a resource with a REST API is done by sending an HTTP DELETE request to the API’s
resource endpoint. To delete the sales item identified with id 1, you can issue the following request:
If the resource requested to be deleted has already been deleted, the API should still respond with the
HTTP status code 204 No Content, meaning a successful operation. It should not respond with the
HTTP status code 404 Not Found.
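For example, a FastAPI delete endpoint can simply issue the delete and respond with 204 No Content whether or not a row was actually removed; sales_item_repository and its delete_by_id method are assumptions made for the example:

@app.delete('/sales-item-service/sales-items/{id_}', status_code=204)
async def delete_sales_item(id_: int) -> None:
    # Deleting a non-existent sales item is not an error:
    # the end state is the same, so 204 is returned either way
    sales_item_repository.delete_by_id(id_)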
To delete all sales items, you can issue the following request:
To delete sales items using a filter, you can issue the following kind of request:
On the server side, the API endpoint handler can use the following parameterized SQL query to
implement the deleting functionality:
Sometimes you need to perform non-CRUD actions on resources. In those cases, you can issue an
HTTP POST request and put the name of the action (a verb) after the resource name in the URL. The
below example will perform a deposit action on an account resource:
{
"amountInCents": 2510
}
{
"amountInCents": 2510
}
A resource can be composed of other resources. There are two ways to implement resource
composition: Nesting resources or linking resources. Let’s have an example of nesting resources
first. A sales item resource can contain one or more image resources. We don’t want to return all
images when a client requests a sales item because images can be large and are not necessarily used
by the client. What we could return is a set of small thumbnail images. For a client to view the images
of a sales item, we could implement an API endpoint for image resources. To get images for a specific
sales item, the following API call can be issued:
The problem with this approach is that the sales-item-service will grow in size, and if you need to add more nested resources in the future, it will grow even more, making the microservice too complex and responsible for too many things.
A better alternative is to create a separate microservice for the nested resources. This also enables utilizing the best-suited technologies for each microservice. Regarding the sales item images, the sales-item-image-service could employ cloud object storage to store images, and the sales-item-service could utilize a standard relational database for storing sales items.
When having a separate microservice for sales item images, you can get the images for a sales item
by issuing the following request:
You can notice that the sales-item-service and the sales-item-image-service are now linked by the salesItemId.
Hypermedia as the Engine of Application State (HATEOAS) can be used to add hypermedia/metadata
to a requested resource. Hypertext Application Language (HAL) is a convention for defining
hypermedia (metadata), such as links to external resources. Below is an example response to a request
that fetches the sales item with id 1234. The sales item is owned by the user with id 5678. The response
provides a link to the fetched resource itself and another link to fetch the user (account) that owns
the sales item:
{
"_links": {
"self": {
"href": "https://.../sales-item-service/sales-items/1234"
},
"userAccount": {
"href": "https://.../user-account-service/user-accounts/5678"
}
},
"id": 1234,
"name": "Sales item xyz"
"userAccountId": 5678
}
When fetching a collection of sales items for page 3 using HAL, we can get the following kind of
response:
{
"_links": {
"self": {
"href": "https://.../sales-items?page=3"
},
"first": {
"href": "https://...sales-items"
},
"prev": {
"href": "https://.../sales-items?page=2"
},
"next": {
"href": "https://.../sales-items?page=4"
}
},
"count": 25,
"total": 1500,
"_embedded": {
"salesItems": [
{
"_links": {
"self": {
"href": "https://.../sales-items/123"
}
},
"id": 123,
"name": "Sales item 123"
},
{
"_links": {
"self": {
"href": "https://.../sales-items/124"
}
},
"id": 124,
"name": "Sales item 124"
},
.
.
.
]
}
}
8.1.2.9: Versioning
You can introduce a new version of an API using a versioning URL path segment. Below are example
endpoints for API version 2:
GET /sales-item-service/v2/sales-items HTTP/1.1
...
8.1.2.10: Documentation
If you need to document or provide interactive online documentation for a REST API, there are two
ways:
1) Spec-First: create a specification for the API and then generate code from the specification
2) Code-First: implement the API and then generate the API specification from the code
Tools like Swagger and Postman can generate both static and interactive documentation for your API
based on the API specification. You should specify APIs using the OpenAPI specification2 .
When using the first alternative, you can specify your API using the OpenAPI specification language.
You can use tools like SwaggerHub or Postman to write the API spec. Swagger offers code-generation
tools for multiple languages. Code generators generate code based on the OpenAPI spec. Code
generators are capable of generating client-side code in addition to the server-side code.
When using the second alternative, you can use a web-framework-specific way to build the API spec from the API implementation. For example, with FastAPI, you get automatic API specification generation. By default, the OpenAPI schema is served in JSON format at the endpoint /openapi.json. This URL is configurable. FastAPI also provides interactive Swagger UI docs and a client at /docs; this URL is also customizable. Interactive ReDoc docs and a client are also available at /redoc, which is likewise configurable. For example, to serve the OpenAPI schema at /my-service/v1/openapi.json, serve Swagger UI at /my-service/v1/docs, and disable ReDoc:
2 https://swagger.io/specification/
app = FastAPI(
openapi_url='/my-service/v1/openapi.json',
docs_url='/my-service/v1/docs',
redoc_url=None
)
I prefer the second approach of writing code first. I like it better when I don't have to work with both auto-generated and handwritten code, and many web frameworks offer automatic generation of the OpenAPI schema and interactive documentation, like Swagger UI.
Let’s implement sales-item-service API endpoints for CRUD operations on sales items using FastAPI.
We use the clean microservice design principle introduced earlier and write the API endpoints inside
a controller class:
Figure 8.1. controllers/RestSalesItemController.py
class RestSalesItemController:
# Sales item service is provided by dependency injection
__sales_item_service: SalesItemService = Provide['sales_item_service']
def __init__(self):
self.__router = APIRouter()
self.__router.add_api_route(
'/sales-items/',
self.create_sales_item,
methods=['POST'],
status_code=201,
response_model=OutputSalesItem,
)
self.__router.add_api_route(
'/sales-items/',
self.get_sales_items,
methods=['GET'],
response_model=list[OutputSalesItem],
)
self.__router.add_api_route(
'/sales-items/{id_}',
self.get_sales_item,
methods=['GET'],
response_model=OutputSalesItem,
)
self.__router.add_api_route(
'/sales-items/{id_}',
self.update_sales_item,
methods=['PUT'],
status_code=204,
response_model=None,
)
self.__router.add_api_route(
'/sales-items/{id_}',
self.delete_sales_item,
methods=['DELETE'],
status_code=204,
response_model=None,
)
@property
def router(self):
return self.__router
def create_sales_item(
self, input_sales_item: InputSalesItem
) -> OutputSalesItem:
return self.__sales_item_service.create_sales_item(
input_sales_item
)
def update_sales_item(
self, id_: str, sales_item_update: InputSalesItem
) -> None:
return self.__sales_item_service.update_sales_item(
id_, sales_item_update
)
The above controller is not production quality. At least the following must be added: authorization, audit logging, and request counting for metrics.
All of the above could be, and probably should be, implemented using decorators, for example:
@allow_for_user_roles(['admin'], authorizer)
@audit_log
@increment_counter(Counters.request_attempts)
def create_sales_item(
self,
input_sales_item: InputSalesItem,
request: Request
) -> OutputSalesItem:
return self.__sales_item_service.create_sales_item(
input_sales_item
)
def audit_log(handle_request):
@wraps(handle_request)
def wrapped_handle_request(*args, **kwargs):
method = kwargs['request'].method
url = kwargs['request'].url
client_host = kwargs['request'].client.host
# The below printed text should be written to audit log
print(f'API endpoint: {method} {url} accessed from: {client_host}')
return handle_request(*args, **kwargs)
return wrapped_handle_request
def increment_counter(counter):
def decorate(handle_request):
@wraps(handle_request)
def wrapped_handle_request(*args, **kwargs):
method = kwargs['request'].method
url = kwargs['request'].url
# Increment counter by one with 'api_endpoint' label
counter.increment(1, {'api_endpoint': f'{method} {url}'})
return handle_request(*args, **kwargs)
return wrapped_handle_request
return decorate
class SalesItemImage(BaseModel):
id: PositiveInt
rank: PositiveInt
url: HttpUrl
class Config:
orm_mode = True
class Meta:
orm_model = SalesItemImageEntity
class InputSalesItem(BaseModel):
name: str = Field(max_length=256)
# We accept negative prices for sales items that act
# as discount items
priceInCents: int
images: list[SalesItemImage] = Field(max_items=25)
class Config:
orm_mode = True
class OutputSalesItem(BaseModel):
id: str
createdAtTimestampInMs: PositiveInt
name: str = Field(max_length=256)
priceInCents: int
images: list[SalesItemImage] = Field(max_items=25)
class Config:
orm_mode = True
Notice that we have validation for each attribute in all classes. This is important for security. For example, string and list attributes should have maximum length validators to prevent possible denial-of-service attacks. Remember to add validation to output DTOs as well; output validation can protect against injection attacks that try to return data with an invalid shape. Output validation in FastAPI is also used in the automatic documentation of the API schema and when automatically generating client code.
The SalesItemService protocol looks like the following:
Figure 8.5. service/SalesItemService.py
class SalesItemService(Protocol):
def create_sales_item(
self, input_sales_item: InputSalesItem
) -> OutputSalesItem:
pass
def update_sales_item(
self, id_: str, sales_item_update: InputSalesItem
) -> None:
pass
class SalesItemServiceImpl(SalesItemService):
# Sales item repository is provided by DI
__sales_item_repository: SalesItemRepository = Provide[
'sales_item_repository'
]
def create_sales_item(
self, input_sales_item: InputSalesItem
) -> OutputSalesItem:
sales_item = self.__sales_item_repository.save(input_sales_item)
return OutputSalesItem.from_orm(sales_item)
def update_sales_item(
self, id_: str, sales_item_update: InputSalesItem
) -> None:
return self.__sales_item_repository.update(id_, sales_item_update)
class SalesItemRepository(Protocol):
def save(self, input_sales_item: InputSalesItem) -> SalesItem:
pass
The implementation of the SalesItemRepository is presented in the next chapter where we focus on
database principles. The next chapter provides three different implementations for the repository:
Object-Relational Mapping (ORM), parameterized SQL queries and MongoDB.
In error handling, we are relying on the exception handling provided by the FastAPI web framework. We could raise errors of the FastAPI HTTPException type in our business logic, but then we would be coupling our web framework with the business logic, which is not desired. Remember that in the clean microservice design principle, the dependency goes only from the web framework (controller) towards the business logic, not the other way around. If we used web-framework-specific error classes in our business logic and wanted to migrate the microservice to a different web framework, we would have to refactor the whole business logic with regard to the raised errors.
What we should do is introduce an error base class for our microservice and provide a custom error handler for FastAPI. The custom error handler translates our business-logic-specific errors into HTTP responses. All the possible errors the microservice can raise should derive from the base error class. The ApiError class below is a general-purpose base error class for any API.
Figure 8.8. errors/ApiError.py
class ApiError(Exception):
def __init__(
self,
status_code: int,
status_text: str,
message: str,
code: str | None = None,
description: str | None = None,
cause: Exception | None = None,
):
self.__status_code: Final = status_code
self.__status_text: Final = status_text
self.__message: Final = message
self.__code: Final = code
self.__description: Final = description
self.__cause: Final = cause
@property
def status_code(self) -> int:
return self.__status_code
@property
def status_text(self) -> str:
return self.__status_text
@property
def message(self) -> str:
return self.__message
@property
def cause(self) -> Exception | None:
return self.__cause
@property
def code(self) -> str | None:
return self.__code
@property
def description(self) -> str | None:
return self.__description
The code property could also be named type. The idea behind that property is to convey information
about what kind of error is in question. This property can be used on the server side as a label for
failure metrics, and on the client side, special handling for certain kinds of errors can be implemented.
If you want, you can even add one more property to the above class, namely recovery_action. This
is an optional property that contains information about recovery steps for an actionable error. For
example, a database connection error might have a recovery_action property value: “Please retry
after a while. If the problem persists, contact the technical support at ”.
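As a sketch, such a property could be added as follows (shown here as a subclass to keep the sketch
short; the parameter and property names are assumptions):

from typing import Final

class ApiErrorWithRecoveryAction(ApiError):
    def __init__(self, *args, recovery_action: str | None = None, **kwargs):
        super().__init__(*args, **kwargs)
        # Optional human-readable instructions for recovering from the error
        self.__recovery_action: Final = recovery_action

    @property
    def recovery_action(self) -> str | None:
        return self.__recovery_action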
Below is the base error class for the sales-item-service:
Figure 8.9. errors/SalesItemServiceError.py
from ..errors.ApiError import ApiError
class SalesItemServiceError(ApiError):
pass
class EntityNotFoundError(SalesItemServiceError):
def __init__(self, entity_name: str, entity_id: str):
super().__init__(
404,
'Not Found',
f'{entity_name} with id {entity_id} not found',
'EntityNotFound',
)
# Imports ...
app = FastAPI()
@app.exception_handler(SalesItemServiceError)
def handle_sales_item_service_error(
request: Request, error: SalesItemServiceError
):
# Log error.cause
# status_code=error.status_code
# error_code=error.code
return JSONResponse(
status_code=error.status_code,
content={
'statusCode': error.status_code,
'statusText': error.status_text,
'errorCode': error.code,
'errorMessage': error.message,
'errorDescription': error.description,
# get_stack_trace returns stack trace only
# when environment is not production
# otherwise it returns None
'stackTrace': get_stack_trace(error.cause),
},
)
The following API response should be expected in a production environment (notice how the
stackTrace is null in a production environment):
{
"statusCode": 404,
"statusText": "Not Found",
"errorCode": "EntityNotFound",
"errorMessage": "Sales item with id 10 not found",
"errorDescription": null,
"stackTrace": null
}
You should also add specific error handlers for request validation errors and other possible errors:
@app.exception_handler(RequestValidationError)
def handle_request_validation_error(
request: Request, error: RequestValidationError
):
# Audit log
return JSONResponse(
status_code=400,
content={
'statusCode': 400,
'statusText': 'Bad Request',
'errorCode': 'RequestValidationError',
'errorMessage': 'Request validation failed',
'errorDescription': str(error),
'stackTrace': None,
},
)
@app.exception_handler(Exception)
def handle_unspecified_error(request: Request, error: Exception):
return JSONResponse(
status_code=500,
content={
'statusCode': 500,
'statusText': 'Internal Server Error',
'errorCode': 'UnspecifiedError',
'errorMessage': 'Unspecified internal error',
'errorDescription': str(error),
'stackTrace': get_stack_trace(error),
},
)
The rest of the API service source code files look like the following:
Figure 8.11. DiContainer.py
from dependency_injector import containers, providers
class DiContainer(containers.DeclarativeContainer):
wiring_config = containers.WiringConfiguration(
modules=[
'.service.SalesItemServiceImpl',
'.controllers.RestSalesItemController',
'.controllers.AriadneGraphQlSalesItemController',
'.controllers.StrawberryGraphQlSalesItemController',
'.controllers.GrpcSalesItemController',
'.repositories.OrmSalesItemRepository',
'.repositories.ParamSqlSalesItemRepository',
'.repositories.MongoDbSalesItemRepository',
]
)
sales_item_service = providers.Singleton(SalesItemServiceImpl)
sales_item_repository = providers.Singleton(
ParamSqlSalesItemRepository
)
order_controller = providers.Singleton(RestSalesItemController)
# Remove the below setting of the env variable for production code!
# mysql+pymysql://root:password@localhost:3306/salesitemservice
# mongodb://localhost:27017/salesitemservice
os.environ[
'DATABASE_URL'
] = 'mysql+pymysql://root:password@localhost:3306/salesitemservice'
di_container = DiContainer()
app = FastAPI()
@app.exception_handler(SalesItemServiceError)
def handle_sales_item_service_error(
request: Request, error: SalesItemServiceError
):
# Log error.cause
return JSONResponse(
status_code=error.status_code,
content={
'statusCode': error.status_code,
'statusText': error.status_text,
'errorCode': error.code,
'errorMessage': error.message,
'errorDescription': error.description,
# get_stack_trace returns stack trace only
# when environment is not production
# otherwise it returns None
'stackTrace': get_stack_trace(error.cause),
},
)
@app.exception_handler(RequestValidationError)
def handle_request_validation_error(
request: Request, error: RequestValidationError
):
# Audit log
# api_endpoint=f'{request.method} {request.url}'
# status_code=400
# error_code='RequestValidationError'
return JSONResponse(
status_code=400,
content={
'statusCode': 400,
'statusText': 'Bad Request',
'errorCode': 'RequestValidationError',
'errorMessage': 'Request validation failed',
'errorDescription': str(error),
'stackTrace': None,
},
)
@app.exception_handler(Exception)
def handle_unspecified_error(request: Request, error: Exception):
return JSONResponse(
status_code=500,
content={
'statusCode': 500,
'statusText': 'Internal Server Error',
'errorCode': 'UnspecifiedError',
'errorMessage': 'Unspecified internal error',
'errorDescription': str(error),
'stackTrace': get_stack_trace(error),
},
)
order_controller = di_container.order_controller()
app.include_router(order_controller.router)
Let’s create a GraphQL schema that defines needed types and API endpoints for the sales-item-service.
We will discuss the details of the below schema and the schema language in general after the example.
type Image {
id: Int!
rank: Int!
url: String!
}
type SalesItem {
id: ID!
createdAtTimestampInMs: String!
name: String!
priceInCents: Int!
images(
sortByField: String = "rank",
sortDirection: SortDirection = ASC,
offset: Int = 0,
limit: Int = 5
): [Image!]!
}
input InputImage {
id: Int!
rank: Int!
url: String!
}
input InputSalesItem {
name: String!
priceInCents: Int!
images: [InputImage!]!
}
enum SortDirection {
ASC
DESC
}
type IdResponse {
id: ID!
}
type Query {
salesItems(
sortByField: String = "createdAtTimestamp",
sortDirection: SortDirection = DESC,
offset: Int = 0,
limit: Int = 50
): [SalesItem!]!
salesItemsByFilters(
nameContains: String,
priceGreaterThan: Float
): [SalesItem!]!
}
type Mutation {
createSalesItem(salesItem: InputSalesItem!): SalesItem!
updateSalesItem(
id: ID!,
salesItem: InputSalesItem
): IdResponse!
In the above GraphQL schema, we define several types used in API requests and responses. A
GraphQL type specifies an object type: what properties the object has and the types of those
properties. A type specified with the input keyword is an input-only type (input DTO type). GraphQL
defines the following primitive (scalar) types: Int (32-bit), Float, String, Boolean, and ID. You can
define an array type with the notation: [<Type>]. By default, types are nullable. If you want a
non-nullable type, you must add an exclamation mark (!) after the type name. You can define an
enumerated type with the enum keyword. The Query and Mutation types are special GraphQL types
used to define queries and mutations. The above example defines three queries and four mutations
that clients can execute. You can add parameters for a type property. We have added parameters for
all the queries (queries are properties of the Query type), mutations (mutations are properties of the
Mutation type), and the images property of the SalesItem type.
In the above example, I have named all the queries with names that describe the values they return,
i.e., there are no verbs in the query names. It is possible to name queries starting with a verb (like
the mutations). For example, you could add get to the beginning of the names of the above-defined
queries if you prefer.
There are two ways to implement a GraphQL API:
• Schema first
• Code first (schema is generated from the code)
Let’s focus on the schema-first implementation first and implement the above-specified API using the
Ariadne library. We will first define fake implementations for some of the API endpoints
(queries/mutations):
import time
schema = gql(
"""
type Image {
id: Int!
rank: Int!
url: String!
}
type SalesItem {
id: ID!
createdAtTimestampInMs: String!
name: String!
priceInCents: Int!
images(
sortByField: String = "rank",
sortDirection: SortDirection = ASC,
offset: Int = 0,
limit: Int = 5
): [Image!]!
}
input InputImage {
id: Int!
rank: Int!
url: String!
}
input InputSalesItem {
name: String!
priceInCents: Int!
images: [InputImage!]!
}
enum SortDirection {
ASC
DESC
}
type IdResponse {
id: ID!
}
type Query {
salesItems(
sortByField: String = "createdAtTimestamp",
sortDirection: SortDirection = DESC,
offset: Int = 0,
limit: Int = 50
): [SalesItem!]!
salesItemsByFilters(
nameContains: String,
priceGreaterThan: Float
): [SalesItem!]!
}
type Mutation {
createSalesItem(inputSalesItem: InputSalesItem!): SalesItem!
updateSalesItem(
id: ID!,
inputSalesItem: InputSalesItem
): IdResponse!
query = QueryType()
@query.field('salesItems')
def resolve_sales_items(*_, **kwargs):
if kwargs['offset'] == 0:
return [
{
'id': 1,
'createdAtTimestampInMs': '12345678999877',
'name': 'sales item',
'priceInCents': 1095,
'images': [{'id': 1, 'rank': 2, 'url': 'url'}],
}
]
return []
@query.field('salesItem')
def resolve_sales_item(*_, id):
return {
'id': id,
'createdAtTimestampInMs': '12345678999877',
'name': 'sales item',
'priceInCents': 1095,
'images': [{'id': 1, 'rank': 2, 'url': 'url'}],
}
mutation = MutationType()
@mutation.field('createSalesItem')
def resolve_create_sales_item(*_, **kwargs):
return {
'id': 100,
'createdAtTimestampInMs': str(round(time.time() * 1000)),
**kwargs['inputSalesItem'],
}
@mutation.field('deleteSalesItem')
def resolve_delete_sales_item(*_, id):
return {'id': id}
app = GraphQL(executable_schema)
In the above example, the gql function validates the schema and raises a descriptive
GraphQLSyntaxError if there is an issue, or returns the original string if it is correct. We created
resolver functions for the first two queries in the schema, and we also created resolvers for creating
and deleting a sales item. You can start the GraphQL server with the following command (you
should have uvicorn installed using pip):
uvicorn app:app
Once the server is running, navigate with a web browser to the following URL:
http://127.0.0.1:8000/. You will see the GraphiQL UI, where you can execute queries and mutations.
Enter the following query into the left pane of the UI:
query salesItems {
salesItems(offset: 0) {
id
createdAtTimestampInMs
name
priceInCents,
images {
url
}
}
}
You should get the following response on the right side pane:
{
"data": {
"salesItems": [
{
"id": "1",
"createdAtTimestampInMillis": "12345678999877",
"name": "sales item",
"priceInCents": 1095,
"images": [
{
"url": "url"
}
]
}
]
}
}
If you then execute a createSalesItem mutation with a test sales item, this is the response you would
get, except for the timestamp, which represents the current time:
{
"data": {
"createSalesItem": {
"id": "100",
"createdAtTimestampInMillis": "1694798999418",
"name": "test sales item",
"priceInCents": 4095,
"images": []
}
}
}
You can also execute the following deleteSalesItem mutation:
mutation delete {
deleteSalesItem(id: 1) {
id
}
}
The response will be:
{
"data": {
"deleteSalesItem": {
"id": "1"
}
}
}
Let’s replace the dummy static implementations in our Ariadne GraphQL controller with real calls to
the sales item service:
Figure 8.14. controllers/AriadneGraphQlSalesItemController.py
from ariadne import MutationType, QueryType, gql, make_executable_schema
from dependency_injector.wiring import Provide
schema = gql(
"""
type Image {
id: Int!
rank: Int!
url: String!
}
type SalesItem {
id: ID!
createdAtTimestampInMs: String!
name: String!
priceInCents: Int!
images: [Image!]!
}
input InputImage {
id: Int!
rank: Int!
url: String!
}
input InputSalesItem {
name: String!
priceInCents: Int!
images: [InputImage!]!
}
type IdResponse {
id: ID!
}
type Query {
salesItems: [SalesItem!]!
salesItem(id: ID!): SalesItem!
}
type Mutation {
createSalesItem(inputSalesItem: InputSalesItem!): SalesItem!
updateSalesItem(
id: ID!,
inputSalesItem: InputSalesItem
): IdResponse!
query = QueryType()
@query.field('salesItems')
def resolve_sales_items(*_):
return sales_item_service.get_sales_items()
@query.field('salesItem')
def resolve_sales_item(*_, id: str):
return sales_item_service.get_sales_item(id)
mutation = MutationType()
@mutation.field('createSalesItem')
def resolve_create_sales_item(*_, inputSalesItem):
input_sales_item = InputSalesItem.parse_obj(inputSalesItem)
return sales_item_service.create_sales_item(input_sales_item)
@mutation.field('updateSalesItem')
def resolve_update_sales_item(*_, id: str, inputSalesItem):
    sales_item_update = InputSalesItem.parse_obj(inputSalesItem)
    sales_item_service.update_sales_item(id, sales_item_update)
    return {'id': id}
@mutation.field('deleteSalesItem')
def resolve_delete_sales_item(*_, id: str):
sales_item_service.delete_sales_item(id)
return {'id': id}
Notice in the above code that we must remember to validate the input in the two mutations. We can
do that when converting the input dict to a Pydantic model using the parse_obj method. To make the
example more production-like, we should add authorization, audit logging, and metrics updates. All of
this can be done by creating decorators in a similar way to what we did earlier in the REST API example.
The decorators can get the request object from the info.context dict: info.context['request']
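For illustration, below is a minimal sketch of such a decorator (the decorator name is an assumption
and the actual audit logging call is omitted):

import functools

def audit_logged(resolver):
    # Ariadne calls every resolver with (parent, info, **kwargs);
    # the HTTP request is available via info.context['request']
    @functools.wraps(resolver)
    def wrapper(*args, **kwargs):
        info = args[1]
        request = info.context['request']
        # e.g. write an audit log entry using request.method and request.url
        return resolver(*args, **kwargs)

    return wrapper

The decorator would be placed between the @query.field/@mutation.field decorator and the resolver
function so that the registered resolver is the wrapped one.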
GraphQL error handling differs from REST API error handling. GraphQL API responses do not use
different HTTP response status codes. A GraphQL API response is always sent with the status
code 200 OK. When an error happens while processing a GraphQL API request, the response body
object will include an errors array. In your GraphQL type resolvers, you should raise an error when
a query or mutation fails. You can use the same ApiError base error class as was used in the earlier
REST API example. For handling the custom API errors, we need to add an error formatter as shown
below. The error objects should always have a message field. Additional information about the error
can be supplied in an extensions object, which can contain any properties.
Let's say a salesItem query results in an EntityNotFoundError. The API response would then have
null for the data property and the errors property present:
{
"data": null,
"errors": [
{
"message": "Sales item not found with id 1",
"extensions": {
"statusCode": 404,
"statusText": "Not Found",
"errorCode": "EntityNotFound",
"errorDescription": null
"stackTrace": null
}
}
]
}
di_container = DiContainer()
def format_custom_error(
graphql_error, debug: bool = False
) -> dict[str, Any]:
error = unwrap_graphql_error(graphql_error)
if isinstance(error, SalesItemServiceError):
return {
'message': error.message,
'extensions': {
'statusCode': error.status_code,
'statusText': error.status_text,
'errorCode': error.code,
'errorDescription': error.description,
'stackTrace': get_stack_trace(error.cause),
},
}
if isinstance(error, ValidationError):
return {
'message': 'Request validation failed',
'extensions': {
'statusCode': 400,
'statusText': 'Bad Request',
'errorCode': 'RequestValidationError',
'errorDescription': str(error),
'stackTrace': None,
},
}
if isinstance(error, Exception):
return {
'message': 'Unspecified internal error',
'extensions': {
'statusCode': 500,
'statusText': 'Internal Server Error',
'errorCode': 'UnspecifiedError',
'errorDescription': str(error),
'stackTrace': get_stack_trace(error),
},
}
else:
return format_error(graphql_error, debug)
The Ariadne GraphQL version of the sales-item-service can be run with the following command (we
assume that the service source code is placed in a Python package named salesitemservice and that
we are in the parent directory of that package):
uvicorn salesitemservice.app_graphql:app
It is also possible to return an error as a query/mutation return value. This can be done, for example,
by returning a union type from a query or mutation. This approach requires a more complex GraphQL
schema and more complex resolvers on the server side. For example:
# ...
type Error {
message: String!
# Other possible properties
}

union SalesItemOrError = SalesItem | Error
type Mutation {
createSalesItem(inputSalesItem: InputSalesItem!): SalesItemOrError!
}
In the createSalesItem mutation resolver, you must add a try-except block to handle error situations
and respond with an Error object in case of an error.
You can also specify multiple error types:
# ...
type ErrorType1 {
# ...
}
type ErrorType2 {
# ...
}
type ErrorType3 {
# ...
}

union SalesItemOrError =
  SalesItem | ErrorType1 | ErrorType2 | ErrorType3
type Mutation {
createSalesItem(inputSalesItem: InputSalesItem!): SalesItemOrError!
}
The above example would require the createSalesItem resolver to catch multiple different
errors and respond with an appropriate error object as a result.
The client-side code will also be more complex because of the need to handle the different types of
responses for a single operation (query/mutation). For example:
mutation {
createSalesItem(inputSalesItem: {
priceInCents: 200
name: "test sales item"
images: []
}) {
__typename
...on SalesItem {
id,
createdAtTimestampInMs
}
...on ErrorType1 {
# Specify fields here
}
...on ErrorType2 {
# Specify fields here
}
...on ErrorType3 {
# Specify fields here
}
}
This approach also has the downside that the client must still be able to handle possible errors
reported in the response’s errors array.
In a GraphQL schema, you can also add parameters to a primitive (scalar) property. That is useful
for implementing conversions. For example, we could define the SalesItem type with a parameterized
price property:
enum Currency {
USD,
GBP,
EUR,
JPY
}
type SalesItem {
id: ID!
createdAtTimestampInMillis: String!
name: String!
price(currency: Currency = USD): Float!
images(
sortByField: String = "rank",
sortDirection: SortDirection = ASC,
offset: Int = 0,
limit: Int = 5
): [Image!]!
}
Now clients can supply a currency parameter for the price property in their queries to get the price
in different currencies. The default currency is USD if no currency parameter is supplied.
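On the server side, such a parameterized scalar field needs its own resolver. Below is a minimal
Ariadne-style sketch (the conversion rates and the assumption that prices are stored in US dollars are
illustrative):

from ariadne import ObjectType

sales_item = ObjectType('SalesItem')

# Illustrative conversion rates; in practice they would come from
# configuration or a currency conversion service
USD_TO_CURRENCY_RATE = {'USD': 1.0, 'EUR': 0.93, 'GBP': 0.80, 'JPY': 148.0}

@sales_item.field('price')
def resolve_price(parent_sales_item, _info, currency: str = 'USD'):
    # 'parent_sales_item' is the parent SalesItem value; its price is
    # assumed here to be stored in US dollars
    return parent_sales_item['price'] * USD_TO_CURRENCY_RATE[currency]

# The sales_item ObjectType must be passed to make_executable_schema
# together with the query and mutation types.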
Below are example queries that a client could perform against the earlier defined GraphQL
schema:
{
# gets the name, price in euros and the first 5 images
# for the sales item with id "1"
salesItem(id: "1") {
name
price(currency: EUR)
images
}
In real life, consider limiting the fetching of resources only to the previous or the next page (or the
next page only if you are implementing infinite scrolling on the client side). Then, clients cannot
fetch random pages. This prevents attacks where a malicious user tries to fetch pages with huge page
numbers (like 10,000, for example) which can cause extra load for the server or, at the extreme, a
denial of service.
Below is an example where clients can only query the first, next, or previous page. When a client
requests the first page, the page cursor can be empty, but when the client requests the previous or the
next page, it must give the current page cursor as a query parameter.
type PageOfSalesItems {
# Contains the page number encrypted and
# encoded as a Base64 value.
pageCursor: String!
salesItems: [SalesItem!]!
}
enum Page {
FIRST,
NEXT,
PREVIOUS
}
type Query {
pageOfSalesItems(
page: Page = FIRST,
pageCursor: String = ""
): PageOfSalesItems!
}
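The page cursor itself could be produced and validated on the server side with symmetric encryption.
Below is a minimal sketch assuming the Fernet implementation from the cryptography library (in
production, the key would come from the service's configuration):

from cryptography.fernet import Fernet

# In production, read the key from configuration instead of generating it
fernet = Fernet(Fernet.generate_key())

def create_page_cursor(page_number: int) -> str:
    # Fernet tokens are already URL-safe Base64-encoded strings
    return fernet.encrypt(str(page_number).encode()).decode()

def parse_page_number(page_cursor: str) -> int:
    return int(fernet.decrypt(page_cursor.encode()).decode())

The pageOfSalesItems resolver would decrypt the received cursor, compute the requested page number
based on the page argument, and return a newly encrypted cursor in the response.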
Let’s have another example with GraphQL and use the code-first approach this time with the
Strawberry library. We should follow the clean microservice design principle when implementing
production code. We should be able to share the services, repositories, DTOs, errors and entities with
the earlier sales-item-service REST API example and only define a separate controller for the GraphQL
API. The below example implements only two API endpoints (getting a sales item and creating a sales
item) to keep the example shorter.
Figure 8.16. controllers/StrawberryGraphQlSalesItemController.py
import strawberry
from dependency_injector.wiring import Provide
from strawberry.fastapi import GraphQLRouter
from strawberry.types import Info
class StrawberryGraphQlSalesItemController:
@strawberry.type
class Query:
@strawberry.field
def salesItems(self, info: Info) -> list[OutputSalesItem]:
output_sales_items = sales_item_service.get_sales_items()
return [
OutputSalesItem.from_pydantic(output_sales_item)
for output_sales_item in output_sales_items
]
@strawberry.field
def salesItem(self, info: Info, id: str) -> OutputSalesItem:
output_sales_item = sales_item_service.get_sales_item(id)
return OutputSalesItem.from_pydantic(output_sales_item)
@strawberry.type
class Mutation:
@strawberry.mutation
def createSalesItem(
self, info: Info, inputSalesItem: InputSalesItem
) -> OutputSalesItem:
output_sales_item = sales_item_service.create_sales_item(
inputSalesItem.to_pydantic()
)
return OutputSalesItem.from_pydantic(output_sales_item)
@strawberry.mutation
def updateSalesItem(
self, info: Info, id: str, inputSalesItem: InputSalesItem
) -> IdResponse:
sales_item_service.update_sales_item(
id, inputSalesItem.to_pydantic()
)
return IdResponse(id=id)
@strawberry.mutation
def deleteSalesItem(self, info: Info, id: str) -> IdResponse:
sales_item_service.delete_sales_item(id)
return IdResponse(id=id)
@property
def router(self):
return self.__router
To make our controller more production-like, we must add authorization, audit logging, and
metrics updates. We can implement similar kinds of decorators to the ones we used earlier in the REST API
example. When a decorator needs to access the request, it can do so via the info parameter:
info.context['request']
In addition to the above controller, we must define Strawberry types, which can be based on existing
Pydantic classes. Here are the Strawberry types:
import strawberry
@strawberry.experimental.pydantic.input(model=InputSalesItem)
class InputSalesItem:
name: strawberry.auto
priceInCents: strawberry.auto
images: list[InputSalesItemImage]
import strawberry
@strawberry.experimental.pydantic.input(
model=SalesItemImage, all_fields=True
)
class InputSalesItemImage:
pass
import strawberry
@strawberry.experimental.pydantic.type(model=OutputSalesItem)
class OutputSalesItem:
id: strawberry.auto
createdAtTimestampInMs: str
name: strawberry.auto
priceInCents: strawberry.auto
images: list[OutputSalesItemImage]
import strawberry
@strawberry.experimental.pydantic.type(
model=SalesItemImage, all_fields=True
)
class OutputSalesItemImage:
pass
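The controller's __init__ method is not shown above. Below is a minimal sketch of how the router
exposed by the router property might be wired up with the code-first schema (the /graphql prefix and
the exact wiring are assumptions):

import strawberry
from fastapi import FastAPI
from strawberry.fastapi import GraphQLRouter

# Build the code-first schema from the controller's inner Query and Mutation classes
schema = strawberry.Schema(
    query=StrawberryGraphQlSalesItemController.Query,
    mutation=StrawberryGraphQlSalesItemController.Mutation,
)

router = GraphQLRouter(schema)

app = FastAPI()
app.include_router(router, prefix='/graphql')

In the book's controller, this wiring would most likely live inside __init__, with the created router
stored in the private attribute returned by the router property.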
Server-Sent Events (SSE) is a unidirectional push technology enabling a client to receive updates
from a server via an HTTP connection.
Let’s showcase the SSE capabilities with a real-life example. The below example defines a subscribe-
to-loan-app-summaries API endpoint for clients to subscribe to loan application summaries. A client
will show loan application summaries in a list view in its UI. Whenever a new summary for a
loan application is available, the server will send a loan application summary event to the clients, which
will update their UIs by adding the new loan application summary. The below example uses FastAPI and
the sse-starlette library.
import json

from fastapi import FastAPI, Request
from sse_starlette.sse import EventSourceResponse

loan_app_summaries = []
app = FastAPI()
def get_loan_app_summary():
if len(loan_app_summaries) > 0:
return loan_app_summaries.pop(0)
return None
@app.get('/subscribe-to-loan-app-summaries')
async def subscribe_to_loan_app_summaries(request: Request):
async def generate_loan_app_summary_events():
while True:
if await request.is_disconnected():
break
loan_app_summary = get_loan_app_summary()
if loan_app_summary:
yield json.dumps(loan_app_summary)
return EventSourceResponse(
generate_loan_app_summary_events()
)
@app.post('/loan-app-summaries')
async def create_loan_app_summary(
request: Request
) -> None:
loan_app_summary = await request.json()
loan_app_summaries.append(loan_app_summary)
Next, we can implement the web client in JavaScript and define the following React functional
component:
if (loanAppSummary) {
setLoanAppSummaries([loanAppSummary, ...loanAppSummaries]);
}
} catch {
// Handle error
}
});
return (
<ul>{loanAppSummaryListItems}</ul>
);
}
Let’s have an example of a GraphQL subscription. The below GraphQL schema defines one
subscription for a post’s comments. It is not relevant what a post is. It can be a blog post or social
media post, for example. We want a client to be able to subscribe to a post’s comments.
type PostComment {
id: ID!,
text: String!
}
type Subscription {
postComment(postId: ID!): PostComment
}
On the client side, we can have the below JavaScript code to define a subscription named
postCommentText that subscribes to a post’s comments and returns the text property of comments:
If a client executes the above query for a particular post (defined with the postId parameter), the
following kind of response can be expected:
{
"data": {
"postComment": {
"text": "Nice post!"
}
}
}
To be able to use GraphQL subscriptions, you must implement support for them both on the server and
the client side. In practice, this means setting up WebSocket communication, because GraphQL uses
that protocol to implement subscriptions. For the server side, you can find instructions for the Ariadne
library here: https://ariadnegraphql.org/docs/subscriptions. And for the client side, you can find
instructions for the Apollo client here:
https://www.apollographql.com/docs/react/data/subscriptions/#setting-up-the-transport
After the server and client-side support for subscriptions are implemented, you can use the subscrip-
tion in your React component:
if (data?.postComment) {
setPostComments([...postComments, data.postComment]);
}
const postCommentListItems =
postComments.map(( { id, text }) =>
(<li key={id}>{text}</li>));
return <ul>{postCommentListItems}</ul>;
}
Below is a chat messaging application consisting of a WebSocket server implemented with FastAPI,
Kafka and Redis, and a WebSocket client implemented with React. There can be multiple instances of
the server running. These instances are stateless except for storing WebSocket connections for locally
connected clients. First, we list the source code files for the server side.
A new Redis client is created using the redis-py library:
Figure 8.21. redis_client.py
import os

from redis import Redis

redis_client = Redis(
    host=os.environ.get('REDIS_HOST') or 'localhost',
    port=int(os.environ.get('REDIS_PORT') or '6379'),
    username=os.environ.get('REDIS_USERNAME'),
    password=os.environ.get('REDIS_PASSWORD'),
)
class ChatMsgBrokerAdminClient(Protocol):
class CreateTopicError(WebSocketExampleError):
pass
import os
class KafkaChatMsgBrokerAdminClient(ChatMsgBrokerAdminClient):
def __init__(self):
self.__admin_client = AdminClient(
{
'bootstrap.servers': os.environ.get('KAFKA_BROKERS'),
'client.id': 'chat-messaging-service',
}
)
try:
topic_name_to_creation_dict = (
self.__admin_client.create_topics([topic])
)
topic_name_to_creation_dict[name].result()
except KafkaException as error:
if error.args[0].code() != KafkaError.TOPIC_ALREADY_EXISTS:
raise self.CreateTopicError(error)
Users of the chat messaging application are identified with phone numbers. On the server side, we
store the WebSocket connection for each user in the phone_nbr_to_conn_map:
Figure 8.24. phone_nbr_to_conn_map.py
class Connection(Protocol):
class Error(WebSocketExampleError):
pass
class WebSocketConnection(Connection):
def __init__(self, websocket: WebSocket):
self.__websocket = websocket
The below module is the WebSocket server. The server accepts connections from clients. When it
receives a chat message from a client, it will first parse and validate it. The server stores the chat
message in persistent storage (using a separate chat-message-service REST API, not implemented here).
The server gets the recipient’s server information from a Redis cache and either sends the chat message
to the recipient’s WebSocket connection or produces the chat message to a Kafka topic from which
another microservice instance can consume the chat message and send it to the recipient’s WebSocket
connection. The Redis cache stores a hash map where the users’ phone numbers are mapped to the
server instance they are currently connected to. A UUID identifies a microservice instance.
Figure 8.27. ChatMsgServer.py
class ChatMsgServer(Protocol):
async def handle(
self, connection: Connection, phone_number: str
) -> None:
pass
import json
from typing import Final
class WebSocketChatMsgServer(ChatMsgServer):
def __init__(self, instance_uuid: str):
self.__instance_uuid: Final = instance_uuid
self.__conn_to_phone_nbr_map: Final[dict[Connection, str]] = {}
self.__chat_msg_broker_producer: Final = (
KafkaChatMsgBrokerProducer()
)
self.__cache: Final = RedisPhoneNbrToInstanceUuidCache(
redis_client
)
self.__conn_to_phone_nbr_map[connection] = phone_number
self.__cache.try_store(phone_number, self.__instance_uuid)
while True:
chat_message: dict[
str, str
] = await connection.try_receive_json()
recipient_instance_uuid = (
self.__cache.retrieve_instance_uuid(
recipient_phone_nbr
)
)
await self.__try_send(
chat_message, recipient_instance_uuid
)
except WebSocketDisconnect:
self.__disconnect(connection)
except PhoneNbrToInstanceUuidCache.Error:
# Handle error ...
except Connection.Error:
# Handle error ...
except ChatMsgBrokerProducer.Error:
# Handle error ...
self.__chat_msg_broker_producer.close()
if recipient_conn:
await recipient_conn.try_send_json(chat_message)
elif recipient_instance_uuid:
# Recipient has active connection on different
# server instance compared to sender
chat_message_json = json.dumps(chat_message)
self.__chat_msg_broker_producer.try_produce(
chat_message_json, topic=recipient_instance_uuid
)
if phone_number:
del phone_nbr_to_conn_map[phone_number]
del self.__conn_to_phone_nbr_map[connection]
try:
self.__cache.try_remove(phone_number)
except PhoneNbrToInstanceUuidCache.Error:
# Handle error ...
class PhoneNbrToInstanceUuidCache(Protocol):
class Error(WebSocketExampleError):
pass
def retrieve_instance_uuid(
self, phone_number: str | None
) -> str | None:
pass
class RedisPhoneNbrToInstanceUuidCache(PhoneNbrToInstanceUuidCache):
def __init__(self, redis_client: Redis):
self.__redis_client = redis_client
def retrieve_instance_uuid(
self, phone_number: str | None
) -> str | None:
if phone_number:
try:
return self.__redis_client.hget(
'phoneNbrToInstanceUuidMap', phone_number
)
except RedisError:
pass
return None
class ChatMsgBrokerProducer(Protocol):
class Error(WebSocketExampleError):
pass
def close(self):
pass
import os
class KafkaChatMsgBrokerProducer(ChatMsgBrokerProducer):
def __init__(self):
config = {
'bootstrap.servers': os.environ.get('KAFKA_BROKERS'),
'client.id': 'chat-messaging-service',
}
self.__producer = Producer(config)
try:
self.__producer.produce(
topic, chat_message_json, on_delivery=handle_error
)
self.__producer.poll()
except KafkaException:
raise self.Error()
def close(self):
try:
self.__producer.flush()
except KafkaException:
pass
The KafkaChatMsgBrokerConsumer class defines a Kafka consumer that consumes chat messages from
a particular Kafka topic and sends them to the recipient’s WebSocket connection:
Figure 8.33. ChatMsgBrokerConsumer.py
class ChatMsgBrokerConsumer(Protocol):
def consume_chat_msgs(self) -> None:
pass
import json
import os
class KafkaChatMsgBrokerConsumer(ChatMsgBrokerConsumer):
def __init__(self, topic: str):
self.__topic = topic
config = {
'bootstrap.servers': os.environ.get('KAFKA_BROKERS'),
'group.id': 'chat-messaging-service',
'auto.offset.reset': 'smallest',
'enable.partition.eof': False,
}
self.__consumer = Consumer(config)
self.__is_running = True
while self.__is_running:
try:
messages = self.__consumer.poll(timeout=1)
if messages is None:
continue
recipient_conn = phone_nbr_to_conn_map.get(
chat_message.get('recipientPhoneNbr')
)
if recipient_conn:
recipient_conn.try_send_text(chat_message_json)
except KafkaException:
# Handle error ...
except Connection.Error:
# Handle error ...
def close(self):
self.__consumer.close()
import sys
from threading import Thread
from uuid import uuid4
instance_uuid = str(uuid4())
# Log error
sys.exit(1)
chat_msg_consumer_thread = Thread(
target=chat_msg_broker_consumer.consume_chat_msgs
)
chat_msg_consumer_thread.start()
app = FastAPI()
chat_msg_server = WebSocketChatMsgServer(instance_uuid)
@app.websocket('/chat-messaging-service/{phone_number}')
async def handle_websocket(websocket: WebSocket, phone_number: str):
connection = WebSocketConnection(websocket)
await chat_msg_server.handle(connection, phone_number)
@app.on_event('shutdown')
def shutdown_event():
chat_msg_broker_consumer.stop()
chat_msg_consumer_thread.join()
chat_msg_broker_consumer.close()
chat_msg_server.close()
For the web client, we have the below code. An instance of the ChatMessagingService class connects to
a chat messaging server via WebSocket. It listens to messages received from the server and dispatches
an action upon receiving a chat message. The class also offers a method for sending a chat message
to the server.
Figure 8.36. ChatMessagingService.js
class ChatMessagingService {
wsConnection;
connectionIsOpen = false;
lastChatMessage;
constructor(dispatch, userPhoneNbr) {
this.wsConnection =
new WebSocket(`ws://localhost:8080/chat-messaging-service/${userPhoneNbr}`);
this.wsConnection.addEventListener('open', () => {
this.connectionIsOpen = true;
});
this.wsConnection.addEventListener('error', () => {
this.lastChatMessage = null;
});
this.wsConnection.addEventListener(
'message',
({ data: chatMessageJson }) => {
const chatMessage = JSON.parse(chatMessageJson);
store.dispatch({
type: 'receivedChatMessageAction',
chatMessage
});
});
this.wsConnection.addEventListener('close', () => {
this.connectionIsOpen = false;
});
}
send(chatMessage) {
this.lastChatMessage = chatMessage;
if (this.connectionIsOpen) {
this.wsConnection.send(JSON.stringify(chatMessage));
} else {
// Send message to REST API
}
}
close() {
this.connectionIsOpen = false;
this.wsConnection.close();
}
}
return chatMessagingService;
}
root.render(
<Provider store={store}>
<ChatApp/>
</Provider>
);
The chat application ChatApp parses the user’s and contact’s phone numbers from the URL and then
renders a chat view between the user and the contact:
Figure 8.38. ChatApp.jsx
return (
<div>
User: {userPhoneNbr}
<ContactChatView
userPhoneNbr={userPhoneNbr}
contactPhoneNbr={contactPhoneNbr}
/>
</div>
);
}
The ContactChatView component renders chat messages between a user and a contact:
Figure 8.39. ContactChatView.jsx
function ContactChatView({
userPhoneNbr,
contactPhoneNbr,
chatMessages,
fetchLatestChatMessages
}) {
const inputElement = useRef(null);
useEffect(() => {
fetchLatestChatMessages(userPhoneNbr, contactPhoneNbr);
}, [contactPhoneNbr,
fetchLatestChatMessages,
userPhoneNbr]
);
function sendChatMessage() {
if (inputElement?.current.value) {
store.dispatch({
type: 'sendChatMessageAction',
chatMessage: {
senderPhoneNbr: userPhoneNbr,
recipientPhoneNbr: contactPhoneNbr,
message: inputElement.current.value
}
});
}
}
return (
<li
key={index}
className={messageIsReceived ? 'received' : 'sent'}>
{message}
</li>
);
});
return (
<div className="contactChatView">
Contact: {contactPhoneNbr}
<ul>{chatMessageElements}</ul>
<input ref={inputElement}/>
<button onClick={sendChatMessage}>Send</button>
</div>
);
}
function mapStateToProps(state) {
return {
chatMessages: state
};
}
.contactChatView {
width: 420px;
}
.contactChatView ul {
padding-inline-start: 0;
list-style-type: none;
}
.contactChatView li {
margin-top: 15px;
width: fit-content;
max-width: 180px;
padding: 10px;
border: 1px solid #888;
border-radius: 20px;
}
.contactChatView li.received {
margin-right: auto;
}
.contactChatView li.sent {
margin-left: auto;
}
Let’s have an example of a gRPC-based API. First, we must define the needed Protocol Buffers types.
They are defined in a file named with the extension .proto. The syntax of proto files is pretty simple.
We define the service by listing the remote procedures. A remote procedure is defined with the
following syntax: rpc <procedure-name> (<argument-type>) returns (<return-type>) {}. A type is
defined with the below syntax:
message <type-name> {
<field-type> <field-name> [= <field-index>];
...
}
syntax = "proto3";
package salesitemservice;
service SalesItemService {
rpc createSalesItem (InputSalesItem) returns (OutputSalesItem) {}
rpc getSalesItems (GetSalesItemsOptions) returns (OutputSalesItems) {}
rpc getSalesItem (Id) returns (OutputSalesItem) {}
rpc updateSalesItem (SalesItemUpdate) returns (Nothing) {}
rpc deleteSalesItem (Id) returns (Nothing) {}
}
message GetSalesItemsOptions {
optional string sortByField = 1;
optional string sortDirection = 2;
optional uint64 offset = 3;
optional uint64 limit = 4;
}
message Nothing {}
message Image {
uint64 id = 1;
uint64 rank = 2;
string url = 3;
}
message InputSalesItem {
string name = 1;
float price = 2;
repeated Image images = 3;
}
message SalesItemUpdate {
uint64 id = 1;
string name = 2;
float price = 3;
repeated Image images = 4;
}
message OutputSalesItem {
uint64 id = 1;
uint64 createdAtTimestampInMillis = 2;
string name = 3;
float price = 4;
repeated Image images = 5;
}
message Id {
uint64 id = 1;
}
message OutputSalesItems {
repeated OutputSalesItem salesItems = 1;
}
message ErrorDetails {
optional string code = 1;
optional string description = 2;
optional string stackTrace = 3;
}
In the above example, the getSalesItems method returns an object that contains an array of sales
items. gRPC also offers the possibility to stream data in both directions. For example, we could make
getSalesItems a streaming method, and then we would not need the offset and limit properties in its
argument message. To define a streaming getSalesItems method:
// ...
service SalesItemService {
// ...
rpc getSalesItems (GetSalesItemsArg) returns (stream OutputSalesItem) {}
// ...
}
// ...
After the proto file is completed, we must generate code for the gRPC server. Let’s first install the
grpcio-tools library:
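pip install grpcio-tools

The code-generation command itself is not reproduced here, but assuming the proto file is named
sales_item_service.proto, it would be along the following lines:

python -m grpc_tools.protoc -I. --python_out=. --pyi_out=. --grpc_python_out=. sales_item_service.proto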
After executing the above command, there should be three generated files in the directory. Creating
the actual server code requires the following two steps:
• Implementing the generated servicer interface with functions that perform the actual “work”
of the service.
• Running a gRPC server that listens for requests from clients and transmits responses.
We also need to install the grpcio and grpcio-status packages; the latter is used below for returning
rich error details. The following function maps our API errors to gRPC statuses:
if isinstance(error, SalesItemServiceError):
grpc_status_code = map_http_status_code_to_grpc_status_code(error)
message = error.message
detail.Pack(
ErrorDetails(
code=error.code,
description=error.description,
# get_stack_trace returns stack trace only
# when environment is not production
# otherwise it returns None
stackTrace=get_stack_trace(error.cause),
)
)
elif isinstance(error, ValidationError):
grpc_status_code = code_pb2.INVALID_ARGUMENT
message = 'Request validation failed'
detail.Pack(
ErrorDetails(
code='RequestValidationError', description=str(error)
)
)
else:
grpc_status_code = code_pb2.INTERNAL
message = 'Unspecified internal error'
detail.Pack(
ErrorDetails(
code='UnspecifiedError',
description=str(error),
stackTrace=get_stack_trace(error),
)
)
return status_pb2.Status(
code=grpc_status_code,
message=message,
details=[detail],
)
class GrpcSalesItemController(SalesItemServiceServicer):
__sales_item_service: SalesItemService = Provide['sales_item_service']
def createSalesItem(
self, input_sales_item: InputSalesItem, context
) -> OutputSalesItem:
try:
input_sales_item_dict = proto_to_dict(input_sales_item)
input_sales_item = PydanticInputSalesItem.parse_obj(
input_sales_item_dict
)
output_sales_item_dict = (
self.__sales_item_service.create_sales_item(
input_sales_item
).dict()
)
output_sales_item = OutputSalesItem()
json_format.ParseDict(
output_sales_item_dict, output_sales_item
)
return output_sales_item
except Exception as error:
self.__abort_with(error, context)
def getSalesItems(
self, get_sales_items_arg: GetSalesItemsArg, context
) -> OutputSalesItems:
try:
# NOTE! Here we don't use the input message
# 'get_sales_items_arg' because our current
# business logic does not support it
output_sales_items = (
self.__sales_item_service.get_sales_items()
)
output_sales_items = [
json_format.ParseDict(
output_sales_item.dict(), OutputSalesItem()
)
for output_sales_item in output_sales_items
]
return OutputSalesItems(salesItems=output_sales_items)
except Exception as error:
self.__abort_with(error, context)
output_sales_item = OutputSalesItem()
json_format.ParseDict(
output_sales_item_dict, output_sales_item
)
return output_sales_item
except Exception as error:
self.__abort_with(error, context)
sales_item_update = PydanticInputSalesItem.parse_obj(
sales_item_update_dict
)
self.__sales_item_service.update_sales_item(
id_, sales_item_update
)
return Nothing()
except Exception as error:
self.__abort_with(error, context)
@staticmethod
def __abort_with(error: Exception, context):
status = create_status_from(error)
context.abort_with_status(rpc_status.to_status(status))
For production, you need to add audit logging, metrics updates, and authorization to each gRPC
procedure implementation. You can use decorators for that purpose. Also, when handling errors,
remember to do the needed audit logging (e.g., audit log bad requests) and update the failure-related
metrics.
Below is the gRPC server code:
Figure 8.46. controllers/app_grpc.py
import os
from concurrent import futures
import grpc
di_container = DiContainer()
def serve():
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
add_SalesItemServiceServicer_to_server(
GrpcSalesItemController(), server
)
server.add_insecure_port('[::]:50051')
server.start()
server.wait_for_termination()
serve()
You can run the server with the following command from a directory above the salesitemservice
directory:
python -m salesitemservice.app_grpc
Below is an example of gRPC client that performs operations using the above server:
Figure 8.47. grpc_client.py
import grpc
def run():
with grpc.insecure_channel('localhost:50051') as channel:
sales_item_service = SalesItemServiceStub(channel)
input_sales_item = InputSalesItem(
name='Test',
priceInCents=950,
images=[
Image(id=11, rank=1, url='http://server.com/images/1')
],
)
try:
sales_item = sales_item_service.createSalesItem(
input_sales_item
)
id_ = sales_item.id
print(f'Sales item with id {id_} created')
sales_items_response = sales_item_service.getSalesItems(
GetSalesItemsArg()
)
print(
f'Nbr of sales items fetched: {len(sales_items_response.salesItems)}'
)
sales_item_service.updateSalesItem(
SalesItemUpdate(
id=id_,
name='Test 2',
priceInCents=1950,
images=[
Image(
id=11, rank=1, url='http://server.com/images/1'
)
],
)
)
print(f'Sales item with id {id_} updated')
sales_item = sales_item_service.getSalesItem(Id(id=id_))
print(f'Sales item named {sales_item.name} fetched')
sales_item_service.deleteSalesItem(Id(id=id_))
print(f'Sales item with id {id_} deleted')
except grpc.RpcError as error:
status = rpc_status.from_call(error)
if status:
print(f'gRPC status code: {status.code}')
for detail in status.details:
error_details = ErrorDetails()
detail.Unpack(error_details)
print(f'Error code: {error_details.code}')
print(f'Error message: {status.message}')
print(
f'Error description: {error_details.description}'
)
else:
print(str(error))
if __name__ == '__main__':
run()
You can run the client with the following command from a directory above the salesitemservice
directory:
python -m salesitemservice.grpc_client
In request-only asynchronous APIs, the request sender does not expect a response. Such APIs are
typically implemented using a message broker. The request sender sends a JSON-format request
to a topic in the message broker, from which the request recipient consumes the request asynchronously.
Different API endpoints can be specified in a request using a procedure property, for example. You
can name the procedure property as you wish, e.g., action, operation, or apiEndpoint. Parameters
for the procedure can be supplied in a parameters property. Below is an example request in JSON:
{
"procedure": "<procedure name>",
"parameters": {
"parameterName1": <parameter value>,
"parameterName2": <parameter value>,
// ...
}
}
Let’s have an example with an email-sending microservice that implements a request-only
asynchronous API and handles the sending of emails. We start by defining a message broker topic for the
microservice. The topic should be named after the microservice, for example: email-sending-service.
In the email-sending-service, we define the following request schema for an API endpoint that sends
an email:
{
"procedure": "sendEmailMessage",
"parameters": {
"fromEmailAddress": "...",
"toEmailAddresses": ["...", "...", ...],
"subject": "...",
"message": "..."
}
}
Below is an example request that some other microservice can produce to the email-sending-service
topic in the message broker:
{
"procedure": "sendEmailMessage",
"parameters": {
"fromEmailAddress": "[email protected]",
"toEmailAddresses": ["[email protected]"],
"subject": "Status update",
"message": "Hi, Here is my status update ..."
}
}
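As a minimal sketch, producing such a request with the confluent-kafka library used elsewhere in
this chapter could look like the following (the broker address is read from an assumed KAFKA_BROKERS
environment variable):

import json
import os

from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': os.environ.get('KAFKA_BROKERS')})

request = {
    'procedure': 'sendEmailMessage',
    'parameters': {
        'fromEmailAddress': '[email protected]',
        'toEmailAddresses': ['[email protected]'],
        'subject': 'Status update',
        'message': 'Hi, Here is my status update ...',
    },
}

# Produce the request to the email-sending-service topic and wait for delivery
producer.produce('email-sending-service', json.dumps(request))
producer.flush()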
In request-response asynchronous APIs, requests and responses are sent via a message broker topic or
topics. Each participating microservice should have a topic named after the microservice in the
message broker.
The request format is the same as defined earlier, but the response has a response property instead of
the parameters property. Thus, responses have the following format:
{
"procedure": "<procedure name>",
"response": {
"propertyName1": <property value>,
"propertyName2": <property value>,
// ...
}
}
For example, a loan-application-service could send the following request to the message broker’s
loan-eligibility-assessment-service topic:
{
"procedure": "assessLoanEligibility",
"parameters": {
"userId": 123456789012,
"loanApplicationId": 5888482223,
// Other parameters...
}
}
The loan-eligibility-assessment-service responds to the above request by sending the following JSON-
format response to the message broker’s loan-application-service topic:
{
"procedure": "assessLoanEligibility",
"response": {
"loanApplicationId": 5888482223,
"isEligible": true,
"amountInDollars": 10000,
"interestRate": 9.75,
"termInMonths": 120
}
}
If the applicant is not eligible for a loan, the response could be the following:
{
"procedure": "assessLoanEligibility",
"response": {
"loanApplicationId": 5888482223,
"isEligible": false
}
}
Alternatively, request and response messages can be treated as events with some data. When we send
events between microservices, we have an event-driven architecture. With an event-driven architecture,
we must decide whether we have a single topic or multiple topics for the software system in the message
broker. If we have a single topic shared by all the microservices in the software system, then each
microservice will consume every message from the message broker and decide whether it should act on it.
This approach is suitable except when large events are produced to the message broker. When large
events are produced, each microservice must consume those large events even if it does not need
to react to them. This consumes an unnecessarily large amount of network bandwidth when the number
of microservices is high. The other extreme is to create a topic for each microservice in the
message broker. This approach causes extra network bandwidth consumption if a large event must be
produced to multiple topics in order to be handled by multiple microservices. You can also create a
hybrid model where you have a broadcast topic and also individual topics for specific microservices.
Below are the earlier request and response messages written as events:
{
"event": "assessLoanEligibility",
"data": {
"userId": 123456789012,
"loanApplicationId": 5888482223,
// ...
}
}
{
"event": "LoanApproved",
"data": {
"loanApplicationId": 5888482223,
"isEligible": true,
"amountInDollars": 10000,
"interestRate": 9.75,
"termInMonths": 120
}
}
{
"procedure": "LoanRejected",
"response": {
"loanApplicationId": 5888482223,
"isEligible": false
}
}
9: Databases And Database Principles
This chapter presents principles for selecting and using databases. Principles are presented for the
following database types:
• Relational databases
• Document databases
• Key-value databases
• Wide column databases
• Search engines
Relational databases are also called SQL databases because a relational database is accessed by
issuing SQL statements. Databases of the other database types are called NoSQL databases because
they either don’t support SQL at all or support only a subset of SQL, possibly with some additions
and modifications.
For example, if you don’t know what kind of database queries you need now or will need in the future,
you should consider using a relational database that is well-suited for different kinds of queries.
• Logical databases/schemas
– Tables
* Columns
A table consists of columns and rows. Data in a database is stored as rows in the tables. Each row
has a value for each column in the table. If a row does not have a value for a particular column, then
a special NULL value is used. You can specify if null values are allowed for a column or not.
A microservice should have a single logical database (or schema). Some relational databases have one
logical database (or schema) available by default, and in other databases, you must create a logical
database (or schema) by yourself.
In this section, examples are presented using the SQLAlchemy library’s ORM functionality. An ORM
uses entities as the building blocks of the database schema. Each entity class in a microservice is reflected
as a table in the database. Use the same name for an entity and its database table, except that the table
name should be in the plural. Below is an example of a SalesItem entity class. Before defining the actual
entity class(es), we need to declare a Base entity class:
Figure 9.1. Base.py
from sqlalchemy.orm import DeclarativeBase

class Base(DeclarativeBase):
    pass
class SalesItem(Base):
__tablename__ = 'salesitems'
Name the related table in plural, e.g. SalesItem entities are stored in a table named salesitems. In
this book, I use case-insensitive database identifiers and write all identifiers in lowercase. The case
sensitivity of a database depends on the database and the operating system it is running on. For
example, MySQL is case-sensitive only on Linux systems.
The properties of an entity map to columns of the entity table, meaning that the salesitems table has
the following columns:
• id
• name
• price
Each entity table should have a primary key defined. The primary key must be unique for each row
in the table. In the above example, we give the primary_key=True argument to the mapped_column
function to define that this column is the primary key and contains a unique value for each row.
We also define that the database should automatically generate an automatically incremented value
for the id column (the default value for the autoincrement parameter is True, so it is not specified
anymore in further examples).
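The column definitions themselves are not reproduced above; a minimal sketch of what they could look
like (the exact column types are assumptions) is:

from sqlalchemy import BigInteger, Float, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class SalesItem(Base):
    __tablename__ = 'salesitems'

    # Primary key generated by the database (autoincrement=True is the default)
    id: Mapped[int] = mapped_column(
        BigInteger(), primary_key=True, autoincrement=True
    )
    name: Mapped[str] = mapped_column(String(256))
    price: Mapped[float] = mapped_column(Float())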
ORM can create database tables according to entity specifications in code. Below is an example SQL
statement that an ORM generates to create a table for storing SalesItem entities:
Columns of a table can be specified as unique and nullable. Below is an example where we define that
the values of the name column in the salesitems table must be unique. We don’t want to store sales items
with null names, and we want to store sales items having unique names. We also add a description
column that is nullable.
class SalesItem(Base):
    __tablename__ = 'salesitems'
    __table_args__ = (UniqueConstraint('name'),)
ORM generates the following SQL for creating the above defined salesitems table:
Let’s try to create an entity and store it in the database. First, we have to create a database engine:

import os

from sqlalchemy import create_engine

engine = create_engine(os.environ.get('DATABASE_URL'), echo=True)
The echo=True parameter defines that SQL statements generated and used by the ORM will be logged
to standard out. This is handy for debugging purposes. After we have created the database engine,
we must create the database tables in the database. That can be done using the following command:
Base.metadata.create_all(engine)
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.orm import Session

sales_item = SalesItem(name='Sample sales item 1', price=10)

try:
    with Session(engine) as session:
        session.add(sales_item)
        session.commit()
except SQLAlchemyError:
    # Handle error
ORM will generate the needed SQL statement on your behalf and execute it. Below is an example SQL
statement generated by the ORM to persist a sales item (Remember that the database autogenerates
the id column).
You can search for the created sales item in the database:
try:
with Session(engine) as session:
sales_item = session.scalars(statement).one()
except SQLAlchemyError:
# Handle error
For the above operation, the ORM will generate the following SQL query:
Then you can modify the entity and use commit to update the database:
try:
with Session(engine) as session:
sales_item = session.get(SalesItem, 1)
sales_item.price = 20
session.commit()
except SQLAlchemyError:
# Handle error
For the above operation, the ORM will generate the following SQL statement:
Finally, you can delete the entity and commit the deletion:
try:
with Session(engine) as session:
sales_item = session.get(SalesItem, 1)
session.delete(sales_item)
session.commit()
except SQLAlchemyError:
# Handle error
Suppose your microservice executes SQL queries that do not include the primary key column in the
query’s WHERE clause. In that case, the database engine must perform a full table scan to find the
wanted rows. Let’s say you want to query sales items whose price is less than 10. This can be
achieved with the below query:
# price = ...
statement = select(SalesItem).where(SalesItem.price < price)
try:
with Session(engine) as session:
sales_items = session.scalars(statement).all()
except SQLAlchemyError:
# Handle error
The database engine must perform a full table scan to find all the sales items where the price column
has a value below the price variable’s value. If the database is large, this can be slow. If you perform
the above query often, you should optimize those queries by creating an index. For the above query
to be fast, we must create an index for the price column:
class SalesItem(Base):
__tablename__ = 'salesitems'
__table_args__ = (UniqueConstraint('name'),)
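The index definition itself is not shown above; a minimal sketch using SQLAlchemy’s Index construct
(the index name and column types are assumptions) could be:

from sqlalchemy import BigInteger, Float, Index, String, UniqueConstraint
from sqlalchemy.orm import Mapped, mapped_column

class SalesItem(Base):
    __tablename__ = 'salesitems'
    __table_args__ = (
        UniqueConstraint('name'),
        # Secondary index so that queries filtering on price avoid a full table scan
        Index('salesitems_price_idx', 'price'),
    )

    id: Mapped[int] = mapped_column(BigInteger(), primary_key=True)
    name: Mapped[str] = mapped_column(String(256))
    price: Mapped[float] = mapped_column(Float())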
• One-to-one
• One-to-many
• Many-to-many
The below example shows a one-to-many relationship between orders and order items and a one-to-one
relationship from an order item to a sales item:
class Order(Base):
__tablename__ = 'orders'
class OrderItem(Base):
__tablename__ = 'orderitems'
__table_args__ = (
PrimaryKeyConstraint('orderid', 'id', name='orderitems_pk'),
)
id: Mapped[int]
salesitemid: Mapped[int] = mapped_column(BigInteger())
orderid: Mapped[int] = mapped_column(ForeignKey('orders.id'))
Orders are stored in the orders table, and order items are stored in the orderitems table, which
contains a join column named orderid. Using this join column, we can map a particular order item
to a specific order. Each order item maps to exactly one sales item. For this reason, the orderitems
table also contains a column named salesitemid. Sales items are stored in a different database in a
separate microservice.
Below is the SQL statement generated by the ORM for creating the orderitems table. The one-to-one
and one-to-many relationships are reflected in the foreign key constraint:
The following SQL query is executed by the ORM to fetch the order with id 123 and its order items:
In a many-to-many relationship, one entity has a relationship with many entities of another type, and
those entities have a relationship with many entities of the first entity type. For example, a student
can attend many courses, and a course can have numerous students attending it.
Suppose we have a service that stores student and course entities in a database. Each student entity
contains the courses the student has attended. Similarly, each course entity contains a list of students
that have attended the course. We have a many-to-many relationship where one student can attend
multiple courses, and multiple students can attend one course. This means an additional association
table, studentcourse, must be created. This new table maps a particular student to a particular course.
class Base(DeclarativeBase):
pass
student_course_assoc_table = Table(
'studentcourse',
Base.metadata,
Column('studentid', ForeignKey('students.id'), primary_key=True),
Column('courseid', ForeignKey('courses.id'), primary_key=True),
)
class Student(Base):
__tablename__ = 'students'
# Other fields...
class Course(Base):
__tablename__ = 'courses'
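The relationship attributes that make the association table usable from both entities are not shown
above; a sketch of what they could look like (attribute names are assumptions) is:

from sqlalchemy.orm import Mapped, mapped_column, relationship

class Student(Base):
    __tablename__ = 'students'

    id: Mapped[int] = mapped_column(primary_key=True)
    # Other fields...
    courses: Mapped[list['Course']] = relationship(
        secondary=student_course_assoc_table, back_populates='students'
    )

class Course(Base):
    __tablename__ = 'courses'

    id: Mapped[int] = mapped_column(primary_key=True)
    students: Mapped[list['Student']] = relationship(
        secondary=student_course_assoc_table, back_populates='courses'
    )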
The ORM creates the students and courses tables in addition to the studentcourse mapping table:
Below is an example SQL query that the ORM executes to fetch attended courses for the user identified
with id 123:
Below is an example SQL query that the ORM executes to fetch students for the course identified with
id 123:
Let’s define a SalesItemRepository implementation using SQLAlchemy’s ORM capabilities for the
sales-item-service API defined in the previous chapter. Let’s start by defining the Base, SalesItem and
SalesItemImage entities:
class Base(DeclarativeBase):
pass
class SalesItem(Base):
__tablename__ = 'salesitems'
class SalesItemImage(Base):
__tablename__ = 'salesitemimages'
__table_args__ = (
PrimaryKeyConstraint(
'salesItemId', 'id', name='salesitemimages_pk'
),
)
id: Mapped[int]
rank: Mapped[int]
url: Mapped[str] = mapped_column(String(2084))
salesItemId: Mapped[int] = mapped_column(ForeignKey('salesitems.id'))
class OrmSalesItemRepository(SalesItemRepository):
def __init__(self):
try:
engine = create_engine(os.environ.get('DATABASE_URL'))
self.__SessionLocal = sessionmaker(
autocommit=False, autoflush=False, bind=engine
)
Base.metadata.create_all(bind=engine)
except SQLAlchemyError as error:
# Log error
raise error
if sales_item is None:
raise EntityNotFoundError('Sales item', id_)
new_sales_item = SalesItem(
**to_entity_dict(sales_item_update)
)
sales_item.name = new_sales_item.name
sales_item.priceInCents = new_sales_item.priceInCents
sales_item.images = new_sales_item.images
db_session.commit()
except SQLAlchemyError as error:
raise DatabaseError(error)
Let’s use the Python MySQL connector library mysql-connector-python. First, let’s insert data to the
salesitems table:
connection = None
try:
connection = connect(
host='...',
database='...',
user='...',
password='...'
)
cursor = connection.cursor(prepared=True)
sql_statement = 'INSERT INTO salesitems (name, price) VALUES (%s, %s)'
cursor.execute(sql_statement , ('Sample sales item 1', 20))
connection.commit()
except Error as error:
# Handle error
finally:
if connection:
connection.close()
The %s in the above SQL statement are placeholders for parameters in a parameterized SQL statement.
The second argument to the execute method contains the parameter values as a tuple. When a
database engine receives a parameterized query, it will replace the placeholders in the SQL statement
with the supplied parameter values.
Next, we can update a row in the salesitems table. The below example changes the price of the sales
item with id 123 to 20:
connection = None
try:
connection = connect(
host='...',
database='...',
user='...',
password='...'
)
cursor = connection.cursor(prepared=True)
sql_statement = 'UPDATE salesitems SET PRICE = %s WHERE id = %s'
cursor.execute(sql_statement , (20, 123))
connection.commit()
except Error as error:
# Handle error
finally:
if connection:
connection.close()
Let’s execute a SELECT statement to get sales items with their price over 20:
connection = None
try:
connection = connect(
host='...',
database='...',
user='...',
password='...'
)
cursor = connection.cursor(prepared=True)
sql_statement = 'SELECT id, name, price FROM salesitems WHERE price >= %s'
cursor.execute(sql_statement , (20,))
result = cursor.fetchall()
except Error as error:
# Handle error
finally:
if connection:
connection.close()
In an SQL SELECT statement, you cannot use parameters everywhere. You can use them as value
placeholders in the WHERE clause. If you want to use user-supplied data in other parts of an SQL
SELECT statement, you need to use string concatenation. You should not concatenate user-supplied
data without sanitizing it first because that would open up the possibility of SQL injection attacks. Let's say
you allow the microservice client to specify a sorting column:
import string

# Characters allowed in a MySQL column name
allowed_chars = string.ascii_letters + string.digits + '_'

class ValidateColNameError(Exception):
    pass

def try_validate_col_name(column_name: str) -> str:
    if all(
        col_name_char in allowed_chars for col_name_char in column_name
    ):
        return column_name
    raise ValidateColNameError()

# sort_column_name = ...
sql_query = (
    'SELECT id, name, price FROM salesitems ORDER BY '
    + try_validate_col_name(sort_column_name)
)
# ...
As shown above, you need to validate the sort_column_name value so that it contains only valid characters for a MySQL column name. If you need to get the sorting direction from the client, you should validate that its value is either ASC or DESC. In the below example, a try_validate_sort_dir function performs that validation:
class ValidateSortDirError(Exception):
    pass

def try_validate_sort_dir(sort_direction: str) -> str:
    if sort_direction.upper() in ('ASC', 'DESC'):
        return sort_direction
    raise ValidateSortDirError()
validated_sort_col_name = try_validate_col_name(sort_column_name)
validated_sort_dir = try_validate_sort_dir(sort_direction)
sql_query = (
    'SELECT id, name, price '
    'FROM salesitems '
    'ORDER BY '
    f'{validated_sort_col_name} '
    f'{validated_sort_dir}'
)
# ...
When you get values for a MySQL query’s LIMIT clause from a client, you must validate that those
values are integers and in a valid range. Don’t allow the client to supply random, very large values.
In the example below, we assume that two validation functions exist: try_validate_row_offset and
try_validate_row_count. They raise an exception if validation fails (a possible implementation is sketched after the example).
validated_row_offset = try_validate_row_offset(row_offset)
validated_row_count = try_validate_row_count(row_count)
sql_query = (
    'SELECT id, name, price '
    'FROM salesitems '
    f'LIMIT {validated_row_offset}, {validated_row_count}'
)
# ...
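Below is one possible sketch of the two assumed validation functions. The upper bound for the row count is an arbitrary example value:
MAX_ROW_COUNT = 100  # Example upper bound

class ValidateRowOffsetError(Exception):
    pass

class ValidateRowCountError(Exception):
    pass

def try_validate_row_offset(row_offset: int) -> int:
    if isinstance(row_offset, int) and row_offset >= 0:
        return row_offset
    raise ValidateRowOffsetError()

def try_validate_row_count(row_count: int) -> int:
    if isinstance(row_count, int) and 1 <= row_count <= MAX_ROW_COUNT:
        return row_count
    raise ValidateRowCountError()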
When you get a list of wanted column names from a client, you must validate that each of them is a
valid column identifier:
# ...
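A minimal sketch of such validation, reusing the try_validate_col_name function defined earlier (the column_names variable is assumed to hold the client-supplied list):
# column_names = [...]  # Column names supplied by the client
validated_col_names = [
    try_validate_col_name(col_name) for col_name in column_names
]

sql_query = (
    'SELECT '
    + ', '.join(validated_col_names)
    + ' FROM salesitems'
)
# ...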
class ParamSqlSalesItemRepository(SalesItemRepository):
def __init__(self):
try:
self.__conn_config = self.__try_create_conn_config()
self.__try_create_db_tables_if_needed()
except Exception as error:
# Log error
raise (error)
connection = None
try:
connection = connect(**self.__conn_config)
cursor = connection.cursor(prepared=True)
sql_statement = (
'INSERT INTO salesitems'
'(createdAtTimestampInMs, name, priceInCents)'
' VALUES (%s, %s, %s)'
)
cursor.execute(
sql_statement,
(
created_at_timestamp_in_ms,
input_sales_item.name,
input_sales_item.priceInCents,
),
)
id_ = cursor.lastrowid
self.__try_insert_sales_item_images(
id_, input_sales_item.images, cursor
)
connection.commit()
return SalesItem(
**to_entity_dict(input_sales_item),
id=id_,
createdAtTimestampInMs=created_at_timestamp_in_ms,
)
except Error as error:
raise DatabaseError(error)
finally:
if connection:
connection.close()
connection = None
try:
connection = connect(**self.__conn_config)
cursor = connection.cursor()
sql_statement = (
'SELECT s.id, s.createdAtTimestampInMs, s.name, s.priceInCents,'
'si.id, si.rank, si.url '
'FROM salesitems s '
'LEFT JOIN salesitemimages si ON si.salesItemId = s.id'
)
cursor.execute(sql_statement)
return self.__get_sales_item_entities(cursor)
except Error as error:
print(error)
raise DatabaseError(error)
finally:
if connection:
connection.close()
connection = None
try:
connection = connect(**self.__conn_config)
cursor = connection.cursor(prepared=True)
sql_statement = (
'SELECT s.id, s.createdAtTimestampInMs, s.name, s.priceInCents,'
'si.id, si.rank, si.url '
'FROM salesitems s '
'LEFT JOIN salesitemimages si ON si.salesItemId = s.id '
'WHERE s.id = %s'
)
cursor.execute(sql_statement, (id_,))
sales_item_entities = self.__get_sales_item_entities(cursor)
return sales_item_entities[0] if sales_item_entities else None
except Error as error:
raise DatabaseError(error)
finally:
if connection:
connection.close()
connection = None
try:
connection = connect(**self.__conn_config)
cursor = connection.cursor(prepared=True)
sql_statement = (
'UPDATE salesitems SET name = %s, priceInCents = %s '
'WHERE id = %s'
)
cursor.execute(
sql_statement,
(
sales_item_update.name,
sales_item_update.priceInCents,
id_,
),
)
sql_statement = (
'DELETE FROM salesitemimages WHERE salesItemId = %s'
)
cursor.execute(sql_statement, (id_,))
self.__try_insert_sales_item_images(
id_, sales_item_update.images, cursor
)
connection.commit()
except Error as error:
raise DatabaseError(error)
finally:
if connection:
connection.close()
connection = None
try:
connection = connect(**self.__conn_config)
cursor = connection.cursor()
sql_statement = (
'DELETE FROM salesitemimages WHERE salesItemId = %s'
)
cursor.execute(sql_statement, (id_,))
sql_statement = 'DELETE FROM salesitems WHERE id = %s'
cursor.execute(sql_statement, (id_,))
connection.commit()
except Error as error:
raise DatabaseError(error)
finally:
if connection:
connection.close()
@staticmethod
def __try_create_conn_config() -> dict[str, Any]:
database_url = os.environ.get('DATABASE_URL')
user_and_password = (
database_url.split('@')[0].split('//')[1].split(':')
)
host_and_port = database_url.split('@')[1].split('/')[0].split(':')
database = database_url.split('/')[3]
return {
'user': user_and_password[0],
'password': user_and_password[1],
'host': host_and_port[0],
'port': host_and_port[1],
'database': database,
'pool_name': 'salesitems',
'pool_size': 25,
}
sql_statement = (
'CREATE TABLE IF NOT EXISTS salesitems ('
'id BIGINT NOT NULL AUTO_INCREMENT,'
'createdAtTimestampInMs BIGINT NOT NULL,'
'name VARCHAR(256) NOT NULL,'
'priceInCents INTEGER NOT NULL,'
'PRIMARY KEY (id)'
')'
)
cursor.execute(sql_statement)
sql_statement = (
'CREATE TABLE IF NOT EXISTS salesitemimages ('
'id BIGINT NOT NULL,'
'`rank` INTEGER NOT NULL,'
'url VARCHAR(2084) NOT NULL,'
'salesItemId BIGINT NOT NULL,'
'PRIMARY KEY (salesItemId, id),'
'FOREIGN KEY (salesItemId) REFERENCES salesitems(id)'
')'
)
cursor.execute(sql_statement)
connection.commit()
connection.close()
def __try_insert_sales_item_images(
self, sales_item_id: str | int, images, cursor
):
for image in images:
sql_statement = (
'INSERT INTO salesitemimages'
'(id, `rank`, url, salesItemId)'
'VALUES (%s, %s, %s, %s)'
)
cursor.execute(
sql_statement,
(image.id, image.rank, image.url, sales_item_id),
)
    @staticmethod
    def __get_sales_item_entities(cursor) -> list[SalesItem]:
        id_to_sales_items_dict = {}

        for (
            id_,
            created_at_timestamp_in_ms,
            name,
            price_in_cents,
            image_id,
            image_rank,
            image_url,
        ) in cursor:
            if id_to_sales_items_dict.get(id_) is None:
                id_to_sales_items_dict[id_] = {
                    'id': id_,
                    'createdAtTimestampInMs': created_at_timestamp_in_ms,
                    'name': name,
                    'priceInCents': price_in_cents,
                    'images': [],
                }

            # The LEFT JOIN returns NULL image columns for sales items
            # that do not have any images
            if image_id is not None:
                id_to_sales_items_dict[id_]['images'].append({
                    'id': image_id,
                    'rank': image_rank,
                    'url': image_url,
                })

        return [
            SalesItem(**sales_item_dict)
            for sales_item_dict in id_to_sales_items_dict.values()
        ]
A database relation is often described as “normalized” if it meets the first, second, and third normal
forms.
The first normal form requires that at every intersection of a row and column, a single value exists
and never a list of values. When considering a sales item, the first normal form states that there
cannot be two different price values in the price column or more than one name for the sales item
in the name column. If you need multiple names for a sales item, you must establish a one-to-many
relationship between a SalesItem entity and SalesItemName entities. What this means in practice is
that you remove the name property from the SalesItem entity class and create a new SalesItemName
entity class used to store sales items’ names. Then you create a one-to-many mapping between a
SalesItem entity and SalesItemName entities.
The second normal form requires that each non-key column entirely depends on the primary key.
Let’s assume that we have the following columns in an orderitems table:
The orderstate column only depends on the orderid column, not the entire primary key. The
orderstate column is in the wrong table. It should, of course, be in the orders table.
The third normal form requires that non-key columns are independent of each other.
Let’s assume that we have the following columns in a salesitems table:
• id (primary key)
• name
• price
• category
• discount
Let’s assume that the discount depends on the category. This table violates the third normal form
because a non-key column, discount, depends on another non-key column, category. Column
independence means that you can change any non-key column value without affecting any other
column. If you changed the category, the discount would need to be changed accordingly, thus
violating the third normal form rule.
The discount column should be moved to a new categories table with the following columns:
• id (primary key)
• name
• discount
Then we should update the salesitems table to contain the following columns:
• id (primary key)
• name
• price
• categoryid (a foreign key that references the id column in the categories table)
Document databases, like MongoDB, are useful for storing complete documents. A document is
usually a JSON object containing information in arrays and nested objects. Documents are stored
as such, and a whole document will be fetched when queried.
Let’s consider a microservice for sales items. Each sales item contains an id, name, price, image URLs,
and user reviews.
Below is an example sales item as a JSON object:
{
"id": "507f191e810c19729de860ea",
"category": "Power tools",
"name": "Sample sales item",
"price": 10,
"imageUrls": ["https://url-to-image-1...",
"https://url-to-image-2..."],
"averageRatingInStars": 5,
"reviews": [
{
"reviewerName": "John Doe",
"date": "2022-09-01",
"ratingInStars": 5,
"text": "Such a great product!"
}
]
}
A document database usually has a size limit for a single document. Therefore, the above example
does not store sales item images directly inside the document but only URLs to the images. Actual
images are stored in another data store more suitable for storing images, like Amazon S3.
When creating a microservice for sales items, we can choose a document database because we usually
store and access whole documents. When sales items are created, they are created as JSON objects
of the above shape with the reviews array being empty. When a sales item is fetched, the whole
document is retrieved from the database. When a client adds a review for a sales item, the sales item
is fetched from the database. The new review is appended to the reviews array, a new average rating
is calculated, and finally, the document is persisted with the modifications.
Below is an example of inserting one sales item to a MongoDB collection named salesItems.
MongoDB uses the term collection instead of table. A MongoDB collection can store multiple
documents.
from pymongo import MongoClient

URL = "mongodb://localhost:27017"
client = MongoClient(URL)
# 'store' is an example database name
sales_items_coll = client['store']['salesItems']
sales_items_coll.insert_one({
'category': 'Power tools',
'name': 'Sample sales item 1',
'price': 10,
'images': ['https://url-to-image-1...',
'https://url-to-image-2...'],
'averageRatingInStars': None,
'reviews': []
})
client.close()
You can find sales items for the Power tools category with the following query:
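A minimal pymongo sketch, assuming the sales_items_coll collection object from the insert example above:
sales_items = sales_items_coll.find({'category': 'Power tools'})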
If clients are usually querying sales items by category, it is wise to create an index for that field:
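With pymongo, the index can be created as follows:
sales_items_coll.create_index('category')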
When a client wants to add a new review for a sales item, you first fetch the document for the sales
item:
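For example, with pymongo (the ObjectId below is the same example document id that is used in the update command further below):
from bson import ObjectId

sales_item = sales_items_coll.find_one(
    {'_id': ObjectId('6527a461bd3c27d2d1822508')}
)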
Then you calculate a new value for the averageRatingInStars field using the existing ratings and the
new rating and add the new review to the reviews array and then update the document with the
following command:
sales_items_coll.update_one(
{'_id': ObjectId('6527a461bd3c27d2d1822508')},
{
'$set': {'averageRatingInStars': 5},
'$push': {
'reviews': {
'reviewerName': 'John Doe',
'date': '2022-09-01',
'ratingInStars': 5,
'text': 'Such a great product!',
}
},
},
)
Clients may want to retrieve sales items sorted descending by the average rating. For this reason, you
might want to change the indexing to be the following:
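A sketch with pymongo: a compound index on the category and averageRatingInStars (descending) fields supports the query described below:
from pymongo import ASCENDING, DESCENDING

sales_items_coll.create_index(
    [('category', ASCENDING), ('averageRatingInStars', DESCENDING)]
)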
A client can issue, for example, a request to get the best-rated sales items in the power tools category.
This request can be fulfilled with the following query that utilizes the above-created index:
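A pymongo sketch of such a query (the result limit of 10 is an example value; DESCENDING is imported from pymongo as above):
best_rated_sales_items = (
    sales_items_coll.find({'category': 'Power tools'})
    .sort('averageRatingInStars', DESCENDING)
    .limit(10)
)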
class MongoDbSalesItemRepository(SalesItemRepository):
def __init__(self):
try:
database_url = os.environ.get('DATABASE_URL')
self.__client = MongoClient(database_url)
database_name = database_url.split('/')[3]
database = self.__client[database_name]
self.__sales_items = database['salesitems']
except Exception as error:
# Log error
raise (error)
self.__sales_items.insert_one(sales_item)
return self.__create_sales_item_entity(sales_item)
return (
None
if sales_item is None
else self.__create_sales_item_entity(sales_item)
)
except InvalidId:
raise EntityNotFoundError('Sales item', id_)
except (BSONError, PyMongoError) as error:
raise DatabaseError(error)
@staticmethod
def __create_sales_item_entity(sales_item: dict[str, Any]):
id_ = sales_item['_id']
del sales_item['_id']
images = [
SalesItemImage(**image) for image in sales_item['images']
]
return SalesItem(
**(sales_item | {'id': str(id_)} | {'images': images})
)
A simple use case for a key-value database is to use it as a cache for a relational database. For example,
a microservice can store SQL query results from a relational database in the cache. Redis is a popular
open-source key-value store. Let’s have an example with Redis to cache an SQL query result. In the
below example, we assume that the SQL query result is available as a dict:
import json
from redis import Redis

redis_client = Redis(host='localhost', port=6379)
redis_client.set(sql_query_statement, json.dumps(sql_query_result))
sql_query_result_json = redis_client.get(sql_query_statement)
With Redis, you can create key-value pairs that expire automatically after a specific time. This is a
useful feature if you are using the key-value database as a cache. You may want the cached items to
expire after a while.
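For example, a cached query result can be made to expire after 60 seconds (an example duration) by passing the ex argument to the set method:
redis_client.set(sql_query_statement, json.dumps(sql_query_result), ex=60)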
In addition to plain strings, Redis also supports other data structures. For example, you can store a list,
queue, or hash map for a key. If you store a queue in Redis, you can use it as a simple single-consumer
message broker. Below is an example of producing a message to a topic in the message broker:
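A minimal sketch using a Redis list as the queue (the 'notifications' topic name and the message content are examples):
import json

# Producer: push a message to the end of the 'notifications' queue
message = {'type': 'order-created', 'orderId': 123}
redis_client.rpush('notifications', json.dumps(message))

# Consumer: block until a message is available, then pop it from the head
_, message_json = redis_client.blpop('notifications')
message = json.loads(message_json)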
Table structures of a wide-column database are optimized for specific queries. With a wide-column
database, storing duplicate data is okay to make the queries faster. Wide-column databases also scale
horizontally well.
In this section, we use Apache Cassandra as an example wide-column database. Cassandra is a
scalable multi-node database engine. In Cassandra, the data of a table is divided into partitions
according to the table’s partition key. A partition key is composed of one or more columns of the
table. Each partition is stored on a single Cassandra node. You can think that Cassandra is a key-
value store where the key is the partition key, and the value is another “nested” table. The rows in
the “nested” table are uniquely identified by clustering columns sorted by default in ascending order.
The sort order can be changed to descending if wanted.
The partition key and the clustering columns form the table’s primary key. The primary key uniquely
identifies a row. Let’s have an example table that is used to store hotels near a particular point of
interest (POI):
CREATE TABLE hotels_by_poi (
poi_name text,
hotel_distance_in_meters_from_poi int,
hotel_id uuid,
hotel_name text,
hotel_address text,
PRIMARY KEY (poi_name, hotel_distance_in_meters_from_poi, hotel_id)
);
In the above example, the primary key consists of three columns. The first column (poi_name) is
always the partition key. The partition key must be given in a query. Otherwise, the query will be
slow because Cassandra must perform a full table scan because it does not know which node data is
located. When the partition key is given in a SELECT statement’s WHERE clause, Cassandra can find
the appropriate node where the data for that particular partition resides. The two other primary key
columns, hotel_distance_in_meters_from_poi and hotel_id are the clustering columns. They define
the order and uniqueness of the rows in the “nested” table.
The above figure shows that when you give a partition key value (poi_name) you have access to the
respective “nested” table where rows are ordered first by the hotel_distance_in_meters_from_poi
(ascending) and second by the hotel_id (ascending).
Now it is easy for a hotel room booking client to ask the server to execute a query to find hotels near
a POI given by a user. The following query will return the first 15 hotels nearest to Piccadilly Circus
POI:
SELECT
hotel_distance_in_meters_from_poi,
hotel_id,
hotel_name,
hotel_address
FROM hotels_by_poi
WHERE poi_name = 'Piccadilly Circus'
LIMIT 15
When a user selects a particular hotel from the result of the above query, the client can request the
execution of another query to fetch information about the selected hotel. The user wants to see other
POIs near the selected hotel. For that query, we should create another table:
Now a client can request the server to execute a query to fetch the nearest 20 POIs for a selected hotel.
(hotel with id c5a49cb0-8d98-47e3-8767-c30bc075e529):
SELECT
poi_distance_in_meters_from_hotel,
poi_id,
poi_name,
poi_address
FROM pois_by_hotel_id
WHERE hotel_id = c5a49cb0-8d98-47e3-8767-c30bc075e529
LIMIT 20
In a real-life scenario, a user wants to search for hotels near a particular POI for a selected period of
time. The server should respond with the nearest hotels having free rooms for the selected period.
For that kind of query, we create an additional table for storing hotel room availability:
The above table is updated whenever a room for a specific day is booked or a booking for a room is
canceled. The available_room_count column value is either decremented or incremented by one in
the update procedure.
Let’s say that the following query has been executed:
SELECT
hotel_distance_in_meters_from_poi,
hotel_id,
hotel_name,
hotel_address
FROM hotels_by_poi
WHERE poi_name = 'Piccadilly Circus'
LIMIT 30
Next, we should find hotels from the result of 30 hotels that have available rooms between the 1st of
September 2022 and 3rd of September 2022. We cannot use joins in Cassandra, but we can execute
the following query where we specifically list the hotel ids returned by the above query:
As a result of the above query, we have a list of a maximum of 15 hotels for which the minimum
available room count is listed. We can return a list of those max 15 hotels where the minimum
available room count is one or more to the user.
If Cassandra’s query language supported the HAVING clause, which it does not currently support, we
could have issued the following query to get what we wanted:
A wide-column database is also useful in storing time-series data from IoT devices and sensors. Below
is a table definition for storing measurement data in a telecom network analytics system:
In the above table, we have defined a compound partition key containing three columns: measure_name, dimension_name, and aggregation_period. Columns for a compound partition key are given in
parentheses.
Suppose we have implemented a client that visualizes measurements. In the client, a user can first
choose what counter/KPI (= measure name) to visualize, then select a dimension and aggregation
period. Let’s say that the user wants to see _dropped_callpercentage for cells calculated for a one-
minute period at 2022-02-03 16:00. The following kind of query can be executed:
The above query returns the top 50 cells where the dropped call percentage is highest for the given
minute.
We can create another table to hold measurements for a selected dimension value, e.g., for a particular
cell id. This table can be used to drill down to a particular dimension and see measure values in the
history.
The below query will return dropped call percentage values for the last 30 minutes for the cell
identified by cell id 3000:
A search engine (like Elasticsearch, for example) is useful for storing information like log entries
collected from microservices. You typically want to search the collected log data by the text in the
log messages.
It is not necessary to use a search engine when you need to search for text data. Other databases, both
document and relational, have a special index type that can index free-form text data in a column.
Considering the earlier example with MongoDB, we might want a client to be able to search sales
items by the text in the sales item’s name. We don’t need to store sales items in a search engine
database. We can continue storing them in a document database (MongoDB) and introduce a text
type index for the name field. That index can be created with the following MongoDB command:
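With pymongo, the text index can be created as follows (assuming the sales_items_coll collection object from the earlier example):
from pymongo import TEXT

sales_items_coll.create_index([('name', TEXT)])
Clients can then search by name with a $text query, for example sales_items_coll.find({'$text': {'$search': 'hammer'}}).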
• Threading principle
• Thread safety principle
• Publish/subscribe shared state change principle
When developing modern cloud-native software, microservices should be stateless and automatically
scale horizontally (scaling out and in via adding and removing processes). The role of threading in
modern cloud-native microservices is not as prominent as it was earlier when software consisted of
monoliths running on bare metal servers, mainly capable of scaling up or down. Nowadays, you
should use threading if it is a good optimization or otherwise needed. Apart from microservices, if
you have a library, standalone application or a client software component, the situation is different,
and you can use threading, of course.
Suppose we have a software system with an event-driven architecture. Multiple microservices
communicate with each other using asynchronous messaging. Each microservice instance has only a
single thread that consumes messages from a message broker and then processes them. If the message
broker’s message queue for a microservice starts growing too long, the microservice should scale out
by adding a new instance. When the load for the microservice diminishes, it can scale in by removing
an instance. There is no need to use threading at all.
We could use threading in the data exporter microservice if the input consumer and the output
producer were synchronous. The reason for threading is optimization. If we had everything in a
single thread and the microservice was performing network I/O (either input or output-related), the
microservice would have nothing to execute because it is waiting for some network I/O to complete.
Using threads, we can optimize the execution of the microservice so that it potentially has something
to do when waiting for an I/O operation to complete.
Many modern input consumers and output producers are available as asynchronous implementations.
If we use an asynchronous consumer and producer in the data exporter microservice, we can eliminate
threading because network I/O will not block the execution of the main thread anymore. As a rule of
thumb, consider using asynchronous code first, and if it is not possible or feasible, only then consider
threading.
You might need a microservice to execute housekeeping tasks on a specific schedule in the background.
Instead of using threading and implementing the housekeeping functionality in the microservice,
consider implementing it in a separate microservice to ensure that the single responsibility principle
is followed. You can configure the housekeeping microservice to be run at regular intervals using a
Kubernetes CronJob, for example.
Threading also brings complexity to a microservice because the microservice must ensure thread
safety. You will be in big trouble if you forget to implement thread safety. Threading and
synchronization-related bugs are hard to find. Thread safety is a topic that is discussed later in this
chapter. Threading also brings complexity to deploying a microservice because the number of vCPUs
requested by the microservice depends on the number of threads used.
The multiprocessing module can be used to run tasks in parallel in separate processes. The below example maps a print function over a list of numbers using a pool of four worker processes; each output line shows the handled number and the id of the process that handled it:
import multiprocessing
import os

def print_stdout(number: int) -> None:
    print(number, os.getpid())

if __name__ == '__main__':
    numbers = [1, 2, 3, 4]
    pool = multiprocessing.Pool(4)
    pool.map(print_stdout, numbers)
1 97672
2 97671
4 97672
3 97670
Do not assume thread safety if you use a data structure or library. You must consult the documentation
to see whether thread safety is guaranteed. If thread safety is not mentioned in the documentation,
it can’t be assumed. The best way to communicate thread safety to developers is to name things so
that thread safety is explicit. For example, you could create a thread-safe collection library and have
a class named ThreadSafeList to indicate the class is thread-safe.
The main way to ensure thread safety in Python is to use a lock. Python does not have atomic variables.
from threading import Lock

class ThreadSafeCounter:
    def __init__(self):
        self.__lock = Lock()
        self.__counter = 0

    def increment(self) -> None:
        with self.__lock:
            self.__counter += 1

    @property
    def value(self) -> int:
        with self.__lock:
            return self.__counter
Python also contains a Lock class in the multiprocessing module, and it can be used to synchronize multiple processes in a similar fashion.
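Below is a minimal sketch (an assumed example, not one of the earlier listings) of synchronizing processes with multiprocessing.Lock; a shared multiprocessing.Value holds the counter:
import multiprocessing

def increment(lock, counter):
    # The lock makes the read-modify-write of the shared value atomic
    with lock:
        counter.value += 1

if __name__ == '__main__':
    lock = multiprocessing.Lock()
    counter = multiprocessing.Value('i', 0)  # Shared integer
    processes = [
        multiprocessing.Process(target=increment, args=(lock, counter))
        for _ in range(4)
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
    print(counter.value)  # Prints 4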
class AtomicInt:
    def __init__(self, value: int):
        self.__value = value
        self.__lock = Lock()

    def increment(self, amount: int) -> None:
        with self.__lock:
            self.__value += amount

    def decrement(self, amount: int) -> None:
        with self.__lock:
            self.__value -= amount

    @property
    def value(self) -> int:
        with self.__lock:
            return self.__value

    @value.setter
    def value(self, new_value: int):
        with self.__lock:
            self.__value = new_value
Now we can use it. All of the last three operations below can be performed safely from different threads:
my_int = AtomicInt(0)
my_int.increment(1)
my_int.decrement(2)
print(my_int.value) # Prints -1
T = TypeVar('T')

class ThreadSafeList(Generic[T]):
    def __init__(self):
        self.__list: list[T] = []
        self.__lock = Lock()

    def append(self, value: T) -> None:
        with self.__lock:
            self.__list.append(value)
Condition objects are useful when you have a queue and there is a producer and consumer for the
queue in different threads. The Producer thread can inform the consumer thread when there is a new
item in the queue and the consumer thread waits for an item to be available in the queue. If you did
not have a condition object, you would have to implement this by sleeping in the consumer. This is not ideal, because you don't necessarily know what sleep duration would be optimal.
T = TypeVar('T')
class ThreadSafeQueue(Generic[T]):
def __init__(self):
self.__items: Final[list[T]] = []
self.__item_waiter: Final = Condition()
self.__lock: Final = Lock()
@property
def item_waiter(self):
return self.__item_waiter
class MsgQueueProducer(Generic[T]):
def __init__(self, queue: ThreadSafeQueue[T]):
self.__queue = queue
class MsgQueueConsumer(Generic[T]):
def __init__(self, queue: ThreadSafeQueue[T]):
self.__queue = queue
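Below is a minimal, self-contained sketch (not the full implementation of the above classes) of how a condition object lets a consumer wait for an item without polling:
from threading import Condition

class SimpleMsgQueue:
    def __init__(self):
        self.__items: list = []
        self.__item_waiter = Condition()

    def put(self, item) -> None:
        with self.__item_waiter:
            self.__items.append(item)
            # Wake up one consumer waiting for an item
            self.__item_waiter.notify()

    def get(self):
        with self.__item_waiter:
            # Releases the condition's lock while waiting; re-acquires it
            # when notified and the predicate becomes true
            self.__item_waiter.wait_for(lambda: len(self.__items) > 0)
            return self.__items.pop(0)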
The above statements come from customer stories1 of companies that have adopted the Scaled Agile Framework (SAFe)2.
An agile framework describes a standardized way of developing software, which is essential,
especially in large organizations. In today’s work environments, people change jobs frequently, and
teams tend to change often, which can lead to a situation where there is no common understanding
of the way of working unless a particular agile framework is used. An agile framework establishes a
clear division of responsibilities, and everyone can focus on what they do best.
In the SAFe, for example, during a program increment (PI) planning, development teams plan features
for the next PI (consisting of 4 iterations, two weeks per iteration, a total of 8 weeks). In the PI
planning, teams split features into user stories and see which features fit in the PI. Planned user
stories will be assigned story points (measured in person days, for example), and stories will be placed
into iterations. This planning phase results in a plan the team should follow in the PI. Junior SAFe
1 https://scaledagile.com/insights-customer-stories/
2 https://www.scaledagileframework.com/
practitioners can make mistakes, like underestimating the work needed to complete a user story. But this is a self-correcting issue: as teams and individuals develop, they become better at estimating the amount of work needed, and plans become more solid. Teams and developers learn that they must make all work visible, for example, by reserving time to learn new things, like a programming language or framework, and time for refactoring. It is very satisfying to keep the planned schedule and sometimes even complete work early. This makes you feel like a true professional and boosts your self-esteem.
My personal experience with SAFe over more than five years has been only positive. I feel I can concentrate more on "the real work", which makes me happier. There are fewer meetings, irrelevant emails, and interruptions. This is mainly because the team has a Product Owner and a Scrum Master whose role is to protect the team members from "waste" and "management stuff" and allow them to concentrate on their work.
In the most optimal situation, development teams have a shared understanding of what is needed to
declare a user story or feature done. When having a common definition of done, each development
team can ensure consistent results and quality.
When considering a user story, at least the following requirements for a done user story can be defined:
The product owner’s (PO) role in a team is to accept a user story as done. Some of the above-mentioned
requirements can be automatically checked. For example, the static code analysis should be part of
every CI/CD pipeline and can also check the unit test coverage automatically. If static code analysis
does not pass or the unit test coverage is not acceptable, the CI/CD pipeline does not pass.
Some additional requirements for done-ness should be defined when considering a feature because
features can be delivered to customers. Below is a list of some requirements for a done feature:
To complete all the needed done-ness requirements, development teams can use tooling that helps
them remember what needs to be done. For example, when creating a new user story in a tool like
Jira, an existing prototype story could be cloned (or a template used). The prototype or template story
should contain tasks that must be completed before a user story can be approved.
Situations where you work alone with a piece of software are relatively rare. You cannot predict what
will happen in the future. There might be someone else responsible for the code you once wrote. And
there are cases when you work with some code for some time and then, maybe after several years,
need to return to that code. For these reasons, writing clean code that is easy to read and understand
by others and yourself in the future is essential. Remember that code is not written for a computer
only but also for people. People should be able to read and comprehend code easily. Remember that
at its best, code reads like beautiful prose!
• The architecture team should design the high-level architecture (Each team should have a
representative in the architecture team. Usually, it is the technical lead of the team)
• Development teams should perform object-oriented design first, and only after that proceed
with implementation
• Conduct object-oriented design within the team with relevant senior and junior developers
involved
• Don’t take the newest 3rd party software immediately into use, instead use mature 3rd party
software that has an established position in the market
• Design for easily replacing a 3rd party software component with another 3rd party component.
• Design for scalability (for future load)
• Design for extension: new functionality is placed in new classes instead of modifying existing
classes
• Utilize a plugin architecture (possibility to create plugins to add new functionality later)
• Reserve time for refactoring
Software component documentation should reside in the same source code repository where the
source code is. The recommended way is to use a README.MD file in the root directory of the
source code repository for documentation in Markdown format. You can split the documentation
into multiple files and store additional files in the docs directory of the source code repository.
Below is an example table of contents that can be used when documenting a software component:
– Environment variables
– Configuration files
– Secrets
Before reviewing code, a static code analysis should be performed to find any issues a machine can
find. The actual code review should focus on issues that static code analyzers cannot find. You should
not need to review code formatting, but everybody in the team should use the same code format and
this should be ensured by an automatic formatting tool. You cannot review your own code. At least
one of the reviewers should be in a senior or lead role. Things to focus on in a code review are
presented in the subsequent sections.
Consistent code formatting is vital because if team members have different source code formatting
rules, one team member’s small change to a file can reformat the whole file using his/hers formatting
rules, which can cause another developer to face a major merge conflict that slows down the
development process. Always agree on common source code formatting rules and preferably use
a tool like Prettier to enforce the formatting rules. If no automatic formatting tool is available, you
can create source code formatting rules for IDEs used by team members and store those rules in the
source code repository.
Concurrent development is enabled when different people modify different source code files. When
several people need to alter the same files, it can cause merge conflicts. These merge conflicts cause
extra work because they often must be resolved manually. This manual work can be slow, and it is
error-prone. The best thing is to avoid merge conflicts as much as possible. This can be achieved in
the ways described in the following sections.
Pair programming is something some developers like, and other developers hate. So it is not a one-fits-
all solution. It is not take it or leave it, either. You can have a team where some developers program in
pairs and others don’t. Also, people’s opinions about pair programming can be prejudiced. Perhaps,
they have never done pair programming, so how do they know if they like it or not? It is also true
that choosing the right partner to pair with can mean a lot. Some pairs have better chemistry than
other pairs.
Does pair programming just increase development costs? What benefits does pair programming bring?
I see pair programming as valuable, especially in situations where a junior developer pairs with a more senior developer; this way, the junior developer is onboarded much faster and can "learn from the best". Pair programming can improve software design because there are always at least two persons' views of the design. Bugs can be found more easily and usually at an earlier phase (four eyes instead of two). So, even if pair programming adds some cost, it usually results in software with better quality: better design, less technical debt, better tests, and fewer bugs.
A software development team does not function optimally if everyone is doing everything or if it is
expected that anyone can do anything. No one is a jack of all trades. A team achieves the best results
when it has specialists targeted for different tasks. Team members need to have focus areas they like
to work with and where they can excel. When you are a specialist in some area, you can complete
tasks belonging to that area faster and with better quality.
Below is a list of needed roles for a development team:
• Backend developers
• Frontend developers
• Full-stack developers
• Mobile developers
• Embedded developers
A backend developer develops microservices, like APIs, running in the backend. A frontend
developer develops web clients. Typically, a frontend developer uses JavaScript or TypeScript,
React/Angular/Vue, HTML and CSS. A full-stack developer is a combination of a backend and
frontend developer capable of developing backend microservices and frontend clients. A mobile
developer develops software for mobile devices, like phones and tablets.
A team should have software developers at various seniority levels. Each team should have a lead
developer with the best experience in the used technologies and the domain. The lead developer
typically belongs to the virtual architectural team led by the system architect. There is no point in
having a team with just junior developers or just senior developers. The idea is to transfer skills and
knowledge from senior developers to junior developers. This also works the other way around. Junior
developers can have knowledge of some of the latest technologies and practices that senior developers
are missing. So overall, the best team is a team consisting of a good mix of both junior and senior
developers.
of some testing tools, like Apache JMeter, is appreciated. Test automation developers can also develop
internal testing tools, like interface simulators and data generators. Test automation developers should
form a virtual team to facilitate the development of E2E and automated non-functional tests.
11.10.6: UI Designer
A UI designer is responsible for designing the final UIs based on higher-level UX/UI designs/wireframes. The UI designer will also conduct usability testing of the software.
• Threat modeling
– To find out what kind of security features and tests are needed
– Implementation of threat countermeasures and mitigation. This aspect was covered in
more detail in the earlier security principles chapter
• Scan
– Static security analysis (also known as SAST = Static Application Security Testing)
– Security testing (also known as DAST = Dynamic Application Security Testing)
– Container vulnerability scanning
• Analyze
– Analyze the results of the scanning phase, detect and remove false positives and prioritize
corrections of vulnerabilities
• Remediate
• Plan
• Code
• Build
• Test
• Release
• Deploy
• Operate
• Monitor
12.2.1: Plan
Plan is the first phase in the DevOps lifecycle. In this phase, software features are planned, and high-
level architecture and UX are designed. This phase involves business (product management) and
software development organizations.
12.2.2: Code
Code is the software implementation phase. It consists of software components’ design and
implementation, writing unit tests, integration tests, E2E tests, and other automated tests. This phase
also includes all other coding needed to make the software deployable. Most of the work is done in
this phase, so it should be streamlined as much as possible.
The key to shortening this phase is to parallelize everything to the maximum possible extent. In the
Plan phase, the software was architecturally split into smaller pieces (microservices) that different
teams could develop in parallel. Regarding developing a single microservice, there should also be as
much parallelization as possible. This means that if a microservice can be split into multiple subdomains, the development of these subdomains can be done very much in parallel. If we think about
the data exporter microservice, we identified several subdomains: input, decoding, transformations,
encoding, and output. If you can parallelize the development of these five subdomains, you can
significantly shorten the time needed to complete the implementation of the microservice.
To shorten this phase even more, a team should have a dedicated test automation developer who can
start developing automated tests in an early phase parallel to the implementation.
Providing high-quality software relies on high-quality design, implementation with little technical
debt, and comprehensive functional and non-functional testing. All of these aspects were already
handled in the earlier chapters.
• Checkout the latest source code from the source code repository
• Build the software
• Perform static code analysis. A tool like SonarQube/SonarCloud can be used
12.2.4: Release
In the Release phase, built and tested software is released automatically. After a software component’s
CI pipeline is successfully executed, the software component can be automatically released. This is
called continuous delivery (CD). Continuous delivery is often combined with the CI pipeline to create a
CI/CD pipeline for a software component. Continuous delivery means that the software component’s
artifacts are delivered to artifact repositories, like Artifactory, Docker Hub, or a Helm chart repository.
A CD pipeline should perform the following tasks:
• Perform static code analysis for the code that builds a container image (e.g., Dockerfile). A tool
like Hadolint can be used for Dockerfiles.
• Build a container image for the software component
• Publish the container image to a container registry (e.g., Docker Hub, Artifactory, or a registry
provided by your cloud provider)
• Perform a container image vulnerability scan
• Perform static code analysis for deployment code. Tools like Helm’s lint command, Kubesec
and Checkov can be used
• Package and publish the deployment code (for example, package a Helm chart and publish it to
a Helm chart repository)
Below is an example Dockerfile for an API microservice written using the FastAPI library. The Dockerfile uses Docker's multi-stage build feature. First (at the install_deps stage), it installs the dependencies and copies the application source code files. The last stage (final) copies files from the install_deps stage to a distroless Python base image. You should use a distroless base image to make the image size and the attack surface smaller. A distroless image does not contain any Linux distribution inside it. Unfortunately, the gcr.io/distroless/python images are currently (at the time of writing this book) considered experimental and are not recommended for production use.
WORKDIR /microservice
COPY ./requirements.txt /microservice/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /microservice/requirements.txt
COPY ./app /microservice/app
Below is an example Helm chart template deployment.yaml for a Kubernetes Deployment. The
template code is given in double braces.
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "microservice.fullname" . }}
labels:
{{- include "microservice.labels" . | nindent 4 }}
spec:
{{- if ne .Values.env "production" }}
replicas: 1
{{- end }}
selector:
matchLabels:
{{- include "microservice.selectorLabels" . | nindent 6 }}
template:
metadata:
{{- with .Values.deployment.pod.annotations }}
annotations:
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
{{- include "microservice.selectorLabels" . | nindent 8 }}
spec:
{{- with .Values.deployment.pod.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "microservice.serviceAccountName" . }}
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.imageRegistry }}/{{ .Values.imageRepository }}:{{ .Values.im\
ageTag }}"
imagePullPolicy: {{ .Values.deployment.pod.container.imagePullPolicy }}
securityContext:
{{- toYaml .Values.deployment.pod.container.securityContext | nindent 12 }}
{{- if .Values.httpServer.port }}
ports:
- name: http
containerPort: {{ .Values.httpServer.port }}
protocol: TCP
{{- end }}
env:
- name: ENV
value: {{ .Values.env }}
- name: ENCRYPTION_KEY
valueFrom:
secretKeyRef:
name: {{ include "microservice.fullname" . }}
key: encryptionKey
- name: MICROSERVICE_NAME
value: {{ include "microservice.fullname" . }}
- name: MICROSERVICE_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: MICROSERVICE_INSTANCE_ID
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: MYSQL_HOST
value: {{ .Values.database.mySql.host }}
- name: MYSQL_PORT
value: "{{ .Values.database.mySql.port }}"
- name: MYSQL_USER
valueFrom:
secretKeyRef:
name: {{ include "microservice.fullname" . }}
key: mySqlUser
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
name: {{ include "microservice.fullname" . }}
key: mySqlPassword
livenessProbe:
httpGet:
path: /isAlive
port: http
failureThreshold: 3
periodSeconds: 10
readinessProbe:
httpGet:
path: /isReady
port: http
failureThreshold: 3
periodSeconds: 5
startupProbe:
httpGet:
path: /isStarted
port: http
failureThreshold: {{ .Values.deployment.pod.container.startupProbe.failureThr\
eshold }}
periodSeconds: 10
resources:
{{- if eq .Values.env "development" }}
{{- toYaml .Values.deployment.pod.container.resources.development | nindent 1\
2 }}
{{- else if eq .Values.env "integration" }}
{{- toYaml .Values.deployment.pod.container.resources.integration | nindent 1\
2 }}
{{- else }}
{{- toYaml .Values.deployment.pod.container.resources.production | nindent 12\
}}
{{- end}}
{{- with .Values.deployment.pod.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.deployment.pod.affinity }}
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app.kubernetes.io/name: {{ include "microservice.name" . }}
topologyKey: "kubernetes.io/hostname"
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.deployment.pod.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
The values (indicated by .Values.<something>) in the above template come from a values.yaml file.
Below is an example values.yaml file to be used with the above Helm chart template.
imageRegistry: docker.io
imageRepository: pksilen2/backk-example-microservice
imageTag:
env: production
auth:
# Authorization Server Issuer URL
# For example
# http://keycloak.platform.svc.cluster.local:8080/auth/realms/<my-realm>
issuerUrl:
pod:
annotations: {}
imagePullSecrets: []
container:
imagePullPolicy: Always
securityContext:
privileged: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 65532
runAsGroup: 65532
allowPrivilegeEscalation: false
env:
startupProbe:
failureThreshold: 30
resources:
development:
limits:
cpu: '1'
memory: 768Mi
requests:
cpu: '1'
memory: 384Mi
integration:
limits:
cpu: '1'
memory: 768Mi
requests:
cpu: '1'
memory: 384Mi
production:
limits:
cpu: 1
memory: 768Mi
requests:
cpu: 1
memory: 384Mi
nodeSelector: {}
tolerations: []
affinity: {}
You should remove settings from the above list only when it is mandatory for a microservice. For example, if a microservice must write to the filesystem for some valid reason, then the filesystem should not be defined as read-only.
Below is a GitHub Actions CI/CD workflow for a Python microservice. The declarative workflow is
written in YAML. The workflow file should be located in the microservice’s source code repository in
the .github/workflows directory. Steps in the workflow are described in more detail after the example.
-config replacer.full_list(0).enabled=true
-config replacer.full_list(0).matchtype=REQ_HEADER
-config replacer.full_list(0).matchstr=Authorization
-config replacer.full_list(0).regex=false
-config 'replacer.full_list(0).replacement=Bearer ZXlK...aG\
JHZ='"
uses: docker/build-push-action@v5
with:
context: .
builder: ${{ steps.setupBuildx.outputs.name }}
push: true
cache-from: type=local,src=/tmp/.buildx-cache
cache-to: type=local,dest=/tmp/.buildx-cache
tags: ${{ steps.dockerImageMetadata.outputs.tags }}
labels: ${{ steps.dockerImageMetadata.outputs.labels }}
sonar.python.coverage.reportPaths=coverage.xml
17) Extract metadata, like the tag and labels for building and pushing a Docker image
18) Build and push a Docker image
19) Perform a Docker image vulnerability scan with Anchore
20) Upload the Anchore scan report to the GitHub repository
21) Install Helm
22) Extract the microservice version from the Git tag (remove the ‘v’ letter before the version
number)
23) Replace Helm chart versions in the Helm chart’s Chart.yaml file using the sed command
24) Update the Docker image tag in the values.yaml file
25) Lint the Helm chart and perform static code analysis for it
26) Upload the static code analysis report to the GitHub repository and perform git user configuration for the next step
27) Package the Helm chart and publish it to GitHub Pages
Some of the above steps are parallelizable, but a GitHub Actions workflow does not currently support
parallel steps in a job. In Jenkins, you can easily parallelize stages using a parallel block.
You could also execute the unit tests and linting when building a Docker image by using the following
kind of Dockerfile:
WORKDIR /microservice
COPY ./requirements.txt /microservice/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /microservice/requirements.txt
COPY ./app /microservice/app
RUN pylint app
RUN python -m coverage run -m unittest
RUN python -m coverage xml
# You must implement sending unit test coverage report
# to SonarQube/SonarCloud here
The problem with the above solution is that you don’t get a clear indication of what failed in a build.
You must examine the output of the Docker build command to see if linting or unit tests failed. Also,
you cannot use the SonarCloud GitHub Action anymore. You must implement SonarCloud reporting
in the builder stage of the Dockerfile (after completing the unit testing to report the unit test coverage
to SonarCloud).
12.2.5: Deploy
In the Deploy phase, released software is deployed automatically. After a successful CI/CD pipeline
run, a software component can be automatically deployed. This is called continuous deployment
(CD). Notice that both continuous delivery and continuous deployment are abbreviated as CD.
This can cause unfortunate misunderstandings. Continuous delivery is about releasing software
automatically, and continuous deployment is about automatically deploying released software to
one or more environments. These environments include, for example, a CI/CD environment,
staging environment(s) and finally, production environment(s). There are different ways to automate
software deployment. One modern and popular way is to use GitOps, which uses a Git repository or
repositories to define automatic deployments to different environments using a declarative approach.
GitOps can be configured to update an environment automatically when new software is released.
This is typically done for the CI/CD environment, which should always be kept up-to-date and contain
the latest software component versions.
GitOps can also be configured to deploy automatically and regularly to a staging environment. A
staging environment replicates a production environment. It is an environment where end-to-end
functional and non-functional tests are executed before the software is deployed to production. You
can use multiple staging environments to speed up the continuous deployment to production. It is
vital that all needed testing is completed before deploying to production. Testing can take a couple of
days to validate the stability of the software. If testing in a staging environment requires three days
and you set up three staging environments, you can deploy to production every day. On the other
hand, if testing in a staging environment takes one week and you have only one staging environment,
you can deploy to production only once a week (assuming that all tests execute successfully).
Deployment to a production environment can also be automated. Or it can be triggered manually
after successfully completing all testing in a staging environment.
12.2.6: Operate
Operate is the phase when the software runs in production. In this phase, it must be ensured that software updates (like security patches) are deployed in a timely manner. Also, the production environment's infrastructure and platform should be kept up-to-date and secure.
12.2.7: Monitor
Monitor is the phase when a deployed software system is monitored to detect any possible problems.
Monitoring should be automated as much as possible. It can be automated by defining rules for alerts
triggered when the software system operation requires human intervention. These alerts are typically
based on various metrics collected from the microservices, infrastructure, and platform. Prometheus
is a popular system for collecting metrics, visualizing them, and triggering alerts.
The basic monitoring workflow follows the below path:
1) Monitor alerts
2) If an alert is triggered, investigate metrics in relevant dashboards
3) Check logs for errors in relevant services
4) Distributed tracing can help to visualize if and how requests between different microservices
are failing
Each service must log to the standard output. If your microservice is using a 3rd party library that logs
to the standard output, choose a library that allows you to configure the logging format or request
the log format configurability as an enhancement to the library. Choose a standardized log format
and use it in all microservices, e.g., use Syslog format or OpenTelemetry Log Data Model (defined in
a later section). Collect logs from each microservice to a centralized location, like an ElasticSearch
database.
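Below is a minimal sketch (not part of the book's example project) of how a Python microservice could write single-line, structured log entries to the standard output using the standard logging module. The service name and field names are illustrative; in practice, follow the chosen standard format, such as the OpenTelemetry log data model.

import json
import logging
import sys

class StructuredLogFormatter(logging.Formatter):
    # Format each log record as a single-line JSON object
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            'Timestamp': self.formatTime(record),
            'SeverityText': record.levelname,
            'Body': record.getMessage(),
            'Resource': {'service.name': 'example-microservice'},
        })

handler = logging.StreamHandler(sys.stdout)  # Log to the standard output
handler.setFormatter(StructuredLogFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

logging.getLogger('example-microservice').info('Microservice started')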
Integrate microservices with a distributed tracing tool, like Jaeger. A distributed tracing tool collects
information about network requests microservices make.
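As a sketch of such an integration, the below example uses the OpenTelemetry SDK with its Jaeger (Thrift) exporter. The package choice (opentelemetry-sdk and opentelemetry-exporter-jaeger), the service name, and the agent host and port are assumptions for illustration only.

from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Identify the service in the collected traces
provider = TracerProvider(
    resource=Resource.create({'service.name': 'example-microservice'})
)
# Export spans to a Jaeger agent (host and port are assumptions)
provider.add_span_processor(
    BatchSpanProcessor(
        JaegerExporter(agent_host_name='jaeger-agent', agent_port=6831)
    )
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span('handle-get-order-request'):
    ...  # Handle the request here; outgoing requests create child spans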
Define what metrics need to be collected from each microservice. Typical metrics are either
counters (e.g., number of requests handled or request errors) or gauges (e.g., current CPU/memory
usage). Collect metrics that are needed to calculate the service level indicators (SLIs). Below are listed
the five categories of SLIs and a few examples of SLIs for each category.
• Availability
• Error rate
– How many times a service has been restarted due to a crash or unresponsiveness
– Message processing errors
– Request errors
– Other errors
– Different errors can be monitored by setting a metric label. For example, if you have a
request_errors counter and a request produces an internal server error, you can increment
the request_errors counter with the label internal_server_error by one.
• Latency
• Throughput
• Saturation
Instrument your microservice with the necessary code to collect the metrics. This can be done using
a metrics collection library, like the Prometheus client library.
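For example, using the prometheus_client library (one possible choice; the metric and label names below are illustrative only), a request error counter with an error type label and a saturation-type gauge could be defined roughly as follows:

from prometheus_client import Counter, Gauge, start_http_server

# Counter for failed requests, labeled by error type
request_errors = Counter(
    'request_errors_total', 'Number of failed requests', ['error_type']
)
# Gauge for a saturation-type SLI
requests_in_progress = Gauge(
    'requests_in_progress', 'Number of requests currently being handled'
)

start_http_server(8080)  # Expose the /metrics endpoint for Prometheus to scrape

def handle_request() -> None:
    requests_in_progress.inc()
    try:
        ...  # Handle the request
    except Exception:
        # Increment the error counter with a label telling the error type
        request_errors.labels(error_type='internal_server_error').inc()
        raise
    finally:
        requests_in_progress.dec()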
Create a main dashboard for each microservice to present the SLIs. You must also present service level
objectives (SLOs). When all SLOs are met, the dashboard should show SLI values in green. If an SLO
is not met, the corresponding SLI value should be shown in red. You can also use yellow and orange
colors to indicate that an SLO is still met, but the SLI value is no longer optimal. Use a visualization tool
that integrates with the metrics collection tool, like Grafana with Prometheus. You can usually deploy
metric dashboards as part of the microservice deployment. If you are using Kubernetes, Prometheus
and Grafana, you can create Grafana dashboards as custom resources (CRs) when using the Grafana
Operator.
12.2.7.5: Alerting
To define alerting rules, first define the service level objectives (SLOs) and base the alerting rules on
them. An example of an SLO: “service error rate must be less than x percent”. If an SLO cannot be
met, an alert should be triggered. If you are using Kubernetes and Prometheus, you can define alerts
using the Prometheus Operator and PrometheusRule CRs.
Software operations staff connects back to the software development side of the DevOps lifecycle
by handling support requests from end users, by proposing improvements, and by reporting bugs
found in production. The first of these results in a solved support case or a bug report. The latter
two reach the Plan phase of the DevOps lifecycle. Bug reports usually enter the Code phase
immediately, depending on the fault severity.
12.2.9.1: Logging
Log entries are written with a severity. The commonly used severities, from most to least severe, are:
• (CRITICAL/FATAL)
• ERROR
• WARNING
• INFO
• DEBUG
• TRACE
I don't usually use the CRITICAL/FATAL severity at all. It is better to report all errors with the ERROR
severity, because then it is easy to query logs for errors using the single keyword ERROR.
You can add information to the log message itself about the criticality/fatality of an error. When you
log an error for which there is a solution available, you should inform the user about the solution in
the log message, e.g., provide a link to a troubleshooting guide or give an error code that can be used
to search the troubleshooting guide.
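For example (a sketch with an assumed logger name, error code, and troubleshooting URL):

import logging

logger = logging.getLogger('example-microservice')
logger.error(
    'Failed to connect to the database. Error code: DB-001. '
    'See https://example.com/troubleshooting#DB-001 for possible solutions.'
)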
Do not log too much information using the INFO severity because the logs might be difficult to read
when there is too much noise. Consider carefully what should be logged with the INFO severity and
what can be logged with the DEBUG severity instead. The default logging level of a microservice
should be WARNING or INFO.
Use the TRACE severity to log only tracing information, e.g., detailed information related to
processing a single request, event, or message.
If you are implementing a 3rd party library that logs something, the library should allow its users to
customize the logging. There should be a way to set the logging level and a way for the code that
uses the library to customize the format in which log entries are written. Otherwise, 3rd party library
log entries appear in the log in a different format than the log entries from the microservice itself.
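A common way to achieve this in Python is for the library to log through a named logger and attach only a NullHandler, so that the application using the library decides the logging level, handlers, and format. Below is a sketch; the library name is hypothetical.

import logging

# Inside the 3rd party library (the name 'examplelib' is hypothetical)
logger = logging.getLogger('examplelib')
logger.addHandler(logging.NullHandler())  # No output unless the application configures logging

def do_work() -> None:
    logger.debug('Doing work')  # Level and format are controlled by the application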
This section describes the essence of the OpenTelemetry log data model version 1.12.0 (please check
https://github.com/open-telemetry/opentelemetry-specification for possible updates).
A log entry is a JSON object containing properties such as Timestamp, TraceId, SpanId, SeverityText,
SeverityNumber, Body, Resource, and Attributes.
Below is an example log entry according to the OpenTelemetry log data model.
{
"Timestamp": "1586960586000000000",
"TraceId": "f4dbb3edd765f620",
"SpanId": "43222c2d51a7abe3",
"SeverityText": "ERROR",
"SeverityNumber": 9,
"Body": "20200415T072306-0700 ERROR Error message comes here",
"Resource": {
"service.namespace": "default",
"service.name": "my-microservice",
"service.version": "1.1.1",
"service.instance.id": "my-microservice-34fggd-56faae"
},
"Attributes": {
"http.status_code": 500,
"http.url": "http://example.com",
"myCustomAttributeKey": "myCustomAttributeValue"
}
}
The above JSON-format log entries might be hard to read as plain text on the console, for example,
when viewing a pod’s logs with the kubectl logs command in a Kubernetes cluster. You can create
a small script that extracts only the Body property value from each log entry.
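Below is a sketch of such a script (the file name print_log_bodies.py is an assumption). It reads JSON log entries from the standard input, one per line, and prints only the Body property of each entry, e.g., kubectl logs my-pod | python print_log_bodies.py.

import json
import sys

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        print(json.loads(line).get('Body', line))
    except json.JSONDecodeError:
        print(line)  # Not a JSON log entry; print it as-is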
PrometheusRule custom resources (CRs) can be used to define rules for triggering alerts. In the
below example, an example-microservice-high-request-latency alert will be triggered with a major
severity when the median request latency in seconds is greater than one
(request_latencies_in_seconds{quantile="0.5"} > 1).
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-microservice-rules
spec:
  groups:
    - name: example-microservice-rules
      rules:
        - alert: example-microservice-high-request-latency
          expr: request_latencies_in_seconds{quantile="0.5"} > 1
          for: 10m
          labels:
            application: example-microservice
            severity: major
            class: latency
          annotations:
            summary: "High request latency on {{ $labels.instance }}"
            description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
13: Appendix A
Figure 13.1. utils.py
import os
import traceback
from pydantic import BaseModel, PositiveInt

class InputOrder(BaseModel):
userId: str
orderItems: list[OrderItem]
class Config:
orm_mode = True
class OrderItem(BaseModel):
id: int
salesItemId: str
quantity: PositiveInt
class Config:
orm_mode = True
class Meta:
orm_model = OrderItemEntity
class OutputOrder(BaseModel):
id: str
userId: str
orderItems: list[OrderItem]
class Config:
orm_mode = True
from sqlalchemy import BigInteger, ForeignKey, PrimaryKeyConstraint
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass
class Order(Base):
__tablename__ = 'orders'
class OrderItem(Base):
__tablename__ = 'orderitems'
__table_args__ = (
PrimaryKeyConstraint('orderId', 'id', name='orderitems_pk'),
)
id: Mapped[int]
salesItemId: Mapped[int] = mapped_column(BigInteger())
quantity: Mapped[int]
orderId: Mapped[int] = mapped_column(ForeignKey('orders.id'))
from typing import Final

class OrderServiceError(Exception):
def __init__(
self,
status_code: int,
message: str,
cause: Exception | None = None,
):
self.__status_code: Final = status_code
self.__message: Final = message
self.__cause: Final = cause
@property
def status_code(self) -> int:
return self.__status_code
@property
def message(self) -> str:
return self.__message
@property
def cause(self) -> Exception | None:
return self.__cause
class DatabaseError(OrderServiceError):
def __init__(self, cause: Exception):
super().__init__(500, 'Database error', cause)
class EntityNotFoundError(OrderServiceError):
def __init__(self, entity_name: str, entity_id: int):
super().__init__(
404, f'{entity_name} with id {entity_id} not found'
)
import strawberry
@strawberry.experimental.pydantic.input(model=InputOrder)
class InputOrder:
userId: strawberry.auto
orderItems: list[InputOrderItem]
import strawberry
@strawberry.experimental.pydantic.input(model=OrderItem, all_fields=True)
class InputOrderItem:
pass
import strawberry
@strawberry.experimental.pydantic.type(model=OutputOrder)
class OutputOrder:
id: strawberry.auto
userId: strawberry.auto
orderItems: list[OutputOrderItem]
import strawberry
@strawberry.experimental.pydantic.type(model=OrderItem, all_fields=True)
class OutputOrderItem:
pass
14: Appendix B
Here is the source code for the proto_to_dict function:
Figure 14.1. grpc/proto_to_dict.py
# This is free and unencumbered software released into the public domain
# by its author, Ben Hodgson <[email protected]>.
import base64

from google.protobuf.descriptor import FieldDescriptor

EXTENSION_CONTAINER = '___X'
TYPE_CALLABLE_MAP = {
FieldDescriptor.TYPE_DOUBLE: float,
FieldDescriptor.TYPE_FLOAT: float,
FieldDescriptor.TYPE_INT32: int,
FieldDescriptor.TYPE_INT64: int,
FieldDescriptor.TYPE_UINT32: int,
FieldDescriptor.TYPE_UINT64: int,
FieldDescriptor.TYPE_SINT32: int,
FieldDescriptor.TYPE_SINT64: int,
FieldDescriptor.TYPE_FIXED32: int,
FieldDescriptor.TYPE_FIXED64: int,
FieldDescriptor.TYPE_SFIXED32: int,
FieldDescriptor.TYPE_SFIXED64: int,
FieldDescriptor.TYPE_BOOL: bool,
FieldDescriptor.TYPE_STRING: str,
    FieldDescriptor.TYPE_BYTES: lambda b: base64.b64encode(b).decode('ascii'),
FieldDescriptor.TYPE_ENUM: int,
}
def repeated(type_callable):
return lambda value_list: [
type_callable(value) for value in value_list
]
def proto_to_dict(
pb, type_callable_map=TYPE_CALLABLE_MAP, use_enum_labels=False
):
result_dict = {}
extensions = {}
for field, value in pb.ListFields():
type_callable = _get_field_value_adaptor(
pb, field, type_callable_map, use_enum_labels
)
if field.label == FieldDescriptor.LABEL_REPEATED:
type_callable = repeated(type_callable)
if field.is_extension:
extensions[str(field.number)] = type_callable(value)
continue
result_dict[field.name] = type_callable(value)
if extensions:
result_dict[EXTENSION_CONTAINER] = extensions
return result_dict
def _get_field_value_adaptor(
pb, field, type_callable_map=TYPE_CALLABLE_MAP, use_enum_labels=False
):
if field.type == FieldDescriptor.TYPE_MESSAGE:
# recursively encode protobuf sub-message
return lambda pb: proto_to_dict(
pb,
type_callable_map=type_callable_map,
use_enum_labels=use_enum_labels,
)
if field.type in type_callable_map:
return type_callable_map[field.type]
raise TypeError(
'Field %s.%s has unrecognised type id %d'
% (pb.__class__.__name__, field.name, field.type)
)
def get_bytes(value):
    return base64.b64decode(value)
REVERSE_TYPE_CALLABLE_MAP = {
FieldDescriptor.TYPE_BYTES: get_bytes,
}
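As a usage sketch (not part of the original source), the function can be tried out with any generated protobuf message type, for example, the standard Duration message:

from google.protobuf.duration_pb2 import Duration

duration = Duration(seconds=3, nanos=500)
print(proto_to_dict(duration))  # Prints: {'seconds': 3, 'nanos': 500}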