Block 1 Part 3 - Architectures For A Modern Web - View As Single Page - OU Online
Unless otherwise stated, copyright © 2024 The Open University, all rights reserved.
Printable page generated Monday, 28 Oct 2024, 19:01
Mark Hall
1 Introduction
The web started out as a place primarily for accessing information, but it very quickly developed into a
space that allowed a bi-directional exchange between the user and the service provider. The exchange
may be as simple as having the user send a search query and the service provider responding with a list
of matching services. Nowadays, however, more complex exchanges are common, as we use fully
fledged applications via the web.
Such web applications contain a large number of interacting components, including the user interface that users interact with, the server(s) that provide the business logic, the data-storage systems, and any external services that are used to provide the application and its services.
Correctly developing and maintaining the components is a complex task. However, over time a number
of common architectures have been developed, which provide guidelines for structuring the components
and their interactions. The main benefits of working based on an existing architecture are:
The resulting code structures are generally easier to maintain, as the architecture forces certain
separations of concern. This causes a slight reduction in the initial development speed, but
significantly speeds up maintenance in the long run.
It is easier for new members of the development team to understand the code and how it works, as
they can use their existing understanding of the architecture as a guide towards understanding the
application’s code.
It reduces the amount of code that has to be written, as existing frameworks, which fit into the
architecture, can be used to implement common functionality.
In Block 1 Part 3 you will look at a subset of architectures for structuring a large web application, as well
as a few of the architectures that are available for data storage. Not all of the possible architectures, will
be considered here, so it is generally worth spending some time investigating and thinking about
possible architectures before starting development (Goniwada, 2022).
2 Aims
After studying this part you will have an understanding of:
https://learn2.open.ac.uk/mod/oucontent/view.php?id=2353273&printable=1 1/23
10/28/24, 7:00 PM Block 1 Part 3: Architectures for a modern web: View as single page | OU online
the elements of these architectures and how they can be combined to achieve the business
requirements
the core principles used to model the business requirements
the primary data storage architectures and how they relate to the business requirements.
3 Client-server
In the web development domain you will find a large number of architectures that can be used.
Fundamentally these are all variations and extensions of the core ‘client-server’ architecture, which in its simplest form looks like the one shown in Figure 1.
Figure 1 Diagram showing how the client is connected to the server via a Network
A Client sends a request to the Server via the Network. The server does some processing and then
returns a response to the Client via the Network. The reason this architecture underpins so much is that
it is incredibly flexible. It says nothing about what the client is or what the server is. It places no
constraints on the type of network (or networks) that link the two.
The client could be a person using a computer to request a service or it could be an Internet-of-things
device providing data to a smart-home controller. The client could provide the user with a complex,
interactive, graphical interface or it could be a simple command-line client. The server could be a simple
piece of software that sends back the same static data every time or it could itself act as a client to a
range of other servers in order to combine their services into a meta-service that it provides.
In order to keep things manageable, you will primarily look at the scenario where the client uses a Web
Browser to present the information to the user and to react to user interactions.
Three-tier model
Additionally, rather than work off the very generic client-server model, the three-tier model shown in
Figure 2 is more commonly employed.
In the classic three-tier model, the Business Logic layer fetches the data from the Data layer, then based
on the data creates the HTML to send to the client. In the Client, the HTML is then shown to the user in
the browser and the user interacts with the rendered HTML in the browser. Based on the interactions,
the browser will send a request to the Business Logic layer, which will interpret the request, compare it
to the business logic that has been implemented, fetch the appropriate data, render it to HTML and send
it back.
Modern web applications, rather than just being static displays of HTML, often employ complex frontend
applications written in JavaScript. The result of this is that the modern three-tier model looks more like
that shown in Figure 3.
Figure 3 Diagram showing the modern web application three-tier structure of UI Logic–Business
Logic–Data
At the client end we have the User Interface (UI) Logic layer, which presents the data to the user and
provides a rich user interface that the user can interact with. By moving this user-interface logic into the
client’s browser and running it directly on the user’s client system, it is possible to create a much more
reactive and fluid user experience.
An additional benefit is that this approach also forces a relatively strong separation between this user-
interface logic and the business logic that defines which actions the user is able to undertake at any
point and how the data is updated. Because UI Logic and Business Logic are now separated by the
network, with only data flowing between the two, the Business Logic layer must define a very clear API
(Application programming interface) that specifies which pieces of data can be requested, and how and
what parameters must be sent in order to change the stored data.
This has two main benefits. First, it ensures that a specific part of the data is only retrieved or modified in
a single place in the code. Thus if there are any changes that have to be made to that, then they only
need to be made in a single place. If the data was modified in multiple places, then the changes would
have to be made in multiple places, increasing the chances that one place would be missed, resulting in
the introduction of bugs or security issues.
Second, by separating UI and business logic, it is possible to test the two separately. This simplifies the
testing process and also makes it much easier to automate it. The automation also means that the tests
can be run after every change, ensuring that the change does not have any unexpected side effects.
With this motivation in place, you will now look at the three layers in more detail.
There are of course variations in what exactly is provided as functionality by the UI Logic layer. In some,
data flows primarily from the server to the client, while in others that is inverted or the data flows are
relatively even. All possible scenarios are, however, covered by the web application lifecycle shown in
Figure 4.
After the initial load of the application, an initial version of the user interface is rendered (displayed) to
the user. Then the application enters its core loop. First, it waits for an event to occur. We will cover what
kind of events this includes later. Then, depending on the event, it will either send data to the server,
fetch data from the server, or simply update its user interface state. In all three cases this will update the
state of the application, which will cause all or parts of it to be re-rendered. After the re-rendering is
complete, the application waits for the next event.
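The core loop described above can be sketched in a few lines of JavaScript. This is a simplified illustration, not a real framework: the event names, the shape of the state, and the render function are all invented here, and a real client would update the DOM rather than count render calls.

```javascript
// Hypothetical sketch of the core loop: an event updates the state,
// and a state change is followed by a re-render.
let state = { view: "loading", data: null };
let renderCount = 0;

function render(currentState) {
  // In a real client this would update the DOM; here we only count calls.
  renderCount += 1;
}

function handleEvent(event) {
  if (event.type === "data-received") {
    // e.g. the response to a fetch from the server
    state = { ...state, view: "list", data: event.payload };
  } else if (event.type === "toggle-layout") {
    // a pure UI-state change with no server involvement
    state = { ...state, view: state.view === "list" ? "grid" : "list" };
  }
  render(state); // every state change triggers a re-render
}

render(state); // initial render after load
handleEvent({ type: "data-received", payload: ["Album A", "Album B"] });
handleEvent({ type: "toggle-layout" });
```

Note how the loop itself never cares which kind of event occurred; it only reacts to the resulting state change.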
Load: loading the application represents the first step in the application lifecycle. This includes
loading the core HTML of the application, the JavaScript files that implement the functionality, and
the CSS files that style the application. An important aspect of this step is speed. Where users have the choice of trying something else, they will often leave if the application does not show them that something is happening within a very short time. Over time various tricks have been developed to improve this, some of which will be
discussed when you look at server-side rendering later in this block.
Initial Render: after the minimum HTML, CSS, and JavaScript files have been loaded, an initial
version of the user interface is shown to the user. This does not necessarily mean that the
application is fully useable, but it provides a minimum state that the user can see and know that
something is happening. If you use a tool such as Lighthouse, then this initial render time is
reported as an important statistic.
Wait for Event: at this point the application enters its main loop, where it waits for an event to
occur and then takes an action based on that event. Events fundamentally fall into four categories:
User interaction events: these are the most commonly used type of event, as they include
every kind of interaction the user may undertake with the application. These range from
simple interactions such as scrolling the page, clicking on a link, or entering data into a field,
to complex interactions such as dragging and dropping something or multi-touch events.
Timed events: timed events are exactly that, events that happen at a specific time. Some of
these are one-off events, such as using a timeout to automatically hide a notification popup
after a given amount of time has passed. Others are timed events that happen at regular
intervals, for example to fetch the latest news on a news application or regularly sending
updates to the server.
Server-sent events: server-sent events are an alternative to timed events for fetching data.
Checking for updates at regular intervals is inefficient, as in most cases there will be nothing
new. The idea with server-sent events is to have a persistent network connection and then,
when there is new data on the server, the server can initiate sending this data to the
application, generating a server-sent event with the new data.
State events: state events include the initial render having been completed, which often triggers the fetching of additional data, or the application being unloaded because the user has closed the tab, window, or browser, which triggers some final cleanup activities. An event will then either trigger sending data to the server, fetching data from the server, or a change to the user interface that does not cause any change in the data. An example of the last case is if, in the Music Streaming case study, the user switches between showing the available albums in a grid or in a list layout.
Send Data: this step sends data to the server. This may be in response to a user action, such as
submitting a form, but it may also be based on a time event, such as sending a regular ‘I’m still
active’ message to the server. Depending on the data sent, the server may simply store it, but it
may also send updated data in a response. In both cases the application will now re-render.
Fetch Data: this step simply retrieves data from the server and then updates the state of the data
in the application. It then triggers a re-render.
Rerender: at this point the state of the application has updated and this change needs to be
reflected in what is shown to the user. After that the application returns to the Wait for Event state
to wait for the next event.
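As a rough sketch, the decision between the three follow-up actions (send data, fetch data, or a pure UI update) could look like the following dispatcher. All of the event names here are invented for illustration; a real application would handle many more.

```javascript
// Hypothetical dispatcher mapping lifecycle events to the three possible
// follow-up actions described above.
function nextAction(event) {
  switch (event.type) {
    case "form-submitted":      // user interaction that changes server data
    case "heartbeat-timer":     // timed 'I'm still active' message
      return "send-data";
    case "refresh-timer":       // timed poll for new data
    case "initial-render-done": // state event triggering extra data loads
      return "fetch-data";
    case "layout-toggled":      // pure UI change, no data involved
      return "update-ui";
    default:
      return "update-ui";
  }
}
```

Whichever branch is taken, the result is a state change followed by a re-render, as shown in the lifecycle.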
Implementing each one of these steps creates a lot of complexity and would require a lot of code. Since
it would be inefficient to build everything from scratch, in practice we use Frontend Frameworks to
provide the foundations for implementing the UI Logic layer, which you will look at in more detail in
Part 4.
10–15 minutes
Pick one of your favourite web applications. Then, spend a few minutes identifying a few of the
elements of the application lifecycle in your interaction with the web application. You can use the
browser’s developer tools (Shift+Ctrl+I or F12 (Firefox on Windows/Linux), Option+⌘+I (Firefox on macOS), Shift+Ctrl+J (Chrome on Windows/Linux), Option+⌘+J (Chrome on macOS), Shift+Ctrl+I (Edge on Windows/Linux), Option+⌘+I (Edge on macOS)) to look inside the application. You can use the inspector to see which parts of the application change as you
interact with it. Use the network tab to see what data is being transmitted from and to your
application.
Share your thoughts and patterns on the forum to see what patterns other students found and
where your patterns overlap and differ.
Note
You will see that in this section we use the terms ‘resource’ and ‘data’ to refer to what the server
provides to the client. While the terms are often used interchangeably, a ‘resource’ is the
conceptual object that the server provides, while the ‘data’ is the concrete data provided for the
‘resource’. In the case of our Music Streaming service, one ‘resource’ provided by the service is a
‘playlist’. Accessing the playlist resource will return the ‘data’ for the playlist, which will be the
concrete data for each song in the playlist.
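The distinction can be made concrete with a small sketch using an invented in-memory store: the path `/playlists/1` names the resource, while the object returned for it is the data. The paths, field names, and values are all illustrative.

```javascript
// Illustrative only: the 'playlist' resource is the conceptual object
// identified by a path; the data is the concrete content returned for it.
const store = {
  "/playlists/1": {
    title: "Morning Mix",
    songs: [
      { title: "Song One", duration: 215 },
      { title: "Song Two", duration: 189 },
    ],
  },
};

// Accessing the resource returns the data currently held for it,
// or null if no such resource exists.
function readResource(path) {
  return store[path] ?? null;
}
```

The resource stays the same conceptual thing even as its data changes, for example when a song is added to the playlist.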
Figure 5 Flowchart showing the access flow for both public and authorised access
At the top there are two levels of access, either ‘public’, meaning without requiring the client to be
authorised to access the server, or ‘authorised’, meaning the client needs to provide some kind of
credentials to access the server. Both access types can Read Resources, which means that they can
request data from the server. Whether resources are available with or without authorisation always
depends on the specific use scenario.
Unlike read access, Write Resources access allows the client to request that the server update the data
for a given resource. While such access may be provided without authorisation, this is very unusual. When the client sends data to write to a resource, the server will first check that the client is authorised to do so; the next step is then to Validate the Data. This needs to be done to ensure that the application’s
data remains valid, as defined by the business requirements. These can be very simple checks, such as
ensuring that a new playlist has a title, but can also be very complex such as ensuring that, for example,
the user has the necessary subscription level or funds available in their account in order to access a
given song. If any validation rules fail, then that information is sent back to the client, which can then
show the user an appropriate message.
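A validation step of this kind might be sketched as follows. The two rules shown (a required title and a premium check) are assumptions made for illustration, not rules taken from the case study materials.

```javascript
// Sketch of the Validate the Data step with two invented business rules:
// a playlist must have a non-empty title, and premium-only songs require
// a premium subscription.
function validatePlaylist(playlist, user) {
  const errors = [];
  if (!playlist.title || playlist.title.trim() === "") {
    errors.push("A playlist must have a title.");
  }
  const needsPremium = playlist.songs.some((song) => song.premiumOnly);
  if (needsPremium && user.subscription !== "premium") {
    errors.push("Your subscription does not cover one or more songs.");
  }
  return errors; // an empty list means the data passed validation
}
```

If the returned list is non-empty, those messages would be sent back to the client for display; only an empty list allows the request to proceed to the business logic.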
If the data passes the validation checks, then the final step is to Apply the Business Logic. In some
cases that will just mean storing the data in the Data Tier, such as when creating a new playlist, which
simply needs to be stored. However, some data changes may prompt further processes to run. For
example, if the user changes their subscription level, then that would trigger further processes in the
billing side of the service. The change in the billing may then trigger even further processes such as
sending a notification email to the user. Some of these business logic processes may take a while, so
may continue running in the background even after the user’s original request has long been completed.
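The chain of processes in the subscription example could be sketched like this. The function names are invented, and the synchronous calls are a simplification: in a real system the billing update and the notification email would typically run asynchronously in the background.

```javascript
// Sketch of business logic triggering follow-up processes: storing the
// subscription change prompts a billing update, which prompts an email.
const log = []; // records the order in which the processes run

function updateSubscription(user, newLevel) {
  user.subscription = newLevel;
  log.push("subscription-stored");
  updateBilling(user); // the data change prompts a further process
}

function updateBilling(user) {
  log.push(`billing-updated:${user.subscription}`);
  sendNotificationEmail(user); // which in turn triggers another
}

function sendNotificationEmail(user) {
  log.push(`email-sent:${user.name}`);
}

updateSubscription({ name: "Sam", subscription: "free" }, "premium");
```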
As you can see, because of the wide range of possible scenarios and business requirements, the
Business Logic layer will often contain a large number of interacting components and a number of
specific architectures for this have been developed over the years.
Monolithic architecture
From a conceptual point of view the simplest architecture is the ‘monolithic’ architecture. In this architecture all of the service’s functionality is contained within one, monolithic piece of software. The advantage this brings is that the tight coupling of the individual components makes for an efficient and performant solution. The main downsides are on the maintenance and development side of things. With
all functionality contained in a single code base, all changes have to happen in lockstep, which can be
limiting for areas of the service that are younger and as a result need to evolve more quickly.
A further downside is that having everything in one place makes it possible to take developmental shortcuts that then create maintenance headaches. For example, rather than modifying an existing
function to handle the extra option that a new feature needs, code from the function can easily be added
to the new feature. This, however, generates code duplication and if a bug needs to be fixed in the
original function, it now needs to be fixed in two places.
Service-oriented architecture
The ‘service-oriented’ architecture (SOA) addresses the issues of monolithic software by splitting the design into two or more independent services (Dikmans and Van Luttikhuizen, 2012). Each service
implements a specific business activity and provides an explicit API for interacting with the service.
Multiple services are then combined in order to implement the full set of business requirements. The emphasis on explicit APIs for interaction between services focuses development on cleanly defining the interaction patterns and what data is exchanged in what way.
The main advantage is that each service can now be developed and evolved at its own speed. As long
as the API doesn’t change or any change is compatible, the change doesn’t have any impact on the
other services. Additionally, the structure makes it easier to (re-)use existing services.
The main downside is that there is an additional overhead that is incurred when calling a service.
Additionally when the API of a service changes in an incompatible way, more effort needs to be
expended to ensure that all uses of that API in other services are found and updated.
Overall, these downsides are relatively small and variations of the service-oriented architecture form the
basis of most modern architectures.
Web-oriented architecture
The ‘service-oriented’ architecture in itself does not specify how the client talks to the individual services
or how the services talk to each other. The web-oriented architecture (WOA) is a version of the service-
oriented architecture, using technologies from the web environment for service access and data
exchange.
Microservices
‘Microservices’ take the SOA and WOA concepts further, reducing the size of each service and then using service composition to combine the microservices into the overall service (Bucchiarone et al., 2020). Figure 6 illustrates this for the music streaming service.
Figure 6 Flowchart showing the links between the components in a microservices architecture,
including the clients, frontend servers, and the backend microservices that are combined to deliver the
solution
On the right side of the diagram there are four services (Account, Streaming, Playlist, and Shop) that
implement the individual parts of the music streaming service. On the left there are three clients. A web
client in the browser, a mobile app client, and a smart speaker client. The mobile app and smart speaker
clients communicate with the service via an additional ‘API service’. The API service is a composite
service that in itself doesn’t implement anything, but instead provides a single entry point for all API
requests and then forwards them to the relevant microservices. The web client doesn’t use the API
service, but instead accesses the ‘Web server’, which in turn accesses the individual microservices. The difference is that the Web server renders the data to HTML before sending it to the client, rather than the client doing the rendering, as is the case for the other two clients.
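The forwarding behaviour of the API service can be sketched as a simple routing table. The path scheme and the response strings are invented for illustration; real microservices would of course be separate processes reached over the network, not functions in the same program.

```javascript
// Minimal sketch of the composite API service: it implements nothing
// itself, only forwarding each request to the microservice responsible
// for its path prefix. Service names follow Figure 6.
const services = {
  account:   (path) => `account-service handled ${path}`,
  streaming: (path) => `streaming-service handled ${path}`,
  playlist:  (path) => `playlist-service handled ${path}`,
  shop:      (path) => `shop-service handled ${path}`,
};

function apiService(path) {
  const prefix = path.split("/")[1]; // e.g. '/playlist/7' -> 'playlist'
  const service = services[prefix];
  if (!service) {
    return "404: no service for this path";
  }
  return service(path); // forward the request to the microservice
}
```

Because all clients enter through this single point, cross-cutting concerns such as authentication or rate limiting can be handled once, here, rather than in every microservice.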
As will be seen later with frontend frameworks, it is entirely possible for a modern Web client to also
directly talk to the API service, further simplifying the complexity of the solution.
The reason microservices are very attractive is that it is often easy to map each service to an individual
development team within the organisation. This means that development changes within a service (that
do not change the API) do not have to be co-ordinated between teams, significantly reducing the amount
of co-ordination effort required.
The example diagram also illustrates the main downside. Even a simple application requires a large
number of services. Co-ordinating and testing any API changes adds significant complexity. Also, the
increased number of network connections that have to be made for each request adds a performance
penalty to the application.
Persistence
The primary functionality a data storage solution provides to the application is a location where data can
be stored persistently, meaning that even if the application (and the storage solution) is restarted or
moved somewhere else, the data remains available and the application restarts into the last state that
was persisted.
When looking into the kind of persistence offered by a storage solution, the first aspect to consider is the
tradeoff between certainty and performance. The storage system can offer a high degree of certainty in
the storage process, for example by first writing a log entry for the data change, then making and storing
the change in the actual data storage, and only when that has all been successful, to confirm to the
application that the data has been persisted. This has performance implications, as writing to physical
storage is slow.
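The high-certainty write path can be sketched as follows. Everything here is held in memory, so the sketch only illustrates the ordering of the steps (log first, then apply, then confirm), not actual durable storage.

```javascript
// Sketch of a write-ahead-log style write path: append to a log first,
// then apply the change to the data store, and only then confirm.
const writeAheadLog = [];
const data = {};

function persist(key, value) {
  writeAheadLog.push({ key, value }); // 1. log the intended change
  data[key] = value;                  // 2. apply it to the data store
  return { confirmed: true, key };    // 3. confirm to the application
}

const receipt = persist("user:1", { name: "Sam" });
```

If the system crashed between steps 1 and 2, a real storage system could replay the log on restart to recover the change, which is exactly where the certainty comes from.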
In order to improve performance, it is possible to relax the certainty aspects in the storage process. A
storage solution could offer to make only a best effort at storing the incoming data. While accepting that not all data is persisted may initially seem strange, there are cases where losing some data is not only acceptable, but also has only limited impact. For example, a smart meter sends constant usage data to a variety of receivers,
which power things like the in-house monitor or online usage graphs. If a small fraction of these usage
data points is not persisted correctly, then that has only a minor impact, as it will only show as small
gaps in the data. As long as the usage data at the end of the billing period is stored with full certainty, the
performance improvement is worth the occasional loss of data.
In the last example the data streams are relatively constant, but in many scenarios the incoming data
tends to come in bursts. For these scenarios the data storage system can promise that the stored data
will eventually be consistent and certain. In this approach, during bursty phases the stored data will not always represent the latest state, but the system will catch up when less data is incoming and will eventually represent the latest state.
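Eventual consistency can be illustrated with a small sketch in which writes are queued during a burst and only become visible once a flush has run. The flush is triggered manually here purely for illustration; a real system would catch up on its own once the burst subsides.

```javascript
// Sketch of eventual consistency: writes are accepted immediately but
// applied later, so reads may briefly lag behind the latest write.
const pending = []; // writes accepted but not yet applied
const stored = {};  // the data that reads currently see

function write(key, value) {
  pending.push({ key, value }); // accepted immediately, not yet visible
}

function read(key) {
  return stored[key]; // may return stale data while writes are pending
}

function flush() {
  // apply the queued writes; after this, reads see the latest state
  while (pending.length > 0) {
    const { key, value } = pending.shift();
    stored[key] = value;
  }
}
```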
In addition to this basic storage functionality, the data storage solution can also enforce data consistency
and data access rules. Again, using either of these will reduce the performance of the system. Where to strike the balance between having the data storage solution enforce certain conditions and maintaining performance is again primarily driven by the requirements of the use case.
Data exchange
The data storage system can play a second role and that is to support the data exchange between
different applications. In this scenario the data storage system acts as a way of decoupling the two or
more applications that are exchanging data. The smart meter example given above is exactly such a
scenario. The app on the user’s mobile phone which shows the user their current consumption does not
have to know anything about the smart meter that is measuring this data. Instead the smart meter sends the data to the server, which stores it in the data storage system. Then, the mobile app can simply request the current data from the server and show this to the user.
The big advantage of this decoupling of the data producer and data consumer via the data storage
system is that both the data producer and consumer systems can be updated without affecting the other
system, as long as the data model does not change. It is also easily possible to add further consumers
to the system, without having to re-engineer any of the existing systems.
Scalability
The final major functionality that the data storage system can provide is support for scaling. As the
number of users of a system increases, there comes a point where a single piece of physical hardware
can no longer support the full use. At this point the application needs to be split over multiple pieces of
physical hardware. As most data storage systems come with built-in support for this, the data storage
system can simplify scaling the application itself.
As long as all of the application’s state is stored in the data storage system, then the application can
(relatively) trivially be scaled by running multiple copies of the application on separate pieces of
hardware, but all using the same data storage system. In this setup the application never needs to know
that there are actually multiple copies of it running, which significantly simplifies the application’s
architecture and code. At the same time the only limit to scaling is the scalability of the data storage
system.
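This scaling pattern can be sketched as follows: several identical, stateless application ‘instances’ share one store (a plain object standing in for a real database), so any copy can serve any request. The names and the counter example are invented for illustration.

```javascript
// Sketch of horizontal scaling: application instances keep no local
// state, so any number of copies can share one data storage system.
const sharedStore = { visits: 0 };

function makeAppInstance(store) {
  // Each instance is stateless: everything it knows lives in the store.
  return {
    handleRequest() {
      store.visits += 1; // all state changes go through the shared store
      return store.visits;
    },
  };
}

// Two 'copies' of the application, as if running on separate hardware.
const instanceA = makeAppInstance(sharedStore);
const instanceB = makeAppInstance(sharedStore);
instanceA.handleRequest();
instanceB.handleRequest();
```

Because neither instance holds state of its own, requests can be routed to either copy interchangeably, which is precisely what lets a load balancer spread traffic across them.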
4 Representing reality
The vast majority of web applications are in one way or another linked to our (physical) reality and even
those, such as games, that are not necessarily linked to our reality are linked to a reality. At the same
time it is never possible to represent all of the reality in a computational representation (Eco, 1994). We
thus need to restrict ourselves to representing those aspects of the reality that are relevant to
successfully implement the web application’s requirements. This is the process of modelling reality.
This modelling process is always a process of reduction, simplification, and abstraction. But what do we
have to reduce it to? This brings us to the technical aspects of storing and modelling reality. Since the
technical requirements are shared quite heavily across applications (both in the web area and in other
areas as well), a number of common architectures for storing data have been developed over time,
including relational, document-centric, and graph databases, as well as search systems.
These data storage systems sit behind the backend layer of the application and in this part you will also
look at two example backend frameworks that sit between the data coming from the client and the data
storage systems. One of the core roles that the backend layer provides is the validation of data going
into and out of the storage system. This ensures that the stored data always conforms to the business requirements.
The process of representing reality in a digital system is a process that consists of a number of activities
that interact with each other. The first activity is to develop an accurate understanding of the
requirements that have to be represented in the system. The second activity is the translation of these
requirements into an abstract model, which is able to contain all the information needed to satisfy the
requirements. As an additional requirement, the model should also be open to future modifications. The
third activity is to then convert the model into the technical representation of the chosen data storage
system. Finally, the fourth activity is to actually populate the storage system with data.
Importantly, these activities are in most cases not linear or static, but instead interdependent and dynamic. Depending on the chosen storage system, individual aspects of the model need to be
modelled differently. For example, when planning for a relational storage system, things like a user and
their addresses would be modelled as two independent objects with a relation between them. However,
in a document-centric model, the addresses would be modelled as part of the user.
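The difference can be illustrated with the user-and-addresses example. The field names are invented; the point is only the shape the same information takes in each style.

```javascript
// Relational style: two independent collections linked by an id, as
// they would be two tables with a foreign-key relationship.
const users = [{ id: 1, name: "Sam" }];
const addresses = [
  { userId: 1, street: "1 High Street" },
  { userId: 1, street: "2 Mill Lane" },
];

// Retrieving a user's addresses means following the relation (a join).
function addressesFor(userId) {
  return addresses.filter((address) => address.userId === userId);
}

// Document style: the addresses live inside the user document itself,
// so no join is needed, but the addresses cannot be shared or queried
// independently of the user.
const userDocument = {
  id: 1,
  name: "Sam",
  addresses: [{ street: "1 High Street" }, { street: "2 Mill Lane" }],
};
```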
Regardless of which approach is taken, the solution will have advantages and disadvantages and, even
though the marketing materials will suggest otherwise, there is no solution that comes only with
advantages. The aim of this section is to give you some exposure to this process and to harden you
against the wild promises of a single solution to rule them all.
In addition to these specific requirements, there are implicit requirements that are never written down, because everybody within the company already knows them:
Each one of these would have further details attached to them, but for the purpose of modelling the requirements, some assumptions about the details will simply be made. Let us put all of that into a (simplified) UML Class Diagram (Grässle et al., 2005). You could also have used an Entity-relationship
diagram (Bagui and Earp, 2011) or a mind map. The modelling tools used and the specific format are
less important than whether the model is accompanied by a textual description explaining the modelling
decisions that went into the process.
The first important thing to remember is that this is just one possible way to model this reality. It is not
the only one and until you actually have to deal with the specific details in practice and in
implementation, it is not even possible to state that it is correct, never mind good. The only thing that can
be said is that it satisfies the requirements.
You will now look at some of the data modelling aspects in a bit more detail and justify why the shipping
example has been modelled like that. First, you can see that there are two objects representing the
Sender and Recipient of the Shipment. These could both either be companies or individuals, so you
specify that they are both sub-classes of a type called Entity, which consists simply of a name and an
Address object.
The model raises the question of whether Sender and Recipient really need to exist. We have
modelled them here because the requirements state that these two entities exist. However, we have
already modelled that they are actually just sub-classes of the more generic Entity and we could also
model them only as that entity and then add separate relationships between the Shipment and the
Entity as in the alternative model shown in Figure 8.
The model looks much simpler, so maybe it is better. However, if, in the process of digging into the
details of the requirements, we find that there are attributes that are only relevant to the Sender or the
Recipient, then this second model cannot represent that easily. The second model is thus simpler,
but potentially harder to evolve if additional knowledge becomes available.
A similar case arises with the Address and the Entity. In both models we have modelled these as
separate objects, even though the model specifies that each Entity has exactly one Address. Why
keep things this way, rather than merge the information into the Entity? This is the point where
experience with modelling comes into play. From experience we can know that even though it is not
specified as a requirement here, an Entity can often have multiple Addresses. As it is known that this
is often a situation that has to be modelled at some point, the model can be prepared for this change, by
keeping the two things separate.
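This design choice can be sketched in code. A minimal Python sketch of keeping Entity and Address separate; all class and field names beyond Entity and Address are illustrative assumptions, not part of the model above:

```python
from dataclasses import dataclass

# Keeping Address separate from Entity means that moving from exactly one
# address to many later on is a small, local change.

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Entity:
    name: str
    addresses: list  # currently always of length one, but the structure allows more

home = Address(street="1 High Street", city="Milton Keynes")
sender = Entity(name="The Amazing Fun Shop", addresses=[home])
```

If multiple addresses later become a requirement, only the code that assumes a single address needs to change, not the data structure itself.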
These two examples highlight the three main tensions when modelling. You want to create a model that
can represent all of the requirements, you want to keep it as simple as possible, and you want to ensure
that the model can be maintained and extended in the future. Unfortunately there are no general rules
as to how these tensions can be balanced. There are really just the following two guidelines that should
be taken into account:
In general it is better to not overengineer a model. An overengineered model is one that can take
into account all kinds of future changes and additional requirements. That is useful if those
changes or additional requirements become necessary. However, until that comes to pass, the
additional complexity creates a lot more work and adds the potential for more bugs to arise.
It is good to design a model so that it can be adapted to future changes. The Entity and
Address modelling is a good example of not tying two parts of the model together too closely, thus
allowing for easier future change.
How to balance the two is something that only comes with experience. In particular, it comes from the
experience of getting the balance wrong and then having to put in a lot of effort to fix it.
10–15 minutes
As stated above, only practical experience can really help with developing instincts for how to
model. To develop these instincts it is also good to look at other people’s models.
In the current model the Entity is very underspecified. Spend a few minutes thinking about how
the Entity should be modelled. Things to consider are the Address issue mentioned above, but
also things such as:
Shipping is frequently an international scenario. How does that impact the model?
Share your thoughts and models on the forum to see how your fellow students approached the
problem and think about what the relative strengths and weaknesses of their and your solutions
are.
The tables that hold the actual data. This is what is actually meant by the 'relational' part of the
name, as it derives from the mathematical concept of a finitary relation, which is essentially
a set of values that are in some way related to each other. In the table structure, each set of values
makes up one row in the table.
A way of defining relationships between the tables, specifically a way of defining relationship
between individual rows in the tables.
Tables
The table is the core structure and getting it right in the database model can make the difference
between an application that is easy to develop for and maintain and one that is not. Before you look at
practically modelling something into tables, have a quick look at the basic structure of a table and its
elements. Look at Table 1, which is an example that stores the shipping activity information.
The example in Table 1 has two rows and four columns. Each row represents one set of values that
belong together, while each column represents one attribute that can be stored in the table. The four
columns in this case represent the following four attributes: a unique identifier, a reference to the
shipment the activities belong to, the activity description, and the timestamp at which the activity
occurred. Each of the four columns has a data type associated with it, in this case the first two are
numbers, the third a string, and the fourth a timestamp.
The first of these four attributes is particularly important. This unique identifier is what is known as the
primary key of the table. The primary key is defined as the one or more columns of the table where the
combination of the values in those columns is unique. This means that each combination of values
appears at most once in the table and can be used to uniquely identify a single row in the database.
From this follow the two rules for choosing a primary key: it must be unique and it must be unchanging.
When choosing a primary key, you could either use one or more of the actual attributes that the table has, which
would be called a 'natural' key, or you could create an additional attribute that contains a unique value
(generally a number) with no meaning beyond being the unique key (a 'surrogate' key). The activities table
here has two natural attributes: the activity description and the timestamp. Neither will uniquely identify an
activity amongst all activities, so we have created an additional surrogate key, which is just a number that
uniquely identifies each row.
An alternative is to look at the Shipment from the model. This already has an id attribute, which is
marked as a String, so can contain both letters and numbers. This seems like a good candidate for a
natural key. However, this is when the second rule comes into play: the primary key must never change,
because a change breaks any links from other tables that refer to that row.
So, is this id ever likely to change? Initially it looks unlikely, but if you
think of most shipping identifiers that consist of letters and numbers, then often the letters indicate
additional information, such as what long-distance shipping method to use (air, ship, lorry) or target area
(local, national, international). Where that is the case, there can be a scenario where the primary
key needs to be changed, because, for example, the shipping method changes. If you investigate most
natural keys, you will find that there are scenarios in which any of them can change.
Thus even in this case, it is probably best to use a surrogate key as the primary key. You would also
mark the id attribute as unique, so that there are no duplicates, but you would not rely on it being
unchanging. This is why the second column in the activities table refers to a numeric shipment id, rather
than the string id.
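The reasoning above can be made concrete with a small SQL sketch, here run through Python's built-in SQLite driver. The column names (such as shipment_code) are assumptions, not fixed by the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (
    id INTEGER PRIMARY KEY,       -- surrogate key: a meaningless number that never changes
    shipment_code TEXT UNIQUE     -- the 'natural' string id: unique, but allowed to change
);
CREATE TABLE activities (
    id INTEGER PRIMARY KEY,                        -- surrogate key for each activity
    shipment_id INTEGER REFERENCES shipments(id),  -- numeric foreign key, not the string code
    activity TEXT,
    timestamp INTEGER
);
""")
conn.execute("INSERT INTO shipments (id, shipment_code) VALUES (1, 'EO8379283-4324')")
conn.execute(
    "INSERT INTO activities (shipment_id, activity, timestamp) VALUES (?, ?, ?)",
    (1, "Shipment created", 1681901757),
)
# Changing the shipment code breaks nothing, because the link uses the surrogate key:
conn.execute("UPDATE shipments SET shipment_code = 'AO8379283-4324' WHERE id = 1")
row = conn.execute("SELECT activity FROM activities WHERE shipment_id = 1").fetchone()
```

The UNIQUE constraint on shipment_code still prevents duplicates, while all cross-table links depend only on the unchanging surrogate id.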
Relationships
The second major aspect of relational databases is that they provide a way of defining relationships
between rows in different tables (and also between rows in the same table). This is where the full power
of relational databases comes into play, because these relationships ensure that the
data remains consistent when it is changed. Relationships are marked out through a
so-called 'foreign key': a column whose value must also exist as a unique value in a column of
another table, acting as a pointer from the current row to another row.
In the example above (Table 1) the second column is such a foreign key, pointing to a primary key in the
shipments table (not shown here) and uniquely identifying the shipment that this specific activities
row relates to.
Because multiple rows in the activities table can point to the same row in the shipments table, this
is referred to as a one-to-many relationship (one shipment to many activities), which is one of the three
types of relationships that are possible:
one-to-one: this is a relatively infrequent scenario, where there is a unique mapping between rows
in the two tables. An example is a users table, which contains information on the individual users,
and a configurations table, which contains configuration settings for each user. It makes sense
to keep them separate, because they are conceptually different things, but at the same time each
user will only ever have one configuration and each configuration belongs to exactly one user.
To create such a one-to-one relationship, the foreign key is added to one of the two tables and is
also marked as either being that table's primary key or as being unique. That way it is possible to
link two rows, but the uniqueness constraint means that it is never possible to link one row to more
than one other row.
many-to-many: this relationship is used to model any kind of linkage between two tables that can
occur any number of times. Unlike the other two relationships, where the foreign key is placed in
one of the two tables, a many-to-many relationship requires an additional table that contains only
two foreign keys, generally referred to as a ‘link table’. A good example for such a scenario would
be a table with students and one with modules. A student can study multiple modules and a
module will have many students. To create the link between the two, you would add an extra table
student_modules that would contain two columns: student_id and module_id, which would
both be foreign keys pointing to the respective tables. The link table’s primary key would then be
the combination of the two columns.
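Both relationship types from the list above can be sketched in SQL, here using SQLite. The users/configurations and students/modules tables follow the examples in the text; the individual column names are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- one-to-one: the foreign key is also the primary key, so each user can
-- have at most one configurations row.
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE configurations (
    user_id INTEGER PRIMARY KEY REFERENCES users(id),
    theme TEXT
);
-- many-to-many: a link table whose primary key is the combination of the
-- two foreign keys.
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE modules (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE student_modules (
    student_id INTEGER REFERENCES students(id),
    module_id INTEGER REFERENCES modules(id),
    PRIMARY KEY (student_id, module_id)
);
""")

conn.execute("INSERT INTO users VALUES (1, 'Mark')")
conn.execute("INSERT INTO configurations VALUES (1, 'dark')")
try:
    # A second configuration for the same user violates the primary key.
    conn.execute("INSERT INTO configurations VALUES (1, 'light')")
    one_to_one_enforced = False
except sqlite3.IntegrityError:
    one_to_one_enforced = True

conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])
conn.execute("INSERT INTO modules VALUES (1, 'Databases')")
conn.executemany("INSERT INTO student_modules VALUES (?, ?)", [(1, 1), (2, 1)])
enrolled = [row[0] for row in conn.execute(
    "SELECT s.name FROM students s "
    "JOIN student_modules sm ON sm.student_id = s.id "
    "WHERE sm.module_id = 1 ORDER BY s.name")]
```

The join at the end shows the many-to-many link in use: both students are found for the one module they share.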
10–15 minutes
In Activity 2, you looked at modelling the Entity. Now, try turning the model you created into a
table structure, with one or more tables. Things to consider are:
How are you going to represent the address? As columns or as a separate table?
Share your thoughts and tables on the forum to see how your fellow students approached the
problem and think about what the relative strengths and weaknesses of their and your solutions
are.
Data definition language (DDL): this is the set of commands used to create and manipulate
tables and any other constraints and relationships.
Data control language (DCL): this is the set of commands used to manage access controls to the
data, for example to specify which users can read which tables, and which other tables they can
modify.
Data manipulation language (DML): this is the set of commands used to create data in the
tables created via the DDL and then access, update or delete that data.
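One statement from each of the three language families, sketched using Python's built-in SQLite driver. SQLite does not implement the DCL commands, so that one is shown as a comment only; the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure.
conn.execute("CREATE TABLE shipments (id INTEGER PRIMARY KEY, status TEXT)")

# DML: create, read and delete data in that structure.
conn.execute("INSERT INTO shipments (id, status) VALUES (1, 'created')")
status = conn.execute("SELECT status FROM shipments WHERE id = 1").fetchone()[0]
conn.execute("DELETE FROM shipments WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM shipments").fetchone()[0]

# DCL: not supported by SQLite, so shown as a comment only:
# GRANT SELECT ON shipments TO reporting_user;
```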
In practice, with many backend frameworks we no longer write SQL directly. Instead, we use what are
known as object-relational mapping (ORM) libraries, which translate from an object representation of the
data (useful for data transmission and also display in the frontend) into the SQL representation that the
database expects. The big advantage of this is that it becomes much easier to switch between
different databases, as the ORM library handles the subtle differences between how the database
systems implement SQL. When you come to the practical activities, you will use an ORM, rather than
dealing directly with SQL, which is also why we are not covering it in much detail.
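The idea behind an ORM can be illustrated with a toy sketch. This is not how a real ORM library such as SQLAlchemy works internally; it only shows the core principle of deriving SQL from an object representation:

```python
import sqlite3
from dataclasses import dataclass, fields

# Toy illustration only: real ORM libraries do far more (relationships,
# change tracking, dialect handling), but the core idea is the same.

@dataclass
class Shipment:
    id: int
    status: str

def save(conn, obj):
    """Derive an INSERT statement from a dataclass instance."""
    cols = [f.name for f in fields(obj)]
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        type(obj).__name__.lower() + "s",      # Shipment -> shipments
        ", ".join(cols),
        ", ".join("?" for _ in cols),
    )
    conn.execute(sql, tuple(getattr(obj, c) for c in cols))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (id INTEGER PRIMARY KEY, status TEXT)")
save(conn, Shipment(id=1, status="created"))
loaded = conn.execute("SELECT id, status FROM shipments").fetchone()
```

The application code works with Shipment objects throughout; only the mapping layer knows what SQL the database expects.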
The document database approach fundamentally breaks with this idea and works off the principle that
data should be stored in chunks in the database that map closely to the structure required by the
application. As a result of this, the basic structural element in the document database is the document,
rather than the table row. However, unlike the table row, where each column contains exactly one value,
the document allows for more complex internal structures.
If you look at the Shipment and Activity objects again, then in a document database these could be
modelled as follows:
id: "EO8379283-4324"
activities:
- activity: "Shipment picked up at the sender"
timestamp: 1681908984
- activity: "Shipment created"
timestamp: 1681901757
There are many different file formats for representing documents. Here we use YAML ('YAML Ain't
Markup Language'), simply because it is quite human-readable. The two main structures of YAML (and of
document databases) are objects and lists. Objects are based on key-value pairs (such as id:
"EO8379283-4324"), with the key (id) and value ("EO8379283-4324") separated by a :. Multiple keys
at the same level of indentation together form an object (such as id and activities). List elements are
introduced by a leading -, with multiple elements at the same level of indentation forming a list. In
the example above, there are three objects: id and activities together form the main object, and the two
activity and timestamp pairs form nested objects. The two nested objects together form a list with
two elements.
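In application code, such a document typically arrives as a nested structure of objects and lists. The same shipment document as a Python dictionary:

```python
# Objects become dicts, lists stay lists, so the nested structure
# survives as-is in the application.
shipment = {
    "id": "EO8379283-4324",
    "activities": [
        {"activity": "Shipment picked up at the sender", "timestamp": 1681908984},
        {"activity": "Shipment created", "timestamp": 1681901757},
    ],
}

# Because the activities travel with the shipment, questions such as
# 'what is the latest activity?' need no joins or extra queries:
latest = max(shipment["activities"], key=lambda a: a["timestamp"])
```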
In the document the id field plays the same role as the primary key, in that it uniquely identifies the
document in the database. However, most document databases allow for easier updating of this id field,
as long as it remains unique. Thus we can, where appropriate, use a ‘natural’ id value, such as in the
example above.
The second thing you see in the example above is that the activities have been included directly into the
shipment document as a list. This makes sense, because in practice, the activities and the shipment
will always be updated together, as the shipment makes its way through the shipping process. Similarly,
when either the sender or the recipient check on the progress of their shipment, they will always be
shown all this information together. Since we are always dealing with this data as a chunk, it also makes
sense to store it as a chunk.
Unlike in the relational world, document databases in general do not have an explicit way of
representing relationships between documents. However, that does not mean that we cannot add a field
to the document that contains as its value the id of another document. The only thing that we do not get
is the automatic consistency checking that the relational database provides us with. If we delete a row in
a relational database that is the target of a foreign key, then we get an error. In the same situation in a
document database, the reference field simply ends up containing the id of a document that no longer exists,
and no error is raised by the system when that happens. We need to check and correct for this
situation in our application code.
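A minimal sketch of such an application-level check, with in-memory dictionaries standing in for the database's document collections; the recipient_id field name and the entity id are assumptions used for illustration:

```python
# Dicts standing in for the document collections.
shipments = {
    "EO8379283-4324": {"id": "EO8379283-4324", "recipient_id": "entity-17"},
}
entities = {}  # the referenced entity document has been deleted

def resolve_recipient(shipment):
    """Return the referenced entity document, or None if the reference dangles."""
    return entities.get(shipment["recipient_id"])

# The database raises no error for the dangling reference; the application
# has to detect it and decide how to react.
recipient = resolve_recipient(shipments["EO8379283-4324"])
```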
10–15 minutes
In Activity 3 you modelled the Entity for a relational database. Now try modelling it in a document
database format. Things to consider are:
Should the sender and recipient be modelled inside the Shipment document or use
references?
Share your thoughts and documents on the forum to see how your fellow students approached the
problem and think about what the relative strengths and weaknesses of their and your solutions
are.
Search systems share the external data model with document-centric databases, in that the main model
for representing data to be added to the search system or data returned by the search system is the
document. Where the search systems differ is how that data is then stored within the system. There is
some variation between search systems in the storage structure, mainly depending on what kind of
searches the system is designed to support. These variations all build upon the basic data structure of
the search index.
If you work on the assumption that you have the following two shipment documents to index in the
system:
- id: FJ-3429-432
title: Your parcel from The Amazing Fun Shop
- id: VF-9302-4392
title: Consignment #490284
The search system is designed to allow the user to search both by the id and by parts of
the title. To enable that to work efficiently, the search system turns the two documents into
the following index:
Table 2 Search index consisting of the term and document identifier columns
FJ FJ-3429-432
3429 FJ-3429-432
432 FJ-3429-432
your FJ-3429-432
parcel FJ-3429-432
from FJ-3429-432
the FJ-3429-432
amazing FJ-3429-432
fun FJ-3429-432
shop FJ-3429-432
VF VF-9302-4392
9302 VF-9302-4392
4392 VF-9302-4392
consignment VF-9302-4392
#490284 VF-9302-4392
You can see that the index consists of only two columns: the term that can be searched for, and the
identifier of the document that the term appears in. If you now search for the term 'amazing
shop', the search system can find all documents that the term 'amazing' appears in, and also all
documents that the term 'shop' appears in. It can then return a list combining those two sets,
ordered so that documents in which both terms appear come first, a process known as 'ranking'.
If you look at the values in detail, you can see that the title has simply been split on whitespace. The
identifier has also been split up, but using the '-' character, to allow searching within the identifier. This
could be useful if a shipping label is damaged and there is a need to search for a partial shipping identifier.
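The index in Table 2 can be rebuilt with a few lines of Python. The splitting rules follow the text (titles on whitespace, identifiers on '-'), and ranking simply counts how many query terms each document contains; note that the second document's title is reconstructed from its index terms, so treat it as an assumption:

```python
from collections import defaultdict

docs = {
    "FJ-3429-432": "Your parcel from The Amazing Fun Shop",
    "VF-9302-4392": "Consignment #490284",  # title assumed from its index terms
}

# Build the index: term -> set of document identifiers (Table 2 as a dict).
index = defaultdict(set)
for doc_id, title in docs.items():
    for term in doc_id.split("-"):       # identifiers are split on '-'
        index[term].add(doc_id)
    for term in title.lower().split():   # titles are split on whitespace
        index[term].add(doc_id)

def search(query):
    """Return document ids, ranked by how many query terms they contain."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, set()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda doc_id: -scores[doc_id])

results = search("amazing shop")  # only the FJ document contains these terms
```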
This is a very simple search index and modern search systems use more complex data structures, but
they all share this basic principle that there are links from potential search terms to document identifiers.
The more complex structures allow for things like searching for prefixes. If you have ever encountered a
search box that starts showing results as soon as you start typing in it, then that is using prefix search.
Search indexes can also be extended to support things such as faceted search. Faceted search is
where certain fields in the original document are not split up into individual terms, but used as they are.
Where the number of values for a single field is relatively limited, this can be useful, because the list of
values can be shown to the user and they can limit their search simply by clicking on a value in the
interface.
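A minimal sketch of the faceting idea, where one field is kept whole rather than split into terms. The field name and its values (taken from the shipping methods mentioned earlier) and the third document id are illustrative assumptions:

```python
from collections import Counter

# Documents with a field that is indexed whole, not split into terms.
docs = [
    {"id": "FJ-3429-432", "shipping_method": "air"},
    {"id": "VF-9302-4392", "shipping_method": "ship"},
    {"id": "KL-1111-222", "shipping_method": "air"},  # hypothetical extra document
]

# The facet values and their counts can be shown to the user as filters:
facet_counts = Counter(d["shipping_method"] for d in docs)

# Clicking the 'air' value limits the results to matching documents:
air_only = [d["id"] for d in docs if d["shipping_method"] == "air"]
```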
The most complex aspect of any search system is 'ranking' the search results. The success of any
search system is primarily defined by the quality of its ranking. At the most basic level, the 'best'
ranking can be defined by the degree to which the search terms and the text in the documents overlap.
When the number of search terms becomes larger, however, it becomes necessary to also take into account
how relevant each term is for describing a document's content. Similarly, in some contexts we may want to
include factors such as user-provided judgements in the search results, to ensure that more highly
judged documents are returned further up the search list.
5 Summary
You have now completed the third part of Block 1 and have had an introduction to the core architectures
of the web and how reality can be represented within the various data storage systems that exist. You’ve
also had the opportunity to try out modelling a few problems yourself.
Where next?
You are now ready to move on to Block 1 Part 4, where you will look at turning these architectures
and data storage models into working frontend applications.
References
Bagui, S. and Earp, R. (2011) Database design using entity-relationship diagrams. 2nd edn. Boca
Raton, FL: Auerbach Publications.
Bucchiarone, A., Dragoni, N., Dustdar, S., Lago, P., Mazzara, M., Rivera, V. and Sadovykh, A. (eds)
(2020) Microservices: science and engineering. Cham: Springer. Available at: https://doi.org/10.1007/
978-3-030-31646-4
DeBarros, A. (2022) Practical SQL: a beginner's guide to storytelling with data. 2nd edn. San
Francisco: No Starch Press.
Dikmans, L. and Van Luttikhuizen, R. (2012) SOA made simple: discover the true meaning behind the
buzzword that is ‘service oriented architecture’. Birmingham: Packt Publishing.
Eco, U. (1994) ‘On the impossibility of drawing a map of the empire on a scale of 1 to 1’ in How to travel
with a salmon and other essays. 2nd edn. Translated from the Italian by W. Weaver. London: Minerva,
pp. 95–106.
Goniwada, S. R. (2022) Cloud native architecture and design: a handbook for modern day architecture
and design with enterprise-grade examples. Berkeley, CA: Apress.
Grässle, P., Baumann, H. and Baumann, P. (2005) UML 2.0 in action: a project-based tutorial.
Birmingham: Packt Publishing (From Technologies to Solutions).