12 Factor Apps and MLOps Maturity
I. Codebase: One codebase tracked in revision control, many deploys
Put all code in source control, in one repository, all the time. Developers fork or branch the codebase, modify it, and run it on their own development VMs. Changes are committed to the branch. When the work is ready, a pull request is opened to merge the branch into main after review, producing a new version. Over time, the same codebase is deployed to any number of other environments, including many sets of testing machines and ultimately the live production servers.
II. Dependencies: Explicitly declare and isolate dependencies
Every environment the code runs in needs certain dependencies, such as a database, an image-processing library, or a command-line tool. Never let an application assume those things will already be in place on a given machine. Guarantee them instead by declaring the dependencies in the package description. Most languages and frameworks provide a natural way to do this: list the exact version of every library the code expects, and when the code is deployed, run a command that downloads those versions and puts them in place. In R, use renv, or build the code as packages curated through a package manager. The same philosophy extends to managing entire machine configurations with tools like Docker.
Importance: High. Without this, the team will suffer a constant, slow drain of confusion and frustration, multiplied by the size of the team and the number of applications. Spare yourselves.
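As a minimal sketch in Python (in R, renv plays this role), an application can check at startup that its declared dependencies are installed at the pinned versions. The package names and versions in `PINNED` are hypothetical placeholders. A real project would read them from its lockfile rather than hard-code them.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical pins; a real project would read these from its lockfile
# (for example requirements.txt or poetry.lock, or renv.lock in R).
PINNED = {
    "numpy": "1.26.4",
    "pandas": "2.2.2",
}


def check_dependencies(pinned: dict) -> list:
    """Return one message per unmet pin; an empty list means all pins match."""
    problems = []
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, expected {wanted}")
    return problems
```

At startup, the app could call `check_dependencies(PINNED)` and refuse to run if the list is non-empty. A missing or drifted library then fails loudly, instead of surfacing later as a confusing bug.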
III. Config: Store config in the environment
The code that talks to your database will always be the same. But the location of that database (which machine it runs on) will differ between a local developer machine and the production servers. Likewise, in the testing environment the team will want to log debugging information about each web request, but in production that would be overkill. The same principle applies to blob storage locations and to how many parallel compute cores the code should use.
Usernames and passwords for various servers and services also count as configuration and should never be stored in the code. This matters all the more because the code lives in source control (see I. above). Anyone with access to the source would therefore know all your service passwords, a serious security hole that grows with the team.
All configuration data must be stored strictly separately from the code and read in by the code at runtime, with each deployment supplying the values for its environment.
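One common way to keep config out of the code is to read it from environment variables at runtime. This is also the convention the twelve-factor methodology itself recommends. The sketch below assumes that convention. The variable names (`DATABASE_URL`, `LOG_LEVEL`, `BLOB_CONTAINER`, `WORKER_CORES`) are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str     # includes host and credentials; never committed to source
    log_level: str        # e.g. DEBUG in testing, INFO in production
    blob_container: str   # where data and model artifacts are stored
    worker_cores: int     # how many parallel cores this deployment may use


def load_settings(env=None) -> Settings:
    """Build settings from environment variables (variable names are hypothetical)."""
    env = os.environ if env is None else env
    return Settings(
        database_url=env["DATABASE_URL"],  # required: fail fast with KeyError if unset
        log_level=env.get("LOG_LEVEL", "INFO"),
        blob_container=env.get("BLOB_CONTAINER", "local-dev"),
        worker_cores=int(env.get("WORKER_CORES", os.cpu_count() or 1)),
    )
```

Because `load_settings` accepts any mapping, tests can pass a plain dictionary. Deployments rely on the real environment.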
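IV. Backing services: Treat backing services as attached resources
Treating services such as databases and caches as attached resources, located through configuration, allows great flexibility. Someone on your team could replace a local instance of Redis with one served by Amazon ElastiCache, and the code wouldn’t have to change.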
This is another case where defining dependencies cleanly keeps the system flexible, with each part abstracted from the complexities of the others (very much a core tenet of good architecture).
Importance: High. Given how readily today’s frameworks bind to services, there’s little reason not to adhere to this best practice.
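Here is a minimal sketch of a backing service treated as an attached resource. It assumes the cache location arrives as a URL in a hypothetical `CACHE_URL` variable. Locally, the URL can point at an in-process stand-in. In production, it could point at a managed Redis endpoint, such as one provided by Amazon ElastiCache. The `redis` branch assumes the third-party redis-py client is installed.

```python
import os
from urllib.parse import urlparse


class InMemoryCache:
    """Hypothetical in-process stand-in for a cache during local development."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def connect_cache(url: str):
    """Attach a cache based only on its URL; calling code never changes."""
    scheme = urlparse(url).scheme
    if scheme == "memory":
        return InMemoryCache()
    if scheme in ("redis", "rediss"):
        import redis  # redis-py, imported lazily so local runs don't need it
        return redis.Redis.from_url(url)
    raise ValueError(f"unsupported cache scheme: {scheme!r}")


cache = connect_cache(os.environ.get("CACHE_URL", "memory://"))
```

With this setup, swapping a local Redis for ElastiCache means changing `CACHE_URL` in that environment’s config, with no code change. One caveat of this sketch: redis-py returns bytes from `get`, so real code would need a consistent serialization layer.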
V. Build, release, run: Strictly separate build and run stages
The build is the process of turning the code into a bundle of scripts, assets, and binaries that run the code. In R, the build assembles package elements such as documentation, unit tests, and binaries. The release sends that build to a server as a fresh package, together with the cleanly separated config files for that environment (see III. above). Finally, the run stage starts the code so the application is available on those servers.
The idea is that the build stage, which developers manage, does most of the heavy lifting. The run stage should be simple and bulletproof so that the team can sleep soundly through the night. The team should know that the application is running well. They should also know that if a machine restarts (say, after a power failure), the app will start up again on its own without human intervention.
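The release and run stages can be sketched in Python. The sketch assumes the build stage has already produced a single artifact; the file name used in testing, `mymodel_1.2.0.tar.gz`, is hypothetical. The release stage pairs one build with one environment’s config. It stamps the result with an ID derived from both, so any config change yields a new, read-only release. The run stage only launches what a release describes.

```python
import hashlib
import json
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Release:
    release_id: str
    artifact: str             # output of the build stage, e.g. a package tarball
    config: MappingProxyType  # read-only view of this environment's config


def make_release(artifact: str, config: dict) -> Release:
    """Release stage: combine one build with one environment's config."""
    payload = json.dumps({"artifact": artifact, "config": config}, sort_keys=True)
    release_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return Release(release_id, artifact, MappingProxyType(dict(config)))


def run(release: Release) -> str:
    """Run stage: no building or patching, just start what the release describes."""
    return f"starting {release.artifact} as release {release.release_id}"
```

Keeping releases immutable and uniquely identified also makes rollback simple: re-run an earlier release rather than patching a live server.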
VI. Processes: Execute the app as one or more stateless processes
Say our app has a signup workflow in which a user fills in three screens of information to create a profile. One (wrong) model would be to keep each intermediate state in the memory of the running process and route the user back to the same server until signup is complete. The right approach is to store intermediate data in a database or persistent key-value store. Then, even if the web server goes down in the middle of the user’s signup, another web server can handle the traffic and the system is none the wiser.
Importance: High. Not only is a stateless app more robust, it is also easier to manage, generally has fewer bugs, and scales better.
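The following is a minimal sketch of the stateless signup flow. A plain dictionary stands in for the persistent store; in production this would be a database or a key-value service such as Redis. Because the handler keeps nothing in memory between requests, any server sharing the store can process any step.

```python
import json
import uuid

SIGNUP_STEPS = ("account", "profile", "preferences")


class KeyValueStore:
    """Stand-in for a persistent store shared by every web server (assumed)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


def handle_signup_step(store, signup_id, step, form_data):
    """Stateless handler: all state comes from the request and the shared store."""
    if step not in SIGNUP_STEPS:
        raise ValueError(f"unknown signup step: {step}")
    signup_id = signup_id or str(uuid.uuid4())
    key = f"signup:{signup_id}"
    raw = store.get(key)
    state = json.loads(raw) if raw else {}
    state[step] = form_data
    store.put(key, json.dumps(state))
    return signup_id, all(s in state for s in SIGNUP_STEPS)
```

Server A could handle the first screen and server B the remaining two. As long as both call `handle_signup_step` with the same store and `signup_id`, the user never notices which server answered.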
VII. Port binding: Export services via port binding
This factor is an extension of factor IV. above. Just like the backing services it consumes, the application itself interfaces with the world through a simple URL.
Most of the time we get this for free, because the application already presents itself through a web server. But say we have an API used both by customers in the outside world (untrusted) and by an internal website (trusted). We might create a separate URL to the API for the website that doesn’t go through the same security layers (firewall and authentication), making it a bit faster for us than for untrusted clients.
Importance: Low. Most runtime frameworks give you this for free. If not, don’t sweat it: it’s a clean way to work, and it’s generally not hard to change later.
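In twelve-factor terms, the app is self-contained and exports its service by binding to a port. Below is a minimal Python sketch that uses only the standard library. It assumes the platform passes the port in a `PORT` environment variable; this is a common convention, but it is an assumption here.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers every GET with a plain-text 'ok' (a placeholder endpoint)."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=None) -> HTTPServer:
    """Bind to the given port, else $PORT, else 8000; port 0 picks a free port."""
    if port is None:
        port = int(os.environ.get("PORT", "8000"))
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

Calling `make_server().serve_forever()` starts the service. The internal URL for the trusted website could then be a second listener on a different port that is not exposed through the firewall.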
VIII. Concurrency: Scale out via the process model
Keep the small parts of the application working independently and run them as separate processes (in the low-level, operating-system sense); the application will then scale better. In particular, it can do more work concurrently, either by smoothly adding servers or by adding CPU and RAM and taking full advantage of them through more of these small, independent processes.
Importance: Low. Trust the data architect to raise a red flag if this is going to become an issue.
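The process model can be sketched with Python’s standard library. Independent units of work are fanned out to a pool of worker processes whose size comes from config, so scaling up means raising a number rather than rewriting code. The names `score_batch` and `WORKER_PROCESSES` are hypothetical. The `executor_cls` parameter exists only so the same code can be exercised with threads in tests.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def score_batch(batch):
    """Hypothetical unit of work: 'score' a batch by summing squares."""
    return sum(x * x for x in batch)


def run_workers(batches, workers=None, executor_cls=ProcessPoolExecutor):
    """Fan batches out to independent workers; results come back in input order."""
    if workers is None:
        workers = int(os.environ.get("WORKER_PROCESSES", os.cpu_count() or 1))
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(score_batch, batches))
```

On platforms that start worker processes with `spawn`, call `run_workers` from under an `if __name__ == "__main__":` guard.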
IX. Disposability: Maximize robustness with fast startup and graceful shutdown
When deploying new code, we want the new version to launch right away and start handling traffic. Suppose an application has to do 20 seconds of work (say, loading giant mapping files into RAM) before it’s ready for real traffic. We’ve then made it harder to release code rapidly, and we’ve added churn to the system every time independent processes are stopped and started.
Importance: Medium. Be sure to understand the implications. They depend on how often you release new code (hopefully many times per day) and how much you have to scale your app up and down with traffic.
X. Dev/prod parity: Design for continuous deployment by keeping development, staging, and production all as similar as possible
In recent years it has become common to shorten the cycle between developing a change to an app and deploying that change into production; at many companies it takes a matter of hours. To support that shorter cycle, and to reduce the risk that something breaks on entering production, it’s desirable to keep each developer’s local environment as similar as possible to production.
This means using the same backing services, the same configuration
management techniques, the same versions of package libraries, and so on.
Importance: Medium. Developers will be tempted to take shortcuts if their local environment works “well enough”. Onboarding new personnel is also much easier when the entire team has nearly identical environments and tools.
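One lightweight way to spot parity drift is to compare the library versions installed in two environments. Here is a minimal Python sketch. It assumes each environment can export a name-to-version snapshot; how the snapshots are transported between environments is left out.

```python
from importlib.metadata import distributions


def snapshot() -> dict:
    """Map each installed distribution name (lower-cased) to its version."""
    return {
        dist.metadata.get("Name").lower(): dist.version
        for dist in distributions()
        if dist.metadata.get("Name")
    }


def parity_report(dev: dict, prod: dict) -> list:
    """List every package whose version differs or is missing between environments."""
    report = []
    for name in sorted(set(dev) | set(prod)):
        d, p = dev.get(name), prod.get(name)
        if d != p:
            report.append(f"{name}: dev={d or 'missing'} prod={p or 'missing'}")
    return report
```

An empty report means the two snapshots agree; anything else is a concrete list of drift to fix.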
XI. Logs: Treat logs as event streams
Log files keep track of a variety of things, from the mundane (your app has started successfully) to the critical (users are receiving thousands of errors). At the very least, the app should write errors to stdout. The environment should then capture that stream and send the errors to an error-reporting service.
Importance: Low. If you are relying on logs as a primary forensic tool, you are probably already missing out on better solutions. Be sure to consolidate your logs for convenience, but beyond that, don’t worry about being a purist here.
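Following the twelve-factor convention, the app itself only writes its log stream to stdout. Capture and routing (to files, a log aggregator, or an error-reporting service) are left to the environment. Here is a minimal sketch with Python’s standard `logging` module; the logger name `app` is hypothetical.

```python
import logging
import sys


def configure_logging(level: str = "INFO", stream=None) -> logging.Logger:
    """Send the app's log events, one per line, to stdout (or a given stream)."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    logger = logging.getLogger("app")
    logger.handlers[:] = [handler]  # avoid duplicate handlers if called twice
    logger.setLevel(level)          # the level itself is configuration, set per environment
    logger.propagate = False
    return logger
```

Because the app never opens log files itself, the same code works unchanged on a laptop, in a container, or on a managed platform that collects stdout.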
XII. Admin processes: Run admin/management tasks as one-off processes
You’ll want to perform many one-off administrative tasks once you have a live app. Examples include cleaning up bad data you discover, running analytics for a presentation you’re putting together, and turning features on and off for A/B testing.
Usually a developer runs these tasks. When they do, they should run them from a machine in the production environment that is running the latest version of the production code. In other words, run one-off admin tasks in an environment identical to production. Don’t run updates directly against a database, and don’t run them from a local terminal window.
Summary
Some of these items may seem esoteric, as they are rooted in fundamental systems design debates. But at the heart of a happily running system is an architecture that is robust, reliable, and surprises us as little as possible. These 12 factors are being adopted by most major software platforms and frameworks, and cutting corners against their grain is a bad idea. Discuss these issues with the DevOps team and investigate whether there are quick wins to improve the quality of your application design.