The Little Ecto Cookbook
Table of contents
Foreword
Ecto is not your ORM
Schemaless queries
Data mapping and validation
Dynamic queries
Multi tenancy with query prefixes
Aggregates and subqueries
Test factories
Constraints and Upserts
Polymorphic associations with many to many
Composable transactions with Multi
Replicas and dynamic repositories
Foreword
Ecto is one of the oldest projects in the Elixir community. It started as a "Summer of
Code" project back in 2013, led by Eric Meadows-Jönsson, with myself (José
Valim) as a mentor, and sponsored by Pragmatic Programmers and Interline
Vacations.
At the time, Elixir was still being defined as a programming language. For
instance, structs were not yet part of Elixir! And Ecto played a very important
role for the development of Elixir itself. I will explain why.
When I started designing Elixir, I knew I would eventually use Elixir for
building web applications and systems, as that's part of our daily job at
Plataformatec, the company behind Elixir. At the same time, I didn't want Elixir
to be a language focused on just Web programming. Quite the opposite, Elixir
should be applicable to a wide variety of problems and domains. Therefore, Elixir
should be designed as an extensible language, and eventually Elixir would be
extended to the Web domain. With this in mind, "Extensibility" became one of
the three language goals, alongside "Productivity" and "Compatibility" (with the
Erlang VM).
I spent months working on Elixir, with those goals in mind, but at some point we
would need to test if Elixir was truly an extensible language. Ecto would
eventually become one of those tests.
Still, working on Ecto was very exciting because it brought two interesting
challenges:
1. Can we write a performant, safe, and readable query language? At the time,
the benchmark was LINQ from .NET. However, as the name says, LINQ
was a Language Integrated Query, i.e. they had to change the host
programming language (C#) to support it. Therefore, could Ecto implement
something akin to LINQ, but without a need to change Elixir?
2. How should Ecto approach data mapping and validation? Object-Relational
Mappers were the common answer at the time, but how much of that approach
fits a functional language?
For the first question, we got the query language surprisingly right. The query
language we use today is still based on the one from our early drafts. We added
more features with time, made the syntax less verbose and more dynamic, but
the foundation is still the same and, more importantly, we didn't have to change
Elixir itself to make Ecto Query possible!
When it comes to the second question though, we got a bunch of things right and
a bunch of things wrong (although strangers on the internet are not always so
pleasant when you get things wrong).
For example, Ecto v1.0 had models and life-cycle callbacks, in a poor attempt to
emulate features found in patterns like Active Record. At the same time, when
implementing other features commonly associated with ORMs (Object-Relational
Mappers), such as dirty tracking, we were able to provide a lean and functional
solution via Ecto Changesets, which are well accepted to this day (the original
proposal dates back to Jan 2015).
Ecto also hit the mark when it comes to relying on the database strengths,
instead of using databases as dumb storage. Many applications that rely on
ORMs that treat the database as dumb storage end up with corrupt and
inconsistent data. Ecto, on the other hand, knows how to walk the line between
validations and constraints quite well. We will revisit the topic of ORMs as the
first recipe in the book.
Nevertheless, Ecto v1.0 had enough weaknesses that eventually led to Ecto v2.0.
When Ecto v2.0 was released, Plataformatec also announced the "What's new in
Ecto 2.0" ebook, which told readers how to migrate away from the "model
mindset", common in other languages and frameworks, to a more functional
approach. The ebook also covered many of the new Ecto features.
Eventually, the Ecto team started working on Ecto v3.0. The jump from Ecto
v2.0 to Ecto v3.0 was much smaller than from Ecto v1.0 to Ecto v2.0. That's
because Ecto v3.0 was mostly about solidifying the choices made in Ecto v2.0.
For example, it turned out many developers were using the facilities in Ecto to
work with data that would never touch the database, so we broke Ecto apart into
Ecto and Ecto.SQL. Ecto v3.0 was also a good opportunity to remove outdated
code and do further performance improvements. We wrote a series of articles on
those changes on the Plataformatec blog.
After Ecto v3.0 was released, we also decided to open source the "What's new in
Ecto 2.0" ebook as Ecto guides. This cookbook is a curation of the available
Ecto guides into recipes. You can find the complete list of guides online. If you
find any typos or errors, you can fix them directly at the source.
But perhaps the most important news in Ecto v3.0 is that Ecto has finally
become a stable API. In other words, while we will continue releasing new Ecto
versions with bug fixes, enhancements, and performance improvements, we
don't have any plans for a new major version (v4.0). We know Ecto is not
perfect (nothing is) but today it is clear what Ecto is and what Ecto isn't. If Ecto
was never your cup of tea, now is a great time to explore different solutions, and
build on top of the foundation of the database drivers built by the Ecto team and
the Ecto community.
Acknowledgments
We want to thank the Ecto team for their fantastic work behind Ecto: Eric
Meadows-Jönsson, James Fish, José Valim, Michał Muskała, and Wojtek Mach.
We also thank everyone who has contributed to Ecto, be it with code,
documentation, by writing articles, giving presentations, organizing workshops,
etc. We also want to thank contributors to the Ecto guides, whose work and
changes are featured in this ebook.
Finally we appreciate everyone who has reviewed the "What's new in Ecto 2.0"
ebook, before it was made open source, and sent us feedback: Adam Rutkowski,
Alkis Tsamis, Christian von Roques, Curtis Ekstrom, Eric Meadows-Jönsson,
Jeremy Miranda, John Joseph Sweeney, Kevin Baird, Kevin Rankin, Michael
Madrid, Michał Muskała, Po Chen, Raphael Vidal, Steve Pallen, Tobias Pfeiffer,
Victoria Wagman and Wojtek Mach.
Ecto is not your ORM
Depending on your perspective, this is a rather bold or obvious statement to start
this book. After all, Elixir is not an object-oriented language, so Ecto can't be an
Object-Relational Mapper. However, this statement is slightly more nuanced than
it looks and there are important lessons to be learned here.
O is for Objects
At its core, objects couple state and behaviour together. In the same user
object, you can have data, like the user.name , as well as behaviour, like
confirming a particular user account via user.confirm() . While some
languages enforce different syntaxes between accessing data ( user.name
without parentheses) and behaviour ( user.confirm() with parentheses), other
languages follow the Uniform Access Principle in which an object should not
make a distinction between the two syntaxes. Eiffel and Ruby are languages that
follow such a principle.
Elixir fails the "coupling of state and behaviour" test. In Elixir, we work with
different data structures such as tuples, lists, maps and others. Behaviour cannot
be attached to data structures. Behaviour is always added to modules via
functions.
When there is a need to work with structured data, Elixir provides structs.
Structs define a set of fields. A struct will be referenced by the name of the
module where it is defined:
defmodule User do
defstruct [:name, :email]
end
Once a user struct is created, we can access its email via user.email . However,
structs are only data. It is impossible to invoke user.confirm() on a particular
struct in a way that executes code related to e-mail confirmation.
defmodule User do
defstruct [:name, :email]
def confirm(user) do
# Confirm the user email
end
end
Even with the definition above, it is impossible in Elixir to confirm a given user
by calling user.confirm() . Instead, the User prefix is required and the user
struct must be explicitly given as argument, as in User.confirm(user) . At the
end of the day, there is no structural coupling between the user struct and the
functions in the User module. Hence Elixir does not have methods, it has
functions.
Relational mappers
An Object-Relational Mapper is a technique for converting data between
incompatible type systems, commonly from databases to objects and back.
Similarly, Ecto provides schemas that map any data source into an Elixir struct.
When applied to your database, Ecto schemas are relational mappers. Therefore,
while Ecto is not a relational mapper, it contains a relational mapper as part of
the many different tools it offers.
For example, the schema below ties the fields name , email , inserted_at and
updated_at to fields similarly named in the users table:
defmodule MyApp.User do
use Ecto.Schema
schema "users" do
field :name
field :email
timestamps()
end
end
The appeal behind schemas is that you define the shape of the data once and you
can use this shape to retrieve data from the database as well as coordinate
changes happening on the data:
MyApp.User
|> MyApp.Repo.get!(13)
|> Ecto.Changeset.cast(%{name: "new name"}, [:name, :email])
|> MyApp.Repo.update!
By relying on the schema information, Ecto knows how to read and write data
without extra input from the developer. In small applications, this coupling
between the data and its representation is desired. However, when used wrongly,
it leads to complex codebases and sub par solutions.
Here are some examples of issues often associated with ORMs that Ecto
developers may run into when using schemas:
Projects using Ecto may end up with "God Schemas", commonly referred to
as "God Models", "Fat Models" or "Canonical Models" in some languages
and frameworks. Such schemas could contain hundreds of fields, often
reflecting bad decisions made at the data layer. Instead of providing one
single schema with fields that span multiple concerns, it is better to break
the schema across multiple contexts. For example, instead of a single
MyApp.User schema with dozens of fields, consider breaking it into
MyApp.Accounts.User , MyApp.Purchases.User and so on, each struct
with fields exclusive to its enclosing context.
Developers may excessively rely on schemas when sometimes the best way
to retrieve data from the database is into regular data structures (like maps
and tuples) and not pre-defined shapes of data like structs. For example,
when doing searches, generating reports and so on, there is no reason to
rely on or return schemas from such queries, as they often depend on data
coming from multiple tables with different requirements.
Developers may try to use the same schema for operations that may be
quite different structurally. Many applications would bolt features such as
registration and account login onto a single User schema, while handling each
operation individually, possibly using different schemas, would lead to
simpler and clearer solutions.
In any case, for a book called "The little Ecto cookbook", this chapter does not
look like a recipe at all. And you are right! In our defense, knowing "what we
must not do" is sometimes as important as "knowing what to do". So consider
this chapter our "don't put metals in the microwave" warning. :)
In the next two chapters, we will go back to the planned schedule and learn
different recipes on how to break apart the "bad practices" above by exploring
how to use Ecto without schemas or even with multiple schemas per context. By
learning how to insert, delete, manipulate and validate data with and without
schemas, we hope developers will feel comfortable with building complex
applications without relying on one-size-fits-all schemas.
Schemaless queries
Most queries in Ecto are written using schemas. For example, to retrieve all
posts in a database, one may write:
MyApp.Repo.all(Post)
In the construct above, Ecto knows all fields and their types in the schema,
rewriting the query above to:
query =
from p in Post,
select: %Post{title: p.title, body: p.body, ...}
MyApp.Repo.all(query)
Although you might use schemas for most of your queries, Ecto also adds the
ability to write regular schemaless queries when preferred.
One example is this ability to select all desired fields without duplication:
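For example, a list of fields can be given directly to select (a minimal sketch):

MyApp.Repo.all(from p in Post, select: [:title, :body])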
When a list of fields is given, Ecto will automatically convert the list of fields to
a map or a struct.
Support for passing a list of fields or keyword lists is available to almost all
query constructs. For example, we can use an update query to increment the
page views of a given post without a schema:

def increment_page_views(post) do
  query =
    from "posts",
      where: [id: ^post.id],
      update: [inc: [page_views: 1]]

  MyApp.Repo.update_all(query, [])
end
Let's take a look at another example. Imagine you are writing a reporting view. It
may be counter-productive to think about how your existing application schemas
relate to the report being generated. It is often simpler to write a query that returns
only the data you need, without trying to fit the data into existing schemas:

import Ecto.Query

def running_activities(start_at, end_at) do
  query =
    from a in "activities",
      where: a.start_at > type(^start_at, :naive_datetime),
      where: a.end_at < type(^end_at, :naive_datetime),
      group_by: a.activity_type,
      select: %{type: a.activity_type, count: count(a.id)}

  MyApp.Repo.all(query)
end
The function above does not rely on schemas. It returns only the data that
matters for building the report. Notice how we use the type/2 function to
specify the expected type of the argument we are interpolating,
benefiting from the same type casting guarantees a schema would give.
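Likewise, inserts can be done without going through changesets, with insert_all
receiving entries as plain keyword lists or maps: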
MyApp.Repo.insert_all(
Post,
[
[title: "hello", body: "world"],
[title: "another", body: "post"]
]
)
It is not hard to see how these operations directly map to their SQL variants,
keeping the database at your fingertips without the need to intermediate all
operations through schemas.
Data mapping and validation
We will take a look at the role schemas play when validating and casting data
through changesets. As we will see, sometimes the best solution is not to
completely avoid schemas, but break a large schema into smaller ones. Maybe
one for reading data, another for writing. Maybe one for your database, another
for your forms.
An Ecto schema is used to map any data source into an Elixir struct.
For instance, when you write a web application using Phoenix and you use Ecto
to receive external changes and apply such changes to your database, we have
this mapping:
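Database <-> Ecto schema <-> Forms / API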
Although there is a single Ecto schema mapping to both your database and your
API, in many situations it is better to break this mapping in two. Let's see some
practical examples.
Imagine you are working with a client that wants the "Sign Up" form to contain
the fields "First name", "Last name" along side "E-mail" and other information.
You know there are a couple problems with this approach.
First of all, not everyone has a first and last name. Although your client is
decided on presenting both fields, they are a UI concern, and you don't want the
UI to dictate the shape of your data. Furthermore, you know it would be useful
to break the "Sign Up" information across two tables, the "accounts" and
"profiles" tables.
Given the requirements above, how would we implement the Sign Up feature in
the backend?
One approach would be to have two schemas, Account and Profile, with virtual
fields such as first_name and last_name , and use associations alongside
nested forms to tie the schemas to your UI. One such schema would be:
defmodule Profile do
use Ecto.Schema
schema "profiles" do
field :name
field :first_name, :string, virtual: true
field :last_name, :string, virtual: true
...
end
end
It is not hard to see how we are polluting our Profile schema with UI
requirements by adding fields such as first_name and last_name . If the Profile
schema is used for both reading and writing data, it may end up in an awkward
place where it is not useful for either, as it contains fields that map to just one or
the other operation.
One alternative solution is to break the "Database <-> Ecto schema <-> Forms /
API" mapping in two parts. The first will cast and validate the external data with
its own structure which you then transform and write to the database. For such,
let's define a schema named Registration that will take care of casting and
validating the form data exclusively, mapping directly to the UI fields:
defmodule Registration do
use Ecto.Schema
embedded_schema do
field :first_name
field :last_name
field :email
end
end
fields = [:first_name, :last_name, :email]

changeset =
  %Registration{}
  |> Ecto.Changeset.cast(params["sign_up"], fields)
  |> validate_required(...)
  |> validate_length(...)
Now that the registration changes are mapped and validated, we can check if the
resulting changeset is valid and act accordingly:
if changeset.valid? do
# Get the modified registration struct from changeset
registration = Ecto.Changeset.apply_changes(changeset)
account = Registration.to_account(registration)
profile = Registration.to_profile(registration)
MyApp.Repo.transaction fn ->
MyApp.Repo.insert_all "accounts", [account]
MyApp.Repo.insert_all "profiles", [profile]
end
{:ok, registration}
else
# Annotate the action so the UI shows errors
changeset = %{changeset | action: :registration}
{:error, changeset}
end
def to_account(registration) do
Map.take(registration, [:email])
end
def to_profile(%{first_name: first, last_name: last}) do
%{name: "#{first} #{last}"}
end
In the example above, by breaking apart the mapping between the database and
Elixir and between Elixir and the UI, our code becomes clearer and our data
structures simpler.
Schemaless changesets
Although we chose to define a Registration schema to use in the changeset,
Ecto also allows developers to use changesets without schemas. We can
dynamically define the data and their types. Let's rewrite the registration
changeset above to bypass schemas:
data = %{}
types = %{first_name: :string, last_name: :string, email: :string}
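The data and types are then given to Ecto.Changeset.cast/3 as a tuple, in place
of a struct, mirroring the Registration changeset above:

changeset =
  {data, types}
  |> Ecto.Changeset.cast(params["sign_up"], Map.keys(types))
  |> validate_required(...)
  |> validate_length(...)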
You can use this technique to validate API endpoints, search forms, and other
sources of data. The choice of using schemas depends mostly on whether you
want to use the same mapping in different places and on whether you desire the
compile-time guarantees Elixir structs give you. Otherwise, you can bypass
schemas altogether, be it when using changesets or interacting with the repository.
However, the most important lesson in this guide is not when to use or not to use
schemas, but rather to understand when a big problem can be broken into smaller
problems that can be solved independently, leading to an overall cleaner solution.
The choice of using schemas or not above didn't affect the solution as much as
the choice of breaking the registration problem apart.
Dynamic queries
Ecto was designed from the ground up to have an expressive query API that
leverages Elixir syntax to write queries that are pre-compiled for performance
and safety. When building queries, we may use the keyword syntax:
import Ecto.Query
from p in Post,
where: p.author == "José" and p.category == "Elixir",
where: p.published_at > ^minimum_date,
order_by: [desc: p.published_at]
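or the pipe-based syntax: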
import Ecto.Query
Post
|> where([p], p.author == "José" and p.category == "Elixir")
|> where([p], p.published_at > ^minimum_date)
|> order_by([p], desc: p.published_at)
While many developers prefer the pipe-based syntax, having to repeat the
binding p makes it quite verbose compared to the keyword one.
Another problem with the pre-compiled query syntax is that it has limited
options to compose the queries dynamically. Imagine for example a web
application that provides search functionality on top of existing posts. The user
should be able to specify multiple criteria, such as the author name, the post
category, publishing interval, etc.
To solve those problems, Ecto also provides a data-structure centric API to build
queries as well as a very powerful mechanism for dynamic queries. Let's take a
look.
from p in Post,
where: [author: "José", category: "Elixir"],
where: p.published_at > ^minimum_date,
order_by: [desc: :published_at]
and
Post
|> where(author: "José", category: "Elixir")
|> where([p], p.published_at > ^minimum_date)
|> order_by(desc: :published_at)
Notice how we were able to ditch the p selector in most expressions. In Ecto, all
constructs, from select and order_by to where and group_by , accept data
structures as input. The data structure can be specified at compile-time, as above,
and also dynamically at runtime, shown below:
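For instance, the filters may arrive as plain Elixir data built at runtime:

where = [author: "José", category: "Elixir"]
order_by = [desc: :published_at]

Post
|> where(^where)
|> order_by(^order_by)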
Dynamic fragments
For cases where we cannot rely on data structures but still desire to build queries
dynamically, Ecto includes the Ecto.Query.dynamic/2 macro.
The dynamic macro allows us to conditionally build query fragments and
interpolate them in the main query. For example, imagine that in the example
above you may optionally filter posts by a date of publication. You could of
course write it like this:
query =
Post
|> where(^where)
|> order_by(^order_by)
query =
if published_at = params["published_at"] do
where(query, [p], p.published_at < ^published_at)
else
query
end
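With dynamic/2, we can instead build the optional filter as a standalone
fragment and interpolate it into the query: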
filter_published_at =
if published_at = params["published_at"] do
dynamic([p], p.published_at < ^published_at)
else
true
end
Post
|> where(^where)
|> where(^filter_published_at)
|> order_by(^order_by)
The dynamic macro allows us to build dynamic expressions that are later
interpolated into the query. dynamic expressions can also be interpolated into
dynamic expressions, allowing developers to build complex expressions
dynamically without hassle.
To tackle this in Ecto, we can break our problem into a bunch of small functions
that build either data structures or dynamic fragments, and then interpolate them
into the query:
def filter(params) do
Post
|> order_by(^filter_order_by(params["order_by"]))
|> where(^filter_where(params))
end
def filter_order_by("published_at_desc"),
do: dynamic([p], desc: p.published_at)
def filter_order_by("published_at"),
do: dynamic([p], p.published_at)
def filter_order_by(_),
do: []
def filter_where(params) do
  Enum.reduce(params, dynamic(true), fn
    {"author", value}, dynamic ->
      dynamic([p], ^dynamic and p.author == ^value)

    {"category", value}, dynamic ->
      dynamic([p], ^dynamic and p.category == ^value)

    {"published_at", value}, dynamic ->
      dynamic([p], ^dynamic and p.published_at > ^value)

    {_, _}, dynamic ->
      # Not a where parameter
      dynamic
  end)
end
Testing also becomes simpler as we can test each function in isolation, even
when using dynamic queries:
test "filter where converts the given params" do
  assert dynamic_match?(
           filter_where(%{"published_at" => "2010-04-17"}),
           "true and q.published_at > ^\"2010-04-17\""
         )
end
In the example above, we created a small helper that allows us to assert on the
dynamic contents by matching on the results of inspect(dynamic) .
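Dynamic fragments also work with named join bindings. Assuming posts are
joined to their authors under an :authors named binding, we can extend the
ordering clauses to sort by author name: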
def filter_order_by("published_at"),
do: dynamic([p], p.published_at)
def filter_order_by("author_name_desc"),
do: dynamic([authors: a], desc: a.name)
def filter_order_by("author_name"),
do: dynamic([authors: a], a.name)
def filter_order_by(_),
do: []
Adding more filters in the future is simply a matter of adding more clauses to the
Enum.reduce/3 call in filter_where .
Multi tenancy with query prefixes
With Ecto we can run queries in different prefixes using a single pool of
database connections. For database engines such as Postgres, Ecto's prefix maps
to Postgres' DDL schemas. For MySQL, each prefix is a different database on its
own.
Query prefixes may be useful in different scenarios. For example, multi tenant
apps running on Postgres would define multiple prefixes, usually one per client,
under a single database. The idea is that prefixes will provide data isolation
between the different users of the application, guaranteeing either globally or at
the data level that queries and commands act on a specific prefix.
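Prefixes may also be useful in high-traffic applications where data is partitioned
upfront, for example, a gaming platform that breaks its data into isolated
partitions, each named after a different prefix, choosing a partition based on the
player id.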
While query prefixes were designed with the two scenarios above in mind, they
may also be used in other circumstances, which we will explore throughout this
guide. All the examples below assume you are using Postgres. Other database
engines may require slightly different solutions.
Connection prefixes
As a starting point, let's start with a simple scenario: your application must
connect to a particular prefix when running in production. This may be due to
infrastructure conditions, database administration rules or others.
# lib/repo.ex
defmodule MyApp.Repo do
use Ecto.Repo,
otp_app: :my_app,
adapter: Ecto.Adapters.Postgres
end
# lib/sample.ex
defmodule MyApp.Sample do
use Ecto.Schema
schema "samples" do
field :name
timestamps
end
end
# config/config.exs
config :my_app, MyApp.Repo,
username: "postgres",
password: "postgres",
database: "demo",
hostname: "localhost",
pool_size: 10
# priv/repo/migrations/20160101000000_create_sample.exs
defmodule MyApp.Repo.Migrations.CreateSample do
use Ecto.Migration
def change do
create table(:samples) do
add :name, :string
timestamps()
end
end
end
Now let's create the database, migrate it and then start an IEx session:
$ mix ecto.create
$ mix ecto.migrate
$ iex -S mix
Interactive Elixir - press Ctrl+C to exit
iex(1)> MyApp.Repo.all MyApp.Sample
[]
Luckily Postgres allows us to change the prefix our database connections run on
by setting the "schema search path". The best moment to change the search path
is right after we set up the database connection, ensuring all of our queries will
run on that particular prefix throughout the connection life-cycle.
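In Ecto, we can do so with the :after_connect option in the repository
configuration:

# config/config.exs
config :my_app, MyApp.Repo,
  username: "postgres",
  password: "postgres",
  database: "demo",
  hostname: "localhost",
  pool_size: 10,
  after_connect: {Postgrex, :query!, ["SET search_path TO connection_prefix", []]}

If we restart our application with the new configuration, our previous query now
fails: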
$ iex -S mix
Interactive Elixir - press Ctrl+C to exit
iex(1)> MyApp.Repo.all MyApp.Sample
** (Postgrex.Error) ERROR (undefined_table):
relation "samples" does not exist
Our previously successful query now fails because there is no table "samples"
under the new prefix. Let's try to fix that by running migrations:
$ mix ecto.migrate
** (Postgrex.Error) ERROR (invalid_schema_name):
no schema has been selected to create in
Oops. Now the migration says there is no such schema name. That's because
Postgres automatically creates the "public" prefix every time we create a new
database. If we want to use a different prefix, we must explicitly create it on the
database we are running on:
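With psql, this is a single command (assuming the "demo" database and the
"connection_prefix" prefix configured earlier):

$ psql demo -c "CREATE SCHEMA connection_prefix"

With the prefix in place, migrations and queries succeed: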
$ mix ecto.migrate
$ iex -S mix
Interactive Elixir - press Ctrl+C to exit
iex(1)> MyApp.Repo.all MyApp.Sample
[]
Data in different prefixes is isolated. Data written to the "samples" table in one
prefix cannot be accessed by the other unless we change the prefix in the
connection or use the Ecto conveniences we will discuss next.
Schema prefixes
Ecto also allows you to set a particular schema to run on a specific prefix.
Imagine you are building a multi-tenant application. Each client data belongs to
a particular prefix, such as "client_foo", "client_bar" and so forth. Yet your
application may still rely on a set of tables that are shared across all clients. One
of such tables may be exactly the table that maps the Client ID to its database
prefix. Let's assume we want to store this data in a prefix named "main":
defmodule MyApp.Mapping do
use Ecto.Schema
@schema_prefix "main"
schema "mappings" do
field :client_id, :integer
field :db_prefix
timestamps
end
end
Ecto also supports the :prefix option on all relevant repository operations:
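For example, we can read all samples from the "public" prefix with:

MyApp.Repo.all(MyApp.Sample, prefix: "public")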
One interesting aspect of prefixes in Ecto is that the prefix information is carried
along each struct returned by a query:
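iex(1)> [sample] = MyApp.Repo.all(MyApp.Sample)
[%MyApp.Sample{...}]
iex(2)> Ecto.get_meta(sample, :prefix)
nil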
The example above returned nil, which means no prefix was specified by Ecto,
and therefore the database connection default will be used. In this case,
"connection_prefix" will be used because of the :after_connect callback we
added at the beginning of this guide.
Since the prefix data is carried in the struct, we can use it to copy data from
one prefix to the other. Let's copy the sample above from the
"connection_prefix" to the "public" one:
Prefixes in queries and structs always cascade. For example, if you run
MyApp.Repo.preload(sample, [:some_association]) , the association will be
queried for and loaded in the same prefix as the sample struct. If sample has
associations and you call MyApp.Repo.insert(sample) or
MyApp.Repo.update(sample) , the associated data will also be inserted/updated
in the same prefix as sample . That's by design to facilitate working with groups
of data in the same prefix, and especially because data in different prefixes
must be kept isolated.
Finally, Ecto allows the prefix to be set on each from and join of a query
individually. Those will take precedence over all other prefixes we have defined
so far. For each from/join in the query, the prefix used will be determined in the
following order: first any prefix given directly to the from/join, then the
query/struct prefix, then the schema prefix, and finally the connection prefix
(see the summary at the end of this chapter).
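A sketch, assuming Post and Comment schemas:

from p in Post, prefix: "public",
  join: c in Comment, prefix: "private",
  on: p.id == c.post_id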
Migration prefixes
When the connection prefix is set, it also changes the prefix migrations run on.
However it is also possible to set the prefix through the command line or per
table in the migration itself.
For example, imagine you are a gaming company where the game is broken into
128 partitions, named "prefix_1", "prefix_2", "prefix_3" up to "prefix_128".
Now, whenever you need to migrate data, you need to do it across all 128
prefixes. There are two ways to achieve that.
The first mechanism is to invoke mix ecto.migrate multiple times, once per
prefix, passing the --prefix option:
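$ mix ecto.migrate --prefix "prefix_1"
$ mix ecto.migrate --prefix "prefix_2"
...
$ mix ecto.migrate --prefix "prefix_128"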
The other approach is by changing each desired migration to run across multiple
prefixes. For example:
defmodule MyApp.Repo.Migrations.CreateSample do
use Ecto.Migration
def change do
for i <- 1..128 do
prefix = "prefix_#{i}"
create table(:samples, prefix: prefix) do
add :name, :string
timestamps()
      end
    end
  end
end
Summing up
Ecto provides many conveniences for working with query prefixes. Those
conveniences allow developers to configure prefixes with different precedence,
starting with the highest one:
1. from/join prefixes
2. query/struct prefixes
3. schema prefixes
4. connection prefixes
Aggregates
Ecto includes a convenience function in repositories to calculate aggregates.
For example, if we assume every post has an integer column named visits, we
can find the average number of visits across all posts with:
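MyApp.Repo.aggregate(MyApp.Post, :avg, :visits)
#=> #Decimal<1743>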
Imagine that instead of calculating the average of all posts, you want the average
of only the top 10. Your first try may be:
MyApp.Repo.one(
from p in MyApp.Post,
order_by: [desc: :visits],
limit: 10,
select: avg(p.visits)
)
#=> #Decimal<1743>
Oops. The query above returned the same value as the queries before. The option
limit: 10 has no effect here since it is limiting the aggregated result and
queries with aggregates return only a single row anyway. In order to retrieve the
correct result, we would need to first find the top 10 posts and only then
aggregate. That's exactly what aggregate/4 does:
query =
from MyApp.Post,
order_by: [desc: :visits],
limit: 10
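MyApp.Repo.aggregate(query, :avg, :visits)

Behind the scenes, aggregate/4 wraps the query in a subquery and averages over
it, equivalent to: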
inner_query =
from MyApp.Post,
order_by: [desc: :visits],
limit: 10
query =
from q in subquery(inner_query),
select: avg(q.visits)
MyApp.Repo.one(query)
Subqueries
In the previous section we have already learned some queries that would be hard
to express without support for subqueries. That's one of many examples that
caused subqueries to be added to Ecto.
Subqueries in Ecto are created by calling Ecto.Query.subquery/1 . This
function receives any data structure that can be converted to a query, via the
Ecto.Queryable protocol, and returns a subquery construct (which is also
queryable).
inner_query =
from MyApp.Post,
order_by: [desc: :visits],
limit: 10
query =
from q in subquery(inner_query),
select: avg(q.visits)
MyApp.Repo.one(query)
Because the inner query does not specify a :select clause, it will return select: p
where p is controlled by MyApp.Post schema. Since the query will return all
fields in MyApp.Post , when we convert it to a subquery, all of the fields from
MyApp.Post will be available on the parent query, such as q.visits . In fact,
Ecto will keep the schema properties across queries. For example, if you write
q.field_that_does_not_exist , your Ecto query won't compile.
Ecto also allows an Elixir map to be returned from a subquery, making the map
keys directly available to the parent query.
Let's see one last example. Imagine you manage a library (as in an actual library
in the real world) and there is a table that logs every time the library lends a
book. The "lendings" table uses an auto-incrementing primary key and can be
backed by the following schema:
defmodule MyApp.Lending do
use Ecto.Schema
schema "lendings" do
belongs_to :book, MyApp.Book # defines book_id
belongs_to :visitor, MyApp.Visitor # defines visitor_id
end
end
Now consider we want to retrieve the name of every book alongside the name of
the last person the library has lent it to. To do so, we need to find the last lending
ID of every book, and then join on the book and visitor tables. With subqueries,
that's straightforward:
last_lendings =
from l in MyApp.Lending,
group_by: l.book_id,
select: %{
book_id: l.book_id,
last_lending_id: max(l.id)
}
from l in MyApp.Lending,
join: last in subquery(last_lendings),
on: last.last_lending_id == l.id,
join: b in assoc(l, :book),
join: v in assoc(l, :visitor),
select: {b.name, v.name}
Test factories
Many projects depend on external libraries to build their test data. Some of those
libraries are called factories because they provide convenience functions for
producing different groups of data. However, given Ecto is able to manage
complex data trees, we can implement such functionality without relying on
third-party projects.
defmodule MyApp.Factory do
alias MyApp.Repo
# Factories
def build(:post) do
%MyApp.Post{title: "hello world"}
end
def build(:comment) do
%MyApp.Comment{body: "good post"}
end
def build(:post_with_comments) do
%MyApp.Post{
title: "hello with comments",
comments: [
build(:comment, body: "first"),
build(:comment, body: "second")
]
}
end
def build(:user) do
%MyApp.User{
email: "hello#{System.unique_integer()}",
username: "hello#{System.unique_integer()}"
}
end
  # Convenience API
  def build(factory_name, attributes) do
    factory_name |> build() |> struct!(attributes)
  end

  def insert!(factory_name, attributes \\ []) do
    factory_name |> build(attributes) |> Repo.insert!()
  end
end
Our factory module defines four "factories" as different clauses to the build
function: :post , :comment , :post_with_comments and :user . Each clause
defines structs with the fields that are required by the database. In certain cases,
the generated struct also needs to generate unique fields, such as the user's email
and username. We did so by calling Elixir's System.unique_integer() - you
could call System.unique_integer([:positive]) if you need a strictly
positive number.
At the end, we defined two functions, build/2 and insert!/2 , which are
conveniences for building structs with specific attributes and for inserting data
directly in the repository respectively.
That's literally all that is necessary for building our factories. We are now ready
to use them in our tests. First, open up your "mix.exs" and make sure the
"test/support/factory.ex" file is compiled:
def project do
[...,
elixirc_paths: elixirc_paths(Mix.env),
...]
end
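where elixirc_paths/1 is a private function that adds "test/support" in the test
environment:

defp elixirc_paths(:test), do: ["lib", "test/support"]
defp elixirc_paths(_), do: ["lib"]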
Now in any of the tests that need to generate data, we can import the
MyApp.Factory module and use its functions:
import MyApp.Factory
build(:post)
#=> %MyApp.Post{id: nil, title: "hello world", ...}
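Constraints and Upserts

In this recipe we will learn to work with constraints and upserts by exploring a
practical scenario: a many to many relationship between posts and tags.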
put_assoc vs cast_assoc
Imagine we are building an application that has blog posts and such posts may
have many tags. Not only that, a given tag may also belong to many posts. This
is a classic scenario where we would use many_to_many associations. Our
migrations would look like:
create table(:posts) do
add :title
add :body
timestamps()
end
create table(:tags) do
add :name
timestamps()
end
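create unique_index(:tags, [:name])

create table(:posts_tags, primary_key: false) do
  add :post_id, references(:posts)
  add :tag_id, references(:tags)
end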
Note we added a unique index to the tag name because we don't want to have
duplicated tags in our database. It is important to add an index at the database
level instead of using a validation since there is always a chance two tags with
the same name would be validated and inserted simultaneously, passing the
validation and leading to duplicated entries.
Now let's also imagine we want the user to input such tags as a list of words split
by comma, such as: "elixir, erlang, ecto". Once this data is received in the server,
we will break it apart into multiple tags and associate them to the post, creating
any tag that does not yet exist in the database.
While the constraints above sound reasonable, that's exactly what puts us in
trouble with cast_assoc/3 . The cast_assoc/3 changeset function was
designed to receive external parameters and compare them with the associated
data in our structs. To do so correctly, Ecto requires tags to be sent as a list of
maps. We can see an example of this in Polymorphic associations with many to
many. However, here we expect tags to be sent in a string separated by commas.
Furthermore, cast_assoc/3 relies on the primary key field for each tag sent in
order to decide if it should be inserted, updated or deleted. Again, because the
user is simply passing a string, we don't have the ID information at hand.
defmodule MyApp.Post do
  use Ecto.Schema

  schema "posts" do
    field :title
    field :body
    many_to_many :tags, MyApp.Tag,
      join_through: "posts_tags",
      on_replace: :delete
    timestamps()
  end

  def changeset(struct, params \\ %{}) do
    struct
    |> Ecto.Changeset.cast(params, [:title, :body])
    |> Ecto.Changeset.put_assoc(:tags, parse_tags(params))
  end

  defp parse_tags(params) do
    (params["tags"] || "")
    |> String.split(",")
    |> Enum.map(&String.trim/1)
    |> Enum.reject(& &1 == "")
    |> Enum.map(&get_or_insert_tag/1)
  end

  defp get_or_insert_tag(name) do
    Repo.get_by(MyApp.Tag, name: name) ||
      Repo.insert!(%MyApp.Tag{name: name})
  end
end
In the changeset function above, we moved all the handling of tags to a separate
function, called parse_tags/1 , which checks for the parameter, breaks each tag
apart via String.split/2 , then removes any left over whitespace with
String.trim/1 , rejects any empty string and finally checks if the tag exists in
the database or not, creating one in case none exists.
However, our code is not yet ready for production. Let's see why. If two users
attempt to add the same tag at roughly the same time, both calls to Repo.get_by/2
may return nil, and both processes will then attempt the insert; one of them will
violate the unique index and raise. We can handle this race condition by
leveraging the unique constraint in the changeset:
defp get_or_insert_tag(name) do
%Tag{}
|> Ecto.Changeset.change(name: name)
|> Ecto.Changeset.unique_constraint(:name)
|> Repo.insert
|> case do
{:ok, tag} -> tag
{:error, _} -> Repo.get_by!(MyApp.Tag, name: name)
end
end
Instead of inserting the tag directly, we now build a changeset, which allows us
to use the unique_constraint annotation. Now if the Repo.insert operation
fails because the unique index for :name is violated, Ecto won't raise, but return
an {:error, changeset} tuple. Therefore, if Repo.insert succeeds, it is
because the tag was saved, otherwise the tag already exists, which we then fetch
with Repo.get_by! .
While the mechanism above fixes the race condition, it is a quite expensive one:
we need to perform two queries for every tag that already exists in the database:
the (failed) insert and then the repository lookup. Given that's the most common
scenario, we may want to rewrite it to the following:
defp get_or_insert_tag(name) do
Repo.get_by(MyApp.Tag, name: name) ||
maybe_insert_tag(name)
end
defp maybe_insert_tag(name) do
%Tag{}
|> Ecto.Changeset.change(name: name)
|> Ecto.Changeset.unique_constraint(:name)
|> Repo.insert
|> case do
{:ok, tag} -> tag
{:error, _} -> Repo.get_by!(MyApp.Tag, name: name)
end
end
The above performs 1 query for every tag that already exists, 2 queries for every
new tag and possibly 3 queries in the case of race conditions. While the above
would perform slightly better on average, Ecto has a better option in stock.
Upserts
Ecto supports the so-called "upsert" command which is an abbreviation for
"update or insert". The idea is that we try to insert a record and in case it
conflicts with an existing entry, for example due to a unique index, we can
choose how we want the database to act by either raising an error (the default
behaviour), ignoring the insert (no error) or by updating the conflicting database
entries.
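For example, we could rewrite get_or_insert_tag/1 to ignore conflicting inserts:

defp get_or_insert_tag(name) do
  Repo.insert!(
    %MyApp.Tag{name: name},
    on_conflict: :nothing
  )
end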
While the above won't raise an error in case of conflicts, it also won't update the
struct given, so it will return a tag without ID. One solution is to force an update
to happen in case of conflicts, even if the update is about setting the tag name to
its current name. In such cases, PostgreSQL also requires the
:conflict_target option to be given, which is the column (or a list of
columns) we are expecting the conflict to happen:
defp get_or_insert_tag(name) do
Repo.insert!(
%MyApp.Tag{name: name},
on_conflict: [set: [name: name]],
conflict_target: :name
)
end
And that's it! We try to insert a tag with the given name and if such tag already
exists, we tell Ecto to update its name to the current value, updating the tag and
fetching its id. While the above is certainly a step up from all solutions so far, it
still performs one query per tag. If 10 tags are sent, we will perform 10 queries.
Can we further improve this?
defmodule MyApp.Post do
use Ecto.Schema
  # Schema is the same as before, with the many_to_many :tags
  schema "posts" do
    field :title
    field :body
    many_to_many :tags, MyApp.Tag,
      join_through: "posts_tags",
      on_replace: :delete
    timestamps()
  end

  # The changeset is also the same; parse_tags/1 now ends with
  # a call to insert_and_get_all/1 instead of mapping over
  # get_or_insert_tag/1
defp insert_and_get_all([]) do
[]
end
defp insert_and_get_all(names) do
maps = Enum.map(names, &%{name: &1})
Repo.insert_all MyApp.Tag, maps, on_conflict: :nothing
Repo.all from t in MyApp.Tag, where: t.name in ^names
end
end
Instead of attempting to get and insert each tag individually, the code above
works on all tags at once, first by building a list of maps which is given to
insert_all and then by looking up all tags with the existing names. Therefore,
regardless of how many tags are sent, we will perform only 2 queries (unless no
tag is sent, in which case we return an empty list back promptly). This solution is
only possible thanks to the :on_conflict option, which guarantees
insert_all won't fail in case a unique index is violated, such as duplicate tag
names.
Finally, keep in mind that we haven't used transactions in any of the examples so
far. That decision was deliberate as we relied on the fact that getting or inserting
tags is an idempotent operation, i.e. we can repeat it many times for a given
input and it will always give us the same result back. Therefore, even if we fail
to introduce the post to the database due to a validation error, the user will be
free to resubmit the form and we will just attempt to get or insert the same tags
once again. The downside of this approach is that tags will be created even if
creating the post fails, which means some tags may not have posts associated to
them. In case that's not desired, the whole operation could be wrapped in a
transaction or modeled with Ecto.Multi .
Polymorphic associations with many
to many
Besides belongs_to , has_many , has_one and :through associations, Ecto
also includes many_to_many . many_to_many relationships, as the name says,
allows a record from table X to have many associated entries from table Y and
vice-versa. Although many_to_many associations can be written as has_many
:through , using many_to_many may considerably simplify some workflows.
In our case, there is one aspect of todo list applications we are interested in,
which is the relationship where the todo list has many todo items. We have
explored this exact scenario in detail in an article we posted on Plataformatec's
blog about nested associations and embeds. Let's recap the important points.
Our todo list app has two schemas, Todo.List and Todo.Item :
defmodule MyApp.TodoList do
use Ecto.Schema
schema "todo_lists" do
field :title
has_many :todo_items, MyApp.TodoItem
timestamps()
end
end
defmodule MyApp.TodoItem do
use Ecto.Schema
schema "todo_items" do
field :description
timestamps()
end
end
One of the ways to introduce a todo list with multiple items into the database is
to couple our UI representation to our schemas. That's the approach we took in
the blog post with Phoenix. Roughly:
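A rough sketch of such a form, assuming Phoenix.HTML's form_for and
inputs_for helpers and a @todo_list_changeset assign:

<%= form_for @todo_list_changeset, todo_list_path(@conn, :create), fn f -> %>
  <%= text_input f, :title %>
  <%= inputs_for f, :todo_items, fn i -> %>
    <%= text_input i, :description %>
  <% end %>
<% end %>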
When such a form is submitted in Phoenix, it will send parameters with the
following shape:
%{
"todo_list" => %{
"title" => "shipping list",
"todo_items" => %{
0 => %{"description" => "bread"},
1 => %{"description" => "eggs"}
}
}
}
We could then retrieve those parameters and pass them to an Ecto changeset and
Ecto would automatically figure out what to do:
# In MyApp.TodoList
def changeset(struct, params \\ %{}) do
struct
|> Ecto.Changeset.cast(params, [:title])
|> Ecto.Changeset.cast_assoc(:todo_items, required: true)
end
The advantage of using cast_assoc/3 is that Ecto is able to do all of the hard
work of keeping the entries associated, as long as we pass the data exactly in
the format that Ecto expects. However, such an approach is not always
preferable and in many situations it is better to design our associations
differently or decouple our UIs from our database representation.
First of all, it is important to remember Ecto does not provide the same type of
polymorphic associations available in frameworks such as Rails and Laravel. In
such frameworks, a polymorphic association uses two columns, the parent_id
and parent_type . For example, one todo item would have parent_id of 1
with parent_type of "TodoList" while another would have parent_id of 1
with parent_type of "Project".
The issue with the design above is that it breaks database references. The
database is no longer capable of guaranteeing the item you associate to exists or
will continue to exist in the future. This leads to an inconsistent database which
ends up pushing workarounds into your application.
The design above is also extremely inefficient, especially if you're working with
large tables. Bear in mind that if that's your case, you might be forced to remove
such polymorphic references in the future when frequent polymorphic queries
start grinding the database to a halt even after adding indexes and optimizing the
database.
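The alternative is to use a separate join table for each association pair: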
create table(:todo_lists) do
add :title
timestamps()
end
create table(:projects) do
add :name
timestamps()
end
create table(:todo_items) do
add :description
timestamps()
end
create table(:todo_list_items) do
add :todo_item_id, references(:todo_items)
add :todo_list_id, references(:todo_lists)
timestamps()
end
create table(:project_items) do
add :todo_item_id, references(:todo_items)
add :project_id, references(:projects)
timestamps()
end
By adding one table per association pair, we keep database references and can
efficiently perform queries that rely on indexes.
First let's see how to implement this functionality in Ecto using a has_many
:through and then use many_to_many to remove a lot of the boilerplate we
were forced to introduce.
defmodule MyApp.TodoList do
use Ecto.Schema
schema "todo_lists" do
field :title
has_many :todo_list_items, MyApp.TodoListItem
has_many :todo_items,
through: [:todo_list_items, :todo_item]
timestamps()
end
end
defmodule MyApp.TodoListItem do
use Ecto.Schema
schema "todo_list_items" do
belongs_to :todo_list, MyApp.TodoList
belongs_to :todo_item, MyApp.TodoItem
timestamps()
end
end
defmodule MyApp.TodoItem do
use Ecto.Schema
schema "todo_items" do
field :description
timestamps()
end
end
The trouble is that :through associations are read-only since Ecto does not
have enough information to fill in the intermediate schema. This means that, if
we still want to use cast_assoc to insert a todo list with many todo items
directly from the UI, we cannot use the :through association and instead must
go step by step. We would need to first cast_assoc(:todo_list_items) from
TodoList and then call cast_assoc(:todo_item) from the TodoListItem
schema:
# In MyApp.TodoList
def changeset(struct, params \\ %{}) do
struct
|> Ecto.Changeset.cast(params, [:title])
|> Ecto.Changeset.cast_assoc(
:todo_list_items,
required: true
)
end
%{"todo_list" => %{
"title" => "shipping list",
"todo_list_items" => %{
0 => %{"todo_item" => %{"description" => "bread"}},
1 => %{"todo_item" => %{"description" => "eggs"}},
}
}}
To make matters worse, you would have to duplicate this logic for every
intermediate schema, and introduce MyApp.TodoListItem for todo lists,
MyApp.ProjectItem for projects, etc.
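many_to_many removes this boilerplate: it manages the intermediate entries for
us. Let's rewrite our schemas to use it: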
defmodule MyApp.TodoList do
use Ecto.Schema
schema "todo_lists" do
field :title
many_to_many :todo_items, MyApp.TodoItem,
join_through: MyApp.TodoListItem
timestamps()
end
end
defmodule MyApp.TodoListItem do
use Ecto.Schema
schema "todo_list_items" do
belongs_to :todo_list, MyApp.TodoList
belongs_to :todo_item, MyApp.TodoItem
timestamps()
end
end
defmodule MyApp.TodoItem do
use Ecto.Schema
schema "todo_items" do
field :description
timestamps()
end
end
%{"todo_list" => %{
"title" => "shipping list",
"todo_items" => %{
0 => %{"description" => "bread"},
1 => %{"description" => "eggs"},
}
}}
# In MyApp.TodoList
def changeset(struct, params \\ %{}) do
struct
|> Ecto.Changeset.cast(params, [:title])
|> Ecto.Changeset.cast_assoc(:todo_items, required: true)
end
In other words, we can use exactly the same code we had in the "todo lists
has_many todo items" case. So even when external constraints require us to use
a join table, many_to_many associations can automatically manage them for us.
Everything you know about associations will just work with many_to_many
associations as well.
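In fact, many_to_many also accepts the join table as a plain string, with no
intermediate schema at all: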
defmodule MyApp.TodoList do
use Ecto.Schema
schema "todo_lists" do
field :title
many_to_many :todo_items, MyApp.TodoItem,
join_through: "todo_list_items"
timestamps()
end
end
In this case, you can completely remove the MyApp.TodoListItem schema from
your application and the code above will still work. The only difference is that
when using tables, any autogenerated value that is filled by Ecto schema, such as
timestamps, won't be filled as we no longer have a schema. To solve this, you
can either drop those fields from your migrations or set a default at the database
level.
Summary
In this guide we used many_to_many associations to drastically improve a
polymorphic association design that relied on has_many :through . Our goal
was to allow "todo_items" to associate to different entities in our code base, such
as "todo_lists" and "projects". We have done this by creating intermediate tables
and by using many_to_many associations to automatically manage those join
tables.
defmodule MyApp.TodoList do
use Ecto.Schema
schema "todo_lists" do
field :title
many_to_many :todo_items, MyApp.TodoItem,
join_through: "todo_list_items"
timestamps()
  end
end
defmodule MyApp.Project do
use Ecto.Schema
schema "todo_lists" do
field :name
many_to_many :todo_items, MyApp.TodoItem,
join_through: "project_items"
timestamps()
  end
end
defmodule MyApp.TodoItem do
use Ecto.Schema
schema "todo_items" do
field :description
timestamps()
  end
end
create table("todo_lists") do
add :title
timestamps()
end
create table("projects") do
add :name
timestamps()
end
create table("todo_items") do
add :description
timestamps()
end
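plus the two join tables. Since no schema backs them anymore, they get no
timestamps:

create table("todo_list_items") do
  add :todo_item_id, references(:todo_items)
  add :todo_list_id, references(:todo_lists)
end

create table("project_items") do
  add :todo_item_id, references(:todo_items)
  add :project_id, references(:projects)
end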
Overall our code looks structurally the same as has_many would, although at the
database level our relationships are expressed with join tables.
While in this guide we changed our code to cope with the parameter format
required by cast_assoc , in Constraints and Upserts we drop cast_assoc
altogether and use put_assoc , which brings more flexibility when working
with associations.
Composable transactions with Multi
Ecto relies on database transactions when multiple operations must be performed
atomically. The most common example used for transactions is a bank transfer
between two people:
Repo.transaction(fn ->
mary_update =
from Account,
where: [id: ^mary.id],
update: [inc: [balance: +10]]
{1, _} = Repo.update_all(mary_update)
john_update =
from Account,
where: [id: ^john.id],
update: [inc: [balance: -10]]
{1, _} = Repo.update_all(john_update)
end)
Transactions in Ecto can also be nested arbitrarily. For example, imagine the
transaction above is moved into its own function that receives both accounts,
defined as transfer_money(mary, john, 10) , and besides transferring money
we also want to log the transfer:
Repo.transaction(fn ->
case transfer_money(mary, john, 10) do
{:ok, {mary, john}} ->
transfer = %Transfer{
from: mary.id,
to: john.id,
amount: 10
}
      Repo.insert!(transfer)

    {:error, error} ->
      Repo.rollback(error)
  end
end)
The snippet above starts a transaction and then calls transfer_money/3 that
also runs in a transaction. In case of multiple transactions, they are all flattened,
which means a failure in an inner transaction causes the outer transaction to also
fail. That's why matching and rolling back on {:error, error} is important.
While nesting transactions can improve the code readability by breaking large
transactions into multiple smaller transactions, there is still a lot of boilerplate
involved in handling the success and failure scenarios. Furthermore, composition
is quite limited, as all operations must still be performed inside transaction
blocks.
A more declarative approach when working with transactions would be to define
all operations we want to perform in a transaction decoupled from the
transaction execution. This way we would be able to compose transactions
operations without worrying about its execution context or about each individual
success/failure scenario. That's exactly what Ecto.Multi allows us to do.
mary_update =
from Account,
where: [id: ^mary.id],
update: [inc: [balance: +10]]
john_update =
from Account,
where: [id: ^john.id],
update: [inc: [balance: -10]]
Ecto.Multi.new()
|> Ecto.Multi.update_all(:mary, mary_update, [])
|> Ecto.Multi.update_all(:john, john_update, [])
This is considerably simpler than the nested transaction approach we have seen
earlier. Once all operations are defined in the multi, we can finally call
Repo.transaction , this time passing the multi:
transfer = %Transfer{
from: mary.id,
to: john.id,
amount: 10
}
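We can then pipe the multi, with the transfer insert appended, straight into
Repo.transaction (a sketch reusing the updates defined above):

Ecto.Multi.new()
|> Ecto.Multi.update_all(:mary, mary_update, [])
|> Ecto.Multi.update_all(:john, john_update, [])
|> Ecto.Multi.insert(:transfer, transfer)
|> Repo.transaction()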
If all operations in the multi succeed, it returns {:ok, map} where the map
contains the name of all operations as keys and their success value. If any
operation in the multi fails, the transaction is rolled back and
Repo.transaction returns {:error, name, value, changes_so_far} , where
name is the name of the failed operation, value is the failure value and
changes_so_far is a map of the previously successful multi operations that
have been rolled back due to the failure.
In other words, Ecto.Multi takes care of all the flow control boilerplate while
decoupling the transaction definition from its execution, allowing us to compose
operations as needed.
Dependent values
Besides operations such as insert , update and delete , Ecto.Multi also
provides functions for handling more complex scenarios. For example, prepend
and append can be used to merge multis together. And more generally,
Ecto.Multi.run/3 and Ecto.Multi.run/5 can be used to define any
operation that depends on the results of a previous multi operation. Let's put
run/3 to use by revisiting the tags example from the Constraints and Upserts
recipe, whose downside was that tags could be created even when inserting the
post itself failed:
defmodule MyApp.Post do
use Ecto.Schema
defp insert_and_get_all([]) do
[]
end
defp insert_and_get_all(names) do
maps = Enum.map(names, &%{name: &1})
Repo.insert_all MyApp.Tag, maps, on_conflict: :nothing
Repo.all from t in MyApp.Tag, where: t.name in ^names
end
end
Let's fix the problem above by using Ecto.Multi . Let's start by
splitting the logic into both Post and Tag modules and keeping it free from
side-effects such as database operations:
defmodule MyApp.Post do
use Ecto.Schema
schema "posts" do
field :title
field :body
many_to_many :tags, MyApp.Tag,
join_through: "posts_tags",
on_replace: :delete
timestamps()
  end
end
defmodule MyApp.Tag do
use Ecto.Schema
schema "tags" do
field :name
timestamps()
end
def parse(tags) do
(tags || "")
|> String.split(",")
|> Enum.map(&String.trim/1)
|> Enum.reject(& &1 == "")
end
end
Now, whenever we need to introduce a post with tags, we can create a multi that
wraps all operations and the repository access:
alias MyApp.Tag
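A sketch of such a multi follows. Here insert_and_get_all_tags/2 and
insert_or_update_post/4 are hypothetical private helpers that return the
{:ok, value} or {:error, value} tuples run/3 expects:

def insert_or_update_post_with_tags(post, params) do
  Ecto.Multi.new()
  |> Ecto.Multi.run(:tags, fn repo, _changes ->
    # runs insert_all and then all against the given repo
    insert_and_get_all_tags(repo, params)
  end)
  |> Ecto.Multi.run(:post, fn repo, %{tags: tags} ->
    # this step depends on the tags fetched above
    insert_or_update_post(repo, post, params, tags)
  end)
  |> Repo.transaction()
end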
In the example above we have used Ecto.Multi.run/3 twice, albeit for two
different reasons.
1. In Ecto.Multi.run(:tags, ...) , we used run/3 because we need to
perform both insert_all and all operations, and while the multi
exposes Ecto.Multi.insert_all/4 , it does not yet expose an
Ecto.Multi.all/3 . Whenever we need to perform a repository operation
that is not supported by Ecto.Multi , we can always fall back to run/3 or
run/5 .
2. In Ecto.Multi.run(:post, ...) , we used run/3 because the post
operation depends on the result of the :tags operation. The function given
to run/3 receives, as its second argument, a map with the results of all the
multi operations performed so far.
Note: The first argument received by the function given to run/3 is the
repo in which the transaction is executing.
Replicas and dynamic repositories

When applications reach a certain scale, a single database may not be enough to
sustain the required throughput. In such scenarios, it is very common to
introduce read replicas: all write operations are sent to the primary database and
most of the read operations are performed against the replicas. The credentials
of the primary and replicas are typically known upfront by the time the code is
compiled.

In other cases, you may need a single Ecto repository to interact with different
database instances which are not known upfront. For instance, you may need to
communicate with hundreds of databases very sporadically, so instead of opening
up a connection to each of those hundreds of databases when your application
starts, you want to quickly start a connection, perform some queries, and then
shut it down, while still leveraging Ecto's APIs as a whole.
defmodule MyApp.Repo do
use Ecto.Repo,
otp_app: :my_app,
adapter: Ecto.Adapters.Postgres
@replicas [
MyApp.Repo.Replica1,
MyApp.Repo.Replica2,
MyApp.Repo.Replica3,
MyApp.Repo.Replica4
]
  def replica do
    Enum.random(@replicas)
  end

  for repo <- @replicas do
    defmodule repo do
      use Ecto.Repo,
        otp_app: :my_app,
        adapter: Ecto.Adapters.Postgres,
        read_only: true
    end
  end
end
The code above defines a regular MyApp.Repo and four replicas, called
MyApp.Repo.Replica1 up to MyApp.Repo.Replica4 . We pass the :read_only
option to the replica repositories, so operations such as insert , update and
friends are not made accessible. We also define a function called replica with
the purpose of returning a random replica.
Next we need to make sure both primary and replicas are configured properly in
your config/config.exs files. In development and test, you can likely use the
same database credentials for all repositories, all pointing to the same database
address:
replicas = [
MyApp.Repo,
MyApp.Repo.Replica1,
MyApp.Repo.Replica2,
MyApp.Repo.Replica3,
MyApp.Repo.Replica4
]
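for repo <- replicas do
  config :my_app, repo,
    username: "postgres",
    password: "postgres",
    database: "demo",
    hostname: "localhost",
    pool_size: 10
end

In production, each repository would point to its own database instance, for
example with a map from repository to hostname: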
repos = %{
MyApp.Repo => "prod-primary",
MyApp.Repo.Replica1 => "prod-replica-1",
MyApp.Repo.Replica2 => "prod-replica-2",
MyApp.Repo.Replica3 => "prod-replica-3",
MyApp.Repo.Replica4 => "prod-replica-4"
}
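for {repo, hostname} <- repos do
  # placeholder credentials; replace with your production settings
  config :my_app, repo,
    username: "postgres",
    password: "postgres",
    database: "demo",
    hostname: hostname,
    pool_size: 10
end

The last step is to make sure the primary and all replicas are started in your
application's supervision tree: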
children = [
MyApp.Repo,
MyApp.Repo.Replica1,
MyApp.Repo.Replica2,
MyApp.Repo.Replica3,
MyApp.Repo.Replica4
]
Now that all repositories are configured, we can safely use them in your
application code. Every time you are performing a read operation, you can call
the replica/0 function we defined to pick a random replica and send the query
to it:
MyApp.Repo.replica().all(query)
And now you are ready to work with primary and replicas, no hacks or complex
dependencies required!
Testing replicas
While all of the work we have done so far should fully work in development and
production, it may not be enough for tests. Most developers testing Ecto
applications are using a sandbox, such as the Ecto SQL Sandbox.
When using a sandbox, each of your tests run in an isolated and independent
transaction. Once the test is done, the transaction is rolled back. Which means
we can trivially revert all of the changes done in a test in a very performant way.
Unfortunately, even if you configure your primary and replicas to have the same
credentials and point to the same hostname, each Ecto repository will open up
their own pool of database connections. This means that, once you move to a
primary + replicas setup, a simple test like this one won't pass:
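A minimal sketch of such a test, assuming a Post schema:

test "reads what was just written" do
  Repo.insert!(%Post{title: "hello"})
  assert [%Post{}] = Repo.replica().all(Post)
end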
That's because Repo.insert! will write to one database connection and the
repository returned by Repo.replica() will perform the read in another
connection. Since the write is done in a transaction, its contents won't be available
to other connections until the transaction commits, which will never happen for
test connections.
There are two options to tackle this problem: one is to change replicas and the
other is to use dynamic repos.
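The first option is to make replica/0 return the primary repository itself during
tests: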
if Mix.env() == :test do
def replica, do: __MODULE__
else
def replica, do: Enum.random(@replicas)
end
Now during tests, the replica function will always return the primary repository
itself. While this approach works fine, it has the downside that, if you
accidentally invoke a write function on a replica, the test will pass, since the
replica function is returning the primary repo, while the code will fail in
production.
Using :default_dynamic_repo
When you list a repository in your supervision tree, such as MyApp.Repo , behind
the scenes it will start a supervision tree with a process named MyApp.Repo . By
default, the process has the same name as the repository module itself. Now
every time you invoke a function in MyApp.Repo , such as
MyApp.Repo.insert/2 , Ecto will use the connection pool from the process
named MyApp.Repo .
From v3.0, Ecto has the ability to start multiple processes from the same
repository. The only requirement is that they must have different process names,
like this:
children = [
MyApp.Repo,
{MyApp.Repo, name: :another_instance_of_repo}
]
While the particular example doesn't make much sense (we will cover an actual
use case for this feature next), the idea is that now you have two repositories
running: one is named MyApp.Repo and the other one is named
:another_instance_of_repo . Each of those processes have their own
connection pool. You can tell Ecto which process you want to use in your repo
operations by calling:
MyApp.Repo.put_dynamic_repo(MyApp.Repo)
MyApp.Repo.put_dynamic_repo(:another_instance_of_repo)
Once you call MyApp.Repo.put_dynamic_repo(name) , all invocations made on
MyApp.Repo will use the connection pool denoted by name .
How can this help with our replica tests? If we look back to the supervision tree
we defined earlier in this guide, you will find this:
children = [
MyApp.Repo,
MyApp.Repo.Replica1,
MyApp.Repo.Replica2,
MyApp.Repo.Replica3,
MyApp.Repo.Replica4
]
We are starting five different repositories and five different connection pools.
Since we want the replica repositories to use the MyApp.Repo connection pool
during tests, we can achieve this by doing the following in the setup of each test:
@replicas [
MyApp.Repo.Replica1,
MyApp.Repo.Replica2,
MyApp.Repo.Replica3,
MyApp.Repo.Replica4
]
setup do
for replica <- @replicas do
replica.put_dynamic_repo(MyApp.Repo)
end
:ok
end
There is an even better way: we can pass a :default_dynamic_repo option
when defining the repository, and set it to the primary repo only during the test
environment. Back in MyApp.Repo, the replica definitions become:

for repo <- @replicas do
  default_dynamic_repo =
    if Mix.env() == :test do
      MyApp.Repo
    else
      repo
    end

  defmodule repo do
    use Ecto.Repo,
      otp_app: :my_app,
      adapter: Ecto.Adapters.Postgres,
      read_only: true,
      default_dynamic_repo: default_dynamic_repo
  end
end
And now your tests should work as before, while still being able to detect if you
accidentally perform a write operation in a replica.
Dynamic repositories
At this point, we have learned that Ecto allows you to start multiple connection
pools from the same repository. This is typically useful when you have to
connect to multiple databases or perform short-lived database connections.
For example, you can start a repository with a given set of credentials
dynamically, like this:
MyApp.Repo.start_link(
name: :some_client,
hostname: "client.example.com",
username: "...",
password: "...",
pool_size: 1
)
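Once the repository is started, you can route operations to it by name:

MyApp.Repo.put_dynamic_repo(:some_client)
MyApp.Repo.all(Post)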
Ecto also allows you to start a repository with no name (just like that famous
horse). In such cases, you need to explicitly pass name: nil and match on the
result of MyApp.Repo.start_link/1 to retrieve the PID, which should be given
to put_dynamic_repo . Let's also use this opportunity to perform proper
database clean-up, by shutting down the new repository and reverting the value of
put_dynamic_repo :
default_dynamic_repo = MyApp.Repo.get_dynamic_repo()
{:ok, repo} =
MyApp.Repo.start_link(
name: nil,
hostname: "client.example.com",
username: "...",
password: "...",
pool_size: 1
)
try do
MyApp.Repo.put_dynamic_repo(repo)
MyApp.Repo.all(Post)
after
MyApp.Repo.put_dynamic_repo(default_dynamic_repo)
MyApp.Repo.stop(repo)
end
We can encapsulate all of this in a function too, which you could define in your
repository:
defmodule MyApp.Repo do
  use Ecto.Repo, ...

  def with_dynamic_repo(credentials, callback) do
    default_dynamic_repo = get_dynamic_repo()
    start_opts = [name: nil, pool_size: 1] ++ credentials
    {:ok, repo} = MyApp.Repo.start_link(start_opts)

    try do
      MyApp.Repo.put_dynamic_repo(repo)
      callback.()
    after
      MyApp.Repo.put_dynamic_repo(default_dynamic_repo)
      MyApp.Repo.stop(repo)
    end
  end
end
credentials = [
hostname: "client.example.com",
username: "...",
password: "..."
]
MyApp.Repo.with_dynamic_repo(credentials, fn ->
MyApp.Repo.all(Post)
end)
And that's it! Now you can have dynamic connections, all properly encapsulated
in a single function and built on top of the dynamic repo API.