Case-Insensitive String Columns with Postgres, Phoenix, and Ecto
Annie Kiley, Former Application Development Director
Article Categories:
Posted on
Postgres' citext extension allows you to treat text as case-insensitive at the database level. In this post, we'll walk through implementing and using a citext email field in Phoenix and Ecto.
I was working on a Phoenix/Ecto/Postgres application recently, and ran into a bug with the password reset flow. It didn't work unless users entered their email address with the exact same capitalization they used when they registered. My first pass at a fix involved changing the query to do a case-insensitive comparison of the email address using Ecto.Query.API.ilike/2. This fixed the issue, but I had a suspicion that similar bugs might be lurking in other places.
The heart of the problem was this: For our purposes, email addresses were inherently case-insensitive. Since strings aren't automatically treated that way, my application code needed to account for that in every place where the data was used. I went looking for a more robust solution.
I found it in Postgres' citext module. This extension is well-supported by Ecto and allows you to treat fields as case-insensitive at the database level. This way, the application code (and application developers) never need to worry about it.
Implementation #
Implementing a case-insensitive string column in Phoenix/Ecto is straightforward. The example below walks through adding a case-insensitive email
column to an existing users
table in a Phoenix app.
Step 1. Create the citext extension #
The citext extension ships with Postgres, so we don't need to install it from anywhere. We do, however, need to create it to make it available. With Ecto, the easiest way to do this is in a migration.
defmodule MyApp.Repo.Migrations.CitextEmails do
use Ecto.Migration
def up do
execute "CREATE EXTENSION citext"
end
def down do
execute "DROP EXTENSION citext"
end
end
Step 2. Add a field using the citext type #
Now that the citext type is available, you can add (or modify, if you're updating an existing column) the field you want using the new type. Here is the same migration, updated to add a required case-insensitive email
column to an existing users
table.
defmodule MyApp.Repo.Migrations.CitextEmails do
use Ecto.Migration
def up do
execute "CREATE EXTENSION citext"
alter table(:users) do
add(:email, :citext, null: false)
end
end
def down do
alter table(:users) do
remove(:email)
end
execute "DROP EXTENSION citext"
end
end
To get this working, we also need to add the email
field to our user schema. Notice that we use :string
as the type. The database is using citext to ignore capitalization for comparisons, but this field is a string at the application level.
defmodule MyApp.User do
use Ecto.Schema
import Ecto.Changeset
schema "users" do
field(:email, :string)
end
def changeset(struct, attrs) do
struct
|> cast(attrs, [:email])
|> validate_required([:email])
end
end
Citext also plays nicely with other features we might want to use. Often we'll want things like email addresses and usernames to be unique, so let's take this a little further and add a unique index.
Step 3. Add a unique index #
We don't need to do anything special to get case-insensitive uniqueness enforced. Since the email column is already using citext, we can just add a unique index as we would to a text field. Let's add this to the same migration.
defmodule MyApp.Repo.Migrations.CitextEmails do
use Ecto.Migration
def up do
execute "CREATE EXTENSION citext"
alter table(:users) do
add(:email, :citext, null: false)
end
create index(:users, [:email], unique: true)
end
def down do
alter table(:users) do
remove(:email)
end
execute "DROP EXTENSION citext"
end
end
As the last piece, let's add a uniqueness constraint in the user schema so that ecto can add errors directly to the changeset.
defmodule MyApp.User do
use Ecto.Schema
import Ecto.Changeset
schema "users" do
field(:email, :string)
end
def changeset(struct, attrs) do
struct
|> cast(attrs, [:email])
|> validate_required([:email])
|> unique_constraint(:email)
end
end
And that's it for implementation! We've moved responsibility for thinking about case-insensitivity from the application to the database. Let's take a look at what this means for working with the email column in the console.
Usage #
First, we'll set up sample data to play with. We'll start with two users, one with an email that's all lowercase and one that's all uppercase.
iex> alias MyApp.Repo
MyApp.Repo
iex> alias MyApp.User
MyApp.User
iex> %User{} |> User.changeset(%{email: "lower@viget.com"}) |> Repo.insert()
INSERT INTO "users" ("email") VALUES ($1) RETURNING "id" ["lower@viget.com"]
{:ok,
%MyApp.User{
__meta__: #Ecto.Schema.Metadata<:loaded, "users">,
email: "lower@viget.com",
id: 1
}}
iex> %User{} |> User.changeset(%{email: "UPPER@VIGET.COM"}) |> Repo.insert()
INSERT INTO "users" ("email") VALUES ($1) RETURNING "id" ["UPPER@VIGET.COM"]
{:ok,
%MyApp.User{
__meta__: #Ecto.Schema.Metadata<:loaded, "users">,
email: "UPPER@VIGET.COM",
id: 2
}}
Now we can use all the same queries we would use for regular text columns, and Postgres will automatically make these comparisons case-insensitive.
We can use Ecto.Repo.get_by/3
to fetch a user record by the email field, and it doesn't matter what capitalization we use when writing the query:
iex> Repo.get_by(User, %{email: "LOWER@VIGET.COM"})
SELECT u0."id", u0."email" FROM "users" AS u0 WHERE (u0."email" = $1) ["LOWER@VIGET.COM"]
%MyApp.User{
__meta__: #Ecto.Schema.Metadata<:loaded, "users">,
email: "lower@viget.com",
id: 1
}
We can search users using Ecto.Repo.where/3
with ==
and it will ignore case when searching for matches:
iex> User |> where([u], u.email == "upper@viget.com") |> Repo.all()
SELECT u0."id", u0."email" FROM "users" AS u0 WHERE (u0."email" = 'upper@viget.com') []
[
%MyApp.User{
__meta__: #Ecto.Schema.Metadata<:loaded, "users">,
email: "UPPER@VIGET.COM",
id: 2
}
]
Even Ecto.Query.API.like/3
will allow us to ignore case when using wildcards, like searching for all emails with the same domain:
iex> import Ecto.Query
Ecto.Query
iex> (from u in User, where: like(u.email, "%viget.com")) |> Repo.all()
SELECT u0."id", u0."email" FROM "users" AS u0 WHERE (u0."email" LIKE '%viget.com') []
[
%MyApp.User{
__meta__: #Ecto.Schema.Metadata<:loaded, "users">,
email: "lower@viget.com",
id: 1
},
%MyApp.User{
__meta__: #Ecto.Schema.Metadata<:loaded, "users">,
email: "UPPER@VIGET.COM",
id: 2
}
]
The unique constraint is also enforced exactly as we'd expect. It won't allow us to create two users with the same email but different capitalization.
iex> %User{} |> User.changeset(%{email: "annie@viget.com"}) |> Repo.insert()
INSERT INTO "users" ("email") VALUES ($1) RETURNING "id" ["annie@viget.com"]
{:ok,
%MyApp.User{
__meta__: #Ecto.Schema.Metadata<:loaded, "users">,
email: "annie@viget.com",
id: 1
}}
iex> %User{} |> User.changeset(%{email: "ANNIE@VIGET.COM"}) |> Repo.insert()
INSERT INTO "users" ("email") VALUES ($1) RETURNING "id" ["ANNIE@VIGET.COM"]
{:error,
#Ecto.Changeset<
action: :insert,
changes: %{email: "ANNIE@VIGET.COM"},
errors: [
email: {"has already been taken",
[constraint: :unique, constraint_name: "users_email_index"]}
],
data: #MyApp.User<>,
valid?: false
>}
Postgres' citext
extension was a perfect solution for the problem I started with. Using it fixed the original bug, simplified several queries in my application, and now I can sleep at night knowing that similar bugs are not going to crop up in the future.