Searching Serialized Fields in Rails Using Postgres Arrays
Ryan Stenberg, Former Developer
Article Category:
Posted on
In Rails, ActiveRecord allows you store and retrieve data structures like arrays and hashes in a single string
or text
-type database column through application-level serialization.
It is a common pattern with models having fields like tags
or nicknames
, where you want to be able to store an arbitrary list of strings:
class Post < ActiveRecord::Base serialize :tags end
This is all fine n' dandy until you're like, "I want to find all posts tagged with either 'hobbits' or 'gandalf'!" Well..
The database only knows it has a string, not a comma-separated list of strings. It is possible to search application-level serialized fields, but it's pretty hacky and brittle:
class Post scope :including_all_tags, -> (tags) { where(matching_tag_query(tags, 'AND')) } scope :including_any_tags, -> (tags) { where(matching_tag_query(tags, 'OR')) } private def matching_tag_query(tags, condition_separator = 'OR') tags.map { |tag| "(tags LIKE '%#{tag}%')" }.join(" #{condition_separator} ") end end # In use Post.including_all_tags(['hobbits', 'gandalf']) Post.including_any_tags(['hobbits', 'gandalf'])
This approach builds a WHERE
condition and matches against the given tags, which grows and becomes increasingly inefficient as the number of searched tags increases. Also, if you have tags like gandalf
and gandalf-the-grey
, you'd get false positives with the above example. While passable for simple cases, there's got to be a better way! Fortunately for us, along comes Postgres.
PostgreSQL Arrays
Postgres has built-in array-type columns, which we can take advantage of in our Rails 4.X apps with the postgres_ext
gem. What does this mean for our Rails apps? If we can store arrays in a single database column, then our database and application have the same understanding of the structure of the data. This isn't possible with the out-of-the-box serialization model. Most importantly, we can use SQL to effectively search inside our array-type columns.
With PostgresExt installed, we can add an array: true
flag to string/text columns:
class PostgresHaveMyBabiesMigration < Migration def change create_table :posts do |t| t.text :tags, array: true end end end
Now, the model is no longer concerned with serializing the :tags
field:
class Post < ActiveRecord::Base # crickets end
Searching
As mentioned, one of the primary benefits of Postgres's array-type column is the enhanced searchability through array-specific SQL. The Postgres Extensions gem unlocks a number of array operators and functions for usage in SQL strings. Let's take a look at operators in both their SQL and Arel forms. If you're unfamiliar with Arel, check out this post where I expand on its usefulness.
Overlap Operator
The overlap operator (&&
) is useful when you're comparing two arrays to see if they share one or more elements in common.
In SQL:
SELECT * FROM posts WHERE posts.tags && '{hobbits,gandalf}'
Utilizing Arel:
Post.where(Post.arel_table[:tags].overlap(['hobbits', 'gandalf'])) # or Post.where.overlap(tags: ['hobbits', 'gandalf'])
These queries would return any posts having either 'hobbit' or 'gandalf' as a tag.
Contains Operator
The contains operator (@>
) is useful when you're looking for arrays that contain all the elements of another.
In SQL:
SELECT * FROM posts WHERE posts.tags @> '{hobbits,gandalf}'
Utilizing Arel:
Post.where(Post.arel_table[:tags].contains(['hobbits', 'gandalf'])) # or Post.where.contains(tags: ['hobbits', 'gandalf'])
These queries would return any posts having both the 'hobbits' and 'gandalf' tags.
Wrapping Up
Postgres is awesome. The array extensions are magical. Instead of application-level serialization, let Postgres handle array persistance for you. It unifies the understanding of array-type data between the database and application and simplifies searching.