-
Notifications
You must be signed in to change notification settings - Fork 2.6k
ActiveRecord SQL Styleguide
update_all
and where
both use "condition" syntax. Condition syntax is either a literal string of SQL, an array with the first element being SQL and insertion markers, and the rest being the insertions, or a hash. The following rules apply:
-
Hash syntax is preferred (if possible), then array, then literal:
where(user_id: user) # FIRST CHOICE :) where("user_id=?", user) # SECOND CHOICE :| where("user_id=#{user.id}") # THIRD CHOICE :( This is a good way to allow SQL injection attacks.
Remember, this applies to
update_all
:update_all(user_id: user)
Also to literals (scalar or arrays):
where(workflow_state: 'deleted') where(workflow_state: ['invited', 'active'])
And even to betweens:
# BETTER where(id: min..max) # WORSE where("id>=? AND id<=?", min, max)
-
Hash syntax is preferred for several reasons. It is easier (read: possible) for sharding to parse to do automatic shard inference and id translation for you, is less susceptible (not possible) to SQL injection, allows duplicate-condition folding, and allows for proper per-database quoting of column names (such as the from column on the messages table).
-
Nested hash syntax is preferred over string keys:
# BETTER where(enrollments: { user_id: user, workflow_state: 'active') # WORSE where('enrollments.user_id' => user, 'enrollments.workflow_state' => 'active')
-
-
If hash syntax is not possible, don't use insertions for string and integer literals:
# BETTER where("workflow_state<>'deleted'") # WORSE where("workflow_state<>?", 'deleted')
Once we've left hash syntax, we lose its benefits, so don't make it do extra work of doing an insertion. Insertions are still necessary for boolean and date literals, since those are formatted differently depending on the database.
-
Don't use extraneous braces and brackets.
where
(but notupdate_all
) takes a splat, so no need to use brackets there:# BETTER where(user_id: user) # WORSE where({user_id: user}) # BETTER where("created_at<?", 1.week.ago) # WORSE where(["created_at<?", 1.week.ago]) # WORKING update_all(["updated_at=created_at, user_id=?", new_user]) # NOT WORKING! update_all("updated_at=created_at, user_id=?", new_user)
-
Pass AR objects instead of IDs if you have them. This reduces code (no
.id
ormap(&:id)
oruser.is_a?(User) ? user.id : user
), and is also helpful for sharding because id translation is delayed until the shard the query will actually run on is active:# BETTER where(user_id: user) # WORSE where(user_id: user.id) where(user_id: user.try(:id)) where(user_id: (user.is_a?(User) ? user.id : user)) where(user_id: users.map(&:id)
This also applies to
update_all
, and array syntax (update_all("created_at=updated_at, user_id=?", user)
). This even means you don't have to worry about writing a differentIS NULL
query (if using hash syntax). -
When using SQL, don't have whitespace between non-word operators. This slightly reduces network traffic, but more importantly lets ops and monitoring see more of a query in a very small space:
# BETTER where("workflow_state<>'deleted'") # WORSE where("workflow_state <> 'deleted'")
-
When using SQL, use double quotes. Thus your string literals will work with single quotes:
# BETTER where("workflow_state<>'deleted'") # WORSE where('workflow_state<>?', 'deleted')
(notice that single quotes forces me to either use an insertion when I shouldn't or escape the single quote itself.)
-
Prefer
<>
over!=
in SQL# BETTER where("workflow_state<>'deleted'") # WORSE where("workflow_state!='deleted'")
select
, order
, and group
can take an array and symbols now. Rails will uniq duplicate columns.
- Prefer array and symbols if possible:
# BETTER
select([:id, :name])
# WORSE
select("id, name")
-
order
andgroup
can take a splat, so omit the brackets:
# BETTER
order(:id, :name)
# WORSE
order([:id, :name])
- use hash for desc
order
rather than raw SQL:
# BETER
order(id: :desc)
# WORSE
order("id DESC")
distinct
is a thing. It's preferred to use distinct
and make a scope, rather than use a select("DISTINCT something")
:
# BETTER
select(:id).distinct
# WORSE
select("DISTINCT id")
pluck
is a thing. It saves instantiating AR objects (not a cheap process).
# BETTER
pluck(:id)
# WORSE
select(:id).map(&:id)
first
and last
will automatically add an order (by primary key, or reverse an existing order) if necessary.
# BEST
User.last
User.order(:position).last
# BETTER
User.order(:id).last
# WORSE
User.order(id: :desc).first
# Rails screws this up ("ORDER BY COALESCE(id DESC, 0) DESC")
User.order("COALESCE(id, 0)").last
# so you have to do "WORSE"
User.order("COALESCE(id, 0) DESC").first
exists?
can be used to check if there are any records matching the query. Typically in the past we've written this as query.first.present?
, !!query.first
, query.empty?
or query.count > 0
, simply due to ignorance of exists?
. exists?
is the preferred method. count > 0
is bad because it has to count all rows, which can be much slower than just checking if at least one row exists. Be kind to your DB.
# BETTER
User.where(id: ids, workflow_state: 'active').exists?
# WORSE
User.where(id: ids, workflow_state: 'active').first.present?
!!User.where(id: ids, workflow_state: 'active').first
# EVEN WORSE
User.where(id: ids, workflow_state: 'active').count > 0
User.where(id: ids, workflow_state: 'active').any?
Just don't. See Associations.
Do not put options on your associations. This just causes a lot of pain later. Remember, it's easier to add to a scope than to remove from it.
# BETTER
has_many :assignments
def self.assignments_with_submissions
assignments.include(:submissions)
end
# WORSE
has_many :assignments, -> { includes(:submissions) }
course.assignments.where(assignment_group_id: 1).each {} # just loaded a TON of submission objects, that you probably didn't need
# BETTER
# course.rb
has_many :enrollments
# enrollment.rb
scope :active, -> { where(workflow_state: 'active') }
# WORSE
# course.rb
has_many :enrollments, -> { where(workflow_state: 'active') }
# in a datafixup
course.enrollments.where(user_id: u).delete_all # we miss data, assuming enrollments means, you know, enrollments, regardless of state
Preloading can be a huge performance boon, by avoiding N+1 queries. However, you must do it carefully.
Prefer eager_load
/preload
over includes
. eager_load
always does a LEFT OUTER JOIN
; preload
always splits it into two queries. includes
may do either, even depending on the contents of user provided data. This can be especially bad if the preloaded data crosses shard boundaries - you just broke your query! Usually you want to do preload
- doing a LEFT OUTER JOIN
makes very hard to understand and reason-about queries, and actually transfers a lot more data because the non-changing data is duplicated with each row in a one-to-many.
# GOOD
course.enrollments.preload(user: :communication_channels).where(type: some_variable)
# BAD
# if some_variable happens to contain the string "users." or "communication_channels.", then Rails will use
# a left outer join in a single query. Not only is the SQL very hard to read, but it also breaks in a
# multi-shard environment. It will assume it got all the data, even though some users may belong on a
# different shard, so it won't pull in their communication_channels from that shard.
course.enrollments.includes(user: :communication_channels).where(type: some_variable)
# GOOD
course.enrollments.eager_load(:user).where(users: { name: 'Cody' })
Sometimes you want a relation (to continue chaining query methods). Sometimes you want an array (to further manipulate it, or just to access it a lot). In Rails 2 and Rails 3, #all
will give you an array. In Rails 4, #all
just returns another relation. Use #to_a
instead; it works in all versions of Rails.
Some queries you just can't optimize. You'll want to run them on the slave:
Shackles.activate(:slave) do
some_really_expensive_query
end
Sometimes you're building up a relation, and then discover that it shouldn't return any results. However, you can't return an empty array, because the caller expects a relation. Instead of using a dummy condition that can never be true, use the none
scope. It behaves like a relation, but will avoid even going to the database at all.
def students_that_basket_weave
students = self.students
unless feature_enabled?(:basket_weaving)
return students.none
end
students
end
ActiveRecord can take a subquery as a condition. This can be useful in various ways. For example, a NOT EXISTS
query can be more performant than doing a join, or for forming a query slightly different than an association would.
User.where(id: list_of_users).where("NOT EXISTS (?)", Enrollment.active.where("user_id=users.id")).to_a
# Returns users that don't have any active enrollments (as opposed to checking them one at a time with
# list_of_users.select { |u| !u.enrollments.exists? } )
# => SELECT "users".* FROM "users" WHERE "users"."id" IN (1, 2) AND (NOT EXISTS (SELECT "enrollments".* FROM "enrollments" WHERE (enrollments.workflow_state<>'deleted') AND (user_id=users.id)))
You can merge associations to easily find a many-to-many record:
assignment.submissions.merge(user.submissions.scoped).first
# Returns the user's assignment submission, without having to explicitly add the conditions from both associations
# => SELECT "submissions".* FROM "submissions" WHERE "submissions"."assignment_id" = 1 AND "submissions"."user_id" = 6
When using raw SQL, quote the table name whenever used in FROM clauses (including all JOINs), and the target of UPDATEs and DELETEs. It is not necessary for the conditions of WHERE or HAVING clauses, or on ORDER BY clauses. This allows us to use multiple shards in the same database, without switching users or schema search paths:
# GOOD
User.connection.update("UPDATE #{User.quoted_table_name} SET #{some_really_complicated_dynamic_set}")
Assignment.joins("INNER JOIN #{Course.quoted_table_name} ON context_id=courses.id")
# BAD
User.connection.update("UPDATE users SET #{some_really_complicated_dynamic_set}")
Assignment.joins("INNER JOIN courses ON context_id=courses.id")
Vanilla Rails doesn't support joins on updates or deletes. Canvas does, by using Postgres and MySQL specific syntax for it.
Enrollment.joins(:course).where(courses: { root_account_id: 1 }).update_all(updated_at: Time.now.utc)
# => UPDATE "enrollments" SET "updated_at" = '2015-01-27 17:12:04.127305' FROM "courses" WHERE "courses"."root_account_id" = 1 AND ("courses"."id" = "enrollments"."course_id")
In data fixups, you may need to update or delete a lot of rows. If you do a normal update or delete, it will do the entire table at once. You could use a find_ids_in_batches or find_ids_in_ranges, but that may force a non-optimal query plan (essentially a single pass over the entire table, even though there's a useful index on one of your conditions). Instead, just loop around a limited update_all, and it will use the index, but only do a batch at a time. Notice that it simply does a subquery. This can also be useful to delete all but the first record from a scope (i.e. when deleting duplicates):
while course.enrollments.limit(1000).update_all(updated_at: Time.now.utc) > 0; end
# => UPDATE "enrollments" SET "updated_at" = '2015-01-27 17:15:04.313610' WHERE "enrollments"."id" IN (SELECT "enrollments"."id" FROM "enrollments" WHERE "enrollments"."course_id" = 1 AND (enrollments.workflow_state != 'deleted') LIMIT 1000)
duplicate_scope = Enrollment.where(user_id: 1, type: 'StudentEnrollment', course_id: 2)
duplicate_scope.offset(1).delete_all
If you have a lot of data to insert, and don't need to run ActiveRecord callbacks and such, you can use bulk_insert
. It's even faster than running a transaction with a bunch of INSERT statements in it. It just takes an array of hashes:
def bulk_insert_participants(user_ids, options = {})
options = {
:conversation_id => self.id,
:workflow_state => 'read',
:has_attachments => has_attachments?,
:has_media_objects => has_media_objects?,
:root_account_ids => self.root_account_ids
}.merge(options)
ConversationParticipant.bulk_insert(user_ids.map{ |user_id|
options.merge({:user_id => user_id})
})
end
Are you looking for one of our commercial subscriptions, professional services, support, or our hosted solution? Check out canvaslms.com.