Notes from Carrierwave to Active Storage

I recently migrated Day of the Shirt, my graphic t-shirt sale aggregator, from storing image attachments with Carrierwave to Active Storage. It went ok! 👍

There were a couple of things driving this migration, though Carrierwave had served me very well for nearly a decade:

  • For budgetary reasons, I was moving the storage service from S3 to Digital Ocean Spaces. I knew I’d be doing some sort of data migration regardless.
  • I was using some monkeypatches of Carrierwave v2 that weren’t compatible with Carrierwave v3. So I knew I’d have to dig into the internals anyways if I wanted to stay up to date.
  • I generally trust Rails, and by extension Active Storage, to be reliable stewards when I take them on as a dependency.

And I had a couple of requirements to work through, largely because images are the content on Day of the Shirt, with dozens or hundreds displayed on a single page:

  • For budget (slash performance), I need to link directly to image assets. No proxying or redirecting through the Rails app.
  • For SEO, I need to customize the image filenames so they are relevant to the content.
  • For performance (slash availability), I need to pre-process image transformations (convert, scale, crop) before they are published. Dozens of new designs can go up on the homepage at once.
  • For availability, I need to validate that the images are (1) transformable and (2) actually transformed before they are published; invalid or missing images are unacceptable.

How’d it go? Great! 🎉 I am now fully switched over to Active Storage. It’s working really well and I was able to meet all of my requirements. Active Storage is very nice, as nice as Carrierwave.

But the errata? Yes, that’s why I’m writing the blog post, and probably why you’re reading. To document all of the stuff I did that wasn’t in the very excellent Active Storage Rails Guide. Let’s go through it:

Direct Linking to images is possible via the method described in this excellent post from Florin Lipan: “Serving Active Storage uploads through a CDN with Rails direct routes”.

Customizing Active Storage filenames is possible with a monkeypatch (maybe someday it will be possible directly). The patch simply adds the specified filename to the end of what otherwise would be a random string; and it seems durable through variants such that the variant extensions will be updated properly when the format is transformed (e.g. from a .png to a .jpg):

# config/initializers/active_storage.rb
module MonkeypatchBlobKey
  def key
    self[:key] ||= begin
      # ActiveStorage::Filename doesn't provide an easy nil-check
      filename_string = begin
        filename.to_s
      rescue StandardError
        nil
      end

      unique_token = self.class.generate_unique_secure_token(length: ActiveStorage::Blob::MINIMUM_TOKEN_LENGTH)
      if filename_string
        # "xyz1234/foobar.jpg"
        File.join(unique_token, filename_string)
      else
        # "xyz1234"
        unique_token
      end
    end
  end
end

ActiveSupport.on_load(:active_storage_blob) do
  ActiveStorage::Blob.prepend MonkeypatchBlobKey
end
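
To make the resulting key shape concrete, here is a small plain-Ruby sketch of the same composition outside of Rails. The build_blob_key helper is hypothetical, and SecureRandom.alphanumeric with a length of 28 is only an assumed stand-in for Active Storage's own token generator and MINIMUM_TOKEN_LENGTH:

```ruby
require "securerandom"

# Hypothetical helper, for illustration only: mirrors the shape of the
# monkeypatched key ("<random token>/<filename>") without loading Rails.
# The 28-character lowercase token is an assumption, not Active Storage's API.
def build_blob_key(filename, token_length: 28)
  token = SecureRandom.alphanumeric(token_length).downcase
  name = filename.to_s
  # Fall back to the bare token when there is no usable filename
  name.empty? ? token : File.join(token, name)
end

puts build_blob_key("foobar.jpg") # "<28 random characters>/foobar.jpg"
puts build_blob_key(nil)          # "<28 random characters>"
```

Because the filename is only appended after the random token, two uploads named foobar.jpg still get distinct keys.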

Preprocessing variants required tapping into some private methods to get the variant names back out of the system. Here’s an example of processing all of the variants when the attachment changes. Beware: attachments happen in an after_commit, which is good, but means that I had to introduce a published state to the record to ensure it was not visible until the variants were processed (there is a preprocessed: option to process individual variants async in a background job but that, unfortunately, doesn’t meet my needs for synchronizing them all at once):


class Shirt < ApplicationRecord
  has_one_attached :graphic do |attachable|
    attachable.variant :full, format: :jpg
    attachable.variant :large, resize_to_limit: [1024, 1024], format: :jpg
    attachable.variant :square, resize_to_fill: [300, 300], format: :jpg
    attachable.variant :thumb, resize_to_fill: [100, 100], format: :jpg
  end

  after_commit :process_graphic_variants_and_publish, if: -> (shirt){ shirt.graphic&.blob&.saved_changes? }, on: [:create, :update]

  def process_graphic_variants_and_publish
    attachment_variants(:graphic).each do |variant|
      graphic.variant(variant).processed
    end
    update(published: true)
  end

  # All of the named variants for an attachment
  # @param attachment [Symbol] the name of the attachment
  # @return Array[Symbol] the names of the variants
  def attachment_variants(attachment)
    send(attachment).attachment.send(:named_variants).keys
  end
end

Validating variants was easy with a very nice and well-named gem: active_storage_validations. It works really well.

You will have N+1s, where you forget to add with_attached_* scopes to some queries. Unfortunately Active Storage’s schema is laid out in a way that it will emit queries to the same model/table even when it’s loading correctly, so you may get detection false positives too. You can see that clearly in the next example with the doubly-nested blob association.

Active Storage’s schema is a beast. I get that it’s gone through a lot of changes, and Named Variants are an amazing hack when you see how they’ve been implemented. And it’s wild. You can see that by how the scope for with_attached_* is generated:

includes("#{name}_attachment": { blob: {
  variant_records: { image_attachment: :blob },
  preview_image_attachment: { blob: { variant_records: { image_attachment: :blob } } }
} })

I originally thought that when eager-loading through an association (e.g. Merchant.includes(:shirts)) I’d have to do something like this (🫠):

Merchant.includes(shirts: { graphic_attachment: { blob: {
  variant_records: { image_attachment: :blob },
  preview_image_attachment: { blob: { variant_records: { image_attachment: :blob } } }
} } })

…but fortunately this seems to work too (💅):

Merchant.includes(:shirts).merge(Shirt.with_attached_graphic)

That’s everything. All in all I’m very happy with the migration 🌅

Introducing GoodJob v4

GoodJob version 4.0 has been released! 🎉 GoodJob v4 has breaking changes that should be addressed through a transitionary v3.99 release, but if you’ve kept up with v3.x releases and migrations, you’re likely ready to upgrade 🚀

The README has an upgrade guide. If you’d like to leave feedback about this release, please comment on the GitHub Discussions post 📣

If you’re not familiar with GoodJob, you can read the introductory blog post from four years ago. We’ve come pretty far.

Breaking changes to job schema

GoodJob v4 changes how job and job execution records are stored in the database; moving from job and executions being commingled in the good_jobs table to Jobs (still in good_jobs) having many discrete Execution records in the good_job_executions table.

To safely upgrade, all unfinished jobs must use the new schema relationship, tracked in the good_jobs.is_discrete column. This change was transparently introduced in GoodJob v3.15.4 (April 2023), so your application is likely ready-to-upgrade already if you have kept up with GoodJob updates and migrations. You can check by running v3.99’s GoodJob.v4_ready? in production or run the following SQL query on the production database and check it returns zero: SELECT COUNT(*) FROM "good_jobs" WHERE finished_at IS NULL AND is_discrete IS NOT TRUE. If not all unfinished jobs are stored in the new format, either wait to upgrade until those jobs finish or discard them. If you upgrade prematurely to v4 without allowing those jobs to finish, they may never be performed.
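
The IS NOT TRUE in that query matters because it also counts rows where is_discrete is NULL, not only rows where it is false. As a rough sketch of the same predicate in plain Ruby (the unfinished_legacy_jobs helper and the hash-shaped rows are hypothetical stand-ins for database rows, not GoodJob's API):

```ruby
# Hypothetical stand-in for rows in the good_jobs table.
JOBS = [
  { id: 1, finished_at: Time.now, is_discrete: nil },  # finished: ignored
  { id: 2, finished_at: nil, is_discrete: true },      # unfinished, new format: fine
  { id: 3, finished_at: nil, is_discrete: false },     # unfinished, old format: blocks the upgrade
  { id: 4, finished_at: nil, is_discrete: nil }        # NULL also counts as "not true"
].freeze

# Same condition as: finished_at IS NULL AND is_discrete IS NOT TRUE
def unfinished_legacy_jobs(rows)
  rows.select { |row| row[:finished_at].nil? && row[:is_discrete] != true }
end

blocking = unfinished_legacy_jobs(JOBS)
puts "#{blocking.size} job(s) still in the old format: #{blocking.map { |row| row[:id] }.inspect}"
```

An empty result corresponds to the SQL query returning zero, which is the state to reach before upgrading.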

Other notable changes

GoodJob v4:

  • Only supports Rails 6.1+, CRuby 3.0+ and JRuby 9.4+, Postgres 12+. Rails 6.0 is no longer supported. CRuby 2.6 and 2.7 are no longer supported. JRuby 9.3 is no longer supported.
  • Changes job priority to give smaller numbers higher priority (default: 0), in accordance with Active Job’s definition of priority.
  • Enqueues and executes jobs via the GoodJob::Job model instead of GoodJob::Execution
  • Changes the behavior of config.good_job.cleanup_interval_jobs, GOOD_JOB_CLEANUP_INTERVAL_JOBS, config.good_job.cleanup_interval_seconds, and GOOD_JOB_CLEANUP_INTERVAL_SECONDS: setting them to nil or 0 no longer disables count- or time-based cleanups. Instead, set them to false to disable cleanups, or to -1 to run a cleanup after every job execution.

New Features

GoodJob v4 does not introduce any new features on its own. In the 110 releases since GoodJob v3.0 was released (June, 2022), these new features and improvements have been introduced:

  • Batches
  • Bulk enqueueing including support for Active Job’s perform_all_later.
  • Labelled jobs
  • Throttling added to Concurrency Controls
  • Improvements to the Web Dashboard, including Dark Mode, a performance dashboard, improved UI, and customizable templates.
  • Storage of error backtraces. Improved handling of job error conditions, including signal interruptions. Added GoodJob.current_thread_running? and GoodJob.current_thread_shutting_down? to support job iteration.
  • Ordered Queues, queue_select_limit and further options for configuring queue order and performance.
  • Improvements to Cron / Repeating Jobs.
  • Operational improvements including systemd integration, improved health checks.

A huge thank you to 88 (!) GoodJob v3.x contributors 🙇🏻 @afn, @ain2108, @aisayo, @Ajmal, @aki77, @alec-c4, @AndersGM, @andyatkinson, @andynu, @arnaudlevy, @baka-san, @benoittgt, @bforma, @BilalBudhani, @binarygit, @bkeepers, @blafri, @blumhardts, @ckdake, @cmcinnes-mdsol, @coreyaus, @DanielHeath, @defkode, @dixpac, @Earlopain, @eric-christian, @erick-tmr, @esasse, @francois-ferrandis, @frans-k, @gap777, @grncdr, @hahwul, @hidenba, @hss-mateus, @Intrepidd, @isaac, @jgrau, @jklina, @jmarsh24, @jpcamara, @jrochkind, @julienanne, @julik, @LucasKendi, @luizkowalski, @maestromac, @marckohlbrugge, @maxim, @mec, @metalelf0, @michaelglass, @mitchellhenke, @mkrfowler, @morgoth, @Mr0grog, @mthadley, @namiwang, @nickcampbell18, @padde, @patriciomacadden, @paul, @Pauloparakleto, @pgvsalamander, @remy727, @rrunyon, @saksham-jain, @sam1el, @sasha-id, @SebouChu, @segiddins, @SemihCag, @shouichi, @simi, @sparshalc, @stas, @steveroot, @TAGraves, @tagrudev, @thepry, @ur5us, @WailanTirajoh, @yenshirak, @ylansegal, @yshmarov, @zarqman

Rails Strict Locals, undefined local_assigns, and reserved keywords

Update: This has been mostly fixed upstream in Rails (Rails 8.0, I think) and documented in the Rails Guides.

Huge thank you to Vojta Drbohlav in the Rails Performance Slack for helping me figure this out! 🙇

Things I learned today about a new-to-me Ruby on Rails feature:

  • Rails 7.1 added a feature called “Strict Locals” that uses a magic comment in ERB templates to declare required and optional local variables. It looks like this: <%# locals: (shirts:, class: nil) %> which in this example means that when rendering the partial, it must be provided a shirts parameter and may be given an optional class parameter. Having ERB templates act more like callable functions with explicit signatures is a nice feature.
  • When using Rails Strict Locals, the local_assigns variable is not defined. You can’t use local_assigns. You’ll see an error that looks like ActionView::Template::Error (undefined local variable or method 'local_assigns' for an instance of #<Class:0x0000000130536cc8>). This has been fixed 🎉 though local_assigns doesn’t expose default values.
  • This is a problem if your template has locals that are also Ruby reserved keywords like class or if, which can be accessed with local_assigns[:class] unless you start using Strict Locals.
    To access local variables named with reserved keywords in your ERB template when using Strict Locals, you can use binding.local_variable_get(:the_variable_name), e.g., binding.local_variable_get(:class) or binding.local_variable_get(:if). This is still necessary if you want to access reserved keywords with defaults because the defaults don’t show up in local_assigns.
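
The same trick can be tried in plain Ruby, since a strict-locals declaration reads much like a method's keyword arguments. A minimal sketch, where shirt_list_classes is a made-up method standing in for a partial declared with locals: (shirts:, class: nil):

```ruby
# Made-up method for illustration; the signature mirrors the magic comment
# `locals: (shirts:, class: nil)` from the example above.
def shirt_list_classes(shirts:, class: nil)
  # `class` on its own would be parsed as the keyword, so read the local
  # through the binding instead. The default value (nil) is visible here too.
  extra_class = binding.local_variable_get(:class)
  ["shirt-list", extra_class].compact.join(" ")
end

puts shirt_list_classes(shirts: [], class: "featured") # prints "shirt-list featured"
puts shirt_list_classes(shirts: [])                    # prints "shirt-list"
```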

Recently, June, 2024

  • I finished reading the Poppy War trilogy. It got tiresome by the end. I liked Babel much more, and I’m probably reading Yellowface next. I also read Exit Interview, which was another thrilling entry to the canon of “fantastically brilliant not-men who work in tech for whom it really should go better but unfortunately and predictably doesn’t”.
  • We saw “Film is dead. Long live film!” at the Roxie. It was enjoyable, reminded me of my big-Cable-Access-TV-energy days, and also gave far too little screen time to hearing from the film collectors’ wives (yes, exactly) and children. I thought this was my first movie theater since Covid, but Angelina reminded me we saw the Barbie Movie in a theater.
  • I played Animal Well until the credits roll, and then I read the spoilers and have been going for completionism. Though I don’t imagine I’ll get there before fully losing interest.
  • I joined the Program Committee for RubyConf in Chicago in November. We’re trying to get the Call for Papers/Speakers released this week. Should be a good one.
  • I started working on the GoodJob major version 4 release. It’s simply doing all the breaking changes that were previously warned about. A deprecation-warning made is debt unpaid. It’s not so bad.
  • With my friend Rob, I started volunteering on the backend of Knock for Democracy. I occasionally see people try to make a Tolstoy-inspired statement like “Healthy applications are all the same, but unhealthy ones are each unhealthy in their own way”. But that’s not true! All the apps that need some TLC have the exact same problems as all the others. Surprise me for once!
  • At work I’m nearly, nearly done with performance feedback and promotion packets and calibrations and all that. It’s the stuff that I truly enjoy as a manager, and also is terrible because of everything that’s outside my control. I also got asked to draw up a Ruby sponsorship budget for the year, which is the most normal administrative thing I think I’ve ever been asked in my (checks watch) 12 years of working in Bay Area tech.
  • I think the days of mobile apps for Day of the Shirt are numbered. I haven’t updated them ever since Apple rejected an update for the olde “Section 4.2: Minimum Functionality” thing, after like 8 freaking years in the App Store already. I did the olde “talk to someone on the phone at Apple who unhelpfully can’t tell me what would be enough functionality.” And so they’ve just been sitting and recently I got an email from a user that it won’t install on their new Android phone. So that sucks. It was a pain (when I was doing it) to develop 3 separate versions of Day of the Shirt: web, iOS, and Android, so maybe this is a sign to probably just commit to the web.
  • A triple recipe of bolognese is too much for my pot to handle.

A comment on Second Systems

I recently left this comment on a Pragmatic Engineer review of Fred Brooks’s Mythical Man-Month in “What Changed in 50 Years of Computing: Part 2”. This was what I reacted to:

Software design and “the second-system effect”

Brooks covers an interesting phenomenon in Chapter 5: “The Second-System Effect.” He states that architects tend to design their first system well, but they over-engineer the second one, and carry this over-engineering habit on to future systems.

“This second system is the most dangerous system a [person] ever designs. When [they] do this and [their] third and later ones, [their] prior experiences will confirm each other as to the general characteristics of such systems, and their differences will identify those parts of [their] experience that are particular and not generalizable.”

“The general tendency is to over-design the second system, using all the ideas and frills that were sidetracked on the first one.”

I can see this observation making sense at a time when:

  • Designing a system took nearly a year
  • System designs were not circulated, pre-internet
  • Architects were separated from “implementers”

Today, all these assumptions are wrong:

  • Designing systems takes weeks, not years
  • System designs are commonly written down and critiqued by others. We cover more in the article, Engineering Planning with RFCs, Design Documents and ADRs
  • “Hands-off architects” are increasingly rare at startups, scaleups, and Big Tech

As a result, engineers design more than one or two systems in a year and get more feedback, so this “second-system effect” is likely nonexistent.

And this was my comment/reaction:

I think the Second-System Effect is still very present.

I would say it most frequently manifests as a result of not recognizing Gall’s Law: “all complex systems that work evolved from simpler systems that worked.”

What trips people up is usually that they start from a place of “X feature is hard to achieve in the current system” and then they start designing/architecting for that feature and not recognizing all of the other table-stakes necessities and Chesterton Fences of the current system, which only are recognized and bolted on late in the implementation when it is more difficult and complicated.

The phrase “10 years of experience, or 1 year of experience 10 times” comes to mind when thinking of people who only have the experience of implementing a new system once and trivially, and do not have the experience of growing and supporting and maintaining a system they designed over a meaningful lifecycle.

Which also reminds me of a recent callback to a Will Larson review of Kent Beck’s Tidy First about growing software:

I really enjoyed this book, and reading it I flipped between three predominant thoughts:

  • I’ve never seen a team that works this way–do any teams work this way?
  • Most of the ways I’ve seen teams work fall into the “never tidy” camp, which is sort of an implicit belief that software can’t get much better except by replacing it entirely. Which is a bit depressing, if you really think about it
  • Wouldn’t it be inspiring to live in a world where your team believes that software can actually improve without replacing it entirely?

A Ruby Meetup about Rails autoloading and 3 Podcasts

Me standing on a small stage in front of a slide with 2 adoptable cats and the GitHub logo

Last week I spoke at the SF Bay Area Ruby Meetup, which was hosted at GitHub HQ, which made for an easy commute for me. Here’s the video and the slides. My talk was entitled “An OK compromise: Faster development by designing for the Rails autoloader”

Also, I haven’t shared here the 3 podcasts I did over the past few years. Here they are:

Rails Active Record: Will it bind?

Active Record, Ruby on Rails’s ORM, has support for Prepared Statements that work if you structure your query for it. Because of my work on GoodJob, which can make a lot of nearly identical database queries every second to pop its job queue, I’ve invested a lot of time trying to make those queries as efficient as possible.

Prepared Statements are a database feature that allow the database to reuse query parsing and planning when queries are structurally the same. Prepared statements, at least in Postgres, are linked to the database connection/session and stored in memory in the database. This implies some things:

  • There can be a performance benefit to making queries “preparable” for Prepared Statements which Active Record will decide for you based on how a query is structured.
  • There can be a performance harm (or at least non-useful database processing and memory growth) if your application produces a lot of preparable queries that are never reused again.

By default, Rails will have the database store 1,000 queries (statement_limit: 1000). Many huge Rails monoliths (like GitHub, where I work) disable prepared statements (prepared_statements: false) because there is too much query heterogeneity to get a performance benefit and the database spends extra and unnecessary cycles storing and evicting unused prepared statements. But that’s maybe not your application!

Structurally similar queries can still have variable values inside of them: these are called bind parameters. For example, in GoodJob, I want to pop jobs that are scheduled to run right now. In SQL that might look like:

SELECT * FROM good_jobs WHERE scheduled_at < '2024-03-31 16:44:11.047499'

That query has the downside of the timestamp changing multiple times a second as new queries are emitted. What I want to do is extract out the timestamp into a bind parameter (a ? or $1 depending on the database adapter) instead of embedded in the query:

SELECT * FROM good_jobs WHERE scheduled_at < ?

That’s the good stuff! That’s ideal and preparable 👍
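
To see why the bind parameter matters for reuse, here is a toy model of a per-connection statement cache keyed by SQL text. It is an illustration only: ToyStatementCache and its methods are made up, and neither Postgres nor Active Record is implemented this way.

```ruby
# A toy per-connection prepared statement cache, keyed by the SQL string.
class ToyStatementCache
  attr_reader :prepare_count

  def initialize(limit: 1000)
    @limit = limit
    @statements = {}
    @prepare_count = 0
  end

  # Returns the name of the prepared statement used for this SQL text,
  # preparing (and counting) a new one only when the text is unseen.
  def execute(sql, _binds = [])
    @statements.fetch(sql) do
      @statements.shift if @statements.size >= @limit # evict the oldest entry
      @prepare_count += 1
      @statements[sql] = "stmt_#{@prepare_count}"
    end
  end
end

timestamps = ["2024-03-31 16:44:11.047499", "2024-03-31 16:44:11.512003", "2024-03-31 16:44:12.001250"]

inlined = ToyStatementCache.new
timestamps.each { |ts| inlined.execute("SELECT * FROM good_jobs WHERE scheduled_at < '#{ts}'") }

bound = ToyStatementCache.new
timestamps.each { |ts| bound.execute("SELECT * FROM good_jobs WHERE scheduled_at < $1", [ts]) }

puts inlined.prepare_count # prints 3: every distinct SQL string is prepared again
puts bound.prepare_count   # prints 1: one statement is reused with different binds
```

With the timestamp embedded in the SQL text, each query looks new to the cache and eventually pushes older statements out; with a bind parameter, the text stays constant and only the values change.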

But that’s raw SQL, now how to do that in Active Record? In the following exploration, I’m using the private to_sql_and_binds method in Active Record; I’m also using the private Arel API. This is all private and subject to change, so be sure to write some tests around this behavior if you do choose to use it. Here’s some quick experiments I’ve done:

job = Job.create(scheduled_at: 10.minutes.ago)

# Experiment 1: string with ?
relation = Job.where("scheduled_at < ?", Time.current)
# =>  Job Load (0.1ms)  SELECT "good_jobs".* FROM "good_jobs" WHERE (scheduled_at < '2024-03-31 16:34:11.064614')
expect(relation.to_a).to eq([job])
_query, binds, prepared = Job.connection.send(:to_sql_and_binds, relation.arel)
expect(binds.size).to eq 0 # <-- Not a bind parameter
expect(prepared).to eq false # <-- Not preparable

# Experiment 2: Arel query with value
relation = Job.where(Job.arel_table['scheduled_at'].lt(Time.current))
# =>  Job Load (0.1ms)  SELECT "good_jobs".* FROM "good_jobs" WHERE scheduled_at < '2024-03-31 16:34:11.064614'
expect(relation.to_a).to eq([job])
_query, binds, prepared = Job.connection.send(:to_sql_and_binds, relation.arel)
expect(binds.size).to eq 0 # <-- Not a bind parameter
expect(prepared).to eq true # <-- Yikes 🥵

# Experiment 3: Arel query with QueryAttribute
relation = Job.where(Job.arel_table['scheduled_at'].lt(ActiveRecord::Relation::QueryAttribute.new('scheduled_at', Time.current, ActiveRecord::Type::DateTime.new)))
# =>  Job Load (0.1ms)  SELECT "good_jobs".* FROM "good_jobs" WHERE scheduled_at < $1  [["scheduled_at", "2024-03-31 16:34:11.064614"]]
expect(relation.to_a).to eq([job])
_query, binds, prepared = Job.connection.send(:to_sql_and_binds, relation.arel)
expect(binds.size).to eq 1 # <-- Looking good! 🙌
expect(prepared).to eq true # <-- Yes! 👏

That very last option is the good one because it has the bind parameter ($1) and then the Active Record logger will show the values in the nested array next to the query. The successful combination uses:

  • Arel comparable syntax
  • Wrapping the value in an ActiveRecord::Relation::QueryAttribute

Note that many Active Record queries will automatically do this for you, but not all of them. In this particular case, it’s because I use the “less than” operator, whereas equality does make it preparable. You’ll have to inspect each query yourself. For example, it’s also necessary with Arel#matches/ILIKE. It’s also possible to temporarily disable prepared statements within a block with the undocumented (!) Model.connection.unprepared_statement { your_query }.

The above code is true as of Rails 7.1. Jean Boussier has improved Active Record in newer (currently unreleased) Rails to also properly bind Job.where("scheduled_at < ?", Time.current) query syntax too 🙇

Update: I realized I didn’t try beginless/endless range values. Good news: they create bind parameters 🎉

# Experiment 4: beginless range
relation = Job.where(scheduled_at: ...Time.current)
# =>  Job Load (0.1ms)  SELECT "good_jobs".* FROM "good_jobs" WHERE scheduled_at < $1  [["scheduled_at", "2024-03-31 16:34:11.064614"]]
expect(relation.to_a).to eq([job])
_query, binds, prepared = Job.connection.send(:to_sql_and_binds, relation.arel)
expect(binds.size).to eq 1 # <-- Looking good! 🙌
expect(prepared).to eq true # <-- Yes! 👏

Low-effort prototyping

From Bunnie Huang’s blog about exploring and designing an infrared chip imaging rig. I thought this was an interesting distinction between “low-effort” and “rapid” prototypes. I think the analogy in software would be a “Walking Skeleton” that is production-like in architecture and deployment but does very little, versus building a demo using lightweight scripting and static site generators. (bolded text mine)

Sidebar: Iterate Through Low-Effort Prototypes (and not Rapid Prototypes)

With a rough idea of the problem I’m trying to solve, the next step is build some low-effort prototypes and learn why my ideas are flawed.

I purposely call this “low-effort” instead of “rapid” prototypes. “Rapid prototyping” sets the expectation that we should invest in tooling so that we can think of an idea in the morning and have it on the lab bench by the afternoon, under the theory that faster iterations means faster progress.

The problem with rapid prototyping is that it differs significantly from production processes. When you iterate using a tool that doesn’t mimic your production process, what you get is a solution that works in the lab, but is not suitable for production. This conclusion shouldn’t be too surprising – evolutionary processes respond to all selective pressures in the environment, not just the abstract goals of a project. For example, parts optimized for 3D printing consider factors like scaffolding, but have no concern for undercuts and cavities that are impossible to produce with CNC processes. Meanwhile CNC parts will gravitate toward base dimensions that match bar stock, while minimizing the number of reference changes necessary during processing.

So, I try to prototype using production processes – but with low-effort. “Low-effort” means reducing the designer’s total cognitive load, even if it comes at the cost of a longer processing time. Low effort prototyping may require more patience, but also requires less attention. It turns out that prototyping-in-production is feasible, and is actually the standard practice in vibrant hardware ecosystems like Shenzhen. The main trade-off is that instead of having an idea that morning and a prototype on your desk by the afternoon, it might take a few days. And yes – of course there are ways to shave those few days down (already anticipating the comments informing me of this cool trick to speed things up) – but the whole point is to not be distracted by the obsession of shortening cycle times, and spend more attention on the design. Increasing the time between generations by an order of magnitude might seem fatally slow for a convergent process, but the direction of convergence matters as much as the speed of convergence.

More importantly, if I were driving a PCB printer, CNC, or pick-and-place machine by myself, I’d be spending all morning getting that prototype on my desk. By ordering my prototypes from third party service providers, I can spend my time on something else. It also forces me to generate better documentation at each iteration, making it easier to retrace my footsteps when I mess up. Generally, I shoot for an iteration to take 2-4 weeks – an eternity, I suppose, by Silicon Valley metrics – but the two-week mark is nice because I can achieve it with almost no cognitive burden, and no expedite fees.

I then spend at least several days to weeks characterizing the results of each iteration. It usually takes about 3-4 iterations for me to converge on a workable solution – about a few months in total. I know, people are often shocked when I admit to them that I think it will take me some years to finish this project.

A manager charged with optimizing innovation would point out that if I could cut the weeks out where I’m waiting to get the prototype back, I could improve the time constant on an exponential and therefore I’d be so much more productive: the compounding gains are so compelling that we should drop everything and invest heavily in rapid prototyping.

However, this calculus misses the point that I should be spending a good chunk of time evaluating and improving each iteration. If I’m able to think of the next improvement within a few minutes of receiving the prototype, then I wasn’t imaginative enough in designing that iteration.

That’s the other failure of rapid prototyping: when there’s near zero cost to iterate, it doesn’t pay to put anything more than near zero effort into coming up with the next iteration. Rapid-prototyping iterations are faster, but in much smaller steps. In contrast, with low-effort prototyping, I feel less pressure to rush. My deliberative process is no longer the limiting factor for progress; I can ponder without stress, and take the time to document. This means I can make more progress every step, and so I need to take fewer steps.

Farewell Brompt

I’m planning to shut down Brompt, which I previously wrote about in 2008, 2011, and 2022. I archived the code on GitHub.

Let’s do a final confetti drop 🎉 together.

Farewell 👋 Brompt is shutting down

I have some sad news to share: I’m planning to shut down this service, Brompt, at the end of the month (February, 2024).

Shutting down Brompt means that you’ll no longer receive these automated reminders for your blog or writing. I’ve been running Brompt since 2008 and unfortunately, I haven’t been able to make it sustainable. You’re one of about 80 people who are still using the service, though I’m never sure if you or anyone ever opens the emails regularly.

Regardless of Brompt shutting down, I hope you’re doing well and I’d love to stay in touch. Send me an email at [email protected] or read my own blog at https://island94.org

All the best,

Ben (the person who made Brompt)

Screenshots from Brompt

Replacing Devise with Rails has_secure_password and friends

I love the Devise user-authentication gem. I’ve used it for years, and I recently moved off of it in one of my personal apps and replaced it with Rails’s built-in has_secure_password and generates_token_for and a whole bunch of custom controllers and helpers and code that I now maintain myself. I do not recommend this! User authentication is hard! Security is hard!

And… maybe you need to walk the same path too. So I want to share what I learned through the process.

Ok, so to back up, why did I do this?

  • Greater compatibility with Rails main. My day job runs Rails main, and I’m more frequently contributing to Rails development; I’d like to run my personal projects on Rails main too. When I looked back on upgrade-blocking gems, Devise (and its dependencies, like Responders) topped my list.
  • More creative onboarding flows. I’ve twisted Devise quite a bit (it’s great!) to handle the different ways I want users to be able to register (elaborate onboarding flows, email-only subscriptions, optional passwords, magic logins). I’ve already customized or overloaded nearly every Devise controller and many model methods, so it didn’t seem like such a big change anyway.
  • Hubris. I’ve built enterprise auth systems from scratch, managed the Bug Bounty program, and worked with security researchers. I have seen and caused and fixed some shit. (Fun fact: I have been paid for reporting auth vulnerabilities on the bug bounty platforms themselves.) I know that even if it’s not a bad idea for me, it’s not a great idea either. Go read all of the Devise-related CVEs; seriously, it’s a responsibility.

That last bit is why this blog post will not be like, “Here’s everything you need to know and do to ditch Devise.” Don’t do it! Instead, here’s some stuff I learned that I want to remember for the next app I work on.

A test regime

I went back through all of my system tests for auth, and here is a cleaned-up, though not exhaustive list of my scenarios and assertions. It seems like a lot. It is! There are also unit tests for models and controllers and mailers and separate API tests for the iOS and Android apps. Don’t take this lightly! (Remember, many of these are specific to my custom onboarding flows).

  • When a new user signs up for an account
    • Their email is valid, present and stored; password is nil.
    • They are not confirmed
    • They receive a confirmation email
    • If not confirmed, registering again with the same email resends the confirmation email but does not leak account presence
    • If an account associated with the email already exists and is confirmed, sends a “you already have an account” email and does not leak account presence.
    • Following the link in the confirmation email confirms the new account and redirects to the account setup page.
  • When a user sets up their account
    • They can assign a username and password
    • A password cannot be assigned if a password already exists
    • A username cannot be assigned if a username already exists
    • If a username and password already exist, the setup page redirects to the account update page
    • The account update page redirects to the setup page if a username or password does not yet exist
    • Signing in with an unsetup account redirects to setup page
    • Resetting password with an unsetup account redirects to setup page
    • Adding a password invalidates reset-password links.
  • When a user updates their account
    • The current password is required to update email, username, or password.
    • When the email address is changed, a new confirmation email is sent out to that email address.
    • An email change confirmation can be confirmed with or without an active session.
    • If the email address is already confirmed by a different account, send the “you already have an account” email and do not leak account presence.
    • Multiple accounts can have the same unconfirmed email address.
  • When a user performs a password reset
    • Can’t be accessed with an active session
    • Link is invalidated after 20 minutes, or when the email or password changes.
    • Can be performed on an unsetup account
    • Confirms an email but not an email change
    • Signs in the user
    • Does not leak account presence
    • Is throttled to only send once a minute.
  • When a user performs or resends an email confirmation
    • Can be accessed with an active session.
    • Cannot resend confirmation of an email change without an active session.
    • Link is invalidated after 20 minutes, or when email, unconfirmed email, confirmed at, or password changes.
    • Signs in the user
    • Does not leak account presence
    • Is throttled to only send once a minute.
    • When user is already confirmed, send them an email with a link to reset their password
  • When a user signs into a session
    • Requires a valid email or username, and password
    • Cannot sign in with a nil, blank, or absent password param (unsetup account)
    • Session is invalidated when email or password changes.
    • Does not leak account presence with missing or invalid credentials
    • Redirects to the session[:return_to] path if present, otherwise the root path.

Using has_secure_password

This was a fairly simple change. I had to explicitly add bcrypt to the gemfile, and then add to my User model:

# models/user.rb
alias_attribute :password_digest, :encrypted_password
has_secure_password :password

I’ll eventually rename the database column, but this was a zero-migration change.

Also, you might need to use validations: false on has_secure_password and implement your own validations if you have custom onboarding flows like me. Read the docs and the Rails code.

When authenticating on sign in, you’ll want to use User.authenticate_by(email:, password:), which is intended to avoid timing attacks.
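
To make the timing point concrete, here is a plain-Ruby sketch (no Rails) of the idea as I understand it: when no account matches, do the expensive password hashing anyway, so a miss takes about as long as a wrong password. Everything in it is made up for illustration: the USERS hash, the register and hash_password helpers, a free-standing authenticate_by, and PBKDF2 standing in for bcrypt. It is not Rails’s implementation.

```ruby
require "openssl"
require "securerandom"

ITERATIONS = 20_000 # illustrative cost only; PBKDF2 stands in for bcrypt here

def hash_password(password, salt)
  OpenSSL::KDF.pbkdf2_hmac(password, salt: salt, iterations: ITERATIONS, length: 32, hash: "sha256")
end

# Compare two strings without bailing out at the first differing byte.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).reduce(0) { |diff, (x, y)| diff | (x ^ y) }.zero?
end

# Hypothetical in-memory "users table": email => { salt:, digest: }
USERS = {}
DUMMY_SALT = SecureRandom.bytes(16)

def register(email, password)
  salt = SecureRandom.bytes(16)
  USERS[email] = { salt: salt, digest: hash_password(password, salt) }
end

# Returns the email on success, nil otherwise.
def authenticate_by(email:, password:)
  record = USERS[email]
  if record
    email if secure_equal?(record[:digest], hash_password(password, record[:salt]))
  else
    hash_password(password, DUMMY_SALT) # spend the same effort on a miss
    nil
  end
end

register("ada@example.com", "correct horse")
```

A wrong password and an unknown email both come back nil, and both paths pay for one hash.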

Using generates_token_for

The generates_token_for methods are new in Rails 7.1 and really nice. They create a signed token containing the user id and additional matching attributes and it doesn’t need to be stored in the database:

# models/user.rb
generates_token_for :email_confirmation, expires_in: 30.minutes do
  [confirmed_at, email, unconfirmed_email, password_salt]
end

generates_token_for :password_reset, expires_in: 30.minutes do
  [email, password_salt]
end

I’ll explain that password_salt in a bit.

To verify the token, use something like this: User.find_by_token_for(:email_confirmation, value_from_the_link).
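
If it helps to see why nothing needs to be stored, here is a rough plain-Ruby sketch of the general idea: sign a payload that embeds the user id, the attributes the token depends on, and an expiry, then compare those attributes again at lookup time. This is not how Rails implements it internally; SECRET, the User struct, generate_token, and find_by_token are all hypothetical names, and HMAC plus hex encoding are my stand-ins.

```ruby
require "openssl"
require "json"

SECRET = "not-a-real-secret" # assumption: stands in for the app's secret key
User = Struct.new(:id, :email, :password_salt)

def sign(data)
  OpenSSL::HMAC.hexdigest("SHA256", SECRET, data)
end

def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).reduce(0) { |diff, (x, y)| diff | (x ^ y) }.zero?
end

# The attributes whose change should invalidate the token.
def token_attributes(user)
  [user.email, user.password_salt]
end

def generate_token(user, expires_in:, now: Time.now)
  payload = JSON.generate([user.id, token_attributes(user), (now + expires_in).to_i])
  data = payload.unpack1("H*") # hex-encode so the token is URL-safe
  "#{data}--#{sign(data)}"
end

# `users` is a Hash of id => User standing in for the database.
def find_by_token(users, token, now: Time.now)
  data, signature = token.to_s.split("--", 2)
  return nil unless data && signature && secure_equal?(signature, sign(data))

  id, attributes, expires_at = JSON.parse([data].pack("H*").force_encoding("UTF-8"))
  user = users[id]
  return nil if user.nil? || now.to_i >= expires_at

  # The token only matches while the embedded attributes are still current.
  user if attributes == token_attributes(user)
end
```

Changing the email or password salt after the token is issued makes the final comparison fail, which is the whole trick.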

btw security: when you put a link in an email message, you can only use a GET, because emails can’t reliably submit web forms (some clients can, but it’s weird and unreliable). So your link is going to look like https://example.com/account/reset_password?token=blahblahblahblahblah. If there are any links to 3rd party resources like script tags or off-domain images, you will leak the token through the referrer when the page is loaded with the ?token= in the URL. Devise never fixed it (😱). What you should do is take the value out of the query param, put it in the session, redirect back to the same page without the query parameter, and use the session value instead. (Fun fact: this is a bug bounty that got me paid.)
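
Here is that move-it-to-the-session dance as a small plain-Ruby sketch. It is a Rack-style function rather than a real controller, and the method name, session key, and response bodies are all hypothetical. In a Rails controller the same shape would use params, session, and redirect_to.

```ruby
require "uri"

RESET_TOKEN_SESSION_KEY = "reset_password_token" # hypothetical key name

# `query` is the raw query string and `session` is any Hash-like session store.
# Returns a Rack-style [status, headers, body] triple.
def reset_password_page(path:, query:, session:)
  params = URI.decode_www_form(query.to_s).to_h

  if params.key?("token")
    # Move the token out of the URL, then reload the same page without it.
    session[RESET_TOKEN_SESSION_KEY] = params["token"]
    return [302, { "location" => path }, []]
  end

  token = session[RESET_TOKEN_SESSION_KEY]
  if token.nil?
    return [403, { "content-type" => "text/plain" }, ["This link is invalid or has expired."]]
  end

  # Only now render the form. The URL no longer carries the token,
  # so third-party requests made by this page cannot leak it via the referrer.
  [200, { "content-type" => "text/html" }, ["<form>...</form>"]]
end
```

The first request answers with a redirect and never renders a page, so no third-party resource is loaded while the token is in the URL.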

Authenticatable salts

Here’s where I explain that password_salt value.

There are several places above where I’ve mentioned that tokens and sessions should be invalidated when the account password changes. When bcrypt stores the password digest in the database, it also generates and includes a random “salt” value that changes every time the password changes. Comparing that salt is a proxy for “did the password change?” and it’s safer to embed that random salt in cookies and tokens instead of the user’s hashed password.

Devise uses the first 29 characters of the encrypted password (which is technically the algorithm, cost and salt):

# models/user.rb
def authenticatable_salt
  encrypted_password[0, 29] if encrypted_password
end

But it’s also possible to simply get the salt. I dunno if the difference matters (tell me!):

# models/user.rb
def password_salt
  BCrypt::Password.new(password_digest).salt[-10..] if password_digest
end
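
For what it’s worth, here is how the two values relate, sketched with plain string slicing on a sample bcrypt-formatted string (a placeholder, not a real credential). As far as I can tell from the bcrypt-ruby docs, BCrypt::Password#salt returns the salt including version and cost, which is the same 29-character prefix Devise uses. That would make the [-10..] version just the tail end of that prefix.

```ruby
# Sample bcrypt-formatted string for illustration only.
digest = "$2a$12$R9h/cIPz0gi.URNNX3kh2OPST9/PgBkqquzi.Ss7KIUgO2t0jWMUW"

# Layout: "$" version "$" cost "$" then 22 characters of salt and 31 of checksum.
_, version, cost, rest = digest.split("$")
salt     = rest[0, 22]
checksum = rest[22..]

devise_style = digest[0, 29]       # "$2a$12$" plus the 22-character salt
short_style  = devise_style[-10..] # last 10 characters of the salt

puts version      # 2a
puts cost         # 12
puts devise_style # $2a$12$R9h/cIPz0gi.URNNX3kh2O
puts short_style  # URNNX3kh2O
```

Both values change whenever the salt changes. The short one just carries fewer characters of it and drops the algorithm and cost.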

A nice session

There’s a lot to write about creating sessions and remember-me cookies that I won’t cover here. The main thing to note is that I’m storing and verifying both the user id and their password salt in the session; that means all of their sessions are invalidated when they change their password:

# app/controllers/application_controller.rb
UNASSIGNED = Module.new
USER_SESSION_KEY = "_yoursite_user".freeze

def initialize
  super
  @_current_user = UNASSIGNED
end

def sign_in(user)
  session[USER_SESSION_KEY] = [user.id, user.password_salt]
end

def current_user
  return @_current_user unless @_current_user == UNASSIGNED

  # Check if the user was already loaded by route helpers
  @_current_user = if request.env.key?("current_user")
                     request.env["current_user"]
                   else
                     user_id, password_salt = session[USER_SESSION_KEY]
                     User.find_by_id_and_password_salt(user_id, password_salt) if user_id && password_salt
                   end
end
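
Since password_salt is derived from the digest rather than stored in its own column, a finder like find_by_id_and_password_salt has to load the user and then compare salts. Here is a plain-Ruby sketch of that lookup and of the invalidation it buys you. The User struct, UserStore, and the fake digests are all hypothetical stand-ins for the real model and database.

```ruby
User = Struct.new(:id, :password_digest) do
  # Same idea as password_salt above, without the bcrypt gem:
  # the tail of the 29-character "$version$cost$salt" prefix.
  def password_salt
    password_digest[0, 29][-10..] if password_digest
  end
end

class UserStore
  def initialize(users)
    @users = users.to_h { |user| [user.id, user] }
  end

  # Load by id first, then require the stored salt to still match.
  def find_by_id_and_password_salt(id, salt)
    user = @users[id]
    user if user && user.password_salt == salt
  end
end

# Obviously fake bcrypt-shaped digests, one per "password".
OLD_DIGEST = "$2a$12$" + ("A" * 22) + ("x" * 31)
NEW_DIGEST = "$2a$12$" + ("B" * 22) + ("y" * 31)

user = User.new(1, OLD_DIGEST)
store = UserStore.new([user])
session = { "_yoursite_user" => [user.id, user.password_salt] } # what sign_in stores

before_change = store.find_by_id_and_password_salt(*session["_yoursite_user"])
user.password_digest = NEW_DIGEST # the user changes their password
after_change = store.find_by_id_and_password_salt(*session["_yoursite_user"])
```

Here before_change is the user and after_change is nil: the old session still holds the old salt, so it stops matching.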

In doing this project I learned that Rails’s cookies will magically serialize/deserialize arrays and hashes. I’ve been manually and laboriously converting them into JSON strings for years 🥵

btw, if that UNASSIGNED stuff is new to you, go read my Writing Object Shape friendly code in Ruby.

Rotating active sessions

This is a little extra but I wanted the switchover to be transparent to users. To do so, I read from Devise’s active sessions and then create a session cookie using the new format. It looks something like this:

# controllers/application_controller.rb
before_action :upgrade_devise_session

def upgrade_devise_session
  # Devise session structure: [[USER_ID],"AUTHENTICATABLE_SALT"]
  if session["warden.user.user.key"].present?
    user_id = session["warden.user.user.key"].dig(0, 0)
    user_salt = session["warden.user.user.key"].dig(1)
  elsif cookies.signed["remember_user_token"].present?
    user_id = cookies.signed["remember_user_token"].dig(0, 0)
    user_salt = cookies.signed["remember_user_token"].dig(1)
  end
  return unless user_id.present? && user_salt.present?

  # Depending on your deploy/rollout strategy,
  # you may need to retain and dual-write both
  # Devise and new user session values instead of this.
  session.delete("warden.user.user.key")
  cookies.delete("remember_user_token")

  user = User.find_by(id: user_id)
  sign_in(user) if user && user.authenticatable_salt == user_salt
end

Route helpers

Devise mixes some nice helper methods into Rails’s routing DSL like authenticated; they’re even necessary if you need to authenticate Rails Engines that can’t easily access the app’s ApplicationController methods. Here’s how to recreate them using Route Constraints and monkeypatching ActionDispatch::Routing::Mapper (that’s how Devise does it):

# app/constraints/current_user_constraint.rb
class CurrentUserConstraint
  def self.matches?(request)
    new.matches?(request)
  end

  def initialize(&block)
    @block = block
  end

  def matches?(request)
    current_user = if request.env.key?("current_user")
                     request.env["current_user"]
                   else
                     user_id, password_salt = request.session[USER_SESSION_KEY]
                     request.env["current_user"] = User.find_by_id_and_password_salt(user_id, password_salt) if user_id && password_salt
                   end

    if @block
      @block.call(current_user, request)
    else
      current_user.present?
    end
  end
end


# config/routes.rb
module ActionDispatch
  module Routing
    class Mapper
      def authenticated(&)
        scope(constraints: CurrentUserConstraint, &)
      end

      def unauthenticated(&)
        scope(constraints: CurrentUserConstraint.new { |user| user.blank? }, &)
      end

      def admin_only(&)
        scope(constraints: CurrentUserConstraint.new { |user| user&.admin? }, &)
      end
    end
  end
end

Rails.application.routes.draw do
  # ...
  authenticated do
    resources :special_somethings
  end
end

Because routing happens before a controller is initialized, the current user is put into request.env so that the controller won’t have to query it a second time from the database. This could also be done in a custom Rack Middleware.
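
For the middleware variant, a minimal sketch might look like the following. It assumes the session is available at env["rack.session"] (where Rack-style session middleware puts it), so it has to run after the session middleware. The class name and the finder: argument are hypothetical. In a Rails app it could presumably be registered with config.middleware.use and a lambda that calls User.find_by_id_and_password_salt.

```ruby
class CurrentUserMiddleware
  USER_SESSION_KEY = "_yoursite_user"

  # `finder` is any callable taking (user_id, password_salt)
  # and returning a user or nil.
  def initialize(app, finder:)
    @app = app
    @finder = finder
  end

  def call(env)
    session = env["rack.session"] || {}
    user_id, password_salt = session[USER_SESSION_KEY]

    # Always set the key, so later code can tell "looked up, nobody there"
    # apart from "not looked up yet" with env.key?("current_user").
    env["current_user"] = (@finder.call(user_id, password_salt) if user_id && password_salt)

    @app.call(env)
  end
end

# Tiny demo with a fake app and a fake finder.
finder = ->(id, salt) { "user-#{id}" if id == 1 && salt == "AAAAAAAAAA" }
app = ->(env) { [200, {}, [env["current_user"].to_s]] }
stack = CurrentUserMiddleware.new(app, finder: finder)

signed_in = stack.call({ "rack.session" => { "_yoursite_user" => [1, "AAAAAAAAAA"] } })
anonymous = stack.call({ "rack.session" => {} })
```

With this in place, both the route constraint and the controller would find the key already set in request.env and skip their own lookup.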

If you want to put stuff into not-the-session cookies, those cookies can be accessed via request.cookie_jar, e.g., request.cookie_jar.permanent.encrypted["_my_cookie"].

Closing thoughts

That was all the interesting bits for me. I also learned quite a bit poking around Dave Kimura’s ActionAuth (thank you!), and am thankful for the many years of service I’ve gotten from Devise.

