Writing Object Shape friendly code in Ruby

Update: Jean Boussier wrote a deeper explanation of how Ruby Object Shapes are implemented (more up-to-date for Ruby 3.3, unreleased as of October 23, 2023) and of when and how to optimize for them.

My rule of thumb is that one or two memoized variables in a class are fine, but more than that likely deserve a quick refactor.

My original post is below…

Ruby 3.2 includes a performance optimization called Object Shapes, which changes how the Ruby VM stores, looks up, and caches instance variables (the variables that look like @ivar). YJIT also takes advantage of Object Shapes, and the upcoming Ruby 3.3 further improves their performance.

This is a brief blog post about how to write your own Ruby application code that is optimized for Object Shapes. If instead you’d like to learn more about how Object Shapes is implemented in Ruby, watch Aaron Patterson’s RubyConf 2022 video or read this explanation from Ayush Poddar.

Big thank you to my colleagues John Hawthorn and Matthew Draper for feedback on the coding strategies described here. And John Bachir, Nate Matykiewicz, Josh Nichols, and Jean Boussier whose conversation in Rails Performance Slack inspired it.

The general rule: define your instance variables in the same order every time

To take advantage of Object Shape optimizations in your own Ruby code, the goal is to minimize both the number of different object shapes that are created and the number of object shape transitions that occur while your application is running:

  • Ensure that instances of the same class share the same object shape
  • Ensure that objects do not frequently or unnecessarily transition or change their shape
  • Help objects that could share the same object shape (e.g. substitutable child classes) to do so, with reasonable effort and without compromising readability and maintainability.

This succinct explanation is from Ayush Poddar, and explains the conditions that allow objects to share a shape:

New objects with the same [instance variable] transitions will end up with the same shape. This is independent of the class of the object. This also includes the child classes since they, too, can re-use the shape transitions of the parent class. But, two objects can share the same shape only if the order in which their instance variables are set is the same.

That’s it, that’s what you have to do: if you want to ensure that two objects share the same shape, make sure they define their instance variables in the same order. Let’s start with a counterexample:

# Bad: Object Shape unfriendly
class GroceryStore
  def fruit
    @fruit = "apple"
  end

  def vegetable
    @vegetable = "broccoli"
  end
end

# The "Application"
alpha_store = GroceryStore.new
alpha_store.fruit # defines @fruit first
alpha_store.vegetable # defines @vegetable second

beta_store = GroceryStore.new
beta_store.vegetable # defines @vegetable first
beta_store.fruit # defines @fruit second

In this example, alpha_store and beta_store do not share the same object shape because the order in which their instance variables are defined depends on the order the application calls their methods. This code is not Object Shape friendly.
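A rough way to see the difference without any VM internals is to compare the order reported by instance_variables. This is only a proxy, not a real shape inspection, and it assumes Ruby 3.2 or later, where I'd expect the two stores to report different orders:

```ruby
# Same class as the counterexample above.
class GroceryStore
  def fruit
    @fruit = "apple"
  end

  def vegetable
    @vegetable = "broccoli"
  end
end

alpha_store = GroceryStore.new
alpha_store.fruit
alpha_store.vegetable

beta_store = GroceryStore.new
beta_store.vegetable
beta_store.fruit

# Both objects end up with the same two instance variables...
alpha_order = alpha_store.instance_variables
beta_order = beta_store.instance_variables

# ...but the order they were defined in depends on the call order.
puts "alpha: #{alpha_order.inspect}"
puts "beta:  #{beta_order.inspect}"
puts "same definition order? #{alpha_order == beta_order}"
```

If the last line prints false, the two objects took different transitions to get where they are, which is exactly the situation to avoid.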

Pattern: Define your instance variables in initialize

The simplest way to ensure instance variables are defined in the same order every time is to define the instance variables in #initialize:

# Good: Object Shape friendly
class GroceryStore
  def initialize
    @fruit = "apple"
    @vegetable = nil # declare but assign later
  end

  def fruit
    @fruit
  end

  def vegetable
    @vegetable ||=  "broccoli"
  end
end

It’s also ok to define instance variables implicitly with attr_* methods in the class body, which has the same outcome of always defining the instance variables in the same order. Update: Ufuk Kayserilioglu informed me that attr_* do not define the instance variable until they are first called, meaning that these methods or their associated instance variables should also be declared with a value in #initialize.
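Here's a small sketch of that attr_* behavior. LazyStore and EagerStore are made-up names for illustration; the only thing being checked is when the instance variables come into existence:

```ruby
# attr_accessor alone: no instance variables exist until a writer is called.
class LazyStore
  attr_accessor :fruit, :vegetable
end

# attr_accessor plus explicit assignment in #initialize:
# every instance defines its variables in the same order.
class EagerStore
  attr_accessor :fruit, :vegetable

  def initialize
    @fruit = nil
    @vegetable = nil
  end
end

lazy = LazyStore.new
lazy_before = lazy.instance_variables # nothing defined yet
lazy.fruit                            # reading does not define @fruit
lazy.vegetable = "broccoli"           # writing defines @vegetable first
lazy_after = lazy.instance_variables

eager_ivars = EagerStore.new.instance_variables

puts "lazy before: #{lazy_before.inspect}"
puts "lazy after:  #{lazy_after.inspect}"
puts "eager:       #{eager_ivars.inspect}"
```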

Now I realize this is a very simplistic example, but that’s really all there is to it. If it makes you feel better, at GitHub where I work, we have classes with upwards of 200 instance variables. In hot code, where we have profiled, we go to a negligible effort of making sure those instance variables are defined in the same order; it’s really not that bad!

Pattern: Null memoization

Using instance variables to memoize values in your code may present a challenge when nil is a valid memoized value. This is a common pattern in Ruby that is not Object Shape friendly:

# Bad: Object Shape unfriendly
class GroceryStore
  def fruit
    return @fruit if defined?(@fruit)
    @fruit = an_expensive_operation
  end
end

Rewrite this by creating a unique NULL constant and check for its presence instead:

# Good: Object Shape friendly
class GroceryStore
  NULL = Object.new
  NULL.freeze # not strictly necessary, but makes it Ractor-safe

  def initialize
    @fruit = NULL
  end

  def fruit
    return @fruit unless @fruit == NULL
    @fruit = an_expensive_operation 
  end
end
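To convince yourself that nil really gets memoized, here's a minimal sketch. MemoizingStore and its calls counter are hypothetical, and the expensive operation is faked to return nil:

```ruby
class MemoizingStore
  NULL = Object.new.freeze

  attr_reader :calls

  def initialize
    @fruit = NULL # sentinel meaning "not computed yet"
    @calls = 0
  end

  def fruit
    # Identity check against the sentinel, so a memoized nil is respected.
    return @fruit unless NULL.equal?(@fruit)
    @fruit = an_expensive_operation
  end

  private

  # Stand-in for real work; counts invocations and returns nil on purpose.
  def an_expensive_operation
    @calls += 1
    nil
  end
end

store = MemoizingStore.new
first = store.fruit
second = store.fruit
puts "values: #{first.inspect}, #{second.inspect}; expensive calls: #{store.calls}"
```

I used equal? rather than == here only so a value with a custom == can't be mistaken for the sentinel; either works for plain values.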

Alternatively, if you’re doing a lot of meta or variable programming and you need an arbitrary number of memoized values, use a hash and key check instead:

# Good: Object Shape friendly
class GroceryStore
  def initialize
    @produce = {}
  end

  def produce(type)
    return @produce[type] if @produce.key?(type)
    @produce[type] = an_expensive_operation(type) 
  end
end

That’s it

Creating Object Shape friendly code is not very complicated!

Please reach out if there are other patterns I’m missing: twitter.com/@bensheldon / ruby.social/@bensheldon


In defense of consensus

There’s a style of reactionary meme that takes a photo of like, empty store-shelves or a trash-strewn street, and applies the image macro “This is what Communism looks like”. But upon closer inspection (and social media lampooning), it’s a photo of America, capitalist America, very much not under communism. We’ll come back to this.

Let’s talk about “consensus”. Not a week goes by in my Bay Area tech worklife where I don’t read or hear someone dragging consensus. Consensus is pilloried: weak, indecisive, lowest-common denominator, unclear, drawn out… consensus is bad, they say.

Working in tech for a decade, I have to admit this struck me as strange the first time I heard a coworker complain about that bogeyman “consensus”. I’ve been a facilitator of consensus-based practice for 13 years. These practices, taught to me through the Institute for Cultural Affairs’ ToP (“Technology of Participation”) series, served me well when I was doing nonprofit and community work, serving on boards and facilitating offsites. And consensus-based practices have served me well in tech and business too: using their methods to do discovery, lead meetings, get feedback, and drive decision-making and action. I do strategic planning consultation too.

The consensus-based practices I’ve learned take a group through a process: beginning with a prompt or need, then collecting facts and inputs, understanding people’s reactions to them, their interpretations and implications, and ultimately describing a series of actions and commitments to take. This can be a simple conversation, or a multi-day event that builds fractally on itself: a preceding session’s final actions could be deciding on what will be the following session’s initial inputs. When I’m working with leaders to design the process, we’ll discuss what responsibilities we want to delegate to the group, and what decisions will be retained among leadership. Leadership remains accountable, in the sense that there is a legible decision-making process, which is a strong benefit of deliberative practice. That’s “consensus”.

Alternatively, the Bay Area tech process, not “consensus”, oh no, seems to follow these recipes:

  • Plan and socialize
  • Disagree and commit

I was introduced to “plan and socialize” in my second tech job, being mentored by the Director of Engineering. To “socialize” is more than informing people, it’s having conversations and helping them understand how a plan or proposal will affect their work, and getting feedback that might lead to adjustments or compensatory actions. It’s also somewhat vague: asking people to leave comments in a google doc, attend an office hours, or a loosely moderated feedback session. Decisions, once made, are also socialized: explained, defended, adjusted, or white-knuckled through.

Depending on their power level, leaders may then ask people to “disagree and commit” meaning that the (negative) feedback has been heard but those underlings must commit to carrying the plan out regardless. Suck it up, professionally, so to speak. Sometimes this is used as performance feedback: “I’m aware you’ve been sharing your dislike of the plan with coworkers. That lack of trust is undermining the business. I need you to disagree and commit”… and keep your thoughts to yourself.

Under the spotlight, these approaches look less like bold and steely decision-making, and more like mumbly plan shifting backed by blusterful threats. Like the “this is what communism looks like”-meme, the scary-othered threat is not “consensus” but simply the current reality: confused, inadequate, probationary, triangulating, embarrassing, shameful.

There’s a joke in civic tech: government tech projects may say they can’t do incremental development, but that’s exactly what happens after their big-bang waterfall launch crashes-and-burns and they end up having to fix it one piece at a time. Clay Shirky captures it in “How Willful Ignorance Doomed HealthCare.gov”:

It is hard for policy people to imagine that HealthCare.gov could have had a phased rollout, even while it is having one. At launch, on Oct. 1, only a tiny fraction of potential users could actually try the service. They generated errors. Those errors were handed to a team whose job was to improve the site, already public but only partially working. The resulting improvements are incremental and put in place over a period of months. That is a phased rollout, just one conducted in the worst possible way.

Bay Area tech has the same relationship to decisions and consensus: by “socializing” plans and decisions, leaders are trying to craft a deliberative process for information sharing, feedback gathering, and alignment building. They’re simply doing it after they’ve already written and decided on an insufficient course of action and are grasping for a fix. Ultimately they are reaching for consensus, just consensus conducted in the worst possible way.

Please think of this the next time you hear (or say) something bad about consensus. Consensus is pretty great, and even better when used from the start.

The Institute for Cultural Affairs has lots of trainings on consensus-based facilitation. The Center for Strategic Facilitation is the Bay Area’s local trainer and service provider, but there are trainers and service providers all over the globe.

There is a system known as “Formal Consensus” which gained some notability during the 1999 “Battle of Seattle” WTO protests as a means of empowering small groups, particularly indigenous representatives, by providing a limited and fixed number of “blocks” during deliberations to stop actions proposed by far larger groups. It’s also how my buddy organized FreeGeek Chicago. I have never heard anyone in Bay Area tech reference any of this in regards to what they mean by consensus.


Appropriately using Ruby’s Thread.handle_interrupt

Working on GoodJob, I spend a lot of time thinking about multithreaded behavior in Ruby. One piece of Ruby functionality that I don’t see written about very often is Thread.handle_interrupt. Big thanks to John Bachir and Matthew Draper for talking through its usage with me.

Some background about interrupts: In Ruby, exceptions can be raised anywhere and at any time in a thread by other threads (including the main thread; that’s how Timeout works). Even rescue and ensure blocks can be interrupted. Everywhere. Most of the time this isn’t something you need to think about (unless you’re using Timeout or rack-timeout or doing explicit multithreaded code). But if you are, it’s important to think and code defensively.

Starting with an example:

thread = Thread.new do 
  open_work
  do_work
ensure
  close_work
end

# wait some time
thread.kill # or thread.raise

In this example, it’s possible that the exception raised by thread.kill will interrupt the middle of the ensure block. That’s bad! And can leave that work in an inconsistent state.

Ruby’s Thread.handle_interrupt is the defensive tool to use. It allows for modifying when those interrupting exceptions are raised:

# The mask keys are exception classes, hence the hash-rocket syntax.

# :immediate is the default and will interrupt immediately
Thread.handle_interrupt(Exception => :immediate) { close_work }

# :on_blocking interrupts only when the GVL is released
# e.g. IO outside Ruby
Thread.handle_interrupt(Exception => :on_blocking) { close_work }

# :never will never interrupt during that block
Thread.handle_interrupt(Exception => :never) { close_work }

Thread.handle_interrupt will modify behavior for the duration of the block, and it will then raise the interrupt after the block exits. It can be nested too:

Thread.handle_interrupt(Exception => :on_blocking) do
  ruby_stuff

  Thread.handle_interrupt(Exception => :never) do
    really_important_work
  end

  file_io # <= this can be interrupted
end
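To watch the deferral happen, here's a small runnable sketch. The queues and the events array are my own scaffolding to make the timing deterministic, and I'm using thread.raise with a RuntimeError rather than thread.kill:

```ruby
events = []
ready = Queue.new   # the worker signals it is inside the protected block
proceed = Queue.new # the main thread signals that it has called thread.raise

thread = Thread.new do
  Thread.handle_interrupt(RuntimeError => :never) do
    ready << true
    proceed.pop # blocks here; the pending interrupt stays deferred
    events << :protected_work_finished
  end
  events << :not_reached # the deferred interrupt fires as the block exits
rescue RuntimeError
  events << :interrupted
end

ready.pop            # wait until the worker is inside the :never block
thread.raise("stop") # queued, but masked for now
proceed << true      # let the protected work finish
thread.join

puts events.inspect
```

The protected work completes even though the interrupt arrived in the middle of it, and the exception shows up only once the block is done.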

FYI BE AWARE: Remember, the interrupt behavior is only affected within the handle_interrupt block. The following code has a problem:

thread = Thread.new do
  open_work
  do_work
ensure
  Thread.handle_interrupt(Exception => :never) do
    close_work
  end
end

Can you spot it? It’s right here:

ensure
  # <- Interrupts can happen right here
  Thread.handle_interrupt(Exception => :never) do

There’s a “seam” right there between the ensure and the Thread.handle_interrupt where interrupts can happen! Sure, it’s probably rare that an interrupt would hit right then and there, but if you went to the trouble to guard against it, it’s likely very bad if it did happen. And it happens! “Why puma workers constantly hung, and how we fixed by discovering the bug of Ruby v2.5.8 and v2.6.6”

HOW TO USE IT APPROPRIATELY: This is the pattern you likely want:

thread = Thread.new do
  Thread.handle_interrupt(Exception => :never) do
    Thread.handle_interrupt(Exception => :immediate) do
      open_work
      do_work
    end
  ensure
    close_work
  end
end

That’s right: have the ensure block nested within the outer Thread.handle_interrupt(Exception => :never) so that interrupts cannot happen in the ensure, and then use a second Thread.handle_interrupt(Exception => :immediate) to allow the interrupts to take place in the code before the ensure block.
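Here's a runnable sketch of that pattern, with a log array and a Queue standing in for the real open_work, do_work, and close_work (those stand-ins are my assumptions, and again I'm interrupting with thread.raise):

```ruby
log = []
started = Queue.new

thread = Thread.new do
  Thread.handle_interrupt(Exception => :never) do
    begin
      Thread.handle_interrupt(Exception => :immediate) do
        log << :open_work
        started << true
        sleep # stand-in for do_work; this is where the interrupt lands
      end
    ensure
      # Still under the outer :never mask, so this cannot be interrupted.
      log << :close_work
    end
  end
rescue RuntimeError
  log << :rescued
end

started.pop          # wait until the work has begun
thread.raise("stop") # interrupts the sleep immediately
thread.join

puts log.inspect
```

The interrupt cuts do_work short, the cleanup runs to completion, and only then does the exception leave the thread's block.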

There’s another pattern you might also be able to use with :on_blocking:

thread = Thread.new do
  Thread.handle_interrupt(Exception => :on_blocking) do
    open_work
    do_work
  ensure
    Thread.handle_interrupt(Exception => :never) do
      close_work
    end
  end
end

Doesn’t that have the problematic seam? Nope, because under :on_blocking there isn’t an operation taking place right there that would release the GVL (e.g. no IO).

But it does get tricky if, for example, the do_work is some Ruby calculation that is unbounded (I dunno, maybe a problematic regex or someone accidentally decided to calculate prime numbers or something). Then the Ruby code will not be interrupted at all and your thread will hang. That’s bad too. So you’d then need to do something like this:

thread = Thread.new do
  Thread.handle_interrupt(Exception => :on_blocking) do
    open_work
    Thread.handle_interrupt(Exception => :immediate) do
      do_work
    end
  ensure
    Thread.handle_interrupt(Exception => :never) do
      close_work
    end
  end
end

See, it’s ok to nest Thread.handle_interrupt and likely necessary to achieve the safety and defensiveness you’re expecting.


Fake the algorithm til you make it

Almost exactly a decade ago I worked at OkCupid Labs where small teams (~2 engineers, a designer, and a fractional PM) would build zero-to-one applications. It was great fun and I worked mainly on Ravel! though bopped around quite a bit too.

With small teams and quick timelines, I learned a lot about where to invest time in early social apps (onboarding and core loop) and where not to (matchmaking algorithms). The following is lightly adapted from a bunch of comments I wrote on an r/webdev post a few years ago, asking for “Surprisingly simple web apps?”. My response was described as “one of the more interesting things I’ve read on reddit in 5 years”:

If you’re looking for inspiration, what is successful today is likely more complex than it was when it was originally launched. Twitter, Tinder, Facebook all likely launched with simple CRUD and associations, and only later did they get fancy algorithms. Also, Nextdoor, Grindr, Yelp [this was 2013].

I used to work on social and dating apps and it is all “fake it till you make it”. The “secret sauce” is bucket by distance intervals, then order by random using the user-id as a seed so it’s deterministic, but still just random sort. Smoke and mirrors and marketing bluster.

You see this “Secret Sauce” marketing a lot. An app will imply that they have some secret, complex algorithm that no other competitor has. The software equivalent of “you can get a hamburger from anywhere, but ours has our secret sauce that makes it taste best”. But that can be bluster and market positioning rather than actual complexity. In truth, it’s secretly mayo, ketchup and relish. Or as I’ve encountered building apps, deterministic random.

Imagine you have a dating/social app and you want to have a match-making algorithm. You tell your users that you have the only astrologist datascience team building complex machine-learning models that can map every astronomical body in the known universe to individual personality traits and precisely calculate true love to the 9th decimal.

In truth, you:

  • For the current user, bucket other users by distance: a bucket of users that are less than 5km away; less than 25km; less than 100km; and everyone else. Early social app stuff is hard because you have a small userbase but you need to appear to be really popular, so you may need to adjust those numbers; also a reason to launch in a focused market.
  • Within each distance bucket, simply sort the users by random, seeded by the user id of the current user (Postgres setseed). That way the other people will always appear in the same order to the current user.
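As a sketch of the two steps above, here's the same idea in plain Ruby rather than Postgres. The Candidate struct, the bucket limits, and fake_matches are all made up for illustration, and Ruby's seeded Random stands in for setseed:

```ruby
Candidate = Struct.new(:id, :distance_km)

# Upper bounds (exclusive) for each distance bucket; everyone else goes last.
BUCKET_LIMITS_KM = [5, 25, 100].freeze

def bucket_index(distance_km)
  BUCKET_LIMITS_KM.index { |limit| distance_km < limit } || BUCKET_LIMITS_KM.size
end

# Returns candidates ordered by distance bucket, shuffled within each bucket
# using a generator seeded by the current user's id.
def fake_matches(current_user_id, candidates)
  rng = Random.new(current_user_id)
  candidates
    .sort_by(&:id) # fixed starting order, so the shuffle is repeatable
    .group_by { |candidate| bucket_index(candidate.distance_km) }
    .sort_by { |index, _bucket| index }
    .flat_map { |_index, bucket| bucket.shuffle(random: rng) }
end

candidates = [
  Candidate.new(1, 2.0), Candidate.new(2, 80.0), Candidate.new(3, 3.5),
  Candidate.new(4, 400.0), Candidate.new(5, 12.0), Candidate.new(6, 1.0)
]

first_visit = fake_matches(42, candidates)
second_visit = fake_matches(42, candidates.reverse)
puts first_visit.map(&:id).inspect
puts "same order on every visit? #{first_visit == second_visit}"
```

Each user sees a stable, personal-looking ordering, and nothing about it is smarter than a seeded shuffle.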

It works on people’s confirmation bias: if you confidently tell someone that they are a match, they are likely to generate their own evidence to support that impression. You don’t even have to do the location bucketing either, but likely you want to give people something that is actionable for your core loop.

And remember, this is really about priorities in the early life of a product. It’s not difficult to do something complex, but it takes time and engaged users to dial it in; so that’s why you don’t launch with a real algorithm.

This is all really easy to do with just a relational database, entirely in the database, with no in-memory descent models or whatever. Here’s a simple recommendation strategy for t-shirts (from my Day of the Shirt), in SQL for Ruby on Rails:

For a given user, find all of the t-shirts they have favorited, then find all of the users that have also favorited those t-shirts and give each a strength based on how many favorited t-shirts they have in common with the initial user, and then find all of the t-shirts those users have favorited, multiply through the counts and strengths, sum and order them. There’s your recommended t-shirts:

class Shirt < ApplicationRecord
  # ...
  scope :order_by_recommended, lambda { |user|
    joins(<<~SQL.squish).order('strength DESC NULLS LAST')
      LEFT JOIN (
        WITH recommended_users AS (
          SELECT user_id, count(*) AS strength
          FROM favorite_shirts_users
          WHERE
            shirt_id IN (
              SELECT shirt_id
              FROM favorite_shirts_users
              WHERE #{sanitize_sql_for_conditions(['user_id = ?', user.id])}
            )
          GROUP BY user_id
        )
        SELECT shirt_id, SUM(strength) AS strength
        FROM favorite_shirts_users
        LEFT JOIN recommended_users ON recommended_users.user_id = favorite_shirts_users.user_id
        GROUP BY shirt_id
      ) AS recommended_shirts ON recommended_shirts.shirt_id = shirts.id
    SQL
  }
end

That’s a relatively lightweight strategy, that you can run in real-time and if there is enough engagement can appear effective. And if you don’t have enough engagement, again, enrich it with some deterministically random results.

It’s basic but you can also add in other kinds of engagement and weigh them differently or whatever. It’s all good. Then you have massive success and hire a real datascience team.


How to isolate I18n configuration in a Rails Engine

GoodJob is a multithreaded, Postgres-based, Active Job backend for Ruby on Rails.

GoodJob includes an administrative web dashboard that is packaged as a mountable Rails Engine. The dashboard is currently translated into 9 different languages: English, German, Spanish, French, Japanese, Dutch, Russian, Turkish, and Ukrainian (I’d love your help improving these and translating additional languages too). Demo here: https://goodjob-demo.herokuapp.com/

I have learned quite a lot during the GoodJob development process about internationalizing a Rails Engine. I’ve previously worked on rather large and complicated localized government welfare applications, so I’m familiar with localization in the context of a Rails app. But internationalizing a Rails Engine was new, getting it right was harder than I expected, and I had trouble finding documentation and code examples in other libraries.

Overall, internationalizing a Rails Engine was nearly identical to the process of internationalizing a Rails Application as covered in the Rails Guides: using the I18n library and extracting strings from ERB views into keyed configuration files (e.g. config/locales/en.yml) and replacing them with <%= t(".the.key") %>. Simple.

The difficult part was separating and isolating GoodJob’s I18n configuration from the parent applications.

Why is it necessary to isolate I18n?

As a mountable Rails Engine, GoodJob’s dashboard sits within a parent Rails application. GoodJob should step lightly.

The I18n library provides a number of configuration options:

  • I18n.locale
  • I18n.default_locale
  • I18n.available_locales
  • I18n.enforce_available_locales, which will raise an exception if the locale is switched to one not contained within the set of available locales.

It’s possible that GoodJob’s administrative web dashboard would have different values for these than the parent Rails Application. Imagine: An English and Ukrainian speaking development and operations team administering a French and German language only website. How to do it?

Isolating configuration values

I18n configuration needs to be thread-local, so that a multithreaded webserver like Puma can serve a web request to the GoodJob Dashboard in Ukrainian (per the previous scenario) while also serving a web request for the parent Rails application in French (or raise an exception if someone tries to access it in Italian).

Unfortunately, I18n.locale is the only configuration value that delegates to a thread-local variable. All other configuration values are implemented as global @@ class variables on I18n.config. This makes sense when thinking of a monolithic application, but not when a Rails application is made up of multiple Engines or components that serve different purposes and audiences (the frontend visitor and the backend administrator). I struggled a lot figuring out a workaround for this, until I discovered that I18n.config is also thread-local.

Swap out the entire I18n.config value with your Engine’s own I18n::Config-compatible object:

# app/controllers/good_job/application_controller.rb
module GoodJob
  class ApplicationController < ActionController::Base
    around_action :use_good_job_locale

    def use_good_job_locale(&action)
      @original_i18n_config = I18n.config
      I18n.config = ::GoodJob::I18nConfig.new
      I18n.with_locale(current_locale, &action)
    ensure
      I18n.config = @original_i18n_config
      @original_i18n_config = nil
    end
  end
end

# lib/good_job/i18n_config.rb
module GoodJob
  class I18nConfig < ::I18n::Config
    BACKEND = I18n::Backend::Simple.new
    AVAILABLE_LOCALES = GoodJob::Engine.root.join("config/locales").glob("*.yml").map { |path| File.basename(path, ".yml").to_sym }.uniq
    AVAILABLE_LOCALES_SET = AVAILABLE_LOCALES.inject(Set.new) { |set, locale| set << locale.to_s << locale.to_sym }

    def backend
      BACKEND
    end

    def available_locales
      AVAILABLE_LOCALES
    end

    def available_locales_set
      AVAILABLE_LOCALES_SET
    end

    def default_locale
      GoodJob.configuration.dashboard_default_locale
    end
  end
end

Here’s the PR with the details that also shows the various complications I had introduced prior to finding this better approach: https://github.com/bensheldon/good_job/pull/1001

Isolating Rails Formatters

The main reason I implemented GoodJob’s Web Dashboard as a Rails Engine is because I want to take advantage of all of Rails’ developer niceties, like time and duration formatters. These are also necessary to isolate, so that GoodJob’s translations don’t leak into the parent application.

First, time helper translations should be namespaced in the yaml translation files:

# config/locales/en.yml
good_job: 
  # ...
  datetime:
    distance_in_words:
    # ...
  format: 
    # ...
  # ...

Then, for each helper, here’s how to scope them down:

  • number_to_human(number, units: "good_job.number")
  • time_ago_in_words(timestamp, scope: "good_job.datetime.distance_in_words")

By the way, there is a great repository of translations for Rails helpers here: https://github.com/svenfuchs/rails-i18n/tree/64e3b0e59994cc65fbc47046f9a12cf95737f9eb/rails/locale

Closing thoughts

Whenever I work on internationalization in Rails, I have to give a shoutout for the i18n-tasks library, which has been invaluable in operationalizing translation workflows: surfacing missing translations, normalizing and linting yaml files, making it easy to export the whole thing to a spreadsheet for review and correction, or using machine translation to quickly turn around a change (I have complicated feelings on that!).

Internationalizing GoodJob has been a fun creative adventure. I hope that, by writing this, other Rails Engine developers will prioritize internationalization a little higher and have an easier time walking in these footsteps. And maybe we’ll make the ergonomics of it a little easier upstream too.


I read "Recoding America" by Jen Pahlka

| Review | ★★★★★

The last time I communicated with Jen Pahlka was in the early days of the pandemic: May 2020. No longer locked down but still heavily masked, I visited the San Francisco Ferry Building Farmers Market and bought some fava beans. In the years before the pandemic, Jen had brought her own harvest of fava beans into the Code for America offices and shared them with any staff who wanted them. That had been my very first experience with fresh fava beans: shelling and boiling and shelling them a second time. Now, with fresh fava beans in hand again, I thought of Jen amidst the pandemic turmoil. I sent her a short email hoping she was well. Jen never replied.

Jen’s book, Recoding America, is a good book. It is a laundry list of scenes and vignettes of the greatest hits of government technology, albeit in the language of elites discussing elite things in a bloodless elite way. Paired with a more hands-on manual like Cyd Harrell’s A Civic Technologist’s Practice Guide it covers the ground of the past decade.

That decade is an interesting one. Jen and I shared roughly the same tenure at Code for America: 2011 - 2020 for her, 2012 - 2022 for me; both of us with a gap in the middle. Of course, she was the founder and CEO, whereas I was a fellow, and then an engineer, then a manager, then a director. So we saw a lot of the same things, though from different vantage points on a different journey. In addition to my formal duties, which had a finger in every program at Code for America (if not an arm and leg) there were two activities that were initially happenstance but ended up turning into personal programs of mine:

First, I reached out to new hires, especially new people managers, and offered to have a casual 1:1 and welcome them to the organization. We would chitchat and the message I would work to impart was this: the dissonance between Code for America’s competent external brand and its lived internal chaos could chew people up and was ripe for gaslighting. Instead, those new hires should remember they deserved to be here, they should trust their experience and competence, and truly everyone is winging it (some nicer and more self-aware about it than others).

I’d tell them a story about a prior illuminating executive AMA with Jen where she had shared a philosophy: instead of drawing a bullseye on the wall and trying to hit it down the center with your programs and activities, you can simply throw stuff at the wall and draw the bullseye afterwards around what sticks.

Second, my desk overlooked the executive conference room, which was a glass fishbowl with a couch. It was not infrequent for people to leave that room in a mess of hot tears. When they did and I saw, I’d send them a Slack message and gently offer to buy them a coffee at the Blue Bottle around the corner from the office. There was no ulterior motive; working through it with my leadership coach, a protege of financier Carl Icahn: I simply operationalized giving a shit about people.

Compared to writing a book or leading an organization or serving as a government executive for a year, as Intel CEO Andy Grove might observe: my activities were not high leverage. They also fall outside the bullseye drawn around the stories of the book.

We’re all heroes of our own story. The book briefly touches on an employee who describes himself as “the new guy” because he’d only been a claims processor for 17 years, far less than his more senior colleagues, and still not enough to be capable of processing the department’s most complex claims. His offhand comment is a foil for a deep dive into classifications and processing bottlenecks and mythical man months, but not much else about him as a person. Or what I imagine is the multitudes contained in that brief remark. Is it self-protection, a humblebrag, an invitation for further dialogue about those 17 years, or the years ahead? I’ll never know because the focus shifts back to the administrator and her institutional processes.

In Zen and the Art of Motorcycle Maintenance, a professor advises a student who is hopelessly writer’s-blocked; the student wants to write about the history of the United States, but is gently redirected to first focus on one brick in one building on the main street of their college town. Writer’s block broken, the student cannot stop writing, successfully building towards their initial wide vision.

The first brick of this book is possibly the few people “who had all once been part of the team that keeps Google up and running, then had come to DC to help get healthcare.gov back on track.” Easily overlooked in a footnote, they now run a private consultancy for government. The book frames their work as a series of high stakes technology and leadership interventions but not their personal stories, motivations, finances (business and personal), deal flow, engagement philosophy and practices, their loves, losses, missed opportunities, sacrifices, shames, human complexities, ironies, paradoxes. There is no crying, no dying, no problematic influencers, no sketchy investors nor strange bedfellows, no grudgefull quitting, no harassment, no union busting. Nor ziplines, chickens, joyful tears, pirate flags, PDF-form frostings from the sexy-cake shop down the street, nor fava beans for that matter.

In reviewing why too much law and policy ultimately ends up in the courts, Recoding America references a criticism by Ezra Klein: “liberals are too often missing or too timid to claim: a vision of what the law is for.” Recoding America similarly fails here to share a compelling vision of just what civic technology is for. It describes the work of pushing the rock, but without a destination in mind. Every over, under, and through of the bear hunt, without describing the bear. A road built by walking… a division of men gathering wood for a ship… you get the idea.

In all, if you care about civic technology and want to know the major story points: read the book. I am hoping it moves the Overton Window on the stories people will tell about civic tech. Please, someone write our movement’s Sex and Broadcasting, the humanistic tome of my prior career in community media and community technology. (Sociologist Karina Alexis Rider’s “Volunteering the Valley: Designing technology for the common good in the San Francisco Bay Area” suffices in the meantime.)

Lastly and memorably, Jen recounts serving on a task force addressing pandemic unemployment insurance. Writing with a vague yet startling honesty that haunts my own recollections:

The state should not have needed a task force to tell the EDD what it already knew, and it shouldn’t have needed us to secure permission to act on it. These things are never said out loud—neither the permission we had nor [administrator] Paula’s lack of it. But when we were gone, so was that permission. And soon after, for reasons that were not clear to me, a new backlog began to accrue.

This passage brings into focus the qualities that characterize my own experience with civic tech: power, permission, access, the parasocial qualities of professional relationships, and the fleeting closures of our ongoing experiments to live together in liberal democracy. I hope Jen is doing well, and though I didn’t write this explicitly in my last email, I’d love to hear from her.


Systems of utopia

I found these two essays juxtaposed in my feed reader and they both rang true. The first is Matt Stoller’s “The Long Annoying Tradition of Anti-Patriotism”, with a particular emphasis on a disordered appeal to utopia in society at large (emphasis mine):

Anti-populism, and its cousin of anti-patriotism, is alluring for our elites. Many lack faith in fellow citizens, and think the work of convincing a large complex country isn’t worth it, or may not even be possible. Others can’t imagine politics itself as a useful endeavor because they believe in a utopia. Indeed, those who believe in certain forms of socialism and libertarianism believe that politics itself shouldn’t exist, that one must perfect the soul of human-kind, and then the messy work of making a society will become unnecessary. In this frame, political institutions, like courts, corporations, and government agencies, are unimportant except as aesthetic objects.

Anti-populism and anti-patriotism leads nowhere, because these attitudes are about convincing citizens to give up their power, to give up on the idea that America is a place we can do politics to make a society.

And then closer to work, on engineering leadership in Will Larson’s “Building personal and organizational prestige”:

In my experience, engineers confronted with a new problem often leap to creating a system to solve that problem rather than addressing it directly. I’ve found this particularly true when engineers approach a problem domain they don’t yet understand well, including building prestige.

For example, when an organization decides to invest into its engineering brand, the initial plan will often focus on project execution. It’ll include a goal for publishing frequency, ensuring content is representationally accurate across different engineering sub-domains, and how to incentivize participants to contribute. If you follow the project plan carefully, you will technically have built an engineering brand, but my experience is that it’ll be both more work and less effective than a less systematic approach.


19 sales & marketing strategies in 19 weeks

The book Traction, by Gabriel Weinberg and Justin Mares, has an unstated thesis I find compelling:

Comprehensive, grinding, tryhard mediocrity over narrow, stabby, hopeful genius.

Or: boil the sales & marketing ocean with these spreadsheets and punchlists (a marketing lifestyle I associate with Penny Arcade’s former business manager with stuff like this).

There are lots of quotes from, like, Paul Graham, Peter Thiel, Marc Andreessen. This is what one signs up for in a book like this. Also replace “traction” with “sales and marketing”:

…spend your time constructing your product or service and testing traction channels in parallel…. We strongly believe that many startups give up way too early… You should always have an explicit traction goal you’re working toward.

…The importance of choosing the right traction goal cannot be overstated. Are you going for growth or profitability, or something in between? If you need to raise money in X months, what traction do you need to show to do so? These are the types of questions that help you determine the right traction goal.

Once that is defined, you can work backward and set clear quantitative and time-based traction subgoals, such as reaching one thousand customers by next quarter or hitting 20 percent monthly growth targets. Clear subgoals provide accountability. By placing traction activities on the same calendar as product development and other company milestones, you ensure that enough of your time will be spent on traction.

It feels a little dumb pulling out quotes, but also I get security from seeing it not be overthought:

  • Put half your efforts into getting traction
  • Learn what growth numbers potential investors respect
  • Set your growth goals: Set quantitative numbers.
  • Find your bright spots: if you’re not hitting your quantitative numbers, find who is qualitatively excited, try to learn from them, and replan.

And then the spreadsheets. There are pictures of spreadsheets, and descriptions of columns in spreadsheets. It’s great!

  1. How much will it cost to acquire customers through this channel?
  2. How many customers are available through this channel?
  3. Are the customers that you are getting through this channel the kind of customers that you want right now?
  4. What’s the lifetime value of this customer?

“…we encourage you to be as quantitative as possible, even if it is just guesstimating at first.”
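To make those four questions concrete, here is a rough sketch of the kind of spreadsheet arithmetic involved, written in Ruby. The numbers are made-up guesstimates for illustration, not figures from the book, and the LTV-to-cost ratio is just my own shorthand for comparing rows:

```ruby
# Hypothetical channels with guesstimated numbers (not from the book).
Channel = Struct.new(:name, :cost_per_customer, :available_customers, :lifetime_value) do
  # Lifetime value earned per dollar spent acquiring a customer.
  def ltv_to_cost_ratio
    (lifetime_value.to_f / cost_per_customer).round(2)
  end

  # Total spend needed to capture every available customer in the channel.
  def total_cost
    cost_per_customer * available_customers
  end
end

channels = [
  Channel.new("Targeting Blogs", 20, 500, 120),
  Channel.new("Search Engine Marketing", 60, 5_000, 120),
  Channel.new("Trade Shows", 150, 200, 120)
]

# Rank channels from best to worst payback.
ranked = channels.sort_by { |channel| -channel.ltv_to_cost_ratio }
ranked.each do |channel|
  puts format("%-24s ratio: %5.2f  total cost: $%d",
              channel.name, channel.ltv_to_cost_ratio, channel.total_cost)
end
```

Even with guesses this crude, a channel whose ratio falls below 1 stands out right away as one that costs more per customer than it returns.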

And the 19 strategies:

  1. Targeting Blogs: building up from tiny outlets to large
  2. Publicity: building relationships with journalists, HARO.
  3. Unconventional PR: stunts, customer appreciation
  4. Search Engine Marketing
  5. Social and Display Ads
  6. Offline Ads
  7. Search Engine Optimization: fat-head (narrow ranking) and long-tail (lots of landing page content, content marketing farming) strategies. Google AdWords’ Keyword Planner, Open Site Explorer.
  8. Content Marketing. Spend 6 months blogging; do stuff that doesn’t scale (contact influential people, do guest posts, write about recent news events, nerd out).
  9. Email Marketing: with your own mailing list or advertise on other mailing lists. Transactional funnel reminders, retention, upselling/expansion, referral emails too.
  10. Viral Marketing: map out the loop.
  11. Engineering as Marketing: giveaway tools/services (lol, examples are all SEO companies building free SEO tools for marketers, not necessarily engineering)
  12. Business Development: partnerships, joint ventures, licensing, distribution, supply. Boil the ocean of potential partners, spreadsheet and pipeline it. Identify who is in charge of the partner’s metric you’re targeting, and make your cold emails forwardable. Write a memo to yourself afterwards on “how the deal was done” (how long it took to get to milestones, key contacts, sticking points, partner’s specific interests and influences).
  13. Sales. It’s sales! lol, about time wasters (“have you ever brought other technology into your company?”)
  14. Affiliate Programs
  15. Existing Platforms
  16. Trade Shows. Prep, prep, prep.
  17. Offline Events. Conferences.
  18. Speaking Engagements. Answer upfront “Why are you important enough to be the one giving the talk? What value can you offer me? …then… what your startup is doing, why you’re doing it, specifically how you got to where you are or where things are going.” Recycle and reuse the same 1 or 2 talks.
  19. Community Building. Nurturing connections among your customers.

There’s not even a conclusion! Just an acknowledgement and an appendix with specific suggested goals for each category in case the short chapters weren’t boiled enough. It’s not hard, it just takes work.


Rebuilding Concurrent Ruby: ScheduledTask, Event, and TimerSet

I’ve been diving into the Concurrent Ruby library a lot recently. I use Concurrent Ruby as the foundation for GoodJob, where it has saved me immense time and grief because it has a lot of reliable, complex thread-safe primitives that are well-shaped for GoodJob’s needs. I’m a big fan of Concurrent Ruby.

I wanted to cement some of my learnings and understandings by writing a quick blog post to explain how some parts of Concurrent Ruby work, in the spirit of Noah Gibbs’s Rebuilding Rails. In the following, I’ll be sharing runnable Ruby code that is similar to how Concurrent Ruby solves the same kind of problems. That said, Concurrent Ruby is much, much safer (and thus a little more complex) than what I’m writing here, so please, if you need this functionality, use Concurrent Ruby directly.

The use case: future scheduled tasks

Imagine you want to run some bits of code, at a point in time in the future. It might look like this example creating several tasks at once with varying delays in seconds:

ScheduledTask.execute(30) do # the argument is the delay, in seconds
  # run some code
end

ScheduledTask.execute(60) do
  # run some code
end

ScheduledTask.execute(15) do
  # run some code
end

In Concurrent Ruby, the object to do this is a Concurrent::ScheduledTask (good name, right?). A ScheduledTask will wait delay seconds and then run the block of code on a background thread.

Behind the ScheduledTask is the real star: the Concurrent::TimerSet, which executes a collection of tasks, each after a given delay. Let’s break down the components of a TimerSet:

  • TimerSet maintains a list of tasks, ordered by their delays, with the soonest first
  • TimerSet runs a reactor-like loop in a background thread. This thread will peek at the next occurring task and wait/sleep until it occurs, then pop the task to execute it.
  • TimerSet uses a Concurrent::Event (which is like a Mutex and ConditionVariable combined in a convenient package) to interrupt the sleeping reactor when new tasks are created.

I’ll give examples of each of these. But first, you may be asking….

Why is this so hard?

This is a lot of objects working together to accomplish the use case. This is why:

  • Ruby threads have a cost, so we can’t simply create a new thread for each and every task, putting it to sleep until an individual task is intended to be triggered. That would be a lot of threads.
  • Ruby threads aren’t safe to cancel or kill, so we can’t, for example, create a single thread for the soonest task but then terminate it and create a new thread if a new task is created with a sooner time.
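For contrast, here’s a minimal sketch of the naive approach that the first bullet rules out: one sleeping thread per task. The naive_schedule helper is a hypothetical name I made up for illustration. It works fine for a handful of tasks, but every pending task holds on to a whole thread:

```ruby
# Naive scheduling: one sleeping thread per task (hypothetical helper).
def naive_schedule(delay, &task)
  Thread.new do
    sleep delay
    task.call
  end
end

results = Queue.new # thread-safe collection for the output

threads = [
  naive_schedule(0.3) { results << "third" },
  naive_schedule(0.1) { results << "first" },
  naive_schedule(0.2) { results << "second" }
]

# Three pending tasks means three threads, each doing nothing but sleeping.
threads.each(&:join)

order = Array.new(results.size) { results.pop }
puts order.inspect
```

Scale that up to thousands of tasks scheduled hours into the future and the cost of all those idle threads is what the TimerSet design avoids.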

The following section will show how these objects are put together. Again, this is not the exact Concurrent Ruby implementation, but it’s the general shape of how Concurrent Ruby solves this use case.

The Event

Concurrent Ruby describes a Concurrent::Event as:

Old school kernel-style event reminiscent of Win32 programming in C++.

I don’t know what that means exactly, but an Event can be in either a set or unset state, and it can wait (with a timeout!) and be awakened via signals across threads.

I earlier described Event as a Mutex and ConditionVariable packaged together. The ConditionVariable is the star here, and the mutex is simply a supporting actor because the ConditionVariable requires it.

A Ruby ConditionVariable has two features that are perfect for multithreaded programming:

  • wait, which is blocking and will put a thread to sleep, with an optional timeout
  • broadcast, which sends a signal to all waiting threads to wake up.

Jesse Storimer’s excellent and free ebook Working with Ruby Threads has a great section on ConditionVariables and why the mutex is a necessary part of the implementation.
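Before wrapping these in an Event, here’s a minimal sketch of a bare ConditionVariable and Mutex working together. The ready flag and the variable names are my own for illustration; the pattern to notice is that the waiting thread checks a flag while holding the mutex, and the signaling thread changes that flag under the same mutex:

```ruby
mutex = Mutex.new
condition = ConditionVariable.new
ready = false

waiter = Thread.new do
  mutex.synchronize do
    # wait releases the mutex while sleeping and reacquires it on wakeup.
    # Re-check the flag in a loop, since a wakeup alone doesn't prove
    # the flag changed (the wait may also have timed out).
    condition.wait(mutex, 1) until ready
    :woken
  end
end

# Flip the flag and wake every waiting thread, all under the same mutex.
mutex.synchronize do
  ready = true
  condition.broadcast
end

result = waiter.value # joins the thread and returns the block's value
puts result
```

Because the flag is only read and written while holding the mutex, it doesn’t matter whether the broadcast happens before or after the waiter starts waiting.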

Here’s some code that implements an Event with an example to show how it can wake up a thread:

class Event
  def initialize
    @mutex = Mutex.new
    @condition = ConditionVariable.new
    @set = false
  end

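  def wait(timeout)
    @mutex.synchronize do
      # Sleep until signaled or timed out, unless the event is already set
      @condition.wait(@mutex, timeout) unless @set
      # Report the flag itself, so a timeout returns false regardless of
      # what ConditionVariable#wait returns
      @set
    end
  end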

  def set
    @mutex.synchronize do
      @set = true
      @condition.broadcast
    end
  end

  def reset
    @mutex.synchronize do
      @set = false
    end
  end
end

Here’s a simple example of an Event running in a loop to show how it might be used:

event = Event.new
running = true
thread = Thread.new do
  # A simple loop in a thread
  while running do
    # loop every second unless signaled
    if event.wait(1)
      puts "Event has been set"
      event.reset
    end
  end
  puts "Exiting thread"
end

sleep 1
event.set
#=> Event has been set

sleep 1
event.set
#=> Event has been set

# let the thread exit
running = false
thread.join
#=> Exiting thread

The ScheduledTask

The implementation of the ScheduledTask isn’t too important in this explanation, but I’ll sketch out the necessary pieces, which match up with a Concurrent::ScheduledTask:

# GLOBAL_TIMER_SET = TimerSet.new

class ScheduledTask
  attr_reader :schedule_time

  def self.execute(delay, timer_set: GLOBAL_TIMER_SET, &task)
    scheduled_task = new(delay, &task)
    timer_set.post_task(scheduled_task)
  end

  def initialize(delay, &task)
    @schedule_time = Time.now + delay
    @task = task
  end

  def run
    @task.call
  end

  def <=>(other)
    schedule_time <=> other.schedule_time
  end
end

A couple things to call out here:

  • The GLOBAL_TIMER_SET is necessary so that all ScheduledTasks are added to the same TimerSet. In Concurrent Ruby, this is Concurrent.global_timer_set, though a ScheduledTask.execute can be given an explicit timer_set: parameter if an application has multiple TimerSets (for example, GoodJob initializes its own TimerSet for finer lifecycle management).
  • The <=> comparison operator, which will be used to keep our list of tasks sorted with the soonest tasks first.
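To see what that comparison operator buys us, here’s a small sketch using a stand-in class (DemoTask is a hypothetical name, trimmed down to only what sorting needs):

```ruby
# A stand-in for ScheduledTask with just enough to be sortable
# (DemoTask is a hypothetical name for this sketch).
class DemoTask
  attr_reader :name, :schedule_time

  def initialize(name, delay)
    @name = name
    @schedule_time = Time.now + delay
  end

  def <=>(other)
    schedule_time <=> other.schedule_time
  end
end

queue = [
  DemoTask.new("third", 5),
  DemoTask.new("first", 1),
  DemoTask.new("second", 3)
]

# Array#sort! and Array#min both rely on <=> when no block is given.
queue.sort!
names = queue.map(&:name)
soonest = queue.min.name
puts names.inspect
puts soonest
```

No matter what order the tasks were created in, the soonest one ends up at the front of the array.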

The TimerSet

Now we have the pieces necessary to implement a TimerSet and fulfill our use case. The TimerSet implemented here is very similar to a Concurrent::TimerSet:

class TimerSet
  def initialize
    @queue = []
    @mutex = Mutex.new
    @event = Event.new
    @thread = nil
  end

  def post_task(task)
    @mutex.synchronize do
      @queue << task
      @queue.sort!
      process_tasks if @queue.size == 1
    end
    @event.set
  end

  def shutdown
    @mutex.synchronize { @queue.clear }
    @event.set
    @thread.join if @thread
    true
  end

  private

  def process_tasks
    @thread = Thread.new do
      loop do
        # Peek the first item in the queue
        task = @mutex.synchronize { @event.reset; @queue.first }
        break unless task

        if task.schedule_time <= Time.now
          # Pop the first item in the queue
          task = @mutex.synchronize { @queue.shift }
          task.run
        else
          timeout = [task.schedule_time - Time.now, 60].min
          @event.wait(timeout)
        end
      end
    end
  end
end

There’s a lot going on here, but here are the landmarks:

  • In this TimerSet, @queue is an Array that we explicitly call sort! on so that the soonest task is always first in the array. In the Concurrent Ruby implementation, that’s done more elegantly with a Concurrent::Collection::NonConcurrentPriorityQueue. The @mutex is used to make sure that adding/sorting/peeking/popping operations on the queue are synchronized and safe across threads.
  • The magic happens in #process_tasks, which creates a new thread and starts up a loop. It loops over the first task in the queue (the soonest):
    • If there is no task, it breaks the loop and exits the thread.
    • If there is a task, it checks whether it’s time to run, and if so, runs it. If it’s not time yet, it uses the Event#wait until it is time to run, or 60 seconds, whichever is sooner. That 60 seconds is a magic number in the real implementation, and I assume that’s to reduce clock drift. Remember, Event#wait is signalable, so if a new task is added, the loop will be immediately restarted and the delay recalculated.
    • In real Concurrent Ruby, task.run is posted to a separate thread pool where it won’t block or slow down the loop.
  • The Event#set is called inside of #post_task, which inserts new tasks into the queue. The process_tasks background thread is only created the first time a task is added to the queue after the queue has been emptied. This minimizes the number of active threads.
  • The Event#reset is called when the queue is first peeked in process_tasks. There are a lot of subtle race conditions being guarded against in a TimerSet. Calling reset unsets the event at the top of the loop to allow the Event to be set again before the Event#wait.
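Re-sorting the whole array on every insert is simple, but it does more work than necessary. As a rough sketch of one alternative (this is not how Concurrent Ruby’s priority queue is implemented, and insert_sorted is a hypothetical helper of my own), a binary search can find the insertion point in an already-sorted array:

```ruby
# Insert item into an already-sorted array, keeping it sorted.
# Items only need to respond to <=>, like a ScheduledTask does.
def insert_sorted(queue, item)
  # Find the first existing element that sorts after the new item...
  index = queue.bsearch_index { |existing| (existing <=> item) > 0 }
  # ...and insert before it, or at the end if there is none.
  queue.insert(index || queue.size, item)
end

queue = []
[5, 1, 3, 1, 4].each { |delay| insert_sorted(queue, delay) }
puts queue.inspect
```

Integers stand in for tasks here to keep the sketch short. Items that compare as equal are placed after existing ones, so tasks scheduled for the same time would keep their insertion order.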

And finally, we can put all of the pieces together to fulfill our use case of scheduled tasks:

GLOBAL_TIMER_SET = TimerSet.new

ScheduledTask.execute(1) { puts "This is the first task" }
ScheduledTask.execute(5) { puts "This is the third task" }
ScheduledTask.execute(3) { puts "This is the second task" }

sleep 6
GLOBAL_TIMER_SET.shutdown

#=> This is the first task
#=> This is the second task
#=> This is the third task

That’s it!

The TimerSet is a really neat object that’s powered by an Event, which is itself powered by a ConditionVariable. There’s a lot of fun thread-based signaling happening here!

While writing my post, I came across a 2014 post from Job Vranish entitled “Ruby Queue Pop with Timeout”, which builds something very similar looking using the same primitives. In the comments, Mike Perham linked to Connection Pool’s TimedStack, which also looks similar. Again, please use a real library like Concurrent Ruby or Connection Pool. This was just for explanatory purposes 👍


Whatever you do, don’t autoload Rails lib/

Update: Rails v7.1 will introduce a new configuration method config.autoload_lib to make it safer and easier to autoload the /lib directory and explicitly exclude directories from autoloading. When released, this advice may no longer be relevant, though I imagine it will still be possible for developers to twist themselves into knots and cause outages with autoloading overrides.
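For reference, this is roughly what that configuration looks like in a Rails 7.1 application. Treat it as a sketch of a config fragment: MyApp is a placeholder application name, and the ignored directories are examples of lib/ subdirectories that don’t contain autoloadable Ruby constants.

```ruby
# config/application.rb (Rails 7.1+; MyApp is a placeholder name)
module MyApp
  class Application < Rails::Application
    config.load_defaults 7.1

    # Autoload lib/, but skip subdirectories that hold non-Ruby files
    # or files that don't follow the constant naming conventions.
    config.autoload_lib(ignore: %w[assets tasks])
  end
end
```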

One of the most common problems I encounter consulting on Rails projects is that developers have previously added lib/ to autoload paths and then twisted themselves into knots creating error-prone, project-specific conventions for subsequently un-autoloading a subset of files also in lib/.

Don’t do it. Don’t add your Rails project’s lib/ to autoload paths.

How does this happen?

A growing Rails application will accumulate a lot of Ruby classes and files that don’t cleanly fit into the default app/ directories of controllers, helpers, jobs, or models. Developers should also be creating new directories in app/* to organize like-with-like files (your app/services/ or app/merchants/, etc.). That’s ok!

But frequently there are one-off classes that don’t seem to rise to the level of their own directory in app/. From looking through the cruft of projects like Mastodon or applications I’ve worked on, these files look like:

  • A lone form builder
  • POROs (“Plain old Ruby objects”) like PhoneNumberFormatter, or ZipCodes or Seeder, or Pagination. Objects that serve a single purpose and are largely singletons/identity objects within the application.
  • Boilerplate classes for 3rd party gems, e.g. ApplicationResponder for the responders gem.

That these files accumulate in a project is a fact of life. When choosing where to put them, that’s when things can go wrong.

In a newly built Rails project lib/ looks like the natural place for these. But lib/ has a downside: lib/ is not autoloaded. This can come as a surprise, even to experienced developers, because they have been accustomed to the convenience of autoloaded files in app/. It’s not difficult to add an explicit require statement into application.rb or in an initializer, but that may not be one’s first thought.

That’s when people jump to googling “how to autoload lib/”. Don’t do it! lib/ should not be autoloaded.

The problem with autoloading lib/ is that there will subsequently be files added to lib/ that should not be autoloaded; because they should only be provisionally loaded in a certain environment or context, or deferred, for behavioral, performance, or memory reasons. If your project has already enabled autoloading on lib/, it’s now likely you’ll then add additional configuration to un-autoload the new files. These overrides and counter-overrides accumulate over time and become difficult to understand and unwind, and they cause breakage because someone’s intuition of what will or won’t be loaded in a certain environment or context is wrong.

What should you do instead?

An omakase solution

DHH writes:

lib/ is intended to be for non-app specific library code that just happens to live in the app for now (usually pending extraction into open source or whatever). Everything app specific that’s part of the domain model should live in app/models (that directory is for POROs as much as ARs)… Stuff like a generic PhoneNumberFormatter is exactly what lib/ is intended for. And if it’s app specific, for some reason, then app/models is fine.

The omakase solution is to manually require files from lib/ or use app/models generically to mean “Domain Models” rather than solely Active Record models. That’s great! Do that.
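As a sketch of the manual require, assuming a hypothetical lib/phone_number_formatter.rb file and a placeholder MyApp application name, the config fragment could look like:

```ruby
# config/application.rb (MyApp and the file name are placeholders)
require_relative "boot"
require "rails/all"

Bundler.require(*Rails.groups)

# Explicitly load library code from lib/ instead of autoloading it.
require_relative "../lib/phone_number_formatter"

module MyApp
  class Application < Rails::Application
    # ...
  end
end
```

The same require could also live in an initializer. The tradeoff is that explicitly required files are not reloaded in development, which is usually fine for generic library code that rarely changes.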

A best practice

Xavier Noria, Zeitwerk’s creator, writes:

The best practice to accomplish that nowadays is to move that code to app/lib. Only the Ruby code you want to reload, tasks or other auxiliary files are OK in lib.

Sidekiq’s Problems and Troubleshooting explains:

lib/ directory will only cause pain. Move the code to app/lib/ and make sure the code inside follows the class/filename conventions.

The best practice is to create an app/lib/ directory to home these files. Mastodon does it, as do many others.

This “best practice” is not without contention, as usually anything in Rails that deviates from omakase does, like RSpec instead of MiniTest or FactoryBot instead of Fixtures. But creating app/lib as a convention for Rails apps works for me and many others.

Really, don’t autoload lib/

Whatever path you take, don’t take the path of autoloading lib/.