Introducing GoodJob 1.0, a new Postgres-based, multithreaded, ActiveJob backend for Ruby on Rails

GoodJob is a new Postgres-based, multithreaded, second-generation ActiveJob backend for Ruby on Rails.

Inspired by Delayed::Job and Que, GoodJob is designed for maximum compatibility with Ruby on Rails, ActiveJob, and Postgres to be simple and performant for most workloads.

  • Designed for ActiveJob. Complete support for async, queues, delays, priorities, timeouts, and retries with near-zero configuration.
  • Built for Rails. Fully adopts Ruby on Rails threading and code execution guidelines with Concurrent::Ruby.
  • Backed by Postgres. Relies upon Postgres integrity and session-level Advisory Locks to provide run-once safety and stay within the limits of schema.rb.
  • For most workloads. Targets full-stack teams, economy-minded solo developers, and applications that enqueue fewer than 1 million jobs/day.

Visit GitHub for instructions on adding GoodJob to your Rails application, or read on for the story behind GoodJob.

A “Second-generation” ActiveJob backend

Why “second-generation*”? GoodJob is designed from the beginning to be an ActiveJob-backend in a conventional Ruby on Rails application.

First-generation ActiveJob backends, like Delayed::Job and Que, all predate ActiveJob and support non-Rails applications. First-generation ActiveJob backends are significantly more complex than GoodJob because they separately maintain a lot of functionality that comes with a conventional Rails installation (ActiveRecord, ActiveSupport, Concurrent::Ruby) and re-implement job lifecycle hooks so they can work apart from ActiveJob. I’ve observed that this can make them slow to keep up with major Rails changes. An impetus for GoodJob was reviewing the number of outages, blocked upgrades, and forks of first-generation backends I’ve managed during both major and minor Rails upgrades over the years.

As a second-generation ActiveJob backend, GoodJob can draft off of all the advances and solved problems of ActiveJob and Ruby on Rails. For example, rescue_from, retry_on, and discard_on are all already implemented by ActiveJob.

GoodJob is significantly thinner than first-generation backends, and over the long run hopefully easier to maintain and keep up with changes to Ruby on Rails. For example, GoodJob is currently ~600 lines of code, whereas Que is ~1,200 lines, and Delayed::Job is ~2,300 lines (2,000 for delayed_job, and an additional 300 for delayed_job_active_record).

*“Second generation” was coined for me by Daniel Lopez on Ruby on Rails Link Slack.

Postgres-based

I love Postgres. Postgres offers a lot of features, has safety and integrity guarantees, and simply running fewer services (skipping Redis) means less complexity in development and production.

GoodJob builds atop ActiveRecord. It’s numbingly boring, in a good way.

GoodJob uses session-level Advisory Locks to provide run-once guarantees with relatively little performance impact for most workloads.

GoodJob’s session-level Advisory Lock implementation is perhaps its only “novel” aspect, and it comes from my experience orchestrating complex web-driving of government systems (“the browser is the API”) for Code for America. GoodJob uses a Common Table Expression (CTE) to find, lock, and return the next workable job in a single query. Session-level Advisory Locks are gracefully relinquished if the session is interrupted, without having to maintain a transaction for the duration of the job.
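As a rough illustration of the "find, lock, and return in a single query" pattern, here is a minimal sketch. It is not GoodJob's actual query: the next_job_sql helper, the jobs table, and its columns (an integer id, priority, created_at, scheduled_at) are all assumptions made for the example. pg_try_advisory_lock is a real Postgres function that takes a bigint key and returns true only when the current session acquires the lock.

```ruby
# Hypothetical sketch: build a "find, lock, and return" query for a jobs table.
# The table and column names are assumptions for illustration only.
def next_job_sql(table: "jobs")
  <<~SQL
    WITH candidates AS (
      SELECT id
      FROM #{table}
      WHERE scheduled_at IS NULL OR scheduled_at <= now()
      ORDER BY priority DESC, created_at ASC
    )
    SELECT #{table}.*
    FROM #{table}
    WHERE #{table}.id = (
      SELECT id
      FROM candidates
      -- pg_try_advisory_lock returns true only if this session got the lock,
      -- so rows already locked by another worker are skipped.
      WHERE pg_try_advisory_lock(id)
      LIMIT 1
    )
  SQL
end

puts next_job_sql
```

A session-level lock taken this way stays held until it is released with pg_advisory_unlock or the database session ends. Planner behavior around CTEs differs between Postgres versions (Postgres 12 added an explicit MATERIALIZED keyword), so a query like this should be checked against the version in use to confirm it locks only the row it returns.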

Multi-threaded

GoodJob uses Concurrent::Ruby to scale and manage jobs across multiple threads. “Concurrent Ruby makes one of the strongest thread-safety guarantees of any Ruby concurrency library”. Ruby on Rails has adopted Concurrent Ruby, and GoodJob follows its lead as well as its thread-execution and safety guidelines.

In building GoodJob I leaned heavily on my positive experiences running Que, another multithreaded backend, on Heroku. Threads are great for balancing simplicity, economy, and performance for typical IO-bound workloads like heavy database queries, API requests, Selenium web-driving, or sending emails.
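To make the IO-bound point concrete, here is a minimal sketch that uses only Ruby's standard library (Thread and Queue) rather than Concurrent::Ruby or GoodJob itself. The run_jobs helper and the simulated jobs are hypothetical; sleep stands in for waiting on a database, an API, or a browser.

```ruby
# Hypothetical sketch using only the standard library (not Concurrent::Ruby).
# Each "job" sleeps to simulate waiting on IO. Ruby threads can overlap that
# waiting, so several workers finish a batch sooner than one would.
def run_jobs(job_count:, thread_count:, io_wait: 0.05)
  queue = Queue.new
  job_count.times { |i| queue << i }
  queue.close # workers stop once the queue is drained

  results = Queue.new
  workers = Array.new(thread_count) do
    Thread.new do
      while (job = queue.pop) # pop returns nil once closed and empty
        sleep(io_wait)        # stand-in for a query, API call, or page load
        results << job
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }.sort
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
finished = run_jobs(job_count: 8, thread_count: 4)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts "Ran #{finished.size} jobs in #{elapsed.round(2)}s"
```

With four threads the eight simulated jobs should take roughly two rounds of waiting instead of eight, which is the economy that matters on a small dyno.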

A feature that won’t be in GoodJob 1.0, but that I hope to implement soon, is the ability to run the GoodJob scheduler inside the webserver process (“async mode”). This was a feature withdrawn from Que, but I believe it can be safely implemented with Concurrent Ruby. An async mode would offer even greater economy, for example, in Heroku’s constrained environment.

GoodJob is right for me

GoodJob’s design is based directly on my experience in 2-pizza, full-stack teams, and as an economy-minded solo developer. GoodJob already powers Day of the Shirt and Brompt, performing tens of thousands of real-world jobs a day.

Is GoodJob right for you?

Try it out and let me know.


Retail politics

I will quote anything that reinforces the necessity of showing up. From SF Weekly’s “The Many Faces of Leland Yee: A Politician’s Calculated Rise and Dramatic Fall”:

Upon reflection, Yee’s principles may be ever-shifting and his policies may be decorative, but he found a way around this: by being omnipresent.

He knew the name of every neighborhood stalwart from every neighborhood club; he cleaned hundreds of plates at hundreds of Chinatown banquets; he sat through countless community meetings, gathering hundreds of converts at a time: “In local politics,” says one longtime player, “a cup of coffee and a handshake can win you a friend for life.”

Yee showed up at your kid’s bar mitzvah or high school graduation; he showed up at your community gathering; he showed up at your neighborhood bazaar — in short, he showed up. His staff returned your phone call. And he read your letters: A former associate says Yee never failed to leave the office at the end of a long day toting a thick stack of mail that he made a point of poring through. In insider jargon, this is known as “retail politics.” Few worked harder or did it better.


Engineering Operations is not the same as Development

I wrote this memo several years ago when I joined GetCalFresh as the first outside engineering hire. An early focus of mine was helping the team move more confidently into an operational mindset: this memo reframes the team’s existing values that drive development as values that also support operations. This also overlaps greatly with a talk I gave at Code for America’s 2018 Summit, “Keeping Users at the Forefront While Scaling Services”.

Over the past month GetCalFresh has tripled the number of food stamp applications we’re processing. We often talk about “build the right thing”, but I wanted to focus on what it means to “operate a thing safely”.

Understanding operational failure

GetCalFresh collects food stamp applicants’ information via a series of webforms, and then submits that information to the county to begin the food stamp eligibility process.

The website and webforms being offline or unavailable is bad.

Failing to submit application information to the county in a timely manner is awful. Food stamp benefits are prorated to the day that the client’s application arrives at the county before 5pm. Failing to deliver a client’s application in a timely manner literally means less food on the table for a hungry family.

Our system is operationally “safe” when it ensures that client information is transmitted to the county in a timely manner. Our system experiences an operational “failure” when information is not submitted in a timely manner. Operational “risk” is anything that degrades safety and creates the potential for an operational failure.

Risks in complicated, complex and chaotic systems

Keeping a website online is complicated, but can be addressed with good practice. We use boring technologies: Ruby on Rails, SQL, AWS, that scale and respond predictably and are part of a mature ecosystem of monitoring tools and practice.

Submitting client information to the county is complex and sometimes chaotic. Because county systems often have no API, we have a queue of jobworkers that use Selenium Webdriver to click through and type into a “virtualized” headless Firefox browser. Automating this leads to emergent and novel problems. Client data must be transformed into a series of scripted actions to be performed across multiple county webpages, with dynamic forms and data fields. The county websites may be offline or degraded, and occasionally their structure and content changes. Additional client documents may need to be faxed, emailed or uploaded to the county, and those systems can be degraded as well.

Our applicants themselves can cause operational risks. As we target new populations and demographics (e.g. seniors, students, military families, homeless, low-literacy or non-English-speaking), we discover new usability issues and challenges in collecting and transforming data from our webforms into county systems. For example, different county systems have different optional and required fields and expect names and addresses to be sanitized and tokenized differently.

In this system, we cannot reliably (or affordably, with time and resources) predict how it will respond as it scales to new users or integrates with new counties.

Creating safety with staff and time

We ensure that food stamp applications are submitted in a timely manner through existing staff and dedicated time. Because we cannot reliably predict how our system scales or responds to changes, we have systems that alert us to the risk of operational failure and engineers who are available to respond, remediate, and harden against similar circumstances in the future.

Every day, engineers block out 4pm to 5pm as “Apps & Docs”. We use this time to review any food stamp applications that failed our automated submission process to ensure the applications are submitted to the county by the daily deadline. Problems are documented and potential improvements are added to or reprioritized within the team’s backlog. We create safety by sometimes reaching out to clients for clarification or correction. In the event of an operational failure (we are not able to submit their application that day), we try to make things right, sometimes by offering a gift card the client can use to purchase food.

Examples of problems identified during our hour of Apps & Docs:

  • Services not allowing multiple parallel sessions using the same credentials.
  • Inconsistent address tokenization for college campuses, military bases, PO boxes, and Private Mail Boxes
  • Frequency of people uploading iexplore.exe and notes.app instead of their intended document
  • Forms that do not allow non-ASCII characters
  • Forever optimizing headless Firefox, writing flexible and reliable Selenium scripts, and managing an increasing fleet of specialized jobworkers

Trade operational risk for speed of learning

We can’t predict the exact operational issues we’ll experience during a given day, but by scheduling and protecting one hour per day for operational tasks, we can deliberately trade risk for flexibility. Flexibility comes because we can accept small risks by introducing incomplete or manual-intervention-required workflows into the system. We do not have to build for every edge case or automate every action. We can develop features faster and create more opportunities to learn with real users in a real operational environment. This is an operationalization of our engineering principle “don’t argue, ship”.

Takeaways

  • Define operational failure: Leaving failure ambiguous can lead to fire-drills on every bad experience and exception, even if they may not have a material impact on business process or metrics. Defining service level objectives helps everyone self-organize, prioritize and understand the impact of their work.
  • Operationalize operations: Unexpected things happen all the time, but merely saying “high priority interrupt” does not expose the actual cost of response and remediation. Blocking out explicit times and spaces helps measure, and thus manage, work that might otherwise be overlooked.
  • Protect Developers’ time only so much: “Any improvement not made at the constraint is an illusion.” Approaching automation as an iterative and forever-incomplete process enables our team to move quickly in optimizing the system as a whole. When manual remediation is at risk of overflowing our time block, we dedicate time to greater automation; when we have perceived sufficient tolerances, we can push product features faster by manually tasking edge-cases.
  • Operations is a practice: Product Design and Development principles and practice provide a strong foundation and an experienced team can greatly reduce the risk of technical and market failure… but they can’t eliminate it. Operations is a field and practice that can reinforce and elevate Product Design and Development.

Decade in Review 2010-2019

In loose categories and in no particular order, other than that I think they warrant mentioning.

Personal

  • Communications and mental health. Two things that greatly influenced me were reading Nonviolent Communication and doing Mood Gym.
  • Inclusion (continuation). Compared to last decade I’ve practiced in larger groups and communities, from workplace to church. Two books that stick with me are White Fragility and Dear Church: A Love Letter From a Black Preacher to the Whitest Denomination in the US.
  • Business. I incorporated my own business, Day of the Shirt, for which I’ve been filing taxes, hiring contractors, and businessing since 2011.
  • Fiction. Malazan Book of the Fallen. Jemisin’s Inheritance and Broken Earth trilogies. Up to book 26 of The Cat Who…. Remembrance of Earth’s Past trilogy. The Dark Tower series. And the entirety of Discworld.
  • Many deaths. Dottie Stephens. Many folks from Church: Dale, Clifton, Sam, Kirsten.
  • Affluence and finance. The move to software engineering has brought a four-fold increase in my income, along with the matters of founders stock, options, shares, RSUs, etc. We bought a new car.

Family

  • Marriage. Angelina and I got married in 2014 in San Francisco. We’ve also been together for the entirety of the decade.
  • Membership organizations (continuation). I became a member of St. Francis Lutheran Church, the South End Rowing Club, Golden Gate Angling and Casting Club, and numerous museums.
  • Cat changes. We lost Jose Pierpont, but gained Sally Ride and Billie Jean King.
  • Extended family. Living near a lot of extended family has been a new experience and we’ve gained many new nephews and nieces around the country.
  • Spending time together. This decade has been marked by a ramping up of weekend trips and travel, from Calistoga to Australia.

Career

  • San Francisco. 8 years of this decade have been spent in San Francisco, longer even than my time in Boston, where I spent the majority of the naughties.
  • Transition from community-based work to software/tech. Shutting down the Transmission Project and Digital Arts Service Corps was hard. Software/Tech is fine.
  • I have great appreciation for friends and colleagues who have introduced me to the body of work on ergonomics in software development. For example, DevOps, Extreme Programming, TDD, and Christopher Alexander.
  • Facilitation, Coaching and Sponsorship (continuation). Still doing it.

The water never goes away

From Ronan Farrow’s Catch and Kill:

Perez said that she urged Sciorra to speak by describing her own experience of going public about her assault. “I told her, ‘I used to tread water for years. It’s fucking exhausting, and maybe speaking out, that’s your lifeboat. Grab on and get out,’” Perez recalled. “I said, ‘Honey, the water never goes away. But, after I went public, it became a puddle and I built a bridge over it, and one day you’re gonna get there, too.’”

From Thomas Page McBee’s Man Alive:

“Abandon all hope,” I’d written on a Post-it note, and I watched it move gently beneath the heat duct. I read it in some book. The idea was that hope misses the point: it’s either going to happen or not. You can’t make a new reality, only fashion something real from the one that you’ve got.


To care about AI

From the endnotes of Ted Chiang’s Exhalation on the short story “The Lifecycle of Software Objects”:

I’ve read stories in which people argue that AIs deserve legal rights, but in focusing on the big philosophical question, there’s a mundane reality that these stories gloss over. It’s similar to the way movies always depict love in terms of grand romantic gestures when, over the long term, love also means working through money problems and picking dirty laundry off the floor. So while achieving legal rights for AIs would be a major step, another milestone that would be just as important is people putting real effort into their individual relationships with AIs.

And even if we don’t care about them having legal rights, there’s still good reason to treat conscious machines with respect. You don’t have to believe that bomb-sniffing dogs deserve the right to vote to recognize that abusing them is a bad idea. Even if all you care about is how well they can detect bombs, it’s in your best interest that they be treated well. No matter whether we want AIs to fill the role of employees, lovers, or pets, I suspect they will do a better job if, during their development, there were people who cared about them.


Deterministic test data with Faker, FactoryBot, and RSpec

I get a lot of joy from using Faker and FactoryBot to efficiently generate real-world test data, but their randomness can be a liability when trying to debug complicated specs or when setting up systems that require repeatable data across RSpec test runs, like Percy’s visual diffs.

Without deterministic test data, generating three new users with 3.times { puts Faker::Name.first_name } would result in Danny, Solomon, Fabian when run once, then Jordon, Shawn, Asa when run a second time, then Bruce, Leonor, Paulette when run a third time.

With deterministic test data, I expect to always generate the same set of names no matter how many times the code is run. Faker has documented how to configure and seed the random number generator and this can be achieved with:

require "faker"

3.times do |n|
  # Re-seed Faker's generator so each iteration produces a fixed name
  Faker::Config.random = Random.new(n)
  puts Faker::Name.first_name
end

This script outputs Zachery, Dawna, Desmond every single time it is run, meaning that it’s deterministic.
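The same principle can be seen without Faker, using only Ruby's built-in Random. This is a minimal sketch: NAMES and pick_name are hypothetical stand-ins for Faker's name list and generator, not part of any library.

```ruby
# Hypothetical stand-in for Faker: pick a name using a seeded generator.
NAMES = %w[Ada Grace Alan Margaret Dennis Barbara].freeze

def pick_name(seed)
  # A Random created with the same seed always yields the same sequence,
  # so the same seed always selects the same name.
  NAMES.sample(random: Random.new(seed))
end

first_run  = 3.times.map { |n| pick_name(n) }
second_run = 3.times.map { |n| pick_name(n) }

puts first_run.inspect
puts first_run == second_run # prints "true"
```

The names chosen depend on the list and the seed, but for a given seed the choice does not change between runs.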

Faker’s deterministic configuration can be combined with a FactoryBot sequence to always get the same data every time a new factory instance is created. For example, here’s what a deterministic User factory could look like:

# spec/factories/users.rb

FactoryBot.define do
  factory :user do
    sequence(:first_name) do |n|
      Faker::Config.random = Random.new(n)
      Faker::Name.first_name
    end

    sequence(:last_name) do |n|
      Faker::Config.random = Random.new(n)
      Faker::Name.last_name
    end

    email { "#{first_name.parameterize}.#{last_name.parameterize}@example.com" }
    password { 'password123' }
  end
end

Within every sequence, the Faker random number generator is seeded with Faker::Config.random = Random.new(n), where n is the integer generated by the sequence.

Unfortunately, just using a sequence isn’t completely sufficient when running tests in random order, or inserting new tests or rearranging the tests, as one would expect in an active codebase. FactoryBot sequences are global, meaning that they don’t reset by default between each and every test; a FactoryBot instance during one test run might use a different sequence number than a previous test run.

Therefore, it’s also necessary to rewind FactoryBot sequences after each RSpec example. Place this in your spec/rails_helper.rb or spec/support directory:

# spec/support/factory_bot.rb

RSpec.configure do |config|
  config.after do
    FactoryBot.rewind_sequences
  end
end
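To see why rewinding matters, here is a small sketch that imitates a global sequence in plain Ruby. TinySequence, SURNAMES, and build_surname are hypothetical stand-ins, not FactoryBot's implementation; they only mimic a counter that keeps counting across examples unless it is rewound.

```ruby
# Hypothetical stand-in for a global factory sequence (not FactoryBot's code).
class TinySequence
  def initialize
    @value = 0
  end

  # Returns 1, 2, 3, ... across calls, like a sequence shared by all examples.
  def next_value
    @value += 1
  end

  # Resets the counter, analogous in spirit to rewinding between examples.
  def rewind
    @value = 0
  end
end

SURNAMES = %w[Lovelace Hopper Turing Hamilton Ritchie Liskov].freeze

def build_surname(sequence)
  # Seed from the sequence number, as the factory above seeds Faker.
  SURNAMES.sample(random: Random.new(sequence.next_value))
end

sequence = TinySequence.new

first_example = build_surname(sequence)  # seeded with sequence number 1
second_example = build_surname(sequence) # seeded with sequence number 2

sequence.rewind
after_rewind = build_surname(sequence)   # seeded with sequence number 1 again

puts first_example == after_rewind # prints "true"
```

Without the rewind, the value an example receives depends on how many factory calls ran before it, which is exactly what changes when specs run in random order.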

That’s all you need to combine Faker and FactoryBot to get deterministic test data in your RSpec tests. Have fun!


Because it helps them to release software

I have been thinking a lot about the framing of a sentence in this piece on Agile. If it’s given that software is the strategy, then it’s legitimate to focus on being better at releasing software.

From Graham Lee’s The value of the things on the left:

That software engineering department now has different management and is Agile. They have releases at least every month (they already released daily, though those releases were of minimal scope). They respond to change rather than follow a plan (they already did this, though through hefty “change control” procedures). They meet daily to discuss progress (they already did this).

But, importantly, they do the things they do because it helps them release software, not because it helps them hit project milestones. The revolution really did land there.


The Concrete Sumo - Ethics in Software Engineering Discussion Guide

I prepared this discussion guide for Taft H. Broome, Jr.’s “The Concrete Sumo” and facilitated it two weeks ago for the software engineering team at Code for America.


To prepare for the discussion, please read the following sections of the paper, “The Concrete Sumo”:

  • Forward
  • The Concrete Sumo

    Note: the paragraph beginning “In the Johnny-on-the-Spot, Tubby was the first to speak to me…” is particularly difficult because it begins with an unfamiliar colloquialism (“Johnny-on-the-Spot” meaning to be on-call, in the hot seat, put on the spot, or put on notice); names three characters who are not introduced until much later in the commentary (Tubby, Roebling, and Uncle Roy); and the protagonist is imagining the three characters giving him advice though they are not actually present. — Ben

  • Heuristic: Uncle Roy, the Mutumin Kiri
  • The Assigned World
  • Afterword

These sections have been selected for brevity and focus. The paper has been described by ethicist Michael Davis as an “informative story groaning under the weight of an interpretation it cannot bear.” Therefore, the reading and discussion will focus on the story and its application to software engineering ethics. — Ben

Discussion Questions

  • This is a paper about ethics. Generally, what do you think of when you think of “ethics”? What does it mean to you to act ethically or be ethical? 
  • In the paper, the author introduces the idea of “exigent circumstances”, described as situations that “are so complex as to deny engineers the reflection required to invoke ethical theories, and so novel as to discourage engineers from appealing to case studies.”
    • In the story of the Concrete Sumo, what are the exigent circumstances the author confronts?
    • What similar situations have you had like this in your life or work?
  • The author speaks of a “scientific” decision defined as “with or without scientific certitude, but with the commitment of the parties to the situation”. 
    • In the story of the Concrete Sumo, what made the decision “scientific”? Who were the directly committed parties? Why do you think “commitment” is specifically called out?
    • Thinking more broadly about engineering as a discipline and vocation, who is committed to engineers making good decisions? Within software engineering, what groups and organizations make up our “scientific” community? 
  • The author introduces a practice (“praxistic”) to be used in exigent situations. Broadly that practice is to “think of an aged, highly mature person: a family member or some legendary character; someone who exhibited great wisdom and caring for others” and to “do what [they] would do.”
    • In the story, what people did the author imagine and act out? Who was the counter-example whose actions they rejected?
    • Does the practice here seem familiar and in what ways? Do you have people, real or fictional, that you have sought, mentally, for advice? Are there situations where you have or would apply this?
    • In the Afterword, the author speaks of the practice helping students pass an ethics exam. How does that make you feel?
  • The author goes to great lengths to assert not only that the imagined role model inspires a suitable action, but also that they are respected in their social context and communities, with attributes such as “wisdom”, “character” and “caring for others”. 
    • What cultural context did the author use in choosing Uncle Roy and rejecting Tubby? Is this familiar to you?
    • What different social contexts, worlds, or communities, fictional or real, could guide you? How are they different and similar? Is breadth or depth of understanding better?
    • As an engineer, how does the idea of being guided by an imagined or fictional character make you feel? As an engineer, do you think your education or experience has prepared you to think in this way?
    • What are the ingredients necessary to further develop this practice both for yourself and engineering as a whole?
  • In “The Rhetoric” section (not required reading), the author writes “In Western ethics, the decision-maker is the subject, and the rightness or wrongness of his or her actions its predicate. Among the Nigerian Hausa, however, the community is the subject, and the decision-maker’s character the predicate.” 
    • What do you think the author means by making this comparison?
    • How does it make you feel to shift from “the decision-maker and their decision” to “the community’s responsibility for the decision a person makes”?
    • How many communities can a person be a part of? How can they overlap or diverge? How does intersectionality affect your thoughts about responsibility?
  • Within software and technology, there are recognized leaders who have made large contributions to the field, but also have been called out for their gross personal beliefs and antisocial behavior. For example, Steve Jobs, Linus Torvalds, Richard Stallman, Uncle Bob Martin, etc. 
    • Thinking of the practice described in the paper, is it practical to separate people’s technical contributions from their character?
    • Is it easy for you to imagine yourself acting in their skin? Why or why not?
    • Thinking of the practice described in the paper, how might diversity and inclusion in our engineering communities help people to act ethically? 
    • What is our communities’ responsibility for creating the conditions in which people make ethical decisions? What can we reasonably expect?
  • Bowen H. McCoy, in Harvard Business Review’s “Parable of the Sadhu” describes the concept of “business” ethics. Business ethics “has to do with the authenticity and integrity of the enterprise. To be ethical is to follow the business as well as the cultural goals of the corporation, its owners, its employees, and its customers. Those who cannot serve the corporate vision are not authentic businesspeople and, therefore, are not ethical in the business sense.”
    • How is this similar to the Forward’s Vanderbilt quote “The public be damned! I work for my stockholders”?
    • How is the context of “business” ethics defined? Who defines the visions and goals and what are they relative to?
    • How is this “business” ethics similar to and different from the “scientific” engineering ethics we’ve been discussing? 
  • Facebook employees recently published a letter criticising the company’s lax fact-checking policies for political ads. After explaining the problems with the policy and suggesting alternatives, they closed their letter with “This is still our company.”
    • When thinking of Western and non-Western frames, what multiple interpretations could there be of that phrase? How does framing something as a “leadership” decision affect how we approach it compared to the idea of “community” responsibility?
    • How is the idea of “scientific” decisions challenged in a “business” environment? How are the power dynamics different in a business than a community? How are they the same? Can they be wholly separated within the context of software engineering?
  • Software engineering communities have frequently raised the idea of a “Hippocratic Oath” to improve ethical conduct in software engineering and emerging fields such as Machine Learning and AI.
    • Given the reading, how applicable would such an oath be in exigent circumstances?
    • Given the reading, what else would be necessary to make a Hippocratic Oath actionable and meaningful to engineers? How could existing software engineering communities better provide stories of such an oath’s usage by representative role models?
  • At the very end of the story, the foreman says “When it comes to rookie engineers, it is better to pay early, than to pay later.” 
    • Given all we have discussed, what could this mean? 
    • Who pays early? Later? What are the costs?
  • In what ways, if any, has this reading made you think you would act differently in the future?

Public comments on Sunset housing

Today I spoke in public comment before the San Francisco Planning Commission on a proposed 20-unit building in the Outer Sunset at 3945 Judah St.

Good afternoon, Commissioners.

My name is Ben Sheldon. I have been a resident of the Outer Sunset for 8 years.

I support this project.

I live 4 blocks away from the proposed project. I live in a 4 story, 12 unit multi-family apartment building.

My building is vibrant. It is home to senior citizens, families with young children, teachers, and working professionals like myself.

We shop at local businesses. Eat at local restaurants. Attend local schools. We participate fully in the civic life of our neighborhood.

Multi-family buildings. Dense. And large. Like my own, and the proposed project, are part of the character of our neighborhood.

My building was built in 1928. It makes me sad. And at times angry. That a building like my own seemingly could not be built today.

My building is not enough. It is not modern. It is not accessible for the very old or people with disabilities. It has lead and toxicity issues of concern for very young children.

The neighborhood needs more buildings like my own. Better ones.

Multi-family. Dense. Accessible. Modern. Vibrant.

This project is equally an issue of inclusion as it is of character.

I support this project.

I urge you to support this project too.

Thank you.