Undervalued: The Most Useful Design Pattern

On the ten-year anniversary of my first RailsConf, I had the privilege to speak at RailsConf 2024 alongside two other Normans: my little brother, Alistair, and my friend, Cody. The talk explores how we can use value objects and data objects (also called data transfer objects) alongside the factory pattern to write decoupled, easily-testable software.

The Problem: An XML Product Feed

Let’s examine some code. This code is production-like. It was taken from a real Solidus app and modified to fit on my slides. We’re going to explore an approach to refactoring this code.

Nokogiri::XML::Builder.new(encoding: "UTF-8") do |xml|
  xml.rss(base_xml_params) do
    Product.available.each do |product|
      xml.entry do
        xml["g"].id(product.id)
        xml["g"].title(product.name)
        xml["g"].description(product.description)
        xml["g"].price("%.2f %s" % [product.price.amount, product.price.currency])
        xml["g"].link(Rails.application.routes.url_helpers.product_url(product, host: @host))
        xml["g"].image_link(product.images.cover_image&.attachment_url || "")
        xml["g"].availability(if product.in_stock? then "In Stock" else "Out of Stock" end)
        xml["g"].ean_barcode(product.ean_barcode)
      end
    end
  end
end.to_xml

The code generates an XML feed that is consumed by Google. We won’t concern ourselves with the details of how this code is used. It might be called from a controller, but many stores have catalogs that are too large for that, so it might be called from a background job and uploaded to something like AWS S3.

Don’t worry about the exact details of the code. Here’s a basic rundown of how it works. It uses Nokogiri’s Nokogiri::XML::Builder, which yields an xml object that we can call methods on to define the structure of the XML document.

Within the block, we loop over Product.available and create an “entry” for each product with information about that product. Product is an ActiveRecord model and the .available scope queries the products that are currently available for sale on the site.

Within the entry, we’ve got a series of fields that provide Google with information about that specific product:

  1. The id field contains a unique identifier for the product. We’re currently using the product’s database id.
  2. We put the product’s name in the title field. This shows the beginning of the disconnect between our application’s understanding of a product and Google’s; the names are different.
  3. The description column in our database maps directly to the description field in the feed. That’s consistent.
  4. We fill the price field with a specially formatted string containing the price of the product.
  5. In the link field we generate the URL for this product. There’s an instance variable in there that provides the host, but we’re not going to worry about it in our refactoring.
  6. The image_link field gets the URL of a “cover image” for the product, if it has one.
  7. The availability field contains another red flag. We put the string “In Stock” or “Out of Stock” depending on whether we have inventory for that product. We already used the available scope on product, and that means something different than “in stock”. Our understanding of what it means for a product to be “available” differs from Google’s.
  8. Like description, ean_barcode matches right up with our model.

Don’t concern yourself too much with what the generated XML might look like. We’re concerned with the structure of the code here, not the structure of the XML. The important thing is that you understand that this code uses Nokogiri, queries some products from the database, and builds up an XML document containing data from our database in a special format that Google will consume.

Coupling and Cohesion

Coupling and cohesion are related software design heuristics. They are arguably the most important concepts in software design.

Coupling is the degree of interdependence between two different parts of the system. Good designs usually have relatively low coupling. If you make a change in one place, you should not have to make corresponding changes in many other places.

Cohesion is how much the stuff in an object belongs together. The Single-Responsibility Principle is about cohesion. It’s often stated as:

A class should have only one reason to change.

A “reason to change” is a pretty abstract idea. I prefer to talk about what a class “knows about”. Let’s consider the code above. It knows about:

  • The Nokogiri API
  • How to load products from the database and which products to load
  • The structure of the XML feed
  • How to map data from the structure in our database to the fields in the Google feed
  • How that data needs to be formatted

That’s quite a few “responsibilities”, but let’s make it clear why this code knowing about all these things is bad.

Firstly, it’s not very reusable. If we needed to create more than one feed or segment our feed, the code isn’t parameterized. That’s easy to fix, though.

It’s hard to test. Testing this code requires that we load the database with products, and the only output is a string of XML. That means we’re going to have to write tests where the inputs and outputs are very disconnected. The tests will have to save products to the database, generate the XML, parse it, and use some mechanism like XPath selectors to make assertions about the XML.

These tests will be slow, too. They have to load data into the database, including saving images to some kind of file storage (to test the image_link field properly). Those operations are slow. In tests where we’re testing conditional logic, which this code has some, we want to be writing fast tests to avoid bogging down our test suite.

Not only will these tests be slow, but their failures will be hard to understand. If your XPath selectors don’t find the content you’re looking for, you’ll be left reading through the XML document trying to understand why the selector didn’t find what you were looking for.

Let’s explore some patterns that can be useful for improving the design of code like this.

Data [Transfer] Objects

Data objects (sometimes called “Data-Transfer Objects” or DTOs) are objects that bundle together data without any meaningful behaviour attached to it. Configuration objects are a common example of data objects. You’ve probably seen some code like this:

MyGem.configure do |config|
  config.some_setting = true
  config.another_setting = "foo"
end

In many cases, the config object here is just an object with a bunch of reader and writer methods on it. This object is then exposed to you to set those attributes as you see fit, then passed to parts of the gem that depend on those settings so they can vary their behaviour according to your configuration.

The object itself is just a bundle of data. It doesn’t do anything. These objects aren’t limited to configuration. You can use them wherever you have a cohesive set of data that you need to pass around together.
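To show how little such an object needs, here is a minimal sketch of what the plumbing behind that configure block might look like. MyGem, its two settings, their defaults, and the greeting method are all made up for the example; real gems vary.

```ruby
# Hypothetical gem; the setting names and defaults are illustrative only.
module MyGem
  # A plain data object: readers and writers, no behaviour.
  class Configuration
    attr_accessor :some_setting, :another_setting

    def initialize
      @some_setting = false
      @another_setting = "default"
    end
  end

  # Memoize a single configuration object for the gem.
  def self.config
    @config ||= Configuration.new
  end

  # Yield the configuration object so callers can set attributes on it.
  def self.configure
    yield config
  end

  # A part of the gem that varies its behaviour based on the configuration.
  def self.greeting
    config.some_setting ? "Hello, #{config.another_setting}!" : "Hello!"
  end
end

MyGem.greeting #=> "Hello!"

MyGem.configure do |config|
  config.some_setting = true
  config.another_setting = "foo"
end

MyGem.greeting #=> "Hello, foo!"
```

The Configuration class never decides anything. It only carries the settings to the code that does.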

Value Objects

Value objects are a bit more advanced. Ruby supports all kinds of values out of the box. If you’ve written a bit of Ruby, you’ve almost certainly worked with booleans, symbols, and different kinds of numbers.

We can build more complex kinds of values. Here are some examples of other types of values:

  • Dates, times, and durations
  • Ranges
  • Colours
  • Vectors and coordinates

Ruby even has built-in support for some of these. Because of the domain of the example, let’s talk about the most common kind of value object in eCommerce systems, monetary amounts.

Monetary amounts have a value and a currency. Think of $20 USD or ¥1,234 JPY. They behave like numbers when adding or subtracting them, but only if the currencies match. You can’t add $20 to ¥1,234, but you can add ¥20 and ¥1,234. If you have two monetary amounts, like two instances of $20, they are totally interchangeable. One twenty dollar USD amount is equivalent in all ways to another twenty dollar USD amount. They don’t have an identity.

That’s one of the rules of value objects: no identity. Two equal instances are interchangeable. Another rule is that they are immutable. Just like with Ruby’s built in Integer class, a value object’s methods should return new values.

It would be deeply confusing if 10.next mutated the number 10 into the number 11. If every 10 in your app suddenly became 11, things would go really badly. Values should be immutable.

That’s really all there is to value objects, just two rules:

  1. They have no identity and are considered equal (and interchangeable) if their properties are the same.
  2. They are immutable.

These objects can be used to model complex domain operations and simplify code. They allow systems to work at a higher level and can hide the underlying structure of information from code that depends on it. They have many uses, but in this example we’re focused on how they can decouple code at domain boundaries.
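To make the two rules concrete, here is a hand-rolled sketch of a value object in plain Ruby. The Colour class, its fields, and the with_red method are invented for illustration; the point is only to show equality based on properties, plus immutability.

```ruby
# Hypothetical value object; the class and its members are illustrative.
class Colour
  attr_reader :red, :green, :blue

  def initialize(red, green, blue)
    @red = red
    @green = green
    @blue = blue
    freeze # Rule 2: instances can't be changed after construction.
  end

  # Rule 1: equality is based on properties, not object identity.
  def ==(other)
    other.is_a?(Colour) && to_a == other.to_a
  end
  alias eql? ==

  # Equal values must hash equally so they behave the same as Hash keys.
  def hash
    [self.class, *to_a].hash
  end

  def to_a
    [red, green, blue]
  end

  # "Changing" a value returns a new value instead of mutating this one.
  def with_red(new_red)
    Colour.new(new_red, green, blue)
  end
end

teal = Colour.new(0, 128, 128)
same_teal = Colour.new(0, 128, 128)

teal == same_teal      #=> true, equal by value
teal.equal?(same_teal) #=> false, two distinct objects in memory
brighter = teal.with_red(64)
teal.red               #=> 0, the original is untouched
```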

Implementing Data and Value Objects

Before we can refactor our code, we should look at how data and value objects are implemented. There are two cases, and the important distinction between them is immutability. If we want a value object (which is always immutable) or an immutable data object, we choose one approach. If we need a mutable data object, we take a different approach. Let’s explore each.

Value Objects and Immutable Data Objects

Ruby v3.2 introduced a new class, Data. Data allows us to define classes that have immutable properties and handle equality based on those properties. Here’s a simple money example:

Money = Data.define(:amount, :currency)

Money.new(10, "USD") == Money.new(1234, "JPY")
#=> false
Money.new(9, "JPY") == Money.new(1234, "JPY")
#=> false
Money.new(10, "USD") == Money.new(10, "USD")
#=> true

This gives us everything we need for simple value and data objects (though our data objects probably won’t care about this equality property). If we’re building a value object that supports some kind of operations, we can expand its definition with new methods.

Money = Data.define(:amount, :currency) do
  def +(other)
    unless other.is_a?(Money)
      raise TypeError, "Unsupported argument type: #{other.class.name}"
    end

    if other.currency != currency
      raise ArgumentError, "Can only add Money values with the same currency"
    end

    Money.new(amount + other.amount, currency)
  end
end

Money[100, "JPY"] + Money[19, "JPY"]
#=> #<data Money amount=119, currency="JPY">
Money[100, "JPY"] + Money[19, "CAD"]
#=> ArgumentError

Notice that our + method returns a new instance of the Money class. This preserves the immutable property of our value object.
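Here is a short sketch of what those immutability and equality guarantees look like in practice, assuming Ruby 3.2 or newer. It redefines the plain two-field Money so that it runs on its own.

```ruby
# Requires Ruby 3.2+ for Data.
Money = Data.define(:amount, :currency)

# Data accepts keyword arguments as well as positional ones.
ten = Money.new(amount: 10, currency: "USD")

# Data instances are frozen.
ten.frozen? #=> true

# #with builds a modified copy and leaves the original untouched.
twenty = ten.with(amount: 20)
ten.amount    #=> 10
twenty.amount #=> 20

# Equal values hash the same, so they are interchangeable as Hash keys.
labels = { Money.new(10, "USD") => "ten bucks" }
labels[ten] #=> "ten bucks"
```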

Mutable Data Objects

Classes defined by Data.define have immutable properties. They don’t have any writer methods. You can’t do something like this:

ten_bucks = Money[10, "USD"]
ten_bucks.amount = 20
#=> NoMethodError

If you need to be able to mutate the attributes of your data objects, you need to use Struct. Struct works nearly identically to Data, but it does have writer methods. This makes it perfect for our mutable data objects.

module MyGem
  Config = Struct.new(:foo_enabled, :bar_size, :baz_name)

  def self.config
    @config ||= Config.new(
      foo_enabled: false,
      bar_size: 1,
      baz_name: "Jardo"
    )
  end

  def self.configure
    yield config
  end
end

MyGem.configure do |c|
  c.foo_enabled = true
  c.bar_size = 69
  c.baz_name = "Jared"
end

MyGem.config
#=> #<struct MyGem::Config foo_enabled=true, bar_size=69, baz_name="Jared">

Mutability is the only difference between Data and Struct that we care about here. In fact, if you’re not yet running Ruby 3.2, you can use structs in place of data objects, and simply avoid mutating them. But really you should just upgrade.
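If you are stuck on an older Ruby, one way to make “simply avoid mutating them” rely less on discipline is to freeze each struct as it is constructed. This is a sketch of that idea rather than something from the talk, and LegacyMoney is a made-up name.

```ruby
# Hypothetical stand-in for Money on Rubies without Data.
LegacyMoney = Struct.new(:amount, :currency) do
  def initialize(*)
    super
    freeze # Writer methods still exist, but calling them now raises.
  end
end

ten_bucks = LegacyMoney.new(10, "USD")

begin
  ten_bucks.amount = 20
rescue FrozenError
  # The frozen struct rejects the write.
end

ten_bucks.amount #=> 10

# Structs compare by their members, just like Data.
LegacyMoney.new(10, "USD") == ten_bucks #=> true
```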

Factory Methods

The Factory Method is a pattern that can be used in many contexts outside of value and data objects, but is commonly used with them. A factory method is simply an alternate constructor. Let’s say that you store prices in your database, but sometimes you want to do some money math on them. It’s preferable to work with money objects for these operations since they enforce all the domain rules for working with monetary amounts. We might write a method like this:

Money = Data.define(:amount, :currency) do
  def self.from_price(price)
    new(price.amount, price.currency)
  end
end

This would allow us to convert our prices to money amounts, decoupling the data we’re working with from the database, and do whatever math we need to do with them. We can do the same with mutable data objects.

We might have configuration objects that describe what features are currently available in a certain context in our app. We can write a factory method that examines a user and determines which features they might be able to access. It could look something like this:

Config = Struct.new(:feature_a_enabled, :feature_b_enabled) do
  def self.from_user(user)
    new(
      feature_a_enabled: user.membership_active?,
      feature_b_enabled: user.tier == :premium
    )
  end
end

This pattern allows us to take existing data, extract the information we need from it, and convert it into a new object that has no reference to where that information came from. We can even construct these objects directly through their regular constructors, bypassing the factory methods entirely. That makes it possible to test code that depends on the value objects and data objects without ever constructing a price or user.
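For instance, code that depends on the Config object can be exercised without a user at all. This is a minimal sketch: Dashboard is a hypothetical consumer, FakeUser is a stand-in struct rather than a real model, and it assumes Ruby 3.2+, where a plain Struct accepts keyword arguments.

```ruby
# Requires Ruby 3.2+ (keyword arguments to a plain Struct).
Config = Struct.new(:feature_a_enabled, :feature_b_enabled) do
  def self.from_user(user)
    new(
      feature_a_enabled: user.membership_active?,
      feature_b_enabled: user.tier == :premium
    )
  end
end

# Hypothetical consumer: it only knows about Config, never about users.
class Dashboard
  def initialize(config)
    @config = config
  end

  def sections
    sections = [:overview]
    sections << :feature_a if @config.feature_a_enabled
    sections << :feature_b if @config.feature_b_enabled
    sections
  end
end

# In a test, build the Config directly; no user record is needed.
config = Config.new(feature_a_enabled: true, feature_b_enabled: false)
Dashboard.new(config).sections #=> [:overview, :feature_a]

# The factory method itself can be checked with a lightweight stand-in.
FakeUser = Struct.new(:tier, :active) do
  def membership_active?
    active
  end
end

premium_config = Config.from_user(FakeUser.new(:premium, true))
premium_config.feature_b_enabled #=> true
```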

A Better Feed Implementation

We now have the pieces we need for a better implementation. In the video I walk through this refactoring step-by-step, but here I’m just going to cut to the chase. The calling code is going to change from what we showed originally, to this:

google_products = Product.available.map { |product|
  GoogleProduct.from_product product
}

GoogleProductFeed.new(google_products).to_xml

This code queries the products that we want in our feed, converts them to GoogleProduct data objects, then feeds those into a GoogleProductFeed class. Let’s look at GoogleProduct first:

GoogleProduct = Data.define(
  :id, :title, :description, :price, :link,
  :image_link, :available, :ean_barcode
) do
  def self.from_product(product)
    new(
      id: product.id,
      title: product.name,
      description: product.description,
      price: Money.from_price(product.price),
      link: Rails.application.routes.url_helpers.product_url(product, host: @host),
      image_link: product.images.cover_image&.attachment_url || "",
      available: product.in_stock?,
      ean_barcode: product.ean_barcode
    )
  end
end

This class now knows how to take a product from the database and convert it into a domain object that represents Google’s understanding of a product. Once we have an instance of GoogleProduct, we’re fully decoupled from our database products. This class doesn’t hold a reference to the original product and (through its regular constructor) can even be constructed without a product model from the database.

The GoogleProductFeed class now only knows about the Nokogiri API and the structure of our XML feed. It looks like this:

class GoogleProductFeed
  def initialize(google_products)
    @google_products = google_products
  end

  def to_xml
    Nokogiri::XML::Builder.new(encoding: "UTF-8") do |xml|
      xml.rss(base_xml_params) do
        @google_products.each do |product|
          xml.entry do
            xml["g"].id(product.id)
            xml["g"].title(product.title)
            xml["g"].description(product.description)
            xml["g"].price("%.2f %s" % [product.price.amount, product.price.currency])
            xml["g"].link(product.link)
            xml["g"].image_link(product.image_link)
            xml["g"].availability(if product.available then "In Stock" else "Out of Stock" end)
            xml["g"].ean_barcode(product.ean_barcode)
          end
        end
      end
    end.to_xml
  end
end

Our query logic, formatting logic, and XML structure are no longer mixed into one class. The top-level code queries for the data, hands it off to GoogleProduct’s factory method for the relevant information to be extracted, then passes that data along to GoogleProductFeed to construct the XML.

Handling Change

A good design is easy to change, so let’s explore how this design might handle different kinds of change. Let’s start with the top-level snippet:

google_products = Product.available.map { |product|
  GoogleProduct.from_product product
}

GoogleProductFeed.new(google_products).to_xml

This code only “knows about” which products to put in the feed. Nothing here needs to change if the details of the feed change. The only likely kind of change that would require changes here is a change to which products are supposed to be in this feed.

Next, let’s examine our data object, the GoogleProduct:

GoogleProduct = Data.define(
  :id, :title, :description, :price, :link,
  :image_link, :available, :ean_barcode
) do
  def self.from_product(product)
    new(
      id: product.id,
      title: product.name,
      description: product.description,
      price: Money.from_price(product.price),
      link: Rails.application.routes.url_helpers.product_url(product, host: @host),
      image_link: product.images.cover_image&.attachment_url || "",
      available: product.in_stock?,
      ean_barcode: product.ean_barcode
    )
  end
end

The class handles the mapping of our database products to Google’s understanding of a product. It might need to change for a variety of reasons:

  • It will need to change if we need to add or remove fields from our feed.
  • It will need to change if the contents or formats of these fields change.
  • It will need to change if the structure of our database changes.

That’s a few different sources of changes (and I can technically think of more), but it’s all focused around one responsibility: mapping our domain model of a product to Google’s domain model of a product. From a design perspective, that’s great.

This class is easily testable, doesn’t depend on products being saved to the database, and provides a nice clear mapping between the names and concepts of the two systems.
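To give a sense of what such a test might look like, here is a sketch using a trimmed-down GoogleProduct (only four fields, no URL helpers) and plain structs standing in for the database models. FakeProduct and FakePrice are invented for the example; a real test would more likely use your test framework’s doubles. It assumes Ruby 3.2+.

```ruby
# Requires Ruby 3.2+ for Data.
Money = Data.define(:amount, :currency) do
  def self.from_price(price)
    new(price.amount, price.currency)
  end
end

# Trimmed-down version of the class above, for illustration only.
GoogleProduct = Data.define(:id, :title, :price, :available) do
  def self.from_product(product)
    new(
      id: product.id,
      title: product.name,
      price: Money.from_price(product.price),
      available: product.in_stock?
    )
  end
end

# Hypothetical stand-ins that respond to the same methods as the models.
FakePrice = Struct.new(:amount, :currency)
FakeProduct = Struct.new(:id, :name, :price, :in_stock) do
  def in_stock?
    in_stock
  end
end

product = FakeProduct.new(1, "T-Shirt", FakePrice.new(19.99, "USD"), true)
google_product = GoogleProduct.from_product(product)

# Value equality keeps the assertion to a single comparison.
expected = GoogleProduct.new(
  id: 1,
  title: "T-Shirt",
  price: Money.new(19.99, "USD"),
  available: true
)
google_product == expected #=> true
```

Nothing here touches the database or file storage, so a test written this way stays fast.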

Finally, let’s look at the GoogleProductFeed class:

class GoogleProductFeed
  def initialize(google_products)
    @google_products = google_products
  end

  def to_xml
    Nokogiri::XML::Builder.new(encoding: "UTF-8") do |xml|
      xml.rss(base_xml_params) do
        @google_products.each do |product|
          xml.entry do
            xml["g"].id(product.id)
            xml["g"].title(product.title)
            xml["g"].description(product.description)
            xml["g"].price("%.2f %s" % [product.price.amount, product.price.currency])
            xml["g"].link(product.link)
            xml["g"].image_link(product.image_link)
            xml["g"].availability(if product.available then "In Stock" else "Out of Stock" end)
            xml["g"].ean_barcode(product.ean_barcode)
          end
        end
      end
    end.to_xml
  end
end

This class is relatively isolated from change. It might need to change if:

  • the fields we want in the feed change
  • we were wrong about the structure or formatting of the feed
  • Nokogiri’s API changes

Beyond that, it’s well protected from change. It doesn’t know about the database or even if there is a database. It doesn’t know where the GoogleProduct instances come from. All it knows about is the structure and format of the feed, and the fields that GoogleProduct has.

Because it is isolated in this way, the tests for it will be extremely fast. They can construct some GoogleProduct instances, pass them in, and verify the structure and format of the XML. Yes, there’ll be some XML parsing in these tests, but this is the only class that knows anything about XML, so that’s expected.

Drawing Boundaries

Let’s talk about the pattern we’ve implemented in this code. Our code is responsible for taking products from our database and creating a specially formatted XML feed containing information about those products.

We identified that there’s a domain boundary hidden inside this task. We’re bridging our application’s domain model and its understanding of a product and Google’s domain model and understanding of a product. The two understandings of a product were incongruous. Some fields had different names. The term “available” had a different meaning on each side.

By refactoring the code, we separated the concerns. One object, GoogleProductFeed, knew only about generating XML and the XML being generated. Another object, GoogleProduct, mapped our understanding of a product to Google’s. The top-level code was left gluing these pieces together.

Effectively, this design draws a boundary between our domain and Google’s. GoogleProduct’s factory method is that line. Once a GoogleProduct has been initialized, it doesn’t know anything about our database models or the structure of our database. Objects, like GoogleProductFeed, that consume those objects aren’t dependent on where they came from.

Factory Method Tips

Combining factory methods with data and value objects is an effective way to draw boundaries in our system. That’s where this pattern is most useful. Look for places where you’re reconciling two different domain models (like in our example) or places where some kind of transformation is taking place.

For example, you might use this pattern when transforming an eCommerce order into an object that represents a shipping label for that order. It’s within one domain model, but there’s a conceptual transformation happening.

You can also use the pattern at the boundaries of a module as a mechanism for information hiding. Rather than leaking your module’s internal objects, you can transform them into simpler data or value objects, keeping your modules decoupled.

Multiple Factory Methods

Giving a single value or data object multiple factory methods allows you to have multiple parts of your system converge to reuse shared data or value objects. We use this pattern in Solidus around pricing options.

We need to know what parameters to use to select a price in different contexts. Sometimes pricing options are constructed from an HTTP request for determining what price to show in a view. Other times pricing options are constructed from an order, so that we know what price to apply to an item in that order.

This makes the parts of the system that operate on pricing options simpler and more testable, because they don’t need to know where the pricing options came from.
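Here is a sketch of what that convergence could look like. This is not Solidus’s actual implementation: PricingOptions, its fields, PriceSelector, and the shapes of the request parameters and order are all assumptions made for the example. It assumes Ruby 3.2+.

```ruby
# Requires Ruby 3.2+ for Data. All names here are illustrative.
PricingOptions = Data.define(:currency, :country_iso) do
  # Built from request parameters when deciding what price to display.
  def self.from_params(params)
    new(
      currency: params.fetch(:currency, "USD"),
      country_iso: params[:country]
    )
  end

  # Built from an order when deciding what price to apply to an item.
  def self.from_order(order)
    new(currency: order.currency, country_iso: order.ship_country_iso)
  end
end

# Hypothetical consumer: it has no idea where the options came from.
class PriceSelector
  def initialize(prices)
    @prices = prices
  end

  def price_for(options)
    @prices.find { |price| price[:currency] == options.currency }
  end
end

# Stand-in for an order model.
FakeOrder = Struct.new(:currency, :ship_country_iso)

from_view = PricingOptions.from_params({ currency: "CAD", country: "CA" })
from_order = PricingOptions.from_order(FakeOrder.new("CAD", "CA"))

# Both paths converge on the same value.
from_view == from_order #=> true

selector = PriceSelector.new([
  { amount: 10, currency: "USD" },
  { amount: 13, currency: "CAD" }
])
selector.price_for(from_order)[:amount] #=> 13
```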

One-to-Many, Many-to-One, Many-to-Many

You can write factory methods that take one object and return many value/data objects. You can also create factory methods that take many objects and return only one. Both cases are extremely useful.

Consider our feed example. It’s possible that we misunderstood what Google’s products mapped to in our system. Rather than mapping to what we called “products”, they actually mapped to individual SKUs, what we call “variants”. If a t-shirt comes in six different sizes, we want a GoogleProduct for each size.

In this case, we could modify our factory method to return an array of objects, one for each size. The GoogleProductFeed object wouldn’t even need to change.
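Here is a sketch of that change, using a trimmed-down GoogleProduct and stand-in structs. The variants, sku, and options_text names are assumptions about what the models might expose, not something taken from the original code. It assumes Ruby 3.2+.

```ruby
# Requires Ruby 3.2+ for Data. Trimmed down for illustration.
GoogleProduct = Data.define(:id, :title, :available) do
  # One product in, many GoogleProducts out: one per variant.
  def self.from_product(product)
    product.variants.map do |variant|
      new(
        id: variant.sku,
        title: "#{product.name} (#{variant.options_text})",
        available: variant.in_stock?
      )
    end
  end
end

# Hypothetical stand-ins for the database models.
FakeVariant = Struct.new(:sku, :options_text, :in_stock) do
  def in_stock?
    in_stock
  end
end
FakeProduct = Struct.new(:name, :variants)

shirt = FakeProduct.new("T-Shirt", [
  FakeVariant.new("TS-S", "Small", true),
  FakeVariant.new("TS-M", "Medium", false)
])

# The calling code swaps map for flat_map to flatten the per-product arrays.
google_products = [shirt].flat_map { |product| GoogleProduct.from_product(product) }
google_products.map(&:title) #=> ["T-Shirt (Small)", "T-Shirt (Medium)"]
```

The only caller-side change in this sketch is map becoming flat_map. Anything that consumes the resulting list of GoogleProduct values is unaffected.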

What matters is that there’s a transformation, not how many values are on each side of that transformation. You might even have transformations where you’re transforming multiple kinds of values into something else. This pattern works just as well in that situation.

Highlight What Matters

When you pass an ActiveRecord model around, consumers have full access to the entire ActiveRecord API, on top of all the methods you’ve added to that class. It’s not going to be obvious without reading through the code which parts of the API the consumer actually cares about.

Conversely, when you pass data and value objects around, their API is very limited. You make it very clear that there are only a few possible values that the consumer could possibly care about.

Facade, More Like Fa-bad

I’m writing this more than a year after giving the talk, disappointed that it took me this long to come up with that (terrible) pun. I’m especially disappointed because I’ve been railing against the facade and decorator patterns for so long.

These patterns pop up again and again in different contexts, but let’s use view objects as our example. These are objects that typically wrap database models and provide extra functionality that is only necessary in views. Here’s an article on them that happens to be at the top of my search results for “view object rails” right now.

The issue with these objects is that they suffer from the exact issue we’ve been trying to avoid in this talk. To instantiate them, you still need to pass in an instance of the underlying model. You need it for tests and you need it in production code. I don’t think you should be transforming all your database models into data objects just for your views, but that’s because there’s no domain boundary at play.

The facade pattern isn’t inherently bad.

You should be using it for creating wrappers around third-party APIs and complex interfaces. It’s just a bad fit when you’re transforming data at a boundary.

In Summary

Whether using value objects to represent something in your domain, or data objects to bundle related data together, combining them with factory methods will help you draw nice, clean boundaries in your systems. Because these kinds of objects can be initialized independently from the inputs to their factory methods, you can avoid coupling in your system and your tests.