Undervalued: The Most Useful Design Pattern
On the ten year anniversary of my first RailsConf, I had the privilege to speak at RailsConf 2024 alongside two other Normans: my little brother, Alistair, and friend, Cody. The talk explores how we can use value objects and data objects (also called data transfer objects) alongside the factory pattern to write decoupled, easily-testable software.
The Problem: An XML Product Feed
Let’s examine some code. This code is production-like: it was taken from a real Solidus app and modified to fit on my slides. We’re going to explore an approach to refactoring this code.
# base_xml_params is defined elsewhere (not shown)
Nokogiri::XML::Builder.new(encoding: "UTF-8") do |xml|
  xml.rss(base_xml_params) do
    Product.available.each do |product|
      xml.entry do
        xml["g"].id(product.id)
        xml["g"].title(product.name)
        xml["g"].description(product.description)
        xml["g"].price("%.2f %s" % [product.price.amount, product.price.currency])
        xml["g"].link(Rails.application.routes.url_helpers.product_url(product, host: @host))
        xml["g"].image_link(product.images.cover_image&.attachment_url || "")
        xml["g"].availability(if product.in_stock? then "In Stock" else "Out of Stock" end)
        xml["g"].ean_barcode(product.ean_barcode)
      end
    end
  end
end.to_xml
The code generates an XML feed that is consumed by Google. We won’t concern ourselves with the details of how this code is used. It might be called from a controller, but many stores have catalogs that are too large for that, so it might be called from a background job and uploaded to something like AWS S3.
Don’t worry about the exact details of the code. Here’s a basic rundown of how it works. It uses Nokogiri’s Nokogiri::XML::Builder, which yields an xml object that we can call methods on to define the structure of the XML document.
Within the block, we loop over Product.available and create an “entry” for each product with information about that product. Product is an ActiveRecord model, and the .available scope queries the products that are currently available for sale on the site.
Within the entry, we’ve got a series of fields that provide Google with information about that specific product:
- The id field contains a unique identifier for the product. We’re currently using the product’s database id.
- We put the product’s name in the title field. This shows the beginning of the disconnect between our application’s understanding of a product and Google’s; the names are different.
- The description column in our database maps directly to the description field in the feed. That’s consistent.
- We fill the price field with a specially formatted string containing the price of the product.
- In the link field we generate the URL for this product. There’s an instance variable in there that provides the host, but we’re not going to worry about it in our refactoring.
- The image_link field gets the URL of a “cover image” for the product, if it has one.
- The availability field contains another red flag. We put the string “In Stock” or “Out of Stock” depending on whether we have inventory for that product. We already used the available scope on Product, and that means something different from “in stock”. Our understanding of what it means for a product to be “available” differs from Google’s.
- Like description, ean_barcode matches right up with our model.
Don’t concern yourself too much with what the generated XML might look like. We’re concerned with the structure of the code here, not the structure of the XML. The important thing is that you understand that this code uses Nokogiri, queries some products from the database, and builds up an XML document containing data from our database in a special format that Google will consume.
Coupling and Cohesion
Coupling and cohesion are related software design heuristics. They are arguably the most important concepts in software design.
Coupling is the degree of interdependence between two different parts of the system. Good designs usually have relatively low coupling. If you make a change in one place, you should not have to make corresponding changes in many other places.
Cohesion is how much the stuff in an object belongs together. The Single-Responsibility Principle is about cohesion. It’s often stated as:
A class should have only one reason to change.
A “reason to change” is a pretty abstract idea. I prefer to talk about what a class “knows about”. Let’s consider the code above. It knows about:
- The Nokogiri API
- How to load products from the database and which products to load
- The structure of the XML feed
- How to map data from the structure in our database to the fields in the Google feed
- How that data needs to be formatted
That’s quite a few “responsibilities”, but let’s make it clear why this code knowing about all these things is bad.
Firstly, it’s not very reusable. If we needed to create more than one feed or segment our feed, the code isn’t parameterized. That’s easy to fix, though.
It’s hard to test. Testing this code requires that we load the database with products, and the only output is a string of XML. That means we’re going to have to write tests where the inputs and outputs are very disconnected. The tests will have to save products to the database, generate the XML, parse it, and use some mechanism like XPath selectors to make assertions about the XML.
These tests will be slow, too. They have to load data into the database, including saving images to some kind of file storage (to test the image_link field properly). Those operations are slow. When we’re testing conditional logic, and this code contains some, we want fast tests so we don’t bog down our test suite.
Not only will these tests be slow, but their failures will be hard to understand. If your XPath selectors don’t find the content you’re looking for, you’ll be left reading through the XML document trying to work out why the selector came up empty.
Let’s explore some patterns that can be useful for improving the design of code like this.
Data [Transfer] Objects
Data objects (sometimes called “Data-Transfer Objects” or DTOs) are objects that bundle together data without any meaningful behaviour attached. Configuration objects are a common example. You’ve probably seen some code like this:
MyGem.configure do |config|
  config.some_setting = true
  config.another_setting = "foo"
end
In many cases, the config object here is just an object with a bunch of reader and writer methods on it. This object is then exposed to you to set those attributes as you see fit, then passed to parts of the gem that depend on those settings so they can vary their behaviour according to your configuration.
The object itself is just a bundle of data. It doesn’t do anything. These objects aren’t limited to configuration. You can use them wherever you have a cohesive set of data that you need to pass around together.
Value Objects
Value objects are a bit more advanced. Ruby supports all kinds of values out of the box. If you’ve written a bit of Ruby, you’ve almost certainly worked with booleans, symbols, and different kinds of numbers.
We can build more complex kinds of values. Here are some examples of other types of values:
- Dates, times, and durations
- Ranges
- Colours
- Vectors and coordinates
Ruby even has built-in support for some of these. Because of the domain of the example, let’s talk about the most common kind of value object in eCommerce systems, monetary amounts.
Monetary amounts have a value and a currency: think $20 USD or ¥1,234 JPY. They behave like numbers when you add or subtract them, but only if the currencies match. You can’t add $20 to ¥1,234, but you can add ¥20 and ¥1,234. If you have two monetary amounts, like two instances of $20, they are totally interchangeable. One twenty-dollar USD amount is equivalent in all ways to another twenty-dollar USD amount. They don’t have an identity.
That’s one of the rules of value objects: no identity. Two equal instances are interchangeable. Another rule is that they are immutable. Just like with Ruby’s built-in Integer class, a value object’s methods should return new values.
It would be deeply confusing if 10.next mutated the number 10 into the number 11. Suddenly every 10 in your app would actually be 11, and things would go very badly. Values should be immutable.
That’s really all there is to value objects, just two rules:
- They have no identity and are considered equal (and interchangeable) if their properties are the same.
- They are immutable.
These objects can be used to model complex domain operations and simplify code. They allow systems to work at a higher level and can hide the underlying structure of information from code that depends on it. They have many uses, but in this example we’re focused on how they can decouple code at domain boundaries.
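You can check both rules directly in Ruby. The sketch below assumes Ruby 3.2 or later because it uses the Data class. The Money definition and the amounts are only for illustration.

```ruby
# Assumes Ruby >= 3.2 for Data.
Money = Data.define(:amount, :currency)

a = Money.new(20, "USD")
b = Money.new(20, "USD")

# Rule 1: no identity. Equal properties mean equal values...
a == b      # => true
a.equal?(b) # => false: two objects, but nothing should depend on that

# ...so one can stand in for the other, even as a Hash key.
labels = { a => "twenty dollars" }
labels[b]   # => "twenty dollars"

# Rule 2: immutability. Data instances are frozen and have no writer methods;
# "changing" a value means building a new one.
a.frozen?                        # => true
twenty_five = a.with(amount: 25)
twenty_five.amount               # => 25
a.amount                         # => 20, the original is untouched
```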
Implementing Data and Value Objects
Before we can refactor our code, we should look at how data and value objects are implemented. There are two cases, and the important distinction between them is immutability. If we want a value object (which is always immutable) or an immutable data object, we take one approach. If we need a mutable data object, we take a different approach. Let’s explore each.
Value Objects and Immutable Data Objects
Ruby 3.2 introduced a new class, Data. Data allows us to define classes that have immutable properties and handle equality based on those properties. Here’s a simple money example:
Money = Data.define(:amount, :currency)
Money.new(10, "USD") == Money.new(1234, "JPY")
#=> false
Money.new(9, "JPY") == Money.new(1234, "JPY")
#=> false
Money.new(10, "USD") == Money.new(10, "USD")
#=> true
This gives us everything we need for simple value and data objects (though our data objects probably won’t care about this equality property.) If we’re building a value object that supports some kind of operations, we can expand its definition with new methods.
Money = Data.define(:amount, :currency) do
  def +(other)
    unless other.is_a?(Money)
      raise TypeError, "Unsupported argument type: #{other.class.name}"
    end
    if other.currency != currency
      raise ArgumentError, "Can only add Money values with the same currency"
    end
    Money.new(amount + other.amount, currency)
  end
end
Money[100, "JPY"] + Money[19, "JPY"]
#=> #<data Money amount=119, currency="JPY">
Money[100, "JPY"] + Money[19, "CAD"]
#=> ArgumentError
Notice that our + method returns a new instance of the Money class. This preserves the immutable property of our value object.
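Other operations follow the same recipe. In this sketch the - method and the ensure_compatible! helper are our own additions, not code from the talk. Subtraction shares the currency check with addition. Both operations build their result with Data#with, which returns a copy with some members replaced, so the receiver is never modified.

```ruby
# Assumes Ruby >= 3.2 (Data and Data#with).
Money = Data.define(:amount, :currency) do
  def +(other)
    ensure_compatible!(other)
    with(amount: amount + other.amount)
  end

  # Hypothetical addition: subtraction with the same domain rules as +.
  def -(other)
    ensure_compatible!(other)
    with(amount: amount - other.amount)
  end

  # Shared guard: only Money in the same currency can be combined.
  private def ensure_compatible!(other)
    raise TypeError, "Unsupported argument type: #{other.class.name}" unless other.is_a?(Money)
    raise ArgumentError, "Can only combine Money values with the same currency" if other.currency != currency
  end
end

price = Money[100, "JPY"]
discounted = price - Money[19, "JPY"]
discounted.amount # => 81
price.amount      # => 100, unchanged
```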
Mutable Data Objects
Classes defined by Data.define have immutable properties. They don’t have any writer methods. You can’t do something like this:
ten_bucks = Money[10, "USD"]
ten_bucks.amount = 20
#=> NoMethodError
If you need to be able to mutate the attributes of your data objects, you need to use Struct. Struct works nearly identically to Data, but it does have writer methods. This makes it perfect for our mutable data objects.
module MyGem
  Config = Struct.new(:foo_enabled, :bar_size, :baz_name)
  def self.config
    @config ||= Config.new(
      foo_enabled: false,
      bar_size: 1,
      baz_name: "Jardo"
    )
  end
  def self.configure
    yield config
  end
end
MyGem.configure do |c|
  c.foo_enabled = true
  c.bar_size = 69
  c.baz_name = "Jared"
end
MyGem.config
#=> #<struct MyGem::Config foo_enabled=true, bar_size=69, baz_name="Jared">
Mutability is the only difference between Data and Struct that we care about here. In fact, if you’re not yet running Ruby 3.2, you can use structs in place of data objects and simply avoid mutating them. But really, you should just upgrade.
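If you are stuck on an older Ruby, a frozen Struct gets you most of the way there. Here is a sketch of that workaround, where LegacyMoney is a hypothetical name. Freezing makes any accidental write raise instead of silently mutating the value. The guarantee is weaker than Data's: the writer methods still exist, and a Struct can be built with missing members, which default to nil.

```ruby
# Works without Data, so it also runs on Ruby versions before 3.2.
LegacyMoney = Struct.new(:amount, :currency)

ten = LegacyMoney.new(10, "USD").freeze

# Equality is still based on members, just like Data.
ten == LegacyMoney.new(10, "USD") # => true

# The writer exists, but calling it on a frozen struct raises FrozenError.
begin
  ten.amount = 20
rescue FrozenError => e
  puts "Refused: #{e.class}"
end

ten.amount # => 10
```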
Factory Methods
The Factory Method is a pattern that can be used in many contexts outside of value and data objects, but is commonly used with them. A factory method is simply an alternate constructor. Let’s say that you store prices in your database, but sometimes you want to do some money math on them. It’s preferable to work with money objects for these operations since they enforce all the domain rules for working with monetary amounts. We might write a method like this:
Money = Data.define(:amount, :currency) do
  # Alternate constructor: build Money from a price record's amount and currency
  def self.from_price(price)
    new(price.amount, price.currency)
  end
end
This would allow us to convert our prices to money amounts, decoupling the data we’re working with from the database, and do whatever math we need to do with them. We can do the same with mutable data objects.
We might have configuration objects that describe what features are currently available in a certain context in our app. We can write a factory method that examines a user and determines which features they might be able to access. It could look something like this:
Config = Struct.new(:feature_a_enabled, :feature_b_enabled) do
  def self.from_user(user)
    new(
      feature_a_enabled: user.membership_active?,
      feature_b_enabled: user.tier == :premium
    )
  end
end
This pattern allows us to take existing data, extract the information we need from it, and convert it into a new object that has no reference to where that information came from. We can even construct these objects without using their factory methods, making it possible to test code that depends on the value objects and data objects without ever constructing a price or user.
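Here is a minimal sketch of tests that never touch a database. It assumes Ruby 3.2+, for Data and for keyword arguments to a Struct without keyword_init. PriceRecord, FakeUser, cart_total, and enabled_features are hypothetical names; in a real app the records would be ActiveRecord models. The consumers are tested with directly built objects, and the factory methods get their own small tests against stand-ins.

```ruby
# Assumes Ruby >= 3.2.
Money = Data.define(:amount, :currency) do
  def self.from_price(price)
    new(price.amount, price.currency)
  end
end

Config = Struct.new(:feature_a_enabled, :feature_b_enabled) do
  def self.from_user(user)
    new(
      feature_a_enabled: user.membership_active?,
      feature_b_enabled: user.tier == :premium
    )
  end
end

# Hypothetical consumers: they only see Money and Config, never prices or users.
def cart_total(amounts)
  currencies = amounts.map(&:currency).uniq
  raise ArgumentError, "Expected exactly one currency" unless currencies.size == 1
  Money.new(amounts.sum(&:amount), currencies.first)
end

def enabled_features(config)
  [].tap do |features|
    features << :feature_a if config.feature_a_enabled
    features << :feature_b if config.feature_b_enabled
  end
end

# Consumer tests build the objects directly (amounts in cents, for illustration).
cart_total([Money.new(1999, "USD"), Money.new(501, "USD")]) # equals Money.new(2500, "USD")
enabled_features(Config.new(feature_a_enabled: true, feature_b_enabled: false)) # => [:feature_a]

# Hypothetical stand-ins that respond to the same methods as the real records.
PriceRecord = Struct.new(:amount, :currency)

class FakeUser
  attr_reader :tier

  def initialize(active:, tier:)
    @active = active
    @tier = tier
  end

  def membership_active?
    @active
  end
end

Money.from_price(PriceRecord.new(1999, "USD")) # equals Money.new(1999, "USD")
Config.from_user(FakeUser.new(active: false, tier: :premium)).feature_b_enabled # => true
```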
A Better Feed Implementation
We now have the pieces we need for a better implementation. In the video I walk through this refactoring step-by-step, but here I’m just going to cut to the chase. The calling code is going to change from what we showed originally, to this:
google_products = Product.available.map { |product|
  GoogleProduct.from_product product
}
GoogleProductFeed.new(google_products).to_xml
This code queries the products that we want in our feed, converts them to GoogleProduct data objects, then feeds those into a GoogleProductFeed class. Let’s look at GoogleProduct first:
GoogleProduct = Data.define(
  :id, :title, :description, :price, :link,
  :image_link, :available, :ean_barcode
) do
  def self.from_product(product)
    new(
      id: product.id,
      title: product.name,
      description: product.description,
      price: Money.from_price(product.price),
      # As in the original snippet, where @host comes from is glossed over
      link: Rails.application.routes.url_helpers.product_url(product, host: @host),
      image_link: product.images.cover_image&.attachment_url || "",
      available: product.in_stock?,
      ean_barcode: product.ean_barcode
    )
  end
end
This class now knows how to take a product from the database and convert it into a domain object that represents Google’s understanding of a product. Once we have an instance of GoogleProduct, we’re fully decoupled from our database products. This class doesn’t hold a reference to the original product and (through its regular constructor) can even be constructed without a product model from the database.
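Here is a minimal sketch of what "constructed without a product model" looks like in a test, assuming Ruby 3.2+. The members mirror the ones GoogleProductFeed reads. The Rails-dependent from_product factory is left out so the snippet runs on its own, and all example values are made up. Variations for other test cases are a single Data#with call away.

```ruby
# Assumes Ruby >= 3.2. The from_product factory is omitted because it needs Rails.
Money = Data.define(:amount, :currency)

GoogleProduct = Data.define(
  :id, :title, :description, :price, :link,
  :image_link, :available, :ean_barcode
)

# Regular constructor: no database, no ActiveRecord, no routes.
in_stock = GoogleProduct.new(
  id: 1,
  title: "Example T-Shirt",
  description: "A comfortable cotton t-shirt",
  price: Money.new(19.99, "USD"),
  link: "https://example.com/products/example-t-shirt",
  image_link: "",
  available: true,
  ean_barcode: "0000000000000"
)

# Another fixture for the "Out of Stock" branch, derived from the first.
sold_out = in_stock.with(available: false)
```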
The GoogleProductFeed class now only knows about the Nokogiri API and the structure of our XML feed. It looks like this:
class GoogleProductFeed
  def initialize(google_products)
    @google_products = google_products
  end
  def to_xml
    # base_xml_params is defined elsewhere (not shown)
    Nokogiri::XML::Builder.new(encoding: "UTF-8") do |xml|
      xml.rss(base_xml_params) do
        @google_products.each do |product|
          xml.entry do
            xml["g"].id(product.id)
            xml["g"].title(product.title)
            xml["g"].description(product.description)
            xml["g"].price("%.2f %s" % [product.price.amount, product.price.currency])
            xml["g"].link(product.link)
            xml["g"].image_link(product.image_link)
            xml["g"].availability(if product.available then "In Stock" else "Out of Stock" end)
            xml["g"].ean_barcode(product.ean_barcode)
          end
        end
      end
    end.to_xml
  end
end
Our query logic, formatting logic, and XML structure are no longer mixed into one class. The top-level code queries for the data, hands it off to GoogleProduct’s factory method to extract the relevant information, then passes that data along to GoogleProductFeed to construct the XML.
Handling Change
A good design is easy to change, so let’s explore how this design might handle different kinds of change. Let’s start with the calling code:
google_products = Product.available.map { |product|
  GoogleProduct.from_product product
}
GoogleProductFeed.new(google_products).to_xml
This code only “knows about” which products to put in the feed. Nothing here needs to change if the details of the feed change. The only likely change that would require edits here is a change to which products belong in this feed.
Next, let’s examine our data object, GoogleProduct:
GoogleProduct = Data.define(
  :id, :title, :description, :price, :link,
  :image_link, :available, :ean_barcode
) do
  def self.from_product(product)
    new(
      id: product.id,
      title: product.name,
      description: product.description,
      price: Money.from_price(product.price),
      # As in the original snippet, where @host comes from is glossed over
      link: Rails.application.routes.url_helpers.product_url(product, host: @host),
      image_link: product.images.cover_image&.attachment_url || "",
      available: product.in_stock?,
      ean_barcode: product.ean_barcode
    )
  end
end
The class handles the mapping of our database products to Google’s understanding of a product. It might need to change for a variety of reasons:
- It will need to change if we need to add or remove fields from our feed.
- It will need to change if the contents or formats of these fields change.
- It will need to change if the structure of our database changes.
That’s a few different sources of changes (and I can technically think of more), but it’s all focused around one responsibility: mapping our domain model of a product to Google’s domain model of a product. From a design perspective, that’s great.
This class is easily testable, doesn’t depend on products being saved to the database, and provides a nice clear mapping between the names and concepts of the two systems.
Finally, let’s look at the GoogleProductFeed class:
class GoogleProductFeed
  def initialize(google_products)
    @google_products = google_products
  end
  def to_xml
    # base_xml_params is defined elsewhere (not shown)
    Nokogiri::XML::Builder.new(encoding: "UTF-8") do |xml|
      xml.rss(base_xml_params) do
        @google_products.each do |product|
          xml.entry do
            xml["g"].id(product.id)
            xml["g"].title(product.title)
            xml["g"].description(product.description)
            xml["g"].price("%.2f %s" % [product.price.amount, product.price.currency])
            xml["g"].link(product.link)
            xml["g"].image_link(product.image_link)
            xml["g"].availability(if product.available then "In Stock" else "Out of Stock" end)
            xml["g"].ean_barcode(product.ean_barcode)
          end
        end
      end
    end.to_xml
  end
end
This class is relatively isolated from change. It might need to change if:
- the fields we want in the feed change
- we were wrong about the structure or formatting of the feed
- Nokogiri’s API changes
Beyond that, it’s well protected from change. It doesn’t know about the database or even if there is a database. It doesn’t know where the GoogleProduct instances come from. All it knows about is the structure and format of the feed, and the fields that GoogleProduct has.
Because it is isolated in this way, the tests for it will be extremely fast. They can construct some GoogleProduct instances, pass them in, and verify the structure and format of the XML. Yes, there’ll be some XML parsing in these tests, but this is the only class that knows anything about XML, so that’s expected.
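One optional extra step is our suggestion, not part of the talk. The two formatting expressions inside to_xml could move into tiny helpers. The formatting rules then get plain-Ruby tests, and the XML tests only have to check structure. GoogleFeedFormat is a hypothetical name. The price format mirrors the "%.2f %s" string the feed already uses.

```ruby
# Assumes Ruby >= 3.2 for Data.
Money = Data.define(:amount, :currency)

# Hypothetical helpers extracted from GoogleProductFeed#to_xml.
module GoogleFeedFormat
  module_function

  # Two decimal places followed by the currency code, e.g. "10.00 USD".
  def price(money)
    format("%.2f %s", money.amount, money.currency)
  end

  # Google's availability strings for our boolean flag.
  def availability(available)
    available ? "In Stock" : "Out of Stock"
  end
end

GoogleFeedFormat.price(Money.new(10, "USD")) # => "10.00 USD"
GoogleFeedFormat.availability(false)         # => "Out of Stock"
```

Inside to_xml, the calls would become xml["g"].price(GoogleFeedFormat.price(product.price)) and xml["g"].availability(GoogleFeedFormat.availability(product.available)).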
Drawing Boundaries
Let’s talk about the pattern we’ve implemented in this code. Our code is responsible for taking products from our database and creating a specially formatted XML feed containing information about those products.
We identified that there’s a domain boundary hidden inside this task. We’re bridging our application’s domain model and understanding of a product with Google’s domain model and understanding of a product. The two understandings of a product were incongruous. Some fields had different names. The term “available” meant something different on each side.
By refactoring the code, we separated the concerns. One object, GoogleProductFeed, knew only about generating XML and the XML being generated. Another object, GoogleProduct, mapped our understanding of a product to Google’s. The top-level code was left gluing these pieces together.
Effectively, this design draws a boundary between our domain and Google’s. GoogleProduct’s factory method is that line. Once a GoogleProduct has been initialized, it doesn’t know anything about our database models or the structure of our database. Objects, like GoogleProductFeed, that consume those objects aren’t dependent on where they came from.
Factory Method Tips
Combining factory methods with data and value objects is an effective way to draw boundaries in our system. That’s where this pattern is most useful. Look for places where you’re reconciling two different domain models (like in our example) or places where some kind of transformation is taking place.
For example, you might use this pattern when transforming an eCommerce order into an object that represents a shipping label for that order. It’s within one domain model, but there’s a conceptual transformation happening.
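Here is a sketch of that shipping-label case, assuming Ruby 3.2+. Order, LineItem, Address, and ShippingLabel are hypothetical stand-ins, not Solidus models. The factory method reads an order, its address, and its line items, and produces one value that knows nothing about orders.

```ruby
# Assumes Ruby >= 3.2. Hypothetical stand-ins for the order side of an app.
Address = Struct.new(:name, :street, :city, :postal_code, :country)
LineItem = Struct.new(:sku, :quantity, :weight_grams)
Order = Struct.new(:number, :ship_address, :line_items)

# The label carries only what a label needs; it holds no reference to the order.
ShippingLabel = Data.define(:order_number, :recipient, :address_lines, :total_weight_grams) do
  def self.from_order(order)
    address = order.ship_address
    new(
      order_number: order.number,
      recipient: address.name,
      address_lines: [
        address.street,
        "#{address.city} #{address.postal_code}",
        address.country
      ].freeze,
      total_weight_grams: order.line_items.sum { |item| item.quantity * item.weight_grams }
    )
  end
end

order = Order.new(
  "R123",
  Address.new("Ada Lovelace", "1 Main St", "Springfield", "12345", "US"),
  [LineItem.new("TSHIRT-M", 2, 200), LineItem.new("MUG", 1, 350)]
)

label = ShippingLabel.from_order(order)
label.total_weight_grams # => 750
```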
You can also use the pattern at module boundaries as a mechanism for information hiding. Rather than leaking your module’s internal objects, you can transform them into simpler data or value objects, keeping your modules decoupled.
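Here is one way that can look, as a hypothetical sketch. An Inventory module keeps its internal StockItem records private with private_constant and only hands out StockLevel data objects. The module, its records, and their fields are invented for illustration; in a real app StockItem would likely be a database model.

```ruby
# Assumes Ruby >= 3.2. Everything here is hypothetical.
module Inventory
  # Public, immutable result type: the only thing callers ever see.
  StockLevel = Data.define(:sku, :on_hand, :backorderable)

  # Internal representation, hidden from other modules.
  StockItem = Struct.new(:sku, :count_on_hand, :backorderable, :location_id)
  private_constant :StockItem

  STOCK = [
    StockItem.new("TSHIRT-M", 4, false, 1),
    StockItem.new("TSHIRT-M", 3, false, 2),
    StockItem.new("MUG", 0, true, 1)
  ].freeze
  private_constant :STOCK

  # The boundary: internal records in, one value out.
  def self.stock_level(sku)
    items = STOCK.select { |item| item.sku == sku }
    StockLevel.new(
      sku: sku,
      on_hand: items.sum(&:count_on_hand),
      backorderable: items.any?(&:backorderable)
    )
  end
end

Inventory.stock_level("TSHIRT-M").on_hand # => 7
```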
Multiple Factory Methods
Giving a single value or data object multiple factory methods lets multiple parts of your system converge on shared data or value objects. We use this pattern in Solidus for pricing options.
We need to know what parameters to use to select a price in different contexts. Sometimes pricing options are constructed from an HTTP request for determining what price to show in a view. Other times pricing options are constructed from an order, so that we know what price to apply to an item in that order.
This makes the parts of the system that operate on pricing options simpler and more testable, because they don’t need to know where the pricing options came from.
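Here is a simplified sketch of the idea, assuming Ruby 3.2+. This is not Solidus's actual pricing options API. PricingOptions, its two members, the request-params Hash, Order, and select_price are all hypothetical. Two factory methods feed the same value object, and the code that consumes it cannot tell which one was used.

```ruby
# Assumes Ruby >= 3.2. Hypothetical, simplified value object.
PricingOptions = Data.define(:currency, :country_iso) do
  # From request data; here a plain Hash stands in for params/session.
  def self.from_request(params, default_currency: "USD")
    new(currency: params.fetch(:currency, default_currency), country_iso: params[:country])
  end

  # From an order, so an item is priced the way the order will be charged.
  def self.from_order(order)
    new(currency: order.currency, country_iso: order.ship_country_iso)
  end
end

# Hypothetical order stand-in.
Order = Struct.new(:currency, :ship_country_iso)

# Hypothetical consumer: prefer a country-specific price, then a currency-wide one.
def select_price(prices, options)
  prices.find { |price| price[:currency] == options.currency && price[:country_iso] == options.country_iso } ||
    prices.find { |price| price[:currency] == options.currency && price[:country_iso].nil? }
end

prices = [
  { amount: 20, currency: "USD", country_iso: nil },
  { amount: 25, currency: "CAD", country_iso: "CA" }
]

select_price(prices, PricingOptions.from_request({ currency: "CAD", country: "CA" }))[:amount] # => 25
select_price(prices, PricingOptions.from_order(Order.new("USD", "US")))[:amount]               # => 20
```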
One-to-Many, Many-to-One, Many-to-Many
You can write factory methods that take one object and return many value/data objects. You can also create factory methods that take many objects and return only one. Both cases are extremely useful.
Consider our feed example. It’s possible that we misunderstood what Google’s products mapped to in our system. Rather than mapping to what we called “products”, they actually mapped to individual SKUs, which we call “variants”. If a t-shirt comes in six different sizes, we want a GoogleProduct for each size.
In this case, we could modify our factory method to return an array of objects, one for each size. The GoogleProductFeed object wouldn’t even need to change.
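Here is a sketch of that one-to-many version, assuming Ruby 3.2+. Product and Variant are simplified, hypothetical stand-ins, and this GoogleProduct has fewer fields than the real one. Because the factory now returns an array, the calling code would switch from map to flat_map. The feed still receives a flat list of GoogleProduct objects.

```ruby
# Assumes Ruby >= 3.2. Hypothetical stand-ins for products and their variants.
Variant = Struct.new(:sku, :size, :price_cents, :in_stock)
Product = Struct.new(:name, :variants)

# Trimmed-down GoogleProduct for illustration.
GoogleProduct = Data.define(:id, :title, :price_cents, :available) do
  # One product in, many GoogleProducts out: one per variant (SKU).
  def self.from_product(product)
    product.variants.map do |variant|
      new(
        id: variant.sku,
        title: "#{product.name} (#{variant.size})",
        price_cents: variant.price_cents,
        available: variant.in_stock
      )
    end
  end
end

shirt = Product.new("T-Shirt", [
  Variant.new("TS-S", "S", 2000, true),
  Variant.new("TS-M", "M", 2000, false)
])

# flat_map keeps the list flat, so the feed's input shape is unchanged.
google_products = [shirt].flat_map { |product| GoogleProduct.from_product(product) }
google_products.map(&:id) # => ["TS-S", "TS-M"]
```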
What matters is that there’s a transformation, not how many values are on each side of that transformation. You might even have transformations where you’re transforming multiple kinds of values into something else. This pattern works just as well in that situation.
Highlight What Matters
When you pass an ActiveRecord model around, consumers have full access to the entire ActiveRecord API, on top of all the methods you’ve added to that class. It’s not going to be obvious without reading through the code which parts of the API the consumer actually cares about.
Conversely, when you pass data and value objects around, their API is very limited. You make it very clear that there are only a few values the consumer could possibly care about.
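To see how small that surface is, here is a short sketch with a trimmed-down, hypothetical GoogleProduct, assuming Ruby 3.2+. Data lists its members and exposes no writers or persistence methods. It also supports pattern matching on exactly those members.

```ruby
# Assumes Ruby >= 3.2. Trimmed-down, hypothetical GoogleProduct.
GoogleProduct = Data.define(:id, :title, :price, :available)

product = GoogleProduct.new(id: 1, title: "T-Shirt", price: "20.00 USD", available: true)

# The entire data surface, listed explicitly.
GoogleProduct.members        # => [:id, :title, :price, :available]

# Nothing else to reach for: no writers, no persistence.
product.respond_to?(:title=) # => false
product.respond_to?(:save)   # => false

# Pattern matching works on the same members.
message =
  case product
  in { available: true, title: }
    "#{title} is in stock"
  in { available: false, title: }
    "#{title} is out of stock"
  end
message # => "T-Shirt is in stock"
```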
Facade, More Like Fa-bad
I’m writing this more than a year after giving the talk, disappointed that it took me this long to come up with that (terrible) pun. I’m especially disappointed because I’ve been railing against the facade and decorator patterns for so long.
These patterns pop up again and again in different contexts, but let’s use view objects as our example. These are objects that typically wrap database models and provide extra functionality that is only necessary in views. Here’s an article on them that happens to be at the top of my search results for “view object rails” right now.
The issue with these objects is that they suffer from the exact issue we’ve been trying to avoid in this talk. To instantiate them, you still need to pass in an instance of the underlying model. You need it for tests and you need it in production code. I don’t think you should be transforming all your database models into data objects just for your views, but that’s because there’s no domain boundary at play.
You should be using facades for creating wrappers around third-party APIs and complex interfaces. They’re just a bad fit when you’re transforming data at a boundary.
In Summary
Whether using value objects to represent something in your domain, or data objects to bundle related data together, combining them with factory methods will help you draw nice, clean boundaries in your systems. Because these kinds of objects can be initialized independently from the inputs to their factory methods, you can avoid coupling in your system and your tests.