Structure and Purpose
There are infinitely many ways to evaluate the design of software. We have code smells like feature envy and primitive obsession. TDD people focus on coupling and cohesion. Automated tools can measure cyclomatic complexity. You can get into style and aesthetics.
This article explores how the structure of your code can communicate its purpose. It covers tips and techniques for making your code look like it does what you intend it to. If you want to write code that other developers will intuitively understand, then read on.
This article uses Ruby in its examples and some conventions are Ruby-specific, but the concepts apply to most languages.
The Worst Fizz Buzz Solution
As a counterexample, let’s consider a piece of code where the purpose of the code (and the programmer’s intent for it) has been intentionally obfuscated. This is a solution to a programming puzzle that Shopify presented at RubyConf 2022.
def fizzbuzz(upper_limit)
  Range.new(0, upper_limit).each do |n|
    if n % 15 == 0
      puts "FizzBuzz"
    elsif n % 3 == 0
      puts "Fizz"
    elsif n % 5 == 0
      puts "Buzz"
    else
      puts n
    end
  end
end

def Range.new(_, upper_limit)
  CSV.parse(upper_limit, headers: true).tap do |csv|
    def csv.each(&block)
      @h = Hash.new { |h, k| h[k] = [] }
      super do |row|
        def row.%(other)= 1
        instance_exec(row, &block)
      end
      @h
    end

    def csv.puts(row)
      a = @h[row["url"]]
      s = [row["filename"], row["size in kilobytes"].to_i]
      if a.empty? || a.last.sum(&:last) + s.last > 120000
        a << [s]
      else
        a.last << s
      end
    end
  end
end

puts fizzbuzz(URI("https://gist.githubusercontent.com/Schwad/9938bc64a88727a3ab2eaaf9ee63c99a/raw/6519e28c85820caac2684cfbec5fe4af33f179a4/rows_of_sweet_jams.csv").read)
The first method should give you the impression that you are looking at a solution to Fizz Buzz. In reality, this code downloads a Gist, interprets it as a CSV of song data, and groups the songs by website, dividing them to fit on CDs.
Even if you read the second method (Range.new), the intent is extraordinarily unclear. The variables mostly have single-letter names, there are arbitrary constants (e.g. 120000 and 1), and a number of methods get redefined. The asshole who wrote this code (me) left very few clues as to what this code does and left many clues that suggest it does something other than what it does.
Hopefully, we can all agree that this code doesn’t express its intent clearly. If you found code like this in a production system, you’d be justified in reaching for git blame to determine who needs to be held responsible (or at least who can explain what it does).
It’s easier to understand code that does what it looks like it does. Some people will say that it requires less “cognitive load” to work with. Let’s explore some of the aspects of our code that contribute to this effect.
Naming
Naming is half of the two hard problems in computer science. As such, I cannot provide a comprehensive guide to the expansive subject, but most developers recognize its role in building maintainable software. Names are your primary tool for communicating the purpose and intent of the code you write, but choosing a good one takes some thought.
Our counterexample named a method that processed playlist data “fizzbuzz”. If you’re familiar with Fizz Buzz, you would assume that the method is a solution to it. Reading the method’s body would (erroneously) confirm this. This is an absurdly bad name, but what makes a good name?
What makes a good name varies based on context; many different aspects might be more or less important on your project. Generally, concise names are preferred, but not at the expense of specificity. Names should be specific enough to avoid ambiguity and avoid conflation with similar, related concepts.
Names can communicate purpose. In a section of code that updates the title of a blog post, you could store the new title of the blog post in a variable called title, or post_title, or new_post_title, or new_title. Except for the first, they all communicate (and highlight) different aspects of the value which might be relevant to the programmer.
There’s no right answer. new_title might make more sense in a module that only deals with blog posts or if the old title is also referenced. new_post_title might make sense in a module that isn’t focused on blog posts, where the title change is part of some larger operation.
Like I said, naming is too broad a topic to be covered fully here. The important takeaway is that you can express intent with names. In this example, the new_ prefix communicates that the value will be used to replace another title, even before you read the code that performs that operation.
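As a small illustration of the new_ prefix, here is a minimal sketch. The Post class and its rename method are hypothetical, invented for this example rather than taken from any real codebase.

```ruby
# Hypothetical blog post model, used only to illustrate naming.
class Post
  attr_reader :title, :previous_titles

  def initialize(title)
    @title = title
    @previous_titles = []
  end

  # `new_title` signals that the argument replaces the current title
  # before the reader reaches the assignment below.
  def rename(new_title)
    @previous_titles << @title
    @title = new_title
  end
end

post = Post.new("Structure and Purpose")
post.rename("Code That Looks Like What It Does")
```

Had the parameter been called title, the reader would have to distinguish it from the title attribute by reading the method body.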
Brevity
While extreme brevity tends to obfuscate code, appropriately concise code helps spotlight important details. The previous section dealt with how you can communicate purpose by including significant details when naming constants, variables, methods, functions, and classes. Excluding extraneous details helps highlight the significant details that you do choose to include.
The Fizz Buzz counterexample is unnecessarily verbose and contains an excessive amount of incidental complexity. Despite the code’s relatively simple function, you have to sift through conditional branches that are never run and indirection that serves no purpose. Determining which parts of the code are meaningful is unnecessarily hard.
Most code won’t be quite so obnoxious. In my talk TDD on the Shoulders of Giants I used an example where an argument was named email_notifier when the email part was actually an extraneous detail; the notifier object was really a shipment_notifier. In retrospect, that name wasn’t great either, but at least it didn’t encode the irrelevant (to the consuming code) information that the notification would be sent by email.
When naming something, consider what information is relevant where the name will be referenced. Avoid long names that contain details that are either liable to change or simply not important.
Abstraction
Concise names are great, but concise implementations are even more important. Besides naming, abstraction is perhaps the most powerful tool at your disposal for making code look like it does what it does.
There are many philosophies of how to approach breaking your code down into useful abstractions, Domain-Driven Design being the most well-known. The topic is too broad to cover in detail here, but it’s still important to discuss.
The Fizz Buzz example uses abstractions (“Fizz Buzz”, ranges, modulo division, puts) that have absolutely nothing to do with parsing song data and splitting it up onto CDs. A better implementation might have employed abstractions like songs, playlists, and CDs. These objects could have been given methods that map to coherent operations, like adding a song to a CD.
More apt abstractions not only communicate the purpose of the code more clearly, but provide an opportunity to use abstraction to hide the details of the implementation, leaving concise top-level code that makes the broad intent of the code clear. Consider this alternate solution. (I’ve omitted the class implementations for clarity.)
songs = Song.from_csv("https://gist.githubusercontent.com/Schwad/9938bc64a88727a3ab2eaaf9ee63c99a/raw/6519e28c85820caac2684cfbec5fe4af33f179a4/rows_of_sweet_jams.csv")
cd_collection = CDCollection.new
songs.each_song do |song|
  cd_collection.add_song(song)
end
puts cd_collection.to_h
I don’t claim this is the ideal solution, but it makes it clear that the code loads songs from a CSV, loops over them, and adds them to a collection. You might feel that this is enough detail for the top level, but you might also want to make it clear that the songs are being grouped by blog. It would be relatively easy to tweak the structure of this code to surface that detail.
The abstractions you use not only serve to explain the domain of your application to other programmers, but also offer the opportunity to highlight what operations are significant and which are mere details. Place important details at the top and push implementation details down the object tree and into private methods.
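The alternate solution above leaves out its class implementations. For readers who want to see how the details might be pushed down, here is one possible sketch. Everything in it is an assumption: the Playlist and CD classes, the field names, and the CAPACITY_KB constant (borrowed from the 120000 in the counterexample) are hypothetical, and from_csv takes CSV text rather than a URL so the sketch runs offline.

```ruby
require "csv"

# Hypothetical sketch: the article does not show these classes, so the
# names, fields, and capacity constant below are all assumptions.
class Song
  attr_reader :blog_url, :filename, :size_kb

  def initialize(blog_url:, filename:, size_kb:)
    @blog_url = blog_url
    @filename = filename
    @size_kb = size_kb
  end

  # Takes CSV text (not a URL) and returns a Playlist of songs.
  def self.from_csv(csv_text)
    songs = CSV.parse(csv_text, headers: true).map do |row|
      new(blog_url: row["url"], filename: row["filename"],
          size_kb: row["size in kilobytes"].to_i)
    end
    Playlist.new(songs)
  end
end

class Playlist
  def initialize(songs)
    @songs = songs
  end

  def each_song(&block)
    @songs.each(&block)
  end
end

class CD
  CAPACITY_KB = 120_000 # the limit from the counterexample, read as kilobytes

  def initialize
    @songs = []
  end

  def fits?(song)
    used_kb + song.size_kb <= CAPACITY_KB
  end

  def add(song)
    @songs << song
  end

  def to_a
    @songs.map { |song| [song.filename, song.size_kb] }
  end

  private

  def used_kb
    @songs.sum(&:size_kb)
  end
end

class CDCollection
  def initialize
    @cds_by_blog = Hash.new { |hash, blog_url| hash[blog_url] = [] }
  end

  # Adds the song to the blog's latest CD, starting a new CD when it is full.
  def add_song(song)
    cds = @cds_by_blog[song.blog_url]
    cds << CD.new if cds.empty? || !cds.last.fits?(song)
    cds.last.add(song)
  end

  def to_h
    @cds_by_blog.transform_values { |cds| cds.map(&:to_a) }
  end
end

csv_text = <<~ROWS
  url,filename,size in kilobytes
  https://blog.example/a,one.mp3,70000
  https://blog.example/a,two.mp3,60000
  https://blog.example/b,three.mp3,5000
ROWS

cd_collection = CDCollection.new
Song.from_csv(csv_text).each_song { |song| cd_collection.add_song(song) }
result = cd_collection.to_h
```

The capacity check and the grouping by blog now live in CD and CDCollection, so the top-level code never has to mention them.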
Colocation and Ordering
Putting things together implies that on some level they are related. For example, you can arrange the assignment of dependent variables so that they are consecutive. When the developer reads the first and sees that the second depends on the first, they can infer that subsequent consecutive statements likely also depend on the computations before them. This takes advantage of both colocation and ordering.
The idea applies not just to variables and computations, but also to things like methods. If a class has many methods, you can colocate methods that are meant to be used together and even place them in the order that they are meant to be used.
Colocation and ordering only really hint at the programmer’s intent, but they are a nice way to nudge readers towards the information that they’re looking for in the code. Being consistent with the technique can result in codebases where things seem to be where you expect them, without necessarily realizing why.
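A minimal sketch of consecutive, dependent assignments follows. The order-total calculation and every name in it are hypothetical, chosen only to show each line building on the one before it.

```ruby
# Hypothetical order data; amounts are in cents to keep the math exact.
line_items = [
  { price_cents: 1_000, quantity: 2 },
  { price_cents: 500, quantity: 1 }
]
discount_percent = 10
tax_percent = 8

# Each assignment depends on the one directly above it, so the order
# of the lines mirrors the order of the calculation.
subtotal_cents = line_items.sum { |item| item[:price_cents] * item[:quantity] }
discount_cents = subtotal_cents * discount_percent / 100
taxable_cents = subtotal_cents - discount_cents
tax_cents = taxable_cents * tax_percent / 100
total_cents = taxable_cents + tax_cents
```

Interleaving unrelated statements between these lines would still work, but the reader would lose the hint that the values form a chain.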
Primitives and Syntax
Syntactic constructs have intended purposes. Loops are for iterating over things. if statements are for conditional logic. Modules are for organizing related functionality. Pattern matching is for matching patterns. Especially in Ruby, you can use these constructs for other things, but you’re violating the principle of least astonishment and obfuscating the purpose of your code.
Loops are a general-purpose primitive that gets used to perform specific actions which can be expressed more directly. In most languages, you can procedurally build a new array using a loop, but you could clarify your intent by instead using map and filter. You can loop over an array and break out once you find the value you’re looking for, but you could simply call find instead. In both cases the latter better expresses the purpose of the code.
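The comparison can be sketched side by side. The song data here is made up for illustration; select is used because it is the more common Ruby spelling of filter (the two are aliases).

```ruby
# Made-up song data for illustration.
songs = [
  { title: "Intro", size_kb: 900 },
  { title: "Anthem", size_kb: 7_200 },
  { title: "Ballad", size_kb: 5_400 }
]

# General-purpose loop: the reader has to work out that this builds a
# filtered, transformed array.
large_titles_loop = []
songs.each do |song|
  if song[:size_kb] > 5_000
    large_titles_loop << song[:title]
  end
end

# The same result, with the intent named by the methods themselves.
large_titles = songs.select { |song| song[:size_kb] > 5_000 }
                    .map { |song| song[:title] }

# Loop with an early exit...
first_large_loop = nil
songs.each do |song|
  if song[:size_kb] > 5_000
    first_large_loop = song
    break
  end
end

# ...versus asking for what you want directly.
first_large = songs.find { |song| song[:size_kb] > 5_000 }
```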
The same idea goes for values. Primitive values all have their own semantics and interfaces. In Ruby, you could represent the genre of songs in a playlist with numbers, but symbols, strings, or your own value object will probably better reflect the operations that make sense for those values.
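To make the point about values concrete, here is a rough sketch of a genre value object. The Genre class, its list of known names, and the numeric genre_code are all hypothetical.

```ruby
# A number permits operations that mean nothing for a genre.
genre_code = 2
meaningless = genre_code + 1 # valid Ruby, but "jazz plus one" means nothing

# Hypothetical value object: it only offers operations that make sense
# for a genre, and rejects values that are not genres at all.
class Genre
  KNOWN_NAMES = %i[rock jazz folk].freeze

  attr_reader :name

  def initialize(name)
    raise ArgumentError, "unknown genre: #{name}" unless KNOWN_NAMES.include?(name)
    @name = name
  end

  def ==(other)
    other.is_a?(Genre) && name == other.name
  end

  def to_s
    name.to_s.capitalize
  end
end

genre = Genre.new(:jazz)
```

A plain symbol such as :jazz would often be enough; the value object earns its keep only when you need validation or genre-specific behavior.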
Using (or creating) constructs that reflect your domain helps clarify the purpose of your code. Using syntax, primitives, and types that are unnecessarily general makes your intent less clear. This leads us to the broader topic of specificity.
Specificity
Rails adds the concept of presence/blankness to Ruby. You can ask whether any value is present or blank, which are opposites. Empty or whitespace-only strings, empty arrays, empty hashes, false, and nil are blank. All other values are not blank; they are present. The purpose of this feature is primarily to handle uncertain and heterogeneous user inputs.
It’s very useful, but it’s also overused. It’s common to see a conditional like this in a Rails codebase.
if fido.present?
  # Do something with fido
end
There’s nothing intrinsically wrong with this code. The code tells us that fido might be some kind of “blank” value and that we only want to execute the logic in the conditional if it’s not.
This is totally reasonable, but what if the developer who wrote this code knew what kinds of values fido might hold, and that it was always either an instance of Dog or nil? The developer could structure the code to reflect that.
There are a few options. You could simply change the first line to if fido. Most Rubyists would recognize that for what it is. If you wanted to be really explicit, you could write unless fido.nil? instead. Both give the reader a much better idea of the possible values of fido.
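Both alternatives can be sketched in plain Ruby, without Rails. The Dog class and the find_dog helper are hypothetical stand-ins for any lookup that returns either a Dog or nil.

```ruby
# Hypothetical Dog class and lookup helper, for illustration only.
class Dog
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Returns the matching Dog, or nil when there is none.
def find_dog(dogs, name)
  dogs.find { |dog| dog.name == name }
end

dogs = [Dog.new("Fido")]
fido = find_dog(dogs, "Fido")
rex = find_dog(dogs, "Rex")

greetings = []

# Truthiness check: tells the reader fido is either an object or nil.
if fido
  greetings << "Hello, #{fido.name}!"
end

# Explicit nil check: tells the reader nil is the only "missing" value.
unless rex.nil?
  greetings << "Hello, #{rex.name}!"
end
```

Neither check suggests that the value might be an empty string or an empty array, which is exactly the extra information the reader gains.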
Using unnecessarily general functionality has benefits in terms of flexibility, but downsides when it comes to communicating intent. Excessively defensive code gives the impression that it needs to be defensive for some reason.
Conventions
Another way to communicate your intent is using conventions. Conventions can come from your language, framework, community, or project. Good conventions used well can help to compress a large amount of information about the author’s intent into a small token.
For example, projects that use Ruby’s RSpec testing framework often format the names of test contexts to indicate whether the method under test is an instance method or a class method using a single-character prefix: #foo is an instance method and .bar is a class method. Just by reading the first character, you’ve already gained context on what’s under test.
Not following conventions has the opposite effect. In Rails, the public methods on controllers are normally “actions”, methods meant to service HTTP requests, but nothing stops you from adding public methods that aren’t used as actions.
By violating conventions, you’re liable to confuse someone reading your code later. The structure of the code misleads the reader. Someone will eventually look at the public method and try to understand it as an action. At least briefly, you’ll have confused them.
Many frameworks and languages come with their own conventions, but you can create your own too. Teams and projects can agree on and document their own conventions to more richly communicate domain-specific information that would otherwise be cumbersome to communicate. Be careful here, though; there’s an onboarding and maintenance cost associated with this practice.
Whatever the conventions at play, follow them wherever you can. Break conventions only when the break is meaningful, and document breaches appropriately.
Cleverness Considered Harmful
“Clever” code is code that performs some action in a unique or unexpected way. It is the opposite of what we’ve discussed here. Clever is always bad unless the goal is to show off how clever you are.
The XOR swap algorithm is an example of this. The algorithm allows you to swap the values of two variables without an intermediate variable by repeatedly applying the “exclusive or” operation on them.
To someone unfamiliar with the technique, code that uses it will appear to be some kind of bitwise computation. The alternative, using an intermediate variable, not only looks like a simple reassignment, but even offers you the opportunity to name that intermediate variable, making your intent clearer.
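Here is a short sketch of the two approaches side by side. The variable names are made up for the example, and the last two lines show Ruby’s parallel assignment, which is not discussed above but is the idiomatic way to swap in Ruby.

```ruby
a = 6
b = 9

# XOR swap: correct for integers, but it reads like bitwise arithmetic.
a ^= b
b ^= a
a ^= b

# Intermediate variable: a plain reassignment, and the variable's name
# explains why the value is being held on to.
first_place = "Ana"
second_place = "Ben"
previous_first_place = first_place
first_place = second_place
second_place = previous_first_place

# Ruby's parallel assignment states the swap directly.
x, y = 1, 2
x, y = y, x
```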
For the sake of future readers of your code, avoid unnecessary cleverness. Being clever obscures both what your code does and what it’s trying to do from other developers.
Josh Comeau has an excellent article titled Clever Code Considered Harmful that is well worth a read if you want to explore this idea further.
Code Is a Conversation
Some of the aspects listed here have only minor impacts on how easy your code is for other developers to understand, but others are more significant. Together they add up. It can be hard to put your finger on why, but code that expresses its intent and purpose well feels different. You can understand it intuitively, independent of its complexity.
Conversely, code that fails at communicating its purpose is hard to follow and work with. It’s frustrating, and the fact that it’s difficult to tell why compounds the frustration.
How well your code expresses its purpose and your intent is only a single lens through which you can evaluate a design. It’s an aspect worth considering, but no more so than performance, or domain-modelling, or SOLID, or coupling and cohesion. Different aspects of design matter more or less depending on the context.
If you’re looking to write code that’s easier to understand for the next person, consider what your code looks like it does, which information it highlights rather than hides, and how specifically it expresses your intent. Code isn’t just commands for a computer; it’s also a conversation with other developers. Make it a clear conversation.