Modeling - data first

05 Feb, 2025

Many programmers create unnecessary complexity by encoding the concepts they want to model in programming language constructs instead of plain data. I have found that unlearning that instinct, and trying to model the concept as simple data first, leads to more maintainable programs. Here are some examples:

Sentinels (special cases as data)

Other names: null objects, stubs, special objects

Instead of modeling exceptional cases in code and littering the codebase with nil checks or exception juggling, just model special cases as data. Instead of returning nulls, return an empty struct or object so that the consumer does not have to check for nulls. Here are different forms of this idea:

Data structures and invariants
Guarantee valid reads - Ryan Fleury
Special Case - Martin Fowler
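
As a rough illustration of the sentinel idea, here is a minimal Go sketch. The `User` type, the `users` map, the `nilUser` value and `FindUser` are all hypothetical names made up for this example. The point is only that a lookup miss returns a valid, empty value, so the caller needs no nil check.

```go
package main

import "fmt"

// User is plain data.
type User struct {
	Name   string
	Orders []string
}

// nilUser is a hypothetical sentinel: a valid, empty User returned on a miss.
var nilUser = User{Name: "(unknown)"}

// users stands in for whatever storage the program really has.
var users = map[int]User{
	1: {Name: "Ada", Orders: []string{"book", "pen"}},
}

// FindUser never returns nil or an error: a missing id yields the sentinel.
func FindUser(id int) User {
	if u, ok := users[id]; ok {
		return u
	}
	return nilUser
}

func main() {
	for _, id := range []int{1, 42} {
		u := FindUser(id)
		// No nil check needed: the sentinel has an empty Orders slice.
		fmt.Printf("%s has %d orders\n", u.Name, len(u.Orders))
	}
}
```

The consumer code is the same for the normal case and the special case, because the special case is just another value of the same data type.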

Instructions (verbs as data)

The most common form of modeling a concept in code instead of data is modeling a verb as a function, or as a method in OOP. Here is an example of this happening: a conversation between Casey Muratori and Robert C. Martin about clean code.

cleancodequa.md
cleancodeqa-2.md

It is pretty long. They talk about "cpu cycles vs developer cycles": Robert argues that it is worth sacrificing cpu cycles for developer cycles because computers are cheaper than developers. Towards the end of part 2 we get some actual code. Robert writes a bad version with a switch case and a good version with dynamic dispatch, and explains that the dynamic dispatch version is better because it sacrifices performance for maintainability. Then Casey says hold your horses: since he is the switch case advocate, he should write the switch case code. The important thing is that Casey breaks the dichotomy of performance vs maintainability, because his version is both more performant and more maintainable than both the switch case and the dynamic dispatch versions of Robert's code.

The real, impactful difference, which is not highlighted in the conversation, is that Robert, like many programmers, has the inherent assumption that verbs have to be modeled as functions, or as object methods in the dynamic dispatch case. This same false assumption leads to what is known as The Expression Problem, an artificial issue caused by modeling verbs as functions or methods.

### Branchless programming (control flow as data)

for (int i = 0; i < N; i++)
    if (a[i] < 50)
        s += a[i];

for (int i = 0; i < N; i++)
    s += (a[i] < 50) * a[i];

Here, instead of branching based on the predicate, the boolean information is transformed into a number, 1 or 0, to do arithmetic with.

### Ontology as data

Here is another example from Casey's blog that illustrates using an enum (modeling with data) vs class hierarchies (modeling with programming language constructs).

### Abstraction through metadata

Here are some great blog posts from David B. Black on using metadata for abstraction.

The Three Dimensions of Software Architecture Goodness
The Progression of Abstraction in Software Applications
How to Improve Software Productivity and Quality: Code and Metadata
How to Pay Down Technical Debt

Instead of building abstractions from programming language constructs, we should architect our systems in such a way that it is easy to pull abstractions out of them. Here is an example of that:

### Redesign of the routing api in web frameworks

First, let's see the way most, if not all, web frameworks model the concept of routes.

Express

app.get('/user', (req, res) => {
  res.send('Got a GET request at /user')
})
app.post('/user', (req, res) => {
  res.send('Got a POST request at /user')
})
app.put('/user', (req, res) => {
  res.send('Got a PUT request at /user')
})

Go chi

r.Get("/", home)
r.Get("/contact", contact)
r.Get("/api/widgets", apiGetWidgets)
r.Post("/api/widgets", apiCreateWidget)
r.Post("/api/widgets/{slug}", apiUpdateWidget)
r.Post("/api/widgets/{slug}/parts", apiCreateWidgetPart)
r.Post("/api/widgets/{slug}/parts/{id:[0-9]+}/update", apiUpdateWidgetPart)
r.Post("/api/widgets/{slug}/parts/{id:[0-9]+}/delete", apiDeleteWidgetPart)
r.Get("/{slug}", widgetGet)
r.Get("/{slug}/admin", widgetAdmin)
r.Post("/{slug}/image", widgetImage)

Java Spring

@SpringBootApplication
@RestController
public class DemoApplication {
	@GetMapping("/helloworld")
	public String hello() {
		return "Hello World!";
	}
}

Why do I think that these are bad abstractions for the concept of routes? Let's look at them through the properties I mentioned in my previous blog post, What is abstraction in programming:

they take away control from you: The framework is in control of the abstraction. I mean this both in terms of control flow (the interface is callback based; the Java annotation based interface is also a form of callback, since the framework calls the hello method when it wants to) and in terms of project control (the framework team can decide to change the way routes are specified, and then the dependent team gets more work to do).
they are encapsulating: This overlaps with taking away control. You don't have access to the data, only to the apis that are provided to you. The abstraction encapsulates the context that the framework API developers were thinking about.
they decrease optionality: This also overlaps with the taking away control property. You basically sell an options contract and bet against volatility. If new requirements appear that were unforeseen by the developers of the framework API, it is very hard to use this sort of abstraction for them. The abstraction is overfitted to the context that the framework developers had in mind during development.

How can we design a better API for routing if we are in charge of developing a framework, or better yet a library? How can we reverse those properties?

it should keep the control in the hands of the user
it should not encapsulate
it should increase optionality

We have to encode the route concept as data:

// Created by the app developer
type Route struct {
	Path string
	// etc.
}

type RouteId int
const (
	RouteId_None RouteId = iota
	RouteId_Home
	RouteId_Settings
	// etc.
	RouteId_COUNT
)

// this is one way to declare the routes
// but you can build it up any other way
// create an addRoute function that makes it more ergonomic
// the important part is that the data structure of the route concept
// is in your control
var routes = [RouteId_COUNT]Route{
	RouteId_None: {},
	RouteId_Home: {
		Path: "/",
		// etc.
	},
	RouteId_Settings: {
		Path: "/settings",
		// etc.
	},
	// etc.
}

// this is given by the library
type RoutingState struct {
	// defined by library
}

// Given by the framework
func SetPaths(state *RoutingState, paths []string) {
	// this is called at startup time
	// the framework can create any kind of accelerating index to find the route fast
}

func MatchRoute(state *RoutingState, r *http.Request) (idx int, pathParams map[string]string) {
	// parse the request path, take out params and match the route
	return idx, pathParams
}

// usage
{
	routingState := RoutingState{}
	paths := getPaths(routes)
	SetPaths(&routingState, paths)

	// in a global handler
	{
		idx, pathParams := MatchRoute(&routingState, r)
		routeId := RouteId(idx)
		if routeId == RouteId_None {
			// return 404
		}
		// etc.
	}
}
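
To make the design above concrete, here is one possible runnable version. It is a sketch under several assumptions: the index inside `RoutingState` is a plain map, matching is exact (no path parameters), and `matchPath` and `getPaths` are helper names invented for this example. A real library could use any faster index behind the same data-in, data-out interface.

```go
package main

import (
	"fmt"
	"net/http"
)

// Owned by the app developer: routes are plain data.
type Route struct {
	Path string
}

type RouteId int

const (
	RouteId_None RouteId = iota
	RouteId_Home
	RouteId_Settings
	RouteId_COUNT
)

var routes = [RouteId_COUNT]Route{
	RouteId_None:     {},
	RouteId_Home:     {Path: "/"},
	RouteId_Settings: {Path: "/settings"},
}

// RoutingState is library-owned. A map is assumed here for simplicity.
type RoutingState struct {
	byPath map[string]int
}

// SetPaths builds the index once at startup. Empty paths are skipped,
// so index 0 (RouteId_None) can never be matched.
func SetPaths(state *RoutingState, paths []string) {
	state.byPath = make(map[string]int, len(paths))
	for i, p := range paths {
		if p != "" {
			state.byPath[p] = i
		}
	}
}

// matchPath returns the index of the route with exactly this path, or 0.
func matchPath(state *RoutingState, path string) int {
	return state.byPath[path]
}

// MatchRoute is the request-facing wrapper; path params are left out.
func MatchRoute(state *RoutingState, r *http.Request) int {
	return matchPath(state, r.URL.Path)
}

// getPaths extracts the paths in RouteId order.
func getPaths(rs []Route) []string {
	paths := make([]string, len(rs))
	for i, r := range rs {
		paths[i] = r.Path
	}
	return paths
}

func main() {
	state := RoutingState{}
	SetPaths(&state, getPaths(routes[:]))
	for _, p := range []string{"/", "/settings", "/missing"} {
		id := RouteId(matchPath(&state, p))
		fmt.Printf("%-10s -> route %d (404: %v)\n", p, id, id == RouteId_None)
	}
}
```

The library only ever sees a slice of strings and hands back an index. Everything else about a route stays in the app developer's own `Route` struct.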

If the route matching functionality is designed in such a way, it does not take control away from the app developer. If the concept of a route is encoded in data, it increases optionality. How? Let's say that you are maintaining a web store, and you run into this problem that loses a lot of conversions. And your boss says that this should never ever happen. How can this design help you with that? Well, you already have routes as enums that you can refer to. You need to make sure that you create links with functions and enums.

func GoTo(from RouteId, to RouteId, queryParams string) string {
	return "href = \"...\" data-link-id=\"...\""
}

// in html
<a GoTo(RouteId_Product, RouteId_Checkout, ...) >
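
One way the `GoTo` helper could be filled in is sketched below. The `paths` table and the `"<from>-<to>"` format of `data-link-id` are assumptions made for this example. The format is a simplification: a page with several links to the same target would need a richer id.

```go
package main

import (
	"fmt"
	"html"
)

type RouteId int

const (
	RouteId_None RouteId = iota
	RouteId_Home
	RouteId_Product
	RouteId_Checkout
	RouteId_COUNT
)

// paths is a hypothetical stand-in for the app's route table.
var paths = [RouteId_COUNT]string{
	RouteId_Home:     "/",
	RouteId_Product:  "/product",
	RouteId_Checkout: "/checkout",
}

// GoTo returns the attributes for an anchor tag. Because every link is
// created through this function, each one carries its source and target
// route as data. The "<from>-<to>" id format is assumed for this sketch.
func GoTo(from RouteId, to RouteId, queryParams string) string {
	href := paths[to]
	if queryParams != "" {
		href += "?" + queryParams
	}
	return fmt.Sprintf(`href="%s" data-link-id="%d-%d"`,
		html.EscapeString(href), from, to)
}

func main() {
	fmt.Printf("<a %s>Checkout</a>\n",
		GoTo(RouteId_Product, RouteId_Checkout, "cart=42"))
}
```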

What you want is to create a data abstraction: a graph of possible user journeys, where the nodes are routes and the edges are links or buttons. There are different ways to achieve this, for example parsing the source code files, recording the places where the GoTo function is called, and building the graph from those. Let's call that the reachability_graph. It tells you where you can go from which route by clicking links.

What you can do is query the graph for every way you can go from the Home page to the Checkout page. You get a bunch of paths:

route1 --linkId1--> route2 --linkId2--> route3

From those paths you can generate an automated browser test suite. To click, you find the anchor or button tag by its data-link-id, get the bounding box of its rectangle, generate an x and a y coordinate inside that rectangle, and issue a click at those coordinates. Then you test whether you reached the next node. If something is overlapping the button or link, the overlapping element gets clicked instead, and you end up on the wrong page.

You can generate this test suite for your essential user journeys. It will find bugs like the one mentioned in the video linked above, and also other weird css bugs where something overlaps anchor tags or buttons. You can easily add new routes without changing the test generating code. The more the project evolves, the more useful this becomes. This is made possible because:

the abstraction of route is encoded as data
the abstraction is globally invariant
- the graph of routes and links is a global invariant in this project: the graph itself changes as the project evolves, but the fact that it is a graph structure does not, so you can safely build on it

It increased optionality, because it helped you create a powerful solution to an unforeseen problem. Doing this with any existing framework would be much harder: at the very least you would have to include the framework in the test suite generating code, or the framework developers would have to have thought about this problem and developed a solution for you.

This reachability_graph can be used for many other useful features. For example, it can help you put numbers on User eXperience. See this book for more on how to use CS data structures to measure UX: Press On – Principles of Interaction Programming. The important part is that having concepts encoded in open data opens up ways to solve problems more effectively.
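
The query on the reachability_graph described earlier, every way to get from Home to Checkout, could look roughly like this. It is a sketch with a hand-written graph. The `Link` type, the link ids and the `AllJourneys` and `describe` names are assumptions for this example, and only simple paths (no route visited twice) are enumerated so the search terminates on cyclic graphs.

```go
package main

import (
	"fmt"
	"strings"
)

type RouteId int

const (
	RouteId_None RouteId = iota
	RouteId_Home
	RouteId_Product
	RouteId_Cart
	RouteId_Checkout
)

var routeNames = [...]string{"None", "Home", "Product", "Cart", "Checkout"}

// Link is one edge: a link or button identified by its data-link-id.
type Link struct {
	From, To RouteId
	LinkId   string
}

// Hand-written here; it could instead be derived from GoTo call sites.
var reachabilityGraph = []Link{
	{RouteId_Home, RouteId_Product, "home-product"},
	{RouteId_Product, RouteId_Cart, "product-cart"},
	{RouteId_Product, RouteId_Checkout, "product-buy-now"},
	{RouteId_Cart, RouteId_Checkout, "cart-checkout"},
	{RouteId_Cart, RouteId_Home, "cart-home"},
}

// AllJourneys returns every simple path from start to goal,
// each as the ordered list of links to click.
func AllJourneys(graph []Link, start, goal RouteId) [][]Link {
	var result [][]Link
	visited := map[RouteId]bool{start: true}
	var walk func(at RouteId, path []Link)
	walk = func(at RouteId, path []Link) {
		if at == goal {
			// Copy the path, since the backing array is reused.
			result = append(result, append([]Link(nil), path...))
			return
		}
		for _, l := range graph {
			if l.From != at || visited[l.To] {
				continue
			}
			visited[l.To] = true
			walk(l.To, append(path, l))
			visited[l.To] = false
		}
	}
	walk(start, nil)
	return result
}

// describe renders a journey as route --linkId--> route ...
func describe(start RouteId, journey []Link) string {
	var b strings.Builder
	b.WriteString(routeNames[start])
	for _, l := range journey {
		fmt.Fprintf(&b, " --%s--> %s", l.LinkId, routeNames[l.To])
	}
	return b.String()
}

func main() {
	for _, j := range AllJourneys(reachabilityGraph, RouteId_Home, RouteId_Checkout) {
		fmt.Println(describe(RouteId_Home, j))
	}
}
```

Each returned journey is itself data, so a test generator can walk it link by link and look up every element by its data-link-id.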

Here Jonathan Blow and Casey Muratori talk about this concept. They discuss how, in the Crafting Interpreters book, data was moved into code, and why putting concepts in code instead of data is a bad idea. Jon first talks about the performance reasons, then about the maintainability reasons.

Conclusion

Many programmers focus too much on building encapsulating abstractions. What we should focus on instead is writing software in such a way that we can easily pull abstractions out of it. We can do that by moving concepts into data and meta-data.