Abstraction in programming
There is a concept in programming called "premature abstraction". A closely related term is "over-engineering", which even has its own slogan: YAGNI ("You Aren't Gonna Need It"). But there are also things that you are going to need that are a lot harder to include later than now. So how do you decide whether you are over-engineering or not? Most advice on how not to over-engineer boils down to this:
- don't solve a problem you don't have now
- don't look too far into the future
I find that this advice is not actionable or useful and misses the point entirely. Here is how I think about the abstractions that help me in programming, and the principles I use to orchestrate the abstractions in my programs.
Let's start with what an abstraction is
Here is a video of Alfred Korzybski[1] explaining what an abstraction is.
An abstraction is a pattern in the information we get from reality. We abstract because it makes acting in the world possible. Abstractions are bosonic, or wave-like, in that multiple abstractions can occupy the same place, just as multiple signals can be extracted from the same waveform. The same stream of ones and zeroes can be interpreted in multiple ways.
We can get multiple abstractions from the same pattern in reality, and the Rubin vase illusion illustrates this nicely. It illustrates it better than Korzybski's example because the picture does not change at all; the change is only in the interpretation.
If you make one interpretation explicit by naming it, it hides the others. And what are the chances that you have the best abstraction? I am not sure there is a best one, but I have heuristics to find good ones and avoid bad ones, which I will explain in other blog posts down the line.
Almost all abstractions have to stay abstract, implicit, because the same data can have any number of interpretations. Different abstractions can be applied to the same data. For example, the same bytes can be seen:
- as an integer
- or as an unsigned integer with an error flag
Consider a function indexOf that returns -1 if the element is not found in an array. Its return value is semantically used as an unsigned int for the index plus an error flag bit.
Or the same abstraction can be applied to different data. For example, the idea of a function, which is a mapping between two sets, can be represented as:
- a function in a programming language
- a switch case statement
- an array, where the domain is represented by the indices of the array
- a hash_map, where the domain is represented by the keys of the hash_map
So, are there any heuristics that can help us manage abstractions? I read this enlightening blog post from Jason Cohen. It is about how to validate startup ideas, but its categorization of convergent versus divergent ideas is incredibly useful in programming. The example of the drag function is a divergent abstraction, because as you expand the context it has to change. Conservation of energy, on the other hand, is a convergent abstraction, because as you expand the context it does not have to change. I don't like the terms divergent and convergent for this explanation, so I will use encapsulating and expansive instead:
- the drag function is an encapsulating abstraction, because it encapsulates its context (that drag function is only useful in a certain velocity interval, a certain pressure interval, etc.)
- conservation of energy is an expansive abstraction, because it expands to a larger context
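As a rough illustration of the two earlier points (one piece of data read under several interpretations, and one abstraction represented in several ways), here is a minimal Python sketch. The names index_of, square, square_switch, SQUARE_ARRAY and SQUARE_MAP are hypothetical and exist only for this example; index_of simply mimics the -1 convention of indexOf described earlier, and the if/elif chain stands in for a switch case statement.

```python
import struct

# One piece of data, two interpretations: the same four bytes.
raw = bytes([0xFF, 0xFF, 0xFF, 0xFF])
as_signed = struct.unpack("<i", raw)[0]    # signed 32-bit reading: -1
as_unsigned = struct.unpack("<I", raw)[0]  # unsigned 32-bit reading: 4294967295


def index_of(items, target):
    """Hypothetical helper mimicking indexOf: an index, or -1 if absent."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1  # not a position; this value plays the role of the error flag


# One abstraction (the mapping n -> n * n on the domain {0, 1, 2}),
# four representations.
def square(n):
    # 1. A function in the programming language.
    return n * n


def square_switch(n):
    # 2. A case-by-case statement (if/elif standing in for switch case).
    if n == 0:
        return 0
    elif n == 1:
        return 1
    elif n == 2:
        return 4
    raise ValueError(f"{n} is outside the domain")


# 3. An array: the domain is the set of indices.
SQUARE_ARRAY = [0, 1, 4]

# 4. A hash map: the domain is the set of keys.
SQUARE_MAP = {0: 0, 1: 1, 2: 4}
```

All four representations agree on every input in the domain. Which one gets named and made explicit is a choice, and making it hides the other three.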
From a mathematical viewpoint, abstractions are invariants. An expansive abstraction is a global invariant for the system we are currently working on. Conservation of energy comes from continuous time-translation symmetry: if a system of equations is invariant under translation in time, it will have conservation of energy.
As a program matures, its context typically expands. If you rely on encapsulating abstractions, they will likely need to change. In contrast, building on expansive abstractions is more stable, as they can accommodate growth without modification.
If you write a function f_drag(), you can use it within a certain velocity interval, pressure interval, etc. But if the context widens, say you want to get to the moon, either f_drag() changes to handle the wider context or you rename it to f_drag_with_limits(). When choosing abstractions that you want to reuse:
- you have to either make sure that they are global invariants in the system you are building
- or explicitly state the context boundary within which they can be used
A major issue with OOP languages is that objects inherently function as encapsulating abstractions. This leads to a tangled web of interdependent objects, making widespread changes necessary as the context expands.
Expansive abstractions can serve as ground truth data, while encapsulating abstractions should be more flexible, pushed to the boundaries of the system and not built on. I talk more about this in Modeling - territory first.
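The two options above (a global invariant, or an explicitly stated context boundary) might look like this in code. This is a sketch under assumptions, not a physical model to rely on: the quadratic drag formula is the standard one, but the constant air density, the default drag coefficient, the reference area and the 100 m/s limit are illustrative values chosen for this example, and total_energy and energy_is_conserved are hypothetical helpers that ignore drag entirely.

```python
import math

RHO_AIR = 1.225  # kg/m^3; assumes constant sea-level air density
G = 9.81         # m/s^2; assumes constant gravity near the ground


def f_drag_with_limits(velocity, drag_coefficient=0.47, area=1.0,
                       max_velocity=100.0):
    """Encapsulating abstraction: quadratic drag with its context stated.

    The default coefficient, area and velocity limit are illustrative
    assumptions, not values taken from the text.
    """
    if not 0.0 <= velocity <= max_velocity:
        raise ValueError(
            f"velocity {velocity} m/s is outside the supported "
            f"interval [0, {max_velocity}] m/s"
        )
    return 0.5 * RHO_AIR * drag_coefficient * area * velocity ** 2


def total_energy(mass, velocity, height):
    """Kinetic plus potential energy of a body near the ground."""
    return 0.5 * mass * velocity ** 2 + mass * G * height


def energy_is_conserved(mass, start_height, times):
    """Check the invariant along a drag-free fall from rest."""
    expected = total_energy(mass, 0.0, start_height)
    return all(
        math.isclose(
            total_energy(mass, G * t, start_height - 0.5 * G * t ** 2),
            expected,
        )
        for t in times
    )
```

Calling f_drag_with_limits with a velocity far outside its interval fails loudly instead of silently returning a number from outside its context. Note that total_energy as written still bakes in constant gravity, which is itself a context boundary; the invariant is conservation of energy, not this particular formula.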
Here are some properties that expansive abstractions have:
- they are dependable and less likely to change
- they tend to increase your optionality instead of decreasing it
- they tend to let you keep control instead of stealing it from you
- they tend to be close to the territory instead of close to the map
In the following blog posts I will illustrate some examples and explain how these properties apply. Since abstraction is an overloaded term in programming, I will use the term modeling instead. These heuristics are in different blog posts, but they are not necessarily separable, and they reinforce each other.
[1] Alfred Korzybski is the originator of the phrase "The map is not the territory".