Clean Code Tip 1: Use abstractions

In software engineering, abstraction is the process of hiding complex implementation details and exposing only the essential features or behavior of a system, object, or function. To put it simply:

Abstraction means focusing on what something does rather than how it does it.

How are abstractions connected to clean code? They are closely linked. You can't write clean code without using abstractions. Here’s an example written in x86-64 assembly (NASM syntax, for Linux):

section .data
    msg db "Result: ", 0
    newline db 10, 0

section .bss
    result resb 4

section .text
    global _start

_start:
    mov eax, 5
    mov ebx, 7
    add eax, ebx
    mov [result], eax
    mov rax, 1
    mov rdi, 1
    mov rsi, msg
    mov rdx, 8
    syscall
    mov eax, [result]
    xor edx, edx
    mov ecx, 10
    div ecx
    add al, '0'
    add dl, '0'
    mov [result], al
    mov [result + 1], dl
    mov rax, 1
    mov rdi, 1
    mov rsi, result
    mov rdx, 2
    syscall
    mov rax, 1
    mov rdi, 1
    mov rsi, newline
    mov rdx, 1
    syscall
    mov rax, 60
    xor rdi, rdi
    syscall

Is the above code clean? I would say no. It is not readable, understandable, or easy to maintain. It is hard to read simply because there is so much of it, and every line deals with low-level details such as registers, system call numbers, and byte counts. If you know assembly language, you can probably work out what each line does, but the program as a whole is still hard to understand and maintain. Let's modify the above example by grouping the instructions under descriptive labels, so that each group is named after what it does. Even this small step toward abstraction makes the code cleaner: more readable, understandable, and easier to maintain.

section .data
    msg db "Result: ", 0
    newline db 10, 0

section .bss
    result resb 4

section .text
    global _start

_start:
    .store_sum_of_5_and_7_in_result:
      mov eax, 5
      mov ebx, 7
      add eax, ebx
      mov [result], eax

    .print_message:
      mov rax, 1
      mov rdi, 1
      mov rsi, msg
      mov rdx, 8
      syscall

    .convert_result_to_ascii:       ; simple version: assumes a two-digit result
      mov eax, [result]
      xor edx, edx
      mov ecx, 10
      div ecx
      add al, '0'
      add dl, '0'
      mov [result], al
      mov [result + 1], dl

    .print_result:
      mov rax, 1
      mov rdi, 1
      mov rsi, result
      mov rdx, 2
      syscall

    .print_newline:
      mov rax, 1
      mov rdi, 1
      mov rsi, newline
      mov rdx, 1
      syscall

    .exit:
      mov rax, 60
      xor rdi, rdi
      syscall

In the above code, it is relatively easy to see that we store the sum of 5 and 7 in result, print a message followed by the result converted to ASCII, print a newline, and exit the program. However, the instructions inside each labeled section are still hard for an average programmer to read and understand.

Let's use Python instead of assembly language so that we can get rid of all the mov, add, and syscall instructions:

result = 5 + 7
print("Result:", result)

The code above is clean because any programmer can read and understand it. It is also easy to maintain: to change the numbers being added, you modify only the first line, and to change the printed message, you modify only the second line.

Abstraction in high-level languages relies largely on well-named functions that call other well-named functions. For example, when you use a sort() function, what matters is that it sorts a list, not which algorithm it uses internally, such as quicksort or mergesort. This is abstraction: the implementation details are hidden, and users interact only with a simple interface.
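A minimal Python sketch of well-named functions calling other well-named functions. The names top_scores and print_top_scores and the sample scores are hypothetical; the only real API used is Python's built-in sorted(), and the caller relies only on its contract (it returns a new, sorted list), not on how it sorts.

```python
def top_scores(scores, count=3):
    """Return the `count` highest scores, best first."""
    # Relies only on sorted()'s contract: it returns a new, sorted list.
    # Which sorting algorithm the interpreter uses internally is irrelevant here.
    return sorted(scores, reverse=True)[:count]


def print_top_scores(scores):
    """Print the best scores; the body reads as a description of what happens."""
    print("Top scores:", top_scores(scores))


print_top_scores([40, 95, 70, 88, 61])  # prints: Top scores: [95, 88, 70]
```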

We can increase abstraction even more if we enclose the above Python code in a well-named function:

def printResult():
    result = 5 + 7
    print("Result:", result)

When using the above printResult function, we do not need to care how the result is calculated or printed.

We can raise the level of abstraction further by making the function more generic, that is, by parameterizing it:

def printSum(a, b):
    result = a + b
    print("Result:", result)

The printSum function is a more generalized version of printResult. When using the printSum function, we can print the sum of any two values, and we don’t have to care how the sum calculation is done, nor how the resulting sum is printed.
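Once printSum exists, the specific printResult can be expressed as a single call to it. The following Python sketch illustrates that idea; rewriting printResult this way is an illustration, not something the original example requires.

```python
def printSum(a, b):
    """Print the sum of a and b."""
    result = a + b
    print("Result:", result)


def printResult():
    """The original, specific function becomes a thin wrapper."""
    printSum(5, 7)


printResult()     # prints: Result: 12
printSum(2.5, 4)  # prints: Result: 6.5
```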

Three common types of abstraction in software are:

Data abstraction: Hides how data is stored or represented. Example: Using an ArrayList in Java. You know it holds elements, but in most cases you don't need to know the mechanics of the underlying array.
Procedural (or functional) abstraction: Hides how a function performs a task. Example: You call calculateTax(income) without seeing the internal formula or logic. That's fine, because you should be able to trust the abstraction: the function always calculates the tax for the given income correctly.
Control abstraction: Hides control flow details (loops, conditions). Example: Using the forEach() method on a JavaScript array instead of writing a for loop. Functional programming uses higher-order functions to build control abstractions. Higher-order functions such as map, filter, and reduce make code more concise and readable, and they reduce the likelihood of errors in hand-written control flow.
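Control abstraction carries over to Python, where the built-ins map() and filter() and functools.reduce() are higher-order functions. In this sketch, the task (double every amount greater than 20 and add up the results), the function names, and the sample amounts are made up for illustration; both functions compute the same total.

```python
from functools import reduce


def doubled_large_total_loop(amounts):
    """Hand-written control flow: loop, condition, and accumulator."""
    total = 0
    for amount in amounts:
        if amount > 20:
            total += amount * 2
    return total


def doubled_large_total_hof(amounts):
    """Control abstraction: filter, map, and reduce own the iteration."""
    large = filter(lambda amount: amount > 20, amounts)
    doubled = map(lambda amount: amount * 2, large)
    return reduce(lambda total, amount: total + amount, doubled, 0)


amounts = [12, 55, 7, 80, 33]
print(doubled_large_total_loop(amounts))  # prints: 336
print(doubled_large_total_hof(amounts))   # prints: 336
```

In everyday Python, a generator expression such as sum(a * 2 for a in amounts if a > 20) is the more idiomatic control abstraction for the same job.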

Why does abstraction matter? The following sections explain.

Simplifies complexity

Abstractions make large systems easier to comprehend. An application written entirely in assembly code, or as a single monolithic function with thousands of lines, has no abstractions, which makes it hard to understand and maintain. Individual lines may be readable, but the overall purpose of the code often remains unclear, and without that context it is hard to remember what the previously read code segments did.

Good abstractions make code clearer and easier to maintain. With well-defined abstractions, developers usually do not need to examine implementation details; it is enough to rely on the abstraction's descriptive name and intended behavior. For instance, a function such as calculateTax(income) is expected to compute the tax for a given income. Reading the implementation of calculateTax(income) should rarely be necessary, typically only when you need to modify it.
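One way to see how abstractions tame complexity is a top-level function that reads like a summary because every step is a well-named function. In this Python sketch, the order records (dicts with an "amount" key) and all function names are hypothetical, chosen only for illustration.

```python
def total_revenue(orders):
    """Sum of the amounts of all orders."""
    return sum(order["amount"] for order in orders)


def average_order_value(orders):
    """Mean order amount, or 0.0 when there are no orders."""
    return total_revenue(orders) / len(orders) if orders else 0.0


def sales_summary(orders):
    """Top level: each line names *what* is computed, not *how*."""
    return {
        "orders": len(orders),
        "revenue": total_revenue(orders),
        "average": average_order_value(orders),
    }


print(sales_summary([{"amount": 10}, {"amount": 30}]))
# prints: {'orders': 2, 'revenue': 40, 'average': 20.0}
```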

Enables reuse

Abstract components, such as functions, can be reused in multiple contexts: a piece of code is written once and used in many places in the codebase. Without abstraction, the same functionality tends to be reimplemented throughout the codebase. Such duplication makes the code less clean and increases the risk of errors during maintenance, because every change must be applied consistently to every copy.
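A minimal Python sketch of reuse: the normalization rule is written once in normalize_email and shared by two callers, so they always agree. The function names and the in-memory user store are hypothetical, and lowercasing the whole address is a simplifying assumption (strictly, the part before the @ may be case-sensitive).

```python
def normalize_email(address):
    """Canonical form used for every comparison: trimmed and lowercased."""
    return address.strip().lower()


# Hypothetical in-memory user store, keyed by normalized address.
users = {}


def register_user(address, name):
    users[normalize_email(address)] = name


def find_user(address):
    # Reuses the same rule; a hand-copied variant here could drift out of sync.
    return users.get(normalize_email(address))


register_user("  Alice@Example.com ", "Alice")
print(find_user("alice@example.COM"))  # prints: Alice
```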

Improves maintainability

Changes to the internal implementation do not affect users of the abstraction, as long as its interface and behavior stay the same. For example, if the way tax is calculated for a given income needs adjusting, the change is made only inside the calculateTax(income) function. Other functions that use this abstraction are unaffected and continue to work as before, unaware of the change in the underlying implementation.
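The following Python sketch shows calculate_tax (the Python spelling of calculateTax) implemented with progressive brackets. The brackets and rates are hypothetical, not real tax rules. The caller net_income depends only on the function's name and contract, so it would stay the same if the body of calculate_tax were rewritten, for example to use a different rule.

```python
# Hypothetical brackets: (upper limit of the bracket, rate in percent).
TAX_BRACKETS = [(10_000, 0), (40_000, 20), (float("inf"), 40)]


def calculate_tax(income):
    """Return the tax owed for the given income.

    If the tax rule changes, only this body changes.
    """
    tax = 0.0
    lower = 0
    for upper, rate_percent in TAX_BRACKETS:
        if income > lower:
            # Tax only the part of the income that falls inside this bracket.
            tax += (min(income, upper) - lower) * rate_percent / 100
        lower = upper
    return tax


def net_income(income):
    """Caller: written against the abstraction, unaffected by its internals."""
    return income - calculate_tax(income)


print(net_income(50_000))  # prints: 40000.0
```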

Enhances flexibility

The implementation can change without breaking external code. For example, if you have a sort() function that currently uses the quicksort algorithm, you can switch its implementation to a different algorithm, such as mergesort. This change does not affect the users of the sort() function.

When designing procedural abstractions, it is essential to use descriptive function names. For example, consider a function named deleteOrphaned(exportName). This name lacks clarity because it does not specify what is being deleted (it is not the exportName). The word "orphaned" is only an adjective; it does not identify the object of deletion. To improve clarity, the function could be renamed to deleteOrphanedResources(exportName), which specifies that resources are being deleted. This assumes the reader is familiar with the context and the type of resources involved. If the resource type is not clear, the name should be refined further, for example, deleteOrphanedKubeResources.

However, the term "orphaned resources" and the purpose of the exportName argument remain ambiguous. Examining the function implementation may reveal that the function deletes the existing resources when the export name changes, because that change leaves the current resources orphaned. The function name should reflect this behavior, for instance, deleteExistingResourcesIfExportNameChanged(newExportName). This name clearly communicates the procedural abstraction. If the function is part of a namespace, package, or object, it could be named resources.deleteExistingIfExportNameChanged(newExportName). Naming conventions for functions and other software entities are discussed in more detail later.
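For illustration, here is how the final name might look as a Python method, spelled in Python's snake_case convention. The Resources class, its in-memory list of resource names, and the choice to record the new export name after deleting are all hypothetical; a real implementation would call whatever API manages the actual (for example, Kubernetes) resources.

```python
class Resources:
    """Hypothetical stand-in for the resources created for one export."""

    def __init__(self, export_name, names):
        self.export_name = export_name
        self.names = list(names)

    def delete_existing_if_export_name_changed(self, new_export_name):
        """Delete all existing resources if the export name has changed.

        Returns the names of the deleted resources (empty if nothing changed).
        """
        if new_export_name == self.export_name:
            return []
        deleted, self.names = self.names, []
        self.export_name = new_export_name
        return deleted


resources = Resources("export-v1", ["service", "config-map"])
print(resources.delete_existing_if_export_name_changed("export-v2"))
# prints: ['service', 'config-map']
```

A caller reading resources.delete_existing_if_export_name_changed(new_name) knows exactly when deletion happens without opening the method body.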

Choosing clear names helps us understand abstractions. When we struggle to understand them, we have to dig deeper to see what information is hidden. This means we end up reading and understanding more code.

Creating abstractions, whether they are function signatures or interfaces, is not always as easy as creating something concrete. This is why you must be ready to correct your initial assumptions and change the abstraction if it turns out to be incorrect.

If you are interested in reading more about clean code, grab yourself a copy of my 140 Tips For Clean Code booklet.