Why should we write clean code?


Recently I took part in an architecture dojo. This is an exercise where a simplified program specification is given and the participants are asked to propose an architecture for solving the task. Architecture dojos usually do not focus on the code level, but on a more coarse-grained level: the structure of the program rather than the code itself.
The goal is to present a good architecture after a certain period of time. What makes an architecture good and what makes it bad? The outcome should meet certain requirements such as evolvability, testability and alignment with clean code principles.

The specification

There is OCR software which scans paper documents with document numbers. It produces a file with multiple entries.
One logical line spans four physical lines: the first three contain the data and may look like this:
Each character is three characters wide and is drawn across these data lines. Only pipes and underscores are allowed, arranged to represent a certain number.
The fourth line contains no data; it serves as a delimiter before the next document number.
You will have to write a program that reads this file from a client computer, parses it and outputs the actual document numbers on the screen.
Please keep in mind: in the future, the source from which the files are uploaded may change from client side storage to server side storage. Additionally, the solution should be easily testable using unit tests.
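To make the format concrete (the blog shows the sample file only as an image, so the digits "123" and the exact glyph shapes below are an assumption, borrowed from the well-known OCR kata font that matches this description), one logical entry might look like this, here written as a Python string:

```python
# Hypothetical example of one logical entry: three data lines of
# pipe/underscore glyphs plus a blank delimiter line. Each digit is
# three characters wide, so three digits give nine columns per line.
ENTRY = (
    "    _  _ \n"   # top row of the digits 1, 2, 3
    "  | _| _|\n"   # middle row
    "  ||_  _|\n"   # bottom row
    "\n"            # empty delimiter line
)

lines = ENTRY.split("\n")
# lines[0] through lines[2] carry the data, lines[3] is the delimiter
```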

Executing the program

Of course the task of the dojo was not actually to implement the program. But here it is anyway.
This is the sample file content.
Sample File
And this is the output of the program.
sample output

Bringing design principles into play

To meet the requirements given in the specification, a single monolithic program is not an option.
A single program that reads, converts and parses the file within one huge method would not be easily testable.
Additionally, any other source the file might come from would have to be coded directly into that method, breaking encapsulation.
This brings us to the question of how clean code principles might help us enforce a good architecture.
Especially the following principles should be considered:

  • Single Responsibility Principle (SRP)
  • Open Closed Principle (OCP)
  • Single Level of Abstraction (SLA)

No more buzzwords!

What do all these principles mean? Let’s have a short briefing.

Single Responsibility Principle (SRP)

A class should have only one responsibility. Always try to describe the class you are currently developing in one sentence. If you need the word “and”, or can hardly do without it, the class probably violates the SRP.

Open Closed Principle (OCP)

A class should be open for extension, but closed for modification. This usually only works for certain kinds of changes, not for every conceivable future change. For the program we need to design in the dojo, a list of likely changes is given in the specification.
For example, if the way the file is accessed changes, the change should not affect existing code in the core program. Every new data source has to be made available through extension, not through modification.
But please remember: as so often, the list of possible changes mentioned above may not be complete.

Single Level of Abstraction (SLA)

“Each piece of code should talk on a distinctive level of granularity. Don’t mix implementation details with invocation of high-level abstractions. Refactor code to balance the level of abstraction.” (http://lumiera.org/project/background/CleanCodeDevelopment.html)
If we apply this principle to the future design, there should at least be distinct methods to access the file contents, to group the physical lines into logical lines, and to split the logical characters, which span multiple physical lines, into distinct chunks of 3 characters x 4 lines. The calls to these and a few more methods will have to be arranged by another, more coarse-grained method.
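A minimal sketch of the SLA idea (the blog's implementation is ABAP; this is an illustrative Python stand-in with assumed function names): the coarse-grained routine reads like the prose description and never drops down to string-slicing details itself.

```python
def recognize_document_numbers(raw: str) -> list[str]:
    # coarse-grained level: only invocations of named steps
    chunks = split_into_chunks(raw)
    return [parse_chunk(c) for c in chunks]

def split_into_chunks(raw: str) -> list[list[str]]:
    # one level down: the 4-line layout detail (3 data lines + delimiter)
    lines = raw.splitlines()
    return [lines[i:i + 3] for i in range(0, len(lines), 4)]

def parse_chunk(chunk: list[str]) -> str:
    # placeholder: the glyph lookup belongs to a separate component
    return "<number>"
```

Mixing the slicing arithmetic directly into `recognize_document_numbers` would put two levels of abstraction into one method, which is exactly what the principle warns against.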

Okay, so what needs to be done?

Let’s get an overview of the different responsibilities of the program. First of all, there will be some kind of file access. Once the file is in memory, it has to be separated into chunks of four lines each. This is another responsibility which has nothing to do with how the file is accessed. Next, each chunk of string lines needs to be processed by another functionality which extracts the characters out of the chunks. As these characters are still encoded (every character spans 3 characters x 4 lines in the file contents), there will be another function that parses them. In the end, the parsed characters need to be concatenated and returned by a component which arranges the calls to the described functionalities.

File access

The access to the file contents is a functionality which will later be replaced by another implementation (client side upload vs. server side upload); this is already mentioned in the specification. We decide to implement each of these functionalities in a specific class. These classes will implement a common interface, as callers should not need to care about how the file access is implemented in particular. The first implementation of this interface reads a file from the client computer.
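In Python terms, the file-access abstraction might look like this (class names are illustrative stand-ins, not the blog's actual ZIF_*/ZCL_* artifacts): callers depend only on the interface, so a server-side implementation can be added later without touching them.

```python
from abc import ABC, abstractmethod

class FileAccess(ABC):
    """Common interface: callers never see how the file is fetched."""

    @abstractmethod
    def read(self) -> str:
        """Return the raw OCR file contents."""

class ClientFileAccess(FileAccess):
    """First implementation: reads a file from the client computer."""

    def __init__(self, path: str) -> None:
        self._path = path

    def read(self) -> str:
        with open(self._path, encoding="utf-8") as f:
            return f.read()

class InMemoryFileAccess(FileAccess):
    """Handy for unit tests: no real file needed."""

    def __init__(self, contents: str) -> None:
        self._contents = contents

    def read(self) -> str:
        return self._contents
```

A future server-side variant would simply be a third class implementing `FileAccess`, an extension rather than a modification of existing code.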

Lines Separation

Each logical line, which represents one document number, consists of four physical lines. We will need to split the file contents into chunks of four lines each. This responsibility is specified by another interface.
If any detail relevant to the line separation changes in the future (e.g. if another empty line besides the fourth one is introduced), the caller of this functionality should not have to care. Instead, we describe the functionality by an interface which will be implemented by a specific class later on. The output will be TYPE TABLE OF STRING_TABLE (a custom type).
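A sketch of such an implementation (again an illustrative Python stand-in for the ABAP class): instead of hard-coding a stride of four, it treats any blank line as a delimiter, so the hypothesized extra empty line would not break callers.

```python
def split_logical_lines(lines: list[str]) -> list[list[str]]:
    """Group physical lines into logical entries; the returned nested
    list mirrors the TYPE TABLE OF STRING_TABLE from the blog."""
    groups: list[list[str]] = []
    current: list[str] = []
    for line in lines:
        if line.strip() == "":
            # blank line = delimiter; flush the current entry, if any
            if current:
                groups.append(current)
                current = []
        else:
            current.append(line)
    if current:  # file may end without a trailing delimiter
        groups.append(current)
    return groups
```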

Character separation

Each logical line consisting of four physical lines should be separated into chunks which represent the logical characters (that is, chunks of 3 x 4 characters). So here is another interface which describes this interaction. Character separation takes place for each logical line, so the input is a STRING_TABLE and the output is another custom type representing a nested table of logical characters. A logical character representation is TYPE TABLE OF CHAR03; a collection of them is just another nested table, TYPE ZOCR_T_LOGICAL_CHARACTERS.
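In Python, the CHAR03 slices of one glyph can be modeled as a tuple of three 3-character strings (an assumed stand-in for the nested ABAP table types):

```python
def separate_characters(logical_line: list[str]) -> list[tuple[str, str, str]]:
    """Split the three data lines of one logical line into glyphs.
    Each glyph is 3 columns wide; the result mirrors
    ZOCR_T_LOGICAL_CHARACTERS as a list of 3-line slices."""
    width = 3
    top, mid, bot = logical_line  # the blank delimiter line is already gone
    return [
        (top[i:i + width], mid[i:i + width], bot[i:i + width])
        for i in range(0, len(top), width)
    ]
```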

Character Parsing

Every logical chunk which represents a real character relevant for the output must be parsed into that character.
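The parsing itself can be a simple lookup. The glyph shapes below are an assumption based on the standard OCR kata font, which matches the format described in the specification (the blog's sample file is shown only as an image):

```python
# Glyph-to-digit lookup for the pipe/underscore font (assumed shapes).
DIGITS = {
    (" _ ", "| |", "|_|"): "0",
    ("   ", "  |", "  |"): "1",
    (" _ ", " _|", "|_ "): "2",
    (" _ ", " _|", " _|"): "3",
    ("   ", "|_|", "  |"): "4",
    (" _ ", "|_ ", " _|"): "5",
    (" _ ", "|_ ", "|_|"): "6",
    (" _ ", "  |", "  |"): "7",
    (" _ ", "|_|", "|_|"): "8",
    (" _ ", "|_|", " _|"): "9",
}

def parse_glyph(glyph: tuple[str, str, str]) -> str:
    # Unrecognized glyphs become "?" so one bad character does not
    # abort the whole document number.
    return DIGITS.get(glyph, "?")
```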

Who is the organizer?

Of course there should be a class which organizes the calls to the string manipulation and file access components.
Here it is. The interface requires the method to return a list of document numbers in string format.

The whole picture

What does it look like now? How does the implementer of ZIF_OCR_PARSING_ALGORITHM know all the other components which are to be used?
The answer is “constructor injection”. Every dependency is handed over to the implementer of ZIF_OCR_PARSING_ALGORITHM when the CREATE OBJECT statement that creates the instance is executed. These dependencies are themselves instances of specific classes which implement the interfaces introduced above.
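Constructor injection is what makes the unit tests cheap: the algorithm receives its collaborators instead of creating them, so a test can hand in fakes. A Python sketch (class and method names are illustrative stand-ins for the ZIF_*/ZCL_* artifacts, and `_Stub` is a hypothetical test helper):

```python
class OcrParsingAlgorithm:
    """Coarse-grained organizer: all collaborators are injected."""

    def __init__(self, file_access, line_splitter, char_separator, char_parser):
        self._file_access = file_access
        self._line_splitter = line_splitter
        self._char_separator = char_separator
        self._char_parser = char_parser

    def parse(self) -> list[str]:
        lines = self._file_access.read().splitlines()
        result = []
        for logical_line in self._line_splitter.split(lines):
            glyphs = self._char_separator.separate(logical_line)
            result.append("".join(self._char_parser.parse(g) for g in glyphs))
        return result

class _Stub:
    """Tiny helper to build fakes for a unit test."""
    def __init__(self, **methods):
        for name, value in methods.items():
            setattr(self, name, value)

# Unit test wiring: every dependency is a fake, no file system involved.
algorithm = OcrParsingAlgorithm(
    file_access=_Stub(read=lambda: "irrelevant"),
    line_splitter=_Stub(split=lambda lines: [["glyph-row"]]),
    char_separator=_Stub(separate=lambda line: ["g1", "g2"]),
    char_parser=_Stub(parse=lambda glyph: "4"),
)
numbers = algorithm.parse()  # both fake glyphs parse to "4"
```

Swapping a fake for a real implementation changes only what is passed to the constructor, never the algorithm itself.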
There will be another caller which puts all of these functional bricks together, but for now, this information should be comprehensive enough to form a big picture:
Big Picture


This approach of having so many small classes might look a little strange to someone who has developed in ABAP for years, but it has its benefits. The resulting structure is easily maintainable, as interfaces hide the implementation details of classes and their sub-components (that is, other, more fine-grained classes). The implementations behind these interfaces can be switched easily without changing existing code in the core parser.
Additionally, unit tests can be applied quite easily to each of these implementations. Unit tests aim to document and test behavior consistently, comprehensibly and repeatably. This saves effort in integration tests and tracks possible issues down to specific classes just by executing the unit tests within milliseconds.
Writing a testable program and applying unit tests to it usually takes two to three times as long as writing it quick & dirty. But when testing starts and issues begin to arise, the test-based approach pays off in terms of defect rate and time to localize an error. In the end, it usually saves more time than doing it properly costs.


5 thoughts on “Why should we write clean code?”

  1. Dear Uwe,

    I read your post with quite some interest, and as “someone who developed in ABAP for years” this indeed feels quite strange to me.
    Though I’m a big fan of ABAP interfaces, I don’t know whether I like the reduction of an interface to one method. I would intentionally have created the implementation of the parsing algorithm interface as one ABAP class with four protected methods, directly implemented by zcl_ocr_parsing_algorithm. Variations could be implemented using inheritance. Of course, composition should in general be favored over inheritance, but I believe that not only the SRP should be applied: objects with high cohesion should also be grouped together in one class if each of the responsibilities is only relevant in the encapsulating context.
    If at a later point in time a subset was identified as potentially of interest in other contexts, I’d favor to refactor it to an own interface.
    Injection in the constructor – even to an interface – looks like quite tight coupling to me as it leaves the consumer of the coarse-grained class (the algorithm, in your case) to instantiate the composites. I’d favor the composition by delegation to an instance which has been accessed via a factory from inside the protected methods.
    As per the argument for easy unit testing, I don’t quite agree. Even within a class with multiple protected methods, each of these methods can be tested using a separate test class.
    What I described is not less effort with respect to implementation, but the number of artifacts is lower, which – imho – increases the understandability of the model.
    Looking forward to your reply, I’m sure you had discussed this approach in your dojo as well and have not discarded it without good reason 😉

    Looking forward to you future posts 😉


  2. Dear Oliver,

    You are absolutely right: if the encapsulating context leads to objects with high cohesion, you could group the functionality in one class with specific protected methods.
    From the testing point of view, this makes no difference. If the requirements force us to reuse certain functionalities, they could be separated out the way you mentioned.
    Only if every sub-functionality were refactored over time would the result be the described class diagram. But usually this is not the result of the first iteration.

    However, your described approach of instantiating the composites via factory calls from inside the protected methods sounds like the service locator pattern – which is actually a common antipattern.
    Allowing the coarse-grained function to receive the fine-grained objects via dependency injection keeps the coupling loose, as the injected implementation details might vary per use case. Actually, it paves the way for using a Dependency Injection (IoC) container later on. The use of such a container even makes a separate factory needless – at least for this specific scenario.



    • Dear Uwe,

      There have been plenty of discussions about whether a service locator is an anti-pattern or not. Imho, it really depends on the use case. Can we agree that what the service locator does is hide the dependencies?
      In the sample you described, you expose the dependencies of the algorithm to the consumer, so that the consumer has the chance of injecting the various parts when constructing the algorithm. But to be able to do this, he has to know about the inner construction of your algorithm. I am not convinced that this is always intentional for the sake of “easy consumption”. Of course, you could start with optional parameters in the constructor, but I’m sure you don’t like that either 😉
      If an injection is really necessary, I’d favor the injection of an abstract factory. For the instantiation of the factory, varying mechanisms with more or less variable composites could be offered.
      Of course, there are benefits and pitfalls in this mechanism as well, but I believe that it’s just one step further towards a loose coupling without exposing too much of information to the consumer.
      Having said that: it really depends on the scenario and also on the technology used. Given a development environment where code can always be analyzed easily at runtime and which still has some performance issues instantiating objects (ABAP), I believe that the pitfalls of the service locator might be preferable to the higher complexity of consuming an injected container – depending on the use case, of course 😉


      • Dear Oliver,

        Your comments sound reasonable, and I feel I failed to add some more content to the blog post about how I would have implemented the client side in order to make things clear.
        My next blog post will probably deal with these questions. Your suggestions sound feasible, and if you don’t mind, I would like to include them as an alternative to my approach in a possible future blog post as well.
        By the way, I find your suggestions very interesting. We could stay in contact via Xing, if you want; I have updated my Xing profile in the “About” section.



  3. Pingback: Uwe Kunath's Blog | The issue with having many small classes
