Isolate components for better testing with mockA

Introduction

This blog post is closely connected to the presentation given by Damir Majer and Martin Steinberg during SAP Inside Track 2014 in Munich.

The presentation focuses on solving a code kata using Test-Driven Development (TDD).

The purpose is to show how TDD can lead to better test coverage and thus to more robust and maintainable software components. This blog post does not focus on the solution of the code kata itself, but rather on how to test each component in isolation using mockA.

The Kata

In summary, the kata says:

Implement a simple string calculator class that accepts a variable number of numbers in a string. The numbers should be summed up and the sum returned.

Examples:

  • An empty string returns “0”
  • For single numbers, the number itself will be returned, e.g. for “1”, the sum 1 will be returned
  • For “1,2”, the sum 3 will be returned
  • Multiple delimiters also have to be accepted, e.g. “1;2\3;1” will lead to 7
  • This also applies to line breaks like “\n”. “1\n2,3” results in 6
  • Delimiters might have variable length. “//***\1***2***3\2***2” results in 10
  • Raise an exception in case negative numbers are passed to the method
  • Numbers bigger than 1000 should be ignored

The Kata requires you to implement the code step by step, without skipping steps. Every step should contain

  • A unit test that tests the requirement and will fail at the first run
  • An implementation that covers the requirement
  • A new unit test run that will succeed
  • Refactoring
  • Running the test again to ensure nothing broke
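
To make this concrete, the very first red-green cycle could look like the following ABAP Unit test (a minimal sketch; the local test class name and the signature of “add” are assumptions, the actual tests are in the attachments):

CLASS ltc_string_calculator DEFINITION FOR TESTING
  DURATION SHORT RISK LEVEL HARMLESS.
  PRIVATE SECTION.
    METHODS empty_string_returns_zero FOR TESTING.
ENDCLASS.

CLASS ltc_string_calculator IMPLEMENTATION.
  METHOD empty_string_returns_zero.
    DATA lo_calculator TYPE REF TO zcl_string_calculator.
    DATA lv_sum TYPE i.
    CREATE OBJECT lo_calculator.
*   requirement: an empty string returns 0
    lv_sum = lo_calculator->add( '' ).
    cl_abap_unit_assert=>assert_equals( act = lv_sum exp = 0 ).
  ENDMETHOD.
ENDCLASS.

This test fails at the first run, the implementation makes it pass, and every further requirement from the list above is added in the same way.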

The Solution

The solution class can be found in the attachments (“zcl_string_calculator.txt”).

The class ZCL_STRING_CALCULATOR contains

  • One protected method that replaces all delimiters with a comma (“replace_delimiter_with_comma”)
  • One protected method that sums up the consolidated string (“compute”)
  • One public method to rule them all (“add”)
  • Several attributes

“add” delegates the task of replacing all delimiters with commas to a dedicated protected method and uses its output to sum up the values.
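
Based on the mockA calls shown later, the delegation inside “add” might look roughly like this (a sketch; the parameter names iv_string, cv_string and rv_sum are assumptions, the actual implementation is in the attachment):

METHOD add.
  DATA lv_string TYPE string.
  lv_string = iv_string.
* delegate the delimiter handling to the protected helper method
  replace_delimiter_with_comma( CHANGING cv_string = lv_string ).
* sum up the consolidated, comma-separated values
  rv_sum = compute( lv_string ).
ENDMETHOD.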

The Unit Test report “Unit Test v1.txt” shows the corresponding unit tests.

Isolate helper methods from the add-method

While “replace_delimiter_with_comma” and “compute” are helper methods, the public “add” method delegates its calls to them. Thus, it is dependent on the helper methods.

At some point it might be helpful to check whether the “add” method works as expected, that is, whether it delegates its calls correctly to the helper methods.

Think of the following unit test, which is not directly linked to the code kata, but may help to ensure code quality:

  • Test the “add” method with the string “<1sd2,3rtrt,4”
  • Ensure, that “add” calls “replace_delimiter_with_comma” with “<1sd2,3rtrt,4”
  • The call will return “1,2,3,4”
  • Ensure, that “compute” will be called with “1,2,3,4”
  • Ensure, that the output of compute is returned without modification (result will be 10)

Such a test would require you to subclass ZCL_STRING_CALCULATOR and redefine the helper methods with hard-coded return values based on specific inputs. Furthermore, some logic behind “compute” should allow you to verify whether the method has been called with the correct parameters.

The subject of the test will be the subclass of ZCL_STRING_CALCULATOR, which will partly contain so-called faked functionality for “replace_delimiter_with_comma”. But it will also contain some mock features, as “compute” should not only conditionally return values based on its input, but should also allow you to determine whether it has been called with the expected input.

mockA allows you to skip this subclassing and lets you focus on the test. It creates a subclass at runtime for you which follows constraints like conditional method output. These constraints can be hard-coded by you. It also allows you to verify the method calls of mocked methods.

“Unit Test v2.txt” shows you how to do it. Take a look at the test method “test_add”.

The first call

lo_mocker = zcl_mocka_mocker=>zif_mocka_mocker~mock( 'ZCL_STRING_CALCULATOR' ).

lo_mocker->method( 'replace_delimiter_with_comma'
  )->with_changing( '<1sd2,3rtrt,4'
  )->changes( '1,2,3,4'
  ).

tells mockA to fake the method “replace_delimiter_with_comma”, while

lo_mocker->method( 'compute'
  )->with( '1,2,3,4'
  )->returns( 10
  ).

tells mockA to fake the output of “compute”.

The subject of the test will be the object generated by mockA (which is in fact a subclass of ZCL_STRING_CALCULATOR):

go_string_calculator ?= lo_mocker->generate_mockup( ).

After the call of “add”, the result is verified in the unit test. But besides this verification, you may also ensure that “compute” has been called correctly with the input value “1,2,3,4”:

DATA lv_has_been_called_correctly TYPE abap_bool.

lv_has_been_called_correctly = lo_mocker->method( 'compute' )->has_been_called_with( '1,2,3,4' ).

assert_not_initial( lv_has_been_called_correctly ).
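
Put together, the whole test method might look like this (assembled from the snippets above; the signature of “add” and the assertion helpers are assumptions):

METHOD test_add.
  DATA lo_mocker TYPE REF TO zif_mocka_mocker.
  DATA lv_result TYPE i.
  DATA lv_has_been_called_correctly TYPE abap_bool.

* fake the helper methods
  lo_mocker = zcl_mocka_mocker=>zif_mocka_mocker~mock( 'ZCL_STRING_CALCULATOR' ).
  lo_mocker->method( 'replace_delimiter_with_comma'
    )->with_changing( '<1sd2,3rtrt,4'
    )->changes( '1,2,3,4' ).
  lo_mocker->method( 'compute'
    )->with( '1,2,3,4'
    )->returns( 10 ).

* the subject of the test is the generated subclass
  go_string_calculator ?= lo_mocker->generate_mockup( ).

* call the method under test and verify the result
  lv_result = go_string_calculator->add( '<1sd2,3rtrt,4' ).
  assert_equals( act = lv_result exp = 10 ).

* verify the delegation to "compute"
  lv_has_been_called_correctly = lo_mocker->method( 'compute' )->has_been_called_with( '1,2,3,4' ).
  assert_not_initial( lv_has_been_called_correctly ).
ENDMETHOD.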

Further information

You can find further information about the presenters at

damir-majer.com / @majcon

http://about.me/martin.steinberg / @SbgMartin

attachments: https://code.google.com/p/uwekunath-wordpress-com/source/browse/#git%2FCodeKATA


No comment!

Are Comments overrated?

Comments are a wonderful tool to document what you intended with a certain piece of program logic, and you can place them directly where that logic resides: in the source code. As comments are non-functional, they are not compiled and hence not executed. However, I have the feeling that we should really think about how to use comments and how not to use them. The following points are neither fundamentally new nor innovative, but they still seem to be among the most commonly violated principles.

Antipattern #1 – Comments do not replace program structure

Usually, developers should tell the reader of their code why they are doing something, not what they are doing. Before you start a comment by describing what the next few lines of code do, stop and think about it.
Consider the following example. What do you think would be easier to read?

*   parse header from excel
lv_row = 2.
lv_column = zcl_excel_common=>convert_column2int( lv_range_start_column ).

lv_current_column = 0.
WHILE lv_column <= lv_highest_column.
  lv_current_column = lv_current_column + 1.
  READ TABLE mt_fcat INTO ls_fcat INDEX lv_current_column.
  CHECK sy-subrc = 0.
  lv_col_str = zcl_excel_common=>convert_column2alpha( lv_column ).
  io_excel_worksheet->get_cell(
    EXPORTING
      ip_column = lv_col_str
      ip_row    = lv_row
    IMPORTING
      ep_value  = lv_value ).

  ASSIGN COMPONENT ls_fcat-fieldname OF STRUCTURE ls_scale TO <lv_value>.
  CHECK sy-subrc = 0.
  CASE ls_fcat-inttype.
    WHEN 'D'.
      <lv_value> = zcl_excel_common=>excel_string_to_date( lv_value ).
    WHEN OTHERS.
      <lv_value> = lv_value.
  ENDCASE.
  lv_column = lv_column + 1.
ENDWHILE.

lv_column = zcl_excel_common=>convert_column2int( lv_range_start_column ).
lv_row = lv_row + 1.
*   …Some more coding…
lv_highest_row = io_excel_worksheet->get_highest_row( ).

*   parse items from excel
WHILE lv_row <= lv_highest_row.
  lv_column = 2.
  lv_col_str = zcl_excel_common=>convert_column2alpha( lv_column ).
  io_excel_worksheet->get_cell(
    EXPORTING
      ip_column = lv_col_str
      ip_row    = lv_row
    IMPORTING
      ep_value  = lv_value ).
  lv_catyp = lv_value.

  lv_column = 1.
  lv_col_str = zcl_excel_common=>convert_column2alpha( lv_column ).
  io_excel_worksheet->get_cell(
    EXPORTING
      ip_column = lv_col_str
      ip_row    = lv_row
    IMPORTING
      ep_value  = lv_value ).
*   some more coding…
  lv_row = lv_row + 1.
ENDWHILE.

Basically, the comments tell me what the next few lines of code will do. But there is no benefit: you still do not immediately understand what the code does, whether comments are available or not.
Even worse, more comments would not make the code any clearer. Furthermore, as you move code to another part of your program, the comments may lose their relationship to the code, since they are non-functional and not taken into account by the compiler. This eventually leads to comments that mislead the reader.
Consider this refactored example:
parse_header_from_excel( io_excel_worksheet = io_excel_worksheet io_scale = ro_scale ).
parse_items_from_excel( io_excel_worksheet = io_excel_worksheet io_scale = ro_scale ).
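
The two extracted methods might be declared like this (a sketch; the type behind io_scale is hypothetical, as it is not shown in the snippet):

METHODS parse_header_from_excel
  IMPORTING
    io_excel_worksheet TYPE REF TO zcl_excel_worksheet
    io_scale           TYPE REF TO zcl_scale. " hypothetical type

METHODS parse_items_from_excel
  IMPORTING
    io_excel_worksheet TYPE REF TO zcl_excel_worksheet
    io_scale           TYPE REF TO zcl_scale. " hypothetical type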

Actually, replacing comments with dedicated methods with meaningful names does not reduce complexity, but it increases readability dramatically.
By the way, that is why I think code scanners which measure the ratio of comments to source code are useless: the goals they claim to address cannot be achieved by such simple measurements.
Normally, I try to produce code which has no comments in the middle of a logical program unit such as a method. If there are comments, they are at the top of the code, or nowhere.

Antipattern #2 – Commented code

Code that becomes useless over time reaches the end of its lifecycle. Usually, developers just comment this piece of code out. But how often do they ever read that program logic again?
I personally do not read it, as it loses its relationship to the surrounding source code over time; my eyes just skip over these comments. Whenever I need to delete some code that I might miss in the future, I generate a new version of the ABAP source code and delete the code. Version control systems are meant for exactly this kind of task. When I miss something in the code, I take a look at the versions, which immediately allows me to compare two versions by the changes that have been applied in the meantime.

The issue with having many small classes

Aftermath

The previous blog post was about applying design principles to the architecture of a simple parser application, which served as an exercise. In the end, it led to quite a few small classes, each with its own responsibility.
Big Picture

Issue

As most of the classes deal with solving a certain aspect of the exercise, one class has the responsibility to arrange the work of the other classes. This class implements the interface ZIF_OCR_PARSING_ALGORITHM. An instance of that class is given to every consumer which needs to parse some sample input.
Every low-level class that this class deals with to “orchestrate” the program flow is registered as a dependency. Dependencies are declared in the constructor of this coarse-grained class. This makes it impossible to create the class without satisfying its dependencies.

methods CONSTRUCTOR
  importing
    !IO_FILE_ACCESS type ref to ZIF_OCR_FILE_ACCESS
    !IO_LINES_SEPERATOR type ref to ZIF_OCR_LINES_SEPERATOR
    !IO_CHARACTERS_SEPERATOR type ref to ZIF_OCR_CHARACTERS_SEPERATOR
    !IO_CHARACTER_PARSER type ref to ZIF_OCR_CHARACTER_PARSER .

Providing dependencies during object creation by an external caller is called “dependency injection” (DI), as opposed to having classes create their dependencies on their own. As the creation of the low-level instances is not under the control of the class that orchestrates the parser flow, this paradigm is also called “inversion of control” (IoC).
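
The contrast is easy to show in code. A class without DI hard-wires its collaborators, while a class with constructor injection receives them (a sketch, using an assumed member attribute mo_file_access):

* without DI: the class creates its dependency itself and is
* hard-wired to one concrete implementation
METHOD constructor.
  CREATE OBJECT mo_file_access TYPE zcl_ocr_gui_file_access.
ENDMETHOD.

* with constructor injection: the caller decides which
* implementation is used
METHOD constructor.
  mo_file_access = io_file_access.
ENDMETHOD.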
However, this approach holds another challenge, which has been described very clearly by Oliver in a reply to the previous blog post:
“Injection in the constructor – even to an interface – looks like quite tight coupling to me as it leaves the consumer of the coarse-grained class (the algorithm, in your case) to instantiate the composites.”

Factory

Of course the consumer of the composite object model should not create it on its own. One possibility is to utilize the factory pattern. Instead of composing the main object and all of its dependent objects itself, the consumer delegates this task to a factory method, which in its simplest realization could be a static method call:
DATA lo_parser TYPE REF TO ZIF_OCR_PARSING_ALGORITHM.
lo_parser = ZCL_PARSER_FACTORY=>CREATE_NEW_PARSER( ).

Within CREATE_NEW_PARSER( ), the same magic takes place as before: in the first step, all low-level objects are created; in the second step, the main parser object is created while the dependencies declared by its CONSTRUCTOR are satisfied. This approach is called “poor man's dependency injection”:

*alternative 1: poor man's DI
DATA lo_ocr TYPE REF TO zif_ocr_parsing_algorithm.
DATA lo_char_sep TYPE REF TO zif_ocr_characters_seperator.
DATA lo_char_parser TYPE REF TO zif_ocr_character_parser.
DATA lo_file_access TYPE REF TO zif_ocr_file_access.
DATA lo_lines_seperator TYPE REF TO zif_ocr_lines_seperator.

CREATE OBJECT lo_char_sep TYPE zcl_ocr_characters_seperator.
CREATE OBJECT lo_char_parser TYPE zcl_ocr_character_parser.
CREATE OBJECT lo_file_access TYPE zcl_ocr_gui_file_access.
CREATE OBJECT lo_lines_seperator TYPE zcl_ocr_lines_seperator.

CREATE OBJECT lo_ocr TYPE zcl_ocr_parsing_algorithm
  EXPORTING
    io_character_parser     = lo_char_parser
    io_characters_seperator = lo_char_sep
    io_file_access          = lo_file_access
    io_lines_seperator      = lo_lines_seperator.
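
Wrapped into the factory, the same creation logic could look like this (a sketch of one possible implementation of ZCL_PARSER_FACTORY):

CLASS zcl_parser_factory DEFINITION CREATE PRIVATE.
  PUBLIC SECTION.
    CLASS-METHODS create_new_parser
      RETURNING VALUE(ro_parser) TYPE REF TO zif_ocr_parsing_algorithm.
ENDCLASS.

CLASS zcl_parser_factory IMPLEMENTATION.
  METHOD create_new_parser.
    DATA lo_char_sep TYPE REF TO zif_ocr_characters_seperator.
    DATA lo_char_parser TYPE REF TO zif_ocr_character_parser.
    DATA lo_file_access TYPE REF TO zif_ocr_file_access.
    DATA lo_lines_seperator TYPE REF TO zif_ocr_lines_seperator.

*   create the low-level objects first...
    CREATE OBJECT lo_char_sep TYPE zcl_ocr_characters_seperator.
    CREATE OBJECT lo_char_parser TYPE zcl_ocr_character_parser.
    CREATE OBJECT lo_file_access TYPE zcl_ocr_gui_file_access.
    CREATE OBJECT lo_lines_seperator TYPE zcl_ocr_lines_seperator.

*   ...then create the parser with its dependencies satisfied
    CREATE OBJECT ro_parser TYPE zcl_ocr_parsing_algorithm
      EXPORTING
        io_character_parser     = lo_char_parser
        io_characters_seperator = lo_char_sep
        io_file_access          = lo_file_access
        io_lines_seperator      = lo_lines_seperator.
  ENDMETHOD.
ENDCLASS.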

IoC Container

However, the task of manually creating dependent objects in order to hand them over to a coarse-grained object upon its creation can be automated. Think of the facts that are known:

  • all low-level interfaces like ZIF_OCR_CHARACTER_PARSER each have exactly one implementing class
  • instances of these classes can be created dynamically, as their constructors currently require no input parameters
  • the coarse-grained interface ZIF_OCR_PARSING_ALGORITHM also has a specific class that implements it
  • however, its constructor requires instances of the low-level interfaces
  • as these required parameters are instances of the low-level interfaces, these dependencies can be satisfied automatically

In brief, this is exactly what an IoC container should do; such containers are also referred to as DI containers.
The registration of the interfaces and classes is done programmatically in the following example. However, it could also be moved to customizing in order to “free” the code of hard references to explicit class names:

*alternative 2: IoC Container
DATA lo_ioc_container TYPE REF TO /leos/if_ioc_container.
lo_ioc_container = /leos/cl_ioc_container=>create_empty_instance( ).
lo_ioc_container->add_implementer( iv_contract = 'zif_ocr_characters_seperator' iv_implementer = 'zcl_ocr_characters_seperator' ).
lo_ioc_container->add_implementer( iv_contract = 'zif_ocr_character_parser' iv_implementer = 'zcl_ocr_character_parser' ).
lo_ioc_container->add_implementer( iv_contract = 'zif_ocr_file_access' iv_implementer = 'zcl_ocr_gui_file_access' ).
lo_ioc_container->add_implementer( iv_contract = 'zif_ocr_lines_seperator' iv_implementer = 'zcl_ocr_lines_seperator' ).
lo_ioc_container->add_implementer( iv_contract = 'zif_ocr_parsing_algorithm' iv_implementer = 'zcl_ocr_parsing_algorithm' ).

No matter how the classes have been registered, whether in code or in customizing, the caller now only needs to call the container and request an instance from it at startup:

DATA lo_ocr TYPE REF TO zif_ocr_parsing_algorithm.
*create the container either by reference to customizing or by the registration coding shown above...
*...
lo_ocr ?= lo_ioc_container->get_implementer( iv_contract = 'zif_ocr_parsing_algorithm' ).

Why should we write clean code?

Dojo

Recently I took part in an architecture dojo. This is an exercise in which a simplified program specification is given and the participants are asked to propose an architecture for solving the task. Architecture dojos usually do not focus on the code level, but rather on a more coarse-grained level: the structure of the program rather than the code itself.
The goal is to present a good architecture after a certain period of time. But what makes an architecture good, and what makes it bad? The outcome should meet certain requirements like evolvability, testability and alignment with clean code principles.

The specification

There is OCR software which scans paper documents containing document numbers. It produces a file with multiple entries.
One logical line consists of four physical lines and may look like this:
Digits
Each character consists of three horizontal characters over four lines. Only pipes and underscores are allowed, and they can be arranged to represent a certain number.
The fourth line contains no data, as it serves as a delimiter for the next document number.
You will have to write a program that reads this file from a client computer, parses it and outputs the actual document numbers on the screen.
Please keep in mind: In the future, the source where the files are uploaded from may change from client side storage to server side storage. Additionally the solution should be easily testable using unit tests.

Executing the program

Of course the task of the dojo was not to actually implement the program. But here it is.
This is the sample file content.
Sample File
And this is the output of the program.
sample output

Bringing design principles into play

To meet the requirements given in the specification, a single monolithic program is not an option.
A single program which reads the file, converts it and parses it within one huge method would not be easily testable.
Additionally, any other source the file may come from would have to be coded directly into that method. Such a change would break the encapsulation.
This brings us to the question of how clean code principles might help us enforce a good architecture.
Especially the following principles should be considered:

  • Single Responsibility Principle (SRP)
  • Open Closed Principle (OCP)
  • Single Level of Abstraction (SLA)

No more buzzwords!

What do all these principles mean? Let’s have a short briefing.

Single Responsibility Principle (SRP)

A class should have only one responsibility. Always try to give the class you are currently developing a description of what it does. If you need to use the word “and”, or can hardly cope without it, there could be a violation of the SRP.

Open Closed Principle (OCP)

A class should be open for extension, but closed for modification. This usually only works for certain kinds of changes, not for every thinkable future change. A list of what could change in the program we need to design in the dojo is mentioned in the specification.
For example, if the way the file is accessed changes, the change should not affect existing code in the core program. Every new data source has to be made available through extension, not through modification.
But please remember: as so often, the list of possible changes mentioned above might not be complete.

Single Level of Abstraction (SLA)

“Each piece of code should talk on a distinctive level of granularity. Don’t mix implementation details with invocation of high-level abstractions. Refactor code to balance the level of abstraction.” (http://lumiera.org/project/background/CleanCodeDevelopment.html)
If we apply this principle to the future design, there should be at least distinct methods to access the file contents, to group the logical lines into sets of physical lines, and to split the logical characters which span multiple physical lines into distinct chunks of 3 characters x 4 lines. The calls to these and several more methods will have to be arranged by another, more coarse-grained method.
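
Applied to the parser designed in the following sections, such a coarse-grained method reads like a table of contents, with every statement on the same level of abstraction (a sketch; the method and attribute names are assumptions):

METHOD zif_ocr_parsing_algorithm~parse.
  DATA lt_contents TYPE string_table.
  DATA lt_logical_lines TYPE STANDARD TABLE OF string_table.
  DATA lt_line TYPE string_table.
  DATA lv_number TYPE string.

* each step delegates to a dedicated component
  lt_contents = mo_file_access->read_file( ).
  lt_logical_lines = mo_lines_seperator->separate_lines( lt_contents ).
  LOOP AT lt_logical_lines INTO lt_line.
*   character separation and parsing happen one level below,
*   inside a private helper method
    lv_number = parse_single_line( lt_line ).
    APPEND lv_number TO rt_account_numbers.
  ENDLOOP.
ENDMETHOD.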

Okay, so what needs to be done?

Let’s get an overview of the different responsibilities of the program. First of all, there will be some kind of file access. Once we have the file in memory, it has to be separated into chunks of four lines each; this is another responsibility, which has nothing to do with how the file is accessed. Then, each chunk consisting of string lines needs to be processed by yet another functionality, which extracts the characters out of the chunks. As these characters are still encoded (every character spans 3 characters x 4 lines in the file contents), there will be another function that parses them. In the end, the parsed characters need to be concatenated and returned by a component which arranges the calls to the functionalities described.

File access

The access to the file contents is a functionality which will be replaced later on by another implementation (client-side upload vs. server-side upload); this is already mentioned in the specification. We decide that each of these functionalities is implemented by a specific class. These classes need to implement a common interface, as callers of these functionalities should not need to care about how the file access is implemented in particular. The first implementation of this interface will have to read a file from the client computer.
ZIF_OCR_FILE_ACCESS
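
As the interface definition above is only shown as a picture, here is a textual sketch of what it might contain (the method name and parameters are assumptions):

INTERFACE zif_ocr_file_access.
* returns the raw file contents line by line
  METHODS read_file
    RETURNING VALUE(rt_contents) TYPE string_table.
ENDINTERFACE.

The first implementation, zcl_ocr_gui_file_access, could then simply wrap the frontend upload:

CLASS zcl_ocr_gui_file_access DEFINITION.
  PUBLIC SECTION.
    INTERFACES zif_ocr_file_access.
ENDCLASS.

CLASS zcl_ocr_gui_file_access IMPLEMENTATION.
  METHOD zif_ocr_file_access~read_file.
*   reads the scanned file from the client computer
*   (the file path is hypothetical)
    cl_gui_frontend_services=>gui_upload(
      EXPORTING
        filename = 'C:\temp\ocr_input.txt'
      CHANGING
        data_tab = rt_contents ).
  ENDMETHOD.
ENDCLASS.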

Lines Separation

Each logical line which represents one account number consists of four physical lines. We will need to split the file contents into chunks of four lines each. This responsibility is specified by another interface.
If any detail relevant to the line separation should change in the future (e.g. if another empty line besides the fourth one is introduced), the caller of this functionality should not have to care about it. Instead, we just describe the functionality by an interface which will be implemented by a specific class later on. The output will be of TYPE TABLE OF STRING_TABLE (a custom type).
ZIF_OCR_LINES_SEPERATOR
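
A textual sketch of this interface could look as follows (the names and the custom table type are assumptions):

INTERFACE zif_ocr_lines_seperator.
* groups the raw lines into chunks of four physical lines,
* one chunk per document number
  METHODS separate_lines
    IMPORTING
      it_contents TYPE string_table
    RETURNING
      VALUE(rt_logical_lines) TYPE zocr_t_logical_lines. " hypothetical: TABLE OF STRING_TABLE
ENDINTERFACE.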

Character separation

Each logical line consisting of four physical lines should be separated into chunks which represent the logical characters (that is, chunks of 3 x 4 characters). So here is another interface which describes this interaction. Character separation takes place for each logical line, so the input will be a STRING_TABLE and the output another custom type which represents a nested table of logical characters. A logical character representation is TYPE TABLE OF CHAR03; a collection of them is just another nested table of TYPE ZOCR_T_LOGICAL_CHARACTERS.
ZIF_OCR_CHARACTERS_SEPERATOR
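
The custom types and the interface could be sketched like this (only ZOCR_T_LOGICAL_CHARACTERS is named in the text; the remaining names are assumed):

* one logical character: 3 columns x 4 lines
TYPES ty_logical_character TYPE STANDARD TABLE OF char03 WITH DEFAULT KEY.
* all logical characters of one logical line
TYPES zocr_t_logical_characters TYPE STANDARD TABLE OF ty_logical_character WITH DEFAULT KEY.

INTERFACE zif_ocr_characters_seperator.
  METHODS separate_characters
    IMPORTING
      it_logical_line TYPE string_table
    RETURNING
      VALUE(rt_characters) TYPE zocr_t_logical_characters.
ENDINTERFACE.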

Character Parsing

Every logical chunk representing a real character which is relevant for the output must be parsed.
ZIF_OCR_CHARACTER_PARSER

Who is the organizer?

Of course there should be a class which organizes the calls to the string manipulation and file access components.
Here it is. The interface requires the method to return a list of account numbers in string format.
ZIF_OCR_PARSING_ALGORITHM

The whole picture

What does it look like now? How does the implementer of ZIF_OCR_PARSING_ALGORITHM know all the other components which are to be used?
The answer is “constructor injection”. Every dependency is handed over to the implementer of ZIF_OCR_PARSING_ALGORITHM when the CREATE OBJECT statement that creates this instance is executed. These dependencies are themselves instances of specific classes which implement the interfaces introduced above.
There will be another caller which puts all of these functional bricks together, but for now this information should be comprehensive enough to form a big picture:
Big Picture

Benefits

This approach of having so many small classes might look a little strange to someone who has been developing in ABAP for years. But it has its benefits. The resulting structure is easily maintainable, as interfaces hide the implementation details of classes and their sub-components (that is, other, more fine-grained classes). The implementations behind these interfaces can be switched easily without changing existing code within the core parser.
Additionally, unit tests can be applied quite easily to each of these implementations. Unit tests have the goal of documenting and testing behavior consistently, comprehensibly and repeatably. This saves testing effort in integration tests and tracks down possible issues in the code to specific classes, just by executing the unit tests within milliseconds.
Writing a testable program and applying unit tests to it usually takes two to three times as long as writing it quick & dirty. But when it comes to testing and when issues start to arise, the test-based approach gains speed in terms of defect rate and time to localize an error. In the end, it usually saves more time than it costs to do it properly.