Performing source code review is one of the important skills that you should pick up as an Application Security engineer. Being a polyglot in programming is helpful because you might be reviewing source code written in different languages. Right now, I seldom see anyone mention a systematic approach to reading source code when you are new to the codebase or the language.
Reading source code is an underrated skill in today’s programming education. Often when we want to learn programming, we are advised to build projects and write more code. However, another aspect of learning programming is reading the code of other programmers.
This is why I think “The Programmer’s Brain” is one of the more insightful programming books to read, because it discusses different aspects of being a programmer:
– How to get better at reading code?
– How to get better at thinking about code?
– How to get better at writing code?
Get better at reading code!
Cognitive processes in relation to programming
We can model our cognitive processes with the below diagram:
At a high level, when we start reading the code, information relating to the code enters our Short-term memory. Think of a time when you held some information for a short period, such as a phone number or a quick task to complete. This is the use of short-term memory.
Then, when we are thinking about and interpreting the code, the information moves from our Short-term memory into our Working memory. The information is processed in our Working memory to generate new understanding / meaning.
We also retrieve related knowledge from our Long-term memory. For example, knowing that console.log(...) will print out information is based on our Long-term memory.
In short, when we are reading code, different cognitive processes are engaged to comprehend the code and to perform certain actions later, such as changing the code or adding new code.
Why do we get confused when reading code?
1) Lack of knowledge
Sometimes you get confused when reading source code because you are unfamiliar with the language syntax or concepts. Or you might be unfamiliar with the specific industry / domain knowledge that the code is written for. In this context, we say that you lack the knowledge to understand the code.
For example, below is a snippet of Racket code. If you do not understand LISP-like code or the concept of lambda, then you will not know how to evaluate ((double inc) 5).
(define (inc n) (+ n 1))
(define (double f) (lambda (x) (f (f x))))
((double inc) 5)
To understand what is going on:
– You need to know the syntax of a function in LISP.
– You need to understand that you can pass a function into another function.
– In this case, the function inc is passed to the function double. Inside double, the inc function is applied twice.
– ((double inc) 5) will evaluate to 7.
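If the LISP syntax is the obstacle, it can help to restate the same idea in a language you already know. Below is a rough Python translation of the Racket snippet. It is my own sketch, not code from the book, and it only mirrors the two definitions above.

```python
def inc(n):
    """Return n plus one, like the Racket `inc`."""
    return n + 1


def double(f):
    """Return a new function that applies f twice to its argument."""
    return lambda x: f(f(x))


# double(inc) builds a function; calling it with 5 computes inc(inc(5)).
result = double(inc)(5)
print(result)  # 7
```

Reading it this way shows that double does not compute a number itself. It returns a function, and that returned function is what gets called with 5.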
To understand in terms of cognitive processes, we are unable to retrieve any information from our Long-term memory that can be used by the working memory to evaluate the code.
2) Lack of information
You may be unfamiliar with how a particular method works or what a class is for, because that information cannot be retrieved directly from the code itself.
For example, we can see a Python function that seems to filter a group of members by name. But we do not know exactly how the names are going to be filtered. Hence we might guess, or search the codebase for additional information.
We are temporarily confused because we cannot understand the function immediately from the code itself.
def filter_members(all_members):
    return filter_by_name(all_members)
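To make the gap concrete, here is a sketch where the missing information lives in a helper defined somewhere else. The body of filter_by_name and its rule (keep names starting with "A") are invented purely for illustration. The original snippet does not say what the filter actually does.

```python
# Hypothetical helper, imagined to live in another module. The rule
# (keep names starting with "A") is an assumption made up for this sketch.
def filter_by_name(members):
    return [m for m in members if m.startswith("A")]


def filter_members(all_members):
    # Reading only this line, we cannot tell which names are kept.
    return filter_by_name(all_members)


print(filter_members(["Alice", "Bob", "Aaron"]))  # ['Alice', 'Aaron']
```

Only after looking up the helper does the behaviour of filter_members become clear. The confusion came from missing information, not from missing knowledge of Python.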
3) Lack of processing power
There are too many items to hold in your working memory at once.
The code is written in an overly complex manner.
Why is reading unfamiliar code hard?
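As a small illustration of this third kind of confusion, the two functions below do the same thing. Both are my own examples, not taken from the book. The first packs a loop, a bit shift, a mask and a join into one expression, so you have to track all of them at once. The second spreads the work over steps that can be followed one at a time.

```python
def to_binary_dense(n):
    # Everything happens in one expression: a loop, a shift, a mask,
    # a string conversion and a join, all to be tracked at once.
    return "".join(str((n >> i) & 1) for i in range(n.bit_length() - 1, -1, -1)) or "0"


def to_binary_stepwise(n):
    # Same result, but each step can be followed on its own.
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # lowest bit first
        n //= 2
    digits.reverse()  # highest bit first
    return "".join(digits)


print(to_binary_dense(5), to_binary_stepwise(5))  # 101 101
```

Neither version needs knowledge or information you do not have. The dense one is harder only because it asks your working memory to hold more at the same time.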
Types of cognitive loads
Difference between experts and novices?
Learning language syntax quickly
While coding a program, we might forget how a particular concept works or what its syntax is. Often, we will just quickly google for an answer (either from the documentation or Stack Overflow).
The author argues it is better to know some of this syntax by heart rather than googling for answers all the time. First of all, googling is distracting, as you might be tempted to do something else once you are in your browser (especially with multiple tabs open). Second, you need to be able to recall the language syntax from your long-term memory in order to chunk the code that you are reading effectively.
Our long-term memories decay in a pattern known as the forgetting curve. If we don’t recall the things that we learned within a certain period of time, we will forget them. But if we recall the memories at the right moments, the decay is slowed down.
We know this very well from cramming for an exam: we might forget everything a few days after the exam is over. Without future reminders, we will struggle to retrieve a concept or syntax from our long-term memory.
What can you do to recall the syntax effectively?
It is not efficient to recall every concept every single day. Instead, we can use spaced repetition system (SRS) software to automate the reminders for us based on how well we think we can recall a particular concept.
Note: I have tried a spaced repetition system and struggled to integrate it into my everyday workflow. It’s not because the SRS methodology does not work. Rather, it takes commitment to adopt SRS well and make it a habit to go through your deck every day. I will keep trying and see how we can integrate SRS better into our lives.
Adopt a spaced repetition system (SRS) and add new knowledge to your deck. This can be when you are learning a new programming language, a new concept or a new framework. You will need to use judgment to decide which concepts or syntax to include, as you can’t add everything you learn to the deck.
Also, when you find yourself googling for an answer, add a new card to your deck. Having to look it up shows that you have not yet learned the concept or syntax by heart. For example, most programmers know how to write a for-loop in their language. If they do not, it means that they have not learned the language deeply yet.
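The scheduling idea behind an SRS can be sketched in a few lines. This is a deliberately simplified model that I made up for illustration: the interval doubles after a successful recall and resets to one day after a failure. Real SRS tools use more elaborate rules, and the Card class and review function here are hypothetical names, not any tool's API.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Card:
    prompt: str
    answer: str
    interval_days: int = 1  # days to wait before the next review


def review(card, recalled, today):
    """Update the card's interval and return the next due date."""
    if recalled:
        card.interval_days *= 2  # remembered: wait twice as long
    else:
        card.interval_days = 1   # forgotten: review again tomorrow
    return today + timedelta(days=card.interval_days)


card = Card("Python: loop over a list", "for item in items:")
first_due = review(card, recalled=True, today=date(2024, 1, 1))
second_due = review(card, recalled=True, today=first_due)
print(first_due, second_due)  # 2024-01-03 2024-01-07
```

The growing gaps are the point: each successful recall pushes the next reminder further out, so well-known cards take up less and less of your daily review.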
Further reading on SRS
Another way to recall syntax better is to actively think about the concepts that you are studying. It is easier for us to recall something if it is related to something that we already know. When we relate a new concept / syntax to our existing knowledge, we have a better chance of recalling it in the future. Some prompts to think about:
- Think of and write down the concepts that you think are related to this new concept or syntax.
- In what ways are they similar and different?
- Think of variants of code that can achieve the same goals as this new concept or syntax.
- How important is this concept or syntax to the language, framework or codebase etc.?
Reducing cognitive loads when reading complex code
- Refactoring code temporarily
- Replacing unfamiliar language constructs
- Adding the concepts that you are confused about to your SRS.
- Working memory aids
- Create Dependency graph
- Using a state table
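Two of the techniques listed above can be combined in one small sketch: temporarily replacing an unfamiliar construct (here, a list comprehension) with an explicit loop, and recording a state table while tracing it. The functions are my own examples, and the table layout is just one possible choice.

```python
# Original form: compact, but hard to read if comprehensions are unfamiliar.
def squares_of_evens(numbers):
    return [n * n for n in numbers if n % 2 == 0]


# Temporary rewrite with an explicit loop, plus a state table that
# records the variables after each iteration.
def squares_of_evens_traced(numbers):
    result = []
    state_table = []  # rows of (n, is_even, result so far)
    for n in numbers:
        is_even = n % 2 == 0
        if is_even:
            result.append(n * n)
        state_table.append((n, is_even, list(result)))
    return result, state_table


result, table = squares_of_evens_traced([1, 2, 3, 4])
for row in table:
    print(row)
# (1, False, [])
# (2, True, [4])
# (3, False, [4])
# (4, True, [4, 16])
```

The rewrite is meant to be thrown away once the code is understood. Its only job is to move the intermediate values out of your working memory and onto the page.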
Think about code better!
Reaching deeper understanding of the codebase
Get better at solving programming problems
Avoiding bugs (misconceptions in thinking)
Write better code!
Naming things better
- Name moulds
- Feitelson’s three-step model
Avoiding code smells and cognitive loads
- 22 code smells from Martin Fowler’s Refactoring
- Arnaoudova’s six linguistic antipatterns
Get better at solving complex programming problems
- Learn from code and its explanation
- Germane Load
Practices that you can do
A) Read different codebases and attempt to understand what each piece of code is doing.
1. Choose a code base to read
Choose a codebase where you have at least some knowledge of the programming language. You should have a high-level understanding of what the code does.
2. Select a code snippet and study it for two minutes.
Choose a method, function or other coherent piece of code that is about half a page, or at most 50 lines.
3. Reproduce the code on paper or in an IDE (new file).
4. Reflect on what you have produced.
Which lines do you find easy and which lines are difficult?
Are the lines of code unfamiliar to you because of the programming concepts or because of the domain knowledge?