
Category: Programming

Reading Git Blob objects in Java

To read Git blob objects, we need to know that Git compresses its stored objects with zlib. We can use the classes in java.util.zip to decompress a Git blob.

Code Snippets

Below are two ways of decompressing the blob file. If you search online, you will find many examples showing Method 1.

But I recommend Method 2 because it does not assume the size of the decompressed file. This method is also used in the Apache Commons library.

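Make sure the binary file is not corrupted, or Inflater.inflate() will throw java.util.zip.DataFormatException (stream-based classes such as InflaterInputStream report the same problem as java.util.zip.ZipException).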

Method 1

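The result byte array is given a fixed, arbitrarily chosen size. This causes problems when the size of the decompressed file is not known in advance, because any output beyond the buffer is not read.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.Inflater;

String file = "<PATH to Git blob>";
byte[] fileBytes = Files.readAllBytes(Paths.get(file));
Inflater decompresser = new Inflater();
decompresser.setInput(fileBytes, 0, fileBytes.length);

byte[] result = new byte[1024]; // Fixed size: output beyond 1024 bytes is not read

// inflate() returns the number of bytes written into result
int resultLength = decompresser.inflate(result);

decompresser.end(); // Release native resources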

Method 2: Checks if end of compressed data is reached

This method inflates the compressed data chunk by chunk and writes the output to a ByteArrayOutputStream object.

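import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.Inflater;

String file = "<PATH to Git blob>";
byte[] fileBytes = Files.readAllBytes(Paths.get(file));

Inflater inflater = new Inflater();
inflater.setInput(fileBytes);

ByteArrayOutputStream outputStream = new ByteArrayOutputStream(fileBytes.length);
byte[] buffer = new byte[1024];

// Inflate chunk by chunk until the end of the compressed stream is reached
while (!inflater.finished()) {
  int count = inflater.inflate(buffer);
  if (count == 0 && inflater.needsInput()) {
    break; // Truncated input: stop instead of looping forever
  }
  outputStream.write(buffer, 0, count);
}

inflater.end(); // Release native resources
outputStream.close();

byte[] result = outputStream.toByteArray();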
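
For a quick cross-check of the Java output, the same decompression can be sketched in Python with the standard zlib module. This is only a sketch: the helper name parse_git_object is hypothetical, and it assumes a loose object whose decompressed bytes start with a header of the form "blob <size>" followed by a NUL byte.

```python
import zlib

def parse_git_object(compressed: bytes):
    """Decompress a loose Git object and split its header from the content."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\x00")  # header looks like b"blob 12"
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode(), content

# Build a tiny object in memory instead of reading one from .git/objects
sample = zlib.compress(b"blob 12\x00hello world\n")
obj_type, content = parse_git_object(sample)
print(obj_type, content)
```

If the Java result array starts with the same kind of header, the decompression worked and the blob content begins right after the NUL byte.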


The Programmer’s Brain – Improving your skills in reading and writing code

Performing source code review is one of the important skills that you should pick up as an Application Security engineer. Being a polyglot in programming is helpful because you might be reviewing source code written in different languages. Yet I seldom see anyone mention a systematic approach to reading source code when you are new to the codebase or the language.

Reading source code is an underrated skill in today’s programming education. Often when we want to learn programming, we are advised to build projects and write more code. However, another aspect of learning programming is reading the code of other programmers.

This is why I think “The Programmer’s Brain” is one of the more insightful programming books to read: it discusses different aspects of being a programmer:
– How to get better at reading code?
– How to get better at thinking about code?
– How to get better at writing code?

Get better at reading code!

Cognitive Processes in relation to programming

We can model our cognitive processes with the below diagram:

Hermans, F. (2021). Programmer’s brain: What every programmer needs to know about cognition. Manning.

At a high level, when we start reading code, information about the code enters our short-term memory. Think of the times when you hold some information briefly, such as a phone number or a quick task to complete. This is short-term memory at work.

Then, when we think about and interpret the code, the information moves from our short-term memory into our working memory. The information is processed in working memory to generate new understanding or meaning.

Our working memory also retrieves information from our long-term memory and connects it with what we are reading. Think of working memory as a melting pot. For example, we can recall certain syntax patterns from long-term memory: when we read JavaScript code, we know from long-term memory that console.log(...) prints out information.

console.log(message)

In short, when we read code, different cognitive processes are engaged to comprehend it and to act on it later, for example by changing the code or adding new code.

Why do we get confused when reading code?

1) Lack of knowledge

Sometimes you get confused when reading source code because you are unfamiliar with the language syntax or concepts. Or you might be unfamiliar with the specific industry or domain that the code is written for. In this context, we say that you lack the knowledge to understand the code.

For example, below is a snippet of Racket code. If you do not understand LISP-like code or the concept of a lambda, then you will not know how to evaluate ((double inc) 5).

(define (inc n) (+ n 1))
(define (double f)
  (lambda (x) (f (f x))))

((double inc) 5)

To understand what is going on:
– You need to know the syntax of a function definition in LISP.
– You need to understand that a function can be passed into another function.
– In this case, the function inc is passed to the function double, which returns a new function that applies inc twice.
– Hence ((double inc) 5) evaluates to 7.

Example code here
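
For readers who know Python better than LISP, the Racket snippet above can be translated roughly as follows. This is a sketch for comparison only; it keeps the names inc and double from the original.

```python
def inc(n):
    return n + 1

def double(f):
    # Return a new function that applies f twice to its argument
    return lambda x: f(f(x))

result = double(inc)(5)
print(result)  # 7
```

The call double(inc) produces a function first, and only then is that function applied to 5, which mirrors the two pairs of parentheses in ((double inc) 5).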

In terms of cognitive processes, we are unable to retrieve any information from our long-term memory that the working memory can use to evaluate the code.

2) Lack of information

You may be unfamiliar with how a particular method works or what a class is for, because that information cannot be retrieved directly from the code in front of you.

For example, we can see a Python function that seems to filter a group of members by name. But we do not know exactly how the names are going to be filtered. Hence we might guess, or search the codebase for additional information.

We are temporarily confused because we cannot understand the function immediately from the code itself.

def filter_members(all_members):
  return filter_by_name(all_members)

3) Lack of processing power

Too many items to hold in your working memory.

The code is written in an overly complex manner.

Why is reading unfamiliar code hard?

Types of cognitive loads

Intrinsic

Extraneous

Germane

Difference between Experts and Novices?

Chunking

Learning language syntax quickly

While coding, we might forget how a particular concept works or what its syntax is. Often, we just google quickly for an answer (either from the documentation or Stack Overflow).

The author argues that it is better to know some of this syntax by heart rather than googling for answers all the time. First, googling is distracting, as you might be tempted to do something else once you are in your browser (especially with multiple tabs open). Second, you need to be able to recall the language syntax from your long-term memory in order to chunk the code that you are reading effectively.

Our long-term memories decay in a pattern known as the forgetting curve. If we do not recall the things that we learned within a certain period of time, we forget them. But if we recall the memories at the right times, the decay slows down.

We know this very well from cramming for an exam: we might forget everything a few days after the exam is over. Without future reminders, we will struggle to retrieve a concept or syntax from our long-term memory.

Image source: Want to Remember Everything You'll Ever Learn? Surrender to This Algorithm | WIRED
https://www.wired.com/wp-content/uploads/archive/images/article/magazine/1605/ff_wozniak_graph_f.jpg


What can you do to recall the syntax effectively?

It is not efficient to recall every concept every single day. Instead, we can use spaced repetition system (SRS) software to automate the reminders for us based on how well we think we can recall a particular concept.

Note: I have tried a spaced repetition system and struggled to integrate it into my everyday workflow. It’s not because the SRS methodology does not work. Rather, it takes commitment to adopt SRS well and to make it a habit to go through your deck every day. I will keep trying and see how SRS can be integrated better into our lives.

Adopt a spaced repetition system (SRS) and add the new knowledge that you gain to your deck. This can be a situation where you are learning a new programming language, a new concept or a framework. You will need to use your judgment to decide which concepts or syntax to include, as you can’t add everything you learn to the deck.

Adding a new card on JavaScript’s Array.prototype.filter method

Also, when you find yourself googling for an answer, add a new card to your deck. The search shows that you have not yet learned the concept or syntax by heart. For example, most programmers know how to write a for-loop in their language. If they do not, it means that they have not learned the language deeply yet.
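
To make the idea of automated reminders concrete, here is a toy scheduler. It is a minimal sketch under a stated assumption: the interval simply doubles after a successful recall and resets to one day after a miss. This is not the algorithm of any particular SRS software, and the names Card and review are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Card:
    prompt: str
    interval_days: int = 1          # gap until the next review
    due: date = date(2024, 1, 1)    # arbitrary start date for the example

def review(card: Card, today: date, recalled: bool) -> Card:
    """Double the interval after a successful recall, reset it after a miss."""
    interval = card.interval_days * 2 if recalled else 1
    return Card(card.prompt, interval, today + timedelta(days=interval))

card = Card("JavaScript: Array.prototype.filter")
card = review(card, date(2024, 1, 1), recalled=True)   # interval becomes 2
card = review(card, card.due, recalled=True)           # interval becomes 4
card = review(card, card.due, recalled=False)          # interval resets to 1
print(card)
```

Cards that you keep recalling drift further apart, while cards that you miss come back the next day, so the daily review load stays focused on what you have not yet learned by heart.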

Further reading on SRS

Memorizing a programming language using spaced repetition software by Derek Sivers

Augmenting Long-term Memory

Spaced Repetition for Efficient Learning by Gwern

Effective learning: Twenty rules of formulating knowledge

Another way to recall syntax better is to actively think about the concepts that you are studying. It is easier to recall something if it is related to something that we already know. When we relate a new concept or syntax to our existing knowledge, we have a better chance of recalling it in the future. Some things to think about:

  • Think about and write down the concepts that you think are related to this new concept or syntax.
    • In what ways are they similar and different?
  • Think of variants of code that can achieve the same goals as this new concept or syntax.
  • How important is this concept or syntax to the language, framework or codebase etc.?

Reducing cognitive loads when reading complex code

  • Refactoring code temporarily
  • Replacing unfamiliar language constructs
  • Adding the concepts that you are confused about to your SRS deck.
  • Working memory aids
    • Create Dependency graph
    • Using a state table

Think about code better!

Reaching deeper understanding of the codebase

Get better at solving programming problems

Avoiding bugs (misconceptions in thinking)

Write better code!

Naming things better

  • Name moulds
  • Feitelson’s three-step model

Avoiding code smells and cognitive loads

  • 22 code smells from Martin Fowler’s Refactoring
  • Arnaoudova’s six linguistic antipatterns

Get better at solving complex programming problems

  • Automatization
  • Learn from code and its explanation
  • Germane Load

Practices that you can do


A) Reading different codebases and attempting to understand what the code is doing.

1. Choose a codebase to read

Choose a codebase where you have at least some knowledge of the programming language. You should have a high-level understanding of what the code does.

2. Select a code snippet and study it for two minutes.

Choose a method, function or coherent code that is about half a page or maximum 50 lines of code.

3. Reproduce the code on paper or in an IDE (new file).

4. Reflect on what you have produced.

Which lines do you find easy and which lines are difficult?

Are the lines of code unfamiliar to you because of the programming concepts or because of the domain knowledge?

B) Reading and using more programming concepts

Learning JavaScript as Beginner?

Besides learning about Python asyncio, I have spent the last few weeks learning JavaScript for web development. My approach is to draw on multiple resources (books, blogs and MOOCs).

Why multiple resources that explain the same concepts?
If you use different resources, you will be exposed to each concept in different contexts. This is especially useful for beginners, so that they do not get stuck in one context. You need to understand the concept in different situations.

Also, feel free to modify the tutorial steps. Add in anything that is interesting. Apply previously learned knowledge to the tutorial. Combine two different concepts. In short, be active in experimenting.

For this post, I want to share some resources that are useful in my journey of learning JavaScript.

The Modern JavaScript Bootcamp

https://www.udemy.com/share/1013A0AkIdcFlTTHw=/

If you are starting to learn JavaScript from the basics, please use this course. I find that there is a balanced mix of explanation and practical usage of concepts.

One particularly useful feature is the challenges that the course instructor gives to the students. After the instructor demonstrates a practical concept, you are expected to complete a variant of the demo.

Eloquent JavaScript


Disclaimer: I cannot give a complete review since I have only completed the earlier chapters (1-7). In future, I will read the remaining chapters.
The book introduces foundational programming knowledge. If you are new to programming, you can consider reading chapters 1 – 7 to learn the fundamentals. The chapter on JavaScript's built-in functions for arrays (e.g. forEach(...), filter(...) and map(...)) was useful later on when I studied with the other MOOCs.

I also advise beginners to try the few challenges that are available at the end of each chapter. I consolidated my understanding by doing these challenges. Some of the challenges may be difficult, so feel free to refer to the solution code (this is not school).

The Complete React Developer Course

https://www.udemy.com/share/101XgIAkIdcFlTTHw=/

Before you take this React course, I suggest that you take The Modern JavaScript Bootcamp. At the same time, you should create a few demo web applications. If you want to learn web development, then React is one of the JS frameworks that you need to learn. Why? Because of its wide adoption. Two things that I like about React are its rendering speed and JSX (JavaScript XML).

Food for Thought:
React seems like a powerful framework that allows the application to process and compute data on the client side. Does this mean that more applications will start to perform business logic workflows on the client side and forget about backend validation?

Python: Asyncio and Aiohttp

Introduction

Suppose your program is trying to execute a series of tasks (1 – 6). If each task takes a different amount of time to complete, a sequential program still has to wait for each task to finish before it can start the next one.

Asyncio will be useful in such scenarios because it enables the program to continue running other tasks while waiting for the specific task to be completed. In order to use Asyncio, you will need to use compatible libraries. For example, instead of using requests (a popular HTTP library in Python), you will use aiohttp.
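
The difference can be seen in a small timing sketch. It is illustrative only: asyncio.sleep stands in for real I/O such as an HTTP request, and the helper names (task, run_sequentially, run_concurrently, timed) are made up for this example.

```python
import asyncio
import time

async def task(name: str, seconds: float) -> str:
    # asyncio.sleep stands in for a network call or other I/O wait
    await asyncio.sleep(seconds)
    return name

async def run_sequentially(delays):
    # Each task must finish before the next one starts
    return [await task(f"task-{i}", d) for i, d in enumerate(delays, 1)]

async def run_concurrently(delays):
    coros = [task(f"task-{i}", d) for i, d in enumerate(delays, 1)]
    return await asyncio.gather(*coros)  # results keep the input order

def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

delays = [0.3, 0.1, 0.2]
seq_result, seq_time = timed(run_sequentially(delays))
con_result, con_time = timed(run_concurrently(delays))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

The sequential run takes roughly the sum of the delays, while the concurrent run takes roughly as long as the slowest task, because the waits overlap.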

Note: In order to install aiohttp library in Windows system, you will need to download and install Microsoft C++ Build Tools. https://visualstudio.microsoft.com/visual-cpp-build-tools/

When to use Asyncio?

  • You want to speed up a specific part of your program where you are running a list of tasks sequentially for a large number of items.
  • Suppose you are making API calls based on a list of different values for a parameter, you can use asyncio and aiohttp to make the API requests.
  • You do not need to change your entire program to use async/await syntax. Try to observe which part of the program is a bottleneck and explore how asyncio can improve performance on this particular flow.
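
The second scenario, one API call per parameter value, might look like the sketch below. It is an assumption-laden illustration: fetch_club is a hypothetical stand-in that only sleeps (with aiohttp it would be an awaited GET inside a shared ClientSession), the club names are sample values, and the semaphore limit of 3 is arbitrary.

```python
import asyncio

async def fetch_club(name: str) -> dict:
    # Hypothetical stand-in for an HTTP request to an API
    await asyncio.sleep(0.05)
    return {"club": name, "status": 200}

async def fetch_all(names, limit: int = 3):
    semaphore = asyncio.Semaphore(limit)  # cap the number of in-flight calls

    async def bounded(name):
        async with semaphore:
            return await fetch_club(name)

    return await asyncio.gather(*(bounded(n) for n in names))

clubs = ["Arsenal", "Chelsea", "Liverpool", "Everton"]  # sample values
results = asyncio.run(fetch_all(clubs))
print(results)
```

Capping concurrency with a semaphore is a design choice for being polite to the remote server; only fetch_all needs to be async, and the rest of the program can stay synchronous and call it through asyncio.run.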

Example: Crawling Wikipedia for info on Football (Soccer) Clubs

In this demo, we are going to perform the list of tasks below:

  1. Read the list of football clubs from a csv file.
  2. Get the Wikipedia URL of each Football club.
  3. Get the Wikipedia HTML page of each Football club.
  4. Write the HTML page into a HTML file for each Football Club.
  5. From the HTML page, we need to parse for information (Full Name, Ground, Founded Date etc.) using BeautifulSoup library.
  6. For the information of each club, we want to append the information into a Dataframe.
  7. Finally, print out the Dataframe to see if the information is correct.

See the synchronous example and the asynchronous example in my GitHub repo. If we execute both scripts, we can see an estimated difference: asyncio completes the execution about 20-30% faster.

Execution time for Asyncio : 17.885913610458374
Execution time for Synchronous: 23.075875997543335

Refactor Tips

  • As a practice, a main co-routine is often defined and run in an event loop (e.g. asyncio.run(main())). All the other co-routines are then awaited inside main.
  • If the request has a consistent response time, then you should stick to the synchronous approach. For example, if you are using Pandas, then you should use apply() on a function. For the parts of the program that are bottlenecks, you should try asyncio to see if the performance improves.

Key Terms

Event Loop. You must use an event loop to run the co-routine.

# Running event loop for Python 3.7+
asyncio.run(main())

# Older syntax before Python 3.7
loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()

Co-routine

async / await. This is the syntax for defining co-routines in Python. You declare a co-routine by writing async def instead of def. await is used inside a co-routine: it suspends foo() until do_something() has a result and lets the event loop run other tasks in the meantime. Make sure that do_something() is also a co-routine (or another awaitable).

async def foo():
    x = await do_something()
    return x


GraphQL Notes

Types of Common Vulnerabilities

SQL injection

Access Control

Information Disclosure

NoSQL Injection

How to turn ON or OFF the GraphQL Interface?

GraphQL Interface (https://hostname:port/graphql)

Set the graphiql parameter to true or false. Note that you can still send queries via API requests even if the interface is turned off.


Black Hat Programming Series

I plan to work through two technical books (Black Hat Python and Black Hat Go).

One of my motivations for going through these books is to understand how to build tools for content discovery and brute-forcing. I would also like to develop my Python scripting skills further.

In Black Hat Python, the sample code for the chapters is in Python 2. I decided to convert the Python 2 code to Python 3. I will also use libraries such as requests to replace some of the steps that were performed by urllib and urllib2.

Here are some sample projects from Black Hat Python that were converted to Python 3:

Web Application Mapper
Once you have identified the open source technology used by the target web app, you can download its source code to a local directory. The mapper will send requests to the target and spider it using the directory and file names found in the open source code.

The script uses the known directories of that particular technology to map out the attack surface of the web app.
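
The mapping step can be sketched as follows. This is not the book's code: the function name candidate_urls, the target http://target.example/ and the list of skipped extensions are assumptions for illustration, and the sketch only builds the URLs without sending any requests.

```python
import os
import tempfile

def candidate_urls(source_dir, base_url, skip_ext=(".jpg", ".png", ".gif", ".css")):
    """Turn every file under source_dir into a URL to request on the target."""
    urls = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            if name.lower().endswith(skip_ext):
                continue  # assumed uninteresting static assets
            rel = os.path.relpath(os.path.join(root, name), source_dir)
            urls.append(base_url.rstrip("/") + "/" + rel.replace(os.sep, "/"))
    return sorted(urls)

# Demo with a throwaway directory standing in for the downloaded source
with tempfile.TemporaryDirectory() as src:
    os.makedirs(os.path.join(src, "admin"))
    for path in ("index.php", "logo.png", os.path.join("admin", "login.php")):
        open(os.path.join(src, path), "w").close()
    urls = candidate_urls(src, "http://target.example/")
print(urls)
```

Each resulting URL can then be requested on the target to see which files of the open source project are actually reachable.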

Content Brute Forcing
In cases where you do not know the exact technology stack, you will need to brute force using a common word list. The word list can contain common directory and file names. In the book, the script allows extension brute forcing as well. I have added a filter method that allows the script to display only responses that have specific status codes (e.g. 200).

Notice that only responses with status 200 are displayed?

A common workflow that we can observe from these tooling scripts:

  • A word list or list of test cases is generated or taken from an open source project. These are added to the queue.
  • A filter or specific information list is given based on what we are interested in during our recon.
  • Brute forcing can be done faster with threads.
  • The code might be simpler with the use of requests instead of urllib.
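
The workflow above can be sketched in a few lines. This is a simplified illustration rather than the scripts from the book or my repo: the names build_queue, brute_force and fake_probe are hypothetical, and the probe is a fake function so that the example runs offline.

```python
import queue
import threading

def build_queue(words, extensions=("",)):
    """Queue every word, once per extension (the empty string means none)."""
    q = queue.Queue()
    for word in words:
        for ext in extensions:
            q.put(f"/{word}{ext}")
    return q

def brute_force(paths, probe, wanted=(200,), workers=4):
    """Probe every queued path and keep those whose status is in `wanted`."""
    hits, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                path = paths.get_nowait()
            except queue.Empty:
                return  # queue drained, this thread is done
            status = probe(path)
            if status in wanted:
                with lock:
                    hits.append((path, status))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(hits)

# Fake target so the sketch runs offline: only two paths "exist"
existing = {"/admin", "/login.php"}
fake_probe = lambda path: 200 if path in existing else 404

hits = brute_force(build_queue(["admin", "login", "backup"], ("", ".php")), fake_probe)
print(hits)
```

Against a real target, the probe could wrap something like requests.get(url).status_code, and the wanted tuple plays the role of the status code filter.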

All source code in this blog post can be found here