To read Git blob objects, we first need to know that Git compresses its stored objects
with zlib. The classes in Java's java.util.zip package can decompress a Git blob.
Code Snippets
Below are two ways to decompress the blob file. If you search online, you will find many examples showing Method 1.
I recommend Method 2, because it does not assume the size of the decompressed file. The same pattern is also used in the Apache Commons library.
Make sure the binary file is not corrupted, or you might encounter
java.util.zip.DataFormatException (or java.util.zip.ZipException if you read through InflaterInputStream)
Method 1
The result byte array has a fixed size that you pick in advance. If the decompressed data is larger than that, a single inflate call returns only the part that fits, so this approach causes problems whenever the decompressed size is unknown.
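import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.Inflater;

String file = "<PATH to Git blob>";
byte[] fileBytes = Files.readAllBytes(Paths.get(file));
Inflater decompresser = new Inflater();
decompresser.setInput(fileBytes, 0, fileBytes.length);
byte[] result = new byte[1024]; // Size must be chosen up front; larger output is cut off
// inflate() throws java.util.zip.DataFormatException on corrupt input
int resultLength = decompresser.inflate(result);
decompresser.end(); // Release native zlib resources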
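To make the limitation of Method 1 concrete, here is a minimal, self-contained sketch. It does not read a real Git blob: it compresses 5000 bytes in memory with Deflater and then inflates them once into buffers of two different sizes. The class name FixedBufferDemo and the helpers compress and inflateOnce are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FixedBufferDemo {

    // Hypothetical demo helper: zlib-compress a byte array entirely in memory.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int count = deflater.deflate(buffer);
            out.write(buffer, 0, count);
        }
        deflater.end();
        return out.toByteArray();
    }

    // One inflate() call into a buffer of the given size, as in Method 1.
    // Returns only the bytes that the single call produced.
    static byte[] inflateOnce(byte[] compressed, int bufferSize) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[bufferSize];
        try {
            int length = inflater.inflate(result);
            return Arrays.copyOf(result, length);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt zlib data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] original = new byte[5000];
        Arrays.fill(original, (byte) 'a');
        byte[] compressed = compress(original);

        // Buffer smaller than the data: the output is silently cut short.
        System.out.println(inflateOnce(compressed, 1024).length);
        // Buffer larger than the data: everything fits.
        System.out.println(inflateOnce(compressed, 8192).length);
    }
}
```

With the 1024-byte buffer no exception is thrown; the call simply returns fewer bytes than the original data, which is why the truncation is easy to miss.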
Method 2: Checks if end of compressed data is reached
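This method inflates the file content in chunks and writes each chunk to a ByteArrayOutputStream
object, looping until the Inflater reports that the end of the compressed data has been reached.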
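import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.Inflater;

String file = "<PATH to Git blob>";
byte[] fileBytes = Files.readAllBytes(Paths.get(file));
Inflater inflater = new Inflater();
inflater.setInput(fileBytes);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream(fileBytes.length);
byte[] buffer = new byte[1024];
// Loop until the end of the compressed stream is reached
while (!inflater.finished()) {
    int count = inflater.inflate(buffer); // Throws DataFormatException on corrupt data
    if (count == 0 && inflater.needsInput()) {
        break; // Input is truncated: stop instead of looping forever
    }
    outputStream.write(buffer, 0, count);
}
inflater.end(); // Release native zlib resources
outputStream.close();
byte[] result = outputStream.toByteArray();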
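Once the bytes are inflated, keep in mind that a loose Git object starts with a short header, for example "blob 5", followed by a NUL byte and then the actual file content (see the Git Internals chapter in the references). The sketch below wraps the Method 2 loop in a helper and splits the header from the content. It builds a tiny object in memory instead of reading from .git/objects, and the names GitObjectSketch, inflateAll, header, content and deflate are my own, not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class GitObjectSketch {

    // Method 2 as a helper: inflate a complete zlib stream of unknown size.
    static byte[] inflateAll(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int count = inflater.inflate(buffer);
                if (count == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("Truncated or unsupported zlib stream");
                }
                out.write(buffer, 0, count);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt zlib stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Position of the NUL byte that separates the header from the content.
    static int indexOfNul(byte[] object) {
        for (int i = 0; i < object.length; i++) {
            if (object[i] == 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("No NUL separator found");
    }

    // Header text before the NUL byte, e.g. "blob 5".
    static String header(byte[] object) {
        return new String(object, 0, indexOfNul(object), StandardCharsets.US_ASCII);
    }

    // Everything after the NUL byte: the stored file content.
    static byte[] content(byte[] object) {
        return Arrays.copyOfRange(object, indexOfNul(object) + 1, object.length);
    }

    // Hypothetical demo helper: zlib-compress bytes so the example needs no repository.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "blob 5\0hello".getBytes(StandardCharsets.US_ASCII);
        byte[] object = inflateAll(deflate(raw));
        System.out.println(header(object));
        System.out.println(new String(content(object), StandardCharsets.US_ASCII));
    }
}
```

For a real blob, you would pass the bytes from Files.readAllBytes to inflateAll instead of the in-memory sample.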
References
- https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
- https://matthew-brett.github.io/curious-git/reading_git_objects.html
- https://docs.oracle.com/javase/8/docs/api/java/util/zip/Deflater.html
- https://docs.oracle.com/javase/8/docs/api/java/util/zip/Inflater.html
- https://www.programcreek.com/java-api-examples/?api=java.util.zip.Inflater