Automated bibisect to find source of a bug

In programming, we regularly face bugs that we need to fix to maintain or improve our software. To fix a bug, we first have to find the source of the problem, and tools like “automated bibisect” can help, especially when the bug is a regression.

You probably know what a regression is:

Regression bugs are special kinds of bugs. They describe a feature that was previously working well, but at some point a change in the code has caused it to stop working. They are source of disappointment for the user, but they are easier to fix for the developers compared to other bugs! Why? Because every single change to the LibreOffice code is kept in the source code management system (Git), it is possible to find that which change actually introduced the bug.

From: https://blog.documentfoundation.org/blog/2021/07/29/fixing-an-interoperability-bug-in-libreoffice-missing-lines-from-docx-part-1-3/

But how do we find the commit in which the bug was actually introduced? The answer is provided by Git, the source code management system that most of us use, and that is used in LibreOffice development. Git provides a command named bisect.

Using git bisect, you can find the exact commit where the bug was introduced with few tries, because it uses binary search: every test halves the range of candidate commits. After invoking git bisect start, you mark the last commit you know to be good using git bisect good, and a commit you know to be bad (for example, the latest one) using git bisect bad. Git then tells you the estimated number of steps remaining. At each step, you compile the commit that is checked out, test whether the problem is there, and mark it with git bisect good or git bisect bad.
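To see why binary search keeps the number of builds small: each good/bad answer roughly halves the range, so N candidate commits need about ceil(log2 N) tests. The function below (bisect_steps is a hypothetical name, not part of git) simply counts those halvings. It is a simplified worst-case estimate; git's own “roughly N steps” figure is computed slightly differently.

```shell
#!/bin/sh
# bisect_steps N: hypothetical helper estimating how many good/bad tests a
# binary search needs to narrow N candidate commits down to one.
# Each test halves the remaining range (rounding up), i.e. ceil(log2(N)).
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))    # worst case: the larger half remains
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# About 4085 candidates need only 12 tests, instead of up to 4085 builds
bisect_steps 4085    # prints 12
```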

Binary bibisect

But wait! Isn’t it true that compiling LibreOffice takes a while? Is git bisect on the LibreOffice source code usable in a reasonable amount of time? Not really, because every step would need a new build. The solution is called binary bisect, or bibisect: binary builds of all commits within certain periods of time are available as git repositories, and you can run git bisect on these repositories without compiling anything.

You can read more about bibisect here:

https://wiki.documentfoundation.org/QA/Bibisect

First, you have to find a suitable repository. If the bug is reproducible on every platform, you can choose among the repositories listed on the wiki page according to your OS.

For example, consider tdf#141049. It is about bad rendering of an EMF figure, which is wrongly displayed as blank. It was OK in LibreOffice 6.2, but the problem appeared in newer versions, starting with LibreOffice 6.3. So, if you are working on Linux, bibisect-linux-64-6.3 would be the right choice, because it provides “libreoffice-6-2-branch-point to libreoffice-6-3-branch-point and then libreoffice-6-3”.

Figure 1: The good, and the bad! Good output (GOOD) and bad output (BAD) in binary bibisect.

We start by downloading the repository bundle from the TDF server (warning: the bundle is about 8 GB, and you need more space to extract and work with it):

$ wget --continue https://bibisect.libreoffice.org/linux-64-6.3.git-bundle
$ git clone -o bundle linux-64-6.3.git-bundle linux-64-6.3

Then you can work with linux-64-6.3 like a normal git repository and bisect it. At each step, open the bug's example file in LibreOffice using ./instdir/program/soffice and check whether the problem is there. If it’s good, run git bisect good; if it’s bad, run git bisect bad. Continue until git reports the first bad commit, then run git bisect reset to leave bisect mode.
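The manual procedure can be rehearsed on a toy repository before downloading the real bundle. In this sketch, everything is hypothetical: each commit stores a fake “build” whose output is just a text file, builds 1-10 show the figure and builds 11-16 show a blank page. The loop plays your role: it looks at the output of the checked-out build and answers git bisect good or git bisect bad until git reports the first bad commit.

```shell
#!/bin/sh
# Toy stand-in for a bibisect repository (all names are hypothetical): each
# commit holds a fake "build" whose output is instdir/result.txt.
# Builds 1-10 show the figure, builds 11-16 show a blank page.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name demo
git config user.email demo@example.com
mkdir instdir
i=1
while [ "$i" -le 16 ]; do
    if [ "$i" -le 10 ]; then state=figure; else state=blank; fi
    echo "build $i: $state" >instdir/result.txt
    git add instdir/result.txt
    git commit -q -m "build $i"
    i=$((i + 1))
done

git bisect start
git bisect bad HEAD                                     # newest build: bad
git bisect good "$(git rev-list --max-parents=0 HEAD)" # oldest build: good

# The "manual" part: inspect the checked-out build's output, then answer.
# A cap of 20 rounds guards against looping forever.
round=0
while [ "$round" -lt 20 ]; do
    if grep -q figure instdir/result.txt; then
        out=$(git bisect good 2>&1)
    else
        out=$(git bisect bad 2>&1)
    fi
    echo "$out"
    case $out in *"is the first bad commit"*) break ;; esac
    round=$((round + 1))
done

# When git announces the result, refs/bisect/bad is the first bad commit
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad: $first_bad"    # first bad: build 11
git bisect reset                # return to the branch we started from
```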

Let’s see a video tutorial:


Automated bibisect

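But what if we could automate this process? Fortunately, this is possible! Git can do an automated bisect with git bisect run, which runs a script at each step and uses its exit code to mark the commit as good or bad. Used on a binary bisect repository, this becomes an automated bibisect.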

For tdf#141049, if you convert the input file to PDF, the blank (bad) output and the correct (good) output differ visibly in size.

To simplify, I’ve created tdf141049.doc from one of the figures that is not shown in the original example. The blank output is smaller than 20 kB, while the correct output is larger than 40 kB. This gives us a way to tell good and bad commits apart automatically: if the PDF is at least 40000 bytes, the commit is good; otherwise it is bad.

We store this script as auto.sh in the repository directory, next to tdf141049.doc, and make it executable with chmod +x auto.sh:

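#!/bin/sh
# Decide whether the build checked out by git bisect is good or bad
file=tdf141049.pdf
minsize=40000
# Remove the previous step's PDF so a failed conversion cannot reuse it
rm -f "$file"
./instdir/program/soffice --headless --convert-to pdf tdf141049.doc
# No PDF means this build could not be tested: exit 125 tells bisect to skip it
[ -f "$file" ] || exit 125
size=$(wc -c <"$file")
# Correct output (figure rendered) is > 40k; blank output is < 20k
if [ "$size" -ge "$minsize" ]; then
    exit 0    # good
else
    exit 1    # bad
fi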
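Before handing the decision to git, it is worth checking the size rule on files whose answer you already know. The function below (pdf_looks_good is a hypothetical name) applies the same 40000-byte threshold as auto.sh. The dummy files are just zero-filled files of the right size, not real PDFs. Returning 125 for a missing file follows git bisect's convention that 125 means “skip”.

```shell
#!/bin/sh
# pdf_looks_good FILE [MINSIZE]: hypothetical helper with the same rule as
# auto.sh. Returns 0 if FILE is at least MINSIZE bytes (default 40000),
# 1 if it is smaller, and 125 if FILE does not exist (nothing to judge).
pdf_looks_good() {
    [ -f "$1" ] || return 125
    size=$(( $(wc -c <"$1") ))   # $(( )) strips padding some wc versions add
    [ "$size" -ge "${2:-40000}" ]
}

# Rehearse the rule on two dummy files: 50000 bytes ("figure rendered")
# and 10000 bytes ("blank page")
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/good.pdf" bs=1000 count=50 2>/dev/null
dd if=/dev/zero of="$tmp/blank.pdf" bs=1000 count=10 2>/dev/null

good_status=0;    pdf_looks_good "$tmp/good.pdf"    || good_status=$?
blank_status=0;   pdf_looks_good "$tmp/blank.pdf"   || blank_status=$?
missing_status=0; pdf_looks_good "$tmp/missing.pdf" || missing_status=$?
echo "$good_status $blank_status $missing_status"   # prints: 0 1 125
```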

Then we run the automated bibisect:

$ cd linux-64-6.3
$ git bisect start
$ git bisect bad  master    # master is bad
$ git bisect good oldest    # oldest is good
Bisecting: 4085 revisions left to test after this (roughly 12 steps)
$ git bisect run ./auto.sh
...
Author: Jenkins Build User <tdf@pollux.tdf>
Date: Tue May 28 09:22:52 2019 +0200

source sha:69b62cfcbd364d7f62142149c2f690104b217ca1

That’s it! The auto.sh script decides which commits are good and which are bad, and the binary search in git bisect finds the first bad commit without any manual testing. Its message (source sha:...) identifies the source commit in the LibreOffice core repository that introduced the bug, and that is the commit you should study to fix the problem.
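To get from the bibisect commit to the source commit, you can read the source sha line from its message. The sketch below assumes the message contains a line of the form source sha:<hash>, as in the output above; source_sha is a hypothetical helper, and the toy repository only mimics that message. In the real bibisect repository you would call it on refs/bisect/bad before running git bisect reset, then inspect the result in a clone of the core repository with git show.

```shell
#!/bin/sh
# source_sha COMMIT: hypothetical helper that prints the hash recorded on a
# "source sha:<hash>" line of a bibisect commit message.
source_sha() {
    git log -1 --format=%B "$1" |
        sed -n 's/^[[:space:]]*source sha:\([0-9a-f]*\).*/\1/p'
}

# Toy repository with one commit whose message mimics the output above
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "source sha:69b62cfcbd364d7f62142149c2f690104b217ca1"
source_sha HEAD    # prints 69b62cfcbd364d7f62142149c2f690104b217ca1

# In the real bibisect repository, after "git bisect run" finishes and
# before "git bisect reset":
#   source_sha refs/bisect/bad
```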

Final Notes

Finally, I should note that automated bibisect is not easy to do for every bug. But if you can create PDF or SVG output from the input file, you can often write a script that analyzes the output and automates the task. The script has to return 0 to the shell if the commit is good, and a non-zero value from 1 to 127 (except 125) if it is bad. Exit code 125 tells git bisect to skip a commit that cannot be tested, and any value greater than 127 aborts the bisect.
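To see these exit codes in action without LibreOffice, the sketch below runs git bisect run on a throwaway repository where everything is invented for illustration: commits 5 and 6 stand for builds that cannot be tested and make the script return 125, so git skips them, and the bug appears in commit 13. The same case layout could be reused for a real check script that inspects PDF or SVG output.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

# Check script kept outside the repository so checkouts never touch it
cat >check.sh <<'EOF'
#!/bin/sh
# git bisect run: 0 = good, 1-127 except 125 = bad, 125 = skip, >127 = abort
case $(cat state.txt) in
    *nobuild*) exit 125 ;;  # untestable revision (e.g. failed build): skip
    *ok*)      exit 0 ;;    # expected output: good
    *)         exit 1 ;;    # bug reproduced: bad
esac
EOF
chmod +x check.sh

# Toy history: commits 5-6 "fail to build", the bug appears in commit 13
git init -q repo
cd repo || exit 1
git config user.name demo
git config user.email demo@example.com
i=1
while [ "$i" -le 20 ]; do
    if [ "$i" -eq 5 ] || [ "$i" -eq 6 ]; then s=nobuild
    elif [ "$i" -lt 13 ]; then s=ok
    else s=bug
    fi
    echo "commit $i $s" >state.txt
    git add state.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Newest commit is bad, root commit is good; then let the script decide
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)"
git bisect run "$work/check.sh"
first_bad=$(git log -1 --format=%s refs/bisect/bad)   # commit 13
git bisect reset
```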

If you want to get started with LibreOffice development, I suggest watching our video tutorial:

Getting Started (Video Tutorial)