14 Oct 2021

Automated bibisect to find source of a bug

In programming, we regularly face bugs that we have to fix to maintain or improve our software. To fix a bug, we first have to find the source of the problem, and tools like “Automated bibisect” are available to help, especially when the bug is a regression. You probably know what a regression is:

Regression bugs are special kinds of bugs. They describe a feature that was previously working well, but at some point a change in the code has caused it to stop working. They are source of disappointment for the user, but they are easier to fix for the developers compared to other bugs! Why? Because every single change to the LibreOffice code is kept in the source code management system (Git), it is possible to find that which change actually introduced the bug.

From: https://blog.documentfoundation.org/blog/2021/07/29/fixing-an-interoperability-bug-in-libreoffice-missing-lines-from-docx-part-1-3/

But how do we find where (in which commit) the bug was actually introduced? The answer is provided by Git, the source code management system that most of us use, and that is used in LibreOffice development. Git provides a command named bisect.

Using git bisect, you can find the exact commit where the bug was introduced with the minimum number of tries, using binary search. After invoking git bisect start, you mark the last good commit you know using git bisect good, and the earliest bad commit you know using git bisect bad. Git then tells you the estimated number of steps that remain. You compile each commit that is checked out, and then test to see if the problem is there.
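To see why so few tries are needed: every test discards about half of the remaining candidate commits, so the number of steps grows with the base-2 logarithm of the range. The helper below, bisect_steps, is a hypothetical name used only for this illustration. It simply counts halvings, so it is a rough approximation and not the exact estimate that git prints.

```shell
#!/bin/sh
# bisect_steps N: roughly how many good/bad tests a binary search needs
# to work through N candidate commits (hypothetical helper).
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 0 ]; do
        n=$((n / 2))            # each test discards about half of the range
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 1000    # prints 10
bisect_steps 4085    # prints 12
```

So even a range of several thousand commits needs only about a dozen tests.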

Binary bibisect

But wait! Isn’t it true that compiling LibreOffice takes a while? Is git bisect on the LibreOffice source something usable in a short period of time? The answer is no, but there is a solution called binary bisect, or bibisect. Binary builds of all commits within certain periods of time are available as git repositories, and you can do git bisect on these repositories.

You can read more about bibisect here:

https://wiki.documentfoundation.org/QA/Bibisect

First, you have to find a suitable repository. If the bug is reproducible on every platform, you can choose among the repositories according to your OS.

For example, consider tdf#141049. It is about bad rendering of an EMF figure, which is wrongly displayed as blank. It was OK in LibreOffice 6.2, but the problem appeared in LibreOffice 6.3. So, if you are working on Linux, bibisect-linux-64-6.3 would be the right choice, because it provides “libreoffice-6-2-branch-point to libreoffice-6-3-branch-point and then libreoffice-6-3”.

Figure 1: The good, and the bad! (Two screenshots: the good output and the bad output in binary bibisect.)

We start by downloading the repository bundle from the TDF repo (warning: the bundle is ~8GB, and you need more space to extract and work with it):

$ wget --continue https://bibisect.libreoffice.org/linux-64-6.3.git-bundle
$ git clone -o bundle linux-64-6.3.git-bundle linux-64-6.3

Then you can work with linux-64-6.3 like a normal git repository, and do bisect on it. In each step, you open the example file of the bug in LibreOffice using ./instdir/program/soffice and check whether the problem is there or not. If it’s good, you use git bisect good, and if it’s bad, you use git bisect bad. You continue this process until it is finished and you have found the first bad commit.
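One round of this manual loop can be sketched as a small shell function. The names answer_to_verdict and bibisect_step are made up for this example, and the prompt text is only a suggestion; treat it as a sketch under these assumptions, not as an official tool.

```shell
#!/bin/sh
# Map a y/n answer to the matching git bisect subcommand (hypothetical helper).
answer_to_verdict() {
    case "$1" in
        y|Y) echo good ;;
        n|N) echo bad ;;
        *)   echo skip ;;    # unsure, or the build could not be tested
    esac
}

# One manual step: open the document with the checked-out build, ask the
# tester what they saw, and report the verdict to git.
bibisect_step() {
    ./instdir/program/soffice "$1"
    printf 'Is the document displayed correctly? [y/n] '
    read -r answer
    git bisect "$(answer_to_verdict "$answer")"
}
```

Inside the bibisect repository, after git bisect start and the initial good and bad marks, you would call bibisect_step with the example file of the bug, and repeat until git reports the first bad commit.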

Let’s see a video tutorial:


Automated bibisect

But what if we could automate this process? Fortunately, this is possible! It is called automated bisect, and it is usable on binary bisect repositories as an automated bibisect.

For tdf#141049, if you create PDF output from the input file, there is a visible difference between the size of the blank output and the size of the correct output.

To simplify, I’ve created tdf141049.doc from one of the figures that is not shown in the original example. The blank output is < 20k, but the correct output is > 40k. This provides a way to differentiate between good and bad commits automatically: a commit is good if the PDF size is at least 40k, and bad otherwise.

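We store this script as auto.sh in the linux-64-6.3 directory, next to tdf141049.doc, and make it executable with chmod +x auto.sh:

#!/bin/sh
# Remove any PDF left over from a previous step, so a failed conversion
# cannot be mistaken for a good result.
file=tdf141049.pdf
minsize=40000
rm -f "$file"
# Convert the test document with the build that bisect has checked out.
./instdir/program/soffice --headless --convert-to pdf tdf141049.doc
# Good builds produce a PDF of at least minsize bytes; the blank one is smaller.
size=$(wc -c <"$file")
if [ "$size" -ge "$minsize" ]; then
    exit 0
else
    exit 1
fi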

And then we do automated bibisect:

$ cd linux-64-6.3
$ git bisect start
$ git bisect bad  master    # master is bad
$ git bisect good oldest    # oldest is good
Bisecting: 4085 revisions left to test after this (roughly 12 steps)
$ git bisect run ./auto.sh
...
Author: Jenkins Build User <tdf@pollux.tdf>
Date: Tue May 28 09:22:52 2019 +0200

source sha:69b62cfcbd364d7f62142149c2f690104b217ca1

That’s it! The auto.sh script determines which commit is good and which is bad, and through the binary search of bisect, the first bad commit is found within seconds. The automated bibisect reports the exact commit that is the source of the bug, and this is the commit you should look at to fix the problem.
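If you want to try the mechanism without downloading a bibisect repository, the following sketch builds a throwaway repository and runs an automated bisect on it. Everything in it apart from the git commands themselves (the file name output, the script check.sh, the 40-byte threshold, the commit messages) is invented for this demo.

```shell
#!/bin/sh
# Throwaway demo of `git bisect run`; all file names and sizes are made up.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Ten commits. From commit 7 on, the file "output" becomes tiny,
# imitating the blank PDF produced by a bad build.
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -lt 7 ]; then
        printf '%050d\n' "$i" > output    # 51 bytes: "good"
    else
        printf '%d\n' "$i" > output       # 2 or 3 bytes: "bad"
    fi
    git add output
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Test script: exit 0 (good) if output has at least 40 bytes, else 1 (bad).
cat > check.sh <<'EOF'
#!/bin/sh
size=$(( $(wc -c < output) ))
[ "$size" -ge 40 ]
EOF
chmod +x check.sh

git bisect start HEAD HEAD~9 > /dev/null    # bad = commit 10, good = commit 1
git bisect run ./check.sh > /dev/null
# When the run finishes, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "first bad: $first_bad"    # prints "first bad: commit 7"
```

The structure mirrors the real case: a range with a known good and a known bad end, plus a script whose exit status classifies each checked-out commit.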

Final Notes

In the end, I should note that it is not easy to do automated bibisect for every bug. But if you can create PDF or SVG output from the input file, you can possibly write a script to analyze the output and automate the task. The script has to return 0 to the shell if the commit is good, and a value between 1 and 127 (except 125) if it is bad. The special value 125 tells git to skip the commit, and any other value aborts the bisect.

If you want to get started with LibreOffice development, I suggest you watch our video tutorial:
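As an illustration of such a script, the sketch below wraps a size check in a reusable function. The name classify_by_size and its arguments are assumptions made for this example. It returns 0 for good, 1 for bad, and 125 (skip) when the output file is missing, for instance because the conversion failed.

```shell
#!/bin/sh
# classify_by_size FILE MINSIZE (hypothetical helper)
# Returns 0 (good) if FILE has at least MINSIZE bytes, 1 (bad) if it is
# smaller, and 125 (skip) if FILE does not exist.
classify_by_size() {
    file=$1
    minsize=$2
    [ -f "$file" ] || return 125
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -ge "$minsize" ]; then
        return 0
    fi
    return 1
}

# Example use at the end of a bisect script:
#   classify_by_size tdf141049.pdf 40000
#   exit $?
```

Skipping commits that produce no output at all keeps a broken build from being blamed for the bug under investigation.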

Getting Started (Video Tutorial)

4 Oct 2021

Getting Started (Video Tutorial)

LibreOffice development starts with setting up a development environment. After that, you can do the development in your favorite IDE. In this 80-minute presentation, you will find everything you need to know to get started with LibreOffice development, from installing dependencies using distribution tools, LODE (LibreOffice Development Environment) or manual setup, to the compilation itself.
With this tutorial, you can build LibreOffice for yourself. Then we look at some simple tasks from LibreOffice EasyHacks. After that, you can try to get your submission merged into the LibreOffice code by submitting it to Gerrit, and making the fixes requested by the reviewers.


Presentation: Getting Started with LibreOffice Development, by Hossein Nourikhah

This talk was presented at the LibreOffice Conference 2021 (LibOCon 2021). Slides

LibreOffice Conference 2021

4 Oct 2021

LibreOffice Development blog has started!

This is the first post of the LibreOffice Development Blog!

To know more about what is going on in LibreOffice, you can refer to the main Document Foundation blog. Also, if you want to learn more about the LibreOffice design, you can refer to the LibreOffice Design Team blog. And now, we have created a new blog, dedicated to the LibreOffice development!

Important Topics

Here is a good place to get more information about ongoing development efforts. Alongside the Document Foundation Wiki and #libreoffice-dev IRC, we will provide development related information here. We will talk about LibreOffice internals and modules, how to fix bugs, write tests, and many other things! If you want to start LibreOffice development, this is a good place to familiarize yourself with LibreOffice code, tools and developers.

Do you know C++, Java, Python, SQL, or other programming languages? If so, you can find useful information about the latest development related news and other up-to-date information here. Although we emphasize using these programming languages, you may find some areas where you can help and contribute, even without being an expert in programming.

LibreOffice core, a mix of many modules used in LibreOffice development

We will focus on LibreOffice core development, which contains many modules that are listed in the LibreOffice modules documentation. LibreOffice applications like Writer, Calc, Impress, Draw, Math and Base that every user interacts with will be part of our focus. Additionally, there are modules like VCL (UI toolkit for LibreOffice) that normal users may not fully understand. As a developer, you should know a lot about the LibreOffice internals, so we will discuss these modules.

Additionally, we will also talk about git, Gerrit, Jenkins, Bugzilla, compilers, IDEs and many other tools and techniques.

Above all, you can improve your development skills here, so stay tuned for interesting content soon. We hope to see your name in the list of LibreOffice developers here!