24 Oct 2024

Crash fix part 5: crash report tool

In previous blog posts about crashes in LibreOffice, I have discussed how to debug and fix some of crashes. Now I discuss a nice tool to keep track of the crash reports from volunteers: Crash report tool.

Crash Report Statistics

Crash report is available via this LibreOffice website:

You can see that different versions of LibreOffice listed there, and for each and every tracked version, number of crashes during the previous 1, 3, 7, 14 and 28 days can be seen. This is possible using the appropriate buttons on the top.

This data is gathered from those to volunteer to submit reports to make LibreOffice better.

This statistic is very helpful to understand the robustness of the builds in different versions.

Crash Signatures

If you choose a specific version, you may see signatures of the crashes. This is helpful when trying to fix crashes. For example, this is one of the crash signatures found in LibreOffice 24.8.0.3:

This shows that the crash happens in GetCharFormat() function. One may use this information to track and fix the problem.

Looking into one of the crashes, one may see the details of the crash, including the stack trace in the crashing thread, and link to the exact place of the source code that leads to the crash.

As an example, you can see this crash report.

Sometimes, experienced developers may be able to reproduce the bug using crash signatures while knowing some background. Otherwise, in most cases, filing a bug with documents and instructions to reproduce the bug is essential. Adding a link to the crash report can be helpful.

14 May 2024

Crash fixes, part 4: assertion failure

In the previous parts of the blog posts series on fixing software crashes, I have written about some crash fixes in LibreOffice around segfaults, aborts, and I discussed how test them. Here I write about fixing assertion failure.

What is Assertion Failure?

Assertion is a mechanism provided to the developers to make sure that things are good in runtime, conforming to what they were assuming. For example, making sure that some pointer is valid, some data elements match expectation, or some similar assumptions that need to be valid in order to get the correct results.

As an example, consider the C++ function basegfx::utils::createAreaGeometryForLineStartEnd(), which creates a geometric representation of an arrow. The code resides here:

basegfx/source/polygon/b2dlinegeometry.cxx:84

This is the line of code which contains assertion:

assert((rCandidate.count() > 1) && "createAreaGeometryForLineStartEnd: Line polygon has too few points");

On top of the actual function implementation, the C++ code asserts many conditions to make sure that they are met. In the above assertion, it checks to make sure that the number of points in the given data structure is more than 1. Otherwise, it leads to an assertion failure.

For various reasons, sometimes these sort of assumption may not be valid. To avoid reaching to incorrect results, it is important to have such assertions in place, to find such issues as soon as possible. If you are developing LibreOffice, you may have already built the code from sources in debug mode. Therefore, you may see software stops working, or simply crashes.

This crash may not happen for the end users, which use the release version of software. Therefore, these type of crashes have lower impact for the end users, and they are usually considered of lower importance compared to the crashes that happen for the end users.

Backtrace

One of the things that can help diagnose the problem is the stack trace, or simply backtrace. The way to obtain a backtrace depends on the platform and/or IDE that you use. If you use an IDE like Qt Creator, Visual Studio, etc., getting a backtrace would be as easy as debugging LibreOffice, making the assert fail, and then copy the backtrace from the UI. To learn more about IDEs, see this Wiki page:

If you want to use gdb on Linux, you may run LibreOffice with this command line:

$ instdir/program/soffice –backtrace

and then make the assert fail, and you will have the backtrace in gdbtrace.log file. You can learn more int this QA Wiki article:

TDF Wiki: QA/BugReport/Debug_Information

One thing to mention is that if you work on a reported bug regarding to assertion failure, then the actual way to reproduce the issue and make the assertion fail is usually described in the relevant TDF Bugzilla issue. In the meta bug related to assertion failure, you may find some of these issues in the last part of this blog post.

Fixing the Problem

To fix the problem, first you should gain understanding of the assumption, and why it fails. You should be able to answer these questions by reading the code and debugging it:

  • What does some assumption mean?
  • Why it fails?
  • How to fix that, so that it does not fail?

To gain this understanding, you have to look into the places where backtrace points. Backtrace can be complex, containing the whole stack of the function calls across the software, linking to places in the code, but let’s discuss a simplified form.

Consider this bug fix:

tdf#152012 Fix assert fail on opening date picker

The problem was that an assertion failure was happening when opening date picker field in a DOCX file. This is the simplified form of the stack trace:

1  __pthread_kill_implementation           pthread_kill.c:44
2  __pthread_kill_internal                 pthread_kill.c:78
3  __GI___pthread_kill                     pthread_kill.c:89
4  __GI_raise                              raise.c:26
5  __GI_abort                              abort.c:79
6  __assert_fail_base                      assert.c:92
7  __GI___assert_fail                      assert.c:101
8  o3tl::span::operator[]                  span.hxx:83
9  OutputDevice::ImplLayout                text.cxx:1396
10 OutputDevice::DrawTextArray             text.cxx:948
11 Calendar::ImplDraw                      calendar.cxx:71
12 Calendar::Paint                         calendar.cxx:1133

The problem was caused by an out of bound access to a vector of integers, and to fix that, you had to see why this happens, and fix that.

This is the description from the commit title:

Previously, for the 7 days of the week, a 7 letter string smtwtfs (depending on the week start day this can be different, but length is always 7) was sent to this method without providing the length, thus the string length: 7 was used. In this case, positions of the letters were calculated and used from other array named mnDayOfWeekAry[7]. mnDayOfWeekAry[k+1] was used as the position of letter k (k=0..5). In this case, there was 7 letters for 7 days, and only 6 positions provided by the array. This caused assertion failure in span.hxx:83 when trying to access mnDayOfWeekAry[7] via o3tl::span<T>::operator[].

As you can see, a false assumption was creating assertion failure, and the fix was essentially changing the code based on the corrected assumption.

Sometimes, the fix can be easier. As an example, by only checking a pointer to make sure that it is valid, the assertion failure does not happen. Therefore, to fix the problem, you have to carefully study the code and its behavior.

Final Notes

Many of the assertion failure bugs are fixed in daily works of the developers, before causing problems for the end users, which use the “release” version of the software, that do not crash because of the assertion failure. But there are some of these issues remaining.

Do you want to help fix some of the remaining issues? Then, please refer to the list here, read some bug report, and pick one:

24 Oct 2022

Crashes that you can fix! – EasyHack

As discussed in the “crash fixes for LibreOffice, part 1: segfaults“, bugs that lead to crash in LibreOffice or other native applications can have different causes, and therefore need completely different fixes. Some of these bugs are hard to fix: you have to change the behavior of the application in order to avoid crashes. (more…)

6 Sep 2022

Crash fixes, part 1: segfault

One of the bugs that we see in computer programs including LibreOffice is the crash. You’re working with the application, and the program is suddenly closed! I discuss the usual causes for crashes, and how to fix some of them. Here I write about segfault crashes.
(more…)

29 Jul 2024

Fuzz testing to maintain LibreOffice code quality

Here I discuss what fuzz testing is, and how LibreOffice developers use it incrementally to maintain LibreOffice code quality.

Maintaining Code Quality

LibreOffice developers use various different methods and tools to maintain LibreOffice code quality. These are some of them:

1. Code review: Every patch from contributors should pass code review on Gerrit, and after conforming to coding standards and conventions, it can become part of the LibreOffice source code.

2. Static code checking: “Coverity Scan” continuously scans LibreOffice source code to find the possible defects. An automated script reports these issues to the LibreOffice developers mailing list so that developers can fix them.

3. Continuous Testing: There are various C++ unit test and Python UI tests in LibreOffice core source code to make sure that the functionalities of the software remain working during the later changes. They are also helpful for making sure that the fixed regressions do not happen again. These test run continuously for each and every Gerrit submission on CI machines via Jenkins.

4. Crash testing: A good way to make sure that LibreOffice works fine is to batch open and convert a huge set of documents. This task is done regularly, and if some failure occurs developers are informed to fix the issue.

5. Crash reporting: LibreOffice uses crash testing to find out about the recurrent crashes, and fix them.

6. Tinderbox Platforms: Using dedicated machines with various different architectures, LibreOffice developers make sure that LibreOffice source code builds and runs without problem on different platforms. Here is the description of tinderbox (TB) from TDF Wiki:

Tinderbox is a script to run un-attended build on multiple repos, for multiple branches and for gerrit patch review system.

LibreOffice tinderboxes status

LibreOffice tinderboxes status

You can see the build status here:

https://tinderbox.libreoffice.org/

7. Fuzz testing: LibreOffice software is checked continuously using Fuzz testing. This is essentially giving various automated inputs to the program to find the possible places in the code where problem occurs. Then, developers will become aware of the those problematic places in the code, and can fix them.

Fuzz Testing LibreOffice

Fuzz testing on LibreOffice source code is active since 2017, and since then there has been various bug fixes for the problems that the fuzz tester reported. You can see more than 1500 of such fixes in the git log until now:

$ git shortlog -s -n --grep=ofz#

Issues Found with Fuzz Testing

This tool can find various different problems. These issues are then filed in a section of Chromium bug tracker, and after ~30 days, they are made public. When developers fix bugs of this kind, they refer to the issue number (for example 321) as ofz#321. A comprehensive list of all issues found is visible here:

Fixing the Issues

Let’s look at one of the fixes. You can find commits related to fuzzing with:

$ git log --grep=ofz

This is a recent fix from Caolán, an experienced LibreOffice developer that provided most of the fixes found through oss-fuzz:

commit d30ecb5fb07f005ebd944e864f0a15678289a4ed
    ofz#69809 Integer-overflow

--- a/filter/source/graphicfilter/icgm/cgm.cxx
+++ b/filter/source/graphicfilter/icgm/cgm.cxx
@@ -227,7 +227,7 @@ double CGM::ImplGetFloat( RealPrecision eRealPrecision, sal_uInt32 nRealSize )
         else
         {
             sal_Int32* pLong = static_cast<sal_Int32*>(pPtr);
-            nRetValue = static_cast(abs( pLong[ nSwitch ] ));
+            nRetValue = fabs(static_cast(pLong[nSwitch]));
             nRetValue *= 65536;
             nVal = static_cast( pLong[ nSwitch ^ 1 ] );
             nVal >>= 16;

As you can see, using abs() first, and then casting to double is changed in this commit to cast to double first, and then using fabs(). The reason of this change lies in the data type of some variables.

pLong is an array of sal_Int32, which is 32 bit signed integer. It can take values from -2,147,483,648 to 2,147,483,647. As you can see, the smallest negative 32-bit signed integer can not be stored in the same 32-bit signed variable if abs() is used to remove sign from that.

As the result is stored in nRetValue, a varible of type double, it is possible to first cast the array item to double, and then use floating point version of absolute function, fabs() over it. In this way, “integer overflow” will not happen anymore.

This patch was one of the smallest examples of what a fix can be. There are many bugs that are more complex, and require more careful examination to provide a fix.

Further Reading

You can read more in the TDF Wiki article around fuzz testing in LibreOffice. Also, OSS-Fuzz documentation is a good place to look into: