In the previous parts of this blog post series on fixing software crashes, I wrote about crash fixes in LibreOffice around segfaults and aborts, and I discussed how to test them. Here I write about fixing assertion failures.
What is an Assertion Failure?
An assertion is a mechanism that lets developers verify at runtime that their assumptions actually hold. For example, an assertion can check that a pointer is valid, that some data elements match expectations, or that some other assumption needed for correct results is true.
As an example, consider the C++ function basegfx::utils::createAreaGeometryForLineStartEnd(), which creates a geometric representation of an arrow. The code resides here:
basegfx/source/polygon/b2dlinegeometry.cxx:84
This is the line of code that contains the assertion:
assert((rCandidate.count() > 1) && "createAreaGeometryForLineStartEnd: Line polygon has too few points");
At the start of the function, before the actual implementation, the C++ code asserts many conditions to make sure that they are met. The assertion above checks that the number of points in the given data structure is more than 1. Otherwise, it leads to an assertion failure.
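To make the pattern concrete, here is a minimal sketch of a precondition assertion written in the same style. The Point type and the firstSegmentLengthSquared() function are hypothetical and exist only for this illustration; they are not LibreOffice code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a 2D point; not a basegfx type.
struct Point
{
    double x;
    double y;
};

// Hypothetical helper: squared length of the first segment of a line.
// It needs at least two points, so it states that assumption up front.
double firstSegmentLengthSquared(const std::vector<Point>& rCandidate)
{
    // The string literal always converts to true, so it does not change the
    // condition; it only appears in the message printed when the check fails.
    assert((rCandidate.size() > 1) && "firstSegmentLengthSquared: line has too few points");

    const double dx = rCandidate[1].x - rCandidate[0].x;
    const double dy = rCandidate[1].y - rCandidate[0].y;
    return dx * dx + dy * dy;
}
```

Calling this helper with fewer than two points in a build where assertions are active would abort the program and print the condition together with the message text.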
For various reasons, this sort of assumption may sometimes not hold. To avoid producing incorrect results, it is important to have such assertions in place, so that such issues are found as early as possible. If you are developing LibreOffice, you have probably built the code from source in debug mode, where assertions are enabled. Therefore, when an assertion fails, you may see the software stop working, or simply crash.
This crash may not happen for end users, who use the release version of the software. Therefore, this type of crash has a lower impact on end users, and is usually considered less important than crashes that happen for them.
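The difference between the two kinds of builds comes from how the standard assert macro works: when NDEBUG is defined at compile time, assert expands to nothing. The sketch below reports which mode the code was compiled in. The function name assertionsEnabled() is made up for this example, and the sketch assumes that release builds define NDEBUG while debug builds do not.

```cpp
#include <cassert>

// Hypothetical helper: tells whether assert() is active in this build.
constexpr bool assertionsEnabled()
{
#ifdef NDEBUG
    return false; // assert(...) is compiled out, so a failed assumption goes unnoticed
#else
    return true;  // a failing assert(...) aborts the program
#endif
}
```

A helper like this is only useful for experimenting; in real code the build configuration decides whether the checks are present.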
Backtrace
One of the things that can help diagnose the problem is the stack trace, or simply backtrace. The way to obtain a backtrace depends on the platform and/or IDE that you use. If you use an IDE like Qt Creator, Visual Studio, etc., getting a backtrace is as easy as debugging LibreOffice, making the assert fail, and then copying the backtrace from the UI. To learn more about IDEs, see this Wiki page:
If you want to use gdb on Linux, you may run LibreOffice with this command line:
$ instdir/program/soffice --backtrace
and then make the assert fail, and you will have the backtrace in the gdbtrace.log file. You can learn more in this QA Wiki article:
TDF Wiki: QA/BugReport/Debug_Information
One thing to mention is that if you work on a reported bug about an assertion failure, the way to reproduce the issue and make the assertion fail is usually described in the relevant TDF Bugzilla issue. You can find some of these issues in the meta bug for assertion failures, mentioned in the last part of this blog post.
Fixing the Problem
To fix the problem, first you should gain an understanding of the assumption, and why it fails. You should be able to answer these questions by reading the code and debugging it:
- What does the assumption mean?
- Why does it fail?
- How can it be fixed, so that it does not fail?
To gain this understanding, you have to look at the places the backtrace points to. A backtrace can be complex, containing the whole stack of function calls across the software with links to places in the code, but let’s discuss a simplified form.
Consider this bug fix:
tdf#152012 Fix assert fail on opening date picker
The problem was that an assertion failure happened when opening a date picker field in a DOCX file. This is the simplified form of the stack trace:
1 __pthread_kill_implementation pthread_kill.c:44
2 __pthread_kill_internal pthread_kill.c:78
3 __GI___pthread_kill pthread_kill.c:89
4 __GI_raise raise.c:26
5 __GI_abort abort.c:79
6 __assert_fail_base assert.c:92
7 __GI___assert_fail assert.c:101
8 o3tl::span::operator[] span.hxx:83
9 OutputDevice::ImplLayout text.cxx:1396
10 OutputDevice::DrawTextArray text.cxx:948
11 Calendar::ImplDraw calendar.cxx:71
12 Calendar::Paint calendar.cxx:1133
The problem was caused by an out-of-bounds access to a vector of integers. To fix it, you had to find out why this happened.
This is the description from the commit message:
Previously, for the 7 days of the week, a 7 letter string smtwtfs (depending on the week start day this can be different, but length is always 7) was sent to this method without providing the length, thus the string length: 7 was used. In this case, positions of the letters were calculated and used from other array named mnDayOfWeekAry[7]. mnDayOfWeekAry[k+1] was used as the position of letter k (k=0..5). In this case, there was 7 letters for 7 days, and only 6 positions provided by the array. This caused assertion failure in span.hxx:83 when trying to access mnDayOfWeekAry[7] via o3tl::span<T>::operator[].
As you can see, a false assumption was causing the assertion failure, and the fix was essentially changing the code based on the corrected assumption.
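The off-by-one in that description can be reproduced in a few lines. This is only a sketch of the indexing rule, not the LibreOffice code or the actual fix: CheckedPositions is a simplified stand-in for a bounds-checked view such as o3tl::span, and firstLetterWithoutPosition() is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Simplified stand-in for a bounds-checked view (not the real o3tl::span).
struct CheckedPositions
{
    std::array<int, 7> values{}; // 7 entries, valid indices 0..6

    int operator[](std::size_t i) const
    {
        assert(i < values.size() && "index out of range");
        return values[i];
    }
};

// Hypothetical helper: under the rule "letter k uses entry k+1", return the
// index of the first letter whose entry would be out of bounds, if any.
std::optional<std::size_t> firstLetterWithoutPosition(std::string_view letters,
                                                      const CheckedPositions& positions)
{
    for (std::size_t k = 0; k < letters.size(); ++k)
    {
        if (k + 1 >= positions.values.size())
            return k; // entry k+1 does not exist
    }
    return std::nullopt; // every letter has a position
}
```

With the 7 letter string smtwtfs, the helper reports letter 6, whose entry would be index 7. Reading that index through the checked operator[] is the kind of access that trips the assertion.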
Sometimes, the fix can be easier. For example, simply checking a pointer to make sure that it is valid may be enough to prevent the assertion failure. Either way, to fix the problem, you have to carefully study the code and its behavior.
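A pointer check of that kind could look like the following sketch. The Field type and both functions are hypothetical and only illustrate the idea; whether a missing object is legitimate at a given call site is something you have to decide by reading the real code.

```cpp
#include <cassert>
#include <string>

// Hypothetical type for this example only.
struct Field
{
    std::string name;
};

// Assumes a valid pointer and states that assumption with an assertion.
std::string fieldName(const Field* pField)
{
    assert(pField && "fieldName: pField must not be null");
    return pField->name;
}

// Caller-side fix: handle the missing object before the assumption is relied on.
std::string fieldNameOrEmpty(const Field* pField)
{
    if (!pField)
        return std::string(); // nothing to describe, so skip the call
    return fieldName(pField);
}
```

The assertion stays in place, so other callers that pass an invalid pointer by mistake are still caught in debug builds.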
Final Notes
Many assertion failure bugs are fixed in the daily work of the developers, before they cause problems for end users. End users run the “release” version of the software, which does not crash because of assertion failures. But some of these issues remain.
Do you want to help fix some of the remaining issues? Then please refer to the list here, read some bug reports, and pick one: