Integer data types improvement – EasyHack

Many different data types are used in the LibreOffice code. Over the long history of LibreOffice, and of OpenOffice before it, some integer data types were introduced that should no longer be used today. The task I discuss here is to choose appropriate data types to use instead of sal_uLong and similar deprecated integer data types.

Integer Data Types in LibreOffice

One of the old deprecated integer data types is ULONG, which was later converted to sal_uLong. The latter is still problematic, because its size can differ between platforms. Being “long” does not mean that it can hold every value. Each instance should be reviewed individually and replaced by a more suitable data type. This EasyHack focuses on this change:

As sal_uLong is unsigned, an unsigned type like size_t, sal_uInt16, sal_uInt32, or sal_uInt64 is usually suitable, but this is not always the case. Sometimes the context calls for a signed type. There are even cases where a floating-point type like double is the correct choice.
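To illustrate why the width matters, here is a minimal sketch. It uses standard fixed-width types as stand-ins for the sal_ types, which is an assumption made only so that the snippet compiles outside the LibreOffice tree. An unsigned long is typically 32 bits wide on 64-bit Windows and 64 bits wide on 64-bit Linux and macOS, whereas the fixed-width types are the same everywhere.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Stand-ins for the LibreOffice typedefs, assumed here only so that the
// example builds without the LibreOffice headers.
using uLongLike = unsigned long;   // width depends on the platform
using uInt32Like = std::uint32_t;  // always 32 bits
using uInt64Like = std::uint64_t;  // always 64 bits

// Number of storage bits each type occupies on the current platform.
constexpr std::size_t bitsOf_uLong = sizeof(uLongLike) * CHAR_BIT;
constexpr std::size_t bitsOf_uInt32 = sizeof(uInt32Like) * CHAR_BIT;
constexpr std::size_t bitsOf_uInt64 = sizeof(uInt64Like) * CHAR_BIT;

static_assert(bitsOf_uInt32 == 32, "fixed width on every platform");
static_assert(bitsOf_uInt64 == 64, "fixed width on every platform");
// The C++ standard only guarantees that unsigned long has at least 32 bits.
static_assert(bitsOf_uLong >= 32, "minimum guaranteed width");
```

Code that silently assumes bitsOf_uLong is 64 may therefore work on Linux and truncate values on Windows.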

Finding Instances to Change

Finding instances is easy. You can simply use grep to find the remaining instances:

$ git grep sal_uLong

Using the count.sh script provided in the EasyHack page, you can count the number of remaining instances in each folder of the LibreOffice core source code. It is defined as:

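#!/bin/sh
# Count the remaining sal_uLong occurrences in each top-level folder.
for d in */
do
        cd "$d" || continue
        # Quote the patterns so that git, not the shell, expands them.
        count=$(git grep sal_uLong -- '*.cxx' '*.hxx' | wc -l)
        if [ "$count" -ne 0 ]
        then
                printf '%d: %s\n' "$count" "$d"
        fi
        cd ..
done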

It is a good idea to start with the folders that need the fewest changes, in order to reduce the number of remaining folders.

$ ./count.sh | sort -h
1: dbaccess/
1: unotools/
2: desktop/
2: drawinglayer/
3: framework/
6: svl/
8: svx/
11: toolkit/
13: compilerplugins/
25: starmath/
33: svtools/
34: filter/
61: include/
111: sd/
416: vcl/
593: sc/
635: sw/

To find instances inside a specific folder like toolkit/, pass the folder name after sal_uLong in the grep command, like this:

$ git grep sal_uLong toolkit/

Take care to preserve the capital letter L in sal_uLong, because git grep is case-sensitive by default.

Choosing Data Types

The main issue here is to find a specific integer data type that can replace sal_uLong, so that it can handle all the possible values in foreseeable scenarios.

You should look at where the data type is used to get an idea of the possible values that are stored in the variable and read back later. Sometimes this is obvious from the context. For example, as described in Bugzilla, the commit below chooses the data type for positions in an SvStream:

Here, sal_uInt64 is chosen because the files that are read and written via SvStream can be larger than 4 GB. A 32-bit unsigned integer such as sal_uInt32 can only represent values up to 2^32 - 1, which is about 4.29*10^9, so it cannot address positions beyond 4 GiB. With a 64-bit unsigned integer, the possible size is much larger (up to 2^64 - 1 bytes), and suitable for the purpose.
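The arithmetic above can be checked with a short sketch. The helper storeIn32Bits and the sample position are hypothetical names introduced for illustration; they are not part of SvStream.

```cpp
#include <cstdint>
#include <limits>

// Largest offset each candidate type can represent.
constexpr std::uint64_t max32 = std::numeric_limits<std::uint32_t>::max();
constexpr std::uint64_t max64 = std::numeric_limits<std::uint64_t>::max();

// Hypothetical helper: stores a 64-bit stream position in a 32-bit variable.
// Conversion to an unsigned type is well defined and wraps modulo 2^32.
constexpr std::uint32_t storeIn32Bits(std::uint64_t pos)
{
    return static_cast<std::uint32_t>(pos);
}

// A position 5 * 10^9 bytes into a file, which is beyond the 32-bit range.
constexpr std::uint64_t largePosition = 5000000000ULL;

static_assert(largePosition > max32, "does not fit into 32 bits");
static_assert(largePosition <= max64, "fits into 64 bits");
```

Storing largePosition in 32 bits yields 5000000000 - 4294967296 = 705032704, so a seek to that position would land in the wrong place without any warning at run time.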

Using Return Types of Functions

If the variable is filled from the return value of a function, then the return type of that function may be suitable for the variable. This is not always the case, and you should keep in mind that sometimes you also need to change the return type of the function itself.
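A minimal sketch of this idea follows. The function countSpaces is hypothetical and stands in for any API whose result was previously stored in a sal_uLong variable.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical function standing in for an API whose result is stored
// in a variable that used to be declared as sal_uLong.
std::size_t countSpaces(const std::string& text)
{
    std::size_t n = 0;
    for (char c : text)
        if (c == ' ')
            ++n;
    return n;
}

// The variable takes the function's return type, so no implicit
// conversion happens on assignment.
std::size_t spacesInSample()
{
    std::size_t nSpaces = countSpaces("a b c");
    return nSpaces;
}

static_assert(std::is_same<decltype(countSpaces(std::string())), std::size_t>::value,
              "the variable type matches the return type");
```

If the function itself returned a deprecated or too narrow type, matching the variable to it would only move the problem, which is why the return type sometimes has to change as well.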

Using auto keyword and an IDE

Sometimes, you can use auto for the data type and let your IDE show the deduced type, for example by hovering the mouse over the variable or a similar action.

Many C/C++ IDEs support this feature. For more information on how to set up an IDE, please refer to this wiki article:
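Without an IDE, the compiler can confirm a guess about the deduced type. The sketch below assumes a hypothetical function getLength; the static_assert fails at compile time if the guess is wrong.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical function whose return type we want to discover.
std::int32_t getLength() { return 42; }

std::int32_t useAuto()
{
    // Let the compiler deduce the type, then check the guess at compile time.
    auto nLen = getLength();
    static_assert(std::is_same<decltype(nLen), std::int32_t>::value,
                  "nLen was deduced as a 32-bit signed integer");

    std::vector<int> aItems{1, 2, 3};
    auto nCount = aItems.size();
    static_assert(std::is_same<decltype(nCount), std::vector<int>::size_type>::value,
                  "size() returns the container's size_type");

    return nLen + static_cast<std::int32_t>(nCount);
}
```

Once the type is known, you can decide whether to keep auto or to spell the type out explicitly.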

Doing the Change

To actually do the change, you have to replace sal_uLong with the integer data type that you have chosen. But this is not the end! Sometimes you have to change many other places, like return types of functions and member variables in classes. That may also trigger another set of changes where those functions or variables are used.

To get a better understanding of what needs to be changed, you can use a very handy feature of your IDE: “find usages”. This feature may go by different names, but it is usually available when you right-click on the variable or identifier name. For example:

  • Qt Creator: “Find reference to symbol under cursor”
  • Visual Studio: “Find all references”
  • Visual Studio Code: “Go to references”

You should look for similar functionality in the IDE of your choice.

Keeping the Change Minimal

Please try to keep the changes minimal, and limit them to one or at most a few files. Otherwise, you may end up modifying many files and facing a change that is difficult to manage. Such a change would not be suitable here, because the goal of the EasyHack is to give you the opportunity to change small parts of the code and gain a better understanding of LibreOffice development at an early stage.

There are rare cases where such large changes succeed. For example, look at this change:

This is a huge change. Although it is a spin-off of this EasyHack, it was eventually done as a fix for another bug, whose visible symptom was that the line count reset to zero after 65535. Therefore, please keep your change minimal in this EasyHack, and postpone larger changes until you have completed several difficultyBeginner EasyHacks.
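A count that resets to zero after 65535 is the behavior of a 16-bit unsigned counter. The following is a minimal sketch of that effect with hypothetical function names; it is not the code from the change mentioned above.

```cpp
#include <cstdint>

// Hypothetical line counter that is too narrow for large documents.
std::uint16_t nextLineNarrow(std::uint16_t nLine)
{
    // The sum is computed as int and then converted back to 16 bits,
    // so 65535 + 1 becomes 0.
    return static_cast<std::uint16_t>(nLine + 1);
}

// The same counter with a wider type keeps counting.
std::uint32_t nextLineWide(std::uint32_t nLine)
{
    return nLine + 1;
}
```

Widening one such counter usually means widening every variable, parameter, and return type that the value flows through, which is how a small fix can grow into a large change.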

Compromise: Keeping Some Deprecated Integer Data Types

It is not always possible, or easy, to remove all the deprecated data types like tools::Long. Sometimes you have to keep them, and there are even situations where you have to convert sal_uLong to tools::Long. This is fine for now, as tools::Long and other tools:: data types are still in extensive use. You can count:

$ git grep tools::Long *.cxx *.hxx|wc -l
16481

Writing the Commit Message

In the commit message for this EasyHack, please briefly justify your selection of data types. Please do not describe the data types themselves, but the reasoning behind your choice of data type for the variables that replace sal_uLong.

You can also refer to the blog post below to understand how to write good commit messages:

Final Words

To do this task, you need to be able to build LibreOffice from source code, and send your changes to Gerrit. To do that, you can refer to our getting started guide:

Getting Started (Video Tutorial)