In the first part of this series on string types in LibreOffice, I briefly discussed some of the character and string data types that are in use in various places of the LibreOffice code: OString, OUString, char/char*, sal_Unicode, sal_Unicode*, rtl_String, rtl_uString and also std::string. Now I want to explain string literals.
In C/C++, a string literal is a sequence of characters in double quotation marks, and it represents read-only textual data. For example:
const char *str = "abc";
Please note that it is different from a character literal, which is a single character in single quotation marks:
const char c = 'a';
The writable versions of these data types are declared without const.
The char* data type is widely used in the C programming language, but it is not the data type of choice in LibreOffice. As described in my previous post, OString is used for 8-bit text, and OUString is used for Unicode text in LibreOffice. It is worth noting that it is possible to store UTF-8 encoded Unicode text in OString.
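To make the difference between the two kinds of literals and between const and non-const data concrete, here is a small sketch in plain standard C++. The function name first_char_after_edit is only an illustration and does not come from LibreOffice.

```cpp
#include <cassert>
#include <cstring>

// A string literal is an array of const char and includes the trailing '\0'.
static_assert(sizeof("abc") == 4, "three characters plus the terminator");
// A character literal is a single char in C++.
static_assert(sizeof('a') == 1, "exactly one byte");

// Edits a writable copy of a literal and returns its first character.
inline char first_char_after_edit()
{
    const char* readOnly = "abc"; // points at read-only data
    char writable[] = "abc";      // a private, writable copy of the literal
    writable[0] = 'x';            // allowed: only the copy changes
    assert(std::strcmp(readOnly, "abc") == 0); // the literal is untouched
    return writable[0];
}
```

Calling first_char_after_edit() gives back the edited character from the copy, while the data behind the const pointer stays the same.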
In the past, it was possible to convert the const char* literal to OString/OUString like this: (it will not compile now)
It was not an efficient way to define and use such strings. The plain string literal is stored in read-only memory. But then, a new chunk of dynamic memory is allocated on the heap for the new O[U]String object, and the constructor copies the read-only data into it. Also, the new OUString needs reference counting. These are unnecessary, expensive operations, and we should avoid them.
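The copy described above can be observed with std::string, which I use here only as a stand-in because it can be compiled without the LibreOffice headers. This is a sketch under that assumption: std::string is not reference counted and may use a small internal buffer instead of the heap, so it only shows the copying part. The function name literal_is_copied is made up for this example.

```cpp
#include <cassert>
#include <string>

// Returns true when constructing a string object from a literal produced
// a second, separate copy of the characters.
inline bool literal_is_copied()
{
    const char* literal = "abc"; // the characters live in read-only storage
    std::string copy(literal);   // the constructor copies them at run time
    return copy.data() != literal // the copy lives somewhere else ...
        && copy == "abc";         // ... but has the same contents
}
```

The same bytes now exist twice: once in the read-only data of the program, and once inside the string object.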
In LibreOffice, OStringLiteral and OUStringLiteral are the data types used to represent string literals for ASCII and Unicode data, respectively.
As an example, you can see lines like this in LibreOffice .cxx files:
The constexpr ensures that the expression is evaluated at compile time, and this can improve the performance of the program. Also, avoiding reference counting in O[U]String helps to make the operation cheaper.
Later, OString/OUString variables are constructed from the O[U]StringLiterals. Or, they are passed to functions that expect OString/OUString parameters. The difference is that when static constexpr literals are used, the data is not stored in dynamic memory: it is allocated once and it is read-only, which improves performance. This approach is only usable for strings that are initialized once and are not manipulated later.
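To illustrate the idea of a literal type whose contents are fixed at compile time, here is a minimal sketch. DemoLiteral and DEMO_NAME are hypothetical names, and this is not how OUStringLiteral is actually implemented in LibreOffice; it only shows the general technique of a constexpr object that needs no heap allocation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a compile-time literal type.
template <std::size_t N>
struct DemoLiteral
{
    char16_t buffer[N] = {};

    // Copies the characters of the literal while the program is compiled.
    constexpr DemoLiteral(const char16_t (&text)[N])
    {
        for (std::size_t i = 0; i < N; ++i)
            buffer[i] = text[i];
    }

    // Number of characters without the trailing '\0'.
    constexpr std::size_t length() const { return N - 1; }
};

// The object is built by the compiler and stored as constant data.
static constexpr DemoLiteral DEMO_NAME(u"abc");

static_assert(DEMO_NAME.length() == 3, "checked by the compiler");
static_assert(DEMO_NAME.buffer[0] == u'a', "contents are known at compile time");
```

Because both static_assert checks pass during compilation, nothing has to be allocated or copied when the program runs.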
String Literals in Headers
If you are working with a .hxx C++ header file, you have to use the inline keyword to avoid creating duplicate copies of the global variable. For example:
inline constexpr OUStringLiteral ABC(u"abc");
Later we will see that we can re-write the above with a suffix as:
inline constexpr OUString ABC = u"abc"_ustr;
Essentially, that is a better replacement for a macro or a plain character array like these:
#define ABC "abc"
const char ABC[] = "abc";
These are no longer desirable now that string literals are available with the latest C++ standard and new LibreOffice code. Also, it is important to know that the goal is to eventually get rid of the O[U]StringLiteral data types in favor of the simpler form with suffixes.
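The contrast between a macro and a typed header constant can be sketched with the standard library alone. I use std::u16string_view here as a stand-in for the LibreOffice types, and the names DEMO_MACRO and DEMO_CONSTANT are made up for this example.

```cpp
#include <cassert>
#include <string_view>

// Macro: plain text substitution, with no type and no scope.
#define DEMO_MACRO "abc"

// Typed constant that is safe to put in a header: "inline" means every
// file that includes the header refers to one and the same variable.
inline constexpr std::u16string_view DEMO_CONSTANT = u"abc";

static_assert(sizeof(DEMO_MACRO) == 4, "the macro expands to a char array literal");
static_assert(DEMO_CONSTANT.size() == 3, "the length is known at compile time");
```

The constant has a real type that the compiler can check, while the macro is replaced textually before the compiler ever sees it.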
String literals with no prefix are single-byte strings which consist of 8-bit characters. Other string literals have prefixes that indicate their types. For example, to represent ABC in ASCII, UTF-8, UTF-16, UTF-32 and wide-char, you need to write:
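The five forms in standard C++ are "ABC", u8"ABC", u"ABC", U"ABC" and L"ABC". The sketch below checks the size of one code unit for each of them; unit_size and unit_count are helper names I made up for this example, and the sizes in the checks are the ones found on mainstream platforms.

```cpp
#include <cassert>
#include <cstddef>

// Size in bytes of one code unit of a string literal.
template <typename Char, std::size_t N>
constexpr std::size_t unit_size(const Char (&)[N]) { return sizeof(Char); }

// Number of code units, without the trailing terminator.
template <typename Char, std::size_t N>
constexpr std::size_t unit_count(const Char (&)[N]) { return N - 1; }

static_assert(unit_size("ABC") == 1, "no prefix: char");
static_assert(unit_size(u8"ABC") == 1, "u8: char, or char8_t since C++20");
static_assert(unit_size(u"ABC") == 2, "u: char16_t (UTF-16)");
static_assert(unit_size(U"ABC") == 4, "U: char32_t (UTF-32)");
static_assert(unit_size(L"ABC") == sizeof(wchar_t), "L: wchar_t, size depends on the platform");
static_assert(unit_count(u"ABC") == 3, "three code units in every form");
```

The wchar_t case is the only one where the size differs between platforms, which is why it is compared with sizeof(wchar_t) instead of a fixed number.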
Now that C++20 has become the baseline for LibreOffice source code, and thanks to Stephan Bergmann, it has become possible to simplify the code and avoid the O[U]StringLiteral data types by writing it in a much shorter form, like:
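In LibreOffice code the short form uses the _ustr suffix, as in u"abc"_ustr shown earlier in this post. To show how a suffix like that works in general, here is a sketch of a user-defined literal in standard C++. The _demo suffix and DEMO_TEXT are hypothetical, the result is a std::u16string_view instead of an OUString, and the real LibreOffice implementation is different.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical suffix: the compiler passes the characters and their count.
constexpr std::u16string_view operator""_demo(const char16_t* text, std::size_t length)
{
    return std::u16string_view(text, length);
}

// The suffix turns the raw literal into a typed object at compile time.
constexpr auto DEMO_TEXT = u"abc"_demo;

static_assert(DEMO_TEXT.size() == 3, "evaluated by the compiler");
```

The call site stays as short as a plain literal, and the type conversion is hidden behind the suffix.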
Don’t be afraid of the various string types that we discussed here! Most of the time, you will be using OUString. The other types will come up occasionally when you work with different parts of the huge LibreOffice source code.
There are still other data types related to working with strings, such as streams, buffers and string view types, that I will discuss in the next part of this series of blog posts.
If you want to know more, refer to the presentation by Stephan Bergmann at the LibreOffice Conference 2023. He talks about the improvements in C++20 (class non-type template parameters) that made it possible to simplify the string literals in LibreOffice code: