Writer tables converted to plain text – difficultyInteresting EasyHack

If you copy contents from LibreOffice Writer to a plain text editor like gedit or Notepad, you will see that it does a straightforward thing: It copies the text and some basic formatting like converting bullets to ‘•’. For the Writer tables, the conversion is very simple right now: every cell is written in a separate line.

For example, if you have a table like this:

A | B
--|--
C | D

When you copy the table from LibreOffice Writer and paste it into a plain text editor, it will become something like this, which is not always desirable.

A
B
C
D

It is requested that like LibreOffice Calc, or Microsoft Word, and many other programs, the copy/paste mechanism should create a text like this:

A	B
C	D

The columns are separated by <tab>.

This feature request is filed in Bugzilla as tdf#144576:

Code pointers for Handling Writer tables

There are many steps in copy/pasting, including the data/format conversion and clipboard format handling. Here, you have to know that the document is converted to plain text via “text” filter.

The plaintext (ASCII) filter is located here in the LibreOffice core source code:

Therefore, to change the copy/paste output, you have to fix the ASCII filter. That would also provide the benefit that plain text export will be also fixed as requested here.

In this folder, there are a few files:

$ ls sw/source/filter/ascii/
ascatr.cxx parasc.cxx wrtasc.cxx wrtasc.hxx

To change the output, you have to edit this file:

In this file, there is a loop dedicated to create the output.

// Output all areas of the pam into the ASC file
do {
    bool bTstFly = true;
    ...
}

Inside this loop, the code iterates over the nodes inside the document structure, and extracts text from them. To check for yourself, add the one line below to the code, build LibreOffice, and then test. You will see that a * is appended before each node.

SwTextNode* pNd = m_pCurrentPam->GetPoint()->GetNode().GetTextNode();
if( pNd )
{
+   Strm().WriteUChar('*');
    ...
}

For example, having this table, with 1 blank paragraph up and down:

A | B
--|--
C | D

You will get this after copy/paste into a plain text editor:

*
*a
*b
*c
*d
*

To fix the bug, you have to differentiate between table cells and other nodes. Then, you should take care of the table columns and print tab between them.

To go further, you can only add star before table cells:

if( pNd )
{
    SwTableNode *pTableNd = pNd->FindTableNode();
+   if (pTableNd)
+   {
+       Strm().WriteUChar('*');
+    }
    ...
}

You can look into how other filters handled tables. For example, inside sw/source/filter/html/htmltab.cxx you will see how table is managed, first cell is tracked and appropriate functions to handle HTML table are called.

For the merged cells, the EasyHacker should first checks the behavior in other software, then design and implement the appropriate behavior.

Final Words

To gain a better understanding of the Writer document model / layout, please see this document:

This presentation is also very helpful to gain a good understanding of Writer development:

Introduction to Writer Development – LibreOffice 2023 Conference Workshop, Miklos Vajna

Please accept YouTube cookies to play this video. By accepting you will be accessing content from YouTube, a service provided by an external third party.

YouTube privacy policy

If you accept this notice, your choice will be saved and the page will refresh.