Telemetry required? Ask users first!
In this article, I will discuss the recent problems with compiling LibreOffice using Microsoft Visual Studio: the steps I took to debug and find the root cause, the source of the problem itself – Microsoft's telemetry in its developer tools – and how I fixed it.
Describing The Problem
Recently, I encountered a problem when configuring LibreOffice's source code before compilation. Sometimes, random errors appeared without any details about the cause. The window title, "powershell.exe", was also strange, as I wasn't using PowerShell directly.
At first, I ignored the message, but then the errors became more common, and at some point the configuration was aborted. I ignored that for a while, but after a few days, one of the mentees reported a similar problem.
The error was that the UCRT (the Universal C Runtime, the C standard library used by Microsoft Visual C++) was not found. This is an error log:
$ ./autogen.sh
...
checking for Windows SDK... found Windows SDK 10.0 (/cygdrive/c/PROGRA~2/WI3CF2~1/10)
checking for midl.exe... C:\Program Files (x86)\Windows Kits\10\/Bin/10.0.20348.0/x64/midl.exe
checking for csc.exe... C:\Windows\Microsoft.NET\Framework\v4.0.30319\/csc.exe
checking for al.exe... C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.8 Tools\/al.exe
checking .NET Framework... found: C:/PROGRA~2/WI3CF2~1/NETFXSDK/4.8/
checking whether jumbo sheets are supported... yes
checking whether to enable runtime optimizations... yes
checking for valgrind/valgrind.h... no
checking for sys/sdt.h... no
checking what the C++ library is... configure: error: Could not figure out what C++ library this is
Error running configure at ./autogen.sh line 321.
Checking the Error Logs
The important log that contains the output of the configuration is the config.log file. In this file, I could see these related lines:
...
configure:19511: result: no
configure:20052: checking what the C++ library is
configure:20078: C:/PROGRA~1/MIB055~1/2022/COMMUN~1/VC/Tools/MSVC/1430~1.307/bin/Hostx64/x64/cl.exe -c -IC:/PROGRA~2/WI3CF2~1/10/Include/ucrt -IC:/PROGRA~2/WI3CF2~1/10/Include/ucrt -IC:/PROGRA~1/MIB055~1/2022/COMMUN~1/VC/Tools/MSVC/1430~1.307/Include conftest.cpp >&5
conftest.cpp
C:/PROGRA~1/MIB055~1/2022/COMMUN~1/VC/Tools/MSVC/1430~1.307/Include\cstddef(12): fatal error C1083: Cannot open include file: 'stddef.h': No such file or directory
Microsoft (R) C/C++ Optimizing Compiler Version 19.30.30711.2 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
...
The strange thing was that I could run the same configuration in another Cygwin terminal with slightly different settings. To find the differences, I used the export command to print the values of the environment variables in the two terminals, and compared them using diff.
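The comparison itself takes only two ordinary shell commands; a minimal sketch (the file names here are my own choice):

```shell
# In the terminal where configuration succeeds:
export -p > /tmp/env_good.txt

# In the terminal where configuration fails:
export -p > /tmp/env_bad.txt

# Compare the two environments line by line:
diff /tmp/env_good.txt /tmp/env_bad.txt
```

Since `export -p` prints one `export NAME=value` line per variable, the diff output points directly at the variables that differ between the two sessions.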
Then, I found that I could work around the problem by setting an environment variable that was present in one of the terminals:

export CYGWIN="disable_pcon"

The CYGWIN environment variable is documented here: https://cygwin.com/cygwin-ug-net/using-cygwinenv.html
Unfortunately, this did not work for the mentee who had the same problem. I also knew that this approach could lead to performance degradation.
Looking Further Into the Details
I tried to look further into the details of configure.ac and debug it to understand the root cause of the problem. At first, I changed the version manually in configure.ac, and the configuration actually worked! If you take a look at the find_ucrt() function, the relevant part is:
PathFormat "$(win_get_env_from_vsdevcmdbat UniversalCRTSdkDir)"
UCRTSDKDIR=$formatted_path
UCRTVERSION=$(win_get_env_from_vsdevcmdbat UCRTVersion)
Setting UCRTSDKDIR and UCRTVERSION to values from a good build fixed the problem: configuration and make went smoothly, and finished successfully.
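As a temporary debugging hack (not a real fix), the detection can be bypassed by hard-coding the variables in configure.ac; a sketch, with example values taken from the logs above – the exact path and version on your machine will differ:

```shell
# Debugging hack: bypass win_get_env_from_vsdevcmdbat and use known-good
# values taken from a working build's logs (examples from my machine).
UCRTSDKDIR="C:/PROGRA~2/WI3CF2~1/10/"
UCRTVERSION="10.0.20348.0"
```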
Then, I looked into the win_get_env_from_vsdevcmdbat() function. As the name implies, it runs VsDevCmd.bat and reads the values of environment variables that the script sets, such as UniversalCRTSdkDir and UCRTVersion.
This function creates a batch file in the temporary folder, runs it, captures its output, and then removes it. So, I removed the removal part and saved the created batch files.
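To illustrate the mechanism (this is a simplified sketch with assumed names, not the actual LibreOffice code), such a helper boils down to generating a small wrapper .bat file that calls VsDevCmd.bat and echoes one variable:

```shell
# Hypothetical sketch of the wrapper-batch approach; the real implementation
# lives in LibreOffice's configure.ac. $1 is the variable to query, $2 the
# file to write. Batch files need CRLF line endings, hence the \r\n.
make_wrapper_bat() {
    var="$1"
    out="$2"
    {
        printf '@echo off\r\n'
        # VSDEVCMD stands for the path to VsDevCmd.bat on a real system
        printf 'call "%%VSDEVCMD%%" > nul\r\n'
        printf 'echo %%%s%%\r\n' "$var"
    } > "$out"
}

make_wrapper_bat UCRTVersion /tmp/get_ucrt.bat
cat /tmp/get_ucrt.bat
```

configure would then run such a file with cmd.exe, read its output, and delete it; skipping the deletion is what let me inspect the generated scripts.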
I was skeptical about the commands that processed the outputs of the batch files, so I tried changing them a little, but that didn't help. The odd thing was that each of them worked fine on its own: I ran them several times without any problem. Then I decided to run them immediately one after another, and saw that sometimes there was no output.
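My repro loop looked roughly like this (a sketch with placeholder functions; on the real system the two commands were the saved wrapper batch files run through cmd.exe):

```shell
# Placeholders standing in for the two saved batch files; in the failing
# setup, the second one would sometimes print nothing at all.
script_a() { echo "C:/PROGRA~2/WI3CF2~1/10/"; }
script_b() { echo "10.0.20348.0"; }

for i in 1 2 3 4 5 6 7 8 9 10; do
    a=$(script_a)
    b=$(script_b)
    # Flag the intermittent failure: a back-to-back run with empty output
    if [ -z "$a" ] || [ -z "$b" ]; then
        echo "run $i: missing output"
    fi
done
echo "all runs checked"
```

Running the scripts back to back in a loop like this made the intermittent empty output show up reliably enough to investigate.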
Finding the Root Cause
At this point, I was almost certain that the problem was in VsDevCmd.bat itself, but I didn't know why, or how to fix it. So, I took a look into the script, and guess what: the problem came from the telemetry! If the variable VSCMD_SKIP_SENDTELEMETRY is not set, the script starts PowerShell and sends data to Microsoft! That was the source of the problem. This is the relevant part of the code:
@REM Send Telemetry if user's VS is opted-in
if "%VSCMD_SKIP_SENDTELEMETRY%"=="" (
    if "%VSCMD_DEBUG%" NEQ "" (
        @echo [DEBUG:%~nx0] Sending telemetry
        powershell.exe -NoProfile -Command "& {Import-Module '%~dp0\Microsoft.VisualStudio.DevShell.dll'; Send-VsDevShellTelemetry -NewInstanceType Cmd;}"
    ) else (
        START "" /B powershell.exe -NoProfile -Command "& {if($PSVersionTable.PSVersion.Major -ge 3){Import-Module '%~dp0\Microsoft.VisualStudio.DevShell.dll'; Send-VsDevShellTelemetry -NewInstanceType Cmd; }}" > NUL
    )
)
To fix that, I used the value 1 for the variable to opt out of telemetry:
set VSCMD_SKIP_SENDTELEMETRY=1
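In a Cygwin shell, the same effect can be achieved by exporting the variable before running the configuration, so that every VsDevCmd.bat invocation spawned by configure inherits it:

```shell
# Opt out of Visual Studio developer-shell telemetry for this session
export VSCMD_SKIP_SENDTELEMETRY=1

# then re-run the configuration, e.g.:
# ./autogen.sh
```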
This change is now merged into the LibreOffice code, so the problem should be fixed by now.
Best Practices for Doing Telemetry
It took a lot of time to debug and find the root cause of this problem. I think the best way to avoid causing such problems for Visual Studio users would be to ask for their consent before activating telemetry.
I agree that there are legitimate or justifiable reasons to do telemetry, but getting the users' consent before sending data back to corporate servers is very important.
In LibreOffice, we consider users the top priority, and we are bound to the best practice of: “Telemetry required? Ask users first”, and we ask others to do the same.
Agree with you
I understand that telemetry can be useful for several reasons.
But in specific environments (like offline environments – yes, they still exist) or special cases (I hate spaces in directory and file names, and those are usually the root of many other problems), having a piece of code running that you are not aware of can produce very heavy headaches, and it is a huge waste of time to figure out whether it interferes with your job because of internal errors.
What if tomorrow telemetry servers shut down?
In my opinion, telemetry should be opt-in. At the very least, when it fails, it must fail silently without affecting the rest of the process.
Then we can discuss what a legitimate and transparent use of it looks like.