Documents a reproducible, step-by-step process for safely extracting the malware payload from the tainted XZ Utils release tarball.
https://hackmd.io/@cve-2024-3094/how-to-extract-the-malware-payload
The following conditions must be met in order to run this tutorial:
In this tutorial the Docker container runtime is used, though other virtualization solutions may be used with minor modifications to the process.
Note that the command examples in the following sections assume that you're using the Bash shell; if you use a different shell, you may need to translate them to its equivalent syntax before running them.
The tutorial was reproduced in the following environments while it was being written:
Ubuntu 24.04
Ubuntu 22.04
24.0.5
Most of the following steps must be run in a text terminal; launch your preferred terminal emulator application to do so.
To avoid accidentally using the tainted software, we should store all files that may be malicious in a dedicated folder (and especially not in your Downloads folder).
Use your preferred file manager application or run the following shell command to do so:
Run the following command to switch your working directory to that folder, which minimizes the keystrokes required to refer to the files in it:
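A minimal sketch of these two steps, assuming a hypothetical folder name (`xz-payload-extraction` under your home directory is an illustration, not a name from this tutorial; any dedicated location works):

```shell
# Hypothetical location for everything that may be malicious; adjust to taste
workdir="${HOME}/xz-payload-extraction"

# Create the folder (no error if it already exists), then make it the working directory
mkdir -p "${workdir}"
cd "${workdir}"
```

Keeping the path in a variable makes it easy to reuse later, for example when bind-mounting the folder into a container.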
To reduce the time required for the "Update the container system to avoid zero-day exploits" step in each tutorial reproduction session, fetch the latest Ubuntu 22.04 container image from the Docker registry by running the following command as root:
The command should print output similar to one of the following, depending on whether you already have the latest version of the specified container image on your local host:
Note:
If you are in a network environment where Internet access is only available through a specific HTTP/HTTPS proxy service, you need to merge the following JSON dictionary keys and values into your Docker daemon configuration file:
and restart the Docker daemon for the configuration change to take effect. Refer to the Configure Docker to use a proxy server | Docker Docs official documentation for more information.
Security implications:
It would be even safer to create a virtual machine for this purpose, as the isolation between a guest system and its host is stronger than that of a container.
This still isn't 100% safe, though, given that virtual machine escape exploits exist. Doing the work on another, non-critical host machine with a CPU architecture that the malware is unlikely to support (like ARM or RISC-V) would be your safest bet.
Run the following commands as root to launch an ephemeral Docker container for the inspection of the malicious payload:
Note:
- The `array=(...)` command uses the GNU Bash shell scripting language's indexed array assignment syntax; the above command example uses this notation to add descriptive comments for the command-line options and their arguments.
- The `"${array[@]}"` notation is [one of the Bash scripting language's parameter expansion syntaxes](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html). After expansion it is replaced by each of the array members as a separate double-quoted word. You can prepend the `echo` command to a command containing this notation (e.g. `echo "${array[@]}"`) to preview the expanded result.
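As a sketch of this notation, an array can carry commented options and be previewed with `echo` before the real invocation. The option set, the `/workdir` mount point, and the `ubuntu:22.04` image tag below are assumptions for illustration, not necessarily the tutorial's exact command:

```shell
docker_run_opts=(
    # Remove the container automatically once its main process exits
    --rm
    # Keep standard input open and allocate a terminal for an interactive shell
    --interactive
    --tty
    # Bind-mount the current host directory into the container
    # (the /workdir mount point is a hypothetical choice)
    --volume "${PWD}:/workdir"
)

# Preview the expanded command line instead of running it
preview="$(echo docker run "${docker_run_opts[@]}" ubuntu:22.04)"
echo "${preview}"
```

Dropping the `echo` (and the command substitution around it) turns the preview into the actual `docker run` invocation.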
Note:
If you're in a network environment where Internet access is only available through an HTTP/HTTPS proxy, run the following commands before invoking the `docker run` command:
docker_run_opts+=(--env http_proxy=__HTTP_PROXY_URL__)
docker_run_opts+=(--env https_proxy=__HTTPS_PROXY_URL__)
You may omit the part of the `docker run` `--env` command-line option's argument after the equal sign (not including the trailing closing parenthesis) if you already have the environment variables set on your host machine.
Note:
By default the Ubuntu container image has its time zone set to Coordinated Universal Time (UTC). If you're in a different time zone, you can run the following command to fix the time zone settings:
Replace the `std` placeholder in the `TZ` environment variable's value with a suitable time zone abbreviation. Refer to List of time zone abbreviations - Wikipedia for the full list, or simply use `LOC`: the abbreviation is only ever displayed to the user, so its exact value doesn't matter much.
Replace the `offset` placeholder in the `TZ` environment variable's value with the time value you must add to the local time to get UTC. It has the format `[+|-]hh[:mm[:ss]]`, where a square bracket pair denotes an optional field and the pipe (`|`) character separates alternatives; leading zeros in the `hh`, `mm`, and `ss` fields may be omitted.
For example, for users in Taiwan (UTC+8) a proper `TZ` value would be `CST-8`, as the time zone abbreviation is CST and 8 hours must be subtracted from the local time to get UTC.
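The effect of such a value can be checked with a fixed timestamp. This sketch assumes GNU `date` (as shipped in the Ubuntu container) and uses the Taiwan value as an example; substitute your own `std` and `offset`:

```shell
# Example value for a user in Taiwan (UTC+8); replace it with your own std/offset pair
export TZ=CST-8

# Render the Unix epoch (1970-01-01 00:00:00 UTC) in the configured time zone
epoch_local="$(date -d '@0' '+%Y-%m-%d %H:%M:%S %Z')"
echo "${epoch_local}"
```

With `CST-8` the epoch is rendered eight hours ahead of UTC, which confirms that the sign of the offset is the reverse of the familiar "UTC+8" notation.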
Security implications:
You may not need to run the `docker run` command as root if you've set up the proper permissions to access the Docker daemon control socket. However, such a setup has security implications of its own that need to be taken care of.
Warning:
From this point on, it is recommended to access all data originating from the potential evil actor only from within the isolation of the container/virtual machine, as such data may contain malicious logic that could compromise your host system.
By default the Ubuntu Docker container image has the `archive.ubuntu.com` software archive URL configured in the APT software package management system's software sources list. However, the servers behind this address are located in England and the United States (according to IP address geolocation lookup results):
…and thus the package download speed will be limited if you don't live in one of these regions.
To fix this problem, we can switch to the representative mirror service of your country (using the country_code.archive.ubuntu.com domain, where country_code is an ISO 3166-1 alpha-2 code) or to one of the regular local mirror services of the Ubuntu software repository archives by running the following commands as root:
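One way to sketch the substitution is a small `sed` helper. `switch_mirror` is a hypothetical function name, `tw` is an example country code, and the demo runs against a stand-in file in the classic one-line sources format; in the container the real target would presumably be `/etc/apt/sources.list`, edited as root:

```shell
# Hypothetical helper: point archive.ubuntu.com entries at a country mirror
# usage: switch_mirror <country_code> <sources_list_file>
switch_mirror() {
    sed -i "s@//archive\.ubuntu\.com@//$1.archive.ubuntu.com@g" "$2"
}

# Self-contained demonstration on a stand-in file
demo_list="$(mktemp)"
echo 'deb http://archive.ubuntu.com/ubuntu/ jammy main restricted' > "${demo_list}"
switch_mirror tw "${demo_list}"
```

Anchoring the pattern on the leading `//` keeps other hosts, such as security.ubuntu.com, untouched.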
As the software sources have been modified, we need to refresh the APT software package management system's local cache to make the modification effective. Run the following command as root to do so:
As the release archive to inspect is potentially dangerous, we should fully update our container system to reduce the possibility that a zero-day exploit may be used to compromise it (and, in turn, our working host system).
Run the following command as root to do so:
Security implication:
Note that the currently running container process (the Bash shell you're pasting commands into) is still unpatched and may still be exploited by the attacker. To mitigate this risk, launch a subshell by running the following command:
If you would rather skip this mitigation, avoid `source`-ing or `.`-ing any scripts or non-script files in the project.
Run the following command to change the working directory to the bind-mounted working directory:
The upstream release archives are not accessible right now, as GitHub has disabled access to the upstream project's Git repository.
Fortunately, with the help of the Wayback Machine and some other third-party backups, we are still able to retrieve them, as well as the PGP signatures that can be used to verify their authenticity (to the extent of an incomplete PGP web of trust).
You may locate the files in the Wayback Machine search results page for the https://github.com/tukaani-project/xz/releases/download/* URLs and in some other sources, or simply download the files from the following curated list using your preferred web browser application:
Note:
As the XZ Utils release archives fetched from the Wayback Machine and other backup sources are not necessarily unmodified, or even released by the evil actor themselves, we must verify their authenticity.
We can do so by using the Pretty Good Privacy (PGP) public key of the evil actor together with the PGP signature files distributed along with the release archives.
First, we need to install the software required for verifying PGP-signed documents. The GNU Privacy Guard (GnuPG) is the one mainly used on GNU+Linux operating systems; let's install it by running the following command as root:
We also need to have a copy of the PGP public key of the evil actor, which can be obtained from the Wayback Machine.
Simply copy the entire `-----*PUBLIC KEY BLOCK-----` section (including the beginning and ending marker lines) and save it to a new file named "potential-evil-actor.pubkey" in the working directory using a plaintext editor.
To verify documents signed with PGP, we must first import the evil actor's PGP public key into the GnuPG keyring. Run the following command to do so:
The following output should be displayed:
Note:
- The `gpg: key 59FCF207FEA7F445: 1 signature not checked due to a missing key` warning message appears because the public key of the unknown person who signed the evil actor's PGP key is not in your keyring, which is expected as the keyring did not exist in the first place.
- The `gpg: no ultimately trusted keys found` warning message appears because the PGP web of trust is missing for this particular public key. This is expected: the web of trust requires that you have your own private/public key pair and have signed (trusted) either the evil actor's key or the keys of other people who (directly or indirectly) have signed the evil actor's key, none of which is the case here. How to satisfy this trust model to get rid of the warning, however, is out of the scope of this tutorial.

Run the following command to verify the release archive's authenticity:
The following output should be displayed:
Note:
- The `gpg: Good signature from "_display_name_ <_email_address_>..."` output indicates that the PGP-signed file has been verified successfully. The `[unknown]` part at the end of that line indicates that the trustworthiness of the key that signed the file is determined to be "unknown" according to the PGP web of trust.
- The `WARNING: This key is not certified with a trusted signature! There is no indication that the signature belongs to the owner.` warning message appears because the PGP web of trust is missing for the evil actor's public key; refer to the note on the `gpg: no ultimately trusted keys found` warning message above for more information.

Now that we know the release archive is indeed signed by the potential evil actor, the results of the further extraction process should be reproducible by anyone who retrieves the same file.
Now we shall extract the tainted XZ Utils release archive. According to its `.tar.bz2` filename extension, the release archive is a bzip2-compressed tarball; you can run the following command as root to install the software needed for extracting such files:
Run the following command to extract the XZ Utils release archive:
Note:
It is recommended to use in-container utilities to inspect the files from the potential evil actor; on the host you might unintentionally run the programs inside the archive (through the GUI double-click mechanism) or have your system compromised through an exploited vulnerability in one of your host system's components.
We can rebuild the unmodified versions of the aforementioned build system files from the source tree checked out from the upstream Git repository, and compare them to find out what the evil actor actually did to the files.
The first step is to install the Git version control system software; you can run the following command as root to do so:
Then run the following command to clone the upstream Git repository to the local host and check out the 5.6.1 version from it:
Let's change the working directory back to the initial one, as we no longer need to perform Git operations:
Now we can read the installation document (INSTALL) in the XZ Utils source tree; however, we need a pager utility to do so. As the pager utility available in the Ubuntu container by default is `more`, which is limited in browsing features (e.g. page up), let's install the much more capable `less` pager by running the following command as root:
Security implication:
Never use the `cat` command to read a file from the project: the file may contain escape codes that will be interpreted by your terminal, which may have unintended or even malicious results. Use a pager utility (e.g. `less`) to mitigate such risks.
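For non-interactive checks, one alternative (not used by this tutorial) is `sed -n l`, which prints non-printable bytes as octal escapes instead of sending them to the terminal. A small sketch with a made-up sample string, assuming GNU sed:

```shell
# Made-up sample containing ANSI colour escape sequences (ESC is octal 033)
sample="$(printf '\033[31mred\033[0m')"

# The "l" command prints each line unambiguously: non-printable bytes become
# \ooo octal escapes and the end of the line is marked with "$"
escaped="$(printf '%s\n' "${sample}" | sed -n l)"
printf '%s\n' "${escaped}"
```

The escape bytes show up as the literal text `\033`, so the terminal never interprets them.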
Then we can read the installation document in the XZ Utils source tree by running the following command:
Note:
Press the `q` key on your keyboard to leave the `less` pager program.
According to the Preface section of the XZ Utils installation document, XZ Utils is built with the GNU build system, which may include (but is not limited to) the following software components:
`Makefile.in` files compliant with the GNU Coding Standards.
(XZ Utils also supports CMake as an alternative build system; however, since the malware was not injected into that portion, it is out of the scope of this tutorial.)
It is more helpful to build the build system files using the exact same versions of the software that the evil actor used, as this introduces less noise during the content comparison (which aims to figure out what the evil actor actually did to inject the malicious code).
We can determine the version of the GNU Autoconf build tool used by the software (assuming that it was not obfuscated by the evil actor) by checking the heading comment of the build configuration program in the released source code. Run the following command to do so:
This should produce the following output, indicating that the Autoconf version used for building the build files is likely 2.72:
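The same check can be scripted. In this sketch `autoconf_version_of` is a hypothetical helper, and the stand-in file only imitates the "Generated by GNU Autoconf" header line so that the example is self-contained; inside the container you would point it at the real configure script instead:

```shell
# Hypothetical helper: print the Autoconf version recorded in a configure
# script's "Generated by GNU Autoconf" header comment
autoconf_version_of() {
    sed -n 's/^# Generated by GNU Autoconf \([0-9][0-9.a-z]*\).*$/\1/p' "$1" | head -n 1
}

# Stand-in file imitating the first lines of a generated configure script
demo_configure="$(mktemp)"
printf '%s\n' \
    '#! /bin/sh' \
    '# Guess values for system-dependent variables and create Makefiles.' \
    '# Generated by GNU Autoconf 2.72 for XZ Utils 5.6.1.' \
    > "${demo_configure}"

detected="$(autoconf_version_of "${demo_configure}")"
echo "${detected}"
```

Because `sed` only prints the captured version field, no other content of the untrusted file reaches the terminal.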
We can determine the version of the GNU Automake build tool used by the software (assuming that it was not obfuscated by the evil actor) by checking the heading comment of the Makefile input file in the released source code. Run the following command to do so:
This should produce the following output, indicating that the Automake version used for building the build files is likely 1.16.5:
We can determine the version of GNU Libtool used by the software (assuming that it was not obfuscated by the evil actor) by checking the heading comment of the build-aux/ltmain.sh file in the released source code. Run the following command to do so:
This should produce the following output, indicating that the GNU Libtool version used for building the build files is likely a modified version of 2.4.7.4 (with the revision fingerprint 1ec8f):
After checking the revision history of the GNU Libtool project, you will find that the actual version is the fourth revision after the 2.4.7 version tag (as the commit hash, 1ec8f, matches). The `-dirty` version string suffix indicates that the Libtool source may carry additional unknown changes that make it deviate from the 1ec8f revision.
Upon inspection of the configure.ac GNU Autoconf input file, we can notice that the XZ Utils software also makes use of the GNU Gettext internationalization (I18N) support library:
As GNU Gettext also generates build system files in the source tree, we must build the exact same GNU Gettext version that the potential evil actor used as well. It can be found by inspecting the header of the xz-5.6.1/m4/gettext.m4 file with the following commands:
According to the following output, the GNU Gettext version used to build XZ Utils 5.6.1 seems to be 0.22.4:
While the Ubuntu software distribution may provide this software in the required versions, its packages may also introduce additional changes of their own that would complicate the difference comparison, so we'll have to build and install the software from source manually.
To install the GNU Autoconf software, follow the Downloading Autoconf section of the Autoconf project page to locate the download URLs of the 2.72 release archive and its PGP signature file, and download them to the working directory using your web browser.
Then, as usual, verify that the release archive we've downloaded was actually made by the Autoconf project maintainer. We can determine who actually signed the release archive by running the following command:
You should see the following error message from GnuPG:
The PGP signature can't be checked as we don't have the signer's public key in our PGP keyring. We can retrieve the public key using the key's ID (82F854F3CE73174B8B63174091FCC32B6769AA64) from the previous output by running the following command:
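The fetch-then-verify pattern used for each dependency can be sketched as two small helpers. The function names are hypothetical, the default keyserver is an assumption, and the archive and signature file names in the usage comment are guesses at what you downloaded:

```shell
# Hypothetical helper: fetch a public key by its ID or fingerprint
# usage: fetch_signer_key <key_id> [keyserver_url]
fetch_signer_key() {
    gpg --keyserver "${2:-hkp://keyserver.ubuntu.com:80}" --recv-keys "$1"
}

# Hypothetical helper: check a detached signature against the file it covers
# usage: verify_release <signature_file> <signed_file>
verify_release() {
    gpg --verify "$1" "$2"
}

# Intended use inside the container (needs network access, so it is not run here;
# the file names are assumed):
#   fetch_signer_key 82F854F3CE73174B8B63174091FCC32B6769AA64
#   verify_release autoconf-2.72.tar.xz.sig autoconf-2.72.tar.xz
```

Passing the signed file explicitly as the second argument to `gpg --verify` avoids relying on GnuPG guessing it from the signature's file name.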
You should see the following command output:
which indicates that the one who signed the release is supposed to be Zack Weinberg <zackw@panix.com>. According to the Maintainers section of the Autoconf project page, the project maintainers seem to be:
which is weird, as Zack isn't in the list. However, by checking the summary page of Autoconf's Git repository we can verify that it was indeed Zack Weinberg who prepared the 2.72 release:
We can now proceed to verify the released GNU Autoconf source archive:
Note:
As mentioned in the "Verify the authenticity of the tainted XZ Utils release archive" section, the following output line indicates that the PGP-signed file has been verified successfully:
We can now proceed to extract the Autoconf source archive, but first we need to install the (hopefully not malicious :P) XZ Utils software required to do so. Run the following command as root:
Then run the following command to extract the Autoconf source archive:
Now we can try building Autoconf from source. As the build configuration program (autoconf-2.72/configure) creates build files in your working directory, let's switch the working directory to the Autoconf source directory to avoid writing files outside of it:
Then we can run the build configuration program iteratively until the build requirements are satisfied and it finally generates the Makefile used to actually build and install the software. Run the following command to do so:
Autoconf's build configuration program prints the following error message, which indicates that it requires the GNU M4 software to be installed:
We can install it by running the following command as root:
Now we can run the build configuration program again:
If you see the following output and that the Makefile file appears in the source tree, it means that the build configuration of the GNU Autoconf software has finished successfully:
We need the GNU Make software to read the Makefile and build the GNU Autoconf software. Run the following command as root to install it:
Now we can start building the GNU Autoconf software by running the following command:
Note:
- The `$(_command_)` portion of the command uses the Bash scripting language's command substitution syntax; it is replaced by the output of the enclosed command.
- The `${_variable_name_}` portion of the command uses the Bash scripting language's parameter expansion syntax for regular variables; it is replaced by the value of the variable_name variable.

If the command's exit status code is zero, the build was successful. Run the following command right after the `make` command's invocation to verify this:
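To try command substitution, parameter expansion, and the exit status check without a source tree at hand, here is a small sketch; `true` and `false` merely stand in for a succeeding and a failing `make` run, and the variable names are illustrative:

```shell
# Command substitution: $(nproc) is replaced by the number of available CPU cores
job_count="$(nproc)"

# Parameter expansion: ${job_count} is replaced by the variable's value
echo "a parallel build could use: make --jobs=${job_count}"

# "$?" holds the exit status of the most recently executed command
true
status_ok=$?
false || status_failed=$?
echo "stand-in for success: ${status_ok}, stand-in for failure: ${status_failed}"
```

The check must come immediately after the command of interest, because every later command overwrites `$?`.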
Then run the following command to install the built GNU Autoconf software to the system:
After the installation of the GNU Autoconf software you should have the `autoconf` command in your command search PATH (in /usr/local/bin, to be specific). You may check the version of the installation by running the following command:
Let's run the following command to switch the working directory back to the initial one, as we have no further business with the Autoconf source tree:
To install the GNU Automake software, start by locating the download URLs of the 1.16.5 release archive and its PGP signature file, and download them to the working directory using your web browser.
Then, as usual, verify that the release archive we downloaded was actually made by the GNU Automake project maintainer. We can determine who actually signed the release archive by running the following command:
You should see the following error message from GnuPG:
The PGP signature can't be checked as we don't have the signer's public key in our PGP keyring. We can retrieve the public key using the key's ID from the previous output by running the following command:
It should have the following output:
Note:
If you still can't retrieve the public key, the connection may have been blocked by your network's firewall; you could also try the hkp://keyserver.ubuntu.com:80 keyserver, which should get through such a hostile networking environment.
As shown by the GNU Automake project page, Jim Meyering is indeed the one who released the packages, so the public key we've fetched here is probably legitimate:
We can now proceed to verify the 1.16.5 GNU Automake source archive:
Note:
As mentioned in the "Verify the authenticity of the tainted XZ Utils release archive" section, the following output line indicates that the PGP-signed file has been verified successfully:
We can now proceed to extract the GNU Automake source archive. Run the following command to do so:
Now we can try building Automake from source. As the build configuration program (automake-1.16.5/configure) creates build files in your working directory, let's switch the working directory to the GNU Automake 1.16.5 source directory to avoid writing files outside of it:
Then we can run the build configuration program iteratively until the build requirements are satisfied and it finally generates the Makefile used to actually build and install the software. Run the following command to do so:
If you see the following output, the build configuration should be completed without errors:
Note:
The build configuration program did print some warning messages; however, all of them seem to affect only the automated software testing of the GNU Automake software, which we don't really need in this tutorial, so let's let them slide.
Now we can start building the GNU Automake software by running the following command:
If the command's exit status code is zero, the build was successful. Run the following command right after the `make` command's invocation to verify this:
Then run the following command to install the built GNU Automake software to the system:
After the installation of the GNU Automake software you should have the `automake` command in your command search PATH (/usr/local/bin). You may check the version of the installation by running the following command:
Let's return to the initial working directory, as we have no further business with the Automake source tree:
The installation of the GNU Libtool software would start by locating the download URL of the 2.4.7.4 release archive… until you notice that there is no release version called 2.4.7.4:
According to the full version string `2.4.7.4-1ec8f-dirty` we found earlier, it appears that the `.4` part of the GNU Libtool version string denotes a development revision that has deviated from the 2.4.7 release by 4 revisions and has the revision fingerprint 1ec8f. By searching the revisions committed to the repository after the 2.4.7 release and matching the revision fingerprints, we can find that the GNU Libtool software that created the build system files seems to be based on the "libtool: passthru '-Werror' flags" revision.
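The reading of the version string described above can be sketched with plain shell parameter expansion. The variable names are illustrative, and the field layout is the interpretation given in this tutorial rather than a documented format:

```shell
version_string='2.4.7.4-1ec8f-dirty'

# Everything before the first "-": the release number plus the revision count
numeric_part="${version_string%%-*}"            # 2.4.7.4
base_release="${numeric_part%.*}"               # 2.4.7
revisions_after_release="${numeric_part##*.}"   # 4

# The field between the first and the second "-": the abbreviated commit hash
remainder="${version_string#*-}"                # 1ec8f-dirty
commit_prefix="${remainder%%-*}"                # 1ec8f

echo "release ${base_release} + ${revisions_after_release} revisions, commit ${commit_prefix}"
```

The extracted commit prefix is what gets matched against the repository history after the 2.4.7 release.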
We can fetch the GNU Libtool source tree of this specific revision from the upstream Git repository URL listed in the Clone section of the libtool.git Git repository summary page by running the following commands:
Note:
If the following output is printed during the `git checkout` command, it indicates that your history fetch depth is too shallow:
You can retry the operation after deepening the history fetch depth by running the following command after switching your working directory to the libtool-git directory:
Now we can try building GNU Libtool from source. Unfortunately, the INSTALL installation document exists neither in the source tree nor at the URL given in the README of the GNU Libtool 1ec8fa2 revision:
As a fallback, we refer to the current revision of the GNU Libtool README documentation, which now references the existing INSTALL installation document of the GNU Automake software instead:
As the source build bootstrapping program (libtool-git/bootstrap) expects to be run with the working directory inside the source tree, be sure to change the working directory, if you haven't done so in the previous step, by running the following command:
Then run the source build bootstrapping program by running the following command:
which should print the following error messages, indicating that we haven't satisfied its bootstrapping prerequisites:
To install the help2man prerequisite, we can run the following command to search the APT software package management system:
which reveals that there's a package that can be installed:
However, this isn't the case for the `makeinfo` prerequisite:
Fortunately, we can locate the package that provides this prerequisite by using the `apt-file` utility. First, install the utility by running the following command as root:
Then run the following command as root to fetch the metadata required for `apt-file`'s operation:
We can then query the packages that provide the `makeinfo` utility by running the following command:
According to the following `apt-file` command output, we can confirm that the texinfo package provides the `makeinfo` command:
Note:
We can also use the Ubuntu Packages Search website to query the package that provides the specific file.
We can now run the following command as root to install all the missing packages:
We can now retry the source build bootstrapping program by running the following command:
If you see the following output then the bootstrapping process has completed successfully:
Note:
If you're in a network environment where Internet access is only available via an HTTP proxy, the source build bootstrapping program will print the following error:
You can run the following command to patch the gnulib submodule's repository address to work around the problem:
Then we can run the build configuration program iteratively until the build requirements are satisfied and it finally generates the Makefile used to actually build and install the software. Run the following command to do so:
First of all, the build configuration program complains about a missing C compiler:
This can be fixed by installing a compatible C compiler such as GCC; install it by running the following command as root:
We can now retry the build configuration program by running the following command:
This time the build configuration should complete without errors.
Now we can start building the GNU Libtool software by running the following commands:
If the command's exit status code is zero, the build was successful. Run the following command right after the `make` command's invocation to verify this:
Then run the following command to install the built GNU Libtool software to the system:
After the installation of the GNU Libtool software you should have the `libtool` command in your command search PATH (/usr/local/bin). You may check the version of the installation by running the following command:
Let's run the following command to switch the working directory back to the initial one, as we have no further business with the GNU Libtool source tree:
To install version 0.22.4 of the GNU Gettext software, start by locating the download URLs of the 0.22.4 release archive and its PGP signature file, and download them to the working directory using your web browser.
Then, as usual, verify that the release archive we downloaded was actually made by one of the GNU Gettext project maintainers. We can determine who actually signed the release archive by running the following commands:
You should see the following error message from GnuPG:
The PGP signature can't be checked as we don't have the signer's public key in our PGP keyring. We can retrieve the public key using the key's ID from the previous output by running the following commands:
It should have the following output:
Note:
If you still can't retrieve the public key, the connection may have been blocked by your network's firewall; you could also try the hkp://keyserver.ubuntu.com:80 keyserver, which should get through such a hostile networking environment.
As shown by the GNU Gettext project group member list page, Bruno Haible <bruno@clisp.org> is indeed one of the GNU Gettext project maintainers, so the public key we've fetched here is probably legitimate:
We can now proceed to verify the 0.22.4 version of the GNU Gettext release archive:
Note:
As mentioned in the "Verify the authenticity of the tainted XZ Utils release archive" section, the following output line indicates that the PGP-signed file has been verified successfully:
Before we extract the GNU Gettext 0.22.4 release archive, we need to install the software required to extract lzip-compressed tar archive files (as indicated by the .lz filename extension) by running the following commands as root:
We can now proceed to extract the GNU Gettext 0.22.4 release archive. Run the following commands to do so:
Now we can try building version 0.22.4 of the GNU Gettext software from source. According to the installation document (gettext-0.22.4/INSTALL), the build procedure is similar to that of the other GNU build system components, and the build configuration program is already shipped in the release archive, so we can use it directly to configure our build.
As the build configuration program (gettext-0.22.4/configure) creates build files in your working directory, let's switch the working directory to the GNU Gettext 0.22.4 source tree to avoid writing files outside of it:
Then we can run the build configuration program iteratively until the build requirements are satisfied and it finally generates the Makefile used to actually build and install the software. Run the following command to do so:
If the command's exit status code is zero, the build configuration was successful. Run the following command right after the `./configure` command's invocation to verify this:
Now we can start building the GNU Gettext software by running the following command:
If the command's exit status code is zero, the build was successful. Run the following command right after the `make` command's invocation to verify this:
Then run the following command to install the built GNU Gettext software to the system:
After the installation of the GNU Gettext software you should have the `gettext` command in your command search PATH (/usr/local/bin). You may check the version of the installation by running the following command:
Let's return to the initial working directory, as we have no further business with the GNU Gettext source tree:
Now that the software dependencies are installed, let's start generating the build system files, taking on the project maintainer's role.
As the XZ Utils autogen.sh program requires the working directory to be set to the checked-out source directory, and to avoid scattering XZ Utils build artifacts outside of that directory, run the following command to switch the working directory:
…and run the following command to build the build system files:
which will errored with the following message:
We can search for packages that provide the po4a
program by using the aforementioned apt-file
utility by running the following commands:
Which should have the following output, indicating that there's a same-name package that provides such program:
We can satisfy the dependency by running the following commands as root:
The next iteration of the ./autogen.sh
command prints the following error message:
which we can again, use the apt-file
utility to locate the package that provides that program to install. We leave this step to you and simply satisfy the resulting dependency package by running the following command as root:
This time the ./autogen.sh command invocation should have the following output, indicating that it has successfully generated all the files:
Let's return to the initial working directory, as further operations don't need the current one, by running the following command:
According to the oss-security mailing list discussion thread, it is the build system files that are not in the upstream Git repository that were modified to contain the logic to extract the malicious object code file from the test data.
For the ease of comparing differences, let's install a helper utility that supports plaintext data syntax highlighting: bat: A cat(1) clone with wings. We can install it by running the following commands as root:
Note:
Due to a command name conflict, the bat command shipped in older versions of Debian and its derivatives is renamed to batcat. To increase interoperability with other GNU/Linux distributions, let's rename it back to the upstream name by running the following commands as root:
We can generate the content differences between the 5.6.1 version checked out from the Git repository and the 5.6.1 release source tree potentially prepared by the malicious actor, and save them to a file, by running the following commands:
Here's the reference result.
Note:
The >_file_
part of the diff command is in the output redirection
syntax of the Bash scripting language, which
saves the content from the standard output device of the command to
the specified file.
By using this feature we can avoid piping the output directly to the bat
utility, which in this case won't be able to apply the syntax highlighting unless we explicitly specify what --language
the input data is in.
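As a minimal sketch of the redirection technique described in this note, the following commands compare two small scratch directories and save the unified diff to a file. The tree-a, tree-b and trees.diff names are hypothetical and only stand in for the real source trees and diff file.

```shell
# Hypothetical scratch directories standing in for the two source trees.
work_dir="$(mktemp -d)"
mkdir "${work_dir}/tree-a" "${work_dir}/tree-b"
printf 'same\nold line\n' > "${work_dir}/tree-a/file.txt"
printf 'same\nnew line\n' > "${work_dir}/tree-b/file.txt"

# diff exits with status 1 when differences are found, which is not an error.
diff_status=0
(
    cd "${work_dir}" &&
    diff --unified --recursive tree-a tree-b > trees.diff
) || diff_status="$?"

echo "diff exit status: ${diff_status}"
cat "${work_dir}/trees.diff"
```

Note that nothing is printed by the diff command itself, as its whole standard output ends up in the file.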
Then we can run the following commands to inspect the content differences:
you should be able to see the following output:
Note:
By default the bat utility launches the less pager when the data does not fit in the terminal window, so the keybindings supported by the less pager can be used, including but not limited to the following:
q key to exit the pager program.
g key to jump to the first line of the file.
G key to jump to the last line of the file.
A line number followed by the G key to jump to the specific line (due to the additional formatting caused by bat, it may not be the exact line that you intend to jump to).
↓ / j key to move down a line.
↑ / k key to move up a line.
Page Up (PgUp) key to move up a page.
Page Down (PgDn) key to move down a page.
For the content differences we can notice that the tainted XZ Utils 5.6.1 release source tree has its build configuration program and the m4/build-to-host.m4 M4 macro file modified to contain suspicious logic, where the former is generated from the latter by GNU Autoconf.
Unfortunately, we aren't experts in reading M4 macros, so we shall temporarily put them aside and only analyse the resulting build configuration program, as it is in the much more readable Bourne shell compatible shell script format. Let's analyse the build configuration script from top to bottom.
Note: About the unified diff format:
Lines starting with --- indicate the file before the modification.
Lines starting with +++ indicate the file after the modification.
A line starting with @@ marks the start of a hunk (and the end of the previous hunk, if one exists).
In that line, the line range prepended by a minus character (-) refers to the file before modification and the one prepended by a plus character (+) refers to the file after modification.
Lines prepended by a space character ( ) are unchanged by the modification.
Lines prepended by a plus character (+) are inserted by the modification.
Lines prepended by a minus character (-) are removed by the modification.
Take the following diff hunk as an example:
This diff hunk features lines 1748 to 1754 of the xz-git/build-aux/config.sub file (7 lines in total, including line 1751, which is removed by the modification), corresponding to lines 1753 to 1760 of the xz-5.6.1/build-aux/config.sub file (8 lines in total, including lines 1756 and 1757, which are inserted by the modification).
Refer to Detailed Description of Unified Format | GNU Diffutils manual for more information.
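The line ranges quoted above can be derived from a hunk header with a little arithmetic. The following is a minimal sketch; the header string is reconstructed from the line numbers in the prose (it is an assumption, not copied from the diff file), and it assumes both ranges carry an explicit line count.

```shell
# Hunk header reconstructed from the line numbers quoted in the prose above.
hunk_header='@@ -1748,7 +1753,8 @@'

# Strip the surrounding "@@ " and " @@" markers, then split the two ranges.
ranges="${hunk_header#@@ }"
ranges="${ranges% @@}"
old_range="${ranges%% *}"   # "-1748,7"
new_range="${ranges##* }"   # "+1753,8"

old_start="${old_range#-}"; old_start="${old_start%,*}"
old_count="${old_range#*,}"
new_start="${new_range#+}"; new_start="${new_start%,*}"
new_count="${new_range#*,}"

# The last line covered by a range is start + count - 1.
old_end=$((old_start + old_count - 1))
new_end=$((new_start + new_count - 1))

echo "before: lines ${old_start}-${old_end} (${old_count} lines)"
echo "after:  lines ${new_start}-${new_end} (${new_count} lines)"
```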
Note:
In case you need another terminal window to read the full file, you can get another container shell by running the following command as root in a separate terminal window:
Let's examine the next hunk of the build configuration program:
Line 18686 assigns the standard output of the following command to the gl_am_configmake variable:
Note:
The `command` portion of the command is in the aforementioned Bash scripting language's command substitution syntax; it is the legacy form of the preferred $(command) syntax.
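To illustrate that both forms behave the same, here is a minimal sketch using a harmless printf command (the strings are arbitrary examples):

```shell
# Both forms capture the standard output of the enclosed command.
legacy_result=`printf '%s' 'captured output'`
modern_result=$(printf '%s' 'captured output')

# The $(...) form nests without extra escaping; backticks would need \`...\`.
nested_result=$(printf '%s' "$(printf '%s' 'inner') outer")

echo "legacy: ${legacy_result}"
echo "modern: ${modern_result}"
echo "nested: ${nested_result}"
```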
We can expand the command as the following commands:
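The #{4}[[:alnum:]]{5}#{4} (POSIX) extended regular expression matches any string that has the following elements in the following order:
Four consecutive number sign characters (#{4})
Five consecutive alphanumeric characters ([[:alnum:]]{5})
Four consecutive number sign characters (#{4})
Running the command will reveal the assigned variable value to be: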
This is already alarming, as a software testing data file should have nothing to do with the software build itself.
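To get a feel for what the #{4}[[:alnum:]]{5}#{4} pattern described above matches, here is a minimal sketch that runs grep in extended regular expression mode against a few hypothetical sample lines (none of them are taken from the actual source tree):

```shell
pattern='#{4}[[:alnum:]]{5}#{4}'

# Hypothetical sample lines; only the first and third contain a match.
sample_lines='prefix ####Hello#### suffix
###Hello####
####Hell0####
####He-lo####'

matching_lines="$(printf '%s\n' "${sample_lines}" | grep -E "${pattern}")"
printf '%s\n' "${matching_lines}"
```

The second line fails because it starts with only three number signs, and the fourth because the hyphen is not an alphanumeric character.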
The if block at lines 18687~18691 checks whether the gl_am_configmake variable is a non-null string (that is, whether the grep command found the special search pattern in the software source tree) and sets the HAVE_PKG_CONFIGMAKE variable to 1 when it is.
This is likely done to keep the build configuration program from failing when the maliciously crafted test data does not exist in the source tree for any reason (like the file being removed at some future time), which might attract people's attention.
Note:
People who are familiar with build configuration programs generated by GNU Autoconf may notice that the if condition uses a test feature (-n) that is not normally used in a build configuration program for shell compatibility reasons (we will see the usual way to do so in the following sections). This makes the change more suspicious.
Line 18695 assigns the tr "\t \-_" " \t_\-" string to the gl_path_map variable, which seems to be a camouflaged tr command that translates some data using the following rules:
Tab character (\t) to a space character.
Space character ( ) to a tab character.
Hyphen character (\-) to an underscore character.
Underscore character (_) to a hyphen character.
Then, for the following difference hunk:
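Before looking at that hunk, the translation rules of the gl_path_map command can be confirmed with a minimal sketch. It assumes GNU tr, and the sample input is a hypothetical string that merely contains one of each affected character.

```shell
gl_path_map='tr "\t \-_" " \t_\-"'

# Hypothetical sample containing one of each affected character.
sample_input="$(printf 'a\tb c-d_e')"
translated="$(printf '%s' "${sample_input}" | eval "${gl_path_map}")"

# Show tabs visibly: od -c prints them as \t.
printf '%s' "${translated}" | od -c
```

As the rules only swap characters in pairs, applying the same command a second time restores the original data.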
The wrongly indented line 19886:
assigns the output of the following command to the gl_localedir_prefix variable:
which, after parameter expansion and quote removal, becomes:
This command essentially calls the sed plaintext data manipulation utility to do a search & replace that deletes every match of the .*\. basic regular expression:
The .* basic regular expression pattern matches zero or more occurrences of the . RE pattern (which matches any single character).
The \. basic regular expression pattern matches a literal . character.
This strips everything except the xz filename extension from the xz-5.6.1/tests/files/bad-3-corrupt_lzma2.xz filename. As you will notice in a later section, it is a sneaky way to insert the xz command call.
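The effect of this search & replace can be reproduced with a minimal sketch. The path is the one quoted in the text above; the file itself does not need to exist, as only the string is processed.

```shell
# Path taken from the text above; the file does not need to exist for this demo.
gl_am_configmake='xz-5.6.1/tests/files/bad-3-corrupt_lzma2.xz'

# ".*\." greedily matches everything up to and including the last dot,
# and the empty replacement deletes it.
gl_localedir_prefix="$(echo "${gl_am_configmake}" | sed "s/.*\.//g")"

echo "${gl_localedir_prefix}"
```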
Note:
The \. substring in the "s/.*\.//g" double-quoted string expands to a literal \. instead of . due to the following behavior documented in the Bash reference manual:
Let's examine the next hunk of the build configuration program, which strangely features a big chunk of blank lines:
The if block at lines 19903~19907 checks whether the gl_am_configmake variable's value is not a null string (essentially the same check as the -n option of the test command used in the aforementioned hunk).
Note:
This is what a GNU Autoconf-generated build configuration program
would normally do in order to check whether a string is not null, as
opposed to using the -n
command-line option of the test
builtin
command mentioned previously.
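For comparison, here is a minimal sketch of the two ways to check for a non-null string. It assumes the generated code follows the common x-prefix comparison pattern; the check_both function and the sample values are hypothetical.

```shell
# Compare the two ways of checking for a non-null string.
check_both() {
    value="$1"
    if test -n "${value}"; then with_n='non-null'; else with_n='null'; fi
    # Portable idiom: prefix both sides so that the operands are never empty
    # and never look like an option to the test command.
    if test "x${value}" != "x"; then with_x='non-null'; else with_x='null'; fi
    echo "value='${value}': -n says ${with_n}, x-prefix says ${with_x}"
}

check_both ''
empty_result="${with_n}/${with_x}"
check_both './some/path'
path_result="${with_n}/${with_x}"
```

Both checks agree on modern shells; the x-prefix form merely avoids corner cases in very old test implementations.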
When the variable's value is not a null string, it sets the gl_localedir_config
variable to the following single-quoted string:
We'll keep this string unprocessed for now until it is actually evaluated by the shell interpreter.
Line 19930 appends the build-to-host string to the ac_config_commands variable, which seems to define a GNU Autoconf build configuration command that will be run at the end of the build configuration. I'm not entirely sure what effect it has, so let's leave this one for the moment.
For the next diff hunk:
Lines 23924~23928 apply some sort of translation to the content of the previously defined variables using a sed expression that is defined at lines 8403~8404 as the following:
After the script interpreter's escape character interpretation and quote removal the actual sed expression can be determined to be:
According to the s command section of the GNU sed manual, only the \\ substring in the replacement portion of the s sed command is treated as an escape sequence and is interpreted as a single backslash character (\). As a result, the sed expression will search for every occurrence of the single-quote character in each input line and replace it with '\\''
This operation seems to be GNU Autoconf working around shell interpreter behaviors and is not related to the backdoor itself, so we'll ignore it for now.
Line 19896 assigns the value of the gl_localedir_config
to the following command:
which seems to be a shell command pipeline that can be further expanded to:
I wasn't able to figure out what the weird sed command does, so I simply looked it up. According to Daniel Feldman, the sed command is essentially a disguised cat that simply outputs the content of the bad-3-corrupt_lzma2.xz file to the rest of the pipeline, which does the aforementioned character translation and decompresses the result using the xz utility.
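The "disguised cat" claim can be checked with a minimal sketch. It assumes GNU sed and that the command in question has the form sed "r\n" FILE; the sample file is a hypothetical stand-in for the test data file.

```shell
# Hypothetical sample file standing in for the test data file.
sample_file="$(mktemp)"
printf 'first line\nsecond line\n' > "${sample_file}"

# The r command asks sed to append the content of another file after each
# input line, taking the rest of the script as the file name. As no such
# file exists, nothing is appended and the input is printed unchanged.
sed_output="$(sed "r\n" "${sample_file}")"
cat_output="$(cat "${sample_file}")"

if [ "${sed_output}" = "${cat_output}" ]; then
    printf '%s\n' 'The sed command produced the same output as cat.'
fi
```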
What does it extract to? A shell script!
Let's ignore the multiple attempts to terminate the script when the operating system isn't Linux. In the next portion of the commands:
the script attempts to locate the root directory of the XZ Utils source tree via the srcdir variable set in the config.status GNU Autoconf build intermediate file, as the build configuration program may not be run in the root directory of the source tree in some software build scenarios (like in distribution packaging).
In the last portion of the script commands the following commands are executed:
Smells like another round of code/data obfuscation. Let's divide and conquer the command pipeline.
In the first component of the command pipeline:
the script decompresses the seemingly benign tests/files/good-large_compressed.lzma test data to the command's output.
The i
shell variable essentially houses a subshell of AND LIST commands to run, after doing the parameter expansion:
The eval
command interprets the following string as a script. It doesn't seem to matter much in this occasion so let's drop it for now:
As the name indicates, the subshell markup launches another shell interpreter process as a sub-process of the current shell interpreter and runs the enclosed commands inside that process. The subshell receives the output of the previous component of the command pipeline, processes it using the commands in the subshell, then writes the result to the standard output device (stdout), which in turn becomes the standard input (stdin) data of the next component in the pipeline.
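Here is a minimal sketch of how commands in a subshell can consume a pipeline's data piece by piece. It assumes GNU head, which reads no more bytes than requested; the head commands and the sample data are illustrative stand-ins, not the ones used by the script.

```shell
# The subshell reads the pipeline's data in pieces: it discards the first
# 3 bytes, then passes the next 2 bytes on to its standard output.
# (Illustrative stand-in commands, not the ones used by the script.)
kept_bytes="$(printf '0123456789' | ( (head -c 3 > /dev/null) && head -c 2 ))"

echo "${kept_bytes}"
```

Each command in the AND list continues reading where the previous one stopped, because they all share the subshell's standard input.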
The subshell commands simply:
The next shell pipeline component:
discards the bytes before the 32133rd byte.
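As a minimal sketch of this kind of byte-offset filtering, assuming it is done with the -c option of the tail command and a plus-prefixed offset, the following command drops everything before the 4th byte of a hypothetical 8-byte input:

```shell
# Hypothetical 8-byte input; "+4" means "start output at the 4th byte".
remaining_bytes="$(printf 'abcdefgh' | tail -c +4)"

echo "${remaining_bytes}"
```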
The next shell pipeline component:
shuffles the data that contains
The next shell pipeline component:
decompresses the data by assuming it is a raw LZMA1 stream, which is a very specific decompression parameter combination.
By running the pipeline but redirecting the output to a file instead of piping it to /bin/sh, we can retrieve the deobfuscation result:
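The difference between the two forms can be shown with a minimal sketch. The emit_script function is a hypothetical stand-in for the deobfuscation pipeline, and the marker file only exists to prove that nothing gets executed.

```shell
# Hypothetical stand-in for the deobfuscation pipeline: it merely emits
# shell commands as text.
emit_script() {
    printf '%s\n' 'touch "${marker_file}"'
}

output_dir="$(mktemp -d)"
marker_file="${output_dir}/executed.marker"
export marker_file

# Dangerous form (not run here):  emit_script | /bin/sh
# Safe form: save the text for inspection; nothing gets executed.
emit_script > "${output_dir}/stage.sh"

cat "${output_dir}/stage.sh"
```

The saved file can then be read with a pager or a text editor without ever handing it to a shell interpreter.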
Surprise, surprise! Another shell script!
This time it's a heavily obfuscated shell script as well, discouraging researchers from digging deeper into the abyss.
As analysing the script requires a deep understanding of the following subjects, I have to give up and look for answers from other people for now:
I would suggest checking out the research!rsc: The xz attack shell script article by Russ Cox, who explains what the segments in this script (may) do.
The following people, among others, helped during the writing of this tutorial:
During the writing of this tutorial, the following third-party materials were referenced:
--receive-keys gpg command-line option.
autopoint command will do.
dpkg-divert(1) manual page
dpkg-divert command to rename a file installed from dpkg.
diff Output Formats - GNU Diffutils manual
eval - Bourne Shell Builtins - Shell Builtin Commands - Bash Reference Manual
eval Bash builtin command works.
s sed command.
-c command-line option functions.
-c command-line option functions.
tr interprets the \NNN octal number sequence.
This work is released into the public domain; refer to the workspace homepage for more details.
This work was initially written by 林博仁 (Buo-ren Lin); attribution will be appreciated.